Generic compression has been around for years and compresses plain text files quite well. But generic compressors, like WinZip, gain little or nothing on files that have already been compressed. The file types that are driving storage growth are almost all already compressed: images, Office documents, PDF, zip and compressed tar archives are all examples. Infima content-aware compression can, in contrast, reduce the space taken by those files by a ratio of up to 2:1.
Infima provides a solution for the rampant growth of data storage with a technology that dramatically reduces the space needed to store data while preserving the original data, including image data, bit for bit. Infima does this with a patented process called NCM (Neural Compress Method).
Content Aware Compression
By understanding the layout of specific content, such as a digital image, you can make intelligent decisions about how to reorganize that data for optimal compression. Content-aware storage optimization starts by identifying block and object types and then applies file-type-specific algorithms to optimize the storage of those files.
Re-compression and very few duplicates
A relatively small number of file types are driving the explosion of digital content. These are files like images and video (JPEG, MPEG, TIFF, GIF, PNG), compound documents (zip, email, HTML, web pages, PDF) and Microsoft Office files (PowerPoint, Word, Excel, SharePoint, etc.).
The majority of these files are already compressed by their applications, so generic compression gains little or nothing on them. Content-aware compression techniques can reduce the space taken up by these files by as much as 50%, a ratio of 2:1.
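The limit of generic compression on already-compressed data is easy to observe. The following is a minimal sketch using Python's standard zlib module as a stand-in for any generic compressor; the word list and sizes are made up for illustration and have nothing to do with Infima's own technology.

```python
import random
import zlib

# Build a plain-text sample from a small, made-up vocabulary.
rng = random.Random(0)
words = ["storage", "image", "file", "block", "data", "content", "archive",
         "network", "buffer", "object", "stream", "layer", "model", "header"]
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

# First pass: plain text shrinks substantially.
once = zlib.compress(text, 9)

# Second pass: the input is already compressed, so there is
# almost no redundancy left for a generic compressor to remove.
twice = zlib.compress(once, 9)

print(f"original:         {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass removes most of the redundancy; the second pass has nothing left to work with and typically leaves the size unchanged or slightly larger.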
How the solution works
In a nutshell, the Infima solution uses a patent-pending N-layer neural network to perform real-time compression without compromising high availability. The neural network implements context-mixing algorithms combined with network learning models to enable in-line data compression. For each input buffer, the N-layer network learns patterns in the buffer and predicts the probability range for each pattern occurrence within the buffer. The network then compresses a similar quantity of buffer patterns according to adaptive context models generated in real time. As a result, compression and decompression times are significantly improved. The context flexibility further ensures the compression of any type of data, including combinations of different data types, thereby enabling both block-level and file-level compression.
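To make the idea of context mixing concrete, here is a minimal sketch in Python of the general technique as it is known from public PAQ-style compressors. It is not Infima's NCM: the two context models, the single mixing neuron, the learning rate and all names are assumptions chosen for illustration. Instead of running a real entropy coder, the sketch adds up the ideal code length (minus log2 of the probability assigned to each actual bit), which is what an arithmetic coder driven by these predictions would approach.

```python
import math
from collections import defaultdict

def stretch(p):
    # Map a probability to the logistic domain.
    return math.log(p / (1.0 - p))

def squash(x):
    # Inverse of stretch, clamped to avoid overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))

class CountModel:
    """Predicts the next bit from counts of bits seen in the same context."""
    def __init__(self):
        self.counts = defaultdict(lambda: [1.0, 1.0])  # [zeros, ones]

    def predict(self, ctx):
        zeros, ones = self.counts[ctx]
        return ones / (zeros + ones)

    def update(self, ctx, bit):
        self.counts[ctx][bit] += 1.0

def estimate_bits(data, lr=0.02):
    """Return the ideal code length, in bits, under the mixed predictions."""
    models = [CountModel(), CountModel()]
    weights = [0.5, 0.5]          # one weight per model, learned online
    total_bits = 0.0
    prev_byte = 0
    for byte in data:
        partial = 1               # leading 1 marks how many bits are known
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            # Context 0: bits seen so far in this byte.
            # Context 1: the previous byte plus those bits.
            contexts = [partial, (prev_byte, partial)]
            inputs = [stretch(m.predict(c)) for m, c in zip(models, contexts)]
            p1 = squash(sum(w * x for w, x in zip(weights, inputs)))
            p1 = min(max(p1, 1e-6), 1.0 - 1e-6)
            total_bits += -math.log2(p1 if bit else 1.0 - p1)
            # The mixer learns from its own prediction error.
            err = bit - p1
            weights = [w + lr * err * x for w, x in zip(weights, inputs)]
            for m, c in zip(models, contexts):
                m.update(c, bit)
            partial = (partial << 1) | bit
        prev_byte = byte
    return total_bits

sample = b"abracadabra " * 200
bits = estimate_bits(sample)
print(f"{8 * len(sample)} raw bits -> about {bits:.0f} predicted bits")
```

Because the models and the mixer adapt as the buffer is read, a decoder that repeats the same updates can reproduce the same predictions without any side information.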
Infima Approach
Infima begins by identifying each piece of content and then makes intelligent decisions about how to compress it. This is not just traditional compression or dedupe, but an intelligent set of operations triggered by first identifying the type of the content.
The Infima system involves three key steps:
- Extract: The Infima solution extracts the fundamental content objects, the actual data, from each data stream or file. This often requires de-layering compound documents, decoding already-compressed files and going through several steps to get to the fundamental objects in a given piece of content.
- Correlate: Once the content objects have been identified, Infima correlates objects both within and across files. Whereas dedupe looks only for exact matches of duplicate data chunks across files, Infima can find both exact matches and similar buffer objects. Correlation works at the information level, not just the byte level.
- Compress: After correlation, Infima compresses the content objects, employing patent-pending content-aware algorithms to get the best possible space savings for each fundamental data type.
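The difference between exact-match dedupe and similarity-based correlation can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than Infima's algorithm: a SHA-256 hash stands in for dedupe, Jaccard similarity over 8-byte shingles stands in for correlation, and zlib with a preset dictionary stands in for compressing one object against a similar one.

```python
import hashlib
import random
import zlib

def shingles(data, k=8):
    # Set of all k-byte substrings of the object.
    return {data[i:i + k] for i in range(len(data) - k + 1)}

def similarity(a, b):
    # Jaccard similarity: 1.0 for identical shingle sets, 0.0 for disjoint.
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def compress_against(reference, target):
    # Compress target using reference as a preset dictionary.
    c = zlib.compressobj(level=9, zdict=reference)
    return c.compress(target) + c.flush()

def decompress_against(reference, blob):
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob) + d.flush()

# Two objects that are similar but not identical. Random bytes are used
# so that neither object is compressible on its own.
rng = random.Random(1)
object_a = bytes(rng.getrandbits(8) for _ in range(4000))
edited = bytearray(object_a)
for pos in (100, 900, 1700, 2500, 3300):
    edited[pos] ^= 0xFF
object_b = bytes(edited)

# Exact-match dedupe sees two unrelated objects.
same_hash = hashlib.sha256(object_a).digest() == hashlib.sha256(object_b).digest()

# Correlation sees that they are nearly the same.
score = similarity(object_a, object_b)

alone = zlib.compress(object_b, 9)
correlated = compress_against(object_a, object_b)

print(f"hashes equal: {same_hash}, similarity: {score:.2f}")
print(f"compressed alone: {len(alone)} bytes, against A: {len(correlated)} bytes")
```

The second object decompresses back to its exact original bytes as long as the reference object is available, so the saving does not cost any information.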
Infima Compression Concept: An Example
A zip file is a compressed archive of multiple files. If you try to zip a zip file, it won’t shrink any further. Infima identifies the content of a zip archive and extracts the data of each block.
Infima will then extract the various PDF objects, reading the header and identifying the text, image, and other sections.
The Infima Correlation process will identify the relationships between the images even though they are not identical and even though at the byte level they share no duplicate patterns. A dedupe product would never identify these similarities. Infima eliminates the redundant information across the files and then routes each part of each block to the content-aware compressor.
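The effect described in this example can be reproduced in miniature with Python's standard library. The sketch below uses small made-up text documents instead of PDFs and images, and plain zlib instead of a content-aware compressor, so it only illustrates why decoding an archive first exposes redundancy that re-zipping cannot see.

```python
import io
import random
import zipfile
import zlib

# Five made-up documents that contain the same sentences in different order.
rng = random.Random(7)
vocab = ["invoice", "report", "total", "customer", "order", "figure",
         "summary", "quarter", "revenue", "region", "product", "account",
         "balance", "forecast", "section", "table", "chart", "review",
         "status", "update"]
sentences = [" ".join(rng.choice(vocab) for _ in range(12)) for _ in range(60)]

members = {}
for i in range(5):
    order = sentences[:]
    rng.shuffle(order)
    members[f"doc_{i}.txt"] = "\n".join(order).encode("ascii")

# Build an ordinary zip archive: each member is compressed on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in members.items():
        zf.writestr(name, data)
archive = buf.getvalue()

# Zipping the zip: the compressed members look like noise.
rezipped = zlib.compress(archive, 9)

# Extract first, then compress all members as one stream, so that
# redundancy shared between members becomes visible to the compressor.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    extracted = [zf.read(name) for name in zf.namelist()]
solid = zlib.compress(b"\n".join(extracted), 9)

print(f"zip archive:        {len(archive)} bytes")
print(f"zip of the zip:     {len(rezipped)} bytes")
print(f"extract + compress: {len(solid)} bytes")
```

Re-zipping typically saves only a few percent, mostly from the archive's headers, while extracting first lets the shared sentences be stored once.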
Deploying Infima
The Infima solution is sold as an integrated package that includes the Infima Encoder and the Infima Decoder. The Encoder is a high-performance SDK that reads files from your existing storage, optimizes them, and writes them back. The Decoder allows users and applications to access those compressed files transparently, as though they had never been compressed. Infima can work with all sorts of network-accessible storage: NAS appliances, Linux or Windows file servers and any other storage that supports NFS, CIFS, or WebDAV.
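The encode-in-place and transparent-read pattern described above might look roughly like the following Python sketch. It is not the Infima SDK: the function names, the MAGIC marker and the use of lzma as the compressor are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import lzma

MAGIC = b"DEMO1"  # hypothetical marker; not a real Infima file format

def encode_in_place(path):
    """Read a file, compress it, and write it back if that saves space."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(MAGIC):
        return  # already optimized (a real format needs a stronger check)
    packed = MAGIC + lzma.compress(raw)
    if len(packed) < len(raw):
        with open(path, "wb") as f:
            f.write(packed)

def read_transparently(path):
    """Return the original bytes whether or not the file was optimized."""
    with open(path, "rb") as f:
        stored = f.read()
    if stored.startswith(MAGIC):
        return lzma.decompress(stored[len(MAGIC):])
    return stored
```

An application that always goes through read_transparently gets the original bytes back and never needs to know which files were optimized; files that would not shrink are simply left untouched.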
Core Technology – NCM (Neural Compress Method)
- A new Compression Technology based on a Learning System.
- The neural system understands and analyses the data (other compressors are built on plain statistics).
- The system implements a learning model during the compression phase. It uses unique proprietary compression formats, yet is also able to compress to standard formats.
- Sub-models are built for specific data types, enhancing these capabilities.
Technological Competitive Advantages
- Provides the highest compression ratio and speed.
- Compresses “hard to crack” formats and data such as JPEG, PDF, DOC, ZIP, JAR, PNG, GIF and more.
- There is no dependency on data type! It provides excellent compression for a wide array of data types.
- No loss of information during compression: lossless compression is where we excel and others fail!
- The advantage increases as hardware improves.