Monday, 20 July 2020

Setting Compression Level in GZIPOutputStream

Applications often compress the files they generate to save disk space and transmission bandwidth. Besides saving space, compression can actually speed up an application, because smaller files mean less disk I/O. For this to work, compression and decompression need to be done in memory as the data is streamed, and not after the whole uncompressed content has been written to disk.

If you are generating files from Java, there are two formats to choose from. GZIPOutputStream generates GZIP files, and ZipOutputStream generates ZIP files. There is one basic difference between the two formats. A GZIP file contains only one file, and the name of that file is optional. A ZIP file is an archive of multiple files, and each entry must be given a name when the archive is created. Because a ZIP file may contain multiple files, it cannot simply be passed through a filter that decompresses it on the fly from an input stream: the filter cannot select one file out of the several that may be present.
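To make the single-stream property of GZIP concrete, here is a minimal in-memory round trip. It is only a sketch: the class name GzipRoundTrip and its two helper methods are made up for illustration, and the 1024-byte buffer is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, used here for illustration only.
class GzipRoundTrip {

    // Compresses the bytes into a single GZIP stream held in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(bos)) {
            gzos.write(data);
        }
        return bos.toByteArray();
    }

    // Decompresses on the fly: GZIPInputStream is an ordinary InputStream
    // filter, so no entry has to be selected before reading.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int length;
            while ((length = in.read(buffer)) != -1) {
                bos.write(buffer, 0, length);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The reader only wraps the incoming stream in GZIPInputStream; nothing about the content has to be known in advance.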

For seamless processing of compressed files while reading, GZIP is the most suitable format. Unfortunately, the Java API for GZIPOutputStream lacks a method for controlling the compression level, so you cannot choose BEST_SPEED or BEST_COMPRESSION as needed. This facility is available in ZipOutputStream. Sometimes people just use ZipOutputStream with the compression level set to BEST_SPEED to gain performance when what they actually need is GZIPOutputStream. This creates a problem for the reader, who now has to handle an archive that can potentially contain multiple files rather than a single compressed file. A simple in-memory decompression filter can't be used because of the possibility of multiple files in the ZIP file; the reader has to walk the archive entry by entry (for example with java.util.zip.ZipInputStream) instead of treating the input as one stream of data.
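For comparison, the ZIP workaround described above might look roughly like the following sketch. The class name ZipBestSpeed and the method zip are hypothetical; the point is that ZipOutputStream offers setLevel directly, but every write must go into a named entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, used here for illustration only.
class ZipBestSpeed {

    // Writes the data as one named entry of an in-memory ZIP archive,
    // compressed with BEST_SPEED. The entry name is mandatory.
    static byte[] zip(String entryName, byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.setLevel(Deflater.BEST_SPEED);
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }
}
```

A call such as ZipBestSpeed.zip("data.txt", bytes) produces a full archive with a single entry, which is exactly the extra structure the reader then has to deal with.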

Fortunately, you can set the compression level in GZIPOutputStream as well, by creating a subclass of GZIPOutputStream and exposing a setLevel(int level) method in it. We did this in our code and achieved slightly better results than using ZipOutputStream with the BEST_SPEED compression level. The following is a comparison when compressing a 5.4 GB file:


Method                               Size               Time
ZIP compression with BEST_SPEED      48078219 bytes     80 seconds
GZIP compression with BEST_SPEED     48078113 bytes     78 seconds

Here is the code for MyGZIPOutputStream class:

import java.util.zip.*;
import java.io.*;

public class MyGZIPOutputStream extends GZIPOutputStream
{
    /**
     * Creates a new output stream with the specified buffer size.
     * @param out the output stream
     * @param size the output buffer size
     * @exception IOException If an I/O error has occurred.
     * @exception IllegalArgumentException if size is <= 0
     */
    public MyGZIPOutputStream(OutputStream out, int size) throws IOException {
        super(out, size);
    }

    /**
     * Creates a new output stream with a default buffer size.
     * @param out the output stream
     * @exception IOException If an I/O error has occurred.
     */
    public MyGZIPOutputStream(OutputStream out) throws IOException {
        this(out, 512);
    }

    /**
     * Sets the compression level of the underlying deflater.
     * Call it before writing any data. The default setting is DEFAULT_COMPRESSION.
     * @param level the compression level (0-9)
     * @exception IllegalArgumentException if the compression level is invalid
     */
    public void setLevel(int level) {
        // def is the protected Deflater inherited from DeflaterOutputStream
        def.setLevel(level);
    }
}
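If a named subclass feels like too much for a single call site, the same trick can be sketched with an anonymous subclass and an instance initializer. This is an alternative under the same assumption as the class above (the protected def field of DeflaterOutputStream); the names GzipStreams and withLevel are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical factory class, used here for illustration only.
class GzipStreams {

    // Returns a GZIPOutputStream whose deflater uses the given level.
    static GZIPOutputStream withLevel(OutputStream out, final int level) throws IOException {
        return new GZIPOutputStream(out) {
            {
                // Instance initializer: runs right after the superclass
                // constructor, before any data has been written.
                def.setLevel(level);
            }
        };
    }
}
```

A caller would then write GzipStreams.withLevel(fos, Deflater.BEST_SPEED) wherever a GZIPOutputStream is expected.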

Sample program that uses BEST_SPEED compression with GZIPOutputStream:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;

public class GZipCompression {

    public static void main(String[] args) throws IOException {
        compressInputFile("a.txt", "a.txt.gz");
    }

    public static void compressInputFile(String inputFileName,
            String outputFileName) throws IOException {
        byte[] buffer = new byte[1024];
        long startTime = System.currentTimeMillis();

        // try-with-resources closes all streams even if an exception is thrown
        try (FileInputStream fis = new FileInputStream(inputFileName);
             FileOutputStream fos = new FileOutputStream(outputFileName);
             MyGZIPOutputStream gzos = new MyGZIPOutputStream(fos)) {
            // Set the level before writing any data
            gzos.setLevel(Deflater.BEST_SPEED);

            int length;
            // read() returns -1 at end of file
            while ((length = fis.read(buffer)) != -1) {
                gzos.write(buffer, 0, length);
            }
        }

        long endTime = System.currentTimeMillis();
        System.out.println("Time taken to gzip " + (endTime - startTime) + " milliseconds.");
    }
}
