Compress files faster using Snappy

  •        0
  

We aggregate and tag open source projects. We have collections of more than one million projects. Check out the projects section.



Snappy is the fast compression/decompression library from Google. It does not target to reduce compression size but it does faster compression. Most of the open source products use Snappy.

<dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>1.1.7.2</version>
</dependency>
public class SnappyTest {

public static void testCompress(String inFilePath, String outFilePath) {

try {
long start = Calendar.getInstance().getTimeInMillis();
Path inPath = FileSystems.getDefault().getPath(inFilePath);
Path outPath = FileSystems.getDefault().getPath(outFilePath);

byte[] compressed = Snappy.compress(Files.readAllBytes(inPath));
Files.write(outPath, compressed, StandardOpenOption.CREATE);

System.out.println("Snappy time " + (Calendar.getInstance().getTimeInMillis() - start));
}
catch (IOException e) {
e.printStackTrace();
}
}

public static void testCompressZip(String inFilePath, String outFilePath) {

try {
long start = Calendar.getInstance().getTimeInMillis();
Path inPath = FileSystems.getDefault().getPath(inFilePath);
Path outPath = FileSystems.getDefault().getPath(outFilePath);

ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(outFilePath));
zos.putNextEntry(new ZipEntry(inPath.getFileName().toString()));

byte data[] = Files.readAllBytes(inPath);
zos.write(data, 0, data.length);
zos.closeEntry();
zos.close();
System.out.println("Zip time " + (Calendar.getInstance().getTimeInMillis() - start));
}
catch (IOException e) {
e.printStackTrace();
}
}
}

public static void main( String[] args )
{
   String inDir = "<input-directory>";
   String outDir = "<output-directory>";

   File inDirFile = new File(inDir);
   String inputFiles[] = inDirFile.list();
   for (String inputFile : inputFiles) {
        String inFile = inDir + File.separator + inputFile;
        String outSFile = outDir + File.separator + "output-s" + index + ".out";
        String outzFile = outDir + File.separator + "output-z" + index + ".zip";

        System.out.println("File: " + inputFile);
        SnappyTest.testCompress(inFile, outSFile);
        SnappyTest.testCompressZip(inFile, outzFile);
    }

    Snappy.cleanUp();
}

Intially i tested for couple of files and saw Snappy is taking more time and size. Then i dropped lot of files (close to 100) in the input directory and tried to run the above code, Snappy outperfomed JDK compression.

Snappy is good library, when we have lots of data to compress and we need faster compression with slight compromise on compression size.

Reference:
https://google.github.io/snappy/

 


Sponsored:
To find embedded technology information about MCU, IoT, AI etc Check out embedkari.com.


   

We publish blog post about open source products. If you are interested in sharing knowledge about open source products, please visit write for us

Subscribe to our newsletter.

We will send mail once in a week about latest updates on open source tools and technologies. subscribe our newsletter



Related Articles

An introduction to web cache proxy server - nuster

  • web-cache proxy-server load-balancer

Nuster is a simple yet powerful web caching proxy server based on HAProxy. It is 100% compatible with HAProxy, and takes full advantage of the ACL functionality of HAProxy to provide fine-grained caching policy based on the content of request, response or server status. This article gives an overview of nuster - web cache proxy server, its installation and few examples of how to use it.

Read More


Best open source Text Editors

  • text-editor editor tools dev-tools

Text editors are mainly used by programmers and developers for manipulating plain text source code, editing configuration files or preparing documentation and even viewing error logs. Text editors is a piece of software which enables to create, modify and delete files that a programmer is using while creating website or mobile app.In this article, we will discuss about top 7 all-round performing text editors which is highly supportive for programmers.

Read More


Thymeleaf - Text display, Iteration and Conditionals

  • thymeleaf template-engine web-programming java

Thymeleaf is a server-side Java template engine for both web and standalone environments. It is a better alternative to JavaServer Pages (JSP). Spring MVC and Thymeleaf compliment each other if chosen for web application development. In this article, we will discuss how to use Thymeleaf.

Read More


JHipster - Generate simple web application code using Spring Boot and Angular

  • jhipster spring-boot angular web-application

JHipster is one of the full-stack web app development platform to generate, develop and deploy. It provides the front end technologies options of React, Angular, Vue mixed with bootstrap and font awesome icons. Last released version is JHipster 6.0.1. It is licensed under Apache 2 license.

Read More


LogicalDOC - Open Source DMS

  • dms document-management-system

LogicalDOC is both a document management and a collaboration system. The software is loaded with many functions and allows organizing, indexing, retrieving, controlling and distributing important business documents securely and safely for any organization and individual.

Read More



An Introduction to the UnQLite Embedded NoSQL Database Engine

  • database nosql embedded key-value-store

UnQLite is an embedded NoSQL database engine. It's a standard Key/Value store similar to the more popular Berkeley DB and a document-store database similar to MongoDB with a built-in scripting language called Jx9 that looks like Javascript. Unlike most other NoSQL databases, UnQLite does not have a separate server process. UnQLite reads and writes directly to ordinary disk files. A complete database with multiple collections is contained in a single disk file. The database file format is cross-platform, you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures.

Read More


Getting Started on Undertow Server

  • java web-server undertow rest

Undertow is a high performing web server which can be used for both blocking and non-blocking tasks. It is extermely flexible as application can assemble the parts in whatever way it would make sense. It also supports Servlet 4.0, JSR-356 compliant web socket implementation. Undertow is licensed under Apache License, Version 2.0.

Read More


Angular Service Workers Usage Guide

  • angular service-worker offline-app

Web developers come across scenarios like web application completely breaks when workstation goes offline. Likewise to get into our application, every time we need to open a browser and then access it. Instead if it is in app, it will be easy to access for end-user. Push notifications similar to email client need to be done through web application. All these are addressed by a magic called service worker.

Read More


Microsoft released F# under Open Source

  • fsharp opensource

F# is a functional programming language for the .NET Framework. It combines the succinct, expressive and compositional style of functional programming with the runtime, libraries, interoperability, and object model of .NET. Microsoft recently released its source code under Apache License.

Read More


10 sites to get the large data set or data corpus for free

  • search test-data large-data-set data-corpus dataset

You may require GBs of data to do performance or load testing. How your app behaves when there is loads of data. You need to know the capacity of your application. This is the frequently asked question from the sales team "The customer is having 100GB of data and he wants to know whether our product will handle this? If so how much RAM / Disk storage required?". This article has pointers to the large data corpus.

Read More


Caching using Ehcache Java Library

  • ehcache cache java map key-value

Ehcache from Terracotta is one of Java's most widely used Cache. It is concurrent and highly scalable. It has small footprint with SL4J as the only dependencies. It supports multiple strategies like Expiration policies, Eviction policies. It supports three storage tiers, heap, off-heap, disk storage. There are very few caching products supports multiple tier storage. If you want to scale, you cannot store all items in heap there should be support for off-heap and disk storage. Ehcache is licensed under Apache 2.0. In this article, we can see about basic usage of Ehcache.

Read More


React Patent Clause Licensing issue. Is it something to worry?

  • react react-license facebook

React libraries from Facebook is one of the most used UI libraries. It is competitive to AngularJS. There are many open source UI components or frameworks available but mostly people narrow down to two choices Angular / React. Recently Facebook has updated React license and added a patent clause which makes companies to worry and rethink whether to use React or not.

Read More


10 Free services for your Website / Blog. Just plug it.

  • free website blog free-service free-resources

Each website / blog delivers useful content or service to its users. But website themselves requires some service to monitor and increase its presence. Here are few free services which could be used by Website / Blog. This will be very much helpful for small business owners.

Read More


Generate Thumbnail in Java using Thumbnailator library 

  • thumbnail image-processing java

In our work there will be situation where we need to resize the image, generate thumbnails and so on. Users need to have little bit of image processing knowledge to achieve it. We have Java ImageIO APIs to achieve these functionalities. As said, we need to be aware of or spend time in learning these APIs. To help us, Thumbnailator library provides easy fluent style API and generates thumbnail in simple three lines of code.

Read More


LucidWorks Vs SearchBlox - Enterprise Search Solution

  • lucene solr searchblox lucidworks enterprise-search

Enterprise search software should be capable to search the data available in the entire organization or personnel desktop. The data could be in File system, Web or in Database. It should search contents of Emails, file formats like doc, xls, ppt, pdf and lot more. There are many commercial products available but LucidWorks and SearchBlox are best and free.

Read More


AbanteCart - Easy to use open source e-commerce platform, helps selling online

  • e-commerce ecommerce cart

AbanteCart is a free, open source shopping cart that was built by developers with a passion for free and accessible software. Founded in 2010 (launched in 2011), the platform is coded in PHP and supports MySQL. AbanteCart’s easy to use admin and basic layout management tool make this open source solution both easy to use and customizable, depending on the skills of the user. AbanteCart is very user-friendly, it is entirely possible for a user with little to no coding experience to set up and use this cart. If the user would be limited to the themes and features available in base AbanteCart, there is a marketplace where third-party extensions or plugins come to the rescue.

Read More


Univention Corporate Server - An open source identity management system

  • ucs identity-management-system

Univention Corporate Server is an open source identity management system, an IT infrastructure and device management solution and an extensible platform with a store-like App Center that includes tested third party applications and further UCS components: This is what Univention combines in their main product Univention Corporate Server, a Debian GNU/Linux based enterprise distribution. This article provides you the overview of Univention Corporate Server, its feature and installation.

Read More


PySpark: Installation, RDDs, MLib, Broadcast and Accumulator

  • pyspark spark python rdd big-data

We knew that Apace Spark- the most famous parallel computing model or processing the massive data set is written in Scala programming language. The Apace foundation offered a tool to support the Python in Spark which was named PySpark. The PySpark allows us to use RDDs in Python programming language through a library called Py4j. This article provides basic introduction about PySpark, RDD, MLib, Broadcase and Accumulator.

Read More


How to install and setup Redis

  • redis install setup redis-cluster

Redis is an open source (BSD licensed), in-memory data structure store, used also as a database cache and message broker. It is written in ANSI C and works in all the operating systems. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. This article explains about how to install Redis.

Read More


RESTEasy Advanced guide - File Upload

  • resteasy rest-api file-upload java

RESTEasy is a JBoss project that provides various frameworks to help you build RESTful Web Services and RESTful Java applications. It is a fully certified and portable implementation of the JAX-RS 2.1 specification, a JCP specification that provides a Java API for RESTful Web Services over the HTTP protocol. It is licensed under the ASL 2.0.

Read More