Compress files faster using Snappy




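Snappy is a fast compression/decompression library from Google. It does not aim for the smallest possible output; instead it trades some compression ratio for much faster compression. Many open source products use Snappy.

To use it from Java, add the snappy-java library as a Maven dependency: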

<dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>1.1.7.2</version>
</dependency>
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.xerial.snappy.Snappy;

public class SnappyTest {

    // Compresses one file with Snappy and prints the elapsed time in milliseconds.
    public static void testCompress(String inFilePath, String outFilePath) {
        try {
            long start = System.currentTimeMillis();
            Path inPath = Paths.get(inFilePath);
            Path outPath = Paths.get(outFilePath);

            byte[] compressed = Snappy.compress(Files.readAllBytes(inPath));
            // Default options create the file, or truncate it if it already exists.
            Files.write(outPath, compressed);

            System.out.println("Snappy time " + (System.currentTimeMillis() - start));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Compresses the same file into a single-entry Zip archive using the JDK.
    public static void testCompressZip(String inFilePath, String outFilePath) {
        long start = System.currentTimeMillis();
        Path inPath = Paths.get(inFilePath);

        // try-with-resources closes the stream even if writing fails.
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(outFilePath))) {
            zos.putNextEntry(new ZipEntry(inPath.getFileName().toString()));

            byte[] data = Files.readAllBytes(inPath);
            zos.write(data, 0, data.length);
            zos.closeEntry();
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        System.out.println("Zip time " + (System.currentTimeMillis() - start));
    }
}

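import java.io.File;

import org.xerial.snappy.Snappy;

// Driver class (the class name App is an assumption).
public class App {

    public static void main(String[] args) {
        String inDir = "<input-directory>";
        String outDir = "<output-directory>";

        File inDirFile = new File(inDir);
        String[] inputFiles = inDirFile.list();
        if (inputFiles == null) {
            System.err.println("Not a readable directory: " + inDir);
            return;
        }

        int index = 0; // sequence number used in the output file names
        for (String inputFile : inputFiles) {
            String inFile = inDir + File.separator + inputFile;
            String outSFile = outDir + File.separator + "output-s" + index + ".out";
            String outzFile = outDir + File.separator + "output-z" + index + ".zip";

            System.out.println("File: " + inputFile);
            SnappyTest.testCompress(inFile, outSFile);
            SnappyTest.testCompressZip(inFile, outzFile);
            index++;
        }

        Snappy.cleanUp();
    }
}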

Initially I tested with a couple of files and found that Snappy took more time and produced larger output than Zip. Then I dropped a lot of files (close to 100) into the input directory and ran the above code again; this time Snappy outperformed the JDK compression.
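Timings taken on the first few calls can be skewed by one-time start-up costs, so one way to make the comparison fairer is to run a few untimed warm-up calls before measuring. The sketch below is a minimal, hypothetical harness: the class CompressionTimer, the Compressor interface and the method names are illustrative and not part of snappy-java. It uses the JDK's Deflate so that it runs without extra dependencies; a method reference such as Snappy::compress should fit the same interface, but that is an assumption and is not exercised here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;

public class CompressionTimer {

    /** Hypothetical interface: any byte[] to byte[] compressor that may throw IOException. */
    @FunctionalInterface
    public interface Compressor {
        byte[] compress(byte[] input) throws IOException;
    }

    /** JDK-only Deflate compressor, used here as a stand-in for the library under test. */
    public static byte[] deflate(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Runs untimed warm-up calls, then returns the average nanoseconds per timed call. */
    public static long averageNanos(Compressor compressor, byte[] input, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        try {
            for (int i = 0; i < warmups; i++) {
                compressor.compress(input); // result discarded; only warms up the code path
            }
            long total = 0;
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                compressor.compress(input);
                total += System.nanoTime() - start;
            }
            return total / runs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Synthetic, highly compressible sample data (an assumption for the demo).
        byte[] sample = new byte[1_000_000];
        Arrays.fill(sample, (byte) 'a');

        long cold = averageNanos(CompressionTimer::deflate, sample, 0, 1);
        long warm = averageNanos(CompressionTimer::deflate, sample, 5, 20);
        System.out.println("First call (ns): " + cold);
        System.out.println("Average after warm-up (ns): " + warm);
    }
}
```

Comparing the first-call figure with the warmed-up average gives a rough idea of how much of a small test run is start-up overhead rather than compression work.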

Snappy is a good library when we have lots of data to compress and need faster compression at the cost of a slightly larger compressed size.
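To put a number on that size compromise, the output files written by the test can be totalled per format. The following is a minimal sketch, assuming the output-s*.out and output-z*.zip naming used in the main method above; the class OutputSizeReport and its method are hypothetical helpers, not part of any library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputSizeReport {

    /** Sums the sizes, in bytes, of the regular files in dir whose names match the glob. */
    public static long totalBytes(Path dir, String glob) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Output directory passed as the first argument; defaults to the current directory.
        Path outDir = Paths.get(args.length > 0 ? args[0] : ".");

        long snappyBytes = totalBytes(outDir, "output-s*.out");
        long zipBytes = totalBytes(outDir, "output-z*.zip");

        System.out.println("Snappy total bytes: " + snappyBytes);
        System.out.println("Zip total bytes:    " + zipBytes);
    }
}
```

Running it against the output directory after the test shows how much larger the Snappy files are in total than the Zip files for a given data set.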

Reference:
https://google.github.io/snappy/

 

