Apache Cassandra was designed at Facebook and was open-sourced in July 2008. It is widely regarded as a strong choice when users need scalability and high availability without sacrificing performance. Apache Cassandra is a highly scalable, high-performance distributed database designed to handle very large amounts of data across many commodity servers with no single point of failure. Compared with other popular distributed databases such as Riak, HBase and Voldemort, Cassandra offers a robust and expressive interface for modeling and querying data. Cassandra is a NoSQL database engine, and compared with traditional relational databases it is well suited to storing and accessing largely unstructured data.
Some of the unique points surrounding Apache Cassandra are:
Features
The following are the top features of Apache Cassandra:
Apache Cassandra vs. Traditional Relational Database Management Systems
The following table highlights the differences between Apache Cassandra and traditional RDBMSs:
| Basis of Difference | Apache Cassandra | Traditional RDBMS |
|---|---|---|
| Data types | Deals with unstructured data and can handle data including sound, video and images. Being a NoSQL database, it can support huge volumes of data. | Deals with structured data (text, characters or numbers) in moderate amounts. |
| Schema | Highly scalable, with a flexible schema; often described as schema-less. | Fixed schema, generally with many limitations on how data is stored. |
| Table dimension | Row x Column Key x Column Value. The row is the unit of replication, the column is the unit of storage, and relationships are represented using collections. | Row x Column. A row is an individual record, a column represents an attribute of a relation, and relationships are expressed through foreign keys, joins, etc. |
| Storage | Handles large data volumes. The keyspace is the outermost storage unit, data transfer is fast, and data distribution is automatic. | Handles moderate data volumes. The database is the outermost storage unit, data transfer is slower, and data distribution, where possible, is manual. |
| Miscellaneous features | Decentralized deployments; writes can be accepted in many locations; scales horizontally. | Centralized deployments; writes go to one location; scales vertically. |
Cassandra Architecture
The primary objective of Cassandra is to handle large data workloads across multiple nodes with no single point of failure. Cassandra uses a peer-to-peer distributed architecture in which every node plays the same role, and data is distributed among all the nodes in a cluster.
Writing and Reading Data
Data written to a Cassandra node is first recorded in an on-disk commit log and then written to a memory-based structure called a memtable. When the memtable's size exceeds a configurable threshold, the data is written to an immutable file on disk called an SSTable. Buffering writes in memory in this way lets writes always be fully sequential operations, with many megabytes of disk I/O happening at once rather than one write at a time over a long period.
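The write path described above can be sketched as a toy Python model. This is not Cassandra's code, and it rests on several simplifying assumptions. The class name `WritePathSketch` is hypothetical. The flush threshold counts entries, whereas Cassandra's real thresholds are memory-based. Reads check the memtable and then the SSTables from newest to oldest, whereas Cassandra merges results by timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class WritePathSketch:
    """Toy Cassandra-style write path (illustrative only, not Cassandra's code)."""

    flush_threshold: int = 3  # hypothetical entry count; Cassandra uses memory size
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value):
        # 1. Durability first: append the write to the sequential commit log.
        self.commit_log.append((key, value))
        # 2. Apply it to the in-memory memtable (the latest value for a key wins).
        self.memtable[key] = value
        # 3. Flush once the memtable grows past the threshold.
        if len(self.memtable) > self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable run (a tuple).
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Everything logged so far is now on disk, so the log can be recycled.
        self.commit_log.clear()

    def read(self, key):
        # Simplification: memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

For example, with `flush_threshold=2`, writing keys `c`, `a`, `b` flushes one sorted SSTable. A later write to `a` sits only in the memtable and commit log until the next flush.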
Reading data from Cassandra involves a number of processes that can include various memory caches and other mechanisms designed to produce fast read response times. For a read request, Cassandra consults an in-memory data structure called a Bloom filter to check whether an SSTable is likely to contain the needed data. The Bloom filter can tell very quickly whether the file probably has the needed data or certainly does not have it.
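The "probably has it / certainly does not" behaviour follows from how a Bloom filter works, which a minimal Python sketch can show. This is a generic Bloom filter, not Cassandra's implementation. The `num_bits` and `num_hashes` parameters and the salted SHA-256 hashing are assumptions for illustration. Cassandra sizes its filters from the table's `bloom_filter_fp_chance` setting.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (illustrative; not Cassandra's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # hypothetical size
        self.num_hashes = num_hashes  # hypothetical number of hash functions
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive k bit positions by salting a single hash function with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False: the key was certainly never added, so the SSTable can be skipped.
        # True: the key was probably added; false positives are possible.
        return all(self.bits[pos] for pos in self._positions(key))
```

A read path would build one such filter per SSTable from its keys, skip any SSTable whose filter returns `False`, and only open the file when the answer is `True`.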
Data Distribution and Replication
Data Distribution
Cassandra automatically distributes and maintains data across a cluster, freeing developers and architects to direct their energies into value-creating application features.
Cassandra has an internal component called a partitioner, which determines how data is distributed across the nodes that make up a database cluster.
Cassandra also automatically maintains the balance of data across a cluster even when existing nodes are removed or new nodes are added to a system.
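The partitioner and the rebalancing behaviour above can be illustrated with a simplified token ring, i.e. consistent hashing. This sketch makes several assumptions:

- Each node gets one token, derived here by hashing its name. Real clusters assign tokens and typically use virtual nodes (`num_tokens`).
- Keys are hashed with MD5 as a stand-in. Cassandra's default `Murmur3Partitioner` uses MurmurHash3; the older `RandomPartitioner` uses MD5.
- `TokenRing` and `token` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash: the first 8 bytes of MD5 as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class TokenRing:
    """Simplified token ring: one token per node, no virtual nodes."""

    def __init__(self, nodes):
        self._ring = []  # sorted list of (token, node_name) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        bisect.insort(self._ring, (token(node), node))

    def remove_node(self, node):
        self._ring.remove((token(node), node))

    def owner(self, key):
        # The owner is the first node clockwise whose token is >= the key's
        # token, wrapping around to the start of the ring.
        tokens = [t for t, _ in self._ring]
        idx = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]
```

The key property is that adding a node only takes over part of one neighbour's range. Keys that change owner all move to the new node, and removing that node sends them back. This is why a cluster can grow or shrink without reshuffling all of its data.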
Data Replication
Cassandra features a replication mechanism that is very easy to configure and administer. A Cassandra cluster can have one or more keyspaces. Replication is configured at the keyspace level, allowing different keyspaces to have different replication models. Cassandra is able to replicate data to multiple nodes in a cluster, which helps ensure reliability, continuous availability, and fast I/O operations. Cassandra automatically maintains that replication even when nodes are removed, added, or fail.
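A keyspace's replication factor decides how many nodes hold each row. With Cassandra's `SimpleStrategy`, the first replica goes to the node that owns the key's token and the remaining replicas go to the next nodes clockwise around the ring. The sketch below models only that placement rule. The `replicas` function, the `(token, node)` ring layout and the MD5 `token` helper are assumptions for illustration, not Cassandra's code.

```python
import bisect
import hashlib


def token(key):
    # Stand-in hash (MD5), as an unsigned 64-bit token.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def replicas(ring, key, replication_factor):
    """SimpleStrategy-style replica placement (simplified sketch).

    ring: list of (token, node) pairs sorted by token (hypothetical layout).
    """
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    tokens = [t for t, _ in ring]
    # First replica: the node owning the key's token (wrapping around).
    start = bisect.bisect_left(tokens, token(key))
    # Remaining replicas: the next nodes clockwise around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Usage: four nodes with evenly spaced tokens, replication factor 3.
ring = [(i * 2**62, f"node-{i}") for i in range(4)]
print(replicas(ring, "user:42", 3))
```

Because each replica is just the next node on the ring, losing one node still leaves the other copies reachable. Removing or adding nodes shifts placement only for the affected token ranges.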
Multi-Data Center and Cloud Support
Cassandra's replication supports multiple data centers and cloud availability zones. Users can easily set up replication so that data is copied across geographically distributed data centers. They can read from and write to any data center they choose, and the data is automatically synchronized across all locations.
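Multi-data-center replication is configured per keyspace with per-data-center replication factors. For example, with a hypothetical keyspace name: CREATE KEYSPACE app WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};

The Python sketch below models only the core placement idea of `NetworkTopologyStrategy`. It walks clockwise from the key's token and takes nodes from each data center until that data center's replication factor is met. It ignores rack awareness, which the real strategy also applies. The ring layout and the function name are assumptions.

```python
import bisect
import hashlib


def token(key):
    # Stand-in hash (MD5), as an unsigned 64-bit token.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def replicas_per_dc(ring, key, dc_replication):
    """Simplified NetworkTopologyStrategy-style placement (racks ignored).

    ring: sorted list of (token, node, datacenter) tuples (hypothetical layout).
    dc_replication: per-datacenter replication factors, e.g. {"dc1": 3, "dc2": 2}.
    If a datacenter has fewer nodes than its factor, fewer replicas are returned.
    """
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    chosen = {dc: [] for dc in dc_replication}
    # Walk the whole ring once, clockwise from the key's position.
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if dc in chosen and len(chosen[dc]) < dc_replication[dc]:
            chosen[dc].append(node)
    return chosen


# Usage: eight nodes alternating between two data centers.
ring = [(i * 2**60, f"node-{i}", "dc1" if i % 2 == 0 else "dc2") for i in range(8)]
print(replicas_per_dc(ring, "user:42", {"dc1": 3, "dc2": 2}))
```

Each data center therefore holds its own complete set of replicas. This is what allows clients to read and write locally in either data center.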
Cassandra Query Language (CQL)
The Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. CQL is purposefully similar to Structured Query Language (SQL) used in relational databases like MySQL and Postgres.
The most basic way to interact with Cassandra is the CQL shell, cqlsh, a command-line shell for running Cassandra Query Language (CQL) statements. With cqlsh, the user can perform many operations, including defining a schema, inserting and altering data, and executing queries. CQL adds an abstraction layer that hides the implementation details of Cassandra's underlying storage structure and provides native syntax for collections and other common encodings.
Common ways to access CQL are:
CQL Schema:
Creating Table:
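CREATE (TABLE | COLUMNFAMILY) <tablename> (<column-definition>, <column-definition>, ...)
[WITH <option> AND <option>];
Each column definition is a column name followed by its data type, and the table must declare a PRIMARY KEY. The WITH clause is optional.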
Inserting Data into Table
INSERT INTO KeyspaceName.TableName (ColumnName1, ColumnName2, ColumnName3, ...) VALUES (Column1Value, Column2Value, Column3Value, ...);
Updating Data in Table
UPDATE KeyspaceName.TableName
SET ColumnName1 = NewColumn1Value,
ColumnName2 = NewColumn2Value,
ColumnName3 = NewColumn3Value
WHERE ColumnName = ColumnValue;
The WHERE clause must identify the row by its full primary key.
Deleting Data from Table
DELETE FROM KeyspaceName.TableName WHERE ColumnName1 = ColumnValue;
Selecting Data from Table
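SELECT ColumnNames FROM KeyspaceName.TableName WHERE ColumnName1 = Column1Value AND
ColumnName2 = Column2Value;
WHERE conditions are normally expected to restrict primary key columns. Filtering on other columns requires a secondary index or ALLOW FILTERING.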
CQL prevents the following:
Conclusion
Cassandra is a masterless, replicated distributed database: there is no master and no slave, because every node is a peer. It is designed to be always on and to perform well at scale, and these features and characteristics make it a fantastic solution to the big data challenge.