Storm is a simple and powerful toolkit for BoltDB. Basically, Storm provides indexes, a wide range of methods to store and fetch data, an advanced query system, and much more. In addition to the examples below, see also the examples in the GoDoc.
https://github.com/asdine/storm
Tags | storm boltdb database toolkit bucket query-engine indexes utility go-library |
Implementation | Go |
License | MIT |
Platform | Windows MacOS Linux |
Queries are mapped to operator trees in the spirit of the query plans of relational database systems. These are in turn mapped to Storm workers. (There is a parallel implementation of each operator, so in general an operator is processed by multiple workers.) Some operations of relational algebra, such as selections and projections, are quite simple, and assigning them to separate workers is inefficient. Rather than requiring the predecessor operator to send its output over the network to the workers implementing these simple operations, the simple operations can be integrated into the predecessor operators and postprocess the output there. This is typically also done in classical relational database systems, but in a distributed environment the benefits are even greater. In the Squall API, query plans are built bottom-up from operators (called components or super-operators) such as data source scans and joins; these components can then be extended by postprocessing operators such as projections. Here is an example of a fully running query with window semantics.
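Separately from Squall's own example, the idea of folding simple operators into their predecessor can be sketched in Go. This is a minimal illustration and not Squall's API: the names `Tuple`, `PostOp`, `Component`, `Select`, `Project`, `Add` and `Emit` are all hypothetical, and the "scan" simply iterates over rows held in memory. The point is that selection and projection run inside the component that produces the tuples, so only the surviving, narrowed tuples would ever need to be sent on.

```go
package main

import "fmt"

// Tuple is one row flowing between operators (hypothetical representation).
type Tuple []string

// PostOp is a lightweight operator fused into its predecessor.
// It returns the transformed tuple and whether the tuple survives.
type PostOp func(Tuple) (Tuple, bool)

// Select keeps only the tuples that satisfy pred.
func Select(pred func(Tuple) bool) PostOp {
	return func(t Tuple) (Tuple, bool) { return t, pred(t) }
}

// Project keeps only the listed column positions, in the given order.
func Project(cols ...int) PostOp {
	return func(t Tuple) (Tuple, bool) {
		out := make(Tuple, 0, len(cols))
		for _, c := range cols {
			out = append(out, t[c])
		}
		return out, true
	}
}

// Component stands in for a heavyweight operator (here a scan over
// in-memory rows) together with its chain of fused postprocessing operators.
type Component struct {
	rows []Tuple
	post []PostOp
}

// Add appends a postprocessing operator and returns the component for chaining.
func (c *Component) Add(op PostOp) *Component {
	c.post = append(c.post, op)
	return c
}

// Emit runs the scan and applies every fused operator to each tuple
// before the tuple leaves the component.
func (c *Component) Emit() []Tuple {
	var out []Tuple
	for _, t := range c.rows {
		keep := true
		for _, op := range c.post {
			if t, keep = op(t); !keep {
				break
			}
		}
		if keep {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	scan := &Component{rows: []Tuple{
		{"1", "alice", "DE"},
		{"2", "bob", "US"},
		{"3", "carol", "DE"},
	}}
	// Selection on column 2, then projection onto column 1,
	// both executed inside the scan component.
	scan.Add(Select(func(t Tuple) bool { return t[2] == "DE" })).Add(Project(1))
	fmt.Println(scan.Emit()) // prints [[alice] [carol]]
}
```

In a distributed setting the same chain would run on each parallel instance of the component, which is where the saving in network traffic comes from.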
BoltHold is a simple querying and indexing layer on top of a Bolt DB instance. For a similar library built on Badger, see BadgerHold. The goal is to create a simple, higher-level interface on top of Bolt DB that simplifies dealing with Go types and finding data, but exposes the underlying Bolt DB for customizing as you wish. By default the encoding used is Gob, so feel free to implement the GobEncoder/GobDecoder interfaces for faster serialization. Alternatively, you can use any serialization you want by supplying encode / decode funcs to the Options struct on Open.
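As a rough sketch of what a pair of encode / decode funcs can look like, the following uses only the standard library's `encoding/gob`. The function shapes (`func(interface{}) ([]byte, error)` and `func([]byte, interface{}) error`) and the `Item` type are assumptions made for illustration; check the BoltHold documentation for the exact field names and types expected by the Options struct.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Item is a hypothetical record type used only for this example.
type Item struct {
	Name  string
	Price int
}

// encode serializes value with Gob and returns the raw bytes.
func encode(value interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(value); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode fills value (which must be a pointer) from Gob-encoded bytes.
func decode(data []byte, value interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(value)
}

func main() {
	data, err := encode(Item{Name: "widget", Price: 42})
	if err != nil {
		panic(err)
	}

	var got Item
	if err := decode(data, &got); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", got) // prints {Name:widget Price:42}
}
```

Swapping in another format, such as JSON, would only mean replacing the bodies of these two functions, as long as both sides of the round trip agree.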
boltdb bucket query-criteria nosql
Storm is a distributed real time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real time processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
real-time-computation analytics real-time stream-processing distributed-rpc data-processing
Perfect Storm started life as a code generation utility. It now includes Model Components, Rule Engine details and a class loader. It requires .NET 4.0 to be installed.
code-generation-tool codegen mdd xslt
Streamparse lets you run Python code against real-time streams of data via Apache Storm. With streamparse you can create Storm bolts and spouts in Python without having to write a single line of Java. It also provides handy CLI utilities for managing Storm clusters and projects. The Storm/streamparse combo can be viewed as a more robust alternative to Python worker-and-queue systems, as might be built atop frameworks like Celery and RQ. It offers a way to do "real-time map/reduce style computation" against live streams of data. It can also be a powerful way to scale long-running, highly parallel Python processes in production.
apache-storm storm
Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding. The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in "batch mode" (using Scalding), in "realtime mode" (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.
StormCrawler is an open source collection of resources for building low-latency, scalable web crawlers on Apache Storm. StormCrawler is a library and collection of resources that developers can leverage to build their own crawlers. The good news is that doing so can be pretty straightforward. Often, all you'll have to do is declare StormCrawler as a Maven dependency, write your own Topology class (tip: you can extend ConfigurableTopology), reuse the components provided by the project and maybe write a couple of custom ones for your own secret sauce.
web-crawler apache-storm distributed crawler web-scraping
* Written in [Go](http://golang.org)
* Easy to get running (3 or 4 commands, below)
* RESTful API
  * or a REPL if you prefer
* Built-in query editor and visualizer
* Multiple query languages:
  * JavaScript, with a [Gremlin](http://gremlindocs.com/)-inspired\* graph object.
  * (simplified) [MQL](https://developers.google.com/freebase/v1/mql-overview), for Freebase fans
* Plays well with multiple backend stores:
  * [LevelDB](http://code.google.com/p/leveldb/)
  * [Bolt](http://github.com/boltdb/bolt)
When testing using a database with rollback after each test, failing tests are very hard to resolve. Data Storm is a simple DB viewer directly launchable from within your test code to enable you to inspect the current state of the database.
Xapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
searchengine search-engine full-text-search lucene-alternative
STORM is a free and open source tool for testing web services. It is written mostly in F#. (I love this language!) STORM allows you to 1. Test web services written using any technology (.NET, Java, etc.) 2. Dynamically invoke web service methods even those that h...
web-services tools soap storm wcf
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN. It is still a work in progress. To run the tests, you execute the following command.
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+ while using Apache Avro as the data serialization format. Take a look at the Kafka Streams code examples at https://github.com/confluentinc/examples.
apache-kafka kafka apache-storm storm spark apache-spark integration avro apache-avro
ScalaStorm provides a Scala DSL for Nathan Marz's Storm real-time computation system. It also provides a framework for Scala and SBT development of Storm topologies. There is a sample Trident topology, in src/storm/scala/examples/trident. It features an experimental new DSL for doing functional Trident topologies (see FunctionalTrident.scala). I am currently soliciting feedback for this feature, so drop me a line if you like it.
Web server with built-in support for HTTP/2, Lua, Markdown, Pongo2, HyperApp, Amber, Sass(SCSS), GCSS, JSX, BoltDB, Redis, PostgreSQL, MariaDB/MySQL, rate limiting, graceful shutdown, plugins, users and permissions. Uses no external libraries, only pure Go.
http2 scss pongo2 amber application-server redis boltdb mariadb postgresql gcss web-server
heroprotocol is a reference Python library and standalone tool to decode Heroes of the Storm replay files into Python data structures. heroprotocol can be used as a base-build-specific library to decode binary blobs, or it can be run as a standalone tool to pretty print information from supported replay files.
Palo is an MPP-based interactive SQL data warehouse for reporting and analysis. Palo mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides both high-concurrency, low-latency point queries and high-throughput ad-hoc analytical queries. It supports both batch data loading and near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying and using) and the ability to meet many data serving requirements in a single system. At Baidu, the largest Chinese search engine, we run a two-tiered data warehousing system for data processing, reporting and analysis. Similar to the lambda architecture, the whole data warehouse comprises data processing and data serving. Data processing does the heavy lifting of big data: cleaning data, merging and transforming it, analyzing it and preparing it for use by end user queries; data serving is designed to serve queries against that data for different use cases. Currently, data processing includes batch and stream data processing technologies such as Hadoop, Spark and Storm; Palo is a SQL data warehouse for serving online and interactive data reporting and analysis queries.
data-warehouse
Lux is an open source XML search engine using Lucene/Solr and the Saxon XQuery/XSLT processor. Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr, and leverage its application framework in order to deliver a REST service, application server, and supporting tools.
search-engine searchengine xml-searchengine xml-search xml-database
CodernityDB is a pure Python (no third-party dependencies), fast, multi-platform, schema-less NoSQL database. It has optional support for an HTTP server version (CodernityDB-HTTP), and also a Python client library (CodernityDB-PyClient) that aims to be 100% compatible with the embedded version. It is an advanced key-value database, with multiple key-value indexes in the same engine. It supports multiple indexes, custom storage, and sharding.
database embedded-database key-value-store nosql python-database