fscrawler - Elasticsearch File System Crawler (FS Crawler)

FS Crawler offers a simple way to index binary files into Elasticsearch.
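At its core, a file-system crawler walks a directory tree and emits one document per file, which can then be sent to Elasticsearch in the bulk (newline-delimited JSON) format. The sketch below illustrates that step under stated assumptions and is not FSCrawler's implementation: the function names and the file.* / path.* field names are hypothetical, and the text extraction FSCrawler performs with Apache Tika is left out.

```python
import json
import os
from datetime import datetime, timezone


def bulk_lines(root, index="docs"):
    """Yield bulk-API lines: one action line plus one document line per file.

    Hypothetical sketch: field names are illustrative, not FSCrawler's mapping,
    and no text is extracted from file contents.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            st = os.stat(full_path)
            doc = {
                "file": {
                    "filename": name,
                    "filesize": st.st_size,
                    "last_modified": datetime.fromtimestamp(
                        st.st_mtime, tz=timezone.utc
                    ).isoformat(),
                },
                "path": {"real": os.path.abspath(full_path)},
            }
            yield json.dumps({"index": {"_index": index}})
            yield json.dumps(doc)


def bulk_body(root, index="docs"):
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(bulk_lines(root, index)) + "\n"


if __name__ == "__main__":
    print(bulk_body("."), end="")
```

The resulting string could be POSTed to the _bulk endpoint with the Content-Type application/x-ndjson.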

https://fscrawler.readthedocs.io/
https://github.com/dadoonet/fscrawler

Dependencies:

fr.pilato.elasticsearch.crawler:fscrawler-framework:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-test-framework:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-test-documents:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-core:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-settings:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-cli:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-beans:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-crawler:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-crawler-abstract:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-crawler-fs:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-crawler-ssh:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-tika:2.6-SNAPSHOT
fr.pilato.elasticsearch.crawler:fscrawler-rest:2.6-SNAPSHOT
org.elasticsearch.client:elasticsearch-rest-client:6.3.2
org.elasticsearch.client:elasticsearch-rest-high-level-client:6.3.2
com.fasterxml.jackson.core:jackson-core:2.9.6
com.fasterxml.jackson.core:jackson-databind:2.9.6
com.fasterxml.jackson.core:jackson-annotations:2.9.6
com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.9.6
com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.6
org.apache.tika:tika-parsers:1.18
org.xerial:sqlite-jdbc:3.23.1
com.levigo.jbig2:levigo-jbig2-imageio:2.0
com.github.jai-imageio:jai-imageio-core:1.4.0
com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0
org.apache.tika:tika-langdetect:1.18
com.jcraft:jsch:0.1.54
com.beust:jcommander:1.72
org.glassfish.jersey.containers:jersey-container-grizzly2-http:2.27
org.glassfish.jersey.media:jersey-media-json-jackson:2.27
org.glassfish.jersey.media:jersey-media-multipart:2.27
org.glassfish.jersey.inject:jersey-hk2:2.27
org.apache.logging.log4j:log4j-core:2.11.1
org.apache.logging.log4j:log4j-1.2-api:2.11.1
org.apache.logging.log4j:log4j-slf4j-impl:2.11.1
org.apache.logging.log4j:log4j-jcl:2.11.1
org.apache.logging.log4j:log4j-jul:2.11.1
org.fusesource.jansi:jansi:1.17.1
commons-io:commons-io:2.5
org.apache.lucene:lucene-test-framework:7.3.1
org.hamcrest:hamcrest-all:1.3
junit:junit:4.12
com.carrotsearch.randomizedtesting:randomizedtesting-runner:2.6.3
fr.pilato.elasticsearch.testcontainers:testcontainers-elasticsearch:0.1
org.bouncycastle:bcprov-jdk15on:1.60

Related Projects

elasticsearch-mapper-attachments - Mapper Attachments Type plugin for Elasticsearch

  •    Java

If you have a question about the plugin, please use discuss.elastic.co. If you want to report a bug, please use the elasticsearch repository. The mapper attachments plugin lets Elasticsearch index file attachments in over a thousand formats (such as PPT, XLS, PDF) using Apache Tika, the Apache text extraction library.

diskover - File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch

  •    Python

diskover is an open source file system crawler and disk space usage tool that uses Elasticsearch to index and manage data across heterogeneous storage systems. With diskover, users can search and organize files more effectively, and system administrators can manage storage infrastructure, provision storage efficiently, monitor and report on storage use, and make informed decisions about new infrastructure purchases. As the amount of file data generated by businesses continues to grow, so does the strain on expensive storage infrastructure, on users and system administrators, and on IT budgets.
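Disk space analytics of this kind start from aggregating file sizes per directory. Here is a minimal standard-library sketch of that aggregation step, assuming a hypothetical helper name and totals keyed by top-level subdirectory; it is not diskover's code, and diskover additionally indexes its results into Elasticsearch.

```python
import os
from collections import Counter


def usage_by_top_dir(root):
    """Return {top-level subdirectory: total bytes of files beneath it}.

    Files directly inside root are counted under ".". Symlinked directories are
    not descended (os.walk's default), and symlinks are counted by their own
    size via lstat.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                totals[top] += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return dict(totals)


if __name__ == "__main__":
    # Largest directories first.
    for top, size in sorted(usage_by_top_dir(".").items(), key=lambda kv: -kv[1]):
        print(f"{size:>12}  {top}")
```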

Apache Tika - A content analysis toolkit

  •    Java

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
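One ingredient of file-type detection is matching the leading bytes of a file against known signatures ("magic numbers"). The following is a deliberately simplified sketch of that idea: the signature table and function names are assumptions for illustration, and Tika's own detectors combine many more signatures with file-name and container inspection.

```python
# Minimal signature table: (leading bytes, MIME type). Illustrative only.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    # ZIP is also the container for .docx, .xlsx and .pptx; telling those apart
    # requires looking inside the archive, which this sketch does not do.
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(head: bytes) -> str:
    """Return the MIME type whose signature prefixes `head`, else a generic type."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return "application/octet-stream"


def detect_file(path: str) -> str:
    # Eight bytes are enough for every signature in the table above.
    with open(path, "rb") as f:
        return sniff_mime(f.read(8))
```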

Tika Converter

A converter that automates MS Office to perform different kinds of conversions, particularly text extraction. It aims to integrate with Apache Tika and Jackrabbit for document searching and display.


yomu - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)

  •    Ruby

Yomu is a library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit. For the complete list of supported formats, please visit the Apache Tika Supported Document Formats page.

Elastalert - Easy & Flexible Alerting With ElasticSearch

  •    Python

ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch. ElastAlert works with all versions of Elasticsearch. If you have data being written into Elasticsearch in near real time and want to be alerted when that data matches certain patterns, ElastAlert is the tool for you. If you can see it in Kibana, ElastAlert can alert on it.
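In its simplest form, the "spike" pattern mentioned above comes down to comparing event counts in two consecutive time windows. This is a hypothetical sketch of that idea only: the parameter names loosely echo ElastAlert's spike rule options (spike_height, spike_type), but the functions are an independent illustration, not ElastAlert's implementation.

```python
def window_counts(timestamps, now, timeframe):
    """Count events in the reference window [now-2T, now-T) and current window [now-T, now)."""
    reference = sum(1 for t in timestamps if now - 2 * timeframe <= t < now - timeframe)
    current = sum(1 for t in timestamps if now - timeframe <= t < now)
    return reference, current


def is_spike(reference, current, spike_height, spike_type="both", min_reference=1):
    """Return True if the current count is spike_height times above (or below) the reference."""
    if reference < min_reference:
        return False  # not enough history to form a meaningful ratio
    went_up = current >= reference * spike_height
    went_down = current * spike_height <= reference
    if spike_type == "up":
        return went_up
    if spike_type == "down":
        return went_down
    return went_up or went_down
```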

Mirage - An interactive query explorer for Elasticsearch

  •    Typescript

Mirage is a modern, open-source web based query explorer for Elasticsearch. It offers a blocks based GUI for composing Elasticsearch queries and comes with an on-the-fly transformer to show the corresponding JSON query API of Elasticsearch.
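A block-based builder with a live JSON view can be pictured as a function that turns a list of blocks into an Elasticsearch bool query. The tuple shape used for a block and the function name are hypothetical; only the bool query structure (must, filter, should and must_not clauses) follows the Elasticsearch query DSL, and this is not Mirage's code.

```python
import json

BOOL_CLAUSES = {"must", "filter", "should", "must_not"}


def blocks_to_query(blocks):
    """Translate (clause, query_type, field, value) blocks into a bool query dict.

    The block shape is a hypothetical stand-in for GUI blocks.
    """
    bool_query = {}
    for clause, query_type, field, value in blocks:
        if clause not in BOOL_CLAUSES:
            raise ValueError(f"unknown bool clause: {clause!r}")
        # Blocks with the same clause are collected into one list, in order.
        bool_query.setdefault(clause, []).append({query_type: {field: value}})
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    blocks = [
        ("must", "match", "title", "crawler"),
        ("filter", "range", "year", {"gte": 2015}),
    ]
    print(json.dumps(blocks_to_query(blocks), indent=2))
```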

Bigdesk - Live charts and statistics for Elasticsearch cluster.

  •    Javascript

Bigdesk generates live charts and statistics for an Elasticsearch cluster, making it very easy to see how your cluster is doing. It pulls data from the Elasticsearch REST API and turns it into charts.
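Several node statistics exposed by the Elasticsearch REST API, such as indexing totals, are cumulative counters, so a live chart typically plots the difference between two polls divided by the elapsed time. A minimal sketch of that conversion, assuming hypothetical helper names and the response shape of the nodes stats API (GET _nodes/stats); it is not Bigdesk's code.

```python
def total_index_ops(node_stats):
    """Sum indices.indexing.index_total over all nodes in a nodes-stats response.

    Assumes the shape {"nodes": {node_id: {"indices": {"indexing": {"index_total": N}}}}}.
    """
    return sum(
        node["indices"]["indexing"]["index_total"]
        for node in node_stats["nodes"].values()
    )


def per_second_rates(samples):
    """Turn (timestamp_seconds, cumulative_counter) samples into (timestamp, rate) points."""
    points = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            continue  # skip clock anomalies and counter resets (e.g. a node restart)
        points.append((t1, (c1 - c0) / elapsed))
    return points
```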

Jest - ElasticSearch Java Rest Client

  •    Java

Jest is a Java HTTP REST client for Elasticsearch. Elasticsearch already has a Java API, which Elasticsearch also uses internally, but Jest fills a gap: it is the missing client for the Elasticsearch HTTP REST interface.

elasticsearch-dsl-py - High level Python client for Elasticsearch

  •    Python

Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py). It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure. It exposes the whole range of the DSL from Python, either directly using defined classes or through queryset-like expressions.

elasticsearch-py - Official Python low-level client for Elasticsearch.

  •    Python

Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this, it tries to be opinion-free and very extensible. For a higher-level client library with a more limited scope, have a look at elasticsearch-dsl, a more Pythonic library built on top of elasticsearch-py.
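What "low-level" means here is that request bodies and responses stay plain JSON-compatible dictionaries. A standard-library sketch of building one raw REST request makes this concrete; the helper names, the localhost URL and the "docs" index are assumptions for illustration, and this is not the elasticsearch-py API.

```python
import json
import urllib.request


def build_request(base_url, method, path, body=None):
    """Build, but do not send, a raw Elasticsearch REST request.

    Hypothetical helper: the body is a plain dict serialized as JSON, in the
    unopinionated spirit of a low-level client.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and return the parsed JSON response as plain dicts/lists."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes a cluster listening on localhost:9200 and an index named "docs".
    req = build_request("http://localhost:9200", "POST", "/docs/_search",
                        {"query": {"match": {"title": "crawler"}}})
    print(send(req))
```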

elasticsearch-gui - An angularJS client for elasticsearch as a plugin

  •    Javascript

Welcome to the Gui plugin for elasticsearch. Using this plugin you can explore your elasticsearch index. The plugin gives you a few different ways to start exploring. One way is to search the repository as you would on a website: you can enter keywords, do advanced searches, and use facets. Another way to explore the index focuses on learning the structure of the query that is actually executed: you can choose the items to include in the query, such as fields, facets, and highlighting, and limit the indexes and types. Finally, you can show some of the data in a graph. Because the plugin runs mainly as JavaScript in the browser, it can connect to a remote elasticsearch instance; to make this possible, elasticsearch returns a specific HTTP header.

kopf - Web admin interface for elasticsearch

  •    Javascript

kopf is a simple web administration tool for elasticsearch written in JavaScript + AngularJS + jQuery + Twitter bootstrap. It offers an easy way of performing common tasks on an elasticsearch cluster. Not every single API is covered by this plugin, but it does offer a REST client which allows you to explore the full potential of the ElasticSearch API.

Elasticquent - Maps Laravel Eloquent models to Elasticsearch types

  •    PHP

Elasticquent makes working with Elasticsearch and Eloquent models easier by mapping them to Elasticsearch types. You can use the default settings or define how Elasticsearch should index and search your Eloquent models right in the model. Elasticquent uses the official Elasticsearch PHP API. To get started, you should have a basic knowledge of how Elasticsearch works (indexes, types, mappings, etc.).

dejavu - The Missing Web UI for Elasticsearch

  •    Javascript

dejavu is the missing Web UI for Elasticsearch. Its goal is to provide a modern Web UI (no page reloads, infinite scroll, filtered views, real-time updates) with 100% client-side rendering. It is available today as a hosted app, a Chrome extension, and a Docker image.
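One common way for a UI to implement infinite scroll over Elasticsearch results is the search_after parameter: each new page repeats the query and sort, passing the sort values of the last hit already shown. The sketch below builds such request bodies; the default sort fields ("timestamp", "id") and the helper name are hypothetical, and this is not necessarily how dejavu pages its results.

```python
# Hypothetical sort fields; the last one acts as a unique tiebreaker.
DEFAULT_SORT = [{"timestamp": "desc"}, {"id": "asc"}]


def next_page_body(query, page_size, last_hit=None, sort=None):
    """Build a search body for the next page of results using search_after."""
    body = {
        "size": page_size,
        "query": query,
        "sort": list(sort if sort is not None else DEFAULT_SORT),
    }
    if last_hit is not None:
        # Each hit in a sorted response carries its sort values under "sort".
        body["search_after"] = list(last_hit["sort"])
    return body
```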

Praeco - Elasticsearch alerting made simple

  •    Vue

Praeco is an alerting tool for Elasticsearch: a GUI for ElastAlert, using the ElastAlert API. It lets you interactively build alerts for your Elasticsearch data using a query builder, and helps you preview and test your alerts against historical data.

Elastic HQ - Sleek, intuitive, and powerful ElasticSearch Management and Monitoring

  •    Javascript

ElasticHQ provides a web interface for monitoring, managing, and querying Elasticsearch instances and clusters. It supports real-time cluster monitoring; management of indices, mappings, shards, aliases, and nodes; and full cluster management. It works in your web browser, allowing you to manage and monitor your Elasticsearch clusters from anywhere at any time.

elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

  •    Java

Rank Elasticsearch results using tree-based (LambdaMART, Random Forest, MART) and linear models. Models are trained using the scores of Elasticsearch queries as features. You train offline using tooling such as xgboost or ranklib. You then POST your model to Elasticsearch in a specific text format (the custom "ranklib" language). You apply a model using this plugin's ltr query. See the blog post and the full demo (training and searching). Models are stored using an Elasticsearch script plugin. Tree-based models can be large, so we recommend increasing the script.max_size_in_bytes setting. Don't worry: just because tree-based models are verbose doesn't necessarily mean they'll be slow.
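For the linear case, scoring is a weighted sum of feature values, where each feature is the score an Elasticsearch query gave the document. A minimal sketch of that arithmetic and of re-ranking by it follows; the feature values, weights, and helper names are hypothetical, and in practice the plugin evaluates stored models inside Elasticsearch through its query rather than in client code.

```python
def linear_score(features, weights, bias=0.0):
    """Weighted sum of feature values plus a bias term."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * f for w, f in zip(weights, features))


def rerank(hits, weights, bias=0.0):
    """Sort (doc_id, feature_vector) pairs by linear model score, best first.

    Each feature value stands for the score one Elasticsearch query gave the
    document (for example a title match and a body match); values are hypothetical.
    """
    return sorted(hits, key=lambda hit: linear_score(hit[1], weights, bias), reverse=True)


if __name__ == "__main__":
    hits = [("a", [1.0, 0.0]), ("b", [0.2, 3.0]), ("c", [0.5, 0.5])]
    print([doc_id for doc_id, _ in rerank(hits, weights=[2.0, 1.0])])  # ['b', 'a', 'c']
```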




