
fscrawler - Elasticsearch File System Crawler (FS Crawler)

  •    Java

FS Crawler offers a simple way to index binary files into Elasticsearch.

tika-similarity - Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features

  •    Python

This project demonstrates using the Tika-Python package (Python port of Apache Tika) to compute file similarity based on metadata features. The script iterates over all files in the current directory, or over files given on the command line, derives their metadata features, and then computes the union of all features. That union becomes the "golden feature set" against which each document's features are compared by intersection. The size of that intersection for a file, divided by the size of the union, is the file's similarity score.
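The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual script: it assumes that a file's "features" are simply its metadata key names, and it uses hand-written sample dictionaries in place of real Tika output. The function name `similarity_scores` and the sample file names are hypothetical.

```python
def similarity_scores(metadata_by_file):
    """Score each file as |its features ∩ golden set| / |golden set|."""
    # Assumption: a file's features are the key names of its metadata dict.
    features = {name: set(meta) for name, meta in metadata_by_file.items()}

    # The "golden feature set" is the union of every file's features.
    golden = set().union(*features.values())
    if not golden:
        # No features at all: avoid dividing by zero.
        return {name: 0.0 for name in features}

    return {
        name: len(feats & golden) / len(golden)
        for name, feats in features.items()
    }


# Hypothetical metadata standing in for what a Tika parse might return.
sample = {
    "report.pdf": {"Content-Type": "application/pdf", "Author": "A", "title": "T"},
    "notes.txt": {"Content-Type": "text/plain"},
}
scores = similarity_scores(sample)
```

Because the golden set is the union of all features, each file's intersection with it is just the file's own feature set, so the score reduces to the fraction of all observed metadata keys that the file carries.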

ipfs-tika - Java web application taking IPFS hashes, extracting (textual) content and metadata through Apache's Tika

  •    Java

Java web application taking IPFS hashes, extracting (textual) content and metadata through Apache's Tika.




tikaondotnet - Use the Java Tika text extraction library on the .NET platform

  •    CSharp

Take a look at our tests for more usage examples. Have an idea to make this project better? Great! Start out by taking a look at our Contributing Guide.