Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, simpler, and more consistent internationally. The core library is written in pure C. Language bindings for Python, Ruby, Go, Java, PHP, and NodeJS are officially supported and it's easy to write bindings in other languages.
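A minimal sketch of how libpostal's official Python binding (pypostal) is typically used for expansion and parsing; it assumes the C library and the postal package are installed, and the demo strings are arbitrary:

    # Normalization and parsing via the official pypostal binding.
    from postal.expand import expand_address
    from postal.parser import parse_address

    # Expand a free-form address into normalized forms suitable for indexing and comparison.
    print(expand_address('Quatre vingt douze Ave des Champs-Elysees'))

    # Parse an address into labeled components (house_number, road, city, postcode, ...).
    print(parse_address('781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA'))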
https://github.com/openvenues/libpostal

Tags | address-parser machine-learning nlp address international deduplication record-linkage deduping natural-language-processing |
Implementation | C |
License | MIT |
Platform |
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
nlp natural-language-processing machine-learning parts-of-speech categorization auto-tagging sentence-detector

The library's full documentation can be found here. Be sure to lint & pass the unit tests before submitting your pull request.
natural-language-processing machine-learning fuzzy-matching clustering record-linkage bayes bloom-filter canberra caverphone chebyshev cologne cosine classifier daitch-mokotoff dice fingerprint fuzzy hamming k-means jaccard jaro lancaster levenshtein lig metaphone mra ngrams nlp nysiis perceptron phonetic porter punkt schinke sorensen soundex stats tfidf tokenizer tversky vectorizer winkler

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.
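A minimal spaCy usage sketch, assuming the small English model (en_core_web_sm) has been downloaded separately:

    import spacy

    # Assumes: python -m spacy download en_core_web_sm
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(u'Apple is looking at buying a U.K. startup for $1 billion.')

    # Tokenization, part-of-speech tagging and dependency parsing in one pipeline call.
    for token in doc:
        print(token.text, token.pos_, token.dep_)

    # Named entities recognized by the statistical model.
    for ent in doc.ents:
        print(ent.text, ent.label_)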
natural-language-processing data-science big-data machine-learning cython nlp artificial-intelligence ai spacy nlp-library neural-network neural-networks deep-learning

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository: https://spark-packages.org/package/JohnSnowLabs/spark-nlp.
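A short sketch using Spark NLP's pretrained-pipeline interface from Python; the pipeline name 'explain_document_dl' follows the project's quick-start examples and may vary between releases:

    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    # Starts a SparkSession with the spark-nlp package on the classpath.
    spark = sparknlp.start()

    # 'explain_document_dl' bundles tokenization, POS tagging, lemmatization and NER.
    pipeline = PretrainedPipeline('explain_document_dl', lang='en')
    result = pipeline.annotate('John Snow Labs builds NLP libraries on top of Apache Spark.')
    print(result)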
nlp nlu natural-language-processing natural-language-understanding spark spark-ml pyspark machine-learning named-entity-recognition sentiment-analysis lemmatizer spell-checker tokenizer entity-extraction stemmer part-of-speech-tagger annotation-framework

This curated list comprises awesome resources, libraries, and information sources about the computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and it can be combined with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction, and other related disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.
machine-learning natural-language-processing nlp sentiment-analysis pos-tag awesome awesome-list list computational-linguistics rubynlp rubyml

The NLP Newsletter provides (weekly or biweekly) quick summaries of some of the latest natural language processing (NLP) and machine learning (ML) stories across a range of important categories such as fairness & ethics, society, educational resources, publications, etc. Want your story or project to be featured in the NLP Newsletter? Reach out via ellfae@gmail.com or Twitter. If you are interested in translating any of the NLP Newsletter issues, you can check out this GitHub issue.
nlp machine-learning deep-learning

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.
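A hedged sketch of the frequency-list and n-gram utilities mentioned above, based on the pynlpl.statistics and pynlpl.textprocessors modules; exact class interfaces may differ between versions:

    from pynlpl.statistics import FrequencyList
    from pynlpl.textprocessors import Windower

    tokens = "to be or not to be".split()

    # Count token frequencies (assumed FrequencyList interface; see the PyNLPl docs).
    freqlist = FrequencyList()
    freqlist.append(tokens)
    print(freqlist['to'])

    # Slide an n-gram window (n=2) over the token sequence.
    for bigram in Windower(tokens, 2):
        print(bigram)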
nlp computational-linguistics linguistics library folia machine-learning language-modelling search-algorithms evaluation-metrics text-processing nlp-library natural-language-processing

This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP, including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high-level themes, forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course, the practical implementation of such models on CPU and GPU hardware is also discussed.
deep-learning machine-learning natural-language-processing nlp oxford

A Machine Learning library written in pure Go, designed to support relevant neural architectures in Natural Language Processing. spaGO is self-contained, in that it uses its own lightweight computational graph framework for both training and inference, and it is easy to understand from start to finish.
nlp machine-learning natural-language-processing deep-learning neural-network automatic-differentiation artificial-intelligence recurrent-networks lstm computation-graph question-answering bart automatic-translation deeplearning language-model bert transformer-architecture bert-as-service named-entities-recognition

A feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner, developed by Plasticity. It is primarily intended to be a simpler and faster alternative to Gensim, but it can be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and have traditionally been popular for natural language processing applications, yet a fast, lightweight tool to consume these large vector space embedding models efficiently has been lacking.
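The package described here appears to be Plasticity's Magnitude (pymagnitude); a short sketch of its key-vector store interface, where 'vectors.magnitude' is a placeholder path to a pre-converted embedding file:

    from pymagnitude import Magnitude

    # Open a pre-converted .magnitude embedding file (memory-mapped, lazily loaded).
    vectors = Magnitude('vectors.magnitude')

    print(vectors.dim)                        # dimensionality of the embeddings
    print(vectors.query('cat'))               # vector lookup (also handles out-of-vocabulary words)
    print(vectors.similarity('cat', 'dog'))   # similarity between two words
    print(vectors.most_similar('king', topn=5))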
natural-language-processing nlp machine-learning vectors embeddings word2vec fasttext glove gensim fast memory-efficient machine-learning-library word-embeddings

SLING is a parser for annotating text with frame-semantic annotations. It is trained on an annotated corpus using TensorFlow and DRAGNN. The parser is a general transition-based frame semantic parser that uses bi-directional LSTMs for input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. It is a jointly trained model that uses only the text tokens as input, and the transition system has been designed to output frame graphs directly without any intervening symbolic representation.
natural-language-understanding natural-language-processing neural-network machine-learning jit-compiler frame-semantic-parsing nlp

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group; we're eager to collaborate with you.
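A hedged sketch of two torchnlp building blocks (a text encoder and pre-trained word vectors); the module paths follow the torchnlp documentation but may differ between versions:

    from torchnlp.encoders.text import WhitespaceEncoder
    from torchnlp.word_to_vector import GloVe

    # Build a vocabulary from sample text and encode a sentence as a tensor of ids.
    encoder = WhitespaceEncoder(["hello world", "hello pytorch nlp"])
    print(encoder.encode("hello world"))

    # Pre-trained GloVe vectors are downloaded and cached on first use (large download).
    vectors = GloVe(name='6B', dim=100)
    print(vectors['hello'].shape)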
pytorch nlp natural-language-processing pytorch-nlp torchnlp data-loader embeddings word-vectors deep-learning dataset metrics neural-network sru machine-learning

Welcome to my GitHub repo. I am a Data Scientist and I code in R, Python and Wolfram Mathematica. Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.
anomaly-detection deep-learning autoencoder keras keras-models denoising-autoencoders generative-adversarial-network glove keras-layer word2vec nlp natural-language-processing sentiment-analysis opencv segnet resnet-50 variational-autoencoder t-sne svm-classifier latent-dirichlet-allocation

Please read the contribution guidelines before contributing. Please feel free to create pull requests.
natural-language-processing deep-learning machine-learning language awesome awesome-list nlp text-mining

BotSharp is an open source machine learning framework for building AI bot platforms. The project involves natural language understanding, computer vision and audio processing technologies, and aims to promote the development and application of intelligent robot assistants in information systems. Out-of-the-box machine learning algorithms allow ordinary programmers to develop artificial intelligence applications faster and more easily. It is written in C# and runs on .NET Core, a fully cross-platform framework. C# is an enterprise-grade programming language widely used to code business logic in information management systems, which makes BotSharp friendlier to corporate developers. BotSharp implements its machine learning algorithms directly in C#, taking advantage of the typed language and making it easier to refactor code across the system.
artificial-intelligence natural-language-processing chatbot chatbot-framework nlp nlu automation

NLP Architect is an open source Python library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing and Natural Language Understanding neural networks. NLP Architect is designed to be flexible and easy to extend, to allow easy and rapid integration of NLP models into applications, and to showcase optimized models.
nlp deep-learning tensorflow nlu transformers pytorch deeplearning quantization bert dynet

NLP Architect is an open-source Python library for exploring state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding. It is intended to be a platform for future research and collaboration. Framework documentation on NLP models, algorithms, and modules, and instructions on how to contribute, can be found at our main documentation site.
deeplearning nlp nlu tensorflow dynet keras

Stanza is a Python NLP library for many human languages. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python. A new collection of biomedical and clinical English model packages is now available, offering a seamless experience for syntactic analysis and named entity recognition (NER) from biomedical literature text and clinical notes.
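A minimal sketch of a Stanza pipeline; the English models are fetched on the first run:

    import stanza

    stanza.download('en')  # downloads the English models
    nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,ner')

    doc = nlp('Barack Obama was born in Hawaii.')
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.upos, word.lemma)

    # Named entities found by the NER processor.
    print(doc.ents)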
nlp machine-learning natural-language-processing deep-learning pytorch artificial-intelligence named-entity-recognition universal-dependencies corenlp

Thinc is the machine learning library powering spaCy. It features a battle-tested linear model designed for large sparse learning problems, and a flexible neural network model under development for spaCy v2.0. Thinc is a practical toolkit for implementing models that follow the "Embed, encode, attend, predict" architecture. It's designed to be easy to install, efficient for CPU usage and optimised for NLP and deep learning with text – in particular, hierarchically structured input and variable-length sequences.
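A hedged sketch of composing and running a small model with Thinc's layer-combinator API as it appears in recent releases (v8+); the spaCy v2-era version described above exposed a different interface:

    import numpy
    from thinc.api import chain, Relu, Softmax

    # Compose a small feed-forward model from layer combinators.
    model = chain(Relu(nO=8), Softmax())

    X = numpy.zeros((16, 4), dtype="float32")
    Y = numpy.zeros((16, 2), dtype="float32")
    model.initialize(X=X, Y=Y)   # missing input/output dimensions are inferred from the data
    Y_pred = model.predict(X)
    print(Y_pred.shape)          # (16, 2)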
machine-learning deep-learning artificial-intelligence ai nlp natural-language-processing spacy machine-learning-library

The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling (QA-SRL), zero-shot relation extraction (QA-ZRE), goal-oriented dialogue (WOZ), semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple. Similarly, the paper focused on multitask learning as a form of question answering, but this framework can be easily adapted for different approaches to single-task or multitask learning.
deep-learning natural-language-processing multitask-learning