rsemantic - A document vector search with flexible matrix transforms

  •        101

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

http://github.com/josephwilk/rsemantic/wikis/home
https://github.com/josephwilk/rsemantic

Tags
Implementation
License
Platform

   




Related Projects

rsemantic

  •    Ruby

A document vector search with flexible matrix transforms. Currently supports Latent semantic analysis and Term frequency - inverse document frequency

Semantic Weblog Monitoring Framework

  •    Java

Facilitates data mining/natural language processing experiments to be executed on weblogs, such as classification, clustering and rating. As part of these experiments, it is possible to apply Latent Semantic Analysis.

Latent semantic analysis: all you need is to use!

  •    

Baggr is feed aggregator with web interface, user rating and LSA filter. Enjoy it!

SenseClusters

  •    C

SenseClusters is a suite of Perl programs that clusters similar written contexts using unsupervised methods. It supports its own native methods and Latent Semantic Analysis. It takes users from preprocessing of text to clustered output.

OpenLSA

  •    Python

OpenLSA is a general purpose engine for performing latent semantic analysis (LSA). LSA is a statistical process that can identify complex co-occurrences of items, and is being used in the next generation of spam filters.


The Large Time/frequency Analysis TB

  •    

The Large Time/Frequency Analysis Toolbox is a Matlab/Octave/C toolbox for doing time/frequency and wavelet analysis. It is inteded as both an educational and a computational tool.

coinai - Seed applications based on AI for digital currency quantitative analysis, medium-term forecast and asset allocation for the secondary market of the BANCA community

  •    Python

coinai is a set of seed applications based on AI for digital currency quantitative analysis, medium-term forecast, and asset allocation for the secondary market of the BANCA community. Clients can use CoinAI to conduct in-depth analysis of digital tokens and compare the investment value and risk of different currencies. They can also obtain the prediction for the future trend of tokens based on artificial intelligence and big data smart beta market timing models. According to your own risk assessment, you are one click away from building the optimum portfolio.

dex-oracle - A pattern based Dalvik deobfuscator which uses limited execution to improve semantic analysis

  •    Ruby

A pattern based Dalvik deobfuscator which uses limited execution to improve semantic analysis. Also, the inspiration for another Android deobfuscator: Simplify. Make sure adb is on your path.

bayesian - Naive Bayesian Classification for Golang.

  •    Go

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).Copyright (c) 2011-2017. Jake Brukhman. (jbrukh@gmail.com). All rights reserved. See the LICENSE file for BSD-style license.

tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf

  •    Ruby

Calculates the similarity between texts using a bag-of-words Vector Space Model with Term Frequency-Inverse Document Frequency (tf*idf) weights. If your use case demands performance, use Lucene (see below).

semantic-segmentation-pytorch - Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

  •    Python

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset. This module differs from the built-in PyTorch BatchNorm as the mean and standard-deviation are reduced across all devices during training. The importance of synchronized batch normalization in object detection has been recently proved with a an extensive analysis in the paper MegDet: A Large Mini-Batch Object Detector, and we empirically find that it is also important for segmentation.

treelstm - Tree-structured Long Short-Term Memory networks (http://arxiv.org/abs/1503.00075)

  •    Lua

An implementation of the Tree-LSTM architectures described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning. The preprocessing script generates dependency parses of the SICK dataset using the Stanford Neural Network Dependency Parser.

lectures - Oxford Deep NLP 2017 course

  •    

This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.

classifier-reborn - A general classifier module to allow Bayesian and other types of classifications

  •    Ruby

Classifier Reborn is a general classifier module to allow Bayesian and other types of classifications. It is a fork of cardmagic/classifier under more active development. Currently, it has Bayesian Classifier and Latent Semantic Indexer (LSI) implemented. Here is a quick illustration of the Bayesian classifier.

models - Model configurations

  •    Python

PaddlePaddle provides a rich set of computational units to enable users to adopt a modular approach to solving various learning problems. In this repo, we demonstrate how to use PaddlePaddle to solve common machine learning tasks, providing several different neural network model that anyone can easily learn and use. The word embedding expresses words with a real vector. Each dimension of the vector represents some of the latent grammatical or semantic features of the text and is one of the most successful concepts in the field of natural language processing. The generalized word vector can also be applied to discrete features. The study of word vector is usually an unsupervised learning. Therefore, it is possible to take full advantage of massive unmarked data to capture the relationship between features and to solve the problem of sparse features, missing tag data, and data noise. However, in the common word vector learning method, the last layer of the model often encounters a large-scale classification problem, which is the bottleneck of computing performance.

LibEEGTools

  •    C

This is a c-library that provides tools for advanced analysis of electrophysiological data. It features denoising, unsupervised classification, time-frequency analysis, phase-space analysis, neural networks, time-warping and more.

clubber - Application of music theory in audio reactive visualizations

  •    Javascript

This small js library listens to audio sources and extracts the underlying rhythmic information. The linear frequency energies are converted into midi notes which music theory suggests to be a better segregation for music audio. A set of meaningful measurements are produced in a form suitable for direct use in webgl shaders, or any other context. This simple flow provides a powerful framework for the rapid development of awesome audio reactive visualisations.

Ephyra - Question Answering System

  •    Java

Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.

flint - A Time Series Library for Apache Spark

  •    Scala

The ability to analyze time series data at scale is critical for the success of finance and IoT applications based on Spark. Flint is Two Sigma's implementation of highly optimized time series operations in Spark. It performs truly parallel and rich analyses on time series data by taking advantage of the natural ordering in time series data to provide locality-based optimizations. Flint is an open source library for Spark based around the TimeSeriesRDD, a time series aware data structure, and a collection of time series utility and analysis functions that use TimeSeriesRDDs. Unlike DataFrame and Dataset, Flint's TimeSeriesRDDs can leverage the existing ordering properties of datasets at rest and the fact that almost all data manipulations and analysis over these datasets respect their temporal ordering properties. It differs from other time series efforts in Spark in its ability to efficiently compute across panel data or on large scale high frequency data.