rsemantic - A document vector search with flexible matrix transforms


The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

http://github.com/josephwilk/rsemantic/wikis/home
https://github.com/josephwilk/rsemantic





Related Projects

rsemantic


A document vector search with flexible matrix transforms. Currently supports latent semantic analysis (LSA) and term frequency-inverse document frequency (TF-IDF).

Semantic Weblog Monitoring Framework


Facilitates data mining/natural language processing experiments to be executed on weblogs, such as classification, clustering and rating. As part of these experiments, it is possible to apply Latent Semantic Analysis.

Latent semantic analysis: all you need is to use!


Baggr is a feed aggregator with a web interface, user ratings, and an LSA filter. Enjoy it!

SenseClusters


SenseClusters is a suite of Perl programs that clusters similar written contexts using unsupervised methods. It supports its own native methods and Latent Semantic Analysis. It takes users from preprocessing of text to clustered output.

OpenLSA


OpenLSA is a general purpose engine for performing latent semantic analysis (LSA). LSA is a statistical process that can identify complex co-occurrences of items, and is being used in the next generation of spam filters.


The Large Time/frequency Analysis TB


The Large Time/Frequency Analysis Toolbox is a Matlab/Octave/C toolbox for time/frequency and wavelet analysis. It is intended as both an educational and a computational tool.

dex-oracle - A pattern based Dalvik deobfuscator which uses limited execution to improve semantic analysis


A pattern based Dalvik deobfuscator which uses limited execution to improve semantic analysis. Also, the inspiration for another Android deobfuscator: Simplify. Make sure adb is on your path.

bayesian - Naive Bayesian Classification for Golang.


Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF). Copyright (c) 2011-2017. Jake Brukhman. (jbrukh@gmail.com). All rights reserved. See the LICENSE file for BSD-style license.
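To illustrate the idea behind this entry, here is a minimal Python sketch of multinomial naive Bayes over sets of strings with an arbitrary number of classes. It is not the bayesian package's Go API: the class name `NaiveBayes`, its methods, and the add-one smoothing are assumptions chosen for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter()
        self.vocab = set()

    def learn(self, words, cls):
        # Record one training document (a list of strings) for class `cls`.
        self.word_counts[cls].update(words)
        self.doc_counts[cls] += 1
        self.vocab.update(words)

    def log_scores(self, words):
        # Unnormalized log posterior for each class: log prior + sum of log likelihoods.
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log((self.doc_counts[c] + 1) / (total_docs + len(self.classes)))
            class_total = sum(self.word_counts[c].values())
            for w in words:
                score += math.log((self.word_counts[c][w] + 1) / (class_total + vocab_size))
            scores[c] = score
        return scores

    def classify(self, words):
        # Return the class with the highest log score.
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


clf = NaiveBayes(["good", "bad", "neutral"])
clf.learn(["tall", "rich", "handsome"], "good")
clf.learn(["poor", "smelly", "ugly"], "bad")
clf.learn(["plain"], "neutral")
print(clf.classify(["tall", "girl"]))
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied together.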

tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf


Calculates the similarity between texts using a bag-of-words Vector Space Model with Term Frequency-Inverse Document Frequency (tf*idf) weights. If your use case demands performance, use Lucene.
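The bag-of-words approach described here can be sketched in a few lines of Python. This is not the gem's Ruby API: the function names `tfidf_vectors` and `cosine`, the whitespace tokenizer, and the smoothed idf formula are assumptions (one common variant among several).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (hypothetical helper)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed smoothed idf: log((1 + n) / (1 + df)) + 1.
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
v = tfidf_vectors(docs)
print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Documents that share no terms score exactly zero, while documents with overlapping vocabulary score higher the rarer their shared terms are across the collection.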

semantic-segmentation-pytorch - Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset


This is a PyTorch implementation of semantic segmentation models on the MIT ADE20K scene parsing dataset. Its synchronized batch normalization module differs from the built-in PyTorch BatchNorm in that the mean and standard deviation are reduced across all devices during training. The importance of synchronized batch normalization in object detection was recently shown with an extensive analysis in the paper MegDet: A Large Mini-Batch Object Detector, and we empirically find that it is also important for segmentation.

classifier-reborn - A general classifier module to allow Bayesian and other types of classifications


Classifier Reborn is a general classifier module to allow Bayesian and other types of classifications. It is a fork of cardmagic/classifier under more active development. Currently, it has a Bayesian Classifier and a Latent Semantic Indexer (LSI) implemented.

models - Model configurations


PaddlePaddle provides a rich set of computational units to enable users to adopt a modular approach to solving various learning problems. In this repo, we demonstrate how to use PaddlePaddle to solve common machine learning tasks, providing several different neural network models that anyone can easily learn and use. A word embedding represents each word as a real-valued vector. Each dimension of the vector captures some latent grammatical or semantic feature of the text, and the idea is one of the most successful concepts in the field of natural language processing. The generalized word vector can also be applied to discrete features. Word vectors are usually learned without supervision, so it is possible to take full advantage of massive unlabeled data to capture the relationships between features and to address the problems of sparse features, missing labels, and data noise. However, in the common word vector learning methods, the last layer of the model often faces a large-scale classification problem, which is the bottleneck of computing performance.

LibEEGTools


This is a c-library that provides tools for advanced analysis of electrophysiological data. It features denoising, unsupervised classification, time-frequency analysis, phase-space analysis, neural networks, time-warping and more.

Ephyra - Question Answering System


Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.

flint - A Time Series Library for Apache Spark


The ability to analyze time series data at scale is critical for the success of finance and IoT applications based on Spark. Flint is Two Sigma's implementation of highly optimized time series operations in Spark. It performs truly parallel and rich analyses on time series data by taking advantage of the natural ordering in time series data to provide locality-based optimizations. Flint is an open source library for Spark based around the TimeSeriesRDD, a time series aware data structure, and a collection of time series utility and analysis functions that use TimeSeriesRDDs. Unlike DataFrame and Dataset, Flint's TimeSeriesRDDs can leverage the existing ordering properties of datasets at rest and the fact that almost all data manipulations and analysis over these datasets respect their temporal ordering properties. It differs from other time series efforts in Spark in its ability to efficiently compute across panel data or on large scale high frequency data.

aubio - a library for audio and music analysis


aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events. For instance, when a drum is hit, at which frequency is a note, or at what tempo is a rhythmic melody. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio.

text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.


text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). To learn how to use this package, see text2vec.org and the package vignettes. See also the text2vec articles on my blog.

ycmd - A code-completion & code-comprehension server


ycmd is a server that provides APIs for code-completion and other code-comprehension use-cases like semantic GoTo commands (and others). For certain filetypes, ycmd can also provide diagnostic errors and warnings. ycmd was originally part of YouCompleteMe's codebase, but has been split out into a separate project so that it can be used in editors other than Vim.