sotawhat - Returns latest research results by crawling arxiv papers and summarizing abstracts

  •        142

This script runs using Python 3. First, install the required packages. This script only requires nltk and PyEnchant.

https://github.com/chiphuyen/sotawhat

Tags
Implementation
License
Platform

   




Related Projects

arxiv-sanity-preserver - Web interface for browsing, search and filtering recent arxiv submissions

  •    Python

This project is a web interface that attempts to tame the overwhelming flood of papers on Arxiv. It allows researchers to keep track of recent papers, search for papers, sort papers by similarity to any paper, see recent popular papers, to add papers to a personal library, and to get personalized recommendations of (new or old) Arxiv papers. This code is currently running live at www.arxiv-sanity.com/, where it's serving 25,000+ Arxiv papers from Machine Learning (cs.[CV|AI|CL|LG|NE]/stat.ML) over the last ~3 years. With this code base you could replicate the website to any of your favorite subsets of Arxiv by simply changing the categories in fetch_papers.py. Indexing code. Uses Arxiv API to download the most recent papers in any categories you like, and then downloads all papers, extracts all text, creates tfidf vectors based on the content of each paper. This code is therefore concerned with the backend scraping and computation: building up a database of arxiv papers, calculating content vectors, creating thumbnails, computing SVMs for people, etc.

Sumarizador

  •    

A tool for multi-texts automatic summarization in web which objective is to produce summaries based on search terms supplied by the user. Initially the main methods of automatic summarization of texts are used to compose these funcionalities. After that, a methodology for the ...

Zotero - Tool to Collect, Organize, Cite, and Share your Research Sources

  •    C

Zotero [zoh-TAIR-oh] is a free, easy-to-use tool to help you collect, organize, cite, and share your research sources. It lives right where you do your work—in the web browser itself. It is a research tool that automatically senses content, allowing you to add it to your personal library with a single click.

decaNLP - The Natural Language Decathlon: A Multitask Challenge for NLP

  •    Python

The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling(QA‑SRL), zero-shot relation extraction (QA‑ZRE), goal-oriented dialogue (WOZ, semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple. Similarly, the paper focused on multitask learning as a form of question answering, but this framework can be easily adapted for different approached to single-task or multitask learning.

node-summary - Node module that summarizes text using a naive summarization algorithm

  •    Javascript

Summarizes text using a naive summarization algorithm, based off of the Python implementation by shlomibabluki. And now with UTF8 support, thanks to xissy.


TextTeaser - Automatic Summarization Algorithm

  •    Scala

TextTeaser is an automatic summarization algorithm that combines the power of natural language processing and machine learning to produce good results. It can provide provide a gist of an article, Better previews in news readers.

RLSeq2Seq - Deep Reinforcement Learning For Sequence to Sequence Models

  •    Python

NOTE: THE CODE IS UNDER DEVELOPMENT, PLEASE ALWAYS PULL THE LATEST VERSION FROM HERE. In recent years, sequence-to-sequence (seq2seq) models are used in a variety of tasks from machine translation, headline generation, text summarization, speech to text, to image caption generation. The underlying framework of all these models are usually a deep neural network which contains an encoder and decoder. The encoder processes the input data and a decoder receives the output of the encoder and generates the final output. Although simply using an encoder/decoder model would, most of the time, produce better result than traditional methods on the above-mentioned tasks, researchers proposed additional improvements over these sequence to sequence models, like using an attention-based model over the input, pointer-generation models, and self-attention models. However, all these seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently a completely fresh point of view emerged in solving these two problems in seq2seq models by using methods in Reinforcement Learning (RL). In these new researches, we try to look at the seq2seq problems from the RL point of view and we try to come up with a formulation that could combine the power of RL methods in decision-making and sequence to sequence models in remembering long memories. In this paper, we will summarize some of the most recent frameworks that combines concepts from RL world to the deep neural network area and explain how these two areas could benefit from each other in solving complex seq2seq tasks. In the end, we will provide insights on some of the problems of the current existing models and how we can improve them with better RL models. We also provide the source code for implementing most of the models that will be discussed in this paper on the complex task of abstractive text summarization.

fast-rcnn - Fast R-CNN

  •    Python

Created by Ross Girshick at Microsoft Research, Redmond. Fast R-CNN was initially described in an arXiv tech report and later published at ICCV 2015.

adversarial - Code and hyperparameters for the paper "Generative Adversarial Networks"

  •    Python

"Generative Adversarial Networks." Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. ArXiv 2014. Please cite this paper if you use the code in this repository as part of a published research project.

StarData - Starcraft AI Research Dataset

  •    Python

We release the largest StarCraft: Brood War replay dataset yet, with 65646 games. The full dataset after compression is 365 GB, 1535 million frames, and 496 million player actions. The entire frame data was dumped out at 8 frames per second. We made a big effort to ensure this dataset is clean and has mostly high quality replays. You can access it with TorchCraft in C++, Python, and Lua. The replays are in an AWS S3 bucket at s3://stardata. Read below for more details, or our whitepaper on arXiv for more details. Note: The current set of replays are only compatible with the 1.3.0 version of torchcraft included here.

TorchCraft - Connecting Torch to StarCraft

  •    C++

A bridge between Torch and StarCraft. Synnaeve, G., Nardelli, N., Auvolat, A., Chintala, S., Lacroix, T., Lin, Z., Richoux, F. and Usunier, N., 2016. TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games - arXiv:1611.00625.

Switchable-Normalization - Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv

  •    HTML

Switchable Normalization is a normalization technique that is able to learn different normalization operations for different normalization layers in a deep neural network in an end-to-end manner. This repository provides imagenet classification results and models trained with Switchable Normalization. You are encouraged to cite the following paper if you use SN in research.

BOLT - Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

  •    C++

BOLT is a post-link optimizer developed to speed up large applications. It achieves the improvements by optimizing application's code layout based on execution profile gathered by sampling profiler, such as Linux perf tool. BOLT can operate on any binary with a symbol table, but for maximum gains it utilizes relocations saved by a linker (--emit-relocs). An overview of the ideas implemented in BOLT along with a discussion of its potential and current results is available in an arXiv paper. NOTE: current support is limited to non-PIE X86-64 and AArch64 ELF binaries.

idb - idb is a tool to simplify some common tasks for iOS pentesting and research

  •    Ruby

idb is a tool to simplify some common tasks for iOS pentesting and research. Originally there was a command line version of the tool, but it is no longer under development so you should get the GUI version. idb has a new homepage at http://www.idbtool.com All documentation and news are posted over there.

voyager - Recommendation-Powered Visualization Tool for Data Exploration

  •    TypeScript

Voyager 2 is a data exploration tool that blends manual and automated chart specification. Voyager 2 combines PoleStar, a traditional chart specification tool inspired by Tableau and Polaris (research project that led to the birth of Tableau), with two partial chart specification interfaces: (1) wildcards let users specify multiple charts in parallel,(2) related views suggest visualizations relevant to the currently specified chart. With Voyager 2, we aim to help analysts engage in both breadth-oriented exploration and depth-oriented question answering. For a quick overview of Voyager, see our preview video, or a 4-minute demo in our Vega-Lite talk at OpenVisConf, or watch our research talk at CHI 2017. For more information about our design, please read our CHI paper and other related papers (1, 2, 3).

MEAnalyzer - Intel Engine Firmware Analysis Tool

  •    Python

ME Analyzer is a tool which parses Intel Engine & PMC firmware images from the (Converged Security) Management Engine, (Converged Security) Trusted Execution Engine, (Converged Security) Server Platform Services & Power Management Controller families. It can be used by end-users who are looking for all relevant firmware information such as Family, Version, Release, Type, Date, SKU, Platform etc. It is capable of detecting new/unknown firmware, checking firmware health, Updated/Outdated status and many more. ME Analyzer is also a powerful Engine firmware research analysis tool with multiple structures which allow, among others, full parsing and unpacking of Converged Security Engine (CSE) code & file system, Flash Partition Table (FPT), Boot Partition Descriptor Table (BPDT/IFWI), CSE Layout Table (LT), advanced Size detection etc. Moreover, with the help of its extensive database, ME Analyzer is capable of uniquely categorizing all supported Engine firmware as well as check for any firmware which have not been stored at the Intel Engine Firmware Repositories yet. ME Analyzer allows end-users and/or researchers to quickly analyze and/or report new firmware versions without the use of special Intel tools (FIT/FITC, FWUpdate) or Hex Editors. To do that effectively, a database had to be built. The Intel Engine Firmware Repositories is a collection of every (CS)ME, (CS)TXE & (CS)SPS firmware we have found. Its existence is very important for ME Analyzer as it allows us to continue doing research, find new types of firmware, compare same major version releases for similarities, check for updated firmware etc. Bundled with ME Analyzer is a file called MEA.dat which is required for the program to run. It includes entries for all Engine firmware that are available to us. This accommodates primarily three actions: a) Detect each firmware's Family via unique identifier keys, b) Check whether the imported firmware is up to date and c) Help find new Engine firmware sooner by reporting them at the Intel Management Engine: Drivers, Firmware & System Tools or Intel Trusted Execution Engine: Drivers, Firmware & System Tools threads respectively.

fluxion - Fluxion is a remake of linset by vk496 with less bugs and enhanced functionality.

  •    HTML

Fluxion is a security auditing and social-engineering research tool. It is a remake of linset by vk496 with (hopefully) less bugs and more functionality. The script attempts to retrieve the WPA/WPA2 key from a target access point by means of a social engineering (phishing) attack. It's compatible with the latest release of Kali (rolling). Fluxion's attacks' setup is mostly manual, but experimental auto-mode handles some of the attacks' setup parameters. Read the FAQ before requesting issues. If you need quick help, fluxion is also avaible on gitter. You can talk with us on Gitter or on Discord.

pivot.js - Build Pivot Tables from CSV/JSON Data

  •    Javascript

Pivot.js is a simple way for you to get to your data. It allows for the creation of highly customizable unique table views from your browser. In data processing, a pivot table is a data summarization tool found in data visualization programs such as spreadsheets or business intelligence software. Among other functions, pivot-table tools can automatically sort, count, total or give the average of the data stored in one table or spreadsheet. It displays the results in a second table (called a "pivot table") showing the summarized data.

Reductio - Automatic summarizer text in Swift

  •    Swift

Reductio is a tool used to extract keywords and phrases using an implementation of the algorithm TextRank. Simply add Reductio as a dependency to your project's Package.swift.

SIDoBI

  •    PHP

SIDoBI is an automatic summarization system for documents in Indonesian language. It is an acronym for Sistem Ikhtisar Dokumen untuk Bahasa Indonesia. SIDoBI is built based on MEAD, a public domain portable multi-document summarization system.