
transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

  •    Python

I tried to implement the idea in Attention Is All You Need. The authors claimed that their model, the Transformer, outperformed the state of the art in machine translation using attention alone, with no CNNs and no RNNs. How cool is that! At the end of the paper, they promised to release their code soon, but it does not appear to be available yet. I have two goals with this project. One is to gain a full understanding of the paper; it is often hard for me to grasp a model well before writing some code for it. The other is to share my code with people who are interested in this model before the official code is unveiled. I got a BLEU score of 17.14 (note that I trained on a small dataset with a limited vocabulary). Some of the evaluation results are as follows; details are available in the results folder.
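The heart of the paper is scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal, framework-free sketch of that formula (NumPy rather than this repository's TensorFlow code; the function name and toy shapes are illustrative, not taken from the repo):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, len_q, len_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
        return weights @ V                                 # (batch, len_q, d_v)

    # Toy usage: one batch, 3 query positions, 4 key/value positions, d_k = d_v = 8
    Q = np.random.randn(1, 3, 8)
    K = np.random.randn(1, 4, 8)
    V = np.random.randn(1, 4, 8)
    print(scaled_dot_product_attention(Q, K, V).shape)     # (1, 3, 8)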

sockeye - Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

  •    Python

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post (2017): Sockeye: A Toolkit for Neural Machine Translation. arXiv:1712.05690 [cs.CL].

If you are interested in collaborating or have any questions, please submit a pull request or issue. You can also send questions to sockeye-dev-at-amazon-dot-com.

Linear-Attention-Recurrent-Neural-Network - A recurrent attention module consisting of an LSTM cell that can query its own past cell states by means of windowed multi-head attention

  •    Jupyter

A fixed-size, go-back-k recurrent attention module on an RNN, giving linear short-term memory by means of attention. The LARNN cell can be used inside a loop over the cell state just like any other RNN cell. The cell state keeps the k last states for its multi-head attention mechanism. The LARNN is derived from the Long Short-Term Memory (LSTM) cell. It introduces attention over the state's past values up to a certain range, limited by a time window k, which keeps the forward processing linear in time with respect to sequence length (time steps).
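To make that mechanism concrete, here is a minimal single-head sketch of such a windowed-attention cell in NumPy. The class name, the random untrained projections, and the way the attended context is mixed into the input are illustrative assumptions, not the repository's exact equations:

    from collections import deque
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    class WindowedAttentionCell:
        # LSTM-style cell that also attends over its own k last cell states
        # (single-head here for brevity; the real LARNN uses multi-head attention).
        def __init__(self, d, k, seed=0):
            rng = np.random.default_rng(seed)
            self.d = d
            self.past_c = deque(maxlen=k)  # window of the k last cell states
            self.W = rng.standard_normal((4 * d, 2 * d)) * 0.1   # gate projections (untrained)
            self.Wq = rng.standard_normal((d, d)) * 0.1          # query projection (untrained)

        def step(self, x, h, c):
            # Query the windowed past cell states with the current hidden state.
            if self.past_c:
                P = np.stack(self.past_c)                        # (<=k, d)
                scores = P @ (self.Wq @ h) / np.sqrt(self.d)
                attended = softmax(scores) @ P                   # context from the window
            else:
                attended = np.zeros(self.d)
            # Standard LSTM gates, with the attended context added to the input.
            z = self.W @ np.concatenate([x + attended, h])
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            self.past_c.append(c.copy())   # the deque drops states older than k steps
            return h, c

    # Usage inside an ordinary RNN loop:
    cell = WindowedAttentionCell(d=16, k=5)
    h = c = np.zeros(16)
    for x in np.random.randn(10, 16):      # 10 time steps
        h, c = cell.step(x, h, c)
    print(h.shape)                         # (16,)

Because the window holds at most k states, each step's attention costs O(k) regardless of sequence length, which is what keeps the forward pass linear in the number of time steps.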
