Part-of-speech tagging is done primarily with a trigram hidden Markov model (HMM). Although many other methods have been developed since, the trigram HMM seems to be the easiest to implement while still maintaining effective accuracy. The tagger was built using several online resources, including bootstrapping the vocabulary from Wiktionary (https://www.wiktionary.org/). This is a common alternative to purely unsupervised learning: an existing dictionary of sorts gives the model a bit of an edge. In some cases, the dictionary can instead be generated from a part-of-speech corpus (tagged either manually or automatically). On top of Wiktionary, I am using several corpora to build the English language model, including the Brown Corpus, Penn TreeBank, and Twitter TreeBank. These treebanks provide the data for estimating the model's probabilities and training it in the supervised learning cases. The actual tagging is done using the Viterbi path-finding algorithm, which is implemented for all standard models. The Spanish model is trained using the IULA Spanish LSP TreeBank. Both models are stored in the bin directory.
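To make the decoding step concrete, here is a minimal sketch of Viterbi decoding for a trigram HMM. It is not the code shipped in this project. The function name `viterbi_trigram`, the dictionary layouts for `q` (transition) and `e` (emission), the `*` and `STOP` boundary symbols, and the toy probabilities are all assumptions made for illustration. The sketch applies no smoothing, so an unseen word simply yields no valid path.

```python
import math
from itertools import product

START, STOP = "*", "STOP"  # hypothetical boundary symbols
NEG_INF = float("-inf")

def log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi_trigram(words, tags, q, e):
    """Return the most likely tag sequence, or None if no path has non-zero probability.

    q[(w, u, v)] = P(v | w, u)   transition table (assumed layout)
    e[(t, word)] = P(word | t)   emission table (assumed layout)
    """
    n = len(words)
    if n == 0:
        return []

    def tagset(k):
        # Positions before the first word can only hold the start symbol.
        return [START] if k <= 0 else tags

    # pi[k][(u, v)]: best log-probability of a tag sequence whose tags at
    # positions k-1 and k are u and v. bp stores the tag at position k-2.
    pi = [{(START, START): 0.0}]
    bp = [{}]
    for k in range(1, n + 1):
        pi.append({})
        bp.append({})
        word = words[k - 1]
        for u, v in product(tagset(k - 1), tagset(k)):
            emit = log(e.get((v, word), 0.0))
            best_score, best_w = NEG_INF, None
            for w in tagset(k - 2):
                prev = pi[k - 1].get((w, u), NEG_INF)
                score = prev + log(q.get((w, u, v), 0.0)) + emit
                if score > best_score:
                    best_score, best_w = score, w
            pi[k][(u, v)] = best_score
            bp[k][(u, v)] = best_w

    # Termination: account for the transition into STOP.
    best_score, best_pair = NEG_INF, None
    for u, v in product(tagset(n - 1), tagset(n)):
        score = pi[n][(u, v)] + log(q.get((u, v, STOP), 0.0))
        if score > best_score:
            best_score, best_pair = score, (u, v)
    if best_pair is None:
        return None

    # Follow the back-pointers from the end of the sentence to the start.
    result = [None] * (n + 1)  # 1-indexed; slot 0 may hold START
    result[n - 1], result[n] = best_pair
    for k in range(n - 2, 0, -1):
        result[k] = bp[k + 2][(result[k + 1], result[k + 2])]
    return result[1:]

# Toy model with made-up probabilities, for illustration only.
TOY_TAGS = ["D", "N", "V"]
TOY_Q = {
    ("*", "*", "D"): 0.8, ("*", "*", "N"): 0.2,
    ("*", "D", "N"): 0.9, ("*", "D", "V"): 0.1,
    ("D", "N", "V"): 0.8, ("D", "N", "N"): 0.2,
    ("N", "V", "STOP"): 1.0, ("N", "N", "STOP"): 1.0,
}
TOY_E = {
    ("D", "the"): 1.0,
    ("N", "dog"): 0.5, ("N", "barks"): 0.1,
    ("V", "dog"): 0.05, ("V", "barks"): 0.6,
}

print(viterbi_trigram(["the", "dog", "barks"], TOY_TAGS, TOY_Q, TOY_E))
# prints ['D', 'N', 'V']
```

Working in log space avoids numeric underflow on long sentences. A real tagger would also need smoothing or a dictionary fallback (for example, the Wiktionary vocabulary mentioned above) so that unknown words do not zero out every path.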