This documentation describes the handy Python modules that I use frequently in the course of my research in Natural Language Processing.
Below is a list of Python modules in this package.
|bagofwords||This module contains classes and methods that are useful when handling bag-of-words data structures.|
|bigvocab||This module contains classes and methods for handling the vocabulary of corpora with millions of tokens.|
|bleu||This module contains methods for computing the BLEU precision/recall from two given lists of tokens.|
|corpus||This module contains classes and methods that are useful when handling corpora.|
|tfidf||This module contains classes and methods for computing TF-IDF values.|
|tokenize||This module contains functions to split a piece of text into individual sentences and into individual words.|
|tsvio||This module contains the TSVFile class, which handles reading from and writing to tab-separated values (TSV) files.|
|urls||This module contains classes and methods that are useful when trying to download data from the web.|
|misc||This module contains miscellaneous functions that don’t really fall under any category.|
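To make two of the descriptions above concrete, here is a minimal sketch of a bag of words and a TF-IDF computation written with the standard library only. It does not use the yc-pyutils API: the function names `bag_of_words` and `tf_idf` are hypothetical, and the TF-IDF variant shown (raw term frequency times log(N/df)) is an assumption chosen for illustration, not necessarily the one the tfidf module implements.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count
    of the token in the document, N is the number of documents, and df is
    the number of documents that contain the token.
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Document frequency: iterating a Counter yields each distinct token once.
    df = Counter(token for bag in bags for token in bag)
    return [
        {token: count * math.log(n_docs / df[token]) for token, count in bag.items()}
        for bag in bags
    ]


docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
weights = tf_idf(docs)
```

Under this variant, a token that appears in every document (such as "the" here) gets weight zero, while a token unique to one document is weighted by its count times log(N).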
Here are some useful scripts for performing common NLP preprocessing tasks. These scripts depend heavily on the above modules and classes. They can be found in the scripts/ directory of the package.
|prune-vocab.py||This script prunes the vocabulary file according to the criteria specified on the command line.|
|tokenize-docs.py||This script tokenizes large collections of text using ycutils.tokenize.|
|vocab-stats.py||This script displays statistics about the given ycutils.corpus.CorpusVocabulary file.|
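The criteria that prune-vocab.py accepts are not listed on this page. As a rough illustration of what pruning a vocabulary typically involves, the sketch below drops rare tokens and optionally caps the vocabulary size. The function `prune_vocab` and its parameters `min_count` and `max_size` are hypothetical and are not the script's actual options.

```python
from collections import Counter


def prune_vocab(counts, min_count=1, max_size=None):
    """Return a pruned copy of a {token: count} mapping.

    Hypothetical criteria: drop tokens seen fewer than `min_count` times,
    then keep at most `max_size` of the most frequent remaining tokens.
    """
    kept = Counter({t: c for t, c in counts.items() if c >= min_count})
    # most_common(None) returns every entry, ordered by descending count.
    return dict(kept.most_common(max_size))


counts = Counter("to be or not to be that is the question to".split())
pruned = prune_vocab(counts, min_count=2, max_size=1)
```

Applying the frequency threshold before the size cap means the cap only ever chooses among tokens that already passed the threshold.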
yc-pyutils is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
yc-pyutils is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with yc-pyutils. If not, see http://www.gnu.org/licenses/.