Documentation for yc-pyutils 0.1

This documentation describes the Python modules that I use frequently in my research in Natural Language Processing.

Python modules

Below is a list of Python modules in this package.

Module      Description
bagofwords  This module contains classes and methods that are useful when handling bag-of-words data structures.
bigvocab    This module contains classes and methods for handling the vocabulary of corpora with millions of tokens.
bleu        This module contains methods for computing the BLEU precision/recall from two given lists of tokens.
corpus      This module contains classes and methods that are useful when handling corpora.
tfidf       This module contains classes and methods for computing TF-IDF values.
tokenize    This module contains functions to split a piece of text into individual sentences and individual words.
tsvio       This module contains the TSVFile class, which handles reading from and writing to tab-separated values (TSV) files.
urls        This module contains classes and methods that are useful when trying to download data from the web.
misc        This module contains miscellaneous functions that don’t really fall under any category.
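
To make the bag-of-words and TF-IDF terminology above concrete, here is a minimal sketch using only the Python standard library. It does not use the yc-pyutils API: the function names bag_of_words and tfidf are hypothetical, and the weighting (raw term count times log(N / df)) is one common choice assumed for illustration, which may differ from what the tfidf module computes.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each token to the number of times it occurs (hypothetical helper)."""
    return Counter(tokens)


def tfidf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed weighting: raw term count times log(N / df), where N is the
    number of documents and df is the number of documents containing the
    token. Other schemes exist; this is only an illustration.
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for bag in bags for token in bag)
    return [
        {token: count * math.log(n_docs / df[token]) for token, count in bag.items()}
        for bag in bags
    ]


docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
weights = tfidf(docs)
```

Tokens that occur in every document (here "the" and "sat") receive a weight of zero under this scheme, while tokens unique to one document receive the highest weights.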

Useful scripts

Here are some useful scripts for performing common NLP preprocessing tasks. These scripts depend heavily on the modules and classes above. They can be found in the scripts/ directory of the package.

Script            Description
prune-vocab.py    This script prunes the vocabulary file according to criteria specified on the command line.
tokenize-docs.py  This script tokenizes large collections of text using ycutils.tokenize.
vocab-stats.py    This script displays statistics about the given ycutils.corpus.CorpusVocabulary file.
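
As a rough illustration of what vocabulary pruning involves, the sketch below filters a token-count table in plain Python. It is not the implementation of prune-vocab.py: the function name prune_vocab and the two criteria shown (a minimum count and a maximum vocabulary size) are assumptions made for this example, and the real script may accept different options.

```python
from collections import Counter


def prune_vocab(counts, min_count=1, max_size=None):
    """Keep tokens seen at least min_count times, most frequent first.

    Both thresholds are hypothetical criteria chosen for illustration.
    """
    kept = [(token, n) for token, n in counts.most_common() if n >= min_count]
    if max_size is not None:
        # Truncate to the max_size most frequent tokens.
        kept = kept[:max_size]
    return dict(kept)


counts = Counter("a b a c a b d".split())
pruned = prune_vocab(counts, min_count=2)
```

In this example only the tokens "a" and "b" survive, because "c" and "d" each occur once.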

License

yc-pyutils is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

yc-pyutils is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with yc-pyutils. If not, see http://www.gnu.org/licenses/.