Tags : Browse Projects


Natural Language Toolkit (NLTK)


  Analyzed 8 days ago

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data, and documentation for research and development in natural language processing. It supports dozens of NLP tasks and is distributed for Windows, Mac OS X, and Linux.

235K lines of code

42 current contributors

4 months since last commit

45 users on Open Hub

Low Activity

Text Encoding Initiative


  Analyzed 8 days ago

The TEI is an international and interdisciplinary community-based open standard used by research projects, libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.

553K lines of code

14 current contributors

about 1 month since last commit

3 users on Open Hub

Moderate Activity



  Analyzed 8 days ago

Use the internet as a linguistic corpus: provide tools and infrastructure for the acquisition, visual annotation, merging, and storage of web pages as parts of bigger corpora. Develop a classification engine that learns to annotate pages automatically, and provide visual tools for inspecting the results.

3.35K lines of code

1 current contributor

over 5 years since last commit

2 users on Open Hub


Atomic (multi-level annotation)


  Analyzed 8 days ago

Software for multi-level annotation of linguistic corpora

17K lines of code

0 current contributors

about 8 years since last commit

1 user on Open Hub




  Analyzed 7 days ago

Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available
Mostly written in language not available
Licenses: gpl

IMS Open Corpus Workbench


  Analyzed 7 days ago

The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

282K lines of code

2 current contributors

6 months since last commit

1 user on Open Hub

Very Low Activity
Licenses: No declared licenses

LexAt Lexical/Corpus Statistics


  No analysis available

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (the Mihalcea algorithm).

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available
Mostly written in language not available
Licenses: apache_2



  Analyzed 8 days ago

An engine for creating and annotating textual corpora

38.6K lines of code

3 current contributors

over 1 year since last commit

1 user on Open Hub

Very Low Activity



  Analyzed 8 days ago

StmSrch is a reverse-stem searching script. It implements the Porter stemming algorithm by Martin Porter, and it also handles irregular verbs and noun pluralizations. The script can be useful for searching or scanning through corpus files. Each word passed to the :StmSrch command is stemmed and then formulated in such a way as to match possible conjugations or pluralizations. If no word is given, it attempts to stem the word under the cursor. Matching uses word boundaries, so not just any substring will match. For example, ":StmSrch searcher" will match any of: search, searching, searches, searchers, searched, ... A string of words works as well, matching in order: ":StmSrch thieves are running from bunnies" will match strings of word ...

308 lines of code

0 current contributors

almost 15 years since last commit

0 users on Open Hub
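
The reverse-stem matching described for StmSrch can be illustrated with a rough Python sketch. This is not the script's own code: `crude_stem`, `SUFFIXES`, and `stem_search_pattern` are hypothetical names, and the suffix stripper is a deliberately simplified stand-in for the Porter algorithm that does not handle irregular verbs or plurals.

```python
import re

# Hypothetical, simplified suffix list. The real Porter algorithm applies
# several ordered rule phases and is considerably more careful than this.
SUFFIXES = ("ers", "ing", "ed", "er", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_search_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching each word's stem plus any ending, in order."""
    parts = [
        r"\b" + re.escape(crude_stem(word)) + r"\w*\b"
        for word in phrase.split()
    ]
    # Words must appear in the given order, separated by whitespace.
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


pattern = stem_search_pattern("searcher")
hits = [w for w in ["search", "searching", "searched", "research"]
        if pattern.search(w)]
```

Because each stem is anchored with `\b`, "research" is not a hit for "searcher" in this sketch, which mirrors the word-boundary behaviour the description mentions.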




  Analyzed 8 days ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

1.64K lines of code

0 current contributors

about 11 years since last commit

0 users on Open Hub
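
The step of stripping article text out of its HTML formatting might look roughly like the following Python sketch, which uses only the standard library. It is an illustration under assumptions, not the crawler's actual code: `TextExtractor` and `html_to_text` are hypothetical names, and nothing here is specific to the markup of 'Die Zeit'.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the text fragments of an HTML document, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A real crawler would also need to select the article body and discard navigation and advertising text, which this sketch does not attempt.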
