Projects tagged ‘corpora’

Natural Language Toolkit (NLTK)

Analyzed 11 months ago

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data, and documentation for research and development in natural language processing. It supports dozens of NLP tasks and has distributions for Windows, Mac OS X, and Linux.

234K lines of code

42 current contributors

almost 1 year since last commit

45 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Python

Licenses: apache_2

Treex - NLP Framework

Analyzed 11 months ago

Treex (formerly TectoMT) is a highly modular NLP software system implemented in the Perl programming language under Linux. It is primarily aimed at machine translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped to ...

242K lines of code

4 current contributors

12 months since last commit

4 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Perl

Licenses: artistic_gpl

Tags classifier computational_linguistics coreferenceresolution corpora grammar linguistics machine_learning natural_language natural_language_processing nlp parser part_of_speech 6 more...

krdwrd

Analyzed 11 months ago

Use the internet as a linguistic corpus: provide tools and infrastructure for acquisition, visual annotation, merging, and storage of web pages as parts of bigger corpora. Develop a classification engine that learns to annotate pages automatically, and provide visual tools for inspecting the results.

3.35K lines of code

1 current contributor

over 5 years since last commit

2 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in TeX/LaTeX

Licenses: gpl

Tags addon corpora corpus firefox linguistics machine_learning xul

moses-for-mere-mortals

Analyzed 11 months ago

This site offers a set of Bash scripts and Windows executable add-ins that, together, create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system. The idea is to help build a translation chain for the real ...

7.21K lines of code

0 current contributors

almost 5 years since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in Perl

Licenses: gpl3_or_l...

Tags bash corpora irstlm mgiza moses mt nlp python randlm scripts smt tmx

LexAt Lexical/Corpus Statistics

No analysis available

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (Mihalcea algorithm).

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags computational_linguistics corpora corpus corpus_linguistics database java linguistics natural_language natural_language_processing nlp opencog perl 1 more...

Affisix

No analysis available

Affisix is a program for automatic recognition of affixes. It takes a large number of words and, according to the user's settings, tries to determine which segments of these words are prefixes.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: gpl3

Tags computational_linguistics corpora education language linguistics natural_language natural-language-processing natural_language_processing nlp research science scientific_computing 1 more...

Ruby LinkParser

Analyzed 11 months ago

A high-level interface to the CMU Link Grammar. This binding wraps the link-grammar shared library provided by the AbiWord project for their grammar-checker.

2.41K lines of code

1 current contributor

almost 2 years since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in C

Licenses: bsd

Tags classifier cmu computational_linguistics corpora grammar information_retrieval language linguistics machine_learning natural_language natural_language_processing nlp 5 more...

opencorpora

Analyzed 11 months ago

An engine for creating and annotating textual corpora

38.6K lines of code

3 current contributors

over 1 year since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in PHP

Licenses: gpl

Tags computational_linguistics corpora corpus corpus_linguistics crowdsourcing disambiguation linguistics natural-language-processing natural_language_processing nlp part_of_speech russian 1 more...

CSniper

Analyzed 11 months ago

CSniper (Corpus Sniper) is a tool that implements (i) a web-based multi-user scenario for identifying and annotating non-canonical grammatical constructions in large corpora based on linguistic queries and (ii) evaluation of annotation quality by measuring inter-rater agreement. This ...

23.6K lines of code

0 current contributors

about 3 years since last commit

0 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags annotation corpora java linguistics search uima uimafit

CorpusCatcher

Claimed by Translate

Analyzed 11 months ago

CorpusCatcher is a corpus collection toolset. It can help you build language-specific or topic-specific corpora from publicly available web resources. This can be useful for many purposes, especially as data for building spell checkers.

813 lines of code

0 current contributors

almost 13 years since last commit

0 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Python

Licenses: gpl

Tags corpora corpus corpus_linguistics language linguistics multi-platform natural-language-processing python spell spellchecker tools
