Tags: Browse Projects

Select a tag to browse associated projects and drill deeper into the tag cloud.

  • corpus_linguistics (18)

    Text Encoding Initiative

      Analyzed 27 days ago

    The TEI is an international and interdisciplinary community-based open standard used by research projects, libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.

    553K lines of code

    14 current contributors

    2 months since last commit

    3 users on Open Hub

    Moderate Activity
    5.0
     

    RelEx Semantic Relationship Extractor

      Analyzed 28 days ago

    RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon Link Grammar parser. It can identify dependency-grammar dependencies, such as subject, object, indirect object and many other relationships between words in a sentence. It can also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. RelEx also provides semantic relationship framing, similar to that of FrameNet.

    11.8K lines of code

    4 current contributors

    3 months since last commit

    2 users on Open Hub

    Very Low Activity
    0.0
     

    LexAt Lexical/Corpus Statistics

      No analysis available

    LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (the Mihalcea algorithm).

    0 lines of code

    0 current contributors

    0 since last commit

    1 user on Open Hub

    Activity Not Available
    0.0
     
    Mostly written in language not available
    Licenses: apache_2
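
    The mutual-information statistic mentioned in the LexAt description can be illustrated with a minimal sketch. This is not LexAt's code: it keeps counts in memory rather than in an SQL database, it considers only adjacent word pairs, and the function name `pmi_table` is hypothetical.

```python
import math
from collections import Counter


def pmi_table(tokens):
    """Return {(left, right): pointwise mutual information in bits}
    for every adjacent word pair in `tokens`.

    Assumption: probabilities are plain relative frequencies, with no
    smoothing, so rare pairs get noisy scores.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    table = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        # PMI compares the observed pair probability with what
        # independence of the two words would predict.
        table[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return table
```

    A pair that co-occurs more often than its word frequencies predict gets a positive score, which is the kind of signal a parse-ranking step could draw on.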

    opencorpora

      Analyzed 28 days ago

    An engine for creating and annotating textual corpora

    38.6K lines of code

    3 current contributors

    over 1 year since last commit

    1 user on Open Hub

    Very Low Activity
    0.0
     

    porter-stem.vim

      Analyzed 28 days ago

    An implementation of the Porter stemming algorithm in Vim script. See https://www.ohloh.net/p/stem-search-vim for a script that makes use of this.

    205 lines of code

    0 current contributors

    over 8 years since last commit

    0 users on Open Hub

    Inactive
    0.0
     

    stem-search.vim

      Analyzed 27 days ago

    StmSrch is a reverse-stem searching script. It implements the Porter stemming algorithm, by Martin Porter. It also handles irregular verbs and noun pluralizations. This script can be useful for searching or scanning through corpus files. Each word input to the :StmSrch command will be stemmed and then formulated in such a way as to match possible conjugations or pluralizations. Without any word given for input, it will attempt to stem the current word under the cursor. The matching is done using word boundaries so not just any substring will match. For example, :StmSrch searcher will match any of: search, searching, searches, searchers, searched, ... A string of words will work as well, matching in order: :StmSrch thieves are running from bunnies will match strings of word

    308 lines of code

    0 current contributors

    almost 15 years since last commit

    0 users on Open Hub

    Inactive
    0.0
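
    The reverse-stem matching described for stem-search.vim can be sketched roughly as follows. This is an illustration under loud assumptions, not the script's implementation: `crude_stem` strips one suffix from a tiny hand-picked list and is far cruder than the real Porter algorithm, irregular forms such as "thieves" are not handled, and both function names are hypothetical.

```python
import re

# Assumption: a tiny suffix list standing in for real Porter stemming.
# Longer suffixes come first so that "ers" is tried before "s".
SUFFIXES = ("ers", "ing", "es", "ed", "er", "s")


def crude_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_pattern(phrase):
    """Compile a regex that matches, in order, each word's stem followed by
    an optional known suffix, anchored on word boundaries."""
    alternation = "|".join(SUFFIXES)
    parts = [
        rf"\b{re.escape(crude_stem(word))}(?:{alternation})?\b"
        for word in phrase.split()
    ]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)
```

    With this sketch, `stem_pattern("searcher")` matches "search", "searching", and "searched" but not "searchlight", because the closing word boundary rejects arbitrary continuations of the stem.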
     

    He Kupu Tawhito

      Analyzed 28 days ago

    979 lines of code

    1 current contributor

    over 5 years since last commit

    0 users on Open Hub

    Inactive
    5.0
     

    Zeitcrawler

      Analyzed 28 days ago

    A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

    1.64K lines of code

    0 current contributors

    about 11 years since last commit

    0 users on Open Hub

    Inactive
    0.0
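
    The crawl loop described for Zeitcrawler (fetch a page, strip the text out of the HTML, queue newly found links) might look roughly like the sketch below. It is not the project's code: it uses only the Python standard library, takes a caller-supplied `fetch` function instead of doing network access, keeps text in a dictionary rather than writing raw text files, and all names (`ArticleParser`, `parse_page`, `crawl`) are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleParser(HTMLParser):
    """Collect visible text and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html, base_url):
    """Return (plain_text, absolute_links) for one page of HTML."""
    parser = ArticleParser(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def crawl(start_urls, fetch, max_pages=10):
    """Breadth-first crawl. `fetch` is a caller-supplied function mapping a
    URL to its HTML; at most `max_pages` pages are processed."""
    queue = deque(start_urls)
    seen = set(start_urls)
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        texts[url], links = parse_page(fetch(url), url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return texts
```

    Passing `fetch` in as a parameter keeps the sketch testable offline. A real crawler would also restrict links to the newspaper's domain and rate-limit its requests.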
     

    Équipe Crawler

      Analyzed 27 days ago

    A specialized crawler for the French sport newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

    401 lines of code

    0 current contributors

    over 12 years since last commit

    0 users on Open Hub

    Inactive
    0.0
     

    German Political Speeches Corpus-Builder

      Analyzed 28 days ago

    Tools to crawl repositories of official German speeches in order to gather a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available here: http://purl.org/corpus/german-speeches

    1.08K lines of code

    0 current contributors

    over 11 years since last commit

    0 users on Open Hub

    Inactive
    0.0
     