Tags : Browse Projects

Zeitcrawler

  Analyzed about 1 month ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.
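
The crawl loop described above (start from seed links, fetch each page, extract its text, queue newly found links) can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the names `PageParser`, `crawl`, `fetch` and `max_pages` are hypothetical, and it uses only the Python standard library. The `fetch` callable is an assumption that stands in for whatever HTTP client a real crawler would use.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and link targets from one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not article text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def crawl(seeds, fetch, max_pages=10):
    """Breadth-first crawl returning {url: plain_text}.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or None when the page cannot be retrieved.
    """
    queue = deque(seeds)
    seen = set(seeds)
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        texts[url] = "\n".join(parser.text_parts)
        # Queue links that have not been seen yet, resolved against the page URL.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return texts
```

Passing `fetch` in as a parameter keeps the sketch runnable offline, for example with a dictionary's `get` method over canned pages. A real crawler would also restrict links to the newspaper's domain and write each text to a file.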

1.64K lines of code

0 current contributors

almost 11 years since last commit

0 users on Open Hub

Activity Not Available

Équipe Crawler

  Analyzed about 1 month ago

A specialized crawler for the French sport newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

401 lines of code

0 current contributors

over 12 years since last commit

0 users on Open Hub

Activity Not Available

German Political Speeches Corpus-Builder

  Analyzed about 1 month ago

Tools to crawl repositories of official German speeches in order to build a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available at http://purl.org/corpus/german-speeches

1.08K lines of code

0 current contributors

over 11 years since last commit

0 users on Open Hub

Activity Not Available

scalar2

  Analyzed about 1 month ago

Born-digital, open source, media-rich scholarly publishing that’s as easy as blogging.

470K lines of code

8 current contributors

2 months since last commit

0 users on Open Hub

Activity Not Available