dkpro-bigdata

Activity Not Available

Commits: Listings

Analyzed 12 months ago, based on code collected 12 months ago.

Jan 18, 2023 to Jan 18, 2024 (showing page 2 of 6)

Commit Message	Contributor	Date
Evaluate what needs to be done to use spring-data #3	Tobias Horsmann	over 9 years ago
Evaluate what needs to be done to use spring-data #3 - removed HdfsResource and HdfsResourceLoader - added spring-data dependency	Tobias Horsmann	over 9 years ago
Closes issue #8: Folder "de.tudarmstadt.ukp.dkpro.bigdata" should be removed, pom.xml should be in repo root	noname	over 9 years ago
HdfsResourceLoaderLocator: Initialize does not call super-method #7 - PARAM_FILESYSTEM is public now - added call of super.init(...) in initialize method	Tobias Horsmann	over 9 years ago
Copied google-code frontpage to readme.md	Hans-Peter Zorn	over 9 years ago
- add dkpro.document.language parameter	Johannes Simon	about 10 years ago
override -> overwrite	Johannes Simon	about 10 years ago
- add "dkpro.output.override" parameter. When set to false (default is true), the output folder name is suffixed with a unique integer if it already exists	Johannes Simon	about 10 years ago
- restore MultiLineText2CASInputFormat	Johannes Simon	over 10 years ago
- fix warning by removing unused import	Johannes Simon	over 10 years ago
Revert "Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1."	Johannes Simon	over 10 years ago
- move "replace variables" code to new AnalysisEngineUtil class	Johannes Simon	over 10 years ago
- do not minimize jar, this removes jaxb's com.sun.xml.bind.v2.ContextFactory which is apparently required by Hadoop at runtime (and we explicitly exclude Hadoop classes from the jar which causes the optimizer to remove this class)	Johannes Simon	over 10 years ago
- do not change configuration settings for number of mappers and memory reserved for map/reduce jobs, these should be solely configured by the user or cluster default settings (UKP-cluster related settings are in the wrong place here :-) - handle unsplittable input files correctly in Text2CASInputFormat (logic copied from TextInputFormat; before it tried to split e.g. gzip, which doesn't make sense)	Johannes Simon	over 10 years ago
Fix maven build: replace org.apache.tools.ant.filters.StringInputStream (which was provided by a test-scope dependency) by apache commons' IOUtils.toInputStream(String)	Johannes Simon	over 10 years ago
- remove unused dependency	Johannes Simon	over 10 years ago
Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1.	Johannes Simon	over 10 years ago
Use mapreduce.input.fileinputformat.inputdir instead of map.input.dir and map.input.file. Apparently the latter two don't officially exist and are not present in CDH5.	Johannes Simon	over 10 years ago
Really handle URIs correctly in UIMAMapReduceBase	Johannes Simon	over 10 years ago
Correctly handle file:/... URIs in UIMAMapReduceBase	Johannes Simon	over 10 years ago
- Introduce $taskid variable for UIMA XML pipelines	Johannes Simon	over 10 years ago
- fix regression I introduced that caused UIMA configuration variables not to be replaced correctly in some cases (e.g. "$dir"); CasConsumerOutputTest now passes again	Johannes Simon	over 10 years ago
- extend CasConsumerOutputTest	Johannes Simon	over 10 years ago
- usually UIMA output folder is appended by Hadoop task ID ("uima_output_attempt_..."), make this configurable by dkpro.output.onedirpertask which defaults to true. If set to false no task ID is appended and all output is copied to one directory.	Johannes Simon	over 10 years ago
- add $cache variable to allow specification of distributed cache files in UIMA annotator configuration (e.g. "$cache/foo.xml" in combination with "hadoop jar ... -files foo.xml") - add $input variable which contains path to input file or folder (e.g. "hdfs:///path/to/input.txt")	Johannes Simon	over 10 years ago
Create dkpro.bigdata.hadoop fat-jar as dkpro-hadoop-[version].jar	Johannes Simon	over 10 years ago
- exclude hadoop dependencies from bigdata-hadoop fat jar and minimize jar additionally from 35 down to 7 MB	Johannes Simon	over 10 years ago
- add XMLDescriptorRunner that allows executing UIMA pipelines represented by XML descriptors on Hadoop - add maven shade plugin for creation of fat jars - make .arc and .warc files be handled as text files by git (in .gitattributes) - fix bug in UIMAMapReduceBase that caused variables (e.g. "$dir") defined in UIMA XML descriptors not to be handled correctly - add option dkpro.input.maxlinesperrecord for GenericMultiLineRecordReader to configure number of lines read into one record	Johannes Simon	over 10 years ago
- add MultiLineText2CASInputFormat that reads all lines in one split into one CAS (and uses the split.toString() as key)	Johannes Simon	over 10 years ago
Hopefully fixed compatibility to hadoop 1.x	Hans-Peter Zorn	over 10 years ago