
Commits : Listings

Analyzed 12 months ago, based on code collected 12 months ago.
Jan 18, 2023 — Jan 18, 2024
Commit Message | Date
Evaluate what needs to be done to use spring-data #3 | over 9 years ago
Evaluate what needs to be done to use spring-data #3 - removed HdfsResource and HdfsResourceLoader - added spring-data dependency | over 9 years ago
Closes issue #8: Folder "de.tudarmstadt.ukp.dkpro.bigdata" should be removed, pom.xml should be in repo root | over 9 years ago
HdfsResourceLoaderLocator: Initialize does not call super-method #7 - PARAM_FILESYSTEM is public now - added call of super.init(...) in initialize method | over 9 years ago
Copied google-code frontpage to readme.md | over 9 years ago
- add dkpro.document.language parameter | about 10 years ago
override -> overwrite | about 10 years ago
- add "dkpro.output.override" parameter. When set to false (default is true), the output folder name is suffixed with a unique integer if it already exists | about 10 years ago
- restore MultiLineText2CASInputFormat | over 10 years ago
- fix warning by removing unused import | over 10 years ago
Revert "Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1." | over 10 years ago
- move "replace variables" code to new AnalysisEngineUtil class | over 10 years ago
- do not minimize jar, this removes jaxb's com.sun.xml.bind.v2.ContextFactory which is apparently required by Hadoop at runtime (and we explicitly exclude Hadoop classes from the jar which causes the optimizer to remove this class) | over 10 years ago
- do not change configuration settings for number of mappers and memory reserved for map/reduce jobs, these should be solely configured by the user or cluster default settings (UKP-cluster related settings are in the wrong place here :-) - handle unsplittable input files correctly in Text2CASInputFormat (logic copied from TextInputFormat; before it tried to split e.g. gzip, which doesn't make sense) | over 10 years ago
Fix maven build: replace org.apache.tools.ant.filters.StringInputStream (which was provided by a test-scope dependency) by apache commons' IOUtils.toInputStream(String) | over 10 years ago
- remove unused dependency | over 10 years ago
Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1. | over 10 years ago
Use mapreduce.input.fileinputformat.inputdir instead of map.input.dir and map.input.file. Apparently the latter two don't officially exist and are not present in CDH5. | over 10 years ago
Really handle URIs correctly in UIMAMapReduceBase | over 10 years ago
Correctly handle file:/... URIs in UIMAMapReduceBase | over 10 years ago
- Introduce $taskid variable for UIMA XML pipelines | over 10 years ago
- fix regression I introduced that caused UIMA configuration variables not to be replaced correctly in some cases (e.g. "$dir"); CasConsumerOutputTest now passes again | over 10 years ago
- extend CasConsumerOutputTest | over 10 years ago
- usually UIMA output folder is appended by Hadoop task ID ("uima_output_attempt_..."), make this configurable by dkpro.output.onedirpertask which defaults to true. If set to false no task ID is appended and all output is copied to one directory. | over 10 years ago
- add $cache variable to allow specification of distributed cache files in UIMA annotator configuration (e.g. "$cache/foo.xml" in combination with "hadoop jar ... -files foo.xml") - add $input variable which contains path to input file or folder (e.g. "hdfs:///path/to/input.txt") | over 10 years ago
Create dkpro.bigdata.hadoop fat-jar as dkpro-hadoop-[version].jar | over 10 years ago
- exclude hadoop dependencies from bigdata-hadoop fat jar and minimize jar additionally from 35 down to 7 MB | over 10 years ago
- add XMLDescriptorRunner that allows to execute UIMA pipelines represented by XML descriptors on Hadoop - add maven shade plugin for creation of fat jars - make *.arc and *.warc files be handled as text files by git (in .gitattributes) - fix bug in UIMAMapReduceBase that caused variables (e.g. "$dir") defined in UIMA XML descriptors not to be handled correctly - add option dkpro.input.maxlinesperrecord for GenericMultiLineRecordReader to configure number of lines read into one record | over 10 years ago
- add MultiLineText2CASInputFormat that reads all lines in one split into one CAS (and uses the split.toString() as key) | over 10 years ago
Hopefully fixed compatibility to hadoop 1.x | over 10 years ago