openhub.net
Black Duck Software, Inc.
Open Hub
dkpro-bigdata
Commits: Listings
Analyzed 12 months ago, based on code collected 12 months ago.
Jan 18, 2023 — Jan 18, 2024
Showing page 2 of 6
Evaluate what needs to be done to use spring-data #3
Tobias Horsmann
over 9 years ago
Evaluate what needs to be done to use spring-data #3 - removed HdfsResource and HdfsResourceLoader - added spring-data dependency
Tobias Horsmann
over 9 years ago
Closes issue #8: Folder "de.tudarmstadt.ukp.dkpro.bigdata" should be removed, pom.xml should be in repo root
noname
over 9 years ago
HdfsResourceLoaderLocator: Initialize does not call super-method #7 - PARAM_FILESYSTEM is public now - added call of super.init(...) in initialize method
Tobias Horsmann
over 9 years ago
Copied google-code frontpage to readme.md
Hans-Peter Zorn
over 9 years ago
- add dkpro.document.language parameter
Johannes Simon
about 10 years ago
override -> overwrite
Johannes Simon
about 10 years ago
- add "dkpro.output.override" parameter. When set to false (default is true), the output folder name is suffixed with a unique integer if it already exists
Johannes Simon
about 10 years ago
- restore MultiLineText2CASInputFormat
Johannes Simon
over 10 years ago
- fix warning by removing unused import
Johannes Simon
over 10 years ago
Revert "Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1."
Johannes Simon
over 10 years ago
- move "replace variables" code to new AnalysisEngineUtil class
Johannes Simon
over 10 years ago
- do not minimize jar, this removes jaxb's com.sun.xml.bind.v2.ContextFactory which is apparently required by Hadoop at runtime (and we explicitly exclude Hadoop classes from the jar which causes the optimizer to remove this class)
Johannes Simon
over 10 years ago
- do not change configuration settings for number of mappers and memory reserved for map/reduce jobs, these should be solely configured by the user or cluster default settings (UKP-cluster related settings are in the wrong place here :-) - handle unsplittable input files correctly in Text2CASInputFormat (logic copied from TextInputFormat; before it tried to split e.g. gzip, which doesn't make sense)
Johannes Simon
over 10 years ago
Fix maven build: replace org.apache.tools.ant.filters.StringInputStream (which was provided by a test-scope dependency) by apache commons' IOUtils.toInputStream(String)
Johannes Simon
over 10 years ago
- remove unused dependency
Johannes Simon
over 10 years ago
Merge Text2CASInputFormat and MultiLineText2CASInputFormat: Allow multi-line CASes for Text2CASInputFormat and set default number of text lines per CAS to 1.
Johannes Simon
over 10 years ago
Use mapreduce.input.fileinputformat.inputdir instead of map.input.dir and map.input.file. Apparently the latter two don't officially exist and are not present in CDH5.
Johannes Simon
over 10 years ago
Really handle URIs correctly in UIMAMapReduceBase
Johannes Simon
over 10 years ago
Correctly handle file:/... URIs in UIMAMapReduceBase
Johannes Simon
over 10 years ago
- Introduce $taskid variable for UIMA XML pipelines
Johannes Simon
over 10 years ago
- fix regression I introduced that caused UIMA configuration variables not to be replaced correctly in some cases (e.g. "$dir"); CasConsumerOutputTest now passes again
Johannes Simon
over 10 years ago
- extend CasConsumerOutputTest
Johannes Simon
over 10 years ago
- usually UIMA output folder is appended by Hadoop task ID ("uima_output_attempt_..."), make this configurable by dkpro.output.onedirpertask which defaults to true. If set to false no task ID is appended and all output is copied to one directory.
Johannes Simon
over 10 years ago
- add $cache variable to allow specification of distributed cache files in UIMA annotator configuration (e.g. "$cache/foo.xml" in combination with "hadoop jar ... -files foo.xml") - add $input variable which contains path to input file or folder (e.g. "hdfs:///path/to/input.txt")
Johannes Simon
over 10 years ago
Create dkpro.bigdata.hadoop fat-jar as dkpro-hadoop-[version].jar
Johannes Simon
over 10 years ago
- exclude hadoop dependencies from bigdata-hadoop fat jar and minimize jar additionally from 35 down to 7 MB
Johannes Simon
over 10 years ago
- add XMLDescriptorRunner that allows to execute UIMA pipelines represented by XML descriptors on Hadoop - add maven shade plugin for creation of fat jars - make *.arc and *.warc files be handled as text files by git (in .gitattributes) - fix bug in UIMAMapReduceBase that caused variables (e.g. "$dir") defined in UIMA XML descriptors not to be handled correctly - add option dkpro.input.maxlinesperrecord for GenericMultiLineRecordReader to configure number of lines read into one record
Johannes Simon
over 10 years ago
- add MultiLineText2CASInputFormat that reads all lines in one split into one CAS (and uses the split.toString() as key)
Johannes Simon
over 10 years ago
Hopefully fixed compatibility to hadoop 1.x
Hans-Peter Zorn
over 10 years ago