
Commits : Listings

Analyzed about 1 year ago, based on code collected about 1 year ago.
Jan 16, 2023 — Jan 16, 2024
Commit Message | Date
NUTCH-3024 Remove flaky 'dependency check' target (#795) | about 1 year ago
Merge pull request #796 from DigitalPebble/NUTCH-3025 | about 1 year ago
Merged changes from master; improved Javadoc and exception handling | about 1 year ago
Merge branch 'NUTCH-3017', closes #793 | about 1 year ago
[NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - use Hadoop-provided compression codecs - update description of property urlfilter.fast.file | about 1 year ago
Added filtering on whole string + documented config in nutch-default + fixed tests | about 1 year ago
NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) | about 1 year ago
NUTCH-3019 -- update Tika (#797) | about 1 year ago
[NUTCH-3025] urlfilter-fast to filter based on the length of the URL | about 1 year ago
NUTCH-3014 Standardize Job names (#789) | about 1 year ago
[NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input | about 1 year ago
NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) | about 1 year ago
NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) | about 1 year ago
NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779) | about 1 year ago
Merge pull request #776 from tballison/NUTCH-2959 | over 1 year ago
NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents - fall back to UTF-8 when stringifying the content of unparsed documents | over 1 year ago
update howto_upgrade_tika.txt | over 1 year ago
Working now locally and with Seb's single_node_cluster tests | over 1 year ago
Merge remote-tracking branch 'upstream/master' into NUTCH-2959 | over 1 year ago
NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) | over 1 year ago
NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean | over 1 year ago
NUTCH-2897 Do not supress deprecated API warnings - deprecate constructor of NutchJob - remove deprocated call to Object.finalize() from Plugin.finalize() | over 1 year ago
NUTCH-3010 Injector: count unique number of injected URLs - add counter urls_injected_unique - improve log messages reporting the counts of injected/merged URLs | over 1 year ago
NUTCH-3009 Upgrade to Hadoop 3.3.6 | over 1 year ago
NUTCH-3007 Fix impossible casts - remove code blocks (else clauses) unneeded and containing impossible casts | over 1 year ago
NUTCH-2852 SpotBugs: Method invokes System.exit(...) - remove all calls of System.exit(...) in methods except main(args) of various "checker" tools | over 1 year ago
Merge pull request #778 from tballison/NUTCH-3004 | over 1 year ago
NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..." | over 1 year ago
NUTCH-2959 -- downgrade commons-io to match the version we expect to come out with Hadoop 3.4.0. | over 1 year ago
NUTCH-2959 -- bump commons-io | over 1 year ago