Parse CDX response properly and skip failed matches. |
|
over 6 years ago
|
Add option to configure whether to seek to the beginning of the incoming-URL queue. |
|
over 6 years ago
|
Fix bug in scoping logic. |
|
over 6 years ago
|
Fix missing bracket. |
|
over 6 years ago
|
Do not record quota-exceeded events in the persist store. |
|
over 6 years ago
|
Add a filter for pathological redirect loops. |
|
over 6 years ago
|
Make filtering outlinks by scope an option. |
|
over 6 years ago
|
Switch to rely on dynamic configuration. Reduce partition fetch size. |
|
over 6 years ago
|
Add detail counts to logging. |
|
over 6 years ago
|
Emit info on queue consumption. |
|
over 6 years ago
|
Longer timeout when talking to OutbackCDX. |
|
over 6 years ago
|
Do not create topics automatically. |
|
over 6 years ago
|
Prevent GeoIP inclusion overriding crawl constraints. |
|
over 6 years ago
|
Cope with URLs ending with asterisk. |
|
over 6 years ago
|
Fix topic creation. |
|
over 6 years ago
|
Handle DNS case. |
|
over 6 years ago
|
Ensure separate host folders. |
|
over 6 years ago
|
Hash the key appropriately, clean up config. |
|
over 6 years ago
|
Hash the classKey for better distribution across partitions. |
|
over 6 years ago
|
Added OWB for manual checking and improved the launch test. |
|
over 6 years ago
|
Added automated testing framework. |
|
over 6 years ago
|
Revert to old name to be internally consistent. |
|
over 6 years ago
|
Ensure SIGTERM is trapped and attempt clean shutdown. |
|
over 6 years ago
|
Allow job name and crawl name to be separate. Minor tweaks. |
|
over 6 years ago
|
Added CompressibilityDecideRule but commented out for now. |
|
over 6 years ago
|
More overrides and Domain Crawl settings included. |
|
over 6 years ago
|
Refactor to make code easier to understand. |
|
over 6 years ago
|
Improve test. |
|
over 6 years ago
|
Added initial implementation of module designed to ensure the GOV.UK Content API gets crawled. |
|
over 6 years ago
|
Tune logging. |
|
over 6 years ago
|