Commits : Listings

Analyzed about 1 year ago, based on code collected about 1 year ago.
Jan 20, 2023 — Jan 20, 2024
Commit Message | Date
extracted HEROSHI_VERSION and REAL_USER_AGENT from heroshi/__init__.py to settings | almost 15 years ago
renamed heroshi.worker.worker to heroshi.worker.Crawler because it honestly contains only Crawler class | almost 15 years ago
README: added repo URL | almost 15 years ago
moved configs to separate `etc` directory | almost 15 years ago
fixed imports in worker.tests, but the tests are still NOT fixed | almost 15 years ago
cosmetic: annotate unused options in cli_append | almost 15 years ago
cosmetic: changed format of shared.error.Error str(), unicode() and repr() | almost 15 years ago
cosmetic: removed unused BIND_PORT from shared package | almost 15 years ago
`api.report_results` now accepts a single item and is thus renamed to `api.report_result` | almost 15 years ago
cosmetic: using package-relative imports | almost 15 years ago
cosmetic: removed unused random_useragent() | almost 15 years ago
`len() == 0` instead of `len() is 0` | almost 15 years ago
added contact info into REAL_USER_AGENT | almost 15 years ago
cosmetic: removed unused keys from config | almost 15 years ago
cosmetic: imports sorted | almost 15 years ago
worker: unicode a bit of logging. There were unicode errors while logging. This fix should aid those errors. | almost 15 years ago
worker: reraising KeyboardInterrupt so worker gets stopped even if timing was so exception raised inside conn.request or page.parse | almost 15 years ago
worker: extracted setting report['visited'] to one place | almost 15 years ago
manager: using new-random view to get random urls across all dataset | almost 15 years ago
fix: worker: was always passing max_queue_size to get_crawl_queue instead of only remainder to-become-full | almost 15 years ago
worker: extracted full queue pause value to config | almost 15 years ago
worker: added socket timeout to crawling | almost 15 years ago
manager: removed the local in-memory NEW_URLS queue | almost 15 years ago
manager: removed document 'given' sharing lock | almost 15 years ago
worker: reporting mechanics changed to report each URL just after it was crawled, reports buffer removed | almost 15 years ago
shared.api uses Factory pool of Http() to cache connections to manager | almost 15 years ago
data: FactoryPool is also exported | almost 15 years ago
cosmetic: removed unused imports | almost 15 years ago
CouchDB queue view renamed from 'not-given' to 'new' | almost 15 years ago
fix: wrong use of class attributes | almost 15 years ago
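
The `len() == 0` instead of `len() is 0` entry above touches a general Python point: `is` compares object identity, while `==` compares values, so a length check should use `==` (or plain truthiness). A minimal sketch of that distinction follows. The helper names `queue_is_empty` and `queue_is_empty_by_truthiness` are hypothetical and are not taken from the Heroshi source.

```python
def queue_is_empty(items):
    """Return True when `items` has no elements (hypothetical helper)."""
    # Compare the length by value. An identity test against 0 would only
    # work by accident of how an interpreter happens to cache small integers.
    return len(items) == 0


def queue_is_empty_by_truthiness(items):
    """Same check, relying on the fact that empty containers are falsy."""
    return not items


if __name__ == "__main__":
    print(queue_is_empty([]))
    print(queue_is_empty(["http://example.com/"]))
```

Both helpers give the same answer for lists, tuples, dicts and sets; the truthiness form is the more common idiom, while the explicit `== 0` form mirrors the wording of the commit message.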