Commits : Listings

Analyzed about 1 year ago, based on code collected about 1 year ago.
Jan 20, 2023 — Jan 20, 2024
Commit Message | Date
cosmetic: added some unhelpful docstrings | almost 15 years ago
cosmetic | almost 15 years ago
worker.Crawler: using nullary constructor for connections PoolMap | almost 15 years ago
rewrote manager and storage: both now encapsulate their state in classes. CouchDB attachments are used to store crawled content. | almost 15 years ago
data.PoolMap rewritten to use Cache instead of custom timer management | almost 15 years ago
data.Cache: stop_timer now does one lookup in dict using `pop` method. | almost 15 years ago
dns: catch socket.herror | almost 15 years ago
worker.Crawler: major change: now reporting fetch_time in integral milliseconds (was: float seconds) | almost 15 years ago
Custom profiler. Just prints time to log. | almost 15 years ago
updated links to new online documentation | almost 15 years ago
manager: prefetching many small chunks of URLs in separate green thread | almost 15 years ago
added documentation | almost 15 years ago
manager: caching given items to update them w/o getting first | almost 15 years ago
worker.cli_crawl: Ctrl+C performs graceful stop (waits for running crawls to finish) | almost 15 years ago
manager: "postreport buffering": accumulate some reports and then save them in bulk | almost 15 years ago
dns: catch TypeError. | almost 15 years ago
worker: catch all errors during robots check | almost 15 years ago
worker: explicit catching of httplib.BadStatusLine, PageParseError | almost 15 years ago
Using DNS cache. Tests fixed accordingly. | almost 15 years ago
worker.Crawler: added exception log points which need more detail | almost 15 years ago
worker.Crawler.fetch: proper handling of DnsError | almost 15 years ago
worker.Crawler: small str.replace() fix: should replace only 1 occurrence of host in URI | almost 15 years ago
worker.Crawler: refactored _process w/o sending report to separate function for simpler 'return report' statements | almost 15 years ago
worker.tests: critical fix: make_http_response returns proper httplib2 Response | almost 15 years ago
Caching DNS resolver implementation. Currently using thread-pooled socket.gethostbyname_ex. | almost 15 years ago
data: new data structure: Cache. | almost 15 years ago
worker.Crawler: using PoolMap.getc for shortening ask_robots | almost 15 years ago
data.PoolMap: new `.getc` method acts like `.get` but returns a contextmanager, suitable for `with` statements | almost 15 years ago
worker.cli_crawl: heroshi-crawl now supports URIs on the command line | almost 15 years ago
cosmetic | almost 15 years ago
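
Several of the commits above mention a `PoolMap` whose `.getc` method acts like `.get` but returns a context manager. The listing does not show the project's code, so the following is only a minimal sketch of that pattern, not the project's implementation. The class body, the `put` method, the internal attribute names, and the nullary `factory` argument are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from contextlib import contextmanager


class PoolMap:
    """Hypothetical sketch: maps a key to a pool of reusable items."""

    def __init__(self, factory):
        # Assumption: `factory` is a nullary callable that builds a new item.
        self._factory = factory
        self._pools = defaultdict(deque)  # key -> idle items

    def get(self, key):
        """Take an idle item for `key`, or build a new one if none is idle."""
        pool = self._pools[key]
        return pool.pop() if pool else self._factory()

    def put(self, key, item):
        """Return an item to the pool for `key`."""
        self._pools[key].append(item)

    @contextmanager
    def getc(self, key):
        """Like `get`, but hands the item back to the pool on block exit."""
        item = self.get(key)
        try:
            yield item
        finally:
            self.put(key, item)


# Usage: the item is returned to the pool even if the block raises.
pool = PoolMap(factory=list)
with pool.getc("example.com") as conn:
    conn.append("request")
```

The `try`/`finally` inside `getc` is what lets a caller drop the explicit get/put pair, which is presumably why a caller such as `ask_robots` could be shortened by using it.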