Open Hub
heroshi
Activity Not Available
Commits: Listings
Analyzed about 1 year ago, based on code collected about 1 year ago.
Jan 20, 2023 — Jan 20, 2024
Showing page 6 of 9
Commit Message
Contributor
Files Modified
Lines Added
Lines Removed
Code Location
Date
cosmetic. added some not helpful docstrings
Sergey Shepelev
almost 15 years ago
cosmetic
Sergey Shepelev
almost 15 years ago
worker.Crawler: using nullary constructor for connections PoolMap
Sergey Shepelev
almost 15 years ago
rewrote manager and storage: both now encapsulate their state in classes. CouchDB attachments are used to store crawled content.
Sergey Shepelev
almost 15 years ago
data.PoolMap rewritten to use Cache instead of custom timer management
Sergey Shepelev
almost 15 years ago
data.Cache: stop_timer now does one lookup in dict using `pop` method.
Sergey Shepelev
almost 15 years ago
dns: catch socket.herror
Sergey Shepelev
almost 15 years ago
worker.Crawler: major change: now reporting fetch_time in integral milliseconds. (Was: float seconds)
Sergey Shepelev
almost 15 years ago
Custom profiler. Just prints time to log.
Sergey Shepelev
almost 15 years ago
updated links to new online documentation
Sergey Shepelev
almost 15 years ago
manager: prefetching many small chunks of URLs in separate green thread
Sergey Shepelev
almost 15 years ago
added documentation
Sergey Shepelev
almost 15 years ago
manager: caching given items to update them w/o getting first
Sergey Shepelev
almost 15 years ago
worker.cli_crawl: Ctrl+C performs graceful stop (waits for running crawls to finish)
Sergey Shepelev
almost 15 years ago
manager: "postreport buffering": accumulate some reports and then save them in bulk
Sergey Shepelev
almost 15 years ago
dns: catch TypeError.
Sergey Shepelev
almost 15 years ago
worker: catch all errors during robots check
Sergey Shepelev
almost 15 years ago
worker: explicit catching of httplib.BadStatusLine, PageParseError
Sergey Shepelev
almost 15 years ago
Using DNS cache. Tests fixed accordingly.
Sergey Shepelev
almost 15 years ago
worker.Crawler: added exception log points which still need more detail
Sergey Shepelev
almost 15 years ago
worker.Crawler.fetch: proper handling of DnsError
Sergey Shepelev
almost 15 years ago
worker.Crawler: small str.replace() fix: should replace only 1 occurrence of host in URI
Sergey Shepelev
almost 15 years ago
worker.Crawler: refactored _process w/o sending report to separate function for simpler 'return report' statements
Sergey Shepelev
almost 15 years ago
worker.tests: critical fix: make_http_response returns proper httplib2 Response
Sergey Shepelev
almost 15 years ago
Caching DNS resolver implementation. Currently using thread-pooled socket.gethostbyname_ex.
Sergey Shepelev
almost 15 years ago
data: new data structure: Cache.
Sergey Shepelev
almost 15 years ago
worker.Crawler: using PoolMap.getc for shortening ask_robots
Sergey Shepelev
almost 15 years ago
data.PoolMap: new `.getc` method acts like `.get` but returns a contextmanager, suitable for `with` statements
Sergey Shepelev
almost 15 years ago
worker.cli_crawl: heroshi-crawl now supports URIs on the command line
Sergey Shepelev
almost 15 years ago
cosmetic
Sergey Shepelev
almost 15 years ago
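Two of the commits listed above describe small Python idioms: replacing only the first occurrence of a host in a URI, and stopping a timer with a single dict lookup via `pop`. The sketch below illustrates both in isolation. It is not heroshi's code; the function names `replace_host_once` and `stop_timer` and their parameters are hypothetical, chosen only for illustration.

```python
def replace_host_once(uri, host, address):
    """Replace only the first occurrence of `host` in `uri`.

    Hypothetical helper: passing count=1 to str.replace leaves any later
    occurrence of the host string (for example in the path) untouched.
    """
    return uri.replace(host, address, 1)


def stop_timer(timers, key):
    """Remove and return the timer stored under `key`, or None if absent.

    Hypothetical helper: dict.pop with a default does the lookup and the
    removal in one step, instead of a membership test followed by `del`.
    """
    return timers.pop(key, None)


uri = "http://example.com/mirror/example.com/index.html"
print(replace_host_once(uri, "example.com", "10.0.0.1"))

timers = {"example.com": "timer-object"}
print(stop_timer(timers, "example.com"))
print(stop_timer(timers, "example.com"))
```

Without the count argument, `str.replace` would also rewrite the host string inside the path, which is presumably the kind of bug the commit message refers to.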