Inactive

Commits : Listings

Analyzed 14 days ago, based on code collected 14 days ago.
Jan 28, 2024 — Jan 28, 2025
| Commit Message | Contributor | Date |
| --- | --- | --- |
| Update LICENSE | | over 9 years ago |
| Create LICENSE | | over 9 years ago |
| Merge pull request #2 from beng/bugfix/require-read_data | | about 11 years ago |
| fix path bug so examine_results can load ruby/read_data.rb | | about 11 years ago |
| Merge pull request #1 from Kitton/patch-1 | | over 11 years ago |
| Correct syntax for compare.rb call | | over 11 years ago |
| use writable stringpair instead of tab seperated string | | about 13 years ago |
| more work in progress | | about 13 years ago |
| first cut at MR version of sketching | | about 13 years ago |
| first cut of pig only version | mat | almost 15 years ago |
| running mr bash version in test.rb | mat | over 15 years ago |
| remove working stuff | mat | over 15 years ago |
| new hadoop version, wip | mat | over 15 years ago |
| first cut of resemblance in hadoop | mat | over 15 years ago |
| per postcode processing | mat | over 15 years ago |
| use num files for num partitions | mat | over 15 years ago |
| bug fix for data shorter than shingle length | mat | over 15 years ago |
| add timestamp to d debug and remove debug from sketcher | mat | over 15 years ago |
| another big performance win by avoiding process dict, 75% util to 100% util wtf? | mat | over 15 years ago |
| huge speedup for sum job (avoid process dict lookup), fix for explode combos, general cleanup | mat | over 15 years ago |
| helper util for exploding dup_id files | mat | over 15 years ago |
| include calculation of representative id from sketch duplicate sets | mat | over 15 years ago |
| move nap stuff (combo ids etc) to split working dir | mat | over 15 years ago |
| bug fix for combo.ids | mat | over 15 years ago |
| fully working. untested with larger dataset, expect mem problems with ruby shortcuts. minor connected component analysis | mat | over 15 years ago |
| calculate overall resemblance for exact n, a and p | mat | over 15 years ago |
| allow multi input dirs, allow multi tasks calls in map_reduce_s, remove state from mappers | mat | over 15 years ago |
| pre multi task refactor | mat | over 15 years ago |
| do a first pass handling the extraction of exact duplicates | mat | over 15 years ago |
| extract parse func from preparer so can have more than one type. | mat | over 15 years ago |