Tags: Browse Projects

Select a tag to browse associated projects and drill deeper into the tag cloud.

Apache Flume

Claimed by Apache Software Foundation. Analyzed 26 days ago.

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources such as logs (a client-side sketch follows below).

72.8K lines of code

3 current contributors

6 months since last commit

4 users on Open Hub

Very Low Activity
Rating: 0.0
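
A rough sketch of what feeding data into Flume looks like from an application: the snippet below pushes a single event to an agent over Flume's RpcClient API. The host, port, and event body are placeholder assumptions, and the agent would need an Avro source listening at that address.

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class FlumeClientSketch {
        public static void main(String[] args) {
            // Assumption: a Flume agent with an Avro source on localhost:41414
            RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
            try {
                Event event = EventBuilder.withBody("one log line", StandardCharsets.UTF_8);
                client.append(event); // blocks until the agent acknowledges the event
            } catch (EventDeliveryException e) {
                // Delivery failed; a production client would rebuild the connection and retry
            } finally {
                client.close();
            }
        }
    }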

Apache Ignite

Claimed by Apache Software Foundation. Analyzed 27 days ago.

Apache Ignite In-Memory Data Fabric is a high-performance, integrated, and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies (see the short example below).

1.52M lines of code

0 current contributors

2 months since last commit

4 users on Open Hub

High Activity
Rating: 0.0
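
A minimal sketch of using the platform as a distributed in-memory key-value store, assuming Ignite's standard Ignition and IgniteCache entry points with default configuration; the cache name and values are made up.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class IgniteSketch {
        public static void main(String[] args) {
            // Start a node with default configuration; Ignite is AutoCloseable
            try (Ignite ignite = Ignition.start()) {
                // Create (or get) a distributed cache held in memory across the cluster
                IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demo");
                cache.put(1, "hello");
                cache.put(2, "ignite");
                System.out.println(cache.get(1) + " " + cache.get(2));
            }
        }
    }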

StreamSets Data Collector

Claimed by StreamSets. No analysis available.

Open source software for the rapid development and reliable operation of complex data flows.

lines of code not counted (no analysis available)

60 current contributors

time since last commit not available

4 users on Open Hub

Activity Not Available
Rating: 5.0
Primary language not available
License: Apache-2.0

Cascading

Analyzed 26 days ago

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster (a word-count sketch appears below).

106K lines of code

0 current contributors

over 11 years since last commit

2 users on Open Hub

Inactive
Rating: 0.0
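
To give a feel for the API, here is a hedged word-count sketch in the style of Cascading 2.x: lines are split into words, grouped by word, and counted. The input and output paths are placeholders, and the class locations assume the 2.x Hadoop planner.

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.operation.aggregator.Count;
    import cascading.operation.regex.RegexSplitGenerator;
    import cascading.pipe.Each;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.scheme.hadoop.TextLine;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class CascadingWordCount {
        public static void main(String[] args) {
            Tap source = new Hfs(new TextLine(new Fields("line")), args[0]);
            Tap sink = new Hfs(new TextLine(), args[1]);

            // Split each line into words, group by word, count each group
            Pipe pipe = new Pipe("wordcount");
            pipe = new Each(pipe, new Fields("line"),
                    new RegexSplitGenerator(new Fields("word"), "\\s+"));
            pipe = new GroupBy(pipe, new Fields("word"));
            pipe = new Every(pipe, new Count());

            Flow flow = new HadoopFlowConnector(new Properties())
                    .connect(source, sink, pipe);
            flow.complete(); // plans and runs the underlying MapReduce job(s)
        }
    }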

Apache Whirr

Claimed by Apache Software Foundation. Analyzed 26 days ago.

Apache Whirr is a set of libraries for running cloud services. Whirr provides:

* A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
* A common service API. The details of provisioning are particular to the service.
* Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.

You can also use Whirr as a command-line tool for deploying clusters.

26.9K lines of code

0 current contributors

over 9 years since last commit

2 users on Open Hub

Inactive
Rating: 0.0

Disco

Analyzed 26 days ago

Disco is an open-source implementation of the MapReduce framework for distributed computing. Like the original framework, Disco supports parallel computation over large data sets on unreliable clusters of computers. The Disco core is written in Erlang, a functional language designed for building robust, fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks in tens of lines of code, so you can quickly write scripts to process massive amounts of data.

29.8K lines of code

0 current contributors

over 8 years since last commit

2 users on Open Hub

Inactive
Rating: 0.0

Apache Crunch

Analyzed 27 days ago

Apache Crunch is a Java library for writing, testing, and running MapReduce pipelines, based on Google's FlumeJava. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run (an example pipeline is sketched below).

125K lines of code

5 current contributors

about 4 years since last commit

2 users on Open Hub

Inactive
Rating: 5.0
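
A short sketch of the FlumeJava-style model: a word count in which the splitting step is a user-defined DoFn and the counting is a built-in aggregation. The input and output paths are placeholders.

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.PTable;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class CrunchWordCount {
        public static void main(String[] args) throws Exception {
            Pipeline pipeline = new MRPipeline(CrunchWordCount.class);
            PCollection<String> lines = pipeline.readTextFile(args[0]);

            // User-defined function: split each line into words
            PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    for (String word : line.split("\\s+")) {
                        emitter.emit(word);
                    }
                }
            }, Writables.strings());

            // Built-in aggregation: count occurrences of each word
            PTable<String, Long> counts = words.count();

            pipeline.writeTextFile(counts, args[1]);
            pipeline.done(); // triggers planning and execution of the MapReduce jobs
        }
    }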

WikiHadoop

Analyzed 26 days ago

WikiHadoop is a set of Hadoop modules focused on processing Wikipedia's terabyte-scale XML dump files. Wikipedia XML dumps with complete edit histories have been difficult to process because of their exceptional size and structure: while a "page" is the natural processing unit, a single page can contain gigabytes of text when its edit history is very long. WikiHadoop provides an InputFormat for the Hadoop Streaming interface that processes Wikipedia bzip2 XML dumps in a streaming manner. Using this InputFormat, the content of every page is fed to a mapper via standard input and output without using too much memory. Thanks to Hadoop Streaming, mappers can be implemented in any language (an illustrative mapper follows below).

4.92K lines of code

0 current contributors

almost 12 years since last commit

1 user on Open Hub

Inactive
Rating: 0.0
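
Because Hadoop Streaming hands each record to the mapper over standard input and reads key/value lines back on standard output, a mapper can be written in any language. Purely as an illustration of that protocol (the exact record layout WikiHadoop emits is an assumption here), a Java mapper could tally <revision> tags in the page XML it receives:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // A Hadoop Streaming mapper is just a program reading records from stdin
    // and writing key<TAB>value lines to stdout.
    public class RevisionTally {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8));
            long revisions = 0;
            String line;
            while ((line = in.readLine()) != null) {
                // Count <revision> open tags in the page XML fed by the InputFormat
                int from = 0;
                while ((from = line.indexOf("<revision>", from)) != -1) {
                    revisions++;
                    from += "<revision>".length();
                }
            }
            System.out.println("revisions\t" + revisions);
        }
    }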

Pangool

Analyzed 26 days ago

Pangool is a low-level Java MapReduce API that aims to be a replacement for the Hadoop Java MapReduce API. By implementing an intermediate tuple-based schema and making job configuration convenient, it removes many of the accidental complexities that arise from using the Hadoop Java MapReduce API: things like secondary sort and reduce-side joins become extremely easy to implement and understand. Pangool's performance is comparable to that of the Hadoop Java MapReduce API, and it also augments Hadoop's API by making multiple outputs and inputs first-class and by allowing instance-based configuration.

27.8K lines of code

0 current contributors

about 4 years since last commit

1 user on Open Hub

Inactive
Rating: 0.0

Shark - Hive on Spark

Analyzed 27 days ago

Hive on Spark

17K lines of code

0 current contributors

over 10 years since last commit

1 user on Open Hub

Inactive
Rating: 0.0