News

Analyzed about 1 year ago, based on code collected about 1 year ago.
Posted over 13 years ago by Matt Casters
Today, one of our community members posted a deviously simple XML format on the forum that needed to be parsed.  The format looks like this: USD GBP 1 1 Fri, 01 Jun 2001 22:50:00 GMT 1.4181 1.4177 USD [...]
Posted over 13 years ago by Matt Casters
Dear Kettle users, Most of you usually use a data integration engine to process data in a batch-oriented way.  Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, hourly workloads.  Sometimes folks run micro-batches of work every minute or so.  However, it’s lesser known that our beloved transformation engine can also be used to [...]
Posted over 13 years ago by Matt Casters
I took the time to build a high-level overview of all the new big-ticket items that are going to be in the upcoming version 4.2 of Kettle (Pentaho Data Integration).
Posted over 13 years ago by Matt Casters
Dear Kettle friends, on occasion we need to support environments where not only a lot of data needs to be processed but also in frequent batches.  For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds. In this setting we want to use clustering to use “commodity” computing [...]
Posted over 13 years ago by Matt Casters
Dear Kettlers, A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.  The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution.  And [...]
Posted over 13 years ago by Matt Casters
Dear Kettle fans, At the end of last year, while we were doing a lot of optimizations and testing with embedding Pentaho Data Integration in Hadoop, we came upon the brilliant idea to write a single-threaded engine. The idea back then was that since Hadoop itself was already using parallelism it might be more efficient for [...]
Posted almost 14 years ago by Matt Casters
Dear Kettle fans, As you can tell from the Kettle JDBC driver project and also from the Talend job execution job entry (if you’re still wondering, that was NOT a joke) we announced a few weeks ago, we’re constantly looking for new and better ways to integrate Kettle into the wide world. Today I’m blogging to spread [...]
Posted almost 14 years ago by Matt Casters
Dear Kettle friends, I know a number of you have been asking for this feature to facilitate your migration projects for a while. However, today the “Talend Job Execution” job entry has finally arrived in Kettle! Take a look at this screen shot demonstrating how easy it is to integrate: The way that this works is by first [...]
Posted almost 14 years ago by Matt Casters
Dear Kettle friends, Some time ago while I visited the nice folks from Human Inference in Arnhem, I ran into Kasper Sørensen, the lead developer of DataCleaner. DataCleaner is an open source data quality tool released (like Kettle) under the LGPL license.  It is essentially to blame for the lack of a profiling tool inside of Kettle.  [...]
Posted almost 14 years ago by Matt Casters
How to read data from MongoDB using PDI 4.2