News

Posted over 12 years ago by kasper
We've released version 3.2.1 of MetaModel. This release is a minor feature enhancement and bugfix release. Here's the list of changes: We've drastically improved the performance of "DELETE FROM" statements on CSV files. We've added mapping of unavailable-to-available data types when issuing "CREATE TABLE" statements containing unavailable data types on e.g. DB2 or PostgreSQL. In these cases a proper data type will be automatically applied, e.g. SMALLINT instead of BOOLEAN on DB2, or BYTEA instead of BLOB on PostgreSQL. A bug pertaining to multithreaded execution of compiled JDBC queries was fixed. We've created a pool of prepared statements to ensure parallel execution of compiled queries. A bug pertaining to proper quoting of HAVING clause operands was fixed. When the data type of an aggregate function is different from the data type of the function's argument, the quoting would not be correct under certain circumstances. Refer to the roadmap milestone for more details. MetaModel 3.2.1 is available for direct download or as a Maven dependency.
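The data type fallback described above can be pictured as a per-database lookup that is consulted while the "CREATE TABLE" statement is built. The following is only a minimal sketch in Python, not MetaModel's actual (Java) implementation: the names `TYPE_FALLBACKS`, `resolve_column_type` and `create_table_sql` are hypothetical, and the table contains just the two substitutions mentioned in the release notes.

```python
# Hypothetical fallback table: only the two substitutions named in the
# release notes are listed; a real mapping would be more extensive.
TYPE_FALLBACKS = {
    "db2": {"BOOLEAN": "SMALLINT"},
    "postgresql": {"BLOB": "BYTEA"},
}


def resolve_column_type(dialect: str, requested_type: str) -> str:
    """Return the type to emit for this dialect, falling back if needed."""
    fallbacks = TYPE_FALLBACKS.get(dialect.lower(), {})
    normalized = requested_type.upper()
    # Types without a registered fallback are passed through unchanged.
    return fallbacks.get(normalized, normalized)


def create_table_sql(dialect: str, table: str, columns: list) -> str:
    """Build a CREATE TABLE statement from (name, type) pairs."""
    column_sql = ", ".join(
        f"{name} {resolve_column_type(dialect, col_type)}"
        for name, col_type in columns
    )
    return f"CREATE TABLE {table} ({column_sql})"
```

With this sketch, asking for a BOOLEAN column on DB2 produces a SMALLINT column, while the same request on a database without a registered fallback is left untouched.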
Posted over 12 years ago by kasper
It's Friday afternoon and we have a little weekend gift to share with everyone. The last couple of weeks we've been working on a number of small but nice feature improvements and minor bugfixes in DataCleaner. These are now all available in DataCleaner version 3.0.2 - go grab it at the downloads page. Here's a wrap-up of the work that we've done: When triggering a job in the monitoring web application, the panel auto-refreshes every second to get the latest state of the execution. File-based datastores (such as CSV or Excel spreadsheets) with absolute paths are now correctly resolved in the monitoring web application. The "Select from key/value map" transformer now supports nested select expressions like "Address.Street" or "orderlines[0].product.name". The table lookup mechanism has been optimized for performance, using prepared statements when running against JDBC databases. Administrators can now download file-based datastores directly from the "Datastores" page. Exception handling in the monitoring web application has been improved a bit, making the error messages more precise and intuitive. We hope you enjoy the new version. It should be a drop-in replacement for previous DataCleaner 3 releases, so no need to wait, upgrade now. If you're using DataCleaner and think it would be fun to meet up with team members from Human Inference who work on the product, as well as consultants and other users of it, join our new Google+ page, where we will start doing community hangouts and invite you to share ideas, questions and good vibes.
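To illustrate what a nested select expression such as "Address.Street" or "orderlines[0].product.name" means, here is a small Python sketch that walks a nested key/value structure. It is not the transformer's real code: the function name `select`, the tokenizer and the choice to return `None` for missing paths are all assumptions made for the example.

```python
import re

# A path token is either a key name or a [index] list accessor.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def select(data, expression):
    """Resolve a dotted/indexed expression against nested dicts and lists.

    Returns None when any step of the path is missing (an assumption of
    this sketch, not documented behaviour of the transformer).
    """
    current = data
    for key, index in _TOKEN.findall(expression):
        if current is None:
            return None
        if key:
            # Key lookup only makes sense on a map-like value.
            current = current.get(key) if isinstance(current, dict) else None
        else:
            position = int(index)
            in_range = isinstance(current, list) and position < len(current)
            current = current[position] if in_range else None
    return current


record = {
    "Address": {"Street": "Main St"},
    "orderlines": [{"product": {"name": "Widget"}}],
}
street = select(record, "Address.Street")
product_name = select(record, "orderlines[0].product.name")
```

Here `street` resolves to the nested street value and `product_name` to the name of the first order line's product; a path that runs past the end of the list simply yields `None`.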
Posted over 12 years ago by kasper
Thank you to all for the positive attention about our recent DataCleaner 3 release. With this feedback we've been able to quickly and effectively identify a few minor improvements and have introduced these in a new release: Version 3.0.1. The primary bugfix in this release restores the mapping of columns and specific enumerable categorizations. For instance in the new Completeness analyzer, we found that after reloading a saved job, the mapping was not always correct. Furthermore a few internal improvements have been made, making it easier to deploy the DataCleaner monitor web application in environments using the Spring Framework. Last but not least, the visualization settings in the desktop application have been improved: the application now inspects the job being visualized and toggles displayed artifacts based on the screen size and the amount of detail needed to show it nicely. DataCleaner 3.0.1 is available for download on our downloads page. We wish you good luck cleaning your data, and enjoy the software.
Posted over 12 years ago by kasper
Dear friends, users, customers, developers, analysts, partners and more! After an intense period of development and a long wait, it is our pleasure to finally announce that DataCleaner 3 is available. We at Human Inference invite you all to our celebration! Impatient to try it out? Go download it right now! So what is all the fuss about? Well, in all modesty, we think that with DataCleaner 3 we are redefining 'the premier open source data quality solution'. With DataCleaner 3 we've embraced a whole new functional area of data quality, namely data monitoring. Traditionally, DataCleaner has its roots in data profiling. In previous years, we've added several related functions: transformations, data cleansing, duplicate detection and more. With data monitoring we basically deliver all of the above, but in a continuous environment for analyzing, improving and reporting on your data. Furthermore, we will deliver these functions in a centralized web-based system. So how will the users benefit from this new data monitoring environment? We've tried to answer this question using a series of images: monitor the evolution of your data; share your data quality analysis with everyone; continuously monitor and improve your data's quality; connect DataCleaner to your infrastructure using web services. The monitoring web application is a fully fledged environment for data quality, covering several functional and non-functional areas: display of timeline and trends of data quality metrics; a centralized repository for managing and containing jobs, results, timelines etc.; scheduling and auditing of DataCleaner jobs; web services for invoking DataCleaner transformations; security and multi-tenancy; and alerts and notifications when data quality metrics are out of their expected comfort zones. Naturally, the traditional desktop application of DataCleaner continues to be the tool of choice for expert users and one-time data quality efforts.
We've even enhanced the desktop experience quite substantially: There is a new Completeness analyzer which is very useful for simply identifying records that have incomplete fields. You can now export DataCleaner results to nice-looking HTML reports that you can give to your manager, or send to your XML parser! The new monitoring environment is also closely integrated with the desktop application. Thus, the desktop application now has the ability to publish jobs and results to the monitor repository, and to be used as an interactive editor for content already in the repository. New date-oriented transformations are now available: Date range filter, which allows you to subset datasets based on date ranges, and Format date, which allows you to format a date using a date mask. The Regex Parser (which was previously only available through the ExtensionSwap) has now been included in DataCleaner. This makes it very convenient to parse and standardize rich text fields using regular expressions. There's a new Text case transformer available. With this transformation you can easily convert between upper/lower case and proper capitalization of sentences and words. Two new search/replace transformations have been added: Plain search/replace and Regex search/replace. The user experience of the desktop application has been improved. We've added several in-application help messages, made the colors look brighter and clearer and improved the font handling. More than 50 features and enhancements were implemented in this release, in addition to incorporating several hundred upstream improvements from dependent projects. We hope you will enjoy everything that is new about DataCleaner 3. And do watch out for follow-up material in the coming weeks and months. We will be posting more and more online material and examples to demonstrate the wonderful new features that we are very proud of.
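As a rough idea of what the Text case and Regex search/replace transformations do to a value, the Python sketch below applies the same kinds of operations to plain strings. The function names are hypothetical and the capitalization rules are deliberately simple (ASCII letters, sentences ending in ".", "!" or "?"); DataCleaner's own rules may well differ.

```python
import re


def capitalize_words(text: str) -> str:
    """Upper-case the first letter of every word, lower-case the rest."""
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), text)


def capitalize_sentences(text: str) -> str:
    """Lower-case everything, then upper-case the start of each sentence."""
    lowered = text.lower()
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


def regex_search_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of a regular expression with the replacement."""
    return re.sub(pattern, replacement, text)


name = capitalize_words("john SMITH")
sentence = capitalize_sentences("hELLO world. this IS data!")
collapsed = regex_search_replace("a   b", r"\s+", " ")
```

Standardizing a free-text field is then mostly a matter of chaining such steps, for example collapsing whitespace before fixing the case.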
Posted over 12 years ago by kasper
Today we've released version 3.0.1 of MetaModel. This is a minor point release which contains the following bugfixes and improvements: Fixed a bug pertaining to "first row" semantics in the JDBC module. This issue occurred when both "first row" and "max rows" were specified: one more row than desired would be produced. The toSql() method of Table Creation builders now includes NOT NULL and PRIMARY KEY tokens in the ANSI SQL statement. The documentation for POJO datastores has been updated since it contained a minor compilation issue due to an ambiguous constructor. A bug in the IBM DB2 support, related to handling of BLOBs, was fixed. This should be a drop-in replacement for version 3.0, so we encourage everyone to upgrade.
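The "first row" / "max rows" combination is a classic place for off-by-one errors, so a small sketch of the intended semantics may help. This is plain Python, not MetaModel code; the function name `window` is hypothetical, and the sketch assumes that "first row" is 1-based (row 1 is the first row of the result).

```python
from itertools import islice


def window(rows, first_row=1, max_rows=None):
    """Return at most max_rows rows, starting at the 1-based first_row.

    Assumption of this sketch: first_row=1 means "start at the very
    first row", and max_rows=None means "no limit".
    """
    if first_row < 1:
        raise ValueError("first_row is 1-based and must be >= 1")
    start = first_row - 1  # convert to a 0-based offset
    # The end bound is offset + max_rows; using first_row + max_rows
    # instead would yield exactly one row too many.
    stop = None if max_rows is None else start + max_rows
    return list(islice(rows, start, stop))


page = window(range(1, 11), first_row=3, max_rows=4)
```

With ten rows numbered 1 to 10, `page` holds rows 3 through 6: four rows, not five.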
Posted over 12 years ago by kasper
We've finally come to the day where we get to push the big red RELEASE button on the MetaModel 3.0 project! This release is very significant since it marks the point where MetaModel is for the first time able to call itself a full CRUD capable API for practically any data format. Go to the MetaModel website to read all about what a nice release this is: "What's new in MetaModel 3.0?", "Using the new POJO based datastore", or check the full CRUD example on the frontpage. Congratulations to everyone involved in this release. We hope you will all appreciate this major achievement and help us spread the word about MetaModel even more.
Posted over 12 years ago by manuel
We are celebrating the plans to build a version 3.0 of DataCleaner, where we hope to be pushing the limits of what you can expect from your open source data quality applications. A few big themes for version 3.0 have already been decided: a data quality monitoring web application; a multi-tenant repository for data quality artifacts (jobs, profiling results, configurations, datastore definitions etc.); being able to edit data (in the desktop application); and wizards to guide users through their first-time user experience with DataCleaner. Go read Kasper Sørensen's blog post about the data quality monitoring application, which underlines the general direction and scope of the release!
Posted over 12 years ago by barry
Human Inference is arranging two sessions of free online training for people wishing to learn about data profiling, data cleansing, deduplication and more ... and, of course, how you do it all in DataCleaner. The two sessions are (click a link to read more and to register): May 29th, free training for European and APAC timezones; June 7th, free training for US timezones. We hope to see a lot of people join in on the training, which we hope will be a good and fun event, where you'll learn about data quality and data quality tools and get a chance to say hello to other community members.
Posted over 12 years ago by kasper
DataCleaner 2.5.2 has just been released. The DataCleaner 2.5.2 release is a minor release, but does contain some significant feature improvements and enhancements. Here's a walkthrough of this release:

Apache CouchDB support
We've added support for the NoSQL database Apache CouchDB. DataCleaner supports reading from, analyzing and writing to your CouchDB instances.
[Figure: Connect to CouchDB databases]

Update table writer
Following our previous efforts to bring ETLightweight-style features into DataCleaner, we've added a writer which updates records in a table. You can use this for example to insert or update records based on specific conditions. Like the Insert into table writer, the new DataCleaner Update table writer is not restricted to SQL-based databases, but works with any datastore type which supports writing (currently relational databases, CSV files, Excel spreadsheets, CouchDB databases and MongoDB databases). The semantics are the same as with a traditional UPDATE statement in SQL.

Drill-to-detail information saved in result files
When using the Save result feature of DataCleaner 2.5, some users experienced that their drill-to-detail information was lost. In DataCleaner 2.5.2 we now also persist this information, making your DQ archives much more valuable when investigating historic data incidents.

Improved EasyDQ error handling
The EasyDQ components have been improved in terms of error handling. If a momentary network issue occurs or another similar issue causes a few records to fail, the EasyDQ components will now gracefully recover and most importantly - your batch work will prevail even in spite of errors.

Table mapping for NoSQL datastores
Since CouchDB and MongoDB are not table based, but have a more dynamic structure, we provide two approaches to working with them: the default, which is to let DataCleaner autodetect a table structure, and the advanced, which allows you to manually specify your desired table structure.
Previously the advanced option was only available through XML configuration, but now the user interface contains appropriate dialogs for doing this directly in the application. We hope you enjoy the new 2.5.2 version of DataCleaner. Go get it now at the downloads page.
Posted almost 13 years ago by kasper
Today we announce an exciting new partnership with Pentaho, the leading open source Business Intelligence and Business Analytics stack! Over the past years Human Inference, members of the DataCleaner community and Pentaho have been in close contact to design a new data quality package for the Pentaho Suite. DataCleaner plays a key part in this new solution. DataCleaner's integration in Pentaho is primarily focused on the open source ETL product, Pentaho Data Integration (aka Kettle). Pentaho and Human Inference will be running a joint webinar on May 10th to tell everyone about all the new features (register for the webinar here), but until then, here's a summary!

Profile ETL steps using DataCleaner
When working with ETL you often find yourself asking what kinds of values to expect for a particular transformation. With the data quality package for Pentaho we offer a unique integration of profiling and ETL: Simply right click any step in your transformation, select 'Profile', and DataCleaner will start up with the data that the step produces available for profiling! Not only is this a great feature for Pentaho Data Integration, it is also one of a kind in the ETL space. We are very excited to see this great use of embedding DataCleaner into other applications.
[Figure: Right click any step to profile]

Execute DataCleaner job
Another great feature in the Pentaho data quality package is that you can now orchestrate and execute DataCleaner jobs using Pentaho Data Integration. This makes it significantly easier to manage scheduled executions, data quality monitoring and orchestration of multiple DataCleaner jobs. Mix and match DataCleaner's DQ jobs with Kettle's transformations and you've got the best of both worlds.
[Figure: Execute DataCleaner jobs as part of your ETL flow]

EasyDQ integration
Additionally, the data quality package for Pentaho contains the EasyDQ cleansing functions as ETL steps, similar to what you know from their DataCleaner counterparts.
Deduplication and merging via DataCleaner
In addition to embedding DataCleaner for profiling of steps, you can also start up DataCleaner when browsing databases in Pentaho Data Integration. This will create a database connection which is appropriate for more in-depth interactions with the database. For example, you can use it to find duplicates in your source or destination databases.
[Figure: Detect duplicates in your sources]

For more information:
The press release from Pentaho: Pentaho announces new Data Quality solution
Installation instructions and information from Pentaho: Pentaho wiki: Human Inference
Example of using the DataCleaner profiler with Pentaho: Pentaho wiki: Kettle Data Profiling with DataCleaner
Information about the EasyDQ functions for Pentaho: EasyDQ Pentaho page