almost 12 years
We're very happy to announce the release of several busy months' work: MetaModel version 3.3!
With the 3.3 release we're opening a couple of new modules to fully-fledged business applications, namely Salesforce and SugarCRM. This means that you can now interact with these business applications just as if they were a regular database or data file. For our typical use cases in the Information Management area, these additions make interactions a lot easier and unify them with database interactions.
Additionally, the 3.3 release contains the regular round of bugfixes and improvements; in particular, the IBM DB2 support has been further optimized.
For all the details, see the "What's new in version 3.3" page on the MetaModel website.
almost 12 years
We're happy to announce another release of DataCleaner - version 3.1.2. This version is a minor improvement and bugfix release.
So what's new? Here's the summary:
We've added a web service in the monitoring application for getting a (list of) metric values. This makes the monitoring even more usable as a key infrastructure component, as a way to monitor data (quality) and expose the results to third party applications. Read more in the documentation.
The 'Table lookup' component has been improved by adding join semantics as a configurable property. Using the join semantics, you can choose whether the lookup behaves like a LEFT JOIN or an INNER JOIN.
The EasyDQ components have been upgraded, adding further configuration options and a richer deduplication result interface.
Performance improvements have been a specific focus of this release. Improvements have been made in the engine of DataCleaner to further utilize a streaming processing approach in certain corner cases which were not covered previously.
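To illustrate the difference between the two join semantics of a table lookup, here is a minimal sketch in Python. It is not DataCleaner's implementation; the function `table_lookup`, its parameters and the sample data are all hypothetical and only show the general idea.

```python
def table_lookup(rows, lookup, key, output_columns, join_semantic="LEFT"):
    """Enrich each row with columns from a lookup table (hypothetical helper).

    LEFT:  rows without a match are kept, with None in the output columns.
    INNER: rows without a match are dropped.
    """
    if join_semantic not in ("LEFT", "INNER"):
        raise ValueError("join_semantic must be 'LEFT' or 'INNER'")
    for row in rows:
        match = lookup.get(row[key])
        if match is not None:
            # A match was found: copy the requested lookup columns into the row.
            yield {**row, **{col: match[col] for col in output_columns}}
        elif join_semantic == "LEFT":
            # No match, LEFT JOIN semantics: keep the row with empty lookup values.
            yield {**row, **{col: None for col in output_columns}}
        # No match, INNER JOIN semantics: the row is skipped.


# Made-up sample data for illustration only.
customers = [
    {"name": "Ann", "country_code": "DK"},
    {"name": "Bob", "country_code": "XX"},  # no such code in the lookup table
]
countries = {
    "DK": {"country_name": "Denmark"},
    "NL": {"country_name": "Netherlands"},
}

left_rows = list(table_lookup(customers, countries, "country_code", ["country_name"], "LEFT"))
inner_rows = list(table_lookup(customers, countries, "country_code", ["country_name"], "INNER"))
print(len(left_rows), len(inner_rows))
```

With LEFT semantics, the unmatched record stays in the stream and can be inspected later; with INNER semantics, only records that found a match continue.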
For more details on the individual issues worked on, visit our milestone page.
The 3.1.2 release should be a drop-in replacement of other 3.x releases, so go download and upgrade now!
about 12 years
We've added two additional transformations to the EasyDQ additionals package, provided by Human Inference.
The two transformations are:
Country standardization. This allows you to get direct access to EasyDQ's country standardizer and unify different spellings, formats and more of country names and codes.
Similarity evaluator. This feature provides a low-level function for comparing two sets of values. For instance, if you've done a reference data lookup, you will often be interested in knowing if the result of the lookup matches the data that you already have. Using the similarity evaluator you can easily compare the incoming and resulting values and thereby make visible the improvements and changes you are doing to your data with these lookups.
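The idea of comparing incoming values with lookup results can be sketched as follows. This is only an illustration: it uses Python's standard `difflib.SequenceMatcher` as a stand-in, which is an assumption and not the algorithm used by EasyDQ, and the `similarity` helper and the sample value pairs are made up.

```python
from difflib import SequenceMatcher


def similarity(incoming, resulting):
    """Return a score between 0.0 and 1.0 for two values (hypothetical helper).

    Values are trimmed and lower-cased first, so that differences in
    whitespace and letter case do not count as changes.
    """
    a = incoming.strip().lower()
    b = resulting.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


# (incoming value, value returned by a reference data lookup); made-up examples.
pairs = [
    ("Denmark", "Denmark"),
    ("Danmark", "Denmark"),
    ("NL", "Netherlands"),
]

for incoming, resulting in pairs:
    print(f"{incoming!r} -> {resulting!r}: {similarity(incoming, resulting):.2f}")
```

A score of 1.0 means the lookup confirmed the existing value, while lower scores point at records where the lookup changed the data.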
The extension is available, as always, in the Extensions section of the website.
about 12 years
We have a nice little release for you today, which contains the usual maintenance fixes, but also some improvements and minor new features. DataCleaner 3.1.1 is ready for download as of now.
Let's dive into the news ...
The date and time related analysis options have been expanded, adding distribution analyzers for week numbers, months and years. All analyzers related to date and time are now grouped within a submenu called "Date and time" under "Analyze".
An optional "descriptive statistics" option has been added to the Number analyzer and the Date/time analyzer. This option adds additional metrics to the results of these analyzers, such as Median, Skewness, percentiles and Kurtosis. These metrics are optional since their memory footprint is somewhat larger than the existing metrics.
The lines in the timeline charts of the monitoring web application now have small dots in them. This is especially useful for charts with few (or even only one) observations in them - to point out exactly where the observation points are.
The query parser used for ad-hoc queries has also been substantially improved. Queries can now contain DISTINCT clauses, *-wildcards and subqueries, and the parser is tolerant of differences in letter case.
Two new transformers have been added for generating UUIDs and for generating timestamps.
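For readers unfamiliar with the descriptive statistics mentioned above, here is a small sketch of how such metrics can be computed with Python's standard `statistics` module. The function name and the choice of population (divide-by-n) formulas are assumptions for the sake of illustration; DataCleaner may use different estimators.

```python
import statistics


def descriptive_statistics(values):
    """Compute median, quartiles, skewness and excess kurtosis.

    Assumes at least two distinct numeric values; population formulas
    (dividing by n) are used for the central moments.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "median": statistics.median(values),
        "quartiles": statistics.quantiles(values, n=4),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis; 0 for a normal distribution
    }


stats = descriptive_statistics([1, 2, 3, 4, 5])
print(stats)
```

Exact medians and percentiles need the values themselves rather than a running sum, which is one plausible reason for the larger memory footprint of these optional metrics; the sketch simply keeps everything in a list.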
For the full list of improvements, go to the milestone page on our bugtracker.
We hope you enjoy this release, and go get it immediately from the downloads page.
about 12 years
We've just released MetaModel 3.2.5. Version 3.2.5 improves existing features in a number of areas, primarily query parsing capabilities and support for DB2 and MS SQL Server, adding up to 10 significant improvements.
The full list of improvements can be found at the 3.2.5 milestone summary page.
We hope you will enjoy this release, which should be a drop-in replacement for other 3.2.x releases.
MetaModel 3.2.5 is available via our Google Code downloads and via the central Maven repository.
about 12 years
Human Inference is happy to announce that DataCleaner 3.1 has been released and that it is available for download now! With DataCleaner 3.1 we’ve really focused on usability and day-to-day requirements of both the DataCleaner desktop data profiling application and the web application for continuous data quality monitoring: features that we feel really help users do what they want to do. Here’s a summary of what has been done.
Metric formulas – elaborated Data Quality KPIs
It is now possible to build much more elaborate Data Quality KPIs in DataCleaner’s monitoring web application. The user interface allows you to build complex formulas in a spreadsheet-like formula style, using variables collected by DataCleaner jobs.
Metric formulas can combine any number of metrics, constants and operations, as long as the result can be expressed as a mathematical equation.
For instance, measure the rate of duplicate records as a percentage of the total record count, or measure the number of product codes that conform to a set of multiple string patterns.
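As a rough illustration of what a metric formula does, the sketch below evaluates a spreadsheet-like expression over named metric values. It is not the formula engine of the monitoring application; the `evaluate_formula` function, the metric names and the numbers are hypothetical, and only the four basic arithmetic operations are supported.

```python
import ast
import operator

# Supported binary operations in a formula.
_OPERATIONS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, metrics):
    """Evaluate an arithmetic formula whose variables are metric names."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATIONS:
            return _OPERATIONS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.Name):
            return metrics[node.id]  # variable: look up the metric value
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value  # numeric constant
        raise ValueError("Unsupported formula element: " + ast.dump(node))

    return _eval(ast.parse(formula, mode="eval"))


# Made-up metric values, as they might be collected by a job.
metrics = {"duplicate_count": 25, "row_count": 1000}
duplicate_rate = evaluate_formula("duplicate_count * 100 / row_count", metrics)
print(duplicate_rate)  # rate of duplicate records in percent
```

Parsing the formula into a syntax tree and walking it, rather than calling `eval`, keeps the set of allowed operations explicit.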
Ad-hoc querying – of any datastore
With DataCleaner 3.1 you can now perform ad-hoc queries against any datastore! Queries can be expressed in plain SQL and will be applied to databases as well as files, NoSQL databases and more, providing a truly helpful query mechanism that extends your discovery and data profiling experience.
The query option is also available through a web service to monitoring users with the ADMIN role. The query is provided as an HTTP parameter or POST body, and the result is provided as an XHTML table.
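The kind of ad-hoc query described here, and the idea of returning the result as a table in markup, can be sketched with Python's built-in `sqlite3` module. This uses an in-memory SQLite database as a stand-in for a datastore; it does not call DataCleaner or its web service, and the table, data and `query_to_xhtml` helper are made up for illustration.

```python
import sqlite3
from html import escape


def query_to_xhtml(connection, sql):
    """Run a query and render the result as a simple XHTML table (hypothetical helper)."""
    cursor = connection.execute(sql)
    header = "".join(f"<th>{escape(column[0])}</th>" for column in cursor.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in cursor
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# An in-memory database standing in for any datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "DK"), ("Bob", "NL"), ("Cid", "DK")],
)

# A DISTINCT query, useful for quick discovery of the values in a column.
countries = [row[0] for row in conn.execute("SELECT DISTINCT country FROM customers ORDER BY country")]

# A wildcard query rendered as a table.
table = query_to_xhtml(conn, "SELECT * FROM customers WHERE country = 'DK' ORDER BY name")
print(countries)
print(table)
```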
Value matcher – a new analysis option
Oftentimes you have a firm idea of which values should be allowed and expected for a particular field. In DataCleaner there’s always been the Value Distribution analysis option, which would help you assert your assumptions. In DataCleaner 3.1, though, you have a more precise offering: the Value matcher. This analysis option allows you to specify a set of expected values and then perform a value-distribution-like analysis, specifically to validate and identify unexpected values.
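Conceptually, a value matcher is a value distribution split into expected and unexpected values. The following sketch shows that idea in Python; the `value_matcher` function and the sample data are hypothetical and do not reflect how the analyzer is implemented in DataCleaner.

```python
from collections import Counter


def value_matcher(values, expected_values):
    """Count occurrences of each expected value and collect unexpected ones."""
    expected = set(expected_values)
    counts = Counter(values)
    # Every expected value is reported, even when it never occurs (count 0).
    matched = {value: counts.get(value, 0) for value in expected_values}
    # Anything outside the expected set is reported separately.
    unexpected = {value: n for value, n in counts.items() if value not in expected}
    return matched, unexpected


# Made-up gender codes; "X" and lower-case "m" are not in the expected set.
matched, unexpected = value_matcher(["M", "F", "F", "X", "m"], ["M", "F", "U"])
print(matched)
print(unexpected)
```

Reporting expected values with a count of zero is a deliberate choice in this sketch: a value that should occur but never does can be as interesting as one that should not occur at all.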
Copying, deleting and management of jobs
Management of jobs and results in the DataCleaner monitor application has been improved greatly. You can now click a job in the Scheduling page of the monitor, and find management options available for operations such as renaming, copying, deleting and more. Each operation respects the linkages to other artifacts in the monitor, such as analysis results, schedules and more. This means that management of the monitoring repository has become a lot easier and more mature.
Manage data quality history
Sometimes you’re facing situations where you actually want to do monitoring with historic data! It might be that you have historic dumps or backups of databases, which you wish to show and tell the story of. You can now do the analysis of this historic data, upload it to the DataCleaner monitor, and, using a new web service, set a historic date for that particular analysis result. This means that your timelines will properly plot the results using their intended date, even though the results may have been collected at a later point in time.
Clustered scheduler support (EE only)
The scheduler of DataCleaner monitor has been externalized, so that it can be replaced by means of simple configuration. In the Enterprise Edition (EE) of DataCleaner, we provide a clustered scheduler, which makes it possible to load balance and distribute your executions across a cluster of machines.
Single sign-on (SSO) using CAS (EE only)
In the Enterprise Edition (EE) of DataCleaner we now provide a single sign-on option for the monitor application. Now DataCleaner can be an integrated part of your IT infrastructure, also security-wise.
... And a lot more
The above is just a summary. More than thirty issues have been resolved in this release. We have solved several requests coming from the forums and community, and we encourage everyone to use this medium as a vehicle for change. We’re very happy to let the development of DataCleaner be heavily influenced by the streams in the community.
For a full list of changes in DataCleaner 3.1, go to the milestone report in our issue tracker.
To download DataCleaner, go to the downloads page and get your copy now.
To learn more, read the documentation, view the screenshots or watch the webcast demonstrations.
about 12 years
Neopost, the European leader and number two worldwide supplier of mailroom solutions, today announced that it has completed the acquisition of Human Inference.
With products and services marketed in 90 countries and subsidiaries in 29 countries, the Neopost Group has 5,900 employees all over the world, 1,300 sales representatives and 450 R&D engineers.
As the postal sector is undergoing major changes, Neopost is anticipating the needs of its customers by bringing new services and technological innovation to the market. Therefore, Neopost has been acquiring multiple companies; several components have been added to the mix, all relating to the topic of communications between people. Satori Software, a US-based data quality vendor, has been part of the mix for a while, and GMC, a Swiss-based Customer Communications Management vendor, has been acquired recently. For Neopost, Human Inference is a strategic acquisition helping them to create the portfolio that they need to bring future-proof solutions to the market and their current customers.
Neopost has chosen Human Inference for its strong expertise, its proven solutions and its splendid reputation. We will continue to operate independently, with an unchanged management team. Our core values will remain our guidelines. Our customers will be able to enjoy an even broader set of solutions, which we believe will be a perfect fit with our single customer view strategy. In addition, Human Inference will be able to use the sales and distribution channels of Neopost, which will give us the opportunity to service new markets.
Human Inference CEO Winfried van Holland said: "We are very pleased to join Neopost. This offers us access to new markets and the support and relationships from a large organization. Our solutions fit perfectly in Neopost’s portfolio. This way Neopost customers, Human Inference customers, common customers and the DataCleaner community members will benefit from a broader range of solutions allowing them to reduce their risk, become more efficient and grow their profit by deploying a single customer view."
See the press release on the Neopost website.
about 12 years
Who will post the best content for use in DataCleaner?
Human Inference is announcing a competition for the DataCleaner community. The goal is to provide the best contribution for our favourite open source data quality tool.
What kind of
Submitted content can be of many forms:
Educational content like tutorials, videos etc.
Regular Expressions for the RegexSwap.
DataCleaner extensions for the ExtensionSwap.
Reference data for inclusion in the tool.
Use case descriptions – tell the community about your experiences.
Third party tool integration.
We do cherish everything in the community being free. But we will also be giving a nice prize to the winner with the best submission. The exact prize is to be announced shortly. All submissions will be reviewed and mentioned on the DataCleaner website.
Content must be submitted before Christmas (December 24) 2012. Post a comment on this discussion topic to tell the community where and how to retrieve your submitted content. We also encourage people to join our Google+ community hangouts where authors will be invited to present their contributions.
Submitted contributions (so far)
Here's a list of the submitted contributions in the contest so far:
Pentaho Data Integration auto-profiling generator, by Alex Meadows.
about 12 years
Dear DataCleaner users and developers,
We have a new release for you today, version 3.0.3 of DataCleaner. Grab it from the download page before your neighbor does.
The focus of this release has been stability, performance and convenience for monitoring repository maintenance. Thus, the new and improved list follows:
We've added a service for renaming jobs in the monitoring repository. You can access this as a RESTful web service or interactively in the UI.
A web service was added for changing the historic date of an analysis result in the monitoring repository. This is convenient if you have historic dumps of data that you wish to include in a timeline.
The documentation has been updated with more elaborate descriptions of the web services available for repository navigation, job invocation and more.
The login dialog in the desktop application had a low-level version conflict, which caused it to be unusable. This has been fixed.
The web application has been made compatible with legacy JSF containers, widening the range of compatible Java web servers.
Caching of configuration in the web application has been greatly improved, leading to faster page load and job initialization times.
We hope you enjoy this release. It should be 100% backwards compatible with other 3.x releases, so we encourage everyone to upgrade.
about 12 years
We are happy to invite everyone to a new initiative: The DataCleaner community hangout. The community hangout is a chance for users and developers of DataCleaner to meet face-to-face online every once in a while.
The last couple of weeks we've been trying out the new concept with a limited number of people, and we are now ready to extend the invitation to everyone with an interest!
The date of the next hangout is Tuesday the 6th of November at 10:00 CET. Please be aware of any timezone differences.
The hangouts are happening on Google+ on a semi-weekly basis. The frequency will be adjusted according to the interest in the community. To kick it off, we at Human Inference will provide some presentations and discussion topics for the first couple of sessions, but the idea is also for users and friends to join the hangouts with their own input.
For the next hangout, project founder Kasper Sørensen will be demoing the new monitoring web application, and how it relates to the traditional desktop application.
For more information, go to our Google+ page and sign up for the next hangout.