Category Archives: data correction methods

Announcing the simputation package: make imputation simple

I am happy to announce that my simputation package appeared on CRAN this weekend. This package aims to simplify missing value imputation. In particular, it offers standardized interfaces that make it easy to define both imputation method and imputation … Continue reading

Posted in data cleaning, data correction methods, imputation, programming, R | 5 Comments

validate version 0.1.5 is out

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days. The most important addition is that you can now reference the data set as … Continue reading

Posted in data cleaning, data correction methods, data manipulation, programming, R, Uncategorized | Leave a comment

stringdist 0.8: now with soundex

An update to the stringdist package was released earlier this month. Thanks to a contribution from Jan van der Laan, the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words … Continue reading

Posted in data correction methods, data manipulation, R, string metrics | Leave a comment

Approximate string matching in R

I have released a new version of the stringdist package. Besides some new string distance algorithms, it now contains two convenient matching functions: amatch: Equivalent to R's match function but allowing for approximate matching. ain: Similar to R's %in% … Continue reading

Posted in data correction methods, R, string metrics | 12 Comments

Deductive imputation with the deducorrect package

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, the missing values need not be estimated at all, since they can be derived with certainty from other … Continue reading

Posted in data correction methods, imputation, R | Leave a comment

What do your rules look like? editrules 1.8-x answers with the help of igraph

We (Edwin de Jonge and I) have recently updated our editrules package. The most important new features include (beta) support for categorical data. However, in this post I'm going to show some visualizations we included, made possible by Gabor Csardi's … Continue reading

Posted in data correction methods, R | 2 Comments

Improving data quality with deducorrect

Does your raw numerical data suffer from typos? Sign errors? Variable swaps? Rounding errors? You may be able to fix all that with the deducorrect package. Today, we (that is, Edwin de Jonge, Sander Scholtus and myself) uploaded the 1.0-0 … Continue reading

Posted in data correction methods, R | Leave a comment