Category Archives: data manipulation

Track changes in data with the lumberjack %>>%

So you are using this pipeline to have data processed by different functions in R. For example, you may be imputing some missing values using the simputation package. Let us first load the only realistic dataset in R > data(retailers, …

Posted in data cleaning, data manipulation, programming, R

validate version 0.1.5 is out

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days. The most important addition is that you can now reference the data set as …

Posted in data cleaning, data correction methods, data manipulation, programming, R, Uncategorized

stringdist 0.8: now with soundex

An update to the stringdist package was released earlier this month. Thanks to a contribution from Jan van der Laan, the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words …

Posted in data correction methods, data manipulation, R, string metrics

sort.data.frame

I came across this post on SO, where several solutions to sorting data.frames are presented. It must have been solved a million times, but here's a solution I like to use. It benefits from the fact that sort is an …

Posted in data manipulation, R | 5 Comments