Tag Archives: data cleaning

validate 1.0.1: new features and a cookbook

Version 1.0.1 of our validate package arrived on CRAN on 2020-12-08. At the same time, a complete Data Validation Cookbook has been published online; it is also included with the package as a vignette. The new features of validate … Continue reading

Posted in programming, R | Leave a comment

validate 0.9.3 is on CRAN

CRAN just accepted the latest version of our R package validate. The validate package provides an infrastructure to perform any data quality check in a flexible and extensible way. This is a minor update with the following new features: New … Continue reading

Posted in R, Uncategorized | 2 Comments

gower 0.2.0 is on CRAN

A new version of the R package gower has just been released on CRAN. Thanks to our new contributor David Turner, who was kind enough to provide a pull request, gower now also computes weighted Gower distances. From the NEWS file: … Continue reading

Posted in programming, R | 1 Comment

stringdist now with C API

A new version of stringdist is on CRAN. The main new feature, with huge thanks to our awesome new contributor Chris Muir, is that it is now easy to call stringdist functionality from your package's C or C++ code. The … Continue reading

Posted in programming, R | 2 Comments

Track changes in data with the lumberjack %>>%

So you are using this pipeline to have data treated by different functions in R. For example, you may be imputing some missing values using the simputation package. Let us first load the only realistic dataset in R

> data(retailers, … Continue reading

Posted in programming, R | Leave a comment

Announcing the simputation package: make imputation simple

I am happy to announce that my simputation package has appeared on CRAN this weekend. This package aims to simplify missing value imputation. In particular it offers standardized interfaces that make it easy to define both imputation method and imputation … Continue reading

Posted in programming, R | 5 Comments

validate version 0.1.5 is out

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days. The most important addition is that you can now reference the data set as … Continue reading

Posted in programming, R, Uncategorized | Leave a comment

Easy data validation with the validate package

The validate package is our attempt to make checking data against domain knowledge as easy as possible. Here is an example.

library(magrittr)
library(validate)
iris %>% check_that(
    Sepal.Width > 0.5 * Sepal.Length
  , mean(Sepal.Width) > 0
  , if ( Sepal.Width > … Continue reading

Posted in programming, R | 14 Comments