Author Archives: mark

stringdist 0.9.4.2 released

stringdist 0.9.4.2 was accepted on CRAN at the end of last week. This release just fixes a few bugs affecting the stringdistmatrix function, when called with a single argument. From the NEWS file: bugfix in stringdistmatrix(a): value of p, for … Continue reading

Posted in programming, R, string metrics | 2 Comments

validate version 0.1.5 is out

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days. The most important addition is that you can now reference the data set as … Continue reading

Posted in data cleaning, data correction methods, data manipulation, programming, R, Uncategorized | Leave a comment

Easy data validation with the validate package

The validate package is our attempt to make checking data against domain knowledge as easy as possible. Here is an example. library(magrittr) library(validate) iris %>% check_that( Sepal.Width > 0.5 * Sepal.Length , mean(Sepal.Width) > 0 , if ( Sepal.Width > … Continue reading

Posted in data cleaning, programming, R | 14 Comments

settings 0.2.3

An updated version of the settings package has been accepted on CRAN. The settings package provides alternative options settings management for R. It is aimed to allow for layered options management where global options are the default that can easily … Continue reading

Posted in programming, R | Leave a comment

stringdist 0.9.4 and 0.9.3: distances between integer sequences

A new release of stringdist has been accepted on CRAN. stringdist offers a number of popular distance functions between sequences of integers or characters that are independent of character encoding. version 0.9.4 bugfix: edge case for zero-size for lower tridiagonal … Continue reading

Posted in programming, R, string metrics | Leave a comment

Stringdist 0.9.2: dist objects, string similarities and some deprecated arguments

On 24-06-2015 stringdist 0.9.2 was accepted on CRAN. A summary of new features can be found in the NEWS file; here I discuss the changes with some examples. Computing 'dist' objects with 'stringdistmatrix' The R dist object is used as … Continue reading

Posted in programming, R, string metrics | Leave a comment

stringdist 0.9: exercise all your cores

The latest release of the stringdist package for approximate text matching has two performance-enhancing novelties. First of all, encoding conversion got a lot faster since this is now done from C rather than from R. Secondly, stringdist now employs multithreading … Continue reading

Posted in programming, R, string metrics | 4 Comments

Easy to use option settings management with the 'settings' package

Last week I released a new package called settings. It grew out of my frustration built up during several small projects where I'm generating heavily parameterized d3/js output. What I wanted was support to define a whole bunch of option … Continue reading

Posted in programming, R | Leave a comment

stringdist 0.8: now with soundex

An update to the stringdist package was released earlier this month. Thanks to a contribution of Jan van der Laan the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words … Continue reading

Posted in data correction methods, data manipulation, R, string metrics | Leave a comment

sort.data.frame

I came accross this post on SO, where several solutions to sorting data.frames are presented. It must have been solved a million times, but here's a solution I like to use. It benefits from the fact that sort is an … Continue reading

Posted in data manipulation, R | 5 Comments