stringdist 0.9.6 on CRAN: new features

stringdist version 0.9.6 arrived on CRAN on 16 July 2020.

This release brings a few new features.

Fuzzy text search

Search text for approximate matches of a search string using any stringdist distance. There are several functions that allow you to

  • detect whether there is a match within a certain maximum distance
  • return the position of the first best match
  • return the best match.

There are several interfaces for this. Functions grab and grabl work like base grep and grepl. The function extract has output similar to stringr::str_extract. The workhorse function is called afind (approximate find), which returns all results for multiple search patterns.
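As a rough sketch of how these interfaces could be called: the example texts below are arbitrary, and the argument names maxDist and value are assumed from the package documentation, so check ?afind before relying on them.

```r
library(stringdist)

# Arbitrary example texts, used only for illustration
texts <- c("When I grow up, I want to be",
           "one of the harvesters of the sea",
           "I think before my days are gone",
           "I want to be a fisherman")

# grabl: like grepl, one logical per text (is there a match within maxDist?)
hits <- grabl(texts, "grew", maxDist = 1)

# grab: like grep, the indices of the texts that contain a match
idx <- grab(texts, "grew", maxDist = 1)

# extract: the best matching piece of each text, NA if none is within maxDist
best <- extract(texts, "harvested", maxDist = 3)

# afind: start location, distance and best match for every text/pattern pair
res <- afind(texts, c("fish", "gone", "to be"))
str(res)
```

Here afind is called with several patterns at once, while grab, grabl and extract are shown with a single pattern and a distance threshold.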

There is also a new implementation of the popular 'cosine' distance that I developed especially for this purpose. It is called 'running_cosine', and it avoids the duplicate work that the standard 'cosine' method would otherwise do. The result is a much faster implementation (up to about 100 times faster).
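A minimal sketch of switching between the two methods in a search, assuming afind accepts both through its method argument together with a q-gram size q. The text and pattern are made up for illustration.

```r
library(stringdist)

text <- "I want to be a fisherman"

# The same search with the standard and the running cosine distance on 3-grams
std <- afind(text, "fisher", method = "cosine", q = 3)
run <- afind(text, "fisher", method = "running_cosine", q = 3)

# Both methods should report the same start position for the best match
std$location
run$location
```

On a short text like this the speed difference will not be noticeable. Any gain should show up when searching long texts.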

String similarity matrices

Thanks to a PR by Johannes Gruber, stringdist now has a function to compute string similarity matrices: stringsimmatrix.
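A small sketch of what a call might look like, assuming stringsimmatrix takes two character vectors and a method argument in the same way as the existing distance functions. The words are arbitrary.

```r
library(stringdist)

words <- c("hello", "hallo", "world")

# Pairwise similarities between all words, based on Levenshtein distance;
# identical strings get similarity 1
sim <- stringsimmatrix(words, words, method = "lv")
sim
```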
