I have released a new version of the stringdist package.
Besides some new string distance algorithms, it now contains two convenient matching functions:
amatch: Equivalent to R's match function, but allowing for approximate matching.
ain: Similar to R's %in% operator, but allowing for approximate matching.
# here's an example of amatch
> x <- c('foo', 'bar')
# if we decrease the maximum allowed distance, we get
# just like with 'match' you can control the output of no-matches:
# to see if 'fu' matches approximately with any element of x:
# however, if we allow for larger distances
Check the help file for other options, such as how to choose the string distance algorithm.
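As a minimal sketch of the kind of calls those comments describe: the maxDist values below are illustrative assumptions, and the results noted in the comments assume the default distance method (optimal string alignment), under which 'fu' is 2 edits away from 'foo' and 3 edits away from 'bar'.

```r
library(stringdist)

x <- c('foo', 'bar')

# Loose threshold: 'foo' is the closest element within maxDist,
# so its position in x is returned.
idx_loose <- amatch('fu', x, maxDist = 3)               # 1

# Strict threshold: no element is within 1 edit, so the result is NA.
idx_strict <- amatch('fu', x, maxDist = 1)              # NA

# As with match(), nomatch sets the value returned for no-matches.
idx_zero <- amatch('fu', x, maxDist = 1, nomatch = 0)   # 0

# ain() answers the yes/no question instead of returning a position.
in_strict <- ain('fu', x, maxDist = 1)                  # FALSE
in_loose  <- ain('fu', x, maxDist = 3)                  # TRUE
```

Raising maxDist trades precision for recall, so it is worth picking the smallest value that still catches the typos you expect.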
Previously, the distance functions returned -1 if a distance was undefined or exceeded a predefined maximum. Now, these functions return Inf in such cases, which makes comparisons easier. This may break your code if you explicitly test the output for -1.
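A small base-R sketch of why the Inf convention is easier to work with. The distance vectors here are made-up illustrative values, not output from the package, and max_dist is a hypothetical threshold.

```r
max_dist <- 2

# Hypothetical distances where the second one was undefined or over the maximum.
d_old <- c(1, -1, 3)    # old convention: -1 as the sentinel
d_new <- c(1, Inf, 3)   # new convention: Inf as the sentinel

# With -1, a plain threshold test wrongly accepts the sentinel ...
within_old_naive <- d_old <= max_dist           # TRUE  TRUE FALSE
# ... so the sentinel had to be excluded explicitly.
within_old <- d_old >= 0 & d_old <= max_dist    # TRUE FALSE FALSE

# With Inf, the plain comparison is already correct.
within_new <- d_new <= max_dist                 # TRUE FALSE FALSE

# Code that tested for -1 needs updating, for example with is.infinite().
flag_old <- d_old == -1                         # FALSE TRUE FALSE
flag_new <- is.infinite(d_new)                  # FALSE TRUE FALSE
```

If your code contains checks like `d == -1` or `d < 0`, those are the places to revisit after upgrading.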
With the latest release also arrive the latest bugs, so please drop me a line if you happen to stumble upon one.
The next release will probably not include any user-facing changes, but I'm planning to improve performance through smarter memory allocation and better maxDist handling for some of the string distance algorithms. I've also started working on a 'useBytes' option, which gives a considerable performance improvement for edit-based distances, at the cost of encoding-dependent results.