Version 0.9.5.1 of stringdist is on CRAN. The main new feature, with a huge thanks to our awesome new contributor Chris Muir, is that we made it easy to call stringdist functionality from your package's C or C++ code.
The main steps to get it done are:
- Make sure to add stringdist to the Imports: and LinkingTo: fields in your DESCRIPTION file.
- Add #include <stringdist_api.h> to your C/C++ source file.
- Start using stringdist from C!
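For the first step, the relevant part of a DESCRIPTION file could look like the fragment below. It's only a sketch showing the two fields mentioned above; everything else in your DESCRIPTION stays as it is.

```
Imports:
    stringdist
LinkingTo:
    stringdist
```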
Here's an example source file:
#include <R.h>
#include <Rdefines.h>
#include <stringdist_api.h>

/* Thin wrapper: prints a message, then hands the work to stringdist's soundex encoder */
SEXP my_soundex(SEXP strings, SEXP useBytes){
  Rprintf("\nWow, using 'stringdist' soundex encoding, from my own C code!\n");
  return sd_soundex(strings, useBytes);
}
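In case you're wondering what a soundex code looks like: here's a rough sketch of the classic American Soundex rules in plain C, without any R headers. To be clear, this is not the code stringdist uses, and it makes no claim about what sd_soundex returns for unusual input. The names toy_soundex and soundex_digit are made up for this illustration, and the sketch assumes plain ASCII names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Soundex digit for an uppercase ASCII letter.
 * '0' marks uncoded letters (vowels and Y), '-' marks H and W. */
static char soundex_digit(char c) {
    switch (c) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        case 'H': case 'W': return '-';
        default: return '0';
    }
}

/* Hypothetical helper: writes a 4-character code plus terminating NUL to out. */
void toy_soundex(const char *name, char out[5]) {
    size_t n = 0;     /* number of code characters written so far */
    char prev = '0';  /* digit of the previous letter that was not H or W */

    assert(name != NULL && out != NULL);
    memcpy(out, "0000", 5);  /* zero padding, including the NUL */

    for (const char *p = name; *p != '\0' && n < 4; ++p) {
        if (!isalpha((unsigned char)*p)) continue;  /* skip non-letters */
        char c = (char)toupper((unsigned char)*p);
        char d = soundex_digit(c);
        if (n == 0) {
            out[n++] = c;  /* the first letter is kept as-is */
            prev = d;
        } else if (d == '-') {
            continue;      /* H and W do not separate equal digits */
        } else {
            /* emit a digit only when it differs from the previous one */
            if (d != '0' && d != prev) out[n++] = d;
            prev = d;      /* a vowel resets prev, so repeats can be coded again */
        }
    }
}
```

With this sketch, calling toy_soundex("Robert", out) and toy_soundex("Rupert", out) both leave "R163" in out, which is the whole point of a phonetic code: names that sound alike get the same key.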
Great! How can I learn more?
- The full API is described in a PDF file, generated with doxygen, that comes with the package. You can find it by typing ?stringdist_api on the R command line.
- A minimal example package that links to stringdist is available on GitHub.
- A more sophisticated package with more elaborate examples can be found here: refinr (by Chris).
Any other news?
A few fixes, and a couple of long-deprecated function arguments have finally been removed. Check out the NEWS file on CRAN for a complete overview.
Happy coding!
I recently read your benchmarks of stringdist vs RecordLinkage on R-Bloggers (several years old now). Is stringdist the fastest? I'm trying to do a comparison of around 20,000 x 20,000 records and I'm looking for the fastest way to process them.
It depends a bit on the distance you use, but I would say in general: yes, especially since stringdist can take advantage of multiple cores.