How to set library path on a {parallel} R cluster

In R you can add extra library locations (directories where your packages are installed) with the .libPaths() function. For example, to add "~/my/lib", you can do

libs <- c("~/my/lib", .libPaths())
.libPaths(new = libs)

If you want to set library locations for all workers in a cluster using the parallel package, the intuitive way of doing this is as follows.

libs <- c("~/my/lib", .libPaths())
cluster <- parallel::makeCluster(2)
parallel::clusterCall(cluster, .libPaths, new = libs)

However, this does not work. I have not spent any time figuring out why, but presumably the side effect caused by .libPaths() is sent to the wrong place. Here are the internals of .libPaths().

> .libPaths
function (new) 
{
    if (!missing(new)) {
        new <- Sys.glob(path.expand(new))
        paths <- c(new, .Library.site, .Library)
        paths <- paths[dir.exists(paths)]
        .lib.loc <<- unique(normalizePath(paths, "/"))
    }
    else .lib.loc
}

The side effect is where .lib.loc is altered.
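One likely reading of the failure (an interpretation, not something stated above): .libPaths is a closure, and .lib.loc lives in its enclosing environment. When the function object itself is sent to a worker, that environment is copied along with it, so the `<<-` assignment updates the copy rather than the worker's own .lib.loc. The toy sketch below mimics this with serialize() and unserialize() instead of a real cluster; make_setter and state are hypothetical names used only for illustration.

```r
# A toy closure built like .libPaths: its state lives in the enclosing
# environment and is changed with <<-.
make_setter <- function() {
  state <- "old"
  function(new) {
    if (!missing(new)) state <<- new
    state
  }
}
setter <- make_setter()

# Round-trip the function through serialize()/unserialize(), which is
# roughly what happens to a function object shipped to a worker.
copy <- unserialize(serialize(setter, NULL))

# The assignment lands in the copy's environment only.
copy("new")

original_state <- setter()  # state seen by the original closure
copied_state   <- copy()    # state seen by the round-tripped copy
```

Here original_state is still "old" while copied_state is "new", which matches the symptom: the call appears to succeed, but the state that matters is never touched.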

In any case, the following approach does work. We export the libs variable to the workers and then set the library paths by calling .libPaths() on each worker with clusterEvalQ().

library(parallel)

# Keep libs in its own environment so it can be exported by name.
e <- new.env()
e$libs <- c("~/my/lib", .libPaths())

cluster <- makeCluster(2)
clusterExport(cluster, "libs", envir = e)
clusterEvalQ(cluster, .libPaths(libs))
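A variant that is not from this post, offered as a sketch: wrapping the call in a small anonymous function avoids the export step, because the name .libPaths is then looked up on the worker when the wrapper runs. The directory my_lib is a hypothetical stand-in created under tempdir() so the example runs anywhere, and it assumes the workers run on the same machine.

```r
library(parallel)

# Hypothetical library directory; .libPaths() drops directories that do
# not exist, so create one for the purpose of the example.
my_lib <- file.path(tempdir(), "my-lib-wrapper")
dir.create(my_lib, showWarnings = FALSE)
libs <- c(my_lib, .libPaths())

cluster <- makeCluster(2)

# The wrapper is shipped to the workers; .libPaths inside it is resolved
# there, so each worker changes its own library paths.
invisible(clusterCall(cluster, function(p) .libPaths(p), libs))

# Check that every worker now lists the new directory.
worker_paths <- clusterEvalQ(cluster, .libPaths())
ok <- vapply(worker_paths,
             function(p) normalizePath(my_lib, "/") %in% p,
             logical(1))

stopCluster(cluster)
```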

Update (2020-12-23): I posted a question about this on the R-devel mailing list, and Luke Tierney was kind enough to explain what is happening here. He also provided a simpler workaround, namely passing .libPaths as a string.

clusterCall(cluster, ".libPaths", new=libs)
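To confirm that the workers really picked up the new location, one can ask each of them for its library paths afterwards. The following is a minimal sketch of the string-based call plus that check; my_lib is a hypothetical directory created under tempdir() so the example runs without "~/my/lib" existing, and it assumes the workers run on the same machine.

```r
library(parallel)

# Hypothetical library directory; .libPaths() silently drops directories
# that do not exist, so create one first.
my_lib <- file.path(tempdir(), "my-lib")
dir.create(my_lib, showWarnings = FALSE)
libs <- c(my_lib, .libPaths())

cluster <- makeCluster(2)

# Pass the function by name so each worker calls its own .libPaths().
invisible(clusterCall(cluster, ".libPaths", new = libs))

# Ask every worker for its current library paths and check for my_lib.
worker_paths <- clusterEvalQ(cluster, .libPaths())
ok <- vapply(worker_paths,
             function(p) normalizePath(my_lib, "/") %in% p,
             logical(1))

stopCluster(cluster)
```

If everything went well, ok is TRUE for each worker.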