How to set library path on a {parallel} R cluster

In R, you can add extra library locations (directories where your packages are installed) with the .libPaths() function. For example, to add "~/my/lib", you can do:

libs <- c("~/my/lib", .libPaths())
.libPaths(new = libs)
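The printed source of .libPaths() further down keeps only paths for which dir.exists() is TRUE, so a directory that does not exist is dropped without any warning. Here is a minimal sketch that checks this. It assumes a temporary directory in place of ~/my/lib, and the names my_lib and missing_lib are just for illustration:

```r
# Assumption: "~/my/lib" may not exist on your machine, so use a freshly
# created temporary directory instead, plus a path that does not exist.
my_lib <- file.path(tempdir(), "my_lib")
dir.create(my_lib, showWarnings = FALSE)
missing_lib <- file.path(tempdir(), "does_not_exist")

old <- .libPaths()
.libPaths(c(my_lib, missing_lib, old))

new_paths <- .libPaths()
print(new_paths)

# my_lib should now come first (normalized with "/" separators) ...
in_front <- new_paths[1] == normalizePath(my_lib, "/")
# ... and the non-existent directory should have been silently dropped.
dropped <- !any(grepl("does_not_exist", new_paths, fixed = TRUE))

.libPaths(old)  # restore the original search path
```

So if "~/my/lib" seems to be ignored, it is worth checking that the directory actually exists.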

If you want to set library locations for all workers in a cluster using the parallel package, the intuitive way of doing this is as follows.

libs <- c("~/my/lib", .libPaths())
cluster <- parallel::makeCluster(2)
parallel::clusterCall(cluster, .libPaths, new = libs)

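However, this does not work: the call runs without error, but afterwards .libPaths() on the workers still returns the old locations. I have not spent any time figuring out why, but presumably the side effect of .libPaths() takes place in the wrong place. Here are the internals of .libPaths():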

> .libPaths
function (new) 
{
    if (!missing(new)) {
        new <- Sys.glob(path.expand(new))
        paths <- c(new, .Library.site, .Library)
        paths <- paths[dir.exists(paths)]
        .lib.loc <<- unique(normalizePath(paths, "/"))
    }
    else .lib.loc
}

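The side effect is the <<- assignment to .lib.loc, which updates a variable in the function's enclosing environment rather than a local variable.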
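One plausible way for that side effect to end up in the wrong place: the function object sent by clusterCall() might be serialized together with a copy of its enclosing environment. The sketch below mimics this with a toy closure. Here counter is a hypothetical stand-in for .libPaths(), and serialize()/unserialize() stand in for shipping the function to a worker:

```r
# A toy closure with private state that <<- updates, built with local(),
# analogous to how .lib.loc sits outside the body of .libPaths().
counter <- local({
  count <- 0
  function() {
    count <<- count + 1
    count
  }
})

# Simulate sending the function to a worker: serialize it and read it back.
# Its enclosing environment is not a namespace, so it is copied by value.
worker_copy <- unserialize(serialize(counter, NULL))

worker_copy()
worker_copy()

original_count <- environment(counter)$count   # untouched
copy_count <- environment(worker_copy)$count   # updated by the two calls
```

If the same copying happens on the workers, the <<- inside the shipped .libPaths updates a throwaway copy of .lib.loc, and the worker's real library path never changes.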

In any case, the following approach does work: we export the libs variable to the workers and then call .libPaths() on each of them with clusterEvalQ().

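library(parallel)

e <- new.env()
e$libs <- c("~/my/lib", .libPaths())

cluster <- makeCluster(2)
# Copy libs from environment e into each worker's global environment
clusterExport(cluster, "libs", envir = e)
# Evaluate .libPaths(libs) on each worker, using the worker's own .libPaths
clusterEvalQ(cluster, .libPaths(libs))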
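To confirm that the workers really picked up the new location, you can ask each of them for its first library path afterwards. This is a self-contained variant of the approach above. It assumes a temporary directory in place of ~/my/lib, since .libPaths() drops directories that do not exist:

```r
library(parallel)

# Assumption: a temporary directory stands in for "~/my/lib".
my_lib <- file.path(tempdir(), "my_lib")
dir.create(my_lib, showWarnings = FALSE)

e <- new.env()
e$libs <- c(my_lib, .libPaths())

cluster <- makeCluster(2)
clusterExport(cluster, "libs", envir = e)
invisible(clusterEvalQ(cluster, .libPaths(libs)))

# Each worker's first library location should now be my_lib,
# normalized the same way .libPaths() normalizes it.
first_paths <- unlist(clusterEvalQ(cluster, .libPaths()[1]))
print(first_paths)

stopCluster(cluster)
```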

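Update (2020-12-23): I posted a question about this on the R-devel mailing list, and Luke Tierney was kind enough to explain what is happening here. In short, the .libPaths function object sent by clusterCall() arrives on each worker with its own copy of the enclosing environment, so the <<- assignment updates that copy rather than the worker's real .lib.loc. He also provided a simpler workaround: pass .libPaths as a string, so that each worker looks up and calls its own .libPaths().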

clusterCall(cluster, ".libPaths", new=libs)
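If you do this often, the string workaround can be wrapped in a small helper. This is a minimal sketch: prepend_worker_libpath() is a hypothetical name, not part of {parallel}, and a temporary directory again stands in for ~/my/lib:

```r
library(parallel)

# Hypothetical helper: prepend `lib` to the library paths of every worker in `cl`.
prepend_worker_libpath <- function(cl, lib) {
  libs <- c(lib, .libPaths())
  # The function is passed by name, so each worker looks up and calls its
  # own .libPaths(), whose <<- then updates that worker's real .lib.loc.
  invisible(clusterCall(cl, ".libPaths", new = libs))
}

my_lib <- file.path(tempdir(), "my_lib")
dir.create(my_lib, showWarnings = FALSE)

cl <- makeCluster(2)
prepend_worker_libpath(cl, my_lib)
first_paths <- unlist(clusterEvalQ(cl, .libPaths()[1]))
print(first_paths)
stopCluster(cl)
```

Compared with the clusterExport() approach, this avoids creating a libs variable in each worker's global environment.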