Path assignment to setwd() is delayed in for/foreach loop

The objective is to change, within a for loop, the current working directory and do some other stuff in it, e.g. search for files. The paths are stored in generic variables.
The R code I am running for this is the following:
require("foreach")
# The following lines are generated by an external tool and stored in filePath.info
# Loaded via source("filePaths.info")
result1 <- '/home/user/folder1'
result2 <- '/home/user/folder2'
result3 <- '/home/user/folder3'
number_results <- 3
# So I know that I have all in all 3 folders with results by number_results
# and that the variable name that contains the path to the results is generic:
# string "result" plus 1:number_results.
# Now I want to switch to each result path and do some computation within each folder
start_dir <- getwd()
print(paste0("start_dir: ",start_dir))
# For every result folder switch into the directory of the folder
foreach(i=1:number_results) %do% {
  # for (i in 1:number_results){ leads to the same output
  # Assign the path to a variable, not the variable name as a string:
  # current_variable <- result1 (not the string "result1")
  current_variable <- eval(parse(text = paste0("result", i)))
  print(paste0(current_variable, " in interation_", i))
  # Set working directory to the path stored in current_variable
  current_dir <- setwd(current_variable)
  print(paste0("current_dir: ", current_dir))
  # DO SOME OTHER STUFF WITH FILES IN THE CURRENT FOLDER
}
# Switch back into original directory
current_dir <- setwd(start_dir)
print(paste0("end_dir: ",current_dir))
The output is the following ...
[1] "start_dir: /home/user"
[1] "/home/user/folder1 in interation_1"
[1] "current_dir: /home/user"
[1] "/home/user/folder2 in interation_2"
[1] "current_dir: /home/user/folder1"
[1] "/home/user/folder3 in interation_3"
[1] "current_dir: /home/user/folder2"
[1] "end_dir: /home/user/folder3"
... while I would have expected this:
[1] "start_dir: /home/user"
[1] "/home/user/folder1 in interation_1"
[1] "current_dir: /home/user/folder1"
[1] "/home/user/folder2 in interation_2"
[1] "current_dir: /home/user/folder2"
[1] "/home/user/folder3 in interation_3"
[1] "current_dir: /home/user/folder3"
[1] "end_dir: /home/user/"
So it turns out that the path assigned to current_dir is somewhat "behind" ...
Why is this the case?
As I am far from being an R expert, I have no idea what is causing this behaviour and, most importantly, how to get the desired behaviour.
So any help, hint, code correction/optimization would be highly appreciated!
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

From the ?setwd help page...
setwd returns the current directory before the change, invisibly and with the same conventions as getwd. It will give an error if it does not succeed (including if it is not implemented).
So when you do
current_dir <- setwd(current_variable)
print(paste0("current_dir: ",current_dir))
You are not getting the "current" directory; you are getting the previous one. Use getwd() to get the current one:
setwd(current_variable)
current_dir <- getwd()
print(paste0("current_dir: ",current_dir))
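Putting it together, here is a minimal self-contained sketch of the corrected loop. The result folders are hypothetical and created under tempdir() purely for the demo, and get() is shown as a simpler alternative to eval(parse(...)) for resolving the generic variable names:

```r
# hypothetical result folders, created under tempdir() just for this demo
result1 <- file.path(tempdir(), "folder1")
result2 <- file.path(tempdir(), "folder2")
number_results <- 2
for (i in 1:number_results) {
  dir.create(get(paste0("result", i)), showWarnings = FALSE)
}

start_dir <- getwd()
for (i in 1:number_results) {
  # get() resolves the generic name without eval(parse(...))
  current_variable <- get(paste0("result", i))
  setwd(current_variable)   # the return value (the *previous* dir) is discarded
  current_dir <- getwd()    # getwd() reports the directory we are in right now
  print(paste0("current_dir: ", current_dir))
}
setwd(start_dir)            # switch back to the original directory
```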

Use function from a package but different versions simultaneously

I have multiple versions of the same package foo (each with one function bar) that I all want to use in the same script.
Following this question, I can load v1 of the package with library("foo", lib.loc = "pkgs/v1"). But this loads all of the functions from the package.
Now I want to assign foo::bar from version v1 to bar_v1 and foo::bar from version v2 to bar_v2, so I can call them independently. But I do not see an option to load only one function of the library given a lib location (e.g. a solution would be to specify a lib.loc in the function call bar_v1 <- foo::bar).
Is this possible in R?
MWE
I have created a test package here github.com/DavZim/testPkg, which has one function foo which prints the package version (hard coded). The package has two releases, one for each version.
To get the tar.gz files of the package, you can use this
# Download Files from https://github.com/DavZim/testPkg
download.file("https://github.com/DavZim/testPkg/releases/download/v0.1.0/testPkg_0.1.0.tar.gz", "testPkg_0.1.0.tar.gz")
download.file("https://github.com/DavZim/testPkg/releases/download/v0.2.0/testPkg_0.2.0.tar.gz", "testPkg_0.2.0.tar.gz")
Then, to set up the folder structure in the form of
pkgs/
  0.1.0/
    testPkg/
  0.2.0/
    testPkg/
I use
if (dir.exists("pkgs")) unlink("pkgs", recursive = TRUE)
dir.create("pkgs")
dir.create("pkgs/0.1.0")
dir.create("pkgs/0.2.0")
# install the packages locally
install.packages("testPkg_0.1.0.tar.gz", lib = "pkgs/0.1.0", repos = NULL)
install.packages("testPkg_0.2.0.tar.gz", lib = "pkgs/0.2.0", repos = NULL)
Now the question is what do I write in myscript.R?
Ideally I would have something like this
bar_v1 <- some_function(package = "testPkg", function = "foo", lib.loc = "pkgs/0.1.0")
bar_v2 <- some_function(package = "testPkg", function = "foo", lib.loc = "pkgs/0.2.0")
bar_v1() # calling testPkg::foo from lib.loc pkgs/0.1.0
#> [1] "Hello World from Version 0.1.0"
bar_v2() # calling testPkg::foo from lib.loc pkgs/0.2.0
#> [1] "Hello World from Version 0.2.0"
Non-working Try
Playing around with it, I thought something like this might work.
But it doesn't...
lb <- .libPaths()
.libPaths("pkgs/0.1.0")
v1 <- testPkg::foo
v1()
#> [1] "Hello from 0.1.0"
.libPaths("pkgs/0.2.0")
v2 <- testPkg::foo
v2()
#> [1] "Hello from 0.1.0"
.libPaths(lb)
v1()
#> [1] "Hello from 0.1.0"
v2()
#> [1] "Hello from 0.1.0" #! This should be 0.2.0!
Interestingly, if I swap around the versions to load 0.2.0 first then 0.1.0, I get this
lb <- .libPaths()
.libPaths("pkgs/0.2.0")
v1 <- testPkg::foo
v1()
#> [1] "Hello from 0.2.0"
.libPaths("pkgs/0.1.0")
v2 <- testPkg::foo
v2()
#> [1] "Hello from 0.2.0"
.libPaths(lb)
v1()
#> [1] "Hello from 0.2.0"
v2()
#> [1] "Hello from 0.2.0"
1) Successive loads Assume that we have source packages for testPkg in the current directory and they are named testPkg_0.1.0.tar.gz and testPkg_0.2.0.tar.gz. Now create pkgs, pkgs/0.1.0 and pkgs/0.2.0 directories to act as release libraries and then install those source packages into those libraries.
Now assuming each has a function foo which does not depend on other objects in the package, load each package in turn, rename foo, and detach/unload the package. Now both can be accessed under the new names.
dir.create("pkgs")
dir.create("pkgs/0.1.0")
dir.create("pkgs/0.2.0")
install.packages("testPkg_0.1.0.tar.gz", lib = "pkgs/0.1.0", repos = NULL)
install.packages("testPkg_0.2.0.tar.gz", lib = "pkgs/0.2.0", repos = NULL)
library("testPkg", lib.loc = "pkgs/0.1.0")
fooA <- foo
detach("package:testPkg", unload = TRUE)
library("testPkg", lib.loc = "pkgs/0.2.0")
fooB <- foo
detach("package:testPkg", unload = TRUE)
fooA
fooB
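The same load–extract–unload idea can be wrapped in a small helper, here using loadNamespace() rather than library() so nothing is attached to the search path. fetch_fn is a hypothetical name, and the caveats from above still apply: the extracted function must not depend on other objects in its package, and loadNamespace() silently reuses an already-loaded namespace regardless of lib.loc, so the unload step is essential.

```r
fetch_fn <- function(pkg, fn, lib.loc) {
  # load the namespace from the given library without attaching it
  ns <- loadNamespace(pkg, lib.loc = lib.loc)
  f <- get(fn, envir = ns)
  # unload so that a different version of the same package can be loaded
  # next; f keeps a reference to its (now unregistered) namespace environment
  unloadNamespace(ns)
  f
}

# hypothetical usage with the test package from the question:
# bar_v1 <- fetch_fn("testPkg", "foo", "pkgs/0.1.0")
# bar_v2 <- fetch_fn("testPkg", "foo", "pkgs/0.2.0")
```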
2) Change package name Another approach is to install one release normally and then for the other release download its source, change its name in the DESCRIPTION file and then install it under the new name. Then both can be used.
Assuming testPkg_0.1.0.tar.gz and testPkg_0.2.0.tar.gz source packages both have function foo and that the two tar.gz files are in current directory we can accomplish this as follows. Note that changer will both change the DESCRIPTION file to use name testPkgTest and the directory name of the source package to the same.
library(changer)
install.packages("testPkg_0.1.0.tar.gz", repos = NULL)
untar("testPkg_0.2.0.tar.gz")
changer("testPkg", "testPkgTest")
install.packages("testPkgTest", type = "source", repos = NULL)
testPkg::foo()
## [1] "Hello World from Version 0.1.0"
testPkgTest::foo()
## [1] "Hello World from Version 0.2.0"
Old
3) import Below we suggested the import package but unfortunately, as was pointed out in the comments, the code below using this package does not actually work and imports the same package twice. I have created an issue on the import github site. https://github.com/rticulate/import/issues/74
Suppose we have source packages
mypkg_0.2.4.tar.gz and mypkg_0.2.5.tar.gz
in the current directory and that each has a function myfun.
Then this will create a library for each, install them into
the respective libraries and import myfun from each. These will be located in A and B on the search path.
Note that the import package
should be installed but not loaded, i.e. no library(import)
statement should be used. You may wish to read the documentation of the import package since variations of this are possible.
# use development version of import -- the CRAN version (1.3.0) has
# a bug in the .library= argument
devtools::install_github("rticulate/import")
dir.create("mypkglib")
dir.create("mypkglib/v0.2.4")
dir.create("mypkglib/v0.2.5")
install.packages("mypkg_0.2.4.tar.gz", "mypkglib/v0.2.4", NULL)
install.packages("mypkg_0.2.5.tar.gz", "mypkglib/v0.2.5", NULL)
import::from("mypkg", .library = "mypkglib/v0.2.4", .into = "A", myfun)
import::from("mypkg", .library = "mypkglib/v0.2.5", .into = "B", myfun)
search()
ls("A")
ls("B")
get("myfun", "A")
get("myfun", "B")
Another possibility is to put them both into imports (used by default) with different names
import::from("mypkg", .library = "mypkglib/v0.2.4", myfunA = myfun)
import::from("mypkg", .library = "mypkglib/v0.2.5", myfunB = myfun)
search()
ls("imports")
myfunA
myfunB

How can I redirect a system command and its output in R?

I am using the system command to find files on Ubuntu, and I am trying to redirect the result shown on screen to a txt file, as in the following example.
# make file
system("touch a.txt")
system("touch b.txt")
system("touch c.txt")
system("touch d.txt")
library(magrittr)  # for %>%
sink("t.txt")
c("a.txt", "b.txt") %>% lapply(function(f) {
  system(sprintf("find -name %s", f))
})
sink()
but the result turns out to be a list with 0's inside.
Please advise how I can achieve that. Thanks.
The 0's in your list are the exit-status codes returned by system(). You can use the intern = TRUE option, which returns the command's output directly into the R environment as a character vector:
c("a.txt", "b.txt", "test.txt") %>% lapply(function(f) {
  system(sprintf("find -name %s", f), intern = TRUE)
})
[[1]]
[1] "./a.txt"
[[2]]
[1] "./b.txt"
[[3]]
character(0)
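sink() fails here because system() runs a child process whose output goes straight to the terminal, bypassing R's connections, so R-level redirection never sees it. With intern = TRUE the output comes back as a character vector that you can write to the file yourself. A base-R sketch without the pipe, assuming a Unix find on the PATH (file names follow the question):

```r
setwd(tempdir())                      # work somewhere disposable for the demo
file.create(c("a.txt", "b.txt"))      # stand-ins for the touch commands above

results <- lapply(c("a.txt", "b.txt"), function(f) {
  system(sprintf("find . -name %s", f), intern = TRUE)
})
writeLines(unlist(results), "t.txt")  # write the captured paths, one per line
```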

R loop completes only 3 iterations out of 2504

I've written a function to download multiple files from NOAA's database. Firstly, I've got sites which is a list of site ID's that I want to download off the website. It looks like this:
> head(sites)
[[1]]
[1] "9212"
[[2]]
[1] "10158"
[[3]]
[1] "11098"
> length(sites)
[1] 2504
My function is shown below.
tested <- lapply(seq_along(sites), function(x) {
  no <- sites[[x]]
  data <- GET(paste0('https://www.ncdc.noaa.gov/paleo-search/data/search.json?xmlId=', no))
  v <- content(data)
  check <- GET(v$statusUrl)
  j <- content(check)
  URL <- j$archive
  download.file(URL, destfile = paste0('./tree_ring/', no, '.zip'))
})
The weird issue is that it works for the first three sites (downloads properly), but then it stops and throws the following error:
Error in charToRaw(URL) : argument must be a character vector of length 1
I've tried manually downloading the 4th and 5th site (using the same code as above, but not within function) and it works fine. What could be going on here?
EDIT 1: Showing more site ID's as requested
> dput(sites[1:6])
list("9212", "10158", "11098", "15757", "15777", "15781")
I converted your code to a for loop so I could see the most recent values of all your variables when things fail.
The failures aren't consistently on the 4th site. Running your code a few times, sometimes it fails on site 2, or 3, or 4. When it fails, if I look at j, I see this:
$message
[1] "finalizing archive"
$status
[1] "working"
$message
[1] "finalizing archive"
$status
[1] "working"
If I re-run check=GET(v$statusUrl); j<-content(check) a few seconds later, then I see
$archive
[1] "https://www.ncdc.noaa.gov/web-content/paleo/bundle/1986420067_2020-04-23.zip"
$status
[1] "complete"
So, I think it takes the server a little bit of time to prepare the file for download, and sometimes R asks for the file before it's ready, which causes an error. A simple fix might look like this:
check_status <- function(v) {
  check <- GET(v$statusUrl)
  content(check)
}

for (x in seq_along(sites)) {
  no <- sites[[x]]
  data <- GET(paste0('https://www.ncdc.noaa.gov/paleo-search/data/search.json?xmlId=', no))
  v <- content(data)
  try_counter <- 0
  j <- check_status(v)
  while (!identical(j$status, "complete") && try_counter < 100) {
    try_counter <- try_counter + 1  # without incrementing, the loop could spin forever
    Sys.sleep(0.1)
    j <- check_status(v)
  }
  URL <- j$archive
  download.file(URL, destfile = paste0(no, '.zip'))
}
If the status isn't "complete" yet, this version waits 0.1 seconds before checking again, retrying up to 100 times (about 10 seconds in total).

Specify order of import for multiple tables in R

I'm trying to read in 360 data files in text format. I can do so using this code:
temp = list.files(pattern="*.txt")
myfiles = lapply(temp, read.table)
The problem I have is that the files are named as "DO_1, DO_2,...DO_360" and when I try to import the files into a list, they do not maintain this order. Instead I get DO_1, DO_10, etc. Is there a way to specify the order in which the files are imported and stored? I didn't see anything in the help pages for list.files or read.table. Any suggestions are greatly appreciated.
lapply will process the files in the order you have them stored in temp. So your goal is to sort them the way you actually think about them. Luckily there is the mixedsort function from the gtools package that does just the kind of sorting you're looking for. Here is a quick demo.
> library(gtools)
> vals <- paste("DO", 1:20, sep = "_")
> vals
[1] "DO_1" "DO_2" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7" "DO_8" "DO_9"
[10] "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17" "DO_18"
[19] "DO_19" "DO_20"
> vals <- sample(vals)
> sort(vals) # doesn't give us what we want
[1] "DO_1" "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17"
[10] "DO_18" "DO_19" "DO_2" "DO_20" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7"
[19] "DO_8" "DO_9"
> mixedsort(vals) # this is the sorting we're looking for.
[1] "DO_1" "DO_2" "DO_3" "DO_4" "DO_5" "DO_6" "DO_7" "DO_8" "DO_9"
[10] "DO_10" "DO_11" "DO_12" "DO_13" "DO_14" "DO_15" "DO_16" "DO_17" "DO_18"
[19] "DO_19" "DO_20"
So in your case you just want to do
library(gtools)
temp <- mixedsort(temp)
before your call to lapply that calls read.table.
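If you'd rather avoid the gtools dependency, the same ordering can be done in base R by sorting on the extracted numeric suffix, assuming the names really follow the DO_<n>.txt pattern:

```r
temp <- c("DO_1.txt", "DO_10.txt", "DO_2.txt")  # e.g. what list.files() returned
n <- as.integer(sub("^DO_(\\d+)\\.txt$", "\\1", temp))  # pull out the number
temp <- temp[order(n)]                                  # reorder by that number
temp  # "DO_1.txt" "DO_2.txt" "DO_10.txt"
```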

R sQuote(vector) prints "'c("..."'" in non-interactive mode?

I have been stuck on this for quite a while. I am trying to build a SQL query to write to a file, but I keep writing the text 'c("...")' as part of the output file, as if the concatenate function in R were being interpreted very literally.
I have eliminated the write() function itself, toString(), and the paste0() used as part of building the final output string. The first occurrence of the 'c' appears in the output of sQuote. When I try doing a call to sQuote() in interactive mode, I don't get the same behaviour:
Browse[2]> sQuote(sqlTableColumnValues)
[1] "‘c(\"0\", \"XXX0\", \"XXX056\", \"XXX139\", \"XXX143\", \"XXX144\", \"XXX159\", \"XXX171\", \"XXX185\", \"XXX188\", \"XXX192\", \"XXX202\", \"XXX239\", \"XXX240\", \"XXX245\", \"XXX256\", \"XXX271\", \"XXX303\", \"XXX319\", \"XXX326\", \"XXX334\", \"XXX357\", \"XXX363\", \"XXX368\", \"XXX390\", \"XXX391\", \"XXX417\", \"XXX426\", \"XXX431\", \"XXX439\", \"XXX447\", \"XXX456\", \"XXX461\", \"XXX466\", \"XXX475\", \"XXX483\", \"XXX488\", \"XXX491\", \"XXX521\", \"XXX531\", \n\"XXX538\", \"XXX541\", \"XXX548\", \"XXX550\", \"XXX581\")’"
Browse[2]> str(sQuote(sqlTableColumnValues))
chr "‘c(\"0\", \"XXX0\", \"XXX056\", \"XXX139\", \"XXX143\", \"XXX144\", \"XXX159\", \"XXX171\", \"XXX185\","| __truncated__
Browse[2]> tst <- c("foo","bar") #my own interactive test
Browse[2]> tst
[1] "foo" "bar"
Browse[2]> sQuote(tst) #does not show the 'c' character in the result
[1] "‘foo’" "‘bar’"
Browse[2]>
What is causing this discrepancy and how can I stop the 'c(...)' being written to my output file?
Update: dput output as requested:
Browse[2]> dput(sqlTableColumnValues)
structure(list(`1` = c("0", "XXX0", "XXX056", "XXX139",
"XXX143", "XXX144", "XXX159", "XXX171", "XXX185", ... #etc, I've truncated.
I don't yet understand what that means or what to do with this info. :-/
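The dput() output actually contains the explanation: sqlTableColumnValues is a list (here with a single element named `1`, e.g. a one-column data.frame or a split() result), not a character vector. sQuote() coerces its argument to character, and coercing a list deparses each element, which is where the literal c("...") text comes from. A small reproduction, with a hypothetical value shaped like the dput() above:

```r
x <- list(`1` = c("foo", "bar"))
sQuote(x)       # one string: the whole element deparsed as 'c("foo", "bar")'
sQuote(x[[1]])  # extract the vector first: each value is quoted separately
```

So extracting the character vector (e.g. with [[ or unlist()) before calling sQuote() should stop the c(...) text from reaching the output file.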
