Question
It seems the knitr cache becomes invalidated by copying the relevant files (.rmd script and cache directory) to another computer.
Why is that so and
how can I work around this?
Details
I do various lengthy calculations on two computers. I thought the following procedure could work:
Knit a first version of a report on machine A. (includes some lengthy calculations)
Copy the files created, i.e. the script and the cache directory, to machine B.
Continue editing the report on machine B (without recalculations because everything is cached).
This does not work: after copying the files to B, "knit" performs a full recalculation. This happens even before the script has been edited, i.e. the act of copying from A to B alone seems enough to invalidate the cache.
Why is a full recalculation on B performed? As I understood it the caching mechanism boils down to creating and comparing a hash. I had hoped that after copying the hash would remain unchanged.
Is there something else I should copy in addition? Or is there any other way I can make the procedure above work?
Example
Any trivial script works as an example such as the one below:
```{r setup, include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```
Bla Bla
```{r test}
tmp = sort(runif(1e7))
```
I don't know the details of why that happens, but the workaround is easy: save values to files explicitly, and read them back in. You can use
saveRDS(x, "x.rds")
to save the variable x to a file named x.rds, and then
x <- readRDS("x.rds")
to read it back in. If you want to get fancy, you can check for the existence of x.rds using file.exists("x.rds") and do the full calculation followed by saveRDS if that returns FALSE, otherwise just read the data.
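The check-then-compute pattern just described could be wrapped in a small helper. This is only a minimal sketch: `cached_rds` is a hypothetical name (not a knitr or base R function), and the temporary file path is an assumption made for illustration.

```r
# Hypothetical helper: return the value stored in `path` if the file exists,
# otherwise call `compute()`, save its result to `path`, and return it.
cached_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

# Usage with the toy calculation from the question (smaller n for speed).
rds_file <- tempfile(fileext = ".rds")
tmp <- cached_rds(rds_file, function() sort(runif(1e5)))

# A second call reads the file instead of recomputing.
tmp_again <- cached_rds(rds_file, function() stop("should not be recomputed"))
```

In a real report the path would be a file next to the script rather than a temporary file, so that it can be copied to the other computer together with the .rmd file.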
EDITED TO ADD: If you really want to know the answer to your first question, one possible approach would be to copy the folder back from the 2nd computer to the 1st, and see if it works back there. If not, do a binary compare of the original and twice copied directories and see what has changed.
If it does work, it might simply be different RNGkind() settings on the two computers: it's pretty common to have the buggy sample.kind = "Rounding" saved. Not sure that caching would use this. Or perhaps different package versions or R versions: when I updated knitr the cache was invalidated.
MORE additions:
If you want to see what has changed, then turn on debugging on the digest::digest function, and call knitr::knit("src.Rmd"). digest() is called for each cached chunk, and passed a large list in its object argument. It should return the same hash value if the list is the same, so you'll want to save those objects, and compare them between the two computers. For example, with your toy example above, I get this passed as object:
list(eval = TRUE, echo = TRUE, results = "markup", tidy = FALSE,
tidy.opts = NULL, collapse = FALSE, prompt = FALSE, comment = "##",
highlight = TRUE, size = "normalsize", background = "#F7F7F7",
strip.white = TRUE, cache = 3, cache.path = "cache/", cache.vars = NULL,
cache.lazy = TRUE, dependson = NULL, autodep = FALSE, fig.keep = "high",
fig.show = "asis", fig.align = "default", fig.path = "figure/",
dev = "png", dev.args = NULL, dpi = 72, fig.ext = NULL, fig.width = 7,
fig.height = 7, fig.env = "figure", fig.cap = NULL, fig.scap = NULL,
fig.lp = "fig:", fig.subcap = NULL, fig.pos = "", out.width = NULL,
out.height = NULL, out.extra = NULL, fig.retina = 1, external = TRUE,
sanitize = FALSE, interval = 1, aniopts = "controls,loop",
warning = TRUE, error = TRUE, message = TRUE, render = NULL,
ref.label = NULL, child = NULL, engine = "R", split = FALSE,
purl = TRUE, label = "test", code = "tmp = sort(runif(1e7))",
75L)
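Once the object has been saved on each computer (for example with saveRDS), the two lists can be compared element by element. The following is a rough sketch in base R; `diff_options` is a hypothetical helper, the two small lists are made-up stand-ins for the real objects, and unnamed elements (such as the trailing 75L above) are not covered by it.

```r
# Hypothetical helper: return the names of elements that differ between two
# named lists, including names that are present in only one of them.
# Unnamed elements are not compared by this sketch.
diff_options <- function(a, b) {
  all_names <- setdiff(union(names(a), names(b)), "")
  differs <- vapply(
    all_names,
    function(nm) !identical(a[[nm]], b[[nm]]),
    logical(1)
  )
  all_names[differs]
}

# Made-up stand-ins for the objects captured on machines A and B.
opts_a <- list(eval = TRUE, cache.path = "cache/", dpi = 72, label = "test")
opts_b <- list(eval = TRUE, cache.path = "cache/", dpi = 96, label = "test")
changed <- diff_options(opts_a, opts_b)
```

Any name returned here points at an option whose value is not the same on both machines, which would be enough to change the hash.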
Related
I am downloading global data from tiles by the NASA MODIS satellite using MODIStsp 2.0.9 in R. This should give me a single stitched TIFF file for the entire globe.
I am getting a difference in the resolution of the image when I run the same code on Windows 10 vs Linux.
PS: I do not understand why the resolution of the stitched global image should vary when the input parameters to the function call don't vary at all. The difference between the two operating systems is large.
MODIStsp(gui = FALSE,
out_folder = dropbox,
out_folder_mod = dropbox,
selprod = 'Surf_Temp_Daily_1Km (M*D11A1)',
bandsel = "LST_Day_1km", # daily surface temp
sensor = "Terra",
# your username for NASA http server
user = "user" ,
# your password for NASA http server
password = "pass",
start_date = '2002.01.01',
end_date = '2002.01.01',
#end_date = '2020.12.31',
verbose = TRUE,
spatmeth = "bbox",
bbox = c(-180.00,-90.00,180.00,90.00),
out_format = 'GTiff',
compress = 'None',
out_projsel = 'User Defined',
output_proj = 4326,
delete_hdf = FALSE,
parallel = TRUE,
reprocess = FALSE
)
I run the following command from the exams2openolat() video tutorial for summative online exams using R/exams
exams2openolat(exm, n = 50, name = "R-exams-OpenOLAT",
points = 1, maxattempts = 0, cutvalue = 2, solutionswitch = FALSE,
duration = 60, shufflesections = TRUE, navigation = "linear",
stitle = names(exm), ititle = "Question", adescription = "", sdescription = "")
and get the error
## Error in rmarkdown::pandoc_convert(input = infile, output = outfile, from = from, :
## unused Arguments (shufflesections = TRUE, navigation = "linear")
When I leave the two arguments out, it works fine. In the YouTube tutorial the command also works with the two arguments.
The two arguments have been introduced in version 2.4-0 of the package which was still the development version when the question was asked.
This point along with a few other details are explained in a blog post that accompanies the YouTube tutorial: http://www.R-exams.org/tutorials/openolat_exam/
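Whether the installed package is recent enough can be checked before using the two arguments. A minimal sketch, assuming only base R and utils; `has_min_version` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `min_version`
# or later, FALSE if it is older or not installed at all.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# The two arguments from the question need exams 2.4-0 or later.
exams_is_recent <- has_min_version("exams", "2.4-0")
```

If this returns FALSE, shufflesections and navigation are not available in the installed version and the call fails as shown above.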
In png(), the first argument is filename = "Rplot%03d.png", which causes files to be generated with ascending numbers. However, in ggsave this doesn't work: the number always stays at the lowest value ("Rplot001.png") and this file is always overwritten.
Looking at the code of the grDevices functions (e.g. grDevices::png()), it appears that the automatic naming happens in functions which are called by .External().
Is there already an implementation of this file naming functionality in R such that it is accessible outside of the grDevices functions?
Edit:
Asked differently: is there a way to continue automatic numbering after shutting off and restarting a device? For example, in this code, the two later files overwrite the former ones:
png(width = 100)
plot(1:10)
plot(1:10)
dev.off()
png(width = 1000)
plot(1:10)
plot(1:10)
dev.off()
You can write a function to do this. For example, how about simply adding a time stamp? Something like:
fname = function(basename = 'myfile', fileext = 'png'){
paste(basename, format(Sys.time(), " %b-%d-%Y %H-%M-%S."), fileext, sep="")
}
ggsave(fname())
Or, if you prefer sequential numbering, then something along the lines of
next_file = function(basename = 'myfile', fileext = 'png', filepath = '.'){
  # existing files named "<basename> <number>.<fileext>"
  old.fnames = grep(paste0(basename, ' \\d+\\.', fileext, '$'),
                    list.files(filepath), value = TRUE)
  # extract the number from each matching file name
  lastnum = gsub(paste0(basename, ' (\\d+)\\.', fileext, '$'), '\\1', old.fnames)
  if (!length(lastnum)) {
    lastnum = 1
  } else {
    # highest existing number plus one
    lastnum = sort(as.integer(lastnum), decreasing = TRUE)[1] + 1L
  }
  return(paste0(basename, ' ', sprintf('%03i', lastnum), '.', fileext))
}
ggsave(next_file())
I have this function in RStudio to synchronize 2 folders on windows.
p1 and p2 are paths;
fsync<-function(p1,p2){
A<-dir(p1,all.files = T,recursive = T,ignore.case = T, include.dirs = F,full.names = T);
B<-dir(p2,all.files = T,recursive = T,ignore.case = T, include.dirs = F,full.names = T);
d1<-setdiff(A,B);
d2<-setdiff(B,A);
if(length(d1)!=0) file.copy(d1,p2,overwrite = F,recursive = T)
if(length(d2)!=0) file.copy(d2,p1,overwrite = F,recursive = T)
}
When I run it, it works, but it also shows warnings saying "the file does not exist" or "no such file or directory" (I'm not really sure of the exact wording right now). I think this only happens with files containing non-English characters (e.g. á, é, ...). How can I make dir() take the file names correctly?
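Separately from the encoding question, it may be worth checking how the two listings are compared: with full.names = TRUE every path starts with p1 or p2, so setdiff() treats all files as different. Below is a sketch that compares paths relative to each folder instead. `sync_missing` is a hypothetical name, the temporary folders are made up for the demonstration, and the sketch does not by itself address the non-English file names.

```r
# Hypothetical helper: copy files that exist in only one of two folders to the
# other one, comparing paths relative to each folder root.
sync_missing <- function(p1, p2) {
  copy_missing <- function(from_root, to_root) {
    rel_from <- list.files(from_root, all.files = TRUE, recursive = TRUE)
    rel_to <- list.files(to_root, all.files = TRUE, recursive = TRUE)
    missing <- setdiff(rel_from, rel_to)
    targets <- file.path(to_root, missing)
    # Create any subfolders the targets need before copying.
    for (d in unique(dirname(targets))) {
      dir.create(d, recursive = TRUE, showWarnings = FALSE)
    }
    file.copy(file.path(from_root, missing), targets, overwrite = FALSE)
    missing
  }
  list(to_p2 = copy_missing(p1, p2), to_p1 = copy_missing(p2, p1))
}

# Demonstration with two made-up temporary folders.
a <- tempfile("sync_a")
b <- tempfile("sync_b")
dir.create(file.path(a, "sub"), recursive = TRUE)
dir.create(b)
writeLines("one", file.path(a, "sub", "one.txt"))
writeLines("two", file.path(b, "two.txt"))
result <- sync_missing(a, b)
```

The returned list names the relative paths that were copied in each direction, which makes it easier to see which file names trigger a warning.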
I would like to move the whole folder from one directory to another, this is my code,
folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
current_files = list.files(folder_old_path, full.names = TRUE)
file.copy(from = current_files, to = path_new,
overwrite = recursive, recursive = FALSE, copy.mode = TRUE)
However, I am getting this error msg
Error in file.copy(from = current_files, to = path_new, overwrite = recursive, :
more 'from' files than 'to' files
Any idea how to fix this? Thank you so much for your help!
library(ff)
from <- "~/Path1/" #Current path of your folder
to <- "~/Path2/" #Path you want to move it.
path1 <- paste0(from,"NameOfMyFolder")
path2 <- paste0(to,"NameOfMyFolder")
file.move(path1,path2)
Try using this little code.
Easiest:
file.rename(folder_old_path, path_new)
If you want to check if path_new already exists you can expand the above to:
if (dir.exists(path_new)) {
  print(paste("already exists so recursively deleting path_new", path_new))
  unlink(path_new, recursive = TRUE)
}
file.rename(folder_old_path, path_new)
It appears as though the current_files = list.files(folder_old_path, full.names = TRUE) step is unnecessary. If my understanding of the R file documentation is correct, then you should be able to just use the following:
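folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
# copy the folder itself into the parent of path_new; recursive = TRUE is
# needed for directories, and overwrite must be a logical value
file.copy(from = folder_old_path, to = dirname(path_new),
          overwrite = TRUE, recursive = TRUE, copy.mode = TRUE)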
If that doesn't work, then you'll have to create a new list of target paths (take each item in current_files and replace folder_old_path with path_new) and call file.copy on those:
folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
current_files = list.files(folder_old_path, full.names = TRUE)
# same file names, but located under path_new instead of folder_old_path
new_files = file.path(path_new, basename(current_files))
# overwrite must be a logical value (FALSE is the default when recursive = FALSE)
file.copy(from = current_files, to = new_files,
          overwrite = FALSE, recursive = FALSE, copy.mode = TRUE)
... this all assumes (of course) that both folder_old_path and path_new exist and you have the correct permissions on them.
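If the goal is to move rather than copy the folder, one possible approach is to copy it recursively and delete the original only when the copy is reported as successful. This is a sketch under assumptions: `move_folder` is a hypothetical helper, it requires both paths to end in the same folder name (because file.copy keeps that name), and the temporary paths below are made up for the demonstration.

```r
# Hypothetical helper: move `from_dir` to `to_dir` by copying it into the
# parent of `to_dir` and deleting the original only if file.copy reports
# success. Both paths must end in the same folder name.
move_folder <- function(from_dir, to_dir) {
  stopifnot(dir.exists(from_dir), !file.exists(to_dir))
  stopifnot(basename(from_dir) == basename(to_dir))
  dir.create(dirname(to_dir), recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(from_dir, dirname(to_dir), recursive = TRUE)
  if (ok) {
    unlink(from_dir, recursive = TRUE)
  }
  ok
}

# Demonstration with made-up temporary paths.
src <- file.path(tempfile("downloads"), "managerA")
dst <- file.path(tempfile("desktop"), "managerA")
dir.create(src, recursive = TRUE)
writeLines("report", file.path(src, "report.txt"))
moved <- move_folder(src, dst)
```

Unlike file.rename, this does not depend on both locations being on the same drive, at the cost of temporarily holding two copies.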
The linked page does contain a caveat/note about windows paths:
There is no guarantee that these functions will handle Windows
relative paths of the form d:path: try d:./path instead. In
particular, d: is not recognized as a directory. Nor are \\?\ prefixes
(and similar) supported.
On Linux you should be able to simply:
1) make the OTHER_DIR if needed. If it is a subdirectory to OUTPUT_DIR then:
dir.create(file.path(OUTPUT_DIR, OTHER_DIR), showWarnings = FALSE)
setwd(file.path(OUTPUT_DIR, OTHER_DIR))
dir.create() will just print a warning if the directory exists. If you want to see the warning, just remove the showWarnings = FALSE.
If it is just another directory at the same level as OUTPUT_DIR then:
dir.create(OTHER_DIR)
2) Then move the file (e.g. if OTHER_DIR is at the same level as OUTPUT_DIR):
file.rename("C:/OUTPUT_DIR/file.csv", "C:/OTHER_DIR/file.csv")
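The two steps above can be combined into one function that also checks whether the move succeeded, since file.rename returns FALSE rather than raising an error. A minimal sketch; `move_into_dir` is a hypothetical name and the temporary paths are made up for the demonstration.

```r
# Hypothetical helper: move one file into `target_dir`, creating the directory
# first if needed. Returns the new path, or stops if the move failed.
move_into_dir <- function(file, target_dir) {
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
  new_path <- file.path(target_dir, basename(file))
  if (!file.rename(file, new_path)) {
    stop("could not move ", file, " to ", new_path)
  }
  new_path
}

# Demonstration with made-up temporary paths.
output_dir <- tempfile("OUTPUT_DIR")
other_dir <- tempfile("OTHER_DIR")
dir.create(output_dir)
csv_file <- file.path(output_dir, "file.csv")
writeLines("a,b", csv_file)
new_location <- move_into_dir(csv_file, other_dir)
```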