Setting scratch as temporary directory from R

I would like to set my temporary directory to the scratch space of the cluster. I have tried various methods, including the one from How to change directory for temporary files - problems with huge temporary raster files, but nothing works.
I have to read a large file (12 GB) in R and run some code using it.
I would like to read the file in this way:
library(data.table)
mydata <- fread("path/file")
But first I believe it is necessary to set the temporary directory to scratch/, otherwise the job gets killed.
Feel free to suggest any other approach.
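One approach, as a sketch: R fixes tempdir() once at startup from the TMPDIR environment variable, so it must be set before the session starts, e.g. in ~/.Renviron. The scratch path /scratch/yourusername below is hypothetical; substitute your cluster's actual scratch location.
# In ~/.Renviron, read before the session starts (path is hypothetical):
#   TMPDIR=/scratch/yourusername
# After restarting R, confirm the session temp dir has moved:
Sys.getenv("TMPDIR")
tempdir()                       # should now point under /scratch
library(data.table)
mydata <- fread("path/file")    # fread's intermediate files now land on scratch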


Recoverable file deletion in R

According to these questions:
Automatically Delete Files/Folders
how to delete a file with R?
the two ways to delete files in R are file.remove and unlink. These are both permanent and non-recoverable.
Is there an alternative method to delete files so they end up in the trash / recycle bin?
I don't know of a solution that is fully compatible with Windows' "recycle bin", but if you're looking for something that doesn't quite delete files, yet prevents them from being stored indefinitely, a possible solution is to move files to the temporary folder for the current session.
The command tempdir() gives the location of that temporary folder, and you can move files there with file.rename().
They will remain available for as long as the current session is running, and will automatically be deleted afterwards. This is less persistent than the classic recycle bin; if you need more control, you probably just want to move files to a dedicated folder and delete it completely when you're done.
For a slightly more consistent syntax, you can use the fs package (https://github.com/r-lib/fs) and its fs::path_temp() and fs::file_move().
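In base R, a minimal sketch of such a "soft delete" helper (the name trash_file is illustrative; note that file.rename() can fail when tempdir() lives on a different filesystem, so a copy-then-remove fallback is included):
trash_file <- function(path) {
  dest <- file.path(tempdir(), basename(path))
  ok <- file.rename(path, dest)   # fast, but may fail across filesystems
  if (!ok)
    ok <- file.copy(path, dest) && file.remove(path)
  if (!ok) stop("could not move ", path, " to the session temp dir")
  invisible(dest)                 # where the file can still be recovered
}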

Bundling large .rda files with package

I am currently working on a package that I want to bundle some large .rda files with (hundreds of MB). If I use devtools::load_all(), my package takes forever to load since I included the files in the /data/ dir.
Is there a way to tell R to ignore the files in /data/ until I manually load them with data(), or am I better off just putting my data into a different directory?
How about you
create a directory inst/optionalData/ (or another suitable name)
add functions to load these data sets on demand
as you can rely on
system.file("optionalData", "nameOfFile.rds", package="yourPackage")
to find them.
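A minimal sketch of such an on-demand loader (the function name, directory name, and .rds layout are illustrative):
load_optional_data <- function(name) {
  path <- system.file("optionalData", paste0(name, ".rds"),
                      package = "yourPackage")
  if (path == "") stop("data set '", name, "' not found in the package")
  readRDS(path)
}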

Load Folder of Scripts in R at startup?

I'm new to R and frankly the amount of documentation is overwhelming, and I haven't been able to find the answer to this question.
I have created a number of .R script files, all stored in a folder on my server (say \\servername\Paige\myscripts, written with Windows backslashes).
I know that in R you can source each script individually, for example (using the forward slashes R expects):
source(file="//servername/Paige/myscripts/con_mdb.r")
and now this script, con_mdb, is available for use.
If I want to make all the scripts in this folder available at startup, how do I do this?
Briefly:
Use your ~/.Rprofile in the directory found via Sys.getenv("HOME") (or if that fails, in R's own Rprofile.site)
Loop over the contents of the directory via dir() or list.files().
Source each file.
e.g. via this one-liner
sapply(dir("//servername/Paige/myscripts/", pattern = "\\.[rR]$", full.names = TRUE), source)
but the real story is that you should not do this. Create a package instead, and load that. There are a bazillion other questions here on how to build a package. Research it -- it is worth it.
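If you do go the .Rprofile route anyway, a sketch of what the entry might look like (wrapping the loop in .First so it runs once at startup; the path is the one from the question):
.First <- function() {
  scripts <- list.files("//servername/Paige/myscripts",
                        pattern = "\\.[rR]$", full.names = TRUE)
  invisible(lapply(scripts, source))
}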
By far the best way is to create a package! But as a first step, you could also create one R script file (collection.r) in your script directory which sources all the other scripts relative to itself, as sketched below.
In your separate project scripts you can then include only that one script with
source(file="//servername/Paige/myscripts/collection.r", chdir = TRUE)
which changes the directory before sourcing. That way you only have to include a single file per project.
In the collection file you can loop over all files (except collection.r itself) or simply list them all.
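A sketch of such a collection.r (relying on chdir = TRUE, so the listing is relative to the script's own directory):
# collection.r: source every sibling script except this file itself
scripts <- setdiff(list.files(pattern = "\\.[rR]$"), "collection.r")
invisible(lapply(scripts, source))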

Can I run R on an alternate hard drive in Windows?

I installed R in the C:/programfiles/users... directory. All my files are on a different hard drive, at N:/project/rasterimg/stacked/.... The files I want to use for my work are very large (126 GB), so I cannot move them to the C: drive.
Is there any method to run R without moving my files onto the C: drive?
Run
setwd("directory_path")
as already suggested. Then check with
getwd()
that you really are where you wanted to be.
Then run
dir()
to list the files in your working directory.
You can even access a file via its index in the dir() output if you want to avoid mistyping the name.
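For example (a sketch; the path comes from the question, and read.table stands in for whatever reader actually fits your files):
setwd("N:/project/rasterimg/stacked")
files <- dir()                 # indexed listing of the directory
files
dat <- read.table(files[1])    # read the first listed file by its index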
If I understood you correctly you want to change your working directory to some kind of external hard drive. You can do that by using the command:
setwd("N:/project/rasterimg/stacked/...")
or you can just access your files directly by giving the full path to whatever read command you use, for example:
read.table("N:/project/rasterimg/stacked/...")
Edit: My concern with working on 126 GB files would be that you will run out of memory, but that's a totally different issue, and somewhat beyond my knowledge. Others here might be able to help you with that in a new thread, if needed.

How do I Download efficiently with rsync?

A couple of questions related to one theme: downloading efficiently with Rsync.
Currently, I move files from an 'upload' folder onto a local server using rsync. Files to be moved are often dumped there, and I regularly run rsync so the files don't build up. I use '--remove-source-files' to remove files that have been transferred.
1) The '--delete' options that remove destination files let you choose when the files are removed. Something similar would be handy for '--remove-source-files', since it seems that, by default, rsync only removes source files after all files have been transferred, rather than after each file. Other than writing a script to make rsync transfer files one by one, is there a better way to do this?
2) On the same problem: if a large (single) file is transferred, it can only be deleted after the whole thing has been successfully moved. It strikes me that I might be able to use 'split' to break the file into smaller chunks, allowing each to be deleted as the transfer progresses; is there a better way to do this?
Thanks.
