Saving data with a built name in R - r

In an R script, I assign a name to some data. The name depends on parameters. I do this using
number<-1
assign(paste("variable", as.character(number), sep=""),2)
The above accomplishes the same as variable1<-2. Now I want to save the result for later:
save(?,file=paste("variable",as.character(number),".RData",sep=""))
What code can go in the ? slot? It should say variable1, but I need to construct this name using paste or a similar technique. Simply putting get(paste("variable",as.character(number),".RData",sep="")) there does not work.

save also accepts a list argument. According to ?save:
list - A character vector containing the names of objects to be saved.
Thus, we pass the object name as a string (paste0('variable', number)) to the list argument and build file the same way the OP did. paste0 makes this more concise, and as.character is not necessary because integer/numeric values are automatically converted to character in paste.
save(list = paste0('variable', number),
file = paste0("variable", number, ".RData"))
Check for the file created in the working directory
list.files(getwd(), pattern = '\\.RData$')
#[1] "variable1.RData"
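As a quick round-trip check, the sketch below saves the object under its constructed name and loads it back into a fresh environment. The names obj_name, out_file and restored are illustrative only, and tempdir() is used as an assumption so that nothing is written to the working directory.

```r
# Build the object name from a parameter, then save that object by name.
number <- 1
obj_name <- paste0("variable", number)   # "variable1"
assign(obj_name, 2)

# tempdir() keeps the sketch from writing into the working directory.
out_file <- file.path(tempdir(), paste0(obj_name, ".RData"))
save(list = obj_name, file = out_file)

# Load into a fresh environment to confirm the object comes back
# under the name that was built with paste0().
restored <- new.env()
load(out_file, envir = restored)
loaded_names <- ls(restored)
loaded_value <- get(obj_name, envir = restored)
```

Loading into a separate environment is only a convenience for checking; a plain load(out_file) would restore variable1 into the current environment.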

Related

R: locating files that their names contain a specific string from a directory and match to my list of wanted files

It's me, the newbie, again with another messy file and folder situation (thanks to us biologists): I have a directory containing a huge number of .txt files (~900,000+), and all the files were previously handled with inconsistent naming formats :(
For example, messy files in directory look like these:
ctrl_S978765_uns_dummy_00_none.txt
ctrl_S978765_3S_Cookie_00_none.txt
S59607_3S_goody_3M_V10.txt
ctrlnuc30-100_S3245678_DMSO_00_none.txt
ctrlRAP_S0846567_3S_Dex_none.txt
S6498432_2S_Fulra_30mM_V100.txt
.....
As you see, the naming has no reliable consistency. What's important for me is the ID code embedded in each name, such as S978765. Now I have a list of the ID codes that I want (100 ID codes).
The CSV file containing the list looks like the sample below. Mind you, the list does have repeated ID codes across rows because of different CLnumber values in the second column:
ID code CLnumber
S978765 1
S978765 2
S306223 1
S897458 1
S514486 2
....
So I want to achieve the following task: find all the messily named files whose ID codes match my list, and copy them into a new directory.
I have thought of using list.files() to get all the .txt files and their names, but I got stuck at the next step, matching the ID codes. I know how to do it with one string, say "S978765", but doing it one by one is almost like manually digging through the folder.
How could I feed in the ID codes from column 1 as a list, match them against the messy file names in the directory, and then copy the matching files into a new folder?
Many thanks,
ML
This works:
library(stringr)
# get this via list.files in your actual code
files <- c("ctrl_S978765_uns_dummy_00_none.txt",
"ctrl_S978765_3S_Cookie_00_none.txt",
"S59607_3S_goody_3M_V10.txt",
"ctrlnuc30-100_S3245678_DMSO_00_none.txt",
"ctrlRAP_S0846567_3S_Dex_none.txt",
"S6498432_2S_Fulra_30mM_V100.txt")
ids <- data.frame(`ID Code` = c("S978765", "S978765", "S306223", "S897458", "S514486"),
CLnumber = c(1, 2, 1, 1, 2),
stringsAsFactors = FALSE)
str_subset(files, paste(ids$ID.Code, collapse = "|"))
#> [1] "ctrl_S978765_uns_dummy_00_none.txt" "ctrl_S978765_3S_Cookie_00_none.txt"
str_subset takes a character vector and returns the elements matching some pattern. In this case, the pattern is "S978765|S978765|S306223|S897458|S514486" (created by using paste), which is a regular expression in which | means "or", so it matches any one of the ID codes. So we take files and keep only the elements that contain one of the values in ID Code.
There are many other ways to do this, which may or may not be clearer. For example, you could pass ids$ID.Code directly to str_subset instead of constructing a regular expression via paste, but that pairs files and patterns element by element rather than testing every file against every ID, and it would throw a warning about object lengths every time (recent stringr versions, as far as I know, raise an error instead), which could get confusing (or cause problems if you get used to ignoring it and then ignore it in a different context where it matters). Another method would be to use purrr and keep, but while that might be a little clearer to write, it would be a lot less efficient since it would mean making multiple passes over the files vector. That is not relevant in this context, but possibly very relevant if you suddenly need to do this for hundreds of thousands of files and IDs.
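For comparison, here is a minimal base-R sketch of the same single-pattern idea, with duplicate IDs dropped before the pattern is built. The variable names id_codes, pattern and matched are illustrative, and the sample data is the same as in the question.

```r
files <- c("ctrl_S978765_uns_dummy_00_none.txt",
           "ctrl_S978765_3S_Cookie_00_none.txt",
           "S59607_3S_goody_3M_V10.txt",
           "ctrlnuc30-100_S3245678_DMSO_00_none.txt",
           "ctrlRAP_S0846567_3S_Dex_none.txt",
           "S6498432_2S_Fulra_30mM_V100.txt")
id_codes <- c("S978765", "S978765", "S306223", "S897458", "S514486")

# unique() keeps the alternation short; "|" means "or" in a regular expression.
pattern <- paste(unique(id_codes), collapse = "|")

# grep(value = TRUE) returns the matching elements in a single pass over files.
matched <- grep(pattern, files, value = TRUE)
```

One caveat of any substring-based pattern: an ID that is a prefix of a longer ID (say S97876 versus S978765) would also match the longer one, so extracting the ID and comparing it exactly may be safer for real data.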
You could use a regex to extract the ID codes from the file names.
Here, I have used the pattern "S" followed by 5 or more digits. Once we have extracted the ID codes, we can compare them with the ones in the csv.
Assuming the csv has been read into a data frame called df and the column name is ID_Codes, we can use %in% to filter them.
We can then use file.copy to copy the files from one folder to another.
all_files <- list.files(path = '/Path/To/Folder', full.names = TRUE)
selected_files <- all_files[sub('.*(S\\d{5,}).*', '\\1', basename(all_files))
%in% unique(df$ID_Codes)]
file.copy(selected_files, 'new_path/for/files')
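To try the extract-and-compare approach without touching real data, the following sketch builds a throwaway source folder under tempdir(). The folder names, the three sample files and the small df are assumptions made up for the demonstration; only the sub() pattern and the %in% filter come from the answer above.

```r
# Throwaway folders so the sketch is safe to run anywhere.
src_dir  <- file.path(tempdir(), "messy_files")
dest_dir <- file.path(tempdir(), "selected_files")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dest_dir, showWarnings = FALSE)

# Create a few empty files with the kind of names shown in the question.
file_names <- c("ctrl_S978765_uns_dummy_00_none.txt",
                "S59607_3S_goody_3M_V10.txt",
                "S6498432_2S_Fulra_30mM_V100.txt")
invisible(file.create(file.path(src_dir, file_names)))

# Stand-in for the CSV after reading it, e.g. with read.csv().
df <- data.frame(ID_Codes = c("S978765", "S978765", "S306223"),
                 CLnumber = c(1, 2, 1),
                 stringsAsFactors = FALSE)

# Pull the ID out of each file name, then keep files whose ID is wanted.
all_files <- list.files(path = src_dir, pattern = "\\.txt$", full.names = TRUE)
file_ids <- sub(".*(S\\d{5,}).*", "\\1", basename(all_files))
selected_files <- all_files[file_ids %in% unique(df$ID_Codes)]

# Copy (not move) the selected files into the destination folder.
copied <- file.copy(selected_files, dest_dir, overwrite = TRUE)
```

Because the ID is extracted and compared exactly, this variant does not suffer from partial matches, at the cost of assuming every ID looks like "S" plus at least five digits.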

save get'd variable (after assign)

Why can't R find this variable?
assign(paste0('my', '_var'), 2)
get(paste0('my', '_var')) ## isn't this returning an object?
save(get(paste0('my', '_var')), file = paste0('my', '_var.RDATA'))
This throws the error:
Error in save(paste0("my", "_var"), file = paste0("my", "_var.RDATA")) :
object ‘paste0("my", "_var")’ not found
From the help page, the save() function expects "the names of the objects to be saved (as symbols or character strings)." Those values are not evaluated, i.e. you can't pass function calls that will eventually return strings or raw values themselves. Use the list= parameter if you want to call a function that returns a string with the name of a variable.
save(list=paste0('my', '_var'), file = paste0('my', '_var.RDATA'))
Though using get/assign is often not good practice in R. There are usually better ways, so you might want to rethink your general approach.
And finally, if you are saving a single object, you might want to consider saveRDS() instead. Often that's the behavior people are expecting when they use the save() function.
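A minimal sketch of the saveRDS() route, assuming the value itself is what needs to be stored. Unlike save(), saveRDS() takes the object rather than its name, so passing get(...) works there. The names obj_name, rds_file and restored_value are illustrative.

```r
obj_name <- paste0("my", "_var")
assign(obj_name, 2)

# saveRDS() stores a single value, not a name, so get() is fine here.
rds_file <- file.path(tempdir(), paste0(obj_name, ".rds"))
saveRDS(get(obj_name), file = rds_file)

# readRDS() returns the value; the caller chooses what to call it.
restored_value <- readRDS(rds_file)
```

The trade-off is that the original variable name is not stored in the file, which is often what you want when the name was only constructed for bookkeeping.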
The documentation for save says that ... should be
the names of the objects to be saved (as symbols or character strings).
And indeed if you type save into the console you can see that the source has the line
names <- as.character(substitute(list(...)))[-1L]
where substitute captures its argument and doesn't evaluate it. So as the error suggests, it is looking for an object with the name paste0('my', '_var'), not evaluating the expressions supplied.
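The effect of that line can be reproduced with a small stand-in function. show_names below is a hypothetical helper, not part of R; it only mirrors the one line quoted from the source of save.

```r
# Hypothetical helper that mimics how save() turns its ... into names.
show_names <- function(...) {
  as.character(substitute(list(...)))[-1L]
}

# A bare symbol is captured as its name.
captured_symbol <- show_names(my_var)

# A call is captured as the text of the call, never evaluated,
# so it does not turn into "my_var".
captured_call <- show_names(paste0("my", "_var"))
```

Here captured_symbol is "my_var", while captured_call holds the deparsed paste0 call, which is why save() then looks for an object with that odd name.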

Error related to Excel importing using read_excel

I'm new to R and studying how read_excel() and excel_sheets() work by trying the two snippets below. Both are intended to read the second sheet of an Excel file.
output <- read_excel(excel_sheets("population.xlsx")[2],
path = "population.xlsx")
output <- read_excel(excel_sheets("population.xlsx"),
sheet = 2, path = "population.xlsx")
The first snippet runs successfully, but the second one fails with the error
Error: length(x) == 1L is not TRUE
I'd like to know why it happens and how I can fix it.
The documentation of read_excel says the 'sheet' argument selects which sheet to read, which I guess is equivalent to indexing the character vector of sheet names (i.e. excel_sheets("population.xlsx")[2] in this case).
Just read_excel(path = "population.xlsx", sheet = 2) should work. Your first snippet gets a character vector of all sheet names and then selects the name of the second; read_excel accepts either an integer position or a sheet name as the sheet argument.
You may be confused because you need to know something about argument matching: named arguments are matched first, and the remaining unnamed arguments are then matched by position, left to right. So in the first example, path is specified by name, and the result of excel_sheets(...)[2] is passed to the sheet argument. In the second, you also specify sheet by name, so I think the unnamed vector gets passed to the range argument, which is supposed to accept only a character vector of length 1. That's the source of the error you have.
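The matching rule can be seen without readxl by using a toy function. which_args below is hypothetical; it only assumes that the first three arguments of read_excel are path, sheet and range, in that order, as described above.

```r
# Toy stand-in with the same leading arguments as described for read_excel.
which_args <- function(path, sheet = NULL, range = NULL) {
  list(path = path, sheet = sheet, range = range)
}

# Stand-in for the result of excel_sheets("population.xlsx").
sheets <- c("Sheet1", "Sheet2", "Sheet3")

# path is named, so the unnamed value fills the next free slot: sheet.
first_call <- which_args(sheets[2], path = "population.xlsx")

# path and sheet are both named, so the unnamed vector lands in range.
second_call <- which_args(sheets, sheet = 2, path = "population.xlsx")
```

In second_call the range slot receives all three sheet names, which is consistent with a length-one check failing inside the real function.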

How to read in a file and have its variable name automatically take on the file name?

How can I use something like readLines to save the file as a character string in my environment with its own file name as the variable?
I tried something like the pseudocode below without success.
paste(filename) <- readLines(filename)
The assign function could be an alternative:
assign(filename, readLines(filename))  # filename holds the file's name as a string
If the file name is not a syntactically valid R name (e.g. it starts with _ or a digit), you can still refer to the variable by wrapping its name in backticks (``).
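A small sketch of both points, using a temporary file whose name is deliberately not a valid R name. The file name 2018_notes.txt and the variables file_path, var_name and contents are made up for illustration; the named list at the end is an alternative that is not part of the answer above.

```r
# Write a small file whose name starts with a digit.
file_path <- file.path(tempdir(), "2018_notes.txt")
writeLines(c("first line", "second line"), file_path)

# Use the file name itself as the variable name.
var_name <- basename(file_path)
assign(var_name, readLines(file_path))

# The name is not syntactically valid, so backticks are needed to use it.
first_line <- `2018_notes.txt`[1]

# Alternative: keep the contents in a named list instead of
# creating one variable per file.
contents <- list()
contents[[var_name]] <- readLines(file_path)
```

The named list tends to scale better when many files are read, since the whole collection can be passed around or looped over as one object.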

Two PASTE functions in a character vector

attach.files = c(paste("/users/joesmith/nosection_", currentDate,".csv",sep=""),
paste("/users/joesmith/withsection_", currentDate,".csv",sep=""))
Basically, if I wrote it out manually like
c("nosection_051418.csv", "withsection_051418.csv")
it would work fine, but since I'm automating this to run every day I can't do that.
I'm trying to attach files to an automated email, but when I structure it with paste as above, it doesn't work. How can I rewrite this so that the character vector is accepted?
I thought your example implied the need for "parallel" inputs to the path stem, the first portion of the file name, and the date portion of those full paths. Consider this illustration of using a two-item vector and a one-item vector (produced by Sys.Date, replacing your currentDate) to populate the %s positions in the sprintf string (suggested by @Gregor):
sprintf("/users/joesmith/%s_%s.csv", c("nosection", "withsection"), Sys.Date() )
[1] "/users/joesmith/nosection_2018-05-14.csv" "/users/joesmith/withsection_2018-05-14.csv"
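Note that Sys.Date() prints as 2018-05-14, whereas the file names in the question use 051418. Assuming the files really are named in that month-day-year style, a sketch using format() would look like this. A fixed date is used so the result is reproducible; in the daily job it would be Sys.Date().

```r
# Fixed date for reproducibility; replace with Sys.Date() in the real script.
run_date <- as.Date("2018-05-14")

# "%m%d%y" gives two-digit month, day and year, e.g. "051418".
current_date <- format(run_date, "%m%d%y")

# sprintf() recycles the single date across both file-name stems.
attach_files <- sprintf("/users/joesmith/%s_%s.csv",
                        c("nosection", "withsection"),
                        current_date)
```

Running file.exists(attach_files) before building the email can help tell a naming mismatch apart from a problem in the mail step.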

Resources