How to use a loop to run through a set of lists in R

I am trying to create an R loop to run a command on a series of datasets. The command is make.design.data from the RMark library. The only argument it takes is the name of a list. I have 17 of these lists I'd like to pass to make.design.data. This is the code I've been trying to use:
DFNames<-c("DFAmerican.Goldfinch", "DFAmerican.Robin","DFBarn.Swallow","DFBobolink", "DFBrown.head.Cowbird", "DFCedar.Waxwing", "DFCommon.Grackle","DFCommon.Yellowthroat", "DFEuropean.Starling","DFHorned.Lark", "DFKilldeer","DFRed.wing.Blackbird", "DFSavannah.Sparrow", "DFSong.Sparrow","DFTree.Swallow", "DFVesper.Sparrow", "DFYellow.Warbler")
#in my environment each of the names given to DFNames represents a list
for (x in DFNames){
n<-make.design.data(x)
assign(paste0("ddl",x),n)
}
This gives me the error:
Error in data$model : $ operator is invalid for atomic vectors
Can anyone please suggest a way to fix my code, or a different way of tackling this?
Thanks, Jude

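Your loop passes each name as a character string, so make.design.data receives a string rather than the list itself, which is why $ fails. You can make a list of the actual data sets instead of a vector of their names.
x <- list(DFAmerican.Goldfinch, ...)
Then you can use:
lapply(x, make.design.data)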
Or use get inside your for loop:
for (x in DFNames) {
assign(paste0("ddl", x), make.design.data(get(x)))
}
The "R" way is the former using lists and the apply family. Then you can avoid the gymnastics of assign.
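A minimal sketch of the list-based approach, under some assumptions: make_design() is a hypothetical stand-in for make.design.data so the example runs without RMark, and the two toy lists only imitate the DF... objects from the question. mget() is used to collect the existing objects by name.

```r
# Hypothetical stand-in for make.design.data(), so the sketch runs without RMark.
make_design <- function(data) {
  list(model = data$model, n = length(data$values))
}

# Toy objects playing the role of the DF... lists in the question.
DFAmerican.Robin <- list(model = "CJS", values = 1:3)
DFBarn.Swallow   <- list(model = "CJS", values = 1:5)
DFNames <- c("DFAmerican.Robin", "DFBarn.Swallow")

# mget() fetches the objects by name and returns them as a named list.
data_list <- mget(DFNames)

# One call processes every element; the results keep the original names.
ddl <- lapply(data_list, make_design)
names(ddl) <- paste0("ddl", names(ddl))

ddl$ddlDFBarn.Swallow$n
```

All results then live in one list (ddl) and are reached by name, so no assign() is needed.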

Related

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_einstein$age*365.25
combined_df_newton$fu_time<-combined_newton$age*365.25
...
combined_leibniz$fu_time<-combined_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not interpret these strings as the names of variables to be created or referenced. For that, you should look at the function assign.
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, for which the $ operator is not accepted.
There are many ways to solve your problem, but I recommend that you create a function that performs the necessary operations on each data frame. The function should then return the data frame. You can then re-use the function for all 7 philosophers/scientists.
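One way this could look, as a sketch rather than the only solution: the helper name add_fu_time and the toy ages are assumptions made for illustration, and the data frames are kept in one named list instead of separate combined_df_* variables.

```r
# Hypothetical helper: adds follow-up time in days to one data frame.
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

list_names <- c("putnam", "einstein", "newton")

# Keep the data frames in one named list instead of separate variables.
# The ages here are made-up toy values.
combined <- lapply(list_names, function(nm) data.frame(age = 1))
names(combined) <- list_names

# Apply the same calculation to every data frame.
combined <- lapply(combined, add_fu_time)

combined$einstein$fu_time
```

Because the name appears only once (in list_names), there is no line to copy, paste and forget to edit.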

Run a piece of code several times changing certain parameters in R (like a macro in SAS)

Trying to learn the ropes in R and already struggling to find a replacement for SAS macros.
I'm trying to run a piece of code several times, but I'm having a hard time and came here for help.
First, I'm working with this example file, which has a variable giving the number of rows that I previously analysed in another file (qtde_registros), followed by three variables giving the number of rows that had each type of error.
file <- readRDS(file="file.Rda")
file
qtde_registros error1 error2 error3
1 1175 0 0 0
After that, I created a list with the errors and another one with the description of each one of them.
Then, using those lists and the file mentioned initially, I wish to create several files (one for each error) that will later be bound together into one last file to form a final report.
As I said, I'm struggling with it, so I made example code showing how the first file would be formed:
error_list <- list("Error1","Error2","Error3")
description_list <- list("Code not found",
"Invalid date.",
"Negative value.")
error1 <- file
error1$file_name <- "Clients"
error1$error <- error_list[1]
error1$qtde <- error1$error1
error1$desc <- description_list[1]
error1 <- select(error1, file_name, error, qtde, desc)
error1
file_name error qtde desc
1 Clients Error1 0 Code not found
And that leads to my question: how can I make the code above run several times, once for each error on my list?
I'm aware that the whole mentality may not be the best, as the approach to certain things differs depending on the language used, but I have to work with the knowledge I have at the moment.
I'm thinking of using the apply family of functions, but I didn't manage to work it out.
Thanks in advance for the help and sorry for any errors in typing or grammar (English is not my first language).
EDIT: forgot to say that I don't intend to do this via a for or while loop.
In R (and many other languages) you'll be using some form of for-loop. R also provides several loop wrappers in the *apply family, each with a specific kind of output. Here's a short (incomplete) list of the *apply family and their input/output:
lapply -> list output
sapply -> list or atomic vector (integer vector, numeric vector etc.)
mapply -> similar to sapply, but can take more than 1 input to go over (for example, if you have 2 things to loop over simultaneously)
tapply -> loop over groups defined by INDEX
apply -> loop over an array (either rows or columns), returning a matrix/vector
And so on.
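To make the list above concrete, here is a small sketch on toy inputs (all values are made up for illustration) showing what each function returns:

```r
x <- list(a = 1:3, b = 4:6)

res_l <- lapply(x, sum)                          # always a list
res_s <- sapply(x, sum)                          # simplified to a named vector here
res_m <- mapply(function(u, v) u + v, 1:3, 4:6)  # walks over two inputs in parallel
res_t <- tapply(c(10, 20, 30), c("g1", "g2", "g1"), sum)  # one result per group
res_a <- apply(matrix(1:6, nrow = 2), 1, sum)    # one result per row (MARGIN = 1)

str(res_l)
res_s
```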
I am guessing that your example is incomplete, but I'll show 3 examples to get you started. One using a for-loop, one using lapply and one using mapply.
for-loop
A for-loop is the classic method (found in most programming languages). It works by having a for(---) where --- is replaced by something to iterate over. This could be error_list or it could be a numeric vector seq(1, n) or 1:n. Here you have more than 1 thing to iterate over, so a numeric vector makes sense (and we use this to subset the data)
errors <- list() # <== Somewhere to put our results
for(i in 1:length(error_list)){
error_i <- list(file = file,
file_name = "Clients",
error = error_list[[i]], # Use i to subset error_list
qtde = error_list[[i]], # Maybe this should be something else in your case
desc = description_list[[i]]
)
# Put into our errors list. Create "error1" using paste and our index
errors[[paste0('error', i)]] <- error_i
}
By the end, all of your results will be in the errors list, to be extracted using errors[[1]] or errors[["error1"]] (change the number to your error). This can then be combined using do.call(rbind, errors) and then saved using write.table (or write.csv or similar).
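As a rough sketch of the combine step, the loop can build one-row data frames and stack them with do.call(rbind, ...). The toy file object below stands in for the contents of file.Rda, and taking qtde from the column named "error<i>" is an assumption about what the question intends.

```r
# Toy stand-in for the object read from file.Rda in the question.
file <- data.frame(qtde_registros = 1175, error1 = 0, error2 = 0, error3 = 0)

error_list <- list("Error1", "Error2", "Error3")
description_list <- list("Code not found", "Invalid date.", "Negative value.")

rows <- list()
for (i in seq_along(error_list)) {
  rows[[i]] <- data.frame(
    file_name = "Clients",
    error     = error_list[[i]],
    # Assumption: the count for error i lives in the column "error<i>".
    qtde      = file[[paste0("error", i)]],
    desc      = description_list[[i]]
  )
}

# Stack the one-row data frames into a single report.
report <- do.call(rbind, rows)
report
```

The resulting report has one row per error and could then be written out with write.csv.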
lapply
For the *apply family, the *apply function takes care of the looping. Instead, we have to provide a function to execute (a macro in SAS terms) in each iteration. So we wrap the contents of the loop above in a function.
macro <- function(i){
list(file = file,
file_name = "Clients",
error = error_list[[i]], # Use i to subset error_list
qtde = error_list[[i]], # Maybe this should be something else in your case?
desc = description_list[[i]]
)
}
errors <- lapply(1:length(error_list), macro)
#set names afterwards
names(errors) <- paste0("error", 1:length(error_list))
And once again we have the data ready to be extracted, saved etc. This is equivalent to:
errors <- list()
for(i in 1:length(error_list))
errors[[i]] <- macro(i)
names(errors) <- paste0("error", 1:length(error_list))
mapply
Now in your case you have more than 1 thing to iterate over. An alternative is to use mapply and add these as parameters to your function instead. This way we remove error_list[[i]] and description_list[[i]] from the function and instead add these as parameters
macro_mapply <- function(error, description){
list(file = file,
file_name = "Clients",
error = error, # No need to use i here anymore
qtde = error, # Maybe this should be something else in your case?
desc = description
)
}
errors <- mapply(macro_mapply,
# parameters to iterate over comes after function
error = error_list,
description = description_list,
# Avoid simplification (if we want a list returned)
SIMPLIFY = FALSE)
names(errors) <- paste0("error", 1:length(error_list))
Note that "mapply" will try to return a vector if possible, so I set SIMPLIFY = FALSE to avoid this.
Things to note:
In the above 3 examples I have not taken into account whether you read multiple files or whether any other parameters change. If you have to read a file in each iteration, it makes sense to go with the first 2 examples and add readRDS to the loop or function with appropriate file naming. Also, I have used your data, but I am guessing qtde and error should be different in your specific case; this is not clear from your example.
Once you've gotten the hang of your first loops and somewhat understand how the *apply functions work, I would suggest checking out the tidyverse, which provides what many find to be a more "user-friendly" and intuitive interface to data transformation.
I hope this will help you get started on solving your problem.

Problems obtaining the correct object class. R

I created a small function to process a dataframe to be able to use the function:
preprocessCore::normalize.quantiles()
Since normalize.quantiles() only accepts a matrix object, and I need to rearrange my data, I created a small function that takes a specific column (variable) in a specific data frame and does the following:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1$boco,df_p2$boco)
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
However, "mat" should be a matrix, but it seems the cbind() does not do its job since I'm obtaining the following Error:
normal(antitrombina_FI,Six_Plex_IID)
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
So, it is clear that the cbind() is not creating a matrix. I don't understand why this is happening.
Most likely you are binding two NULL objects together, yielding NULL, which is not a matrix. If your df objects are data.frame, then df_p1$boco is interpreted as "extract the variable named boco", not "extract the variable whose name is the value of an object having the symbol boco". I suspect that your data does not contain a variable literally named "boco", so df_p1$boco is evaluated as NULL.
If you want to extract the column that is given as the value to the formal argument boco in function normal() then you should use [[, not $:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[[boco]],df_p2[[boco]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
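The difference between $ and [[ can be checked on a toy data frame. This is a sketch under assumptions: the values are made up, df stands in for Six_Plex_IID, and the column name is held as a character string in boco.

```r
# Toy data frame standing in for Six_Plex_IID (values are made up).
df <- data.frame(Plate = c(1, 1, 2, 2),
                 antitrombina_FI = c(10, 20, 30, 40))

boco <- "antitrombina_FI"   # column name held in a variable, as a string

df$boco      # NULL: looks for a column literally called "boco"
df[[boco]]   # the antitrombina_FI column

# cbind() of two NULLs is NULL, which is not a matrix.
is.null(cbind(df$boco, df$boco))

# With [[ ]] and the name passed as a string, cbind() returns a matrix.
p1 <- df[df$Plate == 1, ]
p2 <- df[df$Plate == 2, ]
mat <- cbind(p1[[boco]], p2[[boco]])
is.matrix(mat)
```

For this to work inside normal(), the column name has to be passed as a quoted string, for example normal("antitrombina_FI", Six_Plex_IID), while the body keeps the unquoted df_p1[[boco]].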
Thanks for your help bcarlsen. However I have found some errors:
First, I believe you need to introduce quotes in
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
If I run this script outside of a function, it works perfectly:
df_p1<-subset(Six_Plex_IID,Six_Plex_IID$Plate==1)
df_p2<-subset(Six_Plex_IID,Six_Plex_IID$Plate==2)
mat<-cbind(df_p1[["antitrombina_FI"]],df_p2[["antitrombina_FI"]])
norm<-preprocessCore::normalize.quantiles(mat)
However, if I now put this in a function and try to run it as a function:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
normal(antitrombina_FI,Six_Plex_IID)
I get the same error message:
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
I'm completely clueless about why this is happening: why I obtain a matrix outside the function but not inside it.
Thanks

$ operator for variable object names

I am trying to use the $ operator for selecting and reformatting specific columns in a for loop on variably created data.frame objects. I tried 4 different solutions in my commented code, but none of them works. I looked all over SO but I don't seem to find another solution to try.
How can I make use of the $ operator to select specific columns with variable data.frame names?
Thanks
weather_data_files<-c("CMC","ECMWF","ECMWF_VAR_EPS_MONTHLY_FORECAST",
"GFS","ICON_EU","UKMET_EURO4")
for(filename in weather_data_files){
#create data frame environment objects
assign(paste(filename),read.csv(file = paste(filename,".csv",sep = ""),sep = ";"))
#first solution does not work, because filename is here an atomic vector
#rather than a data.frame
#ErrorMessage: $ operator is invalid for atomic vectors
filename$Forecast.Time<- as.POSIXct(filename$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC")
#ok, got it, let's try a second solution, but
#it also does not work although I try to get the data.frame object
#ErrorMessage: could not find function "get<-"
get(filename)$Forecast.Time<-
as.POSIXct(get(filename)$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
#Third solution as.name also does not work
#ErrorMessage: object of type 'symbol' is not subsettable
as.name(filename)$Forecast.Time<-
as.POSIXct(as.name(filename)$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
#Fourth solution comparable to second solution, still not working
#ErrorMessage: could not find function "eval<-"
eval(assign(filename,get(filename)))$Forecast.Time<-
as.POSIXct(eval(assign(filename,get(filename)))$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC")
}
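So, the problem is that you're passing in character strings, not objects. The get function retrieves the object, but R has no get<- replacement function, so the modified result has no place to be stored.
You could always load the object into a temporary variable as you're looping. Operate on the temporary variable and then assign when you're done.
for(filename in weather_data_files){
tmp <- get(filename) # fetch the data frame by its name
tmp$Forecast.Time <- as.POSIXct(tmp$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
assign(filename,tmp) # store the modified copy back under the same name
}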
You could also skip most of the for loop and use the apply family.
files = lapply(paste(c("CMC","ECMWF","ECMWF_VAR_EPS_MONTHLY_FORECAST",
"GFS","ICON_EU","UKMET_EURO4"),".csv",sep=""),
read.csv,sep=";")
files = lapply(files,function(x){x$Forecast.Time = as.POSIXct(x$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC");return(x)})
Now you have a list of your files you can work on. You could assign them to global variables if you want.
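A self-contained sketch of the list approach, assuming in-memory toy data frames in place of the CSV files (the timestamps are made up, and naming the list elements after the files is an addition for convenience):

```r
# Toy data frames standing in for the CSV files (timestamps are made up).
files <- list(
  CMC = data.frame(Forecast.Time = c("01.02.2020 06:00+00", "01.02.2020 12:00+00")),
  GFS = data.frame(Forecast.Time = "02.02.2020 18:30+00")
)

# Convert the column in every data frame; the function returns the modified copy.
files <- lapply(files, function(x) {
  x$Forecast.Time <- as.POSIXct(x$Forecast.Time,
                                format = "%d.%m.%Y %H:%M+%S", tz = "UTC")
  x
})

# Elements are reached by name, with no get() or assign() needed.
class(files$CMC$Forecast.Time)

# Optional: copy the list elements into the global environment.
# list2env(files, envir = globalenv())
```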

In R, package xts, how would one iterate period subsetting over a list without throwing errors?

Assume:
list of n xts objects in .GlobalEnv with the suffix ".raw" (e.g: ABC.raw)
have created a list of .raw names in a list (ie, rawfiles <- ls(pattern="*.raw",envir=.GlobalEnv))
Would like to:
loop or lapply through rawfiles and subset a particular timeperiod in each iteration
for example, to write this as a single line would be: new <- ABC.raw["T09:00/T10:00"] if I wanted to subset ABC.raw from 9am to 10am each day.
The problem is:
There doesn't seem to be an easy way of passing ["Thh:mm/Thh:mm"] to a loop, apply or assign without causing errors.
Any ideas how to pass this?
In pidgin code, I guess I'm looking for a working equivalent of:
for(i in 1:length(raw)){
raw[i]["T09:00/T10:00"]
}
Many thanks in advance for any assistance on this.
Try get.
get(x) retrieves the variable whose name is stored in x, so foo<-1; get('foo') would return 1.
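subsets <- list() # somewhere to keep the results
for ( rawname in rawfiles ) {
subsets[[rawname]] <- get(rawname)["T09:00/T10:00"] # store each subset under its source name
}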

Resources