I am trying to use the $ operator to select and reformat specific columns in a for loop over data.frame objects whose names are created dynamically. I tried four different solutions, shown in my commented code, but none of them works. I looked all over SO but I can't find another solution to try.
How can I use the $ operator to select specific columns when the data.frame names are variable?
Thanks
weather_data_files<-c("CMC","ECMWF","ECMWF_VAR_EPS_MONTHLY_FORECAST",
"GFS","ICON_EU","UKMET_EURO4")
for(filename in weather_data_files){
#create data frame environment objects
assign(paste(filename),read.csv(file = paste(filename,".csv",sep = ""),sep = ";"))
#first solution does not work, because filename is here an atomic vector
#rather than a data.frame
#ErrorMessage: $ operator is invalid for atomic vectors
filename$Forecast.Time<- as.POSIXct(filename$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC")
#ok, got it, let's try a second solution, but
#it also does not work although I try to get the data.frame object
#ErrorMessage: could not find function "get<-"
get(filename)$Forecast.Time<-
as.POSIXct(get(filename)$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
#Third solution as.name also does not work
#ErrorMessage: object of type 'symbol' is not subsettable
as.name(filename)$Forecast.Time<-
as.POSIXct(as.name(filename)$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
#Fourth solution comparable to second solution, still not working
#ErrorMessage: could not find function "eval<-"
eval(assign(filename,get(filename)))$Forecast.Time<-
as.POSIXct(eval(assign(filename,get(filename)))$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC")
}
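So, the problem is that you're passing in character strings, not objects. The get function retrieves the object, but the result of get cannot be the target of an assignment, so the modified copy has nowhere to be stored.
You could load the object into a temporary variable as you're looping, operate on the temporary variable, and then assign it back when you're done.
for(filename in c("a","b")){
tmp <- get(filename) # fetch the object named by the string
# ... modify tmp here ...
assign(filename,tmp) # store the result back under the same name
}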
You could also skip most of the for loop and use the apply family.
files = lapply(paste(c("CMC","ECMWF","ECMWF_VAR_EPS_MONTHLY_FORECAST",
"GFS","ICON_EU","UKMET_EURO4"),".csv",sep=""),
read.csv,sep=";")
files = lapply(files,function(x){x$Forecast.Time = as.POSIXct(x$Forecast.Time,
format="%d.%m.%Y %H:%M+%S",tz="UTC");return(x)})
Now you have a list of your files you can work on. You could assign them to global variables if you want.
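As a rough sketch of the list-based approach above, the following keeps the data frames in a named list, so each one can still be reached by its file name. The two sample files and their contents are made up for illustration and written to a temporary directory; only the column name Forecast.Time and the date format come from the question.

```r
# Hypothetical sample data, written to a temporary directory for illustration
dir <- tempdir()
sample_names <- c("CMC", "GFS")
for (nm in sample_names) {
  write.table(data.frame(Forecast.Time = c("01.02.2017 06:00+00",
                                           "01.02.2017 12:00+00"),
                         Value = c(1.5, 2.5)),
              file = file.path(dir, paste0(nm, ".csv")),
              sep = ";", row.names = FALSE, quote = FALSE)
}

# Read every file into one list and name the elements after the files
files <- lapply(sample_names, function(nm) {
  read.csv(file.path(dir, paste0(nm, ".csv")), sep = ";",
           stringsAsFactors = FALSE)
})
names(files) <- sample_names

# Convert the time column in each data frame and return the modified copy
files <- lapply(files, function(x) {
  x$Forecast.Time <- as.POSIXct(x$Forecast.Time,
                                format = "%d.%m.%Y %H:%M+%S", tz = "UTC")
  x
})

class(files$CMC$Forecast.Time)
```

Each data frame is then available as files$CMC, files$GFS and so on. If separate global variables are really wanted, something like list2env(files, envir = globalenv()) should create them from the named list.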
My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_einstein$age*365.25
combined_df_newton$fu_time<-combined_newton$age*365.25
...
combined_leibniz$fu_time<-combined_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not interpret these strings as the names of variables to create or refer to. For that, you should look at the function assign.
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables but simply returns a character vector, and the $ operator cannot be used on a character vector.
There are many ways to solve your problem, but I recommend that you create a function that performs the necessary operations on each data frame. The function should then return the data frame. You can then re-use the function for all 7 philosophers/scientists.
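One way to follow that recommendation, sketched here with made-up ages: keep the data frames in a named list rather than in separately named variables, and apply a small helper to each element. The helper name add_fu_time, the list name combined and the sample values are assumptions for illustration.

```r
list_names <- c("putnam", "einstein", "newton")

# Hypothetical helper: adds follow-up time in days to one data frame
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# One data frame per name, stored in a named list (the ages are invented)
combined <- lapply(list_names, function(nm) data.frame(age = 1))
names(combined) <- list_names

# Apply the same calculation to every data frame
combined <- lapply(combined, add_fu_time)

combined$einstein$fu_time
```

Because the name only appears once, in list_names, there is no line to copy and paste and therefore no name to forget to replace.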
I created a small function to process a dataframe to be able to use the function:
preprocessCore::normalize.quantiles()
Since normalize.quantiles() only accepts a matrix object, and I need to rearrange my data, I created a small function that takes a specific column (variable) of a specific data frame and does the following:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1$boco,df_p2$boco)
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
However, "mat" should be a matrix, but it seems the cbind() does not do its job since I'm obtaining the following Error:
normal(antitrombina_FI,Six_Plex_IID)
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
So, it is clear that the cbind() is not creating a matrix. I don't understand why this is happening.
Most likely you are binding two NULL objects together, yielding NULL, which is not a matrix. If your df objects are data.frame, then df_p1$boco is interpreted as "extract the variable named boco", not "extract the variable whose name is the value of an object having the symbol boco". I suspect that your data does not contain a variable literally named "boco", so df_p1$boco is evaluated as NULL.
If you want to extract the column that is given as the value to the formal argument boco in function normal() then you should use [[, not $:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[[boco]],df_p2[[boco]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
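The difference between $ and [[ can be checked on a small made-up data frame. This is only a sketch: the data frame toy and the helper pick_column are invented for illustration. Note that the column name has to be passed as a character string for [[ to find it.

```r
# Invented example data
toy <- data.frame(Plate = c(1, 1, 2, 2), signal = c(10, 20, 30, 40))

# Hypothetical helper: col holds the column name as a character string
pick_column <- function(col, df) {
  by_dollar  <- df$col      # looks for a column literally named "col"
  by_bracket <- df[[col]]   # looks for the column whose name is stored in col
  list(by_dollar = by_dollar, by_bracket = by_bracket)
}

res <- pick_column("signal", toy)
is.null(res$by_dollar)   # TRUE: toy has no column called "col"
res$by_bracket           # the values of toy$signal
cbind(NULL, NULL)        # NULL, which is not a matrix
```

Applied to the function above, this would mean keeping df_p1[[boco]] without quotes inside the function and calling it with the column name quoted, along the lines of normal("antitrombina_FI", Six_Plex_IID).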
Thanks for your help bcarlsen. However I have found some errors:
First, I believe you need to introduce quotes in
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
If I run this script outside of a function, it works perfectly:
df_p1<-subset(Six_Plex_IID,Six_Plex_IID$Plate==1)
df_p2<-subset(Six_Plex_IID,Six_Plex_IID$Plate==2)
mat<-cbind(df_p1[["antitrombina_FI"]],df_p2[["antitrombina_FI"]])
norm<-preprocessCore::normalize.quantiles(mat)
However, if I now put this inside a function and try to run it as a function:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
normal(antitrombina_FI,Six_Plex_IID)
I get the same error message:
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
I'm completely clueless about why this is happening, why outside the function I'm obtaining a matrix and why inside the function not.
Thanks
I've created a lot of character objects in R that I would like to put into a list (storing all their information).
The objects look like this, and the pattern is "TMC":
str(TMCS09g10086933)
chr [1:10] "TMCS09g1008699" "TMCS09g1008610" "TMCS09g10086101" "TMCS09g10086104" "TMCS09g100864343" "TMCS09g10086434343" "TMCS09g10086994111" ...
I have hundreds of these objects. Could someone tell me how to do this?
You can use the function objects with the argument pattern to list them.
Then, you can call the function get to fetch them. If you do this with an lapply, you will get a list returned right away.
TMClist <- lapply(objects(pattern = "^TMC"), get)
First you need to find the objects, which you can do with a regex search through the list of the objects in your environment: grep("^TMC", ls(), value = TRUE). Then you need to get the objects using the character vector of their names. For that you use mget.
your_list <- mget(grep("^TMC", ls(), value = TRUE))
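A self-contained sketch of the mget approach, using a few invented objects. They are kept in their own environment here only so that the example does not depend on what is in the workspace; the object names and contents are assumptions.

```r
# Invented objects, stored in a separate environment for the example
env <- new.env()
assign("TMCS01", c("a", "b"), envir = env)
assign("TMCS02", c("c", "d", "e"), envir = env)
assign("other", 1:3, envir = env)

# Names of all objects that start with "TMC"
tmc_names <- ls(envir = env, pattern = "^TMC")

# mget() returns a named list with one element per object
tmc_list <- mget(tmc_names, envir = env)

names(tmc_list)
lengths(tmc_list)
```

The resulting list keeps the original object names, so an individual vector can still be reached with tmc_list$TMCS01 or tmc_list[["TMCS01"]].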
I've got a large function in R and the users have the ability to not include/specify an object. If they DO, the code checks to make sure the names in that object match the names in another. If they DON'T, there's no need to do that checking. The code line is:
if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
But I'm getting the following error:
Error in match(x, table, nomatch = 0L) : argument "grids" is missing, with no default
Well in this trial run, grids is SUPPOSED to be missing. If I try
if(exists("grids")) print("yay")
Then nothing prints, i.e. the absence of grids means the expression isn't evaluated, which is as I'd expect. So can anyone think why R seems to be evaluating the subsequent IF statement in the main example? Should I slap another set of curly brackets around the second one??
Thanks!
Edit: more problems. Removing "grids," from the function's list of arguments means it works if there's no object called grids and you don't specify it in the call (i.e. function(x,grids=whatever)). And keeping "grids," IN the function's list of arguments means it works if there IS an object called grids and you do specify it in the call.
Please see this: http://i.imgur.com/9mr1Lwi.png
using exists(grids) is out because exists wants "quotes" and without em everything fails. WITH them ("grids"), I need to decide whether to keep "grids," in the functions list. If I don't, but I specify it in the call (function(x,grids=whatever)) then I get unused argument fail. If I DO, but don't specify it in the call because grids doesn't exist and I don't want to use it, I get match error, grids missing no default.
How do I get around this? Maybe list it in the function variables list as grids="NULL", then rather than if(exists("grids")) do if(grids!="NULL")
I still don't know why the original match problem is happening though. Match is from the expvarnames/grids names checker, which is AFTER if(exists("grids")) which evaluates to FALSE. WAaaaaaaiiiiittttt..... If I specify grids in the function variables list, i.e. simply putting function(x,grids,etc){do stuff}, does that mean the function CREATES an object called grids, within its environment?
Man this is so f'd up....
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
So in the first example, the function seems to have created an object called "grids" because exists("grids") evaluates to true. But THEN, ON THE SAME LINE, when asked to do something with grids, it says it doesn't exist! Schroedinger's object?!
This is proven in example 2: grids evaluates true and a is globally assigned then the function does its thing. Madness. Complete madness. Does anyone know WHY this ridiculousness is going on? And is the best solution to use my grids="NULL" default in the functions variables list?
Thanks.
Reproducible example, if you want to but I've already done it for every permutation:
testfun <- function(x,grids)
{if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
print(x+1)}
testfun(1)
testfun(x=1,grids=grids)
grids<-data.frame(c(1,2,3),c(1,2,3),c(1,2,3))
expvarnames <- c("a","b","c")
colnames(grids) <- c("a","b","c")
Solution
Adapting your example use:
testfun <- function(x,grids = NULL)
{
if(!is.null(grids)){
if(!all(expvarnames %in% names(grids))){
stop("Not all expvar column names found as column names in grids")
}
print(x+1)
}
}
With this, testfun(1) prints nothing. Because grids defaults to NULL, the function can check for that value (i.e. no argument was supplied) and skip the rest of its body in that case.
The Reason the Problem Occurs
We go through each of the examples:
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
Here we call the function testfun, giving only the x argument. testfun knows it needs two arguments, and so creates local variables x and grids. We have then given an argument to x and so it assigns the value to x. There is no argument to grids, however the variable has still been created, even though no value has been assigned to it. So grids exists, but has no value.
From this exists("grids") will be TRUE, but when we try to do globalgrids<<-grids we will get an error as grids has not been assigned a value, and so we can't assign anything to globalgrids.
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
This, however is fine. grids exists as in the previous case, and we never actually try and access the value stored in grids, which would cause an error as we have not assigned one.
In the solution, we simply set a default value for grids, which means we always get something when we access the variable. Unlike in the previous cases, we get NULL rather than an error about a missing value.
The main point of this is that when you declare arguments in your function, they are created each time you use the function. They exist. However, if you don't assign them values in your function call then they will exist, but have no value. Then when you try and use them, their lack of values will throw an error.
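The behaviour described above can be checked directly with base R's missing(), which reports whether an argument was supplied in the call. The sketch below uses invented function names (check_args, testfun2), and passing expvarnames as an argument is an assumption made to keep the example self-contained. Unlike the solution above, this variant still returns x + 1 when grids is omitted.

```r
# exists() finds the argument name even when no value was supplied;
# missing() reports whether a value was actually passed in the call
check_args <- function(x, grids) {
  c(has_binding = exists("grids"), is_missing = missing(grids))
}
check_args(1)

# Variant with a NULL default: the name check runs only when grids is given
testfun2 <- function(x, grids = NULL, expvarnames = c("a", "b", "c")) {
  if (!is.null(grids) && !all(expvarnames %in% names(grids))) {
    stop("Not all expvar column names found as column names in grids")
  }
  x + 1
}

testfun2(1)
grids <- data.frame(a = 1:3, b = 1:3, c = 1:3)
testfun2(1, grids = grids)
```

Either test works for an optional argument: missing(grids) with no default, or is.null(grids) with a NULL default. The NULL default has the advantage that the value can be passed on to other functions without error.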
> a <- c(1,2,3,4)
> b <- c(2,4,6,8)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
Error: Not all a in b
> rm(a)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
>
When a does not exist, the expression does not evaluate, as expected. Before testing your first expression, make sure that grids does not exist by running rm(grids) in the console.
Richard Scriven's comment got me thinking: grids was an argument in my function but was optional, so maybe shouldn't be specified (like anything in "..." optional functions). I commented it out and it worked. Hooray, cheers everyone.
I am trying to create an R loop to run a command on a series of datasets. The command is make.design.data from the RMark library. The only argument it takes is the name of a list. I have 17 of these lists that I'd like to pass to make.design.data. This is the code I've been trying to use:
DFNames<-c("DFAmerican.Goldfinch", "DFAmerican.Robin","DFBarn.Swallow","DFBobolink", "DFBrown.head.Cowbird", "DFCedar.Waxwing", "DFCommon.Grackle","DFCommon.Yellowthroat", "DFEuropean.Starling","DFHorned.Lark", "DFKilldeer","DFRed.wing.Blackbird", "DFSavannah.Sparrow", "DFSong.Sparrow","DFTree.Swallow", "DFVesper.Sparrow", "DFYellow.Warbler")
#in my environment each of the names given to DFNames represents a list
for (x in DFNames){
n<-make.design.data(x)
assign(paste0("ddl",x),n)
}
This gives me the error
Error in data$model : $ operator is invalid for atomic vectors
Can anyone please suggest a way to fix my code, or a different way of tackling this?
Thanks, Jude
You can make a list of the actual data sets instead of a vector of their names.
x <- list(DFAmerican.Goldfinch, ...)
Then you can use:
lapply(x, make.design.data)
Or use get inside your for loop:
for (x in DFNames) {
make.design.data(get(x))
}
The "R" way is the former using lists and the apply family. Then you can avoid the gymnastics of assign.
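A sketch of the list-based route that starts from the vector of names: mget fetches the objects, and lapply applies the function to each. So that the example runs without RMark, make.design.data is replaced here by an invented stand-in called fake_design, and the two input lists are made up; with the real data, make.design.data would take the place of fake_design.

```r
# Stand-in for make.design.data(), used only so the sketch runs without RMark
fake_design <- function(data) list(n_records = length(data$ch))

# Invented input lists, named like the objects in the question
DFBobolink <- list(ch = c("101", "110"))
DFKilldeer <- list(ch = c("111"))
DFNames <- c("DFBobolink", "DFKilldeer")

# Fetch the objects by name from the current environment,
# then apply the function to each one
inputs <- mget(DFNames, envir = environment())
ddl <- lapply(inputs, fake_design)
names(ddl) <- paste0("ddl", names(ddl))

names(ddl)
```

The results stay together in one named list (ddl$ddlDFBobolink and so on), so no assign call is needed.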