Referencing recently used objects in R - r

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_df_einstein$age*365.25
combined_df_newton$fu_time<-combined_df_newton$age*365.25
...
combined_df_leibniz$fu_time<-combined_df_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!

The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R does not treat these strings as the names of variables to create or refer to. For that, you should look at the function assign (and get, for reading a variable by its name).
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, for which the $ operator is not valid.
There are many ways to solve your problem, but I recommend writing a function that performs the necessary operations on a data frame and then returns it. You can then reuse the function for all 7 philosophers/scientists.
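As a minimal sketch of that recommendation (one of several possible approaches), the data frames can be kept in a single named list rather than in seven separate variables, so that one helper function is applied to each of them. The names add_fu_time and combined are hypothetical, and the one-row data frames are placeholders for the real data:

```r
list_names <- c("putnam", "einstein", "newton", "kant", "hume", "locke", "leibniz")

# Hypothetical helper: adds follow-up time in days, computed from age in years.
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# Placeholder data: one small data frame per name, stored in a named list.
combined <- lapply(list_names, function(nm) data.frame(age = 1))
names(combined) <- list_names

# Apply the same calculation to every data frame in one step.
combined <- lapply(combined, add_fu_time)

# Individual data frames are reached by name, without assign() or get().
combined[["putnam"]]$fu_time
```

Because the names live in one vector, adding another person only means extending list_names; no line has to be copied and edited by hand.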

Related

Avoid partial matching of list names? [duplicate]

I thought that R did not do partial matching on named lists, so I'm confused by the example below. I tried reading the Argument matching document but I'm still not sure of what's going on. Any help understanding this example would be appreciated.
ll <- list("dir_session" = "some_directory")
print(ll$dir_session) # prints contents of ll$dir_session as expected
print(ll$dir) # prints contents of ll$dir_session, but I expected to print NULL
print(ll[["dir"]]) # prints NULL as expected
Not sure if it makes a difference but I'm using R version 3.3.3 (2017-03-06).
I'm afraid the answer is you thought wrong. It has less to do with the class of object (a named list) and more to do with the "$" operator which does partial matching. See the ?Extract help page. This is different than argument matching when calling a function.
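The difference between the two operators can be checked directly. This is a small sketch using the list from the question; the variable names partial, exact and loose are only illustrative:

```r
ll <- list(dir_session = "some_directory")

partial <- ll$dir                      # `$` accepts the unique prefix "dir"
exact   <- ll[["dir"]]                 # `[[` requires the full name by default
loose   <- ll[["dir", exact = FALSE]]  # partial matching has to be requested

print(partial)
print(exact)
print(loose)
```

Here partial and loose both hold the contents of dir_session, while exact is NULL.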

Problems obtaining the correct object class. R

I created a small function to process a dataframe to be able to use the function:
preprocessCore::normalize.quantiles()
Since normalize.quantiles() only accepts a matrix object, and I need to rearrange my data, I wrote a small function that takes a specific column (variable) in a specific data frame and does the following:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1$boco,df_p2$boco)
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
However, "mat" should be a matrix, but it seems the cbind() does not do its job since I'm obtaining the following Error:
normal(antitrombina_FI,Six_Plex_IID)
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
So, it is clear that the cbind() is not creating a matrix. I don't understand why this is happening.
Most likely you are binding two NULL objects together, yielding NULL, which is not a matrix. If your df objects are data.frame, then df_p1$boco is interpreted as "extract the variable named boco", not "extract the variable whose name is the value of an object having the symbol boco". I suspect that your data does not contain a variable literally named "boco", so df_p1$boco is evaluated as NULL.
If you want to extract the column that is given as the value to the formal argument boco in function normal() then you should use [[, not $:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[[boco]],df_p2[[boco]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
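To illustrate how such a function would be called, here is a reduced sketch that leaves out preprocessCore and only builds the matrix. The data frame toy and the helper bind_by_plate are invented for the example. The point is that the column name is passed as a character string and looked up with [[:

```r
# Toy data standing in for Six_Plex_IID (values are invented).
toy <- data.frame(
  Plate = c(1, 1, 2, 2),
  antitrombina_FI = c(10, 20, 30, 40)
)

# Hypothetical helper: one matrix column per plate for the named variable.
bind_by_plate <- function(col, df) {
  p1 <- df[df$Plate == 1, ]
  p2 <- df[df$Plate == 2, ]
  cbind(p1[[col]], p2[[col]])
}

# The column name is passed as a string, and col is used unquoted inside [[ ]].
mat <- bind_by_plate("antitrombina_FI", toy)
is.matrix(mat)

# With `$`, R looks for a column literally called "col" and finds none,
# so cbind() receives two NULLs and returns NULL.
not_found <- cbind(toy$col, toy$col)
is.null(not_found)
```

Writing p1[["col"]] with quotes inside the function would have the same effect as p1$col: it asks for a column literally named "col" instead of using the value of the argument.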
Thanks for your help bcarlsen. However I have found some errors:
First, I believe you need to introduce quotes in
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
If I run this script outside of a function, it works perfectly:
df_p1<-subset(Six_Plex_IID,Six_Plex_IID$Plate==1)
df_p2<-subset(Six_Plex_IID,Six_Plex_IID$Plate==2)
mat<-cbind(df_p1[["antitrombina_FI"]],df_p2[["antitrombina_FI"]])
norm<-preprocessCore::normalize.quantiles(mat)
However, if I put the same code inside a function and call it:
normal<-function(boco,df){
df_p1<-subset(df,df$Plate==1)
df_p2<-subset(df,df$Plate==2)
mat<-cbind(df_p1[["boco"]],df_p2[["boco"]])
norm<-preprocessCore::normalize.quantiles(mat)
df_1<-data.frame(var_1=c(norm[,1],norm[,2]),well=c(df_p1$well,df_p2$well))
return(df_1)
}
normal(antitrombina_FI,Six_Plex_IID)
I get the same error message:
Error in preprocessCore::normalize.quantiles(mat) :
Matrix expected in normalize.quantiles
I'm completely clueless about why this is happening: why do I get a matrix outside the function but not inside it?
Thanks

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). However, my next step is to combine them into one big vector which holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
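One possible way to avoid the separately named variables, sketched here under assumptions: keep the results in a named list from the start and flatten it at the end. The function process_id is a hypothetical stand-in for the real function (which the question does not show), and only three of the identifiers are used:

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical stand-in for the real per-identifier function.
process_id <- function(id) nchar(id)

# One list element per identifier, named like the original variables.
results <- lapply(url_num, process_id)
names(results) <- paste0("test_", url_num)

# Flatten into a single vector; its contents follow url_num automatically.
combined <- unlist(results)
combined
```

If the test_ variables have already been created with assign(), mget(paste0("test_", url_num)) should collect them into a similar named list, which can then be passed to unlist() in the same way.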

how to use loop to run through set of lists

I am trying to create an R loop to run a command on a series of datasets. The command is make.design.data from the RMark library. The only argument it takes is the name of a list. I have 17 of these lists that I'd like to pass to make.design.data. This is the code I've been trying to use:
DFNames<-c("DFAmerican.Goldfinch", "DFAmerican.Robin","DFBarn.Swallow","DFBobolink", "DFBrown.head.Cowbird", "DFCedar.Waxwing", "DFCommon.Grackle","DFCommon.Yellowthroat", "DFEuropean.Starling","DFHorned.Lark", "DFKilldeer","DFRed.wing.Blackbird", "DFSavannah.Sparrow", "DFSong.Sparrow","DFTree.Swallow", "DFVesper.Sparrow", "DFYellow.Warbler")
#in my environment each of the names given to DFNames represents a list
for (x in DFNames){
n<-make.design.data(x)
assign(paste0("ddl",x),n)
}
this gives me the error
Error in data$model : $ operator is invalid for atomic vectors
can anyone please suggest a way to fix my code, or a different way of tackling this?
Thanks, Jude
You can make a list of the actual data sets instead of a vector of their names.
x <- list(DFAmerican.Goldfinch, ...)
Then you can use:
lapply(x, make.design.data)
Or use get inside your for loop:
for (x in DFNames) {
make.design.data(get(x))
}
The "R" way is the former using lists and the apply family. Then you can avoid the gymnastics of assign.
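The following sketch shows both the cause of the error and the list-based approach, without requiring RMark. The two toy lists and the function describe are invented stand-ins for the real data sets and for make.design.data; the only thing they share with the real function is that they access data$model:

```r
# Toy stand-ins for the 17 lists and for make.design.data() (both invented).
DFBobolink <- list(model = "CJS")
DFKilldeer <- list(model = "CJS")
describe <- function(data) paste("model:", data$model)

DFNames <- c("DFBobolink", "DFKilldeer")

# Passing the name itself applies `$` to a character vector, which fails.
failed <- tryCatch(describe("DFBobolink"), error = function(e) conditionMessage(e))
print(failed)

# mget() looks the names up and returns the objects as a named list.
datasets <- mget(DFNames)

# Apply the function to every data set and keep the results in one list.
ddl <- lapply(datasets, describe)
names(ddl) <- paste0("ddl", names(ddl))
names(ddl)
```

Storing the results in the list ddl replaces the assign() calls in the original loop; a single result is then available as, for example, ddl[["ddlDFBobolink"]].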

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
print (t$col)
}
When I do this, it prints NULL. However, if I do it manually, it works fine -- say a column name is "sample". If I type t$"sample" into the R prompt, it works fine. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating @joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and one can skip quotation marks (e.g. t$foo vs. t[["foo"]]). It also does partial matching, which is very convenient but can, under unusual circumstances, be dangerous or confusing: if a list contains an element foolicious, then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).
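Applied to the loop from the question, this might look as follows. It is a sketch with an invented two-column table called tbl (renamed from t, since t() is also a base R function):

```r
# Toy table standing in for the one in the question (values are invented).
tbl <- data.frame(sample = c(1, 2, 3), group = c("a", "b", "c"))

for (col in names(tbl)) {
  print(tbl[[col]])   # looks up the column whose name is stored in col
}

# `$` ignores the variable and looks for a column literally named "col".
by_dollar <- tbl$col
is.null(by_dollar)
```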

Resources