How can I use attr<- with lapply?

Or, more generally: how can I add multiple attributes to the elements of a list?
I am stuck trying to set an attribute on the elements of a list, all of which are data.frames. In the end I would like to add names(myList) as a varying attribute to every data.frame inside. But I cannot even get a static attribute set on all list elements.
lapply(myList,attr,which="myname") <- "myStaticName"
This does not work because there is no lapply<- replacement function, so lapply() cannot appear on the left-hand side of an assignment. If I at least knew how to do this, maybe I could figure out how to do it with varying attributes such as the names of the list.

I don't recommend it, but you could do: myList <- lapply(myList, 'attr<-', which='myname', value='myStaticName'). Note that lapply returns a new list, so the result has to be assigned back. An old-fashioned for loop is probably the clearest way to perform this task; alternatively, do this assignment upstream when the objects are created.
for (i in seq_along(myList)) attr(myList[[i]], 'myname') <- 'myStaticName'
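The same for loop also covers the original goal of a varying attribute. A minimal sketch, assuming myList is a named list of data.frames (the toy list below is made up for illustration): loop over the names and store each name as that element's attribute.

```r
# Hypothetical toy data: a named list of data.frames
myList <- list(first = data.frame(x = 1:2), second = data.frame(x = 3:4))

# Set the attribute "myname" of each element to that element's own name
for (nm in names(myList)) {
  attr(myList[[nm]], "myname") <- nm
}

attr(myList$second, "myname")
# [1] "second"
```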
EDIT:
As @mnel points out in the comments, setattr in the data.table package is also an efficient option, since it assigns by reference.
Edit: @mnel, don't use setattr with lapply. This is one case where the for loop is much faster.
library(microbenchmark)
library(data.table)
myList <- as.list(1:10000)
# lapply() with the replacement function attr<-
`lapply.attr<-` <- function()
  lapply(myList, 'attr<-', which='myname', value='myStaticName')
# plain for loop with attr<-
`for.attr<-` <- function()
  for (i in seq_along(myList))
    attr(myList[[i]], 'myname') <- 'myStaticName'
# lapply() with data.table::setattr (assigns by reference)
lapply.setattr <- function()
  lapply(myList, setattr, name='myname', value='myStaticName')
# for loop with data.table::setattr
for.setattr <- function()
  for (i in seq_along(myList))
    setattr(myList[[i]], name = 'myname', value = 'myStaticName')
result <- microbenchmark(`lapply.attr<-`(), `for.attr<-`(), lapply.setattr(), for.setattr())
boxplot(result)

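Based on this answer by Thierry, I found a solution myself. I had actually come close in several attempts, but I did not return the whole modified data.frame, which is the key.
# lapply over the names; setNames() restores the names that lapply(names(...)) drops
myList <- setNames(lapply(names(myList), function(X) {
  attr(myList[[X]], "myname") <- X
  myList[[X]]  # return the whole modified data.frame, not just the attribute
}), names(myList))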
My mistake was to return only the value of the attribute assignment (the second line of the function) instead of the whole data.frame. Thus I was not able to replace the elements of the initial list.
@Matthew Plourde: what's strange is that your benchmark looks somewhat different on my machine (RStudio, OS X, 2.5 GHz Intel Core i7, 16 GB RAM).
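A base R alternative sketch uses Map(), which walks over the list and its names in parallel and, unlike lapply(names(myList), ...), keeps the names on the result. The toy data is hypothetical; the anonymous function follows the same rule as above and returns the modified data.frame.

```r
# Hypothetical toy data: a named list of data.frames
myList <- list(first = data.frame(x = 1:2), second = data.frame(x = 3:4))

# Map() passes each element (df) together with its name (nm);
# names are taken from myList, so the result stays a named list
myList <- Map(function(df, nm) {
  attr(df, "myname") <- nm
  df  # return the data.frame, not the attribute value
}, myList, names(myList))

attr(myList$first, "myname")
# [1] "first"
```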

Related

How to use lapply instead of a for loop with a custom function that is not vectorized in its arguments

Firstly, let us generate data like this:
library(data.table)
data <- data.table(date = as.Date("2015-05-01")+0:299)
set.seed(123)
data[,":="(
a = round(30*cumprod(1+rnorm(300,0.001,0.05)),2),
b = rbinom(300,5000,0.8)
)]
Then I want to use my custom function to operate on multiple columns multiple times without typing each call out manually. For example, my custom function is add <- function(x,n) (x+n).
Here is my for loop code:
add <- function(x,n) (x+n)
n <- 3
freture_old <- c("a","b")
for(i in 1:n ){
data[,(paste0(freture_old,"_",i)) := add(.SD,i),.SDcols =freture_old ]
}
Could you please show me an lapply version to replace the for loop?
If all you want is to use an lapply loop instead of a for loop, you really do not need to change much. For a data.table object it is even easier, since := modifies the data.table by reference in every iteration, so you never have to save a copy back to the global environment. The only thing I add is a call to invisible() around it, to suppress the output to the console.
lapply(1:n,function(i) data[,paste0(freture_old,"_",i):=lapply(.SD,add,i),.SDcols =freture_old])
Note that if you assign this lapply to an object, you will get a list of data.tables with one element per iteration, in this case 3. That is unnecessary and can waste memory, because you are only interested in the final result, and data has already been modified in place. Therefore just run the code without assigning it to a variable. However, if you do not assign it to anything, every iteration will be printed to the console. So I would suggest wrapping invisible() around it, like this:
invisible(lapply(1:n,function(i) data[,paste0(freture_old,"_",i):=lapply(.SD,add,i),.SDcols =freture_old]))
Hope this helps and let me know if you need me to add anything else to this answer. Good luck!
An option without an R-level "loop" (quoted since ultimately it is a loop at some level somewhere):
data[,
c(outer(freture_old, seq_len(n), paste, sep="_")) :=
as.data.table(matrix(outer(as.matrix(.SD), seq_len(n), add), .N)),
.SDcols=freture_old]
Or equivalently in base R (setDF from data.table is only used to convert data back to a data.frame first):
setDF(data)
cbind(data, matrix(outer(as.matrix(data[, freture_old]), seq_len(n), add),
nrow(data)))
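For a plain data.frame without data.table, a sketch of the lapply version builds one block of new columns per i and binds them all in a single cbind() call. The toy data frame dat is hypothetical; add, n and freture_old are taken from the question.

```r
add <- function(x, n) (x + n)
n <- 3
freture_old <- c("a", "b")

# Hypothetical toy data standing in for the question's data
dat <- data.frame(a = c(1, 2), b = c(10, 20))

# One data.frame of new columns per i, named a_i and b_i
new_cols <- lapply(seq_len(n), function(i) {
  setNames(add(dat[freture_old], i), paste0(freture_old, "_", i))
})

# Bind the original columns and all new blocks at once
dat <- do.call(cbind, c(list(dat), new_cols))
names(dat)
```

Collecting the blocks first and binding once avoids growing the data frame inside the loop.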

Recall differently named data sets inside a loop

Here is how I created a number of data sets named data_1, data_2, data_3, and so on.
Initially, data has 500 rows and 17 columns (dim(data) is 500 17).
for ( i in 1:length(unique( data$cluster ))) {
assign(paste("data", i, sep = "_"),subset(data[data$cluster == i,]))
}
Up to this point everything is fine.
Now I am trying to use these data sets one by one inside another loop, like this:
for (i in 1:5) {
data<- paste(data, i, sep = "_")
}
However, this does not give me the data in the required format.
Any help will be really appreciated.
Thank you in advance.
Let me give you a tip here: don't assign everything in the global environment; use lists instead. That way you avoid all the things that can go wrong when meddling with the global environment. The code in your question overwrites the original dataset data, so you'll be in trouble if you want to rerun it after something goes wrong: you'll have to reconstruct the original data frame. Also note that paste() only ever returns character strings; even paste("data", i, sep = "_") gives the name "data_1", not the object with that name (that is what get() does).
Second, if you need to split a data frame based on a factor and carry out some code on each part, take a look at split, by and tapply, or at the plyr and dplyr packages.
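If the data_1, data_2, ... objects already exist, a minimal sketch for collecting them into a named list uses mget(), which looks up several objects by name at once. The two toy objects below are hypothetical stand-ins for the ones created with assign().

```r
# Hypothetical stand-ins for the objects created with assign()
data_1 <- data.frame(x = 1:2)
data_2 <- data.frame(x = 3:5)

# mget() returns a named list: list(data_1 = ..., data_2 = ...)
datalist <- mget(paste("data", 1:2, sep = "_"))

for (i in seq_along(datalist)) {
  mydata <- datalist[[i]]  # the data frame itself, not its name
  print(nrow(mydata))
}
```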
Using Base R
With base R, it depends on what you want to do. In the most general case you can use a combination of split() and lapply or even a for loop:
mylist <- split(data, f = data$cluster)
for (mydata in mylist) {
  print(head(mydata))  # values are not auto-printed inside a loop
  # ... further code for each subset
}
Or
mylist <- split( data, f = data$cluster)
result <- lapply(mylist, function(mydata){
doSomething(mydata)
})
Which one you use depends largely on what the result should be. If you need some kind of summary for every subset, lapply will give you a list with one result per subset. If you need this for a simulation, plotting or similar, you are better off using the for loop.
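A small self-contained sketch of the split() plus lapply() pattern with a per-subset summary; the toy data with a cluster column is hypothetical, and the mean of value stands in for doSomething.

```r
# Hypothetical toy data with a cluster column
data <- data.frame(cluster = c(1, 1, 2, 3, 3, 3), value = 1:6)

# One data frame per cluster, named by the cluster value ("1", "2", "3")
mylist <- split(data, f = data$cluster)

# One summary per subset, returned as a list with the same names
result <- lapply(mylist, function(mydata) mean(mydata$value))
result[["3"]]
# [1] 5
```

Each subset is reached as mylist[["2"]] instead of a separate data_2 object in the global environment.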
If you want to add some variables based on other variables, then the plyr or dplyr packages come in handy.
Using plyr and dplyr
These packages come in especially handy if the result of your code is going to be an array or data frame of some kind. This is similar to using split and lapply, but in a way Hadley approves of :-)
For example:
library(plyr)
result <- ddply(data, .(cluster),
function(mydata){
doSomething(mydata)
})
Use dlply if the result should be a list.

Replace characters in a column, based on a translation table from another data frame

I have a data.frame mapping which contains path and map.
I also have another data.frame DATA which contains the raw path and value.
EDIT: Path might have two components or more: e.g. "A>C" or "A>C>B"
set.seed(24);
DATA <- data.frame(
path=paste0(sample(LETTERS[1:3], 25, replace=TRUE), ">", sample(LETTERS[1:3], 25, replace=TRUE)),
value=rnorm(25)
)
mapping <- data.frame(path=c("A","B","C"), map=c("X","Y","Z"))
lapply(mapping, function (x) {
for (i in 1:nrow(DATA)) {
DATA$path[i] <- gsub(as.character(x["path"]),as.character(x["map"]),as.character(DATA$path[i]))
}
})
I'm trying to replace the path in DATA with the map value in mapping, but this doesn't seem to be working for me.
For example, "A>C" should be converted to "X>Z".
I understand that for loops are discouraged in R, but I can't think of another way to code it. The data I'm working with has 6 million rows in DATA and 16k rows in mapping.
Clarification on the data: while the paths consist of letters (A, B, C) here, the real paths are actually domain names. The number of steps in a path is also not fixed at 2 and can be any number.
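You can use chartr, which translates character by character (so it only works while every path component is a single character, not for the domain names mentioned in the clarification):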
DATA$path <- chartr('ABC', 'XYZ', DATA$path)
Or if we are using the data from 'mapping'
DATA$path <- chartr(paste(mapping$path, collapse=''),
paste(mapping$map, collapse=''), DATA$path)
Or using gsubfn
library(gsubfn)
pat <- paste0('[', paste(mapping$path, collapse=''),']')
indx <- setNames(as.character(mapping$map), mapping$path)
gsubfn(pat, as.list(indx), as.character(DATA$path))
Or a base R option based on @smci's comment, which reuses indx from above and splits on ">", so it also works for multi-character components:
vapply(strsplit(as.character(DATA$path), '>'), function(x)
paste(indx[x], collapse=">"), character(1L))
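"), character(1L))">
Because strsplit() does not care how many steps a path has, this approach also covers the clarified case of domain names with any number of steps. A self-contained sketch with made-up domain names; keeping components that are missing from mapping unchanged is an assumption about the desired behaviour.

```r
# Hypothetical mapping from domain names to labels
mapping <- data.frame(path = c("a.com", "b.org", "c.net"),
                      map  = c("X", "Y", "Z"),
                      stringsAsFactors = FALSE)
DATA <- data.frame(path = c("a.com>c.net", "b.org>a.com>d.io"),
                   stringsAsFactors = FALSE)

# Named lookup vector: indx["a.com"] is "X"
indx <- setNames(mapping$map, mapping$path)

translate <- function(steps) {
  out <- indx[steps]
  out[is.na(out)] <- steps[is.na(out)]  # keep unmapped steps as they are
  paste(out, collapse = ">")
}

DATA$path <- vapply(strsplit(DATA$path, ">", fixed = TRUE),
                    translate, character(1L))
DATA$path
```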
Using data.table (1.9.5+), which is especially advisable because of the size of your data. Note that this version assumes exactly two steps per path.
library(data.table)
setDT(DATA); setDT(mapping)
DATA[,paste0("path",1:2):=tstrsplit(path,split=">")]
setkey(DATA,path1)[mapping,new.path1:=i.map]
setkey(DATA,path2)[mapping,new.path2:=i.map]
DATA[,new.path:=paste0(new.path1,">",new.path2)]
If you want to get rid of the extra columns:
DATA[,paste0(c("","","new.","new."),"path",rep(1:2,2)):=NULL]
If you just want to overwrite path, use path on the LHS of the last line instead of new.path.
This could also be written more concisely:
library(data.table)
setDT(mapping)
setkey(setkey(setDT(DATA)[,paste0("path",1:2):=tstrsplit(path,split=">")
],path1)[mapping,new.path1:=i.map],path2
)[mapping,new.path:=paste0(new.path1,">",i.map)]
I think you're using the wrong apply.
mapply allows you to pass two arguments to the function, here the path and the map. Note that in mapply the FUN argument comes first. You also do not need to go row by row; gsub works on the entire column at once. Finally, assignments made inside the function passed to an apply-family function are local to that call, unlike assignments in a for loop, so you need to assign the result in .GlobalEnv. You can do this with an explicit call to assign() or with <<-, which assigns to the variable in the first enclosing environment where it already exists. In this case, that is .GlobalEnv.
After defining mapping and DATA as you do above, try this.
head(DATA)
invisible(mapply( function (x,y) {
DATA$path <<- gsub(x,y,DATA$path)
},mapping$path, mapping$map))
head(DATA)
Note that the call to invisible suppresses the output from mapply.
If you really want to use lapply, you can, but you need to transpose mapping. Transposing converts it to a matrix, so you have to convert it back to a data.frame. Then you can use the same tricks as above (<<- and no for loop) to get this code:
invisible(lapply(as.data.frame(t(mapping)), function (x) {
DATA$path <<- gsub(x[1],x[2],DATA$path)
}))
head(DATA)
Thanks for sharing, I learned a lot answering this question.
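A sketch of the same sequential gsub idea without <<-, swapping in Reduce() to thread the path column through one gsub() per mapping row. The toy data mirrors the question. Assumptions worth stating: fixed = TRUE treats patterns literally, replacements are applied in mapping order (so a map value that matches a later pattern would be replaced again), and a pattern can also match inside a longer component.

```r
mapping <- data.frame(path = c("A", "B", "C"), map = c("X", "Y", "Z"),
                      stringsAsFactors = FALSE)
DATA <- data.frame(path = c("A>C", "B>A", "C>B"), stringsAsFactors = FALSE)

# Reduce() passes the partially translated column p to the next step,
# so the result is returned and assigned normally instead of via <<-
DATA$path <- Reduce(function(p, k) gsub(mapping$path[k], mapping$map[k], p, fixed = TRUE),
                    seq_len(nrow(mapping)), DATA$path)
DATA$path
```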

plyr within loop: unexpected behavior

I found an odd issue with plyr when using it inside a loop.
What I want this script to do is to iterate the plyr function over different input values (provided by the for loop) and store the results as a list of data.frames.
library(plyr)
k <- as.factor(c(rep("a",2), rep("b",2), rep("c",2), rep("d",2), rep("e",2)))
indata <- data.frame(k)
outdata <- list()
for (i in 1:10) {
  tempdata <- ddply(.data = indata, .variables = .(k), .fun = summarize, i = i)
  outdata[[i]] <- tempdata
  rm(tempdata)
}
outdata
I would expect it to produce a list of data.frames each produced within a single iteration of the loop, and therefore a single value of the loop variable.
What happens instead is that each of the data.frames looks identical, with each row having a sequential value of the loop variable.
Storing the loop variable in a separate variable makes it work, but seems like an awkward workaround.
k <- as.factor(c(rep("a",2), rep("b",2), rep("c",2), rep("d",2), rep("e",2)))
indata <- data.frame(k)
outdata <- list()
for (i in 1:10) {
  z <- i  # copy of the loop variable under a different name
  tempdata <- ddply(.data = indata, .variables = .(k), .fun = summarize, i = i, z = z)
  outdata[[i]] <- tempdata
  rm(tempdata)
}
outdata
Any ideas on what's causing this odd behavior?
This is a scoping issue. Functions within ddply (I believe llply) use i as a local variable, and that i is found before yours when summarize evaluates i. The easiest fix is to use j as the iterator:
for (j in 1:10)
However, I have no idea why you use ddply in your example. It doesn't seem necessary, so I assume it's only a toy example.
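If ddply is only a toy here, a base R sketch builds the same kind of list (one data.frame per iteration, one row per level of k) with lapply(). Mimicking the summarize output with data.frame(k = levels(...), i = j) is an assumption about what the real code needs; j is the argument of each call, so there is no other i to collide with.

```r
k <- as.factor(c(rep("a", 2), rep("b", 2), rep("c", 2), rep("d", 2), rep("e", 2)))
indata <- data.frame(k)

# One data.frame per iteration: each level of k paired with that iteration's value
outdata <- lapply(1:10, function(j) data.frame(k = levels(indata$k), i = j))

outdata[[3]]
```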

Evaluate a list of functions in R all with the same input and return list of arrays or matrix

Say you have a list of functions
funList=list()
for (i in 1:5){
funList[[i]]=approxfun(0:5,(0:5)^i,method="linear", rule=2)
}
and later you want a matrix of values in which each row (or column, whichever makes the code simpler; even a list of arrays instead of a matrix would be fine) has the form, say,
funList[[i]](1:3)
I've tried using lapply, but I haven't been able to get that to work.
I would do:
eval.with.args <- function(FUN, ...) FUN(...)
Then one of:
lapply(funList, eval.with.args, 1:3)
sapply(funList, eval.with.args, 1:3)
mapply(eval.with.args, funList, list(1:3))
Map(eval.with.args, funList, list(1:3))
I think I remember asking on the forums if there was a function that already implemented function(FUN, ...)FUN(...) but the answer was "no" at the time. It could make a nice addition to the base or functional packages IMHO.
You're looking for do.call:
lapply(funList, do.call, list(1:3))
You can replace eval.with.args in all of @flodel's examples with do.call if you wrap the second argument in an additional call to list.
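An anonymous function inside vapply() also works and checks that every function returns exactly three values. A sketch that rebuilds funList as in the question; because approxfun() interpolates linearly between the knots 0:5, evaluating at the integer knots 1:3 returns exactly r^i.

```r
funList <- list()
for (i in 1:5) {
  funList[[i]] <- approxfun(0:5, (0:5)^i, method = "linear", rule = 2)
}

# Column i holds funList[[i]](1:3); numeric(3) is the expected shape of each result
m <- vapply(funList, function(f) f(1:3), numeric(3))
t(m)  # transpose for one row per function
```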
