I have the following vectors and a combined data frame, which are the objects fed to the expression below.
x <- c(1,2,3,4)
y <- c(5,6,7,8)
z <- c(9,10,11,12)
h <- data.frame(x,y,z)
D <- print (( rep ( paste ( "h[,3]" ) , nrow(h) )) , quote=FALSE )
# [1] h[,3] h[,3] h[,3] h[,3]
DD <- c ( print ( paste ( (D) , collapse=",")))
# [1] "h[,3],h[,3],h[,3],h[,3]"
DDD <- print ( DD, quote = FALSE )
# However when I place DDD in expand.grid it does not work
is(DDD)
# [1] "character" "vector" "data.frameRowLabels" "SuperClassMethod"
Thus the expression expand.grid(DDD) does not work. How can I repeat, n times, a character element that represents an object, so that the resulting vector works when passed to expand.grid?
It looks like you are trying to generate some R code then execute it. For your case, this will work:
# From your question
DDD
# [1] "h[,3],h[,3],h[,3],h[,3]"
# The code that you wish to execute, as a string
my_code <- paste("expand.grid(", DDD, ")")
# [1] "expand.grid( h[,3],h[,3],h[,3],h[,3] )"
# Execute the code
eval(parse(text = my_code))
I really recommend against doing this. See here for some good reasons why eval(parse(text = ...)) is a bad idea.
A more idiomatic R solution to accomplish your task:
# Generate the data.frame, h
x <- c(1,2,3,4)
y <- c(5,6,7,8)
z <- c(9,10,11,12)
h <- data.frame(x,y,z)
# Repeat the 3rd column 3 times, then call expand.grid
expand.grid(rep(list(h[,3]), times = 3))
# Alternatively, access the column by name
expand.grid(rep(list(h$z), times = 3))
By the way, I recommend looking at the help files for expand.grid - they helped me reach a solution to your problem quite quickly after understanding the arguments for expand.grid.
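As a sketch of the same idea with the repeat count taken from the data: the question repeats the column once per row of h (four times), so the count can be nrow(h) instead of a literal. The list names rep1 to rep4 are hypothetical, added only so the output columns get readable names.

```r
# Rebuild the example data frame from the question
h <- data.frame(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8), z = c(9, 10, 11, 12))

# One copy of the third column per row of h
n <- nrow(h)
cols <- rep(list(h[, 3]), times = n)

# Hypothetical names; expand.grid uses them as column names
names(cols) <- paste0("rep", seq_len(n))

# expand.grid accepts a single list argument, so no string evaluation is needed
grid <- expand.grid(cols)
dim(grid)
# [1] 256   4
```

Each of the four columns takes all four values of h[, 3], which gives 4^4 = 256 combinations.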
I would like to evaluate different indices inside a for loop.
These indices have different formulas, and they do not always need to be evaluated.
For instance,
the indices to evaluate might be:
a=1
b=2
c=5
d=8
IDX1=function(a,b) {result=a+b}
IDX2=function(c,b) {result=c+b}
IDX3=function(d,b) {result=d+b-c}
IDX4=function(a,d) {result=a+d+b+c}
The formulas don't really matter.
In a data frame I have the iteration number and the indices I need to compute in each loop (let's say that I have to evaluate 2 indices per iteration):
head=c("iter","IndexA","IndexB")
r1=c(1,"IDX1","IDX2")
r2=c(2,"IDX3","IDX4")
r3=c(3,"IDX1","IDX4")
df=as.data.frame(rbind(head,r1,r2,r3))
What I would like to do is, within the loop, evaluate the two respective indices for each iteration, automatically calling the right formula and feeding it the right arguments:
iter 1 : IndexA=IDX1(args)=3 ; IndexB=IDX2(args)=7
iter 2 : IndexA=IDX3(args)=5 ; IndexB=IDX4(args)=16
iter 3 : IndexA=IDX1(args)=3 ; IndexB=IDX4(args)=16
Please do not answer with "just run all the functions and recall the needed result in the loop".
I'm working with big matrices and memory is indeed a problem. I need to evaluate the functions within the loop to reduce memory usage.
I believe the answer is somewhere inside this discussion, but I can't work it out:
How to create an R function programmatically?
Can somebody explain to me:
1. how to build a function that can be programmatically changed in a loop?
2. once I have it, how can I run the formula and get the result I want?
Thanks
You can combine the eval and parse functions to evaluate any string as code. First, you have to construct such a string. To do that, you can specify your indices as character strings, for example IDX1 = "a + b". Then you can retrieve that string by name with get("IDX1").
Try this code:
# Your preparations
a <- 1
b <- 2
c <- 5
d <- 8
IDX1 <- "a + b"
IDX2 <- "c + b"
IDX3 <- "d + b - c"
IDX4 <- "a + d + b + c"
head = c("iter", "IndexA", "IndexB")
r1 = c(1, "IDX1", "IDX2")
r2 = c(2, "IDX3", "IDX4")
r3 = c(3, "IDX1", "IDX4")
df = as.data.frame(rbind(r1, r2, r3))
colnames(df) <- head
# Loop over the rows of df with apply
result <- apply(df, 1, function(x){
  # Construct call string for further evaluation
  IndexA_call <- paste("ia <- ", get(x[2]), sep = "")
  IndexB_call <- paste("ib <- ", get(x[3]), sep = "")
  # eval each call string
  eval(parse(text = IndexA_call))
  eval(parse(text = IndexB_call))
  x[2:3] <- c(ia, ib)
  return(as.numeric(x))
})
result <- t(result)
colnames(result) <- head
print(result)
This gives:
   iter IndexA IndexB
r1    1      3      7
r2    2      5     16
r3    3      3     16
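If the goal is only to pick a formula by name, a lookup in a named list of functions avoids eval(parse(text = ...)) altogether and still computes only the indices each iteration asks for. This is a minimal sketch, not the method used in the answer: the names vals, index_funs and plan are hypothetical, and each function is assumed to take a single list holding all inputs.

```r
# All inputs in one list (hypothetical container)
vals <- list(a = 1, b = 2, c = 5, d = 8)

# Each index is an ordinary function, stored under its name
index_funs <- list(
  IDX1 = function(v) v$a + v$b,
  IDX2 = function(v) v$c + v$b,
  IDX3 = function(v) v$d + v$b - v$c,
  IDX4 = function(v) v$a + v$d + v$b + v$c
)

# Which indices to compute at each iteration
plan <- data.frame(
  iter   = 1:3,
  IndexA = c("IDX1", "IDX3", "IDX1"),
  IndexB = c("IDX2", "IDX4", "IDX4"),
  stringsAsFactors = FALSE
)

# Look up each function by name and call it; nothing else is evaluated
result <- t(sapply(seq_len(nrow(plan)), function(i) {
  c(iter   = plan$iter[i],
    IndexA = index_funs[[plan$IndexA[i]]](vals),
    IndexB = index_funs[[plan$IndexB[i]]](vals))
}))
print(result)
```

Because the functions are looked up with [[ ]], a misspelled index name yields NULL and an immediate error at the call, rather than silently evaluating arbitrary text.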
I'm still very new to R and haven't found any answer so far. Sorry to finally ask.
Edit, with a quick example:
I want to compute a multidimensional development index based on South African data.
My list holds individual-level information for each year, so df1 covers year 1 and df2 covers year 2.
df1<-data.frame(var1=c(1, 1,1), var2=c(0,0,1), var3=c(1,1,0))
df2<-data.frame(var1=c(1, 0,1), var2=c(1,0,1), var3=c(0,1,0))
mylist <-list (df1,df2)
Here is a very simplified, working index function:
myindex <- function(x, dimX, dimY){
  econ_i<- ( x[dimX]+ x[dimY] )
  return ( (1/length(econ_i))*sum(econ_i) )
}
myindex(df1, "var2", "var3")
Then I have a data frame of the variables I want to use for my index:
mydf <- data.frame(set1=c("var1", "var2"), set2=c("var2", "var3"))
I'm using a function such as this one to extract the arguments:
pick_values <-function(x){
  vect <-c()
  for(i in x){
    vect <- c(vect, i)
  }
  return(vect)
}
I'd like to set up an lapply loop that applies my function to every element of my list, for every set of arguments in my data frame. In other words, I'd like to compute my index for both years, with every set of variables I can use. //end Edit
I've tried many unsuccessful things so far. For instance:
lapply(mylist, myindex, lapply(mydf,pick_values))
Thanks a lot for your help!
Okay, I don't like the name mydf, nor that its columns are factors, so I rename it args (it holds function arguments) and set stringsAsFactors = FALSE:
args <- data.frame(set1=c("var1", "var2"), set2=c("var2", "var3"), stringsAsFactors = FALSE)
We'll also write a wrapper for myindex that accepts a vector of arguments instead of dimX and dimY:
myindex2 = function(x, d) {
  myindex(x, d[1], d[2])
}
Then we can nest lapply like this (the output below assumes the list elements are named, e.g. mylist <- list(df1 = df1, df2 = df2)):
lapply(mylist, function(m) lapply(args, myindex2, x = m))
# $df1
# $df1$set1
# [1] 4
#
# $df1$set2
# [1] 3
#
#
# $df2
# $df2$set1
# [1] 4
#
# $df2$set2
# [1] 3
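If a compact table is easier to work with than a nested list, the same nesting can be written with sapply, which simplifies the result to a matrix. This is a sketch under a few assumptions: the list is given names (df1, df2), the argument data frame is called arg_sets (a hypothetical name), and myindex is copied from the question unchanged.

```r
# Example data from the question, with a named list
df1 <- data.frame(var1 = c(1, 1, 1), var2 = c(0, 0, 1), var3 = c(1, 1, 0))
df2 <- data.frame(var1 = c(1, 0, 1), var2 = c(1, 0, 1), var3 = c(0, 1, 0))
mylist <- list(df1 = df1, df2 = df2)

# One column per set of arguments (hypothetical name arg_sets)
arg_sets <- data.frame(set1 = c("var1", "var2"),
                       set2 = c("var2", "var3"),
                       stringsAsFactors = FALSE)

# Index function as given in the question
myindex <- function(x, dimX, dimY) {
  econ_i <- x[dimX] + x[dimY]
  (1 / length(econ_i)) * sum(econ_i)
}

# Outer sapply: one column per data frame; inner sapply: one row per argument set
res <- sapply(mylist, function(m) {
  sapply(arg_sets, function(d) myindex(m, d[1], d[2]))
})
print(res)
```

Rows are labelled by argument set and columns by data frame, so a single value can be read with, for example, res["set2", "df1"].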
I have a data frame that looks like this:
set.seed(42)
data <- runif(1000)
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
x <- data.frame(data,utility,stage)
head(x)
       data utility stage
1 0.9148060     def   xyz
2 0.9370754     abc   wxy
3 0.2861395     def   xyz
4 0.8304476     cde   xyz
5 0.6417455     bcd   xyz
6 0.5190959     abc   xyz
and I want to generate cumulative distribution functions for the unique combinations of utility and stage. In my real application I'll end up generating about 100 cdfs but this random data will have 12 (4x3) unique combinations. But I'll be using each of those cdfs thousands of times, so I don't want to calculate the cdf on the fly each time. The ecdf() function works exactly as I'd like, except I'd need to vectorize it. The following code doesn't work, but it's the gist of what I'm trying to do:
ecdf_multiple <- function(x)
{
i=0
utilities <- levels(x$utilities)
stages <- levels(x$stages)
for(utility in utilities)
{
for(stage in stages)
{
i <- i + 1
y <- ecdf(x[x$utilities == utility & x$stage == stage,1])
# calculate ecdf for the unique util/stage combo
z[i] <- list(y,utility,stage)
# then assign it to a data element (list, data frame, json, whatever) note-this doesn't actually work
}
}
z # return value
}
So after running ecdf_multiple and assigning the result to a variable, I'd reference that variable somehow by passing a value (for which I want the cdf), the utility, and the stage.
Is there a way to vectorize the ecdf function (or use/build another) so that I can use the output several times without needing to generate the distributions over and over?
-------Added in response to @Pascal's excellent suggestion-------
How might one expand this to the more general case of "n" category dimensions? This is my stab at it, based on Pascal's two-dimensional case. Notice how I tried to assign "y":
set.seed(42)
data <- runif(1000)
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
openclose <- sample(c("open","close"),1000,replace=TRUE)
x <- data.frame(data,utility,stage,openclose)
numlabels <- length(names(x))-1
y <- split(x, list(x[,2:(numlabels+1)]))
l <- lapply(y,function(x) ecdf(x[,"data"]))
#execute
utility <- "abc"
stage <- "xyz"
openclose <- "close"
comb <- paste(utility, stage, openclose, sep = ".")
# call the function
l[[comb]](.25)
During the assignment of "y" above, I get this error message:
"Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?"
The following might help:
# we create a list of criteria by excluding
# the first column of the data.frame
y <- split(x, as.list(x[,-1]))
l <- lapply(y, function(x) ecdf(x[,"data"]))
utility <- "abc"
stage <- "xyz"
comb <- paste(utility, stage, sep = ".")
l[[comb]](0.25)
# [1] 0.2613636
plot(l[[comb]])
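The error in the question comes from list(x[, 2:(numlabels+1)]): that wraps the whole data frame in a one-element list, so split receives a data frame where it expects an atomic grouping vector. as.list instead yields one grouping vector per column. A sketch of the three-category case follows; the names groups, cdfs and key are hypothetical, and drop = TRUE is an added precaution so that an empty combination would not reach ecdf.

```r
set.seed(42)
x <- data.frame(
  data      = runif(1000),
  utility   = sample(c("abc", "bcd", "cde", "def"), 1000, replace = TRUE),
  stage     = sample(c("vwx", "wxy", "xyz"), 1000, replace = TRUE),
  openclose = sample(c("open", "close"), 1000, replace = TRUE)
)

# Every column except the first is a grouping factor;
# as.list turns the data frame into a list of grouping vectors
groups <- split(x$data, as.list(x[, -1]), drop = TRUE)

# Build each ecdf once and keep it in a named list
cdfs <- lapply(groups, ecdf)

# Names are the category values joined with ".", in column order
key <- paste("abc", "xyz", "close", sep = ".")
p <- cdfs[[key]](0.25)
```

Each stored ecdf is a function, so it can be called repeatedly on new values without recomputing the distribution.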
I have a list of records:
z <- list(list(a=1),list(a=4),list(a=2))
and I try to add fields to each of them.
Alas, neither
lapply(z,function(l) l$b <- 1+l$a)
nor
for(l in z) l$b <- 1+l$a
modifies z.
In this simple case I can, of course, do
z <- lapply(z,function(l) c(list(b= 1+l$a),l))
but this quickly gets out of hand when the lists have more nesting:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
How do I turn it into
list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
without repeating the definition of the whole structure?
Each element of z has many fields, not just a; and z[[10]]$a has many subfields, not just b.
Your first code example doesn't modify the list because you need to return the list in your call to lapply:
z <- list(list(a=1),list(a=4),list(a=2))
expected <- list(list(a=1, b=2), list(a=4, b=5), list(a=2, b=3))
outcome <- lapply(z,function(l) {l$b <- 1+l$a ; l})
all.equal(expected, outcome)
# [1] TRUE
In the doubly nested example, you could use lapply within lapply, again making sure to return the list in the inner lapply:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
expected <- list(list(a=list(b=1, c=2)), list(a=list(b=4, c=5)), list(a=list(b=2, c=3)))
obtained <- lapply(z, function(l1) { lapply(l1, function(l2) {l2$c = l2$b+1 ; l2 } )})
all.equal(expected, obtained)
# [1] TRUE
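Note that the nested lapply visits every field of each element. When the elements have other fields besides a, as the question says, a chained assignment such as l$a$c touches only the intended sub-list. The following is a sketch; the extra field other is hypothetical, added to show that it is left alone.

```r
# Each record has a nested list a plus an unrelated field
z <- list(
  list(a = list(b = 1), other = "x"),
  list(a = list(b = 4), other = "y"),
  list(a = list(b = 2), other = "z")
)

# Functional form: modify a copy and return it
out <- lapply(z, function(l) {
  l$a$c <- l$a$b + 1
  l
})

# Loop form: index into z itself, since the loop variable would be a copy
z2 <- z
for (i in seq_along(z2)) {
  z2[[i]]$a$c <- z2[[i]]$a$b + 1
}

identical(out, z2)
# [1] TRUE
```

The loop in the question failed for the same reason the first lapply did: for (l in z) assigns to a copy l, never to z.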
Another, somewhat convoluted, option:
z <- list(list(a=1),list(a=4),list(a=2))
res <- list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
res1 <- rapply(z,function(x) list(b = x,c = x+1),how = "replace")
all.equal(res,res1)
# [1] TRUE
I only say convoluted because rapply can be tricky to use at times (for me at least).
I have a list containing 3 vectors.
mylist <- list( a = c(1,2),
b = c(3,4),
c = c(5,6) )
Is there any simple way to, for instance, perform computations on the first values of the three objects with the sum() function?
I tried many things like:
sum(mylist[c(a, b, c)][1])
This line of code does not work, but it gives insight into what I am trying to do.
Thanks for your help.
Use sapply:
sum(sapply(mylist, "[", 1))
# [1] 9
Bonus fun fact: You can use c( ) inside of [[ ]]:
sum( sapply(seq(mylist), function(i) mylist[[ c(i, 1) ]]) )
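To illustrate what the vector index does: [[ applied with a vector recurses, so mylist[[c(2, 1)]] means "element 2 of the list, then element 1 of that". The sketch below also shows vapply as a type-checked variant of the sapply call; first_vals and total are hypothetical names.

```r
mylist <- list(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# Recursive indexing: second list element, then its first value
mylist[[c(2, 1)]]
# [1] 3

# Equivalent to chaining two [[ calls
identical(mylist[[c(2, 1)]], mylist[[2]][[1]])
# [1] TRUE

# vapply states the expected result type, so a malformed element fails loudly
first_vals <- vapply(mylist, function(v) v[1], numeric(1))
total <- sum(first_vals)
total
# [1] 9
```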
A not very efficient solution:
sum(unlist(lapply(mylist,'[',1)))
# [1] 9