I have a named list of vectors, y. The names of the list correspond to the values of a variable, x. I need to return the value of the vector in y that matches the value of x at position i. For example, if x == "b" at index 25, I expect to return the 25th value of the "b" vector contained in the list y.
This is my current solution:
x <- sample(letters[1:4], 100, replace = T)
y <- list("a"=rnorm(100), "b"=rnorm(100), "c"=rnorm(100))
i <- match(x, names(y))
m <- sapply(i, function(i) {out <- rep(0,3); out[i] <- 1; out})
final <- apply(t(m) * do.call(cbind, y), 1, sum)
I am hoping for something more idiomatic. As part of the solution, the answer should handle cases where values in x do not appear in the names of y.
The real world use case I am trying to solve is the case where I have several segmented model predictions applied to the entire population that I need to assign to their appropriate segment.
EDIT
Also, trying to avoid the clunky usage of ifelse. Since the names are known, I shouldn't have to specify them manually.
Using matrix subsetting with 2-dimensional indices, you could simply do
do.call(cbind, y)[cbind(1:length(i), i)]
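To make the behaviour for unmatched values concrete, here is a minimal sketch using the data from the question (the seed is an assumption added for reproducibility). Each row of the two-column index matrix is a (row, column) pair, and a row containing NA yields NA in the result, so values of x that are not names of y come back as NA rather than 0.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
x <- sample(letters[1:4], 100, replace = TRUE)
y <- list(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# Column of each x value in the matrix built from y; NA where x is not a name of y
i <- match(x, names(y))

# One (row, column) pair per element of x; pairs with an NA column give NA
final <- do.call(cbind, y)[cbind(seq_along(i), i)]
```

If zeros are preferred for the unmatched positions, as in the original solution, `final[is.na(final)] <- 0` afterwards would do it.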
My Problem in short
Suppose I have a list of two vectors, each with 6 elements.
I would like to compare the values of these two vectors using an if statement. After that, I would like to assign a name to each element based on the result of the if statement.
Example:
x <- c(1,3,5,22,78,56)
y <- c(2,4,3,21,88,77)
z <- list(x, y)
Then, I would like to compare the value of x and y as follows:
Compare the first element of x with the first element of y. If the value of x is larger than the value of y, the first element should be named A; otherwise it should be named B. The same goes for the remaining elements, so the output should be 6 elements as follows:
B B A A B B
Here is my try:
for(i in 1:6){
if(z[[1]][i] > z[[2]][i])
z[[1]][[i]] <- "M"
else "B"
}
but it returns a list.
The 'best' solutions in R often use vectors rather than loops. The previous examples use vectors. Here's another such solution:
ifelse(z[[1]] > z[[2]], "A", "B")
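As a quick check against the data from the question, the sketch below runs the vectorized comparison and reproduces the expected output. The names `labels` and `named_x` are introduced here for illustration only, and the last line assumes that "named" means literal element names on the vector.

```r
x <- c(1, 3, 5, 22, 78, 56)
y <- c(2, 4, 3, 21, 88, 77)
z <- list(x, y)

# Element-wise comparison: one label per position, no loop needed
labels <- ifelse(z[[1]] > z[[2]], "A", "B")
print(labels)
# [1] "B" "B" "A" "A" "B" "B"

# Assumption: if actual element names are wanted, attach the labels as names
named_x <- setNames(z[[1]], labels)
```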
If you take advantage of naming elements in a list, this example might make the code more user-friendly. In this example the names in the list are chosen to be identical to the variable names. They can be any syntactically valid name. And quotation marks aren't needed around the names in the list() function.
z <- list(x = x, y = y)
ifelse(z$x > z$y, "A", "B")
I'm not sure if you have a particular reason why you want to use an 'if' statement and a list, but if you're just looking for the result in a column, this works:
library(dplyr)
df <- tibble(
x = c(1,3,5,22,78,56),
y = c(2,4,3,21,88,77)
) %>%
mutate(result = case_when(
x > y ~ "A",
TRUE ~ "B"
))
case_when is often a good substitute for a chain of if/else statements, and because it is vectorized it usually runs faster than an element-by-element loop too.
If you really want the result in a list, you could do this at the end:
df_list <- list(df$x, df$y, df$result)
but in that case the comment above probably offers a more direct solution.
I am working with the consumer price index (CPI), and in order to calculate it I have to multiply the index matrix by the corresponding weights:
grossCPI77_10 <- grossIND1977 %*% weights1910/100
grossCPI82_10 <- grossIND1982 %*% weights1910/100
Of course, I would rather have code like the following:
grossIND1982 <- replicate(20, cbind(1:61))
grossIND1993 <- replicate(20, cbind(1:61))
weights1910_sc <- c(1:20)
grossIND_list <- mget(ls(pattern = "grossIND...."))
totalCPI <- mapply("*", grossIND_list, weights1910_sc)
The problem is that it gives me a 1200x20 matrix. I expected a normal matrix (61x20) by vector (20x1) multiplication, which should result in a 20x1 vector. Could you explain what I am doing wrong? Thanks.
Part of your problem is that you don't have matrices but 3D arrays with one singleton dimension. The other issue is that mapply tries to combine the results into a matrix, and that constant arguments should be passed via MoreArgs. Actually, though, this is more a case for lapply.
grossIND1982 <- replicate(20, cbind(1:61))[,1,]
grossIND1993 <- replicate(20, cbind(1:61))[,1,]
weights1910_sc <- c(1:20)
grossIND_list <- mget(ls(pattern = "grossIND...."))
totalCPI <- mapply("*", grossIND_list, MoreArgs=list(e2 = weights1910_sc), SIMPLIFY = FALSE)
totalCPI <- lapply(grossIND_list, "*", e2 = weights1910_sc)
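Note that "*" multiplies element by element. If a true matrix product is wanted, as in the original grossIND1977 %*% weights1910/100, a minimal sketch could pass %*% through lapply instead. The toy matrices, the explicitly named list (used here in place of mget), and the division by 100 are assumptions carried over from the question for illustration.

```r
# Toy 61x20 index matrices, as in the question
grossIND1982 <- matrix(rep(1:61, 20), nrow = 61)
grossIND1993 <- matrix(rep(1:61, 20), nrow = 61)
weights1910_sc <- 1:20

# Assumption: an explicit named list instead of mget(ls(pattern = ...))
grossIND_list <- list(grossIND1982 = grossIND1982,
                      grossIND1993 = grossIND1993)

# (61x20) %*% (20x1) gives a 61x1 matrix per year; %*% binds tighter than /
totalCPI <- lapply(grossIND_list, function(m) m %*% weights1910_sc / 100)
```

Each element of totalCPI is then a 61x1 matrix, one value per row of the index matrix.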
I am not sure I understood all aspects of your problem (especially what should be columns, what should be rows, and in which order the cross product should be applied), but I will try to cover at least some aspects. See the comments in the code below for clarification of what you did and what you might want. I hope it helps; let me know if this is what you need.
#instead of using mget, I recommend to use a list structure
#otherwise you might capture other variables with similar names
#that you do not want
INDlist <- sapply(c("1990", "1991"), function(x) {
#this is how to set up a matrix correctly, check `?matrix`
#I think your combination of cbind and replicate did not give you what you wanted
matrix(rep(1:61, 20), nrow = 61)
}, USE.NAMES = TRUE, simplify = F)
weights <- list(c(1:20))
#the first argument of mapply needs to be a function, in this case of two variables
#the body of the function calculates the cross product
#you feed the arguments (both lists) in the following part of mapply
#I have repeated your weights, but you might assign different weights for each year
res <- mapply(function(x, y) {x %*% y}, INDlist, rep(weights, length(INDlist)))
dim(res)
#[1] 61 2
I want to return multiple values from the apply() function and place them in separate columns in R, but I keep getting errors. What I am trying to do is this:
experiments$result1, experiments$result2, experiments$result3 <- apply(experiments, 1,
function(row)
#Some analysis here
#return x, y, and z for column result1, result2, and result3
x, y, z
)
Maybe this is the wrong approach to the problem. experiments is a data frame with several columns of data. I want to append columns that hold the results of the analysis for each row, but I don't know how to do that without loops, which are not idiomatic in R. Thanks in advance for the help.
So here is some more exact code.
experiments$result1, experiments$result2, experiments$result3 <- apply(experiments, 1, function(row)
x <- row["startingTemp"]*2
y <- row["startingTemp"]*3
z <- row["startingTemp"]*4
return (list(x, y, z))
)
The "startingTemp" field is one of the columns in my "experiments" data frame. I'm getting errors saying that an object of type 'closure' is not subsettable and that object 'z' is not found.
If the three values you want to return can be put in a vector (i.e. they are not of some complicated type like the results of a statistical test or a fitted model), just return the vector and apply will bind the results into a 3xN matrix. Keep that matrix in its own variable: a matrix with 3 rows cannot be assigned as a single column of a data frame with N rows.
result <- apply(experiments, 1, function(row){
x <- row["startingTemp"]*2
y <- row["startingTemp"]*3
z <- row["startingTemp"]*4
c(x, y, z)
})
experiments$result1 <- result[1,]
experiments$result2 <- result[2,]
experiments$result3 <- result[3,]
If your three return values are of a complicated type (or not scalars), return them as a list (like Alan suggested) and extract them with lapply/sapply.
experiments$result1 <- lapply(result, "[[", 1)
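A compact variant of the vector approach is sketched below: return a named vector from the function, transpose the matrix that apply produces, and bind it to the data frame in one step. The toy data frame and the use of t() and cbind() are choices made here for illustration, not part of the answer above.

```r
# Assumption: a toy data frame with a single numeric column
experiments <- data.frame(startingTemp = c(10, 20, 30, 40))

# apply() returns one column per row of the input, so transpose to N x 3
result <- t(apply(experiments, 1, function(row) {
  temp <- row[["startingTemp"]]
  c(result1 = temp * 2, result2 = temp * 3, result3 = temp * 4)
}))

# The names of the returned vector become the new column names
experiments <- cbind(experiments, result)
```

For simple arithmetic like this, the row-wise apply is not needed at all: experiments$startingTemp * 2 is already vectorized. The row-wise form is only worth it when the per-row analysis cannot be vectorized.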
Try return(list(x=x,y=y,z=z)). For each experiment you return a list of 3 values.
EDIT
The function returns ONE list for each experiment.
results <- apply(experiment,1,function(x) return(list(x=x,y=x*2,z=x*3))) #results is a list of lists
experiment$res1 <- unlist(results)[attr(unlist(results),"names")=="x"]
experiment$res2 <- unlist(results)[attr(unlist(results),"names")=="y"]
experiment$res3 <- unlist(results)[attr(unlist(results),"names")=="z"]
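Matching on the names of the unlisted result depends on how unlist() builds those names, which can change with the shape of each row. A sketch of an alternative that pulls each component out of the list of lists by name with sapply() follows; the toy data frame and the column computations are assumptions made for illustration.

```r
# Assumption: a toy data frame with a single numeric column
experiments <- data.frame(startingTemp = c(10, 20, 30))

# Returning a list per row makes apply() return a list of lists
results <- apply(experiments, 1, function(row) {
  temp <- row[["startingTemp"]]
  list(x = temp * 2, y = temp * 3, z = temp * 4)
})

# Extract one named component from every inner list
experiments$res1 <- sapply(results, `[[`, "x")
experiments$res2 <- sapply(results, `[[`, "y")
experiments$res3 <- sapply(results, `[[`, "z")
```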
I have a question about applying a function on each elements of a list.
Here's my problem:
I have a list of DF (I divided a bigger DF by days):
mydf <- data.frame(x=c(1:5), y=c(21:25),z=rnorm(1:5))
mylist <- rep(list(mydf),5)
names(mylist) <-c("2006-01-01","2006-01-02","2006-01-03","2006-01-04","2006-01-05")
Don't worry that the fake data frames are identical; it's just for the example. My results are in column "z" of each DF in the list, and the 2 other columns, "x" and "y", represent spatial coordinates.
I have another independent DF containing a list of "x" and "y" too, representing some specific regions (imagine 10 regions):
region <- data.frame(x=c(1:10),y=c(21:30),region=c(1:10))
The final aim is to have, for each of the 10 regions, the value "z" (of my results) from the nearest point (according to the coordinates) in each DF of my list.
That means for one region: 10 results "z" from DF1 of my list, then 10 other results "z" from DF2, ...
My final DF should look like this if possible (for the structure):
final1 <- data.frame("2006-01-01"=rnorm(1:10),"2006-02-01"=rnorm(1:10),
"2006-03-01"=rnorm(1:10),"2006-04-01"=rnorm(1:10),"2006-05-01"=rnorm(1:10))
With one column for one day (so one DF of the list) and one value for each row (so for example for 2006-01-01: the value "z" from the nearest point with the first region).
I already have a small function to look for the nearest value:
min.dist <- function(p, coord){
which.min( colSums((t(coord) - p)^2) )
}
Then, I'm trying to write a loop to get what I want, but I have difficulties with the list. I would need to put 2 variables in the loop, but it doesn't work.
This works approximately if I just take 1 DF of my list:
for (j in 1:nrow(region)){
imin <- min.dist(c(region[j,1],region[j,2]),mylist[[1]][,1:2])
imin[j] <- min.dist(c(region[j,1],region[j,2]),mylist[[1]][,1:2])
final <- mylist[[1]][imin[j], "z"]
final[j] <- mylist[[1]][imin[j], "z"]
final <- as.data.frame(final)
}
But if I select my whole list (in order to have one column of results for each DF of the list in the object "final"), I have errors.
I think the first problem is that the length of "region" is different from the length of my list, and the second may be about adding a second loop variable for the length of my list.
I'm not very familiar with loops, and even less with loops over 2 variables.
Could you help me change the loop so that I get what I'm looking for?
Thank you very much!
You can use lapply() to apply a function over a list.
This should work. It returns a list of vectors.
lapply(
  mylist,
  FUN = function(mydf)
    mydf[apply(
      region[, -3],
      1,
      FUN = function(x)
        which.min(apply(
          mydf[, -3],
          1,
          FUN = function(y)
            dist(rbind(x, y))
        ))
    ), 3]
)
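To get the one-column-per-day layout described in the question, a sketch along the same lines could reuse the min.dist helper from the question and swap lapply() for sapply(), which binds the per-day vectors into a matrix. The seed and the use of sapply() here are assumptions for illustration.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
mydf <- data.frame(x = 1:5, y = 21:25, z = rnorm(5))
mylist <- rep(list(mydf), 5)
names(mylist) <- c("2006-01-01", "2006-01-02", "2006-01-03",
                   "2006-01-04", "2006-01-05")
region <- data.frame(x = 1:10, y = 21:30, region = 1:10)

# Helper from the question: index of the row of coord nearest to point p
min.dist <- function(p, coord) {
  which.min(colSums((t(coord) - p)^2))
}

# For every day, look up z at the point nearest to each region
final <- sapply(mylist, function(df) {
  idx <- apply(region[, c("x", "y")], 1, min.dist, coord = df[, c("x", "y")])
  df$z[idx]
})
```

final is a matrix with one row per region and one column per day, and as.data.frame(final) converts it to a data frame if that is needed.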
I have a list of different data types (factors, data.frames, and vectors, all the same length or number of rows). What I would like to do is subset each element of the list by a vector (let's call it rows) that represents row names.
If it was a data.frame() I would:
x <- x[rows,]
If it was a vector() or factor() I would:
x <- x[rows]
So, I've been playing around with this:
x <- lapply(my_list, function(x) ifelse(is.data.frame(x), x[rows,], x[rows]))
So, how do I accomplish my goal of getting a list of subsetted data?
I think this is YAIEP (Yet Another If Else Problem). From ?ifelse:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
See the trouble? Same shape as test.
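A small sketch of the difference, using a plain vector (the names v, res_ifelse and res_if are introduced here for illustration): is.data.frame() returns a single logical, so ifelse() hands back a single element, while a regular if/else returns the whole value of the branch that is taken.

```r
v <- 1:10
rows <- 1:3

# The test has length 1, so ifelse() returns a result of length 1
res_ifelse <- ifelse(is.data.frame(v), v[rows, ], v[rows])

# if/else evaluates and returns the chosen branch in full
res_if <- if (is.data.frame(v)) v[rows, ] else v[rows]

length(res_ifelse)  # 1
length(res_if)      # 3
```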
So just do this:
l <- list(a = data.frame(x=1:10,y=1:10),b = 1:10, c = factor(letters[1:20]))
rows <- 1:3
fun <- function(x){
  if (is.data.frame(x)){
    x[rows,]
  } else {
    x[rows]
  }
}
lapply(l,fun)