Accessing list elements within mutate - r

I am trying to use the dplyr 'mutate' command to perform matching over a list of vectors, but I am getting the error "Error: recursive indexing failed at level 2".
Here is an example:
templist=list();templist[["A"]]=c(6,9,8,1);templist[["B"]]=c(1,9,6,8);templist[["C"]]=c(8,1,9,6)
tempdat=data.frame(SYSTEM=c("A","A","A","B","B","B","C","C","C"),nums=c(1,8,9,1,8,9,1,8,9))
which provides
templist
$A
[1] 6 9 8 1
$B
[1] 1 9 6 8
$C
[1] 8 1 9 6
and
tempdat
SYSTEM nums
1 A 1
2 A 8
3 A 9
4 B 1
5 B 8
6 B 9
7 C 1
8 C 8
9 C 9
I then want to find the position of each number within the list element corresponding to its system. E.g.
tempdat %>% mutate(numids=match(nums,templist[[SYSTEM]]))
should yield
tempdat
SYSTEM nums numids
1 A 1 4
2 A 8 3
3 A 9 2
4 B 1 1
5 B 8 4
6 B 9 2
7 C 1 2
8 C 8 1
9 C 9 3
but I get the above noted error instead
(Error: recursive indexing failed at level 2)
Can anyone explain why this is failing? Or better yet, figure out a way to get this accomplished correctly?
I have a hunch that it could be done using a for loop to create separate data frames for each list and then use left_join to add the match indices from each system frame onto the original frame, but this seems like it will probably be very inefficient, inelegant, and clunky...

The reason it fails is that [[ on a list does not do a vectorised lookup: given an index of length greater than one, it indexes recursively. Inside mutate, SYSTEM is the whole column, i.e. a vector. A quick fix is to group your data frame by SYSTEM and pass unique(SYSTEM) as the index, so that within each group the index is a single value instead of a vector:
tempdat %>% group_by(SYSTEM) %>% mutate(numids=match(nums,templist[[unique(SYSTEM)]]))
# Source: local data frame [9 x 3]
# Groups: SYSTEM [3]
#
# SYSTEM nums numids
# (fctr) (dbl) (int)
# 1 A 1 4
# 2 A 8 3
# 3 A 9 2
# 4 B 1 1
# 5 B 8 4
# 6 B 9 2
# 7 C 1 2
# 8 C 8 1
# 9 C 9 3
If you check templist[[c("A", "B", "A")]], you will find that it throws exactly the same error as you have seen:
Error in templist[[c("A", "B", "A")]] : recursive indexing failed
at level 2
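To make the failure and one grouping-free workaround concrete, here is a minimal base-R sketch. It reuses templist and tempdat from the question; the nested list and the mapply approach are illustrative assumptions, not part of the answer above.

```r
templist <- list(A = c(6, 9, 8, 1), B = c(1, 9, 6, 8), C = c(8, 1, 9, 6))
tempdat <- data.frame(SYSTEM = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                      nums = c(1, 8, 9, 1, 8, 9, 1, 8, 9))

# [[ with a vector index is recursive: x[[c("A", "B")]] means x[["A"]][["B"]].
# 'nested' is a hypothetical list used only to show a case where that works.
nested <- list(A = list(B = 42))
nested[[c("A", "B")]]

# In templist the second level does not exist, so the call signals an error.
res <- tryCatch(templist[[c("A", "B", "A")]], error = function(e) e)
inherits(res, "error")

# Row-wise alternative: pick each row's own vector, then match within it.
# as.character() guards against SYSTEM being stored as a factor.
numids <- mapply(function(n, s) match(n, templist[[s]]),
                 tempdat$nums, as.character(tempdat$SYSTEM))
numids
```

If the sketch behaves as intended, numids should agree with the numids column produced by the grouped mutate call.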

Related

Placing multiple outputs from each function call using apply into a row in a dataframe in R

I have a function that I repeat, changing the argument each time, using apply/sapply/lapply.
Works great.
I want to return a data set, where each row contains two (or more) variables from each iteration of the function.
Instead I get an unusable list.
do <-function(x){
a <- x+1
b <- x+2
cbind(a,b)
}
over <- 1:6
final <- lapply(over, do)
Any suggestions?
Without changing your function do, you can use sapply and transpose it.
data.frame(t(sapply(over, do)))
# X1 X2
#1 2 3
#2 3 4
#3 4 5
#4 5 6
#5 6 7
#6 7 8
If you want to use do in current form with lapply, we can do
do.call(rbind.data.frame, lapply(over, do))
You could also try
as.data.frame(Reduce(rbind, final))
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
See ?Reduce and ?rbind for information about what they'll do.
You could also modify your final expression as
final <- as.data.frame(Reduce(rbind, lapply(over, do)))
#final
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
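Another possibility, if changing the function is acceptable, is to have it return a one-row data frame and bind the pieces together. The sketch below assumes a hypothetical variant called do_row; it is not the asker's original function.

```r
# do_row is a hypothetical variant of do that returns a one-row data frame.
do_row <- function(x) {
  data.frame(a = x + 1, b = x + 2)
}

over <- 1:6

# Bind the six one-row data frames into a single six-row data frame.
final <- do.call(rbind, lapply(over, do_row))
final
```

Returning named columns from the function keeps the column names (a and b) in the result without any renaming step.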

Need help concatenating column names

I am generating 5 different predictions and adding those predictions to an existing data frame. My code is:
for (j in i) {
…
actual.predicted <- data.frame(test_data, predicted)
}
I am trying to concatenate words together to create new column names, in the loop. Specifically, I have a column named “predicted” and I am generating predictions in each iteration of the loop. So, in the first iteration, I want the new column name to be “predicted.1” and for the second iteration, the new column name should be “predicted.2” and so on.
Any thoughts would be greatly appreciated.
You may not even need to use a loop here, but assuming you do, one pattern which might work well here would be to use a list:
results <- list()
for (j in i) {
# do something involving j
name <- paste0("predicted.", j)
results[[name]] <- data.frame(test_data, predicted)
}
One option is to set the names after assigning new columns
actual.predicted <- data.frame(orig_col = sample(10))
for (j in 1:5){
new_col = sample(10)
actual.predicted <- cbind(actual.predicted, new_col)
names(actual.predicted)[length(actual.predicted)] <- paste0('predicted.',j)
}
actual.predicted
# orig_col predicted.1 predicted.2 predicted.3 predicted.4 predicted.5
# 1 1 4 4 9 1 5
# 2 10 2 3 7 5 9
# 3 8 6 5 4 2 3
# 4 5 9 9 10 7 7
# 5 2 1 10 8 3 10
# 6 9 7 6 6 8 6
# 7 7 8 7 2 4 2
# 8 3 3 1 1 6 8
# 9 6 10 2 3 9 4
# 10 4 5 8 5 10 1
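A slightly shorter variant of the same idea is to build the name with paste0 and assign the column under that name directly. This is only a sketch: sample(10) stands in for the real predictions, and the seed is an arbitrary choice to make it reproducible.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
actual.predicted <- data.frame(orig_col = sample(10))

for (j in 1:5) {
  new_col <- sample(10)  # stand-in for the real predictions
  # Build the column name and assign the new column under it in one step.
  actual.predicted[[paste0("predicted.", j)]] <- new_col
}

names(actual.predicted)
```

Assigning with [[ avoids the separate renaming step after cbind.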

Operation between two dataframe with different size in R

I'd like to add two data frames of different sizes in R.
> x = data.frame(a=c(1,2,3),b=c(5,6,7))
> y = data.frame(x=c(1,1,1))
> x
a b
1 1 5
2 2 6
3 3 7
> y
x
1 1
2 1
3 1
The result I want is,
>
a b
1 2 6
2 3 7
3 4 8
How can I do this?
Maybe easiest to convert y to a vector with unlist and then perform the operation. Here, the vector in unlist(y) will be recycled over the columns of the data.frame x.
x + unlist(y)
a b
1 2 6
2 3 7
3 4 8
As a side note, data.frames are a special type of list object, and performing operations on lists can sometimes be a bit more involved. On the other hand, they tend to work fairly well with vectors as long as the dimensions line up (here, as long as the vector has the same length as the number of rows in the data.frame).
We can make the dimensions same and then get the sum
x + rep(y, ncol(x))
# a b
#1 2 6
#2 3 7
#3 4 8
Or another option is sweep
sweep(x, 1, y$x, `+`)
# a b
#1 2 6
#2 3 7
#3 4 8
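Because every value in y is 1, the outputs above do not show which way the values are aligned. The following sketch uses a hypothetical y2 with distinct values (not from the question) to check that the vector approach and sweep with MARGIN = 1 both add one value per row.

```r
x <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7))
# y2 is a hypothetical variant of y with distinct values, so alignment is visible.
y2 <- data.frame(x = c(10, 20, 30))

by_vector <- x + y2$x                # vector is recycled down each column
by_sweep  <- sweep(x, 1, y2$x, `+`)  # MARGIN = 1: one statistic per row

by_vector
all(as.matrix(by_vector) == as.matrix(by_sweep))
```

Row i of the result should be row i of x plus the i-th value of y2$x in both cases.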

Pasting as object names

I am trying to use paste0 with merge, so that I can merge a bunch of stuff in a loop. However, I'm having trouble with calling specific columns from data.frames
To illustrate, I'll use head
Example:
df <- data.frame(x=1:10,y=1:10)
head(df)
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(get("df"))
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(df$x)
[1] 1 2 3 4 5 6
head(get("df$x"))
Error in get("df$x") : object 'df$x' not found
Is there a way to get a specific column?
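The function get looks up objects by name in an environment, and "df$x" is an expression, not the name of an object. If you do not specify the environment, the search starts in the environment get is called from, which at top level is your global workspace.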
You need to coerce df into an environment using as.environment, and then call get using this environment, e.g.:
get("x", as.environment(get("df")))
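A simpler route that may be enough for this case is to fetch the data frame by name and then extract the column with [[. In the sketch below, df_name and col_name are hypothetical variables standing in for whatever strings paste0 produces in the loop.

```r
df <- data.frame(x = 1:10, y = 1:10)

# df_name and col_name are hypothetical variables holding the constructed names.
df_name <- "df"
col_name <- "x"

# Fetch the data frame by name, then extract the column by name.
col_values <- get(df_name)[[col_name]]
head(col_values)
```

This keeps the object name and the column name as separate strings instead of pasting them into a single "df$x" string.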

How to assign a name to an object when the name is stored in a different vector

Given these two objects:
v <- "new.name"
w <- 1:10
How can I tell R to rename w as new.name, so I can have this
> new.name
[1] 1 2 3 4 5 6 7 8 9 10
Thanks
You could do
assign(v, w)
new.name
# [1] 1 2 3 4 5 6 7 8 9 10
But it is considered a very bad practice in R, so read this first
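One commonly suggested alternative is to keep such values in a named list rather than creating variables whose names are computed at run time. A minimal sketch, where store is a hypothetical list name:

```r
v <- "new.name"
w <- 1:10

# 'store' is a hypothetical named list holding values under dynamic names.
store <- list()
store[[v]] <- w

store[["new.name"]]
store$new.name
```

The value stays reachable through the string in v, and nothing new is created in the workspace besides the list itself.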
