Creating global environment objects from two lists - r

I am having an issue assigning variable objects to a list of data frames. For example,
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")
y <- list("df1", "df2", "df3")
My goal is to assign each data frame in the y list to the corresponding object name in x. I can do it longhand:
a <- y[[1]]
But I have many iterations. I have tried the following without any luck
map2(x, y, function(x, y) x <- y)
and
map2(x, y, ~assign(x, y))
Appreciate any help!

We can unlist the list of object names (unlist(y)), fetch the corresponding values as a list with mget, name the list elements with the values of x, and use list2env to create the objects in the global environment (creating objects this way is not recommended, though):
list2env(setNames(mget(unlist(y)), x), .GlobalEnv)
If we use map2 (from purrr), we need to get the value of the object named by each element of y and also specify the environment in which to assign it:
library(purrr)
map2(x, y, ~ assign(.x, get(.y), envir = .GlobalEnv))
Output:
a
# a b
#1 1 1
#2 1 2
#3 1 3
#4 1 4
#5 1 5
#6 1 6
#7 1 7
#8 1 8
#9 1 9
#10 1 10
b
# a b
#1 2 1
#2 2 2
#3 2 3
#4 2 4
#5 2 5
#6 2 6
#7 2 7
#8 2 8
#9 2 9
#10 2 10
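If the objects do not strictly have to live in the global environment, keeping them in a named list is often easier to work with. A minimal sketch reusing the question's df1 to df3, x and y; the name dfs is hypothetical:

```r
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")
y <- list("df1", "df2", "df3")

# look up each data frame by name, then label the results with the names in x
dfs <- setNames(mget(unlist(y)), x)

names(dfs)    # "a" "b" "c"
dfs$b$a[1]    # 2, the first value of column a in df2
```

Each data frame is then reachable as dfs$a, dfs$b and dfs$c, or by name with dfs[["a"]], without creating any new global objects.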

Unfortunately, the above options didn't work in my case. I ended up adapting the suggestion made by akrun, which worked:
names(y) <- x
list2env(y, envir=.GlobalEnv)
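Note that these two lines only create data frames when y holds the data frames themselves; with the character names from the question, a would become the string "df1". A small sketch under that assumption, writing into a separate environment (the hypothetical target) so nothing in the workspace is overwritten:

```r
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")

# assumption: y contains the objects, not their names
y <- list(df1, df2, df3)
names(y) <- x

# hypothetical target environment; replace with .GlobalEnv to mirror the answer
target <- new.env()
list2env(y, envir = target)

ls(target)    # "a" "b" "c"
```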

Related

Use apply on two data.frame's

If I had a data.frame X and wanted to apply a function foo to each of its rows, I would just run apply(X, 1, foo). This is all well-known and simple.
Now imagine I have another data.frame Y and the following function:
mean_of_sum <- function(x,y) {
return(mean(x+y))
}
Is there a way to write an "apply equivalent" to the following loop:
my_loop_fun <- function(X, Y) {
results <- numeric(nrow(X))
for (i in seq_along(results)) {
results[i] <- mean_of_sum(X[i,], Y[i,])
}
return(results)
}
If such an "apply syntax" exists, would it be more efficient than my "good" old loop?
This should work:
sapply(seq_len(nrow(X)), function(i) mean_of_sum(X[i,], Y[i,]))
You apply the function to the sequence 1, 2, ..., n (where n is the number of rows), and in each "iteration" you evaluate mean_of_sum for the i-th rows of X and Y.
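A small self-contained check of the index-based approach, using made-up data. Two assumptions differ from the text above: mean_of_sum unlists each one-row data frame first, because mean() does not work directly on a data frame, and vapply is swapped in for sapply so that the return type is fixed.

```r
X <- data.frame(a = 1:5, b = 6:10)
Y <- data.frame(c = 11:15, d = 16:20)

# unlist() turns a one-row data frame into a plain numeric vector
mean_of_sum <- function(x, y) mean(unlist(x) + unlist(y))

# iterate over the row indices and evaluate the function on the i-th rows
res <- vapply(seq_len(nrow(X)),
              function(i) mean_of_sum(X[i, ], Y[i, ]),
              numeric(1))
res
# 17 19 21 23 25
```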
We can split X and Y into lists of one-row data frames and use mapply to apply the function to each pair of rows. The function mean_of_sum is changed slightly to convert each one-row data frame to a numeric vector:
mean_of_sum <- function(x,y) {
return(mean(as.numeric(x) + as.numeric(y)))
}
Consider an example,
X <- data.frame(a = 1:5, b = 6:10)
Y <- data.frame(c = 11:15, d = 16:20)
mapply(mean_of_sum, split(X, seq_len(nrow(X))), split(Y, seq_len(nrow(Y))))
# 1 2 3 4 5
#17 19 21 23 25
where X and Y are
X
# a b
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Y
# c d
#1 11 16
#2 12 17
#3 13 18
#4 14 19
#5 15 20
So the first value, 17, is computed as
mean(c(1 + 11, 6 + 16))
#[1] 17
and so on for the remaining values.
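On the efficiency question: for this particular function the row-by-row iteration can be avoided altogether, since the mean of the element-wise sum of two rows is a plain matrix operation, and vectorised code is usually faster than a loop or an apply-style call over rows. A sketch assuming all columns are numeric and both data frames have the same shape:

```r
X <- data.frame(a = 1:5, b = 6:10)
Y <- data.frame(c = 11:15, d = 16:20)

# add the two tables element-wise, then average across each row
res <- unname(rowMeans(as.matrix(X) + as.matrix(Y)))
res
# 17 19 21 23 25
```

This only applies when the function can be expressed with vectorised operations; for an arbitrary function of two rows, the sapply or mapply forms remain the general tools.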

How can the filter function be used between 2 datasets in RStudio?

If I generate a sample of 30 from a data frame that has 50 observations, how can I separate the remaining 20 from the data frame of 50 using filter function? Is it possible to use the filter function between two data frames? If so, then how?
Thanks in advance.
Here is an example:
# dummy data
dat <- data.frame(x = 1:10,
y = letters[1:10], stringsAsFactors = FALSE)
Create a sample index, setting a seed for reproducibility.
set.seed(1)
idx <- sort(sample(1:nrow(dat), size = 6, replace = FALSE))
idx
#[1] 2 3 4 5 7 8
Subset your data frame
dat[idx, ]
# x y
#2 2 b
#3 3 c
#4 4 d
#5 5 e
#7 7 g
#8 8 h
Get the rows that are not in idx
dat[-idx, ]
# x y
#1 1 a
#6 6 f
#9 9 i
#10 10 j
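If the sample already exists as its own data frame and the index is no longer available, the remaining rows can be recovered by matching on a column that uniquely identifies each row. A sketch assuming x is such a key; smp and rest are hypothetical names:

```r
dat <- data.frame(x = 1:10,
                  y = letters[1:10], stringsAsFactors = FALSE)

set.seed(1)
smp <- dat[sort(sample(nrow(dat), size = 6)), ]   # the sampled rows

# keep the rows of dat whose key does not appear in the sample
rest <- dat[!dat$x %in% smp$x, ]
nrow(rest)    # 4

# with dplyr loaded, the same condition could be written as
# filter(dat, !x %in% smp$x)
```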

Assign the results of do.call using cbind to data frames

I want to combine multiple sets of two data frames (a & a_1, b & b_1, etc.). Basically, I want to do what this question is asking. I created a list of my two data sets:
# create data
a <- c(1, 2, 3)
b <- c(2, 3, 4)
at0H0 <- data.frame(a, b)
c <- c(1, 2, 3)
d <- c(2, 3, 4)
at0H0_1 <- data.frame(c, d)
e <- c(1, 2, 3)
f <- c(2, 3, 4)
at0H1 <- data.frame(a, b)
g <- c(1, 2, 3)
h <- c(2, 3, 4)
at0H1_1 <- data.frame(c, d)
# create lists of names
names <- list("at0H0", "at0H1")
namesLPC <- list("at0H0_1", "at0H1_1")
# column bind the data frames?
dfList <- list(cbind(names, namesLPC))
do.call(cbind, dfList)
But now I need it to create data frames for each. This do.call function just creates a list of the names of the data frames. Thanks!
(Edited to make reproducible code)
It's not super straight-forward, but with a little editing to a joining function you can get there:
joinfun <- function(x) do.call(cbind, unname(mget(x,inherits=TRUE)))
lapply(Map(c, names, namesLPC), joinfun)
#[[1]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
#
#[[2]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
The Map function pairs up the dataset names as required:
Map(c, names, namesLPC)
#[[1]]
#[1] "at0H0" "at0H0_1"
#
#[[2]]
#[1] "at0H1" "at0H1_1"
The lapply then loops over each part of the above list to mget (multiple-get) each object into a combined list. Like so, for the first part:
unname(mget(c("at0H0","at0H0_1"),inherits=TRUE))
#[[1]]
# a b
#1 1 2
#2 2 3
#3 3 4
#
#[[2]]
# c d
#1 1 2
#2 2 3
#3 3 4
Finally, do.call(cbind, ...) puts this combined list back into a single data.frame:
do.call(cbind, unname(mget(c("at0H0","at0H0_1"),inherits=TRUE)))
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
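When the data frames can be collected into lists up front, the lookup by name with mget is not needed, because Map can pair the elements and bind them directly. A sketch using the question's data; left, right and combined are hypothetical names:

```r
at0H0   <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
at0H0_1 <- data.frame(c = c(1, 2, 3), d = c(2, 3, 4))
at0H1   <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
at0H1_1 <- data.frame(c = c(1, 2, 3), d = c(2, 3, 4))

# two parallel lists; the names of the first one are carried to the result
left  <- list(at0H0 = at0H0, at0H1 = at0H1)
right <- list(at0H0_1, at0H1_1)

# column-bind the i-th element of left with the i-th element of right
combined <- Map(cbind, left, right)

names(combined)        # "at0H0" "at0H1"
dim(combined$at0H0)    # 3 4
```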
I've figured out a way to do it. A few notes: I have 360 data sets that I need to combine, which is why it is i in 1:360. This also names the data sets from an array of the names of the data sets (which is dataNames)
for (i in 1:360){
assign(paste(dataNames[i], sep = ""), cbind(names[[i]], namesLPC[[i]]))
}

remove cases following certain other cases

I have a dataframe, say
df = data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6))
I want to remove only those rows in which one or more ts lie directly between a d and a c; in all other cases I want to retain the rows. So for this example, I would like to remove the ts in rows 8, 18 and 19, but keep the others. I have thousands of cases, so doing this manually would be a true horror. Any help is very much appreciated.
One option would be to use rle to get runs of the same string and then you can use an sapply to check forward/backward and return all the positions you want to drop:
rle_vals <- rle(as.character(df$x))
n_runs <- length(rle_vals$values)
drop <- unlist(lapply(seq_len(n_runs), # loop over the runs
function(i, vals, lengths) {
# check that the run is "t", the previous run is "d" and the next run is "c"
if (i > 1 && i < n_runs && vals[i] == "t" && vals[i-1] == "d" && vals[i+1] == "c") {
(sum(lengths[1:(i-1)]) + 1):sum(lengths[1:i]) # row numbers covered by this run
}
}, vals = rle_vals$values, lengths = rle_vals$lengths))
drop
#[1] 8 18 19
df[-drop,]
# x y
#1 a 2
#2 a 4
#3 b 5
#4 b 2
#5 b 6
#6 c 2
#7 d 4
#9 c 2
#10 b 6
#11 t 2
#12 c 4
#13 t 5
#14 a 2
#15 a 6
#16 b 2
#17 d 4
#20 c 6
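The same run-length idea can also be written without an explicit loop by comparing each run with its neighbours. The sketch below uses hypothetical names (prev_val, next_val, hit, drop_row) and selects rows with a logical vector, which sidesteps one pitfall of negative indexing: if nothing matches, df[-integer(0), ] returns zero rows rather than the whole data frame.

```r
df <- data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
                 y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6),
                 stringsAsFactors = FALSE)

r <- rle(df$x)
n <- length(r$values)

# value of the run before and after each run (NA at the two ends)
prev_val <- c(NA, r$values[-n])
next_val <- c(r$values[-1], NA)

# a run is dropped when it is "t", preceded by "d" and followed by "c"
hit <- r$values == "t" & prev_val == "d" & next_val == "c"
hit[is.na(hit)] <- FALSE

# expand the per-run flag back to one flag per row
drop_row <- rep(hit, times = r$lengths)
which(drop_row)
# 8 18 19

res <- df[!drop_row, ]
```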
This also works, by collapsing to a string, identifying groups of t's between d and c (or c and d - not sure whether you wanted this option as well), then working out where they are and removing the rows as appropriate.
df = data.frame(x=c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
y=c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6),stringsAsFactors = FALSE)
dfs <- paste0(df$x,collapse="") #collapse to a string
dfs2 <- do.call(rbind,lapply(list(gregexpr("dt+c",dfs),gregexpr("ct+d",dfs)),
function(L) data.frame(x=L[[1]],y=attr(L[[1]],"match.length"))))
dfs2 <- dfs2[dfs2$x>0,] #remove any -1 values (if string not found)
drop <- unlist(mapply(function(a,b) (a+1):(a+b-2),dfs2$x,dfs2$y))
df2 <- df[-drop,]
Here is another solution with base R:
df = data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6))
#
s <- paste0(df$x, collapse="")
L <- c(NA, NA)
while (TRUE) {
r <- regexec("dt+c", s)[[1]]
if (r[1]==-1) break
L <- rbind(L, c(pos=r[1]+1, length=attr(r, "match.length")-2))
s <- sub("d(t+)c", "x\\1x", s)
}
L <- L[-1,]
drop <- unlist(apply(L,1, function(x) seq(from=x[1], len=x[2])))
df[-drop, ]
# > drop
# 8 18 19
# > df[-drop, ]
# x y
# 1 a 2
# 2 a 4
# 3 b 5
# 4 b 2
# 5 b 6
# 6 c 2
# 7 d 4
# 9 c 2
# 10 b 6
# 11 t 2
# 12 c 4
# 13 t 5
# 14 a 2
# 15 a 6
# 16 b 2
# 17 d 4
# 20 c 6
With gregexpr() it is shorter:
s <- paste0(df$x, collapse="")
g <- gregexpr("dt+c", s)[[1]]
L <- data.frame(pos=g+1, length=attr(g, "match.length")-2)
drop <- unlist(apply(L,1, function(x) seq(from=x[1], len=x[2])))
df[-drop, ]
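A variation on the regular-expression approach, offered as a sketch rather than as part of the answers above: with perl = TRUE, lookaround assertions let the match cover only the ts, so the +1 and -2 offsets are not needed. It assumes every value in x is a single character, so that string positions equal row numbers; m, drop_rows and keep are hypothetical names.

```r
df <- data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
                 y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6),
                 stringsAsFactors = FALSE)

s <- paste0(df$x, collapse = "")

# match runs of "t" that are preceded by "d" and followed by "c"
m <- gregexpr("(?<=d)t+(?=c)", s, perl = TRUE)[[1]]

drop_rows <- if (m[1] == -1) {
  integer(0)                      # no match at all
} else {
  unlist(Map(function(p, len) seq(p, length.out = len),
             as.integer(m), attr(m, "match.length")))
}
drop_rows
# 8 18 19

# setdiff keeps every row when drop_rows is empty
keep <- setdiff(seq_len(nrow(df)), drop_rows)
res <- df[keep, ]
```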

List of lists in R into a data.frame - inconsistent variable names

I have a list of lists and I want to convert it into a dataframe. The challenge is that there are missing variables names in lists (not NA's but the variable is missing completely).
To illustrate on example: from
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
I would like to get
a b c
[1,] 1 2 3
[2,] 4 NA 6
Another option is
library(reshape2)
as.data.frame(acast(melt(my_list), L1~L2, value.var='value'))
# a b c
#1 1 2 3
#2 4 NA 6
Or, as @David Arenburg suggested, a wrapper for melt/dcast would be recast
recast(my_list, L1 ~ L2, value.var = 'value')[, -1]
# a b c
#1 1 2 3
#2 4 NA 6
You can use the bind_rows function from the dplyr package:
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
dplyr::bind_rows(lapply(my_list, as.data.frame))
This outputs:
Source: local data frame [2 x 3]
a b c
1 1 2 3
2 4 NA 6
Another option, which requires converting the list elements to data frames first:
library(plyr)
lista <- list(a=1, b=2, c =3)
listb <- list(a=4, c=6)
lista <- as.data.frame(lista)
listb <- as.data.frame(listb)
my_list <- list(lista, listb)
my_list <- do.call(rbind.fill, my_list)
my_list
a b c
1 1 2 3
2 4 NA 6
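For completeness, the same result can be reached in base R by aligning every element to the union of all names before binding. A minimal sketch, assuming each list element holds length-one values; cols, rows and res are hypothetical names:

```r
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)

# all variable names that occur anywhere in the list
cols <- unique(unlist(lapply(my_list, names)))

rows <- lapply(my_list, function(el) {
  vals <- el[cols]                 # names missing from el come back as NULL
  names(vals) <- cols
  vals[vapply(vals, is.null, logical(1))] <- NA
  as.data.frame(vals)
})

res <- do.call(rbind, rows)
res
#   a  b c
# 1 1  2 3
# 2 4 NA 6
```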

Resources