cbind with do.call stores dataframe name in column variable - r

Why do test1 and test2 have different column names here? If I reduce each data frame to a single column, the names are the same. I would like both results to have the same column names while still using do.call.
a <- data.frame(col1 = c(1,2,3), col2 = c(1,2,3))
b <- data.frame(col3 = c(1,2,3), col4 = c(1,2,3))
test1 = cbind(a, b)
dataframe_name = c("a", "b")
test2 <- do.call(cbind, mget(dataframe_name, envir = .GlobalEnv))
colnames(test1)
colnames(test2)
# only one column
a=a[1]
b=b[1]
test1 = cbind(a, b)
test2 <- do.call(cbind, mget(dataframe_name, envir = .GlobalEnv))
colnames(test1)
colnames(test2)
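A likely explanation, offered as a sketch rather than a confirmed answer: mget() returns a named list, so do.call() effectively calls cbind(a = a, b = b). When a multi-column data frame is supplied as a named argument, the column names are prefixed with the argument name (a.col1, a.col2, ...), while a single-column data frame keeps its own column name. Dropping the list names with unname() before the call should make both results agree. Note that environment() is used below in place of .GlobalEnv only so that the snippet runs wherever it is pasted; that substitution is an assumption, not part of the original code.

```r
a <- data.frame(col1 = c(1, 2, 3), col2 = c(1, 2, 3))
b <- data.frame(col3 = c(1, 2, 3), col4 = c(1, 2, 3))
dataframe_name <- c("a", "b")

test1 <- cbind(a, b)

# mget() returns list(a = a, b = b); the list names become argument names
named_args <- mget(dataframe_name, envir = environment())
test2_named <- do.call(cbind, named_args)

# Stripping the names makes the call equivalent to cbind(a, b)
test2 <- do.call(cbind, unname(named_args))

colnames(test2_named)  # prefixed with the data frame names
colnames(test1)
colnames(test2)        # same as test1
```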

Related

Partial match string between columns for multiple dataframes

I have a list of dataframes (df1, df2, df3) whose columns I would like to match against another dataframe (df), substituting strings only where there is a match. The match should be partial and based on a string specified when calling the function. Here it should apply only to fields containing the string "TEXT", so it should work for cases like TEXT123 and TEXTabc. I did not get very far myself...
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
list<-c(df1, df2, df3)
example for df1
partial_match <- function(column_A$df1, column_B, TEXT, df) {
df1_new <-df1
df1_new[, column_B] <- ifelse(grepl("TEXT.*", df1[, column_A]),
df[, column_B] - nchar(TEXT),
df[, column_B])
df1_new
}
Outcome for df1:
name     column_A  column_B
TEXT333  1         11
b        2         b
c        3         c
Here's one approach using a for loop. You were close! Note that I renamed your list of dataframes to dfs so that it no longer masks list().
Do you think you might encounter a situation where you might match multiple times in the same dataframe? If so, what I show below won't work without a couple more lines.
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
# loop over all dataframes in your list
for (i in seq_along(dfs)) {
  # get the name that contains "TEXT" (a plain substring is enough as a regex here)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # use the name to update the value from the reference df
  dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
}
Updated answer that can account for multiple matches in the same df:
for (i in seq_along(dfs)) {
  # all names in this dataframe that contain "TEXT"
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  for (val in vals) {
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
  }
}
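The same update can also be sketched without an inner loop over matches, by combining grepl() with match(). The helper name update_matches and its pattern argument are hypothetical, introduced only for illustration. Names that contain the pattern but are absent from the reference dataframe are left untouched, which is an assumption about the desired behaviour.

```r
df1 <- data.frame(name = c("TEXT333", "b", "c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b", "TEXT345", "d"), column_A = 4:6, stringsAsFactors = FALSE)
df3 <- data.frame(name = c("c", "TEXT123", "a"), column_A = 7:9, stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333", "TEXT123", "a", "TEXT345", "k", "l", "b", "c", "f"),
                 column_B = 11:19, stringsAsFactors = FALSE)

# Hypothetical helper: update column_A for every name containing `pattern`
update_matches <- function(d, ref, pattern = "TEXT") {
  hit <- grepl(pattern, d$name, fixed = TRUE)  # rows whose name contains the pattern
  idx <- match(d$name, ref$name)               # position of each name in the reference
  ok <- hit & !is.na(idx)                      # skip names missing from the reference
  d$column_A[ok] <- ref$column_B[idx[ok]]
  d
}

dfs <- lapply(dfs, update_matches, ref = df)
dfs[[1]]
```

Because the function is applied with lapply(), it handles zero, one, or several matches per dataframe in the same way.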

Using apply functions instead of for loops in R

I have been trying to replace a for loop in my code with an apply function. I have tried sapply, lapply, apply and mapply, but none of them worked. The original loop looks like this:
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))
for (i in 1:nrow(ds1)) {
  if (is.na(ds1$col1[i])) {
    ds1$col1[i] <- ds2[ds2[, "colA"] == ds1$col2[i], "colB"]
  }
}
My latest attempt with the apply family looks like this
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))
sFunc <- function(x, y, z) {
  if (is.na(x)) {
    return(z[z[, "colA"] == y, "colB"])
  } else {
    return(x)
  }
}
ds1$col1 <- sapply(ds1$col1, sFunc, ds1$col2, ds2)
This returns ds2$colB for each row. Can someone explain what I got wrong?
sapply only iterates over the first vector you pass. Any further arguments are handed to the function as whole vectors on every iteration. To iterate over several vectors in parallel you need the multivariate version of apply, which is mapply.
sFunc <- function(x, y) {
  if (is.na(x)) {
    return(ds2[ds2[, "colA"] == y, "colB"])
  } else {
    return(x)
  }
}
mapply(sFunc, ds1$col1, ds1$col2)
#> [1] 90 2
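To make the difference concrete, the small sketch below records how many elements of the second argument the function sees on each call. The anonymous function and the variable names seen_by_sapply and seen_by_mapply are illustrative only.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))

# sapply: the extra argument is passed whole, so y has length 2 on every call
seen_by_sapply <- sapply(ds1$col1, function(x, y) length(y), ds1$col2)

# mapply: both vectors are walked in parallel, so y has length 1 on every call
seen_by_mapply <- mapply(function(x, y) length(y), ds1$col1, ds1$col2)

seen_by_sapply
seen_by_mapply
```

This is why the sapply version compares ds2$colA against the whole of ds1$col2 and returns more than one value for the NA row.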
A join would be useful here. You can do it in base R:
transform(merge(ds1, ds2, by.x = "col2", by.y = "colA"),
col1 = ifelse(is.na(col1), colB, col1))[names(ds1)]
# col1 col2
#1 90 A
#2 2 B
Or with dplyr
library(dplyr)
inner_join(ds1, ds2, by = c("col2" = "colA")) %>%
mutate(col1 = coalesce(col1, colB)) %>%
select(names(ds1))

Return a changed list in R via lapply(), but objects in list not changed

I'm trying to loop through a list of data frames, dropping columns that don't match some condition. I want to change the data frames such that they're missing 1 column essentially. After executing the function, I'm able to change the LIST of data frames, but not the original data frames themselves.
df1 <- data.frame(
  a = c("John", "Peter", "Dylan"),
  b = c(1, 2, 3),
  c = c("yipee", "ki", "yay"))
df2 <- data.frame(
  a = c("Ray", "Bob", "Derek"),
  b = c(4, 5, 6),
  c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
  a = c("Bill", "Sam", "Nate"),
  b = c(7, 8, 9),
  c = c("I", "eat", "cake"))
l <- list(df1, df2, df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
When I call the list l, I get a list of data frames with the changes I want. When I call an element in the list, df1 or df2 or df3, they do not have a dropped column.
I've looked at this solution and many others, I'm obviously missing something.
The list l and the dataframes df1, df2, etc. are independent objects; changing one does not affect the other. One way to get the changed dataframes is to assign names to the list elements and write them back to the global environment with list2env():
l <- lapply(l, drop_col)
names(l) <- paste0("df", 1:3)
list2env(l, .GlobalEnv)
The problem is that when you create l, you fill it with copies of your data frames df1, df2, df3.
In R, it is not generally possible to pass references to variables. One workaround is to write the list elements back into an environment with list2env(), as @Ronak Shah does.
Another is to use get() and <<- to change the variables from within the function.
drop_cols <- function(x) {
  for (iter in x) {
    do.call("<<-", list(iter, drop_col(get(iter))))
  }
}
drop_cols(c("df1", "df2", "df3"))
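The copy semantics described above can be checked with a short sketch. The two-column data frames and the flag names unchanged and changed are made up for illustration, and environment() is used instead of .GlobalEnv so the snippet works wherever it is run.

```r
df1 <- data.frame(a = c("John", "Peter"), b = c(1, 2))
df2 <- data.frame(a = c("Ray", "Bob"), b = c(4, 5))

# The list holds copies; the element names are needed for writing back later
l <- list(df1 = df1, df2 = df2)

# drop = FALSE keeps the result a data frame even when one column remains
l <- lapply(l, function(x) x[, names(x) != "b", drop = FALSE])

# At this point the original df1 still has its column b
unchanged <- "b" %in% names(df1)

# Explicitly overwrite the originals with the modified copies
invisible(list2env(l, envir = environment()))
changed <- !("b" %in% names(df1))

unchanged
changed
```

Keeping the data frames in a named list and only writing back when needed is usually easier to follow than modifying variables from inside a function.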
df1 <- data.frame(
  a = c("John", "Peter", "Dylan"),
  b = c(1, 2, 3),
  c = c("yipee", "ki", "yay"))
df2 <- data.frame(
  a = c("Ray", "Bob", "Derek"),
  b = c(4, 5, 6),
  c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
  a = c("Bill", "Sam", "Nate"),
  b = c(7, 8, 9),
  c = c("I", "eat", "cake"))
# Name the list elements:
l <- list(df1 = df1, df2 = df2, df3 = df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
# View altered dfs ([[ ]] extracts the data frame itself rather than a one-element list):
View(l[["df1"]])

Append columns to list of dataframes using lapply and mapply

I have a list of dataframes that I want to manipulate individually. It looks like this:
df_list <- list(A1 = data.frame(v1 = 1:10,
                                v2 = 11:20),
                A2 = data.frame(v1 = 21:30,
                                v2 = 31:40))
df_list
Using lapply allows me to run a function over the list of dataframes like this:
library(tidyverse)
some_func <- function(lizt, comp = 2) {
  lizt <- lapply(lizt, function(x) {
    x <- x %>%
      mutate(IMPORTANT_v3 = v2 + comp)
    return(x)
  })
}
df_list_1 <- some_func(df_list)
df_list_1
So far so good, but I need to run the function multiple times with different arguments. Using mapply for that returns:
df_list_2 <- mapply(some_func,
                    comp = c(2, 3, 4),
                    MoreArgs = list(
                      lizt = df_list
                    ),
                    SIMPLIFY = FALSE
)
df_list_2
This creates a new list of dataframes for each argument fed to the function in mapply giving me 3 lists of 2 dataframes. This is good but the output I'm looking for is to append a new column to each original dataframe for each argument in the mapply that would look like this:
desired_df_list <- list(A1 = data.frame(v1 = 1:10,
                                        v2 = 11:20,
                                        IMPORTANT_v3 = 13:22,
                                        IMPORTANT_v4 = 14:23,
                                        IMPORTANT_v5 = 15:24),
                        A2 = data.frame(v1 = 21:30,
                                        v2 = 31:40,
                                        IMPORTANT_v3 = 33:42,
                                        IMPORTANT_v4 = 34:43,
                                        IMPORTANT_v5 = 35:44))
desired_df_list
How can I wrangle the output of lists of lists of dataframes to isolate and append only the desired new columns (IMPORTANT_v3) to the original dataframe? Also open to other options such as mutating multiple columns inside the lapply using mapply but I haven't figured out how to code that as yet.
Thanks!
Solved like this:
library(pracma)  # provides movavg()
main_func <- function(lizt, comp = c(2:4)) {
  lizt <- lapply(lizt, function(x) {
    # one weighted moving average of v2 per window size in comp
    df <- mapply(movavg,
                 n = comp,
                 type = "w",
                 MoreArgs = list(x$v2),
                 SIMPLIFY = TRUE
    )
    colnames(df) <- paste0("IMPORTANT_v", 1:ncol(df))
    print(df)  # debug output
    print(x)   # debug output
    x <- cbind(x, df)
    return(x)
  })
}
desired_df_list_complete <- main_func(df_list)
desired_df_list_complete
This example uses movavg() from the pracma package.
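For the simpler v2 + comp calculation from the question, a base R sketch along the same lines could look like the following. The function name add_cols is hypothetical, and naming the columns with comp + 1 is an assumption made only to reproduce the IMPORTANT_v3 to IMPORTANT_v5 names shown in the desired output.

```r
df_list <- list(A1 = data.frame(v1 = 1:10, v2 = 11:20),
                A2 = data.frame(v1 = 21:30, v2 = 31:40))

# Hypothetical helper: append one new column per value in comp
add_cols <- function(lizt, comp = 2:4) {
  lapply(lizt, function(x) {
    # build all new columns for this data frame in one go
    new_cols <- lapply(comp, function(k) x$v2 + k)
    names(new_cols) <- paste0("IMPORTANT_v", comp + 1)
    cbind(x, as.data.frame(new_cols))
  })
}

result <- add_cols(df_list)
result$A1
```

Building the columns inside the per-dataframe function avoids the nested list of lists that mapply over the whole list produces.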

How to flatten a list with tibbles and tibbles within lists to have all tibbles on the same level?

I have a list where the list elements are tibbles or lists that contain multiple tibbles. I would like to get a list where all the tibbles are on the same level.
How would I do that?
library(tibble)
tib_1 <- tibble(a = 1:4, b = LETTERS[1:4])
tib_2 <- tibble(c = 1:4, d = LETTERS[1:4])
tib_3 <- tibble(e = 1:4, f = LETTERS[1:4])
tib_4 <- tibble(g = 1:4, h = LETTERS[1:4])
my_list <- list(tib_1, tib_2, list(tib_3, tib_4))
desired_list <- list(tib_1, tib_2, tib_3, tib_4)
We can just use flatten
library(rlang)
out <- flatten(my_list)
Checking:
identical(desired_list, out)
#[1] TRUE
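For readers who prefer to avoid the extra dependency, one level of nesting can also be removed in base R. This is only a sketch: the helper name flatten_once is hypothetical, and plain data frames stand in for the tibbles (is.data.frame() is also TRUE for tibbles, so the same check should apply to them).

```r
d1 <- data.frame(a = 1:4, b = LETTERS[1:4])
d2 <- data.frame(c = 1:4, d = LETTERS[1:4])
d3 <- data.frame(e = 1:4, f = LETTERS[1:4])
d4 <- data.frame(g = 1:4, h = LETTERS[1:4])
my_list <- list(d1, d2, list(d3, d4))

# Hypothetical helper: splice nested lists into the top level, one level deep
flatten_once <- function(lst) {
  out <- list()
  for (el in lst) {
    if (is.list(el) && !is.data.frame(el)) {
      out <- c(out, el)        # a plain list: append its elements
    } else {
      out <- c(out, list(el))  # a data frame: wrap it so c() does not split its columns
    }
  }
  out
}

out <- flatten_once(my_list)
length(out)
```

The is.data.frame() check matters because data frames are themselves lists, so appending one with c() directly would spread its columns across the result.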
