Apply function to list of dataframes and columns matching pattern - r

I have a list of dataframes and I would like to apply a function to specific columns that follow a pattern across all the dataframes in the list.
Here is an example list of dataframes:
k_2 <- data.frame(Site = c(rep("A",3), rep("B",2)), V1 = c(1,2,3,4,5), V2 = c(1,2,3,4,5))
k_3 <- data.frame(Site = c(rep("A",3), rep("B",2)), V1 = c(1,2,3,4,5), V2 = c(1,2,3,4,5), V3 = c(1,2,3,4,5))
k_4 <- data.frame(Site = c(rep("A",3), rep("B",2)), V1 = c(1,2,3,4,5), V2 = c(1,2,3,4,5), V3 = c(1,2,3,4,5), V4 = c(1,2,3,4,5))
my.list <- list(k_2, k_3, k_4)
my.list
I want to apply this
k2_res <- ddply(k_2, "Site", function(x) colSums(x[c("V1", "V2")])/nrow(x))
to all the dataframes in the list. However, for k_3 the calculation will need to be colSums(x[c("V1","V2","V3")]), k_4 will go up to V4, and so on.
Ideas
I thought that maybe I could use some sort of grep or regex to automatically select all the columns beginning with V?

Are you looking for something like below?
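library(plyr)  # provides ddply
lapply(
  my.list,
  # "^V\\d+$" keeps only columns named V followed by digits (V1, V2, ...)
  function(df) ddply(df, "Site", function(x) colSums(x[grepl("^V\\d+$", names(x))]) / nrow(x))
)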
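If you would rather not depend on plyr, the same idea can be sketched in base R. This is only a sketch, not the code from the answer above: site_means is a hypothetical helper name, and it relies on colSums(x) / nrow(x) being the column mean, so aggregate() with mean should give the same per-site numbers.

```r
# Example data in the shape used in the question
k_2 <- data.frame(Site = c(rep("A", 3), rep("B", 2)), V1 = 1:5, V2 = 1:5)
k_3 <- data.frame(Site = c(rep("A", 3), rep("B", 2)), V1 = 1:5, V2 = 1:5, V3 = 1:5)
my.list <- list(k_2, k_3)

# Hypothetical helper: per-site means of every column named V<digits>
site_means <- function(df) {
  v_cols <- grep("^V[0-9]+$", names(df), value = TRUE)
  aggregate(df[v_cols], by = list(Site = df$Site), FUN = mean)
}

res <- lapply(my.list, site_means)
res
```

Because the columns are picked by pattern, each dataframe can carry a different number of V columns.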

Related

Partial match string between columns for multiple dataframes

I have a list of dataframes (df1, df2, df3) whose columns I would like to match against another dataframe (df), substituting values only where there is a match. The match should be based on a string passed to the function and treated as a partial match; in other words, here it should apply only to fields containing the string "TEXT" and should work on cases like TEXT123 and TEXTabc. I did not get very far myself...
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
list<-c(df1, df2, df3)
example for df1
partial_match <- function(column_A$df1, column_B, TEXT, df) {
df1_new <-df1
df1_new[, column_B] <- ifelse(grepl("TEXT.*", df1[, column_A]),
df[, column_B] - nchar(TEXT),
df[, column_B])
df1_new
}
Outcome for df1:
name column_A column_B
TEXT333 1 11
b 2 b
c 3 c
Here's one approach using a for loop. You were close! Note that I renamed your list to dfs so that it does not mask list(), and built it with list() rather than c(), which would flatten the dataframes into a single list of columns.
Do you think you might encounter a situation where you might match multiple times in the same dataframe? If so, what I show below won't work without a couple more lines.
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
# loop over all dataframes in your list
for (i in seq_along(dfs)) {
  # get the name that matches the regex ("TEXT" already matches anywhere in the string)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # use the name to update the value from the reference df
  dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
}
Updated answer that can account for multiple matches in the same df:
for (i in seq_along(dfs)) {
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # handle each matching name in turn
  for (val in vals) {
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
  }
}
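A loop-free variant is also possible with match(). This is a minimal sketch, assuming each name appears at most once in the reference df; update_from_ref is a hypothetical helper name and not part of the answer above.

```r
df1 <- data.frame(name = c("TEXT333", "b", "c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b", "TEXT345", "d"), column_A = 4:6, stringsAsFactors = FALSE)
df3 <- data.frame(name = c("c", "TEXT123", "a"), column_A = 7:9, stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333", "TEXT123", "a", "TEXT345", "k", "l", "b", "c", "f"),
                 column_B = 11:19, stringsAsFactors = FALSE)

# Hypothetical helper: overwrite column_A with the reference column_B
# for every row whose name contains `pattern` and exists in `ref`
update_from_ref <- function(d, ref, pattern = "TEXT") {
  hit <- which(grepl(pattern, d$name))   # rows with a partial match
  idx <- match(d$name[hit], ref$name)    # position of each hit in ref
  ok <- !is.na(idx)                      # skip names missing from ref
  d$column_A[hit[ok]] <- ref$column_B[idx[ok]]
  d
}

dfs <- lapply(dfs, update_from_ref, ref = df)
dfs
```

Rows without a match, or whose name is absent from the reference, are left untouched.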

Combine variables into numeric vector and find distance between them

I have four numeric variables that I would like to combine into two vectors, and then take the distance between those vectors.
df = data.frame(V1 = 1:10,
V2 = 11:20,
V3 = 21:30,
V4 = 31:40)
I can create the vectors this way:
df2 <- df %>%
mutate(vector1 = mapply(c, V1, V2, SIMPLIFY = F),
vector2 = mapply(c, V3, V4, SIMPLIFY = F))
But I haven't been able to force them to be numeric so I can't calculate the distance between them:
# want to be able to do something like this
df2 %>%
mutate(distance = sqrt(sum((vector1 - vector2) ^ 2)))
I've tried all sorts of combinations of:
df2$vector1 <- lapply(df2$vector1, as.numeric)
df2$vector1 <- as.numeric(as.character(df2$vector1))
I must be missing something quite obvious since this doesn't seem that difficult.
Might this be an option?
library(tidyverse)
df = data.frame(V1 = 1:10,
V2 = 11:20,
V3 = 21:30,
V4 = 31:40)
df %>%
rowwise() %>%
mutate(distance = sqrt(sum((c(V1,V2) - c(V3,V4)) ^ 2)))
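rowwise() matters here because sum() would otherwise add up the whole column and return a single value for every row. With only two components per vector, the distance can also be written out elementwise, which stays vectorized and needs no rowwise(). A base R sketch under that assumption:

```r
df <- data.frame(V1 = 1:10,
                 V2 = 11:20,
                 V3 = 21:30,
                 V4 = 31:40)

# Euclidean distance between (V1, V2) and (V3, V4), computed row by row
df$distance <- with(df, sqrt((V1 - V3)^2 + (V2 - V4)^2))
df
```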

Append columns to list of dataframes using lapply and mapply

I have a list of dataframes that I want to manipulate individually. It looks like this:
df_list <- list(A1 = data.frame(v1 = 1:10,
v2 = 11:20),
A2 = data.frame(v1 = 21:30,
v2 = 31:40))
df_list
Using lapply allows me to run a function over the list of dataframes like this:
library(tidyverse)
some_func <- function(lizt, comp = 2){
lizt <- lapply(lizt, function(x){
x <- x %>%
mutate(IMPORTANT_v3 = v2 + comp)
return(x)
})
}
df_list_1 <- some_func(df_list)
df_list_1
So far so good, but I need to run the function multiple times with different arguments, so I tried mapply, which returns:
df_list_2 <- mapply(some_func,
comp = c(2, 3, 4),
MoreArgs = list(
lizt = df_list
),
SIMPLIFY = F
)
df_list_2
This creates a new list of dataframes for each argument fed to the function in mapply giving me 3 lists of 2 dataframes. This is good but the output I'm looking for is to append a new column to each original dataframe for each argument in the mapply that would look like this:
desired_df_list <- list(A1 = data.frame(v1 = 1:10,
v2 = 11:20,
IMPORTANT_v3 = 13:22,
IMPORTANT_v4 = 14:23,
IMPORTANT_v5 = 15:24),
A2 = data.frame(v1 = 21:30,
v2 = 31:40,
IMPORTANT_v3 = 33:42,
IMPORTANT_v4 = 34:43,
IMPORTANT_v5 = 35:44))
desired_df_list
How can I wrangle the output of lists of lists of dataframes to isolate and append only the desired new columns (IMPORTANT_v3) to the original dataframe? Also open to other options such as mutating multiple columns inside the lapply using mapply but I haven't figured out how to code that as yet.
Thanks!
Solved like this:
library(pracma)  # provides movavg
main_func <- function(lizt, comp = c(2:4)){
  lapply(lizt, function(x){
    # one weighted moving average of v2 per window size in comp
    df <- mapply(movavg,
                 n = comp,
                 type = "w",
                 MoreArgs = list(x$v2),
                 SIMPLIFY = TRUE)
    colnames(df) <- paste0("IMPORTANT_v", 1:ncol(df))
    # append the new columns to the original dataframe
    cbind(x, df)
  })
}
desired_df_list_complete <- main_func(df_list)
desired_df_list_complete
This example uses movavg from the pracma package.
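For the plain v2 + comp columns shown in the question's desired output, a small loop inside lapply may be enough. This is a minimal sketch: add_comp_cols is a hypothetical name, and the k + 1 suffix is only an assumption chosen to reproduce the IMPORTANT_v3 to IMPORTANT_v5 names from the question.

```r
df_list <- list(A1 = data.frame(v1 = 1:10, v2 = 11:20),
                A2 = data.frame(v1 = 21:30, v2 = 31:40))

# Hypothetical helper: append one column per value in `comp` to each dataframe
add_comp_cols <- function(lizt, comp = 2:4) {
  lapply(lizt, function(x) {
    for (k in comp) {
      # assumed naming rule: comp = 2 -> IMPORTANT_v3, comp = 3 -> IMPORTANT_v4, ...
      x[[paste0("IMPORTANT_v", k + 1)]] <- x$v2 + k
    }
    x
  })
}

res <- add_comp_cols(df_list)
res
```

Building the columns inside the per-dataframe function avoids having to untangle a list of lists afterwards.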

R Loop code over several lists of dataframes

I have several lists of dataframes and I want to format the date in each single dataframe within all lists of dataframes. Here is an example code:
v1 = c("2000-05-01", "2000-05-02", "2000-05-03", "2000-05-04", "2000-05-05")
v2 = seq(2,20, length = 5)
v3 = seq(-2,7, length = 5)
v4 = seq(-6,3, length = 5)
df1 = data.frame(Date = v1, df1_Tmax = v2, df1_Tmean = v3, df1_Tmin = v4)
dfl1 <- list(df1, df1, df1, df1)
names(dfl1) = c("ABC_1", "DEF_1", "GHI_1", "JKL_1")
v1 = c("2000-05-01", "2000-05-02", "2000-05-03", "2000-05-04", "2000-05-05")
v2 = seq(3,21, length = 5)
v3 = seq(-3,8, length = 5)
v4 = seq(-7,4, length = 5)
df2 = data.frame(Date = v1, df2_Tmax = v2, df2_Tmean = v3, df2_Tmin = v4)
dfl2 <- list(df2, df2, df2, df2)
names(dfl2) = c("ABC_2", "DEF_2", "GHI_2", "JKL_2")
v1 = c("2000-05-01", "2000-05-02", "2000-05-03", "2000-05-04", "2000-05-05")
v2 = seq(4,22, length = 5)
v3 = seq(-4,9, length = 5)
v4 = seq(-8,5, length = 5)
df3 = data.frame(Date = v1, df3_Tmax = v2, df3_Tmean = v3, df3_Tmin = v4)
dfl3 <- list(df3, df3, df3, df3)
names(dfl3) = c("ABC_3", "DEF_3", "GHI_3", "JKL_3")
v1 = c("2000-05-01", "2000-05-02", "2000-05-03", "2000-05-04", "2000-05-05")
v2 = seq(2,20, length = 5)
v3 = seq(-2,8, length = 5)
v4 = seq(-6,3, length = 5)
abc = data.frame(Date = v1, ABC_Tmax = v2, ABC_Tmean = v3, ABC_Tmin = v4)
abclist <-list(abc, abc, abc, abc)
names(abclist) = c("ABC_abc", "DEF_abc", "GHI_abc", "JKL_abc")
I know how to change the date-column manually:
dfl1$ABC_1$Date = as.Date(dfl1$ABC_1$Date,format="%Y-%m-%d")
class(dfl1$ABC_1$Date)
But how can I do that for each single Date-Column in all of my lists of dataframes?
Here is one option using get and assign
nms <- c('dfl1', 'dfl2', 'dfl3', 'abclist')
lapply(nms, function(x) assign(x,lapply(get(x),
function(y) {y$Date1 <- as.Date(y$Date, format="%Y-%m-%d")
return(y)}),
envir = .GlobalEnv))
PS: Be careful with assign since it will change your global environment .GlobalEnv. Many R users will suggest the list solution over assign.
This can be done with lapply:
# assign the result back, since lapply returns a modified copy
dfl1 <- lapply(dfl1, function(x) {
  x$Date <- as.Date(x$Date, format="%Y-%m-%d")
  return(x)})
If you want to do this for all of your df-lists, you need to store them in a list, and then you can use a nested version of the above call:
df_list <- list(dfl1, dfl2, dfl3, abclist)
df_list <- lapply(df_list, function(lst) {
  # the inner lapply visits every dataframe in the list, not just the first
  lapply(lst, function(x) {
    x$Date <- as.Date(x$Date, format="%Y-%m-%d")
    x
  })
})
This assumes that the Date column always has the same name, "Date".

Want to loop through the columns of dataframes in a list

I would like to loop through a list of dataframes and change the column names (I want each of the columns to have the same name)
Does anyone have a solution using the following data?
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)
Either using a for loop or apply? Would actually love to see both if possible
Thanks,
Ben
Both hrbrmstr and David Arenburg's answers are perfect.
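The answers referred to above are not reproduced here. As a rough sketch of both approaches (not necessarily what those answers contained), assuming the hypothetical target names a, b and c:

```r
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)

new_names <- c("a", "b", "c")  # hypothetical column names, chosen for illustration

# for loop: rename the columns of each dataframe in a copy of the list
renamed_loop <- x
for (i in seq_along(renamed_loop)) {
  names(renamed_loop[[i]]) <- new_names
}

# lapply: setNames returns each dataframe with the new column names
renamed_apply <- lapply(x, setNames, new_names)
```

Both versions leave the original list x unchanged and return a renamed copy.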
