Apply subset function to a list of dataframes - r

I have a list of SpatialPolygonsDataFrame objects, which for this question can be treated like ordinary data frames:
df.1 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 5:10))
df.2 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 2, 2, 7:10))
df.3 <- data.frame(A = c(1:10), B = c(1, 2, 2, 4:10))
list.df <- list(df.1, df.2, df.3)
I would like to get a list containing a subset of each dataframe, based on a condition (list.df.sub is the result I am looking for):
df.1.sub <- subset(df.1, df.1$B != 2)
df.2.sub <- subset(df.2, df.2$B != 2)
df.3.sub <- subset(df.3, df.3$B != 2)
list.df.sub <- list(df.1.sub, df.2.sub, df.3.sub)
I would like to apply the subset directly to list.df. I know I have to use the lapply function, but I don't know how.

This should work:
lapply(list.df, function(x) x[x$B != 2, ])
or with subset:
lapply(list.df, subset, B != 2)
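The two calls above agree on the example data, but they differ when the column contains NA. The following is a minimal sketch on made-up data (the data frame `df.na` and the list `list.na` are not part of the question): logical indexing with `[` turns an NA condition into an all-NA row, whereas `subset()` treats NA as FALSE and drops the row. Wrapping the condition in `which()` makes the bracket form behave like `subset()`.

```r
# Hypothetical data with a missing value in B
df.na <- data.frame(A = 1:4, B = c(1, 2, NA, 4))
list.na <- list(df.na, df.na)

# Bracket indexing: the NA condition produces an extra all-NA row
by_bracket <- lapply(list.na, function(x) x[x$B != 2, ])

# subset(): rows where the condition is NA are dropped
by_subset <- lapply(list.na, subset, B != 2)

# which() keeps only the positions where the condition is TRUE
by_which <- lapply(list.na, function(x) x[which(x$B != 2), ])

nrow(by_bracket[[1]])  # 3
nrow(by_subset[[1]])   # 2
nrow(by_which[[1]])    # 2
```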

If you only want to extract a single column from each dataframe, you can also use the "[[" function:
example_list <- list(iris, iris, iris)
lapply(example_list, "[[", "Species")
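As a small sketch of the difference between the two extraction operators when passed to lapply (using the built-in iris data, as above): "[[" returns the column itself as a vector, while "[" returns a one-column data frame.

```r
example_list <- list(iris, iris, iris)

# "[[" pulls the column out as a plain vector (here a factor)
as_vectors <- lapply(example_list, "[[", "Species")

# "[" keeps the data frame structure, with a single column
as_frames <- lapply(example_list, "[", "Species")

class(as_vectors[[1]])  # "factor"
class(as_frames[[1]])   # "data.frame"
```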

Related

Remove columns with certain column name patterns in multiple dataframes in R

I have >100 dataframes loaded into R. I want to remove all the columns from all data frames containing a certain pattern, in the example case below "abc".
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
I would thus like to remove the column abc_1 in df1, e_abc in df2 and i_a_abc in df3. How could this be done?
Do all of your dataframes start with or contain a shared string (e.g., df)? If yes, then it might be easier to put all your dataframes in a list by using that shared string and then apply the function to remove the abc columns in every dataframe in that list.
You can then read your dataframes back into your environment with list2env(), but it probably is in your interest to keep everything in a list for convenience.
library(dplyr)
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
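# Collect only the objects named df1, df2, ... (an anchored pattern avoids picking up dflist on a re-run)
dfnames <- ls(pattern = "^df[0-9]+$", envir = .GlobalEnv)
dflist <- mget(dfnames, envir = .GlobalEnv)
# Drop every column whose name contains "abc"
dflist <- lapply(dflist, function(x) select(x, !contains("abc")))
# Optional: write the modified data frames back to the global environment
list2env(dflist, envir = .GlobalEnv)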
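If dplyr is not available, the same column removal can be sketched in base R with grepl() on the column names. This is only an illustration of the idea; the helper name `drop_abc` is made up here, and the list is built by hand rather than collected from the environment.

```r
df1 <- data.frame(abc_1 = rep(3, 5), b = seq(1, 5, 1), c = letters[1:5])
df2 <- data.frame(d = rep(5, 5), e_abc = seq(2, 6, 1), f = letters[6:10])
df3 <- data.frame(g = rep(5, 5), h = seq(2, 6, 1), i_a_abc = letters[6:10])

dflist <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: keep the columns whose names do NOT contain "abc".
# drop = FALSE keeps the result a data frame even if one column remains.
drop_abc <- function(x) {
  x[, !grepl("abc", names(x), fixed = TRUE), drop = FALSE]
}

dflist <- lapply(dflist, drop_abc)
lapply(dflist, names)
```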

change rownames in a list of data.frames using a master data.frame

I have a bunch of lists, and I want to rename the rows using a master data.frame.
Here's an example list I created:
df <- data.frame(Value = c(1, 0, 6, 3, 4))
rownames(df) <- c("B144","B211","B678","B111", "B7090")
df2 <- data.frame(Value = c(0, 0, 1, 4, 2))
rownames(df2) <- c("V2","V543","V6577","V4322", "V55")
List<-list(df,df2)
And here's the master data.frame:
master=data.frame(ID=c("B144", "B211", "B7090","B111", "B678", "B242", "V4322", "V2694", "V4399", "V543","V6577","V2", "V55", "V554", "V2009"),
rename=c("S1","S2", "S3","N1","N2","S4","S5","N3","N4","N5","S6","S7","S8", "N6", "N7"))
Is there a way to rename the row names in the lists using this master?
Here's example output I want:
df <- data.frame(Value = c(1, 0, 6, 3, 4))
rownames(df) <- c("S1","S2","N2","N1", "S3")
df2 <- data.frame(Value = c(0, 0, 1, 4, 2))
rownames(df2) <- c("S7","N5","S6","S5", "S8")
List<-list(df,df2)
The function would be too long to write inline, so I define it separately for readability:
func <- function(DF) {
  # Position of each current row name in the master ID column
  lookup <- match(row.names(DF), master$ID)
  row.names(DF) <- master$rename[lookup]
  DF
}
List <- lapply(List, func)
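The lookup above assumes every row name appears in master$ID. When an ID is missing, match() returns NA, and NA is not a valid row name. A hedged sketch of one way to handle that case, keeping the original name when there is no match (the function name `rename_rows`, the reduced master table, and the ID "B999" are made up for illustration):

```r
master <- data.frame(
  ID = c("B144", "B211", "B678"),
  rename = c("S1", "S2", "N2"),
  stringsAsFactors = FALSE
)

df <- data.frame(Value = c(1, 0, 6))
rownames(df) <- c("B144", "B211", "B999")  # "B999" is not in master
List <- list(df)

# Hypothetical variant of func: fall back to the old name when no match exists
rename_rows <- function(DF, master) {
  lookup <- match(row.names(DF), master$ID)
  new_names <- as.character(master$rename[lookup])
  unmatched <- is.na(lookup)
  new_names[unmatched] <- row.names(DF)[unmatched]
  row.names(DF) <- new_names
  DF
}

renamed <- lapply(List, rename_rows, master = master)
row.names(renamed[[1]])
```

Passing master as an argument, rather than reading it from the global environment, is a design choice in this sketch that makes the helper easier to reuse with a different lookup table.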

How to change the value of an element within a list using purrr (tidyverse)

If I have the following list
list1 <- list(list(a = 2, b = 3), list(c = 4, d = 5))
list2 <- list(e = "a", f = "b")
mylist <- list(list1, list2)
What is the easiest way to change the value of a within mylist to a different value (preferably in purrr)?
Something like:
list1[[1]][[1]] <- newvalue
You can use the assignment form of pluck:
purrr::pluck(mylist, 1, 1, 'a') <- 'new_value'
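For comparison, here is a sketch of the same nested assignment written with base R indexing, which works on a copy so the original list stays untouched. The purrr part is guarded with requireNamespace() because it assumes purrr is installed; assign_in() is used there as a non-modifying counterpart that returns the changed list.

```r
list1 <- list(list(a = 2, b = 3), list(c = 4, d = 5))
list2 <- list(e = "a", f = "b")
mylist <- list(list1, list2)

# Base R: chain [[ ]] down to the element and assign
base_result <- mylist
base_result[[1]][[1]][["a"]] <- "new_value"
base_result[[1]][[1]]$a

# purrr (assumed to be installed): returns a modified copy of mylist
if (requireNamespace("purrr", quietly = TRUE)) {
  purrr_result <- purrr::assign_in(mylist, list(1, 1, "a"), "new_value")
  print(purrr_result[[1]][[1]]$a)
}
```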

Return a changed list in R via lapply(), but objects in list not changed

I'm trying to loop through a list of data frames, dropping columns that don't match some condition. I want to change the data frames such that they're missing 1 column essentially. After executing the function, I'm able to change the LIST of data frames, but not the original data frames themselves.
df1 <- data.frame(
a = c("John","Peter","Dylan"),
b = c(1, 2, 3),
c = c("yipee", "ki", "yay"))
df2 <- data.frame(
a = c("Ray","Bob","Derek"),
b = c(4, 5, 6),
c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
a = c("Bill","Sam","Nate"),
b = c(7, 8, 9),
c = c("I", "eat", "cake"))
l <- list(df1, df2, df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
When I print the list l, I get a list of data frames with the changes I want. When I print df1, df2 or df3 directly, the column has not been dropped.
I've looked at this solution and many others, so I'm obviously missing something.
The list l and the data frames df1, df2, etc. are independent objects; changing one does not affect the other. One way to get the changed data frames is to name the list elements and write them back to the global environment, overwriting the originals:
l <- lapply(l, drop_col)
names(l) <- paste0("df", 1:3)
list2env(l, .GlobalEnv)
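The independence of the list and the original objects can be seen in a small sketch (a cut-down version of the question's df1; the variables `names_original` and `names_in_list` are only there to record the state before the write-back):

```r
df1 <- data.frame(a = c("John", "Peter", "Dylan"), b = c(1, 2, 3))

# Putting df1 into a list stores its value, not a reference to the variable
l <- list(df1 = df1)

# Modify the list element only
l$df1$b <- NULL

names_original <- names(df1)     # "a" "b": df1 itself is unchanged
names_in_list <- names(l$df1)    # "a"

# The change reaches df1 only through an explicit assignment
df1 <- l$df1
names(df1)                       # "a"
```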
The problem is that when you create l, R fills it with copies of your data frames df1, df2, df3.
In R, it is not generally possible to pass references to variables. One workaround is to write the results back into the environment, as @Ronak Shah does.
Another is to use get() and <<- to change the variables from within a function.
drop_cols <- function(x) {
  for (iter in x)
    do.call("<<-", list(iter, drop_col(get(iter))))
}
drop_cols(c("df1","df2","df3"))
df1 <- data.frame(
a = c("John","Peter","Dylan"),
b = c(1, 2, 3),
c = c("yipee", "ki", "yay"))
df2 <- data.frame(
a = c("Ray","Bob","Derek"),
b = c(4, 5, 6),
c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
a = c("Bill","Sam","Nate"),
b = c(7, 8, 9),
c = c("I", "eat", "cake"))
# Name the list elements:
l <- list(df1 = df1, df2 = df2, df3 = df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
# View altered dfs:
View(l[["df1"]])

Renaming columns in a loop R

I have tried searching how to rename columns of multiple data frames in a loop, but I can't find a cohesive answer. Let's say I have 4 data frames that have 2 columns each. I want to rename each y1 column as "number" and each y2 column as "value" in all 4 data frames. I know that I can do this by creating a list, but I want to change the name of the column directly for that data frame, not as a data frame list value (like df_list[[1]]). I get that type of result when I use this code:
df_list <- list(d1, d2, d3, d4)
for (i in seq_along(df_list)) {
  colnames(df_list[[i]]) <- c("number", "value")
}
Data frames:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d4 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
The easiest option would be setNames:
lapply(df_list, setNames, c("number", "value"))
As @Parfait mentioned, it is better to keep the objects in a list than to change them in the global environment, but it can be done if the list element names match the object names:
list2env(lapply(mget(paste0("d", 1:4)), setNames,
c("number", "value")), envir = .GlobalEnv)
names(d1)
#[1] "number" "value"
Or using a for loop
for(nm in paste0("d", 1:4)) assign(nm, `names<-`(get(nm), c("number", "value")))
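The setNames approaches above rename by position, so they rely on every data frame having its columns in the same order. A hedged sketch of renaming by old column name instead, which does not depend on column order (the helper `rename_cols`, the lookup vector `new_names`, and the reordered d2 are made up for illustration):

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y2 = c(6, 5, 4), y1 = c(3, 2, 1))  # columns in a different order

# Lookup vector: old name -> new name
new_names <- c(y1 = "number", y2 = "value")

# Hypothetical helper: rename only the columns found in the lookup vector
rename_cols <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

renamed <- lapply(list(d1 = d1, d2 = d2), rename_cols, map = new_names)
names(renamed$d1)
names(renamed$d2)
```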
