I have tried searching how to rename columns of multiple data frames in a loop, but I can't find a cohesive answer. Let's say I have 4 data frames with 2 columns each. I want to rename the y1 column to "number" and the y2 column to "value" in all 4 data frames. I know that I can do this by creating a list, but I want to change the column names of each data frame directly, not of the copies stored in the list (like df_list[[1]]). That is the result I get when I use this code:
df_list <- list(d1, d2, d3, d4)
for (i in seq_along(df_list)) {
  colnames(df_list[[i]]) <- c("number", "value")
}
Data frames:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d4 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
The easiest option is setNames:
lapply(df_list, setNames, c("number", "value"))
As @Parfait mentioned, it is better to keep the objects in a list than to change objects in the global environment, but it can be done if the list names match the object names:
list2env(lapply(mget(paste0("d", 1:4)), setNames,
c("number", "value")), envir = .GlobalEnv)
names(d1)
#[1] "number" "value"
Or, using a for loop:
for(nm in paste0("d", 1:4)) assign(nm, `names<-`(get(nm), c("number", "value")))
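setNames() and `names<-` replace names by position, so they assume every data frame has its columns in the same order. If that is not guaranteed, a minimal sketch is to rename by matching the old names against a lookup vector. The helper rename_by_lookup() and the reordered d2 below are hypothetical, for illustration only:

```r
# Example data; d2 has its columns in a different order (hypothetical)
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y2 = c(6, 5, 4), y1 = c(3, 2, 1))

# Named lookup vector: names are the old column names, values are the new ones
lookup <- c(y1 = "number", y2 = "value")

# Hypothetical helper: rename only the columns that appear in the lookup
rename_by_lookup <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- lookup[names(df)[hit]]
  df
}

df_list <- lapply(list(d1 = d1, d2 = d2), rename_by_lookup, lookup = lookup)
names(df_list$d1)  # "number" "value"
names(df_list$d2)  # "value" "number"
```

As in the answer above, list2env(df_list, envir = .GlobalEnv) would then overwrite d1 and d2 with the renamed copies.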
I have more than 100 data frames loaded into R. I want to remove, from every data frame, all columns whose names contain a certain pattern (in the example below, "abc").
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
I would thus like to remove the column abc_1 in df1, e_abc in df2 and i_a_abc in df3. How could this be done?
Do the names of all your data frames start with or contain a shared string (e.g., df)? If so, it is easier to collect the data frames into a list using that shared string and then apply a function that removes the "abc" columns from every data frame in the list.
You can then write the data frames back into your environment with list2env(), although keeping everything in a list is usually more convenient.
library(dplyr)
df1 <- data.frame(`abc_1` = rep(3, 5), `b` = seq(1, 5, 1), `c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `e_abc` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`g` = rep(5, 5), `h` = seq(2, 6, 1), `i_a_abc` = letters[6:10])
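# Only df1, df2, ...; a plain "df" pattern would also match dfpattern/dflist on reruns
dfpattern <- ls(envir = .GlobalEnv, pattern = "^df\\d+$")
dflist <- mget(dfpattern, envir = .GlobalEnv)  # mget() already returns a named list
dflist <- lapply(dflist, function(x) select(x, !contains("abc")))  # drop "abc" columns
list2env(dflist, envir = .GlobalEnv)  # overwrite df1, df2, ... in the global environment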
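For comparison, a base R sketch without dplyr, assuming the same df1 to df3: grepl() flags the column names that contain "abc" and "[" keeps the rest. Note that dplyr's contains() ignores case by default, whereas grepl(fixed = TRUE) below is case-sensitive:

```r
df1 <- data.frame(abc_1 = rep(3, 5), b = seq(1, 5, 1), c = letters[1:5])
df2 <- data.frame(d = rep(5, 5), e_abc = seq(2, 6, 1), f = letters[6:10])
df3 <- data.frame(g = rep(5, 5), h = seq(2, 6, 1), i_a_abc = letters[6:10])

# Collect df1, df2, ... from the current environment into a named list
dflist <- mget(ls(pattern = "^df\\d+$"))

# Keep only columns whose names do not contain "abc"; drop = FALSE keeps a data frame
dflist <- lapply(dflist, function(x) x[, !grepl("abc", names(x), fixed = TRUE), drop = FALSE])

lapply(dflist, names)
```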
I have a list of data frames, and they all have the same number of columns.
The difference is that some of them have only a couple of rows, whereas others have a few hundred.
Before my next step, I would like to filter this list to keep only the data frames that have more than n rows.
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
d2 <- data.frame(y1 = c(3),y2 = c(4), y3 = c(5))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
my.list <- list(d1, d2,d3)
Data frames d1 to d3 all have 3 columns, but d2 has only 1 row. How do I filter my.list so that it contains only the data frames with more than 1 row, i.e. d1 and d3?
Try the code below
my.list[sapply(my.list,nrow)>1]
Using purrr.
library(purrr)
keep(my.list, ~ nrow(.x) > 1)
A base R solution:
Filter(function(x) nrow(x) > 1, my.list)
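To make the threshold n from the question explicit, here is a minimal sketch with a hypothetical helper, keep_longer(). It uses vapply() instead of sapply() because vapply() always returns an integer vector here, even for an empty list:

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
d2 <- data.frame(y1 = c(3), y2 = c(4), y3 = c(5))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
my.list <- list(d1 = d1, d2 = d2, d3 = d3)  # named so the result is easy to inspect

# Hypothetical helper: keep the data frames that have more than n rows
keep_longer <- function(lst, n) {
  lst[vapply(lst, nrow, integer(1)) > n]
}

names(keep_longer(my.list, 1))  # "d1" "d3"
length(keep_longer(list(), 1))  # 0
```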
I am trying to find, for each binary data frame in a list, the minimum column position, i.e. the leftmost column that contains a 1. I used the following to get the first 1 by row, which works, but when I try something similar for columns I get the same output. Does anyone have suggestions as to what could be wrong or what I should try?
files <- list.files(pattern="*.csv")
file_list <- lapply(files, read.table)
first_1 <- sapply(file_list, function(x) min(which(t(x) == 1, arr.ind = T)))
Here, we can directly get the minimum column index by creating a logical vector with colSums and taking the index of its first TRUE:
sapply(file_list, function(x) which(colSums(x == 1) > 0)[1])
data
file_list <- list(data.frame(col1 = c(5, 3, 1, 2, 3), col2 = c(3, 4, 5, 1, 4)),
data.frame(col1 = c(5, 3, 2, 2, 1), col2 = c(3, 4, 5, 1, 4)))
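The original attempt goes wrong because which(..., arr.ind = TRUE) returns a two-column matrix of row and column indices, and min() takes the minimum over both columns at once. A sketch that keeps the question's approach but takes the minimum of the "col" column only; the third data frame is hypothetical and chosen so that the two approaches disagree:

```r
file_list <- list(data.frame(col1 = c(5, 3, 1, 2, 3), col2 = c(3, 4, 5, 1, 4)),
                  data.frame(col1 = c(5, 3, 2, 2, 1), col2 = c(3, 4, 5, 1, 4)),
                  # hypothetical: the only 1 is in row 1 of col2
                  data.frame(col1 = c(0, 0, 0), col2 = c(1, 0, 0)))

# which(arr.ind = TRUE) gives a matrix with "row" and "col" columns;
# take the minimum of the column indices only
first_col <- sapply(file_list, function(x) min(which(x == 1, arr.ind = TRUE)[, "col"]))
first_col  # 1 1 2

# Taking min() over the whole index matrix mixes rows and columns
sapply(file_list, function(x) min(which(x == 1, arr.ind = TRUE)))  # 1 1 1
```

If a data frame contains no 1 at all, min() returns Inf with a warning, whereas the colSums() answer returns NA.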
I have three data frames, which I have combined in a list:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
my.list <- list(d1, d2,d3)
I want to extract the first row of each element in the list, bind them row-wise, and save the result as a CSV file.
For example, in the example above, I want to extract the first row from d1, d2 and d3:
row1.d1 <- c(1,4)
row1.d2 <- c(3,6)
row1.d3 <- c(5,6)
and bind them together
dat <- rbind(row1.d1,row1.d2,row1.d3)
dat
row1.d1 1 4
row1.d2 3 6
row1.d3 5 6
and repeat it for all rows.
I found a way to do this for a list of vectors:
A=list()
A[[1]]=c(1,2)
A[[2]]=c(3,4)
A[[3]]=c(5,6)
sapply(A,'[[',1)
But for data frames, I am not sure how to go about it.
Another way is to go through each data frame in my.list with lapply(), take its first row, and then bind the results with rbind():
do.call(rbind, lapply(my.list, function(x) x[1, ]))
# y1 y2
#1 1 4
#2 3 6
#3 5 6
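The question also asks to repeat this for every row. A minimal sketch, assuming all data frames have the same columns; the helper nth_rows() and the rows_<i>.csv file names are hypothetical, and the files go to tempdir() only so that the example is safe to run:

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8), y2 = c(6, 4, 2))
my.list <- list(d1, d2, d3)

# Hypothetical helper: bind the i-th row of every data frame in the list
nth_rows <- function(lst, i) do.call(rbind, lapply(lst, function(x) x[i, , drop = FALSE]))

# Only use row indices that exist in every data frame
n <- min(vapply(my.list, nrow, integer(1)))
by_row <- lapply(seq_len(n), function(i) nth_rows(my.list, i))

# Write one CSV file per row index
out_files <- file.path(tempdir(), sprintf("rows_%d.csv", seq_len(n)))
for (i in seq_len(n)) write.csv(by_row[[i]], out_files[i], row.names = FALSE)

by_row[[3]]
```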
Below is a tidyverse/functional-programming solution. I added the columns id and df to retain the row number and the source data frame, respectively.
# your data
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
my.list <- list(d1, d2,d3)
# tidyverse/ FP solution
library(dplyr)
library(purrr)
library(readr)
map_df(.x = seq_along(my.list),
       .f = function(x) my.list[[x]] %>%
         mutate(id = row_number(),
                df = x)) %>%
  arrange(id) %>%
  #select(-id, -df) %>% # uncomment to drop the row number and source columns
  write_csv(file = 'yourfile.csv')  # readr >= 1.4.0 uses `file`; older versions use `path`
We need to add a comma after the 1 so that "[" extracts the first row rather than the first column:
t(sapply(my.list, `[`, 1, ))
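Note that t(sapply(...)) returns a matrix of mode list, because each extracted row is itself a one-row data frame. If a plain numeric matrix is wanted, here is a sketch using vapply() and unlist(), assuming all columns are numeric and every data frame has the same number of columns:

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8), y2 = c(6, 4, 2))
my.list <- list(d1, d2, d3)

# unlist() turns each one-row data frame into a named numeric vector;
# vapply() checks that every result is a double vector of length ncol(d1)
first_rows <- t(vapply(my.list, function(x) unlist(x[1, ]), numeric(ncol(my.list[[1]]))))
first_rows
```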
I have a list of SpatialPolygonsDataFrame objects, which I can approximate with data frames like this:
df.1 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 5:10))
df.2 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 2, 2, 7:10))
df.3 <- data.frame(A = c(1:10), B = c(1, 2, 2, 4:10))
list.df <- list(df.1, df.2, df.3)
I would like to get a list containing a subset of each data frame based on a condition (list.df.sub is the result I am looking for):
df.1.sub <- subset(df.1, df.1$B != 2)
df.2.sub <- subset(df.2, df.2$B != 2)
df.3.sub <- subset(df.3, df.3$B != 2)
list.df.sub <- list(df.1.sub, df.2.sub, df.3.sub)
I would like to apply my subset directly to list.df. I know that I have to use the lapply function, but I don't know how.
This should work:
lapply(list.df, function(x)x[x$B!=2,])
or with subset:
lapply(list.df, subset, B!=2)
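subset() relies on non-standard evaluation, which can be awkward when the column name is stored in a variable. Below is a sketch of a hypothetical helper, drop_value(), that takes the column name as a string; wrapping the condition in which() drops rows where the comparison is NA, as subset() does:

```r
df.1 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 5:10))
df.2 <- data.frame(A = c(1:10), B = c(1, 2, 2, 2, 2, 2, 7:10))
df.3 <- data.frame(A = c(1:10), B = c(1, 2, 2, 4:10))
list.df <- list(df.1, df.2, df.3)

# Hypothetical helper: drop the rows where column `col` equals `value`
drop_value <- function(df, col, value) {
  df[which(df[[col]] != value), , drop = FALSE]
}

list.df.sub <- lapply(list.df, drop_value, col = "B", value = 2)
vapply(list.df.sub, nrow, integer(1))  # 7 5 8
```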
If you only want to extract a single column from each data frame, you can also use the "[[" function:
example_list <- list(iris, iris, iris)
lapply(example_list, "[[", "Species")