Select first row from multiple dataframe and bind - r

I have three data frames which I have combined in a list
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
my.list <- list(d1, d2,d3)
I want to extract the first row of each element in the list, bind them row-wise, and save the result as a CSV file.
For instance, in the example above, I want to extract the first row from d1, d2 and d3
row1.d1 <- c(1,4)
row1.d2 <- c(3,6)
row1.d3 <- c(5,6)
and bind them together
dat <- rbind(row1.d1,row1.d2,row1.d3)
dat
row1.d1 1 4
row1.d2 3 6
row1.d3 5 6
and repeat it for all rows.
I found a way to do this if I have a list of vectors,
A=list()
A[[1]]=c(1,2)
A[[2]]=c(3,4)
A[[3]]=c(5,6)
sapply(A,'[[',1)
But for dataframes, I am not sure how to go about it.

Another way would be the following. You go through each data frame in my.list and get the first row with lapply(). Then you bind the result.
do.call(rbind, (lapply(my.list, function(x) x[1,])))
# y1 y2
#1 1 4
#2 3 6
#3 5 6
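The question also asks to repeat this for every row. A minimal base R sketch of that, assuming all data frames in my.list have the same number of rows and the same columns. The names n, by.row and out, and the output file name, are placeholders chosen for this illustration:

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8), y2 = c(6, 4, 2))
my.list <- list(d1, d2, d3)

# assumption: every data frame has the same number of rows
n <- nrow(my.list[[1]])

# for each row index i, bind row i of every data frame
by.row <- lapply(seq_len(n), function(i) {
  do.call(rbind, lapply(my.list, function(x) x[i, ]))
})

# stack the per-row blocks and drop the leftover row names
out <- do.call(rbind, by.row)
rownames(out) <- NULL

# written to a temporary file here; replace with your own path
out.file <- file.path(tempdir(), "rows.csv")
write.csv(out, out.file, row.names = FALSE)
```

If one file per row index is wanted instead, write.csv() could be called on each element of by.row rather than on the stacked result.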

A tidyverse/FP solution is below. I added id and df to retain information about the row number and source, respectively.
# your data
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
my.list <- list(d1, d2,d3)
# tidyverse/ FP solution
library(dplyr)
library(purrr)
library(readr)
map_df(.x = seq_along(my.list),
       .f = function(x) bind_rows(my.list[x]) %>%
         mutate(id = row_number(),
                df = x)) %>%
  arrange(id) %>%
  #select(-id, -df) %>% # uncomment if you want to lose row num and source
  write_csv(file = 'yourfile.csv') # readr versions before 1.4.0 call this argument `path`

To index rows rather than columns, we need a trailing comma after the 1, so that `[` receives an empty column argument
t(sapply(my.list, `[`, 1, ))
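If a plain numeric matrix is preferred, one variant is to unlist each first row inside an explicit anonymous function. This is only a sketch: it assumes all columns are numeric, and first.rows is a name made up for the example.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8), y2 = c(6, 4, 2))
my.list <- list(d1, d2, d3)

# unlist() turns each one-row data frame into a named numeric vector,
# so sapply() returns a numeric matrix with one column per data frame;
# t() flips it to one row per data frame
first.rows <- t(sapply(my.list, function(x) unlist(x[1, ])))
first.rows
```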

Related

how to subset a list of dataframes by its rowlength in R

I have a list of dataframes and they all have the same column numbers.
The difference is some of them have only a couple rows whereas others have a few hundred rows.
Before my next step, I would like to filter this list and keep only the data frames that have more than n rows.
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
d2 <- data.frame(y1 = c(3),y2 = c(4), y3 = c(5))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
my.list <- list(d1, d2,d3)
Data frames 1-3 all have 3 columns, but d2 has only 1 row. How do I filter this list so that my.list contains only the data frames with more than 1 row, i.e. d1 and d3?
Try the code below
my.list[sapply(my.list,nrow)>1]
Using purrr.
library(purrr)
keep(my.list, ~ nrow(.x) > 1)
A base R solution:
Filter(function(x) nrow(x) > 1, my.list)
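Since the question asks about a general threshold n, the same idea can be wrapped in a small function. The helper name keep_longer_than is hypothetical, not from any package; vapply() is used here in place of sapply() only so that the result type is fixed.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
d2 <- data.frame(y1 = c(3), y2 = c(4), y3 = c(5))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(5, 5, 6))
my.list <- list(d1, d2, d3)

# hypothetical helper: keep the data frames with more than n rows
keep_longer_than <- function(lst, n) {
  lst[vapply(lst, nrow, integer(1)) > n]
}

res <- keep_longer_than(my.list, 1)
length(res)  # d1 and d3 remain
```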

Is there a way in R to compute a new column on a df based on another df?

Is it possible to do something like this in R (assuming both df1 and df2 have the same number of rows)?
if (df1$var1 = 8) df2$var1 = 1.
if (df1$var2 = 9) df2$var2 = 1.
A simple two-line solution uses the base R ifelse() function
df1 <- data.frame(var1 = c(1:10), var2 = c(1:10))
df2 <- data.frame(var1 = c(1:10), var2 = c(1:10))
df2$var1 <- ifelse(df1$var1 == 8, 1,df2$var1)
df2$var2 <- ifelse(df1$var2 == 9, 1,df2$var2)
Here is one simple option in base R: we replicate the values 8 and 9 so that their length matches the subset of columns of 'df1' and compare, which gives a logical matrix. We then use that matrix to subset 'df2' and assign 1 to the matching cells
nm1 <- c('var1', 'var2')
df2[nm1][df1[nm1] == c(8, 9)[col(df1[nm1])]] <- 1
df2
# var1 var2 var3
#1 5 1 1
#2 3 1 2
#3 1 3 3
#4 1 4 4
#5 4 2 5
Or this can be done in two steps
df2$var1[df1$var1 == 8] <- 1
df2$var2[df1$var2 == 9] <- 1
Or using Map
df2[nm1] <- Map(function(x, y, z) replace(x, y == z, 1),
df2[nm1], df1[nm1], c(8, 9))
An if/else loop also works, but if() is not vectorized, i.e. it expects a condition of length 1. We therefore have to loop over every row and column, which is inefficient in R
vals <- c(8, 9)
for(i in seq_len(nrow(df1))) {
for(j in seq_along(nm1)) {
if(df1[[nm1[j]]][i] == vals[j]) df2[[nm1[j]]][i] <- 1
}
}
data
df1 <- data.frame(var1 = c(1, 3, 8, 5, 2), var2 = c(9, 3, 1, 8, 4),
var3 = 1:5)
df2 <- data.frame(var1 = c(5, 3, 2, 1, 4), var2 = c(3, 1, 3, 4, 2),
var3 = 1:5)
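As a quick sanity check, the sketch below runs three of the approaches above on this sample data and compares the results. The object names a, b and m are made up for the comparison.

```r
df1 <- data.frame(var1 = c(1, 3, 8, 5, 2), var2 = c(9, 3, 1, 8, 4),
                  var3 = 1:5)
df2 <- data.frame(var1 = c(5, 3, 2, 1, 4), var2 = c(3, 1, 3, 4, 2),
                  var3 = 1:5)

# approach 1: ifelse()
a <- df2
a$var1 <- ifelse(df1$var1 == 8, 1, a$var1)
a$var2 <- ifelse(df1$var2 == 9, 1, a$var2)

# approach 2: assignment with a logical index
b <- df2
b$var1[df1$var1 == 8] <- 1
b$var2[df1$var2 == 9] <- 1

# approach 3: Map() with replace()
nm1 <- c("var1", "var2")
m <- df2
m[nm1] <- Map(function(x, y, z) replace(x, y == z, 1),
              df2[nm1], df1[nm1], c(8, 9))

all.equal(a, b)
all.equal(b, m)
```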

How would I find the minimum position of 1 going by column in a list of data frames?

I am trying to find the minimum position, i.e. the leftmost occurrence of the first 1 by column, in a list of binary data frames. I have used the following to get the first 1 by row, which works, but when I try something similar by column I get the same output. Does anyone have suggestions as to what could be wrong or what I should try?
files <- list.files(pattern="*.csv")
file_list <- lapply(files, read.table)
first_1 <- sapply(file_list, function(x) min(which(t(x) == 1, arr.ind = T)))
Here, we can directly get the minimum column index by creating a logical vector with colSums
sapply(file_list, function(x) which(colSums(x == 1) > 0)[1])
data
file_list <- list(data.frame(col1 = c(5, 3, 1, 2, 3), col2 = c(3, 4, 5, 1, 4)),
data.frame(col1 = c(5, 3, 2, 2, 1), col2 = c(3, 4, 5, 1, 4)))
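To see how this behaves on edge cases, here is a sketch that extends the sample data with two made-up data frames: one where 1 appears only in the second column and one with no 1 at all. It uses vapply() and unname() instead of sapply() purely to get a plain integer vector; first_col is a name chosen for the example.

```r
file_list <- list(
  data.frame(col1 = c(5, 3, 1, 2, 3), col2 = c(3, 4, 5, 1, 4)),
  data.frame(col1 = c(5, 3, 2, 2, 1), col2 = c(3, 4, 5, 1, 4)),
  data.frame(col1 = c(5, 3, 2), col2 = c(1, 4, 5)),  # made up: 1 only in col2
  data.frame(col1 = c(5, 3, 2), col2 = c(2, 4, 5))   # made up: no 1 at all
)

# colSums(x == 1) > 0 flags the columns that contain at least one 1;
# which(...)[1] picks the leftmost flagged column, or NA if there is none
first_col <- vapply(file_list, function(x) {
  unname(which(colSums(x == 1) > 0)[1])
}, integer(1))

first_col
# 1 1 2 NA
```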

Transpose data frames inside a list of lists

I have a list of lists containing multiple data frames. I would like to transpose the data frames and leave the lists structured as is.
The data is set up in this format (from: John McDonnell):
parent <- list(
a = list(
foo = data.frame(first = c(1, 2, 3), second = c(4, 5, 6)),
bar = data.frame(first = c(1, 2, 3), second = c(4, 5, 6)),
puppy = data.frame(first = c(1, 2, 3), second = c(4, 5, 6))
),
b = list(
foo = data.frame(first = c(1, 2, 3), second = c(4, 5, 6)),
bar = data.frame(first = c(1, 2, 3), second = c(4, 5, 6)),
puppy = data.frame(first = c(1, 2, 3), second = c(4, 5, 6))
)
)
This works when a single list of data frames is used, but not for a list of lists:
a_tran <- lapply(parent$a, function(x) {
t(x)
})
Any thoughts on how to modify?
You could use modify_depth from purrr
library(purrr)
modify_depth(.x = parent, .depth = 2, .f = ~ as.data.frame(t(.)))
#$a
#$a$foo
# V1 V2 V3
#first 1 2 3
#second 4 5 6
#$a$bar
# V1 V2 V3
#first 1 2 3
#second 4 5 6
#$a$puppy
# V1 V2 V3
#first 1 2 3
#second 4 5 6
#$b
# ...
A base R option that @hrbrmstr initially posted in a comment would be
lapply(parent, function(x) lapply(x, function(y) as.data.frame(t(y))))
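A small check that the nested lapply() keeps the outer and inner list names while transposing each data frame. The reduced parent list below is a cut-down version of the one in the question, and transposed is a name made up for the example.

```r
# cut-down version of the nested list from the question
parent <- list(
  a = list(foo = data.frame(first = c(1, 2, 3), second = c(4, 5, 6))),
  b = list(bar = data.frame(first = c(1, 2, 3), second = c(4, 5, 6)))
)

# outer lapply() walks the sublists, inner lapply() walks the data frames
transposed <- lapply(parent, function(x) {
  lapply(x, function(y) as.data.frame(t(y)))
})

names(transposed)              # outer names are kept
names(transposed$a)            # inner names are kept
dim(transposed$a$foo)          # 3 x 2 became 2 x 3
rownames(transposed$a$foo)     # former column names
```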

Renaming columns in a loop R

I have tried searching how to rename columns of multiple data frames in a loop, but I can't find a cohesive answer. Let's say I have 4 data frames that have 2 columns each. I want to rename each y1 column as "number" and each y2 column as "value" in all 4 data frames. I know that I can do this by creating a list, but I want to change the name of the column directly for that data frame, not as a data frame list value (like df_list[[1]]). I get that type of result when I use this code:
df_list <- list(d1, d2, d3, d4)
for (i in 1:length(df_list)){
colnames(df_list[[i]]) <- c("number", "value")
}
Data frames:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d4 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
Easiest would be setNames
lapply(df_list, setNames, c("number", "value"))
As @Parfait mentioned, it is better to keep the objects in a list rather than changing the objects in the global environment, but it can be done if the list element names match the object names
list2env(lapply(mget(paste0("d", 1:4)), setNames,
c("number", "value")), envir = .GlobalEnv)
names(d1)
#[1] "number" "value"
Or using a for loop
for(nm in paste0("d", 1:4)) assign(nm, `names<-`(get(nm), c("number", "value")))
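The sketch below illustrates why the loop in the question leaves d1 untouched: the list holds copies of the data frames, so renaming columns inside the list (or in the list returned by lapply()) does not change the original objects. It uses a named list with two of the data frames; renamed is a name made up for the example.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))

# a named list, so each element can be traced back to its source object
df_list <- list(d1 = d1, d2 = d2)

# lapply() returns a new list; the result has to be assigned to be kept
renamed <- lapply(df_list, setNames, c("number", "value"))

names(renamed$d1)  # renamed copy inside the list
names(d1)          # original object still has y1 and y2
```

Working with renamed$d1 afterwards, instead of d1, avoids having to write back into the global environment at all.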
