Control number of rows when binding dataframes with different number of rows? - r

I have a function that generates a dataframe, and each call returns a different number of rows:
structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43,
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a",
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
As in a while loop, I want to keep binding rows until the total number of rows reaches at least n (n = 4, 100, 4242, ...).
Please advise how to do this using functional programming, without a while loop.
For example, with n = 10 the combined df might have 7 rows before the last bind_rows and 20 rows after it. That's OK: I want the smallest row count k that can be reached with k >= n.
Here is my while loop doing this:
b <- list()
total_rows <- 0
while (total_rows < 1000) {
  df <- f_produce_rand_df()
  b[[length(b) + 1]] <- df
  total_rows <- total_rows + nrow(df)
}
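One loop-free way to express this is a recursive helper that carries the accumulated list and the running row total as arguments. The sketch below is only an illustration under assumptions: f_produce_rand_df() here is a hypothetical stand-in for the asker's generator, and collect_until() is a name made up for this example.

```r
# Hypothetical stand-in for the asker's generator: returns 1 to 5 rows per call.
f_produce_rand_df <- function() {
  n <- sample(1:5, 1)
  data.frame(a = seq_len(n),
             b = sample(letters, n, replace = TRUE),
             c = runif(n),
             stringsAsFactors = FALSE)
}

# Recursively collect data frames until the running row total reaches n.
# acc holds the pieces gathered so far, total their combined row count.
collect_until <- function(n, acc = list(), total = 0) {
  if (total >= n) return(acc)
  df <- f_produce_rand_df()
  collect_until(n, c(acc, list(df)), total + nrow(df))
}

set.seed(1)
pieces <- collect_until(20)
result <- do.call(rbind, pieces)  # dplyr::bind_rows(pieces) would also work
```

The stopping rule is the same as in the while loop: the last piece is kept even if it pushes the total past n. For very large n the recursion can get deep, so the while loop may still be the more robust choice.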

Related

how to identify whether all data frame in a list has unique ID or not

I have a list of dfs. I want to know whether there is a smart way to tell whether each df in lst has unique IDs, and to create a summary table like the one below.
Sample data:
lst<-list(structure(list(ID = c("Tom", "Jerry", "Mary"), Score = c(85,
85, 96)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame")), structure(list(ID = c("Tom", "Jerry", "Mary",
"Jerry"), Score = c(75, 65, 88, 98)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(ID = c("Tom", "Jerry",
"Tom"), Score = c(97, 65, 96)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame")))
We could loop over the list and check with n_distinct
library(dplyr)
library(stringr)
library(purrr)
map_dfr(setNames(lst, str_c("df", seq_along(lst))),
~.x %>%
summarise(UniqueID = c("N", "Y")[1 + (n_distinct(ID) == n())]), .id= 'Data')
-output
# A tibble: 3 × 2
Data UniqueID
<chr> <chr>
1 df1 Y
2 df2 N
3 df3 N
In base R:
data.frame(Data = paste0("df", seq(lst)),
UniqueID = ifelse(sapply(lst, \(x) length(unique(x$ID)) == nrow(x)), "Y", "N"))
Data UniqueID
1 df1 Y
2 df2 N
3 df3 N
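Another base R variant of the same check uses anyDuplicated(), which returns 0 when no value repeats. This is a minimal sketch on plain data frames rebuilt from the sample data; the names has_unique_id and summary_tbl are chosen here for illustration.

```r
# Sample data rebuilt as plain data frames.
lst <- list(
  data.frame(ID = c("Tom", "Jerry", "Mary"), Score = c(85, 85, 96),
             stringsAsFactors = FALSE),
  data.frame(ID = c("Tom", "Jerry", "Mary", "Jerry"), Score = c(75, 65, 88, 98),
             stringsAsFactors = FALSE),
  data.frame(ID = c("Tom", "Jerry", "Tom"), Score = c(97, 65, 96),
             stringsAsFactors = FALSE)
)

# anyDuplicated() gives 0 if every ID is distinct, else the index of the first repeat.
has_unique_id <- vapply(lst, function(x) anyDuplicated(x$ID) == 0L, logical(1))

summary_tbl <- data.frame(
  Data = paste0("df", seq_along(lst)),
  UniqueID = ifelse(has_unique_id, "Y", "N"),
  stringsAsFactors = FALSE
)
print(summary_tbl)
```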

how to get the variable list from each data frame in a list

I have a list of dfs. I would like to rename them df1, df2, df3, and then create a summary like the one below that captures the variables in each df. What should I do?
I tried to use map with setNames on the data frames in lst, but I must be doing it the wrong way: my current code sets the variable names to df1, df2, df3 instead. 😅
lst <- map(lst, ~
setNames(.x, str_c("df", seq_along(lst))))
sample data:
lst<-list(structure(list(ID = c("Tom", "Jerry", "Mary"), Score = c(85,
85, 96), Test = c("Y", "N", "Y")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(ID = c("Tom", "Jerry",
"Mary", "Jerry"), Score = c(75, 65, 88, 98), try = c("Y", NA,
"N", NA)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame")), structure(list(ID = c("Tom", "Jerry", "Tom"),
Score = c(97, 65, 96), weight = c("A", NA, "C")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame")))
We loop over the list, get the column names of each element with names/colnames, paste them into a single string with toString, convert that to a data.frame column, and bind the elements (_dfr).
library(purrr)
library(dplyr)
library(stringr)
setNames(lst, str_c("df", seq_along(lst))) %>%
map_dfr(~ tibble(Var = toString(names(.x))), .id = 'Data')
-output
# A tibble: 3 × 2
Data Var
<chr> <chr>
1 df1 ID, Score, Test
2 df2 ID, Score, try
3 df3 ID, Score, weight
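For comparison, here is a base R sketch of the same idea. It names the list elements themselves rather than the columns inside each data frame, which is the distinction the attempt in the question missed. The one-row data frames are a shortened stand-in for the sample data, and var_tbl is an illustrative name.

```r
# Shortened stand-in for the sample data: only the column names matter here.
lst <- list(
  data.frame(ID = "Tom", Score = 85, Test = "Y"),
  data.frame(ID = "Tom", Score = 75, try = "Y"),
  data.frame(ID = "Tom", Score = 97, weight = "A")
)

# Name the list elements (not the columns of each data frame).
names(lst) <- paste0("df", seq_along(lst))

# Collapse each data frame's column names into one comma-separated string.
var_tbl <- data.frame(
  Data = names(lst),
  Var = vapply(lst, function(x) toString(names(x)), character(1),
               USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(var_tbl)
```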

Generate column based if other columns are equal

What I want to do is generate a new column in a dataframe that meets these conditions:
dataframe1$var1 == dataframe2$var1 &
dataframe1$var2 == dataframe2$var2 &
dataframe1$var3 == dataframe2$var3
Basically I need to generate a dummy variable that has the value 1 if the conditions are met, and the value 0 if they are not.
I've tried the following code that doesn't work:
dataframe1$NewVar <- ifelse(dataframe1$var1 == dataframe2$var1 &
dataframe1$var2 == dataframe2$var2 & dataframe1$var3 == dataframe2$var3 , 1, 0)
Data
dput(df1)
structure(list(var1 = c("A", "B", "C"), var2 = c("X", "X", "X"
), var3 = c(1, 2, 2)), .Names = c("var1", "var2", "var3"), row.names = c(NA,
-3L), class = "data.frame")
dput(df2)
structure(list(var1 = c("A", "A", "C"), var2 = c("X", "X", "Y"
), var3 = c(1, 1, 1)), .Names = c("var1", "var2", "var3"), row.names = c(NA,
-3L), class = "data.frame")
btw my dataset is not as simple as the example I posted in the pictures.
I don't know if it's relevant but values in my variables (columns) would look like this:
var1: 24000000000
var2: 1234567
var3: 8
You can simply do,
as.integer(rowSums(df1 == df2) == ncol(df1))
#[1] 1 0 0
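If only some columns should be compared, or the result should be stored as a new column, the same row-wise test can be written column by column. This is a sketch that assumes both data frames have the same number of rows in the same order; cols and same are names introduced here.

```r
df1 <- data.frame(var1 = c("A", "B", "C"), var2 = c("X", "X", "X"),
                  var3 = c(1, 2, 2), stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("A", "A", "C"), var2 = c("X", "X", "Y"),
                  var3 = c(1, 1, 1), stringsAsFactors = FALSE)

cols <- c("var1", "var2", "var3")

# Compare the data frames column by column, then require every
# comparison within a row to be TRUE.
same <- Reduce(`&`, Map(`==`, df1[cols], df2[cols]))

# Store the dummy variable: 1 where all listed columns match, 0 otherwise.
df1$NewVar <- as.integer(same)
```

Only the first row matches in all three columns here, so NewVar is 1 for that row and 0 for the others.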

Convert column names to lower case while stacking data objects stored with different case styles

I have several data objects nested in a huge object, which I need to stack using rbind. However, before stacking them I need to convert the column names to lower case, since the data objects were stored with different case styles. How could I make this happen?
Toy data
df <- list(structure(list(a = 1:3, x = c(-1.99, -1.11, -0.34), y = c("C", "B", "A")), .Names = c("a", "x",
"y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L)), structure(list(a = 1:3, x = c(-0.44, -1.07,
-0.23)), .Names = c("A", "x"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L)), structure(list(
a = 1:3, x = c(-0.62, -0.60, -0.06
), y = c(3L, 2L, 1L)), .Names = c("a", "X", "y"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L)))
lapply(df, names)
rbind
data.table::rbindlist(df, fill=TRUE, idcol = TRUE)
Here is a solution using lapply. However, it creates a duplicate of the original list.
df_lower <- lapply(df, function(x) setNames(x, tolower(names(x))))
Design a function and use lapply to apply it to all the data frames. This changes all column names to lower case.
colname_fun <- function(dt){
dt <- setNames(dt, tolower(names(dt)))
return(dt)
}
lapply(df, colname_fun)
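Once the names are lower-cased, the stacking step still has to cope with data frames that lack some columns. The question uses data.table::rbindlist(fill = TRUE) for that; the sketch below is a dependency-free base R alternative on small made-up data, where dfs, lowered, padded and stacked are illustrative names.

```r
# Small made-up data with mixed-case names and a column missing from the first element.
dfs <- list(
  data.frame(a = 1:2, X = c(0.1, 0.2)),
  data.frame(A = 3:4, x = c(0.3, 0.4), Y = c("p", "q"), stringsAsFactors = FALSE)
)

# Step 1: lower-case the column names of every data frame.
lowered <- lapply(dfs, function(d) setNames(d, tolower(names(d))))

# Step 2: add any column a data frame lacks, filled with NA, in a common order.
all_cols <- unique(unlist(lapply(lowered, names)))
padded <- lapply(lowered, function(d) {
  for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
  d[all_cols]
})

# Step 3: stack with rbind.
stacked <- do.call(rbind, padded)
```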

Export data frames from list to txt file

I have a question about exporting data frames from a list to a txt file. I found some solutions, but they only work for vectors. Here is one example:
dataframe1 <- data.frame(a= c(1,2,3,4,5), b= c(1,1,1,1,1))
dataframe2 <- data.frame(a= c(5,5,5), b= c(1,1,1))
mylist <- list(dataframe1, dataframe2)
I would like that the txt file looks like this:
$dataframe1
a b
1 1
2 1
3 1
4 1
5 1
$dataframe2
a b
5 1
5 1
5 1
Thank you for the help.
Say your list is named:
mylist<-structure(list(dataframe1 = structure(list(a = c(1, 2, 3, 4,
5), b = c(1, 1, 1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-5L), class = "data.frame"), dataframe2 = structure(list(a = c(5,
5, 5), b = c(1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-3L), class = "data.frame")), .Names = c("dataframe1", "dataframe2"
))
You can try:
con<-file("temp.csv",open="at")
Map(function(x,y) {cat(file=con,y,"\n");write.table(x,file=con,quote=FALSE,row.names=FALSE)},
mylist,names(mylist))
close(con)
The above writes all the data frames to the file temp.csv, each preceded by its name. The list must be named for this to work.
Alternatively, if you are ok with the print method, you can just redirect the standard output to a file:
sink("temp.csv")
print(mylist)
sink(NULL)
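To get closer to the layout asked for, with a $name line before each table, the header can be written explicitly. This is a sketch that writes to a temporary .txt file; out_file is a name chosen here and any path could be used instead.

```r
mylist <- list(
  dataframe1 = data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 1, 1, 1, 1)),
  dataframe2 = data.frame(a = c(5, 5, 5), b = c(1, 1, 1))
)

out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "wt")
for (nm in names(mylist)) {
  # Header line with the list element's name, prefixed by "$".
  writeLines(paste0("$", nm), con)
  # The table itself, without quotes or row names.
  write.table(mylist[[nm]], file = con, quote = FALSE, row.names = FALSE)
}
close(con)

# Show what was written.
cat(readLines(out_file), sep = "\n")
```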
