I have a large number of dataframes in R. Now I want a readable output of all column names for each dataframe.
Let us say there are three dataframes A, B, C with different numbers of columns and different column names: c("Col1", "Col2","Col3"), c("Col4", "Col5") and c("Col6", "Col7", "Col8", "Col9", "Col10") respectively.
Now I want to have an output like this
Note: My intention is to later write it to a .csv file and split the column names as required (tab-separated or comma-separated)
Here's a stab.
df1 <- data.frame(a=1,b=2,c=3)
df2 <- data.frame(A=1,E=2)
df3 <- data.frame(quux=7,cronk=9)
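# ls.objects() is not a base R function; it is assumed to be a helper defined elsewhere
# that returns one row per object with a Type column.
# In base R, Filter(function(nm) is.data.frame(get(nm)), ls()) gives a similar vector of names.
dfnms <- rownames(subset(ls.objects(), Type %in% c("data.frame", "tbl_df", "data.table")))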
dfnms
# [1] "df1" "df2" "df3"
data.frame(name = dfnms, columns = sapply(mget(dfnms), function(x) paste(colnames(x), collapse = ",")))
# name columns
# df1 df1 a,b,c
# df2 df2 A,E
# df3 df3 quux,cronk
If you really need them double-quoted, then add dQuote (passing FALSE so that it emits plain ASCII quotes rather than fancy ones), as in
data.frame(name = dfnms, columns = sapply(mget(dfnms), function(x) paste(dQuote(colnames(x), FALSE), collapse = ",")))
# name columns
# df1 df1 "a","b","c"
# df2 df2 "A","E"
# df3 df3 "quux","cronk"
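Since the stated goal is to write the result to a .csv file later, here is a minimal sketch of that last step. It assumes the data frames are gathered in a named list by hand; the names dfs, summary_df and out_file are illustrative and not from the answers, and tempfile() merely stands in for a real output path.

```r
# Example data frames (same as in the answer above)
df1 <- data.frame(a = 1, b = 2, c = 3)
df2 <- data.frame(A = 1, E = 2)
df3 <- data.frame(quux = 7, cronk = 9)

# Assumption: the data frames of interest are collected in a named list
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# One row per data frame, column names collapsed into a single string
summary_df <- data.frame(
  name = names(dfs),
  columns = vapply(dfs, function(x) paste(colnames(x), collapse = ","),
                   character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical output location; replace with a real path as needed
out_file <- tempfile(fileext = ".csv")
write.csv(summary_df, out_file, row.names = FALSE)

# Read it back to check that the comma-separated names survive the round trip
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
```

write.csv quotes character fields by default, so the commas inside the columns field should not be mistaken for field separators when the file is read back.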
I am putting my work here, although it is similar to r2evans's solution.
Data
A <- data.frame(col1=1:2, col2=1:2, col3=1:2)
B <- data.frame(col4=1:2, col5=1:2)
C <- data.frame(col6=1:2, col7=1:2, col8=1:2, col9=1:2, col10=1:2)
Code
DataFrameName = c('A', 'B', 'C')
data.frame(DataFrameName = DataFrameName,
Columns = sapply(DataFrameName, function(x) paste(names(get(x)), collapse = ",")),
stringsAsFactors = FALSE)
Output
# DataFrameName Columns
# A A col1,col2,col3
# B B col4,col5
# C C col6,col7,col8,col9,col10
With tidyverse, we can collect the datasets in a named list with lst, loop over the list to turn each set of column names into a single string, convert the named list into a two-column tibble with enframe, and unnest the 'Columns' column.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble) # provides enframe()
lst(A, B, C) %>%
map(~ .x %>% names %>% toString) %>%
enframe(name = "DataFrameName", value = "Columns") %>%
unnest(c(Columns))
# A tibble: 3 x 2
# DataFrameName Columns
# <chr> <chr>
#1 A Col1, Col2, Col3
#2 B Col1, Col5
#3 C Col6, Col7, Col8, Col9, Col10
data
A <- data.frame(Col1 = 1:5, Col2 = 6:10, Col3 = 11:15)
B <- data.frame(Col1 = 1:5, Col5 = 6:10)
C <- data.frame(Col6 =1:5, Col7 = 6:10, Col8 = 6:10, Col9 = 7:11, Col10 = 11:15)
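If the column names need to be split apart later, a long layout with one row per column name may be easier to work with than a collapsed string. The following is a base R sketch of that idea, not taken from any of the answers; dfs, cols and long are illustrative names.

```r
# Example data in the spirit of the question
A <- data.frame(Col1 = 1:5, Col2 = 6:10, Col3 = 11:15)
B <- data.frame(Col4 = 1:5, Col5 = 6:10)
C <- data.frame(Col6 = 1:5, Col7 = 6:10, Col8 = 6:10, Col9 = 7:11, Col10 = 11:15)

# Assumption: the data frames are collected in a named list
dfs <- list(A = A, B = B, C = C)

# Column names per data frame, as a named list of character vectors
cols <- lapply(dfs, names)

# One row per (data frame, column) pair
long <- data.frame(
  DataFrameName = rep(names(cols), lengths(cols)),
  Column = unlist(cols, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

From this layout, the collapsed form can be recovered at any time, for example with tapply(long$Column, long$DataFrameName, paste, collapse = ",").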
Related
Dear all,
Summarizing my problem in a small example...
I want to append a row to a data.frame using a list of variables that have the same names as the data.frame columns, like this:
# creating a blank data.frame
df <- data.frame(matrix(ncol=3, nrow=0))
#naming the header
head <- c("col1", "col2", "col3")
# assigning header to data.frame
colnames(df) <- head
# creating three variables with the same name of header
col1 <- 1
col2 <- 2
col3 <- 3
#appending the row
rbind(df, list(col1, col2, col3))
The code runs, but df remains blank. I would like a result like this for df:
col1 col2 col3
1 2 3
Help me with this rbind.
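rbind() returns a new data frame rather than modifying df in place, so its result has to be assigned. If the column names do not carry over, you can reset them with the names() function:
# creating a blank data.frame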
df <- data.frame(matrix(ncol=3, nrow=0))
#naming the header
head <- c("col1", "col2", "col3")
# assigning header to data.frame
colnames(df) <- head
# creating three variables with the same name of header
col1 <- 1
col2 <- 2
col3 <- 3
#appending the row
df2 <- rbind(df, list(col1, col2, col3))
names(df2) <- c("col1", "col2", "col3")
df2
produces the output below
col1 col2 col3
1 2 3
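A variation that avoids the renaming step is to bind a one-row data frame whose column names already match. This is only a sketch under the assumption that all three columns are numeric; new_row is an illustrative name.

```r
# Empty data frame with typed, named columns
df <- data.frame(col1 = numeric(0), col2 = numeric(0), col3 = numeric(0))

# Values to append
col1 <- 1
col2 <- 2
col3 <- 3

# Build the new row with matching column names
new_row <- data.frame(col1 = col1, col2 = col2, col3 = col3)

# rbind() returns a new object, so assign the result back to df
df <- rbind(df, new_row)
df
```

Because both inputs carry the same names, rbind() matches the columns by name and the header is preserved.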
I have two dataframes: the first has 30 thousand rows, the second 571.
I need to filter the first one using two criteria from the second.
Criteria A: (fctr) DF1$Col1 == [i]DF2$Col1
Criteria B: (date) DF1$Col2 <= [i]DF2$Col2
df1 = data.frame(col1 = c("a","a","a","b","b","b","b","c","c"), col2 = c("10/02", "15/02", "14/03", "05/03", "07/03", "15/03", "20/03", "12/03", "15/03"))
df2 = data.frame(col1 = c("a","b","c"), col2 = c("15/02", "15/03", "15/03"))
I need something like this:
dataframe3 = filter(df1, col1 == [i]df2$col1 & col2 <= [i]df2$col2)
#or
for(i in df2$col1){
a=filter(df1, col1 ==i)
for(e in df2$col2){ #here is the problem, i don't want loop in all dates
b[]=filter(a, col2 <=e)}
If I understand correctly, this should do what you're looking for:
# Add year to make the columns readable as dates
df1$col2 <- as.Date(paste0(df1$col2, "/2019"), format = "%d/%m/%Y")
df2$col2 <- as.Date(paste0(df2$col2, "/2019"), format = "%d/%m/%Y")
# Create df3 such that col1 matches in both dataframes
df3 <- merge(df1, df2, by = "col1", all.y = TRUE)
# Keep rows where df1$col2 <= df2$col2
df3 <- df3[df3$col2.x <= df3$col2.y, c("col1", "col2.x")]
# Rename columns
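# setnames() comes from data.table; the base R equivalent below needs no extra package
names(df3)[names(df3) == "col2.x"] <- "col2"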
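The same filter can also be sketched without a merge by looking up each row's cutoff date with match(). This is an alternative to the approach above, not the answerer's method; the year 2019 is the same assumption the answer makes, and cutoff is an illustrative name.

```r
df1 <- data.frame(
  col1 = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  col2 = c("10/02", "15/02", "14/03", "05/03", "07/03", "15/03", "20/03", "12/03", "15/03"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c("15/02", "15/03", "15/03"),
  stringsAsFactors = FALSE
)

# Assumption: all dates fall in 2019, so a year is appended to make them parseable
df1$col2 <- as.Date(paste0(df1$col2, "/2019"), format = "%d/%m/%Y")
df2$col2 <- as.Date(paste0(df2$col2, "/2019"), format = "%d/%m/%Y")

# For every row of df1, find the cutoff date of its group in df2
cutoff <- df2$col2[match(df1$col1, df2$col1)]

# Keep rows whose group exists in df2 and whose date is on or before the cutoff
df3 <- df1[!is.na(cutoff) & df1$col2 <= cutoff, ]
df3
```

This relies on df2 having one row per value of col1; with duplicated keys, match() would only use the first occurrence.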
I would like to "copy paste" one column's values from df A underneath the values of a column in df B.
Below I've visualized what I'm trying to achieve
One option is to use bind_rows on the selected column after converting it to the same type as the target column
library(dplyr)
bind_rows(df2, df1[1] %>%
transmute(ColumnC = as.character(ColumnA)))
# ColumnC ColumnD
#1 a b
#2 1 <NA>
#3 2 <NA>
#4 3 <NA>
data
df1 <- data.frame(ColumnA = 1:3, ColumnB = 4:6)
df2 <- data.frame(ColumnC = 'a', ColumnD = 'b',
stringsAsFactors = FALSE)
You can also use base R for this. What you effectively want is a full outer join of the first column of df1 with df2:
df1 <- data.frame(1:3, 4:6)
names(df1) <- paste0("c", 1:2)
df2 <- data.frame("a", "b")
names(df2) <- paste0("c", 3:4)
# renaming column to join on
names(df2)[1] <- "c1"
merge(x = df1[,1,drop=FALSE], y = df2, by.y = c("c1"), all = TRUE)
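For comparison, here is a base R sketch that appends the values directly with rbind(), which keeps the original row order (merge() sorts on the key column by default). The column names follow the dplyr answer's example data; extra and result are illustrative names.

```r
df1 <- data.frame(ColumnA = 1:3, ColumnB = 4:6)
df2 <- data.frame(ColumnC = "a", ColumnD = "b", stringsAsFactors = FALSE)

# Rows to append: ColumnA converted to character, ColumnD left missing
extra <- data.frame(
  ColumnC = as.character(df1$ColumnA),
  ColumnD = NA_character_,
  stringsAsFactors = FALSE
)

# Stack the new rows underneath df2
result <- rbind(df2, extra)
result
```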
I have 20 data frames and in each of them I want to format the same column in the same way. Of course, I could make a list of the dfs and then use lapply. Instead, my goal is to modify the dfs such that in the end I do not have to access them as elements of a list but as dfs. Here is an example:
df1 <- data.frame(col1 = rnorm(5), col2 = rnorm(5))
df2 <- data.frame(col1 = rnorm(5), col2 = rnorm(5))
Now, suppose I want to add 1 to every value of col1 in df1 and df2. Of course, I could do
df_list <- lapply(list(df1, df2), function(df) {
df$col1 <- df$col1 + 1
return(df)
})
But now df1 returns the original df instead of the modified one. How to do it?
One option based on the OP's code would be to use list2env after naming the list elements
names(df_list) <- paste0("df", 1:2)
list2env(df_list, envir = .GlobalEnv)
If we need to avoid creating the list (it is recommended to have a list of datasets instead of creating individual objects in the global environment), then use assign with for loop
for(obj in paste0('df', 1:2)) {
assign(obj, `[<-`(get(obj), 'col1', value = get(obj)[['col1']] +1))
}
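The list2env route can be sketched end to end as follows, with mget() collecting the data frames by name so that the list is named from the start. The fixed input values are only there to make the result easy to check; nms and updated are illustrative names.

```r
df1 <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))
df2 <- data.frame(col1 = c(10, 20, 30), col2 = c(7, 8, 9))

# Names of the data frames to modify
nms <- c("df1", "df2")

# mget() returns a named list, so no separate naming step is needed
updated <- lapply(mget(nms, envir = environment()), function(df) {
  df$col1 <- df$col1 + 1
  df
})

# Write the modified copies back over the originals in the current environment
invisible(list2env(updated, envir = environment()))

df1$col1
df2$col1
```

Using environment() instead of .GlobalEnv keeps the sketch working when it runs inside a function; at top level both refer to the global environment.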
You could use a hack from @g-grothendieck in this question:
http://stackoverflow.com/questions/1826519/how-to-assign-from-a-function-which-returns-more-than-one-value
and do this:
list[df1, df2] <- lapply(list(df1, df2), function(df) {
df$col1 <- df$col1 + 1
return(df)
})
the hack
list <- structure(NA,class="result")
"[<-.result" <- function(x,...,value) {
args <- as.list(match.call())
args <- args[-c(1:2,length(args))]
length(value) <- length(args)
for(i in seq(along=args)) {
a <- args[[i]]
if(!missing(a)) eval.parent(substitute(a <- v,list(a=a,v=value[[i]])))
}
x
}
full code and results
df1 <- data.frame(col1 = rnorm(5), col2 = rnorm(5))
# col1 col2
# 1 -0.5451934 0.5043287
# 2 -1.4047701 -0.1184588
# 3 0.1745109 0.8279085
# 4 -0.5066673 -0.3269411
# 5 0.4838625 -0.3895784
df2 <- data.frame(col1 = rnorm(5), col2 = rnorm(5))
# col1 col2
# 1 0.4168078 -0.44654445
# 2 -1.9991098 -0.06179699
# 3 -1.0625996 1.21098946
# 4 0.4977718 0.45834008
# 5 -1.6181048 0.97917877
list[df1, df2] <- lapply(list(df1, df2), function(df) {
df$col1 <- df$col1 + 1
return(df)
})
# > df1
# col1 col2
# 1 0.4548066 0.5043287
# 2 -0.4047701 -0.1184588
# 3 1.1745109 0.8279085
# 4 0.4933327 -0.3269411
# 5 1.4838625 -0.3895784
# > df2
# col1 col2
# 1 1.41680778 -0.44654445
# 2 -0.99910976 -0.06179699
# 3 -0.06259959 1.21098946
# 4 1.49777179 0.45834008
# 5 -0.61810483 0.97917877
You could avoid the function (and its temporary environment) with a loop like this:
df1 <- data.frame(col1 = 1:5, col2 = rnorm(5))
df2 <- data.frame(col1 = rep(0, 5), col2 = rnorm(5))
df1 # before
for (d in c("df1", "df2")) {
eval(parse(text = paste(d, "[['col1']] <- ", d, "[['col1']] + 1")))
}
df1 # after
Option 2:
df1 <- data.frame(col1 = 1:5, col2 = rnorm(5))
df2 <- data.frame(col1 = rep(0, 5), col2 = rnorm(5))
df1 # before
df2 # before
eval(parse(text = unlist(lapply(c("df1", "df2"), function(x) {
expr.dummy <- quote(df$col1 <- df$col1 +1) # df will be replaced by df1, df2
gsub("df", x, deparse(expr.dummy))
}))))
df1 # after
df2 # after
This may be a bad question because I am not posting any reproducible example. My main goal is to identify columns that are of different types between two dataframe that have the same column names.
For example
df1
Id Col1 Col2 Col3
Numeric Factor Integer Date
df2
Id Col1 Col2 Col3
Numeric Numeric Integer Date
Here both the dataframes (df1, df2) have same column names but the Col1 type is different and I am interested in identifying such columns. Expected output.
Col1 Factor Numeric
Any suggestions or tips on achieving this? Thanks
Try compare_df_cols() from the janitor package:
library(janitor)
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)
compare_df_cols(mtcars, mtcars2, return = "mismatch")
#> column_name mtcars mtcars2
#> 1 cyl numeric character
Self-promotion alert, I authored this package - am posting this function because it exists to solve precisely this problem.
Try this:
compareColumns <- function(df1, df2) {
commonNames <- names(df1)[names(df1) %in% names(df2)]
data.frame(Column = commonNames,
# df[commonNames] stays a data frame even when only one column is shared
df1 = sapply(df1[commonNames], class),
df2 = sapply(df2[commonNames], class)) }
For a more compact method, you could use a list with sapply(). Efficiency shouldn't be a problem here since all we're doing is grabbing the class. Here I add data frame names to the list to create a more clear output.
m <- sapply(list(df1 = df1, df2 = df2), sapply, class)
m[m[, "df1"] != m[, "df2"], , drop = FALSE]
# df1 df2
# Col1 "factor" "character"
where df1 and df2 are the data from @ycw's answer.
If two data frames have the same column names, the code below returns the columns whose classes differ.
library(dplyr)
m1 = mtcars
m2 = mtcars %>% mutate(cyl = factor(cyl), vs = factor(vs))
out = cbind(sapply(m1, class), sapply(m2, class))
out[apply(out, 1, function(x) !identical(x[1], x[2])), ]
We can use sapply with class to loop through all columns in df1 and df2. After that, we can compare the results.
# Create example data frames
df1 <- data.frame(ID = 1:3,
Col1 = as.character(2:4),
Col2 = 2:4,
Col3 = as.Date(paste0("2017-01-0", 2:4)),
stringsAsFactors = TRUE) # needed on R >= 4.0.0, where the default is FALSE
df2 <- data.frame(ID = 1:3,
Col1 = as.character(2:4),
Col2 = 2:4,
Col3 = as.Date(paste0("2017-01-0", 2:4)),
stringsAsFactors = FALSE)
# Use sapply and class to find out all the class
class1 <- sapply(df1, class)
class2 <- sapply(df2, class)
# Combine the results, then filter for rows that are different
result <- data.frame(class1, class2, stringsAsFactors = FALSE)
result[!(result$class1 == result$class2), ]
# class1 class2
# Col1 factor character
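One caveat with sapply(df, class) is that some columns carry more than one class (a POSIXct column has both "POSIXct" and "POSIXt"), in which case sapply returns a list and the comparison above no longer works as written. The following sketch collapses each column's classes into a single string first; col_classes and compare_classes are hypothetical helper names, not functions from any package.

```r
# Collapse each column's class vector into one string, e.g. "POSIXct/POSIXt"
col_classes <- function(df) {
  vapply(df, function(x) paste(class(x), collapse = "/"), character(1))
}

# Return the shared columns whose classes differ between the two data frames
compare_classes <- function(a, b) {
  common <- intersect(names(a), names(b))
  ca <- unname(col_classes(a[common]))
  cb <- unname(col_classes(b[common]))
  differs <- ca != cb
  data.frame(column = common[differs],
             df1 = ca[differs],
             df2 = cb[differs],
             stringsAsFactors = FALSE)
}

# Illustrative data: "when" and "x" differ in class, "id" does not
a <- data.frame(id = 1:3,
                when = as.POSIXct(c("2017-01-02", "2017-01-03", "2017-01-04"), tz = "UTC"),
                x = c("a", "b", "c"),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3,
                when = as.Date(c("2017-01-02", "2017-01-03", "2017-01-04")),
                x = factor(c("a", "b", "c")))

res <- compare_classes(a, b)
res
```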