How to test if column names of multiple dataframes are same - r

Say I have 10 dataframes. I would like to check if all have same column names irrespective of their cases.
I can do this in multiple steps, but I was wondering if there is a shortcut way to do this?

Place the data frames in a list, loop over it with lapply, take each set of column names, convert them to a single case and sort them, then check that only one unique set remains:
length(unique(lapply(lst1, function(x) sort(toupper(names(x)))))) == 1
#[1] TRUE
data
lst1 <- list(mtcars, mtcars, mtcars)

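You can use Reduce + intersect to get the column names common to every data frame in the list and compare them with the names of any single data frame in the list. Using identical rather than all(... == ...) avoids a false TRUE when the intersection is empty. Note that this check is case-sensitive, and that it only confirms that every column of the first data frame occurs in all the others; a data frame with extra columns still passes.
identical(sort(Reduce(intersect, lapply(list_df, names))), sort(names(list_df[[1]])))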
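A few small inputs make the edge cases concrete. The following is a minimal sketch, not a definitive implementation: `same_colnames` is a hypothetical helper name, and the toy data frames `a`, `b`, `c_extra` and `d` are invented for illustration. It compares the sorted, upper-cased name vectors as whole sets, so differences in case and column order are ignored while extra or missing columns are detected.

```r
# Hypothetical helper: TRUE when every data frame has the same set of
# column names, ignoring case and column order.
same_colnames <- function(dfs) {
  nms <- lapply(dfs, function(x) sort(toupper(names(x))))
  length(unique(nms)) == 1
}

# Toy inputs (invented for illustration)
a       <- data.frame(x = 1, y = 2)
b       <- data.frame(Y = 3, X = 4)         # same names, other case and order
c_extra <- data.frame(x = 1, y = 2, z = 3)  # one extra column
d       <- data.frame(q = 1)                # nothing in common with a

same_colnames(list(a, b))        # TRUE
same_colnames(list(a, c_extra))  # FALSE
same_colnames(list(a, d))        # FALSE
```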

Related

What is happening during assignment to a dataframe by lapply

Given a dataframe df and a function f which is applied to df:
df[] <- lapply(df, f)
What is the magic R is performing to replace columns in df with collection of vectors in the list from lapply? I see that the result from lapply is a list of vectors having the same names as the dataframe df. I assume some magic mapping is being done to map the vectors to df[], which is the collection of columns in df (methinks). Just works? Trying to better understand so that I remember what to use the next time.
A data.frame is essentially a list of vectors that all have the same length. You can check this with is.list(a_data_frame), which returns TRUE.
[] can mean different things depending on the object it is applied to. It is in fact a function, so it can even be redefined.
On a data.frame, [] lets you extract or replace columns:
df[1] gets the first column (as a one-column data.frame)
df[1] <- 2 replaces the first column with 2 (recycled to the same length as the other columns)
df[] returns the whole data.frame
df[] <- list(c1,c2,c3) replaces the contents of the data.frame column by column, while df itself remains a data.frame
There are also many other ways to access or set data in a data.frame (by column name, by subset of rows or columns, ...)
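The difference between assigning with and without [] can be seen on a small example. This is a sketch under stated assumptions: the data frame `df` and the doubling function `f` below are made up for illustration.

```r
df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))  # toy data
f  <- function(col) col * 2                      # hypothetical column-wise function

res <- lapply(df, f)   # a plain named list, one element per column
class(res)             # "list"

df2 <- df
df2[] <- res           # fills the existing columns; df2 stays a data.frame
class(df2)             # "data.frame"

df3 <- df
df3 <- res             # without []: df3 is now just the list
class(df3)             # "list"
```

So df[] <- lapply(df, f) is a way of saying "keep the container, replace what is inside it".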

Convert structure of columns in list of dataframes to character in R

I'm creating an empty list of dataframes that I will append later using lapply.
library(tidyverse)
library(dplyr)
library(purrr)
my.list <- lapply(1:192, function(x, nr = 468, nc = 1) { data.frame(symbol = matrix(nrow=nr, ncol=nc)) })
str(my.list)
If you obtain the structure of my.list you will notice that the structure of the columns within each dataframe is "logical". I would like the structure of the column in each dataframe to be character rather than logical.
Can I change anything within my lapply function above so that the columns in the resulting list of dataframes are character? Or how best would I go about this task? I'm creating this empty list of dataframes because I understand that R works faster if it doesn't have to constantly append files. Thus my next step is to perform a map function to populate each dataframe in this list of dataframes with character data.
The issue is that a bare NA is logical by default (NA_logical_). To get a character column, use NA_character_ instead. The existing list can be fixed with
my.list <- lapply(my.list, function(x) {x[] <- lapply(x, as.character); x})
Or while creating the data.frame column, use
my.list <- lapply(1:192, function(x) data.frame(symbol = rep(NA_character_, 468)))
Going through matrix to get a single-column data.frame is not ideal and can give wrong results (a matrix can hold only one type, whereas the columns of a data.frame can differ in type). The easiest option is to repeat NA_character_ n times to create a single-column data.frame with n rows.
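The effect of the NA type can be checked directly. The sketch below uses small illustrative sizes (`n_frames`, `n_rows`) instead of the 192 and 468 from the question. Passing stringsAsFactors = FALSE is an added precaution, on the assumption that the code might run on an R version older than 4.0, where character columns became factors by default.

```r
n_frames <- 3  # illustrative; the question uses 192
n_rows   <- 4  # illustrative; the question uses 468

# A bare NA gives a logical column
logical_list <- lapply(seq_len(n_frames), function(i) {
  data.frame(symbol = rep(NA, n_rows))
})

# NA_character_ gives a character column
char_list <- lapply(seq_len(n_frames), function(i) {
  data.frame(symbol = rep(NA_character_, n_rows), stringsAsFactors = FALSE)
})

class(logical_list[[1]]$symbol)  # "logical"
class(char_list[[1]]$symbol)     # "character"
```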

How do I pass a data frame as an argument to a function?

T12 is a data frame with 22 columns (I only want columns 2 to 8) and about one million rows.
Some of the entries in the first column are NA. Every time there is an NA in the first column, complete.cases deletes the whole row. Everything works well.
I have many more data frames and I don't want to rewrite the code for each one.
I would like a function like the one below, to which I can pass T12, T13, T14, T15 and so on as x.
Could you help me?
split <- function (x){
x <- x[,2:8]
x <- x[complete.cases(x[ ,1]),]
}
If your data frames are named "T12", "T13", etc., you can use the pattern "T" followed by a number to collect their names in a character vector with ls. Anchoring the pattern with ^ and $ keeps it from picking up other objects whose names merely contain such a sequence.
mget then fetches the data frames named in that vector as a named list.
You can then use lapply to apply the split function to each element of the list.
new_data <- lapply(mget(ls(pattern = '^T\\d+$')), split)
new_data is a list of data frames. If you want the changes reflected in the original data frames, use list2env.
list2env(new_data, .GlobalEnv)
PS - split is already a function in base R, so it is better to give your function a different name.
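Put together, the steps above might look like the following sketch. Several things here are assumptions made for illustration: the function name `keep_complete`, the toy 9-column data built by the hypothetical helper `toy`, and the use of a separate environment `env` instead of the global environment so that the example does not touch your workspace.

```r
# Hypothetical name, chosen to avoid masking base::split
keep_complete <- function(x) {
  x <- x[, 2:8]                # keep columns 2 to 8
  x[complete.cases(x[, 1]), ]  # drop rows with NA in the first kept column
}

# Toy data: 3 rows, 9 columns; column 2 is set by the caller
toy <- function(second_col) {
  d <- as.data.frame(matrix(1:27, nrow = 3, ncol = 9))
  d[[2]] <- second_col
  d
}

env <- new.env()
assign("T12", toy(c(1, NA, 3)), envir = env)
assign("T13", toy(c(NA, NA, 5)), envir = env)
assign("Tother", toy(c(1, 2, 3)), envir = env)  # must not be matched

nms <- ls(envir = env, pattern = "^T[0-9]+$")   # "T12" "T13"
new_data <- lapply(mget(nms, envir = env), keep_complete)

# Optionally write the cleaned data frames back over the originals
list2env(new_data, envir = env)
```

Returning the subset explicitly as the last expression, rather than ending on an assignment, makes the function's return value visible when it is called interactively.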

Searching for a list of string in a dataframe in R

I have a list of names and a data.frame with many columns. How can I retrieve the rows of the data frame whose row name is one of the names in my list?
For example, the row names of my data frame include TC09001536.hg.1, TC03002852.hg.1, and TC18000664.hg.1, which are saved in a list called Top.list.
Assuming my data frame is called df, I tried:
test <- df[grep(Top.list, df$cluster_id),]
to look within the cluster_id column and return the whole row wherever it matches a name in my list.
This should work:
test <- df[unlist(lapply(Top.list, function(x) grep(x, df$cluster_id, fixed = TRUE))),]
Your attempt fails because grep accepts a single pattern: given several, it uses only the first one (with a warning). The lapply(Top.list, function(x) grep(x, df$cluster_id, fixed = TRUE)) part generates a list holding, for each of your names, a vector of matching row numbers; unlist combines these into one vector, which is then used to subset your data frame.
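If the names must match in full rather than as substrings, %in% is a simpler alternative to the grep approach (a different technique, not the one in the answer above). The sketch below is built on invented data: the `score` column and the ID TC00000001.hg.1 are hypothetical, and Top.list is assumed to be a character vector (wrap it in unlist() if it is a true list).

```r
df <- data.frame(
  cluster_id = c("TC09001536.hg.1", "TC00000001.hg.1",  # second ID is made up
                 "TC03002852.hg.1", "TC18000664.hg.1"),
  score = c(0.1, 0.2, 0.3, 0.4),                         # hypothetical column
  stringsAsFactors = FALSE
)
Top.list <- c("TC09001536.hg.1", "TC03002852.hg.1", "TC18000664.hg.1")

# Keep the rows whose cluster_id is exactly one of the wanted names
test <- df[df$cluster_id %in% Top.list, ]
nrow(test)  # 3
```

If the identifiers are stored as row names rather than in a column, the same idea applies with rownames(df) %in% Top.list.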

List elements to dataframes in R

How would I go about taking elements of a list and making them into dataframes, with each dataframe name consistent with the list element name?
Ex:
exlist <- list(west=c(2,3,4), north=c(2,5,6), east=c(2,4,7))
Where I'm tripping up is in the actual naming of the unique dataframes -- I can't figure out how to do this with a for() loop or with lapply:
for(i in exlist) {
i <- data.frame(exlist$i)
}
gives me an empty dataframe called i, whereas I'd expect three dataframes to be made (one called west, another called north, and another called east)
When I use lapply syntax and call the individual list element name, I get empty dataframes:
lapply(exlist, function(list) i <- data.frame(list["i"]))
yields
data frame with 0 columns and 0 rows
> $west
list..i..
1 NA
$north
list..i..
1 NA
$east
list..i..
1 NA
If you want to convert your list elements to data.frames, you can try either
lapply(exlist, as.data.frame)
Or (as suggested by @Richard), depending on your desired output:
lapply(exlist, as.data.frame.list)
It is always recommended to keep multiple data frames in a list rather than polluting your global environment, but if you insist on doing this, you could use list2env (don't do this), such as:
list2env(lapply(exlist, as.data.frame.list), .GlobalEnv)
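This should create the three objects you want. Note that setNames has to be applied before assign; otherwise the renamed copy is discarded and the stored data frames keep their default column names:
df.names <- "value" ## vector with column names here
for (i in names(exlist)) assign(i, setNames(data.frame(exlist[[i]]), df.names))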
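For comparison, here is a minimal sketch of the list-based approach, which keeps the data frames together instead of creating separate objects. The column name `value` follows the snippet above; the variable name `dfs` is an assumption made for illustration.

```r
exlist <- list(west = c(2, 3, 4), north = c(2, 5, 6), east = c(2, 4, 7))

# One data frame per list element; lapply keeps the element names
dfs <- lapply(exlist, function(v) data.frame(value = v))

names(dfs)  # "west" "north" "east"
dfs$west    # a data frame with 3 rows and one column, value

# Iterate over names, not values, when the name is needed inside a loop
for (nm in names(dfs)) {
  cat(nm, "has", nrow(dfs[[nm]]), "rows\n")
}
```

Each data frame is then reached by name (dfs$west, dfs[["north"]]), which is what the original for loop was trying to achieve with exlist$i.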
