Getting the names of data frames from a list in R
I have a list which contains 36 data frames. I want to create a list containing the names of all those data frames:
dput(myfiles[1:2])
list(structure(list(X.Treatment.1.Treatment.10.Treatment.2.Treatment.3.Treatment.4.Treatment.5.Treatment.6.Treatment.7.Treatment.8.Treatment.9 = c("Treatment.1,1,0.779269898976048,0.987582177817029,0.999865208543176,0.999637376053903,0.969316946773183,0.992798203986959,0.424960684181985,0.804869101320034,0.934784678841289",
"Treatment.10,0.779269898976048,1,0.671138248567996,0.789454098761072,0.762111859396959,0.909408486972833,0.848734212632234,-0.236126723371631,0.255300504533133,0.505840502482398",
"Treatment.2,0.987582177817029,0.671138248567996,1,0.984869671366683,0.991454531822078,0.918661911614817,0.961649044703906,0.561895346303209,0.888107698459535,0.978982111839266",
"Treatment.3,0.999865208543176,0.789454098761072,0.984869671366683,1,0.99906051831384,0.973222174821046,0.994631289318653,0.410041249133801,0.795017057233326,0.9288266084351",
"Treatment.4,0.999637376053903,0.762111859396959,0.991454531822078,0.99906051831384,1,0.962346166096083,0.989212254209048,0.449182113577399,0.820557713571369,0.944010924367408",
"Treatment.5,0.969316946773183,0.909408486972833,0.918661911614817,0.973222174821046,0.962346166096083,1,0.991784351747349,0.189407610662142,0.634294194129571,0.81878574572229",
"Treatment.6,0.992798203986959,0.848734212632234,0.961649044703906,0.994631289318653,0.989212254209048,0.991784351747349,1,0.31345701514879,0.72797778020465,0.885498274066011",
"Treatment.7,0.424960684181985,-0.236126723371631,0.561895346303209,0.410041249133801,0.449182113577399,0.189407610662142,0.31345701514879,1,0.879237827530393,0.718791431723663",
"Treatment.8,0.804869101320034,0.255300504533133,0.888107698459535,0.795017057233326,0.820557713571369,0.634294194129571,0.72797778020465,0.879237827530393,1,0.963182415401058",
"Treatment.9,0.934784678841289,0.505840502482398,0.978982111839266,0.9288266084351,0.944010924367408,0.81878574572229,0.885498274066011,0.718791431723663,0.963182415401058,1"
)), class = "data.frame", row.names = c(NA, -10L)), structure(list(
X.Treatment.1.Treatment.10.Treatment.2.Treatment.3.Treatment.4.Treatment.5.Treatment.6.Treatment.7.Treatment.8.Treatment.9 = c("Treatment.1,1,NA,NA,NA,NA,NA,NA,NA,NA,NA",
"Treatment.10,NA,1,NA,NA,NA,NA,NA,NA,NA,NA", "Treatment.2,NA,NA,1,NA,NA,NA,NA,NA,NA,NA",
"Treatment.3,NA,NA,NA,1,NA,NA,NA,NA,NA,NA", "Treatment.4,NA,NA,NA,NA,1,NA,NA,NA,NA,NA",
"Treatment.5,NA,NA,NA,NA,NA,1,NA,NA,NA,NA", "Treatment.6,NA,NA,NA,NA,NA,NA,1,NA,NA,NA",
"Treatment.7,NA,NA,NA,NA,NA,NA,NA,1,NA,NA", "Treatment.8,NA,NA,NA,NA,NA,NA,NA,NA,1,NA",
"Treatment.9,NA,NA,NA,NA,NA,NA,NA,NA,NA,1")), class = "data.frame", row.names = c(NA,
-10L)))
I want a list containing all the names of the data frames. The problem is that when I write:
names(list_median)[i]
It just returns NULL. Each data frame in the list is a correlation matrix that looks like this.
I am not sure whether this is what you are after:
mat_names <- lapply(list_median, \(x) do.call(cbind, dimnames(x)))
mat_names <- lapply(mat_names, \(x) {colnames(x) <- c("Rows", "Cols"); x})
Here is a possible explanation of why you are running into issues. The code is commented:
# Extract each data frame into the global environment with this code:
for (i in seq_along(list_median))
  assign(paste0("df", i), list_median[[i]])
# You should now see df1, df2, etc. in the Environment.
# Now construct a list from a few of them, e.g. df1 and df2:
my_list <- list(df1, df2)
# Now try to get the names:
names(my_list)
# You will get NULL.
# Now name the data frames when building the list, and ask for the names again:
my_list <- list(df1nownamed = df1, df2nownamed = df2)
names(my_list)
# and you will get:
# [1] "df1nownamed" "df2nownamed"
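If the list already exists without names, they can also be attached afterwards. The following is a minimal sketch, assuming two small stand-in data frames and hypothetical labels (`median_a`, `median_b`); in practice the labels would be whatever identifies each data frame, for example the names of the files they were read from.

```r
# Two small stand-in data frames (hypothetical, not the data from the question)
df_a <- data.frame(x = 1:2)
df_b <- data.frame(x = 3:4)

# An unnamed list: names() has nothing to return
unnamed <- list(df_a, df_b)
is.null(names(unnamed))

# Attach names after the fact; the labels are hypothetical
labels <- c("median_a", "median_b")
named <- setNames(unnamed, labels)

# The names are now available as a character vector, or as a list if needed
name_vec  <- names(named)
name_list <- as.list(names(named))
```

`setNames()` returns a renamed copy, so `unnamed` itself is left untouched; `names(unnamed) <- labels` would modify the list in place instead.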
Related
Loop over a list of dataframes and change column names in R
I have a list of data frames in which some data frames are a bit messed up with column names. My intention is to loop over the list, identify the data frames whose column names are messed up, then delete those column names and use the first row as the column names instead. This is a sample of my data frames:
dput(df)
structure(list(v1 = c("Silva", "Brandon", "Mango"), v2 = c("James","Jane", "Egg")), class = "data.frame", row.names = c(NA, -3L))
dput(df2)
structure(list(X2 = c("v1", "Brandon", "Mango"), X..X1 = c("v2","Jane", "Egg")), class = "data.frame", row.names = c(NA, -3L))
In df2 the real column names appear as the first row. I need to loop through the list, see which data frames have messed-up column names like df2, then delete those column names and replace them with the first row. This is what I tried:
dflist <- list(df,df2)
remNames <- c("X2", "X..x1")
dflist <- c()
for (i in 1:length(dflist)) {
  if(dflist[[i]][names(dflist[[i]])] == remNames){
    colnames(dflist[[i]]) <- dflist[[i]][1,]
    dflist[[i]] = dflist[[i]][-1, ]
  }
}
This doesn't work. What am I missing? My expected output is a list of data frames that all have the same column names, which are supposed to be V1 and V2.
dflist <- list(df,df2)
for (i in 1:length(dflist)) {
  if(any(names(dflist[[i]]) == remNames)){
    colnames(dflist[[i]]) <- dflist[[i]][1,]
    dflist[[i]] = dflist[[i]][-1, ]
  }
}
dflist[[i]][names(dflist[[i]])] == remNames compares every cell of the data frame with remNames, so it yields a matrix of FALSE values rather than a single TRUE and nothing happens. Consider the following example when i=2:
> i=2
> dflist[[i]][names(dflist[[i]])] == remNames
        X2 X..X1
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
A better solution is to use grepl to check whether the column names contain .. or X, so the if becomes
if(any(grepl('\\.\\.|X',names(dflist[[i]])))){...}
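Put together, the grepl variant could look like the sketch below. It rebuilds the two sample data frames from the question and assumes, as the answer does, that a column name containing `..` or `X` marks a data frame whose real header sits in its first row. The `unlist()` call and the row-name reset are additions for illustration, not part of the original answer.

```r
# Sample data frames from the question
df  <- data.frame(v1 = c("Silva", "Brandon", "Mango"),
                  v2 = c("James", "Jane", "Egg"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X2 = c("v1", "Brandon", "Mango"),
                  X..X1 = c("v2", "Jane", "Egg"),
                  stringsAsFactors = FALSE)
dflist <- list(df, df2)

for (i in seq_along(dflist)) {
  cur <- dflist[[i]]
  # Assumption: ".." or "X" in a column name signals a misplaced header
  if (any(grepl("\\.\\.|X", names(cur)))) {
    # Promote the first row to column names (as a plain character vector)
    names(cur) <- unlist(cur[1, ], use.names = FALSE)
    # Drop that row and renumber the remaining rows
    cur <- cur[-1, , drop = FALSE]
    rownames(cur) <- NULL
    dflist[[i]] <- cur
  }
}
```

Note that the pattern also matches any legitimate column name containing a capital X, so a stricter pattern may be needed for real data.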
converting list variables within data frame to data frame in R
I have read data from a sav (SPSS) file using the following code:
library(foreign)
test <- read.spss(path_to_file, to.data.frame = TRUE)
The resultant data frame is in the following format:
structure(list(srl = c(4096, 15024, 4094), mem_id = c(278812, 2341700, 251337), q1 = c(2, 2, 1)), row.names = c(NA, 3L), class = "data.frame")
While the object test is a data frame, each of the columns is rendered as a list. I tried the following to convert it:
dd <- data.frame(srl = unlist(df$srl), mem_id = unlist(df$mem_id), q1 = unlist(df$q1))
Still, the resultant data frame is in the same format as given in the dput.
We cannot reproduce this and run it to check whether it works, but why don't you try:
lst <- lst[-c(4,5)]
and then
new_lst <- as.data.frame(lst)
where lst is the name of your list. I suggest removing the 4th and 5th elements because you probably won't need them in a data frame.
How do I reorder columns for all data frames in a list in R?
I already have a list of data frames (mylist) and need to switch the first and second column for all the data frames in the list.
Test data frame in list:
[reads] [phylum]
1       phylum1
2       phylum2
3       phylum3
Into:
[phylum] [reads]
phylum1  1
phylum2  2
phylum3  3
I know I need to use lapply, but I'm not sure what to input for FUN=:
mylist <- lapply(mylist, FUN = mylist[ ,c("phylum", "reads")])
This errors, saying "incorrect number of dimensions". Sorry if this is a simple question and thanks in advance for your help! -Brand new R user
The FUN argument asks for a function that can be applied to every element in the list. You are passing mylist[ ,c("phylum", "reads")], which is not a function.
# sample data
df1 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df2 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df3 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df4 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
ldf <- list(df1,df2,df3,df4)
ldf_re <- lapply(ldf, FUN = function(X){X[c('phylum', 'reads')]})
In the last line, lapply iterates through all the data frames, passes each one as the X argument to the function defined in FUN, and stores the resulting data frames in the list ldf_re with their columns rearranged.
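If the column names are not known in advance, the swap can also be done by position. This is only a sketch: `swap_first_two` is a hypothetical helper, and it assumes every data frame in the list has at least two columns.

```r
# Stand-in list shaped like the one described in the question
mylist <- list(
  data.frame(reads = 1:3, phylum = c("phylum1", "phylum2", "phylum3")),
  data.frame(reads = 4:6, phylum = c("phylum4", "phylum5", "phylum6"))
)

# Hypothetical helper: put column 2 first, column 1 second,
# and keep any remaining columns in their original order
swap_first_two <- function(d) d[c(2, 1, seq_along(d)[-(1:2)])]

swapped <- lapply(mylist, swap_first_two)
```

Indexing a data frame with a single vector, as in `d[...]`, selects columns, which is why no comma is needed here.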
Removing the special symbols in data.frame column values
I have two data frames, each with a column name.
df1:
name
#one2
!iftwo
there_2_go
come&go
df1 = structure(list(name = c("#one2", "!iftwo", "there_2_go", "come&go")),.Names = c("name"), row.names = c(NA, -4L), class = "data.frame")
df2:
name
One2
IfTwo#
there-2-go
come.go
df2 = structure(list(name = c("One2", "IfTwo#", "there-2-go", "come.go")),.Names = c("name"), row.names = c(NA, -4L), class = "data.frame")
Comparing the two data frames for inequality using %in% is cumbersome because of the special symbols. Removing the special symbols with stringr could be useful, but how exactly can we use stringr functions with %in% and display the mismatches between the two? I have already used mutate() to convert everything to lowercase with tolower(), as follows:
df1<-mutate(df1,name=tolower(df1$name))
df2<-mutate(df2,name=tolower(df2$name))
Current output of comparison:
df2[!(df2 %in% df1),]
[1] "one2" "iftwo#" "there-2-go" "come.go"
Expected output, as the contents are essentially the same apart from the special symbols:
df2[!(df2 %in% df1),]
character(0)
Question: how do we ignore the symbols in the contents of the data frame?
Here it is in a function:
f1 <- function(df1, df2){
  i1 <- tolower(gsub('[[:punct:]]', '', df1$name))
  i2 <- tolower(gsub('[[:punct:]]', '', df2$name))
  d1 <- sapply(i1, function(i) grepl(paste(i2, collapse = '|'), i))
  return(!d1)
}
f1(df1, df2)
#     one2    iftwo there2go   comego
#    FALSE    FALSE    FALSE    FALSE
# or use it for indexing:
df2[f1(df1, df2),]
# character(0)
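Since the question asks about %in%, here is an alternative sketch that keeps %in% and only normalises the values first. It uses exact matching on the cleaned strings rather than the grepl partial matching shown above; `normalise` is a hypothetical helper name, and base R's gsub() is used in place of stringr.

```r
# Data frames from the question
df1 <- data.frame(name = c("#one2", "!iftwo", "there_2_go", "come&go"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("One2", "IfTwo#", "there-2-go", "come.go"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: strip punctuation, then lowercase
normalise <- function(x) tolower(gsub("[[:punct:]]", "", x))

# Values of df2$name with no exact counterpart in df1$name after cleaning
mismatch <- df2$name[!(normalise(df2$name) %in% normalise(df1$name))]
mismatch
```

Because the comparison is done on the cleaned copies, the original values in `df1` and `df2` stay unchanged and any mismatch is reported in its original spelling.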
How to check identical for multiple R objects?
Suppose I have a list object such as:
set.seed(123)
df <- data.frame(x = rnorm(5), y = rbinom(5,2,0.5))
rownames(df) <- LETTERS[1:5]
ls <- list(df1 = df, df2 = df, df3 = df)
My question is how to quickly check that the row names are identical across the three elements (data frames) in ls.
You can try
all(sapply(ls, rownames) == rownames(ls[[1]]))
To check only the name of the ith row, you can modify this to
all(sapply(ls, rownames)[i, ] == rownames(ls[[1]])[i])
You can get a list of row names with:
Map(rownames, ls)
so you can check that all the data frames have the same row names by checking that this list contains only one unique row-name vector:
length(unique(Map(rownames, ls))) == 1
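Another option, sketched here under the same setup, is to compare each element against the first with identical(). `same_rownames` is a hypothetical helper name; unlike an element-wise `==`, identical() also returns FALSE cleanly when the data frames have different numbers of rows.

```r
set.seed(123)
df <- data.frame(x = rnorm(5), y = rbinom(5, 2, 0.5))
rownames(df) <- LETTERS[1:5]
ls <- list(df1 = df, df2 = df, df3 = df)

# Hypothetical helper: TRUE if every data frame has exactly the
# same row names, in the same order, as the first one
same_rownames <- function(lst) {
  ref <- rownames(lst[[1]])
  all(vapply(lst, function(d) identical(rownames(d), ref), logical(1)))
}

all_same <- same_rownames(ls)

# Counter-example: change one row name in the third data frame
bad <- df
rownames(bad)[1] <- "Z"
ls_bad <- list(df1 = df, df2 = df, df3 = bad)
any_diff <- !same_rownames(ls_bad)
```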