I have a list that contains 36 data frames, and I want to create a list containing the names of all those data frames:
dput(myfiles[1:2])
list(structure(list(X.Treatment.1.Treatment.10.Treatment.2.Treatment.3.Treatment.4.Treatment.5.Treatment.6.Treatment.7.Treatment.8.Treatment.9 = c("Treatment.1,1,0.779269898976048,0.987582177817029,0.999865208543176,0.999637376053903,0.969316946773183,0.992798203986959,0.424960684181985,0.804869101320034,0.934784678841289",
"Treatment.10,0.779269898976048,1,0.671138248567996,0.789454098761072,0.762111859396959,0.909408486972833,0.848734212632234,-0.236126723371631,0.255300504533133,0.505840502482398",
"Treatment.2,0.987582177817029,0.671138248567996,1,0.984869671366683,0.991454531822078,0.918661911614817,0.961649044703906,0.561895346303209,0.888107698459535,0.978982111839266",
"Treatment.3,0.999865208543176,0.789454098761072,0.984869671366683,1,0.99906051831384,0.973222174821046,0.994631289318653,0.410041249133801,0.795017057233326,0.9288266084351",
"Treatment.4,0.999637376053903,0.762111859396959,0.991454531822078,0.99906051831384,1,0.962346166096083,0.989212254209048,0.449182113577399,0.820557713571369,0.944010924367408",
"Treatment.5,0.969316946773183,0.909408486972833,0.918661911614817,0.973222174821046,0.962346166096083,1,0.991784351747349,0.189407610662142,0.634294194129571,0.81878574572229",
"Treatment.6,0.992798203986959,0.848734212632234,0.961649044703906,0.994631289318653,0.989212254209048,0.991784351747349,1,0.31345701514879,0.72797778020465,0.885498274066011",
"Treatment.7,0.424960684181985,-0.236126723371631,0.561895346303209,0.410041249133801,0.449182113577399,0.189407610662142,0.31345701514879,1,0.879237827530393,0.718791431723663",
"Treatment.8,0.804869101320034,0.255300504533133,0.888107698459535,0.795017057233326,0.820557713571369,0.634294194129571,0.72797778020465,0.879237827530393,1,0.963182415401058",
"Treatment.9,0.934784678841289,0.505840502482398,0.978982111839266,0.9288266084351,0.944010924367408,0.81878574572229,0.885498274066011,0.718791431723663,0.963182415401058,1"
)), class = "data.frame", row.names = c(NA, -10L)), structure(list(
X.Treatment.1.Treatment.10.Treatment.2.Treatment.3.Treatment.4.Treatment.5.Treatment.6.Treatment.7.Treatment.8.Treatment.9 = c("Treatment.1,1,NA,NA,NA,NA,NA,NA,NA,NA,NA",
"Treatment.10,NA,1,NA,NA,NA,NA,NA,NA,NA,NA", "Treatment.2,NA,NA,1,NA,NA,NA,NA,NA,NA,NA",
"Treatment.3,NA,NA,NA,1,NA,NA,NA,NA,NA,NA", "Treatment.4,NA,NA,NA,NA,1,NA,NA,NA,NA,NA",
"Treatment.5,NA,NA,NA,NA,NA,1,NA,NA,NA,NA", "Treatment.6,NA,NA,NA,NA,NA,NA,1,NA,NA,NA",
"Treatment.7,NA,NA,NA,NA,NA,NA,NA,1,NA,NA", "Treatment.8,NA,NA,NA,NA,NA,NA,NA,NA,1,NA",
"Treatment.9,NA,NA,NA,NA,NA,NA,NA,NA,NA,1")), class = "data.frame", row.names = c(NA,
-10L)))
I want a list containing all the names of the data frames. The problem is that when I write:
names(list_median)[i]
It just returns NULL. Each data frame in the list is a correlation matrix that looks like this.
I am not sure whether this is what you are looking for:
mat_names <- lapply(list_median, \(x) do.call(cbind, dimnames(x)))
mat_names <- lapply(mat_names, \(x) {colnames(x) <- c("Rows", "Cols"); x})
Here is a possible explanation of why you are running into issues. The code is commented:
# Extract each data frame into the global environment
for (i in seq_along(list_median)) {
  assign(paste0("df", i), list_median[[i]])
}
# You should now see df1, df2, etc. in the Environment pane
# Build a list from a few of them, e.g. df1 and df2, without naming the elements:
my_list <- list(df1, df2)
# Now try to get the names
names(my_list)
# NULL
# Now name the elements when building the list, then ask for the names again:
my_list <- list(df1nownamed = df1, df2nownamed = df2)
names(my_list)
# [1] "df1nownamed" "df2nownamed"
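If the list already exists without names, the names can also be attached afterwards. A minimal sketch, assuming a small stand-in for list_median (the two toy data frames and the "df" prefix are made up for illustration):

```r
# Hypothetical stand-in for the real list of 36 data frames
list_median <- list(data.frame(a = 1:2), data.frame(a = 3:4))

# The list was built without names, so this is NULL
is.null(names(list_median))

# Attach one name per element; seq_along() follows the length of the list
names(list_median) <- paste0("df", seq_along(list_median))

# The names can now be collected into their own list
name_list <- as.list(names(list_median))
```

If the data frames were read from files, the file names would usually make more informative names than a running index.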
I have a function similar to this:
testfun = function(jID,kID,d){
g=paste0(jID,kID)
date = d
bb=data.frame(g,date)
return(bb)
}
Data frame:
x=data.frame(jID = c("a","b"),kID=c("c","d"),date="20170206",stringsAsFactors = FALSE)
I want to pass each row as inputs into the function. The solutions in "Passing multiple arguments to a function taken from dataframe" are great, but in that case the number of columns was known. How would a solution like this:
vtestfun <- (Vectorize(testfun, SIMPLIFY=FALSE))
vtestfun(x[,1],x[,2],x[,3])
be applied if the number of columns in the dataframe is not known or keeps changing?
If you can match the argument names to the column names like so:
testfun <- function(jID, kID, date){ # 'date', not 'd'
g <- paste0(jID, kID)
bb <- data.frame(g, date)
return(bb)
}
You could do:
purrr::pmap(x, testfun)
Returning:
[[1]]
   g     date
1 ac 20170206

[[2]]
   g     date
1 bd 20170206
# Data used:
x <- structure(list(jID = c("a", "b"), kID = c("c", "d"), date = c("20170206", "20170206")), class = "data.frame", row.names = c(NA, -2L))
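If adding a dependency is not an option, a base R sketch along the same lines is possible. It assumes, as above, that the column names of x match the argument names of testfun. Map is called through do.call so that the number of columns never has to be spelled out:

```r
# Same function as above, with argument names matching the column names
testfun <- function(jID, kID, date) {
  data.frame(g = paste0(jID, kID), date = date)
}

x <- data.frame(jID = c("a", "b"), kID = c("c", "d"),
                date = "20170206", stringsAsFactors = FALSE)

# Every column of x becomes a named argument to Map(),
# so testfun is called once per row
res <- do.call(Map, c(list(f = testfun), x))

# Optionally stack the per-row results into one data frame;
# unname() drops the element names Map() derives from the first column
combined <- do.call(rbind, unname(res))
```

Because the arguments are matched by name, every column of x must correspond to an argument of testfun (or testfun must accept `...`).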
I have 1 row of data and 50 columns in the row from a csv which I've put into a dataframe. The data is arranged across the spreadsheet like this:
"FSEG-DFGS-THDG", "SGDG-SGRE-JJDF", "DIDC-DFGS-LEMS"...
How would I select only the middle part of each element (e.g., "DFGS" in the first one, "SGRE" in the second, etc.), count their occurrences and display the results?
I have tried using the strsplit function but I couldn't get it to work for the entire row of data. I'm thinking a loop of some kind might be what I need.
You can do unlist(strsplit(x, '-'))[seq(2, length(x)*3, 3)] (assuming your data is consistently of the form A-B-C).
# E.g.
fun <- function(x) unlist(strsplit(x, '-'))[seq(2, length(x)*3, 3)]
fun(c("FSEG-DFGS-THDG", "SGDG-SGRE-JJDF", "DIDC-DFGS-LEMS"))
# [1] "DFGS" "SGRE" "DFGS"
Edit
# Data frame
df <- structure(list(a = "FSEG-DFGS-THDG", b = "SGDG-SGRE-JJDF", c = "DIDC-DFGS-LEMS"),
class = "data.frame", row.names = c(NA, -1L))
fun(t(df[1,]))
# [1] "DFGS" "SGRE" "DFGS"
First we create a function strng() and then we apply() it on every column of df. strsplit() splits a string by "-" and strng() returns the second part.
df = data.frame(a = "ab-bc-ca", b = "gn-bc-ca", c = "kj-ll-mn")
strng = function(x) {
strsplit(x,"-")[[1]][2]
}
# table() outputs frequency of elements in the input
table(apply(df, MARGIN = 2, FUN = strng))
# output:
# bc ll
#  2  1
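Putting extraction and counting together for a one-row data frame, one possible sketch is shown here. The three-column df stands in for the real 50-column row, and the code assumes every value has the form A-B-C:

```r
# Hypothetical one-row data frame standing in for the CSV row
df <- data.frame(a = "FSEG-DFGS-THDG", b = "SGDG-SGRE-JJDF", c = "DIDC-DFGS-LEMS",
                 stringsAsFactors = FALSE)

# Flatten the row into a plain character vector
x <- unlist(df[1, ], use.names = FALSE)

# Split each string on "-" and keep the second piece
parts <- strsplit(x, "-", fixed = TRUE)
middle <- vapply(parts, function(p) p[2], character(1))

# Count the occurrences, most frequent first
counts <- sort(table(middle), decreasing = TRUE)
print(counts)
```

Using vapply with character(1) makes the code fail loudly if a split ever returns something other than a single string, which a plain sapply would not do.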
library(tidyverse)
df0 <- data.frame(col1 = c(5, 2), col2 = c(6, 4))
df1 <- data.frame(col1 = c(5, 2),
col2 = c(6, 4),
col3 = ifelse(apply(df0[, 1:2], 1, sum) > 10 &
df0[, 2] > 5,
"True",
"False"))
df2 <- as_tibble(df1)
I've got my data frame df1 above. I've basically "copied" it as a tibble df2. Let's mimic an analysis for this df1 data frame and df2 tibble.
identical(df1[[2]], df1[, 2])
# [1] TRUE
identical(df2[[2]], df2[, 2])
# [1] FALSE
Since df1 and df2 are essentially the "same", why do I get the TRUE/FALSE dichotomy in my code block above? What is the tibble() property that has changed?
The same question asked another way - what is the difference between [[X]] and [, X], when applied to base R, and also when used in the tidyverse?
Since a data frame is a list of columns, we can think of this in terms of list subsetting. Take for instance:
L <- list(A = c(1, 2), B = c(1, 4))
L[[2]]
This extracts the second element of the list. Extrapolate this to:
df1[[2]]
We get the same output as df1[, 2], because [ on a base data frame drops a single selected column to a vector by default; hence identical(df1[[2]], df1[, 2]) returns TRUE.
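The list analogy can be checked directly in base R. A small sketch (the two-column df here is a cut-down stand-in for df1):

```r
L <- list(A = c(1, 2), B = c(1, 4))

# [ keeps the container, [[ extracts the element itself
single_bracket <- class(L[2])    # "list"
double_bracket <- class(L[[2]])  # "numeric"

df <- data.frame(col1 = c(5, 2), col2 = c(6, 4))

# On a base data frame, [, j] drops a single column to a vector by default...
same_vector <- identical(df[[2]], df[, 2])           # TRUE
# ...unless dropping is switched off explicitly
kept_frame <- is.data.frame(df[, 2, drop = FALSE])   # TRUE
```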
The second part has to do with tibble structure, i.e.:
typeof(as_tibble(df1)[[2]])
[1] "double"
typeof(as_tibble(df1)[, 2])
[1] "list"
The first is an atomic vector while the second is a one-column tibble (a list), because [ on a tibble does not drop to a vector by default; hence identical returns FALSE.
From the docs, objects of class tbl_df have:
A class attribute of c("tbl_df", "tbl", "data.frame").
A base type of "list", where each element of the list has the same NROW().
A names attribute that is a character vector the same length as the underlying list.
A row.names attribute, included for compatibility with the base data.frame class. This attribute is only consulted to query the number of rows, any row names that might be stored there are ignored by most tibble methods.
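A short sketch of the practical consequence, assuming the tibble package is installed (the data frame is rebuilt here without col3 for brevity):

```r
library(tibble)

df1 <- data.frame(col1 = c(5, 2), col2 = c(6, 4))
df2 <- as_tibble(df1)

# Base R: [, j] drops to a vector; tibble: [, j] stays a one-column tibble
base_is_df   <- is.data.frame(df1[, 2])   # FALSE
tibble_is_df <- is.data.frame(df2[, 2])   # TRUE

# [[ behaves the same for both, so it is the portable way to get a vector
same_column <- identical(df2[[2]], df1[[2]])   # TRUE

# Asking a tibble to drop explicitly restores the base R behaviour
dropped <- identical(df2[, 2, drop = TRUE], df2[[2]])   # TRUE
```

Code that has to work with both data frames and tibbles is therefore safer with [[ or $ when a vector is wanted.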
Is there any package or command in R that reads a data.frame and then produces a command that recreates exactly the same data.frame without loading data, i.e., with all of the data stored within the command itself?
E.g., if one has a data.frame like this:
mydata <- data.frame(col1=c(1,2),col2=c(3,4))
I just want to get the command, such that reading "mydata" results in the command on the right-hand side.
BR
Fabian
The dput function "Writes an ASCII text representation of an R object to a file or connection" and is as close to the right hand side as you would get. It actually contains more details about the structure of the object, as is seen below:
> dput(mydata)
structure(list(col1 = c(1, 2), col2 = c(3, 4)), .Names = c("col1",
"col2"), row.names = c(NA, -2L), class = "data.frame")
You could also use enquote, which wraps the value of mydata in a quote() call. Evaluating that call with eval returns the original object:
> ( e <- enquote(mydata) )
# quote(list(col1 = c(1, 2), col2 = c(3, 4)))
> eval(e)
# col1 col2
# 1 1 3
# 2 2 4
> identical(eval(e), mydata)
# [1] TRUE
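To confirm that the text produced by dput really recreates the object, a round trip can be sketched as follows. It uses deparse, which returns the same representation as a character vector instead of printing it (the variable names code and rebuilt are made up for this example):

```r
mydata <- data.frame(col1 = c(1, 2), col2 = c(3, 4))

# Capture the command as text instead of printing it
code <- paste(deparse(mydata), collapse = "\n")
cat(code, "\n")

# Parse and evaluate the text to rebuild the object
rebuilt <- eval(parse(text = code))

# Compare the rebuilt object with the original
round_trip_ok <- isTRUE(all.equal(rebuilt, mydata))
```

The text in code can be pasted into a script or a question as is. For doubles with many significant digits the default deparse output may round, so an exact round trip is not guaranteed for every data set.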