Appending/Growing List Elements Separately in R

I have a loop that calls a function which, on every iteration, returns a list with the same elements but with different lengths. I want the elements of the returned list to grow (merge) with each iteration. Here is a very simplified version of it:
# Code Body-------------------------------------------------------------
desiredList <- list()
for (i in 1:3){
# "existingList" holds two data frames, A and B
# A is a data frame with columns x1 and x2
# B is a data frame with a single column, b
# The values of A and B are calculated by "somefunction", and
# their lengths differ on each iteration
existingList <- somefunction(variable[i])
# "desiredList" should also hold A and B
# On each iteration, the newly generated Ai and Bi should be appended to the A and B elements of "desiredList"
desiredList <- append(desiredList, (existingList))
}
# existingList in each loop--------------------------------------------
# i=1................................
A:'data.frame': 3 obs. of 2 variables:
..$ x1: num [1:3] 1 2 3
..$ x2: num [1:3] 13 26 39
B:'data.frame': 1 obs. of 1 variables:
..$ b: num [1:1] 2.6
# i=2................................
A:'data.frame': 2 obs. of 2 variables:
..$ x1: num [1:2] 4 5
..$ x2: num [1:2] 52 65
B:'data.frame': 3 obs. of 1 variables:
..$ b: num [1:3] 5.2 7.8 10.4
# i=3................................
A:'data.frame': 5 obs. of 2 variables:
..$ x1: num [1:5] 6 7 8 9 10
..$ x2: num [1:5] 78 91 104 117 130
B:'data.frame': 2 obs. of 1 variables:
..$ b: num [1:2] 13 15.6
# desiredList at the end of the loop
A:'data.frame': 10 obs. of 2 variables:
..$ x1: num [1:10] 1 2 3 4 5 6 7 8 9 10
..$ x2: num [1:10] 13 26 39 52 65 78 91 104 117 130
B:'data.frame': 6 obs. of 1 variables:
..$ b: num [1:6] 2.6 5.2 7.8 10.4 13 15.6
I have tried "append", "lapply", "Map", and bunches of other functions. However, none gives the correct answer.

Your desired output requires "row-binding" the dataframes, as in:
library(tibble)
a1 <- tribble(
~A1, ~B1, ~C1,
1, 13 , 2.6,
2, 26 , 5.2,
3, 39 , 7.8,
7, 91 , 18.2,
8, 104, 20.8
)
a2 <- tribble(
~A2, ~B2,~C2,
4, 52, 10.4,
5, 65, 13,
6, 78, 15.6
)
a3 <- tribble(
~A3, ~B3, ~C3,
9, 117, 23.4,
10, 130, 26
)
out <- lapply(
X = list(a1, a2, a3),
FUN = function(x) `names<-`(x, substr(names(x), 1,1))
)
do.call("rbind", out)
#> # A tibble: 10 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 13 2.6
#> 2 2 26 5.2
#> 3 3 39 7.8
#> 4 7 91 18.2
#> 5 8 104 20.8
#> 6 4 52 10.4
#> 7 5 65 13
#> 8 6 78 15.6
#> 9 9 117 23.4
#> 10 10 130 26
dplyr::bind_rows(out)
#> # A tibble: 10 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 13 2.6
#> 2 2 26 5.2
#> 3 3 39 7.8
#> 4 7 91 18.2
#> 5 8 104 20.8
#> 6 4 52 10.4
#> 7 5 65 13
#> 8 6 78 15.6
#> 9 9 117 23.4
#> 10 10 130 26
Created on 2020-07-21 by the reprex package (v0.3.0)
If you have control of the data frame names and can avoid appending numbers to them, you can row bind them more easily. dplyr::bind_rows() is also an easy way to bind them.

One way to solve this is to use the Map() function. The only problem with Map is that it cannot merge an empty list with a list that contains elements. This can be solved with an if-else statement: on the first iteration the desired list is set to the existing list, and on the remaining iterations Map() is used to update the desired list.
# Code Body-------------------------------------------------------------
desiredList <- list()
for (i in 1:3){
# "existingList" generated using an external function
existingList <- somefunction(variable[i])
# "desiredList" generation
if (i == 1){ # define the list in first loop
desiredList <- existingList
} else { # append the list
desiredList <- Map(rbind, desiredList, existingList)
}
}
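A minimal sketch of two ways to avoid the `i == 1` branch, assuming a hypothetical `somefunction()` that only mimics the data shown in the question (the real function is not given). The first collects every iteration's list and row-binds element by element once, via `do.call(Map, ...)`; the second folds the same `Map(rbind, ...)` step with `Reduce()`, which uses the first list as its starting value.

```r
# Hypothetical stand-in for the question's "somefunction": returns a list of
# two data frames, A and B, whose row counts differ on every call.
somefunction <- function(i) {
  a_rows <- list(1:3, 4:5, 6:10)[[i]]
  b_vals <- list(2.6, c(5.2, 7.8, 10.4), c(13, 15.6))[[i]]
  list(A = data.frame(x1 = a_rows, x2 = a_rows * 13),
       B = data.frame(b = b_vals))
}

# Collect the per-iteration lists first ...
pieces <- lapply(1:3, somefunction)

# ... then call Map(rbind, pieces[[1]], pieces[[2]], pieces[[3]]) in one go,
# which row-binds all the A's together and all the B's together.
desiredList <- do.call(Map, c(list(f = rbind), pieces))

# Equivalent fold: Reduce() starts from pieces[[1]], so no empty list is
# ever passed to Map() and no special first-iteration case is needed.
desiredList_fold <- Reduce(function(acc, x) Map(rbind, acc, x), pieces)

str(desiredList)
```

Binding once at the end also avoids re-copying the growing data frames on every iteration, which matters more as the number of iterations grows.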

Related

Re-creating tibbles within purrr::map

I want to use map to apply a function to each column of a tibble.
However, I don't want the tibble columns to be simplified.
I could deal with that by re-creating tibbles with one column using imap.
However, how do I do that?
Let's use a super-simple function called test to see if it works.
Default behavior: columns simplified to vectors:
test<-function(data){
data
}
tibble(v1=1:20,v2=100:119) %>% map(test)
$v1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$v2
[1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
This doesn't give the right names, because tibble() takes .y literally as the column name:
tibble(v1=1:20,v2=100:119) %>% imap(~test(tibble(.y=.x))) %>% str
List of 2
$ v1: tibble [20 x 1] (S3: tbl_df/tbl/data.frame)
..$ .y: int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
$ v2: tibble [20 x 1] (S3: tbl_df/tbl/data.frame)
..$ .y: int [1:20] 100 101 102 103 104 105 106 107 108 109 ...
So why does this not work?
tibble(v1=1:20,v2=100:119) %>% imap(~test(tibble(!!!(.y)=.x)))
Error: unexpected '=' in "tibble(v1=1:20,v2=100:119) %>% imap(~test(tibble(!!!(.y)="
We can change the = to the := operator and unquote .y with !! on the left-hand side of :=
library(tibble)
library(purrr)
out <- tibble(v1=1:20,v2=100:119) %>%
imap( ~ test(tibble(!!.y := .x)))
Output:
str(out)
#List of 2
# $ v1: tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
# ..$ v1: int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
# $ v2: tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
# ..$ v2: int [1:20] 100 101 102 103 104 105 106 107 108 109 ...
You can use setNames():
library(magrittr) # provides %>% when tibble and purrr are not attached
tibble::tibble(v1=1:20,v2=100:119) %>% purrr::imap(~test(tibble::tibble(.x)) %>% setNames(.y))
#$v1
# A tibble: 20 x 1
# v1
# <int>
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
#...
#...
#$v2
# A tibble: 20 x 1
# v2
# <int>
# 1 100
# 2 101
# 3 102
# 4 103
# 5 104
# 6 105
#...
#...
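Another route, not taken by either answer above: single-bracket indexing (`df["v1"]`) already returns a one-column tibble with the correct name, so you can map over the tibble's names instead of its columns. A sketch, reusing the question's `test()` function; `df` and `out` are names chosen here for illustration.

```r
library(tibble)

test <- function(data) data

df <- tibble(v1 = 1:20, v2 = 100:119)

# Iterate over the column names; df[nm] keeps the tibble class and the
# column name, so no renaming or tidy evaluation is needed.
out <- lapply(setNames(names(df), names(df)), function(nm) test(df[nm]))

str(out)
```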

Error in ncol(xj) : object 'xj' not found when using R matplot()

Using matplot, I'm trying to plot the 2nd, 3rd and 4th columns of airquality data.frame after dividing these 3 columns by the first column of airquality.
However I'm getting an error
Error in ncol(xj) : object 'xj' not found
Why are we getting this error? The code below will reproduce this problem.
attach(airquality)
airquality[2:4] <- apply(airquality[2:4], 2, function(x) x /airquality[1])
matplot(x= airquality[,1], y= as.matrix(airquality[-1]))
You have managed to mangle your data in an interesting way. Start by looking at airquality before you modify it. (And please don't use attach(): it's unnecessary and sometimes dangerous or confusing.)
str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
After you do
airquality[2:4] <- apply(airquality[2:4], 2,
function(x) x /airquality[1])
you get
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R:'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 4.63 3.28 12.42 17.39 NA ...
$ Wind :'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 0.18 0.222 1.05 0.639 NA ...
$ Temp :'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 1.63 2 6.17 3.44 NA ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
or
sapply(airquality,class)
## Ozone Solar.R Wind Temp Month Day
## "integer" "data.frame" "data.frame" "data.frame" "integer" "integer"
that is, you have data frames embedded within your data frame!
rm(airquality) ## clean up
Now change one character and divide by the column airquality[,1] rather than airquality[1], that is, divide by a vector rather than by a one-column data frame (a list of length one):
airquality[,2:4] <- apply(airquality[,2:4], 2,
function(x) x/airquality[,1])
matplot(x= airquality[,1], y= as.matrix(airquality[,-1]))
In general it's safer to use [, ...] indexing rather than [] indexing to refer to columns of a data frame unless you really know what you're doing ...
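A sketch of a slightly simpler variant of the same fix, working on a copy named `aq` (a name chosen here so the built-in dataset is left alone): dividing a data frame by a numeric vector whose length equals the number of rows recycles the vector down each column, so `apply()` is not needed and every column stays a plain numeric vector.

```r
aq <- airquality  # work on a copy instead of overwriting the dataset

# Data frame / vector: each of the three columns is divided element-wise
# by Ozone, and the result is a data frame of ordinary numeric columns.
aq[2:4] <- aq[2:4] / aq[[1]]

sapply(aq, class)  # no nested data frames this time

matplot(x = aq[[1]], y = as.matrix(aq[-1]))
```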

aggregate() puts multiple output columns in a matrix instead

I want to compute multiple quantiles for a certain variable:
> res1 <- aggregate(airquality$Wind, list(airquality$Month), function (x) quantile(x, c(0.9, 0.95, 0.975)))
> head(res1)
Group.1 x.90% x.95% x.97.5%
1 5 16.6000 17.5000 18.8250
2 6 14.9000 15.5600 17.3650
3 7 14.3000 14.6000 14.9000
4 8 12.6000 14.0500 14.6000
5 9 14.9600 15.5000 15.8025
The result looks good at first, but aggregate actually returns it in a very strange form, where the last 3 columns are not columns of a data.frame, but a single matrix!
> names(res1)
[1] "Group.1" "x"
> dim(res1)
[1] 5 2
> class(res1[,2])
[1] "matrix"
This causes a lot of problems in further processing.
A few questions:
Why is aggregate() behaving so strangely?
Is there any way to persuade it to produce the result I expect?
Or am I perhaps using the wrong function for this purpose? Is there a preferred way to get the result I want?
Of course I could do some transformation of the output of aggregate(), but I look for some more simple and straightforward solution.
Q1: Why is the behavior so strange?
This is actually documented behavior (see ?aggregate), though it may still be unexpected. The relevant argument to look at is simplify.
If simplify is set to FALSE, aggregate would produce a list instead in a case like this.
res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x)
quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame': 5 obs. of 2 variables:
# $ Group.1: int 5 6 7 8 9
# $ x :List of 5
# ..$ 1 : Named num 16.6 17.5 18.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 32 : Named num 14.9 15.6 17.4
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 62 : Named num 14.3 14.6 14.9
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 93 : Named num 12.6 14.1 14.6
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 124: Named num 15 15.5 15.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
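If you do use `simplify = FALSE`, here is a sketch of turning the list column back into ordinary columns: `do.call(rbind, ...)` stacks the per-month quantile vectors into a matrix, and `data.frame()` spreads that matrix into columns. Naming the grouping variable via `list(Month = ...)` is a choice made here, and `check.names = FALSE` keeps labels such as `90%` unchanged.

```r
res2 <- aggregate(airquality$Wind, list(Month = airquality$Month),
                  function(x) quantile(x, c(0.9, 0.95, 0.975)),
                  simplify = FALSE)

# res2$x is a list of named numeric vectors (one per month);
# rbind() stacks them into a 5 x 3 matrix with columns "90%", "95%", "97.5%".
q <- do.call(rbind, res2$x)

# Spread the matrix into separate columns next to the grouping column.
flat2 <- data.frame(Month = res2$Month, q, check.names = FALSE)
str(flat2)
```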
Now, both a matrix and a list as columns may seem strange, but I presume this is more a case of "by design" than a "bug" or a "flaw".
For instance, consider the following: We want to aggregate both the "Wind" and the "Temp" columns from the "airquality" dataset, and we know that each aggregation would result in multiple columns (like we would expect with quantile).
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
# Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1 5 16.6000 17.5000 18.8250 74.000 77.500 79.500
# 2 6 14.9000 15.5600 17.3650 87.300 91.100 92.275
# 3 7 14.3000 14.6000 14.9000 89.000 91.500 92.000
# 4 8 12.6000 14.0500 14.6000 94.000 95.000 96.250
# 5 9 14.9600 15.5000 15.8025 91.100 92.550 93.000
In some ways, keeping these values as matrix columns might make sense: the aggregated data are easily accessible by their original column names:
res3$Temp
# 90% 95% 97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000
Q2: How do you get the results as separate columns in a data.frame?
But a list as a column is just as awkward to deal with as a matrix as a column in many cases. If you want to "flatten" your matrix into columns, use do.call(data.frame, ...):
do.call(data.frame, res1)
# Group.1 x.90. x.95. x.97.5.
# 1 5 16.60 17.50 18.8250
# 2 6 14.90 15.56 17.3650
# 3 7 14.30 14.60 14.9000
# 4 8 12.60 14.05 14.6000
# 5 9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame': 5 obs. of 4 variables:
# $ Group.1: int 5 6 7 8 9
# $ x.90. : num 16.6 14.9 14.3 12.6 15
# $ x.95. : num 17.5 15.6 14.6 14.1 15.5
# $ x.97.5.: num 18.8 17.4 14.9 14.6 15.8
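Alternatively, a sketch of flattening the matrix column with `cbind()`: combining the grouping column with the matrix `res1$x` yields ordinary columns, and since `cbind()` with a data frame argument does not run the names through `make.names()`, the quantile labels `90%`, `95%` and `97.5%` survive. Naming the grouping column `Month` is an illustrative choice.

```r
res1 <- aggregate(airquality$Wind, list(Month = airquality$Month),
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# res1$x is a single 5 x 3 matrix column; cbind() splits it into
# three separate columns alongside Month.
flat <- cbind(res1["Month"], res1$x)
str(flat)
```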
Q3: Are there other alternatives?
As with most things R, yes of course. My preferred alternative would be to use the "data.table" package, with which you can do:
library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))),
by = Month]
# Month 90% 95% 97.5%
# 1: 5 16.60 17.50 18.8250
# 2: 6 14.90 15.56 17.3650
# 3: 7 14.30 14.60 14.9000
# 4: 8 12.60 14.05 14.6000
# 5: 9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 5 obs. of 4 variables:
# $ Month: int 5 6 7 8 9
# $ 90% : num 16.6 14.9 14.3 12.6 15
# $ 95% : num 17.5 15.6 14.6 14.1 15.5
# $ 97.5%: num 18.8 17.4 14.9 14.6 15.8
# - attr(*, ".internal.selfref")=<externalptr>

Carc data from rda file to numeric matrix

I am trying to run KDA (kernel discriminant analysis) on the carc data, but when I call X<-data.frame(scale(X)), R shows an error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
I tried as.numeric(as.matrix(carc)) and carc<-na.omit(carc), but neither helps.
library(ks);library(MASS);library(klaR);library(FSelector)
install.packages("klaR")
install.packages("FSelector")
library(ks);library(MASS);library(klaR);library(FSelector)
attach("carc.rda")
data<-load("carc.rda")
data
carc<-na.omit(carc)
head(carc)
class(carc) # check for its class
class(as.matrix(carc)) # change class, and
as.numeric(as.matrix(carc))
XX<-carc
X<-XX[,1:12];X.class<-XX[,13];
X<-data.frame(scale(X));
fit.pc<-princomp(X,scores=TRUE);
plot(fit.pc,type="line")
X.new<-fit.pc$scores[,1:5]; X.new<-data.frame(X.new);
cfs(X.class~.,cbind(X.new,X.class))
X.new<-fit.pc$scores[,c(1,4)]; X.new<-data.frame(X.new);
fit.kda1<-Hkda(x=X.new,x.group=X.class,pilot="samse",
bw="plugin",pre="sphere")
kda.fit1 <- kda(x=X.new, x.group=X.class, Hs=fit.kda1)
Can you help to resolve this problem and make this analysis?
Added: the car data set (Chambers, Cleveland, Kleiner & Tukey 1983):
> head(carc)
P M R78 R77 H R Tr W L T D G C
AMC_Concord 4099 22 3 2 2.5 27.5 11 2930 186 40 121 3.58 US
AMC_Pacer 4749 17 3 1 3.0 25.5 11 3350 173 40 258 2.53 US
AMC_Spirit 3799 22 . . 3.0 18.5 12 2640 168 35 121 3.08 US
Audi_5000 9690 17 5 2 3.0 27.0 15 2830 189 37 131 3.20 Europe
Audi_Fox 6295 23 3 3 2.5 28.0 11 2070 174 36 97 3.70 Europe
Here is a small dataset with characteristics similar to what you describe, which reproduces this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
carc <- data.frame(type1=rep(c('1','2'), each=5),
type2=rep(c('5','6'), each=5),
x = rnorm(10,1,2)/10, y = rnorm(10),
stringsAsFactors = TRUE) # factors, as with the pre-4.0 R default
This should be similar to your data.frame
str(carc)
# 'data.frame': 10 obs. of 4 variables:
# $ type1: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
# $ type2: Factor w/ 2 levels "5","6": 1 1 1 1 1 2 2 2 2 2
# $ x : num -0.1177 0.3443 0.1351 0.0443 0.4702 ...
# $ y : num -0.355 0.149 -0.208 -1.202 -1.495 ...
scale(carc)
# Similar error
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Convert the factor columns to numeric with data.table's set():
require(data.table)
DT <- data.table(carc)
cols_fix <- c("type1", "type2")
for (col in cols_fix) set(DT, j=col, value = as.numeric(as.character(DT[[col]])))
str(DT)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ type1: num 1 1 1 1 1 2 2 2 2 2
# $ type2: num 5 5 5 5 5 6 6 6 6 6
# $ x : num 0.0465 0.1712 0.1582 0.1684 0.1183 ...
# $ y : num 0.155 -0.977 -0.291 -0.766 -1.02 ...
# - attr(*, ".internal.selfref")=<externalptr>
The first column(s) of your data set may be factors. Taking the data from corrgram:
library(corrgram)
carc <- auto
str(carc)
# 'data.frame': 74 obs. of 14 variables:
# $ Model : Factor w/ 74 levels "AMC Concord ",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Origin: Factor w/ 3 levels "A","E","J": 1 1 1 2 2 2 1 1 1 1 ...
# $ Price : int 4099 4749 3799 9690 6295 9735 4816 7827 5788 4453 ...
# $ MPG : int 22 17 22 17 23 25 20 15 18 26 ...
# $ Rep78 : num 3 3 NA 5 3 4 3 4 3 NA ...
# $ Rep77 : num 2 1 NA 2 3 4 3 4 4 NA ...
# $ Hroom : num 2.5 3 3 3 2.5 2.5 4.5 4 4 3 ...
# $ Rseat : num 27.5 25.5 18.5 27 28 26 29 31.5 30.5 24 ...
# $ Trunk : int 11 11 12 15 11 12 16 20 21 10 ...
# $ Weight: int 2930 3350 2640 2830 2070 2650 3250 4080 3670 2230 ...
# $ Length: int 186 173 168 189 174 177 196 222 218 170 ...
# $ Turn : int 40 40 35 37 36 34 40 43 43 34 ...
# $ Displa: int 121 258 121 131 97 121 196 350 231 304 ...
# $ Gratio: num 3.58 2.53 3.08 3.2 3.7 3.64 2.93 2.41 2.73 2.87 ...
So exclude them, either like this:
X<-XX[,3:14]
or this
X<-XX[,-(1:2)]
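Rather than hard-coding column positions, here is a sketch that keeps only the numeric columns before calling `scale()`. It reuses the small example from the first answer, with explicit `factor()` calls so it behaves the same before and after R 4.0 (where the `stringsAsFactors` default changed); `is_num` is a name introduced here.

```r
set.seed(1)
carc <- data.frame(type1 = factor(rep(c("1", "2"), each = 5)),
                   type2 = factor(rep(c("5", "6"), each = 5)),
                   x = rnorm(10, 1, 2) / 10, y = rnorm(10))

# TRUE for numeric (integer or double) columns, FALSE for factors.
is_num <- vapply(carc, is.numeric, logical(1))

# scale() now only sees numeric data, so colMeans() no longer fails.
X <- data.frame(scale(carc[, is_num, drop = FALSE]))
str(X)
```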

add variable to a list in R

I have a list of 28 data frames within another list, and I am trying to add a variable called ID to each individual data frame. I found the question "Dataframes in a list; adding a new variable with name of dataframe" very helpful, but when I tried its code it didn't work in my case. I think it's because my list doesn't have labels ([1], [2], [3], etc.) that the code can recognize.
all$id <- rep(names(mylist), sapply(mylist, nrow))
List of 1
$ :List of 28
..$ :'data.frame': 271 obs. of 12 variables:
.. ..$ Sample_ID : Factor w/ 271 levels "MC25",..: 19 27 2
.. ..$ Reported_Analyte : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
.. ..$ Date_Collected : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
.. ..$ Result2 : num [1:271] 0.11 0.11 0.11 0.11
..$ :'data.frame': 6 obs. of 12 variables:
.. ..$ Sample_ID : Factor w/ 271 levels "MC25",..: 19 27 2
.. ..$ Reported_Analyte : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
.. ..$ Date_Collected : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
.. ..$ Result2 : num [1:271] 0.11 0.11 0.11 0.11
It really isn't very clear what you want to achieve (the post you linked to was about collapsing over the list of data frames and adding into the collapsed version an ID variable indicating which original data frame each row in the collapsed data frame came from).
I see a complication with your data: you have a list of 28 data frames within a list. You can see that in the output from str() given in your Q. You can see it better with this example data set (here all the data frames are the same, but that is just for expedience):
set.seed(42)
dat <- data.frame(Sample_ID = factor(sample(10)),
Reported_Analyte = factor(sample(LETTERS, 10)),
Date_Collected = Sys.Date() + 0:9,
Result2 = rnorm(10))
mylist <- list(lapply(1:28, function(x) dat))
If we look at mylist using str() we see the nature of the complication I mentioned:
R> str(mylist, max = 2)
List of 1
$ :List of 28
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
..$ :'data.frame': 10 obs. of 4 variables:
....<etc>
The post you linked to started from the equivalent of the list inside your outer list, and that list had named components. If you don't need the outer list, it is perhaps best to throw it away at this stage:
mylist2 <- mylist[[1]]
## the `[[` are important as we want the 1st component *inside* the list
## using `[` would just give us a list within a list again.
Names can then be added to this list
names(mylist2) <- paste("Data_frame_", seq_along(mylist2), sep = "")
which would result in
R> str(mylist2)
List of 28
$ Data_frame_1 :'data.frame': 10 obs. of 4 variables:
..$ Sample_ID : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
..$ Date_Collected : Date[1:10], format: "2012-05-02" "2012-05-03" ...
..$ Result2 : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
$ Data_frame_2 :'data.frame': 10 obs. of 4 variables:
..$ Sample_ID : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
..$ Date_Collected : Date[1:10], format: "2012-05-02" "2012-05-03" ...
..$ Result2 : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....<etc>
Notice the List of 1 is no longer reported.
If the list of data frames within a list is important to you (not sure why it would be, but OK), then you can assign the names to the [[1]]st component directly.
names(mylist[[1]]) <- paste("Data_frame_", seq_along(mylist[[1]]), sep = "")
(Notice I'm using the original mylist and on both occasions I index that list with [[1]].)
The result is similar to the above though the list within a list structure is retained:
R> str(mylist)
List of 1
$ :List of 28
..$ Data_frame_1 :'data.frame': 10 obs. of 4 variables:
.. ..$ Sample_ID : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
.. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
.. ..$ Date_Collected : Date[1:10], format: "2012-05-02" "2012-05-03" ...
.. ..$ Result2 : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
..$ Data_frame_2 :'data.frame': 10 obs. of 4 variables:
.. ..$ Sample_ID : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
.. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
.. ..$ Date_Collected : Date[1:10], format: "2012-05-02" "2012-05-03" ...
.. ..$ Result2 : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....<etc>
If you now wish to proceed with collapsing the individual data frames into a single data frame, but retaining the information about which data frame they came from, we would do this for mylist2:
all2 <- do.call("rbind", mylist2)
all2 <- transform(all2, id = rep(names(mylist2), sapply(mylist2, nrow)))
rownames(all2) <- seq_len(nrow(all2)) ## reset rownames for compactness
which gives:
R> head(all2)
Sample_ID Reported_Analyte Date_Collected Result2 id
1 10 L 2012-05-02 1.3048697 Data_frame_1
2 9 R 2012-05-03 2.2866454 Data_frame_1
3 3 W 2012-05-04 -1.3888607 Data_frame_1
4 6 F 2012-05-05 -0.2787888 Data_frame_1
5 4 K 2012-05-06 -0.1333213 Data_frame_1
6 8 T 2012-05-07 0.6359504 Data_frame_1
For mylist we use something very similar, but just index into mylist using [[1]]:
all1 <- do.call("rbind", mylist[[1]])
all1 <- transform(all1, id = rep(names(mylist[[1]]), sapply(mylist[[1]], nrow)))
rownames(all1) <- seq_len(nrow(all1)) ## reset rownames for compactness
R> head(all1)
Sample_ID Reported_Analyte Date_Collected Result2 id
1 10 L 2012-05-02 1.3048697 Data_frame_1
2 9 R 2012-05-03 2.2866454 Data_frame_1
3 3 W 2012-05-04 -1.3888607 Data_frame_1
4 6 F 2012-05-05 -0.2787888 Data_frame_1
5 4 K 2012-05-06 -0.1333213 Data_frame_1
6 8 T 2012-05-07 0.6359504 Data_frame_1
As you can see, repeatedly having to refer to your list of data frames as mylist[[1]] is a pain if you don't need the outer list.
Update:
If you don't want to collapse the list into a single data frame, see @Andrie's answer, but modify it to read:
ml2 <- ml
ml2[[1]] <- lapply(seq_along(ml[[1]]), function(x) cbind(ml[[1]][[x]], id = x))
so you account for the list within list structure.
I answer this using a constructed example of a list with samples from mtcars.
First, create a list of data frames. Do this by sampling 10 rows from mtcars for each element of the list:
ml <- lapply(1:3, function(x)mtcars[sample(1:32, 10), 1:3])
So, now you have an unnamed list of 3 data frames. Next you want to add an id column. The trick is to use lapply over a sequence of list items using seq_along(ml), and then to cbind your id to each data frame:
ml2 <- lapply(seq_along(ml), function(x)cbind(ml[[x]], id=x))
The results are what you required:
str(ml2)
List of 3
$ :'data.frame': 10 obs. of 4 variables:
..$ mpg : num [1:10] 15 24.4 26 15.8 22.8 21 32.4 17.3 17.8 30.4
..$ cyl : num [1:10] 8 4 4 8 4 6 4 8 6 4
..$ disp: num [1:10] 301 147 120 351 108 ...
..$ id : int [1:10] 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 4 variables:
..$ mpg : num [1:10] 33.9 19.2 24.4 10.4 30.4 22.8 16.4 21.4 15.5 21.5
..$ cyl : num [1:10] 4 6 4 8 4 4 8 6 8 4
..$ disp: num [1:10] 71.1 167.6 146.7 460 75.7 ...
..$ id : int [1:10] 2 2 2 2 2 2 2 2 2 2
$ :'data.frame': 10 obs. of 4 variables:
..$ mpg : num [1:10] 15.5 21 13.3 21.5 21.4 30.4 21 18.1 30.4 15.2
..$ cyl : num [1:10] 8 6 8 4 4 4 6 6 4 8
..$ disp: num [1:10] 318 160 350 120 121 ...
..$ id : int [1:10] 3 3 3 3 3 3 3 3 3 3
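A sketch of a variant that uses each data frame's name as the id instead of its position, assuming the outer list has already been dropped and names assigned as in the first answer. `Map()` walks the data frames and their names in parallel, and `cbind()` recycles the single name down every row; the result can still be collapsed afterwards with `do.call(rbind, ...)`. The example data is a cut-down version of the first answer's `dat`, and `with_id`/`all3` are illustrative names.

```r
set.seed(42)
dat <- data.frame(Sample_ID = factor(sample(10)), Result2 = rnorm(10))
mylist <- list(lapply(1:28, function(x) dat))  # 28 data frames inside a list

# Drop the outer list and name the data frames.
mylist2 <- mylist[[1]]
names(mylist2) <- paste0("Data_frame_", seq_along(mylist2))

# cbind(df, id = name) for each pair; the list keeps its names.
with_id <- Map(cbind, mylist2, id = names(mylist2))

# Optional: collapse into a single data frame afterwards.
all3 <- do.call(rbind, unname(with_id))
head(all3)
```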
