adding a column across a list of tibbles - r

I am using readxl to read data into R as a list of tibbles.
library("readxl")
library("reshape2")
xlsx_example<-readxl_example("datasets.xlsx")
AllSheets<-lapply(excel_sheets(xlsx_example), read_excel, path=xlsx_example)
AllSheets
This then brings up 4 tibbles, [[1]] through [[4]].
I would like to add a new column to all four, with a unique label in each. If it were a single data frame I would use
AllSheets$Newcolumn<-"number1"
But this does not work when you have a list of tibbles. Is there a way to add Newcolumn to all the sheets, with "number1", "number2", etc. in each sheet?

You can use Map for this:
newdf <- Map(function(x, y) {
  x$Newcolumn <- y
  x
},
AllSheets,
c('Number1', 'Number2', 'Number3', 'Number4'))
And the output of str(newdf):
List of 4
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 150 obs. of 6 variables:
..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
..$ Newcolumn : chr [1:150] "Number1" "Number1" "Number1" "Number1" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 32 obs. of 12 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat : num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec : num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear : num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb : num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
..$ Newcolumn: chr [1:32] "Number2" "Number2" "Number2" "Number2" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 71 obs. of 3 variables:
..$ weight : num [1:71] 179 160 136 227 217 168 108 124 143 140 ...
..$ feed : chr [1:71] "horsebean" "horsebean" "horsebean" "horsebean" ...
..$ Newcolumn: chr [1:71] "Number3" "Number3" "Number3" "Number3" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1000 obs. of 6 variables:
..$ lat : num [1:1000] -20.4 -20.6 -26 -18 -20.4 ...
..$ long : num [1:1000] 182 181 184 182 182 ...
..$ depth : num [1:1000] 562 650 42 626 649 195 82 194 211 622 ...
..$ mag : num [1:1000] 4.8 4.2 5.4 4.1 4 4 4.8 4.4 4.7 4.3 ...
..$ stations : num [1:1000] 41 15 43 19 11 12 43 15 35 19 ...
..$ Newcolumn: chr [1:1000] "Number4" "Number4" "Number4" "Number4" ...
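If the labels follow a pattern, they do not have to be typed out by hand. The following is a minimal sketch, assuming a list of built-in data frames (`sheets`, a hypothetical stand-in for `AllSheets`) and labels generated with `paste0()` and `seq_along()`:

```r
# Stand-in for the list returned by read_excel(); tibbles should behave the same way here
sheets <- list(iris, mtcars, chickwts, quakes)

# One label per list element: "Number1", "Number2", ...
labels <- paste0("Number", seq_along(sheets))

# Map() walks over the data frames and the labels in parallel
labelled <- Map(function(df, label) {
  df$Newcolumn <- label  # the single label is recycled down the whole column
  df
}, sheets, labels)

# Check: each data frame carries exactly one distinct label
sapply(labelled, function(df) unique(df$Newcolumn))
```

Because the labels are derived from the length of the list, the same code keeps working if the workbook gains or loses sheets.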

Related

Selecting subsets of each dataset in a list in R

After using kfold from the dismo package, I am attempting to select a subset of the groups that this function makes from different datasets in a list in R. In an individual dataset, this is easy:
#With an individual dataset:
library(dismo)
data_car <- mtcars
group_presence <- kfold(x = data_car, k = 5) # kfold is in dismo package
# Separate observations into training and testing groups:
presence_train <- data_car[group_presence != 1, ]
But, I can't seem to get it to work across multiple datasets in a list in R:
#Now, with listed datasets:
data_1 <- mtcars
data_2 <- iris
mylist <- list(data_1, data_2)
mylist_data <- lapply(mylist, function(q) {
data = q
return(data)
})
mylist_groups <- lapply(mylist, function(q) {
group_item = kfold(x = q,
k = 5)
q$group_obj = group_item
return(q)
})
presence_train <- mylist_groups[group_obj != 1, ]
#Result:
Error: object 'group_obj' not found
We could use Map
out <- Map(function(x, y) x[y !=1, ], mylist, mylist_groups)
where
mylist_groups <- lapply(mylist, function(q) {
kfold(x = q,
k = 5)})
-output
> str(out)
List of 2
$ :'data.frame': 26 obs. of 11 variables:
..$ mpg : num [1:26] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:26] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:26] 160 160 108 258 360 ...
..$ hp : num [1:26] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:26] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:26] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:26] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:26] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:26] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:26] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:26] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 120 obs. of 5 variables:
..$ Sepal.Length: num [1:120] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:120] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:120] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:120] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
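The original attempt failed because `group_obj` is a column inside each data frame, not an object in the workspace, so the subsetting has to happen inside `lapply()` as well. The sketch below illustrates that variant. To stay self-contained it uses a hypothetical helper, `assign_folds()`, as a base R stand-in for `dismo::kfold()`; it is an assumption for illustration, not the dismo implementation.

```r
set.seed(1)  # arbitrary seed, only so the fold assignment is repeatable

# Hypothetical stand-in for dismo::kfold(): shuffled fold labels 1..k
assign_folds <- function(df, k = 5) {
  sample(rep(seq_len(k), length.out = nrow(df)))
}

mylist <- list(mtcars, iris)

# Store the fold label as a column of each data frame
with_groups <- lapply(mylist, function(df) {
  df$group_obj <- assign_folds(df, k = 5)
  df
})

# Subset inside lapply(), where df$group_obj is visible
presence_train <- lapply(with_groups, function(df) df[df$group_obj != 1, ])
presence_test  <- lapply(with_groups, function(df) df[df$group_obj == 1, ])

sapply(presence_train, nrow)
```

Keeping the fold label in the data frame means the training and testing subsets can both be recovered later from `with_groups` alone.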

Remove a dataframe in a nested list and add another dataframe in same nested list in R

I want to remove the data frame "plink_fam" from the nested list "format_files" inside the list "genotypes", and add another data frame called "df" to the same nested list "format_files".
How can I do it?
code
glimpse(genotypes)
output
List of 2
$ id_snps : chr [1:45807] "BovineHD0100000015" "Hapmap43437-BTA-101873" "BovineHD0100000062" "ARS-BFGL-NGS-16466" ...
$ format_files:List of 2
..$ plink_fam:'data.frame': 38996 obs. of 6 variables:
.. ..$ pedigree: logi [1:38996] NA NA NA NA NA NA ...
.. ..$ member : int [1:38996] 407243954 407537778 408990264 409742750 409817894 409859435 409922125 410570238 410829671 411075330 ...
.. ..$ father : int [1:38996] 400004752 400005622 412300604 412300604 400005917 400005850 400005850 400005375 400005607 400005356 ...
.. ..$ mother : int [1:38996] 406249617 406901234 411694156 408626860 410533913 411102034 411657369 407288999 408611867 407723032 ...
.. ..$ sex : int [1:38996] 2 2 2 2 2 2 2 2 2 2 ...
.. ..$ affected: logi [1:38996] NA NA NA NA NA NA ...
..$ plink_map:'data.frame': 45807 obs. of 6 variables:
.. ..$ chromosome: int [1:45807] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ snp.name : chr [1:45807] "BovineHD0100000015" "Hapmap43437-BTA-101873" "BovineHD0100000062" "ARS-BFGL-NGS-16466" ...
.. ..$ cM : logi [1:45807] NA NA NA NA NA NA ...
.. ..$ position : int [1:45807] 36337 135098 206470 267940 347418 348331 393248 471078 516341 533815 ...
.. ..$ allele.1 : chr [1:45807] "G" "G" "C" "T" ...
.. ..$ allele.2 : chr [1:45807] "A" "A" "T" "C" ...
Assign the new data.frame in place, with an index returned by grep, then change the list member name.
i <- grep("plink_fam", names(genotypes$format_files))
genotypes$format_files[[i]] <- cars
names(genotypes$format_files)[i] <- "df"
str(genotypes)
#List of 2
# $ id_snps : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ format_files:List of 2
# ..$ df :'data.frame': 50 obs. of 2 variables:
# .. ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
# .. ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
# ..$ plink_map:'data.frame': 32 obs. of 11 variables:
# .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
Test data
genotypes <- list(
  id_snps = 1:10,
  format_files = list(
    plink_fam = iris,
    plink_map = mtcars
  )
)
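If the position of the new element in the nested list does not matter, removal and addition can also be done as two separate assignments. This is a minimal sketch using the same test data, with `cars` standing in for the new data frame `df` (an assumption, since the real `df` is not shown):

```r
genotypes <- list(
  id_snps = 1:10,
  format_files = list(
    plink_fam = iris,
    plink_map = mtcars
  )
)
df <- cars  # placeholder for the data frame to add

# Assigning NULL removes the element from the nested list
genotypes$format_files$plink_fam <- NULL

# Assigning to a new name appends the element at the end
genotypes$format_files$df <- df

names(genotypes$format_files)
```

Unlike the in-place replacement, this puts "df" after "plink_map" rather than in the slot "plink_fam" occupied.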

append n identical dataframes with id [duplicate]

I want to append n identical data frames to each other. This works if n=2:
> d = data.frame(a=1:2)
> dplyr::bind_rows(d,d, .id="id")
# id a
# 1 1
# 1 2
# 2 1
# 2 2
But I don't know how to extend this to larger values of n, without manually typing something like dplyr::bind_rows(d,d,d, .id="id") for n = 3. Is there some smart way to programmatically feed a list of d with length=n to the bind_rows command? This doesn't work: dplyr::bind_rows(rep(d,3), .id="id").
Also - is there a data.table solution?
Here's a solution using data.table::rbindlist():
library(data.table)
l <- list(mtcars, mtcars*2, mtcars*3)
DATA
# Check l
> str(l)
List of 3
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 42 42 45.6 42.8 37.4 36.2 28.6 48.8 45.6 38.4 ...
..$ cyl : num [1:32] 12 12 8 12 16 12 16 8 8 12 ...
..$ disp: num [1:32] 320 320 216 516 720 ...
..$ hp : num [1:32] 220 220 186 220 350 210 490 124 190 246 ...
..$ drat: num [1:32] 7.8 7.8 7.7 6.16 6.3 5.52 6.42 7.38 7.84 7.84 ...
..$ wt : num [1:32] 5.24 5.75 4.64 6.43 6.88 6.92 7.14 6.38 6.3 6.88 ...
..$ qsec: num [1:32] 32.9 34 37.2 38.9 34 ...
..$ vs : num [1:32] 0 0 2 2 0 2 0 2 2 2 ...
..$ am : num [1:32] 2 2 2 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 8 8 8 6 6 6 6 8 8 8 ...
..$ carb: num [1:32] 8 8 2 2 4 2 8 4 4 8 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 63 63 68.4 64.2 56.1 54.3 42.9 73.2 68.4 57.6 ...
..$ cyl : num [1:32] 18 18 12 18 24 18 24 12 12 18 ...
..$ disp: num [1:32] 480 480 324 774 1080 ...
..$ hp : num [1:32] 330 330 279 330 525 315 735 186 285 369 ...
..$ drat: num [1:32] 11.7 11.7 11.55 9.24 9.45 ...
..$ wt : num [1:32] 7.86 8.62 6.96 9.64 10.32 ...
..$ qsec: num [1:32] 49.4 51.1 55.8 58.3 51.1 ...
..$ vs : num [1:32] 0 0 3 3 0 3 0 3 3 3 ...
..$ am : num [1:32] 3 3 3 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 12 12 12 9 9 9 9 12 12 12 ...
..$ carb: num [1:32] 12 12 3 3 6 3 12 6 6 12 ...
CODE & OUTPUT
dat <- rbindlist(l, use.names = TRUE, fill = TRUE)
# Verify if data looks like what we want
> str(dat)
Classes ‘data.table’ and 'data.frame': 96 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
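For the case asked about, n identical copies with an id, the key step is building a list of n data frames. `rep(d, 3)` repeats the columns of `d` rather than the data frame itself, whereas `rep(list(d), 3)` yields a list of three data frames. The sketch below binds that list in base R so it runs without extra packages; the variable names are illustrative.

```r
d <- data.frame(a = 1:2)
n <- 3

# A list holding n copies of the whole data frame
copies <- rep(list(d), n)

# Base R: prepend an id column to each copy, then stack them
out <- do.call(rbind, Map(function(df, id) cbind(id = id, df),
                          copies, seq_len(n)))
out

# The same list can be handed to the package functions from the question:
#   dplyr::bind_rows(copies, .id = "id")
#   data.table::rbindlist(copies, idcol = "id")
```

The result has n * nrow(d) rows, with `id` identifying which copy each row came from.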

best way to store many training data on R

I want to randomly split the dataset I have in R 100 times and see which training and testing data give the best model result. How should I store these data so I can compare the prediction results? Should I make a different variable for each training and testing set, or save them in an array? I'm pretty new to R so I don't really know the best way to do it. I'm using RStudio 1.1.423.
This is how I randomize the data; I use the holdout function from the rminer package:
H=holdout(myData$salary, ratio = 2/3, mode = "random")
trainData <- myData[H$tr,]
testData <- myData[H$ts,]
trainData and testData are the variables I made to store the training and testing data. myData is my dataset.
Whenever I deal with multiple frames of the same structure, I tend to put them into a list and do "one thing" to everything in that list. A good reference for this can be found here: How do I make a list of data frames?
In this example, there are a couple of ways to proceed. I don't have your data, so I'll use mtcars:
dat <- mtcars[1:3]
ntrain <- floor((2/3) * nrow(dat)) # whole number of training rows (21 here)
n <- 3 # 100 for you?
Reproducibility is important, but hard-coding set.seed can be problematic (academically, at least), so here's a randomly-generated seed that we track/store:
(seed <- sample(.Machine$integer.max, size=1L))
seed
# [1] 558990070
I like to store the indices for easy recall later.
set.seed(seed)
inds <- replicate(n, sample(nrow(dat), size=ntrain), simplify=FALSE)
str(inds)
# List of 3
# $ : int [1:21] 22 32 15 16 30 20 21 3 14 1 ...
# $ : int [1:21] 6 11 17 24 22 9 15 4 10 21 ...
# $ : int [1:21] 23 26 4 21 14 10 20 17 32 28 ...
Now these can be used easily to generate your training and test sets:
trains <- lapply(inds, function(i) dat[i,,drop=FALSE])
tests <- lapply(inds, function(i) dat[-i,,drop=FALSE])
str(tests)
# List of 3
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 18.1 14.3 24.4 22.8 17.8 32.4 30.4 13.3 19.2 27.3 ...
# ..$ cyl : num [1:11] 6 8 4 4 6 4 4 8 8 4 ...
# ..$ disp: num [1:11] 225 360 147 141 168 ...
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 21 18.7 24.4 16.4 17.3 10.4 33.9 19.2 26 15.8 ...
# ..$ cyl : num [1:11] 6 8 4 8 8 8 4 8 4 8 ...
# ..$ disp: num [1:11] 160 360 147 276 276 ...
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 21 18.7 18.1 22.8 17.8 17.3 10.4 32.4 30.4 19.2 ...
# ..$ cyl : num [1:11] 6 8 6 4 6 8 8 4 4 8 ...
# ..$ disp: num [1:11] 160 360 225 141 168 ...
Alternatively, you can generate both train/test in each element, though I don't know if this adds much value:
str(both)
# List of 3
# $ :List of 3
# ..$ ind : int [1:21] 22 32 15 16 30 20 21 3 14 1 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 15.5 21.4 10.4 10.4 19.7 33.9 21.5 22.8 15.2 21 ...
# .. ..$ cyl : num [1:21] 8 4 8 8 6 4 4 4 8 6 ...
# .. ..$ disp: num [1:21] 318 121 472 460 145 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 18.1 14.3 24.4 22.8 17.8 32.4 30.4 13.3 19.2 27.3 ...
# .. ..$ cyl : num [1:11] 6 8 4 4 6 4 4 8 8 4 ...
# .. ..$ disp: num [1:11] 225 360 147 141 168 ...
# $ :List of 3
# ..$ ind : int [1:21] 6 11 17 24 22 9 15 4 10 21 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 18.1 17.8 14.7 13.3 15.5 22.8 10.4 21.4 19.2 21.5 ...
# .. ..$ cyl : num [1:21] 6 6 8 8 8 4 8 6 6 4 ...
# .. ..$ disp: num [1:21] 225 168 440 350 318 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 21 18.7 24.4 16.4 17.3 10.4 33.9 19.2 26 15.8 ...
# .. ..$ cyl : num [1:11] 6 8 4 8 8 8 4 8 4 8 ...
# .. ..$ disp: num [1:11] 160 360 147 276 276 ...
# $ :List of 3
# ..$ ind : int [1:21] 23 26 4 21 14 10 20 17 32 28 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 15.2 27.3 21.4 21.5 15.2 19.2 33.9 14.7 21.4 30.4 ...
# .. ..$ cyl : num [1:21] 8 4 6 4 8 6 4 8 4 4 ...
# .. ..$ disp: num [1:21] 304 79 258 120 276 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 21 18.7 18.1 22.8 17.8 17.3 10.4 32.4 30.4 19.2 ...
# .. ..$ cyl : num [1:11] 6 8 6 4 6 8 8 4 4 8 ...
# .. ..$ disp: num [1:11] 160 360 225 141 168 ...
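The code that builds `both` is not shown above. One way to produce a list with that shape (`ind`, `train`, `test` in each element) might look like the sketch below. It repeats the setup so it runs on its own and uses an arbitrary fixed seed, so the sampled indices will differ from the output printed above.

```r
dat <- mtcars[1:3]
ntrain <- floor((2/3) * nrow(dat))
n <- 3

set.seed(42)  # arbitrary seed for this sketch
inds <- replicate(n, sample(nrow(dat), size = ntrain), simplify = FALSE)

# Each element bundles the sampled indices with the two subsets they define
both <- lapply(inds, function(i) {
  list(
    ind   = i,
    train = dat[i, , drop = FALSE],
    test  = dat[-i, , drop = FALSE]
  )
})

str(both, max.level = 2)
```

Bundling everything per split makes it harder to pair a training set with the wrong test set, at the cost of storing the data twice.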
From here, it's just a matter of running your model against the data:
results <- lapply(trains, function(x) randomForest(mpg~., data=x, ...))
(where ... are your other model parameters). Then something like:
validation <- mapply(function(result, test) predict(result, newdata=test, ...),
                     results, tests, SIMPLIFY=FALSE)
(You can certainly do more than just predict, perhaps checking yhat or similar.)
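To compare the splits, each set of predictions can be reduced to a single error value. The sketch below is a self-contained illustration that swaps in `lm()` for `randomForest()` so it runs with base R only, and uses root mean squared error as the comparison metric; both choices are assumptions for the example, not part of the answer above.

```r
dat <- mtcars[1:3]
n <- 5                               # 100 in the real use case
ntrain <- floor((2/3) * nrow(dat))

set.seed(123)                        # arbitrary seed for this sketch
inds   <- replicate(n, sample(nrow(dat), size = ntrain), simplify = FALSE)
trains <- lapply(inds, function(i) dat[i, , drop = FALSE])
tests  <- lapply(inds, function(i) dat[-i, , drop = FALSE])

# Fit one model per training set (lm() stands in for the real model)
models <- lapply(trains, function(x) lm(mpg ~ ., data = x))

# Score each model on its matching test set
rmse <- mapply(function(model, test) {
  pred <- predict(model, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
}, models, tests)

best <- which.min(rmse)              # index of the split with the lowest error
rmse
best
```

Since `inds`, `trains`, `tests`, `models`, and `rmse` share the same ordering, `inds[[best]]` recovers the rows behind the best-scoring split.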

Count rows of dataframes within a list of dataframes

I have a list of dataframes, str(datalist,max.level = 1) reveals
List of 9
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 41 obs. of 21 variables:
Now some of the 21 variables in each dataframe are themselves dataframes. For example, the 18th variable is a dataframe called topics, which in turn contains 3 variables. How do I get the count of rows in each of the topics dataframes?
I tried using the map() function from the purrr package : x <- map(datalist, ~.x[["topics"]]) and thereafter sapply(x, NROW) but this gives me the number of rows of the original dataframe and not the topics dataframe. Any help would be appreciated.
To give you an example of what the topics dataframe looks like, datalist[[1]]$topics[[1]]
  urlkey                  name                    id
1 selfdefense             Self-Defense            443
2 martial                 Martial Arts            681
3 jujitsu                 Jiu Jitsu               9615
4 mixed-martial-arts      Mixed Martial Arts      15514
5 kickboxing              Kickboxing              18225
6 jiu-jitsu               Jiu-jitsu               21219
7 brazilian-jiujitsu      Brazilian Jiu-Jitsu     22237
8 mma-mixed-martial-arts  MMA Mixed Martial Arts  35023
9 brazilian-jiu-jitsu     Brazilian Jiu Jitsu     46818
The solution you described works for me:
Make a reproducible example:
datalist <- list(
data.frame(V1 = 1:2, topics = I(list(mtcars, mtcars))),
data.frame(V1 = 1:2, topics = I(list(mtcars, mtcars)))
)
str(datalist)
# List of 2
# $ :'data.frame': 2 obs. of 2 variables:
# ..$ V1 : int [1:2] 1 2
# ..$ topics:List of 2
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..- attr(*, "class")= chr "AsIs"
# $ :'data.frame': 2 obs. of 2 variables:
# ..$ V1 : int [1:2] 1 2
# ..$ topics:List of 2
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..- attr(*, "class")= chr "AsIs"
Your solution:
library(purrr)
map(datalist, ~ sapply(.x[["topics"]], NROW))
# [[1]]
# [1] 32 32
#
# [[2]]
# [1] 32 32
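Another option is a small helper function:
count_rows <- function(dfs) {
  # topics is a list column, so count the rows of each nested data frame
  sapply(dfs$topics, NROW)
}
count <- lapply(datalist, count_rows)
The count_rows function extracts the "topics" column from each data frame in the list and applies NROW to every data frame stored in it. Calling nrow(dfs$topics) directly would return NULL when topics is a list column.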
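The original attempt, `sapply(x, NROW)` on the extracted columns, returned the parent row counts because each `topics` column is a list with one entry per parent row, and `NROW()` of a list is its length. The sketch below uses nested data frames of different sizes so the two levels are easy to tell apart; the example data is made up for illustration.

```r
# Two parent data frames with 2 and 3 rows; topics is a list column
datalist <- list(
  data.frame(V1 = 1:2, topics = I(list(mtcars, iris))),
  data.frame(V1 = 1:3, topics = I(list(cars, mtcars, iris)))
)

# Counting at the wrong level: length of each list column = parent row count
sapply(datalist, function(df) NROW(df$topics))

# Counting at the right level: rows of every nested data frame
counts <- lapply(datalist, function(df) vapply(df$topics, NROW, integer(1)))
counts

# Total number of topic rows per parent data frame
sapply(counts, sum)
```

`vapply()` with `integer(1)` is used instead of `sapply()` so the result is guaranteed to be an integer vector even if a list column is empty.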
