Count rows of dataframes within a list of dataframes - r

I have a list of dataframes, str(datalist,max.level = 1) reveals
List of 9
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 200 obs. of 21 variables:
$ :'data.frame': 41 obs. of 21 variables:
Now some of the variables within the 21 variables of each dataframe are themselves dataframes. For example, the 18th variable is a dataframe called topics, which in turn contains 3 variables. How do I get the count of rows in each of the topics dataframes?
I tried using the map() function from the purrr package: x <- map(datalist, ~.x[["topics"]]) and thereafter sapply(x, NROW), but this gives me the number of rows of the original dataframe and not of the topics dataframe. Any help would be appreciated.
To give you an example of what the topics dataframe looks like, here is datalist[[1]]$topics[[1]]:
                  urlkey                   name    id
1            selfdefense           Self-Defense   443
2                martial           Martial Arts   681
3                jujitsu              Jiu Jitsu  9615
4     mixed-martial-arts     Mixed Martial Arts 15514
5             kickboxing             Kickboxing 18225
6              jiu-jitsu              Jiu-jitsu 21219
7     brazilian-jiujitsu    Brazilian Jiu-Jitsu 22237
8 mma-mixed-martial-arts MMA Mixed Martial Arts 35023
9    brazilian-jiu-jitsu    Brazilian Jiu Jitsu 46818

The solution you described works for me:
Make a reproducible example:
datalist <- list(
data.frame(V1 = 1:2, topics = I(list(mtcars, mtcars))),
data.frame(V1 = 1:2, topics = I(list(mtcars, mtcars)))
)
str(datalist)
# List of 2
# $ :'data.frame': 2 obs. of 2 variables:
# ..$ V1 : int [1:2] 1 2
# ..$ topics:List of 2
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..- attr(*, "class")= chr "AsIs"
# $ :'data.frame': 2 obs. of 2 variables:
# ..$ V1 : int [1:2] 1 2
# ..$ topics:List of 2
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..$ :'data.frame': 32 obs. of 11 variables:
# .. .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
# .. ..- attr(*, "class")= chr "AsIs"
Your solution:
library(purrr)
map(datalist, ~ sapply(.x[["topics"]], NROW))
# [[1]]
# [1] 32 32
#
# [[2]]
# [1] 32 32

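count_rows <- function(df) {
  # topics is a list column, so count the rows of each nested data frame
  sapply(df$topics, NROW)
}
count <- lapply(datalist, count_rows)
The count_rows function extracts the topics column from each dataframe in the list and applies NROW to every dataframe stored in it. (Calling nrow() directly on the topics column would return NULL, because a list column has no dim attribute.)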
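A base R sketch of the same idea, for readers who prefer not to load purrr. It assumes, as in the reproducible example above, that topics is a list column holding one data frame per row; the toy data and the names counts and all_counts are illustrative only.

```r
# Toy data (assumption): each row of `topics` holds a nested data frame
datalist <- list(
  data.frame(V1 = 1:2, topics = I(list(head(mtcars, 3), head(iris, 5)))),
  data.frame(V1 = 1:2, topics = I(list(head(cars, 4), head(mtcars, 1))))
)

# For every outer data frame, count the rows of each nested data frame.
# vapply() enforces one integer per nested data frame.
counts <- lapply(datalist, function(df) vapply(df$topics, NROW, integer(1)))

# Flatten to a single vector if the grouping by outer data frame is not needed
all_counts <- unlist(counts)
```

NROW is used rather than nrow so that an empty or NULL entry is counted as 0 instead of causing an error.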

Related

Remove a dataframe in a nested list and add another dataframe in same nested list in R

I want to remove the plink_fam data.frame from a nested list called format_files inside the list genotypes, and add another data frame called df to the same nested list.
How can I do it?
Code:
glimpse(genotypes)
Output:
List of 2
$ id_snps : chr [1:45807] "BovineHD0100000015" "Hapmap43437-BTA-101873" "BovineHD0100000062" "ARS-BFGL-NGS-16466" ...
$ format_files:List of 2
..$ plink_fam:'data.frame': 38996 obs. of 6 variables:
.. ..$ pedigree: logi [1:38996] NA NA NA NA NA NA ...
.. ..$ member : int [1:38996] 407243954 407537778 408990264 409742750 409817894 409859435 409922125 410570238 410829671 411075330 ...
.. ..$ father : int [1:38996] 400004752 400005622 412300604 412300604 400005917 400005850 400005850 400005375 400005607 400005356 ...
.. ..$ mother : int [1:38996] 406249617 406901234 411694156 408626860 410533913 411102034 411657369 407288999 408611867 407723032 ...
.. ..$ sex : int [1:38996] 2 2 2 2 2 2 2 2 2 2 ...
.. ..$ affected: logi [1:38996] NA NA NA NA NA NA ...
..$ plink_map:'data.frame': 45807 obs. of 6 variables:
.. ..$ chromosome: int [1:45807] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ snp.name : chr [1:45807] "BovineHD0100000015" "Hapmap43437-BTA-101873" "BovineHD0100000062" "ARS-BFGL-NGS-16466" ...
.. ..$ cM : logi [1:45807] NA NA NA NA NA NA ...
.. ..$ position : int [1:45807] 36337 135098 206470 267940 347418 348331 393248 471078 516341 533815 ...
.. ..$ allele.1 : chr [1:45807] "G" "G" "C" "T" ...
.. ..$ allele.2 : chr [1:45807] "A" "A" "T" "C" ...
Assign the new data.frame in place, with an index returned by grep, then change the list member name.
i <- grep("plink_fam", names(genotypes$format_files))
genotypes$format_files[[i]] <- cars
names(genotypes$format_files)[i] <- "df"
str(genotypes)
#List of 2
# $ id_snps : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ format_files:List of 2
# ..$ df :'data.frame': 50 obs. of 2 variables:
# .. ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
# .. ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
# ..$ plink_map:'data.frame': 32 obs. of 11 variables:
# .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# .. ..$ disp: num [1:32] 160 160 108 258 360 ...
# .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
# .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
# .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
# .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
# .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
# .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
# .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
Test data
genotypes <- list(
id_snps = 1:10,
format_files = list(
plink_fam = iris,
plink_map = mtcars
)
)
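If the position of the new element does not matter, a shorter variant is to delete the old member by assigning NULL and then add the new one by name. This is only a sketch on the test data above, with cars standing in for the real df.

```r
genotypes <- list(
  id_snps = 1:10,
  format_files = list(plink_fam = iris, plink_map = mtcars)
)
df <- cars  # stand-in for the replacement data frame (assumption)

# Assigning NULL removes the element from the nested list
genotypes$format_files$plink_fam <- NULL
# Assigning to a new name appends the data frame at the end
genotypes$format_files$df <- df

names(genotypes$format_files)
```

Unlike the grep-based replacement, this puts df after plink_map rather than in the slot plink_fam used to occupy.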

append n identical dataframes with id [duplicate]

This question already has answers here:
Repeat rows of a data.frame N times
(10 answers)
Closed 4 years ago.
I want to append n identical data frames to each other. This works if n=2:
> d = data.frame(a=1:2)
> dplyr::bind_rows(d,d, .id="id")
# id a
# 1 1
# 1 2
# 2 1
# 2 2
But I don't know how to extend this to larger values of n without manually typing something like dplyr::bind_rows(d, d, d, .id="id") for n = 3. Is there some smart way to programmatically feed a list of d with length n to the bind_rows command? This doesn't work: dplyr::bind_rows(rep(d,3), .id="id").
Also - is there a data.table solution?
Here's a solution using data.table::rbindlist():
library(data.table)
l <- list(mtcars, mtcars*2, mtcars*3)
DATA
# Check l
> str(l)
List of 3
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 42 42 45.6 42.8 37.4 36.2 28.6 48.8 45.6 38.4 ...
..$ cyl : num [1:32] 12 12 8 12 16 12 16 8 8 12 ...
..$ disp: num [1:32] 320 320 216 516 720 ...
..$ hp : num [1:32] 220 220 186 220 350 210 490 124 190 246 ...
..$ drat: num [1:32] 7.8 7.8 7.7 6.16 6.3 5.52 6.42 7.38 7.84 7.84 ...
..$ wt : num [1:32] 5.24 5.75 4.64 6.43 6.88 6.92 7.14 6.38 6.3 6.88 ...
..$ qsec: num [1:32] 32.9 34 37.2 38.9 34 ...
..$ vs : num [1:32] 0 0 2 2 0 2 0 2 2 2 ...
..$ am : num [1:32] 2 2 2 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 8 8 8 6 6 6 6 8 8 8 ...
..$ carb: num [1:32] 8 8 2 2 4 2 8 4 4 8 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 63 63 68.4 64.2 56.1 54.3 42.9 73.2 68.4 57.6 ...
..$ cyl : num [1:32] 18 18 12 18 24 18 24 12 12 18 ...
..$ disp: num [1:32] 480 480 324 774 1080 ...
..$ hp : num [1:32] 330 330 279 330 525 315 735 186 285 369 ...
..$ drat: num [1:32] 11.7 11.7 11.55 9.24 9.45 ...
..$ wt : num [1:32] 7.86 8.62 6.96 9.64 10.32 ...
..$ qsec: num [1:32] 49.4 51.1 55.8 58.3 51.1 ...
..$ vs : num [1:32] 0 0 3 3 0 3 0 3 3 3 ...
..$ am : num [1:32] 3 3 3 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 12 12 12 9 9 9 9 12 12 12 ...
..$ carb: num [1:32] 12 12 3 3 6 3 12 6 6 12 ...
CODE & OUTPUT
dat <- rbindlist(l, use.names = TRUE, fill = TRUE)
# Verify if data looks like what we want
> str(dat)
Classes ‘data.table’ and 'data.frame': 96 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
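For the original question (n identical copies with an id column), one possible sketch follows. rep(d, 3) fails because a data frame is itself a list of columns, so rep() repeats the columns rather than the data frame; building the list with replicate(..., simplify = FALSE) avoids that. The names n, copies and stacked are illustrative.

```r
d <- data.frame(a = 1:2)
n <- 3

# A list holding n copies of the whole data frame
copies <- replicate(n, d, simplify = FALSE)

# Stack the copies and record which copy each row came from
stacked <- do.call(rbind, copies)
stacked$id <- rep(seq_len(n), each = nrow(d))
```

The same copies list can also be passed to dplyr::bind_rows(copies, .id = "id") or data.table::rbindlist(copies, idcol = "id"), which add the id column themselves.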

best way to store many training data on R

I want to randomly split my dataset in R 100 times and see which training and testing split gives the best model result. How should I store these data so I can compare the prediction results? Should I create a separate variable for each training and testing set, or store them in an array? I'm pretty new to R, so I don't really know the best way to do this. I'm using RStudio 1.1.423.
This is how I randomize the data, using the holdout function from the rminer package:
H=holdout(myData$salary, ratio = 2/3, mode = "random")
trainData <- myData[H$tr,]
testData <- myData[H$ts,]
trainData and testData are the variables I created to store the training and testing data; myData is my dataset.
Whenever I deal with multiple frames of the same structure, I tend to put them into a list and do "one thing" to everything in that list. A good reference for this can be found here: How do I make a list of data frames?.
In this example, there are a couple of ways to proceed. I don't have your data, so I'll use mtcars:
dat <- mtcars[1:3]
ntrain <- floor((2/3) * nrow(dat)) # whole number of training rows (21 of 32)
n <- 3 # 100 for you?
Reproducibility is important, but hard-coding set.seed can be problematic (academically, at least), so here's a randomly-generated seed that we track/store:
(seed <- sample(.Machine$integer.max, size=1L))
seed
# [1] 558990070
I like to store the indices for easy recall later.
set.seed(seed)
inds <- replicate(n, sample(nrow(dat), size=ntrain), simplify=FALSE)
str(inds)
# List of 3
# $ : int [1:21] 22 32 15 16 30 20 21 3 14 1 ...
# $ : int [1:21] 6 11 17 24 22 9 15 4 10 21 ...
# $ : int [1:21] 23 26 4 21 14 10 20 17 32 28 ...
Now these can be used easily to generate your training and test sets:
trains <- lapply(inds, function(i) dat[i,,drop=FALSE])
tests <- lapply(inds, function(i) dat[-i,,drop=FALSE])
str(tests)
# List of 3
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 18.1 14.3 24.4 22.8 17.8 32.4 30.4 13.3 19.2 27.3 ...
# ..$ cyl : num [1:11] 6 8 4 4 6 4 4 8 8 4 ...
# ..$ disp: num [1:11] 225 360 147 141 168 ...
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 21 18.7 24.4 16.4 17.3 10.4 33.9 19.2 26 15.8 ...
# ..$ cyl : num [1:11] 6 8 4 8 8 8 4 8 4 8 ...
# ..$ disp: num [1:11] 160 360 147 276 276 ...
# $ :'data.frame': 11 obs. of 3 variables:
# ..$ mpg : num [1:11] 21 18.7 18.1 22.8 17.8 17.3 10.4 32.4 30.4 19.2 ...
# ..$ cyl : num [1:11] 6 8 6 4 6 8 8 4 4 8 ...
# ..$ disp: num [1:11] 160 360 225 141 168 ...
Alternatively, you can generate both train/test in each element, though I don't know if this adds much value:
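The code that builds both is not shown. One way to get a list with the same ind/train/test layout might be the following sketch; the seed here is arbitrary, so the sampled rows will differ from the printed output.

```r
dat <- mtcars[1:3]
ntrain <- floor((2/3) * nrow(dat))
set.seed(1)  # arbitrary seed, for illustration only
inds <- replicate(3, sample(nrow(dat), size = ntrain), simplify = FALSE)

# One list element per split, holding the indices and both subsets
both <- lapply(inds, function(i) {
  list(ind   = i,
       train = dat[i, , drop = FALSE],
       test  = dat[-i, , drop = FALSE])
})
```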
str(both)
# List of 3
# $ :List of 3
# ..$ ind : int [1:21] 22 32 15 16 30 20 21 3 14 1 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 15.5 21.4 10.4 10.4 19.7 33.9 21.5 22.8 15.2 21 ...
# .. ..$ cyl : num [1:21] 8 4 8 8 6 4 4 4 8 6 ...
# .. ..$ disp: num [1:21] 318 121 472 460 145 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 18.1 14.3 24.4 22.8 17.8 32.4 30.4 13.3 19.2 27.3 ...
# .. ..$ cyl : num [1:11] 6 8 4 4 6 4 4 8 8 4 ...
# .. ..$ disp: num [1:11] 225 360 147 141 168 ...
# $ :List of 3
# ..$ ind : int [1:21] 6 11 17 24 22 9 15 4 10 21 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 18.1 17.8 14.7 13.3 15.5 22.8 10.4 21.4 19.2 21.5 ...
# .. ..$ cyl : num [1:21] 6 6 8 8 8 4 8 6 6 4 ...
# .. ..$ disp: num [1:21] 225 168 440 350 318 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 21 18.7 24.4 16.4 17.3 10.4 33.9 19.2 26 15.8 ...
# .. ..$ cyl : num [1:11] 6 8 4 8 8 8 4 8 4 8 ...
# .. ..$ disp: num [1:11] 160 360 147 276 276 ...
# $ :List of 3
# ..$ ind : int [1:21] 23 26 4 21 14 10 20 17 32 28 ...
# ..$ train:'data.frame': 21 obs. of 3 variables:
# .. ..$ mpg : num [1:21] 15.2 27.3 21.4 21.5 15.2 19.2 33.9 14.7 21.4 30.4 ...
# .. ..$ cyl : num [1:21] 8 4 6 4 8 6 4 8 4 4 ...
# .. ..$ disp: num [1:21] 304 79 258 120 276 ...
# ..$ test :'data.frame': 11 obs. of 3 variables:
# .. ..$ mpg : num [1:11] 21 18.7 18.1 22.8 17.8 17.3 10.4 32.4 30.4 19.2 ...
# .. ..$ cyl : num [1:11] 6 8 6 4 6 8 8 4 4 8 ...
# .. ..$ disp: num [1:11] 160 360 225 141 168 ...
From here, it's just a matter of running your model against the data:
library(randomForest)
results <- lapply(trains, function(x) randomForest(mpg~., data=x, ...))
(where ... are your other model parameters). Then something like:
validation <- mapply(function(result, test) predict(result, newdata=test, ...),
                     results, tests, SIMPLIFY=FALSE)
(You can certainly do more than just predict, perhaps checking yhat or similar.)
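To actually compare the splits, each model needs a score on its own test set. A minimal sketch, using lm as a stand-in for whatever model is being fitted and root mean squared error as an assumed metric (the names models, rmse and best are illustrative):

```r
dat <- mtcars[1:3]
n <- 3
ntrain <- floor((2/3) * nrow(dat))
set.seed(42)  # arbitrary seed, for illustration only
inds   <- replicate(n, sample(nrow(dat), size = ntrain), simplify = FALSE)
trains <- lapply(inds, function(i) dat[i, , drop = FALSE])
tests  <- lapply(inds, function(i) dat[-i, , drop = FALSE])

# Fit one model per training set (lm is a placeholder model)
models <- lapply(trains, function(x) lm(mpg ~ ., data = x))

# Score each model on the matching test set
rmse <- mapply(function(m, test) {
  sqrt(mean((test$mpg - predict(m, newdata = test))^2))
}, models, tests)

# Index of the split with the lowest test error
best <- which.min(rmse)
```

Because inds is stored, inds[[best]] recovers exactly which rows made up the best-scoring training set.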

Splitting a dataset into two datasets in R (for ggplot2 channeled through Shiny)

I saw some similar questions here, but none exactly like mine - or if they were the same, I didn't recognize it, as a rank newbie to programming in R (I've programmed in lots of other languages, but not R!)
I have an input dataset from a csv file that I read with read.csv. The dataset may or may not have two groups in it. I found I could split the groups as follows:
datalist <- split(mydata, mydata$group)
but then the list I get back does not play nice with ggplot2 (I get an error that it cannot plot a list variable - although the list variable, if I print it to the console, shows the split data subset?). OK, fine. But if I then do
data = as.data.frame(datalist[1])
And feed that to ggplot2, as.data.frame mangles my column names, and so I lose the name of the variable I want to plot. Augh!
What I ideally want, is to split my input data as read by read.csv, into two separate variables (data frames, I take it?) that ggplot2 can recognize as valid data sets. Actually, I want to overlay them as histograms on the same plot.
There HAS to be an easy way to do this, but I'm not gettin' it? Advice or pointers welcome.
If you just want a single index value then using subset might be easier (at least for interactive use.)
p <- qplot(value, # assuming there is a column named "value"
data = subset(mydata, group==mydata$group[1]),
colour = "cyan")
The result of split(mydata, mydata$group) is a list of data.frames. The [ and [[ operators differ: [ subsets the list, whereas [[ extracts an element from it. So datalist[1] is a list of length 1 containing just the first data.frame, while datalist[[1]] is the data.frame in the first position. Since ggplot (and qplot) expects a data.frame, you need the second (double-bracket) version, as @Alex mentioned in the comment. I don't know why you got the error you saw and can't diagnose it without a complete example. Using a different data set (mtcars), I don't see it.
datalist <- split(mtcars, mtcars$am)
ggplot(datalist[[1]], aes(x=wt, y=mpg)) + geom_point()
qplot(wt, data=datalist[[1]], colour="cyan")
(I'm guessing you wanted colour=I("cyan"), but that's an unrelated issue.)
The difference in the subsetting/extraction operators can be seen here:
> str(datalist)
List of 2
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
$ 1:'data.frame': 13 obs. of 11 variables:
..$ mpg : num [1:13] 21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
..$ cyl : num [1:13] 6 6 4 4 4 4 4 4 4 8 ...
..$ disp: num [1:13] 160 160 108 78.7 75.7 ...
..$ hp : num [1:13] 110 110 93 66 52 65 66 91 113 264 ...
..$ drat: num [1:13] 3.9 3.9 3.85 4.08 4.93 4.22 4.08 4.43 3.77 4.22 ...
..$ wt : num [1:13] 2.62 2.88 2.32 2.2 1.61 ...
..$ qsec: num [1:13] 16.5 17 18.6 19.5 18.5 ...
..$ vs : num [1:13] 0 0 1 1 1 1 1 0 1 0 ...
..$ am : num [1:13] 1 1 1 1 1 1 1 1 1 1 ...
..$ gear: num [1:13] 4 4 4 4 4 4 4 5 5 5 ...
..$ carb: num [1:13] 4 4 1 1 2 1 1 2 2 4 ...
> str(datalist[1])
List of 1
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
> str(datalist[[1]])
'data.frame': 19 obs. of 11 variables:
$ mpg : num 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
$ cyl : num 6 8 6 8 4 4 6 6 8 8 ...
$ disp: num 258 360 225 360 147 ...
$ hp : num 110 175 105 245 62 95 123 123 180 180 ...
$ drat: num 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
$ wt : num 3.21 3.44 3.46 3.57 3.19 ...
$ qsec: num 19.4 17 20.2 15.8 20 ...
$ vs : num 1 0 1 0 1 1 1 1 0 0 ...
$ am : num 0 0 0 0 0 0 0 0 0 0 ...
$ gear: num 3 3 3 3 4 4 4 4 3 3 ...
$ carb: num 1 2 1 4 2 2 4 4 3 3 ...

optimize (simple) loop for renaming inside list

I'm trying to learn about loops. I currently have a long list of data frames, and I need to go inside a bunch of these data frames and rename some variables. I have a function, but I'm struggling to construct a smart way to loop through my list (the real list is much longer than in the example below) and at the same time apply varying suffixes/prefixes when renaming.
Hopefully my working example below will illustrate the situation. I imagine I can build the last part into two loops, but I can't seem to figure out how I write to the data frame inside the list inside a loop.
Any help would be appreciated!
data(mtcars)
mtcarsList <- list(mtcars1 = mtcars, mtcars2 = mtcars,
mtcarsA = mtcars, mtcars = mtcars )
# function I use to rename a specific set of variables
baRadd <- function(df, vector, suffix){
names(df) <- ifelse(names(df) %in% vector,names(df),
paste(suffix, names(df), sep = "."))
return(df)}
foo <- c("mpg", "cyl", "disp")
suffix1 <- "bar"
suffix2 <- "barBAR"
suffix3 <- "barBARbar"
mtcarsList$mtcars1 <- baRadd(mtcarsList$mtcars1, foo, suffix1)
mtcarsList$mtcars2 <- baRadd(mtcarsList$mtcars2, foo, suffix2)
mtcarsList$mtcarsA <- baRadd(mtcarsList$mtcarsA, foo, suffix3)
names(mtcarsList$mtcars1)
# [1] "mpg" "cyl" "disp" "bar.hp" "bar.drat" "bar.wt"
# [7] "bar.qsec" "bar.vs" "bar.am" "bar.gear" "bar.carb"
names(mtcarsList$mtcars2)
# [1] "mpg" "cyl" "disp" "barBAR.hp" "barBAR.drat"
# [6] "barBAR.wt" "barBAR.qsec" "barBAR.vs" "barBAR.am" "barBAR.gear"
# [11] "barBAR.carb"
names(mtcarsList$mtcarsA)
# [1] "mpg" "cyl" "disp" "barBARbar.hp"
# [5] "barBARbar.drat" "barBARbar.wt" "barBARbar.qsec" "barBARbar.vs"
# [9] "barBARbar.am" "barBARbar.gear" "barBARbar.carb"
names(mtcarsList$mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
# [11] "carb"
Update:
Based on DWin's response below, I wrote this script that solves my issue:
# rm(list = ls(all = TRUE)) ## Clear workspace
data(mtcars)
mtcarsList <- list(mtcars1 = mtcars, mtcars2 = mtcars,
mtcarsA = mtcars, mtcars = mtcars)
## function I use to rename a specific set of variables
baRadd <- function(df, vector, suffix){
names(df) <- ifelse(names(df) %in% vector,names(df),
paste(suffix, names(df), sep = "."))
return(df)}
suffixes <- c('A', 'B', 'C') # suffixes to be added to the three dfTO
whatNOTtoRename <- c("mpg", "cyl", "disp")
# variables within the data frame I do not want to rename
dfTO <- c('mtcars1','mtcars2','mtcarsA')
# the specific data frames I need to rename
# str(mtcarsList)
mtcarsList[ names( mtcarsList[dfTO]) ] <-
mapply(baRadd, df=mtcarsList[dfTO],
suffix= suffixes,
MoreArgs=list(vector=whatNOTtoRename) , SIMPLIFY=FALSE)
str(mtcarsList)
Looks as though mapply can do this task:
> newList <- mapply(baRadd, df=mtcarsList[1:3], suffix= c(suffix1, suffix2, suffix3), MoreArgs=list(vector=foo) , SIMPLIFY=FALSE)
> str(newList)
List of 3
$ mtcars1:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ bar.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ bar.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ bar.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ bar.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ bar.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ bar.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ bar.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ bar.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ mtcars2:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ barBAR.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ barBAR.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ barBAR.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ barBAR.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ barBAR.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ barBAR.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ barBAR.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ barBAR.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ mtcarsA:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ barBARbar.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ barBARbar.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ barBARbar.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ barBARbar.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ barBARbar.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ barBARbar.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ barBARbar.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ barBARbar.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
If you wanted to assign that result to mtcarsList[1:3], that too should be possible.
To your comment: this succeeds ....
mtcarsList[ names( mtcarsList[1:3]) ] <-
mapply(baRadd, df=mtcarsList[1:3],
suffix= c(suffix1, suffix2, suffix3),
MoreArgs=list(vector=foo) , SIMPLIFY=FALSE)
# omitted output of str(mtcarsList) ....
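Since the question asked how to write back to a data frame inside the list from within a loop, here is a plain for-loop sketch of the same task. The named vector prefixes is an illustrative device (not from the original post) that pairs each list element with the string to prepend.

```r
baRadd <- function(df, vector, suffix) {
  names(df) <- ifelse(names(df) %in% vector, names(df),
                      paste(suffix, names(df), sep = "."))
  df
}

mtcarsList <- list(mtcars1 = mtcars, mtcars2 = mtcars,
                   mtcarsA = mtcars, mtcars = mtcars)
foo <- c("mpg", "cyl", "disp")

# Named vector: list element name -> string to prepend (illustrative)
prefixes <- c(mtcars1 = "bar", mtcars2 = "barBAR", mtcarsA = "barBARbar")

for (nm in names(prefixes)) {
  # Using [[nm]] on the left-hand side writes the result back into the list
  mtcarsList[[nm]] <- baRadd(mtcarsList[[nm]], foo, prefixes[[nm]])
}
```

Elements not named in prefixes (here mtcars) are left untouched, which mirrors the mapply version restricted to mtcarsList[1:3].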
