After using kfold from the dismo package, I am attempting to select a subset of the groups that this function creates, for each of several datasets stored in a list in R. With an individual dataset, this is easy:
#With an individual dataset:
library(dismo)
data_car <- mtcars
group_presence <- kfold(x = data_car, k = 5) # kfold is in dismo package
# Separate observations into training and testing groups:
presence_train <- data_car[group_presence != 1, ]
But, I can't seem to get it to work across multiple datasets in a list in R:
#Now, with listed datasets:
data_1 <- mtcars
data_2 <- iris
mylist <- list(data_1, data_2)
mylist_data <- lapply(mylist, function(q) {
data = q
return(data)
})
mylist_groups <- lapply(mylist, function(q) {
group_item = kfold(x = q,
k = 5)
q$group_obj = group_item
return(q)
})
presence_train <- mylist_groups[group_obj != 1, ]
#Result:
Error: object 'group_obj' not found
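We could use Map, pairing each dataset with its vector of fold assignments:
out <- Map(function(x, y) x[y != 1, ], mylist, mylist_groups)
where mylist_groups now holds only the fold vectors rather than the modified data frames:
mylist_groups <- lapply(mylist, function(q) {
kfold(x = q,
k = 5)})
Output: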
> str(out)
List of 2
$ :'data.frame': 26 obs. of 11 variables:
..$ mpg : num [1:26] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:26] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:26] 160 160 108 258 360 ...
..$ hp : num [1:26] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:26] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:26] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:26] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:26] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:26] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:26] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:26] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 120 obs. of 5 variables:
..$ Sepal.Length: num [1:120] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:120] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:120] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:120] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
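An alternative that stays closer to the code in the question is to keep the fold label as a column of each data frame and do the filtering inside lapply. The original line fails because mylist_groups is a list, and group_obj exists only as a column inside each element, not as a variable in the calling environment. The sketch below is illustrative only: fold_ids is a hypothetical base R stand-in for dismo::kfold (so the example runs without the package), and presence_test is a name introduced here.

```r
# Hypothetical stand-in for dismo::kfold(): randomly assigns each row to one of k folds.
fold_ids <- function(df, k = 5) {
  sample(rep(seq_len(k), length.out = nrow(df)))
}

set.seed(1)
mylist <- list(mtcars, iris)

# Attach the fold label to each data frame as a column
mylist_groups <- lapply(mylist, function(q) {
  q$group_obj <- fold_ids(q, k = 5)
  q
})

# Filter inside each data frame, where the group_obj column is visible
presence_train <- lapply(mylist_groups, function(q) q[q$group_obj != 1, ])
presence_test  <- lapply(mylist_groups, function(q) q[q$group_obj == 1, ])

# Number of training rows per dataset
sapply(presence_train, nrow)
```

Swapping fold_ids(q, k = 5) for kfold(x = q, k = 5) should give the dismo-based version, with the exact fold sizes depending on how kfold assigns rows.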
I want to append n identical data frames to each other. This works if n=2:
> d = data.frame(a=1:2)
> dplyr::bind_rows(d,d, .id="id")
# id a
# 1 1
# 1 2
# 2 1
# 2 2
But I don't know how to extend this to larger values of n without manually typing something like dplyr::bind_rows(d, d, d, .id="id") for n = 3. Is there some smart way to programmatically feed a list of d with length n to the bind_rows command? This doesn't work: dplyr::bind_rows(rep(d,3), .id="id").
Also - is there a data.table solution?
Here's a solution using data.table::rbindlist():
library(data.table)
l <- list(mtcars, mtcars*2, mtcars*3)
DATA
# Check l
> str(l)
List of 3
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 42 42 45.6 42.8 37.4 36.2 28.6 48.8 45.6 38.4 ...
..$ cyl : num [1:32] 12 12 8 12 16 12 16 8 8 12 ...
..$ disp: num [1:32] 320 320 216 516 720 ...
..$ hp : num [1:32] 220 220 186 220 350 210 490 124 190 246 ...
..$ drat: num [1:32] 7.8 7.8 7.7 6.16 6.3 5.52 6.42 7.38 7.84 7.84 ...
..$ wt : num [1:32] 5.24 5.75 4.64 6.43 6.88 6.92 7.14 6.38 6.3 6.88 ...
..$ qsec: num [1:32] 32.9 34 37.2 38.9 34 ...
..$ vs : num [1:32] 0 0 2 2 0 2 0 2 2 2 ...
..$ am : num [1:32] 2 2 2 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 8 8 8 6 6 6 6 8 8 8 ...
..$ carb: num [1:32] 8 8 2 2 4 2 8 4 4 8 ...
$ :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 63 63 68.4 64.2 56.1 54.3 42.9 73.2 68.4 57.6 ...
..$ cyl : num [1:32] 18 18 12 18 24 18 24 12 12 18 ...
..$ disp: num [1:32] 480 480 324 774 1080 ...
..$ hp : num [1:32] 330 330 279 330 525 315 735 186 285 369 ...
..$ drat: num [1:32] 11.7 11.7 11.55 9.24 9.45 ...
..$ wt : num [1:32] 7.86 8.62 6.96 9.64 10.32 ...
..$ qsec: num [1:32] 49.4 51.1 55.8 58.3 51.1 ...
..$ vs : num [1:32] 0 0 3 3 0 3 0 3 3 3 ...
..$ am : num [1:32] 3 3 3 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 12 12 12 9 9 9 9 12 12 12 ...
..$ carb: num [1:32] 12 12 3 3 6 3 12 6 6 12 ...
CODE & OUTPUT
dat <- rbindlist(l, use.names = TRUE, fill = TRUE)
# Verify if data looks like what we want
> str(dat)
Classes ‘data.table’ and 'data.frame': 96 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
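For the case the question actually asks about, n identical copies, one option is to build the list with replicate() rather than rep(). The rep(d, 3) attempt fails because a data frame is itself a list of columns, so rep() repeats the columns instead of the data frame. The following is a minimal base R sketch; the names n, copies and stacked are illustrative.

```r
d <- data.frame(a = 1:2)
n <- 3

# replicate(..., simplify = FALSE) returns a list holding n copies of d
copies <- replicate(n, d, simplify = FALSE)

# Stack the copies and record which copy each row came from
stacked <- do.call(rbind, copies)
stacked$id <- rep(seq_len(n), each = nrow(d))

stacked
```

The same list can be handed to dplyr::bind_rows(copies, .id = "id") or data.table::rbindlist(copies, idcol = "id") if those packages are preferred.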
I am using readxl to read data into R as a list of tibbles.
library("readxl")
library("reshape2")
xlsx_example<-readxl_example("datasets.xlsx")
AllSheets<-lapply(excel_sheets(xlsx_example), read_excel, path=xlsx_example)
AllSheets
This then brings up 4 tibbles, [[1]] through [[4]].
I would like to add a new column to all four, with a unique label in each. If it were a single data frame, I would use
AllSheets$Newcolumn<-"number1"
But this does not work when you have a list of tibbles. Is there a way to add Newcolumn to all the sheets, with "number1", "number2", etc. in each sheet?
You can use Map for this:
newdf <- Map(function(x, y) {
x$Newcolumn <- y
x
},
AllSheets,
c('Number1', 'Number2', 'Number3', 'Number4'))
And the output
List of 4
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 150 obs. of 6 variables:
..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
..$ Newcolumn : chr [1:150] "Number1" "Number1" "Number1" "Number1" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 32 obs. of 12 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat : num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec : num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear : num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb : num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
..$ Newcolumn: chr [1:32] "Number2" "Number2" "Number2" "Number2" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 71 obs. of 3 variables:
..$ weight : num [1:71] 179 160 136 227 217 168 108 124 143 140 ...
..$ feed : chr [1:71] "horsebean" "horsebean" "horsebean" "horsebean" ...
..$ Newcolumn: chr [1:71] "Number3" "Number3" "Number3" "Number3" ...
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1000 obs. of 6 variables:
..$ lat : num [1:1000] -20.4 -20.6 -26 -18 -20.4 ...
..$ long : num [1:1000] 182 181 184 182 182 ...
..$ depth : num [1:1000] 562 650 42 626 649 195 82 194 211 622 ...
..$ mag : num [1:1000] 4.8 4.2 5.4 4.1 4 4 4.8 4.4 4.7 4.3 ...
..$ stations : num [1:1000] 41 15 43 19 11 12 43 15 35 19 ...
..$ Newcolumn: chr [1:1000] "Number4" "Number4" "Number4" "Number4" ...
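If the number of sheets is not known in advance, the labels can be generated rather than typed out. This is a minimal sketch that uses two built-in data frames as a stand-in for the list returned by read_excel; the names sheets, labels and labelled are illustrative.

```r
sheets <- list(iris, mtcars)

# One label per list element: "Number1", "Number2", ...
labels <- paste0("Number", seq_along(sheets))

labelled <- Map(function(x, y) {
  x$Newcolumn <- y   # the single label is recycled to every row
  x
}, sheets, labels)

# Each element now carries its own label
unique(labelled[[2]]$Newcolumn)
```

Using seq_along() keeps the label vector the same length as the list, so the Map call does not need to change when sheets are added or removed.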
I saw some similar questions here, but none exactly like mine - or if they were the same, I didn't recognize it, as a rank newbie to programming in R (I've programmed in lots of other languages, but not R!)
I have an input dataset from a csv file that I read with read.csv. The dataset may or may not have two groups in it. I found I could split the groups as follows:
datalist <- split(mydata, mydata$group)
but then the list I get back does not play nice with ggplot2 (I get an error that it cannot plot a list variable - although the list variable, if I print it to the console, shows the split data subset?). OK, fine. But if I then do
data = as.data.frame(datalist[1])
And feed that to ggplot2, as.data.frame mangles my column names, and so I lose the name of the variable I want to plot. Augh!
What I ideally want, is to split my input data as read by read.csv, into two separate variables (data frames, I take it?) that ggplot2 can recognize as valid data sets. Actually, I want to overlay them as histograms on the same plot.
There HAS to be an easy way to do this, but I'm not gettin' it? Advice or pointers welcome.
If you just want a single index value then using subset might be easier (at least for interactive use.)
p <- qplot(value, # assuming there is a column named "value"
data = subset(mydata, group==mydata$group[1]),
colour = "cyan")
The result of split(mydata, mydata$group) is a list of data.frames. There is a difference between the [ and [[ notation: [ subsets the list, whereas [[ extracts from the list. So datalist[1] is a list of length 1 consisting of just the first data.frame, while datalist[[1]] is the data.frame in the first position. Since ggplot (and qplot) expects a data.frame, you need the second (double bracket) version, as @Alex mentioned in the comment. I don't know why you got the error you saw and can't diagnose it without a complete example. Using a different data set (mtcars), I don't see it.
datalist <- split(mtcars, mtcars$am)
ggplot(datalist[[1]], aes(x=wt, y=mpg)) + geom_point()
qplot(wt, data=datalist[[1]], colour="cyan")
(I'm guessing you wanted colour=I("cyan"), but that's an unrelated issue.)
The difference in the subsetting/extraction operators can be seen here:
> str(datalist)
List of 2
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
$ 1:'data.frame': 13 obs. of 11 variables:
..$ mpg : num [1:13] 21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
..$ cyl : num [1:13] 6 6 4 4 4 4 4 4 4 8 ...
..$ disp: num [1:13] 160 160 108 78.7 75.7 ...
..$ hp : num [1:13] 110 110 93 66 52 65 66 91 113 264 ...
..$ drat: num [1:13] 3.9 3.9 3.85 4.08 4.93 4.22 4.08 4.43 3.77 4.22 ...
..$ wt : num [1:13] 2.62 2.88 2.32 2.2 1.61 ...
..$ qsec: num [1:13] 16.5 17 18.6 19.5 18.5 ...
..$ vs : num [1:13] 0 0 1 1 1 1 1 0 1 0 ...
..$ am : num [1:13] 1 1 1 1 1 1 1 1 1 1 ...
..$ gear: num [1:13] 4 4 4 4 4 4 4 5 5 5 ...
..$ carb: num [1:13] 4 4 1 1 2 1 1 2 2 4 ...
> str(datalist[1])
List of 1
$ 0:'data.frame': 19 obs. of 11 variables:
..$ mpg : num [1:19] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
..$ cyl : num [1:19] 6 8 6 8 4 4 6 6 8 8 ...
..$ disp: num [1:19] 258 360 225 360 147 ...
..$ hp : num [1:19] 110 175 105 245 62 95 123 123 180 180 ...
..$ drat: num [1:19] 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
..$ wt : num [1:19] 3.21 3.44 3.46 3.57 3.19 ...
..$ qsec: num [1:19] 19.4 17 20.2 15.8 20 ...
..$ vs : num [1:19] 1 0 1 0 1 1 1 1 0 0 ...
..$ am : num [1:19] 0 0 0 0 0 0 0 0 0 0 ...
..$ gear: num [1:19] 3 3 3 3 4 4 4 4 3 3 ...
..$ carb: num [1:19] 1 2 1 4 2 2 4 4 3 3 ...
> str(datalist[[1]])
'data.frame': 19 obs. of 11 variables:
$ mpg : num 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 ...
$ cyl : num 6 8 6 8 4 4 6 6 8 8 ...
$ disp: num 258 360 225 360 147 ...
$ hp : num 110 175 105 245 62 95 123 123 180 180 ...
$ drat: num 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 ...
$ wt : num 3.21 3.44 3.46 3.57 3.19 ...
$ qsec: num 19.4 17 20.2 15.8 20 ...
$ vs : num 1 0 1 0 1 1 1 1 0 0 ...
$ am : num 0 0 0 0 0 0 0 0 0 0 ...
$ gear: num 3 3 3 3 4 4 4 4 3 3 ...
$ carb: num 1 2 1 4 2 2 4 4 3 3 ...
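The column-name mangling described in the question can be reproduced in the same way. The sketch below assumes the built-in mtcars data in place of mydata; it shows that calling as.data.frame() on the one-element list returned by [ changes the column names, while [[ hands back the data frame untouched. The names mangled and intact are illustrative.

```r
datalist <- split(mtcars, mtcars$am)

# Single bracket: still a list, so as.data.frame() has to construct new column names
mangled <- as.data.frame(datalist[1])
head(names(mangled), 3)

# Double bracket: the original data frame, names intact
intact <- datalist[[1]]
head(names(intact), 3)

is.data.frame(datalist[1])    # FALSE
is.data.frame(datalist[[1]])  # TRUE
```

For the overlaid histograms, splitting may not be needed at all. Assuming columns named value and group, something like ggplot(mydata, aes(x = value, fill = factor(group))) + geom_histogram(position = "identity", alpha = 0.5) draws both groups on one plot.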
I'm trying to learn about loops. I currently have a long list of data frames, and I need to go inside a bunch of these data frames and rename some variables. I have a function, but I'm struggling to construct a smart way to loop through my list (the real list is much longer than in the example below) and at the same time apply varying suffixes or prefixes when renaming.
Hopefully my working example below will illustrate the situation. I imagine I can build the last part into two loops, but I can't seem to figure out how I write to the data frame inside the list inside a loop.
Any help would be appreciated!
data(mtcars)
mtcarsList <- list(mtcars1 = mtcars, mtcars2 = mtcars,
mtcarsA = mtcars, mtcars = mtcars )
# function I use to rename a specific number of variables
baRadd <- function(df, vector, suffix){
names(df) <- ifelse(names(df) %in% vector,names(df),
paste(suffix, names(df), sep = "."))
return(df)}
foo <- c("mpg", "cyl", "disp")
suffix1 <- "bar"
suffix2 <- "barBAR"
suffix3 <- "barBARbar"
mtcarsList$mtcars1 <- baRadd(mtcarsList$mtcars1, foo, suffix1)
mtcarsList$mtcars2 <- baRadd(mtcarsList$mtcars2, foo, suffix2)
mtcarsList$mtcarsA <- baRadd(mtcarsList$mtcarsA, foo, suffix3)
names(mtcarsList$mtcars1)
# [1] "mpg" "cyl" "disp" "bar.hp" "bar.drat" "bar.wt"
# [7] "bar.qsec" "bar.vs" "bar.am" "bar.gear" "bar.carb"
names(mtcarsList$mtcars2)
# [1] "mpg" "cyl" "disp" "barBAR.hp" "barBAR.drat"
# [6] "barBAR.wt" "barBAR.qsec" "barBAR.vs" "barBAR.am" "barBAR.gear"
# [11] "barBAR.carb"
names(mtcarsList$mtcarsA)
# [1] "mpg" "cyl" "disp" "barBARbar.hp"
# [5] "barBARbar.drat" "barBARbar.wt" "barBARbar.qsec" "barBARbar.vs"
# [9] "barBARbar.am" "barBARbar.gear" "barBARbar.carb"
names(mtcarsList$mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
# [11] "carb"
Update:
Based on DWin's response below, I wrote this script that solves my issue:
# rm(list = ls(all = TRUE)) ## Clear workspace
data(mtcars)
mtcarsList <- list(mtcars1 = mtcars, mtcars2 = mtcars,
mtcarsA = mtcars, mtcars = mtcars)
## function I use to rename a specific number of variables
baRadd <- function(df, vector, suffix){
names(df) <- ifelse(names(df) %in% vector,names(df),
paste(suffix, names(df), sep = "."))
return(df)}
suffixes <- c('A', 'B', 'C') # suffixes to be added to the three dfTO
whatNOTtoRename <- c("mpg", "cyl", "disp")
# variables within the data frame I do not want to rename
dfTO <- c('mtcars1','mtcars2','mtcarsA')
# the specific data frames I need to rename
# str(mtcarsList)
mtcarsList[ names( mtcarsList[dfTO]) ] <-
mapply(baRadd, df=mtcarsList[dfTO],
suffix= suffixes,
MoreArgs=list(vector=whatNOTtoRename) , SIMPLIFY=FALSE)
str(mtcarsList)
Looks as though mapply can do this task:
> newList <- mapply(baRadd, df=mtcarsList[1:3], suffix= c(suffix1, suffix2, suffix3), MoreArgs=list(vector=foo) , SIMPLIFY=FALSE)
> str(newList)
List of 3
$ mtcars1:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ bar.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ bar.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ bar.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ bar.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ bar.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ bar.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ bar.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ bar.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ mtcars2:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ barBAR.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ barBAR.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ barBAR.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ barBAR.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ barBAR.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ barBAR.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ barBAR.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ barBAR.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ mtcarsA:'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp : num [1:32] 160 160 108 258 360 ...
..$ barBARbar.hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ barBARbar.drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ barBARbar.wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ barBARbar.qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ barBARbar.vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ barBARbar.am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ barBARbar.gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ barBARbar.carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
If you wanted to assign that result to mtcarsList[1:3], that too should be possible.
To your comment: this succeeds ....
mtcarsList[ names( mtcarsList[1:3]) ] <-
mapply(baRadd, df=mtcarsList[1:3],
suffix= c(suffix1, suffix2, suffix3),
MoreArgs=list(vector=foo) , SIMPLIFY=FALSE)
# omitted output of str(mtcarsList) ....
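Since the question asks specifically about writing to a data frame inside a list from within a loop, here is a sketch of the explicit for-loop equivalent. It uses [[ with the element name on both sides of the assignment. The names cars_list, add_prefix and prefixes are hypothetical; add_prefix is a re-creation of the question's baRadd for illustration, not the author's function.

```r
cars_list <- list(mtcars1 = mtcars, mtcars2 = mtcars,
                  mtcarsA = mtcars, mtcars = mtcars)

# Prefix every column name except those listed in keep
add_prefix <- function(df, keep, prefix) {
  rename <- !(names(df) %in% keep)
  names(df)[rename] <- paste(prefix, names(df)[rename], sep = ".")
  df
}

keep <- c("mpg", "cyl", "disp")

# Named vector: which list element gets which prefix
prefixes <- c(mtcars1 = "bar", mtcars2 = "barBAR", mtcarsA = "barBARbar")

for (nm in names(prefixes)) {
  # [[nm]] extracts the data frame and, on the left-hand side, writes it back
  cars_list[[nm]] <- add_prefix(cars_list[[nm]], keep, prefixes[[nm]])
}

names(cars_list$mtcars1)
```

Elements not named in prefixes, such as the last one here, are left unchanged.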
I'm renaming the majority of the variables in a data frame and I'm not really impressed with my method.
Therefore, does anyone on SO have a smarter or faster way than the one presented below, using only base?
data(mtcars)
# head(mtcars)
temp.mtcars <- mtcars
names(temp.mtcars) <- c((x <- c("mpg", "cyl", "disp")),
gsub('^', "baR.", setdiff(names (mtcars),x)))
str(temp.mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ baR.hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ baR.drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ baR.qsec: num 16.5 17 18.6 19.4 17 ...
$ baR.vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ baR.am : num 1 1 1 0 0 0 0 0 0 0 ...
$ baR.gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ baR.carb: num 4 4 1 1 2 1 4 2 2 4 ...
Edited for answer using base R only
The package plyr has a convenient function rename() that does what you ask. Your modified question specifies using base R only. One easy way of doing this is to simply copy the code from plyr::rename and create your own function.
rename <- function (x, replace) {
old_names <- names(x)
new_names <- unname(replace)[match(old_names, names(replace))]
setNames(x, ifelse(is.na(new_names), old_names, new_names))
}
The function rename takes an argument that is a named vector, where the elements of the vector are the new names and the names of the vector are the existing names. There are many ways to construct such a named vector. In the example below I simply use structure.
x <- c("mpg", "disp", "wt")
some.names <- structure(paste0("baR.", x), names=x)
some.names
mpg disp wt
"baR.mpg" "baR.disp" "baR.wt"
Now you are ready to rename:
mtcars <- rename(mtcars, replace=some.names)
The results:
'data.frame': 32 obs. of 11 variables:
$ baR.mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ baR.disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec : num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear : num 4 4 4 3 3 3 3 4 4 4 ...
$ carb : num 4 4 1 1 2 1 4 2 2 4 ...
I would use ifelse:
names(temp.mtcars) <- ifelse(names(mtcars) %in% c("mpg", "cyl", "disp"),
names(mtcars),
paste("baR", names(mtcars), sep = "."))
Nearly the same but without plyr:
data(mtcars)
temp.mtcars <- mtcars
carNames <- names(temp.mtcars)
modifyNames <- !(carNames %in% c("mpg", "cyl", "disp"))
names(temp.mtcars)[modifyNames] <- paste("baR.", carNames[modifyNames], sep="")
Output:
str(temp.mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ baR.hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ baR.drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ baR.qsec: num 16.5 17 18.6 19.4 17 ...
$ baR.vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ baR.am : num 1 1 1 0 0 0 0 0 0 0 ...
$ baR.gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ baR.carb: num 4 4 1 1 2 1 4 2 2 4 ...
You could use the rename.vars function in the gdata package.
It works well when you only want to replace a subset of variable names and where the order of your vector of names is not the same as the order of names in the data.frame.
Adapted from the help file:
library(gdata)
data <- data.frame(x=1:10,y=1:10,z=1:10)
names(data)
data <- rename.vars(data, from=c("z","y"), to=c("Z","Y"))
names(data)
Converts data.frame names:
[1] "x" "y" "z"
to
[1] "x" "Y" "Z"
Note how this handles the subsetting, and that the vector of names need not be in the same order as the names in the data.frame.
A base R alternative uses match() to locate the old names:
names(df)[match(
c('old_var1','old_var2'),
names(df)
)] <- c('new_var1', 'new_var2')
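To make the match() idiom concrete, here is a small worked example with the same toy data frame as the gdata answer. The names old and new are illustrative.

```r
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

old <- c("z", "y")   # need not follow the column order
new <- c("Z", "Y")

# match() returns the position of each old name within names(df)
names(df)[match(old, names(df))] <- new

names(df)
# [1] "x" "Y" "Z"
```

An old name that is not present makes match() return NA, so checking all(old %in% names(df)) first is prudent.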