Let's take mtcars as an example and create a new variable:
mtcars$name <- rownames(mtcars)
mtcars[,] <- lapply(mtcars, factor)
mtcars[,] <- lapply(mtcars, as.numeric)
Now the names are converted into numerics, which I definitely don't want:
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb name
Mazda RX4 16 2 13 11 16 9 6 1 2 2 4 18
Mazda RX4 Wag 16 2 13 11 16 12 10 1 2 2 4 19
Datsun 710 19 1 6 6 15 7 22 2 2 2 1 5
Hornet 4 Drive 17 2 16 11 5 16 24 2 1 1 1 13
Hornet Sportabout 13 3 23 15 6 18 10 1 1 1 2 14
Valiant 12 2 15 9 1 19 29 2 1 1 1 31
Duster 360 3 3 23 20 7 21 5 1 1 1 4 7
Merc 240D 20 1 12 2 11 15 27 2 1 2 2 21
How can I convert the factors back into the right types (character, logical, numeric, ...)?
It is possible that type.convert would suit your needs. It coerces its input to the most basic data type that can represent it. Thus, it would turn a character column that contains numbers that can be represented as integer into an integer column.
mtcars$name <- rownames(mtcars)
str(mtcars)
# 'data.frame': 32 obs. of 12 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num 160 160 108 258 360 ...
# $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
# $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num 16.5 17 18.6 19.4 17 ...
# $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
# $ am : num 1 1 1 0 0 0 0 0 0 0 ...
# $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
# $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# $ name: chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
mtcars[,] <- lapply(mtcars, factor)
str(mtcars)
# 'data.frame': 32 obs. of 12 variables:
# $ mpg : Factor w/ 25 levels "10.4","13.3",..: 16 16 19 17 13 12 3 20 19 14 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: Factor w/ 27 levels "71.1","75.7",..: 13 13 6 16 23 15 23 12 10 14 ...
# $ hp : Factor w/ 22 levels "52","62","65",..: 11 11 6 11 15 9 20 2 7 13 ...
# $ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
# $ wt : Factor w/ 29 levels "1.513","1.615",..: 9 12 7 16 18 19 21 15 13 18 ...
# $ qsec: Factor w/ 30 levels "14.5","14.6",..: 6 10 22 24 10 29 5 27 30 19 ...
# $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
# $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
# $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
# $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
# $ name: Factor w/ 32 levels "AMC Javelin",..: 18 19 5 13 14 31 7 21 20 22 ...
mtcars[,] <- lapply(mtcars, function(x) type.convert(as.character(x), as.is = TRUE))
str(mtcars)
#'data.frame': 32 obs. of 12 variables:
#$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#$ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
#$ disp: num 160 160 108 258 360 ...
#$ hp : int 110 110 93 110 175 105 245 62 95 123 ...
#$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#$ qsec: num 16.5 17 18.6 19.4 17 ...
#$ vs : int 0 0 1 1 0 1 0 1 1 1 ...
#$ am : int 1 1 1 0 0 0 0 0 0 0 ...
#$ gear: int 4 4 4 3 3 3 3 4 4 4 ...
#$ carb: int 4 4 1 1 2 1 4 2 2 4 ...
#$ name: chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
If you don't store the original column classes before you turn the columns into factors, there is no way to restore this information completely. However, that shouldn't be necessary anyway.
df <- data.frame(x = factor(1:10)
,y = factor(1:10))
str(df)
df[,] <- lapply(df, function(x) {as.numeric(as.character(x))})
str(df)
Result:
'data.frame': 10 obs. of 2 variables:
$ x: num 1 2 3 4 5 6 7 8 9 10
$ y: num 1 2 3 4 5 6 7 8 9 10
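If the original classes do matter, one option is to record them before converting. The following is a minimal sketch under the assumption that every column is a plain numeric, integer, character, or logical vector; the names `orig_class` and the small example data frame are illustrative and not part of the question.

```r
# Small example data frame with four basic column types (illustrative only)
df <- data.frame(n = c(1.5, 2.5, 3.5),
                 i = 1:3,
                 s = c("a", "b", "c"),
                 b = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Remember the first class of every column before touching anything
orig_class <- vapply(df, function(x) class(x)[1], character(1))

# Turn everything into factors, as in the question
df[] <- lapply(df, factor)

# Restore: go through character first, then call as.numeric / as.integer /
# as.character / as.logical depending on the stored class name
df[] <- Map(function(col, cls) match.fun(paste0("as.", cls))(as.character(col)),
            df, orig_class)

str(df)
```

Going through `as.character()` first is what avoids getting the internal factor codes instead of the original values.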
I intend to use a function to save me typing work for repetitive procedures. Many things are already working but not everything is working yet. Here is the code:
quicky <- function(df, factors){
output <- as.character(substitute(factors)[-1])
print(output)
df[,output]
for(i in names(df[,output])){
hist(df[,as.character(i)])
df[,as.character(i)] <- as.factor(df[,as.character(i)])#<- Why does this not work?
}
}
quicky(mtcars, c(cyl,hp,drat))
Request for help and explanation! Thanks in advance.
Since 'output' already holds the column names, we can loop over it directly instead of subsetting the data again and extracting the names. Also, return the dataset at the end of the function:
quicky <- function(df, factors){
output <- as.character(substitute(factors)[-1])
print(output)
for(i in output){
df[[i]] <- as.factor(df[[i]])
}
df
}
out <- quicky(mtcars, c(cyl,hp,drat))
str(out)
#'data.frame': 32 obs. of 11 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ... ###
# $ disp: num 160 160 108 258 360 ...
# $ hp : Factor w/ 22 levels "52","62","65",..: 11 11 6 11 15 9 20 2 7 13 ...###
# $ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...###
# $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num 16.5 17 18.6 19.4 17 ...
# $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
# $ am : num 1 1 1 0 0 0 0 0 0 0 ...
# $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
# $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
NOTE: Changed the [ to [[ so that it works with data.table and tbl_df
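For readers wondering how the `substitute(factors)[-1]` line turns unquoted column names into strings, here is a small sketch. The helper name `col_names` is made up for illustration.

```r
# substitute() captures the unevaluated call c(cyl, hp, drat);
# dropping the first element removes the function name `c`,
# and as.character() turns the remaining symbols into strings.
col_names <- function(cols) {
  as.character(substitute(cols)[-1])
}

picked <- col_names(c(cyl, hp, drat))
print(picked)
```

Note that this relies on the argument being written as a `c(...)` call; passing a single bare name such as `cyl` would not work with this approach, because a symbol cannot be subset.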
The reason quicky fails to return the results of the assignments to the columns of df is a peculiar feature of R's for loop: it returns NULL. Since the last expression evaluated within your quicky function was the for loop, the function returned NULL. All you need to do is evaluate df after the loop:
quicky <- function(df, factors){
output <- as.character(substitute(factors)[-1])
print(output)
df[,output]
for(i in names(df[,output])){
hist(df[,as.character(i)])
df[, i] <- as.factor(df[, i ])
}; df # add a call to evaluate `df`
}
str( quicky(mtcars, c(cyl,hp,drat)) )
#-------
[1] "cyl" "hp" "drat"
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : Factor w/ 22 levels "52","62","65",..: 11 11 6 11 15 9 20 2 7 13 ...
$ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
This behavior of for contrasts with most other functions in R. A for loop does not create its own environment, so evaluations and assignments made in its body take effect in the environment where the loop runs (here, the body of quicky), but the loop itself returns NULL. Most other functions have no effect outside their own evaluation environment, so the programmer must assign the returned value to a named object if any lasting effect is desired. (You should, of course, not expect the value of mtcars to be affected by that action.)
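A tiny sketch of the difference, using two made-up functions (`f_loop_last` and `f_value_last`) that differ only in what their last expression is:

```r
# Last expression is the for loop, so the function returns NULL
f_loop_last <- function(x) {
  for (i in seq_along(x)) x[i] <- x[i] * 2
}

# Last expression is x, so the modified copy is returned
f_value_last <- function(x) {
  for (i in seq_along(x)) x[i] <- x[i] * 2
  x
}

v  <- c(1, 2, 3)
r1 <- f_loop_last(v)
r2 <- f_value_last(v)

is.null(r1)  # the loop's value
r2           # the doubled copy
v            # the original is untouched in both cases
```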
names <- names(mtcars)
str(mtcars[names[1]]) # shows the str for mpg data frame
I would like to select everything EXCEPT names[1] which in this example is mpg.
Tried:
str(mtcars[!names[1]])
Error in !names[1] : invalid argument type
Also tried
str(mtcars[-names[1]])
Error in -names[1] : invalid argument to unary operator
How can I select mtcars minus the names[1] column using square bracket syntax?
str(mtcars[!names %in% names[1]])
'data.frame': 32 obs. of 10 variables:
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
If you want to select by numeric position, put a - in front of the index to invert the selection. Here the negative index is applied to the vector of names, so names[-1] is every name except the first.
str(mtcars[names[1]]) # shows the str for mpg data frame
'data.frame': 32 obs. of 1 variable:
$ mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
str(mtcars[names[-1]])
'data.frame': 32 obs. of 10 variables:
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
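To put the variants side by side, here is a short sketch of three base R ways to drop a column by name. The variable names `nms`, `drop_col`, `by_name`, `by_pos` and `by_mask` are chosen for illustration only.

```r
nms      <- names(mtcars)
drop_col <- nms[1]                       # "mpg"

by_name <- mtcars[setdiff(nms, drop_col)]   # keep every name except drop_col
by_pos  <- mtcars[-match(drop_col, nms)]    # negative numeric position
by_mask <- mtcars[!nms %in% drop_col]       # logical mask

names(by_name)
```

The `-match()` form assumes the name is actually present; if it is not, `match()` returns NA and the subset fails, whereas the `setdiff()` and `%in%` forms simply return all columns.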
I am writing a function that would give the dim() and str() of a given dataset.
JustfunFun <- function(.csv) {
csv <- read.csv(.csv)
dimVal <- dim(csv)
print("The dimension of the dataset is:")
strVal <- str(csv)
print("The structute of the dataset is:")
headVal <- head(csv)
return(list(dimVal, strVal, headVal))
}
Ideally, the output must have the dimension first, the structure second and then the head of dataset.
But the output is as follows:
> JustfunFun("tips.csv")
[1] "The dimension of the dataset is:"
'data.frame': 244 obs. of 8 variables:
$ obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ totbill: num 17 10.3 21 23.7 24.6 ...
$ tip : num 1.01 1.66 3.5 3.31 3.61 4.71 2 3.12 1.96 3.23 ...
$ sex : Factor w/ 2 levels "F","M": 1 2 2 2 1 2 2 2 2 2 ...
$ smoker : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ day : Factor w/ 4 levels "Fri","Sat","Sun",..: 3 3 3 3 3 3 3 3 3 3 ...
$ time : Factor w/ 2 levels "Day","Night": 2 2 2 2 2 2 2 2 2 2 ...
$ size : int 2 3 3 2 4 4 2 4 2 2 ...
[1] "The structute of the dataset is:"
[1] "The head of the dataset is:"
[[1]]
[1] 244 8
[[2]]
NULL
[[3]]
obs totbill tip sex smoker day time size
1 1 16.99 1.01 F No Sun Night 2
2 2 10.34 1.66 M No Sun Night 3
3 3 21.01 3.50 M No Sun Night 3
4 4 23.68 3.31 M No Sun Night 2
5 5 24.59 3.61 F No Sun Night 4
6 6 25.29 4.71 M No Sun Night 4
>
How do I tackle this problem?
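str does not return anything useful: it prints its output as a side effect and returns NULL invisibly (print, by contrast, invisibly returns its argument). You can see this in the last line of utils:::str.default. The easiest way to check is to nest a call to str, i.e. str(str(mtcars)), which prints the structure followed by NULL.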
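A quick way to convince yourself, sketched with the built-in mtcars data (the names `res` and `txt` are just for illustration):

```r
# str() prints to the console, but the value it hands back is NULL
res <- str(mtcars)
is.null(res)

# capture.output() collects the printed lines as a character vector instead
txt <- capture.output(str(mtcars))
class(txt)
length(txt)   # one header line plus one line per column
```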
This function should print the way you want, AND store the data.
JustfunFun <- function(.csv) {
csv <- read.csv(.csv)
dimVal <- dim(csv)
print("The dimension of the dataset is:")
print(dimVal)
print("The structute of the dataset is:")
strVal <- utils::capture.output(str(csv))  # capture.output is exported, so :: is enough
print(strVal)
print(head(csv))
return(invisible(list(dimVal, strVal, head(csv))))
}
Example:
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
a <- JustfunFun("mtcars.csv")
Result:
[1] "The dimension of the dataset is:"
[1] 32 11
[1] "The structute of the dataset is:"
[1] "'data.frame':\t32 obs. of 11 variables:"
[2] " $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ..."
[3] " $ cyl : int 6 6 4 6 8 6 8 4 4 6 ..."
[4] " $ disp: num 160 160 108 258 360 ..."
[5] " $ hp : int 110 110 93 110 175 105 245 62 95 123 ..."
[6] " $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ..."
[7] " $ wt : num 2.62 2.88 2.32 3.21 3.44 ..."
[8] " $ qsec: num 16.5 17 18.6 19.4 17 ..."
[9] " $ vs : int 0 0 1 1 0 1 0 1 1 1 ..."
[10] " $ am : int 1 1 1 0 0 0 0 0 0 0 ..."
[11] " $ gear: int 4 4 4 3 3 3 3 4 4 4 ..."
[12] " $ carb: int 4 4 1 1 2 1 4 2 2 4 ..."
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(a)
$ : int [1:2] 32 11
$ : chr [1:12] "'data.frame':\t32 obs. of 11 variables:" " $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ..." " $ cyl : int 6 6 4 6 8 6 8 4 4 6 ..." " $ disp: num 160 160 108 258 360 ..." ...
$ :'data.frame': 6 obs. of 11 variables:
..$ mpg : num [1:6] 21 21 22.8 21.4 18.7 18.1
..$ cyl : int [1:6] 6 6 4 6 8 6
..$ disp: num [1:6] 160 160 108 258 360 225
..$ hp : int [1:6] 110 110 93 110 175 105
..$ drat: num [1:6] 3.9 3.9 3.85 3.08 3.15 2.76
..$ wt : num [1:6] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:6] 16.5 17 18.6 19.4 17 ...
..$ vs : int [1:6] 0 0 1 1 0 1
..$ am : int [1:6] 1 1 1 0 0 0
..$ gear: int [1:6] 4 4 4 3 3 3
..$ carb: int [1:6] 4 4 1 1 2 1
Everything you have written in your function is correct, except that you need to capture the output of str with capture.output. So, below is the function you are looking for:
JustfunFun <- function(.csv) {
csv <- read.csv(.csv)
dimVal <- dim(csv)
strVal <- capture.output(str(csv))
headVal <- head(csv)
return(list("The dimension of the dataset is:" = dimVal,
"The structute of the dataset is:" = strVal,
headVal))
}
Cheers & happy R Coding
If you are only interested in displaying the result without storing it, you could write a function that simply prints the dimensions, the structure and the head of the data. The following code seems to do the trick.
# Simulation of the data
dat <- data.frame(obs=1:100,totbill=round(100*runif(100)),sex=sample(c("F","M"),100,replace=TRUE))
# Function
dim.str.head <- function(dat){
print("The dimension of the dataset is:")
print(dim(dat))
print("The structure of the dataset is:")
str(dat)  # str() prints by itself; wrapping it in print() would also print NULL
print("The first observations look like this:")
head(dat)
}
# Try the function
dim.str.head(dat)
I have a big ol' data frame with two ID columns for courses and users, and I needed to split it into one dataframe per course to do some further analysis/subsetting. After eliminating quite a few rows from each of the individual course dataframes, I'll need to stick them back together.
I split it up using, you guessed it, split, and that worked exactly as I needed it to. However, unsplitting was harder than I thought. The R documentation says that "unsplit reverses the effect of split," but my reading on the web so far is suggesting that that is not the case when the elements of the split-out list are themselves dataframes.
What can I do to rejoin my modified dfs?
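This is a place for do.call. Simply calling df <- rbind(split.df) treats the whole list as a single argument and returns a one-row list matrix rather than a data frame, which is not useful. do.call("rbind", split.df) instead passes each element of the list to rbind as a separate argument and should give you the result you're looking for.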
unsplit() does work in the general situation you describe, but not in the particular situation where rows have been removed from the split data frames.
Consider
> spl <- split(mtcars, mtcars$cyl)
> str(spl, max = 1)
List of 3
$ 4:'data.frame': 11 obs. of 11 variables:
$ 6:'data.frame': 7 obs. of 11 variables:
$ 8:'data.frame': 14 obs. of 11 variables:
> str(unsplit(spl, f = mtcars$cyl))
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
As we can see, unsplit() can undo a split. However, in the case where the split data frame is further worked upon and altered to remove rows, there will be a mismatch between the total number of rows in the data frames in the split list and the variable used to split the original data frame.
If you know or can compute the changes required to make the variable used to split the original data frame then unsplit() can be deployed. Though it is more than likely that this will not be trivial.
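As a rough sketch of that route, assume the rows are removed by a condition that can also be evaluated on the original data frame. The `hp < 200` filter and the names `trimmed`, `keep` and `restored` are purely illustrative, and the sketch also assumes that no group ends up empty.

```r
spl <- split(mtcars, mtcars$cyl)

# Remove rows inside each piece (illustrative filter)
trimmed <- lapply(spl, function(d) d[d$hp < 200, ])

# Recompute the splitting variable for the rows that survived
keep <- mtcars$hp < 200
f    <- mtcars$cyl[keep]

# Now the lengths agree and unsplit() can put the pieces back in original order
restored <- unsplit(trimmed, f = f)
nrow(restored)
```

When the removal logic cannot be replayed on the original data like this, the splitting variable is hard to reconstruct, which is where the rbind idiom becomes the simpler choice.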
The general solution, as @Andrew Sannier mentions, is the do.call(rbind, ...) idiom:
> spl <- split(mtcars, mtcars$cyl)
> str(do.call(rbind, spl))
'data.frame': 32 obs. of 11 variables:
$ mpg : num 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
$ cyl : num 4 4 4 4 4 4 4 4 4 4 ...
$ disp: num 108 146.7 140.8 78.7 75.7 ...
$ hp : num 93 62 95 66 52 65 97 66 91 113 ...
$ drat: num 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
$ wt : num 2.32 3.19 3.15 2.2 1.61 ...
$ qsec: num 18.6 20 22.9 19.5 18.5 ...
$ vs : num 1 1 1 1 1 1 1 1 0 1 ...
$ am : num 1 0 0 1 1 1 0 1 1 1 ...
$ gear: num 4 4 4 4 4 4 3 4 5 5 ...
$ carb: num 1 2 2 1 2 1 1 1 2 2 ...
Outside of base R, also consider:
data.table::rbindlist() with the side effect of the result being a data.table
dplyr::bind_rows() which despite its somewhat confusing name will bind rows across lists
The answer by Andrew Sannier works but has the side-effect that the rownames get changed. rbind adds the list names to them, so e.g. "Datsun 710" becomes "4.Datsun 710". One can use unname in between to avoid this problem.
Complete example:
mtcars_reorder = mtcars[order(mtcars$cyl), ] #reorder based on cyl first
l1 = split(mtcars_reorder, mtcars_reorder$cyl) #split by cyl
l1 = unname(l1) #remove list names
l2 = do.call(what = "rbind", l1) #unsplit
all(l2 == mtcars_reorder) #check if matches
#> TRUE
I'm renaming the majority of the variables in a data frame and I'm not really impressed with my method.
Therefore, does anyone on SO have a smarter or faster way than the one presented below, using only base R?
data(mtcars)
# head(mtcars)
temp.mtcars <- mtcars
names(temp.mtcars) <- c((x <- c("mpg", "cyl", "disp")),
gsub('^', "baR.", setdiff(names (mtcars),x)))
str(temp.mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ baR.hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ baR.drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ baR.qsec: num 16.5 17 18.6 19.4 17 ...
$ baR.vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ baR.am : num 1 1 1 0 0 0 0 0 0 0 ...
$ baR.gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ baR.carb: num 4 4 1 1 2 1 4 2 2 4 ...
Edited for answer using base R only
The package plyr has a convenient function rename() that does what you ask. Your modified question specifies using base R only. One easy way of doing this is to simply copy the code from plyr::rename and create your own function.
rename <- function (x, replace) {
old_names <- names(x)
new_names <- unname(replace)[match(old_names, names(replace))]
setNames(x, ifelse(is.na(new_names), old_names, new_names))
}
The function rename takes an argument that is a named vector, where the elements of the vectors are the new names, and the names of the vector are the existing names. There are many ways to construct such a named vector. In the example below I simply use structure.
x <- c("mpg", "disp", "wt")
some.names <- structure(paste0("baR.", x), names=x)
some.names
mpg disp wt
"baR.mpg" "baR.disp" "baR.wt"
Now you are ready to rename:
mtcars <- rename(mtcars, replace=some.names)
The results:
'data.frame': 32 obs. of 11 variables:
$ baR.mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ baR.disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec : num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear : num 4 4 4 3 3 3 3 4 4 4 ...
$ carb : num 4 4 1 1 2 1 4 2 2 4 ...
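As a further sketch of how this base R `rename` behaves, the named vector can equally be built with `setNames`, and entries whose names do not occur in the data are simply ignored. The toy data frame and the `name_map` vector are illustrative assumptions.

```r
# Same function as above, repeated so the example is self-contained
rename <- function(x, replace) {
  old_names <- names(x)
  new_names <- unname(replace)[match(old_names, names(replace))]
  setNames(x, ifelse(is.na(new_names), old_names, new_names))
}

toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# names = existing column names, values = new column names
name_map <- setNames(c("A", "C", "Z"), c("a", "c", "not_there"))

renamed <- rename(toy, name_map)
names(renamed)
```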
I would use ifelse:
names(temp.mtcars) <- ifelse(names(mtcars) %in% c("mpg", "cyl", "disp"),
names(mtcars),
paste("baR", names(mtcars), sep = "."))
Nearly the same but without plyr:
data(mtcars)
temp.mtcars <- mtcars
carNames <- names(temp.mtcars)
modifyNames <- !(carNames %in% c("mpg", "cyl", "disp"))
names(temp.mtcars)[modifyNames] <- paste("baR.", carNames[modifyNames], sep="")
Output:
str(temp.mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ baR.hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ baR.drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ baR.wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ baR.qsec: num 16.5 17 18.6 19.4 17 ...
$ baR.vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ baR.am : num 1 1 1 0 0 0 0 0 0 0 ...
$ baR.gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ baR.carb: num 4 4 1 1 2 1 4 2 2 4 ...
You could use the rename.vars function in the gdata package.
It works well when you only want to replace a subset of variable names and where the order of your vector of names is not the same as the order of names in the data.frame.
Adapted from the help file:
library(gdata)
data <- data.frame(x=1:10,y=1:10,z=1:10)
names(data)
data <- rename.vars(data, from=c("z","y"), to=c("Z","Y"))
names(data)
Converts data.frame names:
[1] "x" "y" "z"
to
[1] "x" "Y" "Z"
Note how this handles the subsetting, and that the vector of names does not have to be in the same order as the names in the data.frame.
names(df)[match(
c('old_var1','old_var2'),
names(df)
)]=c('new_var1', 'new_var2')
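The match-based one-liner above can be made a little safer with a check that every old name exists, since `match()` returns NA for missing names and the assignment then fails. A minimal sketch, with an illustrative data frame and the hypothetical helper variables `old`, `new` and `idx`:

```r
df <- data.frame(old_var1 = 1:3, keep = 4:6, old_var2 = 7:9)

old <- c("old_var2", "old_var1")   # order need not follow the column order
new <- c("new_var2", "new_var1")

idx <- match(old, names(df))       # positions of the old names
stopifnot(!anyNA(idx))             # fail early if a name is missing

names(df)[idx] <- new
names(df)
```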