I'm still learning some very basic concepts in R. I have an Excel file that I import into R, but it has atrocious variable names. I have another file with 2 columns: the first holds the original column names in my data file, the second holds what I want the variable names to be.
What's the most efficient way to update all column names using this auxiliary file that I have?
Since the names or colnames of a data frame are a character vector, and every column in a data frame is an atomic vector, simply replace the original names with the new ones.
str(names_df)
# EXPECTED TWO COLUMNS OF CHR TYPE
# RE-ORDER COLUMNS BY PASSING CHARACTER VECTOR
excel_df <- excel_df[names_df$original_names]
# RE-ASSIGN NAMES: TWO METHODS
names(excel_df) <- names_df$new_names
excel_df <- setNames(excel_df, names_df$new_names)
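For anyone who wants to try this end to end, here is a minimal sketch with toy data. The two data frames below are made-up stand-ins for the imported Excel data and the auxiliary file, and the stopifnot() guard is my own addition, not something the approach requires.

```r
# Toy stand-ins for the imported Excel data and the auxiliary lookup file
excel_df <- data.frame(`Col A!` = 1:3, `col  b` = c("x", "y", "z"),
                       check.names = FALSE)
names_df <- data.frame(
  original_names = c("col  b", "Col A!"),
  new_names      = c("label", "id"),
  stringsAsFactors = FALSE
)

# Fail early if the lookup mentions a column the data does not have
stopifnot(all(names_df$original_names %in% names(excel_df)))

# Re-order the data to match the lookup, then assign the new names
excel_df <- excel_df[names_df$original_names]
names(excel_df) <- names_df$new_names
names(excel_df)
```

Note that subsetting by names_df$original_names drops any column that is not listed in the lookup, so this variant suits the case where every column is accounted for.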
This method is a little overkill if all columns are perfectly accounted for and in the correct order. If some columns are out of order or there are new columns, however, it is robust: it changes only the names that you intend to change and that are actually found.
mt <- mtcars
head(mt, 3)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
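# stringsAsFactors = FALSE keeps the names as character in R versions before 4.0
namechange <- data.frame(oldname = c("mpg", "cyl", "hp"), newname = c("MPG", "CYL", "HP"), stringsAsFactors = FALSE)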
namechange
# oldname newname
# 1 mpg MPG
# 2 cyl CYL
# 3 hp HP
ind <- match(names(mt), namechange$oldname)
ind
# [1] 1 2 NA 3 NA NA NA NA NA NA NA
ifelse(is.na(ind), names(mt), namechange$newname[ind])
# [1] "MPG" "CYL" "disp" "HP" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
names(mt) <- ifelse(is.na(ind), names(mt), namechange$newname[ind])
head(mt, 3)
# MPG CYL disp HP drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
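If you need this in more than one place, the match/ifelse steps above can be wrapped in a small function. This is only a sketch; rename_from_lookup is a name I made up, and the as.character() calls are there in case the lookup columns arrive as factors.

```r
# Hypothetical helper: rename only the columns listed in a lookup table
rename_from_lookup <- function(df, old, new) {
  old <- as.character(old)
  new <- as.character(new)
  ind <- match(names(df), old)   # NA where a column has no lookup entry
  names(df) <- ifelse(is.na(ind), names(df), new[ind])
  df
}

# "not_a_column" is absent from mtcars and is silently ignored
mt2 <- rename_from_lookup(mtcars,
                          old = c("mpg", "not_a_column", "hp"),
                          new = c("MPG", "X", "HP"))
names(mt2)[1:4]
```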
You can use the rename command from dplyr:
library(dplyr)
# Example with the built-in mtcars data
View(mtcars)
# Rename the columns mpg and cyl (the syntax is new_name = old_name)
new_mtcars <- mtcars %>%
  rename(new_MPG = mpg, new_cyl = cyl)
View(new_mtcars)
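Since the question is about a two-column lookup file rather than hand-typed names, here is a sketch of how rename could be driven by such a table. It assumes dplyr 1.0.0 or later (for any_of() inside rename()), and the namechange table is a made-up example.

```r
library(dplyr)

# Hypothetical lookup table: old names in one column, new names in the other
namechange <- data.frame(
  oldname = c("mpg", "cyl", "not_in_data"),
  newname = c("new_MPG", "new_cyl", "unused"),
  stringsAsFactors = FALSE
)

# rename() expects new = old, so build a named vector whose names are the new names
lookup <- setNames(namechange$oldname, namechange$newname)

# any_of() skips lookup entries that are not columns of the data
new_mtcars <- mtcars %>% rename(any_of(lookup))
names(new_mtcars)[1:3]
```

If every old name must exist, all_of(lookup) can be used instead, which raises an error on a missing column.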
I need to merge all the CSV files in a folder into one CSV file. However, I need an empty row between the contents of each file in the merged CSV file. This is to help differentiate the different files and to put the data into the correct format for later. Below I have attached the working code that reads the files using lapply, and I would appreciate any ideas on how I can modify it to add an empty line before each merge. Thanks.
filenames <- list.files(full.names=TRUE)
Combined <- lapply(filenames,function(x){
read.csv(x, header=FALSE)})
You just add a row of NA values at the end of each dataframe before you rbind the dataframes together.
For example:
All <- lapply(filenames,function(i){
dat = read.csv(i, header=FALSE)
dat[nrow(dat)+1,] = NA
return(dat)
})
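To round this off, here is a sketch of the whole pipeline including the rbind and the write step. The temporary folder and the two tiny files are invented so the example runs anywhere, and it assumes all files have the same number of columns.

```r
# Two small CSV files in a temporary folder stand in for the real directory
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(1:2, c("a", "b")), file.path(dir, "f1.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(3:5, c("c", "d", "e")), file.path(dir, "f2.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

All <- lapply(filenames, function(i) {
  dat <- read.csv(i, header = FALSE)
  dat[nrow(dat) + 1, ] <- NA   # separator row after each file's contents
  dat
})

Combined <- do.call(rbind, All)

# na = "" writes the NA separator rows as empty fields instead of "NA";
# the output goes outside the input folder so it is not re-read later
write.table(Combined, file.path(tempdir(), "merged.csv"), sep = ",",
            na = "", row.names = FALSE, col.names = FALSE)
```

The separator row is written as a line of empty fields (just the commas), not as a completely blank line.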
Try adding a blank (NA) row to each frame before writing:
list_of_frames <- list(head(mtcars, 3), head(mtcars, 2))
lapply(list_of_frames, function(x) { x[nrow(x)+1,] <- NA; x})
# [[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 NA NA NA NA NA NA NA NA NA NA NA
# [[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# 3 NA NA NA NA NA NA NA NA NA NA NA
I am trying to implement advice I have found on the web, but I am only halfway to where I want to be.
Here is a reproducible example:
library(tidyverse)
library(dplyr)
library(rlang)
data(mtcars)
filter_expr = "am == 1"
mutate_expr = "gear_carb = gear*carb"
select_expr = "mpg , cyl"
mtcars %>% filter_(filter_expr) %>% mutate_(mutate_expr) %>% select_(select_expr)
The filter expression works fine.
The mutate expression works as well but the new variable has the name gear_carb = gear*carb instead of the intended gear_carb.
Finally, the select expression returns an exception.
As mentioned in the comments, the underscore versions of dplyr verbs are now deprecated. The correct approach is to use quasiquotation.
To address your issue with select, you simply need to modify select_expr to contain multiple expressions:
## I renamed your variables to *_str because they are, well, strings.
filter_str <- "am == 1"
mutate_str <- "gear_carb = gear*carb"
select_str <- "mpg; cyl" # Note the ;
Use rlang::parse_expr to convert these strings to unevaluated expressions:
## Notice the plural parse_exprs, which parses a list of expressions
filter_expr <- rlang::parse_expr( filter_str )
mutate_expr <- rlang::parse_expr( mutate_str )
select_expr <- rlang::parse_exprs( select_str )
Given the unevaluated expressions, we can now pass them to the dplyr verbs. Writing filter( filter_expr ) won't work because filter will look for a column named filter_expr in your data frame. Instead, we want to access the expression stored inside filter_expr. To do this we use the !! operator to let dplyr verbs know that the argument should be expanded to its contents (which is the unevaluated expressions we are interested in):
mtcars %>% filter( !!filter_expr )
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
mtcars %>% mutate( !!mutate_expr )
# mpg cyl disp hp drat wt qsec vs am gear carb gear_carb = gear * carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 16
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 16
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3
In case of select, we have multiple expressions, which is handled by !!! instead:
mtcars %>% select( !!!select_expr )
# mpg cyl
# Mazda RX4 21.0 6
# Mazda RX4 Wag 21.0 6
# Datsun 710 22.8 4
P.S. It's also worth mentioning that select works directly with string vectors, without having to rlang::parse_expr() them first:
mtcars %>% select( c("mpg", "cyl") )
# mpg cyl
# Mazda RX4 21.0 6
# Mazda RX4 Wag 21.0 6
# Datsun 710 22.8 4
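One thing the mutate call above does not fix is the column name, which still comes out as the full expression text. A possible workaround, sketched below, is to keep the name and the formula in two separate strings (that split is my assumption, not part of the question) and use the := operator, which lets you unquote the name on the left-hand side.

```r
library(dplyr)

# Assumption: the new column name and its formula are stored separately
mutate_name <- "gear_carb"
mutate_str  <- "gear * carb"

mutate_expr <- rlang::parse_expr(mutate_str)

# `:=` allows an unquoted (!!) name on the left-hand side
res <- mtcars %>% mutate(!!mutate_name := !!mutate_expr)
tail(names(res), 1)
```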
This is easy to do with base R, with setnames in data.table, or with rename_ in dplyr 0.5. However, since rename_ is deprecated, I couldn't find an easy way to do this in dplyr 0.6.0.
Below is an example. I want to replace the column names in col.from with the corresponding values in col.to:
col.from <- c("wt", "hp", "vs")
col.to <- c("foo", "bar", "baz")
df <- mtcars
head(df, 2)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Expected output:
names(df)[match(col.from, names(df))] <- col.to
head(df, 2)
#> mpg cyl disp bar drat foo qsec baz am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
How can I do this with rename or rename_at in dplyr 0.6.0?
I don't know if this is the right way to approach it, but
library(dplyr)
df %>% rename_at(vars(col.from), function(x) col.to) %>% head(2)
# mpg cyl disp bar drat foo qsec baz am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Also note that I live in the future:
# packageVersion("dplyr")
# # [1] ‘0.7.0’
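For readers on a more recent dplyr, a similar result can probably be had with rename_with, which exists from dplyr 1.0.0 onward. This is a sketch under that version assumption, not a replacement for the answer above.

```r
library(dplyr)

col.from <- c("wt", "hp", "vs")
col.to   <- c("foo", "bar", "baz")

# Look each selected name up in col.from, so the result does not depend on
# the order in which rename_with() passes the names to the function
df2 <- mtcars %>%
  rename_with(~ col.to[match(.x, col.from)], .cols = all_of(col.from))
names(df2)
```

all_of() raises an error if any name in col.from is missing from the data; any_of() would skip missing names instead.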
I'm learning R programming and, as such, have hit a few problems, which I have been able to fix with your help.
But I now have a need to rename columns of a data frame. I have a translation data frame with 2 columns that contains the column names and what the new columns should be called.
Here is my code: my question is how do I select the two columns from the trans dataframe and use them here as trans$old and trans$new variables?
I have 7 columns I'm renaming, and this might be even longer hence the translation table.
replace_header <- function()
{
names(industries)[names(industries)==trans$old] <- trans$new
replaced <- industries
return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the column names we want to replace and then replace them with new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names might contain names that are not in the dataset, you can add skip_absent=TRUE to setnames.
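A small sketch of that case, reusing the "strangeName" lookup from the other answer. It assumes a data.table version that has the skip_absent argument (1.12.0 or later, as far as I know), and the copy() call is my own precaution because setnames works by reference.

```r
library(data.table)

# copy() as a precaution, so the by-reference rename cannot touch mtcars itself
mt  <- copy(mtcars)
dat <- data.frame(old = c("strangeName", "am"),
                  new = c("new.name1", "new.name2"),
                  stringsAsFactors = FALSE)

# Without skip_absent = TRUE this call stops with an error,
# because "strangeName" is not a column of mt
setnames(mt, dat$old, dat$new, skip_absent = TRUE)
names(mt)
```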
Improving a bit on eipi10's answer: if we want to use a "rename dataframe" whose old names are not always present in the mt dataframe (e.g. because mt is provided by different sources, so we don't always know its colnames), we can consider the following code:
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# find which old names are present in mt (logical vector, one entry per row of dat)
namesMatched <- dat$old %in% names(mt)
# rename only the columns that were found
names(mt)[match(dat[namesMatched,"old"], names(mt))] = dat[namesMatched,"new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I have a large data frame that contains 900 variables per row. I am trying to write a function that gives me, for each row, the names of the columns that contain an NA.
For example:
x->
mpg cyl disp hp draw wt
Mazda RX4 21.0 6 160 110 NA 2.62
Mazda RX4 Wag 21.0 6 NA 110 3.90 NA
Datsun 710 22.8 4 NA 93 NA NA
I would like a function to return:
Mazda RX4: "draw"
Mazda RX4 Wag: "disp", "wt"
Datsun 710: "disp","draw","wt"
Run apply by row to select from colnames(x). Probably going to get a list since the result is ragged.
apply(x, 1, function(x2) colnames(x)[ is.na(x2) ] )
$`Mazda RX4`
[1] "draw"
$`Mazda RX4 Wag`
[1] "disp" "wt"
$`Datsun 710`
[1] "disp" "draw" "wt"
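One caveat with apply: when every row happens to have the same number of NAs, it simplifies the result to a vector or matrix instead of a list. If you always want a list with one entry per row, a sketch along these lines with which(arr.ind = TRUE) and split should work. The data frame below just rebuilds the example from the question.

```r
# Rebuild the example data from the question
x <- data.frame(
  mpg  = c(21.0, 21.0, 22.8),
  cyl  = c(6, 6, 4),
  disp = c(160, NA, NA),
  hp   = c(110, 110, 93),
  draw = c(NA, 3.90, NA),
  wt   = c(2.62, NA, NA),
  row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710")
)

# One (row, col) index pair per NA cell
na_idx <- which(is.na(x), arr.ind = TRUE)

# Group the column names by row; the factor levels keep rows without any NA
# as empty entries and preserve the original row order
na_cols <- split(colnames(x)[na_idx[, "col"]],
                 factor(rownames(x)[na_idx[, "row"]], levels = rownames(x)))
na_cols
```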