I don't know how to find how many rows begin with a certain letter in a data frame.
For example.
mtcars
mpg cyl disp hp drat wt ...
Mazda RX4 21.0 6 160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 ...
Datsun 710 22.8 4 108 93 3.85 2.32 ...
I want to find out how many cars begin with the letter 'M'.
Thanks
You can do
sum(startsWith(rownames(mtcars), "M"))
# [1] 10
Other less efficient possibilities include
sum(grepl("^M", rownames(mtcars)))
# [1] 10
length(grep("^M", rownames(mtcars)))
# [1] 10
sum(regexpr("^M", rownames(mtcars)) == 1L)
# [1] 10
sum(substr(rownames(mtcars), 1, 1) == "M")
# [1] 10
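If you need the counts for every starting letter rather than just 'M', one possible sketch is to tabulate the first character of each row name. It uses only base R; first_letter, counts and m_cars are names made up for this illustration.

```r
# First character of every row name in the built-in mtcars data
first_letter <- substr(rownames(mtcars), 1, 1)

# Count how many row names start with each letter
counts <- table(first_letter)
counts[["M"]]

# The same kind of logical test can also pull out the matching rows
m_cars <- mtcars[startsWith(rownames(mtcars), "M"), ]
nrow(m_cars)
```

Both counts[["M"]] and nrow(m_cars) should agree with the value returned by the one-liners above.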
Another good way to inspect the matches is with the stringr package (note that str_view displays the matches rather than counting them):
library(stringr)
str_view(rownames(mtcars), '^M') # to see all results or
str_view(rownames(mtcars), '^M', match = TRUE) # to see only the matching results
I'm still learning some very basic concepts in R. I have an Excel file that I import into R, but it has atrocious variable names. I have another file with 2 columns: the first holds the original column names in my data file, the second holds what I want the variable names to be.
What's the most efficient way to update all column names using this auxiliary file that I have?
Since the names (or colnames) of a data frame are just a character vector, you can simply re-assign the original names with the new ones.
str(names_df)
# EXPECTED TWO COLUMNS OF CHR TYPE
# RE-ORDER COLUMNS BY PASSING CHARACTER VECTOR
excel_df <- excel_df[names_df$original_names]
# RE-ASSIGN NAMES: TWO METHODS
names(excel_df) <- names_df$new_names
excel_df <- setNames(excel_df, names_df$new_names)
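To make the steps above concrete, here is a minimal self-contained sketch. The two data frames are toy stand-ins with made-up values (the real excel_df and names_df would come from your files), and the stopifnot guard is my own addition, not part of the method above.

```r
# Toy stand-ins for the imported Excel data and the lookup file (made-up values)
excel_df <- data.frame(`Var 1` = 1:2, `X__2` = c("a", "b"), check.names = FALSE)
names_df <- data.frame(
  original_names = c("X__2", "Var 1"),
  new_names      = c("label", "id"),
  stringsAsFactors = FALSE
)

# Stop early if the lookup mentions a column that is not in the data
stopifnot(all(names_df$original_names %in% names(excel_df)))

# Put the columns in lookup order, then assign the new names
excel_df <- excel_df[names_df$original_names]
names(excel_df) <- names_df$new_names
names(excel_df)
```

Re-ordering first is what makes the positional assignment safe: after the subset, column i of excel_df corresponds to row i of names_df.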
This method is a little overkill if all columns are perfectly accounted for and in the correct order. If some columns are out of order or new columns have appeared, however, it is robust because it only changes the columns that you intend to change and that are actually found.
mt <- mtcars
head(mt, 3)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
namechange <- data.frame(oldname = c("mpg", "cyl", "hp"), newname = c("MPG", "CYL", "HP"), stringsAsFactors = FALSE)
namechange
# oldname newname
# 1 mpg MPG
# 2 cyl CYL
# 3 hp HP
ind <- match(names(mt), namechange$oldname)
ind
# [1] 1 2 NA 3 NA NA NA NA NA NA NA
ifelse(is.na(ind), names(mt), namechange$newname[ind])
# [1] "MPG" "CYL" "disp" "HP" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
names(mt) <- ifelse(is.na(ind), names(mt), namechange$newname[ind])
head(mt, 3)
# MPG CYL disp HP drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
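If you do this often, the match/ifelse steps above can be wrapped in a small function. This is only a sketch: rename_from_lookup is a hypothetical helper name, and it assumes the lookup has columns called oldname and newname as in the example.

```r
# Hypothetical helper: rename only the columns listed in `lookup`,
# leaving every other column name untouched.
rename_from_lookup <- function(df, lookup) {
  ind <- match(names(df), lookup$oldname)  # NA where no rename is requested
  names(df) <- ifelse(is.na(ind), names(df), as.character(lookup$newname)[ind])
  df
}

# The lookup may mention columns that do not exist; they are simply ignored
lookup <- data.frame(
  oldname = c("mpg", "not_a_column", "hp"),
  newname = c("MPG", "IGNORED", "HP"),
  stringsAsFactors = FALSE
)
renamed <- rename_from_lookup(mtcars, lookup)
names(renamed)[1:4]
```

The as.character call is there so the helper also behaves when newname was stored as a factor.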
You can use the rename function from dplyr:
library(dplyr)
# Example with the built-in mtcars data
View(mtcars)
new_mtcars <- mtcars %>%
  rename('new_MPG' = 'mpg', 'new_cyl' = 'cyl') # Rename the columns mpg and cyl
View(new_mtcars)
I have a dataframe of the following form
"column1" "column2"
1         5
2         6
3         7
How do I remove the quotation mark from the column names? I've tried using gsub but I can't quote quotation marks haha. Also need a way to do this that isn't just names(data) <- c("column1", "column2"). Thank you all!
You can use gsub with single-quotes in order to reference the double-quote character for replacement:
names(df) = gsub('"', "", names(df))
Test:
# Set up data
d = mtcars[1:3, 1:4]
names(d)[1:2] = c('"column1"', '"column2"')
names(d)
#> [1] "\"column1\"" "\"column2\"" "disp" "hp"
d
#> "column1" "column2" disp hp
#> Mazda RX4 21.0 6 160 110
#> Mazda RX4 Wag 21.0 6 160 110
#> Datsun 710 22.8 4 108 93
# Remove quotation marks from column names
names(d) = gsub('"', "", names(d))
names(d)
#> [1] "column1" "column2" "disp" "hp"
d
#> column1 column2 disp hp
#> Mazda RX4 21.0 6 160 110
#> Mazda RX4 Wag 21.0 6 160 110
#> Datsun 710 22.8 4 108 93
Created on 2021-01-19 by the reprex package (v0.3.0)
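On the "can't quote quotation marks" part: a double quote can also be written inside a double-quoted string by escaping it with a backslash. The sketch below is a small variation on the answer above (d is a made-up data frame); adding fixed = TRUE is my own choice, so the pattern is treated as plain text rather than as a regular expression.

```r
# Small data frame whose column names contain literal double quotes
d <- data.frame(a = 1:3, b = 5:7)
names(d) <- c('"column1"', '"column2"')

# '"' and "\"" are the same one-character string
same_string <- identical('"', "\"")

# fixed = TRUE treats the pattern as plain text rather than a regular expression
names(d) <- gsub("\"", "", names(d), fixed = TRUE)
names(d)
```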
I am trying to implement advice I found on the web, but I am only halfway to where I want to be.
Here is a reproducible example:
library(tidyverse)
library(dplyr)
library(rlang)
data(mtcars)
filter_expr = "am == 1"
mutate_expr = "gear_carb = gear*carb"
select_expr = "mpg , cyl"
mtcars %>% filter_(filter_expr) %>% mutate_(mutate_expr) %>% select_(select_expr)
The filter expression works fine.
The mutate expression works as well but the new variable has the name gear_carb = gear*carb instead of the intended gear_carb.
Finally, the select expression returns an exception.
As mentioned in the comments, the underscore versions of dplyr verbs are now deprecated. The correct approach is to use quasiquotation.
To address your issue with select, you simply need to modify select_expr to contain multiple expressions:
## I renamed your variables to *_str because they are, well, strings.
filter_str <- "am == 1"
mutate_str <- "gear_carb = gear*carb"
select_str <- "mpg; cyl" # Note the ;
Use rlang::parse_expr to convert these strings to unevaluated expressions:
## Notice the plural parse_exprs, which parses a list of expressions
filter_expr <- rlang::parse_expr( filter_str )
mutate_expr <- rlang::parse_expr( mutate_str )
select_expr <- rlang::parse_exprs( select_str )
Given the unevaluated expressions, we can now pass them to the dplyr verbs. Writing filter( filter_expr ) won't work because filter will look for a column named filter_expr in your data frame. Instead, we want to access the expression stored inside filter_expr. To do this we use the !! operator to let dplyr verbs know that the argument should be expanded to its contents (which are the unevaluated expressions we are interested in):
mtcars %>% filter( !!filter_expr )
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
mtcars %>% mutate( !!mutate_expr )
# mpg cyl disp hp drat wt qsec vs am gear carb gear_carb = gear * carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 16
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 16
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3
In case of select, we have multiple expressions, which is handled by !!! instead:
mtcars %>% select( !!!select_expr )
# mpg cyl
# Mazda RX4 21.0 6
# Mazda RX4 Wag 21.0 6
# Datsun 710 22.8 4
P.S. It's also worth mentioning that select works directly with string vectors, without having to rlang::parse_expr() them first:
mtcars %>% select( c("mpg", "cyl") )
# mpg cyl
# Mazda RX4 21.0 6
# Mazda RX4 Wag 21.0 6
# Datsun 710 22.8 4
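For comparison, here is a rough base R analogue of the same idea: parse a string into an unevaluated expression, then evaluate it with the data frame's columns in scope. This is not the rlang approach described above, just a sketch of the underlying mechanism. Keeping the new column's name (new_name) separate from the expression string is an assumption of this sketch; it is one way to end up with a column called gear_carb.

```r
filter_str <- "am == 1"
new_name   <- "gear_carb"    # name kept separate from the expression (sketch assumption)
mutate_str <- "gear * carb"

# parse() turns a string into an unevaluated expression
filter_expr <- parse(text = filter_str)[[1]]
mutate_expr <- parse(text = mutate_str)[[1]]

# eval() evaluates the expression with the data frame's columns in scope
res <- mtcars[eval(filter_expr, mtcars), ]
res[[new_name]] <- eval(mutate_expr, res)
res <- res[c("mpg", "cyl", new_name)]
head(res, 3)
```

As with any code built from strings, only do this with strings you trust, since parse and eval will run whatever they are given.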
This is easy to do with base R, with setnames in data.table, or with rename_ in dplyr 0.5. However, since rename_ is deprecated, I couldn't find an easy way to do this in dplyr 0.6.0.
Below is an example. I want to replace the column names in col.from with the corresponding values in col.to:
col.from <- c("wt", "hp", "vs")
col.to <- c("foo", "bar", "baz")
df <- mtcars
head(df, 2)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Expected output:
names(df)[match(col.from, names(df))] <- col.to
head(df, 2)
#> mpg cyl disp bar drat foo qsec baz am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
How can I do this with rename or rename_at in dplyr 0.6.0?
I don't know if this is the right way to approach it, but
library(dplyr)
df %>% rename_at(vars(col.from), function(x) col.to) %>% head(2)
# mpg cyl disp bar drat foo qsec baz am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Also note that I live in the future:
# packageVersion("dplyr")
# # [1] ‘0.7.0’
I'm learning R programming and, as such, have hit a few problems, which with your help I have been able to fix.
But I now have a need to rename columns of a data frame. I have a translation data frame with 2 columns that contains the column names and what the new columns should be called.
Here is my code: my question is how do I select the two columns from the trans dataframe and use them here as trans$old and trans$new variables?
I have 7 columns I'm renaming, and this might be even longer hence the translation table.
replace_header <- function()
{
names(industries)[names(industries)==trans$old] <- trans$new
replaced <- industries
return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the column names we want to replace and then replace them with new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
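As a side note on why the names(industries)==trans$old comparison in the question does not do what was intended: == compares two vectors position by position and recycles the shorter one, so a name is only found if it happens to line up. A small sketch with mtcars (nms, old, by_equals and by_in are illustrative names):

```r
nms <- names(mtcars)   # 11 column names
old <- c("mpg", "am")

# `==` recycles `old` along `nms` and compares position by position,
# so "am" (the 9th name) is compared with "mpg" and is missed
by_equals <- suppressWarnings(nms == old)
which(by_equals)

# %in% asks, for each name, whether it occurs anywhere in `old`
by_in <- nms %in% old
which(by_in)
```

%in% finds the right columns but says nothing about which new name goes where, which is why match is used above: it returns the positions in the order of the old names, so they pair up with the new names.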
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names contains names that are not present in the data, you can add skip_absent=TRUE to setnames.
Improving a bit on eipi10's answer: if we want to use a "rename dataframe" whose old names are not always present in the mt data frame (e.g. because mt is provided by different sources, so we don't always know its colnames), we can consider the following code:
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# find which old names are present in mt (logical vector, one entry per row of dat)
namesMatched <- dat$old %in% names(mt)
# rename only the columns that were found
names(mt)[match(dat[namesMatched, "old"], names(mt))] = dat[namesMatched, "new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1