How to remove missing values (NA) when uniting columns? - r

I am trying to unite 5 columns into one new column using the unite() function. However, all rows contain lots of NA values, producing values that look like
Mother|NA|NA|NA|NA
NA|NA|Father|Mother|NA
Mother|Father|NA|Stepmother|NA
I've tried to unite them using this code:
df2 <- df %>%
unite(Parent_full, Parent:Parent5, sep = "|", remove = TRUE, na.rm = TRUE)
But that gives me the following error:
Error: TRUE must evaluate to column positions or names, not a logical vector
I've also looked on the forum, and found that possibly the na.rm argument of unite is not active?
Here is some data to recreate my dataset
Name <- c('Paul', 'Edward', 'Mary')
Postalcode <- c('4732', '9045', '3476')
Parent <- c('Mother', 'NA', 'Mother')
Parent2 <- c('NA', 'NA', 'Father')
Parent3 <- c('NA', 'Father', 'NA')
Parent4 <- c('NA', 'Mother', 'Stepmother')
Parent5 <- c('NA', 'NA', 'NA')
df <- data.frame(Name, Postalcode, Parent, Parent2, Parent3, Parent4, Parent5)
Would love to know how to unite my columns without NA's.
UPDATE:
I've now updated the tidyr package and I added "na = c("", "NA")" to my read_csv command.
Now this command works:
df2 <- df %>%
unite(Parent_full, Parent:Parent5, sep = "|", remove = TRUE, na.rm = TRUE)
However, for some reason the NA at the end of the value stays. Now my columns look like this:
Mother|NA
Father|Mother|NA
Mother|Father|Stepmother|NA
Does anyone know what went wrong now?

You have a couple of problems:
1) the NAs are not real NAs (check is.na(df$Parent2))
2) your columns are factors
While constructing the data frame, use stringsAsFactors = FALSE
df <- data.frame(Name, Postalcode, Parent, Parent2, Parent3, Parent4,
Parent5, stringsAsFactors = FALSE)
and then replace the "NA" strings with real NAs and use unite
library(dplyr)
df %>%
na_if('NA') %>%
tidyr::unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE)
# Name Postalcode Parent_full
#1 Paul 4732 Mother
#2 Edward 9045 Father|Mother
#3 Mary 3476 Mother|Father|Stepmother
If the data is already loaded, we can convert the factor columns to character with mutate_if
df %>%
mutate_if(is.factor, as.character) %>%
na_if('NA') %>%
tidyr::unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE)
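For newer package versions, a column-wise variant of the same idea may be safer, since recent dplyr releases are stricter about the types passed to na_if(). The following is a minimal sketch, assuming dplyr >= 1.0.0 (for across()) and tidyr >= 1.0.0 (for na.rm in unite()); the small data frame is a cut-down, hypothetical version of the one in the question.

```r
library(dplyr)
library(tidyr)

# Cut-down example data: "NA" here is a string, not a real missing value.
df <- data.frame(
  Name    = c("Paul", "Edward", "Mary"),
  Parent  = c("Mother", "NA", "Mother"),
  Parent2 = c("NA", "Father", "Father"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # Turn the "NA" strings into real NAs, one column at a time.
  mutate(across(Parent:Parent2, ~ na_if(.x, "NA"))) %>%
  # With real NAs in character columns, na.rm = TRUE drops them.
  unite(Parent_full, Parent:Parent2, sep = "|", na.rm = TRUE)

out
```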

Your main problem here is that you haven't updated to tidyr 1.0 yet. That error message is the best that the previous version can do with the input na.rm = TRUE, since that argument didn't exist before. It thinks you're giving it a named argument as part of the ....
Specifically, just run install.packages("tidyr") and it should work. You might need to restart R first, so tidyr isn't currently loaded.
If your missing values are "NA" strings, then, as Ronak pointed out, you need to use na_if() on them first. It's strange to me because your initial code chunk makes it look like those are proper NAs, due to the red highlighting. But then your reprex code has 'NA' values which would definitely be strings. Anyway, you say you're reading in from CSV, so, it would be cleaner and quicker to run the CSV-reading code so as to read NAs in properly with an na argument or the like.
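As a rough illustration of reading missing values in properly, here is a base R sketch using read.csv() with na.strings (readr's read_csv() has a similar na argument, as used in the question's update). The inline csv_text is made up for the example and stands in for the real file.

```r
# Hypothetical CSV content standing in for the real file.
csv_text <- "Name,Parent,Parent2
Paul,Mother,NA
Edward,,Father"

# Treat both empty fields and the literal text NA as missing.
df <- read.csv(text = csv_text,
               na.strings = c("", "NA"),
               stringsAsFactors = FALSE)

is.na(df$Parent)   # second value is missing
is.na(df$Parent2)  # first value is missing
```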
Response to Edit: That does seem like a bug, that NAs at the end of the united string don't get properly removed. Well, anyway, the fix is easy, and probably better than anything else we could do:
df2 <- df %>%
unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE) %>%
mutate_at("Parent_full", . %>%
str_remove("(^|\\|)NA$") %>%
na_if(""))
This ensures two things: 1) that the letters "NA" at the end of a string are only removed if they're there because of the unite(), with a pipe (if anything) in front of them; and 2) if there's no non-missing values on a line here, then the value will be a proper NA rather than "NA", "", or what have you, which I assume is what you want.
Update: I've found that the bug applies to any column that contains nothing but NAs, i.e. na.rm = TRUE only removes NAs from columns that have at least one non-missing value. I've filed a bug report: https://github.com/tidyverse/tidyr/issues/765
Given this, though, the optimal solution is probably just to remove any columns that are all NA beforehand. If this is production code, though, then that gets real tricky, since you have to specify the unite() so as to not break if any or even all of the columns to be united are dropped by that prior step.
Update 2: As a response to the bug report pointed out, the issue is actually that that all-missing column is logicals. So that makes the optimal solution: read in such columns as character, or coerce them to character before uniting. Full reprex for that:
library(tidyverse)
Name <- c('Paul', 'Edward', 'Mary')
Postalcode <- c('4732', '9045', '3476')
Parent <- c('Mother', NA, 'Mother')
Parent2 <- c(NA, NA, 'Father')
Parent3 <- c(NA, 'Father', NA)
Parent4 <- c(NA, 'Mother', 'Stepmother')
Parent5 <- c(NA, NA, NA)
(df <- data.frame(Name, Postalcode, Parent, Parent2, Parent3, Parent4, Parent5))
#> Name Postalcode Parent Parent2 Parent3 Parent4 Parent5
#> 1 Paul 4732 Mother <NA> <NA> <NA> NA
#> 2 Edward 9045 <NA> <NA> Father Mother NA
#> 3 Mary 3476 Mother Father <NA> Stepmother NA
(df2 <- df %>%
mutate_at(vars(Parent:Parent5), as.character) %>%
unite(Parent_full, Parent:Parent5, sep = "|", na.rm = TRUE))
#> Name Postalcode Parent_full
#> 1 Paul 4732 Mother
#> 2 Edward 9045 Father|Mother
#> 3 Mary 3476 Mother|Father|Stepmother
Created on 2019-09-27 by the reprex package (v0.3.0)

unite() (and na.rm = TRUE) only works for character columns (as far as I can tell). This isn't made clear in the help docs.
For factors, it also returns the integer code rather than the factor level - something to watch out for.
Numeric: Doesn't remove NAs:
df <- data.frame("to.combine1" = c(NA, 1, 3),
"to.combine2" = c(2, NA, 3))
sapply(df, class) #not functional, just illustrative
#> to.combine1 to.combine2
#> "numeric" "numeric"
unite(df, "combined", to.combine1:to.combine2, sep="_", na.rm = TRUE)
#> combined
#> 1 NA_2
#> 2 1_NA
#> 3 3_3
Factor: Doesn't remove NAs and uses integer code rather than level:
df <- data.frame("to.combine1" = as.character(c(NA, 1, "a")),
"to.combine2" = as.character(c(2, NA, "a")),
stringsAsFactors = TRUE)
sapply(df, class) #not functional, just illustrative
#> to.combine1 to.combine2
#> "factor" "factor"
unite(df, "combined", to.combine1:to.combine2, sep="_", na.rm = TRUE)
#> combined
#>1 NA_1
#>2 1_NA
#>3 2_2
Character: Expected behaviour
df <- data.frame("to.combine1" = as.character(c(NA, 1, "a")),
"to.combine2" = as.character(c(2, NA, "a")),
stringsAsFactors = FALSE)
sapply(df, class) #not functional, just illustrative
#>to.combine1 to.combine2
#>"character" "character"
unite(df, "combined", to.combine1:to.combine2, sep="_", na.rm = TRUE)
#> combined
#> 1 2
#> 2 1
#> 3 a_a

You can remove the NAs later with something like this
df %>%
unite(Parent_full, Parent:Parent5, sep = "|", remove = TRUE) %>%
mutate(Parent_full = gsub("(?<![a-zA-Z])NA\\||\\|NA(?![a-zA-Z])|\\|NA$", '', Parent_full, perl = TRUE))
Name Postalcode Parent_full
1 Paul 4732 Mother
2 Edward 9045 Father|Mother
3 Mary 3476 Mother|Father|Stepmother
It replaces NA| not preceded by a letter, |NA not followed by a letter, or |NA at the end of the string with an empty string.
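If regex cleanup feels fragile, another option is to skip unite() and paste the non-missing values together row by row. This is only a base R sketch under the assumption that the columns hold real NAs and are character or all-NA logical; the data frame is a shortened, hypothetical version of the one in the question.

```r
df <- data.frame(
  Name    = c("Paul", "Edward", "Mary"),
  Parent  = c("Mother", NA, "Mother"),
  Parent2 = c(NA, "Father", "Father"),
  Parent5 = c(NA, NA, NA),          # all-NA column, stored as logical
  stringsAsFactors = FALSE
)

cols <- c("Parent", "Parent2", "Parent5")

# For each row, keep the non-missing entries and join them with "|".
df$Parent_full <- apply(df[cols], 1, function(row) {
  paste(row[!is.na(row)], collapse = "|")
})

df[c("Name", "Parent_full")]
```

A row with no non-missing values ends up as an empty string here, which could be converted to NA afterwards if that is preferred.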

Related

How to recode dataframe values to keep only those that satisfy a certain set, replace others with "other"

I'm looking for a concise solution, preferably using dplyr, to clean up values in a dataframe column so that I can keep as they are values that match a certain set, but others that don't match will be recoded as "other".
Example
I have a dataframe with names of animals. There are 4 legit animal names, but other rows contain gibberish rather than names. I want to clean the column up, to keep only the legit animal names: zebra, lion, cow, or cat.
Data
library(tidyverse)
library(stringi)
real_animals_names <- sample(c("zebra", "cow", "lion", "cat"), size = 50, replace = TRUE)
gibberish <- do.call(paste0, Map(stri_rand_strings, n = 50, length=c(5, 4, 1),
pattern = c('[a-z]', '[0-9]', '[A-Z]')))
df <- tibble(animals = sample(c(real_animals_names, gibberish)))
> df
## # A tibble: 100 x 1
## animals
## <chr>
## 1 zebra
## 2 zebra
## 3 rbzal0677O
## 4 lion
## 5 cat
## 6 cfsgt0504G
## 7 cat
## 8 jhixe2566V
## 9 lion
## 10 zebra
## # ... with 90 more rows
One way to solve the problem -- which I find annoying and not concise
Using dplyr 1.0.2
df %>%
mutate(across(animals, recode,
"lion" = "lion",
"zebra" = "zebra",
"cow" = "cow",
"cat" = "cat",
.default = "other"))
This gets it done, but this code repeats each animal name twice, and I find it clunky. Is there a cleaner solution, preferably using dplyr?
EDIT GIVEN SUGGESTED ANSWERS BELOW
Since I do like the readability of dplyr::recode, but dislike having to repeat each animal name twice; and since the answers below utilize %in% – could I incorporate %in% in my own recode solution to make it simpler/more concise?
A base solution:
keep_names <- c('lion', 'zebra', 'cow', 'cat')
within(df, animals[!animals %in% keep_names] <- "other")
A dplyr option with replace():
library(tidyverse)
df %>%
mutate(animals = replace(animals, !animals %in% keep_names, "other"))
With recode(), you can use a named character vector for unquote splicing with !!!.
df %>%
mutate(animals = recode(animals, !!!set_names(keep_names), .default = "other"))
Note: set_names(keep_names) is equivalent to setNames(keep_names, keep_names).
You could keep the animals that you need as it is and turn the rest to "Others" :
library(dplyr)
keep_names <- c('lion', 'zebra', 'cow', 'cat')
df %>% mutate(animals = ifelse(animals %in% keep_names, animals, 'Others'))
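To see the %in% idea on its own, here is a small base R sketch on a fixed vector (the values are taken from the example output in the question, so the result is deterministic rather than random).

```r
keep_names <- c("lion", "zebra", "cow", "cat")

# A few values copied from the example tibble in the question.
animals <- c("zebra", "rbzal0677O", "cat", "jhixe2566V")

# Anything not in keep_names is replaced; the rest is left untouched.
cleaned <- replace(animals, !animals %in% keep_names, "other")
cleaned
```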
I know you asked preferably for a dplyr solution but here a data.table solution (note that I changed the tibble() call to data.table()):
library(stringi)
library(data.table)
real_animals_names <- sample(c("zebra", "cow", "lion", "cat"), size = 50, replace = TRUE)
gibberish <- do.call(paste0, Map(stri_rand_strings, n = 50, length=c(5, 4, 1),
pattern = c('[a-z]', '[0-9]', '[A-Z]')))
df <- data.table(animals = sample(c(real_animals_names, gibberish)))
keep_names <- c("lion", "zebra", "cow", "cat")
df[!animals %in% keep_names, animals := "other"]

How to remove all rows that contain character

I'm trying to remove all rows that contain a ? in any column in a data frame. I have 950 rows by 11 columns.
I've tried this to do it all at once.
dataNew <- data %>% filter_all(all_vars(!grepl("?",.)))
and this to see if I could even get it to work for one column.
dataNew <- data[!grepl('?',data$column),]
Both of these attempts resulted in an empty dataframe. Any help is appreciated, thank you.
We can use fixed = TRUE, because ? is a regex metacharacter and grepl defaults to fixed = FALSE. Alternatives are to escape it (\\?) or wrap it in a character class ([?]).
library(dplyr)
data %>%
filter_all(all_vars(!grepl("?",., fixed = TRUE)))
# col1 col2
#1 1 2
Or using across (released in dplyr 1.0.0; newer versions recommend if_all() for use inside filter())
data %>%
filter(across(everything(), ~ !grepl("?", ., fixed = TRUE)))
# col1 col2
#1 1 2
Or using base R
data[!Reduce(`|`, lapply(data, grepl, pattern = '?', fixed = TRUE)),]
data
data <- data.frame(col1 = c("?", 1, 3, "?"), col2 = c(1, 2, "?", "?"),
stringsAsFactors = FALSE)
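The three ways of matching a literal ? can be compared side by side. The sketch below uses only base R and the same example data; has_q and clean are names introduced here for illustration.

```r
x <- c("?", "1", "a?b", "3")

# All three treat ? as a literal character and give the same answer.
by_fixed  <- grepl("?", x, fixed = TRUE)
by_escape <- grepl("\\?", x)
by_class  <- grepl("[?]", x)

data <- data.frame(col1 = c("?", 1, 3, "?"), col2 = c(1, 2, "?", "?"),
                   stringsAsFactors = FALSE)

# Logical matrix: TRUE where a cell contains a literal ?.
has_q <- sapply(data, function(col) grepl("?", col, fixed = TRUE))

# Keep only the rows with no ? in any column.
clean <- data[rowSums(has_q) == 0, , drop = FALSE]
clean
```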

How to rename individual column in dataframe [duplicate]

I know if I have a data frame with more than 1 column, then I can use
colnames(x) <- c("col1","col2")
to rename the columns. How to do this if it's just one column?
Meaning a vector or data frame with only one column.
Example:
trSamp <- data.frame(sample(trainer$index, 10000))
head(trSamp )
# sample.trainer.index..10000.
# 1 5907862
# 2 2181266
# 3 7368504
# 4 1949790
# 5 3475174
# 6 6062879
ncol(trSamp)
# [1] 1
class(trSamp)
# [1] "data.frame"
class(trSamp[1])
# [1] "data.frame"
class(trSamp[,1])
# [1] "numeric"
colnames(trSamp)[2] <- "newname2"
# Error in names(x) <- value :
# 'names' attribute [2] must be the same length as the vector [1]
This is a generalized way in which you do not have to remember the exact location of the variable:
# df = dataframe
# old.var.name = The name you don't like anymore
# new.var.name = The name you want to get
names(df)[names(df) == 'old.var.name'] <- 'new.var.name'
This code pretty much does the following:
names(df) looks into all the names in the df
[names(df) == 'old.var.name'] selects the position of the name you want to change
<- 'new.var.name' assigns the new variable name.
colnames(trSamp)[2] <- "newname2"
attempts to set the second column's name. Your object only has one column, so the command throws an error. This should be sufficient:
colnames(trSamp) <- "newname2"
colnames(df)[colnames(df) == 'oldName'] <- 'newName'
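The name-matching assignment can be tried on a small data frame. This is a minimal base R sketch; the column names are placeholders, as in the answers above.

```r
df <- data.frame(old.var.name = 1:3, other = letters[1:3])

# The comparison yields a logical vector that picks out the matching name,
# so the column's position does not need to be known.
names(df)[names(df) == "old.var.name"] <- "new.var.name"

names(df)
```

If no column has the old name, the logical index is all FALSE and nothing is changed, which can hide a misspelled name.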
This is an old question, but it is worth noting that you can now use setnames from the data.table package.
library(data.table)
setnames(DF, "oldName", "newName")
# or since the data.frame in question is just one column:
setnames(DF, "newName")
# And for reference's sake, in general (more than one column)
nms <- c("col1.name", "col2.name", etc...)
setnames(DF, nms)
This can also be done using Hadley's plyr package, and the rename function.
library(plyr)
df <- data.frame(foo=rnorm(1000))
df <- rename(df,c('foo'='samples'))
You can rename by the name (without knowing the position) and perform multiple renames at once. After doing a merge, for example, you might end up with:
letterid id.x id.y
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13
Which you can then rename in one step using:
letters <- rename(letters,c("id.x" = "source", "id.y" = "target"))
letterid source target
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13
I think the best way of renaming columns is by using the dplyr package like this:
require(dplyr)
df = rename(df, new_col01 = old_col01, new_col02 = old_col02, ...)
It works the same for renaming one or many columns in any dataset.
I find that the most convenient way to rename a single column is using dplyr::rename_at :
library(dplyr)
cars %>% rename_at("speed",~"new") %>% head
cars %>% rename_at(vars(speed),~"new") %>% head
cars %>% rename_at(1,~"new") %>% head
# new dist
# 1 4 2
# 2 4 10
# 3 7 4
# 4 7 22
# 5 8 16
# 6 9 10
works well in pipe chains
convenient when names are stored in variables
works with a name or a column index
clear and compact
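In current dplyr the scoped verbs such as rename_at() are superseded, so a plain rename() may be preferable. The following is a sketch assuming a reasonably recent dplyr (the all_of() form with a named vector is documented for dplyr 1.1; older versions may behave differently); lookup is a name introduced here for illustration.

```r
library(dplyr)

# Direct rename: new_name = old_name.
renamed <- cars %>% rename(new = speed)

# When names are stored in variables, a named character vector
# (new name = old name) can be passed through all_of().
lookup <- c(new = "speed")
renamed_from_var <- cars %>% rename(all_of(lookup))

names(renamed)
names(renamed_from_var)
```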
I like the following style for renaming data frame columns one by one.
colnames(df)[which(colnames(df) == 'old_colname')] <- 'new_colname'
where
which(colnames(df) == 'old_colname')
returns the index of the specific column.
Let df be the dataframe you have with col names myDays and temp.
If you want to rename "myDays" to "Date",
library(plyr)
rename(df,c("myDays" = "Date"))
or with pipe, you can
dfNew <- df %>%
plyr::rename(c("myDays" = "Date"))
Try:
colnames(x)[2] <- 'newname2'
This is likely already out there, but I was playing with renaming fields while searching out a solution and tried this on a whim. Worked for my purposes.
Table1$FieldNewName <- Table1$FieldOldName
Table1$FieldOldName <- NULL
Edit begins here....
This works as well.
df <- rename(df, c("oldColName" = "newColName"))
You can use the rename.vars in the gdata package.
library(gdata)
df <- rename.vars(df, from = "oldname", to = "newname")
This is particularly useful where you have more than one variable name to change or you want to append or pre-pend some text to the variable names, then you can do something like:
df <- rename.vars(df, from = c("old1", "old2", "old3"),
to = c("new1", "new2", "new3"))
For an example of appending text to a subset of variables names see:
https://stackoverflow.com/a/28870000/180892
You could also try 'upData' from 'Hmisc' package.
library(Hmisc)
trSamp = upData(trSamp, rename=c(sample.trainer.index..10000. = 'newname2'))
If you know that your dataframe has only one column, you can use:
names(trSamp) <- "newname2"
The OP's question has been well and truly answered. However, here's a trick that may be useful in some situations: partial matching of the column name, irrespective of its position in a dataframe:
Partial matching on the name:
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("Reported", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
Another example: partial matching on the presence of "punctuation":
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("[[:punct:]]", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
These were examples I had to deal with today, I thought might be worth sharing.
I would simply change a column name to the dataset with the new name I want with the following code:
names(dataset)[index_value] <- "new_col_name"
I found the colnames() function easier
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/row%2Bcolnames
select some column from the data frame
df <- data.frame(df[, c( "hhid","b1005", "b1012_imp", "b3004a")])
and rename the selected column in order,
colnames(df) <- c("hhid", "income", "cost", "credit")
check the names and the values to be sure
names(df);head(df)
I would simply add a new column to the data frame with the name I want and get the data for it from the existing column. like this:
dataf$value=dataf$Article1Order
then I remove the old column! like this:
dataf$Article1Order<-NULL
This code might seem silly! But it works perfectly...
We can use rename_with to rename columns with a function (stringr functions, for example).
Consider the following data df_1:
df_1 <- data.frame(
x = replicate(n = 3, expr = rnorm(n = 3, mean = 10, sd = 1)),
y = sample(x = 1:2, size = 3, replace = TRUE)  # size must match the 3 rows of x
)
names(df_1)
#[1] "x.1" "x.2" "x.3" "y"
Rename all variables with dplyr::everything():
library(tidyverse)
df_1 %>%
rename_with(.data = ., .cols = everything(.),
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "var_1" "var_2" "var_3" "var_4"
Rename by part of the name with the selection helpers (starts_with, ends_with, contains, matches, ...).
Example with . (x variables):
df_1 %>%
rename_with(.data = ., .cols = contains('.'),
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "var_1" "var_2" "var_3" "y"
Rename by class using type-test functions such as is.integer, is.numeric, is.factor...
Example with is.integer (y):
df_1 %>%
rename_with(.data = ., .cols = is.integer,
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "x.1" "x.2" "x.3" "var_1"
The warning:
Warning messages:
1: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
longer object length is not a multiple of shorter object length
2: In names[cols] <- .fn(names[cols], ...) :
number of items to replace is not a multiple of replacement length
It can be ignored here, since it only reflects a length mismatch between seq_along(.) and the names being replaced.
library(dplyr)
rename(data, de=de.y)

colMeans not working in R

The data set I have as file Dummy.txt is as follows
A|B|C|D
1|2|1.9|5
2.5|5|53|3
4|48|49|0.4
8|94|495|B6
(please note a text character in 5th row, 4th column)
I would like to obtain the mean of each column (i.e. column A, B, C and D).
The code I am using is as follows:
mydata_1 <- read.delim("Dummy.txt", skipNul = TRUE, sep = "|", header = FALSE, row.names = NULL)
mydata_1 <- as.numeric(as.character(mydata_1))
colMeans(mydata_1, na.rm = TRUE,)
However, this doesn't seem to be working. Any suggestions please?
You need to set header = TRUE to have the A|B|C|D row be used for column names, otherwise they are included as values, and all columns are parsed as string columns.
Then, passing stringsAsFactors = FALSE prevents column D from being turned into a factor, and the value 'B6' will automatically be turned into an NA when converted to a numeric type.
mydata_1 <- read.delim("Dummy.txt", skipNul = TRUE, sep = "|", header = TRUE,
row.names = NULL, stringsAsFactors = FALSE)
mydata_1[] <- lapply(mydata_1, as.numeric)
#> Warning message:
#> In lapply(mydata_1, as.numeric) : NAs introduced by coercion
colMeans(mydata_1, na.rm = TRUE)
#> A B C D
#> 3.875 37.250 149.725 2.800
The syntax mydata_1[] <- ... makes mydata_1 keep its data frame structure even though a list is being returned on the right-hand side.
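The same steps can be run without the file by passing the data inline. This is a sketch using read.delim(text = ...) with the contents of Dummy.txt from the question; suppressWarnings() is added here only to silence the expected coercion warning for 'B6'.

```r
txt <- "A|B|C|D
1|2|1.9|5
2.5|5|53|3
4|48|49|0.4
8|94|495|B6"

# header = TRUE uses the first row as column names.
mydata <- read.delim(text = txt, sep = "|", header = TRUE,
                     stringsAsFactors = FALSE)

# Convert every column to numeric; 'B6' becomes NA.
# Assigning into mydata[] keeps the data frame structure.
mydata[] <- lapply(mydata, function(col) suppressWarnings(as.numeric(col)))

means <- colMeans(mydata, na.rm = TRUE)
means
```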
The problem here is that as.numeric(as.character(mydata_1)) returns [1] NA NA NA NA.
My suggestion would be to first go through all columns and coerce the types using sapply(), and then calculate the means of the columns:
library(magrittr)
mydata_1 %>%
sapply(., function(col) as.numeric(as.character(col))) %>%
colMeans(na.rm = TRUE)
This will return:
A B C D
3.875 37.250 149.725 2.800
Note: I am using magrittr to make use of the pipe (%>%) operator to chain the operations so you can check the output of every step.
