Adding a column to a tibble using string matching - r

I am trying to add a new column into a data frame by matching words from a different column. To use mtcars as an example, I want to create a column "country" by scanning each rowname for a string. To go through the first few rows in pseudocode:
if "Mazda" in rowname then "Japan"
if "Datsun" in rowname then "Japan"
if "Hornet" in rowname then "USA"
etc
I've tried using mutate with the map function, but to no avail.
Any help would be appreciated.

You want to use case_when() or ifelse():
library(dplyr)
mt <- head(mtcars, 5)
mt %>%
mutate(new_col = case_when(
mpg == 21.0 ~ "new",
TRUE ~ "A"
))
mpg cyl disp hp drat wt qsec vs am gear carb new_col
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 new
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 new
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 A
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 A
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 A
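The example above matches on mpg rather than on the row names. A minimal sketch of the same case_when() idea applied to the string matching in the question, assuming the row names are first moved into a column (called model here purely for illustration) and that unmatched rows should get NA:

```r
library(dplyr)
library(tibble)

cars_with_country <- mtcars %>%
  # Row names are not a regular column, so copy them into one first
  rownames_to_column("model") %>%
  mutate(country = case_when(
    grepl("Mazda|Datsun", model) ~ "Japan",  # regex alternation: either make
    grepl("Hornet", model)       ~ "USA",
    TRUE                         ~ NA_character_  # no rule matched
  ))

head(cars_with_country[, c("model", "country")], 4)
```

Each grepl() call returns one logical per row, which is what case_when() expects on the left-hand side of ~.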

Instead of using multiple if/else or nested ifelse, we can have a key/val dataset and join with the original data
library(tibble)
library(dplyr)
library(stringr)
keyvaldat <- tibble(make = c("Mazda", "Datsun", "Hornet"),
Country = c("Japan", "Japan", "USA"))
rownames_to_column(mtcars, "rn") %>%
mutate(make = word(rn, 1)) %>%
left_join(keyvaldat, by = "make") %>%
head(4)
# rn mpg cyl disp hp drat wt qsec vs am gear carb make Country
#1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda Japan
#2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda Japan
#3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun Japan
#4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet USA
NOTE: If there are 100 values to change, nested ifelse would need 100 calls. A join scales better.

Use a named vector (x in this example)
library(dplyr)
x = c(Mazda = "Japan", Datsun = "Japan", Hornet = "USA")
mtcars %>%
mutate(Make = row.names(.)) %>%
select(Make) %>%
mutate(Country = x[sapply(strsplit(Make, " "), function(x) x[1])])
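The same lookup can be sketched in base R without sapply(). This assumes the make is always the first word of the row name; the names lookup, make and country are chosen for illustration:

```r
# Named vector: names are the makes, values are the countries
lookup <- c(Mazda = "Japan", Datsun = "Japan", Hornet = "USA")

# Drop everything from the first space onward to keep only the first word
make <- sub(" .*", "", rownames(mtcars))

# Index the lookup by name; makes that are not in the lookup give NA
country <- unname(lookup[make])

result <- data.frame(make = make, country = country,
                     row.names = rownames(mtcars))
head(result, 4)
```

unname() removes the names that the lookup would otherwise carry over into the new column.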

Related

Getting the total of all rows but excluding certain columns Rstudio

I need to get the total of each row within my table however I want to exclude certain columns as these contain numeric data such as plot numbers or treatments that I don't want to be counted.
I have tried using mutate and rowSums for this but it doesn't seem to work and I get this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
mutate(total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Following a comment here is my updated code:
df<- mutate(total = rowSums(select(Flower, !c(Ranunculus.repens, Ranunculus.acris, Ranunculus.ficaria, Trifolium.repens, Geranium.molle, Cardamine.flexuosa, Bellis.perennis, Taraxacum.officinalis, Lamium.purpureum, Glechoma.hederacea, Cardamine.pratensis, Medicago.lupulina, Medicago.arabica, Cerastium.fontanum, Prunella.vulgaris, Sonchus.arvensis, Veronica.persica, Veronica.chamaedrys, Viola.riviniana)), na.rm = TRUE))
I am now getting an error message saying that 'x' must be numeric; however, I have checked and all of the columns entered are numeric.
The issue is that the first argument of mutate has to be a data frame, whereas your call passes only total = ..., which is numeric. To make your code work, pass Flower as the first argument:
library(dplyr)
mutate(Flower, total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Using mtcars as example data:
library(dplyr)
mtcars |>
mutate(total = rowSums(select(mtcars, !c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
Another option would be to use dplyr::across like so:
mtcars |>
mutate(total = rowSums(across(!c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
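One possible cause of the "must be numeric" error in the updated code from the question: !c(...) drops the listed columns and keeps everything else, so negating the species columns leaves identifier columns such as Date or Recorder, which may not be numeric. A minimal sketch with a made-up stand-in for the Flower data (the values are hypothetical), negating the identifier columns instead:

```r
library(dplyr)

# Hypothetical stand-in for the asker's Flower data
flower <- data.frame(
  Plot             = c(1, 2),
  Recorder         = c("AB", "CD"),
  Bellis.perennis  = c(3, 0),
  Trifolium.repens = c(2, NA)
)

flower_totals <- flower %>%
  # Exclude the identifier columns; only species counts are summed
  mutate(total = rowSums(across(!c(Plot, Recorder)), na.rm = TRUE))

flower_totals
```

na.rm = TRUE is passed to rowSums(), so a missing count is treated as zero in the total.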

replace row names with defined vector in R

Is there a way that the row names can be substituted based on predefined vector in R, something like:
rownames(GV) <- c(beta1='Age', beta10='Female Gender')
Or maybe case_when() will be easier for you:
library(dplyr)
df <- data.frame(a = c(1, 2, 3))
rownames(df)
#> [1] "1" "2" "3"
rownames(df) <- case_when(rownames(df) == "1" ~ "one",
rownames(df) == "2" ~ "two",
TRUE ~ rownames(df))
rownames(df)
#> [1] "one" "two" "3"
You specify a new value for each condition, plus a value for all remaining cases (the TRUE ~ rownames(df) line). Here the remaining cases keep their previous row names.
We could do the following:
rownames(mtcars)[which(rownames(mtcars) == "Datsun 710")] <- "My Rowname"
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If we want to rename more rownames we can use %in%, but as @gss mentions in the comments, this comes with a caveat: no matter the order of the names in the character vector following %in%, the names will be replaced in the order they appear in rownames(). Compare the following two calls:
rownames(mtcars)[which(rownames(mtcars) %in% c("Datsun 710", "Mazda RX4 Wag"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Which has the same result as:
rownames(mtcars)[which(rownames(mtcars) %in% c("Mazda RX4 Wag", "Datsun 710"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2021-12-21 by the reprex package (v2.0.1)
If you want to rename all the rows, and you have an array of the desired new names in order:
example <- head(mtcars, 3)
mynewnames <- c("First", "Second", "Third")
rownames(example) <- mynewnames
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> First 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Second 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Third 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename all the rows, and you have a named array (not necessarily in the correct order):
example <- head(mtcars, 3)
mynewnames <- c("Datsun 710" = "Datsun", "Mazda RX4" = "Mazda", "Mazda RX4 Wag" = "Also Mazda")
rownames(example) <- mynewnames[rownames(example)]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Also Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename only some rows, and have a named array (an ordered array makes no sense in this context):
example <- head(mtcars, 3)
mynewnames <- c("Mazda RX4" = "This Mazda", "Mazda RX4 Wag" = "That Mazda")
rownames(example)[rownames(example) %in% names(mynewnames)] <-
mynewnames[rownames(example)[rownames(example) %in% names(mynewnames)]]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> This Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> That Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
This is a bit unwieldy; if you are only replacing one or two row names then @TimTeaFan's first suggestion is probably easier.
The safest way, and the one that uses a predefined named vector as the OP prefers, is to take the current row names, replace those that are defined in the vector, and set the row names again. This does not fail on an incomplete vector: a row name that has no replacement stays as it was.
The advantage of this solution is that it avoids the error below when your rename vector is incomplete.
Error in `.rowNamesDF<-`(x, value = value) :
missing values in 'row.names' are not allowed
solution
library(stringr) # used for str_replace_all()
df <- data.frame(
x = rep(1:5),
y = rep(11:15),
row.names = LETTERS[1:5]
)
df
# x y
# A 1 11
# B 2 12
# C 3 13
# D 4 14
# E 5 15
change <- c("A" = "a", "C" = "c")
row.names(df) <- str_replace_all(row.names(df), change)
df
# x y
# a 1 11
# B 2 12
# c 3 13
# D 4 14
# E 5 15
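One caveat worth illustrating: str_replace_all() treats the names of the vector as regular-expression patterns and replaces matches anywhere in the string, so "A" would also turn a row name "AB" into "aB". A base R sketch that replaces only exact matches, using a made-up data frame:

```r
df <- data.frame(x = 1:3, row.names = c("A", "AB", "C"))
change <- c("A" = "a", "C" = "c")

old <- rownames(df)
hit <- old %in% names(change)   # TRUE only for whole-name matches
old[hit] <- change[old[hit]]    # look up replacements by name
rownames(df) <- old

rownames(df)
```

Row names without an entry in change are left untouched, so an incomplete vector still cannot introduce NA row names.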

Is it possible to delete some rows in .x of purrr:::iwalk?

I use purrr:::iwalk like this but it doesn't do what I want:
purrr:::iwalk(DGE_tables, ~ .x[-which(.x$column1 %in% VectorOfSpecificValues),])
DGE_tables is a list of data frames. In all of these data frames I want to delete the rows that have specific values in column1. The data frames have the same structure.
Is it possible to do that with purrr:::iwalk? Or is there a better way to do that?
EDIT: An example:
The list of dataframe:
DGE_tables
# Display
$dataframe1
column1 column2
1 to_delete 56
2 to_keep 45
$dataframe2
column1 column2
1 to_delete 78
2 to_keep 27
...
So I want to delete rows which have $column1 = "to_delete". Like this:
# wanted result
$dataframe1
column1 column2
1 to_keep 45
$dataframe2
column1 column2
1 to_keep 27
...
purrr has keep() and discard() just for this sort of thing:
library(purrr)
l <- list(
list(col1 = 'to keep', col2 = 1),
list(col1 = 'to discard', col2 = 2)
)
purrr::keep(l, ~ .x[['col1']] == 'to keep')
#> [[1]]
#> [[1]]$col1
#> [1] "to keep"
#>
#> [[1]]$col2
#> [1] 1
purrr::discard(l, ~ .x[['col1']] == 'to discard')
#> [[1]]
#> [[1]]$col1
#> [1] "to keep"
#>
#> [[1]]$col2
#> [1] 1
There are a few things at work here. Since I don't have your data (yet?), I'll make my own, rather crudely:
dge <- list(mtcars[1:5,], mtcars[1:5,])
Some problems:
By definition, purrr::walk and purrr::iwalk return the original frame .x, regardless of what you do in the function block. As an example, see this:
(purrr::iwalk(dge, ~ return(NULL)))
# [[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# [[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
(If you do just purrr::iwalk(dge, ...) without the surrounding parens, you won't see anything because the default is to return the return value invisibly. The parens force it to be visible.)
So the hard point here is that your example of just filtering within the iwalk will not work. For that, you probably want purrr::imap. (If you are doing more and the small example in your question was a shorter snippet of more code, then you are probably still good with iwalk.)
I tend to prefer not to use which in blocks like that, because negating an empty which() result is problematic: indexing with a negated empty vector does not do "nothing", it selects no rows at all. Instead, I suggest using logical vectors rather than integer vectors.
Examples: I'll try a stupid conditional of 1 %in% 2, which should obviously find nothing (and with your negation, return all rows):
dge[[1]][ -which(1 %in% 2), ]
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
Using a logical vector instead (and ! instead of -) returns what we expect (i.e., all rows):
dge[[1]][ !(1 %in% 2), ]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
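Putting the two points together (return a new list rather than relying on iwalk, and index with a logical vector), a minimal sketch on data shaped like the question's example. The names tables, unwanted and cleaned are chosen for illustration:

```r
library(purrr)

tables <- list(
  dataframe1 = data.frame(column1 = c("to_delete", "to_keep"),
                          column2 = c(56L, 45L)),
  dataframe2 = data.frame(column1 = c("to_delete", "to_keep"),
                          column2 = c(78L, 27L))
)
unwanted <- c("to_delete")

# map() returns the filtered data frames; the logical index keeps all rows
# when nothing matches, unlike -which(...)
cleaned <- map(tables, ~ .x[!(.x$column1 %in% unwanted), , drop = FALSE])

cleaned
```

The result has to be assigned (here to cleaned), because the original list is not modified in place.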
One option would be filter
library(tidyverse)
map(DGE_tables, ~ .x %>%
filter(column1 != "to_delete"))
#[[1]]
# column1 column2
#1 to_keep 45
#[[2]]
# column1 column2
#1 to_keep 27
Or with slice
map(DGE_tables, ~ .x %>%
slice(which(column1 != "to_delete")))
Or it can be done with base R as well
lapply(DGE_tables, subset, subset = column1 != "to_delete")
NOTE: The OP's dataset is a list of data.frames, and the output needs to be a list of data.frames with a subset of rows,
so it wouldn't work with keep or discard
purrr::keep(DGE_tables, ~ .x[['column1']] == 'to keep')
Error: Predicate functions must return a single TRUE or FALSE, not
a logical vector of length 2
data
DGE_tables <- list(structure(list(column1 = c("to_delete", "to_keep"),
column2 = c(56L,
45L)), class = "data.frame", row.names = c("1", "2")), structure(list(
column1 = c("to_delete", "to_keep"), column2 = c(78L, 27L
)), class = "data.frame", row.names = c("1", "2")))

How to dplyr rename a column, by column index?

The following code renames first column in the data set:
require(dplyr)
mtcars %>%
setNames(c("RenamedColumn", names(.)[2:length(names(.))]))
Desired results:
RenamedColumn cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Would it be possible to arrive at the same result using rename and column index?
This:
mtcars %>%
rename(1 = "ChangedNameAgain")
will fail:
Error in source("~/.active-rstudio-document", echo = TRUE) :
~/.active-rstudio-document:7:14: unexpected '='
6: mtcars %>%
7: rename(1 =
^
Similarly trying to use rename_ or .[[1]] as column reference will return an error.
As of dplyr 0.7.5, rlang 0.2.1, tidyselect 0.2.4, this simply works:
library(dplyr)
rename(mtcars, ChangedNameAgain = 1)
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# ...
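The same call works in a pipe and for a column in the middle of the data frame. A short sketch (the new name quarter_mile_time is just an illustrative choice), assuming a dplyr version at least as recent as the one mentioned above:

```r
library(dplyr)

# Rename the 7th column by position; the column order is unchanged
renamed <- mtcars %>% rename(quarter_mile_time = 7)

names(renamed)
```

Only the seventh name changes; all other columns keep their names and positions.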
Original answer and edits now obsolete:
The logic of rename() is new_name = old_name, so ChangedNameAgain = 1 would make more sense than 1 = ChangedNameAgain.
I would suggest:
mtcars %>% rename_(ChangedNameAgain = names(.)[1])
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Edit
I have yet to wrap my head around the new dplyr programming system based on rlang, since versions 0.6/0.7 of dplyr.
The underscore-suffixed version of rename used in my initial answer is now deprecated, and per @jzadra's comment, it didn't work anyway with syntactically problematic names like "foo bar".
Here is my attempt with the new rlang-based Non Standard Evaluation system. Do not hesitate to tell me what I've done wrong, in the comments:
df <- tibble("foo" = 1:2, "bar baz" = letters[1:2])
# # A tibble: 2 x 2
# foo `bar baz`
# <int> <chr>
# 1 1 a
# 2 2 b
First I try directly with rename() but unfortunately I've got an error. It seems to be a FIXME (or is this FIXME unrelated?) in the source code (I'm using dplyr 0.7.4), so it could work in the future:
df %>% rename(qux = !! quo(names(.)[[2]]))
# Error: Expressions are currently not supported in `rename()`
(Edit: the error message now (dplyr 0.7.5) reads Error in UseMethod("rename_") : no applicable method for 'rename_' applied to an object of class "function")
(Update 2018-06-14: df %>% rename(qux = !! quo(names(.)[[2]])) now seems to work, still with dplyr 0.7.5, not sure if an underlying package changed).
Here is a workaround with select that works. It doesn't preserve column order like rename though:
df %>% select(qux = !! quo(names(.)[[2]]), everything())
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And if we want to put it in a function, we'd have to slightly modify it with := to allow unquoting on the left hand side. If we want to be robust to inputs like strings and bare variable names, we have to use the "dark magic" (or so says the vignette) of enquo() and quo_name() (honestly I don't fully understand what it does):
rename_col_by_position <- function(df, position, new_name) {
new_name <- enquo(new_name)
new_name <- quo_name(new_name)
select(df, !! new_name := !! quo(names(df)[[position]]), everything())
}
This works with new name as a string:
rename_col_by_position(df, 2, "qux")
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a quosure:
rename_col_by_position(df, 2, quo(qux))
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a bare name:
rename_col_by_position(df, 2, qux)
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And even this works:
rename_col_by_position(df, 2, `qux quux`)
# # A tibble: 2 x 2
# `qux quux` foo
# <chr> <int>
# 1 a 1
# 2 b 2
Here's a couple of alternative solutions that are arguably easier to read because they are not focused around the . reference. select understands column indices, so if you're renaming the first column, you can simply do
mtcars %>% select( RenamedColumn = 1, everything() )
However, the issue with using select is that it will reorder columns if you're renaming a column in the middle. To get around the issue, you have to pre-select the columns to the left of the one you're renaming:
## This will rename the 7th column without changing column order
mtcars %>% select( 1:6, RenamedColumn = 7, everything() )
Another option is to use the new rename_at, which also understands column indices:
## This will also rename the 7th column without changing the order
## Credit for simplifying the second argument: Moody_Mudskipper
mtcars %>% rename_at( 7, ~"RenamedColumn" )
The ~ is needed because rename_at is quite flexible and can accept functions as its second argument. For example, mtcars %>% rename_at( c(2,4), toupper ) will make the names of the second and fourth columns uppercase.
dplyr has superseded rename_at() with rename_with(). You can rename a column by index like this:
library(tidyverse)
mtcars %>%
rename_with(.cols = 1, ~"renamed_column")
#> renamed_column cyl disp hp drat wt qsec vs am gear
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
#> ...
Be sure to include the tilde (~)* before the new column name.
Also note that if you introduce the glue package, you can modify existing column names like this:
library(glue)
mtcars %>%
rename_with(.cols = 1, ~glue::glue("renamed_{.}"))
#> renamed_mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> ...
Applying the above approach to multiple columns is just a matter of passing in the column index number range using a colon (:) or multiple indices in a vector using c(); here's a combination of both:
mtcars %>%
rename_with(.cols = c(1:3, 5), ~glue::glue("renamed_{.}"))
#> renamed_mpg renamed_cyl renamed_disp hp renamed_drat wt
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875
#> Datsun 710 22.8 4 108.0 93 3.85 2.320
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440
#> ...
And keep in mind that since the . represents the current column name, you can apply string modification functions to it like this:
mtcars %>%
rename_with(.cols = c(1:3),
~glue::glue("renamed_{str_replace(.,'mpg','miles_per_gallon')}"))
#> renamed_miles_per_gallon renamed_cyl renamed_disp hp
#> Mazda RX4 21.0 6 160.0 110
#> Mazda RX4 Wag 21.0 6 160.0 110
#> Datsun 710 22.8 4 108.0 93
#> Hornet 4 Drive 21.4 6 258.0 110
#> Hornet Sportabout 18.7 8 360.0 175
#> ...
*You can learn more about the ~ and . NSE function shorthand here.
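As a rough illustration of that shorthand: the formula ~ paste0("renamed_", .) behaves like an anonymous function whose argument is available as . inside the formula. The sketch below uses paste0() instead of glue to avoid the extra dependency, and the variable names are chosen for illustration:

```r
library(dplyr)

# Formula shorthand: `.` stands for the vector of selected column names
with_formula <- mtcars %>%
  rename_with(~ paste0("renamed_", .), .cols = 1:2)

# The same thing written as an explicit function
with_function <- mtcars %>%
  rename_with(function(nm) paste0("renamed_", nm), .cols = 1:2)

identical(names(with_formula), names(with_function))
```

Either form can be used; the explicit function is longer but makes it clear what . refers to.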
Imho rlang as suggested by @Aurele is too much here.
Solution 1: Use a curly bracket pipe context:
bcMatrix %>% {colnames(.)[1] = "foo"; .}
Solution 2: Or (ab)use the tee operator %T>% from magrittr package (installed anyway if dplyr is used) to perform the renaming as a side-effect:
bcMatrix %T>% {colnames(.)[1] = "foo"}
Solution 3: using a simple helper function:
rename_by_pos = function(df, index, new_name){
colnames(df)[index] = new_name
df
}
iris %>% rename_by_pos(2,"foo")

Move a column to first position in a data frame

I would like to have the last column of the data frame moved to the start (as first column). How can I do it in R?
My data.frame has about a thousand columns, so writing out the new order by hand won't do. I just want to pick one column and "move it to the start".
Dplyr's select() approach
Moving the last column to the start:
new_df <- df %>%
select(last_column_name, everything())
This is also valid for any column and any quantity:
new_df <- df %>%
select(col_5, col_8, everything())
Example using mtcars data frame:
head(mtcars, n = 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Last column is 'carb'
new_df <- mtcars %>% select(carb, everything())
head(new_df, n = 2)
# carb mpg cyl disp hp drat wt qsec vs am gear
# Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
dplyr 1.0.0 now includes the relocate() function to reorder columns. The default behaviour is to move the named column(s) to the first position.
library(dplyr) # from version 1.0.0
mtcars %>%
relocate(carb) %>%
head()
carb mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 22.8 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 18.7 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 18.1 6 225 105 2.76 3.460 20.22 1 0 3
But other locations can be specified with the .before or .after arguments:
mtcars %>%
relocate(gear, carb, .before = cyl) %>%
head()
mpg gear carb cyl disp hp drat wt qsec vs am
Mazda RX4 21.0 4 4 6 160 110 3.90 2.620 16.46 0 1
Mazda RX4 Wag 21.0 4 4 6 160 110 3.90 2.875 17.02 0 1
Datsun 710 22.8 4 1 4 108 93 3.85 2.320 18.61 1 1
Hornet 4 Drive 21.4 3 1 6 258 110 3.08 3.215 19.44 1 0
Hornet Sportabout 18.7 3 2 8 360 175 3.15 3.440 17.02 0 0
Valiant 18.1 3 1 6 225 105 2.76 3.460 20.22 1 0
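For the original question (moving the last column without typing its name), relocate() can be combined with the last_col() selection helper. A minimal sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# last_col() selects whichever column is currently last;
# relocate() then moves it to the front by default
moved <- mtcars %>% relocate(last_col())

names(moved)
```

This keeps working if columns are added or removed later, since no column name or count is hard-coded.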
You can change the order of columns by addressing them in the new order, choosing them explicitly with data[,c(ORDER YOU WANT THEM TO BE IN)]
If you just want the last column to be first use: data[,c(ncol(data),1:(ncol(data)-1))]
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> head(cars[,c(2,1)])
dist speed
1 2 4
2 10 4
3 4 7
4 22 7
5 16 8
6 10 9
dataframe<-dataframe[,c(1000, 1:999)]
this will move your last column i.e. 1000th column to the first column.
I don't know if it's worth adding this as an answer or if a comment would be fine, but I wrote a function called moveme that lets you do what you want to do with the language you describe. You can find the function at this answer: https://stackoverflow.com/a/18540144/1270695
It works on the names of your data.frame and produces a character vector that you can use to reorder your columns:
mydf <- data.frame(matrix(1:12, ncol = 4))
mydf
moveme(names(mydf), "X4 first")
# [1] "X4" "X1" "X2" "X3"
moveme(names(mydf), "X4 first; X1 last")
# [1] "X4" "X2" "X3" "X1"
mydf[moveme(names(mydf), "X4 first")]
# X4 X1 X2 X3
# 1 10 1 4 7
# 2 11 2 5 8
# 3 12 3 6 9
If you're shuffling things around like this, I suggest converting your data.frame to a data.table and using setcolorder (with my moveme function, if you wish) to make the change by reference.
In your question, you also mentioned "I just want to pick one column and move it to the start". If it's an arbitrary column, and not specifically the last one, you could also look at using setdiff.
Imagine you're working with the "mtcars" dataset and want to move the "am" column to the start.
x <- "am"
mtcars[c(x, setdiff(names(mtcars), x))]
If you want to move any named column to the first position, simply use:
df[,c(which(colnames(df)=="desired_colname"),which(colnames(df)!="desired_colname"))]
A native R approach that works with any number of rows or columns to move the last column of a dataframe to the first column position:
df <- df[,c(ncol(df),1:(ncol(df)-1))]
It can be used to move any column to the first position by listing that column first, followed by every other column:
df <- df[,c(your_column_number_here, setdiff(1:ncol(df), your_column_number_here))]
If you don't know the column number, but know the column label name, do the following replacing "your_column_name_here":
columnNumber <- which(colnames(df)=="your_column_name_here")
df <- df[,c(columnNumber, setdiff(1:ncol(df), columnNumber))]
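A point to watch with position-based indexing is operator precedence: : binds more tightly than -, so 1:n-1 means (1:n)-1, not 1:(n-1). The sketch below shows the difference and one way to move an arbitrary column to the front; the data frame and the names k and moved are made up for illustration:

```r
n <- 4
1:n - 1      # (1:n) - 1, i.e. starts at 0 and stops at n - 1
1:(n - 1)    # the sequence from 1 to n - 1

df <- data.frame(a = 1, b = 2, c = 3, d = 4)
k <- 2  # position of the column to move

# Put column k first, then every other position in its original order
moved <- df[, c(k, setdiff(seq_len(ncol(df)), k)), drop = FALSE]

names(moved)
```

setdiff() removes k from the remaining positions, so no column is duplicated or dropped.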
There is also the data.table option with setcolorder():
library(data.table)
mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
# Move column "gear" in the first position
setcolorder(mtcars_copy, neworder = "gear")
head(mtcars_copy)
# gear mpg cyl disp hp drat wt qsec vs am carb
# 1: 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# 2: 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
# 3: 4 22.8 4 108 93 3.85 2.320 18.61 1 1 1
# 4: 3 21.4 6 258 110 3.08 3.215 19.44 1 0 1
# 5: 3 18.7 8 360 175 3.15 3.440 17.02 0 0 2
# 6: 3 18.1 6 225 105 2.76 3.460 20.22 1 0 1
If multiple columns, then mention the order in a vector:
setcolorder(mtcars_copy, neworder = c("vs", "carb"))
head(mtcars_copy)
# vs carb gear mpg cyl disp hp drat wt qsec am
# 1: 0 4 4 21.0 6 160 110 3.90 2.620 16.46 1
# 2: 0 4 4 21.0 6 160 110 3.90 2.875 17.02 1
# 3: 1 1 4 22.8 4 108 93 3.85 2.320 18.61 1
# 4: 1 1 3 21.4 6 258 110 3.08 3.215 19.44 0
# 5: 0 2 3 18.7 8 360 175 3.15 3.440 17.02 0
# 6: 1 1 3 18.1 6 225 105 2.76 3.460 20.22 0
Move any column from any position to the first position in your data
n <- which(colnames(df)=="column_need_move")
column_need_move <- df$column_need_move
df <- cbind(column_need_move, df[,-n, drop = FALSE])
If you want to create a new column and have it be the first column, use the .before=1 argument:
my_data <- my_data %>% mutate(newcol = a*b, .before=1)
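A runnable version of the same idea on mtcars, assuming dplyr 1.0.0 or later (where mutate() gained the .before argument). The column name hp_per_cyl is an illustrative choice:

```r
library(dplyr)

# Create a new column and place it in the first position in one step
with_ratio <- mtcars %>%
  mutate(hp_per_cyl = hp / cyl, .before = 1)

head(with_ratio, 3)
```

.before also accepts a column name, for example .before = cyl, to place the new column elsewhere.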

Resources