tryCatch() in across() fails when the error comes from another column - r

I use across() and I want to put NA where the computation fails. I tried to use tryCatch() but can't make it work in my case, whereas there are situations where it works.
This works:
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(blabla, error = function(e) NA) # create an intentional error for the example
)
)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 NA 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 NA 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 NA 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 NA 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 NA 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 NA 3.460 20.22 1 0 3 1
But this doesn't:
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(x[which(mpg == 10000)], error = function(e) NA) # create an intentional error for the example
)
)
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(...)`.
#> Caused by error in `across()`:
#> ! Problem while computing column `drat`.
Created on 2022-07-07 by the reprex package (v2.0.1)
I thought tryCatch() was supposed to catch any error. Why doesn't it work in the second situation? How to fix it?
Note: I need to use across() in my real situation (even if it's not truly needed in the examples)

The problem isn't the tryCatch because the code you run doesn't trigger an error. Basically you are running
foo <- function(x) tryCatch(x[which(mtcars$mpg == 10000)], error = function(e) NA)
foo(mtcars$drat)
# numeric(0)
And notice that no error is triggered. That expression simply returns numeric(0). The problem is that, inside mutate(), the function must return a value of length 1 or of the same length as the number of rows, and numeric(0) has length zero. So the error happens after your tryCatch() code has run, when dplyr tries to assign the value back into the data.frame. You will need to handle the case where no values are found separately. Perhaps
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) {
matches <- mpg == 10000
if (any(matches)) x[which(matches)] else NA
}
)
)
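As a hedged alternative, the empty-result check can be wrapped in a small helper so that it composes with tryCatch(). The name na_if_empty() below is hypothetical (it is not part of dplyr), and the sketch assumes that a zero-length result should simply become a single NA:

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): turn a zero-length result into NA
na_if_empty <- function(value) {
  if (length(value) == 0) NA else value
}

res <- head(mtcars) %>%
  mutate(
    across(
      all_of("drat"),
      # tryCatch() still guards against genuine errors;
      # na_if_empty() handles the "no match" case, which raises no error
      function(x) tryCatch(na_if_empty(x[which(mpg == 10000)]), error = function(e) NA)
    )
  )

res$drat
```

Because the helper returns a length-one NA, mutate() can recycle it over all rows, which is what the original tryCatch() call was expected to do.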

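Referencing mpg through x makes the example return NA. Note, though, that this works only because x is an atomic vector, so x$mpg itself raises an error, which tryCatch() then catches: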
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(x[which(x$mpg == 10000)], error = function(e) NA) # create an intentional error for the example
)
)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 NA 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 NA 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 NA 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 NA 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 NA 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 NA 3.460 20.22 1 0 3 1

Related

Using variable as column name not working when filtering or using subsets in R

I am trying to filter out all values above 0 in a column name, a variable I am getting from another CSV file.
When trying to filter on this value it does not work, returns 0 records like this:
retA <- retnegative %>% filter(valname > 0)
I also tried
retA <- retnegative %>% filter(as.numeric(valname) > 0)
this also does not work. How can I keep my valname a variable and still be able to filter through these values dynamically? Or is there another method to make this happen in R?
Full code:
# get our samplenames from other CSV
samples <- phenodata$samplename
# modify names to match names in pos and neg files.
negsamples <- paste(samples, "neg", sep = "-")
# for loop
for (val in negsamples) {
# setup correct name format for R
valname <- make.names(val)
if (valname != "" & valname != "-neg") {
print(valname)
retA <- retnegative %>% filter(valname > 0)
write.table(retA, paste("ResultNegData/", val, ".csv"), col.names = TRUE, sep = ",") # nolint
}
}
Thanks in advance guys!
I am expecting the code to filter and give me all values for that column that are above 0
filter doesn't recognise the string as a column name. You can parse it as an expression using base::parse or rlang::parse_expr (shown below):
library(tidyverse)
library(rlang)
for (val in names(mtcars[,c(8, 9, 11)])) {
valname <- make.names(val)
print(valname)
filter(mtcars, !!parse_expr(valname) == 1) |>
head(3) |>
print()
}
#> [1] "vs"
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#> [1] "am"
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> [1] "carb"
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
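Another option, sketched here under the assumption that the column name is held in a string as in the question, is dplyr's .data pronoun, which avoids parsing text into an expression. The vector cols and the list results are illustrative names only:

```r
library(dplyr)

cols <- c("vs", "am", "carb")   # column names held as strings
results <- list()

for (valname in cols) {
  # .data[[valname]] looks up the column whose name is stored in valname
  results[[valname]] <- filter(mtcars, .data[[valname]] > 0)
}

# Number of rows kept for each column
sapply(results, nrow)
```

In the original loop this would correspond to filter(retnegative, .data[[valname]] > 0), provided valname matches a column name of retnegative exactly.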

Converting a character into a variable name

I need to convert a data frame into a matrix using the model.matrix function. The name of the original data frame is train, and the outcome variable of interest is called adequacy_ratio_total_percent. The below R code works.
X_train_matrix <- model.matrix(adequacy_ratio_total_percent ~ ., train)[, -1]
However, my outcome variable may vary, and I hoped to simplify changing it with the code below, which does not work.
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(list_outcome ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
variable lengths differ (found for 'adequacy_ratio_total_percent')
I also tried the following, which does not work either.
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(train$list_outcome ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
invalid type (NULL) for variable 'train$list_outcome'
Or the following:
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(list_outcome[1] ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
variable lengths differ (found for 'adequacy_ratio_total_percent')
How can I extract the variable name from list_outcome and apply it to the model.matrix function? Thank you in advance for any advice!
Here's an answer that uses the same idea as @user20650, but with multiple possibilities for outcomes:
data(mtcars)
list_outcomes = c("qsec", "mpg")
Xmats <- lapply(list_outcomes, function(l){
model.matrix(reformulate(".", response=l), data=mtcars)
})
lapply(Xmats, head)
#> [[1]]
#> (Intercept) mpg cyl disp hp drat wt vs am gear carb
#> Mazda RX4 1 21.0 6 160 110 3.90 2.620 0 1 4 4
#> Mazda RX4 Wag 1 21.0 6 160 110 3.90 2.875 0 1 4 4
#> Datsun 710 1 22.8 4 108 93 3.85 2.320 1 1 4 1
#> Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 1 0 3 1
#> Hornet Sportabout 1 18.7 8 360 175 3.15 3.440 0 0 3 2
#> Valiant 1 18.1 6 225 105 2.76 3.460 1 0 3 1
#>
#> [[2]]
#> (Intercept) cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 1 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 1 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 1 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2022-06-28 by the reprex package (v2.0.1)
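For the single-outcome case in the question, the same idea can be sketched as follows. The example uses mtcars with "mpg" as a stand-in outcome, since the asker's train data is not available; reformulate() builds the formula from the string, and [, -1] drops the intercept column as in the original code:

```r
outcome <- "mpg"  # outcome name held as a string (stand-in for the real one)

# Build the formula `mpg ~ .` from the string
f <- reformulate(".", response = outcome)
f

# Design matrix without the intercept column
X <- model.matrix(f, data = mtcars)[, -1]
colnames(X)
```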

Fix subscript out of bounds error when adding column to df

I have a df with 20 columns of numerical data. I am trying to add an additional column holding the total of each row, however I am getting a subscript out of bounds error. This is the code I'm using:
df[,"Total"]<-rowSums(df)
This is the error:
Error in `[<-`(`*tmp*`, , "Total", value = c(Acidovorax = 13, Acinetobacter = 48143, :
subscript out of bounds
That shouldn't happen for data.frames, but can for matrix.
mt_mtx <- as.matrix(mtcars)
mtcars[,"Total"] <- rowSums(mtcars)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb Total
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 328.980
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 329.795
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 259.580
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 426.135
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 590.310
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 385.540
mt_mtx[,"Total"] <- rowSums(mt_mtx)
# Error in `[<-`(`*tmp*`, , "Total", value = c(`Mazda RX4` = 328.98, `Mazda RX4 Wag` = 329.795, :
# subscript out of bounds
The quick remedy is to convert your df back to a data.frame. If you weren't expecting this, thinking that your df was already a frame, then I suggest you go back through your code to find what accidentally coerced it to a matrix.
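A minimal sketch of both ways out, using a matrix built from head(mtcars) as a stand-in for the asker's data: either keep the matrix and append the column with cbind(), or convert back to a data.frame first.

```r
m <- as.matrix(head(mtcars))
is.matrix(m)   # TRUE: this is why `[<-` with a new column name fails

# Option 1: stay with a matrix and bind the new column on
m_total <- cbind(m, Total = rowSums(m))

# Option 2: convert back to a data.frame, then assign as in the question
df <- as.data.frame(m)
df[, "Total"] <- rowSums(df)

head(df)
```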

How to dplyr rename a column, by column index?

The following code renames first column in the data set:
require(dplyr)
mtcars %>%
setNames(c("RenamedColumn", names(.)[2:length(names(.))]))
Desired results:
RenamedColumn cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Would it be possible to arrive at the same result using rename and column index?
This:
mtcars %>%
rename(1 = "ChangedNameAgain")
will fail:
Error in source("~/.active-rstudio-document", echo = TRUE) :
~/.active-rstudio-document:7:14: unexpected '='
6: mtcars %>%
7: rename(1 =
^
Similarly trying to use rename_ or .[[1]] as column reference will return an error.
As of dplyr 0.7.5, rlang 0.2.1, tidyselect 0.2.4, this simply works:
library(dplyr)
rename(mtcars, ChangedNameAgain = 1)
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# ...
Original answer and edits now obsolete:
The logic of rename() is new_name = old_name, so ChangedNameAgain = 1 would make more sense than 1 = ChangedNameAgain.
I would suggest:
mtcars %>% rename_(ChangedNameAgain = names(.)[1])
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Edit
I have yet to wrap my head around the new dplyr programming system based on rlang, since versions 0.6/0.7 of dplyr.
The underscore-suffixed version of rename used in my initial answer is now deprecated, and per @jzadra's comment, it didn't work anyway with syntactically problematic names like "foo bar".
Here is my attempt with the new rlang-based Non Standard Evaluation system. Do not hesitate to tell me what I've done wrong, in the comments:
df <- tibble("foo" = 1:2, "bar baz" = letters[1:2])
# # A tibble: 2 x 2
# foo `bar baz`
# <int> <chr>
# 1 1 a
# 2 2 b
First I try directly with rename() but unfortunately I've got an error. It seems to be a FIXME (or is this FIXME unrelated?) in the source code (I'm using dplyr 0.7.4), so it could work in the future:
df %>% rename(qux = !! quo(names(.)[[2]]))
# Error: Expressions are currently not supported in `rename()`
(Edit: the error message now (dplyr 0.7.5) reads Error in UseMethod("rename_") : no applicable method for 'rename_' applied to an object of class "function")
(Update 2018-06-14: df %>% rename(qux = !! quo(names(.)[[2]])) now seems to work, still with dplyr 0.7.5, not sure if an underlying package changed).
Here is a workaround with select that works. It doesn't preserve column order like rename though:
df %>% select(qux = !! quo(names(.)[[2]]), everything())
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And if we want to put it in a function, we'd have to slightly modify it with := to allow unquoting on the left hand side. If we want to be robust to inputs like strings and bare variable names, we have to use the "dark magic" (or so says the vignette) of enquo() and quo_name() (honestly I don't fully understand what it does):
rename_col_by_position <- function(df, position, new_name) {
new_name <- enquo(new_name)
new_name <- quo_name(new_name)
select(df, !! new_name := !! quo(names(df)[[position]]), everything())
}
This works with new name as a string:
rename_col_by_position(df, 2, "qux")
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a quosure:
rename_col_by_position(df, 2, quo(qux))
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a bare name:
rename_col_by_position(df, 2, qux)
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And even this works:
rename_col_by_position(df, 2, `qux quux`)
# # A tibble: 2 x 2
# `qux quux` foo
# <chr> <int>
# 1 a 1
# 2 b 2
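With more recent dplyr (1.0.0 or later, which is an assumption about the installed version), a helper of this kind can probably be written without quosures. The sketch below reuses the name rename_col_by_position() for comparison only; unlike the version above it accepts the new name as a string only, and it keeps the original column order:

```r
library(dplyr)

df <- tibble(foo = 1:2, `bar baz` = letters[1:2])

# Sketch: rename the column at `position` to the string `new_name`
rename_col_by_position <- function(df, position, new_name) {
  # all_of() accepts numeric locations as well as names
  rename_with(df, function(old) new_name, .cols = all_of(position))
}

out <- rename_col_by_position(df, 2, "qux")
names(out)
```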
Here's a couple of alternative solutions that are arguably easier to read because they are not focused around the . reference. select understands column indices, so if you're renaming the first column, you can simply do
mtcars %>% select( RenamedColumn = 1, everything() )
However, the issue with using select is that it will reorder columns if you're renaming a column in the middle. To get around the issue, you have to pre-select the columns to the left of the one you're renaming:
## This will rename the 7th column without changing column order
mtcars %>% select( 1:6, RenamedColumn = 7, everything() )
Another option is to use the new rename_at, which also understands column indices:
## This will also rename the 7th column without changing the order
## Credit for simplifying the second argument: Moody_Mudskipper
mtcars %>% rename_at( 7, ~"RenamedColumn" )
The ~ is needed because rename_at is quite flexible and can accept functions as its second argument. For example, mtcars %>% rename_at( c(2,4), toupper ) will make the names of the second and fourth columns uppercase.
dplyr has superseded rename_at() with rename_with(). You can rename a column by index like this:
library(tidyverse)
mtcars %>%
rename_with(.cols = 1, ~"renamed_column")
#> renamed_column cyl disp hp drat wt qsec vs am gear
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
#> ...
Be sure to include the tilde (~)* before the new column name.
Also note that if you introduce the glue package, you can modify existing column names like this:
library(glue)
mtcars %>%
rename_with(.cols = 1, ~glue::glue("renamed_{.}"))
#> renamed_mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> ...
Applying the above approach to multiple columns is just a matter of passing in the column index number range using a colon (:) or multiple indices in a vector using c(); here's a combination of both:
mtcars %>%
rename_with(.cols = c(1:3, 5), ~glue::glue("renamed_{.}"))
#> renamed_mpg renamed_cyl renamed_disp hp renamed_drat wt
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875
#> Datsun 710 22.8 4 108.0 93 3.85 2.320
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440
#> ...
And keep in mind that since the . represents the current column name, you can apply string modification functions to it like this:
mtcars %>%
rename_with(.cols = c(1:3),
~glue::glue("renamed_{str_replace(.,'mpg','miles_per_gallon')}"))
#> renamed_miles_per_gallon renamed_cyl renamed_disp hp
#> Mazda RX4 21.0 6 160.0 110
#> Mazda RX4 Wag 21.0 6 160.0 110
#> Datsun 710 22.8 4 108.0 93
#> Hornet 4 Drive 21.4 6 258.0 110
#> Hornet Sportabout 18.7 8 360.0 175
#> ...
*You can learn more about the ~ and . NSE function shorthand here.
Imho rlang as suggested by @Aurele is too much here.
Solution 1: Use a curly bracket pipe context:
bcMatrix %>% {colnames(.)[1] = "foo"; .}
Solution 2: Or (ab)use the tee operator %T>% from magrittr package (installed anyway if dplyr is used) to perform the renaming as a side-effect:
bcMatrix %T>% {colnames(.)[1] = "foo"}
Solution 3: using a simple helper function:
rename_by_pos = function(df, index, new_name){
colnames(df)[index] = new_name
df
}
iris %>% rename_by_pos(2,"foo")

Move a column to first position in a data frame

I would like to have the last column of the data frame moved to the start (as first column). How can I do it in R?
My data.frame has about a thousand columns, so changing the order by hand won't do. I just want to pick one column and "move it to the start".
Dplyr's select() approach
Moving the last column to the start:
new_df <- df %>%
select(last_column_name, everything())
This is also valid for any column and any quantity:
new_df <- df %>%
select(col_5, col_8, everything())
Example using mtcars data frame:
head(mtcars, n = 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Last column is 'carb'
new_df <- mtcars %>% select(carb, everything())
head(new_df, n = 2)
# carb mpg cyl disp hp drat wt qsec vs am gear
# Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
dplyr 1.0.0 now includes the relocate() function to reorder columns. The default behaviour is to move the named column(s) to the first position.
library(dplyr) # from version 1.0.0
mtcars %>%
relocate(carb) %>%
head()
carb mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 22.8 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 18.7 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 18.1 6 225 105 2.76 3.460 20.22 1 0 3
But other locations can be specified with the .before or .after arguments:
mtcars %>%
relocate(gear, carb, .before = cyl) %>%
head()
mpg gear carb cyl disp hp drat wt qsec vs am
Mazda RX4 21.0 4 4 6 160 110 3.90 2.620 16.46 0 1
Mazda RX4 Wag 21.0 4 4 6 160 110 3.90 2.875 17.02 0 1
Datsun 710 22.8 4 1 4 108 93 3.85 2.320 18.61 1 1
Hornet 4 Drive 21.4 3 1 6 258 110 3.08 3.215 19.44 1 0
Hornet Sportabout 18.7 3 2 8 360 175 3.15 3.440 17.02 0 0
Valiant 18.1 3 1 6 225 105 2.76 3.460 20.22 1 0
You can change the order of columns by addressing them in the new order, choosing them explicitly with data[,c(ORDER YOU WANT THEM TO BE IN)]
If you just want the last column to be first use: data[,c(ncol(data),1:(ncol(data)-1))]
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> head(cars[,c(2,1)])
dist speed
1 2 4
2 10 4
3 4 7
4 22 7
5 16 8
6 10 9
dataframe<-dataframe[,c(1000, 1:999)]
this will move your last column i.e. 1000th column to the first column.
I don't know if it's worth adding this as an answer or if a comment would be fine, but I wrote a function called moveme that lets you do what you want to do with the language you describe. You can find the function at this answer: https://stackoverflow.com/a/18540144/1270695
It works on the names of your data.frame and produces a character vector that you can use to reorder your columns:
mydf <- data.frame(matrix(1:12, ncol = 4))
mydf
moveme(names(mydf), "X4 first")
# [1] "X4" "X1" "X2" "X3"
moveme(names(mydf), "X4 first; X1 last")
# [1] "X4" "X2" "X3" "X1"
mydf[moveme(names(mydf), "X4 first")]
# X4 X1 X2 X3
# 1 10 1 4 7
# 2 11 2 5 8
# 3 12 3 6 9
If you're shuffling things around like this, I suggest converting your data.frame to a data.table and using setcolorder (with my moveme function, if you wish) to make the change by reference.
In your question, you also mentioned "I just want to pick one column and move it to the start". If it's an arbitrary column, and not specifically the last one, you could also look at using setdiff.
Imagine you're working with the "mtcars" dataset and want to move the "am" column to the start.
x <- "am"
mtcars[c(x, setdiff(names(mtcars), x))]
If you want to move any named column to the first position, simply use:
df[,c(which(colnames(df)=="desired_colname"),which(colnames(df)!="desired_colname"))]
A native R approach that works with any number of rows or columns to move the last column of a dataframe to the first column position:
df <- df[,c(ncol(df),1:ncol(df)-1)]
It can be adapted to move any column to the first position by listing that column first and then all the other columns:
df <- df[,c(your_column_number_here,setdiff(seq_len(ncol(df)),your_column_number_here))]
If you don't know the column number, but know the column label name, do the following replacing "your_column_name_here":
columnNumber <- which(colnames(df)=="your_column_name_here")
df <- df[,c(columnNumber,setdiff(seq_len(ncol(df)),columnNumber))]
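One detail worth knowing about this indexing style: 1:ncol(df)-1 parses as (1:ncol(df))-1, that is 0 up to ncol(df)-1, and the index 0 is silently dropped. A sketch of a helper that does not rely on this is shown below; move_to_front() is a hypothetical name, and it accepts either a column name or a column number:

```r
# Hypothetical helper: move one column (by name or position) to the front
move_to_front <- function(df, col) {
  idx <- if (is.character(col)) match(col, names(df)) else col
  stopifnot(length(idx) == 1, !is.na(idx), idx >= 1, idx <= ncol(df))
  # Chosen column first, then every other column in its original order
  df[, c(idx, setdiff(seq_len(ncol(df)), idx)), drop = FALSE]
}

by_name  <- move_to_front(mtcars, "am")
by_index <- move_to_front(mtcars, ncol(mtcars))

names(by_name)
names(by_index)
```

Using setdiff() on the positions keeps every other column exactly once, so no column is duplicated or dropped whichever column is chosen.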
There is also the data.table option with setcolorder():
library(data.table)
mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
# Move column "gear" in the first position
setcolorder(mtcars_copy, neworder = "gear")
head(mtcars_copy)
# gear mpg cyl disp hp drat wt qsec vs am carb
# 1: 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# 2: 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
# 3: 4 22.8 4 108 93 3.85 2.320 18.61 1 1 1
# 4: 3 21.4 6 258 110 3.08 3.215 19.44 1 0 1
# 5: 3 18.7 8 360 175 3.15 3.440 17.02 0 0 2
# 6: 3 18.1 6 225 105 2.76 3.460 20.22 1 0 1
If multiple columns, then mention the order in a vector:
setcolorder(mtcars_copy, neworder = c("vs", "carb"))
head(mtcars_copy)
# vs carb gear mpg cyl disp hp drat wt qsec am
# 1: 0 4 4 21.0 6 160 110 3.90 2.620 16.46 1
# 2: 0 4 4 21.0 6 160 110 3.90 2.875 17.02 1
# 3: 1 1 4 22.8 4 108 93 3.85 2.320 18.61 1
# 4: 1 1 3 21.4 6 258 110 3.08 3.215 19.44 0
# 5: 0 2 3 18.7 8 360 175 3.15 3.440 17.02 0
# 6: 1 1 3 18.1 6 225 105 2.76 3.460 20.22 0
Move any column from any position to the first position in your data:
n <- which(colnames(df)=="column_need_move")
column_need_move <- df$column_need_move
df <- cbind(column_need_move, df[,-n])
If you want to create a new column and have it be the first column, use the .before=1 argument:
my_data <- my_data %>% mutate(newcol = a*b, .before=1)

Resources