When I create a new variable, is there a way to specify in the function where to place it?
Right now, it adds it to the end of the dataframe, but for ease of viewing in Excel for example, I'd like to place a new calculated column beside the columns I used for the calculation.
Here's an example of code:
rawdata2 <- (rawdata1 %>% unite(location, locations1,locations2, locations3,
na.rm = TRUE, remove=TRUE)
%>% select(-location7, -location16)
%>% unite(Sector, Sectors, na.rm=TRUE, remove=TRUE)
%>% unite(TypeofSpace, TypesofSpace, type.of.spaceOther, na.rm=TRUE,
remove=TRUE)
)
You can rearrange the columns in your data frame. It looks like you are using dplyr::select in your example.
library(dplyr)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars2 <- mtcars %>%
select(mpg, carb, everything()) ## moves carb up behind mpg
head(mtcars2)
# mpg carb cyl disp hp drat wt qsec vs am gear
# Mazda RX4 21.0 4 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 21.0 4 6 160 110 3.90 2.875 17.02 0 1 4
# Datsun 710 22.8 1 4 108 93 3.85 2.320 18.61 1 1 4
# Hornet 4 Drive 21.4 1 6 258 110 3.08 3.215 19.44 1 0 3
# Hornet Sportabout 18.7 2 8 360 175 3.15 3.440 17.02 0 0 3
# Valiant 18.1 1 6 225 105 2.76 3.460 20.22 1 0 3
You can do the same thing with base subsetting, for example with a data frame with 11 columns you can move the 11th behind the second by
mtcars3 <- mtcars[,c(1,11,2:10)]
identical(mtcars2, mtcars3)
# [1] TRUE
I ended up using relocate, documentation here: dplyr.tidyverse.org/reference/relocate.html
Related
I have a column including lots of "0" and other values (f.i. 2 or 2,3 etc). Is there any possibility to rename the columns with 0 to "None" and all other values to "others"? I wanted to use fct_recode or fct_collapse but cant figure out how to include all other values. Do you have any idea? I must not be necessarily include the fct_recode function.
Thanks a lot
Philipp
I tried to use fct_recode, fct_collapse
Here is a way to do it using mtcars and the vs column as an example:
cars <- mtcars
head(cars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
cars$vs <- ifelse(cars$vs == 0, "none", "other")
head(cars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 none 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 none 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 other 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 other 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 none 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 other 0 3 1
Note that R coerces the vs column from numeric to character. But you could do that explicitly first for clarity:
cars$vs <- as.character(cars$vs)
Using dplyr, we can do this on multiple colums as
library(dplyr)
df1 <- df1 %>%
mutate(across(everything(), ~ case_when(.x == 0 ~ "none", TRUE ~ "other")))
Or in base R
df1[] <- c("other", "none")[1 + (df1 == 0)]
I need to re-ordering the columns' position in a dataframe with 500 columns. In fact, I only want the last column to be moved between the third and the fourth columns.
Here is what I tried:
df[ ,c(1, 2, 3, ncol(df), 4:ncol(df)-1)]
But it gives me a vector of values which are the columns' number. Would you someone tell me what I expect wrong from this code?
The issue maybe related to the operator precedence - wrap the (ncol(df)-1) within bracket (assuming the original object is a data.frame)
library(data.table)
df <- df[ ,c(1, 2, 3, ncol(df), 4:(ncol(df)-1)), with = FALSE]
Or use setcolorder to update the original object
setcolorder(df, c(1, 2, 3, ncol(df), 4:(ncol(df)-1)))
NOTE: with = FALSE was added after the OP confirmed it is a data.table object
Or another option is select
library(dplyr)
df <- df %>%
select(1:3, last_col(), everything())
Or with relocate
df <- df %>%
relocate(last_col(), .before = 4)
-reproducible example testing
> data(mtcars)
> head(mtcars)[, c(1, 2, 3, ncol(mtcars), 4:(ncol(mtcars)-1))]
mpg cyl disp carb hp drat wt qsec vs am gear
Mazda RX4 21.0 6 160 4 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 21.0 6 160 4 110 3.90 2.875 17.02 0 1 4
Datsun 710 22.8 4 108 1 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 21.4 6 258 1 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 18.7 8 360 2 175 3.15 3.440 17.02 0 0 3
Valiant 18.1 6 225 1 105 2.76 3.460 20.22 1 0 3
> head(mtcars) %>% select(1:3, last_col(), everything())
mpg cyl disp carb hp drat wt qsec vs am gear
Mazda RX4 21.0 6 160 4 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 21.0 6 160 4 110 3.90 2.875 17.02 0 1 4
Datsun 710 22.8 4 108 1 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 21.4 6 258 1 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 18.7 8 360 2 175 3.15 3.440 17.02 0 0 3
Valiant 18.1 6 225 1 105 2.76 3.460 20.22 1 0 3
> ?relocate
> head(mtcars) %>% relocate(last_col(), .before = 4)
mpg cyl disp carb hp drat wt qsec vs am gear
Mazda RX4 21.0 6 160 4 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 21.0 6 160 4 110 3.90 2.875 17.02 0 1 4
Datsun 710 22.8 4 108 1 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 21.4 6 258 1 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 18.7 8 360 2 175 3.15 3.440 17.02 0 0 3
Valiant 18.1 6 225 1 105 2.76 3.460 20.22 1 0 3
Example, I want to drop field mpg, select carb so that it's first, then just everything that's left over in their existing order.
mtcars |> select_at(vars(-mpg, carb, everything()))
This seems to drop mpg as desired, but carb is not in the front position / first variable.
My call to select_at() was intended to read in english 'drop mpg, then select carb first then everything else'.
On the docs for ?vars it says to use across. I'm open to either, but would prefer a one liner if possible as opposed to e.g. select(-mpg) |> select_at(vars(carb, everything()))
The order can be changed - i.e. place the column that needs to be deleted as the last entry
library(dplyr)
mtcars |>
select_at(vars(carb, everything(), -mpg)) |>
head()
carb cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3
The _at/_all etc are all deprecated. We can directly use everything() within select
mtcars |>
select(carb, everything(), -mpg) |>
head()
carb cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3
The issue is that when we use -mpg as the first entry, it removes that column keeping all the rest of the column, then adding 'carb' as second entry does nothing because 'carb' is already a column in the selected data and duplicates for column names are not allowed, the last everything() adds back the 'mpg' again.
> mtcars |> select_at(vars(-mpg)) |> head()
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars |> select_at(vars(-mpg, carb)) |> head()
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars |> select_at(vars(-mpg, carb, everything())) |> head()
cyl disp hp drat wt qsec vs am gear carb mpg
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4 21.0
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4 21.0
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1 22.8
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1 21.4
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2 18.7
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1 18.1
According to ?select, the usage is
select(.data, ...)
where ... is variadic argument, which can take any number of column names, numbers etc.
The order of evaluation happens from left to right, thus first expression is evaluated, then second and so on ...
I want to rename() some variables in my data programmatically, so I can to it via map at some point.
I'm looking for the equivalent of,
library(tidyverse)
mtcars %>% rename(
"MPG" = "mpg"
)
but using environment variables instead. I tried !!sym() by doing the following,
library(tidyverse)
new_name <- "MPG"
old_name <- "mpg"
mtcars %>% rename(
!!sym(new_name) = !!sym(old_name)
)
However, I get the error Error: unexpected ')' in ")". I am not sure what I am missing here!
We could use setNames and evaluate (!!!)
head(mtcars %>%
rename(!!! setNames(old_name, new_name)))
-output
MPG cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 6 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
You can use {{}} -
library(dplyr)
new_name <- "MPG"
old_name <- "mpg"
mtcars %>% rename({{new_name}} := {{old_name}}) %>% head
# MPG cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Can you sort a df based on object class? Say
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
sapply(mtcars,class)
and I want all numeric variables first and then all factors at the end? I want to be able to do this on a much larger dataset so I prefer solutions that do not rely on subsetting by column number. Cheers.
Maybe this one?
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
x <- mtcars[,names(sort(unlist(lapply(mtcars, class)), decreasing = T))]
head(x)
# mpg disp hp drat wt qsec gear carb cyl vs am
# Mazda RX4 21.0 160 110 3.90 2.620 16.46 4 4 6 0 1
# Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 4 4 6 0 1
# Datsun 710 22.8 108 93 3.85 2.320 18.61 4 1 4 1 1
# Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 3 1 6 1 0
# Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 3 2 8 0 0
# Valiant 18.1 225 105 2.76 3.460 20.22 3 1 6 1 0
In x, as you see, the columns cyl, vs and am that are of class factor are place at the end and those of class numeric first.