I would like to have the last column of the data frame moved to the start (as first column). How can I do it in R?
My data.frame has about a thousand columns, so retyping the whole order by hand won't do. I just want to pick one column and "move it to the start".
Dplyr's select() approach
Moving the last column to the start:
new_df <- df %>%
select(last_column_name, everything())
This is also valid for any column and any quantity:
new_df <- df %>%
select(col_5, col_8, everything())
Example using mtcars data frame:
head(mtcars, n = 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Last column is 'carb'
new_df <- mtcars %>% select(carb, everything())
head(new_df, n = 2)
# carb mpg cyl disp hp drat wt qsec vs am gear
# Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
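If the name of the last column is not known in advance, the same select() pattern can be written without it. This is a minimal sketch, assuming a dplyr version that re-exports the tidyselect helper last_col(); mtcars is used only for illustration.

```r
library(dplyr)

# last_col() selects the final column by position, so its name is not needed
new_df <- mtcars %>% select(last_col(), everything())

names(new_df)[1]
```

Because everything() only adds the columns that have not been selected yet, the column count stays the same.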
dplyr 1.0.0 now includes the relocate() function to reorder columns. The default behaviour is to move the named column(s) to the first position.
library(dplyr) # from version 1.0.0
mtcars %>%
relocate(carb) %>%
head()
carb mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 22.8 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 18.7 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 18.1 6 225 105 2.76 3.460 20.22 1 0 3
But other locations can be specified with the .before or .after arguments:
mtcars %>%
relocate(gear, carb, .before = cyl) %>%
head()
mpg gear carb cyl disp hp drat wt qsec vs am
Mazda RX4 21.0 4 4 6 160 110 3.90 2.620 16.46 0 1
Mazda RX4 Wag 21.0 4 4 6 160 110 3.90 2.875 17.02 0 1
Datsun 710 22.8 4 1 4 108 93 3.85 2.320 18.61 1 1
Hornet 4 Drive 21.4 3 1 6 258 110 3.08 3.215 19.44 1 0
Hornet Sportabout 18.7 3 2 8 360 175 3.15 3.440 17.02 0 0
Valiant 18.1 3 1 6 225 105 2.76 3.460 20.22 1 0
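The .after argument accepts selection helpers as well as column names. As a sketch (assuming dplyr 1.0.0 or later), the following sends a column to the end rather than to the start:

```r
library(dplyr)

# Move 'mpg' so that it sits after whatever column is currently last
moved <- mtcars %>% relocate(mpg, .after = last_col())

names(moved)
```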
You can change the order of columns by addressing them explicitly in the new order with data[,c(ORDER YOU WANT THEM TO BE IN)]
If you just want the last column to be first use: data[,c(ncol(data),1:(ncol(data)-1))]
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> head(cars[,c(2,1)])
dist speed
1 2 4
2 10 4
3 4 7
4 22 7
5 16 8
6 10 9
dataframe<-dataframe[,c(1000, 1:999)]
This will move your last column, i.e. the 1000th column, to the first position.
I don't know if it's worth adding this as an answer or if a comment would be fine, but I wrote a function called moveme that lets you do what you want to do with the language you describe. You can find the function at this answer: https://stackoverflow.com/a/18540144/1270695
It works on the names of your data.frame and produces a character vector that you can use to reorder your columns:
mydf <- data.frame(matrix(1:12, ncol = 4))
mydf
moveme(names(mydf), "X4 first")
# [1] "X4" "X1" "X2" "X3"
moveme(names(mydf), "X4 first; X1 last")
# [1] "X4" "X2" "X3" "X1"
mydf[moveme(names(mydf), "X4 first")]
# X4 X1 X2 X3
# 1 10 1 4 7
# 2 11 2 5 8
# 3 12 3 6 9
If you're shuffling things around like this, I suggest converting your data.frame to a data.table and using setcolorder (with my moveme function, if you wish) to make the change by reference.
In your question, you also mentioned "I just want to pick one column and move it to the start". If it's an arbitrary column, and not specifically the last one, you could also look at using setdiff.
Imagine you're working with the "mtcars" dataset and want to move the "am" column to the start.
x <- "am"
mtcars[c(x, setdiff(names(mtcars), x))]
If you want to move any named column to the first position, simply use:
df[,c(which(colnames(df)=="desired_colname"),which(colnames(df)!="desired_colname"))]
A native R approach that works with any number of rows or columns to move the last column of a data frame to the first column position:
df <- df[,c(ncol(df),1:(ncol(df)-1))]
Note the parentheses: 1:ncol(df)-1 is parsed as (1:ncol(df))-1, i.e. 0:(ncol(df)-1), which only works for the last column because index 0 is silently ignored.
To move any other column to the first position, put its number first and drop it from the remaining indices:
k <- your_column_number_here
df <- df[,c(k,setdiff(1:ncol(df),k))]
If you don't know the column number, but know the column label name, do the following, replacing "your_column_name_here":
columnNumber <- which(colnames(df)=="your_column_name_here")
df <- df[,c(columnNumber,setdiff(1:ncol(df),columnNumber))]
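The positional approach can be wrapped in a small helper so that the index arithmetic lives in one place. This is only a sketch: move_first() is a hypothetical name, not a function from base R or any package.

```r
# Hypothetical helper: move one column (given by name or position) to the front
move_first <- function(df, col) {
  # Translate a column name into its position; positions pass through unchanged
  k <- if (is.character(col)) match(col, names(df)) else col
  stopifnot(length(k) == 1, !is.na(k), k >= 1, k <= ncol(df))
  # setdiff() removes k from the remaining positions, so nothing is duplicated;
  # drop = FALSE keeps the result a data frame even with a single column
  df[, c(k, setdiff(seq_len(ncol(df)), k)), drop = FALSE]
}

by_name <- move_first(mtcars, "am")          # by column name
by_pos  <- move_first(mtcars, ncol(mtcars))  # by position (here: the last column)
```

Stopping early on an unknown column name is a design choice; without that check an NA index would produce a less obvious error further down.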
There is also the data.table option with setcolorder():
library(data.table)
mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
# Move column "gear" in the first position
setcolorder(mtcars_copy, neworder = "gear")
head(mtcars_copy)
# gear mpg cyl disp hp drat wt qsec vs am carb
# 1: 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# 2: 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
# 3: 4 22.8 4 108 93 3.85 2.320 18.61 1 1 1
# 4: 3 21.4 6 258 110 3.08 3.215 19.44 1 0 1
# 5: 3 18.7 8 360 175 3.15 3.440 17.02 0 0 2
# 6: 3 18.1 6 225 105 2.76 3.460 20.22 1 0 1
If multiple columns, then mention the order in a vector:
setcolorder(mtcars_copy, neworder = c("vs", "carb"))
head(mtcars_copy)
# vs carb gear mpg cyl disp hp drat wt qsec am
# 1: 0 4 4 21.0 6 160 110 3.90 2.620 16.46 1
# 2: 0 4 4 21.0 6 160 110 3.90 2.875 17.02 1
# 3: 1 1 4 22.8 4 108 93 3.85 2.320 18.61 1
# 4: 1 1 3 21.4 6 258 110 3.08 3.215 19.44 0
# 5: 0 2 3 18.7 8 360 175 3.15 3.440 17.02 0
# 6: 1 1 3 18.1 6 225 105 2.76 3.460 20.22 0
Move any column from any position to the first position in your data:
n <- which(colnames(df)=="column_need_move")
column_need_move <- df$column_need_move
df <- cbind(column_need_move, df[,-n])
If you want to create a new column and have it be the first column, use the .before=1 argument:
my_data <- my_data %>% mutate(newcol = a*b, .before=1)
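A concrete version of the same call on mtcars, as a sketch. The column name hp_per_wt is made up for illustration, and the .before argument of mutate() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Create a new column and place it in position 1 in a single step
with_ratio <- mtcars %>% mutate(hp_per_wt = hp / wt, .before = 1)

names(with_ratio)[1:3]
```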
Is there a way that the row names can be substituted based on predefined vector in R, something like:
rownames(GV) <- c(beta1='Age', beta10='Female Gender')
Or maybe case_when() will be easier for you:
library(dplyr)
df <- data.frame(a = c(1, 2, 3))
rownames(df)
#> [1] "1" "2" "3"
rownames(df) <- case_when(rownames(df) == "1" ~ "one",
rownames(df) == "2" ~ "two",
TRUE ~ rownames(df))
rownames(df)
#> [1] "one" "two" "3"
You specify a new value for each condition, plus a value for all remaining cases (the TRUE ~ rownames(df) line). For those remaining cases I'm keeping the previous row names.
We could do the following:
rownames(mtcars)[which(rownames(mtcars) == "Datsun 710")] <- "My Rowname"
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If we want to rename more row names we can use %in%, but as @gss mentions in the comments, this comes with a caveat: no matter the order of the names in the character vector following %in%, the names will be replaced in the order they appear in rownames(). Compare the following two calls:
rownames(mtcars)[which(rownames(mtcars) %in% c("Datsun 710", "Mazda RX4 Wag"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Which has the same result as:
rownames(mtcars)[which(rownames(mtcars) %in% c("Mazda RX4 Wag", "Datsun 710"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2021-12-21 by the reprex package (v2.0.1)
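When the pairing between old and new names matters, match() is one way around the ordering caveat of %in%, because it returns the positions in the order of the lookup vector. A minimal sketch, assuming every old name is actually present; the variable names old, new and idx are illustrative.

```r
example <- head(mtcars)

old <- c("Datsun 710", "Mazda RX4 Wag")
new <- c("My Rowname1", "My Rowname2")

# match() gives one position per element of 'old', in the order of 'old'
idx <- match(old, rownames(example))
stopifnot(!anyNA(idx))  # fail early if an old name is missing

rownames(example)[idx] <- new
rownames(example)[1:3]
```

Here "Datsun 710" becomes "My Rowname1" even though it appears after "Mazda RX4 Wag" in the data.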
If you want to rename all the rows, and you have an array of the desired new names in order:
example <- head(mtcars, 3)
mynewnames <- c("First", "Second", "Third")
rownames(example) <- mynewnames
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> First 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Second 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Third 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename all the rows, and you have a named array (not necessarily in the correct order):
example <- head(mtcars, 3)
mynewnames <- c("Datsun 710" = "Datsun", "Mazda RX4" = "Mazda", "Mazda RX4 Wag" = "Also Mazda")
rownames(example) <- mynewnames[rownames(example)]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Also Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename only some rows, and have a named array (an ordered array makes no sense in this context):
example <- head(mtcars, 3)
mynewnames <- c("Mazda RX4" = "This Mazda", "Mazda RX4 Wag" = "That Mazda")
rownames(example)[rownames(example) %in% names(mynewnames)] <-
mynewnames[rownames(example)[rownames(example) %in% names(mynewnames)]]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> This Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> That Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
This is a bit unwieldy; if you are only replacing one or two row names then @TimTeaFan's first suggestion is probably easier.
The safest way, and the one the OP prefers (a predefined named vector), is to take the current row names, replace those that are defined in the vector, and set the row names again. This does not fail on an incomplete vector: a name that has no replacement stays as it was.
The advantage of this solution is that it prevents the error below when your rename vector is incomplete:
Error in `.rowNamesDF<-`(x, value = value) :
missing values in 'row.names' are not allowed
solution
library(stringr) # used for str_replace_all()
df <- data.frame(
x = rep(1:5),
y = rep(11:15),
row.names = LETTERS[1:5]
)
df
# x y
# A 1 11
# B 2 12
# C 3 13
# D 4 14
# E 5 15
change <- c("A" = "a", "C" = "c")
row.names(df) <- str_replace_all(row.names(df), change)
df
# x y
# a 1 11
# B 2 12
# c 3 13
# D 4 14
# E 5 15
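One thing to keep in mind is that str_replace_all() treats the names of the vector as regular expressions and replaces matching substrings, so a row called "AB" would also be touched by the "A" = "a" rule. If only whole-name matches should be replaced, a base R sketch along these lines may be safer (the data frame and the names rn and hit are illustrative):

```r
df <- data.frame(x = 1:3, row.names = c("A", "AB", "C"))
change <- c("A" = "a", "C" = "c")

rn  <- row.names(df)
hit <- rn %in% names(change)        # TRUE only for exact, whole-name matches
rn[hit] <- unname(change[rn[hit]])  # look up replacements by name
row.names(df) <- rn

row.names(df)
```

Names without an entry in change are left untouched, so an incomplete vector does not introduce missing values.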
When I run colnames(), it never shows the name of this first column.
For example, after wasting a lot of time researching online, I discovered the name of the first column in mtcars is das_Auto.
Why doesn't this name show when I run this code?
colnames(mtcars)[1]
What's the easiest way to determine the name of the first column in a data set?
This is because the first 'column' of mtcars is not actually a column: it holds the row names, which act as an index. If you want to convert it to a column, you can run the code below:
df <- cbind(das_Auto = rownames(mtcars), mtcars)
rownames(df) <- 1:nrow(mtcars)
head(df)
das_Auto mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
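The same conversion is available as a single call in the tibble package. A minimal sketch, assuming tibble is installed; the column name das_Auto is simply reused from above.

```r
library(tibble)

# Turn the row names into a regular first column called 'das_Auto'
df <- rownames_to_column(mtcars, var = "das_Auto")

colnames(df)[1]
head(df$das_Auto, 3)
```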
I have a table with column names that I want to change based on a table with a column of the current names and a column of the replacement names.
As an example:
library(tidyverse)
name_tbl <- enframe(names(mtcars), value = "old_names") %>%
rowwise() %>%
mutate(new_names = paste0(old_names, sample(50:100, 1)))
name_tbl
#> # A tibble: 11 x 3
#> # Rowwise:
#> name old_names new_names
#> <int> <chr> <chr>
#> 1 1 mpg mpg79
#> 2 2 cyl cyl57
#> 3 3 disp disp70
#> 4 4 hp hp57
#> 5 5 drat drat72
#> 6 6 wt wt88
#> 7 7 qsec qsec86
#> 8 8 vs vs66
#> 9 9 am am91
#> 10 10 gear gear71
#> 11 11 carb carb50
At this point, I would like to be able to do something like:
mtcars %>% rename(name_tbl$old_names = name_tbl$new_names)
however it of course does not work because rename expects named arguments.
I've looked all over and have not found a tidyverse method of doing this operation. Is there some way to rename columns using a lookup table like this?
Currently I'm pivoting longer on everything and then left_joining the key table, however this is problematic because all columns have to be converted to the same type first, which is not easily undone.
We ungroup 'name_tbl', create a named vector or list, and use !!! within rename to change the column names:
library(dplyr)
library(tibble)
v1 <- name_tbl %>%
ungroup %>%
select(new_names, old_names) %>%
deframe
mtcars %>%
rename(!!! v1)
Output:
# mpg82 cyl96 disp93 hp94 drat74 wt58 qsec53 vs70 am79 gear81 carb83
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# ...
One option with the addition of stringr could be:
mtcars %>%
rename_with(~ str_replace_all(., setNames(name_tbl$new_names, name_tbl$old_names)), everything())
mpg81 cyl61 disp61 hp69 drat98 wt99 qsec87 vs87 am77 gear50 carb81
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
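Another option that avoids both !!! and string replacement is to pass a named character vector of the form c(new_name = "old_name") through all_of(). This is a sketch assuming dplyr 1.0.0 or later; lookup_tbl is a small hypothetical stand-in for name_tbl with fixed names so that the result is reproducible.

```r
library(dplyr)

# Hypothetical lookup table with fixed (non-random) replacement names
lookup_tbl <- data.frame(
  old_names = c("mpg", "cyl", "hp"),
  new_names = c("miles_per_gallon", "cylinders", "horsepower")
)

# Named vector: names are the new column names, values are the old ones
lookup <- setNames(lookup_tbl$old_names, lookup_tbl$new_names)

renamed <- mtcars %>% rename(all_of(lookup))

names(renamed)[1:4]
```

all_of() errors if an old name is missing from the data; any_of() could be swapped in if missing columns should be skipped silently.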
I am new to R and I have a problem that I simply can't find the solution to. I have tried reading through older posts etc. but I can't figure out how it could/should be done. I hope that some of you might be able to help.
Using the mtcars dataset as an example,
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I wish to assign the cars a value (e.g. 1) if, for instance, the hp value is above the median, and 0 if it is below. This could be called "hp_index".
I would do the same for, let's say, cyl, disp and drat. Then I would like to assign the cars an "index_score", where a car is given the value 1 if at least 3 out of the 4 variables hp, cyl, disp and drat are above the median (that is, if 3 or 4 of "hp_index", "cyl_index", "disp_index" and "drat_index" are 1).
Once again, I really hope that some of you might be able to help!
Thanks in advance, and have a nice day!
here's a tidyverse solution:
library(tidyverse)
mtcars %>%
mutate(across(c(hp, cyl, disp, drat), .fns = ~if_else(.x >= median(.x), 1, 0), .names = "{.col}_index"),
index_score = apply(across(c(hp_index, cyl_index, disp_index, drat_index)), 1, sum),
index_score = if_else(index_score >= 3, 1, 0))
Note: I wrote the conditions so that the split is >= median versus < median. If you only use > and <, there could be edge cases with missing values when a row has exactly the median.
First six cases:
mpg cyl disp hp drat wt qsec vs am gear carb hp_index cyl_index disp_index drat_index index_score
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0 1 0 1 0
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0 1 0 1 0
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0 0 0 1 0
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0 1 1 0 0
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 1 1 1 0 1
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0 1 1 0 0
Using dplyr :
library(dplyr)
cols <- c('cyl', 'disp', 'drat', 'hp')
mtcars %>%
mutate(across(all_of(cols), list(index = ~+(.x > median(.x))))) %>%
mutate(index_score = +(rowSums(select(., ends_with('index'))) >= 3)) -> result
result
Used select(., ends_with('index')) in rowSums assuming there are no other columns that end with index in your actual dataset apart from the newly created ones with across.
This version is based on all the columns; specific columns can be selected instead:
mtcars$index=rowSums(
sapply(mtcars,function(x){
ifelse(x>median(x),1,0)
})
)
mpg cyl disp hp drat wt qsec vs am gear carb index
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 5
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 4
...
From here you can apply another ifelse to the "index" column to decide whether the score should be 0 or 1.
To use different filters for columns, as you proposed
rowSums(
cbind(
sapply(mtcars[c("cyl", "disp", "drat")],function(x){ifelse(x>median(x),1,0)}),
sapply(mtcars["hp"],function(x){ifelse(x<median(x),1,0)})
)
)
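Putting the pieces together for the four columns from the question, a base R sketch could look like this. It uses a strict "above the median" comparison, which is an assumption about how ties should be handled; the names cols, flags and result are illustrative.

```r
cols <- c("hp", "cyl", "disp", "drat")

# One 0/1 flag per column: 1 if the value is strictly above that column's median
flags <- sapply(mtcars[cols], function(x) as.integer(x > median(x)))
colnames(flags) <- paste0(cols, "_index")

# index_score is 1 when at least 3 of the 4 flags are set
result <- cbind(mtcars, flags, index_score = as.integer(rowSums(flags) >= 3))

head(result[, c(colnames(flags), "index_score")])
```

Switching > to >= changes which cars count as "above" when a value equals the median (for example cyl == 6), so the comparison should follow whatever definition is intended.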
As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simple way to do this in dplyr? A function like to_rownames() that is the opposite of add_rownames()?
I saw a solution using a custom function, but it's really out of dplyr philosophy.
You can now use the tibble-package:
tibble::column_to_rownames()
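For the use case in the question (character column to row names, then a numeric matrix), a minimal sketch could look like this. It assumes the tibble package is installed; the example data frame is made up.

```r
library(tibble)

df <- data.frame(a = c("w", "x", "y", "z"), b = 1:4, c = 5:8)

# Move column 'a' into the row names, then convert the rest to a matrix
m <- as.matrix(column_to_rownames(df, var = "a"))

m
```

column_to_rownames() removes the column from the data, so the resulting matrix contains only the numeric columns.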
This provides NSE & standard eval functions:
library(dplyr)
df <- data_frame(a=sample(letters, 4), b=c(1:4), c=c(5:8))
reset_rownames <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
col <- as.character(substitute(col))
reset_rownames_(df, col)
}
reset_rownames_ <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
nm <- data.frame(df)[, col]
df <- df[, !(colnames(df) %in% col)]
rownames(df) <- nm
df
}
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.
If you really need a matrix, you can just save the character column to a separate variable, drop it, and then create the matrix:
library(dplyr)
df <- tibble(a = sample(letters, 4), b = c(1:4), c = c(5:8))
# Use a name other than 'letters' so the built-in constant is not masked
row_labels <- df %>% select(a)
a.matrix <- df %>% select(-a) %>% as.matrix
Not sure what you are going to do after that, but this gets you as far as you asked for...