Changing columns right to left in a tibble - r

I have a table, read from an Excel file, with column names in English and some values in Hebrew.
When I read the Excel file into a tibble, the printed column names don't line up with the data.
I use the following code to read the table:
library(readxl)
excel_file <- file.path("<the file path>", "<the file>")  # placeholders for the real directory and file name
tab_1 <- read_xlsx(excel_file)
tab_1
The result that I'm getting:
# A tibble: 2 x 5
case a b c d
<chr> <dbl> <dbl> <dbl> <dbl>
1 שחור 3 2 1 4
2 אדום 2 5 2 3
How can I change the order of the column names? I have looked all over and found no solution.

You can do it by specifying the column indexes.
Using the iris dataset as an example:
First, convert it to a tibble
library(dplyr)
iris2 <- iris %>% as_tibble()
Reverse the columns by manually specifying the column indexes
iris2[,c(5,4,3,2,1)]
Or do the same programmatically
iris2[,ncol(iris2):1]
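If the first column should stay in place while only the remaining columns are reversed, the index vector can be built accordingly. The following is a minimal base R sketch, assuming a small stand-in data frame (tab, with values mirroring the question, using "x" and "y" in place of the Hebrew labels) that has at least two columns:

```r
# Hypothetical stand-in for the tibble in the question
tab <- data.frame(
  case = c("x", "y"),
  a = c(3, 2), b = c(2, 5), c = c(1, 2), d = c(4, 3)
)

# Reverse every column, including the first one
reversed_all <- tab[, ncol(tab):1]

# Keep column 1 where it is and reverse columns 2..ncol
# (assumes ncol(tab) >= 2, otherwise ncol(tab):2 counts upwards)
reversed_rest <- tab[, c(1, ncol(tab):2)]

names(reversed_all)   # "d" "c" "b" "a" "case"
names(reversed_rest)  # "case" "d" "c" "b" "a"
```

Only the column order changes here; the values in each column are untouched.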

When the tibble becomes wider (more columns), the result looks something like this:
> tab_1 <- tab_1[,ncol(tab_1):1]
> print(tab_1)
# A tibble: 2 x 19
result q p o n m l k j i h g f e d c b a
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 שחור 2 4 5 6 4 6 2 1 2 5 2 3 4 4 1 2 3
2 אדום 4 3 5 5 6 3 0 3 3 4 5 3 5 3 2 5 2
case
<chr>
1 שחור
2 אדום
changing back the column names of the first part

I generally use the tidyverse for data management. The select function gives a clean one-liner for reversing a data frame's columns, e.g.:
library(tidyverse)
n <- ncol(mtcars)
mtcars2 <- select(mtcars, c(n:1))

An option is also to use rev
library(dplyr)
mtcars %>%
select(rev(names(.)))

Related

How to convert two vectors into dataframe (wide format)

I want to convert two vectors into a wide-format data frame. The first vector represents the column names and the second vector the values.
Here is my reproducible example:
vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
Now I want to convert these two vectors into a wide-format data frame:
# A tibble: 1 x 5
Reply Reshare Like Share Search
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 0 4 3
I have found some examples for the long format, but no simple solution for the wide format. Can anyone help me?
You can make a named list (e.g. using setNames), followed by as.data.frame:
df <- as.data.frame(setNames(as.list(vector2), vector1))
Note that it needs to be a list: when converting a named vector into a data.frame, R puts values into separate rows instead of columns.
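The difference between the named vector and the named list can be checked directly. This is a small base R sketch using the vectors from the question; the object names long and wide are only illustrative:

```r
vector1 <- c("Reply", "Reshare", "Like", "Share", "Search")
vector2 <- c(2, 1, 0, 4, 3)

# Named vector: one column, one row per element
long <- as.data.frame(setNames(vector2, vector1))

# Named list: one column per element, a single row
wide <- as.data.frame(setNames(as.list(vector2), vector1))

dim(long)    # 5 1
dim(wide)    # 1 5
names(wide)  # "Reply" "Reshare" "Like" "Share" "Search"
```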
vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
df <- data.frame(vector1, vector2)
df |> tidyr::pivot_wider(names_from = vector1, values_from = vector2)
#> # A tibble: 1 × 5
#> Reply Reshare Like Share Search
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 0 4 3
Created on 2022-02-08 by the reprex package (v2.0.1)
Yet another solution, based on dplyr::bind_rows:
library(dplyr)
vector1<-c("Reply","Reshare","Like","Share","Search")
vector2<-c(2,1,0,4,3)
names(vector2) <- vector1
bind_rows(vector2)
#> # A tibble: 1 × 5
#> Reply Reshare Like Share Search
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 0 4 3
We can use map_dfc and set_names
library(purrr)
set_names(map_dfc(vector2, ~.x), vector1)
# A tibble: 1 × 5
Reply Reshare Like Share Search
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 0 4 3
Another possible solution:
library(dplyr)
data.frame(rbind(vector1, vector2)) %>%
`colnames<-`(.[1, ]) %>%
.[-1, ] %>%
`rownames<-`(NULL)
Reply Reshare Like Share Search
1 2 1 0 4 3

How to use an expression in dplyr::mutate in R

I want to add a new column based on a given character vector.
For example, in the example below, I want to add column d defined in expr:
library(magrittr)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
just as below:
data %>%
dplyr::mutate(d = a + b)
# # A tibble: 2 x 3
# a b d
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
However, in the code below, while the calculation itself (i.e., the addition) works, the name of the new column differs from what I expected.
data %>%
dplyr::mutate(!!rlang::parse_expr(expr))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(!!rlang::parse_quo(expr, env = rlang::global_env()))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(rlang::eval_tidy(rlang::parse_expr(expr)))
# # A tibble: 2 x 3
# a b `rlang::eval_tidy(rlang::parse_expr(expr))`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
How can I properly use an expression in dplyr::mutate?
My question is similar to this, but in my example, the new variable (d) and its definition (a + b) are given in a single character vector (expr).
Let's first look at what kind of expressions dplyr::mutate takes to create named variables: we need a named list containing an expression, and mutate then creates a variable from that expression under the name of the list element.
library(tidyverse)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
# let's rewrite the string above as named list containing an expression.
expr2 <- list(d = expr(a + b))
# this works as expected:
data %>%
mutate(!!! expr2)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
Now we simply need a function that transforms a string into a named list containing the expression of the right-hand side of the equation. The name needs to be the left-hand side of the equation. We can do this with regular string manipulations. Finally we need to transform the right-hand side of the equation from a string into an expression. We can use base R's str2lang here.
create_expr_ls <- function(str_expr) {
expr_nm <- str_extract(str_expr, "^\\w+")
expr_code <- str_replace_all(str_expr, "(^\\w+\\s?=\\s?)(.*)", "\\2")
set_names(list(str2lang(expr_code)), expr_nm)
}
expr3 <- create_expr_ls(expr)
data %>%
mutate(!!! expr3)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
Created on 2022-01-23 by the reprex package (v0.3.0)
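The same split into a name and an expression can also be sketched without any packages. This is only an illustration under the assumption that the string has the form "name = expression"; the helper objects parts, col_name and rhs are made up for this sketch:

```r
data <- data.frame(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"

# Split at the first "=" only, so a right-hand side containing "==" stays intact
parts <- regmatches(expr, regexpr("=", expr), invert = TRUE)[[1]]

col_name <- trimws(parts[1])         # left-hand side: the new column name
rhs <- str2lang(trimws(parts[2]))    # right-hand side: an unevaluated call

# Evaluate the call with the data frame's columns in scope
data[[col_name]] <- eval(rhs, data)

data$d  # 4 6
```

Since the string is parsed and evaluated as code, this should only be used on input you trust.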
Any of these work. The second is similar to the first but does not require that rlang be on the search path. The third and fourth also work if the d= part is not present in expr in which case default names are used. The last one uses only base R and is also the shortest.
data %>% mutate(within(., !!parse_expr(expr)))
data %>% mutate(within(., !!parse(text = expr)))
data %>% mutate(data, !!parse_expr(sprintf("tibble(%s)", expr)))
data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", expr))) }
within(data, eval(parse(text = expr))) # base R
Note
Assume this preamble:
library(dplyr)
library(rlang)
# input
data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"
To get the desired name for the mutated column, you can keep the same syntax and assign the result to a column with the preferred name. To get this name, use a regular expression to capture everything before = and then remove any leading or trailing spaces.
expr <- "x = a * b"
col_name <- trimws(stringr::str_extract(expr,"[^=]+"))
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_expr(expr))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_quo(expr, env = rlang::global_env()))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := rlang::eval_tidy(rlang::parse_expr(expr)))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8

Filter a tibble with two conditions

I have this tibble:
df <- tibble(id=c(4,4), client=c(5,10), stock=c(NA,10))
# A tibble: 2 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 NA
2 4 10 10
from which I want to keep the row where client == 5 and stock == 10. How would I filter that? My desired outcome would be:
# A tibble: 1 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 10
I'm not sure about the context of filtering on values from different rows, but see if the operation below works for you. Note that fill() comes from tidyr, not dplyr.
> library(dplyr)
> library(tidyr)
> df %>% fill(stock, .direction = 'up') %>% filter(client == 5 & stock == 10)
# A tibble: 1 x 3
id client stock
<dbl> <dbl> <dbl>
1 4 5 10
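Filtering the original data directly returns no rows because stock is NA in the client == 5 row, and NA == 10 evaluates to NA rather than TRUE. The base R sketch below illustrates this; fill_up is a hypothetical helper written for this example to mimic filling missing values upwards, not a function from any package:

```r
df <- data.frame(id = c(4, 4), client = c(5, 10), stock = c(NA, 10))

# Without filling, the condition is NA for row 1 and FALSE for row 2
naive <- df[which(df$client == 5 & df$stock == 10), ]
nrow(naive)  # 0

# Hypothetical helper: replace each NA with the next value below it
fill_up <- function(x) {
  for (i in rev(seq_along(x)[-length(x)])) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

filled <- df
filled$stock <- fill_up(filled$stock)

result <- filled[which(filled$client == 5 & filled$stock == 10), ]
nrow(result)  # 1
```

Whether borrowing the stock value from the next row is meaningful depends on the data, so this is worth checking before relying on it.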

When I don't know column names in data.frame, when I use dplyr mutate function

I would like to know how to use the dplyr mutate function when I don't know the column names. Here is my example code:
library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)
df %>% rowwise() %>% mutate(minimum = min(x,y,z))
Source: local data frame [3 x 5]
Groups: <by row>
# A tibble: 3 x 5
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 2
3 4 7 4 6 4
This code finds the minimum value row-wise. "df %>% rowwise() %>% mutate(minimum = min(x,y,z))" works because I typed the column names x, y, z. But suppose I have a really big data.frame with several hundred columns and I don't know all of the column names. Or I have multiple data.frames, all with different column names, and I just want the minimum value from the 10th to the 20th column in each row of each data.frame.
In the example data.frame above, suppose I don't know the column names and just want the minimum value from the 2nd to the 4th column in each row. Of course, this doesn't work, because df[,2] refers to the whole column rather than the current row's value, so min() returns the overall minimum for every row:
df %>% rowwise() %>% mutate(minimum=min(df[,2],df[,3], df[,4]))
Source: local data frame [3 x 5]
Groups: <by row>
# A tibble: 3 x 5
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 1
3 4 7 4 6 1
These two attempts below also don't work.
df %>% rowwise() %>% mutate(average=min(colnames(df)[2], colnames(df)[3], colnames(df)[4]))
df %>% rowwise() %>% mutate(average=min(noquote(colnames(df)[2]), noquote(colnames(df)[3]), noquote(colnames(df)[4])))
I know that I can get the minimum value using apply or another method when I don't know the column names. But I would like to know whether the dplyr mutate function can do that without known column names.
Thank you,
With apply:
library(dplyr)
library(purrr)
df %>%
mutate(minimum = apply(df[,2:4], 1, min))
or with pmap_dbl (plain pmap would return a list column rather than a numeric one):
df %>%
mutate(minimum = pmap_dbl(.[2:4], min))
Also with by_row from purrrlyr:
df %>%
purrrlyr::by_row(~min(.[2:4]), .collate = "rows", .to = "minimum")
Output:
# tibble [3 x 5]
w x y z minimum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 1 3 1
2 3 2 5 2 2
3 4 7 4 6 4
A vectorized option would be pmin. Convert the column names to symbols with syms and splice them in with !!! so that pmin is applied to the values of those columns.
library(dplyr)
df %>%
mutate(minimum = pmin(!!! rlang::syms(names(.)[2:4])))
# w x y z minimum
#1 2 1 1 3 1
#2 3 2 5 2 2
#3 4 7 4 6 4
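For comparison, the same vectorized idea can be written in base R by passing the selected columns to pmin through do.call. This is a sketch using the example data from the question; the names are dropped with unname() so that no column name can clash with an argument of pmin:

```r
w <- c(2, 3, 4)
x <- c(1, 2, 7)
y <- c(1, 5, 4)
z <- c(3, 2, 6)
df <- data.frame(w, x, y, z)

# Select columns 2 to 4 by position and take the element-wise minimum
cols <- unname(as.list(df[2:4]))
df$minimum <- do.call(pmin, cols)

df$minimum  # 1 2 4
```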
Here is a tidyeval approach along the lines of the suggestion from aosmith. If you don't know the column names, you can write a function that accepts the desired positions as input and finds the column names itself. Here, rlang::syms() takes the column names as strings and turns them into symbols, and !!! unquotes and splices the symbols into the function call.
library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)
rowwise_min <- function(df, min_cols){
cols <- df[, min_cols] %>% colnames %>% rlang::syms()
df %>%
rowwise %>%
mutate(minimum = min(!!!cols))
}
rowwise_min(df, 2:4)
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#>
#> # A tibble: 3 x 5
#> w x y z minimum
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 1 3 1
#> 2 3 2 5 2 2
#> 3 4 7 4 6 4
rowwise_min(df, c(1, 3))
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#>
#> # A tibble: 3 x 5
#> w x y z minimum
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 1 3 1
#> 2 3 2 5 2 3
#> 3 4 7 4 6 4
Created on 2018-09-04 by the reprex package (v0.2.0).

Mass changing columns of a data set to numeric

I've imported an Excel data set and want to set nearly all columns (more than 90) to numeric; they are initially character. What is the best way to achieve this, since importing and changing each to numeric one by one isn't the most efficient approach?
This should do as you wish:
# Random data frame for illustration (100 columns wide)
df <- data.frame(replicate(100,sample(0:1,1000,rep=TRUE)))
# Check column names / return column number (just in case you want to check)
colnames(df)
# Specify columns
cols <- seq_along(df) # adapts automatically if you add more columns at a later date
# Or, if you only want specific column numbers:
# cols <- c(1:100)
# With the help of the magrittr compound assignment pipe (%<>%), change all to numeric
library(magrittr)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
# Check our columns are numeric
str(df)
Assuming your data is already imported with all character columns, you can convert the relevant columns to numeric using mutate_at by position or name:
suppressPackageStartupMessages(library(tidyverse))
# Assume the imported excel file has 5 columns a to e
df <- tibble(a = as.character(1:3),
b = as.character(5:7),
c = as.character(8:10),
d = as.character(2:4),
e = as.character(2:4))
# select the columns by position (convert all except 'b')
df %>% mutate_at(c(1, 3:5), as.numeric)
#> # A tibble: 3 x 5
#> a b c d e
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 5 8 2 2
#> 2 2 6 9 3 3
#> 3 3 7 10 4 4
# or drop the columns that shouldn't be used ('b' and 'd' should stay as chr)
df %>% mutate_at(-c(2, 4), as.numeric)
#> # A tibble: 3 x 5
#> a b c d e
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 5 8 2 2
#> 2 2 6 9 3 3
#> 3 3 7 10 4 4
# select the columns by name
df %>% mutate_at(c("a", "c", "d", "e"), as.numeric)
#> # A tibble: 3 x 5
#> a b c d e
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 5 8 2 2
#> 2 2 6 9 3 3
#> 3 3 7 10 4 4
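In dplyr 1.0.0 and later, mutate_at is superseded by across() inside mutate(). The sketch below shows what the equivalent calls might look like, assuming a recent dplyr is installed; the small data frame is a made-up stand-in for the imported file:

```r
library(dplyr)

# Stand-in for an imported sheet where every column arrived as character
df <- data.frame(
  a = as.character(1:3),
  b = as.character(5:7),
  c = as.character(8:10),
  stringsAsFactors = FALSE
)

# Select the columns to convert by position
by_position <- df %>% mutate(across(c(1, 3), as.numeric))

# Or convert everything except the columns that should stay character
by_exclusion <- df %>% mutate(across(!b, as.numeric))

sapply(by_position, class)
```

Values that cannot be parsed as numbers become NA with a warning, so it can be worth checking for new NAs after the conversion.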
