R - dplyr - code to run a number of very similar queries...?

Consider the following:
library(dplyr)
df <- data.frame(
Name = c("Alan", "Bob", "Christine", "David", "Erica"),
Gender = c("M", "M", "F", "M", "F"),
Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
City = c("London", "Paris", "Berlin", "London", "Paris"),
Blood_Group = c("A", "AB", "B", "O", "A"),
Hours_Worked = c(2000, 1600, 0, 100, 200),
Salary = c(100000, 20000, 0, 500, 4000)
)
Name_Summary <- df %>% group_by(Name) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Gender_Summary <- df %>% group_by(Gender) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Star_Sign_Summary <- df %>% group_by(Star_Sign) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
City_Summary <- df %>% group_by(City) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Blood_Group_Summary <- df %>% group_by(Blood_Group) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Obviously this works fine for a small number of fields. If, however, I've got 100 different fields (say) to do this for, it becomes very unwieldy.
I'd like to think that there is a way to loop through the list of fields and produce a summary for each one, using code to generate (and name) the summaries, but I don't know how to do this. Can anyone help, please?
Thanks
Alan

If you have a list of the columns you want to group by as a character vector:
vars_to_group_by <- names(df)[1:5]
You could iterate over them (I'm using purrr::map(), but you could use lapply() or a loop) and use this rlang pattern: convert the strings to symbols with sym(), then inject them with !! so that group_by() treats them as column names.
library(tidyverse)
map(vars_to_group_by, sym) %>%
map(~ df %>%
group_by(!!.x) %>%
summarise(avg_salary = mean(Salary),
avg_hours = mean(Hours_Worked),
avg_hourly_wage = avg_salary / avg_hours))
You get an unnamed list back, because the vector going in was unnamed.
[[1]]
# A tibble: 5 x 4
Name avg_salary avg_hours avg_hourly_wage
<fct> <dbl> <dbl> <dbl>
1 Alan 100000 2000 50
2 Bob 20000 1600 12.5
3 Christine 0 0 NaN
4 David 500 100 5
5 Erica 4000 200 20
[[2]]
# A tibble: 2 x 4
Gender avg_salary avg_hours avg_hourly_wage
<fct> <dbl> <dbl> <dbl>
1 F 2000 100 20
2 M 40167. 1233. 32.6
[[3]]
# A tibble: 4 x 4
Star_Sign avg_salary avg_hours avg_hourly_wage
<fct> <dbl> <dbl> <dbl>
1 Aquarius 50000 1000 50
2 Capricorn 20000 1600 12.5
3 Leo 4000 200 20
4 Libra 500 100 5
[[4]]
# A tibble: 3 x 4
City avg_salary avg_hours avg_hourly_wage
<fct> <dbl> <dbl> <dbl>
1 Berlin 0 0 NaN
2 London 50250 1050 47.9
3 Paris 12000 900 13.3
[[5]]
# A tibble: 4 x 4
Blood_Group avg_salary avg_hours avg_hourly_wage
<fct> <dbl> <dbl> <dbl>
1 A 52000 1100 47.3
2 AB 20000 1600 12.5
3 B 0 0 NaN
4 O 500 100 5
You could add names based on vars_to_group_by either before or after the map() calls.
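A minimal sketch of naming the input before map(), assuming dplyr >= 1.0.0 (for the .groups argument) and the df from the question. It uses the question's own summaries (sum of Hours_Worked, mean of Salary) and names each list element <column>_Summary to mirror the object names in the question; set_names() is exported by purrr (and rlang).

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Name = c("Alan", "Bob", "Christine", "David", "Erica"),
  Gender = c("M", "M", "F", "M", "F"),
  Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
  City = c("London", "Paris", "Berlin", "London", "Paris"),
  Blood_Group = c("A", "AB", "B", "O", "A"),
  Hours_Worked = c(2000, 1600, 0, 100, 200),
  Salary = c(100000, 20000, 0, 500, 4000)
)

vars_to_group_by <- names(df)[1:5]

# Name the character vector first, so map() returns a named list
summaries <- vars_to_group_by %>%
  set_names(paste0(vars_to_group_by, "_Summary")) %>%
  map(~ df %>%
        group_by(!!rlang::sym(.x)) %>%
        summarise(Hours_Worked = sum(Hours_Worked),
                  Average_Salary = mean(Salary),
                  .groups = "drop"))

summaries$Gender_Summary
```

If you really want separate objects like Gender_Summary in your workspace, base R's list2env(summaries, envir = globalenv()) would create them, although keeping everything in one named list is usually easier to loop over later.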

We could use group_by_at(), which accepts column names as strings:
library(purrr)
library(dplyr)
map(names(df)[1:5], ~ df %>%
group_by_at(.x) %>%
summarise(avg_salary = mean(Salary)))
#[[1]]
# A tibble: 5 x 2
# Name avg_salary
# <fct> <dbl>
#1 Alan 100000
#2 Bob 20000
#3 Christine 0
#4 David 500
#5 Erica 4000
#[[2]]
# A tibble: 2 x 2
# Gender avg_salary
# <fct> <dbl>
#1 F 2000
#2 M 40167.
#[[3]]
# A tibble: 4 x 2
# Star_Sign avg_salary
# <fct> <dbl>
#1 Aquarius 50000
#2 Capricorn 20000
#3 Leo 4000
#4 Libra 500
#[[4]]
# A tibble: 3 x 2
# City avg_salary
# <fct> <dbl>
#1 Berlin 0
#2 London 50250
#3 Paris 12000
#[[5]]
# A tibble: 4 x 2
# Blood_Group avg_salary
# <fct> <dbl>
#1 A 52000
#2 AB 20000
#3 B 0
#4 O 500
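group_by_at() and the other scoped verbs are superseded in dplyr >= 1.0.0. A sketch of the same loop using the .data pronoun instead, assuming a current dplyr: .data[[col]] looks up the column whose name is stored in the string col.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Name = c("Alan", "Bob", "Christine", "David", "Erica"),
  Gender = c("M", "M", "F", "M", "F"),
  Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
  City = c("London", "Paris", "Berlin", "London", "Paris"),
  Blood_Group = c("A", "AB", "B", "O", "A"),
  Hours_Worked = c(2000, 1600, 0, 100, 200),
  Salary = c(100000, 20000, 0, 500, 4000)
)

# Only the five descriptive columns are used for grouping
group_cols <- names(df)[1:5]

res <- map(group_cols, function(col) {
  df %>%
    group_by(.data[[col]]) %>%          # column looked up by its name
    summarise(avg_salary = mean(Salary), .groups = "drop")
})

res[[2]]
```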

Related

R: Aggregate data in sliding window into new columns

Let's say I have a data frame like this:
df <- tibble(ID = c(1, 1, 1, 1, 1), v1 = c(3, 5, 1, 0, 1), v2 = c(10, 6, 1, 20, 23), Time = c(as.POSIXct("1900-01-01 10:00:00"), as.POSIXct("1900-01-01 11:00:00"), as.POSIXct("1900-01-01 13:00:00"), as.POSIXct("1900-01-01 16:00:00"), as.POSIXct("1900-01-01 20:00:00"))) %>% group_by(ID)
# A tibble: 5 x 4
# Groups: ID [1]
ID v1 v2 Time
<dbl> <dbl> <dbl> <dttm>
1 1 3 10 1900-01-01 10:00:00
2 1 5 6 1900-01-01 11:00:00
3 1 1 1 1900-01-01 13:00:00
4 1 0 20 1900-01-01 16:00:00
5 1 1 23 1900-01-01 20:00:00
In words, this is a simple timeseries of a specific ID with two values v1 and v2 per time.
As is quite common in machine learning, I want to aggregate the last n time steps into one feature vector. For every previous time step there should be a time reference, in hours, saying when that data point occurred. For the first row, where no previous time step is available, the data should be filled with zeros.
Let's make an example. In this case n = 2, that is, I want to aggregate the current time step (t2) and the previous one (t1):
# A tibble: 5 x 6
ID v1_t1 v2_t1 time_t1 v1_t2 v2_t2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 NA 3 10
2 1 3 10 1 5 6
3 1 5 6 2 1 1
4 1 1 1 3 0 20
5 1 0 20 4 1 23
I want to keep this as generic as possible, so that both n and the number of data columns can change. Any idea how to do this?
Thanks :)
Using dplyr::lag and dplyr::across you could do:
library(dplyr, warn=FALSE)
library(lubridate, warn=FALSE)
df %>%
group_by(ID) %>%
mutate(time_t1 = lubridate::hour(Time) - lag(lubridate::hour(Time))) %>%
mutate(across(c(v1, v2), .fns = list(t2 = ~.x, t1 = ~lag(.x, default = 0)))) %>%
select(-v1, -v2, -Time)
#> # A tibble: 5 × 6
#> # Groups: ID [1]
#> ID time_t1 v1_t2 v1_t1 v2_t2 v2_t1
#> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA 3 0 10 0
#> 2 1 1 5 3 6 10
#> 3 1 2 1 5 1 6
#> 4 1 3 0 1 20 1
#> 5 1 4 1 0 23 20
UPDATE: Here is a more generic approach that uses function factories to create lists of functions, which can then be passed to the .fns argument of across(). I haven't tested it for the more general case, but it should work for any n (the number of time steps to include) and for any number of data columns. Note that in this version the suffix t1 marks the current time step and t2 the previous one.
library(dplyr, warn=FALSE)
library(lubridate, warn=FALSE)
fun_factory1 <- function(n) {
function(x) {
lubridate::hour(x) - lag(lubridate::hour(x), n = n)
}
}
fun_factory2 <- function(n) {
function(x) {
lag(x, n = n, default = 0)
}
}
n <- 2
fns1 <- lapply(seq(n - 1), fun_factory1)
names(fns1) <- paste0("t", seq(n - 1))
fns2 <- lapply(seq(n) - 1, fun_factory2)
names(fns2) <- paste0("t", seq(n))
df %>%
group_by(ID) %>%
mutate(across(Time, .fns = fns1)) %>%
mutate(across(c(v1, v2), .fns = fns2)) %>%
select(-v1, -v2, -Time)
#> # A tibble: 5 × 6
#> # Groups: ID [1]
#> ID Time_t1 v1_t1 v1_t2 v2_t1 v2_t2
#> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA 3 0 10 0
#> 2 1 1 5 3 6 10
#> 3 1 2 1 5 1 6
#> 4 1 3 0 1 20 1
#> 5 1 4 1 0 23 20
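A variant sketch, not the answer's own code: the hour() differences above only work while all timestamps fall on the same day, so this version measures the gaps with difftime(), and it names the columns as in the question (t1 is the oldest step, t{n} the current row). make_window() is a hypothetical helper name; it assumes dplyr >= 1.0.0 and n >= 2.

```r
library(dplyr)

df <- tibble(
  ID = c(1, 1, 1, 1, 1),
  v1 = c(3, 5, 1, 0, 1),
  v2 = c(10, 6, 1, 20, 23),
  Time = as.POSIXct(c("1900-01-01 10:00:00", "1900-01-01 11:00:00",
                      "1900-01-01 13:00:00", "1900-01-01 16:00:00",
                      "1900-01-01 20:00:00"), tz = "UTC")
)

# Hypothetical helper: spread the last n steps of `cols` across columns.
make_window <- function(data, cols, n) {
  stopifnot(n >= 2)
  lags <- seq_len(n) - 1L            # 0 = current row, n - 1 = oldest step
  # value columns: lag k becomes suffix t{n - k}, missing history -> 0
  value_fns <- lapply(lags, function(k) {
    force(k)
    function(x) if (k == 0) x else dplyr::lag(x, n = k, default = 0)
  })
  names(value_fns) <- paste0("t", n - lags)
  # time columns: hours between the current row and each earlier step
  time_fns <- lapply(lags[-1], function(k) {
    force(k)
    function(x) as.numeric(difftime(x, dplyr::lag(x, n = k), units = "hours"))
  })
  names(time_fns) <- paste0("t", n - lags[-1])
  data %>%
    group_by(ID) %>%
    mutate(across(Time, time_fns, .names = "time_{.fn}"),
           across(all_of(cols), value_fns)) %>%
    ungroup() %>%
    select(-all_of(cols), -Time)
}

make_window(df, c("v1", "v2"), n = 2)
```

Because difftime() works on the full timestamps, gaps that cross midnight are measured correctly, and changing n or the cols vector is the only thing needed to widen the window.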

Convert rows into columns in R

I have this sample dataset and I want to convert it into the following format:
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)
Type Level Estimate
1 AGE 18-25 1.5
2 AGE 26-70 1.0
3 REGION London 2.0
4 REGION Southampton 3.0
5 REGION Newcastle 1.0
6 DRIVERS 1 2.0
7 DRIVERS 2 2.5
Basically, I would like to transform the dataset into the following format. I have tried the function dcast(), but it does not seem to work.
AGE Estimate_AGE REGION Estimate_REGION DRIVERS Estimate_DRIVERS
1 18-25 1.5 London 2 1 2.0
2 26-70 1.0 Southampton 3 2 2.5
3 <NA> NA Newcastle 1 <NA> NA
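library(dplyr)
library(tidyr)
df_before %>%
group_by(Type) %>%
# Estimate becomes character so Level and Estimate can share one value column
mutate(id = row_number(), Estimate = as.character(Estimate)) %>%
ungroup() %>%
pivot_longer(-c(Type, id)) %>%
pivot_wider(id_cols = id, names_from = c(Type, name)) %>%
type.convert(as.is = TRUE)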
# A tibble: 3 x 7
id AGE_Level AGE_Estimate REGION_Level REGION_Estimate DRIVERS_Level DRIVERS_Estimate
<int> <chr> <dbl> <chr> <int> <int> <dbl>
1 1 18-25 1.5 London 2 1 2
2 2 26-70 1 Southampton 3 2 2.5
3 3 NA NA Newcastle 1 NA NA
In data.table:
library(data.table)
setDT(df_before)
dcast(melt(df_before, 'Type'), rowid(Type, variable)~Type + variable)
Note that you will get a lot of warnings because of the type mismatch between Level (character) and Estimate (numeric). You could use reshape2::melt to avoid this.
Anyway, your data frame is not in a standard format.
In base R >= 4.1.0 (needed for the native pipe |>):
transform(df_before, id = ave(Estimate, Type, FUN = seq_along)) |>
reshape(v.names = c('Level', 'Estimate'), direction = 'wide', timevar = 'Type', sep = "_")
id Level_AGE Estimate_AGE Level_REGION Estimate_REGION Level_DRIVERS Estimate_DRIVERS
1 1 18-25 1.5 London 2 1 2.0
2 2 26-70 1.0 Southampton 3 2 2.5
5 3 <NA> NA Newcastle 1 <NA> NA
In base R < 4.1.0 (without the native pipe):
reshape(transform(df_before, id = ave(Estimate, Type, FUN = seq_along)),
v.names = c('Level', 'Estimate'), direction = 'wide', timevar = 'Type', sep = "_")
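The key ingredient in all of these answers is a running counter within each Type, which becomes the row id of the wide table (row_number() after group_by() in dplyr, rowid() in data.table). A small base R illustration of what the ave(..., FUN = seq_along) call produces, using only the Type vector from the question:

```r
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")

# ave() applies seq_along() separately within each Type group,
# so each group is numbered 1, 2, ... in its original row order
id <- ave(seq_along(Type), Type, FUN = seq_along)
id
#> [1] 1 2 1 2 3 1 2
```

REGION has three rows, so the wide result needs three rows, and the shorter AGE and DRIVERS groups are padded with NA.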
Update:
The exact output as the desired output:
df_before %>%
group_by(Type) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = Type,
values_from = c(Level, Estimate)
) %>%
select(AGE = Level_AGE, Estimate_AGE, REGION = Level_REGION,
Estimate_REGION, DRIVERS = Level_DRIVERS, Estimate_DRIVERS) %>%
type.convert(as.is=TRUE)
AGE Estimate_AGE REGION Estimate_REGION DRIVERS Estimate_DRIVERS
<chr> <dbl> <chr> <int> <int> <dbl>
1 18-25 1.5 London 2 1 2
2 26-70 1 Southampton 3 2 2.5
3 NA NA Newcastle 1 NA NA
First answer:
The key step is to group by Type, as in Onyambu's solution. After that, a single pivot_wider() call is enough:
library(dplyr)
library(tidyr)
df_before %>%
group_by(Type) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = Type,
values_from = c(Level, Estimate)
)
id Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION Estimate_DRIVERS
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1 18-25 London 1 1.5 2 2
2 2 26-70 Southampton 2 1 3 2.5
3 3 NA Newcastle NA NA 1 NA
We can try this:
library(tidyverse)
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5, 1, 2, 3, 1, 2, 2.5)
df_before <- data.frame(Type, Level, Estimate)
data <-
df_before %>% group_split(Type)
data <-
map2(
data, map(data, ~ unique(.$Type)),
~ mutate(., "{.y}" := Level, "Estimate_{.y}" := Estimate) %>%
select(-c("Type", "Level", "Estimate"))
)
# get the largest number of rows, so that the columns can be bound together
max_rows <- map_dbl(data, nrow) %>%
max()
# pad shorter tables with one NA row (enough here, because no group is more than one row short)
map_if(
data, ~ nrow(.) < max_rows,
~ rbind(., NA)
) %>%
bind_cols()
#> # A tibble: 3 × 6
#> AGE Estimate_AGE DRIVERS Estimate_DRIVERS REGION Estimate_REGION
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 18-25 1.5 1 2 London 2
#> 2 26-70 1 2 2.5 Southampton 3
#> 3 <NA> NA <NA> NA Newcastle 1
Created on 2021-12-07 by the reprex package (v2.0.1)
A solution based on tidyr::pivot_wider and purrr::map_dfc:
library(tidyverse)
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)
df_before %>%
pivot_wider(names_from=Type, values_from=c(Level, Estimate), values_fn=list) %>%
map_dfc(~ c(unlist(.x), rep(NA, max(table(df_before$Type))-length(unlist(.x)))))
#> # A tibble: 3 × 6
#> Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 18-25 London 1 1.5 2
#> 2 26-70 Southampton 2 1 3
#> 3 <NA> Newcastle <NA> NA 1
#> # … with 1 more variable: Estimate_DRIVERS <dbl>
Another solution, based on dplyr::group_split and purrr::map_dfc:
library(tidyverse)
df_before %>%
mutate(maxn = max(table(.$Type))) %>%
group_by(Type) %>% group_split() %>%
map_dfc(
~ data.frame(c(.x$Level, rep(NA, .x$maxn[1] - nrow(.x))),
c(.x$Estimate, rep(NA, .x$maxn[1] - nrow(.x)))) %>%
set_names(c(.x$Type[1], paste0("Estimate_", .x$Type[1])))) %>%
type.convert(as.is = TRUE)
#> AGE Estimate_AGE DRIVERS Estimate_DRIVERS REGION Estimate_REGION
#> 1 18-25 1.5 1 2.0 London 2
#> 2 26-70 1.0 2 2.5 Southampton 3
#> 3 <NA> NA NA NA Newcastle 1

Why does my dplyr percentile calculation not work with tidy evaluation?

I have a tibble with student test data, and I wish to convert these to percentiles using dplyr. For the sake of having a minimal example, imagine the following setup of three students.
require(tidyverse)
tbl <- tibble(Name = c("Alice", "Bob", "Cat"), Test = c(16, 13, 15))
The following code works and yields the desired output.
tbl %>% mutate(TestPercentile = cume_dist(Test) * 100)
# A tibble: 3 x 3
Name Test TestPercentile
<chr> <dbl> <dbl>
1 Alice 16 100
2 Bob 13 33.3
3 Cat 15 66.7
However, I actually want to do it programmatically because there are many such columns.
colname <- "Test"
percname <- str_c(colname, "Percentile")
tbl %>% mutate({{percname}} := cume_dist({{colname}}) * 100)
# A tibble: 3 x 3
Name Test TestPercentile
<chr> <dbl> <dbl>
1 Alice 16 100
2 Bob 13 100
3 Cat 15 100
Why does cume_dist make the percentile 100 for all students when I try to use tidy evaluation like this? (And ideally, if I can be permitted a second question, how can I fix it?)
If by programmatically you mean you want to write your own function, you can do it like this:
calculate_percentile <- function(data, colname) {
data %>%
mutate("{{colname}}Percentile" := cume_dist({{colname}}) * 100)
}
tbl %>%
calculate_percentile(Test)
# A tibble: 3 x 3
Name Test TestPercentile
<chr> <dbl> <dbl>
1 Alice 16 100
2 Bob 13 33.3
3 Cat 15 66.7
Edit for multiple columns
New Data
tbl <- tibble(Name = c("Alice", "Bob", "Cat"), Test = c(16, 13, 15), Test_math = c(16, 30, 55), Test_music = c(3, 78, 34))
calculate_percentile <- function(data, colnames) {
# colnames is a character vector; all_of() selects exactly those columns
data %>%
mutate(across(all_of(colnames), ~cume_dist(.) * 100, .names = "{.col}Percentile"))
}
test_columns <- c("Test_math", "Test_music")
tbl %>%
calculate_percentile(test_columns)
# A tibble: 3 x 6
Name Test Test_math Test_music Test_mathPercentile Test_musicPercentile
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alice 16 16 3 33.3 33.3
2 Bob 13 30 78 66.7 100
3 Cat 15 55 34 100 66.7
Why does your solution not work? Because it applies cume_dist() to the literal string "Test" rather than to the Test column:
tbl %>% mutate({{percname}} := print({{colname}}))
[1] "Test"
# A tibble: 3 x 5
Name Test Test_math Test_music TestPercentile
<chr> <dbl> <dbl> <dbl> <chr>
1 Alice 16 16 3 Test
2 Bob 13 30 78 Test
3 Cat 15 55 34 Test
Why does this give a TestPercentile value of 100? Because cume_dist() of a single string is 1, and mutate() recycles that value to every row:
cume_dist("test")
#[1] 1
So we need to tell R not to use the string "Test" itself, but to look up the column with that name, which we can do like this:
tbl %>% mutate({{percname}} := cume_dist(!!rlang::parse_quo(colname, env = rlang::global_env())) * 100)
# A tibble: 3 x 5
Name Test Test_math Test_music TestPercentile
<chr> <dbl> <dbl> <dbl> <dbl>
1 Alice 16 16 3 100
2 Bob 13 30 78 33.3
3 Cat 15 55 34 66.7
#Check that this uses the values of "Test" and not "Test" per se:
tbl %>% mutate({{percname}} := print(!!rlang::parse_quo(colname, env = rlang::global_env())))
[1] 16 13 15
# A tibble: 3 x 5
Name Test Test_math Test_music TestPercentile
<chr> <dbl> <dbl> <dbl> <dbl>
1 Alice 16 16 3 16
2 Bob 13 30 78 13
3 Cat 15 55 34 15
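A shorter way to do the same lookup, sketched under the assumption of dplyr >= 1.0.0: the .data pronoun finds the column named by a string, and a glue-style name on the left of := builds the new column name. add_percentile() is a hypothetical helper name.

```r
library(dplyr)

tbl <- tibble(Name = c("Alice", "Bob", "Cat"), Test = c(16, 13, 15))

# Hypothetical helper: `colname` is a string such as "Test".
add_percentile <- function(data, colname) {
  data %>%
    # .data[[colname]] is the column itself, not the string
    mutate("{colname}Percentile" := cume_dist(.data[[colname]]) * 100)
}

add_percentile(tbl, "Test")
```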
Passing the column name as a string:
library(dplyr)
library(rlang)
return_percentile <- function(data, colname) {
percname <- paste0(colname, "Percentile")
data %>% mutate({{percname}} := cume_dist(!!sym(colname)) * 100)
}
tbl %>% return_percentile("Test")
# A tibble: 3 x 3
# Name Test TestPercentile
# <chr> <dbl> <dbl>
#1 Alice 16 100
#2 Bob 13 33.3
#3 Cat 15 66.7
Passing the column name unquoted:
return_percentile <- function(data, colname) {
percname <- paste0(deparse(substitute(colname)), "Percentile")
data %>% mutate({{percname}} := cume_dist({{colname}}) * 100)
}
tbl %>% return_percentile(Test)
# A tibble: 3 x 3
# Name Test TestPercentile
# <chr> <dbl> <dbl>
#1 Alice 16 100
#2 Bob 13 33.3
#3 Cat 15 66.7

Map as.numeric to only specific columns of a dataframe

I have some data in the format below, where all columns are of type chr.
#> # A tibble: 3 x 4
#> id age name income
#> <chr> <chr> <chr> <chr>
#> 1 1 18 jim 100
#> 2 2 21 bob 200
#> 3 3 16 alice 300
I'd like to use as.numeric() on only some columns. Preferably, I'd like to define a vector of column names and then use purrr::map to map as.numeric() to only those columns:
numeric_variables <- c("id", "age", "income")
How can I map that?
My desired output would look like:
df
#> # A tibble: 3 x 4
#> id age name income
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 18 jim 100
#> 2 2 21 bob 200
#> 3 3 16 alice 300
Code for data entry below.
library(purrr)
df <- data.frame(stringsAsFactors=FALSE,
id = c(1, 2, 3),
age = c(18, 21, 16),
name = c("jim", "bob", "alice"),
income = c(100, 200, 300)
)
df <- map_df(df, as.character)
df
Created on 2020-02-15 by the reprex package (v0.3.0)
We can use mutate_at
library(dplyr)
df %>%
mutate_at(vars(all_of(numeric_variables)), as.numeric) %>%
as_tibble
# A tibble: 3 x 4
# id age name income
# <dbl> <dbl> <chr> <dbl>
#1 1 18 jim 100
#2 2 21 bob 200
#3 3 16 alice 300
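Or, more easily, let type.convert() guess the type of every column (note that it converts all columns it can, not only numeric_variables, and whole numbers become integer rather than double):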
df %>%
type.convert(as.is = TRUE)
Or with map
library(purrr)
df %>%
map_if(names(.) %in% numeric_variables, as.numeric) %>%
bind_cols
# A tibble: 3 x 4
# id age name income
# <dbl> <dbl> <chr> <dbl>
#1 1 18 jim 100
#2 2 21 bob 200
#3 3 16 alice 300
Or, with magrittr's compound assignment pipe (%<>%), the result can be assigned back to df in place. Only the first pipe in the chain should be %<>%:
library(magrittr)
df %<>%
map_if(names(.) %in% numeric_variables, as.numeric) %>%
bind_cols
str(df)
#tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
# $ id : num [1:3] 1 2 3
# $ age : num [1:3] 18 21 16
# $ name : chr [1:3] "jim" "bob" "alice"
# $ income: num [1:3] 100 200 300
You can use map_at
df[] <- purrr::map_at(df, numeric_variables, as.numeric)
df
# A tibble: 3 x 4
# id age name income
# <dbl> <dbl> <chr> <dbl>
#1 1 18 jim 100
#2 2 21 bob 200
#3 3 16 alice 300
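mutate_at() is superseded in dplyr >= 1.0.0. A sketch of the current equivalent, assuming dplyr >= 1.0.0: across() with all_of() applies as.numeric() to exactly the columns named in the character vector and leaves the others untouched.

```r
library(dplyr)

# Same data as in the question: every column stored as character
df <- tibble(
  id = c("1", "2", "3"),
  age = c("18", "21", "16"),
  name = c("jim", "bob", "alice"),
  income = c("100", "200", "300")
)
numeric_variables <- c("id", "age", "income")

# all_of() errors if a listed column is missing, which catches typos
df <- df %>% mutate(across(all_of(numeric_variables), as.numeric))
str(df)
```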

loop to multiply across columns

I have a data frame with columns labeled sales1, sales2, price1, price2 and I want to calculate revenues by multiplying sales1 * price1 and so-on across each number in an iterative fashion.
data <- tibble::tibble(
"sales1" = c(1, 2, 3),
"sales2" = c(2, 3, 4),
"price1" = c(3, 2, 2),
"price2" = c(3, 3, 5))
data
# A tibble: 3 x 4
# sales1 sales2 price1 price2
# <dbl> <dbl> <dbl> <dbl>
#1 1 2 3 3
#2 2 3 2 3
#3 3 4 2 5
Why doesn't the following code work?
data %>%
mutate (
for (i in seq_along(1:2)) {
paste0("revenue",i) = paste0("sales",i) * paste0("price",i)
}
)
Assuming your columns are already ordered (sales1, sales2, price1, price2), we can split the data frame into two parts and multiply them element-wise:
data[grep("sales", names(data))] * data[grep("price", names(data))]
# sales1 sales2
#1 3 6
#2 4 9
#3 6 20
If the columns are not already sorted according to their names, we can sort them by using order and then use above command.
data <- data[order(names(data))]
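If you would rather keep the loop from the question, here is a minimal sketch of a working version. The original fails because paste0() only returns strings, which can be neither multiplied nor assigned to, and mutate() expects name = value pairs rather than a for loop. Using the strings to index columns with [[ ]] avoids both problems; deriving the numbers from the sales column names is an assumption about how the columns are named.

```r
library(tibble)

data <- tibble(
  sales1 = c(1, 2, 3),
  sales2 = c(2, 3, 4),
  price1 = c(3, 2, 2),
  price2 = c(3, 3, 5)
)

# Suffixes "1", "2", ... taken from the salesN column names
ids <- sub("^sales", "", grep("^sales", names(data), value = TRUE))

for (i in ids) {
  # paste0() builds the column names; [[ ]] turns them into actual columns
  data[[paste0("revenue", i)]] <- data[[paste0("sales", i)]] * data[[paste0("price", i)]]
}
data
```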
This answer is not brief. For that, @RonakShah's existing answer is the one to look at!
My response is intended to address a broader concern: why this is hard to do in the tidyverse. As I understand it, it is difficult because the data is not in a "tidy" format. Instead, you can create a tidy data frame like so:
library(tidyverse)
tidy_df <- data %>%
rownames_to_column() %>%
gather(key, value, -rowname) %>%
extract(key, c("variable", "id"), "([a-z]+)([0-9]+)") %>%
spread(variable, value)
Which then makes the final calculation straightforward
tidy_df %>% mutate(revenue = sales * price)
#> # A tibble: 6 x 5
#> rowname id price sales revenue
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 1 3 1 3
#> 2 1 2 3 2 6
#> 3 2 1 2 2 4
#> 4 2 2 3 3 9
#> 5 3 1 2 3 6
#> 6 3 2 5 4 20
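gather() and spread() are superseded in tidyr >= 1.0.0. A sketch of the same reshaping with pivot_longer(), assuming tidyr >= 1.0.0: the ".value" entry in names_to says that the letters in each column name (sales, price) become output columns, while the digits go into id. Row order may differ from the gather()/spread() version above.

```r
library(dplyr)
library(tidyr)
library(tibble)

data <- tibble(
  sales1 = c(1, 2, 3),
  sales2 = c(2, 3, 4),
  price1 = c(3, 2, 2),
  price2 = c(3, 3, 5)
)

tidy_df <- data %>%
  rowid_to_column() %>%                         # keep track of the original row
  pivot_longer(-rowid,
               names_to = c(".value", "id"),    # "sales1" -> sales column, id "1"
               names_pattern = "([a-z]+)([0-9]+)")

tidy_df %>% mutate(revenue = sales * price)
```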
If you need the data back in the original wide format, you can do that too, although it feels clunky to me (I'm sure it can be improved somehow):
tidy_df %>% mutate(revenue = sales * price) %>%
gather(key, value, -c(rowname, id)) %>%
unite(key, key, id, sep = "") %>%
spread(key, value) %>%
select(starts_with("price"),
starts_with("sales"),
starts_with("revenue"))
#> # A tibble: 3 x 6
#> price1 price2 sales1 sales2 revenue1 revenue2
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3 3 1 2 3 6
#> 2 2 3 2 3 4 9
#> 3 2 5 3 4 6 20
