How to control new variables' names after tidyr's spread?

I have a data frame with a panel structure: two observations per unit, one for each of two years:
library(tidyr)
mydf <- data.frame(
id = rep(1:3, rep(2,3)),
year = rep(c(2012, 2013), 3),
value = runif(6)
)
mydf
# id year value
#1 1 2012 0.09668064
#2 1 2013 0.62739399
#3 2 2012 0.45618433
#4 2 2013 0.60347152
#5 3 2012 0.84537624
#6 3 2013 0.33466030
I would like to reshape this data to wide format, which can be done easily with tidyr::spread. However, as the values of the year variable are numbers, the names of my new variables become numbers as well, which makes them harder to use later.
spread(mydf, year, value)
# id 2012 2013
#1 1 0.09668064 0.6273940
#2 2 0.45618433 0.6034715
#3 3 0.84537624 0.3346603
I know I can easily rename the columns. However, if I would like to reshape within a chain with other operations, it becomes inconvenient. E.g. the following line obviously does not make sense.
library(dplyr)
mydf %>% spread(year, value) %>% filter(2012 > 0.5)
The following works but is not that concise:
tmp <- spread(mydf, year, value)
names(tmp) <- c("id", "y2012", "y2013")
filter(tmp, y2012 > 0.5)
Any idea how I can change the new variable names within spread?

I know some years have passed since this question was originally asked, but for posterity I also want to highlight the sep argument of spread. When not NULL, it is used as a separator between the key name and the values:
mydf %>%
spread(key = year, value = value, sep = "")
# id year2012 year2013
#1 1 0.15608322 0.6886531
#2 2 0.04598124 0.0792947
#3 3 0.16835445 0.1744542
This is not exactly as wanted in the question, but sufficient for my purposes. See ?spread.
Update with tidyr 1.0.0: tidyr 1.0.0 introduced pivot_wider (and pivot_longer), which allow more control in this respect through the arguments names_sep and names_prefix. So now the call would be:
mydf %>%
pivot_wider(names_from = year, values_from = value,
names_prefix = "year")
# # A tibble: 3 x 3
# id year2012 year2013
# <int> <dbl> <dbl>
# 1 1 0.347 0.388
# 2 2 0.565 0.924
# 3 3 0.406 0.296
To get exactly what was originally wanted (prefixing "y" only) you can of course now get that directly by simply having names_prefix = "y".
names_sep is used when you spread over multiple columns, as demonstrated below, where I have added quarters to the data:
# Add quarters to data
mydf2 <- data.frame(
id = rep(1:3, each = 8),
year = rep(rep(c(2012, 2013), each = 4), 3),
quarter = rep(c("Q1","Q2","Q3","Q4"), 3),
value = runif(24)
)
head(mydf2)
# id year quarter value
# 1 1 2012 Q1 0.8651470
# 2 1 2012 Q2 0.3944423
# 3 1 2012 Q3 0.4580580
# 4 1 2012 Q4 0.2902604
# 5 1 2013 Q1 0.4751588
# 6 1 2013 Q2 0.6851755
mydf2 %>%
pivot_wider(names_from = c(year, quarter), values_from = value,
names_sep = "_", names_prefix = "y")
# # A tibble: 3 x 9
# id y2012_Q1 y2012_Q2 y2012_Q3 y2012_Q4 y2013_Q1 y2013_Q2 y2013_Q3 y2013_Q4
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.865 0.394 0.458 0.290 0.475 0.685 0.213 0.920
# 2 2 0.566 0.614 0.509 0.0515 0.974 0.916 0.681 0.509
# 3 3 0.968 0.615 0.670 0.748 0.723 0.996 0.247 0.449
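If a prefix and a separator are not flexible enough, pivot_wider() also accepts a names_glue template. The following is a minimal sketch, assuming tidyr 1.1.0 or later (where, as far as I know, names_glue was added) and rebuilding mydf2 with a fixed seed; the template string is only an illustration.

```r
library(tidyr)

set.seed(1)
mydf2 <- data.frame(
  id = rep(1:3, each = 8),
  year = rep(rep(c(2012, 2013), each = 4), 3),
  quarter = rep(c("Q1", "Q2", "Q3", "Q4"), 6),
  value = runif(24)
)

# Build each new column name from a glue template over the names_from columns
wide2 <- pivot_wider(
  mydf2,
  names_from = c(year, quarter),
  values_from = value,
  names_glue = "y{year}_{quarter}"
)

names(wide2)
```

The template gives full control over the order and decoration of the name parts, which names_prefix and names_sep alone do not.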

You can use backticks for column names starting with numbers and filter should work as expected
mydf %>%
spread(year, value) %>%
filter(`2012` > 0.5)
# id 2012 2013
#1 3 0.8453762 0.3346603
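When the year to filter on is only known at run time, backticks cannot be typed in by hand. A minimal sketch, assuming dplyr's .data pronoun; the helper variable yr and the fixed seed are not part of the original answer:

```r
library(dplyr)
library(tidyr)

set.seed(1)
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

yr <- "2012"  # column name held in a variable (hypothetical helper)

res <- mydf %>%
  spread(year, value) %>%
  # .data[[yr]] looks up the column whose name is stored in yr
  filter(.data[[yr]] > 0.5)

res
```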
Another option would be to use unite to join two columns into a single column, after creating a second column 'year1' containing the string 'y'.
mydf %>%
mutate(year1='y') %>%
unite(yearN, year1, year) %>%
spread(yearN, value) %>%
filter(y_2012 > 0.5)
# id y_2012 y_2013
#1 3 0.8453762 0.3346603
We can even change the 'year' column within mutate by using paste:
mydf %>%
mutate(year=paste('y', year, sep="_")) %>%
spread(year, value) %>%
filter(y_2012 > 0.5)

Another option is to use the setNames() function as the next thing in the pipe:
mydf %>%
spread(year, value) %>%
setNames( c("id", "y2012", "y2013") ) %>%
filter(y2012 > 0.5)
The only problem using setNames is that you have to know exactly what your columns will be when you spread() them. Most of the time, that's not a problem, particularly if you're working semi-interactively.
But if you're missing a key/value pair in your original data, there's a chance it won't show up as a column, and you can end up naming your columns incorrectly without even knowing it. Granted, setNames() will throw an error if the number of names doesn't match the number of columns, so you've got a bit of error checking built in.
Still, the convenience of using setNames() has outweighed the risk more often than not for me.
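To avoid hard-coding the full vector of names, one alternative is dplyr's rename_with(), which applies a function to the selected column names. This is a sketch assuming dplyr 1.0.0 or later; it prefixes every column except id, however many year columns spread() produces.

```r
library(dplyr)
library(tidyr)

set.seed(1)
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

res <- mydf %>%
  spread(year, value) %>%
  # prefix every column except id with "y"
  rename_with(~ paste0("y", .x), .cols = -id) %>%
  filter(y2012 > 0.5)

res
```

Because the names are derived from whatever columns exist, a missing year cannot shift the labels onto the wrong columns.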

Using spread()'s successor pivot_wider(), we can give a prefix to the created columns:
library(tidyr)
set.seed(1)
mydf <- data.frame(
id = rep(1:3, rep(2,3)),
year = rep(c(2012, 2013), 3),
value = runif(6)
)
pivot_wider(mydf, names_from = "year", values_from = "value", names_prefix = "y")
#> # A tibble: 3 x 3
#> id y2012 y2013
#> <int> <dbl> <dbl>
#> 1 1 0.266 0.372
#> 2 2 0.573 0.908
#> 3 3 0.202 0.898
Created on 2019-09-14 by the reprex package (v0.3.0)

rename() in dplyr should do the trick
library(tidyr); library(dplyr)
mydf %>%
spread(year,value)%>%
rename(y2012 = '2012',y2013 = '2013')%>%
filter(y2012>0.5)

Related

applying function to each group using dplyr and return specified dataframe

I used group_map for the first time and think I do it correctly. This is my code:
library(dplyr)
library(REAT) # provides gini()
df <- data.frame(value = c(1,1,1, 1,0.5,0.1, 0,0,0,1), group = c(1,1,1, 2,2,2, 3,3,3,3))
haves <- df %>%
group_by(group) %>%
group_map(~gini(.x$value, coefnorm = TRUE))
The thing is that haves is a list rather than a data frame. What would I have to do to obtain this df
wants <- data.frame(group = c(1,2,3), gini = c(0,0.5625,1))
group gini
1 0.0000
2 0.5625
3 1.0000
Thanks!

You can use dplyr::summarize:
df %>%
group_by(group) %>%
summarize(gini = gini(value, coefnorm = TRUE))
#> # A tibble: 3 x 2
#> group gini
#> <dbl> <dbl>
#> 1 1 0
#> 2 2 0.562
#> 3 3 1

According to the documentation, group_map always produces a list. group_modify is an alternative that produces a tibble if the function does, but gini just outputs a vector. So, you could do something like this...
df %>%
group_by(group) %>%
group_modify(~tibble(gini = gini(.x$value, coefnorm = TRUE)))
# A tibble: 3 x 2
# Groups: group [3]
group gini
<dbl> <dbl>
1 1 0
2 2 0.562
3 3 1
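If you want to keep group_map(), the list it returns can be recombined with the group keys. A sketch, using mean() as a stand-in for gini() so that it runs without REAT; the column name stat is an assumption:

```r
library(dplyr)

df <- data.frame(
  value = c(1, 1, 1, 1, 0.5, 0.1, 0, 0, 0, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

grouped <- group_by(df, group)

# group_map() returns one list element per group, in the order of group_keys()
stats <- group_map(grouped, ~ mean(.x$value))

# Put the keys and the per-group results side by side
res <- bind_cols(group_keys(grouped), tibble(stat = unlist(stats)))

res
```

Swapping mean(.x$value) for gini(.x$value, coefnorm = TRUE) should give the data frame asked for, since both return a single number per group.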

Using data.table
library(data.table)
setDT(df)[, .(gini = gini(value, coefnorm = TRUE)), group]
For grouped data, we can use the .data pronoun if we prefer not to refer to columns by bare (unquoted) names
library(dplyr)
df %>%
group_by(group) %>%
summarize(gini = gini(.data$value, coefnorm = TRUE))

Creating a list from an existing dataframe based on dplyr functions

I currently have a data frame similar to this one:
df <- tibble("Fam_Name" = c("Architecture", "Arts", "Business", "Managers", "Medicine", "Science"), "Code" = c(1,1,2, 2,3, 3), "Share_2002" = c(0.116, 3.442, 2.445, 1.932, 0.985, 0.321), "Share_2018" = c(0.161, 0.232, 1.234, 0.456, 0.089, 0.06))
I would like to create a list called family which contains three other lists: fam1, fam2, fam3
Each fam(i) list would contain two dataframes called fam_normal and fam_long which are constructed based on dplyr functions, for instance:
fam_normal <- df %>% # I am not sure how to write this so that it is incorporated into the fam(i) list
filter(Code == i) %>%
rename("2002" = Share_2002,
"2018" = Share_2018)
fam_long <- fam_normal %>%
gather(Year, Share, 3:4) %>%
arrange(Fam_Name)
The end goal is to plot a graph for each fam(i) in the fam list where there are Years on the x-axis and Shares on the y-axis.
My real dataset has 25 families and more years.

You could first rename the columns, use group_split to split the data by Code, and then use map to get a list of data frames.
library(tidyverse)
df %>%
rename("2002" = Share_2002,
"2018" = Share_2018) %>%
group_split(Code) %>%
map(~list(fam_normal = .x, fam_long = .x %>%
gather(Year, Share, 3:4) %>%
arrange(Fam_Name)))
#[[1]]
#[[1]]$fam_normal
# A tibble: 2 x 4
# Fam_Name Code `2002` `2018`
# <chr> <dbl> <dbl> <dbl>
#1 Architecture 1 0.116 0.161
#2 Arts 1 3.44 0.232
#[[1]]$fam_long
# A tibble: 4 x 4
# Fam_Name Code Year Share
# <chr> <dbl> <chr> <dbl>
#1 Architecture 1 2002 0.116
#2 Architecture 1 2018 0.161
#3 Arts 1 2002 3.44
#4 Arts 1 2018 0.232
#....
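group_split() returns an unnamed list. If the elements should be addressable as fam1, fam2 and fam3 as in the question, one option is base split(), which names the pieces by Code. A sketch, assuming tidyr 1.0.0 or later for pivot_longer(); the fam prefix follows the question's naming:

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(
  Fam_Name = c("Architecture", "Arts", "Business", "Managers", "Medicine", "Science"),
  Code = c(1, 1, 2, 2, 3, 3),
  Share_2002 = c(0.116, 3.442, 2.445, 1.932, 0.985, 0.321),
  Share_2018 = c(0.161, 0.232, 1.234, 0.456, 0.089, 0.06)
)

pieces <- split(df, df$Code)                   # named "1", "2", "3"
names(pieces) <- paste0("fam", names(pieces))  # "fam1", "fam2", "fam3"

family <- map(pieces, function(fam) {
  list(
    fam_normal = fam,
    # long format: one row per family name and year
    fam_long = fam %>%
      pivot_longer(starts_with("Share_"),
                   names_to = "Year", names_prefix = "Share_",
                   values_to = "Share") %>%
      arrange(Fam_Name)
  )
})

family$fam2$fam_long
```

Selecting the share columns with starts_with() rather than by position means additional years are picked up without changing the code.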

Here is a base R solution,
dd <- cbind.data.frame(df[1:2], stack(df[-c(1, 2)]))
Map(list, split(df, df$Code), split(dd, dd$Code))
which gives,
$`1`
$`1`[[1]]
# A tibble: 2 x 4
Fam_Name Code Share_2002 Share_2018
<chr> <dbl> <dbl> <dbl>
1 Architecture 1 0.116 0.161
2 Arts 1 3.44 0.232
$`1`[[2]]
Fam_Name Code values ind
1 Architecture 1 0.116 Share_2002
2 Arts 1 3.442 Share_2002
7 Architecture 1 0.161 Share_2018
8 Arts 1 0.232 Share_2018
....
NOTE: You can change column names as per usual

First, you can use purrr together with nested tibbles, which lets you define the sublists together:
library(tidyverse)
df2 <- df %>%
group_by(Code) %>%
nest(.key = fam_normal) %>%
mutate(fam_long = map(fam_normal, ~gather(.x, Year, Share, -Fam_Name) %>%
arrange(Fam_Name) %>%
mutate(Year = parse_number(Year)))) %>%
unnest(fam_long)
Then you can use ggplot2 to get the plots:
ggplot(df2, aes(x = Year, y = Share, color = Fam_Name)) +
geom_line(size = 2) +
facet_grid(Code~ .)

fam <- list()
fam$normal <- df %>%
filter(Code == i) %>%
rename("2002" = Share_2002,
"2018" = Share_2018)
fam$long <- fam$normal %>%
gather(Year, Share, 3:4) %>%
arrange(Fam_Name)
Now you have a named list fam containing your DFs. Your DFs are so custom that a dplyr solution may not be as legible as this simple assignment. I am a big fan of tidyverse-style coding but not when it gets in the way of clarity and legibility.
If you want to use this in a pipe, just create a function:
make_families <- function(df) {
# insert code above
# Return `fam`
fam
}
Then you're done: this will create the list of lists you describe.
df %>%
split(.$Code) %>% # split() needs a vector, not a bare column name
purrr::map(make_families)

R: dplyr and row_number() does not enumerate as expected

I want to enumerate each record of a data frame/tibble that results from a grouping. The index should follow a defined order. If I use row_number(), it enumerates, but within each group. I want it to enumerate without considering the former grouping.
Here is an example. To make it simple I used the most minimal dataframe:
library(dplyr)
df0 <- data.frame( x1 = rep(LETTERS[1:2],each=2)
, x2 = rep(letters[1:2], 2)
, y = floor(abs(rnorm(4)*10))
)
df0
# x1 x2 y
# 1 A a 12
# 2 A b 24
# 3 B a 0
# 4 B b 12
Now, I group this table:
df1 <- df0 %>% group_by(x1,x2) %>% summarize(y=sum(y))
This gives me an object of class tibble:
# A tibble: 4 x 3
# Groups: x1 [?]
# x1 x2 y
# <fct> <fct> <dbl>
# 1 A a 12
# 2 A b 24
# 3 B a 0
# 4 B b 12
I want to add a row number to this table using row_number():
df2 <- df1 %>% arrange(desc(y)) %>% mutate(index = row_number())
df2
# A tibble: 4 x 4
# Groups: x1 [2]
# x1 x2 y index
# <fct> <fct> <dbl> <int>
# 1 A b 24 1
# 2 A a 12 2
# 3 B b 12 1
# 4 B a 0 2
row_number() enumerates within the former grouping, which was not my intention. This can be avoided by converting the tibble to a data frame first:
df2 <- df2 %>% as.data.frame() %>% arrange(desc(y)) %>% mutate(index = row_number())
df2
# x1 x2 y index
# 1 A b 24 1
# 2 A a 12 2
# 3 B b 12 3
# 4 B a 0 4
My question is: is this behaviour intended?
If yes: is it not very dangerous to incorporate former data processing into tibble? Which type of processing is incorporated?
At the moment I will convert tibble into dataframe to avoid this kind of unexpected results.

To elaborate on my comment: yes, retaining grouping is intended, and in many cases useful. It's only dangerous if you don't understand how group_by works—and that's true of any function. To undo group_by, you call ungroup.
Take a look at the group_by docs, as they're very thorough and explain how this function interacts with others, how grouping is layered, etc. The docs also explain how each call to summarise removes a layer of grouping—it might be there that you got confused about what's going on.
For example, you can group by x1 and x2, summarize y, and create a row number, which will give you the rows according to x1 (summarise removed a layer of grouping, i.e. drops the x2 grouping). Then ungrouping allows you to get row numbers based on the entire data frame.
library(dplyr)
df0 %>%
group_by(x1, x2) %>%
summarise(y = sum(y)) %>%
mutate(group_row = row_number()) %>%
ungroup() %>%
mutate(all_df_row = row_number())
#> # A tibble: 4 x 5
#> x1 x2 y group_row all_df_row
#> <fct> <fct> <dbl> <int> <int>
#> 1 A a 12 1 1
#> 2 A b 2 2 2
#> 3 B a 10 1 3
#> 4 B b 23 2 4
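One way to check which grouping is still active at any point is group_vars(). A small sketch, with fixed y values instead of rnorm() so the result is reproducible, and with .groups = "drop_last" (dplyr 1.0.0 or later) spelling out the default behaviour described above:

```r
library(dplyr)

df0 <- data.frame(
  x1 = rep(LETTERS[1:2], each = 2),
  x2 = rep(letters[1:2], 2),
  y = c(12, 24, 0, 12)
)

df1 <- df0 %>%
  group_by(x1, x2) %>%
  # summarise peels off the last grouping variable (x2)
  summarise(y = sum(y), .groups = "drop_last")

group_vars(df1)           # grouping that remains after summarise
group_vars(ungroup(df1))  # no grouping left after ungroup
```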
A use case (I do this for work probably every day) is to get sums within multiple groups (again, x1 and x2), then to find the shares of those values within their larger group (after peeling away a layer of grouping, this is x1) with mutate. Again, I then ungroup to show the shares within the entire data frame instead.
df0 %>%
group_by(x1, x2) %>%
summarise(y = sum(y)) %>%
mutate(share_in_group = y / sum(y)) %>%
ungroup() %>%
mutate(share_all_df = y / sum(y))
#> # A tibble: 4 x 5
#> x1 x2 y share_in_group share_all_df
#> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 A a 12 0.857 0.255
#> 2 A b 2 0.143 0.0426
#> 3 B a 10 0.303 0.213
#> 4 B b 23 0.697 0.489
Created on 2018-10-11 by the reprex package (v0.2.1)

As camille nicely showed, there are good reasons for wanting to have the result of summarize() retain additional layers of grouping and it's a documented behaviour so not really dangerous or unexpected per se.
However one additional tip is that if you are just going to call ungroup() after summarize() you might as well use summarize(.groups = "drop") which will return an ungrouped tibble and save you a line of code.
library(tidyverse)
df0 <- data.frame(
x1 = rep(LETTERS[1:2], each = 2),
x2 = rep(letters[1:2], 2),
y = floor(abs(rnorm(4) * 10))
)
df0 %>%
group_by(x1,x2) %>%
summarize(y=sum(y), .groups = "drop") %>%
arrange(desc(y)) %>%
mutate(index = row_number())
#> # A tibble: 4 x 4
#> x1 x2 y index
#> <chr> <chr> <dbl> <int>
#> 1 A b 8 1
#> 2 A a 2 2
#> 3 B a 2 3
#> 4 B b 1 4
Created on 2022-02-06 by the reprex package (v2.0.1)

Calculating % of total within groups across each column and transposing

Is there a way to create the following output (assuming a lot of IDs and a lot more attributes)?
I am stuck after calculating the % of total by ATT1 within ID, then ATT2, and so on. I am not sure how to go about making the rows into column headers and aggregating.
Input File (df in R):
ID ATT1 ATT2 ATT3 ATT4 Value
1 a x d i 10
1 a y d j 10
1 a y d k 10
1 b y c k 10
1 b y c l 10
2 a x c k 20
…
And I want the output file to look like (ATT4_l is cut off):
ID ATT1_a ATT1_b ATT2_x ATT2_y ATT3_d ATT3_c ATT4_i ATT4_j ATT4_k
1 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.2 0.4
...
I tried using dplyr
df %>% group_by(ID, ATT1) %>% mutate(proc = (Value/sum(Value) * 100))
But I am not sure what to do once I have all the ATT calculated to get them into columns and aggregated so that each ID only has 1 row of data.

You can do this with the two main workhorses of the tidyverse: dplyr for calculations and tidyr for reshaping data. Some of the reshaping is convoluted so I'm breaking it into steps.
library(dplyr)
library(tidyr)
...
If you gather the data from its original wide format into a long format, you'll have a column of IDs, a column of ATTx values, a column of letters (don't know the context meaning of these, so I'm literally calling it letters), and a column of values. From this format, you can group observations by combinations of ID, ATT, and letter, and you can later stick ATTs and letters together in the way you've laid out.
df %>%
gather(key = att, value = letter, -ID, -Value) %>%
head()
#> # A tibble: 6 x 4
#> ID Value att letter
#> <int> <int> <chr> <chr>
#> 1 1 10 ATT1 a
#> 2 1 10 ATT1 a
#> 3 1 10 ATT1 a
#> 4 1 10 ATT1 b
#> 5 1 10 ATT1 b
#> 6 2 20 ATT1 a
After grouping, calculate total values for each ID/ATT/letter combo:
df %>%
gather(key = att, value = letter, -ID, -Value) %>%
group_by(ID, att, letter) %>%
summarise(group_val = sum(Value)) %>%
head()
#> # A tibble: 6 x 4
#> # Groups: ID, att [3]
#> ID att letter group_val
#> <int> <chr> <chr> <int>
#> 1 1 ATT1 a 30
#> 2 1 ATT1 b 20
#> 3 1 ATT2 x 10
#> 4 1 ATT2 y 40
#> 5 1 ATT3 c 20
#> 6 1 ATT3 d 30
Using mutate, you can calculate the share of each observation within its larger group. summarise has dropped one layer of the grouping hierarchy (letter), so the data is still grouped by ID and att, and sum(group_val) is the total for a given ID and ATT. Since you no longer need the total values, just their shares, drop that column, and stick the ATTs and letters back together with unite.
df %>%
gather(key = att, value = letter, -ID, -Value) %>%
group_by(ID, att, letter) %>%
summarise(group_val = sum(Value)) %>%
mutate(share = group_val / sum(group_val)) %>%
select(-group_val) %>%
unite(group, att, letter, sep = "_") %>%
head()
#> # A tibble: 6 x 3
#> # Groups: ID [1]
#> ID group share
#> <int> <chr> <dbl>
#> 1 1 ATT1_a 0.6
#> 2 1 ATT1_b 0.4
#> 3 1 ATT2_x 0.2
#> 4 1 ATT2_y 0.8
#> 5 1 ATT3_c 0.4
#> 6 1 ATT3_d 0.6
Now you have all the information you're looking for; you just need to get it into a wide format, turning the values in the group column into individual columns. You do this with spread:
df %>%
gather(key = att, value = letter, -ID, -Value) %>%
group_by(ID, att, letter) %>%
summarise(group_val = sum(Value)) %>%
mutate(share = group_val / sum(group_val)) %>%
select(-group_val) %>%
unite(group, att, letter, sep = "_") %>%
spread(key = group, value = share)
#> # A tibble: 2 x 11
#> # Groups: ID [2]
#> ID ATT1_a ATT1_b ATT2_x ATT2_y ATT3_c ATT3_d ATT4_i ATT4_j ATT4_k
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.6 0.4 0.2 0.8 0.4 0.6 0.2 0.2 0.4
#> 2 2 1 NA 1 NA 1 NA NA NA 1
#> # ... with 1 more variable: ATT4_l <dbl>
Note that there are NAs filled in here where there aren't observations for combinations of ID/ATT/letter. I'm assuming you'll have more complete data than in the sample you posted.
Created on 2018-10-03 by the reprex package (v0.2.1)
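gather() and spread() have since been superseded by pivot_longer() and pivot_wider(). The same steps can be sketched with them, here on only the six rows visible in the question (the elided rows are not reconstructed), assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later. values_fill puts 0 where an ID has no observations for a combination, which is an assumption about what a missing share should mean:

```r
library(dplyr)
library(tidyr)

# Only the rows shown in the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2),
  ATT1 = c("a", "a", "a", "b", "b", "a"),
  ATT2 = c("x", "y", "y", "y", "y", "x"),
  ATT3 = c("d", "d", "d", "c", "c", "c"),
  ATT4 = c("i", "j", "k", "k", "l", "k"),
  Value = c(10, 10, 10, 10, 10, 20)
)

wide <- df %>%
  pivot_longer(starts_with("ATT"), names_to = "att", values_to = "letter") %>%
  group_by(ID, att, letter) %>%
  # totals per ID/ATT/letter; the result stays grouped by ID and att
  summarise(group_val = sum(Value), .groups = "drop_last") %>%
  mutate(share = group_val / sum(group_val)) %>%
  ungroup() %>%
  select(-group_val) %>%
  pivot_wider(names_from = c(att, letter), values_from = share,
              names_sep = "_", values_fill = list(share = 0))

wide
```

Because pivot_wider() joins the att and letter columns itself via names_sep, the separate unite() step is no longer needed.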

I believe you are looking for the reshape2 package
library(reshape2)
df.new <- dcast(df,
formula = ID~ATT1,
value.var = "proc",
fun.aggregate = mean)
This will not completely fix your problem though - I recommend doing this first to make your data tidy
df.tidy <- melt(df,
id.vars = c("ID","Value"),
variable.name = "ATT1_4",
value.name = "att.factor")
df.tidy <- df.tidy %>% group_by(ID, att.factor) %>% mutate(proc = (Value/sum(Value)*100))
df.new <- dcast(df.tidy,
formula = ID~att.factor,
value.var = "proc",
fun.aggregate = mean)
NaN will be returned for any combination that isn't represented in df.tidy. You can use the fill argument to assign a value to those.

Creating column that is a proportion of two conditions

I have a data frame with about 50 variables, of which the ones in the example below are the most important. My aim is to create a table that includes various elements split by department and gender. The combination of dplyr, group_by and summarise gives me most of what I need, but I haven't been able to figure out how to get separate columns that show, for example, meanFemaleSalary/meanMaleSalary per department. I'm able to get the mean salary per gender per department in separate data frames, but either get an error or just a single value when I try to divide them.
I have tried searching the site and found what I believed was similar questions but couldn't get any of the answers to work. I'd be grateful if anyone could give me a hint on how to proceed…
Thanks!
Example:
library(dplyr)
x <- data.frame(Department = rep(c("Dep1", "Dep2", "Dep3"), times=2),
Gender = rep(c("F", "M"), times=3),
Salary = seq(10,15))
This is what I have that actually works so far:
Table <- x %>% group_by(Department, Gender) %>% summarise(Count = n(),
AverageSalary = mean(Salary, na.rm = T),
MedianSalary = median(Salary, na.rm = T))
I'd like two additional columns for AvgSalaryWomen/Men and MedianSalaryWomen/Men.
Again thanks!

If you want the new columns to be part of Table you could do something like this. But it will result in the value being repeated per department.
Table %>% group_by(Department) %>%
mutate(`AvgSalaryWomen/Men` = AverageSalary[Gender == "F"]/AverageSalary[Gender == "M"],
`MedianSalaryWomen/Men` = MedianSalary[Gender == "F"]/MedianSalary[Gender == "M"])
# Department Gender Count AverageSalary MedianSalary `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
# <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
# 1 Dep1 F 1 10. 10 0.769 0.769
# 2 Dep1 M 1 13. 13 0.769 0.769
# 3 Dep2 F 1 14. 14 1.27 1.27
# 4 Dep2 M 1 11. 11 1.27 1.27
# 5 Dep3 F 1 12. 12 0.800 0.800
# 6 Dep3 M 1 15. 15 0.800 0.800
If you want just one row per department simply change mutate to summarise and you'll get
# Department `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
# <fct> <dbl> <dbl>
# 1 Dep1 0.769 0.769
# 2 Dep2 1.27 1.27
# 3 Dep3 0.800 0.800

Here is an option to get this by spreading it to wide format
library(tidyverse)
x %>%
spread(Gender, Salary) %>%
group_by(Department) %>%
summarise(`AvgSalaryWomen/Men` = mean(F)/mean(M),
`MedianSalaryWomen/Men` = median(F)/median(M))
# A tibble: 3 x 3
# Department `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
# <fctr> <dbl> <dbl>
# 1 Dep1 0.769 0.769
# 2 Dep2 1.27 1.27
# 3 Dep3 0.800 0.800

If you want to end up with a table that has one row per department and includes all of the descriptive statistics you're computing along the way, you probably need to convert to long, unite some columns to use as a key, go back to wide, and then add your ratios. Something like...
Table <- x %>%
group_by(Department, Gender) %>%
summarise(Count = n(),
AverageSalary = mean(Salary, na.rm = TRUE),
MedianSalary = median(Salary, na.rm = TRUE)) %>%
# convert to long form
gather(Quantity, Value, -Department, -Gender) %>%
# create a unified gender/measure column to use as the key in the next step
unite(Set, Gender, Quantity) %>%
# go back to wide, now with repeating columns by gender
spread(Set, Value) %>%
# compute the department-level quantities you want using those new cols
mutate(AverageSalaryWomenMen = F_AverageSalary/M_AverageSalary,
MedianSalaryWomenMen = F_MedianSalary/M_MedianSalary)
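With pivot_wider() the long/unite/wide round trip can be shortened, since values_from accepts several columns at once and the new names are built as statistic_gender. A sketch assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for .groups); Count is left out for brevity and the ratio column names are illustrative:

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  Department = rep(c("Dep1", "Dep2", "Dep3"), times = 2),
  Gender = rep(c("F", "M"), times = 3),
  Salary = seq(10, 15)
)

Table <- x %>%
  group_by(Department, Gender) %>%
  summarise(AverageSalary = mean(Salary, na.rm = TRUE),
            MedianSalary = median(Salary, na.rm = TRUE),
            .groups = "drop") %>%
  # one row per department; columns such as AverageSalary_F, AverageSalary_M
  pivot_wider(names_from = Gender,
              values_from = c(AverageSalary, MedianSalary)) %>%
  mutate(AverageSalaryWomenMen = AverageSalary_F / AverageSalary_M,
         MedianSalaryWomenMen = MedianSalary_F / MedianSalary_M)

Table
```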
