Summarising duplicates in dataframe in R [duplicate] - r

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 2 years ago.
I have a dataframe with the following data:
#sample data
Date <- c( "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02", "2020-01-02")
Salesperson <-c ( "Sales1", "Sales1", "Sales1", "Sales2", "Sales2", "Sales1", "Sales1", "Sales2", "Sales2" )
Clothing <-c ( "5", "2", "8", "3", "3", "4", "7", "3", "4" )
Electronics <-c ( "6", "9", "1", "2", "1", "2", "2", "1", "2" )
data<-data.frame(Date,Salesperson,Clothing,Electronics, stringsAsFactors = FALSE)
data$Date<-as.Date(data$Date,"%Y-%m-%d")
There are rows in the df where a salesperson has recorded their sales multiple times for the same date rather than adding them up.
The result I want is shown by the dataframe below:
Date <- c ( "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02" )
Salesperson <- c ( "Sales1", "Sales2", "Sales1", "Sales2")
Clothing <- c ( "15", "6", "11", "7" )
Electronics <- c ( "16", "3", "4", "3" )
data1<-data.frame(Date,Salesperson,Clothing,Electronics, stringsAsFactors = FALSE)
Does anyone know how to achieve this result?

To summarise your data, you need the numbers to be passed as numbers, not strings. Note that I wrapped your Clothing and Electronics vectors in as.numeric():
Clothing <- as.numeric(c("5", "2", "8", "3", "3", "4", "7", "3", "4"))
Electronics <- as.numeric(c("6", "9", "1", "2", "1", "2", "2", "1", "2"))
Now, to summarise using the sum, try:
library(dplyr)

data %>%
  group_by(Date, Salesperson) %>%
  summarise(sum_cloth = sum(Clothing), sum_elec = sum(Electronics))
# Groups:   Date [2]
  Date       Salesperson sum_cloth sum_elec
  <chr>      <chr>           <dbl>    <dbl>
1 2020-01-01 Sales1             15       16
2 2020-01-01 Sales2              6        3
3 2020-01-02 Sales1             11        4
4 2020-01-02 Sales2              7        3
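Alternatively, here is a minimal sketch that does the conversion inside the pipeline (assuming dplyr 1.0+ for across()), so the original character vectors stay untouched:
library(dplyr)

data %>%
  # convert the character columns to numeric on the fly
  mutate(across(c(Clothing, Electronics), as.numeric)) %>%
  group_by(Date, Salesperson) %>%
  summarise(across(c(Clothing, Electronics), sum), .groups = "drop")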

Related

Merging two matrices with merge.Matrices does not return the desired output

I have two matrices provided below:
cf = structure(c("7", "7", "7", "7", "7", "7", "7", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3",
"3", "3", "3", "3", "3", "3", "3", "3", "3", "17", "18", "19",
"20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20", "21", "22", "23", "0", "1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "0", "1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"), .Dim = c(71L,
2L), .Dimnames = list(NULL, c("d", "h")))
hour_df<-data.frame(
day = as.character(rep(c(1,2,3,4,5,6,7), each = 24)),
hours = as.character(rep(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), times = 7)),
period = rep(c(rep("night",times = 8),rep("day",times = 12),rep("night",times = 4)), times = 7),
tariff_label = rep(c(rep("special feed", times = 8),rep("normal feed", times = 12),rep("special feed", times = 4)), times = 7),
week_period = c(rep("weekend",times = 32),rep("weekday",times = 108),rep("weekend",times = 28))
)
hour_df$tariff_label[hour_df$day %in% c("7","1")]<-"special feed"
hour_df<-as.matrix(hour_df)
I want to merge these matrices on two common columns in each matrix, e.g. by.x = c("d","h"), by.y = c("day","hours").
If I use the base function merge() I get my desired output, which looks like this:
merge(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h period tariff_label week_period
1 1 0 night special feed weekend
2 1 1 night special feed weekend
3 1 10 day special feed weekend
4 1 11 day special feed weekend
5 1 12 day special feed weekend
6 1 13 day special feed weekend
7 1 14 day special feed weekend
8 1 15 day special feed weekend
9 1 16 day special feed weekend
10 1 17 day special feed weekend
11 1 18 day special feed weekend
12 1 19 day special feed weekend
13 1 2 night special feed weekend
14 1 20 night special feed weekend
15 1 21 night special feed weekend
16 1 22 night special feed weekend
17 1 23 night special feed weekend
18 1 3 night special feed weekend
19 1 4 night special feed weekend
20 1 5 night special feed weekend
21 1 6 night special feed weekend
22 1 7 night special feed weekend
23 1 8 day special feed weekend
24 1 9 day special feed weekend
25 2 0 night special feed weekend
26 2 1 night special feed weekend
27 2 10 day normal feed weekday
28 2 11 day normal feed weekday
29 2 12 day normal feed weekday
30 2 13 day normal feed weekday
31 2 14 day normal feed weekday
32 2 15 day normal feed weekday
33 2 16 day normal feed weekday
34 2 17 day normal feed weekday
35 2 18 day normal feed weekday
36 2 19 day normal feed weekday
37 2 2 night special feed weekend
38 2 20 night special feed weekday
39 2 21 night special feed weekday
40 2 22 night special feed weekday
41 2 23 night special feed weekday
42 2 3 night special feed weekend
43 2 4 night special feed weekend
44 2 5 night special feed weekend
45 2 6 night special feed weekend
46 2 7 night special feed weekend
47 2 8 day normal feed weekday
48 2 9 day normal feed weekday
49 3 0 night special feed weekday
50 3 1 night special feed weekday
51 3 10 day normal feed weekday
52 3 11 day normal feed weekday
53 3 12 day normal feed weekday
54 3 13 day normal feed weekday
55 3 14 day normal feed weekday
56 3 15 day normal feed weekday
57 3 2 night special feed weekday
58 3 3 night special feed weekday
59 3 4 night special feed weekday
60 3 5 night special feed weekday
61 3 6 night special feed weekday
62 3 7 night special feed weekday
63 3 8 day normal feed weekday
64 3 9 day normal feed weekday
65 7 17 day special feed weekend
66 7 18 day special feed weekend
67 7 19 day special feed weekend
68 7 20 night special feed weekend
69 7 21 night special feed weekend
70 7 22 night special feed weekend
71 7 23 night special feed weekend
As you can see above, I have 71 rows. I wanted to see if there is a faster function for merging matrices. I saw online that there is a function called merge.Matrix() that should be faster than base merge(). However, when I tried to implement it, I got a completely different result.
library(Matrix.utils)
merge.Matrix(cf,hour_df, by.x = c("d","h"), by.y = c("day","hours"))
d h day hours period tariff_label week_period
"7" "17" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "0" "night" "special feed" "weekend"
"7" "18" "1" "2" "night" "special feed" "weekend"
"7" "19" "1" "1" "night" "special feed" "weekend"
I tried to find out online how it is used, but information on this function seems to be scarce. I also checked the vignette. Can someone tell me what I am doing wrong, or whether there is a better function than this?
Please Note
I am already aware of dplyr joins and data.table. It is important that both matrices stay matrices and are not converted into some other format. In reality, my code performs this join over a list containing thousands of matrices and therefore needs to be quick.
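For what it is worth, here is a minimal base-R sketch (not the Matrix.utils API) that keeps both objects as plain character matrices, assuming every (d, h) pair in cf has exactly one match in hour_df:
# Build a composite key for each matrix and look the rows up with match().
key_cf  <- paste(cf[, "d"], cf[, "h"], sep = "_")
key_ref <- paste(hour_df[, "day"], hour_df[, "hours"], sep = "_")
idx <- match(key_cf, key_ref)

# The result is still a character matrix; rows follow the order of cf rather
# than the sorted order that base merge() produces.
merged <- cbind(cf, hour_df[idx, c("period", "tariff_label", "week_period"), drop = FALSE])
dim(merged)   # 71 x 5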

How to extract records of patients who got admitted before discharge from another hospital

I am analyzing data of patient admission/discharge in a number of hospitals for various inconsistencies.
My data structure is as follows:
row_id : a unique identifier of records (used as a foreign key in some other table)
patient_id : unique identifier key for a patient
pack_id : the medical package chosen by the patient for treatment
hosp_id : unique identifier for a hospital
admn_date : the date of admission
discharge_date : the date of discharge of the patient
Snapshot of data
row_id patient_id pack_id hosp_id admn_date discharge_date
1 1 12 1 01-01-2020 14-01-2020
2 1 62 2 03-01-2020 15-01-2020
3 1 77 1 16-01-2020 27-01-2020
4 1 86 1 18-01-2020 19-01-2020
5 1 20 2 22-01-2020 25-01-2020
6 2 55 3 01-01-2020 14-01-2020
7 2 86 3 03-01-2020 17-01-2020
8 2 72 4 16-01-2020 27-01-2020
9 1 7 1 26-01-2020 30-01-2020
10 3 54 5 14-01-2020 22-01-2020
11 3 75 5 09-02-2020 17-02-2020
12 3 26 6 22-01-2020 05-02-2020
13 4 21 7 14-04-2020 23-04-2020
14 4 12 7 23-04-2020 29-04-2020
15 5 49 8 17-03-2020 26-03-2020
16 5 35 9 27-02-2020 07-03-2020
17 6 51 10 12-04-2020 15-04-2020
18 7 31 11 11-02-2020 17-02-2020
19 8 10 12 07-03-2020 08-03-2020
20 8 54 13 20-03-2020 23-03-2020
A sample dput of the data is given below:
df <- structure(list(row_id = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20"), patient_id = c("1", "1", "1", "1", "1", "2", "2",
"2", "1", "3", "3", "3", "4", "4", "5", "5", "6", "7", "8", "8"
), pack_id = c("12", "62", "77", "86", "20", "55", "86", "72",
"7", "54", "75", "26", "21", "12", "49", "35", "51", "31", "10",
"54"), hosp_id = c("1", "2", "1", "1", "2", "3", "3", "4", "1",
"5", "5", "6", "7", "7", "8", "9", "10", "11", "12", "13"), admn_date = structure(c(18262,
18264, 18277, 18279, 18283, 18262, 18264, 18277, 18287, 18275,
18301, 18283, 18366, 18375, 18338, 18319, 18364, 18303, 18328,
18341), class = "Date"), discharge_date = structure(c(18275,
18276, 18288, 18280, 18286, 18275, 18278, 18288, 18291, 18283,
18309, 18297, 18375, 18381, 18347, 18328, 18367, 18309, 18329,
18344), class = "Date")), row.names = c(NA, -20L), class = "data.frame")
I have to identify the records where a patient got admitted without being discharged from a previous treatment. For this I used the following code, taking help from this thread: How to know customers who placed next order before delivery/receiving of earlier order? In R -
library(tidyverse)

df %>%
  arrange(patient_id, admn_date, discharge_date) %>%
  mutate(sort_key = row_number()) %>%
  pivot_longer(c(admn_date, discharge_date), names_to = "activity",
               values_to = "date", names_pattern = "(.*)_date") %>%
  mutate(activity = factor(activity, ordered = TRUE,
                           levels = c("admn", "discharge")),
         admitted = ifelse(activity == "admn", 1, -1)) %>%
  group_by(patient_id) %>%
  arrange(date, sort_key, activity, .by_group = TRUE) %>%
  mutate(admitted = cumsum(admitted)) %>%
  ungroup() %>%
  filter(admitted > 1, activity == "admn")
This nicely gives me all the records where patients got admitted without being discharged from a previous treatment.
Output-
# A tibble: 6 x 8
row_id patient_id pack_id hosp_id sort_key activity date admitted
<chr> <chr> <chr> <chr> <int> <ord> <date> <dbl>
1 2 1 62 2 2 admn 2020-01-03 2
2 4 1 86 1 4 admn 2020-01-18 2
3 5 1 20 2 5 admn 2020-01-22 2
4 9 1 7 1 6 admn 2020-01-26 2
5 7 2 86 3 8 admn 2020-01-03 2
6 8 2 72 4 9 admn 2020-01-16 2
Explanation-
Row_id 2 is correct because it overlaps with row_id 1
Row_id 4 is correct because it overlaps with row_id 3
Row_id 5 is correct because it overlaps with row_id 3 (again)
Row_id 9 is correct because it overlaps with row_id 3 (again)
Row_id 7 is correct because it overlaps with row_id 6
Row_id 8 is correct because it overlaps with row_id 7
Now I am stuck on a given validation rule: patients are allowed to take admission in the same hospital any number of times without validating their previous discharge. In other words, I have to extract only those records where a patient got admitted to a different hospital without being discharged from another hospital. If the hospital were always the same, a group_by on the hosp_id field could have done the work for me, but here the case is the reverse: for the same hosp_id it is allowed, but for a different one it is not.
Please advise how I may proceed.
If I could map each resultant row_id to its overlapping record's row_id, maybe we could solve the problem.
Desired Output-
row_id
2
5
8
because row_ids 4, 9 and 7 overlap with records having the same hospital id.
Thanks in advance.
P.S. Though a desired solution has been given, I would like to know whether it can be done through the map/apply family of functions and/or the data.table package.
Is this what you're looking for? (Refer to the comments in the code for details. I can provide clarifications if necessary.)
#Your data
df <- structure(list(row_id = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20"), patient_id = c("1", "1", "1", "1", "1", "2", "2",
"2", "1", "3", "3", "3", "4", "4", "5", "5", "6", "7", "8", "8"
), pack_id = c("12", "62", "77", "86", "20", "55", "86", "72",
"7", "54", "75", "26", "21", "12", "49", "35", "51", "31", "10",
"54"), hosp_id = c("1", "2", "1", "1", "2", "3", "3", "4", "1",
"5", "5", "6", "7", "7", "8", "9", "10", "11", "12", "13"), admn_date = structure(c(18262,
18264, 18277, 18279, 18283, 18262, 18264, 18277, 18287, 18275,
18301, 18283, 18366, 18375, 18338, 18319, 18364, 18303, 18328,
18341), class = "Date"), discharge_date = structure(c(18275,
18276, 18288, 18280, 18286, 18275, 18278, 18288, 18291, 18283,
18309, 18297, 18375, 18381, 18347, 18328, 18367, 18309, 18329,
18344), class = "Date")), row.names = c(NA, -20L), class = "data.frame")
#Solution
library(dplyr)
library(tidyr)
library(stringr)
library(magrittr)
library(lubridate)
#Convert patient_id column into numeric
df$patient_id <- as.numeric(df$patient_id)
#Create empty (well, 1 row) data.frame to
#collect output data
#This needs three additional columns
#(as indicated)
outdat <- data.frame(matrix(nrow = 1, ncol = 9), stringsAsFactors = FALSE)
names(outdat) <- c(names(df), "ref_discharge_date", "ref_hosp_id", "overlap")
#Logic:
#For each unique patient_id take all
#their records.
#For each row of each such set of records
#compare its discharge_date with the admn_date
#of all other records with admn_date >= its own
#admn_date
#Then register the time interval between this row's
#discharge_date and the compared row's admn_date
#as a numeric value ("overlap")
#The idea is that concurrent hospital stays will have
#negative overlaps as the admn_date (of the current stay)
#will precede the discharge_date (of the previous one)
for(i in 1:length(unique(df$patient_id))){
  #i <- 7
  curdat <- df %>% filter(patient_id == unique(df$patient_id)[i])
  curdat %<>% mutate(admn_date = lubridate::as_date(admn_date),
                     discharge_date = lubridate::as_date(discharge_date))
  curdat %<>% arrange(admn_date)
  for(j in 1:nrow(curdat)){
    #j <- 1
    currow <- curdat[j, ]
    #otrows <- curdat[-j, ]
    #
    otrows <- curdat %>% filter(admn_date >= currow$admn_date)
    #otrows <- curdat
    for(k in 1:nrow(otrows)){
      otrows$ref_discharge_date[k] <- currow$discharge_date
      #otrows$refdisc[k] <- as_date(otrows$refdisc[k])
      otrows$ref_hosp_id[k] <- currow$hosp_id
      otrows$overlap[k] <- as.numeric(difftime(otrows$admn_date[k], currow$discharge_date))
    }
    otrows$ref_discharge_date <- as_date(otrows$ref_discharge_date)
    outdat <- bind_rows(outdat, otrows)
  }
}
rm(curdat, i, j, k, otrows, currow)
#Removing that NA row + removing all self-rows
outdat %<>%
  filter(!is.na(patient_id)) %>%
  filter(discharge_date != ref_discharge_date)
#Filter out only negative overlaps
outdat %<>% filter(overlap < 0)
#Filter out only those records where the patient
#was admitted to different hospitals
outdat %<>% filter(hosp_id != ref_hosp_id)
outdat
# row_id patient_id pack_id hosp_id admn_date discharge_date ref_discharge_date ref_hosp_id overlap
# 1 2 1 62 2 2020-01-03 2020-01-15 2020-01-14 1 -11
# 2 5 1 20 2 2020-01-22 2020-01-25 2020-01-27 1 -5
# 3 8 2 72 4 2020-01-16 2020-01-27 2020-01-17 3 -1
Group by the patient id again and then count the hospital IDs. Then merge that back on and filter the data.
Something like:
admitted_not_validated %>%
  left_join(
    admitted_not_validated %>%
      group_by(patient_id) %>%
      summarize(multi_hosp = length(unique(hosp_id)), .groups = 'drop'),
    by = 'patient_id') %>%
  filter(multi_hosp > 1)
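Regarding the P.S. about data.table: below is a minimal sketch (the column name overlap_other_hosp is illustrative, not from the original post) that flags an admission whenever an earlier stay of the same patient at a different hospital was still running:
library(data.table)

dt <- as.data.table(df)
setorder(dt, patient_id, admn_date, discharge_date)

# For each admission, check whether any earlier admission of the same patient
# at a *different* hospital has a discharge_date after this admn_date.
dt[, overlap_other_hosp := sapply(seq_len(.N), function(i) {
     prev <- seq_len(i - 1L)
     any(admn_date[i] < discharge_date[prev] & hosp_id[i] != hosp_id[prev])
   }),
   by = patient_id]

dt[overlap_other_hosp == TRUE, row_id]
# For the sample data this should return "2" "5" "8"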

Pivot from long format to wide format in a dataframe [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
pivot_wider when there's no names column (or when names column should be created)
(2 answers)
Closed 2 years ago.
I have the dataframe below:
dput(Moment[1:15,])
structure(list(SectionCut = c("1", "1", "1", "1", "2", "2", "2",
"2", "3", "3", "3", "3", "Left", "Left", "Left"), N_l = c("1",
"2", "3", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "2",
"3"), UG = c("84", "84", "84", "84", "84", "84", "84", "84",
"84", "84", "84", "84", "84", "84", "84"), S = c("12", "12",
"12", "12", "12", "12", "12", "12", "12", "12", "12", "12", "12",
"12", "12"), Sample = c("S00", "S00", "S00", "S00", "S00", "S00",
"S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00"
), DF = c(0.367164093630677, 0.540130283330855, 0.590662743113521,
0.497030982705986, 0.000319303760901125, 0.000504925126205843,
0.00051127115578891, 0.000395434233037301, 0.413218926236695,
0.610726262711904, 0.685000816613652, 0.59474035159783, 0.483354599644366,
0.645710184115934, 0.625883097885242)), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
I want to separate the content of the column by pivoting on the SectionCut column. I basically want the opposite of pivot_longer, so that at the end the values in column DF are shown under 5 different columns (the values of SectionCut, i.e. "1", "2", "3", "Left", "Right").
We could use pivot_wider from tidyr after creating a sequence column with rowid() from data.table:
library(dplyr)
library(tidyr)
library(data.table)

Moment %>%
  mutate(rn = rowid(SectionCut)) %>%
  pivot_wider(names_from = SectionCut, values_from = DF)
-output
# A tibble: 4 x 9
# N_l UG S Sample rn `1` `2` `3` Left
# <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#1 1 84 12 S00 1 0.367 0.000319 0.413 0.483
#2 2 84 12 S00 2 0.540 0.000505 0.611 0.646
#3 3 84 12 S00 3 0.591 0.000511 0.685 0.626
#4 4 84 12 S00 4 0.497 0.000395 0.595 NA
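If you prefer to stay entirely in data.table, a roughly equivalent sketch (using the same rowid() sequence trick, assuming Moment as in the dput above) is:
library(data.table)

dcast(as.data.table(Moment)[, rn := rowid(SectionCut)],
      N_l + UG + S + Sample + rn ~ SectionCut,
      value.var = "DF")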

How to filter rows with multiple conditions?

I have a data set as I've shown below.
df <- tribble(
~shop_id, ~id, ~key, ~date, ~status,
"1", "10", "abc", '2020-05-04', 'good',
"1", "10", "def", '2020-05-03', 'normal',
"1", "10", "glm", '2020-05-03', 'bad',
"1", "20", "ksr", '2020-05-01', 'bad',
"1", "20", "tyz", '2020-05-02', 'bad',
"2", "20", "uyv", '2020-05-01', 'good',
"2", "20", "mys", '2020-05-01', 'normal',
"2", "30", "ert", '2020-05-01', 'bad',
"2", "40", "yer", '2020-05-05', 'good',
"2", "40", "tet", '2020-05-05', 'bad',
)
Now, I want to filter the data with the following conditions:
Group the data by shop_id and id, then look at the date. Then,
If the date is minimum when status == 'bad', then remove the rows. For instance, the first three rows were removed from the data set because of this condition (please see desired_df).
If there is only the 'bad' status, leave all the rows. Because of this condition, the 4th and 5th rows remain in the desired data set.
If the date is the same among the rows when status == 'bad', then leave both rows in the desired data set.
In other words, I only want to keep the rows where the date of the 'bad' status is the maximum after grouping by shop_id and id. But when the date is the same for both statuses, keep the rows.
desired_df <- tribble(
~shop_id, ~id, ~key, ~date, ~status,
"1", "20", "ksr", '2020-05-01', 'bad',
"1", "20", "tyz", '2020-05-02', 'bad',
"2", "30", "ert", '2020-05-01', 'bad',
"2", "40", "yer", '2020-05-05', 'good',
"2", "40", "tet", '2020-05-05', 'bad',
)
Any help or assistance would be really appreciated!
One approach is to use case_when.
library(dplyr)
library(lubridate)

df %>%
  mutate(date = ymd(date)) %>%
  group_by(shop_id, id) %>%
  mutate(filter = case_when(all(status != "bad") ~ FALSE,
                            all(status == "bad") ~ TRUE,
                            all(status[date == min(date)] == "bad") ~ FALSE,
                            any(status[date == min(date)] == "good") ~ TRUE,
                            TRUE ~ FALSE)) %>%
  filter(filter == TRUE) %>%
  dplyr::select(-filter)
# A tibble: 5 x 5
# Groups: shop_id, id [3]
shop_id id key date status
<chr> <chr> <chr> <date> <chr>
1 1 20 ksr 2020-05-01 bad
2 1 20 tyz 2020-05-02 bad
3 2 30 ert 2020-05-01 bad
4 2 40 yer 2020-05-05 good
5 2 40 tet 2020-05-05 bad
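For this particular sample, the same decision can also be written as a single grouped filter (a sketch that assumes the same lubridate date parsing as above):
library(dplyr)
library(lubridate)

df %>%
  mutate(date = ymd(date)) %>%
  group_by(shop_id, id) %>%
  # keep groups that are all 'bad', or whose earliest date also has a 'good' row
  filter(all(status == "bad") |
           (any(status == "bad") & any(status[date == min(date)] == "good"))) %>%
  ungroup()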

Convert multiple header table to long format

I am reading in an Excel table with multiple rows of headers, which, through read.csv, creates an object like this in R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
How do I now 'flatten' this so that I end up with an R table with "Min Age", "Max Age", "Min Duration", "Max Duration", "Area", "Level", and "Price" columns, with the "Area" column showing either "EU" or "US", the "Level" column showing 1, 2 or 3, and the "Price" column showing the corresponding price found in the Excel table?
I would use the gather function from tidyr if there weren't multiple header rows, but I can't seem to make it work with this data. Any ideas?
The output should have a total of 36 rows + headers
If you skip the first row, as suggested by akrun, you will presumably end up with data that looks something like this: (with "X"s and ".1"/".2" added automatically by R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
With this data, you can then use gather to collect all headers beginning with X into one column and the prices into another. You can then separate the headers into "Level" and "Area". Finally, recode Area and remove "X" from the levels.
df %>%
  gather(headers, Price, starts_with("X")) %>%
  separate(headers, c("Level", "Area")) %>%
  mutate(Area = if_else(Area == "1", "EU", "US"),
         Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
Created on 2018-10-12 by the reprex package (v0.2.1)
P.S. You can find lots of spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
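With tidyr 1.0 or later, the gather()/separate() steps can also be sketched as a single pivot_longer() call (assuming the same df and column names as in the tribble above):
library(tidyverse)

df %>%
  # the first capture group is the Level digit, the second the Area digit
  pivot_longer(starts_with("X"),
               names_to = c("Level", "Area"),
               names_pattern = "X(\\d)\\.(\\d)",
               values_to = "Price") %>%
  mutate(Area = if_else(Area == "1", "EU", "US"),
         Level = as.numeric(Level))
The row order differs from the gather() output, but the same 36 rows are produced.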
