Errorbars and bar plots having different positions in ggplot

I have a dataframe df
> df
id zone mean SE
1 1 1 0.9378712 0.10
2 1 2 2.4830645 0.09
3 1 3 0.7191759 0.09
4 1 4 1.3030844 0.09
5 1 5 1.2497096 0.11
6 1 6 0.7247015 0.15
7 1 7 0.1776825 0.16
8 1 8 1.4755258 0.13
9 1 9 1.0902742 0.16
10 1 10 0.2679057 0.08
11 1 12 0.7677998 0.09
12 2 1 1.2728942 0.14
13 2 2 1.3189574 0.07
14 2 3 1.0934750 0.14
15 2 4 1.3024298 0.10
16 2 5 1.3029797 0.11
17 2 6 1.0878356 0.12
18 2 7 0.5390098 0.12
19 2 8 1.2761170 0.09
20 2 9 1.1395524 0.12
21 2 10 0.6863418 0.14
22 2 12 1.1534048 0.12
23 3 1 1.2963668 0.14
24 3 2 1.3032349 0.07
25 3 3 1.1302980 0.14
26 3 4 1.3049038 0.10
27 3 5 1.3221782 0.11
28 3 6 1.0464710 0.14
29 3 7 0.4997006 0.13
30 3 8 1.2777002 0.09
31 3 9 1.1480874 0.12
32 3 10 0.6844529 0.15
33 3 12 1.1593346 0.13
34 4 1 1.2819611 0.14
35 4 2 1.4276992 0.07
36 4 3 1.1061886 0.14
37 4 4 1.3572913 0.11
38 4 5 1.3588146 0.12
39 4 6 1.1318426 0.14
40 4 7 0.5321167 0.12
41 4 8 1.3701237 0.10
42 4 9 1.1996266 0.13
43 4 10 0.6977050 0.14
44 4 12 1.2620727 0.14
Note that zone has no value 11: after 10 comes 12.
So when I plot it with the following code, the bars and the error bars end up misaligned:
axis_labels <- c("first","second","third","fourth","fifth","sixth","seventh","eighth","ninth","tenth","eleventh")
axis_labels <- setNames(axis_labels, 1:11)
ggplot(df, aes(x=factor(zone), y=mean, fill = id)) +
geom_col(position = position_dodge()) +
scale_fill_discrete(labels = c("1" = "M", "2" = "I","3" = "Mi","4"="C"))+
scale_x_discrete(labels = axis_labels) +
theme(axis.title.x = element_blank(),
axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
theme(plot.margin = unit(rep(0, 5), "pt"))+
geom_errorbar(aes(x=zone, ymin=mean-SE, ymax=mean+SE), width=0.4, position = position_dodge(.9))+
theme_bw()
So the bars drawn at the eleventh position are actually the twelfth zone in the dataframe, but the error bars are drawn at the actual twelfth position. How can I solve this problem without changing the whole code?

The problem comes down to a few things:
Up front, I'll make inferences about column class: I'm fairly confident that id should be character, but I'm not certain about zone. I'll guess character for now.
You use factor(zone) in one aesthetic and zone in another; either all of them should be factor, or none, otherwise you are confusing ggplot2 (and me).
You have 12 in your zone but your labels say eleventh, not sure if that's a typo or something else.
I think the fixes are to make a "proper" factor variable.
df$zone <- as.character(df$zone) # just in case
axis_labels <- setNames(axis_labels, c(1:10,12)) # no 11s in your data, no 12s in your labels
df$zone2 <- factor(axis_labels[df$zone], levels = axis_labels)
ggplot(df, aes(x=zone2, y=mean, fill = id)) +
geom_col(position = position_dodge()) +
scale_fill_discrete(labels = c("1" = "M", "2" = "I","3" = "Mi","4"="C"))+
scale_x_discrete(labels = axis_labels) +
theme(axis.title.x = element_blank(),
axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
theme(plot.margin = unit(rep(0, 5), "pt"))+
geom_errorbar(aes(x=zone2, ymin=mean-SE, ymax=mean+SE), width=0.4, position = position_dodge(.9))+
theme_bw()
Data:
df <- structure(list(id = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4"), zone = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12"), mean = c(0.9378712, 2.4830645, 0.7191759, 1.3030844, 1.2497096, 0.7247015, 0.1776825, 1.4755258, 1.0902742, 0.2679057, 0.7677998, 1.2728942, 1.3189574, 1.093475, 1.3024298, 1.3029797, 1.0878356, 0.5390098, 1.276117, 1.1395524, 0.6863418, 1.1534048, 1.2963668, 1.3032349, 1.130298, 1.3049038, 1.3221782, 1.046471, 0.4997006, 1.2777002, 1.1480874, 0.6844529, 1.1593346, 1.2819611, 1.4276992, 1.1061886, 1.3572913, 1.3588146, 1.1318426, 0.5321167, 1.3701237, 1.1996266, 0.697705, 1.2620727), SE = c(0.1, 0.09, 0.09, 0.09, 0.11, 0.15, 0.16, 0.13, 0.16, 0.08, 0.09, 0.14, 0.07, 0.14, 0.1, 0.11, 0.12, 0.12, 0.09, 0.12, 0.14, 0.12, 0.14, 0.07, 0.14, 0.1, 0.11, 0.14, 0.13, 0.09, 0.12, 0.15, 0.13, 0.14, 0.07, 0.14, 0.11, 0.12, 0.14, 0.12, 0.1, 0.13, 0.14, 0.14)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44"))
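
The mismatch described above can be reproduced without drawing anything. The following is a minimal base R sketch, assuming zone is stored as a number (the question does not say); the names discrete_pos, numeric_pos and mismatch are illustrative and not from the original post.

```r
# Zone ids as in the question: 11 is missing, so 12 follows 10.
# Assumption: zone is stored as a number, not as text.
zone <- c(1:10, 12)

# A discrete axis places each value at the index of its factor level.
discrete_pos <- as.integer(factor(zone))

# A raw number mapped to x keeps its own value as the position.
numeric_pos <- zone

# Zones whose two positions disagree.
mismatch <- zone[discrete_pos != numeric_pos]

print(data.frame(zone, discrete_pos, numeric_pos))
print(mismatch)
```

Only zone 12 differs: as a factor it sits in slot 11, as a number it sits at 12. That is why mapping factor(zone) in one layer and zone in another separates the bars from their error bars.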

Related

Change metrics inside rows by condition

Suppose a data:
df1 <- tibble::tribble(~"M1", ~"M2", ~"Beer, pints", ~"Coffee, oz", ~"Gasoline, galons", ~"Milk, galons", ~"Warehouse, square feet", ~"Nearest place, miles",
"NY", "22", "10", "12", "15", "100", "100", "20",
"NY", "20", "9", "10", "12", "100", "100", "20",
"NY", "18", "8", "9", "11", "100", "100", "20",
"M1", "M2", "Beer, liters", "Coffee, cups (120 ml)", "Gasoline, liters", "Milk, liters", "Warehouse, square meters", "Nearest place, kilometers",
"PR", "22", "7", "8", "9", "70", "67", "7",
"PR", "20", "6", "7", "8", "80", "75", "7",
"M1", "M2", "Beer, pints", "Coffee, oz", "Gasoline, liters", "Milk, liters", "Warehouse, square feet", "Nearest place, miles",
"KR", "22", "6", "6", "7", "60", "50", "9",
"KR", "20", "5", "6", "8", "55", "65", "9",
"KR", "18", "5", "6", "8", "50", "55", "9")
Is there a nice way to convert each column to a single unit (for example, if a column is in liters, the entire column should be in liters; if it is in miles rather than kilometers, the entire column should be in miles), based on the subheading rows inside the data?
It would be great to see the cleanest ways to solve this.
PS: for information:
1 gallon = 3.78541 liters
1 pint = 0.473176 liters
1 oz = 0.0295735 liters
11 square feet = 1.02193 square meters
1 mile = 1.60934 kilometers
I have only just started thinking about a solution and am interested in any nice approaches.
It would also be interesting for the R community to discuss the best ways to edit data by condition.

When the data is sloppy, we must also get our hands dirty. I thought of a way, with many steps.
Data
df1 <-
structure(list(m1 = c("M1", "NY", "NY", "NY", "M1", "PR", "PR",
"M1", "KR", "KR", "KR"), m2 = c("M2", "22", "20", "18", "M2",
"22", "20", "M2", "22", "20", "18"), beer = c("Beer, pints",
"10", "9", "8", "Beer, liters", "7", "6", "Beer, pints", "6",
"5", "5"), coffee = c("Coffee, oz", "12", "10", "9", "Coffee, cups (120 ml)",
"8", "7", "Coffee, oz", "6", "6", "6"), gasoline = c("Gasoline, galons",
"15", "12", "11", "Gasoline, liters", "9", "8", "Gasoline, liters",
"7", "8", "8"), milk = c("Milk, galons", "100", "100", "100",
"Milk, liters", "70", "80", "Milk, liters", "60", "55", "50"),
warehouse = c("Warehouse, square feet", "100", "100", "100",
"Warehouse, square meters", "67", "75", "Warehouse, square feet",
"50", "65", "55"), nearest_place = c("Nearest_place, miles",
"20", "20", "20", "Nearest place, kilometers", "7", "7",
"Nearest place, miles", "9", "9", "9")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
Convert function
convert_unit <- function(value,unit){
m <-
case_when(
unit == "galons" ~ 3.78541,
unit == "pints" ~ 0.473176,
unit == "oz" ~ 0.0295735,
unit == "squarefeet" ~ 1.02193/11,
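unit == "miles" ~ 1.60934, # 1 mile = 1.60934 km; the nearest_place values in the output below were produced with 1.02193/11 in this slot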
TRUE ~ 1
)
output <- m*as.numeric(value)
return(output)
}
Data preparation
First, I would add the header as the first row and also create better names.
library(dplyr)
library(stringr)
library(tidyr)
#remotes::install_github("vbfelix/relper")
library(relper)
or_names <- names(df1)
new_names <- str_to_lower(str_select(or_names,before = ","))
n_row <- nrow(df1)
df1[2:(n_row+1),] <- df1
df1[1,] <- as.list(or_names)
names(df1) <- new_names
Data manipulation
Then, I would create new columns with the units, and then apply the function to each one.
df1 %>%
mutate(
across(.cols = -c(m1:m2),.fns = ~str_keep(str_select(.,after = ",")),.names = "{.col}_unit"),
aux = beer_unit == "",
across(.cols = ends_with("_unit"),~if_else(. == "",NA_character_,.))) %>%
fill(ends_with("_unit"),.direction = "down") %>%
filter(aux) %>%
mutate(
across(
.cols = beer:nearest_place,
.fns = ~convert_unit(value = .,unit = get(str_c(cur_column(),"_unit")))
)
) %>%
select(-aux,-ends_with("_unit"))
Output
# A tibble: 8 x 8
m1 m2 beer coffee gasoline milk warehouse nearest_place
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NY 22 4.73 0.355 56.8 379. 9.29 1.86
2 NY 20 4.26 0.296 45.4 379. 9.29 1.86
3 NY 18 3.79 0.266 41.6 379. 9.29 1.86
4 PR 22 7 8 9 70 67 7
5 PR 20 6 7 8 80 75 7
6 KR 22 2.84 0.177 7 60 4.65 0.836
7 KR 20 2.37 0.177 8 55 6.04 0.836
8 KR 18 2.37 0.177 8 50 5.11 0.836
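
As a possible alternative to the case_when lookup, the factors can be kept in a named vector. This is a base R sketch only, using the conversion factors listed in the question; to_metric and convert are hypothetical names, and the unit strings assume the same cleaned spelling that convert_unit matches on (for example "galons" and "squarefeet").

```r
# Conversion factors to metric units, taken from the question.
# Assumption: unit strings are spelled as in the data ("galons", "squarefeet").
to_metric <- c(
  galons     = 3.78541,
  pints      = 0.473176,
  oz         = 0.0295735,
  squarefeet = 1.02193 / 11,
  miles      = 1.60934
)

# Look up the factor by unit name; units not in the table are left unchanged.
convert <- function(value, unit) {
  factor_for_unit <- to_metric[unit]
  factor_for_unit[is.na(factor_for_unit)] <- 1
  unname(factor_for_unit * as.numeric(value))
}

print(convert(c("10", "7", "20"), c("pints", "liters", "miles")))
```

Adding a unit then means adding one entry to the vector rather than another branch in the function.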

R select rows in dataframe by external vector as index

I have the following data and I want to subset some rows from the table if the name is in the vector l.
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
rownames(df) <- df$Names
df <- df %>% select(-"Names") # df I have
l <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
I want to get the following table in the output.
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78

It is easier to filter by the gene names if you keep them as a column
instead of making them rownames.
The following changes to your code will get you the result you are looking for.
library(tidyverse)
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
genes_to_select <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
df <-
df %>%
filter(Names %in% genes_to_select) %>%
column_to_rownames("Names") %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
df
#> X1S X1T
#> [1,] 5 6
#> [2,] 8 6
#> [3,] 99 9
#> [4,] 5 5
#> [5,] 0 11
#> [6,] 1 33
#> [7,] 3 7
#> [8,] 15 24
#> [9,] 62 62
#> [10,] 54 66
#> [11,] 6 4
#> [12,] 8 78

We could also use slice
library(dplyr)
library(tibble)
df %>%
slice(match(l, Names)) %>% # row positions of each wanted gene, in the order of l
column_to_rownames('Names')

One line does the job:
df[rownames(df) %in% l,]
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78
Or if you have Names:
df[df$Names %in% l,]
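
The two base R idioms differ in the order of the result, which matters if the rows should follow the external vector. A small sketch with the first four genes from the question; df_small and wanted are illustrative names.

```r
# Four genes from the question, with their 1S values, stored as row names.
df_small <- data.frame(
  S = c(5, 6, 8, 99),
  row.names = c("TIGIT", "ABCB1", "CD8B", "CD8A")
)
wanted <- c("CD8A", "TIGIT")

# %in% keeps the rows in the order they have in the table.
by_in <- df_small[rownames(df_small) %in% wanted, , drop = FALSE]

# match() returns the rows in the order of the external vector.
by_match <- df_small[match(wanted, rownames(df_small)), , drop = FALSE]

print(rownames(by_in))
print(rownames(by_match))
```

drop = FALSE keeps the result a data frame even when only one column remains.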

Pivot from long format to wide format in a dataframe [duplicate]

I have the dataframe below:
dput(Moment[1:15,])
structure(list(SectionCut = c("1", "1", "1", "1", "2", "2", "2",
"2", "3", "3", "3", "3", "Left", "Left", "Left"), N_l = c("1",
"2", "3", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "2",
"3"), UG = c("84", "84", "84", "84", "84", "84", "84", "84",
"84", "84", "84", "84", "84", "84", "84"), S = c("12", "12",
"12", "12", "12", "12", "12", "12", "12", "12", "12", "12", "12",
"12", "12"), Sample = c("S00", "S00", "S00", "S00", "S00", "S00",
"S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00", "S00"
), DF = c(0.367164093630677, 0.540130283330855, 0.590662743113521,
0.497030982705986, 0.000319303760901125, 0.000504925126205843,
0.00051127115578891, 0.000395434233037301, 0.413218926236695,
0.610726262711904, 0.685000816613652, 0.59474035159783, 0.483354599644366,
0.645710184115934, 0.625883097885242)), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
I want to spread the content of the DF column by pivoting on the SectionCut column. I basically want the opposite of pivot_longer, so that in the end the values in column DF are shown under 5 different columns (the values of SectionCut = c("1", "2", "3", "left", "right")).

We could use pivot_wider from tidyr after creating a sequence column with rowid
library(dplyr)
library(tidyr)
library(data.table)
Moment %>%
mutate(rn = rowid(SectionCut)) %>%
pivot_wider(names_from = SectionCut, values_from = DF)
Output
# A tibble: 4 x 9
# N_l UG S Sample rn `1` `2` `3` Left
# <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#1 1 84 12 S00 1 0.367 0.000319 0.413 0.483
#2 2 84 12 S00 2 0.540 0.000505 0.611 0.646
#3 3 84 12 S00 3 0.591 0.000511 0.685 0.626
#4 4 84 12 S00 4 0.497 0.000395 0.595 NA
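
The sequence column is what tells the pivot which values belong on the same output row. The sketch below shows the same idea in base R, with ave standing in for data.table's rowid; the DF values are made-up placeholders and the vector is a shortened version of SectionCut.

```r
# Shortened SectionCut column; "Left" has only three entries, as in the question.
section <- c("1", "1", "1", "1", "2", "2", "2", "2", "Left", "Left", "Left")

# Placeholder values standing in for the DF column (not the real data).
DF <- seq_along(section) / 10

# Running index within each SectionCut group: 1, 2, 3, 4, 1, 2, ...
rn <- ave(seq_along(section), section, FUN = seq_along)

# One row per index, one column per SectionCut; missing cells become NA.
wide <- tapply(DF, list(rn = rn, SectionCut = section), FUN = identity)

print(rn)
print(wide)
```

Without the index, every row of one SectionCut would compete for the same output cell.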

aggregate subset returning this error: NAs introduced by coercion

I'm having trouble finding the mean for a subset of data. Here are the two questions I'm hoping to answer. The first seems to be working fine, but the second returns the same answer as the first, but without numbers to the right of the decimal place. What's going on?
There is also an error that appears (repeated four times):
NAs introduced by coercion
# What is the mean suspension rate for schools by farms overall?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))))
# What is the mean suspension rate for schools with farms > 100?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
Data
merged_data <- structure(list(schid = c("1030642", "1030766", "1030774", "1030840",
"1130103", "1230150", "1530435", "1530492", "1530500", "1931047",
"1931708", "1931864", "1932623", "1933746", "1937226", "1938554",
"1938612", "1938885", "1995836", "1996016"), farms = c("132",
"116", "348", "406", "68", "130", "370", "204", "225", "2,616",
"1,106", "1,918", "1,148", "2,445", "1,123", "1,245", "1,369",
"1,073", "932", "178"), foster = c("2", "0", "1", "8", "1", "4",
"4", "0", "0", "22", "11", "12", "2", "8", "13", "13", "4", "3",
"2", "3"), homeless = c("14", "0", "8", "4", "1", "4", "5", "0",
"14", "35", "42", "116", "9", "8", "34", "54", "26", "31", "5",
"11"), migrant = c("0", "0", "0", "0", "0", "0", "18", "0", "0",
"0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0"), ell = c("18",
"12", "114", "45", "7", "4", "50", "28", "26", "274", "212",
"325", "95", "112", "232", "185", "121", "84", "24", "35"), suspension_rate_total = c("*",
"20", "0", "0", "95", "5", "*", "256", "78", "33", "20", "1",
"218", "120", "0", "0", "*", "*", "*", "0"), suspension_violent = c("*",
"9", "0", "0", "20", "2", "*", "38", "0", "6", "3", "0", "53",
"35", "0", "0", "*", "*", "*", "0"), suspension_violent_no_injury = c("*",
"6", "0", "0", "47", "1", "*", "121", "52", "7", "13", "1", "77",
"44", "0", "0", "*", "*", "*", "0"), suspension_weapon = c("*",
"0", "0", "0", "8", "0", "*", "1", "0", "1", "1", "0", "4", "3",
"0", "0", "*", "*", "*", "0"), suspension_drug = c("*", "0",
"0", "0", "9", "1", "*", "59", "12", "16", "0", "0", "6", "5",
"0", "0", "*", "*", "*", "0"), suspension_defiance = c("*", "1",
"0", "0", "9", "1", "*", "16", "12", "0", "3", "0", "69", "30",
"0", "0", "*", "*", "*", "0"), suspension_other = c("*", "4",
"0", "0", "2", "0", "*", "21", "2", "3", "0", "0", "9", "3",
"0", "0", "*", "*", "*", "0")), row.names = c(NA, 20L), class = "data.frame")
Thank you so much.

Tidy up your data:
# replace * with NA
merged_data$suspension_rate_total[merged_data$suspension_rate_total == '*'] <- NA
# convert character to numeric format
merged_data$suspension_rate_total <- as.numeric(merged_data$suspension_rate_total)
# remove comma in strings and convert character to numeric format
merged_data$farms <- as.numeric(gsub(",", "", merged_data$farms))
Output
# What is the mean suspension rate for schools by farms overall?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = mean, na.rm = TRUE)
# farms suspension_rate_total
# 1 68 95
# 2 116 20
# 3 130 5
# 4 178 0
# 5 204 256
# 6 225 78
# 7 348 0
# 8 406 0
# 9 1106 20
# 10 1123 0
# 11 1148 218
# 12 1245 0
# 13 1918 1
# 14 2445 120
# 15 2616 33
# What is the mean suspension rate for schools with farms > 100?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = mean, na.rm = TRUE, subset = farms > 100)
# farms suspension_rate_total
# 1 116 20
# 2 130 5
# 3 178 0
# 4 204 256
# 5 225 78
# 6 348 0
# 7 406 0
# 8 1106 20
# 9 1123 0
# 10 1148 218
# 11 1245 0
# 12 1918 1
# 13 2445 120
# 14 2616 33
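
The effect of the cleaning steps can be checked on a three-element vector. A minimal sketch; raw, direct and clean are illustrative names, and the values mimic the kinds of strings found in merged_data.

```r
# Three kinds of strings that occur in the data: a plain number,
# a suppressed value, and a number with a thousands separator.
raw <- c("20", "*", "1,106")

# Direct conversion: "*" and "1,106" both become NA (with a warning).
direct <- suppressWarnings(as.numeric(raw))

# Cleaning first: mark "*" as missing, strip commas, then convert.
clean <- raw
clean[clean == "*"] <- NA
clean <- as.numeric(gsub(",", "", clean))

print(direct)
print(clean)
print(mean(clean, na.rm = TRUE))
```

Only the genuinely missing entry stays NA after cleaning, so the mean is taken over 20 and 1106.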

Are you sure 'NAs introduced by coercion' is an error and not a warning?
When you convert a character column to numeric with
as.numeric(as.character(suspension_rate_total)), entries that are not numbers (the "*" values in your data) are coerced to NA, which R reports as a warning.
Also, I get different answers for the two blocks of code:
> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))))
farms suspension_rate_total
1 68 95
2 116 20
3 130 5
4 132 NA
5 178 0
6 204 256
7 225 78
8 348 0
9 370 NA
10 406 0
11 932 NA
> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
farms suspension_rate_total
1 68 95
Further, the comment on your second block of code mentions farms > 100, but in your code you used subset = farms < 100.

Convert multiple header table to long format

I am reading in an Excel table with multiple rows of headers, which, through read.csv, creates an object like this in R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
How do I now 'flatten' this so that I end up with an R table with "Min age", "Max Age", "Min Duration", "Max Duration", "Area", "Level", "Price" columns. With the "Area" column showing either "EU" or "US", the "Level" column showing either 1, 2 or 3, and then the "Price" column showing the corresponding price found in the Excel table?
I would use the gather function from tidyr if there weren't multiple header rows, but can't seem to work it with this data, any ideas?
The output should have a total of 36 rows + headers

If you skip the first row, as suggested by akrun, you will presumably end up with data that looks something like this: (with "X"s and ".1"/".2" added automatically by R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
With this data, you can then use gather to collect all headers beginning with X into one column and the prices into another. You can then separate the headers into "Level" and "Area". Finally, recode Area and remove "X" from the levels.
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
Created on 2018-10-12 by the reprex package (v0.2.1)
P.S. You can find lots of spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
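
Instead of recoding the numeric suffixes afterwards, the column names could be built from both header rows before reshaping. This is only a sketch of that idea in base R, not the method used above; it uses a shortened version of R1 and R2 from the question, and area_row, level_row and new_names are illustrative names.

```r
# Shortened versions of the two header rows from the question.
area_row  <- c("X", "X.1", "EU", "EU.1", "US", "US.1")
level_row <- c("Min Age", "Max Age", "1", "2", "1", "2")

# Drop the ".1", ".2" suffixes that were added to make names unique.
area <- sub("\\.[0-9]+$", "", area_row)

# Columns without an area keep the second-row label;
# price columns become "<area>_<level>".
new_names <- ifelse(area == "X", level_row, paste(area, level_row, sep = "_"))

print(new_names)
```

With names such as EU_1, the area and level can later be split on the underscore without a separate recoding step.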
