Avoid hard-coding with pivot longer to pivot multiple columns at once - r

I want to use pivot_longer() on multiple columns of the dataset below while avoiding hard-coding. I've seen some similar questions, but I still cannot do it.
Wide data:
> head(data)
      ID IND_TEST_SCORE ARG_G1_ABC NARR_G1_ABC ARG_G1_EF NARR_G1_EF ARG_G2_ABC NARR_G2_ABC
1 PART_1            100      68.53       71.32      4.94       3.42      64.90       64.25
2 PART_2             36      65.90          NA      6.55         NA      63.80       59.00
3 PART_3             32      69.78          NA      2.44         NA      71.73          NA
4 PART_4             96      68.29       67.83      3.00       3.17      67.67       67.88
5 PART_5             11        NaN         NaN       NaN        NaN         NA       67.08
6 PART_6             12      69.50       71.60      3.25       2.50         NA          NA
  ARG_G2_EF NARR_G2_EF
1      7.10       5.08
2      7.40       7.00
3      1.09         NA
4      3.67       1.76
5        NA       3.00
6        NA         NA
Desired output:
      ID IND_TEST_SCORE ABC EF GROUP TYPE
1 PART_1            100           G1  ARG
1 PART_1            100           G1 NARR
1 PART_1            100           G2  ARG
1 PART_1            100           G2 NARR
2 PART_2             36           G1  ARG
2 PART_2             36           G1 NARR
2 PART_2             36           G2  ARG
2 PART_2             36           G2 NARR
and so on...
Questions: how can I:
Create a new column called "GROUP" with 'G1' and 'G2' values
Create a new column called "TYPE" with 'ARG' and 'NARR' values
Create 2 new columns, one for "ABC" values and another for "EF" values
without hard-coding it? I'd like to work with patterns... Thanks in advance!
My attempt so far:
library(dplyr)
library(tidyr)

# create a single "my_names" column and work on it:
dataLong <- data %>%
  pivot_longer(cols = c(-ID, -IND_TEST_SCORE),
               names_to = "my_names",
               values_to = "my_values") %>%
  mutate(GROUP = case_when(my_names == "ARG_G1_ABC" ~ "G1",
                           my_names == "ARG_G2_ABC" ~ "G2",
                           my_names == "ARG_G1_EF" ~ "G1",
                           my_names == "ARG_G2_EF" ~ "G2",
                           my_names == "NARR_G1_ABC" ~ "G1",
                           my_names == "NARR_G2_ABC" ~ "G2",
                           my_names == "NARR_G1_EF" ~ "G1",
                           my_names == "NARR_G2_EF" ~ "G2")) %>%
  mutate(TYPE = case_when(my_names == "ARG_G1_ABC" ~ "ARG",
                          my_names == "ARG_G2_ABC" ~ "ARG",
                          my_names == "ARG_G1_EF" ~ "ARG",
                          my_names == "ARG_G2_EF" ~ "ARG",
                          my_names == "NARR_G1_ABC" ~ "NARR",
                          my_names == "NARR_G2_ABC" ~ "NARR",
                          my_names == "NARR_G1_EF" ~ "NARR",
                          my_names == "NARR_G2_EF" ~ "NARR"))
Dataset:
> dput(data)
structure(list(ID = structure(c("PART_1", "PART_2", "PART_3",
"PART_4", "PART_5", "PART_6", "PART_7", "PART_8", "PART_9", "PART_10",
"PART_11", "PART_12", "PART_13", "PART_14", "PART_15", "PART_16",
"PART_17", "PART_18", "PART_19", "PART_20", "PART_21", "PART_22",
"PART_23", "PART_24", "PART_25", "PART_26", "PART_27", "PART_28",
"PART_29", "PART_30", "PART_31", "PART_32", "PART_33", "PART_34",
"PART_35", "PART_36", "PART_37", "PART_38", "PART_39", "PART_40",
"PART_41", "PART_42", "PART_43", "PART_44", "PART_45", "PART_46",
"PART_47", "PART_48", "PART_49", "PART_50", "PART_51", "PART_52",
"PART_53", "PART_54", "PART_55", "PART_56", "PART_57", "PART_58",
"PART_59", "PART_60", "PART_61", "PART_62", "PART_63", "PART_64",
"PART_65", "PART_66", "PART_67", "PART_68", "PART_69", "PART_70",
"PART_71"), class = c("glue", "character")), IND_TEST_SCORE = c(100,
36, 32, 96, 11, 12, 32, 72, 100, 64, 2, 19, 99, 86, 60, 108,
95, 35, 60, 9, 78, 61, 61, 67, 105, 99, 51, 21, 65, 30, 0.9,
77, 54, 14, 103, 48, 0.7, 2, 39, 94, 80, 8, 30, 103, 113, 91,
59, 56, 86, 99, 72, 34, 32, 6, 44, 99, 65, 98, 110, 102, 87,
50, 89, 36, 93, 8, 11, 78, 48, 77, 4), ARG_G1_ABC = c(68.53,
65.9, 69.78, 68.29, NaN, 69.5, 67.05, 73.74, 73.59, 72.57, 64.33,
67.79, 72.94, 63.75, 71.56, 75.5, 68.16, NA, 65.64, 68.36, 69.75,
72.73, 67.67, 66.19, 62.94, 72.48, 72.19, 62.44, 72.5, 71.06,
70.4, 69.14, NA, 67.59, 69.1, 74.05, NA, 68.6, 68.27, 59.12,
NA, NA, 63.7, 67.18, NA, 68.38, 63.44, 72.56, 66.06, 66.53, 73.19,
NA, NA, NA, 73.44, 67.45, 72.91, 65.81, 73.96, 75, 75.89, 72,
NA, 68.2, 67.29, 69.91, NaN, 69.67, 68.39, 69.2, 67.55), NARR_G1_ABC = c(71.32,
NA, NA, 67.83, NaN, 71.6, 64.2, 71.68, 73.29, 70.53, 73.35, 59.31,
71.08, 74.06, 68.7, 74, 69.08, NA, 68.52, 63.47, 68.33, NA, 65.64,
62.11, 63.9, 70.41, 60.36, 65.88, 68.81, 69.62, 70.68, 67.5,
NA, 68.45, 67.16, 74.39, 60.6, 65.89, 71.94, 68.75, NA, NA, 67,
66.85, NA, NA, 62.56, 73.33, 69.81, 67.68, 73.06, 65.8, 63.85,
NA, 67.64, 71.6, 68.47, 69.39, 71.16, 72.33, NA, 66.68, NA, 66.22,
67, 61.27, NaN, 72.33, 68.29, 71.33, 65.57), ARG_G1_EF = c(4.94,
6.55, 2.44, 3, NaN, 3.25, 4.71, 2.84, 1.07, 2, 5.33, 5.43, 1.72,
10.55, 3, 1.17, 5.8, NA, 10.55, 4.21, 2.94, 3.55, 6.33, 8.25,
5.88, 2, 3.44, 9.22, 1.69, 4.18, 2.5, 4.71, NA, 4.41, 5.9, 2.21,
NA, 6.67, 3.33, 7, NA, NA, 8, 4.76, NA, 4.44, 2.68, 3.16, 4.94,
5.42, 2.81, NA, NA, NA, 1.78, 6.09, 2.52, 6.56, 1.96, 1.12, 0.67,
3.78, NA, 3.5, 3.65, 5.27, NaN, 4.33, 6.78, 3.6, 4.35), NARR_G1_EF = c(3.42,
NA, NA, 3.17, NaN, 2.5, 3.29, 1.64, 1.07, 6, 1.41, 9.25, 3.25,
2.69, 3.8, 1.32, 3.04, NA, 2.38, 5.18, 2.38, NA, 6.18, 6.11,
6.4, 1.85, 7.45, 3.69, 1.89, 3.25, 1.6, 4.8, NA, 2.8, 4.32, 2.3,
6.6, 7.42, 2.83, 4.75, NA, NA, 5, 4.75, NA, NA, 8, 1.71, 2.67,
2.05, 1.47, 4.8, 7.96, NA, 4.43, 3.8, 4.47, 4.91, 1.68, 2.78,
NA, 6.58, NA, 6.67, 6, 5.18, NaN, 1.67, 4.86, 2.08, 4.38), ARG_G2_ABC = c(64.9,
63.8, 71.73, 67.67, NA, NA, 52.5, 72.35, 65.28, 57.22, NA, NaN,
69, 66.67, NaN, 66.58, 69, 60.55, 56.29, 67.45, 68.4, 64.25,
NaN, 50.86, 67.83, 65.96, 57, 53.07, 66.89, NaN, NA, 59, 61.5,
NA, 65.9, 64.07, NA, NA, 57.91, 67.89, 68.75, 68.5, NaN, 63.24,
66.19, 60.59, 59.24, 54.33, 64.39, 65.83, 65.71, 63, 63.78, 63.62,
64, 65.08, NA, 67.61, 67.57, 72.71, 65.46, 61.71, NA, 57.62,
NA, NA, NA, 64, 61.33, 62.64, NA), NARR_G2_ABC = c(64.25, 59,
NA, 67.88, 67.08, NA, 60.75, 64.42, 71.17, 58.42, NA, 49.8, 63.36,
65.2, NaN, 70.2, 62.85, NaN, 61.6, 53.92, 62.63, NA, NaN, 50.46,
65.14, 60.58, 63.29, NA, 64.33, NaN, NA, 68.57, NA, NA, 66.3,
NA, 57.29, NA, 53.5, 63.48, NA, 57.07, NaN, 61.82, NA, 68.61,
57.1, 62.84, 63, 61.91, 58.38, NaN, 61.56, NA, NaN, 65.55, 63.8,
65, 63.14, 67.31, 67.75, 57.62, 63.31, 54.83, 66.43, NA, NA,
64.67, 57.92, 59, NA), ARG_G2_EF = c(7.1, 7.4, 1.09, 3.67, NA,
NA, 12.75, 1.24, 3.28, 9.78, NA, NaN, 1.71, 1.93, NaN, 6.21,
2.76, 7.91, 8.65, 3.55, 3.4, 5, NaN, 16.05, 3.39, 4.52, 13, 11.6,
5.05, NaN, NA, 9.5, 9.67, NA, 7.03, 3.87, NA, NA, 8, 3.33, 2.19,
3, NaN, 8.53, 3.37, 5.47, 7.35, 13.48, 5.33, 3.83, 3.65, 5.82,
4, 6.17, 6, 6.42, NA, 3.83, 2.71, 2.19, 4.58, 5.18, NA, 9.75,
NA, NA, NA, 5, 6.44, 5.36, NA), NARR_G2_EF = c(5.08, 7, NA, 1.76,
3, NA, 8.88, 4.26, 2.92, 7.08, NA, 10.6, 5.5, 4.16, NaN, 2.87,
4.7, NaN, 7, 9.5, 4.68, NA, NaN, 12.75, 4.77, 9.15, 5, NA, 5.44,
NaN, NA, 4.57, NA, NA, 1.7, NA, 11.29, NA, 13.33, 5.95, NA, 10.79,
NaN, 5.18, NA, 5.22, 7.1, 3.53, 5.75, 6.77, 6.31, NaN, 7.88,
NA, NaN, 3, 4.88, 4.69, 6.19, 10.31, 3.62, 9.75, 5.46, 6.83,
4.43, NA, NA, 3.67, 8.67, 8.53, NA)), row.names = c(NA, -71L), class = "data.frame")

We may use pivot_longer: select the columns whose names end ($) in _ABC or _EF with matches(), split the column names at _ by setting names_sep to "_", and give the corresponding new column names in names_to. The special ".value" entry makes the third piece of each name (ABC or EF) the name of an output value column, whereas TYPE and GROUP receive the first and second pieces of the column names. values_drop_na = TRUE drops the rows in which every value column is NA.
library(tidyr)
pivot_longer(data, cols = matches('_(ABC|EF)$'),
             names_to = c("TYPE", "GROUP", ".value"),
             names_sep = "_", values_drop_na = TRUE)
-output
# A tibble: 217 × 6
   ID     IND_TEST_SCORE TYPE  GROUP   ABC    EF
   <glue>          <dbl> <chr> <chr> <dbl> <dbl>
 1 PART_1            100 ARG   G1     68.5  4.94
 2 PART_1            100 NARR  G1     71.3  3.42
 3 PART_1            100 ARG   G2     64.9  7.1
 4 PART_1            100 NARR  G2     64.2  5.08
 5 PART_2             36 ARG   G1     65.9  6.55
 6 PART_2             36 ARG   G2     63.8  7.4
 7 PART_2             36 NARR  G2     59    7
 8 PART_3             32 ARG   G1     69.8  2.44
 9 PART_3             32 ARG   G2     71.7  1.09
10 PART_4             96 ARG   G1     68.3  3
# … with 207 more rows
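To try the idea on a smaller input, here is a minimal sketch. It assumes a hypothetical two-row data frame, toy, built from the first rows shown in the question (G1 columns only). It also swaps names_sep for names_pattern, the regex-based argument that pivot_longer accepts as well; this is not what the answer above uses, just another way to express the same split.

```r
library(tidyr)

# Hypothetical toy data: two participants, G1 columns only
toy <- data.frame(
  ID = c("PART_1", "PART_2"),
  IND_TEST_SCORE = c(100, 36),
  ARG_G1_ABC  = c(68.53, 65.90),
  NARR_G1_ABC = c(71.32, NA),
  ARG_G1_EF   = c(4.94, 6.55),
  NARR_G1_EF  = c(3.42, NA)
)

# Each capture group feeds one entry of names_to;
# ".value" turns the third piece (ABC / EF) into output columns.
long <- pivot_longer(
  toy,
  cols = matches("_(ABC|EF)$"),
  names_to = c("TYPE", "GROUP", ".value"),
  names_pattern = "^([^_]+)_([^_]+)_([^_]+)$"
)

# Same call, additionally dropping rows where ABC and EF are both NA
long_no_na <- pivot_longer(
  toy,
  cols = matches("_(ABC|EF)$"),
  names_to = c("TYPE", "GROUP", ".value"),
  names_pattern = "^([^_]+)_([^_]+)_([^_]+)$",
  values_drop_na = TRUE
)

names(long)
nrow(long)
nrow(long_no_na)
```

With names that always have exactly three pieces, names_sep and names_pattern should behave the same; names_pattern may be the more convenient choice if some pieces could themselves contain underscores.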

Related

Does R have an "alternate()" function?

I have a large df and I'm trying to relocate the columns with patterns instead of manually writing each column name in select(). More details here.
A glimpse of the issue (edit): All my columns share a pattern ARG_G1_50_AAA or ARG_G2_50_AAA or NARR_G1_50_AAA or NARR_G2_50_AAA. The final parts are: AAA, AAC, AC and AB. I need two subsets of this data.
Set 1: I need to intercalate "G1" and "G2" columns (in the order 50, 100, 150 and 200) and in the order (AAA, AAC, AC and AB). Ex:
NARR_G1_50_AAA, NARR_G2_50_AAA,
NARR_G1_50_AAC, NARR_G2_50_AAC.... so on
Set 2: I need to intercalate "Narr" and "Arg" columns (again, 50 before 100, 150 and 200 and AAA before AAC, AC and AB). No need to intercalate G1 and G2 now. Ex:
NARR_G1_50_AAA, ARG_G1_50_AAA,
NARR_G2_50_AAA, ARG_G2_50_AAA... so on
Basically, I was able to partially solve my problem (cf. linked post above) with:
dfPaired <- merged_DF %>%
  dplyr::select(ID, str_subset(names(merged_DF), "G?_50\\w*"))
head(dfPaired)
ID ARG_G1_50_AAA ARG_G1_50_AAC ARG_G1_50_AC ARG_G1_50_AB
ARG_G2_50_AAA ARG_G2_50_AAC, ARG_G2_50_AC ARG_G2_50_AB....
## I know that I'm only getting the "50" columns here; in fact I need all of them, but it wouldn't be a problem to repeat the code for 100, 150 and 200.
How can I make R "intercalate" the strings? I mean, I need:
ARG_G1_50_AAA, ARG_G2_50_AAA
ARG_G1_50_AAC, ARG_G2_50_AAC,
ARG_G1_50_AC, ARG_G2_50_AC,
ARG_G1_50_AB, ARG_G2_50_AB ... (so on)
(intercalate G1 and G2 columns in the case of set 1)
Questions:
Could I use something like seq(by = 2)?
Is there a way to pass two patterns to str() and ask it to intercalate the output?
Is there an "intercalate()" function that I could apply to str_subset(names(merged_DF), "G?_50\\w*")?
** I mean, something like int(str_subset(names(merged_DF), "G1_50\\w*"), str_subset(names(merged_DF), "G2_50\\w*")). Thanks in advance :)
EDIT:
dput(merged_DF[1:50])
structure(list(ID = structure(c("P1", "P2", "P3", "P4", "P5",
"P6", "P7", "P8", "P9", "P10", "P11", "P12", "P13", "P14", "P15",
"P16", "P17", "P18", "P19", "P20", "P21", "P22", "P23", "P24",
"P25", "P26", "P27", "P28", "P29", "P30", "P31", "P32", "P33",
"P34", "P35", "P36", "P37", "P38", "P39", "P40", "P41", "P42",
"P43", "P44", "P45", "P46", "P47", "P48", "P49", "P50", "P51",
"P52", "P53", "P54", "P55", "P56", "P57", "P58", "P59", "P60",
"P61", "P62", "P63", "P64", "P65", "P66", "P67", "P68", "P69",
"P70", "P71"), class = c("glue", "character")), ARG_G1_100_AAA = c(68.53,
65.9, 69.78, 68.29, NaN, 69.5, 67.05, 73.74, 73.59, 72.57, 64.33,
67.79, 72.94, 63.75, 71.56, 75.5, 68.16, NA, 65.64, 68.36, 69.75,
72.73, 67.67, 66.19, 62.94, 72.48, 72.19, 62.44, 72.5, 71.06,
70.4, 69.14, NA, 67.59, 69.1, 74.05, NA, 68.6, 68.27, 59.12,
NA, NA, 63.7, 67.18, NA, 68.38, 63.44, 72.56, 66.06, 66.53, 73.19,
NA, NA, NA, 73.44, 67.45, 72.91, 65.81, 73.96, 75, 75.89, 72,
NA, 68.2, 67.29, 69.91, NaN, 69.67, 68.39, 69.2, 67.55), ARG_G1_100_AAC = c(70.18,
67.65, 71.89, 70.42, NaN, 72.38, 69.67, 75.63, 76.7, 76.21, 66.5,
70.57, 76.72, 66.4, 74.75, 79.17, 70.84, NA, 67.82, 70, 71.88,
74.55, 69.33, 69.5, 65.25, 75.05, 75.44, 64.56, 74.88, 74.29,
72.4, 71.93, NA, 69.12, 71.43, 77.53, NA, 71.93, 70.4, 60.25,
NA, NA, 64.8, 69, NA, 71.19, 71.12, 75.04, 68.89, 68.26, 75.81,
NA, NA, NA, 75.89, 68.82, 77.35, 68.38, 76.71, 79.12, 78.89,
73.5, NA, 69.7, 69.82, 70.91, NaN, 72, 71.17, 71.85, 69.7), ARG_G1_100_AC = c(4.35,
4.95, 1.44, 2.71, NaN, 3.25, 3.95, 2.26, 0.85, 1.21, 5.33, 5.43,
0.83, 10.4, 2.56, 0.33, 4.92, NA, 10.55, 3.43, 2.94, 1.55, 5.33,
6.44, 5.25, 2, 3.12, 8.5, 1.38, 3.76, 1.9, 2.79, NA, 4.06, 5.57,
1.95, NA, 6.07, 2.67, 7, NA, NA, 8, 4.76, NA, 4.19, 2.68, 3,
4.94, 4.79, 2.19, NA, NA, NA, 1.78, 5.27, 2.52, 5.88, 1.96, 1.12,
0.67, 3.28, NA, 3.5, 3.41, 3.73, NaN, 3.83, 6.06, 3.3, 3.9),
ARG_G1_100_AB = c(4.94, 6.55, 2.44, 3, NaN, 3.25, 4.71, 2.84,
1.07, 2, 5.33, 5.43, 1.72, 10.55, 3, 1.17, 5.8, NA, 10.55,
4.21, 2.94, 3.55, 6.33, 8.25, 5.88, 2, 3.44, 9.22, 1.69,
4.18, 2.5, 4.71, NA, 4.41, 5.9, 2.21, NA, 6.67, 3.33, 7,
NA, NA, 8, 4.76, NA, 4.44, 2.68, 3.16, 4.94, 5.42, 2.81,
NA, NA, NA, 1.78, 6.09, 2.52, 6.56, 1.96, 1.12, 0.67, 3.78,
NA, 3.5, 3.65, 5.27, NaN, 4.33, 6.78, 3.6, 4.35), ARG_G1_150_AAA = c(93.38,
90.2, 98.33, 94.69, NaN, 99, 93.64, 104.22, 104.8, 103.17,
87, 93.83, 101.89, 87.5, 100.38, 107, 94.69, NA, 90.75, 91.5,
93.88, 99.5, NaN, 89.5, 86.5, 100.55, 101, 84.22, 101.88,
94.62, 97.2, 96.5, NA, 87.38, 96.82, 103.67, NA, 97.57, 95.86,
84, NA, NA, 85.5, 90.5, NA, 96.29, 89.71, 101.64, 92.33,
93.89, 104.43, NA, NA, NA, 101.33, 93.5, 105.42, 90.75, 104.23,
108.86, 102.67, 97, NA, 91.9, 91.38, 93.5, NaN, 98, 94.78,
95.1, 93.4), ARG_G1_150_AAC = c(96.38, 90.9, 100, 96.08,
NaN, 99.5, 95.82, 106.33, 106.6, 106.5, 92, 95.83, 104, 89,
103.75, 109, 96.92, NA, 93, 93.17, 95.12, 102.75, NaN, 93.5,
89.38, 102.09, 104.12, 85.44, 103.38, 96.75, 99.2, 98.5,
NA, 90.38, 99.18, 105.89, NA, 99.43, 97, 84, NA, NA, 86.75,
91.88, NA, 96.86, 98.64, 103.71, 94.22, 95.22, 105.71, NA,
NA, NA, 102.33, 94.25, 108.08, 91.75, 107, 112.29, 106.33,
98.22, NA, 93.5, 93.25, 94.25, NaN, 100, 96.78, 97.8, 95.5
), ARG_G1_150_AC = c(8.75, 10.1, 3.67, 5.23, NaN, 6.5, 6.73,
4.78, 2.27, 3.17, 12, 9.83, 3.44, 21.1, 4.25, 2, 11.85, NA,
17.5, 6.17, 7.25, 3, NaN, 13.5, 10.62, 5, 5.75, 17.44, 4,
10.75, 5, 5.5, NA, 9.5, 9.36, 3.56, NA, 10, 6.86, 9.5, NA,
NA, 16.25, 10.25, NA, 10.43, 6, 6.21, 9.22, 9.22, 5.14, NA,
NA, NA, 3, 10.75, 6, 12.88, 3.77, 2.57, 4.33, 7.22, NA, 8.6,
7.88, 10, NaN, 7, 11.67, 7.8, 7.7), ARG_G1_150_AB = c(10.12,
12.6, 5.33, 5.77, NaN, 6.5, 7.91, 5.44, 2.53, 4.33, 12, 9.83,
4.78, 21.4, 5.25, 3, 13.77, NA, 17.5, 7.33, 7.25, 6, NaN,
16.5, 11.5, 5, 6.25, 18.67, 4.5, 11.38, 5.8, 8.5, NA, 10,
9.82, 4.33, NA, 11, 7.71, 9.5, NA, NA, 16.25, 10.25, NA,
10.86, 6, 7, 9.22, 10.33, 6.43, NA, NA, NA, 3.33, 11.75,
6, 14, 3.77, 2.57, 4.33, 8.22, NA, 8.8, 9, 12, NaN, 8, 12.67,
8.2, 8.4), ARG_G1_200_AAA = c(121.5, 110.6, NaN, 120.57,
NaN, NaN, 115.67, 132.4, 131.11, 128.5, NaN, 114.5, 126.25,
107.4, 124.67, NaN, 120.5, NA, 108, 110.5, 114.33, 125, NaN,
114.67, 108, 123.5, 126.67, 105.5, 129.67, 117.75, 121, 120,
NA, 108.5, 122.83, 130.8, NA, 123.67, 119, NaN, NA, NA, NaN,
109.75, NA, 119, 114.75, 128.88, 115.25, 117, 134, NA, NA,
NA, NaN, 113, 131.86, 110.67, 133.57, 138.33, 127.5, 118.25,
NA, 112.8, 111.5, 113, NaN, NaN, 114.25, 118, 112.8), ARG_G1_200_AAC = c(123.25,
111.6, NaN, 121.29, NaN, NaN, 116.33, 133.4, 132.89, 130.5,
NaN, 115.5, 129.5, 108.2, 128.33, NaN, 123, NA, 108, 111.5,
115.67, 125, NaN, 118, 112, 125.17, 129, 105.75, 130.33,
119.5, 121.4, 121, NA, 109.75, 124.33, 133.4, NA, 125, 120.33,
NaN, NA, NA, NaN, 110.75, NA, 123, 124, 129.75, 117.5, 117.2,
134, NA, NA, NA, NaN, 116, 134.43, 111.33, 135, 141.33, 129.5,
119.5, NA, 114, 113.5, 113, NaN, NaN, 115.5, 120.6, 114),
ARG_G1_200_AC = c(12, 15.6, NaN, 8, NaN, NaN, 10.83, 7.8,
5.33, 6, NaN, 16.5, 6.75, 31.2, 9.33, NaN, 18, NA, 30, 14.5,
13, 11, NaN, 19.67, 17, 9, 9.33, 25.5, 8, 16.25, 9.6, 9,
NA, 16, 12.67, 6.2, NA, 13.67, 11.67, NaN, NA, NA, NaN, 17.5,
NA, 17, 9, 9.5, 14.75, 15.8, 8, NA, NA, NA, NaN, 23, 10.43,
21.33, 5.71, 4.67, 10.25, 13.25, NA, 14.6, 13.25, 19, NaN,
NaN, 21.5, 13.2, 14.6), ARG_G1_200_AB = c(14, 19.4, NaN,
8.71, NaN, NaN, 12.5, 9, 6, 8, NaN, 16.5, 8.5, 31.8, 11,
NaN, 21, NA, 30, 15.5, 13, 15, NaN, 24, 18, 9, 10, 27, 9,
17.25, 10.8, 12, NA, 17, 13.5, 7.2, NA, 14.67, 14, NaN, NA,
NA, NaN, 17.5, NA, 17.67, 9, 10.88, 14.75, 17, 9.67, NA,
NA, NA, NaN, 24, 10.43, 23.33, 5.71, 4.67, 10.5, 15, NA,
14.8, 14.75, 21, NaN, NaN, 23.25, 13.8, 15.8), ARG_G1_50_AAA = c(36.35,
35.88, 36.22, 35.72, 36.12, 36.96, 35.24, 37.62, 36.05, 34.63,
34.19, 33.71, 36.22, 34.43, 34.95, 34.59, 36.03, NA, 32.61,
35.29, 37.17, 37.13, 35.62, 34.64, 34.4, 35.69, 37.36, 36.4,
36.69, 35.8, 36.57, 35.97, NA, 36.44, 34.94, 35.26, NA, 34.44,
37.85, 33.15, NA, NA, 36.13, 34.91, NA, 35.54, 29.02, 35.55,
35.64, 35.79, 35.93, NA, NA, NA, 37, 32.58, 35.71, 34.98,
36.64, 33.29, 35.29, 37.2, NA, 36.29, 36.91, 31.26, 34, 37.48,
33.89, 36.34, 35.88), ARG_G1_50_AAC = c(41.19, 38.7, 41.22,
40.53, 44.12, 41.04, 40.18, 42.38, 42.17, 41.87, 38, 41.21,
42.24, 38.69, 42.64, 42.14, 41.53, NA, 39.65, 40.76, 41.88,
42.23, 39.62, 41.55, 38.19, 42.53, 42.24, 39.49, 42.07, 43.3,
40.92, 39.92, NA, 40.35, 40.49, 44.11, NA, 41.72, 40.64,
36.15, NA, NA, 39.03, 40.86, NA, 40.93, 37.95, 42.27, 39.47,
39.72, 42.12, NA, NA, NA, 42.11, 39.81, 42.82, 39.12, 42.67,
43.02, 43.58, 42.61, NA, 40.04, 41.42, 40.9, 41.5, 41.62,
40.02, 41.08, 40.18), ARG_G1_50_AC = c(0.98, 1.5, 0.37, 0.6,
0.88, 0.73, 1.51, 0.23, 0.25, 0.42, 1.67, 1.58, 0.31, 3.27,
0.62, 0.05, 0.83, NA, 3.71, 1.47, 1.07, 0.1, 1.81, 1.19,
1.62, 0.61, 0.76, 1.73, 0.24, 0.64, 0.33, 0.97, NA, 0.6,
1.98, 0.34, NA, 1.69, 0.26, 2.12, NA, NA, 1.5, 1.14, NA,
1, 0.65, 0.88, 1.62, 1.3, 0.39, NA, NA, NA, 0.57, 1.48, 0.58,
2.21, 0.43, 0.24, 0.16, 0.65, NA, 0.96, 0.4, 1.13, 1.5, 1.05,
1.91, 0.7, 0.94), ARG_G1_50_AB = c(1.09, 2.24, 0.74, 0.68,
0.88, 0.73, 1.82, 0.38, 0.36, 0.89, 1.67, 1.58, 0.76, 3.27,
0.83, 0.45, 1.15, NA, 3.71, 1.82, 1.07, 1.16, 2.25, 1.93,
1.86, 0.61, 1, 2.09, 0.31, 0.86, 0.61, 1.73, NA, 0.77, 2.18,
0.34, NA, 1.92, 0.49, 2.12, NA, NA, 1.5, 1.14, NA, 1.2, 0.65,
0.88, 1.62, 1.49, 0.63, NA, NA, NA, 0.57, 1.77, 0.58, 2.6,
0.43, 0.24, 0.16, 0.85, NA, 0.96, 0.4, 1.84, 1.5, 1.05, 2.4,
0.76, 1.14), ARG_G2_100_AAA = c(64.9, 63.8, 71.73, 67.67,
NA, NA, 52.5, 72.35, 65.28, 57.22, NA, NaN, 69, 66.67, NaN,
66.58, 69, 60.55, 56.29, 67.45, 68.4, 64.25, NaN, 50.86,
67.83, 65.96, 57, 53.07, 66.89, NaN, NA, 59, 61.5, NA, 65.9,
64.07, NA, NA, 57.91, 67.89, 68.75, 68.5, NaN, 63.24, 66.19,
60.59, 59.24, 54.33, 64.39, 65.83, 65.71, 63, 63.78, 63.62,
64, 65.08, NA, 67.61, 67.57, 72.71, 65.46, 61.71, NA, 57.62,
NA, NA, NA, 64, 61.33, 62.64, NA), ARG_G2_100_AAC = c(65.7,
65.8, 74.45, 68, NA, NA, 53.75, 73.94, 67.24, 58.22, NA,
NaN, 71.07, 68.07, NaN, 69.88, 71.32, 62.18, 58.65, 76.45,
71.13, 67.25, NaN, 51.76, 69.33, 68.17, 58, 54.27, 68.05,
NaN, NA, 61, 61.67, NA, 67.79, 65.93, NA, NA, 59.27, 69.67,
71.38, 70, NaN, 64.88, 68.19, 62.06, 61, 55.48, 65.67, 67.72,
68.47, 64, 65.11, 66, 67.5, 66.33, NA, 69.61, 69.33, 75.67,
68.17, 63, NA, 58.81, NA, NA, NA, 66.5, 62.33, 65, NA), ARG_G2_100_AC = c(7.1,
6.4, 0.18, 3.67, NA, NA, 12.75, 1.24, 2.96, 9.78, NA, NaN,
1.43, 1.33, NaN, 5.21, 2.76, 7.91, 8.06, 2.36, 2.87, 4, NaN,
15.52, 2.67, 4.17, 13, 10.07, 5.05, NaN, NA, 9.5, 8.17, NA,
5.86, 3.87, NA, NA, 7, 3.33, 1.75, 3, NaN, 7.94, 3.11, 5.29,
5.29, 13.1, 3.78, 3.33, 3.06, 5.18, 2.56, 5.04, 5.5, 5.75,
NA, 2.22, 2.48, 1, 3.83, 4.82, NA, 8.19, NA, NA, NA, 5, 6.44,
5.29, NA), ARG_G2_100_AB = c(7.1, 7.4, 1.09, 3.67, NA, NA,
12.75, 1.24, 3.28, 9.78, NA, NaN, 1.71, 1.93, NaN, 6.21,
2.76, 7.91, 8.65, 3.55, 3.4, 5, NaN, 16.05, 3.39, 4.52, 13,
11.6, 5.05, NaN, NA, 9.5, 9.67, NA, 7.03, 3.87, NA, NA, 8,
3.33, 2.19, 3, NaN, 8.53, 3.37, 5.47, 7.35, 13.48, 5.33,
3.83, 3.65, 5.82, 4, 6.17, 6, 6.42, NA, 3.83, 2.71, 2.19,
4.58, 5.18, NA, 9.75, NA, NA, NA, 5, 6.44, 5.36, NA), ARG_G2_150_AAA = c(85.25,
NaN, 99, NaN, NA, NA, 66.86, 101, 89.31, 71.33, NA, NaN,
94.5, 88.57, NaN, 95, 95.5, 81.5, 78.5, 107.75, 93.43, NaN,
NaN, 66.18, 92.33, 92.25, NaN, 67.43, 87.44, NaN, NA, NaN,
78, NA, 89.81, 86.43, NA, NA, 75.75, 91.67, 95, NaN, NaN,
85.12, 91.47, 81.88, 79.38, 72.45, 87.67, 91.22, 90.88, 83,
85, 89.23, NaN, 86.2, NA, 92, 93.09, 100.27, 88.62, 83.88,
NA, 75, NA, NA, NA, NaN, 80, 83.5, NA), ARG_G2_150_AAC = c(86.75,
NaN, 101, NaN, NA, NA, 67.29, 103.75, 91.15, 71.67, NA, NaN,
96.33, 88.86, NaN, 96.23, 97.5, 83.5, 79.12, 109.5, 95, NaN,
NaN, 66.45, 93.56, 93.42, NaN, 68, 88.33, NaN, NA, NaN, 78,
NA, 91.69, 87, NA, NA, 76.75, 93, 96.88, NaN, NaN, 85.5,
92.67, 83.38, 80.25, 73.09, 88.33, 92.44, 92.38, 84.25, 85.33,
91.23, NaN, 87.8, NA, 92.67, 94.09, 102.09, 90.15, 84.75,
NA, 76.14, NA, NA, NA, NaN, 81, 85.67, NA), ARG_G2_150_AC = c(15.75,
NaN, 1, NaN, NA, NA, 25.71, 2.62, 6.85, 19.33, NA, NaN, 3.83,
4.57, NaN, 9.85, 6.5, 15.5, 13.88, 3.75, 6.29, NaN, NaN,
27.36, 5.67, 8.42, NaN, 18.86, 11.33, NaN, NA, NaN, 19, NA,
11.25, 9.57, NA, NA, 12.75, 6, 4.5, NaN, NaN, 15.75, 5.67,
10.75, 9.75, 24.82, 8.67, 6.67, 5.88, 13.25, 7, 10, NaN,
10.6, NA, 6.56, 4.18, 2.55, 8.54, 9.75, NA, 17.86, NA, NA,
NA, NaN, 15.67, 13.17, NA), ARG_G2_150_AB = c(15.75, NaN,
2, NaN, NA, NA, 25.71, 2.62, 8.69, 19.33, NA, NaN, 4.33,
5.43, NaN, 11.31, 6.5, 15.5, 14.75, 6, 7.14, NaN, NaN, 28.27,
7.22, 9, NaN, 21.29, 11.33, NaN, NA, NaN, 22, NA, 13.44,
9.71, NA, NA, 14.75, 6, 5.12, NaN, NaN, 16.75, 6, 11.25,
12.75, 25.36, 11.11, 7.33, 6.62, 14.25, 9.33, 11.62, NaN,
11.8, NA, 9.22, 4.91, 4.64, 10, 10.38, NA, 19.86, NA, NA,
NA, NaN, 15.67, 13.33, NA), ARG_G2_200_AAA = c(NaN, NaN,
125, NaN, NA, NA, 81.33, 129.5, 112.25, NaN, NA, NaN, 117.5,
108.33, NaN, 120, 119.25, 99, 94, 134, 113.67, NaN, NaN,
77.67, 112.25, 112.86, NaN, 78.33, 106.6, NaN, NA, NaN, NaN,
NA, 112.4, 106.67, NA, NA, 93, NaN, 122, NaN, NaN, 104.25,
114.89, 101.25, 96.75, 87, 107, 112.25, 112.25, 100, NaN,
111.86, NaN, 101, NA, 114, 114.5, 124.17, 108.86, 103.25,
NA, 90.67, NA, NA, NA, NaN, NaN, 99, NA), ARG_G2_200_AAC = c(NaN,
NaN, 126, NaN, NA, NA, 82.33, 129.75, 113.5, NaN, NA, NaN,
118, 109.33, NaN, 120.71, 120.25, 101, 94.25, 136, 114, NaN,
NaN, 78, 114, 114, NaN, 78.67, 106.8, NaN, NA, NaN, NaN,
NA, 114, 108.33, NA, NA, 93, NaN, 123, NaN, NaN, 104.25,
116.67, 102.75, 97.25, 87.67, 107.75, 113.25, 113.25, 101,
NaN, 113.14, NaN, 101, NA, 114.5, 115, 126.17, 111.29, 104.25,
NA, 92, NA, NA, NA, NaN, NaN, 99, NA), ARG_G2_200_AC = c(NaN,
NaN, 1, NaN, NA, NA, 36, 5.25, 12.25, NaN, NA, NaN, 8.5,
8.33, NaN, 14.29, 11.38, 24, 22.25, 6, 11.67, NaN, NaN, 42.5,
9.25, 13.14, NaN, 32, 19.4, NaN, NA, NaN, NaN, NA, 15.6,
17, NA, NA, 24, NaN, 6.67, NaN, NaN, 21.5, 8.89, 17.5, 16,
37.83, 15.75, 12.25, 11.75, 20, NaN, 15.43, NaN, 26, NA,
12.25, 7.5, 5.67, 12.86, 14.75, NA, 27, NA, NA, NA, NaN,
NaN, 28.5, NA), ARG_G2_200_AB = c(NaN, NaN, 2, NaN, NA, NA,
36, 5.25, 16, NaN, NA, NaN, 10, 9.33, NaN, 16.57, 11.38,
24, 23.25, 9, 13, NaN, NaN, 44.33, 11.5, 14.29, NaN, 35,
19.4, NaN, NA, NaN, NaN, NA, 18.8, 17.33, NA, NA, 26, NaN,
7.67, NaN, NaN, 22.5, 9.33, 18.25, 20.25, 38.67, 19, 13.25,
13.25, 22, NaN, 18, NaN, 28, NA, 15.75, 8.83, 8.17, 15.14,
16, NA, 29.33, NA, NA, NA, NaN, NaN, 29, NA), ARG_G2_50_AAA = c(36.97,
35.4, 34.72, 33.81, NA, NA, 32.98, 35.7, 35.59, 35.36, NA,
36, 37.66, 36.35, 33.44, 34.72, 36.9, 34.32, 32.28, 33.74,
36.38, 35.06, 34.5, 31.47, 36.59, 36.18, 34.75, 31.9, 36.53,
32.62, NA, 33.85, 34.86, NA, 35.36, 34.52, NA, NA, 33.68,
35.89, 36.24, 37.21, 28, 34.05, 36.3, 34.16, 32.86, 32.06,
34.65, 35.57, 35.95, 33.19, 34.61, 34.6, 34.92, 34.24, NA,
34.33, 35.65, 36.16, 33.91, 34.37, NA, 33.44, NA, NA, NA,
33.93, 33.71, 35.42, NA), ARG_G2_50_AAC = c(40.2, 38.6, 42.09,
39.25, NA, NA, 35.68, 41.41, 39.12, 37.68, NA, 39, 41.16,
40.67, 36.11, 39.25, 40.65, 37.52, 35.14, 41.26, 41.13, 40.71,
36.25, 33.33, 40.59, 39.67, 36.83, 34.44, 40.57, 34, NA,
37, 36.45, NA, 39.52, 38.17, NA, NA, 36.52, 40.39, 40.69,
41.21, 29, 39.63, 40.23, 37.27, 36.58, 34.45, 38.87, 38.98,
39.51, 38.13, 37.68, 37.88, 38.85, 38.48, NA, 40, 40.43,
42.73, 39.93, 38.19, NA, 36.41, NA, NA, NA, 39.71, 36.43,
38.03, NA), ARG_G2_50_AC = c(0.8, 1.9, 0, 0.5, NA, NA, 2.93,
0.52, 0.58, 2.75, NA, 1.25, 0.21, 0.25, 2.11, 2, 0.85, 2.03,
2.67, 0.71, 0.82, 0.29, 0.75, 4.27, 0.63, 0.78, 2.92, 2.77,
1.17, 4.88, NA, 3, 2.64, NA, 1.78, 0.98, NA, NA, 2.29, 0.82,
0.45, 0.93, 6, 1.67, 0.86, 1.27, 1.79, 3.37, 1.11, 0.74,
0.79, 1.1, 0.71, 1.11, 1.08, 2.48, NA, 0.17, 0.75, 0.22,
0.91, 1.19, NA, 1.66, NA, NA, NA, 1.07, 1.75, 1.42, NA),
ARG_G2_50_AB = c(0.8, 2, 0.31, 0.5, NA, NA, 2.93, 0.52, 0.58,
2.75, NA, 1.25, 0.34, 0.5, 3.33, 2.44, 0.85, 2.03, 2.91,
1.42, 1, 0.94, 0.75, 4.63, 0.85, 0.96, 2.92, 3.49, 1.17,
4.88, NA, 3, 3.36, NA, 2.3, 0.98, NA, NA, 2.61, 0.82, 0.52,
0.93, 6, 1.91, 1.02, 1.34, 2.58, 3.67, 1.59, 0.96, 1.09,
1.39, 1.5, 1.65, 1.15, 2.76, NA, 0.93, 0.8, 0.82, 1.25, 1.44,
NA, 2.49, NA, NA, NA, 1.07, 1.75, 1.47, NA), NARR_G1_100_AAA = c(71.32,
NA, NA, 67.83, NaN, 71.6, 64.2, 71.68, 73.29, 70.53, 73.35,
59.31, 71.08, 74.06, 68.7, 74, 69.08, NA, 68.52, 63.47, 68.33,
NA, 65.64, 62.11, 63.9, 70.41, 60.36, 65.88, 68.81, 69.62,
70.68, 67.5, NA, 68.45, 67.16, 74.39, 60.6, 65.89, 71.94,
68.75, NA, NA, 67, 66.85, NA, NA, 62.56, 73.33, 69.81, 67.68,
73.06, 65.8, 63.85, NA, 67.64, 71.6, 68.47, 69.39, 71.16,
72.33, NA, 66.68, NA, 66.22, 67, 61.27, NaN, 72.33, 68.29,
71.33, 65.57), NARR_G1_100_AAC = c(74.26, NA, NA, 70.94,
NaN, 75, 66.14, 74.48, 77.07, 73.47, 76, 60.44, 73.92, 77.19,
71.4, 77.59, 72, NA, 70.38, 65.47, 70.54, NA, 68.09, 64.61,
66.5, 72.52, 62.59, 69.25, 71.48, 71.88, 74.4, 70.1, NA,
70, 69.6, 78.04, 62.3, 68.79, 73.44, 72.25, NA, NA, 67, 68.25,
NA, NA, 65.94, 75.71, 72.43, 69.68, 76, 68.6, 65.65, NA,
70.43, 74, 71.76, 71.17, 74.63, 74.22, NA, 69.47, NA, 68.72,
67, 62.82, NaN, 77.33, 69.76, 75.42, 67.62), NARR_G1_100_AC = c(3.05,
NA, NA, 2.33, NaN, 2.4, 1.89, 0.84, 0.07, 5.47, 1.12, 8.81,
2.39, 1.38, 3.6, 0.88, 2.65, NA, 2.05, 5.18, 2.38, NA, 5,
4.78, 6.4, 1.85, 7.41, 3.69, 1.85, 2.62, 1.28, 3.9, NA, 2.35,
3.8, 1.87, 5.1, 6.95, 1.67, 4.5, NA, NA, 4, 4.25, NA, NA,
7.17, 1.29, 2.62, 1.37, 1.47, 3.3, 7.27, NA, 3.64, 3.6, 2.59,
4.83, 0.63, 2.28, NA, 6.58, NA, 4.56, 6, 4.82, NaN, 0.67,
3.95, 1.75, 4.38), NARR_G1_100_AB = c(3.42, NA, NA, 3.17,
NaN, 2.5, 3.29, 1.64, 1.07, 6, 1.41, 9.25, 3.25, 2.69, 3.8,
1.32, 3.04, NA, 2.38, 5.18, 2.38, NA, 6.18, 6.11, 6.4, 1.85,
7.45, 3.69, 1.89, 3.25, 1.6, 4.8, NA, 2.8, 4.32, 2.3, 6.6,
7.42, 2.83, 4.75, NA, NA, 5, 4.75, NA, NA, 8, 1.71, 2.67,
2.05, 1.47, 4.8, 7.96, NA, 4.43, 3.8, 4.47, 4.91, 1.68, 2.78,
NA, 6.58, NA, 6.67, 6, 5.18, NaN, 1.67, 4.86, 2.08, 4.38),
NARR_G1_150_AAA = c(102, NA, NA, 96.22, NaN, 105.33, 87.1,
100.14, 106.17, 97.67, 99.88, 75.43, 99.62, 106.86, 95.3,
105.68, 97.14, NA, 92.82, 87.25, 96.23, NA, 88.5, 83.56,
89.75, 98.47, 80.64, 92.14, 96.07, 94.62, 99.46, 100, NA,
92.6, 94.54, 106.25, 82.5, 93.6, 100.33, 95, NA, NA, NaN,
90.9, NA, NA, 87.89, 101.08, 96.18, 95, 103.12, 92.75, 85.71,
NA, 94.17, NaN, 95.25, 97.5, 100.67, 100.44, NA, 90.9, NA,
90.11, NaN, 81.5, NaN, NaN, 94.45, 100.4, 91.64), NARR_G1_150_AAC = c(103.2,
NA, NA, 97.67, NaN, 106.67, 88.55, 102.43, 109.17, 98.78,
103.25, 76.57, 102.05, 109.43, 97.4, 108.42, 99.29, NA, 94.73,
89, 98, NA, 89.75, 85, 91.75, 100.47, 81.64, 93.14, 97.73,
96, 101.08, 101.33, NA, 94.1, 95.92, 110.33, 83.25, 95.5,
101.67, 98, NA, NA, NaN, 93, NA, NA, 90.56, 102.38, 99, 96.78,
106.5, 94.25, 87.43, NA, 98.33, NaN, 99, 98.92, 103.44, 103,
NA, 93.8, NA, 92, NaN, 82.25, NaN, NaN, 95.45, 102.8, 93.82
), NARR_G1_150_AC = c(6.4, NA, NA, 5.78, NaN, 5, 4.85, 2.29,
0.5, 12.44, 2.5, 19, 4.71, 3, 8, 1.63, 5.86, NA, 4.82, 9.25,
4.08, NA, 10.75, 9.44, 12.25, 3.6, 15.73, 7.14, 3.73, 7.12,
4.08, 6.33, NA, 5.1, 6.62, 3.08, 10.25, 12.5, 4.56, 7.5,
NA, NA, NaN, 8.6, NA, NA, 13.67, 3.15, 6, 2.22, 2.5, 8, 15,
NA, 6, NaN, 5.5, 8.75, 2.44, 4.33, NA, 13.9, NA, 8.78, NaN,
13.75, NaN, NaN, 7.73, 4.4, 9.36), NARR_G1_150_AB = c(7,
NA, NA, 7.33, NaN, 5.33, 7.4, 3.71, 2.17, 13.33, 2.88, 20.14,
6, 5.14, 8.5, 2.42, 6.43, NA, 5.18, 9.25, 4.08, NA, 12.5,
11.56, 12.25, 3.6, 15.73, 7.14, 4, 8.12, 4.46, 7.33, NA,
5.9, 7.54, 3.67, 13, 13.3, 6.78, 8, NA, NA, NaN, 9.1, NA,
NA, 15.11, 4.15, 6.09, 3.22, 2.5, 10.5, 16.29, NA, 7.33,
NaN, 8.38, 8.83, 4, 5.22, NA, 13.9, NA, 12.11, NaN, 15.25,
NaN, NaN, 9.27, 5, 9.36), NARR_G1_200_AAA = c(127.8, NA,
NA, 120.25, NaN, NaN, 105.85, 126.62, 134.5, 121.4, 126.25,
89.33, 126.23, 136, 120.4, 133.17, 124, NA, 115.5, 106.5,
120.86, NA, 115, 104.25, NaN, 123.22, 100, 114, 120.22, 115.67,
124.38, NaN, NA, 112.6, 119, 137.29, NaN, 118.4, 127, NaN,
NA, NA, NaN, 113.8, NA, NA, 111.5, 123.57, 122.33, 118.8,
130, NaN, 106.38, NA, 123.5, NaN, 123.75, 123.29, 127.2,
126.5, NA, 113.8, NA, 113.75, NaN, 101, NaN, NaN, 117.83,
125, 114.5), NARR_G1_200_AAC = c(130, NA, NA, 123, NaN, NaN,
107.54, 128.75, 136.5, 123, 128.5, 90, 128, 137.33, 121.6,
136.92, 125.5, NA, 117, 108.25, 122.29, NA, 115, 105, NaN,
125.11, 102, 116, 122.33, 117.33, 126.25, NaN, NA, 114.6,
121.12, 138.86, NaN, 119.2, 127.75, NaN, NA, NA, NaN, 114.4,
NA, NA, 113, 124.43, 124, 120.6, 133, NaN, 107, NA, 124.5,
NaN, 127.75, 123.57, 129, 127.5, NA, 115.6, NA, 117, NaN,
101, NaN, NaN, 118.5, 129, 115.5), NARR_G1_200_AC = c(11.2,
NA, NA, 12.5, NaN, NaN, 9.31, 4.25, 2, 17.8, 4.5, 32.33,
7.77, 5.67, 13.4, 2.67, 9.62, NA, 7.67, 15, 6.14, NA, 16,
14.75, NaN, 6.22, 24.33, 11, 6.67, 14.33, 7.62, NaN, NA,
9.4, 9.75, 4.86, NaN, 18.6, 8.25, NaN, NA, NA, NaN, 13.8,
NA, NA, 21.75, 6.14, 9.33, 6, 4.5, NaN, 23.75, NA, 8.5, NaN,
6.75, 13.86, 3.8, 6.75, NA, 21.4, NA, 12.75, NaN, 20, NaN,
NaN, 12.83, 7, 15.83), NARR_G1_200_AB = c(12, NA, NA, 14.5,
NaN, NaN, 12.85, 6.38, 4.5, 18.8, 5.25, 34.67, 9.54, 8.67,
14.4, 4, 10.62, NA, 8.33, 15, 6.29, NA, 18, 17.5, NaN, 6.22,
24.33, 11.33, 7, 15.33, 8.12, NaN, NA, 10.8, 11, 5.71, NaN,
19.6, 10.75, NaN, NA, NA, NaN, 14.6, NA, NA, 24, 7.57, 9.5,
8, 5, NaN, 25.75, NA, 10.5, NaN, 10.5, 14, 6, 8.75, NA, 21.4,
NA, 17.75, NaN, 22, NaN, NaN, 15.5, 8, 15.83), NARR_G1_50_AAA = c(37.69,
NA, NA, 37.02, 35.38, 34.34, 36.19, 37.25, 36.78, 36.83,
36.61, 34.2, 34.24, 37.51, 35.74, 34, 35.02, NA, 37.4, 36.18,
36.63, NA, 34.42, 34.38, 35.43, 37.2, 34.49, 34.2, 36.41,
37.07, 36.56, 34.93, NA, 36.06, 36.49, 35.31, 33.33, 34.27,
36.5, 36.5, NA, NA, 34.21, 36.02, NA, NA, 34.02, 35.59, 37.16,
36.02, 37.58, 36.53, 35.46, NA, 36.46, 38.42, 36.05, 37.39,
37.3, 36.22, NA, 35.31, NA, 33.96, 35.55, 35.03, 35, 35.31,
36.54, 36.06, 34.98), NARR_G1_50_AAC = c(41.85, NA, NA, 40.71,
37.5, 42.38, 39.05, 41.98, 42.51, 42.47, 43.43, 36.41, 42.17,
43.27, 40.42, 43.1, 40.52, NA, 41.65, 38.82, 40.63, NA, 40.35,
39.18, 38.93, 41.44, 38.3, 39.54, 40.73, 41.83, 42.54, 40.34,
NA, 40.69, 40.31, 43.51, 36.13, 39.1, 41.65, 41.62, NA, NA,
38.57, 40.02, NA, NA, 38.26, 42.66, 41.55, 39.7, 42.91, 40.43,
38.87, NA, 40.86, 43.26, 40.55, 40.84, 42.13, 42.09, NA,
40.31, NA, 39.69, 39.73, 36.97, 37.71, 43.44, 40.44, 42.33,
39.65), NARR_G1_50_AC = c(0.77, NA, NA, 0.69, 2.25, 0.45,
0.59, 0.12, 0, 1.15, 0.34, 2.61, 0.61, 0.24, 0.64, 0.26,
0.79, NA, 0.19, 1.43, 0.65, NA, 1.39, 1.11, 1.87, 0.31, 1.98,
1.07, 0.54, 0.29, 0.24, 0.76, NA, 0.59, 1.05, 0.62, 2.17,
2.25, 0.33, 1.62, NA, NA, 1.36, 1.53, NA, NA, 2.22, 0.22,
0.65, 0.45, 0.42, 0.9, 2.18, NA, 0.97, 0.05, 0.84, 0.98,
0, 0.44, NA, 1.83, NA, 1.71, 0.91, 1.16, 1.86, 0.12, 0.69,
0.45, 1.24), NARR_G1_50_AB = c(0.88, NA, NA, 0.82, 2.25,
0.45, 1.03, 0.45, 0.54, 1.36, 0.55, 2.71, 0.96, 0.73, 0.64,
0.47, 0.97, NA, 0.29, 1.43, 0.65, NA, 1.81, 1.69, 1.87, 0.31,
2.02, 1.07, 0.54, 0.52, 0.39, 1.1, NA, 0.8, 1.31, 0.82, 2.9,
2.44, 0.74, 1.62, NA, NA, 1.86, 1.76, NA, NA, 2.48, 0.38,
0.67, 0.66, 0.42, 1.67, 2.38, NA, 1.43, 0.16, 1.64, 1.04,
0.57, 0.69, NA, 1.83, NA, 2.6, 0.91, 1.16, 2.71, 0.75, 0.98,
0.58, 1.24), NARR_G2_100_AAA = c(64.25, 59, NA, 67.88, 67.08,
NA, 60.75, 64.42, 71.17, 58.42, NA, 49.8, 63.36, 65.2, NaN,
70.2, 62.85, NaN, 61.6, 53.92, 62.63, NA, NaN, 50.46, 65.14,
60.58, 63.29, NA, 64.33, NaN, NA, 68.57, NA, NA, 66.3, NA,
57.29, NA, 53.5, 63.48, NA, 57.07, NaN, 61.82, NA, 68.61,
57.1, 62.84, 63, 61.91, 58.38, NaN, 61.56, NA, NaN, 65.55,
63.8, 65, 63.14, 67.31, 67.75, 57.62, 63.31, 54.83, 66.43,
NA, NA, 64.67, 57.92, 59, NA)), row.names = c(NA, -71L), class = "data.frame")
I would suggest pulling your column names into a data frame, separating them into their components, and ordering them as desired:
library(dplyr)
library(tidyr)
col_df = data.frame(names = names(merged_DF)[-1]) ## -1 to skip the ID col
col_df = col_df %>%
  separate(
    col = names, sep = "_",
    into = c("s1", "gnum", "num2", "astring"),
    remove = FALSE, convert = TRUE
  ) %>%
  arrange(s1, num2, astring, gnum)
## now we have the names in order:
col_df
#             names  s1 gnum num2 astring
# 1   ARG_G1_50_AAA ARG   G1   50     AAA
# 2   ARG_G2_50_AAA ARG   G2   50     AAA
# 3   ARG_G1_50_AAC ARG   G1   50     AAC
# 4   ARG_G2_50_AAC ARG   G2   50     AAC
# 5    ARG_G1_50_AB ARG   G1   50      AB
# 6    ARG_G2_50_AB ARG   G2   50      AB
# 7    ARG_G1_50_AC ARG   G1   50      AC
# 8    ARG_G2_50_AC ARG   G2   50      AC
# 9  ARG_G1_100_AAA ARG   G1  100     AAA
# 10 ARG_G2_100_AAA ARG   G2  100     AAA
# ...
## we can use this order to rearrange the columns
merged_DF = select(merged_DF, c(ID, col_df$names))
names(merged_DF)
# [1] "ID" "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC"
# [6] "ARG_G1_50_AB" "ARG_G2_50_AB" "ARG_G1_50_AC" "ARG_G2_50_AC" "ARG_G1_100_AAA"
# [11] "ARG_G2_100_AAA" "ARG_G1_100_AAC" "ARG_G2_100_AAC" "ARG_G1_100_AB" "ARG_G2_100_AB"
# [16] "ARG_G1_100_AC" "ARG_G2_100_AC" "ARG_G1_150_AAA" "ARG_G2_150_AAA" "ARG_G1_150_AAC"
# [21] "ARG_G2_150_AAC" "ARG_G1_150_AB" "ARG_G2_150_AB" "ARG_G1_150_AC" "ARG_G2_150_AC"
# [26] "ARG_G1_200_AAA" "ARG_G2_200_AAA" "ARG_G1_200_AAC" "ARG_G2_200_AAC" "ARG_G1_200_AB"
# [31] "ARG_G2_200_AB" "ARG_G1_200_AC" "ARG_G2_200_AC" "NARR_G1_50_AAA" "NARR_G1_50_AAC"
# [36] "NARR_G1_50_AB" "NARR_G1_50_AC" "NARR_G1_100_AAA" "NARR_G2_100_AAA" "NARR_G1_100_AAC"
# [41] "NARR_G1_100_AB" "NARR_G1_100_AC" "NARR_G1_150_AAA" "NARR_G1_150_AAC" "NARR_G1_150_AB"
# [46] "NARR_G1_150_AC" "NARR_G1_200_AAA" "NARR_G1_200_AAC" "NARR_G1_200_AB" "NARR_G1_200_AC"
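The same split-and-order idea can also be sketched in base R, without building a helper data frame. This is only an illustration on a handful of made-up names in the same TYPE_GROUP_NUMBER_SUFFIX layout; the vector v below is an assumption, not the real merged_DF columns.

```r
# Hypothetical column names in the same TYPE_GROUP_NUMBER_SUFFIX layout
v <- c("ARG_G2_50_AAA", "ARG_G1_100_AB", "NARR_G1_50_AC",
       "ARG_G1_50_AAA", "ARG_G2_100_AB", "ARG_G1_50_AC")

# Split every name on "_" and bind the pieces into a character matrix
parts <- do.call(rbind, strsplit(v, "_", fixed = TRUE))

# Order by type, numeric part, suffix, then group (same keys as arrange() above)
ord <- order(parts[, 1], as.integer(parts[, 3]), parts[, 4], parts[, 2])
v[ord]
```

With the real data, names(merged_DF)[-1] would take the place of v, and merged_DF[c("ID", v[ord])] would reorder the columns.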
I bet that there are simpler ways of doing this but this one seems to work.
intercalate <- function(X, pattern) {
f <- function(h, n) {
i <- seq(1, length(h), by = 2)
j <- seq(2, length(h), by = 2)
h[order(c(i, j))]
}
#
g <- function(x, y) {
nx <- length(x)
ny <- length(y)
if(nx == ny) {
h <- c(x, y)
f(h, nx)
} else if(nx > ny) {
h <- c(x[seq_along(y)], y)
h <- f(h, ny)
c(h, x[-seq_along(y)])
} else {
h <- c(x, y[seq_along(x)])
h <- f(h, nx)
c(h, y[-seq_along(x)])
}
}
#
s <- grepl(pattern = pattern, X)
s <- abs(c(0, diff(s)))
sp <- split(X, cumsum(s))
i_odd <- seq(1, length(sp), by = 2)
i_even <- seq(2, length(sp), by = 2)
new_names <- mapply(g, sp[i_odd], sp[i_even])
unname(unlist(new_names))
}
newnames <- intercalate(names(merged_DF)[-1], pattern = "G2")
newnames <- c(names(merged_DF)[1], newnames)
merged_DF[newnames]
This is probably insufficient for the task:
strings <- c('ARG_G1_50_AAA' ,'ARG_G1_50_AAC', 'ARG_G1_50_AC' ,'ARG_G1_50_AB',
'ARG_G2_50_AAA' ,'ARG_G2_50_AAC', 'ARG_G2_50_AC')
substring(strings, regexpr('_\\K[[:upper:]]{2,3}', strings, perl = TRUE), nchar(strings))
[1] "AAA" "AAC" "AC" "AB" "AAA" "AAC" "AC"
idx_strings <- order(substring(strings, regexpr('_\\K[[:upper:]]{2,3}', strings, perl = TRUE), nchar(strings)))
idx_strings
[1] 1 5 2 6 4 3 7
> strings[idx_strings]
[1] "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC"
[5] "ARG_G1_50_AB" "ARG_G1_50_AC" "ARG_G2_50_AC"
The following gets nearly the desired 'set1' ordering for the 'NARR_' and 'ARG_' names.
For 'NARR_', using @akrun's data v1 (though elements [7] and [8] appear reversed):
idx_v1_N <- which(regexpr('^[N]', v1, perl = TRUE) == 1)
v1[idx_v1_N[order(
substring(v1[idx_v1_N],
regexpr('[^_.G][\\d_]\\d.+[[:upper:]]', v1[idx_v1_N], perl = TRUE),
nchar(v1[idx_v1_N]))[idx_v1_N])]]
[1] "NARR_G1_100_AAC" "NARR_G1_100_AB" "NARR_G2_100_AC" "NARR_G1_150_AAC"
[5] "NARR_G1_150_AB" "NARR_G1_100_AAA" "NARR_G2_150_AAA" "NARR_G2_100_AAA"
[9] "NARR_G1_100_AC" "NARR_G1_150_AAA" "NARR_G1_150_AC" "NARR_G2_100_AAC"
[13] "NARR_G2_50_AB" "NARR_G1_50_AC" "NARR_G1_50_AAA" "NARR_G2_150_AB"
[17] "NARR_G2_150_AAC" "NARR_G2_50_AC" "NARR_G1_50_AAC" "NARR_G2_150_AC"
[21] "NARR_G1_50_AB" "NARR_G2_100_AB" "NARR_G2_50_AAA" "NARR_G2_50_AAC"
The substring() and regexpr() calls with the pattern '[^_.G][\\d_]\\d.+[[:upper:]]' return:
substring(v1[idx_v1_N], regexpr('[^_.G][\\d_]\\d.+[[:upper:]]', v1[idx_v1_N], perl = TRUE), nchar(v1[idx_v1_N]))
[1] "1_100_AB" "1_150_AAC" "2_50_AB" "1_150_AB" "2_100_AAA" "1_100_AAC"
[7] "1_150_AAA" "2_100_AC" "1_100_AAA" "1_150_AC" "2_100_AAC" "2_150_AAA"
[13] "1_100_AC" "1_50_AC" "1_50_AAA" "2_150_AB" "2_150_AAC" "2_50_AC"
[19] "1_50_AAC" "2_150_AC" "1_50_AB" "2_100_AB" "2_50_AAA" "2_50_AAC"
which is then ordered nearly correctly. Results for 'ARG_' just need an index for names starting with 'A'. There are better hammers for this nail, as seen above.

All the column NA values in a dataframe fill with median values in R

I need to fill the null values of all the numerical columns with each column's median value in a data frame. I did the following code.
median_forNumericalNulls <- function(dataframe){
nums <- unlist(lapply(dataframe, is.numeric))
df_num <- dataframe[ , nums]
df_num[] <- lapply(df_num, function(x) {
x[is.na(x)] <- median(x, na.rm = TRUE)
x
})
return(dataframe)
}
median_forNumericalNulls(A)
A is the parent table, which consists of both numerical as well as categorical variables. How can I replace the columns of 'A' dataframe with the output of the function median_forNumericalNulls?
Is there a better way that we can do the same?
Maybe we need to change the function so that it subsets and updates the columns of the original data frame directly, instead of creating another object and updating that:
median_forNumericalNulls <- function(dataframe){
nums <- unlist(lapply(dataframe, is.numeric))
dataframe[nums] <- lapply(dataframe[nums], function(x) {
x[is.na(x)] <- median(x, na.rm = TRUE)
x
})
dataframe
}
Testing:
A <- median_forNumericalNulls(A)
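As a quick check of the in-place update idea, here is a minimal sketch on a toy data frame with mixed column types. The name fill_numeric_na_with_median and the data A_toy are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: same idea as above, assigning back into the data frame
fill_numeric_na_with_median <- function(dataframe) {
  nums <- vapply(dataframe, is.numeric, logical(1))
  dataframe[nums] <- lapply(dataframe[nums], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  dataframe
}

# Toy data: two numeric columns and one character column, each with an NA
A_toy <- data.frame(x = c(1, NA, 3, 10),
                    y = c("a", NA, "c", "d"),
                    z = c(NA, 2L, 4L, 6L))

A_toy <- fill_numeric_na_with_median(A_toy)
A_toy  # NAs in x and z are filled; the NA in y is left alone
```

The key point is that the result is assigned back (A_toy <- ...); calling the function without assignment leaves the original object unchanged.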
This can also be done compactly with na.aggregate from zoo:
library(zoo)
A <- na.aggregate(A, FUN = median)
Or using tidyverse
library(dplyr)
A <- A %>%
mutate(across(where(is.numeric),
~ replace(., is.na(.), median(., na.rm = TRUE))))
Here is another approach:
Example:
library(dplyr)
iris1 <- iris %>%
select(1, 2, 5)
head(iris1, 10) %>%
as_tibble() %>%
mutate(across(where(is.numeric), ~ifelse(.<= 3, NA, .))) %>%
mutate(across(where(is.numeric), ~ifelse(is.na(.), median(.,na.rm = TRUE), .)))
Sepal.Length Sepal.Width Species
<dbl> <dbl> <fct>
1 5.1 3.5 setosa
2 4.9 3.4 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5 3.6 setosa
6 5.4 3.9 setosa
7 4.6 3.4 setosa
8 5 3.4 setosa
9 4.4 3.4 setosa
10 4.9 3.1 setosa
Base R solution:
# Function to determine which data.frame columns are numeric:
# resolve_num_cols => function()
resolve_num_cols <- function(df){
# Store the indices of the numeric columns:
# num_cols => integer vector
num_cols <- which(
vapply(
df,
is.numeric,
logical(1),
USE.NAMES = FALSE
)
)
# Explicitly define the returned object: integer vector => env
return(num_cols)
}
# Function to impute median values for each numeric vector in data.frame
# impute_median_vals_in_df => function()
impute_median_vals_in_df <- function(df, num_col_idx){
# Replace the NA values in each numeric vector: df => data.frame
df[,num_col_idx] <- lapply(
num_col_idx,
function(col_idx){
df[,col_idx] <- ifelse(
is.na(df[,col_idx]),
median(df[,col_idx], na.rm = TRUE),
df[,col_idx]
)
}
)
# Return the data.frame object: data.frame => env
return(df)
}
# Apply the function to resolve the numeric vectors in data.frame:
# num_cols => integer vector
num_cols <- resolve_num_cols(df1)
# Apply the function: clean_df => data.frame
clean_df <- impute_median_vals_in_df(df1, num_cols)
Data used:
# Import data: df1 => data.frame
df1 <- structure(list(mpg = c(NA, 21, NA, 21.4, 18.7, 18.1, 14.3, 24.4,
22.8, 19.2, NA, 16.4, 17.3, NA, NA, 10.4, 14.7, 32.4, 30.4, 33.9,
21.5, NA, 15.2, 13.3, 19.2, 27.3, 26, NA, NA, 19.7, 15, 21.4),
cyl = c(NA, 6, NA, 6, 8, NA, NA, 4, 4, 6, 6, 8, 8, 8, 8,
8, 8, NA, NA, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, NA, NA, 4), disp = c(160,
160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6, 167.6,
NA, NA, 275.8, NA, 460, 440, 78.7, 75.7, 71.1, NA, 318, NA,
350, 400, NA, NA, 95.1, 351, NA, 301, 121), hp = c(110, 110,
93, 110, NA, 105, 245, 62, 95, 123, 123, 180, NA, 180, 205,
215, NA, 66, NA, 65, 97, NA, NA, 245, 175, 66, 91, NA, 264,
175, NA, 109), drat = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76,
3.21, NA, 3.92, 3.92, 3.92, 3.07, NA, 3.07, 2.93, 3, 3.23,
4.08, NA, 4.22, NA, 2.76, 3.15, 3.73, 3.08, NA, NA, 3.77,
4.22, 3.62, NA, NA), wt = c(2.62, 2.875, 2.32, 3.215, 3.44,
3.46, 3.57, NA, 3.15, 3.44, 3.44, 4.07, NA, NA, 5.25, 5.424,
5.345, 2.2, 1.615, 1.835, NA, NA, 3.435, 3.84, NA, NA, NA,
1.513, 3.17, 2.77, 3.57, 2.78), qsec = c(16.46, 17.02, 18.61,
19.44, NA, NA, NA, 20, NA, 18.3, 18.9, 17.4, 17.6, 18, NA,
17.82, NA, 19.47, 18.52, 19.9, NA, 16.87, NA, 15.41, 17.05,
18.9, 16.7, 16.9, 14.5, 15.5, 14.6, 18.6), vs = c(0, NA,
1, 1, NA, NA, 0, NA, 1, NA, 1, 0, 0, 0, 0, 0, 0, 1, NA, 1,
1, 0, 0, 0, NA, 1, NA, 1, 0, 0, 0, 1), am = c(NA, NA, NA,
0, NA, 0, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, NA,
0, 0, 0, NA, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4, 4, 3, NA,
3, 3, NA, NA, 4, 4, 3, 3, 3, 3, 3, NA, 4, 4, 4, 3, 3, 3,
3, 3, 4, 5, NA, NA, NA, NA, 4), carb = c(4, 4, 1, 1, 2, 1,
4, NA, NA, 4, NA, 3, NA, 3, NA, 4, 4, 1, 2, 1, 1, 2, 2, 4,
NA, NA, 2, 2, NA, 6, 8, 2)), row.names = c("Mazda RX4", "Mazda RX4 Wag",
"Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", "Valiant",
"Duster 360", "Merc 240D", "Merc 230", "Merc 280", "Merc 280C",
"Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood",
"Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic",
"Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin",
"Camaro Z28", "Pontiac Firebird", "Fiat X1-9", "Porsche 914-2",
"Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora",
"Volvo 142E"), class = "data.frame")

Impute missing values with average of previous 13 values

I have a dataset with a few missing observations. My objective is to impute the missing value in each variable with the average of the previous 13 values. If a missing value occurs before the 13th observation, the average of whatever values come before it should be used instead. I am not sure how to do it.
Please use the below to replicate my dataset. Your help is much appreciated.
df1 <- structure(list(V1 = c(276.12, 53.4, 20.64, 181.8, 216.96, 10.44,
69, 144.24, 10.32, 239.76, 79.32, 257.64, 28.56, 117, 244.92,
234.48, NA, 337.68, 83.04, 176.76, 262.08, 284.88, 15.84, NA,
74.76, 315.48, 171.48, 288.12, 298.56, 84.72, 351.48, 135.48,
NA, 318.72, 114.84, 348.84, 320.28, 89.64, 51.72, 273.6, 243,
212.4, 352.32, 248.28, NA, 210.12, 107.64, 287.88, 272.64, 80.28,
239.76, 120.48, 259.68, 219.12, 315.24, 238.68, 8.76, 163.44,
252.96), V2 = c(45.36, 47.16, 55.08, 49.56, 12.96, 58.68, 39.36,
NA, 2.52, 3.12, 6.96, 28.8, NA, 9.12, 39.48, 57.24, 43.92, 47.52,
24.6, 28.68, 33.24, 6.12, 19.08, 20.28, 15.12, 4.2, 35.16, NA,
32.52, 19.2, 33.96, 20.88, 1.8, 24, 1.68, NA, 52.56, 59.28, 32.04,
45.24, 26.76, 40.08, 33.24, 10.08, 30.84, 27, 11.88, 49.8, 18.96,
14.04, 3.72, 11.52, 50.04, 55.44, 34.56, NA, 33.72, 23.04, 59.52
)), class = "data.frame", row.names = c(NA, -59L))
You can use zoo::rollapply to compute the mean over the 13 values:
mean13 = zoo::rollapply(
df1$V1,
13,
function(x) {
mean(na.omit(x))
},
align = "right",
fill = NA,
partial = TRUE
)
df1$V1_prev_mean = c(df1$V1[1], head(mean13, -1))
df1$V1 = ifelse(is.na(df1$V1), df1$V1_prev_mean, df1$V1)
Output:
V1 V2 V1_prev_mean
1 276.1200 45.36 276.1200
2 53.4000 47.16 276.1200
3 20.6400 55.08 164.7600
4 181.8000 49.56 116.7200
5 216.9600 12.96 132.9900
6 10.4400 58.68 149.7840
7 69.0000 39.36 126.5600
8 144.2400 NA 118.3371
9 10.3200 2.52 121.5750
10 239.7600 3.12 109.2133
11 79.3200 6.96 122.2680
12 257.6400 28.80 118.3636
13 28.5600 NA 129.9700
14 117.0000 9.12 122.1692
15 244.9200 39.48 109.9292
16 234.4800 57.24 124.6615
17 141.1108 43.92 141.1108 # <- this row filled
18 337.6800 47.52 137.7200
19 83.0400 24.60 147.7800
20 176.7600 28.68 153.8300
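For readers without zoo, the same rule can be sketched in base R with a plain loop. This is a swapped-in technique, not the rollapply code above; the helper name impute_prev_mean and its parameter k are hypothetical. Windows are built from the original values, so an imputed entry does not feed into later means, which mirrors what the zoo version does.

```r
# Hypothetical helper: replace each NA with the mean of up to k previous values
impute_prev_mean <- function(x, k = 13) {
  orig <- x
  for (i in which(is.na(orig))) {
    if (i == 1) next                          # nothing before the first value
    window <- orig[max(1, i - k):(i - 1)]     # up to k previous original values
    if (any(!is.na(window))) x[i] <- mean(window, na.rm = TRUE)
  }
  x
}

# First 20 values of V1 from the question (row 17 is missing)
v1 <- c(276.12, 53.4, 20.64, 181.8, 216.96, 10.44, 69, 144.24, 10.32,
        239.76, 79.32, 257.64, 28.56, 117, 244.92, 234.48, NA, 337.68,
        83.04, 176.76)

v1_filled <- impute_prev_mean(v1)
v1_filled[17]  # mean of rows 4 to 16
```

Since both columns of df1 are numeric, df1[] <- lapply(df1, impute_prev_mean) would apply the same rule to every variable. A missing value in the very first position has no history and is left as NA.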

Match values in 2 dataframes, NA error

It is necessary to use the data presented here, for the sake of the problem.
I would like to match values from 2 dataframes. However, some values are not matched, and I cannot see why!
I will try to concisely explain my problem.
1) dataframe with theoretical values
#1.1) I have the following vector
Pos<-c(8.75, 9.3, 8.8, 9.6, 9.4, 11, NA, 13, 10.5, 12.31, 11.18, 13.06, 10.71, 12.5, 15.03, 15.26, 13.22, 15.25, 13.03, 15.28, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9.2, NA, 9.6, NA, 10.93, NA, 11.19, NA, 10.86, 10.3, 9.4, 9.1, 9.1, 9.4, 9.7, 8.9, 9.86, 9.2, 9.2, NA, NA, NA, NA, NA, NA, NA, 10.9, NA, NA, 10.92, 10.69, 9.91, 10.01, NA, 10.66, NA, 10.38, NA, 11.4, 7.4, 7.3, 9, 9.6, NA, NA, 8, 9.3, NA, NA, 9.33, 9.9, 9.9, 11.2, 6.9, 7.3, 7, 8.7, 7.4, 8.6, 7.6, 9.24, 8.59, 8.6, 8.46, NA, 8.21, 9, 6.6, 8.5, 8.5, 10.2, 9.6, 9.55, NA, NA, 7.8, 9.6, NA, NA, 10.5, 11.4, 11.81, 9.7, NA, NA, 7.8, 8.9, NA, NA, NA, 12.29, NA, 11, NA, NA, NA, 11.11, NA, NA, 8.1, 8.1, 8.3, 10.2, NA, NA, 8.2, 11, NA, NA, NA, 8.7, NA, 8.9, NA, 11.3, NA, 12.2, NA, 12.5, 10.76, 14, 11.19, 15.4, NA, NA, 8.9, 10.9, NA, NA, 9.04, 9.74, 9.41, 9.43, 10.96, 10.93, 13.06, 10.31, 11.69, 8.66, 9.11, 8.87, 9.61, 8.99, 9.48, 9.58, 9.26, 9.29, 8.4, 8.5, 8.2, 8.3, 12.1, 8.7, 13.9, 8.8, 7.79, 10.45, 9.56, 9.66, 10.55, 11.76, 9.31, 12.36, 9.33, 10.71, 13.03, 12.36, 11.88, 11.94, 12.83, 13.51, 12.54, 14.29, 11.43, 11.19, 11.4, 9.9, 13.21, 11.1, 12.75, 12.03, 11.55, 10.3, 10.26, 10.31, 8.9, 8.8, 9.12, 10.35, 9.2, 9.3, 8.9, 7.7, 8.51, 8.2, 8.2, 8.54, 8.6, NA, 8, 8.5, 8.84, 8.22, 9.78, 7.8, 7.5, 7.7, 7.7, 9.68, 8.1, 8.21, 7.91, 8.11, 9.21, 9.01, 9.89, 8.2, 8.56, 10.19, 9.1, 9, 10.46, 8.7, 10.16, 8.9, 8.7, 9.6, 7.76, 7.76, 8.51, 10.26, 7.2, 11.71, 11.43, 11.24, 7.3, 9.13, 8.74, 8.81, 8.61, 8.63, 9.43, 8.93, 9.13, 9.33, 7.47, 7.21, 7.71, 8.28, 7.48, NA, 7.44, 8.81, 7.42, 7.25, 6.1, 8.74, 8.51, 6.7, 8.76, 6.2, 7.94, 8.51, 6.8, 13.03, 13.09, 12.9, 13.34, 13.07, 12.02, 12.94, 12, 12.61, 9.96, 8.79, 8.91, 9.2, 8.73, 8.61, 7.89, 8.17, 11.71, 8.99, 11.35, 10.36, 9.67, 8.86, 10.2, 11.17, 12.75, 12.49, 7.6, 9.62, 8.1, 9.93, 12.4, NA, NA, 8.3, 9.95, 7.4, 9.21, 9.34, 10.09, 7.9, 9.64, 7.6, 10.19, 12.65, 10.3, 10.3, 11, 11.66, 16, 11, 12.7, 11, 11.4, 11.49, 12.79, 16.65, NA, 11.75, 12.94, 13.3, 11.3, 9.86, 10.9, 12.08, 11, 
9.99, 12.81, 12.36, NA, NA, 7.66, 6.5, 6.3, 6.4, 7, 7.1, 8.48, 6.8, 7.75, 12.97, 12.88, 12.49, 12.59, 12.83, 11.59, 8.9, 13.93, 13.35, 13.63, 14.64, 13.53, 13.64, 13.68, 13.38, 13.97, 12.98, 12.35, 12.89, 9.54, 9.3, 10.16, 10.71, 11.95, 12.03, 9.26, 10.15, 10.26, 6.7, 6.6, 7, 6.3, 7.76, 8.21, 7.7, 7.6, 13.49, 12.2, NA, 12.76, 12.78, 12.5, 13.57, 12.3, 12.84, 15.85, 11.26, 9.4, 11.16, 10.69, 11.43, 10.17, 10.51, 13.27, 11.39, 10.9, 10.54, NA, 10, 11.64, 10.6, 10.1, NA, 11.29, 7.61, 7.3, 7, 9.3, 13.33, 8.01, 8.16, 7.1, 9.91, 8.08, 11.33, 7.4, 10.39, 9, 11.5, 10.68, 8.53, 9.3, 11.19, 15.62, 11.02, 10.3, 9.7, 11.3, 10.5, 10.84, 13.86, 7.9, 7.6, 9.46, 7.9, 7.8, 9.33, 9.79, 7.7, 8.5, 8.3, 8.2, 8.1, 8.1, 10.2, 7.9, 8.3, 9.56, 9.34, 8.6, 9.6, 9.27, 8.1, 11.8, 9.74, 8.9, 8.3, 9.7, 7.6, 7.2, 9.21, 7.8, 7, 7.1, 8.1, 8.85, 9.4, 9.91, 9.44, 10.06, 8.6, 10.2, 10.55, NA, NA, 12.79, NA, NA, 9.75, 13.11, 14.54, NA, 14.36, 10.18, 14, 12.1, 15.26, NA, 10.99, 9.59, 10.9, 10.81, 9.3, 8.2, 8.75, 9.6, 8.9, 11.11, 11, 12, 10.9, 10.96, 8.99, 12.1, 11.76, 12.83, 11.1, 9.12, 8.54, 7.5, 9.01, 10.16, 11.71, 9.43, NA, 8.76, 13.07, 8.73, 8.86, 12.4, 7.9, 16, 11.75, 12.81, 7.1, 11.59, 13.38, 11.95, 7.76, 12.5, 11.43, 11.64, 13.33, 9, 9.7, 7.8, 10.2, 11.8, 7, 10.2, 14.54)
#1.2) Height, is the column to be filled
Pos.table<-data.frame(Pos=Pos,Height=NA)
2) dataframe with theoretical values
#2.1) the whole range of values that "Pos" can get
Source<- seq(0,17,0.01) #possible values that weight can get [0,17]
#2.2)height.0, the adjusted value of Height according to the Loop below
Table.match<- data.frame(Source=Source,Height.0=NA)
# loop for Source (real values)
for (i in 1:dim(Table.match)[1])
{
Table.match[i,"Height.0"] <- -57.5+5*(Table.match[i,"Source"])
}
3) Problem
The following loop looks for the respective matches.
for (i in 1:dim(Pos.table)[1])
{
H.i<-match(Pos.table[i,"Pos"], Table.match[,"Source"], nomatch = 0)
Pos.table[i,"Height"] <-ifelse(H.i,Table.match[H.i,"Height.0"],0)
# Rev.table[i,"Rev.Prot"]<-Rev.table[i,"Rev.Prot"]*Rev.table[i,"Yield"]
}
However, some values are disregarded, for example positions 15 and 20 (among many others):
# both return NAs
match(15.03, Table.match[,"Source"])
match(15.28, Table.match[,"Source"])
Could you please advise me on how to overcome this problem?
I agree with Nicole that exact comparison between floating-point numbers should be avoided.
To solve that, I've just added a round() to 2 decimal places on both sides of the match in the code:
for (i in 1:dim(Pos.table)[1])
{
H.i<-match(round(Pos.table[i,"Pos"],2), round(Table.match[,"Source"],2), nomatch = 0)
Pos.table[i,"Height"] <-ifelse(H.i,Table.match[H.i,"Height.0"],0)
# Rev.table[i,"Rev.Prot"]<-Rev.table[i,"Rev.Prot"]*Rev.table[i,"Yield"]
}
I guess this solves the problem.
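A loop-free variant of the same fix can be sketched as follows. Instead of comparing rounded doubles, it compares integer hundredths, which sidesteps floating-point equality altogether. The helper key() and the short vector pos are assumptions made for this illustration; src and height0 play the roles of Table.match$Source and Table.match$Height.0.

```r
# A few positions from the question, including the two that failed to match
pos <- c(8.75, 15.03, NA, 15.28)

# Lookup table, built without a loop
src <- seq(0, 17, by = 0.01)
height0 <- -57.5 + 5 * src

# Hypothetical helper: turn a value with 2 decimals into an integer key
key <- function(x) as.integer(round(x * 100))

idx <- match(key(pos), key(src))   # NA where pos itself is NA
height <- height0[idx]
height
```

Unlike the loop with nomatch = 0, this sketch leaves unmatched or missing positions as NA rather than 0; height[is.na(height)] <- 0 would restore the original behaviour if that is wanted.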

Weird behaviour (bug?) in car::bcPower

Consider the dataset Kort:
structure(list(V1 = c(-0.03, 0.22, -0.11, -0.01, 0.25, 0.29,
-0.74, 0.23, 0.39, -0.04, 0.18, 0.19, 0.4, 0.21, 0.21, -0.01,
-0.05, 0.02, -0.12, 0.37, -0.07, 0.51, 0.39, 0.14, 0.02, 0.73,
-0.25, 0.44, 0.29), V2 = c(35.39, 34.33, 32.74, 34.72, 33.07,
30.9, 29.89, 31.17, 31.62, 33.13, 30.64, 33.31, 33.61, 34.16,
30.06, 30.06, 31.18, 25.57, 30.52, 32.43, 31.54, 29.6, 34.66,
31.74, 27.22, 41, 32.02, 37.96, 29.25), V3 = c(37.24, 36.77,
37.21, 41.16, 40.3, 42.16, 40.77, 39.59, 37, 38.32, 34.6, 38.1,
36.07, 39.2, 36.97, 38.28, 38.72, 46.81, 39.63, 36, 45.33, 38.72,
36.2, 40.94, 37.7, 42.44, 37.92, 39.87, 37.15), V4 = c(-36L,
-18L, -2L, 20L, 37L, 39L, -7L, 31L, -23L, 32L, 73L, 10L, 14L,
18L, 126L, 98L, 13L, 14L, 15L, 37L, 66L, 3L, -50L, 9L, 6L, -20L,
4L, -26L, -2L), V5 = c(12.4, 10.5, 2.8, 9.5, 9.4, 10.7, 7.5,
14.8, 10.9, 13.5, 11.5, 11.8, 13.6, 8.6, 13.6, 13.1, 14.3, 11.3,
16.1, 14.5, 8.4, 15.4, 13.4, 14, 18.8, 17.4, 16.4, 16, 17.7),
V6 = c(27424L, 25597L, 20968L, 24730L, 25423L, 25801L, 23681L,
29527L, 26228L, 28262L, 27363L, 27134L, 27542L, 24647L, 28260L,
27922L, 29054L, 25650L, 30096L, 29103L, 24112L, 30035L, 28771L,
27818L, 32455L, 29722L, 30508L, 29896L, 31961L), V7 = c(68.8,
70.4, 61.6, 73.5, 71.8, 76.5, 72.7, 75.3, 71.7, 75, 72.9,
73.3, 73.7, 69, 72.7, 74.2, 73.4, 71.2, 76.4, 73, 62.5, 76,
73.7, 74.7, 74.3, 74.8, 74.6, 74.4, 74.4), V8 = c(8.1, 6.8,
11, 5.3, 6.3, 4.1, 5.5, 4, 5.9, 4.3, 5.5, 5.4, 4.2, 8.1,
5.2, 4.8, 4.4, 8.2, 3.8, 5.9, 12.9, 4.3, 5.2, 5, 3.6, 3.8,
4.6, 4.3, 4.5), V9 = c(0.38, 0.15, 0.16, 0.08, 0.12, 0.05,
0.07, 0.04, 0.08, 0.07, 0.13, 0.08, 0.08, 0.26, 0.05, 0.14,
0.05, 0.26, 0.03, 0.18, 0.26, 0.04, 0.04, 0.14, 0.05, 0,
0.02, 0.02, 0.1), V10 = c(9.8, 9.9, 19.4, 7, 9.2, 3, 8.5,
1.1, 3, 2.3, 5.1, 5.6, 1, 22.3, 4.4, 6.2, 2.2, 5.3, 1.5,
5, 18.7, 1.5, 3, 8.9, 1.6, 0, 5.1, 2.1, 3.6), V11 = c(6.3,
7.5, 5.5, 10.2, 5, 9.6, 9.3, 4.8, 4.3, 4.6, 4.1, 5.7, 6.4,
4, 7.2, 4.7, 4.2, 4.5, 7.6, 5.3, 6.2, 4.1, 4.9, 4.1, 5.1,
3.3, 5.4, 5, 5.6), V12 = c(153605L, 152867L, 115972L, 140341L,
139245L, 167038L, 143239L, 179712L, 135273L, 167487L, 160738L,
160648L, 154717L, 118800L, 168954L, 148412L, 147637L, 142615L,
210838L, 161840L, 114310L, 182670L, 160293L, 147747L, 192889L,
191077L, 164107L, 202051L, 192945L)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12"
), class = "data.frame", row.names = c(NA, -29L))
Where the response is:
Kort$V12
[1] 153605 152867 115972 140341 139245 167038 143239 179712 135273 167487
[11] 160738 160648 154717 118800 168954 148412 147637 142615 210838 161840
[21] 114310 182670 160293 147747 192889 191077 164107 202051 192945
Doing a Box-Cox transform using boxcox (exported by MASS; the car version is boxCox)
boxcox(V12~.,data=Kort,lambda=seq(-4,4,4/10))
yields an optimal parameter of -2. Transforming the response using
car::bcPower
TVP<-bcPower(Kort$V12,lambda=-2)
turns TVP into a vector of constants:
TVP
[1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[20] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
but the Box-Cox transform should be a continuous map!
I don't think this is a bug; there is simply a limit to how many digits are printed. The help file says the calculation is (U^lambda - 1)/lambda, which for lambda = -2 is (1 - U^(-2))/2 and therefore very close to 1/2 when U is large. You can see that TVP is being calculated correctly with
TVP-0.5
# [1] -2.119138e-11 -2.139650e-11 -3.717610e-11 ...
or
options(digits=20)
TVP
# [1] 0.49999999997880861802 0.49999999997860350431 0.49999999996282390446 ...
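The effect can be reproduced without car by applying the formula from the help file by hand. This is a minimal sketch that assumes the plain (U^lambda - 1)/lambda form for lambda != 0 and uses only the first three response values.

```r
# First three values of Kort$V12
U <- c(153605, 152867, 115972)
lambda <- -2

# Box-Cox power transform written out by hand (lambda != 0 case)
tvp <- (U^lambda - 1) / lambda

print(tvp)               # shown with the default 7 significant digits
print(tvp - 0.5)         # tiny negative offsets, on the order of 1e-11
print(tvp, digits = 15)  # more digits reveal that the values differ
```

The values are distinct but differ from 0.5 only around the eleventh decimal place, so they look constant at the default print precision. Rescaling the response before transforming (for example, working in thousands) is one way to keep the transformed values in a more readable range.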

Resources