Using gratia::data_slice() for a GAM with an offset - r

I am trying to get my fitted values from a gam with a few of the features on the GitHub version of gratia and am having trouble using the data_slice() function with a model that has an offset. I am not sure what I need to do to get the data I need for the fitted_values() function when my model has an offset.
I have a data set that looks like this:
df <- data.frame(
Landings = c(6918, 4899, 43, 0, 1712, 34427, 1080, 2826, 30521, 53302, 19467,
98013, 399, 13915, 568, 1399, 5345, 219271, 79400, 8195, 4956,
634, 12963, 2430, 32003, 2598, 3772, 5759, 11695, 35459, 0, 8959,
66760, 30628, 52, 859, 14417, 1688, 3287, 0, 22661, 31184, 4169,
78, 4647, 49, 1241, 5684, 11788, 43606, 2662, 9887, 17844, 64693,
28943, 55279, 5321, 18504, 197130, 298, 77454, 22884, 5359, 117578,
14252, 16361, 3775, 4375, 4402, 39140, 13047, 2892, 21982, 458,
68849, 4966, 72827, 29788, 319, 20712, 10327, 11536, 21070, 2545,
51660, 12031, 16566, 3263, 64641, 82188, 101116, 3510, 694, 5999,
70, 21446, 1908, 66598, 52434, 39591),
per.change = c(NA, NA, NA, -0.0170667179320969, -0.0310932741933139, -0.00876883009292576,
-0.0337896071523766, NA, -0.0383733367412093, 0.0313493941187739,
NA, 0.00309694472510056, 0.00440176543788437, 0.0162746873496682,
0.0633103545787895, -0.0304660643253923, -0.0477002654740667,
0.0465198702443307, 0.0848090520628948, NA, 0.00136588406239499,
0.02768520985601, -0.0439813588624489, 0.150138388427594, 0,
0.0393789469840049, -0.0339055127886893, -0.0253342452162168,
0.06662884765506, 0.0489644208362528, NA, NA, 0.118029428502036,
0.00145541011350925, NA, NA, 0.129622031181149, NA, -0.0308115567673083,
NA, -0.0425221990122855, 0.0294010131077341, -0.0227758522729325,
NA, 0.0664264452451825, -0.0542076704097344, -0.0630814426046203,
-0.0202991435976089, -0.0228409071757005, -0.0481243087379853,
-0.0222487699626362, 0.00268148684149014, 0.032275119594268,
0.0592311473502147, NA, -0.0402932362775077, 0.0225902785267178,
-0.0245393760611263, -0.0910224764171599, 0.0248344347525319,
-0.0132098512036838, -0.00480626865142122, -0.0207390648567119,
-0.0210938178547339, 0.0653168963830966, -0.0258970505397598,
-0.0266521730813433, NA, 0.00766938294024467, 0.0361020509477387,
-0.00356050066471782, -0.0303002256316303, -0.0493053708804782,
NA, 0.095008528099584, 0.00673520533179099, -0.0145235131366679,
-0.0915368065151916, NA, -0.0633484162895928, -0.0364410398900781,
0.0297277420555085, -0.0259575275766121, 0.000333945513698282,
-0.040540978212464, -0.105289527646177, 0.00931710632328382,
-0.00481869261842514, 0.00990973004165871, -0.0207742441119774,
-0.0725670373822072, -0.00197394223951389, NA, NA, 0.069556715359811,
NA, -0.00233517445977803, -0.118707735630186, -0.0283717012552832,
0.0224756418583045),
lmb_eff = c(9348, 5383, 86.5, 1160, 1520.7, 37832, 1800.6, 9421, 24693,
80761, 20754, 297008.8, 3067, 10871, 1798, 3515, 19089, 261037,
107881, 4737, 10114, 396, 29462, 14639, 16328, 6186.5, 12572,
9930, 15188, 48112, 967, 16967.9, 71785, 69608.5, 742, 1492,
8099.5, 5723, 5218, 88, 35519, 31853, 9063, 654, 13276, 2439,
14262.5, 10526, 15113, 118817.9, 3646, 2808.7, 137263.9, 143763,
15816, 50026, 17221.6, 21516, 148777, 966, 140824.7, 35259.8,
6615.5, 113492, 39590, 68170, 16415, 3580, 8151, 32918, 20386,
3825.5, 19453, 208, 51380, 6208, 137409, 17409.8, 1028, 12007,
15413.6, 16622.5, 32974, 3397.5, 53812.6, 15057, 31256.6, 7124.8,
72063, 70913, 65447, 4228.8, 1020, 14887, 212, 14966, 10721.5,
58063, 100834, 43175),
stringsAsFactors = FALSE)
I am running a gam that looks like this:
gam<-gam(Landings~s(per.change)+offset(log(lmb_eff)),data=df)
I try to use the data_slice function and I am keep getting an error.
ds <- data_slice(gam, per.change = evenly(per.change, n = 100), Season = evenly(Season), lmb_eff=1)
Error in eval(predvars, data, env) : object 'lmb_eff' not found
I see in the help file that there is an offset = that I can add to the code but I have tried a bunch of iterations and not had any luck figuring out what I need to specify so that I can use data_slice() with a model that has an offset.

It's a bug in an internal helper function https://github.com/gavinsimpson/gratia/issues/189
In the meantime, until I fix this:
new_df <- with(df,
tidyr::expand_grid(per.change = evenly(per.change, n = 100)))
ds <- dplyr::bind_cols(new_df, lmb_eff = rep(1, nrow(new_df)))
fitted_values(m, data = ds)

Related

How to solve error when using adorn_totals function in R?

I get the following message of error when using janitor::adorn_totals("row"):
"Error in adorn_totals(., "row") :
trying to re-add a totals dimension that is already been added"
Here is the head of my dataset :
structure(list(code_1 = c("M01", "C03", "M99", "C05", "O01",
"C07"), regroupement_elsan = c("Gastro", "Ophtalmo", "Divers médecine",
"Gynéco", "Accouchements", "bouche et dents"), actes_2019 = c(9179,
5589, 6024, 4150, 4028, 3458), actes_2020 = c(7933, 4167, 3740,
2994, 3348, 2206), actes_2021 = c(6504, 5505, 4682, 3376, 3226,
3035), sejours_2019 = c(1631, 2502, 1028, 852, 1455, 1288), sejours_2020 = c(1335,
1819, 726, 574, 1371, 801), sejours_2021 = c(1109, 2416, 825,
657, 1259, 1106), tx_0_nuit_2019 = c("3.92397302268547", "90.7673860911271",
"32.9766536964981", "57.5117370892019", "0.206185567010309",
"98.9130434782609"), tx_0_nuit_2020 = c("3.29588014981273", "92.9081913139087",
"47.1074380165289", "59.581881533101", "0.291757840991977", "99.250936329588"
), tx_0_nuit_2021 = c("3.6068530207394", "95.4470198675497",
"18.3030303030303", "60.2739726027397", "0.158856235107228",
"98.7341772151899"), pourcentage = c(5.37796226165473, 4.55191916519208,
3.87140518282095, 2.79151300666457, 2.66748251170021, 2.50955034811226
), pourcentage_cumule = c(78.4062908267046, 82.9582099918967,
86.8296151747176, 89.6211281813822, 92.2886106930824, 94.7981610411947
)), row.names = c(NA, -6L), class = c("tabyl", "tbl_df", "tbl",
"data.frame"), core = structure(list(code_1 = c("M01b", "C01",
"C02", "C04", "M01", "C03", "M99", "C05", "O01", "C07", "C08",
"C99", "C98", "C10", "C06", "M03", "O02", "M02", "M04", "C01b",
"O03", "S99", "***", "C10b", "M05", "M98", "O04"), regroupement_elsan = c("Endoscopies
digestives",
"Ortho (+ rhumato et rachis)", "Chirurgie digestive", "Uro-néphro",
"Gastro", "Ophtalmo", "Divers médecine", "Gynéco", "Accouchements",
"bouche et dents", "Tissus mou et chir plastique", "Divers chir",
"Chir esth et hors sécu", "Chir thoracique et vasculaire", "ORL Stomato sf bouche et
dent",
"Pneumologie", "Obstétrique autre (hors IVG)", "Cardio Vasc (médecine)",
"Neurologie", "Rachis", "IVG", "Séances autres", "Autres", "Chir thoracique",
"Soins palliatifs", "Vasculaire interventionnel", "Néo nat"),
actes_2019 = c(36079, 29520, 14618, 6515, 9179, 5589, 6024,
4150, 4028, 3458, 2137, 2180, 575, 449, 866, 388, 294, 311,
714, 395, 292, 1842, 10, 0, 4, 0, 1), actes_2020 = c(30192,
25451, 12845, 7376, 7933, 4167, 3740, 2994, 3348, 2206, 2107,
1477, 575, 437, 337, 897, 193, 218, 267, 308, 118, 737, 8,
4, 0, 11, 5), actes_2021 = c(42333, 24055, 13735, 8196, 6504,
5505, 4682, 3376, 3226, 3035, 2571, 1134, 689, 511, 352,
272, 181, 161, 138, 106, 82, 61, 18, 8, 7, 0, 0), sejours_2019 = c(6992,
5493, 2577, 1221, 1631, 2502, 1028, 852, 1455, 1288, 540,
397, 236, 158, 260, 63, 148, 101, 90, 44, 246, 1820, 4, 0,
1, 0, 1), sejours_2020 = c(5811, 4946, 2220, 1220, 1335,
1819, 726, 574, 1371, 801, 554, 269, 221, 140, 94, 42, 109,
79, 58, 34, 98, 720, 2, 1, 0, 1, 5), sejours_2021 = c(7922,
5144, 2523, 1451, 1109, 2416, 825, 657, 1259, 1106, 649,
264, 278, 162, 111, 51, 108, 69, 30, 21, 77, 54, 7, 1, 2,
0, 0), tx_0_nuit_2019 = c("96.0955377574371", "63.5718186783179",
"41.4435389988359", "36.2817362817363", "3.92397302268547",
"90.7673860911271", "32.9766536964981", "57.5117370892019",
"0.206185567010309", "98.9130434782609", "72.5925925925926",
"53.904282115869", "13.9830508474576", "96.2025316455696",
"50.7692307692308", "42.8571428571429", "85.1351351351351",
"72.2772277227723", "11.1111111111111", "4.54545454545455",
"100,0", "100,0", "100,0", "0,0", "0,0", "0,0", "0,0"), tx_0_nuit_2020 =
c("96.0936155567028",
"67.3069146785281", "40.5855855855856", "34.344262295082",
"3.29588014981273", "92.9081913139087", "47.1074380165289",
"59.581881533101", "0.291757840991977", "99.250936329588",
"76.3537906137184", "49.814126394052", "11.7647058823529",
"99.2857142857143", "53.1914893617021", "16.6666666666667",
"74.3119266055046", "81.0126582278481", "25.8620689655172",
"8.82352941176471", "98.9795918367347", "100,0", "100,0",
"100,0", "0,0", "0,0", "20,0"), tx_0_nuit_2021 = c("96.7053774299419",
"73.2892690513219", "51.0503369005153", "41.9021364576154",
"3.6068530207394", "95.4470198675497", "18.3030303030303",
"60.2739726027397", "0.158856235107228", "98.7341772151899",
"83.9753466872111", "60.2272727272727", "50,0", "94.4444444444444",
"72.972972972973", "1.96078431372549", "81.4814814814815",
"85.5072463768116", "43.3333333333333", "52.3809523809524",
"100,0", "100,0", "100,0", "100,0", "0,0", "0,0", "0,0")), row.names = c(NA,
-27L), class = "data.frame"), tabyl_type = "two_way", totals = "row")
And the code I tried :
library(janitor)
autres %>%
adorn_totals("row")
Could anyone help ? I had indeed used the adorn_totals function on the dataframe used to generate the dataframe "autres", but I made sure the row "total" isn't in the dataframe "autres" anymore.
With the object you have shared as x:
x %>%
untabyl() %>%
adorn_totals()
Why it works:
You can see at the end of the object you shared, tabyl_type = "two_way", totals = "row". Those attributes are stored with the data.frame you're working with. When you try to adorn_totals() a second time, janitor checks this and errors.
When you call untabyl() it strips those attributes. Then adorn_totals() succeeds.
I notice you have a cumulative percentage column. If desired, you can control exactly which columns get a totals value in adorn_totals() - see ?adorn_totals and the ... argument for how, and here's an example: https://stackoverflow.com/a/69759313.

Using lapply on a function that attempts to match values to another object in R

I hope someone may be able to help me. I have a list of IDs and blood results in output
dput(output)
list(BNP = structure(list(record_id = structure(c("113-1", "113-10",
"113-11", "113-12", "113-13", "113-14"), label = c(record_id = "Record ID"), class = c("labelled",
"character")), BNP = c(67.8, 1873.3, 784.5, 82.3, 156.5, 116.4
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
)), proBNP = structure(list(record_id = structure(c("103-13",
"103-16", "103-17", "103-20", "104-1", "106-12", "106-13", "106-14",
"106-2", "106-3", "106-4", "106-6", "106-7", "106-8", "112-1"
), label = c(record_id = "Record ID"), class = c("labelled",
"character")), proBNP = c(66, 1865, 6143, 275, 1240, 992, 1116,
8469, 165, 2486, 85, 100, 9231, 8561, 815)), row.names = c(NA,
-15L), class = c("tbl_df", "tbl", "data.frame")), troponin = structure(list(
record_id = structure(c("101-2", "103-13", "103-14", "103-16",
"103-17", "104-1", "104-2", "106-12", "106-13", "106-14",
"106-5", "106-6", "106-7", "106-8", "112-1", "113-1", "113-10",
"113-11", "113-12", "113-13", "113-14"), label = c(record_id = "Record ID"), class = c("labelled",
"character")), troponin = c(29900, 5, 3, 5, 422, 11300, 10,
5.1, 1.5, 159.7, 15.9, 1863, 92.8, 706.5, 643, 50, 110, 60,
30, 130, 10)), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame")), CRP = structure(list(record_id = structure(c("101-2",
"103-13", "103-14", "103-15", "103-16", "103-17", "103-19", "103-20",
"104-1", "104-2", "106-1", "106-11", "106-12", "106-13", "106-14",
"106-2", "106-3", "106-4", "106-5", "106-6", "106-7", "106-8",
"112-1", "113-1", "113-10", "113-11", "113-12", "113-13", "113-14"
), label = c(record_id = "Record ID"), class = c("labelled",
"character")), CRP = c(54.8, 78, 229, 166, 77, 345, 25, 124,
225.4, 156.2, 141.11, 110, 96.87, 126, 238.97, 6.19, 135.7, 135,
138.7, 84.7, 242, 299, 41, 114.7, 156.2, 112.3, 394.3, 179.5,
93)), row.names = c(NA, -29L), class = c("tbl_df", "tbl", "data.frame"
)), ferritin = structure(list(record_id = structure(c("101-2",
"103-13", "103-14", "103-15", "103-16", "103-17", "103-20", "104-1",
"106-11", "106-12", "106-13", "106-14", "106-2", "106-3", "106-4",
"106-5", "106-6", "106-8", "112-1", "113-10", "113-11", "113-12",
"113-13", "113-14"), label = c(record_id = "Record ID"), class = c("labelled",
"character")), ferritin = c(253.97, 314, 438, 199, 390, 1342,
128, 462.6, 125.8, 428, 237, 302.23, 1651, 133.6, 167, 1746,
343, 1145.96, 697.76, 690.2, 395.4, 1492.2, 275.4, 254.2)), row.names = c(NA,
-24L), class = c("tbl_df", "tbl", "data.frame")))
I also have a list of IDs in a separate df called identity
dput(identity)
structure(list(identity = c("101-2", "103-13", "103-14", "103-15",
"103-16", "103-17", "103-19", "103-20", "104-1", "104-2", "106-1",
"106-11", "106-12", "106-13", "106-14", "106-2", "106-3", "106-4",
"106-5", "106-6", "106-7", "106-8", "112-1", "113-1", "113-10",
"113-11", "113-12", "113-13", "113-14")), row.names = c(NA, -29L
), class = "data.frame")
I have a vector of the blood tests through which I am trying to apply my function called key:
c("BNP", "proBNP", "troponin", "CRP", "ferritin")
Each variable in key corresponds to one of the blood tests. I am trying to loop through each of the blood tests in the output list and then match those results and corresponding IDs to the identity dataframe making a new column for each blood test. I think the issue may be the looping over a list and then trying to output to an object? I'm probably overcomplicating it!
My failing attempt:
#match output values with IDs
match_IDs <- function(x, y) {
#output[[ c(deparse(substitute(x)), "record_id") ]] gives record ID for blood test
#output[[ c(deparse(substitute(x)), deparse(substitute(x))) ]] gives blood value for blood test
y$new <- output[[ c(deparse(substitute(x)), deparse(substitute(x))) ]][match(y$`df2$record_id`, output[[ c(deparse(substitute(x)), "record_id") ]] )]
return(y$new)
}
lapply(key, identity, function(x) do.call("match_IDs", list(as.name(x)))) -> output2
Outside of the match_IDs function, the code to match works, but I want to automate the process rather than hard code it. This is what i am aiming for:
structure(list(identity = c("101-2", "103-13", "103-14", "103-15",
"103-16", "103-17", "103-19", "103-20", "104-1", "104-2", "106-1",
"106-11", "106-12", "106-13", "106-14", "106-2", "106-3", "106-4",
"106-5", "106-6", "106-7", "106-8", "112-1", "113-1", "113-10",
"113-11", "113-12", "113-13", "113-14"), baseline_CRP = c(54.8,
78, 229, 166, 77, 345, 25, 124, 225.4, 156.2, 141.11, 110, 96.87,
126, 238.97, 6.19, 135.7, 135, 138.7, 84.7, 242, 299, 41, 114.7,
156.2, 112.3, 394.3, 179.5, 93), baseline_bnp = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 67.8, 1873.3, 784.5, 82.3, 156.5, 116.4), baseline_ferritin = c(253.97,
314, 438, 199, 390, 1342, NA, 128, 462.6, NA, NA, 125.8, 428,
237, 302.23, 1651, 133.6, 167, 1746, 343, NA, 1145.96, 697.76,
NA, 690.2, 395.4, 1492.2, 275.4, 254.2), baseline_trop = c(29900,
5, 3, NA, 5, 422, NA, NA, 11300, 10, NA, NA, 5.1, 1.5, 159.7,
NA, NA, NA, 15.9, 1863, 92.8, 706.5, 643, 50, 110, 60, 30, 130,
10), baseline_proBNP = c(NA, 66, NA, NA, 1865, 6143, NA, 275,
1240, NA, NA, NA, 992, 1116, 8469, 165, 2486, 85, NA, 100, 9231,
8561, 815, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -29L), class = c("tbl_df",
"tbl", "data.frame"))
If anyone has any suggestions, I'm keen to learn!
If you fix output so it has the same column name and type as identity, we can just do a bunch of joins.
library(dplyr)
library(purrr)
# first, make the list fit the target
output_fixed <- map(
output,
~ .x %>% mutate(identity = as.character(record_id)) %>% select(-record_id)
)
# then, join repeatedly until at end of the list
output2 <- reduce(output_fixed, left_join, .init = identity, by = 'identity')
head(output2)
identity BNP proBNP troponin CRP ferritin
1 101-2 NA NA 29900 54.8 253.97
2 103-13 NA 66 5 78.0 314.00
3 103-14 NA NA 3 229.0 438.00
4 103-15 NA NA NA 166.0 199.00
5 103-16 NA 1865 5 77.0 390.00
6 103-17 NA 6143 422 345.0 1342.00

Ordering column based on some strings

I have a data in columns I have characters part of which are TRG1, TRG2, TRG3, TRG4 and TRG5
How I can order this data frame based on TRG so that first TRG1 ....finally TRG5 are placed in the columns?
My data is
> dput(head(result))
structure(list(`Sample Name` = c("ACTB", "ATP5F1", "DDX5", "EEF1G",
"GAPDH", "NCL"), `31-10TRG3R` = c(15723, 1682, 16598, 17240,
38686, 10670), `31-11TRG4R` = c(24846, 3294, 25522, 38914, 73022,
14628), `31-12TRG4R` = c(7812, 1326, 5750, 9204, 12352, 5489),
`31-13TRG1R` = c(15332, 1162, 18268, 20875, 62257, 10614),
`31-14TRG4R` = c(7644, 1435, 16822, 13731, 26244, 10548),
`31-15TRG4R` = c(6501, 947, 10320, 7285, 10538, 4638), `31-16TRG4R` = c(5428,
825, 11789, 12018, 6812, 5954), `31-17TRG3R` = c(10074, 1056,
7966, 12489, 26819, 6404), `31-18TRG1R` = c(12487, 567, 13945,
16474, 43309, 11831), `31-19TRG4R` = c(5211, 917, 9144, 8024,
8200, 3935), `31-1TRG3R` = c(9928, 1112, 5726, 6227, 12942,
3644), `31-21TRG3R` = c(6806, 1460, 7472, 12420, 46378, 5871
), `31-22TRG3R` = c(4834, 640, 9807, 7082, 14823, 4594),
`31-23TRG1R` = c(3156, 765, 18034, 18982, 17237, 18880),
`31-24TRG4R` = c(6990, 761, 4440, 2833, 8150, 1340), `31-25TRG2R` = c(60621,
6290, 47502, 135948, 233717, 37583), `31-26TRG3R` = c(4198,
718, 2564, 3830, 5790, 1258), `31-27TRG2R` = c(10815, 1010,
8694, 11868, 18684, 5706), `31-28TRG4R` = c(7980, 1343, 7342,
9874, 14286, 4255), `31-29TRG1R` = c(3854, 748, 9314, 9132,
25546, 7852), `31-2TRG1R` = c(7653, 1495, 12238, 12568, 11296,
11256), `31-30TRG5R` = c(24358, 2091, 15594, 26998, 91442,
20914), `31-31TRG4R` = c(6796, 940, 12752, 11642, 41967,
12922), `31-32TRG2R` = c(127379, 11541, 90020, 74881, 234454,
51464), `31-33TRG1R` = c(4139, 338, 8260, 8650, 13916, 8000
), `31-34TRG3R` = c(37303, 2998, 22122, 30431, 51981, 11737
), `31-35TRG4R` = c(32279, 2718, 42178, 36956, 115962, 21194
), `31-36TRG3R` = c(12424, 1134, 8177, 14462, 20147, 6648
), `31-37TRG2R` = c(7031, 690, 8208, 17495, 28514, 7058),
`31-38TRG3R` = c(3645, 698, 16117, 11122, 25739, 7031), `31-39TRG3R` = c(28273,
2169, 14697, 20890, 68353, 25293), `31-3TRG4R` = c(9250,
1335, 24776, 14674, 31266, 8732), `31-40TRG1R` = c(28858,
2100, 26910, 43331, 104235, 19544), `31-41TRG1R` = c(13980,
1184, 13204, 13624, 47414, 11870), `31-42TRG2R` = c(22697,
2401, 16326, 22962, 40136, 11796), `31-43TRG3R` = c(13820,
797, 16245, 7827, 38292, 6206), `31-44TRG2R` = c(9477, 1244,
7140, 6580, 12457, 5176), `31-45TRG3R` = c(12182, 573, 2818,
3699, 4365, 1639), `31-46TRG1R` = c(5438, 997, 9226, 26045,
17740, 8628), `31-47TRG3R` = c(14419, 1927, 7350, 10375,
15736, 3415), `31-48TRG2R` = c(8758, 1002, 8044, 6677, 17354,
7355), `31-49TRG4R` = c(7738, 792, 13920, 15589, 42536, 14056
), `31-4TRG3R` = c(9947, 1115, 7267, 5957, 13831, 2793),
`31-50TRG4R` = c(6660, 701, 4092, 16796, 7958, 2408), `31-51TRG2R` = c(151880,
16572, 93610, 110556, 303604, 57029), `31-52TRG2R` = c(7184,
1396, 12785, 11124, 13050, 8934), `31-53TRG2R` = c(9012,
1118, 7786, 11482, 19512, 9143), `31-5TRG2R` = c(5479, 440,
8913, 7103, 15886, 5801), `31-6TRG4R` = c(6716, 677, 8812,
12184, 14380, 7684), `31-7TRG3R` = c(16192, 1155, 9405, 11930,
30034, 7726), `31-8TRG1R` = c(11408, 1007, 11396, 20424,
38188, 9570), `31-9TRG1R` = c(9468, 812, 10774, 8504, 15464,
4606)), row.names = c(NA, 6L), class = "data.frame")
>
May be, we extract the digits after the 'TRG' and use that in order
result2 <- result[c(1, order(as.numeric(sub(".*TRG(\\d+)\\D+", "\\1",
names(result)[-1])))+1)]

R - How to perform cross-year date operations?

I am working with daily measurements of temperature. In total I have about 40 years of observations. How can I perform date operations covering a time interval that crosses years?
For example, I want to sum the values from every october-to-february period. However, the sum should be taken only on the contiguous period of oct-nov-dec-jan-feb.
"Isolated" months should not be taken into account, like for example jan and feb of the first year, and oct-nov-dec of the last year. The sum has to run over the contiguous period only (from oct-nov-dec-jan-fev).
For example, this is what I am looking for:
1st year 2nd year 3rd year
J-F-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-O-N-D
But this is not OK:
1st year 2nd year 3rd year
**J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D**
This is a sample data frame to work on:
df <- structure(list(date = structure(c(-3653, -3622, -3593, -3562,
-3532, -3501, -3471, -3440, -3409, -3379, -3348, -3318, -3287,
-3256, -3228, -3197, -3167, -3136, -3106, -3075, -3044, -3014,
-2983, -2953, -2922, -2891, -2863, -2832, -2802, -2771, -2741,
-2710, -2679, -2649, -2618, -2588, -2557, -2526, -2498, -2467,
-2437, -2406, -2376, -2345, -2314, -2284, -2253, -2223, -2192,
-2161, -2132, -2101, -2071, -2040, -2010, -1979, -1948, -1918,
-1887, -1857, -1826, -1795, -1767, -1736, -1706, -1675, -1645,
-1614, -1583, -1553, -1522, -1492, -1461, -1430, -1402, -1371,
-1341, -1310, -1280, -1249, -1218, -1188, -1157, -1127, -1096,
-1065, -1037, -1006, -976, -945, -915, -884, -853, -823, -792,
-762, -731, -700, -671, -640, -610, -579, -549, -518, -487, -457,
-426, -396, -365, -334, -306, -275, -245, -214, -184, -153, -122,
-92, -61, -31, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304,
334, 365, 396, 424, 455, 485, 516, 546, 577, 608, 638, 669, 699,
730, 761, 790, 821, 851, 882, 912, 943, 974, 1004, 1035, 1065,
1096, 1127, 1155, 1186, 1216, 1247, 1277, 1308, 1339, 1369, 1400,
1430, 1461, 1492, 1520, 1551, 1581, 1612, 1642, 1673, 1704, 1734,
1765, 1795, 1826, 1857, 1885, 1916, 1946, 1977, 2007, 2038, 2069,
2099, 2130, 2160, 2191, 2222, 2251, 2282, 2312, 2343, 2373, 2404,
2435, 2465, 2496, 2526, 2557, 2588, 2616, 2647, 2677, 2708, 2738,
2769, 2800, 2830, 2861, 2891, 2922, 2953, 2981, 3012, 3042, 3073,
3103, 3134, 3165, 3195, 3226, 3256, 3287, 3318, 3346, 3377, 3407,
3438, 3468, 3499, 3530, 3560, 3591, 3621, 3652, 3683, 3712, 3743,
3773, 3804, 3834, 3865, 3896, 3926, 3957, 3987, 4018, 4049, 4077,
4108, 4138, 4169, 4199, 4230, 4261, 4291, 4322, 4352, 4383, 4414,
4442, 4473, 4503, 4534, 4564, 4595, 4626, 4656, 4687, 4717, 4748,
4779, 4807, 4838, 4868, 4899, 4929, 4960, 4991, 5021, 5052, 5082,
5113, 5144, 5173, 5204, 5234, 5265, 5295, 5326, 5357, 5387, 5418,
5448, 5479, 5510, 5538, 5569, 5599, 5630, 5660, 5691, 5722, 5752,
5783, 5813, 5844, 5875, 5903, 5934, 5964, 5995, 6025, 6056, 6087,
6117, 6148, 6178, 6209, 6240, 6268, 6299, 6329, 6360, 6390, 6421,
6452, 6482, 6513, 6543, 6574, 6605, 6634, 6665, 6695, 6726, 6756,
6787, 6818, 6848, 6879, 6909, 6940, 6971, 6999, 7030, 7060, 7091,
7121, 7152, 7183, 7213, 7244, 7274), class = "Date"), temp = c(22.9223529411765,
23.0705882352941, 23.1094117647059, 20.7835294117647, 17.4517647058824,
17.3176470588235, 18.0494117647059, 19.6188235294118, 21.3023529411765,
23.1105882352941, 22.2364705882353, 22.7482352941176, 23.5870588235294,
24.0023529411765, 23.0094117647059, 22.0176470588235, 19.4917647058824,
18.1011764705882, 18.3164705882353, 20.0623529411765, 22.8717647058824,
23.2576470588235, 23.68, 22.3694117647059, 22.9517647058824,
23.6976470588235, 23.3294117647059, 20.8564705882353, 18.16,
15.8988235294118, 15.7988235294118, 18.4176470588235, 20.8423529411765,
20.3247058823529, 22.3070588235294, 22.2035294117647, 24.2235294117647,
23.6976470588235, 24.4082352941176, 21.1752941176471, 18.1023529411765,
16.1211764705882, 18.3164705882353, 19.7635294117647, 23.1294117647059,
22.9964705882353, 23.6552941176471, 22.6964705882353, 23.6011764705882,
23.6517647058824, 23.7035294117647, 22.4352941176471, 18.5835294117647,
16.5976470588235, 15.7741176470588, 19.2541176470588, 20.8776470588235,
20.5729411764706, 21.1729411764706, 21.5870588235294, 22.4576470588235,
23.6058823529412, 21.84, 21.6694117647059, 19.2458823529412,
18.7517647058824, 17.7811764705882, 19.4764705882353, 21.9270588235294,
21.5470588235294, 22.88, 23.2458823529412, 24.2776470588235,
25.2470588235294, 23.4694117647059, 21.4435294117647, 19.3941176470588,
18.5447058823529, 17.6, 18.3764705882353, 19.8529411764706, 22.0823529411765,
22.7294117647059, 23.4011764705882, 23.3611764705882, 24.2505882352941,
23.2870588235294, 21.9482352941176, 20.5552941176471, 18.0788235294118,
18.5929411764706, 20.8752941176471, 21.9023529411765, 23.6105882352941,
22.4070588235294, 21.5635294117647, 23.3129411764706, 22.9741176470588,
23.3670588235294, 19.6105882352941, 16.9941176470588, 17.7670588235294,
17.4858823529412, 17.8517647058824, 20.26, 22.1576470588235,
23.8364705882353, 23.4447058823529, 24.8129411764706, 25.1764705882353,
24.2694117647059, 21.5035294117647, 20.0458823529412, 18.4694117647059,
18.4541176470588, 19.5388235294118, 22.02, 20.5364705882353,
22.9858823529412, 21.9752941176471, 23.7729411764706, 24.0576470588235,
24.0941176470588, 22.1552941176471, 21.2329411764706, 19.5611764705882,
17.8788235294118, 18.6823529411765, 20.1541176470588, 21.6258823529412,
21.5211764705882, 23.9811764705882, 24.8352941176471, 24.5882352941176,
24.1729411764706, 21.1035294117647, 19.0435294117647, 17.08,
17.4529411764706, 19.1458823529412, 20.4447058823529, 20.7129411764706,
21.5047058823529, 22.6952941176471, 23.4364705882353, 23.1, 24.1847058823529,
19.8105882352941, 19.9847058823529, 20.5188235294118, 17.7658823529412,
19.4435294117647, 20.7588235294118, 21.7835294117647, 22.7788235294118,
23.2388235294118, 24.9129411764706, 25.6, 23.5647058823529, 24.0058823529412,
19.7823529411765, 19.3152941176471, 18.7741176470588, 19.0305882352941,
20.5576470588235, 21.3611764705882, 21.4247058823529, 23.4811764705882,
23.6505882352941, 25.1870588235294, 23.3541176470588, 21.4823529411765,
18.7364705882353, 17.7235294117647, 18.3976470588235, 19.7235294117647,
21.0741176470588, 21.6094117647059, 22.9635294117647, 22.4011764705882,
23.4152941176471, 24.7741176470588, 24.3270588235294, 20.7976470588235,
18.8764705882353, 17.7788235294118, 16.4129411764706, 21.4117647058824,
22.3317647058824, 21.66, 22.3694117647059, 23.0917647058824,
24.4541176470588, 23.2847058823529, 23.3164705882353, 21.2529411764706,
19.1258823529412, 17.3882352941176, 17.3823529411765, 19.0529411764706,
19.6576470588235, 20.2976470588235, 21.9023529411765, 23.3094117647059,
24.0117647058824, 25.5611764705882, 24.9129411764706, 21.3964705882353,
19.9870588235294, 18.3929411764706, 20.9917647058824, 20.3058823529412,
21.4435294117647, 23.1941176470588, 22.8388235294118, 22.5176470588235,
24.6317647058824, 24.6541176470588, 24.2, 20.84, 18.4576470588235,
17.5011764705882, 19.16, 20.54, 20.1517647058824, 22.6776470588235,
22.7470588235294, 22.7882352941176, 22.0811764705882, 24.2152941176471,
22.9235294117647, 20.8411764705882, 19.6188235294118, 17.16,
16.0529411764706, 20.3223529411765, 19.9752941176471, 22.5152941176471,
22.2705882352941, 23.1541176470588, 23.1047058823529, 23.9517647058824,
24.8176470588235, 22.18, 20.5023529411765, 17.3505882352941,
19.1917647058824, 19.9894117647059, 19.0235294117647, 22.8235294117647,
22.7094117647059, 23.8741176470588, 24.0517647058824, 25.1764705882353,
23.9235294117647, 21.2929411764706, 20.6117647058824, 17.1305882352941,
16.3470588235294, 19.6470588235294, 21.3341176470588, 20.2176470588235,
23.7435294117647, 22.6741176470588, 22.9070588235294, 24.7152941176471,
23.2905882352941, 20.5776470588235, 18.9635294117647, 19.0658823529412,
18.8423529411765, 20.0729411764706, 21.3047058823529, 22.1588235294118,
24.0388235294118, 22.1917647058824, 24.0517647058824, 24.8729411764706,
23.0117647058824, 23, 21.3094117647059, 19.4105882352941, 20.3470588235294,
19.4482352941176, 20.0670588235294, 21.6364705882353, 23.4211764705882,
23.16, 25.4788235294118, 26.4741176470588, 24.0482352941176,
21.4176470588235, 21.7164705882353, 19.0905882352941, 19.6752941176471,
18.1611764705882, 20.0482352941176, 23.4917647058824, 23.4894117647059,
22.5482352941176, 23.1376470588235, 24.9811764705882, 24.1552941176471,
22.8423529411765, 19.7435294117647, 16.4, 17.3105882352941, 20.5235294117647,
21.0494117647059, 23.1352941176471, 23.9435294117647, 23.9058823529412,
24.9835294117647, 24.6952941176471, 24.0047058823529, 23.3164705882353,
21.5823529411765, 18.3447058823529, 18.1964705882353, 20.0035294117647,
20.7152941176471, 22.5705882352941, 24.6541176470588, 23.2329411764706,
25.0517647058824, 24.3329411764706, 23.5811764705882, 22.9988235294118,
19.4976470588235, 17.3188235294118, 19.5635294117647, 19.0211764705882,
19.7223529411765, 22.6858823529412, 23.9423529411765, 23.6905882352941,
25.7129411764706, 23.9505882352941, 24.4376470588235, 22.6070588235294,
19.8882352941176, 17.2058823529412, 16.4211764705882, 20.02,
21.9458823529412, 21.9341176470588, 22.74, 23.8, 23.9611764705882,
24.4564705882353, 24, 23.2129411764706, 19.4729411764706, 17.7105882352941,
16.9682352941176, 19.0341176470588, 20.2917647058824, 20.7776470588235,
22.9364705882353, 22.7894117647059)), .Names = c("date", "temp"
), row.names = c(NA, -360L), class = "data.frame")
Any input appreciated.
Hopefully this helps:
df$date = as.POSIXct(df$date,format="%Y-%m-%d")
df$year = as.numeric(format(df$date,format="%Y"))
df$month = as.numeric(format(df$date,format="%m"))
years = unique(df$year)
# initialize a new data frame to store in your summed values
newdf=NULL
# run through a loop starting at your second year and ending at second last
for(i in 2:(length(years)-1)){
#data from year1
start = df[df$year==years[i] & df$month %in% c(10,11,12),]
end = df[df$year==years[i+1] & df$month %in% c(1,2),]
data1 = rbind(start,end)
# in case you have NAs in your data you can add ra.rm = T
sum.data = sum(data1$temp,na.rm = T)
df1 = as.data.frame(list(Year = years[i],
sum.data = sum.data))
# or paste year 1 and year 2 together
#df1 = as.data.frame(list(Year = paste(years[i],years[i+1],sep="-"),
# sum.data = sum.data))
newdf = rbind(newdf,df1)
}
head(newdf)

Time Series based Forecasting for Daily Data but Seasonality is Quarterly - in R

I have demand for a product on daily bases for last 4 years. This demand has quarterly seasonal patterns, as shown in following image
I would like to do time series based forecasting on this data. Following is my code
myts = ts(forecastsku1$Value,frequency=90)
fit <- stl(myts, s.window="period")
plot(fit)
fit <- decompose(myts)
plot(fit)
Here instead of 4 seasonal factor ts is creating 90 seasonal factor, which is not what I want. I want to apply same seasonality on 3 month duration and then do forecasting.
Data for reference
dput(head(forecastsku1,100))
structure(list(date = structure(c(14625, 14626, 14627, 14628, 14629, 14630, 14631, 14632, 14633, 14634, 14635, 14636, 14637,
14638, 14639, 14640, 14641, 14642, 14643, 14644, 14645, 14646, 14647, 14648, 14649, 14650, 14651, 14652, 14653, 14654, 14655,
14656, 14657, 14658, 14659, 14660, 14661, 14662, 14663, 14664, 14665, 14666, 14667, 14668, 14669, 14670, 14671, 14672, 14673,
14674, 14675, 14676, 14677, 14678, 14679, 14680, 14681, 14682, 14683, 14684, 14685, 14686, 14687, 14688, 14689, 14690, 14691,
14692, 14693, 14694, 14695, 14696, 14697, 14698, 14699, 14700, 14701, 14702, 14703, 14704, 14705, 14706, 14707, 14708, 14709,
14710, 14711, 14712, 14713, 14714, 14715, 14716, 14717, 14718, 14719, 14720, 14721, 14722, 14723, 14724), class = "Date"),
Value = c(1407, 1413, 1407, 1406, 1401, 1410, 1411, 1416, 1404, 1409, 1414, 1414, 1400, 1421, 1398, 1404, 1397, 1404, 1407, 1409, 1406, 1395, 1397,
1403, 1412, 1399, 1409, 1393, 1405, 1403, 1406, 1402, 1405, 1386, 1393, 1405, 1397, 1393, 1402, 1402, 1393, 1391, 1410, 1402, 1408,
1394, 1404, 1398, 1406, 1389, 1401, 1391, 1394, 1384, 1377, 1390, 1395, 1399, 1384, 1397, 1398, 1384, 1377, 1394, 1398, 1394, 1391,
1403, 1382, 1390, 1385, 1403, 1390, 1388, 1391, 1384, 1392, 1390, 1381, 1387, 1395, 1390, 1388, 1384, 1387, 1395, 1380, 1378, 1383,
1384, 1232, 1247, 1232, 1248, 1236, 1236, 1231, 1237, 1224, 1236)),
.Names = c("date", "Value"), row.names = 13150:13249, class = "data.frame")
Can anyone help me in this case? Please let me know if more data required.
myts = ts(forecastsku1$Value,frequency=4)
fit <- decompose(myts)
plot(fit)
Result would be:
It is creating a 90 seasonal factor because your frequency is 90 in the ts definition. What you need to do is to specify a start and end in the ts and the period=4 so that the observations can be segregated the way you want them to be.. if you can successfully create a 4 seasonal factor, you can obviousy predict quarterly (4*3=12) . So instead of these dates I think it is more clear to have like start=c(2005,1) .Hopefully this is useful
this is an old question, but still, maybe my answer is of some value.
You can seasonally adjust daily data using the dsa package (disclaimer: I'm the author).
I tried to replicate your time series (or something similar) to give you an idea of how to seasonally adjust them (the setting of the seasonal adjustment try to help modelling the jumping behaviour of the time series appropriately):
# loading packages
library(dsa); library(xts)
# Replication of the data
set.seed(23)
data <- seq(1250, 1000, , length.out=365.25*4) + rnorm(365.25*4, 0, 5)
time <- seq(as.Date("2008-01-01"), by="days", length.out=365.25*4)
x <- xts(data, time)
ind <- as.numeric(format(zoo::index(x), "%m")) # Indicator of day of year
x[ind==1 | ind==2 | ind==3 | ind==7 | ind==8 | ind==9] <-
x[ind==1 | ind==2 | ind==3 | ind==7 | ind==8 | ind==9] + 200
# Seasonally adjusting the data
result <- dsa(x, fourier_number=40, reiterate3=4, reg.create=NULL, cval=30)
sa <- result$output[,1]
xtsplot(result$output[,c(2,1)], names=c("original", "seasonally adjusted"))
output(result) # creates a html in your working directory.

Resources