Spreading a data frame using an "external" df? - r

I have two data frames data1 and data2. I am trying to spread my data or create dummy variables on one column x2 in data1. I can do the following:
library(dummies)
x2dummy <- dummy(data1$x2)
final_out <- cbind(data1, x2dummy)
This gives me a large data frame of 190 columns and 500 observations. However, the universe of x2 items is larger than what appears in data1. I have a sort of dictionary, a separate data frame data2, containing all the unique items that can be chosen. How can I spread data1 by data2 so that I get 441 dummy variable columns (the length of data2), populated with the items in data1?
EDIT: Adding new smaller sample of data:
Data 1:
data1 <- structure(list(y = c(440000, 550000, 990, 135000, 267000, 135000,
239000, 170000, 855000, 158000, 1200, 256000, 86000, 98700, 450000,
130000, 465000, 308000, 680000, 305000), x1 = c(240, 156, 52,
74, 85, 70, 160, 176, 386, 65, 52, 90, 87, 193, 110, 105, 126,
76, 153, 133), x2 = c(8338, 8860, 8003, 8207, 8901, 8224, 8811,
8508, 8840, 8940, 8012, 8223, 8206, 8490, 8023, 8490, 8870, 8024,
8011, 8394)), .Names = c("y", "x1", "x2"), row.names = c(NA,
20L), class = "data.frame")
Data2:
data2 <- c(4375, 8001, 8002, 8003, 8004, 8005, 8006, 8007, 8008, 8009,
8010, 8011, 8012, 8013, 8014)
EDIT:
Thanks to the community for the edits; however, data2 as shown above no longer contains the full universe of values. For example, data1 has x2 = 8206, but this value does not appear in the data2 above, which is what I am trying to spread the data by.
I want to spread the columns of a new data frame by all unique values in data2 and then populate these columns with the values in data1 column x2.
Based on the small sample in data1, I will have a very sparse matrix.
Data2
data2 <- structure(list(x2_dictionary = c(4375, 8001, 8002, 8003, 8004,
8005, 8006, 8007, 8008, 8009, 8010, 8011, 8012, 8013, 8014, 8015,
8016, 8017, 8018, 8019, 8020, 8021, 8022, 8023, 8024, 8025, 8026,
8026, 8027, 8028, 8029, 8030, 8031, 8032, 8033, 8034, 8035, 8036,
8037, 8038, 8039, 8040, 8041, 8042, 8100, 8104, 8105, 8106, 8107,
8110, 8120, 8130, 8140, 8146, 8148, 8148, 8150, 8160, 8161, 8170,
8172, 8173, 8174, 8175, 8178, 8180, 8181, 8182, 8183, 8183, 8183,
8184, 8184, 8185, 8186, 8187, 8188, 8189, 8190, 8191, 8192, 8193,
8194, 8195, 8196, 8197, 8198, 8201, 8202, 8203, 8204, 8205, 8206,
8207, 8208, 8210, 8211, 8212, 8213, 8214, 8220, 8221, 8222, 8223,
8224, 8225, 8226, 8227, 8228, 8230, 8231, 8232, 8233, 8240, 8241,
8242, 8243, 8250, 8251, 8251, 8253, 8254, 8254, 8255, 8256, 8256,
8259, 8260, 8261, 8262, 8263, 8269, 8269, 8270, 8270, 8271, 8272,
8273, 8274, 8275, 8275, 8278, 8278, 8279, 8280, 8281, 8281, 8281,
8281, 8282, 8282, 8289, 8289, 8290, 8291, 8292, 8293, 8294, 8295,
8296, 8297, 8298, 8299, 8301, 8302, 8303, 8304, 8310, 8317, 8318,
8319, 8320, 8328, 8329, 8330, 8338, 8339, 8340, 8348, 8349, 8350,
8350, 8358, 8359, 8360, 8370, 8380, 8384, 8389, 8390, 8391, 8392,
8393, 8394, 8395, 8396, 8397, 8398, 8401, 8401, 8402, 8403, 8410,
8415, 8416, 8420, 8430, 8440, 8440, 8445, 8450, 8455, 8458, 8458,
8459, 8459, 8460, 8460, 8460, 8461, 8469, 8469, 8470, 8470, 8471,
8472, 8474, 8476, 8479, 8480, 8490, 8495, 8500, 8503, 8503, 8504,
8504, 8505, 8506, 8507, 8508, 8508, 8509, 8510, 8510, 8511, 8511,
8512, 8513, 8514, 8515, 8516, 8518, 8519, 8519, 8519, 8519, 8520,
8521, 8529, 8530, 8530, 8540, 8550, 8551, 8552, 8553, 8554, 8559,
8560, 8569, 8569, 8570, 8571, 8572, 8573, 8580, 8585, 8587, 8588,
8589, 8589, 8589, 8590, 8591, 8591, 8592, 8593, 8600, 8607, 8610,
8611, 8612, 8613, 8619, 8619, 8619, 8620, 8629, 8630, 8635, 8640,
8650, 8660, 8670, 8672, 8672, 8680, 8690, 8691, 8692, 8693, 8693,
8694, 8694, 8695, 8695, 8696, 8696, 8697, 8698, 8699, 8699, 8699,
8699, 8700, 8710, 8711, 8712, 8717, 8717, 8718, 8719, 8719, 8719,
8719, 8720, 8729, 8730, 8731, 8731, 8732, 8732, 8733, 8734, 8734,
8735, 8736, 8737, 8738, 8739, 8739, 8740, 8750, 8753, 8754, 8755,
8756, 8757, 8758, 8759, 8760, 8769, 8770, 8770, 8773, 8775, 8776,
8777, 8779, 8780, 8781, 8782, 8783, 8784, 8785, 8786, 8787, 8787,
8787, 8787, 8788, 8789, 8790, 8791, 8792, 8792, 8793, 8794, 8795,
8796, 8797, 8798, 8798, 8799, 8800, 8801, 8810, 8811, 8812, 8818,
8820, 8830, 8840, 8840, 8849, 8850, 8859, 8860, 8870, 8871, 8880,
8901, 8902, 8903, 8904, 8905, 8906, 8907, 8908, 8911, 8912, 8913,
8914, 8915, 8916, 8917, 8918, 8921, 8922, 8923, 8924, 8930, 8940,
8950, 8960, 8970, 8980, 17532, 43421, 80338)), class = "data.frame", row.names = c(NA,
-441L), .Names = "x2_dictionary")
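One possible approach, sketched on toy data rather than on the full data1 and data2 above: build one indicator column per unique dictionary value by comparing x2 against the dictionary. The toy values and the `x2_` column prefix are assumptions made for illustration. Note that duplicated dictionary entries collapse, so the number of dummy columns equals the number of unique values, which may be smaller than the number of rows in the dictionary.

```r
# Toy stand-ins for data1 and the data2 dictionary (illustrative values only)
data1 <- data.frame(y = c(10, 20, 30), x2 = c(8003, 8011, 8003))
dictionary <- c(4375, 8001, 8003, 8003, 8011)  # note the duplicated 8003

# One column per unique dictionary value
levs <- sort(unique(dictionary))

# outer() compares every x2 value with every level; * 1L turns TRUE/FALSE into 1/0
dummies <- outer(data1$x2, levs, FUN = "==") * 1L
colnames(dummies) <- paste0("x2_", levs)  # hypothetical naming scheme

final_out <- cbind(data1, dummies)
print(final_out)
```

Values of x2 that are missing from the dictionary simply get a row of zeros here, which may or may not be what you want.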

Related

Ordering column based on some strings

I have a data frame whose column names contain the strings TRG1, TRG2, TRG3, TRG4 and TRG5.
How can I order the columns of this data frame by TRG, so that the TRG1 columns come first and the TRG5 columns last?
My data is:
> dput(head(result))
structure(list(`Sample Name` = c("ACTB", "ATP5F1", "DDX5", "EEF1G",
"GAPDH", "NCL"), `31-10TRG3R` = c(15723, 1682, 16598, 17240,
38686, 10670), `31-11TRG4R` = c(24846, 3294, 25522, 38914, 73022,
14628), `31-12TRG4R` = c(7812, 1326, 5750, 9204, 12352, 5489),
`31-13TRG1R` = c(15332, 1162, 18268, 20875, 62257, 10614),
`31-14TRG4R` = c(7644, 1435, 16822, 13731, 26244, 10548),
`31-15TRG4R` = c(6501, 947, 10320, 7285, 10538, 4638), `31-16TRG4R` = c(5428,
825, 11789, 12018, 6812, 5954), `31-17TRG3R` = c(10074, 1056,
7966, 12489, 26819, 6404), `31-18TRG1R` = c(12487, 567, 13945,
16474, 43309, 11831), `31-19TRG4R` = c(5211, 917, 9144, 8024,
8200, 3935), `31-1TRG3R` = c(9928, 1112, 5726, 6227, 12942,
3644), `31-21TRG3R` = c(6806, 1460, 7472, 12420, 46378, 5871
), `31-22TRG3R` = c(4834, 640, 9807, 7082, 14823, 4594),
`31-23TRG1R` = c(3156, 765, 18034, 18982, 17237, 18880),
`31-24TRG4R` = c(6990, 761, 4440, 2833, 8150, 1340), `31-25TRG2R` = c(60621,
6290, 47502, 135948, 233717, 37583), `31-26TRG3R` = c(4198,
718, 2564, 3830, 5790, 1258), `31-27TRG2R` = c(10815, 1010,
8694, 11868, 18684, 5706), `31-28TRG4R` = c(7980, 1343, 7342,
9874, 14286, 4255), `31-29TRG1R` = c(3854, 748, 9314, 9132,
25546, 7852), `31-2TRG1R` = c(7653, 1495, 12238, 12568, 11296,
11256), `31-30TRG5R` = c(24358, 2091, 15594, 26998, 91442,
20914), `31-31TRG4R` = c(6796, 940, 12752, 11642, 41967,
12922), `31-32TRG2R` = c(127379, 11541, 90020, 74881, 234454,
51464), `31-33TRG1R` = c(4139, 338, 8260, 8650, 13916, 8000
), `31-34TRG3R` = c(37303, 2998, 22122, 30431, 51981, 11737
), `31-35TRG4R` = c(32279, 2718, 42178, 36956, 115962, 21194
), `31-36TRG3R` = c(12424, 1134, 8177, 14462, 20147, 6648
), `31-37TRG2R` = c(7031, 690, 8208, 17495, 28514, 7058),
`31-38TRG3R` = c(3645, 698, 16117, 11122, 25739, 7031), `31-39TRG3R` = c(28273,
2169, 14697, 20890, 68353, 25293), `31-3TRG4R` = c(9250,
1335, 24776, 14674, 31266, 8732), `31-40TRG1R` = c(28858,
2100, 26910, 43331, 104235, 19544), `31-41TRG1R` = c(13980,
1184, 13204, 13624, 47414, 11870), `31-42TRG2R` = c(22697,
2401, 16326, 22962, 40136, 11796), `31-43TRG3R` = c(13820,
797, 16245, 7827, 38292, 6206), `31-44TRG2R` = c(9477, 1244,
7140, 6580, 12457, 5176), `31-45TRG3R` = c(12182, 573, 2818,
3699, 4365, 1639), `31-46TRG1R` = c(5438, 997, 9226, 26045,
17740, 8628), `31-47TRG3R` = c(14419, 1927, 7350, 10375,
15736, 3415), `31-48TRG2R` = c(8758, 1002, 8044, 6677, 17354,
7355), `31-49TRG4R` = c(7738, 792, 13920, 15589, 42536, 14056
), `31-4TRG3R` = c(9947, 1115, 7267, 5957, 13831, 2793),
`31-50TRG4R` = c(6660, 701, 4092, 16796, 7958, 2408), `31-51TRG2R` = c(151880,
16572, 93610, 110556, 303604, 57029), `31-52TRG2R` = c(7184,
1396, 12785, 11124, 13050, 8934), `31-53TRG2R` = c(9012,
1118, 7786, 11482, 19512, 9143), `31-5TRG2R` = c(5479, 440,
8913, 7103, 15886, 5801), `31-6TRG4R` = c(6716, 677, 8812,
12184, 14380, 7684), `31-7TRG3R` = c(16192, 1155, 9405, 11930,
30034, 7726), `31-8TRG1R` = c(11408, 1007, 11396, 20424,
38188, 9570), `31-9TRG1R` = c(9468, 812, 10774, 8504, 15464,
4606)), row.names = c(NA, 6L), class = "data.frame")
Maybe we can extract the digits after 'TRG' and use them in order:
result2 <- result[c(1, order(as.numeric(sub(".*TRG(\\d+)\\D+", "\\1",
names(result)[-1])))+1)]
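To see what the ordering does, here is a small self-contained sketch with a three-column toy version of the data (the values are made up; only the column naming pattern follows the question):

```r
# Toy data frame with the same naming pattern as in the question
result <- data.frame(
  `Sample Name` = c("ACTB", "GAPDH"),
  `31-10TRG3R`  = c(1, 2),
  `31-13TRG1R`  = c(3, 4),
  `31-25TRG2R`  = c(5, 6),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Extract the number following "TRG" from every column except the first
trg <- as.numeric(sub(".*TRG(\\d+)\\D+", "\\1", names(result)[-1]))

# Keep column 1 in place, then the remaining columns ordered by TRG number
result2 <- result[c(1, order(trg) + 1)]
names(result2)
```

order() keeps ties in their original order, so columns sharing a TRG number stay in the order they had before.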

Replacing values in df using index - why not working?

I am using the function provided in these two questions: "Replacing values in df using index" and "How to repeat the Grubbs test and flag the outliers".
# Function to detect outliers with Grubbs test in a vector
library(outliers)  # provides grubbs.test()
grubbs.flag <- function(vector) {
  outliers <- NULL
  test <- vector
  # throw an error if there are too few values for the Grubbs test
  if (length(test) < 3) stop("Grubbs test requires > 2 input values")
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  na.vect <- test
  while (pv < 0.05) {
    # the third word of the 'alternative' string is the flagged value
    outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
    test <- vector[!vector %in% outliers]
    # stop if all but two values are flagged as outliers
    if (length(test) < 3) {
      warning("All but two values flagged as outliers")
      break
    }
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
    idx.outlier <- which(vector %in% outliers)
    na.vect <- replace(vector, idx.outlier, NA)
  }
  return(na.vect)
}
It works perfectly on the example data provided there. But when I try to run it on my data frame, it seems that the loop never ends. Does anyone know why that is?
My data:
test <- structure(list(Abs_18 = c(0.04359, 0.05682, 0.05002, 0.04997,
0.03433, 0.060055, 0.0447, 0.0499, 0.04509, 0.04875, 0.04052,
0.062785, 0.07602, 0.05072, 0.04253, 0.05595, 0.02888, 0.077018,
0.05416, 0.04966, 0.0476, 0.04252, 0.03891, 0.065207, 0.02675,
0.05892, 0.03523, 0.04546, 0.02696, 0.024995, 0.02469, 0.0442,
0.04504, 0.04421, 0.04683, 0.08017, -0.065334, 0.04914, 0.04086,
0.05341, 0.02706, 0.065362, 0.01571, 0.01021, 0.04802, 0.04807,
0.02735, 0.062755), FL_18 = c(3618, 3526, 3543, 5323, 5050, 767,
3641, 3418, 3353, 4179, 4864, 760, 3693, 3408, 3309, 5057, 4686,
748, 3693, 3349, 3240, 3934, 4876, 741, 2394, 3477, 3417, 4254,
4899, 755, 2375, 3486, 3370, 4516, 4838, 772, 817, 3449, 3361,
3945, 4856, 802, 2293, 2529, 3410, 4460, 5175, 813), Abs_25 = c(0.04261,
0.05332, 0.04966, 0.0482, 0.03355, 0.059344, 0.04572, 0.04967,
0.04275, 0.04989, 0.02745, 0.059196, 0.04649, 0.05517, 0.04181,
0.06214, 0.02749, 0.074719, 0.05264, 0.044, 0.04486, 0.03999,
0.0331, 0.058829, 0.03119, 0.05943, 0.03781, 0.04003, 0.02383,
0.069582, 0.02868, 0.04943, 0.04566, 0.0422, 0.03265, 0.067265,
-0.067674, 0.05038, 0.03828, 0.03854, 0.02671, 0.071176, 0.01602,
0.01055, 0.03961, 0.04729, 0.03009, 0.06377), FL_25 = c(2714,
2656, 2625, 3856, 3642, 606, 2759, 2580, 2498, 3276, 3495, 596,
2808, 2590, 2482, 3759, 3365, 586, 2838, 2548, 2433, 2864, 3557,
591, 1878, 2664, 2588, 3081, 3603, 602, 1820, 2672, 2576, 3154,
3589, 617, 572, 2661, 2575, 2918, 3601, 635, 1739, 1924, 2650,
3260, 3866, 655)), .Names = c("Abs_18", "FL_18", "Abs_25", "FL_25"
), row.names = c(NA, -48L), class = "data.frame")
I am using:
apply(test,2,grubbs.flag)

Check date of a year not present in dataset

I am doing this in R.
I have a set of 361 observations, "Dataset", with 2 columns: Date and a numeric column. All the dates are between 2015-01-01 and 2015-12-31. So there are 4 days missing from this set, and I would like to know which ones.
I tried:
MA <- rep(NA, 365)
for(i in 2:365){
MA[1] <- as.Date("2015-01-01")
MA[i] <- MA[i-1] + days(1)
}
MA[!(%in% Dataset$Date)]
But it doesn't work: the vector MA consists of the number 16436 repeated 365 times.
Is there a solution for this?
EDIT:
This is the set I called Dataset above:
dput(AW1)
structure(list(Date = structure(c(1420070400, 1420243200, 1420329600,
1420416000, 1420502400, 1420588800, 1420675200, 1420761600, 1420848000,
1420934400, 1421020800, 1421107200, 1421193600, 1421280000, 1421366400,
1421452800, 1421539200, 1421625600, 1421712000, 1421798400, 1421884800,
1421971200, 1422057600, 1422144000, 1422230400, 1422316800, 1422403200,
1422489600, 1422576000, 1422662400, 1422748800, 1422835200, 1422921600,
1423008000, 1423094400, 1423180800, 1423267200, 1423353600, 1423440000,
1423526400, 1423612800, 1423699200, 1423785600, 1423872000, 1423958400,
1424044800, 1424131200, 1424217600, 1424304000, 1424390400, 1424476800,
1424563200, 1424649600, 1424736000, 1424822400, 1424908800, 1424995200,
1425081600, 1425168000, 1425254400, 1425340800, 1425427200, 1425513600,
1425600000, 1425686400, 1425772800, 1425859200, 1425945600, 1426032000,
1426118400, 1426204800, 1426291200, 1426377600, 1426464000, 1426550400,
1426636800, 1426723200, 1426809600, 1426896000, 1426982400, 1427068800,
1427155200, 1427241600, 1427328000, 1427414400, 1427500800, 1427587200,
1427673600, 1427760000, 1427846400, 1427932800, 1428019200, 1428105600,
1428192000, 1428278400, 1428364800, 1428451200, 1428537600, 1428624000,
1428710400, 1428796800, 1428883200, 1428969600, 1429056000, 1429142400,
1429228800, 1429315200, 1429401600, 1429488000, 1429574400, 1429660800,
1429747200, 1429833600, 1429920000, 1430006400, 1430092800, 1430179200,
1430265600, 1430352000, 1430438400, 1430524800, 1430611200, 1430697600,
1430784000, 1430870400, 1430956800, 1431043200, 1431129600, 1431216000,
1431302400, 1431388800, 1431475200, 1431561600, 1431734400, 1431820800,
1431907200, 1431993600, 1432080000, 1432166400, 1432252800, 1432339200,
1432425600, 1432512000, 1432598400, 1432684800, 1432771200, 1432857600,
1432944000, 1433030400, 1433116800, 1433203200, 1433289600, 1433376000,
1433462400, 1433548800, 1433635200, 1433721600, 1433808000, 1433894400,
1433980800, 1434067200, 1434153600, 1434240000, 1434326400, 1434412800,
1434499200, 1434585600, 1434672000, 1434758400, 1434844800, 1434931200,
1435017600, 1435104000, 1435190400, 1435276800, 1435363200, 1435449600,
1435536000, 1435622400, 1435708800, 1435795200, 1435881600, 1435968000,
1436054400, 1436140800, 1436227200, 1436313600, 1436400000, 1436486400,
1436572800, 1436659200, 1436745600, 1436832000, 1436918400, 1437004800,
1437091200, 1437177600, 1437264000, 1437350400, 1437436800, 1437523200,
1437609600, 1437696000, 1437782400, 1437868800, 1437955200, 1438041600,
1438128000, 1438214400, 1438300800, 1438387200, 1438473600, 1438560000,
1438646400, 1438732800, 1438819200, 1438905600, 1438992000, 1439078400,
1439164800, 1439251200, 1439337600, 1439424000, 1439510400, 1439596800,
1439683200, 1439769600, 1439856000, 1439942400, 1440028800, 1440115200,
1440201600, 1440288000, 1440374400, 1440460800, 1440547200, 1440633600,
1440720000, 1440806400, 1440892800, 1440979200, 1441065600, 1441152000,
1441238400, 1441324800, 1441411200, 1441497600, 1441584000, 1441670400,
1441756800, 1441843200, 1441929600, 1442016000, 1442102400, 1442188800,
1442275200, 1442361600, 1442448000, 1442534400, 1442620800, 1442707200,
1442793600, 1442880000, 1442966400, 1443052800, 1443139200, 1443225600,
1443312000, 1443398400, 1443484800, 1443571200, 1443657600, 1443744000,
1443830400, 1443916800, 1444003200, 1444089600, 1444176000, 1444262400,
1444348800, 1444435200, 1444521600, 1444608000, 1444694400, 1444780800,
1444867200, 1444953600, 1445040000, 1445126400, 1445212800, 1445299200,
1445385600, 1445472000, 1445558400, 1445644800, 1445731200, 1445817600,
1445904000, 1445990400, 1446076800, 1446163200, 1446249600, 1446336000,
1446422400, 1446508800, 1446595200, 1446681600, 1446768000, 1446854400,
1446940800, 1447027200, 1447113600, 1447200000, 1447286400, 1447372800,
1447459200, 1447545600, 1447632000, 1447718400, 1447804800, 1447891200,
1447977600, 1448064000, 1448150400, 1448236800, 1448323200, 1448409600,
1448496000, 1448582400, 1448668800, 1448755200, 1448841600, 1448928000,
1449014400, 1449100800, 1449187200, 1449273600, 1449360000, 1449446400,
1449532800, 1449619200, 1449705600, 1449792000, 1449878400, 1449964800,
1450051200, 1450137600, 1450224000, 1450310400, 1450396800, 1450483200,
1450569600, 1450656000, 1450742400, 1450828800, 1450915200, 1451001600,
1451174400, 1451260800, 1451347200, 1451433600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Volume = c(2224.5, 44.3, 1835.4, 22205.2,
1100.9, 1409.7, 4233.9, 1857.5, 0.5, 1378.6, 1917.7, 4438.1,
73314, 1929.7, 666.9, 26.4, 1331.7, 7182.9, 2902.4, 22501.5,
2632.9, 1301.7, 102, 3673.7, 3446.7, 24917.2, 3867.7, 3977.5,
1780.7, 13.2, 2762.6, 5084.2, 3071.9, 4674, 4061.2, 2567.3, 216.5,
3323.7, 16072.4, 2108.4, 2786.2, 2883.9, 1848, 50.2, 2884.5,
9099.1, 4772.4, 2814.2, 2507.8, 1532.9, 2, 2932.5, 5734.1, 3077.1,
4960.5, 4289.3, 39098.7, 42.7, 1688.5, 3714.8, 6161.5, 4288.6,
25189, 2376.3, 18.4, 2530.1, 28803.4, 4369.3, 7202.6, 3500.1,
1880.4, 1705.5, 1541.4, 10804.1, 3712.7, 3182.5, 3527.6, 2266.8,
123.5, 2721.4, 5698, 8242.8, 4526.2, 13216.9, 1666.8, 61.8, 1596.4,
3999, 2026.6, 8054.1, 7198.6, 1754.9, 9.7, 44.4, 2837.6, 3479.5,
5583.3, 2247.9, 11005.5, 112, 614.1, 3668.8, 2464.6, 2156.6,
2086, 854.2, 90.1, 673.2, 18881.6, 2561.1, 11970.8, 2405.9, 1322.4,
226.2, 900.7, 1119.4, 3307.2, 10196, 2721.7, 27680.5, 7.4, 1130.1,
5506.6, 4332.5, 4490, 3839.1, 3902.9, 160.1, 1335.7, 13019.7,
1928.8, 2770.7, 58916.9, 200.6, 1759.9, 5744.1, 4217.8, 1734.2,
2385.6, 2810.8, 2409.8, 616.3, 2927.8, 1196.8, 4121.3, 18369.2,
2028, 3970, 1653.5, 8414.8, 3273.6, 2806.7, 3887.8, 1921, 3088.3,
1969.7, 1570.6, 3932.8, 16083.7, 4239.9, 2512.2, 2256.3, 618.8,
2312.8, 3129.2, 2973.7, 3311, 1889.8, 4972.5, 1871.8, 1480.9,
3875.4, 2899.1, 3199.6, 1227.6, 22825.8, 1704.6, 2799.4, 2039.6,
1579.7, 4847.7, 1284.8, 68.7, 1506.6, 18901.3, 13065.2, 30693.9,
4664.7, 4345, 11.6, 519.9, 2128.6, 4278.8, 2287.6, 2350.6, 577.7,
5.5, 987.8, 11598.7, 3479.5, 195.2, 5739.5, 2712.7, 45.6, 209.2,
5504.3, 2638.1, 1502.4, 2591.6, 983.5, 47.2, 556.9, 6807.1, 3577.6,
1790.5, 3795.6, 2223.6, 37.7, 599.7, 3029.7, 3722.8, 3904.5,
3650.1, 1190.3, 100.6, 605.9, 2981.2, 2090.1, 1876.7, 2296.2,
1013.7, 49.8, 421.3, 3973.4, 3028.6, 2808.4, 3595.6, 1450, 43.4,
914.4, 4933.7, 3790.2, 1735.5, 2675.1, 1211.9, 48, 1134.9, 3888.2,
5568.9, 3657.6, 7268.8, 2565.8, 44.1, 509.6, 56995.8, 2383.3,
1789.9, 4338.9, 2458.1, 63.4, 1073.7, 4398.2, 3822.8, 879, 2079,
2036.6, 216.6, 633.8, 9265.2, 1682.8, 1500.9, 3907.3, 2813.5,
17, 4582.7, 9989.6, 3588.3, 5064.6, 97352.7, 1892.3, 54, 1141.1,
10532.7, 9683, 19452.3, 4151.3, 2243, 33.7, 2208.9, 6159.6, 5811.6,
54718, 4610.5, 3598.8, 167.3, 8045.6, 6464.1, 3895.1, 3857.8,
4043.6, 2080.8, 350.4, 16011.2, 7012.4, 4329.9, 4554.6, 7454.4,
4379, 49.9, 2446.7, 32326.9, 28430.4, 11898.1, 11953.9, 3514.7,
74.3, 7928.2, 2188.7, 1895.9, 2113.7, 4400.2, 2367, 10, 2460,
2607.7, 14809.5, 2594.6, 2670.7, 3387.4, 26.2, 2321.6, 2555.1,
2302, 17930.3, 5320.1, 1865.2, 69, 3560.6, 1396.6, 3248, 2639.1,
4639.1, 327.2, 177.8, 3518.4, 3120.7, 4778.8, 4848.4, 2806.6,
3855.5, 1.7, 4524.5, 2473.7, 4024.4, 2574.3, 1350.6, 2.9, 703.1,
940.7, 9048.1, 164.2)), .Names = c("Date", "Volume"), row.names = c(NA,
-361L), class = "data.frame")

R - How to perform cross-year date operations?

I am working with daily measurements of temperature. In total I have about 40 years of observations. How can I perform date operations covering a time interval that crosses years?
For example, I want to sum the values from every October-to-February period. However, the sum should be taken only over the contiguous period Oct-Nov-Dec-Jan-Feb.
"Isolated" months should not be taken into account, for example Jan and Feb of the first year, or Oct-Nov-Dec of the last year.
For example, this is what I am looking for:
1st year 2nd year 3rd year
J-F-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-O-N-D
But this is not OK:
1st year 2nd year 3rd year
**J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D J-F**-M-A-M-J-J-A-S-**O-N-D**
This is a sample data frame to work on:
df <- structure(list(date = structure(c(-3653, -3622, -3593, -3562,
-3532, -3501, -3471, -3440, -3409, -3379, -3348, -3318, -3287,
-3256, -3228, -3197, -3167, -3136, -3106, -3075, -3044, -3014,
-2983, -2953, -2922, -2891, -2863, -2832, -2802, -2771, -2741,
-2710, -2679, -2649, -2618, -2588, -2557, -2526, -2498, -2467,
-2437, -2406, -2376, -2345, -2314, -2284, -2253, -2223, -2192,
-2161, -2132, -2101, -2071, -2040, -2010, -1979, -1948, -1918,
-1887, -1857, -1826, -1795, -1767, -1736, -1706, -1675, -1645,
-1614, -1583, -1553, -1522, -1492, -1461, -1430, -1402, -1371,
-1341, -1310, -1280, -1249, -1218, -1188, -1157, -1127, -1096,
-1065, -1037, -1006, -976, -945, -915, -884, -853, -823, -792,
-762, -731, -700, -671, -640, -610, -579, -549, -518, -487, -457,
-426, -396, -365, -334, -306, -275, -245, -214, -184, -153, -122,
-92, -61, -31, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304,
334, 365, 396, 424, 455, 485, 516, 546, 577, 608, 638, 669, 699,
730, 761, 790, 821, 851, 882, 912, 943, 974, 1004, 1035, 1065,
1096, 1127, 1155, 1186, 1216, 1247, 1277, 1308, 1339, 1369, 1400,
1430, 1461, 1492, 1520, 1551, 1581, 1612, 1642, 1673, 1704, 1734,
1765, 1795, 1826, 1857, 1885, 1916, 1946, 1977, 2007, 2038, 2069,
2099, 2130, 2160, 2191, 2222, 2251, 2282, 2312, 2343, 2373, 2404,
2435, 2465, 2496, 2526, 2557, 2588, 2616, 2647, 2677, 2708, 2738,
2769, 2800, 2830, 2861, 2891, 2922, 2953, 2981, 3012, 3042, 3073,
3103, 3134, 3165, 3195, 3226, 3256, 3287, 3318, 3346, 3377, 3407,
3438, 3468, 3499, 3530, 3560, 3591, 3621, 3652, 3683, 3712, 3743,
3773, 3804, 3834, 3865, 3896, 3926, 3957, 3987, 4018, 4049, 4077,
4108, 4138, 4169, 4199, 4230, 4261, 4291, 4322, 4352, 4383, 4414,
4442, 4473, 4503, 4534, 4564, 4595, 4626, 4656, 4687, 4717, 4748,
4779, 4807, 4838, 4868, 4899, 4929, 4960, 4991, 5021, 5052, 5082,
5113, 5144, 5173, 5204, 5234, 5265, 5295, 5326, 5357, 5387, 5418,
5448, 5479, 5510, 5538, 5569, 5599, 5630, 5660, 5691, 5722, 5752,
5783, 5813, 5844, 5875, 5903, 5934, 5964, 5995, 6025, 6056, 6087,
6117, 6148, 6178, 6209, 6240, 6268, 6299, 6329, 6360, 6390, 6421,
6452, 6482, 6513, 6543, 6574, 6605, 6634, 6665, 6695, 6726, 6756,
6787, 6818, 6848, 6879, 6909, 6940, 6971, 6999, 7030, 7060, 7091,
7121, 7152, 7183, 7213, 7244, 7274), class = "Date"), temp = c(22.9223529411765,
23.0705882352941, 23.1094117647059, 20.7835294117647, 17.4517647058824,
17.3176470588235, 18.0494117647059, 19.6188235294118, 21.3023529411765,
23.1105882352941, 22.2364705882353, 22.7482352941176, 23.5870588235294,
24.0023529411765, 23.0094117647059, 22.0176470588235, 19.4917647058824,
18.1011764705882, 18.3164705882353, 20.0623529411765, 22.8717647058824,
23.2576470588235, 23.68, 22.3694117647059, 22.9517647058824,
23.6976470588235, 23.3294117647059, 20.8564705882353, 18.16,
15.8988235294118, 15.7988235294118, 18.4176470588235, 20.8423529411765,
20.3247058823529, 22.3070588235294, 22.2035294117647, 24.2235294117647,
23.6976470588235, 24.4082352941176, 21.1752941176471, 18.1023529411765,
16.1211764705882, 18.3164705882353, 19.7635294117647, 23.1294117647059,
22.9964705882353, 23.6552941176471, 22.6964705882353, 23.6011764705882,
23.6517647058824, 23.7035294117647, 22.4352941176471, 18.5835294117647,
16.5976470588235, 15.7741176470588, 19.2541176470588, 20.8776470588235,
20.5729411764706, 21.1729411764706, 21.5870588235294, 22.4576470588235,
23.6058823529412, 21.84, 21.6694117647059, 19.2458823529412,
18.7517647058824, 17.7811764705882, 19.4764705882353, 21.9270588235294,
21.5470588235294, 22.88, 23.2458823529412, 24.2776470588235,
25.2470588235294, 23.4694117647059, 21.4435294117647, 19.3941176470588,
18.5447058823529, 17.6, 18.3764705882353, 19.8529411764706, 22.0823529411765,
22.7294117647059, 23.4011764705882, 23.3611764705882, 24.2505882352941,
23.2870588235294, 21.9482352941176, 20.5552941176471, 18.0788235294118,
18.5929411764706, 20.8752941176471, 21.9023529411765, 23.6105882352941,
22.4070588235294, 21.5635294117647, 23.3129411764706, 22.9741176470588,
23.3670588235294, 19.6105882352941, 16.9941176470588, 17.7670588235294,
17.4858823529412, 17.8517647058824, 20.26, 22.1576470588235,
23.8364705882353, 23.4447058823529, 24.8129411764706, 25.1764705882353,
24.2694117647059, 21.5035294117647, 20.0458823529412, 18.4694117647059,
18.4541176470588, 19.5388235294118, 22.02, 20.5364705882353,
22.9858823529412, 21.9752941176471, 23.7729411764706, 24.0576470588235,
24.0941176470588, 22.1552941176471, 21.2329411764706, 19.5611764705882,
17.8788235294118, 18.6823529411765, 20.1541176470588, 21.6258823529412,
21.5211764705882, 23.9811764705882, 24.8352941176471, 24.5882352941176,
24.1729411764706, 21.1035294117647, 19.0435294117647, 17.08,
17.4529411764706, 19.1458823529412, 20.4447058823529, 20.7129411764706,
21.5047058823529, 22.6952941176471, 23.4364705882353, 23.1, 24.1847058823529,
19.8105882352941, 19.9847058823529, 20.5188235294118, 17.7658823529412,
19.4435294117647, 20.7588235294118, 21.7835294117647, 22.7788235294118,
23.2388235294118, 24.9129411764706, 25.6, 23.5647058823529, 24.0058823529412,
19.7823529411765, 19.3152941176471, 18.7741176470588, 19.0305882352941,
20.5576470588235, 21.3611764705882, 21.4247058823529, 23.4811764705882,
23.6505882352941, 25.1870588235294, 23.3541176470588, 21.4823529411765,
18.7364705882353, 17.7235294117647, 18.3976470588235, 19.7235294117647,
21.0741176470588, 21.6094117647059, 22.9635294117647, 22.4011764705882,
23.4152941176471, 24.7741176470588, 24.3270588235294, 20.7976470588235,
18.8764705882353, 17.7788235294118, 16.4129411764706, 21.4117647058824,
22.3317647058824, 21.66, 22.3694117647059, 23.0917647058824,
24.4541176470588, 23.2847058823529, 23.3164705882353, 21.2529411764706,
19.1258823529412, 17.3882352941176, 17.3823529411765, 19.0529411764706,
19.6576470588235, 20.2976470588235, 21.9023529411765, 23.3094117647059,
24.0117647058824, 25.5611764705882, 24.9129411764706, 21.3964705882353,
19.9870588235294, 18.3929411764706, 20.9917647058824, 20.3058823529412,
21.4435294117647, 23.1941176470588, 22.8388235294118, 22.5176470588235,
24.6317647058824, 24.6541176470588, 24.2, 20.84, 18.4576470588235,
17.5011764705882, 19.16, 20.54, 20.1517647058824, 22.6776470588235,
22.7470588235294, 22.7882352941176, 22.0811764705882, 24.2152941176471,
22.9235294117647, 20.8411764705882, 19.6188235294118, 17.16,
16.0529411764706, 20.3223529411765, 19.9752941176471, 22.5152941176471,
22.2705882352941, 23.1541176470588, 23.1047058823529, 23.9517647058824,
24.8176470588235, 22.18, 20.5023529411765, 17.3505882352941,
19.1917647058824, 19.9894117647059, 19.0235294117647, 22.8235294117647,
22.7094117647059, 23.8741176470588, 24.0517647058824, 25.1764705882353,
23.9235294117647, 21.2929411764706, 20.6117647058824, 17.1305882352941,
16.3470588235294, 19.6470588235294, 21.3341176470588, 20.2176470588235,
23.7435294117647, 22.6741176470588, 22.9070588235294, 24.7152941176471,
23.2905882352941, 20.5776470588235, 18.9635294117647, 19.0658823529412,
18.8423529411765, 20.0729411764706, 21.3047058823529, 22.1588235294118,
24.0388235294118, 22.1917647058824, 24.0517647058824, 24.8729411764706,
23.0117647058824, 23, 21.3094117647059, 19.4105882352941, 20.3470588235294,
19.4482352941176, 20.0670588235294, 21.6364705882353, 23.4211764705882,
23.16, 25.4788235294118, 26.4741176470588, 24.0482352941176,
21.4176470588235, 21.7164705882353, 19.0905882352941, 19.6752941176471,
18.1611764705882, 20.0482352941176, 23.4917647058824, 23.4894117647059,
22.5482352941176, 23.1376470588235, 24.9811764705882, 24.1552941176471,
22.8423529411765, 19.7435294117647, 16.4, 17.3105882352941, 20.5235294117647,
21.0494117647059, 23.1352941176471, 23.9435294117647, 23.9058823529412,
24.9835294117647, 24.6952941176471, 24.0047058823529, 23.3164705882353,
21.5823529411765, 18.3447058823529, 18.1964705882353, 20.0035294117647,
20.7152941176471, 22.5705882352941, 24.6541176470588, 23.2329411764706,
25.0517647058824, 24.3329411764706, 23.5811764705882, 22.9988235294118,
19.4976470588235, 17.3188235294118, 19.5635294117647, 19.0211764705882,
19.7223529411765, 22.6858823529412, 23.9423529411765, 23.6905882352941,
25.7129411764706, 23.9505882352941, 24.4376470588235, 22.6070588235294,
19.8882352941176, 17.2058823529412, 16.4211764705882, 20.02,
21.9458823529412, 21.9341176470588, 22.74, 23.8, 23.9611764705882,
24.4564705882353, 24, 23.2129411764706, 19.4729411764706, 17.7105882352941,
16.9682352941176, 19.0341176470588, 20.2917647058824, 20.7776470588235,
22.9364705882353, 22.7894117647059)), .Names = c("date", "temp"
), row.names = c(NA, -360L), class = "data.frame")
Any input appreciated.
Hopefully this helps:
df$date = as.POSIXct(df$date, format = "%Y-%m-%d")
df$year = as.numeric(format(df$date, format = "%Y"))
df$month = as.numeric(format(df$date, format = "%m"))
years = unique(df$year)
# initialize a new data frame to store your summed values
newdf = NULL
# run through a loop starting at your second year and ending at the second last
for (i in 2:(length(years) - 1)) {
  # Oct-Dec of year i, followed by Jan-Feb of year i + 1
  start = df[df$year == years[i] & df$month %in% c(10, 11, 12), ]
  end = df[df$year == years[i + 1] & df$month %in% c(1, 2), ]
  data1 = rbind(start, end)
  # in case you have NAs in your data you can add na.rm = T
  sum.data = sum(data1$temp, na.rm = T)
  df1 = as.data.frame(list(Year = years[i],
                           sum.data = sum.data))
  # or paste year 1 and year 2 together
  # df1 = as.data.frame(list(Year = paste(years[i], years[i + 1], sep = "-"),
  #                          sum.data = sum.data))
  newdf = rbind(newdf, df1)
}
head(newdf)
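As an alternative without a loop, here is a hedged sketch of the same idea on a toy monthly series (three years, temp fixed at 1 so the sums are easy to check). It is not the method from the answer above: each Oct-Feb season is labelled by the year in which it starts, and seasons that do not contain all five months are dropped. The variable names `season` and `sums` are made up for the example.

```r
# Toy monthly series covering 1960-1962 (illustrative values only)
df <- data.frame(
  date = seq(as.Date("1960-01-01"), by = "month", length.out = 36),
  temp = 1
)

month <- as.integer(format(df$date, "%m"))
year  <- as.integer(format(df$date, "%Y"))

# Keep only the months belonging to an Oct-Feb season
in_season <- month %in% c(10, 11, 12, 1, 2)

# Label each season by its starting year: Jan/Feb belong to the previous year's season
season <- ifelse(month >= 10, year, year - 1L)

seasonal <- data.frame(season = season[in_season],
                       month  = month[in_season],
                       temp   = df$temp[in_season])

# Sum per season, then keep only seasons in which all five months are present
sums <- aggregate(temp ~ season, data = seasonal, FUN = sum)
n_months <- tapply(seasonal$month, seasonal$season, function(m) length(unique(m)))
complete <- as.integer(names(n_months)[n_months == 5])
sums <- sums[sums$season %in% complete, ]
sums
```

In the toy data the isolated Jan-Feb of 1960 and Oct-Dec of 1962 are dropped, leaving the seasons starting in 1960 and 1961. Counting distinct months rather than rows means the completeness check should carry over to daily data as well.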

Time Series based Forecasting for Daily Data but Seasonality is Quarterly - in R

I have daily demand for a product for the last 4 years. This demand has quarterly seasonal patterns, as shown in the following image.
I would like to do time-series-based forecasting on this data. Here is my code:
myts = ts(forecastsku1$Value,frequency=90)
fit <- stl(myts, s.window="period")
plot(fit)
fit <- decompose(myts)
plot(fit)
Here, instead of 4 seasonal factors, ts is creating 90 seasonal factors, which is not what I want. I want to apply the same seasonality over each 3-month duration and then do the forecasting.
Data for reference
dput(head(forecastsku1,100))
structure(list(date = structure(c(14625, 14626, 14627, 14628, 14629, 14630, 14631, 14632, 14633, 14634, 14635, 14636, 14637,
14638, 14639, 14640, 14641, 14642, 14643, 14644, 14645, 14646, 14647, 14648, 14649, 14650, 14651, 14652, 14653, 14654, 14655,
14656, 14657, 14658, 14659, 14660, 14661, 14662, 14663, 14664, 14665, 14666, 14667, 14668, 14669, 14670, 14671, 14672, 14673,
14674, 14675, 14676, 14677, 14678, 14679, 14680, 14681, 14682, 14683, 14684, 14685, 14686, 14687, 14688, 14689, 14690, 14691,
14692, 14693, 14694, 14695, 14696, 14697, 14698, 14699, 14700, 14701, 14702, 14703, 14704, 14705, 14706, 14707, 14708, 14709,
14710, 14711, 14712, 14713, 14714, 14715, 14716, 14717, 14718, 14719, 14720, 14721, 14722, 14723, 14724), class = "Date"),
Value = c(1407, 1413, 1407, 1406, 1401, 1410, 1411, 1416, 1404, 1409, 1414, 1414, 1400, 1421, 1398, 1404, 1397, 1404, 1407, 1409, 1406, 1395, 1397,
1403, 1412, 1399, 1409, 1393, 1405, 1403, 1406, 1402, 1405, 1386, 1393, 1405, 1397, 1393, 1402, 1402, 1393, 1391, 1410, 1402, 1408,
1394, 1404, 1398, 1406, 1389, 1401, 1391, 1394, 1384, 1377, 1390, 1395, 1399, 1384, 1397, 1398, 1384, 1377, 1394, 1398, 1394, 1391,
1403, 1382, 1390, 1385, 1403, 1390, 1388, 1391, 1384, 1392, 1390, 1381, 1387, 1395, 1390, 1388, 1384, 1387, 1395, 1380, 1378, 1383,
1384, 1232, 1247, 1232, 1248, 1236, 1236, 1231, 1237, 1224, 1236)),
.Names = c("date", "Value"), row.names = 13150:13249, class = "data.frame")
Can anyone help me with this? Please let me know if more data is required.
myts = ts(forecastsku1$Value,frequency=4)
fit <- decompose(myts)
plot(fit)
Result would be:
It creates 90 seasonal factors because the frequency is 90 in your ts definition. What you need to do is specify a start and end in ts and a period of 4, so that the observations are segregated the way you want. If you can successfully create 4 seasonal factors, you can obviously predict quarterly (4*3=12). So instead of these dates, I think it is clearer to use something like start=c(2005,1). Hopefully this is useful.
This is an old question, but maybe my answer is still of some value.
You can seasonally adjust daily data using the dsa package (disclaimer: I'm the author).
I tried to replicate your time series (or something similar) to give you an idea of how to seasonally adjust it (the settings of the seasonal adjustment are meant to help model the jumping behaviour of the time series appropriately):
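One way to end up with exactly 4 seasonal factors, sketched under assumptions and not necessarily what the answer above intends: aggregate the daily values to one value per calendar quarter first, then build a ts with frequency = 4. The simulated data, the start year 2010, and the use of the quarterly mean are all assumptions for illustration.

```r
# Simulated daily demand for 4 full years (illustrative only)
set.seed(1)
dates <- seq(as.Date("2010-01-01"), as.Date("2013-12-31"), by = "day")
value <- 1400 + rnorm(length(dates), mean = 0, sd = 5)

# Label every day with its year and quarter, e.g. "2010-Q1"
qtr <- paste0(format(dates, "%Y"), "-", quarters(dates))

# One value per quarter; tapply() returns them in sorted (chronological) label order
q_mean <- tapply(value, qtr, mean)

# Quarterly series: 4 observations per year, starting in Q1 of 2010
myts <- ts(as.numeric(q_mean), start = c(2010, 1), frequency = 4)

fit <- decompose(myts)
fit$figure  # the 4 estimated seasonal factors
plot(fit)
```

The trade-off is that the within-quarter daily detail is lost, so forecasts from this series are quarterly rather than daily.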
# loading packages
library(dsa); library(xts)
# Replication of the data
set.seed(23)
data <- seq(1250, 1000, , length.out=365.25*4) + rnorm(365.25*4, 0, 5)
time <- seq(as.Date("2008-01-01"), by="days", length.out=365.25*4)
x <- xts(data, time)
ind <- as.numeric(format(zoo::index(x), "%m")) # Indicator of month of year
x[ind==1 | ind==2 | ind==3 | ind==7 | ind==8 | ind==9] <-
x[ind==1 | ind==2 | ind==3 | ind==7 | ind==8 | ind==9] + 200
# Seasonally adjusting the data
result <- dsa(x, fourier_number=40, reiterate3=4, reg.create=NULL, cval=30)
sa <- result$output[,1]
xtsplot(result$output[,c(2,1)], names=c("original", "seasonally adjusted"))
output(result) # creates an HTML file in your working directory.
