How to plot two groups of values? - r

These are my four sets of mean values:
meanf1hindi = c(253, 297, 377, 426, 476, 518, 560, 620, 657, 697)
meanf2hindi = c(850, 887, 1017, 1080, 1197, 1342, 1694, 1820, 2265)
meanf1tamil = c(260, 304, 390, 435, 483, 527, 563, 628, 670, 704)
meanf2tamil = c(891, 826, 1018, 1068, 1188, 1355, 1709, 1834, 1976, 2303)
I would like to make a line graph of meanf1hindi against meanf2hindi, and do the same with meanf1tamil and meanf2tamil.
This is what I have tried so far; I don't know how to proceed:
plot(meanf1hindi, meanf2hindi)
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ

You get the error because your vectors have different lengths: meanf1hindi has 10 values and meanf2hindi has 9. One fix is to make the lengths equal by dropping one value from the longer vector. Here the first value of meanf1hindi is dropped; which value to drop depends on which observation has no partner in meanf2hindi:
> length(meanf1hindi)
[1] 10
> length(meanf2hindi)
[1] 9
plot(meanf1hindi[-1], meanf2hindi)
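To get both languages on one line graph, the idea above can be extended as in the following sketch. It keeps the answer's assumption that the first F1 value of Hindi is the unmatched one; the axis labels, colours, and the name f1hindi are illustrative choices, not part of the original question.

```r
meanf1hindi <- c(253, 297, 377, 426, 476, 518, 560, 620, 657, 697)
meanf2hindi <- c(850, 887, 1017, 1080, 1197, 1342, 1694, 1820, 2265)
meanf1tamil <- c(260, 304, 390, 435, 483, 527, 563, 628, 670, 704)
meanf2tamil <- c(891, 826, 1018, 1068, 1188, 1355, 1709, 1834, 1976, 2303)

# Assumption (same as in the answer): the first Hindi F1 value has no F2 partner
f1hindi <- meanf1hindi[-1]

# Common axis limits so that both curves fit in the same panel
xlim <- range(f1hindi, meanf1tamil)
ylim <- range(meanf2hindi, meanf2tamil)

# Hindi as points joined by lines, then Tamil overlaid on the same axes
plot(f1hindi, meanf2hindi, type = "b", col = "blue",
     xlim = xlim, ylim = ylim, xlab = "mean F1", ylab = "mean F2")
lines(meanf1tamil, meanf2tamil, type = "b", col = "red")
legend("topleft", legend = c("Hindi", "Tamil"),
       col = c("blue", "red"), lty = 1, pch = 1)
```

If the missing Hindi value sits elsewhere in the series, only the index in meanf1hindi[-1] needs to change.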

Related

Interpolate with splines without surpassing next value R

I have a dataset of accumulated (cumulative) data. I am trying to interpolate some missing values, but at some points the interpolated value is higher than the next observation. This is an example of my data:
library(tibble)
dat <- tibble(day=c(1:30),
value=c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
NA, NA, 487, 487, 487, 487))
My dataset is quite long, and when I use smooth.spline to interpolate the missing values I get values greater than the next observation, which is absurd considering that I am dealing with accumulated data. This is the output I get:
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
My question is: can you somehow set boundaries for the interpolation so that the result is reliable? If so, how?
Your data are monotonic, so you can use the "hyman" method in spline(), which produces a monotone interpolating spline:
x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
plot(x, y, pch = 19) ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2) ## interpolation at NA (red)
Package zoo has a number of functions to fill NA values, one of which is na.spline. So as G. Grothendieck (a wizard for time series) suggests, the following does the same:
library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
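A quick way to confirm that the Hyman fit respects the cumulative constraint is to check that the filled series never decreases. The sketch below repeats the interpolation in base R only; the names known, filled, and tol are illustrative, and the small tolerance is an assumption to absorb floating-point noise on the flat stretches.

```r
day <- 1:30
value <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
           383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
           NA, NA, 487, 487, 487, 487)

known <- !is.na(value)

# Fill the gaps with a monotone (Hyman-filtered) cubic spline
filled <- value
filled[!known] <- spline(day[known], value[known],
                         xout = day[!known], method = "hyman")$y

# A cumulative series must never decrease and must stay within the observed range
tol <- 1e-8
is_monotone <- all(diff(filled) >= -tol)
in_range <- all(filled >= min(value, na.rm = TRUE) - tol &
                filled <= max(value, na.rm = TRUE) + tol)
print(c(monotone = is_monotone, in_range = in_range))
```

The same two checks applied to the smooth.spline output from the question would fail, for example at the values above 487 near the end.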

How to smooth data of increasing noise

Chemist here (so not very good with statistical analysis) and novice in R:
I have various sets of data where the yield of a reaction is monitored with time such as:
The data:
df <- structure(list(time = c(15, 30, 45, 60, 75, 90, 105, 120, 135,
150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330,
345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525,
540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720,
735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915,
930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080,
1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245,
1260, 1275, 1290, 1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410,
1425, 1440, 1455, 1470, 1485, 1500, 1515, 1530, 1545, 1560, 1575,
1590, 1605, 1620, 1635, 1650, 1665, 1680, 1695, 1710, 1725, 1740,
1755, 1770, 1785, 1800, 1815, 1830, 1845, 1860, 1875, 1890, 1905,
1920, 1935, 1950, 1965, 1980, 1995, 2010, 2025, 2040, 2055, 2070,
2085, 2100, 2115, 2130), yield = c(9.3411, 9.32582, 10.5475,
13.5358, 17.3376, 16.7444, 20.7234, 19.8374, 24.327, 27.4162,
27.38, 31.3926, 29.3289, 32.2556, 33.0025, 35.3358, 35.8986,
40.1859, 40.3886, 42.2828, 41.23, 43.8108, 43.9391, 43.9543,
48.0524, 47.8295, 48.674, 48.2456, 50.2641, 50.7147, 49.6828,
52.8877, 51.7906, 57.2553, 53.6175, 57.0186, 57.6598, 56.4049,
57.1446, 58.5464, 60.7213, 61.0584, 57.7481, 59.9151, 64.475,
61.2322, 63.5167, 64.6289, 64.4245, 62.0048, 65.5821, 65.8275,
65.7584, 68.0523, 65.4874, 68.401, 68.1503, 67.8713, 69.5478,
69.9774, 73.4199, 66.7266, 70.4732, 67.5119, 69.6107, 70.4911,
72.7592, 69.3821, 72.049, 70.2548, 71.6336, 70.6215, 70.8611,
72.0337, 72.2842, 76.0792, 75.2526, 72.7016, 73.6547, 75.6202,
76.5013, 74.2459, 76.033, 78.4803, 76.3058, 73.837, 74.795, 76.2126,
75.1816, 75.3594, 79.9158, 77.8157, 77.8152, 75.3712, 78.3249,
79.1198, 77.6184, 78.1244, 78.1741, 77.9305, 79.7576, 78.0261,
79.8136, 75.5314, 80.2177, 79.786, 81.078, 78.4183, 80.8013,
79.3855, 81.5268, 78.416, 78.9021, 79.9394, 80.8221, 81.241,
80.6111, 79.7504, 81.6001, 80.7021, 81.1008, 82.843, 82.2716,
83.024, 81.0381, 80.0248, 85.1418, 83.1229, 83.3334, 83.2149,
84.836, 79.5156, 81.909, 81.1477, 85.1715, 83.7502, 83.8336,
83.7595, 86.0062, 84.9572, 86.6709, 84.4124)), .Names = c("time",
"yield"), row.names = c(NA, -142L), class = "data.frame")
What I want to do to the data:
I need to smooth the data in order to plot the first derivative. In the paper, the author mentions that one can fit a high-order polynomial and use that for the processing, which I think is wrong, since we don't really know the true relationship between time and yield, and it is definitely not polynomial. I tried it regardless and, as expected, the plot of the derivative did not make any chemical sense. Next I looked into loess using loes <- loess(yield ~ time, data = df, span = 0.9), which gave a much better fit. However, the best results so far came from:
spl <- smooth.spline(df$time, y = df$yield, cv = TRUE)
predspl <- as.data.frame(predict(spl))  # fitted values at the observed times
colnames(predspl) <- c('Time', 'Yield')
pred.der <- as.data.frame(predict(spl, deriv = 1))  # first derivative
colnames(pred.der) <- c('Time', 'Yield')
which gave the best fit, especially for the initial data points (by visual inspection).
The problem I have:
The issue, however, is that the derivative looks really good only up to t = 500 s and then starts wiggling more and more towards the end. This shouldn't happen from a chemistry point of view; it is just a result of overfitting towards the end of the data as the noise increases. I know this because, for some experiments that I performed three times and averaged (so the noise decreased), the wiggling in the plot of the derivative is much smaller.
What I have tried so far:
I tried different values of spar, which smooths the later data correctly but causes a poor fit to the initial data (which are the most important). I also tried reducing the number of knots, but the result was similar to the one from changing spar. What I think I need is a large number of knots at the beginning that smoothly decreases to a small number of knots towards the end, to avoid that overfitting.
The question:
Is my reasoning correct here? Does anyone know how I can get the above effect in order to obtain a smooth derivative without any wiggling? Do I perhaps need to try a different fit than the spline? I have attached a picture at the end where you can see the derivative from smooth.spline vs. time and a black line (drawn by hand) of what it should look like. Thank you in advance for your help.
I think you're on the right track on having more closely spaced knots for the spline at the start of the curve. You can specify knot locations for smooth.spline using all.knots (at least on R >= 3.4.3; I skimmed the release notes for R, but couldn't pinpoint the version where this became available).
Below is an example, and the resulting, smoother fit for the derivative after some manual work of trying out different knot positions:
with(df, {
  # candidate knot positions on a 0-1 scale, denser at early times
  kn <- c(0, c(50, 100, 200, 350, 500, 1500) / max(time), 1)
  s <- smooth.spline(time, yield, cv = TRUE)        # default knots
  s2 <- smooth.spline(time, yield, all.knots = kn)  # hand-picked knots
  # first derivatives of both fits
  ds <- predict(s, deriv = 1)
  ds2 <- predict(s2, deriv = 1)
  # two stacked panels: fits on top, derivatives below
  np <- list(mfrow = c(2, 1), mar = c(4, 4, 1, 2))
  withr::with_par(np, {
    plot(time, yield)
    lines(s)
    lines(s2, lty = 2, col = 'red')
    plot(ds, type = 'l', ylim = c(0, 0.15))
    lines(ds2, lty = 2, col = 'red')
  })
})
You can probably fine tune the locations further, but I wouldn't be too concerned about it. The primary fits are already near enough indistinguishable, and I'd say you're asking quite a lot from these data in terms of identifying details about the derivative (this should be evident if you plot(time[-1], diff(yield) / diff(time)) which gives you an impression about the level of information your data carry about the derivative).
Created on 2018-02-15 by the reprex package (v0.2.0).
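The finite-difference check mentioned above can be sketched as follows. Because the full data set is long, this uses simulated data as a stand-in: the saturating curve, its constants, and the noise model that grows with time are all assumptions made for illustration, not the measured yields.

```r
set.seed(42)

# Hypothetical stand-in for the experiment: a saturating curve whose
# noise grows with time (both the curve and the noise model are assumptions)
time  <- seq(15, 2130, by = 15)
truth <- 85 * (1 - exp(-time / 500))
yield <- truth + rnorm(length(time), sd = 0.2 + time / 1500)

# Raw finite-difference estimate of the derivative: one value per interval
fd <- diff(yield) / diff(time)

# Derivative of a cross-validated smoothing spline at the observed times
fit <- smooth.spline(time, yield, cv = TRUE)
sp  <- predict(fit, x = time, deriv = 1)

# The scatter of the raw differences shows how little the data say about the slope
plot(time[-1], fd, xlab = "time", ylab = "d(yield)/d(time)")
lines(sp$x, sp$y, col = "red")
```

The wider the cloud of finite differences at a given time, the less any smoother can be trusted to resolve detail in the derivative there.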

Yearly seasonal sums for DJF

I want to compute seasonal sums for the meteorological winter season DJF (December, January, February), which means that the December values come from year x-1.
There is already a suggestion for this kind of problem that uses the packages seas and zoo: Link to the reference. Can I loop over the time index of my zoo object to get the winter sums for each year and for each column? My sample data already contain only the winter months:
structure(c(0.335767631885527, 0.329964137686826, 0.324867678295622,
0.346234032749876, 0.315486588076342, 0.373440783616547, 0.393108355980974,
0.310526442402042, 0.955068399718777, 0.959654624426492, 0.293930575800507,
0.350949140946517, 0.657761387039141, 0.53822087533681, 0.296938223280703,
0.318325593619261, 0.827528522109129, 0.914084376992577, 0.914209302937996,
0.913163846516007, 0.776698687524975, 0.597284692104539, 0.91488961230643,
0.28945161773974, 0.282895617679457, 0.28492139335934, 0.928492227792593,
0.287740157404564, 0.93011080075256, 0.32787462005944, 0.809245564874419,
0.299095322129539, 0.302473955104931, 0.453458703894119, 0.331724139938735,
0.314265997270211, 0.378968117507553, 0.344955599135117, 0.961200295699775,
1.07300929383762, 0.339365254133058, 0.421999171190298, 0.351276824906379,
0.36810350819186, 0.364237601690115, 0.425751222495895, 1.2000504740503,
0.401585883450189, 0.393244206959102, 0.412013522316855, 1.40622761554481,
1.43010692801434, 1.45452312391606, 1.44102848262452, 0.583854512560274,
0.453530324821785, 0.836929179095723, 0.485649439571136, 1.45323622566975,
1.42066532567401, 1.55192692063172, 1.69545734226667, 1.59084952877426,
0.536277991651981, 0.878100994910164, 1.80588869793109, 0.612726668114702,
1.49557275883036, 1.83080789724595, 0.859368961826519, 1.3537163175202,
0.795003445956722, 1.68510799767645, 1.94219078558463, 0.678911636490617,
1.98538116097216, 1.39431924099171, 0.716178198907659, 0.897864731079577,
0.739754008960108, 1.32647638785145, 1.27550346512974, 1.57782298324095,
1.17541538713537, 1.08141388070016, 2.81373485339402, 0.841584582588819,
2.98872530454666, 1.93484656658214, 3.01625884992721, 0.902448663673698,
0.361944635028181, 1.03795562218241, 0.961881521906292, 0.704732279822006,
0.894256898010956, 0.307197052425753, 0.620230669033494, 0.900835004143219,
0.336503062729966, 0.376726235662507, 0.323019953443342, 0.291097473211189,
0.583926906347703, 0.540940525007957, 0.906358816314195, 0.372788957369332,
0.335375002309946, 0.914209302937996, 0.328320596067713, 0.659589829678685,
0.68859386616471, 0.91488961230643, 0.902977019532625, 0.739324647975471,
0.603576498397486, 0.690375139214112, 0.603004583921208, 0.659868379563069,
0.292376232645021, 0.562401086780579, 0.298131207627614, 0.299095322129539,
0.302473955104931, 0.705840069893102, 0.993644273952054, 0.425326528868129,
0.400345928302124, 0.361221494378293, 0.328750601711733, 0.55820945179875,
0.748093576785292, 0.345188978576, 0.351315165819748, 0.357626992140137,
0.517538802067647, 1.04751086637289, 0.385695811626645, 0.385612146149294,
0.397271280188057, 0.550298801906058, 1.28131889629393, 0.82396230266283,
1.03189532043667, 0.502923809446499, 1.13388533378536, 0.821249922028902,
0.496130920693478, 0.491056299113018, 0.861144623672965, 0.498763665924562,
0.912165347541201, 0.64869230436972, 1.32528603957948, 1.75339437114229,
1.78285803283739, 1.11217610098546, 0.597795159831033, 1.00740416004752,
0.739549658487185, 0.607139331936484, 1.35734916834937, 1.43608105985186,
1.80042779869959, 1.18905308118327, 1.70456429994882, 0.905541925940458,
2.22398340066076, 2.16944665030202, 2.29546486372867, 1.85605245367111,
1.1239234690604, 2.50480944519147, 1.02954245959557, 0.975126362552554,
2.14223132835323, 2.91282474285556, 2.66863827732602, 0.933593864631134,
2.70815814163342, 2.87351062547491, 0.335329222971355, 0.934907402460015,
0.57591904762801, 0.907224647738403, 0.320417497402957, 0.766767831651282,
0.861903342837008, 0.303464733511709, 0.709698376015027, 0.308598232977547,
0.293930575800507, 0.29130992351097, 0.28896933229556, 0.45769807141885,
0.468340431926149, 0.830040974016766, 0.282420179745874, 0.477428977916008,
0.733418492651481, 0.822348309121175, 0.280392410026905, 0.542239475756514,
0.281077879631808, 0.281845318148658, 0.42849080424256, 0.295089908538224,
0.747925637213591, 0.929814463524078, 0.310954657683433, 0.292376232645021,
0.64500798819687, 0.690255336889303, 0.364309565584761, 0.306129346468766,
0.311371964852598, 0.915461004824963, 0.397063771122394, 1.0404933625801,
0.483845551843616, 0.333807374425717, 0.402255447456447, 0.453946781602374,
0.394538152500142, 0.357626992140137, 0.364237601690115, 0.372020526598045,
0.37823224873185, 0.389581791596903, 0.393244206959102, 0.401126173348066,
0.563948059226945, 0.625538021242673, 0.80823517471131, 0.440809452269821,
0.753920921570439, 0.571583127323145, 0.463092290982252, 0.576935449307388,
0.482901053437729, 1.40965077473646, 1.25183016539419, 0.856169846501004,
1.72377824975207, 0.536277991651981, 1.13652692119597, 1.24290457699823,
1.64437171023011, 1.87302947654355, 0.594841647571458, 2.04410190051534,
1.62571002130845, 1.13052139459963, 0.836130011762252, 1.85233449007414,
2.38839794838805, 1.09920265799031, 1.94766079436355, 1.66770758466983,
1.27453119791191, 2.57917818578189, 1.13896219096471, 2.74804359878488,
1.69823856330245, 0.935150681359782, 1.74656095016161, 0.835168244061429,
0.841584582588819, 0.856635868155615, 0.972724285567558, 2.42939239419398,
0.96325679668782, 0.640892567004161, 1.03795562218241, 0.949309568900219,
0.316910844084317, 0.311204732481577, 0.307197052425753, 0.303464733511709,
0.779574150582344, 0.296830513889512, 0.335960010735195, 0.4390886067335,
0.28896933229556, 0.306902835898889, 0.926150657204963, 0.388532344331494,
0.495283643343666, 0.916064063737401, 0.281013296117892, 0.913163846516007,
0.912928724576721, 0.438926937515807, 0.59117658733228, 0.517844090756594,
0.704234100156676, 0.913848110190877, 0.423829975580762, 0.795497269555325,
0.289917958593354, 0.292376232645021, 0.295114252321699, 0.345353147959634,
0.854886103409894, 0.62965115658928, 0.776701146370991, 0.446059142229343,
0.326457042618417, 0.568752212327844, 0.325374322793979, 0.374762702815228,
0.333807374425717, 0.420206697512664, 0.399408381034396, 0.456977698650331,
0.357626992140137, 0.596680957599271, 1.29550961397828, 1.24265117031916,
0.580164026815441, 0.393244206959102, 0.401126173348066, 0.443462006528755,
0.417630422649225, 0.426247823064678, 0.505363855323395, 0.494595916530596,
1.12922054709106, 0.482617341273223, 0.650774092876326, 0.5452273225038,
1.61305811763483, 1.66701808699342, 0.514281824935098, 0.525174470147384,
1.6850349371761, 1.78354241230912, 1.83460579403794, 1.86582069105335,
1.40279004365455, 0.594841647571458, 0.691585610303159, 0.619623644706909,
2.06846657922012, 0.710726446010795, 0.997307890433014, 2.40064963745822,
2.22161516025196, 1.79188547652641, 2.19553900228869, 2.1816869110449,
2.1984531582332, 2.55364304827728, 0.918827215513173, 0.930267750935017,
0.798812034349413, 0.830829315142733, 1.13089106389005, 1.00204606351463,
1.07126361979325, 0.871799972892206, 1.28166129954517), .Dim = c(181L,
2L), .Dimnames = list(NULL, NULL), index = structure(c(699, 700,
701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713,
714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726,
727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739,
740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752,
753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765,
766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778,
779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 1065,
1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076,
1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087,
1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098,
1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109,
1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120,
1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131,
1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142,
1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153,
1154), class = "Date"), class = "zoo")
library(hydroTSM)
dm2seasonal(df, FUN=sum, season="DJF")
I've used the package hydroTSM, where df is the zoo object from the question. The package can also be used for other seasons (MAM, JJA, SON) and with other functions (e.g. mean). It computes the yearly seasonal sums for every column in your matrix (df). seas does the same for every column, but I guess you have to write your own loop to get yearly seasonal sums. mkseas() from seas will compute the sum over all winter months in your time series.
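For readers who want to see the bookkeeping behind a DJF sum, here is a base R sketch under stated assumptions: the dates and the two columns a and b are made up, and each December is assigned to the following year before summing per column. With a zoo object, the matrix and the dates would come from coredata() and index().

```r
# Hypothetical daily data restricted to the winter months, two columns
dates <- seq(as.Date("2000-12-01"), as.Date("2003-02-28"), by = "day")
dates <- dates[format(dates, "%m") %in% c("12", "01", "02")]
vals <- cbind(a = rep(1, length(dates)),               # constant, so sums count days
              b = as.numeric(format(dates, "%d")))     # day of month

# December belongs to the winter of the following year
yr  <- as.integer(format(dates, "%Y"))
mon <- as.integer(format(dates, "%m"))
winter_year <- ifelse(mon == 12L, yr + 1L, yr)

# Column-wise sums per winter; row names are the winter years
djf_sums <- rowsum(vals, group = winter_year)
print(djf_sums)
```

Column a simply counts the days in each winter, which makes it easy to spot an incomplete first or last season in real data.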

R, ggplot2: Fit curve to scatter plot

I am trying to fit curves to the following scatter plot with ggplot2.
I found the geom_smooth function, but trying different methods and spans, I never seem to get the curves right...
This is my scatter plot:
And this is my best attempt:
Can anyone get better curves that fit correctly and don't look so wiggly? Thanks!
Find a MWE below:
library(ggplot2)
my.df <- data.frame(sample=paste("samp",1:60,sep=""),
reads=c(523, 536, 1046, 1071, 2092, 2142, 4184, 4283, 8367, 8566, 16734, 17132, 33467, 34264, 66934, 68528, 133867, 137056, 267733, 274112, 409, 439, 818, 877, 1635, 1754, 3269, 3508, 6538, 7015, 13075, 14030, 26149, 28060, 52297, 56120, 104594, 112240, 209188, 224479, 374, 463, 748, 925, 1496, 1850, 2991, 3699, 5982, 7397, 11963, 14794, 23925, 29587, 47850, 59174, 95699, 118347, 191397, 236694),
number=c(17, 14, 51, 45, 136, 130, 326, 333, 742, 738, 1637, 1654, 3472, 3619, 7035, 7444, 13133, 13713, 21167, 21535, 11, 22, 30, 44, 108, 137, 292, 349, 739, 853, 1605, 1832, 3099, 3565, 5287, 5910, 7832, 8583, 10429, 11240, 21, 43, 82, 124, 208, 296, 421, 568, 753, 908, 1127, 1281, 1448, 1608, 1723, 1854, 1964, 2064, 2156, 2259),
condition=rep(paste("cond",1:3,sep=""), each=20))
png(filename="TEST1.png", height=800, width=1000)
print(#or ggsave()
ggplot(data=my.df, aes(x=reads, y=log2(number+1), group=condition, color=condition)) +
geom_point()
)
dev.off()
png(filename="TEST2.png", height=800, width=1000)
print(#or ggsave()
ggplot(data=my.df, aes(x=reads, y=log2(number+1), group=condition, color=condition)) +
geom_point() +
geom_smooth(se=FALSE, method="loess", span=0.5)
)
dev.off()
This is a very broad question, as you're effectively looking for a model with less variance (more bias), of which there are many. Here's one:
ggplot(data = my.df,
aes(x = reads, y = log2(number + 1), color = condition)) +
geom_point() +
geom_smooth(se = FALSE, method = "gam", formula = y ~ s(log(x)))
For documentation, see ?mgcv::gam or a suitable text on modeling. Depending on your use case, it may make more sense to make your model outside of ggplot.
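Fitting the model outside of ggplot could look like the sketch below. It uses only the cond3 rows from the question and the same formula as the geom_smooth call; the names cond3, fit, and grid, and the choice of 100 prediction points, are illustrative.

```r
library(mgcv)

# The cond3 rows of my.df from the question
cond3 <- data.frame(
  reads = c(374, 463, 748, 925, 1496, 1850, 2991, 3699, 5982, 7397,
            11963, 14794, 23925, 29587, 47850, 59174, 95699, 118347,
            191397, 236694),
  number = c(21, 43, 82, 124, 208, 296, 421, 568, 753, 908, 1127, 1281,
             1448, 1608, 1723, 1854, 1964, 2064, 2156, 2259)
)

# Same model as in geom_smooth: a smooth function of log(reads)
fit <- gam(log2(number + 1) ~ s(log(reads)), data = cond3)

# Predict on a grid that is evenly spaced on the log scale
grid <- data.frame(reads = exp(seq(log(min(cond3$reads)),
                                   log(max(cond3$reads)),
                                   length.out = 100)))
grid$fitted <- as.numeric(predict(fit, newdata = grid))

head(grid)
```

Having the fitted object available makes it possible to inspect summary(fit) or to add the predictions to a plot with geom_line(data = grid, ...).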

Split time-series between any interval

I have a time series sampled at 10-minute intervals. I want the sub-series that falls between 23:10:00 and 00:00:00 each day. Here is the dput of the data:
df<-structure(c(994, 1019, 1381, 843, 1105, 1120, 869, 2216, 1741,
1737, 1727, 1462, 1564, 418, 281, 280, 277, 311, 242, 221, 328,
359, 410, 436, 359, 1738, 2075, 1766, 1812, 1810, 1246, 323,
250, 272, 283, 286, 252, 1671, 1695, 1687, 1646, 1257, 1632,
277, 305, 292, 261, 309, 304, 209, 210, 225, 201, 197, 247, 264,
238, 260, 254, 263, 226, 624, 1955, 1561, 1231, 976, 1213, 167,
1037, 1269, 1619, 1749, 1674, 1123, 1695, 2164, 1780, 1732, 1715,
283, 230, 291, 281, 137, 1358, 1630, 1626, 1889, 1635, 1591,
1606, 2024, 1783, 1752, 613, 301, 933, 1823, 1831, 1810, 1895,
1876, 1222, 1952, 1288, 282, 261, 296, 839, 1831, 1799, 1950,
2085, 1921, 1862, 1885, 1869, 1909, 1896, 1843), .Dim = c(120L,
1L), .Dimnames = list(NULL, "value"), index = structure(c(1430764200,
1430847600, 1430848200, 1430848800, 1430849400, 1430850000, 1430850600,
1430934000, 1430934600, 1430935200, 1430935800, 1430936400, 1430937000,
1431020400, 1431021000, 1431021600, 1431022200, 1431022800, 1431023400,
1431106800, 1431107400, 1431108000, 1431108600, 1431109200, 1431109800,
1431193200, 1431193800, 1431194400, 1431195000, 1431195600, 1431196200,
1431279600, 1431280200, 1431280800, 1431281400, 1431282000, 1431282600,
1431366000, 1431366600, 1431367200, 1431367800, 1431368400, 1431369000,
1431452400, 1431453000, 1431453600, 1431454200, 1431454800, 1431455400,
1431538800, 1431539400, 1431540000, 1431540600, 1431541200, 1431541800,
1431625200, 1431625800, 1431626400, 1431627000, 1431627600, 1431628200,
1431711600, 1431712200, 1431712800, 1431713400, 1431714000, 1431714600,
1431798000, 1431798600, 1431799200, 1431799800, 1431800400, 1431801000,
1431884400, 1431885000, 1431885600, 1431886200, 1431886800, 1431887400,
1431970800, 1431971400, 1431972000, 1431972600, 1431973200, 1431973800,
1432057200, 1432057800, 1432058400, 1432059000, 1432059600, 1432060200,
1432143600, 1432144200, 1432144800, 1432145400, 1432146000, 1432146600,
1432230000, 1432230600, 1432231200, 1432231800, 1432232400, 1432233000,
1432316400, 1432317000, 1432317600, 1432318200, 1432318800, 1432319400,
1432402800, 1432403400, 1432404000, 1432404600, 1432405200, 1432405800,
1432489200, 1432489800, 1432490400, 1432491000, 1432491600), tclass = c("POSIXct",
"POSIXt"), tzone = "Asia/Kolkata"), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "Asia/Kolkata", tclass = c("POSIXct", "POSIXt"
), tzone = "Asia/Kolkata", class = c("xts", "zoo"))
Required output is:
Is there an existing function that can do this? I tried split.xts, but was not able to get the required form.
You could use xts with only base R or use chained expressions with dplyr and tidyr. Base R's unstack and tidyr's spread both take two columns of data containing key-value pairs and arrange them as separate columns of values for each unique key value. Code would look like:
# base R version
library(xts)
df2 <- unstack(data.frame(value = coredata(df), time = format(index(df), "%H:%M")),
               value ~ time)[, c(2:6, 1)]  # reorder columns: 23:10 ... 23:50, then 00:00
# version using chained expressions with dplyr and tidyr
library(xts)
library(dplyr)
library(tidyr)
df3 <- df %>% fortify.zoo() %>%
  mutate(time = format(Index, "%H:%M"), Index = format(Index, "%Y-%m-%d")) %>%
  spread(key = time, value = value) %>%
  select(c(3:7, 2))  # 23:10 ... 23:50, then 00:00 (column 1 is the date)
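The same reshaping can be sketched without any packages. The example below uses a made-up 10-minute series in UTC (the timestamps, the values, and the choice to attach each midnight reading to the preceding evening are all assumptions), keeps the 23:10 to 00:00 window, and spreads it into one row per evening.

```r
# Hypothetical 10-minute series covering three days (UTC keeps the sketch portable)
stamps <- seq(as.POSIXct("2015-05-05 00:00:00", tz = "UTC"),
              by = "10 min", length.out = 3 * 144)
value <- seq_along(stamps)

# Time of day as a key; keep 23:10 ... 23:50 plus midnight
hm <- format(stamps, "%H:%M")
keep <- hm >= "23:10" | hm == "00:00"

# Shift by 10 minutes so that midnight is grouped with the evening before it
day <- format(stamps - 600, "%Y-%m-%d")

# One row per evening, one column per time of day
wide <- tapply(value[keep],
               list(day = day[keep], time = hm[keep]),
               function(v) v[1])

# Put 00:00 last, as in the answer above
wide <- wide[, c(2:6, 1)]
print(wide)
```

Evenings at the edges of the series come out with NA where the window is incomplete, which makes gaps in real data visible rather than silently shifting values between rows.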
