First off, Stack Overflow keeps saying there are answers already, but I've been looking for 2.5 hours now and haven't found anything that works.
I'm attempting to view values from a dataframe with 940 rows. I would like to view the calories associated with the user IDs on the first and last dates of the trial.
Id ActivityDay Calories
1 1503960366 2016-04-12 1985
2 1624580081 2016-04-12 1432
3 1644430081 2016-04-12 3199
4 1844505072 2016-04-12 2030
5 1927972279 2016-04-12 2220
6 2022484408 2016-04-12 2390
7 2026352035 2016-04-12 1459
8 2320127002 2016-04-12 2124
9 2347167796 2016-04-12 2344
10 2873212765 2016-04-12 1982
11 3372868164 2016-04-12 1788
12 3977333714 2016-04-12 1450
13 4020332650 2016-04-12 3654
14 4057192912 2016-04-12 2286
15 4319703577 2016-04-12 2115
16 4388161847 2016-04-12 2955
17 4445114986 2016-04-12 2113
18 4558609924 2016-04-12 1909
19 4702921684 2016-04-12 2947
20 5553957443 2016-04-12 2026
21 5577150313 2016-04-12 3405
22 6117666160 2016-04-12 1496
23 6290855005 2016-04-12 2560
24 6775888955 2016-04-12 1841
25 6962181067 2016-04-12 1994
26 7007744171 2016-04-12 2937
27 7086361926 2016-04-12 2772
28 8053475328 2016-04-12 3186
29 8253242879 2016-04-12 2044
30 8378563200 2016-04-12 3635
31 8583815059 2016-04-12 2650
32 8792009665 2016-04-12 2044
33 8877689391 2016-04-12 3921
34 1503960366 2016-04-13 1797
35 1624580081 2016-04-13 1411
36 1644430081 2016-04-13 2902
37 1844505072 2016-04-13 1860
38 1927972279 2016-04-13 2151
39 2022484408 2016-04-13 2601
40 2026352035 2016-04-13 1521
41 2320127002 2016-04-13 2003
42 2347167796 2016-04-13 2038
43 2873212765 2016-04-13 2004
44 3372868164 2016-04-13 2093
45 3977333714 2016-04-13 1495
46 4020332650 2016-04-13 1981
47 4057192912 2016-04-13 2306
48 4319703577 2016-04-13 2135
49 4388161847 2016-04-13 3092
50 4445114986 2016-04-13 2095
51 4558609924 2016-04-13 1722
52 4702921684 2016-04-13 2898
This is sample data; I'm omitting the other nearly 900 rows.
I want to keep only the dates 2016-04-12 and 2016-05-12, which are the first and last days of the period the data covers. I'd like to see the user IDs and their calories for those 2 dates only.
I've tried about 50 different snippets. Here is where I'm at right now:
Daily_Calories %>%
group_by(Id, Calories) %>%
arrange(ActivityDay) %>%
as.data.frame()
I have not saved all the codes I've tried, as I'm new and RStudio gets messy and unorganized quickly...and then I get a bit lost.
I've also tried:
Daily_Calories %>%
group_by(Id, Calories) %>%
group_by(min(ActivityDay), max(ActivityDay)) %>%
arrange(ActivityDay) %>%
as.data.frame()
and got this:
Id ActivityDay Calories min(ActivityDay) max(ActivityDay)
1 1503960366 2016-04-12 1985 2016-04-12 2016-05-12
2 1624580081 2016-04-12 1432 2016-04-12 2016-05-12
3 1644430081 2016-04-12 3199 2016-04-12 2016-05-12
4 1844505072 2016-04-12 2030 2016-04-12 2016-05-12
5 1927972279 2016-04-12 2220 2016-04-12 2016-05-12
6 2022484408 2016-04-12 2390 2016-04-12 2016-05-12
7 2026352035 2016-04-12 1459 2016-04-12 2016-05-12
8 2320127002 2016-04-12 2124 2016-04-12 2016-05-12
9 2347167796 2016-04-12 2344 2016-04-12 2016-05-12
10 2873212765 2016-04-12 1982 2016-04-12 2016-05-12
11 3372868164 2016-04-12 1788 2016-04-12 2016-05-12
12 3977333714 2016-04-12 1450 2016-04-12 2016-05-12
and then tried this:
Daily_Calories %>%
group_by(Id, Calories) %>%
arrange(ActivityDay) %>%
summarise(min(ActivityDay), max(ActivityDay)) %>%
as.data.frame()
and got this:
Id Calories min(ActivityDay) max(ActivityDay)
1 1503960366 0 2016-05-12 2016-05-12
2 1503960366 1728 2016-04-17 2016-04-17
3 1503960366 1740 2016-05-08 2016-05-08
4 1503960366 1745 2016-04-15 2016-04-15
5 1503960366 1775 2016-04-21 2016-04-21
6 1503960366 1776 2016-04-14 2016-04-14
7 1503960366 1783 2016-05-11 2016-05-11
8 1503960366 1786 2016-04-20 2016-04-20
9 1503960366 1788 2016-04-24 2016-04-24
I'm not looking for the minimum and maximum calories, simply, the "minimum" and "maximum" dates...meaning, 2016-04-12, and 2016-05-12.
All three of these codes I just tried had 700+ rows omitted from the results, which signifies they are wrong. There are 33 users, and 2 dates, so there should be 66 rows for results.
I hope this is explained well enough, I'm trying to be better with my questions. I appreciate the time and help.
Almost forgot, I wasn't wanting to create a new dataframe, just see the results. That's why my code starts with just the dataframe. Does it make a difference? I'd prefer the results in the console for viewing. Cheers!
If I understand you correctly, you want to keep all observations in the data frame where ActivityDay is either 2016-04-12 or 2016-05-12, correct? Or do you want to view all values in the range between them?
If so, try:
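# Dates to keep (first and last day of the trial)
keeps <- c("2016-04-12", "2016-05-12")
# Keep only rows on exactly those two dates
# (as.character() lets the comparison work whether ActivityDay is a Date, a factor, or a string)
Daily_Calories[as.character(Daily_Calories$ActivityDay) %in% keeps, ]
# Or keep every row in the range between them
Daily_Calories[as.Date(Daily_Calories$ActivityDay) %in% seq(min(as.Date(keeps)), max(as.Date(keeps)), by = 1), ]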
This will show values for the dates that you want.
I was unclear as to what your final data would look like - if I misunderstood, let me know and I will modify my answer. Good luck!
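If the goal is exactly the first and last trial dates, a base R sketch along the following lines may help. The six-row data frame is a cut-down stand-in for Daily_Calories (the last Calories value is invented for illustration), and it assumes ActivityDay can be converted with as.Date().

```r
# Cut-down stand-in for Daily_Calories; the final Calories value is invented
Daily_Calories <- data.frame(
  Id = c(1503960366, 1624580081, 1503960366, 1624580081, 1503960366, 1624580081),
  ActivityDay = c("2016-04-12", "2016-04-12", "2016-04-13",
                  "2016-04-13", "2016-05-12", "2016-05-12"),
  Calories = c(1985, 1432, 1797, 1411, 0, 1690)
)

# Work with real Date values so min/max mean "earliest" and "latest"
Daily_Calories$ActivityDay <- as.Date(Daily_Calories$ActivityDay)

# range() returns the earliest and latest date; keep rows on either one
first_last <- Daily_Calories[Daily_Calories$ActivityDay %in%
                               range(Daily_Calories$ActivityDay), ]

# Printing shows the rows in the console; nothing else is modified
print(first_last)
```

On the full data this should give one row per user per date, so 66 rows if all 33 users have an entry on both days. With dplyr loaded, filter(Daily_Calories, ActivityDay %in% range(ActivityDay)) should be equivalent.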
I have these measurements from my temperature sensor, which I put in a data frame called data.
Time Temperature
1 2012-06-28 12:49:00 23.04
2 2012-06-28 12:49:34 23.06
3 2012-06-28 12:49:38 23.06
4 2012-06-28 12:49:39 23.08
5 2012-06-28 12:49:40 23.08
6 2012-06-28 12:49:56 23.09
7 2012-06-28 13:49:00 23.02
8 2012-06-28 14:49:00 22.73
9 2012-06-28 15:49:00 22.50
10 2012-06-28 16:49:00 22.38
11 2012-06-28 17:49:00 22.31
12 2012-06-28 18:49:00 22.16
13 2012-06-28 19:49:00 22.11
14 2012-06-28 20:49:00 22.04
15 2012-06-28 21:49:00 21.89
16 2012-06-28 22:49:00 21.78
17 2012-06-28 23:49:00 21.66
18 2012-06-29 00:49:00 21.64
19 2012-06-29 01:49:00 21.52
20 2012-06-29 02:49:00 21.42
21 2012-06-29 03:49:00 21.36
22 2012-06-29 04:49:00 21.34
23 2012-06-29 05:49:00 21.24
24 2012-06-29 06:49:00 21.29
25 2012-06-29 07:27:08 21.32
26 2012-06-29 07:49:00 21.38
27 2012-06-29 08:49:00 21.39
28 2012-06-29 09:49:00 21.44
29 2012-06-29 10:49:00 21.42
30 2012-06-29 11:49:00 21.58
31 2012-06-29 12:49:00 21.96
32 2012-06-29 13:49:00 22.22
33 2012-06-29 14:49:00 22.33
34 2012-06-29 15:49:00 22.51
The values in data$Time are of class POSIXlt.
I want to create a new data frame that includes only the measurements for a given day, for example 2012-06-28. That would be data[1:17,].
I tried to work with the function which(), based on examples from the internet, but I failed to find a solution.
What function should I use?
To do that, I used this:
library(lubridate)
data[date(data$Time)==ymd("2012-06-28"),]
It works just fine.
We can use as.Date
subset(data, as.Date(Time) == as.Date("2012-06-28"))
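One thing to watch with as.Date() on date-times: as far as I know, for POSIXct values it uses UTC unless a tz is given, so measurements close to midnight can land on the neighbouring day if the data were recorded in another time zone. A small sketch, assuming the times are in UTC (the four rows are taken from the sample in the question):

```r
# Four rows from the question's sample, with the time zone fixed to UTC (an assumption)
data <- data.frame(
  Time = as.POSIXct(c("2012-06-28 12:49:00", "2012-06-28 23:49:00",
                      "2012-06-29 00:49:00", "2012-06-29 15:49:00"),
                    tz = "UTC"),
  Temperature = c(23.04, 21.66, 21.64, 22.51)
)

day <- as.Date("2012-06-28")

# Pass tz explicitly so the calendar day is computed in the same zone as the data
one_day <- data[as.Date(data$Time, tz = "UTC") == day, ]
print(one_day)
```

If the sensor logged local time, replacing "UTC" with that zone in both places should keep late-evening readings on the correct day.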
I have an R Time Series at the weekly level starting at Jan 7, 2013 and ending at May 23 2016.
I created the time series using the following code:
start_date <- min(Orders_All_weekly$Week_Start)
Orders_Weekly.ts <- ts(Orders_All_weekly$Sales, start = decimal_date(ymd(start_date)), freq = (365.25/7))
Orders_Weekly.stl <- stl(Orders_Weekly.ts, s.window = 'periodic')
I am attempting to run a Holt Winters time series on these data, and I am receiving the error
Orders_Weekly.hw <- HoltWinters(Orders_Weekly.stl)
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
time series has no or less than 2 periods
I have seen several people post this error and the common response has been that the data did not, in fact, have at least two periods...which is necessary for this procedure. Unless I'm completely misunderstanding the meaning of this (which is possible) I have more than the required two periods. My data are at the weekly level, and I have 3+ years of observations.
Am I setting this up incorrectly? Or is the error essentially saying there is no seasonality?
ID Week_Start Sales
1 2013-04-08 932662.9
3 2013-05-13 1021574.4
4 2013-05-20 913812.9
5 2013-06-17 1086239.1
7 2013-08-26 762829.3
8 2013-11-18 1085033.0
9 2013-12-02 897158.4
10 2013-12-09 776733.7
11 2013-12-23 867362.8
12 2013-02-04 666362.0
13 2013-02-18 748603.2
15 2013-07-22 1005986.7
16 2013-09-02 896582.8
17 2013-10-28 868364.8
18 2014-01-06 814648.7
19 2014-02-10 847342.4
20 2014-02-17 869374.3
21 2014-03-17 827677.6
22 2014-03-24 897462.3
23 2014-03-31 850542.4
24 2014-04-21 1139619.4
25 2014-07-28 889043.3
26 2014-08-04 1097560.6
27 2014-09-08 1029379.4
28 2014-10-13 998094.8
29 2014-11-10 1238445.9
30 2014-12-15 1204006.6
31 2014-07-14 1106800.6
32 2014-09-01 730030.8
33 2014-10-06 1085331.8
34 2014-05-05 1072926.8
35 2014-05-19 863283.7
36 2015-01-19 1095186.1
37 2015-02-02 866258.2
38 2015-02-16 1006247.0
39 2015-03-23 1214339.7
40 2015-04-20 1181482.9
41 2015-05-18 1112542.4
42 2015-06-01 1188714.7
43 2015-07-20 1216050.4
45 2015-08-17 848302.8
46 2015-08-24 1081198.9
47 2015-09-14 916539.8
48 2015-09-28 957177.8
49 2015-10-26 964467.1
50 2015-11-02 1063949.1
51 2015-01-12 879343.9
53 2015-03-09 1245047.9
55 2015-11-16 913514.4
56 2015-02-09 1108247.6
57 2015-12-28 1014929.2
58 2016-01-25 946786.3
59 2016-02-01 891230.8
60 2016-02-29 1274039.8
61 2016-03-07 847501.8
62 2016-04-04 1057844.1
64 2016-04-11 1207347.4
65 2016-04-18 1159690.4
66 2016-05-02 1394727.6
67 2016-05-23 1044129.3
68 2013-03-04 1040017.1
69 2013-03-11 984574.2
70 2013-04-15 1054174.1
72 2013-04-29 952720.1
73 2013-05-06 1000977.1
74 2013-06-03 1091743.6
75 2013-07-01 955164.8
76 2013-08-12 808803.7
77 2013-09-23 960096.4
78 2013-09-30 814014.4
79 2013-10-14 743264.9
81 2013-01-28 956396.4
84 2013-10-21 959058.5
85 2013-11-11 915108.6
90 2013-01-14 867140.6
91 2014-01-27 910063.7
92 2014-03-10 963144.2
93 2014-04-07 975789.6
95 2014-04-28 1030313.7
97 2014-05-26 1139089.3
99 2014-06-09 1077980.6
100 2014-06-30 1019326.6
101 2014-09-15 666787.6
103 2014-11-03 1059089.4
105 2014-11-24 705428.6
106 2014-12-22 889368.8
108 2014-06-23 1046989.4
110 2015-02-23 1327066.4
112 2015-04-13 1110673.9
115 2015-06-08 1177799.1
116 2015-07-06 1314697.7
118 2015-07-27 1094805.6
119 2015-08-03 882394.2
120 2015-09-21 1159233.2
121 2015-10-19 1171636.9
122 2015-11-23 1036050.9
125 2015-12-21 984050.8
128 2016-01-04 1371348.3
129 2016-01-11 1086225.4
131 2016-02-22 1077692.4
137 2013-03-18 854699.1
141 2013-05-27 1011870.1
142 2013-08-05 893878.4
143 2013-12-16 801215.2
148 2013-10-07 805962.8
150 2013-11-04 801729.8
152 2013-08-19 726361.0
155 2014-02-24 979288.7
158 2014-04-14 1006729.5
161 2014-07-07 1102600.4
162 2014-08-11 979494.5
164 2014-10-20 901047.1
166 2014-10-27 1260062.0
169 2014-12-29 1022656.2
171 2014-08-18 976136.5
175 2015-03-02 897352.6
177 2015-03-30 1059103.8
178 2015-05-11 1033694.4
179 2015-06-29 1037959.4
182 2015-09-07 1230050.6
183 2015-10-12 975898.2
185 2015-12-07 1057603.4
186 2015-12-14 953718.2
189 2015-04-06 1233091.9
190 2015-04-27 1176994.2
192 2015-01-26 1256182.6
196 2016-01-18 955919.5
197 2016-02-15 954623.5
198 2016-03-14 740724.2
199 2013-01-07 924205.2
201 2013-02-11 672150.0
202 2013-03-25 769391.5
205 2013-06-10 870971.1
206 2013-06-24 1043166.2
208 2013-07-15 1106379.4
210 2013-09-09 916382.0
215 2013-04-22 934307.5
217 2013-12-30 974004.0
219 2014-01-13 972211.2
220 2014-01-20 952294.8
221 2014-02-03 946820.6
225 2014-06-02 1182837.6
228 2014-08-25 912550.8
234 2014-03-03 1013797.0
245 2015-06-15 946565.2
246 2015-07-13 1139633.6
248 2015-08-10 1080701.8
249 2015-08-31 1052796.2
253 2015-11-30 980493.4
259 2016-03-28 1105384.2
264 2016-02-08 897832.2
267 2013-02-25 766646.8
269 2013-04-01 954419.8
281 2013-11-25 852430.6
286 2013-09-16 997656.1
290 2014-07-21 1171519.8
294 2014-09-29 804772.4
298 2014-12-01 813872.0
299 2014-12-08 1005479.1
304 2014-06-16 981782.5
312 2015-03-16 1009182.7
315 2015-05-25 1166947.6
329 2015-01-05 903062.3
337 2016-03-21 1299648.7
338 2016-04-25 1132090.1
341 2013-01-21 818799.7
364 2014-05-12 1035870.7
367 2014-09-22 1234683.8
381 2015-06-22 990619.5
383 2015-10-05 1175100.6
385 2015-11-09 1095345.9
395 2016-05-16 1121192.5
399 2016-05-09 1175343.4
407 2013-07-08 1035513.8
430 2014-11-17 1024473.3
443 2015-05-04 1063411.6
476 2013-07-29 809045.3
I'm not sure if this completely answers the question but I was able to get a result out of your data with the slightly modified code below.
Hope this helps!
One point, I first sorted the data by date, assuming this was part of your intent.
# Sort by date (unless you want to keep the data out of date order for some reason)
Orders_Sorted <- Orders_All_weekly[order(Orders_All_weekly$Week_Start),]
# Convert the sorted sales column to a time series
Orders_Weekly.ts <- ts(Orders_Sorted$Sales, frequency = (365.25/7))
# Run HW; beta = FALSE and gamma = FALSE switch off the trend and seasonal components
Orders_Weekly.hw <- HoltWinters(x=Orders_Weekly.ts, beta = FALSE, gamma = FALSE)
# Show plot of HW output
plot(Orders_Weekly.hw)
This produces the following plot:
[Figure: plot of Holt-Winters exponential smoothing of the data]
I have encountered the same error, except that I have been computing a moving average of my initial data.
Unfortunately, the decompose() function that HoltWinters() uses returns that error message if anything goes wrong, not just when there aren't enough periods. Look more closely at the data you're passing HoltWinters(), even if your initial data looks fine.
In your particular case, Orders_Weekly.stl is not a plain ts object: stl() returns an object with a time.series component (holding the seasonal, trend, and remainder series) plus weights and a few other components. I'm not very familiar with stl(), but when I try HoltWinters(Orders_Weekly.stl$time.series), it works just fine.
In my case, the moving average of my initial data introduced a bunch of NAs at the beginning of my time-series. After removing those, HoltWinters() worked.
The trick is to have at least two full periods in your time series, and the series needs to be complete (no gaps). Two is the default number of periods that HoltWinters() uses for its start values (start.periods = 2).
https://r.789695.n4.nabble.com/time-series-has-no-or-less-than-2-periods-td4677519.html
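To make the "two periods" requirement concrete, here is a rough sketch on simulated weekly data (the sales values, seed, and smoothing parameters are all invented; none of this is the asker's data). From my reading of the stats source, the check compares the number of observations used for start values against 2 * frequency(x), so a fractional frequency such as 365.25/7 may fail that check even with several years of data. I have not confirmed this against the thread, so the last step only reports whatever happens.

```r
set.seed(1)
n <- 178  # roughly three years and five months of weekly observations
sales <- 1e6 + 1e5 * sin(2 * pi * seq_len(n) / 52) + rnorm(n, sd = 5e4)

# Integer frequency: 52 weeks per seasonal period
x52 <- ts(sales, start = c(2013, 1), frequency = 52)

# The requirement, spelled out: at least two full periods of data
has_two_periods <- length(x52) >= 2 * frequency(x52)
print(has_two_periods)

# Fixed (arbitrary) smoothing parameters, so no optimisation is involved
fit <- HoltWinters(x52, alpha = 0.3, beta = 0.1, gamma = 0.1)

# Same data with the fractional frequency from the question;
# tryCatch just reports the outcome instead of assuming one
x_frac <- ts(sales, frequency = 365.25 / 7)
outcome <- tryCatch({
  HoltWinters(x_frac, alpha = 0.3, beta = 0.1, gamma = 0.1)
  "no error"
}, error = function(e) conditionMessage(e))
print(outcome)
```

If the fractional version does fail on your machine too, rounding the frequency to 52 is one possible workaround, at the cost of the season drifting by about a day per year.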
I have the following column in my data frame:
DateTime
1 2011-10-03 08:00:04
2 2011-10-03 08:00:05
3 2011-10-03 08:00:06
4 2011-10-03 08:00:09
5 2011-10-03 08:00:15
6 2011-10-03 08:00:24
7 2011-10-03 08:00:30
8 2011-10-03 08:00:42
9 2011-10-03 08:01:01
10 2011-10-03 08:01:24
11 2011-10-03 08:01:58
12 2011-10-03 08:02:34
13 2011-10-03 08:03:25
14 2011-10-03 08:04:26
15 2011-10-03 08:06:00
With dput:
> dput(smallDF)
structure(list(DateTime = structure(c(1317621604, 1317621605,
1317621606, 1317621609, 1317621615, 1317621624, 1317621630, 1317621642,
1317621661, 1317621684, 1317621718, 1317621754, 1317621805, 1317621866,
1317621960, 1317622103, 1317622197, 1317622356, 1317622387, 1317622463,
1317622681, 1317622851, 1317623061, 1317623285, 1317623404, 1317623498,
1317623612, 1317623849, 1317623916, 1317623994, 1317624174, 1317624414,
1317624484, 1317624607, 1317624848, 1317625023, 1317625103, 1317625179,
1317625200, 1317625209, 1317625229, 1317625238, 1317625249, 1317625264,
1317625282, 1317625300, 1317625315, 1317625339, 1317625353, 1317625365,
1317625371, 1317625381, 1317625395, 1317625415, 1317625423, 1317625438,
1317625458, 1317625469, 1317625487, 1317625500, 1317625513, 1317625533,
1317625548, 1317625565, 1317625581, 1317625598, 1317625613, 1317625640,
1317625661, 1317625674, 1317625702, 1317625715, 1317625737, 1317625758,
1317625784, 1317625811, 1317625826, 1317625841, 1317625862, 1317625895,
1317625909, 1317625935, 1317625956, 1317625973, 1317626001, 1317626043,
1317626062, 1317626100, 1317626113, 1317626132, 1317626153, 1317626179,
1317626212, 1317626239, 1317626271, 1317626296, 1317626323, 1317626361,
1317626384, 1317626407), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = "DateTime", row.names = c(NA,
-100L), class = "data.frame")
My goal: I want to calculate the time difference, in seconds, between each measurement.
Edit:
I'm looking to get the following result, where the time difference (in seconds) between consecutive data points is calculated, except for the first value of the day (line 3), where the time is calculated relative to 8 am:
DateTime Seconds
1 2011-09-30 21:59:02 6
2 2011-09-30 21:59:04 2
3 2011-10-03 08:00:04 4
4 2011-10-03 08:00:05 1
5 2011-10-03 08:00:06 1
6 2011-10-03 08:00:09 3
7 2011-10-03 08:00:15 5
8 2011-10-03 08:00:24 9
9 2011-10-03 08:00:30 6
10 2011-10-03 08:00:42 12
11 2011-10-03 08:01:01 19
12 2011-10-03 08:01:24 23
13 2011-10-03 08:01:58 34
14 2011-10-03 08:02:34 36
15 2011-10-03 08:03:25 51
16 2011-10-03 08:04:26 61
17 2011-10-03 08:06:00 94
However, the measurements start at 8:00 am, so if the value is the first of the day, the number of seconds relative to 8:00 am need to be calculated. In the example above, the first measurement ends at 8:00:04 so using the $sec attribute of POSIX could work here, but on other days the first value may happen a few minutes after 8:00 o'clock.
I've tried to achieve that goal with the following function:
SecondsInBar <- function(x, startTime){
# First data point or first of day
if (x == 1 || x > 1 && x$wkday != x[-1]$wkday){
seconds <- as.numeric(difftime(x,
as.POSIXlt(startTime, format = "%H:%M:%S"),
units = "secs"))
# else calculate time difference
} else {
seconds <- as.numeric(difftime(x, x[-1], units = "secs"))
}
return (seconds)
}
Which then could be called with SecondsInBar(smallDF$DateTime, "08:00:00").
There are at least two problems with this function, but I don't know how to solve them:
- The code segment x$wkday != x[-1]$wkday returns a "$ operator is invalid for atomic vectors" error.
- The call as.POSIXlt(startTime, format = "%H:%M:%S") uses the current date, which makes the difftime calculation erroneous.
My question:
Where am I going wrong with this function?
And: is this approach a viable way or should I approach it from a different angle?
How about something along these lines:
smallDF$DateTime - as.POSIXct(paste(strftime(smallDF$DateTime,"%Y-%m-%d"),"07:00:00"))
Time differences in secs
[1] 4 5 6 9 15 24 30 42 61 84 118 154 205 266 360
[16] 503 597 756 787 863 1081 1251 1461 1685 1804 1898 2012 2249 2316 2394
[31] 2574 2814 2884 3007 3248 3423 3503 3579 3600 3609 3629 3638 3649 3664 3682
[46] 3700 3715 3739 3753 3765 3771 3781 3795 3815 3823 3838 3858 3869 3887 3900
[61] 3913 3933 3948 3965 3981 3998 4013 4040 4061 4074 4102 4115 4137 4158 4184
[76] 4211 4226 4241 4262 4295 4309 4335 4356 4373 4401 4443 4462 4500 4513 4532
[91] 4553 4579 4612 4639 4671 4696 4723 4761 4784 4807
attr(,"tzone")
[1] ""
Note that I used 7am because, when I copied your data, it was interpreted as BST on my machine.
As for your errors, you can't use $ to get elements of a date with POSIXct (which is how smallDF$DateTime is defined), only with POSIXlt. And for the second error, if you don't supply a date, it has to assume the current date, as there is no other information to draw upon.
Edit
Now that it's been clarified, I would propose a different approach: split your data frame by day, prepend the reference time to each day's times, and call diff on the result, using lapply to loop over the days:
#modify dataframe to add extra day to second half
smallDF[51:100,1] <- smallDF[51:100,1]+86400
smallDF2 <- split(smallDF,strftime(smallDF$DateTime,"%Y-%m-%d"))
lapply(smallDF2,function(x) diff(c(as.POSIXct(paste(strftime(x$DateTime[1],"%Y-%m-%d"),"07:00:00")),x$DateTime)))
$`2011-10-03`
Time differences in secs
[1] 4 1 1 3 6 9 6 12 19 23 34 36 51 61 94 143 94 159 31
[20] 76 218 170 210 224 119 94 114 237 67 78 180 240 70 123 241 175 80 76
[39] 21 9 20 9 11 15 18 18 15 24 14 12
$`2011-10-04`
Time differences in secs
[1] 3771 10 14 20 8 15 20 11 18 13 13 20 15 17 16
[16] 17 15 27 21 13 28 13 22 21 26 27 15 15 21 33
[31] 14 26 21 17 28 42 19 38 13 19 21 26 33 27 32
[46] 25 27 38 23 23
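The same per-day idea can also be written so the result goes straight back into the data frame as a Seconds column. This is only a sketch: the five timestamps are invented, and the time zone is pinned to UTC (an assumption) so the 08:00 reference does not shift with the local zone.

```r
# Invented sample: three readings on one day, two on the next
times <- as.POSIXct(c("2011-10-03 08:00:04", "2011-10-03 08:00:05",
                      "2011-10-03 08:00:09", "2011-10-04 08:02:10",
                      "2011-10-04 08:02:25"), tz = "UTC")
smallDF <- data.frame(DateTime = times)

# Calendar day of each reading, and 08:00:00 on that same day
day <- format(smallDF$DateTime, "%Y-%m-%d")
ref <- as.POSIXct(paste(day, "08:00:00"), tz = "UTC")

secs <- as.numeric(smallDF$DateTime)

# ave() applies the function to the row indices of each day:
# prepend that day's 08:00 reference, then take successive differences
smallDF$Seconds <- ave(seq_along(secs), day, FUN = function(i) {
  diff(c(as.numeric(ref[i[1]]), secs[i]))
})

print(smallDF)
```

The first row of each day is measured from 08:00 and every other row from the reading before it, which is what the expected output in the question describes.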
I maintain my journal electronically and I'm trying to get an idea of how consistent I've been with my journal writing over the last few months. I have the following data file, which shows how many journal entries (Entry Count) and words (Word Count) I recorded over the preceding 30-day period.
Date Entry Count Word Count
2010-08-25 22 4205
2010-08-26 21 4012
2010-08-27 20 3865
2010-08-28 20 4062
2010-08-29 19 3938
2010-08-30 18 3759
2010-08-31 17 3564
2010-09-01 17 3564
2010-09-02 16 3444
2010-09-03 17 3647
2010-09-04 17 3617
2010-09-05 16 3390
2010-09-06 15 3251
2010-09-07 15 3186
2010-09-08 15 3186
2010-09-09 16 3414
2010-09-10 15 3228
2010-09-11 14 3006
2010-09-12 13 2769
2010-09-13 13 2781
2010-09-14 12 2637
2010-09-15 13 2774
2010-09-16 13 2808
2010-09-17 12 2732
2010-09-18 12 2664
2010-09-19 13 2931
2010-09-20 13 2751
2010-09-21 13 2710
2010-09-22 14 2950
2010-09-23 14 2834
2010-09-24 14 2834
2010-09-25 14 2834
2010-09-26 14 2834
2010-09-27 14 2834
2010-09-28 14 2543
2010-09-29 14 2543
2010-09-30 15 2884
2010-10-01 16 3105
2010-10-02 16 3105
2010-10-03 16 3105
2010-10-04 15 2902
2010-10-05 14 2805
2010-10-06 14 2805
2010-10-07 14 2805
2010-10-08 14 2812
2010-10-09 15 2895
2010-10-10 14 2667
2010-10-11 15 2876
2010-10-12 16 2938
2010-10-13 17 3112
2010-10-14 16 2894
2010-10-15 16 2894
2010-10-16 16 2923
2010-10-17 15 2722
2010-10-18 15 2722
2010-10-19 14 2544
2010-10-20 13 2277
2010-10-21 13 2329
2010-10-22 12 2132
2010-10-23 11 1892
2010-10-24 10 1764
2010-10-25 10 1764
2010-10-26 10 1764
2010-10-27 10 1764
2010-10-28 10 1764
2010-10-29 9 1670
2010-10-30 10 1969
2010-10-31 10 1709
2010-11-01 10 1624
2010-11-02 11 1677
2010-11-03 11 1677
2010-11-04 11 1677
2010-11-05 11 1677
2010-11-06 12 1786
2010-11-07 12 1786
2010-11-08 11 1529
2010-11-09 10 1446
2010-11-10 11 1682
2010-11-11 11 1540
2010-11-12 11 1673
2010-11-13 11 1765
2010-11-14 12 1924
2010-11-15 13 2276
2010-11-16 12 2110
2010-11-17 13 2524
2010-11-18 14 2615
2010-11-19 14 2615
2010-11-20 15 2706
2010-11-21 14 2549
2010-11-22 15 2647
2010-11-23 16 2874
2010-11-24 16 2874
2010-11-25 16 2874
2010-11-26 17 3249
2010-11-27 18 3421
2010-11-28 18 3421
2010-11-29 19 3647
I'm trying to plot this data with R to get a graphical representation of my journal-writing consistency. I load it into R with the following command.
d <- read.table("journal.txt", header=T, sep="\t")
I can then graph the data with the following command.
plot(seq(from=1, to=length(d$Entry.Count), by=1), d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, in this plot the X axis is just a number, not a date. I tried adjusting the command to show dates on the X axis like this.
plot(d$Date, d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, not only does the plot look strange, but the labels on the X axis are not very helpful. What is the best way to plot this data so that I can clearly associate dates with points on the plotted curve?
Based on your code the dates are just characters.
Try converting them to Dates:
plot(as.Date(d$Date), d$Entry.Count)
Quite simple in your case as the "%Y-%m-%d" format is the default for as.Date. See strptime for more general options.
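If the default tick labels are still not readable enough, base R can draw the date axis separately. A small sketch, using the first five rows of the journal data; the two-day tick spacing and the label format are arbitrary choices for illustration:

```r
# First five rows of the journal data from the question
d <- data.frame(
  Date = c("2010-08-25", "2010-08-26", "2010-08-27", "2010-08-28", "2010-08-29"),
  Entry.Count = c(22, 21, 20, 20, 19)
)

# Convert once, so every later call sees real Date values
d$Date <- as.Date(d$Date)

# Suppress the default x axis (xaxt = "n") and draw a date axis by hand
plot(d$Date, d$Entry.Count, type = "o", xaxt = "n",
     ylim = c(0, max(d$Entry.Count)), xlab = "Date", ylab = "Entry count")
ticks <- seq(min(d$Date), max(d$Date), by = "2 days")
axis.Date(1, at = ticks, format = "%b %d")
```

For the full three months of data, a wider spacing such as by = "week" or by = "month" would probably be easier to read.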
You could use zoo. ?plot.zoo has several examples of how to create custom axis labels.
library(zoo)
# Build a zoo series indexed by date from the remaining columns
z <- zoo(d[,-1],as.Date(d[,1]))
plot(z)
# Example of custom axis labels
plot(z$Entry.Count, screen = 1, col = 1:2, xaxt = "n")
ix <- seq(1, length(time(z)), 3)
axis(1, at = time(z)[ix], labels = format(time(z)[ix],"%b-%d"), cex.axis = 0.7)