Plotting Time in hours? - r

I have this dataset
Time Forums_Read
1 00:01 1
2 00:04 1
3 00:05 3
4 00:06 3
5 00:07 3
6 00:08 6
7 00:10 2
8 00:11 2
9 00:12 1
I am trying to plot the data with geom_line. However, the Time column needs to be of type POSIXct.
The structure of the Time column is:
Factor w/ 1254 levels "00:01","00:04",..
Is there any solution for this?
Thanks!

The date-time class POSIXct always includes a date, so a starting date is needed. Any date will do, since the values in the Time column are only compared to each other (if the format contains no date, the current date is used). This call converts the column to POSIXct:
df$Time <- as.POSIXct(df$Time, format="%M:%S")
I used the format "%M:%S" to indicate minutes and seconds. If you have hours and minutes represented in your data, use "%H:%M". For more information on date formatting see ?strptime.
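To see how the choice of format string changes the result, here is a minimal sketch with made-up values (time_factor and the gap_* names are illustrative and not from the question). It fixes tz = "UTC" so the arithmetic does not depend on the local time zone.

```r
# Hypothetical values stored as a factor, like the Time column in the question
time_factor <- factor(c("00:01", "00:04", "00:12"))
time_chr <- as.character(time_factor)

# Read the values as hours:minutes; the missing date defaults to today
as_hm <- as.POSIXct(time_chr, format = "%H:%M", tz = "UTC")

# Read the same values as minutes:seconds instead
as_ms <- as.POSIXct(time_chr, format = "%M:%S", tz = "UTC")

# Gap between the first two observations under each interpretation, in seconds
gap_hm <- as.numeric(difftime(as_hm[2], as_hm[1], units = "secs"))
gap_ms <- as.numeric(difftime(as_ms[2], as_ms[1], units = "secs"))
print(c(hours_minutes = gap_hm, minutes_seconds = gap_ms))
```

The shape of the plotted line is the same either way; only the axis scale (hours versus minutes) changes.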

Along the lines of @Pierre Lafortune's comment,
Df$Time2 <- as.POSIXct(
paste0(Sys.Date(), " 00:", as.character(Df$Time)))
##
library(ggplot2)
##
ggplot(
data = Df,
aes(x = Time2, y = Forums_Read)) +
geom_line()
Data:
Df <- read.table(text = " Time Forums_Read
1 00:01 1
2 00:04 1
3 00:05 3
4 00:06 3
5 00:07 3
6 00:08 6
7 00:10 2
8 00:11 2
9 00:12 1",
header = TRUE,
stringsAsFactors = TRUE)

Related

Count the number of active episodes per month from data with start and end dates

I am trying to get a count of active clients per month, using data that has a start and end date for each client's episode. With the code I am using, I can't work out how to count per month rather than per n days.
Here is some sample data:
Start.Date <- as.Date(c("2014-01-01", "2014-01-02","2014-01-03","2014-01-03"))
End.Date<- as.Date(c("2014-01-04", "2014-01-03","2014-01-03","2014-01-04"))
Make sure the dates are dates:
Start.Date <- as.Date(Start.Date, "%d/%m/%Y")
End.Date <- as.Date(End.Date, "%d/%m/%Y")
Here is the code I am using, which currently counts the number per day:
library(plyr)
count(Reduce(c, Map(seq, Start.Date, End.Date, by = 1)))
which returns:
x freq
1 2014-01-01 1
2 2014-01-02 2
3 2014-01-03 4
4 2014-01-04 2
The "by" argument can be changed to be however many days I want, but problems arise because months have different lengths.
Would anyone be able to suggest how I can count per month?
Thanks a lot.
note: I now realize that for my example data I have only used dates in the same month, but my real data has dates spanning 3 years.
Here's a solution that seems to work. First, I set the seed so that the example is reproducible.
# Set seed for reproducible example
set.seed(33550336)
Next, I create a dummy data frame.
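# Test data (dplyr provides %>% and mutate(); lubridate provides day<-, used further down)
library(dplyr)
library(lubridate)
df <- data.frame(Start_date = as.Date(sample(seq(as.Date('2014/01/01'), as.Date('2015/01/01'), by="day"), 12))) %>%
mutate(End_date = as.Date(Start_date + sample(1:365, 12, replace = TRUE)))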
which looks like,
# Start_date End_date
# 1 2014-11-13 2015-09-26
# 2 2014-05-09 2014-06-16
# 3 2014-07-11 2014-08-16
# 4 2014-01-25 2014-04-23
# 5 2014-05-16 2014-12-19
# 6 2014-11-29 2015-07-11
# 7 2014-09-21 2015-03-30
# 8 2014-09-15 2015-01-03
# 9 2014-09-17 2014-09-26
# 10 2014-12-03 2015-05-08
# 11 2014-08-03 2015-01-12
# 12 2014-01-16 2014-12-12
The function below takes a start date and end date and creates a sequence of months between these dates.
# Sequence of months
mon_seq <- function(start, end){
# Change each day to the first to aid month counting
day(start) <- 1
day(end) <- 1
# Create a sequence of months
seq(start, end, by = "month")
}
Right, this is the tricky bit. I apply my function mon_seq to all rows in the data frame using mapply. This gives the months between each start and end date. Then, I combine all these months together into a vector. I format this vector so that dates just contain months and years. Finally, I pipe (using dplyr's %>%) this into table which counts each occurrence of year-month and I cast as a data frame.
data.frame(format(do.call("c", mapply(mon_seq, df$Start_date, df$End_date)), "%Y-%m") %>% table)
This gives,
# . Freq
# 1 2014-01 2
# 2 2014-02 2
# 3 2014-03 2
# 4 2014-04 2
# 5 2014-05 3
# 6 2014-06 3
# 7 2014-07 3
# 8 2014-08 4
# 9 2014-09 6
# 10 2014-10 5
# 11 2014-11 7
# 12 2014-12 8
# 13 2015-01 6
# 14 2015-02 4
# 15 2015-03 4
# 16 2015-04 3
# 17 2015-05 3
# 18 2015-06 2
# 19 2015-07 2
# 20 2015-08 1
# 21 2015-09 1
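The same idea can be sketched in base R only, without dplyr or lubridate. The episode dates and the names first_of_month, months_active and active_per_month below are hypothetical, chosen just for illustration; this is one possible variant, not the answer's exact method.

```r
# Hypothetical episodes spanning several months
start_date <- as.Date(c("2014-01-15", "2014-02-20", "2014-03-05"))
end_date   <- as.Date(c("2014-03-10", "2014-02-25", "2014-05-01"))

# Snap a date to the first day of its month
first_of_month <- function(x) as.Date(format(x, "%Y-%m-01"))

# One sequence of month starts per episode, kept as a list of Date vectors
months_active <- lapply(seq_along(start_date), function(i) {
  seq(first_of_month(start_date[i]), first_of_month(end_date[i]), by = "month")
})

# Flatten the list, label each entry as year-month, and count the labels
month_labels <- format(do.call(c, months_active), "%Y-%m")
active_per_month <- as.data.frame(table(month = month_labels))
print(active_per_month)
```

Snapping both dates to the first of the month matters: without it, an episode running from 31 January to 1 February could miss February, because seq() would step a whole month from the 31st.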

Converting sets of calendar dates to Julian days in a data frame

I am a beginner in R and I am trying to convert sets of calendar dates to sets of Julian dates in a data frame. I know similar questions have been answered, but I have not been able to get what I want.
df <- data.frame(Date = c('2010-06-20','2005-10-19','2000-05-01','2003-04-04','2010-11-20','2009-09-14'), No = c(1, 4, 6, 11, 7, 9))
df$jDate <- as.POSIXct(as.numeric(df$Date), origin = '1970-01-01')
gives me
df
Date No jDate
1 2010-06-20 1 1969-12-31 19:00:05
2 2005-10-19 4 1969-12-31 19:00:03
3 2000-05-01 6 1969-12-31 19:00:01
4 2003-04-04 11 1969-12-31 19:00:02
5 2010-11-20 7 1969-12-31 19:00:06
6 2009-09-14 9 1969-12-31 19:00:04
How could I get a column with Julian days in the column 'jDate'?
Thank you for your help.
You can do
df$Date <- as.Date(df$Date)
to get the date, and then
df$jDate <- format(df$Date, "%j")
to get the day of the year (001 to 366, often called the Julian day), or
df$jDateYr <- format(df$Date, "%Y-%j")
to prepend the year (if you want). This returns
df
Date No jDate jDateYr
1 2010-06-20 1 171 2010-171
2 2005-10-19 4 292 2005-292
3 2000-05-01 6 122 2000-122
4 2003-04-04 11 094 2003-094
5 2010-11-20 7 324 2010-324
6 2009-09-14 9 257 2009-257
To read more about the possible date-time formats, see ?strptime.
Based on aosmith's comments, I did this and got what I wanted.
df$jDate <- julian(as.Date(df$Date), origin = as.Date('1970-01-01'))
df
Date No jDate
1 2010-06-20 1 14780
2 2005-10-19 4 13075
3 2000-05-01 6 11078
4 2003-04-04 11 12146
5 2010-11-20 7 14933
6 2009-09-14 9 14501
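The two answers compute different things: format(..., "%j") gives the day of the year, while julian() gives the number of days since an origin. A minimal sketch contrasting them on two dates from the question (the variable names are illustrative):

```r
# Two of the dates from the question
dates <- as.Date(c("2010-06-20", "2000-05-01"))

# Day of the year, as an integer rather than a zero-padded string
day_of_year <- as.integer(format(dates, "%j"))

# Days elapsed since the origin; as.numeric() drops the "origin" attribute
days_since_epoch <- as.numeric(julian(dates, origin = as.Date("1970-01-01")))

print(data.frame(dates, day_of_year, days_since_epoch))
```

With the default origin of 1970-01-01, julian() returns the same numbers as as.numeric(dates), since that is how the Date class stores its values internally.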

How to bin observations over a time series in r?

I have a data set which looks like this:
VisitID Start
1 0 2015-02-15 09:46:43.17
2 1 2015-02-15 09:47:37.84
3 2 2015-02-15 09:58:46.42
4 3 2015-02-15 09:58:48.46
5 4 2015-02-15 10:28:25.09
6 5 2015-02-15 10:33:43.53
I want to make a bar plot of the count per hour (y-axis) vs. absolute time (x-axis), that is, how many observations fall in each hour.
Can you please help?
Thanks,
Guy
Something like this should work :
DF <- read.csv(text=
"VisitID,Start
0,2015-02-15 09:46:43.17
1,2015-02-15 09:47:37.84
2,2015-02-15 09:58:46.42
3,2015-02-15 09:58:48.46
4,2015-02-15 10:28:25.09
5,2015-02-15 10:33:43.53",stringsAsFactors=FALSE)
DF$StartDate <- strptime(DF$Start, tz='GMT', format="%Y-%m-%d %H:%M:%OS")
hours <- vapply(split(1:nrow(DF),format(DF$StartDate,"%Y-%m-%d %H:00:00",tz='UTC')),length,0)
barplot(hours)
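A possible alternative, sketched here rather than taken from the answer, is to let cut() do the binning on a POSIXct vector. The timestamps are the ones from the question with fractional seconds dropped; start, hour_bin and hourly_counts are illustrative names.

```r
# Timestamps from the question, fractional seconds dropped for brevity
start <- as.POSIXct(c("2015-02-15 09:46:43", "2015-02-15 09:47:37",
                      "2015-02-15 09:58:46", "2015-02-15 09:58:48",
                      "2015-02-15 10:28:25", "2015-02-15 10:33:43"),
                    tz = "UTC")

# Assign each timestamp to the hour it falls in
hour_bin <- cut(start, breaks = "hour")

# Count observations per hourly bin
hourly_counts <- table(hour_bin)
print(hourly_counts)
```

Passing hourly_counts to barplot() draws the chart. Because cut() creates a level for every hour in the range, hours with no observations show up as empty bars instead of being silently skipped.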

Select rows from a data frame according to another vector, including repetitions

Example data:
dates=seq(as.POSIXct("2015-01-01 00:00:00"), as.POSIXct("2015-01-07 00:00:00"), by="day")
data=rnorm(7,1,2)
groupID=c(12,14,16,24,35,46,54)
DF=data.frame(Date=dates,Data=data,groupID=groupID)
BB=c(12,12,16,24,35,35)
DF[DF$groupID %in% BB,]
Date Data groupID
1 2015-01-01 4.4104202 12
3 2015-01-03 2.1557735 16
4 2015-01-04 -0.9880946 24
5 2015-01-05 -0.3396025 35
I need to filter the data frame DF according to values in my vector BB which match the groupID column. However, if BB contains repetitions, this is not reflected in the result.
Since my vector BB includes two values of 12 and two of 35, the output should in fact be:
Date Data groupID
1 2015-01-01 4.4104202 12
1 2015-01-01 4.4104202 12
3 2015-01-03 2.1557735 16
4 2015-01-04 -0.9880946 24
5 2015-01-05 -0.3396025 35
5 2015-01-05 -0.3396025 35
Is there a way to achieve this? And to keep the ordering of the vector BB if possible?
Use match() (or findInterval()):
DF[match(BB,DF$groupID),];
## Date Data groupID
## 1 2015-01-01 1.2199835 12
## 1.1 2015-01-01 1.2199835 12
## 3 2015-01-03 1.8141556 16
## 4 2015-01-04 0.2748579 24
## 5 2015-01-05 3.2030200 35
## 5.1 2015-01-05 3.2030200 35
(Note that the Data column is different because you used rnorm() to generate it without calling set.seed() first. It is recommended to call set.seed() in any code sample where you incorporate randomness so that exact results can be reproduced.)
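To make the difference between %in% and match() concrete, here is a small sketch. It reuses the groupID values and BB from the question, while the value column is a made-up stand-in for the random Data column.

```r
# groupID values from the question; 'value' is a hypothetical payload column
DF <- data.frame(groupID = c(12, 14, 16, 24, 35, 46, 54),
                 value = letters[1:7])
BB <- c(12, 12, 16, 24, 35, 35)

# %in% yields one TRUE/FALSE per row of DF, so each row appears at most once
in_rows <- DF[DF$groupID %in% BB, ]

# match() yields one row index per element of BB, so repeats and order survive
idx <- match(BB, DF$groupID)
matched <- DF[idx, ]

print(idx)
print(matched)
```

If BB contained a value that is absent from DF$groupID, match() would return NA for it and the result would contain a row of NAs, so it may be worth filtering with !is.na(idx) first.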
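Alternatively, you can convert BB into a data.frame and use merge() to join DF and BB on groupID. Note that merge() sorts the result by groupID by default, so the order of BB is preserved only when BB is already sorted: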
dates=seq(as.POSIXct("2015-01-01 00:00:00"), as.POSIXct("2015-01-07 00:00:00"), by="day")
groupID=c(12,14,16,24,35,46,54)
set.seed(1234)
data=rnorm(7,1,2)
DF=data.frame(Date=dates,Data=data,groupID=groupID)
BB=data.frame(groupID=c(12,12,16,24,35,35))
Test result:
> merge(DF, BB, by = "groupID")
groupID Date Data
1 12 2015-01-01 -1.414131
2 12 2015-01-01 -1.414131
3 16 2015-01-03 3.168882
4 24 2015-01-04 -3.691395
5 35 2015-01-05 1.858249
6 35 2015-01-05 1.858249

Plot a histogram based on the Date class

I was looking around the web but could not find the answer that I'm looking for.
Here is my input data:
Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2
I want to plot a histogram showing the sum of calls for each of the days in the "Date" column.
Yes, it can be done by identifying the levels of the Date column and adding up the corresponding Calls, but I am wondering whether there is a more elegant way to do it. The "Date" column is of class "Date".
According to this example, the final hist should have 4 bins of (16, 27, 24, 2).
Cheers,
Well, technically a histogram is really only for estimating the density function of continuous data, and the way you have your data coded, Date is more like a categorical variable. So you probably want a bar chart of sums rather than a true histogram. You can do that with ggplot2 (here dd is your data frame):
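# qplot() and its stat argument are deprecated in current ggplot2; fun replaced fun.y in ggplot2 3.3.0
ggplot(dd, aes(Date, Calls)) + stat_summary(fun = sum, geom = "bar")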
Read data:
d <- read.table(text=
"Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2",
header=TRUE)
d$Date <- as.Date(d$Date)
library(plyr)
s <- ddply(d,"Date",summarize,Calls=sum(Calls))
library(ggplot2)
If we use Date as the x variable we get month labels:
ggplot(s,aes(x=Date,y=Calls))+geom_bar(stat="identity")
You might prefer the particular date labels:
ggplot(s,aes(x=factor(Date),y=Calls))+geom_bar(stat="identity")
Or non-default labels:
ggplot(s,aes(x=format(Date,"%d-%b"),y=Calls))+geom_bar(stat="identity")+
labs(x="Date")
It should also be possible to do this by constructing your own hist object and passing it to plot.histogram, but I think this way is easier ...
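For completeness, the per-day sums can also be computed in base R without plyr. This is a sketch using aggregate() on the data from the question; the names calls and daily are illustrative.

```r
# Data from the question
calls <- data.frame(
  Date = as.Date(c("2012-01-01", "2012-01-01", "2012-01-01",
                   "2012-03-02", "2012-03-02", "2012-03-02",
                   "2012-04-02", "2012-04-02", "2012-04-02",
                   "2012-04-02", "2012-04-02", "2012-05-02")),
  Calls = c(3, 3, 10, 15, 7, 5, 0, 5, 18, 1, 0, 2)
)

# Sum Calls within each distinct Date
daily <- aggregate(Calls ~ Date, data = calls, FUN = sum)
print(daily)
```

The resulting daily data frame can be passed to the ggplot() calls above in place of s, or drawn directly with barplot(daily$Calls, names.arg = format(daily$Date)).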
