R: get the hour from a time column

I have the following data
Num Date Time
1 2015.05.21 12:12:12
2 2015.05.22 13:12:12
3 2015.05.23 14:12:12
4 2015.05.24 15:12:12
5 2015.05.25 16:12:12
By using weekdays(as.Date(data$Date, format='%Y.%m.%d')) I can get the corresponding days of the week, and by using months() I can get the corresponding months. Is there a way to get only the hour in a new column? Something like hours(as.Date(data$Time, format='%H:%M:%S')), which would give me the following output.
Num Date Time Hour
1 2015.05.21 12:12:12 12
2 2015.05.22 13:12:12 13
3 2015.05.23 14:12:12 14
4 2015.05.24 15:12:12 15
5 2015.05.25 16:12:12 16

R doesn't have a native data type for time-of-day values without dates, so paste the date and time together and parse them as one date-time. With the sample data
dd <- read.table(text="Num Date Time
1 2015.05.21 12:12:12
2 2015.05.22 13:12:12
3 2015.05.23 14:12:12
4 2015.05.24 15:12:12
5 2015.05.25 16:12:12", header=TRUE, stringsAsFactors=FALSE)
You can do
transform(dd, Hour=as.POSIXlt(paste(Date, Time), format="%Y.%m.%d %H:%M:%S")$hour)
to get
Num Date Time Hour
1 1 2015.05.21 12:12:12 12
2 2 2015.05.22 13:12:12 13
3 3 2015.05.23 14:12:12 14
4 4 2015.05.24 15:12:12 15
5 5 2015.05.25 16:12:12 16
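As a rough alternative to the transform() call above, the hour can also be read from the Time column alone. This is only a sketch: the column names Hour and Hour2 are made up for illustration, and the second option assumes every time string is zero-padded as HH:MM:SS.

```r
# Same sample data as above, built directly as a data frame
dd <- data.frame(
  Num  = 1:5,
  Date = c("2015.05.21", "2015.05.22", "2015.05.23", "2015.05.24", "2015.05.25"),
  Time = c("12:12:12", "13:12:12", "14:12:12", "15:12:12", "16:12:12"),
  stringsAsFactors = FALSE
)

# Option 1: parse the time string and read the hour field of the POSIXlt result.
# strptime() fills in today's date; tz = "UTC" avoids daylight-saving surprises.
dd$Hour <- strptime(dd$Time, format = "%H:%M:%S", tz = "UTC")$hour

# Option 2: plain string handling (assumes a zero-padded HH:MM:SS layout)
dd$Hour2 <- as.integer(substr(dd$Time, 1, 2))

dd
```

Option 1 fails loudly (NA) on malformed times, while option 2 is faster but trusts the string layout.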

Related

Calculating week numbers WITHOUT a yearwise reset (i.e. week_id = 55 is valid and shows it is a year after) + with a specified start date

This probably seems straightforward, but I am pretty stumped.
I have a set of dates (around August 1 of each year) and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08, and the date 2011-09-03 is week 142. Note that this differs from the usual week number because the count does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
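This takes the difference in days between each date and the start date and integer-divides it by 7 to get the number of whole weeks elapsed. I add one so that dates less than a week after the start fall in week 1 instead of week 0. Under this convention 2011-09-03 is 999 days after the start, so it lands in week 143 (142 whole weeks elapsed, plus one).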
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
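The same calculation can be wrapped in a small helper so the start date is a parameter. This is a sketch only: the function name week_id() and its arguments are invented here, and it assumes the inputs can be converted with as.Date().

```r
# Hypothetical helper: week index counted from a fixed start date, never resetting.
# Dates before the start get zero or negative indices.
week_id <- function(dates, start) {
  days_elapsed <- as.numeric(as.Date(dates) - as.Date(start))
  days_elapsed %/% 7 + 1
}

dates <- c("2008-12-08", "2008-12-14", "2008-12-15", "2011-09-03", "2008-06-03")
data.frame(date = dates, week_id = week_id(dates, "2008-12-08"))
```

Because %/% rounds toward minus infinity, dates before the start date are numbered consistently (the week just before the start is week 0).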

Get the first date of the first five-consecutive dates in a list using R

I have the following data:
df <- structure(list(V1 = c("1979-01-28", "1979-01-29", "1979-01-30",
"1979-02-13", "1979-02-14", "1979-02-17", "1979-02-18", "1979-02-19",
"1979-02-20", "1979-02-21", "1979-02-22", "1979-02-23", "1979-03-07",
"1979-03-14", "1979-03-18", "1979-03-29", "1979-03-30", "1979-03-31",
"1979-04-01", "1979-04-02", "1979-04-03", "1979-04-04", "1979-04-05")), class =
"data.frame", row.names = c(NA,-23L))
This is a list of dates. The interval is daily, but with gaps.
I would like to get the first date of the earliest run of five consecutive days.
So in the example above, the expected output is "1979-02-17".
Right now, I am picking out the dates manually. How can I do this in R?
I'd appreciate any help with this.
Using rle and diff.
df$V1[with(rle(diff(as.Date(df$V1)) == 1), {
  inds <- which.max(values & lengths >= 5)
  sum(lengths[1:(inds - 1)]) + 1
})]
#[1] "1979-02-17"
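The rle/diff idea above can be sketched as a function that also covers a few edge cases. The name first_run_start() and its n argument are made up for illustration. Two assumptions differ from the snippet above: a run of n consecutive dates is treated as n - 1 one-day steps (so the threshold is n - 1, not n), and seq_len() is used so that a qualifying run at the very start of the data works.

```r
# Hypothetical helper: first date of the earliest run of n consecutive days
first_run_start <- function(dates, n = 5) {
  d <- sort(as.Date(dates))
  # TRUE wherever a date is exactly one day after the previous date
  r <- rle(as.numeric(diff(d)) == 1)
  # n consecutive dates correspond to n - 1 consecutive one-day steps
  hit <- which(r$values & r$lengths >= n - 1)
  if (length(hit) == 0) return(as.Date(NA))
  # Index of the run's first date: total length of all earlier runs, plus one
  d[sum(r$lengths[seq_len(hit[1] - 1)]) + 1]
}

v <- c("1979-01-28", "1979-01-29", "1979-01-30",
       "1979-02-13", "1979-02-14", "1979-02-17", "1979-02-18", "1979-02-19",
       "1979-02-20", "1979-02-21", "1979-02-22", "1979-02-23", "1979-03-07",
       "1979-03-14", "1979-03-18", "1979-03-29", "1979-03-30", "1979-03-31",
       "1979-04-01", "1979-04-02", "1979-04-03", "1979-04-04", "1979-04-05")
first_run_start(v)
```

When no run is long enough the function returns NA rather than silently pointing at the first row, which is what which.max() does on an all-FALSE vector.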
How about
df=data.frame("V1"=df$V1)
df$V2=difftime(df$V1,c(tail(df$V1,-1),NA))
tmp=rle(as.numeric(df$V2))
df$V3=rep(tmp$lengths,tmp$lengths)
df
V1 V2 V3
1 1979-01-28 24 hours 2
2 1979-01-29 24 hours 2
3 1979-01-30 336 hours 1
4 1979-02-13 24 hours 1
5 1979-02-14 72 hours 1
6 1979-02-17 24 hours 6
7 1979-02-18 24 hours 6
8 1979-02-19 24 hours 6
9 1979-02-20 24 hours 6
10 1979-02-21 24 hours 6
11 1979-02-22 24 hours 6
12 1979-02-23 288 hours 1
13 1979-03-07 168 hours 1
14 1979-03-14 96 hours 1
15 1979-03-18 264 hours 1
16 1979-03-29 24 hours 3
17 1979-03-30 24 hours 3
18 1979-03-31 24 hours 3
19 1979-04-01 23 hours 1
20 1979-04-02 24 hours 3
21 1979-04-03 24 hours 3
22 1979-04-04 24 hours 3
23 1979-04-05 NA hours 1
df$V1[which.max(df$V3>=5)]
[1] "1979-02-17"

Count the number of active episodes per month from data with start and end dates

I am trying to get a count of active clients per month, using data that has a start and end date for each client's episode. With the code I am using, I can't work out how to count per month rather than per n days.
Here is some sample data:
Start.Date <- as.Date(c("2014-01-01", "2014-01-02","2014-01-03","2014-01-03"))
End.Date<- as.Date(c("2014-01-04", "2014-01-03","2014-01-03","2014-01-04"))
Make sure the dates are dates:
Start.Date <- as.Date(Start.Date)
End.Date <- as.Date(End.Date)
Here is the code I am using, which currently counts the number per day:
library(plyr)
count(Reduce(c, Map(seq, Start.Date, End.Date, by = 1)))
which returns:
x freq
1 2014-01-01 1
2 2014-01-02 2
3 2014-01-03 4
4 2014-01-04 2
The "by" argument can be changed to be however many days I want, but problems arise because months have different lengths.
Would anyone be able to suggest how I can count per month?
Thanks a lot.
Note: I now realize that my example data only uses dates within the same month, but my real data has dates spanning 3 years.
Here's a solution that seems to work. First, I set the seed so that the example is reproducible.
# Set seed for reproducible example
set.seed(33550336)
Next, I create a dummy data frame.
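# Test data (dplyr provides %>% and mutate(); lubridate provides day<- used further down)
library(dplyr)
library(lubridate)
df <- data.frame(Start_date = as.Date(sample(seq(as.Date('2014/01/01'), as.Date('2015/01/01'), by="day"), 12))) %>%
  mutate(End_date = as.Date(Start_date + sample(1:365, 12, replace = TRUE)))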
which looks like,
# Start_date End_date
# 1 2014-11-13 2015-09-26
# 2 2014-05-09 2014-06-16
# 3 2014-07-11 2014-08-16
# 4 2014-01-25 2014-04-23
# 5 2014-05-16 2014-12-19
# 6 2014-11-29 2015-07-11
# 7 2014-09-21 2015-03-30
# 8 2014-09-15 2015-01-03
# 9 2014-09-17 2014-09-26
# 10 2014-12-03 2015-05-08
# 11 2014-08-03 2015-01-12
# 12 2014-01-16 2014-12-12
The function below takes a start date and end date and creates a sequence of months between these dates.
# Sequence of months
mon_seq <- function(start, end){
  # Change each day to the first to aid month counting
  day(start) <- 1
  day(end) <- 1
  # Create a sequence of months
  seq(start, end, by = "month")
}
Right, this is the tricky bit. I apply my function mon_seq to all rows in the data frame using mapply. This gives the months between each start and end date. Then, I combine all these months together into a vector. I format this vector so that dates just contain months and years. Finally, I pipe (using dplyr's %>%) this into table which counts each occurrence of year-month and I cast as a data frame.
data.frame(format(do.call("c", mapply(mon_seq, df$Start_date, df$End_date)), "%Y-%m") %>% table)
This gives,
# . Freq
# 1 2014-01 2
# 2 2014-02 2
# 3 2014-03 2
# 4 2014-04 2
# 5 2014-05 3
# 6 2014-06 3
# 7 2014-07 3
# 8 2014-08 4
# 9 2014-09 6
# 10 2014-10 5
# 11 2014-11 7
# 12 2014-12 8
# 13 2015-01 6
# 14 2015-02 4
# 15 2015-03 4
# 16 2015-04 3
# 17 2015-05 3
# 18 2015-06 2
# 19 2015-07 2
# 20 2015-08 1
# 21 2015-09 1
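For comparison, the same month-counting idea can be sketched in base R without dplyr or lubridate. The function name count_active_by_month() and the output column names are invented for this example. It counts an episode as active in every calendar month it touches, which is the same assumption the mon_seq approach makes.

```r
# Hypothetical helper: number of episodes active in each calendar month
count_active_by_month <- function(start, end) {
  start <- as.Date(start)
  end   <- as.Date(end)
  # Snap a date to the first day of its month
  month_start <- function(d) as.Date(format(d, "%Y-%m-01"))
  # For each episode, list every "YYYY-MM" it overlaps
  months <- lapply(seq_along(start), function(i) {
    format(seq(month_start(start[i]), month_start(end[i]), by = "month"), "%Y-%m")
  })
  # Tabulate how many episodes touch each month
  as.data.frame(table(month = unlist(months)), responseName = "active")
}

# Two made-up episodes: one spanning January to March, one inside February
count_active_by_month(c("2014-01-15", "2014-02-10"),
                      c("2014-03-02", "2014-02-20"))
```

Months in which no episode is active do not appear in the result; they would have to be added separately if zero counts are needed.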

Issue with exporting information from predict function - date specifically in R

When I use the function below:
lapply(reg_results, function(model) {predict(model,newdata=subset(df, date=="12/15/2016 12:00:00 AM" & key=="1"))})
I get an output of regression results for a specific date based on x variables contained in df.
However, my df has several days for key==1. I want to export the regression results for all dates and save them to, say, "df_results". My issue is that when I do the following:
lapply(reg_results, function(model) {predict(model,newdata=subset(DataNP, scenario_ID=="1", id=date))})
It outputs the predicted values for all dates, but the dates show up as seemingly random numbers. They don't even appear to be stored numerically. Is there a way to bring in the date column alongside each prediction output?
Example data set:
Key  Date                    y  x1  x2  x3
1    1/10/2018 12:00:00 AM   2  3   2   5
1    1/11/2018 12:00:00 AM   3  5   7   2
1    1/12/2018 12:00:00 AM   5  7   4   7
1    1/13/2018 12:00:00 AM   7  2   7   6
2    1/10/2018 12:00:00 AM   2  6   3   8
2    1/11/2018 12:00:00 AM   3  7   7   3
2    1/12/2018 12:00:00 AM   3  2   3   4
2    1/13/2018 12:00:00 AM   7  6   2   7
Essentially, I would like my regression predict function to show both the key and date next to each predict output so I can tie back to the insample data to compare.
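One possible way to keep the key and date next to each prediction is to bind them to the predict() output inside the lapply(). This is a sketch under assumptions: the models in reg_results below are stand-in lm() fits invented for illustration, the dates are shortened, and the column name fit is arbitrary.

```r
# Example data modelled on the table above
df <- data.frame(
  Key  = rep(c("1", "2"), each = 4),
  Date = rep(c("1/10/2018", "1/11/2018", "1/12/2018", "1/13/2018"), times = 2),
  y    = c(2, 3, 5, 7, 2, 3, 3, 7),
  x1   = c(3, 5, 7, 2, 6, 7, 2, 6),
  x2   = c(2, 7, 4, 7, 3, 7, 3, 2),
  x3   = c(5, 2, 7, 6, 8, 3, 4, 7),
  stringsAsFactors = FALSE
)

# Stand-in models (assumption: the real reg_results is a list of fitted models)
reg_results <- list(m1 = lm(y ~ x1, data = df),
                    m2 = lm(y ~ x1 + x2, data = df))

# Rows to predict for: every date belonging to Key 1
newdata <- subset(df, Key == "1")

# Keep the identifying columns beside each model's predictions
df_results <- lapply(reg_results, function(model) {
  cbind(newdata[c("Key", "Date")], fit = predict(model, newdata = newdata))
})

df_results$m1
```

Each list element is then a data frame that can be merged back onto the in-sample data by Key and Date.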

Create a unique week variable NOT depending on the calendar in R

I have a daily revenue time series df from 01-01-2014 to 15-06-2017, and I want to aggregate the daily revenue to weekly revenue and make weekly predictions. Before aggregating, I need to create a continuous week variable that does NOT restart from week 1 when a new year begins. Since 01-01-2014 was not a Monday, I decided to start my first week on 06-01-2014.
My df now looks like this
date year month total
7 2014-01-06 2014 1 1857679.4
8 2014-01-07 2014 1 1735488.0
9 2014-01-08 2014 1 1477269.9
10 2014-01-09 2014 1 1329882.9
11 2014-01-10 2014 1 1195215.7
...
709 2017-06-14 2017 6 1677476.9
710 2017-06-15 2017 6 1533083.4
I want to create a unique week variable starting from 2014-01-06 until the last row of my dataset (1257 rows in total), which is 2017-06-15.
I wrote a loop:
week = c()
for (i in 1:179) {
week = rep(i,7)
print(week)
}
However, the result of each iteration is not saved. When I type week, it just shows 179,179,179,179,179,179,179
Where is the problem, and how can I add 180, 180, 180, 180 after the loop?
And if I add more data after 2017-06-15, how can I create the week variable automatically based on the last date in my data? (In other words, I would not need to count the daily observations, divide by 7, and handle the leftover days by hand to build the week index.)
Thank you!
Does this work?
library(lubridate)
#DATA
x = data.frame(date = seq.Date(from = ymd("2014-01-06"),
to = ymd("2017-06-15"), length.out = 15))
#Add year and week for each date
x$week = year(x$date) + week(x$date)/100
#Convert the addition of year and week to factor and then to numeric
x$week_variable = as.numeric(as.factor(x$week))
#Another alternative
x$week_variable2 = floor(as.numeric(x$date - min(x$date))/7) + 1
x
# date week week_variable week_variable2
#1 2014-01-06 2014.01 1 1
#2 2014-04-05 2014.14 2 13
#3 2014-07-04 2014.27 3 26
#4 2014-10-02 2014.40 4 39
#5 2014-12-30 2014.52 5 52
#6 2015-03-30 2015.13 6 65
#7 2015-06-28 2015.26 7 77
#8 2015-09-26 2015.39 8 90
#9 2015-12-24 2015.52 9 103
#10 2016-03-23 2016.12 10 116
#11 2016-06-21 2016.25 11 129
#12 2016-09-18 2016.38 12 141
#13 2016-12-17 2016.51 13 154
#14 2017-03-17 2017.11 14 167
#15 2017-06-15 2017.24 15 180
Here is the answer:
week = c()
for (i in 1:184) {
  for (j in 1:7) {
    week[j+(i-1)*7] = i
  }
}
week = as.data.frame(week)
I created a week variable running from week 1 to week 184 (the end of my dataset). Each week number is repeated 7 times because there are 7 days in a week. Afterwards I assigned the week variable to my data frame.
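The original loop kept only the last week because week = rep(i, 7) overwrites week on every iteration. As a sketch of a loop-free version, the week index can be derived from the dates themselves, so it extends automatically when new rows are added. The date range below is taken from the question; it is an assumption that the data has one row per day with no gaps.

```r
# One row per day, as described in the question
dates <- seq(as.Date("2014-01-06"), as.Date("2017-06-15"), by = "day")

# Week index from elapsed days: does not reset at year boundaries
week <- as.numeric(dates - min(dates)) %/% 7 + 1

# Equivalent when there are no gaps: repeat each week number 7 times,
# truncated to the number of rows so a partial last week is handled
n <- length(dates)
week2 <- rep(seq_len(ceiling(n / 7)), each = 7, length.out = n)

length(dates)
range(week)
```

The date-based version stays correct if some days are missing from the data, while the rep() version silently shifts every later week.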
