I am new to R and have a data frame with a date-time variable. Temperature is recorded every hour of every day, and the date-time is in the format YYYY-MM-DD 00:00:00.
I would like to convert the time into a factor ranging from 0 to 23 for each day.
That is, for each day my new column should contain the factor levels 0 to 23. Could anyone help me with this? 2015-01-01 00:00:00 should give me 0, 2015-01-01 01:00:00 should give me 1, and so on; 2015-01-02 00:00:00 should be 0 again.
You can convert your timestamp into a POSIXlt object. Once you have that, you can obtain the hour directly like this:
> timestamp <- as.POSIXlt("2015-01-01 00:00:00")
> timestamp
[1] "2015-01-01 MYT"
> timestamp$hour
[1] 0
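To apply the same idea to a whole column, a minimal base R sketch could look like the following. The data frame `temps` and its column names are made up for illustration, and the time zone is fixed to UTC only to keep the example reproducible.

```r
# Hypothetical sample data standing in for the asker's data frame
timestamps <- c("2015-01-01 00:00:00", "2015-01-01 01:00:00", "2015-01-02 00:00:00")
temps <- data.frame(datetime = as.POSIXct(timestamps, tz = "UTC"))

# POSIXlt exposes the hour as an integer field; declaring levels 0:23
# keeps all 24 levels even when some hours are absent from the data
temps$hour <- factor(as.POSIXlt(temps$datetime)$hour, levels = 0:23)
temps
```

Declaring the levels up front means later tabulations show every hour, including those with no observations.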
Using sample data, one way would be the following.
mydf <- data.frame(id = c(1,1,1,2,2,1,1),
event = c("start", "valid", "end", "start", "bad", "start", "bad"),
time = as.POSIXct(c("2015-05-16 20:46:53", "2015-05-16 20:46:56", "2015-05-16 21:46:59",
"2015-05-16 22:46:53", "2015-05-16 22:47:00", "2015-05-16 22:49:05",
"2015-05-16 23:49:09"), format = "%Y-%m-%d %H:%M:%S"),
stringsAsFactors = FALSE)
library(dplyr)
mutate(mydf, group = factor(format(time, "%H")))
# id event time group
#1 1 start 2015-05-16 20:46:53 20
#2 1 valid 2015-05-16 20:46:56 20
#3 1 end 2015-05-16 21:46:59 21
#4 2 start 2015-05-16 22:46:53 22
#5 2 bad 2015-05-16 22:47:00 22
#6 1 start 2015-05-16 22:49:05 22
#7 1 bad 2015-05-16 23:49:09 23
Tim's answer using POSIXlt is probably the best option, but here's a regex way just in case:
> times <- c("2015-01-01 00:00:00", "2015-01-01 01:00:00", "2015-01-02 00:00:00")
> regmatches(times, regexpr("(?<=-\\d{2} )\\d{2}", times, perl=TRUE))
[1] "00" "01" "00"
With the extracted hours you can make them factors or integers as necessary.
@Sairam, in addition to @jazzurro's use of 'dplyr' (which many R users, jazzurro included, use routinely): if you ever need a simple and powerful way to manipulate dates, it is worth getting familiar with another package, 'lubridate'.
lubridate makes working with dates a snap. Hope this helps, and best of luck with your project.
I am new to R and have run into a problem with historical hourly electric load data that I downloaded. My goal is to build a load forecast based on an ARIMA model and/or artificial neural networks.
The problem is that the data is in the following Date-time (hourly) format:
#> DateTime Day_ahead_Load Actual_Load
#> [1,] "01.01.2015 00:00 - 01.01.2015 01:00" "6552" "6100"
#> [2,] "01.01.2015 01:00 - 01.01.2015 02:00" "6140" "5713"
#> [3,] "01.01.2015 02:00 - 01.01.2015 03:00" "5950" "5553"
I have tried to make a POSIXct object but it didn't work:
as.Date.POSIXct(DateTime, format = "%d-%m-%Y %H:%M:%S", tz="EET", usetz=TRUE)
The message I get is that it is not in an unambiguous format. I would really appreciate your feedback on this.
Thank you in advance.
Best Regards,
Iro
You have 2 major problems. First, your DateTime column contains two date-times, so you need to split that column into two. Second, your format argument uses - separators but your dates use . separators (it also expects seconds via %S, which your data does not contain).
We can use separate from tidyr and mutate with across to change the columns to POSIXct.
library(dplyr)
library(tidyr)
data %>%
separate(DateTime, c("StartDateTime","EndDateTime"), " - ") %>%
mutate(across(c("StartDateTime","EndDateTime"),
~ as.POSIXct(., format = "%d.%m.%Y %H:%M",
tz="EET", usetz=TRUE)))
StartDateTime EndDateTime Day_ahead_Load Actual_Load
1 2015-01-01 00:00:00 2015-01-01 01:00:00 6552 6100
2 2015-01-01 01:00:00 2015-01-01 02:00:00 6140 5713
3 2015-01-01 02:00:00 2015-01-01 03:00:00 5950 5553
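For readers who prefer to avoid extra packages, the same split-then-parse idea can be sketched in base R. The data frame name `load_data` is an assumption, and the three rows are copied from the question.

```r
# Sample rows copied from the question; load_data is a hypothetical name
load_data <- data.frame(
  DateTime = c("01.01.2015 00:00 - 01.01.2015 01:00",
               "01.01.2015 01:00 - 01.01.2015 02:00",
               "01.01.2015 02:00 - 01.01.2015 03:00"),
  Day_ahead_Load = c(6552, 6140, 5950),
  Actual_Load = c(6100, 5713, 5553),
  stringsAsFactors = FALSE)

# Split each interval string on " - " into a two-column character matrix
parts <- do.call(rbind, strsplit(load_data$DateTime, " - ", fixed = TRUE))

# Parse both halves with a format that matches the "." separators and has no seconds
load_data$StartDateTime <- as.POSIXct(parts[, 1], format = "%d.%m.%Y %H:%M", tz = "EET")
load_data$EndDateTime   <- as.POSIXct(parts[, 2], format = "%d.%m.%Y %H:%M", tz = "EET")
load_data[, c("StartDateTime", "EndDateTime", "Day_ahead_Load", "Actual_Load")]
```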
How can I make R use a different definition of a day? For example, I would like yrDay, as computed below, to run from 6 pm to 6 pm instead of from midnight to midnight.
df <- data.frame(Date=seq(
from=as.POSIXct("2012-1-1 13:00:00", tz="UTC"),
to=as.POSIXct("2012-1-3 13:00:00", tz="UTC"),
by="hour")
)
df$yrDay <- as.numeric(strftime(df$Date,format="%j"))
Just add 5 hours (5 * 60 min * 60 sec) using your code:
df$yrDay <- as.numeric(strftime(df$Date + 5*60*60,format="%j"))
Date yrDay
1 2012-01-01 13:00:00 1
2 2012-01-01 14:00:00 1
3 2012-01-01 15:00:00 1
4 2012-01-01 16:00:00 1
5 2012-01-01 17:00:00 1
6 2012-01-01 18:00:00 2
7 2012-01-01 19:00:00 2
8 2012-01-01 20:00:00 2
9 2012-01-01 21:00:00 2
10 2012-01-01 22:00:00 2
...
Or 6 hours using lubridate (better approach IMHO):
df$yrDay <- lubridate::yday(df$Date + 6*60*60)
But as mentioned by @ngm in the comments, this is a quick and dirty solution that might not be robust in all cases.
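One possible reason the two offsets above differ: strftime() formats in the session's local time zone unless tz is supplied, so the 5-hour shift presumably depended on the answerer's machine being one hour ahead of UTC. A minimal sketch that pins the time zone explicitly follows; the choice of UTC and the 6-hour shift are taken from the question's data and its 6 pm boundary, so treat it as an illustration rather than a general solution.

```r
df <- data.frame(Date = seq(
  from = as.POSIXct("2012-01-01 13:00:00", tz = "UTC"),
  to   = as.POSIXct("2012-01-03 13:00:00", tz = "UTC"),
  by   = "hour"))

# Shift by 6 hours so that 18:00 becomes midnight of the "next" day
shifted <- df$Date + 6 * 60 * 60

# Passing tz explicitly makes the result independent of the machine's locale
df$yrDay <- as.numeric(format(shifted, "%j", tz = "UTC"))
head(df, 7)
```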
A more robust solution to the question above, in Python.
This algorithm uses NumPy and pandas to build a pair of conditions for each date in your datetime data. The first asks whether the time is earlier than the specified shift time and the date equals the current date; the second asks whether the time is at or after the shift time on that same date.
These conditions are generated for every day in the data, so times before the shift time are counted with the previous day and times at or after it start a new day.
I hope this helps, or encourages others to think of a better solution.
Sorry about my English; it is not my first language and I had limited time to write this answer. Helpful comments are welcome, and please don't be too harsh :)
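import numpy as np
import pandas as pd

def count_custome_day_start_and_end(pd_index_datetime, shift_time='8:30am'):
    # Label each timestamp of a pandas DatetimeIndex with a custom-day number,
    # where a new "day" starts at shift_time instead of midnight.
    dates = sorted(set(pd_index_datetime.date))
    times = np.array([time.time() for time in pd_index_datetime])
    shift_time = pd.to_datetime(shift_time).time()
    # Two conditions per calendar date: before and at/after the shift time
    conditiones = [
        cond for date in dates for cond in [
            ((times < shift_time)
             & (pd_index_datetime.date == date)),  # day before
            ((times >= shift_time)
             & (pd_index_datetime.date == date))  # day after
        ]
    ]
    # Choices 0, 1, 1, 2, 2, ...: the "after" part of one date shares its
    # label with the "before" part of the next date
    langth = (len(conditiones) // 2) + 1
    choices = [0, *np.repeat(range(1, langth), 2)]
    choices.pop()
    return np.select(conditiones, choices, pd_index_datetime.factorize()[0])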
I have dataset consisting of two columns (timestamp and power) as:
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset covers one entire month. I want to create two subsets of it: one containing the weekend data and the other containing the weekday (Monday to Friday) data. In other words, one dataset should contain the data for Saturday and Sunday, and the other should contain the data for the remaining days. Both subsets should retain both columns. How can I do this in R?
I tried to use aggregate and split, but it is not clear to me how the function parameter (FUN) of aggregate could specify a division of the dataset.
You can do this with base R functions: first use strptime to extract the date from the first column, then apply the weekdays function.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday
Building on top of @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
Initially I tried complex approaches using extra libraries, but in the end I came up with a basic approach using base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The datasets created, i.e., weekend_data and week_data are my required sub datasets.
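The row-by-row loop works, but the same split can be sketched without a loop using logical indexing. The four sample rows are made up for illustration; the `wday` field of POSIXlt (0 = Sunday, 6 = Saturday) is used instead of weekday names so the result does not depend on the session's language settings.

```r
# Hypothetical sample with the same two columns as the asker's df2
df2 <- data.frame(
  timestamp = as.POSIXct(c("2015-08-01 00:00:00", "2015-08-02 12:00:00",
                           "2015-08-03 00:00:00", "2015-08-07 23:00:00"),
                         tz = "UTC"),
  power = c(124, 149, 118, 167))

# wday is 0 for Sunday and 6 for Saturday
is_weekend <- as.POSIXlt(df2$timestamp)$wday %in% c(0, 6)

weekend_data <- df2[is_weekend, ]   # Saturday and Sunday rows
week_data    <- df2[!is_weekend, ]  # Monday to Friday rows
```

Both subsets keep the timestamp and power columns, and no helper column needs to be added to df2.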
I have a table in R like:
start duration
02/01/2012 20:00:00 5
05/01/2012 07:00:00 6
etc... etc...
I got to this by importing a table from Microsoft Excel that looked like this:
date time duration
2012/02/01 20:00:00 5
etc...
I then merged the date and time columns by running the following code:
d.f <- within(d.f, { start=format(as.POSIXct(paste(date, time)), "%m/%d/%Y %H:%M:%S") })
I want to create a third column called 'end', calculated as the start time plus the number of hours given in duration. I am pretty sure that my time is a POSIXct vector. I have seen how to manipulate one datetime object, but how can I do that for the entire column?
The expected result should look like:
start duration end
02/01/2012 20:00:00 5 02/02/2012 01:00:00
05/01/2012 07:00:00 6 05/01/2012 13:00:00
etc... etc... etc...
Using lubridate
> library(lubridate)
> df$start <- mdy_hms(df$start)
> df$end <- df$start + hours(df$duration)
> df
# start duration end
#1 2012-02-01 20:00:00 5 2012-02-02 01:00:00
#2 2012-05-01 07:00:00 6 2012-05-01 13:00:00
data
df <- structure(list(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"
), duration = 5:6), .Names = c("start", "duration"), class = "data.frame", row.names = c(NA,
-2L))
You can simply add duration*3600 to the start column of the data frame, since POSIXct arithmetic is in seconds. E.g. with one date:
start = as.POSIXct("02/01/2012 20:00:00",format="%m/%d/%Y %H:%M:%S")
start
[1] "2012-02-01 20:00:00 CST"
start + 5*3600
[1] "2012-02-02 01:00:00 CST"
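Because the arithmetic is vectorised, the same addition works on the whole column at once. A minimal sketch, assuming the start column holds character strings in the question's month/day/year layout (which is what format() produces) and fixing the time zone to UTC for reproducibility:

```r
# Sample rows copied from the question
d.f <- data.frame(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"),
                  duration = c(5, 6),
                  stringsAsFactors = FALSE)

# Parse the text into POSIXct, add the durations in seconds, format back
start_time <- as.POSIXct(d.f$start, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
d.f$end <- format(start_time + d.f$duration * 3600, "%m/%d/%Y %H:%M:%S")
d.f
```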
I want to count the frequency of observations "after 19:00", independently of the date. What would be the quickest and most logical way?
Just as I told R that the Date column is a date with as.Date, I would like to tell R that Time is a time column and then simply ask Time > "19:00:00", but this does not seem to be possible.
I tried as.POSIXct(Time, format = "%H:%M:%S"), but this function adds today's date to my column, which creates annoying clutter and looks unprofessional.
I could use substr(as.character(Time),1,2) > 19 but that doesn't feel very elegant either.
Date Time
1 2014-01-01 17:16:48
2 2014-01-01 18:57:36
3 2014-01-01 19:40:48
4 2014-01-01 19:40:48
5 2014-01-01 20:09:36
6 2014-01-01 20:24:00
library(data.table)
## Convert (by reference) your data to a data.table
setDT(dat)
dat[, .N
, by = list(above_1900 = hour(as.POSIXlt(Time, format="%H:%M:%S")) > 19)]
above_1900 N
1: FALSE 4
2: TRUE 2
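Note that comparing the hour with > 19 counts only observations from 20:00 onward, so the two 19:40:48 rows fall into the FALSE group. If "after 19:00" should include them, one option is to compare seconds since midnight. The sketch below is base R only; the helper name `to_seconds` is made up for illustration and assumes times are always written as HH:MM:SS.

```r
# The six rows shown in the question
dat <- data.frame(Date = as.Date(rep("2014-01-01", 6)),
                  Time = c("17:16:48", "18:57:36", "19:40:48",
                           "19:40:48", "20:09:36", "20:24:00"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
to_seconds <- function(hms) {
  parts <- matrix(as.numeric(unlist(strsplit(as.character(hms), ":", fixed = TRUE))),
                  ncol = 3, byrow = TRUE)
  drop(parts %*% c(3600, 60, 1))
}

after_1900 <- to_seconds(dat$Time) > to_seconds("19:00:00")
table(after_1900)
```

No date is ever attached to the Time column, which avoids the clutter the question mentions.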