Sometimes the data may not be recorded, so I want to insert NA values and then check the NA distribution. To keep the problem simple, I will generate date-and-hour data for each year and join it to my data. How can I generate the date and hour values from a year?
For example, for 2017 and 2018 there should be 17520 rows (8760 per year).
1 2016-01-01 01:00:00
2 2016-01-01 02:00:00
3 2016-01-01 03:00:00
4 2016-01-01 04:00:00
5 2016-01-01 05:00:00
6 2016-01-01 06:00:00
7 2016-01-01 07:00:00
8 2016-01-01 08:00:00
9 2016-01-01 09:00:00
10 2016-01-01 10:00:00
Try
seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
    to   = as.POSIXct("2018-12-31 23:00", tz = "UTC"),
    by   = "hour")

length(seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
           to   = as.POSIXct("2018-12-31 23:00", tz = "UTC"),
           by   = "hour"))
[1] 17519
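To get the 17520 rows you expect (8760 per year) and then check where values are missing, one option is to build the full hourly grid and left-join your observations onto it. The sequence above yields 17519 because it starts at 01:00 of the first year but stops at 23:00 of the second; extending the end point to 2019-01-01 00:00 gives exactly 17520 timestamps labelled 01:00 through 00:00 of the next day, as in the sample above. A minimal sketch, assuming the observed data sit in a data frame called obs with columns datetime and value (hypothetical names):
# full hourly grid for 2017-2018: 2 * 8760 = 17520 timestamps
full_grid <- data.frame(
  datetime = seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 to   = as.POSIXct("2019-01-01 00:00", tz = "UTC"),
                 by   = "hour")
)

# left join: hours with no recorded observation get NA in 'value'
merged <- merge(full_grid, obs, by = "datetime", all.x = TRUE)

# NA distribution, e.g. number of missing hours per month
table(format(merged$datetime[is.na(merged$value)], "%Y-%m"))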
How can I plot hourly time series data so that the x-axis is 1:24, given, let's say, one year of data (365 days, 8000+ rows)?
I tried with ggplot2 but didn't get it to work.
The head looks like this:
Value DateTime
1 104 2018-01-01 01:00:00
2 104 2018-01-01 02:00:00
3 108 2018-01-01 03:00:00
4 106 2018-01-01 04:00:00
5 117 2018-01-01 05:00:00
6 166 2018-01-01 06:00:00
And the tail:
Value DateTime
8754 160.10 2018-12-31 19:00:00
8755 156.11 2018-12-31 20:00:00
8756 139.11 2018-12-31 21:00:00
8757 112.11 2018-12-31 22:00:00
8758 96.10 2018-12-31 23:00:00
8759 90.11 2019-01-01 00:00:00
Here is an image of what I'm trying to achieve.
What about having the time of day and the date as separate variables? You can use the hms package to do this.
library(hms)
timeOfDay <- as.hms(df$DateTime)   # as_hms() in newer versions of hms
date <- as.Date(df$DateTime)
Now, you can use timeOfDay on the x-axis and date as your grouping aesthetics.
This works for me:
ggplot(df, aes(x = timeOfDay, y = Value)) +
  geom_line(aes(group = date))
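If you literally want the x-axis labelled 1:24 rather than clock times, here is an alternative sketch that derives an integer hour of day with base R's format() instead of hms. It assumes DateTime is already POSIXct and uses the column names from the head() above:
library(ggplot2)

df$hourOfDay <- as.integer(format(df$DateTime, "%H")) + 1   # 1-24 instead of 0-23
df$date      <- as.Date(df$DateTime)

ggplot(df, aes(x = hourOfDay, y = Value, group = date)) +
  geom_line(alpha = 0.3) +                 # one semi-transparent line per day
  scale_x_continuous(breaks = 1:24)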
I have a data frame where I split the datetime column into date and time (two columns). However, when I group by time it gives me duplicates in time. So, to analyse it I used table() on the time column, and it also showed duplicates. This is a sample of it:
> table(df$time)
00:00:00 00:00:00 00:15:00 00:15:00 00:30:00 00:30:00 
    2211     1047     2211     1047     2211     1047 
As you can see, one of each pair of "unique" values kept a space (" ") inside after the split. Is there an easy way to solve this?
PS: The data type of the time column is character.
EDIT: Code added
library(reshape2)
df$datetime <- as.character.Date(df$datetime)
x <- colsplit(df$datetime, ' ', names = c('Date','Time'))
df <- cbind(df, x)
There are a number of approaches. One of them is to use the appropriate functions to extract the Date and Time parts from the datetime column:
df <- data.frame(datetime = seq(
from=as.POSIXct("2018-5-15 0:00", tz="UTC"),
to=as.POSIXct("2018-5-16 24:00", tz="UTC"),
by="30 min") )
head(df$datetime)
#[1] "2018-05-15 00:00:00 UTC" "2018-05-15 00:30:00 UTC" "2018-05-15 01:00:00 UTC" "2018-05-15 01:30:00 UTC"
#[5] "2018-05-15 02:00:00 UTC" "2018-05-15 02:30:00 UTC"
df$Date <- as.Date(df$datetime)
df$Time <- format(df$datetime,"%H:%M:%S")
head(df)
#              datetime       Date     Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 01:30:00 2018-05-15 01:30:00
# 5 2018-05-15 02:00:00 2018-05-15 02:00:00
# 6 2018-05-15 02:30:00 2018-05-15 02:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 01:30:00 02:00:00 02:30:00 03:00:00 03:30:00 04:00:00 04:30:00 05:00:00 05:30:00 
#       3        2        2        2        2        2        2        2        2        2        2        2 
#06:00:00 06:30:00 07:00:00 07:30:00 08:00:00 08:30:00 09:00:00 09:30:00 10:00:00 10:30:00 11:00:00 11:30:00 
#       2        2        2        2        2        2        2        2        2        2        2        2 
#12:00:00 12:30:00 13:00:00 13:30:00 14:00:00 14:30:00 15:00:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00 
#       2        2        2        2        2        2        2        2        2        2        2        2 
#18:00:00 18:30:00 19:00:00 19:30:00 20:00:00 20:30:00 21:00:00 21:30:00 22:00:00 22:30:00 23:00:00 23:30:00 
#       2        2        2        2        2        2        2        2        2        2        2        2 
# If the data were given as character strings and contain extra spaces, the above approach will still work:
df <- data.frame(datetime=c("2018-05-15 00:00:00","2018-05-15 00:30:00",
"2018-05-15 01:00:00", "2018-05-15 02:00:00",
"2018-05-15 00:00:00","2018-05-15 00:30:00"),
stringsAsFactors=FALSE)
df$Date <- as.Date(df$datetime)
df$Time <- format(as.POSIXct(df$datetime, tz="UTC"),"%H:%M:%S")
head(df)
#              datetime       Date     Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 02:00:00 2018-05-15 02:00:00
# 5 2018-05-15 00:00:00 2018-05-15 00:00:00
# 6 2018-05-15 00:30:00 2018-05-15 00:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 02:00:00 
#       2        2        1        1 
reshape2::colsplit accepts regular expressions, so you could split on the regex \s+, which matches one or more whitespace characters (see the sketch below).
You can find out more about regular expressions in R with ?base::regex. The syntax is largely consistent across languages, so you can use pretty much any regex tutorial. Take a look at https://regex101.com/ ; the site evaluates your regular expressions in real time and shows you exactly what each part matches. It is extremely helpful!
Keep in mind that in R string literals the backslash itself must be escaped, so \s (to match one whitespace character) is written as "\\s".
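A minimal sketch of that colsplit() call, using a couple of hypothetical strings (the second one has an extra space between date and time):
library(reshape2)

x <- c("2018-05-15 00:00:00", "2018-05-15  00:15:00")
colsplit(x, '\\s+', names = c('Date', 'Time'))
#         Date     Time
# 1 2018-05-15 00:00:00
# 2 2018-05-15 00:15:00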
I have a large dataset (21683 records) and I've managed to combine date and time into a datetime column correctly using as.POSIXct. Nevertheless, this did not work for 6 records (17463:17468). This is the dataset I'm using:
> head(solar.angle)
Date Time sol.elev.angle ID Datetime
1 2016-11-24 15:00:00 41.32397 1 2016-11-24 15:00:00
2 2016-11-24 15:10:00 39.11225 2 2016-11-24 15:10:00
3 2016-11-24 15:20:00 36.88180 3 2016-11-24 15:20:00
4 2016-11-24 15:30:00 34.63507 4 2016-11-24 15:30:00
5 2016-11-24 15:40:00 32.37418 5 2016-11-24 15:40:00
6 2016-11-24 15:50:00 30.10096 6 2016-11-24 15:50:00
> solar.angle[17460:17470,]
Date Time sol.elev.angle ID Datetime
17488 2017-03-26 01:30:00 -72.01821 17460 2017-03-26 01:30:00
17489 2017-03-26 01:40:00 -69.53832 17461 2017-03-26 01:40:00
17490 2017-03-26 01:50:00 -67.05409 17462 2017-03-26 01:50:00
17491 2017-03-26 02:00:00 -64.56682 17463 <NA>
17492 2017-03-26 02:10:00 -62.07730 17464 <NA>
17493 2017-03-26 02:20:00 -59.58609 17465 <NA>
17494 2017-03-26 02:30:00 -57.09359 17466 <NA>
17495 2017-03-26 02:40:00 -54.60006 17467 <NA>
17496 2017-03-26 02:50:00 -52.10572 17468 <NA>
17497 2017-03-26 03:00:00 -49.61071 17469 2017-03-26 03:00:00
17498 2017-03-26 03:10:00 -47.11515 17470 2017-03-26 03:10:00
This is the code I'm using:
solar.angle$Datetime <- as.POSIXct(paste(solar.angle$Date,solar.angle$Time), format="%Y-%m-%d %H:%M:%S")
I've already tried to fill them in manually but this did not make any difference:
> solar.angle$Datetime[17463] <- as.POSIXct('2017-03-26 02:00:00', format = "%Y-%m-%d %H:%M:%S")
> solar.angle$Datetime[17463]
[1] NA
Any help will be appreciated!
The problem here is that these timestamps fall in the hour that is skipped when the clocks move forward to summer time (on 2017-03-26 they jump from 02:00 to 03:00), so these times do not exist in your local time zone.
If you specify a time zone without daylight saving, it will work:
as.POSIXct('2017-03-26 02:00:00', format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
Which returns:
"2017-03-26 02:00:00 GMT"
You can check ?timezones for more information.
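Applied to the whole column from the question, that looks like this (a sketch; GMT/UTC has no daylight-saving gap, so none of the timestamps can become NA):
solar.angle$Datetime <- as.POSIXct(paste(solar.angle$Date, solar.angle$Time),
                                   format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
sum(is.na(solar.angle$Datetime))   # should now be 0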
Hi all,
I have a data file with two columns: one is a time series, the other is values. Normally the time interval between two rows is exactly 5 minutes, but sometimes it is larger than 5 minutes.
A sample is as below:
library(data.table)
dd <- data.table(date = c("2015-07-01 00:00:00", "2015-07-01 00:05:00", "2015-07-01 00:20:00",
                          "2015-07-01 00:25:00", "2015-07-01 00:30:00"),
                 value = c(9, 1, 10, 12, 0))
What I want to do is check the time interval between consecutive rows; when the interval is larger than 5 minutes, insert new rows below with value 0, so the result would be:
date value
2015-07-01 00:00:00 9
2015-07-01 00:05:00 1
2015-07-01 00:10:00 0
2015-07-01 00:15:00 0
2015-07-01 00:20:00 10
2015-07-01 00:25:00 12
2015-07-01 00:30:00 0
Any suggestions and ideas are welcome :)
We can do a join after converting 'date' to a date-time class (POSIXct):
dd[, date := as.POSIXct(date)][]
dd[dd[, .(date=seq(min(date), max(date), by = "5 min"))], on = 'date'
][is.na(value), value := 0][]
# date value
#1: 2015-07-01 00:00:00 9
#2: 2015-07-01 00:05:00 1
#3: 2015-07-01 00:10:00 0
#4: 2015-07-01 00:15:00 0
#5: 2015-07-01 00:20:00 10
#6: 2015-07-01 00:25:00 12
#7: 2015-07-01 00:30:00 0
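For comparison, here is the same idea in base R (a sketch, if you are not using data.table): build the full 5-minute grid, left-join the observations, then replace the NAs with 0.
dd$date <- as.POSIXct(dd$date)
grid <- data.frame(date = seq(min(dd$date), max(dd$date), by = "5 min"))

res <- merge(grid, dd, by = "date", all.x = TRUE)   # missing slots get NA
res$value[is.na(res$value)] <- 0
res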
The Dataset
head(data)
Date OPEN
2015-11-30 10:00:00 951.15
2015-11-30 10:30:00 949.90
2015-11-30 11:00:00 943.45
2015-11-30 11:30:00 944.30
2015-11-30 12:00:00 942.00
2015-11-30 12:30:00 940.60
2015-01-01 10:00:00 951.15
2015-01-01 10:30:00 949.90
2015-01-02 10:30:00 943.45
2015-01-02 11:30:00 944.30
2015-01-03 10:00:00 943.45
2015-01-03 10:30:00 943.45
2015-01-03 11:30:00 944.30
2015-01-06 10:00:00 942.00
2015-01-06 10:30:00 940.60
2015-01-06 11:00:00 940.60
2015-01-06 11:30:00 942.00
str(data)
'data.frame': 32023 obs. of 2 variables:
$ Date : POSIXct, format: "2015-11-30 10:00:00" "2015-11-30 10:30:00" "2015-11-30 11:00:00" ...
$ OPEN : num 951 950 943 944 942 ...
Hi,
The data frame is shown above. I want to extract the OPEN prices with timestamps 10:00 and 10:30 for all the dates available, i.e. I only need to keep the timestamps 10:00 to 10:30 in the filter condition, irrespective of the date. Please suggest an approach in R.
Thanks.
We can format the 'Date' to extract the HH:MM part, use %in% to get a logical vector and subset based on that.
subset(data, format(Date, "%H:%M") %in% c("10:00", "10:30"), select="OPEN")
# OPEN
#1 951.15
#2 949.90
#7 951.15
#8 949.90
#9 943.45
#11 943.45
#12 943.45
#14 942.00
#15 940.60
If it is between those times, i.e. anything from 10:00:00 to 10:30:00 inclusive (note that between() comes from data.table or dplyr, which also needs to be loaded):
library(chron)
subset(data, between(times(format(Date, "%H:%M:%S")) ,
times("10:00:00"), times("10:30:00")))
You can use the lubridate package for a friendlier subset (this also keeps every timestamp from 10:00 up to 10:30, not just those two exact times):
library(lubridate)
res <- subset(data, minute(Date) <= 30 & hour(Date) == 10)
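If you prefer a tidyverse style, the same filter can be written with dplyr (a sketch; lubridate's hour() and minute() as above):
library(dplyr)
library(lubridate)

res <- data %>% filter(hour(Date) == 10, minute(Date) <= 30)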