I am using lubridate to parse a timestamp to POSIXlt.
user time
____ ____
1 2017-09-01 00:01:01
1 2017-09-01 00:01:20
1 2017-09-01 00:03:01
library(lubridate)
data[, time:=parse_date_time2(time,orders="YmdHMS",tz="NA")]
But this resulted in
Warning message:
In as.POSIXct.POSIXlt(.mklt(.Call("parse_dt", x, orders, FALSE, :
unknown timezone 'NA'
Any help is appreciated.
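Simply parse without the tz argument. "NA" is not a valid time zone name, and when tz is omitted parse_date_time2 defaults to UTC: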
> ts <- '2017-09-01 00:01:01'
> lubridate::parse_date_time2(ts,orders="YmdHMS")
[1] "2017-09-01 00:01:01 UTC"
Applied to the code in the question:
data[, time:=parse_date_time2(time,orders="YmdHMS")]
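If a specific time zone is needed, it has to be a name your system knows. A minimal base R sketch, assuming a made-up sample vector ts, that checks a name against OlsonNames() and then parses with an explicit, valid time zone:

```r
# Hypothetical sample timestamps in the same layout as the question
ts <- c("2017-09-01 00:01:01", "2017-09-01 00:01:20", "2017-09-01 00:03:01")

# "NA" is not a time zone name; valid names are listed by OlsonNames()
tz_ok <- "NA" %in% OlsonNames()
tz_ok

# Parse with an explicit, valid time zone
parsed <- as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
parsed

# Time differences between consecutive rows
diff(parsed)
```

The same time zone name could be passed as the tz argument of parse_date_time2.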
I am trying to use the RchivalTag package in R and I am having issues with the date and time. Whenever I try to use read_histos() it gives me the following error:
hist_dat_1<- read_histos(Georgia)
Warning: 598 parsing failures.
row col expected actual
1 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 00:00:00
2 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 06:00:00
3 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 12:00:00
4 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 18:00:00
5 -- date like %H:%M:%S %d-%b-%Y 2009-02-24 00:00:00
... ... ........................... ...................
See problems(...) for more details.
Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2009-02-23 00:00:00
Current date/time class:
class(Georgia$Date)
#[1] "POSIXct" "POSIXt"
I have changed the date and time format to:
format(Georgia$Date, format = "%H:%M:%S %d-%b-%Y")
But I am still getting the error message. If anyone is familiar with this package, help would be greatly appreciated
This is a difficult one without reproducible data, but we can actually replicate your error using one of the package's built-in data sets:
library(RchivalTag)
hist_file <- system.file("example_files/67851-12h-Histos.csv",package="RchivalTag")
Georgia <- read.csv(hist_file)
This reads the histogram data in as a data frame, which seems to be the stage you are at. To mimic your data, we will convert the Date column to the standard date-time representation (YYYY-MM-DD HH:MM:SS) and store it as character.
Georgia$Date <- strptime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
Georgia$Date <- as.character(Georgia$Date)
Now when we try to read the data we get:
read_histos(Georgia)
#> Warning: 34 parsing failures.
#> row col expected actual
#> 1 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 08:10:00
#> 2 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 12:00:00
#> 3 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 00:00:00
#> 4 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 12:00:00
#> 5 -- date like %H:%M:%S %d-%b-%Y 2012-02-03 00:00:00
#> ... ... ........................... ...................
#> See problems(...) for more details.
#>
#> Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
#> date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2012-02-01 08:10:00
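To fix this, we can rewrite the Date column using strftime, so that it holds character strings in the format the package expects. Note that the result has to be assigned back to the column; calling format() on its own, as in your question, leaves the column unchanged: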
Georgia$Date <- strftime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
And now we can read the data without the parse errors.
read_histos(Georgia)
#> DeployID Ptt Type p0_25 p0_50 p0_75 p0_90
#> 1 67851 67851 TAD 2.9 100 100 100
#> 2 67851 67851 TAT 3.0 100 100 100
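The point about assignment can be seen in isolation. The following is a small base R sketch, not part of RchivalTag, using a made-up single timestamp: format() returns a new character vector and does not modify its input, and parsing that string back with the same format recovers the original time.

```r
# A POSIXct value like the ones in the question (sample value for illustration)
x <- as.POSIXct("2009-02-23 06:00:00", tz = "UTC")

# format() returns a new character vector; x itself is unchanged
s <- format(x, "%H:%M:%S %d-%b-%Y")
class(x)  # still a POSIXct object
class(s)  # a character vector

# Parsing the string back with the same format recovers the original time
back <- as.POSIXct(s, format = "%H:%M:%S %d-%b-%Y", tz = "UTC")
identical(as.numeric(back), as.numeric(x))
```

Keep in mind that %b produces locale-dependent month abbreviations, so the text of s may differ on a non-English system.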
I have a dataframe (vlinder) like the following, whereby the date and the timestamp (in UTC) are in separate columns:
date time.utc variable
1/04/2020 0:00:00 12
1/04/2020 0:05:00 54
In a first step, I combined the date and time variables into one column called dateandtime using the following code:
vlinder$dateandtime <- paste(vlinder$date, vlinder$time.utc)
which resulted in an extra column in dataframe vlinder:
date time.utc variable dateandtime
1/04/2020 0:00:00 12 1/04/2020 0:00:00
1/04/2020 0:05:00 54 1/04/2020 0:05:00
I want to convert the time of UTC into local time (which is CEST, so a time difference of 2 hours).
I tried using the following code, but I get something totally different.
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC")
vlinder$dateandtime.cest <- format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
which results in:
date time.utc variable dateandtime dateandtime.cest
1/04/2020 0:00:00 12 0001-04-20 0001-04-20 00:17:30 LMT
1/04/2020 0:05:00 54 0001-04-20 0001-04-20 00:17:30 LMT
How can I solve this?
Many thanks!
Here's a lubridate and tidyverse answer. Some data tidying, data type changes, and then bam. Check lubridate::OlsonNames() for valid time zones (tz). (I'm not positive I chose the correct tz.)
library(tidyverse)
library(lubridate)
df <- read.table(header = TRUE,
text = "date time.utc variable
1/04/2020 00:00:00 12
1/04/2020 00:05:00 54")
df <- df %>%
  mutate(date = dmy(date),  # dates are day/month/year (1 April 2020)
         datetime_utc = as_datetime(paste(date, time.utc)),
         datetime_cest = as_datetime(datetime_utc, tz = 'Europe/Brussels'))
date time.utc variable datetime_utc datetime_cest
1 2020-04-01 00:00:00 12 2020-04-01 00:00:00 2020-04-01 02:00:00
2 2020-04-01 00:05:00 54 2020-04-01 00:05:00 2020-04-01 02:05:00
By default, as.POSIXct expects a date ordered Year-Month-Day. The string 1/04/2020 is therefore read as year 1, month 04, day 20, i.e. the 20th of April of year 1.
You just need to pass the format of your timestamps to as.POSIXct:
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC", format = "%d/%m/%Y %H:%M:%S")
format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
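Put together on the sample rows from the question, this could look like the sketch below. It uses base R only; the column names local_chr and local are made up for illustration. It shows two options: a character column in local time, or a POSIXct column whose display time zone is changed while the underlying instant stays the same.

```r
# Sample data from the question: day/month/year dates, times in UTC
vlinder <- data.frame(date = c("1/04/2020", "1/04/2020"),
                      time.utc = c("0:00:00", "0:05:00"),
                      variable = c(12, 54))

# Parse as UTC with an explicit format
utc <- as.POSIXct(paste(vlinder$date, vlinder$time.utc),
                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Option 1: character representation in local time
vlinder$local_chr <- format(utc, tz = "Europe/Brussels", usetz = TRUE)

# Option 2: keep POSIXct and only change the display time zone
local <- utc
attr(local, "tzone") <- "Europe/Brussels"
vlinder$local <- local

vlinder
```

Option 2 is usually more convenient if further date arithmetic is planned, since the column stays a date-time rather than text.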
I am new to R and I am encountering a problem with historical hourly electric load data that I have downloaded. My goal is to make a load forecast based on an ARIMA model and/or Artificial Neural Networks.
The problem is that the data is in the following Date-time (hourly) format:
#> DateTime Day_ahead_Load Actual_Load
#> [1,] "01.01.2015 00:00 - 01.01.2015 01:00" "6552" "6100"
#> [2,] "01.01.2015 01:00 - 01.01.2015 02:00" "6140" "5713"
#> [3,] "01.01.2015 02:00 - 01.01.2015 03:00" "5950" "5553"
I have tried to make a POSIXct object but it didn't work:
as.Date.POSIXct(DateTime, format = "%d-%m-%Y %H:%M:%S", tz="EET", usetz=TRUE)
The message I get is that it is not in an unambiguous format. I would really appreciate your feedback on this.
Thank you in advance.
Best Regards,
Iro
You have 2 major problems. First, your DateTime column contains two dates, so you need to split that column into two. Second, your format argument has - characters but your date has . characters.
We can use separate from tidyr and mutate with across to change the columns to POSIXct.
library(dplyr)
library(tidyr)
data %>%
  separate(DateTime, c("StartDateTime", "EndDateTime"), " - ") %>%
  # usetz is an argument of format(), not of as.POSIXct(), so it is not needed here
  mutate(across(c("StartDateTime", "EndDateTime"),
                ~ as.POSIXct(., format = "%d.%m.%Y %H:%M", tz = "EET")))
StartDateTime EndDateTime Day_ahead_Load Actual_Load
1 2015-01-01 00:00:00 2015-01-01 01:00:00 6552 6100
2 2015-01-01 01:00:00 2015-01-01 02:00:00 6140 5713
3 2015-01-01 02:00:00 2015-01-01 03:00:00 5950 5553
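For comparison, the same two steps (split the interval string, then parse each half with a format that uses dots) can be sketched in base R. The vector below just repeats the three sample rows from the question; the names parts, start, end and hours are made up for illustration.

```r
# Sample interval strings from the question
DateTime <- c("01.01.2015 00:00 - 01.01.2015 01:00",
              "01.01.2015 01:00 - 01.01.2015 02:00",
              "01.01.2015 02:00 - 01.01.2015 03:00")

# Split each string at " - " into a two-column character matrix
parts <- do.call(rbind, strsplit(DateTime, " - ", fixed = TRUE))

# Parse both halves; the format uses "." because the data does
start <- as.POSIXct(parts[, 1], format = "%d.%m.%Y %H:%M", tz = "EET")
end   <- as.POSIXct(parts[, 2], format = "%d.%m.%Y %H:%M", tz = "EET")

# Sanity check: every interval should be one hour long
hours <- as.numeric(difftime(end, start, units = "hours"))
hours
```

Checking the interval length like this is a cheap way to spot rows that were parsed incorrectly.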
As I failed to solve my problem with PHP/MySQL or Excel due to the data size, I'm trying to do my very first steps with R now and struggle a bit. The problem is this: I have a second-by-second CSV file with half a year of data that looks like this:
metering,timestamp
123,2016-01-01 00:00:00
345,2016-01-01 00:00:01
243,2016-01-01 00:00:02
101,2016-01-01 00:00:04
134,2016-01-01 00:00:06
As you see, there are some seconds missing every once in a while (don't ask me why the values are written before the timestamp, but that's how I received the data…). Now I am trying to calculate the number of values (= seconds) that are missing.
So my idea was to
1. create a vector that is correct (includes all sec-by-sec timestamps),
2. match the given CSV file with that new vector, and
3. sum up all the timestamps with no value.
I managed to make step 1 happen with the following code:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
write.csv(RegularTimeSeries, file = "RegularTimeSeries.csv")
To have an idea what I did I also exported the vector to a CSV that looks like this:
"1",2016-01-01 00:00:00
"2",2016-01-01 00:00:01
"3",2016-01-01 00:00:02
"4",2016-01-01 00:00:03
"5",2016-01-01 00:00:04
"6",2016-01-01 00:00:05
"7",2016-01-01 00:00:06
Unfortunately I have no idea how to go on with step 2 and 3. I found some very similar examples (http://www.r-bloggers.com/fix-missing-dates-with-r/, R: Insert rows for missing dates/times), but as a total R noob I struggled to translate these examples to my given sec-by-sec data.
Some hints for the greenhorn would be very very helpful – thank you very much in advance :)
In the tidyverse,
library(dplyr)
library(tidyr)
# parse datetimes
df %>% mutate(timestamp = as.POSIXct(timestamp)) %>%
# complete sequence to full sequence from min to max by second
complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = 'sec'))
## # A tibble: 7 x 2
## timestamp metering
## <time> <int>
## 1 2016-01-01 00:00:00 123
## 2 2016-01-01 00:00:01 345
## 3 2016-01-01 00:00:02 243
## 4 2016-01-01 00:00:03 NA
## 5 2016-01-01 00:00:04 101
## 6 2016-01-01 00:00:05 NA
## 7 2016-01-01 00:00:06 134
If you want the number of NAs (i.e. the number of seconds with no data), add on
%>% tally(is.na(metering))
## # A tibble: 1 x 1
## n
## <int>
## 1 2
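The same three steps from the question can also be sketched in base R with merge(). The data frame below reproduces the five sample rows; the names full, merged and n_missing are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  metering  = c(123, 345, 243, 101, 134),
  timestamp = as.POSIXct(c("2016-01-01 00:00:00", "2016-01-01 00:00:01",
                           "2016-01-01 00:00:02", "2016-01-01 00:00:04",
                           "2016-01-01 00:00:06"), tz = "UTC")
)

# Step 1: complete second-by-second sequence over the observed range
full <- data.frame(timestamp = seq(min(df$timestamp), max(df$timestamp),
                                   by = "1 sec"))

# Step 2: left join, keeping every second; missing seconds get NA
merged <- merge(full, df, by = "timestamp", all.x = TRUE)

# Step 3: count the seconds without a value
n_missing <- sum(is.na(merged$metering))
n_missing
```

For half a year of second-by-second data the completed table has roughly 15 to 16 million rows, which base R can handle but which may take a moment to merge.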
You can check which values of your RegularTimeSeries are in your broken time series using which and %in%. First create BrokenTimeSeries from your example:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
BrokenTimeSeries <- RegularTimeSeries[-c(3,6,9)] # remove some seconds
This will give you the indices of values within RegularTimeSeries that are not in BrokenTimeSeries:
> which(!(RegularTimeSeries %in% BrokenTimeSeries))
[1] 3 6 9
This will return the actual values:
> RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))]
[1] "2016-01-01 00:00:02 UTC" "2016-01-01 00:00:05 UTC" "2016-01-01 00:00:08 UTC"
Maybe I'm misunderstanding your problem, but you can count the number of missing seconds simply by subtracting the length of your broken time series from that of RegularTimeSeries, or by taking the length of either of the two resulting vectors above.
> length(RegularTimeSeries) - length(BrokenTimeSeries)
[1] 3
> length(which(!(RegularTimeSeries %in% BrokenTimeSeries)))
[1] 3
> length(RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))])
[1] 3
If you want to merge the files together to see the missing values you can do something like this:
# data frame with the regular time series
df <- data.frame(RegularTimeSeries)
# match() gives each regular timestamp's position in BrokenTimeSeries (NA if absent),
# so every broken timestamp lands in the row of the same second
df$BrokenTimeSeries <- BrokenTimeSeries[match(RegularTimeSeries, BrokenTimeSeries)]
resulting in:
> df[1:12,]
RegularTimeSeries BrokenTimeSeries
1 2016-01-01 00:00:00 2016-01-01 00:00:00
2 2016-01-01 00:00:01 2016-01-01 00:00:01
3 2016-01-01 00:00:02 <NA>
4 2016-01-01 00:00:03 2016-01-01 00:00:03
5 2016-01-01 00:00:04 2016-01-01 00:00:04
6 2016-01-01 00:00:05 <NA>
7 2016-01-01 00:00:06 2016-01-01 00:00:06
8 2016-01-01 00:00:07 2016-01-01 00:00:07
9 2016-01-01 00:00:08 <NA>
10 2016-01-01 00:00:09 2016-01-01 00:00:09
11 2016-01-01 00:00:10 2016-01-01 00:00:10
12 2016-01-01 00:00:11 2016-01-01 00:00:11
If all you want is the number of missing seconds, it can be done much more simply. First find the number of seconds in your time range, and then subtract the number of rows in your dataset. This could be done in R along these lines:
n.seconds <- as.numeric(difftime("2016-06-01 00:00:00", "2016-01-01 00:00:00", tz = "UTC", units = "secs"))
n.rows <- nrow(my.data.frame)
n.missing.values <- n.seconds - n.rows
This treats the end of the range as exclusive; add 1 to n.seconds if the last timestamp itself appears in the data. Adjust the time range and the name of the data frame to match your data.
Hope it helps
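A small worked example of this counting approach, using the five sample timestamps from the question and taking the range from the data itself. It assumes there are no duplicate timestamps; the names ts, expected and n_missing are made up for illustration.

```r
# The five sample timestamps from the question
ts <- as.POSIXct(c("2016-01-01 00:00:00", "2016-01-01 00:00:01",
                   "2016-01-01 00:00:02", "2016-01-01 00:00:04",
                   "2016-01-01 00:00:06"), tz = "UTC")

# Seconds spanned by the data; + 1 because both endpoints are present
expected <- as.numeric(difftime(max(ts), min(ts), units = "secs")) + 1

# Missing seconds = expected rows - observed rows (assumes no duplicates)
n_missing <- expected - length(ts)
n_missing
```

Here the data span seconds 0 to 6, so 7 rows are expected and 5 are present.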
d <- (c("2016-01-01 00:00:01",
"2016-01-01 00:00:02",
"2016-01-01 00:00:03",
"2016-01-01 00:00:04",
"2016-01-01 00:00:05",
"2016-01-01 00:00:06",
"2016-01-01 00:00:10",
"2016-01-01 00:00:12",
"2016-01-01 00:00:14",
"2016-01-01 00:00:16",
"2016-01-01 00:00:18",
"2016-01-01 00:00:20",
"2016-01-01 00:00:22"))
d <- as.POSIXct(d)
# collect the timestamps that follow a gap; NA where there is no gap
gaps <- rep(as.POSIXct(NA), length(d))
for (i in 2:length(d)) {
  # more than one second has passed since the previous timestamp
  if (difftime(d[i], d[i-1], units = "secs") > 1) {
    gaps[i] <- d[i]
  }
}
gaps
[1] NA NA NA NA NA
[6] NA "2016-01-01 00:00:10 EST" "2016-01-01 00:00:12 EST" "2016-01-01 00:00:14 EST" "2016-01-01 00:00:16 EST"
[11] "2016-01-01 00:00:18 EST" "2016-01-01 00:00:20 EST" "2016-01-01 00:00:22 EST"
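The gap check can also be written without a loop. The sketch below uses diff() on a shortened, made-up sample vector; the names step, after_gap and n_missing are hypothetical.

```r
# Shortened sample: seconds 05, 06, 10 and 12 are present
d <- as.POSIXct(c("2016-01-01 00:00:05", "2016-01-01 00:00:06",
                  "2016-01-01 00:00:10", "2016-01-01 00:00:12"), tz = "UTC")

# Seconds between consecutive timestamps
step <- diff(as.numeric(d))

# Timestamps that directly follow a gap
after_gap <- d[-1][step > 1]
after_gap

# Total number of skipped seconds (a step of 4 means 3 seconds are missing)
n_missing <- sum(step - 1)
n_missing
```

On half a year of second-by-second data the vectorised version should be noticeably faster than an explicit for loop.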
I am new to R and I have a data frame with a date-time variable. Temperature is recorded for every hour of each day, and the date-time is in the format YYYY-MM-DD 00:00:00.
Now I would like to convert the time into a factor ranging from 0 to 23 for each day.
So for each day my new column should have factors 0 to 23. Could anyone help me with this? 2015-01-01 00:00:00 should give me 0, 2015-01-01 01:00:00 should give me 1, and so on. 2015-01-02 00:00:00 should be 0 again.
You can convert your timestamp into a POSIXlt object. Once you have that, you can obtain the hour directly like this:
> timestamp <- as.POSIXlt("2015-01-01 00:00:00")
> timestamp
[1] "2015-01-01 MYT"
> timestamp$hour
[1] 0
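Applied to a whole column, this could look like the following sketch. The three sample timestamps are taken from the question; fixing the levels to 0:23 is an assumption about what is wanted, so that every hour is a level even if it does not occur in the data.

```r
# Sample timestamps from the question
times <- as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00",
                      "2015-01-02 00:00:00"), tz = "UTC")

# Hour of day for every element (0 to 23)
hour <- as.POSIXlt(times)$hour
hour

# Factor with all 24 hours as levels
hour_factor <- factor(hour, levels = 0:23)
hour_factor
```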
Using some sample data, one way would be the following.
mydf <- data.frame(id = c(1,1,1,2,2,1,1),
event = c("start", "valid", "end", "start", "bad", "start", "bad"),
time = as.POSIXct(c("2015-05-16 20:46:53", "2015-05-16 20:46:56", "2015-05-16 21:46:59",
"2015-05-16 22:46:53", "2015-05-16 22:47:00", "2015-05-16 22:49:05",
"2015-05-16 23:49:09"), format = "%Y-%m-%d %H:%M:%S"),
stringsAsFactors = FALSE)
library(dplyr)
mutate(mydf, group = factor(format(time, "%H")))
# id event time group
#1 1 start 2015-05-16 20:46:53 20
#2 1 valid 2015-05-16 20:46:56 20
#3 1 end 2015-05-16 21:46:59 21
#4 2 start 2015-05-16 22:46:53 22
#5 2 bad 2015-05-16 22:47:00 22
#6 1 start 2015-05-16 22:49:05 22
#7 1 bad 2015-05-16 23:49:09 23
Tim's answer using POSIXlt is probably the best option, but here's a regex way just in case:
> times <- c("2015-01-01 00:00:00", "2015-01-01 01:00:00", "2015-01-02 00:00:00")
> regmatches(times, regexpr("(?<=-\\d{2} )\\d{2}", times, perl=TRUE))
[1] "00" "01" "00"
With the extracted hours you can make them factors or integers as necessary.
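One detail when converting: the regex returns strings with leading zeros, so a factor built directly from them has levels like "00" and "01" rather than 0 and 1. A short sketch, with made-up variable names, of converting to integer first:

```r
# Hours as returned by the regex above
hours_chr <- c("00", "01", "00")

# A factor built directly from the strings keeps the leading zeros
levels(factor(hours_chr))

# Converting to integer first drops them
hours_int <- as.integer(hours_chr)
hours_int

# Factor with levels 0 to 23
hours_fct <- factor(hours_int, levels = 0:23)
hours_fct
```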
@Sairam, in addition to @jazzurro's use of 'dplyr' (which, like jazzurro, many R users routinely use): in the future, if you need or want a simple and powerful way to manipulate dates, you're encouraged to gain familiarity with another package: 'lubridate'.
lubridate makes working with dates a snap. Hope this helps and best regards on your project.