R: Converting POSIXlt to xts

I have a time series in the format:
> str(Y$Date)
POSIXlt[1:174110], format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" "2001-01-02 02:00:00" "2001-01-02 02:01:00" "2001-01-02 04:00:00" "2001-01-02 04:00:00" ...
> summary(Y$Date)
Min. 1st Qu. Median Mean 3rd Qu. Max.
"2001-01-01 05:30:00" "2004-03-15 10:40:30" "2007-01-03 04:00:00" "2006-11-11 15:53:11" "2009-08-13 12:00:00" "2011-12-30 12:30:00"
> length(Y$Date)
[1] 174110
which I need to convert to xts format. To do so, I did the following:
date <- Y$Date
date <- as.xts(date)
> xtsible(date) # tests whether the data is convertible to xts
[1] TRUE
However:
> str(date)
An 'xts' object of zero-width
> length(date)
[1] 0
> head(date['2001'])
[,1]
2001-01-01 05:30:00 NA
2001-01-01 12:00:00 NA
2001-01-02 01:30:00 NA
2001-01-02 02:00:00 NA
2001-01-02 02:00:00 NA
2001-01-02 02:00:00 NA
And when I try to get the data back into the data frame:
> Y$date <- date
Error in `$<-.data.frame`(`*tmp*`, "date", value = numeric(0)) :
replacement has 0 rows, data has 174110
and
> as.data.frame(date)
Error in data.frame(`coredata(x)` = c(NA_character_, NA_character_, NA_character_, :
duplicate row.names: 2001-01-02 02:00:00, ... , 2001-01-08 06:00:00, 200
In addition: Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
> str(Y)
'data.frame': 174110 obs. of 17 variables:
$ Date : POSIXlt, format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" ...
$ C : chr "MA" "IN" "SI" "ID" ...
$ Event : chr "MALAY VEHICLE SALES" "Interbank Offer Rate - Percent" "Advance GDP Estimate (YoY)" "Foreign Reserves" ...
$ News : num NA NA NA NA NA NA NA NA NA NA ...
$ Growth : num 148 NA 0.3 387.2 0 ...
$ Surv.M : num NA NA NA NA NA NA NA NA NA NA ...
$ Act : num 30892 NA 10.5 29281.4 12500 ...
$ Prior : num 30744 8100 10.2 28894.2 12500 ...
$ Revised : num NA NA NA NA NA ...
$ Type : chr NA NA "%" "$B" ...
$ Freq. : chr "M" "NA" "Q" "M" ...
$ Ticker : chr "MAVSTTL Index" "IMIBOR Index" "SGAVYOY% Index" "IDGFA Index" ...
$ Period : chr "Nov" "12/31/13" "4Q" "Dec" ...
$ Category: chr "NA" "NA" "NA" "NA" ...
$ Time : chr "12:00:00 AM" "05:30:00 AM" "01:30:00 AM" "02:00:00 AM" ...
$ Country : chr "Malaysia" "India" "Singapore" "Indonesia" ...
$ date : POSIXlt, format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" ...
I don't know why I can't properly convert the data to xts format and then get it back into the data frame.
Your help is very much appreciated.

I answered a similar question of yours previously; I think my answer caused some confusion. The help page ?xts says that xts() creates an "extensible time-series" object. You first specify x, the data to be turned into a time series, and then order.by, the index, i.e. the timestamps themselves (Y$Date in your case). Calling as.xts() on the dates alone gives you an index with no data columns, which is why your result is zero-width.
Here is a simplified solution:
# x: the data to store; order.by: the time index
Y_new <- xts(x = Y[,-1], order.by = Y$Date)
This creates a new time-series object, Y_new, containing all the data in Y, with the added benefit that you can easily select the time intervals you want (e.g. Y_new['2001']).
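A minimal sketch of the full round trip, assuming the xts package is installed. The three-row data frame is made up for illustration (its values are borrowed from the str() output above), and its Date column is POSIXct rather than the question's POSIXlt. Going back to a data frame through index() and coredata() avoids turning the timestamps into row names, which is where the "duplicate row.names" error in the question appears to come from when timestamps repeat.

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# Hypothetical toy version of Y: a time column plus two numeric columns
Y <- data.frame(
  Date  = as.POSIXct(c("2001-01-01 12:00:00", "2001-01-01 05:30:00",
                       "2001-01-02 01:30:00"), tz = "UTC"),
  Act   = c(30892, NA, 10.5),
  Prior = c(30744, 8100, 10.2)
)

# x = the data columns, order.by = the time index; xts sorts rows by time
Y_new <- xts(Y[, c("Act", "Prior")], order.by = Y$Date)

# ISO-8601 style subsetting, as with date['2001'] in the question
Y_new["2001-01-01"]

# Back to a data frame without using the timestamps as row names
back <- data.frame(Date = index(Y_new), coredata(Y_new))
str(back)
```

Because xts keeps its data as a matrix, it is usually simplest to pass only the numeric columns as x; mixing in character columns would turn the whole matrix into character.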

Related

Convert character variables into numeric with R

FIRST QUESTION EVER ;)
Here's the point: I have this dataset, and I started without stringsAsFactors=FALSE in the read.csv() call. I can't work with the data because I get the warning message "NAs introduced by coercion". Thank you for the help :)
rm(list=ls())
path <- "....."
file <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors=FALSE)
str(file)
#'data.frame': 33 obs. of 11 variables:
#$ Var1: chr "01/09/2021" "02/09/2021" "09/09/2021" "10/09/2021" ...
#$ Var2: chr "mercoledì" "giovedì" "giovedì" "venerdì" ...
#$ Var3: chr "2,5" "2,5" "2,5" "3,0" ...
#$ Var4: chr "4,0" "0,0" "2,0" "3,0" ...
#$ Var5: chr "2,0" "5,0" "5,0" "5,0" ...
#$ Var5: chr "0,0" "0,0" "0,0" "0,0" ...
#$ Var6: chr "6,0" "5,0" "7,0" "8,0" ...
#$ Var7: chr "23,5" "25,0" "28,0" "32,0" ...
#$ Var8: chr "0,0" "1,0" "5,0" "5,5" ...
#$ Var9: chr "23,5" "26,0" "33,0" "37,5" ...
#$ Var10: chr "67,0" "0,0" "0,0" "0,0" ...
as.numeric(file$Var7)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
I managed to recreate your problem. Your file uses the comma both as the field separator and as the decimal separator (which is uncommon), so values such as "2,5" are read as text.
You can fix this by telling read.csv() that the decimal mark is a comma (dec = ","), as follows:
file <- read.csv(
path,
header = TRUE,
sep = ",",
dec = ",", # I've added this line
stringsAsFactors = FALSE
)
Change this, run str(file) again, and you should see that most columns are numeric.
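The effect of dec can be checked without the original file. The two-row CSV text below is made up to mimic the layout described above (comma-separated fields, with the decimal values quoted so their commas are not taken as separators). The last line shows an alternative, not from the answer above, for data that is already loaded as character: swap the decimal comma for a dot before calling as.numeric().

```r
# Made-up sample in the same layout: "," separates fields, and the
# quoted values use "," as the decimal mark
csv_text <- 'Var1,Var3,Var7
"01/09/2021","2,5","23,5"
"02/09/2021","2,5","25,0"'

# Without dec = ",", the decimal columns stay character
bad <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(bad)

# With dec = ",", read.csv() converts them to numeric
good <- read.csv(text = csv_text, dec = ",", stringsAsFactors = FALSE)
str(good)

# Alternative for an already-loaded character column
as.numeric(sub(",", ".", bad$Var7, fixed = TRUE))
```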

Error in converting character [duplicate]

This question already has answers here:
How to convert time to decimal
(3 answers)
Closed 5 years ago.
I have data like this.
> head(new3)
Date Hour Dayahead Actual Difference
1 2015-01-01 0:00 42955 42425 530
2 2015-01-01 0:15 42412 42021 391
3 2015-01-01 0:30 41901 42068 -167
4 2015-01-01 0:45 41355 41874 -519
5 2015-01-01 1:00 40710 41230 -520
6 2015-01-01 1:15 40204 40810 -606
Their structure is as follows:
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead : chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
I tried to convert Hour and Dayahead to numeric with as.numeric(), but got this:
> new3$Dayahead<-as.numeric(new3$Dayahead)
Warning message:
NAs introduced by coercion
> new3$Hour<-as.numeric(new3$Hour)
Warning message:
NAs introduced by coercion
When I checked with str() again, it showed this:
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Hour : num NA NA NA NA NA NA NA NA NA NA ...
$ Dayahead : num 42955 42412 41901 41355 40710 ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
My questions are:
1) Why do I get the 'NAs introduced by coercion' warning?
2) How can I solve the problem above?
3) Why do I get NA values for Hour, and how can I fix it?
Thank you.
As already mentioned in the comments, if a string contains a non-numeric character (here, the ":" in your Hour column), it cannot be converted to numeric, which is why you get NA.
I am not sure why you want to convert your times to numeric, but if you'd like to perform operations on them (e.g., calculate time differences), you should convert your dates to POSIX format. In your case, run:
new3$fulldate <- as.POSIXlt(paste(new3$Date, new3$Hour, sep = " "))
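A small sketch of that idea on made-up rows shaped like new3. It uses as.POSIXct() instead of as.POSIXlt() (POSIXct is the more compact class to keep in a data frame column) and fixes the time zone to UTC; the time zone is an assumption made here so that daylight-saving changes cannot distort the differences.

```r
# Made-up rows with the same Date/Hour layout as new3
new3 <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01")),
  Hour = c("0:00", "0:15", "1:00"),
  stringsAsFactors = FALSE
)

# Combine "2015-01-01" and "0:15" into one date-time (tz = "UTC" is an assumption)
new3$fulldate <- as.POSIXct(paste(new3$Date, new3$Hour),
                            format = "%Y-%m-%d %H:%M", tz = "UTC")

# Time between consecutive readings, expressed in minutes
gaps <- as.numeric(diff(new3$fulldate), units = "mins")
gaps
# [1] 15 45
```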
Try this:
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")
Replace the : with a . and then convert:
hour <- gsub(":", ".", hour)
hour <- as.numeric(hour)
hour
[1] 0.00 0.15 0.30 0.45 1.00 1.15
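Note that this keeps the clock digits rather than converting to a fraction of an hour: "0:15" becomes 0.15, not 0.25. If decimal hours are what is wanted (as in the linked duplicate), one base-R sketch is to split each string at the colon and add minutes / 60 to the hours:

```r
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")

# Split "H:MM" into hour and minute pieces
parts <- strsplit(hour, ":", fixed = TRUE)

# Decimal hours = hours + minutes / 60
dec_hours <- vapply(parts,
                    function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
                    numeric(1))
dec_hours
# [1] 0.00 0.25 0.50 0.75 1.00 1.25
```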

parse string like "now-1h" in R

Does anyone have experience parsing/converting strings like "now-1h", "today", "now-3d", "today+30m" in R?
How can I recognize such a string (for example, passed as a function argument) and convert it to a date-time?
I do not recommend using this code, which is brittle. This is just to demonstrate this is possible:
library(lubridate)
library(stringr)
library(dplyr)
data <- c("now-1h", "today", "now-3d", "today+30m", "-3d")
We're going to use lubridate functions now(), today(), days(), hours(), ... which resemble your input data:
str(now())
# POSIXct[1:1], format: "2017-03-24 12:26:18"
str(today())
# Date[1:1], format: "2017-03-24"
str(now() - days(3))
# POSIXct[1:1], format: "2017-03-21 12:27:11"
str(- days(3))
# Formal class 'Period' [package "lubridate"] with 6 slots
# ..@ .Data : num 0
# ..@ year : num 0
# ..@ month : num 0
# ..@ day : num -3
# ..@ hour : num 0
# ..@ minute: num 0
We're going to have to parse() them as strings to actually use them, like this:
eval(parse(text = "now() + days(3)"))
# [1] "2017-03-27 12:41:34 CEST"
Now let's parse the input strings with a regex, manipulate them a bit to match lubridate syntax, then eval()uate them:
res <-
  str_match(data, "(today|now)?([+-])?(\\d+)?([dhms])?")[, - 1] %>%
  apply(1, function(x) {
    time_ <- if (is.na(x[1])) NULL else paste0(x[1], "()")
    offset_ <- if (any(is.na(x[2:4]))) NULL else paste(x[2],
      recode(x[4], "d" = "days(", "h" = "hours(", "m" = "minutes(", "s" = "seconds("),
      x[3],
      ")")
    parse(text = paste(time_, offset_))
  }) %>%
  lapply(eval)
Notice that you get a variety of classes as output (POSIXct, Date, POSIXlt or lubridate::Period):
invisible(lapply(res, function(x) { print(x) ; str(x) }))
# [1] "2017-03-24 11:57:52 CET"
# POSIXct[1:1], format: "2017-03-24 11:57:52"
# [1] "2017-03-24"
# Date[1:1], format: "2017-03-24"
# [1] "2017-03-21 12:57:52 CET"
# POSIXct[1:1], format: "2017-03-21 12:57:52"
# [1] "2017-03-24 00:30:00 UTC"
# POSIXlt[1:1], format: "2017-03-24 00:30:00"
# [1] "-3d 0H 0M 0S"
# Formal class 'Period' [package "lubridate"] with 6 slots
# ..@ .Data : num 0
# ..@ year : num 0
# ..@ month : num 0
# ..@ day : num -3
# ..@ hour : num 0
# ..@ minute: num 0
(What I recommend instead is to pre-process the data with the language that produced it and has the right tools for the job, which appears to be Perl.)
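For comparison, here is a base-R sketch that avoids eval(parse()) by checking the pattern and doing the arithmetic in seconds. parse_relative() is a hypothetical helper, not part of lubridate or any package, and it assumes: the only anchors are now and today, the only units are d, h, m and s, a string without an anchor (such as "-3d") counts from now, and everything is computed in UTC. Unlike the answer above, it always returns POSIXct, with NA for strings it cannot parse.

```r
# Hypothetical helper: "now-1h", "today", "now-3d", "today+30m" -> POSIXct.
# Assumptions: anchors now/today, units d/h/m/s, no anchor means "now", UTC.
parse_relative <- function(x, now = Sys.time(), tz = "UTC") {
  secs_per_unit <- c(d = 86400, h = 3600, m = 60, s = 1)
  # Midnight of the current day in the chosen time zone
  midnight <- as.POSIXct(format(now, "%Y-%m-%d", tz = tz), tz = tz)
  one <- function(s) {
    anchor <- if (startsWith(s, "today")) midnight else now
    rest <- sub("^(now|today)", "", s)            # e.g. "-1h", "+30m" or ""
    if (rest == "") return(as.numeric(anchor))
    if (!grepl("^[+-][0-9]+[dhms]$", rest)) return(NA_real_)  # unparseable
    sign <- if (substr(rest, 1, 1) == "-") -1 else 1
    amount <- as.numeric(substr(rest, 2, nchar(rest) - 1))
    unit <- substr(rest, nchar(rest), nchar(rest))
    as.numeric(anchor) + sign * amount * secs_per_unit[[unit]]
  }
  secs <- vapply(x, one, numeric(1), USE.NAMES = FALSE)
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

# Fixed reference time so the result is reproducible
ref <- as.POSIXct("2017-03-24 12:26:18", tz = "UTC")
parse_relative(c("now-1h", "today", "now-3d", "today+30m", "-3d"), now = ref)
```

Returning a single class keeps downstream code simple; the trade-off is that "-3d" becomes a point in time rather than a Period.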

ifelse Statement Returning Number Instead Of Date

I have an ifelse statement in my code that should return dates, but it returns numeric values instead of dates.
library(lubridate)  # ymd_hms() and ymd()
osa <- read.delim("C:/RMathew/RScripts/osaevents/osaevents.txt", stringsAsFactors=TRUE)
osa$datetime <- ymd_hms(osa$datetime)
osa$date <- as.Date(osa$datetime)
sixoclock <- 6*60*60
osa$daystart <- ymd_hms(ymd(osa$date) + sixoclock)
osa$dateplus <- osa$date + 1
osa$dateminus <- osa$date - 1
osa$dayend <- ymd_hms(ymd(osa$dateplus) + sixoclock)
osa$dateloca <- osa$datetime >= osa$daystart
osa$datelocb <- osa$datetime < osa$dayend
osa$milldate <- ifelse(osa$dateloca==TRUE & osa$datelocb==TRUE,
osa$date,osa$dateminus)
Where this data originates, the period from 6 AM on any given day to 6 AM on the following day counts as one day. The code above checks whether a timestamp is after 6 AM on a particular day but before 6 AM on the following day, and if so assigns it the earlier day's date (for whatever day it might be).
So far so good, but osa$milldate ends up holding plain numbers instead of the dates from the ifelse branches:
'data.frame': 897 obs. of 16 variables:
$ datetime : POSIXct, format: "2015-08-13 15:11:53" "2015-08-13 14:53:26" "2015-08-13 14:34:58" "2015-08-13 14:16:18" ...
$ stream : Factor w/ 1 level "fc": 1 1 1 1 1 1 1 1 1 1 ...
$ fe : num 18.1 18 17.6 18.1 18.5 ...
$ ni : num 2.97 2.99 2.92 3.2 3.32 ...
$ cu : num 3.41 3.35 2.99 3.58 3.73 ...
$ pd : num 138 157 139 166 183 ...
$ mg : num 13.8 13.8 14.4 14.3 13.9 ...
$ so : num 9.67 9.81 9.65 10.58 11.37 ...
$ date : Date, format: "2015-08-13" "2015-08-13" "2015-08-13" "2015-08-13" ...
$ daystart : POSIXct, format: "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" ...
$ dateplus : Date, format: "2015-08-14" "2015-08-14" "2015-08-14" "2015-08-14" ...
$ dateminus: Date, format: "2015-08-12" "2015-08-12" "2015-08-12" "2015-08-12" ...
$ dayend : POSIXct, format: "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" ...
$ dateloca : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ datelocb : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ milldate : num 16660 16660 16660 16660 16660 ...
Thoughts? Also, there is likely to be a more elegant way to do this.
See the help file for ifelse (?ifelse):
Warning:
The mode of the result may depend on the value of ‘test’ (see the
examples), and the class attribute (see ‘oldClass’) of the result
is taken from ‘test’ and may be inappropriate for the values
selected from ‘yes’ and ‘no’.
Sometimes it is better to use a construction such as (tmp <- yes; tmp[!test] <- no[!test]; tmp), possibly extended to handle missing values in ‘test’.
This describes exactly what is going on in your example (the Date class attribute is lost) and gives a workaround: a multi-step approach.
osa$milldate <- osa$date
ind <- osa$dateloca & osa$datelocb
# subset the replacement with the same index so rows stay aligned
osa$milldate[!ind] <- osa$dateminus[!ind]
Another option is replace().
A. Webb set me on the right path: ifelse was stripping the Date class from the result. The index-based solution above seemed to jumble the date order for some reason. Following A. Webb's pointer to the help file, the following line fixed it immediately:
class(osa$milldate) <- class(osa$date)
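The behaviour described in the help text is easy to reproduce on a few made-up dates. The sketch below uses toy vectors standing in for osa$date, osa$dateminus and the combined test, shows ifelse() dropping the Date class, and then three ways to keep it: restoring the class afterwards, replace(), and index assignment. In the last two the replacement must be subset with the same index (dateminus[!ind]); assigning the whole of dateminus would take values from its start (with a warning) instead of the matching rows, which scrambles the dates.

```r
# Toy stand-ins for osa$date, osa$dateminus and the logical test
date      <- as.Date(c("2015-08-13", "2015-08-13", "2015-08-13"))
dateminus <- date - 1
ind       <- c(TRUE, FALSE, TRUE)   # hypothetical test result

# ifelse() takes its attributes from `ind`, so the Date class is dropped
bad <- ifelse(ind, date, dateminus)
class(bad)
# [1] "numeric"

# Fix 1: restore the class afterwards
fix1 <- bad
class(fix1) <- class(date)

# Fix 2: replace(), with the replacement subset by the same index
fix2 <- replace(date, !ind, dateminus[!ind])

# Fix 3: index assignment, again using dateminus[!ind]
fix3 <- date
fix3[!ind] <- dateminus[!ind]
```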

R: Error in as.Date.numeric(value) when using is.na() on data

I am having trouble using is.na() to replace NAs with zeros in a data frame. First, I change the date format. A simplified example of my code:
Date <- c("16/08/2010 08:00", "17/08/2010 08:00", "18/08/2010 08:00")
Data1 <- c(30,NA,40)
Data2 <- c(50,60,NA)
df <- data.frame(Date,Data1,Data2)
df$Date <- strptime(df$Date, format = "%d/%m/%Y %H:%M", tz = "GMT" )
df$Date <- as.Date(df$Date, origin = df$Date[1])
df[is.na(df)]<-0
This gives the correct result. However, when I apply the same code to my data, I get an error that I cannot figure out:
Error in as.Date.numeric(value) : 'origin' must be supplied
When I use str(data) the output is:
str(data)
'data.frame': 19461 obs. of 6 variables:
$ Date : Date, format: "2008-01-28" "2008-01-28" "2008-01-28" ...
$ NO_flux : num NA 5.33 NA -5.92 -10.87 ...
$ NO2_flux: num NA -12.7 NA -11.5 18.8 ...
$ N2O_flux: num NA NA NA NA NA NA NA NA NA NA ...
$ NH3_flux: num NA NA NA NA NA NA NA NA NA NA ...
$ O3_flux : num NA 313.42 NA 228.41 3.46 ...
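One plausible cause, assuming the real Date column contains NAs (the toy example above does not): df[is.na(df)] <- 0 then also tries to write 0 into the Date column, and in R versions where as.Date() has no default origin, converting the number 0 to a Date fails with exactly this as.Date.numeric error. A sketch that sidesteps the problem by replacing NAs only in the numeric columns, shown on made-up data with an NA date:

```r
# Made-up data in the question's shape, with an NA in the Date column
df <- data.frame(
  Date  = as.Date(c("2010-08-16", NA, "2010-08-18")),
  Data1 = c(30, NA, 40),
  Data2 = c(50, 60, NA)
)

# Only touch numeric columns, so the Date column is never assigned a 0
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols][is.na(df[num_cols])] <- 0
df
```

Leaving missing dates as NA is usually preferable to turning them into a fake date such as 1970-01-01.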
