I have some imported CSV data that I have turned into an xts object. If I try to convert it into a ts object (with the end goal of using functions like acf) I get:
"Error in round(frequency) : Non-numeric argument to mathematical function"
The code to convert it is:
library("xts") # xts() is called below; loading xts also attaches zoo
#Working With Milliseconds
op <- options(digits.secs=3)
#Rename Function
clean_perfmon = function(x, servername) {
names(x)[names(x)=="X.PDH.CSV.4.0...Coordinated.Universal.Time..0."] <- "Time"
x$Time = strptime(x$Time, "%m/%d/%Y %H:%M:%OS")
return(x)
}
web02 = read.csv("/home/kbrandt/Desktop/Shared/web02_2011_07_20_1.csv")
web02 = clean_perfmon(web02, "NY.WEB02")
web02ts = xts(web02[,-1], web02[,"Time"])
The time is mostly regular, but with some variation in the milliseconds:
time(web02ts)[1:3]
[1] "2011-07-20 11:21:50.459 EDT" "2011-07-20 11:21:51.457 EDT" "2011-07-20 11:21:52.456 EDT"
Some of the data has NA points:
> web02ts[1:3,1]
X..NY.WEB02.Process.Idle....Processor.Time
2011-07-20 11:21:50.459 NA
2011-07-20 11:21:51.457 1134.819
2011-07-20 11:21:52.456 1374.877
Update:
Changing to per-second resolution and taking a non-NA subset doesn't help:
> as.ts(web02ts[2:10,1])
Error in round(frequency) : Non-numeric argument to mathematical function
> web02ts[2:10,1]
X..NY.WEB02.Process.Idle....Processor.Time
2011-07-20 11:21:51 1134.819
2011-07-20 11:21:52 1374.877
2011-07-20 11:21:53 1060.842
2011-07-20 11:21:54 1067.092
2011-07-20 11:21:55 1195.205
2011-07-20 11:21:56 1223.328
2011-07-20 11:21:57 1121.774
2011-07-20 11:21:58 1187.393
2011-07-20 11:21:59 1378.001
>
Also, frequency(web02ts) returns NULL.
strptime returns an object of class POSIXlt, which is stored as a list. as.ts doesn't support it and treats it as a plain list, hence the complaint about a non-numeric argument. Convert to POSIXct instead:
as.POSIXct(strptime(x$Time, "%m/%d/%Y %H:%M:%OS"))
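As a rough sketch of where that conversion could go, assuming the same column name and timestamp format as in the question (the two-row demo data frame is made up for illustration), the cleaning function might convert to POSIXct straight away:

```r
# Sketch only: same renaming as the question's clean_perfmon, but the
# timestamp is stored as POSIXct (numeric) instead of POSIXlt (list).
clean_perfmon <- function(x) {
  names(x)[names(x) == "X.PDH.CSV.4.0...Coordinated.Universal.Time..0."] <- "Time"
  x$Time <- as.POSIXct(strptime(x$Time, "%m/%d/%Y %H:%M:%OS"))
  x
}

# Hypothetical stand-in for the imported CSV
demo <- data.frame(
  stamp = c("07/20/2011 11:21:50.459", "07/20/2011 11:21:51.457"),
  value = c(NA, 1134.819),
  stringsAsFactors = FALSE
)
names(demo)[1] <- "X.PDH.CSV.4.0...Coordinated.Universal.Time..0."

demo <- clean_perfmon(demo)
class(demo$Time)             # "POSIXct" "POSIXt"
is.list(unclass(demo$Time))  # FALSE: a plain numeric vector underneath
```

With a POSIXct index, the xts object no longer carries a list where a numeric time index is expected.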
An xts/zoo object must be regular to have a non-NULL frequency.
You don't show how you changed to per-second resolution. If you tried options(digits.secs=0), that won't work because it only affects printing. You would need to do something like this:
library(xts)
# example data
set.seed(21)
web02ts <- xts(rnorm(10), Sys.time()+1:10+runif(10)/3)
web02ts_reg <- align.time(web02ts,1)
frequency(web02ts_reg)
# [1] 1
as.ts(web02ts_reg)
# Time Series:
# Start = 1
# End = 10
# Frequency = 1
# [1] 0.793013171 0.522251264 1.746222241 -1.271336123 2.197389533
# [6] 0.433130777 -1.570199630 -0.934905667 0.063493345 -0.002393336
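Since the end goal was acf, here is a small base-R sketch of the same idea without xts. The data are simulated, and ceiling() on the numeric timestamps is my stand-in for align.time(web02ts, 1), not a claim about how xts implements it:

```r
set.seed(21)

# Simulated, slightly jittered one-second timestamps and values
t0     <- as.POSIXct("2011-07-20 11:21:50", tz = "UTC")
stamps <- t0 + 1:10 + runif(10) / 3
values <- rnorm(10)

# Round every timestamp up to the next whole second
aligned <- as.POSIXct(ceiling(as.numeric(stamps)),
                      origin = "1970-01-01", tz = "UTC")

# The aligned index is strictly regular: every gap is exactly one second
gaps <- diff(as.numeric(aligned))
stopifnot(all(gaps == 1))

# With a regular index, a plain ts with one observation per time unit is safe
y  <- ts(values, frequency = 1)
ac <- acf(y, plot = FALSE)
ac$acf[1]  # autocorrelation at lag 0, which is always 1
```

If the gaps were not all equal, it would be better to fill or interpolate the missing seconds first than to pretend the series is regular.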
I am trying to find out closest next quarter end date for a given date in R.
For example, if the input is "2022-02-23", the output should be "2022-03-31"
and if the input is "2022-03-07", the output should be "2022-06-30".
If the input is "2021-12-15", the output should be "2022-03-31".
Is there any function in R for this?
lubridate::quarter with argument type = "date_last" will get you most of the way there. From the comments, it looks like you want to jump to the following quarter if the date is in the last month of a quarter; we can achieve this by adding a month to each date before passing to quarter. We can add months safely using the %m+% operator.
library(lubridate)
dates_in <- ymd(c("2022-02-23", "2022-03-07", "2021-12-15"))
dates_out <- quarter(dates_in %m+% months(1), type = "date_last")
dates_out
# "2022-03-31" "2022-06-30" "2022-03-31"
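If you would rather not depend on a package, the same "shift forward one month, then take the end of that quarter" rule can be sketched in base R. The function name next_quarter_end is made up for this example:

```r
# Sketch: end of the quarter that contains (date + 1 month), base R only.
next_quarter_end <- function(d) {
  lt <- as.POSIXlt(as.Date(d))
  m  <- lt$mon + 1                  # 0-based month after adding one month
  y  <- lt$year + 1900 + m %/% 12   # roll the year over if we passed December
  m  <- m %% 12
  nxt <- (m %/% 3 + 1) * 3          # 0-based first month of the following quarter
  y   <- y + nxt %/% 12
  nxt <- nxt %% 12
  # First day of the following quarter, minus one day
  as.Date(sprintf("%d-%02d-01", y, nxt + 1)) - 1
}

dates_in  <- as.Date(c("2022-02-23", "2022-03-07", "2021-12-15"))
dates_out <- next_quarter_end(dates_in)
format(dates_out)
# "2022-03-31" "2022-06-30" "2022-03-31"
```

Only the month and year are used, so there is no risk of landing on an invalid day such as February 30 when shifting.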
Here is a function built on lubridate's quarter() that returns the last day of the quarter a date falls in:
last_day_in_quarter <- function(d){
require(lubridate)
last_month_in_quarter <- ymd(paste(year(d),quarter(d)*3,1))
return(last_month_in_quarter %m+% months(1) - 1)
}
last_day_in_quarter(ymd("2021-12-15")) #"2021-12-31"
last_day_in_quarter(ymd("2022-02-15")) #"2022-03-31"
last_day_in_quarter(ymd("2021-05-15")) #"2021-06-30"
last_day_in_quarter(ymd("2021-07-15")) #"2021-09-30"
I think these kinds of problems become immensely easier to understand if you work with a true year-quarter-day type. There is one of these in the clock package (I am the author).
library(clock)
x <- date_parse(c("2022-02-23", "2022-03-07", "2021-12-15"))
x
#> [1] "2022-02-23" "2022-03-07" "2021-12-15"
# What quarter are we in now?
yqd <- as_year_quarter_day(x)
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-54" "2022-Q1-66" "2021-Q4-76"
# Is the current month the same as the end-of-quarter month?
# (if so, we are going to shift forward by 1 quarter).
shift <- get_month(x) == get_month(as.Date(set_day(yqd, "last")))
shift
#> [1] FALSE TRUE TRUE
# Shift by 1 quarter where applicable
yqd[shift] <- yqd[shift] + duration_quarters(1)
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-54" "2022-Q2-66" "2022-Q1-76"
# Set day to end of quarter
yqd <- set_day(yqd, "last")
yqd
#> <year_quarter_day<January><day>[3]>
#> [1] "2022-Q1-90" "2022-Q2-91" "2022-Q1-90"
# Now convert back to Date
as.Date(yqd)
#> [1] "2022-03-31" "2022-06-30" "2022-03-31"
I'm an experienced Pandas user and am having trouble plugging values from my R frame into a function.
The following function works with hard coded values
>seq.Date(as.Date('2018-01-01'), as.Date('2018-01-31'), 'days')
[1] "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04" "2018-01-05" "2018-01-06" "2018-01-07"
[8] "2018-01-08" "2018-01-09" "2018-01-10" "2018-01-11" "2018-01-12" "2018-01-13" "2018-01-14"
[15] "2018-01-15" "2018-01-16" "2018-01-17" "2018-01-18" "2018-01-19" "2018-01-20" "2018-01-21"
[22] "2018-01-22" "2018-01-23" "2018-01-24" "2018-01-25" "2018-01-26" "2018-01-27" "2018-01-28"
[29] "2018-01-29" "2018-01-30" "2018-01-31"
Here is an extract from a dataframe I'm using
>df[1,1:2]
# A tibble: 1 x 2
start_time end_time
<date> <date>
1 2017-04-27 2017-05-11
When plugging these values into the 'seq.Date' function I get an error
> seq.Date(from=df[1,1], to=df[1,2], 'days')
Error in seq.Date(from = df[1, 1], to = df[1, 2], "days") :
'from' must be a "Date" object
I suspect this is because subsetting using df[x,y] returns a tibble rather than the specific value
data.class(df[1,1])
[1] "tbl_df"
What I'm hoping to derive is a sequence of dates. I need to be able to point this at various places around the dataframe.
Many thanks for any help!
Just use double brackets:
seq.Date(from=df[[1,1]], to=df[[1,2]], 'days')
Single-bracket extraction on a tibble returns a one-column tibble rather than a vector. Use dplyr::pull to extract a column as a vector; see the related question "Extract a dplyr tbl column as a vector".
Another option is to set the drop argument in the `[` function to TRUE.
If TRUE the result is coerced to the lowest possible dimension
seq.Date(from = df[1, 1, drop = TRUE], to = df[1, 2, drop = TRUE], 'days')
# [1] "2017-04-27" "2017-04-28" "2017-04-29" "2017-04-30" "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05" "2017-05-06"
#[11] "2017-05-07" "2017-05-08" "2017-05-09" "2017-05-10" "2017-05-11"
data
df <- tibble(start_time = as.Date('2017-04-27'),
end_time = as.Date('2017-05-11'))
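The difference between the two kinds of extraction can be reproduced without tibble. In this sketch a base data.frame with drop = FALSE plays the role of the tibble's default single-bracket behaviour (that substitution is mine, to keep the example dependency-free):

```r
df <- data.frame(start_time = as.Date("2017-04-27"),
                 end_time   = as.Date("2017-05-11"))

# Single brackets without dropping: still a data frame, which seq.Date rejects
one_cell <- df[1, 1, drop = FALSE]
class(one_cell)   # "data.frame"

# Double brackets always return the element itself, here a Date
from <- df[[1, "start_time"]]
to   <- df[[1, "end_time"]]
class(from)       # "Date"

days <- seq(from, to, by = "day")
length(days)      # 15
```

Double brackets behave the same way on tibbles and data frames, which makes them the safer habit when the code may receive either.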
I have a start date and an end date but when I am making a list to contain all dates in between, the format is changed:
> startDate <- as.Date("2012-01-01")
> startDate
[1] "2012-01-01"
> endDate <- as.Date("2012-02-01")
> endDate
[1] "2012-02-01"
> startDate:endDate
[1] 15340 15341 15342 15343 15344 15345 15346 15347 15348 15349 15350 15351 15352 15353 15354 15355
[17] 15356 15357 15358 15359 15360 15361 15362 15363 15364 15365 15366 15367 15368 15369 15370 15371
So you can see that all dates are converted to a numeric format.
But the problem is, I have an API function that can only read dates in the format "YYYY-MM-DD".
Can anyone suggest how I can generate a list like this:
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" ....
Use the seq function:
seq(startDate,endDate,by="day") #you could use also by=1
# see ?seq.Date for other options for "by"
From help page of operator : (use ?":" or ?Colon):
For other arguments from:to is equivalent to seq(from, to), and
generates a sequence from from to to in steps of 1 or -1. Value to
will be included if it differs from from by an integer up to a numeric
fuzz of about 1e-7. Non-numeric arguments are coerced internally
(hence without dispatching methods) to numeric—complex values will
have their imaginary parts discarded with a warning.
So
identical(startDate:endDate,as.numeric(startDate):as.numeric(endDate))
[1] TRUE
And btw, you are generating a vector, not a list. You can make a list out of your values by using as.list function though, if that is what you really want.
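To make the above concrete, here is a short sketch that builds the sequence, turns it into the "YYYY-MM-DD" strings the API expects, and also shows how to recover dates from the numbers produced by the colon operator:

```r
startDate <- as.Date("2012-01-01")
endDate   <- as.Date("2012-02-01")

# seq() dispatches to seq.Date, so the Date class is kept
days <- seq(startDate, endDate, by = "day")
class(days)    # "Date"
length(days)   # 32

# Character strings in YYYY-MM-DD form, e.g. for an API call
day_strings <- format(days, "%Y-%m-%d")
head(day_strings, 3)
# "2012-01-01" "2012-01-02" "2012-01-03"

# If you already have the day counts from startDate:endDate,
# they can be converted back by supplying the epoch as origin
recovered <- as.Date(startDate:endDate, origin = "1970-01-01")
all(recovered == days)   # TRUE

# A real list, if the API wants one element per date
day_list <- as.list(day_strings)
```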
I know the start date, start, and the last date, maturity. How can I fill a vector with the dates in between, leaving out weekends?
For instance, let's say :
> start = as.Date("2013-02-28");
> maturity = as.Date("2013-03-07");
I would like to get the following vector as a result :
results
[1] "2013-03-01" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07"
> start = as.Date("2013-02-28");
> maturity = as.Date("2013-03-07");
> x <- seq(start,maturity,by = 1)
> x
[1] "2013-02-28" "2013-03-01" "2013-03-02" "2013-03-03" "2013-03-04" "2013-03-05"
[7] "2013-03-06" "2013-03-07"
> x <- x[!weekdays(x) %in% c('Saturday','Sunday')]
> x
[1] "2013-02-28" "2013-03-01" "2013-03-02" "2013-03-03" "2013-03-04" "2013-03-05"
[7] "2013-03-06" "2013-03-07"
Same results... ?
There are probably a billion ways to do this with a variety of functions from multiple packages. But my first thought is to simply make a sequence and then remove the weekends:
x <- seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by = 1)
x <- x[!weekdays(x) %in% c('Saturday','Sunday')]
This answer is valid only with an English based system. For instance, in a French version, 'Saturday' and 'Sunday' must be translated into 'samedi' and 'dimanche'
This is less readable than @joran's answer :) but it does not depend on the locale:
dd <- seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by = 1)
dd[!(as.POSIXlt(dd)$wday %in% c(0, 6))] # wday: 0 = Sunday, 6 = Saturday
PS: another option is to set the locale before applying weekdays:
tt <- Sys.getlocale('LC_TIME')
Sys.setlocale('LC_TIME','ENGLISH') # Windows-style name; other systems typically need something like 'en_US.UTF-8' or 'C'
dd <- dd[!weekdays(dd) %in% c('Saturday','Sunday')]
Sys.setlocale('LC_TIME',tt)
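Applied to the dates from the question, the locale-independent version looks like the sketch below. Note that it keeps the start date 2013-02-28, a Thursday; drop the first element if the start itself should be excluded, as in the question's expected output:

```r
start    <- as.Date("2013-02-28")
maturity <- as.Date("2013-03-07")

x <- seq(start, maturity, by = 1)

# wday is numeric (0 = Sunday, ..., 6 = Saturday), so no day names
# and therefore no locale are involved
is_weekend <- as.POSIXlt(x)$wday %in% c(0, 6)
business_days <- x[!is_weekend]

format(business_days)
# "2013-02-28" "2013-03-01" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07"
```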
R's base strptime function is giving me output I do not expect.
This works as expected:
strptime(20130203235959, "%Y%m%d%H%M%S")
# yields "2013-02-03 23:59:59"
This too:
strptime(20130202240000, "%Y%m%d%H%M%S")
# yields "2013-02-03"
...but this does not. Why?
strptime(20130203000000, "%Y%m%d%H%M%S")
# yields NA
UPDATE
The value 20130204000000 showed up in a log I generated on a Mac 10.7.5 system using the command:
➜ ~ echo `date +"%Y%m%d%H%M%S"`
20130204000000
UPDATE 2
I even tried lubridate, which seems to be the recommendation:
> parse_date_time(c(20130205000001), c("%Y%m%d%H%M%S"))
1 parsed with %Y%m%d%H%M%S
[1] "2013-02-05 00:00:01 UTC"
> parse_date_time(c(20130205000000), c("%Y%m%d%H%M%S"))
1 failed to parse.
[1] NA
...and then funnily enough, it printed out "00:00:00" when I added enough seconds to now() to reach midnight:
> now() + new_duration(13000)
[1] "2013-02-10 00:00:00 GMT"
I should use character and not numeric when I parse my dates:
> strptime(20130203000000, "%Y%m%d%H%M%S") # No!
[1] NA
> strptime("20130203000000", "%Y%m%d%H%M%S") # Yes!
[1] "2013-02-03"
The reason for this seems to be that my numeric value gets cast to character, and I used one too many digits:
> as.character(201302030000)
[1] "201302030000"
> as.character(2013020300000)
[1] "2013020300000"
> as.character(20130203000000)
[1] "2.0130203e+13" # This causes the error: it doesn't fit "%Y%m%d%H%M%S"
> as.character(20130203000001)
[1] "20130203000001" # And this is why anything other than 000000 worked.
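If the timestamps really do arrive as numbers (for example after reading a log with read.csv), one way around this is to format them explicitly instead of relying on the implicit as.character conversion. A minimal sketch, with tz = "UTC" chosen only to keep the example reproducible:

```r
stamp <- 20130203000000

# Format the number with no decimals and no scientific notation
txt <- sprintf("%.0f", stamp)
txt   # "20130203000000"

parsed <- strptime(txt, "%Y%m%d%H%M%S", tz = "UTC")
is.na(parsed)                          # FALSE
format(parsed, "%Y-%m-%d %H:%M:%S")    # "2013-02-03 00:00:00"
```

format(stamp, scientific = FALSE) is another option; either way, reading the column as character in the first place avoids the problem entirely.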
A quick lesson in figuring out the type you need from the docs: in R, execute help(strptime) (the original post included an annotated screenshot of the help page, not reproduced here).
The usage line shows the main argument to the function, x, but does not specify its type (which is why I just tried numeric).
The type is given in the document's title: "Date-time Conversion Functions to and from Character".
you are essentially asking for the "zeroeth" second, which obviously doesn't exist :)
# last second of february 3rd
strptime(20130203235959, "%Y%m%d%H%M%S")
# first second of february 4th -- notice this 'rounds up' to feb 4th
# even though it says february 3rd
strptime(20130203240000, "%Y%m%d%H%M%S")
# no such second
strptime(20130204000000, "%Y%m%d%H%M%S")
# 2nd second of february 4th
strptime(20130204000001, "%Y%m%d%H%M%S")