Date format in a list is changed into numbers - r

I have a start date and an end date, but when I make a list containing all the dates in between, the format changes:
> startDate <- as.Date("2012-01-01")
> startDate
[1] "2012-01-01"
> endDate <- as.Date("2012-02-01")
> endDate
[1] "2012-02-01"
> startDate:endDate
[1] 15340 15341 15342 15343 15344 15345 15346 15347 15348 15349 15350 15351 15352 15353 15354 15355
[17] 15356 15357 15358 15359 15360 15361 15362 15363 15364 15365 15366 15367 15368 15369 15370 15371
So you can see that all dates are converted to a numeric format.
The problem is that I have an API function that can only read dates in the "YYYY-MM-DD" format.
Can anyone suggest how I can generate a list like this:
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" ....

Use the seq function:
seq(startDate,endDate,by="day") #you could use also by=1
# see ?seq.Date for other options for "by"
From the help page of the : operator (use ?":" or ?Colon):
For other arguments from:to is equivalent to seq(from, to), and
generates a sequence from from to to in steps of 1 or -1. Value to
will be included if it differs from from by an integer up to a numeric
fuzz of about 1e-7. Non-numeric arguments are coerced internally
(hence without dispatching methods) to numeric—complex values will
have their imaginary parts discarded with a warning.
So
identical(startDate:endDate,as.numeric(startDate):as.numeric(endDate))
[1] TRUE
By the way, you are generating a vector, not a list. If a list is really what you want, you can convert the vector with the as.list function.
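Since the API expects "YYYY-MM-DD" text, a minimal sketch, assuming the API accepts a character vector (or a list of such values), is to build the Date sequence with seq and then format it explicitly. The names dates, date_strings and date_list are only illustrative.

```r
startDate <- as.Date("2012-01-01")
endDate   <- as.Date("2012-02-01")

# seq() dispatches to seq.Date, so the result keeps the Date class
dates <- seq(startDate, endDate, by = "day")

# Explicit "YYYY-MM-DD" character strings, in case the API needs plain text
date_strings <- format(dates, "%Y-%m-%d")

# as.list() on a Date vector gives a list whose elements are still Dates
date_list <- as.list(dates)

head(date_strings, 3)  # "2012-01-01" "2012-01-02" "2012-01-03"
```

Formatting explicitly keeps the output stable even if the API would otherwise receive the numeric day counts.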

Related

Access R Dataframe Values Rather than Tibble

I'm an experienced Pandas user and am having trouble plugging values from my R data frame into a function.
The following call works with hard-coded values:
>seq.Date(as.Date('2018-01-01'), as.Date('2018-01-31'), 'days')
[1] "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04" "2018-01-05" "2018-01-06" "2018-01-07"
[8] "2018-01-08" "2018-01-09" "2018-01-10" "2018-01-11" "2018-01-12" "2018-01-13" "2018-01-14"
[15] "2018-01-15" "2018-01-16" "2018-01-17" "2018-01-18" "2018-01-19" "2018-01-20" "2018-01-21"
[22] "2018-01-22" "2018-01-23" "2018-01-24" "2018-01-25" "2018-01-26" "2018-01-27" "2018-01-28"
[29] "2018-01-29" "2018-01-30" "2018-01-31"
Here is an extract from a data frame I'm using:
>df[1,1:2]
# A tibble: 1 x 2
  start_time end_time
  <date>     <date>
1 2017-04-27 2017-05-11
When plugging these values into the 'seq.Date' function, I get an error:
> seq.Date(from=df[1,1], to=df[1,2], 'days')
Error in seq.Date(from = df[1, 1], to = df[1, 2], "days") :
'from' must be a "Date" object
I suspect this is because subsetting using df[x,y] returns a tibble rather than the specific value
data.class(df[1,1])
[1] "tbl_df"
What I'm hoping to derive is a sequence of dates. I need to be able to point this at various places around the dataframe.
Many thanks for any help!
Just use double brackets:
seq.Date(from=df[[1,1]], to=df[[1,2]], 'days')
Extraction with `[` on a tibble does not return a vector but a one-column tibble. Use dplyr::pull to extract a column as a vector, as in this answer: Extract a dplyr tbl column as a vector
Another option is to set the drop argument in the `[` function to TRUE.
If TRUE the result is coerced to the lowest possible dimension
seq.Date(from = df[1, 1, drop = TRUE], to = df[1, 2, drop = TRUE], 'days')
# [1] "2017-04-27" "2017-04-28" "2017-04-29" "2017-04-30" "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05" "2017-05-06"
#[11] "2017-05-07" "2017-05-08" "2017-05-09" "2017-05-10" "2017-05-11"
Data:
library(tibble)
df <- tibble(start_time = as.Date('2017-04-27'),
             end_time = as.Date('2017-05-11'))
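Because the question also asks how to point this at different places in the data frame, here is a sketch that loops over rows with [[. It uses a base data.frame with hypothetical values so it runs without the tibble package; row_dates is a hypothetical helper name, and [[i, j]] is expected to behave the same way on a tibble.

```r
# Hypothetical example data (base data.frame, so no extra packages are needed)
df <- data.frame(
  start_time = as.Date(c("2017-04-27", "2018-01-01")),
  end_time   = as.Date(c("2017-05-11", "2018-01-05"))
)

# Hypothetical helper: daily sequence between the two date columns of row i.
# [[i, j]] extracts a single Date value rather than a one-column data frame.
row_dates <- function(data, i) {
  seq.Date(from = data[[i, "start_time"]], to = data[[i, "end_time"]], by = "day")
}

# Apply it to every row; each element of the result is a Date vector
all_dates <- lapply(seq_len(nrow(df)), function(i) row_dates(df, i))

length(all_dates[[1]])  # 15 days, 2017-04-27 through 2017-05-11
```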

Why can't I get a vector class?

I have extracted this dataframe:
> df<-as.data.frame(model_rf$variable.importance)
> df
Importance
DayOfWeek 3.763932e+11
Customers 1.364059e+12
Open 6.345289e+11
Promo 2.617495e+11
StateHoliday 5.196666e+09
SchoolHoliday 6.522969e+09
DateYear 7.035399e+09
DateMonth 2.013482e+10
DateDay 3.763177e+10
DateWeek 3.283496e+10
StoreType 3.156843e+10
Assortment 2.025741e+10
CompetitionDistance 1.118476e+11
CompetitionOpenSinceMonth 4.633220e+10
CompetitionOpenSinceYear 4.554890e+10
Promo2 0.000000e+00
Promo2SinceWeek 5.066674e+10
Promo2SinceYear 4.096407e+10
CompetitionOpen 3.992745e+10
PromoOpen 2.831936e+10
IspromoinSales 2.844220e+09
Then I want to extract the values into another vector:
> v<-as.vector(model_rf$variable.importance$Importance)
> v
[1] 3.763932e+11 1.364059e+12 6.345289e+11 2.617495e+11 5.196666e+09 6.522969e+09 7.035399e+09 2.013482e+10 3.763177e+10
[10] 3.283496e+10 3.156843e+10 2.025741e+10 1.118476e+11 4.633220e+10 4.554890e+10 0.000000e+00 5.066674e+10 4.096407e+10
[19] 3.992745e+10 2.831936e+10 2.844220e+09
And the row names into another vector:
> w<-(as.vector((row.names(df))))
> w
[1] "DayOfWeek" "Customers" "Open" "Promo"
[5] "StateHoliday" "SchoolHoliday" "DateYear" "DateMonth"
[9] "DateDay" "DateWeek" "StoreType" "Assortment"
[13] "CompetitionDistance" "CompetitionOpenSinceMonth" "CompetitionOpenSinceYear" "Promo2"
[17] "Promo2SinceWeek" "Promo2SinceYear" "CompetitionOpen" "PromoOpen"
[21] "IspromoinSales"
Then I need to create a data frame from the two vectors above:
> DF<-as.data.frame(w,v)
Warning message:
In as.data.frame.vector(x, ..., nm = nm) :
  'row.names' is not a character vector of length 21 -- omitting it. Will be an error!
In fact, it seems that w isn't converted to the vector class even though I called as.vector. It is still of class character:
> class(w)
[1] "character"
Can anyone explain this?
Try this code:
DF<-as.data.frame(cbind(w,v))
If you look at the documentation of as.data.frame, you will see that the function expects its second argument, row.names, to be a character vector of row names.
In your case, you supplied first the row names and then the values, leading to the error above.
You can either use
as.data.frame(v,w)
or
data.frame(w,v)
to get your desired result.
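On the class question: class() of a plain character vector reports "character" (the element type), and is.vector() is the way to check that it is a vector. The sketch below uses the first three entries of w and v as hypothetical stand-ins and also shows why data.frame() keeps v numeric, whereas cbind() first builds a single-type matrix and turns v into text.

```r
# Hypothetical stand-ins for the question's w and v (first three entries only)
w <- c("DayOfWeek", "Customers", "Open")
v <- c(3.763932e+11, 1.364059e+12, 6.345289e+11)

# A character vector *is* a vector: class() reports the element type
class(w)      # "character"
is.vector(w)  # TRUE

# data.frame() keeps each column's own type
DF <- data.frame(variable = w, importance = v, stringsAsFactors = FALSE)
str(DF)

# cbind() builds a matrix, which holds a single type, so v becomes character
is.character(cbind(w, v))  # TRUE
```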

R setdiff function on vector of dates leads to strange results

I'm trying to get a vector of all the working days between two dates with the following code:
days_of_month = seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by="days")
sundays = c(as.Date("2017-01-01"), as.Date("2017-01-08"), as.Date("2017-01-15"), as.Date("2017-01-22"), as.Date("2017-01-29"))
When I do:
working_days = setdiff(days_of_month, sundays)
The return value of setdiff is a vector of strange values:
[1] 17168 17169 17170 17171 17172 17173 17175 17176 17177 17178 17179 17180
[13] 17182 17183 17184 17185 17186 17187 17189 17190 17191 17192 17193 17194
[25] 17196 17197
What are those values? And how do I get a vector of the days that are in days_of_month but not in sundays?
Those are the internal numeric values of R's S3 class Date (the number of days since 1970-01-01); setdiff returned them without the Date class. You can see these numbers with as.numeric(days_of_month). Or, you can convert the result back to Date with as.Date(working_days, origin="1970-01-01").
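As an alternative to converting back with as.Date, here is a sketch that never leaves the Date class: subset days_of_month with a logical index. Computing the Sundays from the POSIXlt wday field (0 = Sunday), instead of typing them in, is a choice made for this sketch only.

```r
days_of_month <- seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "days")

# POSIXlt's wday field numbers the weekdays from 0 = Sunday
sundays <- days_of_month[as.POSIXlt(days_of_month)$wday == 0]

# Logical subsetting keeps the Date class, so no origin conversion is needed
working_days <- days_of_month[!days_of_month %in% sundays]

length(working_days)  # 26
```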

R's strptime of a datetime at midnight (00:00:00) gives NA

R's base strptime function is giving me output I do not expect.
This works as expected:
strptime(20130203235959, "%Y%m%d%H%M%S")
# yields "2013-02-03 23:59:59"
This too:
strptime(20130202240000, "%Y%m%d%H%M%S")
# yields "2013-02-03"
...but this does not. Why?
strptime(20130203000000, "%Y%m%d%H%M%S")
# yields NA
UPDATE
The value 20130204000000 showed up in a log I generated on a Mac 10.7.5 system using the command:
➜ ~ echo `date +"%Y%m%d%H%M%S"`
20130204000000
UPDATE 2
I even tried lubridate, which seems to be the recommendation:
> parse_date_time(c(20130205000001), c("%Y%m%d%H%M%S"))
1 parsed with %Y%m%d%H%M%S
[1] "2013-02-05 00:00:01 UTC"
> parse_date_time(c(20130205000000), c("%Y%m%d%H%M%S"))
1 failed to parse.
[1] NA
...and then funnily enough, it printed out "00:00:00" when I added enough seconds to now() to reach midnight:
> now() + new_duration(13000)
[1] "2013-02-10 00:00:00 GMT"
I should use character and not numeric when I parse my dates:
> strptime(20130203000000, "%Y%m%d%H%M%S") # No!
[1] NA
> strptime("20130203000000", "%Y%m%d%H%M%S") # Yes!
[1] "2013-02-03"
The reason is that the numeric value is first converted to character, and with all those trailing zeros R writes it in scientific notation, which is shorter than the fixed form:
> as.character(201302030000)
[1] "201302030000"
> as.character(2013020300000)
[1] "2013020300000"
> as.character(20130203000000)
[1] "2.0130203e+13" # This causes the error: it doesn't fit "%Y%m%d%H%M%S"
> as.character(20130203000001)
[1] "20130203000001" # And this is why anything other than 000000 worked.
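If the timestamps arrive as numbers (as in the Mac log above), one way to keep the midnight case working, assuming the values are whole numbers small enough for a double to store exactly, is to convert them to text with sprintf rather than relying on as.character. The tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone.

```r
stamp <- 20130203000000

# as.character() can switch to scientific notation for this value
# ("2.0130203e+13"), which no longer matches the format string.
# sprintf("%.0f", ...) always writes every integer digit.
stamp_chr <- sprintf("%.0f", stamp)

parsed <- strptime(stamp_chr, "%Y%m%d%H%M%S", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")  # "2013-02-03 00:00:00"
```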
A quick lesson in figuring out the type you need from the docs: in R, run help(strptime).
The usage section names the main argument, x, but does not specify its type (which is why I just tried numeric).
The type is given in the page's title, "Date-time Conversion Functions to and from Character".
You are essentially asking for the "zeroeth" second, which obviously doesn't exist :)
# last second of february 3rd
strptime(20130203235959, "%Y%m%d%H%M%S")
# first second of February 4th -- notice this 'rounds up' to Feb 4th
# even though it says February 3rd
strptime(20130203240000, "%Y%m%d%H%M%S")
# no such second
strptime(20130204000000, "%Y%m%d%H%M%S")
# 2nd second of february 4th
strptime(20130204000001, "%Y%m%d%H%M%S")

Converting xts to ts: Error in Round(frequency)

I have some imported csv data that I have turned into an xts object. If I try to convert it into a ts object (with the end goal of using functions like acf) I get:
"Error in round(frequency) : Non-numeric argument to mathematical function"
The code to convert it is:
library("zoo")
library("xts")  # xts() comes from the xts package
#Working With Milliseconds
op <- options(digits.secs=3)
#Rename Function
clean_perfmon = function(x, servername) {
names(x)[names(x)=="X.PDH.CSV.4.0...Coordinated.Universal.Time..0."] <- "Time"
x$Time = strptime(x$Time, "%m/%d/%Y %H:%M:%OS")
return(x)
}
web02 = read.csv("/home/kbrandt/Desktop/Shared/web02_2011_07_20_1.csv")
web02 = clean_perfmon(web02, "NY.WEB02")
web02ts = xts(web02[,-1], web02[,"Time"])
The time is mostly regular, but with some variation in the milliseconds:
time(web02ts)[1:3]
[1] "2011-07-20 11:21:50.459 EDT" "2011-07-20 11:21:51.457 EDT" "2011-07-20 11:21:52.456 EDT"
Some of the data has NA points:
> web02ts[1:3,1]
X..NY.WEB02.Process.Idle....Processor.Time
2011-07-20 11:21:50.459 NA
2011-07-20 11:21:51.457 1134.819
2011-07-20 11:21:52.456 1374.877
Update:
Changing to per-second resolution and using a non-NA subset doesn't help:
> as.ts(web02ts[2:10,1])
Error in round(frequency) : Non-numeric argument to mathematical function
> web02ts[2:10,1]
X..NY.WEB02.Process.Idle....Processor.Time
2011-07-20 11:21:51 1134.819
2011-07-20 11:21:52 1374.877
2011-07-20 11:21:53 1060.842
2011-07-20 11:21:54 1067.092
2011-07-20 11:21:55 1195.205
2011-07-20 11:21:56 1223.328
2011-07-20 11:21:57 1121.774
2011-07-20 11:21:58 1187.393
2011-07-20 11:21:59 1378.001
Also, frequency(web02ts) returns NULL.
strptime creates an object of class POSIXlt. as.ts doesn't support it, and thinks it is a list, hence the complaint about a non-numeric argument. Convert to POSIXct instead.
as.POSIXct(strptime(x$Time, "%m/%d/%Y %H:%M:%OS"))
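A minimal base-R check of this point, using a made-up timestamp in the question's format: the POSIXlt value returned by strptime is stored as a list of fields, while POSIXct is a plain numeric vector of seconds. The tz = "UTC" argument is an assumption added so the sketch does not depend on the local time zone.

```r
stamp <- "07/20/2011 11:21:50.459"  # hypothetical value in the CSV's format

lt <- strptime(stamp, "%m/%d/%Y %H:%M:%OS", tz = "UTC")
class(lt)               # "POSIXlt" "POSIXt"
is.list(unclass(lt))    # TRUE: a list of fields (sec, min, hour, ...)

ct <- as.POSIXct(lt)
class(ct)               # "POSIXct" "POSIXt"
is.numeric(unclass(ct)) # TRUE: seconds since 1970-01-01
```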
An xts/zoo object must be regular to have a non-NULL frequency.
You don't show how you changed to per-second resolution, but if you tried via options(digits.secs=0), that won't work because it only affects printing. You would need to do something like this:
library(xts)
# example data
set.seed(21)
web02ts <- xts(rnorm(10), Sys.time()+1:10+runif(10)/3)
web02ts_reg <- align.time(web02ts,1)
frequency(web02ts_reg)
# [1] 1
as.ts(web02ts_reg)
# Time Series:
# Start = 1
# End = 10
# Frequency = 1
# [1] 0.793013171 0.522251264 1.746222241 -1.271336123 2.197389533
# [6] 0.433130777 -1.570199630 -0.934905667 0.063493345 -0.002393336
