I'm an experienced Pandas user and am having trouble plugging values from an R data frame into a function.
The following function works with hard-coded values:
>seq.Date(as.Date('2018-01-01'), as.Date('2018-01-31'), 'days')
[1] "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04" "2018-01-05" "2018-01-06" "2018-01-07"
[8] "2018-01-08" "2018-01-09" "2018-01-10" "2018-01-11" "2018-01-12" "2018-01-13" "2018-01-14"
[15] "2018-01-15" "2018-01-16" "2018-01-17" "2018-01-18" "2018-01-19" "2018-01-20" "2018-01-21"
[22] "2018-01-22" "2018-01-23" "2018-01-24" "2018-01-25" "2018-01-26" "2018-01-27" "2018-01-28"
[29] "2018-01-29" "2018-01-30" "2018-01-31"
Here is an extract from a dataframe I'm using
>df[1,1:2]
# A tibble: 1 x 2
start_time end_time
<date> <date>
1 2017-04-27 2017-05-11
When plugging these values into the 'seq.Date' function I get an error
> seq.Date(from=df[1,1], to=df[1,2], 'days')
Error in seq.Date(from = df[1, 1], to = df[1, 2], "days") :
'from' must be a "Date" object
I suspect this is because subsetting using df[x,y] returns a tibble rather than the specific value
data.class(df[1,1])
[1] "tbl_df"
What I'm hoping to derive is a sequence of dates. I need to be able to point this at various places around the dataframe.
Many thanks for any help!
Just use double brackets:
seq.Date(from=df[[1,1]], to=df[[1,2]], 'days')
Extraction functions on a tibble may return one-column tibbles rather than vectors. Use dplyr::pull to extract a column as a vector, as in this answer: Extract a dplyr tbl column as a vector
Another option is to set the drop argument in the `[` function to TRUE. From the documentation:
"If TRUE the result is coerced to the lowest possible dimension"
seq.Date(from = df[1, 1, drop = TRUE], to = df[1, 2, drop = TRUE], 'days')
# [1] "2017-04-27" "2017-04-28" "2017-04-29" "2017-04-30" "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05" "2017-05-06"
#[11] "2017-05-07" "2017-05-08" "2017-05-09" "2017-05-10" "2017-05-11"
data
df <- tibble(start_time = as.Date('2017-04-27'),
end_time = as.Date('2017-05-11'))
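To tie the answers above together, here is a minimal sketch of the difference between `[` and `[[` on a tibble, plus one way to build a sequence for every row at once. The second row of the example tibble and the names first_seq and all_seqs are made up for illustration; they are not part of the original question.

```r
library(tibble)

# Example data: the first row is from the question, the second row is hypothetical
df <- tibble(start_time = as.Date(c("2017-04-27", "2018-01-01")),
             end_time   = as.Date(c("2017-05-11", "2018-01-03")))

class(df[1, 1])    # single brackets keep the tibble wrapper
class(df[[1, 1]])  # double brackets return the underlying Date value

# Sequence for one row, using double brackets as suggested above
first_seq <- seq(df[[1, 1]], df[[1, 2]], by = "day")

# One sequence per row: Map walks the two Date columns in parallel
all_seqs <- Map(function(from, to) seq(from, to, by = "day"),
                df$start_time, df$end_time)
```

Map returns a list with one Date vector per row, which may be convenient when the same call has to be pointed at many rows of the data frame.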
Related
I'm trying to insert the previous date for every date in a vector in R.
This is my current vector:
[1] "1990-02-08" "1990-03-28" "1990-05-16" "1990-07-05" "1990-07-13" "1990-08-22" "1990-10-03"
[8] "1990-10-29" "1990-11-14" "1990-12-07" "1990-12-18" "1991-01-08" "1991-02-01" "1991-02-07"
I'm trying to get the following:
[1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15" "1990-05-16" "1990-07-04"
etc.
I tried the following:
dates_lagged = as.Date(dates)-1
dates_combined = c(dates, dates_lagged)
However, with this method, some dates are not getting lagged.
Is there a better way to do this?
Edit: to answer the comment, this is my code (replaced CSV with its starting values):
FOMC <- read_csv(file = c("x", "1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13", "1990-08-22", "1990-10-03",
"1990-10-29", "1990-11-14", "1990-12-07"))
FOMC$x <- as.Date(FOMC$x, format = "%Y-%m-%d")
colnames(FOMC) <- "Date"
dates_vector <- FOMC[["Date"]]
FOMC = as.vector(as.Date(dates_vector))
dates_lagged = as.Date(FOMC)-1
dates_combined = c(FOMC, dates_lagged)
as.Date(dates_combined)
For some reason, there is no "1990-10-28" before "1990-10-29" for example, and I can't figure out why.
You could try:
as.Date(c(rbind(dates - 1, dates)), origin = "1970-01-01")
#> [1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15"
#> [6] "1990-05-16" "1990-07-04" "1990-07-05" "1990-07-12" "1990-07-13"
#> [11] "1990-08-21" "1990-08-22" "1990-10-02" "1990-10-03" "1990-10-28"
#> [16] "1990-10-29" "1990-11-13" "1990-11-14" "1990-12-06" "1990-12-07"
#> [21] "1990-12-17" "1990-12-18" "1991-01-07" "1991-01-08" "1991-01-31"
#> [26] "1991-02-01" "1991-02-06" "1991-02-07"
Data
dates <- c("1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13",
"1990-08-22", "1990-10-03", "1990-10-29", "1990-11-14", "1990-12-07",
"1990-12-18", "1991-01-08", "1991-02-01", "1991-02-07")
dates <- as.Date(dates)
Created on 2021-11-04 by the reprex package (v2.0.0)
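As a possible explanation for the "missing" 1990-10-28: c(dates, dates_lagged) appends all lagged dates after the original ones instead of interleaving them, so the value is present but sits near the end of the vector. A small sketch, assuming the input dates are already in chronological order (the names appended and interleaved are illustrative only):

```r
dates <- as.Date(c("1990-10-03", "1990-10-29", "1990-11-14"))

# Appending: the lagged dates all come after the original dates
appended <- c(dates, dates - 1)

# Sorting the combined vector puts each lagged date right before its original
interleaved <- sort(c(dates - 1, dates))

# The lagged date is present in both vectors, only its position differs
"1990-10-28" %in% format(appended)
"1990-10-28" %in% format(interleaved)
```

Unlike the rbind approach, sort keeps the Date class, so no origin argument is needed. It only reproduces the interleaved order when the input is sorted to begin with.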
I am trying to copy the date column of the data for easy access. It is located at column 0. I tried to clone it using GOOGL$DATE<- GOOGL[,0]. The result was NULL instead of the dates.
getSymbols("GOOGL",
from ="2019-01-01",
to = "2019-06-30",
src = "yahoo",
adjust = TRUE)
GOOGL$DATE<- GOOGL[,0]
Indexes in R start at 1. There is no column 0. The GOOGL object is an xts object whose index is the dates. Read the documentation for the xts and zoo packages for background.
Either of these gives the dates:
time(GOOGL)
## [1] "2019-01-02" "2019-01-03" "2019-01-04" "2019-01-07" "2019-01-08"
## [6] "2019-01-09" "2019-01-10" "2019-01-11" "2019-01-14" "2019-01-15"
## [11] "2019-01-16" "2019-01-17" "2019-01-18" "2019-01-22" "2019-01-23"
## ...etc...
index(GOOGL)
## [1] "2019-01-02" "2019-01-03" "2019-01-04" "2019-01-07" "2019-01-08"
## [6] "2019-01-09" "2019-01-10" "2019-01-11" "2019-01-14" "2019-01-15"
## [11] "2019-01-16" "2019-01-17" "2019-01-18" "2019-01-22" "2019-01-23"
## ...etc...
The core data of an xts or zoo object is a numeric matrix and you can't mix dates and numbers in a matrix. Internally the dates are stored in an index attribute. Furthermore, it is not really desirable to add the date as a column in the first place. In its current form you can use all the facilities of xts and zoo to manipulate it which is why getSymbols gives it in that form.
It is possible to convert an xts or zoo object to a data frame using fortify.zoo(GOOGL). That creates a data frame whose first column is named Index and contains the dates, but unless there really is a good reason to do that it would be better not to do so.
One way would be to convert the zoo object to a data frame and then add the row names as a new column
GOOGLE <- as.data.frame(GOOGL)
GOOGLE$Date <- as.Date(rownames(GOOGLE))
Or we can use fortify.zoo which does all of this automatically
zoo::fortify.zoo(GOOGL)
data
library(quantmod)
getSymbols("GOOGL",
from ="2019-01-01",
to = "2019-06-30",
src = "yahoo",
adjust = TRUE)
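For readers who want to try this without downloading anything, here is a small offline sketch. The prices object is a made-up stand-in for GOOGL with invented values; only the xts constructor, index and fortify.zoo are taken from the packages discussed above.

```r
library(xts)  # attaching xts also attaches zoo

# Hypothetical stand-in for GOOGL: three trading days, two invented columns
dates <- as.Date(c("2019-01-02", "2019-01-03", "2019-01-04"))
prices <- xts(cbind(Open = c(1, 2, 3), Close = c(1.5, 2.5, 3.5)),
              order.by = dates)

# The dates live in the index, not in a column
idx <- index(prices)

# Converting to a data frame moves the index into a column named "Index"
prices_df <- fortify.zoo(prices)
names(prices_df)
```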
I have a bunch of character vectors which I use to download some files (one for each month of the year), for which I have to change the date for every single link manually (at the end of the vector). It looks like this:
query_01_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2019&to=31.01.2019"
query_02_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2019&to=28.02.2019"
query_03_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2019&to=31.03.2019"
query_04_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2019&to=30.04.2019"
query_05_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2019&to=31.05.2019"
query_06_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2019&to=30.06.2019"
query_07_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2019&to=31.07.2019"
query_08_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2019&to=31.08.2019"
query_09_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2019&to=30.09.2019"
query_10_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2019&to=31.10.2019"
query_11_19 = "?format=Html&userId=1232&userHash=1277KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2019&to=30.11.2019"
query_12_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2019&to=31.12.2019"
This is already rather tedious for one year, but it becomes a real pain if I want to do this for all the following years (let's say until 2030).
Is there an easier way to do this?
Thanks in advance!
A few tricks to make this easy:
1. use seq.Date to generate the first day of each month (called here as seq, since R's S3 dispatch picks the Date method);
2. subtract 1 from those to get the last day of each preceding month; and
3. join those together with paste0 after formatting them in the dot-separated date format.
## 1
dates <- seq(as.Date("2018-01-01"), as.Date("2019-01-01"), by = "month")
dates
# [1] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01" "2018-06-01" "2018-07-01"
# [8] "2018-08-01" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01"
dates_first <- format(dates[-length(dates)], format = "%d.%m.%Y")
## 2
dates_last <- format(dates[-1] - 1L, format = "%d.%m.%Y")
dates_last
# [1] "31.01.2018" "28.02.2018" "31.03.2018" "30.04.2018" "31.05.2018" "30.06.2018" "31.07.2018"
# [8] "31.08.2018" "30.09.2018" "31.10.2018" "30.11.2018" "31.12.2018"
## 3
paste0(
"?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=",
dates_first,
"&to=",
dates_last)
# [1] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2018&to=31.01.2018"
# [2] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2018&to=28.02.2018"
# [3] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2018&to=31.03.2018"
# [4] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2018&to=30.04.2018"
# [5] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2018&to=31.05.2018"
# [6] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2018&to=30.06.2018"
# [7] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2018&to=31.07.2018"
# [8] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2018&to=31.08.2018"
# [9] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2018&to=30.09.2018"
# [10] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2018&to=31.10.2018"
# [11] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2018&to=30.11.2018"
# [12] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2018&to=31.12.2018"
(Easily could have been done with sprintf or related functions.)
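A sketch of the sprintf variant mentioned above, extended to cover 2019 through 2030 as the question asks. The userHash is replaced by a placeholder, and the names base_query, month_starts and queries are illustrative choices, not anything from the original post.

```r
# Placeholder query prefix; substitute the real userId and userHash
base_query <- "?format=Html&userId=1232&userHash=PLACEHOLDER&query=ApplicationStatusByJob"

# First day of every month from Jan 2019 up to and including Jan 2031
month_starts <- seq(as.Date("2019-01-01"), as.Date("2031-01-01"), by = "month")

from <- month_starts[-length(month_starts)]  # first day of each month
to   <- month_starts[-1] - 1                 # last day of each month

queries <- sprintf("%s&from=%s&to=%s",
                   base_query,
                   format(from, "%d.%m.%Y"),
                   format(to, "%d.%m.%Y"))

# Name each element like the original variables, e.g. query_01_19
names(queries) <- format(from, "query_%m_%y")
```

Keeping the strings in one named vector, rather than 144 separate variables, means a single entry can still be looked up by name, for example queries[["query_02_20"]].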
I'm trying to get a vector of all the working days between two dates with the following code:
days_of_month = seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by="days")
sundays = c(as.Date("2017-01-01"), as.Date("2017-01-08"), as.Date("2017-01-15"), as.Date("2017-01-22"), as.Date("2017-01-29"))
When I do:
working_days = setdiff(days_of_month, sundays)
The return value of setdiff is a vector of strange values:
[1] 17168 17169 17170 17171 17172 17173 17175 17176 17177 17178 17179 17180
[13] 17182 17183 17184 17185 17186 17187 17189 17190 17191 17192 17193 17194
[25] 17196 17197
What are those values? And how do I get a vector of the days that are in days_of_month but not in sundays?
Those are the internal numeric values of R's S3 class Date (days since 1970-01-01). You can see the numeric values with as.numeric(days_of_month), or convert the result back to Date with as.Date(working_days, origin="1970-01-01").
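An alternative that avoids the round trip through numbers is to subset with %in%, which keeps the Date class. The sketch below also derives the Sundays from the dates themselves rather than typing them out; the "%u" format code (ISO weekday number, 7 for Sunday) is used so the result does not depend on the locale's day names.

```r
days_of_month <- seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "day")

# "%u" gives the weekday as "1" (Monday) through "7" (Sunday)
sundays <- days_of_month[format(days_of_month, "%u") == "7"]

# Logical subsetting keeps the Date class, unlike the setdiff result shown above
working_days <- days_of_month[!days_of_month %in% sundays]
class(working_days)
```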
I have a start date and an end date but when I am making a list to contain all dates in between, the format is changed:
> startDate <- as.Date("2012-01-01")
> startDate
[1] "2012-01-01"
> endDate <- as.Date("2012-02-01")
> endDate
[1] "2012-02-01"
> startDate:endDate
[1] 15340 15341 15342 15343 15344 15345 15346 15347 15348 15349 15350 15351 15352 15353 15354 15355
[17] 15356 15357 15358 15359 15360 15361 15362 15363 15364 15365 15366 15367 15368 15369 15370 15371
So you can see that all dates are converted to a numeric format.
But the problem is, I have an API function that can only read dates in the format "YYYY-MM-DD".
Can anyone suggest how I can generate such a list like:
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" ....
Use the seq function:
seq(startDate,endDate,by="day") #you could use also by=1
# see ?seq.Date for other options for "by"
From help page of operator : (use ?":" or ?Colon):
For other arguments from:to is equivalent to seq(from, to), and
generates a sequence from from to to in steps of 1 or -1. Value to
will be included if it differs from from by an integer up to a numeric
fuzz of about 1e-7. Non-numeric arguments are coerced internally
(hence without dispatching methods) to numeric—complex values will
have their imaginary parts discarded with a warning.
So
identical(startDate:endDate,as.numeric(startDate):as.numeric(endDate))
[1] TRUE
And btw, you are generating a vector, not a list. You can make a list out of your values by using as.list function though, if that is what you really want.
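Putting the pieces together, a short sketch of the Date vector, the character form an API expecting "YYYY-MM-DD" strings might need, and the list variant. Whether the API wants Date objects or strings is an assumption here, and the names all_dates, date_strings and date_list are illustrative.

```r
startDate <- as.Date("2012-01-01")
endDate   <- as.Date("2012-02-01")

# Date vector covering every day from start to end, inclusive
all_dates <- seq(startDate, endDate, by = "day")

# Character strings in "YYYY-MM-DD" form, in case the API expects text
date_strings <- format(all_dates, "%Y-%m-%d")

# A true list, one Date per element, if a list is really what is needed
date_list <- as.list(all_dates)
```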