lubridate extract object from period inside a function - r

I found this useful function here, to compute an age (or just compute the time spent between two dates) :
calc_age <- function(birthDate, refDate = Sys.Date()) {
period <- as.period(new_interval(birthDate, refDate),
unit = "year")
period$year
}
I would like to improve it by making the unit an argument of the function, so we can get the age in months, days or whatever unit supported by period objects.
I tried things like this :
calc_age <- function(birthDate, refDate = Sys.Date(), unit = 'year') {
period <- as.period(new_interval(birthDate, refDate),
unit = unit)
period$unit
}
but I'm getting this error message :
Error in slot(x, name) :
no slot of name "unit" for this object of class "Period"
I think the problem is related to evaluation, so I tried a few things with enquo and quo_text, but since I'm pretty new to functional programming I couldn't make it work.
Thanks for your help, and sorry for my bad English.

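Only a small adaptation of the code in the question's link is needed. period$unit looks for a slot literally named "unit", whereas slot(period, unit) evaluates the variable unit and extracts the slot it names.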
calc_age2 <- function(birthDate, refDate = Sys.Date(), unit = "year"){
int <- interval(birthDate, refDate)
period <- as.period(int, unit = unit)
slot(period, unit)
}
What is the official age of unix?
calc_age2("1970-01-01")
#[1] 49
calc_age2("1970-01-01", unit = "month")
#[1] 593
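Why slot() works where $ did not can be seen with a toy S4 class. This is only a minimal sketch: the class Span and its values are made up for illustration and merely stand in for lubridate's Period.

```r
library(methods)

# Hypothetical S4 class standing in for a Period object
setClass("Span", representation(year = "numeric", month = "numeric"))
p <- new("Span", year = 49, month = 593)

unit <- "month"

# slot() evaluates its second argument, so the value of `unit` picks the slot
slot(p, unit)
#[1] 593

# Asking for a slot literally called "unit" fails, as in the question
res <- try(slot(p, "unit"), silent = TRUE)
inherits(res, "try-error")
#[1] TRUE
```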

how to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
the interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Can anyone please tell me what I am doing wrong, or how to work around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
Transposing meta yields a character matrix, and when that matrix is converted to a data frame, all strings are automatically converted to factors (the default before R 4.0.0). This includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blogpost for information on the stringsAsFactors parameter. Strings being converted to factors automatically is a common source of confusion.
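The factor pitfall described above can be reproduced without seqtime. The sketch below uses a made-up day vector (day_chr and day_fac are hypothetical names) to show why a factor cannot be passed straight to as.integer:

```r
# Days stored as strings, as they are after transposing a mixed-type table
day_chr <- c("15", "15", "16", "27")
day_fac <- factor(day_chr)

# as.integer() on a factor returns the internal level codes, not the days
codes <- as.integer(day_fac)
codes
#[1] 1 1 2 3

# Going through character first recovers the real values
days <- as.integer(as.character(day_fac))
days
#[1] 15 15 16 27
```

With level codes in place of real days, the time vector handed to the interpolation no longer describes the actual sampling days, which is why the conversion has to go through character (or avoid factors altogether).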

Using ifelse to create a running tally in R

I am trying to do some quantitative modeling in R. I'm not getting an error message, but the results are not what I actually need.
I am a newbie, but here is my complete code sample.
library(quantmod)
#Building the data frame and xts to show dividends, splits and technical indicators
getSymbols(c("AMZN"))
Playground <- data.frame(AMZN)
Playground$date <- as.Date(row.names(Playground))
Playground$wday <- as.POSIXlt(Playground$date)$wday #day of the week
Playground$yday <- as.POSIXlt(Playground$date)$mday #day of the month
Playground$mon <- as.POSIXlt(Playground$date)$mon #month of the year
Playground$RSI <- RSI(Playground$AMZN.Adjusted, n = 5, maType="EMA") #can add Moving Average Type with maType =
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)
Playground$Div <- getDividends('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$Split <- getSplits('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$BuySignal <- ifelse(Playground$RSI < 30 & Playground$MACD < 0, "Buy", "Hold")
All is well up until this point when I start using some logical conditions to come up with decision points.
Playground$boughts <- ifelse(Playground$BuySignal == "Buy", lag(Playground$boughts) + 1000, lag(Playground$boughts))
It will execute but the result will be nothing but NA. I suppose this is because you are trying to add NA to a number, but I'm not 100% sure. How do you tell the computer I want you to keep a running tally of how much you have bought?
Thanks so much for the help.
So we want to buy 1000 shares every time a buy signal is generated?
Your problem stems from the MACD indicator. It actually generates two columns, macd and signal. You have to decide which one you want to keep.
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)$signal
This should solve the problem at hand.
Also, please check the reference for ifelse. The class of the return value can be tricky at times, so the approach suggested by Floo0 is preferable.
Also, I'd advocate using 1 and 0 instead of "Buy" and "Hold" to show whether you are holding. It makes the math much easier.
And I'd strongly suggest reading a beginner tutorial on backtesting with PerformanceAnalytics. It makes the going much easier.
BTW, you missed this line in the code:
Playground$boughts<- 0
Hope it helps.
EDIT: And I forgot to mention the obvious: discard the first few rows, where MACD will be NA.
Something like:
Playground<- Playground[-c(1:26),]
Whenever you want to do something of the form "if ..., do something, else stay the same", do not use ifelse.
Try this instead:
ind <- which(Playground$BuySignal == "Buy")
Playground$boughts[ind] <- lag(Playground$boughts) + 1000
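For the running tally itself, a cumulative sum is one alternative worth considering. The following is a minimal sketch on a made-up signal vector (buy_signal and bought_now are hypothetical names; the 1000-share lot comes from the question), not a drop-in replacement for the quantmod code:

```r
# Made-up signals; NA mimics the warm-up rows where the indicators are undefined
buy_signal <- c("Hold", "Buy", "Hold", "Buy", "Buy", NA)

# Shares bought on each row: 1000 on a buy signal, 0 otherwise (NA counts as no buy)
bought_now <- ifelse(!is.na(buy_signal) & buy_signal == "Buy", 1000, 0)

# The running tally is the cumulative sum of the per-row purchases
boughts <- cumsum(bought_now)
boughts
#[1]    0 1000 1000 2000 3000 3000
```

Because each row's total is derived from the per-row purchases rather than from the previous value of the same column, no self-referencing lag is needed.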

Create label column in dataframe according to an existing date column

I am new to R and struggling with the fact that functions are able to operate on whole vectors without having to explicitly specify this.
My goal
I have a data frame calls with multiple columns, one of which is a “date” column. Now I want to add a new column, “daytime”, that labels the daytime the particular entry’s date falls into:
> calls
call_id length date direction daytime
1 258 531 1400594572974 outgoing afternoon
2 259 0 1375555528144 unanswered evening
3 260 778 1385922648396 incoming evening
What I have done so far
I have already implemented functions that return a vector of booleans, like this:
# Operates on POSIXlt timestamps
is.earlymorning <- function(date) {
hour(floor_date(date, "hour")) >= 5 & hour(floor_date(date, "hour")) < 9
}
The call is.earlymorning(c("2014-05-20 16:02:52", "2013-08-03 20:45:28", "2013-12-01 19:30:48")) would thus return FALSE FALSE FALSE. What I am currently struggling with is implementing a function that actually returns labels. What I would like the function to do is the following:
# rawDate is a long value of the date as ms since 1970
Daytime <- function(rawDate) {
date <- as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")
if (is.earlymorning(date)) {
"earlymorning"
} else if (is.morning(date)) {
"morning"
} else if (is.afternoon(date)) {
"afternoon"
} else if (is.evening(date)) {
"evening"
} else if (is.earlynight(date)) {
"earlynight"
} else if (is.latenight(date)) {
"latenight"
}
}
The problem
Obviously, my above approach does not work since the if-conditions would operate on whole vectors in my example. Is there an elegant way to solve this problem? I am sure I am confusing or missing some important points, but as I mentioned I am pretty new to R.
In short, what I want to implement is a function that returns a vector of labels according to a vector of date values:
# Insert new column with daytime labels
calls$daytime <- Daytime(calls$date)
# or something like that:
calls$daytime <- sapply(calls$date, Daytime)
# Daytime(c(1400594572974, 1375555528144, 1385922648396)) => c("afternoon", "evening", "evening")
One approach would be to use cut rather than ifelse. I am not entirely sure how you want to label hours, but this will give you the idea. foo is your data (i.e., calls).
library(dplyr)
# Following your idea
ana <- transform(foo, date = as.POSIXlt(as.numeric(date) / 1000, origin = "1970-01-01"))
ana %>%
  mutate(hour = cut(as.numeric(format(date, "%H")),
                    breaks = c(0, 4, 8, 12, 16, 20, 24),
                    labels = c("late night", "early morning",
                               "morning", "afternoon",
                               "evening", "early night"),
                    right = FALSE  # intervals are [0,4), [4,8), ..., so hour 0 gets a label too
                    )
         )
# call_id length date direction daytime hour
#1 258 531 2014-05-20 23:02:52 outgoing afternoon early night
#2 259 0 2013-08-04 03:45:28 unanswered evening late night
#3 260 778 2013-12-02 03:30:48 incoming evening late night
There is no need to have 6 different functions to establish which period of the day a given date is. It suffices to define a vector which matches the hour with the daytime. For instance:
Daytime<-function(rawDate) {
#change the vector according to your definition of the daytime.
#the first value corresponds to hour 0 and the last to hour 23
hours<-c(rep("latenight",5),rep("earlymorning",4),rep("morning",4),rep("afternoon",4),rep("evening",4),rep("earlynight",3))
hours[as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")$hour+1]
}
Given Thomas' hint, I solved my problem in the following (admittedly inelegant) way:
Daytime <- function(rawDates) {
dates <- as.POSIXlt(as.numeric(rawDates) / 1000, origin = "1970-01-01")
ifelse(is.earlymorning(dates), "earlymorning",
ifelse(is.morning(dates), "morning",
ifelse(is.afternoon(dates), "afternoon",
ifelse(is.evening(dates), "evening",
ifelse(is.earlynight(dates), "earlynight",
ifelse(is.latenight(dates), "latenight",
"N/A")
)
)
)
)
)
}
With more labels, this approach will soon become unmaintainable. Right now it serves my purposes, and I will leave it at that since I must focus on analysing the data as soon as possible. But I will let you know if I find the time and a less complicated solution! Thank you for your quick response, Thomas.
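One way to avoid the nesting is to look up each hour in a vector of interval start points. The sketch below is only an illustration: label_daytime is a hypothetical helper, and the boundaries are assumed to match the lookup-vector answer above (latenight from hour 0, earlymorning from 5, morning from 9, afternoon from 13, evening from 17, earlynight from 21).

```r
# Hypothetical helper: map hours (0-23) to daytime labels in one vectorised step
label_daytime <- function(hours) {
  starts <- c(0, 5, 9, 13, 17, 21)   # first hour of each daytime (assumed boundaries)
  labels <- c("latenight", "earlymorning", "morning",
              "afternoon", "evening", "earlynight")
  # findInterval() returns, for each hour, the index of the last start <= hour
  labels[findInterval(hours, starts)]
}

# Hours taken from timestamps in ms since 1970, as in the question (UTC assumed)
raw <- c(1400594572974, 1375555528144, 1385922648396)
hrs <- as.POSIXlt(raw / 1000, origin = "1970-01-01", tz = "UTC")$hour
label_daytime(hrs)
```

Adding a label then only means adding one start hour and one name, instead of another level of nested ifelse.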

In using timeDate R package, I receive an error when specifying GBNewYearsEve

While looking for a way to modify the .Holidays object in the chron package, I discovered this solution:
How to define holidays for is.holiday() chron package in R
It works very well in itself, except that when I include "GBNewYearsEve" in hlist, I receive an error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'GBNewYearsEve' of mode 'function' was not found
This error doesn't appear if GBNewYearsEve is removed from the list. What have I missed?
Example Working Code:
library(chron)
library(timeDate)
hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay")
(ss <- dates(sapply(sapply(hlist, holiday, year = 2011), as.Date)))
.Holidays <- ss
chron::.Holidays ##nochange
unlockBinding(".Holidays", as.environment("package:chron"))
assignInNamespace(".Holidays", .Holidays, ns="chron",
envir=as.environment("package:chron"))
assign(".Holidays", .Holidays, as.environment("package:chron"))
lockBinding(".Holidays", as.environment("package:chron"))
chron::.Holidays ##change
Example non-working code:
hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay", "GBNewYearsEve")
(ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
Not sure this is an answer that will suit you. I was curious with your problem and I've downloaded the timeDate package from CRAN. Although it seems to be documented in ?holiday, I don't think the code is ready for GBNewYearsEve.
If I run your code as it is I get:
> hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay", "GBNewYearsEve")
>
> (ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'GBNewYearsEve' of mode 'function' was not found
Basically, the error message is saying that GBNewYearsEve was not found. I actually can't find it in the code of timeDate. However, if I add a definition like this:
GBNewYearsEve =
function(year = getRmetricsOptions("currentYear")) {
ans = year*10000 + 1231
timeDate(as.character(ans)) }
(Which is basically copied from DENewYearsEve, the only definition for New Years' Eve present in the package)
Then I get your code running:
> (ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
GBMayDay GBBankHoliday GBSummerBankHoliday ChristmasEve ChristmasDay BoxingDay
05/02/11 05/30/11 08/29/11 12/24/11 12/25/11 12/26/11
NewYearsDay GBNewYearsEve
01/01/11 12/31/11
However, I'm not sure how good a solution this is. Note that in timeDate, some additional transformations are done so that, e.g., when a holiday falls on a weekend it is moved to the following Monday. With the code above, you just get New Year's Eve on the 31st of December.
For example, this is in holiday-LONDON.R:
# New Year's Day: if it falls on Sat/Sun, then is
# moved to following Monday
posix1 <- as.POSIXlt(NewYearsDay(y))
if (posix1$wday == 0 | posix1$wday == 6) {
lon <- timeDate(.on.or.after(y, 1, 1, 1), zone = "London",
FinCenter = "Europe/London")
holidays <- c(holidays, as.character(lon))
} else {
holidays <- c(holidays, as.character(posix1))
}
I guess the package is handling only official holidays for each country, and adding those additional rules?
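The weekend rule in that excerpt can be sketched in base R. This is a simplified illustration only: roll_to_monday is a hypothetical helper, not part of timeDate, and it ignores time zones and financial centres.

```r
# Hypothetical helper: move dates that fall on a weekend to the following Monday
roll_to_monday <- function(d) {
  d <- as.Date(d)
  wday <- as.POSIXlt(d)$wday   # 0 = Sunday, 6 = Saturday
  # Saturday moves forward 2 days, Sunday 1 day, weekdays stay put
  d + ifelse(wday == 6, 2, ifelse(wday == 0, 1, 0))
}

# New Year's Day fell on a Saturday in 2011, a Sunday in 2012, a Tuesday in 2013
rolled <- roll_to_monday(c("2011-01-01", "2012-01-01", "2013-01-01"))
format(rolled)
#[1] "2011-01-03" "2012-01-02" "2013-01-01"
```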

Why are lubridate functions so slow when compared with as.POSIXct?

As the title goes. Why is the lubridate function so much slower?
library(lubridate)
library(microbenchmark)
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 50000, replace = TRUE)
microbenchmark(as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT"), times = 100)
microbenchmark(dmy(Dates, tz ="GMT"), times = 100)
Unit: milliseconds
expr min lq median uq max
1 as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT") 103.1902 104.3247 108.675 109.2632 149.871
2 dmy(Dates, tz = "GMT") 184.4871 194.1504 197.8422 214.3771 268.4911
For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you're less likely to get blown up and it's easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we'd have something.
Start by looking at what dmy is doing and you'll see the reason for the speed difference (by the way, from your benchmarks I wouldn't say that lubridate is that much slower, as these are in milliseconds):
dmy #type this into the command line and you get:
>dmy
function (..., quiet = FALSE, tz = "UTC")
{
dates <- unlist(list(...))
parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet,
tz = tz)
}
<environment: namespace:lubridate>
Right away I see parse_date and num_to_date and make_format. Makes one wonder what all these guys are. Let's see:
parse_date
> parse_date
function (x, formats, quiet = FALSE, seps = find_separator(x),
tz = "UTC")
{
fmt <- guess_format(head(x, 100), formats, seps, quiet)
parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
if (length(x) > 2 & !quiet)
message("Using date format ", fmt, ".")
failed <- sum(is.na(parsed)) - sum(is.na(x))
if (failed > 0) {
message(failed, " failed to parse.")
}
parsed
}
<environment: namespace:lubridate>
num_to_date
> getAnywhere(num_to_date)
A single object matching ‘num_to_date’ was found
It was found in the following places
namespace:lubridate
with value
function (x)
{
if (is.numeric(x)) {
x <- as.character(x)
x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
}
x
}
<environment: namespace:lubridate>
make_format
> getAnywhere(make_format)
A single object matching ‘make_format’ was found
It was found in the following places
namespace:lubridate
with value
function (order)
{
order <- strsplit(order, "")[[1]]
formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y",
"%Y"))[order]
grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
}
<environment: namespace:lubridate>
Wow we got strsplit-ting, expand-ing.grid-s, paste-ing, ifelse-ing, unname-ing etc. plus a Whole Lotta Error Checking Going On (play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm tasty but it comes with a price, speed.
Compare that to as.POSIXct:
getAnywhere(as.POSIXct) #tells us to use methods to see the business
methods('as.POSIXct') #tells us all the business
as.POSIXct.date #what I believe your code is using (I don't use dates though)
There's a lot more internal coding and less error checking going on with as.POSIXct. So you have to ask: do I want ease and safety, or speed and power? It depends on the job.
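The extra work of guessing a format before parsing can be sketched in base R. This is a deliberately simplified stand-in for what parse_date and guess_format do, not lubridate's actual implementation; guess_and_parse and the candidate formats are assumptions made for illustration.

```r
# Hypothetical helper: try candidate formats on a sample, then parse everything
guess_and_parse <- function(x, formats, tz = "UTC") {
  for (fmt in formats) {
    # Trial parse on the first 100 values only, as parse_date does with head(x, 100)
    trial <- strptime(head(x, 100), fmt, tz = tz)
    if (!any(is.na(trial))) {
      return(as.POSIXct(strptime(x, fmt, tz = tz)))
    }
  }
  stop("no candidate format matched")
}

dates <- c("01-02-2010", "15-03-2010")
# "%d-%b-%Y" expects a month name and fails, so "%d-%m-%Y" is used
parsed <- guess_and_parse(dates, c("%d-%b-%Y", "%d-%m-%Y"))
format(parsed, "%Y-%m-%d")
#[1] "2010-02-01" "2010-03-15"
```

Every rejected candidate costs an extra trial parse, which is part of the convenience overhead that a direct as.POSIXct call with a known format avoids.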
@Tyler's answer is correct. Here's some more info, including a tip on making lubridate faster, from the help file:
" Lubridate has an inbuilt very fast POSIX parser, ported from the
fasttime package by Simon Urbanek. This functionality is as yet
optional and could be activated with options(lubridate.fasttime =
TRUE). Lubridate will automatically detect POSIX strings and use fast
parser instead of the default strptime utility. "