xts operations yield wrong result - r

Assume I have three xts objects a, m, and s, indexed with the same time slots. I want to compute abs((a*20)-m)/s. This works in the following simple case:
bla <- data.frame(c("2016-09-03 13:00", "2016-09-03 13:10", "2016-09-03 13:20"),c(1,2,3), c(4,5,6), c(7,8,9))
names(bla) <- c('ts','lin','qua','cub')
a <- as.xts(x = bla[,c('lin','qua','cub')], order.by=as.POSIXct(bla$ts))
... similar for m and s...
abs((a*20)-m)/s
gives the correct results.
When I go to my real data, I see different behaviour:
> class(a)
[1] "xts" "zoo"
> class(m)
[1] "xts" "zoo"
> class(s)
[1] "xts" "zoo"
> dim(a)
[1] 1 4650
> dim(m)
[1] 1 4650
> dim(s)
[1] 1 4650
Also the column names are the same:
> setdiff(names(a),names(m))
character(0)
> setdiff(names(m),names(s))
character(0)
Now when I do n <- abs((a*20)-m)/s I get
> n[1,feature]
feature
2016-09-08 14:00:00 12687075516
but if I do the computation by hand:
> aa <- coredata((a*20)[1,feature])[1,1]
> mm <- coredata(m[1,feature])[1,1]
> ss <- coredata(s[1,feature])[1,1]
> abs(aa-mm)/ss
feature
0.0005893713
Just to give the original values:
> a[1,feature]
feature
2016-09-08 14:00:00 27955015680
> m[1,feature]
feature
2016-09-08 14:00:00 559150430034
> s[1,feature]
feature
2016-09-08 14:00:00 85033719103
Can anyone explain this discrepancy?
Thanks a lot
Norbert

Self-answer: my mistake was assuming that xts is smart enough to match columns by name in an operation such as a/b. It is not.
> a
                    lin qua cub
2016-09-03 13:00:00   1   4   7
2016-09-03 13:10:00   2   5   8
2016-09-03 13:20:00   3   6   9
> b
                    qua lin cub
2016-09-03 13:00:00   2   3   4
2016-09-03 13:10:00   2   3   4
2016-09-03 13:20:00   2   3   4
> a/b
                    lin      qua  cub
2016-09-03 13:00:00 0.5 1.333333 1.75
2016-09-03 13:10:00 1.0 1.666667 2.00
2016-09-03 13:20:00 1.5 2.000000 2.25
Division is done on the underlying matrix, position by position, without regard to column names. That is why the results are wrong even when the two sets of column names coincide.
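A minimal sketch of the fix, assuming both objects carry the same set of column names: reorder the columns of one operand to match the other before doing the arithmetic. Plain matrices are used here so the snippet runs without any package; the variable names positional and aligned are only illustrative. The same indexing, b[, colnames(a)], should work on xts objects as well.

```r
# Same values as in the example above, but as plain matrices.
a <- matrix(1:9, nrow = 3,
            dimnames = list(NULL, c("lin", "qua", "cub")))
b <- matrix(rep(c(2, 3, 4), each = 3), nrow = 3,
            dimnames = list(NULL, c("qua", "lin", "cub")))

# Positional division: column 1 of a is divided by column 1 of b,
# even though one is "lin" and the other is "qua".
positional <- a / b

# Guard against differing column sets, then align b to a by name.
stopifnot(setequal(colnames(a), colnames(b)))
aligned <- a / b[, colnames(a)]

positional
aligned
```

Checking setequal() first avoids a subscript error, or a silent mismatch, when the column sets differ.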

Related

Recode Date (time) variable into a new discrete variable

I have a time variable: "00:00:29","00:06:39","20:43:15"....
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 7:00:00 - 3
thanks for any idea :)
Assuming the time variables are strings as shown, this seems to work:
secNr <- function(x){ sum(as.numeric(unlist(strsplit(x,":",fixed=TRUE))) * c(3600,60,1)) }
workShift <- function(x)
{
n <- which.max(secNr(x) >= c(secNr("23:00:00"),secNr("20:00:00"),secNr("13:00:00"),secNr("07:00:00"),secNr("00:00:00")))
c(3,NA,2,1,3)[n]
}
"workShift" computes the work shift of one such time string. If you have a vector of time strings, use "sapply". Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
0:37:00 1:17:00 2:35:00 3:09:00 4:08:00 5:28:00 6:03:00 7:43:00 8:27:00 9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00 16:56:00
3 3 3 3 3 3 3 1 1 1 1 1 1 2 2 2 2
17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
2 2 2 NA NA NA 3
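As an alternative sketch, the same mapping can be vectorized with findInterval(), so that sapply is not needed. The names to_seconds and work_shift are hypothetical, and the gap between 20:00 and 23:00 is kept as NA, as in the answer above.

```r
# Convert "H:MM:SS" strings to seconds since midnight (vectorized).
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Map each time to a shift. Intervals are closed on the left,
# so "07:00:00" already belongs to shift 1.
work_shift <- function(x) {
  breaks <- to_seconds(c("00:00:00", "07:00:00", "13:00:00",
                         "20:00:00", "23:00:00"))
  shifts <- c(3, 1, 2, NA, 3)   # 20:00 to 23:00 has no shift assigned
  shifts[findInterval(to_seconds(x), breaks)]
}

work_shift(c("00:00:29", "00:06:39", "20:43:15", "07:00:00", "13:30:00"))
```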

As.XTS from Matrix - Error - Adds time and timezone info

For some reason I do not understand, when I run as.xts to convert a matrix that has dates as rownames, the resulting index is a date-time with a time zone. Since this differs from the index I started with, merge/cbind will not work.
Can someone point out what I am doing wrong?
> class(x)
[1] "xts" "zoo"
> head(x)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 0.042255791 0.017219585 0.17841600 0.010806168 0.04960026
2005-08-31 0.034117087 0.009951766 0.18476766 0.015245222 0.03825968
2005-09-30 -0.029594066 0.008697349 0.22851906 0.009769765 0.02944754
2005-10-31 -0.015653740 0.019966664 0.09314327 -0.012705172 0.01640395
2005-11-30 -0.005593003 0.005932542 0.05437377 -0.005209811 0.03173972
2005-12-31 0.005084193 0.021293537 0.05672958 0.002592639 0.04045477
> head(index(x))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> temp=t(apply(-x, 1, rank, na.last = "keep"))
> class(temp)
[1] "matrix"
> head(temp)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(rownames(temp))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> y=as.xts(temp)
> class(y)
[1] "xts" "zoo"
> head(y)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(index(y))
[1] "2005-07-31 BST" "2005-08-31 BST" "2005-09-30 BST" "2005-10-31 GMT" "2005-11-30 GMT" "2005-12-31 GMT"
as.xts.matrix has a dateFormat argument that defaults to "POSIXct", so it assumes the rownames of your matrix are datetimes. If you want them to simply be dates, specify dateFormat="Date" in your as.xts call.
y <- as.xts(temp, dateFormat="Date")

Warnings when using custom function to every row of a table using dplyr?

I am trying to replicate something like this with a custom function but I am getting errors. I have the following data frame
> dd
   datetimeofdeath injurydatetime
1                   2/10/05 17:30
2                   2/13/05 19:15
3                    2/15/05 1:10
4    2/24/05 21:00  2/16/05 20:36
5                    3/11/05 0:45
6                   3/19/05 23:05
7                   3/19/05 23:13
8                   3/23/05 20:51
9                   3/31/05 11:30
10                    4/9/05 3:07
typeof() reports integer for both columns, yet they have levels as if they were factors. This could be the root of my problem, but I am not sure.
> typeof(dd$datetimeofdeath)
[1] "integer"
> typeof(dd$injurydatetime)
[1] "integer"
> dd$injurydatetime
[1] 2/10/05 17:30 2/13/05 19:15 2/15/05 1:10 2/16/05 20:36 3/11/05 0:45 3/19/05 23:05 3/19/05 23:13 3/23/05 20:51 3/31/05 11:30
[10] 4/9/05 3:07
549 Levels: 1/1/07 18:52 1/1/07 20:51 1/1/08 17:55 1/1/11 15:25 1/1/12 0:22 1/1/12 22:58 1/11/06 23:50 1/11/07 6:26 ... 9/9/10 8:15
Now I would like to apply the following function rowwise()
library(lubridate)
library(dplyr)
get_time_alive = function(datetimeofdeath, injurydatetime)
{
if(as.character(datetimeofdeath) == "" | as.character(injurydatetime) == "") return(NA)
time_of_death = parse_date_time(as.character(datetimeofdeath), "%m/%d/%y %H:%M")
time_of_injury = parse_date_time(as.character(injurydatetime), "%m/%d/%y %H:%M")
time_alive = as.duration(new_interval(time_of_injury,time_of_death))
time_alive_hours = as.numeric(time_alive) / (60*60)
return(time_alive_hours)
}
This works on individual rows, but not when I do the operation rowwise.
> get_time_alive(dd$datetimeofdeath[1], dd$injurydatetime[1])
[1] NA
> get_time_alive(dd$datetimeofdeath[4], dd$injurydatetime[4])
[1] 192.4
> dd = dd %>% rowwise() %>% dplyr::mutate(time_alive_hours=get_time_alive(datetimeofdeath, injurydatetime))
There were 20 warnings (use warnings() to see them)
> dd
Source: local data frame [10 x 3]
Groups:
   datetimeofdeath injurydatetime time_alive_hours
1                   2/10/05 17:30               NA
2                   2/13/05 19:15               NA
3                    2/15/05 1:10               NA
4    2/24/05 21:00  2/16/05 20:36               NA
5                    3/11/05 0:45               NA
6                   3/19/05 23:05               NA
7                   3/19/05 23:13               NA
8                   3/23/05 20:51               NA
9                   3/31/05 11:30               NA
10                    4/9/05 3:07               NA
As you can see the fourth element is NA even though when I applied my custom function to it by itself I got 192.4. Why is my custom function failing here?
I think you can simplify your code a lot and just use something like this:
dd %>%
mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>%
mutate(time_alive = datetimeofdeath - injurydatetime)
# datetimeofdeath injurydatetime time_alive
#1 <NA> 2005-02-15 01:10:00 NA days
#2 2005-02-24 21:00:00 2005-02-16 20:36:00 8.016667 days
#3 <NA> 2005-03-11 00:45:00 NA days
Side notes:
I shortened your input data because it is not easy to copy; I took only the three rows that you also see in my answer.
If you want "time_alive" in hours, use mutate(time_alive = difftime(datetimeofdeath, injurydatetime, units = "hours")) as the last mutate. A plain subtraction lets R pick the units automatically, so multiplying the result by 24 is not reliable.
With this code there is no need for rowwise(), which should also make it faster.
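For reference, here is a base R sketch of the same idea that needs neither dplyr nor lubridate. It assumes the columns are factors holding strings in the "%m/%d/%y %H:%M" layout, with an empty string for a missing value; the UTC time zone and the three sample rows are assumptions made for the example.

```r
# Three rows in the layout of the question; columns are factors.
dd <- data.frame(
  datetimeofdeath = c("", "2/24/05 21:00", ""),
  injurydatetime  = c("2/15/05 1:10", "2/16/05 20:36", "3/11/05 0:45"),
  stringsAsFactors = TRUE
)

fmt <- "%m/%d/%y %H:%M"

# as.character() turns factor levels back into strings;
# empty strings parse to NA.
death  <- as.POSIXct(as.character(dd$datetimeofdeath), format = fmt, tz = "UTC")
injury <- as.POSIXct(as.character(dd$injurydatetime),  format = fmt, tz = "UTC")

# Ask for hours explicitly instead of relying on automatic units.
dd$time_alive_hours <- as.numeric(difftime(death, injury, units = "hours"))
dd
```

The whole column is computed in one vectorized call, so no row-wise evaluation is involved.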

Extract rows from matrix based on if condition applied to each row in R

Could you help me figure out why the following doesn't work? I have a 2528x3 matrix uniqueitems which looks like this:
Number Created Customer
=========== =================== ============
31464686486 2013-10-25 10:00:00 john#john.de
...
What I'd like to do: Go through every row, check if Created is more recent than a given time and, if so, write the row into a new table newerthantable. Here's my code:
library(lubridate);
newerthan <- function(x) {
times <- ymd_hms(uniqueitems[,2])
newerthantable <- matrix(data=NA,ncol=3,nrow=1)
i <- 1;
while (i <= nrow(uniqueitems)) {
if (x < times[i]) {
newerthantable <- rbind(newerthantable,uniqueitems[i,])
}
i <- i + 1;
}
}
But newerthan("2013-10-24 14:00:00") doesn't have the desired effect :(, nothing is written in newerthantable. Why?
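Your function builds newerthantable as a local variable and never returns it, so nothing is visible after the call. Beyond that, loops are rarely needed in R. You can achieve the same result with vectorized operations or, as in this case, with subsetting.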
Setup sample data frame:
number <- c(1:10)
created <- seq(as.POSIXct("2013-01-01 10:01"), length.out=10, by="26 hours")
customer <- letters[c(1:10)]
df <- data.frame(number, created, customer)
head(df, 10)
number created customer
1 1 2013-01-01 10:01:00 a
2 2 2013-01-02 12:01:00 b
3 3 2013-01-03 14:01:00 c
4 4 2013-01-04 16:01:00 d
5 5 2013-01-05 18:01:00 e
6 6 2013-01-06 20:01:00 f
7 7 2013-01-07 22:01:00 g
8 8 2013-01-09 00:01:00 h
9 9 2013-01-10 02:01:00 i
10 10 2013-01-11 04:01:00 j
Select rows newer than a given date:
newerthantable <- df[df$created > as.POSIXct("2013-01-05 18:01:00"), ]
head(newerthantable,10)
number created customer
6 6 2013-01-06 20:01:00 f
7 7 2013-01-07 22:01:00 g
8 8 2013-01-09 00:01:00 h
9 9 2013-01-10 02:01:00 i
10 10 2013-01-11 04:01:00 j
The square brackets select rows matching our criteria (created column larger than a given date) and all columns (no column specification after the comma). Read more about subsetting operations here: http://www.ats.ucla.edu/stat/r/modules/subsetting.htm
If you want to wrap it up as a function it will look like this:
new_entries <- function(data, rows_since){
data[data$created > as.POSIXct(rows_since), ]
}
new_entries(df, "2013-01-05 18:01:00")
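The same subsetting also works directly on a character matrix like the one in the question. This is a sketch under assumptions: only the first row of uniqueitems comes from the original post, the other rows are made up, and the tz argument is added so the comparison does not depend on the local time zone.

```r
# Hypothetical stand-in for the matrix from the question;
# only the first row appears in the original post.
uniqueitems <- matrix(
  c("31464686486", "2013-10-25 10:00:00", "john#john.de",
    "31464686487", "2013-10-23 09:30:00", "jane#jane.de",
    "31464686488", "2013-10-24 15:45:00", "joe#joe.de"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Number", "Created", "Customer"))
)

# Return (rather than assign locally) the rows created after 'since'.
newerthan <- function(items, since, tz = "UTC") {
  created <- as.POSIXct(items[, "Created"], tz = tz)
  # drop = FALSE keeps a matrix even if only one row matches
  items[created > as.POSIXct(since, tz = tz), , drop = FALSE]
}

newerthantable <- newerthan(uniqueitems, "2013-10-24 14:00:00")
newerthantable
```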

which() with objects of type character

I have a question that might be too basic, but here it is...
I want to extract monthly data from a dataset like this:
Date Obs
1 2001-01-01 120
2 2001-01-02 100
3 2001-01-03 150
4 2001-01-04 175
5 2001-01-05 121
6 2001-01-06 100
I just want to get the rows from the data where I have a certain month(e.g. January), this works perfectly:
output=which(strftime(dataset[,1],"%m")=="01",dataset[,1])
However, when I try to create a loop to go through all the months using a variable that is declared as character, it doesn't work and I only get "FALSE".
value=as.character(k)
output=which(strftime(dataset[,1],"%m")==value,dataset[,1])
Do not parse dates as strings. That is too error prone. Parse dates as dates, and do logical comparisons on them.
Here is one approach, creating January to March data and sub-setting February based on a comparison:
R> output <- data.frame(date=seq(as.Date("2011-01-01"), by=7, length=10),
+ value=cumsum(runif(10)*100))
R> output
date value
1 2011-01-01 8.29916
2 2011-01-08 44.82950
3 2011-01-15 72.08662
4 2011-01-22 134.19277
5 2011-01-29 221.67744
6 2011-02-05 245.77195
7 2011-02-12 314.82081
8 2011-02-19 396.34661
9 2011-02-26 437.14286
10 2011-03-05 442.41321
R> output[ output[,"date"] >= as.Date("2011-02-01") &
+ output[,"date"] <= as.Date("2011-02-28"), ]
date value
6 2011-02-05 245.772
7 2011-02-12 314.821
8 2011-02-19 396.347
9 2011-02-26 437.143
R>
Another approach uses the xts package:
R> oo <- xts(output[,"value"], order.by=output[,"date"])
R> oo
[,1]
2011-01-01 8.29916
2011-01-08 44.82950
2011-01-15 72.08662
2011-01-22 134.19277
2011-01-29 221.67744
2011-02-05 245.77195
2011-02-12 314.82081
2011-02-19 396.34661
2011-02-26 437.14286
2011-03-05 442.41321
R> oo["2011-02-01::2011-02-28"]
[,1]
2011-02-05 245.772
2011-02-12 314.821
2011-02-19 396.347
2011-02-26 437.143
R>
xts has convenient date parsing for the index; see the package documentation for details.
I'm assuming k is an integer in 1:12. I suspect you may be better off using abbreviated month names:
value <- month.abb[k]
output <- which(strftime(dataset[,1],"%b")==value,dataset[,1])
The reason your way isn't working is that the month number returned by "%m" is zero-padded, and "1" != "01".
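If you would rather keep the numeric month, a minimal sketch is to zero-pad k with sprintf() before comparing. The small dataset and the name rows_by_month below are made up for illustration.

```r
# Small made-up dataset in the layout of the question.
dataset <- data.frame(
  Date = as.Date(c("2001-01-01", "2001-01-02", "2001-02-01",
                   "2001-03-01", "2001-03-02")),
  Obs  = c(120, 100, 150, 175, 121)
)

# For each month k, pad to two digits so that 1 becomes "01",
# which is what format(..., "%m") returns.
rows_by_month <- lapply(1:12, function(k) {
  which(format(dataset$Date, "%m") == sprintf("%02d", k))
})

rows_by_month[[1]]   # row indices for January
rows_by_month[[3]]   # row indices for March
```

Unlike "%b", the "%m" code does not depend on the locale.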
You can also use dates as dates with POSIXlt()$mon
as.POSIXlt(output$date)$mon # Note that Jan = 0 and Feb=1
[1] 0 0 0 0 0 1 1 1 1 2
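The month component can then be used for subsetting. The sketch below uses the same weekly dates as the earlier example, with placeholder values 1:10 instead of random numbers.

```r
# Weekly dates as in the example above; values are placeholders.
output <- data.frame(
  date  = seq(as.Date("2011-01-01"), by = 7, length.out = 10),
  value = 1:10
)

# $mon is zero-based (January = 0), so add 1 to get the calendar month.
month_number <- as.POSIXlt(output$date)$mon + 1

# Keep only the February rows.
feb <- output[month_number == 2, ]
feb
```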
There are several other packages such as chron, lubridate and gdata that provide date handling functions. I found the functions in lubridate particularly intuitive and less prone to errors in my clumsy hands.
