Apply over xts with multiple columns

I'm getting a strange error that I can't understand. Here are the variables and what they mean:
ts: an xts object
range.matrix: a matrix with two columns and n rows (n is only known at execution time)
range.matrix contains date ranges: the first column is the start of a range and the second column is its end. The goal is to slice the ts time series by the ranges in range.matrix and get a list with all the slices.
It fails with some ranges but not with others, and it fails with one-row matrices. The error message is:
Error in array(ans, c(len.a%/%d2, d.ans), if (!is.null(names(dn.ans))
length of 'dimnames' 1 not equal to array extent
You can reproduce it with this toy example (range.matrix contains numbers that are converted with as.Date):
library(xts)
ts <- xts(cbind('a' = c(1,2,3,4,5,6,7,8), 'b' = c(1,2,3,4,5,6,7,8), 'c' = c(1,2,3,4,5,6,7,8)),
order.by = seq(as.Date('2017-01-01'), by = 'day', length.out = 8))
range.matrix <- matrix(c(16314,17286), ncol = 2,byrow = TRUE) # Fails. Range: "2014-09-01/2017-04-30"
range.matrix <- matrix(c(16314,17236,16314,17286), ncol = 2,byrow = TRUE) # Fails. Range: "2014-09-01/2017-03-11" and "2014-09-01/2017-04-30"
range.matrix <- matrix(c(16314,17236,17237,17286), ncol = 2,byrow = TRUE) # does not fail. "2014-09-01/2017-03-11" and "2017-03-12/2017-04-30"
apply(range.matrix,
1,
function(r) {
ts[paste0(as.Date(r[1], origin = '1970-01-01'), '/', as.Date(r[2], origin = '1970-01-01'))] # origin is required for numbers in older R versions
})
Any clue? It seems to have to do with dimnames, but I can't find the solution.

Try this instead, and you won't have issues:
lapply(split(range.matrix, row(range.matrix)), function(r) {
ts[paste0(as.Date(r[1], origin = '1970-01-01'), '/', as.Date(r[2], origin = '1970-01-01'))]})
Personally I would not use apply on xts objects the way you want to (I'd do the above; lapply is much more natural).
apply is meant for arrays, and an xts object is not just a matrix (array): it also carries a time index and other attributes that give xts its power. You could use something like coredata on the xts object so that only the underlying matrix is returned to the apply call. Then you won't get errors, but the result doesn't make much sense: when all slices have the same number of rows, apply flattens each slice into one column of a single matrix, and the dates are lost.
apply(range.matrix,
1,
function(r) {
res <- ts[paste0(as.Date(r[1], origin = '1970-01-01'), '/', as.Date(r[2], origin = '1970-01-01'))]
coredata(res)
})
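The failures in the question seem to line up with one rule of apply(): when every call returns a result of the same length, apply() tries to simplify all results into a single array, and only when the lengths differ does it fall back to a list. That matches the toy cases: in the failing ones every range covers all eight rows of ts, while in the working one the second range is empty. The dimnames error appears to come from that simplification step. The base R sketch below uses toy data, and slice() is a hypothetical placeholder for the xts subsetting. It shows both behaviours, plus two ways to always get a list: the simplify argument (which assumes R 4.1.0 or later) and lapply() over row indices.

```r
# Toy stand-in for range.matrix: each row holds a start and an end value
m <- matrix(c(1, 3,
              4, 6), ncol = 2, byrow = TRUE)

# Hypothetical placeholder for ts[paste0(start, '/', end)]
slice <- function(r) seq(r[1], r[2])

# Both rows give length-3 results, so apply() binds them into a 3 x 2 matrix
simplified <- apply(m, 1, slice)

# simplify = FALSE (R >= 4.1.0) keeps one list element per row instead
kept <- apply(m, 1, slice, simplify = FALSE)

# Same result without apply(): iterate over the row indices
kept2 <- lapply(seq_len(nrow(m)), function(i) slice(m[i, ]))
```

With the real data, putting the xts subsetting inside the lapply() call in place of slice() should return a list of xts objects, one per row of range.matrix, whatever the slice lengths are.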

Related

I'm looping through a vector and trying to use each element as a dataframe column name. How do I get each element to be read after the $ in df3$x?

Here is my code:
for (x in col_names){
time_series <- ts(df3$x, frequency=1, start = df3$index[1])
}
When I run this I get the following error:
Error in ts(df3$x, frequency = 1, start = df3$index[1]): 'ts' object must have one or more observations
Traceback:
1. ts(df3$x, frequency = 1, start = df3$index[1])
2. stop("'ts' object must have one or more observations")
I believe this is because R does not read x in df3$x as my loop variable. How do I loop through a vector and use each element of that vector as a column name of my data frame?

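It should be done with [[. The $ operator does not evaluate its right-hand side, so df3$x looks for a column literally named x and returns NULL, which is why ts() complains that it has no observations.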
for (x in col_names){
time_series <- ts(df3[[x]], frequency=1, start = df3$index[1])
}
assuming that col_names is a character vector of column names.
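A minimal illustration of the difference between $ and [[, using a hypothetical toy data frame:

```r
# Hypothetical toy data with an index column and one value column
df3 <- data.frame(index = 2001:2003, a = c(5, 6, 7))
x <- "a"

by_dollar <- df3$x      # NULL: $ looks for a column literally named "x"
by_bracket <- df3[[x]]  # 5 6 7: [[ evaluates x and uses its value, "a"
```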
In the OP's code, the time_series object is overwritten in each iteration. It would be better to store the results in a list:
time_series_list <- vector('list', length(col_names))
names(time_series_list) <- col_names
for (x in col_names){
time_series_list[[x]] <- ts(df3[[x]], frequency=1, start = df3$index[1])
}
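The same list can also be built without pre-allocating it, by calling lapply() over the column names; setNames() makes the result named after the columns. A sketch with hypothetical toy data standing in for df3 and col_names:

```r
# Hypothetical toy data: an index column plus two value columns
df3 <- data.frame(index = 2001:2005,
                  a = c(1, 3, 2, 5, 4),
                  b = c(10, 20, 30, 40, 50))
col_names <- c("a", "b")

# One ts per column, all starting at the first index value
time_series_list <- lapply(setNames(col_names, col_names), function(x) {
  ts(df3[[x]], frequency = 1, start = df3$index[1])
})
```

Access works the same way as with the loop version, e.g. time_series_list[["a"]].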

Rbindlist error due to string to date conversion in a loop

I'm building a fairly complex script that loops over 10 to 1000 files and calculates a whole bunch of summary statistics for each file based on 6 grouping columns. That all works fine, but inside the double apply structure I'm also trying to extract the date from the file name, convert it to a date format, and add it as a column to each data frame.
Without the date conversion, both my full code and this example code work fine, but with the conversion in place the loop suddenly produces strange errors.
I have tried dozens of ways to make it work. Converting a single string to a date is normally not a problem for me, but how do I make this work in this loop structure?
At first I thought the date conversion itself didn't work, but it does; it is the rbindlist call that runs into problems.
Error in rbindlist(ClusterResultlist[[cl]]) :
Column 2 of item 1 is length 11, inconsistent with first column of that item which is length 10. rbind/rbindlist doesn't recycle as it already expects each item to be a uniform list, data.frame or data.table
I have no clue why it's claiming that there is a difference in length, or how to solve it.
Question: How do I convert the strings to Date format, either inside the loops or afterwards?
my code:
library(data.table) # provides rbindlist
myfiles <- list("PICO in situ 55 10 100 100 100 2016-05-06 19u03_clustered_newtest1.csv", "PICO in situ 55 10 100 100 100 2016-05-07 19u03_clustered_newtest1.csv")
## list of clustering columns to summarize over
Clusterlist <- c('Cluster_FP1', 'Cl_names_FP1', 'GR_names_FP1', 'Cluster_FP2', 'Cl_names_FP2', 'GR_names_FP2') #
ClusterResultlist <- vector("list", length(Clusterlist))
names(ClusterResultlist) <- Clusterlist
SummarizeData <- function(y){
lapply(Clusterlist, function(z) {
datetime <- substr(y, nchar(y) -38, nchar(y) -23)
FullCounts <- data.frame(DummyIndex = 1:10)
FullCounts$DateTime <- strptime(datetime,format = "%Y-%m-%d %Hu%M")
ClusterResultlist[[z]][[y]] <<- FullCounts
})}
# run the function over all files
mapply(SummarizeData, y = myfiles)
# create 6 main dataframes out of all sub data frames
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]] <<- rbindlist(ClusterResultlist[[cl]]) })
UPDATE:
We have two (partial) solutions now, but I believe they will not be as fast as rbindlist on my actual large data object.
I tried to do the conversion outside the loops, on the final ClusterResultlist, but that throws this warning:
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]] <<- rbindlist(ClusterResultlist[[cl]]) })
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]]$DateTime <<- strptime(ClusterResultlist[[cl]]$DateTime,format = "%Y-%m-%d %Hu%M") })
In `[<-.data.table`(x, j = name, value = value) :
Supplied 11 items to be assigned to 20 items of column 'DateTime' (recycled leaving remainder of 9 items).

Converting the date with the help of lubridate fixes the problem with rbindlist.
Replace:
FullCounts$DateTime <- strptime(datetime,format = "%Y-%m-%d %Hu%M")
With:
FullCounts$DateTime <- lubridate::ymd_hms(strptime(datetime,format = "%Y-%m-%d %Hu%M"))
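The underlying issue seems to be that strptime() returns a POSIXlt object, which is stored as a list of date-time components rather than as one number per value. rbindlist() therefore sees a list column whose length is the number of components (11 in the error message) instead of 10 rows. Converting to POSIXct should work just as well in base R, without lubridate. A minimal sketch using the date string from the question's first file name:

```r
datetime <- "2016-05-06 19u03"  # as extracted by substr() in the question

lt <- strptime(datetime, format = "%Y-%m-%d %Hu%M")  # POSIXlt: a list underneath
ct <- as.POSIXct(lt)                                 # POSIXct: one number per value

FullCounts <- data.frame(DummyIndex = 1:10)
FullCounts$DateTime <- ct  # recycled to 10 rows like any atomic column
```

If the date is kept as a character column inside the loop instead, wrapping the later strptime() call in as.POSIXct() should likewise avoid the "Supplied 11 items" warning from the UPDATE.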

How about using rbind instead of rbindlist?
lapply(Clusterlist, function(cl) ClusterResultlist[[cl]] <<- do.call(rbind, ClusterResultlist[[cl]]))

R: create a named vector containing the means of multiple numeric vectors

I have over 20 numeric vectors, each consisting of a series of values. Each vector is distinguished by a letter, e.g. val_a, val_b, val_c, etc.
I would like to put the means from each of these vectors into a single named vector. I could of course do this in a laborious manner like so:
obs <- c("val_a" = round(mean(val_a),3),
"val_b" = round(mean(val_b),3),
"val_c" = round(mean(val_c),3))
But with 20 vectors this becomes tedious to write out, not to mention inelegant. How can I create the named vector more succinctly? I have made an attempt using a for loop:
obs <- c(for (j in 1:20) {
assign(paste("val",letters[j], sep = "_"),
mean(as.name(paste('val',letters[j], sep = '_'))),)
})
In the second argument passed to assign, as.name is used to remove the quotation marks from the output of paste, so that it refers to the numeric vector whose mean I want, e.g. val_a. But I get this warning:
Warning messages:
1: In mean.default(as.name(paste("val", letters[j], sep = "_"))) :
argument is not numeric or logical: returning NA
Does anyone know how to accomplish this?

Solution
To build on bouncyball's comment so you have a full answer, you can do this:
sapply(paste('val', letters[1:20], sep='_'), function(x) round(mean(get(x)), 3))
Explanation
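For an object called x in your environment, get("x") will return x itself (see help("get")). as.name(), by contrast, only creates a symbol, and mean() receives that symbol rather than the vector it names, which is why it returns NA. We can apply get() to every element of paste('val', letters[1:20], sep='_') using sapply(), or, if you like, a loop.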
Example
val_a <- rnorm(100)
val_b <- rnorm(100)
val_c <- rnorm(100)
sapply(paste('val', letters[1:3], sep='_'), function(x) round(mean(get(x)), 3))
 val_a  val_b  val_c
-0.093 -0.156 -0.098
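A variant of the same idea, sketched with hypothetical toy vectors: mget() fetches several objects by name at once and returns a named list, so vapply() can produce the named vector directly while checking that each mean is a single number.

```r
# Hypothetical toy vectors standing in for val_a, val_b, ...
val_a <- c(1, 2, 3)
val_b <- c(10, 20)
val_c <- c(0.1234, 0.5678)

var_names <- paste0("val_", letters[1:3])

# mget() returns a list named after var_names; vapply() keeps those names
obs <- vapply(mget(var_names, envir = environment()),
              function(v) round(mean(v), 3),
              numeric(1))
```

Keeping the vectors in one named list from the start would avoid looking them up by name at all; sapply() over that list then does the same job.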

R: custom proxy function in dtw for warping

I have two columns of data, Tm and Ts, and I want to apply the dtw algorithm with a different distance function. The proxy package provides this possibility, but I can't understand why it gives me an error.
I have 2 vectors of data with the same length:
Tm Ts
301.0607 300.6008
301.3406 300.6515
301.5912 300.7289
301.5777 300.8506
301.5996 301.0158
301.6414 301.2103
301.7181 301.4113
myDTW<-function(x,y)(diff(x,lag=1,difference=1)-diff(y,lag=1,difference=1))^2
pr_DB$set_entry(FUN = myDTW, names = c("test_myDTW", "myDTW"))
Alignment<-dtw(a$Ts,b$Tm,dist.method="test_myDTW",keep.internals=TRUE)
Error in do.call(".External", c(list(CFUN, x, y, pairwise,
if (!is.function(method)) get(method) else method), :
not a scalar return value
diff() changes the length of each vector from n to n-1, but both vectors are changed, so I don't think the problem is a mismatch in vector lengths.
Do you have any suggestions?

The error is explicit :
not a scalar return value
Your myDTW doesn't return a scalar. You need to define it as a valid distance function. If you change it to something like:
myDTW <- function(x,y){
res <- (diff(x,lag=1,difference=1)
-diff(y,lag=1,difference=1))^2
sum(res) ## I return the sum of squares here
}
It will work. I also think you need to use modify_entry to modify the method value in the registry.
dat <- read.table(text='Tm Ts
301.0607 300.6008
301.3406 300.6515
301.5912 300.7289
301.5777 300.8506
301.5996 301.0158
301.6414 301.2103
301.7181 301.4113',header=TRUE)
myDTW <- function(x,y){
res <- (diff(x,lag=1,difference=1)
-diff(y,lag=1,difference=1))^2
sum(res)
}
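library(dtw) ## also attaches proxy, which provides pr_DB
## modify_entry assumes test_myDTW was already registered with set_entry, as in the question
pr_DB$modify_entry(FUN = myDTW, names = c("test_myDTW", "myDTW"))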
## I change a and b to dat here
dtw(dat$Ts,dat$Tm,dist.method="test_myDTW",keep.internals=TRUE)
The result is :
DTW alignment object
Alignment size (query x reference): 7 x 7
Call: dtw(x = dat$Ts, y = dat$Tm, dist.method = "test_myDTW", keep.internals = TRUE)
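One caveat, as far as I can tell: with two plain vectors, the local cost matrix is computed by calling the distance function on one element of each series at a time. diff() of a single value is an empty vector, which would also explain the original "not a scalar return value" error, and with sum() it becomes 0 for every pair, so the alignment above is not driven by the data. If the goal is a distance on first differences, one option is to difference both whole series first and build the local cost matrix yourself; dtw() also accepts a local cost matrix as its first argument. A base R sketch of that cost matrix:

```r
dat <- read.table(text = "Tm Ts
301.0607 300.6008
301.3406 300.6515
301.5912 300.7289
301.5777 300.8506
301.5996 301.0158
301.6414 301.2103
301.7181 301.4113", header = TRUE)

# What the distance function sees per pair: diff() of one value is empty
single_diff <- diff(dat$Ts[1])

# Difference each whole series once, before any pairwise comparison
d_ts <- diff(dat$Ts)
d_tm <- diff(dat$Tm)

# Local cost matrix: squared gap between every pair of first differences
# (rows follow Ts, the query; columns follow Tm, the reference)
cost <- outer(d_ts, d_tm, function(a, b) (a - b)^2)
```

Passing cost to dtw(), for example dtw(cost, keep.internals = TRUE), would then align the two differenced series, each of length 6 instead of 7.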

Avoiding Looping Through Every row and column

I came across this function a while back; it was created to fix PCA values. The problem with the function is that it isn't compatible with xts time series objects.
amend <- function(result) {
result.m <- as.matrix(result)
n <- dim(result.m)[1]
delta <- apply(abs(result.m[-1,] - result.m[-n,]), 1, sum)
delta.1 <- apply(abs(result.m[-1,] + result.m[-n,]), 1, sum)
signs <- c(1, cumprod(rep(-1, n-1) ^ (delta.1 <= delta)))
zoo(result * signs)
}
The full example can be found here: https://stats.stackexchange.com/questions/34396/im-getting-jumpy-loadings-in-rollapply-pca-in-r-can-i-fix-it
The problem is that applying the function to an xts object with multiple columns and rows doesn't solve the problem. Is there an elegant way of applying the algorithm to a matrix of xts objects?
My current solution, for a single column with multiple rows, is to loop through it row by row, which is slow and tedious. Imagine having to do it column by column as well.
Thanks,
Here is some code to get you started:
rm(list=ls())
require(RCurl)
sit = getURLContent('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', binary=TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)
con = gzcon(rawConnection(sit, 'rb'))
source(con)
close(con)
load.packages('quantmod')
data <- new.env()
tickers<-spl("VTI,IEF,VNQ,TLT")
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='remove.na', dates='1990::2013')
prices<-data$prices[,-10] #don't include cash
retmat<-na.omit(prices/mlag(prices) - 1)
rollapply(retmat, 500, function(x) summary(princomp(x))$loadings[, 1], by.column = FALSE, align = "right") -> princomproll
require(lattice)
xyplot(amend(princomproll))
Plotting princomproll will show the jumpy loadings...

It isn't very obvious how the amend function relates to the script below it (since it isn't called there), or what you are trying to achieve. There are a couple of small changes that can be made. I haven't profiled the difference, but it's a little more readable if nothing else.
You remove the first and last rows of the result twice.
rowSums might be slightly more efficient for getting the row sums than apply.
rep.int is a little bit faster than rep.
amend <- function(result) {
result_matrix <- as.matrix(result)
n <- nrow(result_matrix)
without_first_row <- result_matrix[-1,]
without_last_row <- result_matrix[-n,]
delta_minus <- rowSums(abs(without_first_row - without_last_row))
delta_plus <- rowSums(abs(without_first_row + without_last_row))
signs <- c(1, cumprod(rep.int(-1, n-1) ^ (delta_plus <= delta_minus)))
zoo(result * signs) # multiply the original object so its time index is kept
}
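Both versions already work on every row and every column at once: the absolute differences are summed across all columns and the signs come from a vectorised cumprod(), so no explicit loop should be needed. One thing that may be pushing toward looping, though, is single-column input: indexing with [-1, ] drops an n x 1 matrix to a plain vector, after which rowSums() or apply(..., 1, sum) fails. A base R sketch of the same sign-flipping rule with drop = FALSE; amend_matrix is a hypothetical name, and it assumes the caller re-attaches the time index afterwards:

```r
# Hypothetical helper: same sign-flipping rule as amend(), on a plain matrix.
# Re-wrap the result yourself, e.g. xts(out, order.by = index(x)).
amend_matrix <- function(m) {
  m <- as.matrix(m)
  n <- nrow(m)
  if (n < 2) return(m)  # nothing to compare against

  # drop = FALSE keeps single-column input as an (n - 1) x 1 matrix
  later   <- m[-1, , drop = FALSE]
  earlier <- m[-n, , drop = FALSE]

  delta_minus <- rowSums(abs(later - earlier))
  delta_plus  <- rowSums(abs(later + earlier))

  # Flip when a row is closer to the negated previous row;
  # cumprod() carries every earlier flip forward
  signs <- c(1, cumprod(ifelse(delta_plus <= delta_minus, -1, 1)))
  m * signs  # signs recycles down each column, i.e. one sign per row
}
```

For example, amend_matrix(rbind(c(1, 2), c(-1, -2), c(-1.1, -2.1))) flips the last two rows, giving rows (1, 2), (1, 2) and (1.1, 2.1).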
