I am trying to create a lagged vector within an xts object using the lag function. It works when I define the new vector within the xts object using $ notation (e.g. x.ts$r1_lag), but it does not work when I define the new variable using square brackets, i.e. x.ts[,"r1_lag"]. See the code below:
library(xts)
x <- data.frame(date=seq(as.Date('2015-01-01'), by='days', length=100),
runif(1e2), runif(1e2), runif(1e2))
colnames(x) <- c("date", "r1", "r2", "r3")
#the following command works
x.ts <- xts(x, order.by=x$date)
x.ts$r1_lag <- lag(x.ts$r1)
# but the following does not (says subscript is out of bounds)
x.ts <- xts(x, order.by=x$date)
x.ts[,"r1_lag"] <- lag(x.ts[,"r1"])
I need to use [] notation rather than $ notation to reference the vectors because I want to run the lag transformation on vectors in more than one xts object (vectors within a list of multiple xts objects), and I can't define the new vectors within those objects using $ notation. That is, I can't define the new vectors as in the stylized loop below:
for (i in letters) {
for (j in variables) {
macro.set.ts$i$paste(j,"_L1",sep="") <- lag(macro.set.ts[[i]][,j])
macro.set.ts$i$paste(j,"_L2",sep="") <- lag(macro.set.ts[[i]][,j], 2)
macro.set.ts$i$paste(j,"_L4",sep="") <- lag(macro.set.ts[[i]][,j], 4)
}
}
Thanks!
You don't need to use [<-.xts. You can use merge instead:
for (i in letters) {
for (j in variables) {
# create all lags
mst_ij <- macro.set.ts[[i]][,j]
jL <- merge(lag(mst_ij), lag(mst_ij, 2), lag(mst_ij, 4))
colnames(jL) <- paste(j, c("L1","L2","L4"), sep="_")
# merge back with original data
macro.set.ts[[i]] <- merge(macro.set.ts[[i]], jL)
}
}
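A self-contained sketch of the merge approach above, assuming the xts package is installed. The list name macro.set.ts is kept from the question, but its contents (two xts objects a and b with hypothetical columns gdp and cpi, filled with random numbers) are made up purely so the loop can run, and it iterates over names(macro.set.ts) rather than letters.

```r
library(xts)

dates <- seq(as.Date("2015-01-01"), by = "days", length.out = 10)

# Hypothetical stand-in for macro.set.ts: a named list of xts objects
macro.set.ts <- list(
  a = xts(cbind(gdp = runif(10), cpi = runif(10)), order.by = dates),
  b = xts(cbind(gdp = runif(10), cpi = runif(10)), order.by = dates)
)
variables <- c("gdp", "cpi")

for (i in names(macro.set.ts)) {
  for (j in variables) {
    mst_ij <- macro.set.ts[[i]][, j]
    # build all lags at once, then name them from the strings j and "L1"/"L2"/"L4"
    jL <- merge(lag(mst_ij, 1), lag(mst_ij, 2), lag(mst_ij, 4))
    colnames(jL) <- paste(j, c("L1", "L2", "L4"), sep = "_")
    # merge aligns on the time index, so no [<- assignment is needed
    macro.set.ts[[i]] <- merge(macro.set.ts[[i]], jL)
  }
}

colnames(macro.set.ts$a)
```

Because the new column names are built with paste(), this works for any set of variable names held as strings, which is exactly what $ notation cannot do.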
The error is not related to the lag function. You get an error because you try to assign an xts object into a new column of another xts object. This example reproduces the error:
x.date <- seq(as.Date('2015-01-01'), by = 'days', length = 5)
x1 <- xts(data.frame(c1=runif(5)), order.by=x.date)
x2 <- xts(data.frame(c2=runif(5)), order.by=x.date)
x1[,'r2'] <- x2
## Error in `[<-.default`(`*tmp*`, , "r2",
## subscript out of bounds
I find this consistent with xts logic, because xts objects are indexed. So here it is better to merge or join the objects and preserve the indexed nature of your time series.
merge(x1,x2)
This will cbind the two time series and fix any index problem. In fact, cbind is just a merge:
identical(cbind(x1,x2), merge(x1,x2))
That said, I think it is a kind of bug that this works with the $<- operator and not with the [<- operator.
I got the same output with:
x.ts <- cbind(x.ts,lag(x.ts[,"r1"]))
And
x.ts <- transform(x.ts, r1_lag = lag(x.ts[,'r1']))
But be careful with the output: it may look the same but have an altered structure.
This should work:
x.ts <- merge(x.ts,lag(x.ts[,"r1"]))
You will then probably want to rename the last column that was added:
dimnames(x.ts)[[2]][5] <- "r1_lag"
This is the result:
> head(x.ts)
date r1 r2 r3 r1_lag
2015-01-01 "2015-01-01" "0.23171030" "0.44174424" "0.3396816640" NA
2015-01-02 "2015-01-02" "0.97292220" "0.74909452" "0.2793033421" "0.23171030"
2015-01-03 "2015-01-03" "0.52320743" "0.49288463" "0.0193637393" "0.97292220"
2015-01-04 "2015-01-04" "0.36574297" "0.69571803" "0.6411834760" "0.52320743"
2015-01-05 "2015-01-05" "0.37563137" "0.13841216" "0.3087215754" "0.36574297"
2015-01-06 "2015-01-06" "0.48089356" "0.32702759" "0.3967609401" "0.37563137"
> class(x.ts)
[1] "xts" "zoo"
Hope this helps.
Related
I have the following setup:
mydata:
today_date
r1 11.11.21
r2 11.11.21
r3 11.11.21
I want to convert a column like 'today_date' to a date using
as.Date(today_date, tryFormats = c("%d.%m.%Y")).
So I'm using the following function, which is supposed to change the corresponding column to proper dates:
myfun <- function(x){
x<- as.Date(x, tryFormats = c("%d.%m.%Y"))
}
In this function, x represents a variable corresponding to mydata$today_date.
Sadly, x does not actually refer to the object that's to be replaced, so instead of:
myfun(mydata$today_date)
I still have to use:
mydata$today_date<- myfun(mydata$today_date)
How can I manipulate the function so the as.Date()-functionality is directly applied? I'm pretty certain that the variable in myfun(x) is not properly able to represent the subsection of my dataframe that I want to change. Any help is very welcome!
Try doing this.
df <- data.frame(today_date = c("11.11.21","11.11.21","11.11.21"))
myfun <- function(df, var = 'today_date'){
df[[var]] <- as.Date(df[[var]], tryFormats = c("%d.%m.%y")) # %y: two-digit year, so "21" becomes 2021
return(df)
}
The output is
> myfun(df, "today_date")
today_date
1 2021-11-11
2 2021-11-11
3 2021-11-11
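The two-digit year in "11.11.21" is worth a closer look: %Y reads "21" literally as the year 21, whereas %y maps two-digit years into the 1969-2068 window, giving 2021 here. Below is a minimal base-R sketch; the second column other_date and the helper to_date_cols() are hypothetical, added only to show the same idea applied to several columns at once by name.

```r
# %Y takes "21" as the literal year 21; %y treats it as a two-digit year (-> 2021)
as.Date("11.11.21", format = "%d.%m.%Y")
as.Date("11.11.21", format = "%d.%m.%y")

mydata <- data.frame(today_date = c("11.11.21", "12.11.21", "13.11.21"),
                     other_date = c("01.01.20", "15.06.20", "31.12.20"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: convert the named columns of df from "dd.mm.yy" strings to Date
to_date_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.Date, format = "%d.%m.%y")
  df  # return the modified copy; the caller must assign it back
}

mydata <- to_date_cols(mydata, c("today_date", "other_date"))
str(mydata)
```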
I like the magrittr assignment pipe syntax for this.
library(magrittr)
mydata$today_date %<>% myfun()
Instead of mydata$today_date<- myfun(mydata$today_date)
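The underlying reason the question's myfun() cannot change mydata in place is that R functions work on local copies of their arguments (copy-on-modify semantics): x <- ... inside the function only rebinds the local name x. The magrittr %<>% operator is shorthand for the same reassignment. A small sketch of the effect, using the question's data and %y so that "21" is read as 2021:

```r
mydata <- data.frame(today_date = c("11.11.21", "11.11.21", "11.11.21"),
                     stringsAsFactors = FALSE)

myfun <- function(x) {
  x <- as.Date(x, format = "%d.%m.%y")  # rebinds the local x only
}

myfun(mydata$today_date)                       # value is returned invisibly and discarded
class(mydata$today_date)                       # still "character"
mydata$today_date <- myfun(mydata$today_date)  # reassignment is what updates mydata
class(mydata$today_date)                       # now "Date"
```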
I'm building complex code that loops over 10-1000 files and calculates a whole bunch of summary statistics for each file based on 6 grouping columns. That all works fine, but in the double apply structure I'm also trying to extract the date from the filename, convert it to a date format, and add it as a column to each data frame.
Without the date conversion, both my full code and this example code work fine, but with the conversion in place, the loop suddenly produces strange errors.
I have tried dozens of ways to make it work. Normally converting a single string to a date is not a problem for me, but how do I make this work in this loop structure?
At first I thought the date conversion itself didn't work, but it seems to work; it is the rbindlist code that then fails.
Error in rbindlist(ClusterResultlist[[cl]]) :
Column 2 of item 1 is length 11, inconsistent with first column of that item which is length 10. rbind/rbindlist doesn't recycle as it already expects each item to be a uniform list, data.frame or data.table
I have no clue why it's claiming that there is a difference in length, or how to solve it.
Question: How to convert the strings to Date format either inside the loops, or afterwards.
my code:
library(data.table) # for rbindlist()
myfiles <- list("PICO in situ 55 10 100 100 100 2016-05-06 19u03_clustered_newtest1.csv", "PICO in situ 55 10 100 100 100 2016-05-07 19u03_clustered_newtest1.csv")
## list of clustering columns to summarize over
Clusterlist <- c('Cluster_FP1', 'Cl_names_FP1', 'GR_names_FP1', 'Cluster_FP2', 'Cl_names_FP2', 'GR_names_FP2') #
ClusterResultlist <- vector("list", length(Clusterlist))
names(ClusterResultlist) <- Clusterlist
SummarizeData <- function(y){
lapply(Clusterlist, function(z) {
datetime <- substr(y, nchar(y) -38, nchar(y) -23)
FullCounts <- data.frame(DummyIndex = 1:10)
FullCounts$DateTime <- strptime(datetime,format = "%Y-%m-%d %Hu%M")
ClusterResultlist[[z]][[y]] <<- FullCounts
})}
# run the function over all files
mapply(SummarizeData, y = myfiles)
# create 6 main dataframes out of all sub data frames
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]] <<- rbindlist(ClusterResultlist[[cl]]) })
UPDATE:
We have two (partial) solutions now, but I believe they will not be as fast as rbindlist on my actual large data object.
I tried to do the conversion outside the loops on the final ClusterResultlist, but that throws this error:
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]] <<- rbindlist(ClusterResultlist[[cl]]) })
lapply(Clusterlist, function(cl) { ClusterResultlist[[cl]]$DateTime <<- strptime(ClusterResultlist[[cl]]$DateTime,format = "%Y-%m-%d %Hu%M") })
In `[<-.data.table`(x, j = name, value = value) :
Supplied 11 items to be assigned to 20 items of column 'DateTime' (recycled leaving remainder of 9 items).
Fixing the date with the help of lubridate fixes the problem with rbindlist.
Replace:
FullCounts$DateTime <- strptime(datetime,format = "%Y-%m-%d %Hu%M")
With:
FullCounts$DateTime <- lubridate::ymd_hms(strptime(datetime,format = "%Y-%m-%d %Hu%M"))
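A likely explanation for the length-11 complaint: strptime() returns a POSIXlt object, which is internally a list of date-time components, and that list is what rbindlist() trips over; ymd_hms() returns POSIXct, an atomic vector, which is why the replacement above helps. The sketch below is a simplified base-R variant, not the asker's full code: it uses as.POSIXct() with the same format string and combines the per-file data frames with do.call(rbind, ...) so it runs without data.table. The helper extract_datetime() is hypothetical.

```r
myfiles <- c("PICO in situ 55 10 100 100 100 2016-05-06 19u03_clustered_newtest1.csv",
             "PICO in situ 55 10 100 100 100 2016-05-07 19u03_clustered_newtest1.csv")

# strptime() gives POSIXlt: a list of components (sec, min, hour, ...) under the hood
lt <- strptime("2016-05-06 19u03", format = "%Y-%m-%d %Hu%M")
is.list(unclass(lt))

# Hypothetical helper: pull "YYYY-mm-dd HHuMM" out of the file name as POSIXct
extract_datetime <- function(y) {
  datetime <- substr(y, nchar(y) - 38, nchar(y) - 23)
  as.POSIXct(datetime, format = "%Y-%m-%d %Hu%M", tz = "UTC")
}

results <- lapply(myfiles, function(y) {
  FullCounts <- data.frame(DummyIndex = 1:10)
  FullCounts$DateTime <- extract_datetime(y)  # one POSIXct value, recycled to 10 rows
  FullCounts
})

combined <- do.call(rbind, results)
str(combined)
```

The same as.POSIXct() line should also work in place of strptime() inside the question's SummarizeData() before calling rbindlist().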
How about using rbind instead of rbindlist?
lapply(Clusterlist, function(cl) ClusterResultlist[[cl]] <<- do.call(rbind, ClusterResultlist[[cl]]))
I have a data frame of some 90 financial symbols (I'll use 3 for simplicity):
> View(syM)
symbol
1 APPL
2 YAHOO
3 IBM
I created a function that gets JSON data for these symbols and produces an output. Basically:
nX <- function(x) {
#get data for "x", format it, and store it in "nX"
nX <- x
return(nX)
}
I used a loop to get the data and store the zoo series named after each symbol accordingly.
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,]),
value = nX(x = syM[i,]))
Sys.sleep(time = 1)
}
Which results in:
[1] "APPL" "YAHOO" "IBM"
Each is a zoo series with 5 columns of data.
Further, I want to get some plotting done to each series and output the result, preferably using a for loop or something better.
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yN <- y[,2:3]
return(yN)
}
Following a similar logic to my previous loop I tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
But so far the data is not being sent to the function, only the name of the symbol, so I naturally get:
y[,2:3] : incorrect number of dimensions
I have also tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = ls(pattern = paste0(syM[i,]))))
}
With similar results. When I input the name of the series manually it does save the plot of the first symbol as "APPL.Plot".
assign(paste0(syM[1,], ".Plot"),
value = yN(y = APPL))
Consider lapply with setNames to create a named list of nX returned objects:
nX_list <- setNames(lapply(syM$symbol, nX), syM$symbol)
# OUTPUT ZOO OBJECTS BY NAMED INDEX
nX_list$APPL
nX_list$YAHOO
nX_list$IBM
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(nX_list, envir=.GlobalEnv)
For the plot function, first retrieve the object by its string name inside the function, then similarly run lapply with setNames:
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yobj <- nX_list[[y]] # IF USING ABOVE LIST (the element is already the object, so no get() needed)
# yobj <- get(y)     # IF USING SEPARATE OBJECTS (use this line instead)
yN <- yobj[,2:3]
return(yN)
}
plot_list <- setNames(lapply(syM$symbol, yN), paste0(syM$symbol, ".plot"))
# OUTPUT PLOTS BY NAMED INDEX
plot_list$APPL.plot
plot_list$YAHOO.plot
plot_list$IBM.plot
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(plot_list, envir=.GlobalEnv)
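A runnable sketch of the named-list approach. Since the JSON download is not shown, nX() here is a hypothetical stand-in that returns a random 10 x 5 matrix, and yN() is reduced to the column subset from the question. The point is that lapply() hands each object (not its name) straight to yN(), so neither get() nor assign() is needed.

```r
syM <- data.frame(symbol = c("APPL", "YAHOO", "IBM"), stringsAsFactors = FALSE)

# Hypothetical stand-in for nX(): returns a 10 x 5 matrix instead of downloaded data
nX <- function(x) {
  matrix(runif(50), ncol = 5, dimnames = list(NULL, paste0("V", 1:5)))
}

# Columns 2 and 3 of one series, as yN() does in the question
yN <- function(series) series[, 2:3]

nX_list   <- setNames(lapply(syM$symbol, nX), syM$symbol)
plot_list <- setNames(lapply(nX_list, yN), paste0(names(nX_list), ".plot"))

names(plot_list)
dim(plot_list$IBM.plot)
```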
As you note, you're calling yN with a character argument in:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
paste0(syM[i,]) is going to resolve to a character and not the zoo object it appears you're trying to reference. Instead, use something like get():
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = get(paste0(syM[i,]))))
}
Or perhaps just store your zoo objects in a list in the first place and then operate on all elements of the list with something like lapply()...
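If the series already exist as separate objects, mget() collects them into a named list in one call, after which lapply() replaces the assign()/get() loop. The two small matrices below are hypothetical placeholders for the zoo series.

```r
# Hypothetical placeholders for the zoo series created by the first loop
APPL  <- matrix(1:15, ncol = 5)
YAHOO <- matrix(16:30, ncol = 5)
symbols <- c("APPL", "YAHOO")

yN <- function(y) y[, 2:3]

# mget() looks up every name at once and returns a list named by symbol
series <- mget(symbols)
plots  <- setNames(lapply(series, yN), paste0(symbols, ".plot"))
names(plots)
```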
I started using R today, so I apologize if this is too basic.
First I construct 2 matrices, and construct a vector whose entries are these matrices. Then I try to loop over the elements of the vector, i.e. the matrices. However, when I do, I get an "argument of length 0" error.
cam <- 1:12
ped <- 13:24
dim(cam) <- c(3,4)
dim(ped) <- c(4,3)
mats <- c('cam','ped')
for (i in 1:2) {
rownames(mats[i]) <- LETTERS[1:dim(mats[i])[1]]
colnames(mats[i]) <- LETTERS[1:dim(mats[i])[2]]
}
The error text is as follows:
Error in 1:dim(mats[i])[1] : argument of length 0
The question: how do I loop over the elements of a vector when those elements are matrices? (I'm guessing I'm not calling the elements correctly.) Thank you for your patience.
The go-to option in R is to use lists:
cam <- 1:12 ; dim(cam) <- c(3,4)
# same as matrix(1:12, nrow = 3, ncol = 4)
ped <- 13:24 ; dim(ped) <- c(4,3)
# save the list ( '=' sign for naming purposes only here)
mats <- list(cam = cam, ped = ped)
# notice the double brackets '[[', which are used to pick a single list element
for (i in seq_along(mats)) {
rownames(mats[[i]]) <- LETTERS[1:dim(mats[[i]])[1]]
colnames(mats[[i]]) <- LETTERS[1:dim(mats[[i]])[2]]
}
# finally you can call the whole list at once as follows:
mats
# or separately using $ or [[
mats$cam # mats[['cam']]
mats$ped # mats[['ped']]
ALTERNATIVELY
If you really want to get crazy, you can take advantage of the get() and assign() functions: get() retrieves an object from its name given as a character string, and assign() creates (or overwrites) one.
mats <- c('cam','ped')
mats.new <- NULL # initialize a matrix placeholder
for (i in 1:length(mats)) {
mats.new <- get(mats[i]) # save as a new matrix each loop
# use dimnames with a list input to do both the row and column at once
dimnames(mats.new) <- list(LETTERS[1:dim(mats.new)[1]],
LETTERS[1:dim(mats.new)[2]])
assign(mats[i],mats.new) # create (re-write) the matrix
}
If the datasets are placed in a list we can use lapply
lst <- lapply(mget(mats), function(x) {
dimnames(x) <- list(LETTERS[seq_len(nrow(x))], LETTERS[seq_len(ncol(x))])
x})
It is better to keep them in a list. In case the original objects need to be changed:
list2env(lst, envir = .GlobalEnv)
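To see why the loop in the question stops with "argument of length 0": mats[i] is just the character string "cam", which has no dimensions, so dim(mats[i]) is NULL and 1:NULL fails. get() is what turns the name into the matrix. A short sketch:

```r
cam  <- matrix(1:12, nrow = 3)
mats <- c("cam", "ped")

dim(mats[1])       # NULL: mats[1] is the string "cam", not the matrix
dim(get(mats[1]))  # 3 4: get() looks up the object with that name

# The failing expression from the question, caught so the script keeps running
msg <- tryCatch(1:dim(mats[1])[1], error = conditionMessage)
msg
```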
Thanks in advance, and sorry if this question has been answered previously; I have looked pretty extensively. I have a dataset containing a row with concatenated information, specifically: name, color code, and some function expression. For example, one value may be:
cost#FF0033#log(x)+6.
I have all of the code to extract the information, and I end up with a vector of expressions that I would like to convert to a list of actual functions.
For example:
func.list <- list()
test.func <- c("x","x+1","x+2","x+3","x+4")
where test.func is the vector of expressions. What I would like is:
func.list[[3]]
To be equivalent to
function(x){x+3}
I know that I can create a function using:
somefunc <- function(x){eval(parse(text="x+1"))}
to convert a character value into a function. The problem comes when I try to loop through the vector to make multiple functions. Here is an example of something I tried that didn't work:
for(i in 1:length(test.func)){
temp <- test.func[i]
f <- assign(function(x){eval(expr=parse(text=temp))})
func.list[[i]] <- f
}
Based on another post (http://stats.stackexchange.com/questions/3836/how-to-create-a-vector-of-functions) I also tried this:
makefunc <- function(y){y;function(x){y}}
for(i in 1:length(test.func)){
func.list[[i]] <- assign(x=paste("f",i,sep=""),value=makefunc(eval(parse(text=test.func[i]))))
}
Which gives the following error: Error in eval(expr, envir, enclos) : object 'x' not found
The eventual goal is to take the list of functions and apply the jth function to the jth column of the data.frame, so that the user of the script can specify how to normalize each column within the concatenated information given by the column header.
Maybe initialize your list with a single generic function, and then update them using:
foo <- function(x){x+3}
> body(foo) <- quote(x+4)
> foo
function (x)
x + 4
More specifically, starting from a character, you'd probably do something like:
body(foo) <- parse(text = "x+5")
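Applied to the whole vector from the question, the template-and-body approach fits naturally into lapply(). This sketch uses parse(text = s)[[1]] to get a single call for each string before assigning it as the body.

```r
test.func <- c("x", "x+1", "x+2", "x+3", "x+4")

func.list <- lapply(test.func, function(s) {
  f <- function(x) NULL            # template with the right argument list
  body(f) <- parse(text = s)[[1]]  # replace the body with the parsed expression
  f
})

func.list[[4]](10)  # 13
```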
Just to add onto joran's answer, this is what finally worked:
test.data <- matrix(data=rep(1,25),5,5)
test.data <- data.frame(test.data)
test.func <- c("x","x+1","x+2","x+3","x+4")
func.list <- list()
for(i in 1:length(test.func)){
func.list[[i]] <- function(x){}
body(func.list[[i]]) <- parse(text=test.func[i])
}
processed <- mapply(do.call,func.list,lapply(test.data,list))
Thanks again, joran.
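For the stated goal of applying the jth function to the jth column, Map() pairs the columns and the functions element by element, and the result can be kept as a data frame with the original column names. This sketch rebuilds func.list with the template-and-body idea so it is self-contained.

```r
test.data <- data.frame(matrix(rep(1, 25), 5, 5))
test.func <- c("x", "x+1", "x+2", "x+3", "x+4")

func.list <- lapply(test.func, function(s) {
  f <- function(x) NULL
  body(f) <- parse(text = s)[[1]]
  f
})

# Map() calls func.list[[j]](test.data[[j]]) for each j; names come from test.data
processed <- as.data.frame(Map(function(col, f) f(col), test.data, func.list))
processed
```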
This is what I do:
f <- list(identity="x",plus1 = "x+1", square= "x^2")
funCreator <- function(snippet){
txt <- snippet
function(x){
exprs <- parse(text = txt)
eval(exprs)
}
}
listOfFunctions <- lapply(setNames(f,names(f)),function(x){funCreator(x)}) # I like to have some control of the names of the functions
listOfFunctions[[1]] # try to see what the actual function looks like?
library(pryr)
unenclose(listOfFunctions[[3]]) # good way to see the actual function http://adv-r.had.co.nz/Functional-programming.html
# Call your functions
listOfFunctions[[2]](3) # 3+1 = 4
do.call(listOfFunctions[[3]],list(3)) # 3^2 = 9
attach(listOfFunctions) # you can also attach your list of functions and call them by name
square(3) # 3^2 = 9
identity(7) # 7 ## masked object identity, better detach it now!
detach(listOfFunctions)
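A variant of the closure approach that parses each snippet once, when the function is created, rather than on every call, and uses with() instead of attach() so nothing masks base functions such as identity(). It is a sketch; the names in f are the same illustrative ones used above.

```r
f <- list(identity = "x", plus1 = "x+1", square = "x^2")

funCreator <- function(snippet) {
  expr <- parse(text = snippet)[[1]]  # parsed once, captured by the closure
  function(x) eval(expr)              # evaluated in the function's own frame, where x lives
}

listOfFunctions <- lapply(f, funCreator)  # lapply keeps the names from f

listOfFunctions$plus1(3)          # 4
with(listOfFunctions, square(3))  # 9, without attach()/detach()
```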