Setting an xts Index - r

Build an xts object with two rows.
library(xts)
junk<-xts(c(1,2),as.Date(c("2010-01-01","2010-05-01")))
junk
> [,1]
> 2010-01-01 1
> 2010-05-01 2
Why doesn't the following change the index for the first row?
time(junk[1])<-as.Date("2010-02-01")
junk
> [,1]
> 2010-01-01 1
> 2010-05-01 2
I realize that the following works, but why doesn't the above work?
time(junk)[1]<-as.Date("2010-02-01")
junk
> [,1]
> 2010-02-01 1
> 2010-05-01 2
Thanks,
Bill

The direct answer to the post is that the magic happens inside attr<-, as Josh says. Subsetting the object first simply creates a new object that is promptly disposed of once time<- is finished.
In addition, you can see the 'internals' of the index via the .index() function. It is essentially a vector of type double or integer that maps to POSIXct time, with some attributes attached. The class you assign is automatically coerced back and forth. This makes the internals easier to maintain, and lets you use whatever time class you need outside of it.
In general, Date will be the cleanest way to keep TZ and secs trouble out of the mix, but keep in mind that the cost of this hidden aspect falls on the function index(), which has to recreate the object you expect.
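The coercion "back and forth" can be pictured with a small base-R sketch. This is only an illustration of a Date being mapped to seconds since the epoch and back again; it is not the code xts itself runs, and the variable names are made up for the example.

```r
# A Date converted to the POSIXct scale: seconds since 1970-01-01 UTC
d <- as.Date("2010-02-01")
secs <- as.numeric(as.POSIXct(d))
secs   # 1264982400

# ...and recreated as a Date from the stored seconds
back <- as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), tz = "UTC")
back   # "2010-02-01"
```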

time(junk[1]) <- as.Date("2010-02-01")
The above doesn't change the index of the first row of junk because subsetting creates a new object--with no reference to junk--and time<-.xts replaces the index of the new object.
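The same behaviour can be reproduced in base R without xts. The sketch below uses names as a stand-in for the xts index (an analogy chosen for illustration, not something from the original thread): a replacement call on a subset leaves the original attribute untouched, while a replacement on the attribute itself modifies the object.

```r
# Base-R analogue: names play the role of the xts index here.
x <- c(a = 1, b = 2)

# Replacement call on a subset. R evaluates this roughly as
#   x[1] <- `names<-`(x[1], value = "z")
# The renamed one-element copy is temporary; writing it back with `[<-`
# only stores the value, so the names of x are unchanged.
names(x[1]) <- "z"
after_subset_first <- names(x)
after_subset_first   # "a" "b"

# Replacement call on the attribute, then subset: this modifies x itself.
names(x)[1] <- "z"
after_attr_first <- names(x)
after_attr_first     # "z" "b"
```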

The dates in a time series are not referenced with "[". They are more like rownames in data frames. They are stored in the "index" leaf of the attributes list. In addition, they are not of Date class but rather a date-time class, so you may need to use POSIXct:
> attributes(junk)$index[1] <- as.POSIXct("2010-02-01")
> junk
[,1]
2010-02-01 1
2010-05-01 2
Edit: more accurately, attributes(junk)$index is stored internally in seconds, but the time method will accept a variety of assignment classes.

Related

How to correct the initial NA value that diff generates

When I use the diff function on xts objects, the first value is NA.
I'm trying to convert a non-stationary data source to a stationary one; if anyone has another method, I would appreciate your help.
diff.xts(tb_xts$col1,log = F)
2012-12-01 NA # <-- correct this
2012-12-06 -0.211416877
2012-12-16 0.2005834963
Is there any other way to correct the initial value?
I know that by default diff starts from the second element.
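The leading NA is expected rather than wrong: the first observation has nothing before it to be differenced against, so n observations give n - 1 differences. One common way to handle it is simply to drop that row. The sketch below assumes the xts package is installed and uses made-up values in place of tb_xts$col1.

```r
library(xts)

# Hypothetical data standing in for tb_xts$col1
x <- xts(c(10, 7, 12),
         as.Date(c("2012-12-01", "2012-12-06", "2012-12-16")))

d <- diff(x)            # first row is NA: nothing precedes it
d_clean <- na.omit(d)   # drop the leading NA row
d_clean
```

Indexing with d[-1] would remove the first row as well; na.omit() has the advantage of also working when a larger lag produces more than one leading NA.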

Changing multiple columns of a data frame from class 'character' to class 'time' using chron

I have a data frame with multiple columns, some of which I need to change to 'time' class using chron so that I can retrieve basic statistics. These columns are currently times stored as characters and formatted like this: hh:mm.
Here is a subset of it as well as the list of columns that need to change:
> Data
DATE FLT TYPE REG AC DEP ARR STD STA ATD ATA
1 15-01-02 953 J C-GCPT 73M YVQ YEV 12:00 12:55 13:00 13:59
2 15-01-04 953 J C-GCPT 73M YVQ YEV 12:00 12:55 13:17 14:13
3 15-01-05 953 J C-GCPT 73M YVQ YEV 12:00 12:55 13:20 14:14
Time_list <-c("STD","STA","ATD","ATA")
Here is what I have done to change only one column (and it works):
Data$ATA <- paste0(Data$ATA, ':00')
Data$ATA<-chron(times.=Data$ATA)
class(Data$ATA)
[1] "times"
However, I would prefer to be able to do all the columns at the same time since there are many of them. I've tried multiple techniques, and some seem to work for the first part, which is pasting ':00', but it always goes wrong for the second part, using chron. I seem to have a length problem that I don't understand.
Using dmap
Data[,Time_list]<-
Data%>%
select(one_of(Time_list)) %>%
dmap(paste0,':00')
Data[,Time_list]<-
Data %>%
select(one_of(Time_list)) %>%
dmap(chron,times.=Data[,Time_list])
**Error in .f(.d[[i]], ...) :
.d[[i]] and Data[, Time_list] must have equal lengths**
Using lapply
Data[,(Time_list)] <- lapply(Data[,(Time_list)], paste0,':00')
Data[,(Time_list)] <- lapply(Data[,(Time_list)], chron, times. =Data[,(Time_list)])
**Error in FUN(X[[i]], ...) :
X[[i]] and Data[, (Time_list)] must have equal lengths**
Using a for loop
I tried using a for loop, but I'm just a beginner and couldn't get anywhere.
Using the "simple" solution from another Stack Overflow question (Efficiently transform multiple columns of a data frame)
It just made a mess, even pasting.
Any ideas in plain beginner language would be much appreciated! If it is possible to nest both operations, that would be even better!
dplyr::mutate_at would work for this situation. You define the variables you want to mutate and then define the function you want to use.
You can do the pasting and converting to a time in a single step within funs using the . notation and nesting functions.
library(dplyr)
Data = mutate_at(Data, Time_list, funs(chron(times. = paste0(., ":00"))))
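For comparison, here is a base-R sketch of the same nested operation using lapply, with a cut-down copy of the question's data (only two of the time columns, which is an assumption made to keep the example short). It also shows why the earlier attempts failed: lapply and dmap already pass each column as the first argument of chron, which is dates., so supplying times. = Data[, Time_list] on top of that hands chron a single column and a whole data frame, and their lengths differ. Wrapping the call in an anonymous function passes each column exactly once, through times..

```r
library(chron)

# Small stand-in for the question's data (only two time columns)
Data <- data.frame(
  STD = c("12:00", "12:00"),
  ATA = c("13:59", "14:13"),
  stringsAsFactors = FALSE
)
Time_list <- c("STD", "ATA")

# Append the seconds and convert, one column at a time
Data[Time_list] <- lapply(Data[Time_list], function(x) {
  chron(times. = paste0(x, ":00"))
})

sapply(Data, class)
```

Note that funs() is deprecated in recent dplyr releases, so the mutate_at call above may produce a warning there.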

Creating a list of objects with names from a list

Hopefully this is not a duplicate; it is rather difficult to phrase correctly (I am relatively new to R).
The problem is this: I want to use sequences of dates that exclude certain weekdays based on the row information. I can use bizdays and create calendar objects on the fly, but that is quite inefficient; I would rather create them beforehand and use them as needed. On the other hand, I do not want to create a calendar for every single combination that could occur (too many to bother with: all combinations of weekdays, plus versions with and without holidays).
I can create a data frame with the list of dates between the start and end date for every row, but I need to provide a calendar with weekdays:
P <- setDT(R)[, list(ID=ID,
dt=bizseq(Start.Date,End.Date, cal)
), by=1:nrow(R)]
To provide a calendar I have to define it like this:
cal <- Calendar(weekdays=c("monday", "tuesday"))
Now, a working dataset that illustrates what I am struggling with:
> M <-c(0,1,1,0)
> T <- c(1,1,1,0)
> W <- c(0,0,0,1)
> df <- data.frame(M,T,W)
> df$S <-paste0("c",df$M,df$T,df$W)
> udf <- unique(df)
> udf
M T W S
1 0 1 0 c010
2 1 1 0 c110
4 0 0 1 c001
Using udf, I would like to create a list of calendar objects that I can afterwards pass to bizseq using get(df$S), something along the lines of:
require(bizdays)
loop or apply?
.... <- Calendar(weekdays=c(ifelse(udf$M==0,"","monday"), ifelse(udf$T==0,"","tuesday"),ifelse(udf$W==0,"","wednesday")))
So now the actual questions. First, is this the best approach? If so, how do I create these 3 objects under their names ("c101" etc.), so that, for example, c100 matches the calendar with Monday on? The question is not how to create a calendar, as the method above works (it is enough to substitute the dots with the name), but how to create an object c101 that becomes a calendar when the names are generated dynamically. I could imagine looping through the rows, but I have no idea how to force the resulting object to be named after udf$S. Or perhaps you reckon there is a better method of providing the corresponding calendar than get() on pre-created objects (for a data frame with thousands of dates and combinations of days off).
Basically, I would like to end up with 3 calendar objects named c010, c110 and c001, and, if the expanded table has more unique options, to create all the other combinations before I run the setDT() function.
Afterthought: I could add an ID to udf, call the calendars by index and then return the index to df, but I wonder whether it is possible to create dynamically named objects just as I tried.
NOTE
Following Sathish's lead, I used the following, which seems sufficient:
for(i in 1:nrow(udf)) {
cal <- Calendar(weekdays=c(ifelse(udf[i,1]==0,"","monday"), ifelse(udf[i,2]==0,"","tuesday"),ifelse(udf[i,3]==0,"","wednesday")))
assign(udf[i,4], cal)
}
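An alternative to assign() and get() is to keep the calendars in a single named list and look them up by name. The sketch below shows only that pattern: make_cal is a hypothetical stand-in for the bizdays Calendar() call (it just records the weekdays), so the example runs without the package, and day_names and cals are names invented for the illustration.

```r
# Hypothetical stand-in for bizdays' Calendar(); it only records the weekdays.
make_cal <- function(weekdays) list(weekdays = weekdays)

df <- data.frame(M = c(0, 1, 1, 0), T = c(1, 1, 1, 0), W = c(0, 0, 0, 1))
df$S <- paste0("c", df$M, df$T, df$W)
udf <- unique(df)

day_names <- c("monday", "tuesday", "wednesday")

# One calendar per unique row, stored in a list named after udf$S
cals <- setNames(
  lapply(seq_len(nrow(udf)), function(i) {
    on <- unlist(udf[i, c("M", "T", "W")]) == 1   # which days are flagged
    make_cal(day_names[on])
  }),
  udf$S
)

names(cals)               # "c010" "c110" "c001"
cals[["c110"]]$weekdays   # "monday" "tuesday"
```

With the real constructor in place of make_cal, cals[[df$S[i]]] would take the place of get(df$S[i]), and no objects need to be created in the global environment.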

To sort a specific column in a DataFrame in SparkR

In SparkR I have a DataFrame data. It contains time, game and id.
head(data)
then gives ID = 1 4 1 1 215 985 ..., game = 1 5 1 10 and time 2012-2-1, 2013-9-9, ...
Now game contains a gametype which is numbers from 1 to 10.
For a given gametype I want to find the minimum time, meaning the first time this game has been played. For gametype 1 I do this
data1 <- filter(data, data$game == 1)
This new data contains all data for gametype 1. To find the minimum time I do this
g <- groupBy(data1, game$time)
first(arrange(g, desc(g$time)))
but this doesn't run in SparkR. It says "object of type S4 is not subsettable".
Game 1 has been played 2012-01-02, 2013-05-04, 2011-01-04,... I would like to find the minimum-time.
If all you want is the minimum time, sorting the whole data set doesn't make sense. You can simply use min:
agg(df, min(df$time))
or for each type of game:
groupBy(df, df$game) %>% agg(min(df$time))
By typing
arrange(game, game$time)
I get all of the times sorted. Applying the first function then gives me the first entry. If I want the last entry, I simply type this
first(arrange(game, desc(game$time)))
Just to clarify, because this is something I keep running into: the error you were getting is probably because you also loaded dplyr into your environment. If you had used SparkR::first(SparkR::arrange(g, SparkR::desc(g$time))), things would probably have been fine (although obviously the query could have been more efficient).

How to create a timeBased file in R

I thought I would post here since I have spent hours trying to figure this out. I'm working with a CSV file containing a date and a closing return price. However, I can't get the file to be "timeBased" (the timeBased function is from the xts package). For example:
timeBased(dfx)
[1] FALSE
Here is what I have:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE))
and here's what the first 10 rows of the file look like:
DATE AUS
1 12/1/1988 -0.0031599720
2 12/2/1988 -0.0015724670
3 12/5/1988 -0.0000897619
4 12/6/1988 -0.0022670620
5 12/7/1988 0.0052895550
6 12/8/1988 -0.0048259860
7 12/9/1988 0.0106990910
8 12/12/1988 0.0033538810
9 12/13/1988 0.0118568700
10 12/14/1988 -0.0050105200
If anyone can help, I would appreciate it! I tried several approaches using zoo and other edits, but nothing worked. Thank you!
As Joshua Ulrich points out, using the timeBased function with an xts object should be expected to return FALSE. In addition to that, there may be another problem with your code. Assuming that your example displays the contents of aus, then aus$DATE is actually a factor or character data, not a Date object. To properly convert to an xts object, you'll have to specify the date format of the aus$DATE data. To convert and then test whether dfx is an xts object, you could use the following code:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE, "%m/%d/%Y"))
dfx
[,1]
1988-12-01 -0.0031599720
1988-12-02 -0.0015724670
1988-12-05 -0.0000897619
1988-12-06 -0.0022670620
timeBased(dfx)
[1] FALSE
is.xts(dfx)
[1] TRUE
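The pieces above can be put together in a short self-contained sketch, assuming the xts package is installed and that aus holds DATE as character strings (the three rows are copied from the question). It also checks the index on its own: timeBased tests for a time-based vector such as a Date, so it is the index of dfx, not dfx itself, that passes.

```r
library(xts)

# First rows of the question's data, with DATE as character
aus <- data.frame(
  DATE = c("12/1/1988", "12/2/1988", "12/5/1988"),
  AUS  = c(-0.0031599720, -0.0015724670, -0.0000897619),
  stringsAsFactors = FALSE
)

# Month/day/year strings need an explicit format to parse correctly
dates <- as.Date(aus$DATE, format = "%m/%d/%Y")
dfx <- xts(aus$AUS, order.by = dates)

timeBased(dfx)          # FALSE: dfx is an xts object, not a time vector
timeBased(index(dfx))   # TRUE: the index is a Date vector
is.xts(dfx)             # TRUE
```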
