How to create a timeBased file in R

I thought I would post here since I have spent hours trying to figure this out. So I'm working with a csv file with Date and Closing return price. However, I can't get the file to be "timeBased." (timeBased function is from package xts). For example:
timeBased(dfx)
[1] FALSE
Here is what I have:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE))
and here's what the first 10 rows of the file look like:
DATE AUS
1 12/1/1988 -0.0031599720
2 12/2/1988 -0.0015724670
3 12/5/1988 -0.0000897619
4 12/6/1988 -0.0022670620
5 12/7/1988 0.0052895550
6 12/8/1988 -0.0048259860
7 12/9/1988 0.0106990910
8 12/12/1988 0.0033538810
9 12/13/1988 0.0118568700
10 12/14/1988 -0.0050105200
If anyone can help, I would appreciate it! I tried several approaches using zoo and other edits, but nothing worked. Thank you!

As Joshua Ulrich points out, timeBased should be expected to return FALSE for an xts object: it tests whether the object itself belongs to a time-based class such as Date or POSIXct, and an xts object is not one. In addition to that, there may be another problem with your code. Assuming that your example displays the contents of aus, aus$DATE is actually factor or character data, not a Date object. To convert it properly to an xts object, you'll have to specify the date format of aus$DATE. To convert and then test whether dfx is an xts object, you could use the following code:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE, "%m/%d/%Y"))
dfx
[,1]
1988-12-01 -0.0031599720
1988-12-02 -0.0015724670
1988-12-05 -0.0000897619
1988-12-06 -0.0022670620
timeBased(dfx)
[1] FALSE
is.xts(dfx)
[1] TRUE
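To see the difference between the xts object and its index, here is a minimal sketch. The small aus data frame below is a hypothetical stand-in built from the first three rows shown in the question; the variable names is_xts, obj_is_time and idx_is_time are only for illustration.

```r
library(xts)

# Hypothetical stand-in for the first rows of `aus` from the question
aus <- data.frame(
  DATE = c("12/1/1988", "12/2/1988", "12/5/1988"),
  AUS  = c(-0.0031599720, -0.0015724670, -0.0000897619),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings explicitly before building the xts object
dfx <- xts(aus$AUS, order.by = as.Date(aus$DATE, format = "%m/%d/%Y"))

is_xts      <- is.xts(dfx)            # TRUE: dfx is an xts object
obj_is_time <- timeBased(dfx)         # FALSE: the xts object is not a time class
idx_is_time <- timeBased(index(dfx))  # TRUE: its index is a Date vector
```

So if the goal is to check that the dates were parsed, testing the index rather than the object is probably what you want.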

Related

Trying to remove "ZCTA" from rows

I am trying to extract only the zip code values from my imported ACS data file, however, the rows all include "ZCTA" before the 5 digit zip code. Is there a way to remove that so just the 5 digit zip code remains?
Example:
I tried using strtrim on the data, but I can't figure out how to target the last 5 digits. I imagine there is a function or loop that could also do this, since the dataset is so large.
To remove "ZCTA5":
gsub("ZCTA5", "", df$zip) # df - your data.frame name
or
library(stringr)
str_replace(df$zip,"ZCTA5","")
To extract ZIP CODE:
str_sub(df$zip,-5,-1)
Here are a few others for fun:
#option 1
stringr::str_extract(df$zip, "(?<=\\s)\\d+$")
#option 2
gsub("^.*\\s(\\d+)$", "\\1", df$zip)
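For anyone who wants to try these without the ACS file, here is a base R sketch. The sample values are hypothetical and assume the column looks like "ZCTA5 12345", with a space after the prefix; zip_raw, zip_a and zip_b are names made up for this example.

```r
# Hypothetical sample values; the real column is assumed to look like "ZCTA5 12345"
zip_raw <- c("ZCTA5 00601", "ZCTA5 90210", "ZCTA5 10001")

# Strip the prefix together with any whitespace that follows it
zip_a <- sub("^ZCTA5\\s*", "", zip_raw)

# Or keep only the last five characters of each value
zip_b <- substr(zip_raw, nchar(zip_raw) - 4, nchar(zip_raw))

# Both approaches keep the ZIP codes as character, so leading zeros survive
identical(zip_a, zip_b)
```

Both calls are vectorised, so no loop is needed even on a large dataset. If you remove only "ZCTA5" and a leading space remains, trimws() should clean it up.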

How to correct the initial NA value that diff generates

When I use the diff function on an xts object, the first value is NA.
I'm trying to convert a non-stationary data source to a stationary one; if anyone has another method, I would appreciate your help.
diff.xts(tb_xts$col1,log = F)
2012-12-01 NA # <-- correct this
2012-12-06 -0.211416877
2012-12-16 0.2005834963
Is there any way to correct the initial value?
I know that by default diff starts from the second element.
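One common way to deal with this is simply to drop the leading NA, since there is no earlier observation to difference against. A minimal sketch, assuming a small made-up series in place of tb_xts$col1 (the values and the names x, d, d_clean and d_alt are hypothetical):

```r
library(xts)

# Hypothetical series standing in for tb_xts$col1
x <- xts(c(1.50, 1.29, 1.49),
         order.by = as.Date(c("2012-12-01", "2012-12-06", "2012-12-16")))

d <- diff(x)           # first row is NA: nothing precedes it

d_clean <- na.omit(d)  # drop rows containing NA
d_alt   <- d[-1]       # or drop the first row by position
```

Whether dropping the row or filling it with some value is appropriate depends on the analysis; this only shows the dropping option.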

The data in the time series is different from the data I entered. How do I get outputs in a similar scale as my inputs?

I had a column of data as follows:
141523
146785
143667
65560
88524
148422
151664
.
.
.
.
I used the ts() function to convert this data into a time series.
{
Aclines <- read.csv(file.choose())
Aclinests <- ts(Aclines[[1]], start = c(2013), end = c(2015), frequency = 52)
}
head(Aclines) gives me the following output:
X141.523
1 146785
2 143667
3 65560
4 88524
5 148422
6 151664
head(Aclinests) gives me the following output:
[1] 26 16 83 87 35 54
The output of all my further analysis, including graphs and predictions, is scaled the way you see in the head(Aclinests) output. How can I scale the outputs back to the way the original data was entered? Am I missing something while converting the data to a ts?
It is typically recommended to include a reproducible example (see "How to make a great R reproducible example?"), but I will try to help based on what I'm reading. If it isn't helpful, I'll delete the post.
First, the read.csv defaults to header = TRUE. It doesn't look like you have a header in your file. Also, it looks like R is reading data in as factors instead of numeric.
So you can try a couple of parameters when reading the file:
Aclines <- read.csv(file.choose(), header=FALSE, stringsAsFactors=FALSE)
Then to get your time series
Aclinests <- ts(Aclines[, 2], start = c(2013), end = c(2015), frequency = 52)
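Since your data looks like it has 2 columns, this will read the second column of your data frame into a ts object. If the numbers 1 to 6 in your head() output are just row names, the values are in the first column and you would use Aclines[[1]] instead.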
Hope this helps.
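To illustrate why a factor column would produce small integers like the ones in head(Aclinests), here is a base R sketch. The three values are taken from the question, but treating them as a factor is an assumption about how the file was read; f, codes, values and weekly are illustrative names.

```r
# Values from the question, stored as a factor the way a mis-read column would be
f <- factor(c("146785", "143667", "65560"))

codes  <- as.integer(f)                # internal level codes, not the data
values <- as.numeric(as.character(f))  # the original numbers

# Build the weekly series from the real values rather than the codes
weekly <- ts(values, start = 2013, frequency = 52)
```

Going through as.character() first is the important step: converting a factor directly to a number returns the level codes.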

Help interpreting/converting odd date format

I have data pulled from a database and stored in Stata .dta files. But when I read it into R using the foreign package, I get a date format unlike any I've seen. All of the other dates are "%m/%d/%Y" and import correctly.
I have searched the database's documentation, but there's no explanation for the odd date format for "DealActiveDate". The "facilitystartdate" date should be close to the "DealActiveDate", but not necessarily the same. Here are a few rows of these two columns.
facilitystartdate DealActiveDate
1 09/12/1987 874022400000
2 09/12/1987 874022400000
3 09/12/1987 874022400000
4 09/01/1987 873072000000
5 09/08/1987 873676800000
6 10/01/1987 875664000000
7 08/01/1987 870393600000
8 08/01/1987 870393600000
9 10/01/1987 875664000000
10 09/01/1987 873072000000
Please let me know if you have any idea how to convert "DealActiveDate" to a more conventional date. Thanks! (I'm not sure SO is the best venue, but I couldn't think of any other options!)
Looks like milliseconds since 1960-01-01:
as.POSIXct(874022400000/1000, origin="1960-01-01")
# [1] "1987-09-12 01:00:00 CDT"
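The 01:00:00 CDT in that output most likely comes from the local time zone of the session. A sketch that interprets the values in UTC and then keeps only the date, using three of the values from the question (ms, as_time and as_date are names chosen for this example):

```r
# Three DealActiveDate values from the question
ms <- c(874022400000, 873072000000, 870393600000)

# Milliseconds since 1960-01-01, interpreted in UTC to avoid local-time offsets
as_time <- as.POSIXct(ms / 1000, origin = "1960-01-01", tz = "UTC")

# Keep only the calendar date
as_date <- as.Date(as_time)
format(as_date)
# [1] "1987-09-12" "1987-09-01" "1987-08-01"
```

These match the facilitystartdate values in the corresponding rows (09/12/1987, 09/01/1987, 08/01/1987), which supports the milliseconds-since-1960 reading.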

Setting an xts Index

Build an xts object with two rows.
library(xts)
junk<-xts(c(1,2),as.Date(c("2010-01-01","2010-05-01")))
junk
> [,1]
> 2010-01-01 1
> 2010-05-01 2
Why doesn't the following change the index for the first row?
time(junk[1])<-as.Date("2010-02-01")
junk
> [,1]
> 2010-01-01 1
> 2010-05-01 2
I realize that the following works, but why doesn't the above work?
time(junk)[1]<-as.Date("2010-02-01")
junk
> [,1]
> 2010-02-01 1
> 2010-05-01 2
Thanks,
Bill
The direct answer to the post is that the magic is inside of attr<-, as Josh says. Subsetting the object first simply creates a new object that gets promptly disposed of once time<- is finished.
In addition, you can see the 'internals' of the index via the .index() function. It is essentially a vector of type double or integer that maps to POSIXct time, with some attributes attached. The class you are assigning is automatically coerced back and forth. This makes the internals easier to maintain, and lets you work with any time class you need outside of it.
In general, Date will be the cleanest way to keep TZ and secs trouble out of the mix, but keep in mind that the cost of this hidden representation falls on index(), which has to recreate the object you expect.
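A quick way to look at this yourself, as a sketch using the junk object from the question (raw_index and idx are just illustrative names):

```r
library(xts)

junk <- xts(c(1, 2), as.Date(c("2010-01-01", "2010-05-01")))

# The raw index: numeric seconds since 1970-01-01
raw_index <- .index(junk)

# index() rebuilds a vector of the class the object was created with
idx <- index(junk)
class(idx)
# [1] "Date"
```

For a Date-indexed object the raw values should be whole multiples of 86400, that is, midnight UTC on each date.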
time(junk[1]) <- as.Date("2010-02-01")
The above doesn't change the index of the first row of junk because subsetting creates a new object--with no reference to junk--and time<-.xts replaces the index of the new object.
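The effect can be reproduced step by step. This is a rough sketch of what happens rather than R's exact evaluation; tmp is a hypothetical name for the temporary subset.

```r
library(xts)

junk <- xts(c(1, 2), as.Date(c("2010-01-01", "2010-05-01")))

# Subsetting makes a separate one-row object
tmp <- junk[1]
time(tmp) <- as.Date("2010-02-01")  # changes the index of tmp only

index(junk)[1]  # still 2010-01-01: junk was never touched

# Replacing one element of the index of junk itself does work
time(junk)[1] <- as.Date("2010-02-01")
index(junk)[1]  # now 2010-02-01
```

In the second form the whole index is extracted, its first element is replaced, and the result is written back to junk.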
The dates in a time series are not referenced with "[". They are more like rownames in data frames. They are stored in the "index" leaf of the attributes list. In addition to that, they are not of Date class but rather the DateTime class, so you may need to use POSIXct:
> attributes(junk)$index[1] <- as.POSIXct("2010-02-01")
> junk
[,1]
2010-02-01 1
2010-05-01 2
Edit: more accurately the attribute$index is internally in seconds but the time method will accept a variety of assignment classes.
