I have a large number of time series variables (stock prices) on which I want to run various analytics. The problem is that not all variables have the same number of prices in the date range I am interested in, because some stocks came into existence at different points in time.
As such, I am trying to return the date of the first data element in each of the xts variables, but my current solution is very ugly. Is there a function I could call to return that date by some sort of indexing?
For example:
> str(IBM)
An ‘xts’ object from 2004-01-02 to 2011-04-25 containing:
Data: num [1:1841, 1] 25.1 25.6 25.6 25.3 25.4 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "IBM.Adjusted"
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2011-04-26 14:35:02"
I am looking for a clean way to grab 2004-01-02 from the above object for example.
I appreciate the help. Thank you.
I imagine this would work:
min(index(IBM))
You can use the start function:
> library(quantmod)
> getSymbols("IBM")
[1] "IBM"
> start(IBM)
[1] "2007-01-03"
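Since the question is about many series with different histories, here is a minimal sketch of collecting the first date of each one in a single step. The list `prices`, the names `OLD` and `NEW`, and the values in them are hypothetical stand-ins rather than downloaded data; only `xts()` and `start()` come from the answers above.

```r
library(xts)

# Hypothetical stand-ins for price series that begin at different times
prices <- list(
  OLD = xts(c(25.1, 25.6, 25.6), order.by = as.Date("2004-01-02") + 0:2),
  NEW = xts(c(10.2, 10.4),       order.by = as.Date("2009-06-15") + 0:1)
)

# start() returns the first index value of each series;
# do.call(c, ...) combines the results into one Date vector
first_dates <- do.call(c, lapply(prices, start))
first_dates

# The latest start date is where all series have data
common_start <- max(first_dates)
common_start
```

Taking the maximum of the start dates is one way to find the window in which every series is populated.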
I have data with dates in YYYY-MM-DD format in one column and a numeric value in another.
packages:
library(forecast)
library(ggplot2)
library(readr)
Running str(my_data) produces the following:
spec_tbl_df [261 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ date : Date[1:261], format: "2017-01-01" "2017-01-08" ...
$ popularity: num [1:261] 100 81 79 75 80 80 71 85 79 81 ...
- attr(*, "spec")=
.. cols(
.. date = col_date(format = ""),
.. popularity = col_double()
.. )
- attr(*, "problems")=<externalptr>
I would like to do some time series analysis on this. When I run the first line of code for this, decomp <- stl(log(my_data), s.window="periodic"),
I keep running into the following error:
Error in Math.data.frame(my_data) :
non-numeric-alike variable(s) in data frame: date
Originally my dates were in MM/DD/YYYY format, so I feel like I'm... barely closer. I'm learning R again, but it's been a while since I took a formal course in it. I did a cursory search here, but could not find anything that I could identify as helpful (I'm just an amateur).
You currently have a data.frame (or tibble variant thereof). That is not yet time aware. You can do things like
library(ggplot2)
ggplot(data=df) + aes(x=date, y=popularity) + geom_line()
to get a basic line plot properly indexed by date.
You will have to look more closely at package forecast and the examples of functions you want to use to predict or model. Packages like xts can help you, e.g.
library(xts)
x <- xts(df$popularity, order.by=df$date)
plot(x) # plot xts object
Besides plotting, you get time- and date-aware lags and leads and subsetting. The rest depends more on what you want to do ... which you have not told us much about.
Lastly, if you want to convert your dates to numbers (days since Jan 1, 1970), a quick as.numeric(df$date) will do it; but using time-aware operations is often better (though it has the learning curve you see now...)
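As a sketch of why the original stl() call fails and one way around it: log() is being applied to the whole data frame, Date column included, and stl() expects a ts object anyway. The example below uses base R only. The data are synthetic stand-ins shaped like my_data, and frequency = 52 is an assumption that treats each year as exactly 52 weeks.

```r
# Hypothetical weekly data shaped like my_data (261 rows, Date + numeric)
my_data <- data.frame(
  date = seq(as.Date("2017-01-01"), by = "week", length.out = 261)
)
week <- seq_len(nrow(my_data))
my_data$popularity <- 60 + 20 * sin(2 * pi * week / 52) + 0.05 * week

# Build a ts from the numeric column only; frequency = 52 approximates
# weekly seasonality (an assumption, a year is not exactly 52 weeks)
pop_ts <- ts(my_data$popularity, start = c(2017, 1), frequency = 52)

# log() now receives a numeric series, not a data frame with a Date column
decomp <- stl(log(pop_ts), s.window = "periodic")
head(decomp$time.series)
```

The dates themselves are not passed to stl(); the start and frequency arguments carry the time information instead.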
I am trying to generate a plot that is similar to this:
A walkthrough is provided here -> https://medium.com/#erickramer/beautiful-data-science-with-functional-programming-and-r-a3f72059500b
However the code supplied on this website isn't generating a plot for me, instead I get this error:
> forecasts1 = tsdf %>%
+ map(auto.arima) %>%
+ map(forecast, h=10)
Error in is.constant(x) :
(list) object cannot be coerced to type 'double'
This is despite the fact that I have replicated their data formatting precisely. Here are our datasets for comparison:
> str(tsdf)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 89 obs. of 1 variable:
$ time_series:List of 89
..$ 1_1 : Time-Series from 2013 to 2017: 8981338 10707490 11410597 10816217 12263765 ...
..$ 1_10 : Time-Series from 2013 to 2017: 12645212 13510638 13133558 13542970 16074675 ...
..$ 1_2 : Time-Series from 2013 to 2017: 19028892 20626896 19952328 20865263 22547313 ...
..$ 1_3 : Time-Series from 2013 to 2017: 7081624 8317481 8374427 8330653 9643845 ...
..$ 1_4 : Time-Series from 2013 to 2017: 25421637 30934941 30756101 27977317 32417608 ...
And the provided example data (upon which the code did work, according to the website):
> str(time_series)
List of 9
$ Germany : Time-Series [1:52] from 1960 to 2011: 684721 716424 749838 ...
$ Singapore : Time-Series [1:52] from 1960 to 2011: 7208 7795 8349 ...
$ Finland : Time-Series [1:37] from 1975 to 2011: 85842 86137 86344 ...
I can't seem to figure it out, though it may have something to do with the fact that their timeseries has one solid endpoint, yet my timeseries have several different monthly endpoints.
Any help with this is greatly appreciated!
* UPDATE *
After applying akrun's suggestion, I stored only the time-series vector in a list, like so:
tsdf <- akrun %>%
select(time_series)
I then fit the model like this:
tsdf$time_series %>% map(auto.arima) %>%
map(forecast, h=12)
...and then the plot...
... looks awful.
Do I need to convert the y-axis scale? Or do some sort of differencing on the data before plotting the ARIMA forecast? I would really appreciate any suggestions!
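The difference between mapping over tsdf and mapping over tsdf$time_series can be sketched in base R. A data frame or tibble is a list of columns, so iterating over it hands the whole list column to the function in one call. Everything below is a hypothetical stand-in: a plain list plays the role of the tibble, the two short series are made up, and stats::arima() with a fixed random-walk order replaces auto.arima() so the sketch does not depend on the forecast package.

```r
# Hypothetical stand-in: two short annual series held in one list "column"
series <- list(
  a = ts(c(5, 7, 6, 9, 8, 11, 10, 13), start = 2010),
  b = ts(c(2, 3, 5, 4, 6, 7, 9, 8), start = 2010)
)
tsdf <- list(time_series = series)  # mimics a one-column tibble

# Iterating over the container visits the column, which is a list
outer_class <- sapply(tsdf, class)
outer_class

# Iterating over the column visits each ts object
inner_class <- sapply(tsdf$time_series, class)
inner_class

# Fit one model per series (random walk, chosen only for illustration)
fits  <- lapply(tsdf$time_series, arima, order = c(0, 1, 0))
preds <- lapply(fits, function(fit) predict(fit, n.ahead = 3)$pred)
preds$a
```

A model function that receives the whole list instead of a single series has nothing numeric to work with, which is consistent with the coercion error above.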
Would like to be able to read Google Sheets cell values into R with googlesheets package, but without any cell formatting applied (e.g. comma separators, percentage conversion, etc.).
Have tried gs_read() without specifying a range, which uses gs_read_csv(), which will "request the data from the Sheets API via the exportcsv link". Can't find a way to tell it to provide underlying cell value without formatting applied.
Similarly, tried gs_read() and specifying a range, which uses gs_read_cellfeed(). But can't find a way to indicate that I want un-formatted cell values.
Note: I'm not after the formulas in any cells, just the values without any formatting applied.
Example:
(looks like I'm not able to post images)
Here's a screenshot of an example Google Sheet:
https://www.dropbox.com/s/qff05u8nn3do33n/Screenshot%202015-07-26%2008.42.58.png?dl=0
First and third columns are numeric with no formatting applied, 2nd column applies comma separators for thousands, 4th column applies percentage formatting.
Reading this sheet with the following code:
library(googlesheets)
gs <- gs_title("GoogleSheets Test")
ws <- gs_read(gs, ws = "Sheet1")
yields:
> str(ws)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 4 variables:
$ Number : int 123456 123457 123458
$ Number_wFormat : chr "123,456" "123,457" "123,458"
$ Percent : num 0.123 0.234 0.346
$ Percent_wFormat: chr "12.34%" "23.45%" "34.56%"
Would like to be able to read a worksheet that has formatting applied (à la columns 2 and 4), but read the unformatted values (à la columns 1 and 3).
At this point, I think your best bet is to fix the imported data like so:
> ws$Number_fixed <- type.convert(gsub(',', '', ws$Number_wFormat), as.is = TRUE)
> ws$Percent_fixed <- type.convert(gsub('%', '', ws$Percent_wFormat), as.is = TRUE) / 100
> str(ws)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 6 variables:
$ Number : int 123456 123457 123458
$ Number_wFormat : chr "123,456" "123,457" "123,458"
$ Percent : num 0.123 0.234 0.346
$ Percent_wFormat: chr "12.34%" "23.45%" "34.56%"
$ Number_fixed : int 123456 123457 123458
$ Percent_fixed : num 0.123 0.234 0.346
I had some hope that post-processing with functions from readr would be a decent answer, but it looks like percentages and "currency" style numbers are open issues there too.
I have opened an issue to solve this better in googlesheets, one way or another.
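If several columns need the same clean-up, the two fixes above could be folded into one helper. This is only a sketch in base R: the function name parse_formatted is hypothetical, and it assumes the only formatting present is comma thousands separators and a trailing percent sign.

```r
# Hypothetical helper: strip thousands separators and percent signs,
# dividing by 100 where a percent sign was present
parse_formatted <- function(x) {
  is_pct <- grepl("%", x, fixed = TRUE)
  num <- as.numeric(gsub("[,%]", "", x))
  ifelse(is_pct, num / 100, num)
}

parse_formatted(c("123,456", "123,457"))
parse_formatted(c("12.34%", "23.45%"))
```

Currency symbols, parentheses for negatives, or locale-specific separators are not handled and would need extra patterns.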
I tried to read my data as a zoo object using the read.zoo function. The data include 14436 columns, and the first column is a date (yyyy-mm-dd). My code is
HFs<-read.zoo("F:/Research/Drawdown analysis/Data analysis/HF return.csv",index=1,header=TRUE,format="%Y-%m-%d")
The result is that only the first column is read into R, as the date index; all other values are lost.
My str(HFs) shows
‘zoo’ series from 1990-01-31 to 2010-02-28
Data: logi[1:242, 0 ]
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
Index: Date[1:242], format: "1990-01-31" "1990-02-28" "1990-03-31" "1990-04-30" "1990-05-31" "1990-06-30" ...
Could anyone help me figure out the correct way to read a table into R as a zoo object?
Thanks
Wei
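One possible cause, offered as an assumption because the file itself is not shown: the call above does not set a separator, and the default for read.table (which read.zoo uses to read the file) is whitespace. On a comma-separated file that leaves each line as a single field. The sketch below shows the effect with base R only; the three-line csv_text and the column names FundA and FundB are hypothetical.

```r
# Hypothetical three-line file in the layout described above
csv_text <- "Date,FundA,FundB
1990-01-31,0.012,-0.004
1990-02-28,0.007,0.010"

# Default whitespace separator: every line is read as one field
one_col <- read.table(text = csv_text, header = TRUE)
ncol(one_col)

# Comma separator: the date and the value columns are split apart
split_cols <- read.table(text = csv_text, header = TRUE, sep = ",")
ncol(split_cols)
```

If that is what is happening, adding sep = "," to the read.zoo call should be passed through to read.table and bring the value columns back.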
I am learning to use the topicmodels package (and R as well), and explored one of its example data sets using
str(testdata)
'data.frame': 3104 obs. of 5 variables:
$ Article_ID: int 41246 41257 41268 41279 41290 41302 41314 41333 41344 41355 ...
$ Date : chr "1-Jan-96" "2-Jan-96" "3-Jan-96" "4-Jan-96" ...
$ Title : chr "Nation's Smaller Jails Struggle To Cope With Surge in Inmates" "FEDERAL IMPASSE SADDLING STATES WITH INDECISION" "Long, Costly Prelude Does Little To Alter Plot of Presidential Race" "Top Leader of the Bosnian Serbs Now Under Attack From Within" ...
$ Subject : chr "Jails overwhelmed with hardened criminals" "Federal budget impasse affect on states" "Contenders for 1996 Presedential elections" "Bosnian Serb leader criticized from within" ...
$ Topic.Code: int 12 20 20 19 1 19 1 1 20 15 ...
If I want to create a data set in the above format in R, how do I do that?
testdata is a data.frame, one of the few fundamental R objects. You should probably start here: http://cran.r-project.org/doc/manuals/R-intro.pdf.
Some functions for creating data.frames are data.frame, read.table, read.csv. For each of these you can access their documentation by typing ?data.frame for example. Good luck.
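As a minimal sketch of the data.frame route, the first three rows shown in the str() output above can be typed in directly. The values are copied from that output; the integer suffix L and stringsAsFactors = FALSE are there to reproduce the int and chr column types.

```r
# Rebuild the first three rows of the example data by hand
testdata <- data.frame(
  Article_ID = c(41246L, 41257L, 41268L),
  Date       = c("1-Jan-96", "2-Jan-96", "3-Jan-96"),
  Title      = c("Nation's Smaller Jails Struggle To Cope With Surge in Inmates",
                 "FEDERAL IMPASSE SADDLING STATES WITH INDECISION",
                 "Long, Costly Prelude Does Little To Alter Plot of Presidential Race"),
  Subject    = c("Jails overwhelmed with hardened criminals",
                 "Federal budget impasse affect on states",
                 "Contenders for 1996 Presedential elections"),
  Topic.Code = c(12L, 20L, 20L),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

str(testdata)
```

For a few thousand rows, reading from a file with read.csv is more practical than typing the values in.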