Using the subsetting operator :: in quantmod with variables - r

How do I use user-initialized date variables as the start and end values of the subset operator :: from the R package quantmod?
For example, when I apply user initialized date variables,
end.date <- Sys.Date()
start.date <- end.date - 5*365 #5- years to-date
start.date.char <- as.character(start.date)
end.date.char <- as.character(end.date)
to get 5-years of stock data
library(quantmod)
getSymbols("GILD",src="yahoo")
GILD.5YTD <- GILD['start.date.char::end.date.char']
I get the following error:
Error in if (length(c(year, month, day, hour, min, sec))
== 6 && c(year, :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In as_numeric(YYYY) : NAs introduced by coercion
2: In as_numeric(MM) : NAs introduced by coercion
3: In as_numeric(DD) : NAs introduced by coercion
4: In as_numeric(YYYY) : NAs introduced by coercion
5: In as_numeric(MM) : NAs introduced by coercion
6: In as_numeric(DD) : NAs introduced by coercion
I'm sure this is a basic question, but I'm a newbie.

There are convenient high-level functions to subset an xts object as returned, e.g., by quantmod's getSymbols().
For a time-based subset, the last() function from the xts package (automatically loaded by quantmod) is quite handy:
library(quantmod)
getSymbols("GILD",src="yahoo")
GILD_last5Years <- last(GILD, "5 years")
#> head(GILD_last5Years)
# GILD.Open GILD.High GILD.Low GILD.Close GILD.Volume GILD.Adjusted
#2012-01-03 41.46 41.99 41.35 41.86 19564000 20.46895
#2012-01-04 41.95 42.06 41.70 42.02 16236000 20.54719
#2012-01-05 42.04 42.97 42.00 42.52 18431800 20.79168
#2012-01-06 42.38 43.10 42.20 42.78 15542000 20.91882
#2012-01-09 42.49 42.99 42.35 42.73 16801200 20.89437
#2012-01-10 43.10 45.04 42.94 44.25 30110000 21.63763
This can be combined with an equivalent function first() to select a specific time span within the series.
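A minimal sketch of the same idea on synthetic data, so it runs without a download. The series x and its dates are made up for illustration. Judging by the output above, which starts on the first trading day of a calendar year, last() seems to count calendar periods rather than exact 365-day years, so it is worth checking the first date returned.

```r
library(xts)

# Hypothetical daily series spanning two calendar years (values are arbitrary)
idx <- seq(as.Date("2019-07-01"), as.Date("2020-06-30"), by = "day")
x <- xts(seq_along(idx), order.by = idx)

recent  <- last(x, "1 year")     # most recent yearly period present in the data
head_3m <- first(x, "3 months")  # earliest three monthly periods

# Inspect the actual boundaries instead of assuming them
start(recent)
end(recent)
start(head_3m)
```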

Your current argument to [.xts is just the character value 'start.date.char::end.date.char' and would not be evaluated further, since R is not a macro language. Try instead to build the desired character value, which I believe is: "2011-08-28::2016-08-26". So this succeeds:
GILD.5YTD<-GILD[paste(start.date.char, end.date.char, sep="::")]
str(GILD.5YTD)
#-------
An ‘xts’ object on 2011-08-29/2016-08-25 containing:
Data: num [1:1257, 1:6] 39 39.7 40.2 39.8 39 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "GILD.Open" "GILD.High" "GILD.Low" "GILD.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2016-08-26 17:00:52"
So technically the :: is not acting as an R operator here; it is parsed by the [.xts function. The quantmod package is built on top of the xts package. The "::" operator proper is for package-qualified access to exported functions of installed packages.
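To make the point concrete without downloading anything, here is a small sketch on a synthetic series. The object x and the dates are assumptions chosen for illustration; only the paste() idiom comes from the answer above.

```r
library(xts)

# Hypothetical ten-day series standing in for the downloaded data
idx <- seq(as.Date("2020-01-01"), by = "day", length.out = 10)
x <- xts(1:10, order.by = idx)

start.date <- as.Date("2020-01-03")
end.date   <- as.Date("2020-01-06")

# paste() converts the Date objects to text, so the string handed to
# [.xts holds the actual dates rather than the variable names
rng <- paste(start.date, end.date, sep = "::")
sub <- x[rng]

# The ISO-8601 style separator "/" is parsed by [.xts as well
sub_iso <- x[paste(start.date, end.date, sep = "/")]
```

Printing rng before subsetting is a quick way to confirm that the string contains dates.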

The reason for your errors is that you are placing the variable names inside a string literal, which cannot work. (By the way, you do not have to convert the dates with as.character as in your example, since pasting will do that for you.) Using paste0 like so will subset your data accordingly:
GILD.5YTD<-GILD[paste0(start.date.char,'::',end.date.char)]

Related

Why does lubridate mdy() return an error in lapply()?

I'm trying to understand why my lubridate mdy() function is returning an error in lapply() to convert dates in a dplyr pipeline. I have used mdy() on other data in a similar method but have yet to see this issue. I am relatively new to R but had been able to troubleshoot other issues until now. I am not very familiar with how to use lapply().
My data is a large .csv of water quality data, which I'm subsetting to simply show the data in question.
library(dplyr)
library(lubridate)
require(lubridate)
wq.all<-as.data.frame(read.csv('C:/WQdata.csv',header=TRUE,stringsAsFactors = FALSE))
test.wq<-wq.all[1:5,12:13]
class(test.wq)
[1] "data.frame"
mode(test.wq)
[1] "list"
str(test.wq)
'data.frame': 5 obs. of 2 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
In str(test.wq), SampleTime is the data in question which I am trying to coerce from chr to date, or at least num.
First, I don't need the time values, so I used dplyr mutate() to create SampleDate with only the 10-character dates, and then was attempting to coerce using mdy():
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))%>%
mdy(SampleDate)
But this returns an error:
Error in lapply(list(...), .num_to_date) : object 'SampleDate' not found
If I only use mutate() it all seems to work fine, and gives me the new SampleDate column I was looking for:
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))
head(wq.date)
YearMonth SampleTime SampleDate
1 2019-07 07/09/2019 14:44 07/09/2019
2 2019-06 06/10/2019 14:17 06/10/2019
3 2019-05 05/22/2019 14:31 05/22/2019
4 2019-04 04/08/2019 14:15 04/08/2019
5 2019-03 03/13/2019 14:19 03/13/2019
str(wq.date)
'data.frame': 5 obs. of 3 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
$ SampleDate: chr "07/09/2019" "06/10/2019" "05/22/2019" "04/08/2019" ...
So it only seems to result in error once I attempt to coerce using mdy(), even though SampleDate clearly exists and I believe I was referencing it correctly.
I have researched other posts, but none seems to address quite this issue.
Thoughts? Many thanks!
mdy() needs to be called inside mutate(), or on the extracted column; otherwise the pipe passes the entire data.frame as its first argument, and SampleDate is looked up as a free-standing object, which does not exist. According to ?mdy:
Transforms dates stored in character and numeric vectors to Date or POSIXct objects
So, if the input is not a vector, it won't work
library(dplyr)
library(lubridate)
library(stringr)
test.wq %>%
  mutate(SampleDate = str_sub(SampleTime, start = 1, end = 10)) %>%
  mutate(date = mdy(SampleDate))
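As a possible shortcut, the substring step can be skipped by parsing the full timestamp and then dropping the time. This is a sketch on a made-up two-row data frame; mdy_hm() and as_date() are lubridate functions, and the column values mirror the format shown in the question.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for test.wq
test.wq <- data.frame(
  SampleTime = c("07/09/2019 14:44", "06/10/2019 14:17"),
  stringsAsFactors = FALSE
)

# mdy_hm() parses month/day/year hour:minute; as_date() drops the time part.
# Both run inside mutate(), so they receive a column (a vector), not the data.frame.
wq.date <- test.wq %>%
  mutate(SampleDate = as_date(mdy_hm(SampleTime)))

class(wq.date$SampleDate)
```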

Selecting from xts by column name

I'm trying to operate on a specific column in an xts object by name within a function but I keep getting an error:
Error in if (length(c(year, month, day, hour, min, sec)) == 6 && all(c(year, :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In as_numeric(YYYY) : NAs introduced by coercion
2: In as_numeric(YYYY) : NAs introduced by coercion
If I have an xts object:
xts1 <- xts(x=1:10, order.by=Sys.Date()-1:10)
xts2 <- xts(x=1:10, order.by=Sys.Date()+1:10)
xts3 <- merge(xts1, xts2)
Then I can select a specific column with:
xts3$xts1
With a dataframe I can pass xts3 to another function and then select a specific column with:
xts3['xts1']
But if I try to do the same thing with an xts object I get the error above. e.g.
testfun <- function(xts_data){
print(xts_data['xts1'])
}
Called with:
testfun(xts3)
This works:
testfun <- function(xts_data){
print(xts_data[,1])
}
But I'd really like to select by name as I can't be certain of the column order.
Can anyone suggest how to solve this?
Thanks!
xts-objects have class c("xts", "zoo"), which means they are matrices with special attributes that are assigned by their creation functions. Although $ will not succeed with a matrix, it works with xts and zoo objects thanks to the $.zoo method. (It's also not recommended to use $ inside functions because of the potential for name-evaluation-confusion and partial name matching.) See: ?xts and examine the sample.xts object created with the first example with str:
> ?xts
starting httpd help server ... done
> data(sample_matrix)
> sample.xts <- as.xts(sample_matrix, descr='my new xts object')
>
> str(sample.xts)
An ‘xts’ object on 2007-01-02/2007-06-30 containing:
Data: num [1:180, 1:4] 50 50.2 50.4 50.4 50.2 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "Open" "High" "Low" "Close"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
List of 1
$ descr: chr "my new xts object"
class(sample.xts)
# [1] "xts" "zoo"
This explains why the other answer advising the use of xts3[ , "xts1"] or equivalently xts3[ , 1] should succeed. The [.xts function extracts the "Data" element first and then returns either the named or the numbered column specified by the j-argument.
str(xts3)
An ‘xts’ object on 2018-05-24/2018-06-13 containing:
Data: int [1:20, 1:2] 10 9 8 7 6 5 4 3 2 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "xts1" "xts2"
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
> xts3[ , "xts1"]
xts1
2018-05-24 10
2018-05-25 9
2018-05-26 8
2018-05-27 7
2018-05-28 6
2018-05-29 5
2018-05-30 4
2018-05-31 3
2018-06-01 2
2018-06-02 1
2018-06-04 NA
2018-06-05 NA
2018-06-06 NA
2018-06-07 NA
2018-06-08 NA
2018-06-09 NA
2018-06-10 NA
2018-06-11 NA
2018-06-12 NA
2018-06-13 NA
The merge.xts operation might not have delivered what you expected since the date ranges didn't overlap. It seems possible that you wanted:
> xts4 <- rbind(xts1, xts2)
> str(xts4)
An ‘xts’ object on 2018-05-24/2018-06-13 containing:
Data: int [1:20, 1] 10 9 8 7 6 5 4 3 2 1 ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
Note that the rbind.xts-operation failed to deliver an object with the shared column name so numeric access would be needed. (I would have expected a named "Data" element, but you/we also need to read ?rbind.xts.)
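The difference between the two operations can be seen on a tiny example. The objects a and b below are hypothetical and deliberately have non-overlapping dates, as in the question.

```r
library(xts)

# Two short series with no dates in common
a <- xts(1:2, order.by = as.Date("2020-01-01") + 0:1)
b <- xts(3:4, order.by = as.Date("2020-02-01") + 0:1)

m <- merge(a, b)  # joins on the index: one column per input, NA where a date is missing
r <- rbind(a, b)  # stacks the rows: a single column covering all dates

dim(m)
dim(r)
```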
Type ?`[.xts` and you'll see that the function has an i and a j argument (among others).
i - the rows to extract. Numeric, timeBased or ISO-8601 style (see details)
j - the columns to extract, numeric or by name
You passed 'xts1' as the i argument, while it should be j. So your function should be
testfun <- function(xts_data){
  print(xts_data[, 'xts1']) # or xts_data[j = 'xts1']
}
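If the column name should not be hard-coded, it can be passed in as an argument. The helper name get_col below is made up for this sketch; the xts objects are smaller versions of the ones in the question.

```r
library(xts)

d <- as.Date("2020-01-01") + 0:2
xts1 <- xts(1:3, order.by = d)
xts2 <- xts(4:6, order.by = d)
xts3 <- merge(xts1, xts2)

# Hypothetical helper: the name goes in the j (column) position,
# leaving the i (row) position empty
get_col <- function(xts_data, col) {
  xts_data[, col]
}

res <- get_col(xts3, "xts1")
```

Because the name is an ordinary character value, this keeps working if the column order changes.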

Merge output from quantmod::getSymbols

I looked at many entries on merging R data frames, but they are not clear to me. They talk about merging/joining on a common column, but in my case that column is missing, or maybe I don't know how to extract it. Here is what I am doing.
library(quantmod)
library(xts)
start = '2001-01-01'
end = '2015-08-14'
ticker = 'AAPL'
f = getSymbols(ticker, src = 'yahoo', from = start, to = end, auto.assign=F)
rsi14 <- RSI(f$AAPL.Adjusted,14)
The output I am expecting is all the columns of f and rsi14 matched by date. However, 'date' is not available as a column, so I am not sure how to join them. I have to join a few moving-average columns as well.
The premise of your question is wrong. getSymbols returns an xts object, not a data.frame:
R> library(quantmod)
R> f <- getSymbols("AAPL", auto.assign=FALSE)
R> str(f)
An ‘xts’ object on 2007-01-03/2015-08-14 containing:
Data: num [1:2170, 1:6] 86.3 84 85.8 86 86.5 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "AAPL.Open" "AAPL.High" "AAPL.Low" "AAPL.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2015-08-15 00:46:49"
xts objects do not have a "Date" column. They have an index attribute that holds the datetime. xts extends zoo, so please see the zoo vignettes as well as the xts vignette and FAQ for information about how to use the classes.
Merging xts objects is as simple as:
R> f <- merge(f, rsi14=RSI(Ad(f), 14))
Or you could just use $<- to add/merge a column to an existing xts object:
R> f$rsi14 <- RSI(Ad(f), 14)
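The index-based join can be illustrated without quantmod or TTR. In this sketch the price series and the hand-computed three-day average are invented values; the point is only that merge() aligns on the index, so no date column is needed.

```r
library(xts)

d <- as.Date("2020-01-01") + 0:4

# Hypothetical price series
price <- xts(c(10, 11, 12, 13, 14), order.by = d)
colnames(price) <- "price"

# Hypothetical indicator that only exists from the third day onward
ma3 <- xts(c(11, 12, 13), order.by = d[3:5])
colnames(ma3) <- "ma3"

# Rows are matched by date; dates missing from ma3 are filled with NA
combined <- merge(price, ma3)
combined
```

Further indicator columns can be added the same way, one merge() call or several arguments at once.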

Using a variable to add a data frame column in R

I am trying to achieve the following
stocks <- c('AXP', 'VZ', 'V')
library('quantmod')
getSymbols(stocks)
Above command creates 3 data variables named AXP, VZ, and V
prices <- data.frame(stringAsFactors=FALSE)
Here I am trying to create a column named after the ticker (e.g. AXP) with data in
The following should add 3 columns to the frame, named AXP, VZ, and V, with data in
AXP$AXP.Adjusted, VZ$VZ.Adjusted, V$V.Adjusted
for (ticker in stocks)
{
prices$ticker <- ticker$ticker.Adjusted
}
How do I achieve this? R gives an error like this when I try this
Error in ticker$ticker.Adjusted :
$ operator is invalid for atomic vectors
Any ideas?
Thanks in advance
Here is a simpler way to do this
do.call('cbind', lapply(mget(stocks), function(d) d[,6]))
Explanation:
1. mget(stocks) gets the three objects as a list.
2. lapply extracts the 6th column, which contains the variable of interest.
3. do.call passes the list from (2) to cbind, which binds them together as columns.
NOTE: This solution does not take care of the different number of columns in the data frames.
I did not understand your question before, now I think I understood what you want:
What you wrote does not work because the object ticker is a character string. If you want to get the object named after that string, you have to evaluate the parsed text.
Try this:
for (ticker in stocks){
prices <- cbind(prices, eval(parse(text=ticker))[,paste0(ticker, ".", "Adjusted")])
}
This will give you:
An ‘xts’ object on 2007-01-03/2014-01-28 containing:
Data: num [1:1780, 1:4] 53.4 53 52.3 52.8 52.5 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "AXP.Adjusted" "AXP.Adjusted.1" "VZ.Adjusted" "V.Adjusted"
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2014-01-29 01:06:51"
One problem you're going to have is that the three downloads have different numbers of rows, so binding them all into a single data frame will fail.
The code below uses the last 1000 rows of each series (the most recent), and does not use loops.
stocks <- c('AXP', 'VZ', 'V')
library('quantmod')
getSymbols(stocks)
prices <- do.call(data.frame,
                  lapply(stocks,
                         function(s) tail(get(s)[, paste0(s, ".Adjusted")], 1000)))
colnames(prices) <- stocks
head(prices)
# AXP VZ V
# 2010-02-08 34.70 21.72 80.58
# 2010-02-09 35.40 22.01 80.79
# 2010-02-10 35.60 22.10 81.27
# 2010-02-11 36.11 22.23 82.73
# 2010-02-12 36.23 22.15 82.38
# 2010-02-16 37.37 22.34 83.45
Working from the inside out, s is the ticker (so, e.g., "AXP"); get(s) returns the object with that name, so AXP; get(s)[,paste0(s,".Adjusted")] is equivalent to AXP[,"AXP.Adjusted"]; tail(...,1000) returns the last 1000 rows of .... So when s="AXP", the function returns the last 1000 rows of AXP$AXP.Adjusted.
lapply(...) applies that function to each element in stocks.
do.call(data.frame,...) invokes the data.frame function with the list of columns returned by lapply(...).
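The get()/lapply()/do.call() pattern walked through above can be tried offline. In this sketch AXP and VZ are small made-up xts objects standing in for the downloads, and merge() is used in place of data.frame() so the result stays an xts object aligned by date; that substitution is an assumption of this example, not what the answer does.

```r
library(xts)

d <- as.Date("2020-01-01") + 0:2

# Hypothetical stand-ins for the objects getSymbols() would create
AXP <- xts(cbind(AXP.Close = 1:3, AXP.Adjusted = c(1.1, 2.1, 3.1)), order.by = d)
VZ  <- xts(cbind(VZ.Close  = 4:6, VZ.Adjusted  = c(4.1, 5.1, 6.1)), order.by = d)

stocks <- c("AXP", "VZ")

# get(s) looks up the object whose name is stored in s;
# the column is then selected by its constructed name
cols <- lapply(stocks, function(s) get(s)[, paste0(s, ".Adjusted")])

# merge() aligns the pieces on their date index
prices <- do.call(merge, cols)
colnames(prices) <- stocks
prices
```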

R from character to numeric

I have this csv file (fm.file):
Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246
And so on.
I run these commands:
fm.data <- as.xts(read.zoo(file=fm.file,format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
And I get the following:
[1] TRUE
How do I get fm.data to be numeric without losing its date index? I want to perform some statistical operations that require the data to be numeric.
I was puzzled by two things: it didn't seem that read.zoo should give you a character matrix, and it didn't seem that changing its class would affect the index values, since the data type should be separate from the indices. So I tried to replicate the problem and got a different result:
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246"
require(xts)
fm.data <- as.xts(read.zoo(file=textConnection(txt),format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
#[1] FALSE
str(fm.data)
#-------------
An ‘xts’ object from 2011-02-28 to 2011-03-04 containing:
Data: num [1:5, 1:2] 14.6 14.6 14.6 14.6 14.6 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "FM1" "FM2"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
List of 2
$ tclass: chr [1:2] "POSIXct" "POSIXt"
$ tzone : chr ""
zoo- and xts-objects have their data in a matrix accessed with coredata and their indices are a separate set of attributes.
I think the problem is that you have some dirty data in your csv file. In other words, the FM1 or FM2 column contains a character, somewhere, that stops it from being interpreted as a numeric column. When that happens, xts (which is a matrix underneath) will force the whole thing to character type.
Here is one way to use R to find suspicious data:
s <- scan(fm.file, what = "character", sep = "\n")
# s is now a vector of character strings, one entry per line
s <- s[-1] # Chop off the header row
ok <- grepl('^[-0-9,./]*$', s) # "/" is allowed because the dates contain it
all(ok) # TRUE means all your data is clean
s[!ok]
which(!ok) + 1
The second-to-last line prints out all the csv rows that contain characters you did not expect. The last line tells you which rows in the file they are (+1 because we removed the header row).
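An alternative check, sketched here with base R only, is to read the file as a data frame and see which values fail numeric conversion. The inline text below is an invented sample with one deliberately corrupted FM1 value.

```r
# Hypothetical sample with a stray letter in the second data row
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.57x203,11.457512
02/03/2011,14.574798,11.487183"

raw <- read.csv(text = txt, stringsAsFactors = FALSE)

# A single bad value makes the whole column character
sapply(raw[, c("FM1", "FM2")], class)

# Values that cannot be converted become NA; which() gives their row positions
bad <- which(is.na(suppressWarnings(as.numeric(raw$FM1))))
raw[bad, ]
```

Note that this would also flag values that are genuinely missing in the file, so the flagged rows still need a look by eye.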
Why not simply use read.csv and then convert the first column to a Date object using as.Date?
> x <- read.csv(fm.file, header=T)
> x$Date <- as.Date(x$Date, format="%d/%m/%Y")
> x
Date FM1 FM2
1 2011-02-28 14.57161 11.46946
2 2011-03-01 14.57220 11.45751
3 2011-03-02 14.57480 11.48718
4 2011-03-03 14.57556 11.48780
5 2011-03-04 14.57686 11.49025
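If an xts object is still wanted afterwards, the data frame can be converted while keeping the dates as the index. This is a sketch under the assumption that the value columns are FM1 and FM2, as in the question; the inline text replaces the file so the example runs on its own.

```r
library(xts)

txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183"

x <- read.csv(text = txt, header = TRUE, stringsAsFactors = FALSE)
x$Date <- as.Date(x$Date, format = "%d/%m/%Y")

# Numeric columns become the data matrix; the Date column becomes the index
fm <- xts(as.matrix(x[, c("FM1", "FM2")]), order.by = x$Date)

is.numeric(coredata(fm))
index(fm)
```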

Resources