I am running this code to perform a bounds test on stock data.
Everything works until I call ardlBoundOrders, which fails with the following error: Error in match.arg(method) : 'arg' must be of length 1
Where does this error come from? Could it come from the merged dataset (the code runs without any problem when I only use the dataset imported from Excel)? How can I fix it?
Thanks for your help!
Here is the script:
library(quantmod)
library(ggplot2)
library(plotly)
library(dLagM)
tickers = c("DIS", "GILD", "AMZN", "AAPL")
stocks<-getSymbols(tickers,
from = "1994-01-01",
to = "2022-02-01",
periodicity = "monthly",
src = "yahoo")
DISclose<-DIS[, 4:4]
GILDclose<-GILD[, 4:4]
AMZNclose<-AMZN[, 4:4]
AAPLclose<-AAPL[, 4:4]
newdata <- merge(DATA, DISclose)
formula <- DIS.Close ~ USDEUR+CPI+CONSCONF+FEDFUNDS+HOUST+UNRATE+INDPRO+VIX+SPY+CLI
ARDLfit <- ardlDlm(formula = formula, data = newdata, p = 10, q = 10)
summary(ARDLfit)
orders3 <- ardlBoundOrders(data = newdata, formula =
formula, ic = "BIC", max.p = 2, max.q = 2)
p <- data.frame(orders3$q, orders3$p) + 1
Boundtest<- ardlBound(data = DATA, formula =
formula2, p=p , ECM = TRUE)
par(mfrow=c(1,1))
disney<-Boundtest[["ECM"]][["EC.t"]]
plot(disney, type="l")
Update:
I think I found something:
When I merge my data, the number of rows is squared: every row of the stock data is paired with every row of my dataset. An example makes this clearer:
Here is the variable DATA:
> DATA
# A tibble: 337 × 12
Date VIX USDEUR CPI CONSCONF FEDFUNDS HOUST SPY INDPRO UNRATE
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1994-01-01 00:00:00 10.6 0.897 146. 101. 3.05 1272 28.8 67.1 6.6
2 1994-02-01 00:00:00 14.9 0.895 147. 101. 3.25 1337 28.0 67.1 6.6
3 1994-03-01 00:00:00 20.5 0.876 147. 101. 3.34 1564 26.7 67.8 6.5
4 1994-04-01 00:00:00 13.8 0.877 147. 101. 3.56 1465 27.1 68.2 6.4
5 1994-05-01 00:00:00 13.0 0.859 148. 101. 4.01 1526 27.6 68.5 6.1
6 1994-06-01 00:00:00 15.0 0.846 148. 101. 4.25 1409 26.7 69.0 6.1
7 1994-07-01 00:00:00 11.1 0.818 148. 101. 4.26 1439 27.8 69.1 6.1
8 1994-08-01 00:00:00 12.0 0.818 149 101. 4.47 1450 28.8 69.5 6
9 1994-09-01 00:00:00 14.3 0.810 149. 101. 4.73 1474 27.9 69.7 5.9
10 1994-10-01 00:00:00 14.6 0.793 149. 101. 4.76 1450 28.9 70.3 5.8
# … with 327 more rows, and 2 more variables: CLI <dbl>, SPYr <dbl>
Here is the merged variable newdata:
CLI SPYr DIS.Close
1 100.52128 0.0000000000 15.53738
2 100.70483 -0.0291642024 15.53738
3 100.83927 -0.0473966064 15.53738
4 100.92260 0.0170457821 15.53738
5 100.95804 0.0159393078 15.53738
6 100.95186 -0.0293319435 15.53738
7 100.91774 0.0391511218 15.53738
8 100.86948 0.0381206253 15.53738
9 100.80795 -0.0311470101 15.53738
10 100.72614 0.0346814791 15.53738
11 100.60322 -0.0398155024 15.53738
12 100.42905 -0.0006857954 15.53738
13 100.19862 0.0418493643 15.53738
In fact, the first row of DISclose is repeated for each row of DATA, and so on for the 2nd, the 3rd... So my dataset goes from x rows to x^2 rows.
I did some research on how to fix this, and it seems I should match both datasets with by="matchingIDinbothdataset", but I do not have a matching ID. Is there a solution?
Thank you in advance.
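A minimal sketch of why the rows multiply, and one way to avoid it, using toy data in base R. The values and the name DISclose_df are made up for illustration. It assumes the stock series can be turned into a data frame with its own Date column (for an xts object, something like data.frame(Date = index(DISclose), coredata(DISclose)) would be one option) and that both Date columns have the same type and the same day-of-month convention.

```r
# Toy stand-ins for DATA and the stock series (values are invented)
DATA <- data.frame(
  Date = as.Date(c("1994-01-01", "1994-02-01", "1994-03-01")),
  VIX  = c(10.6, 14.9, 20.5)
)
DISclose_df <- data.frame(
  Date      = as.Date(c("1994-01-01", "1994-02-01", "1994-03-01")),
  DIS.Close = c(15.5, 15.9, 14.2)
)

# With no shared column, merge() returns every combination of rows
crossed <- merge(DATA["VIX"], DISclose_df["DIS.Close"])
nrow(crossed)   # 3 * 3 rows

# With a shared Date key, rows are matched one to one
newdata <- merge(DATA, DISclose_df, by = "Date")
nrow(newdata)   # 3 rows
```

If the dates in the two sources do not line up exactly (for example a date-time column on one side and a Date on the other), they would need to be converted to a common form first.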
I am trying to calculate stock returns for different time periods for a very large dataset.
I noticed some inconsistencies between the tq_transmute results and my own check:
library(tidyquant)
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2000-01-01",
to = "2004-12-31")
print(A_stock_prices[A_stock_prices$date>"2000-12-31",])
# A tibble: 1,003 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 2001-01-02 38.5 38.5 35.1 36.4 2261684 **31.0**
2 A 2001-01-03 35.1 40.4 34.0 40.1 4502678 34.2
3 A 2001-01-04 40.7 42.7 39.6 41.7 4398388 35.4
4 A 2001-01-05 41.0 41.7 38.3 39.4 3277052 33.5
5 A 2001-01-08 38.8 39.9 37.4 38.1 2273288 32.4
6 A 2001-01-09 38.3 39.3 37.1 37.9 2474180 32.3
...
1 A 2001-12-21 19.7 20.2 19.7 20.0 3732520 17.0
2 A 2001-12-24 20.4 20.5 20.1 20.4 1246177 17.3
3 A 2001-12-26 20.5 20.7 20.1 20.1 2467051 17.1
4 A 2001-12-27 20.0 20.7 20.0 20.6 1909948 17.5
5 A 2001-12-28 20.7 20.9 20.4 20.7 1600430 17.6
6 A 2001-12-31 20.5 20.8 20.4 20.4 2142016 **17.3**
A_stock_prices %>%
tq_transmute (select = adjusted,
mutate_fun = periodReturn,
period = "yearly") %>%
ungroup()
# A tibble: 5 x 2
date yearly.returns
<date> <dbl>
1 2000-12-29 -0.240
2 2001-12-31 -0.479
3 2002-12-31 -0.370
4 2003-12-31 0.628
5 2004-12-30 -0.176
Now, based on this calculation, the yearly return for 2001 is -0.479.
But when I calculate the yearly return myself (the adjusted price at the end of the period divided by the adjusted price at the beginning of the period, minus 1), I get a different result:
A_stock_prices[A_stock_prices$date=="2001-12-31",]$adjusted/
A_stock_prices[A_stock_prices$date=="2001-01-02",]$adjusted-1
"-0.439"
The same issue persists with other time periods (e.g., monthly or weekly calculations).
What am I missing?
Update: The very strange thing is that if I change the start date in tq_get to 2001:
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2001-01-01",
to = "2004-01-01")
I get the correct result for 2001 (but not for the other years).
Not sure how your dataset is built but what's the first date for the 2001 group? Your manual attempt has it as January 2nd, 2001. If there's data present for January 1st, what's that result?
If that's not it, I'd recommend posting your data, just so we can see how it's structured.
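Eventually I figured it out:
periodReturn (called through tq_transmute) calculates the return starting from the last price before the requested period, not from the first price inside it.
I.e., for the yearly return it calculates the return from (say) 31/12/2021 to 31/12/2022 (rather than from 01/01/2022). This also explains why the first year in the data matches the manual check: there is no earlier price, so the first available one is used.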
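The difference between the two conventions can be sketched in base R with made-up prices (the numbers below are not the real series): one return is measured from the last price of the previous year, the other from the first price of the year itself.

```r
# Invented prices: last observation of 2000, then first and last of 2001
prices <- data.frame(
  date     = as.Date(c("2000-12-29", "2001-01-02", "2001-12-31")),
  adjusted = c(40, 32, 20)
)

last_2000  <- prices$adjusted[prices$date == as.Date("2000-12-29")]
first_2001 <- prices$adjusted[prices$date == as.Date("2001-01-02")]
last_2001  <- prices$adjusted[prices$date == as.Date("2001-12-31")]

# Year-end to year-end (the convention described above)
ret_prev_close <- last_2001 / last_2000 - 1    # -0.5

# First to last observation within 2001 (the manual check in the question)
ret_within <- last_2001 / first_2001 - 1       # -0.375
```

Whenever the price moves between the last trading day of one year and the first trading day of the next, the two numbers differ.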
I have a large panel data set with provinces for each year-month. I would like to run a function over a list of data frames (which I create from this initial data frame) so that each of them gets a new column holding the output of this function. However, when I run the code, I keep getting an error. Here are the data and the code:
adm1 year month prov_code mean_temperaturec province_name avgpreci longitude latitude PET[,"PET_tho"]
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 TUR034 1978 1 TR100 5.61 Istanbul 170. 28.8 41.2 10.3
2 TUR034 1978 2 TR100 7.48 Istanbul 88 28.8 41.2 15.8
3 TUR034 1978 3 TR100 8.55 Istanbul 71 28.8 41.2 24.1
4 TUR034 1978 4 TR100 11.6 Istanbul 88.7 28.8 41.2 41.4
5 TUR034 1978 5 TR100 16.6 Istanbul 33.2 28.8 41.2 80.5
6 TUR034 1978 6 TR100 20.8 Istanbul 5.30 28.8 41.2 115.
# ... with 2 more variables: wbal <dbl[,1]>, SPEI <dbl>
data4spei.s <- split(dataSPEI, dataSPEI$prov_code)
spei_rows <- lapply(data4spei.s, function(x) {
x$SPEI <- spei(x$wbal, 12, na.rm = TRUE)
return(x)
})
Error in stop_vctrs(): ! Input must be a vector, not a
object. Run rlang::last_error() to see where the error occurred.
For a different function the code worked properly and I could get the columns. Does someone know what I am doing wrong?
I used the code below to add a regression line to a boxplot.
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
abline(reg = yield.year)
However, the regression line did not show up. The plot I got is below.
My data looks like this. It is panel data, which might cause problems with the regression line.
> head(dfreg.raw)
# A tibble: 6 x 15
index Year yield State.Code harv frez_j dd_j cupc_j sm7_j fitted_j max_spring_j sp_spring_j
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 16001 1984 105 16 7200 330. 2438. 7.32 53.4 49.1 19.7 0.863
2 16001 1985 96.8 16 8200 413. 2407. 5.71 52.5 48.4 23.9 -0.391
3 16001 1986 94.9 16 7400 476. 2638. 8.34 52.5 48.4 23.4 -0.122
4 16001 1987 106. 16 9700 154. 2838. 5.44 54.4 49.9 25.6 -0.485
5 16001 1988 89.6 16 7600 184. 2944. 3.28 54.5 50.0 23.9 0.115
6 16001 1989 96.4 16 7300 383. 2766. 5.91 52.6 48.4 23.5 -1.02
# … with 3 more variables: pc_spring_j <dbl>, lt <dbl>, qt <dbl>
Does anyone have any idea about this?
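boxplot draws the boxes at x positions 1, 2, ..., up to the number of levels of the x variable, not at the actual Year values, so the line drawn by abline falls outside the plotted region. You can try something like this below.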
First simulate a dataset:
dfreg.raw= data.frame(
yield=rpois(100,lambda=rep(seq(60,100,by=10),each=20)),
Year=rep(1995:1999,each=20)
)
Then plot:
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
Get a sorted vector of the unique Years, predict at those Years, and draw the predictions at the box positions 1, 2, ...:
X = sort(unique(dfreg.raw$Year))
lines(x=1:length(X),
y=predict(yield.year,data.frame(Year=X)),col="blue",lty=8)
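To see why abline draws nothing visible here, the following sketch compares what the fitted line evaluates to at the box positions with what it evaluates to at the real years. It reuses the simulated data from above with a fixed seed (the seed and the names dfreg, fit, bp, box_x, at_box_x and at_years are added for illustration) and uses plot = FALSE so no plot is drawn.

```r
set.seed(1)
dfreg <- data.frame(
  yield = rpois(100, lambda = rep(seq(60, 100, by = 10), each = 20)),
  Year  = rep(1995:1999, each = 20)
)
fit <- lm(yield ~ Year, data = dfreg)

# Box statistics without drawing; bp$names holds the group labels
bp    <- boxplot(yield ~ Year, data = dfreg, plot = FALSE)
box_x <- seq_along(bp$names)   # x positions of the boxes: 1, 2, 3, 4, 5

# What abline() would draw over the boxes: the line evaluated at x = 1..5
at_box_x <- coef(fit)[1] + coef(fit)[2] * box_x

# What we actually want: the line evaluated at the real years
at_years <- predict(fit, data.frame(Year = as.numeric(bp$names)))

range(at_box_x)   # far below every observed yield
range(at_years)   # within the range of the observed yields
```

Plotting at_years against box_x is what the lines() call above does.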
I have a large data frame with meteorological conditions at different locations (column radar_id), time (column date) and heights (column hgt).
I need to interpolate the data of each parameter (temp, u, v...) to a specific height (500 m above the ground for each radar, given in the altitude_500 column) separately for each location (radar_id) and date.
I tried using approx in dplyr pipes and splitting the data frame, but neither worked for me.
Example of part of my data frame:
head(example)
radar_id date temp u v hgt W wind_ang temp_diff tw altitude_500
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Dagan 2014-03-02 18.8 -6.00 4.80 77 7.68 129. 5. -3.33 547
2 Dagan 2014-03-02 17.6 -2.40 9.30 742 9.60 166. 6 -9.20 547
3 Dagan 2014-03-02 16.2 3.10 15.4 1463 15.7 -169. 5.80 -10.4 547
4 Dagan 2014-03-03 16.2 0.900 -0.500 96 1.03 -60.9 -2.6 -0.971 547
5 Dagan 2014-03-03 13.0 3.10 -0.500 754 3.14 -80.8 -4.6 -2.39 547
6 Dagan 2014-03-03 10.8 8.10 4.10 1462 9.08 -117. -5.30 -5.01 547
I want to get a column with the y values from approx for each parameter (the x values are the heights, hgt), evaluated at a specific height (given by the altitude_500 column), after the data frame is grouped by radar_id and date.
Here's a dplyr solution. First, I define the data.
# Data
df <- read.table(text = "radar_id date temp u v hgt W wind_ang temp_diff tw altitude_500
1 Dagan 2014-03-02 18.8 -6.00 4.80 77 7.68 129. 5. -3.33 547
2 Dagan 2014-03-02 17.6 -2.40 9.30 742 9.60 166. 6 -9.20 547
3 Dagan 2014-03-02 16.2 3.10 15.4 1463 15.7 -169. 5.80 -10.4 547
4 Dagan 2014-03-03 16.2 0.900 -0.500 96 1.03 -60.9 -2.6 -0.971 547
5 Dagan 2014-03-03 13.0 3.10 -0.500 754 3.14 -80.8 -4.6 -2.39 547
6 Dagan 2014-03-03 10.8 8.10 4.10 1462 9.08 -117. -5.30 -5.01 547")
Then, I load the dplyr package.
# Load library
library(dplyr)
Finally, I group by both radar_id and date and perform a linear interpolation using approx to get the value at altitude_500 m for each column (except the grouping variables and hgt).
# Group then summarise
df %>%
group_by(radar_id, date) %>%
summarise_at(vars(-hgt), ~approx(hgt, ., xout = first(altitude_500))$y)
#> # A tibble: 2 x 10
#> # Groups: radar_id [1]
#> radar_id date temp u v W wind_ang temp_diff tw
#> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Dagan 2014~ 18.0 -3.46 7.98 9.04 155. 5.71 -7.48
#> 2 Dagan 2014~ 14.0 2.41 -0.5 2.48 -74.5 -3.97 -1.94
#> # ... with 1 more variable: altitude_500 <dbl>
Created on 2019-08-21 by the reprex package (v0.3.0)
This assumes that there is only one value of altitude_500 for each radar_id-date pair.
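As a quick check of what approx does inside each group, here is the temp value for the first group (Dagan, 2014-03-02) computed in base R and compared with the linear interpolation formula written out by hand. The variable names temp_547 and manual are only for illustration.

```r
# Heights and temperatures of the first group in the example data
hgt  <- c(77, 742, 1463)
temp <- c(18.8, 17.6, 16.2)

# Linear interpolation at 547 m, as done per column in the pipeline above
temp_547 <- approx(x = hgt, y = temp, xout = 547)$y

# Same value by hand: 547 m lies between the 77 m and 742 m levels
manual <- temp[1] + (547 - hgt[1]) / (hgt[2] - hgt[1]) * (temp[2] - temp[1])

temp_547   # about 17.95, shown as 18.0 in the tibble output above
```

Note that approx returns NA by default when xout lies outside the range of the heights in a group.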
I am using the tidyquant package in R to calculate indicators for every symbol in the SP500.
As a sample of code:
stocks_w_price_indicators<- stocks2 %>%
group_by(symbol)%>%
tq_mutate(select=close,mutate_fun=RSI) %>%
tq_mutate(select=c(high,low,close),mutate_fun=CLV)
This works for price-based indicators, but not indicators that include volume.
I get "Evaluation error: argument "volume" is missing, with no default."
stocks_w_price_indicators<- stocks2 %>%
group_by(symbol)%>%
tq_mutate(select=close,mutate_fun=RSI) %>%
tq_mutate(select=c(high,low,close,volume),mutate_fun=CMF)
How can I get indicators that include volume to calculate properly?
A few functions from the TTR package cannot be used directly with tidyquant, because they either need 3 inputs, like adjRatios, or need an HLC object plus a volume column, like CMF. Normally you would solve this with tq_mutate_xy, but it cannot handle the HLC object that CMF needs. The OBV function from TTR, by contrast, only needs a price and a volume column and works fine with tq_mutate_xy.
That leaves 2 options: one, adjust the CMF function to handle an (O)HLCV object; or two, create your own function.
The last option is the fastest. Since CMF calls CLV internally, you can take your first code block and extend it with a normal dplyr::mutate call to calculate the CMF.
# create function to calculate the Chaikin money flow
tq_cmf <- function(clv, volume, n = 20){
runSum(clv * volume, n)/runSum(volume, n)
}
stocks_w_price_indicators <- stocks2 %>%
group_by(symbol) %>%
tq_mutate(select = close, mutate_fun = RSI) %>%
tq_mutate(select = c(high, low, close), mutate_fun = CLV) %>%
mutate(cmf = tq_cmf(clv, volume, 20))
# A tibble: 5,452 x 11
# Groups: symbol [2]
symbol date open high low close volume adjusted rsi clv cmf
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 MSFT 2008-01-02 35.8 36.0 35 35.2 63004200 27.1 NA -0.542 NA
2 MSFT 2008-01-03 35.2 35.7 34.9 35.4 49599600 27.2 NA 0.291 NA
3 MSFT 2008-01-04 35.2 35.2 34.1 34.4 72090800 26.5 NA -0.477 NA
4 MSFT 2008-01-07 34.5 34.8 34.2 34.6 80164300 26.6 NA 0.309 NA
5 MSFT 2008-01-08 34.7 34.7 33.4 33.5 79148300 25.7 NA -0.924 NA
6 MSFT 2008-01-09 33.4 34.5 33.3 34.4 74305500 26.5 NA 0.832 NA
7 MSFT 2008-01-10 34.3 34.5 33.8 34.3 72446000 26.4 NA 0.528 NA
8 MSFT 2008-01-11 34.1 34.2 33.7 33.9 55187900 26.1 NA -0.269 NA
9 MSFT 2008-01-14 34.5 34.6 34.1 34.4 52792200 26.5 NA 0.265 NA
10 MSFT 2008-01-15 34.0 34.4 34 34 61606200 26.2 NA -1 NA
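The ratio computed by tq_cmf can also be sketched without TTR, which may help when checking the numbers by hand. This is a base R approximation under stated assumptions: roll_sum is a hypothetical helper standing in for runSum, the prices and volumes are invented, and the case high == low (division by zero in the close location value) is not handled.

```r
# Trailing sum of the last n values; NA until n values are available
roll_sum <- function(x, n) {
  as.numeric(stats::filter(x, rep(1, n), sides = 1))
}

# Invented bars for illustration
high   <- c(10, 11, 12, 11)
low    <- c( 8,  9, 10,  9)
close  <- c( 9, 11, 10, 10)
volume <- c(100, 200, 100, 100)

# Close location value: where the close sits within the high-low range
clv <- ((close - low) - (high - close)) / (high - low)   # 0, 1, -1, 0

# Chaikin money flow over a window of n = 2 bars
n   <- 2
cmf <- roll_sum(clv * volume, n) / roll_sum(volume, n)   # NA, 2/3, 1/3, -0.5
```

With n = 20, as in the answer, the first 19 rows of each symbol are NA, which matches the NA values in the cmf column of the output above.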