I have a large panel dataset with provinces for each year-month. I would like to run a function over a list of data frames (which I create from this initial data frame) so that each of them gets a new column containing the output of this function. However, when I run the code, I keep getting an error. Here is the data and the code:
adm1 year month prov_code mean_temperaturec province_name avgpreci longitude latitude PET[,"PET_tho"]
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 TUR034 1978 1 TR100 5.61 Istanbul 170. 28.8 41.2 10.3
2 TUR034 1978 2 TR100 7.48 Istanbul 88 28.8 41.2 15.8
3 TUR034 1978 3 TR100 8.55 Istanbul 71 28.8 41.2 24.1
4 TUR034 1978 4 TR100 11.6 Istanbul 88.7 28.8 41.2 41.4
5 TUR034 1978 5 TR100 16.6 Istanbul 33.2 28.8 41.2 80.5
6 TUR034 1978 6 TR100 20.8 Istanbul 5.30 28.8 41.2 115.
# ... with 2 more variables: wbal <dbl[,1]>, SPEI <dbl>
data4spei.s <- split(dataSPEI, dataSPEI$prov_code)
spei_rows <- lapply(data4spei.s, function(x) {
  x$SPEI <- spei(x$wbal, 12, na.rm = TRUE)
  return(x)
})
Error in stop_vctrs(): ! Input must be a vector, not a
object. Run rlang::last_error() to see where the error occurred.
For a different function the code worked properly and I could get the columns. Does someone know what I am doing wrong?
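The error text suggests that the value being stored in the new column is a classed object rather than a plain vector. One possible reading, offered as an assumption because no answer is recorded here, is that spei() returns a fitted object whose numeric series sits in a component (to my knowledge `fitted` in the SPEI package), so only that component should be assigned. The sketch below uses a hypothetical stand-in, `fake_spei()`, and made-up `wbal` values so that it runs without the package.

```r
# Hypothetical stand-in for SPEI::spei(): returns a classed list, not a vector.
fake_spei <- function(x, scale) {
  fitted <- stats::filter(x, rep(1 / scale, scale), sides = 1)  # rolling mean
  structure(list(fitted = fitted, scale = scale), class = "fake_spei")
}

# Toy panel: two provinces, six months each (values are made up).
dataSPEI <- data.frame(
  prov_code = rep(c("TR100", "TR211"), each = 6),
  wbal      = c(1, 3, 2, 5, 4, 6, 2, 2, 4, 4, 6, 6)
)

data4spei.s <- split(dataSPEI, dataSPEI$prov_code)
spei_rows <- lapply(data4spei.s, function(x) {
  fit <- fake_spei(x$wbal, 3)
  # Store only the numeric series, not the whole fitted object.
  x$SPEI <- as.numeric(fit$fitted)
  x
})
result <- do.call(rbind, spei_rows)
```

With the real package, the analogous line would presumably be `x$SPEI <- as.numeric(spei(x$wbal, 12, na.rm = TRUE)$fitted)`; check `str()` of the returned object to confirm the component name.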
I am running this code to perform a bounds test on stock data.
Everything works until I call ardlBoundOrders, which gives the following error: Error in match.arg(method) : 'arg' must be of length 1
Where does this error come from? Could it come from the merged dataset (the code runs without any problem when I use only the dataset imported from Excel)? How can I fix it?
Thanks for your help!
Here is the script:
library(quantmod)
library(ggplot2)
library(plotly)
library(dLagM)
tickers = c("DIS", "GILD", "AMZN", "AAPL")
stocks <- getSymbols(tickers,
                     from = "1994-01-01",
                     to = "2022-02-01",
                     periodicity = "monthly",
                     src = "yahoo")
DISclose<-DIS[, 4:4]
GILDclose<-GILD[, 4:4]
AMZNclose<-AMZN[, 4:4]
AAPLclose<-AAPL[, 4:4]
newdata <- merge(DATA, DISclose)
formula <- DIS.Close ~ USDEUR+CPI+CONSCONF+FEDFUNDS+HOUST+UNRATE+INDPRO+VIX+SPY+CLI
ARDLfit <- ardlDlm(formula = formula, data = newdata, p = 10, q = 10)
summary(ARDLfit)
orders3 <- ardlBoundOrders(data = newdata, formula = formula,
                           ic = "BIC", max.p = 2, max.q = 2)
p <- data.frame(orders3$q, orders3$p) + 1
Boundtest <- ardlBound(data = DATA, formula = formula2,
                       p = p, ECM = TRUE)
par(mfrow=c(1,1))
disney<-Boundtest[["ECM"]][["EC.t"]]
plot(disney, type="l")
Update:
I think I found something:
When I merge my data, the number of rows is squared: every row of the stock data is paired with every row of my other data. An example makes this clearer:
Here is the variable DATA:
> DATA
# A tibble: 337 × 12
Date VIX USDEUR CPI CONSCONF FEDFUNDS HOUST SPY INDPRO UNRATE
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1994-01-01 00:00:00 10.6 0.897 146. 101. 3.05 1272 28.8 67.1 6.6
2 1994-02-01 00:00:00 14.9 0.895 147. 101. 3.25 1337 28.0 67.1 6.6
3 1994-03-01 00:00:00 20.5 0.876 147. 101. 3.34 1564 26.7 67.8 6.5
4 1994-04-01 00:00:00 13.8 0.877 147. 101. 3.56 1465 27.1 68.2 6.4
5 1994-05-01 00:00:00 13.0 0.859 148. 101. 4.01 1526 27.6 68.5 6.1
6 1994-06-01 00:00:00 15.0 0.846 148. 101. 4.25 1409 26.7 69.0 6.1
7 1994-07-01 00:00:00 11.1 0.818 148. 101. 4.26 1439 27.8 69.1 6.1
8 1994-08-01 00:00:00 12.0 0.818 149 101. 4.47 1450 28.8 69.5 6
9 1994-09-01 00:00:00 14.3 0.810 149. 101. 4.73 1474 27.9 69.7 5.9
10 1994-10-01 00:00:00 14.6 0.793 149. 101. 4.76 1450 28.9 70.3 5.8
# … with 327 more rows, and 2 more variables: CLI <dbl>, SPYr <dbl>
Here is the merged variable newdata:
CLI SPYr DIS.Close
1 100.52128 0.0000000000 15.53738
2 100.70483 -0.0291642024 15.53738
3 100.83927 -0.0473966064 15.53738
4 100.92260 0.0170457821 15.53738
5 100.95804 0.0159393078 15.53738
6 100.95186 -0.0293319435 15.53738
7 100.91774 0.0391511218 15.53738
8 100.86948 0.0381206253 15.53738
9 100.80795 -0.0311470101 15.53738
10 100.72614 0.0346814791 15.53738
11 100.60322 -0.0398155024 15.53738
12 100.42905 -0.0006857954 15.53738
13 100.19862 0.0418493643 15.53738
In fact, every row of DATA is paired with the first row of DISclose, then with the 2nd, the 3rd, and so on, so my dataset goes from x rows to x^2 rows.
I did some research on this problem, and it seems I should match both datasets with by="matchingIDinbothdataset", but I do not have a matching ID. Is there a solution?
Thank you in advance.
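One way to read the x^2 blow-up: when the two tables share no column name, merge() on data frames returns the Cartesian product. The date can serve as the matching ID once it is an ordinary column in both tables. The sketch below is a minimal base-R illustration under that assumption; `DISclose_df` is a hypothetical data-frame version of the price series, and its prices are made up.

```r
# Macro data keyed by a Date column (values taken from the DATA printout).
DATA <- data.frame(
  Date = as.Date(c("1994-01-01", "1994-02-01", "1994-03-01")),
  VIX  = c(10.6, 14.9, 20.5)
)

# Closing prices whose dates live only in the row names, as happens when a
# time-series object is coerced to a data frame. Prices are made up.
DISclose_df <- data.frame(
  DIS.Close = c(15.5, 16.1, 14.9),
  row.names = c("1994-01-01", "1994-02-01", "1994-03-01")
)

# No shared column: merge() returns every pairing (3 x 3 = 9 rows).
crossed <- merge(DATA, DISclose_df)

# Promote the dates to a real column, then join on it (3 rows).
DISclose_df$Date <- as.Date(rownames(DISclose_df))
newdata <- merge(DATA, DISclose_df, by = "Date")
```

This only works if the dates in both tables line up exactly and have the same type; since DATA$Date is printed as a date-time, it would likely need `as.Date()` first.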
I am trying to calculate stock returns over different time periods for a very large dataset.
I noticed some inconsistencies between the tq_transmute calculations and my own check:
library(tidyquant)
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2000-01-01",
to = "2004-12-31")
print(A_stock_prices[A_stock_prices$date>"2000-12-31",])
# A tibble: 1,003 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 2001-01-02 38.5 38.5 35.1 36.4 2261684 **31.0**
2 A 2001-01-03 35.1 40.4 34.0 40.1 4502678 34.2
3 A 2001-01-04 40.7 42.7 39.6 41.7 4398388 35.4
4 A 2001-01-05 41.0 41.7 38.3 39.4 3277052 33.5
5 A 2001-01-08 38.8 39.9 37.4 38.1 2273288 32.4
6 A 2001-01-09 38.3 39.3 37.1 37.9 2474180 32.3
...
1 A 2001-12-21 19.7 20.2 19.7 20.0 3732520 17.0
2 A 2001-12-24 20.4 20.5 20.1 20.4 1246177 17.3
3 A 2001-12-26 20.5 20.7 20.1 20.1 2467051 17.1
4 A 2001-12-27 20.0 20.7 20.0 20.6 1909948 17.5
5 A 2001-12-28 20.7 20.9 20.4 20.7 1600430 17.6
6 A 2001-12-31 20.5 20.8 20.4 20.4 2142016 **17.3**
A_stock_prices %>%
tq_transmute (select = adjusted,
mutate_fun = periodReturn,
period = "yearly") %>%
ungroup()
# A tibble: 5 x 2
date yearly.returns
<date> <dbl>
1 2000-12-29 -0.240
2 2001-12-31 -0.479
3 2002-12-31 -0.370
4 2003-12-31 0.628
5 2004-12-30 -0.176
Now, based on the calculation, the yearly return for the year 2001 is: "-0.479"
But, when I calculate the yearly return myself (the close price at the end of the period divided by the close price at the beginning of the period), I get a different result:
A_stock_prices[A_stock_prices$date=="2001-12-31",]$adjusted/
A_stock_prices[A_stock_prices$date=="2001-01-02",]$adjusted-1
"-0.439"
Same issue persists with other time periods (e.g., monthly or weekly calculations).
What am I missing?
Update: The very strange thing is that if I change the start date in tq_get to 2001:
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2001-01-01",
to = "2004-01-01")
I get the correct result for the year 2001 (but not for the other years).
Not sure how your dataset is built but what's the first date for the 2001 group? Your manual attempt has it as January 2nd, 2001. If there's data present for January 1st, what's that result?
If that's not it, I'd recommend posting your data, just so we can see how it's structured.
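Eventually I figured it out:
The issue is not in tq_get() but in how periodReturn (called through tq_transmute) defines a period: it measures each period's return from the last price of the previous period, not from the first price of the period itself.
I.e., the yearly return for 2022 is calculated from (say) 31/12/2021 to 31/12/2022 (rather than from 01/01/2022).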
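The two conventions can be compared by hand in base R. The sketch below uses made-up prices (not the real series for "A") and computes the 2001 return both from the previous year-end price and from the first trading day of 2001. It would also be consistent with the update above: when the data start in 2001, there is no earlier year-end price to use for that year.

```r
# Toy price series; dates mimic trading days, prices are made up.
prices <- data.frame(
  date = as.Date(c("2000-12-28", "2000-12-29",
                   "2001-01-02", "2001-06-29", "2001-12-31")),
  adjusted = c(50, 40, 32, 28, 20)
)
prices$year <- format(prices$date, "%Y")

# First and last price within each calendar year (rows are sorted by date).
by_year     <- split(prices$adjusted, prices$year)
last_price  <- vapply(by_year, function(p) p[length(p)], numeric(1))
first_price <- vapply(by_year, function(p) p[1], numeric(1))

# Convention 1: previous year-end to this year-end.
prev_close_return <- last_price[["2001"]] / last_price[["2000"]] - 1

# Convention 2: first trading day to last trading day of the same year.
within_year_return <- last_price[["2001"]] / first_price[["2001"]] - 1
```

Here the first convention gives 20 / 40 - 1 and the second gives 20 / 32 - 1, so the two differ whenever the price moves between the last trading day of one year and the first trading day of the next.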
I have a large panel dataset with provinces for each year-month. I would like to run a function over a list of data frames (which I create from this initial data frame) so that each of them gets a new column containing the output of this function. However, when I run the code, the new column does not appear. Here is the code:
> head(dataSPEI)
# A tibble: 6 x 11
adm1 year month prov_code mean_temperaturec neighboors province_name avgpreci longitude latitude PET
<chr> <dbl> <dbl> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 TUR034 1978 1 TR100 5.61 TR100, TR21~ Istanbul 170. 28.8 41.2 0
2 TUR034 1978 2 TR100 7.48 TR100, TR21~ Istanbul 88 28.8 41.2 0
3 TUR034 1978 3 TR100 8.55 TR100, TR21~ Istanbul 71 28.8 41.2 0
4 TUR034 1978 4 TR100 11.6 TR100, TR21~ Istanbul 88.7 28.8 41.2 0
5 TUR034 1978 5 TR100 16.6 TR100, TR21~ Istanbul 33.2 28.8 41.2 0
6 TUR034 1978 6 TR100 20.8 TR100, TR21~ Istanbul 5.30 28.8 41.2 0
dat.s <- split(dataSPEI, dataSPEI$prov_code)
lapply(dat.s, function(x) {
  x$PET <- thornthwaite(x$mean_temperaturec, x$latitude[1])
  return(x)
})
Does someone know what I am doing wrong?
Try assigning the result of the lapply call to an object; in this case, you can assign it back to the original list of data frames:
dat.s <- lapply(...)
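A minimal sketch of why the assignment matters: lapply() returns a new list and leaves its input untouched. `toy_pet()` is a hypothetical stand-in for thornthwaite() so the example runs without the SPEI package, and the TR211 temperatures are made up.

```r
# Toy panel with two provinces.
dataSPEI <- data.frame(
  prov_code         = rep(c("TR100", "TR211"), each = 3),
  mean_temperaturec = c(5.61, 7.48, 8.55, 4.2, 6.1, 9.3)
)
dat.s <- split(dataSPEI, dataSPEI$prov_code)

# Hypothetical stand-in for thornthwaite(): one value per input row.
toy_pet <- function(temp) pmax(temp, 0) * 2

# Without assignment, the modified copies are discarded and dat.s is unchanged.
invisible(lapply(dat.s, function(x) {
  x$PET <- toy_pet(x$mean_temperaturec)
  x
}))
has_pet_before <- "PET" %in% names(dat.s[[1]])

# Assign the result back, then recombine into one data frame if needed.
dat.s <- lapply(dat.s, function(x) {
  x$PET <- toy_pet(x$mean_temperaturec)
  x
})
dataSPEI <- do.call(rbind, dat.s)
```

The final do.call(rbind, ...) step is optional; it is only needed if a single data frame is wanted again.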
I am using the ET.PenmanMonteith function ("Evapotranspiration" package, R). I have a list called data1 with Tmax, Tmin, RHmax, RHmin, Rs and u2, and also Date.daily (date) and Date.monthly (yearmon). I also have another list called constants, with all the required constants. When I run the code, I get an error ("Error in aggregate.data.frame(as.data.frame(x), ...) : no rows to aggregate"). My code is:
data=read_excel("prueba.xlsx")
head(data)
Tmax Tmin RHmax RHmin u2 Rs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30.2 19.4 55.8 100 2.1 18.5
2 33.6 19.8 30.8 69.6 3.3 29.9
3 34.4 16 27.8 83.3 1.5 31.4
4 35.8 17 28.8 89.5 1.7 31.1
5 36.4 18 31.1 90.5 1.7 31.2
6 37.6 20.4 35.4 95.8 1.5 31.4
rnames=read_excel("prueba.xlsx",sheet="Hoja1")
head(rnames)
Tmax Tmin RHmax RHmin u2 Rs ...7 Date.daily Date.monthly
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <chr> <chr>
1 30.2 19.4 55.8 100 10 18.5 NA 1990-1-1 1990-1
2 33.6 19.8 30.8 69.6 16 29.9 NA 1990-1-2 1990-1
3 34.4 16 27.8 83.3 7 31.4 NA 1990-1-3 1990-1
4 35.8 17 28.8 89.5 8 31.1 NA 1990-1-4 1990-1
5 36.4 18 31.1 90.5 8 31.2 NA 1990-1-5 1990-1
6 37.6 20.4 35.4 95.8 7 31.4 NA 1990-1-6 1990-1
rnames=rnames[8]
colnames(rnames)="Date.daily"
rnames=as.Date(rnames$Date.daily)
rnames2=as.yearmon(rnames)
data1=cbind(rnames,rnames2,data)
as.Date(data1$rnames)
colnames(data1)=c("Data.daily","Data.monthly","Tmax","Tmin","RHmax","RHmin","u2","Rs")
data1=as.list(data1)
#List file is seen like this:[enter image description here][1]
#Constants data
constants=read_excel("constants.xlsx")
head(constants)
lambda sigma Gsc lat lat_rad as bs Elev z
<dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <dbl> <dbl>
1 2.45 0.00000000490 0.082 -29.9 -0.521 NA NA 88 2
constants=as.list(constants)
#List file is seen like this:[enter image description here][2]
res=ET.PenmanMonteith(data1, constants, ts="daily", solar="data",
wind="yes", crop="short", message="yes",
AdditionalStats="yes", save.csv="no")
Error in aggregate.data.frame(as.data.frame(x), ...) :
no rows to aggregate
Could anyone help me?
[1]: https://i.stack.imgur.com/vJD78.png
[2]: https://i.stack.imgur.com/tHNU7.png
I used the code below to add a regression line on top of a boxplot.
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
abline(reg = yield.year)
However, the regression line did not show up. The plot I got is below.
My data looks like this. It is panel data, which might cause problems with the regression line.
> head(dfreg.raw)
# A tibble: 6 x 15
index Year yield State.Code harv frez_j dd_j cupc_j sm7_j fitted_j max_spring_j sp_spring_j
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 16001 1984 105 16 7200 330. 2438. 7.32 53.4 49.1 19.7 0.863
2 16001 1985 96.8 16 8200 413. 2407. 5.71 52.5 48.4 23.9 -0.391
3 16001 1986 94.9 16 7400 476. 2638. 8.34 52.5 48.4 23.4 -0.122
4 16001 1987 106. 16 9700 154. 2838. 5.44 54.4 49.9 25.6 -0.485
5 16001 1988 89.6 16 7600 184. 2944. 3.28 54.5 50.0 23.9 0.115
6 16001 1989 96.4 16 7300 383. 2766. 5.91 52.6 48.4 23.5 -1.02
# … with 3 more variables: pc_spring_j <dbl>, lt <dbl>, qt <dbl>
Does anyone have any idea about this?
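boxplot() treats Year as a grouping variable and draws the boxes at x positions 1, 2, ..., up to the number of distinct years, while abline() draws the fitted line on the original Year scale, so the line falls outside the plotted range. You can try something like the following.
First simulate a dataset: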
dfreg.raw= data.frame(
yield=rpois(100,lambda=rep(seq(60,100,by=10),each=20)),
Year=rep(1995:1999,each=20)
)
Then plot:
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
Get a unique ascending vector of years, and plot the predictions at the box positions:
X <- sort(unique(dfreg.raw$Year))
lines(x = seq_along(X),
      y = predict(yield.year, data.frame(Year = X)), col = "blue", lty = 8)
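An alternative sketch, under the assumption that the years are consecutive with no gaps: refit the model on the box positions themselves, so that abline() can be used directly. The `pos` column and the `yield.pos` model are names introduced here for illustration only.

```r
set.seed(1)
dfreg.raw <- data.frame(
  yield = rpois(100, lambda = rep(seq(60, 100, by = 10), each = 20)),
  Year  = rep(1995:1999, each = 20)
)

# Box positions used by boxplot(): 1 for the first year, 2 for the second, ...
dfreg.raw$pos <- as.integer(factor(dfreg.raw$Year))

boxplot(yield ~ Year, data = dfreg.raw,
        ylab = "Yield (bushels/acre)", col = "orange")

# Fit on the position scale so intercept and slope match the plot's x axis.
yield.pos <- lm(yield ~ pos, data = dfreg.raw)
abline(reg = yield.pos, col = "blue", lty = 2)
```

With consecutive years the slope is the same as in the Year model and only the intercept shifts. If some years are missing, the positions no longer map linearly to Year, and the predict() approach above is the safer choice.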