Adding a column to a list of data frames with lapply - r

I have a large panel dataset with provinces for each year-month. I would like to apply a function to each data frame in a list (which I create by splitting the initial data frame) so that each one gets a new column holding the output of this function. However, when I run the code, the new column does not appear. Here is the code:
> head(dataSPEI)
# A tibble: 6 x 11
adm1 year month prov_code mean_temperaturec neighboors province_name avgpreci longitude latitude PET
<chr> <dbl> <dbl> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 TUR034 1978 1 TR100 5.61 TR100, TR21~ Istanbul 170. 28.8 41.2 0
2 TUR034 1978 2 TR100 7.48 TR100, TR21~ Istanbul 88 28.8 41.2 0
3 TUR034 1978 3 TR100 8.55 TR100, TR21~ Istanbul 71 28.8 41.2 0
4 TUR034 1978 4 TR100 11.6 TR100, TR21~ Istanbul 88.7 28.8 41.2 0
5 TUR034 1978 5 TR100 16.6 TR100, TR21~ Istanbul 33.2 28.8 41.2 0
6 TUR034 1978 6 TR100 20.8 TR100, TR21~ Istanbul 5.30 28.8 41.2 0
dat.s <- split(dataSPEI, dataSPEI$prov_code)
lapply(dat.s, function(x) {
x$PET <- thornthwaite(x$mean_temperaturec, x$latitude[1])
return(x)
})
Does someone know what I am doing wrong?

Assign the result of the lapply call to an object. lapply returns a new list and does not modify dat.s in place, so you can assign the result back to the original list of data frames:
dat.s <- lapply(...)
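A minimal sketch of why the assignment matters, using base R only. The toy data and the `toy_pet` helper are made up for illustration and stand in for `dataSPEI` and `thornthwaite()`:

```r
# Toy stand-in for dataSPEI (made-up values, two provinces)
toy <- data.frame(
  prov_code = c("TR100", "TR100", "TR211", "TR211"),
  mean_temperaturec = c(5.6, 7.5, 3.1, 4.2)
)

# Hypothetical placeholder for thornthwaite(): any function returning a vector
toy_pet <- function(temp) temp * 2

# Adds the new column to one data frame and returns the modified copy
add_pet <- function(x) {
  x$PET <- toy_pet(x$mean_temperaturec)
  x
}

dat.s <- split(toy, toy$prov_code)

# Calling lapply without assignment leaves dat.s untouched
invisible(lapply(dat.s, add_pet))
has_pet_before <- "PET" %in% names(dat.s[[1]])  # FALSE

# Assigning the result back keeps the new column
dat.s <- lapply(dat.s, add_pet)
has_pet_after <- "PET" %in% names(dat.s[[1]])   # TRUE

# Optionally stack the pieces back into one data frame
result <- do.call(rbind, dat.s)
```

The function passed to lapply works on a copy of each list element, which is why the column only survives in the returned list.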

Related

Error in match.arg(method): where does it come from?

I am running this code to perform a bounds test on stock data.
Everything works until I call ardlBoundOrders, which gives the following error: Error in match.arg(method) : 'arg' must be of length 1
Where does this error come from? Could it come from the merged dataset (the code runs without any problem when I use only the dataset imported from Excel)? How can I fix it?
Thanks for your help!
Here is the script:
library(quantmod)
library(ggplot2)
library(plotly)
library(dLagM)
tickers = c("DIS", "GILD", "AMZN", "AAPL")
stocks<-getSymbols(tickers,
from = "1994-01-01",
to = "2022-02-01",
periodicity = "monthly",
src = "yahoo")
DISclose<-DIS[, 4:4]
GILDclose<-GILD[, 4:4]
AMZNclose<-AMZN[, 4:4]
AAPLclose<-AAPL[, 4:4]
newdata <- merge(DATA, DISclose)
formula <- DIS.Close ~ USDEUR+CPI+CONSCONF+FEDFUNDS+HOUST+UNRATE+INDPRO+VIX+SPY+CLI
ARDLfit <- ardlDlm(formula = formula, data = newdata, p = 10, q = 10)
summary(ARDLfit)
orders3 <- ardlBoundOrders(data = newdata, formula =
formula, ic = "BIC", max.p = 2, max.q = 2)
p <- data.frame(orders3$q, orders3$p) + 1
Boundtest<- ardlBound(data = DATA, formula =
formula2, p=p , ECM = TRUE)
par(mfrow=c(1,1))
disney<-Boundtest[["ECM"]][["EC.t"]]
plot(disney, type="l")
Update:
I think I found something:
When I merge my data, the number of rows is squared, because every row of the stock data is attached to every row of my dataset. An example makes this clearer:
Here is the variable DATA:
> DATA
# A tibble: 337 × 12
Date VIX USDEUR CPI CONSCONF FEDFUNDS HOUST SPY INDPRO UNRATE
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1994-01-01 00:00:00 10.6 0.897 146. 101. 3.05 1272 28.8 67.1 6.6
2 1994-02-01 00:00:00 14.9 0.895 147. 101. 3.25 1337 28.0 67.1 6.6
3 1994-03-01 00:00:00 20.5 0.876 147. 101. 3.34 1564 26.7 67.8 6.5
4 1994-04-01 00:00:00 13.8 0.877 147. 101. 3.56 1465 27.1 68.2 6.4
5 1994-05-01 00:00:00 13.0 0.859 148. 101. 4.01 1526 27.6 68.5 6.1
6 1994-06-01 00:00:00 15.0 0.846 148. 101. 4.25 1409 26.7 69.0 6.1
7 1994-07-01 00:00:00 11.1 0.818 148. 101. 4.26 1439 27.8 69.1 6.1
8 1994-08-01 00:00:00 12.0 0.818 149 101. 4.47 1450 28.8 69.5 6
9 1994-09-01 00:00:00 14.3 0.810 149. 101. 4.73 1474 27.9 69.7 5.9
10 1994-10-01 00:00:00 14.6 0.793 149. 101. 4.76 1450 28.9 70.3 5.8
# … with 327 more rows, and 2 more variables: CLI <dbl>, SPYr <dbl>
Here is the merged variable newdata:
CLI SPYr DIS.Close
1 100.52128 0.0000000000 15.53738
2 100.70483 -0.0291642024 15.53738
3 100.83927 -0.0473966064 15.53738
4 100.92260 0.0170457821 15.53738
5 100.95804 0.0159393078 15.53738
6 100.95186 -0.0293319435 15.53738
7 100.91774 0.0391511218 15.53738
8 100.86948 0.0381206253 15.53738
9 100.80795 -0.0311470101 15.53738
10 100.72614 0.0346814791 15.53738
11 100.60322 -0.0398155024 15.53738
12 100.42905 -0.0006857954 15.53738
13 100.19862 0.0418493643 15.53738
In fact, every row of DATA is paired with the first row of DISclose, then with the 2nd, the 3rd, and so on, so my dataset goes from x rows to x^2 rows.
I did some research on this problem, and it seems I should match both datasets with by="matchingIDinbothdataset", but I do not have a matching ID. Is there a solution?
Thank you in advance.
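One possible way to avoid the row multiplication, sketched with base R data frames only. When two data frames share no column names, `merge()` returns every combination of rows, which matches the growth from x to x^2 rows described above. Giving both tables a shared `Date` column lets the rows be matched one to one. The values below are made up, and turning the `xts` price series into a data frame with a `Date` column is an assumption about how the real objects would be prepared.

```r
# Toy macro data with a Date column (made-up values)
macro <- data.frame(
  Date = as.Date(c("1994-01-01", "1994-02-01", "1994-03-01")),
  VIX  = c(10.6, 14.9, 20.5)
)

# Toy closing prices; in the real case these would come from the xts object,
# with its index stored in a Date column (assumption)
prices <- data.frame(
  Date      = as.Date(c("1994-01-01", "1994-02-01", "1994-03-01")),
  DIS.Close = c(15.5, 15.9, 14.8)
)

# No shared column names: merge() falls back to a Cartesian product
crossed <- merge(macro["VIX"], prices["DIS.Close"])
nrow(crossed)  # 9 rows = 3 * 3

# Shared Date key: one row per month
matched <- merge(macro, prices, by = "Date")
nrow(matched)  # 3 rows
```

In the real data the two date columns would need the same class (for example, both converted with `as.Date()`) before they can match.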

Error: Input must be a vector, not a <spei> object

I have a large panel dataset with provinces for each year-month. I would like to apply a function to each data frame in a list (which I create by splitting the initial data frame) so that each one gets a new column holding the output of this function. However, when I run the code, I keep getting an error. Here is the code:
adm1 year month prov_code mean_temperaturec province_name avgpreci longitude latitude PET[,"PET_tho"]
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 TUR034 1978 1 TR100 5.61 Istanbul 170. 28.8 41.2 10.3
2 TUR034 1978 2 TR100 7.48 Istanbul 88 28.8 41.2 15.8
3 TUR034 1978 3 TR100 8.55 Istanbul 71 28.8 41.2 24.1
4 TUR034 1978 4 TR100 11.6 Istanbul 88.7 28.8 41.2 41.4
5 TUR034 1978 5 TR100 16.6 Istanbul 33.2 28.8 41.2 80.5
6 TUR034 1978 6 TR100 20.8 Istanbul 5.30 28.8 41.2 115.
# ... with 2 more variables: wbal <dbl[,1]>, SPEI <dbl>
data4spei.s <- split(dataSPEI, dataSPEI$prov_code)
spei_rows <- lapply(data4spei.s, function(x) {
x$SPEI <- spei(x$wbal, 12, na.rm = TRUE)
return(x)
})
Error in stop_vctrs():
! Input must be a vector, not a <spei> object.
Run rlang::last_error() to see where the error occurred.
For a different function the code worked properly and I could get the columns. Does someone know what I am doing wrong?
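The error message suggests that `spei()` returns an object of class `spei` rather than a plain numeric vector, which a tibble column cannot hold. A minimal sketch of the general fix is to extract the numeric component before assigning it. Here `fake_spei()` is a hypothetical stand-in so the example runs without the SPEI package, and the component name `fitted` is an assumption to check against the package documentation for your version.

```r
# Hypothetical stand-in for spei(): returns a classed list, not a vector
fake_spei <- function(x, scale) {
  structure(
    list(fitted = (x - mean(x)) / sd(x), scale = scale),
    class = "fake_spei"
  )
}

# Made-up water balance values
df <- data.frame(wbal = c(12.1, -3.4, 5.0, -8.2, 1.7))

res <- fake_spei(df$wbal, 12)
is.numeric(res)  # FALSE: the result is a list with a class attribute

# Store only the numeric values in the new column
df$SPEI <- as.numeric(res$fitted)
str(df)
```

Inside the `lapply` call from the question, the same idea would mean assigning the extracted values rather than the whole returned object.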

Pivot Longer with Modification of Columns

I have data that is in the following format:
(data <- tribble(
~Date, ~ENRSxOPEN, ~ENRSxCLOSE, ~INFTxOPEN, ~INFTxCLOSE,
"1989-09-11",82.97,82.10,72.88,72.56,
"1989-09-12",83.84,83.96,73.52,72.51,
"1989-09-13",83.16,83.88,72.91,72.12))
# A tibble: 3 x 5
Date ENRSxOPEN ENRSxCLOSE INFTxOPEN INFTxCLOSE
<chr> <dbl> <dbl> <dbl> <dbl>
1 1989-09-11 83.0 82.1 72.9 72.6
2 1989-09-12 83.8 84.0 73.5 72.5
3 1989-09-13 83.2 83.9 72.9 72.1
For analysis, I want to pivot this tibble longer to the following format:
tribble(
~Ticker, ~Date, ~OPEN, ~CLOSE,
"ENRS","1989-09-11",82.97,82.10,
"ENRS","1989-09-12",83.84,83.96,
"ENRS","1989-09-13",83.16,83.88,
"INFT","1989-09-11",72.88,72.56,
"INFT","1989-09-12",73.52,72.51,
"INFT","1989-09-13",72.91,72.12)
# A tibble: 6 x 4
Ticker Date OPEN CLOSE
<chr> <chr> <dbl> <dbl>
1 ENRS 1989-09-11 83.0 82.1
2 ENRS 1989-09-12 83.8 84.0
3 ENRS 1989-09-13 83.2 83.9
4 INFT 1989-09-11 72.9 72.6
5 INFT 1989-09-12 73.5 72.5
6 INFT 1989-09-13 72.9 72.1
I.e., I want to separate the Open/Close prices from the ticker, and put the latter as an entirely new column in the beginning.
I've tried to use the function pivot_longer:
pivot_longer(data, cols = ENRSxOPEN:INFTxCLOSE)
While this goes in the direction of what I want to achieve, it does not separate the prices and keep them in one row for each Ticker.
Is there a way to add additional arguments to pivot_longer() to achieve that?
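
Yes. Pass two names to names_to, using the special ".value" entry so that the part of each column name after the separator (OPEN, CLOSE) becomes its own column, and split the names on "x" with names_sep:
pivot_longer(data, -Date, names_to = c('Ticker', '.value'), names_sep = 'x')
# A tibble: 6 x 4
Date Ticker OPEN CLOSE
<chr> <chr> <dbl> <dbl>
1 1989-09-11 ENRS 83.0 82.1
2 1989-09-11 INFT 72.9 72.6
3 1989-09-12 ENRS 83.8 84.0
4 1989-09-12 INFT 73.5 72.5
5 1989-09-13 ENRS 83.2 83.9
6 1989-09-13 INFT 72.9 72.1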

ET.PenmanMonteith function in R: Error in aggregate.data.frame(as.data.frame(x), ...) : no rows to aggregate

I am using the ET.PenmanMonteith function (from the R package "Evapotranspiration"). I have a list called data1 with Tmax, Tmin, RHmax, RHmin, Rs and u2, plus Date.daily (Date) and Date.monthly (yearmon). I also have another list called constants, with all the required constants. When I run the code, I get an error ("Error in aggregate.data.frame(as.data.frame(x), ...) : no rows to aggregate"). My code is:
data=read_excel("prueba.xlsx")
head(data)
Tmax Tmin RHmax RHmin u2 Rs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30.2 19.4 55.8 100 2.1 18.5
2 33.6 19.8 30.8 69.6 3.3 29.9
3 34.4 16 27.8 83.3 1.5 31.4
4 35.8 17 28.8 89.5 1.7 31.1
5 36.4 18 31.1 90.5 1.7 31.2
6 37.6 20.4 35.4 95.8 1.5 31.4
rnames=read_excel("prueba.xlsx",sheet="Hoja1")
head(rnames)
Tmax Tmin RHmax RHmin u2 Rs ...7 Date.daily Date.monthly
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <chr> <chr>
1 30.2 19.4 55.8 100 10 18.5 NA 1990-1-1 1990-1
2 33.6 19.8 30.8 69.6 16 29.9 NA 1990-1-2 1990-1
3 34.4 16 27.8 83.3 7 31.4 NA 1990-1-3 1990-1
4 35.8 17 28.8 89.5 8 31.1 NA 1990-1-4 1990-1
5 36.4 18 31.1 90.5 8 31.2 NA 1990-1-5 1990-1
6 37.6 20.4 35.4 95.8 7 31.4 NA 1990-1-6 1990-1
rnames=rnames[8]
colnames(rnames)="Date.daily"
rnames=as.Date(rnames$Date.daily)
rnames2=as.yearmon(rnames)
data1=cbind(rnames,rnames2,data)
as.Date(data1$rnames)
colnames(data1)=c("Data.daily","Data.monthly","Tmax","Tmin","RHmax","RHmin","u2","Rs")
data1=as.list(data1)
# The list looks like this: [1]
# Constants data
constants=read_excel("constants.xlsx")
head(constants)
lambda sigma Gsc lat lat_rad as bs Elev z
<dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <dbl> <dbl>
1 2.45 0.00000000490 0.082 -29.9 -0.521 NA NA 88 2
constants=as.list(constants)
# The list looks like this: [2]
res=ET.PenmanMonteith(data1, constants, ts="daily", solar="data",
wind="yes", crop="short", message="yes",
AdditionalStats="yes", save.csv="no")
Error in aggregate.data.frame(as.data.frame(x), ...) :
no rows to aggregate
Could anyone help me?
[1]: https://i.stack.imgur.com/vJD78.png
[2]: https://i.stack.imgur.com/tHNU7.png

Add regression line in boxplot r

I used the code below to add a regression line to a boxplot.
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
abline(reg = yield.year)
However, the regression line did not show up in the resulting plot.
My data looks like this. It is panel data, which might cause problems with the regression line.
> head(dfreg.raw)
# A tibble: 6 x 15
index Year yield State.Code harv frez_j dd_j cupc_j sm7_j fitted_j max_spring_j sp_spring_j
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 16001 1984 105 16 7200 330. 2438. 7.32 53.4 49.1 19.7 0.863
2 16001 1985 96.8 16 8200 413. 2407. 5.71 52.5 48.4 23.9 -0.391
3 16001 1986 94.9 16 7400 476. 2638. 8.34 52.5 48.4 23.4 -0.122
4 16001 1987 106. 16 9700 154. 2838. 5.44 54.4 49.9 25.6 -0.485
5 16001 1988 89.6 16 7600 184. 2944. 3.28 54.5 50.0 23.9 0.115
6 16001 1989 96.4 16 7300 383. 2766. 5.91 52.6 48.4 23.5 -1.02
# … with 3 more variables: pc_spring_j <dbl>, lt <dbl>, qt <dbl>
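Does anyone have an idea what is going on?

boxplot draws the boxes at x positions 1, 2, ..., k (one per level of the grouping variable), not at the actual Year values, so a line drawn by abline in Year units falls outside the plotting region. You can try something like the following.
First simulate a dataset: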
dfreg.raw= data.frame(
yield=rpois(100,lambda=rep(seq(60,100,by=10),each=20)),
Year=rep(1995:1999,each=20)
)
Then plot:
boxplot(yield~Year, data=dfreg.raw,
ylab = 'Yield (bushels/acre)',
col = 'orange')
yield.year <- lm(yield~Year, data = dfreg.raw)
Get a sorted vector of the unique years, predict at those years, and draw the predictions at positions 1 to k:
X = sort(unique(dfreg.raw$Year))
lines(x=seq_along(X),
y=predict(yield.year,data.frame(Year=X)),col="blue",lty=8)
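An alternative sketch, not taken from the answer above: `boxplot()` accepts an `at` argument giving the x position of each box, so placing the boxes at the actual year values lets `abline()` draw the fitted line directly. The simulated data mirrors the example above, and the seed is an arbitrary choice for reproducibility.

```r
set.seed(1)  # arbitrary seed so the simulated data is reproducible
dfreg.raw <- data.frame(
  yield = rpois(100, lambda = rep(seq(60, 100, by = 10), each = 20)),
  Year  = rep(1995:1999, each = 20)
)

years <- sort(unique(dfreg.raw$Year))
yield.year <- lm(yield ~ Year, data = dfreg.raw)

# Draw each box at its year value instead of at 1, 2, ..., 5
boxplot(yield ~ Year, data = dfreg.raw,
        at = years,
        ylab = "Yield (bushels/acre)",
        col = "orange")

# The x axis is now in year units, so the regression line lines up
abline(reg = yield.year, col = "blue", lty = 2)
```

This only works cleanly when the grouping variable is numeric. For unevenly spaced years the boxes would also be unevenly spaced.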
