Error in stepAIC() model building function in R

I have built a logistic regression model with the dependent variable WinParty, and it fits without problems. However, when I try to do variable selection with stepAIC, I keep getting the error shown further down.
Data Structure
tibble [2,467 × 25] (S3: tbl_df/tbl/data.frame)
$ PollingPlace : chr [1:2467] "Abbotsbury" "Abbotsford" "Abbotsford East" "Aberdare" ...
$ CoalitionVotes : int [1:2467] 9438 15548 3960 3164 2370 4524 3186 10710 372 5993 ...
$ VoteDifference : num [1:2467] 0.1397 -0.0579 0.0796 -0.2454 0.2623 ...
$ Liberal.National.Coalition.Percentage: num [1:2467] 57 47.1 54 37.7 63.1 ...
$ WinParty : num [1:2467] 1 0 1 0 1 0 0 0 1 0 ...
$ Median_age_persons : num [1:2467] 43 46 41.5 37 41 31 37 36 57.5 41 ...
$ Median_mortgage_repay_monthly : num [1:2467] 2232 3000 2831 1452 1559 ...
$ Median_tot_prsnl_inc_weekly : num [1:2467] 818 1262 1380 627 719 ...
$ Median_rent_weekly : num [1:2467] 550 595 576 310 290 ...
$ Median_tot_fam_inc_weekly : num [1:2467] 2541 3062 3126 1521 2021 ...
$ Average_household_size : num [1:2467] 3.27 2.35 2.28 2.46 2.38 ...
$ Indig_Percent : num [1:2467] 0 0 1.09 10.94 10.61 ...
$ BirthPlace_Aus : num [1:2467] 60.9 67.9 61.7 90.9 89 ...
$ Other_lang_Percen : num [1:2467] 44.97 25.85 28.71 2.58 2.45 ...
$ Aus_Cit_Percent : num [1:2467] 91.5 91.5 86.6 93.7 91.9 ...
$ Yr12_Comp_Percent : num [1:2467] 49.7 57.1 62.7 25 23.1 ...
$ Pop_Density_SQKM : num [1:2467] 2849 6112 7951 1686 334 ...
$ Industrial_Percent : num [1:2467] 6.24 3.95 4.69 8.3 15.31 ...
$ Population_Serving_Percent : num [1:2467] 16 12.9 15.1 16.1 13.6 ...
$ Health_Education_Percent : num [1:2467] 9.26 11.43 10.28 9.07 7.79 ...
$ Knowledge_Intensive_Percent : num [1:2467] 11.31 19.64 17.06 7.44 6.56 ...
$ Over60_Yr : num [1:2467] 25.1 31.6 24.9 20.6 25.3 ...
$ GenZ : num [1:2467] 24.5 20 25.9 26.2 23.6 ...
$ GenX : num [1:2467] 27 29.1 26.6 25.8 26.1 ...
$ Millenials : num [1:2467] 23.3 20.3 19.7 27.3 27.1 ...
- attr(*, "na.action")= 'omit' Named int [1:8] 264 647 843 1332 1774 2033 2077 2138
..- attr(*, "names")= chr [1:8] "264" "647" "843" "1332" ...
The glm function computes the logistic regression with no errors
mod1 <- glm(WinParty~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials,
family = binomial(link = "logit"), data = GS_PP_Agg)
summary(mod1)
step1 <- stepAIC(mod1, scope = list(lower = "~1",upper = "~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials"), data = GS_PP_Agg)
The stepAIC function returns the error:
"Error in FUN(left, right) : non-numeric argument to binary operator"
Some help in solving this error would be greatly appreciated!
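The post does not show what triggers the error, so the following is only a sketch of the documented calling convention, not a confirmed fix. The MASS documentation describes the `upper` and `lower` components of `scope` as formulae rather than character strings, and stepAIC() has no `data` argument of its own (it refits using the data stored in the model call). The data frame `toy` and the variables `x1`, `x2`, `x3`, `y` are made up for illustration.

```r
library(MASS)

# Simulated stand-in for GS_PP_Agg (hypothetical data, not the poster's)
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.5 + 1.2 * toy$x1))

# Full logistic regression, as in the question
mod <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = toy)

# scope components are formula objects; no data argument is passed
step_mod <- stepAIC(mod,
                    scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                    direction = "both",
                    trace = FALSE)

formula(step_mod)
```

If the same call shape still fails on the real data, checking the column types with str() and refitting on a plain data.frame would be reasonable next steps.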

Related

R lag doesn't work inside a loop with counter variable?

This works outside of a loop
> past=lag(zoo(c(new$SPCS20RSA)), c(-1,-2,-3,-4,-5), na.pad =TRUE)
> print(past)
lag-1 lag-2 lag-3 lag-4 lag-5
1 NA NA NA NA NA
2 169.5526 NA NA NA NA
3 169.5526 169.5526 NA NA NA
I want to replace new$SPCS20RSA with new[i] (or [a])
If I run lag inside a loop and try to use a counter, I receive an error.
error:
for (i in 1:10)
{
#doesn't work in a loop
past = lag(c(new[i]), c(-1,-2, -3, -4, -5), na.pad =TRUE)
print(past)
}
Error in attr(x, "tsp") <- value : 'tsp' attribute must be numeric of length three
In addition: Warning messages:
1: In if (k != round(k)) { : the condition has length > 1 and only the first element will be used
2: In (k/p[3L]) * c(1, 1, 0) : longer object length is not a multiple of shorter object length
3: In p - (k/p[3L]) * c(1, 1, 0) : longer object length is not a multiple of shorter object length
If I try [,i]
Error in new[, i] : incorrect number of dimensions
contents of new as requested
> str(new)
List of 79
$ date : Date[1:516], format: "2008-01-01" "2008-04-01" "2008-05-04" ...
$ CPIAUCSL : num [1:516] 215 215 215 216 216 ...
$ UNRATE : num [1:516] 5.4 5.4 5.4 5.45 5.5 5.55 5.6 5.64 5.68 5.72 ...
$ MEHOINUSA672N : num [1:516] 56076 55979 55944 55936 55929 ...
$ INTDSRUSM193N : num [1:516] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
$ CIVPART : num [1:516] 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 ...
$ LFWA64TTUSM647S : num [1:516] 1.96e+08 1.96e+08 1.96e+08 1.96e+08 1.96e+08 ...
$ FEDFUNDS : num [1:516] 1.98 1.98 1.98 1.98 1.99 ...
$ GDPC1 : num [1:516] 14963 14963 14939 14933 14928 ...
$ A191RL1Q225SBEA : num [1:516] 2 2 0.6594 0.375 0.0906 ...
$ SP500 : num [1:516] 1412 1412 1412 1401 1413 ...
$ DCOILWTICO : num [1:516] 114 114 114 123 125 ...
$ CSUSHPINSA : num [1:516] 167 167 167 167 167 ...
$ DFF : num [1:516] 1.95 1.95 1.95 1.95 1.94 ...
$ DFII10 : num [1:516] 1.52 1.52 1.52 1.48 1.44 ...
$ A939RX0Q048SBEA : num [1:516] 49196 49196 49074 49048 49022 ...
$ PCEPILFE : num [1:516] 98.7 98.7 98.7 98.8 98.8 ...
$ GDPDEF : num [1:516] 99 99 99.2 99.3 99.3 ...
$ SPCS20RSA : num [1:516] 170 170 170 169 168 ...
$ GDPPOT : num [1:516] 15068 15068 15090 15094 15099 ...
$ CPILFESL : num [1:516] 215 215 215 215 215 ...
$ GOLDAMGBD228NLBM: num [1:516] 859 859 859 878 875 ...
$ CPIAUCNS : num [1:516] 217 217 217 217 218 ...
$ VIXCLS : num [1:516] 18.5 18.5 18.5 19.1 17.2 ...
$ WPU0911 : num [1:516] 171 171 171 171 172 ...
$ PCEPI : num [1:516] 100 100 100 100 100 ...
$ USSTHPI : num [1:516] 361 361 357 356 355 ...
$ DSPIC96 : num [1:516] 11432 11432 11432 11356 11281 ...
$ DCOILBRENTEU : num [1:516] 110 110 110 120 123 ...
$ FPCPITOTLZGUSA : num [1:516] 3.84 2.8 2.43 2.35 2.27 ...
$ PCEC96 : num [1:516] 10084 10084 10084 10081 10077 ...
$ PPIACO : num [1:516] 197 197 197 198 199 ...
$ MEPAINUSA672N : num [1:516] 29556 29477 29448 29442 29436 ...
$ GDPCA : num [1:516] 14830 14729 14692 14684 14676 ...
$ MPRIME : num [1:516] 5 5 5 5 5 5 5 5 5 5 ...
$ PAYEMS : num [1:516] 137870 137870 137870 137832 137793 ...
$ CES0500000003 : num [1:516] 21.5 21.5 21.5 21.5 21.5 ...
$ RECPROUSM156N : num [1:516] 78.2 78.2 78.2 79.7 81.2 ...
$ IC4WSA : num [1:516] 363500 363500 363500 363750 368250 ...
$ AHETPI : num [1:516] 18 18 18 18 18 ...
$ M2V : num [1:516] 1.92 1.92 1.92 1.92 1.91 ...
$ INDPRO : num [1:516] 103 103 103 103 103 ...
$ PCE : num [1:516] 10093 10093 10093 10107 10121 ...
$ UMCSENT : num [1:516] 59.8 59.8 59.8 58.9 58.1 ...
$ HDTGPDUSQ163N : num [1:516] 97.7 97.7 98 98.1 98.1 ...
$ M1V : num [1:516] 10.6 10.6 10.5 10.5 10.5 ...
$ TCU : num [1:516] 79.6 79.6 79.6 79.6 79.5 ...
$ STLFSI : num [1:516] 0.99 0.99 0.99 0.757 0.653 0.585 0.667 0.743 0.889 0.886 ...
$ BASE : num [1:516] 857 857 857 857 859 ...
$ PSAVERT : num [1:516] 7.9 7.9 7.9 7.3 6.7 6.1 5.5 5.28 5.06 4.84 ...
$ M2 : num [1:516] 7673 7673 7673 7673 7688 ...
$ M1 : num [1:516] 1394 1394 1394 1394 1393 ...
$ M1SL : num [1:516] 1394 1394 1394 1396 1398 ...
$ M2SL : num [1:516] 7696 7696 7696 7701 7705 ...
$ T10Y2Y : num [1:516] 1.42 1.42 1.42 1.53 1.41 ...
$ DGS10 : num [1:516] 3.83 3.83 3.83 3.85 3.86 ...
$ BAMLH0A0HYM2 : num [1:516] 6.72 6.72 6.72 6.76 6.73 ...
$ TB3MS : num [1:516] 1.73 1.73 1.73 1.76 1.79 ...
$ T10YIE : num [1:516] 2.31 2.31 2.31 2.37 2.41 ...
$ TEDRATE : num [1:516] 1.33 1.33 1.33 1.09 0.91 ...
$ GFDEGDQ188S : num [1:516] 64.1 64.1 65.3 65.5 65.8 ...
$ T5YIFR : num [1:516] 2.33 2.33 2.33 2.43 2.48 ...
$ T10Y3M : num [1:516] 2.36 2.36 2.36 2.21 2.04 ...
$ DGS1 : num [1:516] 1.96 1.96 1.96 1.94 2.07 ...
$ USSLIND : num [1:516] 0.02 0.02 0.02 -0.01 -0.04 -0.07 -0.1 -0.162 -0.224 -0.286 ...
$ BAMLC0A4CBBB : num [1:516] 2.98 2.98 2.98 2.96 2.94 ...
$ GFDEBTN : num [1:516] 9492006 9492006 9675128 9713972 9752816 ...
$ DGS2 : num [1:516] 2.42 2.42 2.42 2.32 2.44 ...
$ GS10 : num [1:516] 3.88 3.88 3.88 3.93 3.99 ...
$ DGS5 : num [1:516] 3.12 3.12 3.12 3.07 3.12 ...
$ DGS30 : num [1:516] 4.53 4.53 4.53 4.57 4.58 ...
$ TREAST : num [1:516] 536714 536714 536714 536714 515656 ...
$ BAA10Y : num [1:516] 3 3 3 3.04 3.06 ...
$ BAMLC0A0CM : num [1:516] 2.55 2.55 2.55 2.52 2.51 ...
$ BAMLH0A3HYC : num [1:516] 10.6 10.6 10.6 10.6 10.6 ...
$ FYFSD : num [1:516] -458553 -458553 -458553 -458553 -458553 ...
$ DGS1MO : num [1:516] 1.24 1.24 1.24 1.52 1.83 ...
$ T5YIE : num [1:516] 2.29 2.29 2.29 2.3 2.34 ...
$ FutureSPCS20RSA : num 170
> new <- head(new)
> print(new)
$`date`
[1] "2008-01-01" "2008-04-01" "2008-05-04" "2008-05-11" "2008-05-18" "2008-05-25" "2008-06-01" "2008-06-08"
[9] "2008-06-15" "2008-06-22" "2008-06-29" "2008-07-06" "2008-07-13" "2008-07-20" "2008-07-27" "2008-08-03"
[17] "2008-08-10" "2008-08-17" "2008-08-24" "2008-08-31" "2008-09-07" "2008-09-14" "2008-09-21" "2008-09-28"
[25] "2008-10-05" "2008-10-12" "2008-10-19" "2008-10-26" "2008-11-02" "2008-11-09" "2008-11-16" "2008-11-23"
[33] "2008-11-30" "2008-12-07" "2008-12-14" "2008-12-21" "2008-12-28" "2009-01-04" "2009-01-11" "2009-01-18"
[41] "2009-01-25" "2009-02-01" "2009-02-08" "2009-02-15" "2009-02-22" "2009-03-01" "2009-03-08" "2009-03-15"
[49] "2009-03-22" "2009-03-29" "2009-04-05" "2009-04-12" "2009-04-19" "2009-04-26" "2009-05-03" "2009-05-10"
[57] "2009-05-17" "2009-05-24" "2009-05-31" "2009-06-07" "2009-06-14" "2009-06-21" "2009-06-28" "2009-07-05"
[65] "2009-07-12" "2009-07-19" "2009-07-26" "2009-08-02" "2009-08-09" "2009-08-16" "2009-08-23" "2009-08-30"
[73] "2009-09-06" "2009-09-13" "2009-09-20" "2009-09-27" "2009-10-04" "2009-10-11" "2009-10-18" "2009-10-25"
[81] "2009-11-01" "2009-11-08" "2009-11-15" "2009-11-22" "2009-11-29" "2009-12-06" "2009-12-13" "2009-12-20"
[89] "2009-12-27" "2010-01-03" "2010-01-10" "2010-01-17" "2010-01-24" "2010-01-31" "2010-02-07" "2010-02-14"
[97] "2010-02-21" "2010-02-28" "2010-03-07" "2010-03-14" "2010-03-21" "2010-03-28" "2010-04-04" "2010-04-11"
[105] "2010-04-18" "2010-04-25" "2010-05-02" "2010-05-09" "2010-05-16" "2010-05-23" "2010-05-30" "2010-06-06"
[113] "2010-06-13" "2010-06-20" "2010-06-27" "2010-07-04" "2010-07-11" "2010-07-18" "2010-07-25" "2010-08-01"
[121] "2010-08-08" "2010-08-15" "2010-08-22" "2010-08-29" "2010-09-05" "2010-09-12" "2010-09-19" "2010-09-26"
[129] "2010-10-03" "2010-10-10" "2010-10-17" "2010-10-24" "2010-10-31" "2010-11-07" "2010-11-14" "2010-11-21"
[137] "2010-11-28" "2010-12-05" "2010-12-12" "2010-12-19" "2010-12-26" "2011-01-02" "2011-01-09" "2011-01-16"
[145] "2011-01-23" "2011-01-30" "2011-02-06" "2011-02-13" "2011-02-20" "2011-02-27" "2011-03-06" "2011-03-13"
[153] "2011-03-20" "2011-03-27" "2011-04-03" "2011-04-10" "2011-04-17" "2011-04-24" "2011-05-01" "2011-05-08"
[161] "2011-05-15" "2011-05-22" "2011-05-29" "2011-06-05" "2011-06-12" "2011-06-19" "2011-06-26" "2011-07-03"
[169] "2011-07-10" "2011-07-17" "2011-07-24" "2011-07-31" "2011-08-07" "2011-08-14" "2011-08-21" "2011-08-28"
[177] "2011-09-04" "2011-09-11" "2011-09-18" "2011-09-25" "2011-10-02" "2011-10-09" "2011-10-16" "2011-10-23"
[185] "2011-10-30" "2011-11-06" "2011-11-13" "2011-11-20" "2011-11-27" "2011-12-04" "2011-12-11" "2011-12-18"
[193] "2011-12-25" "2012-01-01" "2012-01-08" "2012-01-15" "2012-01-22" "2012-01-29" "2012-02-05" "2012-02-12"
[201] "2012-02-19" "2012-02-26" "2012-03-04" "2012-03-11" "2012-03-18" "2012-03-25" "2012-04-01" "2012-04-08"
[209] "2012-04-15" "2012-04-22" "2012-04-29" "2012-05-06" "2012-05-13" "2012-05-20" "2012-05-27" "2012-06-03"
[217] "2012-06-10" "2012-06-17" "2012-06-24" "2012-07-01" "2012-07-08" "2012-07-15" "2012-07-22" "2012-07-29"
[225] "2012-08-05" "2012-08-12" "2012-08-19" "2012-08-26" "2012-09-02" "2012-09-09" "2012-09-16" "2012-09-23"
[233] "2012-09-30" "2012-10-07" "2012-10-14" "2012-10-21" "2012-10-28" "2012-11-04" "2012-11-11" "2012-11-18"
[241] "2012-11-25" "2012-12-02" "2012-12-09" "2012-12-16" "2012-12-23" "2012-12-30" "2013-01-06" "2013-01-13"
[249] "2013-01-20" "2013-01-27" "2013-02-03" "2013-02-10" "2013-02-17" "2013-02-24" "2013-03-03" "2013-03-10"
[257] "2013-03-17" "2013-03-24" "2013-03-31" "2013-04-07" "2013-04-14" "2013-04-21" "2013-04-28" "2013-05-05"
[265] "2013-05-12" "2013-05-19" "2013-05-26" "2013-06-02" "2013-06-09" "2013-06-16" "2013-06-23" "2013-06-30"
[273] "2013-07-07" "2013-07-14" "2013-07-21" "2013-07-28" "2013-08-04" "2013-08-11" "2013-08-18" "2013-08-25"
[281] "2013-09-01" "2013-09-08" "2013-09-15" "2013-09-22" "2013-09-29" "2013-10-06" "2013-10-13" "2013-10-20"
[289] "2013-10-27" "2013-11-03" "2013-11-10" "2013-11-17" "2013-11-24" "2013-12-01" "2013-12-08" "2013-12-15"
[297] "2013-12-22" "2013-12-29" "2014-01-05" "2014-01-12" "2014-01-19" "2014-01-26" "2014-02-02" "2014-02-09"
[305] "2014-02-16" "2014-02-23" "2014-03-02" "2014-03-09" "2014-03-16" "2014-03-23" "2014-03-30" "2014-04-06"
[313] "2014-04-13" "2014-04-20" "2014-04-27" "2014-05-04" "2014-05-11" "2014-05-18" "2014-05-25" "2014-06-01"
[321] "2014-06-08" "2014-06-15" "2014-06-22" "2014-06-29" "2014-07-06" "2014-07-13" "2014-07-20" "2014-07-27"
[329] "2014-08-03" "2014-08-10" "2014-08-17" "2014-08-24" "2014-08-31" "2014-09-07" "2014-09-14" "2014-09-21"
[337] "2014-09-28" "2014-10-05" "2014-10-12" "2014-10-19" "2014-10-26" "2014-11-02" "2014-11-09" "2014-11-16"
[345] "2014-11-23" "2014-11-30" "2014-12-07" "2014-12-14" "2014-12-21" "2014-12-28" "2015-01-04" "2015-01-11"
[353] "2015-01-18" "2015-01-25" "2015-02-01" "2015-02-08" "2015-02-15" "2015-02-22" "2015-03-01" "2015-03-08"
[361] "2015-03-15" "2015-03-22" "2015-03-29" "2015-04-05" "2015-04-12" "2015-04-19" "2015-04-26" "2015-05-03"
[369] "2015-05-10" "2015-05-17" "2015-05-24" "2015-05-31" "2015-06-07" "2015-06-14" "2015-06-21" "2015-06-28"
[377] "2015-07-05" "2015-07-12" "2015-07-19" "2015-07-26" "2015-08-02" "2015-08-09" "2015-08-16" "2015-08-23"
[385] "2015-08-30" "2015-09-06" "2015-09-13" "2015-09-20" "2015-09-27" "2015-10-04" "2015-10-11" "2015-10-18"
[393] "2015-10-25" "2015-11-01" "2015-11-08" "2015-11-15" "2015-11-22" "2015-11-29" "2015-12-06" "2015-12-13"
[401] "2015-12-20" "2015-12-27" "2016-01-03" "2016-01-10" "2016-01-17" "2016-01-24" "2016-01-31" "2016-02-07"
[409] "2016-02-14" "2016-02-21" "2016-02-28" "2016-03-06" "2016-03-13" "2016-03-20" "2016-03-27" "2016-04-03"
[417] "2016-04-10" "2016-04-17" "2016-04-24" "2016-05-01" "2016-05-08" "2016-05-15" "2016-05-22" "2016-05-29"
[425] "2016-06-05" "2016-06-12" "2016-06-19" "2016-06-26" "2016-07-03" "2016-07-10" "2016-07-17" "2016-07-24"
[433] "2016-07-31" "2016-08-07" "2016-08-14" "2016-08-21" "2016-08-28" "2016-09-04" "2016-09-11" "2016-09-18"
[441] "2016-09-25" "2016-10-02" "2016-10-09" "2016-10-16" "2016-10-23" "2016-10-30" "2016-11-06" "2016-11-13"
[449] "2016-11-20" "2016-11-27" "2016-12-04" "2016-12-11" "2016-12-18" "2016-12-25" "2017-01-01" "2017-01-08"
[457] "2017-01-15" "2017-01-22" "2017-01-29" "2017-02-05" "2017-02-12" "2017-02-19" "2017-02-26" "2017-03-05"
[465] "2017-03-12" "2017-03-19" "2017-03-26" "2017-04-02" "2017-04-09" "2017-04-16" "2017-04-23" "2017-04-30"
[473] "2017-05-07" "2017-05-14" "2017-05-21" "2017-05-28" "2017-06-04" "2017-06-11" "2017-06-18" "2017-06-25"
[481] "2017-07-02" "2017-07-09" "2017-07-16" "2017-07-23" "2017-07-30" "2017-08-06" "2017-08-13" "2017-08-20"
[489] "2017-08-27" "2017-09-03" "2017-09-10" "2017-09-17" "2017-09-24" "2017-10-01" "2017-10-08" "2017-10-15"
[497] "2017-10-22" "2017-10-29" "2017-11-05" "2017-11-12" "2017-11-19" "2017-11-26" "2017-12-03" "2017-12-10"
[505] "2017-12-17" "2017-12-24" "2017-12-31" "2018-01-07" "2018-01-14" "2018-01-21" "2018-01-28" "2018-02-04"
[513] "2018-02-11" "2018-02-18" "2018-02-25" "2018-03-01"
$CPIAUCSL
[1] 215.2080 215.2080 215.2080 215.7717 216.3355 216.8992 217.4630 217.7736 218.0842 218.3948 218.7054
[12] 219.0160 218.9345 218.8530 218.7715 218.6900 218.7274 218.7648 218.8022 218.8396 218.8770 218.4065
[23] 217.9360 217.4655 216.9950 216.0345 215.0740 214.1135 213.1530 212.8020 212.4510 212.1000 211.7490
[34] 211.3980 211.5317 211.6655 211.7993 211.9330 212.1260 212.3190 212.5120 212.7050 212.6525 212.6000
[45] 212.5475 212.4950 212.5378 212.5806 212.6234 212.6662 212.7090 212.7873 212.8655 212.9437 213.0220
[56] 213.3756 213.7292 214.0828 214.4364 214.7900 214.7740 214.7580 214.7420 214.7260 214.9058 215.0855
[67] 215.2652 215.4450 215.5282 215.6114 215.6946 215.7778 215.8610 216.0230 216.1850 216.3470 216.5090
[78] 216.6902 216.8715 217.0528 217.2340 217.2566 217.2792 217.3018 217.3244 217.3470 217.3822 217.4175
[89] 217.4528 217.4880 217.4466 217.4052 217.3638 217.3224 217.2810 217.2990 217.3170 217.3350 217.3530
Solution
for (i in parsedList)
{
past = lag(zoo(c(new[[i]])), c(-1,-2, -3, -4, -5), na.pad =TRUE)
print(past)
}
The "Error: unexpected '}' in "}"" message comes from a syntax error: there is one superfluous opening parenthesis in front of c().
a=1
for (i in parsedList)
{
#doesn't work in a loop
past = lag(c(new[a]), c(-1,-2, -3, -4, -5), na.pad =TRUE)
a=a+1
}
I hope you defined 'new' beforehand; otherwise you get the error:
Error in new[a] : object of type 'closure' is not subsettable
(because new is also a function used in object-oriented programming to create new objects).
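The difference between the failing loop and the working one comes down to `[` versus `[[` on a list. A minimal base R sketch, assuming a small made-up list `series_list` in place of `new` and a hypothetical helper `lag_vec` in place of zoo's lag:

```r
# Small stand-in for the list `new` (values are made up)
series_list <- list(
  date = as.Date("2008-01-01") + 0:5,
  s1   = c(10, 11, 12, 13, 14, 15),
  s2   = c(20, 21, 22, 23, 24, 25)
)

# Single brackets return a sub-list; double brackets return the vector itself
class(series_list[2])    # "list"
class(series_list[[2]])  # "numeric"

# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

# Loop over the numeric elements by position, extracting with [[ ]]
for (i in 2:length(series_list)) {
  past <- sapply(1:3, function(k) lag_vec(series_list[[i]], k))
  colnames(past) <- paste0("lag-", 1:3)
  print(past)
}
```

With zoo loaded, the same idea applies to the original call: wrap `new[[i]]` (not `new[i]`) in zoo() before passing it to lag().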

R - incorrect number of subscripts on matrix

I am currently writing my bachelor thesis in economics. One part of my work is a comparison of ETF returns with the returns of their benchmark indices. For this, I want to use an R script. So far I have loaded my raw closing prices into the program and named the tables "ETFs" and "Benchmark". My next step is to calculate the daily returns of the ETFs from these closing prices. I started with a for-loop, but it failed.
The error message was:
Error in daylyreturn_ETFs[r, c] <- (ETFs[r, currColName]/ETFs[(r + 1), :
incorrect number of subscripts on matrix
Variables:
ETFs:
'data.frame': 1672 obs. of 21 variables:
$ Name : Factor w/ 1636 levels "01.02.2010","01.02.2011",..: 1608 1557 1502 1449 1252 1194 1139 1084 1029 863 ...
$ iShares.Core.S.P.500.USD.Acc : num 203 203 206 205 205 ...
$ iShares.Core.DAX.U.00AE...DE. : num 100 100 100 100 100 ...
$ iShares.Core.MSCI.World.USD.Acc : num 42 42.2 42.8 42.5 42.6 ...
$ iShares.S.P.500.USD.Dist : num 21.2 21.3 21.7 21.6 21.5 ...
$ iShares.EURO.STOXX.50..DE. : num 33.1 32.9 33 33 32.9 ...
$ iShares.Core..U.0080..Corp.Bond.EUR.Dist : num 130 130 130 130 130 ...
$ Lyxor.Euro.Stoxx.50.DR.ETF.D.EUR.A.I : num 31.9 31.9 32 32 32 ...
$ iShares..U.0080..High.Yield.Corp.Bond.EUR.Dist: num 107 107 106 106 106 ...
$ iShares.JP.Morgan...EM.Bond.USD.Dist : num 104 104 105 104 104 ...
$ iShares.MSCI.Europe.Dist : num 22.6 22.5 22.6 22.5 22.5 ...
$ iShares.STOXX.Europe.600..DE. : num 36.1 36 36.1 36 36 ...
$ iShares.EURO.STOXX.50.Dist : num 33.1 33.1 33.2 33.2 33.2 ...
$ iShares.MSCI.World.USD.Dist : num 35.5 35.5 35.9 35.8 35.7 ...
$ iShares.Edge.MSCI.USA.Size.Factor : num 5.28 5.27 5.3 5.29 4.32 4.32 4.31 4.32 4.33 4.28 ...
$ ETFS.Physical.Gold : num 106 106 106 105 104 ...
$ iShares.iBonds.Mar.2020.Term.Corp.exFncl : num 24.6 24.6 24.5 24.5 24.5 ...
$ iShares.Euro.Corporate.Bond.Large.Cap : num 135 135 135 135 135 ...
$ db.x.trackers.Euro.Stoxx.50..DR..1D : num 34.8 34.6 34.7 34.7 34.6 ...
$ db.x.trackers.Euro.Stoxx.50..DR..1C : num 44.1 44 44.1 44.1 44 ...
$ Xetra.Gold : num 35.3 35.5 35.3 35 34.9 ...
Code:
library(readr)
werte <- read_delim("~/Uni Frankfurt/Semester 7/Bachelorarbeit/R/werte.csv", ";", escape_double = FALSE, trim_ws = TRUE)
library(readr)
werte1 <- read_delim("~/Uni Frankfurt/Semester 7/Bachelorarbeit/R/werte1.csv", ";", escape_double = FALSE, trim_ws = TRUE)
write.table(x=werte, file = "werte.dat", sep = ";", dec = ",", row.names = FALSE, col.names = TRUE)
ETFs <- read.table("werte.dat", sep = ";", dec = ",", header = TRUE)
ETFs
write.table(x=werte1, file = "werte1.dat", sep = ";", dec = ".", row.names = FALSE, col.names = TRUE)
Benchmark <- read.table("werte1.dat", sep = ";", dec = ".", header = TRUE)
Benchmark
dailyreturn_ETFs<- array()
str(ETFs)
for(c in 2:ncol(ETFs))
{
currColName <- colnames(ETFs)[c];
for(r in nrow(ETFs[c])-1:1)
{
dailyreturn_ETFs[r,c] <- (ETFs[r,currColName]/ETFs[(r+1),currColName])-1
}
}
I am very thankful for any help. If you need additional information to get rid of the problem, don't hesitate to ask.
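Two things in the loop above look likely to cause trouble: array() with no arguments creates a one-dimensional object, so indexing it with [r, c] fails, and nrow(ETFs[c])-1:1 is parsed as nrow(ETFs[c]) - (1:1), which is a single number rather than a sequence. A minimal sketch of one way around both, assuming a small made-up data frame with the same layout as ETFs (first column is the date, remaining columns are prices, newest row first):

```r
# Made-up prices with the same layout as ETFs (values are illustrative only)
ETFs <- data.frame(
  Name = c("04.01.2010", "03.01.2010", "02.01.2010", "01.01.2010"),
  A    = c(110, 105, 100, 95),
  B    = c(52, 50, 51, 50)
)

prices <- as.matrix(ETFs[, -1])   # drop the date column, keep numeric prices
n <- nrow(prices)

# Vectorised: row r divided by row r + 1, minus 1, for every column at once
dailyreturn_ETFs <- prices[-n, , drop = FALSE] / prices[-1, , drop = FALSE] - 1

# Equivalent loop with a preallocated two-dimensional matrix
loop_returns <- matrix(NA_real_, nrow = n - 1, ncol = ncol(prices),
                       dimnames = list(NULL, colnames(prices)))
for (c in seq_len(ncol(prices))) {
  for (r in seq_len(n - 1)) {
    loop_returns[r, c] <- prices[r, c] / prices[r + 1, c] - 1
  }
}

print(dailyreturn_ETFs)
```

The vectorised form avoids the loop entirely; the loop version is shown only to make the preallocation and the index range explicit.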

r quantregForest() error: NA's produced by integer overflow lead to an invalid argument in the rep() function

I am trying to use the quantregForest() function from the quantregForest package (which is built on the randomForest package.)
I tried to train the model using:
qrf_model <- quantregForest(x=Xtrain, y=Ytrain, importance=TRUE, ntree=10)
and I get the following error message (even after reducing the number of trees from 100 to 10):
Error in rep(0, nobs * nobs * npred) : invalid 'times' argument
plus a warning:
In nobs * nobs * npred : NAs produced by integer overflow
The data frame Xtrain has 38 numeric variables, and it looks like this:
> str(Xtrain)
'data.frame': 31132 obs. of 38 variables:
$ X1 : num 301306 6431 2293 1264 32477 ...
$ X2 : num 173.2 143.5 43.4 180.6 1006.2 ...
$ X3 : num 0.1598 0.1615 0.1336 0.0953 0.1988 ...
$ X4 : num 0.662 0.25 0.71 0.709 0.671 ...
$ X5 : num 0.05873 0.0142 0 0.00154 0.09517 ...
$ X6 : num 0.01598 0 0.0023 0.00154 0.01634 ...
$ X7 : num 0.07984 0.03001 0.00845 0.04304 0.09326 ...
$ X8 : num 0.92 0.97 0.992 0.957 0.907 ...
$ X9 : num 105208 1842 830 504 11553 ...
$ X10: num 69974 1212 611 352 7080 ...
$ X11: num 0.505 0.422 0.55 0.553 0.474 ...
$ X12: num 0.488 0.401 0.536 0.541 0.45 ...
$ X13: num 0.333 0.419 0.257 0.282 0.359 ...
$ X14: num 0.187 0.234 0.172 0.207 0.234 ...
$ X15: num 0.369 0.216 0.483 0.412 0.357 ...
$ X16: num 0.0765 0.1205 0.0262 0.054 0.0624 ...
$ X17: num 2954 77 12 10 739 ...
$ X18: num 2770 43 9 21 433 119 177 122 20 17 ...
$ X19: num 3167 72 49 25 622 ...
$ X20: num 3541 57 14 24 656 ...
$ X21: num 3361 82 0 33 514 ...
$ X22: num 3929 27 10 48 682 ...
$ X23: num 3695 73 61 15 643 ...
$ X24: num 4781 52 5 14 680 ...
$ X25: num 3679 103 5 23 404 ...
$ X26: num 7716 120 55 40 895 ...
$ X27: num 11043 195 72 48 1280 ...
$ X28: num 16080 332 160 83 1684 ...
$ X29: num 12312 125 124 62 1015 ...
$ X30: num 8218 99 36 22 577 ...
$ X31: num 9957 223 146 26 532 ...
$ X32: num 0.751 0.444 0.621 0.527 0.682 ...
$ X33: num 0.01873 0 0 0.00317 0.02112 ...
$ X34: num 0.563 0.372 0.571 0.626 0.323 ...
$ X35: num 0.366 0.39 0.156 0.248 0.549 ...
$ X36: num 0.435 0.643 0.374 0.505 0.36 ...
$ X37: num 0.526 0.31 0.577 0.441 0.591 ...
$ X38: num 0.00163 0 0 0 0.00155 0.00103 0 0 0 0 ...
And the response variable Ytrain looks like this:
> str(Ytrain)
num [1:31132] 2605 56 8 16 214 ...
I checked that neither Xtrain nor Ytrain contains any NAs by:
> sum(is.na(Xtrain))
[1] 0
> sum(is.na(Ytrain))
[1] 0
I am assuming that the error message about the invalid "times" argument in rep(0, nobs * nobs * npred) comes from the NA value assigned to the product nobs * nobs * npred due to an integer overflow.
What I do not understand is where the integer overflow comes from. None of my variables are of the integer class so what am I missing?
I examined the source code for the quantregForest() function and the source code for the method predict.imp called by the quantregForest() function.
I found that nobs stands for the number of observations. In the case above nobs = length(Ytrain) = 31132. The variable npred stands for the number of predictors. It is given by npred = ncol(Xtrain) = 38. Both npred and nobs are of class integer, and
nobs*nobs*npred = 31132*31132*38 = 36829654112.
And herein lies the root cause of the error, since:
nobs*nobs*npred = 36829654112 > 2147483647,
where 2147483647 is the maximal integer value in R. Hence the integer overflow warning and the replacement of the product nobs*nobs*npred with an NA.
The bottom line is that, in order to avoid the error message, I will have to use considerably fewer observations when training the model or set importance=FALSE in the quantregForest() call. The computations required to find variable importance are very memory intensive, even when using fewer than 10000 observations.
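The overflow can be reproduced in isolation. The sketch below uses the two sizes from the post as integer literals; the variable names `prod_int` and `prod_dbl` are introduced here for illustration.

```r
# length() and ncol() return integers, so the product is done in integer arithmetic
nobs  <- 31132L
npred <- 38L

# Integer multiplication overflows and yields NA (with a warning, suppressed here)
prod_int <- suppressWarnings(nobs * nobs * npred)
is.na(prod_int)

# Converting one operand to double avoids the overflow
prod_dbl <- as.numeric(nobs) * nobs * npred
prod_dbl > .Machine$integer.max

# Even without the overflow, a numeric vector of that length would need
# 8 bytes per element, i.e. roughly 295 GB
prod_dbl * 8 / 1e9
```

This also answers the question of where the integers come from: not from the columns of Xtrain, but from the dimension counts themselves.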

Combine data frames after using rvest

My task is to grab baseball data from all 30 teams and combine it all into one table. However, I keep getting integer(0) as a return. Here are my data frames:
install.packages("rvest")
library(rvest)
# Store web url
baseball1 <- read_html("http://www.baseball-reference.com/teams/ARI/")
#Scrape the website for the franchise table
franch1 <- baseball1 %>%
html_nodes("#franchise_years") %>%
html_table()
franch1
# Store web url
baseball2 <- read_html("http://www.baseball-reference.com/teams/ATL/")
#Scrape the website for the franchise table
franch2 <- baseball2 %>%
html_nodes("#franchise_years") %>%
html_table()
franch2
Here is the structure of the data frame: str(franch1)
List of 1
$ :'data.frame': 18 obs. of 21 variables:
..$ Rk : int [1:18] 1 2 3 4 5 6 7 8 9 10 ...
..$ Year : int [1:18] 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 ...
..$ Tm : chr [1:18] "Arizona Diamondbacks" "Arizona Diamondbacks" "Arizona Diamondbacks" "Arizona Diamondbacks" ...
..$ Lg : chr [1:18] "NL West" "NL West" "NL West" "NL West" ...
..$ G : int [1:18] 162 162 162 162 162 162 162 162 162 162 ...
..$ W : int [1:18] 79 64 81 81 94 65 70 82 90 76 ...
..$ L : int [1:18] 83 98 81 81 68 97 92 80 72 86 ...
..$ Ties : int [1:18] 0 0 0 0 0 0 0 0 0 0 ...
..$ W-L% : num [1:18] 0.488 0.395 0.5 0.5 0.58 0.401 0.432 0.506 0.556 0.469 ...
..$ pythW-L% : num [1:18] 0.504 0.415 0.493 0.53 0.545 0.428 0.462 0.509 0.487 0.491 ...
..$ Finish : chr [1:18] "3rd of 5" "5th of 5" "2nd of 5" "3rd of 5" ...
..$ GB : chr [1:18] "13.0" "30.0" "11.0" "13.0" ...
..$ Playoffs : chr [1:18] "" "" "" "" ...
..$ R : int [1:18] 720 615 685 734 731 713 720 720 712 773 ...
..$ RA : int [1:18] 713 742 695 688 662 836 782 706 732 788 ...
..$ BatAge : num [1:18] 26.6 27.6 28.1 28.3 28.2 26.8 26.5 26.7 26.6 29.6 ...
..$ PAge : num [1:18] 27.1 28 27.6 27.4 27.4 27.9 27.7 29.4 28.2 28.8 ...
..$ #Bat : int [1:18] 50 52 44 48 51 48 45 41 47 45 ...
..$ #P : int [1:18] 27 25 23 23 25 28 24 20 26 25 ...
..$ Top Player: chr [1:18] "P.Goldschmidt (8.8)" "P.Goldschmidt (4.5)" "P.Goldschmidt (7.1)" "A.Hill (5.0)" ...
..$ Managers : chr [1:18] "C.Hale (79-83)" "K.Gibson (63-96) and A.Trammell (1-2)" "K.Gibson (81-81)" "K.Gibson (81-81)" ...
What function do I use to combine these data frames? Your help is much appreciated and let me know if I need to provide additional info.
It's because html_table() returns each franchise table as a list containing one data frame, so the tables still need to be converted into data frames before they can be combined. Also, "read_html" didn't work for me, so I used "html" instead.
Try this:
# Store web url using "html" not "read_html"
baseball1 <- html("http://www.baseball-reference.com/teams/ARI/")
#Scrape the website for the franchise table
franch1 <- baseball1 %>%
html_nodes("#franchise_years") %>%
html_table()
franch1
# Store web url
baseball2 <- html("http://www.baseball-reference.com/teams/ATL/")
#Scrape the website for the franchise table
franch2 <- baseball2 %>%
html_nodes("#franchise_years") %>%
html_table()
franch2
franch1 <- as.data.frame(franch1)
franch2 <- as.data.frame(franch2)
franchMerged <- rbind(franch1, franch2)
Let me know if that works for you.
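For all 30 teams, the same idea can be applied to a list of tables instead of repeating the code per team. A minimal sketch without any scraping, assuming each element has the "list of one data frame" shape shown in str(franch1); the team names and values here are placeholders, not real data:

```r
# Stand-ins for the html_table() results: each is a list holding one data frame
franch1 <- list(data.frame(Year = c(2015, 2014), Tm = "Team A", W = c(79, 64)))
franch2 <- list(data.frame(Year = c(2015, 2014), Tm = "Team B", W = c(70, 81)))

# Collect the results, pull the data frame out of each with [[1]]
all_results <- list(franch1, franch2)
tables <- lapply(all_results, function(res) res[[1]])

# Stack every table into one data frame
franchMerged <- do.call(rbind, tables)
print(franchMerged)
```

In a real run, all_results would be filled by looping over the team URLs; rbind() requires that every table has the same column names.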

Unexpected color-filled time series plot using ggplot

I'm a beginner at ggplot, and I tried to use it to draw some time series data.
I want to draw bound_transporter_in_evolution.mean as a function of time, in different conditions where the attribute p_off (float) varies.
p4 <- ggplot(data=df, aes(x=timesteps.mean)) +
geom_line(aes(y=bound_transporter_in_evolution.mean, color=p_off)) +
xlab(label="Time (s)") +
ylab(label="Number of bound 'in' transporters")
ggsave("p4.pdf", width=8, height=3.3)
I get the following plot:
I expected this result, but with a line instead of points:
Thank you
Since p_off is a numeric variable, ggplot will create only one line connecting all the points and color it according to the values. If you want separate lines, you have to transform your coloring variable into a factor (assuming you have a limited number of distinct values). Let's take an example with a numeric color variable:
df=data.frame(x=c(1:5, 1:5), y=rnorm(10), z=c(1,1,1,1,1,2,2,2,2,2))
ggplot(data=df, aes(x=x)) + geom_line(aes(x=x, y=y, color=z))
This doesn't make any sense, since consecutive points come from different categories. Now turn it into a factor:
ggplot(data=df, aes(x=x)) + geom_line(aes(x=x, y=y, color=factor(z)))
In your first graph, the line constantly goes from one p_off value to another, and since you have a really big dataset it quickly saturates the screen.
Here is the output of str(df):
'data.frame': 150010 obs. of 34 variables:
$ bound_transporter_evolution.low : num [1:150010(1d)] 0 11.4 26.1 41.8 48.2 ...
$ bound_transporter_evolution.mean : num [1:150010(1d)] 0 15 28.2 45 53.8 63.8 71.6 77.8 86.2 91.2 ...
$ bound_transporter_evolution.up : num [1:150010(1d)] 0 18.6 30.3 48.2 59.4 ...
$ bound_transporter_in_evolution.low : num [1:150010(1d)] 0 11.4 26.1 41.8 48.2 ...
$ bound_transporter_in_evolution.mean : num [1:150010(1d)] 0 15 28.2 45 53.8 63.8 71.6 77.8 86.2 91.2 ...
$ bound_transporter_in_evolution.up : num [1:150010(1d)] 0 18.6 30.3 48.2 59.4 ...
$ bound_transporter_out_evolution.low : num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ bound_transporter_out_evolution.mean: num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ bound_transporter_out_evolution.up : num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ free_transporter_evolution.low : num [1:150010(1d)] 200 181 170 152 141 ...
$ free_transporter_evolution.mean : num [1:150010(1d)] 200 185 172 155 146 ...
$ free_transporter_evolution.up : num [1:150010(1d)] 200 189 174 158 152 ...
$ free_transporter_in_evolution.low : num [1:150010(1d)] 186 172 158 139 127 ...
$ free_transporter_in_evolution.mean : num [1:150010(1d)] 188 173 160 143 135 ...
$ free_transporter_in_evolution.up : num [1:150010(1d)] 191 175 162 148 142 ...
$ free_transporter_out_evolution.low : num [1:150010(1d)] 9.18 9.18 9.18 9.18 9.18 ...
$ free_transporter_out_evolution.mean : num [1:150010(1d)] 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 ...
$ free_transporter_out_evolution.up : num [1:150010(1d)] 14 14 14 14 14 ...
$ glutamate_evolution.low : num [1:150010(1d)] 2000 1981 1970 1951 1939 ...
$ glutamate_evolution.mean : num [1:150010(1d)] 2000 1985 1971 1954 1943 ...
$ glutamate_evolution.up : num [1:150010(1d)] 2000 1989 1973 1957 1948 ...
$ p_off : num [1:150010(1d)] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 ...
$ simulation_name : Factor w/ 1 level "Variable p-off large diffusion-limited area": 1 1 1 1 1 1 1 1 1 1 ...
$ timesteps.low : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ timesteps.mean : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ timesteps.up : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ transporter_in_evolution.low : num [1:150010(1d)] 186 186 186 186 186 ...
$ transporter_in_evolution.mean : num [1:150010(1d)] 188 188 188 188 188 ...
$ transporter_in_evolution.up : num [1:150010(1d)] 191 191 191 191 191 ...
$ transporter_out_evolution.low : num [1:150010(1d)] 9.18 9.18 9.18 9.18 9.18 ...
$ transporter_out_evolution.mean : num [1:150010(1d)] 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 ...
$ transporter_out_evolution.up : num [1:150010(1d)] 14 14 14 14 14 ...
$ variable_parameter : Factor w/ 1 level "p_off": 1 1 1 1 1 1 1 1 1 1 ...
$ variable_value : num [1:150010(1d)] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 ...
