This works outside of a loop:
> past=lag(zoo(c(new$SPCS20RSA)), c(-1,-2,-3,-4,-5), na.pad =TRUE)
> print(past)
lag-1 lag-2 lag-3 lag-4 lag-5
1 NA NA NA NA NA
2 169.5526 NA NA NA NA
3 169.5526 169.5526 NA NA NA
I want to replace new$SPCS20RSA with new[i] (or [a]).
If I run lag inside a loop and try to use a counter, I receive an error:
for (i in 1:10)
{
#doesn't work in a loop
past = lag(c(new[i]), c(-1,-2, -3, -4, -5), na.pad =TRUE)
print(past)
}
Error in attr(x, "tsp") <- value : 'tsp' attribute must be numeric of length three
In addition: Warning messages:
1: In if (k != round(k)) { : the condition has length > 1 and only the first element will be used
2: In (k/p[3L]) * c(1, 1, 0) : longer object length is not a multiple of shorter object length
3: In p - (k/p[3L]) * c(1, 1, 0) : longer object length is not a multiple of shorter object length
If I try [,i] instead, I get:
Error in new[, i] : incorrect number of dimensions
Contents of new, as requested:
> str(new)
List of 79
$ date : Date[1:516], format: "2008-01-01" "2008-04-01" "2008-05-04" ...
$ CPIAUCSL : num [1:516] 215 215 215 216 216 ...
$ UNRATE : num [1:516] 5.4 5.4 5.4 5.45 5.5 5.55 5.6 5.64 5.68 5.72 ...
$ MEHOINUSA672N : num [1:516] 56076 55979 55944 55936 55929 ...
$ INTDSRUSM193N : num [1:516] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
$ CIVPART : num [1:516] 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.1 ...
$ LFWA64TTUSM647S : num [1:516] 1.96e+08 1.96e+08 1.96e+08 1.96e+08 1.96e+08 ...
$ FEDFUNDS : num [1:516] 1.98 1.98 1.98 1.98 1.99 ...
$ GDPC1 : num [1:516] 14963 14963 14939 14933 14928 ...
$ A191RL1Q225SBEA : num [1:516] 2 2 0.6594 0.375 0.0906 ...
$ SP500 : num [1:516] 1412 1412 1412 1401 1413 ...
$ DCOILWTICO : num [1:516] 114 114 114 123 125 ...
$ CSUSHPINSA : num [1:516] 167 167 167 167 167 ...
$ DFF : num [1:516] 1.95 1.95 1.95 1.95 1.94 ...
$ DFII10 : num [1:516] 1.52 1.52 1.52 1.48 1.44 ...
$ A939RX0Q048SBEA : num [1:516] 49196 49196 49074 49048 49022 ...
$ PCEPILFE : num [1:516] 98.7 98.7 98.7 98.8 98.8 ...
$ GDPDEF : num [1:516] 99 99 99.2 99.3 99.3 ...
$ SPCS20RSA : num [1:516] 170 170 170 169 168 ...
$ GDPPOT : num [1:516] 15068 15068 15090 15094 15099 ...
$ CPILFESL : num [1:516] 215 215 215 215 215 ...
$ GOLDAMGBD228NLBM: num [1:516] 859 859 859 878 875 ...
$ CPIAUCNS : num [1:516] 217 217 217 217 218 ...
$ VIXCLS : num [1:516] 18.5 18.5 18.5 19.1 17.2 ...
$ WPU0911 : num [1:516] 171 171 171 171 172 ...
$ PCEPI : num [1:516] 100 100 100 100 100 ...
$ USSTHPI : num [1:516] 361 361 357 356 355 ...
$ DSPIC96 : num [1:516] 11432 11432 11432 11356 11281 ...
$ DCOILBRENTEU : num [1:516] 110 110 110 120 123 ...
$ FPCPITOTLZGUSA : num [1:516] 3.84 2.8 2.43 2.35 2.27 ...
$ PCEC96 : num [1:516] 10084 10084 10084 10081 10077 ...
$ PPIACO : num [1:516] 197 197 197 198 199 ...
$ MEPAINUSA672N : num [1:516] 29556 29477 29448 29442 29436 ...
$ GDPCA : num [1:516] 14830 14729 14692 14684 14676 ...
$ MPRIME : num [1:516] 5 5 5 5 5 5 5 5 5 5 ...
$ PAYEMS : num [1:516] 137870 137870 137870 137832 137793 ...
$ CES0500000003 : num [1:516] 21.5 21.5 21.5 21.5 21.5 ...
$ RECPROUSM156N : num [1:516] 78.2 78.2 78.2 79.7 81.2 ...
$ IC4WSA : num [1:516] 363500 363500 363500 363750 368250 ...
$ AHETPI : num [1:516] 18 18 18 18 18 ...
$ M2V : num [1:516] 1.92 1.92 1.92 1.92 1.91 ...
$ INDPRO : num [1:516] 103 103 103 103 103 ...
$ PCE : num [1:516] 10093 10093 10093 10107 10121 ...
$ UMCSENT : num [1:516] 59.8 59.8 59.8 58.9 58.1 ...
$ HDTGPDUSQ163N : num [1:516] 97.7 97.7 98 98.1 98.1 ...
$ M1V : num [1:516] 10.6 10.6 10.5 10.5 10.5 ...
$ TCU : num [1:516] 79.6 79.6 79.6 79.6 79.5 ...
$ STLFSI : num [1:516] 0.99 0.99 0.99 0.757 0.653 0.585 0.667 0.743 0.889 0.886 ...
$ BASE : num [1:516] 857 857 857 857 859 ...
$ PSAVERT : num [1:516] 7.9 7.9 7.9 7.3 6.7 6.1 5.5 5.28 5.06 4.84 ...
$ M2 : num [1:516] 7673 7673 7673 7673 7688 ...
$ M1 : num [1:516] 1394 1394 1394 1394 1393 ...
$ M1SL : num [1:516] 1394 1394 1394 1396 1398 ...
$ M2SL : num [1:516] 7696 7696 7696 7701 7705 ...
$ T10Y2Y : num [1:516] 1.42 1.42 1.42 1.53 1.41 ...
$ DGS10 : num [1:516] 3.83 3.83 3.83 3.85 3.86 ...
$ BAMLH0A0HYM2 : num [1:516] 6.72 6.72 6.72 6.76 6.73 ...
$ TB3MS : num [1:516] 1.73 1.73 1.73 1.76 1.79 ...
$ T10YIE : num [1:516] 2.31 2.31 2.31 2.37 2.41 ...
$ TEDRATE : num [1:516] 1.33 1.33 1.33 1.09 0.91 ...
$ GFDEGDQ188S : num [1:516] 64.1 64.1 65.3 65.5 65.8 ...
$ T5YIFR : num [1:516] 2.33 2.33 2.33 2.43 2.48 ...
$ T10Y3M : num [1:516] 2.36 2.36 2.36 2.21 2.04 ...
$ DGS1 : num [1:516] 1.96 1.96 1.96 1.94 2.07 ...
$ USSLIND : num [1:516] 0.02 0.02 0.02 -0.01 -0.04 -0.07 -0.1 -0.162 -0.224 -0.286 ...
$ BAMLC0A4CBBB : num [1:516] 2.98 2.98 2.98 2.96 2.94 ...
$ GFDEBTN : num [1:516] 9492006 9492006 9675128 9713972 9752816 ...
$ DGS2 : num [1:516] 2.42 2.42 2.42 2.32 2.44 ...
$ GS10 : num [1:516] 3.88 3.88 3.88 3.93 3.99 ...
$ DGS5 : num [1:516] 3.12 3.12 3.12 3.07 3.12 ...
$ DGS30 : num [1:516] 4.53 4.53 4.53 4.57 4.58 ...
$ TREAST : num [1:516] 536714 536714 536714 536714 515656 ...
$ BAA10Y : num [1:516] 3 3 3 3.04 3.06 ...
$ BAMLC0A0CM : num [1:516] 2.55 2.55 2.55 2.52 2.51 ...
$ BAMLH0A3HYC : num [1:516] 10.6 10.6 10.6 10.6 10.6 ...
$ FYFSD : num [1:516] -458553 -458553 -458553 -458553 -458553 ...
$ DGS1MO : num [1:516] 1.24 1.24 1.24 1.52 1.83 ...
$ T5YIE : num [1:516] 2.29 2.29 2.29 2.3 2.34 ...
$ FutureSPCS20RSA : num 170
> new <- head(new)
> print(new)
$`date`
[1] "2008-01-01" "2008-04-01" "2008-05-04" "2008-05-11" "2008-05-18" "2008-05-25" "2008-06-01" "2008-06-08"
[9] "2008-06-15" "2008-06-22" "2008-06-29" "2008-07-06" "2008-07-13" "2008-07-20" "2008-07-27" "2008-08-03"
[17] "2008-08-10" "2008-08-17" "2008-08-24" "2008-08-31" "2008-09-07" "2008-09-14" "2008-09-21" "2008-09-28"
[25] "2008-10-05" "2008-10-12" "2008-10-19" "2008-10-26" "2008-11-02" "2008-11-09" "2008-11-16" "2008-11-23"
[33] "2008-11-30" "2008-12-07" "2008-12-14" "2008-12-21" "2008-12-28" "2009-01-04" "2009-01-11" "2009-01-18"
[41] "2009-01-25" "2009-02-01" "2009-02-08" "2009-02-15" "2009-02-22" "2009-03-01" "2009-03-08" "2009-03-15"
[49] "2009-03-22" "2009-03-29" "2009-04-05" "2009-04-12" "2009-04-19" "2009-04-26" "2009-05-03" "2009-05-10"
[57] "2009-05-17" "2009-05-24" "2009-05-31" "2009-06-07" "2009-06-14" "2009-06-21" "2009-06-28" "2009-07-05"
[65] "2009-07-12" "2009-07-19" "2009-07-26" "2009-08-02" "2009-08-09" "2009-08-16" "2009-08-23" "2009-08-30"
[73] "2009-09-06" "2009-09-13" "2009-09-20" "2009-09-27" "2009-10-04" "2009-10-11" "2009-10-18" "2009-10-25"
[81] "2009-11-01" "2009-11-08" "2009-11-15" "2009-11-22" "2009-11-29" "2009-12-06" "2009-12-13" "2009-12-20"
[89] "2009-12-27" "2010-01-03" "2010-01-10" "2010-01-17" "2010-01-24" "2010-01-31" "2010-02-07" "2010-02-14"
[97] "2010-02-21" "2010-02-28" "2010-03-07" "2010-03-14" "2010-03-21" "2010-03-28" "2010-04-04" "2010-04-11"
[105] "2010-04-18" "2010-04-25" "2010-05-02" "2010-05-09" "2010-05-16" "2010-05-23" "2010-05-30" "2010-06-06"
[113] "2010-06-13" "2010-06-20" "2010-06-27" "2010-07-04" "2010-07-11" "2010-07-18" "2010-07-25" "2010-08-01"
[121] "2010-08-08" "2010-08-15" "2010-08-22" "2010-08-29" "2010-09-05" "2010-09-12" "2010-09-19" "2010-09-26"
[129] "2010-10-03" "2010-10-10" "2010-10-17" "2010-10-24" "2010-10-31" "2010-11-07" "2010-11-14" "2010-11-21"
[137] "2010-11-28" "2010-12-05" "2010-12-12" "2010-12-19" "2010-12-26" "2011-01-02" "2011-01-09" "2011-01-16"
[145] "2011-01-23" "2011-01-30" "2011-02-06" "2011-02-13" "2011-02-20" "2011-02-27" "2011-03-06" "2011-03-13"
[153] "2011-03-20" "2011-03-27" "2011-04-03" "2011-04-10" "2011-04-17" "2011-04-24" "2011-05-01" "2011-05-08"
[161] "2011-05-15" "2011-05-22" "2011-05-29" "2011-06-05" "2011-06-12" "2011-06-19" "2011-06-26" "2011-07-03"
[169] "2011-07-10" "2011-07-17" "2011-07-24" "2011-07-31" "2011-08-07" "2011-08-14" "2011-08-21" "2011-08-28"
[177] "2011-09-04" "2011-09-11" "2011-09-18" "2011-09-25" "2011-10-02" "2011-10-09" "2011-10-16" "2011-10-23"
[185] "2011-10-30" "2011-11-06" "2011-11-13" "2011-11-20" "2011-11-27" "2011-12-04" "2011-12-11" "2011-12-18"
[193] "2011-12-25" "2012-01-01" "2012-01-08" "2012-01-15" "2012-01-22" "2012-01-29" "2012-02-05" "2012-02-12"
[201] "2012-02-19" "2012-02-26" "2012-03-04" "2012-03-11" "2012-03-18" "2012-03-25" "2012-04-01" "2012-04-08"
[209] "2012-04-15" "2012-04-22" "2012-04-29" "2012-05-06" "2012-05-13" "2012-05-20" "2012-05-27" "2012-06-03"
[217] "2012-06-10" "2012-06-17" "2012-06-24" "2012-07-01" "2012-07-08" "2012-07-15" "2012-07-22" "2012-07-29"
[225] "2012-08-05" "2012-08-12" "2012-08-19" "2012-08-26" "2012-09-02" "2012-09-09" "2012-09-16" "2012-09-23"
[233] "2012-09-30" "2012-10-07" "2012-10-14" "2012-10-21" "2012-10-28" "2012-11-04" "2012-11-11" "2012-11-18"
[241] "2012-11-25" "2012-12-02" "2012-12-09" "2012-12-16" "2012-12-23" "2012-12-30" "2013-01-06" "2013-01-13"
[249] "2013-01-20" "2013-01-27" "2013-02-03" "2013-02-10" "2013-02-17" "2013-02-24" "2013-03-03" "2013-03-10"
[257] "2013-03-17" "2013-03-24" "2013-03-31" "2013-04-07" "2013-04-14" "2013-04-21" "2013-04-28" "2013-05-05"
[265] "2013-05-12" "2013-05-19" "2013-05-26" "2013-06-02" "2013-06-09" "2013-06-16" "2013-06-23" "2013-06-30"
[273] "2013-07-07" "2013-07-14" "2013-07-21" "2013-07-28" "2013-08-04" "2013-08-11" "2013-08-18" "2013-08-25"
[281] "2013-09-01" "2013-09-08" "2013-09-15" "2013-09-22" "2013-09-29" "2013-10-06" "2013-10-13" "2013-10-20"
[289] "2013-10-27" "2013-11-03" "2013-11-10" "2013-11-17" "2013-11-24" "2013-12-01" "2013-12-08" "2013-12-15"
[297] "2013-12-22" "2013-12-29" "2014-01-05" "2014-01-12" "2014-01-19" "2014-01-26" "2014-02-02" "2014-02-09"
[305] "2014-02-16" "2014-02-23" "2014-03-02" "2014-03-09" "2014-03-16" "2014-03-23" "2014-03-30" "2014-04-06"
[313] "2014-04-13" "2014-04-20" "2014-04-27" "2014-05-04" "2014-05-11" "2014-05-18" "2014-05-25" "2014-06-01"
[321] "2014-06-08" "2014-06-15" "2014-06-22" "2014-06-29" "2014-07-06" "2014-07-13" "2014-07-20" "2014-07-27"
[329] "2014-08-03" "2014-08-10" "2014-08-17" "2014-08-24" "2014-08-31" "2014-09-07" "2014-09-14" "2014-09-21"
[337] "2014-09-28" "2014-10-05" "2014-10-12" "2014-10-19" "2014-10-26" "2014-11-02" "2014-11-09" "2014-11-16"
[345] "2014-11-23" "2014-11-30" "2014-12-07" "2014-12-14" "2014-12-21" "2014-12-28" "2015-01-04" "2015-01-11"
[353] "2015-01-18" "2015-01-25" "2015-02-01" "2015-02-08" "2015-02-15" "2015-02-22" "2015-03-01" "2015-03-08"
[361] "2015-03-15" "2015-03-22" "2015-03-29" "2015-04-05" "2015-04-12" "2015-04-19" "2015-04-26" "2015-05-03"
[369] "2015-05-10" "2015-05-17" "2015-05-24" "2015-05-31" "2015-06-07" "2015-06-14" "2015-06-21" "2015-06-28"
[377] "2015-07-05" "2015-07-12" "2015-07-19" "2015-07-26" "2015-08-02" "2015-08-09" "2015-08-16" "2015-08-23"
[385] "2015-08-30" "2015-09-06" "2015-09-13" "2015-09-20" "2015-09-27" "2015-10-04" "2015-10-11" "2015-10-18"
[393] "2015-10-25" "2015-11-01" "2015-11-08" "2015-11-15" "2015-11-22" "2015-11-29" "2015-12-06" "2015-12-13"
[401] "2015-12-20" "2015-12-27" "2016-01-03" "2016-01-10" "2016-01-17" "2016-01-24" "2016-01-31" "2016-02-07"
[409] "2016-02-14" "2016-02-21" "2016-02-28" "2016-03-06" "2016-03-13" "2016-03-20" "2016-03-27" "2016-04-03"
[417] "2016-04-10" "2016-04-17" "2016-04-24" "2016-05-01" "2016-05-08" "2016-05-15" "2016-05-22" "2016-05-29"
[425] "2016-06-05" "2016-06-12" "2016-06-19" "2016-06-26" "2016-07-03" "2016-07-10" "2016-07-17" "2016-07-24"
[433] "2016-07-31" "2016-08-07" "2016-08-14" "2016-08-21" "2016-08-28" "2016-09-04" "2016-09-11" "2016-09-18"
[441] "2016-09-25" "2016-10-02" "2016-10-09" "2016-10-16" "2016-10-23" "2016-10-30" "2016-11-06" "2016-11-13"
[449] "2016-11-20" "2016-11-27" "2016-12-04" "2016-12-11" "2016-12-18" "2016-12-25" "2017-01-01" "2017-01-08"
[457] "2017-01-15" "2017-01-22" "2017-01-29" "2017-02-05" "2017-02-12" "2017-02-19" "2017-02-26" "2017-03-05"
[465] "2017-03-12" "2017-03-19" "2017-03-26" "2017-04-02" "2017-04-09" "2017-04-16" "2017-04-23" "2017-04-30"
[473] "2017-05-07" "2017-05-14" "2017-05-21" "2017-05-28" "2017-06-04" "2017-06-11" "2017-06-18" "2017-06-25"
[481] "2017-07-02" "2017-07-09" "2017-07-16" "2017-07-23" "2017-07-30" "2017-08-06" "2017-08-13" "2017-08-20"
[489] "2017-08-27" "2017-09-03" "2017-09-10" "2017-09-17" "2017-09-24" "2017-10-01" "2017-10-08" "2017-10-15"
[497] "2017-10-22" "2017-10-29" "2017-11-05" "2017-11-12" "2017-11-19" "2017-11-26" "2017-12-03" "2017-12-10"
[505] "2017-12-17" "2017-12-24" "2017-12-31" "2018-01-07" "2018-01-14" "2018-01-21" "2018-01-28" "2018-02-04"
[513] "2018-02-11" "2018-02-18" "2018-02-25" "2018-03-01"
$CPIAUCSL
[1] 215.2080 215.2080 215.2080 215.7717 216.3355 216.8992 217.4630 217.7736 218.0842 218.3948 218.7054
[12] 219.0160 218.9345 218.8530 218.7715 218.6900 218.7274 218.7648 218.8022 218.8396 218.8770 218.4065
[23] 217.9360 217.4655 216.9950 216.0345 215.0740 214.1135 213.1530 212.8020 212.4510 212.1000 211.7490
[34] 211.3980 211.5317 211.6655 211.7993 211.9330 212.1260 212.3190 212.5120 212.7050 212.6525 212.6000
[45] 212.5475 212.4950 212.5378 212.5806 212.6234 212.6662 212.7090 212.7873 212.8655 212.9437 213.0220
[56] 213.3756 213.7292 214.0828 214.4364 214.7900 214.7740 214.7580 214.7420 214.7260 214.9058 215.0855
[67] 215.2652 215.4450 215.5282 215.6114 215.6946 215.7778 215.8610 216.0230 216.1850 216.3470 216.5090
[78] 216.6902 216.8715 217.0528 217.2340 217.2566 217.2792 217.3018 217.3244 217.3470 217.3822 217.4175
[89] 217.4528 217.4880 217.4466 217.4052 217.3638 217.3224 217.2810 217.2990 217.3170 217.3350 217.3530
Solution
for (i in parsedList)
{
  past = lag(zoo(c(new[[i]])), c(-1, -2, -3, -4, -5), na.pad = TRUE)
  print(past)
}
The Error: unexpected '}' in "}" message comes from a syntax error: you have one superfluous opening parenthesis in front of c().
a = 1
for (i in parsedList)
{
  # counter-based variant: index the list with 'a' and wrap the column in zoo() as above
  past = lag(zoo(c(new[[a]])), c(-1, -2, -3, -4, -5), na.pad = TRUE)
  a = a + 1
}
I hope you defined 'new' beforehand; otherwise you get the error:
Error in new[a] : object of type 'closure' is not subsettable
(because new is a function, used in object-oriented programming to create new objects, and a function cannot be subsetted).
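To see why the double brackets matter, here is a small illustration (my addition, using a made-up two-column list new_demo in place of new): new[i] with single brackets returns a one-element list, which is what triggers the 'tsp' error above, while new[[i]] returns the underlying numeric vector that zoo() can wrap.
library(zoo)

new_demo <- list(SPCS20RSA = c(170.0, 169.6, 168.9, 168.2, 167.7, 167.1),
                 CPIAUCSL  = c(215.2, 215.2, 215.8, 216.3, 216.9, 217.5))

class(new_demo[1])    # "list"    -- single brackets keep the list wrapper
class(new_demo[[1]])  # "numeric" -- double brackets extract the vector itself

# same pattern as the Solution loop above
lag(zoo(new_demo[[1]]), c(-1, -2, -3, -4, -5), na.pad = TRUE)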
Related
I have built a logistic regression model with the dependent variable WinParty, which fits fine. Then, when trying to do variable selection with stepAIC, I keep getting this error.
Data Structure
tibble [2,467 × 25] (S3: tbl_df/tbl/data.frame)
$ PollingPlace : chr [1:2467] "Abbotsbury" "Abbotsford" "Abbotsford East" "Aberdare" ...
$ CoalitionVotes : int [1:2467] 9438 15548 3960 3164 2370 4524 3186 10710 372 5993 ...
$ VoteDifference : num [1:2467] 0.1397 -0.0579 0.0796 -0.2454 0.2623 ...
$ Liberal.National.Coalition.Percentage: num [1:2467] 57 47.1 54 37.7 63.1 ...
$ WinParty : num [1:2467] 1 0 1 0 1 0 0 0 1 0 ...
$ Median_age_persons : num [1:2467] 43 46 41.5 37 41 31 37 36 57.5 41 ...
$ Median_mortgage_repay_monthly : num [1:2467] 2232 3000 2831 1452 1559 ...
$ Median_tot_prsnl_inc_weekly : num [1:2467] 818 1262 1380 627 719 ...
$ Median_rent_weekly : num [1:2467] 550 595 576 310 290 ...
$ Median_tot_fam_inc_weekly : num [1:2467] 2541 3062 3126 1521 2021 ...
$ Average_household_size : num [1:2467] 3.27 2.35 2.28 2.46 2.38 ...
$ Indig_Percent : num [1:2467] 0 0 1.09 10.94 10.61 ...
$ BirthPlace_Aus : num [1:2467] 60.9 67.9 61.7 90.9 89 ...
$ Other_lang_Percen : num [1:2467] 44.97 25.85 28.71 2.58 2.45 ...
$ Aus_Cit_Percent : num [1:2467] 91.5 91.5 86.6 93.7 91.9 ...
$ Yr12_Comp_Percent : num [1:2467] 49.7 57.1 62.7 25 23.1 ...
$ Pop_Density_SQKM : num [1:2467] 2849 6112 7951 1686 334 ...
$ Industrial_Percent : num [1:2467] 6.24 3.95 4.69 8.3 15.31 ...
$ Population_Serving_Percent : num [1:2467] 16 12.9 15.1 16.1 13.6 ...
$ Health_Education_Percent : num [1:2467] 9.26 11.43 10.28 9.07 7.79 ...
$ Knowledge_Intensive_Percent : num [1:2467] 11.31 19.64 17.06 7.44 6.56 ...
$ Over60_Yr : num [1:2467] 25.1 31.6 24.9 20.6 25.3 ...
$ GenZ : num [1:2467] 24.5 20 25.9 26.2 23.6 ...
$ GenX : num [1:2467] 27 29.1 26.6 25.8 26.1 ...
$ Millenials : num [1:2467] 23.3 20.3 19.7 27.3 27.1 ...
- attr(*, "na.action")= 'omit' Named int [1:8] 264 647 843 1332 1774 2033 2077 2138
..- attr(*, "names")= chr [1:8] "264" "647" "843" "1332" ...
The glm function fits the logistic regression with no errors:
mod1 <- glm(WinParty~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials,
family = binomial(link = "logit"), data = GS_PP_Agg)
summary(mod1)
step1 <- stepAIC(mod1, scope = list(lower = "~1",upper = "~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials"), data = GS_PP_Agg)
The stepAIC function returns the error:
"Error in FUN(left, right) : non-numeric argument to binary operator"
Some help in solving this error would be greatly appreciated!
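One thing worth checking (my suggestion, not part of the original post): the MASS documentation says the lower and upper components of scope should be formula objects, whereas the call above passes them as character strings. Below is a sketch of the same call with formulas, assuming mod1 and GS_PP_Agg as defined earlier; whether this removes this particular error is hard to say without the data.
library(MASS)

step1 <- stepAIC(mod1,
                 scope = list(lower = ~ 1,
                              upper = ~ Median_age_persons + Median_rent_weekly +
                                Median_tot_fam_inc_weekly + Indig_Percent + BirthPlace_Aus +
                                Other_lang_Percen + Aus_Cit_Percent + Yr12_Comp_Percent +
                                Industrial_Percent + Population_Serving_Percent +
                                Health_Education_Percent + Knowledge_Intensive_Percent +
                                Over60_Yr + GenZ + GenX + Millenials),
                 direction = "both")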
I am doing the Analytics Edge course on edX and ran into this problem. We have a dataset which we are subsetting. Running str() on the subset works as intended; however, trying summary on the same subset throws an error. Can someone explain why?
> str(WHO_Europe)
'data.frame': 53 obs. of 13 variables:
$ Country : Factor w/ 194 levels "Afghanistan",..: 2 4 8 10 11 16 17 22 26 42 ...
$ Region : Factor w/ 6 levels "Africa","Americas",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Population : int 3162 78 2969 8464 9309 9405 11060 3834 7278 4307 ...
$ Under15 : num 21.3 15.2 20.3 14.5 22.2 ...
$ Over60 : num 14.93 22.86 14.06 23.52 8.24 ...
$ FertilityRate : num 1.75 NA 1.74 1.44 1.96 1.47 1.85 1.26 1.51 1.48 ...
$ LifeExpectancy : int 74 82 71 81 71 71 80 76 74 77 ...
$ ChildMortality : num 16.7 3.2 16.4 4 35.2 5.2 4.2 6.7 12.1 4.7 ...
$ CellularSubscribers : num 96.4 75.5 103.6 154.8 108.8 ...
$ LiteracyRate : num NA NA 99.6 NA NA NA NA 97.9 NA 98.8 ...
$ GNI : num 8820 NA 6100 42050 8960 ...
$ PrimarySchoolEnrollmentMale : num NA 78.4 NA NA 85.3 NA 98.9 86.5 99.3 94.8 ...
$ PrimarySchoolEnrollmentFemale: num NA 79.4 NA NA 84.1 NA 99.2 88.4 99.7 97 ...
> Summary(WHO_Europe)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘Summary’ for signature ‘"data.frame"’
> write.csv(WHO_Europe,"WHO_Europe.CSV")
> Summary(WHO_Europe)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘Summary’ for signature ‘"data.frame"’
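A likely explanation (my note): summary and Summary are different functions in R. Summary with a capital S is the S4 group generic covering max, min, range, prod, sum, any and all, and it has no method for data frames, which is exactly what the error message says. The lower-case summary() is the one the course intends:
summary(WHO_Europe)    # lower-case summary() works on a data.frame
# Summary(WHO_Europe)  # capital-S Summary is the S4 group generic (sum, max, ...), hence the error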
I have the following data frame.
> str(df)
'data.frame': 98444 obs. of 25 variables:
$ count : int 361 362 363 364 365 366 367 368 369 370 ...
$ time : num 3.01 3.02 3.02 3.03 3.04 ...
$ H_Rx : num -164 -164 -164 -164 -164 ...
$ H_Ry : num -10.7 -10.7 -10.7 -10.7 -10.7 ...
$ H_Rz : num -174 -174 -174 -174 -174 ...
$ H_Tx : num -0.00137 -0.00137 -0.00136 -0.00135 -0.00134 ...
$ H_Ty : num 1.67 1.67 1.67 1.67 1.67 ...
$ H_Tz : num -0.194 -0.194 -0.194 -0.194 -0.194 ...
$ C_Rx : num -13.4 -13.4 -13.5 -13.5 -13.6 ...
$ C_Ry : num -14.7 -14.6 -14.5 -14.4 -14.4 ...
$ C_Rz : num 7.7 7.69 7.69 7.68 7.67 ...
$ C_Tx : num 0.00914 0.00914 0.00914 0.00914 0.00914 ...
$ C_Ty : num 1.21 1.21 1.21 1.21 1.21 ...
$ C_Tz : num -0.0466 -0.0466 -0.0466 -0.0465 -0.0465 ...
$ D_Rx : num -32.6 -32.6 -32.6 -32.6 -32.6 ...
$ D_Ry : num -49 -49 -49 -49 -49 ...
$ D_Rz : num 1.91 1.91 1.91 1.92 1.92 ...
$ D_Tx : num -0.0403 -0.0403 -0.0403 -0.0402 -0.0402 ...
$ D_Ty : num 1.63 1.63 1.63 1.63 1.63 ...
$ D_Tz : num 0.0214 0.0214 0.0214 0.0214 0.0215 ...
$ part : chr "P2" "P2" "P2" "P2" ...
$ freq : chr "100Hz" "100Hz" "100Hz" "100Hz" ...
$ device : chr "A1" "A1" "A1" "A1" ...
$ act : chr "Nod" "Nod" "Nod" "Nod" ...
$ trial : chr "Rest" "Rest" "Rest" "Rest" ...
- attr(*, "na.action")=Class 'omit' Named int [1:133] 469 470 471 472 473 474 475 476 477 478 ...
.. ..- attr(*, "names")= chr [1:133] "469" "470" "471" "472" ...
And I also have a list of matrices.
> str(listofmatrix)
List of 98444
$ : num [1:4, 1] 0.0807 0.0165 -0.2062 1
$ : num [1:4, 1] 0.0807 0.0165 -0.2062 1
[list output truncated]
I extracted the first three elements from each matrix in listofmatrix and placed them in new columns of df using a for loop:
for (i in 1:nrow(df)) {
df$D_Txnew[i] <- listofmatrix[[i]][1, 1]
df$D_Tynew[i] <- listofmatrix[[i]][2, 1]
df$D_Tznew[i] <- listofmatrix[[i]][3, 1]
}
It worked as intended, but the processing speed was less than desirable.
What are the different approaches to speed things up?
Instead of assigning row by row, one option is to extract the first three elements from each matrix in 'listofmatrix' (each has only a single column), which returns a list of vectors, then rbind them and assign the result to new columns of 'df'.
df1[paste0("D_T", c("xnew", "ynew", "znew"))] <- do.call(rbind,
lapply(listofmatrix, `[`, 1:3))
Running the OP's loop on 'df' and comparing:
identical(df, df1)
#[1] TRUE
Benchmarks
Here are some benchmarks on a slightly bigger dataset
set.seed(142)
listofmatrix <- lapply(1:1e4, function(i) matrix(rnorm(4), ncol=1))
df <- data.frame(count = 1:1e4, act= sample(LETTERS, 1e4, replace=TRUE))
df1 <- df
system.time({
for (i in 1:nrow(df)) {
df$D_Txnew[i] <- listofmatrix[[i]][1, 1]
df$D_Tynew[i] <- listofmatrix[[i]][2, 1]
df$D_Tznew[i] <- listofmatrix[[i]][3, 1]
}
})
#user system elapsed
# 1.94 0.00 1.94
system.time({
df1[paste0("D_T", c("xnew", "ynew", "znew"))] <- do.call(rbind,
lapply(listofmatrix, `[`, 1:3))
})
# user system elapsed
# 0.02 0.00 0.02
data
set.seed(24)
listofmatrix <- lapply(1:5, function(i) matrix(rnorm(4), ncol=1))
df <- data.frame(count = 1:5, act= LETTERS[1:5])
df1 <- df
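For completeness, one more vectorised variant (my addition, not part of the benchmark above); vapply additionally checks that every matrix yields exactly three numbers:
df1[paste0("D_T", c("xnew", "ynew", "znew"))] <- t(vapply(listofmatrix,
    function(m) m[1:3, 1], numeric(3)))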
I am trying to use the quantregForest() function from the quantregForest package (which is built on the randomForest package).
I tried to train the model using:
qrf_model <- quantregForest(x=Xtrain, y=Ytrain, importance=TRUE, ntree=10)
and I get the following error message (even after reducing the number of trees from 100 to 10):
Error in rep(0, nobs * nobs * npred) : invalid 'times' argument
plus a warning:
In nobs * nobs * npred : NAs produced by integer overflow
The data frame Xtrain has 38 numeric variables, and it looks like this:
> str(Xtrain)
'data.frame': 31132 obs. of 38 variables:
$ X1 : num 301306 6431 2293 1264 32477 ...
$ X2 : num 173.2 143.5 43.4 180.6 1006.2 ...
$ X3 : num 0.1598 0.1615 0.1336 0.0953 0.1988 ...
$ X4 : num 0.662 0.25 0.71 0.709 0.671 ...
$ X5 : num 0.05873 0.0142 0 0.00154 0.09517 ...
$ X6 : num 0.01598 0 0.0023 0.00154 0.01634 ...
$ X7 : num 0.07984 0.03001 0.00845 0.04304 0.09326 ...
$ X8 : num 0.92 0.97 0.992 0.957 0.907 ...
$ X9 : num 105208 1842 830 504 11553 ...
$ X10: num 69974 1212 611 352 7080 ...
$ X11: num 0.505 0.422 0.55 0.553 0.474 ...
$ X12: num 0.488 0.401 0.536 0.541 0.45 ...
$ X13: num 0.333 0.419 0.257 0.282 0.359 ...
$ X14: num 0.187 0.234 0.172 0.207 0.234 ...
$ X15: num 0.369 0.216 0.483 0.412 0.357 ...
$ X16: num 0.0765 0.1205 0.0262 0.054 0.0624 ...
$ X17: num 2954 77 12 10 739 ...
$ X18: num 2770 43 9 21 433 119 177 122 20 17 ...
$ X19: num 3167 72 49 25 622 ...
$ X20: num 3541 57 14 24 656 ...
$ X21: num 3361 82 0 33 514 ...
$ X22: num 3929 27 10 48 682 ...
$ X23: num 3695 73 61 15 643 ...
$ X24: num 4781 52 5 14 680 ...
$ X25: num 3679 103 5 23 404 ...
$ X26: num 7716 120 55 40 895 ...
$ X27: num 11043 195 72 48 1280 ...
$ X28: num 16080 332 160 83 1684 ...
$ X29: num 12312 125 124 62 1015 ...
$ X30: num 8218 99 36 22 577 ...
$ X31: num 9957 223 146 26 532 ...
$ X32: num 0.751 0.444 0.621 0.527 0.682 ...
$ X33: num 0.01873 0 0 0.00317 0.02112 ...
$ X34: num 0.563 0.372 0.571 0.626 0.323 ...
$ X35: num 0.366 0.39 0.156 0.248 0.549 ...
$ X36: num 0.435 0.643 0.374 0.505 0.36 ...
$ X37: num 0.526 0.31 0.577 0.441 0.591 ...
$ X38: num 0.00163 0 0 0 0.00155 0.00103 0 0 0 0 ...
And the response variable Ytrain looks like this:
> str(Ytrain)
num [1:31132] 2605 56 8 16 214 ...
I checked that neither Xtrain or Ytrain contain any NA's by:
> sum(is.na(Xtrain))
[1] 0
> sum(is.na(Ytrain))
[1] 0
I am assuming that the error message about the invalid "times" argument in the rep(0, nobs * nobs * npred) call comes from the NA value assigned to the product nobs * nobs * npred due to an integer overflow.
What I do not understand is where the integer overflow comes from. None of my variables are of the integer class, so what am I missing?
I examined the source code for the quantregForest() function and the source code for the method predict.imp called by the quantregForest() function.
I found that nobs stands for the number of observations; in the case above, nobs = length(Ytrain) = 31132. The variable npred stands for the number of predictors, given by npred = ncol(Xtrain) = 38. Both nobs and npred are of class integer, and
nobs * nobs * npred = 31132 * 31132 * 38 = 36829654112.
And herein lies the root cause of the error, since
nobs * nobs * npred = 36829654112 > 2147483647,
where 2147483647 is the maximal integer value in R (.Machine$integer.max). Hence the integer overflow warning and the replacement of the product nobs * nobs * npred with an NA.
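The overflow is easy to reproduce at the console with the numbers above (my illustration):
> .Machine$integer.max
[1] 2147483647
> 31132L * 31132L * 38L        # integer arithmetic overflows
[1] NA
Warning message:
In 31132L * 31132L * 38L : NAs produced by integer overflow
> 31132 * 31132 * 38           # double-precision arithmetic does not
[1] 36829654112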
The bottom line is that, in order to avoid the error message, I will have to either use quite a bit fewer observations when training the model or set importance = FALSE in the quantregForest() call. The computations required to find variable importance are very memory intensive, even when using fewer than 10000 observations.
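As a sketch of that workaround (my addition, the same call as in the question with importance switched off):
qrf_model <- quantregForest(x = Xtrain, y = Ytrain, importance = FALSE, ntree = 100)
# per the analysis above, the rep(0, nobs * nobs * npred) allocation is tied to the
# variable-importance computation, so ntree has no bearing on the overflow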
I'm a beginner at ggplot, and I tried to use it to draw some time series data.
I want to draw bound_transporter_in_evolution.mean as a function of time, in different conditions where the attribute p_off (float) varies.
p4 <- ggplot(data=df, aes(x=timesteps.mean)) +
geom_line(aes(y=bound_transporter_in_evolution.mean, color=p_off)) +
xlab(label="Time (s)") +
ylab(label="Number of bound 'in' transporters")
ggsave("p4.pdf", width=8, height=3.3)
I get the following plot:
I expected this result, but with a line instead of points:
Thank you
Since p_off is a numeric variable, ggplot will create only one line connecting all the dots and colour it along the values. If you want separate lines, you have to transform your colouring variable into a factor (assuming you have a limited number of different values). Let's take an example with a numeric color variable:
df=data.frame(x=c(1:5, 1:5), y=rnorm(10), z=c(1,1,1,1,1,2,2,2,2,2))
ggplot(data=df, aes(x=x)) + geom_line(aes(x=x, y=y, color=z))
This doesn't make much sense, since consecutive points come from different categories. Now turn it into a factor:
ggplot(data=df, aes(x=x)) + geom_line(aes(x=x, y=y, color=factor(z)))
In your first graph, the line constantly goes from one p_off value to another, and since you have a really big dataset it quickly saturates the screen.
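Applied to the plot in the question, that would look something like this (a sketch based on the code above):
p4 <- ggplot(data = df, aes(x = timesteps.mean)) +
  geom_line(aes(y = bound_transporter_in_evolution.mean, color = factor(p_off))) +
  xlab(label = "Time (s)") +
  ylab(label = "Number of bound 'in' transporters") +
  labs(color = "p_off")
ggsave("p4.pdf", width = 8, height = 3.3)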
Here is the output of str(df):
'data.frame': 150010 obs. of 34 variables:
$ bound_transporter_evolution.low : num [1:150010(1d)] 0 11.4 26.1 41.8 48.2 ...
$ bound_transporter_evolution.mean : num [1:150010(1d)] 0 15 28.2 45 53.8 63.8 71.6 77.8 86.2 91.2 ...
$ bound_transporter_evolution.up : num [1:150010(1d)] 0 18.6 30.3 48.2 59.4 ...
$ bound_transporter_in_evolution.low : num [1:150010(1d)] 0 11.4 26.1 41.8 48.2 ...
$ bound_transporter_in_evolution.mean : num [1:150010(1d)] 0 15 28.2 45 53.8 63.8 71.6 77.8 86.2 91.2 ...
$ bound_transporter_in_evolution.up : num [1:150010(1d)] 0 18.6 30.3 48.2 59.4 ...
$ bound_transporter_out_evolution.low : num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ bound_transporter_out_evolution.mean: num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ bound_transporter_out_evolution.up : num [1:150010(1d)] 0 0 0 0 0 0 0 0 0 0 ...
$ free_transporter_evolution.low : num [1:150010(1d)] 200 181 170 152 141 ...
$ free_transporter_evolution.mean : num [1:150010(1d)] 200 185 172 155 146 ...
$ free_transporter_evolution.up : num [1:150010(1d)] 200 189 174 158 152 ...
$ free_transporter_in_evolution.low : num [1:150010(1d)] 186 172 158 139 127 ...
$ free_transporter_in_evolution.mean : num [1:150010(1d)] 188 173 160 143 135 ...
$ free_transporter_in_evolution.up : num [1:150010(1d)] 191 175 162 148 142 ...
$ free_transporter_out_evolution.low : num [1:150010(1d)] 9.18 9.18 9.18 9.18 9.18 ...
$ free_transporter_out_evolution.mean : num [1:150010(1d)] 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 ...
$ free_transporter_out_evolution.up : num [1:150010(1d)] 14 14 14 14 14 ...
$ glutamate_evolution.low : num [1:150010(1d)] 2000 1981 1970 1951 1939 ...
$ glutamate_evolution.mean : num [1:150010(1d)] 2000 1985 1971 1954 1943 ...
$ glutamate_evolution.up : num [1:150010(1d)] 2000 1989 1973 1957 1948 ...
$ p_off : num [1:150010(1d)] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 ...
$ simulation_name : Factor w/ 1 level "Variable p-off large diffusion-limited area": 1 1 1 1 1 1 1 1 1 1 ...
$ timesteps.low : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ timesteps.mean : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ timesteps.up : num [1:150010(1d)] 0e+00 1e-06 2e-06 3e-06 4e-06 5e-06 6e-06 7e-06 8e-06 9e-06 ...
$ transporter_in_evolution.low : num [1:150010(1d)] 186 186 186 186 186 ...
$ transporter_in_evolution.mean : num [1:150010(1d)] 188 188 188 188 188 ...
$ transporter_in_evolution.up : num [1:150010(1d)] 191 191 191 191 191 ...
$ transporter_out_evolution.low : num [1:150010(1d)] 9.18 9.18 9.18 9.18 9.18 ...
$ transporter_out_evolution.mean : num [1:150010(1d)] 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 11.6 ...
$ transporter_out_evolution.up : num [1:150010(1d)] 14 14 14 14 14 ...
$ variable_parameter : Factor w/ 1 level "p_off": 1 1 1 1 1 1 1 1 1 1 ...
$ variable_value : num [1:150010(1d)] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 ...