Including lagged independent variables - R

I would like to run a regression where I use both the current value and lagged values of a specific independent variable.
My dataset
This is an example extract from my dataset:
dt nrOfCalls nrOfOrders nrOfOrdersLag1 nrOfOrdersLag2 nrOfOrdersLag3
2016/04/20 17 5 9 7 12
2016/04/21 12 8 5 9 7
2016/04/22 14 4 8 5 9
2016/04/23 15 6 4 8 5
2016/04/24 20 14 6 4 8
2016/04/25 10 3 14 6 4
Where nrOfOrdersLagX is the number of orders X days ago. I have also included dummy variables in my dataset (because of limited space I have not included these dummy variables in the example extract above).
My code
When I run the following code everything works perfectly fine:
reg <- lm(nrOfCalls ~ dummy1+...+dummy6+nrOfOrders, data=trainingSet)
However, when I try including the lagged values of the nrOfOrders regressor (for this example I only include one lagged value), I get some unexpected results. I use the following code:
reg <- lm(nrOfCalls ~ dummy1+...+dummy6+nrOfOrders+nrOfOrdersLag1, data=trainingSet)
Instead of merely including the regressor nrOfOrdersLag1, it includes all kinds of regressors whose names are variations on nrOfOrdersLag1.
Call:
lm(formula = nrOfCalls ~ dummy1 + dummy2 + dummy3 + dummy4 +
dummy5 + dummy6 + nrOfOrders + nrOfOrdersLag1, data = trainCall)
Coefficients:
(Intercept) dummy1 dummy2 dummy3 dummy4
604.06334 -114.03241 -229.67540 -270.62292 -220.12409
dummy5 dummy6 nrOfOrders nrOfOrdersLag110707 nrOfOrdersLag11161
-457.22245 -465.17116 0.01729 -249.54641 -10.98526
nrOfOrdersLag111869 nrOfOrdersLag11207 nrOfOrdersLag11234 nrOfOrdersLag11262 nrOfOrdersLag11267
45.36821 33.46161 -17.70615 -384.09745 -413.64804
nrOfOrdersLag11279 nrOfOrdersLag11285 nrOfOrdersLag112945 nrOfOrdersLag11336 nrOfOrdersLag11348
-200.19660 32.75546 -264.04005 -47.13457 79.48368
nrOfOrdersLag11351 nrOfOrdersLag11355 nrOfOrdersLag11363 nrOfOrdersLag11364 nrOfOrdersLag11368
-208.62312 6.83426 -98.71679 170.29583 -93.83054
nrOfOrdersLag11375 nrOfOrdersLag11398 nrOfOrdersLag11456 nrOfOrdersLag11462 nrOfOrdersLag11464
50.54960 14.39958 118.73762 113.72744 190.54445
nrOfOrdersLag11469 nrOfOrdersLag114778 nrOfOrdersLag11486 nrOfOrdersLag11489 nrOfOrdersLag11504
-8.79258 84.35041 66.29121 29.67360 24.30553
nrOfOrdersLag11505 nrOfOrdersLag11511 nrOfOrdersLag11520 nrOfOrdersLag11521 nrOfOrdersLag11527
286.85352 69.76762 -159.45588 -38.90402 53.62128
nrOfOrdersLag11538 nrOfOrdersLag11540 nrOfOrdersLag11564 nrOfOrdersLag115674 nrOfOrdersLag11579
-104.66037 -60.10656 -58.32177 522.56810 77.65481
nrOfOrdersLag11587 nrOfOrdersLag11593 nrOfOrdersLag11603 nrOfOrdersLag11618 nrOfOrdersLag11622
34.63649 31.28570 -124.35673 16.43115 207.99435
nrOfOrdersLag11624 nrOfOrdersLag11626 nrOfOrdersLag11629 nrOfOrdersLag11631 nrOfOrdersLag11635
93.90391 78.94275 155.88327 15.32027 125.02409
nrOfOrdersLag11640 nrOfOrdersLag11645 nrOfOrdersLag11649 nrOfOrdersLag11651 nrOfOrdersLag11653
208.51996 -42.03086 -1.62533 164.73045 12.61157
nrOfOrdersLag11654 nrOfOrdersLag11673 nrOfOrdersLag11683 nrOfOrdersLag11688 nrOfOrdersLag11698
129.26306 -41.56615 137.09095 149.86866 -49.43096
nrOfOrdersLag11699 nrOfOrdersLag11702 nrOfOrdersLag11703 nrOfOrdersLag11705 nrOfOrdersLag11714
76.86530 202.69027 -70.26281 -173.43605 170.02302
nrOfOrdersLag11715 nrOfOrdersLag11716 nrOfOrdersLag11726 nrOfOrdersLag11749 nrOfOrdersLag11754
34.30252 75.45378 176.16211 76.39492 58.11995
nrOfOrdersLag11757 nrOfOrdersLag11764 nrOfOrdersLag11766 nrOfOrdersLag11772 nrOfOrdersLag11777
133.71731 137.62373 24.95059 -75.96096 54.03353
nrOfOrdersLag11778 nrOfOrdersLag11782 nrOfOrdersLag11793 nrOfOrdersLag11806 nrOfOrdersLag11810
-147.40657 -45.70752 27.76710 94.17449 -191.98461
nrOfOrdersLag11811 nrOfOrdersLag11812 nrOfOrdersLag11814 nrOfOrdersLag11815 nrOfOrdersLag11817
61.04646 145.25908 38.56959 18.22574 140.84081
nrOfOrdersLag11827 nrOfOrdersLag11832 nrOfOrdersLag11839 nrOfOrdersLag11841 nrOfOrdersLag11859
-254.56931 138.30797 -139.32523 -151.50010 39.27760
nrOfOrdersLag11860 nrOfOrdersLag11862 nrOfOrdersLag11868 nrOfOrdersLag11874 nrOfOrdersLag11876
304.88804 150.84361 30.75749 -91.55666 192.43385
nrOfOrdersLag11879 nrOfOrdersLag11880 nrOfOrdersLag11885 nrOfOrdersLag11887 nrOfOrdersLag11891
118.75260 -44.83615 163.35474 194.12038 127.79107
nrOfOrdersLag11896 nrOfOrdersLag11901 nrOfOrdersLag11914 nrOfOrdersLag11919 nrOfOrdersLag11921
82.79870 179.44324 303.18796 242.51540 159.40652
nrOfOrdersLag11928 nrOfOrdersLag11929 nrOfOrdersLag11932 nrOfOrdersLag11937 nrOfOrdersLag11939
484.73958 35.38640 286.54643 46.88513 48.94031
nrOfOrdersLag11952 nrOfOrdersLag11967 nrOfOrdersLag11988 nrOfOrdersLag11994 nrOfOrdersLag11996
265.02228 170.65576 47.77627 317.10968 383.09702
nrOfOrdersLag119987 nrOfOrdersLag12007 nrOfOrdersLag12010 nrOfOrdersLag12017 nrOfOrdersLag12018
416.71786 93.41540 61.71721 73.68938 136.60641
nrOfOrdersLag12019 nrOfOrdersLag12023 nrOfOrdersLag12027 nrOfOrdersLag12034 nrOfOrdersLag12040
88.13672 -214.93168 38.82154 148.72993 -60.63852
nrOfOrdersLag12050 nrOfOrdersLag12051 nrOfOrdersLag12056 nrOfOrdersLag12058 nrOfOrdersLag12060
205.21811 246.46001 163.20151 -0.35863 61.93024
nrOfOrdersLag12073 nrOfOrdersLag12082 nrOfOrdersLag12087 nrOfOrdersLag12093 nrOfOrdersLag12107
122.50936 -27.13307 -43.74262 366.51938 146.85581
nrOfOrdersLag12119 nrOfOrdersLag12122 nrOfOrdersLag12124 nrOfOrdersLag121319 nrOfOrdersLag12133
119.31341 36.35183 253.68015 115.01838 228.66567
nrOfOrdersLag12136 nrOfOrdersLag12137 nrOfOrdersLag12154 nrOfOrdersLag12167 nrOfOrdersLag12169
-9.97711 121.20416 -448.43096 324.45466 169.37446
nrOfOrdersLag12176 nrOfOrdersLag12180 nrOfOrdersLag12181 nrOfOrdersLag12184 nrOfOrdersLag12186
88.35432 -14.74399 41.03555 310.68640 308.82549
nrOfOrdersLag12189 nrOfOrdersLag12195 nrOfOrdersLag12202 nrOfOrdersLag12204 nrOfOrdersLag12216
121.87542 264.78895 191.52156 281.02113 168.29821
nrOfOrdersLag12219 nrOfOrdersLag12221 nrOfOrdersLag12231 nrOfOrdersLag12236 nrOfOrdersLag12237
218.48030 66.07233 -228.54230 111.06068 162.65347
nrOfOrdersLag12242 nrOfOrdersLag12244 nrOfOrdersLag12246 nrOfOrdersLag12261 nrOfOrdersLag12262
12.05505 114.60872 -123.06406 -45.54485 380.26022
nrOfOrdersLag12268 nrOfOrdersLag12271 nrOfOrdersLag12302 nrOfOrdersLag12304 nrOfOrdersLag12311
4.23556 249.55941 248.38079 103.12194 -71.69000
nrOfOrdersLag12313 nrOfOrdersLag12329 nrOfOrdersLag12345 nrOfOrdersLag12353 nrOfOrdersLag12356
247.93662 207.13958 314.96154 95.08688 300.10247
nrOfOrdersLag12361 nrOfOrdersLag12371 nrOfOrdersLag12376 nrOfOrdersLag12380 nrOfOrdersLag12384
37.27506 -167.84137 66.61313 247.32681 237.73556
nrOfOrdersLag12399 nrOfOrdersLag12406 nrOfOrdersLag12413 nrOfOrdersLag12417 nrOfOrdersLag12420
107.37362 399.28658 275.48695 95.07723 324.87029
nrOfOrdersLag12423 nrOfOrdersLag12434 nrOfOrdersLag12437 nrOfOrdersLag12442 nrOfOrdersLag12446
233.30480 193.45613 250.79606 322.78975 320.40151
nrOfOrdersLag12448 nrOfOrdersLag12449 nrOfOrdersLag12451 nrOfOrdersLag12460 nrOfOrdersLag124708
172.20478 -113.45790 108.52769 305.32173 -134.41931
nrOfOrdersLag12484 nrOfOrdersLag12486 nrOfOrdersLag12493 nrOfOrdersLag12497 nrOfOrdersLag12505
156.35931 -9.49808 223.13247 -67.47891 534.66815
nrOfOrdersLag12541 nrOfOrdersLag12552 nrOfOrdersLag12563 nrOfOrdersLag12588 nrOfOrdersLag12596
221.35464 1.92188 -53.40846 -473.89923 497.69016
nrOfOrdersLag12611 nrOfOrdersLag12618 nrOfOrdersLag12623 nrOfOrdersLag12632 nrOfOrdersLag12638
175.77150 125.22040 -302.58298 -159.54109 -337.04664
nrOfOrdersLag12646 nrOfOrdersLag12648 nrOfOrdersLag12663 nrOfOrdersLag12665 nrOfOrdersLag12687
539.15416 350.53169 -148.22458 147.67351 -349.52567
nrOfOrdersLag12696 nrOfOrdersLag12713 nrOfOrdersLag12721 nrOfOrdersLag12723 nrOfOrdersLag12743
-42.64843 141.90979 47.07766 -443.50878 356.28944
nrOfOrdersLag12745 nrOfOrdersLag12750 nrOfOrdersLag12753 nrOfOrdersLag12761 nrOfOrdersLag127688
14.65720 13.35666 8.30924 -191.17540 -123.52409
nrOfOrdersLag12802 nrOfOrdersLag12806 nrOfOrdersLag12812 nrOfOrdersLag12815 nrOfOrdersLag12818
128.14604 281.35157 361.79299 8.34690 86.67458
nrOfOrdersLag12824 nrOfOrdersLag12836 nrOfOrdersLag12841 nrOfOrdersLag12842 nrOfOrdersLag12876
518.23720 -357.78788 288.63660 433.15556 158.51341
nrOfOrdersLag12883 nrOfOrdersLag12884 nrOfOrdersLag12901 nrOfOrdersLag12941 nrOfOrdersLag12956
214.74913 68.99485 -208.43888 -297.43011 319.30849
nrOfOrdersLag12996 nrOfOrdersLag13007 nrOfOrdersLag13013 nrOfOrdersLag13023 nrOfOrdersLag13033
321.02569 -88.96746 80.93579 106.97804 -223.88599
nrOfOrdersLag13051 nrOfOrdersLag13072 nrOfOrdersLag13094 nrOfOrdersLag13098 nrOfOrdersLag13127
40.95339 161.48086 524.04025 -94.23016 17.50082
nrOfOrdersLag13152 nrOfOrdersLag13171 nrOfOrdersLag13185 nrOfOrdersLag13202 nrOfOrdersLag13205
-266.11135 8.82232 -107.11441 -141.14442 212.80057
nrOfOrdersLag13222 nrOfOrdersLag13277 nrOfOrdersLag13295 nrOfOrdersLag13321 nrOfOrdersLag13332
187.90431 306.69183 -24.55235 68.42339 -290.11682
nrOfOrdersLag13362 nrOfOrdersLag13378 nrOfOrdersLag13380 nrOfOrdersLag13391 nrOfOrdersLag13476
44.30976 463.85118 276.57882 -282.06457 34.35207
nrOfOrdersLag13488 nrOfOrdersLag13490 nrOfOrdersLag13530 nrOfOrdersLag13578 nrOfOrdersLag13599
217.46608 386.26006 194.69082 52.45357 406.44931
nrOfOrdersLag13611 nrOfOrdersLag13618 nrOfOrdersLag13626 nrOfOrdersLag13632 nrOfOrdersLag13635
242.81201 -22.19253 23.90163 -395.87751 103.44677
nrOfOrdersLag13674 nrOfOrdersLag13681 nrOfOrdersLag13767 nrOfOrdersLag13841 nrOfOrdersLag13849
200.18354 83.25027 -71.88190 382.05886 -279.73606
nrOfOrdersLag13857 nrOfOrdersLag13874 nrOfOrdersLag13885 nrOfOrdersLag13897 nrOfOrdersLag13908
370.92867 -17.14313 -140.99009 -244.17716 93.79552
nrOfOrdersLag13966 nrOfOrdersLag14009 nrOfOrdersLag14031 nrOfOrdersLag14111 nrOfOrdersLag14160
61.75484 224.96558 -107.99394 -126.12766 572.14222
nrOfOrdersLag14171 nrOfOrdersLag14205 nrOfOrdersLag14312 nrOfOrdersLag14468 nrOfOrdersLag14560
-42.29929 -379.41067 194.25204 -47.50642 -116.49251
nrOfOrdersLag14619 nrOfOrdersLag14640 nrOfOrdersLag14684 nrOfOrdersLag14762 nrOfOrdersLag14776
41.34325 -355.84333 -122.77109 -331.12296 404.86637
nrOfOrdersLag14865 nrOfOrdersLag14959 nrOfOrdersLag14967 nrOfOrdersLag15195 nrOfOrdersLag15218
371.14617 104.60840 -42.74014 99.78008 520.62517
nrOfOrdersLag15402 nrOfOrdersLag16029 nrOfOrdersLag16284 nrOfOrdersLag16321 nrOfOrdersLag16350
529.17004 161.02870 268.77256 74.02159 386.53868
nrOfOrdersLag16418 nrOfOrdersLag16557 nrOfOrdersLag16711 nrOfOrdersLag16722 nrOfOrdersLag16825
-81.37023 190.74905 225.64313 -131.70051 271.39936
nrOfOrdersLag16952 nrOfOrdersLag16996 nrOfOrdersLag17098 nrOfOrdersLag17251 nrOfOrdersLag17279
357.39158 408.46849 210.03477 -25.74894 NA
nrOfOrdersLag17292 nrOfOrdersLag17391 nrOfOrdersLag18642 nrOfOrdersLag18670 nrOfOrdersLag18949
262.00528 4.71906 326.28857 49.30983 174.99732
nrOfOrdersLag19202 nrOfOrdersLag19690 nrOfOrdersLag19772
16.13322 15.59552 -62.26111
I have no clue what is happening and why this is going wrong. Anybody that can help me out here? Thanks in advance!

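The lagged independent variables were factor variables instead of integer/numeric variables. lm() expands a factor into one dummy variable per level, which is why the output lists coefficients such as nrOfOrdersLag110707 (the variable name followed by a level value). Having fixed this, the lm call works as intended.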
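For illustration, here is a minimal sketch of that fix. It assumes the lag column was read in as a factor; the data frame is rebuilt from the six rows of the example extract in the question, and the dummy variables are left out because they are not shown there.

```r
# Six rows from the example extract; the lag column is deliberately a factor
# to reproduce the problem (an assumption about how the real data was read in).
trainingSet <- data.frame(
  nrOfCalls      = c(17, 12, 14, 15, 20, 10),
  nrOfOrders     = c(5, 8, 4, 6, 14, 3),
  nrOfOrdersLag1 = factor(c("9", "5", "8", "4", "6", "14"))
)

# Check the column type: a factor makes lm() create one dummy per level.
str(trainingSet$nrOfOrdersLag1)

# Convert via character first; as.numeric() applied directly to a factor
# returns the internal level codes, not the original values.
trainingSet$nrOfOrdersLag1 <- as.numeric(as.character(trainingSet$nrOfOrdersLag1))

# The lag now enters the model as a single numeric regressor.
reg <- lm(nrOfCalls ~ nrOfOrders + nrOfOrdersLag1, data = trainingSet)
names(coef(reg))
```

If the column turned into a factor while reading a file, it is usually worth checking the raw data for non-numeric entries (stray text, thousands separators) before converting.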

Related

Calculation of Akaike Information Criterion

My data file is given below.
Expt. (Col 1:2) Fit1 (Col 3:4) Fit2 (Col 5:6)
x_expt y_expt x_fit1 y_fit1 x_fit2 y_fit2
2.89394 3.04606 2.95515 3.14485 2.96485 3.16485
2.90727 3.22788 2.96788 3.31939 2.97758 3.34061
2.92788 3.42545 2.98242 3.50303 2.99212 3.52606
2.97576 3.62303 2.99818 3.69758 3.00788 3.72182
2.97394 3.84182 3.01576 3.90182 3.02545 3.92848
3.00061 4.06364 3.03515 4.11818 3.04485 4.14606
3.02788 4.29939 3.05636 4.34545 3.06606 4.37515
3.05758 4.54848 3.08 4.58545 3.0897 4.61697
3.09455 4.80424 3.10606 4.83818 3.11576 4.87212
3.12909 5.07818 3.13515 5.10424 3.14485 5.14061
3.1697 5.37152 3.16667 5.38485 3.17758 5.42364
3.20727 5.65333 3.20182 5.68061 3.21273 5.72182
3.25394 5.98 3.24061 5.99152 3.25212 6.03576
3.30061 6.30909 3.28364 6.32121 3.29576 6.36364
3.35697 6.66061 3.33091 6.66667 3.34364 6.71515
3.41212 7.0303 3.38242 7.02424 3.39636 7.07879
3.47273 7.41818 3.44 7.40606 3.45455 7.46667
3.54606 7.82424 3.50303 7.81212 3.51818 7.87273
3.62242 8.24848 3.57212 8.2303 3.58909 8.29697
3.69939 8.69697 3.64788 8.67879 3.66667 8.74545
3.79212 9.16364 3.73152 9.14545 3.75212 9.21212
3.8903 9.66061 3.82364 9.6303 3.84545 9.70909
4.00242 10.1818 3.92424 10.1455 3.94848 10.2242
4.13212 10.6909 4.03455 10.6849 4.06121 10.7697
4.29697 11.2667 4.15576 11.2546 4.18485 11.3394
4.39273 11.8909 4.28788 11.8485 4.32061 11.9394
4.5503 12.5091 4.43273 12.4727 4.46848 12.5636
4.72061 13.2 4.59152 13.1212 4.63091 13.2242
4.89152 13.8424 4.76424 13.8061 4.80788 13.9091
5.07818 14.5636 4.95333 14.5212 5.00061 14.6303
5.30303 15.3212 5.15879 15.2667 5.21091 15.3818
5.54606 16.0788 5.38303 16.0485 5.44 16.1636
5.79212 16.8909 5.62667 16.8667 5.68848 16.9879
6.07273 17.7273 5.89212 17.7212 5.95939 17.8424
6.35152 18.6303 6.18182 18.6121 6.25455 18.7333
6.67879 19.5394 6.49091 19.5394 6.5697 19.6667
6.99394 20.4727 6.8303 20.503 6.91515 20.6364
7.42424 21.4485 7.19394 21.5091 7.28485 21.6424
7.73939 22.497 7.59394 22.5515 7.69091 22.6909
8.17576 23.5879 8.01818 23.6424 8.12121 23.7758
8.64242 24.7091 8.47879 24.7697 8.58788 24.903
9.12727 25.8242 8.97576 25.9394 9.09091 26.0788
9.59394 27.0727 9.50909 27.1515 9.6303 27.2909
10.1636 28.297 10.0788 28.4061 10.2061 28.5455
10.6909 29.5212 10.6909 29.7091 10.8242 29.8424
11.3939 31.0121 11.3455 31.0485 11.4909 31.1818
12.1152 32.4121 12.0485 32.4364 12.1939 32.5697
12.903 33.8667 12.8 33.8667 12.9455 34
13.6364 35.3515 13.5939 35.3455 13.7515 35.4667
14.4303 36.8546 14.4485 36.8606 14.6 36.9818
15.2849 38.5515 15.3515 38.4242 15.503 38.5394
16.2606 40.1939 16.3091 40.0303 16.4606 40.1455
17.2364 41.8909 17.3212 41.6788 17.4788 41.7879
18.3212 43.7091 18.3939 43.3697 18.5455 43.4727
19.4667 45.5576 19.5273 45.103 19.6788 45.2061
20.5152 46.8849 20.7212 46.8788 20.8667 46.9758
22.0606 49.1758 21.9818 48.6909 22.1152 48.7939
23.3152 51.1879 23.303 50.5515 23.4303 50.6485
24.7212 53.1576 24.6909 52.4485 24.8061 52.5455
26.2121 55.4727 26.1455 54.3879 26.2485 54.4849
a). Cols 1:2 are the experimental data.
b). Cols 3:4 are the fitted data from a non-linear least-squares fit with 4 adjustable parameters.
c). Cols 5:6 are the fitted data from a non-linear least-squares fit with 5 adjustable parameters, one of which is fixed to 0.
I wish to calculate the AIC for both fits and conclude which fit is better. Can anybody suggest how to do this in R or Excel?
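One possible starting point in R, as a sketch only: for a least-squares fit with Gaussian errors, the AIC can be written (up to an additive constant) as n * log(RSS / n) + 2 * k, where RSS is the residual sum of squares and k is the number of estimated parameters. The helper name aic_ls and the numbers below are made up for illustration and are not taken from the data file. Note also that in the file each fit has its own x column, so the fitted curve would first have to be evaluated at the experimental x values before residuals can be formed, and whether a parameter fixed to 0 should count towards k is a modelling decision (usually only estimated parameters are counted).

```r
# AIC for a least-squares fit, assuming independent Gaussian errors:
#   AIC = n * log(RSS / n) + 2 * k   (additive constant dropped)
# aic_ls is a hypothetical helper, not a function from any package.
aic_ls <- function(observed, fitted, k) {
  stopifnot(length(observed) == length(fitted))
  n   <- length(observed)
  rss <- sum((observed - fitted)^2)
  n * log(rss / n) + 2 * k
}

# Made-up values, all evaluated at the same x positions (illustration only).
y_obs  <- c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8)
y_fit1 <- c(1.1, 2.0, 3.0, 4.0, 5.0, 6.0)
y_fit2 <- c(1.0, 2.1, 3.0, 4.1, 5.0, 5.9)

aic1 <- aic_ls(y_obs, y_fit1, k = 4)  # fit with 4 estimated parameters
aic2 <- aic_ls(y_obs, y_fit2, k = 5)  # fit with 5 estimated parameters

# The fit with the lower AIC is preferred under this criterion.
c(fit1 = aic1, fit2 = aic2)
```

Only the difference between the two AIC values is meaningful, which is why the dropped constant does not matter as long as both fits use the same observations.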

Error while fitting data in auto.arima - R

I am running auto.arima for forecasting time series data and getting the following error:
1: The time series frequency has been rounded to support seasonal
differencing.
2: In value[3L] : The chosen test encountered
an error, so no seasonal differencing is selected. Check the time
series data.
This is what I am executing:
fit <- auto.arima(data,seasonal = TRUE, approximation = FALSE)
I have weekly time series data.
This is what dput(data) looks like:
structure(c(12911647L, 12618317L, 12827388L, 12967840L, 13264925L,
13557838L, 13701131L, 13812463L, 13971928L, 13837658L, 13550635L,
13022371L, 13507596L, 13456736L, 12992393L, 12831883L, 13262301L,
12831691L, 12808893L, 12726330L, 11893457L, 12434051L, 12363464L,
12077055L, 12107221L, 11986124L, 11997087L, 12264971L, 12164412L,
12438279L, 12733842L, 12543251L, 12627134L, 12480153L, 12276238L,
12443655L, 12497753L, 12279060L, 12549138L, 12308591L, 12416680L,
12516725L, 12326545L, 12772578L, 12524848L, 13429830L, 14188044L,
16611840L, 16476565L, 15659941L, 10785585L, 12150894L, 13436366L,
12985213L, 13097555L, 13204872L, 13786040L, 13760281L, 13295389L,
14734578L, 15043941L, 14821169L, 14361765L, 14300180L, 14357964L,
14271892L, 13248168L, 13813784L, 14092489L, 14100024L, 13378374L,
13225650L, 12582444L, 13267163L, 13026181L, 12747286L, 12707074L,
12534595L, 12546094L, 13030406L, 12950360L, 12814398L, 13405187L,
13277755L, 13142375L, 12742153L, 12610817L, 12267747L, 12570075L,
12704157L, 12835948L, 12851893L, 12978880L, 13104906L, 12754018L,
13213958L, 13584642L, 13963433L, 14471672L, 16312595L, 16630000L,
16443882L, 11555299L, 12018373L, 13031876L, 13013945L, 13164137L,
13313246L, 13652605L, 13803606L, 13308310L, 14466211L, 15092736L,
15346015L, 14467260L, 14767785L, 13914271L, 14185070L, 13851028L,
13605858L, 13597999L, 13876994L, 13026270L, 13113250L, 12288727L,
12925846L, 13525010L, 12594472L, 12654512L, 12888260L), .Tsp = c(2016.00819672131,
2018.48047598209, 52.1785714285714), class = "ts")
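This is how I am reading the data from the csv:
library(lubridate)  # provides ymd() and decimal_date()
read_data <- read.csv(file = "data.csv", header = TRUE)
# Weekly data: 365.25 / 7 observations per year, starting on 2016-01-04
data_ts <- ts(read_data, frequency = 365.25 / 7, start = decimal_date(ymd("2016-01-04")))
data <- data_ts[, 2]  # keep only the si_act column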
This is the data in the csv:
Year si_act
1/4/16 12911647
1/11/16 12618317
1/18/16 12827388
1/25/16 12967840
2/1/16 13264925
2/8/16 13557838
2/15/16 13701131
2/22/16 13812463
2/29/16 13971928
3/7/16 13837658
3/14/16 13550635
3/21/16 13022371
3/28/16 13507596
4/4/16 13456736
4/11/16 12992393
4/18/16 12831883
4/25/16 13262301
5/2/16 12831691
5/9/16 12808893
5/16/16 12726330
5/23/16 11893457
5/30/16 12434051
6/6/16 12363464
6/13/16 12077055
6/20/16 12107221
6/27/16 11986124
7/4/16 11997087
7/11/16 12264971
7/18/16 12164412
7/25/16 12438279
8/1/16 12733842
8/8/16 12543251
8/15/16 12627134
8/22/16 12480153
8/29/16 12276238
9/5/16 12443655
9/12/16 12497753
9/19/16 12279060
9/26/16 12549138
10/3/16 12308591
10/10/16 12416680
10/17/16 12516725
10/24/16 12326545
10/31/16 12772578
11/7/16 12524848
11/14/16 13429830
11/21/16 14188044
11/28/16 16611840
12/5/16 16476565
12/12/16 15659941
12/19/16 10785585
12/26/16 12150894
1/2/17 13436366
1/9/17 12985213
1/16/17 13097555
1/23/17 13204872
1/30/17 13786040
2/6/17 13760281
2/13/17 13295389
2/20/17 14734578
2/27/17 15043941
3/6/17 14821169
3/13/17 14361765
3/20/17 14300180
3/27/17 14357964
4/3/17 14271892
4/10/17 13248168
4/17/17 13813784
4/24/17 14092489
5/1/17 14100024
5/8/17 13378374
5/15/17 13225650
5/22/17 12582444
5/29/17 13267163
6/5/17 13026181
6/12/17 12747286
6/19/17 12707074
6/26/17 12534595
7/3/17 12546094
7/10/17 13030406
7/17/17 12950360
7/24/17 12814398
7/31/17 13405187
8/7/17 13277755
8/14/17 13142375
8/21/17 12742153
8/28/17 12610817
9/4/17 12267747
9/11/17 12570075
9/18/17 12704157
9/25/17 12835948
10/2/17 12851893
10/9/17 12978880
10/16/17 13104906
10/23/17 12754018
10/30/17 13213958
11/6/17 13584642
11/13/17 13963433
11/20/17 14471672
11/27/17 16312595
12/4/17 16630000
12/11/17 16443882
12/18/17 11555299
12/25/17 12018373
1/1/18 13031876
1/8/18 13013945
1/15/18 13164137
1/22/18 13313246
1/29/18 13652605
2/5/18 13803606
2/12/18 13308310
2/19/18 14466211
2/26/18 15092736
3/5/18 15346015
3/12/18 14467260
3/19/18 14767785
3/26/18 13914271
4/2/18 14185070
4/9/18 13851028
4/16/18 13605858
4/23/18 13597999
4/30/18 13876994
5/7/18 13026270
5/14/18 13113250
5/21/18 12288727
5/28/18 12925846
6/4/18 13525010
6/11/18 12594472
6/18/18 12654512
6/25/18 12888260
I was able to run this without any errors before. Initially I had 160 records and the model did not throw any error, but when I removed the last 30 records for an 80-20 train/test split, this error cropped up. Even now, if I run it on all the data I don't get any error, but if I run it on the first 130 records (the 80%), I get this error.
When using auto.arima with seasonal = TRUE, the seasonal period S is not estimated but taken from the frequency of the ts object you provide. So in your case S = 52.18.
If the frequency of the time series is not an integer, S is rounded to the nearest integer, so auto.arima takes S = 52.
With S = 52 and a series of only about 130 observations (as in your dput output), it becomes difficult to fit a seasonal ARIMA model: e.g. if P = 2 and all other orders are zero, the first 104 observations cannot be used. I guess that is what the warning is about. You are being told that the seasonal component cannot be estimated because the seasonal period S is large (or, put differently, because your series is short).
So either you get a longer data history, or you aggregate your data to monthly data (such that S = 12).
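A rough sketch of both points in base R, using synthetic numbers in place of the real series (the values, and the names weekly and monthly, are assumptions for illustration). It first shows how many observations a seasonal AR order of P = 2 would consume at S = 52, then aggregates the weekly values to monthly means so that the seasonal period becomes 12. Using the mean rather than the sum is a choice made here because months contain either four or five weekly observations.

```r
# Synthetic stand-in for the 130 weekly values in the question (made-up numbers)
set.seed(1)
dates  <- seq(as.Date("2016-01-04"), by = "week", length.out = 130)
values <- rnorm(130, mean = 13e6, sd = 1e6)

# Weekly ts with the non-integer frequency used in the question
weekly <- ts(values, frequency = 365.25 / 7, start = 2016)

S      <- round(frequency(weekly))  # seasonal period after rounding
P      <- 2                         # example seasonal AR order
lost   <- P * S                     # observations consumed by the seasonal lags
usable <- length(weekly) - lost     # what is left to fit the model on

# Aggregate to monthly means so that the seasonal period becomes 12
monthly_mean <- tapply(values, format(dates, "%Y-%m"), mean)
monthly <- ts(as.numeric(monthly_mean), frequency = 12, start = c(2016, 1))
```

With these inputs S is 52, so P = 2 would use up 104 of the 130 weekly observations, leaving 26. The monthly series has 30 observations, which is still short for seasonal modelling, so a longer history remains the more robust option.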

Plotting Conditionally Summed Data (base R or ggplot)

I started with a dataframe containing info on West Nile cases in Canada from 2012-2015. 600 observations of 10 variables in total.
> head(mosquitoes)
Years Weeks Province Avg.Temp Avg..Precepitation Wind Number.of.cases Number.of.Dead.Birds Mosquito.Pools.Tested Google.Trend.Searches
1 2015 17 Alberta 48 0.01 8 0 0 0 1
2 2015 18 Alberta 46 0.03 10 0 0 0 2
3 2015 19 Alberta 44 0.07 8 0 0 0 2
4 2015 20 Alberta 51 0.00 9 0 0 0 2
5 2015 21 Alberta 56 0.01 9 0 0 0 4
6 2015 22 Alberta 58 0.10 7 0 0 0 1
Here is the entire data set....sorry it's large.
Years,Weeks,Province,Avg Temp ,Avg. Precepitation,Wind,Number of cases,Number of Dead Birds,Mosquito Pools Tested,Google Trend Searches
2015,17,Alberta,48,0.01,8,0,0,0,1
2015,18,Alberta,46,0.03,10,0,0,0,2
2015,19,Alberta,44,0.07,8,0,0,0,2
2015,20,Alberta,51,0,9,0,0,0,2
2015,21,Alberta,56,0.01,9,0,0,0,4
2015,22,Alberta,58,0.1,7,0,0,0,1
2015,23,Alberta,61,0.05,8,0,0,0,1
2015,24,Alberta,55,0.08,9,0,0,0,1
2015,25,Alberta,63,0.02,6,0,0,0,4
2015,26,Alberta,67,0.16,8,0,0,0,5
2015,27,Alberta,65,0.02,8,0,0,0,3
2015,28,Alberta,62,0.09,10,0,0,0,7
2015,29,Alberta,66,0.01,8,0,0,0,2
2015,30,Alberta,62,0.02,7,0,0,0,3
2015,31,Alberta,64,0.21,7,0,0,0,6
2015,32,Alberta,66,0.07,7,0,0,0,4
2015,33,Alberta,55,0.13,8,0,0,0,4
2015,34,Alberta,63,0,6,0,0,0,1
2015,35,Alberta,52,0.11,9,0,0,0,4
2015,36,Alberta,54,0.02,7,0,0,0,2
2015,37,Alberta,48,0.06,8,0,0,0,2
2015,38,Alberta,52,0.03,9,0,0,0,3
2015,39,Alberta,49,0.03,9,0,0,0,3
2015,40,Alberta,51,0,8,0,0,0,2
2015,41,Alberta,48,0,8,0,0,0,2
2014,17,Alberta,43,0.05,8,0,0,0,1
2014,18,Alberta,44,0.06,9,0,0,0,3
2014,19,Alberta,37,0.03,9,0,0,0,3
2014,20,Alberta,48,0.01,8,0,0,0,1
2014,21,Alberta,57,0.01,10,0,0,0,2
2014,22,Alberta,53,0.06,8,0,0,0,4
2014,23,Alberta,53,0.04,10,0,0,0,6
2014,24,Alberta,53,0.04,10,0,0,0,6
2014,25,Alberta,54,0.24,9,0,0,0,4
2014,26,Alberta,59,0.03,9,0,0,0,7
2014,27,Alberta,64,0.02,11,0,0,0,19
2014,28,Alberta,65,0.03,10,0,0,0,33
2014,29,Alberta,67,0.01,9,0,0,0,18
2014,30,Alberta,62,0.08,10,0,0,0,14
2014,31,Alberta,68,0,10,0,0,0,10
2014,32,Alberta,63,0.16,8,0,0,0,11
2014,33,Alberta,66,0.01,7,0,0,0,19
2014,34,Alberta,58,0.05,8,0,0,0,17
2014,35,Alberta,58,0.04,7,0,0,0,8
2014,36,Alberta,54,0.01,7,0,0,0,12
2014,37,Alberta,41,0.15,8,0,0,0,3
2014,38,Alberta,58,0,5,0,0,0,3
2014,39,Alberta,60,0.02,6,0,0,0,4
2014,40,Alberta,48,0.03,11,0,0,0,5
2014,41,Alberta,51,0,6,0,0,0,3
2013,17,Alberta,42,0,12,0,0,0,3
2013,18,Alberta,42,0.01,11,0,0,0,2
2013,19,Alberta,57,0,11,0,0,0,2
2013,20,Alberta,55,0.01,10,0,0,0,9
2013,21,Alberta,50,0.23,11,0,0,0,7
2013,22,Alberta,52,0.08,6,0,0,0,8
2013,23,Alberta,55,0.15,10,0,0,0,10
2013,24,Alberta,53,0.08,10,0,0,0,4
2013,25,Alberta,57,0.3,11,0,0,0,9
2013,26,Alberta,61,0.01,9,0,0,0,17
2013,27,Alberta,65,0.08,10,0,0,0,27
2013,28,Alberta,59,0.07,8,0,0,0,19
2013,29,Alberta,62,0.01,10,0,0,0,21
2013,30,Alberta,62,0.06,10,0,0,0,18
2013,31,Alberta,57,0.03,7,0,0,0,13
2013,32,Alberta,60,0.07,8,0,0,0,10
2013,33,Alberta,67,0,8,3,0,0,2
2013,34,Alberta,63,0,8,5,0,0,12
2013,35,Alberta,64,0.03,10,4,0,0,20
2013,36,Alberta,64,0.13,8,2,1,0,15
2013,37,Alberta,63,0,9,5,0,0,9
2013,38,Alberta,57,0.06,11,2,0,0,11
2013,39,Alberta,47,0,10,0,0,0,4
2013,40,Alberta,44,0,11,0,0,0,5
2013,41,Alberta,45,0.06,8,0,0,0,5
2012,17,Alberta,49,0.06,7,0,0,0,2
2012,18,Alberta,42,0.13,9,0,0,0,2
2012,19,Alberta,48,0,9,0,0,0,6
2012,20,Alberta,53,0.01,10,0,0,0,2
2012,21,Alberta,49,0.08,8,0,0,0,2
2012,22,Alberta,52,0,9,0,0,0,2
2012,23,Alberta,54,0.28,9,0,0,0,4
2012,24,Alberta,56,0.21,12,0,0,0,7
2012,25,Alberta,56,0.05,8,0,0,0,5
2012,26,Alberta,59,0.14,8,0,0,0,3
2012,27,Alberta,61,0.21,9,0,0,0,22
2012,28,Alberta,69,0,8,0,0,0,32
2012,29,Alberta,65,0.09,10,0,0,0,16
2012,30,Alberta,64,0.02,10,0,0,0,15
2012,31,Alberta,63,0.03,10,0,0,0,20
2012,32,Alberta,68,0,10,0,0,0,25
2012,33,Alberta,62,0.07,10,4,0,0,36
2012,34,Alberta,62,0.05,10,2,0,0,100
2012,35,Alberta,61,0.01,10,0,0,0,76
2012,36,Alberta,57,0,12,1,0,0,29
2012,37,Alberta,57,0,12,2,0,0,30
2012,38,Alberta,59,0,9,0,0,0,14
2012,39,Alberta,58,0.01,9,0,0,0,11
2012,40,Alberta,43,0.07,12,0,0,0,10
2012,41,Alberta,43,0.02,13,0,0,0,7
2015,17,British Columbia,53,0.03,10,0,0,0,5
2015,18,British Columbia,53,0.01,6,0,0,0,5
2015,19,British Columbia,58,0.01,7,0,0,0,5
2015,20,British Columbia,60,0,7,0,0,0,4
2015,21,British Columbia,62,0,7,0,0,0,6
2015,22,British Columbia,60,0.03,7,0,0,0,9
2015,23,British Columbia,62,0,13,0,0,0,9
2015,24,British Columbia,62,0.02,8,0,0,0,10
2015,25,British Columbia,66,0,9,0,0,0,7
2015,26,British Columbia,70,0,12,0,0,0,5
2015,27,British Columbia,67,0.01,9,0,0,0,11
2015,28,British Columbia,66,0,10,0,0,0,9
2015,29,British Columbia,65,0.04,9,0,0,0,14
2015,30,British Columbia,65,0.04,6,0,0,0,7
2015,31,British Columbia,65,0.02,9,0,0,0,7
2015,32,British Columbia,66,0.04,9,0,0,0,9
2015,33,British Columbia,65,0,9,0,0,0,11
2015,34,British Columbia,64,0.1,7,0,0,0,6
2015,35,British Columbia,57,0.12,10,0,0,0,4
2015,36,British Columbia,61,0.02,9,0,0,0,9
2015,37,British Columbia,58,0.09,9,0,0,0,9
2015,38,British Columbia,55,0.04,9,0,0,0,3
2015,39,British Columbia,52,0,6,0,0,0,3
2015,40,British Columbia,56,0.08,6,0,0,0,3
2015,41,British Columbia,51,0.04,7,0,0,0,7
2014,17,British Columbia,49,0.07,10,0,0,0,3
2014,18,British Columbia,54,0.03,8,0,0,0,4
2014,19,British Columbia,53,0.18,9,0,0,0,4
2014,20,British Columbia,60,0,8,0,0,0,6
2014,21,British Columbia,59,0.06,7,0,0,0,6
2014,22,British Columbia,56,0.09,7,0,0,0,6
2014,23,British Columbia,59,0,8,0,0,0,8
2014,24,British Columbia,60,0.03,10,0,0,0,7
2014,25,British Columbia,58,0.09,9,0,0,0,8
2014,26,British Columbia,62,0.05,7,0,0,0,10
2014,27,British Columbia,64,0.01,8,0,0,0,7
2014,28,British Columbia,66,0.01,8,0,0,0,19
2014,29,British Columbia,68,0,9,0,0,0,13
2014,30,British Columbia,63,0.06,8,0,0,0,12
2014,31,British Columbia,67,0,6,0,0,0,16
2014,32,British Columbia,66,0,7,0,0,0,25
2014,33,British Columbia,67,0.08,7,0,0,0,17
2014,34,British Columbia,65,0,6,0,0,0,13
2014,35,British Columbia,66,0,7,0,0,0,30
2014,36,British Columbia,61,0.05,7,0,0,0,9
2014,37,British Columbia,60,0,6,0,0,0,11
2014,38,British Columbia,61,0.02,6,0,0,0,3
2014,39,British Columbia,62,0.12,9,0,0,0,8
2014,40,British Columbia,56,0.04,6,0,0,0,9
2014,41,British Columbia,58,0.03,5,0,0,0,7
2013,17,British Columbia,50,0.03,7,0,0,0,14
2013,18,British Columbia,50,0,12,0,0,0,8
2013,19,British Columbia,59,0.03,6,0,0,0,5
2013,20,British Columbia,56,0.07,8,0,0,0,7
2013,21,British Columbia,54,0.04,8,0,0,0,4
2013,22,British Columbia,55,0.09,7,0,0,0,8
2013,23,British Columbia,60,0.01,9,0,0,0,14
2013,24,British Columbia,58,0.01,7,0,0,0,16
2013,25,British Columbia,62,0.04,8,0,0,0,10
2013,26,British Columbia,63,0.1,7,0,0,0,17
2013,27,British Columbia,67,0,8,0,0,0,29
2013,28,British Columbia,63,0,8,0,0,0,30
2013,29,British Columbia,66,0,9,0,0,0,20
2013,30,British Columbia,64,0,8,0,0,0,34
2013,31,British Columbia,64,0.02,8,0,0,0,11
2013,32,British Columbia,66,0,6,0,0,1,13
2013,33,British Columbia,66,0.02,8,0,0,1,16
2013,34,British Columbia,63,0.01,8,0,0,1,16
2013,35,British Columbia,65,0.17,7,0,1,1,12
2013,36,British Columbia,64,0.06,6,0,0,1,8
2013,37,British Columbia,63,0,6,0,0,1,14
2013,38,British Columbia,60,0.19,6,0,0,1,6
2013,39,British Columbia,54,0.23,10,0,0,1,6
2013,40,British Columbia,51,0.15,9,0,0,1,6
2013,41,British Columbia,51,0.01,8,0,0,1,8
2012,17,British Columbia,53,0.05,8,0,0,0,5
2012,18,British Columbia,50,0.11,7,0,0,0,6
2012,19,British Columbia,52,0,9,0,0,0,7
2012,20,British Columbia,54,0,10,0,0,0,8
2012,21,British Columbia,55,0.06,8,0,0,0,9
2012,22,British Columbia,57,0.07,7,0,0,0,8
2012,23,British Columbia,53,0.07,8,0,0,0,4
2012,24,British Columbia,57,0.04,8,0,0,0,4
2012,25,British Columbia,58,0.13,8,0,0,0,7
2012,26,British Columbia,60,0.04,8,0,0,0,8
2012,27,British Columbia,59,0.03,7,0,0,0,22
2012,28,British Columbia,66,0,6,0,0,0,30
2012,29,British Columbia,66,0.05,8,0,0,0,30
2012,30,British Columbia,63,0.03,8,0,0,0,38
2012,31,British Columbia,65,0,8,0,0,0,60
2012,32,British Columbia,67,0.01,8,0,0,0,34
2012,33,British Columbia,69,0,7,0,0,0,63
2012,34,British Columbia,63,0,8,0,0,0,100
2012,35,British Columbia,62,0,7,0,0,0,51
2012,36,British Columbia,62,0,7,0,0,0,32
2012,37,British Columbia,58,0.01,8,0,0,0,24
2012,38,British Columbia,60,0,6,0,0,0,13
2012,39,British Columbia,57,0,6,0,0,0,13
2012,40,British Columbia,53,0,8,0,0,0,6
2012,41,British Columbia,52,0.09,5,0,0,0,8
2015,17,Manitoba,56,0,10,0,0,0,4
2015,18,Manitoba,48,0,13,0,0,0,4
2015,19,Manitoba,46,0,10,0,0,0,4
2015,20,Manitoba,52,0,14,0,0,0,4
2015,21,Manitoba,57,0,10,0,0,12,4
2015,22,Manitoba,60,0,12,0,0,4,8
2015,23,Manitoba,67,0,9,0,0,87,8
2015,24,Manitoba,59,0,9,0,0,82,8
2015,25,Manitoba,66,0,7,0,0,44,8
2015,26,Manitoba,68,0,7,0,0,75,11
2015,27,Manitoba,66,0,10,0,0,73,17
2015,28,Manitoba,70,0,7,0,0,132,8
2015,29,Manitoba,69,0,9,0,0,139,17
2015,30,Manitoba,70,0,11,0,0,204,4
2015,31,Manitoba,63,0,9,0,0,275,13
2015,32,Manitoba,73,0,9,0,0,195,23
2015,33,Manitoba,62,0,10,0,0,228,13
2015,34,Manitoba,62,0,11,0,0,69,12
2015,35,Manitoba,73,0,11,1,0,92,10
2015,36,Manitoba,57,0,10,1,0,113,8
2015,37,Manitoba,60,0,11,2,0,34,4
2015,38,Manitoba,61,0,13,1,0,0,4
2015,39,Manitoba,53,0,13,0,0,0,6
2015,40,Manitoba,48,0,11,0,0,0,6
2015,41,Manitoba,44,0,11,0,0,0,6
2014,17,Manitoba,42,0,11,0,0,0,4
2014,18,Manitoba,42,0,14,0,0,0,0
2014,19,Manitoba,46,0,9,0,0,0,0
2014,20,Manitoba,45,0,10,0,0,0,0
2014,21,Manitoba,57,0,12,0,0,0,0
2014,22,Manitoba,66,0,8,0,0,0,0
2014,23,Manitoba,62,0,10,0,0,0,5
2014,24,Manitoba,60,0,11,0,0,0,13
2014,25,Manitoba,62,0,12,0,0,0,9
2014,26,Manitoba,66,0,10,0,0,0,7
2014,27,Manitoba,65,0,15,0,0,0,9
2014,28,Manitoba,67,0,11,0,0,0,36
2014,29,Manitoba,63,0,11,0,0,0,24
2014,30,Manitoba,68,0,9,0,0,0,53
2014,31,Manitoba,65,0,8,0,0,7,41
2014,32,Manitoba,71,0,8,0,0,7,48
2014,33,Manitoba,68,0,8,1,0,14,14
2014,34,Manitoba,67,0,8,2,0,19,18
2014,35,Manitoba,61,0,11,2,0,22,9
2014,36,Manitoba,60,0,8,0,0,24,4
2014,37,Manitoba,50,0,11,0,0,24,11
2014,38,Manitoba,52,0,10,0,0,24,4
2014,39,Manitoba,65,0,13,0,0,24,15
2014,40,Manitoba,47,0,16,0,0,24,4
2014,41,Manitoba,39,0,13,0,0,24,4
2013,17,Manitoba,36,0.01,12,0,0,0,4
2013,18,Manitoba,38,0.11,9,0,0,0,4
2013,19,Manitoba,49,0.02,12,0,0,0,4
2013,20,Manitoba,56,0.02,10,0,0,0,5
2013,21,Manitoba,55,0.05,14,0,0,0,4
2013,22,Manitoba,58,0.16,15,0,0,0,4
2013,23,Manitoba,57,0.01,9,0,0,0,9
2013,24,Manitoba,63,0.03,10,0,0,0,16
2013,25,Manitoba,66,0.1,9,0,0,0,23
2013,26,Manitoba,69,0.24,10,0,0,0,14
2013,27,Manitoba,72,0,6,0,0,0,23
2013,28,Manitoba,70,0.06,10,0,0,1,19
2013,29,Manitoba,66,0.1,9,0,0,1,45
2013,30,Manitoba,60,0.19,8,0,1,7,35
2013,31,Manitoba,61,0.03,7,0,0,10,31
2013,32,Manitoba,59,0.04,7,0,0,16,22
2013,33,Manitoba,64,0.02,8,1,0,16,24
2013,34,Manitoba,71,0.17,10,0,0,16,49
2013,35,Manitoba,76,0.01,7,0,0,17,14
2013,36,Manitoba,64,0,10,1,0,17,11
2013,37,Manitoba,63,0.01,8,0,0,19,9
2013,38,Manitoba,54,0,11,0,0,19,6
2013,39,Manitoba,60,0.1,12,0,0,19,13
2013,40,Manitoba,50,0.03,11,0,0,19,8
2013,41,Manitoba,52,0,10,0,1,19,4
2012,17,Manitoba,46,0.01,12,0,0,0,0
2012,18,Manitoba,51,0.05,11,0,0,0,0
2012,19,Manitoba,56,0.06,13,0,0,0,5
2012,20,Manitoba,58,0.16,12,0,0,0,6
2012,21,Manitoba,53,0.02,11,0,0,0,5
2012,22,Manitoba,53,0.13,9,0,0,0,5
2012,23,Manitoba,67,0.08,8,0,0,0,8
2012,24,Manitoba,62,0.17,11,0,0,0,10
2012,25,Manitoba,60,0.04,8,0,0,0,11
2012,26,Manitoba,68,0,10,0,0,0,11
2012,27,Manitoba,73,0.03,7,0,0,0,15
2012,28,Manitoba,73,0,7,0,0,0,17
2012,29,Manitoba,69,0.05,8,1,0,2,21
2012,30,Manitoba,71,0,8,1,0,20,36
2012,31,Manitoba,71,0.2,9,4,0,48,100
2012,32,Manitoba,67,0,9,7,0,62,47
2012,33,Manitoba,62,0.04,8,7,0,98,31
2012,34,Manitoba,69,0.01,7,6,0,108,84
2012,35,Manitoba,70,0.01,11,7,0,111,75
2012,36,Manitoba,63,0.01,11,1,0,116,22
2012,37,Manitoba,59,0.01,11,3,0,116,23
2012,38,Manitoba,47,0.01,12,2,0,116,13
2012,39,Manitoba,50,0,8,0,0,116,5
2012,40,Manitoba,46,0.02,15,0,0,116,7
2012,41,Manitoba,37,0.02,10,0,0,116,5
2015,17,Quebec,53,0,8,0,0,0,8
2015,18,Quebec,65,0.06,8,0,0,0,8
2015,19,Quebec,58,0.09,10,0,0,0,8
2015,20,Quebec,59,0.05,11,0,0,0,8
2015,21,Quebec,69,0.11,11,0,0,0,8
2015,22,Quebec,56,0.07,9,0,0,0,8
2015,23,Quebec,65,0.16,9,0,0,0,8
2015,24,Quebec,64,0.16,7,0,0,0,16
2015,25,Quebec,67,0.18,8,0,0,0,8
2015,26,Quebec,64,0.07,9,0,0,120,19
2015,27,Quebec,71,0.01,8,0,0,127,24
2015,28,Quebec,70,0.05,9,0,1,132,24
2015,29,Quebec,70,0.3,8,0,1,131,16
2015,30,Quebec,75,0.07,9,1,2,129,16
2015,31,Quebec,67,0.02,9,1,3,126,8
2015,32,Quebec,69,0.31,7,0,0,133,8
2015,33,Quebec,76,0.11,9,1,1,125,16
2015,34,Quebec,68,0.01,8,2,1,123,11
2015,35,Quebec,70,0,8,1,3,131,31
2015,36,Quebec,72,0.15,8,2,4,128,15
2015,37,Quebec,69,0.21,9,6,0,123,7
2015,38,Quebec,58,0,7,5,0,108,7
2015,39,Quebec,55,0.17,11,2,2,107,11
2015,40,Quebec,49,0.03,7,5,0,0,7
2015,41,Quebec,51,0.11,11,8,0,0,15
2014,17,Quebec,46,0.05,9,0,0,0,0
2014,18,Quebec,49,0.18,12,0,0,0,0
2014,19,Quebec,53,0.09,10,0,0,0,0
2014,20,Quebec,62,0.17,13,0,0,0,0
2014,21,Quebec,59,0.01,9,0,0,0,13
2014,22,Quebec,59,0.08,9,0,0,0,13
2014,23,Quebec,66,0.13,8,0,0,0,40
2014,24,Quebec,66,0.28,11,0,0,0,18
2014,25,Quebec,65,0.14,8,0,0,0,27
2014,26,Quebec,69,0.14,6,0,0,0,33
2014,27,Quebec,75,0.02,9,0,0,0,23
2014,28,Quebec,70,0.08,12,0,0,0,40
2014,29,Quebec,69,0.05,9,0,0,1,27
2014,30,Quebec,72,0.06,10,0,0,4,28
2014,31,Quebec,66,0.18,8,0,0,9,54
2014,32,Quebec,70,0.04,6,0,0,10,24
2014,33,Quebec,67,0.2,10,1,2,19,34
2014,34,Quebec,66,0,7,1,0,19,9
2014,35,Quebec,70,0,8,1,1,39,17
2014,36,Quebec,72,0.11,10,1,0,70,8
2014,37,Quebec,60,0.12,9,0,3,99,12
2014,38,Quebec,52,0.02,9,1,2,112,13
2014,39,Quebec,61,0.02,9,0,0,119,15
2014,40,Quebec,58,0.06,11,0,1,119,16
2014,41,Quebec,51,0.1,13,1,0,119,16
2013,17,Quebec,46,0.03,11,1,0,0,9
2013,18,Quebec,60,0.01,7,0,0,0,9
2013,19,Quebec,65,0.08,8,0,0,0,9
2013,20,Quebec,51,0.01,11,0,0,0,18
2013,21,Quebec,64,0.19,10,0,0,0,17
2013,22,Quebec,64,0.18,9,0,0,0,9
2013,23,Quebec,59,0.11,10,0,0,0,21
2013,24,Quebec,64,0.11,9,0,0,0,18
2013,25,Quebec,62,0.09,8,0,0,0,9
2013,26,Quebec,69,0.14,9,0,0,0,37
2013,27,Quebec,72,0.02,9,0,0,0,9
2013,28,Quebec,73,0.06,8,0,0,0,45
2013,29,Quebec,79,0.28,9,0,0,2,49
2013,30,Quebec,66,0.06,7,0,0,3,73
2013,31,Quebec,70,0.12,9,1,3,5,40
2013,32,Quebec,68,0.04,9,3,2,11,74
2013,33,Quebec,66,0.08,9,8,4,23,56
2013,34,Quebec,69,0.02,10,3,5,36,64
2013,35,Quebec,70,0.06,7,4,9,36,29
2013,36,Quebec,63,0.06,10,2,6,40,32
2013,37,Quebec,62,0.18,8,3,4,47,20
2013,38,Quebec,58,0.12,9,1,2,59,8
2013,39,Quebec,54,0.03,6,1,0,60,16
2013,40,Quebec,61,0,6,1,0,60,24
2013,41,Quebec,55,0.11,10,0,0,60,20
2012,17,Quebec,40,0.17,13,0,0,0,0
2012,18,Quebec,50,0.03,7,0,0,0,10
2012,19,Quebec,55,0.07,8,0,0,0,10
2012,20,Quebec,61,0.02,7,0,0,0,10
2012,21,Quebec,69,0.1,7,0,0,0,11
2012,22,Quebec,62,0.16,8,0,0,0,10
2012,23,Quebec,61,0.02,8,0,0,0,10
2012,24,Quebec,68,0.08,7,0,0,0,11
2012,25,Quebec,76,0.01,9,0,0,0,11
2012,26,Quebec,69,0.13,9,0,0,0,26
2012,27,Quebec,73,0.12,6,0,0,0,40
2012,28,Quebec,72,0,8,0,2,0,24
2012,29,Quebec,71,0.21,6,1,0,0,11
2012,30,Quebec,71,0.1,7,1,0,0,11
2012,31,Quebec,76,0.01,7,0,1,5,78
2012,32,Quebec,72,0.17,10,2,5,8,31
2012,33,Quebec,70,0.02,7,6,2,19,94
2012,34,Quebec,70,0,6,10,5,19,100
2012,35,Quebec,71,0.01,11,9,8,19,76
2012,36,Quebec,71,0.11,6,14,1,19,70
2012,37,Quebec,63,0.07,8,23,6,19,43
2012,38,Quebec,58,0.12,10,16,0,19,34
2012,39,Quebec,54,0.01,9,27,0,19,38
2012,40,Quebec,57,0.16,8,11,0,19,14
2012,41,Quebec,45,0.06,10,8,0,19,19
2015,17,Ontario,53,0,9,0,0,0,2
2015,18,Ontario,61,0.04,5,0,0,0,2
2015,19,Ontario,58,0.07,7,0,0,0,4
2015,20,Ontario,58,0,8,0,0,0,5
2015,21,Ontario,70,0.11,8,0,0,0,8
2015,22,Ontario,57,0.14,7,0,0,180,8
2015,23,Ontario,65,0.18,6,0,0,356,5
2015,24,Ontario,65,0.08,5,0,1,852,5
2015,25,Ontario,67,0.33,7,0,0,886,13
2015,26,Ontario,63,0.02,7,0,0,954,15
2015,27,Ontario,68,0.04,5,0,0,1152,13
2015,28,Ontario,67,0.03,6,1,0,1216,21
2015,29,Ontario,72,0.01,7,1,4,1219,16
2015,30,Ontario,76,0.03,6,1,1,1222,22
2015,31,Ontario,68,0.06,6,0,8,1176,24
2015,32,Ontario,69,0.21,6,0,0,1168,15
2015,33,Ontario,73,0.09,5,1,0,1168,24
2015,34,Ontario,64,0.01,5,5,1,987,12
2015,35,Ontario,75,0,5,2,1,881,18
2015,36,Ontario,70,0.11,5,5,0,802,9
2015,37,Ontario,65,0.07,6,1,2,712,6
2015,38,Ontario,60,0,5,5,4,526,4
2015,39,Ontario,55,0.04,9,2,2,396,6
2015,40,Ontario,53,0.14,6,3,0,65,5
2015,41,Ontario,52,0.04,8,3,4,0,2
2014,17,Ontario,46,0.05,8,0,0,0,3
2014,18,Ontario,47,0.14,9,0,0,0,2
2014,19,Ontario,53,0,9,0,0,0,2
2014,20,Ontario,56,0.13,6,0,0,0,3
2014,21,Ontario,57,0.09,5,0,0,0,4
2014,22,Ontario,65,0.02,6,0,0,0,7
2014,23,Ontario,63,0.04,6,0,0,0,10
2014,24,Ontario,65,0.19,6,0,0,0,16
2014,25,Ontario,66,0.16,5,0,0,0,13
2014,26,Ontario,69,0.06,4,0,0,0,7
2014,27,Ontario,72,0.09,7,0,0,0,20
2014,28,Ontario,68,0.12,6,0,0,0,17
2014,29,Ontario,66,0.21,5,1,0,0,13
2014,30,Ontario,68,0.03,5,0,0,2,14
2014,31,Ontario,67,0.35,5,0,0,5,35
2014,32,Ontario,68,0.21,4,0,0,9,22
2014,33,Ontario,65,0.12,7,2,0,11,30
2014,34,Ontario,67,0.02,4,0,2,13,11
2014,35,Ontario,67,0,6,2,3,30,18
2014,36,Ontario,71,0.39,5,5,0,43,13
2014,37,Ontario,60,0.15,6,1,0,52,10
2014,38,Ontario,53,0.02,4,0,1,56,7
2014,39,Ontario,60,0.08,4,0,0,56,3
2014,40,Ontario,61,0.06,4,0,0,56,6
2014,41,Ontario,50,0.06,6,0,0,56,4
2013,17,Ontario,43,0.05,6,0,0,0,2
2013,18,Ontario,57,0.05,6,0,0,0,3
2013,19,Ontario,59,0.04,5,0,0,0,4
2013,20,Ontario,51,0.02,8,0,0,0,3
2013,21,Ontario,60,0.17,8,0,0,0,7
2013,22,Ontario,64,0.16,6,1,0,0,9
2013,23,Ontario,58,0.05,7,1,0,0,9
2013,24,Ontario,64,0.29,6,0,0,0,12
2013,25,Ontario,64,0.11,5,0,0,0,12
2013,26,Ontario,73,0.06,4,0,1,2,12
2013,27,Ontario,71,0.05,5,1,0,2,20
2013,28,Ontario,72,0.13,6,2,0,4,15
2013,29,Ontario,80,0.05,5,1,2,12,20
2013,30,Ontario,65,0.12,6,5,0,22,56
2013,31,Ontario,66,0.26,5,4,8,41,43
2013,32,Ontario,67,0.04,6,5,6,65,32
2013,33,Ontario,63,0,5,5,2,89,24
2013,34,Ontario,70,0,5,2,0,131,30
2013,35,Ontario,72,0.2,3,2,8,155,22
2013,36,Ontario,63,0.12,6,7,2,179,12
2013,37,Ontario,64,0.04,6,3,2,190,15
2013,38,Ontario,57,0.17,4,5,2,194,9
2013,39,Ontario,55,0,4,0,1,196,5
2013,40,Ontario,61,0.04,4,5,0,198,9
2013,41,Ontario,56,0.04,4,1,0,198,4
2012,17,Ontario,40,0.06,11,0,0,0,4
2012,18,Ontario,50,0.12,6,0,0,0,3
2012,19,Ontario,56,0.07,6,0,0,0,3
2012,20,Ontario,58,0.02,4,0,0,0,3
2012,21,Ontario,69,0.01,6,0,0,0,5
2012,22,Ontario,64,0.09,8,0,0,0,3
2012,23,Ontario,63,0.03,6,1,0,0,6
2012,24,Ontario,67,0.08,6,0,0,0,4
2012,25,Ontario,76,0.17,6,0,0,2,7
2012,26,Ontario,70,0.04,7,0,0,6,10
2012,27,Ontario,75,0.04,5,3,1,10,39
2012,28,Ontario,73,0.02,5,5,3,19,24
2012,29,Ontario,75,0.06,6,9,1,30,19
2012,30,Ontario,72,0.38,6,14,2,89,17
2012,31,Ontario,73,0.16,4,23,1,162,77
2012,32,Ontario,70,0.14,6,44,1,249,46
2012,33,Ontario,68,0.05,4,44,8,312,64
2012,34,Ontario,67,0,4,38,4,375,83
2012,35,Ontario,70,0.15,6,26,0,409,100
2012,36,Ontario,69,0.56,4,25,0,434,79
2012,37,Ontario,61,0.03,5,17,2,454,37
2012,38,Ontario,57,0.16,5,3,4,462,23
2012,39,Ontario,53,0,6,2,6,462,24
2012,40,Ontario,57,0.03,5,3,0,464,18
2012,41,Ontario,42,0.04,5,1,0,464,10
2015,17,Saskatchewan,50,0,10,0,0,0,6
2015,18,Saskatchewan,46,0,11,0,0,0,12
2015,19,Saskatchewan,46,0,9,0,0,0,6
2015,20,Saskatchewan,53,0,8,0,0,0,6
2015,21,Saskatchewan,56,0,8,0,0,2,9
2015,22,Saskatchewan,60,0,10,0,0,0,9
2015,23,Saskatchewan,64,0,10,0,0,3,9
2015,24,Saskatchewan,57,0,8,0,0,3,12
2015,25,Saskatchewan,65,0,7,0,0,10,31
2015,26,Saskatchewan,70,0,6,0,0,13,15
2015,27,Saskatchewan,66,0,9,0,0,16,13
2015,28,Saskatchewan,67,0,8,0,0,40,15
2015,29,Saskatchewan,68,0,10,0,0,47,16
2015,30,Saskatchewan,63,0.02,9,0,0,69,43
2015,31,Saskatchewan,63,0,8,0,0,67,16
2015,32,Saskatchewan,70,0,8,0,0,80,28
2015,33,Saskatchewan,58,0,8,0,0,94,38
2015,34,Saskatchewan,62,0,8,0,0,42,21
2015,35,Saskatchewan,61,0,10,0,1,41,14
2015,36,Saskatchewan,53,0,8,0,0,0,9
2015,37,Saskatchewan,52,0,8,0,0,0,5
2015,38,Saskatchewan,54,0,10,0,0,0,5
2015,39,Saskatchewan,48,0,8,0,0,0,5
2015,40,Saskatchewan,48,0,9,0,0,0,8
2015,41,Saskatchewan,44,0,11,0,0,0,5
2014,17,Saskatchewan,40,0,12,0,0,0,6
2014,18,Saskatchewan,41,0,10,0,0,0,6
2014,19,Saskatchewan,41,0,9,0,0,0,6
2014,20,Saskatchewan,45,0,7,0,0,0,6
2014,21,Saskatchewan,59,0,10,0,0,0,13
2014,22,Saskatchewan,57,0,11,0,0,0,20
2014,23,Saskatchewan,55,0,8,0,0,0,17
2014,24,Saskatchewan,53,0,10,0,0,0,13
2014,25,Saskatchewan,57,0,10,0,0,0,7
2014,26,Saskatchewan,63,0,8,0,0,0,21
2014,27,Saskatchewan,66,0,11,0,0,0,26
2014,28,Saskatchewan,65,0,10,0,0,0,69
2014,29,Saskatchewan,64,0,9,0,0,0,65
2014,30,Saskatchewan,63,0,9,0,0,1,60
2014,31,Saskatchewan,67,0,6,0,0,1,36
2014,32,Saskatchewan,69,0,6,0,2,2,47
2014,33,Saskatchewan,67,0,7,0,0,9,67
2014,34,Saskatchewan,64,0,8,0,0,19,45
2014,35,Saskatchewan,58,0,9,0,0,20,34
2014,36,Saskatchewan,56,0,8,0,0,20,13
2014,37,Saskatchewan,46,0,9,0,0,20,19
2014,38,Saskatchewan,55,0,8,0,0,20,6
2014,39,Saskatchewan,61,0,9,0,0,20,16
2014,40,Saskatchewan,44,0,12,0,0,20,12
2014,41,Saskatchewan,45,0,9,0,0,20,6
2013,17,Saskatchewan,34,0,10,0,0,0,10
2013,18,Saskatchewan,40,0,12,0,0,0,14
2013,19,Saskatchewan,50,0,12,0,0,0,14
2013,20,Saskatchewan,59,0,9,0,0,0,7
2013,21,Saskatchewan,57,0,13,0,0,0,7
2013,22,Saskatchewan,60,0,9,0,0,0,14
2013,23,Saskatchewan,57,0,9,0,0,0,21
2013,24,Saskatchewan,57,0,10,0,0,0,20
2013,25,Saskatchewan,61,0,10,0,0,0,14
2013,26,Saskatchewan,64,0,7,0,0,0,41
2013,27,Saskatchewan,69,0,7,0,0,0,61
2013,28,Saskatchewan,65,0,8,0,0,1,65
2013,29,Saskatchewan,62,0,9,0,3,1,81
2013,30,Saskatchewan,60,0,9,0,1,3,75
2013,31,Saskatchewan,59,0,8,0,2,3,33
2013,32,Saskatchewan,60,0,6,0,1,18,44
2013,33,Saskatchewan,69,0,8,0,0,29,75
2013,34,Saskatchewan,66,0,8,1,1,29,60
2013,35,Saskatchewan,69,0,8,3,0,36,24
2013,36,Saskatchewan,67,0,7,1,0,40,21
2013,37,Saskatchewan,62,0,9,0,0,40,26
2013,38,Saskatchewan,57,0,10,1,2,40,32
2013,39,Saskatchewan,51,0,9,0,1,40,13
2013,40,Saskatchewan,45,0,11,0,0,40,29
2013,41,Saskatchewan,46,0,10,0,0,40,10
2012,17,Saskatchewan,44,0,13,0,0,0,24
2012,18,Saskatchewan,46,0,12,0,0,0,16
2012,19,Saskatchewan,51,0,13,0,0,0,16
2012,20,Saskatchewan,54,0,12,0,0,0,9
2012,21,Saskatchewan,48,0,11,0,0,0,17
2012,22,Saskatchewan,53,0,9,0,0,0,16
2012,23,Saskatchewan,61,0,13,0,0,0,8
2012,24,Saskatchewan,56,0,11,0,0,0,16
2012,25,Saskatchewan,58,0,7,0,0,0,25
2012,26,Saskatchewan,64,0,12,0,0,0,22
2012,27,Saskatchewan,65,0,9,0,0,0,23
2012,28,Saskatchewan,71,0,7,0,1,0,67
2012,29,Saskatchewan,67,0,10,0,0,0,34
2012,30,Saskatchewan,67,0,8,0,0,0,28
2012,31,Saskatchewan,64,0,8,0,0,0,59
2012,32,Saskatchewan,68,0,8,0,0,3,58
2012,33,Saskatchewan,59,0,8,2,0,4,34
2012,34,Saskatchewan,65,0,9,1,0,6,100
2012,35,Saskatchewan,64,0,9,0,0,6,49
2012,36,Saskatchewan,55,0,11,3,0,6,41
2012,37,Saskatchewan,58,0,13,0,0,6,16
2012,38,Saskatchewan,50,0,8,3,0,6,19
2012,39,Saskatchewan,55,0,6,0,0,6,15
2012,40,Saskatchewan,42,0,10,0,0,6,11
2012,41,Saskatchewan,36,0,8,0,0,6,7
First I produced this plot, but I did it in the most brute-force way imaginable:
#split out each year
cases2015 <- subset(mosquitoes, mosquitoes$Years==2015)
cases2014 <- subset(mosquitoes, mosquitoes$Years==2014)
cases2013 <- subset(mosquitoes, mosquitoes$Years==2013)
cases2012 <- subset(mosquitoes, mosquitoes$Years==2012)
#get the sums by week
aggregate2015 <- aggregate(cases2015$Number.of.cases, by=list(Weeks=cases2015$Weeks), FUN=sum)
aggregate2014 <- aggregate(cases2014$Number.of.cases, by=list(Weeks=cases2014$Weeks), FUN=sum)
aggregate2013 <- aggregate(cases2013$Number.of.cases, by=list(Weeks=cases2013$Weeks), FUN=sum)
aggregate2012 <- aggregate(cases2012$Number.of.cases, by=list(Weeks=cases2012$Weeks), FUN=sum)
#put the sums back together into a dataframe
aggregateSums <- aggregate2012
aggregateSums <- cbind(aggregateSums, aggregate2013[,2])
aggregateSums <- cbind(aggregateSums, aggregate2014[,2])
aggregateSums <- cbind(aggregateSums, aggregate2015[,2])
#give the columns useful names
colnames(aggregateSums) <- c("Weeks","Cases.2012","Cases.2013","Cases.2014","Cases.2015")
#base R plot
#plot the first set of points
plot(x=aggregateSums$Weeks,y=aggregateSums$Cases.2012,pch=16,col="Red",main="West Nile Cases",xlab="Week",ylab="Number of Cases")
#add additional years
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2013,pch=15,col="Blue")
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2014,pch=14,col="Orange")
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2015,pch=13,col="Brown")
#add the connecting lines
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2012,col="Red")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2013,col="Blue")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2014,col="Orange")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2015,col="Brown")
#click to place legend
legend(locator(1),c("2012","2013","2014","2015"),pch=c(16,15,14,13), col=c("Red","Blue","Orange","Brown"))
So surely there has to be a more efficient way to get there.
My next step is to produce the same plot but for just one province at a time. I don't want to have to go through the above 6 times...
I'm open to accomplishing this via ggplot. If possible, I'd like to do it without resorting to additional packages (like plyr), as I'm trying to learn the base functionality for manipulating data.
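One possible base-R route is sketched below; it is an illustration, not a tested drop-in. It assumes the data frame is called mosquitoes and has the columns Years, Weeks, Province and Number.of.cases used above. The tiny data frame and the helper names week_year_sums() and plot_cases() are made up for this sketch. tapply() builds the week-by-year sums in one call, and matplot() draws one line per column.

```r
# Tiny stand-in for `mosquitoes`; the values are made up for illustration
mosquitoes <- data.frame(
  Years           = rep(c(2012, 2013), each = 4),
  Weeks           = rep(c(17, 17, 18, 18), times = 2),
  Province        = rep(c("Ontario", "Quebec"), times = 4),
  Number.of.cases = 1:8
)

# Sum cases into a Weeks x Years matrix in a single call
week_year_sums <- function(d) {
  with(d, tapply(Number.of.cases, list(Weeks = Weeks, Years = Years), sum))
}

# Draw one line per year (one matrix column per year)
plot_cases <- function(d, main = "West Nile Cases") {
  sums  <- week_year_sums(d)
  years <- colnames(sums)
  matplot(as.numeric(rownames(sums)), sums, type = "b", lty = 1,
          pch = seq_along(years), col = seq_along(years),
          main = main, xlab = "Week", ylab = "Number of Cases")
  legend("topleft", legend = years, pch = seq_along(years),
         col = seq_along(years), lty = 1)
  invisible(sums)
}

sums <- plot_cases(mosquitoes)                        # all provinces together
for (p in as.character(unique(mosquitoes$Province))) {
  plot_cases(subset(mosquitoes, Province == p), main = p)  # one plot per province
}
```

The per-province loop reuses the same function on a subset, so nothing has to be repeated by hand for each province.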
Just to close the loop after Biranjan's answer...
library(dplyr)
library(ggplot2)
mosq2 <- mosquitoes %>%
  select(Years,Weeks,Province,Number.of.cases) %>%
  group_by(Years,Weeks,Province) %>%
  summarise(sum_case=sum(Number.of.cases))
ggplot(data=mosq2, aes(x=as.factor(Weeks),y=sum_case,color=as.factor(Years))) +
  geom_point(aes(shape=as.factor(Years))) +
  geom_line(aes(group=as.factor(Years))) +
  labs(title="West Nile Cases", x="weeks", y="Number of cases") +
  theme(legend.title=element_blank()) +
  facet_wrap(~Province,ncol=3) +
  scale_x_discrete(breaks=c(17,30,41))
Turned out quite nicely
ggplot(data=data1, aes(x=as.factor(Weeks),y=sum_case,color=as.factor(Years)))+
geom_point(aes(shape=as.factor(Years)))+
geom_line(aes(group=as.factor(Years)))+
labs(title="West Nile cases",x="weeks",y="Number of cases")+
theme(legend.title=element_blank())
Update:
I had too few points in my simulation, so it rendered fine; that was the problem. I couldn't find a way to produce the plot using ggplot alone. The same code works if dplyr is used first to aggregate the data and the variable names are edited accordingly. I know it is not what you are looking for, sorry to disappoint you.
library(dplyr)
data1 <- data %>%
select(Years,Weeks,Number.of.cases) %>%
group_by(Years,Weeks) %>%
summarise(sum_case=sum(Number.of.cases))
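For comparison, the same grouped sum can be sketched in base R with the formula interface of aggregate(). The data frame cases below is a made-up stand-in with the same column names; it is an assumption for illustration, not the original data.

```r
# Made-up stand-in with the same column names as the original data
cases <- data.frame(
  Years           = rep(c(2012, 2013), each = 4),
  Weeks           = rep(c(17, 17, 18, 18), times = 2),
  Province        = rep(c("Ontario", "Quebec"), times = 4),
  Number.of.cases = 1:8
)

# Sum of cases per year and week (long format, one row per combination)
by_week <- aggregate(Number.of.cases ~ Years + Weeks, data = cases, FUN = sum)

# Same sum, additionally split by province (useful for faceting)
by_prov <- aggregate(Number.of.cases ~ Years + Weeks + Province,
                     data = cases, FUN = sum)

print(by_week)
```

The result is a plain data frame in long format, so it can be passed to ggplot() in the same way as the dplyr output.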

cut function and controlled frequency in the intervals

My question is pretty simple: the cut() function lets me choose the breaks along which the range of my vector is divided into intervals. I would like to control the number of observations within each newly created interval, similar to what I get when I pass quantile() output as the breaks. However, I don't want to use quantiles, because I need the intervals to be fixed so that I can match them between different databases for further comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
Which is fairly simple. My new approach is even simpler and looks like this, for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable which I want to calculate some statistics on, according to the levels of the discrete variable.
So it would go something like this :
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
But to calculate this kind of t-test confidence interval on each of the 'pools' of y values (each pool contains as many y values as there are observations in the corresponding interval of the discrete variable), I need to control the number of values within each interval created for z, so that my test remains valid, at least as far as the number of observations is concerned.
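As a side note, the per-level calculation can be sketched without nested ifelse() calls by using ave(), which applies a function within each level of z and writes the result back to every row of that level. The data, the breaks and the helper lower_ci() below are assumptions made up for illustration. Also note that cut() labels its levels with interval strings such as "(0.04,0.06]", so a comparison like df$z=="1" would need as.integer(df$z)==1 instead.

```r
set.seed(1)  # illustrative data only, not the real database
df <- data.frame(x = runif(200, 0.04, 0.11), y = rnorm(200))

# Fixed breaks chosen by hand for this sketch
breaks <- c(0.04, 0.06, 0.08, 0.11)
df$z <- cut(df$x, breaks, include.lowest = TRUE)

# Lower bound of the t-test confidence interval for one pool of y values;
# pools with fewer than 2 observations get NA because t.test() would fail
lower_ci <- function(v) {
  if (length(v) < 2) NA_real_ else t.test(v)$conf.int[1]
}

# ave() computes lower_ci() once per level of z and recycles it over the rows
df$conf.intx <- ave(df$y, df$z, FUN = lower_ci)

print(table(df$z))
print(unique(df[, c("z", "conf.intx")]))
```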
Simply put, I need an automated procedure that creates the vector of breaks for the z variable so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
Any help on the matter would be welcome, thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
Except I'd like this to be reproducible for another database, and so I can't use the quantile function, since I would not get the same intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m desired minimum number of observations in a group.
The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar, you could extract the exact cutpoints by setting the argument onlycuts = TRUE and reuse them for the other variables. If the distributions differ, though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = TRUE) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
(cut2(df2$x, cuts = cuts) %>% table) * 2 # multiplied by two for better comparison
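For readers without Hmisc, roughly the same idea can be sketched in base R. This is a swapped-in approximation using quantile() and cut(), not what cut2 does internally. The data frames below are simulated stand-ins, and widening the outer breaks to -Inf and Inf is an assumption made here so that values of the second variable outside the range of the first still land in a bin.

```r
set.seed(42)
# Simulated stand-ins for the two data frames (hypothetical data)
df1 <- data.frame(x = rnorm(900, mean = 6, sd = 1))
df2 <- data.frame(x = rnorm(455, mean = 6, sd = 1))

# 20 quantile groups, with cutpoints determined from df1 only
g <- 20
cuts <- quantile(df1$x, probs = seq(0, 1, length.out = g + 1))
cuts[1] <- -Inf      # widen both ends so that df2 values outside
cuts[g + 1] <- Inf   # the range of df1 are not turned into NA

bins1 <- cut(df1$x, breaks = cuts)
bins2 <- cut(df2$x, breaks = cuts)

# Compare the relative frequency per bin in both data sets
round(cbind(df1 = prop.table(table(bins1)),
            df2 = prop.table(table(bins2))), 3)
```

Comparing proportions rather than raw counts avoids having to rescale one table by hand when the two data sets differ in size.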
This is a good example of how NOT to pose a question. At last we have an example, and it is possible to post code that applies to it. (You apparently naively pasted the exact code in my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
        100         100         100         100         100         100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
        100         100          62
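One caveat with the breaks above: they start at the quantile that closes the first group of 100, so any value at or below that first break falls outside every interval and is coded NA by cut. A minimal sketch of a variant that keeps those values, assuming simulated data (the 962 rows and the normal distribution are arbitrary choices made here), prepends 0 to the probabilities and sets include.lowest = TRUE:

```r
set.seed(1)
# Simulated stand-in for the data frame x (hypothetical data)
x <- data.frame(x = rnorm(962, mean = 6, sd = 1))
n <- nrow(x)

# Probabilities at 0, then every 100 observations, then 1;
# unique() guards against a duplicated 1 when n is a multiple of 100
probs <- unique(c(0, seq(100, n, by = 100) / n, 1))

x$xct <- cut(x$x,
             breaks = quantile(x$x, probs = probs),
             include.lowest = TRUE)  # keep the minimum in the first bin

table(x$xct, useNA = "ifany")
```

With include.lowest = TRUE the first interval is closed on the left, so the smallest observation is binned as well.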

How can I apply Fisher's exact test to this set of data (nominal variables)?

I'm pretty new to statistics:
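fisher = function(idxToTest, idxATI){
  # Uses the global data frame `data`; idxToTest holds the columns to test
  # against the reference column idxATI.
  idxDependent = c()
  dependent = c()
  p = c()
  for(i in seq_along(idxToTest))
  {
    tbl = table(data[[idxToTest[i]]], data[[idxATI]])
    rez = fisher.test(tbl, workspace = 20000000000)
    if(rez$p.value < 0.1){
      # Flag the column as dependent and remember its index
      dependent = c(dependent, TRUE)
      idxDependent = c(idxDependent, idxToTest[i])
    }
    else{
      dependent = c(dependent, FALSE)
    }
    p = c(p, rez$p.value)
  }
  # Return the collected results
  list(idxDependent = idxDependent, dependent = dependent, p = p)
}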
This is the function I use. It seems to work.
What I have understood so far is that the first parameter has to be a contingency table like:
            Men Women
Dieting      10    30
Non-dieting   5    60
My data comes from a CSV:
data = read.csv('***.csv', header = TRUE, sep=',');
My first problem is that I don't know how to convert from:
Loan.Purpose Home.Ownership
lp_value_1 ho_value_2
lp_value_1 ho_value_2
lp_value_2 ho_value_1
lp_value_3 ho_value_2
lp_value_2 ho_value_3
lp_value_4 ho_value_2
lp_value_3 ho_value_3
to:
           ho_value_1 ho_value_2 ho_value_3
lp_value_1          0          2          0
lp_value_2          1          0          1
lp_value_3          0          1          1
lp_value_4          0          1          0
The second issue is that I don't know what the second parameter should be.
POST UPDATE: This is what I get using fisher.test(myTable):
Error in fisher.test(test) : FEXACT error 501.
The hash table key cannot be computed because the largest key
is larger than the largest representable int.
The algorithm cannot proceed.
Reduce the workspace size or use another algorithm.
where myTable is:
                   MORTGAGE NONE OTHER OWN RENT
car                      18    0     0   5   27
credit_card             190    0     2  38  214
debt_consolidation      620    0     2  87  598
educational               5    0     0   3    7
...
Basically, Fisher's exact test only works on smallish data sets because it requires a lot of memory. That is fine, though, because the chi-square test makes minimal additional assumptions and is much easier on the computer. Just do:
chisq.test(Loan.Purpose,Home.Ownership)
to get your p-values.
Make sure you read through and understand the help page for chisq.test, especially the examples at the bottom.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/chisq.test.html
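As a minimal sketch of the whole path from raw rows to a p-value, the seven example rows from the question can be cross-tabulated with table() and passed on. Because the counts in this toy table are tiny, the sketch asks for a Monte Carlo p-value via simulate.p.value = TRUE; the number of replicates B = 2000 is an arbitrary choice made here. fisher.test accepts the same argument for tables larger than 2 x 2, which replaces the exact computation with simulation.

```r
# The seven example rows from the question
data <- data.frame(
  Loan.Purpose   = c("lp_value_1", "lp_value_1", "lp_value_2", "lp_value_3",
                     "lp_value_2", "lp_value_4", "lp_value_3"),
  Home.Ownership = c("ho_value_2", "ho_value_2", "ho_value_1", "ho_value_2",
                     "ho_value_3", "ho_value_2", "ho_value_3")
)

# table() cross-tabulates the two columns into a contingency table
tbl <- table(data$Loan.Purpose, data$Home.Ownership)
print(tbl)

set.seed(1)
# Chi-square test of independence with a simulated p-value
res_chi <- chisq.test(tbl, simulate.p.value = TRUE, B = 2000)
res_chi$p.value

# Fisher's test with a simulated p-value instead of the exact algorithm
res_fis <- fisher.test(tbl, simulate.p.value = TRUE, B = 2000)
res_fis$p.value
```

On a large table such as the one in the question, the plain chisq.test(tbl) call is usually enough; the simulation is mainly useful when expected counts are small.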
Then look at a mosaic plot of the cross-tabulated counts to see the relative quantities:
mosaicplot(table(Loan.Purpose, Home.Ownership))
This reference explains how mosaic plots work:
http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day12/
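A self-contained sketch of such a plot, assuming simulated data: the category labels are borrowed from the question, but the 200 sampled rows are made up for illustration. mosaicplot expects a contingency table, so the two columns are cross-tabulated first.

```r
set.seed(7)
# Simulated loan data (hypothetical values, labels taken from the question)
loans <- data.frame(
  Loan.Purpose   = sample(c("car", "credit_card", "debt_consolidation"),
                          200, replace = TRUE),
  Home.Ownership = sample(c("MORTGAGE", "OWN", "RENT"),
                          200, replace = TRUE)
)

tbl <- table(loans$Loan.Purpose, loans$Home.Ownership)

# Tile areas are proportional to the cell counts
mosaicplot(tbl, main = "Loan purpose by home ownership", color = TRUE)
```

If the two variables were independent, the horizontal splits would line up across the columns; visible offsets hint at an association.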

Resources