How to get indices of outliers in a dataframe boxplot? - r

I have a data frame and I want to get the indices of the outliers in each column.
Here is part of my data frame:
mediamarkt[,48]
[1] 7126 4012 3711 3237 3432 2671 2861 7065 3158 4023 4770 3861
[13] 4108 7408 9071 3596 3889 4093 4446 6059 8345 10291 5546 5129
[25] 4683 4670 5694 8619 11047 5743 5775 5216 5283 4854 7871 9944
[37] 3797 3821 3834 3999 4577 8898 11396 4508 5459 3668 3885 4021
[49] 7491 8831 3513 3606 3332 3189 3656 6859 9167 3306 3305 3379
[61] 3507 3912 6562 8245 3420 3445 3530 3404 3847 7187 9128 3623
[73] 3581 3401 2784 3024 6342 7835 2766 2718 2578 2591 2737 5479
[85] 7064 2528 2550 2287 1893 1846
First, I tried to get the outlier values with this code:
boxplot(mediamarkt[,48])$out
which returns 2 outliers:
[1] 11047 11396
Everything is fine so far, but when I try to get the indices of the outliers with the code below:
which(mediamarkt[,48] %in% boxplot_mediamarkt$out)
[1] 5 18 29 43 59
I get more than 2 indices, which does not match the result above.
What is wrong with my code?
Could anyone help me solve this problem?

@G5W has asked a question that remains open. The code below shows an easy way to input your data, and it suggests that your boxplot_mediamarkt is not the output of boxplot or boxplot.stats for this data.
dat <- scan()
1: 7126 4012 3711 3237 3432 2671 2861 7065 3158 4023 4770 3861
13: 4108 7408 9071 3596 3889 4093 4446 6059 8345 10291 5546 5129
25: 4683 4670 5694 8619 11047 5743 5775 5216 5283 4854 7871 9944
37: 3797 3821 3834 3999 4577 8898 11396 4508 5459 3668 3885 4021
49: 7491 8831 3513 3606 3332 3189 3656 6859 9167 3306 3305 3379
61: 3507 3912 6562 8245 3420 3445 3530 3404 3847 7187 9128 3623
73: 3581 3401 2784 3024 6342 7835 2766 2718 2578 2591 2737 5479
85: 7064 2528 2550 2287 1893 1846
91:
Read 90 items
> boxplot(dat)$out
[1] 11047 11396
> which(dat %in% boxplot(dat)$out)
[1] 29 43
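To get the indices for every column at once, a minimal sketch along the same lines is shown below. It assumes the mismatch came from `boxplot_mediamarkt` having been built from other data (another column, or the whole data frame), so the outliers are recomputed from the very vector being indexed. The data frame `df` and the helper `outlier_idx` are hypothetical names with toy values, not the asker's data.

```r
# Toy data frame (hypothetical values, not the asker's data)
df <- data.frame(
  a = c(10, 12, 11, 13, 12, 100, 11, 12, 10, 95),
  b = c(5, 6, 5, 7, 6, 5, 6, 7, 5, 6)
)

# Row indices of the boxplot outliers of one numeric vector.
# boxplot.stats() applies the same 1.5 * IQR rule as boxplot(), without drawing.
outlier_idx <- function(x) which(x %in% boxplot.stats(x)$out)

# One integer vector of row indices per column
idx <- lapply(df, outlier_idx)
idx$a   # rows holding 100 and 95
idx$b   # empty: column b has no outliers
```

Because the outliers and the indices come from the same vector, the two results cannot drift apart the way they can with a stored boxplot object.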

Related

Adding a second y-axis as a formula of the first x and y-axis

I've got the following data:
ClusterID AvgGenes nCoreGenes Ratio
20001 1941 1572 0.809892
20005 1599 1374 0.859287
20008 2017 1712 0.848785
20009 1808 1590 0.879425
20013 1823 1469 0.805815
20015 2056 1677 0.815661
20019 2135 1783 0.835129
20020 3152 2625 0.832805
20026 2028 1586 0.782051
20028 1835 1420 0.773842
20030 2885 2189 0.758752
20031 1772 1485 0.838036
20032 1722 1473 0.855401
20034 1801 1459 0.810105
20035 1677 1339 0.798450
20042 2193 1651 0.752850
20047 1747 1345 0.769891
20049 1306 1008 0.771822
20051 1738 1358 0.781358
20052 1552 1188 0.765464
20062 2179 1509 0.692520
20065 2047 1894 0.925256
20074 1948 1568 0.804928
20088 2588 2192 0.846986
20103 1916 1341 0.699896
20109 2511 2190 0.872162
20117 1668 1278 0.766187
20162 1936 1601 0.826963
20167 2068 1856 0.897485
20168 4375 3992 0.912457
20170 3961 3252 0.821005
20190 2327 2013 0.865062
20196 3350 2522 0.752836
20198 3028 2302 0.760238
20207 1522 1241 0.815375
20208 1791 1546 0.863205
20215 3013 1853 0.615002
20219 2803 2043 0.728862
20225 4604 2931 0.636620
20247 1927 1567 0.813181
20248 2510 1732 0.690040
20251 2252 1674 0.743339
20279 2843 1775 0.624340
20293 1611 1245 0.772812
20313 2277 1914 0.840580
20314 2320 1915 0.825431
20318 2201 1762 0.800545
20320 2287 1943 0.849585
20321 2060 1645 0.798544
20323 2242 1524 0.679750
20327 2132 1845 0.865385
20328 1685 1402 0.832047
20329 2393 1727 0.721688
20341 2190 1729 0.789498
20368 3906 2991 0.765745
20370 3245 2325 0.716487
20373 2608 1935 0.741948
20374 3632 2380 0.655286
20388 1787 1435 0.803022
20408 1506 1262 0.837981
20423 1979 1428 0.721577
20433 2452 1646 0.671289
20459 2118 1649 0.778565
20462 1778 1496 0.841395
20478 1653 1447 0.875378
20492 2709 1895 0.699520
20494 2686 1773 0.660089
20498 2676 1909 0.713378
20508 1425 1092 0.766316
20517 2461 1983 0.805770
20548 2752 2059 0.748183
20565 2239 1764 0.787852
20566 2368 1882 0.794764
20569 2285 1877 0.821444
20572 2179 1703 0.781551
20573 1609 1355 0.842138
20577 1753 1379 0.786651
20579 1786 1426 0.798432
20589 1811 1239 0.684152
20600 2293 1822 0.794592
20650 1693 1422 0.839929
20677 1904 1485 0.779937
20729 1680 1362 0.810714
20742 2210 1855 0.839367
20744 1583 1372 0.866709
20746 2087 1743 0.835170
20750 1859 1418 0.762776
20753 1701 1496 0.879483
20758 1480 1169 0.789865
20759 1839 1406 0.764546
20772 2068 1786 0.863636
20773 2321 2024 0.872038
20775 2528 2012 0.795886
20784 1869 1592 0.851792
20788 1843 1516 0.822572
20809 1541 1352 0.877352
20811 1569 1346 0.857871
20824 1594 1323 0.829987
20836 2287 1688 0.738085
20857 2252 1704 0.756661
20890 1884 1340 0.711253
20903 1681 1404 0.835217
20966 1826 1455 0.796824
20967 1877 1605 0.855088
20990 2125 1605 0.755294
21002 1743 1345 0.771658
21027 1866 1504 0.806002
21047 2866 2191 0.764480
21049 2163 1596 0.737864
21059 2298 1847 0.803742
21085 1640 1490 0.908537
21258 3002 1950 0.649567
21325 2945 2117 0.718846
21326 2343 1996 0.851899
21348 2362 1809 0.765876
21370 2313 1553 0.671422
21384 1932 1383 0.715839
21405 1948 1398 0.717659
21477 1852 1538 0.830454
21584 2514 1838 0.731106
21586 1247 910 0.729751
21734 1619 1452 0.896850
21818 1593 1363 0.855618
21826 2688 2009 0.747396
21845 2595 1854 0.714451
21889 1678 1285 0.765793
22085 1718 1314 0.764843
22153 1290 1139 0.882946
22347 2356 1629 0.691426
22359 2170 1552 0.715207
22396 1648 1337 0.811286
I would like to use AvgGenes as my x-axis and nCoreGenes as my primary y-axis. In addition, I would like to add a second y-axis for the ratio which is nCoreGenes/AvgGenes*100 (pCoreGenes). However, I couldn't find the right formula: y-axis/x-axis*100 to use for scale_y_continuous(sec.axis()) in ggplot2.
cluster2core$pCoreGenes <- cluster2core$Ratio*100
g6 <- ggplot(cluster2core, aes(AvgGenes, nCoreGenes))
g6 <- g6 + geom_point(aes(y = nCoreGenes)) + geom_smooth(method = lm)
g6 <- g6 + geom_line(aes(y = pCoreGenes))
g6 <- g6 + labs(y = "Number of core genes", x = "Average number of genes")
#g6 <- g6 + scale_y_continuous(sec.axis = sec_axis())
The mean ratio is 78.7%, so I expect to get a horizontal line indicating that, on average, genomes have about 78% core genes.
Using secondary axes in ggplot requires a cheat. You need to pretend that your secondary y axis data are in the same range as the primary y axis data, so scale it accordingly. Multiplying by 100 does not suffice, as you want to have the data in the range around 1000 or so. Multiplying by 4000 should get you there.
Then, you need to reverse the process for the axis, specifying an argument to sec_axis. Normally, you would divide by 4000, but since you want percentage, divide by 40:
ggplot(df, aes(x = AvgGenes, y = nCoreGenes)) +
  geom_point() +
  geom_smooth(method = lm) +
  geom_line(aes(y = Ratio * 4000)) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 40))
Also, there is no need to specify the aesthetics in geom_point, since they are inherited from the aesthetics in the ggplot() call.
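The scaling trick can also be written with the factor computed from the data instead of a hard-coded 4000. The sketch below uses only the first five rows of the question's table. The choice of factor `k` (100% lands at the largest nCoreGenes value) and the use of `geom_hline` for the mean percentage are my assumptions, not part of the answer above.

```r
library(ggplot2)

# First five rows of the question's table
cluster2core <- data.frame(
  AvgGenes   = c(1941, 1599, 2017, 1808, 1823),
  nCoreGenes = c(1572, 1374, 1712, 1590, 1469),
  Ratio      = c(0.809892, 0.859287, 0.848785, 0.879425, 0.805815)
)

# Factor mapping a percentage (0-100) onto the primary axis:
# 100 % is placed at the largest nCoreGenes value (an arbitrary choice)
k <- max(cluster2core$nCoreGenes) / 100

p <- ggplot(cluster2core, aes(AvgGenes, nCoreGenes)) +
  geom_point() +
  # mean percentage, converted to primary-axis units before drawing
  geom_hline(yintercept = mean(cluster2core$Ratio) * 100 * k, linetype = 2) +
  scale_y_continuous(
    name = "Number of core genes",
    # the secondary axis undoes the scaling, so its labels read as percent
    sec.axis = sec_axis(~ . / k, name = "Core genes (%)")
  )
```

Whatever factor is used to draw the secondary data must be the one undone inside `sec_axis`, otherwise the right-hand labels no longer describe the line.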

Change color and linetype of one line in the plot

I have a dataset collected from a spectrometer; the number of spectra can be between 5 and a few hundred. I added a reference spectrum to the dataset and plotted the whole lot. To distinguish between the spectra and the reference, I would like to change the color and line type of the reference.
I already tried scale_linetype_manual and scale_color_manual but did not succeed.
Wavelength EGG5 EGG6 EGG7 EGG8 EGG9 EGG10 Mean
337 516.87 3362 3400 3577 2727 3321 3627 3273.874
338 517.15 3389 3496 3567 2727 3368 3698 3285.288
339 517.44 3370 3479 3590 2728 3410 3711 3291.471
340 517.72 3363 3541 3584 2744 3403 3732 3308.022
341 518.01 3332 3528 3617 2780 3403 3722 3318.277
342 518.29 3350 3547 3610 2779 3413 3748 3321.633
343 518.58 3334 3489 3628 2759 3454 3772 3323.353
344 518.87 3371 3462 3630 2742 3499 3774 3341.572
345 519.15 3368 3494 3666 2761 3494 3747 3357.921
346 519.44 3407 3524 3639 2799 3470 3765 3364.273
347 519.72 3426 3536 3657 2791 3455 3792 3368.371
348 520.01 3449 3512 3640 2771 3462 3812 3380.655
349 520.29 3433 3494 3685 2761 3490 3791 3395.201
350 520.58 3478 3519 3673 2773 3475 3800 3406.921
351 520.86 3481 3518 3717 2798 3482 3802 3413.669
352 521.15 3497 3551 3694 2809 3468 3797 3417.065
353 521.43 3444 3551 3745 2832 3486 3785 3416.489
354 521.72 3447 3579 3710 2814 3516 3820 3422.496
355 522.00 3428 3575 3738 2787 3481 3843 3422.108
356 522.29 3475 3566 3725 2800 3518 3897 3429.982
357 522.57 3472 3577 3726 2832 3491 3894 3427.777
358 522.86 3512 3601 3714 2866 3552 3882 3444.234
359 523.14 3492 3630 3731 2846 3529 3835 3444.288
360 523.43 3515 3609 3744 2856 3567 3821 3455.061
361 523.71 3523 3621 3753 2816 3595 3845 3459.338
362 524.00 3522 3612 3720 2800 3555 3829 3457.784
363 524.28 3527 3641 3731 2791 3552 3861 3466.151
364 524.57 3511 3647 3742 2859 3530 3880 3469.151
365 524.85 3508 3656 3760 2876 3586 3917 3483.457
366 525.14 3526 3629 3745 2858 3609 3896 3475.342
367 525.42 3533 3598 3763 2845 3621 3907 3484.658
368 525.71 3600 3616 3774 2829 3623 3909 3485.237
369 525.99 3587 3633 3815 2861 3603 3933 3496.604
370 526.28 3598 3677 3795 2873 3593 3901 3495.921
371 526.56 3583 3676 3763 2914 3589 3910 3495.032
372 526.85 3586 3672 3719 2921 3588 3887 3491.266
373 527.13 3573 3653 3719 2929 3590 3875 3485.712
374 527.42 3537 3652 3734 2927 3582 3878 3489.273
375 527.70 3567 3681 3761 2938 3607 3880 3495.421
376 527.99 3587 3700 3751 2931 3632 3924 3506.284
377 528.27 3607 3705 3749 2928 3620 3890 3508.827
378 528.56 3588 3708 3736 2899 3631 3890 3504.655
379 528.84 3552 3685 3742 2882 3602 3867 3495.597
380 529.13 3554 3666 3716 2897 3614 3902 3499.396
381 529.41 3513 3618 3742 2877 3571 3934 3498.964
382 529.70 3561 3619 3753 2884 3598 3949 3504.813
383 529.98 3582 3622 3816 2844 3618 3919 3506.935
384 530.26 3625 3667 3805 2868 3613 3897 3510.910
385 530.55 3603 3659 3839 2894 3618 3917 3507.428
386 530.83 3593 3691 3830 2920 3607 3954 3501.018
387 531.12 3613 3672 3833 2909 3631 3966 3503.317
388 531.40 3631 3675 3815 2878 3581 3925 3500.795
389 531.69 3627 3615 3771 2890 3569 3909 3503.964
390 531.97 3614 3633 3765 2865 3590 3893 3501.849
391 532.26 3596 3650 3775 2868 3664 3923 3512.464
392 532.54 3615 3700 3797 2852 3673 3943 3519.813
393 532.82 3584 3704 3781 2873 3591 3952 3517.381
394 533.11 3593 3723 3781 2878 3543 3931 3511.986
395 533.39 3631 3742 3776 2885 3561 3929 3516.270
396 533.68 3642 3739 3823 2900 3608 3935 3523.061
397 533.96 3663 3696 3810 2893 3639 3974 3534.691
398 534.25 3590 3678 3814 2886 3649 3968 3533.288
399 534.53 3584 3677 3821 2866 3657 3988 3535.514
400 534.81 3587 3711 3795 2872 3624 3964 3531.615
401 535.10 3575 3681 3827 2904 3639 3993 3539.518
402 535.38 3554 3683 3763 2928 3634 3959 3549.532
403 535.67 3531 3673 3778 2957 3646 3951 3553.817
404 535.95 3577 3687 3761 2921 3660 3936 3543.496
405 536.24 3630 3662 3828 2940 3679 3966 3550.511
406 536.52 3664 3670 3854 2875 3689 3983 3557.536
407 536.80 3664 3709 3858 2874 3655 3998 3577.245
408 537.09 3644 3778 3835 2893 3680 4017 3580.881
409 537.37 3656 3796 3837 2949 3695 4044 3593.831
410 537.66 3659 3799 3860 2979 3708 4052 3589.629
411 537.94 3679 3786 3892 2956 3693 4042 3587.568
412 538.22 3635 3778 3906 2915 3670 4027 3588.734
413 538.51 3671 3793 3909 2925 3686 4062 3608.561
414 538.79 3700 3791 3861 2920 3687 4121 3621.248
415 539.08 3731 3848 3862 2997 3722 4120 3637.788
416 539.36 3723 3849 3875 3033 3742 4106 3647.777
417 539.64 3739 3868 3942 3075 3769 4072 3657.374
418 539.93 3735 3861 3952 3044 3767 4116 3663.442
419 540.21 3776 3890 4021 3057 3765 4101 3675.629
420 540.50 3765 3910 4030 3033 3785 4102 3691.608
421 540.78 3831 3931 4078 3054 3836 4103 3709.950
422 541.06 3827 3953 4098 3016 3843 4130 3725.504
423 541.35 3871 3969 4112 3025 3872 4163 3744.403
424 541.63 3878 3966 4132 3055 3856 4181 3756.741
425 541.91 3864 3966 4089 3076 3920 4213 3766.216
426 542.20 3842 3999 4107 3098 3935 4246 3780.737
427 542.48 3866 4038 4107 3066 3950 4254 3788.410
428 542.76 3930 4043 4120 3085 3958 4285 3812.446
429 543.05 3970 4069 4137 3096 3935 4272 3838.629
430 543.33 3968 4103 4153 3130 3948 4341 3870.597
431 543.62 3971 4145 4196 3174 3966 4334 3883.356
432 543.90 3980 4171 4238 3192 4048 4384 3901.036
433 544.18 4016 4156 4282 3206 4105 4393 3918.155
434 544.47 4054 4195 4315 3200 4124 4458 3945.201
435 544.75 4123 4221 4347 3241 4131 4490 3976.065
436 545.03 4176 4279 4362 3229 4129 4490 3999.345
437 545.32 4196 4294 4368 3247 4121 4476 4019.050
438 545.60 4197 4336 4376 3242 4138 4525 4036.065
439 545.88 4205 4379 4394 3294 4190 4553 4080.453
440 546.17 4250 4383 4471 3291 4259 4621 4112.651
441 546.45 4324 4413 4489 3328 4306 4622 4146.709
442 546.73 4355 4437 4542 3381 4369 4690 4163.784
443 547.02 4385 4542 4562 3447 4397 4711 4202.871
444 547.30 4441 4560 4605 3453 4484 4790 4231.701
445 547.58 4493 4630 4594 3435 4496 4788 4264.464
446 547.87 4515 4671 4592 3412 4532 4827 4284.018
447 548.15 4518 4726 4666 3461 4496 4808 4310.137
448 548.43 4590 4765 4762 3520 4554 4856 4343.259
449 548.72 4610 4805 4823 3571 4604 4889 4375.237
450 549.00 4653 4848 4850 3564 4670 4916 4403.590
451 549.28 4626 4839 4848 3553 4663 4962 4432.097
452 549.57 4713 4836 4867 3639 4691 5003 4470.795
453 549.85 4768 4859 4896 3689 4714 5068 4510.371
454 550.13 4828 4895 4951 3728 4716 5126 4545.076
455 550.41 4810 4993 4972 3643 4730 5148 4572.414
456 550.70 4836 5075 5021 3698 4762 5176 4607.784
457 550.98 4883 5174 5043 3698 4844 5188 4641.068
458 551.26 4944 5204 5156 3806 4877 5272 4686.183
459 551.55 4994 5241 5183 3795 4912 5349 4717.932
460 551.83 5029 5247 5224 3889 4961 5370 4750.104
461 552.11 5112 5277 5189 3868 5019 5379 4782.029
462 552.40 5134 5343 5196 3906 5050 5415 4823.004
463 552.68 5185 5426 5247 3912 5103 5462 4863.273
464 552.96 5212 5459 5318 3986 5142 5488 4900.011
465 553.24 5284 5462 5384 4000 5240 5544 4945.446
466 553.53 5339 5466 5449 3985 5262 5594 4976.561
467 553.81 5391 5554 5513 3986 5318 5612 5019.626
468 554.09 5415 5640 5570 4006 5319 5612 5045.396
469 554.38 5448 5721 5614 4064 5380 5674 5098.770
470 554.66 5544 5765 5653 4051 5437 5751 5138.752
471 554.94 5585 5793 5634 4089 5494 5821 5176.392
472 555.22 5624 5808 5667 4112 5563 5827 5202.475
473 555.51 5622 5837 5693 4185 5622 5892 5236.281
474 555.79 5648 5845 5760 4214 5672 5918 5265.612
475 556.07 5709 5953 5783 4286 5668 5985 5316.511
476 556.35 5757 6016 5849 4313 5704 5955 5359.302
477 556.64 5797 6109 5927 4322 5776 5991 5412.820
478 556.92 5848 6135 5989 4292 5832 6037 5443.241
479 557.20 5856 6199 6049 4292 5852 6167 5479.432
480 557.48 5949 6274 6125 4354 5838 6226 5515.306
481 557.77 5991 6303 6165 4419 5896 6261 5553.723
482 558.05 6085 6362 6209 4445 5955 6272 5592.842
483 558.33 6091 6392 6258 4475 5999 6315 5629.468
484 558.61 6158 6483 6267 4465 6062 6388 5669.486
485 558.90 6205 6511 6315 4541 6146 6430 5713.281
486 559.18 6244 6546 6314 4565 6237 6515 5755.496
487 559.46 6304 6549 6393 4619 6261 6565 5793.651
488 559.74 6357 6602 6351 4598 6276 6610 5810.910
489 560.03 6423 6661 6353 4623 6281 6660 5840.727
490 560.31 6467 6769 6401 4636 6325 6656 5883.446
491 560.59 6446 6782 6536 4720 6361 6675 5938.773
492 560.87 6508 6876 6651 4722 6408 6648 5980.385
493 561.16 6552 6881 6669 4736 6441 6703 6014.791
494 561.44 6672 6943 6663 4734 6442 6769 6043.910
495 561.72 6757 6985 6649 4743 6507 6820 6085.784
496 562.00 6801 7076 6689 4786 6544 6868 6136.007
497 562.28 6838 7177 6721 4837 6661 6932 6185.385
498 562.57 6862 7240 6782 4935 6727 7017 6229.784
499 562.85 6922 7301 6894 5022 6785 7087 6265.989
500 563.13 7003 7353 6959 5079 6823 7082 6320.410
501 563.41 7064 7414 7001 5081 6841 7169 6374.471
502 563.69 7120 7451 7003 5072 6921 7197 6425.284
503 563.98 7157 7513 7069 5065 6989 7331 6466.151
504 564.26 7241 7588 7096 5074 7043 7369 6511.083
505 564.54 7294 7653 7181 5089 7088 7459 6567.119
506 564.82 7353 7735 7219 5127 7095 7479 6609.558
507 565.10 7430 7764 7311 5179 7175 7515 6649.820
508 565.39 7493 7842 7364 5259 7265 7579 6711.403
509 565.67 7606 7949 7443 5277 7345 7672 6786.032
510 565.95 7662 8057 7549 5332 7442 7773 6872.349
511 566.23 7732 8162 7618 5332 7519 7855 6916.7
AEDDataMelt <- data.frame( melt(AEDPlotData, id.vars = 'Wavelength', variable = 'series'))
Q <- ggplot()+
geom_line(data = subset(AEDDataMelt, series!="Mean"),aes(x=Wavelength, y=value, col=series))+
scale_color_manual(values = brewer.pal(n=(ncol(AEDPlotData)-1),name = "Dark2" ))+
geom_line(data = subset(AEDDataMelt, series=="Mean"),aes(x = Wavelength, y=value,col= series, linetype = series))+
scale_linetype_manual(values = "dash")+
ggplotly(Q)
At this point I get a dashed line but did not succeed in changing the color of the mean reference line. Also, the legend entries changed format, from EGG5 to (EGG5,1).
I don't know about the ggplotly bit, but manually setting a specific colour and linetype for only one of your lines is quite straightforward. Your code seems to be somewhat overcomplicated for your task and you could simply achieve what you stated by manually adding those scales:
ggplot(AEDDataMelt) +
  geom_line(aes(Wavelength, value, colour = series, linetype = series)) +
  # First six lines are of one type, last line should be different
  scale_linetype_manual(values = c(1, 1, 1, 1, 1, 1, 2)) +
  # First six lines are from the brewer palette, last one a custom colour
  scale_colour_manual(values = c(brewer.pal(6, "Dark2"), "black"))
Which gave me this plot (image not shown).
With the help of teunbrand, I came to the following solution:
# Packages assumed from the functions used: melt(), brewer.pal(), ggplotly()
library(reshape2)
library(ggplot2)
library(RColorBrewer)
library(plotly)
AEDDataMelt <- data.frame(melt(AEDPlotData, id.vars = "Wavelength", variable.name = "series"))
# Number of spectra: every column except Wavelength and Mean
n_spectra <- ncol(AEDPlotData) - 2
# Solid line (1) for each spectrum, dashed line (2) for the reference
MyLineType <- c(rep.int(1, n_spectra), 2)
Q <- ggplot(AEDDataMelt) +
  geom_line(aes(Wavelength, value, colour = series, linetype = series)) +
  # First n lines are of one type, last line is the reference and should be different
  scale_linetype_manual(values = MyLineType) +
  # First n colours are from the brewer palette, last one is a custom colour
  scale_colour_manual(values = c(brewer.pal(n_spectra, "Dark2"), "black"))
ggplotly(Q)
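One caveat with the solution above: the "Dark2" palette of RColorBrewer has at most 8 colours, while the question mentions up to a few hundred spectra. A possible workaround, sketched here with base R only, is to build named colour and linetype vectors. The helper `make_scales` and the choice of `hcl.colors` with the "Dark 3" palette are my own assumptions, not part of either answer.

```r
# Hypothetical helper: named colour and linetype vectors in which the
# reference series is always black and dashed, for any number of spectra
make_scales <- function(series_names, ref = "Mean") {
  spectra <- setdiff(series_names, ref)
  # hcl.colors() (base R >= 3.6) can generate any number of colours
  cols <- grDevices::hcl.colors(length(spectra), palette = "Dark 3")
  list(
    colour   = c(setNames(cols, spectra), setNames("black", ref)),
    linetype = c(setNames(rep("solid", length(spectra)), spectra),
                 setNames("dashed", ref))
  )
}

s <- make_scales(c("EGG5", "EGG6", "EGG7", "EGG8", "EGG9", "EGG10", "Mean"))
s$colour[["Mean"]]     # "black"
s$linetype[["Mean"]]   # "dashed"
```

The vectors can then be passed as `scale_colour_manual(values = s$colour)` and `scale_linetype_manual(values = s$linetype)`. Since the values are named, ggplot2 matches them to the series by name rather than by position.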

Time Series - Compare two series and check for the relation between them

I have the following time series data. I want to compare Series1 and Series2 and check whether there is any relation between them.
My requirements are:
1) How can I check whether there is any relation between the series, such that one series affects the other after a certain number of days? How can this be established?
Series1 appears to resemble Series2 after 40-50 days, but I got a correlation of -0.3345712:
serdata <- read.csv("Timeseries.csv")
library("graphics")
plot.ts(serdata)
cor(serdata$Series1,serdata$Series2)
2) What method can be used to analyse such data?
Can a moving average or ARIMA be applied to smooth the curve and check for a fit? (I am new to time series analysis, so please suggest any other method.)
The data is:
date Series1 Series2
9/27/2016 5431 4451
9/28/2016 5468 4889
9/29/2016 5160 5002
9/30/2016 5267 5452
10/1/2016 5097 6223
10/2/2016 4749 6593
10/3/2016 5396 4574
10/4/2016 6001 4285
10/5/2016 6266 5323
10/6/2016 6344 4689
10/7/2016 5992 5499
10/8/2016 5147 5852
10/9/2016 4712 4987
10/10/2016 5328 2680
10/11/2016 6171 3128
10/12/2016 6235 2189
10/13/2016 7286 2532
10/14/2016 7230 3296
10/15/2016 7027 5019
10/16/2016 6063 4222
10/17/2016 5579 2482
10/18/2016 7155 2742
10/19/2016 6938 2611
10/20/2016 6805 2248
10/21/2016 6643 3463
10/22/2016 5620 5030
10/23/2016 6260 6164
10/24/2016 5504 4192
10/25/2016 4035 2879
10/26/2016 4054 2333
10/27/2016 6922 2710
10/28/2016 6848 3568
10/29/2016 5598 5415
10/30/2016 5069 5974
10/31/2016 5537 2314
11/1/2016 6264 2334
11/2/2016 7109 3379
11/3/2016 7411 2846
11/4/2016 7314 3183
11/5/2016 6095 4865
11/6/2016 5279 4948
11/7/2016 4295 3159
11/8/2016 4638 2724
11/9/2016 3536 2866
11/10/2016 3600 3600
11/11/2016 2995 5198
11/12/2016 2432 5192
11/13/2016 2516 4482
11/14/2016 2576 0
11/15/2016 3739 3
11/16/2016 3860 3284
11/17/2016 3587 2938
11/18/2016 3155 3710
11/19/2016 3446 4293
11/20/2016 2682 4239
11/21/2016 3198 4086
11/22/2016 3299 3054
11/23/2016 3134 3194
11/24/2016 2819 3033
11/25/2016 2324 3927
11/26/2016 2093 4824
11/27/2016 2493 4685
11/28/2016 3155 3072
11/29/2016 3510 3139
11/30/2016 3517 4363
12/1/2016 3315 2780
12/2/2016 3640 4075
12/3/2016 3186 5207
12/4/2016 2445 5327
12/5/2016 2812 4223
12/6/2016 3321 3179
12/7/2016 3163 3329
12/8/2016 3325 3747
12/9/2016 3007 3534
12/10/2016 2492 4673
12/11/2016 2364 5205
12/12/2016 2986 3600
12/13/2016 3313 4541
12/14/2016 3425 3823
12/15/2016 4385 3770
12/16/2016 3736 4370
12/17/2016 2336 4478
12/18/2016 2211 5396
12/19/2016 2322 5140
12/20/2016 2342 4089
12/21/2016 2262 4231
12/22/2016 2043 5657
12/23/2016 1723 6215
12/24/2016 1408 4211
12/25/2016 1463 0
12/26/2016 2248 5503
12/27/2016 3424 9483
12/28/2016 2925 6956
12/29/2016 2029 8992
12/30/2016 1950 6062
12/31/2016 1838 6326
1/1/2017 1964 7463
1/2/2017 2232 8426
1/3/2017 2480 8084
1/4/2017 2606 7026
1/5/2017 2606 6295
1/6/2017 2693 7179
1/7/2017 2458 5745
1/8/2017 2362 5690
1/9/2017 2767 5761
1/10/2017 2141 6332
1/11/2017 2355 6240
1/12/2017 3000 6710
1/13/2017 2921 5698
1/14/2017 2558 6156
1/15/2017 2407 7415
1/16/2017 2613 5742
1/17/2017 3005 5779
1/18/2017 3128 5784
1/19/2017 2961 5331
1/20/2017 2582 5476
1/21/2017 2191 7710
1/22/2017 2214 7187
1/23/2017 2649 7676
1/24/2017 3065 4742
1/25/2017 3216 5153
1/26/2017 3548 4817
1/27/2017 4316 5976
1/28/2017 4355 6145
1/29/2017 4848 5764
1/30/2017 4376 5305
1/31/2017 3808 4760
2/1/2017 4172 4752
2/2/2017 8098 4527
2/3/2017 7891 5206
2/4/2017 3484 6209
2/5/2017 3625 5729
2/6/2017 4219 7056
2/7/2017 4282 4955
2/8/2017 3982 4185
2/9/2017 3680 4090
2/10/2017 3314 3881
2/11/2017 2985 5280
2/12/2017 3266 6471
2/13/2017 3665 5840
2/14/2017 3892 4530
2/15/2017 3953 3993
2/16/2017 3855 4453
2/17/2017 3511 5570
2/18/2017 3222 7479
2/19/2017 3284 5349
2/20/2017 3615 4098
2/21/2017 3915 5032
2/22/2017 3994 4256
2/23/2017 3765 6215
2/24/2017 3494 4480
2/25/2017 3257 5995
2/26/2017 3399 6412
2/27/2017 3797 5450
2/28/2017 4076 3935
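A plain `cor()` only compares the two series on the same day, so it cannot reveal a delayed relation. One common way to look for a delay is the cross-correlation function `ccf()` from the stats package. The sketch below uses synthetic stand-in data (not the series above) in which `s2` repeats `s1` five steps later; all variable names are hypothetical.

```r
set.seed(1)
n <- 200
true_lag <- 5

# Synthetic stand-in data: s2[t] equals s1[t - 5] plus a little noise
base <- as.numeric(arima.sim(list(ar = 0.7), n = n + true_lag))
s1 <- base[(true_lag + 1):(n + true_lag)]
s2 <- base[1:n] + rnorm(n, sd = 0.1)

# Cross-correlation for lags -20..20; lag k pairs s1[t + k] with s2[t]
cc <- ccf(s1, s2, lag.max = 20, plot = FALSE)

# Lag with the strongest absolute correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag
```

Under R's convention a peak at a negative lag means the first series leads the second. For the data above, something like `ccf(serdata$Series1, serdata$Series2, lag.max = 60)` would cover the suspected 40-50 day delay. Differencing both series first with `diff()` is often suggested, because shared trends can inflate cross-correlations.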

Neural network time series forecasting in R does not work with my data?

I am trying this example with my data set, but it gives me very strange results: Example of Time Series Prediction using Neural Networks in R
Do you have any idea why?
This is my source code:
require(quantmod)
require(nnet)
require(caret)
series = read.csv("data.csv")
model <- train(y ~ x1+x2 , series, method='nnet', linout=TRUE, trace = FALSE)
series["o"] <- predict(model, series)
plot.ts(series)
write.csv(series, paste(format(Sys.time(), "%Y%m%d%I%p"), "csv", sep = "."))
This is my data set:
3938
1317
4021
10477
9379
7707
9507
4194
2681
3522
5599
5641
6737
7781
2044
1501
6586
4915
5918
6132
9394
2113
935
9729
5236
8815
3169
5888
5722
191
9539
3384
6006
7139
7285
136
1843
5094
3795
5985
5566
3545
965
14
3738
4645
8439
6390
13842
7754
11440
7572
4876
3206
5577
2734
1169
20
5049
6612
2685
7000
6711
4091
26
5383
5516
7185
6118
4484
2178
754
8104
8209
6159
11137
8994
5172
425
8082
5337
5712
7157
6385
3343
4196
5957
8581
3686
0
254
1819
1071
876
3509
2777
1474
4945
3971
21
5466
5509
1316
5653
2775
797
22
5601
6177
5662
5132
6543
1700
4361
6951
7734
3451
5385
6358
6838
19
6460
5813
6839
6335
2105
8
6
9530
1250
5668
5595
6008
2315
1712
8553
5570
5979
4818
6745
5250
43
5727
7416
5888
6270
4931
0
31
6190
11164
5768
7307
5412
2716
35
8391
6054
2796
5081
6646
4597
1978
7570
5909
9581
3571
6740
1702
1080
6719
963
6781
7544
7708
1993
597
2394
5516
12966
723
6528
2476
86
5956
5820
6995
6682
2460
2479
56
7095
7255
6310
9971
3725
5400
452
6018
5803
6673
6098
9476
692
20
7855
11970
10557
5696
7765
3847
47
6020
6037
5684
7089
6372
970
861
3590
7672
3730
10689
9428
1514
2062
6154
5234
6160
5134
879
1079
9164
6338
6687
8195
6351
1123
4216
3759
9372
7782
3143
4773
6993
849
906
6385
7512
8824
8150
12464
7726
8745
13594
6589
6524
2784
0
1785
688
7998
6797
8289
10815
10280
4839
3928
10935
4588
5785
6771
7628
2908
11391
6637
5585
7454
5828
8259
6644
2436
7055
7206
7873
7368
6239
3595
3166
1846
2301
21
1600
2390
1894
1469
9097
8401
2034
3244
8811
2979
20
7808
7698
11031
4556
7149
3745
5563
9673
8149
12158
7043
6273
1855
80
10729
5880
9327
6343
7227
3522
1244
6382
7186
4964
6162
7435
10524
2449
7437
11970
6661
6122
7323
6707
25
2270
5117
6676
5317
7032
7689
4891
8051
5699
4927
11553
6418
2968
11338
7662
9976
5526
14341
4331
10026
1672
5199
4699
7774
7958
7720
2499
10745
19609
15896
5705
6207
7699
2543
32
3642
6307
7491
6236
8644
2121
1448
7838
5434
5945
6074
6962
5441
42
7424
5818
8877
5743
7980
3140
3046
8329
8186
5994
2931
7309
862
145
8141
6252
9536
6213
7150
2718
1687
5000
6068
5918
10652
12257
1505
2421
10518
2368
7341
8137
7997
3437
2009
5468
3947
5836
8567
11039
3726
746
3417
8649
8016
7652
8298
1306
4031
5525
6203
11847
7688
10911
1080
1001
12315
6084
6529
4074
8526
3161
2184
7400
4916
4521
1523
398
1364
925
38
2580
1039
6556
2040
1166
825
7672
7177
6104
7928
6240
1420
1214
10638
10726
2323
6113
8112
2757
3761
6982
5680
7793
8983
8546
1335
817
6136
3778
6639
6548
6120
3648
584
9099
6434
8828
9988
6066
2575
2237
5114
5879
4094
9309
8008
1614
4307
5801
8006
6344
4803
10904
1339
411
8468
6945
5471
8828
4157
1134
1071
5542
2213
5633
9245
2145
4901
39
10430
7941
6189
7985
8296
614
894
6236
1704
4257
7707
8388
1050
855
9352
4801
7088
8466
470
2433
1036
392
2169
84
5316
8339
4272
2617
1840
7254
5999
6178
4563
3370
756
2773
6610
8967
6182
7452
2570
1443
6537
5338
9158
3870
12036
3574
864
10135
5595
8643
2287
9918
2484
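Note that the formula `y ~ x1 + x2` needs three columns, while the data shown above is a single column. If the predictors are meant to be lagged copies of the series (my assumption, since the question does not say how x1 and x2 were built), base R's `embed()` can construct them. The helper `make_lagged` is a hypothetical name, and the input is just the first six values of the data set.

```r
# Hypothetical helper: turn one numeric series into y plus two lagged predictors
make_lagged <- function(v) {
  m <- embed(v, 3)   # columns: v[t], v[t-1], v[t-2]
  data.frame(y = m[, 1], x1 = m[, 2], x2 = m[, 3])
}

# First six values of the data set above
series <- make_lagged(c(3938, 1317, 4021, 10477, 9379, 7707))
series
```

The first two observations are dropped because they have no complete lag history, so six values give four training rows.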

How to merge bins in R

So, I am trying to merge bins of a histogram whenever the number of observations in a bin is less than 6.
library(fitdistrplus)
mydata <-read.csv("Book2.csv",stringsAsFactors=FALSE)
QF3<-as.numeric(mydata[,1])
histrv<-hist(QF3,breaks="FD")
binbreak <- histrv$breaks
freq <- histrv$counts
datmean <- mean(QF3)
datsigma <- sd(QF3)
# Expected probability of each bin under a normal distribution
# with the sample mean and standard deviation
# (named p rather than pi, to avoid masking R's built-in constant)
p <- diff(pnorm(binbreak, datmean, datsigma))
# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chisqvec <- (freq - length(QF3) * p)^2 / (length(QF3) * p)
xstat <- sum(chisqvec)
The above code produces a histogram with five bins that contain fewer than 6 observations: 6000-7000, 7000-8000, 8000-9000, 9000-10000, and 10000-11000. These bins contain 2, 5, 2, 2, and 1 observations, respectively. I would like to merge these bins so that they have more than 5 observations.
In other words, I would like to have the two bins 6000-8000 and 8000-11000, which would contain 7 and 5 observations.
Does anyone have any idea how to approach this problem?
QF3 looks like the following:
> QF3
[1] 2016 1425 2000 785 823 2484 1870 770 1220 3454 1056 2745 2830
[14] 950 601 1245 2663 1500 1717 1070 1704 2517 1090 3310 3389 2200
[27] 882 2113 600 1900 4417 745 530 1630 1600 4530 948 2764 2202
[40] 1052 2685 1120 1275 2300 1590 1935 3957 4283 3215 5684 4092 7548
[53] 4547 3510 3063 5549 6460 5204 4626 4965 5023 8111 5525 4804 5994
[66] 8471 4767 7142 3420 4061 5102 9135 3861 5372 7274 5054 7318 3791
[79] 4901 3549 4758 4859 10190 5609 7624 5841 4908 4974 6691 5713 3235
[92] 4464 2656 4399 9581 3993 4061
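One simple way to get the merged bins, sketched below, is to pass explicit break points to `hist()` instead of `breaks = "FD"`. The break vector assumes the original bins were 1000 wide starting at 0, which matches the bin edges named in the question but is not confirmed by it.

```r
QF3 <- c(2016, 1425, 2000, 785, 823, 2484, 1870, 770, 1220, 3454, 1056, 2745, 2830,
         950, 601, 1245, 2663, 1500, 1717, 1070, 1704, 2517, 1090, 3310, 3389, 2200,
         882, 2113, 600, 1900, 4417, 745, 530, 1630, 1600, 4530, 948, 2764, 2202,
         1052, 2685, 1120, 1275, 2300, 1590, 1935, 3957, 4283, 3215, 5684, 4092, 7548,
         4547, 3510, 3063, 5549, 6460, 5204, 4626, 4965, 5023, 8111, 5525, 4804, 5994,
         8471, 4767, 7142, 3420, 4061, 5102, 9135, 3861, 5372, 7274, 5054, 7318, 3791,
         4901, 3549, 4758, 4859, 10190, 5609, 7624, 5841, 4908, 4974, 6691, 5713, 3235,
         4464, 2656, 4399, 9581, 3993, 4061)

# Keep 1000-wide bins up to 6000, then merge into 6000-8000 and 8000-11000
merged_breaks <- c(seq(0, 6000, by = 1000), 8000, 11000)
h <- hist(QF3, breaks = merged_breaks, plot = FALSE)

# Observed counts in the two merged bins
tail(h$counts, 2)   # 7 and 5

# Expected normal probability per bin, computed from the new breaks
p <- diff(pnorm(h$breaks, mean(QF3), sd(QF3)))
```

The chi-squared statistic can then be computed from `h$counts` and `p` exactly as before, since both follow the merged breaks.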
