SMOTE function 'subscript out of bounds' - r

I'm trying to implement a logistic regression as follows:
However, I can't get good predictions because output class 1 is under-represented in my data.
I'm therefore trying to apply the SMOTE algorithm to my training set to get better results.
However, I get this error message:
Error in T[i, ] : subscript out of bounds
Here is my code:
library(caret)  # createDataPartition()
library(DMwR)   # SMOTE()
set.seed(157)
# 50/50 split stratified on Y
split <- createDataPartition(df_statique$Y, p = .50, list = FALSE, times = 1)
trainSplit <- df_statique[ split,]
testSplit <- df_statique[-split,]
trainSplit <- SMOTE(Y ~ insolvency + efficiency + DebtToAssetsRatio + taille + CashAssetRatio + current + netWorth + REA, trainSplit, perc.over = 300, perc.under = 100)
Here is part of my data frame df_statique:
index countryIsoCode insolvency efficiency CashAssetRatio DebtToAssetsRatio netWorth REA taille Y
41807 IT 0.00360 0.5193711 0.8686575 0.49446355 4387182 1.657145e-03 2 1
41808 IT 0.00050 1.5269309 1.6295765 0.36543122 30916838 6.601092e-03 3 0
41809 IT 0.00050 2.2635592 1.3427063 0.15809120 2200087 1.218576e-03 1 0
41810 IT 0.00280 1.3989753 0.9345793 0.69642554 2940473 3.852093e-04 2 0
41811 IT 0.00140 2.1440221 3.5781748 0.07951644 28418622 8.845920e-04 2 0
41812 IT 0.00040 1.0068491 1.7238305 0.47561418 22486133 2.703242e-04 2 0
41813 IT 0.00130 1.5569114 1.4459704 0.57632716 9769040 9.741611e-04 2 0
41814 IT 0.00510 5.0143711 0.1035034 0.71267895 3610152 2.391447e-03 2 0
41815 IT 0.00090 3.3280521 0.5160867 0.34998732 218965703 2.550272e-04 3 0
41816 IT 0.00040 1.7217051 2.2758391 0.29638050 29868519 1.136387e-04 3 0
41817 IT 0.00360 1.7261580 0.8490392 0.41231551 106020226 2.304773e-06 3 0
41818 IT 0.00040 1.3600893 1.6298656 0.57789518 55408765 4.841743e-04 3 1
41819 IT 0.00510 5.5565821 0.1376145 0.19679467 9491245 1.398124e-03 2 0
41820 IT 0.00131 3.8312347 1.1365521 0.73639696 8921497 4.701300e-06 3 0
41821 IT 0.00400 1.8218620 0.9113375 0.62646234 24134486 9.435248e-04 3 0
41822 IT 0.00100 1.8215702 1.0690901 0.82764828 777547 6.335832e-03 2 0
41823 IT 0.00090 1.8153513 0.9320536 0.80258849 2437903 6.035954e-04 2 0
41824 IT 0.00050 2.1300765 1.7388457 0.31394248 27009000 3.507500e-04 3 0
41825 IT 0.00100 1.8697385 1.4438289 0.56198890 35917 5.765082e-03 1 0
41826 IT 0.00230 6.5298138 1.1726536 0.56654516 2675415 1.038839e-02 2 0
41827 IT 0.00220 9.8201528 0.4794298 0.63618554 488924 1.336866e-05 2 0
Finally, my output Y is a dummy variable indicating whether or not a default occurs within a one-year horizon.

This error occurs when the target variable passed to the SMOTE function is stored as an integer. SMOTE only works with a factor target variable, so convert Y to a factor before calling it.
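As a minimal sketch of that conversion, the snippet below builds a small made-up stand-in for df_statique (the values and the reduced set of predictors are illustrative assumptions, not the asker's data) and turns the integer target into a factor. It assumes the SMOTE in question is DMwR::SMOTE, which is what the perc.over and perc.under arguments suggest, and guards the call because that package may not be installed.

```r
# Made-up stand-in for df_statique with an integer 0/1 target (illustrative values only).
set.seed(157)
df_statique <- data.frame(
  insolvency = runif(200, 0, 0.01),
  efficiency = runif(200, 0.5, 10),
  Y = rep(c(0L, 1L), times = c(180, 20))
)

# Convert the target to a factor before resampling.
df_statique$Y <- factor(df_statique$Y, levels = c(0, 1))
str(df_statique$Y)

# Assumption: the question uses DMwR::SMOTE. The call is skipped if DMwR is unavailable.
if (requireNamespace("DMwR", quietly = TRUE)) {
  balanced <- DMwR::SMOTE(Y ~ insolvency + efficiency, df_statique,
                          perc.over = 300, perc.under = 100)
  print(table(balanced$Y))
}
```

In the asker's code the same conversion would be applied to df_statique$Y (or trainSplit$Y) before the SMOTE call.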

How do I perform true or false statement, with a calculation, on a data-frame column?

I'm trying to run an if{}else{} statement down a data-frame of financial data. I've tried various different routes but can't seem to muster up something that reads the data correctly.
I've tried on a smaller sample:
O H L C StoHH StoLL K kM Tr
2007-02-23 9.274052 9.293215 9.269759 9.283475 9.455829 9.229304 0.23913917 0.2587985 0
2007-02-26 9.300035 9.349619 9.266349 9.321300 9.455829 9.229304 0.40611853 0.2809329 0
2007-02-27 9.322654 9.579280 9.336498 9.565459 9.579280 9.229304 0.96050872 0.5352555 0
2007-02-28 9.581271 9.609756 9.510326 9.560271 9.609756 9.238159 0.86683154 0.7444863 0
2007-03-01 9.556549 9.686216 9.507037 9.561439 9.686216 9.238159 0.72151534 0.8496185 0
2007-03-02 9.550849 9.740609 9.545815 9.732355 9.740609 9.238159 0.98357249 0.8573065 0
2007-03-05 9.772542 9.850654 9.776493 9.798985 9.850654 9.238159 0.91564176 0.8735765 0
2007-03-06 9.785914 9.800714 9.659360 9.681929 9.850654 9.238159 0.72452836 0.8745809 0
2007-03-07 9.682955 9.753325 9.678526 9.799160 9.850654 9.238159 0.91592748 0.8520325 0
2007-03-08 9.798771 9.754936 9.649990 9.662383 9.850654 9.238159 0.69261627 0.7776907 0
2007-03-09 9.656704 9.694217 9.590265 9.617014 9.850654 9.238159 0.61854382 0.7423625 0
2007-03-12 9.605565 9.664844 9.582515 9.676105 9.850654 9.238159 0.71501971 0.6753933 0
2007-03-13 9.673779 9.873512 9.675159 9.885906 9.873512 9.266349 1.02041297 0.7846588 0
2007-03-14 9.893805 9.908378 9.809037 9.834033 9.908378 9.266349 0.88420305 0.8732119 0
2007-03-15 9.830605 9.822618 9.788816 9.833431 9.908378 9.336498 0.86894628 0.9245208 0
2007-03-16 9.823855 9.937603 9.832304 9.936606 9.937603 9.507037 0.99768444 0.9169446 0
2007-03-19 9.911422 9.894650 9.830386 9.886574 9.937603 9.507037 0.88148391 0.9160382 0
2007-03-20 9.871804 9.858855 9.780539 9.834216 9.937603 9.545815 0.73611494 0.8717611 0
2007-03-21 9.833709 9.825223 9.730592 9.732152 9.937603 9.582515 0.42140821 0.6796690 0
2007-03-22 9.731063 9.723312 9.577633 9.585578 9.937603 9.577633 0.02207128 0.3931981 0
2007-03-23 9.596751 9.600287 9.536365 9.579816 9.937603 9.536365 0.10829234 0.1839239 0
With:
a.2 = a.1[40:60,]
sapply(a.2$K, as.numeric)
if (a.2$K >= 0.8) {
a.2$Tr = (a.2$O - a.2$C)* s
} else {
a.2$Tr = 0}
But it doesn't seem to register the value in K as being >= 0.8. Any ideas?
You don't need to use sapply here; you can put it all into ifelse, which is already vectorized:
a.2$Tr<-ifelse(as.numeric(a.2$K)>=0.8, (a.2$O-a.2$C)*s, 0)
I believe your problem is in not storing the numerical value. Try:
a.2$K = as.numeric(a.2$K)
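Both suggestions can be combined in a small sketch. The toy data frame below is made up for illustration (K is deliberately stored as text to mimic a non-numeric column), and s is a hypothetical multiplier, since the question never defines it.

```r
# Toy data; column names mirror the question, values are made up for illustration.
a.2 <- data.frame(
  O = c(9.55, 9.77, 9.83),
  C = c(9.73, 9.80, 9.73),
  K = c("0.98", "0.92", "0.42"),  # stored as text to mimic a non-numeric column
  stringsAsFactors = FALSE
)
s <- 100  # hypothetical multiplier; `s` is not defined in the question

# Store the conversion; calling as.numeric() without assigning changes nothing.
a.2$K <- as.numeric(a.2$K)

# ifelse() tests the condition for every row, whereas if() expects a single TRUE/FALSE.
a.2$Tr <- ifelse(a.2$K >= 0.8, (a.2$O - a.2$C) * s, 0)
print(a.2)
```

The original if (a.2$K >= 0.8) only ever looks at one value (and is an error in recent versions of R when the condition has length greater than one), which is why the row-wise result never appeared.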

Plotting Conditionally Summed Data (base R or ggplot)

I started with a dataframe containing info on West Nile cases in Canada from 2012-2015. 600 observations of 10 variables in total.
> head(mosquitoes)
Years Weeks Province Avg.Temp Avg..Precepitation Wind Number.of.cases Number.of.Dead.Birds Mosquito.Pools.Tested Google.Trend.Searches
1 2015 17 Alberta 48 0.01 8 0 0 0 1
2 2015 18 Alberta 46 0.03 10 0 0 0 2
3 2015 19 Alberta 44 0.07 8 0 0 0 2
4 2015 20 Alberta 51 0.00 9 0 0 0 2
5 2015 21 Alberta 56 0.01 9 0 0 0 4
6 2015 22 Alberta 58 0.10 7 0 0 0 1
Here is the entire data set (sorry, it's large):
Years,Weeks,Province,Avg Temp ,Avg. Precepitation,Wind,Number of cases,Number of Dead Birds,Mosquito Pools Tested,Google Trend Searches
2015,17,Alberta,48,0.01,8,0,0,0,1
2015,18,Alberta,46,0.03,10,0,0,0,2
2015,19,Alberta,44,0.07,8,0,0,0,2
2015,20,Alberta,51,0,9,0,0,0,2
2015,21,Alberta,56,0.01,9,0,0,0,4
2015,22,Alberta,58,0.1,7,0,0,0,1
2015,23,Alberta,61,0.05,8,0,0,0,1
2015,24,Alberta,55,0.08,9,0,0,0,1
2015,25,Alberta,63,0.02,6,0,0,0,4
2015,26,Alberta,67,0.16,8,0,0,0,5
2015,27,Alberta,65,0.02,8,0,0,0,3
2015,28,Alberta,62,0.09,10,0,0,0,7
2015,29,Alberta,66,0.01,8,0,0,0,2
2015,30,Alberta,62,0.02,7,0,0,0,3
2015,31,Alberta,64,0.21,7,0,0,0,6
2015,32,Alberta,66,0.07,7,0,0,0,4
2015,33,Alberta,55,0.13,8,0,0,0,4
2015,34,Alberta,63,0,6,0,0,0,1
2015,35,Alberta,52,0.11,9,0,0,0,4
2015,36,Alberta,54,0.02,7,0,0,0,2
2015,37,Alberta,48,0.06,8,0,0,0,2
2015,38,Alberta,52,0.03,9,0,0,0,3
2015,39,Alberta,49,0.03,9,0,0,0,3
2015,40,Alberta,51,0,8,0,0,0,2
2015,41,Alberta,48,0,8,0,0,0,2
2014,17,Alberta,43,0.05,8,0,0,0,1
2014,18,Alberta,44,0.06,9,0,0,0,3
2014,19,Alberta,37,0.03,9,0,0,0,3
2014,20,Alberta,48,0.01,8,0,0,0,1
2014,21,Alberta,57,0.01,10,0,0,0,2
2014,22,Alberta,53,0.06,8,0,0,0,4
2014,23,Alberta,53,0.04,10,0,0,0,6
2014,24,Alberta,53,0.04,10,0,0,0,6
2014,25,Alberta,54,0.24,9,0,0,0,4
2014,26,Alberta,59,0.03,9,0,0,0,7
2014,27,Alberta,64,0.02,11,0,0,0,19
2014,28,Alberta,65,0.03,10,0,0,0,33
2014,29,Alberta,67,0.01,9,0,0,0,18
2014,30,Alberta,62,0.08,10,0,0,0,14
2014,31,Alberta,68,0,10,0,0,0,10
2014,32,Alberta,63,0.16,8,0,0,0,11
2014,33,Alberta,66,0.01,7,0,0,0,19
2014,34,Alberta,58,0.05,8,0,0,0,17
2014,35,Alberta,58,0.04,7,0,0,0,8
2014,36,Alberta,54,0.01,7,0,0,0,12
2014,37,Alberta,41,0.15,8,0,0,0,3
2014,38,Alberta,58,0,5,0,0,0,3
2014,39,Alberta,60,0.02,6,0,0,0,4
2014,40,Alberta,48,0.03,11,0,0,0,5
2014,41,Alberta,51,0,6,0,0,0,3
2013,17,Alberta,42,0,12,0,0,0,3
2013,18,Alberta,42,0.01,11,0,0,0,2
2013,19,Alberta,57,0,11,0,0,0,2
2013,20,Alberta,55,0.01,10,0,0,0,9
2013,21,Alberta,50,0.23,11,0,0,0,7
2013,22,Alberta,52,0.08,6,0,0,0,8
2013,23,Alberta,55,0.15,10,0,0,0,10
2013,24,Alberta,53,0.08,10,0,0,0,4
2013,25,Alberta,57,0.3,11,0,0,0,9
2013,26,Alberta,61,0.01,9,0,0,0,17
2013,27,Alberta,65,0.08,10,0,0,0,27
2013,28,Alberta,59,0.07,8,0,0,0,19
2013,29,Alberta,62,0.01,10,0,0,0,21
2013,30,Alberta,62,0.06,10,0,0,0,18
2013,31,Alberta,57,0.03,7,0,0,0,13
2013,32,Alberta,60,0.07,8,0,0,0,10
2013,33,Alberta,67,0,8,3,0,0,2
2013,34,Alberta,63,0,8,5,0,0,12
2013,35,Alberta,64,0.03,10,4,0,0,20
2013,36,Alberta,64,0.13,8,2,1,0,15
2013,37,Alberta,63,0,9,5,0,0,9
2013,38,Alberta,57,0.06,11,2,0,0,11
2013,39,Alberta,47,0,10,0,0,0,4
2013,40,Alberta,44,0,11,0,0,0,5
2013,41,Alberta,45,0.06,8,0,0,0,5
2012,17,Alberta,49,0.06,7,0,0,0,2
2012,18,Alberta,42,0.13,9,0,0,0,2
2012,19,Alberta,48,0,9,0,0,0,6
2012,20,Alberta,53,0.01,10,0,0,0,2
2012,21,Alberta,49,0.08,8,0,0,0,2
2012,22,Alberta,52,0,9,0,0,0,2
2012,23,Alberta,54,0.28,9,0,0,0,4
2012,24,Alberta,56,0.21,12,0,0,0,7
2012,25,Alberta,56,0.05,8,0,0,0,5
2012,26,Alberta,59,0.14,8,0,0,0,3
2012,27,Alberta,61,0.21,9,0,0,0,22
2012,28,Alberta,69,0,8,0,0,0,32
2012,29,Alberta,65,0.09,10,0,0,0,16
2012,30,Alberta,64,0.02,10,0,0,0,15
2012,31,Alberta,63,0.03,10,0,0,0,20
2012,32,Alberta,68,0,10,0,0,0,25
2012,33,Alberta,62,0.07,10,4,0,0,36
2012,34,Alberta,62,0.05,10,2,0,0,100
2012,35,Alberta,61,0.01,10,0,0,0,76
2012,36,Alberta,57,0,12,1,0,0,29
2012,37,Alberta,57,0,12,2,0,0,30
2012,38,Alberta,59,0,9,0,0,0,14
2012,39,Alberta,58,0.01,9,0,0,0,11
2012,40,Alberta,43,0.07,12,0,0,0,10
2012,41,Alberta,43,0.02,13,0,0,0,7
2015,17,British Columbia,53,0.03,10,0,0,0,5
2015,18,British Columbia,53,0.01,6,0,0,0,5
2015,19,British Columbia,58,0.01,7,0,0,0,5
2015,20,British Columbia,60,0,7,0,0,0,4
2015,21,British Columbia,62,0,7,0,0,0,6
2015,22,British Columbia,60,0.03,7,0,0,0,9
2015,23,British Columbia,62,0,13,0,0,0,9
2015,24,British Columbia,62,0.02,8,0,0,0,10
2015,25,British Columbia,66,0,9,0,0,0,7
2015,26,British Columbia,70,0,12,0,0,0,5
2015,27,British Columbia,67,0.01,9,0,0,0,11
2015,28,British Columbia,66,0,10,0,0,0,9
2015,29,British Columbia,65,0.04,9,0,0,0,14
2015,30,British Columbia,65,0.04,6,0,0,0,7
2015,31,British Columbia,65,0.02,9,0,0,0,7
2015,32,British Columbia,66,0.04,9,0,0,0,9
2015,33,British Columbia,65,0,9,0,0,0,11
2015,34,British Columbia,64,0.1,7,0,0,0,6
2015,35,British Columbia,57,0.12,10,0,0,0,4
2015,36,British Columbia,61,0.02,9,0,0,0,9
2015,37,British Columbia,58,0.09,9,0,0,0,9
2015,38,British Columbia,55,0.04,9,0,0,0,3
2015,39,British Columbia,52,0,6,0,0,0,3
2015,40,British Columbia,56,0.08,6,0,0,0,3
2015,41,British Columbia,51,0.04,7,0,0,0,7
2014,17,British Columbia,49,0.07,10,0,0,0,3
2014,18,British Columbia,54,0.03,8,0,0,0,4
2014,19,British Columbia,53,0.18,9,0,0,0,4
2014,20,British Columbia,60,0,8,0,0,0,6
2014,21,British Columbia,59,0.06,7,0,0,0,6
2014,22,British Columbia,56,0.09,7,0,0,0,6
2014,23,British Columbia,59,0,8,0,0,0,8
2014,24,British Columbia,60,0.03,10,0,0,0,7
2014,25,British Columbia,58,0.09,9,0,0,0,8
2014,26,British Columbia,62,0.05,7,0,0,0,10
2014,27,British Columbia,64,0.01,8,0,0,0,7
2014,28,British Columbia,66,0.01,8,0,0,0,19
2014,29,British Columbia,68,0,9,0,0,0,13
2014,30,British Columbia,63,0.06,8,0,0,0,12
2014,31,British Columbia,67,0,6,0,0,0,16
2014,32,British Columbia,66,0,7,0,0,0,25
2014,33,British Columbia,67,0.08,7,0,0,0,17
2014,34,British Columbia,65,0,6,0,0,0,13
2014,35,British Columbia,66,0,7,0,0,0,30
2014,36,British Columbia,61,0.05,7,0,0,0,9
2014,37,British Columbia,60,0,6,0,0,0,11
2014,38,British Columbia,61,0.02,6,0,0,0,3
2014,39,British Columbia,62,0.12,9,0,0,0,8
2014,40,British Columbia,56,0.04,6,0,0,0,9
2014,41,British Columbia,58,0.03,5,0,0,0,7
2013,17,British Columbia,50,0.03,7,0,0,0,14
2013,18,British Columbia,50,0,12,0,0,0,8
2013,19,British Columbia,59,0.03,6,0,0,0,5
2013,20,British Columbia,56,0.07,8,0,0,0,7
2013,21,British Columbia,54,0.04,8,0,0,0,4
2013,22,British Columbia,55,0.09,7,0,0,0,8
2013,23,British Columbia,60,0.01,9,0,0,0,14
2013,24,British Columbia,58,0.01,7,0,0,0,16
2013,25,British Columbia,62,0.04,8,0,0,0,10
2013,26,British Columbia,63,0.1,7,0,0,0,17
2013,27,British Columbia,67,0,8,0,0,0,29
2013,28,British Columbia,63,0,8,0,0,0,30
2013,29,British Columbia,66,0,9,0,0,0,20
2013,30,British Columbia,64,0,8,0,0,0,34
2013,31,British Columbia,64,0.02,8,0,0,0,11
2013,32,British Columbia,66,0,6,0,0,1,13
2013,33,British Columbia,66,0.02,8,0,0,1,16
2013,34,British Columbia,63,0.01,8,0,0,1,16
2013,35,British Columbia,65,0.17,7,0,1,1,12
2013,36,British Columbia,64,0.06,6,0,0,1,8
2013,37,British Columbia,63,0,6,0,0,1,14
2013,38,British Columbia,60,0.19,6,0,0,1,6
2013,39,British Columbia,54,0.23,10,0,0,1,6
2013,40,British Columbia,51,0.15,9,0,0,1,6
2013,41,British Columbia,51,0.01,8,0,0,1,8
2012,17,British Columbia,53,0.05,8,0,0,0,5
2012,18,British Columbia,50,0.11,7,0,0,0,6
2012,19,British Columbia,52,0,9,0,0,0,7
2012,20,British Columbia,54,0,10,0,0,0,8
2012,21,British Columbia,55,0.06,8,0,0,0,9
2012,22,British Columbia,57,0.07,7,0,0,0,8
2012,23,British Columbia,53,0.07,8,0,0,0,4
2012,24,British Columbia,57,0.04,8,0,0,0,4
2012,25,British Columbia,58,0.13,8,0,0,0,7
2012,26,British Columbia,60,0.04,8,0,0,0,8
2012,27,British Columbia,59,0.03,7,0,0,0,22
2012,28,British Columbia,66,0,6,0,0,0,30
2012,29,British Columbia,66,0.05,8,0,0,0,30
2012,30,British Columbia,63,0.03,8,0,0,0,38
2012,31,British Columbia,65,0,8,0,0,0,60
2012,32,British Columbia,67,0.01,8,0,0,0,34
2012,33,British Columbia,69,0,7,0,0,0,63
2012,34,British Columbia,63,0,8,0,0,0,100
2012,35,British Columbia,62,0,7,0,0,0,51
2012,36,British Columbia,62,0,7,0,0,0,32
2012,37,British Columbia,58,0.01,8,0,0,0,24
2012,38,British Columbia,60,0,6,0,0,0,13
2012,39,British Columbia,57,0,6,0,0,0,13
2012,40,British Columbia,53,0,8,0,0,0,6
2012,41,British Columbia,52,0.09,5,0,0,0,8
2015,17,Manitoba,56,0,10,0,0,0,4
2015,18,Manitoba,48,0,13,0,0,0,4
2015,19,Manitoba,46,0,10,0,0,0,4
2015,20,Manitoba,52,0,14,0,0,0,4
2015,21,Manitoba,57,0,10,0,0,12,4
2015,22,Manitoba,60,0,12,0,0,4,8
2015,23,Manitoba,67,0,9,0,0,87,8
2015,24,Manitoba,59,0,9,0,0,82,8
2015,25,Manitoba,66,0,7,0,0,44,8
2015,26,Manitoba,68,0,7,0,0,75,11
2015,27,Manitoba,66,0,10,0,0,73,17
2015,28,Manitoba,70,0,7,0,0,132,8
2015,29,Manitoba,69,0,9,0,0,139,17
2015,30,Manitoba,70,0,11,0,0,204,4
2015,31,Manitoba,63,0,9,0,0,275,13
2015,32,Manitoba,73,0,9,0,0,195,23
2015,33,Manitoba,62,0,10,0,0,228,13
2015,34,Manitoba,62,0,11,0,0,69,12
2015,35,Manitoba,73,0,11,1,0,92,10
2015,36,Manitoba,57,0,10,1,0,113,8
2015,37,Manitoba,60,0,11,2,0,34,4
2015,38,Manitoba,61,0,13,1,0,0,4
2015,39,Manitoba,53,0,13,0,0,0,6
2015,40,Manitoba,48,0,11,0,0,0,6
2015,41,Manitoba,44,0,11,0,0,0,6
2014,17,Manitoba,42,0,11,0,0,0,4
2014,18,Manitoba,42,0,14,0,0,0,0
2014,19,Manitoba,46,0,9,0,0,0,0
2014,20,Manitoba,45,0,10,0,0,0,0
2014,21,Manitoba,57,0,12,0,0,0,0
2014,22,Manitoba,66,0,8,0,0,0,0
2014,23,Manitoba,62,0,10,0,0,0,5
2014,24,Manitoba,60,0,11,0,0,0,13
2014,25,Manitoba,62,0,12,0,0,0,9
2014,26,Manitoba,66,0,10,0,0,0,7
2014,27,Manitoba,65,0,15,0,0,0,9
2014,28,Manitoba,67,0,11,0,0,0,36
2014,29,Manitoba,63,0,11,0,0,0,24
2014,30,Manitoba,68,0,9,0,0,0,53
2014,31,Manitoba,65,0,8,0,0,7,41
2014,32,Manitoba,71,0,8,0,0,7,48
2014,33,Manitoba,68,0,8,1,0,14,14
2014,34,Manitoba,67,0,8,2,0,19,18
2014,35,Manitoba,61,0,11,2,0,22,9
2014,36,Manitoba,60,0,8,0,0,24,4
2014,37,Manitoba,50,0,11,0,0,24,11
2014,38,Manitoba,52,0,10,0,0,24,4
2014,39,Manitoba,65,0,13,0,0,24,15
2014,40,Manitoba,47,0,16,0,0,24,4
2014,41,Manitoba,39,0,13,0,0,24,4
2013,17,Manitoba,36,0.01,12,0,0,0,4
2013,18,Manitoba,38,0.11,9,0,0,0,4
2013,19,Manitoba,49,0.02,12,0,0,0,4
2013,20,Manitoba,56,0.02,10,0,0,0,5
2013,21,Manitoba,55,0.05,14,0,0,0,4
2013,22,Manitoba,58,0.16,15,0,0,0,4
2013,23,Manitoba,57,0.01,9,0,0,0,9
2013,24,Manitoba,63,0.03,10,0,0,0,16
2013,25,Manitoba,66,0.1,9,0,0,0,23
2013,26,Manitoba,69,0.24,10,0,0,0,14
2013,27,Manitoba,72,0,6,0,0,0,23
2013,28,Manitoba,70,0.06,10,0,0,1,19
2013,29,Manitoba,66,0.1,9,0,0,1,45
2013,30,Manitoba,60,0.19,8,0,1,7,35
2013,31,Manitoba,61,0.03,7,0,0,10,31
2013,32,Manitoba,59,0.04,7,0,0,16,22
2013,33,Manitoba,64,0.02,8,1,0,16,24
2013,34,Manitoba,71,0.17,10,0,0,16,49
2013,35,Manitoba,76,0.01,7,0,0,17,14
2013,36,Manitoba,64,0,10,1,0,17,11
2013,37,Manitoba,63,0.01,8,0,0,19,9
2013,38,Manitoba,54,0,11,0,0,19,6
2013,39,Manitoba,60,0.1,12,0,0,19,13
2013,40,Manitoba,50,0.03,11,0,0,19,8
2013,41,Manitoba,52,0,10,0,1,19,4
2012,17,Manitoba,46,0.01,12,0,0,0,0
2012,18,Manitoba,51,0.05,11,0,0,0,0
2012,19,Manitoba,56,0.06,13,0,0,0,5
2012,20,Manitoba,58,0.16,12,0,0,0,6
2012,21,Manitoba,53,0.02,11,0,0,0,5
2012,22,Manitoba,53,0.13,9,0,0,0,5
2012,23,Manitoba,67,0.08,8,0,0,0,8
2012,24,Manitoba,62,0.17,11,0,0,0,10
2012,25,Manitoba,60,0.04,8,0,0,0,11
2012,26,Manitoba,68,0,10,0,0,0,11
2012,27,Manitoba,73,0.03,7,0,0,0,15
2012,28,Manitoba,73,0,7,0,0,0,17
2012,29,Manitoba,69,0.05,8,1,0,2,21
2012,30,Manitoba,71,0,8,1,0,20,36
2012,31,Manitoba,71,0.2,9,4,0,48,100
2012,32,Manitoba,67,0,9,7,0,62,47
2012,33,Manitoba,62,0.04,8,7,0,98,31
2012,34,Manitoba,69,0.01,7,6,0,108,84
2012,35,Manitoba,70,0.01,11,7,0,111,75
2012,36,Manitoba,63,0.01,11,1,0,116,22
2012,37,Manitoba,59,0.01,11,3,0,116,23
2012,38,Manitoba,47,0.01,12,2,0,116,13
2012,39,Manitoba,50,0,8,0,0,116,5
2012,40,Manitoba,46,0.02,15,0,0,116,7
2012,41,Manitoba,37,0.02,10,0,0,116,5
2015,17,Quebec,53,0,8,0,0,0,8
2015,18,Quebec,65,0.06,8,0,0,0,8
2015,19,Quebec,58,0.09,10,0,0,0,8
2015,20,Quebec,59,0.05,11,0,0,0,8
2015,21,Quebec,69,0.11,11,0,0,0,8
2015,22,Quebec,56,0.07,9,0,0,0,8
2015,23,Quebec,65,0.16,9,0,0,0,8
2015,24,Quebec,64,0.16,7,0,0,0,16
2015,25,Quebec,67,0.18,8,0,0,0,8
2015,26,Quebec,64,0.07,9,0,0,120,19
2015,27,Quebec,71,0.01,8,0,0,127,24
2015,28,Quebec,70,0.05,9,0,1,132,24
2015,29,Quebec,70,0.3,8,0,1,131,16
2015,30,Quebec,75,0.07,9,1,2,129,16
2015,31,Quebec,67,0.02,9,1,3,126,8
2015,32,Quebec,69,0.31,7,0,0,133,8
2015,33,Quebec,76,0.11,9,1,1,125,16
2015,34,Quebec,68,0.01,8,2,1,123,11
2015,35,Quebec,70,0,8,1,3,131,31
2015,36,Quebec,72,0.15,8,2,4,128,15
2015,37,Quebec,69,0.21,9,6,0,123,7
2015,38,Quebec,58,0,7,5,0,108,7
2015,39,Quebec,55,0.17,11,2,2,107,11
2015,40,Quebec,49,0.03,7,5,0,0,7
2015,41,Quebec,51,0.11,11,8,0,0,15
2014,17,Quebec,46,0.05,9,0,0,0,0
2014,18,Quebec,49,0.18,12,0,0,0,0
2014,19,Quebec,53,0.09,10,0,0,0,0
2014,20,Quebec,62,0.17,13,0,0,0,0
2014,21,Quebec,59,0.01,9,0,0,0,13
2014,22,Quebec,59,0.08,9,0,0,0,13
2014,23,Quebec,66,0.13,8,0,0,0,40
2014,24,Quebec,66,0.28,11,0,0,0,18
2014,25,Quebec,65,0.14,8,0,0,0,27
2014,26,Quebec,69,0.14,6,0,0,0,33
2014,27,Quebec,75,0.02,9,0,0,0,23
2014,28,Quebec,70,0.08,12,0,0,0,40
2014,29,Quebec,69,0.05,9,0,0,1,27
2014,30,Quebec,72,0.06,10,0,0,4,28
2014,31,Quebec,66,0.18,8,0,0,9,54
2014,32,Quebec,70,0.04,6,0,0,10,24
2014,33,Quebec,67,0.2,10,1,2,19,34
2014,34,Quebec,66,0,7,1,0,19,9
2014,35,Quebec,70,0,8,1,1,39,17
2014,36,Quebec,72,0.11,10,1,0,70,8
2014,37,Quebec,60,0.12,9,0,3,99,12
2014,38,Quebec,52,0.02,9,1,2,112,13
2014,39,Quebec,61,0.02,9,0,0,119,15
2014,40,Quebec,58,0.06,11,0,1,119,16
2014,41,Quebec,51,0.1,13,1,0,119,16
2013,17,Quebec,46,0.03,11,1,0,0,9
2013,18,Quebec,60,0.01,7,0,0,0,9
2013,19,Quebec,65,0.08,8,0,0,0,9
2013,20,Quebec,51,0.01,11,0,0,0,18
2013,21,Quebec,64,0.19,10,0,0,0,17
2013,22,Quebec,64,0.18,9,0,0,0,9
2013,23,Quebec,59,0.11,10,0,0,0,21
2013,24,Quebec,64,0.11,9,0,0,0,18
2013,25,Quebec,62,0.09,8,0,0,0,9
2013,26,Quebec,69,0.14,9,0,0,0,37
2013,27,Quebec,72,0.02,9,0,0,0,9
2013,28,Quebec,73,0.06,8,0,0,0,45
2013,29,Quebec,79,0.28,9,0,0,2,49
2013,30,Quebec,66,0.06,7,0,0,3,73
2013,31,Quebec,70,0.12,9,1,3,5,40
2013,32,Quebec,68,0.04,9,3,2,11,74
2013,33,Quebec,66,0.08,9,8,4,23,56
2013,34,Quebec,69,0.02,10,3,5,36,64
2013,35,Quebec,70,0.06,7,4,9,36,29
2013,36,Quebec,63,0.06,10,2,6,40,32
2013,37,Quebec,62,0.18,8,3,4,47,20
2013,38,Quebec,58,0.12,9,1,2,59,8
2013,39,Quebec,54,0.03,6,1,0,60,16
2013,40,Quebec,61,0,6,1,0,60,24
2013,41,Quebec,55,0.11,10,0,0,60,20
2012,17,Quebec,40,0.17,13,0,0,0,0
2012,18,Quebec,50,0.03,7,0,0,0,10
2012,19,Quebec,55,0.07,8,0,0,0,10
2012,20,Quebec,61,0.02,7,0,0,0,10
2012,21,Quebec,69,0.1,7,0,0,0,11
2012,22,Quebec,62,0.16,8,0,0,0,10
2012,23,Quebec,61,0.02,8,0,0,0,10
2012,24,Quebec,68,0.08,7,0,0,0,11
2012,25,Quebec,76,0.01,9,0,0,0,11
2012,26,Quebec,69,0.13,9,0,0,0,26
2012,27,Quebec,73,0.12,6,0,0,0,40
2012,28,Quebec,72,0,8,0,2,0,24
2012,29,Quebec,71,0.21,6,1,0,0,11
2012,30,Quebec,71,0.1,7,1,0,0,11
2012,31,Quebec,76,0.01,7,0,1,5,78
2012,32,Quebec,72,0.17,10,2,5,8,31
2012,33,Quebec,70,0.02,7,6,2,19,94
2012,34,Quebec,70,0,6,10,5,19,100
2012,35,Quebec,71,0.01,11,9,8,19,76
2012,36,Quebec,71,0.11,6,14,1,19,70
2012,37,Quebec,63,0.07,8,23,6,19,43
2012,38,Quebec,58,0.12,10,16,0,19,34
2012,39,Quebec,54,0.01,9,27,0,19,38
2012,40,Quebec,57,0.16,8,11,0,19,14
2012,41,Quebec,45,0.06,10,8,0,19,19
2015,17,Ontario,53,0,9,0,0,0,2
2015,18,Ontario,61,0.04,5,0,0,0,2
2015,19,Ontario,58,0.07,7,0,0,0,4
2015,20,Ontario,58,0,8,0,0,0,5
2015,21,Ontario,70,0.11,8,0,0,0,8
2015,22,Ontario,57,0.14,7,0,0,180,8
2015,23,Ontario,65,0.18,6,0,0,356,5
2015,24,Ontario,65,0.08,5,0,1,852,5
2015,25,Ontario,67,0.33,7,0,0,886,13
2015,26,Ontario,63,0.02,7,0,0,954,15
2015,27,Ontario,68,0.04,5,0,0,1152,13
2015,28,Ontario,67,0.03,6,1,0,1216,21
2015,29,Ontario,72,0.01,7,1,4,1219,16
2015,30,Ontario,76,0.03,6,1,1,1222,22
2015,31,Ontario,68,0.06,6,0,8,1176,24
2015,32,Ontario,69,0.21,6,0,0,1168,15
2015,33,Ontario,73,0.09,5,1,0,1168,24
2015,34,Ontario,64,0.01,5,5,1,987,12
2015,35,Ontario,75,0,5,2,1,881,18
2015,36,Ontario,70,0.11,5,5,0,802,9
2015,37,Ontario,65,0.07,6,1,2,712,6
2015,38,Ontario,60,0,5,5,4,526,4
2015,39,Ontario,55,0.04,9,2,2,396,6
2015,40,Ontario,53,0.14,6,3,0,65,5
2015,41,Ontario,52,0.04,8,3,4,0,2
2014,17,Ontario,46,0.05,8,0,0,0,3
2014,18,Ontario,47,0.14,9,0,0,0,2
2014,19,Ontario,53,0,9,0,0,0,2
2014,20,Ontario,56,0.13,6,0,0,0,3
2014,21,Ontario,57,0.09,5,0,0,0,4
2014,22,Ontario,65,0.02,6,0,0,0,7
2014,23,Ontario,63,0.04,6,0,0,0,10
2014,24,Ontario,65,0.19,6,0,0,0,16
2014,25,Ontario,66,0.16,5,0,0,0,13
2014,26,Ontario,69,0.06,4,0,0,0,7
2014,27,Ontario,72,0.09,7,0,0,0,20
2014,28,Ontario,68,0.12,6,0,0,0,17
2014,29,Ontario,66,0.21,5,1,0,0,13
2014,30,Ontario,68,0.03,5,0,0,2,14
2014,31,Ontario,67,0.35,5,0,0,5,35
2014,32,Ontario,68,0.21,4,0,0,9,22
2014,33,Ontario,65,0.12,7,2,0,11,30
2014,34,Ontario,67,0.02,4,0,2,13,11
2014,35,Ontario,67,0,6,2,3,30,18
2014,36,Ontario,71,0.39,5,5,0,43,13
2014,37,Ontario,60,0.15,6,1,0,52,10
2014,38,Ontario,53,0.02,4,0,1,56,7
2014,39,Ontario,60,0.08,4,0,0,56,3
2014,40,Ontario,61,0.06,4,0,0,56,6
2014,41,Ontario,50,0.06,6,0,0,56,4
2013,17,Ontario,43,0.05,6,0,0,0,2
2013,18,Ontario,57,0.05,6,0,0,0,3
2013,19,Ontario,59,0.04,5,0,0,0,4
2013,20,Ontario,51,0.02,8,0,0,0,3
2013,21,Ontario,60,0.17,8,0,0,0,7
2013,22,Ontario,64,0.16,6,1,0,0,9
2013,23,Ontario,58,0.05,7,1,0,0,9
2013,24,Ontario,64,0.29,6,0,0,0,12
2013,25,Ontario,64,0.11,5,0,0,0,12
2013,26,Ontario,73,0.06,4,0,1,2,12
2013,27,Ontario,71,0.05,5,1,0,2,20
2013,28,Ontario,72,0.13,6,2,0,4,15
2013,29,Ontario,80,0.05,5,1,2,12,20
2013,30,Ontario,65,0.12,6,5,0,22,56
2013,31,Ontario,66,0.26,5,4,8,41,43
2013,32,Ontario,67,0.04,6,5,6,65,32
2013,33,Ontario,63,0,5,5,2,89,24
2013,34,Ontario,70,0,5,2,0,131,30
2013,35,Ontario,72,0.2,3,2,8,155,22
2013,36,Ontario,63,0.12,6,7,2,179,12
2013,37,Ontario,64,0.04,6,3,2,190,15
2013,38,Ontario,57,0.17,4,5,2,194,9
2013,39,Ontario,55,0,4,0,1,196,5
2013,40,Ontario,61,0.04,4,5,0,198,9
2013,41,Ontario,56,0.04,4,1,0,198,4
2012,17,Ontario,40,0.06,11,0,0,0,4
2012,18,Ontario,50,0.12,6,0,0,0,3
2012,19,Ontario,56,0.07,6,0,0,0,3
2012,20,Ontario,58,0.02,4,0,0,0,3
2012,21,Ontario,69,0.01,6,0,0,0,5
2012,22,Ontario,64,0.09,8,0,0,0,3
2012,23,Ontario,63,0.03,6,1,0,0,6
2012,24,Ontario,67,0.08,6,0,0,0,4
2012,25,Ontario,76,0.17,6,0,0,2,7
2012,26,Ontario,70,0.04,7,0,0,6,10
2012,27,Ontario,75,0.04,5,3,1,10,39
2012,28,Ontario,73,0.02,5,5,3,19,24
2012,29,Ontario,75,0.06,6,9,1,30,19
2012,30,Ontario,72,0.38,6,14,2,89,17
2012,31,Ontario,73,0.16,4,23,1,162,77
2012,32,Ontario,70,0.14,6,44,1,249,46
2012,33,Ontario,68,0.05,4,44,8,312,64
2012,34,Ontario,67,0,4,38,4,375,83
2012,35,Ontario,70,0.15,6,26,0,409,100
2012,36,Ontario,69,0.56,4,25,0,434,79
2012,37,Ontario,61,0.03,5,17,2,454,37
2012,38,Ontario,57,0.16,5,3,4,462,23
2012,39,Ontario,53,0,6,2,6,462,24
2012,40,Ontario,57,0.03,5,3,0,464,18
2012,41,Ontario,42,0.04,5,1,0,464,10
2015,17,Saskatchewan,50,0,10,0,0,0,6
2015,18,Saskatchewan,46,0,11,0,0,0,12
2015,19,Saskatchewan,46,0,9,0,0,0,6
2015,20,Saskatchewan,53,0,8,0,0,0,6
2015,21,Saskatchewan,56,0,8,0,0,2,9
2015,22,Saskatchewan,60,0,10,0,0,0,9
2015,23,Saskatchewan,64,0,10,0,0,3,9
2015,24,Saskatchewan,57,0,8,0,0,3,12
2015,25,Saskatchewan,65,0,7,0,0,10,31
2015,26,Saskatchewan,70,0,6,0,0,13,15
2015,27,Saskatchewan,66,0,9,0,0,16,13
2015,28,Saskatchewan,67,0,8,0,0,40,15
2015,29,Saskatchewan,68,0,10,0,0,47,16
2015,30,Saskatchewan,63,0.02,9,0,0,69,43
2015,31,Saskatchewan,63,0,8,0,0,67,16
2015,32,Saskatchewan,70,0,8,0,0,80,28
2015,33,Saskatchewan,58,0,8,0,0,94,38
2015,34,Saskatchewan,62,0,8,0,0,42,21
2015,35,Saskatchewan,61,0,10,0,1,41,14
2015,36,Saskatchewan,53,0,8,0,0,0,9
2015,37,Saskatchewan,52,0,8,0,0,0,5
2015,38,Saskatchewan,54,0,10,0,0,0,5
2015,39,Saskatchewan,48,0,8,0,0,0,5
2015,40,Saskatchewan,48,0,9,0,0,0,8
2015,41,Saskatchewan,44,0,11,0,0,0,5
2014,17,Saskatchewan,40,0,12,0,0,0,6
2014,18,Saskatchewan,41,0,10,0,0,0,6
2014,19,Saskatchewan,41,0,9,0,0,0,6
2014,20,Saskatchewan,45,0,7,0,0,0,6
2014,21,Saskatchewan,59,0,10,0,0,0,13
2014,22,Saskatchewan,57,0,11,0,0,0,20
2014,23,Saskatchewan,55,0,8,0,0,0,17
2014,24,Saskatchewan,53,0,10,0,0,0,13
2014,25,Saskatchewan,57,0,10,0,0,0,7
2014,26,Saskatchewan,63,0,8,0,0,0,21
2014,27,Saskatchewan,66,0,11,0,0,0,26
2014,28,Saskatchewan,65,0,10,0,0,0,69
2014,29,Saskatchewan,64,0,9,0,0,0,65
2014,30,Saskatchewan,63,0,9,0,0,1,60
2014,31,Saskatchewan,67,0,6,0,0,1,36
2014,32,Saskatchewan,69,0,6,0,2,2,47
2014,33,Saskatchewan,67,0,7,0,0,9,67
2014,34,Saskatchewan,64,0,8,0,0,19,45
2014,35,Saskatchewan,58,0,9,0,0,20,34
2014,36,Saskatchewan,56,0,8,0,0,20,13
2014,37,Saskatchewan,46,0,9,0,0,20,19
2014,38,Saskatchewan,55,0,8,0,0,20,6
2014,39,Saskatchewan,61,0,9,0,0,20,16
2014,40,Saskatchewan,44,0,12,0,0,20,12
2014,41,Saskatchewan,45,0,9,0,0,20,6
2013,17,Saskatchewan,34,0,10,0,0,0,10
2013,18,Saskatchewan,40,0,12,0,0,0,14
2013,19,Saskatchewan,50,0,12,0,0,0,14
2013,20,Saskatchewan,59,0,9,0,0,0,7
2013,21,Saskatchewan,57,0,13,0,0,0,7
2013,22,Saskatchewan,60,0,9,0,0,0,14
2013,23,Saskatchewan,57,0,9,0,0,0,21
2013,24,Saskatchewan,57,0,10,0,0,0,20
2013,25,Saskatchewan,61,0,10,0,0,0,14
2013,26,Saskatchewan,64,0,7,0,0,0,41
2013,27,Saskatchewan,69,0,7,0,0,0,61
2013,28,Saskatchewan,65,0,8,0,0,1,65
2013,29,Saskatchewan,62,0,9,0,3,1,81
2013,30,Saskatchewan,60,0,9,0,1,3,75
2013,31,Saskatchewan,59,0,8,0,2,3,33
2013,32,Saskatchewan,60,0,6,0,1,18,44
2013,33,Saskatchewan,69,0,8,0,0,29,75
2013,34,Saskatchewan,66,0,8,1,1,29,60
2013,35,Saskatchewan,69,0,8,3,0,36,24
2013,36,Saskatchewan,67,0,7,1,0,40,21
2013,37,Saskatchewan,62,0,9,0,0,40,26
2013,38,Saskatchewan,57,0,10,1,2,40,32
2013,39,Saskatchewan,51,0,9,0,1,40,13
2013,40,Saskatchewan,45,0,11,0,0,40,29
2013,41,Saskatchewan,46,0,10,0,0,40,10
2012,17,Saskatchewan,44,0,13,0,0,0,24
2012,18,Saskatchewan,46,0,12,0,0,0,16
2012,19,Saskatchewan,51,0,13,0,0,0,16
2012,20,Saskatchewan,54,0,12,0,0,0,9
2012,21,Saskatchewan,48,0,11,0,0,0,17
2012,22,Saskatchewan,53,0,9,0,0,0,16
2012,23,Saskatchewan,61,0,13,0,0,0,8
2012,24,Saskatchewan,56,0,11,0,0,0,16
2012,25,Saskatchewan,58,0,7,0,0,0,25
2012,26,Saskatchewan,64,0,12,0,0,0,22
2012,27,Saskatchewan,65,0,9,0,0,0,23
2012,28,Saskatchewan,71,0,7,0,1,0,67
2012,29,Saskatchewan,67,0,10,0,0,0,34
2012,30,Saskatchewan,67,0,8,0,0,0,28
2012,31,Saskatchewan,64,0,8,0,0,0,59
2012,32,Saskatchewan,68,0,8,0,0,3,58
2012,33,Saskatchewan,59,0,8,2,0,4,34
2012,34,Saskatchewan,65,0,9,1,0,6,100
2012,35,Saskatchewan,64,0,9,0,0,6,49
2012,36,Saskatchewan,55,0,11,3,0,6,41
2012,37,Saskatchewan,58,0,13,0,0,6,16
2012,38,Saskatchewan,50,0,8,3,0,6,19
2012,39,Saskatchewan,55,0,6,0,0,6,15
2012,40,Saskatchewan,42,0,10,0,0,6,11
2012,41,Saskatchewan,36,0,8,0,0,6,7
First I produced this plot
But I did that in the most brute force way imaginable
#split out each year
cases2015 <- subset(mosquitoes, mosquitoes$Years==2015)
cases2014 <- subset(mosquitoes, mosquitoes$Years==2014)
cases2013 <- subset(mosquitoes, mosquitoes$Years==2013)
cases2012 <- subset(mosquitoes, mosquitoes$Years==2012)
#get the sums by week
aggregate2015 <- aggregate(cases2015$Number.of.cases, by=list(Weeks=cases2015$Weeks), FUN=sum)
aggregate2014 <- aggregate(cases2014$Number.of.cases, by=list(Weeks=cases2014$Weeks), FUN=sum)
aggregate2013 <- aggregate(cases2013$Number.of.cases, by=list(Weeks=cases2013$Weeks), FUN=sum)
aggregate2012 <- aggregate(cases2012$Number.of.cases, by=list(Weeks=cases2012$Weeks), FUN=sum)
#put the sums back together into a dataframe
aggregateSums <- aggregate2012
aggregateSums <- cbind(aggregateSums, aggregate2013[,2])
aggregateSums <- cbind(aggregateSums, aggregate2014[,2])
aggregateSums <- cbind(aggregateSums, aggregate2015[,2])
#give the columns useful names
colnames(aggregateSums) <- c("Weeks","Cases.2012","Cases.2013","Cases.2014","Cases.2015")
#base R plot
#plot the first set of points
plot(x=aggregateSums$Weeks,y=aggregateSums$Cases.2012,pch=16,col="Red",main="West Nile Cases",xlab="Week",ylab="Number of Cases")
#add additional years
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2013,pch=15,col="Blue")
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2014,pch=14,col="Orange")
points(x=aggregateSums$Weeks,y=aggregateSums$Cases.2015,pch=13,col="Brown")
#add the connecting lines
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2012,col="Red")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2013,col="Blue")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2014,col="Orange")
lines(x=aggregateSums$Weeks,y=aggregateSums$Cases.2015,col="Brown")
#click to place legend
legend(locator(1),c("2012","2013","2014","2015"),pch=c(16,15,14,13), col=c("Red","Blue","Orange","Brown"))
Surely there has to be a more efficient way to get there.
My next step is to produce the same plot but for just one province at a time. I don't want to have to go through the above 6 times...
I'm open to accomplishing this via ggplot. If possible, I'd like to do it without resorting to additional packages (like plyr), as I'm trying to learn the base functionality for manipulating data.
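For the base R route, one possible sketch (not taken from any of the answers) is to let tapply() build the Weeks x Years table of sums in a single call and let matplot() draw one line per year. The data frame below is a small made-up stand-in for mosquitoes, and plot_cases is a hypothetical helper name introduced only for this illustration.

```r
# Small made-up stand-in for `mosquitoes`; only the columns used here are included.
mosquitoes <- data.frame(
  Years = rep(c(2012, 2013), each = 6),
  Weeks = rep(rep(17:19, each = 2), times = 2),
  Province = rep(c("Ontario", "Quebec"), times = 6),
  Number.of.cases = c(1, 0, 2, 3, 0, 1, 4, 0, 0, 2, 5, 1)
)

# Hypothetical helper: sum cases per week and year, then plot one line per year.
plot_cases <- function(df, title) {
  # Rows are weeks, columns are years, cells are summed case counts.
  sums <- with(df, tapply(Number.of.cases, list(Weeks, Years), sum))
  yrs <- colnames(sums)
  cols <- c("red", "blue", "orange", "brown")[seq_along(yrs)]
  pchs <- (16:13)[seq_along(yrs)]
  matplot(as.numeric(rownames(sums)), sums, type = "b", lty = 1,
          pch = pchs, col = cols,
          main = title, xlab = "Week", ylab = "Number of Cases")
  legend("topleft", legend = yrs, pch = pchs, col = cols, lty = 1)
  invisible(sums)
}

# All provinces together, then one plot per province.
all_sums <- plot_cases(mosquitoes, "West Nile Cases")
for (p in unique(mosquitoes$Province)) {
  plot_cases(subset(mosquitoes, Province == p), paste("West Nile Cases:", p))
}
```

Because the subsetting happens inside the loop, the per-province plots need no extra copies of the aggregation code.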
Just to close the loop after Biranjan's answer...
library(dplyr)
library(ggplot2)
# sum cases per year, week and province
mosq2 <- mosquitoes %>%
  select(Years, Weeks, Province, Number.of.cases) %>%
  group_by(Years, Weeks, Province) %>%
  summarise(sum_case = sum(Number.of.cases))
# one panel per province, one line per year
ggplot(data = mosq2, aes(x = as.factor(Weeks), y = sum_case, color = as.factor(Years))) +
  geom_point(aes(shape = as.factor(Years))) +
  geom_line(aes(group = as.factor(Years))) +
  labs(title = "West Nile Cases", x = "weeks", y = "Number of cases") +
  theme(legend.title = element_blank()) +
  facet_wrap(~Province, ncol = 3) +
  scale_x_discrete(breaks = c(17, 30, 41))
Turned out quite nicely
ggplot(data = data1, aes(x = as.factor(Weeks), y = sum_case, color = as.factor(Years))) +
  geom_point(aes(shape = as.factor(Years))) +
  geom_line(aes(group = as.factor(Years))) +
  labs(title = "West Nile cases", x = "weeks", y = "Number of cases") +
  theme(legend.title = element_blank())
Update:
I had too few points in my simulation, so it rendered fine; that was the problem. I couldn't find a way to do the aggregation using ggplot alone. The same code works if "dplyr" is used first and the variable names are edited accordingly. I know it is not what you are looking for; sorry to disappoint you.
library(dplyr)
data1 <- data %>%
  select(Years, Weeks, Number.of.cases) %>%
  group_by(Years, Weeks) %>%
  summarise(sum_case = sum(Number.of.cases))

How to color points in 17 colors based on principal component? [closed]

I am doing PCA in R on a data frame (df_f):
pc_gtex <- prcomp(df_f)
plot(pc_gtex$x[,1], pc_gtex$x[,2], col=gtex_group, main = "PCA", xlab = "PC1", ylab = "PC2")
legend("topleft", col=1:17, legend = paste(unique(gtex_pm$tissue), 1:17), pch = 20, bty='n', cex=1.5)
Below is my group table for the PCA. The sample column in this table corresponds to the rows of the main data to be plotted; the columns of the main table are genes. So basically I have 17 groups/tissues to be represented on the PCA plot.
head(gtex_pm)
sample tissue
1 SRR1069514 Prostate
2 SRR1071717 Bladder
3 SRR1073069 Prostate
Based on the above, the gtex_group object holds the group levels as integers:
head(gtex_group)
[1] 1 2 1 1 1
The head of the main table for the PCA is shown below; the row names are the samples:
SRR1069514 0 0.0009995 5.773065971 1.644998088 0.142367241 0.176471143 0.195566784 0.0009995 0.025667747 3.380994674 1.762502288 0 0.077886539 0 0.002995509 0.01093994 2.110576771 1.38829236 2.26186726 0.431132855 3.108480433 3.96347629 0 0 0.41012092 3.48452699 1.68565794 0 1.425034189 1.87456758 2.590542128 0 0 0 1.941471742 0.961646434 0 1.17711535 0.058268908 0 0.260824618 3.08534443 1.10426296 0.242946179 0.0009995 0 0 0 0.0009995 1.560247668 1.517541898 0.016857117 0.767326579 0.0009995 3.0191069 0 2.607050533 1.446683661 2.288384744 2.62082062 0.19309663 0 0 0.234281296 0 1.415610416 2.328837464 0.008959741 0.911479175 0.375005901 0.660107327 3.184739763 1.16064768 0.001998003 0.138891999 2.219855445 3.1011278 1.81872592 2.98229236 2.4114395 3.24528404 0 1.54734972 0.406131553 0.029558802 0.003992021 0.693647056 2.07581 2.8357982 0.0009995 0.082501222 1.09661029 2.75829962 0.635518068 3.11484775 0.01291623 3.40837159 0
SRR1071717 0 0 0.0009995 4.99519673 1.626491667 0.100749903 0.327863862 0.09531018 0 0.056380333 3.328196489 1.541373182 0 0.091667189 0.044973366 0 0.033434776 1.953311265 1.56444055 1.79142608 0.993622075 3.206236281 3.82609468 0 0 2.565487674 3.2202349 1.1304339 0 1.092258815 1.80203978 2.645394351 0 0 0.0009995 1.681200279 2.047434746 0 0.948176921 0.006975614 0.014888613 0.298622013 2.49667052 1.01884732 0.38662202 0 0 0 0 0.0009995 0.941958479 1.752845376 0.017839918 0.216722984 0.051643233 3.0505518 0 2.034444176 0.988053098 2.235804059 1.89686995 0.090754363 0 0 0.198850859 0 1.585554972 2.274905524 0 0.04305949 0.056380333 0.044016885 0.771496147 1.195436473 0 0.368801124 1.974636427 2.7700856 2.00120969 2.88875935 2.2651947 2.66242502 0 0.429181635 0.04018179 0.034401427 0 0.242161557 1.9907469 2.1384177 0.0009995 0.008959741 0.99916021 2.3892214 0.086177696 3.16821391 0 3.2038434 0
SRR1073069 2.19544522 1.32866525 0.0009995 4.50198508 1.159707388 0.141499562 0.265436464 0.026641931 2.3330173 0.028587457 3.140698044 1.537297235 0.012916225 0.023716527 0 0.002995509 0.049742092 2.071157322 1.02460688 2.11818137 0.359072069 2.419656765 3.5065479 0.137149838 2.121902193 0.305276381 2.95958683 1.49939981 3.14397985 1.001366904 1.450911 1.39475844 1.930071085 1.140074079 0.037295785 1.609437912 0.412109651 0.870456196 0.943516718 0.013902905 0 0.152721087 2.88836976 1.482967248 0.272314595 2.061532121 0.552159487 2.394890764 1.391033116 0.443402947 1.593714952 1.285921387 0.00796817 0.371563556 0.020782539 3.1946651 1.26327891 2.212003715 1.46672161 2.140183804 2.71997877 0.294161039 0.018821754 0.0009995 0.179818427 1.893714192 1.731478538 2.502255288 0.013902905 0.752830183 0.347129531 0.407463111 2.467082065 0.558472277 1.563812734 0.022739487 1.608837732 2.8176816 1.30670988 2.44495233 1.81107178 3.03254625 0.569283193 0.948176921 0.101653654 0.036331929 0 0.786182047 1.9867779 3.5039946 2.463427618 0.008959741 0.76360564 2.20640453 0.514618422 2.87964779 1.11021142 3.18750899 1.22436349
SRR1074410 2.69022562 1.70055751 0.013902905 3.314622273 0.503196597 0.4940863 0.044016885 0.023716527 1.753884517 0.03246719 2.767324893 1.666385193 0.009950331 0.05259245 0 0 0.017839918 1.575260461 0.76779072 2.22202559 0.83377831 2.198113071 3.57953881 0.051643233 2.207284913 0.072320662 3.04414141 1.39177929 2.851746423 0.982452934 1.33210213 1.888583654 1.871340532 1.238664044 0.03246719 1.734659877 0.486737828 0.412109651 1.126551657 0.035367144 0 0.213497174 2.76032635 1.131402111 0.572108852 2.102425378 0.291175962 1.85159947 0.943516718 0.283674051 1.232560261 0.982078472 0 0.223943232 0.035367144 2.9064091 1.583299255 2.376671636 1.185095749 2.07681309 2.20794469 0.877549904 0.151002874 0 0.107059072 3.038312721 1.486365915 2.633829402 0 0.403463105 0.195566784 0.285930539 1.296643139 0.48796633 1.664115474 0.054488185 1.884034745 2.3757426 1.71036863 2.61732284 1.9348492 3.1138708 1.220239777 0.322807874 0.12398598 0.004987542 0.002995509 0.446607051 1.939317 3.8484227 2.78346684 0.025667747 0.78253074 2.03352848 0.181487876 2.7091163 1.00430161 3.1429015 1.24875495
The figure shows only 8 colors and then repeats them, so we can't distinguish between some tissues. I want to show 17 different colors. How do I do that?
It's hard to say without knowing exactly what your data look like, but perhaps something like this would work:
cols <- rainbow(17)[as.factor(gtex_pm$tissue)]
plot(pc_gtex$x[,1], pc_gtex$x[,2], col=cols, main = "PCA", xlab = "PC1", ylab = "PC2")
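The repetition most likely comes from base R's default palette, which holds only 8 colors, so integer color codes above 8 wrap around. Here is a minimal sketch of the idea, assuming made-up tissue labels (the tissue vector below is hypothetical and only stands in for gtex_pm$tissue):

```r
# Hypothetical stand-in for gtex_pm$tissue: 17 distinct labels, 3 samples each
tissue <- rep(paste0("tissue_", 1:17), each = 3)
grp <- as.factor(tissue)

# The default palette has 8 entries, so col = 9..17 reuses colors 1..8
n_default <- length(palette())

# Build one color per factor level, then index by the factor.
# A factor used as an index selects by its integer codes.
pal <- rainbow(nlevels(grp))
cols <- pal[grp]

# Number of distinct colors actually assigned to the samples
n_used <- length(unique(cols))
```

If the legend is built from the same objects, for example legend("topleft", legend = levels(grp), col = pal, pch = 20), the labels and colors stay in the same order as the factor levels.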

How can I apply fisher test on this set of data (nominal variables)

I'm pretty new to statistics:
fisher = function(idxToTest, idxATI){
idxDependent=c()
dependent=c()
p = c()
for(i in c(1:length(idxToTest)))
{
tbl = table(data[[idxToTest[i]]], data[[idxATI]])
rez = fisher.test(tbl, workspace = 20000000000)
if(rez$p.value<0.1){
dependent=c(dependent, TRUE)
if(rez$p.value<0.1){
idxDependent = c(idxDependent, idxToTest[i])
}
}
else{
dependent = c(dependent, FALSE)
}
p = c(p, rez$p.value)
}
}
This is the function I use. It seems to work.
What I understood until now is that I have to pass as first parameter data like:
            Men Women
Dieting      10    30
Non-dieting   5    60
My data comes from a CSV:
data = read.csv('***.csv', header = TRUE, sep=',');
My first problem is that I don't know how to convert from:
Loan.Purpose Home.Ownership
lp_value_1 ho_value_2
lp_value_1 ho_value_2
lp_value_2 ho_value_1
lp_value_3 ho_value_2
lp_value_2 ho_value_3
lp_value_4 ho_value_2
lp_value_3 ho_value_3
to:
           ho_value_1 ho_value_2 ho_value_3
lp_value_1          0          2          0
lp_value_2          1          0          1
lp_value_3          0          1          1
lp_value_4          0          1          0
The second issue is that I don't know what the second parameter should be.
POST UPDATE: This is what I get using fisher.test(myTable):
Error in fisher.test(test) : FEXACT error 501.
The hash table key cannot be computed because the largest key
is larger than the largest representable int.
The algorithm cannot proceed.
Reduce the workspace size or use another algorithm.
where myTable is:
MORTGAGE NONE OTHER OWN RENT
car 18 0 0 5 27
credit_card 190 0 2 38 214
debt_consolidation 620 0 2 87 598
educational 5 0 0 3 7
...
Basically, Fisher's exact test is only practical on smallish data sets because the computation requires a lot of memory. But all is good because chi-square tests make minimal additional assumptions (mainly that the expected cell counts are not too small) and are easier on the computer. Just do:
chisq.test(data$Loan.Purpose, data$Home.Ownership)
to get your p-values.
Make sure you read through and understand the help page for chisq.test, especially the examples at the bottom.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/chisq.test.html
Then look at a mosaic plot to see the relative sizes of the cells, like:
mosaicplot(table(data$Loan.Purpose, data$Home.Ownership))
This reference explains how mosaic plots work:
http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day12/
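For the first part of the question (going from two columns of labels to a contingency table), table() does the conversion directly. The sketch below rebuilds the small example from the question in a data frame called loans (a name chosen here for illustration). It also shows fisher.test with simulate.p.value = TRUE, a documented argument that estimates the p-value by Monte Carlo instead of the exact network algorithm; this is offered as one possible alternative, not as part of the answer above.

```r
# The seven example rows from the question, as a data frame
loans <- data.frame(
  Loan.Purpose = c("lp_value_1", "lp_value_1", "lp_value_2", "lp_value_3",
                   "lp_value_2", "lp_value_4", "lp_value_3"),
  Home.Ownership = c("ho_value_2", "ho_value_2", "ho_value_1", "ho_value_2",
                     "ho_value_3", "ho_value_2", "ho_value_3")
)

# Cross-tabulate the two nominal variables into a 4 x 3 contingency table
tbl <- table(loans$Loan.Purpose, loans$Home.Ownership)
print(tbl)

# Monte Carlo p-value; B is the number of simulated tables
set.seed(1)
res <- fisher.test(tbl, simulate.p.value = TRUE, B = 10000)
res$p.value
```

The same tbl object can be passed to chisq.test(tbl) or mosaicplot(tbl), so the table only has to be built once.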

Check if a column has a value; if so, write TRUE or FALSE to the column next to it

I was wondering how to write something that checks whether the column Lair in the data
is below or above a certain threshold: say, below 0.5 is called LOH and
above is called imbalance. The calls LOH and INBALANCE should then be written to a new column. I tried something like the code below.
detection<-function(assay,method,thres){
if(method=="threshold"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="cnloh"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="gain"){
idx<-ifelse(segmenten["intensity"]>1.1000000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="loss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="bloss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="bgain"){
idx<-ifelse(segmenten["intensity"]>1.100000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
return(idx)
}
After this part, the next step is to write the result of the function to the existing table.
Does anyone have an idea?
Since your desired result is not clear enough, I made some assumptions and wrote something that may or may not be useful.
First of all, inside your function there is an object segmenten which is not defined; I suppose this is the data set supplied as input. Also, you used ifelse so that the result is TRUE or FALSE, but you want either LOH or INBALANCE when certain conditions are met.
You want INBALANCE when ... & segmenten["Lair"]>thres and LOH otherwise (here ... stands for the other part of the condition). This gives a vector, but you want it in the main data set as an additional column, don't you? So maybe this could be a new starting point for you to improve your code.
detection <- function(assay, method=c('threshold', 'cnloh', 'gain', 'loss', 'bloss', 'bgain'),
thres=0.5){
x <- assay
idx <- switch(match.arg(method),
threshold = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
cnloh = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
gain = ifelse(x["intensity"]>1.1 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
loss = ifelse(x["intensity"]<0.9 & x["Lair"]<thres,'LOH', 'INBALANCE'),
bloss = ifelse(x["intensity"]<0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
bgain = ifelse(x["intensity"]>1.1 & x["Lair"]>thres, 'INBALANCE', 'LOH'))
colnames(idx) <- 'Checking'
return(cbind(x, as.data.frame(idx)))
}
Example:
Data <- read.csv("japansegment data.csv", header=T)
result <- detection(Data, method='threshold', thres=0.5) # 'threshold' is the default value for method
head(result)
SNP_NAME x0 x1 y pos.start pos.end chrom count copynumber intensity allele.B Lair uncertain sample_id
1 SNP_A-1656705 0 0 0 836727 27933161 1 230 2 1.0783 1 0.9218 FALSE GSM288035
2 SNP_A-1677548 0 0 0 28244579 246860994 1 4408 2 0.9827 1 0.9236 FALSE GSM288035
3 SNP_A-1669537 0 0 0 100819 159783145 2 3480 2 0.9806 1 0.9193 FALSE GSM288035
4 SNP_A-1758569 0 0 0 159783255 159791136 2 5 2 1.7244 1 0.9665 FALSE GSM288035
5 SNP_A-1662168 0 0 0 159817465 168664268 2 250 2 0.9786 1 0.9197 FALSE GSM288035
6 SNP_A-1723506 0 0 0 168721411 168721920 2 2 2 1.8027 -4 NA FALSE GSM288035
Checking
1 INBALANCE
2 INBALANCE
3 INBALANCE
4 LOH
5 INBALANCE
6 LOH
Using match.arg and switch functions will help you to avoid a lot of if statements.
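To see the match.arg and switch pattern in isolation, here is a reduced sketch. The function name flag_rows, its three method names and the toy data frame are all hypothetical and only illustrate the dispatch; they are not the six methods from the answer above.

```r
# Hypothetical reduced version: the method picks the intensity condition,
# and Lair > thres decides the label
flag_rows <- function(df, method = c("normal", "gain", "loss"), thres = 0.5) {
  method <- match.arg(method)  # validates the choice; first element is the default
  cond <- switch(method,
    normal = df$intensity > 0.9 & df$intensity < 1.1,
    gain   = df$intensity > 1.1,
    loss   = df$intensity < 0.9
  )
  # Add the label as a new column next to the existing ones
  df$Checking <- ifelse(cond & df$Lair > thres, "INBALANCE", "LOH")
  df
}

# Toy data, made up for this example
toy <- data.frame(intensity = c(1.0783, 1.7244, 0.85),
                  Lair = c(0.9218, 0.9665, 0.30))

res_normal <- flag_rows(toy)                  # uses the default method "normal"
res_gain <- flag_rows(toy, method = "gain")
```

An unknown method such as flag_rows(toy, "bogus") stops with an error from match.arg instead of silently returning nothing, which is one advantage over a chain of separate if statements.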

Resources