My question is fairly simple: the cut() function lets me choose the breaks along which the range of my vector is divided into intervals. I would like to control the number of observations within each newly created interval, similar to what I get by passing quantile() output as the breaks in the cut() call. However, I don't want to use quantiles, because I need the intervals to be fixed so that I can match them between different databases for later comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
Which is fairly simple. My new approach is even simpler, so it resembles this for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable which I want to calculate some statistics on, according to the levels of the discrete variable.
So it would go something like this :
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
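For illustration only, the nested ifelse() above could be condensed with tapply(). This is a sketch on simulated data: the data frame, the three made-up intervals and the helper `lower_ci()` (including its guard for pools with fewer than two values) are assumptions, not part of the real data.

```r
# Simulated stand-in data (assumption): x lies in [0.04, 0.11], y is the tested variable
set.seed(42)
df <- data.frame(x = runif(200, 0.04, 0.11), y = rnorm(200))

# Illustrative fixed breaks (assumption)
breaks <- c(0.04, 0.06, 0.08, 0.11)
df$z <- cut(df$x, breaks, include.lowest = TRUE)

# Hypothetical helper: lower bound of the t-test confidence interval for one
# pool of y values; returns NA when the pool is too small for t.test()
lower_ci <- function(v) if (length(v) < 2) NA_real_ else t.test(v)$conf.int[1]

# One value per level of z, then looked up for every row by its level label
lower <- tapply(df$y, df$z, lower_ci)
df$conf.intx <- as.numeric(lower[as.character(df$z)])

head(df)
```

Looking the value up by level label means the code does not depend on how many levels cut() produces, and rows whose x falls outside the breaks simply get NA.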
But to be able to calculate this kind of t-test confidence interval on each 'pool' of y values (each pool has as many values as there are observations in the corresponding interval of the discrete variable), I need to control the number of values within each interval created for z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for the z variable so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
Any help on the matter would be welcome, thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
Except I'd like this to be reproducible for another database, and so I can't use the quantile function, since I would not get the same intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m desired minimum number of observations in a group.
The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar you could extract the exact cutpoints by setting the argument onlycuts = T and reuse them for the other variables. In case the distributions are different though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = T) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
cut2(df2$x, cuts = cuts) %>% table*2 # multiplied by two for better comparison
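When the two distributions differ, the reused cutpoints can leave some intervals nearly empty, as noted above. One possible workaround, sketched here in base R on simulated data, is to start from pooled cutpoints and merge neighbouring intervals until every interval holds a minimum count in both data sets. The helpers `min_count()` and `merge_breaks()`, the simulated vectors and the threshold of 30 are all hypothetical, and the greedy merge is only one of several reasonable strategies.

```r
# Hypothetical helper: smallest count per interval across all vectors in xs
min_count <- function(breaks, xs) {
  counts <- lapply(xs, function(x)
    as.vector(table(cut(x, breaks, include.lowest = TRUE))))
  do.call(pmin, counts)
}

# Hypothetical helper: drop interior breaks one at a time until every
# interval holds at least min_n observations in each vector of xs
merge_breaks <- function(breaks, xs, min_n) {
  repeat {
    k <- min_count(breaks, xs)
    small <- which(k < min_n)
    if (length(small) == 0 || length(breaks) <= 2) break
    i <- small[1]
    # merge interval i with its right neighbour, or with its left
    # neighbour when i is the last interval
    idx <- if (i < length(k)) i + 1 else i
    breaks <- breaks[-idx]
  }
  breaks
}

# Simulated stand-ins for df1$x and df2$x (assumption)
set.seed(1)
x1 <- rnorm(962, mean = 6.0, sd = 1.2)
x2 <- rnorm(455, mean = 6.2, sd = 1.1)
pooled <- c(x1, x2)

# Start from 20 pooled quantile groups, with outer breaks at the pooled range
start <- c(min(pooled), unname(quantile(pooled, (1:19) / 20)), max(pooled))
breaks <- merge_breaks(start, list(x1, x2), min_n = 30)

table(cut(x1, breaks, include.lowest = TRUE))
table(cut(x2, breaks, include.lowest = TRUE))
```

Because the outer breaks are the pooled minimum and maximum, no observation in either vector falls outside the intervals, and the same `breaks` vector can be reused for both data sets.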
This is a good example of how NOT to pose a question. At last we have an example, and it is possible to post code that applies to it. (You apparently naively pasted the exact code in my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
100 100 100 100 100 100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
100 100 62
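Note that the breaks above start at the 100/N quantile, so values at or below that first break fall outside every interval and become NA (the counts in the table sum to 862, not 962). A hedged sketch that keeps the lowest group as well, using a simulated vector in place of x$x: the only changes are adding 0 to the probabilities and setting include.lowest = TRUE. The variable names and the simulated data are assumptions.

```r
# Simulated stand-in for x$x (assumption)
set.seed(1)
x <- rnorm(962, mean = 6, sd = 1)

n <- 100           # desired group size
N <- length(x)

# Probabilities 0, n/N, 2n/N, ..., 1 so that the lowest and highest values are covered
probs <- unique(c(0, seq(n, N, by = n) / N, 1))
breaks <- quantile(x, probs = probs)

# include.lowest = TRUE keeps the minimum itself inside the first interval
grp <- cut(x, breaks = breaks, include.lowest = TRUE)
table(grp)
```

The last group is still smaller than the others whenever N is not a multiple of n.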
I'm trying to plot a simple graph that shows the increase in wealth for two different investment strategies. When using the standard graph from R it works, but when I try to use ggplot2 I get these weird spikes in the lines.
Does anyone have any idea what could be causing this?
I've tried to simplify the code as much as possible:
For the standard graph
ind.ts = ts(cbind(ind.passive,ind.active), start=c(insample.endstart,1),frequency=12)
plot(log(ind.ts),type="l", col=c("blue","red"))
legend(x="topleft", legend=c("Passive","Active"), col=c("blue","red"), lty=1)
For the ggplot graph
testers=data.frame(ind.ts)
ggplot(testers, aes(date)) +
geom_line(aes(y = log(ind.passive), colour = "Passive",size="1")) +
geom_line(aes(y = log(ind.active), colour = "Active",size="1"))
The Ind.ts data set
structure(c(1, 1.026669, 1.066102329621, 1.09764083483818, 1.13073909657189,
1.17422279926966, 1.201650295415, 1.24229131005623, 1.24436842112664,
1.29675757602449, 1.29281154272065, 1.34840890311535, 1.37447769243928,
1.42187380670767, 1.43432089001159, 1.44828830683852, 1.47037760009442,
1.50663270057995, 1.51269991046518, 1.44617893190248, 1.47609892782461,
1.55880475075062, 1.60230787373457, 1.72267003659376, 1.6884336922865,
1.7947931958647, 1.80827747714523, 1.73407842742553, 1.83823238001199,
1.94879470474019, 2.03637158997651, 2.19836698633073, 2.07500122615881,
2.18823196806907, 2.11573803119891, 2.21303659177769, 2.25083083069207,
2.27667036862841, 2.44006700098487, 2.56495939036328, 2.59127330874902,
2.54554769994283, 2.64902166839781, 2.62135793511473, 2.24229384954953,
2.38534322797539, 2.58003017155629, 2.73574015247005, 2.89313822640227,
3.01496249083961, 2.92082933195062, 3.03735873897812, 3.15584610338566,
3.08028252428619, 3.25121048184135, 3.15027015001163, 3.13383204036887,
3.04763285626648, 3.24152630621501, 3.30661615444381, 3.5011906754359,
3.32628169286315, 3.26271977599422, 3.58162126961968, 3.47465973202375,
3.4018482373392, 3.48660188432426, 3.43296051433394, 3.64465402445034,
3.45302176049876, 3.43920276741325, 3.16710336206381, 3.18321124976327,
3.29673729577483, 2.9957319937214, 2.80662641161774, 3.02543381329387,
3.04403720581181, 2.97111425050939, 2.94227958670819, 2.75683358891715,
2.53472102032527, 2.58379068455775, 2.78122846592754, 2.80549468429276,
2.76500859050373, 2.71079783207832, 2.81360212906206, 2.64401226073284,
2.62324090041252, 2.43641368348514, 2.24723834303094, 2.26148583412576,
2.01595857860056, 2.19346574740491, 2.32192606890168, 2.18514140418268,
2.12856372294559, 2.09571359900937, 2.1165869064555, 2.29149953181808,
2.41150994529845, 2.44221328992199, 2.48518647497146, 2.53301388868229,
2.50620193667058, 2.64742390960003, 2.6698343529948, 2.80897010046677,
2.86115795596334, 2.89979789415863, 2.85611823847891, 2.81197121886675,
2.84980347964538, 2.90496997540435, 2.80930350417434, 2.81972040156782,
2.85016210302314, 2.89418855702854, 3.00999951213804, 3.11183381563269,
3.03729294841303, 3.09892873421517, 3.04396923311387, 2.98710484387007,
3.08097760069353, 3.08499827646243, 3.20047593194697, 3.16912086924169,
3.19575099190593, 3.14371138275373, 3.25904157854143, 3.26071346687123,
3.3485896948034, 3.35499219829987, 3.3971510302637, 3.44342702159796,
3.34200432210381, 3.3473849490624, 3.36955802696499, 3.4464479715823,
3.53637269205683, 3.65311189099431, 3.71864871831875, 3.7710110109214,
3.82954087282191, 3.75144504580245, 3.79450413203817, 3.96444479409563,
4.09921609487092, 4.03197255405065, 3.90887240000293, 3.96507025849778,
4.11298323942078, 4.18000430130714, 4.00202389816178, 3.973681564915,
3.73688988046171, 3.6132997214452, 3.59812747591486, 3.77562310430174,
3.82238042082541, 3.50029900180582, 3.47233161278139, 3.52122551422096,
3.20811814149644, 2.67119786498117, 2.47785656351383, 2.50381211101664,
2.29590056094204, 2.04999813136234, 2.23149881591877, 2.44744541933286,
2.58359925545577, 2.59022877114527, 2.78828284344458, 2.88774646903593,
2.99667515359443, 2.94310059519847, 3.1174675330616, 3.17829867703423,
3.06610473373492, 3.15882374088307, 3.34981254190434, 3.40448483240076,
3.13064849939144, 2.96722864772321, 3.17659630110655, 3.0311907820197,
3.30193068028814, 3.42901538831107, 3.42659107443153, 3.65581631094671,
3.74411158648869, 1, 1.026669, 1.066102329621, 1.09764083483818,
1.13073909657189, 1.17422279926966, 1.201650295415, 1.24229131005623,
1.24436842112664, 1.29675757602449, 1.29281154272065, 1.34840890311535,
1.37447769243928, 1.42187380670767, 1.43432089001159, 1.44828830683852,
1.47037760009442, 1.50663270057995, 1.51269991046518, 1.44617893190248,
1.47609892782461, 1.55880475075062, 1.60230787373457, 1.72267003659376,
1.6884336922865, 1.7947931958647, 1.80827747714523, 1.73407842742553,
1.83823238001199, 1.94879470474019, 2.03637158997651, 2.19836698633073,
2.07500122615881, 2.18823196806907, 2.11573803119891, 2.21303659177769,
2.25083083069207, 2.27667036862841, 2.44006700098487, 2.56495939036328,
2.59127330874902, 2.54554769994283, 2.64902166839781, 2.62135793511473,
2.24229384954953, 2.2509042579318, 2.25833224198298, 2.39462710945113,
2.53239958556629, 2.63903386731532, 2.55663795191, 2.6586375796394,
2.76235103162114, 2.69620929852, 2.84582464870417, 2.75747033083585,
2.74308185064955, 2.66763064126559, 2.83734797029354, 2.89432191753704,
3.06463539645259, 2.91153540595201, 2.85589887587967, 3.13503728790702,
3.04141253434097, 2.97767973468385, 3.05186564759377, 3.00491269460554,
3.19021063591839, 3.02247255089243, 3.01037661574376, 3.02584995154869,
3.04040428981563, 3.05344762421894, 3.06587515604951, 3.07715757662378,
3.08709679559627, 3.09641982791897, 3.10543040961822, 3.1145293207184,
3.12325000281641, 3.13012115282261, 3.13575537089769, 3.14064714927629,
3.14507546175677, 3.14941566589399, 3.15395082445288, 3.15865021118131,
2.96826256970236, 2.97253686780273, 2.97675787015501, 2.98092533117323,
2.98494958037031, 2.98900911179961, 2.99295460382719, 2.99603734706913,
2.99900342404273, 3.00194244739829, 3.00488435099674, 3.00770894228668,
3.01053618869243, 3.16820398996663, 3.20854156316688, 3.26499906051237,
3.32783396743193, 3.29260884488666, 3.47814406068718, 3.5075865501609,
3.69038091563598, 3.75894450266758, 3.80970904817611, 3.75232340078343,
3.69432373797752, 3.74402716954827, 3.81650404749639, 3.69081893620424,
3.70450449281968, 3.74449832332416, 3.80233958892455, 3.95449020757537,
4.08827852027806, 3.99034789660332, 4.07132402646909, 3.99911909485966,
3.92441155104859, 4.04774010845184, 4.05302240929337, 4.20473514411804,
4.16354135391111, 4.19852759190803, 4.1301587686014, 4.28167777318631,
4.28387427388395, 4.39932468556512, 4.40773619436392, 4.4631238073823,
4.52392047988646, 4.39067292607189, 4.39774190948286, 4.42687255189128,
4.52788935665288, 4.64603104574667, 4.79940117659781, 4.88550243370598,
4.95429519347499, 5.03119080917292, 4.92858973500145, 4.9851600879798,
5.20842546768007, 5.38548589145385, 5.29714238089044, 5.13541532685947,
5.20924719301373, 5.40357295030192, 5.49162417152709, 5.25779630592764,
5.22056059248906, 4.90946738678263, 4.91815714405724, 4.9233212090585,
4.92863839596428, 4.93573563525447, 4.94338602548911, 4.95010903048378,
4.95718768639737, 4.96184744282258, 4.96462607739057, 4.96542041756295,
4.96556938017547, 4.96611559280729, 4.9673571217055, 4.9682512459874,
4.96889711864938, 4.96964245321718, 4.97038789958516, 4.9711334577701,
5.14846373047568, 5.34266893085295, 5.24715269570716, 5.55802550431702,
5.66647925598276, 5.46645253824657, 5.63175806300315, 5.97226541900844,
6.06973876291208, 5.58152539525601, 5.29016976962365, 5.2908574916937,
5.04867378086891, 5.04933010846042, 5.24366872567485, 5.2399614518858,
5.59049391317115, 5.72551552216206), .Dim = c(194L, 2L), .Dimnames = list(
NULL, c("ind.passive", "ind.active")), .Tsp = c(1995, 2011.08333333333,
12), class = c("mts", "ts", "matrix"))
The date data set
structure(c(1995.1, 1995.2, 1995.3, 1995.4, 1995.5, 1995.6, 1995.7,
1995.8, 1995.9, 1995.1, 1995.11, 1995.12, 1996.1, 1996.2, 1996.3,
1996.4, 1996.5, 1996.6, 1996.7, 1996.8, 1996.9, 1996.1, 1996.11,
1996.12, 1997.1, 1997.2, 1997.3, 1997.4, 1997.5, 1997.6, 1997.7,
1997.8, 1997.9, 1997.1, 1997.11, 1997.12, 1998.1, 1998.2, 1998.3,
1998.4, 1998.5, 1998.6, 1998.7, 1998.8, 1998.9, 1998.1, 1998.11,
1998.12, 1999.1, 1999.2, 1999.3, 1999.4, 1999.5, 1999.6, 1999.7,
1999.8, 1999.9, 1999.1, 1999.11, 1999.12, 2000.1, 2000.2, 2000.3,
2000.4, 2000.5, 2000.6, 2000.7, 2000.8, 2000.9, 2000.1, 2000.11,
2000.12, 2001.1, 2001.2, 2001.3, 2001.4, 2001.5, 2001.6, 2001.7,
2001.8, 2001.9, 2001.1, 2001.11, 2001.12, 2002.1, 2002.2, 2002.3,
2002.4, 2002.5, 2002.6, 2002.7, 2002.8, 2002.9, 2002.1, 2002.11,
2002.12, 2003.1, 2003.2, 2003.3, 2003.4, 2003.5, 2003.6, 2003.7,
2003.8, 2003.9, 2003.1, 2003.11, 2003.12, 2004.1, 2004.2, 2004.3,
2004.4, 2004.5, 2004.6, 2004.7, 2004.8, 2004.9, 2004.1, 2004.11,
2004.12, 2005.1, 2005.2, 2005.3, 2005.4, 2005.5, 2005.6, 2005.7,
2005.8, 2005.9, 2005.1, 2005.11, 2005.12, 2006.1, 2006.2, 2006.3,
2006.4, 2006.5, 2006.6, 2006.7, 2006.8, 2006.9, 2006.1, 2006.11,
2006.12, 2007.1, 2007.2, 2007.3, 2007.4, 2007.5, 2007.6, 2007.7,
2007.8, 2007.9, 2007.1, 2007.11, 2007.12, 2008.1, 2008.2, 2008.3,
2008.4, 2008.5, 2008.6, 2008.7, 2008.8, 2008.9, 2008.1, 2008.11,
2008.12, 2009.1, 2009.2, 2009.3, 2009.4, 2009.5, 2009.6, 2009.7,
2009.8, 2009.9, 2009.1, 2009.11, 2009.12, 2010.1, 2010.2, 2010.3,
2010.4, 2010.5, 2010.6, 2010.7, 2010.8, 2010.9, 2010.1, 2010.11,
2010.12, 2011.1, 2011.2), .Tsp = c(1995, 2011.08333333333, 12
), class = "ts")
The spikes are in your data, specifically in the crummy way the dates are stored. January, February, March 1995 are coded as 1995.10, 1995.20, 1995.30, but then October, November, and December are 1995.10, 1995.11, 1995.12. When you passed your time series to ggplot you may have seen a message like:
Don't know how to automatically pick scale for object of type ts. Defaulting to continuous
So ggplot just converted to numerics, giving October the same x value as January and inserting Nov and Dec before February, causing your spikes. Since your samples (as far as I checked) are spaced every month, you could add a new column to your data like this:
ind.df <- as.data.frame(ind.ts)
ind.df$date <- seq(as.Date('1995-01-01'), as.Date('2011-02-01'), by = "month")
Then, ggplot works best with long-format data, so we can melt it
library(reshape2)
library(ggplot2)
ind.melt <- melt(ind.df, id.vars = "date")
ggplot(ind.melt, aes(x = date, y = value, color = variable)) +
  geom_line(size = 1)
And the spikes are gone.
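To see why the decimal date coding described above produces spikes, here is a minimal base-R sketch using only the twelve 1995 codes. The variable names are illustrative. Since geom_line() connects observations in order of their x values, the ordering printed below is the order in which the months get joined.

```r
# The 1995 month codes as stored in the question's date object
codes <- c(1995.1, 1995.2, 1995.3, 1995.4, 1995.5, 1995.6,
           1995.7, 1995.8, 1995.9, 1995.10, 1995.11, 1995.12)

# October (1995.10) is numerically identical to January (1995.1)
codes[10] == codes[1]

# Order along the x axis: October, November and December land before February
order(codes)

# Proper monthly dates are strictly increasing, so no points collide or reorder
dates <- seq(as.Date("1995-01-01"), by = "month", length.out = length(codes))
all(diff(as.numeric(dates)) > 0)
```

The full sequence from 1995-01-01 to 2011-02-01 by month has 194 elements, which matches the 194 rows of ind.ts.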
One other note, in ggplot don't put anything inside aes() that isn't mapping to a data column. In your post, inside aes() you have size = "1". You don't need the quotes around 1, and since it applies to the whole layer you should put it outside of aes().
The following example illustrates that, for a very simple case, the plots from base R plotting and ggplot2 are the same, i.e. base plotting does not get rid of the spikes, nor does ggplot2 introduce spikes. You need to make your example more complete, i.e. provide us with a sample of your data that reproduces the issue you see.
x = 1:100
y = runif(100)
y[50] = 5
plot(x, y)
library(ggplot2)
qplot(x, y, geom = 'line')