cut function and controlled frequency in the intervals - r

My question is pretty simple: the cut() function lets me choose the breaks that divide the range of my vector into intervals. I would like to control the number of observations within each newly created interval, similar to what I get by passing quantile() breaks to cut(). However, I don't want to use quantiles, because I want the intervals to be fixed so that I can match them between different databases for comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
Which is fairly simple. My new approach is even simpler, so it resembles this for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable which I want to calculate some statistics on, according to the levels of the discrete variable.
So it would go something like this :
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
But to calculate this kind of t-test confidence interval on each 'pool' of y values (each pool has as many values as there are observations in the corresponding interval of the discrete variable), I need to control the number of values within each interval created for z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for the z variable so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
Any help on the matter would be welcome, thank you in advance.
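For the per-level statistic described above, the nested ifelse() calls can be avoided. Here is a minimal sketch, assuming toy data and illustrative breaks (neither comes from my real database), that uses ave() to compute the lower confidence bound of y once per level of z and repeat it on every row of that level:

```r
# Hypothetical toy data: x is the variable to bin, y the variable to test.
set.seed(1)
df <- data.frame(x = runif(200, 0.04, 0.11), y = rnorm(200))

# Fixed breaks, chosen only for illustration.
breaks <- c(0.04, 0.06, 0.08, 0.11)
df$z <- cut(df$x, breaks, include.lowest = TRUE)

# Lower bound of the 95% t-interval of y within each level of z,
# repeated for every row of that level (ave() keeps the original row order).
# Levels with fewer than 2 observations get NA instead of an error.
df$conf.intx <- ave(df$y, df$z, FUN = function(v) {
  if (length(v) < 2) NA_real_ else t.test(v)$conf.int[1]
})

table(df$z)                          # observations per interval
tapply(df$conf.intx, df$z, unique)   # one bound per level
```

The table() call makes it easy to check whether each interval holds enough observations before trusting the interval for that level.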
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936]  (0.0936,0.125]   (0.125,0.156]   (0.156,0.187]   (0.187,0.218]   (0.218,0.249]   (0.249,0.281]   (0.281,0.312]   (0.312,0.343]   (0.343,0.374]
              0               0               0               0               0               0               0               0               0               0               0
  (0.374,0.405]   (0.405,0.437]   (0.437,0.468]   (0.468,0.499]    (0.499,0.53]    (0.53,0.561]   (0.561,0.593]   (0.593,0.624]   (0.624,0.655]   (0.655,0.686]   (0.686,0.717]
              0               0               0               0               0               0               0               0               0               0               0
  (0.717,0.748]    (0.748,0.78]    (0.78,0.811]   (0.811,0.842]   (0.842,0.873]   (0.873,0.904]   (0.904,0.936]   (0.936,0.967]   (0.967,0.998]
              0               0               0               0               0               0               0               0               0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66]  (5.66,5.8]  (5.8,5.94]  (5.94,6.1]  (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
         49          48          48          48          48          48          48          48          48          48          48          48          48          48          48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
         48          48          48          48          49
Except I'd like this to be reproducible for another database, and so I can't use the quantile function, since I would not get the same intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")

Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m: desired minimum number of observations in a group. The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar, you could extract the exact cutpoints by setting the argument onlycuts = TRUE and reuse them for the other variables. If the distributions differ, though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = TRUE) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
(cut2(df2$x, cuts = cuts) %>% table) * 2 # counts doubled for easier comparison with df1
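If a guaranteed minimum in every interval for both databases matters more than equal-sized groups, another option is to start from a fixed grid and merge neighbouring bins until each one is full enough in both vectors. The following is only a sketch in base R: min_count_breaks() is a hypothetical helper written for this answer (it is not part of Hmisc), the grid step of 0.25 is arbitrary, and x1 and x2 are simulated stand-ins for the two databases.

```r
# Hypothetical helper: merge adjacent bins of a fixed grid until every bin
# holds at least `m` observations in BOTH vectors.
min_count_breaks <- function(x1, x2, grid, m) {
  n1 <- as.vector(table(cut(x1, grid, include.lowest = TRUE)))
  n2 <- as.vector(table(cut(x2, grid, include.lowest = TRUE)))
  keep <- grid[1]            # the left edge is always a break
  c1 <- 0
  c2 <- 0
  for (i in seq_along(n1)) {
    c1 <- c1 + n1[i]
    c2 <- c2 + n2[i]
    if (c1 >= m && c2 >= m) {   # bin is full enough in both: close it here
      keep <- c(keep, grid[i + 1])
      c1 <- 0
      c2 <- 0
    }
  }
  # Leftover grid cells on the right are merged into the last closed bin.
  last <- grid[length(grid)]
  if (keep[length(keep)] < last) {
    if (length(keep) > 1) keep[length(keep)] <- last else keep <- c(keep, last)
  }
  keep
}

set.seed(42)
x1 <- rnorm(962, 6.3, 1.1)   # simulated stand-in for the first database
x2 <- rnorm(455, 6.3, 1.1)   # simulated stand-in for the second database

grid <- seq(floor(min(x1, x2)), ceiling(max(x1, x2)), by = 0.25)
brks <- min_count_breaks(x1, x2, grid, m = 30)

table(cut(x1, brks, include.lowest = TRUE))
table(cut(x2, brks, include.lowest = TRUE))
```

Because every break is taken from the same grid, the labels are identical for both vectors. The trade-off is that the intervals in the tails become wide, and if the whole data set has fewer than m observations the single remaining interval cannot meet the minimum.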

This is a good example of how NOT to pose a question. At last we have an example, and it is possible to post code that applies to it. (You apparently naively pasted the exact code in my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
        100         100         100         100         100         100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
        100         100          62
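One thing to watch in the call above: the first break is the quantile at 100/N, so observations at or below it fall outside every interval and come back as NA (the counts shown sum to 862 of the 962 rows). If the lowest group should be kept as well, a minimal sketch, assuming simulated data in place of the posted sample, prepends 0 to the probabilities and sets include.lowest = TRUE:

```r
set.seed(1)
x <- data.frame(x = rnorm(962, 6.3, 1.1))   # simulated stand-in for the posted sample

n <- 100                                     # target group size
N <- nrow(x)

# Probabilities 0, n/N, 2n/N, ..., 1; unique() drops a duplicated 1 when N is a multiple of n.
probs <- unique(c(0, seq(n, N, by = n) / N, 1))

x$xct <- cut(x$x, breaks = quantile(x$x, probs = probs), include.lowest = TRUE)

table(x$xct, useNA = "ifany")   # every row now falls in some interval
```

These breaks still come from the quantiles of one data set, so they would need to be stored and reused to get the same intervals on another database.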

Related

radius in nn2() function in RANN r-package

I was trying to use the solution offered here to find all the locations in df that are within 70 km of my point of interest, userLocation=c(6.9,55.2), but it does not work properly!
df = structure(list(lng = c(6.2694184, 6.25737207, 6.23839104, 6.25844252,
6.22595901, 6.21351832, 6.2010845, 6.1886414, 6.1762058, 6.1637609,
6.15132287, 6.13887619, 6.12643637, 6.14361895, 6.16332364, 6.18302157,
6.2027276, 6.22242688, 6.24213488, 6.26842752, 6.26745135, 6.24518597,
6.26645948, 6.24420242, 6.22357831, 6.26548171, 6.24321746, 6.2226023,
6.20041884, 6.18070459, 6.16099845, 6.16716672, 6.17960629, 6.18686265,
6.2078525, 6.19203657, 6.20447434, 6.21691835, 6.2293537, 6.24179593,
6.26009321, 6.26448764, 6.2422317, 6.21927538, 6.20186455, 6.26350828,
6.24124514, 6.22028969, 6.26251321, 6.2402584, 6.23404584, 6.26153227,
6.22171658, 5.94065657, 6.10363006, 6.11606487, 6.12850589, 6.14093826,
6.15337749, 6.16582359, 6.17826103, 6.19070472, 6.20313974, 6.20009703,
5.96044213, 5.96988333, 5.98023582, 5.98966667, 5.99910246, 6.00003829,
6.00947365, 6.01889843, 6.02832882, 6.01983402, 6.02925771, 6.038687,
6.0481219, 6.05754688, 6.03963788, 6.04906608, 6.05848435, 6.06792377,
6.07735326, 6.08677283, 6.05941948, 6.06885218, 6.07829049, 6.08771889,
6.09713671, 6.10657633, 6.11600538, 6.07922538, 6.08864707, 6.10756108,
6.12000483, 6.13243993, 6.12019786, 6.14488189, 6.15733073, 6.16977091,
6.16621949, 6.13805015, 6.13652024, 5.941545, 6.20491484, 6.18423897,
6.17806466, 6.16355552, 6.15738696, 6.14558294, 6.14286638, 6.13670293,
6.12217027, 6.11601258, 6.10148275, 6.09533146, 6.08080511, 6.07464337,
6.06011984, 6.03729438, 6.05394895, 6.02546329, 6.0136389, 6.03674112,
6.05743408, 6.07812006, 6.09879971, 6.11948795, 6.11063647, 6.08914275,
6.08440881, 6.0018212, 6.02491713, 5.98999461, 6.01308427, 5.97815849,
6.00125809, 5.96632973, 5.98943792, 5.9995124, 6.02119838, 6.04364466,
6.0223476, 6.04560587, 6.03821257, 6.06131821, 6.06046748, 5.97888909,
5.95766873, 6.24771247, 6.04931495, 6.25538943, 6.23227728, 6.25434093,
6.25329159, 6.25225759, 6.25120656, 6.25015469, 6.24911757, 6.06338238,
6.08539205, 6.10756976, 6.12975108, 6.15193667, 6.17411029, 6.19630377,
6.21848591, 6.22602495, 6.23123663, 6.20931486, 6.23019515, 6.20826628,
6.22915282, 6.20721685, 6.22810966, 6.21962063, 6.20209266, 6.20618216,
6.19702482, 6.1799057, 6.15772301, 6.13554395, 6.11336914, 6.09118237,
6.09738412, 6.11958004, 6.12698723, 6.14767387, 6.16835417, 6.18613747,
6.185096, 6.165456, 6.14476821, 6.15765091, 6.23561071, 6.08001353,
6.22353732, 6.2376767, 6.21143885, 6.19936347, 6.18727866, 6.17520066,
6.16311385, 6.15103386, 6.13894506, 6.12686243, 6.11478725, 6.10270261,
6.09818625, 6.12128852, 6.2468456, 6.22571713, 6.24558662, 6.22445138,
6.24434288, 6.22320086, 6.24308194, 6.22194875, 6.24182062, 6.22068065,
6.24057332, 6.21942655, 6.2113264, 6.22341814, 6.19699748, 6.18490568,
6.1988361, 6.17283631, 6.16074252, 6.14867115, 6.13657473, 6.13954049,
6.16263694, 6.18482009, 6.20327221, 6.20009595, 6.19278885, 6.17005571
), lat = c(54.67598304, 54.83924292, 54.83162024, 54.82483795,
54.82033259, 54.80904336, 54.79775292, 54.78646988, 54.77517665,
54.76389082, 54.75260377, 54.74131515, 54.73002531, 54.72096456,
54.71392047, 54.70687309, 54.69983176, 54.69278713, 54.68573957,
54.68934722, 54.7027117, 54.69910571, 54.71606682, 54.71246092,
54.70614626, 54.72943123, 54.72582507, 54.71951053, 54.71576339,
54.72280423, 54.72985112, 54.74274399, 54.75402944, 54.73569581,
54.72983408, 54.7653223, 54.77660496, 54.78789538, 54.79918423,
54.81047187, 54.80230996, 54.74279524, 54.73918917, 54.74155047,
54.75043676, 54.75615956, 54.75255324, 54.75849353, 54.76951451,
54.76590829, 54.77879358, 54.78287875, 54.84106585, 54.79004116,
54.73264696, 54.7439301, 54.755221, 54.76651031, 54.77779842,
54.78908531, 54.80037062, 54.81166369, 54.82295519, 54.83631649,
54.78306731, 54.79535153, 54.77609951, 54.7883729, 54.80065457,
54.76912877, 54.78140068, 54.7936805, 54.80595963, 54.7621547,
54.77443373, 54.78671208, 54.79898973, 54.81126633, 54.75518666,
54.76746422, 54.77974071, 54.7920169, 54.80429202, 54.81656608,
54.74821493, 54.76049101, 54.7727664, 54.78504073, 54.79732298,
54.80959594, 54.82187682, 54.74124062, 54.75351485, 54.76118719,
54.77247897, 54.78376916, 54.79513973, 54.79505815, 54.80634591,
54.8176321, 54.8309898, 54.82587825, 54.81251828, 54.80340625,
54.85043439, 54.85669012, 54.84379843, 54.8629602, 54.85006754,
54.83850747, 54.86921769, 54.85633303, 54.87548056, 54.86259492,
54.88174916, 54.86885358, 54.88800555, 54.8751176, 54.89427628,
54.89688611, 54.88137801, 54.88534048, 54.87379377, 54.87235508,
54.86608859, 54.85982748, 54.85356275, 54.84730376, 54.83491117,
54.82992904, 54.84309445, 54.86224596, 54.86080945, 54.85069668,
54.84926234, 54.8391549, 54.83771415, 54.82760306, 54.82617384,
54.81401612, 54.81866848, 54.82193287, 54.83203, 54.85454492,
54.8418856, 54.84463218, 54.83126946, 54.807705, 54.81314447,
54.95082492, 54.90870481, 54.85261135, 54.8544958, 54.86597391,
54.87933643, 54.89269028, 54.90605271, 54.91941509, 54.93277779,
54.91930405, 54.92337651, 54.92712528, 54.93087901, 54.93462874,
54.93838307, 54.94213376, 54.94588009, 54.93325056, 54.86785843,
54.86365241, 54.88122101, 54.87701472, 54.89458354, 54.890377,
54.90794604, 54.9204011, 54.92916856, 54.90373959, 54.91606989,
54.92541899, 54.92166542, 54.91791682, 54.91416421, 54.91041621,
54.89753741, 54.90128459, 54.88863077, 54.8823668, 54.87609923,
54.88468492, 54.89804726, 54.89094652, 54.89720452, 54.90830415,
55.08370977, 54.93641839, 55.07226944, 55.06170442, 55.06083624,
55.04939327, 55.03795778, 55.02652115, 55.01508302, 55.00364374,
54.99220296, 54.98077001, 54.96932695, 54.95789135, 54.94469353,
54.94333719, 54.96418243, 54.9585686, 54.97754901, 54.9719349,
54.99090692, 54.98529252, 55.00427341, 54.99865908, 55.01763087,
55.01201625, 55.03099761, 55.02538271, 55.03790915, 55.04934232,
55.02208092, 55.01064489, 54.99998971, 54.99920808, 54.98776941,
54.97632995, 54.9648976, 54.95153594, 54.95007267, 54.95382515,
54.96186591, 54.98662348, 54.97394968, 54.97118391)), class = "data.frame", row.names = c(NA,
-238L))
What I have done is as follows:
Add the point of interest to the beginning of df
df = rbind(userLocation,df)
Set the radius to 0.64 since, according to here, every 0.1 is equivalent to 11.1 km!
radius <- 0.64
#Identifying neighbors
library(RANN)
res <- nn2(df, k=nrow(df), searchtype="radius", radius = radius)
Since my point of interest is the first row in df, I would expect all the non-zero indices in the first row to be the points within my 70 km threshold
Ind <- res$nn.idx[1,][res$nn.idx[1,]>0]
My Ind object has just one value!
Ind
[1] 1
but if I plot the data, all of the points appear to be within 70 km:
I would appreciate it if someone could help me here.
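A note on units that may matter here: nn2() measures plain Euclidean distance on the columns it is given, so with lng/lat columns the radius is in degrees, not km. A degree of longitude also shrinks with latitude (roughly 111 km times the cosine of the latitude), so one degree-to-km factor cannot hold in both directions. As a cross-check, here is a small sketch that computes great-circle distances directly in base R. haversine_km() is a hypothetical helper, the Earth radius of 6371 km is an assumed mean value, and only the first two rows of df are used to keep it short.

```r
# Great-circle (haversine) distance in km between one point and many points.
# Coordinates are in decimal degrees; R = 6371 km is an assumed mean Earth radius.
haversine_km <- function(lng1, lat1, lng2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))   # pmin() guards against rounding above 1
}

userLocation <- c(6.9, 55.2)                        # lng, lat
pts <- data.frame(lng = c(6.2694184, 6.25737207),   # first two rows of df
                  lat = c(54.67598304, 54.83924292))

# Distance in km from the point of interest to every row, then the 70 km filter.
d <- haversine_km(userLocation[1], userLocation[2], pts$lng, pts$lat)
within70 <- which(d <= 70)
```

Running the same filter on the full df gives the indices that can be compared against what nn2() returns for a given radius.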

Results from MATLAB's crosscorr function and R's ccf different

I'm using MATLAB's crosscorr function and R's ccf. For the same data, the results differ. It appears that the lag axis is flipped in one of them. Why is this happening?
I've reproduced the crosscorr documentation example in both platforms and this is what I see. Any help will be appreciated.
R Studio
MATLAB
The data for the example can be found here:
R data:
xx <- c(-0.649013765191241, 1.18116604196553, -0.758453297283692, -1.10961303850152, -0.845551240007797, -0.572664866457950, -0.558680764473972, 0.178380225849766, -0.196861446475943, 0.586442621667069, -0.851886969622469, 0.800320709801823, -1.50940472473439, 0.875874147834533, -0.242789536333340, 0.166813439453503, -1.96541870928278, -1.27007139263854, 1.17517126546302, 2.02916018474976, -0.275157240675694, 0.603658445825815, 1.78125189324250, 1.77365832632615, -1.86512257453063, -1.05110705924059, -0.417382047996795, 1.40216228633781, -1.36774699097611, -0.292534999151874, 1.27084843418894, 0.0660093412882059, 0.451290213630776, -0.322209718011896, 0.788409216227425, 0.928736046813314, -0.490790376269763, 1.79720058425494, 0.590696551205452, -0.635785737847226, 0.603346612845761, -0.535247967775900, -0.155080385492789, 0.612122370772160, -1.04434349451734, -0.345631908307050, -1.17140482049761, -0.685586780437283, 0.926216394168962, -1.48167521167231, -0.558057808685045, -0.0284531115706568, -1.47629235201010, 0.258899957160403, -2.01869095243834, 0.199740262298379, 0.425864319131210, -1.27004345059705, -0.485218835743043, 0.594307616829848, -0.276464906639256, -1.85758288592737, 0.0407308117494288, 0.282970177161990, 0.0635612193024994, 0.433430065111595, 0.422860364487685, 1.29952829655200, -1.04979323447507,-1.78641172211092,0.816043081031918, -0.328208543142512, -1.21456561358767,1.11183287253465, -0.507496954829846, 0.898730486034072, 0.377215659958544, 1.45239164558790, 0.446945073178942, 0.645824788453030, -0.623677409296163, -0.595236431548712, 1.61132368718055, -0.348998045314167, 0.164167484938754, -1.63657708517891, 0.581365555343623, -0.128905996910632, 0.432858634222399, -0.245109040039237, -1.08543038934632, 1.68080151955536, 0.176411940863882, -2.07143962693628, 0.211089334851037, -0.582847822547194, 0.0181688430923922, 1.49477799287395, -0.424796733441211, 1.68624315536028)
yy <- c(0, 0, 0, 0, -0.649013765191241, 1.18116604196553, -0.758453297283692, -1.10961303850152, -0.845551240007797, -0.572664866457950, -0.558680764473972, 0.178380225849766, -0.196861446475943, 0.586442621667069, -0.851886969622469, 0.800320709801823, -1.50940472473439, 0.875874147834533, -0.242789536333340, 0.166813439453503,-1.96541870928278, -1.27007139263854, 1.17517126546302, 2.02916018474976,-0.275157240675694, 0.603658445825815, 1.78125189324250, 1.77365832632615, -1.86512257453063, -1.05110705924059,-0.417382047996795, 1.40216228633781,-1.36774699097611, -0.292534999151874, 1.27084843418894, 0.0660093412882059, 0.451290213630776, -0.322209718011896, 0.788409216227425, 0.928736046813314, -0.490790376269763, 1.79720058425494, 0.590696551205452, -0.635785737847226, 0.603346612845761, -0.535247967775900, -0.155080385492789, 0.612122370772160,-1.04434349451734, -0.345631908307050,-1.17140482049761, -0.685586780437283, 0.926216394168962, -1.48167521167231,-0.558057808685045, -0.0284531115706568, -1.47629235201010, 0.258899957160403, -2.01869095243834, 0.199740262298379, 0.425864319131210, -1.27004345059705, -0.485218835743043, 0.594307616829848, -0.276464906639256, -1.85758288592737, 0.0407308117494288, 0.282970177161990, 0.0635612193024994, 0.433430065111595, 0.422860364487685, 1.29952829655200, -1.04979323447507, -1.78641172211092, 0.816043081031918, -0.328208543142512, -1.21456561358767, 1.11183287253465, -0.507496954829846, 0.898730486034072, 0.377215659958544, 1.45239164558790, 0.446945073178942, 0.645824788453030, -0.623677409296163, -0.595236431548712, 1.61132368718055, -0.348998045314167, 0.164167484938754, -1.63657708517891, 0.581365555343623, -0.128905996910632, 0.432858634222399, -0.245109040039237, -1.08543038934632, 1.68080151955536, 0.176411940863882, -2.07143962693628, 0.211089334851037,-0.582847822547194)
ccf (xx, yy)
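For reference, R's documentation for ccf states that the lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]. Under that convention a series y that is x delayed by 4 steps peaks at lag -4, and swapping the arguments moves the peak to lag +4. The sketch below checks this on simulated data built the same way as xx and yy above (the seed and the names x, y, r_xy, r_yx are only for illustration):

```r
set.seed(1)
x <- rnorm(100)
y <- c(rep(0, 4), x[1:96])      # y is x delayed by 4 steps, like yy above

r_xy <- ccf(x, y, plot = FALSE)
r_yx <- ccf(y, x, plot = FALSE)

# Lag at which each cross-correlation is largest.
peak_xy <- r_xy$lag[which.max(r_xy$acf)]
peak_yx <- r_yx$lag[which.max(r_yx$acf)]

c(peak_xy = peak_xy, peak_yx = peak_yx)
```

If the other platform shows the peak on the opposite side, swapping the two arguments in one of the calls (or negating the lag axis) is one way to line the plots up.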
Matlab data & code:
x = [-0.649013765191241
1.18116604196553
-0.758453297283692
-1.10961303850152
-0.845551240007797
-0.572664866457950
-0.558680764473972
0.178380225849766
-0.196861446475943
0.586442621667069
-0.851886969622469
0.800320709801823
-1.50940472473439
0.875874147834533
-0.242789536333340
0.166813439453503
-1.96541870928278
-1.27007139263854
1.17517126546302
2.02916018474976
-0.275157240675694
0.603658445825815
1.78125189324250
1.77365832632615
-1.86512257453063
-1.05110705924059
-0.417382047996795
1.40216228633781
-1.36774699097611
-0.292534999151874
1.27084843418894
0.0660093412882059
0.451290213630776
-0.322209718011896
0.788409216227425
0.928736046813314
-0.490790376269763
1.79720058425494
0.590696551205452
-0.635785737847226
0.603346612845761
-0.535247967775900
-0.155080385492789
0.612122370772160
-1.04434349451734
-0.345631908307050
-1.17140482049761
-0.685586780437283
0.926216394168962
-1.48167521167231
-0.558057808685045
-0.0284531115706568
-1.47629235201010
0.258899957160403
-2.01869095243834
0.199740262298379
0.425864319131210
-1.27004345059705
-0.485218835743043
0.594307616829848
-0.276464906639256
-1.85758288592737
0.0407308117494288
0.282970177161990
0.0635612193024994
0.433430065111595
0.422860364487685
1.29952829655200
-1.04979323447507
-1.78641172211092
0.816043081031918
-0.328208543142512
-1.21456561358767
1.11183287253465
-0.507496954829846
0.898730486034072
0.377215659958544
1.45239164558790
0.446945073178942
0.645824788453030
-0.623677409296163
-0.595236431548712
1.61132368718055
-0.348998045314167
0.164167484938754
-1.63657708517891
0.581365555343623
-0.128905996910632
0.432858634222399
-0.245109040039237
-1.08543038934632
1.68080151955536
0.176411940863882
-2.07143962693628
0.211089334851037
-0.582847822547194
0.0181688430923922
1.49477799287395
-0.424796733441211
1.68624315536028]
yy = [0
0
0
0
-0.649013765191241
1.18116604196553
-0.758453297283692
-1.10961303850152
-0.845551240007797
-0.572664866457950
-0.558680764473972
0.178380225849766
-0.196861446475943
0.586442621667069
-0.851886969622469
0.800320709801823
-1.50940472473439
0.875874147834533
-0.242789536333340
0.166813439453503
-1.96541870928278
-1.27007139263854
1.17517126546302
2.02916018474976
-0.275157240675694
0.603658445825815
1.78125189324250
1.77365832632615
-1.86512257453063
-1.05110705924059
-0.417382047996795
1.40216228633781
-1.36774699097611
-0.292534999151874
1.27084843418894
0.0660093412882059
0.451290213630776
-0.322209718011896
0.788409216227425
0.928736046813314
-0.490790376269763
1.79720058425494
0.590696551205452
-0.635785737847226
0.603346612845761
-0.535247967775900
-0.155080385492789
0.612122370772160
-1.04434349451734
-0.345631908307050
-1.17140482049761
-0.685586780437283
0.926216394168962
-1.48167521167231
-0.558057808685045
-0.0284531115706568
-1.47629235201010
0.258899957160403
-2.01869095243834
0.199740262298379
0.425864319131210
-1.27004345059705
-0.485218835743043
0.594307616829848
-0.276464906639256
-1.85758288592737
0.0407308117494288
0.282970177161990
0.0635612193024994
0.433430065111595
0.422860364487685
1.29952829655200
-1.04979323447507
-1.78641172211092
0.816043081031918
-0.328208543142512
-1.21456561358767
1.11183287253465
-0.507496954829846
0.898730486034072
0.377215659958544
1.45239164558790
0.446945073178942
0.645824788453030
-0.623677409296163
-0.595236431548712
1.61132368718055
-0.348998045314167
0.164167484938754
-1.63657708517891
0.581365555343623
-0.128905996910632
0.432858634222399
-0.245109040039237
-1.08543038934632
1.68080151955536
0.176411940863882
-2.07143962693628
0.211089334851037
-0.582847822547194]
[XCF,lags,bounds] = crosscorr(xx,yy);

Remove all rows above and below a value in R

We have citizen scientists recording data for us using In-Situ Aqua TROLL 600 instruments. The instrument is similar to a CTD, but the data format is a little different, different enough that I cannot use ctdTrim from the oce package in R. I need to remove all the rows of data recorded during the soak time (time in the water before they start lowering the instrument) and during the upcast (that is, all the rows after they reached the maximum depth). So I just need the center portion of my data frame.
My Data
Date Time Salinity (ppt) (672441) Chlorophyll-a Fluorescence (RFU) (671721) RDO Concentration (mg/L) (672144) Temperature (°C) (676121) Depth (ft) (671051)
16:29.0 0 0.01089297 7.257619 31.91303 0.008220486
16:31.0 0 0.01765913 7.246986 31.93175 0.1499496
16:33.0 0 0.0130412 7.258863 31.93253 0.5387784
16:35.0 0 0.01299242 7.274049 31.93806 0.6187978
16:37.0 0 0.01429801 7.26965 31.94401 0.6640261
16:39.0 0 0.01342988 7.271608 31.93595 0.681709
16:41.0 0 0.01337719 7.271549 31.93503 0.684597
16:43.0 7.087267 0.007094439 6.98015 31.89018 1.598019
16:45.0 28.3442 0.007111916 6.268753 31.83806 1.687673
16:47.0 31.06357 0.007945394 6.197834 31.77821 1.418773
16:49.0 32.07076 0.0080788 6.166986 31.76881 1.382685
16:51.0 31.95504 0.004382414 6.191305 31.72906 1.358556
16:53.0 36.21165 0.01983912 5.732656 29.3942 123.4148
16:55.0 36.37849 0.02243886 5.626586 28.82502 125.2927
16:57.0 36.43061 0.02416219 5.450325 28.23787 126.7997
16:59.0 36.44484 0.02441683 5.421676 28.14037 127.0321
17:01.0 36.46815 4.510316 5.318929 28.09501 127.2064
17:03.0 36.41381 4.012657 5.241654 28.14595 127.2227
17:05.0 36.42724 0.7891375 5.174401 28.20383 127.2019
17:07.0 36.41064 0.4351442 5.120181 28.18592 127.197
17:09.0 36.38155 0.2253969 5.033384 28.21021 127.1895
17:11.0 36.37671 0.2089337 5.019629 28.21222 127.1885
17:13.0 36.43813 0.08728585 4.981099 28.17526 127.2223
17:15.0 36.47644 0.904435 4.951878 28.13579 127.2108
17:17.0 36.54742 0.1230291 4.93056 28.06166 127.2307
17:19.0 36.60466 10.04291 4.908442 27.9397 126.6003
17:21.0 36.61511 11.33922 4.904828 27.92038 126.5161
17:23.0 36.68179 0.6680982 4.87018 27.78319 123.707
17:25.0 36.74612 0.06539913 4.848994 27.72977 119.906
17:27.0 36.75729 0.02414635 4.826871 27.72545 114.9537
17:29.0 37.1578 0.01556828 4.804105 27.81129 113.3405
> depthmax<- max(WS$`Depth (ft) (671051)`, na.rm = TRUE)
> output <- WS[WS$"Depth (ft) (671051)" < depthmax,]
> Output2 <- output[output$"Depth (ft) (671051)" > 1,]
I tried these and got Output2 to work, but I can't seem to get output to work. Is there a more elegant way to do this? To recap, I need to remove all rows after depthmax (127.2307) and all rows before the depth at which they start lowering the instrument (~2.41).
Your code removes the row with the maximum depth, but not the rows recorded after the maximum depth is reached. You want to locate the row index of the maximum depth and delete that row and the ones after it:
# Row after the last reading shallower than 2.41 (end of the soak)
start <- tail(which(WS$`Depth (ft) (671051)` < 2.41), 1) + 1
# Row just before the maximum depth
end <- which.max(WS$`Depth (ft) (671051)`) - 1
output <- WS[start:end, ]
The start line finds the index of the last row with a depth below 2.41 and adds 1 to get the starting row. The end line finds the index of the maximum depth and subtracts 1 to get the row before it. Both which() and which.max() skip NA values on their own, so the indices refer directly to rows of WS.
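As a rough illustration of the same idea, here is a minimal sketch on made-up depths. The helper name trim_downcast, the toy data frame cast, and its column names are assumptions for the example, not part of the original data. Unlike the snippet above, this version keeps the deepest reading and only looks for soak rows before the maximum depth, so shallow readings on the upcast cannot move the start.

```r
# Toy cast: soak near the surface, downcast, then upcast (values are made up)
cast <- data.frame(
  time  = 1:10,
  depth = c(0.1, 0.6, 0.7, 1.6, 40, 90, 127, 126, 110, 60)
)

# Hypothetical helper: keep rows after the soak, up to and including the deepest row
trim_downcast <- function(df, depth_col, soak_depth) {
  depth <- df[[depth_col]]
  deepest <- which.max(depth)  # index of the first row at maximum depth
  # Soak rows are searched only before the deepest reading
  soak_rows <- which(depth[seq_len(deepest)] < soak_depth)
  start <- if (length(soak_rows) > 0) max(soak_rows) + 1 else 1
  if (start > deepest) return(df[0, , drop = FALSE])  # nothing left to keep
  df[start:deepest, , drop = FALSE]
}

down <- trim_downcast(cast, "depth", soak_depth = 2.41)
print(down)
```

With the real data the call would presumably look like trim_downcast(WS, "Depth (ft) (671051)", 2.41), with the threshold chosen to suit each cast.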

Plotting multiple series (scatter line) with same x axis on one plot [duplicate]

I have a compositional data set. I have a set of columns (samples) which contain percentage data. Each row (channel-diameter in my case) is therefore a particular variable that each sample has a percentage of. E.g.
Channel diameter (um) sample2 sample3 sample8 sample9 sample17
0.375198 0.0365797 0.0424338 0.0158648 0.02944 0.0157091
0.411878 0.0647681 0.0750611 0.0280678 0.052028 0.0278099
0.452145 0.0956633 0.111489 0.0415484 0.0770551 0.0410209
0.496347 0.137893 0.162464 0.0601572 0.111755 0.0589772
0.544872 0.175746 0.210556 0.0771818 0.143911 0.0748565
0.59814 0.210752 0.257403 0.0932129 0.174446 0.089273
0.656615 0.244288 0.304665 0.10884 0.204511 0.102797
0.720807 0.278281 0.354677 0.124906 0.235612 0.11622
0.791275 0.31069 0.405324 0.140553 0.266354 0.128626
0.868632 0.339832 0.454374 0.15495 0.295125 0.139238
0.953552 0.365523 0.500985 0.167898 0.321535 0.147978
1.04677 0.387791 0.544478 0.179338 0.345493 0.154899
1.14911 0.407715 0.585383 0.189749 0.367873 0.160534
1.26145 0.424342 0.622144 0.1988 0.388226 0.164562
1.38477 0.437851 0.654347 0.206637 0.406776 0.167147
1.52015 0.448418 0.681951 0.213521 0.424175 0.168487
1.66876 0.457694 0.706822 0.220449 0.442197 0.169372
1.8319 0.466729 0.730714 0.228307 0.462539 0.170336
2.011 0.476516 0.755269 0.237889 0.48627 0.171799
2.2076 0.487906 0.782015 0.249849 0.514036 0.174083
2.42342 0.501736 0.81248 0.264752 0.546016 0.177432
2.66033 0.51929 0.848837 0.283331 0.582431 0.182235
2.92042 0.541324 0.892608 0.305976 0.62241 0.188562
3.20592 0.568374 0.944571 0.332691 0.663758 0.196293
3.51934 0.599897 1.00394 0.362726 0.702966 0.204848
3.8634 0.635522 1.06984 0.395209 0.736726 0.213754
4.2411 0.674643 1.14148 0.429266 0.762942 0.222574
4.65572 0.717242 1.21878 0.464337 0.780965 0.231205
5.11087 0.76318 1.30134 0.499874 0.791079 0.23963
5.61052 0.812207 1.38818 0.535257 0.794286 0.247904
6.15902 0.863791 1.478 0.570021 0.793137 0.256198
6.76114 0.917491 1.56991 0.604296 0.792304 0.264896
7.42212 0.973638 1.66349 0.638955 0.797465 0.274726
8.14773 1.03178 1.75579 0.674653 0.812076 0.286046
8.94427 1.09013 1.83889 0.710974 0.834453 0.298613
9.81869 1.14346 1.89853 0.745919 0.857295 0.310908
10.7786 1.18666 1.92001 0.77754 0.871976 0.321507
11.8323 1.21701 1.89513 0.805316 0.873849 0.329678
12.9891 1.23962 1.82973 0.830937 0.865868 0.336774
14.2589 1.26532 1.74176 0.855741 0.855011 0.345067
15.6529 1.30625 1.65369 0.877552 0.846228 0.35632
17.1832 1.37039 1.58411 0.889331 0.838007 0.370876
18.863 1.45674 1.54142 0.881201 0.822809 0.387127
20.7071 1.55939 1.52665 0.846803 0.793656 0.403864
22.7315 1.6691 1.53505 0.7877 0.749241 0.42062
24.9538 1.78027 1.56136 0.714575 0.696582 0.438749
27.3934 1.89095 1.60072 0.642402 0.646703 0.459818
30.0714 2.00065 1.64851 0.583683 0.608531 0.484033
33.0113 2.10867 1.70138 0.544639 0.585721 0.50996
36.2385 2.21043 1.75647 0.52405 0.57563 0.534436
39.7813 2.30025 1.81358 0.51599 0.572488 0.554934
43.6704 2.37323 1.87431 0.512943 0.570144 0.570926
47.9397 2.42843 1.9405 0.509019 0.564741 0.584767
52.6264 2.47132 2.01249 0.502374 0.556494 0.60136
57.7713 2.51141 2.08592 0.494986 0.549106 0.626145
63.4192 2.55913 2.15193 0.491467 0.549135 0.663665
69.6192 2.62015 2.19769 0.496593 0.563727 0.716063
76.4253 2.69185 2.2104 0.513592 0.598638 0.782837
83.8969 2.7645 2.18362 0.543656 0.656535 0.861711
92.0988 2.826 2.12191 0.585723 0.735146 0.949471
101.103 2.86765 2.04062 0.637396 0.827534 1.04325
110.987 2.88366 1.95552 0.695157 0.923739 1.1402
121.837 2.86566 1.87145 0.756528 1.01592 1.23771
133.748 2.79489 1.77514 0.820962 1.10124 1.33218
146.824 2.64552 1.64295 0.891174 1.18226 1.41966
161.177 2.39707 1.45793 0.970232 1.26132 1.49497
176.935 2.05403 1.2272 1.05834 1.3359 1.55358
194.232 1.65431 0.983299 1.14894 1.39739 1.59279
213.221 1.25961 0.76806 1.2303 1.43626 1.61461
234.066 0.932009 0.612117 1.29251 1.45138 1.62797
256.948 0.708748 0.526414 1.33762 1.45415 1.64839
282.068 0.596048 0.507822 1.3814 1.46446 1.69346
309.644 0.577959 0.544827 1.44754 1.49861 1.77678
339.916 0.628404 0.619653 1.55335 1.5589 1.90214
373.147 0.712328 0.706958 1.70468 1.63116 2.06529
409.626 0.785738 0.771807 1.88764 1.69265 2.25554
449.672 0.808987 0.78167 2.07784 1.72831 2.46479
493.633 0.763715 0.72189 2.24938 1.74795 2.69133
541.892 0.665773 0.613547 2.40785 1.79079 2.94904
594.869 0.562534 0.5129 2.57317 1.90894 3.24541
653.025 0.501094 0.468972 2.78062 2.1497 3.57554
716.866 0.509195 0.50887 3.03373 2.53054 3.89743
786.949 0.584288 0.63571 3.33205 3.02726 4.16047
863.883 0.682625 0.805693 3.64631 3.56815 4.31705
948.338 0.736664 0.946139 3.93691 4.04493 4.34466
1041.05 0.679724 0.960431 4.14637 4.34887 4.24767
1142.83 0.462905 0.717301 4.29187 4.39208 4.08343
1254.55 0.212328 0.364022 4.3391 4.11532 3.85551
1377.2 0.0459161 0.0848405 4.29583 3.5172 3.56707
1511.84 0.00420859 0.00868247 4.08845 2.63958 3.15498
1659.64 0 0 3.86265 1.92542 2.794
1821.89 0 0 3.64037 1.16644 2.40284
2000
I would like to plot each sample as a scatter(line) on the same graph. X axis would be channel diameter (the rows), and y axis would be the percentage data that's in the columns.
Most things I've tried don't seem to recognize the first column as the x axis value.
Create an empty plot whose axis limits cover your data:
plot(x=NA, y=NA, xlim=range(df[,1], na.rm=TRUE), ylim=range(unlist(df[,-1]), na.rm=TRUE),
     xlab="My X Label", ylab="My Y Label",
     main="My Title")
Then add your lines one at a time:
for(i in 2:ncol(df)) {
  lines(x=df[,1], y=df[,i])
}
This code assumes your data.frame with your data is called df and that you want to plot all columns as y variables except the first column which you treat as your x variable.
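A possible shortcut, sketched here under the same assumption that the first column holds the x values, is matplot() from base graphics, which draws every column of a matrix against a common x in one call. The data frame below reuses only the first four rows and two samples from the question, and the log-scaled x axis is my own choice because the channel diameters span several orders of magnitude.

```r
# Small stand-in for the table in the question (first column is the x value)
df <- data.frame(
  diameter = c(0.375198, 0.411878, 0.452145, 0.496347),
  sample2  = c(0.0365797, 0.0647681, 0.0956633, 0.137893),
  sample3  = c(0.0424338, 0.0750611, 0.111489, 0.162464)
)

x <- df[[1]]            # shared x axis: channel diameter
y <- as.matrix(df[-1])  # one column per sample
cols <- seq_len(ncol(y))

# type = "b" draws both points and lines for every column of y
matplot(x, y, type = "b", pch = 1, lty = 1, col = cols, log = "x",
        xlab = "Channel diameter (um)", ylab = "Percentage")
legend("topleft", legend = colnames(y), col = cols, lty = 1, pch = 1)
```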

How can I apply fisher test on this set of data (nominal variables)

I'm pretty new in statistics:
fisher = function(idxToTest, idxATI){
  idxDependent=c()
  dependent=c()
  p = c()
  for(i in c(1:length(idxToTest)))
  {
    tbl = table(data[[idxToTest[i]]], data[[idxATI]])
    rez = fisher.test(tbl, workspace = 20000000000)
    if(rez$p.value<0.1){
      dependent=c(dependent, TRUE)
      if(rez$p.value<0.1){
        idxDependent = c(idxDependent, idxToTest[i])
      }
    }
    else{
      dependent = c(dependent, FALSE)
    }
    p = c(p, rez$p.value)
  }
}
This is the function I use. It seems to work.
What I have understood so far is that I have to pass, as the first parameter, data like:
Men Women
Dieting 10 30
Non-dieting 5 60
My data comes from a CSV:
data = read.csv('***.csv', header = TRUE, sep=',');
My first problem is that I don't know how to convert from:
Loan.Purpose Home.Ownership
lp_value_1 ho_value_2
lp_value_1 ho_value_2
lp_value_2 ho_value_1
lp_value_3 ho_value_2
lp_value_2 ho_value_3
lp_value_4 ho_value_2
lp_value_3 ho_value_3
to:
           ho_value_1 ho_value_2 ho_value_3
lp_value_1          0          2          0
lp_value_2          1          0          1
lp_value_3          0          1          1
lp_value_4          0          1          0
The second issue is that I don't know what the second parameter should be.
POST UPDATE: This is what I get using fisher.test(myTable):
Error in fisher.test(test) : FEXACT error 501.
The hash table key cannot be computed because the largest key
is larger than the largest representable int.
The algorithm cannot proceed.
Reduce the workspace size or use another algorithm.
where myTable is:
MORTGAGE NONE OTHER OWN RENT
car 18 0 0 5 27
credit_card 190 0 2 38 214
debt_consolidation 620 0 2 87 598
educational 5 0 0 3 7
...
Basically, Fisher tests only work on smallish data sets because they require a lot of memory. But all is good, because chi-square tests make minimal additional assumptions and are easier on the computer. Just do:
chisq.test(data$Loan.Purpose, data$Home.Ownership)
to get your p-values.
Make sure you read through and understand the help page for chisq.test, especially the examples at the bottom.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/chisq.test.html
Then look at a mosaic plot to see the quantities, like:
mosaicplot(table(data$Loan.Purpose, data$Home.Ownership))
This reference explains how mosaic plots work:
http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day12/
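On the first part of the question (getting from two columns to a contingency table), table() already does the conversion. The sketch below rebuilds the seven example rows from the question and cross-tabulates them. As a further option that is not from the answer above, fisher.test() accepts simulate.p.value = TRUE, which estimates the p-value by Monte Carlo simulation instead of the exact algorithm that needs the large workspace; whether a simulated p-value is acceptable for your analysis is an assumption you would need to check.

```r
# The seven example rows from the question
data <- data.frame(
  Loan.Purpose   = c("lp_value_1", "lp_value_1", "lp_value_2", "lp_value_3",
                     "lp_value_2", "lp_value_4", "lp_value_3"),
  Home.Ownership = c("ho_value_2", "ho_value_2", "ho_value_1", "ho_value_2",
                     "ho_value_3", "ho_value_2", "ho_value_3")
)

# table() cross-tabulates the two columns into a contingency table
tbl <- table(data$Loan.Purpose, data$Home.Ownership)
print(tbl)

# Monte Carlo p-value: B random tables with the same margins are simulated
set.seed(1)  # only to make the simulated p-value reproducible
rez <- fisher.test(tbl, simulate.p.value = TRUE, B = 2000)
print(rez$p.value)
```

The same tbl object can be passed to chisq.test() or mosaicplot(), so the table only has to be built once.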

Resources