cut() function and controlling the frequency in the intervals - R
My question is pretty simple: the cut() function lets me choose the breaks used to divide the range of my vector into intervals. I would like to be able to control the number of observations in each newly created interval, similar to what I get when I pass quantiles as the breaks in the cut() call. However, I don't want to use quantiles, because I want the intervals to be fixed: that way I can match them between different databases for further comparison, and the same discrete values appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z <- cut(df$x, quantile(df$x, (0:10)/10), include.lowest = TRUE)
That is fairly simple. My new approach is even simpler; it looks something like this:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable, y, on which I want to calculate some statistics according to the levels of the discrete variable. It goes something like this:
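# cut() labels its levels as intervals such as "(0.055,0.06]", not "1", "2", ...,
# so compare the integer level codes instead
zi <- as.integer(df$z)
df$conf.intx <- ifelse(zi == 1, t.test(df$y[zi == 1])$conf.int[1],
                ifelse(zi == 2, t.test(df$y[zi == 2])$conf.int[1],
                ifelse(zi == 3, t.test(df$y[zi == 3])$conf.int[1],
                ifelse(zi == 4, t.test(df$y[zi == 4])$conf.int[1], NA))))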
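A sketch that avoids the nested ifelse() chain, assuming y is the numeric variable and z is the factor returned by cut(): compute the lower confidence bound once per level with tapply(), then look it up for each row by its level. The helpers lower_ci and group_lower_ci are hypothetical names, not functions from any package. Groups with fewer than two values, or with constant values, get NA, because t.test() raises an error on them.

```r
# Lower bound of a one-sample t-test confidence interval (95% by default).
# Returns NA when t.test() cannot run: fewer than two values or constant data.
lower_ci <- function(v, conf.level = 0.95) {
  v <- v[!is.na(v)]
  if (length(v) < 2) return(NA_real_)
  tryCatch(t.test(v, conf.level = conf.level)$conf.int[1],
           error = function(e) NA_real_)
}

# Compute the bound once per level of z, then give every row the bound of
# its own level (rows with an NA level get NA).
group_lower_ci <- function(y, z) {
  bounds <- tapply(y, z, lower_ci)
  as.vector(bounds[as.character(z)])
}
```

Usage would be df$conf.intx <- group_lower_ci(df$y, df$z). Because the lookup is by level name, it works for any number of intervals and whatever labels cut() produced.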
But to calculate this kind of t-test confidence interval on each 'pool' of y values (each pool has as many values as there are observations in the corresponding interval of the discrete variable), I need to control the number of values in each interval of z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for z so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
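One possible automated procedure, sketched under stated assumptions: restrict candidate breaks to a fixed grid of "round" values (multiples of a hypothetical step, here 0.1), then sweep from left to right and close an interval as soon as every data set has at least n_min values in it. Because the counts come from all data sets at once, the resulting breaks are shared by both databases. shared_breaks is a hypothetical helper, not a function from any package, and the breaks depend on the data sets passed in, so a database added later is not guaranteed to meet the minimum.

```r
# Hypothetical helper: one set of breaks, shared by all samples, such that
# every interval holds at least n_min values in *each* sample.
# Breaks are restricted to multiples of `step`, so they stay "round".
shared_breaks <- function(samples, n_min, step = 0.1) {
  rng <- range(unlist(samples), na.rm = TRUE)
  # Pad the grid by one step on each side so no value falls outside it
  grid <- seq(floor(rng[1] / step) - 1, ceiling(rng[2] / step) + 1) * step
  ncell <- length(grid) - 1
  # counts[i, j]: number of values of sample j in grid cell i
  counts <- matrix(
    vapply(samples, function(s) {
      cell <- cut(s, grid, include.lowest = TRUE, labels = FALSE)
      tabulate(cell, nbins = ncell)
    }, integer(ncell)),
    nrow = ncell
  )
  breaks <- grid[1]
  acc <- integer(ncol(counts))
  for (i in seq_len(ncell)) {
    acc <- acc + counts[i, ]
    # Close the current interval once every sample has n_min values in it
    if (all(acc >= n_min)) {
      breaks <- c(breaks, grid[i + 1])
      acc[] <- 0L
    }
  }
  # Leftover cells at the top are merged into the last closed interval;
  # if no interval could be closed at all, return a single interval
  if (length(breaks) == 1) {
    breaks <- c(breaks, grid[length(grid)])
  } else {
    breaks[length(breaks)] <- grid[length(grid)]
  }
  breaks
}
```

Assuming the two samples are in data frames df1 and df2, usage would be brks <- shared_breaks(list(df1$x, df2$x), n_min = 30), followed by cut(df1$x, brks, include.lowest = TRUE) and cut(df2$x, brks, include.lowest = TRUE). Since leftover values are merged upward, the top interval can hold more values than the others.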
Any help on the matter would be welcome. Thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
Except that I need this to be reproducible on another database, so I can't use the quantile() function: it would give me different intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m: desired minimum number of observations in a group.
The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar, you can extract the exact cutpoints by setting the argument onlycuts = TRUE and reuse them for the other variables. If the distributions differ, though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)  # for %>%
library(Hmisc)     # for cut2()
# df1, df2: the two samples from the question
cuts <- cut2(df1$x, g = 20, onlycuts = TRUE) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
cut2(df2$x, cuts = cuts) %>% table * 2 # df2 has roughly half as many rows, so doubled for easier comparison
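If you would rather not depend on Hmisc or magrittr, here is a base-R sketch of the same idea: take quantile cutpoints from one reference sample and reuse them, replacing the outer breaks with -Inf and Inf so that values of the other sample outside the reference range still land in an interval. ref_breaks is a hypothetical helper; as with cut2, the second sample only gets similar counts per interval if its distribution is similar to the reference.

```r
# Cutpoints at the 0, 1/g, ..., 1 quantiles of a reference sample, with the
# outer breaks opened to -Inf/Inf so other samples never fall outside them.
ref_breaks <- function(ref, g) {
  b <- unique(quantile(ref, probs = seq(0, 1, length.out = g + 1),
                       na.rm = TRUE, names = FALSE))
  stopifnot(length(b) >= 2)  # a constant reference sample cannot be split
  b[1] <- -Inf
  b[length(b)] <- Inf
  b
}
```

Assuming df1 and df2 as above: brks <- ref_breaks(df1$x, g = 20), then table(cut(df1$x, brks)) and table(cut(df2$x, brks)). The breaks are fixed once computed, so the labels match across both tables.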
This is a good example of how NOT to pose a question. Now that we finally have an example, it is possible to post code that applies to it. (You apparently naively pasted the exact code from my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
100 100 100 100 100 100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
100 100 62
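In the call above, the first break is the 100/N quantile, so the lowest values (about 100 of them) fall outside every interval and become NA; the printed counts sum to 862 of the 962 rows. A sketch that starts the probabilities at 0 and uses include.lowest = TRUE keeps them, with the last group taking the remainder. cut_by_size is a hypothetical helper name.

```r
# Cut v into consecutive groups of n values by rank; the last group takes
# the remainder. Probabilities run 0, n/N, 2n/N, ..., 1 so no value is lost.
# Group sizes are exact when there are no ties; unique() guards against
# duplicated breaks when there are.
cut_by_size <- function(v, n) {
  N <- sum(!is.na(v))
  probs <- unique(c(seq(0, N, by = n), N) / N)
  cut(v, breaks = unique(quantile(v, probs = probs, na.rm = TRUE)),
      include.lowest = TRUE)
}
```

With the data frame named x as above, usage would be x$xct <- cut_by_size(x$x, 100). These breaks are still quantile-based, so they differ from one database to the next.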
Related
radius in nn2() function in RANN r-package
I was trying to use the solution offered here to find all the location from df which are within the 70 km distance from my point of interest userLocation=c(6.9,55.2), but it does not work properly ! df = structure(list(lng = c(6.2694184, 6.25737207, 6.23839104, 6.25844252, 6.22595901, 6.21351832, 6.2010845, 6.1886414, 6.1762058, 6.1637609, 6.15132287, 6.13887619, 6.12643637, 6.14361895, 6.16332364, 6.18302157, 6.2027276, 6.22242688, 6.24213488, 6.26842752, 6.26745135, 6.24518597, 6.26645948, 6.24420242, 6.22357831, 6.26548171, 6.24321746, 6.2226023, 6.20041884, 6.18070459, 6.16099845, 6.16716672, 6.17960629, 6.18686265, 6.2078525, 6.19203657, 6.20447434, 6.21691835, 6.2293537, 6.24179593, 6.26009321, 6.26448764, 6.2422317, 6.21927538, 6.20186455, 6.26350828, 6.24124514, 6.22028969, 6.26251321, 6.2402584, 6.23404584, 6.26153227, 6.22171658, 5.94065657, 6.10363006, 6.11606487, 6.12850589, 6.14093826, 6.15337749, 6.16582359, 6.17826103, 6.19070472, 6.20313974, 6.20009703, 5.96044213, 5.96988333, 5.98023582, 5.98966667, 5.99910246, 6.00003829, 6.00947365, 6.01889843, 6.02832882, 6.01983402, 6.02925771, 6.038687, 6.0481219, 6.05754688, 6.03963788, 6.04906608, 6.05848435, 6.06792377, 6.07735326, 6.08677283, 6.05941948, 6.06885218, 6.07829049, 6.08771889, 6.09713671, 6.10657633, 6.11600538, 6.07922538, 6.08864707, 6.10756108, 6.12000483, 6.13243993, 6.12019786, 6.14488189, 6.15733073, 6.16977091, 6.16621949, 6.13805015, 6.13652024, 5.941545, 6.20491484, 6.18423897, 6.17806466, 6.16355552, 6.15738696, 6.14558294, 6.14286638, 6.13670293, 6.12217027, 6.11601258, 6.10148275, 6.09533146, 6.08080511, 6.07464337, 6.06011984, 6.03729438, 6.05394895, 6.02546329, 6.0136389, 6.03674112, 6.05743408, 6.07812006, 6.09879971, 6.11948795, 6.11063647, 6.08914275, 6.08440881, 6.0018212, 6.02491713, 5.98999461, 6.01308427, 5.97815849, 6.00125809, 5.96632973, 5.98943792, 5.9995124, 6.02119838, 6.04364466, 6.0223476, 6.04560587, 6.03821257, 6.06131821, 6.06046748, 5.97888909, 5.95766873, 
6.24771247, 6.04931495, 6.25538943, 6.23227728, 6.25434093, 6.25329159, 6.25225759, 6.25120656, 6.25015469, 6.24911757, 6.06338238, 6.08539205, 6.10756976, 6.12975108, 6.15193667, 6.17411029, 6.19630377, 6.21848591, 6.22602495, 6.23123663, 6.20931486, 6.23019515, 6.20826628, 6.22915282, 6.20721685, 6.22810966, 6.21962063, 6.20209266, 6.20618216, 6.19702482, 6.1799057, 6.15772301, 6.13554395, 6.11336914, 6.09118237, 6.09738412, 6.11958004, 6.12698723, 6.14767387, 6.16835417, 6.18613747, 6.185096, 6.165456, 6.14476821, 6.15765091, 6.23561071, 6.08001353, 6.22353732, 6.2376767, 6.21143885, 6.19936347, 6.18727866, 6.17520066, 6.16311385, 6.15103386, 6.13894506, 6.12686243, 6.11478725, 6.10270261, 6.09818625, 6.12128852, 6.2468456, 6.22571713, 6.24558662, 6.22445138, 6.24434288, 6.22320086, 6.24308194, 6.22194875, 6.24182062, 6.22068065, 6.24057332, 6.21942655, 6.2113264, 6.22341814, 6.19699748, 6.18490568, 6.1988361, 6.17283631, 6.16074252, 6.14867115, 6.13657473, 6.13954049, 6.16263694, 6.18482009, 6.20327221, 6.20009595, 6.19278885, 6.17005571 ), lat = c(54.67598304, 54.83924292, 54.83162024, 54.82483795, 54.82033259, 54.80904336, 54.79775292, 54.78646988, 54.77517665, 54.76389082, 54.75260377, 54.74131515, 54.73002531, 54.72096456, 54.71392047, 54.70687309, 54.69983176, 54.69278713, 54.68573957, 54.68934722, 54.7027117, 54.69910571, 54.71606682, 54.71246092, 54.70614626, 54.72943123, 54.72582507, 54.71951053, 54.71576339, 54.72280423, 54.72985112, 54.74274399, 54.75402944, 54.73569581, 54.72983408, 54.7653223, 54.77660496, 54.78789538, 54.79918423, 54.81047187, 54.80230996, 54.74279524, 54.73918917, 54.74155047, 54.75043676, 54.75615956, 54.75255324, 54.75849353, 54.76951451, 54.76590829, 54.77879358, 54.78287875, 54.84106585, 54.79004116, 54.73264696, 54.7439301, 54.755221, 54.76651031, 54.77779842, 54.78908531, 54.80037062, 54.81166369, 54.82295519, 54.83631649, 54.78306731, 54.79535153, 54.77609951, 54.7883729, 54.80065457, 54.76912877, 54.78140068, 54.7936805, 
54.80595963, 54.7621547, 54.77443373, 54.78671208, 54.79898973, 54.81126633, 54.75518666, 54.76746422, 54.77974071, 54.7920169, 54.80429202, 54.81656608, 54.74821493, 54.76049101, 54.7727664, 54.78504073, 54.79732298, 54.80959594, 54.82187682, 54.74124062, 54.75351485, 54.76118719, 54.77247897, 54.78376916, 54.79513973, 54.79505815, 54.80634591, 54.8176321, 54.8309898, 54.82587825, 54.81251828, 54.80340625, 54.85043439, 54.85669012, 54.84379843, 54.8629602, 54.85006754, 54.83850747, 54.86921769, 54.85633303, 54.87548056, 54.86259492, 54.88174916, 54.86885358, 54.88800555, 54.8751176, 54.89427628, 54.89688611, 54.88137801, 54.88534048, 54.87379377, 54.87235508, 54.86608859, 54.85982748, 54.85356275, 54.84730376, 54.83491117, 54.82992904, 54.84309445, 54.86224596, 54.86080945, 54.85069668, 54.84926234, 54.8391549, 54.83771415, 54.82760306, 54.82617384, 54.81401612, 54.81866848, 54.82193287, 54.83203, 54.85454492, 54.8418856, 54.84463218, 54.83126946, 54.807705, 54.81314447, 54.95082492, 54.90870481, 54.85261135, 54.8544958, 54.86597391, 54.87933643, 54.89269028, 54.90605271, 54.91941509, 54.93277779, 54.91930405, 54.92337651, 54.92712528, 54.93087901, 54.93462874, 54.93838307, 54.94213376, 54.94588009, 54.93325056, 54.86785843, 54.86365241, 54.88122101, 54.87701472, 54.89458354, 54.890377, 54.90794604, 54.9204011, 54.92916856, 54.90373959, 54.91606989, 54.92541899, 54.92166542, 54.91791682, 54.91416421, 54.91041621, 54.89753741, 54.90128459, 54.88863077, 54.8823668, 54.87609923, 54.88468492, 54.89804726, 54.89094652, 54.89720452, 54.90830415, 55.08370977, 54.93641839, 55.07226944, 55.06170442, 55.06083624, 55.04939327, 55.03795778, 55.02652115, 55.01508302, 55.00364374, 54.99220296, 54.98077001, 54.96932695, 54.95789135, 54.94469353, 54.94333719, 54.96418243, 54.9585686, 54.97754901, 54.9719349, 54.99090692, 54.98529252, 55.00427341, 54.99865908, 55.01763087, 55.01201625, 55.03099761, 55.02538271, 55.03790915, 55.04934232, 55.02208092, 55.01064489, 54.99998971, 
54.99920808, 54.98776941, 54.97632995, 54.9648976, 54.95153594, 54.95007267, 54.95382515, 54.96186591, 54.98662348, 54.97394968, 54.97118391)), class = "data.frame", row.names = c(NA, -238L)) What I have done is as follow : Add the point of interest to the beginning of df df = rbind(userLocation,df) Set the radius to 0.64 since according to here, every 0.1 is equivalent to 11.1 km ! radius <- 0.64 #Identifying neighbors res <- nn2(df, k=nrow(df), searchtype="radius", radius = radius) Since my point of interest is the first row in df I would expect all the non zero index in the first row are the points within my 70 km threshold Ind <- res$nn.idx[1,][res$nn.idx[1,]>0] My Ind object has just one value! Ind [1] 1 but if I plot the data, all of the points are within 70 km distance : I would appreciate it if someone could help me here.
Results from MATLAB's crosscorr function and R's ccf different
I'm using MATLAB's crosscorr function and R's ccf. For the same data, the results differ. It appears that the lag axis is flipped in one of them. Why is this happening? I've reproduced the crosscorr documentation example in both platforms and this is what I see. Any help will be appreciated. R Studio MATLAB The data for the example can be found here: R data: xx <- c(-0.649013765191241, 1.18116604196553, -0.758453297283692, -1.10961303850152, -0.845551240007797, -0.572664866457950, -0.558680764473972, 0.178380225849766, -0.196861446475943, 0.586442621667069, -0.851886969622469, 0.800320709801823, -1.50940472473439, 0.875874147834533, -0.242789536333340, 0.166813439453503, -1.96541870928278, -1.27007139263854, 1.17517126546302, 2.02916018474976, -0.275157240675694, 0.603658445825815, 1.78125189324250, 1.77365832632615, -1.86512257453063, -1.05110705924059, -0.417382047996795, 1.40216228633781, -1.36774699097611, -0.292534999151874, 1.27084843418894, 0.0660093412882059, 0.451290213630776, -0.322209718011896, 0.788409216227425, 0.928736046813314, -0.490790376269763, 1.79720058425494, 0.590696551205452, -0.635785737847226, 0.603346612845761, -0.535247967775900, -0.155080385492789, 0.612122370772160, -1.04434349451734, -0.345631908307050, -1.17140482049761, -0.685586780437283, 0.926216394168962, -1.48167521167231, -0.558057808685045, -0.0284531115706568, -1.47629235201010, 0.258899957160403, -2.01869095243834, 0.199740262298379, 0.425864319131210, -1.27004345059705, -0.485218835743043, 0.594307616829848, -0.276464906639256, -1.85758288592737, 0.0407308117494288, 0.282970177161990, 0.0635612193024994, 0.433430065111595, 0.422860364487685, 1.29952829655200, -1.04979323447507,-1.78641172211092,0.816043081031918, -0.328208543142512, -1.21456561358767,1.11183287253465, -0.507496954829846, 0.898730486034072, 0.377215659958544, 1.45239164558790, 0.446945073178942, 0.645824788453030, -0.623677409296163, -0.595236431548712, 1.61132368718055, -0.348998045314167, 0.164167484938754, 
-1.63657708517891, 0.581365555343623, -0.128905996910632, 0.432858634222399, -0.245109040039237, -1.08543038934632, 1.68080151955536, 0.176411940863882, -2.07143962693628, 0.211089334851037, -0.582847822547194, 0.0181688430923922, 1.49477799287395, -0.424796733441211, 1.68624315536028) yy <- c(0, 0, 0, 0, -0.649013765191241, 1.18116604196553, -0.758453297283692, -1.10961303850152, -0.845551240007797, -0.572664866457950, -0.558680764473972, 0.178380225849766, -0.196861446475943, 0.586442621667069, -0.851886969622469, 0.800320709801823, -1.50940472473439, 0.875874147834533, -0.242789536333340, 0.166813439453503,-1.96541870928278, -1.27007139263854, 1.17517126546302, 2.02916018474976,-0.275157240675694, 0.603658445825815, 1.78125189324250, 1.77365832632615, -1.86512257453063, -1.05110705924059,-0.417382047996795, 1.40216228633781,-1.36774699097611, -0.292534999151874, 1.27084843418894, 0.0660093412882059, 0.451290213630776, -0.322209718011896, 0.788409216227425, 0.928736046813314, -0.490790376269763, 1.79720058425494, 0.590696551205452, -0.635785737847226, 0.603346612845761, -0.535247967775900, -0.155080385492789, 0.612122370772160,-1.04434349451734, -0.345631908307050,-1.17140482049761, -0.685586780437283, 0.926216394168962, -1.48167521167231,-0.558057808685045, -0.0284531115706568, -1.47629235201010, 0.258899957160403, -2.01869095243834, 0.199740262298379, 0.425864319131210, -1.27004345059705, -0.485218835743043, 0.594307616829848, -0.276464906639256, -1.85758288592737, 0.0407308117494288, 0.282970177161990, 0.0635612193024994, 0.433430065111595, 0.422860364487685, 1.29952829655200, -1.04979323447507, -1.78641172211092, 0.816043081031918, -0.328208543142512, -1.21456561358767, 1.11183287253465, -0.507496954829846, 0.898730486034072, 0.377215659958544, 1.45239164558790, 0.446945073178942, 0.645824788453030, -0.623677409296163, -0.595236431548712, 1.61132368718055, -0.348998045314167, 0.164167484938754, -1.63657708517891, 0.581365555343623, -0.128905996910632, 
0.432858634222399, -0.245109040039237, -1.08543038934632, 1.68080151955536, 0.176411940863882, -2.07143962693628, 0.211089334851037,-0.582847822547194) ccf (xx, yy) Matlab data & code: x = [-0.649013765191241 1.18116604196553 -0.758453297283692 -1.10961303850152 -0.845551240007797 -0.572664866457950 -0.558680764473972 0.178380225849766 -0.196861446475943 0.586442621667069 -0.851886969622469 0.800320709801823 -1.50940472473439 0.875874147834533 -0.242789536333340 0.166813439453503 -1.96541870928278 -1.27007139263854 1.17517126546302 2.02916018474976 -0.275157240675694 0.603658445825815 1.78125189324250 1.77365832632615 -1.86512257453063 -1.05110705924059 -0.417382047996795 1.40216228633781 -1.36774699097611 -0.292534999151874 1.27084843418894 0.0660093412882059 0.451290213630776 -0.322209718011896 0.788409216227425 0.928736046813314 -0.490790376269763 1.79720058425494 0.590696551205452 -0.635785737847226 0.603346612845761 -0.535247967775900 -0.155080385492789 0.612122370772160 -1.04434349451734 -0.345631908307050 -1.17140482049761 -0.685586780437283 0.926216394168962 -1.48167521167231 -0.558057808685045 -0.0284531115706568 -1.47629235201010 0.258899957160403 -2.01869095243834 0.199740262298379 0.425864319131210 -1.27004345059705 -0.485218835743043 0.594307616829848 -0.276464906639256 -1.85758288592737 0.0407308117494288 0.282970177161990 0.0635612193024994 0.433430065111595 0.422860364487685 1.29952829655200 -1.04979323447507 -1.78641172211092 0.816043081031918 -0.328208543142512 -1.21456561358767 1.11183287253465 -0.507496954829846 0.898730486034072 0.377215659958544 1.45239164558790 0.446945073178942 0.645824788453030 -0.623677409296163 -0.595236431548712 1.61132368718055 -0.348998045314167 0.164167484938754 -1.63657708517891 0.581365555343623 -0.128905996910632 0.432858634222399 -0.245109040039237 -1.08543038934632 1.68080151955536 0.176411940863882 -2.07143962693628 0.211089334851037 -0.582847822547194 0.0181688430923922 1.49477799287395 -0.424796733441211 
1.68624315536028] yy = [0 0 0 0 -0.649013765191241 1.18116604196553 -0.758453297283692 -1.10961303850152 -0.845551240007797 -0.572664866457950 -0.558680764473972 0.178380225849766 -0.196861446475943 0.586442621667069 -0.851886969622469 0.800320709801823 -1.50940472473439 0.875874147834533 -0.242789536333340 0.166813439453503 -1.96541870928278 -1.27007139263854 1.17517126546302 2.02916018474976 -0.275157240675694 0.603658445825815 1.78125189324250 1.77365832632615 -1.86512257453063 -1.05110705924059 -0.417382047996795 1.40216228633781 -1.36774699097611 -0.292534999151874 1.27084843418894 0.0660093412882059 0.451290213630776 -0.322209718011896 0.788409216227425 0.928736046813314 -0.490790376269763 1.79720058425494 0.590696551205452 -0.635785737847226 0.603346612845761 -0.535247967775900 -0.155080385492789 0.612122370772160 -1.04434349451734 -0.345631908307050 -1.17140482049761 -0.685586780437283 0.926216394168962 -1.48167521167231 -0.558057808685045 -0.0284531115706568 -1.47629235201010 0.258899957160403 -2.01869095243834 0.199740262298379 0.425864319131210 -1.27004345059705 -0.485218835743043 0.594307616829848 -0.276464906639256 -1.85758288592737 0.0407308117494288 0.282970177161990 0.0635612193024994 0.433430065111595 0.422860364487685 1.29952829655200 -1.04979323447507 -1.78641172211092 0.816043081031918 -0.328208543142512 -1.21456561358767 1.11183287253465 -0.507496954829846 0.898730486034072 0.377215659958544 1.45239164558790 0.446945073178942 0.645824788453030 -0.623677409296163 -0.595236431548712 1.61132368718055 -0.348998045314167 0.164167484938754 -1.63657708517891 0.581365555343623 -0.128905996910632 0.432858634222399 -0.245109040039237 -1.08543038934632 1.68080151955536 0.176411940863882 -2.07143962693628 0.211089334851037 -0.582847822547194] [XCF,lags,bounds] = crosscorr(xx,yy);
Remove all rows above and below a value in R
We have citizen scientists recording data for us using In-Situ Aqua TROLL 600 instruments. It is similar to a CTD, but not quite, and the data format is different enough that I cannot use ctdTrim() from the oce package in R. I need to remove all the rows of data recorded during the soak time (time in the water before they start lowering the instrument) and during the up cast, that is, all the rows after the instrument reached its maximum depth. So I just need the center portion of my data frame.

My data (full column names: Date Time; Salinity (ppt) (672441); Chlorophyll-a Fluorescence (RFU) (671721); RDO Concentration (mg/L) (672144); Temperature (°C) (676121); Depth (ft) (671051)):

    Date Time  Salinity  Chl-a        RDO       Temp      Depth
               (ppt)     (RFU)        (mg/L)    (°C)      (ft)
    16:29.0    0         0.01089297   7.257619  31.91303  0.008220486
    16:31.0    0         0.01765913   7.246986  31.93175  0.1499496
    16:33.0    0         0.0130412    7.258863  31.93253  0.5387784
    16:35.0    0         0.01299242   7.274049  31.93806  0.6187978
    16:37.0    0         0.01429801   7.26965   31.94401  0.6640261
    16:39.0    0         0.01342988   7.271608  31.93595  0.681709
    16:41.0    0         0.01337719   7.271549  31.93503  0.684597
    16:43.0    7.087267  0.007094439  6.98015   31.89018  1.598019
    16:45.0    28.3442   0.007111916  6.268753  31.83806  1.687673
    16:47.0    31.06357  0.007945394  6.197834  31.77821  1.418773
    16:49.0    32.07076  0.0080788    6.166986  31.76881  1.382685
    16:51.0    31.95504  0.004382414  6.191305  31.72906  1.358556
    16:53.0    36.21165  0.01983912   5.732656  29.3942   123.4148
    16:55.0    36.37849  0.02243886   5.626586  28.82502  125.2927
    16:57.0    36.43061  0.02416219   5.450325  28.23787  126.7997
    16:59.0    36.44484  0.02441683   5.421676  28.14037  127.0321
    17:01.0    36.46815  4.510316     5.318929  28.09501  127.2064
    17:03.0    36.41381  4.012657     5.241654  28.14595  127.2227
    17:05.0    36.42724  0.7891375    5.174401  28.20383  127.2019
    17:07.0    36.41064  0.4351442    5.120181  28.18592  127.197
    17:09.0    36.38155  0.2253969    5.033384  28.21021  127.1895
    17:11.0    36.37671  0.2089337    5.019629  28.21222  127.1885
    17:13.0    36.43813  0.08728585   4.981099  28.17526  127.2223
    17:15.0    36.47644  0.904435     4.951878  28.13579  127.2108
    17:17.0    36.54742  0.1230291    4.93056   28.06166  127.2307
    17:19.0    36.60466  10.04291     4.908442  27.9397   126.6003
    17:21.0    36.61511  11.33922     4.904828  27.92038  126.5161
    17:23.0    36.68179  0.6680982    4.87018   27.78319  123.707
    17:25.0    36.74612  0.06539913   4.848994  27.72977  119.906
    17:27.0    36.75729  0.02414635   4.826871  27.72545  114.9537
    17:29.0    37.1578   0.01556828   4.804105  27.81129  113.3405

What I tried:

    depthmax <- max(WS$`Depth (ft) (671051)`, na.rm = TRUE)
    output <- WS[WS$"Depth (ft) (671051)" < depthmax, ]
    Output2 <- output[output$"Depth (ft) (671051)" > 1, ]

I got Output2 to work but can't seem to get output to work. Is there a more elegant way to do this? To recap, I need to remove all rows after depthmax (127.2307) and all the rows before the depth at which they start lowering the instrument (~2.41).
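Your code does remove the row with the maximum depth, but not the rows after the maximum depth is reached. You want to locate the row index of the maximum depth and delete that row and the ones after it:

    depth <- WS$`Depth (ft) (671051)`
    end <- which.max(depth) - 1
    start <- tail(which(depth[1:end] < 2.41), 1) + 1
    output <- WS[start:end, ]

The second line finds the index of the maximum depth and subtracts 1 to get the row before it. The third line finds the index of the last row shallower than 2.41 ft before that point and adds 1 to get the starting row; searching only up to end means the up cast's return toward the surface cannot be picked up instead. which() and which.max() both skip NA values while still returning positions in the full column, so na.omit() is not needed (it would shift the positions out of line with the rows of WS whenever the column contains NAs).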
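A minimal sketch of the same idea wrapped in a reusable function, assuming the depth column is passed by name and that 2.41 ft marks the end of the soak. `trim_downcast()` is a hypothetical helper, not part of oce or any other package. Unlike the code above, it keeps the maximum-depth row, since the question asks to drop only the rows after it.

```r
# Hypothetical helper: keep the downcast only, i.e. the rows after the last
# soak reading shallower than `surface_ft`, up to and including the deepest one.
trim_downcast <- function(df, depth_col, surface_ft = 2.41) {
  depth <- df[[depth_col]]
  # which.max() ignores NA values but returns a position in the full column
  bottom <- which.max(depth)
  if (length(bottom) == 0) return(df[0, , drop = FALSE])  # all depths NA
  # Look for soak rows only before the bottom, so the up cast's return toward
  # the surface is never mistaken for the soak; which() treats NA as FALSE
  soak <- which(depth[seq_len(bottom)] < surface_ft)
  start <- if (length(soak) > 0) max(soak) + 1 else 1
  if (start > bottom) return(df[0, , drop = FALSE])  # never went below surface_ft
  df[start:bottom, , drop = FALSE]
}

# Usage with a few rows copied from the question (depth column renamed for
# brevity); with the real data: trim_downcast(WS, "Depth (ft) (671051)")
ws <- data.frame(
  time  = c("16:39.0", "16:41.0", "16:51.0", "16:53.0", "17:17.0", "17:19.0"),
  depth = c(0.681709, 0.684597, 1.358556, 123.4148, 127.2307, 126.6003),
  stringsAsFactors = FALSE
)
out <- trim_downcast(ws, "depth")
out$time  # "16:53.0" "17:17.0"
```

Restricting the soak search to the rows before the bottom is what makes the function safe on complete casts that end back near the surface.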
Plotting multiple series (scatter line) with same x axis on one plot [duplicate]
This question already has answers here: Plot multiple columns on the same graph in R.

I have a compositional data set: a set of columns (samples) which contain percentage data. Each row (a channel diameter, in my case) is therefore a particular variable that each sample has a percentage of. For example:

    Channel        sample2     sample3     sample8    sample9    sample17
    diameter (um)
    0.375198       0.0365797   0.0424338   0.0158648  0.02944    0.0157091
    0.411878       0.0647681   0.0750611   0.0280678  0.052028   0.0278099
    0.452145       0.0956633   0.111489    0.0415484  0.0770551  0.0410209
    0.496347       0.137893    0.162464    0.0601572  0.111755   0.0589772
    0.544872       0.175746    0.210556    0.0771818  0.143911   0.0748565
    0.59814        0.210752    0.257403    0.0932129  0.174446   0.089273
    0.656615       0.244288    0.304665    0.10884    0.204511   0.102797
    0.720807       0.278281    0.354677    0.124906   0.235612   0.11622
    0.791275       0.31069     0.405324    0.140553   0.266354   0.128626
    0.868632       0.339832    0.454374    0.15495    0.295125   0.139238
    0.953552       0.365523    0.500985    0.167898   0.321535   0.147978
    1.04677        0.387791    0.544478    0.179338   0.345493   0.154899
    1.14911        0.407715    0.585383    0.189749   0.367873   0.160534
    1.26145        0.424342    0.622144    0.1988     0.388226   0.164562
    1.38477        0.437851    0.654347    0.206637   0.406776   0.167147
    1.52015        0.448418    0.681951    0.213521   0.424175   0.168487
    1.66876        0.457694    0.706822    0.220449   0.442197   0.169372
    1.8319         0.466729    0.730714    0.228307   0.462539   0.170336
    2.011          0.476516    0.755269    0.237889   0.48627    0.171799
    2.2076         0.487906    0.782015    0.249849   0.514036   0.174083
    2.42342        0.501736    0.81248     0.264752   0.546016   0.177432
    2.66033        0.51929     0.848837    0.283331   0.582431   0.182235
    2.92042        0.541324    0.892608    0.305976   0.62241    0.188562
    3.20592        0.568374    0.944571    0.332691   0.663758   0.196293
    3.51934        0.599897    1.00394     0.362726   0.702966   0.204848
    3.8634         0.635522    1.06984     0.395209   0.736726   0.213754
    4.2411         0.674643    1.14148     0.429266   0.762942   0.222574
    4.65572        0.717242    1.21878     0.464337   0.780965   0.231205
    5.11087        0.76318     1.30134     0.499874   0.791079   0.23963
    5.61052        0.812207    1.38818     0.535257   0.794286   0.247904
    6.15902        0.863791    1.478       0.570021   0.793137   0.256198
    6.76114        0.917491    1.56991     0.604296   0.792304   0.264896
    7.42212        0.973638    1.66349     0.638955   0.797465   0.274726
    8.14773        1.03178     1.75579     0.674653   0.812076   0.286046
    8.94427        1.09013     1.83889     0.710974   0.834453   0.298613
    9.81869        1.14346     1.89853     0.745919   0.857295   0.310908
    10.7786        1.18666     1.92001     0.77754    0.871976   0.321507
    11.8323        1.21701     1.89513     0.805316   0.873849   0.329678
    12.9891        1.23962     1.82973     0.830937   0.865868   0.336774
    14.2589        1.26532     1.74176     0.855741   0.855011   0.345067
    15.6529        1.30625     1.65369     0.877552   0.846228   0.35632
    17.1832        1.37039     1.58411     0.889331   0.838007   0.370876
    18.863         1.45674     1.54142     0.881201   0.822809   0.387127
    20.7071        1.55939     1.52665     0.846803   0.793656   0.403864
    22.7315        1.6691      1.53505     0.7877     0.749241   0.42062
    24.9538        1.78027     1.56136     0.714575   0.696582   0.438749
    27.3934        1.89095     1.60072     0.642402   0.646703   0.459818
    30.0714        2.00065     1.64851     0.583683   0.608531   0.484033
    33.0113        2.10867     1.70138     0.544639   0.585721   0.50996
    36.2385        2.21043     1.75647     0.52405    0.57563    0.534436
    39.7813        2.30025     1.81358     0.51599    0.572488   0.554934
    43.6704        2.37323     1.87431     0.512943   0.570144   0.570926
    47.9397        2.42843     1.9405      0.509019   0.564741   0.584767
    52.6264        2.47132     2.01249     0.502374   0.556494   0.60136
    57.7713        2.51141     2.08592     0.494986   0.549106   0.626145
    63.4192        2.55913     2.15193     0.491467   0.549135   0.663665
    69.6192        2.62015     2.19769     0.496593   0.563727   0.716063
    76.4253        2.69185     2.2104      0.513592   0.598638   0.782837
    83.8969        2.7645      2.18362     0.543656   0.656535   0.861711
    92.0988        2.826       2.12191     0.585723   0.735146   0.949471
    101.103        2.86765     2.04062     0.637396   0.827534   1.04325
    110.987        2.88366     1.95552     0.695157   0.923739   1.1402
    121.837        2.86566     1.87145     0.756528   1.01592    1.23771
    133.748        2.79489     1.77514     0.820962   1.10124    1.33218
    146.824        2.64552     1.64295     0.891174   1.18226    1.41966
    161.177        2.39707     1.45793     0.970232   1.26132    1.49497
    176.935        2.05403     1.2272      1.05834    1.3359     1.55358
    194.232        1.65431     0.983299    1.14894    1.39739    1.59279
    213.221        1.25961     0.76806     1.2303     1.43626    1.61461
    234.066        0.932009    0.612117    1.29251    1.45138    1.62797
    256.948        0.708748    0.526414    1.33762    1.45415    1.64839
    282.068        0.596048    0.507822    1.3814     1.46446    1.69346
    309.644        0.577959    0.544827    1.44754    1.49861    1.77678
    339.916        0.628404    0.619653    1.55335    1.5589     1.90214
    373.147        0.712328    0.706958    1.70468    1.63116    2.06529
    409.626        0.785738    0.771807    1.88764    1.69265    2.25554
    449.672        0.808987    0.78167     2.07784    1.72831    2.46479
    493.633        0.763715    0.72189     2.24938    1.74795    2.69133
    541.892        0.665773    0.613547    2.40785    1.79079    2.94904
    594.869        0.562534    0.5129      2.57317    1.90894    3.24541
    653.025        0.501094    0.468972    2.78062    2.1497     3.57554
    716.866        0.509195    0.50887     3.03373    2.53054    3.89743
    786.949        0.584288    0.63571     3.33205    3.02726    4.16047
    863.883        0.682625    0.805693    3.64631    3.56815    4.31705
    948.338        0.736664    0.946139    3.93691    4.04493    4.34466
    1041.05        0.679724    0.960431    4.14637    4.34887    4.24767
    1142.83        0.462905    0.717301    4.29187    4.39208    4.08343
    1254.55        0.212328    0.364022    4.3391     4.11532    3.85551
    1377.2         0.0459161   0.0848405   4.29583    3.5172     3.56707
    1511.84        0.00420859  0.00868247  4.08845    2.63958    3.15498
    1659.64        0           0           3.86265    1.92542    2.794
    1821.89        0           0           3.64037    1.16644    2.40284
    2000

I would like to plot each sample as a scatter (line) plot on the same graph. The x axis would be the channel diameter (the first column, one value per row), and the y axis would be the percentage data in the sample columns. Most things I've tried don't seem to recognize the first column as the x-axis value.
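Create an empty plot whose axis limits cover your data (fixed limits such as c(0, 1) would cut off most of the data in the question):

    plot(x = NA, y = NA,
         xlim = range(df[, 1], na.rm = TRUE), ylim = range(df[, -1], na.rm = TRUE),
         xlab = "My X Label", ylab = "My Y Label", main = "My Title")

Then add your lines one at a time, each in its own color:

    for (i in 2:ncol(df)) {
      lines(x = df[, 1], y = df[, i], col = i)
    }

This code assumes your data.frame is called df and that you want to plot all columns as y variables except the first column, which you treat as your x variable.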
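Base R's matplot() draws one line per column of a matrix against a shared x vector, which saves writing the loop. A sketch, assuming the first column holds the x values as in the question; only the first three rows of the question's table are typed in here, and log = "x" is an optional choice because the diameters grow by a roughly constant factor from row to row.

```r
# First three rows of the question's table (first column = x values)
df <- data.frame(
  diameter = c(0.375198, 0.411878, 0.452145),
  sample2  = c(0.0365797, 0.0647681, 0.0956633),
  sample3  = c(0.0424338, 0.0750611, 0.111489)
)

cols <- seq_len(ncol(df) - 1)  # one color per sample column
# matplot() plots every column of y against the same x
matplot(x = df[[1]], y = as.matrix(df[-1]), type = "l", lty = 1, col = cols,
        log = "x",  # optional: the diameters are roughly log-spaced
        xlab = "Channel diameter (um)", ylab = "Percentage")
legend("topleft", legend = names(df)[-1], col = cols, lty = 1)
```

Using type = "b" instead of "l" draws points as well as lines, which is closer to a scatter plot with connecting lines.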
How can I apply Fisher's exact test to this set of data (nominal variables)?
I'm pretty new to statistics. This is the function I use, and it seems to work:

    fisher = function(idxToTest, idxATI){
      idxDependent = c()
      dependent = c()
      p = c()
      for(i in c(1:length(idxToTest))) {
        tbl = table(data[[idxToTest[i]]], data[[idxATI]])
        rez = fisher.test(tbl, workspace = 20000000000)
        if(rez$p.value < 0.1){
          dependent = c(dependent, TRUE)
          if(rez$p.value < 0.1){
            idxDependent = c(idxDependent, idxToTest[i])
          }
        } else {
          dependent = c(dependent, FALSE)
        }
        p = c(p, rez$p.value)
      }
    }

What I have understood so far is that I have to pass data like this as the first parameter:

                 Men  Women
    Dieting       10     30
    Non-dieting    5     60

My data comes from a CSV:

    data = read.csv('***.csv', header = TRUE, sep = ',');

My first problem is that I don't know how to convert from:

    Loan.Purpose  Home.Ownership
    lp_value_1    ho_value_2
    lp_value_1    ho_value_2
    lp_value_2    ho_value_1
    lp_value_3    ho_value_2
    lp_value_2    ho_value_3
    lp_value_4    ho_value_2
    lp_value_3    ho_value_3

to:

                ho_value_1  ho_value_2  ho_value_3
    lp_value_1           0           2           0
    lp_value_2           1           0           1
    lp_value_3           0           1           1
    lp_value_4           0           1           0

The second issue is that I don't know what the second parameter should be.

POST UPDATE: This is what I get using fisher.test(myTable):

    Error in fisher.test(test) :
    FEXACT error 501. The hash table key cannot be computed because the largest key is larger than the largest representable int. The algorithm cannot proceed. Reduce the workspace size or use another algorithm.

where myTable is:

                        MORTGAGE  NONE  OTHER  OWN  RENT
    car                       18     0      0    5    27
    credit_card              190     0      2   38   214
    debt_consolidation       620     0      2   87   598
    educational                5     0      0    3     7
    ...
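The conversion asked about here is what base R's table() does when given two columns: it counts each combination, with the first argument's values as rows and the second's as columns. A small sketch using the example rows from the question, typed in by hand; fisher.test() then takes that table as its single argument or, equivalently, the two columns as x and y, which answers the question about the second parameter.

```r
# The long-format example from the question, typed in by hand
data <- data.frame(
  Loan.Purpose   = c("lp_value_1", "lp_value_1", "lp_value_2", "lp_value_3",
                     "lp_value_2", "lp_value_4", "lp_value_3"),
  Home.Ownership = c("ho_value_2", "ho_value_2", "ho_value_1", "ho_value_2",
                     "ho_value_3", "ho_value_2", "ho_value_3"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows = Loan.Purpose values, columns = Home.Ownership values
tbl <- table(data$Loan.Purpose, data$Home.Ownership)
print(tbl)

# Either form runs the same test: a contingency table, or two vectors x and y
p_table   <- fisher.test(tbl)$p.value
p_vectors <- fisher.test(data$Loan.Purpose, data$Home.Ownership)$p.value
```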
Basically, Fisher's exact test only works on smallish data sets, because computing it exactly for a large table requires a lot of memory. But all is good: the chi-square test replaces the exact calculation with a large-sample approximation and is much easier on the computer. chisq.test() warns "Chi-squared approximation may be incorrect" when some expected counts are below 5, in which case the approximation is questionable. Just do:

    chisq.test(data$Loan.Purpose, data$Home.Ownership)

to get your p-value. Make sure you read through and understand the help page for chisq.test, especially the examples at the bottom: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/chisq.test.html

Then look at a mosaic plot to see the quantities:

    mosaicplot(table(data$Loan.Purpose, data$Home.Ownership))

This reference explains how mosaic plots work: http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day12/
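Another option, not mentioned in the answer: both fisher.test() and chisq.test() accept simulate.p.value = TRUE, which computes a Monte Carlo p-value from B random tables with the same margins instead of the exact FEXACT algorithm or the large-sample approximation, so the workspace limit behind error 501 does not come into play. A sketch using only the four rows of myTable shown in the question (the rest is elided there, so the numbers will not match the full data). The all-zero NONE column is dropped first, since a level that never occurs adds nothing to the test and would give zero expected counts.

```r
# The four rows of myTable shown in the question (the rest is elided there)
my_table <- matrix(
  c(18,  0, 0,  5,  27,
    190, 0, 2, 38, 214,
    620, 0, 2, 87, 598,
    5,   0, 0,  3,   7),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("car", "credit_card", "debt_consolidation", "educational"),
                  c("MORTGAGE", "NONE", "OTHER", "OWN", "RENT"))
)
# Drop levels that never occur (here NONE): they would give zero expected counts
my_table <- my_table[rowSums(my_table) > 0, colSums(my_table) > 0]

set.seed(42)  # Monte Carlo p-values vary from run to run
fisher_res <- fisher.test(my_table, simulate.p.value = TRUE, B = 10000)
chisq_res  <- chisq.test(my_table, simulate.p.value = TRUE, B = 10000)
c(fisher = fisher_res$p.value, chisq = chisq_res$p.value)
```

Larger B gives a more precise p-value at the cost of run time; the smallest p-value a simulation can report is 1 / (B + 1).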