I hope you're doing well! I have a theoretical time-series analysis problem that I hope you can help me sort out.
To start, you'll find a reproducible example of my dataset below. Date is on a daily timescale. Q25 is the 25th percentile (lower quartile) of my non-transformed data, Q75 is the 75th percentile (upper quartile) of my non-transformed data, fit is a locally weighted fit of the median, firstder is the first derivative of fit, and secondder is the second derivative of fit.
Plotting fit shows two oscillations followed by a steady increase. Plotting the quartiles around it shows a wide spread that narrows as fit begins to increase. The first derivative shows the rate of change of fit, and this is where my issue comes in: I'm not sure where the increase in fit starts based on the first derivative. Logically, I know the signal-to-noise start date has to occur after March 7 (based on the quartiles) and before March 20 (before the steady increase in fit). The first derivative shows roughly the same interval: its negative-to-positive inflection point occurs on March 5th, it becomes positive on March 16th, and it then produces a stationary time series.
All that being said, should my exact start date be the inflection point of the first derivative, or the first positive value on March 16th?
I appreciate your time on this problem and any thoughts you may have!
data<-structure(list(Date = structure(c(1485950474, 1486036874, 1486123274,
1486209674, 1486296074, 1486382474, 1486468874, 1486555274, 1486641674,
1486728074, 1486814474, 1486900874, 1486987274, 1487073674, 1487160074,
1487246474, 1487332874, 1487419274, 1487505674, 1487592074, 1487678474,
1487764874, 1487851274, 1487937674, 1488024074, 1488110474, 1488196874,
1488283274, 1488369674, 1488456074, 1488542474, 1488628874, 1488715274,
1488801674, 1488888074, 1488974474, 1489060874, 1489147274, 1489233674,
1489320074, 1489406474, 1489492874, 1489579274, 1489665674, 1489752074,
1489838474, 1489924874, 1490011274, 1490097674, 1490184074, 1490270474,
1490356874, 1490443274, 1490529674, 1490616074, 1490702474, 1490788874,
1490875274, 1490961674, 1491048074, 1491134474, 1491220874, 1491307274,
1491393674, 1491480074, 1491566474, 1491652874, 1491739274, 1491825674,
1491912074, 1491998474, 1492084874, 1492171274, 1492257674, 1492344074,
1492430474, 1492516874, 1492603274, 1492689674, 1492776074, 1492862474,
1492948874, 1493035274, 1493121674, 1493208074, 1493294474, 1493380874,
1493467274, 1493553674, 1493640074, 1493726474, 1493812874, 1493899274,
1493985674, 1494072074, 1494158474, 1494244874, 1494331274, 1494417674,
1494504074, 1494590474, 1494676874, 1494763274, 1494849674, 1494936074,
1495022474, 1495108874, 1495195274, 1495281674, 1495368074), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Q25 = c(-1.61495132528742, -3.86616056128065, -3.92140420424278,
-4.8011229557052, -8.64427034627082, -3.11323607034871, -4.3673083843457,
-1.45023104534208, 0.395769745934938, -1.49394189431791, -3.54063822876105,
-4.36090193633662, -0.966958995958447, -2.43233048854294, -0.181367797683111,
0.826258942687981, 3.36833418895383, -6.8991417494414, -1.15773470862185,
-1.75360705873163, 1.83790453304777, 2.11575746130393, -3.82025172988123,
0.679651741170909, -4.64628184041103, -6.91923314565111, 0.550274303541761,
0.104011128328036, -0.895257855280075, -0.801630235696042, 2.27958927430356,
2.98003963398985, 3.41649824319921, 1.56559818977215, -2.20923132476973,
0.552658760232765, 0.15158829140461, -4.75454688546242, -0.595460561248954,
-2.53729443345183, -0.826010503400985, -5.20578683534568, -2.78364193219594,
-3.62503323095109, 3.37820215582788, -2.53645164034493, -1.76051141957494,
-1.0256290530567, 1.94178279643985, 0.261239031590387, 0.00321585342072063,
2.87814873140354, -2.26732156613212, 2.65097224867168, -4.16746046231376,
1.64816233695592, 3.50505415841016, 2.83685877611882, 1.66353660199615,
2.27900517713667, 5.47721995923733, -5.31044894311933, 7.30753839733595,
5.50143585044911, -1.25129055380416, -2.41051058119916, 3.69266303212359,
2.28752278841533, -0.275687673398348, 5.74597173218469, 6.5773422259343,
3.72096844335478, 2.05388534852328, 5.41063696868948, 0.526467452167141,
1.60445671702256, -1.80394989627014, 1.56432488418924, 5.95370989889123,
7.94953250403525, 4.09121878799004, 2.11516919787794, -2.12808005361608,
6.77215849921842, 9.53718510298556, 4.16562173164636, 10.4573226478082,
7.703077796612, 7.55811710979136, 4.47194951592662, 10.2104312432178,
11.3454383477984, 0.997649090931488, 4.84898050707927, 10.8819209584302,
8.06296236341084, 11.3317616787558, 7.51878628894305, 7.87729934765305,
11.9108509727303, 6.77401202490232, 5.36297357453455, 10.6362047038983,
8.68979831512869, 4.0465996534104, 11.9579904470733, 9.41141176380086,
10.5754750604254, 12.6944336852953, 7.61563466861022), Q75 = c(5.93775779359077,
26.4536084846094, 7.92690107568623, 16.195405687679, 3.47567054091916,
34.9690262666155, 15.5126126583077, 24.4425589002446, 29.7425859431597,
23.1420118192775, 26.827758017105, 18.6306368759596, 19.759179203689,
10.0667740183259, 30.9080218485755, 10.0628623899296, 21.1120424008512,
12.1232187464341, 14.9571040303508, 11.4927011052638, 16.1617172813173,
19.0606972964125, 8.39991659547325, 9.5080530252195, 10.2717546026802,
12.018391863395, 27.2666992661895, 12.5172584337237, 19.9658806224003,
6.90019918091751, 18.4119063276997, 23.2991253786256, 27.95161418973,
16.9477966472485, 26.3880458021082, 19.2178725103802, 5.58699033890406,
9.82525729279156, 6.22139350667344, 5.6625294221828, 8.18283315939774,
4.78856479855966, 4.91215612536983, 5.35278870440784, 15.7471499356884,
7.95473965312171, 7.58463611165082, 6.03119890210746, 9.88624343762245,
6.66377352843609, 6.92675024060609, 7.20403099201013, 6.96877369392089,
17.7034248870798, 6.22890341708267, 6.1624397247754, 23.3856864094132,
7.13518162203812, 6.96344109315883, 7.69414570220079, 18.0859103957135,
7.52300478408242, 10.1635801549871, 10.021556657451, 8.51746254314866,
7.83000625461296, 15.4938419153615, 8.6844260972191, 8.07596479745038,
13.1423674521087, 8.04161364299224, 10.7442773622841, 8.58410892324644,
9.08436532340561, 8.84748510783176, 9.27529549461203, 9.01978932806698,
9.99776533859531, 9.61123990151036, 11.2228855544025, 10.3285714984086,
10.7107229417799, 11.452541129334, 11.9951421202043, 11.3568792509498,
11.139621487692, 12.957244784325, 13.1010906952192, 11.8445972599726,
12.8124554609003, 12.1817389611984, 13.4529860098547, 13.1808997426024,
12.568956945967, 13.9405958892683, 14.4445923505263, 14.5816203429081,
12.798362023978, 13.7926596005317, 14.3284196983115, 14.3967490595795,
14.3699332949429, 13.8061418130819, 15.4045229902535, 15.328632395916,
15.5928587109464, 15.5111381098579, 15.7167488979248, 16.4121827249844,
16.7700564366026), fit = c(1.3157822724014, 1.44491806546299,
1.67963756121542, 1.96834398237369, 2.32222986513481, 2.73223146146706,
3.16143742264514, 3.74278329406317, 4.4673163398484, 5.08529278937518,
5.58735598987316, 6.01592790788482, 6.19893270175371, 6.0219082198616,
5.64253432163072, 5.29694818196536, 4.89670493804841, 4.35145910275626,
3.89449691453349, 3.48150649031492, 3.06858491643756, 2.88963188544926,
3.13399806321574, 3.62311989322663, 4.03902573446563, 4.40598627768245,
4.84291047423098, 5.1737840740012, 5.3972440468493, 5.5747020603732,
5.62430591107552, 5.42843052467024, 5.07513358262307, 4.79108701506415,
4.59907825712695, 4.39731440509327, 4.22559688081583, 4.10100609028878,
4.00444369172723, 3.92144298531529, 3.82259220819525, 3.72499526558926,
3.68395895980124, 3.69588308031619, 3.73924432798967, 3.84246487218137,
4.07884774763199, 4.41108295888359, 4.70167312999791, 4.95537881350854,
5.2206483181831, 5.42551590243433, 5.52148736399275, 5.55736071284688,
5.60710852579646, 5.65757759073701, 5.68911425674423, 5.76594044238814,
5.93786454015275, 6.15175825295678, 6.31743846502224, 6.40077523837882,
6.45704948591979, 6.53019436816257, 6.59356685208809, 6.63353784524384,
6.71356141899707, 6.88849022040772, 7.11437487009308, 7.30646639975639,
7.43724432723552, 7.55279324817994, 7.67877181101032, 7.76924002146674,
7.83161170884946, 7.97157625691941, 8.25223488219952, 8.60947602940562,
8.95816992458796, 9.34076728750423, 9.77554331222275, 10.1411049362597,
10.3842988541376, 10.5696053585185, 10.7520817841281, 10.9357595672387,
11.0970528791622, 11.2495931571849, 11.3764752236255, 11.4864715266717,
11.6317299424136, 11.8381584436134, 12.0667779318613, 12.2724056764894,
12.462010561811, 12.6517333832877, 12.8101492769744, 12.9055352762602,
12.9678598772259, 13.0582354099638, 13.1489397497677, 13.2204738414797,
13.346284619515, 13.6054940294766, 13.9436193637562, 14.2337005769519,
14.5449448398809, 14.8895799498019, 15.0551768009747, 15.0689572800127
), firstder = c(0.0542499277820437, 0.193160412687084, 0.264645386746196,
0.318230646770668, 0.390583391620104, 0.410606699200811, 0.484714112557398,
0.683182658658343, 0.699350916534123, 0.546311900646561, 0.476582322984034,
0.33921923563074, 0.000346679118119919, -0.32275830659655, -0.377372654859586,
-0.342379980870621, -0.492111485610006, -0.524917784293232, -0.414059192641829,
-0.430018688265099, -0.343482693656914, 0.0295127267198723, 0.42189373253822,
0.482044173095213, 0.364522990904745, 0.40991488301477, 0.40715895907959,
0.264020778627613, 0.200548459021332, 0.136695124879259, -0.0667758528503706,
-0.308783766357995, -0.344835056787729, -0.22338628389576, -0.19056674389956,
-0.195775242472453, -0.146360055189657, -0.107867992742261, -0.0856184200473131,
-0.0883963049921002, -0.106496806989568, -0.0747428483921662,
-0.0103234284849929, 0.028493059030597, 0.0620691939868203, 0.163240621281308,
0.304123951137378, 0.325609827601989, 0.261609166722046, 0.261432729205552,
0.249586474110962, 0.150199026157553, 0.0521536950613295, 0.0370628072573624,
0.0565243651980056, 0.0371337817771211, 0.0409727028064402, 0.124422569131023,
0.207609809433488, 0.20232516927351, 0.121600832063498, 0.058433044321534,
0.0638003776220697, 0.0745713396178918, 0.0471802520722933, 0.0467708263829785,
0.126045851395065, 0.213953247074989, 0.220308792495525, 0.158550331399022,
0.11422743390592, 0.123806714974779, 0.114997378074604, 0.0651990840907102,
0.0828996021118185, 0.210617558388392, 0.336478451788591, 0.356675237198802,
0.354610868118913, 0.419333862640583, 0.419974858146042, 0.301270480481834,
0.201419853206041, 0.17882844049566, 0.186628379656891, 0.172934534594114,
0.156583940148236, 0.142289490196014, 0.11234075824169, 0.119081439314575,
0.177295034391252, 0.226764155293772, 0.22057696671022, 0.193643700730051,
0.190744241252391, 0.181381161962744, 0.127949080877661, 0.0681406193708671,
0.0729227267768433, 0.0983314755975622, 0.0766175682172481, 0.0819140886989596,
0.188151474480757, 0.320764600927798, 0.32011829707578, 0.283266397091015,
0.351814702578002, 0.276441515194414, 0.0724489974588587, -0.0273030060468944
), secondder = c(0.172623240328004, 0.105197729482076, 0.0377722186361492,
0.0693983014127931, 0.0753071882860794, -0.0352605731246656,
0.18347539983784, 0.213461692364051, -0.181125176612492, -0.124952855162631,
-0.0145063001624228, -0.260219874544165, -0.417525238481075,
-0.228684732948264, 0.119456036422192, -0.0494706884442619, -0.249992321034509,
0.184379723668058, 0.037337459634748, -0.0692564508812885, 0.242328440097658,
0.503662400655915, 0.281099610980781, -0.160798729866795, -0.0742436345141417,
0.165027418734192, -0.170539266604553, -0.1157370942994, -0.0112075449131641,
-0.116499123370982, -0.290442832088276, -0.193572994926973, 0.121470414067507,
0.12142713171643, -0.055788051724031, 0.0453710545782453, 0.0534593199873461,
0.0235248049074466, 0.0209743404824492, -0.0265301103720232,
-0.00967089362291196, 0.0731788108177152, 0.0556600289966314,
0.0219729460345484, 0.0451793238778984, 0.157163530711077, 0.124603129001063,
-0.081631376071841, -0.0463699456880455, 0.0460170706550578,
-0.0697095808442372, -0.129065315062581, -0.0670253471298663,
0.0368435715219322, 0.0020795443593542, -0.0408607112011232,
0.0485385532597613, 0.118361179389404, 0.0480133012155273, -0.058582581535485,
-0.102866092884539, -0.0234694825993884, 0.0342041492004599,
-0.0126622252088158, -0.0421199498823812, 0.0413010985037516,
0.117248951520421, 0.0585658398394289, -0.045854748998357, -0.0776621731946507,
-0.0109836217915529, 0.0301421839292724, -0.0477608577296227,
-0.0518357302381656, 0.0872367662803821, 0.168199146272765, 0.0835226405276321,
-0.0431290697072093, 0.039000331547431, 0.0904456574959092, -0.0891636664849909,
-0.148245088843424, -0.0514561657081618, 0.00627334028739845,
0.00932653803506511, -0.036714228160621, 0.00401303926886598,
-0.0326019391733094, -0.0272955247353401, 0.0407768868811118,
0.0756503032722406, 0.0232879385327998, -0.0356623156999039,
-0.0182042162604343, 0.012405297305115, -0.0311314558844096,
-0.0757327062857556, -0.0438842167278324, 0.0534484315397847,
-0.00263093389834701, -0.0407968808622812, 0.0513899218257041,
0.161084849737891, 0.10414140315619, -0.105434010860225, 0.0317302108906947,
0.105366400083279, -0.256112774850456, -0.151872260620654, -0.0476317463908522
)), row.names = c(NA, -110L), class = c("tbl_df", "tbl", "data.frame"
))
If the problem is to find where the fit column starts rising, then fit a curve made up of a horizontal line segment followed by a sloped line segment (red in the graph) and report the changepoint (Date0, shown as the dashed line in the graph).
# calculate starting values, st
fm0 <- lm(fit ~ Date, data, subset = seq(to = nrow(data), length = 20))
st <- c(mean(data$fit[1:20]), coef(fm0))
names(st) <- c("a0", "a", "b")
fm <- nls(fit ~ pmax(a0, a + b * as.numeric(Date)), data, start = st)
# solve a0 = a + b * Date0 for Date0 using calculated a0, a and b
Date0 <- with(as.list(coef(fm)), .POSIXct((a0 - a)/b))
plot(fit ~ Date, data, ylab = "")
lines(fitted(fm) ~ Date, data, col = "red")
abline(v = Date0, lty = 2)
Date0
## [1] "2017-03-21 07:53:56 EDT"
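For comparison, the two derivative-based candidates from the question can also be located programmatically. The following is only a sketch on made-up numbers, not the data above: the toy dates and firstder vectors and the helper upcross are assumptions for illustration. It finds the day the slope is most negative (where the second derivative crosses zero from below) and the first day the slope becomes positive.

```r
# Toy stand-ins for the Date and firstder columns (hypothetical values)
dates    <- as.Date("2017-03-01") + 0:9
firstder <- c(-0.10, -0.30, -0.34, -0.22, -0.10, -0.01, 0.03, 0.06, 0.16, 0.30)

# Hypothetical helper: index of the first element after each
# negative-to-positive sign change in v
upcross <- function(v) which(diff(sign(v)) > 0) + 1

steepest <- which.min(firstder)  # slope minimum, i.e. inflection point of fit
turn_up  <- upcross(firstder)    # first positive slope after a negative run

dates[steepest]
dates[turn_up]
```

With the real data the same two lines could be applied to data$firstder and data$Date. Whichever candidate is used, it is worth comparing it with the changepoint Date0 from the segmented fit, since the derivative crossings mark where the curve stops falling rather than where the sustained rise begins.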
I have a dataset with sf geometry that includes lat and lon information. I also have a heat map created from the dataset:
Overlay <- stat_density2d(aes(x = df$LONGITUDE, y = df$LATITUDE, fill = ..density..), geom = 'tile', contour = F, alpha = .5)
Is there a way to join the heat map density color values to the point file?
st_join(Shots1B,Overlay)
gives me an error
Error in st_join.sf(Shots1B, Overlay) :
second argument should be of class sf: maybe revert the first two arguments?
Reproducible dataset to follow the process. Overlay is based on this dataset.
structure(list(LATITUDE = c(40.68358, 40.69754, 40.843464, 40.692547,
40.626457, 40.526894, 40.840775, 40.694035, 40.857365, 40.698807,
40.71402, 40.815, 40.55079, 40.655903, 40.890076, 40.650402,
40.79335, 40.72538, 40.75184, 40.649788, 40.686928, 40.712963,
40.801285, 40.633976, 40.670296, 40.66423, 40.817696, 40.668495,
40.841087, 40.70955, 40.733376, 40.700356, 40.83801, 40.66584,
40.761436, 40.74958, 40.73197, 40.76249, 40.668507, 40.638268,
40.696735, 40.870823, 40.574867, 40.866577, 40.775414, 40.84744,
40.542908, 40.78468, 40.632416, 40.714207, 40.727913, 40.854485,
40.698986, 40.841717, 40.861687, 40.691822, 40.856014, 40.83383,
40.68781, 40.642044, 40.69814, 40.64664, 40.680897, 40.760822,
40.74608, 40.626293, 40.767967, 40.673634, 40.579212, 40.57365,
40.73632, 40.619396, 40.820263, 40.601864, 40.810318, 40.666306,
40.708805, 40.826424, 40.63174, 40.727146, 40.67253, 40.702335,
40.587894, 40.67922, 40.65047, 40.836555, 40.870056, 40.579372,
40.805138, 40.85968, 40.605595, 40.819214, 40.827972, 40.66496,
40.719177, 40.748825, 40.733597, 40.54048, 40.738403, 40.68039,
40.817627, 40.751446, 40.76161, 40.689648, 40.596977, 40.63864,
40.565254, 40.655895, 40.68821, 40.71649, 40.876785, 40.86367,
40.835827, 40.793396, 40.84827, 40.656273, 40.693462, 40.66725,
40.844105, 40.651707, 40.680496, 40.834415, 40.7357, 40.771038,
40.69484, 40.785774, 40.733017, 40.709023, 40.692886, 40.620487,
40.618595, 40.803787, 40.82319, 40.680088, 40.827927, 40.66895,
40.879055, 40.67043, 40.875874, 40.675037, 40.767582, 40.734352,
40.63083, 40.63532, 40.714073, 40.702194, 40.764362, 40.69496,
40.79656, 40.805016, 40.66406, 40.7963, 40.66563, 40.680477,
40.737785, 40.778606, 40.75868, 40.856045, 40.880257, 40.60677,
40.695683, 40.667236, 40.8351, 40.633682, 40.698116, 40.84747,
40.8047, 40.762676, 40.7158, 40.75584, 40.772102, 40.681602,
40.62677, 40.707493, 40.8252, 40.854115, 40.768875, 40.629707,
40.72654, 40.634415, 40.66937, 40.89466, 40.669067, 40.681484,
40.82433, 40.856606, 40.65785, 40.62764, 40.58401, 40.71791,
40.780437, 40.73973, 40.7952, 40.694794, 40.614063, 40.633152,
40.612736, 40.70166, 40.80641, 40.762234, 40.863647, 40.576626,
40.60118, 40.64721, 40.681145, 40.57529, 40.786, 40.601128, 40.827923,
40.805824, 40.642776, 40.86674, 40.678375, 40.74209, 40.81228,
40.604195, 40.84383, 40.759163, 40.652927, 40.69097, 40.718864,
40.683174, 40.749744, 40.738316, 40.839382, 40.66806, 40.74715,
40.663776, 40.843903, 40.836296, 40.655285, 40.70166, 40.64606,
40.72119, 40.708363, 40.674004, 40.729176, 40.86832, 40.598515,
40.695004, 40.72773, 40.704563, 40.66807, 40.66944, 40.684082,
40.69349, 40.765266, 40.74613, 40.74708, 40.87482, 40.70399,
40.649788, 40.69507, 40.788673, 40.847897, 40.68896, 40.695377,
40.880657, 40.828114, 40.781265, 40.848736, 40.65989, 40.748436,
40.61033, 40.752556, 40.829697, 40.718826, 40.65241, 40.852673,
40.851555, 40.707928, 40.891876, 40.58947, 40.74668, 40.85814,
40.708626, 40.73464, 40.62855, 40.65563, 40.687046, 40.70326,
40.633114, 40.62046, 40.75964, 40.64254, 40.783146, 40.705452,
40.74425, 40.75348, 40.84307, 40.620914, 40.80889, 40.78847,
40.712776, 40.75868, 40.74661, 40.835705, 40.688404, 40.781715,
40.730644, 40.75218, 40.731422, 40.761234, 40.668976, 40.637276,
40.788685, 40.87356, 40.795006, 40.820095, 40.594334, 40.666306,
40.673008, 40.583626, 40.874474, 40.633995, 40.772327, 40.704937,
40.653873, 40.677917, 40.59857, 40.809563, 40.68836, 40.666737,
40.713173, 40.73006, 40.652317, 40.76122, 40.588722, 40.643456,
40.865532, 40.67612, 40.620663, 40.72166, 40.733723, 40.745686,
40.875294, 40.803555, 40.7605, 40.661995, 40.69045, 40.658672,
40.711227, 40.700485, 40.816555, 40.861862, 40.875793, 40.68657,
40.654705, 40.637054, 40.6191, 40.734566, 40.714912, 40.74734,
40.6963, 40.63598, 40.724358, 40.586277, 40.671932, 40.650703,
40.61378, 40.727375, 40.573204, 40.671604, 40.740276, 40.684,
40.704494, 40.845642, 40.82681, 40.681168, 40.662476, 40.64739,
40.687138, 40.865143, 40.866673, 40.72313, 40.674934, 40.708363,
40.739525, 40.637997, 40.750965, 40.671585, 40.694294, 40.810173,
40.694748, 40.687103, 40.861744, 40.741074, 40.67875, 40.666943,
40.6635, 40.827824, 40.575832, 40.730366, 40.640945, 40.784237,
40.76803, 40.669823, 40.659336, 40.616093, 40.763546), LONGITUDE = c(-73.97617,
-73.98312, -73.836, -73.990974, -73.918, -74.16728, -73.87246,
-73.72679, -73.84657, -73.91837, -73.74827, -73.89402, -74.20098,
-73.89817, -73.819855, -73.89422, -73.97275, -74.00011, -73.90358,
-73.9622, -73.920815, -73.93647, -73.95394, -74.02211, -73.997604,
-73.919106, -73.922615, -73.925606, -73.86447, -73.95887, -73.86665,
-73.95732, -73.87329, -73.75551, -73.76995, -73.86541, -73.78651,
-73.839584, -73.779625, -73.93187, -73.93481, -73.8721, -74.00069,
-73.8722, -73.91984, -73.89968, -74.15579, -73.80911, -73.94724,
-73.92817, -73.873245, -73.854645, -73.91671, -73.94435, -73.82435,
-73.92223, -73.91213, -73.921234, -73.9237, -73.98124, -73.89111,
-73.9246, -73.95118, -73.99832, -73.974945, -74.01572, -73.96822,
-73.89294, -73.976265, -74.11252, -73.85631, -73.969574, -73.92976,
-74.00232, -73.943634, -73.79171, -73.92577, -73.85868, -73.96793,
-73.954735, -73.798386, -73.89073, -73.95504, -73.90405, -73.917366,
-73.94306, -73.83222, -74.16948, -73.945244, -73.90427, -73.98404,
-73.84662, -73.88707, -73.82226, -73.79223, -73.96984, -73.91062,
-74.153404, -73.93864, -73.94956, -73.92366, -74.001434, -73.97076,
-73.9184, -73.97324, -74.02245, -74.1301, -73.898224, -73.96583,
-73.98484, -73.87446, -73.86741, -73.89068, -73.94043, -73.88312,
-73.90731, -73.965485, -73.88799, -73.923065, -73.93121, -73.821365,
-73.92854, -73.90377, -73.83413, -73.98391, -73.97052, -73.8852,
-73.757835, -73.832184, -74.029305, -73.99847, -73.953896, -73.889496,
-73.94398, -73.90094, -73.9339, -73.87439, -73.928185, -73.84989,
-73.930534, -73.9109, -74.00849, -73.90736, -73.95033, -73.95087,
-73.93587, -73.96162, -73.946236, -73.97226, -73.92109, -73.73846,
-73.93829, -73.73921, -73.7921, -73.93496, -73.98163, -73.87552,
-73.90079, -73.843864, -73.759575, -73.741875, -73.770004, -73.8825,
-73.89868, -73.977325, -73.89134, -73.91243, -73.954346, -73.824486,
-73.99238, -73.763954, -73.95855, -73.946884, -73.94153, -73.867714,
-73.89091, -73.94898, -73.90486, -73.71589, -74.08535, -73.89523,
-73.86137, -73.9878, -73.85049, -73.874374, -73.92841, -73.91648,
-73.89022, -73.98587, -73.95341, -73.94989, -73.70625, -73.94623,
-73.98246, -74.01393, -73.97762, -74.01172, -73.961464, -73.94227,
-73.98987, -73.8918, -73.98478, -73.99098, -74.01531, -73.7923,
-73.97655, -73.84574, -73.997086, -73.93483, -73.954636, -74.02003,
-73.92873, -73.9265, -73.984985, -73.90941, -73.97218, -73.886375,
-73.988396, -73.959335, -73.94833, -73.97483, -73.87389, -73.884315,
-73.98773, -73.84531, -73.90394, -73.985504, -73.889915, -73.927284,
-73.87369, -73.93658, -73.961464, -74.01648, -73.761185, -73.87203,
-73.81881, -73.87898, -73.83317, -73.766464, -73.9525, -73.90674,
-73.91066, -73.80789, -73.883995, -73.90864, -73.97917, -73.81517,
-73.83605, -73.98135, -73.877, -73.85589, -73.9622, -73.80114,
-73.97136, -73.92499, -73.93326, -73.94921, -73.877625, -73.93107,
-73.97599, -73.93234, -73.90536, -73.984566, -73.95932, -73.92972,
-73.91313, -73.98424, -73.9264, -73.919106, -73.952446, -73.784294,
-73.8616, -73.80105, -73.9745, -73.895744, -73.94513, -73.87421,
-73.952835, -73.92596, -73.792114, -73.86474, -73.94928, -74.07641,
-73.95817, -73.87652, -73.97833, -73.78149, -73.7334, -73.980896,
-73.848076, -73.9753, -73.95581, -73.968895, -73.90601, -73.87552,
-73.86473, -73.88875, -73.93803, -73.823845, -73.97329, -73.85201,
-73.94641, -73.96389, -73.90668, -73.93177, -73.94386, -73.81853,
-73.9485, -73.955086, -73.990944, -73.79171, -73.97851, -73.98407,
-73.90031, -73.98138, -73.94531, -73.94932, -74.008156, -73.93861,
-73.9689, -73.92923, -73.96444, -73.90224, -73.90076, -73.710754,
-73.92752, -73.93056, -73.960464, -73.972725, -73.86238, -73.936005,
-74.1524, -73.888664, -73.72516, -73.97213, -73.9088, -73.91184,
-73.95699, -73.91959, -73.959435, -73.90019, -73.72826, -73.93637,
-73.91755, -73.91282, -73.85465, -73.776146, -74.00731, -73.98643,
-74.159615, -73.72269, -73.94784, -73.88673, -73.97545, -73.915146,
-73.742516, -73.9862, -73.85165, -73.920586, -74.07225, -73.90313,
-74.09711, -73.86907, -73.92782, -73.95031, -73.81743, -73.90211,
-73.85361, -73.92941, -73.768326, -73.92314, -73.75136, -73.87204,
-73.90896, -73.90614, -73.80186, -73.92401, -73.92512, -74.02136,
-73.94027, -73.99843, -73.74868, -73.95117, -73.73427, -73.89251,
-73.911804, -73.7258, -73.794, -73.890144, -73.94276, -73.91934,
-74.12409, -73.91388, -73.94852, -73.947075, -73.87722, -73.90981,
-73.92726, -74.14523, -73.88209), COLLISION_ID = c(4407147L,
4136992L, 4395664L, 4397513L, 4403773L, 4405244L, 4405914L, 4407366L,
4407778L, 4407461L, 4407407L, 4407900L, 4407760L, 4407746L, 4408143L,
4407638L, 4407958L, 4407885L, 4407616L, 4408038L, 4408224L, 4407392L,
4407765L, 4407821L, 4407971L, 4408071L, 4407430L, 4408259L, 4407592L,
4407674L, 4407708L, 4408396L, 4407152L, 4407862L, 4407636L, 4407792L,
4407853L, 4408205L, 4407945L, 4408118L, 4408242L, 4407563L, 4408098L,
4407169L, 4407798L, 4407797L, 4407349L, 4407994L, 4408032L, 4407478L,
4407924L, 4408315L, 4407892L, 4408280L, 4408403L, 4407753L, 4408003L,
4407497L, 4408229L, 4407525L, 4407817L, 4407539L, 4408306L, 4407830L,
4407282L, 4407688L, 4407701L, 4407728L, 4408052L, 4407849L, 4407320L,
4407291L, 4408200L, 4407649L, 4407802L, 4407345L, 4408356L, 4407245L,
4408057L, 4408332L, 4407785L, 4407929L, 4407425L, 4408080L, 4408123L,
4408290L, 4408412L, 4407757L, 4407770L, 4407554L, 4407873L, 4407502L,
4408044L, 4408165L, 4407544L, 4407277L, 4407834L, 4407338L, 4407397L,
4408109L, 4407683L, 4407829L, 4407496L, 4407609L, 4407689L, 4407861L,
4407350L, 4407721L, 4407381L, 4407653L, 4407896L, 4407914L, 4407512L,
4408427L, 4407532L, 4408086L, 4407856L, 4407729L, 4407789L, 4408129L,
4407576L, 4408193L, 4407643L, 4407906L, 4407414L, 4407623L, 4407436L,
4407952L, 4407761L, 4408063L, 4407388L, 4407766L, 4407901L, 4408104L,
4407486L, 4408264L, 4407678L, 4407355L, 4408155L, 4408271L, 4408380L,
4407598L, 4407866L, 4407452L, 4407809L, 4407393L, 4407453L, 4408214L,
4407824L, 4407431L, 4407632L, 4407841L, 4407096L, 4407658L, 4407933L,
4407976L, 4407714L, 4407373L, 4407568L, 4407707L, 4407692L, 4407865L,
4407154L, 4407637L, 4407354L, 4407793L, 4407432L, 4407946L, 4407852L,
4408249L, 4407987L, 4408215L, 4408027L, 4407389L, 4407562L, 4407794L,
4407779L, 4407997L, 4407408L, 4408313L, 4407747L, 4408402L, 4407702L,
4407762L, 4407297L, 4407321L, 4408100L, 4407505L, 4408047L, 4407805L,
4407697L, 4407367L, 4407837L, 4407548L, 4407820L, 4408150L, 4407842L,
4407673L, 4407278L, 4407459L, 4407514L, 4408367L, 4407313L, 4407396L,
4407611L, 4408053L, 4407905L, 4407642L, 4407556L, 4407774L, 4407360L,
4408139L, 4408223L, 4408075L, 4407967L, 4407875L, 4407915L, 4408258L,
4408033L, 4408308L, 4407869L, 4407732L, 4407602L, 4407959L, 4408397L,
4408067L, 4408279L, 4407723L, 4408198L, 4407501L, 4408119L, 4408359L,
4407816L, 4407543L, 4407801L, 4407346L, 4407769L, 4408093L, 4407475L,
4408113L, 4407833L, 4407923L, 4407424L, 4407741L, 4407891L, 4407647L,
4407909L, 4407439L, 4407604L, 4407752L, 4407380L, 4408037L, 4408416L,
4407549L, 4408002L, 4408227L, 4408262L, 4407897L, 4408187L, 4407975L,
4408164L, 4408085L, 4408250L, 4408028L, 4407580L, 4408302L, 4407400L,
4408128L, 4408194L, 4408289L, 4407953L, 4408391L, 4408204L, 4407870L,
4407524L, 4408353L, 4407445L, 4408056L, 4407327L, 4407784L, 4407928L,
4407454L, 4407372L, 4407444L, 4407756L, 4407620L, 4407412L, 4407633L,
4407119L, 4407660L, 4408154L, 4407773L, 4407567L, 4407938L, 4407713L,
4408208L, 4408043L, 4408233L, 4407990L, 4408079L, 4408217L, 4408331L,
4407698L, 4407825L, 4408122L, 4407838L, 4407948L, 4407843L, 4407515L,
4407312L, 4407314L, 4407489L, 4407656L, 4407332L, 4407506L, 4408401L,
4407677L, 4407359L, 4408266L, 4407696L, 4407261L, 4407857L, 4407490L,
4407788L, 4408379L, 4408103L, 4408076L, 4408049L, 4407527L, 4407836L,
4408274L, 4407547L, 4407815L, 4408375L, 4407279L, 4407331L, 4407686L,
4408398L, 4408066L, 4408309L, 4408068L, 4407691L, 4407353L, 4407433L,
4407679L, 4407566L, 4408026L, 4407651L, 4407296L, 4407826L, 4407368L,
4407804L, 4407242L, 4407858L, 4407795L, 4407347L, 4408059L, 4407715L,
4408114L, 4408312L, 4407922L, 4407342L, 4407748L, 4407619L, 4408297L,
4407763L, 4408005L, 4407504L, 4408231L, 4407049L, 4408106L, 4407542L,
4407523L, 4407428L, 4408257L, 4407595L, 4407672L, 4407832L, 4407384L,
4408387L, 4407790L, 4407612L, 4407775L, 4407864L, 4407379L, 4407156L,
4407634L, 4408389L, 4407724L, 4407942L, 4408197L, 4407847L, 4407403L,
4408120L, 4407703L, 4407668L, 4407570L, 4408101L, 4407800L, 4407917L
)), row.names = c(NA, 400L), class = "data.frame")
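Overlay is a ggplot2 layer, not an sf object, which is why st_join rejects it. One possible workaround, sketched here under assumptions, is to compute the density grid directly with MASS::kde2d (which, to my knowledge, is what stat_density2d uses internally) and look up the grid value at each point. The toy pts data frame and the grid size n = 100 are assumptions; the resulting values match the plotted heat map only if the same bandwidth and grid are used.

```r
library(MASS)  # kde2d(); MASS ships with standard R installations

set.seed(1)
# Toy stand-in for the collision points (hypothetical values)
pts <- data.frame(
  LONGITUDE = rnorm(200, mean = -73.9, sd = 0.05),
  LATITUDE  = rnorm(200, mean = 40.7,  sd = 0.05)
)

# 2D kernel density estimate on a regular n x n grid spanning the data range
dens <- kde2d(pts$LONGITUDE, pts$LATITUDE, n = 100)

# For each point, find the grid node at or just below its coordinates
ix <- findInterval(pts$LONGITUDE, dens$x)
iy <- findInterval(pts$LATITUDE,  dens$y)

# dens$z has rows indexed by x and columns by y
pts$density <- dens$z[cbind(ix, iy)]
head(pts)
```

Because the density ends up as an ordinary column, the same lookup can be attached to an sf point object such as Shots1B without any spatial join.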
I have a data frame that contains a variable holding a set of coordinates, which is supposed to be a polygon. I'd like to convert it into an sf polygon geometry, but I have little idea how to achieve this.
The data looks like this:
a <- c("[30.523311, 50.40919], [30.523111, 50.409093], [30.522475, 50.408781], [30.522484, 50.408771], [30.523591, 50.407804], [30.524049, 50.407403], [30.526558, 50.406062], [30.526791, 50.405939], [30.527487, 50.4057], [30.527787, 50.405564], [30.528793, 50.405209], [30.528718, 50.404554], [30.530223, 50.404552], [30.530133, 50.404363], [30.529104, 50.404185], [30.529018, 50.403965], [30.528933, 50.403337], [30.529986, 50.403227], [30.531422, 50.403077], [30.531336, 50.402585], [30.531743, 50.402489], [30.531612, 50.401577], [30.531505, 50.401447], [30.531483, 50.401211], [30.531489, 50.40103], [30.531873, 50.400999], [30.531932, 50.400429], [30.531961, 50.400257], [30.531965, 50.400233], [30.532439, 50.400174], [30.533329, 50.400062], [30.533369, 50.399868], [30.533435, 50.399542], [30.533478, 50.39954], [30.534231, 50.399498], [30.534229, 50.399863], [30.5354583263, 50.4005476511], [30.5361664295, 50.4004929412], [30.5361825227, 50.4008211999], [30.5366814136, 50.4008656514], [30.537257, 50.400902], [30.537703, 50.400882], [30.538053, 50.400829], [30.538949, 50.40062], [30.539305, 50.400619], [30.539605, 50.40051], [30.540023, 50.40049], [30.54056, 50.400838], [30.540828, 50.400443], [30.541166, 50.400788], [30.541501, 50.401137], [30.542203, 50.40115], [30.54236, 50.40171], [30.542792, 50.401676], [30.543223, 50.401641], [30.548297, 50.400637], [30.558722, 50.398574], [30.558969, 50.398794], [30.559252, 50.399041], [30.559338, 50.399931], [30.55938, 50.400163], [30.559447, 50.400539], [30.559532, 50.400916], [30.559051, 50.400983], [30.555229, 50.401518], [30.555258, 50.402649], [30.554082, 50.402791], [30.553204, 50.402945], [30.552119, 50.40318], [30.551907, 50.403224], [30.551414, 50.403334], [30.55056, 50.403525], [30.550566, 50.403544], [30.547881, 50.404161], [30.547819, 50.404175], [30.546416, 50.404497], [30.544978, 50.404827], [30.54398, 50.404963], [30.543771, 50.404992], [30.54248, 50.405173], [30.540812, 50.405404], [30.538729, 50.405488], 
[30.537509, 50.405536], [30.536298, 50.405576], [30.534761, 50.405593], [30.53247, 50.405613], [30.530065, 50.40586], [30.528, 50.406381], [30.527158, 50.406595], [30.525696, 50.407249], [30.52562, 50.407313], [30.525481, 50.40743], [30.525423, 50.407478], [30.525274, 50.4076], [30.524885, 50.407919], [30.524221, 50.408445], [30.523308, 50.409165], [30.52333, 50.409175], [30.523311, 50.40919]")
b <- c("[30.517824, 50.405568], [30.517689, 50.40543], [30.517727, 50.405406], [30.517798, 50.405361], [30.517947, 50.405265], [30.518736, 50.405337], [30.519452, 50.405434], [30.520105, 50.405521], [30.52047, 50.405706], [30.520844, 50.405896], [30.521225, 50.405894], [30.521758, 50.405744], [30.524134, 50.404085], [30.524483, 50.403844], [30.524582, 50.403776], [30.524901, 50.403571], [30.524915, 50.403562], [30.524967, 50.403529], [30.525058, 50.40347], [30.525317, 50.403308], [30.525573, 50.403147], [30.526089, 50.402824], [30.526607, 50.4025], [30.527259, 50.402092], [30.527528, 50.401796], [30.528232, 50.40102], [30.528482, 50.400748], [30.528687, 50.400524], [30.528731, 50.400476], [30.528953, 50.400242], [30.529032, 50.400158], [30.529059, 50.400054], [30.529836, 50.399809], [30.530864, 50.399677], [30.530971, 50.39987], [30.53074, 50.400041], [30.53095, 50.400096], [30.53118, 50.401053], [30.531489, 50.40103], [30.531483, 50.401211], [30.531505, 50.401447], [30.531612, 50.401577], [30.531743, 50.402489], [30.531336, 50.402585], [30.531422, 50.403077], [30.529986, 50.403227], [30.528933, 50.403337], [30.529018, 50.403965], [30.529104, 50.404185], [30.530133, 50.404363], [30.530223, 50.404552], [30.528718, 50.404554], [30.528793, 50.405209], [30.527787, 50.405564], [30.527487, 50.4057], [30.526791, 50.405939], [30.526558, 50.406062], [30.524049, 50.407403], [30.523591, 50.407804], [30.522484, 50.408771], [30.519427, 50.407208], [30.519404, 50.407185], [30.519397, 50.407177], [30.518728, 50.406494], [30.518557, 50.406319], [30.517824, 50.405568]")
polygons<- as.data.frame(c(a,b), ncol=1, nrow=2)%>%
rename(polygon=1)
You need to parse these text strings into numbers, then convert into polygons. This function creates an sfc object that contains the polygons these vectors describe:
polygonise <- function(strings) {
do.call(c, lapply(strings, function(x) {
cutstring <- unlist(strsplit(x, "\\[|\\]"))
cutstring <- cutstring[nchar(cutstring) > 3]
sf::st_sfc(sf::st_polygon(list(do.call(rbind,
lapply(strsplit(cutstring, ", "), as.numeric)))))
}))
}
So, for example, we can do:
polygons$geometry <- polygonise(polygons$polygon)
ggplot(polygons) + geom_sf(aes(geometry = geometry), fill = "forestgreen")
Your solution works perfectly on the example I provided. However, when I try to use your function on real data I get an error: "'MtrxSet(x, dim, type = "POLYGON", needClosed = TRUE)': polygons not (all) closed". Perhaps my coordinates are flawed? Sorry for that, but I’m pretty new to R.
a <- read_csv("https://raw.githubusercontent.com/slawomirmatuszak/Covid.UA/master/dzielnice.csv")
b<- polygonise(a)
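Two things may be worth checking here, offered as a guess since the layout of the CSV is not shown. First, polygonise() expects a character vector of coordinate strings, so it should receive the relevant column rather than the whole data frame. Second, sf requires the first and last points of each ring to be identical. A minimal base-R sketch of closing a ring, using a hypothetical helper parse_ring() and made-up coordinates:

```r
# Hypothetical helper (not part of sf): parse one "[x, y], [x, y], ..." string
# into a two-column numeric matrix and repeat the first point at the end
# if the ring is not already closed.
parse_ring <- function(string) {
  nums <- as.numeric(regmatches(string, gregexpr("-?[0-9.]+", string))[[1]])
  coords <- matrix(nums, ncol = 2, byrow = TRUE)
  if (any(coords[1, ] != coords[nrow(coords), ])) {
    coords <- rbind(coords, coords[1, ])
  }
  coords
}

# Made-up open ring: three points, last one differs from the first
open_ring <- "[30.1, 50.1], [30.2, 50.1], [30.2, 50.2]"
ring <- parse_ring(open_ring)

# Made-up ring that is already closed: left unchanged
closed_ring <- parse_ring("[30.1, 50.1], [30.2, 50.1], [30.2, 50.2], [30.1, 50.1]")
```

With sf installed, a matrix produced this way could then be passed on as sf::st_polygon(list(ring)).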
We have citizen scientists recording data for us using In-Situ Aqua TROLL 600 instruments. The instrument is similar to a CTD, but its data format is different enough that I cannot use ctdTrim() from the oce package in R. I need to remove all the rows recorded during the soak time (the time in the water before they start lowering the instrument) and during the upcast, that is, all the rows after the maximum depth is reached. So I just need the center portion of my data frame.
My Data
Date Time Salinity (ppt) (672441) Chlorophyll-a Fluorescence (RFU) (671721) RDO Concentration (mg/L) (672144) Temperature (°C) (676121) Depth (ft) (671051)
16:29.0 0 0.01089297 7.257619 31.91303 0.008220486
16:31.0 0 0.01765913 7.246986 31.93175 0.1499496
16:33.0 0 0.0130412 7.258863 31.93253 0.5387784
16:35.0 0 0.01299242 7.274049 31.93806 0.6187978
16:37.0 0 0.01429801 7.26965 31.94401 0.6640261
16:39.0 0 0.01342988 7.271608 31.93595 0.681709
16:41.0 0 0.01337719 7.271549 31.93503 0.684597
16:43.0 7.087267 0.007094439 6.98015 31.89018 1.598019
16:45.0 28.3442 0.007111916 6.268753 31.83806 1.687673
16:47.0 31.06357 0.007945394 6.197834 31.77821 1.418773
16:49.0 32.07076 0.0080788 6.166986 31.76881 1.382685
16:51.0 31.95504 0.004382414 6.191305 31.72906 1.358556
16:53.0 36.21165 0.01983912 5.732656 29.3942 123.4148
16:55.0 36.37849 0.02243886 5.626586 28.82502 125.2927
16:57.0 36.43061 0.02416219 5.450325 28.23787 126.7997
16:59.0 36.44484 0.02441683 5.421676 28.14037 127.0321
17:01.0 36.46815 4.510316 5.318929 28.09501 127.2064
17:03.0 36.41381 4.012657 5.241654 28.14595 127.2227
17:05.0 36.42724 0.7891375 5.174401 28.20383 127.2019
17:07.0 36.41064 0.4351442 5.120181 28.18592 127.197
17:09.0 36.38155 0.2253969 5.033384 28.21021 127.1895
17:11.0 36.37671 0.2089337 5.019629 28.21222 127.1885
17:13.0 36.43813 0.08728585 4.981099 28.17526 127.2223
17:15.0 36.47644 0.904435 4.951878 28.13579 127.2108
17:17.0 36.54742 0.1230291 4.93056 28.06166 127.2307
17:19.0 36.60466 10.04291 4.908442 27.9397 126.6003
17:21.0 36.61511 11.33922 4.904828 27.92038 126.5161
17:23.0 36.68179 0.6680982 4.87018 27.78319 123.707
17:25.0 36.74612 0.06539913 4.848994 27.72977 119.906
17:27.0 36.75729 0.02414635 4.826871 27.72545 114.9537
17:29.0 37.1578 0.01556828 4.804105 27.81129 113.3405
> depthmax<- max(WS$`Depth (ft) (671051)`, na.rm = TRUE)
> output <- WS[WS$"Depth (ft) (671051)" < depthmax,]
> Output2 <- output[output$"Depth (ft) (671051)" > 1,]
I tried these and got Output2 to work but can't seem to get output to work. Is there a more elegant way to do this? Just to recap, I need to remove all rows after the depthmax (127.2307) and all the rows before the depth when they start lowering the instrument (~2.41).
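Your code does remove the maximum depth, but not the rows after the maximum depth is reached. You want to locate the row index of the maximum depth and delete that row and the ones after it:
depth <- WS$`Depth (ft) (671051)`
end <- which.max(depth) - 1                               # row before the maximum depth
start <- tail(which(depth[seq_len(end)] < 2.41), 1) + 1   # first row after the soak
output <- WS[start:end, ]
The second line finds the index of the maximum depth and subtracts 1 to get the row before it. The third line finds the index of the last row shallower than 2.41 before that point and adds 1 to get the starting row; limiting the search to the rows before the maximum keeps shallow readings at the end of the upcast from being picked up. Both which() and which.max() already skip NA values, so na.omit() is not needed, and it would shift the row indices if any depths were missing.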
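A small made-up depth series (not the posted data) illustrates the trimming and shows why it can help to look for the last shallow row only among the rows before the maximum. The 2.41 threshold is taken from the question; everything else is an assumption for illustration:

```r
# Made-up depths: soak near the surface, downcast, then upcast back up
depth <- c(0.5, 0.7, 1.6, 1.4, 20, 60, 110, 127, 126, 80, 1.0)
soak_limit <- 2.41                        # threshold from the question

end <- which.max(depth) - 1               # last row before the maximum
start <- tail(which(depth[seq_len(end)] < soak_limit), 1) + 1
downcast <- depth[start:end]

# Searching the whole series instead would land on the final upcast reading
naive_start <- tail(which(depth < soak_limit), 1) + 1
```

Here the restricted search keeps only the rows between the soak and the maximum, while naive_start points past the end of the series because the last shallow reading belongs to the upcast.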
My question is pretty simple: the cut() function lets me choose the breaks along which the range of my vector is divided into intervals. I would like to control the number of observations within each newly created interval, similar to what I get by passing quantiles as the breaks in the cut() call. However, I don't want to use quantiles, because I would like the intervals to be fixed so that I can match them between different databases for further comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
Which is fairly simple. My new approach is even simpler, so it resembles this for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable which I want to calculate some statistics on, according to the levels of the discrete variable.
So it would go something like this :
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
But to be able to calculate this kind of t-test confidence interval on each of the 'pools' of y values (one pool per interval of the discrete variable), I need to control the number of values within each interval created for z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for the z variable so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, which I am not sure is possible.
Any help on the matter would be welcome, thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
Except I'd like this to be reproducible for another database, and so I can't use the quantile function, since I would not get the same intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m desired minimum number of observations in a group.
The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar you could extract the exact cutpoints by setting the argument onlycuts = T and reuse them for the other variables. In case the distributions are different though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = T) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
cut2(df2$x, cuts = cuts) %>% table*2 # multiplied by two for better comparison
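Since cut2 does not guarantee the minimum and the cuts are derived from one data set only, another option is to build the breaks from a fixed grid and merge neighbouring bins until each one holds enough cases in both data sets. The following is only a sketch under stated assumptions: shared_breaks() is a hypothetical helper, the grid spacing is arbitrary, and the two vectors are simulated stand-ins for df1$x and df2$x.

```r
# Hypothetical helper: walk along a fixed grid and keep a grid point as a
# break only once the current bin holds at least `min_n` values in BOTH vectors.
shared_breaks <- function(x1, x2, grid, min_n) {
  keep <- grid[1]
  lower <- grid[1]
  for (b in grid[-1]) {
    n1 <- sum(x1 > lower & x1 <= b)
    n2 <- sum(x2 > lower & x2 <= b)
    if (min(n1, n2) >= min_n) {
      keep <- c(keep, b)
      lower <- b
    }
  }
  # A short leftover bin at the top is folded into the previous bin
  top <- grid[length(grid)]
  if (keep[length(keep)] != top) {
    if (length(keep) > 1) keep[length(keep)] <- top else keep <- c(keep, top)
  }
  keep
}

set.seed(1)
x1 <- rnorm(962, mean = 6.3, sd = 1.1)   # simulated, not the posted sample
x2 <- rnorm(455, mean = 6.2, sd = 1.2)   # simulated, not the posted sample

grid <- seq(0, 12, by = 0.25)            # assumed fixed grid covering both ranges
brks <- shared_breaks(x1, x2, grid, min_n = 30)

z1 <- cut(x1, brks, include.lowest = TRUE)
z2 <- cut(x2, brks, include.lowest = TRUE)
table(z1)
table(z2)
```

Because both vectors are cut with the same brks, the factor levels match across the two data sets; the price is that bins in the tails become wide.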
This is a good example of how NOT to pose a question. At last we have an example, so it is possible to post code that applies to it. (You apparently naively pasted the exact code in my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
100 100 100 100 100 100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
100 100 62
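Note that the counts above add up to 862 rather than 962: the lowest break is the upper edge of the first group, so the 100 smallest values fall outside every interval and become NA. One way to keep them, sketched here on simulated data (the vector is made up, not the posted sample), is to add probability 0 to the breaks and set include.lowest = TRUE:

```r
set.seed(42)
x <- rnorm(962, mean = 6, sd = 1)    # simulated stand-in for x$x
n <- 100                             # desired group size

# Probability 0 adds the minimum as the lowest break; unique() guards against
# a duplicated 1 when length(x) is an exact multiple of n
probs <- unique(c(0, seq(n, length(x), by = n) / length(x), 1))
grp <- cut(x, breaks = quantile(x, probs = probs), include.lowest = TRUE)
counts <- table(grp)
counts
```

With continuous data and no ties, this gives nine groups of 100 and a final group holding the remaining 62 values, with no NA entries.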