Divide data frame in three subframes based on values - r

I have a data frame like the one below. I would like to divide it into three submatrices based on the average row value. Basically, I need to first calculate the average value of each row and then divide the data frame by ID according to that average. Is there a way to do this directly in one command?
dput(head(matrix_cpm_spike_norm_mature,6))
structure(c(3.60338983255681, 2.60614455986074, 7.38026015357423,
0.485930325361538, 10.0728954273074, 11.2022336797709, 2.3081852592818,
1.92925594571705, 4.42105830784866, 1.97655358070322, 11.4414272084792,
10.5245987429194, 3.79928941804137, 1.43988964742944, 2.43182718729225,
1.85054706660746, 10.6157635199152, 9.93246596282924, 1.76499400158053,
0.867000595584972, 4.17937486393757, 1.62313934970233, 10.9760540633484,
11.4384402909535, 3.12565733850373, 1.02306009758869, 6.15822532096255,
1.45585403170949, 11.1021517437179, 10.9107307780349, 1.84965418527454,
1.57953239648466, 3.92960542264649, 0.968010692772174, 11.0236055961936,
10.5913003928287, 4.16812971357324, 2.58284099400479, 5.77874217548982,
2.71612416959378, 11.6945755132658, 12.2125038706251, 2.28044423317132,
2.28044423317132, 4.20181017967427, 1.93489752637374, 12.0201280590592,
11.4757936894966, 3.29573463740003, 3.1840621322311, 5.6701373082146,
1.96630490538038, 10.7374381660113, 13.2995125381539, 2.91175170429137,
2.1569190504634, 5.20328658455966, 2.1569190504634, 10.2217040585341,
9.83559070490264, 3.93309437957696, 2.51818736936141, 7.40221631369804,
0.335828139281744, 10.56613413102, 10.6619866025163, 2.27796568023663,
1.66056419670379, 5.29496497910422, 0.699488415484219, 10.310084339381,
9.75424847574386, 3.64268206121968, 2.37908914926602, 8.24035280411754,
1.86474527822428, 8.96352397286089, 10.2227828989983, 2.33835251965787,
2.06871044842623, 4.5954355631791, 1.01279126170234, 12.3601973455172,
11.4656151558222, 2.44441563566059, 2.40421119777587, 6.02973594460351,
1.52508878726709, 10.4309455175175, 9.04837239279591, 2.19268157249804,
1.24045869709133, 3.09706859211401, 1.13115919913948, 10.7933496311904,
9.9695518430138, 2.92171136565383, 0.408245399247249, 2.04506253637125,
1.21781706285458, 11.799265629373, 10.0589147547578, 2.01873582708745,
1.46069392909814, 3.53893363843314, 1.196243674246, 11.337876323565,
11.2124904638672, 2.52480547770426, 1.46864391283124, 4.93028527141748,
0.732496016009805, 10.1639221283662, 9.76884595465318, 1.44174154410475,
1.64024858387856, 5.63721432238654, 1.72991175679785, 11.1211504054224,
10.6024617121637, 3.22964454373919, 1.02206068958127, 6.97852650980799,
1.30645573502801, 12.8991256257565, 13.0000814142899, 2.826661275181,
1.2548377150201, 4.36474368941959, 2.12408033247534, 12.4705231651432,
11.9618071621679, 2.69847153335308, 2.27631143253466, 4.20951126623905,
-0.0664422382974684, 8.98849795367716, 9.41468863016541, 2.43225940352207,
2.09595624532069, 4.63715429950104, 1.25491429244567, 10.9872971587847,
10.4925408800726, 3.45374293794281, 3.13429367766325, 7.39158804595222,
1.54024232057877, 10.8542522087055, 10.6155963202093, 2.69223465587794,
0.809912072570472, 4.64506635251942, 1.6137442296552, 11.418571423468,
12.2726701702859, 2.77267844825925, 1.78512859921307, 9.14121538410264,
-4.43688074939158, 9.73718972990295, 10.2957742347039, 1.49118640848005,
1.5813150312641, 4.0667005226673, 0.59204571815906, 10.0375004920899,
9.54533950930523, 2.16345862609174, 3.05436004610877, 8.07790690254631,
1.79368307261622, 12.3560914214054, 13.0051227140041, 2.58124788178095,
1.88617681546547, 2.73360657353678, 1.77592657048602, 10.9528780864007,
10.4348473225329, 3.00497812997592, 1.7631510316129, 5.90700548290542,
0.877605937609275, 9.87041660889873, 9.7614993662457, 2.05885671172016,
1.76410360006183, 6.3570619407495, 1.76410360006183, 11.685705297838,
11.3602560106241, 2.82800833596723, 2.03247959283134, 6.95870308054991,
1.27136597104121, 10.8372565515699, 9.81173432822107, 2.98238612294936,
1.83072752194121, 5.68689117080157, 1.16721273421709, 12.1340028664354,
11.3680995516162, 3.39653560487816, 2.69687166650826, 6.48412064518055,
1.29454179328719, 10.0028066556529, 10.512440462176, 1.4596941835747,
0.727974609981551, 3.88239656448561, 0.800353161112983, 10.6363990565411,
10.1243371999547, 3.3438733844774, 0.697114163823588, 8.47892011973568,
0.697114163823588, 10.6855866660417, 10.6118837404499, 2.13674190430653,
1.1527414454512, 4.18464270261989, 1.27853746250223, 10.7320904309318,
9.89189546554829, 3.823237482187, 0.874031743370891, 1.95508453030069,
1.87846554046761, 9.88713195004868, 9.86183499157279, 2.53517059781643,
0.728446508864785, 3.50839578521561, 1.9965292874828, 10.9487017925789,
10.9382935538901, 2.62649833974985, 1.92098108152647, 3.36108007152541,
0.488100205530239, 10.4894419819966, 10.2226664777547, 1.33967056255352,
0.0917001628908314, 3.81741552255672, 2.03856822618478, 12.2392733147047,
11.4938478602584, 3.25046552747648, 1.48275456354609, 4.81128734606174,
1.96154924623549, 12.1358994156854, 11.4729501435715, 2.22982247734829,
1.68173260626794, 5.2904025048379, 1.68173260626794, 11.5137150099059,
11.010850144881, 3.91979174751635, 2.17257846100131, 3.53963311590271,
1.6107186758943, 11.4951784786694, 10.1260992493434, 2.35254220581856,
1.26444532526964, 4.6646553023283, 1.17529994013973, 10.9857870198993,
10.5814974795421, 3.75054671795939, 3.03029499770489, 9.00153489042535,
3.27015511904915, 11.5616951026666, 10.6545660172833, 2.18325656314713,
1.50344897782345, 3.97329240591904, 1.50344897782345, 11.3467527232018,
10.707768513033, 2.7932432629389, 1.70473241476131, 5.05340701550577,
0.547092257741687, 11.7624630621943, 9.91053139355945, 2.80574376605516,
1.63874222935751, 3.92143952780207, 1.834040048536, 11.4458009229089,
10.9155922874574, 2.81245930319929, 1.47154823046698, 4.17913709197299,
0.836392585913434, 12.5796307717185, 11.6787505410903, 1.7257545501894,
0.949499417203444, 5.34316953950321, 1.63528985628147, 12.2668256759477,
11.696888238442, 2.43827755616949, 1.81088892324984, 5.13790770293989,
1.62918292819459, 10.2635519811317, 10.3028832599596, 2.07789375455372,
1.08266801489579, 3.36924723755809, 0.361290751749976, 9.978530962904,
9.62993954501451, 3.46479867592192, 2.12156177109216, 8.78601850725548,
1.28447344131806, 11.5362902737332, 11.569001443705, 1.88432186226604,
0.738852102905189, 3.93512805821853, 2.28505136817535, 12.2366325701317,
11.5419247012961, 2.72781971767877, 0.981047395388437, 2.42660975369917,
1.39151946304238, 9.47444971798845, 10.1389766220257, 1.9620421090479,
1.31100308652463, 3.9625997590393, 1.31100308652463, 11.0538785110822,
10.5756552873608, 2.24522309004463, 1.98668802513104, 4.91591774317019,
1.41382278476808, 10.9307830177181, 9.75061291019652, 2.89312477978526,
1.41138577836645, 5.3114727278592, 1.18511963455519, 12.8070122897496,
11.9900344090709, 3.27111851368705, 0.108008163568448, 3.27111851368705,
1.04663775582004, 10.5015959502624, 10.9439135408891, 1.96611160593682,
1.75218878170152, 1.96611160593682, -0.675585494894112, 11.1733364353258,
10.5537870233171, 2.76365773789404, 2.24622484994411, 4.82500775257594,
0.859189165848259, 11.3729184708685, 10.4792280890542, 2.2746810191026,
1.26862765095299, 3.1182053649793, 1.35619562745757, 11.541422194603,
11.0876007441211, 2.93955281780494, 2.41792430951958, 7.43488775671718,
1.59455665002896, 10.8522501620988, 10.7066940648954, 2.56175708272954,
1.63789352792862, 5.62616733093608, 1.77321879142546, 11.5341769241796,
11.3614738869516, 3.52512139810446, 2.04455677275519, 9.07892025450834,
1.80422229584867, 12.2658423589039, 12.7955854152128, 2.41662572851679,
0.739356302189606, 6.58079778253096, 2.2056509250739, 11.4453206203817,
11.2577755286648, 4.43488006430324, 2.47577649074622, 7.97026648668841,
2.47577649074622, 10.5347452823169, 10.9336199647057, 0.253851439446271,
1.33349021364545, 1.90668843852765, 1.38668924767966, 11.529988569981,
9.68152140791945, 2.69321031115829, 1.69780604669796, 9.9826122018356,
-0.0693106968443571, 11.2474341893491, 10.2586104182553, 2.30251678501683,
1.67135129150304, 5.1970813744178, 1.88064524158918, 11.4072019068968,
10.9762764532365, 2.73053313404777, 2.92486117981077, 9.751901699517,
2.31573050368619, 11.0745724079847, 11.5397308498818, 1.85114298699175,
1.74446232598571, 4.67405729933025, 1.36771765795199, 11.1449736215866,
10.6467965079355, 3.12408501589049, 2.00482310148083, 4.90897925267742,
0.753058050023766, 10.3589506698824, 10.3513173985724, 2.72147165969401,
1.46186080167698, 3.9354821726169, 1.23699756995357, 11.1881701541595,
10.931341142772, 2.99260441125988, 2.23826963603629, 3.87458928218141,
0.65507014009726, 9.74342426391136, 10.8918928191143, 2.78046014914365,
0.505490227190171, 4.43327034269568, 1.9329054288394, 11.6078716913683,
10.9011106771378, 2.92822672503857, 1.76286882344236, 6.82012516033837,
1.49833750295022, 9.57121023826284, 8.39324323219324, 2.81052780333747,
1.40588646539131, 3.18875457138522, 1.18481091851206, 11.462231229116,
10.5770583597432, 3.30911666206261, 1.33469191841737, 4.96029091396055,
2.17413398672633, 10.5583178628059, 10.0329059364534, 1.88186145438922,
0.989773711428552, 5.0108103947152, 0.769365496730378, 11.7867744116038,
11.2871021676365, 2.88127273360004, 1.87947399714945, 2.88127273360004,
1.92593791384916, 9.46125879965226, 9.29462968277333, 2.36391085778722,
1.74079268896779, 3.93604270788606, 1.34641095936402, 10.2068762533305,
10.0118036485061, 2.57361661875718, 2.03528106162882, 4.33420919282757,
1.1624557079541, 11.0754175076428, 11.0622083554009, 1.93155319127651,
1.52032894976282, 4.46324520367219, 1.52032894976282, 11.3476141375944,
10.4138620742203, 3.85081675794147, 2.74371623100224, 1.11770920823725,
1.98482146279348, 12.566434472332, 11.9424300470872, 2.53925962537711,
1.48382075377967, 4.87249055114745, 1.07284442662947, 11.7111768375923,
11.5795125663534, 2.15535566170556, 1.6234973786264, 7.72868868952004,
0.562556076894618, 9.85480133708155, 9.92934562704339, 2.54728578873058,
1.8265104903494, 3.76801234500258, 1.2821466213423, 11.2568527058396,
10.8343412926481, 3.05211979795212, 0.152264838074353, 2.54493306112919,
1.77792835724608, 11.5338955198532, 9.97573017264385, 2.42059823105591,
0.60936602148068, 3.12812928480832, 1.14355342605761, 11.5524961118039,
10.5285363661307), .Dim = c(6L, 92L), .Dimnames = list(c("hsa-miR-671-5p",
"hsa-miR-3909", "hsa-miR-1247-5p", "hsa-miR-628-3p", "hsa-miR-23b-3p",
"hsa-miR-127-3p"), c("100G", "100R", "106G", "106R", "122G",
"122R", "124G", "124R", "126G", "126R", "134G", "134R", "141G",
"141R", "167G", "167R", "185G", "185R", "192G", "192R", "235G",
"235R", "239G", "239R", "243G", "243R", "246G", "246R", "261G",
"261R", "267G", "267R", "26G", "26R", "270G", "270R", "279G",
"279R", "299G", "299R", "301G", "301R", "305G", "305R", "342G",
"342R", "350G", "350R", "356G", "356R", "35G", "35R", "361G",
"361R", "366G", "366R", "367G", "367R", "377G", "377R", "379G",
"379R", "388G", "388R", "400G", "400R", "402G", "402R", "46G",
"46R", "48G", "48R", "55G", "55R", "57G", "57R", "60G", "60R",
"68G", "68R", "70G", "70R", "73G", "73R", "77G", "77R", "82G",
"82R", "93G", "93R", "94G", "94R")))

We can use cut to bin the rows of the matrix after computing the rowMeans (assuming 'm1' holds the matrix from the question):
v1 <- rowMeans(m1)
v2 <- cut(v1, breaks=3)
v2
#[1] (5.52,8.48] (11.4,14.4] (5.52,8.48] (11.4,14.4] (5.52,8.48] (5.52,8.48]
#Levels: (5.52,8.48] (8.48,11.4] (11.4,14.4]
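We split the row indices of the matrix ('m1') by 'v2' and subset the rows of 'm1' with each set of indices, which gives a list of 3 matrices.
# drop = FALSE keeps one-row groups as matrices instead of collapsing them to vectors
lst <- lapply(split(seq_len(nrow(m1)), v2), function(i) m1[i, , drop = FALSE])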
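As a minimal sketch of the cut-and-split idea on made-up data (the toy matrix m, its row names and the labels low/med/high are assumptions for illustration, not the data from the question):

```r
# Toy matrix: 6 rows (hypothetical IDs) x 3 samples
m <- matrix(c( 1,  2,  3,
               2,  3,  4,
               9, 10, 11,
              10, 11, 12,
              19, 20, 21,
              20, 21, 22),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("gene", 1:6), c("s1", "s2", "s3")))

avg  <- rowMeans(m)                                   # 2, 3, 10, 11, 20, 21
bins <- cut(avg, breaks = 3,
            labels = c("low", "med", "high"))         # three equal-width bins

# Split the row indices by bin, then subset; drop = FALSE keeps matrices
parts <- lapply(split(seq_len(nrow(m)), bins),
                function(i) m[i, , drop = FALSE])

rownames(parts$med)
# [1] "gene3" "gene4"
```

Note that breaks = 3 cuts the range of the means into equal-width intervals, so a bin can end up empty. If equally sized groups are wanted instead, quantile-based breaks such as cut(avg, breaks = quantile(avg, probs = 0:3/3), include.lowest = TRUE) are one option.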

So here is a different approach to grouping by row-means. This uses kmeans clustering on the row-means to divide based on the closeness of the values. I think this is more in line with your description "low", "med", "high".
Your data set as provided is a matrix with 6 rows and 96 columns (the IDs are in the row names).
df <- data.frame(ID=rownames(df),df)
library(data.table)
setDT(df)[, rm:=rowMeans(.SD), .SDcols=2:ncol(df), by=ID]
df[,grp:=kmeans(rm, centers=3)$cluster]
df[,list(ID,rm,grp)]
# ID rm grp
# 1: hsa-let-7a-3p 5.819544 1
# 2: hsa-let-7a-5p 14.396980 3
# 3: hsa-let-7b-3p 5.526127 1
# 4: hsa-let-7b-5p 11.548722 3
# 5: hsa-let-7c-5p 7.881395 2
# 6: hsa-let-7d-3p 6.368912 1
result <- split(df,df$grp) # split into 3 data.tables based on group
You can see that 5.5, 5.8, and 6.4 are in the first group, 7.9 is in the second group, and 11.5 and 14.4 are in the third group.
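Two things are worth keeping in mind with kmeans: it starts from random centers, so results can change between runs, and the cluster numbers it returns are arbitrary labels rather than an ordering. A minimal base-R sketch, on made-up row means (rm_toy and the labels low/med/high are assumptions for illustration), that fixes the seed and relabels the clusters by their centers:

```r
set.seed(42)                                   # kmeans starts from random centers

# Made-up row means for six hypothetical IDs
rm_toy <- c(a = 1.0, b = 1.2, c = 5.0, d = 5.3, e = 10.0, f = 10.4)

km <- kmeans(rm_toy, centers = 3, nstart = 25) # several restarts, keep the best fit

# Rank each cluster by its center: 1 = lowest mean, 3 = highest mean
rank_of_cluster <- match(km$cluster, order(km$centers[, 1]))
grp <- factor(rank_of_cluster, levels = 1:3, labels = c("low", "med", "high"))

split(names(rm_toy), grp)                      # IDs per ordered group
```

The same relabelling could be applied to the grp column above before calling split, so that the pieces always come out in low/med/high order.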

Related

radius in nn2() function in RANN r-package

I was trying to use the solution offered here to find all the locations in df that are within 70 km of my point of interest, userLocation=c(6.9,55.2), but it does not work properly.
df = structure(list(lng = c(6.2694184, 6.25737207, 6.23839104, 6.25844252,
6.22595901, 6.21351832, 6.2010845, 6.1886414, 6.1762058, 6.1637609,
6.15132287, 6.13887619, 6.12643637, 6.14361895, 6.16332364, 6.18302157,
6.2027276, 6.22242688, 6.24213488, 6.26842752, 6.26745135, 6.24518597,
6.26645948, 6.24420242, 6.22357831, 6.26548171, 6.24321746, 6.2226023,
6.20041884, 6.18070459, 6.16099845, 6.16716672, 6.17960629, 6.18686265,
6.2078525, 6.19203657, 6.20447434, 6.21691835, 6.2293537, 6.24179593,
6.26009321, 6.26448764, 6.2422317, 6.21927538, 6.20186455, 6.26350828,
6.24124514, 6.22028969, 6.26251321, 6.2402584, 6.23404584, 6.26153227,
6.22171658, 5.94065657, 6.10363006, 6.11606487, 6.12850589, 6.14093826,
6.15337749, 6.16582359, 6.17826103, 6.19070472, 6.20313974, 6.20009703,
5.96044213, 5.96988333, 5.98023582, 5.98966667, 5.99910246, 6.00003829,
6.00947365, 6.01889843, 6.02832882, 6.01983402, 6.02925771, 6.038687,
6.0481219, 6.05754688, 6.03963788, 6.04906608, 6.05848435, 6.06792377,
6.07735326, 6.08677283, 6.05941948, 6.06885218, 6.07829049, 6.08771889,
6.09713671, 6.10657633, 6.11600538, 6.07922538, 6.08864707, 6.10756108,
6.12000483, 6.13243993, 6.12019786, 6.14488189, 6.15733073, 6.16977091,
6.16621949, 6.13805015, 6.13652024, 5.941545, 6.20491484, 6.18423897,
6.17806466, 6.16355552, 6.15738696, 6.14558294, 6.14286638, 6.13670293,
6.12217027, 6.11601258, 6.10148275, 6.09533146, 6.08080511, 6.07464337,
6.06011984, 6.03729438, 6.05394895, 6.02546329, 6.0136389, 6.03674112,
6.05743408, 6.07812006, 6.09879971, 6.11948795, 6.11063647, 6.08914275,
6.08440881, 6.0018212, 6.02491713, 5.98999461, 6.01308427, 5.97815849,
6.00125809, 5.96632973, 5.98943792, 5.9995124, 6.02119838, 6.04364466,
6.0223476, 6.04560587, 6.03821257, 6.06131821, 6.06046748, 5.97888909,
5.95766873, 6.24771247, 6.04931495, 6.25538943, 6.23227728, 6.25434093,
6.25329159, 6.25225759, 6.25120656, 6.25015469, 6.24911757, 6.06338238,
6.08539205, 6.10756976, 6.12975108, 6.15193667, 6.17411029, 6.19630377,
6.21848591, 6.22602495, 6.23123663, 6.20931486, 6.23019515, 6.20826628,
6.22915282, 6.20721685, 6.22810966, 6.21962063, 6.20209266, 6.20618216,
6.19702482, 6.1799057, 6.15772301, 6.13554395, 6.11336914, 6.09118237,
6.09738412, 6.11958004, 6.12698723, 6.14767387, 6.16835417, 6.18613747,
6.185096, 6.165456, 6.14476821, 6.15765091, 6.23561071, 6.08001353,
6.22353732, 6.2376767, 6.21143885, 6.19936347, 6.18727866, 6.17520066,
6.16311385, 6.15103386, 6.13894506, 6.12686243, 6.11478725, 6.10270261,
6.09818625, 6.12128852, 6.2468456, 6.22571713, 6.24558662, 6.22445138,
6.24434288, 6.22320086, 6.24308194, 6.22194875, 6.24182062, 6.22068065,
6.24057332, 6.21942655, 6.2113264, 6.22341814, 6.19699748, 6.18490568,
6.1988361, 6.17283631, 6.16074252, 6.14867115, 6.13657473, 6.13954049,
6.16263694, 6.18482009, 6.20327221, 6.20009595, 6.19278885, 6.17005571
), lat = c(54.67598304, 54.83924292, 54.83162024, 54.82483795,
54.82033259, 54.80904336, 54.79775292, 54.78646988, 54.77517665,
54.76389082, 54.75260377, 54.74131515, 54.73002531, 54.72096456,
54.71392047, 54.70687309, 54.69983176, 54.69278713, 54.68573957,
54.68934722, 54.7027117, 54.69910571, 54.71606682, 54.71246092,
54.70614626, 54.72943123, 54.72582507, 54.71951053, 54.71576339,
54.72280423, 54.72985112, 54.74274399, 54.75402944, 54.73569581,
54.72983408, 54.7653223, 54.77660496, 54.78789538, 54.79918423,
54.81047187, 54.80230996, 54.74279524, 54.73918917, 54.74155047,
54.75043676, 54.75615956, 54.75255324, 54.75849353, 54.76951451,
54.76590829, 54.77879358, 54.78287875, 54.84106585, 54.79004116,
54.73264696, 54.7439301, 54.755221, 54.76651031, 54.77779842,
54.78908531, 54.80037062, 54.81166369, 54.82295519, 54.83631649,
54.78306731, 54.79535153, 54.77609951, 54.7883729, 54.80065457,
54.76912877, 54.78140068, 54.7936805, 54.80595963, 54.7621547,
54.77443373, 54.78671208, 54.79898973, 54.81126633, 54.75518666,
54.76746422, 54.77974071, 54.7920169, 54.80429202, 54.81656608,
54.74821493, 54.76049101, 54.7727664, 54.78504073, 54.79732298,
54.80959594, 54.82187682, 54.74124062, 54.75351485, 54.76118719,
54.77247897, 54.78376916, 54.79513973, 54.79505815, 54.80634591,
54.8176321, 54.8309898, 54.82587825, 54.81251828, 54.80340625,
54.85043439, 54.85669012, 54.84379843, 54.8629602, 54.85006754,
54.83850747, 54.86921769, 54.85633303, 54.87548056, 54.86259492,
54.88174916, 54.86885358, 54.88800555, 54.8751176, 54.89427628,
54.89688611, 54.88137801, 54.88534048, 54.87379377, 54.87235508,
54.86608859, 54.85982748, 54.85356275, 54.84730376, 54.83491117,
54.82992904, 54.84309445, 54.86224596, 54.86080945, 54.85069668,
54.84926234, 54.8391549, 54.83771415, 54.82760306, 54.82617384,
54.81401612, 54.81866848, 54.82193287, 54.83203, 54.85454492,
54.8418856, 54.84463218, 54.83126946, 54.807705, 54.81314447,
54.95082492, 54.90870481, 54.85261135, 54.8544958, 54.86597391,
54.87933643, 54.89269028, 54.90605271, 54.91941509, 54.93277779,
54.91930405, 54.92337651, 54.92712528, 54.93087901, 54.93462874,
54.93838307, 54.94213376, 54.94588009, 54.93325056, 54.86785843,
54.86365241, 54.88122101, 54.87701472, 54.89458354, 54.890377,
54.90794604, 54.9204011, 54.92916856, 54.90373959, 54.91606989,
54.92541899, 54.92166542, 54.91791682, 54.91416421, 54.91041621,
54.89753741, 54.90128459, 54.88863077, 54.8823668, 54.87609923,
54.88468492, 54.89804726, 54.89094652, 54.89720452, 54.90830415,
55.08370977, 54.93641839, 55.07226944, 55.06170442, 55.06083624,
55.04939327, 55.03795778, 55.02652115, 55.01508302, 55.00364374,
54.99220296, 54.98077001, 54.96932695, 54.95789135, 54.94469353,
54.94333719, 54.96418243, 54.9585686, 54.97754901, 54.9719349,
54.99090692, 54.98529252, 55.00427341, 54.99865908, 55.01763087,
55.01201625, 55.03099761, 55.02538271, 55.03790915, 55.04934232,
55.02208092, 55.01064489, 54.99998971, 54.99920808, 54.98776941,
54.97632995, 54.9648976, 54.95153594, 54.95007267, 54.95382515,
54.96186591, 54.98662348, 54.97394968, 54.97118391)), class = "data.frame", row.names = c(NA,
-238L))
What I have done is as follows:
Add the point of interest to the beginning of df
df = rbind(userLocation,df)
Set the radius to 0.64 since, according to here, every 0.1 degree is equivalent to about 11.1 km, so 70 km is roughly 0.63 degrees.
radius <- 0.64
library(RANN)
# Identify neighbors within the radius
res <- nn2(df, k=nrow(df), searchtype="radius", radius = radius)
Since my point of interest is the first row in df, I would expect all the non-zero indices in the first row to be the points within my 70 km threshold.
Ind <- res$nn.idx[1,][res$nn.idx[1,]>0]
My Ind object has just one value!
Ind
[1] 1
But if I plot the data, all of the points appear to be within 70 km.
I would appreciate it if someone could help me here.
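One thing worth checking, offered as a possible explanation rather than a confirmed diagnosis: nn2() measures plain Euclidean distance on the columns it is given, so with lng/lat columns the radius is in degrees, and away from the equator a degree of longitude covers fewer km than a degree of latitude. A minimal base-R sketch of the great-circle (haversine) distance shows the gap. The helper haversine_km, the mean Earth radius of 6371 km and the test point are assumptions for illustration.

```r
# Great-circle distance in km between (lng1, lat1) and (lng2, lat2), in degrees.
# Hypothetical helper; r = 6371 km is the commonly used mean Earth radius.
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

userLocation <- c(6.9, 55.2)

# Made-up test point: same latitude, 0.66 degrees further west
pt <- c(lng = 6.24, lat = 55.2)

d_deg <- sqrt((pt[["lng"]] - userLocation[1])^2 +
              (pt[["lat"]] - userLocation[2])^2)   # what a radius in degrees sees
d_km  <- haversine_km(userLocation[1], userLocation[2],
                      pt[["lng"]], pt[["lat"]])    # roughly 42 km

c(degrees = d_deg, km = d_km)
```

Here the point is more than 0.64 degrees away, so a radius of 0.64 excludes it, even though it is well within 70 km. With distances in km, the filter could be applied directly to the original data frame (before the rbind), for example df[haversine_km(userLocation[1], userLocation[2], df$lng, df$lat) <= 70, ].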

Filtering a large named list based on matches to a data frame

I don't work with lists in R often, so I'm sure there is a simple solution here. I am working with a large, named list of KEGG pathway IDs (test1). Within each KEGG pathway ID (koXXXXX) is a list of every gene within that pathway (K#####). I have a selection of important genes (test2) and their associated KEGG IDs (test2$kegg_id; K#####). I'd like to filter test1 to include only KEGG pathway IDs that contain at least one matching $kegg_id from test2 (i.e. contains a matching test2$kegg_id value). I'd like to retain all of the information from test1, but just for pathways that have a matching K##### in test2$kegg_id.
I'd then like to create a character vector of just those KEGG pathway IDs.
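For reference, a minimal sketch of one way this filtering could be done in base R, on a cut-down toy version of the data (toy_test1 and toy_test2 are made-up stand-ins, and it is an assumption that test2 is a data frame with a kegg_id column):

```r
# Toy stand-ins: a named list of pathways and a data frame of genes of interest
toy_test1 <- list(
  `ko00970 Aminoacyl-tRNA biosynthesis` = c("K00604", "K01042"),
  `ko02010 ABC transporters`            = c("K01995", "K01996"),
  `ko02020 Two-component system`        = c("K00027", "K00066")
)
toy_test2 <- data.frame(kegg_id = c("K00604", "K01995"))

# TRUE for each pathway that contains at least one gene of interest
keep <- vapply(toy_test1,
               function(genes) any(genes %in% toy_test2$kegg_id),
               logical(1))

filtered    <- toy_test1[keep]                    # same structure, matching pathways only
pathway_ids <- sub(" .*$", "", names(filtered))   # keep just the leading koXXXXX part

pathway_ids
# [1] "ko00970" "ko02010"
```

The list subset keeps every gene of each retained pathway, and names(filtered) gives the full pathway names if the descriptions are wanted as well.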
Here is a subset of the data:
dput(test1)
list(`ko00970 Aminoacyl-tRNA biosynthesis` = c("K00604", "K01042",
"K01866", "K01867", "K01868", "K01869", "K01870", "K01872", "K01873",
"K01874", "K01875", "K01876", "K01878", "K01879", "K01880", "K01881",
"K01883", "K01884", "K01885", "K01886", "K01887", "K01889", "K01890",
"K01892", "K01893", "K02433", "K02434", "K02435", "K03330", "K03341",
"K03865", "K04566", "K04567", "K06868", "K07587", "K09482", "K09698",
"K09759", "K10837", "K11627", "K14163", "K14164", "K14218", "K14219",
"K14220", "K14221", "K14222", "K14223", "K14224", "K14225", "K14226",
"K14227", "K14228", "K14229", "K14230", "K14231", "K14232", "K14233",
"K14234", "K14235", "K14236", "K14237", "K14238", "K14239", "K22503",
"K24278"), `ko02010 ABC transporters` = c("K01995", "K01996",
"K01997", "K01998", "K01999", "K02000", "K02001", "K02002", "K02006",
"K02007", "K02008", "K02009", "K02010", "K02011", "K02012", "K02017",
"K02018", "K02020", "K02036", "K02037", "K02038", "K02040", "K02041",
"K02042", "K02044", "K02045", "K02046", "K02047", "K02048", "K02062",
"K02063", "K02064", "K02065", "K02066", "K02067", "K02071", "K02072",
"K02073", "K02193", "K02194", "K02195", "K02196", "K02424", "K02471",
"K03523", "K05031", "K05032", "K05033", "K05641", "K05642", "K05643",
"K05644", "K05645", "K05646", "K05647", "K05648", "K05649", "K05650",
"K05651", "K05652", "K05653", "K05654", "K05655", "K05656", "K05657",
"K05658", "K05659", "K05660", "K05661", "K05662", "K05663", "K05664",
"K05665", "K05666", "K05667", "K05668", "K05669", "K05670", "K05671",
"K05672", "K05673", "K05674", "K05675", "K05676", "K05677", "K05678",
"K05679", "K05680", "K05681", "K05682", "K05683", "K05684", "K05685",
"K05772", "K05773", "K05776", "K05813", "K05814", "K05815", "K05816",
"K05845", "K05846", "K05847", "K06073", "K06074", "K06159", "K06160",
"K06161", "K06726", "K06857", "K06858", "K06861", "K07091", "K07122",
"K07323", "K07335", "K08711", "K08712", "K09688", "K09689", "K09690",
"K09691", "K09692", "K09693", "K09694", "K09695", "K09696", "K09697",
"K09808", "K09810", "K09811", "K09812", "K09813", "K09814", "K09815",
"K09816", "K09817", "K09969", "K09970", "K09971", "K09972", "K09996",
"K09997", "K09998", "K09999", "K10000", "K10001", "K10002", "K10003",
"K10004", "K10005", "K10006", "K10007", "K10008", "K10009", "K10010",
"K10013", "K10014", "K10015", "K10016", "K10017", "K10018", "K10019",
"K10020", "K10021", "K10022", "K10023", "K10024", "K10025", "K10036",
"K10037", "K10038", "K10039", "K10040", "K10041", "K10094", "K10107",
"K10108", "K10109", "K10110", "K10111", "K10112", "K10117", "K10118",
"K10119", "K10188", "K10189", "K10190", "K10191", "K10192", "K10193",
"K10194", "K10195", "K10196", "K10197", "K10198", "K10199", "K10200",
"K10201", "K10202", "K10227", "K10228", "K10229", "K10232", "K10233",
"K10234", "K10235", "K10236", "K10237", "K10238", "K10240", "K10241",
"K10242", "K10439", "K10440", "K10441", "K10537", "K10538", "K10539",
"K10540", "K10541", "K10542", "K10543", "K10544", "K10545", "K10546",
"K10547", "K10548", "K10549", "K10550", "K10551", "K10552", "K10553",
"K10554", "K10555", "K10556", "K10557", "K10558", "K10559", "K10560",
"K10561", "K10562", "K10820", "K10823", "K10824", "K10829", "K10830",
"K10831", "K11004", "K11050", "K11051", "K11069", "K11070", "K11071",
"K11072", "K11073", "K11074", "K11075", "K11076", "K11077", "K11078",
"K11079", "K11080", "K11081", "K11082", "K11083", "K11084", "K11085",
"K11601", "K11602", "K11603", "K11604", "K11605", "K11606", "K11607",
"K11631", "K11632", "K11704", "K11705", "K11706", "K11707", "K11708",
"K11709", "K11710", "K11720", "K11950", "K11951", "K11952", "K11953",
"K11954", "K11955", "K11956", "K11957", "K11958", "K11959", "K11960",
"K11961", "K11962", "K11963", "K12292", "K12368", "K12369", "K12370",
"K12371", "K12372", "K12533", "K12536", "K12539", "K12541", "K13409",
"K13889", "K13890", "K13891", "K13892", "K13893", "K13894", "K13895",
"K13896", "K14698", "K14699", "K15495", "K15496", "K15497", "K15551",
"K15552", "K15553", "K15554", "K15555", "K15556", "K15557", "K15558",
"K15576", "K15577", "K15578", "K15579", "K15580", "K15581", "K15582",
"K15583", "K15584", "K15585", "K15586", "K15587", "K15598", "K15599",
"K15600", "K15628", "K15770", "K15771", "K15772", "K16012", "K16013",
"K16014", "K16199", "K16200", "K16201", "K16202", "K16299", "K16783",
"K16784", "K16785", "K16786", "K16787", "K16905", "K16906", "K16907",
"K16915", "K16916", "K16917", "K16918", "K16919", "K16920", "K16921",
"K16956", "K16957", "K16958", "K16959", "K16960", "K16961", "K16962",
"K16963", "K17062", "K17063", "K17073", "K17074", "K17076", "K17077",
"K17202", "K17203", "K17204", "K17205", "K17206", "K17207", "K17208",
"K17209", "K17210", "K17213", "K17214", "K17215", "K17234", "K17235",
"K17236", "K17237", "K17238", "K17239", "K17240", "K17241", "K17242",
"K17243", "K17244", "K17245", "K17246", "K17311", "K17312", "K17313",
"K17314", "K17315", "K17316", "K17317", "K17318", "K17319", "K17320",
"K17321", "K17322", "K17323", "K17324", "K17325", "K17326", "K17327",
"K17328", "K17329", "K17330", "K17331", "K18104", "K18216", "K18217",
"K18230", "K18231", "K18232", "K18233", "K18887", "K18888", "K18889",
"K18890", "K18891", "K18892", "K18893", "K18894", "K18895", "K19079",
"K19080", "K19083", "K19084", "K19226", "K19227", "K19228", "K19229",
"K19230", "K19309", "K19310", "K19340", "K19341", "K19349", "K19350",
"K19971", "K19972", "K19973", "K19975", "K19976", "K20344", "K20386",
"K20459", "K20460", "K20461", "K20490", "K20491", "K20492", "K20494",
"K22921", "K22922", "K22923", "K23055", "K23056", "K23057", "K23058",
"K23059", "K23060", "K23061", "K23062", "K23063", "K23064", "K23125",
"K23163", "K23181", "K23182", "K23183", "K23184", "K23185", "K23186",
"K23187", "K23188", "K23227", "K23228", "K23508", "K23509", "K23510",
"K23511", "K23512", "K23513", "K23535", "K23536", "K23537", "K23545",
"K23546", "K23547"), `ko02020 Two-component system` = c("K00027",
"K00066", "K00244", "K00245", "K00246", "K00247", "K00370", "K00371",
"K00373", "K00374", "K00404", "K00405", "K00406", "K00407", "K00410",
"K00411", "K00412", "K00413", "K00424", "K00425", "K00426", "K00494",
"K00575", "K00626", "K00689", "K00692", "K00990", "K01034", "K01035",
"K01051", "K01077", "K01104", "K01113", "K01179", "K01425", "K01467",
"K01545", "K01546", "K01547", "K01548", "K01643", "K01644", "K01646",
"K01791", "K01910", "K01915", "K01991", "K02040", "K02106", "K02252",
"K02253", "K02259", "K02313", "K02398", "K02402", "K02403", "K02405",
"K02406", "K02472", "K02488", "K02489", "K02490", "K02491", "K02556",
"K02584", "K02650", "K02657", "K02658", "K02659", "K02660", "K02661",
"K02667", "K02668", "K03092", "K03367", "K03400", "K03406", "K03407",
"K03408", "K03412", "K03413", "K03415", "K03532", "K03533", "K03563",
"K03620", "K03739", "K03740", "K03776", "K04751", "K04771", "K05338",
"K05339", "K05597", "K05874", "K05875", "K05876", "K05877", "K05964",
"K05966", "K06046", "K06080", "K06281", "K06282", "K06347", "K06375",
"K06596", "K06597", "K06598", "K07165", "K07260", "K07636", "K07637",
"K07638", "K07639", "K07640", "K07641", "K07642", "K07643", "K07644",
"K07645", "K07646", "K07647", "K07648", "K07649", "K07650", "K07651",
"K07652", "K07653", "K07654", "K07655", "K07656", "K07657", "K07658",
"K07659", "K07660", "K07661", "K07662", "K07663", "K07664", "K07665",
"K07666", "K07667", "K07668", "K07669", "K07670", "K07671", "K07672",
"K07673", "K07674", "K07675", "K07676", "K07677", "K07678", "K07679",
"K07680", "K07681", "K07682", "K07683", "K07684", "K07685", "K07686",
"K07687", "K07688", "K07689", "K07690", "K07691", "K07692", "K07693",
"K07694", "K07695", "K07696", "K07697", "K07698", "K07699", "K07700",
"K07701", "K07702", "K07703", "K07704", "K07705", "K07706", "K07707",
"K07708", "K07709", "K07710", "K07711", "K07712", "K07713", "K07714",
"K07715", "K07716", "K07717", "K07718", "K07719", "K07720", "K07768",
"K07769", "K07770", "K07771", "K07772", "K07773", "K07774", "K07775",
"K07776", "K07777", "K07778", "K07780", "K07781", "K07782", "K07783",
"K07784", "K07785", "K07786", "K07787", "K07788", "K07789", "K07790",
"K07792", "K07793", "K07794", "K07795", "K07796", "K07797", "K07798",
"K07799", "K07800", "K07801", "K07803", "K07804", "K07805", "K07806",
"K07810", "K07811", "K07813", "K08082", "K08083", "K08348", "K08349",
"K08350", "K08357", "K08358", "K08359", "K08372", "K08475", "K08476",
"K08477", "K08478", "K08479", "K08641", "K08738", "K08926", "K08927",
"K08928", "K08929", "K08930", "K08939", "K09474", "K09475", "K09476",
"K09477", "K09696", "K09697", "K10001", "K10002", "K10003", "K10004",
"K10125", "K10126", "K10255", "K10681", "K10682", "K10697", "K10715",
"K10850", "K10851", "K10909", "K10910", "K10911", "K10912", "K10913",
"K10914", "K10916", "K10941", "K10942", "K10943", "K11103", "K11230",
"K11231", "K11232", "K11233", "K11326", "K11327", "K11328", "K11329",
"K11330", "K11331", "K11332", "K11354", "K11355", "K11356", "K11357",
"K11382", "K11383", "K11384", "K11443", "K11444", "K11520", "K11521",
"K11522", "K11523", "K11524", "K11525", "K11526", "K11601", "K11602",
"K11603", "K11614", "K11615", "K11616", "K11617", "K11618", "K11619",
"K11620", "K11621", "K11622", "K11623", "K11624", "K11625", "K11626",
"K11629", "K11630", "K11631", "K11632", "K11633", "K11634", "K11635",
"K11636", "K11637", "K11638", "K11639", "K11640", "K11641", "K11688",
"K11689", "K11690", "K11691", "K11692", "K11711", "K11712", "K12292",
"K12293", "K12294", "K12295", "K12296", "K12340", "K12415", "K12530",
"K12531", "K12532", "K13040", "K13041", "K13061", "K13486", "K13487",
"K13488", "K13489", "K13490", "K13491", "K13532", "K13533", "K13584",
"K13587", "K13588", "K13589", "K13598", "K13599", "K13815", "K13816",
"K13924", "K13927", "K13991", "K13994", "K14188", "K14205", "K14978",
"K14979", "K14980", "K14981", "K14982", "K14983", "K14986", "K14987",
"K14988", "K14989", "K15011", "K15012", "K15739", "K15841", "K15850",
"K15851", "K15853", "K15854", "K15859", "K15860", "K15861", "K15862",
"K16692", "K16712", "K16713", "K17060", "K17061", "K18072", "K18073",
"K18093", "K18094", "K18095", "K18321", "K18322", "K18323", "K18324",
"K18326", "K18344", "K18345", "K18346", "K18347", "K18348", "K18349",
"K18350", "K18351", "K18352", "K18353", "K18354", "K18444", "K18856",
"K18866", "K18940", "K18941", "K18986", "K18987", "K19077", "K19078",
"K19079", "K19080", "K19081", "K19082", "K19083", "K19084", "K19609",
"K19610", "K19611", "K19615", "K19616", "K19617", "K19618", "K19620",
"K19621", "K19622", "K19624", "K19641", "K19661", "K19666", "K19667",
"K19668", "K19690", "K19691", "K19692", "K20263", "K20264", "K20339",
"K20340", "K20482", "K20483", "K20484", "K20485", "K20486", "K20487",
"K20488", "K20489", "K20490", "K20491", "K20492", "K20494", "K20552",
"K20973", "K20974", "K20975", "K20976", "K20977", "K20978", "K22501",
"K23236", "K23514", "K23548", "K23549"), `ko02024 Quorum sensing` = c("K00494",
"K01114", "K01218", "K01318", "K01364", "K01399", "K01497", "K01580",
"K01626", "K01635", "K01657", "K01658", "K01728", "K01897", "K01995",
"K01996", "K01997", "K01998", "K01999", "K02031", "K02032", "K02033",
"K02034", "K02035", "K02052", "K02053", "K02054", "K02055", "K02250",
"K02251", "K02252", "K02253", "K02402", "K02403", "K02490", "K03070",
"K03071", "K03073", "K03075", "K03076", "K03106", "K03110", "K03210",
"K03217", "K03400", "K03666", "K06046", "K06352", "K06353", "K06354",
"K06355", "K06356", "K06358", "K06359", "K06360", "K06361", "K06363",
"K06364", "K06365", "K06366", "K06369", "K06375", "K06998", "K07173",
"K07344", "K07645", "K07666", "K07667", "K07680", "K07691", "K07692",
"K07699", "K07706", "K07707", "K07711", "K07715", "K07781", "K07782",
"K07800", "K07813", "K08321", "K08605", "K08642", "K08777", "K09823",
"K09936", "K10555", "K10556", "K10557", "K10558", "K10715", "K10823",
"K10909", "K10910", "K10911", "K10912", "K10913", "K10914", "K10915",
"K10916", "K10917", "K11006", "K11007", "K11031", "K11033", "K11034",
"K11035", "K11036", "K11037", "K11039", "K11063", "K11216", "K11530",
"K11531", "K11752", "K12257", "K12292", "K12293", "K12294", "K12295",
"K12296", "K12415", "K12789", "K12990", "K13060", "K13061", "K13062",
"K13063", "K13075", "K13815", "K13816", "K14051", "K14645", "K14982",
"K14983", "K15580", "K15581", "K15582", "K15583", "K15654", "K15655",
"K15656", "K15657", "K15850", "K15851", "K15852", "K15853", "K15854",
"K16619", "K17940", "K18000", "K18001", "K18002", "K18003", "K18096",
"K18098", "K18099", "K18100", "K18101", "K18139", "K18304", "K18306",
"K18307", "K18315", "K18316", "K18317", "K18318", "K18319", "K19666",
"K19731", "K19732", "K19733", "K19734", "K19735", "K20086", "K20087",
"K20088", "K20089", "K20090", "K20248", "K20249", "K20250", "K20252",
"K20253", "K20256", "K20257", "K20258", "K20259", "K20260", "K20261",
"K20262", "K20263", "K20264", "K20265", "K20266", "K20267", "K20268",
"K20269", "K20270", "K20271", "K20272", "K20273", "K20274", "K20275",
"K20276", "K20277", "K20321", "K20322", "K20323", "K20324", "K20325",
"K20326", "K20327", "K20328", "K20329", "K20330", "K20331", "K20332",
"K20333", "K20334", "K20335", "K20336", "K20337", "K20338", "K20339",
"K20340", "K20341", "K20342", "K20343", "K20344", "K20345", "K20373",
"K20374", "K20375", "K20376", "K20377", "K20378", "K20379", "K20380",
"K20381", "K20382", "K20383", "K20384", "K20385", "K20386", "K20387",
"K20388", "K20389", "K20390", "K20391", "K20480", "K20481", "K20482",
"K20483", "K20484", "K20485", "K20486", "K20487", "K20488", "K20489",
"K20490", "K20491", "K20492", "K20494", "K20527", "K20528", "K20529",
"K20530", "K20531", "K20532", "K20533", "K20539", "K20540", "K20552",
"K20554", "K20555", "K22954", "K22955", "K22956", "K22957", "K22968",
"K23133"), `ko02025 Biofilm formation - Pseudomonas aeruginosa` = c("K01657",
"K01658", "K01768", "K02398", "K02405", "K02657", "K02658", "K02659",
"K02660", "K03563", "K03651", "K06596", "K06598", "K07678", "K07689",
"K10914", "K10941", "K11444", "K11890", "K11891", "K11893", "K11895",
"K11900", "K11901", "K11902", "K11903", "K11907", "K11912", "K11913",
"K11915", "K12990", "K12992", "K13060", "K13061", "K13487", "K13488",
"K13489", "K13490", "K13491", "K16011", "K17940", "K18000", "K18001",
"K18002", "K18003", "K18099", "K18100", "K18101", "K18304", "K19291",
"K19735", "K20257", "K20258", "K20259", "K20968", "K20969", "K20970",
"K20971", "K20972", "K20973", "K20974", "K20975", "K20976", "K20977",
"K20978", "K20987", "K20997", "K20998", "K20999", "K21000", "K21001",
"K21002", "K21003", "K21004", "K21005", "K21006", "K21007", "K21008",
"K21009", "K21010", "K21011", "K21012", "K21019", "K21020", "K21021",
"K21022", "K21023", "K21024", "K21025", "K23127"), `ko02026 Biofilm formation - Escherichia coli` = c("K00688",
"K00694", "K00703", "K00975", "K01991", "K02398", "K02402", "K02403",
"K02405", "K02425", "K02777", "K03087", "K03563", "K03566", "K03567",
"K04333", "K04334", "K04335", "K04336", "K04761", "K05851", "K06204",
"K07173", "K07638", "K07648", "K07659", "K07676", "K07677", "K07678",
"K07687", "K07689", "K07773", "K07781", "K07782", "K10914", "K11531",
"K11931", "K11935", "K11936", "K11937", "K12687", "K14051", "K18502",
"K18504", "K18509", "K18515", "K18516", "K18518", "K18521", "K18522",
"K18523", "K18528", "K18968", "K21084", "K21085", "K21086", "K21087",
"K21088", "K21089", "K21090", "K21091"))
And a truncated dataframe with interesting genes
dput(test2)
structure(list(gene_id = c("G6381", "G12285", "G10911", "G17366",
"G3593", "G17753"), kegg_id = c("K18523", "K19009", "K07782",
"K02398", "K21407", "K00922")), row.names = c(NA, 6L), class = "data.frame")
To get the corresponding 'gene_id', create a named vector from 'test2', loop over the list ('test1'), index the named vector by each element's 'kegg_id' values to extract the matching 'gene_id', and drop the non-matching elements with na.omit
nm1 <- with(test2, setNames(gene_id, kegg_id))
lst1 <- lapply(test1, function(x) as.vector(na.omit(nm1[x])))
To filter the original list
test1[lengths(lst1) > 0]
Or to filter the subset list
lst1[lengths(lst1) > 0]
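As a minimal sketch of the lookup described above, here is the same idea on toy data. The list and data frame below are small illustrative stand-ins for 'test1' and 'test2' (the pathway names and the ids "K00001" and "K99999" are made up), not the full objects from the question.

```r
# Toy stand-ins for 'test1' (pathway -> KEGG ids) and 'test2' (gene -> KEGG id);
# the values are illustrative only.
test1 <- list(
  pathway_a = c("K18523", "K00001"),
  pathway_b = c("K07782", "K02398"),
  pathway_c = c("K99999")
)
test2 <- data.frame(
  gene_id = c("G6381", "G10911", "G17366"),
  kegg_id = c("K18523", "K07782", "K02398"),
  stringsAsFactors = FALSE
)

# Named vector: names are KEGG ids, values are gene ids
nm1 <- with(test2, setNames(gene_id, kegg_id))

# Indexing by name gives NA for ids absent from 'test2'; na.omit drops them
lst1 <- lapply(test1, function(x) as.vector(na.omit(nm1[x])))

# Keep only the pathways with at least one matching gene
hits <- lst1[lengths(lst1) > 0]
print(hits)
```

Note that indexing a named vector by name returns only the first match, so if a 'kegg_id' occurs more than once in 'test2', only one of its genes would be picked up by this approach.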

Remove all rows above and below a value in R

We have citizen scientists recording data for us using In-Situ Aqua troll 600 instruments. The instrument is similar to a CTD, but the data format is different enough that I cannot use the CTD trim function from the oce package in R. I need to remove all rows recorded during the soak time (time in the water before they start lowering the instrument) and all rows from the upcast, that is, every row after the instrument reached the maximum depth. So I just need the center portion of my data frame.
My Data
Date Time Salinity (ppt) (672441) Chlorophyll-a Fluorescence (RFU) (671721) RDO Concentration (mg/L) (672144) Temperature (°C) (676121) Depth (ft) (671051)
16:29.0 0 0.01089297 7.257619 31.91303 0.008220486
16:31.0 0 0.01765913 7.246986 31.93175 0.1499496
16:33.0 0 0.0130412 7.258863 31.93253 0.5387784
16:35.0 0 0.01299242 7.274049 31.93806 0.6187978
16:37.0 0 0.01429801 7.26965 31.94401 0.6640261
16:39.0 0 0.01342988 7.271608 31.93595 0.681709
16:41.0 0 0.01337719 7.271549 31.93503 0.684597
16:43.0 7.087267 0.007094439 6.98015 31.89018 1.598019
16:45.0 28.3442 0.007111916 6.268753 31.83806 1.687673
16:47.0 31.06357 0.007945394 6.197834 31.77821 1.418773
16:49.0 32.07076 0.0080788 6.166986 31.76881 1.382685
16:51.0 31.95504 0.004382414 6.191305 31.72906 1.358556
16:53.0 36.21165 0.01983912 5.732656 29.3942 123.4148
16:55.0 36.37849 0.02243886 5.626586 28.82502 125.2927
16:57.0 36.43061 0.02416219 5.450325 28.23787 126.7997
16:59.0 36.44484 0.02441683 5.421676 28.14037 127.0321
17:01.0 36.46815 4.510316 5.318929 28.09501 127.2064
17:03.0 36.41381 4.012657 5.241654 28.14595 127.2227
17:05.0 36.42724 0.7891375 5.174401 28.20383 127.2019
17:07.0 36.41064 0.4351442 5.120181 28.18592 127.197
17:09.0 36.38155 0.2253969 5.033384 28.21021 127.1895
17:11.0 36.37671 0.2089337 5.019629 28.21222 127.1885
17:13.0 36.43813 0.08728585 4.981099 28.17526 127.2223
17:15.0 36.47644 0.904435 4.951878 28.13579 127.2108
17:17.0 36.54742 0.1230291 4.93056 28.06166 127.2307
17:19.0 36.60466 10.04291 4.908442 27.9397 126.6003
17:21.0 36.61511 11.33922 4.904828 27.92038 126.5161
17:23.0 36.68179 0.6680982 4.87018 27.78319 123.707
17:25.0 36.74612 0.06539913 4.848994 27.72977 119.906
17:27.0 36.75729 0.02414635 4.826871 27.72545 114.9537
17:29.0 37.1578 0.01556828 4.804105 27.81129 113.3405
> depthmax<- max(WS$`Depth (ft) (671051)`, na.rm = TRUE)
> output <- WS[WS$"Depth (ft) (671051)" < depthmax,]
> Output2 <- output[output$"Depth (ft) (671051)" > 1,]
I tried these and got Output2 to work but can't seem to get output to work. Is there a more elegant way to do this? To recap, I need to remove all rows after depthmax (127.2307) and all rows before the depth at which they start lowering the instrument (~2.41).
Your code does remove the maximum depth, but not the rows after the maximum depth is reached. You want to locate the row index of the maximum depth and delete that row and the ones after:
# which() and which.max() skip NA themselves; na.omit() would shift the row indices if the column held NAs
start <- tail(which(WS$`Depth (ft) (671051)` < 2.41), 1) + 1
end <- which.max(WS$`Depth (ft) (671051)`) - 1
output <- WS[start:end, ]
The start line finds the index of the last row with a depth below 2.41 and adds 1 to get the starting row. The end line finds the index of the maximum depth and subtracts 1 to get the row before it.
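A sketch of the same index-based trim on made-up depths, with one extra assumption: the search for the end of the soak is restricted to rows before the maximum, so a shallow reading late in the upcast cannot be mistaken for the soak. The column name `depth` and the values are hypothetical; the 2.41 ft threshold comes from the question.

```r
# Toy depth trace (ft): soak near the surface, downcast, then upcast.
# Values are invented; the last row is a shallow upcast reading.
WS <- data.frame(depth = c(0.1, 0.6, 1.6, 1.4, 50, 123, 127.2, 126.5, 114, 1.0))

depth <- WS$depth
i_max <- which.max(depth)                        # row of maximum depth (NAs are skipped)
soak  <- which(depth[seq_len(i_max)] < 2.41)     # shallow rows before the maximum only
start <- if (length(soak)) max(soak) + 1 else 1  # first row after the soak
end   <- i_max - 1                               # row just before the maximum

downcast <- WS[start:end, , drop = FALSE]
print(downcast)
```

To keep the maximum-depth row itself, `end <- i_max` could be used instead. If the soak ends directly at the maximum, `start` exceeds `end` and `start:end` counts backwards, so a real script would probably want a check for that case.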

Including lagged independent variables - R

I would like to run a regression where I use both the current value and lagged values from a specific independent variable.
My dataset
This is an example extract from my dataset:
dt nrOfCalls nrOfOrders nrOfOrdersLag1 nrOfOrdersLag2 nrOfOrdersLag3
2016/04/20 17 5 9 7 12
2016/04/21 12 8 5 9 7
2016/04/22 14 4 8 5 9
2016/04/23 15 6 4 8 5
2016/04/24 20 14 6 4 8
2016/04/25 10 3 14 6 4
Where nrOfOrdersLagX is the number of orders X days ago. I have also included dummy variables (because of limited space I have not shown these dummy variables in the example extract of my dataset).
My code
When I run the following code everything works perfectly fine:
reg <- lm(nrOfCalls ~ dummy1+...+dummy6+nrOfOrders, data=trainingSet)
However, when I try including the lagged values of the nrOfOrders regressor (for this example I only include one lagged value), I get some unexpected results. I use the following code:
reg <- lm(nrOfCalls ~ dummy1+...+dummy6+nrOfOrders+nrOfOrdersLag1, data=trainingSet)
Instead of merely including the regressor nrOfOrdersLag1, it includes all kinds of regressors whose variable names are a variation on nrOfOrdersLag1.
Call:
lm(formula = nrOfCalls ~ dummy1 + dummy2 + dummy3 + dummy4 +
dummy5 + dummy6 + nrOfOrders + nrOfOrdersLag1, data = trainCall)
Coefficients:
(Intercept) dummy1 dummy2 dummy3 dummy4
604.06334 -114.03241 -229.67540 -270.62292 -220.12409
dummy5 dummy6 nrOfOrders nrOfOrdersLag110707 nrOfOrdersLag11161
-457.22245 -465.17116 0.01729 -249.54641 -10.98526
nrOfOrdersLag111869 nrOfOrdersLag11207 nrOfOrdersLag11234 nrOfOrdersLag11262 nrOfOrdersLag11267
45.36821 33.46161 -17.70615 -384.09745 -413.64804
nrOfOrdersLag11279 nrOfOrdersLag11285 nrOfOrdersLag112945 nrOfOrdersLag11336 nrOfOrdersLag11348
-200.19660 32.75546 -264.04005 -47.13457 79.48368
nrOfOrdersLag11351 nrOfOrdersLag11355 nrOfOrdersLag11363 nrOfOrdersLag11364 nrOfOrdersLag11368
-208.62312 6.83426 -98.71679 170.29583 -93.83054
nrOfOrdersLag11375 nrOfOrdersLag11398 nrOfOrdersLag11456 nrOfOrdersLag11462 nrOfOrdersLag11464
50.54960 14.39958 118.73762 113.72744 190.54445
nrOfOrdersLag11469 nrOfOrdersLag114778 nrOfOrdersLag11486 nrOfOrdersLag11489 nrOfOrdersLag11504
-8.79258 84.35041 66.29121 29.67360 24.30553
nrOfOrdersLag11505 nrOfOrdersLag11511 nrOfOrdersLag11520 nrOfOrdersLag11521 nrOfOrdersLag11527
286.85352 69.76762 -159.45588 -38.90402 53.62128
nrOfOrdersLag11538 nrOfOrdersLag11540 nrOfOrdersLag11564 nrOfOrdersLag115674 nrOfOrdersLag11579
-104.66037 -60.10656 -58.32177 522.56810 77.65481
nrOfOrdersLag11587 nrOfOrdersLag11593 nrOfOrdersLag11603 nrOfOrdersLag11618 nrOfOrdersLag11622
34.63649 31.28570 -124.35673 16.43115 207.99435
nrOfOrdersLag11624 nrOfOrdersLag11626 nrOfOrdersLag11629 nrOfOrdersLag11631 nrOfOrdersLag11635
93.90391 78.94275 155.88327 15.32027 125.02409
nrOfOrdersLag11640 nrOfOrdersLag11645 nrOfOrdersLag11649 nrOfOrdersLag11651 nrOfOrdersLag11653
208.51996 -42.03086 -1.62533 164.73045 12.61157
nrOfOrdersLag11654 nrOfOrdersLag11673 nrOfOrdersLag11683 nrOfOrdersLag11688 nrOfOrdersLag11698
129.26306 -41.56615 137.09095 149.86866 -49.43096
nrOfOrdersLag11699 nrOfOrdersLag11702 nrOfOrdersLag11703 nrOfOrdersLag11705 nrOfOrdersLag11714
76.86530 202.69027 -70.26281 -173.43605 170.02302
nrOfOrdersLag11715 nrOfOrdersLag11716 nrOfOrdersLag11726 nrOfOrdersLag11749 nrOfOrdersLag11754
34.30252 75.45378 176.16211 76.39492 58.11995
nrOfOrdersLag11757 nrOfOrdersLag11764 nrOfOrdersLag11766 nrOfOrdersLag11772 nrOfOrdersLag11777
133.71731 137.62373 24.95059 -75.96096 54.03353
nrOfOrdersLag11778 nrOfOrdersLag11782 nrOfOrdersLag11793 nrOfOrdersLag11806 nrOfOrdersLag11810
-147.40657 -45.70752 27.76710 94.17449 -191.98461
nrOfOrdersLag11811 nrOfOrdersLag11812 nrOfOrdersLag11814 nrOfOrdersLag11815 nrOfOrdersLag11817
61.04646 145.25908 38.56959 18.22574 140.84081
nrOfOrdersLag11827 nrOfOrdersLag11832 nrOfOrdersLag11839 nrOfOrdersLag11841 nrOfOrdersLag11859
-254.56931 138.30797 -139.32523 -151.50010 39.27760
nrOfOrdersLag11860 nrOfOrdersLag11862 nrOfOrdersLag11868 nrOfOrdersLag11874 nrOfOrdersLag11876
304.88804 150.84361 30.75749 -91.55666 192.43385
nrOfOrdersLag11879 nrOfOrdersLag11880 nrOfOrdersLag11885 nrOfOrdersLag11887 nrOfOrdersLag11891
118.75260 -44.83615 163.35474 194.12038 127.79107
nrOfOrdersLag11896 nrOfOrdersLag11901 nrOfOrdersLag11914 nrOfOrdersLag11919 nrOfOrdersLag11921
82.79870 179.44324 303.18796 242.51540 159.40652
nrOfOrdersLag11928 nrOfOrdersLag11929 nrOfOrdersLag11932 nrOfOrdersLag11937 nrOfOrdersLag11939
484.73958 35.38640 286.54643 46.88513 48.94031
nrOfOrdersLag11952 nrOfOrdersLag11967 nrOfOrdersLag11988 nrOfOrdersLag11994 nrOfOrdersLag11996
265.02228 170.65576 47.77627 317.10968 383.09702
nrOfOrdersLag119987 nrOfOrdersLag12007 nrOfOrdersLag12010 nrOfOrdersLag12017 nrOfOrdersLag12018
416.71786 93.41540 61.71721 73.68938 136.60641
nrOfOrdersLag12019 nrOfOrdersLag12023 nrOfOrdersLag12027 nrOfOrdersLag12034 nrOfOrdersLag12040
88.13672 -214.93168 38.82154 148.72993 -60.63852
nrOfOrdersLag12050 nrOfOrdersLag12051 nrOfOrdersLag12056 nrOfOrdersLag12058 nrOfOrdersLag12060
205.21811 246.46001 163.20151 -0.35863 61.93024
nrOfOrdersLag12073 nrOfOrdersLag12082 nrOfOrdersLag12087 nrOfOrdersLag12093 nrOfOrdersLag12107
122.50936 -27.13307 -43.74262 366.51938 146.85581
nrOfOrdersLag12119 nrOfOrdersLag12122 nrOfOrdersLag12124 nrOfOrdersLag121319 nrOfOrdersLag12133
119.31341 36.35183 253.68015 115.01838 228.66567
nrOfOrdersLag12136 nrOfOrdersLag12137 nrOfOrdersLag12154 nrOfOrdersLag12167 nrOfOrdersLag12169
-9.97711 121.20416 -448.43096 324.45466 169.37446
nrOfOrdersLag12176 nrOfOrdersLag12180 nrOfOrdersLag12181 nrOfOrdersLag12184 nrOfOrdersLag12186
88.35432 -14.74399 41.03555 310.68640 308.82549
nrOfOrdersLag12189 nrOfOrdersLag12195 nrOfOrdersLag12202 nrOfOrdersLag12204 nrOfOrdersLag12216
121.87542 264.78895 191.52156 281.02113 168.29821
nrOfOrdersLag12219 nrOfOrdersLag12221 nrOfOrdersLag12231 nrOfOrdersLag12236 nrOfOrdersLag12237
218.48030 66.07233 -228.54230 111.06068 162.65347
nrOfOrdersLag12242 nrOfOrdersLag12244 nrOfOrdersLag12246 nrOfOrdersLag12261 nrOfOrdersLag12262
12.05505 114.60872 -123.06406 -45.54485 380.26022
nrOfOrdersLag12268 nrOfOrdersLag12271 nrOfOrdersLag12302 nrOfOrdersLag12304 nrOfOrdersLag12311
4.23556 249.55941 248.38079 103.12194 -71.69000
nrOfOrdersLag12313 nrOfOrdersLag12329 nrOfOrdersLag12345 nrOfOrdersLag12353 nrOfOrdersLag12356
247.93662 207.13958 314.96154 95.08688 300.10247
nrOfOrdersLag12361 nrOfOrdersLag12371 nrOfOrdersLag12376 nrOfOrdersLag12380 nrOfOrdersLag12384
37.27506 -167.84137 66.61313 247.32681 237.73556
nrOfOrdersLag12399 nrOfOrdersLag12406 nrOfOrdersLag12413 nrOfOrdersLag12417 nrOfOrdersLag12420
107.37362 399.28658 275.48695 95.07723 324.87029
nrOfOrdersLag12423 nrOfOrdersLag12434 nrOfOrdersLag12437 nrOfOrdersLag12442 nrOfOrdersLag12446
233.30480 193.45613 250.79606 322.78975 320.40151
nrOfOrdersLag12448 nrOfOrdersLag12449 nrOfOrdersLag12451 nrOfOrdersLag12460 nrOfOrdersLag124708
172.20478 -113.45790 108.52769 305.32173 -134.41931
nrOfOrdersLag12484 nrOfOrdersLag12486 nrOfOrdersLag12493 nrOfOrdersLag12497 nrOfOrdersLag12505
156.35931 -9.49808 223.13247 -67.47891 534.66815
nrOfOrdersLag12541 nrOfOrdersLag12552 nrOfOrdersLag12563 nrOfOrdersLag12588 nrOfOrdersLag12596
221.35464 1.92188 -53.40846 -473.89923 497.69016
nrOfOrdersLag12611 nrOfOrdersLag12618 nrOfOrdersLag12623 nrOfOrdersLag12632 nrOfOrdersLag12638
175.77150 125.22040 -302.58298 -159.54109 -337.04664
nrOfOrdersLag12646 nrOfOrdersLag12648 nrOfOrdersLag12663 nrOfOrdersLag12665 nrOfOrdersLag12687
539.15416 350.53169 -148.22458 147.67351 -349.52567
nrOfOrdersLag12696 nrOfOrdersLag12713 nrOfOrdersLag12721 nrOfOrdersLag12723 nrOfOrdersLag12743
-42.64843 141.90979 47.07766 -443.50878 356.28944
nrOfOrdersLag12745 nrOfOrdersLag12750 nrOfOrdersLag12753 nrOfOrdersLag12761 nrOfOrdersLag127688
14.65720 13.35666 8.30924 -191.17540 -123.52409
nrOfOrdersLag12802 nrOfOrdersLag12806 nrOfOrdersLag12812 nrOfOrdersLag12815 nrOfOrdersLag12818
128.14604 281.35157 361.79299 8.34690 86.67458
nrOfOrdersLag12824 nrOfOrdersLag12836 nrOfOrdersLag12841 nrOfOrdersLag12842 nrOfOrdersLag12876
518.23720 -357.78788 288.63660 433.15556 158.51341
nrOfOrdersLag12883 nrOfOrdersLag12884 nrOfOrdersLag12901 nrOfOrdersLag12941 nrOfOrdersLag12956
214.74913 68.99485 -208.43888 -297.43011 319.30849
nrOfOrdersLag12996 nrOfOrdersLag13007 nrOfOrdersLag13013 nrOfOrdersLag13023 nrOfOrdersLag13033
321.02569 -88.96746 80.93579 106.97804 -223.88599
nrOfOrdersLag13051 nrOfOrdersLag13072 nrOfOrdersLag13094 nrOfOrdersLag13098 nrOfOrdersLag13127
40.95339 161.48086 524.04025 -94.23016 17.50082
nrOfOrdersLag13152 nrOfOrdersLag13171 nrOfOrdersLag13185 nrOfOrdersLag13202 nrOfOrdersLag13205
-266.11135 8.82232 -107.11441 -141.14442 212.80057
nrOfOrdersLag13222 nrOfOrdersLag13277 nrOfOrdersLag13295 nrOfOrdersLag13321 nrOfOrdersLag13332
187.90431 306.69183 -24.55235 68.42339 -290.11682
nrOfOrdersLag13362 nrOfOrdersLag13378 nrOfOrdersLag13380 nrOfOrdersLag13391 nrOfOrdersLag13476
44.30976 463.85118 276.57882 -282.06457 34.35207
nrOfOrdersLag13488 nrOfOrdersLag13490 nrOfOrdersLag13530 nrOfOrdersLag13578 nrOfOrdersLag13599
217.46608 386.26006 194.69082 52.45357 406.44931
nrOfOrdersLag13611 nrOfOrdersLag13618 nrOfOrdersLag13626 nrOfOrdersLag13632 nrOfOrdersLag13635
242.81201 -22.19253 23.90163 -395.87751 103.44677
nrOfOrdersLag13674 nrOfOrdersLag13681 nrOfOrdersLag13767 nrOfOrdersLag13841 nrOfOrdersLag13849
200.18354 83.25027 -71.88190 382.05886 -279.73606
nrOfOrdersLag13857 nrOfOrdersLag13874 nrOfOrdersLag13885 nrOfOrdersLag13897 nrOfOrdersLag13908
370.92867 -17.14313 -140.99009 -244.17716 93.79552
nrOfOrdersLag13966 nrOfOrdersLag14009 nrOfOrdersLag14031 nrOfOrdersLag14111 nrOfOrdersLag14160
61.75484 224.96558 -107.99394 -126.12766 572.14222
nrOfOrdersLag14171 nrOfOrdersLag14205 nrOfOrdersLag14312 nrOfOrdersLag14468 nrOfOrdersLag14560
-42.29929 -379.41067 194.25204 -47.50642 -116.49251
nrOfOrdersLag14619 nrOfOrdersLag14640 nrOfOrdersLag14684 nrOfOrdersLag14762 nrOfOrdersLag14776
41.34325 -355.84333 -122.77109 -331.12296 404.86637
nrOfOrdersLag14865 nrOfOrdersLag14959 nrOfOrdersLag14967 nrOfOrdersLag15195 nrOfOrdersLag15218
371.14617 104.60840 -42.74014 99.78008 520.62517
nrOfOrdersLag15402 nrOfOrdersLag16029 nrOfOrdersLag16284 nrOfOrdersLag16321 nrOfOrdersLag16350
529.17004 161.02870 268.77256 74.02159 386.53868
nrOfOrdersLag16418 nrOfOrdersLag16557 nrOfOrdersLag16711 nrOfOrdersLag16722 nrOfOrdersLag16825
-81.37023 190.74905 225.64313 -131.70051 271.39936
nrOfOrdersLag16952 nrOfOrdersLag16996 nrOfOrdersLag17098 nrOfOrdersLag17251 nrOfOrdersLag17279
357.39158 408.46849 210.03477 -25.74894 NA
nrOfOrdersLag17292 nrOfOrdersLag17391 nrOfOrdersLag18642 nrOfOrdersLag18670 nrOfOrdersLag18949
262.00528 4.71906 326.28857 49.30983 174.99732
nrOfOrdersLag19202 nrOfOrdersLag19690 nrOfOrdersLag19772
16.13322 15.59552 -62.26111
I have no clue what is happening and why this is going wrong. Anybody that can help me out here? Thanks in advance!
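The lagged independent variables were stored as factors instead of integer/numeric variables, so lm() expanded nrOfOrdersLag1 into one dummy coefficient per factor level (hence names such as nrOfOrdersLag110707). Having fixed this, the lm call works as intended.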
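A minimal sketch of that fix, assuming the lag column arrived as a factor (for example from reading the file with strings converted to factors). The toy `trainingSet` below reuses the column names from the question but its values are illustrative, and the dummy variables are left out.

```r
# Toy training set; 'nrOfOrdersLag1' is deliberately stored as a factor
# to mimic the problem. Values are illustrative.
trainingSet <- data.frame(
  nrOfCalls      = c(17, 12, 14, 15, 20, 10),
  nrOfOrders     = c(5, 8, 4, 6, 14, 3),
  nrOfOrdersLag1 = factor(c("9", "5", "8", "4", "6", "14"))
)

str(trainingSet$nrOfOrdersLag1)  # a factor, so lm() would dummy-code it

# Convert via character: as.numeric() on a factor alone returns the level codes
trainingSet$nrOfOrdersLag1 <- as.numeric(as.character(trainingSet$nrOfOrdersLag1))

reg <- lm(nrOfCalls ~ nrOfOrders + nrOfOrdersLag1, data = trainingSet)
print(names(coef(reg)))
```

After the conversion the model has a single coefficient for nrOfOrdersLag1 instead of one per distinct value.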

cut function and controlled frequency in the intervals

My question is pretty simple: the cut() function lets me choose the breaks along which the range of my vector is divided into intervals. I would like to control the number of observations within each newly created interval, similar to what quantile-based breaks give in the cut() call. However, I don't want to use quantile breaks, because I would like the intervals to be fixed so that I can match them between different databases for further comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
Which is fairly simple. My new approach is even simpler, so it resembles this for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable which I want to calculate some statistics on, according to the levels of the discrete variable.
So it would go something like this :
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
But to calculate this kind of t-test confidence interval on each of the 'pools' of y values (one pool per interval of the discrete variable), I need to control the number of values within each interval created for z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for the z variable so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
Any help on the matter would be welcome, thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
However, I need the same intervals to apply to another database, so I can't simply call the quantile function on each one: the breaks would differ from one database to the next.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that the minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m: desired minimum number of observations in a group. The algorithm does not guarantee that all groups will have at least m observations.
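Since cut2 treats m only as a target, here is a minimal base R sketch for the case where exact group sizes matter more than interval labels. It is not part of Hmisc: the helper name equal_count_groups and the simulated x are assumptions made for illustration. It groups values by rank, so every group holds exactly n values except possibly the last one.

```r
# Hypothetical helper: assign each value to a block of n consecutive ranks
equal_count_groups <- function(x, n) {
  ceiling(rank(x, ties.method = "first") / n)
}

set.seed(42)
x <- rnorm(962, mean = 6, sd = 1)  # simulated stand-in for df$x

g <- equal_count_groups(x, n = 30)

table(g)             # group sizes: 30 each, with the remainder in the last group
tapply(x, g, range)  # value range covered by each group
```

The groups depend entirely on the data at hand, so this addresses the "n values per interval" requirement but does not by itself give intervals that carry over to a second dataset.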
The same cuts for separate variables
If the distributions of your variables are very similar you could extract the exact cutpoints by setting the argument onlycuts = T and reuse them for the other variables. In case the distributions are different though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = TRUE) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
(cut2(df2$x, cuts = cuts) %>% table) * 2 # multiplied by two for better comparison
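The same idea can be sketched in base R without Hmisc, assuming it is acceptable to compute the breaks once from a reference dataset and reuse them. The data frames below are simulated stand-ins, not the posted data, and opening the outer intervals to -Inf and Inf is a design choice of this sketch so that values outside the reference range still land in a bin.

```r
# Simulated stand-ins for the two data frames in the question
set.seed(1)
df1 <- data.frame(x = rnorm(962, mean = 6, sd = 1))
df2 <- data.frame(x = rnorm(455, mean = 6, sd = 1))

# Breaks come from df1 only: 20 quantile-based intervals
breaks <- quantile(df1$x, probs = (0:20) / 20, names = FALSE)

# Open the outer intervals so values of df2 outside df1's range still get a bin
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

df1$z <- cut(df1$x, breaks = breaks)
df2$z <- cut(df2$x, breaks = breaks)

table(df1$z)  # close to equal counts by construction
table(df2$z)  # counts depend on how similar df2 is to df1
```

Both columns end up with identical factor levels, which makes the two tables directly comparable.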
This is a good example of how NOT to pose a question. At last we have an example, so it is possible to post code that applies to it. (You apparently pasted the exact code from my comment without thinking about how to express 'n' and 'N' in the context of the problem.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.
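This assumes that you want groups of size 100 (although it is still very unclear why this is needed). Note that the breaks start at the 100/N quantile, so the 100 lowest values fall below the first break and come out as NA; that is why the counts below sum to 862 rather than 962.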
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
100 100 100 100 100 100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
100 100 62
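If the lowest values should be binned as well, a variant of the call above can add 0 to the probabilities and set include.lowest = TRUE. This is a sketch on simulated data (the x below is a stand-in, and n and N are just the group size and the number of rows), not a rerun on the posted sample.

```r
set.seed(7)
x <- data.frame(x = rnorm(962, mean = 6, sd = 1))  # simulated stand-in
n <- 100       # desired group size
N <- nrow(x)   # number of observations

# Cumulative proportions 0, n/N, 2n/N, ..., closed with 1 for the remainder;
# unique() drops the duplicate 1 when N is an exact multiple of n
probs <- unique(c(0, seq(n, N, by = n) / N, 1))

x$xct <- cut(x$x, breaks = quantile(x$x, probs = probs), include.lowest = TRUE)

table(x$xct)       # groups of about n values, plus a smaller remainder group
sum(is.na(x$xct))  # number of values left outside every interval
```

With heavily tied data the quantiles can coincide, in which case cut() will complain that the breaks are not unique; the simulated values here are continuous, so that does not arise.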
