questionnaires filled on the same day - r

I am working with a data set with multiple questionnaires that were supposed to be filled in at different time points, i.e.:
173 9/13/2013 10/29/2013 9/26/2014
174 10/21/2013 11/25/2013 11/3/2014
175 7/1/2014 7/3/2015 4/27/2016
176 1/15/2014 2/24/2014 6/10/2015
177 3/15/2014 4/1/2015
178 7/18/2014 9/18/2014 8/17/2015
179 6/30/2013 8/15/2013 7/15/2014
180 4/22/2013 6/24/2013 5/11/2014
181 12/7/2014 12/26/2015
182 4/2/2015 5/17/2015 4/20/2016
183 1/12/2015 2/26/2015 1/28/2016
184 7/18/2014 8/26/2014 8/14/2015
185 8/27/2013 10/19/2013 9/21/2014
186 10/29/2013 11/30/2013 11/6/2014
187 9/17/2014 11/18/2014 10/20/2015
188 5/10/2014 6/27/2014 6/1/2015
189 10/4/2013 10/5/2014
190 1/22/2013 4/11/2013
191 10/21/2014 10/21/2014
I would like to know how many participants filled in all questionnaires on the same day, how many filled in at least 2 questionnaires on the same day, how many at least 3 on the same day, etc.
Any help would be highly appreciated.
Reproducible data:
Label = c(
"1/25/2015", "1/25/2016", "1/26/2014", "1/26/2015", "1/27/2014",
"1/27/2015", "1/28/2014", "1/28/2015", "1/29/2015", "1/3/2014",
"1/3/2015", "1/3/2016", "1/30/2015", "1/31/2014", "1/4/2014",
"1/4/2015", "1/4/2016", "1/5/2014", "1/5/2015", "1/6/2014",
"1/6/2015", "1/7/2014", "1/7/2015", "1/8/2014", "1/8/2015",
"1/9/2014", "1/9/2015", "1/9/2016", "10/1/2012", "10/1/2013",
"10/1/2014", "10/1/2015", "10/10/2013", "10/10/2014", "10/11/2013",
"10/11/2014", "10/11/2015", "10/12/2013", "10/12/2014", "10/12/2015",
"10/13/2013", "10/13/2014", "10/13/2015", "10/14/2013", "10/14/2014",
"10/14/2015", "10/15/2014", "10/15/2015", "10/16/2013", "10/16/2014",
"10/16/2015", "10/17/2013", "10/17/2014", "10/17/2015", "10/18/2013",
"10/18/2014", "10/18/2015", "10/19/2013", "10/19/2014", "10/19/2015",
"10/2/2013", "10/2/2014", "10/20/2013", "10/20/2014", "10/20/2015",
"10/21/2013", "10/21/2014", "10/22/2013", "10/22/2014", "10/22/2015",
"10/23/2012", "10/23/2013", "10/23/2014", "10/23/2015", "10/24/2013",
"10/24/2014", "10/24/2015", "10/25/2013", "10/25/2014", "10/26/2013",
"10/26/2014", "10/26/2015", "10/27/2013", "10/27/2014", "10/27/2015",
"10/28/2013", "10/28/2014", "10/29/2013", "10/29/2014", "10/3/2014",
"10/3/2015", "10/30/2014", "10/31/2012", "10/31/2013", "10/31/2014",
"10/31/2015", "10/4/2013", "10/4/2014", "10/4/2015", "10/5/2014",
"10/5/2015", "10/6/2013", "10/6/2014", "10/6/2015", "10/7/2013",
"10/7/2014", "10/8/2012", "10/8/2014", "10/8/2015", "10/9/2013",
"10/9/2014", "10/9/2015", "11/1/2013", "11/1/2014", "11/1/2015",
class = "factor")
Label = c(
"4/6/2015", "4/7/2015", "4/9/2012", "5/12/2015", "5/13/2014",
"5/14/2015", "5/15/2014", "5/15/2015", "5/17/2014", "5/19/2014",
"5/20/2014", "5/25/2014", "5/27/2014", "5/29/2014", "5/30/2014",
"5/30/2015", "5/31/2015", "5/4/2014", "5/9/2015", "6/1/2015",
"6/10/2014", "6/11/2014", "6/11/2015", "6/12/2015", "6/16/2014",
"6/16/2015", "6/18/2014", "6/21/2014", "6/24/2015", "6/25/2014",
"6/25/2015", "6/26/2015", "6/27/2015", "6/29/2015", "6/5/2014",
"6/6/2015", "6/8/2014", "7/1/2014", "7/13/2014", "7/14/2015",
"7/16/2014", "7/2/2014", "7/21/2014", "7/25/2014", "7/27/2014",
"7/27/2015", "7/28/2014", "7/29/2014", "7/30/2014", "7/31/2014",
"7/31/2015", "7/4/2014", "7/4/2015", "8/1/2014", "8/11/2014",
"8/11/2015", "8/25/2014", "8/27/2015", "8/5/2014", "8/8/2014",
"8/9/2015", "9/1/2014", "9/10/2015", "9/15/2015", "9/22/2013",
"9/3/2012", "9/30/2014", "9/8/2014", "9/8/2015"), class = "factor")
Label = c(" ",
"1/16/2016", "1/26/2015", "10/11/2015", "10/14/2015", "10/16/2015",
"10/6/2014", "10/7/2013", "11/11/2015", "11/15/2015", "11/17/2013",
"11/18/2013", "11/2/2015", "11/20/2013", "11/29/2013", "2/17/2014",
"2/17/2015", "2/21/2015", "2/23/2014", "2/25/2014", "2/25/2015",
"3/11/2016", "3/2/2014", "3/22/2015", "3/4/2014", "3/4/2016",
"4/11/2014", "4/12/2013", "4/18/2016", "4/21/2015", "4/23/2015",
"4/29/2015", "4/3/2015", "4/5/2016", "5/23/2015", "5/26/2015",
"5/27/2015", "5/28/2015", "5/29/2014", "5/29/2015", "5/8/2015",
"6/16/2015", "6/22/2015", "6/28/2015", "7/24/2015", "7/27/2015",
"7/4/2014", "7/8/2015", "9/14/2015", "9/15/2015", "9/16/2014",
"9/17/2014", "9/22/2014", "9/23/2014", "9/24/2014", "9/24/2015",
"9/26/2014", "9/28/2015", "9/30/2015", "9/9/2015"), class = "factor")), .Names = c("1A_RespDate",
"1B_RespDate", "1C_1_RespDate", "1C_2_RespDate",
"1C_RespDate", "2A_1_RespDate", "2A_RespDate", "2B_RespDate",
"2C_RespDate"), row.names = c(NA, -4831L), class = "data.frame")

I'll call your data frame df:
apply(df, 1, function(x) length(unique(x)))
will give you the number of unique dates for each individual as a vector. The highest value is 7 and the minimum is 1 (all questionnaires answered on the same day).
which(apply(df, 1, function(x) length(unique(x))) < 7)
will give you the indices of the individuals who filled in at least 2 questionnaires on the same day.
length(which(apply(df, 1, function(x) length(unique(x))) < 7))
will tell you how many individuals filled in at least 2 questionnaires on the same day.
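Counting inside apply(), as in apply(df, 1, function(x) length(unique(x))), avoids a pitfall of the nested form sapply(apply(df, 1, unique), length): when every row happens to have the same number of distinct dates, apply() simplifies its results into a matrix, and sapply() then loops over individual cells instead of participants. A minimal sketch with hypothetical toy data (the column names q1 to q3 and the dates are made up for illustration):

```r
# Hypothetical toy data: both participants happen to have exactly 2 distinct dates
df <- data.frame(
  q1 = c("9/13/2013", "1/15/2014"),
  q2 = c("9/13/2013", "2/24/2014"),
  q3 = c("9/26/2014", "2/24/2014"),
  stringsAsFactors = FALSE
)

# apply() simplifies the equal-length results into a 2 x 2 matrix, so sapply()
# then loops over its 4 cells and returns 1 for each of them
bad <- sapply(apply(df, 1, unique), length)
bad

# Counting inside apply() always gives exactly one number per participant
n_unique <- apply(df, 1, function(x) length(unique(x)))
n_unique

# Participants with at least 2 questionnaires on the same day
n_shared <- sum(n_unique < ncol(df))
n_shared
```

With the real data the rows usually differ in their number of distinct dates, so apply() returns a list and the nested form happens to work; the counting-inside-apply() form simply does not depend on that.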
Edit:
This is inelegant (there must be a cleaner way), but it seems to work:
which(sapply(sapply(sapply(apply(df,1,table),function(x) x==Z),which),function(x) any(x>0)))
Set Z to the number of questionnaires filled in on the same day.
Explanation:
apply(df,1,table)
gives a list that holds, for each individual, the unique dates and how many times each one appears.
sapply(apply(df,1,table),function(x) x==Z)
gives the same list with TRUE/FALSE indicating whether each date appears exactly Z times.
sapply(sapply(apply(df,1,table),function(x) x==Z),which)
gives, for each individual, either integer(0) or the position(s) of the matching date(s) (the positions themselves are not what we are interested in).
sapply(sapply(sapply(apply(df,1,table),function(x) x==Z),which),function(x) any(x>0))
gives a vector of TRUE/FALSE values, one per individual.
The final which() then returns the indices of the TRUE entries.
We therefore get the individuals for whom some date appears exactly Z times.
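To get the counts the question asks for (all on the same day, at least 2, at least 3, ...), one option is to compute, for each participant, the largest number of questionnaires that share a single date, and then count. This is a sketch under some assumptions: one row per participant, dates stored as text or factors, and blank or NA cells meaning "not filled in". The toy rows are participants 173, 177, 189 and 191 from the table above; the column names q1 to q3 and the placement of the blanks are hypothetical.

```r
# Toy data: participants 173, 177, 189 and 191 from the question.
# Column names q1..q3 are hypothetical; "" marks a missing questionnaire.
df <- data.frame(
  q1 = c("9/13/2013", "3/15/2014", "10/4/2013", "10/21/2014"),
  q2 = c("10/29/2013", "4/1/2015", "10/5/2014", "10/21/2014"),
  q3 = c("9/26/2014", "", "", ""),
  stringsAsFactors = FALSE
)

# Largest number of questionnaires filled in on one single date.
# Blank and NA cells are ignored (assumption: they mean "not filled in").
max_same_day <- function(dates) {
  dates <- trimws(as.character(dates))
  dates <- dates[!is.na(dates) & dates != ""]
  if (length(dates) == 0) return(0L)
  max(table(dates))
}

# One value per participant (row)
same_day <- apply(df, 1, max_same_day)

# Participants with all questionnaires on the same day
n_q <- ncol(df)
all_same <- sum(same_day == n_q)

# Participants with at least k questionnaires on the same day, k = 2..n_q
at_least <- sapply(2:n_q, function(k) sum(same_day >= k))
names(at_least) <- paste0("at_least_", 2:n_q)
at_least

# Full distribution of the per-participant maximum
table(same_day)
```

Here "all" means all columns of df; if it should instead mean all questionnaires a participant actually filled in, compare same_day with the per-row count of non-blank cells.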

Related

Averaging time-stamped data hourly in R

I have average distance-travelled data (in meters) for several days and times of the year. Here's what the dataset I'm working with looks like:
> print(datanet)
Date & Time [Local] meters
1: 2017-06-01 00:00:14 2.333355
2: 2017-06-01 01:00:13 6.952414
3: 2017-06-01 02:00:30 61.727543
4: 2017-06-01 03:00:15 235.873883
5: 2017-06-01 04:00:15 138.136375
---
1428: 2017-07-30 19:00:21 40.602983
1429: 2017-07-30 20:00:47 34.292888
1430: 2017-07-30 21:00:20 303.478297
1431: 2017-07-30 22:00:18 5.741059
Now, I would like to transform this table so that it gives the average distance travelled for each hour of the day, from 0 to 23, based on data from multiple days. Here's the code I've been using for that purpose (it also includes an sd column):
library(dplyr)
library(lubridate)

data_travel <- datanet %>%
  mutate(
    date = ymd_hms(`Date & Time [Local]`),
    hour = hour(date)
  ) %>%
  group_by(hour) %>%
  summarise(
    avg_meters = mean(meters),
    sd_meters = sd(meters)
  )
This works, but sadly the last hour of the day (23) always shows NA values:
> head(data_travel[19:24,])
# A tibble: 6 x 3
hour avg_meters sd_meters
<int> <dbl> <dbl>
1 18 57.4 109.
2 19 96.5 177.
3 20 121. 248.
4 21 141. 299.
5 22 76.4 86.4
6 23 NA NaN
Does anybody have an idea how I could modify this code so that I also get the average distance travelled (avg_meters) and its sd for hour 23? Any input is appreciated!
> dput(datanet[1:700,])
structure(list(`Date & Time [Local]` = structure(c(1464732013,
1464735613, 1464739229, 1464742813, 1464746413, 1464750018, 1464753629,
1464757213, 1464760813, 1464764430, 1464768013, 1464771629, 1464775214,
1464778818, 1464782429, 1464786013, 1464789623, 1464793214, 1464796813,
1464800413, 1464804029, 1464807612, 1464811213, 1464814830, 1464818411,
1464822013, 1464825629, 1464829213, 1464832812, 1464836417, 1464840021,
1464843612, 1464847211, 1464850830, 1464854412, 1464858030, 1464861629,
1464865211, 1464868814, 1464872416, 1464876030, 1464879611, 1464883212,
1464886811, 1464890430, 1464894017, 1464897611, 1464901217, 1464904830,
1464908419, 1464912011, 1464915610, 1464919207, 1464922850, 1464926410,
1464930004, 1464933623, 1464937249, 1464940811, 1464944410, 1464948011,
1464951629, 1464955218, 1464958811, 1464962430, 1464966018, 1464969610,
1464973214, 1464976848, 1464980410, 1464984010, 1464987610, 1464991230,
1464994811, 1464998409, 1465002003, 1465005610, 1465009222, 1465012817,
1465016402, 1465020030, 1465023618, 1465027214, 1465030846, 1465034419,
1465038020, 1465041690, 1465045212, 1465048849, 1465052409, 1465056029,
1465059617, 1465063209, 1465066830, 1465070410, 1465074009, 1465077630,
1465081240, 1465084804, 1465088401, 1465092010, 1465095622, 1465099222,
1465102852, 1465106449, 1465110010, 1465113612, 1465117270, 1465120849,
1465124408, 1465128011, 1465131617, 1465135230, 1465138848, 1465142408,
1465146008, 1465149612, 1465153250, 1465156808, 1465160427, 1465164016,
1465167601, 1465171229, 1465174830, 1465178415, 1465182054, 1465185656,
1465189209, 1465192807, 1465196422, 1465200029, 1465203732, 1465207230,
1465210848, 1465214408, 1465218050, 1465221629, 1465225216, 1465228823,
1465232415, 1465236015, 1465239615, 1465243256, 1465246830, 1465250414,
1465254008, 1465257629, 1465261214, 1465264814, 1465268449, 1465272007,
1465275630, 1465279255, 1465282845, 1465286460, 1465290013, 1465293617,
1465297226, 1465300814, 1465304418, 1465308029, 1465311614, 1465315214,
1465318814, 1465322429, 1465326007, 1465329607, 1465333213, 1465336813,
1465340422, 1465344013, 1465347615, 1465351230, 1465354813, 1465358413,
1465362014, 1465365630, 1465369216, 1465372830, 1465376413, 1465380013,
1465383606, 1465387230, 1465390817, 1465394430, 1465398013, 1465401613,
1465405213, 1465408829, 1465412411, 1465416013, 1465419613, 1465423229,
1465426812, 1465430412, 1465434011, 1465437614, 1465441224, 1465444812,
1465448413, 1465452013, 1465455629, 1465459213, 1465462817, 1465466430,
1465470011, 1465473611, 1465477211, 1465480830, 1465484412, 1465488012,
1465491611, 1465495230, 1465498820, 1465502426, 1465506011, 1465509606,
1465513212, 1465516830, 1465520419, 1465524003, 1465527611, 1465531227,
1465534850, 1465538403, 1465542010, 1465545612, 1465549229, 1465552811,
1465556425, 1465560011, 1465563611, 1465567227, 1465570819, 1465574430,
1465578018, 1465581610, 1465585210, 1465588830, 1465592418, 1465596009,
1465599602, 1465603209, 1465606829, 1465610450, 1465614009, 1465617609,
1465621210, 1465624817, 1465628422, 1465632017, 1465635610, 1465639210,
1465642829, 1465646417, 1465650008, 1465653630, 1465657209, 1465660813,
1465664451, 1465668003, 1465671609, 1465675228, 1465678816, 1465682409,
1465686002, 1465689608, 1465693229, 1465696816, 1465700401, 1465704008,
1465707629, 1465711217, 1465714809, 1465718429, 1465722016, 1465725608,
1465729201, 1465732830, 1465736415, 1465740024, 1465743630, 1465747215,
1465750811, 1465754430, 1465758016, 1465761607, 1465765230, 1465768828,
1465772415, 1465776016, 1465779613, 1465783260, 1465786808, 1465790414,
1465794014, 1465797659, 1465801216, 1465804807, 1465808430, 1465812030,
1465815614, 1465819220, 1465822816, 1465826408, 1465830029, 1465833613,
1465837215, 1465840829, 1465844414, 1465848008, 1465851627, 1465855213,
1465858814, 1465862414, 1465866029, 1465869607, 1465873207, 1465876814,
1465880429, 1465884015, 1465887630, 1465891207, 1465894815, 1465898429,
1465902014, 1465905614, 1465909218, 1465912824, 1465916413, 1465920013,
1465923618, 1465927229, 1465930813, 1465934413, 1465938013, 1465941613,
1465945229, 1465948811, 1465952412, 1465956022, 1465959626, 1465963214,
1465966830, 1465970412, 1465974030, 1465977606, 1465981230, 1465984813,
1465988429, 1465992012, 1465995613, 1465999229, 1466002812, 1466006404,
1466010012, 1466013626, 1466017211, 1466020830, 1466024412, 1466028012,
1466031619, 1466035211, 1466038813, 1466042430, 1466046004, 1466049612,
1466053212, 1466056830, 1466060420, 1466064026, 1466067603, 1466071211,
1466074829, 1466078410, 1466082019, 1466085619, 1466089211, 1466092804,
1466096429, 1466100011, 1466103611, 1466107230, 1466110821, 1466114422,
1466118009, 1466121611, 1466125234, 1466128803, 1466132412, 1466136027,
1466139610, 1466143210, 1466146834, 1466150410, 1466154034, 1466157609,
1466161210, 1466164834, 1466168403, 1466172007, 1466175620, 1466179202,
1466182802, 1466186433, 1466190020, 1466193609, 1466197209, 1466200802,
1466204433, 1466208009, 1466211609, 1466215202, 1466218854, 1466222419,
1466226009, 1466229634, 1466233208, 1466236811, 1466240426, 1466244008,
1466247608, 1466251234, 1466254826, 1466258409, 1466262008, 1466265609,
1466269233, 1466272808, 1466276408, 1466280011, 1466283647, 1466287216,
1466290808, 1466294408, 1466298033, 1466301601, 1466305210, 1466308825,
1466312407, 1466316009, 1466319634, 1466323208, 1466326833, 1466330408,
1466334007, 1466337633, 1466341210, 1466344834, 1466348419, 1466352015,
1466355616, 1466359223, 1466362820, 1466366415, 1466370014, 1466373622,
1466377205, 1466380815, 1466384414, 1466388033, 1466391608, 1466395206,
1466398833, 1466402415, 1466406017, 1466409634, 1466413214, 1466416817,
1466420431, 1466424008, 1466427615, 1466431234, 1466434813, 1466438434,
1466442013, 1466445614, 1466449207, 1466452835, 1466456413, 1466460013,
1466463613, 1466467213, 1466470834, 1466474413, 1466478014, 1466481633,
1466485223, 1466488806, 1466492416, 1466496035, 1466499613, 1466503206,
1466506835, 1466510412, 1466514012, 1466517635, 1466521213, 1466524823,
1466528413, 1466532010, 1466535605, 1466539234, 1466542812, 1466546405,
1466550012, 1466553612, 1466557234, 1466560813, 1466564412, 1466568035,
1466571605, 1466575214, 1466578835, 1466582412, 1466586012, 1466589605,
1466593222, 1466596811, 1466600412, 1466604005, 1466607629, 1466611212,
1466614811, 1466618430, 1466622019, 1466625612, 1466629230, 1466632820,
1466636411, 1466640010, 1466643629, 1466647213, 1466650811, 1466654430,
1466658019, 1466661619, 1466665218, 1466668812, 1466672430, 1466676011,
1466679610, 1466683220, 1466686830, 1466690419, 1466694010, 1466697614,
1466701247, 1466704818, 1466708411, 1466712028, 1466715618, 1466719203,
1466722810, 1466726423, 1466730010, 1466733609, 1466737223, 1466740817,
1466744403, 1466748031, 1466751617, 1466755210, 1466758811, 1466762411,
1466766030, 1466769619, 1466773212, 1466776847, 1466780417, 1466784009,
1466787609, 1466791231, 1466794818, 1466798409, 1466802010, 1466805609,
1466809230, 1466812817, 1466816401, 1466820009, 1466823660, 1466827217,
1466830831, 1466834409, 1466838008, 1466841619, 1466845216, 1466848808,
1466852404, 1466856021, 1466859616, 1466863232, 1466866808, 1466870408,
1466874011, 1466877630, 1466881215, 1466884818, 1466888417, 1466892019,
1466895616, 1466899207, 1466902809, 1466906430, 1466910016, 1466913611,
1466917226, 1466920821, 1466924416, 1466928008, 1466931622, 1466935231,
1466938815, 1466942431, 1466946015, 1466949629, 1466953215, 1466956808,
1466960415, 1466964030, 1466967614, 1466971215, 1466974812, 1466978430,
1466982014, 1466985614, 1466989208, 1466992830, 1466996413, 1467000014,
1467003614, 1467007230, 1467010814, 1467014415, 1467018015, 1467021630,
1467025214, 1467028830, 1467032415, 1467036030, 1467039614, 1467043206,
1467046815, 1467050430, 1467054012, 1467057613, 1467061213, 1467064814,
1467068430, 1467072013, 1467075608, 1467079250, 1467082814, 1467086414,
1467090007, 1467093629, 1467097214, 1467100824, 1467104413, 1467108013,
1467111630, 1467115213, 1467118813, 1467122429, 1467126006, 1467129613,
1467133213, 1467136829, 1467140412, 1467144013, 1467147613, 1467151214,
1467154829, 1467158410, 1467162027, 1467165605, 1467169212, 1467172813,
1467176428, 1467180012, 1467183619, 1467187220, 1467190812, 1467194405,
1467198012, 1467201629, 1467205218, 1467208830, 1467212411, 1467216004,
1467219611, 1467223211, 1467226829, 1467230411, 1467234012, 1467237612,
1467241229, 1467244811, 1467248405), class = c("POSIXct", "POSIXt"
), tzone = ""), meters = c(7.24497992499657, 4.87741163537199,
9.08560044628181, 80.6842320881314, 238.606484922097, 157.204921816723,
625.23872908032, 219.35778781259, 12.6588736944506, 93.8090439559674,
319.445131807673, 67.8036768396769, 804.804836152127, 109.434600933436,
129.949236899749, 105.911149760734, 27.9531918089091, 11.27836453714,
457.093853355937, 26.5240927781247, 19.7015020304213, 14.3532653640863,
1.25853679670009, 0.150718694512225, 1.70366003911483, 2.63870002711148,
127.037462401145, 961.452700995197, 215.04628486518, 48.3476802703997,
56.4299311045402, 71.0567210386123, 53.2157129067539, 80.4040760406296,
236.078682140782, 406.948035573002, 92.6423364709784, 403.797511366086,
323.858212895809, 65.9783289318472, 26.7161400634748, 21.4406886404941,
44.6906704150594, 36.0784092780547, 66.4678272178005, 68.0358199816987,
2.1476323514823, 3.01587341033808, 1.57380761082474, 1.71653324348141,
18.8397076847765, 184.268772826548, 61.2103183204004, 82.9010640232318,
43.7120771884048, 40.4214303580113, 220.354835462908, 77.5844706628055,
10.6522275628958, 64.6401569172547, 170.237028243589, 235.781539666942,
206.150503465281, 25.3213069661311, 36.7436253838348, 9.83110790227874,
23.3459053606757, 3.45271958972457, 1.96114320043511, 20.4049146593214,
15.2372682099889, 20.3543121890185, 42.7350584816069, 12.1313207862892,
1.11708614676525, 191.836648404227, 33.3046462595366, 166.168666618136,
31.1722631768611, 133.717766242875, 12.0334817161546, 62.2359071313657,
16.7484729490856, 109.549479467076, 438.080739581294, 37.0971614841641,
105.391252306762, 122.494788370234, 88.6622245013997, 24.3191344096727,
5.117649955497, 51.9358625225939, 47.9478783281661, 6.96463276369705,
1.75025309899143, 31.176657161161, 10.1169843733554, 26.5346636683759,
15.9584899969855, 337.838831129694, 59.7693703670957, 46.7853809521572,
16.799710673628, 39.1979373391332, 122.408881979713, 266.855999717221,
63.8055787186155, 57.9900269187913, 120.78876572575, 82.1213040340665,
105.298734249817, 161.923229191297, 28.9509612131438, 0.248722765246352,
6.42826019283635, 4.80096922046293, 7.66924438494585, 3.77931970556652,
6.16226345339552, 1.89180927192504, 2.92660299028088, 4.47513027348909,
5.3772236196912, 258.79885256986, 76.3673624568927, 227.248769639605,
119.571707120552, 35.7102849958032, 36.9949248244319, 137.90048603805,
96.6658682838857, 259.080913058415, 105.606050276669, 56.1002922989478,
85.2381765021222, 191.363093870704, 55.7981801107081, 12.9578924739909,
26.3419895578265, 14.4503596334286, 15.6675803413194, 21.9669267962415,
63.1276880372023, 5.54867147176836, 9.0179124542279, 7.13599657582419,
69.6648263961824, 352.989183299746, 263.287397250075, 253.766882591523,
209.967849272818, 73.9692977527144, 98.0159993160327, 170.190795021595,
282.190504225449, 78.6666650047386, 27.0630775295066, 332.829084995611,
194.938072897224, 102.422860453484, 17.7992858642505, 13.1266890679012,
4.58091256610204, 6.40555894626406, 2.66715489350561, 19.248878078399,
14.0807810821772, 4.57816759344819, 4.40196859830686, 10.7329317290172,
32.4528952520776, 138.596548507858, 125.547606032588, 46.3652014291144,
16.3797234392651, 10.249071010749, 248.440266699442, 304.347056271548,
154.412296810916, 46.8081932028809, 226.453483692211, 431.805061221221,
111.437754042661, 217.641929376792, 25.1923986792615, 12.2256823484931,
12.2949586884092, 31.0958526630604, 85.5575841303107, 22.7975660566324,
30.5216893316272, 35.1775681936213, 9.50846937534727, 9.76657486076715,
0.579469646956765, 266.224607309967, 188.727073099707, 292.559096086872,
9.49743703683714, 107.113753739463, 175.681846441223, 19.379648871926,
88.2778253322274, 410.496903513497, 8.97162022276979, 32.4619475881012,
73.646386222969, 75.348171202073, 90.8515841171699, 10.6200903372279,
3.02379387011622, 32.825022046837, 65.550868227495, 24.9833819842075,
77.5493346217115, 2.10544392843427, 2.32222002989636, 3.12605565461408,
1.02518236594501, 341.46710436094, 151.588096225148, 353.933570258634,
124.173972566209, 60.4110080957218, 38.5295043269143, 154.717374816579,
10.642332114307, 112.19511336859, 178.656934678561, 144.883837500965,
193.991868696415, 202.99316836535, 77.6189915466929, 0.871460936515423,
1.63576829944789, 47.5439446635587, 60.241399209101, 92.7059630247652,
1.71653312232677, 3.28998417221502, 16.9888823353554, 1.073227079111,
31.3529682130551, 98.2633746496518, 146.311948071212, 277.215271024987,
30.6004645511119, 49.4907657584358, 17.6377880041836, 517.661457540348,
581.555783536356, 1010.85341607138, 101.36421835411, 101.587448595859,
144.303729077564, 91.3938747436922, 149.518556866971, 36.3308699793953,
7.80121835979054, 23.0312990229266, 13.41048184825, 20.339107047676,
3.08847373655867, 31.7536206163432, 80.5523050297356, 5.57519986215111,
12.7301911126705, 265.400347490029, 96.389278202961, 96.4450196328944,
269.701595926116, 40.1994744716222, 185.194247845766, 132.799182823423,
92.6508846479433, 31.7196753780259, 82.6725380176083, 149.907487149117,
259.995942351777, 136.962271891916, 47.7342981878729, 28.0369643012698,
23.5176540297538, 85.9823208879668, 69.1793641133218, 2.10460736024825,
2.47507980743031, 1.46820616137708, 8.50065507425538, 9.43358037894557,
15.0643556352927, 160.034358372113, 401.192221112903, 208.507212166668,
16.9012427928657, 70.1561153179486, 282.055943502233, 95.7582280566781,
20.4921115795782, 224.297864971227, 248.751359316637, 63.3008262529409,
202.381548954774, 160.240208598145, 89.2596850671307, 2.41266014760612,
2.24662537954621, 92.0812846530431, 64.0782727558696, 102.539355421441,
133.192603215476, 2.19161459176181, 2.71657087565203, 2.60259248593429,
9.95223133490391, 175.102558714441, 102.128569321102, 45.1350310564478,
60.7860248161304, 166.511239966959, 32.3622524770883, 38.1126859517567,
169.914906248272, 165.51479143087, 21.6290787183546, 154.863792200668,
224.650919723327, 172.786068029272, 41.6201741515014, 40.966552071361,
60.9998953906058, 4.00012706993277, 79.9578066806101, 183.917814759389,
103.086558986388, 6.96826209073272, 5.74403906883466, 5.4856515067938,
28.6736690417882, 238.403484773501, 231.70110714268, 126.348996131178,
61.4905557699149, 104.389974082626, 246.69389506543, 79.3069202652704,
24.117595869327, 48.4779179700019, 69.4483313003939, 127.606317513607,
78.8710394107804, 98.1528155665254, 128.061282053331, 3.19606373207204,
1.87066709355931, 30.3658567894746, 67.1163251638405, 32.6323314265454,
0.228514489291685, 2.59419308146204, 6.96463324671082, 31.4857435059086,
32.2152666766542, 219.54173457463, 100.503943180343, 71.9061154834901,
91.2830509779235, 155.03560958018, 98.391102232677, 27.4736388446992,
34.7344995015586, 85.1266347031687, 74.6245520597207, 24.0060703654787,
139.030754853487, 171.244448038257, 112.193936557097, 2.39299360914121,
19.4583131438491, 58.7443921590234, 14.0595780932243, 216.575414578798,
127.459683665046, 26.4468118443774, 15.7904069541748, 3.42889322639363,
18.9364897049987, 137.67183581274, 122.967226389734, 119.894430382828,
104.442282446888, 190.221376118256, 131.941577184786, 353.399746658368,
448.477151206068, 133.358287719838, 261.649707603195, 81.4006720251283,
343.002058701936, 163.91259720329, 197.994167334045, 7.30944634184061,
7.26330302571758, 32.2642168570983, 330.892281390864, 76.9551034586096,
16.5345940654105, 93.4833973060589, 2.44622725450917, 1.07860556492395,
1.93778399725422, 77.9570596409594, 94.0913633507909, 56.7576348472931,
409.330409688539, 51.8605115857434, 101.399915620214, 186.284262562234,
150.206902386026, 73.7756320461831, 29.7407716653824, 148.98703547435,
220.790713913921, 244.515597043242, 18.6165511888937, 9.90234122916165,
33.2562502415159, 156.414444934157, 13.9779124860658, 48.9122633130083,
170.372846084158, 157.591584502432, 41.3319249226387, 14.1513887931616,
6.44142678100409, 228.471931226893, 110.023055666621, 33.5052868559644,
173.194133068492, 32.3931156891464, 44.5888695638585, 57.8480536590698,
130.274156166872, 79.3730952009515, 46.5247317494093, 113.821030825373,
345.300064595988, 152.140595169695, 17.0421460982582, 6.84417297878845,
12.5696620896434, 36.6545290246341, 74.6452657675716, 217.14457420751,
165.496335573275, 22.3871316935213, 6.61421665015, 20.0410769731539,
23.7057126539467, 168.517094664878, 110.986727962072, 83.3281747496762,
7.35167504947444, 76.5528141698817, 20.6384141732761, 87.00310582216,
402.411224410847, 145.210679361704, 55.1206401339897, 446.103457643039,
95.317801637148, 198.306682822754, 88.7652010770343, 4.4779529467687,
55.2357786872407, 118.400174413319, 163.550512253059, 103.789889510405,
70.1485296476271, 44.9031868790507, 5.42285572703366, 23.2710355323781,
7.96212769344129, 37.560076557467, 210.670533114753, 104.544578996089,
438.121243591053, 56.4263114090557, 30.3428605030646, 102.704223497357,
88.0554172082872, 29.8261947342531, 21.3578133423672, 125.139532314134,
199.2412154636, 423.415414756748, 155.583267038193, 2.14393350563694,
2.77638044593597, 25.6375785864043, 176.271312482445, 188.095709294767,
162.049988299195, 28.6407159601821, 341.238744680548, 6.70617205440293,
0.685237342195351, 72.5497248411768, 411.366138460536, 14.570194300977,
35.8331305489166, 336.776755084208, 43.10843602833, 343.590748922672,
306.997839886018, 110.223009494854, 246.067728815614, 229.160642943454,
80.1932202086262, 223.436937319274, 7.33591020042729, 3.91210855110157,
0.301852508403949, 1.9831582085811, 100.093808999808, 136.307291596312,
56.0971664553408, 35.91142300096, 35.9638499452433, 2.71553679684771,
4.6621428025371, 140.593157625054, 274.807479865085, 221.786077846005,
61.483885141769, 20.3881339787884, 30.7758272593722, 96.0458882470437,
246.919746334924, 22.0033859138399, 198.28605047425, 103.814293419658,
46.5348985729046, 29.182847412964, 75.8648063336849, 13.8241139461049,
17.1286557911254, 88.1810161018373, 3.36013813866121, 69.639752829193,
677.723130883346, 41.0286431704323, 73.2159389655071, 2.97198914835809,
2.2955645498568, 77.4338889708046, 165.144080335453, 28.249849842644,
697.335948561217, 26.764915418294, 14.2190768683659, 91.7637857701146,
27.5440244171723, 16.8445374971489, 23.8566496302873, 0.981943140947041,
78.5997636834095, 162.138462101107, 44.1672067123073, 1.57379999898296,
4.52584675687701, 0.97658119252377, 0.477360675618112, 129.151103972441,
70.8307163818214, 275.971859788928, 127.881236082799, 1.170287338146,
0.866454371283992, 434.919703422169, 93.8451377376139, 207.10904118958,
46.8316256828644, 150.387134794503, 278.399451505872, 198.814569340003,
115.184928188408, 36.753170014185, 129.106822541989, 168.482550085438,
92.1323337766019, 250.394018594269, 37.1881210650176, 26.0619948566024,
2.1015758585879, 1.9337619910658, 2.5374401085012, 6.88084629859044,
240.364952743281, 23.3347433113824, 12.8301991435217, 104.664097883855,
11.9543330584122, 299.693093901171, 457.205452556256, 166.486441167246,
479.147896086039, 601.250116553193, 324.328442697521, 329.307886840488,
231.36130846456, 34.9248512789383, 159.724908476382, 310.307623928807,
665.667745992218, 440.34793375254, 47.0987434639045, 3.27176941323539,
9.70137304643561, 10.0607743796965, 3.08631438745061, 85.9070751173181,
114.552594829497, 56.4422079169895, 72.8598000828185, 49.5713482843566,
2.63222698246548, 278.660159682918, 374.155025716734, 614.896477070897,
84.7023024801914, 112.319999024275, 18.5987593461749, 82.9077768700278,
154.845742871174, 125.567795777075, 66.4893506450749, 126.741063662877,
82.2411837719443, 756.335890510717, 73.1151790189073, 3.5922959646701,
8.41573329417288, 4.41473763601453, 2.12491629471561, 57.10162180489,
51.9326111578832, 58.8698849597487, 64.1069702907545, 25.653260586019,
2.25452824419408, 133.78927961757, 341.7548293499, 14.113318950603,
64.7040755111393, 74.5271989167769, 407.725534601351, 309.316524308558,
20.2280966869265, 23.8884632436018, 20.0051667649045, 23.3715363806949,
21.8895053097727, 284.299801015909, 133.058636731235, 9.13435639512076,
8.93531290420054, 6.97575634977357, 38.5847487365879, 322.899303421944,
7.43662008052574, 31.3472739232612, 90.172402886085, 13.1780473878919,
11.8113106256799, 95.4868454357865, 111.151536039587, 62.3434590358668,
25.3913623754508, 152.028407407367, 140.924105429548, 110.376776160796,
21.9269046506022, 282.56001268775, 26.9210719144184, 88.3343050027196,
291.612562587322, 164.906755082596, 116.426543798048, 16.3087551310383,
6.52751999940019, 9.01631759743765, 16.1907026689521, 16.5315572289726,
1.88226712179479, 18.0388366074334, 48.3907627589146, 1.49068315465064,
9.44594654212787, 730.702774263893)), row.names = c(NA, -700L
), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000002641ef0>)
Your meters column probably contains NA values (at least in hour 23).
Adding na.rm = TRUE to the mean() and sd() calls, i.e. mean(meters, na.rm = TRUE) and sd(meters, na.rm = TRUE), should solve this.
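A base-R sketch of the same hourly summary, to show the effect of na.rm. It uses made-up toy data with one NA in hour 23, and tapply() instead of the dplyr/lubridate pipeline from the question; the column names time, meters and hour are assumptions for the example.

```r
# Toy data (hypothetical): three readings at hour 23, one of them NA
datanet <- data.frame(
  time = as.POSIXct(c("2017-06-01 22:00:18", "2017-06-01 23:00:14",
                      "2017-06-02 22:00:30", "2017-06-02 23:00:20",
                      "2017-06-03 23:00:11"), tz = "UTC"),
  meters = c(5.7, 2.0, 7.3, NA, 4.0)
)

# Hour of the day, 0-23
datanet$hour <- as.integer(format(datanet$time, "%H"))

# Without na.rm, the single NA turns the whole hour-23 mean into NA
naive <- tapply(datanet$meters, datanet$hour, mean)
naive

# With na.rm = TRUE the NA is dropped before computing mean and sd
hourly <- data.frame(
  hour       = sort(unique(datanet$hour)),
  avg_meters = as.vector(tapply(datanet$meters, datanet$hour, mean, na.rm = TRUE)),
  sd_meters  = as.vector(tapply(datanet$meters, datanet$hour, sd, na.rm = TRUE))
)
hourly
```

Note that sd() still returns NA for an hour with fewer than two non-missing values, even with na.rm = TRUE.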

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable in a style similar to the RX variable: each encounter is numbered 1 to n, and the number repeats if multiple scripts are written during the same encounter, as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried the following code (using ordering_date instead, because it's already sorted properly), but everything under Enc_ is printed as 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
FIRST./LAST. flags don't require sorting if the data are already grouped in the right order, or if you add NOTSORTED to your BY statement. If a BY variable is not properly ordered, the BY statement throws an error and stops the step as soon as it encounters the deviation, like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see exactly how the FIRST./LAST. flags work:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
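To make the flag logic concrete outside SAS, here is a small Python sketch (an illustration only, not SAS code) that computes FIRST./LAST.-style flags for rows that are already grouped by the BY variables. The rows are the example table from the question; the function name by_group_flags is hypothetical. Note that first.ordering_date is also 1 whenever patient_num changes, because an inner BY group always restarts when an outer one does.

```python
def by_group_flags(rows, keys):
    """Return FIRST./LAST.-style flags for each row.

    Assumes rows (dicts) are already grouped by keys, in BY-statement order.
    The BY group for keys[d] changes whenever any of keys[0..d] changes.
    """
    flags = []
    for i, row in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        row_flags = {}
        for depth, key in enumerate(keys):
            level = keys[: depth + 1]
            same_as_prev = prev is not None and all(prev[k] == row[k] for k in level)
            same_as_next = nxt is not None and all(nxt[k] == row[k] for k in level)
            row_flags["first." + key] = 0 if same_as_prev else 1
            row_flags["last." + key] = 0 if same_as_next else 1
        flags.append(row_flags)
    return flags


# patient_num and ordering_date from the example table, in its original order
rows = [
    {"patient_num": 1111, "ordering_date": "07NOV2008"},
    {"patient_num": 1111, "ordering_date": "11MAY2011"},
    {"patient_num": 2222, "ordering_date": "08DEC2009"},
    {"patient_num": 2222, "ordering_date": "08DEC2009"},
    {"patient_num": 3333, "ordering_date": "12APR2007"},
    {"patient_num": 3333, "ordering_date": "21DEC2008"},
    {"patient_num": 3333, "ordering_date": "02FEB2009"},
]

flags = by_group_flags(rows, ["patient_num", "ordering_date"])
for row, f in zip(rows, flags):
    print(row["patient_num"], row["ordering_date"],
          f["first.patient_num"], f["last.patient_num"],
          f["first.ordering_date"], f["last.ordering_date"])
```

Only equality between neighbouring rows matters here, so the dates can stay as strings.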
Second
Those conditions work differently than you think:
if expression;
This is a subsetting IF. If expression is true, processing continues with the statements after the IF.
Otherwise, SAS returns to the beginning of the DATA step without the implicit OUTPUT, so the current observation is not written to the output data set.
In your code, if first.patient_num; therefore keeps only the first observation of each patient, and on that observation first.ordering_date is always 1, so enc_ is always set to 1.
In most cases, an IF without THEN is equivalent to WHERE. However:
WHERE works faster, but it is limited to variables that come from the data set you are reading.
IF can be used with any type of expression, including calculated fields.
More info: IF Statement, Subsetting
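The effect of the subsetting IF can be replayed outside SAS. This Python sketch (an illustration with hypothetical function names, not SAS code) runs the example rows through the question's logic and through a corrected version that resets the counter at each new patient and adds 1 at each new date. The corrected version corresponds roughly to a DATA step with the same BY statement and if first.patient_num then enc_ = 0; if first.ordering_date then enc_ + 1; in place of the subsetting IF. Like the question, it assumes one ordering date per encounter.

```python
# (patient_num, ordering_date) pairs from the example table, already grouped
rows = [
    (1111, "07NOV2008"), (1111, "11MAY2011"),
    (2222, "08DEC2009"), (2222, "08DEC2009"),
    (3333, "12APR2007"), (3333, "21DEC2008"), (3333, "02FEB2009"),
]


def first_flags(rows):
    """(first.patient_num, first.ordering_date) per row, as a BY statement sets them."""
    flags, prev = [], None
    for patient, date in rows:
        first_patient = prev is None or prev[0] != patient
        # an inner BY group restarts whenever the outer one does
        first_date = first_patient or prev[1] != date
        flags.append((first_patient, first_date))
        prev = (patient, date)
    return flags


def question_logic(rows):
    """Mimics the question's DATA step, including the subsetting IF."""
    enc, output = 0, []
    for (patient, _), (first_patient, first_date) in zip(rows, first_flags(rows)):
        if not first_patient:
            continue  # subsetting IF is false: skip the rest, write nothing
        enc = 1 if first_date else enc + 1
        output.append((patient, enc))
    return output


def corrected_logic(rows):
    """Reset at each new patient, add 1 at each new date within the patient."""
    enc, output = 0, []
    for (patient, _), (first_patient, first_date) in zip(rows, first_flags(rows)):
        if first_patient:
            enc = 0
        if first_date:
            enc += 1
        output.append((patient, enc))
    return output


print(question_logic(rows))   # [(1111, 1), (2222, 1), (3333, 1)]
print(corrected_logic(rows))  # [(1111, 1), (1111, 2), (2222, 1), (2222, 1), (3333, 1), (3333, 2), (3333, 3)]
```

The corrected sequence 1, 2, 1, 1, 1, 2, 3 matches the Enc_ column the question asks for.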
Third
I think the lag() function can be your answer.
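data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
/* call lag() on every observation so that each queue stays in step with the data */
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
/* restart the counter for each patient; add 1 whenever the ordering date changes */
if first.patient_num then enc_ = 1;
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
run;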
With the lag() function you can get the value a variable had on the previous observation and compare it with the current one.
But be careful: lag() doesn't look up the variable's value on the previous observation. It takes the value of the variable and stores it in a FIFO queue of size 1. On the next call it retrieves the stored value from the queue and puts the new value there. That is why lag() should be called on every observation rather than inside a conditional branch.
More info: LAG Function
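The queue behaviour is easier to see in a toy model. The Python class below is a hypothetical stand-in for one LAG() call site (None plays the role of a SAS missing value): called on every row it returns the previous row's value, but called only on some rows it returns the value from the last row on which it was called.

```python
class Lag:
    """One-slot FIFO queue mimicking a single LAG() call site in SAS.

    Each call returns the value stored by the previous call to this same
    object (None standing in for missing on the first call) and stores
    the new value in its place.
    """

    def __init__(self):
        self._slot = None

    def __call__(self, value):
        previous, self._slot = self._slot, value
        return previous


values = [10, 20, 30, 40]

# Called on every observation: behaves like "the value on the previous row"
lag_every_row = Lag()
unconditional = [lag_every_row(v) for v in values]

# Called only on some observations: the queue is updated only on those rows
lag_some_rows = Lag()
conditional = [lag_some_rows(v) if i % 2 == 0 else "not called"
               for i, v in enumerate(values)]

print(unconditional)  # [None, 10, 20, 30]
print(conditional)    # [None, 'not called', 10, 'not called']
```

In the conditional case, row 3 (value 30) gets 10 back rather than its predecessor's 20, which is exactly the trap described above.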
I'm not sure whether this hurts the rest of your analysis, but what about just:
proc freq data=pt_meds noprint;
tables patient_num*ordering_date / out=pt_meds_freq; /* crossed table: one row per patient and date, with COUNT */
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;

cut function and controlled frequency in the intervals

My question is pretty simple: the cut() function lets me choose the breaks along which the range of my vector is divided into intervals. I would like to be able to control the number of observations within each newly created interval, similar to what I get by passing quantiles as the breaks of cut(). However, I don't want to use quantiles, because I would like the intervals to be fixed, so that I can match them between different databases for further comparison, and I want the same discrete values to appear in the labels of the newly cut vectors.
I used to use this for the quantile approach:
df$z<-cut(df$x, quantile(df$x, (0:10)/10), include.lowest=TRUE)
That is fairly simple. My new approach is even simpler; it looks like this, for example:
df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)
I then have another variable on which I want to calculate some statistics, according to the levels of the discrete variable.
So it would go something like this:
df$conf.intx<-ifelse(df$z=="1",t.test(df[df$z=="1",]$y)$conf.int[1],
ifelse(df$z=="2",t.test(df[df$z=="2",]$y)$conf.int[1],
ifelse(df$z=="3",t.test(df[df$z=="3",]$y)$conf.int[1],
ifelse(df$z=="4",t.test(df[df$z=="4",]$y)$conf.int[1],NA))))
But to calculate this kind of t-test confidence interval on each 'pool' of y values (whose size equals the number of observations in the corresponding interval of the discrete variable), I need to control the number of values within each interval of z, so that my test remains valid, at least as far as the number of observations is concerned.
Simply put, I need an automated procedure that creates the vector of breaks for z so that each interval contains a minimum number of observations. As an added complication, the breaks should be the same for two different databases, and I don't know whether that is possible.
Any help on the matter would be welcome, thank you in advance.
EDIT: here is a sample of my data for x.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185,
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833,
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083,
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625,
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167,
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625,
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917,
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305,
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417,
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167,
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833,
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167,
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667,
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325,
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475,
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625,
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417,
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167,
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583,
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333,
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435,
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667,
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025,
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917,
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417,
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083,
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083,
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583,
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065,
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667,
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667,
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225,
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167,
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667,
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917,
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583,
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417,
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667,
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917,
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667,
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475,
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333,
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083,
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025,
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417,
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255,
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309,
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333,
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667,
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475,
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375,
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583,
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175,
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575,
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333,
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667,
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725,
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917,
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917,
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333,
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417,
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833,
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417,
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695,
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333,
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484,
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333,
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583,
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583,
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083,
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051,
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575,
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583,
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917,
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167,
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333,
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917,
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635,
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333,
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417,
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167,
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667,
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333,
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525,
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975,
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083,
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075,
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583,
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765,
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725,
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833,
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583,
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635,
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128,
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917,
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417,
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417,
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083,
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333,
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583,
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445,
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175,
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667,
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363,
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775,
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917,
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083,
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833,
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167,
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417,
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575,
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667,
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225,
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275,
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583,
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667,
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846,
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365,
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833,
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417,
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202,
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667,
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695,
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583,
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828,
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333,
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833,
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798,
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425,
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167,
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167,
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275,
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667,
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333,
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475,
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025,
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375,
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695,
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583,
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917,
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917,
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225,
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417,
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917,
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583,
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333,
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583,
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083,
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417,
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325,
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833,
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")
Assuming I want 30 values per interval (the 'n'), here is the code I used:
df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)
Which gives me:
> table(df$z)
[0.0312,0.0624] (0.0624,0.0936] (0.0936,0.125] (0.125,0.156] (0.156,0.187] (0.187,0.218] (0.218,0.249] (0.249,0.281] (0.281,0.312] (0.312,0.343] (0.343,0.374]
0 0 0 0 0 0 0 0 0 0 0
(0.374,0.405] (0.405,0.437] (0.437,0.468] (0.468,0.499] (0.499,0.53] (0.53,0.561] (0.561,0.593] (0.593,0.624] (0.624,0.655] (0.655,0.686] (0.686,0.717]
0 0 0 0 0 0 0 0 0 0 0
(0.717,0.748] (0.748,0.78] (0.78,0.811] (0.811,0.842] (0.842,0.873] (0.873,0.904] (0.904,0.936] (0.936,0.967] (0.967,0.998]
0 0 0 0 0 0 0 0 0
What I want is a similar result to what I get with quantiles:
df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)
[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66] (5.66,5.8] (5.8,5.94] (5.94,6.1] (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93]
49 48 48 48 48 48 48 48 48 48 48 48 48 48 48
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36]
48 48 48 48 49
Except I'd like this to be reproducible for another database, and so I can't use the quantile function, since I would not get the same intervals on a different database.
SECOND EDIT: here is the second sample from another database. 'x' is the same variable, and they have similar ranges.
structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333,
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175,
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975,
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372,
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833,
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333,
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583,
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009,
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417,
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275,
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583,
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667,
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167,
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525,
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748,
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685,
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417,
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667,
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625,
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417,
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917,
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583,
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083,
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583,
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917,
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083,
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245,
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333,
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725,
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235,
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333,
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583,
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167,
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203,
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604,
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334,
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333,
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667,
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435,
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833,
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583,
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669,
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667,
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325,
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833,
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167,
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325,
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083,
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417,
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417,
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917,
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475,
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917,
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417,
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881,
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417,
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417,
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083,
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825,
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425,
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167,
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917,
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575,
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333,
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417,
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475,
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083,
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025,
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833,
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275,
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833,
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775,
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575,
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")
Updated after some comments:
Since you state that a minimum number of cases in each group would be fine for you, I'd go with Hmisc::cut2:
v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group
The documentation for cut2 states:
m desired minimum number of observations in a group.
The algorithm does not guarantee that all groups will have at least m observations.
The same cuts for separate variables
If the distributions of your variables are very similar, you could extract the exact cutpoints by setting the argument onlycuts = T and reuse them for the other variables. If the distributions are different, though, you will end up with few cases in some intervals.
Using your data:
library(magrittr)
library(Hmisc)
cuts <- cut2(df1$x, g = 20, onlycuts = T) # determine cuts based on df1
cut2(df1$x, cuts = cuts) %>% table
cut2(df2$x, cuts = cuts) %>% table*2 # multiplied by two for better comparison
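If the two databases differ, reusing df1's cuts can leave few cases in some intervals of df2. One alternative (a swapped-in technique, not something cut2 does) is to choose the breaks on both samples jointly and close an interval only once each sample has at least m values in it. Below is a greedy sketch in Python; the function names and the toy samples db1/db2 are hypothetical stand-ins, not the posted data. The resulting vector could then be passed as breaks to cut(..., include.lowest = TRUE) in R for both data frames.

```python
from bisect import bisect_left


def shared_min_count_breaks(samples, m):
    """Greedy sketch: one set of breaks for several samples at once.

    Walks the pooled sorted values and closes a right-closed interval
    (previous break, v] as soon as every sample has at least m values in it
    and the next pooled value is strictly larger (so ties stay together).
    A too-small final interval is merged into the one before it.
    """
    tagged = sorted((v, s) for s, values in enumerate(samples) for v in values)
    counts = [0] * len(samples)
    breaks = [tagged[0][0]]
    for i, (v, s) in enumerate(tagged):
        counts[s] += 1
        is_last = i == len(tagged) - 1
        if (not is_last and min(counts) >= m
                and v > breaks[-1] and tagged[i + 1][0] > v):
            breaks.append(v)
            counts = [0] * len(samples)
    if min(counts) < m and len(breaks) > 1:
        breaks.pop()  # merge the leftover values into the previous interval
    breaks.append(tagged[-1][0])
    return breaks


def count_per_interval(values, breaks):
    """Counts per interval, like table(cut(values, breaks, include.lowest = TRUE))."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if breaks[0] <= v <= breaks[-1]:
            counts[max(bisect_left(breaks, v), 1) - 1] += 1
    return counts


# Hypothetical stand-ins for the two databases (not the posted data)
db1 = list(range(1, 51))            # 1, 2, ..., 50
db2 = [k + 0.5 for k in range(30)]  # 0.5, 1.5, ..., 29.5

breaks = shared_min_count_breaks([db1, db2], m=5)
print(breaks)                           # [0.5, 5, 10, 15, 20, 25, 50]
print(count_per_interval(db1, breaks))  # [5, 5, 5, 5, 5, 25]
print(count_per_interval(db2, breaks))  # [5, 5, 5, 5, 5, 5]
```

Because the rule has to satisfy the sparser sample, intervals are wider where one database has few values (the last interval of db1 here), which is the price of sharing one set of breaks.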
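This is a good example of how NOT to pose a question. At last we have an example, and it is possible to post code that applies to it. (You apparently pasted the exact code from my comment without thinking about how to express 'n' and 'N' in the context of the problem: the proportions seq(n, N, by=n)/N have to be turned into breaks with quantile(); used directly as breaks, they all lie between 0 and 1, far below the data, so every interval is empty.) I did need to add prob=c( seq(...) , 1) in order to capture the highest values.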
This assumes that you want groups of size 100 (although it is still very unclear why this is needed).
x$xct <- cut( x$x, breaks=quantile(x$x, prob=c( seq(100, length(x$x), by=100)/length(x$x) , 1) ))
table(x$xct)
(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85]
100 100 100 100 100 100
(6.85,7.26] (7.26,7.94] (7.94,9.36]
100 100 62
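Note that the probability vector above starts at n/N rather than 0, so the lowest break is already close to the n-th smallest value, and the n smallest values fall outside the breaks and become NA; that is why the table sums to 862 rather than the 962 rows of the sample. Prepending 0, as in prob=c(0, seq(100, length(x$x), by=100)/length(x$x), 1), together with include.lowest=TRUE keeps them (wrap the probabilities in unique() if N happens to be a multiple of n, since cut() rejects duplicated breaks). The Python sketch below re-implements R's default (type 7) quantile and a right-closed cut() on a hypothetical stand-in of 962 distinct values to show the arithmetic; it is an illustration, not R code.

```python
from bisect import bisect_left


def quantile_type7(sorted_xs, p):
    """R's default quantile (type 7): linear interpolation between order statistics."""
    h = (len(sorted_xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def cut_counts(xs, breaks, include_lowest=False):
    """Counts per right-closed interval, plus how many values would be NA in cut()."""
    counts, na = [0] * (len(breaks) - 1), 0
    for v in xs:
        if include_lowest and v == breaks[0]:
            counts[0] += 1
        elif breaks[0] < v <= breaks[-1]:
            counts[bisect_left(breaks, v) - 1] += 1
        else:
            na += 1
    return counts, na


xs = list(range(1, 963))  # hypothetical stand-in: 962 distinct, sorted values
n, N = 100, len(xs)

probs = [k / N for k in range(n, N + 1, n)] + [1]  # like c(seq(n, N, by = n)/N, 1)
breaks = [quantile_type7(xs, p) for p in probs]
print(cut_counts(xs, breaks))
# ([100, 100, 100, 100, 100, 100, 100, 100, 62], 100)

probs0 = [0] + probs  # like c(0, seq(n, N, by = n)/N, 1)
breaks0 = [quantile_type7(xs, p) for p in probs0]
print(cut_counts(xs, breaks0, include_lowest=True))
# ([100, 100, 100, 100, 100, 100, 100, 100, 100, 62], 0)
```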

How should I use ezANOVA() if I already collapsed the data to cell means?

I want to do a repeated measures ANOVA using ezANOVA() from the ez package. My experimental design is a two-way repeated measures design: soa (four levels) x congruency (three levels), and the data below are organized in that order (i.e., soa x congruency), with "subject" as the ID column.
My question is: how should I use ezANOVA() if I have already collapsed the data to cell means (the data below are the dependent variable collapsed for each subject by soa x congruency)?
subject mrt1 mrt2 mrt3 mrt4 mrt5 mrt6 mrt7 mrt8 mrt9 mrt10 mrt11 mrt12
99 1039.3 1078.1176 997.5323 873.4615 1024 916.2 1061.0909 1008.7778 919.7879 1053 1052.9615 953.619
5203 1020.4545 1098.6667 911.2642 941.25 944.2857 976.1053 949 992.4167 870.4308 783.9091 852.1176 927.8852
5205 1373.7273 1074.2143 986.7397 1193.4615 1031.9545 1108.4789 1041.2727 1036.0625 989.0714 1180.5 908.9688 944.2024
5306 1012.375 1038.8421 949.4938 1320 1213.5714 1003.3133 1027.3333 970.2778 922.3939 1102 971.4286 943.5634
5307 1397.8333 1243.4 1114 1038 1187.6 1046.2588 1121.4 1376.0833 1080.6615 1075.2727 1159.2381 1060.6818
5308 809.9091 1193.75 1061.2895 923.7647 1128.4286 959.5783 771 1499.25 998.875 925.8462 1074.4 1022.3418
5309 1185 1257.2857 1230.1 1231.8333 1167.4545 1197.2051 1332.4444 1323.9 1334.359 1331.6 1418.5714 1198.8378
5410 1093.7 1154.2778 991.0147 1238.3846 1040.4783 1010.2 1009.3636 1161.6 1016.0946 1140.25 1020.1481 986.5312
5511 881.8182 1082.2 752.5455 1119.2222 969.25 848.8602 958.9444 850.3448 805.75 860.3 902.1579 758.1875
5512 879 1039 951.9138 866.4545 1146.7 898.2841 988 1078.3214 911.4203 1365.7273 1179 924.4928
5513 1239.3333 1063.9565 1013.1111 1018.25 1244.5625 1091.0847 1031.6667 1035.2381 1035.8727 1013.25 1016.5238 1045.9444
5614 1151.5385 1044.4545 1050.5303 1046.2727 1116.1154 1037.9577 1080.7647 1192.1053 1054.9688 1100 995.625 1065.1408
5715 1432 1307.1765 1152.6087 1124.7778 1305.9091 1125.7792 1210.5 1268.7647 1167.8 1199.5 1137.0476 1115.8462
5716 996.7143 1355 1108.4231 1026.8 1251.4286 1108.3333 1130.2857 1196.5333 1071 919.4545 1183.4286 1040.9796
5817 1046.9 1075.8276 865.7534 1058.1818 1101.9259 877.2143 923.5 1063.25 880.44 1050.2 984.24 919.8913
Any help will be greatly appreciated,
Ayala
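Cell means in this wide layout first have to be stacked into long format: one row per subject x soa x congruency, with the mean RT as the dependent variable. The sketch below does this in Python under the assumption, taken from the description above, that soa varies slowest (mrt1 to mrt3 are soa 1 with congruency 1 to 3, mrt4 to mrt6 are soa 2, and so on); SOA_LEVELS, CONGRUENCY_LEVELS and wide_to_long are hypothetical names. In R, a long data frame of this shape, with subject, soa and congruency as factors, is what ezANOVA() works with through its dv, wid and within arguments; since there is exactly one value per subject and cell, no further aggregation should be needed.

```python
SOA_LEVELS = 4         # slowest-varying factor (assumed from "soa x congruency")
CONGRUENCY_LEVELS = 3  # fastest-varying factor


def wide_to_long(wide_rows):
    """Turn (subject, [mrt1, ..., mrt12]) rows into one record per subject x cell."""
    long_rows = []
    for subject, cell_means in wide_rows:
        if len(cell_means) != SOA_LEVELS * CONGRUENCY_LEVELS:
            raise ValueError(f"subject {subject}: expected 12 cell means")
        for k, mrt in enumerate(cell_means):  # k = 0 corresponds to mrt1
            long_rows.append({
                "subject": subject,
                "soa": k // CONGRUENCY_LEVELS + 1,
                "congruency": k % CONGRUENCY_LEVELS + 1,
                "mrt": mrt,
            })
    return long_rows


# First subject from the table above
wide = [(99, [1039.3, 1078.1176, 997.5323, 873.4615, 1024, 916.2,
              1061.0909, 1008.7778, 919.7879, 1053, 1052.9615, 953.619])]

for record in wide_to_long(wide)[:4]:
    print(record)
```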

Stata counting substring

My table looks like this:
ID AQ_ATC amountATC
. "A05" 1
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 1
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 1
441 "C05AA03" 130
The count for the full 8-character AQ_ATC codes is already correct.
The shorter codes are unique in the table and are substrings of the complete 8-character codes (they represent the first x characters).
What I am looking for is the count of the appearances of the shorter codes throughout the entire table.
For example, in this case the resulting table would be:
ID AQ_ATC amountATC
. "A05" 2715 <-- 2525 + 190
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 7430 <-- 4330 + 3100
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 130 <-- 130
441 "C05AA03" 130
The partial codes do not overlap, by which I mean that if there is "C05", there won't be another partial code "C05A1".
I created the amountATC column using
bysort ATC: egen amountATC = total(AQ_ATC==AQ_ATC)
I attempted recycling the code that I had received yesterday but failed in doing so.
My attempt looks like this:
levelsof AQ_ATC, local(ATCvals)
quietly foreach y in AQ_ATC {
local i = 0
quietly foreach x of local ATCvals {
if strpos(`y', `"`x'"') == 1{
local i = `i'+1
replace amountATC = `i'
}
}
}
My idea was to use a counter "i" and increase it by 1 every time an AQ_ATC starts with another AQ_ATC code. Then I write "i" into amountATC, and after iterating over the entire table for my AQ_ATC, I will have an "i" value equal to the number of occurrences of the substring. Then I reset "i" to 0 and continue with the next AQ_ATC.
At least that's how I intended it to work; what it actually did was set all amountATC values to 1.
I also looked into different egen functions such as noccur and moss, but my connection keeps timing out when I try to install the packages.
It seems as if you come from another language and insist on using loops when they are not strictly necessary. Stata does many things without explicit loops, precisely because commands already apply to all observations.
One way is:
clear
set more off
input ///
ID str15 AQ_ATC amountATC
. "A05" 1
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 1
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 1
441 "C05AA03" 130
end
*----- what you want -----
sort AQ_ATC ID
gen grou = sum(missing(ID))
bysort grou AQ_ATC: gen tosum = amountATC if _n == 1 & !missing(ID)
by grou: egen s = total(tosum)
replace amountATC = s if missing(ID)
list, sepby(grou)
Edit
With your edit the same principles apply. Below is code that adjusts to your change; it is also one line shorter:
*----- what you want -----
sort AQ_ATC
gen grou = sum(missing(ID))
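* (AQ_ATC) keeps identical codes adjacent within each group; plain bysort grou does not guarantee that order
bysort grou (AQ_ATC): gen s = sum(amountATC) if AQ_ATC != AQ_ATC[_n+1] & !missing(ID)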
by grou: replace amountATC = s[_N] if missing(ID)
A more efficient version should be:
<snip>
bysort grou (AQ_ATC): gen s = sum(amountATC) if AQ_ATC != AQ_ATC[_n+1]
by grou: replace amountATC = s[_N] - 1 if missing(ID)
Some comments:
sort is a very handy command. If you sort the data by AQ_ATC, the short (sub)strings are placed before the corresponding long strings, because a string always sorts before any longer string that starts with it.
The by: prefix is fundamental and very helpful; note that you can use it after defining appropriate groups. I created the groups taking advantage of the fact that all short (sub)strings have a missing ID.
Then (by the groups just defined) you only want to add up one observation per distinct AQ_ATC code. That's what the condition if AQ_ATC != AQ_ATC[_n+1] does: it is true only on the last observation of each run of identical codes.
Finally, replace back into your original variable. I would usually generate a copy and work with that, so my original variable doesn't suffer.
An excellent read for the by: prefix is Speaking Stata: How to move step by: step, by Nick Cox.
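The same idea, spelled out step by step in Python for illustration (Stata itself needs no explicit loop): sort so each short code precedes its long codes, number the groups with a running count of missing IDs, count each distinct full code once, and write the group total back onto the short-code row. It assumes, as the Stata code does, that every long code belongs to the short code that sorts immediately before it, which holds here because the partial codes do not overlap; fill_prefix_totals is a hypothetical name. Unlike the Stata versions that subtract 1, it simply leaves the short-code row out of the sum.

```python
# (ID, AQ_ATC, amountATC) from the example; None stands for Stata's missing ID
rows = [
    (None, "A05", 1),
    (123, "A05AA02", 2525),
    (234, "A05AA02", 2525),
    (991, "A05AD39", 190),
    (None, "C10", 1),
    (441, "C10AA11", 4330),
    (229, "C10AA22", 3100),
    (None, "C05AA", 1),
    (441, "C05AA03", 130),
]


def fill_prefix_totals(rows):
    """Mirror of the sort / running-group / tag approach above."""
    # A prefix sorts before every longer code that starts with it
    ordered = sorted(rows, key=lambda r: r[1])
    group, groups, totals, tagged = 0, [], {}, set()
    for row_id, code, amount in ordered:
        if row_id is None:
            group += 1  # like gen grou = sum(missing(ID))
        groups.append(group)
        # count each distinct full code once per group, like egen tag()
        if row_id is not None and (group, code) not in tagged:
            tagged.add((group, code))
            totals[group] = totals.get(group, 0) + amount
    return [
        (row_id, code, totals.get(g, 0) if row_id is None else amount)
        for (row_id, code, amount), g in zip(ordered, groups)
    ]


for row in fill_prefix_totals(rows):
    print(row)
```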
Edit2
Yet another slightly different way:
*----- what you want -----
sort AQ_ATC
gen grou = sum(missing(ID))
egen t = tag(grou AQ_ATC)
bysort grou: gen s = sum(amountATC * t)
by grou: replace amountATC = s[_N] - 1 if missing(ID)

Resources