Deduce information from pairs using programming - R

I would like to analyze some data.
My database consists of 1408 observations (704 of type 1 and 704 of type 2) and 49 variables. Here is part of it.
The point is that I want to analyze the gender of the type 1 subjects (sellers) who overcharged.
Data
Subject_ID Gender Period Matching_group Group Type Overcharging
654 1 1 73 1 1 NA
654 1 2 73 1 1 NA
654 1 3 73 1 1 NA
654 1 4 73 1 1 NA
708 0 1 73 1 2 1
708 0 2 73 1 2 0
708 0 3 73 1 2 0
708 0 4 73 1 2 1
435 1 1 73 2 1 NA
435 1 2 73 2 1 NA
435 1 3 73 2 1 NA
435 1 4 73 2 1 NA
546 0 1 73 2 2 0
546 0 2 73 2 2 0
546 0 3 73 2 2 1
546 0 4 73 2 2 0
For example, matching group 73 contains two groups (1 and 2), and each group contains both types (1 and 2). For type 1 subjects (sellers) we have no information about what they did (overcharge or not), but we do know whether the buyers (type 2) were overcharged.
If I can identify a buyer who was overcharged, then the seller this buyer was interacting with overcharged him. So all I need to look at is the gender of the seller in the same group as the buyer.
In matching group 73, for instance, we know that at period 1 subject 708 (the one in group 1) was overcharged. As I know that this buyer belongs to group 1 of matching group 73, I can identify the seller who overcharged him: subject 654, with gender = 1.
In group 2 (matching group 73), we know that at period 3 subject 546 was overcharged. As I know that this buyer belongs to group 2 of matching group 73, I can identify the seller who overcharged him: subject 435, with gender = 1.
....
I would like to do this for all of my observations.
However, I really don't know how to code this condition in R.
This is what I tried, but it doesn't do what I need:
for (matchinggroup[type==1]==matchinggroup[type==2] &
group[type==1]==group[type==2] & period[type==1]==period[type==2])
{
if ((overtreatment==1), na.rm=TRUE)
sum(gender==1[type==1], na.rm=TRUE)
}
The expected output I would like is:
sum(overcharging==1[gender==1&type==1])
## 3
sum(overcharging==1[gender==0&type==1])
## 0
sum(overcharging==0[gender==1&type==1])
## 5
sum(overcharging==0[gender==0&type==1])
## 0

Not exactly sure what your desired output is, but consider this:
Data <- read.table(header = T,
text = "Subject_ID Gender Period Matching_group Group Type Overcharging
654 1 1 73 1 1 NA
654 1 2 73 1 1 NA
654 1 3 73 1 1 NA
654 1 4 73 1 1 NA
708 0 1 73 1 2 1
708 0 2 73 1 2 0
708 0 3 73 1 2 0
708 0 4 73 1 2 1
435 1 1 73 2 1 NA
435 1 2 73 2 1 NA
435 1 3 73 2 1 NA
435 1 4 73 2 1 NA
546 0 1 73 2 2 0
546 0 2 73 2 2 0
546 0 3 73 2 2 1
546 0 4 73 2 2 0
")
dat1 <- subset(Data, Overcharging==1)
This finds all the rows where a buyer was overcharged. You can then find the matching seller for each of them with this loop:
out <- data.frame()
for(i in 1:nrow(dat1)){
dat2 <- dat1[i,]
df <- Data[Data$Period==dat2$Period & Data$Matching_group==dat2$Matching_group &
Data$Group==dat2$Group & Data$Type==1,]
out <- rbind(out, df)
}
Which will give you:
Subject_ID Gender Period Matching_group Group Type Overcharging
1 654 1 1 73 1 1 NA
4 654 1 4 73 1 1 NA
11 435 1 3 73 2 1 NA
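The same pairing can also be sketched without a loop: merge the buyer rows onto the seller rows that share Matching_group, Group and Period, then cross-tabulate the seller's gender against Overcharging. This is a minimal sketch that assumes exactly one seller per group and period, as in the sample. It rebuilds only the 16 sample rows, and the column names Seller and SellerGender are illustrative choices, not part of the original data.

```r
# Rebuild the 16 sample rows from the question (the real data has 1408 rows)
Data <- data.frame(
  Subject_ID     = rep(c(654, 708, 435, 546), each = 4),
  Gender         = rep(c(1, 0, 1, 0), each = 4),
  Period         = rep(1:4, times = 4),
  Matching_group = 73,
  Group          = rep(c(1, 2), each = 8),
  Type           = rep(c(1, 2, 1, 2), each = 4),
  Overcharging   = c(rep(NA, 4), 1, 0, 0, 1, rep(NA, 4), 0, 0, 1, 0)
)

# Sellers (Type 1): keep their ID and gender plus the keys used for matching
sellers <- subset(Data, Type == 1,
                  select = c(Subject_ID, Gender, Period, Matching_group, Group))
names(sellers)[1:2] <- c("Seller", "SellerGender")

# Buyers (Type 2): the only rows where Overcharging is observed
buyers <- subset(Data, Type == 2,
                 select = c(Subject_ID, Period, Matching_group, Group, Overcharging))
names(buyers)[1] <- "Buyer"

# Pair each buyer row with the seller in the same matching group, group and period
pairs <- merge(buyers, sellers, by = c("Matching_group", "Group", "Period"))

# Cross-tabulate; fixed factor levels keep the gender-0 row even when it is empty
counts <- table(SellerGender = factor(pairs$SellerGender, levels = 0:1),
                Overcharging = factor(pairs$Overcharging, levels = 0:1))
counts
```

For the sample rows, counts["1", "1"] is 3 and counts["1", "0"] is 5, which matches the expected output in the question; both gender-0 cells are 0.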

I think a for-loop solution is not idiomatic in R.
I developed another solution for you with data.table, separating sellers and buyers and then joining them.
library(data.table)
Data <- data.table(read.table(header = T,
text = "Subject_ID Gender Period Matching_group Group Type Overcharging
654 1 1 73 1 1 NA
654 1 2 73 1 1 NA
654 1 3 73 1 1 NA
654 1 4 73 1 1 NA
708 0 1 73 1 2 1
708 0 2 73 1 2 0
708 0 3 73 1 2 0
708 0 4 73 1 2 1
435 1 1 73 2 1 NA
435 1 2 73 2 1 NA
435 1 3 73 2 1 NA
435 1 4 73 2 1 NA
546 0 1 73 2 2 0
546 0 2 73 2 2 0
546 0 3 73 2 2 1
546 0 4 73 2 2 0
")
)
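# label each row as a seller (Type 1) or a buyer (Type 2)
Data[, SubjectType := ifelse(Type==1, "Seller", "Buyer")]
# one row per subject with their gender
Subjects <- unique(Data[, .(Subject_ID, Gender)])
# one row per (Matching_group, Group) with its buyer and seller IDs (mean() of a repeated ID is that ID)
Matches <- dcast(Data, Matching_group+Group~SubjectType, value.var="Subject_ID", fun.aggregate = mean)
# buyer rows only (Overcharging is NA for sellers)
Buys <- Data[!is.na(Overcharging), .(Buyer = Subject_ID, BuyerGender = Gender, Period, Matching_group, Group, Overcharging)]
# attach the seller ID of each buyer's group
Buys <- merge(Buys, Matches, by=c("Buyer", "Matching_group", "Group"), all.x = T)
# attach the seller's gender
Buys <- merge(Buys, Subjects[, .(Seller = Subject_ID, SellerGender = Gender)], by="Seller", all.x = T)
# counts of non-overcharged and overcharged purchases by buyer and seller gender
Buys[Overcharging==0, .N, .(BuyerGender, SellerGender)]
Buys[Overcharging==1, .N, .(BuyerGender, SellerGender)]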

Related

Average columns based on other column value and number of rows in R

I'm using R and am trying to create a new data frame of averaged results from another data frame, based on the values in column A. To demonstrate my goal, here is some data:
set.seed(1981)
df <- data.frame(A = sample(c(0,1), replace=TRUE, size=100),
B=round(runif(100), digits=4),
C=sample(1:1000, 100, replace=TRUE))
head(df, 30)
A B C
0 0.6739 459
1 0.5466 178
0 0.154 193
0 0.41 206
1 0.7526 791
1 0.3104 679
1 0.739 434
1 0.421 171
0 0.3653 577
1 0.4035 739
0 0.8796 147
0 0.9138 37
0 0.7257 350
1 0.2125 779
0 0.1502 495
1 0.2972 504
0 0.2406 245
1 0.0325 613
0 0.8642 539
1 0.1096 630
1 0.2113 363
1 0.277 974
0 0.0485 755
1 0.0553 412
0 0.509 24
0 0.2934 795
0 0.0725 413
0 0.8723 606
0 0.3192 591
1 0.5557 177
I need to reduce the size of the data by averaging columns B and C over each run of consecutive rows in which the value in column A stays the same, up to a maximum of 3 rows per average. If A keeps the same value (1 or 0) for more than 3 rows, the remaining rows roll over into the next row of the new data frame, as you can see below.
The new dataframe requires the following columns:
Value of A B.Av C.Av No. of rows used
0 0.6739 459 1
1 0.5466 178 1
0 0.282 199.5 2
1 0.600666667 634.6666667 3
1 0.421 171 1
0 0.3653 577 1
1 0.4035 739 1
0 0.8397 178 3
1 0.2125 779 1
0 0.1502 495 1
1 0.2972 504 1
0 0.2406 245 1
1 0.0325 613 1
0 0.8642 539 1
1 0.1993 655.6666667 3
0 0.0485 755 1
1 0.0553 412 1
0 0.291633333 410.6666667 3
0 0.59575 598.5 2
1 0.5557 177 1
I haven't managed to find another similar scenario to mine whilst searching Stack Overflow so any help would be really appreciated.
Here is a base-R solution:
## define a function to split the run-length if greater than 3
split.3 <- function(l,v) {
o <- c(values=v,lengths=min(l,3))
while(l > 3) {
l <- l - 3
o <- rbind(o,c(values=v,lengths=min(l,3)))
}
return(o)
}
## compute the run-length encoding of column A
rl <- rle(df$A)
## apply split.3 to the run-length encoding
## the first column of vl are the values of column A
## the second column of vl are the corresponding run-length limited to 3
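## SIMPLIFY = FALSE keeps mapply() returning a list even when no run is longer than 3
vl <- do.call(rbind, mapply(split.3, rl$lengths, rl$values, SIMPLIFY = FALSE))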
## compute the begin and end row indices of df for each value of A to average
fin <- cumsum(vl[,2])
beg <- fin - vl[,2] + 1
## compute the averages
out <- do.call(rbind,lapply(1:length(beg), function(i) data.frame(`Value of A`=vl[i,1],
B.Av=mean(df$B[beg[i]:fin[i]]),
C.Av=mean(df$C[beg[i]:fin[i]]),
`No. of rows used`=fin[i]-beg[i]+1)))
## Value.of.A B.Av C.Av No..of.rows.used
##1 0 0.6739000 459.0000 1
##2 1 0.5466000 178.0000 1
##3 0 0.2820000 199.5000 2
##4 1 0.6006667 634.6667 3
##5 1 0.4210000 171.0000 1
##6 0 0.3653000 577.0000 1
##7 1 0.4035000 739.0000 1
##8 0 0.8397000 178.0000 3
##9 1 0.2125000 779.0000 1
##10 0 0.1502000 495.0000 1
##11 1 0.2972000 504.0000 1
##12 0 0.2406000 245.0000 1
##13 1 0.0325000 613.0000 1
##14 0 0.8642000 539.0000 1
##15 1 0.1993000 655.6667 3
##16 0 0.0485000 755.0000 1
##17 1 0.0553000 412.0000 1
##18 0 0.2916333 410.6667 3
##19 0 0.5957500 598.5000 2
##20 1 0.5557000 177.0000 1
Here is a data.table solution:
library(data.table)
setDT(df)
# create two group variables, consecutive A and for each consecutive A every three rows
(df[,rleid := rleid(A)][, threeWindow := ((1:.N) - 1) %/% 3, rleid]
# calculate the average of the columns grouped by the above two variables
[, c(.N, lapply(.SD, mean)), .(rleid, threeWindow)]
# drop group variables
[, `:=`(rleid = NULL, threeWindow = NULL)][])
# N A B C
#1: 1 0 0.6739000 459.0000
#2: 1 1 0.5466000 178.0000
#3: 2 0 0.2820000 199.5000
#4: 3 1 0.6006667 634.6667
#5: 1 1 0.4210000 171.0000
#6: 1 0 0.3653000 577.0000
#7: 1 1 0.4035000 739.0000
#8: 3 0 0.8397000 178.0000
#9: 1 1 0.2125000 779.0000
#10: 1 0 0.1502000 495.0000
#11: 1 1 0.2972000 504.0000
#12: 1 0 0.2406000 245.0000
#13: 1 1 0.0325000 613.0000
#14: 1 0 0.8642000 539.0000
#15: 3 1 0.1993000 655.6667
#16: 1 0 0.0485000 755.0000
#17: 1 1 0.0553000 412.0000
#18: 3 0 0.2916333 410.6667
#19: 2 0 0.5957500 598.5000
#20: 1 1 0.5557000 177.0000
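For comparison, the same grouping can be sketched in base R without a helper function: rle() labels each run of equal A values, sequence() numbers the rows within each run, and integer division by 3 splits a long run into chunks of at most three rows. The small data frame toy below is made up so that the result is easy to check by hand; on the question's df, the same steps apply to columns A, B and C.

```r
# Made-up example: a run of two 0s and a run of four 1s
toy <- data.frame(A = c(0, 1, 0, 0, 1, 1, 1, 1, 0), B = 1:9, C = 10 * (1:9))

r <- rle(toy$A)
run_id     <- rep(seq_along(r$lengths), r$lengths)  # which run each row belongs to
pos_in_run <- sequence(r$lengths)                   # 1, 2, 3, ... within each run
chunk      <- (pos_in_run - 1) %/% 3                # 0 for rows 1-3 of a run, 1 for rows 4-6, ...

# A new group starts whenever the run or the chunk changes
grp <- cumsum(c(TRUE, diff(run_id) != 0 | diff(chunk) != 0))

out <- data.frame(
  A      = as.vector(tapply(toy$A, grp, function(x) x[1])),
  B.Av   = as.vector(tapply(toy$B, grp, mean)),
  C.Av   = as.vector(tapply(toy$C, grp, mean)),
  n_rows = as.vector(tapply(toy$A, grp, length))
)
out
```

The run of four 1s is split into one row averaging three rows and one row holding the fourth, which is the roll-over behaviour described in the question.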

Line plot with R with categorical data

I have a dataset with the variables:
subject, group, gender, pretest, posttest, FU_6_month, FU_12_month
Subject Group Gender Pretest Posttest FU_6_month FU_12_month
1 2 2 118.826601098386 93.7242226833558 45.9022982619128 87.5922938477103
2 2 2 61.076151378316 37.4269190073656 74.0550780537479 125.065288560879
3 2 2 57.5102273980161 77.8629614597533 75.57281055525 47.6327188844442
4 2 2 70.7363734703855 9.66991124269437 89.3449482258492 108.293657994293
5 2 2 9.86907880058945 -10.3608782428914 -37.5688442618089 -75.0331616314858
6 2 2 64.5157954624921 16.5509079527419 41.8441716919839 62.281296652626
7 2 2 63.2566934190156 55.7464724092843 19.0693261189747 59.9351530137349
8 2 2 109.093548562172 31.718921929317 39.6937442573627 52.4519409887752
9 2 2 140.693405017245 140.524720621573 77.5230524280486 134.207295832318
10 2 2 93.9426980661475 46.8332604054467 38.402855719624 50.9079427718743
11 2 2 43.7680225594388 80.6379240148244 96.3606811119212 67.1279194045236
12 2 2 20.1458211058956 -7.04822262983657 8.49333465622865 36.5967882860063
13 2 2 83.469391668149 123.969188001105 106.499980159661 29.6135142560297
14 2 2 96.6687732166198 41.1420718195119 54.7799026008967 56.4261716602755
15 2 2 106.358591131724 144.971834214664 148.026658720866 108.002672122947
16 2 2 16.9581371937412 83.1518225974514 88.0186516925204 121.061427265927
17 2 2 64.2539971694514 63.2117086017753 111.907210080775 37.9182814935607
18 2 2 124.11372018163 118.866157749789 78.6115825756602 39.7082848460843
19 2 2 46.4606560117358 74.4717571331671 40.7418447935371 51.528282732026
20 2 2 114.930656954671 87.3716921095551 71.2297039530053 52.3204838525396
21 2 2 40.4201124083124 86.7274803995984 40.0452102666523 106.294559268573
22 2 2 108.836792961519 55.3938284037392 42.5591569465982 -8.19305292316011
23 2 2 62.3804870892384 73.4711868487922 18.9401981024593 79.1316020496406
24 2 2 34.4038057814076 -10.3210731558527 59.0035259778318 70.0935426080846
25 2 2 90.8027274469837 63.9607510682 23.409456737791 -22.875864241026
26 2 2 11.7507972703742 -2.94630158894237 -58.3073936001535 -11.60312518279
27 2 2 44.80814276417 -14.5096265325743 44.6692531748914 15.2279416937636
28 2 2 50.1321942804589 21.3103925056834 23.4817359003739 4.6942563521538
29 2 2 100.916457384715 79.7904457154806 77.4563447056773 133.933576514761
30 2 2 91.9666694946805 87.5772982344139 74.7225424155036 112.423696361253
31 2 2 88.9334732523178 34.7022650208145 90.5109779763575 74.5346146636566
32 1 1 81.1214937447534 54.1044255781282 84.7921094855582 91.1960453420743
33 1 1 58.8576595772853 43.7588455451153 -28.3737936572789 20.5323873036915
34 1 1 94.7206052432539 117.417745760142 123.549594722528 123.577410999177
35 1 1 35.6759882761208 83.438460147486 101.374418533548 117.407858069453
36 1 1 27.0663967490866 65.1316276924752 40.6863523256917 43.0770501617226
37 1 1 25.8514744435283 65.6129973657268 91.2683460107141 87.8754543146423
38 1 1 38.1949952702174 86.6308464921296 46.3924788350823 75.9032186102214
39 1 1 86.8606779979669 108.826238059586 75.6851380593086 65.4200604841805
40 1 1 -6.27591849716471 37.6110556707935 17.073836159682 2.31274071895368
41 1 1 46.3301324308731 48.8945996406574 28.184935352875 75.0552601606531
42 1 1 75.5207821136948 64.3563772132469 47.0692943366087 6.73314670916386
43 1 1 72.2546738033818 69.586398745141 114.381364281594 111.826984877306
44 1 1 63.4781658850011 99.049067888915 147.435368194586 95.7697872432145
45 1 1 98.7816404692531 119.771236619925 86.0608732865149 63.5925334927907
46 1 1 57.7861680881831 60.9242563489794 115.323242983005 107.812892523927
47 1 1 20.7553750494804 64.2153119771138 80.4530077140549 109.82827909681
48 1 1 120.216467968489 98.1714936196362 103.684406597264 70.9607067374422
49 1 1 79.185403297875 64.5546197197006 25.8163919150245 25.7390824767794
50 1 1 76.7014603032583 76.4261333376855 47.8550073318326 60.1294421424394
51 1 1 28.0279022207291 65.3928762213151 47.0842042316292 87.1936301446947
52 1 1 144.744160722781 87.2810292839688 116.354612596006 107.648295676686
53 1 1 6.33304664550448 28.6180442534099 35.0924364855403 43.4180796117138
54 1 1 61.3353319617994 84.5449868274988 50.4807390973971 66.2394686330681
55 1 1 76.2997829598239 63.5136443275288 121.943272892608 91.1928686386778
56 1 1 92.1450972853901 114.609600784086 76.7511596303626 100.375073126256
57 1 1 28.804633600014 59.8400454024583 72.5911681616019 26.0906198483795
58 1 1 106.980948059764 96.5451150496155 84.9934401810607 74.8203628162023
59 1 1 59.452124962385 89.3319279228802 10.0195199176193 54.346248526172
60 1 1 95.0552085815196 97.2000344846879 70.705403830194 51.3011761075761
61 1 1 61.8002496990622 86.9070024010344 48.3554607420459 85.9692920560173
62 1 1 51.4264598967651 8.55667676146555 20.2154240800391 7.43739168375134
63 1 1 85.4912848778254 85.5693976898857 129.597738684151 91.7808811747717
64 1 1 98.0252031110864 57.6910927029629 75.4148963448406 23.2566349717607
65 1 1 71.7048653453729 78.995107581382 53.2810320681918 62.9720582710283
66 1 1 76.78555002899 69.9939478859649 128.789596038478 83.9802882230193
67 1 1 115.006897750398 129.469936599525 122.703355191477 84.5780155932675
68 1 1 134.540121180658 60.2690032596209 58.0334397473465 77.2650708277708
69 1 1 55.0188905000887 28.8950163475678 29.6744156495995 54.4603122787989
70 1 1 23.0424913085507 36.5297451319085 21.9128320910716 89.0993730787044
71 1 1 84.9353877886003 52.9391040508769 40.5060745487472 97.5747666694516
72 1 1 31.3080252805222 33.2934269506026 40.8168172306177 57.5855596826412
73 1 1 109.151435427561 103.268038055169 114.989749539916 79.0797680695321
74 1 1 44.1915601794505 32.488889575974 47.8871349362105 23.3092880391165
75 1 1 73.7934776486887 101.055099406535 84.1618985567819 103.402981000812
76 1 1 95.0542059478978 97.5911569796704 78.486148113456 68.4867425986232
77 1 1 64.5240726355218 31.671278755378 33.258435447189 40.5905417521354
78 1 1 106.25630590229 57.3504334201694 89.4167040212866 110.270001543683
79 1 1 -32.8285288832798 13.0666964953638 64.1013238102904 10.0353131055683
80 1 1 136.997343492581 143.690139614701 134.005733227978 120.271745852092
81 1 1 70.7343231272323 97.4590508106664 106.172374379448 90.3343680822183
82 1 1 110.209501165981 104.25349711217 40.4199334170602 1.84570821055021
83 1 1 100.124928168325 166.678405906746 157.469180899729 106.005633296499
84 1 1 62.5812451279611 55.6523803940952 47.3468086312141 36.9790454246945
85 1 1 158.704614848065 107.969362033628 56.1774990101832 123.674166267595
86 1 1 56.9197652881344 136.850596044725 83.6842567229687 83.9300268420387
87 1 1 128.125931932392 92.6197201206393 150.508502710328 131.529802017041
88 1 1 64.7593992341357 73.862492217461 65.0649801212256 38.6091230133425
89 1 1 68.9999096964161 80.1939274248903 30.6648554419909 36.9854625032025
90 1 1 98.3010238699883 111.484289666451 84.0643697006305 43.3236053574824
91 1 1 144.016203920245 118.355849401995 150.964556520965 79.3690577232804
92 1 1 56.3672851862174 49.3335857232337 28.0430516312931 -3.75856749795899
93 1 1 50.9904382495696 79.5835591128045 85.1716258269063 89.840247580395
94 1 1 4.79995357692619 63.109484390147 85.3620178819827 145.014718138729
95 1 1 116.342271365265 63.9238823488814 73.6395620422341 98.5908099418167
96 1 1 73.4310676105442 89.8610346746933 59.6218968304398 80.5780124201831
97 1 1 39.0845065963185 67.2066549278885 43.0404190714966 93.8553228463651
98 1 1 84.7341340177175 83.2526354844997 23.6121905738287 107.080712311365
99 1 1 91.3038283312291 135.044667633721 76.792704934174 110.553664837889
100 1 1 79.0049860628834 49.2175049784618 67.1482846117857 55.617484745908
101 1 1 29.5624193037796 .0409501286203806 43.4888184402251 39.8187985904858
102 1 1 51.3765518161258 69.4038526036381 45.9739141565358 50.6597987940666
103 1 1 63.0055365628694 103.322140129569 120.808548140362 105.919544083519
104 1 1 52.8913719708796 70.7470331787717 58.2919624308167 149.453269781413
105 1 1 120.714035605949 105.19250318209 65.8882662961688 88.7091006419313
106 1 1 -7.66766451716536 25.8711209808077 11.6178991397316 15.767943907553
107 1 1 3.72070471161231 62.0273823153584 22.4075847523903 21.7788634487747
108 1 1 51.3172868381968 113.882913281247 90.9005215836042 92.3427665340994
109 1 1 73.5034029649624 62.87236526609 8.81375967033083 -28.4972807134294
110 1 1 99.1011673438447 60.1359669228303 53.8205321646238 96.1744881373656
111 1 1 107.982390062527 135.072911770312 125.382997170895 98.8360994985125
112 1 1 18.3703016279945 53.5680481870766 65.3995855325053 114.196012166461
113 1 2 63.3407691409205 89.4389118123724 81.255769107708 100.393559031196
114 1 2 56.5421572534953 130.6960381422 59.0306960924183 75.4843608239859
115 1 2 53.6415766516234 84.3510275802067 70.3588577776371 7.17774520243648
116 1 2 32.6563157378665 103.863103137957 137.125019714332 137.633650084141
117 1 2 55.6103169017315 64.3121749271763 98.6782025731554 95.2469559040964
118 1 2 18.0644132411083 66.2946440162736 70.1064773432998 37.0526455867338
119 1 2 78.0707752233755 105.514210353922 103.005875726676 74.5737236293205
120 1 2 102.241621141341 101.870735843971 65.2114022842167 79.1628896853278
121 1 2 90.4418459183572 89.1009771794923 109.828940848838 14.1605234087263
122 1 2 168.098799922206 133.913045707145 103.78366545298 78.4568362168323
123 1 2 35.7090570053257 82.1569145373673 89.6599489830234 73.1454809943076
124 1 2 47.9178546235218 104.55609221103 84.063325033245 69.4700774018193
125 1 2 112.097399328943 79.2493293202281 43.9677786050359 33.097338196686
126 1 2 76.6236967002151 108.310865299192 59.2668769569866 85.7884279169812
127 1 2 -9.68798466340358 74.9095212029448 26.2061712938769 -48.5516085044269
128 1 2 112.998248745356 106.008001236265 73.8054153582076 14.1811229738059
129 1 2 108.233164776823 152.044046851601 134.983762387721 61.7054340377302
130 1 2 52.8401153910794 67.4683858534514 47.0885824183792 47.2019614625486
131 1 2 54.0715795536216 80.1460393610179 68.0047798144665 35.7762407840147
132 1 2 14.7567828626484 93.2883767811719 73.3331929932544 29.9865546887769
133 1 2 55.1781270568026 111.21768955155 95.0114679825819 65.3422083272693
134 1 2 4.43500653128196 34.0038153834776 26.8355329322023 -19.248732400278
135 1 2 53.7146461969759 101.924570033764 79.3507104221776 137.799585172542
136 1 2 115.416626940762 98.7063813017698 114.583512474598 47.8066528760494
137 1 2 92.8832002098113 105.095486574613 117.913021787209 114.28506812779
138 1 2 79.1670151767684 137.272559436949 110.671170365682 149.150952988482
139 1 2 41.4916764950108 88.1814608116972 70.7781360402978 56.2502955019036
140 1 2 62.299952297694 79.5442518959365 56.3854107511476 81.6024886897144
141 1 2 69.0703579054148 52.776765534947 84.1625448317378 92.6786544709192
142 1 2 -.749072465661733 37.9509014388929 45.0107418907768 14.1279427398175
143 1 2 62.5452932897767 81.2189303707544 17.1315538180223 4.72972999158627
144 1 2 33.8786649092082 53.8147463106634 37.8014559381097 58.3312339891636
145 2 1 58.4750829402665 97.2248315220446 77.197092547873 65.5159928828053
146 2 1 86.9936228722864 66.1574144764372 109.876241164801 101.050374552729
147 2 1 45.797257668434 -10.6799727646283 16.9239199285245 -2.35156822036691
148 2 1 45.4052424284709 64.0362320451053 -18.9317120769134 58.9156051743804
149 2 1 82.3222552592198 43.7841447176975 48.7467719992006 30.6908078408155
150 2 1 41.4724114870814 17.852908189203 2.65240845835374 17.1640762470276
151 2 1 75.8562984473622 65.278426178487 64.9142539786957 21.511183615897
152 2 1 73.4502794972495 76.9150599813474 106.659511743715 68.1459795314904
153 2 1 30.6986972707736 25.9830967919212 19.2126599722075 40.0176772459464
154 2 1 48.9463104941463 74.3166671883191 94.4199203293464 76.5601739056115
155 2 1 93.664955663169 34.2345681240818 35.1252683477753 51.0690055100367
156 2 1 89.0347714225498 107.917489676586 104.609385598132 72.2295350027298
157 2 1 88.8172895882086 112.088575121553 155.71739850392 82.2519548781208
158 2 1 98.2643566615029 104.198646984885 78.5303387818693 40.3586832708305
159 2 1 140.939818857767 78.0188776879868 81.2063481452701 68.525334485002
160 2 1 86.1936198080317 90.5323375347094 50.990491005716 50.1471294788092
161 2 1 103.870811613831 112.435654424441 58.3101416041746 33.7245926119906
162 2 1 27.2987791945808 53.345632406136 57.9089183269522 76.3952523121297
163 2 1 87.7060341970745 61.291274523674 100.049614808861 26.5530802039375
164 2 1 68.0936102773189 27.546686021281 78.1098864539664 109.117977596292
165 2 1 109.653139087334 108.376652318957 110.5475461516 78.1863689708069
166 2 1 81.8701695708016 79.8883822519127 70.9356570134467 48.0602030880919
167 2 1 90.5334512828109 46.3060080438415 57.7463692516884 72.5177387570704
168 2 1 69.4747837616312 81.0433631539362 84.2928523743015 83.3080622597221
169 2 1 84.3457020439425 34.0958457253984 23.3496696591518 63.8966218043022
170 2 1 85.1367615599717 48.31325458371 71.409002119249 22.7195826358608
171 2 1 6.0519739761238 82.6886765807442 42.9957736674827 48.6043069078819
172 2 1 153.607342017049 94.0689513605759 67.5617502165065 62.5307210431415
173 2 1 113.000428765149 98.5463804067778 47.0973452834747 68.3383901942912
174 2 1 130.149052574106 100.920206253848 126.83942217593 132.801239509274
175 2 1 10.7637884370764 22.7354098644514 58.6681058708439 102.796669810432
176 2 1 46.0462202749716 86.1150070810951 44.3087422654596 47.2714161661316
177 2 1 -27.68834561668 26.0619549334512 -16.3532520219022 22.3775474222391
178 2 1 85.2528610302936 86.632688194587 88.2098310158051 73.9979201031045
179 2 1 68.5946657752244 72.2857403694067 75.8611359593031 92.955988879473
180 2 1 58.7248829125552 74.2281143735229 38.9726075217798 69.6818957036748
181 2 1 114.83344514817 177.249338931554 174.814763104311 134.38841179859
182 2 1 109.399175453506 111.367962077231 -5.55558989059328 28.5262840223053
183 2 1 82.2298452398339 116.012197254927 .228292808400234 75.6485578629752
184 2 1 18.9327474171412 21.6670670950259 16.3945756900044 54.6818026426316
185 2 1 54.2349388585838 16.3669947396341 -1.56399601528626 -18.8892345761952
186 2 1 89.0369674256001 79.6703899767321 36.1189712407708 11.4989067346067
187 2 1 75.8202676457623 65.9635609540475 25.5056447678644 -18.1512392132912
188 2 1 116.339771648292 113.245057242618 86.3115594425365 109.167966173311
189 2 1 117.753925152085 67.7452097303132 4.85479636923036 61.3300645670528
190 2 1 76.3171444782418 51.2881910612134 70.072345943293 32.0402775976214
191 2 1 6.99109934167268 -9.68978382023725 1.3222912300926 3.70565113354709
192 2 1 87.3719506699759 71.0866781094484 77.8383221714908 140.673530786563
193 2 1 61.5221066085255 -12.2201874125806 33.7716972122356 83.5375669493848
194 2 1 59.033619312337 88.3938919099359 88.6469806667526 66.2006388023931
195 2 1 12.7705479386231 52.8599513977407 -8.98178395043158 5.67532437102307
196 2 1 80.4575996457465 80.3674100418745 64.0589629803417 71.3999743886292
197 2 1 169.161756107454 165.448882319084 69.4423743196653 32.3687225071222
198 2 1 96.5770762182269 131.627910387452 166.29019608871 98.559030242817
199 2 1 49.6756456139914 51.1601253391724 95.7385197241244 47.6375896980025
200 2 1 36.9702142332847 52.0308285598544 46.6203485187055 49.9205778435778
201 2 1 31.6712601805128 75.1577635193883 25.5645852430251 41.5771578423282
202 2 1 141.842177028419 79.951615011948 100.275739895567 34.2274701113761
203 2 1 106.278192177332 61.244325122698 22.4441542064441 60.4646195928593
204 2 1 126.846572787508 108.802011494278 97.5118381725256 59.0049523072224
205 2 1 77.2449560227551 134.640458503015 91.3277915052969 35.7743427602639
206 2 1 37.7728733789122 10.2075340055892 8.99376684705592 3.03192118717025
207 2 1 114.8922480495 95.7836293640519 14.3620213702521 45.6468962834769
208 2 1 45.1469870919599 92.8354230186556 19.5880350954832 -7.39316345806174
209 2 1 44.4203645840082 43.6352236864648 -.419805604549687 19.0093669734351
210 2 1 34.4785422004195 67.1335597030613 -7.48295412928773 54.2591610487253
211 2 1 33.7753422990983 67.4665374150396 46.7093420037872 38.8666598509216
212 2 1 36.2533327207531 16.6107822142166 4.9541023098848 -20.8942996647395
213 2 1 42.8875179824501 -2.96980949706499 20.2592671889875 -15.0653299294941
214 2 1 72.7910110012001 14.1832846117514 44.1906517560891 28.8987526058914
215 2 1 37.4981519755605 10.0209572467386 -15.9703049964524 19.8340566832605
216 2 1 55.7991254855275 87.2460127087151 112.514668101071 40.5956062724875
217 2 1 50.0767808073501 36.9527634280403 51.3514405476814 39.1215866903162
218 2 1 35.3884932946901 -16.1159306931375 -25.9845822793937 -28.8618969982463
219 2 1 42.645467117772 71.5512811866725 86.6146088838234 25.4306120300005
220 2 1 84.4571783414763 70.0196700982875 66.0406733064271 91.5530533206067
221 2 1 149.800029292066 119.263221836987 32.7206483746691 44.0768853551691
222 2 1 79.0871305005099 54.933237558365 96.4976033960482 73.9599335371546
223 2 1 49.5234682066419 102.530402155721 86.6150522890923 60.8527172169368
224 2 1 6.68341977650618 30.2420157859321 -16.1975706667556 82.7559857579158
225 2 1 67.5911504383323 -5.55283170793827 -34.1814633855533 -32.384973091358
I want to plot a line chart with the tests "pretest", "posttest", "FU_6_month" and "FU_12_month" on the x-axis and the test score values on the y-axis. I also want separate lines for males and females.
Does anyone know how to do this with the R plot function?
I tried googling it but cannot find anything useful for this type of data, since I want the different test periods on the x-axis.
Thanks.
I did
aggregate(anovadata$Pretest ~ anovadata$Group + anovadata$Gender, data = anovadata, mean)
aggregate(anovadata$Posttest ~ anovadata$Group + anovadata$Gender, data = anovadata, mean)
aggregate(anovadata$FU_6_month ~ anovadata$Group + anovadata$Gender, data = anovadata, mean)
aggregate(anovadata$FU_12_month ~ anovadata$Group + anovadata$Gender, data = anovadata, mean)
to get the means for each group. This gives:
anovadata$Group anovadata$Gender anovadata$Pretest
1 1 1 69.9
2 2 1 71.1
3 1 2 62.3
4 2 2 72.0
anovadata$Group anovadata$Gender anovadata$Posttest
1 1 1 76.7
2 2 1 66.2
3 1 2 90.6
4 2 2 57.6
anovadata$Group anovadata$Gender anovadata$FU_6_month
1 1 1 70.1
2 2 1 52.9
3 1 2 76.7
4 2 2 55.0
anovadata$Group anovadata$Gender anovadata$FU_12_month
1 1 1 71.9
2 2 1 49.9
3 1 2 59.5
4 2 2 58.3
Now I want (group 1, gender 1), (group 2, gender 1), (group 1, gender 2) and (group 2, gender 2) to be separate lines, with pretest, posttest, FU_6 and FU_12 as the time points. There should be four lines in total in the graph.
Does anybody know how to turn the aggregates into vectors or a matrix without hard-coding, in order to draw the line plot?
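A minimal sketch with base graphics, assuming anovadata has the columns Group, Gender, Pretest, Posttest, FU_6_month and FU_12_month as listed above. One aggregate() call with cbind() on the left-hand side computes all four means per group/gender combination; transposing the result makes each column one group/gender line, and matplot() draws one line per column. The eight-row anovadata here is invented purely for illustration.

```r
# Hypothetical stand-in for the real anovadata (same column names, made-up values)
anovadata <- data.frame(
  Group       = rep(c(1, 2), each = 4),
  Gender      = rep(c(1, 2), times = 4),
  Pretest     = c(70, 60, 72, 64, 71, 73, 71, 71),
  Posttest    = c(76, 90, 78, 91, 66, 58, 66, 57),
  FU_6_month  = c(70, 77, 70, 76, 53, 55, 53, 55),
  FU_12_month = c(72, 60, 72, 59, 50, 58, 50, 58)
)

# All four means per Group x Gender combination in one call
means <- aggregate(cbind(Pretest, Posttest, FU_6_month, FU_12_month) ~ Group + Gender,
                   data = anovadata, FUN = mean)

# Rows = time points, columns = group/gender combinations (one line each)
score_mat <- t(as.matrix(means[, 3:6]))
colnames(score_mat) <- paste0("Group ", means$Group, ", Gender ", means$Gender)

matplot(1:4, score_mat, type = "b", lty = 1, pch = 1:4, col = 1:4,
        xaxt = "n", xlab = "Test occasion", ylab = "Mean score")
axis(1, at = 1:4, labels = rownames(score_mat))
legend("topright", legend = colnames(score_mat), col = 1:4, lty = 1, pch = 1:4)
```

Because the column names and time labels come from the aggregate() result, nothing has to be hard-coded when the real data is used.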

How to remove rows based on distance from an average of column and max of another column

Consider this toy data frame. I would like to create a new data frame containing only the rows whose "birds" value is below the mean of "birds" and whose "wolfs" value is lower than the maximum of "wolfs" and the two highest values after it. So from this data frame I should get only the rows with userid 543, 608, 987, 225, 988 and 556.
I used these two lines of code for the first constraint but couldn't find a solution for the second constraint.
df$filt <- ifelse(df$birds < mean(df$birds), 1, 0)
df1 <- df[which(df$filt == 1), ]
How can I create the second constraint?
Here is the toy dataframe:
df <- read.table(text = "userid target birds wolfs
222 1 9 7
444 1 8 4
234 0 2 8
543 1 2 3
678 1 8 3
987 0 1 2
294 1 7 1
608 0 1 5
123 1 9 7
321 1 8 7
226 0 2 7
556 0 2 3
334 1 6 3
225 0 1 1
999 0 3 9
988 0 1 1 ",header = TRUE)
subset(df,birds < mean(birds) & wolfs < sort(unique(wolfs),decreasing=T)[3]);
## userid target birds wolfs
## 4 543 1 2 3
## 6 987 0 1 2
## 8 608 0 1 5
## 12 556 0 2 3
## 14 225 0 1 1
## 16 988 0 1 1
Here is a solution, although I may have misunderstood some of the constraints, so please check that the rows match your desired output.
avbi <- mean(df$birds)
ttw <- sort(df$wolfs, decreasing = T)[3]
df[df$birds < avbi & df$wolfs < ttw , ]
userid target birds wolfs
4 543 1 2 3
6 987 0 1 2
8 608 0 1 5
12 556 0 2 3
14 225 0 1 1
16 988 0 1 1
or with dplyr:
library(dplyr)
df %>% filter(birds < avbi & wolfs < ttw)
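The second constraint depends on whether the third-highest value is taken over distinct values (sort(unique(wolfs), ...)[3], as in the first answer) or over all values (sort(wolfs, ...)[3], as in the second). Both give 7 for this toy data because 7 appears several times, but they can differ when the maximum itself is tied. A small sketch of the difference; third_highest() is a hypothetical helper name, and the toy data is re-entered without the target column.

```r
# The toy data from the question (target column omitted; it is not used)
df <- data.frame(
  userid = c(222, 444, 234, 543, 678, 987, 294, 608, 123, 321, 226, 556, 334, 225, 999, 988),
  birds  = c(9, 8, 2, 2, 8, 1, 7, 1, 9, 8, 2, 2, 6, 1, 3, 1),
  wolfs  = c(7, 4, 8, 3, 3, 2, 1, 5, 7, 7, 7, 3, 3, 1, 9, 1)
)

# Hypothetical helper: third-highest *distinct* value
third_highest <- function(x) sort(unique(x), decreasing = TRUE)[3]

keep <- df$birds < mean(df$birds) & df$wolfs < third_highest(df$wolfs)
res  <- df[keep, ]
res$userid

# With a tied maximum the two thresholds differ
x <- c(9, 9, 8, 7, 5)
sort(x, decreasing = TRUE)[3]  # counts the tied 9s separately
third_highest(x)               # ignores the duplicate 9
```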

Indexing subgroups by sorted positions in R dataframe

I have a dataframe which contains information about several categories, and some associated variables. It is of the form:
ID category sales score
227 A 109 21
131 A 410 24
131 A 509 1
123 B 2 61
545 B 19 5
234 C 439 328
654 C 765 41
I would like to add two new columns, salesRank and scoreRank, giving each item's position within its category if the items were ordered by sales and by score (in descending order), respectively. I can solve the general case like this:
dF <- dF[order(-dF$sales),]
dF$salesRank<-seq.int(nrow(dF))
but this doesn't account for the categories and so far I've only solved this by breaking up the dataframe. What I want would result in the following:
ID category sales score salesRank scoreRank
227 A 109 21 3 2
131 A 410 24 2 1
131 A 509 1 1 3
123 B 2 61 2 1
545 B 19 5 1 2
234 C 439 328 2 1
654 C 765 41 1 2
Many thanks!
Try:
library(dplyr)
df %>%
group_by(category) %>%
mutate(salesRank = row_number(desc(sales)),
scoreRank = row_number(desc(score)))
Which gives:
#Source: local data frame [7 x 6]
#Groups: category
#
# ID category sales score salesRank scoreRank
#1 227 A 109 21 3 2
#2 131 A 410 24 2 1
#3 131 A 509 1 1 3
#4 123 B 2 61 2 1
#5 545 B 19 5 1 2
#6 234 C 439 328 2 1
#7 654 C 765 41 1 2
From the help:
row_number(): equivalent to rank(ties.method = "first")
min_rank(): equivalent to rank(ties.method = "min")
desc(): transform a vector into a format that will be sorted in descending
order.
As @thelatemail pointed out, for this particular dataset you might want to use min_rank() instead of row_number(), which accounts for ties in sales/score more appropriately:
> row_number(c(1,2,2,4))
#[1] 1 2 3 4
> min_rank(c(1,2,2,4))
#[1] 1 2 2 4
Use ave in base R with rank (the - is to reverse the rankings from low-to-high to high-to-low):
dF$salesRank <- with(dF, ave(-sales, category, FUN=rank) )
#[1] 3 2 1 2 1 2 1
dF$scoreRank <- with(dF, ave(-score, category, FUN=rank) )
#[1] 2 1 3 1 2 1 2
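Here is a base R solution with tapply:
# rank() on the negated values gives descending ranks within each category;
# unlist() returns the results category by category, so dat must already be sorted by category
salesRank <- tapply(-dat$sales, dat$category, rank)
scoreRank <- tapply(-dat$score, dat$category, rank)
cbind(dat, salesRank = unlist(salesRank), scoreRank = unlist(scoreRank))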
ID category sales score salesRank scoreRank
A1 227 A 109 21 3 2
A2 131 A 410 24 2 1
A3 131 A 509 1 1 3
B1 123 B 2 61 2 1
B2 545 B 19 5 1 2
C1 234 C 439 328 2 1
C2 654 C 765 41 1 2
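order() and rank() are easy to confuse: order() returns the permutation that sorts a vector, whereas rank() returns each element's position in that sorted order. They happen to coincide for the small categories in this question, but not in general. The sketch below uses made-up values chosen so that the two differ; order(order(x)) is shown as an equivalent of rank(x, ties.method = "first").

```r
# Made-up values where order() and rank() disagree
x <- c(10, 30, 20)
order(-x)         # indices that sort x in descending order: 2 3 1
rank(-x)          # descending rank of each element: 3 1 2
order(order(-x))  # same as rank(-x, ties.method = "first"): 3 1 2

# Grouped ranking with ave(): correct regardless of the order of values
dat <- data.frame(category = c("A", "A", "A", "B", "B"),
                  sales    = c(10, 30, 20, 5, 7))
dat$salesRank <- ave(-dat$sales, dat$category,
                     FUN = function(v) rank(v, ties.method = "first"))

# Using order() inside tapply() returns permutations, not ranks
wrong <- unlist(tapply(dat$sales, dat$category, order, decreasing = TRUE))
dat$salesRank  # 3 1 2 2 1
unname(wrong)  # 2 3 1 2 1
```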

compute a Means variable for a specific value in another variable

I would like to compute the mean age for every value from 1-7 in another variable called period.
This is what my data looks like:
work1 <- read.table(header=T, text="ID dead age gender inclusion_year diagnosis surv agrp period
87 0 25 2 2006 1 2174 1 5
396 0 19 2 2003 1 3077 1 3
446 0 23 2 2003 1 3144 1 3
497 0 19 2 2011 1 268 1 7
522 1 57 2 1999 1 3407 2 1
714 0 58 2 2003 1 3041 2 3
741 0 27 2 2004 1 2587 1 4
767 0 18 1 2008 1 1104 1 6
786 0 36 1 2005 1 2887 3 4
810 0 25 1 1998 1 3783 4 2")
This is a subset of a data set with more than 1500 observations.
This is what I'm trying to achieve:
sim <- read.table(header=T, text="Period diagnosis dead surv age
1 1 50 50000 35.5
2 1 80 70000 40.3
3 1 100 80000 32.8
4 1 120 100000 39.8
5 1 140 1200000 28.7
6 1 150 1400000 36.2
7 1 160 1600000 37.1")
In this data set I would like to group by period and diagnosis, summing all deaths (dead) and survival times in days (surv) within each period. I would also like the mean age in every period.
I have tried everything but still can't create the data set I'm striving for.
All help is appreciated!
You could try data.table
library(data.table)
as.data.table(work1)[, .(dead_sum=sum(dead),
surv_sum=sum(surv),
age_mean=mean(age)), keyby=.(period, diagnosis)]
Or dplyr
library(dplyr)
work1 %>% group_by(period, diagnosis) %>%
summarise(dead_sum=sum(dead), surv_sum=sum(surv), age_mean=mean(age))
# result
period diagnosis dead_sum surv_sum age_mean
1: 1 1 1 3407 57.00000
2: 2 1 0 3783 25.00000
3: 3 1 0 9262 33.33333
4: 4 1 0 5474 31.50000
5: 5 1 0 2174 25.00000
6: 6 1 0 1104 18.00000
7: 7 1 0 268 19.00000
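Without extra packages, the same summary can be sketched with base R's aggregate(): one call sums dead and surv, another averages age, and merge() joins the two by period and diagnosis. The ten rows below are the sample from the question, keeping only the columns the summary needs; the column name age_mean follows the answer above.

```r
# Sample rows from the question (only the columns used in the summary)
work1 <- data.frame(
  dead      = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  age       = c(25, 19, 23, 19, 57, 58, 27, 18, 36, 25),
  diagnosis = 1,
  surv      = c(2174, 3077, 3144, 268, 3407, 3041, 2587, 1104, 2887, 3783),
  period    = c(5, 3, 3, 7, 1, 3, 4, 6, 4, 2)
)

# Sum deaths and survival days per period and diagnosis
sums <- aggregate(cbind(dead, surv) ~ period + diagnosis, data = work1, FUN = sum)

# Mean age per period and diagnosis
ages <- aggregate(age ~ period + diagnosis, data = work1, FUN = mean)

# Combine both summaries into one table
res <- merge(sums, ages, by = c("period", "diagnosis"))
names(res)[names(res) == "age"] <- "age_mean"
res
```

The figures agree with the data.table/dplyr result above, for example surv = 9262 and age_mean = 33.33 for period 3.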
