How to annotate boxplots using svyboxplot in R

I am trying to figure out how to label the boxplots produced by svyboxplot() from the survey package in R.
I have tried the following:
svyboxplot(~ALCANYNO~factor(REGION), design=ihisDesign3, xlab='Region', ylab='Frequency', ylim=c(0,10), colnames=c("Northeast", "Midwest", "South", "West"));
SOLUTION: Add the following argument to the factor() call:
labels = c('Northeast', 'Midwest', 'South', 'West')
This changes the example above to the following:
svyboxplot(~ALCANYNO~factor(REGION,
                            labels = c('Northeast', 'Midwest', 'South', 'West')),
           design = ihisDesign3, xlab = 'Region', ylab = 'Frequency',
           ylim = c(0, 10))
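As a minimal sketch of why the labels argument helps (the `region_codes` vector below is made-up illustration data, not the IHIS file), `factor()` maps each code to the supplied label, and plotting functions generally use those level names on the axis:

```r
# Hypothetical region codes 1-4, standing in for the REGION column
region_codes <- c(3, 1, 4, 2, 3, 1)

# Map each code to a readable label; levels are listed explicitly
region <- factor(region_codes,
                 levels = 1:4,
                 labels = c("Northeast", "Midwest", "South", "West"))

levels(region)   # the names a boxplot would typically show on the x-axis
table(region)    # counts per labelled level
```

Listing `levels` explicitly is a design choice in this sketch: it keeps the labels aligned with the codes even if one code happens to be absent from the data.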

I am creating a dataset to illustrate:
options(width = 120)
library(survey)
library(KernSmooth)
xd1<-
"xsmoke age_p psu stratum wt8
13601 3 22 2 20 356.5600
32966 3 38 2 45 434.3562
63493 1 32 1 87 699.9987
238175 3 46 1 338 982.8075
174162 3 40 1 240 273.6313
220206 3 33 2 308 1477.1688
118133 3 68 1 159 716.3012
142859 2 23 1 194 1100.9475
115253 2 35 2 155 444.3750
61675 3 31 1 85 769.5963
189813 3 37 1 263 328.5600
226274 1 47 2 318 605.8700
41969 3 71 2 58 597.0150
167667 3 40 2 230 1030.4637
225103 3 37 2 316 349.6825
49894 3 70 2 68 517.7862
98075 3 46 2 130 1428.7225
180771 3 50 1 250 652.4188
137057 3 42 1 186 590.2100
77705 2 23 1 105 1687.2450
89106 3 48 1 118 407.6513
208178 3 50 1 290 556.5000
100403 3 52 2 133 1481.8200
221571 1 27 2 310 833.5338
10823 2 72 1 16 1807.6425
108431 3 71 2 145 945.6263
68708 1 46 1 94 1989.3775
23874 3 23 2 33 1707.8775
150634 3 19 2 206 761.1500
231232 3 42 2 326 1487.4113
184654 2 42 2 255 1715.2375
215312 3 57 1 300 483.5663
40713 2 57 2 56 2042.2762
130309 3 23 1 177 948.5625
25515 2 55 1 35 2719.7525
235612 2 83 2 333 603.3537
13755 2 36 2 20 265.1938
2441 3 33 1 4 1062.1200
157327 3 77 1 215 2010.6600
66502 3 20 2 91 1122.9725
230778 1 55 2 325 1207.3025
74805 3 54 1 101 1028.5150
166556 1 50 1 229 1546.9450
91914 1 68 1 121 428.5350
89651 3 59 2 118 143.5437
149329 3 44 2 204 1064.7725
212700 2 59 2 295 1050.1163
454 1 79 1 1 275.5700
125639 1 27 1 170 785.1037
55442 3 47 1 76 950.3312
145132 3 77 1 197 1269.2287
123069 3 24 1 167 216.1937
188301 1 55 2 260 426.6313
852 2 66 2 1 1443.4887
3582 3 81 1 6 790.8412
235423 1 44 2 333 659.4238
42175 2 40 1 59 1089.6762
57033 3 43 1 78 226.8750
177273 2 85 1 244 392.7200
218558 3 40 2 305 1680.2700
27784 2 45 1 39 280.0550
81823 3 43 1 110 965.0438
76344 3 26 1 103 1095.6012
114916 3 56 2 154 436.8838
35563 3 78 1 49 333.2875
192279 3 30 2 267 722.0312
61315 1 48 2 84 1426.5725
219903 3 43 1 308 791.5738
42612 3 25 1 60 658.1387
178488 3 33 2 246 675.1912
9031 1 27 2 14 989.4863
145092 2 64 1 197 960.1912
71885 3 53 2 97 595.4050
38137 2 75 1 53 1004.0912
140149 1 21 1 190 1870.9350
162052 3 25 1 223 892.7775
89527 2 39 2 118 518.1050
59650 3 26 2 82 432.7837
24709 2 84 1 34 453.9013
18933 3 85 1 27 582.3288
24904 3 35 2 34 1027.5287
213668 3 39 1 298 3174.1925
110509 3 30 1 149 469.8188
72462 3 63 1 98 386.2163
152596 3 19 1 209 1328.2188
17014 4 62 1 24 294.9250
33467 2 50 1 46 1601.4575
5241 3 33 1 9 1651.0988
215094 3 23 1 300 427.6313
88885 1 21 1 118 1092.2613
204868 2 60 2 285 781.2325
157415 2 31 2 215 1323.5750
71081 2 44 2 96 1059.2088
25420 3 38 1 35 530.7413
144226 1 27 1 196 1126.3112
47888 3 46 2 66 965.4050
216179 3 29 2 301 1237.6463
29172 3 68 1 41 1025.9738
168786 1 47 1 232 680.6213
94035 2 23 2 124 330.4563
170542 1 25 2 234 757.2287
160331 2 33 2 220 636.3900
124163 3 80 2 167 287.6988
71442 2 37 1 97 442.2300
80191 2 74 2 107 871.0338
199309 3 29 2 277 485.2337
91293 3 35 2 120 138.3187
219524 2 68 1 307 609.5862
119336 3 85 2 160 149.7612
31814 3 68 1 44 396.6913
54920 1 28 2 75 532.7175
161034 3 29 2 221 791.0100
177037 1 50 1 244 626.2400
119963 1 54 1 162 374.1062
107972 2 58 1 145 944.8863
22932 3 60 1 32 310.6413
54197 3 23 2 74 931.2737
209598 3 23 1 292 1078.2950
213604 1 74 2 297 588.5000
146480 3 27 1 200 212.0588
162463 3 55 2 223 1202.0925
215534 3 33 2 300 430.3938
100703 1 53 1 134 463.6200
162588 3 27 1 224 612.0250
222676 1 35 1 312 292.7000
220052 3 84 1 308 1301.4738
131382 3 36 1 178 825.9512
102117 3 28 1 137 451.4075
70362 3 52 2 95 185.2562
188757 3 22 2 261 704.3913
215878 2 37 1 301 789.9837
45820 3 18 2 64 2019.4137
84860 3 47 1 113 149.0200
110581 3 37 1 149 526.0775
207650 3 51 2 289 688.0538
40723 3 59 2 56 497.6050
169663 3 19 2 233 845.0362
191955 1 36 1 267 735.7350
213816 3 18 2 298 2275.3513
120967 3 48 2 163 1055.3238
209430 2 42 2 291 1771.0225
21235 3 21 1 30 1204.5663
131326 3 29 1 178 331.9588
19667 1 57 1 28 638.9138
74743 2 48 1 101 1208.8763
178672 3 66 2 246 338.2013
100174 3 24 2 133 1733.6275
69046 3 24 2 94 542.4863
79960 1 41 2 107 567.6363
108591 2 42 1 146 978.3775
235635 3 24 1 334 1382.9437
187426 2 54 2 259 478.2362
28728 3 39 2 40 1165.6175
205348 3 32 2 286 1082.9913
218812 3 30 1 306 308.1037
168389 3 48 2 231 593.2475
145479 1 21 1 198 864.2663
105170 2 40 1 141 1016.7862
155753 2 78 2 212 1109.0025
169399 3 28 1 233 1467.1363
55664 1 63 1 76 904.3763
74024 2 51 1 100 547.5538
85558 1 25 1 114 893.8825
142684 3 54 2 193 1203.3212
198792 1 22 1 277 1800.3325
82603 3 70 2 110 827.3763
171036 2 50 2 235 2003.9725
1616 1 42 2 2 590.5662
57042 3 45 1 78 1021.7287
45100 2 38 2 63 1807.9288
134828 2 28 1 183 715.1187
91167 3 26 2 120 480.1950
170605 3 40 2 234 507.2763
175869 3 77 1 242 386.2987
81594 2 82 2 109 580.0838
37426 1 20 2 52 1159.1613
113799 3 85 1 153 459.5450
24721 3 18 2 34 2912.7575
26297 3 45 2 36 1304.4925
57074 1 51 1 78 602.2112
185000 3 34 1 256 583.5738
94196 3 44 2 124 2344.1087
80656 3 45 2 108 1340.9713
14849 1 46 1 22 967.2525
145730 2 73 1 198 418.8037
56633 3 34 2 77 1011.5488
273 2 54 1 1 786.2138
60567 1 40 2 83 315.2925
47788 1 38 2 66 1105.9188
76943 2 53 2 103 537.7062
165014 3 34 1 227 824.3125
188444 3 22 1 261 623.2225
29043 1 35 1 41 724.9025
165578 3 25 1 228 596.0275
50702 3 43 2 69 985.9662
197621 3 39 2 275 1310.1163
26267 3 41 2 36 1030.3900
29565 1 60 2 41 920.8550
20060 3 36 2 28 157.2188
119780 2 20 1 162 863.8100"
tor <- read.table(textConnection(xd1), header = TRUE, as.is = TRUE)
# Grouping variable "xsmoke" must be a factor
tor$xsmoke <- factor(tor$xsmoke, levels = c(1, 2, 3),
                     labels = c('Current SMK', 'Former SMK', 'Never Smk'), ordered = TRUE)
is.factor(tor$xsmoke)
# Survey design object holding the design variables and the data
nhis <- svydesign(id = ~psu, strata = ~stratum, weights = ~wt8, data = tor, nest = TRUE)
MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85)
svyboxplot(age_p ~ xsmoke,
           subset(nhis, age_p >= 0),
           col = c("red", "yellow", "green"), medcol = "blue",
           varwidth = TRUE, all.outliers = TRUE,
           ylab = "Age at Interview",
           xlab = " "
)
The factor variable xsmoke is given its labels with tor$xsmoke <- factor(tor$xsmoke, levels = c(1, 2, 3), labels = c('Current SMK', 'Former SMK', 'Never Smk'), ordered = TRUE), which should give the boxes their labels.

Related

How to test for p-value with groups/filters in dplyr

My data looks like the example below (sorry if it's too long; I'm not sure what's acceptable or needed).
I have used the following code to calculate the median and IQR of the time difference (tdif) between tests, for each test number (testno):
data %>% group_by(testno) %>% filter(type == 1) %>%
  summarise(Median = median(tdif), IQR = IQR(tdif), n = n(), .groups = 'keep') -> result
I have done this for each category of 'type' (coded as 1 - 10), which gave me the added table (bottom).
My question is whether it is possible to:
1.) Do this in an easier way (without the filters, so I can do it all in one run), and
2.) Run a test for a p-value across all the groups/filters?
data <- read.table(header = TRUE, text = '
PID time tdif testno type
3 205 0 1 1
4 77 0 1 1
4 85 8 2 1
4 126 41 3 1
4 165 39 4 1
4 202 37 5 1
4 238 36 6 1
4 272 34 7 1
4 277 5 8 1
4 370 93 9 1
4 397 27 10 1
4 452 55 11 1
4 522 70 12 1
4 529 7 13 1
4 608 79 14 1
4 651 43 15 1
4 655 4 16 1
4 713 58 17 1
4 804 91 18 1
4 900 96 19 1
4 944 44 20 1
4 979 35 21 1
4 1015 36 22 1
4 1051 36 23 1
4 1077 26 24 1
4 1124 47 25 1
4 1162 38 26 1
4 1222 60 27 1
4 1334 112 28 1
4 1383 49 29 1
4 1457 74 30 1
4 1506 49 31 1
4 1590 84 32 1
4 1768 178 33 1
4 1838 70 34 1
4 1880 42 35 1
4 1915 35 36 1
4 1973 58 37 1
4 2017 44 38 1
4 2090 73 39 1
4 2314 224 40 1
4 2381 67 41 1
4 2433 52 42 1
4 2484 51 43 1
4 2694 210 44 1
4 2731 37 45 1
4 2792 61 46 1
4 2958 166 47 1
5 48 0 1 3
5 111 63 2 3
5 699 588 3 3
5 1077 378 4 3
6 -43 0 1 3
8 67 0 1 1
8 168 101 2 1
8 314 146 3 1
8 368 54 4 1
8 586 218 5 1
10 639 0 1 6
13 -454 0 1 3
13 -384 70 2 3
13 -185 199 3 3
13 193 378 4 3
13 375 182 5 3
13 564 189 6 3
13 652 88 7 3
13 669 17 8 3
13 718 49 9 3
14 704 0 1 8
15 -165 0 1 3
15 -138 27 2 3
15 1335 1473 3 3
16 168 0 1 6
18 -1329 0 1 3
18 -1177 152 2 3
18 -1071 106 3 3
18 -945 126 4 3
18 -834 111 5 3
18 -719 115 6 3
18 -631 88 7 3
18 -497 134 8 3
18 -376 121 9 3
18 -193 183 10 3
18 -78 115 11 3
18 -13 65 12 3
18 100 113 13 3
18 196 96 14 3
18 552 356 15 3
18 650 98 16 3
18 737 87 17 3
18 804 67 18 3
18 902 98 19 3
18 983 81 20 3
18 1119 136 21 3
19 802 0 1 1
19 1593 791 2 1
26 314 0 1 8
26 389 75 2 8
26 597 208 3 8
33 639 0 1 6
')
Added table (values differ from example data, because this isn't the complete set).
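One possible way to approach both parts of the question is sketched below. It is only a sketch under assumptions: the `toy` data frame is made up (it just reuses the question's column names), and the Kruskal-Wallis test is one candidate test, not necessarily the right one here, because repeated measurements from the same PID are not independent.

```r
library(dplyr)

# Toy data with the same column names as the question (values are made up)
toy <- data.frame(
  testno = c(1, 1, 2, 2, 1, 1, 2, 2),
  type   = c(1, 1, 1, 1, 3, 3, 3, 3),
  tdif   = c(0, 0, 8, 101, 0, 0, 63, 70)
)

# Group by both columns instead of filtering one type at a time
result <- toy %>%
  group_by(type, testno) %>%
  summarise(Median = median(tdif), IQR = IQR(tdif), n = n(), .groups = "drop")
result

# One candidate test: Kruskal-Wallis comparison of tdif across types
kw <- kruskal.test(tdif ~ factor(type), data = toy)
kw$p.value
```

Grouping by `type` and `testno` together yields one row per combination, so the per-type filter is no longer needed.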

Issue with R Shiny App --> Interactive Survival Plots

I am new to R Shiny and I am trying to build a Shiny web app that produces a survival plot with two reactive inputs. The first input is the study (total=4). The second input is the group (total=19) whose survival curve is to be compared. Ideally, these two inputs would let me see, within a particular study, how group X's survival curve compares with the survival curve of all other groups.
Here is a sample of my data:
ObsNum UniqueID Time Censored Group Group2 Study
1 523B95015 27 1 1 523 1
2 523B95014 27 1 1 523 1
3 523B85051 27 1 1 523 1
4 523B95009 27 1 1 523 1
5 523B85048 27 1 1 523 1
6 523B85050 27 1 1 523 1
7 675B89002 27 1 8 675 1
8 556B95006 27 1 12 556 1
9 556B85030 27 1 12 556 1
10 556B85044 27 1 12 556 1
11 556B95035 27 1 12 556 1
12 556B95000 27 1 12 556 1
13 556B95004 27 1 12 556 1
14 556B95002 27 1 12 556 1
15 756Y81172 27 1 17 756 1
16 741B95022 27 1 99 741 1
17 741B95020 27 1 99 741 1
18 619B92008 28 1 7 619 1
19 552B89003 28 1 10 552 1
20 101B94097 28 1 99 101 1
21 101B94098 28 1 99 101 1
22 618C84582 29 1 23 618 1
23 618C84580 29 1 23 618 1
24 618C84581 29 1 23 618 1
25 730B90003 29 1 99 730 1
26 646B42015 34 1 4 646 1
27 671B60009 35 1 17 671 1
28 612C80247 35 1 21 612 1
29 700C64500 35 1 99 791 1
30 101B89052 40 1 99 101 1
31 101B85047 40 1 99 101 1
32 101B95068 40 1 99 101 1
33 538B70011 51 1 10 538 1
34 689C85036 57 1 1 689 1
35 689C95450 57 1 1 689 1
36 556B85050 62 1 12 556 1
37 636B80005 62 1 23 636 1
38 636B92002 62 1 23 636 1
39 630B30005 70 1 2 630 1
40 642B80021 78 1 4 642 1
41 101B79173 86 1 99 101 1
42 523B81007 106 0 1 523 1
43 620B88003 106 0 2 620 1
44 642B40002 106 1 4 642 1
45 642B40001 106 1 4 642 1
46 581B81002 106 0 5 581 1
47 581B81001 106 0 5 581 1
48 573B95000 106 0 8 573 1
49 589B80015 106 0 15 589 1
50 589B80016 106 0 15 589 1
51 657B50013 106 0 15 657 1
52 657B43004 106 0 15 657 1
53 459B85085 106 0 21 459 1
54 459Y81171 106 0 21 459 1
55 101B75006 106 0 99 101 1
56 101SC8023 106 0 99 101 1
57 101B85122 106 0 99 101 1
58 101B55116 106 0 99 101 1
59 101B79086 106 0 99 101 1
60 101B95066 106 0 99 101 1
61 730B97005 106 0 99 730 1
62 741B85045 106 0 99 741 1
63 777B96001 106 0 99 777 1
64 556B85077 1 1 12 556 2
65 636B92003 1 1 23 636 2
66 101B94137 1 1 99 101 2
67 700C64500 5 1 99 791 2
68 463Y91171 6 1 20 463 2
69 618C84319 6 1 23 618 2
70 776C93046 6 1 99 776 2
71 556B95042 7 1 12 556 2
72 556B95043 7 1 12 556 2
73 556B97000 7 1 12 556 2
74 549B80069 7 1 17 549 2
75 573B95000 22 1 8 573 2
76 580B90024 22 1 16 580 2
77 523B81007 28 1 1 523 2
78 520B60012 32 1 16 520 2
79 520B70011 32 1 16 520 2
80 586B70008 33 1 16 586 2
81 586B80006 33 1 16 586 2
82 586B80011 33 1 16 586 2
83 586B80015 33 1 16 586 2
84 657B43004 34 1 15 657 2
85 636B99009 35 1 23 636 2
86 691B68018 36 1 22 691 2
87 657B50013 41 1 15 657 2
88 741B95031 42 1 99 741 2
89 620B88003 46 0 2 620 2
90 620B90008 46 0 2 620 2
91 581B81001 46 0 5 581 2
92 581B81002 46 0 5 581 2
93 552B99002 46 0 10 552 2
94 459B85085 46 0 21 459 2
95 459B95055 46 0 21 459 2
96 101B75006 46 0 99 101 2
97 101B55060 46 0 99 101 2
98 101B79086 46 0 99 101 2
99 101B79058 46 0 99 101 2
100 101B85122 46 0 99 101 2
101 101B89115 46 0 99 101 2
102 101B85047 46 0 99 101 2
103 101B94123 46 0 99 101 2
104 101B95091 46 0 99 101 2
105 101B95038 46 0 99 101 2
106 101D98001 46 0 99 101 2
107 730B97005 46 0 99 730 2
108 741B85045 46 0 99 741 2
Here is my code for the Shiny App:
library(shiny)
library(ggplot2)
library(survival)
library(survminer)
library(dplyr)
attach(tdata)
studychoices=unique(tdata$Study)
groupchoices=unique(tdata$Group)
# Define UI
ui <- fluidPage(
titlePanel("Survival Data"),
selectInput(inputId = "studyselector",label="Select a Study:", choices=studychoices),
selectInput(inputId = "groupselector",label="Select a Group:", choices=groupchoices),
plotOutput("p1")
)
# Define server logic
server <- function(input, output) {
filter=reactive({
filteredData=tdata[tdata$Study==input$studyselector,]
return(filteredData)
})
output$p1=renderPlot({
fit=survfit(Surv(Time,Censored)~input$groupselector,data=filter())
ggsurvplot(fit,data=filter(),pval=TRUE,xlim=c(0,max(Time)+1),
title=paste("Study","INSERT HERE STUDY #", "Survival Plot for Group","INSERT HERE GROUP #"),
xlab="Time (Days)",
ggtheme=theme(plot.title=element_text(hjust=0.5)))
})
}
# Run the application
shinyApp(ui = ui, server = server)
I have the following two questions:
1.) When I run the app, I get an error in an external window that reads "Error: variable lengths differ (found for 'input$groupselector')". There are no NAs in this data, and I specified that the data to be used is the filter() dataset based on the Study selection, so I'm not sure why this error is popping up.
2.) How can I dynamically change the Study # and the Group # in the title? I understand how to do that with a normal R function, but I'm a little lost with the Shiny setup.
Any help would be appreciated! Thank you!
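Edited solution
This should now work. The right-hand side of the survfit formula needs a column of the data, whereas input$groupselector is a single value, which is why the variable lengths differ. I have edited the Group column to be a binary in/out indicator inside your reactive data frame, which should colour the lines appropriately: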
library(tidyverse)
library(survival)
library(survminer)
library(shiny)
# Assumes tdata is already loaded, as in the question
studychoices <- unique(tdata$Study)
groupchoices <- unique(tdata$Group)
ui <- fluidPage(
  titlePanel("Survival Data"),
  selectInput(inputId = "studyselector", label = "Select a Study:", choices = studychoices),
  selectInput(inputId = "groupselector", label = "Select a Group:", choices = groupchoices),
  plotOutput("p1")
)
# Define server logic
server <- function(input, output) {
  filter <- reactive({
    filteredData <- tdata[tdata$Study == input$studyselector, ]
    # Recode Group: the selected group keeps its value, everything else becomes "Others"
    filteredData['Group'] <- ifelse(filteredData$Group == input$groupselector,
                                    input$groupselector,
                                    "Others")
    return(filteredData)
  })
  output$p1 <- renderPlot({
    fit <- survfit(Surv(Time, Censored) ~ Group, data = filter()) # `Group` as variable to stratify by?
    ggsurvplot(fit, data = filter(), pval = TRUE, xlim = c(0, max(filter()$Time) + 1),
               title = paste("Study",
                             input$studyselector, # paste these bits straight in
                             "Survival Plot for Group",
                             input$groupselector), # here too
               xlab = "Time (Days)",
               ggtheme = theme(plot.title = element_text(hjust = 0.5)))
  })
}
# Run the application
shinyApp(ui = ui, server = server)
Do let me know in comments if there are any mistakes or further ideas/questions!
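The recoding step in the reactive can be tried outside Shiny. The sketch below assumes a hypothetical helper named `recode_group` and made-up group codes; it only illustrates the in/out recoding, not the full app.

```r
# Hypothetical helper: keep the selected group, collapse the rest to "Others"
recode_group <- function(group, selected) {
  ifelse(group == selected, as.character(selected), "Others")
}

# Made-up group codes in the style of the question's Group column
groups <- c(1, 1, 8, 12, 99, 12)

# selectInput() returns its value as a character string, so "12" is used here
recode_group(groups, "12")
```

Because the result is a column with one entry per row, it can sit on the right-hand side of the `survfit` formula, unlike the single value held in `input$groupselector`.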

Error in x[j] : invalid subscript type 'list' while using subset in R

I have a problem while trying to subset my data frame. Here is the code I'm using to import the data file and subset it:
fiber_val<-read.csv(file.choose(), header=TRUE, dec=",", check.names=FALSE,stringsAsFactors=FALSE)
y<-14
z<-16
fiber_val[, y:z] <- sapply(fiber_val[, y:z], as.numeric)
fiber_val$sg<-(fiber_val$airdryweight/1.077)/fiber_val$waterweight
fiber_val<-subset(fiber_val, select = c(id,sample,standtreedisk,density,sg))
After running the last line, I get this error:
Error in x[j] : invalid subscript type 'list'
Here is part of the data set that I'm using:
id stand tree disk species region standtreedisk nirblock sample barktopith pithtobark length sections ringssection airdryweight waterweight density
1 160 7 10 131 6 160x7x10 749 16907 4 2 52 5 2 0.6489 1.3245 0.48992
2 160 7 10 131 6 160x7x10 749 16905 2 4 52 5 3 0.6062 1.2206 0.49664
3 160 7 12 131 6 160x7x12 750 16915 2 3 43 4 2 0.6438 1.3279 0.48483
4 160 7 13 131 6 160x7x13 750 16919 2 2 30 3 3 0.5816 1.4101 0.41245
5 161 17 12 131 6 161x17x12 760 17166 4 2 50 5 1 0.5702 1.3952 0.40869
6 161 17 12 131 6 161x17x12 760 17167 5 1 50 5 1 0.5454 1.3307 0.40986
7 161 17 12 131 6 161x17x12 760 17163 1 5 50 5 1 0.6947 1.5702 0.44243
8 161 17 13 131 6 161x17x13 760 17170 3 1 32 3 2 0.4357 1.2244 0.35585
9 26 9 7 131 4 26x9x7 140 3883 8 1 82 8 2 0.4595 1.3503 0.34029
10 161 17 13 131 6 161x17x13 760 17169 2 2 32 3 1 0.484 1.2843 0.37686
11 136 50 1 131 6 136x50x1 579 12482 9 1 96 9 2 0.5392
12 137 54 5 131 4 137x54x5 586 12636 4 4 73 7 1 0.4692
13 137 54 5 131 4 137x54x5 586 12638 6 2 73 7 2 0.4555
14 137 54 6 131 4 137x54x6 586 12640 1 6 65 6 4 0.6449
15 137 54 1 131 4 137x54x1 585 12606 5 5 90 9 1 0.7035
16 137 54 1 131 4 137x54x1 585 12610 9 1 90 9 2 0.4963
17 137 54 1 131 4 137x54x1 585 12609 8 2 90 9 2 0.5193
18 137 54 1 131 4 137x54x1 585 12603 2 8 90 9 3 0.6427
19 137 54 6 131 4 137x54x6 586 12644 5 2 65 6 1 0.4654
20 137 54 4 131 4 137x54x4 585 12632 7 1 76 7 2 0.4974
21 137 54 5 131 4 137x54x5 586 12639 7 1 73 7 2
22 137 5 3 131 4 137x5x3 582 12557 2 7 82 8 3
23 137 74 3 131 4 137x74x3 588 12679 3 5 71 7 2
24 137 74 3 131 4 137x74x3 588 12683 7 1 71 7 1
25 137 5 3 131 4 137x5x3 582 12562 7 2 82 8 1
26 137 74 5 131 4 137x74x5 588 12695 6 1 61 6 2
27 138 108 1 131 4 138x108x1 594 12830 6 5 104 10 1
28 138 108 1 131 4 138x108x1 594 12831 7 4 104 10 2
29 138 108 1 131 4 138x108x1 594 12832 8 3 104 10 2
30 138 66 1 131 4 138x66x1 592 12781 5 4 87 8 2
Any help would be appreciated :)
You appear to have a problem with the type of object you have. You can try converting it to a data frame using unlist(), as.data.frame(), etc.
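One common way to get this exact error, which may or may not be what happened here, is for a name inside `select = c(...)` not to match any column. `subset()` then looks the name up outside the data frame, and names such as `sample` or `density` resolve to R functions, so `c()` builds a list instead of a vector of column positions. A minimal sketch with a made-up data frame (the column names `Sample` and `sg` are illustrative only):

```r
# Made-up data: note the column is "Sample", not "sample"
df <- data.frame(id = 1:3,
                 Sample = c(16907, 16905, 16915),
                 sg = c(0.45, 0.46, 0.45))

# `sample` is not a column here, so it resolves to the function base::sample
res <- tryCatch(subset(df, select = c(id, sample, sg)),
                error = function(e) e)
inherits(res, "error")   # the selection fails
conditionMessage(res)    # inspect the message

# Checking the real names and selecting by character vector is more explicit
names(df)
wanted <- c("id", "Sample", "sg")
df[, intersect(wanted, names(df))]
```

Since the question reads the file with `check.names=FALSE`, comparing `names(fiber_val)` against the names used in `select` would be a reasonable first check.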

Plotting Stacked bar plot of large dataset and setting bar limits of plot in r

I am trying to plot a stacked bar plot of my dataset, data.csv, shown below. Apologies for posting such a large dataset.
degree Freq.x Freq.y
1 2978 0
2 1779 33
3 1390 22
4 919 19
5 787 16
6 676 22
7 578 16
8 513 23
9 460 11
10 376 17
11 345 13
12 292 17
13 291 14
14 286 8
15 269 15
16 216 10
17 192 18
18 183 10
19 184 7
20 190 10
21 157 9
22 155 14
23 127 9
24 151 15
25 119 10
26 102 6
27 113 7
28 99 6
29 98 4
30 103 7
31 94 11
32 79 7
33 76 5
34 73 8
35 76 11
36 59 5
37 58 5
38 61 5
39 63 7
40 68 9
41 63 4
42 57 8
43 45 6
44 45 4
45 39 3
46 40 6
47 42 6
48 30 3
49 36 7
50 28 5
51 33 1
52 32 6
53 34 5
54 43 4
55 35 6
56 29 2
57 27 4
58 35 6
59 25 4
60 24 4
61 32 4
62 15 2
63 24 5
64 25 4
65 23 9
66 25 7
67 27 7
68 22 7
69 23 7
70 17 6
71 19 4
72 19 4
73 19 2
74 18 2
75 19 6
76 12 3
77 25 6
78 23 9
79 20 4
80 17 6
81 15 5
82 13 4
83 14 4
84 13 5
85 15 1
86 13 1
87 12 5
88 14 5
89 16 4
90 12 3
91 10 3
92 12 5
93 12 7
94 10 0
95 11 4
96 12 3
97 6 5
98 20 7
99 5 3
100 8 3
101 11 2
102 11 3
103 8 0
104 14 4
105 15 2
106 7 0
107 7 1
108 6 0
109 9 2
110 10 1
111 8 1
112 6 1
113 8 1
114 8 2
115 7 4
116 3 1
117 4 2
118 5 0
120 5 0
121 1 0
122 9 2
123 7 3
124 4 1
125 3 0
126 3 2
127 7 3
128 5 3
129 3 1
130 3 0
131 5 1
132 5 2
133 2 0
134 5 2
135 10 1
136 5 2
137 3 1
138 7 2
139 6 2
140 3 1
141 5 1
142 9 4
143 3 1
144 2 1
145 4 2
146 2 0
147 2 2
148 3 1
149 1 0
150 1 0
151 2 1
152 3 1
153 3 1
154 2 1
155 3 1
156 6 4
157 4 2
158 3 1
159 4 1
160 2 1
161 2 1
163 3 1
164 5 2
165 2 1
166 3 0
167 4 4
168 2 1
169 1 0
170 2 2
171 3 2
172 1 0
173 4 3
174 3 2
175 1 1
177 3 3
178 3 2
179 1 0
180 3 1
181 2 0
182 1 1
183 3 1
184 2 2
185 2 1
186 3 1
187 2 1
188 1 1
191 1 0
192 1 0
193 1 0
195 4 2
196 2 2
197 4 1
198 1 0
199 2 1
200 1 0
201 2 2
202 1 0
204 2 0
206 3 1
207 1 0
208 1 0
209 2 1
211 1 1
212 2 1
213 2 2
214 1 1
215 1 1
218 2 2
220 2 1
222 3 1
223 2 2
224 1 1
225 1 1
226 1 1
227 2 1
228 2 1
230 3 1
231 1 1
233 2 2
234 3 1
235 1 1
236 1 1
237 1 1
239 2 2
241 1 1
242 1 0
243 1 0
244 1 1
245 1 1
246 1 1
247 2 0
250 2 1
251 3 2
252 1 1
253 2 2
254 1 1
256 1 1
258 2 1
260 1 1
262 1 1
264 1 0
267 1 1
268 1 1
269 1 1
270 1 1
271 2 1
272 1 1
275 2 1
276 1 1
277 2 2
278 1 0
280 1 1
283 1 0
285 2 1
290 1 1
291 1 1
294 1 1
299 1 1
301 4 3
303 1 1
304 2 0
305 1 1
307 1 1
311 1 1
314 2 1
317 1 1
318 1 1
319 1 1
321 1 1
323 1 1
329 2 1
330 1 1
333 1 0
334 1 1
335 1 1
337 1 1
339 1 1
342 1 1
343 1 0
350 2 2
356 1 1
368 1 0
370 2 2
377 1 1
390 1 1
392 1 1
394 1 1
406 1 1
408 1 1
409 1 1
419 1 1
424 1 1
427 1 1
451 1 1
459 1 1
461 1 1
462 1 0
478 1 1
479 1 0
488 1 1
530 1 1
550 1 1
553 1 1
568 1 0
594 1 1
608 1 1
622 1 1
625 1 1
626 1 1
628 1 1
646 1 1
648 1 1
652 1 1
655 1 1
656 1 1
660 1 0
688 1 1
723 1 1
732 1 1
740 1 1
761 1 1
769 1 0
845 1 1
865 1 1
1063 1 1
1105 1 1
1242 1 1
1737 1 1
1989 1 1
2456 1 1
9588 1 1
I want to plot a stacked bar plot that compares the Freq.x and Freq.y fields by degree, with degree on the x-axis and frequency on the y-axis. I tried ggplot2 in R and got a stacked bar plot, but because my dataset is large I want to combine bars into ranges. The code I tried is as follows.
d_ap <- read.csv("data.csv")
l_nw <- data.frame(d_ap)
library(reshape2)
final_df <- melt(l_nw, id.vars = "degree")
library(ggplot2)
ggplot(final_df, aes(x = degree, y = value, fill = variable)) +
  geom_bar(stat = "identity")
This outputs a bar plot, but I want to set bar limits on the x-axis. In my desired output, degrees 1 to 10 are plotted as individual bars, and degrees 11 to 9588 are grouped into bars: 11 to 20, then 20 to 30, then 30 to 50, and 50 to 9588. How can I set bar limits on the x-axis like this, so that I can better visualize my stacked bar plot?
Is that what you want?
final_df$cdegree <- cut(final_df$degree, c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 9590))
library(ggplot2)
ggplot(final_df, aes(x = cdegree, y = value, fill = variable)) +
  geom_bar(stat = "identity")
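To see what `cut()` does with these break points, here is a small base R sketch on a handful of degree values. The `labels` vector is an illustrative addition, not part of the answer above.

```r
# A few degree values from the question's range
degree <- c(1, 2, 10, 11, 20, 21, 30, 45, 50, 51, 9588)

# Same break points as the answer above; intervals are right-closed by default
breaks <- c(0:10, 20, 30, 50, 9590)
binned <- cut(degree, breaks = breaks)

# Each degree next to the bin it falls into
data.frame(degree, bin = binned)

# Optional, purely illustrative: friendlier labels for the bins
labels <- c(as.character(1:10), "11-20", "21-30", "31-50", "51+")
binned_named <- cut(degree, breaks = breaks, labels = labels)
table(binned_named)
```

Because the intervals are right-closed, a degree of exactly 20 lands in the 11 to 20 bin rather than the next one.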

Binning continuous data to stack histogram in R

I have a dataset that looks like this:
USER.ID avgfrequency orders group
1 3 3.7821782 101 3
2 7 14.7500000 8 3
3 9 13.4761905 21 3
4 13 5.1967213 61 3
5 16 6.7812500 64 3
6 26 41.7500000 4 2
7 49 13.6666667 3 2
8 50 7.0000000 1 1
9 51 1.0000000 1 1
10 52 17.7500000 4 2
11 69 4.5000000 2 1
12 75 9.9500000 20 3
13 91 84.2000000 5 2
14 98 8.0185185 54 3
15 138 14.2000000 5 2
16 139 34.7500000 4 2
17 149 7.6666667 21 3
18 155 35.3333333 9 3
19 167 24.0000000 1 1
20 170 7.3529412 34 3
21 171 4.4210526 76 3
22 174 4.5000000 2 1
23 175 6.5781250 64 3
24 176 19.2857143 21 3
25 177 10.4864865 37 3
26 178 28.0000000 15 3
27 180 4.8461538 39 3
28 183 25.5000000 2 1
29 184 13.0000000 1 1
30 210 32.0000000 1 1
31 215 13.4615385 13 3
32 220 11.3611111 36 3
33 223 26.2500000 8 3
34 224 40.5000000 8 3
35 230 15.4000000 10 3
36 232 14.6666667 3 2
37 234 34.5833333 12 3
38 238 138.5000000 2 1
39 240 7.0000000 3 2
40 243 35.0000000 3 2
41 246 6.7500000 4 2
42 247 8.5000000 50 3
43 258 17.6666667 3 2
44 283 23.5000000 2 1
45 295 19.5625000 16 3
46 300 81.6666667 3 2
47 311 34.4166667 12 3
48 338 64.0000000 1 1
49 342 113.3333333 3 2
50 343 197.0000000 1 1
51 347 3.6923077 13 3
52 350 4.6666667 3 2
53 360 177.5000000 2 1
54 361 39.0000000 10 3
55 362 1.4000000 5 2
56 365 15.0000000 24 3
57 366 59.2000000 5 2
58 367 5.0000000 4 2
59 369 27.9285714 14 3
60 372 63.6666667 3 2
61 375 9.3750000 8 3
62 377 13.3225806 31 3
63 380 169.5000000 2 1
64 383 23.2352941 17 3
65 391 0.0000000 1 1
I want to split avgfrequency into bins of width 10 and plot them on the x-axis, with the count of USER.ID on the y-axis as a histogram. Within each bar I want to show the count of USER.ID for each group in a different color, so each bar would be split into up to three colors, one per group.
Is it possible to do this in R?
It is possible. See below:
library(ggplot2) # load the ggplot2 graphics package
data <- data.frame(data) # make the dataset an R data frame object
head(data, 2) # just showing part of the data here
# USER.ID avgfrequency orders group
# 3 3.782178 101 3
# 7 14.750000 8 3
# build graph
ggplot(data, aes(x = avgfrequency, fill = factor(group))) +
  geom_histogram(breaks = seq(0, 200, by = 10), colour = 'black') +
  xlab("Average Frequency") + ylab("Count of USER.ID") +
  scale_fill_manual("Group", breaks = c("1", "2", "3"), values = c("grey30", "grey50", "grey70")) +
  theme_bw()
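To check the numbers behind such a plot, the counts per bin and group can be tabulated in base R. This is a sketch on made-up rows that reuse the question's column names; `include.lowest = TRUE` is a choice made here so that a value of exactly 0 falls in the first bin.

```r
# Made-up rows with the question's column names
toy <- data.frame(
  avgfrequency = c(3.78, 14.75, 13.48, 41.75, 7.00, 1.00, 17.75),
  group        = c(3, 3, 3, 2, 1, 1, 2)
)

# Width-10 bins from 0 to 200
toy$bin <- cut(toy$avgfrequency, breaks = seq(0, 200, by = 10),
               include.lowest = TRUE)

# Rows = bins, columns = groups
counts <- table(toy$bin, toy$group)

# Show only the bins that contain at least one user
counts[rowSums(counts) > 0, ]
```

Each row of the table corresponds to one bar, and the entries in that row are the segment heights for the individual groups.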
