Mean imputation method: graphical representation in R

Good morning,
I'm dealing with the graphical representation of missing data imputed via the mean imputation method. This is the dataset I'm working on:
> data2
age fev ht sex smoke
1 9 1.708 145 1 1
2 8 1.724 171 1 1
3 7 1.720 138 1 1
4 9 1.558 135 2 1
5 9 1.895 145 2 1
6 8 2.336 155 1 1
7 6 1.919 147 1 1
8 6 1.415 142 1 1
9 8 1.987 149 1 1
10 9 1.942 152 1 1
11 6 1.602 135 1 1
12 8 1.735 137 2 1
13 8 2.193 149 1 1
14 8 2.118 154 2 1
15 8 2.258 147 2 1
16 7 1.932 135 2 1
17 5 1.472 127 2 1
18 6 1.878 NA 1 1
19 9 2.352 150 2 1
20 9 2.604 156 2 1
21 5 1.400 124 1 1
22 5 1.256 133 1 1
23 4 0.839 122 1 1
24 7 2.578 159 2 1
25 9 2.988 165 1 1
26 3 1.404 131 2 1
27 9 2.348 152 2 1
28 5 1.755 132 2 1
29 8 2.980 152 1 1
30 9 2.100 152 1 1
31 5 1.282 124 1 1
32 9 3.000 166 2 1
33 8 2.673 152 1 1
34 7 2.093 146 1 1
35 5 1.612 132 1 1
36 8 2.175 150 1 1
37 9 2.725 150 2 1
38 8 2.071 140 2 1
39 8 1.547 145 2 1
40 8 2.004 145 2 1
41 9 3.135 152 1 1
42 8 2.420 150 2 1
43 5 1.776 130 2 1
44 8 1.931 145 1 1
45 5 1.343 127 1 1
46 9 2.076 145 1 1
47 7 1.624 137 2 1
48 8 1.344 133 1 1
49 6 1.650 140 2 1
50 8 2.732 154 2 1
51 5 2.017 138 2 1
52 9 2.797 156 1 1
53 9 NA 157 2 1
54 8 1.703 138 2 1
55 6 1.634 137 2 1
56 9 2.570 145 2 1
57 9 3.016 159 1 1
58 7 2.419 152 1 1
59 4 1.569 127 1 1
60 8 1.698 146 1 1
61 8 2.123 152 2 1
62 8 2.481 152 1 1
63 6 1.481 130 1 1
64 4 1.577 124 1 1
65 8 1.940 150 2 1
66 6 1.747 146 2 1
67 9 2.069 147 2 1
68 7 1.631 141 1 1
69 5 1.536 132 1 1
70 9 2.560 154 1 1
71 8 1.962 145 2 1
72 8 2.531 147 1 1
73 9 2.715 152 2 1
74 9 2.457 150 2 1
75 9 2.090 151 2 1
76 7 1.789 142 2 1
77 5 1.858 135 2 1
78 5 1.452 130 2 1
79 9 NA 175 2 1
80 6 1.719 135 1 1
81 7 2.111 145 1 1
82 6 1.695 135 1 1
83 8 2.211 160 2 1
84 8 1.794 138 2 1
85 7 1.917 147 1 1
86 8 2.144 NA 1 1
87 7 1.253 132 2 1
88 9 2.659 156 2 1
89 5 1.580 133 2 1
90 9 2.126 157 2 1
91 9 3.029 156 1 1
92 9 2.964 164 2 1
93 7 1.611 NA 2 1
94 8 2.215 152 1 1
95 8 2.388 152 1 1
96 9 2.196 155 2 1
97 9 1.751 147 2 1
98 9 2.165 156 2 1
99 7 1.682 140 2 1
100 8 1.523 140 2 1
101 8 1.292 132 1 1
102 7 1.649 137 2 1
103 9 2.588 160 2 1
104 4 0.796 119 2 1
105 9 2.574 154 1 1
106 6 1.979 142 2 1
107 8 2.354 149 2 1
108 6 1.718 140 2 1
109 7 1.742 149 1 1
110 7 1.603 130 1 1
111 8 2.639 151 1 1
112 7 1.829 137 1 1
113 7 2.084 147 2 1
114 7 2.220 147 2 1
115 7 1.473 133 1 1
116 8 2.341 154 1 1
117 7 1.698 138 1 1
118 5 1.196 118 1 1
119 8 1.872 144 1 1
120 7 2.219 140 2 1
121 9 2.420 145 2 1
122 7 1.827 138 1 1
123 7 1.461 137 1 1
124 6 1.338 NA 2 1
125 8 2.090 145 2 1
126 8 1.697 150 1 1
127 8 1.562 140 2 1
128 9 2.040 141 1 1
129 7 1.609 131 1 1
130 8 2.458 155 1 1
131 9 2.650 161 2 1
132 8 1.429 146 2 1
133 8 1.675 135 2 1
134 9 1.947 144 1 1
135 8 2.069 137 2 1
136 6 1.572 132 2 1
137 6 1.348 135 2 1
138 8 2.288 156 1 1
139 9 1.773 149 2 1
140 5 0.791 132 1 1
141 7 1.905 147 2 1
142 9 2.463 155 1 1
143 6 1.431 130 2 1
144 9 2.631 157 1 1
145 9 3.114 164 2 1
146 9 2.135 149 2 1
147 6 1.527 133 2 1
148 8 2.293 147 1 1
149 9 3.042 168 1 1
150 8 2.927 161 2 1
151 8 2.665 163 1 1
152 9 2.301 149 2 1
153 9 2.460 163 2 1
154 9 2.592 154 1 1
155 7 1.750 140 1 1
156 8 1.759 135 2 1
157 6 1.536 122 2 1
158 9 2.259 149 1 1
159 9 2.048 164 1 1
160 9 2.571 154 2 1
161 7 2.046 142 2 1
162 8 1.780 149 1 1
163 5 1.552 137 1 1
164 8 1.953 147 1 1
165 9 2.893 164 2 1
166 6 1.713 128 2 1
167 9 2.851 152 1 1
168 6 1.624 131 2 1
169 8 2.631 150 2 1
170 5 1.819 135 2 1
171 7 1.658 135 2 1
172 7 2.158 136 2 1
173 4 1.789 132 2 1
174 9 3.004 163 1 1
175 8 2.503 160 2 1
176 9 1.933 147 1 1
177 9 2.091 149 1 1
178 9 2.316 NA 1 1
179 5 1.704 NA 1 1
180 9 1.606 146 1 1
181 7 1.165 119 2 1
182 6 2.102 141 1 1
183 9 2.320 145 1 1
184 9 2.230 155 2 1
185 9 1.716 141 2 1
186 7 1.790 136 2 1
187 5 1.146 127 1 1
188 8 2.187 156 1 1
189 9 2.717 156 2 1
190 7 1.796 140 2 1
191 9 1.953 147 2 2
192 8 1.335 144 1 1
193 9 2.119 145 2 1
194 6 1.666 132 2 1
195 6 1.826 133 2 1
196 8 2.709 159 1 1
197 9 2.871 165 2 1
198 5 1.092 127 1 1
199 6 2.262 146 2 1
200 6 2.104 144 2 1
I've used the following code to plot the observed versus the imputed data and, beside it, a scatterplot of Y = "fev" against X = "age".
1. FIRST GRAPH
library(mice)      # provides imp (a mids object) and mdc()
library(lattice)
par(mfrow = c(1, 2))
breaks <- seq(-20, 200, 10)
nudge <- 1
lwd <- 1.5
x <- matrix(c(breaks - nudge, breaks + nudge), ncol = 2, nrow = 46)
obs <- data2[, "fev"]
mis <- imp$imp$fev[, 1]
fobs <- c(hist(obs, breaks, plot = FALSE)$fev, 0)
fmis <- c(hist(mis, breaks, plot = FALSE)$fev, 0)
y <- matrix(c(fobs, fmis), ncol = 2, nrow = 46)
matplot(x, y, type = "s",
        col = c(mdc(4), mdc(5)), lwd = 2, lty = 1,
        xlim = c(0, 150), ylim = c(0, 40), yaxs = "i",
        xlab = "fev",
        ylab = "Frequency")
box()
2. SECOND GRAPH
tp <- xyplot(imp, fev ~ age, na.groups = ici(imp),
             ylab = "fev", xlab = "age",
             cex = 0.75, lex = lwd, pch = 19,
             ylim = c(-20, 180), xlim = c(0, 350))
print(tp, newpage = FALSE, position = c(0.48, 0.08, 1, 0.92))
Although the code runs without errors, I'm not sure about its validity: I'm supposed to get graphical results like the reference figures I attached, whereas I keep getting plots like the ones in my attached output.
What do you think? Any clue as to how to get the code right?
Thanks for helping.

You didn't post the complete code, so it isn't exactly clear what the imp object you are trying to plot looks like. The data you posted is named data2, but I don't know at which point in your code it is used.
As for reasons why your code might not show anything: fev ranges roughly from 0 to 3, and age roughly from 3 to 9.
But the axis limits in the first plot are
xlim = c(0, 150), ylim = c(0, 40)
and in the second plot
ylim = c(-20, 180), xlim = c(0, 350)
which means the actual data you want to plot sits in quite a small area of the plot (as you can see).
You have to adjust your axis limits to the range of your data.
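For example, here is a minimal sketch of the first plot with limits matched to the data, assuming imp is a mice::mids object built from data2 (for mean imputation, something like imp <- mice(data2, method = "mean", m = 1)). One more thing worth flagging: hist() returns its bar heights in a $counts component, not $fev, so fobs and fmis in your code come back empty:
library(mice)
breaks <- seq(0, 3.5, 0.25)   # spans the observed fev range (about 0.8 to 3.1)
nudge <- 0.01
obs <- data2[, "fev"]
mis <- imp$imp$fev[, 1]
fobs <- c(hist(obs, breaks, plot = FALSE)$counts, 0)   # $counts, not $fev
fmis <- c(hist(mis, breaks, plot = FALSE)$counts, 0)
x <- matrix(c(breaks - nudge, breaks + nudge), ncol = 2)
y <- matrix(c(fobs, fmis), ncol = 2)
matplot(x, y, type = "s", col = c(mdc(4), mdc(5)), lwd = 2, lty = 1,
        xlim = range(breaks), ylim = c(0, max(y) + 5), yaxs = "i",
        xlab = "fev", ylab = "Frequency")
box()
The same applies to the scatterplot: pick xlim to cover age (about 3 to 9) and ylim to cover fev (about 0 to 3.5), e.g. xlim = c(2, 10), ylim = c(0, 3.5).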

Related

How can I sum the combined responses of multiple levels of one factor

Here is a piece of my dataset:
Plot Rate Rep Plant Tuber Weight
1 101 1 1 1 1 179.4
2 101 1 1 1 2 99.4
3 101 1 1 1 3 72.4
4 101 1 1 1 4 111.5
5 101 1 1 1 5 44.9
6 101 1 1 1 6 55.3
7 101 1 1 1 7 12.6
8 101 1 1 1 8 106.7
9 101 1 1 1 9 96.7
10 101 1 1 1 10 52.5
11 101 1 1 2 1 151.1
12 101 1 1 2 2 171.7
13 101 1 1 2 3 93.0
14 101 1 1 2 4 82.4
15 101 1 1 2 5 143.9
16 101 1 1 2 6 115.6
17 101 1 1 2 7 141.3
18 101 1 1 2 8 72.6
19 101 1 1 2 9 97.2
20 101 1 1 2 10 146.8
21 101 1 1 2 11 104.0
22 101 1 1 2 12 121.6
23 101 1 1 3 1 150.9
24 101 1 1 3 2 47.1
25 101 1 1 3 3 59.6
26 101 1 1 3 4 94.2
27 101 1 1 3 5 167.4
28 101 1 1 3 6 55.2
29 101 1 1 3 7 21.8
30 101 1 1 3 8 79.6
31 101 1 1 3 9 92.2
32 101 1 1 3 10 78.0
33 101 1 1 3 11 61.8
34 101 1 1 3 12 9.5
35 101 1 1 3 13 2.7
36 101 1 1 3 14 3.8
37 101 1 1 3 15 1.1
38 103 2 1 1 1 24.8
39 103 2 1 1 2 70.1
40 103 2 1 1 3 90.7
41 103 2 1 1 4 75.1
42 103 2 1 1 5 97.9
43 103 2 1 1 6 44.6
44 103 2 1 1 7 65.1
45 103 2 1 1 8 74.5
46 103 2 1 1 9 6.2
47 103 2 1 1 10 7.4
48 103 2 1 1 11 46.1
49 103 2 1 1 12 43.8
50 103 2 1 1 13 61.8
51 103 2 1 1 14 88.2
52 103 2 1 1 15 64.4
53 103 2 1 1 16 35.0
54 103 2 1 1 17 6.0
55 103 2 1 1 18 6.4
56 103 2 1 1 19 55.2
57 103 2 1 1 20 12.1
58 103 2 1 1 21 2.2
59 103 2 1 1 22 4.6
60 103 2 1 1 23 2.3
61 103 2 1 2 1 76.2
62 103 2 1 2 2 63.2
63 103 2 1 2 3 85.3
64 103 2 1 2 4 1.3
65 103 2 1 2 5 59.7
66 103 2 1 2 6 94.9
67 103 2 1 2 7 1.2
68 103 2 1 3 1 103.1
69 103 2 1 3 2 1.6
70 103 2 1 3 3 52.9
71 103 2 1 3 4 101.7
72 103 2 1 3 5 68.5
73 103 2 1 3 6 74.1
74 103 2 1 3 7 106.0
75 103 2 1 3 8 62.7
76 103 2 1 3 9 65.0
77 103 2 1 3 10 47.5
78 103 2 1 3 11 1.2
79 103 2 1 3 12 5.3
80 103 2 1 3 13 8.3
81 103 2 1 3 14 5.5
82 103 2 1 3 15 2.5
83 104 3 1 1 1 150.3
84 104 3 1 1 2 218.8
85 104 3 1 1 3 149.4
86 104 3 1 1 4 144.7
87 104 3 1 1 5 112.5
88 104 3 1 1 6 144.5
89 104 3 1 1 7 139.0
90 104 3 1 1 8 156.9
91 104 3 1 1 9 120.2
92 104 3 1 1 10 46.3
93 104 3 1 1 11 43.4
94 104 3 1 1 12 81.3
95 104 3 1 1 13 7.1
96 104 3 1 1 14 33.3
97 104 3 1 1 15 31.2
98 104 3 1 1 16 12.8
99 104 3 1 1 17 1.5
100 104 3 1 1 18 116.9
101 104 3 1 1 19 52.5
102 104 3 1 2 1 11.5
103 104 3 1 2 2 130.0
104 104 3 1 2 3 NA
105 104 3 1 2 4 125.9
106 104 3 1 2 5 103.6
107 104 3 1 2 6 43.0
108 104 3 1 2 7 79.0
109 104 3 1 2 8 79.4
110 104 3 1 2 9 51.1
111 104 3 1 2 10 1.9
112 104 3 1 2 11 4.5
113 104 3 1 2 12 17.2
114 104 3 1 2 13 58.2
115 104 3 1 2 14 71.6
116 104 3 1 2 15 80.4
117 104 3 1 2 16 44.1
118 104 3 1 2 17 62.4
119 104 3 1 2 18 52.9
120 104 3 1 2 19 28.0
121 104 3 1 2 20 89.4
122 104 3 1 2 21 62.7
123 104 3 1 2 22 55.5
124 104 3 1 2 23 0.8
125 104 3 1 2 24 22.5
126 104 3 1 2 25 2.5
127 104 3 1 2 26 1.6
128 104 3 1 2 27 46.6
129 104 3 1 3 1 191.9
130 104 3 1 3 2 153.2
131 104 3 1 3 3 137.0
132 104 3 1 3 4 90.8
133 104 3 1 3 5 152.8
134 104 3 1 3 6 69.2
135 104 3 1 3 7 11.6
136 104 3 1 3 8 58.7
137 104 3 1 3 9 53.2
138 104 3 1 3 10 68.4
139 104 3 1 3 11 46.0
140 104 3 1 3 12 75.6
141 104 3 1 3 13 68.9
142 104 3 1 3 14 94.8
143 104 3 1 3 15 89.7
This covers one of the four reps in my overall dataset. I am looking for a way to get the total collective tuber weight of the three collected plants, averaged across the four reps, for each rate. To be clear, I want the weights of all tubers from all 3 plants of each Rate/Rep combination added into one final value, and then I want the average of that final value across the 4 unique Reps (repetitions) that make up each of the 4 Rates.
When calculating the first two reps of rate 1 by hand, I get the following:
Rate Rep TotalResponse
1 1 1 3197.5
2 1 2 2367.4
To be as clear as possible, the "TotalResponse" column shows the total sum of the "Weight" responses for each tuber in all three plants of that unique Rate/Rep combination.
When taking the average of those two responses, I get this:
Rate AvgResponse
1 1 2782.4
In reality, I need to do the first step for all 4 reps and not just 2 of them, and then I need the final table to have this average response for each of the 4 rates.
Rate AvgResponse
1 1 2782.4
2 2 xxxx
3 3 xxxx
4 4 xxxx
Thanks in advance for any help.
We may need to group by 'Rate' and 'Rep', get the sum of 'Weight', then group by 'Rate' and return the mean of 'TotalResponse':
library(dplyr)
df1 %>%
  group_by(Rate, Rep) %>%
  summarise(TotalResponse = sum(Weight, na.rm = TRUE),
            .groups = 'drop_last') %>%
  group_by(Rate) %>%
  summarise(AvgResponse = mean(TotalResponse))
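For reference, a base R sketch of the same two-step aggregation (assuming df1 is the posted data frame):
tot <- aggregate(Weight ~ Rate + Rep, data = df1, FUN = sum)   # rows with NA Weight are dropped by the formula interface
names(tot)[3] <- "TotalResponse"
aggregate(TotalResponse ~ Rate, data = tot, FUN = mean)        # average TotalResponse per Rate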

How to test for p-value with groups/filters in dplyr

My data looks like the example below (sorry if it's too long; I'm not sure how much is acceptable/needed).
I have used the following code to calculate the median and IQR of each time difference (tdif) between tests (testno):
data %>%
  group_by(testno) %>%
  filter(type == 1) %>%
  summarise(Median = median(tdif), IQR = IQR(tdif), n = n(),
            .groups = 'keep') -> result
I have done this for each category of 'type' (coded as 1 - 10), which brought me to the added table (bottom).
My question is whether it is possible to:
1. do this in an easier way (without the filters, so I can do it all in one run), and
2. run a test for a p-value across all the groups/filters?
data <- read.table(header=T, text= '
PID time tdif testno type
3 205 0 1 1
4 77 0 1 1
4 85 8 2 1
4 126 41 3 1
4 165 39 4 1
4 202 37 5 1
4 238 36 6 1
4 272 34 7 1
4 277 5 8 1
4 370 93 9 1
4 397 27 10 1
4 452 55 11 1
4 522 70 12 1
4 529 7 13 1
4 608 79 14 1
4 651 43 15 1
4 655 4 16 1
4 713 58 17 1
4 804 91 18 1
4 900 96 19 1
4 944 44 20 1
4 979 35 21 1
4 1015 36 22 1
4 1051 36 23 1
4 1077 26 24 1
4 1124 47 25 1
4 1162 38 26 1
4 1222 60 27 1
4 1334 112 28 1
4 1383 49 29 1
4 1457 74 30 1
4 1506 49 31 1
4 1590 84 32 1
4 1768 178 33 1
4 1838 70 34 1
4 1880 42 35 1
4 1915 35 36 1
4 1973 58 37 1
4 2017 44 38 1
4 2090 73 39 1
4 2314 224 40 1
4 2381 67 41 1
4 2433 52 42 1
4 2484 51 43 1
4 2694 210 44 1
4 2731 37 45 1
4 2792 61 46 1
4 2958 166 47 1
5 48 0 1 3
5 111 63 2 3
5 699 588 3 3
5 1077 378 4 3
6 -43 0 1 3
8 67 0 1 1
8 168 101 2 1
8 314 146 3 1
8 368 54 4 1
8 586 218 5 1
10 639 0 1 6
13 -454 0 1 3
13 -384 70 2 3
13 -185 199 3 3
13 193 378 4 3
13 375 182 5 3
13 564 189 6 3
13 652 88 7 3
13 669 17 8 3
13 718 49 9 3
14 704 0 1 8
15 -165 0 1 3
15 -138 27 2 3
15 1335 1473 3 3
16 168 0 1 6
18 -1329 0 1 3
18 -1177 152 2 3
18 -1071 106 3 3
18 -945 126 4 3
18 -834 111 5 3
18 -719 115 6 3
18 -631 88 7 3
18 -497 134 8 3
18 -376 121 9 3
18 -193 183 10 3
18 -78 115 11 3
18 -13 65 12 3
18 100 113 13 3
18 196 96 14 3
18 552 356 15 3
18 650 98 16 3
18 737 87 17 3
18 804 67 18 3
18 902 98 19 3
18 983 81 20 3
18 1119 136 21 3
19 802 0 1 1
19 1593 791 2 1
26 314 0 1 8
26 389 75 2 8
26 597 208 3 8
33 639 0 1 6
Added table (values differ from example data, because this isn't the complete set).
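For what it's worth, a minimal sketch of the single-run version: adding type to the grouping removes the need for per-type filters, and for an overall p-value a Kruskal-Wallis test of tdif across type is one option (which test is appropriate depends on the hypothesis you want to test):
library(dplyr)
data %>%
  group_by(type, testno) %>%
  summarise(Median = median(tdif), IQR = IQR(tdif), n = n(),
            .groups = "drop") -> result
# nonparametric comparison of tdif across the 'type' groups
kruskal.test(tdif ~ factor(type), data = data)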

Error in x[j] : invalid subscript type 'list' while using subset in R

I have a problem while trying to subset my data frame. Here is the code I'm using to import the data file and subset it:
fiber_val <- read.csv(file.choose(), header = TRUE, dec = ",",
                      check.names = FALSE, stringsAsFactors = FALSE)
y <- 14
z <- 16
fiber_val[, y:z] <- sapply(fiber_val[, y:z], as.numeric)
fiber_val$sg <- (fiber_val$airdryweight / 1.077) / fiber_val$waterweight
fiber_val <- subset(fiber_val, select = c(id, sample, standtreedisk, density, sg))
After running the last line, it yells at me:
Error in x[j] : invalid subscript type 'list'
and here's part of the dataset that I'm using:
id stand tree disk species region standtreedisk nirblock sample barktopith pithtobark length sections ringssection airdryweight waterweight density
1 160 7 10 131 6 160x7x10 749 16907 4 2 52 5 2 0.6489 1.3245 0.48992
2 160 7 10 131 6 160x7x10 749 16905 2 4 52 5 3 0.6062 1.2206 0.49664
3 160 7 12 131 6 160x7x12 750 16915 2 3 43 4 2 0.6438 1.3279 0.48483
4 160 7 13 131 6 160x7x13 750 16919 2 2 30 3 3 0.5816 1.4101 0.41245
5 161 17 12 131 6 161x17x12 760 17166 4 2 50 5 1 0.5702 1.3952 0.40869
6 161 17 12 131 6 161x17x12 760 17167 5 1 50 5 1 0.5454 1.3307 0.40986
7 161 17 12 131 6 161x17x12 760 17163 1 5 50 5 1 0.6947 1.5702 0.44243
8 161 17 13 131 6 161x17x13 760 17170 3 1 32 3 2 0.4357 1.2244 0.35585
9 26 9 7 131 4 26x9x7 140 3883 8 1 82 8 2 0.4595 1.3503 0.34029
10 161 17 13 131 6 161x17x13 760 17169 2 2 32 3 1 0.484 1.2843 0.37686
11 136 50 1 131 6 136x50x1 579 12482 9 1 96 9 2 0.5392
12 137 54 5 131 4 137x54x5 586 12636 4 4 73 7 1 0.4692
13 137 54 5 131 4 137x54x5 586 12638 6 2 73 7 2 0.4555
14 137 54 6 131 4 137x54x6 586 12640 1 6 65 6 4 0.6449
15 137 54 1 131 4 137x54x1 585 12606 5 5 90 9 1 0.7035
16 137 54 1 131 4 137x54x1 585 12610 9 1 90 9 2 0.4963
17 137 54 1 131 4 137x54x1 585 12609 8 2 90 9 2 0.5193
18 137 54 1 131 4 137x54x1 585 12603 2 8 90 9 3 0.6427
19 137 54 6 131 4 137x54x6 586 12644 5 2 65 6 1 0.4654
20 137 54 4 131 4 137x54x4 585 12632 7 1 76 7 2 0.4974
21 137 54 5 131 4 137x54x5 586 12639 7 1 73 7 2
22 137 5 3 131 4 137x5x3 582 12557 2 7 82 8 3
23 137 74 3 131 4 137x74x3 588 12679 3 5 71 7 2
24 137 74 3 131 4 137x74x3 588 12683 7 1 71 7 1
25 137 5 3 131 4 137x5x3 582 12562 7 2 82 8 1
26 137 74 5 131 4 137x74x5 588 12695 6 1 61 6 2
27 138 108 1 131 4 138x108x1 594 12830 6 5 104 10 1
28 138 108 1 131 4 138x108x1 594 12831 7 4 104 10 2
29 138 108 1 131 4 138x108x1 594 12832 8 3 104 10 2
30 138 66 1 131 4 138x66x1 592 12781 5 4 87 8 2
any help would be appreciated :)
You appear to have a problem with the type of the object you're subsetting. You can try converting it to a plain data frame using as.data.frame(), unlist() on a problem column, etc.
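A hedged sketch of that, plus a way to select the columns that sidesteps subset()'s non-standard evaluation (assuming fiber_val was read with read.csv as posted):
str(fiber_val)                            # first check what the object and its columns actually are
fiber_val <- as.data.frame(fiber_val)     # make sure it is a plain data frame
keep <- c("id", "sample", "standtreedisk", "density", "sg")
fiber_val <- fiber_val[, keep]            # select columns by name with `[`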

Plotting Stacked bar plot of large dataset and setting bar limits of plot in r

I am trying to plot a stacked bar plot of my dataset (data.csv), which is shown below. Apologies for posting a large dataset.
degree Freq.x Freq.y
1 2978 0
2 1779 33
3 1390 22
4 919 19
5 787 16
6 676 22
7 578 16
8 513 23
9 460 11
10 376 17
11 345 13
12 292 17
13 291 14
14 286 8
15 269 15
16 216 10
17 192 18
18 183 10
19 184 7
20 190 10
21 157 9
22 155 14
23 127 9
24 151 15
25 119 10
26 102 6
27 113 7
28 99 6
29 98 4
30 103 7
31 94 11
32 79 7
33 76 5
34 73 8
35 76 11
36 59 5
37 58 5
38 61 5
39 63 7
40 68 9
41 63 4
42 57 8
43 45 6
44 45 4
45 39 3
46 40 6
47 42 6
48 30 3
49 36 7
50 28 5
51 33 1
52 32 6
53 34 5
54 43 4
55 35 6
56 29 2
57 27 4
58 35 6
59 25 4
60 24 4
61 32 4
62 15 2
63 24 5
64 25 4
65 23 9
66 25 7
67 27 7
68 22 7
69 23 7
70 17 6
71 19 4
72 19 4
73 19 2
74 18 2
75 19 6
76 12 3
77 25 6
78 23 9
79 20 4
80 17 6
81 15 5
82 13 4
83 14 4
84 13 5
85 15 1
86 13 1
87 12 5
88 14 5
89 16 4
90 12 3
91 10 3
92 12 5
93 12 7
94 10 0
95 11 4
96 12 3
97 6 5
98 20 7
99 5 3
100 8 3
101 11 2
102 11 3
103 8 0
104 14 4
105 15 2
106 7 0
107 7 1
108 6 0
109 9 2
110 10 1
111 8 1
112 6 1
113 8 1
114 8 2
115 7 4
116 3 1
117 4 2
118 5 0
120 5 0
121 1 0
122 9 2
123 7 3
124 4 1
125 3 0
126 3 2
127 7 3
128 5 3
129 3 1
130 3 0
131 5 1
132 5 2
133 2 0
134 5 2
135 10 1
136 5 2
137 3 1
138 7 2
139 6 2
140 3 1
141 5 1
142 9 4
143 3 1
144 2 1
145 4 2
146 2 0
147 2 2
148 3 1
149 1 0
150 1 0
151 2 1
152 3 1
153 3 1
154 2 1
155 3 1
156 6 4
157 4 2
158 3 1
159 4 1
160 2 1
161 2 1
163 3 1
164 5 2
165 2 1
166 3 0
167 4 4
168 2 1
169 1 0
170 2 2
171 3 2
172 1 0
173 4 3
174 3 2
175 1 1
177 3 3
178 3 2
179 1 0
180 3 1
181 2 0
182 1 1
183 3 1
184 2 2
185 2 1
186 3 1
187 2 1
188 1 1
191 1 0
192 1 0
193 1 0
195 4 2
196 2 2
197 4 1
198 1 0
199 2 1
200 1 0
201 2 2
202 1 0
204 2 0
206 3 1
207 1 0
208 1 0
209 2 1
211 1 1
212 2 1
213 2 2
214 1 1
215 1 1
218 2 2
220 2 1
222 3 1
223 2 2
224 1 1
225 1 1
226 1 1
227 2 1
228 2 1
230 3 1
231 1 1
233 2 2
234 3 1
235 1 1
236 1 1
237 1 1
239 2 2
241 1 1
242 1 0
243 1 0
244 1 1
245 1 1
246 1 1
247 2 0
250 2 1
251 3 2
252 1 1
253 2 2
254 1 1
256 1 1
258 2 1
260 1 1
262 1 1
264 1 0
267 1 1
268 1 1
269 1 1
270 1 1
271 2 1
272 1 1
275 2 1
276 1 1
277 2 2
278 1 0
280 1 1
283 1 0
285 2 1
290 1 1
291 1 1
294 1 1
299 1 1
301 4 3
303 1 1
304 2 0
305 1 1
307 1 1
311 1 1
314 2 1
317 1 1
318 1 1
319 1 1
321 1 1
323 1 1
329 2 1
330 1 1
333 1 0
334 1 1
335 1 1
337 1 1
339 1 1
342 1 1
343 1 0
350 2 2
356 1 1
368 1 0
370 2 2
377 1 1
390 1 1
392 1 1
394 1 1
406 1 1
408 1 1
409 1 1
419 1 1
424 1 1
427 1 1
451 1 1
459 1 1
461 1 1
462 1 0
478 1 1
479 1 0
488 1 1
530 1 1
550 1 1
553 1 1
568 1 0
594 1 1
608 1 1
622 1 1
625 1 1
626 1 1
628 1 1
646 1 1
648 1 1
652 1 1
655 1 1
656 1 1
660 1 0
688 1 1
723 1 1
732 1 1
740 1 1
761 1 1
769 1 0
845 1 1
865 1 1
1063 1 1
1105 1 1
1242 1 1
1737 1 1
1989 1 1
2456 1 1
9588 1 1
I want to plot a stacked barplot comparing the degree counts in the Freq.x and Freq.y fields: degree on the x axis and frequency on the y axis. I tried ggplot2 in R and plotted a stacked bar plot, but my dataset is large, so I want to combine bars into bins. The code I tried is as follows.
d_ap <- read.csv("data.csv")
l_nw <- data.frame(d_ap)
library(reshape2)
final_df <- melt(l_nw, id.var = "degree")   # the posted column is 'degree' (lower case)
library(ggplot2)
ggplot(final_df, aes(x = degree, y = value, fill = variable)) +
  geom_bar(stat = "identity")
This outputs a barplot, but I want to set bin limits on the x axis: in my desired plot, degrees 1 to 10 get individual bars, and degrees from 11 to 9588 are clubbed into bins like 11 to 20, then 20 to 30, then 30 to 50, and 50 to 9588. How can I set bin limits on the x axis like this, so I can better visualize my stacked bar plot?
Is that what you want?
final_df$cdegree <- cut(final_df$degree,
                        c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 9590))
library(ggplot2)
ggplot(final_df, aes(x = cdegree, y = value, fill = variable)) +
  geom_bar(stat = "identity")
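If the default interval labels such as (10,20] are hard to read, cut() also accepts a labels argument; a sketch reusing the same breaks:
brks <- c(0:10, 20, 30, 50, 9590)
labs <- c(as.character(1:10), "11-20", "21-30", "31-50", "51+")
final_df$cdegree <- cut(final_df$degree, breaks = brks, labels = labs)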

How to annotate boxplots using svyboxplot library in R

I am trying to figure out how to label the boxplots that appear after I use the svyboxplot function (from the survey package) in R.
I have tried the following:
svyboxplot(~ALCANYNO~factor(REGION), design = ihisDesign3,
           xlab = 'Region', ylab = 'Frequency', ylim = c(0, 10),
           colnames = c("Northeast", "Midwest", "South", "West"))
SOLUTION: Add the following to factor:
labels = c('Northeast', 'Midwest', 'South', 'West')
This changes the example above to the following:
svyboxplot(~ALCANYNO~factor(REGION,
                            labels = c('Northeast', 'Midwest', 'South', 'West')),
           design = ihisDesign3, xlab = 'Region', ylab = 'Frequency',
           ylim = c(0, 10))
I am creating a dataset to demonstrate:
options(width = 120)
library(survey)
library(KernSmooth)
xd1<-
"xsmoke age_p psu stratum wt8
13601 3 22 2 20 356.5600
32966 3 38 2 45 434.3562
63493 1 32 1 87 699.9987
238175 3 46 1 338 982.8075
174162 3 40 1 240 273.6313
220206 3 33 2 308 1477.1688
118133 3 68 1 159 716.3012
142859 2 23 1 194 1100.9475
115253 2 35 2 155 444.3750
61675 3 31 1 85 769.5963
189813 3 37 1 263 328.5600
226274 1 47 2 318 605.8700
41969 3 71 2 58 597.0150
167667 3 40 2 230 1030.4637
225103 3 37 2 316 349.6825
49894 3 70 2 68 517.7862
98075 3 46 2 130 1428.7225
180771 3 50 1 250 652.4188
137057 3 42 1 186 590.2100
77705 2 23 1 105 1687.2450
89106 3 48 1 118 407.6513
208178 3 50 1 290 556.5000
100403 3 52 2 133 1481.8200
221571 1 27 2 310 833.5338
10823 2 72 1 16 1807.6425
108431 3 71 2 145 945.6263
68708 1 46 1 94 1989.3775
23874 3 23 2 33 1707.8775
150634 3 19 2 206 761.1500
231232 3 42 2 326 1487.4113
184654 2 42 2 255 1715.2375
215312 3 57 1 300 483.5663
40713 2 57 2 56 2042.2762
130309 3 23 1 177 948.5625
25515 2 55 1 35 2719.7525
235612 2 83 2 333 603.3537
13755 2 36 2 20 265.1938
2441 3 33 1 4 1062.1200
157327 3 77 1 215 2010.6600
66502 3 20 2 91 1122.9725
230778 1 55 2 325 1207.3025
74805 3 54 1 101 1028.5150
166556 1 50 1 229 1546.9450
91914 1 68 1 121 428.5350
89651 3 59 2 118 143.5437
149329 3 44 2 204 1064.7725
212700 2 59 2 295 1050.1163
454 1 79 1 1 275.5700
125639 1 27 1 170 785.1037
55442 3 47 1 76 950.3312
145132 3 77 1 197 1269.2287
123069 3 24 1 167 216.1937
188301 1 55 2 260 426.6313
852 2 66 2 1 1443.4887
3582 3 81 1 6 790.8412
235423 1 44 2 333 659.4238
42175 2 40 1 59 1089.6762
57033 3 43 1 78 226.8750
177273 2 85 1 244 392.7200
218558 3 40 2 305 1680.2700
27784 2 45 1 39 280.0550
81823 3 43 1 110 965.0438
76344 3 26 1 103 1095.6012
114916 3 56 2 154 436.8838
35563 3 78 1 49 333.2875
192279 3 30 2 267 722.0312
61315 1 48 2 84 1426.5725
219903 3 43 1 308 791.5738
42612 3 25 1 60 658.1387
178488 3 33 2 246 675.1912
9031 1 27 2 14 989.4863
145092 2 64 1 197 960.1912
71885 3 53 2 97 595.4050
38137 2 75 1 53 1004.0912
140149 1 21 1 190 1870.9350
162052 3 25 1 223 892.7775
89527 2 39 2 118 518.1050
59650 3 26 2 82 432.7837
24709 2 84 1 34 453.9013
18933 3 85 1 27 582.3288
24904 3 35 2 34 1027.5287
213668 3 39 1 298 3174.1925
110509 3 30 1 149 469.8188
72462 3 63 1 98 386.2163
152596 3 19 1 209 1328.2188
17014 4 62 1 24 294.9250
33467 2 50 1 46 1601.4575
5241 3 33 1 9 1651.0988
215094 3 23 1 300 427.6313
88885 1 21 1 118 1092.2613
204868 2 60 2 285 781.2325
157415 2 31 2 215 1323.5750
71081 2 44 2 96 1059.2088
25420 3 38 1 35 530.7413
144226 1 27 1 196 1126.3112
47888 3 46 2 66 965.4050
216179 3 29 2 301 1237.6463
29172 3 68 1 41 1025.9738
168786 1 47 1 232 680.6213
94035 2 23 2 124 330.4563
170542 1 25 2 234 757.2287
160331 2 33 2 220 636.3900
124163 3 80 2 167 287.6988
71442 2 37 1 97 442.2300
80191 2 74 2 107 871.0338
199309 3 29 2 277 485.2337
91293 3 35 2 120 138.3187
219524 2 68 1 307 609.5862
119336 3 85 2 160 149.7612
31814 3 68 1 44 396.6913
54920 1 28 2 75 532.7175
161034 3 29 2 221 791.0100
177037 1 50 1 244 626.2400
119963 1 54 1 162 374.1062
107972 2 58 1 145 944.8863
22932 3 60 1 32 310.6413
54197 3 23 2 74 931.2737
209598 3 23 1 292 1078.2950
213604 1 74 2 297 588.5000
146480 3 27 1 200 212.0588
162463 3 55 2 223 1202.0925
215534 3 33 2 300 430.3938
100703 1 53 1 134 463.6200
162588 3 27 1 224 612.0250
222676 1 35 1 312 292.7000
220052 3 84 1 308 1301.4738
131382 3 36 1 178 825.9512
102117 3 28 1 137 451.4075
70362 3 52 2 95 185.2562
188757 3 22 2 261 704.3913
215878 2 37 1 301 789.9837
45820 3 18 2 64 2019.4137
84860 3 47 1 113 149.0200
110581 3 37 1 149 526.0775
207650 3 51 2 289 688.0538
40723 3 59 2 56 497.6050
169663 3 19 2 233 845.0362
191955 1 36 1 267 735.7350
213816 3 18 2 298 2275.3513
120967 3 48 2 163 1055.3238
209430 2 42 2 291 1771.0225
21235 3 21 1 30 1204.5663
131326 3 29 1 178 331.9588
19667 1 57 1 28 638.9138
74743 2 48 1 101 1208.8763
178672 3 66 2 246 338.2013
100174 3 24 2 133 1733.6275
69046 3 24 2 94 542.4863
79960 1 41 2 107 567.6363
108591 2 42 1 146 978.3775
235635 3 24 1 334 1382.9437
187426 2 54 2 259 478.2362
28728 3 39 2 40 1165.6175
205348 3 32 2 286 1082.9913
218812 3 30 1 306 308.1037
168389 3 48 2 231 593.2475
145479 1 21 1 198 864.2663
105170 2 40 1 141 1016.7862
155753 2 78 2 212 1109.0025
169399 3 28 1 233 1467.1363
55664 1 63 1 76 904.3763
74024 2 51 1 100 547.5538
85558 1 25 1 114 893.8825
142684 3 54 2 193 1203.3212
198792 1 22 1 277 1800.3325
82603 3 70 2 110 827.3763
171036 2 50 2 235 2003.9725
1616 1 42 2 2 590.5662
57042 3 45 1 78 1021.7287
45100 2 38 2 63 1807.9288
134828 2 28 1 183 715.1187
91167 3 26 2 120 480.1950
170605 3 40 2 234 507.2763
175869 3 77 1 242 386.2987
81594 2 82 2 109 580.0838
37426 1 20 2 52 1159.1613
113799 3 85 1 153 459.5450
24721 3 18 2 34 2912.7575
26297 3 45 2 36 1304.4925
57074 1 51 1 78 602.2112
185000 3 34 1 256 583.5738
94196 3 44 2 124 2344.1087
80656 3 45 2 108 1340.9713
14849 1 46 1 22 967.2525
145730 2 73 1 198 418.8037
56633 3 34 2 77 1011.5488
273 2 54 1 1 786.2138
60567 1 40 2 83 315.2925
47788 1 38 2 66 1105.9188
76943 2 53 2 103 537.7062
165014 3 34 1 227 824.3125
188444 3 22 1 261 623.2225
29043 1 35 1 41 724.9025
165578 3 25 1 228 596.0275
50702 3 43 2 69 985.9662
197621 3 39 2 275 1310.1163
26267 3 41 2 36 1030.3900
29565 1 60 2 41 920.8550
20060 3 36 2 28 157.2188
119780 2 20 1 162 863.8100"
tor <- read.table(textConnection(xd1), header = TRUE, as.is = TRUE)
# Grouping variable "xsmoke" must be a factor
tor$xsmoke <- factor(tor$xsmoke, levels = c(1, 2, 3),
                     labels = c('Current SMK', 'Former SMK', 'Never Smk'),
                     ordered = TRUE)
is.factor(tor$xsmoke)
# Object with survey design variables and data
nhis <- svydesign(id = ~psu, strata = ~stratum, weights = ~wt8,
                  data = tor, nest = TRUE)
MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85)
svyboxplot(age_p ~ xsmoke,
           subset(nhis, age_p >= 0),
           col = c("red", "yellow", "green"), medcol = "blue",
           varwidth = TRUE, all.outliers = TRUE,
           ylab = "Age at Interview",
           xlab = " ")
The factor variable xsmoke is coded as tor$xsmoke <- factor(tor$xsmoke, levels = c(1, 2, 3), labels = c('Current SMK', 'Former SMK', 'Never Smk'), ordered = TRUE), and those labels are what appear under the boxes.
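As a quick check, the labels that will be drawn under the boxes are just the factor levels, which follows directly from the factor() call above:
levels(tor$xsmoke)
# [1] "Current SMK" "Former SMK"  "Never Smk"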
