Creating a bar graph with 2 variables in ggplot2 - R
I am trying to create a bar graph showing the counts of a variable that I grouped into quantile bins from a table. I want this bar graph to be segmented by gender (a variable in the data).
So it should show each prejudice bin and how often it occurs in the data, segmented by gender.
# note: 5 break points define 4 bins (quartiles); use length = 6 for true quintiles
prejiducequint = quantile(dta.sub2$WHPREJUDICE6R, seq(0, 1, length = 5))
prejiducecut = cut(dta.sub2$WHPREJUDICE6R, breaks = prejiducequint)
p = ggplot(dta.sub2, aes(prejiducecut, fill = FEMALE6)) +
  geom_bar(position = "dodge")
p
Here is the data:
WHPREJUDICE6R prejiducecut FEMALE6
1 -0.005 (-0.0125,0] 0
2 0.075 (0.0275,1] 0
3 0.000 (-0.0125,0] 0
4 0.000 (-0.0125,0] 1
5 -0.020 (-0.44,-0.0125] 1
6 0.130 (0.0275,1] 1
7 0.090 (0.0275,1] 0
8 0.230 (0.0275,1] 0
9 0.100 (0.0275,1] 1
10 0.870 (0.0275,1] 1
11 -0.130 (-0.44,-0.0125] 1
12 -0.010 (-0.0125,0] 1
13 0.005 (0,0.0275] 0
14 -0.010 (-0.0125,0] 1
15 -0.060 (-0.44,-0.0125] 1
16 0.000 (-0.0125,0] 1
17 -0.010 (-0.0125,0] 1
18 0.000 (-0.0125,0] 1
19 -0.075 (-0.44,-0.0125] 1
20 0.005 (0,0.0275] 1
21 -0.010 (-0.0125,0] 0
22 -0.060 (-0.44,-0.0125] 1
23 0.500 (0.0275,1] 0
24 0.020 (0,0.0275] 0
25 0.135 (0.0275,1] 0
26 -0.055 (-0.44,-0.0125] 1
27 -0.440 <NA> 0
28 0.000 (-0.0125,0] 0
29 -0.065 (-0.44,-0.0125] 0
30 0.000 (-0.0125,0] 1
31 0.035 (0.0275,1] 1
32 -0.005 (-0.0125,0] 0
33 0.000 (-0.0125,0] 1
34 -0.290 (-0.44,-0.0125] 1
35 0.005 (0,0.0275] 1
36 0.300 (0.0275,1] 0
37 0.005 (0,0.0275] 1
38 0.070 (0.0275,1] 1
39 -0.195 (-0.44,-0.0125] 1
40 -0.260 (-0.44,-0.0125] 0
41 -0.040 (-0.44,-0.0125] 1
42 0.720 (0.0275,1] 0
43 0.045 (0.0275,1] 1
44 0.125 (0.0275,1] 1
45 0.035 (0.0275,1] 0
46 0.005 (0,0.0275] 1
47 0.000 (-0.0125,0] 0
48 0.000 (-0.0125,0] 0
49 0.000 (-0.0125,0] 1
50 0.010 (0,0.0275] 1
51 0.495 (0.0275,1] 1
52 0.000 (-0.0125,0] 1
53 0.000 (-0.0125,0] 1
54 0.010 (0,0.0275] 0
55 -0.015 (-0.44,-0.0125] 1
56 -0.110 (-0.44,-0.0125] 0
57 0.000 (-0.0125,0] 0
58 0.065 (0.0275,1] 1
59 0.255 (0.0275,1] 1
60 -0.020 (-0.44,-0.0125] 1
61 0.070 (0.0275,1] 1
62 0.000 (-0.0125,0] 0
63 1.000 (0.0275,1] 0
64 0.000 (-0.0125,0] 1
65 0.490 (0.0275,1] 0
66 -0.005 (-0.0125,0] 1
67 0.000 (-0.0125,0] 0
68 0.010 (0,0.0275] 1
69 0.000 (-0.0125,0] 1
70 -0.065 (-0.44,-0.0125] 1
71 0.005 (0,0.0275] 0
72 -0.065 (-0.44,-0.0125] 0
73 0.060 (0.0275,1] 0
74 0.000 (-0.0125,0] 0
75 0.000 (-0.0125,0] 1
76 0.155 (0.0275,1] 0
77 -0.190 (-0.44,-0.0125] 0
78 0.000 (-0.0125,0] 0
79 -0.065 (-0.44,-0.0125] 0
80 0.005 (0,0.0275] 1
81 0.060 (0.0275,1] 0
82 -0.100 (-0.44,-0.0125] 1
83 0.000 (-0.0125,0] 1
84 0.005 (0,0.0275] 0
85 0.000 (-0.0125,0] 1
86 0.300 (0.0275,1] 1
87 -0.070 (-0.44,-0.0125] 1
88 0.430 (0.0275,1] 0
89 -0.060 (-0.44,-0.0125] 1
90 -0.005 (-0.0125,0] 1
91 0.000 (-0.0125,0] 1
92 -0.005 (-0.0125,0] 1
93 0.015 (0,0.0275] 0
94 -0.205 (-0.44,-0.0125] 0
95 0.000 (-0.0125,0] 1
96 0.045 (0.0275,1] 0
97 -0.075 (-0.44,-0.0125] 0
98 0.000 (-0.0125,0] 0
99 0.000 (-0.0125,0] 1
100 0.000 (-0.0125,0] 1
101 0.235 (0.0275,1] 0
102 -0.060 (-0.44,-0.0125] 1
103 0.505 (0.0275,1] 0
104 -0.185 (-0.44,-0.0125] 1
105 0.185 (0.0275,1] 0
106 -0.115 (-0.44,-0.0125] 0
107 0.005 (0,0.0275] 0
108 -0.440 <NA> 1
109 -0.100 (-0.44,-0.0125] 1
110 0.430 (0.0275,1] 1
111 -0.005 (-0.0125,0] 0
112 0.000 (-0.0125,0] 1
113 0.000 (-0.0125,0] 0
114 -0.120 (-0.44,-0.0125] 1
115 0.005 (0,0.0275] 0
116 0.145 (0.0275,1] 0
117 0.110 (0.0275,1] 0
118 -0.010 (-0.0125,0] 0
119 0.000 (-0.0125,0] 1
120 -0.005 (-0.0125,0] 0
121 -0.060 (-0.44,-0.0125] 0
122 0.120 (0.0275,1] 1
123 -0.240 (-0.44,-0.0125] 0
124 -0.005 (-0.0125,0] 0
125 0.000 (-0.0125,0] 1
126 -0.060 (-0.44,-0.0125] 0
127 -0.305 (-0.44,-0.0125] 1
128 0.050 (0.0275,1] 0
129 0.000 (-0.0125,0] 0
130 -0.005 (-0.0125,0] 0
131 -0.005 (-0.0125,0] 0
132 0.000 (-0.0125,0] 1
133 -0.045 (-0.44,-0.0125] 1
134 -0.005 (-0.0125,0] 0
135 0.000 (-0.0125,0] 1
136 -0.065 (-0.44,-0.0125] 1
137 0.000 (-0.0125,0] 1
138 0.055 (0.0275,1] 0
139 0.020 (0,0.0275] 1
140 0.000 (-0.0125,0] 0
141 0.000 (-0.0125,0] 0
142 -0.005 (-0.0125,0] 1
143 0.005 (0,0.0275] 1
The graph gets made; however, there is no segmentation by gender (FEMALE6). FEMALE6 is a variable coded 0 or 1 according to the person's gender.
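A likely cause, sketched below under the assumption that FEMALE6 is stored as a numeric 0/1 column: ggplot2 only creates separate fill groups for discrete variables, so wrapping FEMALE6 in factor() restores the dodged gender bars. The toy data here merely stands in for dta.sub2.

```r
library(ggplot2)

# Toy stand-in for dta.sub2 (assumption: FEMALE6 is a numeric 0/1 column)
set.seed(1)
dta <- data.frame(
  WHPREJUDICE6R = runif(60, -0.44, 1),
  FEMALE6       = sample(0:1, 60, replace = TRUE)
)

# include.lowest = TRUE keeps the minimum value from becoming <NA>,
# as happened to the -0.440 rows in the posted data
dta$prejiducecut <- cut(dta$WHPREJUDICE6R,
                        breaks = quantile(dta$WHPREJUDICE6R, seq(0, 1, length = 5)),
                        include.lowest = TRUE)

# fill must be discrete to get side-by-side bars per gender
ggplot(dta, aes(prejiducecut, fill = factor(FEMALE6))) +
  geom_bar(position = "dodge")
```

With factor(), ggplot2 draws two dodged bars (one per gender code) inside each prejudice bin instead of a single bar with a continuous colour scale.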
Related
Calculating the number of values in a vector between a set of intervals
Suppose I have a vector a:

set.seed(123)
a = runif(10000) / 2

and a vector of values, intervals:

intervals = seq(0, 0.5, 0.001)

How can I quickly calculate the number of values in a which are between intervals[n] and intervals[n+1]? For example, the results would look something like this (made-up numbers):

interval         count
0.000 - 0.001    100
0.001 - 0.002    300
0.002 - 0.003    2342
Not completely what you want, but you can use hist to count the numbers in an interval:

set.seed(123)
a = runif(10000) / 2
intervals = seq(0, 0.5, 0.001)
result = hist(a, intervals, plot = FALSE)

Output (abbreviated; there are 500 counts, one per interval, and 501 breaks):

> result$counts
  [1] 20 17 24 15 23 15 28 23 15 26 15 25 28 23 20 20 18 20 28 14 10 25 30 16 23 19 21 18 20 29 ...
> result$breaks
  [1] 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 ...
One option is to use sapply and between:

library(dplyr)
# stop at length(intervals) - 1 so that intervals[x + 1] never runs past the end
sapply(seq_len(length(intervals) - 1), function(x) {
  sum(between(a, intervals[x], intervals[x + 1]))
})
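Another loop-free sketch: findInterval() bins every value in one vectorized pass, and tabulate() counts how many values landed in each bin. (Edge handling differs slightly from hist(), which closes intervals on the right by default.)

```r
set.seed(123)
a <- runif(10000) / 2
intervals <- seq(0, 0.5, 0.001)

# findInterval() returns, for each value, the index of the interval it falls in;
# tabulate() then counts occupancy of each of the 500 bins without a loop.
bin <- findInterval(a, intervals, rightmost.closed = TRUE)
counts <- tabulate(bin, nbins = length(intervals) - 1)

sum(counts)  # all 10000 values are accounted for
```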
Use group-by logic with the lapply function
I can use the function tabyl from the janitor package like this to apply tabyl to every column:

lapply(mtcars[, 2:4], tabyl)

What I really want to do is group by cyl and then apply tabyl to all those specified columns, something like this (does not work):

lapply(mtcars[, 2:4], tabyl(cyl))

How would I put this line into an lapply call? Or is there some other way of applying group-by logic? Please note, I have hundreds of variables in my actual data, and I want to apply tabyl to almost all of them (all the numeric ones at least), so I need a way of calling tabyl on them without explicitly naming the variables. I want the output to look like this (provided in an answer below), except including MANY more variables — imagine mtcars had 104 variables and I wanted to apply this grouped tabyl to only the numeric ones:

            cyl
            4            6            8
            n  Percent   n  Percent   n  Percent
disp 71.1   1    9.091   0    0.00    0    0.000
     75.7   1    9.091   0    0.00    0    0.000
     ...                         (rows omitted)
     472    0    0.000   0    0.00    1    7.143
     All   11  100.000   7  100.00   14  100.000
hp   52     1    9.091   0    0.00    0    0.000
     ...                         (rows omitted)
     335    0    0.000   0    0.00    1    7.143
     All   11  100.000   7  100.00   14  100.000
There are lots of ways to generate counts and frequencies by multiple variables. A solution with tables::tabular() enables one to display the "by" group on the column dimension, and the other variables on the row dimension of a table. We'll use the mtcars data to display disp and hp on the row dimension and cyl on the column dimension.

library(tables)
tabular(((Factor(disp) + 1) + (Factor(hp) + 1)) ~
          (Factor(cyl)) * ((n = 1) + Percent("col")),
        data = mtcars)

...and the output:

            cyl
            4            6            8
            n  Percent   n  Percent   n  Percent
disp 71.1   1    9.091   0    0.00    0    0.000
     75.7   1    9.091   0    0.00    0    0.000
     78.7   1    9.091   0    0.00    0    0.000
     79     1    9.091   0    0.00    0    0.000
     95.1   1    9.091   0    0.00    0    0.000
     108    1    9.091   0    0.00    0    0.000
     120.1  1    9.091   0    0.00    0    0.000
     120.3  1    9.091   0    0.00    0    0.000
     121    1    9.091   0    0.00    0    0.000
     140.8  1    9.091   0    0.00    0    0.000
     145    0    0.000   1   14.29    0    0.000
     146.7  1    9.091   0    0.00    0    0.000
     160    0    0.000   2   28.57    0    0.000
     167.6  0    0.000   2   28.57    0    0.000
     225    0    0.000   1   14.29    0    0.000
     258    0    0.000   1   14.29    0    0.000
     275.8  0    0.000   0    0.00    3   21.429
     301    0    0.000   0    0.00    1    7.143
     304    0    0.000   0    0.00    1    7.143
     318    0    0.000   0    0.00    1    7.143
     350    0    0.000   0    0.00    1    7.143
     351    0    0.000   0    0.00    1    7.143
     360    0    0.000   0    0.00    2   14.286
     400    0    0.000   0    0.00    1    7.143
     440    0    0.000   0    0.00    1    7.143
     460    0    0.000   0    0.00    1    7.143
     472    0    0.000   0    0.00    1    7.143
     All   11  100.000   7  100.00   14  100.000
hp   52     1    9.091   0    0.00    0    0.000
     62     1    9.091   0    0.00    0    0.000
     65     1    9.091   0    0.00    0    0.000
     66     2   18.182   0    0.00    0    0.000
     91     1    9.091   0    0.00    0    0.000
     93     1    9.091   0    0.00    0    0.000
     95     1    9.091   0    0.00    0    0.000
     97     1    9.091   0    0.00    0    0.000
     105    0    0.000   1   14.29    0    0.000
     109    1    9.091   0    0.00    0    0.000
     110    0    0.000   3   42.86    0    0.000
     113    1    9.091   0    0.00    0    0.000
     123    0    0.000   2   28.57    0    0.000
     150    0    0.000   0    0.00    2   14.286
     175    0    0.000   1   14.29    2   14.286
     180    0    0.000   0    0.00    3   21.429
     205    0    0.000   0    0.00    1    7.143
     215    0    0.000   0    0.00    1    7.143
     230    0    0.000   0    0.00    1    7.143
     245    0    0.000   0    0.00    2   14.286
     264    0    0.000   0    0.00    1    7.143
     335    0    0.000   0    0.00    1    7.143
     All   11  100.000   7  100.00   14  100.000

UPDATE: automate the process

In the comments to my answer, the original poster asked how one might automate tabular() to avoid having to type out all the variables to be tabulated. We can do this with lapply() and an anonymous function. Since the OP used column numbers as part of the question, we'll create a vector of columns from the mtcars data frame to be tabulated. We'll use that as the input to lapply(), along with two other arguments: one for the data frame and another to specify the column variable in the table. Since the column variable is a single variable, we specify it with its column name rather than a number.

# generalize and automate
varList <- 2:4
lapply(varList, function(x, df, byVar) {
  tabular((Factor(df[[x]], paste(colnames(df)[x])) + 1) ~
            ((Factor(df[[byVar]], paste(byVar))) * ((n = 1) + Percent("col"))),
          data = df)
}, mtcars, "cyl")

The tricky part is automating the process without the output tables having row headers of df[[x]] and column headers of df[[byVar]]. To avoid this, we extract the column name for the row dimension with colnames(), and we overwrite the header for the columns by pasting the byVar argument into the header.
...and the output:

[[1]]
         cyl
         4          6          8
 cyl     n Percent  n Percent  n Percent
 4      11 100      0   0      0   0
 6       0   0      7 100      0   0
 8       0   0      0   0     14 100
 All    11 100      7 100     14 100

[[2]]
         cyl
         4            6            8
 disp    n  Percent   n  Percent   n  Percent
 71.1    1    9.091   0    0.00    0    0.000
 ...                 (rows as in the full disp table above)
 All    11  100.000   7  100.00   14  100.000

[[3]]
         cyl
         4            6            8
 hp      n  Percent   n  Percent   n  Percent
 52      1    9.091   0    0.00    0    0.000
 ...                 (rows as in the full hp table above)
 All    11  100.000   7  100.00   14  100.000
One way is this, although I don't know if you need the cyl column:

by(mtcars[, 2:4], mtcars$cyl, lapply, tabyl)

Or a tidy way (I think the list part can be improved):

out = mtcars[, 2:4] %>%
  mutate(id = cyl) %>%
  group_by(id) %>%
  summarize_all(~ list(tabyl(.)))

out
# A tibble: 3 x 4
     id cyl              disp              hp
  <dbl> <list>           <list>            <list>
1     4 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [10 × 3]>
2     6 <df[,3] [1 × 3]> <df[,3] [5 × 3]>  <df[,3] [4 × 3]>
3     8 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [9 × 3]>

out %>% filter(id == 4) %>% pull(hp)
[[1]]
   . n    percent
  52 1 0.09090909
  62 1 0.09090909
  65 1 0.09090909
  66 2 0.18181818
  91 1 0.09090909
  93 1 0.09090909
  95 1 0.09090909
  97 1 0.09090909
 109 1 0.09090909
 113 1 0.09090909
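A further sketch, for readers who prefer a single long-format table over nested lists: pivot the columns, then count value-by-group combinations. The column names follow the mtcars example above; with hundreds of variables you would select the numeric columns programmatically (e.g. with where(is.numeric)) rather than by name.

```r
library(dplyr)
library(tidyr)

# One row per (variable, cyl, value) combination, with counts and
# within-group percentages, mirroring what tabyl reports per column
tab <- mtcars %>%
  select(cyl, disp, hp) %>%
  pivot_longer(-cyl, names_to = "variable", values_to = "value") %>%
  count(variable, cyl, value) %>%
  group_by(variable, cyl) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup()

tab
```

Because the result is one ordinary data frame, it is easy to filter, join, or export, at the cost of losing the side-by-side layout that tables::tabular() produces.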
Error when using DiceKriging function km: 'Error in t.default(T) : argument is not a matrix'
I'm attempting to apply the km function from the DiceKriging package to a multivariate dataset. When I use my entire dataset (dimensions = [938, 13]), the algorithm runs without problems. When I use a smaller subset of this dataset (dimensions = [94, 13]), I encounter this error:

Error in t.default(T) : argument is not a matrix

I'm not sure what's causing this error to occur when analyzing the data subset. For reference, here are the data and code I'm running:

g <- km(design = sub.ev, response = sub.rv, covtype = "matern5_2")

> sub.ev
1   -0.1519272795 1 0 0 27.769 27.45715 8.02 7.330 21.16 17.73979 0.000 4119.280 0.5338750
2   -0.1436123857 1 0 0 27.420 27.45715 8.04 6.895 20.19 17.73979 0.000 4119.280 0.5338750
3   -0.1144934053 1 0 0 27.021 27.45715 8.06 6.690 19.54 17.73979 0.000 4119.280 0.5338750
4   -0.1585420923 1 0 0 26.531 27.45715 8.09 6.160 18.64 17.73979 0.000 4119.280 0.5338750
5   -0.0588867480 1 0 0 26.107 27.45715 8.11 5.790 17.71 17.73979 0.000 4119.280 0.5338750
6   -0.0402306730 1 0 0 25.704 27.45715 8.15 5.840 17.26 17.73979 0.000 4119.280 0.5338750
7    0.0161781773 1 0 0 25.265 27.45715 8.18 5.390 16.77 17.73979 0.000 4119.280 0.5338750
8    0.0660891620 1 0 0 24.967 27.45715 8.21 5.425 16.18 17.73979 0.000 4119.280 0.5338750
9    0.0079745010 1 0 0 24.665 27.45715 8.24 5.250 16.51 17.73979 0.000 4119.280 0.5338750
10  -0.0191421967 1 0 0 24.416 27.45715 8.27 5.090 15.46 17.73979 0.000 4119.280 0.5338750
11   0.0364618430 1 0 0 24.232 27.45715 8.29 4.820 14.47 17.73979 0.000 4119.280 0.5338750
12   0.0160652203 1 0 0 24.077 27.45715 8.31 4.745 15.57 17.73979 0.000 4119.280 0.5338750
13   0.0763233707 1 0 0 23.906 27.45715 8.33 4.680 15.14 17.73979 0.000 4119.280 0.5338750
14   0.0211064293 1 0 0 23.794 27.45715 8.35 4.940 18.25 17.73979 0.000 4119.280 0.5338750
15   0.0166228227 1 0 0 23.642 27.45715 8.36 5.190 14.60 17.73979 0.000 4119.280 0.5338750
16   0.0022511447 1 0 0 23.531 27.45715 8.37 5.425 15.30 17.73979 0.000 4119.280 0.5338750
17  -0.0122220320 1 0 0 23.445 27.45715 8.39 6.110 14.79 17.73979 0.000 4119.280 0.5338750
18  -0.0057961107 1 0 0 23.442 27.45715 8.39 5.940 14.49 17.73979 0.000 4119.280 0.5338750
19  -0.0322631907 1 0 0 23.512 27.45715 8.39 5.380 14.38 17.73979 0.000 4119.280 0.5338750
95   0.0159845000 1 0 0 27.306 27.45715 8.24 7.380 17.14 17.73979 0.000 4119.280 0.5338750
96  -0.0437959553 1 0 0 27.054 27.45715 8.25 7.100 15.35 17.73979 0.000 4119.280 0.5338750
97  -0.0805486920 1 0 0 26.520 26.88009 8.27 6.900 14.87 16.96688 0.000 3725.365 0.5422500
98  -0.0941385073 1 0 0 26.306 26.88009 8.30 6.725 15.19 16.96688 0.000 3725.365 0.5422500
99  -0.1159254400 1 0 0 26.039 26.88009 8.32 6.590 15.42 16.96688 0.000 3725.365 0.5422500
100 -0.0362266430 1 0 0 25.637 26.88009 8.36 6.280 14.34 16.96688 0.000 3725.365 0.5422500
101  0.0326682983 1 0 0 25.299 26.88009 8.38 6.100 14.36 16.96688 0.000 3725.365 0.5422500
102  0.0471005793 1 0 0 24.742 26.88009 8.44 5.955 13.48 16.96688 0.000 3725.365 0.5422500
103 -0.0596346010 1 0 0 24.262 26.88009 8.49 5.530 12.91 16.96688 0.000 3725.365 0.5422500
104 -0.0043536683 1 0 0 23.856 26.88009 8.52 5.315 12.80 16.96688 0.000 3725.365 0.5422500
105 -0.0177714297 1 0 0 23.578 26.88009 8.54 5.240 13.01 16.96688 0.000 3725.365 0.5422500
106  0.0169363000 1 0 0 23.313 26.88009 8.56 5.225 12.69 16.96688 0.000 3725.365 0.5422500
107 -0.0170451183 1 0 0 23.023 26.88009 8.59 5.090 12.71 16.96688 0.000 3725.365 0.5422500
108  0.0231896353 1 0 0 22.755 26.88009 8.62 5.280 11.58 16.96688 0.000 3725.365 0.5422500
109  0.0053651757 1 0 0 22.510 26.88009 8.65 5.690 11.28 16.96688 0.000 3725.365 0.5422500
110  0.0281674793 1 0 0 22.342 26.88009 8.68 5.555 11.21 16.96688 0.000 3725.365 0.5422500
111  0.0009483843 1 0 0 22.168 26.88009 8.70 5.770 10.68 16.96688 0.000 3725.365 0.5422500
112  0.0147559413 1 0 0 22.151 26.88009 8.69 5.995 11.57 16.96688 0.000 3725.365 0.5422500
190 -0.0115338953 1 0 0 27.062 25.91990 8.21 8.000 15.35 14.07031 0.000 3949.390 0.5342917
191 -0.0189870410 1 0 0 26.545 25.91990 8.22 7.870 14.21 14.07031 0.000 3949.390 0.5342917
192 -0.0184237180 1 0 0 26.104 25.91990 8.25 7.795 20.04 14.07031 0.000 3949.390 0.5342917
193 -0.0319295797 1 0 0 25.859 25.91990 8.26 7.730 13.14 14.07031 0.000 3949.390 0.5342917
194 -0.0184753123 1 0 0 25.573 25.91990 8.30 7.585 12.93 14.07031 0.000 3949.390 0.5342917
195 -0.0197481060 1 0 0 25.005 25.91990 8.36 7.490 11.29 14.07031 0.000 3949.390 0.5342917
196 -0.0215467360 1 0 0 24.710 25.91990 8.39 7.245 11.10 14.07031 0.000 3949.390 0.5342917
197  0.0265223447 1 0 0 24.455 25.91990 8.42 7.240 11.44 14.07031 0.000 3949.390 0.5342917
198  0.0470763840 1 0 0 24.087 25.91990 8.45 7.225 11.42 14.07031 0.000 3949.390 0.5342917
199  0.0622169450 1 0 0 23.673 25.91990 8.48 7.260 11.20 14.07031 0.000 3949.390 0.5342917
200 -0.0104582193 1 0 0 23.301 25.91990 8.52 7.190 11.57 14.07031 0.000 3949.390 0.5342917
201  0.0121972077 1 0 0 23.005 25.91990 8.55 7.220 10.71 14.07031 0.000 3949.390 0.5342917
202  0.0219721027 1 0 0 22.745 25.91990 8.59 7.245 10.86 14.07031 0.000 3949.390 0.5342917
203  0.0208879210 1 0 0 22.576 25.91990 8.61 7.180 10.62 14.07031 0.000 3949.390 0.5342917
204  0.0192644400 1 0 0 22.417 25.91990 8.63 7.220 10.47 14.07031 0.000 3949.390 0.5342917
205  0.0066226250 1 0 0 22.243 25.91990 8.65 7.170 10.29 14.07031 0.000 3949.390 0.5342917
206  0.0036012053 1 0 0 22.136 25.91990 8.67 7.200 10.86 14.07031 0.000 3949.390 0.5342917
207 -0.0027906963 1 0 0 22.012 25.91990 8.69 7.150 10.57 14.07031 0.000 3949.390 0.5342917
290  0.2266387893 1 0 0 27.763 26.21360 8.21 7.765 23.37 17.97479 0.000 3554.510 0.5938333
291  0.1151646527 1 0 0 27.377 26.21360 8.23 7.720 24.43 17.97479 0.000 3554.510 0.5938333
292 -0.0218444193 1 0 0 27.285 26.21360 8.23 7.655 23.40 17.97479 0.000 3554.510 0.5938333
293 -0.0908422353 1 0 0 26.935 26.21360 8.26 7.590 21.96 17.97479 0.000 3554.510 0.5938333
294 -0.1716709177 1 0 0 26.792 26.21360 8.28 7.540 23.30 17.97479 0.000 3554.510 0.5938333
295 -0.1943201847 1 0 0 26.869 26.21360 8.28 7.420 25.06 17.97479 0.000 3554.510 0.5938333
296 -0.1529985130 1 0 0 26.733 26.21360 8.28 7.310 33.62 17.97479 0.000 3554.510 0.5938333
297 -0.1106344563 1 0 0 26.394 26.21360 8.29 7.160 29.06 17.97479 0.000 3554.510 0.5938333
298 -0.0638089193 1 0 0 25.973 26.21360 8.32 7.010 23.25 17.97479 0.000 3554.510 0.5938333
299 -0.1208449610 1 0 0 25.501 26.21360 8.37 6.970 21.10 17.97479 0.000 3554.510 0.5938333
300 -0.2310616323 1 0 0 25.192 26.21360 8.40 7.010 19.16 17.97479 0.000 3554.510 0.5938333
301 -0.2043969970 1 0 0 24.867 26.21360 8.43 7.030 17.80 17.97479 0.000 3554.510 0.5938333
302 -0.2003585363 1 0 0 24.634 26.21360 8.45 7.080 17.01 17.97479 0.000 3554.510 0.5938333
303 -0.2535806687 1 0 0 24.468 26.21360 8.47 7.130 16.41 17.97479 0.000 3554.510 0.5938333
304 -0.2464920640 1 0 0 24.234 26.21360 8.50 7.055 15.50 17.97479 0.000 3554.510 0.5938333
389  0.2461277410 1 0 0 27.981 26.78711 8.29 6.760 20.65 22.02125 0.000 3589.260 0.7715417
390  0.1660650063 1 0 0 27.915 26.78711 8.29 6.575 22.44 22.02125 0.000 3589.260 0.7715417
391 -0.0609562143 1 0 0 27.757 26.78711 8.30 6.580 22.92 22.02125 0.000 3589.260 0.7715417
392 -0.2020911323 1 0 0 27.527 26.78711 8.32 6.590 22.91 22.02125 0.000 3589.260 0.7715417
393 -0.2980735343 1 0 0 27.563 26.78711 8.31 6.510 24.30 22.02125 0.000 3589.260 0.7715417
394 -0.2370078227 1 0 0 27.400 26.78711 8.32 6.525 22.67 22.02125 0.000 3589.260 0.7715417
395 -0.3440380117 1 0 0 27.274 26.78711 8.32 6.530 22.45 22.02125 0.000 3589.260 0.7715417
396 -0.0057092573 1 0 0 27.132 26.78711 8.30 6.490 22.08 22.02125 0.000 3589.260 0.7715417
397 -0.0241878650 1 0 0 27.018 26.78711 8.30 6.450 24.00 22.02125 0.000 3589.260 0.7715417
398 -0.2080665820 1 0 0 26.834 26.78711 8.32 6.405 25.81 22.02125 0.000 3589.260 0.7715417
399 -0.1716383953 1 0 0 26.637 26.78711 8.33 6.340 25.19 22.02125 0.000 3589.260 0.7715417
400 -0.2570107420 1 0 0 26.476 26.78711 8.34 6.300 22.05 22.02125 0.000 3589.260 0.7715417
495 -0.1302196527 1 0 0 25.967 26.53893 8.40 6.300 11.90 15.57448 2.286 2313.087 0.6087083
496 -0.0391473870 1 0 0 25.708 26.53893 8.41 6.260 10.77 15.57448 2.286 2313.087 0.6087083
587 -0.0382961500 1 0 0 28.647 26.98353 8.28 8.110 17.52 12.12667 0.000 3980.156 0.4227083
588 -0.0035965477 1 0 0 28.652 26.98353 8.26 7.920 21.14 12.12667 0.000 3980.156 0.4227083
589 -0.0050414577 1 0 0 28.307 26.98353 8.27 7.830 18.39 12.12667 0.000 3980.156 0.4227083
590  0.0354186967 1 0 0 27.896 26.98353 8.27 6.825 17.35 12.12667 0.000 3980.156 0.4227083
591 -0.0676664363 1 0 0 27.581 26.98353 8.27 6.110 16.06 12.12667 0.000 3980.156 0.4227083
592 -0.1723716683 1 0 0 27.223 26.98353 8.29 6.245 15.14 12.12667 0.000 3980.156 0.4227083
688  0.1598606430 1 0 0 28.738 27.54833 8.35 5.965 24.99 15.51010 0.000 2554.804 0.4730417
912 -0.1977195740 1 0 0 26.653 27.24447 9.09 7.635 11.38 12.71729 0.000 4002.945 0.5135417
913 -0.1306465143 1 0 0 27.399 27.24447 9.07 7.420 12.84 12.71729 0.000 4002.945 0.5135417
914 -0.1175953210 1 0 0 28.024 27.24447 9.01 7.270 13.91 12.71729 0.000 4002.945 0.5135417

> sub.rv
 [1] 3.24 2.65 2.18 1.75 1.52 1.32 1.25 1.18 1.16 1.10 1.05 1.01 0.95 0.88 0.83 0.76 0.72 0.61 0.58 3.64 3.46 2.97 2.64 2.27 2.07 1.72 1.33 1.03 0.89 0.82 0.69 0.64 0.60
[34] 0.55 0.44 0.33 0.26 4.65 4.10 3.55 3.30 2.70 2.12 1.74 1.53 1.27 1.02 0.88 0.76 0.66 0.58 0.54 0.47 0.40 0.28 5.87 5.24 5.09 4.22 3.90 3.94 3.41 2.75 2.18 1.86 1.67
[67] 1.52 1.42 1.34 1.35 5.88 5.56 4.88 3.98 3.76 3.31 3.11 2.78 2.41 1.97 1.65 1.39 1.63 1.44 4.08 4.83 3.84 3.23 2.83 2.36 4.33 1.86 3.17 4.48
I've run into the same issue. It is resolved when you remove the variables in your design matrix that have 0 variance (as these are simply constants). In your case, you need to remove columns (3:5).
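A minimal sketch of that cleanup, using a toy design matrix in place of sub.ev (assumption: the constant columns are the all-1/all-0 indicator columns visible in the printout above):

```r
# Toy design matrix standing in for sub.ev; c1-c3 mimic the constant columns
set.seed(42)
sub.ev.toy <- data.frame(x1 = rnorm(30), c1 = 1, c2 = 0, c3 = 0, x2 = runif(30))

# Drop zero-variance (constant) columns before handing the design to km()
keep <- vapply(sub.ev.toy, function(col) var(col) > 0, logical(1))
design.clean <- sub.ev.toy[, keep, drop = FALSE]

names(design.clean)  # "x1" "x2"
# g <- km(design = design.clean, response = sub.rv, covtype = "matern5_2")
```

The km() call stays commented out here because it needs the real sub.rv response; the point is only the variance filter, which generalizes to any number of columns.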
Import specific rows from a "txt" file into R
I have an "example.txt" document just as follows:

SIGNAL: 40 41 42
0.406 0.043 0.051 0.021 0.013
0.056 0.201 0.026 0.009 0.000
0.000 0.128 0 0.009 0.000
TOTAL: 0.657
SIGNAL: 44 45 46 48
0.128 0.338 0.026
0.333 0.03 0.000
0.060 0.013 0.004
0.009 0.017 0.009
0.013 0 0.000
TOTAL: 0.704
SIGNAL: 51 52 54
0.368 0.081 0.085
0.004 0.162 0.09
0.064 0.073 0.013
0.017 0.009 0.000
TOTAL: 0.266
SIGNAL: 60 61 62 63 64 65 66 67
0.530 0.030 0.009 0.179 0.154 0.004 0.068 0.009
TOTAL: 0.796

I want to import the rows between "SIGNAL: 44 45 46 48" and "TOTAL: 0.704" into R. I use

read.table("example.txt", skip = 6, nrow = 5)

to extract these specific rows, and it works:

     V1    V2    V3
1 0.128 0.338 0.026
2 0.333 0.030 0.000
3 0.060 0.013 0.004
4 0.009 0.017 0.009
5 0.013 0.000 0.000

However, my real data (450,000 rows) is very big. If I want to extract the rows between "SIGNAL: 3000 3001 3002 3003" and the next "TOTAL", how can I do it? Thank you so much!
I have worked it out based on akrun's code. For example, if I want to extract the first two sets, I can just use:

lines <- readLines('example.txt')
g <- c(40, 44)
sapply(1:length(g), function(x) {
  Map(function(i, j) read.table(text = lines[(i + 1):(j - 1)], sep = '', header = FALSE),
      grep(paste('SIGNAL:', g[x]), lines),
      grep('TOTAL', lines)[which(grep(paste('SIGNAL:', g[x]), lines) == grep('SIGNAL', lines))])
})
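A more direct sketch for pulling a single named block out of a large file: locate the SIGNAL header with grep(), then read up to the first TOTAL line that follows it. The file written below only reproduces a shortened version of the example; for the real data you would grep for "SIGNAL: 3000" instead.

```r
# Recreate a small example file (stand-in for the real 450,000-line file)
tmp <- tempfile(fileext = ".txt")
writeLines(c("SIGNAL: 40 41 42", "0.406 0.043 0.051", "TOTAL: 0.657",
             "SIGNAL: 44 45 46 48", "0.128 0.338 0.026", "0.333 0.030 0.000",
             "TOTAL: 0.704"), tmp)

lines <- readLines(tmp)
start <- grep("^SIGNAL: 44", lines)   # header line of the wanted block
stop  <- grep("^TOTAL", lines)
stop  <- stop[stop > start][1]        # first TOTAL after that header
block <- read.table(text = lines[(start + 1):(stop - 1)])
block
```

Because readLines() and grep() only ever touch character vectors, this scales to hundreds of thousands of lines without reading any block you do not need into a data frame.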
Referring to other cells in R without using a for loop
I am new to R, and one thing I have been told again and again is that there really is no need for for loops. I have had some success with apply but could not figure out how to use it in this instance. Here is the data I am working with:

    Bid  Ask  Exp     Strike Price  V6
51 4.95 5.15 NOV1 13    335 5.050 3.08
52 3.40 3.50 NOV1 13    340 3.450   NA
53 2.28 2.42 NOV1 13    345 2.350   NA
54 1.51 1.57 NOV1 13    350 1.540   NA
55 0.99 1.07 NOV1 13    355 1.030   NA
56 0.66 0.71 NOV1 13    360 0.685   NA
57 0.46 0.51 NOV1 13    365 0.485   NA
58 0.33 0.37 NOV1 13    370 0.350   NA
59 0.25 0.28 NOV1 13    375 0.265   NA
60 0.18 0.24 NOV1 13    380 0.210   NA
61 0.11 0.20 NOV1 13    385 0.155   NA
62 0.05 0.17 NOV1 13    390 0.110   NA
63 0.05 0.16 NOV1 13    395 0.105   NA
64 0.07 0.13 NOV1 13    400 0.100   NA

In column 6 (called V6), I want each value to be twice the value in the Price column three rows below the current row. For example, row 1 of V6 is 3.08, which is 2 * 1.54, the Price value in row 4. I would like to do this for every cell in column 6 until the prices run out (NA is fine in column 6 for the last three rows). Here is how I accomplished this:

for (i in 1:11) {
  data[i, 6] <- 2 * data[i + 3, 5]
}

Is there a faster/easier/more appropriate way to do this? Here is the final data as I want it:

    Bid  Ask  Exp     Strike Price  V6
51 4.95 5.15 NOV1 13    335 5.050 3.08
52 3.40 3.50 NOV1 13    340 3.450 2.06
53 2.28 2.42 NOV1 13    345 2.350 1.37
54 1.51 1.57 NOV1 13    350 1.540 0.97
55 0.99 1.07 NOV1 13    355 1.030 0.70
56 0.66 0.71 NOV1 13    360 0.685 0.53
57 0.46 0.51 NOV1 13    365 0.485 0.42
58 0.33 0.37 NOV1 13    370 0.350 0.31
59 0.25 0.28 NOV1 13    375 0.265 0.22
60 0.18 0.24 NOV1 13    380 0.210 0.21
61 0.11 0.20 NOV1 13    385 0.155 0.20
62 0.05 0.17 NOV1 13    390 0.110   NA
63 0.05 0.16 NOV1 13    395 0.105   NA
64 0.07 0.13 NOV1 13    400 0.100   NA

Thank you.
use mydata$V6 <- 2 * c(mydata$Price[-(1:3)], rep(NA, 3))
df1 is your data. I used sapply here, which avoids writing the for loop explicitly (though it is not meaningfully faster, since it still works row by row):

df1$V6 <- sapply(1:nrow(df1), function(x) 2 * df1[x + 3, 5])
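The same shift can also be written with dplyr::lead(), which moves a vector forward by n positions and pads the tail with NA automatically; a sketch on a few of the Price values from the question:

```r
library(dplyr)

mydata <- data.frame(Price = c(5.050, 3.450, 2.350, 1.540, 1.030, 0.685))

# lead(x, 3) pairs each row with the Price three rows below it;
# the last three positions have no partner and come back as NA
mydata$V6 <- 2 * lead(mydata$Price, 3)
mydata$V6
# 3.08 2.06 1.37   NA   NA   NA
```

This is fully vectorized, so it behaves like the c(Price[-(1:3)], rep(NA, 3)) answer above while reading a little more clearly.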