creating a bar graph with 2 variables in ggplot2 - r

I am trying to create a bar graph showing the counts of a variable I grouped into bins using quantile cut points. I want this bar graph to be segmented by gender (which is in the data).
So it should show me each prejudice cut and how often it occurs in the data, segmented by gender.
library(ggplot2)

prejiducequint = quantile(dta.sub2$WHPREJUDICE6R, seq(0, 1, length = 5))
# use the quantile vector computed above as breaks; note the minimum value falls
# exactly on the lowest break and becomes NA unless include.lowest = TRUE
prejiducecut = cut(dta.sub2$WHPREJUDICE6R, breaks = prejiducequint)
p = ggplot(dta.sub2, aes(prejiducecut, fill = FEMALE6)) +
  geom_bar(position = "dodge")
p
Here is the data:
WHPREJUDICE6R prejiducecut FEMALE6
1 -0.005 (-0.0125,0] 0
2 0.075 (0.0275,1] 0
3 0.000 (-0.0125,0] 0
4 0.000 (-0.0125,0] 1
5 -0.020 (-0.44,-0.0125] 1
6 0.130 (0.0275,1] 1
7 0.090 (0.0275,1] 0
8 0.230 (0.0275,1] 0
9 0.100 (0.0275,1] 1
10 0.870 (0.0275,1] 1
11 -0.130 (-0.44,-0.0125] 1
12 -0.010 (-0.0125,0] 1
13 0.005 (0,0.0275] 0
14 -0.010 (-0.0125,0] 1
15 -0.060 (-0.44,-0.0125] 1
16 0.000 (-0.0125,0] 1
17 -0.010 (-0.0125,0] 1
18 0.000 (-0.0125,0] 1
19 -0.075 (-0.44,-0.0125] 1
20 0.005 (0,0.0275] 1
21 -0.010 (-0.0125,0] 0
22 -0.060 (-0.44,-0.0125] 1
23 0.500 (0.0275,1] 0
24 0.020 (0,0.0275] 0
25 0.135 (0.0275,1] 0
26 -0.055 (-0.44,-0.0125] 1
27 -0.440 <NA> 0
28 0.000 (-0.0125,0] 0
29 -0.065 (-0.44,-0.0125] 0
30 0.000 (-0.0125,0] 1
31 0.035 (0.0275,1] 1
32 -0.005 (-0.0125,0] 0
33 0.000 (-0.0125,0] 1
34 -0.290 (-0.44,-0.0125] 1
35 0.005 (0,0.0275] 1
36 0.300 (0.0275,1] 0
37 0.005 (0,0.0275] 1
38 0.070 (0.0275,1] 1
39 -0.195 (-0.44,-0.0125] 1
40 -0.260 (-0.44,-0.0125] 0
41 -0.040 (-0.44,-0.0125] 1
42 0.720 (0.0275,1] 0
43 0.045 (0.0275,1] 1
44 0.125 (0.0275,1] 1
45 0.035 (0.0275,1] 0
46 0.005 (0,0.0275] 1
47 0.000 (-0.0125,0] 0
48 0.000 (-0.0125,0] 0
49 0.000 (-0.0125,0] 1
50 0.010 (0,0.0275] 1
51 0.495 (0.0275,1] 1
52 0.000 (-0.0125,0] 1
53 0.000 (-0.0125,0] 1
54 0.010 (0,0.0275] 0
55 -0.015 (-0.44,-0.0125] 1
56 -0.110 (-0.44,-0.0125] 0
57 0.000 (-0.0125,0] 0
58 0.065 (0.0275,1] 1
59 0.255 (0.0275,1] 1
60 -0.020 (-0.44,-0.0125] 1
61 0.070 (0.0275,1] 1
62 0.000 (-0.0125,0] 0
63 1.000 (0.0275,1] 0
64 0.000 (-0.0125,0] 1
65 0.490 (0.0275,1] 0
66 -0.005 (-0.0125,0] 1
67 0.000 (-0.0125,0] 0
68 0.010 (0,0.0275] 1
69 0.000 (-0.0125,0] 1
70 -0.065 (-0.44,-0.0125] 1
71 0.005 (0,0.0275] 0
72 -0.065 (-0.44,-0.0125] 0
73 0.060 (0.0275,1] 0
74 0.000 (-0.0125,0] 0
75 0.000 (-0.0125,0] 1
76 0.155 (0.0275,1] 0
77 -0.190 (-0.44,-0.0125] 0
78 0.000 (-0.0125,0] 0
79 -0.065 (-0.44,-0.0125] 0
80 0.005 (0,0.0275] 1
81 0.060 (0.0275,1] 0
82 -0.100 (-0.44,-0.0125] 1
83 0.000 (-0.0125,0] 1
84 0.005 (0,0.0275] 0
85 0.000 (-0.0125,0] 1
86 0.300 (0.0275,1] 1
87 -0.070 (-0.44,-0.0125] 1
88 0.430 (0.0275,1] 0
89 -0.060 (-0.44,-0.0125] 1
90 -0.005 (-0.0125,0] 1
91 0.000 (-0.0125,0] 1
92 -0.005 (-0.0125,0] 1
93 0.015 (0,0.0275] 0
94 -0.205 (-0.44,-0.0125] 0
95 0.000 (-0.0125,0] 1
96 0.045 (0.0275,1] 0
97 -0.075 (-0.44,-0.0125] 0
98 0.000 (-0.0125,0] 0
99 0.000 (-0.0125,0] 1
100 0.000 (-0.0125,0] 1
101 0.235 (0.0275,1] 0
102 -0.060 (-0.44,-0.0125] 1
103 0.505 (0.0275,1] 0
104 -0.185 (-0.44,-0.0125] 1
105 0.185 (0.0275,1] 0
106 -0.115 (-0.44,-0.0125] 0
107 0.005 (0,0.0275] 0
108 -0.440 <NA> 1
109 -0.100 (-0.44,-0.0125] 1
110 0.430 (0.0275,1] 1
111 -0.005 (-0.0125,0] 0
112 0.000 (-0.0125,0] 1
113 0.000 (-0.0125,0] 0
114 -0.120 (-0.44,-0.0125] 1
115 0.005 (0,0.0275] 0
116 0.145 (0.0275,1] 0
117 0.110 (0.0275,1] 0
118 -0.010 (-0.0125,0] 0
119 0.000 (-0.0125,0] 1
120 -0.005 (-0.0125,0] 0
121 -0.060 (-0.44,-0.0125] 0
122 0.120 (0.0275,1] 1
123 -0.240 (-0.44,-0.0125] 0
124 -0.005 (-0.0125,0] 0
125 0.000 (-0.0125,0] 1
126 -0.060 (-0.44,-0.0125] 0
127 -0.305 (-0.44,-0.0125] 1
128 0.050 (0.0275,1] 0
129 0.000 (-0.0125,0] 0
130 -0.005 (-0.0125,0] 0
131 -0.005 (-0.0125,0] 0
132 0.000 (-0.0125,0] 1
133 -0.045 (-0.44,-0.0125] 1
134 -0.005 (-0.0125,0] 0
135 0.000 (-0.0125,0] 1
136 -0.065 (-0.44,-0.0125] 1
137 0.000 (-0.0125,0] 1
138 0.055 (0.0275,1] 0
139 0.020 (0,0.0275] 1
140 0.000 (-0.0125,0] 0
141 0.000 (-0.0125,0] 0
142 -0.005 (-0.0125,0] 1
143 0.005 (0,0.0275] 1
The graph gets made; however, there is no segmentation by gender (FEMALE6). FEMALE6 is a variable coded either 0 or 1 based on the person's gender.
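The likely cause is that FEMALE6 is numeric, so ggplot2 maps it to a continuous fill scale and never splits the bars into groups. A minimal sketch of the usual fix, converting it to a factor (the Male/Female labels are an assumption about the 0/1 coding):
library(ggplot2)

# a discrete (factor) fill gives one dodged bar per gender within each cut
p = ggplot(dta.sub2, aes(prejiducecut,
                         fill = factor(FEMALE6, labels = c("Male", "Female")))) +
  geom_bar(position = "dodge") +
  labs(fill = "Gender")
p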

Related

Calculating the number of values in a vector between a set of intervals

Suppose I have a vector a:
set.seed(123)
a = runif(10000) / 2
and a vector of values, intervals:
intervals = seq(0, 0.5, 0.001)
How can I quickly calculate the number of values in a which are between intervals[n] and intervals[n+1]?
For example, the result would look something like this (made-up numbers):
interval count
0.000 - 0.001 100
0.001 - 0.002 300
0.002 - 0.003 2342
Not completely what you want, but you can use hist to count the number of values in each interval:
set.seed(123)
a = runif(10000) / 2
intervals = seq(0, 0.5, 0.001)
result = hist(a, intervals, plot = FALSE)
Output:
> result$counts
[1] 20 17 24 15 23 15 28 23 15 26 15 25 28 23 20 20 18 20 28 14 10 25 30 16 23 19 21 18 20 29 22 11 19 17 16 22 18 20
[39] 29 15 21 13 19 15 17 12 20 12 19 25 17 16 17 13 24 20 19 17 16 26 27 17 12 20 17 18 16 32 24 18 24 24 21 17 23 17
[77] 25 19 25 23 29 25 26 18 17 20 24 20 14 18 19 22 20 21 16 22 21 18 18 16 20 19 19 19 18 16 12 13 22 26 18 21 23 23
[115] 17 23 22 11 23 15 26 14 18 18 21 16 19 20 29 26 17 21 24 23 18 29 16 21 28 17 29 25 15 21 25 20 15 16 25 22 20 12
[153] 23 16 23 27 19 18 18 23 24 14 21 27 17 17 16 23 17 26 24 16 13 10 21 13 17 16 14 17 12 20 23 24 19 18 26 24 18 23
[191] 17 23 18 18 25 23 14 19 23 13 21 21 26 19 28 16 22 28 31 22 25 22 25 22 16 16 34 25 27 18 17 17 20 28 21 23 17 13
[229] 22 11 21 21 24 24 21 24 17 19 24 25 21 22 21 18 23 18 24 23 17 23 22 19 18 11 18 21 18 22 16 23 31 23 22 17 26 17
[267] 14 8 19 19 20 27 12 22 16 20 12 25 30 16 20 19 22 27 22 17 19 13 28 19 14 23 28 23 15 21 14 20 21 21 16 13 14 21
[305] 25 18 23 22 21 26 25 23 26 22 23 25 14 20 16 16 15 16 19 22 16 24 21 22 19 18 21 22 22 20 25 21 22 19 18 17 20 25
[343] 18 26 22 22 18 19 11 12 16 24 21 17 21 17 22 17 20 17 16 23 17 22 22 26 23 26 24 14 25 27 20 25 15 9 30 20 17 24
[381] 22 16 23 21 18 19 17 18 17 24 16 18 28 15 17 19 16 23 20 19 18 23 14 21 14 19 25 21 23 18 20 25 15 22 20 23 17 19
[419] 23 16 15 21 32 20 17 15 31 20 24 14 20 27 20 28 20 18 22 15 16 17 22 17 21 22 24 17 18 21 14 20 20 23 15 24 15 16
[457] 15 24 13 25 11 18 21 14 18 29 17 18 17 21 15 18 20 24 21 17 27 18 27 19 17 21 10 21 20 13 19 20 20 18 15 13 16 19
[495] 22 20 24 20 17 20
> result$breaks
[1] 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 0.011 0.012 0.013 0.014 0.015 0.016 0.017 0.018
[20] 0.019 0.020 0.021 0.022 0.023 0.024 0.025 0.026 0.027 0.028 0.029 0.030 0.031 0.032 0.033 0.034 0.035 0.036 0.037
[39] 0.038 0.039 0.040 0.041 0.042 0.043 0.044 0.045 0.046 0.047 0.048 0.049 0.050 0.051 0.052 0.053 0.054 0.055 0.056
[58] 0.057 0.058 0.059 0.060 0.061 0.062 0.063 0.064 0.065 0.066 0.067 0.068 0.069 0.070 0.071 0.072 0.073 0.074 0.075
[77] 0.076 0.077 0.078 0.079 0.080 0.081 0.082 0.083 0.084 0.085 0.086 0.087 0.088 0.089 0.090 0.091 0.092 0.093 0.094
[96] 0.095 0.096 0.097 0.098 0.099 0.100 0.101 0.102 0.103 0.104 0.105 0.106 0.107 0.108 0.109 0.110 0.111 0.112 0.113
[115] 0.114 0.115 0.116 0.117 0.118 0.119 0.120 0.121 0.122 0.123 0.124 0.125 0.126 0.127 0.128 0.129 0.130 0.131 0.132
[134] 0.133 0.134 0.135 0.136 0.137 0.138 0.139 0.140 0.141 0.142 0.143 0.144 0.145 0.146 0.147 0.148 0.149 0.150 0.151
[153] 0.152 0.153 0.154 0.155 0.156 0.157 0.158 0.159 0.160 0.161 0.162 0.163 0.164 0.165 0.166 0.167 0.168 0.169 0.170
[172] 0.171 0.172 0.173 0.174 0.175 0.176 0.177 0.178 0.179 0.180 0.181 0.182 0.183 0.184 0.185 0.186 0.187 0.188 0.189
[191] 0.190 0.191 0.192 0.193 0.194 0.195 0.196 0.197 0.198 0.199 0.200 0.201 0.202 0.203 0.204 0.205 0.206 0.207 0.208
[210] 0.209 0.210 0.211 0.212 0.213 0.214 0.215 0.216 0.217 0.218 0.219 0.220 0.221 0.222 0.223 0.224 0.225 0.226 0.227
[229] 0.228 0.229 0.230 0.231 0.232 0.233 0.234 0.235 0.236 0.237 0.238 0.239 0.240 0.241 0.242 0.243 0.244 0.245 0.246
[248] 0.247 0.248 0.249 0.250 0.251 0.252 0.253 0.254 0.255 0.256 0.257 0.258 0.259 0.260 0.261 0.262 0.263 0.264 0.265
[267] 0.266 0.267 0.268 0.269 0.270 0.271 0.272 0.273 0.274 0.275 0.276 0.277 0.278 0.279 0.280 0.281 0.282 0.283 0.284
[286] 0.285 0.286 0.287 0.288 0.289 0.290 0.291 0.292 0.293 0.294 0.295 0.296 0.297 0.298 0.299 0.300 0.301 0.302 0.303
[305] 0.304 0.305 0.306 0.307 0.308 0.309 0.310 0.311 0.312 0.313 0.314 0.315 0.316 0.317 0.318 0.319 0.320 0.321 0.322
[324] 0.323 0.324 0.325 0.326 0.327 0.328 0.329 0.330 0.331 0.332 0.333 0.334 0.335 0.336 0.337 0.338 0.339 0.340 0.341
[343] 0.342 0.343 0.344 0.345 0.346 0.347 0.348 0.349 0.350 0.351 0.352 0.353 0.354 0.355 0.356 0.357 0.358 0.359 0.360
[362] 0.361 0.362 0.363 0.364 0.365 0.366 0.367 0.368 0.369 0.370 0.371 0.372 0.373 0.374 0.375 0.376 0.377 0.378 0.379
[381] 0.380 0.381 0.382 0.383 0.384 0.385 0.386 0.387 0.388 0.389 0.390 0.391 0.392 0.393 0.394 0.395 0.396 0.397 0.398
[400] 0.399 0.400 0.401 0.402 0.403 0.404 0.405 0.406 0.407 0.408 0.409 0.410 0.411 0.412 0.413 0.414 0.415 0.416 0.417
[419] 0.418 0.419 0.420 0.421 0.422 0.423 0.424 0.425 0.426 0.427 0.428 0.429 0.430 0.431 0.432 0.433 0.434 0.435 0.436
[438] 0.437 0.438 0.439 0.440 0.441 0.442 0.443 0.444 0.445 0.446 0.447 0.448 0.449 0.450 0.451 0.452 0.453 0.454 0.455
[457] 0.456 0.457 0.458 0.459 0.460 0.461 0.462 0.463 0.464 0.465 0.466 0.467 0.468 0.469 0.470 0.471 0.472 0.473 0.474
[476] 0.475 0.476 0.477 0.478 0.479 0.480 0.481 0.482 0.483 0.484 0.485 0.486 0.487 0.488 0.489 0.490 0.491 0.492 0.493
[495] 0.494 0.495 0.496 0.497 0.498 0.499 0.500
One option is to use sapply and between:
library(dplyr)
# iterate over left endpoints only; iterating over seq_along(intervals) would run
# one element too far, making intervals[x + 1] NA on the last iteration
sapply(seq_len(length(intervals) - 1), function(x) {
  sum(between(a, intervals[x], intervals[x + 1]))
})
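For large vectors, a fully vectorized alternative is findInterval() combined with tabulate(); a minimal sketch, assuming intervals is sorted and spans all values of a (note findInterval() treats intervals as left-closed, while hist() defaults to right-closed, so counts can differ for values sitting exactly on a boundary):
set.seed(123)
a = runif(10000) / 2
intervals = seq(0, 0.5, 0.001)

# findInterval() returns, for each element of a, the index of the interval it falls in;
# tabulate() then counts how many elements landed in each of the 500 bins
counts = tabulate(findInterval(a, intervals), nbins = length(intervals) - 1)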

Use group by logic with lapply function

I can use the function tabyl from the janitor package like this to apply tabyl to every column:
library(janitor)
lapply(mtcars[, 2:4], tabyl)
What I really want to do is group by cyl and then apply tabyl to all the specified columns, something like this (which does not work):
lapply(mtcars[,2:4],tabyl(cyl))
How would I put the line above into an lapply function? Or is there some other way of applying group-by logic?
Please note, I have hundreds of variables in my actual data, and I want to apply tabyl to almost all of them (all the numeric ones at least). So I need a way of calling tabyl on them without explicitly typing out the variable names!
I want the output to look like the table shown in the answer below, except with MANY more variables. Imagine mtcars had 104 variables and I wanted to apply this grouped tabyl to only the numeric ones.
There are lots of ways to generate counts and frequencies by multiple variables. A solution with tables::tabular() enables one to display the "by group" on the column dimension, and other variables on the row dimension of a table.
We'll use the mtcars data to display disp and hp on the row dimension, and cyl on the column dimension.
library(tables)
tabular(((Factor(disp) + 1) + (Factor(hp) + 1)) ~
          (Factor(cyl)) * ((n = 1) + Percent("col")),
        data = mtcars)
...and the output:
cyl
4 6 8
n Percent n Percent n Percent
disp 71.1 1 9.091 0 0.00 0 0.000
75.7 1 9.091 0 0.00 0 0.000
78.7 1 9.091 0 0.00 0 0.000
79 1 9.091 0 0.00 0 0.000
95.1 1 9.091 0 0.00 0 0.000
108 1 9.091 0 0.00 0 0.000
120.1 1 9.091 0 0.00 0 0.000
120.3 1 9.091 0 0.00 0 0.000
121 1 9.091 0 0.00 0 0.000
140.8 1 9.091 0 0.00 0 0.000
145 0 0.000 1 14.29 0 0.000
146.7 1 9.091 0 0.00 0 0.000
160 0 0.000 2 28.57 0 0.000
167.6 0 0.000 2 28.57 0 0.000
225 0 0.000 1 14.29 0 0.000
258 0 0.000 1 14.29 0 0.000
275.8 0 0.000 0 0.00 3 21.429
301 0 0.000 0 0.00 1 7.143
304 0 0.000 0 0.00 1 7.143
318 0 0.000 0 0.00 1 7.143
350 0 0.000 0 0.00 1 7.143
351 0 0.000 0 0.00 1 7.143
360 0 0.000 0 0.00 2 14.286
400 0 0.000 0 0.00 1 7.143
440 0 0.000 0 0.00 1 7.143
460 0 0.000 0 0.00 1 7.143
472 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
hp 52 1 9.091 0 0.00 0 0.000
62 1 9.091 0 0.00 0 0.000
65 1 9.091 0 0.00 0 0.000
66 2 18.182 0 0.00 0 0.000
91 1 9.091 0 0.00 0 0.000
93 1 9.091 0 0.00 0 0.000
95 1 9.091 0 0.00 0 0.000
97 1 9.091 0 0.00 0 0.000
105 0 0.000 1 14.29 0 0.000
109 1 9.091 0 0.00 0 0.000
110 0 0.000 3 42.86 0 0.000
113 1 9.091 0 0.00 0 0.000
123 0 0.000 2 28.57 0 0.000
150 0 0.000 0 0.00 2 14.286
175 0 0.000 1 14.29 2 14.286
180 0 0.000 0 0.00 3 21.429
205 0 0.000 0 0.00 1 7.143
215 0 0.000 0 0.00 1 7.143
230 0 0.000 0 0.00 1 7.143
245 0 0.000 0 0.00 2 14.286
264 0 0.000 0 0.00 1 7.143
335 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
UPDATE: automate the process
In the comments to my answer, the original poster asked how one might automate tabular() to avoid having to type out all the variables to be tabulated. We can do this with lapply() and an anonymous function.
Since the OP used column numbers in their question, we'll create a vector of columns from the mtcars data frame to be tabulated. We'll use that as the input to lapply(), along with two other arguments: one for the data frame, and another to specify the column variable of the table. Since the column variable is a single variable, we specify it by name rather than by number.
# generalize and automate
varList <- 2:4
lapply(varList, function(x, df, byVar) {
  tabular((Factor(df[[x]], paste(colnames(df)[x])) + 1) ~
            ((Factor(df[[byVar]], paste(byVar))) * ((n = 1) + Percent("col"))),
          data = df)
}, mtcars, "cyl")
The tricky part is automating the process without the output tables showing row headers of df[[x]] and column headers of df[[byVar]]. To avoid this, we extract the column name for the row dimension with colnames(), and we overwrite the column header by pasting the byVar argument into it.
...and the output:
[[1]]
cyl
4 6 8
cyl n Percent n Percent n Percent
4 11 100 0 0 0 0
6 0 0 7 100 0 0
8 0 0 0 0 14 100
All 11 100 7 100 14 100
[[2]]
cyl
4 6 8
disp n Percent n Percent n Percent
71.1 1 9.091 0 0.00 0 0.000
75.7 1 9.091 0 0.00 0 0.000
78.7 1 9.091 0 0.00 0 0.000
79 1 9.091 0 0.00 0 0.000
95.1 1 9.091 0 0.00 0 0.000
108 1 9.091 0 0.00 0 0.000
120.1 1 9.091 0 0.00 0 0.000
120.3 1 9.091 0 0.00 0 0.000
121 1 9.091 0 0.00 0 0.000
140.8 1 9.091 0 0.00 0 0.000
145 0 0.000 1 14.29 0 0.000
146.7 1 9.091 0 0.00 0 0.000
160 0 0.000 2 28.57 0 0.000
167.6 0 0.000 2 28.57 0 0.000
225 0 0.000 1 14.29 0 0.000
258 0 0.000 1 14.29 0 0.000
275.8 0 0.000 0 0.00 3 21.429
301 0 0.000 0 0.00 1 7.143
304 0 0.000 0 0.00 1 7.143
318 0 0.000 0 0.00 1 7.143
350 0 0.000 0 0.00 1 7.143
351 0 0.000 0 0.00 1 7.143
360 0 0.000 0 0.00 2 14.286
400 0 0.000 0 0.00 1 7.143
440 0 0.000 0 0.00 1 7.143
460 0 0.000 0 0.00 1 7.143
472 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
[[3]]
cyl
4 6 8
hp n Percent n Percent n Percent
52 1 9.091 0 0.00 0 0.000
62 1 9.091 0 0.00 0 0.000
65 1 9.091 0 0.00 0 0.000
66 2 18.182 0 0.00 0 0.000
91 1 9.091 0 0.00 0 0.000
93 1 9.091 0 0.00 0 0.000
95 1 9.091 0 0.00 0 0.000
97 1 9.091 0 0.00 0 0.000
105 0 0.000 1 14.29 0 0.000
109 1 9.091 0 0.00 0 0.000
110 0 0.000 3 42.86 0 0.000
113 1 9.091 0 0.00 0 0.000
123 0 0.000 2 28.57 0 0.000
150 0 0.000 0 0.00 2 14.286
175 0 0.000 1 14.29 2 14.286
180 0 0.000 0 0.00 3 21.429
205 0 0.000 0 0.00 1 7.143
215 0 0.000 0 0.00 1 7.143
230 0 0.000 0 0.00 1 7.143
245 0 0.000 0 0.00 2 14.286
264 0 0.000 0 0.00 1 7.143
335 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
One way is this, although I don't know if you need the cyl column:
by(mtcars[, 2:4], mtcars$cyl, lapply, tabyl)
Or a tidy way (I think the list part can be improved):
library(dplyr)
library(janitor)

out = mtcars[, 2:4] %>%
  mutate(id = cyl) %>%
  group_by(id) %>%
  summarize_all(~ list(tabyl(.)))
out
# A tibble: 3 x 4
id cyl disp hp
<dbl> <list> <list> <list>
1 4 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [10 × 3]>
2 6 <df[,3] [1 × 3]> <df[,3] [5 × 3]> <df[,3] [4 × 3]>
3 8 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [9 × 3]>
out %>% filter(id==4) %>% pull(hp)
[[1]]
. n percent
52 1 0.09090909
62 1 0.09090909
65 1 0.09090909
66 2 0.18181818
91 1 0.09090909
93 1 0.09090909
95 1 0.09090909
97 1 0.09090909
109 1 0.09090909
113 1 0.09090909
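With a recent dplyr (0.8 or later), group_map() gives a similar grouped apply without the list-column bookkeeping; a minimal sketch (group_map() drops the grouping column from each group's data before the function runs):
library(dplyr)
library(janitor)

# returns one list element per cyl value, each holding the tabyl of every remaining column
mtcars %>%
  select(cyl, disp, hp) %>%
  group_by(cyl) %>%
  group_map(~ lapply(.x, tabyl))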

Error when using DiceKriging function km: 'Error in t.default(T) : argument is not a matrix'

I'm attempting to apply the km function from the DiceKriging package to a multivariate dataset. When I use my entire dataset (dimensions = [938,13]), the algorithm runs without problem. When I use a smaller subset of this dataset (dimensions = [94,13]), I encounter this error:
Error in t.default(T) : argument is not a matrix
I'm not sure what's causing this error to occur when analyzing the data subset. For reference, here's the data and code I'm running:
g<-km(design=sub.ev,response=sub.rv,covtype="matern5_2")
> sub.ev
1 -0.1519272795 1 0 0 27.769 27.45715 8.02 7.330 21.16 17.73979 0.000 4119.280 0.5338750
2 -0.1436123857 1 0 0 27.420 27.45715 8.04 6.895 20.19 17.73979 0.000 4119.280 0.5338750
3 -0.1144934053 1 0 0 27.021 27.45715 8.06 6.690 19.54 17.73979 0.000 4119.280 0.5338750
4 -0.1585420923 1 0 0 26.531 27.45715 8.09 6.160 18.64 17.73979 0.000 4119.280 0.5338750
5 -0.0588867480 1 0 0 26.107 27.45715 8.11 5.790 17.71 17.73979 0.000 4119.280 0.5338750
6 -0.0402306730 1 0 0 25.704 27.45715 8.15 5.840 17.26 17.73979 0.000 4119.280 0.5338750
7 0.0161781773 1 0 0 25.265 27.45715 8.18 5.390 16.77 17.73979 0.000 4119.280 0.5338750
8 0.0660891620 1 0 0 24.967 27.45715 8.21 5.425 16.18 17.73979 0.000 4119.280 0.5338750
9 0.0079745010 1 0 0 24.665 27.45715 8.24 5.250 16.51 17.73979 0.000 4119.280 0.5338750
10 -0.0191421967 1 0 0 24.416 27.45715 8.27 5.090 15.46 17.73979 0.000 4119.280 0.5338750
11 0.0364618430 1 0 0 24.232 27.45715 8.29 4.820 14.47 17.73979 0.000 4119.280 0.5338750
12 0.0160652203 1 0 0 24.077 27.45715 8.31 4.745 15.57 17.73979 0.000 4119.280 0.5338750
13 0.0763233707 1 0 0 23.906 27.45715 8.33 4.680 15.14 17.73979 0.000 4119.280 0.5338750
14 0.0211064293 1 0 0 23.794 27.45715 8.35 4.940 18.25 17.73979 0.000 4119.280 0.5338750
15 0.0166228227 1 0 0 23.642 27.45715 8.36 5.190 14.60 17.73979 0.000 4119.280 0.5338750
16 0.0022511447 1 0 0 23.531 27.45715 8.37 5.425 15.30 17.73979 0.000 4119.280 0.5338750
17 -0.0122220320 1 0 0 23.445 27.45715 8.39 6.110 14.79 17.73979 0.000 4119.280 0.5338750
18 -0.0057961107 1 0 0 23.442 27.45715 8.39 5.940 14.49 17.73979 0.000 4119.280 0.5338750
19 -0.0322631907 1 0 0 23.512 27.45715 8.39 5.380 14.38 17.73979 0.000 4119.280 0.5338750
95 0.0159845000 1 0 0 27.306 27.45715 8.24 7.380 17.14 17.73979 0.000 4119.280 0.5338750
96 -0.0437959553 1 0 0 27.054 27.45715 8.25 7.100 15.35 17.73979 0.000 4119.280 0.5338750
97 -0.0805486920 1 0 0 26.520 26.88009 8.27 6.900 14.87 16.96688 0.000 3725.365 0.5422500
98 -0.0941385073 1 0 0 26.306 26.88009 8.30 6.725 15.19 16.96688 0.000 3725.365 0.5422500
99 -0.1159254400 1 0 0 26.039 26.88009 8.32 6.590 15.42 16.96688 0.000 3725.365 0.5422500
100 -0.0362266430 1 0 0 25.637 26.88009 8.36 6.280 14.34 16.96688 0.000 3725.365 0.5422500
101 0.0326682983 1 0 0 25.299 26.88009 8.38 6.100 14.36 16.96688 0.000 3725.365 0.5422500
102 0.0471005793 1 0 0 24.742 26.88009 8.44 5.955 13.48 16.96688 0.000 3725.365 0.5422500
103 -0.0596346010 1 0 0 24.262 26.88009 8.49 5.530 12.91 16.96688 0.000 3725.365 0.5422500
104 -0.0043536683 1 0 0 23.856 26.88009 8.52 5.315 12.80 16.96688 0.000 3725.365 0.5422500
105 -0.0177714297 1 0 0 23.578 26.88009 8.54 5.240 13.01 16.96688 0.000 3725.365 0.5422500
106 0.0169363000 1 0 0 23.313 26.88009 8.56 5.225 12.69 16.96688 0.000 3725.365 0.5422500
107 -0.0170451183 1 0 0 23.023 26.88009 8.59 5.090 12.71 16.96688 0.000 3725.365 0.5422500
108 0.0231896353 1 0 0 22.755 26.88009 8.62 5.280 11.58 16.96688 0.000 3725.365 0.5422500
109 0.0053651757 1 0 0 22.510 26.88009 8.65 5.690 11.28 16.96688 0.000 3725.365 0.5422500
110 0.0281674793 1 0 0 22.342 26.88009 8.68 5.555 11.21 16.96688 0.000 3725.365 0.5422500
111 0.0009483843 1 0 0 22.168 26.88009 8.70 5.770 10.68 16.96688 0.000 3725.365 0.5422500
112 0.0147559413 1 0 0 22.151 26.88009 8.69 5.995 11.57 16.96688 0.000 3725.365 0.5422500
190 -0.0115338953 1 0 0 27.062 25.91990 8.21 8.000 15.35 14.07031 0.000 3949.390 0.5342917
191 -0.0189870410 1 0 0 26.545 25.91990 8.22 7.870 14.21 14.07031 0.000 3949.390 0.5342917
192 -0.0184237180 1 0 0 26.104 25.91990 8.25 7.795 20.04 14.07031 0.000 3949.390 0.5342917
193 -0.0319295797 1 0 0 25.859 25.91990 8.26 7.730 13.14 14.07031 0.000 3949.390 0.5342917
194 -0.0184753123 1 0 0 25.573 25.91990 8.30 7.585 12.93 14.07031 0.000 3949.390 0.5342917
195 -0.0197481060 1 0 0 25.005 25.91990 8.36 7.490 11.29 14.07031 0.000 3949.390 0.5342917
196 -0.0215467360 1 0 0 24.710 25.91990 8.39 7.245 11.10 14.07031 0.000 3949.390 0.5342917
197 0.0265223447 1 0 0 24.455 25.91990 8.42 7.240 11.44 14.07031 0.000 3949.390 0.5342917
198 0.0470763840 1 0 0 24.087 25.91990 8.45 7.225 11.42 14.07031 0.000 3949.390 0.5342917
199 0.0622169450 1 0 0 23.673 25.91990 8.48 7.260 11.20 14.07031 0.000 3949.390 0.5342917
200 -0.0104582193 1 0 0 23.301 25.91990 8.52 7.190 11.57 14.07031 0.000 3949.390 0.5342917
201 0.0121972077 1 0 0 23.005 25.91990 8.55 7.220 10.71 14.07031 0.000 3949.390 0.5342917
202 0.0219721027 1 0 0 22.745 25.91990 8.59 7.245 10.86 14.07031 0.000 3949.390 0.5342917
203 0.0208879210 1 0 0 22.576 25.91990 8.61 7.180 10.62 14.07031 0.000 3949.390 0.5342917
204 0.0192644400 1 0 0 22.417 25.91990 8.63 7.220 10.47 14.07031 0.000 3949.390 0.5342917
205 0.0066226250 1 0 0 22.243 25.91990 8.65 7.170 10.29 14.07031 0.000 3949.390 0.5342917
206 0.0036012053 1 0 0 22.136 25.91990 8.67 7.200 10.86 14.07031 0.000 3949.390 0.5342917
207 -0.0027906963 1 0 0 22.012 25.91990 8.69 7.150 10.57 14.07031 0.000 3949.390 0.5342917
290 0.2266387893 1 0 0 27.763 26.21360 8.21 7.765 23.37 17.97479 0.000 3554.510 0.5938333
291 0.1151646527 1 0 0 27.377 26.21360 8.23 7.720 24.43 17.97479 0.000 3554.510 0.5938333
292 -0.0218444193 1 0 0 27.285 26.21360 8.23 7.655 23.40 17.97479 0.000 3554.510 0.5938333
293 -0.0908422353 1 0 0 26.935 26.21360 8.26 7.590 21.96 17.97479 0.000 3554.510 0.5938333
294 -0.1716709177 1 0 0 26.792 26.21360 8.28 7.540 23.30 17.97479 0.000 3554.510 0.5938333
295 -0.1943201847 1 0 0 26.869 26.21360 8.28 7.420 25.06 17.97479 0.000 3554.510 0.5938333
296 -0.1529985130 1 0 0 26.733 26.21360 8.28 7.310 33.62 17.97479 0.000 3554.510 0.5938333
297 -0.1106344563 1 0 0 26.394 26.21360 8.29 7.160 29.06 17.97479 0.000 3554.510 0.5938333
298 -0.0638089193 1 0 0 25.973 26.21360 8.32 7.010 23.25 17.97479 0.000 3554.510 0.5938333
299 -0.1208449610 1 0 0 25.501 26.21360 8.37 6.970 21.10 17.97479 0.000 3554.510 0.5938333
300 -0.2310616323 1 0 0 25.192 26.21360 8.40 7.010 19.16 17.97479 0.000 3554.510 0.5938333
301 -0.2043969970 1 0 0 24.867 26.21360 8.43 7.030 17.80 17.97479 0.000 3554.510 0.5938333
302 -0.2003585363 1 0 0 24.634 26.21360 8.45 7.080 17.01 17.97479 0.000 3554.510 0.5938333
303 -0.2535806687 1 0 0 24.468 26.21360 8.47 7.130 16.41 17.97479 0.000 3554.510 0.5938333
304 -0.2464920640 1 0 0 24.234 26.21360 8.50 7.055 15.50 17.97479 0.000 3554.510 0.5938333
389 0.2461277410 1 0 0 27.981 26.78711 8.29 6.760 20.65 22.02125 0.000 3589.260 0.7715417
390 0.1660650063 1 0 0 27.915 26.78711 8.29 6.575 22.44 22.02125 0.000 3589.260 0.7715417
391 -0.0609562143 1 0 0 27.757 26.78711 8.30 6.580 22.92 22.02125 0.000 3589.260 0.7715417
392 -0.2020911323 1 0 0 27.527 26.78711 8.32 6.590 22.91 22.02125 0.000 3589.260 0.7715417
393 -0.2980735343 1 0 0 27.563 26.78711 8.31 6.510 24.30 22.02125 0.000 3589.260 0.7715417
394 -0.2370078227 1 0 0 27.400 26.78711 8.32 6.525 22.67 22.02125 0.000 3589.260 0.7715417
395 -0.3440380117 1 0 0 27.274 26.78711 8.32 6.530 22.45 22.02125 0.000 3589.260 0.7715417
396 -0.0057092573 1 0 0 27.132 26.78711 8.30 6.490 22.08 22.02125 0.000 3589.260 0.7715417
397 -0.0241878650 1 0 0 27.018 26.78711 8.30 6.450 24.00 22.02125 0.000 3589.260 0.7715417
398 -0.2080665820 1 0 0 26.834 26.78711 8.32 6.405 25.81 22.02125 0.000 3589.260 0.7715417
399 -0.1716383953 1 0 0 26.637 26.78711 8.33 6.340 25.19 22.02125 0.000 3589.260 0.7715417
400 -0.2570107420 1 0 0 26.476 26.78711 8.34 6.300 22.05 22.02125 0.000 3589.260 0.7715417
495 -0.1302196527 1 0 0 25.967 26.53893 8.40 6.300 11.90 15.57448 2.286 2313.087 0.6087083
496 -0.0391473870 1 0 0 25.708 26.53893 8.41 6.260 10.77 15.57448 2.286 2313.087 0.6087083
587 -0.0382961500 1 0 0 28.647 26.98353 8.28 8.110 17.52 12.12667 0.000 3980.156 0.4227083
588 -0.0035965477 1 0 0 28.652 26.98353 8.26 7.920 21.14 12.12667 0.000 3980.156 0.4227083
589 -0.0050414577 1 0 0 28.307 26.98353 8.27 7.830 18.39 12.12667 0.000 3980.156 0.4227083
590 0.0354186967 1 0 0 27.896 26.98353 8.27 6.825 17.35 12.12667 0.000 3980.156 0.4227083
591 -0.0676664363 1 0 0 27.581 26.98353 8.27 6.110 16.06 12.12667 0.000 3980.156 0.4227083
592 -0.1723716683 1 0 0 27.223 26.98353 8.29 6.245 15.14 12.12667 0.000 3980.156 0.4227083
688 0.1598606430 1 0 0 28.738 27.54833 8.35 5.965 24.99 15.51010 0.000 2554.804 0.4730417
912 -0.1977195740 1 0 0 26.653 27.24447 9.09 7.635 11.38 12.71729 0.000 4002.945 0.5135417
913 -0.1306465143 1 0 0 27.399 27.24447 9.07 7.420 12.84 12.71729 0.000 4002.945 0.5135417
914 -0.1175953210 1 0 0 28.024 27.24447 9.01 7.270 13.91 12.71729 0.000 4002.945 0.5135417
> sub.rv
[1] 3.24 2.65 2.18 1.75 1.52 1.32 1.25 1.18 1.16 1.10 1.05 1.01 0.95 0.88 0.83 0.76 0.72 0.61 0.58 3.64 3.46 2.97 2.64 2.27 2.07 1.72 1.33 1.03 0.89 0.82 0.69 0.64 0.60
[34] 0.55 0.44 0.33 0.26 4.65 4.10 3.55 3.30 2.70 2.12 1.74 1.53 1.27 1.02 0.88 0.76 0.66 0.58 0.54 0.47 0.40 0.28 5.87 5.24 5.09 4.22 3.90 3.94 3.41 2.75 2.18 1.86 1.67
[67] 1.52 1.42 1.34 1.35 5.88 5.56 4.88 3.98 3.76 3.31 3.11 2.78 2.41 1.97 1.65 1.39 1.63 1.44 4.08 4.83 3.84 3.23 2.83 2.36 4.33 1.86 3.17 4.48
I've run into the same issue. It is resolved when you remove the variables in your design matrix that have zero variance (as these are simply constants). In your case, you need to remove columns 3:5.
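If you would rather not count columns by eye, here is a minimal sketch for dropping constant (zero-variance) columns before fitting, assuming sub.ev contains only numeric columns:
library(DiceKriging)

# keep only design columns whose variance is non-zero (i.e. not constant)
keep <- apply(sub.ev, 2, function(col) var(col, na.rm = TRUE) > 0)
g <- km(design = sub.ev[, keep, drop = FALSE], response = sub.rv, covtype = "matern5_2")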

import specific rows from "txt" into R

I have a "example.txt" document just as follows:
SIGNAL: 40 41 42
0.406 0.043 0.051 0.021 0.013
0.056 0.201 0.026 0.009 0.000
0.000 0.128 0 0.009 0.000
TOTAL: 0.657
SIGNAL: 44 45 46 48
0.128 0.338 0.026
0.333 0.03 0.000
0.060 0.013 0.004
0.009 0.017 0.009
0.013 0 0.000
TOTAL: 0.704
SIGNAL: 51 52 54
0.368 0.081 0.085 0.004
0.162 0.09 0.064 0.073
0.013 0.017 0.009 0.000
TOTAL: 0.266
SIGNAL: 60 61 62 63 64 65 66 67
0.530 0.030
0.009 0.179
0.154 0.004
0.068 0.009
TOTAL: 0.796
I want to import the rows between "SIGNAL: 44 45 46 48" and "TOTAL: 0.704" into R. I use read.table("example.txt", skip = 6, nrow = 5) to extract these specific rows, and it works:
V1 V2 V3
1 0.128 0.338 0.026
2 0.333 0.030 0.000
3 0.060 0.013 0.004
4 0.009 0.017 0.009
5 0.013 0.000 0.000
However, my real data is very big (450,000 rows). If I want to extract the rows between "SIGNAL: 3000 3001 3002 3003" and the next "TOTAL", how can I do that? Thank you so much!
I have worked it out based on akrun's code. For example, to extract the first two sets I can use:
lines <- readLines('example.txt')
g <- c(40, 44)
sapply(1:length(g), function(x) {
  Map(function(i, j) read.table(text = lines[(i + 1):(j - 1)], sep = '', header = FALSE),
      grep(paste('SIGNAL:', g[x]), lines),
      grep('TOTAL', lines)[which(grep(paste('SIGNAL:', g[x]), lines) == grep('SIGNAL', lines))])
})
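For pulling out a single block by its first SIGNAL number, a simpler sketch may scale better on 450,000 lines; read_signal_block below is a hypothetical helper, not from any package, and it assumes every block starts with a "SIGNAL:" line and ends at the next "TOTAL:" line:
read_signal_block <- function(file, first_id) {
  lines <- readLines(file)
  start <- grep(paste0("^SIGNAL: ", first_id, "\\b"), lines)[1]  # header of the wanted block
  ends  <- grep("^TOTAL:", lines)
  end   <- ends[ends > start][1]                                 # first TOTAL after that header
  read.table(text = lines[(start + 1):(end - 1)], header = FALSE)
}

read_signal_block("example.txt", 3000)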

Referring to other cells in R without using a for loop

I am new to R and one thing I have been told again and again is that there really is no need for for loops. I have had some success with apply but could not figure out how to use it in this instance.
Here is the data I am working with:
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 NA
53 2.28 2.42 NOV1 13 345 2.350 NA
54 1.51 1.57 NOV1 13 350 1.540 NA
55 0.99 1.07 NOV1 13 355 1.030 NA
56 0.66 0.71 NOV1 13 360 0.685 NA
57 0.46 0.51 NOV1 13 365 0.485 NA
58 0.33 0.37 NOV1 13 370 0.350 NA
59 0.25 0.28 NOV1 13 375 0.265 NA
60 0.18 0.24 NOV1 13 380 0.210 NA
61 0.11 0.20 NOV1 13 385 0.155 NA
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
In column 6 (called V6), I want each value to be twice the value in the Price column three rows below. For example, row 1 of column 6 is 3.08, which is 2 * 1.54, the Price in row 4. I would like to do this for every cell in column 6 until the Price values run out (the last three rows); NA is fine in column 6 there.
Here is how I accomplished this:
for (i in 1:11) {
  data[i, 6] <- 2 * data[i + 3, 5]
}
Is there a faster/easier/more appropriate way to do this?
Here is the final data as I want it.
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 2.06
53 2.28 2.42 NOV1 13 345 2.350 1.37
54 1.51 1.57 NOV1 13 350 1.540 0.97
55 0.99 1.07 NOV1 13 355 1.030 0.70
56 0.66 0.71 NOV1 13 360 0.685 0.53
57 0.46 0.51 NOV1 13 365 0.485 0.42
58 0.33 0.37 NOV1 13 370 0.350 0.31
59 0.25 0.28 NOV1 13 375 0.265 0.22
60 0.18 0.24 NOV1 13 380 0.210 0.21
61 0.11 0.20 NOV1 13 385 0.155 0.20
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
Thank you.
Use mydata$V6 <- 2 * c(mydata$Price[-(1:3)], rep(NA, 3))
Here df1 is your data. I used sapply, which should be faster than a for loop:
df1$V6 <- sapply(1:nrow(df1), function(x) 2 * df1[x + 3, 5])
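Another vectorized option, assuming dplyr is available, is lead(), which shifts a vector up by n positions and pads the end with NAs:
library(dplyr)

# lead(Price, 3) is the Price value three rows further down; the last three rows become NA
mydata <- mydata %>% mutate(V6 = 2 * lead(Price, 3))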
