I would like to merge and sum the values of each row that contains duplicated IDs.
For example, the data frame below contains a duplicated symbol 'LOC102723897'. I would like to merge these two rows and sum the values within each column, so that only one row remains for the duplicated symbol.
> head(y$genes)
SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22
1 32 29 23 20 27 105 80 64 83 80 94 58 122 76 78 70 34 32 45 42 138 30
2 246 568 437 343 304 291 542 457 608 433 218 329 483 376 410 296 550 533 537 473 296 382
3 30 23 30 13 20 18 23 13 31 11 15 27 36 21 23 25 26 27 37 27 31 16
4 1450 2716 2670 2919 2444 1668 2923 2318 3867 2084 1121 2175 3022 2308 2541 1613 2196 1851 2843 2078 2180 1902
5 288 366 327 334 314 267 550 410 642 475 219 414 679 420 425 308 359 406 550 398 399 268
6 34 59 62 68 42 31 49 45 62 51 40 32 30 39 41 75 54 59 83 99 37 37
SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30 Symbol
1 41 23 57 160 84 67 87 113 LOC102723897
2 423 535 624 304 568 495 584 603 LINC01128
3 31 21 49 13 33 31 14 31 LINC00115
4 2453 3041 3590 2343 3450 3725 3336 3850 NOC2L
5 403 347 468 478 502 563 611 577 LOC102723897
6 45 51 56 107 79 105 92 131 PLEKHN1
> dim(y)
[1] 12928 30
I attempted using plyr to merge rows based on the 'Symbol' column, but it's not working.
> ddply(y$genes,"Symbol",numcolwise(sum))
> dim(y)
[1] 12928 30
> length(y$genes$Symbol)
[1] 12928
> length(unique(y$genes$Symbol))
[1] 12896
Group by Symbol and sum all the other columns.
library(dplyr)
df %>% group_by(Symbol) %>% summarise_all(sum)
Using data.table:
library(data.table)
setDT(df)[ , lapply(.SD, sum),by="Symbol"]
We can just use aggregate from base R:
aggregate(.~ Symbol, df, FUN = sum)
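Applied to the question's object, the grouping goes on y$genes. A minimal sketch, assuming y$genes is a plain data frame with a Symbol column plus numeric count columns and that dplyr >= 1.0 is available for across(); note the result must be assigned to keep it:
library(dplyr)
collapsed <- y$genes %>%
  group_by(Symbol) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")
nrow(collapsed)  # one row per unique Symbol (12896 here)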
Related
I'd like to do several manipulations with datasets that are built into R from the packages I have. So, first, I made a vector with the datasets' names, but when I tried to filter the datasets which have only one column, I got an error saying that the length of the argument is 0. Here is the code:
for (i in datasets){
  if (ncol(i)==1){
    dataset <- i
    datasets <- c(dataset, datasets)
  }
}
It treats the names of the datasets as a character vector.
Here is the head of the aforementioned vector: [1] ability.cov airmiles AirPassengers airquality anscombe attenu. It's silly, but how could I treat the entries as dataframes?
I don't fully understand your logic, but based on your code, you want to identify which datasets have one column using ncol(x) == 1. If so, you need to deal with a few issues:
The various structures of the datasets: ncol returns the number of columns for a data.frame or a matrix, but not for a time series. For example, ncol(anscombe) returns 8 while ncol(AirPassengers) returns NULL. If you decide to use ncol, you need to coerce each dataset to a data.frame with as.data.frame.
Indexing the character vector of dataset names: you need the dataset object itself, not its character name, to call as.data.frame on it. One way of doing this is eval(parse(text = the_name)).
Storing the result: you can use c() to combine the results, but the datasets will be coerced to vectors and lose their original structure. I recommend a list to preserve the data frame structure of each dataset.
Here is one possible solution based on those considerations:
datasets <- c("ability.cov", "airmiles", "AirPassengers", "airquality", "anscombe", "attenu")
single_col_datasets <- vector('list', 1)
for (i in seq_along(datasets)){
if (ncol(as.data.frame(eval(parse(text = datasets[i])))) == 1){
single_col_datasets[[i]] <- as.data.frame(eval(parse(text = datasets[i])))
names(single_col_datasets[[i]]) <- datasets[i]
}
not.null.element <- single_col_datasets[lengths(single_col_datasets) != 0]
new.datasets <- list(not.null.element, datasets)
}
Here is the result:
new.datasets
[[1]]
[[1]][[1]]
airmiles
1 412
2 480
3 683
4 1052
5 1385
6 1418
7 1634
8 2178
9 3362
10 5948
11 6109
12 5981
13 6753
14 8003
15 10566
16 12528
17 14760
18 16769
19 19819
20 22362
21 25340
22 25343
23 29269
24 30514
[[1]][[2]]
AirPassengers
1 112
2 118
3 132
4 129
5 121
6 135
7 148
8 148
9 136
10 119
11 104
12 118
13 115
14 126
15 141
16 135
17 125
18 149
19 170
20 170
21 158
22 133
23 114
24 140
25 145
26 150
27 178
28 163
29 172
30 178
31 199
32 199
33 184
34 162
35 146
36 166
37 171
38 180
39 193
40 181
41 183
42 218
43 230
44 242
45 209
46 191
47 172
48 194
49 196
50 196
51 236
52 235
53 229
54 243
55 264
56 272
57 237
58 211
59 180
60 201
61 204
62 188
63 235
64 227
65 234
66 264
67 302
68 293
69 259
70 229
71 203
72 229
73 242
74 233
75 267
76 269
77 270
78 315
79 364
80 347
81 312
82 274
83 237
84 278
85 284
86 277
87 317
88 313
89 318
90 374
91 413
92 405
93 355
94 306
95 271
96 306
97 315
98 301
99 356
100 348
101 355
102 422
103 465
104 467
105 404
106 347
107 305
108 336
109 340
110 318
111 362
112 348
113 363
114 435
115 491
116 505
117 404
118 359
119 310
120 337
121 360
122 342
123 406
124 396
125 420
126 472
127 548
128 559
129 463
130 407
131 362
132 405
133 417
134 391
135 419
136 461
137 472
138 535
139 622
140 606
141 508
142 461
143 390
144 432
[[2]]
[1] "ability.cov" "airmiles" "AirPassengers" "airquality" "anscombe" "attenu"
You can use the get function:
for (i in datasets){
  # as.data.frame() prevents ncol() from returning NULL for time-series datasets
  if (ncol(as.data.frame(get(i))) == 1){
    dataset <- i
    datasets <- c(dataset, datasets)
  }
}
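For comparison, here is a loop-free sketch of the same idea (assuming the datasets character vector from above, and that every listed dataset can be coerced with as.data.frame):
datasets <- c("ability.cov", "airmiles", "AirPassengers", "airquality", "anscombe", "attenu")
# fetch each object by name and coerce it to a data.frame
objs <- lapply(datasets, function(nm) as.data.frame(get(nm)))
names(objs) <- datasets
# keep only the single-column data frames
single_col <- Filter(function(d) ncol(d) == 1, objs)
names(single_col)  # "airmiles" "AirPassengers"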
Assume a data.frame as follows:
df <- data.frame(name = paste0("Person",rep(1:30)),
number = sample(1:100, 30, replace=TRUE),
focus = sample(1:500, 30, replace=TRUE))
I want to split the above data.frame into 9 groups, each with 9 observations. Each person can be assigned to multiple groups (sampling with replacement), so that every group still gets its full set of observations (9 groups x 9 observations require 81 rows while the df has only 30).
The output would ideally be a large list of data.frames (1000 of them in the actual case).
Are there any efficient ways of doing this? This is just a sample data.frame; the actual df has ~10k rows and will require 1000 groups, each with 30 rows.
Many thanks.
Is this what you are looking for?
res <- replicate(1000, df[sample.int(nrow(df), 30, TRUE), ], FALSE)
The df I used:
df <- data.frame(name = paste0("Person",rep(1:1e4)),
number = sample(1:100, 1e4, replace=TRUE),
focus = sample(1:500, 1e4, replace=TRUE))
Output
> res[1:3]
[[1]]
name number focus
529 Person529 5 351
9327 Person9327 4 320
1289 Person1289 78 164
8157 Person8157 46 183
6939 Person6939 38 61
4066 Person4066 26 103
132 Person132 34 39
6576 Person6576 36 397
5376 Person5376 47 456
6123 Person6123 10 18
5318 Person5318 39 42
6355 Person6355 62 212
340 Person340 90 256
7050 Person7050 19 198
1500 Person1500 42 208
175 Person175 34 30
3751 Person3751 99 441
3813 Person3813 93 492
7428 Person7428 72 142
6840 Person6840 58 45
6501 Person6501 95 499
5124 Person5124 16 159
3373 Person3373 38 36
5622 Person5622 40 203
8761 Person8761 9 225
6252 Person6252 75 444
4502 Person4502 58 337
5344 Person5344 24 233
4036 Person4036 59 265
8764 Person8764 45 1
[[2]]
name number focus
8568 Person8568 87 360
3968 Person3968 67 468
4481 Person4481 46 140
8055 Person8055 73 286
7794 Person7794 92 336
1110 Person1110 6 434
6736 Person6736 4 58
9758 Person9758 60 49
9356 Person9356 89 300
9719 Person9719 100 366
4183 Person4183 5 124
1394 Person1394 87 346
2642 Person2642 81 449
3592 Person3592 65 358
579 Person579 21 395
9551 Person9551 39 495
4946 Person4946 73 32
4081 Person4081 98 270
4062 Person4062 27 150
7698 Person7698 52 436
5388 Person5388 89 177
9598 Person9598 91 474
8624 Person8624 3 464
392 Person392 82 483
5710 Person5710 43 293
4942 Person4942 99 350
3333 Person3333 89 91
6789 Person6789 99 259
7115 Person7115 100 320
1431 Person1431 77 263
[[3]]
name number focus
201 Person201 100 272
4674 Person4674 27 410
9728 Person9728 18 275
9422 Person9422 2 396
9783 Person9783 45 37
5552 Person5552 76 109
3871 Person3871 49 277
3411 Person3411 64 24
5799 Person5799 29 131
626 Person626 31 122
3103 Person3103 2 76
8043 Person8043 90 384
3157 Person3157 90 392
7093 Person7093 11 169
2779 Person2779 83 2
2601 Person2601 77 122
9003 Person9003 50 163
9653 Person9653 4 235
9361 Person9361 100 391
4273 Person4273 83 383
4725 Person4725 35 436
2157 Person2157 71 486
3995 Person3995 25 258
3735 Person3735 24 221
303 Person303 81 407
4838 Person4838 64 198
6926 Person6926 90 417
6267 Person6267 82 284
8570 Person8570 67 317
2670 Person2670 21 342
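For the same task without replicate(), a hedged equivalent using lapply (assuming the same df as above):
res <- lapply(seq_len(1000), function(i) df[sample(nrow(df), 30, replace = TRUE), ])
length(res)     # 1000 data frames
nrow(res[[1]])  # 30 rows in each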
I have this data from an R package, where X is the dataset with all the data:
library(ISLR)
data("Hitters")
X=Hitters
head(X)
Here is one part of the data:
AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague
-Andy Allanson 293 66 1 30 29 14 1 293 66 1 30 29 14 A E 446 33 20 NA A
-Alan Ashby 315 81 7 24 38 39 14 3449 835 69 321 414 375 N W 632 43 10 475.0 N
-Alvin Davis 479 130 18 66 72 76 3 1624 457 63 224 266 263 A W 880 82 14 480.0 A
-Andre Dawson 496 141 20 65 78 37 11 5628 1575 225 828 838 354 N E 200 11 3 500.0 N
-Andres Galarraga 321 87 10 39 42 30 2 396 101 12 48 46 33 N E 805 40 4 91.5 N
-Alfredo Griffin 594 169 4 74 51 35 11 4408 1133 19 501 336 194 A W 282 421 25 750.0 A
I want to convert all columns and rows with non-numeric values to zero. Is there any simple way to do this?
I found an example here of how to remove such rows for a single column, but for more columns I would have to do it manually for each one.
Is there any function in R that does this for all columns and rows?
To remove non-numeric columns, perhaps something like this?
df %>%
select(which(sapply(., is.numeric)))
# AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun
#-Andy Allanson 293 66 1 30 29 14 1 293 66 1
#-Alan Ashby 315 81 7 24 38 39 14 3449 835 69
#-Alvin Davis 479 130 18 66 72 76 3 1624 457 63
#-Andre Dawson 496 141 20 65 78 37 11 5628 1575 225
#-Andres Galarraga 321 87 10 39 42 30 2 396 101 12
#-Alfredo Griffin 594 169 4 74 51 35 11 4408 1133 19
# CRuns CRBI CWalks PutOuts Assists Errors Salary
#-Andy Allanson 30 29 14 446 33 20 NA
#-Alan Ashby 321 414 375 632 43 10 475.0
#-Alvin Davis 224 266 263 880 82 14 480.0
#-Andre Dawson 828 838 354 200 11 3 500.0
#-Andres Galarraga 48 46 33 805 40 4 91.5
#-Alfredo Griffin 501 336 194 282 421 25 750.0
or
df %>%
select(-which(sapply(., function(x) is.character(x) | is.factor(x))))
Or much neater (thanks to @AntoniosK):
df %>% select_if(is.numeric)
Update
To additionally replace NAs with 0, you can do
df %>% select_if(is.numeric) %>% replace(is.na(.), 0)
# AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun
#-Andy Allanson 293 66 1 30 29 14 1 293 66 1
#-Alan Ashby 315 81 7 24 38 39 14 3449 835 69
#-Alvin Davis 479 130 18 66 72 76 3 1624 457 63
#-Andre Dawson 496 141 20 65 78 37 11 5628 1575 225
#-Andres Galarraga 321 87 10 39 42 30 2 396 101 12
#-Alfredo Griffin 594 169 4 74 51 35 11 4408 1133 19
# CRuns CRBI CWalks PutOuts Assists Errors Salary
#-Andy Allanson 30 29 14 446 33 20 0.0
#-Alan Ashby 321 414 375 632 43 10 475.0
#-Alvin Davis 224 266 263 880 82 14 480.0
#-Andre Dawson 828 838 354 200 11 3 500.0
#-Andres Galarraga 48 46 33 805 40 4 91.5
#-Alfredo Griffin 501 336 194 282 421 25 750.0
library(ISLR)
data("Hitters")
d = head(Hitters)
library(dplyr)
d %>%
mutate_if(function(x) !is.numeric(x), function(x) 0) %>% # if column is non numeric add zeros
mutate_all(function(x) ifelse(is.na(x), 0, x)) # if there is an NA element replace it with 0
# AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague
# 1 293 66 1 30 29 14 1 293 66 1 30 29 14 0 0 446 33 20 0.0 0
# 2 315 81 7 24 38 39 14 3449 835 69 321 414 375 0 0 632 43 10 475.0 0
# 3 479 130 18 66 72 76 3 1624 457 63 224 266 263 0 0 880 82 14 480.0 0
# 4 496 141 20 65 78 37 11 5628 1575 225 828 838 354 0 0 200 11 3 500.0 0
# 5 321 87 10 39 42 30 2 396 101 12 48 46 33 0 0 805 40 4 91.5 0
# 6 594 169 4 74 51 35 11 4408 1133 19 501 336 194 0 0 282 421 25 750.0 0
If you want to avoid function(x) you can use this
d %>%
mutate_if(Negate(is.numeric), ~0) %>%
mutate_all(~ifelse(is.na(.), 0, .))
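On dplyr 1.0 or later, the same two steps can also be written with across(); a minimal sketch, assuming the same d as above:
d %>%
  mutate(across(!where(is.numeric), ~ 0)) %>%                 # non-numeric columns -> 0
  mutate(across(everything(), ~ replace(.x, is.na(.x), 0)))   # NAs -> 0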
You can get the numeric columns with sapply/inherits.
X <- Hitters
inx <- sapply(X, inherits, c("integer", "numeric"))
Y <- X[inx]
Then it wouldn't make much sense to remove rows with non-numeric entries, since they were already removed, but you could do:
inx <- apply(Y, 1, function(y) all(inherits(y, c("integer", "numeric"))))
Y[inx, ]
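If you also want the remaining NAs set to zero, as in the question, one base R line would do it (a hedged addition, assuming the Y built above):
Y[is.na(Y)] <- 0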
I am new to ggplot2. In fact, I only discovered it last week and I haven't quite figured out yet how to use aesthetics and scales etc. There is probably a very easy solution to my problem but I couldn't find a satisfying answer online.
Sorry for the size of the message, but all the data used is in the following script:
dados
Fres Vc Lu
1 466 30 10
2 416 30 10
3 465 30 10
4 416 30 10
5 464 30 10
6 416 30 10
7 476 30 10
8 412 30 10
9 468 30 10
10 410 30 10
11 470 30 10
12 407 30 10
13 468 30 10
14 412 30 10
15 469 30 10
16 414 30 10
17 469 30 10
18 412 30 10
19 467 30 10
20 409 30 10
21 469 30 10
22 415 30 10
23 471 30 10
24 420 30 10
25 469 30 10
26 416 30 10
27 464 30 10
28 409 30 10
29 465 30 10
30 412 30 10
31 464 30 10
32 409 30 10
33 466 30 10
34 417 30 10
35 466 30 10
36 417 30 10
37 464 30 10
38 414 30 10
39 466 30 10
40 415 30 10
41 585 30 94
42 234 30 94
43 589 30 94
44 231 30 94
45 585 30 94
46 223 30 94
47 586 30 94
48 223 30 94
49 572 30 94
50 233 30 94
51 585 30 94
52 233 30 94
53 589 30 94
54 234 30 94
55 598 30 94
56 237 30 94
57 605 30 94
58 237 30 94
59 586 30 94
60 233 30 94
61 588 30 94
62 227 30 94
63 585 30 94
64 230 30 94
65 586 30 94
66 230 30 94
67 591 30 94
68 237 30 94
69 586 30 94
70 234 30 94
71 592 30 94
72 237 30 94
73 595 30 94
74 236 30 94
75 600 30 94
76 227 30 94
77 592 30 94
78 237 30 94
79 592 30 94
80 240 30 94
81 468 30 10
82 408 30 10
83 471 30 10
84 405 30 10
85 475 30 10
86 403 30 10
87 470 30 10
88 409 30 10
89 478 30 10
90 405 30 10
91 474 30 10
92 403 30 10
93 472 30 10
94 402 30 10
95 478 30 10
96 408 30 10
97 477 30 10
98 406 30 10
99 473 30 10
100 406 30 10
101 474 30 10
102 406 30 10
103 477 30 10
104 411 30 10
105 480 30 10
106 413 30 10
107 479 30 10
108 408 30 10
109 476 30 10
110 406 30 10
111 476 30 10
112 404 30 10
113 472 30 10
114 407 30 10
115 474 30 10
116 411 30 10
117 473 30 10
118 415 30 10
119 479 30 10
120 409 30 10
121 578 30 94
122 370 30 94
123 570 30 94
124 378 30 94
125 575 30 94
126 367 30 94
127 579 30 94
128 371 30 94
129 576 30 94
130 362 30 94
131 579 30 94
132 372 30 94
133 588 30 94
134 375 30 94
135 586 30 94
136 372 30 94
137 589 30 94
138 378 30 94
139 587 30 94
140 375 30 94
141 578 30 94
142 368 30 94
143 575 30 94
144 375 30 94
145 574 30 94
146 376 30 94
147 575 30 94
148 367 30 94
149 580 30 94
150 382 30 94
151 583 30 94
152 368 30 94
153 591 30 94
154 386 30 94
155 595 30 94
156 379 30 94
157 593 30 94
158 384 30 94
159 607 30 94
160 399 30 94
161 760 30 122
162 625 30 122
163 746 30 122
164 612 30 122
165 762 30 122
166 625 30 122
167 783 30 122
168 637 30 122
169 778 30 122
170 640 30 122
171 778 30 122
172 638 30 122
173 791 30 122
174 638 30 122
175 782 30 122
176 635 30 122
177 792 30 122
178 640 30 122
179 783 30 122
180 637 30 122
181 774 30 122
182 622 30 122
183 777 30 122
184 618 30 122
185 777 30 122
186 622 30 122
187 765 30 122
188 623 30 122
189 769 30 122
190 625 30 122
191 775 30 122
192 622 30 122
193 777 30 122
194 628 30 122
195 769 30 122
196 620 30 122
197 778 30 122
198 623 30 122
199 788 30 122
200 634 30 122
201 457 40 38
202 416 40 38
203 460 40 38
204 438 40 38
205 465 40 38
206 441 40 38
207 467 40 38
208 442 40 38
209 473 40 38
210 452 40 38
211 469 40 38
212 446 40 38
213 478 40 38
214 450 40 38
215 476 40 38
216 454 40 38
217 479 40 38
218 452 40 38
219 480 40 38
220 450 40 38
221 481 40 38
222 443 40 38
223 476 40 38
224 447 40 38
225 472 40 38
226 450 40 38
227 479 40 38
228 449 40 38
229 478 40 38
230 455 40 38
231 478 40 38
232 457 40 38
233 481 40 38
234 447 40 38
235 504 40 38
236 452 40 38
237 472 40 38
238 447 40 38
239 472 40 38
240 451 40 38
241 622 40 66
242 377 40 66
243 619 40 66
244 378 40 66
245 622 40 66
246 369 40 66
247 616 40 66
248 374 40 66
249 619 40 66
250 374 40 66
251 616 40 66
252 374 40 66
253 621 40 66
254 375 40 66
255 618 40 66
256 397 40 66
257 633 40 66
258 406 40 66
259 652 40 66
260 412 40 66
261 652 40 66
262 419 40 66
263 658 40 66
264 423 40 66
265 659 40 66
266 409 40 66
267 650 40 66
268 405 40 66
269 653 40 66
270 405 40 66
271 652 40 66
272 403 40 66
273 656 40 66
274 408 40 66
275 644 40 66
276 406 40 66
277 649 40 66
278 412 40 66
279 650 40 66
280 406 40 66
281 853 40 122
282 330 40 122
283 859 40 122
284 323 40 122
285 842 40 122
286 308 40 122
287 842 40 122
288 324 40 122
289 831 40 122
290 334 40 122
291 838 40 122
292 341 40 122
293 836 40 122
294 328 40 122
295 840 40 122
296 324 40 122
297 836 40 122
298 321 40 122
299 831 40 122
300 328 40 122
301 833 40 122
302 328 40 122
303 840 40 122
304 330 40 122
305 831 40 122
306 321 40 122
307 833 40 122
308 328 40 122
309 833 40 122
310 321 40 122
311 840 40 122
312 319 40 122
313 838 40 122
314 317 40 122
315 831 40 122
316 319 40 122
317 827 40 122
318 323 40 122
319 836 40 122
320 328 40 122
321 442 40 38
322 407 40 38
323 437 40 38
324 410 40 38
325 444 40 38
326 412 40 38
327 440 40 38
328 414 40 38
329 439 40 38
330 413 40 38
331 436 40 38
332 416 40 38
333 446 40 38
334 412 40 38
335 438 40 38
336 414 40 38
337 443 40 38
338 408 40 38
339 446 40 38
340 407 40 38
341 445 40 38
342 413 40 38
343 453 40 38
344 414 40 38
345 449 40 38
346 417 40 38
347 447 40 38
348 411 40 38
349 443 40 38
350 417 40 38
351 447 40 38
352 410 40 38
353 449 40 38
354 409 40 38
355 442 40 38
356 413 40 38
357 451 40 38
358 412 40 38
359 447 40 38
360 420 40 38
361 526 40 66
362 467 40 66
363 532 40 66
364 470 40 66
365 528 40 66
366 474 40 66
367 529 40 66
368 472 40 66
369 533 40 66
370 480 40 66
371 542 40 66
372 487 40 66
373 545 40 66
374 504 40 66
375 549 40 66
376 507 40 66
377 546 40 66
378 517 40 66
379 541 40 66
380 518 40 66
381 554 40 66
382 514 40 66
383 564 40 66
384 514 40 66
385 571 40 66
386 522 40 66
387 575 40 66
388 525 40 66
389 582 40 66
390 533 40 66
391 588 40 66
392 536 40 66
393 591 40 66
394 553 40 66
395 592 40 66
396 557 40 66
397 592 40 66
398 563 40 66
399 583 40 66
400 568 40 66
> dadosc <- summarySE(dados, measurevar="Fres", groupvars=c("Vc","Lu"))
> dadosc
Vc Lu N Fres sd se ci
1 30 10 80 440.6875 30.91540 3.456447 6.879885
2 30 94 80 445.0250 150.97028 16.878990 33.596789
3 30 122 40 701.7000 75.06688 11.869115 24.007552
4 40 38 80 444.6125 23.31973 2.607225 5.189552
5 40 66 80 526.7125 90.77824 10.149316 20.201707
6 40 122 40 581.1250 259.74092 41.068645 83.069175
> ggplot(dadosc, aes(x=Lu, y=Fres, colour=Vc)) +
+ geom_errorbar(aes(ymin=Fres-se, ymax=Fres+se), width=5) +
+ geom_point()
> pd <- position_dodge(0.1)
Up to here I got this graph, very close to my desired graph, except that I'd like a legend with only two colors: one for Vc=30 and another for Vc=40.
[plot: Fres vs Lu with error bars and points, coloured by Vc]
Then I tried the following script:
ggplot(dadosc, aes(x=Lu, y=Fres, ymax = max(Fres), colour=Vc, group=Vc)) +
+ geom_errorbar(aes(ymin=Fres-se, ymax=Fres+se), colour="black", width=.1, position=pd) +
+ geom_point(position=pd, size=3, shape=21, fill="white") + # 21 is filled circle
+ xlab("Machining lenght (mm)") +
+ ylab("Machining forces (N)") +
+ scale_colour_hue(name="Cutting Velocity",
+ breaks=c("30", "40"),
+ labels=c("Vc = 30 m/min", " Vc = 40 m/min "),
+ l=40) +
+ ggtitle("The Effect of Cutting Velocity on Machining Forces") +
+ expand_limits(y=0) +
+ scale_y_continuous(breaks=0:750*50) +
+ theme_bw() +
+ theme(legend.justification=c(1,0),
+ legend.position=c(1,0))
And I receive this message:
Error: Continuous value supplied to discrete scale
Vc should be a factor if you want two values in the legend. You were getting that error because you were trying to scale Vc as discrete (breaks = c(30, 40)) when it was of type integer.
ggplot(dadosc, aes(x=Lu, y=Fres, colour=factor(Vc))) +
...
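For completeness, a hedged version of the question's full call with the factor conversion applied (assuming dadosc, pd, and the labels from above):
library(ggplot2)
pd <- position_dodge(0.1)
ggplot(dadosc, aes(x = Lu, y = Fres, colour = factor(Vc), group = factor(Vc))) +
  geom_errorbar(aes(ymin = Fres - se, ymax = Fres + se), colour = "black", width = .1, position = pd) +
  geom_point(position = pd, size = 3, shape = 21, fill = "white") +
  xlab("Machining length (mm)") +
  ylab("Machining forces (N)") +
  scale_colour_hue(name = "Cutting Velocity",
                   breaks = c("30", "40"),
                   labels = c("Vc = 30 m/min", "Vc = 40 m/min"),
                   l = 40) +
  ggtitle("The Effect of Cutting Velocity on Machining Forces") +
  expand_limits(y = 0) +
  theme_bw() +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0))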
I have the following data frame:
data_2
sex age seca1 chad1 DL alog1 dig1 scifirst1 crimetech1
1 F 19 1800 1797 180 70 69 421 424
2 F 19 1682 1670 167 69 69 421 423
3 F 21 1765 1765 178 80 81 421 423
4 F 21 1829 1833 181 74 72 421 419
5 F 21 1706 1705 170 103 101 439 440
6 F 18 1607 1606 160 76 76 440 439
7 F 19 1578 1576 156 50 48 422 422
8 F 19 1577 1575 156 61 61 439 441
9 F 21 1666 1665 166 52 51 439 441
10 F 17 1710 1716 172 65 65 420 420
11 F 28 1616 1619 161 66 65 426 428
12 F 22 1648 1644 165 58 57 426 429
13 F 19 1569 1570 155 55 54 419 420
14 F 19 1779 1777 177 55 54 422 422
15 M 18 1773 1772 179 70 69 420 419
16 M 18 1816 1809 181 81 80 442 440
17 M 19 1766 1765 178 77 76 425 425
18 M 19 1745 1741 174 76 76 421 423
19 M 18 1716 1714 170 71 70 445 446
20 M 21 1785 1783 179 64 63 446 445
21 M 19 1850 1854 185 71 72 422 421
22 M 31 1875 1880 188 95 95 419 420
23 M 26 1877 1877 186 106 106 420 420
24 M 19 1836 1837 185 100 100 426 423
25 M 18 1825 1823 182 85 85 444 439
26 M 19 1755 1754 174 79 78 420 419
27 M 26 1658 1658 165 69 69 421 421
28 M 20 1816 1818 183 84 83 439 440
29 M 18 1755 1755 175 67 67 429 422
I wish to compute the technical error measurement (TEM) between "alog1" and "dig1", which has the following formula:
TEM = √(D / (2n))
where D is the sum of the squared differences between alog1 and dig1, and n is 29.
I'm not sure how to compute the sum of the squared differences between the two columns in the first place. Please help.
Probably with
n <- 29
D <- sum((data_2$alog1 - data_2$dig1)^2)  # sum of the squared differences
TEM <- sqrt(D / (2 * n))
data_3 <- cbind(data_2, TEM)  # bind it to the table and create the output table 3
Check the formula for TEM; maybe I didn't understand it correctly.
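The same formula as a small reusable function (a hedged sketch, assuming paired numeric vectors of equal length):
tem <- function(x, y) sqrt(sum((x - y)^2) / (2 * length(x)))
tem(data_2$alog1, data_2$dig1)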