R map() 2 levels into list

I am stuck on nesting map() calls, or perhaps piping one map() into another.
I have a list of 4 outputs in the object "output". Each of the four outputs has an element "parameters" that is a list of 3 elements. The 1st element is "unstandardized".
From the View tool I can see the code to get the unstandardized parameters from any one output:
output[["ar.4g_gm.pr.dual..semi.inv..phantom.out"]][["parameters"]][["unstandardized"]]
I have tried mapping over the outputs to extract "parameters" and piping the result into map_dfr to extract and rbind the unstandardized parameters, which does the job ...
x<- map(output,"parameters") %>% map_dfr("unstandardized")
but I want to have the top-level list element name (i.e., the output file) in a column of my result.
Is there a way to nest the map functions or some other syntax to get the 4 top-level list element names into a column?
Here are statements with dummy data. It works, but I need to cbind rep(c("out1","out2","out3"), each=5) to the result, and I want that to happen without cbind.
output <- list(out1=list(e1=c(1,2,3),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(1,times=5)))),
out2=list(e1=c(3,4,5),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(2,times=5)))),
out3=list(e1=c(1,2,3),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(3,times=5)))) )
output[["out1"]][["parm"]][["stand"]]
map(output,"parm") %>% map_dfr("stand")

library(purrr)
library(dplyr)
map(output, pluck, "parm", "stand") %>%
bind_rows(.id = "foo")
# foo V1 V2 V3 V4 V5 V6 V7 V8 grp
# 1 out1 845 527 296 902 358 447 317 347 1
# 2 out1 679 473 290 482 349 691 144 731 1
# 3 out1 842 574 135 894 628 542 757 174 1
# 4 out1 379 548 836 176 796 744 889 922 1
# 5 out1 498 837 492 965 255 508 138 689 1
# 6 out2 203 599 158 355 793 884 722 210 2
# 7 out2 543 693 484 195 511 174 793 654 2
# 8 out2 593 839 296 926 387 788 260 143 2
# 9 out2 373 363 323 939 416 348 792 211 2
# 10 out2 773 218 616 806 119 304 775 775 2
# 11 out3 171 217 859 899 664 737 114 837 3
# 12 out3 953 225 600 581 528 388 714 899 3
# 13 out3 615 550 860 134 667 136 987 993 3
# 14 out3 494 407 726 128 559 418 782 832 3
# 15 out3 729 734 432 354 716 288 734 264 3
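For reference, the same result can probably be reached in a single call, because map_dfr() also accepts an .id argument. The sketch below uses a deliberately tiny stand-in for output (the data values and the column name "source" are made up for illustration) and assumes purrr and dplyr are installed:

```r
library(purrr)

# Tiny stand-in for the nested list in the question (values are illustrative)
output <- list(
  out1 = list(parm = list(stand = data.frame(V1 = 1:2, grp = 1))),
  out2 = list(parm = list(stand = data.frame(V1 = 3:4, grp = 2)))
)

# pluck() digs two levels down; .id stores each top-level name in a column
res <- map_dfr(output, pluck, "parm", "stand", .id = "source")
res
```

Note that recent purrr releases mark map_dfr() as superseded in favour of map() followed by a row-binding step, which is exactly what the answer above does.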

output <- list(out1=list(e1=c(1,2,3),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(1,times=5)))),
out2=list(e1=c(3,4,5),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(2,times=5)))),
out3=list(e1=c(1,2,3),
e2=c(T,F,T),
parm=list(a = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
b = as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),
stand = cbind(as.data.frame(matrix(sample(101:999,size=40,replace=TRUE),nrow=5)),grp=rep(3,times=5)))) )
library(tidyverse)
map(output,"parm") %>%
map("stand") %>%
map2(names(output), ~ cbind(.x, df_name=.y))
# $out1
# V1 V2 V3 V4 V5 V6 V7 V8 grp df_name
# 1 695 356 109 463 688 496 842 310 1 out1
# 2 922 450 680 170 567 921 530 419 1 out1
# 3 568 604 626 446 364 206 541 644 1 out1
# 4 210 237 300 432 366 945 413 368 1 out1
# 5 529 224 392 181 156 126 255 283 1 out1
#
# $out2
# V1 V2 V3 V4 V5 V6 V7 V8 grp df_name
# 1 320 429 109 749 394 657 690 764 2 out2
# 2 580 296 755 101 385 582 956 547 2 out2
# 3 939 122 697 146 747 108 672 836 2 out2
# 4 550 972 128 396 874 224 158 133 2 out2
# 5 923 650 888 895 742 166 533 225 2 out2
#
# $out3
# V1 V2 V3 V4 V5 V6 V7 V8 grp df_name
# 1 347 928 777 656 503 783 847 620 3 out3
# 2 496 586 919 991 810 797 779 202 3 out3
# 3 644 731 441 896 284 514 954 981 3 out3
# 4 303 803 945 806 938 692 587 775 3 out3
# 5 243 666 719 823 133 773 585 461 3 out3
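The pipeline above returns a list of data frames. If a single data frame is wanted, one possible follow-up is to let imap() supply the names and then stack the pieces. This is only a sketch on made-up data; df_name is the column name used above, everything else is illustrative:

```r
library(purrr)

# Tiny stand-in for the nested list (values are illustrative only)
output <- list(
  out1 = list(parm = list(stand = data.frame(V1 = 1:2, grp = 1))),
  out2 = list(parm = list(stand = data.frame(V1 = 3:4, grp = 2)))
)

# A list of names passed to map() extracts level by level
stand_list <- map(output, list("parm", "stand"))

# imap() passes each element as .x and its name as .y
named_list <- imap(stand_list, ~ cbind(.x, df_name = .y))

# Stack the per-output data frames into one
stacked <- do.call(rbind, named_list)
stacked
```

imap() saves the separate names(output) argument that map2() needs, so the names cannot drift out of sync with the list.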


R: splitting a data frame based on column-wise NA value occurrence

Sample data
set.seed(16)
aaa <- 1:1000
aaa[round(runif(100,1,1000))] <- NA
aaa.df <- as.data.frame(matrix(aaa, ncol=5))
I want aaa.df to be split into multiple groups based on which column(s) contain NA value(s). For example, if the 10th, 16th, and 200th rows have an NA value in the same column, I want these rows to be in one group, and so on. It should also work when (a) there are no NA values in a row and (b) there are multiple NA values in a row.
I also want to keep the original row number when grouping.
Edit: To make it clearer this is the expected output (Obtained using Taufi's answer, but I am still looking for a more elegant way)
[[1]]
# A tibble: 119 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 1 201 401 601 801 1
2 2 202 402 602 802 2
3 3 203 403 603 803 3
4 4 204 404 604 804 4
5 5 205 405 605 805 5
6 6 206 406 606 806 6
7 7 207 407 607 807 7
8 8 208 408 608 808 8
9 9 209 409 609 809 9
10 10 210 410 610 810 10
# ... with 109 more rows
[[2]]
# A tibble: 14 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 20 220 420 620 NA 20
2 32 232 432 632 NA 32
3 47 247 447 647 NA 47
4 70 270 470 670 NA 70
5 85 285 485 685 NA 85
6 92 292 492 692 NA 92
7 129 329 529 729 NA 129
8 132 332 532 732 NA 132
9 137 337 537 737 NA 137
10 151 351 551 751 NA 151
11 152 352 552 752 NA 152
12 168 368 568 768 NA 168
13 178 378 578 778 NA 178
14 181 381 581 781 NA 181
[[3]]
# A tibble: 15 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 11 211 411 NA 811 11
2 37 237 437 NA 837 37
3 62 262 462 NA 862 62
4 82 282 482 NA 882 82
5 83 283 483 NA 883 83
6 89 289 489 NA 889 89
7 107 307 507 NA 907 107
8 115 315 515 NA 915 115
9 116 316 516 NA 916 116
10 117 317 517 NA 917 117
11 118 318 518 NA 918 118
12 165 365 565 NA 965 165
13 176 376 576 NA 976 176
14 189 389 589 NA 989 189
15 200 400 600 NA 1000 200
[[4]]
# A tibble: 1 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 12 212 412 NA NA 12
[[5]]
# A tibble: 16 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 17 217 NA 617 817 17
2 28 228 NA 628 828 28
3 31 231 NA 631 831 31
4 48 248 NA 648 848 48
5 58 258 NA 658 858 58
6 72 272 NA 672 872 72
7 80 280 NA 680 880 80
8 126 326 NA 726 926 126
9 144 344 NA 744 944 144
10 145 345 NA 745 945 145
11 149 349 NA 749 949 149
12 153 353 NA 753 953 153
13 186 386 NA 786 986 186
14 190 390 NA 790 990 190
15 192 392 NA 792 992 192
16 196 396 NA 796 996 196
and so on..
In addition to my previous, more brute-force answer, I came up with the following, far more elegant one-liner that avoids any unnecessary joins or intermediate assignment steps. Since you already accepted my previous answer, I will leave it as it stands and add the conceptually different one-liner below. The idea is to split() the data.frame based on pasted column numbers from which() that indicate the presence of NA.
split(aaa.df,
apply(aaa.df, 1,
function(x) paste(which(is.na(x)), collapse = ",")))
Output
$`1`
V1 V2 V3 V4 V5
77 NA 277 477 677 877
93 NA 293 493 693 893
97 NA 297 497 697 897
109 NA 309 509 709 909
119 NA 319 519 719 919
140 NA 340 540 740 940
154 NA 354 554 754 954
158 NA 358 558 758 958
171 NA 371 571 771 971
172 NA 372 572 772 972
$`1,2,3`
V1 V2 V3 V4 V5
51 NA NA NA 651 851
$`1,3,5`
V1 V2 V3 V4 V5
75 NA 275 NA 675 NA
$`1,4`
V1 V2 V3 V4 V5
194 NA 394 594 NA 994
$`1,4,5`
V1 V2 V3 V4 V5
49 NA 249 449 NA NA
...
and so on ...
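To make the grouping key easier to see, here is a minimal sketch of the same split() idea on a four-row data frame. The data and the "none" label for rows without NA are assumptions added for illustration; the one-liner above leaves that key as an empty string.

```r
# Small illustrative data frame (not the aaa.df from the question)
df <- data.frame(a = c(1, NA, 3, NA),
                 b = c(NA, 2, 3, NA))

# One key per row: the positions of the NA columns, pasted together
key <- apply(is.na(df), 1, function(r) paste(which(r), collapse = ","))

# Rows without any NA get an empty key; relabel it for readability
key[key == ""] <- "none"

groups <- split(df, key)
groups
```

The original row numbers survive as the row names of each piece, so no extra index column is needed.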
A quick, but not very elegant solution would be as follows. Note that the original row number ends up in V6.
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
aaa.df %<>% mutate(Rownum = 1:nrow(aaa.df))
Aux.df <- cbind(is.na(aaa.df[, 1:(ncol(aaa.df) - 1)]), 1:nrow(aaa.df)) %>%
as.data.frame %>%
group_by(V1, V2, V3, V4, V5) %>%
group_split
Sol <- lapply(Aux.df, function(x) inner_join(x, aaa.df, by = c("V6"="Rownum")) %>%
select(V1.y, V2.y, V3.y, V4.y, V5.y, V6))
Output
> Sol
[[1]]
# A tibble: 119 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 1 201 401 601 801 1
2 2 202 402 602 802 2
3 3 203 403 603 803 3
4 4 204 404 604 804 4
5 5 205 405 605 805 5
6 6 206 406 606 806 6
7 7 207 407 607 807 7
8 8 208 408 608 808 8
9 9 209 409 609 809 9
10 10 210 410 610 810 10
# ... with 109 more rows
[[2]]
# A tibble: 14 x 6
V1.y V2.y V3.y V4.y V5.y V6
<int> <int> <int> <int> <int> <int>
1 20 220 420 620 NA 20
2 32 232 432 632 NA 32
3 47 247 447 647 NA 47
4 70 270 470 670 NA 70
5 85 285 485 685 NA 85
6 92 292 492 692 NA 92
7 129 329 529 729 NA 129
8 132 332 532 732 NA 132
9 137 337 537 737 NA 137
10 151 351 551 751 NA 151
11 152 352 552 752 NA 152
12 168 368 568 768 NA 168
13 178 378 578 778 NA 178
14 181 381 581 781 NA 181
....
and so on ...

R, remove rows based on the values from multiple column

Suppose I have a data frame with 100 rows and 100 columns.
For each row, if any 2 columns have the same value, then this row should be removed.
For example, if column 1 and 2 are equal, then this row should be removed.
Another example, if column 10 and column 47 are equal, then this row should be removed as well.
Example:
test <- data.frame(x1 = c('a', 'a', 'c', 'd'),
x2 = c('a', 'x', 'f', 'h'),
x3 = c('s', 'a', 'f', 'g'),
x4 = c('a', 'x', 'u', 'a'))
test
x1 x2 x3 x4
1 a a s a
2 a x a x
3 c f f u
4 d h g a
Only the 4th row should be kept.
How to do this in a quick and concise way? Not using for loops....
Use apply to look for duplicates in each row. (Note that this internally converts your data to a matrix for the comparison. If you are doing a lot of row-wise operations I would recommend either keeping it as a matrix or converting it to a long format as in Jack Brookes's answer.)
# sample data
set.seed(47)
dd = data.frame(matrix(sample(1:5000, size = 100^2, replace = TRUE), nrow = 100))
# remove rows with duplicate entries
result = dd[apply(dd, MARGIN = 1, FUN = function(x) !any(duplicated(x))), ]
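As a quick check, the same apply() call can be run on the small test data frame from the question. This is just a sketch; the variable names keep and result are chosen here for illustration.

```r
# The example data from the question
test <- data.frame(x1 = c("a", "a", "c", "d"),
                   x2 = c("a", "x", "f", "h"),
                   x3 = c("s", "a", "f", "g"),
                   x4 = c("a", "x", "u", "a"),
                   stringsAsFactors = FALSE)

# TRUE for rows in which no value occurs more than once
keep <- apply(test, MARGIN = 1, FUN = function(x) !any(duplicated(x)))

result <- test[keep, ]
result
```

Only the 4th row remains, which matches the expected output stated in the question.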
Tested on this 20x20 dataframe
library(tidyverse)
N <- 20
df <- matrix(as.integer(runif(N^2, 1, 500)), nrow = N, ncol = N) %>%
as_tibble() # as.tibble() is deprecated in current tibble releases
df
# # A tibble: 20 x 20
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 350 278 256 484 486 249 35 308 248 66 493 130 149 2 374 51 370 423 165 388
# 2 368 448 441 62 304 373 38 375 406 463 412 95 174 365 170 113 459 369 62 21
# 3 250 459 416 128 372 67 281 450 48 122 308 56 121 497 498 220 34 4 126 411
# 4 171 306 390 13 395 160 256 258 76 131 471 487 190 492 21 237 380 129 5 30
# 5 402 421 6 401 50 292 470 319 283 178 234 46 176 178 288 499 7 221 123 268
# 6 415 342 132 379 150 35 323 225 246 496 460 478 205 255 460 62 78 207 82 118
# 7 207 52 420 216 9 366 390 382 304 63 427 425 350 112 488 400 328 239 148 40
# 8 392 455 156 386 478 3 359 184 420 138 29 434 31 279 87 233 455 21 181 437
# 9 349 460 498 278 104 93 253 287 124 351 60 333 321 116 19 156 372 168 95 169
# 10 386 73 362 127 313 93 427 81 188 366 418 115 353 412 483 147 295 53 82 188
# 11 272 480 168 306 359 75 436 228 187 279 410 388 62 227 415 374 366 313 187 49
# 12 177 382 233 146 338 76 390 232 336 448 175 79 202 230 317 296 410 90 102 465
# 13 108 433 59 151 8 138 464 458 183 316 481 153 403 193 71 136 27 454 62 439
# 14 421 72 106 442 338 440 476 357 74 108 94 407 453 262 355 356 27 217 243 455
# 15 325 449 151 473 241 11 154 52 77 489 137 279 420 120 165 289 70 128 384 53
# 16 126 189 43 354 233 168 48 285 175 348 404 254 168 126 95 65 493 493 187 228
# 17 26 143 112 107 350 198 353 439 192 158 151 23 326 4 304 162 84 412 499 170
# 18 88 156 222 227 452 233 397 203 478 73 483 241 151 38 176 77 244 396 9 393
# 19 361 486 423 310 153 235 274 204 399 493 422 374 399 10 215 468 322 38 395 390
# 20 417 124 21 220 123 399 354 182 233 24 397 263 182 211 360 419 202 240 363 187
Removing rows with any duplicates
df %>%
group_by(id = row_number()) %>%
gather(col, value, -id) %>%
filter(!any(duplicated(value))) %>%
spread(col, value)
# # A tibble: 11 x 21
# # Groups: id [11]
# id V1 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V2 V20 V3 V4 V5 V6 V7 V8 V9
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 1 350 66 493 130 149 2 374 51 370 423 165 278 388 256 484 486 249 35 308 248
# 2 3 250 122 308 56 121 497 498 220 34 4 126 459 411 416 128 372 67 281 450 48
# 3 4 171 131 471 487 190 492 21 237 380 129 5 306 30 390 13 395 160 256 258 76
# 4 7 207 63 427 425 350 112 488 400 328 239 148 52 40 420 216 9 366 390 382 304
# 5 9 349 351 60 333 321 116 19 156 372 168 95 460 169 498 278 104 93 253 287 124
# 6 12 177 448 175 79 202 230 317 296 410 90 102 382 465 233 146 338 76 390 232 336
# 7 13 108 316 481 153 403 193 71 136 27 454 62 433 439 59 151 8 138 464 458 183
# 8 14 421 108 94 407 453 262 355 356 27 217 243 72 455 106 442 338 440 476 357 74
# 9 15 325 489 137 279 420 120 165 289 70 128 384 449 53 151 473 241 11 154 52 77
# 10 17 26 158 151 23 326 4 304 162 84 412 499 143 170 112 107 350 198 353 439 192
# 11 18 88 73 483 241 151 38 176 77 244 396 9 156 393 222 227 452 233 397 203 478
You can try a series of filters from dplyr. I cooked up some sample data here. If your variables are named then you can use something like the first example. Otherwise the second should work
library(tidyverse)
#> Warning: package 'dplyr' was built under R version 3.5.1
data <- tibble( # data_frame() is deprecated in favour of tibble()
A = c(1,2,3,4,5,6),
B= c(1,3,5,7,9,11),
C = c(2,2,6,8,10,12)
)
data %>%
filter(A != B) %>% # This removed the first row
filter(A != C) # This removed the second row
#> # A tibble: 4 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 3 5 6
#> 2 4 7 8
#> 3 5 9 10
#> 4 6 11 12
data %>%
filter(.[[1]] != .[[2]]) %>% # [[ ]] returns a vector; [ ] would return a one-column tibble
filter(.[[1]] != .[[3]])
#> # A tibble: 4 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 3 5 6
#> 2 4 7 8
#> 3 5 9 10
#> 4 6 11 12
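The chained filters above only compare the first column with each of the others. If every pair of columns has to be checked without typing the pairs by hand, something along these lines may work. It is a base R sketch on made-up data, and the helper name has_equal_pair is hypothetical:

```r
# Illustrative data: rows 1-3 each contain one equal pair, row 4 does not
data <- data.frame(A = c(1, 2, 3, 4),
                   B = c(1, 3, 5, 7),
                   C = c(2, 2, 5, 8))

# All unordered pairs of column names: (A,B), (A,C), (B,C)
pairs <- combn(names(data), 2, simplify = FALSE)

# Vectorised comparison per pair, combined with a row-wise OR
has_equal_pair <- Reduce(`|`, lapply(pairs, function(p) data[[p[1]]] == data[[p[2]]]))

kept <- data[!has_equal_pair, ]
kept
```

With 100 columns this generates 4950 comparisons, so for wide data the row-wise duplicated() approach shown earlier is likely the simpler choice.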

R reshape by day, month, year [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 5 years ago.
I have a simple table in the following format:
Date val
2005-01-01 15
2005-01-02 18
2005-01-03 20
...
And am trying to reshape it to the following "wide" column format:
Year Month day1 day2 day3 day4 ...day31
2005 01 day1val day2val day3val day4val ...day31val
2005 02 day1val day2val day3val day4val ...day31val
I've successfully split the date column into three separate d,m,y columns using
dates_separated <- data.frame(year = as.numeric(format(input_df$Date, format = "%Y")),
month = as.numeric(format(input_df$Date, format = "%m")),
day = as.numeric(format(input_df$Date, format = "%d")))
output_df <- cbind(input_df, dates_separated)
I'm trying to use the reshape function to get this done, but am finding my output could be more complicated than it can handle. Is there another function I should be using here?
Edit: I don't believe this was a duplicate of what was suggested. markdly's answer below did exactly what I needed. Thanks!
For the sake of completeness, here is a solution using the dcast() function.
OP's input_df consists only of two columns Date and val. So, let's create a full year of sample data by
set.seed(1234L)
input_df <- data.frame(Date = as.Date("2005-01-01") + 0:364,
val = sample(100:999, 365L, TRUE))
The dcast() function is available from the reshape2 and the data.table packages. Here, data.table is used because of its handy year(), month(), and mday() functions:
library(data.table)
dcast(input_df, year(Date) + month(Date) ~ mday(Date))
year(Date) month(Date) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 2005 1 202 660 648 661 874 676 108 309 699 562 724 590 354 931 363 853 357 340 268 309 384 372 243 135 296 829 573 923 848 141 510
2 2005 2 338 374 556 262 783 281 332 992 826 598 681 380 659 396 551 709 536 319 788 166 378 745 554 237 553 544 776 257 NA NA NA
3 2005 3 863 878 137 385 112 315 735 377 557 146 608 209 903 113 804 180 567 445 163 388 701 933 524 228 589 276 908 450 379 244 906
4 2005 4 249 910 220 218 194 560 370 124 378 767 131 608 352 283 220 393 239 216 491 134 741 190 955 209 297 921 951 351 211 817 NA
5 2005 5 769 924 995 948 537 355 326 552 547 386 966 670 214 480 922 521 917 637 668 882 552 985 391 533 421 664 767 609 982 619 495
6 2005 6 305 173 865 311 989 641 998 438 599 486 618 489 302 176 673 487 165 822 392 781 625 737 484 409 783 481 604 204 372 530 NA
7 2005 7 410 640 168 960 119 857 669 379 768 675 993 215 894 829 839 851 759 984 675 694 575 385 791 573 759 376 463 283 987 609 352
8 2005 8 266 782 610 938 674 730 531 865 480 128 332 401 220 549 821 403 558 544 817 610 196 826 610 291 774 376 540 990 481 319 295
9 2005 9 720 982 529 796 616 969 817 578 636 337 351 158 606 336 102 630 568 860 126 639 341 208 190 773 114 144 772 421 783 438 NA
10 2005 10 819 123 555 839 590 340 410 432 486 926 805 764 352 511 358 726 838 689 472 956 318 647 782 724 203 672 378 417 982 584 499
11 2005 11 954 507 271 992 593 791 922 713 466 466 231 277 272 467 413 851 278 875 457 237 405 430 484 267 692 928 760 894 958 275 NA
12 2005 12 525 447 436 125 936 469 960 344 565 980 432 379 130 700 928 140 281 769 217 737 998 949 633 758 538 791 102 602 514 396 851
To prettify the result, Year and Month can be computed in advance:
dcast(setDT(input_df)[, Year := year(Date)][, Month := month(Date)],
Year + Month ~ sprintf("day%02i", mday(Date)), value.var = "val")
Year Month day01 day02 day03 day04 day05 day06 day07 day08 day09 day10 day11 day12 day13 day14 day15 day16 day17 day18 day19 day20 ...
1: 2005 1 202 660 648 661 874 676 108 309 699 562 724 590 354 931 363 853 357 340 268 309
2: 2005 2 338 374 556 262 783 281 332 992 826 598 681 380 659 396 551 709 536 319 788 166
3: 2005 3 863 878 137 385 112 315 735 377 557 146 608 209 903 113 804 180 567 445 163 388
4: 2005 4 249 910 220 218 194 560 370 124 378 767 131 608 352 283 220 393 239 216 491 134
5: 2005 5 769 924 995 948 537 355 326 552 547 386 966 670 214 480 922 521 917 637 668 882
6: 2005 6 305 173 865 311 989 641 998 438 599 486 618 489 302 176 673 487 165 822 392 781
7: 2005 7 410 640 168 960 119 857 669 379 768 675 993 215 894 829 839 851 759 984 675 694
8: 2005 8 266 782 610 938 674 730 531 865 480 128 332 401 220 549 821 403 558 544 817 610
9: 2005 9 720 982 529 796 616 969 817 578 636 337 351 158 606 336 102 630 568 860 126 639
10: 2005 10 819 123 555 839 590 340 410 432 486 926 805 764 352 511 358 726 838 689 472 956
11: 2005 11 954 507 271 992 593 791 922 713 466 466 231 277 272 467 413 851 278 875 457 237
12: 2005 12 525 447 436 125 936 469 960 344 565 980 432 379 130 700 928 140 281 769 217 737
Note that here sprintf("day%02i", mday(Date)) is used to keep the columns ordered. Using paste0("day", day) as in markdly's answer, the columns would be in the wrong order:
day1 day10 day11 day12 day13 day14 day15 day16 day17 day18 day19 day2 day20 ...
If you can add actual data to your question it really helps others to post answers. For example, here's some data for 5 days in each month in 2015:
set.seed(123)
df <- expand.grid(year = 2015, month = 1:12, day = 1:5)
df$val <- sample.int(1000, nrow(df))
head(df)
#> year month day val
#> 1 2015 1 1 288
#> 2 2015 2 1 788
#> 3 2015 3 1 409
#> 4 2015 4 1 881
#> 5 2015 5 1 937
#> 6 2015 6 1 46
This can be converted to the desired format using tidyr::spread:
library(dplyr)
library(tidyr)
df %>%
mutate(day = paste0("day", day)) %>%
spread(day, val)
#> year month day1 day2 day3 day4 day5
#> 1 2015 1 288 670 640 732 254
#> 2 2015 2 788 566 691 209 816
#> 3 2015 3 409 102 530 307 44
#> 4 2015 4 881 993 579 223 420
#> 5 2015 5 937 243 282 138 758
#> 6 2015 6 46 42 143 398 116
#> 7 2015 7 525 323 935 397 531
#> 8 2015 8 887 996 875 353 196
#> 9 2015 9 548 872 669 146 121
#> 10 2015 10 453 679 770 133 711
#> 11 2015 11 948 627 24 961 844
#> 12 2015 12 449 972 462 445 957
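In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshape starting from a Date/val table might look like this. The sample data are invented, and the zero-padded day names are borrowed from the dcast() answer so the columns stay in order:

```r
library(tidyr)

# Invented sample: 1 January to 28 February 2005, one value per day
input_df <- data.frame(Date = as.Date("2005-01-01") + 0:58,
                       val  = 1:59)

# Derive the id columns and a zero-padded day label
input_df$Year  <- as.integer(format(input_df$Date, "%Y"))
input_df$Month <- as.integer(format(input_df$Date, "%m"))
input_df$day   <- sprintf("day%02d", as.integer(format(input_df$Date, "%d")))

# One row per Year/Month, one column per day of the month
wide <- pivot_wider(input_df[, c("Year", "Month", "day", "val")],
                    names_from = day, values_from = val)
wide
```

Days that do not exist in a month (for example day29 to day31 in February 2005) are filled with NA.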

Reshaping data of different time lengths in R

I want to perform several repeated measures on my data. I first need to reshape the dataframe from a wide to a long format to do that.
This is my dataframe:
ID Group x1 x2 x3 y1 y2 y3 z1 z2
144 1 566 613 597 563 549 562 599 469
167 2 697 638 756 682 695 693 718 439.5
247 4 643 698 730 669 656 669 698 514.5
317 4 633 646 641 520 543 586 559 405.5
344 3 651 678 708 589 608 615 667 514
352 2 578 702 671 536 594 579 591 467.5
382 1 678 690 693 555 565 534 521 457.5
447 3 668 672 718 663 689 751 784 506.5
464 2 760 704 763 514 554 520 564 486
628 1 762 789 783 618 610 645 625 536
As you might notice, I have measured variable x and y on three time points and variable z at two points. I was wondering if it makes sense at all to try and reshape the data into long format, given the fact that I have separate time lengths.
I have not been able to do so. So first of all, does it even make sense to do it this way? Or should I make two dataframes? Second, if it does make sense, how?
EDIT: I would expect something like:
ID   Group  Timex  Timey  Timez  x    y    z
144  1      1      1      1      566  563  599
144  1      2      2      2      613  549  469
144  1      3      3             597  562
167  2      1      1      1      697  682  718
167  2      2      2      2      638  695  439.5
167  2      3      3             756  693
....
But I'm not even sure if that makes sense at all, to have these empty cells?
Here is one idea. dt_all is the final output. Notice that this example does not create Timex, Timey, and Timez, but I would argue that one column called Time is sufficient and individual Timex, Timey, and Timez are redundant.
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt_all <- dt %>%
gather(Var, Value, -ID, -Group) %>%
mutate(Time = sub("[a-z]", "", Var), Type = sub("[0-9]", "", Var)) %>%
select(-Var) %>%
spread(Type, Value)
Data Preparation
# Create example data frames
dt <- read.table(text = "ID Group x1 x2 x3 y1 y2 y3 z1 z2
144 1 566 613 597 563 549 562 599 469
167 2 697 638 756 682 695 693 718 439.5
247 4 643 698 730 669 656 669 698 514.5
317 4 633 646 641 520 543 586 559 405.5
344 3 651 678 708 589 608 615 667 514
352 2 578 702 671 536 594 579 591 467.5
382 1 678 690 693 555 565 534 521 457.5
447 3 668 672 718 663 689 751 784 506.5
464 2 760 704 763 514 554 520 564 486
628 1 762 789 783 618 610 645 625 536",
header = TRUE)
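With tidyr 1.0 or later, the gather/mutate/spread sequence can probably be condensed into one pivot_longer() call using the special ".value" name. The sketch below uses only the first two rows of the example data and assumes that variable names are always one letter followed by one digit:

```r
library(tidyr)

# First two rows of the example data
dt <- data.frame(ID = c(144, 167), Group = c(1, 2),
                 x1 = c(566, 697), x2 = c(613, 638), x3 = c(597, 756),
                 y1 = c(563, 682), y2 = c(549, 695), y3 = c(562, 693),
                 z1 = c(599, 718), z2 = c(469, 439.5))

# ".value" turns the letter part into columns x, y, z;
# the digit part becomes the Time column
long <- pivot_longer(dt,
                     cols = -c(ID, Group),
                     names_to = c(".value", "Time"),
                     names_pattern = "([a-z])([0-9])")
long
```

Because z was only measured twice, z is NA at Time 3, which corresponds to the empty cells in the expected output.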

For loop on dataframe in R

I have a dataframe, each variable has different length (shorter variables have NA values).
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 581 466 528 424 491 500 652 219 520
2 655 320 532 350 508 498 660 85 473
3 479 349 510 150 490 499 611 598 459
4 855 585 471 92 508 499 557 668 493
5 318 538 506 113 492 499 347 291 483
6 581 329 502 265 509 502 301 293 511
7 741 359 536 399 498 500 565 690 506
8 257 475 521 296 498 502 316 53 536
9 759 434 538 447 490 500 614 449 524
10 525 527 506 174 499 500 649 395 456
11 621 670 489 756 497 498 401 443 465
12 789 307 504 808 501 498 499 63 533
13 368 392 515 940 496 501 638 909 514
14 242 549 480 380 503 501 489 347 465
15 432 405 451 914 493 501 319 324 541
16 608 609 514 441 497 499 572 932 473
17 301 691 548 783 497 502 458 301 482
18 792 638 493 964 505 498 378 692 500
19 727 377 536 974 491 499 301 957 524
20 597 463 518 418 491 499 626 245 504
21 700 407 549 375 501 501 351 706 495
22 705 661 493 798 492 501 660 694 494
23 454 426 523 28 504 498 362 797 471
24 432 627 452 550 491 500 474 50 500
25 124 338 501 779 499 502 684 316 514
26 826 683 477 751 492 502 632 308 524
27 218 631 500 296 502 498 693 169 515
28 460 652 502 306 505 498 666 988 459
29 683 621 521 956 498 501 404 218 497
30 316 372 516 524 500 499 405 54 461
31 503 370 520 429 500 502 510 579 493
32 357 369 521 480 495 501 410 667 470
33 451 617 524 191 493 498 535 668 450
34 335 498 522 713 493 498 566 67 520
35 473 421 479 834 497 499 696 670 541
36 447 360 451 708 492 501 528 744 538
37 137 490 490 740 508 500 630 590 469
38 228 455 488 91 500 501 426 477 472
39 873 555 456 520 510 500 662 154 536
40 564 364 532 236 504 498 338 497 516
41 216 480 460 498 503 502 605 566 520
42 389 572 532 943 501 499 572 150 539
43 490 531 536 941 501 502 653 557 508
44 772 421 536 693 507 498 447 861 451
45 390 403 454 985 509 498 695 859 516
46 264 369 550 962 494 498 684 317 504
47 269 667 508 199 490 501 690 757 481
48 877 616 484 516 495 501 300 636 472
49 755 534 511 882 510 499 547 530 479
50 447 455 490 91 504 501 572 NA 539
51 137 555 488 520 503 500 653 NA NA
52 228 364 456 236 501 498 447 NA NA
53 873 480 532 498 501 502 NA NA NA
54 564 NA 460 943 507 499 NA NA NA
55 216 NA 532 941 509 NA NA NA NA
56 389 NA 490 693 NA NA NA NA NA
57 490 NA 488 985 NA NA NA NA NA
58 772 NA 456 NA NA NA NA NA NA
59 390 NA 532 NA NA NA NA NA NA
60 264 NA 460 NA NA NA NA NA NA
61 269 NA 532 NA NA NA NA NA NA
62 877 NA NA NA NA NA NA NA NA
63 755 NA NA NA NA NA NA NA NA
I'm running operations on each variable.
First, I cut the dataframe in single vectors in ascending order for each variable:
a1=dat0[order(dat0$V1),"V1"]
a2=dat0[order(dat0$V2),"V2"]
a3=dat0[order(dat0$V3),"V3"]
a4=dat0[order(dat0$V4),"V4"]
a5=dat0[order(dat0$V5),"V5"]
a6=dat0[order(dat0$V6),"V6"]
a7=dat0[order(dat0$V7),"V7"]
a8=dat0[order(dat0$V8),"V8"]
a9=dat0[order(dat0$V9),"V9"]
Next, I remove the NA.
a1=a1[!is.na(a1)]
a2=a2[!is.na(a2)]
a3=a3[!is.na(a3)]
a4=a4[!is.na(a4)]
a5=a5[!is.na(a5)]
a6=a6[!is.na(a6)]
a7=a7[!is.na(a7)]
a8=a8[!is.na(a8)]
a9=a9[!is.na(a9)]
Finally, I calculate the average of the lowest 25% of the values of each variable (the code below is for the first variable only):
le.1=seq(1:length(a1))
fr.1=le.1/length(a1)
df.1=data.frame(a1,le.1,fr.1)
lq.1=df.1[fr.1<=0.25,]
lqavg.1=mean(lq.1$a1)
The final results I get are:
lqavg.1 lqavg.2 lqavg.3 lqavg.4 lqavg.5 lqavg.6 lqavg.7 lqavg.8 lqavg.9
1 224.6667 351.5385 463.1333 175.5714 491.3846 498 347.9231 127.25 462.3333
The goal is writing a for loop or finding a function to do this without writing the code for each variable.
With the functions kindly suggested by Barker, I get:
> apply(dat0, 2, function(x) mean(x[x <= quantile(x, 0.25, na.rm = TRUE)], na.rm = TRUE))
V1 V2 V3 V4 V5 V6 V7 V8 V9
230.3750 353.3571 467.2778 184.2667 491.5000 498.0000 347.9231 139.8462 463.0769
> apply(dat0, 2, function(x) mean(x[x < quantile(x, 0.25, na.rm = TRUE)], na.rm = TRUE))
V1 V2 V3 V4 V5 V6 V7 V8 V9
230.3750 351.5385 463.1333 175.5714 491.5000 498.0000 347.9231 127.2500 463.0769
Any help is appreciated!
Thanks!
This is ridiculous. Here's how to translate your code to use sapply:
sapply(dat0, function(x) {
x = x[order(x)]
x = x[!is.na(x)]
x = x[(1:length(x)) / length(x) <= 0.25]
return(mean(x))
})
# V1 V2 V3 V4 V5 V6 V7 V8 V9
# 224.6667 351.5385 463.1333 175.5714 491.3846 498.0000 347.9231 127.2500 462.3333
This follows exactly the same steps as your code (order, remove missing values, take 25% of the remaining values based on length, find the average). Its output matches yours. sapply will call a function on every column of a data frame. Here we make an anonymous function that does what we want to the column it's being called on.
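If the calculation is needed in more than one place, the anonymous function could be given a name. The following is a sketch on made-up data; the function name lower_quarter_mean and the frac argument are hypothetical. It relies on sort() dropping NA values by default.

```r
# Mean of the lowest `frac` share of the non-missing values, selected by rank
lower_quarter_mean <- function(x, frac = 0.25) {
  x <- sort(x)                          # sorts ascending and drops NA
  n_keep <- floor(length(x) * frac)     # number of values in the lowest share
  mean(x[seq_len(n_keep)])
}

# Made-up data with columns of different effective length
d <- data.frame(a = c(8, NA, 1, 5, 3, 7, 2, 6, 4),
                b = c(10, 20, 30, 40, NA, NA, NA, NA, NA))

res <- sapply(d, lower_quarter_mean)
res
```

Selecting by rank like this always keeps a fixed number of values, whereas the quantile()-based versions quoted in the question select by a threshold value, which is presumably why their results differ for some columns.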
