I have the data below, and I am having a problem partitioning it using caret's createDataPartition.
gg <- structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L,
5L, 5L, 6L, 6L, 6L, 145L, 145L, 145L, 146L, 146L, 146L, 147L,
147L, 147L, 148L, 148L, 148L, 149L, 149L, 149L, 150L, 150L, 150L,
193L, 193L, 193L, 194L, 194L, 194L, 195L, 195L, 195L, 196L, 196L,
196L, 197L, 197L, 197L, 198L, 198L, 198L, 199L, 199L, 199L, 200L,
200L, 200L, 201L, 201L, 201L, 202L, 202L, 202L, 203L, 203L, 203L,
204L, 204L, 204L, 205L, 205L, 205L, 206L, 206L, 206L, 207L, 207L,
207L, 208L, 208L, 208L, 209L, 209L, 209L, 210L, 210L, 210L, 211L,
211L, 211L, 212L, 212L, 212L, 213L, 213L, 213L, 214L, 214L, 214L,
215L, 215L, 215L, 216L, 216L, 216L, 217L, 217L, 217L, 218L, 218L,
218L, 219L, 219L, 219L, 220L, 220L, 220L, 221L, 221L, 221L, 222L,
222L, 222L, 223L, 223L, 223L, 224L, 224L, 224L, 225L, 225L, 225L,
226L, 226L, 226L, 227L, 227L, 227L, 228L, 228L, 228L, 229L, 229L,
229L, 230L, 230L, 230L, 231L, 231L, 231L, 232L, 232L, 232L, 233L,
233L, 233L, 234L, 234L, 234L, 235L, 235L, 235L, 236L, 236L, 236L,
237L, 237L, 237L, 238L, 238L, 238L, 239L, 239L, 239L, 240L, 240L,
240L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L,
11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L,
15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L,
20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L,
24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L, 27L, 27L, 27L, 28L, 28L,
28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L, 31L, 31L, 32L, 32L, 32L,
33L, 33L, 33L, 34L, 34L, 34L, 35L, 35L, 35L, 36L, 36L, 36L, 37L,
37L, 37L, 38L, 38L, 38L, 39L, 39L, 39L, 40L, 40L, 40L, 41L, 41L,
41L, 42L, 42L, 42L, 43L, 43L, 43L, 44L, 44L, 44L, 45L, 45L, 45L,
46L, 46L, 46L, 47L, 47L, 47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L,
50L, 50L, 51L, 51L, 51L, 52L, 52L, 52L, 53L, 53L, 53L, 54L, 54L,
54L, 55L, 55L, 55L, 56L, 56L, 56L, 57L, 57L, 57L, 58L, 58L, 58L,
59L, 59L, 59L, 60L, 60L, 60L, 61L, 61L, 61L, 62L, 62L, 62L, 63L,
63L, 63L, 64L, 64L, 64L, 65L, 65L, 65L, 66L, 66L, 66L, 67L, 67L,
67L, 68L, 68L, 68L, 69L, 69L, 69L, 70L, 70L, 70L, 71L, 71L, 71L,
72L, 72L, 72L, 73L, 73L, 73L, 74L, 74L, 74L, 75L, 75L, 75L, 76L,
76L, 76L, 77L, 77L, 77L, 78L, 78L, 78L, 79L, 79L, 79L, 80L, 80L,
80L, 81L, 81L, 81L, 82L, 82L, 82L, 83L, 83L, 83L, 84L, 84L, 84L,
85L, 85L, 85L, 86L, 86L, 86L, 87L, 87L, 87L, 88L, 88L, 88L, 89L,
89L, 89L, 90L, 90L, 90L, 91L, 91L, 91L, 92L, 92L, 92L, 93L, 93L,
93L, 94L, 94L, 94L, 95L, 95L, 95L, 96L, 96L, 96L, 97L, 97L, 97L,
98L, 98L, 98L, 99L, 99L, 99L, 100L, 100L, 100L, 101L, 101L, 101L,
102L, 102L, 102L, 103L, 103L, 103L, 104L, 104L, 104L, 105L, 105L,
105L, 106L, 106L, 106L, 107L, 107L, 107L, 108L, 108L, 108L, 109L,
109L, 109L, 110L, 110L, 110L, 111L, 111L, 111L, 112L, 112L, 112L,
113L, 113L, 113L, 114L, 114L, 114L, 115L, 115L, 115L, 116L, 116L,
116L, 117L, 117L, 117L, 118L, 118L, 118L, 119L, 119L, 119L, 120L,
120L, 120L, 121L, 121L, 121L, 122L, 122L, 122L, 123L, 123L, 123L,
124L, 124L, 124L, 125L, 125L, 125L, 126L, 126L, 126L, 127L, 127L,
127L, 128L, 128L, 128L, 129L, 129L, 129L, 130L, 130L, 130L, 131L,
131L, 131L, 132L, 132L, 132L, 151L, 151L, 151L, 152L, 152L, 152L,
153L, 153L, 153L, 154L, 154L, 154L, 155L, 155L, 155L, 156L, 156L,
156L, 157L, 157L, 157L, 158L, 158L, 158L, 159L, 159L, 159L, 160L,
160L, 160L, 161L, 161L, 161L, 162L, 162L, 162L, 163L, 163L, 163L,
164L, 164L, 164L, 165L, 165L, 165L, 166L, 166L, 166L, 167L, 167L,
167L, 168L, 168L, 168L, 169L, 169L, 169L, 170L, 170L, 170L, 171L,
171L, 171L, 172L, 172L, 172L, 173L, 173L, 173L, 174L, 174L, 174L,
175L, 175L, 175L, 176L, 176L, 176L, 177L, 177L, 177L, 178L, 178L,
178L, 179L, 179L, 179L, 180L, 180L, 180L, 181L, 181L, 181L, 182L,
182L, 182L, 183L, 183L, 183L, 184L, 184L, 184L, 185L, 185L, 185L,
186L, 186L, 186L, 187L, 187L, 187L, 188L, 188L, 188L, 189L, 189L,
189L, 190L, 190L, 190L, 191L, 191L, 191L, 192L, 192L, 192L, 133L,
133L, 133L, 134L, 134L, 134L, 135L, 135L, 135L, 136L, 136L, 136L,
137L, 137L, 137L, 138L, 138L, 138L, 139L, 139L, 139L, 140L, 140L,
140L, 141L, 141L, 141L, 142L, 142L, 142L, 143L, 143L, 143L, 144L,
144L, 144L, 241L, 241L, 241L, 242L, 242L, 242L, 243L, 243L, 243L,
244L, 244L, 244L, 245L, 245L, 245L, 246L, 246L, 246L, 385L, 385L,
385L, 386L, 386L, 386L, 387L, 387L, 387L, 388L, 388L, 388L, 389L,
389L, 389L, 390L, 390L, 390L, 433L, 433L, 433L, 434L, 434L, 434L,
435L, 435L, 435L, 436L, 436L, 436L, 437L, 437L, 437L, 438L, 438L,
438L, 439L, 439L, 439L, 440L, 440L, 440L, 441L, 441L, 441L, 442L,
442L, 442L, 443L, 443L, 443L, 444L, 444L, 444L, 445L, 445L, 445L,
446L, 446L, 446L, 447L, 447L, 447L, 448L, 448L, 448L, 449L, 449L,
449L, 450L, 450L, 450L, 451L, 451L, 451L, 452L, 452L, 452L, 453L,
453L, 453L, 454L, 454L, 454L, 455L, 455L, 455L, 456L, 456L, 456L,
457L, 457L, 457L, 458L, 458L, 458L, 459L, 459L, 459L, 460L, 460L,
460L, 461L, 461L, 461L, 462L, 462L, 462L, 463L, 463L, 463L, 464L,
464L, 464L, 465L, 465L, 465L, 466L, 466L, 466L, 467L, 467L, 467L,
468L, 468L, 468L, 469L, 469L, 469L, 470L, 470L, 470L, 471L, 471L,
471L, 472L, 472L, 472L, 473L, 473L, 473L, 474L, 474L, 474L, 475L,
475L, 475L, 476L, 476L, 476L, 477L, 477L, 477L, 478L, 478L, 478L,
479L, 479L, 479L, 480L, 480L, 480L, 247L, 247L, 247L, 248L, 248L,
248L, 249L, 249L, 249L, 250L, 250L, 250L, 251L, 251L, 251L, 252L,
252L, 252L, 253L, 253L, 253L, 254L, 254L, 254L, 255L, 255L, 255L,
256L, 256L, 256L, 257L, 257L, 257L, 258L, 258L, 258L, 259L, 259L,
259L, 260L, 260L, 260L, 261L, 261L, 261L, 262L, 262L, 262L, 263L,
263L, 263L, 264L, 264L, 264L, 265L, 265L, 265L, 266L, 266L, 266L,
267L, 267L, 267L, 268L, 268L, 268L, 269L, 269L, 269L, 270L, 270L,
270L, 271L, 271L, 271L, 272L, 272L, 272L, 273L, 273L, 273L, 274L,
274L, 274L, 275L, 275L, 275L, 276L, 276L, 276L, 277L, 277L, 277L,
278L, 278L, 278L, 279L, 279L, 279L, 280L, 280L, 280L, 281L, 281L,
281L, 282L, 282L, 282L, 283L, 283L, 283L, 284L, 284L, 284L, 285L,
285L, 285L, 286L, 286L, 286L, 287L, 287L, 287L, 288L, 288L, 288L,
289L, 289L, 289L, 290L, 290L, 290L, 291L, 291L, 291L, 292L, 292L,
292L, 293L, 293L, 293L, 294L, 294L, 294L, 295L, 295L, 295L, 296L,
296L, 296L, 297L, 297L, 297L, 298L, 298L, 298L, 299L, 299L, 299L,
300L, 300L, 300L, 301L, 301L, 301L, 302L, 302L, 302L, 303L, 303L,
303L, 304L, 304L, 304L, 305L, 305L, 305L, 306L, 306L, 306L, 307L,
307L, 307L, 308L, 308L, 308L, 309L, 309L, 309L, 310L, 310L, 310L,
311L, 311L, 311L, 312L, 312L, 312L, 319L, 319L, 319L, 320L, 320L,
320L, 321L, 321L, 321L, 322L, 322L, 322L, 323L, 323L, 323L, 324L,
324L, 324L, 325L, 325L, 325L, 326L, 326L, 326L, 327L, 327L, 327L,
328L, 328L, 328L, 329L, 329L, 329L, 330L, 330L, 330L, 331L, 331L,
331L, 332L, 332L, 332L, 333L, 333L, 333L, 334L, 334L, 334L, 335L,
335L, 335L, 336L, 336L, 336L, 337L, 337L, 337L, 338L, 338L, 338L,
339L, 339L, 339L, 340L, 340L, 340L, 341L, 341L, 341L, 342L, 342L,
342L, 343L, 343L, 343L, 344L, 344L, 344L, 345L, 345L, 345L, 346L,
346L, 346L, 347L, 347L, 347L, 348L, 348L, 348L, 349L, 349L, 349L,
350L, 350L, 350L, 351L, 351L, 351L, 352L, 352L, 352L, 353L, 353L,
353L, 354L, 354L, 354L, 355L, 355L, 355L, 356L, 356L, 356L, 357L,
357L, 357L, 358L, 358L, 358L, 359L, 359L, 359L, 360L, 360L, 360L,
361L, 361L, 361L, 362L, 362L, 362L, 363L, 363L, 363L, 364L, 364L,
364L, 365L, 365L, 365L, 366L, 366L, 366L, 367L, 367L, 367L, 368L,
368L, 368L, 369L, 369L, 369L, 370L, 370L, 370L, 371L, 371L, 371L,
372L, 372L, 372L, 391L, 391L, 391L, 392L, 392L, 392L, 393L, 393L,
393L, 394L, 394L, 394L, 395L, 395L, 395L, 396L, 396L, 396L, 397L,
397L, 397L, 398L, 398L, 398L, 399L, 399L, 399L, 400L, 400L, 400L,
401L, 401L, 401L, 402L, 402L, 402L, 403L, 403L, 403L, 404L, 404L,
404L, 405L, 405L, 405L, 406L, 406L, 406L, 407L, 407L, 407L, 408L,
408L, 408L, 409L, 409L, 409L, 410L, 410L, 410L, 411L, 411L, 411L,
412L, 412L, 412L, 413L, 413L, 413L, 414L, 414L, 414L, 415L, 415L,
415L, 416L, 416L, 416L, 417L, 417L, 417L, 418L, 418L, 418L, 419L,
419L, 419L, 420L, 420L, 420L, 421L, 421L, 421L, 422L, 422L, 422L,
423L, 423L, 423L, 424L, 424L, 424L, 425L, 425L, 425L, 426L, 426L,
426L, 427L, 427L, 427L, 428L, 428L, 428L, 429L, 429L, 429L, 430L,
430L, 430L, 431L, 431L, 431L, 432L, 432L, 432L, 373L, 373L, 373L,
374L, 374L, 374L, 375L, 375L, 375L, 376L, 376L, 376L, 377L, 377L,
377L, 378L, 378L, 378L, 379L, 379L, 379L, 380L, 380L, 380L, 381L,
381L, 381L, 382L, 382L, 382L, 383L, 383L, 383L, 384L, 384L, 384L,
313L, 313L, 313L, 314L, 314L, 314L, 315L, 315L, 315L, 316L, 316L,
316L, 317L, 317L, 317L, 318L, 318L, 318L), .Label = c("CUR:0:L1",
"CUR:0:L2", "CUR:0:L3", "CUR:0:L4", "CUR:0:L5", "CUR:0:L6", "CUR:00A:L1",
"CUR:00A:L2", "CUR:00A:L3", "CUR:00A:L4", "CUR:00A:L5", "CUR:00A:L6",
"CUR:00B:L1", "CUR:00B:L2", "CUR:00B:L3", "CUR:00B:L4", "CUR:00B:L5",
"CUR:00B:L6", "CUR:00C:L1", "CUR:00C:L2", "CUR:00C:L3", "CUR:00C:L4",
"CUR:00C:L5", "CUR:00C:L6", "CUR:00D:L1", "CUR:00D:L2", "CUR:00D:L3",
"CUR:00D:L4", "CUR:00D:L5", "CUR:00D:L6", "CUR:00F:L1", "CUR:00F:L2",
"CUR:00F:L3", "CUR:00F:L4", "CUR:00F:L5", "CUR:00F:L6", "CUR:00H:L1",
"CUR:00H:L2", "CUR:00H:L3", "CUR:00H:L4", "CUR:00H:L5", "CUR:00H:L6",
"CUR:00I:L1", "CUR:00I:L2", "CUR:00I:L3", "CUR:00I:L4", "CUR:00I:L5",
"CUR:00I:L6", "CUR:00J:L1", "CUR:00J:L2", "CUR:00J:L3", "CUR:00J:L4",
"CUR:00J:L5", "CUR:00J:L6", "CUR:00K:L1", "CUR:00K:L2", "CUR:00K:L3",
"CUR:00K:L4", "CUR:00K:L5", "CUR:00K:L6", "CUR:00L:L1", "CUR:00L:L2",
"CUR:00L:L3", "CUR:00L:L4", "CUR:00L:L5", "CUR:00L:L6", "CUR:00N:L1",
"CUR:00N:L2", "CUR:00N:L3", "CUR:00N:L4", "CUR:00N:L5", "CUR:00N:L6",
"CUR:00O:L1", "CUR:00O:L2", "CUR:00O:L3", "CUR:00O:L4", "CUR:00O:L5",
"CUR:00O:L6", "CUR:00P:L1", "CUR:00P:L2", "CUR:00P:L3", "CUR:00P:L4",
"CUR:00P:L5", "CUR:00P:L6", "CUR:00Q:L1", "CUR:00Q:L2", "CUR:00Q:L3",
"CUR:00Q:L4", "CUR:00Q:L5", "CUR:00Q:L6", "CUR:00R:L1", "CUR:00R:L2",
"CUR:00R:L3", "CUR:00R:L4", "CUR:00R:L5", "CUR:00R:L6", "CUR:00T:L1",
"CUR:00T:L2", "CUR:00T:L3", "CUR:00T:L4", "CUR:00T:L5", "CUR:00T:L6",
"CUR:00U:L1", "CUR:00U:L2", "CUR:00U:L3", "CUR:00U:L4", "CUR:00U:L5",
"CUR:00U:L6", "CUR:00V:L1", "CUR:00V:L2", "CUR:00V:L3", "CUR:00V:L4",
"CUR:00V:L5", "CUR:00V:L6", "CUR:00W:L1", "CUR:00W:L2", "CUR:00W:L3",
"CUR:00W:L4", "CUR:00W:L5", "CUR:00W:L6", "CUR:00X:L1", "CUR:00X:L2",
"CUR:00X:L3", "CUR:00X:L4", "CUR:00X:L5", "CUR:00X:L6", "CUR:00Z:L1",
"CUR:00Z:L2", "CUR:00Z:L3", "CUR:00Z:L4", "CUR:00Z:L5", "CUR:00Z:L6",
"CUR:01A:L1", "CUR:01A:L2", "CUR:01A:L3", "CUR:01A:L4", "CUR:01A:L5",
"CUR:01A:L6", "CUR:01B:L1", "CUR:01B:L2", "CUR:01B:L3", "CUR:01B:L4",
"CUR:01B:L5", "CUR:01B:L6", "CUR:1:L1", "CUR:1:L2", "CUR:1:L3",
"CUR:1:L4", "CUR:1:L5", "CUR:1:L6", "CUR:10:L1", "CUR:10:L2",
"CUR:10:L3", "CUR:10:L4", "CUR:10:L5", "CUR:10:L6", "CUR:11:L1",
"CUR:11:L2", "CUR:11:L3", "CUR:11:L4", "CUR:11:L5", "CUR:11:L6",
"CUR:12:L1", "CUR:12:L2", "CUR:12:L3", "CUR:12:L4", "CUR:12:L5",
"CUR:12:L6", "CUR:13:L1", "CUR:13:L2", "CUR:13:L3", "CUR:13:L4",
"CUR:13:L5", "CUR:13:L6", "CUR:16:L1", "CUR:16:L2", "CUR:16:L3",
"CUR:16:L4", "CUR:16:L5", "CUR:16:L6", "CUR:18:L1", "CUR:18:L2",
"CUR:18:L3", "CUR:18:L4", "CUR:18:L5", "CUR:18:L6", "CUR:19:L1",
"CUR:19:L2", "CUR:19:L3", "CUR:19:L4", "CUR:19:L5", "CUR:19:L6",
"CUR:2:L1", "CUR:2:L2", "CUR:2:L3", "CUR:2:L4", "CUR:2:L5", "CUR:2:L6",
"CUR:3:L1", "CUR:3:L2", "CUR:3:L3", "CUR:3:L4", "CUR:3:L5", "CUR:3:L6",
"CUR:4:L1", "CUR:4:L2", "CUR:4:L3", "CUR:4:L4", "CUR:4:L5", "CUR:4:L6",
"CUR:5:L1", "CUR:5:L2", "CUR:5:L3", "CUR:5:L4", "CUR:5:L5", "CUR:5:L6",
"CUR:6:L1", "CUR:6:L2", "CUR:6:L3", "CUR:6:L4", "CUR:6:L5", "CUR:6:L6",
"CUR:7:L1", "CUR:7:L2", "CUR:7:L3", "CUR:7:L4", "CUR:7:L5", "CUR:7:L6",
"CUR:8:L1", "CUR:8:L2", "CUR:8:L3", "CUR:8:L4", "CUR:8:L5", "CUR:8:L6",
"CUR:9:L1", "CUR:9:L2", "CUR:9:L3", "CUR:9:L4", "CUR:9:L5", "CUR:9:L6",
"PRI:0:L1", "PRI:0:L2", "PRI:0:L3", "PRI:0:L4", "PRI:0:L5", "PRI:0:L6",
"PRI:00A:L1", "PRI:00A:L2", "PRI:00A:L3", "PRI:00A:L4", "PRI:00A:L5",
"PRI:00A:L6", "PRI:00B:L1", "PRI:00B:L2", "PRI:00B:L3", "PRI:00B:L4",
"PRI:00B:L5", "PRI:00B:L6", "PRI:00C:L1", "PRI:00C:L2", "PRI:00C:L3",
"PRI:00C:L4", "PRI:00C:L5", "PRI:00C:L6", "PRI:00D:L1", "PRI:00D:L2",
"PRI:00D:L3", "PRI:00D:L4", "PRI:00D:L5", "PRI:00D:L6", "PRI:00F:L1",
"PRI:00F:L2", "PRI:00F:L3", "PRI:00F:L4", "PRI:00F:L5", "PRI:00F:L6",
"PRI:00H:L1", "PRI:00H:L2", "PRI:00H:L3", "PRI:00H:L4", "PRI:00H:L5",
"PRI:00H:L6", "PRI:00I:L1", "PRI:00I:L2", "PRI:00I:L3", "PRI:00I:L4",
"PRI:00I:L5", "PRI:00I:L6", "PRI:00J:L1", "PRI:00J:L2", "PRI:00J:L3",
"PRI:00J:L4", "PRI:00J:L5", "PRI:00J:L6", "PRI:00K:L1", "PRI:00K:L2",
"PRI:00K:L3", "PRI:00K:L4", "PRI:00K:L5", "PRI:00K:L6", "PRI:00L:L1",
"PRI:00L:L2", "PRI:00L:L3", "PRI:00L:L4", "PRI:00L:L5", "PRI:00L:L6",
"PRI:00N:L1", "PRI:00N:L2", "PRI:00N:L3", "PRI:00N:L4", "PRI:00N:L5",
"PRI:00N:L6", "PRI:00O:L1", "PRI:00O:L2", "PRI:00O:L3", "PRI:00O:L4",
"PRI:00O:L5", "PRI:00O:L6", "PRI:00P:L1", "PRI:00P:L2", "PRI:00P:L3",
"PRI:00P:L4", "PRI:00P:L5", "PRI:00P:L6", "PRI:00Q:L1", "PRI:00Q:L2",
"PRI:00Q:L3", "PRI:00Q:L4", "PRI:00Q:L5", "PRI:00Q:L6", "PRI:00R:L1",
"PRI:00R:L2", "PRI:00R:L3", "PRI:00R:L4", "PRI:00R:L5", "PRI:00R:L6",
"PRI:00T:L1", "PRI:00T:L2", "PRI:00T:L3", "PRI:00T:L4", "PRI:00T:L5",
"PRI:00T:L6", "PRI:00U:L1", "PRI:00U:L2", "PRI:00U:L3", "PRI:00U:L4",
"PRI:00U:L5", "PRI:00U:L6", "PRI:00V:L1", "PRI:00V:L2", "PRI:00V:L3",
"PRI:00V:L4", "PRI:00V:L5", "PRI:00V:L6", "PRI:00W:L1", "PRI:00W:L2",
"PRI:00W:L3", "PRI:00W:L4", "PRI:00W:L5", "PRI:00W:L6", "PRI:00X:L1",
"PRI:00X:L2", "PRI:00X:L3", "PRI:00X:L4", "PRI:00X:L5", "PRI:00X:L6",
"PRI:00Z:L1", "PRI:00Z:L2", "PRI:00Z:L3", "PRI:00Z:L4", "PRI:00Z:L5",
"PRI:00Z:L6", "PRI:01A:L1", "PRI:01A:L2", "PRI:01A:L3", "PRI:01A:L4",
"PRI:01A:L5", "PRI:01A:L6", "PRI:01B:L1", "PRI:01B:L2", "PRI:01B:L3",
"PRI:01B:L4", "PRI:01B:L5", "PRI:01B:L6", "PRI:1:L1", "PRI:1:L2",
"PRI:1:L3", "PRI:1:L4", "PRI:1:L5", "PRI:1:L6", "PRI:10:L1",
"PRI:10:L2", "PRI:10:L3", "PRI:10:L4", "PRI:10:L5", "PRI:10:L6",
"PRI:11:L1", "PRI:11:L2", "PRI:11:L3", "PRI:11:L4", "PRI:11:L5",
"PRI:11:L6", "PRI:12:L1", "PRI:12:L2", "PRI:12:L3", "PRI:12:L4",
"PRI:12:L5", "PRI:12:L6", "PRI:13:L1", "PRI:13:L2", "PRI:13:L3",
"PRI:13:L4", "PRI:13:L5", "PRI:13:L6", "PRI:16:L1", "PRI:16:L2",
"PRI:16:L3", "PRI:16:L4", "PRI:16:L5", "PRI:16:L6", "PRI:18:L1",
"PRI:18:L2", "PRI:18:L3", "PRI:18:L4", "PRI:18:L5", "PRI:18:L6",
"PRI:19:L1", "PRI:19:L2", "PRI:19:L3", "PRI:19:L4", "PRI:19:L5",
"PRI:19:L6", "PRI:2:L1", "PRI:2:L2", "PRI:2:L3", "PRI:2:L4",
"PRI:2:L5", "PRI:2:L6", "PRI:3:L1", "PRI:3:L2", "PRI:3:L3", "PRI:3:L4",
"PRI:3:L5", "PRI:3:L6", "PRI:4:L1", "PRI:4:L2", "PRI:4:L3", "PRI:4:L4",
"PRI:4:L5", "PRI:4:L6", "PRI:5:L1", "PRI:5:L2", "PRI:5:L3", "PRI:5:L4",
"PRI:5:L5", "PRI:5:L6", "PRI:6:L1", "PRI:6:L2", "PRI:6:L3", "PRI:6:L4",
"PRI:6:L5", "PRI:6:L6", "PRI:7:L1", "PRI:7:L2", "PRI:7:L3", "PRI:7:L4",
"PRI:7:L5", "PRI:7:L6", "PRI:8:L1", "PRI:8:L2", "PRI:8:L3", "PRI:8:L4",
"PRI:8:L5", "PRI:8:L6", "PRI:9:L1", "PRI:9:L2", "PRI:9:L3", "PRI:9:L4",
"PRI:9:L5", "PRI:9:L6"), class = "factor")
I wanted to use caret to partition my data, so this is what I did:
library(caret)
train.rows<- createDataPartition(gg, p=0.7,list = FALSE)
> length(train.rows)
[1] 1440
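However, train.rows contains every element of gg even though I asked for a 0.7 split. What am I missing here?
Try it without class = "factor" in the structure() call. When y is a factor, createDataPartition samples within each level, and each of the 480 levels of gg has only three observations.
The partitioned vectors are then: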
indexes <- caret::createDataPartition(gg, times = 1, p = 0.7, list=FALSE)
train <- gg[indexes]
test <- gg[-indexes]
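A base-R sketch of why the factor version may return every row. It rests on an assumption about caret's internals that is not confirmed in this thread: that each level contributes ceiling(n * p) rows. The names gg_toy, per_level, n_stratified and train_idx are hypothetical, and the last part shows a plain random split that ignores the levels.

```r
# Toy factor with the same shape as gg: 480 levels, 3 observations each
gg_toy <- factor(rep(sprintf("lev%03d", 1:480), each = 3))
p <- 0.7

# ASSUMPTION: each level contributes ceiling(n * p) rows.
# With n = 3 and p = 0.7 that is ceiling(2.1) = 3, i.e. the whole level.
per_level <- ceiling(as.vector(table(gg_toy)) * p)
n_stratified <- sum(per_level)

# A plain random split that ignores the levels
set.seed(123)
train_idx <- sample(seq_along(gg_toy), size = round(p * length(gg_toy)))
train <- gg_toy[train_idx]
test <- gg_toy[-train_idx]
```

Under that assumption n_stratified equals length(gg_toy), which matches the 1440 rows reported in the question, while the unstratified split keeps about 70% of the rows.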
Related
I want to plot a heatmap whose columns (x-axis) are grouped into "normal" and then "KIRP" from left to right.
Currently, my code orders the columns by dendrogram/similarity, and unfortunately one outlier "KIRP" sample ends up at the far left. I want to move this "KIRP" sample so that it appears after all the "normal" samples. Within each group, "normal" and "KIRP", the samples should still be clustered and arranged by similarity.
Code:
library(edgeR)   # DGEList, filterByExpr; also attaches limma (voom, lmFit, treat, ...)
library(gplots)  # colorpanel, heatmap.2

# mat, group, design, contrasts and lcpm are assumed to be defined earlier (not shown)
dge <- DGEList(counts=mat, group=group)
keep <- filterByExpr(dge)
v <- voom(mat, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contrasts)
efit <- eBayes(vfit)
tfit <- treat(vfit, lfc=1)
dt <- decideTests(tfit)
de.common <- which(dt[,1]!=0)
# Top 150 genes for the first contrast
kirp.vs.normal <- topTreat(tfit, coef=1, n=Inf)
topgenes <- rownames(kirp.vs.normal)[1:150]
i <- which(rownames(dge) %in% topgenes)
mycol <- colorpanel(1000,"#FFA500","white","#2E2787")
# Columns are currently ordered by the column dendrogram
heatmap.2(lcpm[i,], scale="row",
          labRow=rownames(dge)[i], labCol=group,
          col=mycol, trace="none", density.info="none",
          margin=c(8,1), lhei=c(2,10), dendrogram="column",
          main="Differential Gene Expression in\nNormal vs KIRP Type II CIMP samples",
          cex.main=0.3, lmat=rbind(c(0,3,4), c(2,1,0)), lwid=c(0.5,10,3))
Data:
dput(dge[1:50,1:50])
new("DGEList", .Data = list(structure(c(2L, 47L, 5L, 185L, 124L,
272L, 197L, 405L, 59L, 258L, 270L, 226L, 112L, 322L, 381L, 281L,
145L, 53L, 325L, 107L, 103L, 375L, 70L, 298L, 131L, 79L, 297L,
2L, 345L, 390L, 113L, 289L, 58L, 400L, 389L, 414L, 228L, 188L,
392L, 222L, 86L, 355L, 20L, 49L, 211L, 311L, 96L, 304L, 378L,
145L, 3L, 363L, 199L, 22L, 313L, 305L, 182L, 338L, 32L, 266L,
314L, 35L, 384L, 361L, 37L, 241L, 4L, 340L, 356L, 26L, 100L,
212L, 27L, 273L, 25L, 43L, 355L, 5L, 211L, 155L, 372L, 253L,
180L, 380L, 105L, 13L, 242L, 221L, 401L, 215L, 197L, 233L, 345L,
136L, 254L, 183L, 111L, 390L, 392L, 298L, 1L, 308L, 89L, 118L,
306L, 219L, 50L, 100L, 352L, 286L, 229L, 340L, 135L, 194L, 130L,
124L, 323L, 54L, 105L, 279L, 91L, 99L, 391L, 291L, 395L, 83L,
353L, 1L, 322L, 185L, 196L, 263L, 33L, 274L, 362L, 265L, 234L,
356L, 297L, 154L, 81L, 65L, 293L, 144L, 2L, 132L, 270L, 360L,
371L, 5L, 2L, 95L, 1L, 93L, 248L, 317L, 269L, 373L, 71L, 192L,
375L, 340L, 60L, 108L, 42L, 128L, 3L, 292L, 312L, 173L, 363L,
178L, 17L, 387L, 143L, 329L, 385L, 2L, 252L, 118L, 413L, 16L,
87L, 339L, 88L, 75L, 347L, 184L, 337L, 297L, 136L, 229L, 85L,
358L, 8L, 283L, 162L, 316L, 45L, 7L, 1L, 319L, 2L, 117L, 137L,
199L, 300L, 114L, 291L, 92L, 125L, 168L, 153L, 238L, 3L, 259L,
192L, 360L, 125L, 230L, 80L, 262L, 34L, 266L, 220L, 237L, 272L,
1L, 326L, 38L, 350L, 273L, 352L, 320L, 45L, 218L, 209L, 224L,
288L, 145L, 372L, 192L, 307L, 203L, 2L, 277L, 280L, 233L, 368L,
6L, 2L, 83L, 2L, 192L, 141L, 297L, 203L, 338L, 323L, 210L, 289L,
275L, 91L, 263L, 3L, 4L, 2L, 28L, 259L, 264L, 317L, 198L, 361L,
365L, 373L, 312L, 300L, 2L, 283L, 63L, 123L, 324L, 286L, 251L,
253L, 104L, 284L, 143L, 371L, 237L, 325L, 314L, 16L, 208L, 1L,
191L, 134L, 279L, 348L, 180L, 2L, 126L, 1L, 369L, 368L, 377L,
305L, 314L, 38L, 24L, 407L, 223L, 320L, 66L, 3L, 136L, 2L, 240L,
404L, 227L, 336L, 356L, 403L, 49L, 195L, 260L, 365L, 2L, 405L,
350L, 302L, 351L, 11L, 358L, 225L, 37L, 340L, 132L, 380L, 276L,
146L, 80L, 200L, 328L, 2L, 317L, 184L, 269L, 304L, 280L, 3L,
147L, 4L, 183L, 279L, 198L, 69L, 90L, 337L, 192L, 9L, 173L, 201L,
265L, 2L, 237L, 291L, 392L, 96L, 287L, 30L, 78L, 383L, 317L,
325L, 333L, 275L, 1L, 354L, 12L, 37L, 245L, 378L, 316L, 51L,
284L, 223L, 330L, 308L, 113L, 44L, 321L, 298L, 92L, 4L, 18L,
241L, 269L, 336L, 22L, 1L, 272L, 4L, 114L, 134L, 224L, 315L,
72L, 361L, 200L, 135L, 269L, 98L, 260L, 4L, 42L, 4L, 371L, 148L,
168L, 110L, 323L, 48L, 271L, 33L, 49L, 345L, 2L, 285L, 95L, 79L,
277L, 38L, 327L, 352L, 124L, 230L, 189L, 283L, 160L, 54L, 220L,
357L, 211L, 2L, 287L, 273L, 275L, 339L, 2L, 2L, 302L, 203L, 210L,
190L, 276L, 351L, 51L, 361L, 155L, 232L, 213L, 184L, 330L, 130L,
56L, 342L, 79L, 209L, 178L, 163L, 86L, 375L, 337L, 96L, 286L,
335L, 5L, 382L, 398L, 116L, 322L, 16L, 268L, 40L, 261L, 229L,
263L, 359L, 181L, 117L, 71L, 400L, 113L, 1L, 390L, 23L, 329L,
284L, 5L, 1L, 330L, 3L, 40L, 102L, 200L, 269L, 67L, 284L, 149L,
186L, 145L, 93L, 296L, 4L, 321L, 1L, 35L, 53L, 148L, 57L, 283L,
366L, 280L, 85L, 43L, 357L, 1L, 304L, 9L, 41L, 259L, 326L, 310L,
106L, 153L, 229L, 214L, 243L, 172L, 30L, 289L, 331L, 174L, 111L,
359L, 273L, 294L, 365L, 4L, 3L, 379L, 10L, 171L, 216L, 301L,
151L, 70L, 40L, 34L, 394L, 245L, 390L, 142L, 3L, 146L, 10L, 341L,
154L, 35L, 263L, 65L, 387L, 356L, 23L, 290L, 24L, 1L, 227L, 91L,
323L, 389L, 376L, 275L, 55L, 369L, 328L, 257L, 256L, 304L, 102L,
57L, 62L, 336L, 7L, 217L, 187L, 310L, 401L, 10L, 2L, 284L, 2L,
141L, 222L, 278L, 376L, 67L, 342L, 276L, 142L, 179L, 74L, 310L,
253L, 228L, 8L, 14L, 193L, 190L, 89L, 62L, 40L, 330L, 184L, 283L,
324L, 2L, 244L, 390L, 183L, 277L, 402L, 357L, 388L, 156L, 256L,
255L, 343L, 114L, 79L, 38L, 361L, 167L, 2L, 301L, 375L, 262L,
356L, 205L, 1L, 111L, 1L, 58L, 109L, 315L, 210L, 23L, 18L, 218L,
36L, 268L, 285L, 301L, 7L, 186L, 8L, 258L, 142L, 130L, 291L,
335L, 71L, 19L, 16L, 385L, 69L, 2L, 276L, 375L, 128L, 42L, 369L,
333L, 91L, 318L, 371L, 225L, 270L, 226L, 31L, 329L, 106L, 224L,
2L, 172L, 88L, 292L, 35L, 143L, 2L, 23L, 2L, 74L, 207L, 257L,
357L, 27L, 341L, 124L, 202L, 72L, 86L, 237L, 7L, 287L, 3L, 44L,
224L, 221L, 116L, 35L, 30L, 305L, 71L, 337L, 350L, 1L, 365L,
353L, 62L, 292L, 17L, 288L, 21L, 143L, 228L, 253L, 271L, 178L,
39L, 382L, 363L, 238L, 6L, 137L, 374L, 331L, 346L, 100L, 1L,
26L, 114L, 270L, 218L, 370L, 151L, 361L, 48L, 121L, 345L, 68L,
280L, 308L, 326L, 217L, 2L, 234L, 93L, 40L, 73L, 102L, 85L, 265L,
335L, 301L, 375L, 1L, 163L, 201L, 123L, 260L, 109L, 357L, 208L,
319L, 286L, 108L, 252L, 284L, 184L, 181L, 235L, 240L, 2L, 56L,
194L, 248L, 20L, 232L, 1L, 379L, 4L, 39L, 188L, 291L, 352L, 17L,
363L, 57L, 177L, 215L, 127L, 300L, 3L, 112L, 5L, 23L, 77L, 199L,
32L, 385L, 47L, 311L, 139L, 277L, 346L, 1L, 305L, 60L, 162L,
284L, 13L, 332L, 36L, 159L, 224L, 230L, 304L, 228L, 53L, 376L,
371L, 190L, 2L, 366L, 380L, 258L, 386L, 132L, 2L, 34L, 5L, 120L,
375L, 338L, 126L, 41L, 24L, 173L, 33L, 387L, 54L, 146L, 198L,
292L, 6L, 15L, 226L, 267L, 333L, 153L, 335L, 57L, 380L, 148L,
32L, 3L, 349L, 189L, 298L, 49L, 403L, 350L, 88L, 94L, 343L, 260L,
322L, 311L, 92L, 80L, 371L, 388L, 1L, 274L, 73L, 47L, 280L, 229L,
1L, 316L, 1L, 118L, 261L, 254L, 347L, 58L, 388L, 209L, 264L,
298L, 56L, 288L, 365L, 227L, 365L, 321L, 179L, 92L, 86L, 135L,
35L, 279L, 133L, 190L, 353L, 2L, 314L, 93L, 223L, 329L, 45L,
379L, 16L, 211L, 216L, 262L, 306L, 202L, 108L, 90L, 319L, 204L,
4L, 273L, 381L, 332L, 15L, 118L, 3L, 211L, 138L, 48L, 255L, 302L,
283L, 22L, 368L, 29L, 228L, 363L, 405L, 309L, 58L, 350L, 286L,
301L, 78L, 159L, 65L, 149L, 399L, 306L, 74L, 198L, 336L, 1L,
327L, 91L, 312L, 259L, 108L, 345L, 168L, 70L, 251L, 221L, 314L,
253L, 169L, 97L, 231L, 177L, 4L, 325L, 291L, 293L, 386L, 138L,
2L, 181L, 1L, 10L, 214L, 292L, 311L, 28L, 382L, 163L, 262L, 347L,
77L, 242L, 404L, 340L, 9L, 416L, 232L, 168L, 157L, 95L, 14L,
297L, 303L, 113L, 354L, 129L, 313L, 70L, 190L, 296L, 67L, 355L,
17L, 103L, 266L, 257L, 377L, 226L, 98L, 132L, 345L, 241L, 7L,
108L, 54L, 320L, 395L, 205L, 196L, 265L, 3L, 140L, 257L, 190L,
74L, 95L, 315L, 177L, 151L, 130L, 169L, 235L, 5L, 188L, 311L,
9L, 72L, 277L, 23L, 16L, 354L, 302L, 378L, 353L, 229L, 2L, 338L,
19L, 156L, 262L, 352L, 288L, 274L, 228L, 200L, 351L, 310L, 127L,
28L, 355L, 237L, 78L, 2L, 385L, 273L, 304L, 330L, 1L, 2L, 377L,
138L, 141L, 22L, 286L, 318L, 399L, 38L, 230L, 97L, 295L, 57L,
312L, 2L, 263L, 4L, 385L, 240L, 149L, 58L, 24L, 45L, 300L, 104L,
263L, 367L, 1L, 319L, 100L, 169L, 298L, 36L, 383L, 130L, 91L,
241L, 159L, 320L, 233L, 120L, 347L, 351L, 194L, 4L, 372L, 355L,
329L, 394L, 8L, 3L, 110L, 1L, 311L, 143L, 373L, 178L, 181L, 379L,
393L, 307L, 274L, 121L, 86L, 277L, 3L, 202L, 365L, 124L, 378L,
374L, 107L, 348L, 57L, 370L, 304L, 18L, 139L, 255L, 325L, 286L,
13L, 391L, 293L, 46L, 271L, 345L, 116L, 372L, 242L, 213L, 248L,
149L, 295L, 4L, 106L, 263L, 303L, 279L, 119L, 2L, 262L, 2L, 242L,
373L, 304L, 246L, 216L, 18L, 265L, 255L, 199L, 30L, 291L, 6L,
357L, 133L, 359L, 168L, 266L, 80L, 114L, 352L, 15L, 157L, 389L,
61L, 40L, 60L, 273L, 91L, 130L, 156L, 329L, 387L, 292L, 397L,
403L, 323L, 301L, 215L, 332L, 177L, 375L, 9L, 161L, 181L, 330L,
335L, 254L, 2L, 178L, 120L, 70L, 49L, 352L, 287L, 351L, 368L,
214L, 389L, 221L, 44L, 60L, 2L, 121L, 272L, 363L, 36L, 388L,
284L, 63L, 38L, 57L, 47L, 263L, 342L, 237L, 312L, 300L, 228L,
22L, 373L, 201L, 116L, 77L, 347L, 168L, 318L, 257L, 107L, 27L,
61L, 326L, 5L, 390L, 229L, 349L, 279L, 120L, 1L, 205L, 6L, 108L,
141L, 129L, 47L, 83L, 327L, 23L, 54L, 184L, 181L, 249L, 188L,
143L, 188L, 102L, 107L, 228L, 35L, 82L, 343L, 290L, 12L, 311L,
232L, 1L, 339L, 53L, 159L, 212L, 346L, 239L, 316L, 305L, 158L,
313L, 280L, 139L, 40L, 68L, 320L, 84L, 4L, 143L, 276L, 275L,
310L, 2L, 1L, 147L, 1L, 339L, 67L, 301L, 117L, 329L, 347L, 372L,
276L, 258L, 24L, 310L, 1L, 5L, 1L, 335L, 368L, 47L, 74L, 209L,
262L, 83L, 371L, 28L, 341L, 1L, 305L, 31L, 9L, 357L, 325L, 259L,
30L, 313L, 344L, 216L, 311L, 210L, 100L, 48L, 234L, 330L, 6L,
19L, 285L, 350L, 275L, 180L, 8L, 128L, 2L, 120L, 162L, 266L,
309L, 35L, 375L, 83L, 192L, 258L, 147L, 202L, 142L, 41L, 2L,
313L, 9L, 197L, 126L, 75L, 368L, 323L, 14L, 199L, 63L, 5L, 380L,
91L, 114L, 311L, 29L, 296L, 301L, 203L, 276L, 246L, 335L, 182L,
45L, 11L, 387L, 215L, 1L, 15L, 383L, 364L, 332L, 7L, 2L, 132L,
2L, 116L, 130L, 218L, 321L, 47L, 351L, 264L, 192L, 171L, 126L,
227L, 2L, 363L, 229L, 379L, 210L, 188L, 154L, 285L, 11L, 314L,
338L, 177L, 323L, 2L, 362L, 382L, 49L, 237L, 375L, 310L, 244L,
185L, 254L, 241L, 346L, 111L, 27L, 230L, 383L, 175L, 2L, 232L,
386L, 284L, 287L, 1L, 1L, 347L, 9L, 95L, 285L, 339L, 245L, 397L,
91L, 229L, 298L, 26L, 396L, 385L, 401L, 24L, 8L, 347L, 264L,
122L, 146L, 128L, 28L, 306L, 93L, 390L, 415L, 2L, 300L, 207L,
262L, 287L, 127L, 51L, 230L, 43L, 297L, 194L, 348L, 209L, 213L,
193L, 365L, 165L, 3L, 69L, 17L, 342L, 382L, 131L, 2L, 328L, 128L,
109L, 207L, 199L, 305L, 43L, 357L, 269L, 207L, 209L, 162L, 251L,
4L, 195L, 190L, 104L, 275L, 153L, 117L, 302L, 51L, 307L, 69L,
358L, 341L, 4L, 283L, 359L, 46L, 238L, 12L, 293L, 312L, 212L,
215L, 271L, 348L, 136L, 15L, 353L, 52L, 174L, 190L, 18L, 123L,
321L, 335L, 1L, 2L, 301L, 5L, 365L, 226L, 285L, 206L, 28L, 395L,
297L, 245L, 13L, 75L, 281L, 32L, 324L, 7L, 393L, 94L, 181L, 125L,
153L, 368L, 268L, 10L, 191L, 369L, 1L, 326L, 180L, 188L, 314L,
58L, 375L, 53L, 122L, 236L, 235L, 265L, 207L, 156L, 165L, 313L,
171L, 5L, 385L, 256L, 276L, 29L, 129L, 1L, 216L, 128L, 72L, 117L,
233L, 287L, 378L, 353L, 204L, 80L, 257L, 65L, 315L, 190L, 201L,
227L, 12L, 198L, 146L, 103L, 15L, 48L, 280L, 155L, 308L, 359L,
1L, 253L, 144L, 19L, 288L, 49L, 329L, 119L, 113L, 200L, 151L,
284L, 196L, 102L, 295L, 358L, 240L, 128L, 241L, 281L, 264L, 349L,
3L, 2L, 20L, 3L, 131L, 144L, 180L, 356L, 57L, 331L, 232L, 175L,
199L, 114L, 215L, 191L, 74L, 5L, 88L, 144L, 202L, 135L, 355L,
30L, 283L, 291L, 98L, 350L, 1L, 371L, 26L, 251L, 288L, 398L,
270L, 353L, 93L, 237L, 225L, 333L, 174L, 45L, 8L, 391L, 197L,
6L, 65L, 330L, 346L, 367L, 122L, 1L, 244L, 2L, 95L, 191L, 166L,
360L, 63L, 305L, 118L, 182L, 172L, 110L, 236L, 6L, 17L, 6L, 365L,
237L, 173L, 52L, 322L, 23L, 290L, 373L, 354L, 307L, 1L, 340L,
47L, 97L, 273L, 329L, 284L, 316L, 203L, 219L, 253L, 315L, 142L,
377L, 325L, 321L, 274L, 2L, 40L, 336L, 321L, 351L, 1L, 4L, 236L,
5L, 278L, 195L, 243L, 348L, 40L, 9L, 230L, 215L, 364L, 39L, 196L,
58L, 332L, 114L, 297L, 130L, 165L, 65L, 151L, 75L, 288L, 322L,
200L, 380L, 4L, 227L, 198L, 217L, 233L, 111L, 276L, 293L, 167L,
211L, 303L, 287L, 160L, 128L, 177L, 308L, 147L, 1L, 136L, 376L,
274L, 335L, 176L, 1L, 397L, 1L, 46L, 137L, 315L, 271L, 373L,
393L, 323L, 196L, 232L, 351L, 130L, 207L, 174L, 1L, 197L, 229L,
123L, 228L, 368L, 36L, 383L, 140L, 322L, 342L, 2L, 200L, 345L,
336L, 294L, 249L, 227L, 276L, 88L, 307L, 119L, 390L, 234L, 70L,
46L, 64L, 241L, 205L, 132L, 208L, 270L, 343L, 13L, 1L, 221L,
1L, 168L, 38L, 266L, 333L, 272L, 358L, 335L, 79L, 241L, 110L,
33L, 1L, 1L, 2L, 41L, 295L, 165L, 287L, 368L, 332L, 337L, 357L,
30L, 284L, 117L, 324L, 356L, 162L, 309L, 270L, 306L, 49L, 7L,
282L, 155L, 12L, 194L, 68L, 259L, 208L, 85L, 1L, 214L, 63L, 252L,
305L, 117L, 1L, 398L, 131L, 399L, 221L, 323L, 142L, 373L, 397L,
262L, 277L, 389L, 315L, 324L, 98L, 393L, 8L, 190L, 285L, 58L,
56L, 194L, 87L, 340L, 192L, 239L, 382L, 1L, 246L, 400L, 254L,
269L, 161L, 314L, 148L, 146L, 276L, 200L, 334L, 284L, 185L, 147L,
341L, 140L, 203L, 280L, 16L, 330L, 392L, 325L, 6L, 271L, 6L,
75L, 241L, 205L, 14L, 108L, 363L, 190L, 148L, 269L, 169L, 226L,
6L, 34L, 137L, 398L, 234L, 276L, 104L, 381L, 355L, 341L, 12L,
322L, 325L, 1L, 360L, 123L, 224L, 291L, 400L, 301L, 285L, 245L,
232L, 359L, 344L, 145L, 64L, 369L, 272L, 194L, 4L, 26L, 326L,
309L, 323L, 6L, 1L, 376L, 2L, 102L, 109L, 202L, 298L, 72L, 366L,
230L, 213L, 221L, 84L, 242L, 2L, 339L, 8L, 42L, 247L, 204L, 111L,
28L, 17L, 281L, 375L, 150L, 337L, 1L, 349L, 24L, 229L, 262L,
378L, 277L, 358L, 170L, 254L, 261L, 285L, 205L, 48L, 10L, 333L,
198L, 2L, 324L, 309L, 355L, 359L, 4L, 1L, 55L, 4L, 91L, 386L,
289L, 45L, 13L, 347L, 263L, 370L, 303L, 384L, 127L, 2L, 131L,
6L, 353L, 165L, 333L, 271L, 295L, 105L, 18L, 109L, 20L, 14L,
1L, 305L, 30L, 380L, 397L, 12L, 293L, 385L, 213L, 357L, 201L,
312L, 282L, 23L, 318L, 72L, 301L, 4L, 146L, 96L, 302L, 379L,
2L, 2L, 379L, 2L, 188L, 174L, 223L, 304L, 388L, 392L, 312L, 278L,
182L, 144L, 334L, 5L, 283L, 195L, 59L, 310L, 179L, 192L, 30L,
316L, 335L, 68L, 387L, 332L, 3L, 374L, 402L, 143L, 305L, 20L,
358L, 21L, 208L, 239L, 214L, 407L, 133L, 97L, 326L, 42L, 256L,
123L, 63L, 45L, 395L, 276L, 124L, 1L, 282L, 204L, 21L, 248L,
311L, 74L, 62L, 416L, 347L, 278L, 276L, 69L, 385L, 20L, 297L,
6L, 324L, 176L, 188L, 127L, 108L, 331L, 307L, 214L, 93L, 390L,
2L, 300L, 80L, 349L, 344L, 109L, 398L, 339L, 173L, 244L, 323L,
272L, 259L, 185L, 172L, 366L, 226L, 129L, 35L, 303L, 404L, 391L,
206L, 2L, 346L, 4L, 109L, 30L, 218L, 171L, 17L, 332L, 240L, 239L,
384L, 102L, 73L, 2L, 290L, 245L, 350L, 221L, 39L, 214L, 233L,
74L, 378L, 329L, 286L, 359L, 4L, 315L, 115L, 353L, 306L, 328L,
226L, 305L, 255L, 297L, 167L, 356L, 228L, 392L, 244L, 398L, 210L,
1L, 170L, 92L, 264L, 300L, 8L, 4L, 197L, 2L, 7L, 231L, 352L,
118L, 320L, 24L, 176L, 252L, 392L, 13L, 133L, 3L, 209L, 258L,
121L, 186L, 204L, 80L, 168L, 379L, 374L, 335L, 271L, 39L, 279L,
44L, 148L, 262L, 95L, 99L, 388L, 37L, 309L, 365L, 136L, 333L,
291L, 164L, 215L, 27L, 343L, 3L, 115L, 233L, 368L, 342L, 51L,
4L, 329L, 2L, 14L, 37L, 301L, 125L, 31L, 315L, 389L, 34L, 218L,
38L, 312L, 11L, 4L, 227L, 271L, 228L, 102L, 357L, 208L, 316L,
47L, 20L, 369L, 45L, 4L, 390L, 302L, 94L, 19L, 362L, 251L, 394L,
335L, 32L, 258L, 325L, 256L, 73L, 52L, 110L, 288L, 8L, 58L, 113L,
353L, 320L, 136L, 2L, 230L, 2L, 42L, 123L, 327L, 222L, 354L,
338L, 345L, 236L, 128L, 102L, 41L, 135L, 7L, 3L, 31L, 293L, 347L,
328L, 296L, 16L, 47L, 385L, 86L, 294L, 2L, 19L, 246L, 164L, 371L,
317L, 297L, 100L, 346L, 352L, 206L, 379L, 252L, 40L, 363L, 54L,
186L, 1L, 339L, 139L, 274L, 391L, 194L, 1L, 287L, 1L, 128L, 53L,
268L, 180L, 343L, 13L, 213L, 377L, 270L, 80L, 204L, 119L, 118L,
118L, 10L, 103L, 375L, 202L, 21L, 63L, 316L, 393L, 342L, 344L,
1L, 353L, 93L, 86L, 324L, 327L, 336L, 319L, 230L, 308L, 167L,
355L, 219L, 43L, 32L, 389L, 199L, 3L, 96L, 386L, 290L, 388L,
6L), dim = c(50L, 50L), dimnames = list(c("A2ML1", "ABCA4", "ABCB5",
"ABHD1", "ACRBP", "ACSL5", "ACSM5", "ACSS3", "ACVRL1", "ADH1C",
"ADRB2", "AEBP1", "AFMID", "AIF1", "AIM2", "AKR1B10", "AKR1C4",
"AKR7L", "ALDH3B2", "ALDH8A1", "ALDOC", "ALOX5AP", "ALPK3", "AMFR",
"ANKRD22", "ANKRD2", "ANKRD45", "ANXA8L2", "ANXA9", "AOC3", "APBB1IP",
"APH1B", "APOBEC3C", "APOL3", "APOL4", "APOM", "APP", "AQP1",
"ARFRP1", "ARHGAP29", "ARHGDIB", "ARL11", "ARL4D", "ARRDC3",
"ASCL3", "B3GNT3", "B3GNT8", "BAMBI", "BAZ2B", "BCL2L14"), c("TCGA.BQ.7051.11A",
"TCGA.DZ.6132.11A", "TCGA.CZ.4864.11A", "TCGA.KN.8426.11A", "TCGA.CZ.5982.11A",
"TCGA.A4.A4ZT.11A", "TCGA.CZ.5468.11A", "TCGA.BQ.5894.11A", "TCGA.B0.5699.11A",
"TCGA.KL.8339.11A", "TCGA.CZ.5988.11A", "TCGA.CZ.5461.11A", "TCGA.CJ.6030.11A",
"TCGA.B8.5549.11A", "TCGA.CW.5587.11A", "TCGA.CZ.5987.11A", "TCGA.CJ.5677.11A",
"TCGA.CZ.5470.11A", "TCGA.B2.5636.11A", "TCGA.CJ.5676.11A", "TCGA.KN.8435.11A",
"TCGA.BQ.5877.11A", "TCGA.CZ.5984.11A", "TCGA.CZ.5457.11A", "TCGA.CZ.4863.11A",
"TCGA.CZ.5467.11A", "TCGA.A3.3387.11A", "TCGA.CZ.5456.11A", "TCGA.B9.4115.11A",
"TCGA.GL.6846.11A", "TCGA.B0.5402.11A", "TCGA.DZ.6133.11A", "TCGA.B0.5691.11A",
"TCGA.B0.4700.11A", "TCGA.B0.5696.11A", "TCGA.CW.5581.11A", "TCGA.BQ.7045.11A",
"TCGA.KN.8427.11A", "TCGA.GL.A59R.11A", "TCGA.CW.5584.11A", "TCGA.BQ.5878.11A",
"TCGA.CW.5589.11A", "TCGA.CJ.5672.11A", "TCGA.BQ.7044.11A", "TCGA.CZ.5466.11A",
"TCGA.BQ.5887.11A", "TCGA.CZ.4865.11A", "TCGA.CZ.5458.11A", "TCGA.Y8.A8RY.11A",
"TCGA.KN.8422.11A"))), structure(list(group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), levels = "normal", class = "factor"), lib.size = c(94225,
93733, 87671, 94478, 81956, 81966, 91604, 87469, 81048, 91004,
81264, 92424, 91496, 85877, 87734, 83846, 88254, 92553, 89254,
91736, 96220, 84907, 90231, 87189, 90384, 87166, 81495, 81436,
90285, 83664, 95495, 84763, 91291, 85265, 87117, 81741, 88278,
92099, 81840, 90942, 88312, 89034, 89739, 90942, 95178, 88887,
91263, 85695, 87717, 90705), norm.factors = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1)), row.names = c("TCGA.BQ.7051.11A", "TCGA.DZ.6132.11A",
"TCGA.CZ.4864.11A", "TCGA.KN.8426.11A", "TCGA.CZ.5982.11A", "TCGA.A4.A4ZT.11A",
"TCGA.CZ.5468.11A", "TCGA.BQ.5894.11A", "TCGA.B0.5699.11A", "TCGA.KL.8339.11A",
"TCGA.CZ.5988.11A", "TCGA.CZ.5461.11A", "TCGA.CJ.6030.11A", "TCGA.B8.5549.11A",
"TCGA.CW.5587.11A", "TCGA.CZ.5987.11A", "TCGA.CJ.5677.11A", "TCGA.CZ.5470.11A",
"TCGA.B2.5636.11A", "TCGA.CJ.5676.11A", "TCGA.KN.8435.11A", "TCGA.BQ.5877.11A",
"TCGA.CZ.5984.11A", "TCGA.CZ.5457.11A", "TCGA.CZ.4863.11A", "TCGA.CZ.5467.11A",
"TCGA.A3.3387.11A", "TCGA.CZ.5456.11A", "TCGA.B9.4115.11A", "TCGA.GL.6846.11A",
"TCGA.B0.5402.11A", "TCGA.DZ.6133.11A", "TCGA.B0.5691.11A", "TCGA.B0.4700.11A",
"TCGA.B0.5696.11A", "TCGA.CW.5581.11A", "TCGA.BQ.7045.11A", "TCGA.KN.8427.11A",
"TCGA.GL.A59R.11A", "TCGA.CW.5584.11A", "TCGA.BQ.5878.11A", "TCGA.CW.5589.11A",
"TCGA.CJ.5672.11A", "TCGA.BQ.7044.11A", "TCGA.CZ.5466.11A", "TCGA.BQ.5887.11A",
"TCGA.CZ.4865.11A", "TCGA.CZ.5458.11A", "TCGA.Y8.A8RY.11A", "TCGA.KN.8422.11A"
), class = "data.frame")))
I used some simulated data to try this, since it seems that the question can be generalized to other datasets. I also ran into errors with your subset of data.
set.seed(123)
data <- as.matrix(data.frame(
a1 = rnorm(100, 0, 1),
a2 = rnorm(100, 0, 1),
a3 = rnorm(100, 0, 1),
b1 = rnorm(100, 0, 1.2),
b2 = rnorm(100, 0, 1.2),
b3 = rnorm(100, 0, 1.2)
))
Here we see that b1 would look nicer next to b3 and b2.
library(gplots)
ht <- heatmap.2(
x = data,
ColSideColors = c(rep("#b66363", 3), rep("#8fa8c0", 3)),
col = colorpanel(1000,"#FFA500","white","#2E2787"),
trace = "none",
key = FALSE,
dendrogram = "column",
main = "Example Data",
labRow = FALSE
)
Looking at the structure of the heatmap output, the dendrogram is stored as a "dendrogram" class object, which can be manipulated with the reorder() generic.
The documentation doesn't reveal too much, but the second argument wts describes arbitrary weights to determine a reordered dendrogram. From what I can tell, large values generally get placed to the right. From trial-and-error, it appears that supplying weights in the original order of the columns worked out. This essentially flips branches without affecting the distance metrics.
colden <- ht$colDendrogram
colden_reordered <- reorder(colden, c(10, 1, 1, 100, 300, 200))
plot(colden, main = "original dendrogram")
plot(colden_reordered, main = "modified dendrogram")
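To check what the weights do without drawing a heatmap, here is a minimal base-R sketch. The matrix m and the object names are made up for illustration; only the weights are taken from the call above. As far as I can tell, reorder() only rotates branches, so the leaf set, the tree height, and the pairwise tree (cophenetic) distances should all stay the same.

```r
# Small simulated matrix with six named columns (illustrative data only)
set.seed(123)
m <- matrix(rnorm(60), nrow = 10,
            dimnames = list(NULL, c("a1", "a2", "a3", "b1", "b2", "b3")))

# Cluster the columns and convert the result to a dendrogram
dend <- as.dendrogram(hclust(dist(t(m))))

# Weights are supplied in the original column order (a1, a2, a3, b1, b2, b3)
dend_reordered <- reorder(dend, wts = c(10, 1, 1, 100, 300, 200))

# Leaf order before and after reordering
print(labels(dend))
print(labels(dend_reordered))

# The collection of pairwise tree distances is unchanged by the reordering
same_distances <- all.equal(sort(as.vector(cophenetic(dend))),
                            sort(as.vector(cophenetic(dend_reordered))))
print(same_distances)
```

Comparing the two printed label vectors shows which branches were flipped for a given set of weights.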
We can then plot the heatmap with the new dendrogram using the Colv option.
ht2 <- heatmap.2(
x = data,
ColSideColors = c(rep("#b66363", 3), rep("#8fa8c0", 3)),
col = colorpanel(1000,"#FFA500","white","#2E2787"),
trace = "none",
key = FALSE,
dendrogram = "column",
Colv = colden_reordered,
main = "Reordered manually",
labRow = FALSE
)
The dendsort package may be a better go-to tool for sorting dendrograms. In short, dendsort() moves clusters with smaller average distances to the left. Using this alone seems to solve the issue. With larger heatmaps, the benefit may be more apparent when looking for patterns in the data. It seems preferable to manual reordering, when possible. Below I've modified the Colv option to use this function.
library(dendsort)
ht3 <- heatmap.2(
x = data,
ColSideColors = c(rep("#b66363", 3), rep("#8fa8c0", 3)),
col = colorpanel(1000,"#FFA500","white","#2E2787"),
trace = "none",
density.info = "none",
key = FALSE,
dendrogram = "column",
Colv = dendsort(colden),
main = "Reordered w/ dendsort()",
labRow = FALSE
)
For more options for heatmaps concerning clustering and groups, I use the ComplexHeatmap package, which has wonderful documentation. It has many options for splitting columns; for instance, slicing up the columns into groups first, and then clustering within those slices. See section 2.7 of Chapter 2, "A Single Heatmap", in the ComplexHeatmap Complete Reference.
I am trying to define the model for my test and training datasets, but I get the following error:
Error in eval(predvars, data, env) : object 'avg_rating' not found
But all of my datasets have the "avg_rating"
This is my code
lm_model <- train(avg_rating ~., data = trainingindex,method = "lm",na.action = na.omit, preProcess = c("scale", "center"),trControl = trainControl(method = "none"))
structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 21L, 23L, 24L, 25L, 27L, 28L, 29L, 30L,
31L, 32L, 33L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L,
45L, 46L, 47L, 48L, 49L, 52L, 53L, 55L, 58L, 61L, 62L, 63L, 65L,
66L, 67L, 68L, 69L, 70L, 71L, 74L, 77L, 78L, 80L, 81L, 83L, 84L,
85L, 86L, 87L, 88L, 90L, 91L, 92L, 93L, 94L, 96L, 97L, 99L, 102L,
103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 113L, 115L,
116L, 118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 127L,
128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L,
139L, 140L, 141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 150L,
152L, 154L, 155L, 157L, 158L, 160L, 161L, 162L, 165L, 166L, 167L,
168L, 170L, 171L, 172L, 173L, 174L, 175L, 176L, 177L, 178L, 179L,
180L, 181L, 182L, 185L, 187L, 188L, 189L, 190L, 191L, 192L, 193L,
194L, 195L, 196L, 197L, 199L, 200L, 201L, 202L, 203L, 204L, 205L,
207L, 208L, 209L, 210L, 213L, 214L, 216L, 217L, 219L, 220L, 221L,
223L, 224L, 225L, 226L, 227L, 228L, 230L, 231L, 232L, 233L, 234L,
235L, 236L, 237L, 238L, 239L, 240L, 242L, 243L, 244L, 245L, 246L,
247L, 248L, 249L, 250L, 251L, 252L, 253L, 254L, 255L, 257L, 259L,
260L, 261L, 262L, 263L, 264L, 266L, 267L, 268L, 271L, 272L, 273L,
274L, 275L, 276L, 277L, 278L, 280L, 281L, 282L, 284L, 285L, 286L,
287L, 288L, 290L, 291L, 294L, 295L, 296L, 297L, 298L, 299L, 300L,
301L, 302L, 303L, 304L, 305L, 308L, 309L, 310L, 311L, 312L, 313L,
314L, 315L, 317L, 318L, 319L, 320L, 321L, 322L, 323L, 324L, 326L,
327L, 329L, 330L, 331L, 332L, 333L, 334L, 335L, 336L, 337L, 338L,
340L, 341L, 343L, 344L, 345L, 346L, 348L, 349L, 350L, 351L, 353L,
354L, 355L, 356L, 357L, 358L, 359L, 360L, 361L, 363L, 364L, 365L,
366L, 367L, 368L, 369L, 370L, 371L, 372L, 373L, 374L, 375L, 376L,
377L, 378L, 379L, 380L, 381L, 382L, 383L, 384L, 385L, 386L, 387L,... 3687L), .Dim = c(2952L, 1
), .Dimnames = list(NULL, "Resample1"))
15L, 16L, 17L, 18L, 19L, 21L, 23L, 24L, 25L, 27L, 28L, 29L, 30L,
31L, 32L, 33L, 35L, 36L), .Dim = c(30L, 1L), .Dimnames = list(
NULL, "Resample1"))
This is a little strange. I have converted data from .csv to .xts several times before, but this time, for some reason, I cannot.
Here is my data set (dput() of half the real data set, since the complete one exceeded the character limit; the problem persists with this half):
structure(list(time = structure(c(347L, 369L, 391L, 413L, 435L,
457L, 479L, 501L, 522L, 543L, 564L, 585L, 605L, 624L, 641L, 12L,
33L, 54L, 75L, 96L, 117L, 138L, 159L, 180L, 201L, 222L, 243L,
264L, 285L, 306L, 327L, 349L, 371L, 393L, 415L, 437L, 459L, 481L,
503L, 524L, 545L, 566L, 587L, 607L, 626L, 643L, 14L, 35L, 56L,
77L, 98L, 119L, 140L, 161L, 182L, 203L, 224L, 245L, 266L, 287L,
308L, 329L, 351L, 373L, 395L, 417L, 439L, 461L, 483L, 505L, 526L,
547L, 568L, 589L, 609L, 628L, 16L, 37L, 58L, 79L, 100L, 121L,
142L, 163L, 184L, 205L, 226L, 247L, 268L, 289L, 310L, 331L, 353L,
375L, 397L, 419L, 441L, 463L, 485L, 507L, 528L, 549L, 570L, 591L,
611L, 630L, 645L, 18L, 39L, 60L, 81L, 102L, 123L, 144L, 165L,
186L, 207L, 228L, 249L, 270L, 291L, 312L, 333L, 355L, 377L, 399L,
421L, 443L, 465L, 487L, 509L, 530L, 551L, 572L, 593L, 613L, 632L,
20L, 41L, 62L, 83L, 104L, 125L, 146L, 167L, 188L, 209L, 230L,
251L, 272L, 293L, 314L, 335L, 357L, 379L, 401L, 423L, 445L, 467L,
489L, 511L, 532L, 553L, 574L, 595L, 615L, 634L, 647L, 1L, 22L,
43L, 64L, 85L, 106L, 127L, 148L, 169L, 190L, 211L, 232L, 253L,
274L, 295L, 316L, 337L, 359L, 381L, 403L, 425L, 447L, 469L, 491L,
513L, 534L, 555L, 576L, 597L, 617L, 636L, 3L, 24L, 45L, 66L,
87L, 108L, 129L, 150L, 171L, 192L, 213L, 234L, 255L, 276L, 297L,
318L, 339L, 361L, 383L, 405L, 427L, 449L, 471L, 493L, 515L, 536L,
557L, 578L, 5L, 26L, 47L, 68L, 89L, 110L, 131L, 152L, 173L, 194L,
215L, 236L, 257L, 278L, 299L, 320L, 341L, 363L, 385L, 407L, 429L,
451L, 473L, 495L, 517L, 538L, 559L, 580L, 600L, 619L, 638L, 7L,
28L, 49L, 70L, 91L, 112L, 133L, 154L, 175L, 196L, 217L, 238L,
259L, 280L, 301L, 322L, 343L, 365L, 387L, 409L, 431L, 453L, 475L,
497L, 519L, 540L, 561L, 582L, 602L, 621L, 9L, 30L, 51L, 72L,
93L, 114L, 135L, 156L, 177L, 198L, 219L, 240L, 261L, 282L, 303L,
324L, 345L, 367L, 389L, 411L, 433L, 455L, 477L, 499L, 520L, 541L,
562L, 583L, 603L, 622L, 640L, 10L, 31L, 52L, 73L, 94L, 115L,
136L, 157L, 178L, 199L, 220L, 241L, 262L, 283L, 304L, 325L, 346L,
368L, 390L, 412L, 434L, 456L, 478L, 500L, 521L, 542L, 563L, 584L,
604L, 623L, 11L, 32L, 53L, 74L, 95L, 116L, 137L, 158L, 179L,
200L, 221L, 242L, 263L, 284L, 305L, 326L, 348L, 370L, 392L, 414L,
436L, 458L, 480L, 502L, 523L, 544L, 565L, 586L, 606L, 625L, 642L,
13L, 34L, 55L, 76L, 97L, 118L, 139L, 160L, 181L, 202L, 223L,
244L, 265L, 286L, 307L, 328L, 350L, 372L, 394L, 416L, 438L, 460L,
482L, 504L, 525L, 546L, 567L, 588L, 608L, 627L, 644L, 15L, 36L,
57L, 78L, 99L, 120L, 141L, 162L, 183L, 204L, 225L, 246L, 267L,
288L, 309L, 330L, 352L, 374L, 396L, 418L, 440L, 462L, 484L, 506L,
527L, 548L, 569L, 590L, 610L, 629L, 17L, 38L, 59L, 80L, 101L,
122L, 143L, 164L, 185L, 206L, 227L, 248L, 269L, 290L, 311L, 332L,
354L, 376L, 398L, 420L, 442L, 464L, 486L, 508L, 529L, 550L, 571L,
592L, 612L, 631L, 646L, 19L, 40L, 61L, 82L, 103L, 124L, 145L,
166L, 187L, 208L, 229L, 250L, 271L, 292L, 313L, 334L, 356L, 378L,
400L, 422L, 444L, 466L, 488L, 510L, 531L, 552L, 573L, 594L, 614L,
633L, 21L, 42L, 63L, 84L, 105L, 126L, 147L, 168L, 189L, 210L,
231L, 252L, 273L, 294L, 315L, 336L, 358L, 380L, 402L, 424L, 446L,
468L, 490L, 512L, 533L, 554L, 575L, 596L, 616L, 635L, 648L, 2L,
23L, 44L, 65L, 86L, 107L, 128L, 149L, 170L, 191L, 212L, 233L,
254L, 275L, 296L, 317L, 338L, 360L, 382L, 404L, 426L, 448L, 470L,
492L, 514L, 535L, 556L, 577L, 598L, 618L, 637L, 4L, 25L, 46L,
67L, 88L, 109L, 130L, 151L, 172L, 193L, 214L, 235L, 256L, 277L,
298L, 319L, 340L, 362L, 384L, 406L, 428L, 450L, 472L, 494L, 516L,
537L, 558L, 579L, 599L, 6L, 27L, 48L, 69L, 90L, 111L, 132L, 153L,
174L, 195L, 216L, 237L, 258L, 279L, 300L, 321L, 342L, 364L, 386L,
408L, 430L, 452L, 474L, 496L, 518L, 539L, 560L, 581L, 601L, 620L,
639L, 8L, 29L, 50L, 71L, 92L, 113L, 134L, 155L, 176L, 197L, 218L,
239L, 260L, 281L, 302L, 323L, 344L, 366L, 388L, 410L, 432L, 454L,
476L, 498L), .Label = c("01/01/2015", "01/01/2016", "01/02/2015",
"01/02/2016", "01/03/2015", "01/03/2016", "01/04/2015", "01/04/2016",
"01/05/2015", "01/06/2015", "01/07/2015", "01/08/2014", "01/08/2015",
"01/09/2014", "01/09/2015", "01/10/2014", "01/10/2015", "01/11/2014",
"01/11/2015", "01/12/2014", "01/12/2015", "02/01/2015", "02/01/2016",
"02/02/2015", "02/02/2016", "02/03/2015", "02/03/2016", "02/04/2015",
"02/04/2016", "02/05/2015", "02/06/2015", "02/07/2015", "02/08/2014",
"02/08/2015", "02/09/2014", "02/09/2015", "02/10/2014", "02/10/2015",
"02/11/2014", "02/11/2015", "02/12/2014", "02/12/2015", "03/01/2015",
"03/01/2016", "03/02/2015", "03/02/2016", "03/03/2015", "03/03/2016",
"03/04/2015", "03/04/2016", "03/05/2015", "03/06/2015", "03/07/2015",
"03/08/2014", "03/08/2015", "03/09/2014", "03/09/2015", "03/10/2014",
"03/10/2015", "03/11/2014", "03/11/2015", "03/12/2014", "03/12/2015",
"04/01/2015", "04/01/2016", "04/02/2015", "04/02/2016", "04/03/2015",
"04/03/2016", "04/04/2015", "04/04/2016", "04/05/2015", "04/06/2015",
"04/07/2015", "04/08/2014", "04/08/2015", "04/09/2014", "04/09/2015",
"04/10/2014", "04/10/2015", "04/11/2014", "04/11/2015", "04/12/2014",
"04/12/2015", "05/01/2015", "05/01/2016", "05/02/2015", "05/02/2016",
"05/03/2015", "05/03/2016", "05/04/2015", "05/04/2016", "05/05/2015",
"05/06/2015", "05/07/2015", "05/08/2014", "05/08/2015", "05/09/2014",
"05/09/2015", "05/10/2014", "05/10/2015", "05/11/2014", "05/11/2015",
"05/12/2014", "05/12/2015", "06/01/2015", "06/01/2016", "06/02/2015",
"06/02/2016", "06/03/2015", "06/03/2016", "06/04/2015", "06/04/2016",
"06/05/2015", "06/06/2015", "06/07/2015", "06/08/2014", "06/08/2015",
"06/09/2014", "06/09/2015", "06/10/2014", "06/10/2015", "06/11/2014",
"06/11/2015", "06/12/2014", "06/12/2015", "07/01/2015", "07/01/2016",
"07/02/2015", "07/02/2016", "07/03/2015", "07/03/2016", "07/04/2015",
"07/04/2016", "07/05/2015", "07/06/2015", "07/07/2015", "07/08/2014",
"07/08/2015", "07/09/2014", "07/09/2015", "07/10/2014", "07/10/2015",
"07/11/2014", "07/11/2015", "07/12/2014", "07/12/2015", "08/01/2015",
"08/01/2016", "08/02/2015", "08/02/2016", "08/03/2015", "08/03/2016",
"08/04/2015", "08/04/2016", "08/05/2015", "08/06/2015", "08/07/2015",
"08/08/2014", "08/08/2015", "08/09/2014", "08/09/2015", "08/10/2014",
"08/10/2015", "08/11/2014", "08/11/2015", "08/12/2014", "08/12/2015",
"09/01/2015", "09/01/2016", "09/02/2015", "09/02/2016", "09/03/2015",
"09/03/2016", "09/04/2015", "09/04/2016", "09/05/2015", "09/06/2015",
"09/07/2015", "09/08/2014", "09/08/2015", "09/09/2014", "09/09/2015",
"09/10/2014", "09/10/2015", "09/11/2014", "09/11/2015", "09/12/2014",
"09/12/2015", "10/01/2015", "10/01/2016", "10/02/2015", "10/02/2016",
"10/03/2015", "10/03/2016", "10/04/2015", "10/04/2016", "10/05/2015",
"10/06/2015", "10/07/2015", "10/08/2014", "10/08/2015", "10/09/2014",
"10/09/2015", "10/10/2014", "10/10/2015", "10/11/2014", "10/11/2015",
"10/12/2014", "10/12/2015", "11/01/2015", "11/01/2016", "11/02/2015",
"11/02/2016", "11/03/2015", "11/03/2016", "11/04/2015", "11/04/2016",
"11/05/2015", "11/06/2015", "11/07/2015", "11/08/2014", "11/08/2015",
"11/09/2014", "11/09/2015", "11/10/2014", "11/10/2015", "11/11/2014",
"11/11/2015", "11/12/2014", "11/12/2015", "12/01/2015", "12/01/2016",
"12/02/2015", "12/02/2016", "12/03/2015", "12/03/2016", "12/04/2015",
"12/04/2016", "12/05/2015", "12/06/2015", "12/07/2015", "12/08/2014",
"12/08/2015", "12/09/2014", "12/09/2015", "12/10/2014", "12/10/2015",
"12/11/2014", "12/11/2015", "12/12/2014", "12/12/2015", "13/01/2015",
"13/01/2016", "13/02/2015", "13/02/2016", "13/03/2015", "13/03/2016",
"13/04/2015", "13/04/2016", "13/05/2015", "13/06/2015", "13/07/2015",
"13/08/2014", "13/08/2015", "13/09/2014", "13/09/2015", "13/10/2014",
"13/10/2015", "13/11/2014", "13/11/2015", "13/12/2014", "13/12/2015",
"14/01/2015", "14/01/2016", "14/02/2015", "14/02/2016", "14/03/2015",
"14/03/2016", "14/04/2015", "14/04/2016", "14/05/2015", "14/06/2015",
"14/07/2015", "14/08/2014", "14/08/2015", "14/09/2014", "14/09/2015",
"14/10/2014", "14/10/2015", "14/11/2014", "14/11/2015", "14/12/2014",
"14/12/2015", "15/01/2015", "15/01/2016", "15/02/2015", "15/02/2016",
"15/03/2015", "15/03/2016", "15/04/2015", "15/04/2016", "15/05/2015",
"15/06/2015", "15/07/2015", "15/08/2014", "15/08/2015", "15/09/2014",
"15/09/2015", "15/10/2014", "15/10/2015", "15/11/2014", "15/11/2015",
"15/12/2014", "15/12/2015", "16/01/2015", "16/01/2016", "16/02/2015",
"16/02/2016", "16/03/2015", "16/03/2016", "16/04/2015", "16/04/2016",
"16/05/2015", "16/06/2015", "16/07/2015", "16/08/2014", "16/08/2015",
"16/09/2014", "16/09/2015", "16/10/2014", "16/10/2015", "16/11/2014",
"16/11/2015", "16/12/2014", "16/12/2015", "17/01/2015", "17/01/2016",
"17/02/2015", "17/02/2016", "17/03/2015", "17/03/2016", "17/04/2015",
"17/04/2016", "17/05/2015", "17/06/2015", "17/07/2014", "17/07/2015",
"17/08/2014", "17/08/2015", "17/09/2014", "17/09/2015", "17/10/2014",
"17/10/2015", "17/11/2014", "17/11/2015", "17/12/2014", "17/12/2015",
"18/01/2015", "18/01/2016", "18/02/2015", "18/02/2016", "18/03/2015",
"18/03/2016", "18/04/2015", "18/04/2016", "18/05/2015", "18/06/2015",
"18/07/2014", "18/07/2015", "18/08/2014", "18/08/2015", "18/09/2014",
"18/09/2015", "18/10/2014", "18/10/2015", "18/11/2014", "18/11/2015",
"18/12/2014", "18/12/2015", "19/01/2015", "19/01/2016", "19/02/2015",
"19/02/2016", "19/03/2015", "19/03/2016", "19/04/2015", "19/04/2016",
"19/05/2015", "19/06/2015", "19/07/2014", "19/07/2015", "19/08/2014",
"19/08/2015", "19/09/2014", "19/09/2015", "19/10/2014", "19/10/2015",
"19/11/2014", "19/11/2015", "19/12/2014", "19/12/2015", "20/01/2015",
"20/01/2016", "20/02/2015", "20/02/2016", "20/03/2015", "20/03/2016",
"20/04/2015", "20/04/2016", "20/05/2015", "20/06/2015", "20/07/2014",
"20/07/2015", "20/08/2014", "20/08/2015", "20/09/2014", "20/09/2015",
"20/10/2014", "20/10/2015", "20/11/2014", "20/11/2015", "20/12/2014",
"20/12/2015", "21/01/2015", "21/01/2016", "21/02/2015", "21/02/2016",
"21/03/2015", "21/03/2016", "21/04/2015", "21/04/2016", "21/05/2015",
"21/06/2015", "21/07/2014", "21/07/2015", "21/08/2014", "21/08/2015",
"21/09/2014", "21/09/2015", "21/10/2014", "21/10/2015", "21/11/2014",
"21/11/2015", "21/12/2014", "21/12/2015", "22/01/2015", "22/01/2016",
"22/02/2015", "22/02/2016", "22/03/2015", "22/03/2016", "22/04/2015",
"22/04/2016", "22/05/2015", "22/06/2015", "22/07/2014", "22/07/2015",
"22/08/2014", "22/08/2015", "22/09/2014", "22/09/2015", "22/10/2014",
"22/10/2015", "22/11/2014", "22/11/2015", "22/12/2014", "22/12/2015",
"23/01/2015", "23/01/2016", "23/02/2015", "23/02/2016", "23/03/2015",
"23/03/2016", "23/04/2015", "23/04/2016", "23/05/2015", "23/06/2015",
"23/07/2014", "23/07/2015", "23/08/2014", "23/08/2015", "23/09/2014",
"23/09/2015", "23/10/2014", "23/10/2015", "23/11/2014", "23/11/2015",
"23/12/2014", "23/12/2015", "24/01/2015", "24/01/2016", "24/02/2015",
"24/02/2016", "24/03/2015", "24/03/2016", "24/04/2015", "24/04/2016",
"24/05/2015", "24/06/2015", "24/07/2014", "24/07/2015", "24/08/2014",
"24/08/2015", "24/09/2014", "24/09/2015", "24/10/2014", "24/10/2015",
"24/11/2014", "24/11/2015", "24/12/2014", "24/12/2015", "25/01/2015",
"25/01/2016", "25/02/2015", "25/02/2016", "25/03/2015", "25/03/2016",
"25/04/2015", "25/05/2015", "25/06/2015", "25/07/2014", "25/07/2015",
"25/08/2014", "25/08/2015", "25/09/2014", "25/09/2015", "25/10/2014",
"25/10/2015", "25/11/2014", "25/11/2015", "25/12/2014", "25/12/2015",
"26/01/2015", "26/01/2016", "26/02/2015", "26/02/2016", "26/03/2015",
"26/03/2016", "26/04/2015", "26/05/2015", "26/06/2015", "26/07/2014",
"26/07/2015", "26/08/2014", "26/08/2015", "26/09/2014", "26/09/2015",
"26/10/2014", "26/10/2015", "26/11/2014", "26/11/2015", "26/12/2014",
"26/12/2015", "27/01/2015", "27/01/2016", "27/02/2015", "27/02/2016",
"27/03/2015", "27/03/2016", "27/04/2015", "27/05/2015", "27/06/2015",
"27/07/2014", "27/07/2015", "27/08/2014", "27/08/2015", "27/09/2014",
"27/09/2015", "27/10/2014", "27/10/2015", "27/11/2014", "27/11/2015",
"27/12/2014", "27/12/2015", "28/01/2015", "28/01/2016", "28/02/2015",
"28/02/2016", "28/03/2015", "28/03/2016", "28/04/2015", "28/05/2015",
"28/06/2015", "28/07/2014", "28/07/2015", "28/08/2014", "28/08/2015",
"28/09/2014", "28/09/2015", "28/10/2014", "28/10/2015", "28/11/2014",
"28/11/2015", "28/12/2014", "28/12/2015", "29/01/2015", "29/01/2016",
"29/02/2016", "29/03/2015", "29/03/2016", "29/04/2015", "29/05/2015",
"29/06/2015", "29/07/2014", "29/07/2015", "29/08/2014", "29/08/2015",
"29/09/2014", "29/09/2015", "29/10/2014", "29/10/2015", "29/11/2014",
"29/11/2015", "29/12/2014", "29/12/2015", "30/01/2015", "30/01/2016",
"30/03/2015", "30/03/2016", "30/04/2015", "30/05/2015", "30/06/2015",
"30/07/2014", "30/07/2015", "30/08/2014", "30/08/2015", "30/09/2014",
"30/09/2015", "30/10/2014", "30/10/2015", "30/11/2014", "30/11/2015",
"30/12/2014", "30/12/2015", "31/01/2015", "31/01/2016", "31/03/2015",
"31/03/2016", "31/05/2015", "31/07/2014", "31/07/2015", "31/08/2014",
"31/08/2015", "31/10/2014", "31/10/2015", "31/12/2014", "31/12/2015"
), class = "factor"), index = c(11.54043, 14.27814, 11.5583,
12.37828, 12.54057, 12.10189, 12.12189, 12.28188, 11.96189, 12.35303,
13.023, 12.55187, 11.04192, 8.722033, 6.952167, 6.732189, 9.022016,
8.432052, 5.882287, 5.276563, 4.731485, 4.403024, 4.651509, 6.319038,
7.818936, 7.948929, 6.809, 6.199048, 6.749004, 6.499023, 5.899076,
4.529247, 4.02078, 3.760833, 3.617566, 3.36093, 3.950794, 4.230742,
4.320727, 4.720667, 4.570688, 4.080769, 4.360721, 4.580687, 4.730665,
4.630679, 4.960635, 4.180751, 4.270736, 4.210746, 4.440708, 3.670853,
3.570877, 3.650858, 3.740838, 3.880808, 3.840816, 3.240964, 3.160988,
3.250961, 3.580874, 3.560879, 5.380586, 4.510697, 4.390716, 4.260737,
3.890806, 3.36093, 3.721801, 3.591829, 3.560497, 4.120431, 4.55039,
4.4404, 4.470397, 4.670381, 3.660484, 3.730475, 3.160559, 3.320533,
3.380523, 3.600492, 3.030583, 3.260542, 2.970594, 3.040581, 2.99059,
3.40052, 3.730475, 3.430516, 3.530501, 2.970594, 3.820464, 3.830463,
3.870458, 3.700479, 3.710477, 3.680481, 3.490507, 3.740474, 3.260542,
3.318999, 3.298999, 3.328999, 3.368284, 3.41828, 3.238295, 3.008317,
2.878331, 2.788342, 2.598366, 2.488382, 2.468385, 2.448388, 2.548373,
2.308412, 2.448388, 2.658358, 2.048463, 2.568371, 2.838336, 2.868332,
2.998318, 3.358285, 3.118306, 2.618364, 2.478384, 3.1783, 3.018316,
3.07831, 2.898329, 2.938325, 2.88833, 2.848335, 2.948324, 2.908328,
2.958322, 2.968321, 2.736638, 2.927969, 2.95236, 2.92152, 4.159778,
3.274662, 3.716456, 4.321648, 4.33252, 4.942867, 4.324445, 3.925162,
3.485163, 3.945088, 3.467801, 3.84071, 3.542677, 3.207959, 3.097636,
3.229113, 3.049058, 3.487368, 2.946642, 3.194158, 3.033129, 2.741163,
2.646968, 2.514944, 2.612467, 2.806449, 2.708465, 2.567833, 2.783192,
2.99844, 2.858031, 2.860846, 2.422666, 2.08108, 2.192705, 2.407469,
2.951197, 2.425093, 2.561358, 2.162087, 2.164641, 2.295119, 1.817072,
1.385466, 2.399334, 2.859039, 2.098575, 2.406024, 2.369869, 2.744476,
3.224035, 2.8761, 2.99883, 3.079353, 2.99788, 2.957237, 2.329897,
2.556688, 2.261765, 2.211449, 2.077952, 2.172062, 2.501332, 2.271251,
2.567649, 1.985015, 2.011745, 2.378133, 1.937532, 2.295658, 1.967439,
1.922405, 1.77076, 1.877509, 1.903558, 1.843825, 2.033853, 2.107302,
2.038126, 2.054973, 1.993873, 2.042604, 1.981318, 2.286632, 1.902597,
2.202905, 2.262768, 2.493253, 2.105771, 2.113826, 2.7515, 2.085522,
2.613089, 2.118656, 2.310738, 2.626212, 2.629956, 2.752603, 2.746964,
2.766788, 2.696453, 2.159032, 2.134599, 1.714365, 1.55678, 1.626582,
1.607851, 1.532417, 1.571745, 1.500041, 1.543227, 1.480322, 1.762261,
1.515217, 1.304601, 1.447073, 1.475861, 1.498862, 1.573622, 1.515242,
1.606151, 1.581706, 1.443625, 1.442918, 1.450428, 1.56483, 1.502704,
1.555937, 1.593459, 1.459013, 1.365548, 1.530271, 1.522306, 1.164105,
1.449812, 1.34549, 1.277848, 1.140585, 1.035555, 1.161103, 1.085743,
1.174396, 1.188879, 1.245301, 0.985737, 1.169837, 1.21196, 1.132433,
1.199008, 1.16729, 1.176818, 1.202165, 1.191286, 1.199928, 1.16782,
1.163427, 1.147315, 1.152607, 1.229492, 1.464407, 1.35002, 1.326579,
1.254948, 1.333277, 0.965398, 1.246482, 1.068102, 1.05843, 1.15212,
1.182821, 1.328945, 1.261149, 1.319696, 0.815034, 1.242683, 1.222728,
1.351629, 1.311053, 1.299895, 1.161236, 0.913985, 1.021523, 0.974081,
1.312736, 0.84724, 0.784337, 0.910343, 0.911839, 0.988695, 1.204447,
1.188309, 1.209292, 1.269653, 1.131285, 1.196762, 1.122018, 1.278813,
1.306997, 1.507417, 1.808925, 1.422698, 1.362512, 1.456492, 1.339841,
1.408134, 1.464803, 1.472624, 1.507043, 1.55663, 1.48721, 1.481805,
1.350952, 1.394053, 1.505662, 1.552468, 1.835227, 1.529406, 1.542733,
2.472506, 2.051214, 2.04605, 2.332706, 2.51142, 2.856563, 2.625034,
2.642861, 2.351145, 2.318266, 2.551799, 2.332817, 2.073351, 1.730547,
2.268209, 2.08866, 1.918522, 2.225836, 2.343466, 2.1983, 2.214688,
2.249369, 2.320987, 2.158788, 2.250545, 1.86419, 1.960187, 2.145659,
1.785818, 1.812893, 1.670426, 1.759863, 1.930967, 1.911622, 1.682475,
1.77137, 1.566444, 1.802325, 1.586361, 1.294167, 1.483635, 1.699373,
1.980278, 1.628827, 2.130249, 1.65064, 1.830685, 2.334663, 2.239406,
2.374907, 2.174426, 2.11795, 1.962688, 1.970793, 2.334288, 1.97112,
2.109338, 2.380336, 1.974693, 2.231339, 1.150346, 1.248199, 1.104014,
1.145332, 1.376, 1.365866, 1.431675, 1.411714, 1.470395, 1.463537,
1.479107, 1.571953, 1.582307, 1.425284, 1.357404, 1.459058, 1.29251,
2.079904, 2.043994, 2.02053, 1.854421, 2.024019, 2.027243, 2.024739,
2.020098, 2.072994, 1.89817, 1.970579, 1.925721, 1.940698, 1.958429,
1.97927, 1.990377, 2.545347, 2.343933, 2.110605, 2.372304, 2.614607,
2.65837, 1.253188, 2.371879, 2.48065, 2.581769, 2.201459, 1.705221,
2.662408, 1.769794, 2.160805, 1.933198, 2.318748, 2.279574, 2.206514,
1.86008, 2.221785, 2.732116, 2.876525, 2.45854, 2.093711, 1.990731,
2.119744, 1.88928, 1.906683, 1.711405, 1.290373, 1.965132, 1.639966,
1.579937, 1.896039, 1.955329, 1.970785, 1.41028, 1.963055, 1.935048,
1.958985, 1.912964, 1.915689, 1.844459, 2.267502, 2.263569, 2.260751,
1.863576, 1.810112, 1.739387, 1.646463, 1.552307, 1.871372, 1.735762,
1.694135, 1.627406, 1.789137, 1.636116, 1.65404, 1.655442, 1.466584,
1.630533, 1.474457, 1.505985, 1.435338, 1.537106, 1.521365, 1.464372,
1.450722, 1.387195, 1.432416, 1.409623, 1.943541, 1.895353, 1.727831,
1.915016, 2.142965, 1.78175, 1.757019, 4.046341, 2.268203, 1.695811,
1.714067, 1.689575, 1.810448, 1.587102, 1.83034, 1.513751, 1.535203,
1.531233, 1.43809, 1.390571, 1.292746, 1.3538, 1.201273, 1.481288,
1.600983, 1.438571, 1.583992, 1.766542, 1.717157, 1.773975, 1.95323,
2.0458, 1.965663, 1.868745, 1.862877, 1.717166, 1.85268, 1.865566,
2.831913, 1.858382, 1.926938, 1.911859, 2.364972, 2.271169, 2.147911,
2.273932, 2.173164, 2.235003, 2.160419, 2.58684, 2.440009, 2.334429,
2.374356, 2.637341, 2.751997, 2.662583, 2.570964, 2.643219, 2.196613,
2.226018, 2.142688, 2.403963, 2.384954, 2.661776, 2.711935, 2.714279,
2.329776, 2.370735, 2.100872, 1.943771, 1.575529, 1.544865, 1.51201,
1.443336, 1.655716, 1.664355, 1.717507, 1.717282, 1.806321, 1.788896,
1.803193, 1.401859, 1.762782, 1.537422, 2.145965, 2.305251, 2.110511,
1.934735, 1.946052, 2.138253, 2.025721, 1.993805, 2.072526, 1.888899,
1.803845, 1.830216, 1.821895, 1.843385, 1.999159, 1.951067, 1.889941,
2.360204, 2.645206, 2.347469, 2.241971, 2.043113, 1.962672, 1.903516,
1.609725, 1.71036, 1.801525, 1.748996, 1.566542, 1.588622, 1.507817,
1.629962, 1.669554, 1.624924, 1.555608, 1.474775, 1.438227, 1.664659,
1.499378)), .Names = c("time", "index"), class = "data.frame", row.names = c(NA,
-648L))
So, what I generally do is to write this code:
library(fBasics)
pw_index <- read.csv("~/data/index.csv",
header=T)
# Set time in date format
index$time <- as.Date(index$time, format="%d/%m/%y")
index <- index[order(index$time), ]
# Save the date in a separate identifier as character
dates = as.character(index$time)
index <- index[order(dates), ]
# Convert the data frame to an .xts object:
index_xts <- as.xts(index$index, order.by=index$time)
head(index_xts)
If I initially inspect the dataset via head() I obtain this:
time index
<fctr> <dbl>
1 17/07/2014 11.54043
2 18/07/2014 14.27814
3 19/07/2014 11.55830
4 20/07/2014 12.37828
5 21/07/2014 12.54057
6 22/07/2014 12.10189
However, what I obtain after running the code is a completely messed-up dataset (the last observation should be from 2016...):
[,1]
2020-01-01 2.708465
2020-01-01 2.268203
2020-01-02 2.567833
2020-01-02 1.695811
2020-01-03 2.783192
2020-01-03 1.714067
Who knows what's going on?
Your code is somewhat convoluted, and I'm not entirely sure what you're trying to do. For converting the data in your data.frame into an xts object you can do the following:
library(xts);
xts <- xts(x = df$index, order.by = as.POSIXct(df$time, format = "%d/%m/%Y"));
tail(xts);
# [,1]
#2016-04-19 1.624924
#2016-04-20 1.555608
#2016-04-21 1.474775
#2016-04-22 1.438227
#2016-04-23 1.664659
#2016-04-24 1.499378
I assume that df is your data.frame the content of which you provided with dput.
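As for why the original dates came out as 2020: the format string in the question uses %y (two-digit year), while the data have four-digit years, which need %Y. A minimal base-R sketch of the difference, using one date string from the question (the variable names are only for illustration):

```r
# One date string in the same day/month/year layout as the question's data
d <- "17/07/2014"

# %y reads only the first two digits of the year ("20") and ignores the rest,
# so the year is interpreted as 2020
wrong <- as.Date(d, format = "%d/%m/%y")

# %Y reads the full four-digit year
right <- as.Date(d, format = "%d/%m/%Y")

print(wrong)
print(right)
```

With %y, every date from 2014, 2015 and 2016 ends up in the year 2020, which matches the duplicated 2020 dates in the question's output.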
I have weekly data for 3 years. My objective is to remove the trend and seasonality effects from the series using the stl function.
I can decompose the time series components using the decompose function in the stats package, but I am getting NA values for the first and last 52 values of the trend and random effects.
In my sample dataset there is perfect seasonality, and the mean and variance change over time, so I wanted to build a multiplicative model. Here I have used the stl function in the stats package to decompose trend and seasonality. I know that stl handles an additive model, but a multiplicative model can also be built by using a log transformation. I tried both models, but I am not getting the results I expected. I am sure that I am missing something in this code.
series<-ts(series,frequency=365.25/7,start(2013,9))
series<-structure(c(62L, 72L, 48L, 50L, 302L, 396L, 66L, 33L, 77L, 91L,
38L, 38L, 43L, 45L, 134L, 754L, 1011L, 901L, 483L, 237L, 99L,
59L, 92L, 65L, 120L, 214L, 329L, 387L, 276L, 307L, 395L, 372L,
332L, 258L, 291L, 359L, 211L, 308L, 250L, 1374L, 1131L, 845L,
588L, 770L, 499L, 532L, 491L, 359L, 318L, 219L, 153L, 138L, 156L,
133L, 92L, 77L, 214L, 273L, 86L, 75L, 51L, 163L, 72L, 191L, 62L,
49L, 79L, 573L, 569L, 444L, 410L, 404L, 345L, 141L, 146L, 179L,
127L, 143L, 382L, 548L, 283L, 315L, 392L, 394L, 313L, 373L, 603L,
429L, 384L, 419L, 449L, 1774L, 2025L, 1532L, 1252L, 857L, 790L,
658L, 389L, 398L, 398L, 302L, 237L, 249L, 182L, 167L, 109L, 179L,
377L, 288L, 146L, 126L, 449L, 138L, 580L, 130L, 94L, 150L, 173L,
1246L, 1227L, 991L, 707L, 489L, 592L, 326L, 209L, 259L, 286L,
243L, 344L, 335L, 368L, 397L, 349L, 313L, 1345L, 301L, 1111L,
366L, 274L, 302L, 248L, 2518L, 2186L, 2094L, 2151L, 1847L, 1384L,
666L, 455L, 415L, 302L, 277L, 172L, 186L), .Tsp = c(1, 3.97056810403833,
52.1785714285714), class = "ts")
#Model 1
model1<-stl(series,"periodic",robust="TRUE")
op<-as.data.frame(model1$time.series)
head(op,25)
matplot(op,type="l")
#Model 2
model2<-stl(log(series),"periodic",robust="TRUE")
op<-exp(as.data.frame(model2$time.series))
matplot(op,type="l")
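To make the log-transformation idea concrete, here is a minimal sketch on simulated data (the series x, its frequency of 52, and all object names are assumptions for illustration, not the data above). Since stl() on the log scale returns components that add up to log(x), exponentiating them gives factors that multiply back to x rather than components on the scale of the original series.

```r
# Simulated weekly-like series with multiplicative trend, seasonality and noise
set.seed(1)
tt <- 1:156
x <- ts(exp(0.01 * tt + 0.5 * sin(2 * pi * tt / 52) + rnorm(156, sd = 0.1)),
        frequency = 52)

# Additive decomposition of the log series
fit <- stl(log(x), s.window = "periodic", robust = TRUE)

# Back-transform: each column is now a multiplicative factor
comp <- exp(fit$time.series)

# The product of the three factors should reproduce the original series
recon <- comp[, "seasonal"] * comp[, "trend"] * comp[, "remainder"]
reconstruction_ok <- all.equal(as.numeric(recon), as.numeric(x))
print(reconstruction_ok)
```

One consequence is that in a plot of the back-transformed components, the seasonal and remainder columns hover around 1 rather than around 0.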
How can I improve the model performance?
Please let me know if there are better ways to solve this problem.
Thanks in advance.
I have toyed with a number of ideas to do this, but so far have only come up with some rather inelegant solutions. I'm sure I could make it work, but the code would neither be pretty nor efficient. Here's the problem:
I have a series of integer pairs that are presented as rows in a two-column data frame. The goal is three-fold:
You need to "eliminate" all the rows in this data frame. To "eliminate" a row, you must select either one of the units from that pair and send/save it to a vector of "selected" elements.
You must find the smallest possible combination of "selected elements" that will eliminate all the pairs in the data frame.
The code must be computationally efficient because it will be applied to rather large datasets.
For instance, one would choose items "1" and "2" from the following list of pairs:
1 3
1 4
2 5
3 2
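For reference, the small example above can be read as a vertex-cover problem (each pair is an edge, each number a vertex). Finding the true minimum is hard in general, so the sketch below is only a greedy heuristic that is not guaranteed to return the smallest set: it repeatedly picks the element that appears in the most remaining pairs. The function name greedy_cover and the column names row and col are made up for illustration.

```r
# Greedy heuristic: not guaranteed to be optimal, only a quick approximation
greedy_cover <- function(pairs) {
  pairs <- as.matrix(pairs)
  selected <- integer(0)
  while (nrow(pairs) > 0) {
    # Count how often each element occurs among the remaining pairs
    counts <- table(pairs)
    # Pick the most frequent element (first one in case of ties)
    pick <- as.integer(names(counts)[which.max(counts)])
    selected <- c(selected, pick)
    # Drop every pair that contains the picked element
    keep <- pairs[, 1] != pick & pairs[, 2] != pick
    pairs <- pairs[keep, , drop = FALSE]
  }
  selected
}

# The four pairs from the example above
ex <- data.frame(row = c(1L, 1L, 2L, 3L), col = c(3L, 4L, 5L, 2L))
chosen <- greedy_cover(ex)
print(chosen)
```

On this example the heuristic selects 1 and then 2, which eliminates all four pairs.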
The data below can be used as a working example.
Thanks!
Vincent
Update for some context:
Hi Cipi and SiggyF.
I understand your concerns about this being homework, so in case you read this again, here's some context that I hope may dispel your doubts.
I am working with time-series cross-sectional data in which N is much larger than T. I would like to use panel-corrected standard errors like those proposed in Beck & Katz (1995). The package "pcse" is mostly able to do this just fine. When you have an unbalanced panel, it essentially creates a "rectangular" dataset (every time unit has the full number of observations) by filling in missing values for the omitted observations in every panel. Then, pcse computes a matrix Sigma.hat, which is essentially the weighted average of the outer product of the residuals within time periods (think of it as averaging over an N x N x T array to bring it down to an N x N Sigma.hat).
The problem is that if any two units have no contemporaneous observations, then the corresponding cell in Sigma.hat will be NA, and pcse won't be able to use it to get the sandwich estimator of the variance-covariance matrix. In my example, the data frame numbers correspond to the indices of the missing values in Sigma.hat. I want to trim down Sigma.hat automatically, to get an estimate of the VCOV that uses the most information possible, hence my desire to keep as many of the numbers in the data frame as possible.
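For readers who have not used pcse, here is a minimal sketch of how index pairs like these can arise and what "trimming" means. It assumes a toy symmetric matrix `S` standing in for Sigma.hat. The values and the names `S`, `na_pairs`, `selected` and `S_trim` are made up for illustration and are not taken from pcse itself.

```r
# Toy stand-in for Sigma.hat: symmetric, with NA wherever two units
# share no time period (hypothetical values).
S <- matrix(1, nrow = 4, ncol = 4)
S[1, 3] <- S[3, 1] <- NA
S[1, 4] <- S[4, 1] <- NA

# which(..., arr.ind = TRUE) returns one line per NA cell, with
# columns named "row" and "col".
na_pairs <- as.data.frame(which(is.na(S), arr.ind = TRUE))

# Every NA cell here involves unit 1, so dropping that single unit
# leaves a complete matrix.
selected <- 1
S_trim <- S[-selected, -selected, drop = FALSE]
anyNA(S_trim)
```

The smaller the set of dropped units, the more of Sigma.hat survives, which is why the smallest possible selection is wanted.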
This is probably very unclear to anyone who hasn't looked into pcse, but I hope you get the gist of it.
Sorry to have given an impression of impropriety, but I assure you, this is legit.
test<-structure(list(row = c(27L, 44L,
45L, 111L, 128L, 129L, 195L, 212L,
213L, 279L, 296L, 297L, 363L, 380L,
381L, 7L, 91L, 175L, 259L, 343L, 44L,
45L, 70L, 128L, 129L, 154L, 212L,
213L, 238L, 296L, 297L, 322L, 380L,
381L, 406L, 7L, 37L, 48L, 91L, 121L,
132L, 175L, 205L, 216L, 259L, 289L,
300L, 343L, 373L, 384L, 7L, 37L, 48L,
91L, 121L, 132L, 175L, 205L, 216L,
259L, 289L, 300L, 343L, 373L, 384L,
44L, 45L, 128L, 129L, 212L, 213L,
296L, 297L, 380L, 381L, 37L, 121L,
205L, 289L, 373L, 27L, 44L, 45L, 111L,
128L, 129L, 195L, 212L, 213L, 279L,
296L, 297L, 363L, 380L, 381L, 7L,
91L, 175L, 259L, 343L, 44L, 45L, 70L,
128L, 129L, 154L, 212L, 213L, 238L,
296L, 297L, 322L, 380L, 381L, 406L,
7L, 37L, 48L, 91L, 121L, 132L, 175L,
205L, 216L, 259L, 289L, 300L, 343L,
373L, 384L, 7L, 37L, 48L, 91L, 121L,
132L, 175L, 205L, 216L, 259L, 289L,
300L, 343L, 373L, 384L, 44L, 45L,
128L, 129L, 212L, 213L, 296L, 297L,
380L, 381L, 37L, 121L, 205L, 289L,
373L, 27L, 44L, 45L, 111L, 128L,
129L, 195L, 212L, 213L, 279L, 296L,
297L, 363L, 380L, 381L, 7L, 91L,
175L, 259L, 343L, 44L, 45L, 70L, 128L,
129L, 154L, 212L, 213L, 238L, 296L,
297L, 322L, 380L, 381L, 406L, 7L,
37L, 48L, 91L, 121L, 132L, 175L, 205L,
216L, 259L, 289L, 300L, 343L, 373L,
384L, 7L, 37L, 48L, 91L, 121L, 132L,
175L, 205L, 216L, 259L, 289L, 300L,
343L, 373L, 384L, 44L, 45L, 128L,
129L, 212L, 213L, 296L, 297L, 380L,
381L, 37L, 121L, 205L, 289L, 373L,
27L, 44L, 45L, 111L, 128L, 129L, 195L,
212L, 213L, 279L, 296L, 297L, 363L,
380L, 381L, 7L, 91L, 175L, 259L, 343L,
44L, 45L, 70L, 128L, 129L, 154L,
212L, 213L, 238L, 296L, 297L, 322L,
380L, 381L, 406L, 7L, 37L, 48L, 91L,
121L, 132L, 175L, 205L, 216L, 259L,
289L, 300L, 343L, 373L, 384L, 7L, 37L,
48L, 91L, 121L, 132L, 175L, 205L,
216L, 259L, 289L, 300L, 343L, 373L,
384L, 44L, 45L, 128L, 129L, 212L,
213L, 296L, 297L, 380L, 381L, 37L,
121L, 205L, 289L, 373L, 27L, 44L,
45L, 111L, 128L, 129L, 195L, 212L,
213L, 279L, 296L, 297L, 363L, 380L,
381L, 7L, 91L, 175L, 259L, 343L, 44L,
45L, 70L, 128L, 129L, 154L, 212L,
213L, 238L, 296L, 297L, 322L, 380L,
381L, 406L, 7L, 37L, 48L, 91L, 121L,
132L, 175L, 205L, 216L, 259L, 289L,
300L, 343L, 373L, 384L, 7L, 37L, 48L,
91L, 121L, 132L, 175L, 205L, 216L,
259L, 289L, 300L, 343L, 373L, 384L,
44L, 45L, 128L, 129L, 212L, 213L,
296L, 297L, 380L, 381L, 37L, 121L,
205L, 289L, 373L), col = c(7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 27L, 27L, 27L, 27L, 27L,
37L, 37L, 37L, 37L, 37L, 37L, 37L,
37L, 37L, 37L, 37L, 37L, 37L, 37L,
37L, 44L, 44L, 44L, 44L, 44L, 44L,
44L, 44L, 44L, 44L, 44L, 44L, 44L,
44L, 44L, 45L, 45L, 45L, 45L, 45L,
45L, 45L, 45L, 45L, 45L, 45L, 45L,
45L, 45L, 45L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 70L,
70L, 70L, 70L, 70L, 91L, 91L, 91L,
91L, 91L, 91L, 91L, 91L, 91L, 91L,
91L, 91L, 91L, 91L, 91L, 111L, 111L,
111L, 111L, 111L, 121L, 121L, 121L,
121L, 121L, 121L, 121L, 121L, 121L,
121L, 121L, 121L, 121L, 121L, 121L,
128L, 128L, 128L, 128L, 128L, 128L,
128L, 128L, 128L, 128L, 128L, 128L,
128L, 128L, 128L, 129L, 129L, 129L,
129L, 129L, 129L, 129L, 129L, 129L,
129L, 129L, 129L, 129L, 129L, 129L,
132L, 132L, 132L, 132L, 132L, 132L,
132L, 132L, 132L, 132L, 154L, 154L,
154L, 154L, 154L, 175L, 175L, 175L,
175L, 175L, 175L, 175L, 175L, 175L,
175L, 175L, 175L, 175L, 175L, 175L,
195L, 195L, 195L, 195L, 195L, 205L,
205L, 205L, 205L, 205L, 205L, 205L,
205L, 205L, 205L, 205L, 205L, 205L,
205L, 205L, 212L, 212L, 212L, 212L,
212L, 212L, 212L, 212L, 212L, 212L,
212L, 212L, 212L, 212L, 212L, 213L,
213L, 213L, 213L, 213L, 213L, 213L,
213L, 213L, 213L, 213L, 213L, 213L,
213L, 213L, 216L, 216L, 216L, 216L,
216L, 216L, 216L, 216L, 216L, 216L,
238L, 238L, 238L, 238L, 238L, 259L,
259L, 259L, 259L, 259L, 259L, 259L,
259L, 259L, 259L, 259L, 259L, 259L,
259L, 259L, 279L, 279L, 279L, 279L,
279L, 289L, 289L, 289L, 289L, 289L,
289L, 289L, 289L, 289L, 289L, 289L,
289L, 289L, 289L, 289L, 296L, 296L,
296L, 296L, 296L, 296L, 296L, 296L,
296L, 296L, 296L, 296L, 296L, 296L,
296L, 297L, 297L, 297L, 297L, 297L,
297L, 297L, 297L, 297L, 297L, 297L,
297L, 297L, 297L, 297L, 300L, 300L,
300L, 300L, 300L, 300L, 300L, 300L,
300L, 300L, 322L, 322L, 322L, 322L,
322L, 343L, 343L, 343L, 343L, 343L,
343L, 343L, 343L, 343L, 343L, 343L,
343L, 343L, 343L, 343L, 363L, 363L,
363L, 363L, 363L, 373L, 373L, 373L,
373L, 373L, 373L, 373L, 373L, 373L,
373L, 373L, 373L, 373L, 373L, 373L,
380L, 380L, 380L, 380L, 380L, 380L,
380L, 380L, 380L, 380L, 380L, 380L,
380L, 380L, 380L, 381L, 381L, 381L,
381L, 381L, 381L, 381L, 381L, 381L,
381L, 381L, 381L, 381L, 381L, 381L,
384L, 384L, 384L, 384L, 384L, 384L,
384L, 384L, 384L, 384L, 406L, 406L,
406L, 406L, 406L)), .Names = c("row",
"col" ), row.names = c(NA, -400L),
class = "data.frame")
OK, if you consider your elements as vertices and your pairs as edges of a graph, your problem becomes a case of the well-known (and NP-complete) vertex cover problem. You can easily find an approximate solution, guaranteed to be within a factor of two of optimal, by choosing an arbitrary edge, selecting both of its vertices, removing all the edges they eliminate, then lather, rinse, repeat until no edges remain. You can do incrementally better with more complicated approximation algorithms, but finding the optimal solution for a large graph is probably not feasible.
Here is a simple function to do this. (Note that R is not my native language, so this is probably hideously non-idiomatic; any suggestions for improvement would be appreciated.)
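good <- function(dat, result = NULL) {
  # Base case: no pairs left, so return what has been selected so far
  if (nrow(dat) == 0) return(result)
  # Pick a random remaining pair (edge) and select both of its elements
  sampr <- dat[sample.int(nrow(dat), 1), ]
  picked <- c(sampr$row, sampr$col)
  # Drop every pair touched by either selected element, then recurse
  keep <- !(dat$row %in% picked | dat$col %in% picked)
  good(dat[keep, , drop = FALSE], result = c(result, picked))
}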
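I'd run this a number of times and keep the best result. It is also useful to keep track of the size of the worst one: each run selects both ends of a set of pairs that share no elements, and any solution needs at least one element from each of those pairs, so half the size of the largest result is a lower bound on the optimal size. It might also help to postprocess each result to remove excess vertices, that is, elements whose removal still leaves every pair eliminated.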
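A minimal sketch of that repeat-and-prune loop, run on the four-pair toy example from the question rather than the full data. The names `cover_once`, `covers` and `prune` are made up for illustration. `cover_once` is an iterative stand-in for `good`, included so the snippet runs on its own, and the pruning pass is one simple greedy choice among several possible.

```r
# Iterative stand-in for good(): pick a random remaining pair, select
# both of its elements, drop every pair they touch, and repeat.
cover_once <- function(dat) {
  result <- integer(0)
  while (nrow(dat) > 0) {
    e <- dat[sample.int(nrow(dat), 1), ]
    picked <- c(e$row, e$col)
    result <- c(result, picked)
    dat <- dat[!(dat$row %in% picked | dat$col %in% picked), , drop = FALSE]
  }
  result
}

# TRUE when every pair has at least one element in `selected`.
covers <- function(dat, selected) {
  all(dat$row %in% selected | dat$col %in% selected)
}

# Greedy post-processing: drop any element whose removal still leaves
# every pair eliminated. The outcome depends on the order of `selected`.
prune <- function(dat, selected) {
  selected <- unique(selected)
  for (v in selected) {
    rest <- setdiff(selected, v)
    if (covers(dat, rest)) selected <- rest
  }
  selected
}

# Toy data: the four pairs from the question.
toy <- data.frame(row = c(1L, 1L, 2L, 3L), col = c(3L, 4L, 5L, 2L))

set.seed(1)  # arbitrary seed, only for reproducibility
runs <- replicate(200, cover_once(toy), simplify = FALSE)
pruned <- lapply(runs, prune, dat = toy)

best <- pruned[[which.min(lengths(pruned))]]  # smallest pruned solution
lower_bound <- max(lengths(runs)) / 2         # half the largest raw run
```

For the real data, replace `toy` with `test` and raise the number of runs. Because pruning is order-dependent, different runs can prune down to different sizes, which is another reason to keep many of them.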
Running 10000 iterations (and removing redundant vertices) gives the following 19-element solution to your sample problem.
7 37 45 48 91 121 128 132 175 205 212 216 259 279 289 300 343 373 384
We also know that the optimal solution must have at least 15 vertices.