R cbind xts objects results in added/duplicate row - r

I want to merge three xts objects together with cbind:
> OIH.tmp <-
structure(c(NA, 7.7, 5.1, -6.9, -2.6), index = structure(c(1325221200,
1327986000, 1330491600, 1333080000, 1334894400), tzone = "", tclass = "yearmon"),
tclass = "Date", tzone = "", src = "yahoo", updated = structure(1335041586.83363,
class = c("POSIXct", "POSIXt")), .indexTZ = "", .indexCLASS = "yearmon",
.Dim = c(5L, 1L), .Dimnames = list(NULL, "OIH"), class = c("xts", "zoo"))
> SMH.tmp <-
structure(c(NA, 9.3, 2.9, 3.7, -5), index = structure(c(1325134800,
1327986000, 1330491600, 1333080000, 1334894400), tzone = "", tclass = "yearmon"),
tclass = "Date", tzone = "", src = "yahoo", updated = structure(1335041596.41175,
class = c("POSIXct", "POSIXt")), .indexTZ = "", .indexCLASS = "yearmon",
.Dim = c(5L, 1L), .Dimnames = list(NULL, "SMH"), class = c("xts", "zoo"))
> SU.tmp <-
structure(c(NA, -9.4, -6.9, -2.3, -18.1, -22.6, 22.7, -6.1, -4,
18, 4.1, -9.4, -4.7), index = structure(c(1304049600, 1306814400,
1309406400, 1311912000, 1314763200, 1317355200, 1320033600, 1322629200,
1325221200, 1327986000, 1330491600, 1333080000, 1334894400), tzone = "",
tclass = "yearmon"), tclass = "Date", tzone = "", src = "yahoo",
updated = structure(1335041613.0055, class = c("POSIXct", "POSIXt")),
.indexTZ = "", .indexCLASS = "yearmon", .Dim = c(13L, 1L),
.Dimnames = list(NULL, "SU"), class = c("xts", "zoo"))
> cbind(OIH.tmp, SU.tmp, SMH.tmp)
OIH SU SMH
Apr 2011 NA NA NA
May 2011 NA -9.4 NA
Jun 2011 NA -6.9 NA
Jul 2011 NA -2.3 NA
Aug 2011 NA -18.1 NA
Sep 2011 NA -22.6 NA
Oct 2011 NA 22.7 NA
Nov 2011 NA -6.1 NA
Dec 2011 NA NA NA
Dec 2011 NA -4.0 NA
Jan 2012 7.7 18.0 9.3
Feb 2012 5.1 4.1 2.9
Mar 2012 -6.9 -9.4 3.7
Apr 2012 -2.6 -4.7 -5.0
Notice that there is an additional/duplicate row for Dec 2011, which I don't want. I can think of messy ways to achieve my end goal (here) but I'm sure there must be something more simple/elegant - merge by the index of one object perhaps. This seems simple enough but I've read through documentation for cbind and merge and have not found a simple solution.
I actually have a series of objects that I want to combine/merge. I just used the 3 you see here to illustrate the problem. I actually use the following command to construct the return series:
oneMonthReturn <- do.call(merge, lapply(tickers.tmp, function(x)
round(ROC(Cl(to.monthly(get(x, myEnv))),1) * 100, 1) ))
> dput(tickers.tmp)
c("DJI", "GSPC", "IXIC", "GSPTSE", "XLE", "OIH", "XOP", "XLI",
"XLB", "XLF", "XRT", "XLK", "SMH", "XLY", "XLP", "XLU", "XLV",
"PPH", "MOO", "GLD", "SLV", "GDX", "TLT", "X", "SU", "TCK", "ACHN",
"IDIX", "AGU")
> dput(oneMonthReturn)
structure(c(NA, -1.9, -1.2, -2.2, -4.5, -6.2, 9.1, 0.8, NA, 1.4,
3.3, 2.5, 2, -1.4, NA, -1.4, -1.8, -2.2, -5.8, -7.4, 10.2, -0.5,
NA, 0.8, 4.3, 4, 3.1, -2.1, NA, -1.3, -2.2, -0.6, -6.6, -6.6,
10.6, -2.4, NA, -0.6, 7.7, 5.3, 4.1, -3, NA, -1, -3.7, -2.7,
-1.4, -9.4, 5.3, -0.4, NA, -2.1, 4.1, 1.5, -2, -2, NA, -4.3,
-2.3, 1.4, -10.8, -16, 17.5, 1.7, NA, -2.5, 2.2, 5.8, -4.3, -4,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.7, 5.1, -6.9, -2.6,
NA, -4.5, -3.5, 5.7, -14.8, -22.5, 22.8, 2.4, NA, -4.6, 3.5,
8.3, -4.1, -7.7, NA, -2.8, -1, -7.1, -6.8, -10.3, 13.4, 1.4,
NA, -0.4, 7.1, 2.8, 0.5, -1.8, NA, -2.8, -1, -3.5, -7.3, -18.5,
16, 0.2, NA, -3, 10.4, -0.6, 0, -1.3, NA, -3.4, -3.1, -3.6, -10.1,
-12.5, 13.4, -5.2, NA, 1.5, 7.8, 4.9, 6.8, -3.9, NA, 1.5, -1.4,
-0.2, -7.1, -7.1, 12.9, -1.3, NA, 1.3, 4.8, 6.5, 3.9, -0.5, NA,
-1.1, -2.9, 0.4, -5.5, -3.5, 9.7, -1.5, NA, -0.7, 6, 6.9, 4.1,
-3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9.3, 2.9, 3.7, -5,
NA, -0.3, -0.6, -1.4, -5.4, -7.5, 11.3, -0.7, NA, 0.7, 5.7, 4.4,
4.3, -1, NA, 2.5, -3.4, -1.3, 0.2, -4.1, 4.5, 2.7, NA, 1.8, -1.4,
3.7, 2.5, 0.7, NA, 2.1, -1.2, -0.9, 2.1, -0.8, 3.6, 1, NA, 2.2,
-3.7, 0.6, 0.4, -0.1, NA, 2.4, -1.6, -4, -2.1, -5.1, 5.6, 0.9,
NA, 2.4, 3.1, 1.1, 3.9, -0.9, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0, 2.5, 3, 0.2, NA, -2.4, -3.1, 0.6, -3.6, -18.9, 14.6,
-1.6, NA, -4.4, 8.5, 2.5, 0.4, -1.5, NA, -1.8, -2.5, 8.1, 11.6,
-11.7, 5.7, 1.7, NA, -11.3, 10.8, -3, -1.3, -1.6, NA, -22.1,
-10.5, 13.8, 4, -33.6, 14.6, -4.4, NA, -17.2, 18.1, 3.9, -6.7,
-2, NA, -6.7, -6.3, 4.1, 9.9, -12.9, 6.4, 2.7, NA, -16.1, 9.3,
-1.9, -11.2, -7.2, NA, 2.9, -2.7, 4, 8.9, 12.1, -4.2, 1.7, NA,
2.8, -0.3, -2.9, -4.6, 4.2, NA, -3.4, -0.2, -14.1, -28.4, -31.3,
14.2, 7.4, NA, -3.1, 13.2, -10.4, 7.6, -1.3, NA, -9.4, -6.9,
-2.3, -18.1, -22.6, 22.7, -6.1, NA, -4, 18, 4.1, -9.4, -4.7,
NA, -3.1, -3.5, -2.6, -10.9, -41.8, 31.8, -9.4, NA, -3.6, 18.4,
-5.7, -11.4, 3.5, NA, 29.4, -0.7, -0.4, -18.8, -26.3, 29.2, 5.1,
NA, 13.6, 37.5, -5.5, -9.2, -14.6, NA, -6.7, 5.1, 29, -14, -15.2,
18.4, 23.6, NA, -2, 58.6, -12.8, -18.5, -16.9, NA, -2.7, -0.3,
-0.4, -1.6, -25.4, 21.1, -16.2, NA, -4.2, 17.9, 5.9, 1.4, 0.2
), .Dim = c(14L, 29L), .Dimnames = list(NULL, c("DJI", "GSPC",
"IXIC", "GSPTSE", "XLE", "OIH", "XOP", "XLI", "XLB", "XLF", "XRT",
"XLK", "SMH", "XLY", "XLP", "XLU", "XLV", "PPH", "MOO", "GLD",
"SLV", "GDX", "TLT", "X", "SU", "TCK", "ACHN", "IDIX", "AGU")), index = structure(c(1304049600,
1306814400, 1309406400, 1311912000, 1314763200, 1317355200, 1320033600,
1322629200, 1325134800, 1325221200, 1327986000, 1330491600, 1333080000,
1334894400), tzone = "", tclass = "yearmon"), .indexTZ = "", .indexCLASS = "yearmon", tclass = c("POSIXct",
"POSIXt"), tzone = "", src = "yahoo", updated = structure(1335041583.80238, class = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"))
Appreciate the help.

This should do it (on simpler daya as you didn't post a dput() of your data):
newvar <- merge(merge(OIH.tmp, SU.tmp), SMH.tmp)
merge.xts() only takes two arguments so you have to call it repeatedly. The default aggregations "do the right thing" here:
R> OIH <- xts(c(NA, 1:4),
+ order.by=seq(as.Date("2011-12-01"), by="1 month", length=5))
R> SMH <- xts(c(NA, 1:4),
+ order.by=seq(as.Date("2011-12-01"), by="1 month", length=5))
R> SU <- xts(c(NA, 1:12),
+ order.by=seq(as.Date("2011-04-01"), by="1 month", length=13))
R> merge(OIH, merge(SU, SMH))
OIH SU SMH
2011-04-01 NA NA NA
2011-05-01 NA 1 NA
2011-06-01 NA 2 NA
2011-07-01 NA 3 NA
2011-08-01 NA 4 NA
2011-09-01 NA 5 NA
2011-10-01 NA 6 NA
2011-11-01 NA 7 NA
2011-12-01 NA 8 NA
2012-01-01 1 9 1
2012-02-01 2 10 2
2012-03-01 3 11 3
2012-04-01 4 12 4
R>

The issue with your call to to.monthly, not with merge.xts (as I originally thought). This solution will not work with the version of xts currently on CRAN, but it will work with revision 613 or greater from R-forge.
The problem arises because to.monthly aligns the series on the last time of the period in the actual index, not the theoretical last time of the period; and it does not drop the time component of the index by default. In your case, the last time of 2011-12 is 2011-12-29 for SMH and 2011-12-30 for the other two objects.
If you set drop.time=TRUE (again this only current works with the R-forge version of xts), the results are as you expect:
oneMonthReturn <- do.call(merge, lapply(tickers.tmp, function(x)
round(ROC(Cl(to.monthly(get(x, myEnv),drop.time=TRUE)),1) * 100, 1) ))

For a multiple merge (using Dirk's data) with many columns you can use the Reduce function. I wrapped up the usage in a function that you could tweak to meet your needs. Credit for answering this question correctly should go to Dirk as he answered correctly already, this is just an alternative method as I was bored on a Saturday night :) Shoot, I don't even know what xts is for.
multi.xts.merge <- function(listOguys) {
dat <- Reduce(function(x, y) {merge.xts(x, y)}, listOguys)
names(dat) <- as.character(substitute(listOguys))[-1]
return(dat)
}
multi.xts.merge(list(OIH, SU, SMH))
Note that you have to supply a list to this function

Related

Avoid hard-coding with pivot longer to pivot multiple columns at once

I want to pivot_long() multiple columns of the dataset below avoiding hard-coding. I've seen some similar questions, but I still cannot do it.
Wide data:
> head(data)
ID IND_TEST_SCORE ARG_G1_ABC NARR_G1_ABC ARG_G1_EF NARR_G1_EF ARG_G2_ABC NARR_G2_ABC
1 PART_1 100 68.53 71.32 4.94 3.42 64.90 64.25
2 PART_2 36 65.90 NA 6.55 NA 63.80 59.00
3 PART_3 32 69.78 NA 2.44 NA 71.73 NA
4 PART_4 96 68.29 67.83 3.00 3.17 67.67 67.88
5 PART_5 11 NaN NaN NaN NaN NA 67.08
6 PART_6 12 69.50 71.60 3.25 2.50 NA NA
ARG_G2_EF NARR_G2_EF
1 7.10 5.08
2 7.40 7.00
3 1.09 NA
4 3.67 1.76
5 NA 3.00
6 NA NA
Desired output:
ID IND_TEST_SCORE ABC EF GROUP TYPE
1 PART_1 100 G1 ARG
1 PART_1 100 G1 NARR
1 PART_1 100 G2 ARG
1 PART_1 100 G2 NARR
2 PART_2 36 G1 ARG
2 PART_2 36 G1 NARR
2 PART_2 36 G2 ARG
2 PART_2 36 G2 NARR
so on...
Questions: how can I:
Create a new column called "GROUP" with 'G1' and G2' values
Create a new column called "TYPE" with 'ARG' and NARR' values
Create 2 new columns, one for "ABC" values and another for "EF" values
without hard-coding it? I'd like to work with patterns...Thanks in advance!
My attempt so far:
# create a single "my_names" columns and work on it:
dataLong <- data %>%
pivot_longer(cols = c(-ID, -IND_TEST_SCORE),
names_to = "my_names",
values_to = "my_values") %>%
mutate(GROUP = case_when(my_names == "ARG_G1_ABC" ~ "G1",
my_names == "ARG_G1_ABC" ~ "G2",
my_names == "ARG_G1_EF" ~ "G1",
my_names == "ARG_G2_EF" ~ "G2",
my_names == "NARR_G1_ABC" ~ "G1",
my_names == "NARR_G1_ABC" ~ "G2",
my_names == "NARR_G1_EF" ~ "G1",
my_names == "NARR_G2_EF" ~ "G2")) %>%
mutate(TYPE = case_when(my_names == "ARG_G1_ABC" ~ "ARG",
my_names == "ARG_G2_ABC" ~ "ARG",
my_names == "ARG_G1_EF" ~ "ARG",
my_names == "ARG_G2_EF" ~ "ARG",
my_names == "NARR_G1_ABC" ~ "NARR",
my_names == "NARR_G2_ABC" ~ "NARR",
my_names == "NARR_G1_EF" ~ "NARR",
my_names == "NARR_G2_EF" ~ "NARR"))
Dataset:
> dput(data)
structure(list(ID = structure(c("PART_1", "PART_2", "PART_3",
"PART_4", "PART_5", "PART_6", "PART_7", "PART_8", "PART_9", "PART_10",
"PART_11", "PART_12", "PART_13", "PART_14", "PART_15", "PART_16",
"PART_17", "PART_18", "PART_19", "PART_20", "PART_21", "PART_22",
"PART_23", "PART_24", "PART_25", "PART_26", "PART_27", "PART_28",
"PART_29", "PART_30", "PART_31", "PART_32", "PART_33", "PART_34",
"PART_35", "PART_36", "PART_37", "PART_38", "PART_39", "PART_40",
"PART_41", "PART_42", "PART_43", "PART_44", "PART_45", "PART_46",
"PART_47", "PART_48", "PART_49", "PART_50", "PART_51", "PART_52",
"PART_53", "PART_54", "PART_55", "PART_56", "PART_57", "PART_58",
"PART_59", "PART_60", "PART_61", "PART_62", "PART_63", "PART_64",
"PART_65", "PART_66", "PART_67", "PART_68", "PART_69", "PART_70",
"PART_71"), class = c("glue", "character")), IND_TEST_SCORE = c(100,
36, 32, 96, 11, 12, 32, 72, 100, 64, 2, 19, 99, 86, 60, 108,
95, 35, 60, 9, 78, 61, 61, 67, 105, 99, 51, 21, 65, 30, 0.9,
77, 54, 14, 103, 48, 0.7, 2, 39, 94, 80, 8, 30, 103, 113, 91,
59, 56, 86, 99, 72, 34, 32, 6, 44, 99, 65, 98, 110, 102, 87,
50, 89, 36, 93, 8, 11, 78, 48, 77, 4), ARG_G1_ABC = c(68.53,
65.9, 69.78, 68.29, NaN, 69.5, 67.05, 73.74, 73.59, 72.57, 64.33,
67.79, 72.94, 63.75, 71.56, 75.5, 68.16, NA, 65.64, 68.36, 69.75,
72.73, 67.67, 66.19, 62.94, 72.48, 72.19, 62.44, 72.5, 71.06,
70.4, 69.14, NA, 67.59, 69.1, 74.05, NA, 68.6, 68.27, 59.12,
NA, NA, 63.7, 67.18, NA, 68.38, 63.44, 72.56, 66.06, 66.53, 73.19,
NA, NA, NA, 73.44, 67.45, 72.91, 65.81, 73.96, 75, 75.89, 72,
NA, 68.2, 67.29, 69.91, NaN, 69.67, 68.39, 69.2, 67.55), NARR_G1_ABC = c(71.32,
NA, NA, 67.83, NaN, 71.6, 64.2, 71.68, 73.29, 70.53, 73.35, 59.31,
71.08, 74.06, 68.7, 74, 69.08, NA, 68.52, 63.47, 68.33, NA, 65.64,
62.11, 63.9, 70.41, 60.36, 65.88, 68.81, 69.62, 70.68, 67.5,
NA, 68.45, 67.16, 74.39, 60.6, 65.89, 71.94, 68.75, NA, NA, 67,
66.85, NA, NA, 62.56, 73.33, 69.81, 67.68, 73.06, 65.8, 63.85,
NA, 67.64, 71.6, 68.47, 69.39, 71.16, 72.33, NA, 66.68, NA, 66.22,
67, 61.27, NaN, 72.33, 68.29, 71.33, 65.57), ARG_G1_EF = c(4.94,
6.55, 2.44, 3, NaN, 3.25, 4.71, 2.84, 1.07, 2, 5.33, 5.43, 1.72,
10.55, 3, 1.17, 5.8, NA, 10.55, 4.21, 2.94, 3.55, 6.33, 8.25,
5.88, 2, 3.44, 9.22, 1.69, 4.18, 2.5, 4.71, NA, 4.41, 5.9, 2.21,
NA, 6.67, 3.33, 7, NA, NA, 8, 4.76, NA, 4.44, 2.68, 3.16, 4.94,
5.42, 2.81, NA, NA, NA, 1.78, 6.09, 2.52, 6.56, 1.96, 1.12, 0.67,
3.78, NA, 3.5, 3.65, 5.27, NaN, 4.33, 6.78, 3.6, 4.35), NARR_G1_EF = c(3.42,
NA, NA, 3.17, NaN, 2.5, 3.29, 1.64, 1.07, 6, 1.41, 9.25, 3.25,
2.69, 3.8, 1.32, 3.04, NA, 2.38, 5.18, 2.38, NA, 6.18, 6.11,
6.4, 1.85, 7.45, 3.69, 1.89, 3.25, 1.6, 4.8, NA, 2.8, 4.32, 2.3,
6.6, 7.42, 2.83, 4.75, NA, NA, 5, 4.75, NA, NA, 8, 1.71, 2.67,
2.05, 1.47, 4.8, 7.96, NA, 4.43, 3.8, 4.47, 4.91, 1.68, 2.78,
NA, 6.58, NA, 6.67, 6, 5.18, NaN, 1.67, 4.86, 2.08, 4.38), ARG_G2_ABC = c(64.9,
63.8, 71.73, 67.67, NA, NA, 52.5, 72.35, 65.28, 57.22, NA, NaN,
69, 66.67, NaN, 66.58, 69, 60.55, 56.29, 67.45, 68.4, 64.25,
NaN, 50.86, 67.83, 65.96, 57, 53.07, 66.89, NaN, NA, 59, 61.5,
NA, 65.9, 64.07, NA, NA, 57.91, 67.89, 68.75, 68.5, NaN, 63.24,
66.19, 60.59, 59.24, 54.33, 64.39, 65.83, 65.71, 63, 63.78, 63.62,
64, 65.08, NA, 67.61, 67.57, 72.71, 65.46, 61.71, NA, 57.62,
NA, NA, NA, 64, 61.33, 62.64, NA), NARR_G2_ABC = c(64.25, 59,
NA, 67.88, 67.08, NA, 60.75, 64.42, 71.17, 58.42, NA, 49.8, 63.36,
65.2, NaN, 70.2, 62.85, NaN, 61.6, 53.92, 62.63, NA, NaN, 50.46,
65.14, 60.58, 63.29, NA, 64.33, NaN, NA, 68.57, NA, NA, 66.3,
NA, 57.29, NA, 53.5, 63.48, NA, 57.07, NaN, 61.82, NA, 68.61,
57.1, 62.84, 63, 61.91, 58.38, NaN, 61.56, NA, NaN, 65.55, 63.8,
65, 63.14, 67.31, 67.75, 57.62, 63.31, 54.83, 66.43, NA, NA,
64.67, 57.92, 59, NA), ARG_G2_EF = c(7.1, 7.4, 1.09, 3.67, NA,
NA, 12.75, 1.24, 3.28, 9.78, NA, NaN, 1.71, 1.93, NaN, 6.21,
2.76, 7.91, 8.65, 3.55, 3.4, 5, NaN, 16.05, 3.39, 4.52, 13, 11.6,
5.05, NaN, NA, 9.5, 9.67, NA, 7.03, 3.87, NA, NA, 8, 3.33, 2.19,
3, NaN, 8.53, 3.37, 5.47, 7.35, 13.48, 5.33, 3.83, 3.65, 5.82,
4, 6.17, 6, 6.42, NA, 3.83, 2.71, 2.19, 4.58, 5.18, NA, 9.75,
NA, NA, NA, 5, 6.44, 5.36, NA), NARR_G2_EF = c(5.08, 7, NA, 1.76,
3, NA, 8.88, 4.26, 2.92, 7.08, NA, 10.6, 5.5, 4.16, NaN, 2.87,
4.7, NaN, 7, 9.5, 4.68, NA, NaN, 12.75, 4.77, 9.15, 5, NA, 5.44,
NaN, NA, 4.57, NA, NA, 1.7, NA, 11.29, NA, 13.33, 5.95, NA, 10.79,
NaN, 5.18, NA, 5.22, 7.1, 3.53, 5.75, 6.77, 6.31, NaN, 7.88,
NA, NaN, 3, 4.88, 4.69, 6.19, 10.31, 3.62, 9.75, 5.46, 6.83,
4.43, NA, NA, 3.67, 8.67, 8.53, NA)), row.names = c(NA, -71L), class = "data.frame")
We may use pivot_longer - specify the columns with matches that match the column names substring _ABC or _EF at the end ($) of the string and split the column names at _ by specifying names_sep as _ as well as specify the corresponding column names in names_to (.value will return the value of the columns where as TYPE or GROUP gets the first and second substring from column names
library(tidyr)
pivot_longer(data, cols = matches('_(ABC|EF)$'),
names_to = c("TYPE", "GROUP", ".value"),
names_sep = "_", values_drop_na = TRUE)
-output
# A tibble: 217 × 6
ID IND_TEST_SCORE TYPE GROUP ABC EF
<glue> <dbl> <chr> <chr> <dbl> <dbl>
1 PART_1 100 ARG G1 68.5 4.94
2 PART_1 100 NARR G1 71.3 3.42
3 PART_1 100 ARG G2 64.9 7.1
4 PART_1 100 NARR G2 64.2 5.08
5 PART_2 36 ARG G1 65.9 6.55
6 PART_2 36 ARG G2 63.8 7.4
7 PART_2 36 NARR G2 59 7
8 PART_3 32 ARG G1 69.8 2.44
9 PART_3 32 ARG G2 71.7 1.09
10 PART_4 96 ARG G1 68.3 3
# … with 207 more rows

does R have a "alternate()" function?

I have a large df and I'm trying to relocate the columns with patterns instead of manually write each column name in select(). More details here.
A glimpse of the issue (edit): All my columns share a pattern ARG_G1_50_AAA or ARG_G2_50_AAA or NARR_G1_50_AAA or NARR_G2_50_AAA. The final parts are: AAA, AAC, AC and AB. I need two subsets of this data.
Set 1: I need to intercalate "G1" and "G2" columns (in the order 50, 100, 150 and 200) and in the order (AAA, AAC, AC and AB). Ex:
NARR_G1_50_AAA, NARR_G2_50_AAA,
NARR_G1_50_AAC, NARR_G2_50_AAC.... so on
Set 2: I need to intercalate "Narr" and "Arg" columns (again, 50 before 100, 150 and 200 and AAA before AAC, AC and AB). No need to intercalate G1 and G2 now. Ex:
NARR_G1_50_AAA, ARG_G1_50_AAA,
NARR_G2_50_AAA, ARG_G2_50_AAA... so on
Basically, I was able to partially solve my problem (cf. linked post above) with:
dfPaired <- merged_DF %>%
dplyr::select(ID, str_subset(names(merged_DF), "G?_50\\w*"))
head(dfPaired)
ID ARG_G1_50_AAA ARG_G1_50_AAC ARG_G1_50_AC ARG_G1_50_AB
ARG_G2_50_AAA ARG_G2_50_AAC, ARG_G2_50_AC ARG_G2_50_AB....
## I know that I'm only getting the "50" here, in fact I need all, but It wouldn't be "A" problem to repeat the code for 100, 150, 200)
How can I make R "intercalate" the strings? I mean, I need:
ARG_G1_50_AAA, ARG_G2_50_AAA
ARG_G1_50_AAC, ARG_G2_50_AAC,
ARG_G1_50_AC, ARG_G2_50_AC,
ARG_G1_50_AB, ARG_G2_50_AB ... (so on)
(intercalate G1 and G2 coluns in case of set 1)
Questions :
Could I use sth as seq(by = 2) ?
Is there a way to pass two patterns to str() and ask it to intercalate the output?
Is there an "intercalate()" function that I could pass to str_subset(names(merged_DF), "G?_50\\w*")) ?
** I mean, sth as int(str_subset(names(merged_DF), "G1_50\w*")), str_subset(names(merged_DF), "G2_50\w*")) Thanks in advance :)
EDIT:
dput(merged_DF[1:50])
structure(list(ID = structure(c("P1", "P2", "P3", "P4", "P5",
"P6", "P7", "P8", "P9", "P10", "P11", "P12", "P13", "P14", "P15",
"P16", "P17", "P18", "P19", "P20", "P21", "P22", "P23", "P24",
"P25", "P26", "P27", "P28", "P29", "P30", "P31", "P32", "P33",
"P34", "P35", "P36", "P37", "P38", "P39", "P40", "P41", "P42",
"P43", "P44", "P45", "P46", "P47", "P48", "P49", "P50", "P51",
"P52", "P53", "P54", "P55", "P56", "P57", "P58", "P59", "P60",
"P61", "P62", "P63", "P64", "P65", "P66", "P67", "P68", "P69",
"P70", "P71"), class = c("glue", "character")), ARG_G1_100_AAA = c(68.53,
65.9, 69.78, 68.29, NaN, 69.5, 67.05, 73.74, 73.59, 72.57, 64.33,
67.79, 72.94, 63.75, 71.56, 75.5, 68.16, NA, 65.64, 68.36, 69.75,
72.73, 67.67, 66.19, 62.94, 72.48, 72.19, 62.44, 72.5, 71.06,
70.4, 69.14, NA, 67.59, 69.1, 74.05, NA, 68.6, 68.27, 59.12,
NA, NA, 63.7, 67.18, NA, 68.38, 63.44, 72.56, 66.06, 66.53, 73.19,
NA, NA, NA, 73.44, 67.45, 72.91, 65.81, 73.96, 75, 75.89, 72,
NA, 68.2, 67.29, 69.91, NaN, 69.67, 68.39, 69.2, 67.55), ARG_G1_100_AAC = c(70.18,
67.65, 71.89, 70.42, NaN, 72.38, 69.67, 75.63, 76.7, 76.21, 66.5,
70.57, 76.72, 66.4, 74.75, 79.17, 70.84, NA, 67.82, 70, 71.88,
74.55, 69.33, 69.5, 65.25, 75.05, 75.44, 64.56, 74.88, 74.29,
72.4, 71.93, NA, 69.12, 71.43, 77.53, NA, 71.93, 70.4, 60.25,
NA, NA, 64.8, 69, NA, 71.19, 71.12, 75.04, 68.89, 68.26, 75.81,
NA, NA, NA, 75.89, 68.82, 77.35, 68.38, 76.71, 79.12, 78.89,
73.5, NA, 69.7, 69.82, 70.91, NaN, 72, 71.17, 71.85, 69.7), ARG_G1_100_AC = c(4.35,
4.95, 1.44, 2.71, NaN, 3.25, 3.95, 2.26, 0.85, 1.21, 5.33, 5.43,
0.83, 10.4, 2.56, 0.33, 4.92, NA, 10.55, 3.43, 2.94, 1.55, 5.33,
6.44, 5.25, 2, 3.12, 8.5, 1.38, 3.76, 1.9, 2.79, NA, 4.06, 5.57,
1.95, NA, 6.07, 2.67, 7, NA, NA, 8, 4.76, NA, 4.19, 2.68, 3,
4.94, 4.79, 2.19, NA, NA, NA, 1.78, 5.27, 2.52, 5.88, 1.96, 1.12,
0.67, 3.28, NA, 3.5, 3.41, 3.73, NaN, 3.83, 6.06, 3.3, 3.9),
ARG_G1_100_AB = c(4.94, 6.55, 2.44, 3, NaN, 3.25, 4.71, 2.84,
1.07, 2, 5.33, 5.43, 1.72, 10.55, 3, 1.17, 5.8, NA, 10.55,
4.21, 2.94, 3.55, 6.33, 8.25, 5.88, 2, 3.44, 9.22, 1.69,
4.18, 2.5, 4.71, NA, 4.41, 5.9, 2.21, NA, 6.67, 3.33, 7,
NA, NA, 8, 4.76, NA, 4.44, 2.68, 3.16, 4.94, 5.42, 2.81,
NA, NA, NA, 1.78, 6.09, 2.52, 6.56, 1.96, 1.12, 0.67, 3.78,
NA, 3.5, 3.65, 5.27, NaN, 4.33, 6.78, 3.6, 4.35), ARG_G1_150_AAA = c(93.38,
90.2, 98.33, 94.69, NaN, 99, 93.64, 104.22, 104.8, 103.17,
87, 93.83, 101.89, 87.5, 100.38, 107, 94.69, NA, 90.75, 91.5,
93.88, 99.5, NaN, 89.5, 86.5, 100.55, 101, 84.22, 101.88,
94.62, 97.2, 96.5, NA, 87.38, 96.82, 103.67, NA, 97.57, 95.86,
84, NA, NA, 85.5, 90.5, NA, 96.29, 89.71, 101.64, 92.33,
93.89, 104.43, NA, NA, NA, 101.33, 93.5, 105.42, 90.75, 104.23,
108.86, 102.67, 97, NA, 91.9, 91.38, 93.5, NaN, 98, 94.78,
95.1, 93.4), ARG_G1_150_AAC = c(96.38, 90.9, 100, 96.08,
NaN, 99.5, 95.82, 106.33, 106.6, 106.5, 92, 95.83, 104, 89,
103.75, 109, 96.92, NA, 93, 93.17, 95.12, 102.75, NaN, 93.5,
89.38, 102.09, 104.12, 85.44, 103.38, 96.75, 99.2, 98.5,
NA, 90.38, 99.18, 105.89, NA, 99.43, 97, 84, NA, NA, 86.75,
91.88, NA, 96.86, 98.64, 103.71, 94.22, 95.22, 105.71, NA,
NA, NA, 102.33, 94.25, 108.08, 91.75, 107, 112.29, 106.33,
98.22, NA, 93.5, 93.25, 94.25, NaN, 100, 96.78, 97.8, 95.5
), ARG_G1_150_AC = c(8.75, 10.1, 3.67, 5.23, NaN, 6.5, 6.73,
4.78, 2.27, 3.17, 12, 9.83, 3.44, 21.1, 4.25, 2, 11.85, NA,
17.5, 6.17, 7.25, 3, NaN, 13.5, 10.62, 5, 5.75, 17.44, 4,
10.75, 5, 5.5, NA, 9.5, 9.36, 3.56, NA, 10, 6.86, 9.5, NA,
NA, 16.25, 10.25, NA, 10.43, 6, 6.21, 9.22, 9.22, 5.14, NA,
NA, NA, 3, 10.75, 6, 12.88, 3.77, 2.57, 4.33, 7.22, NA, 8.6,
7.88, 10, NaN, 7, 11.67, 7.8, 7.7), ARG_G1_150_AB = c(10.12,
12.6, 5.33, 5.77, NaN, 6.5, 7.91, 5.44, 2.53, 4.33, 12, 9.83,
4.78, 21.4, 5.25, 3, 13.77, NA, 17.5, 7.33, 7.25, 6, NaN,
16.5, 11.5, 5, 6.25, 18.67, 4.5, 11.38, 5.8, 8.5, NA, 10,
9.82, 4.33, NA, 11, 7.71, 9.5, NA, NA, 16.25, 10.25, NA,
10.86, 6, 7, 9.22, 10.33, 6.43, NA, NA, NA, 3.33, 11.75,
6, 14, 3.77, 2.57, 4.33, 8.22, NA, 8.8, 9, 12, NaN, 8, 12.67,
8.2, 8.4), ARG_G1_200_AAA = c(121.5, 110.6, NaN, 120.57,
NaN, NaN, 115.67, 132.4, 131.11, 128.5, NaN, 114.5, 126.25,
107.4, 124.67, NaN, 120.5, NA, 108, 110.5, 114.33, 125, NaN,
114.67, 108, 123.5, 126.67, 105.5, 129.67, 117.75, 121, 120,
NA, 108.5, 122.83, 130.8, NA, 123.67, 119, NaN, NA, NA, NaN,
109.75, NA, 119, 114.75, 128.88, 115.25, 117, 134, NA, NA,
NA, NaN, 113, 131.86, 110.67, 133.57, 138.33, 127.5, 118.25,
NA, 112.8, 111.5, 113, NaN, NaN, 114.25, 118, 112.8), ARG_G1_200_AAC = c(123.25,
111.6, NaN, 121.29, NaN, NaN, 116.33, 133.4, 132.89, 130.5,
NaN, 115.5, 129.5, 108.2, 128.33, NaN, 123, NA, 108, 111.5,
115.67, 125, NaN, 118, 112, 125.17, 129, 105.75, 130.33,
119.5, 121.4, 121, NA, 109.75, 124.33, 133.4, NA, 125, 120.33,
NaN, NA, NA, NaN, 110.75, NA, 123, 124, 129.75, 117.5, 117.2,
134, NA, NA, NA, NaN, 116, 134.43, 111.33, 135, 141.33, 129.5,
119.5, NA, 114, 113.5, 113, NaN, NaN, 115.5, 120.6, 114),
ARG_G1_200_AC = c(12, 15.6, NaN, 8, NaN, NaN, 10.83, 7.8,
5.33, 6, NaN, 16.5, 6.75, 31.2, 9.33, NaN, 18, NA, 30, 14.5,
13, 11, NaN, 19.67, 17, 9, 9.33, 25.5, 8, 16.25, 9.6, 9,
NA, 16, 12.67, 6.2, NA, 13.67, 11.67, NaN, NA, NA, NaN, 17.5,
NA, 17, 9, 9.5, 14.75, 15.8, 8, NA, NA, NA, NaN, 23, 10.43,
21.33, 5.71, 4.67, 10.25, 13.25, NA, 14.6, 13.25, 19, NaN,
NaN, 21.5, 13.2, 14.6), ARG_G1_200_AB = c(14, 19.4, NaN,
8.71, NaN, NaN, 12.5, 9, 6, 8, NaN, 16.5, 8.5, 31.8, 11,
NaN, 21, NA, 30, 15.5, 13, 15, NaN, 24, 18, 9, 10, 27, 9,
17.25, 10.8, 12, NA, 17, 13.5, 7.2, NA, 14.67, 14, NaN, NA,
NA, NaN, 17.5, NA, 17.67, 9, 10.88, 14.75, 17, 9.67, NA,
NA, NA, NaN, 24, 10.43, 23.33, 5.71, 4.67, 10.5, 15, NA,
14.8, 14.75, 21, NaN, NaN, 23.25, 13.8, 15.8), ARG_G1_50_AAA = c(36.35,
35.88, 36.22, 35.72, 36.12, 36.96, 35.24, 37.62, 36.05, 34.63,
34.19, 33.71, 36.22, 34.43, 34.95, 34.59, 36.03, NA, 32.61,
35.29, 37.17, 37.13, 35.62, 34.64, 34.4, 35.69, 37.36, 36.4,
36.69, 35.8, 36.57, 35.97, NA, 36.44, 34.94, 35.26, NA, 34.44,
37.85, 33.15, NA, NA, 36.13, 34.91, NA, 35.54, 29.02, 35.55,
35.64, 35.79, 35.93, NA, NA, NA, 37, 32.58, 35.71, 34.98,
36.64, 33.29, 35.29, 37.2, NA, 36.29, 36.91, 31.26, 34, 37.48,
33.89, 36.34, 35.88), ARG_G1_50_AAC = c(41.19, 38.7, 41.22,
40.53, 44.12, 41.04, 40.18, 42.38, 42.17, 41.87, 38, 41.21,
42.24, 38.69, 42.64, 42.14, 41.53, NA, 39.65, 40.76, 41.88,
42.23, 39.62, 41.55, 38.19, 42.53, 42.24, 39.49, 42.07, 43.3,
40.92, 39.92, NA, 40.35, 40.49, 44.11, NA, 41.72, 40.64,
36.15, NA, NA, 39.03, 40.86, NA, 40.93, 37.95, 42.27, 39.47,
39.72, 42.12, NA, NA, NA, 42.11, 39.81, 42.82, 39.12, 42.67,
43.02, 43.58, 42.61, NA, 40.04, 41.42, 40.9, 41.5, 41.62,
40.02, 41.08, 40.18), ARG_G1_50_AC = c(0.98, 1.5, 0.37, 0.6,
0.88, 0.73, 1.51, 0.23, 0.25, 0.42, 1.67, 1.58, 0.31, 3.27,
0.62, 0.05, 0.83, NA, 3.71, 1.47, 1.07, 0.1, 1.81, 1.19,
1.62, 0.61, 0.76, 1.73, 0.24, 0.64, 0.33, 0.97, NA, 0.6,
1.98, 0.34, NA, 1.69, 0.26, 2.12, NA, NA, 1.5, 1.14, NA,
1, 0.65, 0.88, 1.62, 1.3, 0.39, NA, NA, NA, 0.57, 1.48, 0.58,
2.21, 0.43, 0.24, 0.16, 0.65, NA, 0.96, 0.4, 1.13, 1.5, 1.05,
1.91, 0.7, 0.94), ARG_G1_50_AB = c(1.09, 2.24, 0.74, 0.68,
0.88, 0.73, 1.82, 0.38, 0.36, 0.89, 1.67, 1.58, 0.76, 3.27,
0.83, 0.45, 1.15, NA, 3.71, 1.82, 1.07, 1.16, 2.25, 1.93,
1.86, 0.61, 1, 2.09, 0.31, 0.86, 0.61, 1.73, NA, 0.77, 2.18,
0.34, NA, 1.92, 0.49, 2.12, NA, NA, 1.5, 1.14, NA, 1.2, 0.65,
0.88, 1.62, 1.49, 0.63, NA, NA, NA, 0.57, 1.77, 0.58, 2.6,
0.43, 0.24, 0.16, 0.85, NA, 0.96, 0.4, 1.84, 1.5, 1.05, 2.4,
0.76, 1.14), ARG_G2_100_AAA = c(64.9, 63.8, 71.73, 67.67,
NA, NA, 52.5, 72.35, 65.28, 57.22, NA, NaN, 69, 66.67, NaN,
66.58, 69, 60.55, 56.29, 67.45, 68.4, 64.25, NaN, 50.86,
67.83, 65.96, 57, 53.07, 66.89, NaN, NA, 59, 61.5, NA, 65.9,
64.07, NA, NA, 57.91, 67.89, 68.75, 68.5, NaN, 63.24, 66.19,
60.59, 59.24, 54.33, 64.39, 65.83, 65.71, 63, 63.78, 63.62,
64, 65.08, NA, 67.61, 67.57, 72.71, 65.46, 61.71, NA, 57.62,
NA, NA, NA, 64, 61.33, 62.64, NA), ARG_G2_100_AAC = c(65.7,
65.8, 74.45, 68, NA, NA, 53.75, 73.94, 67.24, 58.22, NA,
NaN, 71.07, 68.07, NaN, 69.88, 71.32, 62.18, 58.65, 76.45,
71.13, 67.25, NaN, 51.76, 69.33, 68.17, 58, 54.27, 68.05,
NaN, NA, 61, 61.67, NA, 67.79, 65.93, NA, NA, 59.27, 69.67,
71.38, 70, NaN, 64.88, 68.19, 62.06, 61, 55.48, 65.67, 67.72,
68.47, 64, 65.11, 66, 67.5, 66.33, NA, 69.61, 69.33, 75.67,
68.17, 63, NA, 58.81, NA, NA, NA, 66.5, 62.33, 65, NA), ARG_G2_100_AC = c(7.1,
6.4, 0.18, 3.67, NA, NA, 12.75, 1.24, 2.96, 9.78, NA, NaN,
1.43, 1.33, NaN, 5.21, 2.76, 7.91, 8.06, 2.36, 2.87, 4, NaN,
15.52, 2.67, 4.17, 13, 10.07, 5.05, NaN, NA, 9.5, 8.17, NA,
5.86, 3.87, NA, NA, 7, 3.33, 1.75, 3, NaN, 7.94, 3.11, 5.29,
5.29, 13.1, 3.78, 3.33, 3.06, 5.18, 2.56, 5.04, 5.5, 5.75,
NA, 2.22, 2.48, 1, 3.83, 4.82, NA, 8.19, NA, NA, NA, 5, 6.44,
5.29, NA), ARG_G2_100_AB = c(7.1, 7.4, 1.09, 3.67, NA, NA,
12.75, 1.24, 3.28, 9.78, NA, NaN, 1.71, 1.93, NaN, 6.21,
2.76, 7.91, 8.65, 3.55, 3.4, 5, NaN, 16.05, 3.39, 4.52, 13,
11.6, 5.05, NaN, NA, 9.5, 9.67, NA, 7.03, 3.87, NA, NA, 8,
3.33, 2.19, 3, NaN, 8.53, 3.37, 5.47, 7.35, 13.48, 5.33,
3.83, 3.65, 5.82, 4, 6.17, 6, 6.42, NA, 3.83, 2.71, 2.19,
4.58, 5.18, NA, 9.75, NA, NA, NA, 5, 6.44, 5.36, NA), ARG_G2_150_AAA = c(85.25,
NaN, 99, NaN, NA, NA, 66.86, 101, 89.31, 71.33, NA, NaN,
94.5, 88.57, NaN, 95, 95.5, 81.5, 78.5, 107.75, 93.43, NaN,
NaN, 66.18, 92.33, 92.25, NaN, 67.43, 87.44, NaN, NA, NaN,
78, NA, 89.81, 86.43, NA, NA, 75.75, 91.67, 95, NaN, NaN,
85.12, 91.47, 81.88, 79.38, 72.45, 87.67, 91.22, 90.88, 83,
85, 89.23, NaN, 86.2, NA, 92, 93.09, 100.27, 88.62, 83.88,
NA, 75, NA, NA, NA, NaN, 80, 83.5, NA), ARG_G2_150_AAC = c(86.75,
NaN, 101, NaN, NA, NA, 67.29, 103.75, 91.15, 71.67, NA, NaN,
96.33, 88.86, NaN, 96.23, 97.5, 83.5, 79.12, 109.5, 95, NaN,
NaN, 66.45, 93.56, 93.42, NaN, 68, 88.33, NaN, NA, NaN, 78,
NA, 91.69, 87, NA, NA, 76.75, 93, 96.88, NaN, NaN, 85.5,
92.67, 83.38, 80.25, 73.09, 88.33, 92.44, 92.38, 84.25, 85.33,
91.23, NaN, 87.8, NA, 92.67, 94.09, 102.09, 90.15, 84.75,
NA, 76.14, NA, NA, NA, NaN, 81, 85.67, NA), ARG_G2_150_AC = c(15.75,
NaN, 1, NaN, NA, NA, 25.71, 2.62, 6.85, 19.33, NA, NaN, 3.83,
4.57, NaN, 9.85, 6.5, 15.5, 13.88, 3.75, 6.29, NaN, NaN,
27.36, 5.67, 8.42, NaN, 18.86, 11.33, NaN, NA, NaN, 19, NA,
11.25, 9.57, NA, NA, 12.75, 6, 4.5, NaN, NaN, 15.75, 5.67,
10.75, 9.75, 24.82, 8.67, 6.67, 5.88, 13.25, 7, 10, NaN,
10.6, NA, 6.56, 4.18, 2.55, 8.54, 9.75, NA, 17.86, NA, NA,
NA, NaN, 15.67, 13.17, NA), ARG_G2_150_AB = c(15.75, NaN,
2, NaN, NA, NA, 25.71, 2.62, 8.69, 19.33, NA, NaN, 4.33,
5.43, NaN, 11.31, 6.5, 15.5, 14.75, 6, 7.14, NaN, NaN, 28.27,
7.22, 9, NaN, 21.29, 11.33, NaN, NA, NaN, 22, NA, 13.44,
9.71, NA, NA, 14.75, 6, 5.12, NaN, NaN, 16.75, 6, 11.25,
12.75, 25.36, 11.11, 7.33, 6.62, 14.25, 9.33, 11.62, NaN,
11.8, NA, 9.22, 4.91, 4.64, 10, 10.38, NA, 19.86, NA, NA,
NA, NaN, 15.67, 13.33, NA), ARG_G2_200_AAA = c(NaN, NaN,
125, NaN, NA, NA, 81.33, 129.5, 112.25, NaN, NA, NaN, 117.5,
108.33, NaN, 120, 119.25, 99, 94, 134, 113.67, NaN, NaN,
77.67, 112.25, 112.86, NaN, 78.33, 106.6, NaN, NA, NaN, NaN,
NA, 112.4, 106.67, NA, NA, 93, NaN, 122, NaN, NaN, 104.25,
114.89, 101.25, 96.75, 87, 107, 112.25, 112.25, 100, NaN,
111.86, NaN, 101, NA, 114, 114.5, 124.17, 108.86, 103.25,
NA, 90.67, NA, NA, NA, NaN, NaN, 99, NA), ARG_G2_200_AAC = c(NaN,
NaN, 126, NaN, NA, NA, 82.33, 129.75, 113.5, NaN, NA, NaN,
118, 109.33, NaN, 120.71, 120.25, 101, 94.25, 136, 114, NaN,
NaN, 78, 114, 114, NaN, 78.67, 106.8, NaN, NA, NaN, NaN,
NA, 114, 108.33, NA, NA, 93, NaN, 123, NaN, NaN, 104.25,
116.67, 102.75, 97.25, 87.67, 107.75, 113.25, 113.25, 101,
NaN, 113.14, NaN, 101, NA, 114.5, 115, 126.17, 111.29, 104.25,
NA, 92, NA, NA, NA, NaN, NaN, 99, NA), ARG_G2_200_AC = c(NaN,
NaN, 1, NaN, NA, NA, 36, 5.25, 12.25, NaN, NA, NaN, 8.5,
8.33, NaN, 14.29, 11.38, 24, 22.25, 6, 11.67, NaN, NaN, 42.5,
9.25, 13.14, NaN, 32, 19.4, NaN, NA, NaN, NaN, NA, 15.6,
17, NA, NA, 24, NaN, 6.67, NaN, NaN, 21.5, 8.89, 17.5, 16,
37.83, 15.75, 12.25, 11.75, 20, NaN, 15.43, NaN, 26, NA,
12.25, 7.5, 5.67, 12.86, 14.75, NA, 27, NA, NA, NA, NaN,
NaN, 28.5, NA), ARG_G2_200_AB = c(NaN, NaN, 2, NaN, NA, NA,
36, 5.25, 16, NaN, NA, NaN, 10, 9.33, NaN, 16.57, 11.38,
24, 23.25, 9, 13, NaN, NaN, 44.33, 11.5, 14.29, NaN, 35,
19.4, NaN, NA, NaN, NaN, NA, 18.8, 17.33, NA, NA, 26, NaN,
7.67, NaN, NaN, 22.5, 9.33, 18.25, 20.25, 38.67, 19, 13.25,
13.25, 22, NaN, 18, NaN, 28, NA, 15.75, 8.83, 8.17, 15.14,
16, NA, 29.33, NA, NA, NA, NaN, NaN, 29, NA), ARG_G2_50_AAA = c(36.97,
35.4, 34.72, 33.81, NA, NA, 32.98, 35.7, 35.59, 35.36, NA,
36, 37.66, 36.35, 33.44, 34.72, 36.9, 34.32, 32.28, 33.74,
36.38, 35.06, 34.5, 31.47, 36.59, 36.18, 34.75, 31.9, 36.53,
32.62, NA, 33.85, 34.86, NA, 35.36, 34.52, NA, NA, 33.68,
35.89, 36.24, 37.21, 28, 34.05, 36.3, 34.16, 32.86, 32.06,
34.65, 35.57, 35.95, 33.19, 34.61, 34.6, 34.92, 34.24, NA,
34.33, 35.65, 36.16, 33.91, 34.37, NA, 33.44, NA, NA, NA,
33.93, 33.71, 35.42, NA), ARG_G2_50_AAC = c(40.2, 38.6, 42.09,
39.25, NA, NA, 35.68, 41.41, 39.12, 37.68, NA, 39, 41.16,
40.67, 36.11, 39.25, 40.65, 37.52, 35.14, 41.26, 41.13, 40.71,
36.25, 33.33, 40.59, 39.67, 36.83, 34.44, 40.57, 34, NA,
37, 36.45, NA, 39.52, 38.17, NA, NA, 36.52, 40.39, 40.69,
41.21, 29, 39.63, 40.23, 37.27, 36.58, 34.45, 38.87, 38.98,
39.51, 38.13, 37.68, 37.88, 38.85, 38.48, NA, 40, 40.43,
42.73, 39.93, 38.19, NA, 36.41, NA, NA, NA, 39.71, 36.43,
38.03, NA), ARG_G2_50_AC = c(0.8, 1.9, 0, 0.5, NA, NA, 2.93,
0.52, 0.58, 2.75, NA, 1.25, 0.21, 0.25, 2.11, 2, 0.85, 2.03,
2.67, 0.71, 0.82, 0.29, 0.75, 4.27, 0.63, 0.78, 2.92, 2.77,
1.17, 4.88, NA, 3, 2.64, NA, 1.78, 0.98, NA, NA, 2.29, 0.82,
0.45, 0.93, 6, 1.67, 0.86, 1.27, 1.79, 3.37, 1.11, 0.74,
0.79, 1.1, 0.71, 1.11, 1.08, 2.48, NA, 0.17, 0.75, 0.22,
0.91, 1.19, NA, 1.66, NA, NA, NA, 1.07, 1.75, 1.42, NA),
ARG_G2_50_AB = c(0.8, 2, 0.31, 0.5, NA, NA, 2.93, 0.52, 0.58,
2.75, NA, 1.25, 0.34, 0.5, 3.33, 2.44, 0.85, 2.03, 2.91,
1.42, 1, 0.94, 0.75, 4.63, 0.85, 0.96, 2.92, 3.49, 1.17,
4.88, NA, 3, 3.36, NA, 2.3, 0.98, NA, NA, 2.61, 0.82, 0.52,
0.93, 6, 1.91, 1.02, 1.34, 2.58, 3.67, 1.59, 0.96, 1.09,
1.39, 1.5, 1.65, 1.15, 2.76, NA, 0.93, 0.8, 0.82, 1.25, 1.44,
NA, 2.49, NA, NA, NA, 1.07, 1.75, 1.47, NA), NARR_G1_100_AAA = c(71.32,
NA, NA, 67.83, NaN, 71.6, 64.2, 71.68, 73.29, 70.53, 73.35,
59.31, 71.08, 74.06, 68.7, 74, 69.08, NA, 68.52, 63.47, 68.33,
NA, 65.64, 62.11, 63.9, 70.41, 60.36, 65.88, 68.81, 69.62,
70.68, 67.5, NA, 68.45, 67.16, 74.39, 60.6, 65.89, 71.94,
68.75, NA, NA, 67, 66.85, NA, NA, 62.56, 73.33, 69.81, 67.68,
73.06, 65.8, 63.85, NA, 67.64, 71.6, 68.47, 69.39, 71.16,
72.33, NA, 66.68, NA, 66.22, 67, 61.27, NaN, 72.33, 68.29,
71.33, 65.57), NARR_G1_100_AAC = c(74.26, NA, NA, 70.94,
NaN, 75, 66.14, 74.48, 77.07, 73.47, 76, 60.44, 73.92, 77.19,
71.4, 77.59, 72, NA, 70.38, 65.47, 70.54, NA, 68.09, 64.61,
66.5, 72.52, 62.59, 69.25, 71.48, 71.88, 74.4, 70.1, NA,
70, 69.6, 78.04, 62.3, 68.79, 73.44, 72.25, NA, NA, 67, 68.25,
NA, NA, 65.94, 75.71, 72.43, 69.68, 76, 68.6, 65.65, NA,
70.43, 74, 71.76, 71.17, 74.63, 74.22, NA, 69.47, NA, 68.72,
67, 62.82, NaN, 77.33, 69.76, 75.42, 67.62), NARR_G1_100_AC = c(3.05,
NA, NA, 2.33, NaN, 2.4, 1.89, 0.84, 0.07, 5.47, 1.12, 8.81,
2.39, 1.38, 3.6, 0.88, 2.65, NA, 2.05, 5.18, 2.38, NA, 5,
4.78, 6.4, 1.85, 7.41, 3.69, 1.85, 2.62, 1.28, 3.9, NA, 2.35,
3.8, 1.87, 5.1, 6.95, 1.67, 4.5, NA, NA, 4, 4.25, NA, NA,
7.17, 1.29, 2.62, 1.37, 1.47, 3.3, 7.27, NA, 3.64, 3.6, 2.59,
4.83, 0.63, 2.28, NA, 6.58, NA, 4.56, 6, 4.82, NaN, 0.67,
3.95, 1.75, 4.38), NARR_G1_100_AB = c(3.42, NA, NA, 3.17,
NaN, 2.5, 3.29, 1.64, 1.07, 6, 1.41, 9.25, 3.25, 2.69, 3.8,
1.32, 3.04, NA, 2.38, 5.18, 2.38, NA, 6.18, 6.11, 6.4, 1.85,
7.45, 3.69, 1.89, 3.25, 1.6, 4.8, NA, 2.8, 4.32, 2.3, 6.6,
7.42, 2.83, 4.75, NA, NA, 5, 4.75, NA, NA, 8, 1.71, 2.67,
2.05, 1.47, 4.8, 7.96, NA, 4.43, 3.8, 4.47, 4.91, 1.68, 2.78,
NA, 6.58, NA, 6.67, 6, 5.18, NaN, 1.67, 4.86, 2.08, 4.38),
NARR_G1_150_AAA = c(102, NA, NA, 96.22, NaN, 105.33, 87.1,
100.14, 106.17, 97.67, 99.88, 75.43, 99.62, 106.86, 95.3,
105.68, 97.14, NA, 92.82, 87.25, 96.23, NA, 88.5, 83.56,
89.75, 98.47, 80.64, 92.14, 96.07, 94.62, 99.46, 100, NA,
92.6, 94.54, 106.25, 82.5, 93.6, 100.33, 95, NA, NA, NaN,
90.9, NA, NA, 87.89, 101.08, 96.18, 95, 103.12, 92.75, 85.71,
NA, 94.17, NaN, 95.25, 97.5, 100.67, 100.44, NA, 90.9, NA,
90.11, NaN, 81.5, NaN, NaN, 94.45, 100.4, 91.64), NARR_G1_150_AAC = c(103.2,
NA, NA, 97.67, NaN, 106.67, 88.55, 102.43, 109.17, 98.78,
103.25, 76.57, 102.05, 109.43, 97.4, 108.42, 99.29, NA, 94.73,
89, 98, NA, 89.75, 85, 91.75, 100.47, 81.64, 93.14, 97.73,
96, 101.08, 101.33, NA, 94.1, 95.92, 110.33, 83.25, 95.5,
101.67, 98, NA, NA, NaN, 93, NA, NA, 90.56, 102.38, 99, 96.78,
106.5, 94.25, 87.43, NA, 98.33, NaN, 99, 98.92, 103.44, 103,
NA, 93.8, NA, 92, NaN, 82.25, NaN, NaN, 95.45, 102.8, 93.82
), NARR_G1_150_AC = c(6.4, NA, NA, 5.78, NaN, 5, 4.85, 2.29,
0.5, 12.44, 2.5, 19, 4.71, 3, 8, 1.63, 5.86, NA, 4.82, 9.25,
4.08, NA, 10.75, 9.44, 12.25, 3.6, 15.73, 7.14, 3.73, 7.12,
4.08, 6.33, NA, 5.1, 6.62, 3.08, 10.25, 12.5, 4.56, 7.5,
NA, NA, NaN, 8.6, NA, NA, 13.67, 3.15, 6, 2.22, 2.5, 8, 15,
NA, 6, NaN, 5.5, 8.75, 2.44, 4.33, NA, 13.9, NA, 8.78, NaN,
13.75, NaN, NaN, 7.73, 4.4, 9.36), NARR_G1_150_AB = c(7,
NA, NA, 7.33, NaN, 5.33, 7.4, 3.71, 2.17, 13.33, 2.88, 20.14,
6, 5.14, 8.5, 2.42, 6.43, NA, 5.18, 9.25, 4.08, NA, 12.5,
11.56, 12.25, 3.6, 15.73, 7.14, 4, 8.12, 4.46, 7.33, NA,
5.9, 7.54, 3.67, 13, 13.3, 6.78, 8, NA, NA, NaN, 9.1, NA,
NA, 15.11, 4.15, 6.09, 3.22, 2.5, 10.5, 16.29, NA, 7.33,
NaN, 8.38, 8.83, 4, 5.22, NA, 13.9, NA, 12.11, NaN, 15.25,
NaN, NaN, 9.27, 5, 9.36), NARR_G1_200_AAA = c(127.8, NA,
NA, 120.25, NaN, NaN, 105.85, 126.62, 134.5, 121.4, 126.25,
89.33, 126.23, 136, 120.4, 133.17, 124, NA, 115.5, 106.5,
120.86, NA, 115, 104.25, NaN, 123.22, 100, 114, 120.22, 115.67,
124.38, NaN, NA, 112.6, 119, 137.29, NaN, 118.4, 127, NaN,
NA, NA, NaN, 113.8, NA, NA, 111.5, 123.57, 122.33, 118.8,
130, NaN, 106.38, NA, 123.5, NaN, 123.75, 123.29, 127.2,
126.5, NA, 113.8, NA, 113.75, NaN, 101, NaN, NaN, 117.83,
125, 114.5), NARR_G1_200_AAC = c(130, NA, NA, 123, NaN, NaN,
107.54, 128.75, 136.5, 123, 128.5, 90, 128, 137.33, 121.6,
136.92, 125.5, NA, 117, 108.25, 122.29, NA, 115, 105, NaN,
125.11, 102, 116, 122.33, 117.33, 126.25, NaN, NA, 114.6,
121.12, 138.86, NaN, 119.2, 127.75, NaN, NA, NA, NaN, 114.4,
NA, NA, 113, 124.43, 124, 120.6, 133, NaN, 107, NA, 124.5,
NaN, 127.75, 123.57, 129, 127.5, NA, 115.6, NA, 117, NaN,
101, NaN, NaN, 118.5, 129, 115.5), NARR_G1_200_AC = c(11.2,
NA, NA, 12.5, NaN, NaN, 9.31, 4.25, 2, 17.8, 4.5, 32.33,
7.77, 5.67, 13.4, 2.67, 9.62, NA, 7.67, 15, 6.14, NA, 16,
14.75, NaN, 6.22, 24.33, 11, 6.67, 14.33, 7.62, NaN, NA,
9.4, 9.75, 4.86, NaN, 18.6, 8.25, NaN, NA, NA, NaN, 13.8,
NA, NA, 21.75, 6.14, 9.33, 6, 4.5, NaN, 23.75, NA, 8.5, NaN,
6.75, 13.86, 3.8, 6.75, NA, 21.4, NA, 12.75, NaN, 20, NaN,
NaN, 12.83, 7, 15.83), NARR_G1_200_AB = c(12, NA, NA, 14.5,
NaN, NaN, 12.85, 6.38, 4.5, 18.8, 5.25, 34.67, 9.54, 8.67,
14.4, 4, 10.62, NA, 8.33, 15, 6.29, NA, 18, 17.5, NaN, 6.22,
24.33, 11.33, 7, 15.33, 8.12, NaN, NA, 10.8, 11, 5.71, NaN,
19.6, 10.75, NaN, NA, NA, NaN, 14.6, NA, NA, 24, 7.57, 9.5,
8, 5, NaN, 25.75, NA, 10.5, NaN, 10.5, 14, 6, 8.75, NA, 21.4,
NA, 17.75, NaN, 22, NaN, NaN, 15.5, 8, 15.83), NARR_G1_50_AAA = c(37.69,
NA, NA, 37.02, 35.38, 34.34, 36.19, 37.25, 36.78, 36.83,
36.61, 34.2, 34.24, 37.51, 35.74, 34, 35.02, NA, 37.4, 36.18,
36.63, NA, 34.42, 34.38, 35.43, 37.2, 34.49, 34.2, 36.41,
37.07, 36.56, 34.93, NA, 36.06, 36.49, 35.31, 33.33, 34.27,
36.5, 36.5, NA, NA, 34.21, 36.02, NA, NA, 34.02, 35.59, 37.16,
36.02, 37.58, 36.53, 35.46, NA, 36.46, 38.42, 36.05, 37.39,
37.3, 36.22, NA, 35.31, NA, 33.96, 35.55, 35.03, 35, 35.31,
36.54, 36.06, 34.98), NARR_G1_50_AAC = c(41.85, NA, NA, 40.71,
37.5, 42.38, 39.05, 41.98, 42.51, 42.47, 43.43, 36.41, 42.17,
43.27, 40.42, 43.1, 40.52, NA, 41.65, 38.82, 40.63, NA, 40.35,
39.18, 38.93, 41.44, 38.3, 39.54, 40.73, 41.83, 42.54, 40.34,
NA, 40.69, 40.31, 43.51, 36.13, 39.1, 41.65, 41.62, NA, NA,
38.57, 40.02, NA, NA, 38.26, 42.66, 41.55, 39.7, 42.91, 40.43,
38.87, NA, 40.86, 43.26, 40.55, 40.84, 42.13, 42.09, NA,
40.31, NA, 39.69, 39.73, 36.97, 37.71, 43.44, 40.44, 42.33,
39.65), NARR_G1_50_AC = c(0.77, NA, NA, 0.69, 2.25, 0.45,
0.59, 0.12, 0, 1.15, 0.34, 2.61, 0.61, 0.24, 0.64, 0.26,
0.79, NA, 0.19, 1.43, 0.65, NA, 1.39, 1.11, 1.87, 0.31, 1.98,
1.07, 0.54, 0.29, 0.24, 0.76, NA, 0.59, 1.05, 0.62, 2.17,
2.25, 0.33, 1.62, NA, NA, 1.36, 1.53, NA, NA, 2.22, 0.22,
0.65, 0.45, 0.42, 0.9, 2.18, NA, 0.97, 0.05, 0.84, 0.98,
0, 0.44, NA, 1.83, NA, 1.71, 0.91, 1.16, 1.86, 0.12, 0.69,
0.45, 1.24), NARR_G1_50_AB = c(0.88, NA, NA, 0.82, 2.25,
0.45, 1.03, 0.45, 0.54, 1.36, 0.55, 2.71, 0.96, 0.73, 0.64,
0.47, 0.97, NA, 0.29, 1.43, 0.65, NA, 1.81, 1.69, 1.87, 0.31,
2.02, 1.07, 0.54, 0.52, 0.39, 1.1, NA, 0.8, 1.31, 0.82, 2.9,
2.44, 0.74, 1.62, NA, NA, 1.86, 1.76, NA, NA, 2.48, 0.38,
0.67, 0.66, 0.42, 1.67, 2.38, NA, 1.43, 0.16, 1.64, 1.04,
0.57, 0.69, NA, 1.83, NA, 2.6, 0.91, 1.16, 2.71, 0.75, 0.98,
0.58, 1.24), NARR_G2_100_AAA = c(64.25, 59, NA, 67.88, 67.08,
NA, 60.75, 64.42, 71.17, 58.42, NA, 49.8, 63.36, 65.2, NaN,
70.2, 62.85, NaN, 61.6, 53.92, 62.63, NA, NaN, 50.46, 65.14,
60.58, 63.29, NA, 64.33, NaN, NA, 68.57, NA, NA, 66.3, NA,
57.29, NA, 53.5, 63.48, NA, 57.07, NaN, 61.82, NA, 68.61,
57.1, 62.84, 63, 61.91, 58.38, NaN, 61.56, NA, NaN, 65.55,
63.8, 65, 63.14, 67.31, 67.75, 57.62, 63.31, 54.83, 66.43,
NA, NA, 64.67, 57.92, 59, NA)), row.names = c(NA, -71L), class = "data.frame")
I would suggest pulling your column names into a data frame, separating them into their components, and ordering them as desired:
library(dplyr)
library(tidyr)
col_df = data.frame(names = names(merged_DF)[-1]) ## -1 to skip the ID col
col_df = col_df %>%
separate(
col = names, sep = "_",
into = c("s1", "gnum", "num2", "astring"),
remove = FALSE, convert = TRUE
) %>%
arrange(s1, num2, astring, gnum)
## now we have the names in order:
col_df
# names s1 gnum num2 astring
# 1 ARG_G1_50_AAA ARG G1 50 AAA
# 2 ARG_G2_50_AAA ARG G2 50 AAA
# 3 ARG_G1_50_AAC ARG G1 50 AAC
# 4 ARG_G2_50_AAC ARG G2 50 AAC
# 5 ARG_G1_50_AB ARG G1 50 AB
# 6 ARG_G2_50_AB ARG G2 50 AB
# 7 ARG_G1_50_AC ARG G1 50 AC
# 8 ARG_G2_50_AC ARG G2 50 AC
# 9 ARG_G1_100_AAA ARG G1 100 AAA
# 10 ARG_G2_100_AAA ARG G2 100 AAA
# ...
## we can use this order to rearrange the columns
merged_DF = select(merged_DF, c(ID, col_df$names))
names(merged_DF)
# [1] "ID" "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC"
# [6] "ARG_G1_50_AB" "ARG_G2_50_AB" "ARG_G1_50_AC" "ARG_G2_50_AC" "ARG_G1_100_AAA"
# [11] "ARG_G2_100_AAA" "ARG_G1_100_AAC" "ARG_G2_100_AAC" "ARG_G1_100_AB" "ARG_G2_100_AB"
# [16] "ARG_G1_100_AC" "ARG_G2_100_AC" "ARG_G1_150_AAA" "ARG_G2_150_AAA" "ARG_G1_150_AAC"
# [21] "ARG_G2_150_AAC" "ARG_G1_150_AB" "ARG_G2_150_AB" "ARG_G1_150_AC" "ARG_G2_150_AC"
# [26] "ARG_G1_200_AAA" "ARG_G2_200_AAA" "ARG_G1_200_AAC" "ARG_G2_200_AAC" "ARG_G1_200_AB"
# [31] "ARG_G2_200_AB" "ARG_G1_200_AC" "ARG_G2_200_AC" "NARR_G1_50_AAA" "NARR_G1_50_AAC"
# [36] "NARR_G1_50_AB" "NARR_G1_50_AC" "NARR_G1_100_AAA" "NARR_G2_100_AAA" "NARR_G1_100_AAC"
# [41] "NARR_G1_100_AB" "NARR_G1_100_AC" "NARR_G1_150_AAA" "NARR_G1_150_AAC" "NARR_G1_150_AB"
# [46] "NARR_G1_150_AC" "NARR_G1_200_AAA" "NARR_G1_200_AAC" "NARR_G1_200_AB" "NARR_G1_200_AC"
I bet that there are simpler ways of doing this but this one seems to work.
intercalate <- function(X, pattern) {
f <- function(h, n) {
i <- seq(1, length(h), by = 2)
j <- seq(2, length(h), by = 2)
h[order(c(i, j))]
}
#
g <- function(x, y) {
nx <- length(x)
ny <- length(y)
if(nx == ny) {
h <- c(x, y)
f(h, nx)
} else if(nx > ny) {
h <- c(x[seq_along(y)], y)
h <- f(h, ny)
c(h, x[-seq_along(y)])
} else {
h <- c(x, y[seq_along(x)])
h <- f(h, nx)
c(h, y[-seq_along(x)])
}
}
#
s <- grepl(pattern = pattern, X)
s <- abs(c(0, diff(s)))
sp <- split(X, cumsum(s))
i_odd <- seq(1, length(sp), by = 2)
i_even <- seq(2, length(sp), by = 2)
new_names <- mapply(g, sp[i_odd], sp[i_even])
unname(unlist(new_names))
}
newnames <- intercalate(names(merged_DF)[-1], pattern = "G2")
newnames <- c(names(merged_DF)[1], newnames)
merged_DF[newnames]
This is probably insufficient to the task:
strings <- c('ARG_G1_50_AAA' ,'ARG_G1_50_AAC', 'ARG_G1_50_AC' ,'ARG_G1_50_AB',
'ARG_G2_50_AAA' ,'ARG_G2_50_AAC', 'ARG_G2_50_AC')
substring(strings, regexpr('_\\K[[:upper:]]{2,3}', strings, perl = TRUE), nchar(strings))
[1] "AAA" "AAC" "AC" "AB" "AAA" "AAC" "AC"
idx_strings <- order(substring(strings, regexpr('_\\K[[:upper:]]{2,3}', strings, perl = TRUE), nchar(strings)))
idx_strings
[1] 1 5 2 6 4 3 7
> strings[idx_strings]
[1] "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC"
[5] "ARG_G1_50_AB" "ARG_G1_50_AC" "ARG_G2_50_AC"
Getting nearly desired 'set1' results for 'NARR_' and 'ARG_' as follows
for 'NARR_', using #akrun data v1, though [7] & [8] appear reversed
idx_v1_N <- which(regexpr('^[N]', v1, perl = TRUE) == 1)
v1[idx_v1_N[order(
substring(v1[idx_v1_N],
regexpr('[^_.G][\\d_]\\d.+[[:upper:]]', v1[idx_v1_N], perl = TRUE),
nchar(v1[idx_v1_N]))[idx_v1_N])]]
[1] "NARR_G1_100_AAC" "NARR_G1_100_AB" "NARR_G2_100_AC" "NARR_G1_150_AAC"
[5] "NARR_G1_150_AB" "NARR_G1_100_AAA" "NARR_G2_150_AAA" "NARR_G2_100_AAA"
[9] "NARR_G1_100_AC" "NARR_G1_150_AAA" "NARR_G1_150_AC" "NARR_G2_100_AAC"
[13] "NARR_G2_50_AB" "NARR_G1_50_AC" "NARR_G1_50_AAA" "NARR_G2_150_AB"
[17] "NARR_G2_150_AAC" "NARR_G2_50_AC" "NARR_G1_50_AAC" "NARR_G2_150_AC"
[21] "NARR_G1_50_AB" "NARR_G2_100_AB" "NARR_G2_50_AAA" "NARR_G2_50_AAC"
the substring and regexpr '[^_.G][\\d_]\\d.+[[:upper:]]' return
substring(v1[idx_v1_N], regexpr('[^_.G][\\d_]\\d.+[[:upper:]]', v1[idx_v1_N], perl = TRUE), nchar(v1[idx_v1_N]))
[1] "1_100_AB" "1_150_AAC" "2_50_AB" "1_150_AB" "2_100_AAA" "1_100_AAC"
[7] "1_150_AAA" "2_100_AC" "1_100_AAA" "1_150_AC" "2_100_AAC" "2_150_AAA"
[13] "1_100_AC" "1_50_AC" "1_50_AAA" "2_150_AB" "2_150_AAC" "2_50_AC"
[19] "1_50_AAC" "2_150_AC" "1_50_AB" "2_100_AB" "2_50_AAA" "2_50_AAC"
which is then order([ed] nearly correctly. Results for 'ARG_' just need an index for starting with 'A'. There are better hammers for this nail, as seen above.

Divide each column of a dataframe by one row of the dataframe

I would like to divide each column of my dataframe by the values of one row.
I tried to transform my dataframe into a matrix and to extract one row of the dataframe as a vector then divide the matrix by the vector but it did not work. Indeed, only the first row of the matrix got divided by the vector.
Here is my original dataframe.
And this is the code I tried to run :
data <- read_excel("Documents/TFB/xlsx_geochimie/solfatara_maj.xlsx")
View(data)
data.mat <- as.matrix(data[,2:20])
vector <- data[12,2:20]
data.mat/vector
We replicate the vector to make the length same and then do the division
data.mat/unlist(vector)[col(data.mat)]
# FeO Total S SO4 Total N SiO2 Al2O3 Fe2O3 MnO MgO CaO Na2O K2O
#[1,] 0.10 16.5555556 NA NA 0.8908607 0.8987269 0.1835206 0.08333333 0.03680982 0.04175365 0.04823151 0.5738562
#[2,] 0.40 125.8333333 NA NA 0.5510204 0.4456019 0.2359551 0.08333333 0.04294479 0.01878914 0.04501608 0.2588235
#[3,] 0.85 0.6111111 NA NA 1.0021295 1.0162037 0.7715356 1.08333333 0.53987730 0.69728601 1.03858521 1.0457516
#[4,] 0.15 48.0555556 NA NA 1.1027507 0.2569444 NA 0.08333333 0.01840491 0.01878914 0.04180064 0.1647059
#[5,] 0.85 NA NA NA 1.0889086 1.0271991 0.6591760 0.75000000 0.59509202 0.53862213 1.02250804 1.1228758
#[6,] NA NA NA NA 1.3426797 0.6319444 0.0411985 0.08333333 0.03067485 0.11899791 0.65594855 0.7764706
# TiO2 P2O5 LOI LOI2 Total Total 2 Fe2O3(T)
#[1,] 0.7924528 0.3928571 7.0841837 6.6963855 0.9922233 0.9894632 0.14489796
#[2,] 0.5094340 0.3214286 14.5561224 13.7710843 0.9958126 0.9936382 0.31020408
#[3,] 0.8679245 0.6428571 1.5637755 1.5228916 0.9990030 0.9970179 0.80612245
#[4,] 1.4905660 0.2857143 7.4056122 7.0024096 0.9795613 0.9769384 0.05510204
#[5,] 1.0377358 0.2500000 0.3520408 0.3783133 0.9969093 0.9960239 0.74489796
#[6,] 0.3018868 0.2500000 1.2551020 1.1879518 1.0019940 1.0000000 0.04489796
Or use sweep
sweep(data.mat, MARGIN = 2, unlist(vector), FUN = `/`)
Or using mapply with asplit
mapply(`/`, asplit(data.mat, 2), vector)
data
data_mat <- structure(c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 5.96, 45.3, 0.22, 17.3,
NA, NA, NA, 6.72, NA, 4.08, 0.06, 0.16, NA, NA, NA, NA, NA, NA,
50.2, 31.05, 56.47, 62.14, 61.36, 75.66, 15.53, 7.7, 17.56, 4.44,
17.75, 10.92, 0.49, 0.63, 2.06, NA, 1.76, 0.11, 0.01, 0.01, 0.13,
0.01, 0.09, 0.01, 0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.2, 0.09,
3.34, 0.09, 2.58, 0.57, 0.15, 0.14, 3.23, 0.13, 3.18, 2.04, 4.39,
1.98, 8, 1.26, 8.59, 5.94, 0.42, 0.27, 0.46, 0.79, 0.55, 0.16,
0.11, 0.09, 0.18, 0.08, 0.07, 0.07, 27.77, 57.06, 6.13, 29.03,
1.38, 4.92, 27.79, 57.15, 6.32, 29.06, 1.57, 4.93, 99.52, 99.88,
100.2, 98.25, 99.99, 100.5, 99.54, 99.96, 100.3, 98.28, 100.2,
100.6, 0.71, 1.52, 3.95, 0.27, 3.65, 0.22), .Dim = c(6L, 19L), .Dimnames = list(
NULL, c("FeO", "Total S", "SO4", "Total N", "SiO2", "Al2O3",
"Fe2O3", "MnO", "MgO", "CaO", "Na2O", "K2O", "TiO2", "P2O5",
"LOI", "LOI2", "Total", "Total 2", "Fe2O3(T)")))
vector <- structure(list(FeO = 2, `Total S` = 0.36, SO4 = NA_real_, `Total N` = NA_real_,
SiO2 = 56.35, Al2O3 = 17.28, Fe2O3 = 2.67, MnO = 0.12, MgO = 1.63,
CaO = 4.79, Na2O = 3.11, K2O = 7.65, TiO2 = 0.53, P2O5 = 0.28,
LOI = 3.92, LOI2 = 4.15, Total = 100.3, `Total 2` = 100.6,
`Fe2O3(T)` = 4.9), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
To divide data frame, df, by the third row:
df/df[rep(3, nrow(df)), ]

How to get count of pairs giving rise to correlation values using the cor() function

I like the cor() function but would like to know how to get a count of the number of pairs in a sparse matrix giving rise to the correlation values.
Many thanks.
Here's a very simplified version of the kind of data table I have.
example data table picture
I've also added a cut down version of the actual file I'm working on here
What I'd like is to find a way to give me something like the following matrix (below is from the pic rather than the file in the hyperlink):
example desired output
This shows how many pairs of values there are in common between columnA and columnB so I can see which columns are worth comparing in a correlation. Does that make sense?
dput(mat)
structure(list(A = c(9.4, 9.4, 4.7, 1.2, NA, 0.6, 7.712, 0.2,
NA, NA, 3.13, NA, 1.56, 6.25, NA, NA, 0.9471, NA, 1.56, 1.2,
0.78, NA, NA, NA, NA, NA, NA), B = c(4.7, 12.5, 2.3, 2.3, 9.4,
0.78, 9.45, 0.6, NA, NA, 3.13, NA, 2.3, 6.25, NA, NA, 10.72,
NA, 2.3, 12.5, 6.25, NA, NA, NA, NA, NA, NA), C = c(4.7, 9.4,
4.7, 0.6, NA, 0.6, 10.84, 0.2, 3.67, 2.345, 3.13, 3.288, 1.56,
9.4, 11.21, 0.6, 2.256, 50, 1.56, 3.13, 0.78, 18.7, 0.66, 1.2,
6.26, 6.258, 50)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA,
-27L))
outdf <- c()
for (x in colnames(mat)) {
for (y in colnames(mat)) {
subset <- mat[,c(x, y)]
number_complete <- nrow(subset[complete.cases(subset),])
row <- c(x, y, number_complete)
outdf <- rbind(outdf, row)
}
}
outdf <- data.frame(outdf)
dcast(outdf, X1 ~ X2)
# X1 A B C
# 1 A 14 14 14
# 2 B 14 15 14
# 3 C 14 14 26

Weird behaviour (bug?) in car::bcPower

Consider the dataset Kort:
structure(list(V1 = c(-0.03, 0.22, -0.11, -0.01, 0.25, 0.29,
-0.74, 0.23, 0.39, -0.04, 0.18, 0.19, 0.4, 0.21, 0.21, -0.01,
-0.05, 0.02, -0.12, 0.37, -0.07, 0.51, 0.39, 0.14, 0.02, 0.73,
-0.25, 0.44, 0.29), V2 = c(35.39, 34.33, 32.74, 34.72, 33.07,
30.9, 29.89, 31.17, 31.62, 33.13, 30.64, 33.31, 33.61, 34.16,
30.06, 30.06, 31.18, 25.57, 30.52, 32.43, 31.54, 29.6, 34.66,
31.74, 27.22, 41, 32.02, 37.96, 29.25), V3 = c(37.24, 36.77,
37.21, 41.16, 40.3, 42.16, 40.77, 39.59, 37, 38.32, 34.6, 38.1,
36.07, 39.2, 36.97, 38.28, 38.72, 46.81, 39.63, 36, 45.33, 38.72,
36.2, 40.94, 37.7, 42.44, 37.92, 39.87, 37.15), V4 = c(-36L,
-18L, -2L, 20L, 37L, 39L, -7L, 31L, -23L, 32L, 73L, 10L, 14L,
18L, 126L, 98L, 13L, 14L, 15L, 37L, 66L, 3L, -50L, 9L, 6L, -20L,
4L, -26L, -2L), V5 = c(12.4, 10.5, 2.8, 9.5, 9.4, 10.7, 7.5,
14.8, 10.9, 13.5, 11.5, 11.8, 13.6, 8.6, 13.6, 13.1, 14.3, 11.3,
16.1, 14.5, 8.4, 15.4, 13.4, 14, 18.8, 17.4, 16.4, 16, 17.7),
V6 = c(27424L, 25597L, 20968L, 24730L, 25423L, 25801L, 23681L,
29527L, 26228L, 28262L, 27363L, 27134L, 27542L, 24647L, 28260L,
27922L, 29054L, 25650L, 30096L, 29103L, 24112L, 30035L, 28771L,
27818L, 32455L, 29722L, 30508L, 29896L, 31961L), V7 = c(68.8,
70.4, 61.6, 73.5, 71.8, 76.5, 72.7, 75.3, 71.7, 75, 72.9,
73.3, 73.7, 69, 72.7, 74.2, 73.4, 71.2, 76.4, 73, 62.5, 76,
73.7, 74.7, 74.3, 74.8, 74.6, 74.4, 74.4), V8 = c(8.1, 6.8,
11, 5.3, 6.3, 4.1, 5.5, 4, 5.9, 4.3, 5.5, 5.4, 4.2, 8.1,
5.2, 4.8, 4.4, 8.2, 3.8, 5.9, 12.9, 4.3, 5.2, 5, 3.6, 3.8,
4.6, 4.3, 4.5), V9 = c(0.38, 0.15, 0.16, 0.08, 0.12, 0.05,
0.07, 0.04, 0.08, 0.07, 0.13, 0.08, 0.08, 0.26, 0.05, 0.14,
0.05, 0.26, 0.03, 0.18, 0.26, 0.04, 0.04, 0.14, 0.05, 0,
0.02, 0.02, 0.1), V10 = c(9.8, 9.9, 19.4, 7, 9.2, 3, 8.5,
1.1, 3, 2.3, 5.1, 5.6, 1, 22.3, 4.4, 6.2, 2.2, 5.3, 1.5,
5, 18.7, 1.5, 3, 8.9, 1.6, 0, 5.1, 2.1, 3.6), V11 = c(6.3,
7.5, 5.5, 10.2, 5, 9.6, 9.3, 4.8, 4.3, 4.6, 4.1, 5.7, 6.4,
4, 7.2, 4.7, 4.2, 4.5, 7.6, 5.3, 6.2, 4.1, 4.9, 4.1, 5.1,
3.3, 5.4, 5, 5.6), V12 = c(153605L, 152867L, 115972L, 140341L,
139245L, 167038L, 143239L, 179712L, 135273L, 167487L, 160738L,
160648L, 154717L, 118800L, 168954L, 148412L, 147637L, 142615L,
210838L, 161840L, 114310L, 182670L, 160293L, 147747L, 192889L,
191077L, 164107L, 202051L, 192945L)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12"
), class = "data.frame", row.names = c(NA, -29L))
Where the response is:
Kort$V12
[1] 153605 152867 115972 140341 139245 167038 143239 179712 135273 167487
[11] 160738 160648 154717 118800 168954 148412 147637 142615 210838 161840
[21] 114310 182670 160293 147747 192889 191077 164107 202051 192945
Doing a box-cox transform, using car::boxcox
boxcox(V12~.,data=Kort,lambda=seq(-4,4,4/10))
yields an optimal parameter of -2. Transforming the response using
car::bcPower
TVP<-bcPower(Kort$V12,lambda=-2)
turns TVP into a vector of constants:
TVP
[1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[20] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
but box cox transform should be a continuous map!
I don't think this is a bug, there's simply a limit to how many decimal places are printed out. The help file suggests that the calculation is (U^(lambda)-1)/lambda which is pretty close to 1/2 where U is large. You can see that TVP is being calculated correctly with
TVP-0.5
# [1] -2.119138e-11 -2.139650e-11 -3.717610e-11 ...
or
options(digits=20)
TVP
# [1] 0.49999999997880861802 0.49999999997860350431 0.49999999996282390446 ...

Resources