check if two columns have a one-to-one relationship in R - r

Is there any existing R functionality to check whether two columns have a one-to-one relationship (regardless of column type)?
Example of expected output:
A B C
0 'a' 'apple'
1 'b' 'banana'
2 'c' 'apple'
A & B are one-to-one? TRUE
A & C are one-to-one? FALSE
B & C are one-to-one? FALSE

Matching a vector against itself with match(x, x) returns an integer vector giving, for each element, the index at which its value first occurs. Two columns are one-to-one exactly when these integer vectors are identical, so we can compare them directly:
is_one_to_one = function(x, y) {
  xu = match(x, x)  # index of the first occurrence of each value in x
  yu = match(y, y)  # same for y
  identical(xu, yu)
}
You could then apply this to each pair of columns.
Wrapping it up in a function:
cor_1to1 = function(df) {
  # one column of first-occurrence indices per column of df
  mat = vapply(df, \(x) match(x, x), FUN.VALUE = integer(nrow(df)))
  nm = combn(colnames(mat), m = 2, FUN = paste, collapse = " :: ")
  val = combn(colnames(mat), m = 2, FUN = function(i) {
    identical(mat[, i[1]], mat[, i[2]])
  }, simplify = TRUE)
  setNames(val, nm)
}
cor_1to1(df)
# A :: B A :: C B :: C
# TRUE FALSE FALSE
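To see what the comparison is doing, here is a small sketch on the question's example. The data frame `df` below is reconstructed from the table in the question, so its exact column types are an assumption.

```r
# Assumed reconstruction of the question's example data
df <- data.frame(
  A = 0:2,
  B = c("a", "b", "c"),
  C = c("apple", "banana", "apple")
)

# match(x, x) gives, for each element, the position of its first occurrence
key_A <- match(df$A, df$A)  # 1 2 3
key_B <- match(df$B, df$B)  # 1 2 3
key_C <- match(df$C, df$C)  # 1 2 1 ("apple" repeats)

identical(key_A, key_B)  # TRUE: A and B are one-to-one
identical(key_A, key_C)  # FALSE: two values of A share one value of C
```

Because only the first-occurrence pattern is compared, the column types never have to agree.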

You can do:
one_to_one <- function(data){
  # replace every column by its first-occurrence indices
  data[] <- sapply(data, \(x) match(x, x))
  pairs <- t(combn(seq_len(ncol(data)), 2))
  cbind(t(matrix(colnames(data)[t(pairs)], nrow = 2)),
        One2One = apply(pairs, 1, function(x) all(Reduce(`==`, data[, x])))) |>
    as.data.frame()
}
Test:
one_to_one(df)
# V1 V2 One2One
#1 A B TRUE
#2 A C FALSE
#3 B C FALSE

Referring to Elements in an Array in a For Loop in R - beginner

Edit: Someone said the question is unclear, edited.
I have made a 3 dimensional array, and assigned values as follows:
D <- c('g', 't', NA, 'd')
nPeriods = 4
column.names = c('aaa', 'bbb')
row.names = c('jjj', 'hhh')
threeD.names = c(1:nPeriods)
E = array(c(D), dim=c(2, 2, nPeriods),
dimnames = list(row.names, column.names, threeD.names))
values <- c(g = 5,
t = 2,
d = 7)
G <- apply(E, 1:3, function(x) values[x])
Now I want to make a for loop, to do things like:
for (i in 2:nPeriods){
  G[1,1,i]=G[1,1,i-1]*G[2,1,i-1]+G[2,2,i]
}
But I don't want to have to find the location of g, t and d each time I want to write something like this. I just want to be able to use g, t, and d if possible.
Question ends here.
Below is some helpful code that could possibly be adapted to find a solution?
I have this code which looks up and returns an index for each value:
result <- G
for (i in 2:dim(G)[3]) {
  idx <- which(E[, , 1] == 'g', arr.ind = TRUE)
  row <- idx[1, 'row']
  col <- idx[1, 'col']
  result[row, col, i] <- result[row, col, i-1] * 2
}
That handles a simpler problem, but my real array is quite large, so writing this out for each element would take a long time. Is there a way of automating this?
They also suggested this, which is great for simple sums, but I'm not sure how it could apply to the type of sum I have above:
funcs <- c(g = '*', t = '+', d = '-')
modifiers <- c(g = 2, t = 3, d = 4)
G <- apply(E, 1:3, function(x) values[x])
result <- G
for (i in 2:dim(G)[3]) {
  for (j in names(values)) {
    idx <- which(E[, , 1] == j, arr.ind = TRUE)
    row <- idx[1, 'row']
    col <- idx[1, 'col']
    result[row, col, i] <- do.call(funcs[j], args = list(result[row, col, i-1], modifiers[j]))
  }
}
Based on the clarification, maybe this works: get the row/column indices of 'g', 't' and 'd' from E[, , 1], then loop over the periods from 2 to nPeriods. In each step, update result by subsetting it with a matrix index built with cbind from gidx, tidx or didx together with i or i-1, so that the values are updated recursively:
result <- G
gidx <- which(E[, , 1] == 'g', arr.ind = TRUE)
tidx <- which(E[, , 1] == 't', arr.ind = TRUE)
didx <- which(E[, , 1] == 'd', arr.ind = TRUE)
for (i in 2:nPeriods) {
  result[cbind(gidx, i)] <- result[cbind(gidx, i-1)] *
    result[cbind(tidx, i-1)] + result[cbind(didx, i)]
}
-output
> result
, , 1
aaa bbb
jjj 5 NA
hhh 2 7
, , 2
aaa bbb
jjj 17 NA
hhh 2 7
, , 3
aaa bbb
jjj 41 NA
hhh 2 7
, , 4
aaa bbb
jjj 89 NA
hhh 2 7
-checking with OP's output
resultold <- G
for (i in 2:nPeriods){
  resultold[1, 1, i] <- resultold[1,1,i-1]* resultold[2,1,i-1]+resultold[2,2,i]
}
identical(result, resultold)
[1] TRUE
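If there are many labels, the per-label index variables can be collected in one named list so the formula refers to 'g', 't' and 'd' directly. This is only a sketch built on the approach above: `pos` and `at` are hypothetical helper names, not part of the original answer, and it assumes each label occupies the same cell in every period.

```r
D <- c('g', 't', NA, 'd')
nPeriods <- 4
E <- array(D, dim = c(2, 2, nPeriods))
values <- c(g = 5, t = 2, d = 7)
G <- apply(E, 1:3, function(x) values[x])

# pos (hypothetical): one row/col index matrix per label, found in period 1
pos <- lapply(setNames(nm = names(values)),
              function(k) which(E[, , 1] == k, arr.ind = TRUE))

# at (hypothetical): matrix index for label k in period i
at <- function(k, i) cbind(pos[[k]], i)

result <- G
for (i in 2:nPeriods) {
  result[at("g", i)] <- result[at("g", i - 1)] * result[at("t", i - 1)] +
    result[at("d", i)]
}
unname(result[1, 1, ])  # 5 17 41 89
```

The loop body now reads like the formula in the question, and adding a label only means adding an entry to `values`.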

Merge two lists into one component by component

I have a list and a character vector, plus an empty list:
A <- list(1:4,5:8,9:12)
B <- c("a","b")
C <- vector(mode = "list")
I would like to merge A and B into C as follows:
C[[1]][1] = A[[1]] C[[1]][2] = B
C[[2]][1] = A[[2]] C[[2]][2] = B
C[[3]][1] = A[[3]] C[[3]][2] = B
Thank you.
How about C <- lapply(A, function(x) list(x, B))?
For example:
A <- list(1:4,5:8,9:12)
B <- c("a","b")
C <- lapply(A, function(x) list(x, B))
# C <- lapply(A, list, B) # also works
all(
C[[1]][[1]] == A[[1]],
C[[2]][[1]] == A[[2]],
C[[3]][[1]] == A[[3]],
C[[1]][[2]] == B,
C[[2]][[2]] == B,
C[[3]][[2]] == B
)
Note that you'll need double [[ because each element of C is itself a list (C[[1]][[1]] rather than C[[1]][1]).
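A short sketch of the difference between the two kinds of indexing, using the same A, B and C as above:

```r
A <- list(1:4, 5:8, 9:12)
B <- c("a", "b")
C <- lapply(A, function(x) list(x, B))

single <- C[[1]][1]    # still a list, of length 1, wrapping 1:4
double <- C[[1]][[1]]  # the integer vector 1:4 itself

is.list(single)         # TRUE
identical(double, 1:4)  # TRUE
```

Single brackets always return a sub-list, so they cannot be used to get at the stored vector directly.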

R - looping through column differentials

Let's assume we have 4 vectors:
a <- c(200,204,209,215)
b <- c(215,220,235,245)
c <- c(230,236,242,250)
d <- c(240,242,243,267)
I basically want to create a loop which computes the differentials between each pair and then calculates the Z scores for those differentials, so something like scale(d-a). How do I create a loop that goes scale(b-a), then scale(c-a), scale(d-a), etc.? Many thanks.
Single named variables don't lend themselves too well to "looping".
Let's use a list() of vectors instead:
vecs <- list(
a = c(200,204,209,215),
b = c(215,220,235,245),
c = c(230,236,242,250),
d = c(240,242,243,267)
)
This allows us to apply a function to all pairs using combn:
scale_diff <- function(subset) {
  z <- scale(subset[[1]] - subset[[2]])
  colnames(z) <- paste(names(subset), collapse = " - ")
  z
}
z_scores <- combn(vecs, 2, scale_diff, simplify = FALSE)
Now z_scores is a list of 6 matrices (column vectors). The column names show you which vectors were subtracted before scaling.
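As a sketch of how the result might be used, the column vectors can be bound into one matrix; `z_mat` is a name introduced here for illustration. Note that scale_diff computes the first vector minus the second (a - b), whereas the question wrote b - a; reversing the order only flips the sign of the z-scores.

```r
vecs <- list(
  a = c(200, 204, 209, 215),
  b = c(215, 220, 235, 245),
  c = c(230, 236, 242, 250),
  d = c(240, 242, 243, 267)
)

scale_diff <- function(subset) {
  z <- scale(subset[[1]] - subset[[2]])
  colnames(z) <- paste(names(subset), collapse = " - ")
  z
}

z_scores <- combn(vecs, 2, scale_diff, simplify = FALSE)

# Bind the six one-column matrices side by side; column names are kept
z_mat <- do.call(cbind, z_scores)
colnames(z_mat)  # "a - b" "a - c" "a - d" "b - c" "b - d" "c - d"

# Reversing the subtraction order only changes the sign
all.equal(scale(vecs$b - vecs$a)[, 1], -z_mat[, "a - b"])  # TRUE
```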
We can place the vectors in a list, use combn to get the pairwise combinations, and then apply the difference:
lst1 <- list(a = a, b = b, c = c, d = d)
out <- combn(lst1, 2, FUN = function(x) scale(Reduce(`-`, x))[,1])
colnames(out) <- combn(names(lst1), 2, FUN = paste, collapse='_')
out
# a_b a_c a_d b_c b_d c_d
#[1,] 0.9108601 1.2009612 0.1290994 -0.7643506 -0.753390 -0.2219686
#[2,] 0.7759179 0.2401922 0.3872983 -0.9441978 -0.360317 0.3699477
#[3,] -0.5735045 -0.2401922 0.9036961 0.6744270 1.474024 1.1098432
#[4,] -1.1132735 -1.2009612 -1.4200939 1.0341214 -0.360317 -1.2578222
As @AlexR mentioned in the comments, if the attributes are important, remove the [,1] and keep each result as a one-column matrix:
out <- combn(lst1, 2, FUN = function(x) scale(Reduce(`-`, x)), simplify = FALSE)

An R function to return TRUE if the length of 3 objects is the same

I was wondering how I could test whether the lengths of 3 R objects are the same or not?
Here is a simple example and what I have tried with no success:
a = c(2, 3) ; b = 2 ; c = "hi"
is.df = function(x, y, z) length(x) != length(y) != length(z) ## gives error
foo = function(...){
  length(unique(lengths(list(...)))) == 1
}
foo(a, b, c)
#[1] FALSE
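The attempt in the question fails because R does not allow comparison operators to be chained, so `length(x) != length(y) != length(z)` is a parse error. The approach above sidesteps this by collecting all lengths and checking that only one distinct value occurs. A minimal sketch with a case in each direction (`same_length` is just an illustrative name):

```r
same_length <- function(...) {
  # lengths() gives one length per argument; one unique value means all equal
  length(unique(lengths(list(...)))) == 1
}

same_length(c(2, 3), 2, "hi")                 # FALSE: lengths are 2, 1, 1
same_length(1:2, c("a", "b"), list(TRUE, 0))  # TRUE: all have length 2
```

Because it takes `...`, the same function works for any number of objects, not just three.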

Keep all names from list to data.frame

When converting a list into a data.frame, R names the variables automatically by concatenating all the sublists' names. However, it appears to keep only the last name when a sublist has length 1. Is there a way to enforce a full path name for the variable name?
MWE:
> l <- list(a = list(b = 1), c = 2)
> l
$a
$a$b
[1] 1
$c
[1] 2
> data.frame(l)
b c
1 1 2
> ll <- list(a = list(b = 1, bb = 1), c = 2)
> data.frame(ll)
a.b a.bb c
1 1 1 2
Here I would like to have a.b as the name of the variable of data.frame(l) like it does for data.frame(ll).
A possible solution is to create a function that converts the list into a data frame with as.data.frame() and then sets the names to the desired values in a second step:
list_df <- function(list) {
  df <- as.data.frame(list)
  names(df) <- list_names(list)
  return(df)
}
Obviously, defining list_names() is the hard part. One possibility is to recurse through the nested lists:
list_names <- function(list) {
  recursor <- function(list, names) {
    if (is.list(list)) {
      # prefix each child name with the path so far
      new_names <- paste(names, names(list), sep = ".")
      out <- unlist(mapply(list, new_names, FUN = recursor))
    } else {
      out <- names
    }
    return(out)
  }
  new_names <- unlist(mapply(list, names(list), FUN = recursor))
  return(new_names)
}
This works for your two examples:
l <- list(a = list(b = 1), c = 2)
ll <- list(a = list(b = 1, bb = 1), c = 2)
list_df(l)
## a.b c
## 1 1 2
list_df(ll)
## a.b a.bb c
## 1 1 1 2
It also works for a list that is not nested, as well as for a list with deeper nesting:
ls <- list(a = 1, b = 3)
lc <- list(a = list(b = 1, bb = 1), c = 2, d = list(e = list(f = 1, ff = 2), ee = list(fff = 5)))
list_df(ls)
## a b
## 1 1 3
list_df(lc)
## a.b a.bb c d.e.f d.e.ff d.ee.fff
## 1 1 1 2 1 2 5
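As a possible shortcut, and not part of the answer above, unlist() already builds dotted path names for nested lists. The sketch below relies on that and assumes every leaf is a single value of the same type; with longer leaves unlist() appends numeric suffixes, and mixed types are coerced, so the recursive function remains the more general option.

```r
l  <- list(a = list(b = 1), c = 2)
ll <- list(a = list(b = 1, bb = 1), c = 2)

# Flatten to a named vector, then turn each element into a column
flat_df <- function(x) as.data.frame(as.list(unlist(x)))

names(flat_df(l))   # "a.b" "c"
names(flat_df(ll))  # "a.b" "a.bb" "c"
```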
