Matching interacting terms to a vector - r

I am trying to merge the xy vector with the z vector: any interaction term in xy whose two members both appear in z should be added to z. The final result should then be recoded as Q1, Q2, ..., Q1*Q2.
I have two vectors that need to match as vector xy:
x<-c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y<-c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
Because xy contains the interactions 2*6 and 6*11, and 2, 6 and 11 are all in z, I want 2*6 and 6*11 added to z. In general, any interaction in xy between two elements of z should be added.
xy=paste0(x,"*",y,collapse=",")
xy
# [1] "1*1,1*2,1*3,1*5,1*6,1*8,1*18,2*1,2*2,2*5,2*6,2*7,3*8,3*12,3*15,4*16,6*11,6*17,9*18,10*19,16*20,21*21"
z<-c(2,6,11)
z
#[1] 2 6 11
I want a fourth vector, xyz, that combines the elements of z with every interaction from xy whose terms are both in z:
xyz<-print("2+6+11+2*6+6*11")
#[1] "2+6+11+2*6+6*11"
xyz
#[1] "2+6+11+2*6+6*11"
Then each variable 2, 6, 11 should be converted to Q1, Q2, Q3, so the end product looks like...
xyz<-print("Q1+Q2+Q3+Q1*Q2+Q2*Q3")
#[1]
#End result:
#"Q1+Q2+Q3+Q1*Q2+Q2*Q3"

Just loop through x/y and add an interaction whenever both values are elements of z:
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y <- c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
z <- c(2,6,11)
# start with the main effects, then append each interaction whose terms are both in z
xyz <- as.character(z)
for (i in seq_along(x)) {
  if (x[i] %in% z && y[i] %in% z && x[i] != y[i]) {
    xyz <- c(xyz, paste0(x[i], "*", y[i]))
  }
}
xyz <- paste(xyz, collapse = "+")
Then replace these numbers with any coding you define:
z_map <- c("Q1", "Q2", "Q3")
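for (i in seq_along(z)) {
  # \\b restricts the match to whole numbers, so 6 is not replaced inside 16
  xyz <- gsub(paste0("\\b", z[i], "\\b"), z_map[i], xyz)
}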
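The loop and the string replacement can also be avoided with logical indexing and match(). This is only a sketch: it assumes the labels are simply Q1, Q2, ... in the order of z, and the names keep, labels and terms are made up for the example.

```r
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y <- c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
z <- c(2,6,11)

# pairs where both members are in z and the pair is not a self-interaction
keep <- x %in% z & y %in% z & x != y

# assumed labelling: the i-th element of z becomes "Qi"
labels <- paste0("Q", seq_along(z))

# main effects first, then the interactions, already written with labels
terms <- c(labels,
           paste0(labels[match(x[keep], z)], "*", labels[match(y[keep], z)]))
xyz <- paste(terms, collapse = "+")
xyz
# [1] "Q1+Q2+Q3+Q1*Q2+Q2*Q3"
```

Because the labels are looked up by position rather than substituted into a string, there is no risk of one number matching inside another.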

Related

R: Filter vectors by 'two-way' partial match

With two vectors
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
is there a way to do a filter of both by 'two-way' partial matching (remove elements in one vector if they contain or are contained in any element in the other vector) so that the result are these two vectors:
x1 <- c()
y1 <- c("nomatch")
To explain - every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update - it is not sufficient for a substring to match the initial chars - a substring might be found anywhere in the string it matches. Example above has been updated to reflect this.
I originally thought ?pmatch might be handy, but your edit clarifies you don't just want to match the start of items. Here's a function that should work:
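remover <- function(x,y) {
  # for each element of x, the indices of y that contain it, and vice versa
  pmx <- lapply(x, grep, x=y)
  pmy <- lapply(y, grep, x=x)
  # an element is a hit if it is contained in, or contains, an element of the other vector
  hitx <- lengths(pmx) > 0 | seq_along(x) %in% unlist(pmy)
  hity <- lengths(pmy) > 0 | seq_along(y) %in% unlist(pmx)
  list(
    x[!hitx],
    y[!hity]
  )
}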
remover(x,y)
#[[1]]
#character(0)
#
#[[2]]
#[1] "nomatch"
It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):
remover("yo","nomatch")
#[[1]]
#[1] "yo"
#
#[[2]]
#[1] "nomatch"
We can do the following:
# Return data.frame of matches of a in b
m <- function(a, b) {
data.frame(sapply(a, function(w) grepl(w, b), simplify = F));
}
# Match x and y and remove
x0 <- x[!apply(m(x, y), 2, any)]
y0 <- y[!apply(m(x, y), 1, any)]
# Match y and x and remove
x1 <- x0[!apply(m(y0, x0), 1, any)]
y1 <- y0[!apply(m(y0, x0), 2, any)]
x1
#character(0)
y1
#[1] "nomatch"
I build a matrix of all possible matches in both directions, combine the two with | (a match in either direction counts as a match), and then use the result to subset x and y:
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
bool_mat <- sapply(x,function(z) grepl(z,y)) | t(sapply(y,function(z) grepl(z,x)))
x1 <- x[!apply(bool_mat,2,any)] # character(0)
y1 <- y[!apply(bool_mat,1,any)] # [1] "nomatch"
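One caveat with the grep/grepl approaches above is that each string is treated as a regular expression, so a string such as "a.c" would match "abc". If literal substring matching is what is wanted, fixed = TRUE can be passed to grepl. A minimal sketch, where contains and two_way_filter are hypothetical helper names:

```r
# logical matrix: rows index haystack, columns index needles;
# TRUE where the needle occurs literally inside the haystack element
contains <- function(haystack, needles) {
  hits <- vapply(needles,
                 function(n) grepl(n, haystack, fixed = TRUE),
                 logical(length(haystack)))
  matrix(hits, nrow = length(haystack))
}

two_way_filter <- function(x, y) {
  x_in_y <- contains(y, x)       # rows: y, columns: x
  y_in_x <- t(contains(x, y))    # transposed to rows: y, columns: x
  hit <- x_in_y | y_in_x
  list(x = x[!apply(hit, 2, any)],
       y = y[!apply(hit, 1, any)])
}

res <- two_way_filter(c("abc", "12"), c("bc", "123", "nomatch"))
res$x   # character(0)
res$y   # "nomatch"

# "." is taken literally here, so nothing is removed
lit <- two_way_filter("a.c", "abc")
```

The sketch assumes both inputs are non-empty character vectors.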

How to get names of cells in matrix (e.g., correlation matrix) which match a condition in R using row and column names for cell names?

I often need to deal with large correlation matrices between sets of variables, and I want to know which correlations meet a given condition (e.g., are above .2 or .3, or above .2 or .3 in absolute value). So given a correlation matrix, it would be useful if I could get the pairs of variables whose correlations satisfy a condition.
To make it a little more concrete, here is a matrix:
x <- matrix(1:9, nrow = 3)
rownames(x) <- colnames(x) <- c("a", "b", "c")
x
#   a b c
# a 1 4 7
# b 2 5 8
# c 3 6 9
I want a function that allows me to specify a cell condition, and then will return names for matching cells.
e.g., > 8 returns "c:c"
odd number returns "a:a", "c:a", ... "c:c"
The following function takes a matrix and a function. The function should return TRUE/FALSE for each cell value.
Using the sample matrix:
x <- matrix(1:9, nrow = 3)
rownames(x) <- colnames(x) <- c("a", "b", "c")
The function is:
cell_matches <- function(x, FUN = function(X) X > .2) {
cellnames <- outer(row.names(x), colnames(x), function(X, Y) paste0(X, ":", Y))
cellnames[FUN(x) ]
}
Thus, using the matrix above, the following works:
cell_matches(x, function(X) X > 8)
# [1] "c:c"
cell_matches(x, function(X) X %% 2 == 1)
# [1] "a:a" "c:a" "b:b" "a:c" "c:c"
Application to correlation matrix:
# correlations above .80
cell_matches(cor(mtcars), function(X) X > .80 & X != 1)
# [1] "disp:cyl" "hp:cyl" "cyl:disp" "wt:disp" "cyl:hp" "disp:wt"
check_cor <- function(mat, FUN) {
  # which(..., arr.ind = TRUE) gives one (row, col) index pair per matching cell
  apply(which(FUN(mat), arr.ind = TRUE), 1, function(i) {
    paste0(row.names(mat)[i[1]], ':', colnames(mat)[i[2]])
  })
}
check_cor(cor(mtcars), function(X) X > .80 & X != 1)
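For a symmetric matrix such as a correlation matrix, both functions above report every pair twice (for example "disp:cyl" and "cyl:disp"). One possible variation, sketched here under the assumption that only the upper triangle is of interest, returns a data frame with the value attached; cell_table is a hypothetical name:

```r
cell_table <- function(mat, FUN) {
  # keep cells that satisfy FUN and lie strictly above the diagonal
  keep <- FUN(mat) & upper.tri(mat)
  idx <- which(keep, arr.ind = TRUE)
  data.frame(row = rownames(mat)[idx[, "row"]],
             col = colnames(mat)[idx[, "col"]],
             value = mat[idx],
             stringsAsFactors = FALSE)
}

x <- matrix(1:9, nrow = 3)
rownames(x) <- colnames(x) <- c("a", "b", "c")
res <- cell_table(x, function(X) X > 6)
res   # two rows: a/c with value 7 and b/c with value 8
```

For the correlation case the call would look like cell_table(cor(mtcars), function(X) abs(X) > .80); the diagonal is excluded by upper.tri(), so no X != 1 condition is needed.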

Apply a function to two vectors the "R" way?

There are two vectors x and y. If x contains an NA I want the NA to be replaced by a value from "y" with the corresponding index. Here is some example code that works:
x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
combineVector <- function(x,y)
{
for (i in 1:length(x)){
if (is.na(x[i]) && !is.na(y[i])){
x[i] = y[i]
}
}
return (x)
}
combineVector(x,y)
# [1] 1 2 3 9 5
I could have written this in almost any programming language. Is there a more "R" way to perform this task?
x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
x[is.na(x)] <- y[is.na(x)]
See the above. Using is.na() on x returns a logical vector that is TRUE for the NA elements of x. Using it to index both x and y selects only those positions, and using it in the assignment replaces the NA elements of x with the corresponding elements of y.
That will be much faster than looping as the vector gets large.
Try this code:
x[is.na(x)] <- y[is.na(x)]
By subsetting the x vector with is.na(x), you assign the corresponding values of y only to those elements of x that are NA.
To generate a new vector taking x and y as input, you can use the ifelse function:
x<-c(1,2,3,NA,NA)
y<-c(6,7,8,9,NA)
ifelse(is.na(x), y, x)
# [1] 1 2 3 9 NA
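If more than two vectors need to be combined this way, the ifelse() idea can be folded over a list with Reduce(). A minimal sketch, assuming all vectors have the same length; coalesce_vec is a hypothetical helper name, not a base R function:

```r
# take the first non-NA value at each position, scanning the arguments left to right
coalesce_vec <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

two <- coalesce_vec(c(1,2,3,NA,5), c(6,7,8,9,10))
two
# [1] 1 2 3 9 5

three <- coalesce_vec(c(1,NA,NA), c(NA,2,NA), c(9,9,9))
three
# [1] 1 2 9
```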

Calculate name number

I would like to calculate name number for a set of given names.
Name number is calculated by summing the value assigned to each letter. The values are given below:
a=i=j=q=y=1
b=k=r=2
c=g=l=s=3
d=m=t=4
h=e=n=x=5
u=v=w=6
o=z=7
p=f=8
Example: Name number of David can be calculated as follows:
D+a+v+i+d
= 4+1+6+1+4
= 16, then 1+6 = 7
Name number of David is 7.
I would like to write a function in R for doing this.
I am thankful for any directions or tips or package suggestions that I should look into.
This code snippet will accomplish what you want:
# Name for which the number should be computed.
name <- "David"
# Prepare letter scores array. In this case, the score for each letter will be the array position of the string it occurs in.
val <- c("aijqy", "bkr", "cgls", "dmt", "henx", "uvw", "oz", "pf")
# Convert name to lowercase.
lName <- tolower(name)
# Compute the sum of letter scores.
s <- sum(sapply(unlist(strsplit(lName,"")), function(x) grep(x, val)))
# Compute the "number" for the sum of letter scores. Repeatedly summing the digits can be shortened to taking the sum mod 9, with a small correction when the sum is a multiple of 9 (the result is then 9, not 0).
n <- (s %% 9)
n <- ifelse(n==0, 9, n)
'n' is the result that you want for any 'name'
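The mod-9 shortcut can be checked against the explicit repeated digit sum. The sketch below compares the two over a range of totals; it assumes the total is a positive integer, and digit_sum_repeat is a hypothetical helper name:

```r
# sum the decimal digits again and again until a single digit remains
digit_sum_repeat <- function(s) {
  while (s > 9) {
    s <- sum(as.integer(strsplit(as.character(s), "")[[1]]))
  }
  s
}

s <- 1:200
by_loop <- vapply(s, digit_sum_repeat, integer(1))

# the shortcut from the answer: mod 9, with 0 mapped to 9
by_mod <- s %% 9
by_mod[by_mod == 0] <- 9

all(by_loop == by_mod)
# [1] TRUE
```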
You will want to create a vector of values, in alphabetical order, then use match to get their indices. Something like this:
a <- i <- j <- q <- y <- 1
b <- k <- r <- 2
c <- g <- l <- s <- 3
d <- m <- t <- 4
h <- e <- n <- x <- 5
u <- v <- w <- 6
o <- z <- 7
p <- f <- 8
vals <- c(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z)
sum(vals[match(c("d","a","v","i","d"), letters)])
I'm sure there are several ways to do this, but here's an approach using a named vector:
x <- c(
"a"=1,"i"=1,"j"=1,"q"=1,"y"=1,
"b"=2,"k"=2,"r"=2,
"c"=3,"g"=3,"l"=3,"s"=3,
"d"=4,"m"=4,"t"=4,
"h"=5,"e"=5,"n"=5,"x"=5,
"u"=6,"v"=6,"w"=6,
"o"=7,"z"=7,
"p"=8,"f"=8)
##
name_val <- function(Name, mapping=x){
split <- tolower(unlist(strsplit(Name,"")))
total <-sum(mapping[split])
##
sum(as.numeric(unlist(strsplit(as.character(total),split=""))))
}
##
Names <- c("David","Betty","joe")
##
R> name_val("David")
[1] 7
R> sapply(Names,name_val)
David Betty   joe
    7     7     4
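Another way to do the letter lookup is chartr(), which translates characters one for one. The sketch below assumes that non-letters should be ignored and that the digit sum should be repeated until a single digit remains (the question's example only shows one reduction step); name_number is a hypothetical name:

```r
name_number <- function(name) {
  # keep only the letters a-z, in lowercase
  clean <- gsub("[^a-z]", "", tolower(name))
  # the 26 digits are the values for a..z in alphabetical order
  digits <- chartr(paste(letters, collapse = ""),
                   "12345835112345781234666517",
                   clean)
  total <- sum(as.integer(strsplit(digits, "")[[1]]))
  # reduce to a single digit
  while (total > 9) {
    total <- sum(as.integer(strsplit(as.character(total), "")[[1]]))
  }
  total
}

name_number("David")
# [1] 7
```

For a name such as "ppc" the total is 19, which needs two reductions (19 to 10 to 1); a single digit-sum pass would stop at 10.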

Matching numbers by their order when in two different vectors

The title does not really do this question justice, but I could not think of any other way to phrase the question. I can best explain the problem with an example.
Let's say we have two vectors of numbers (each of which are always going to be ascending and unique):
vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)
What I am trying to do is create a function that will take these two vectors and match them in the following way:
1) Start with the first element of vector1 (in this case, 1)
2) Go to vector2 and match the element from #1 with the first element in vector 2 that is bigger than it (in this case, 5)
3) Go back to vector1 and skip all elements less than the value in #2 we found (in this case, we skip 3, and grab 10)
4) Go back to vector2 and skip all elements less than the value in #3 we found (in this case, we skip 9 and grab 15)
5) repeat until we are done with all elements.
The resulting two vectors we should have are:
result1 = c(1,10,24,30)
result2 = c(5,15,28,35)
My current solution goes something like this, but I believe it might be highly inefficient:
# establishes where we start from the vector2 numbers
# just in case we have vector1 <- c(5,8,10)
# and vector2 <- c(1,2,3,4,6,7). We would want to skip the 1,2,3,4 values
i <- 1
while(vector2[i]<vector1[1]){
i <- i+1
}
# starts the result1 vector with the first value from the vector1
result1 <- vector1[1]
# starts the result2 vector empty and will add as we loop through
result2 <- c()
# super complicated and probably hugely inefficient loop within a loop within a loop
# i really want to avoid doing this, but I cannot think of any other way to accomplish this
for(j in 1:length(vector1)){
while(vector1[j] > vector2[i] && (i+1) <= length(vector2)){
result1 <- c(result1,vector1[j])
result2 <- c(result2,vector2[i])
while(vector1[j] > vector2[i+1] && (i+2) <= length(vector2)){
i <- i+1
}
i <- i+1
}
}
## have to add on the last vector2 value cause while loop skips it
## if it doesn't exist (there are no more vector2 values bigger) we put in an NA
if(result1[length(result1)] < vector2[i]){
  result2 <- c(result2,vector2[i])
} else {
  ### we ran out of vector2 values that are bigger
  result2 <- c(result2,NA)
}
This is really difficult to explain. Just call it magic :)
vector1 <- c(1,3,10,11,24,26,30,31)
vector2 <- c(5,9,15,19,21,23,28,35)
## another case
# vector2 <- c(0,9,15,19,21,23,28,35)
## handling the case where vector2 min value(s) are < vector1 min value
if (any(idx <- which(min(vector1) >= vector2)))
vector2 <- vector2[-idx]
## interleave the two vectors
tmp <- c(vector1,vector2)[order(c(order(vector1), order(vector2)))]
## if we sort the vectors, which pairwise elements are from the same vector
r <- rle(sort(tmp) %in% vector1)$lengths
## I want to "remove" all the pairwise elements which are from the same vector
## so I again interleave two vectors:
## the first will be all TRUEs because I want the first instance of each *new* vector
## the second will be all FALSEs identifying the elements I want to throw out because
## there is a sequence of elements from the same vector
l <- rep(1, length(r))
ord <- c(l, r - 1)[order(c(order(r), order(l)))]
## create some dummy TRUE/FALSE to identify the ones I want
res <- sort(tmp)[unlist(Map(rep, c(TRUE, FALSE), ord))]
setNames(split(res, res %in% vector2), c('result1', 'result2'))
# $result1
# [1] 1 10 24 30
#
# $result2
# [1] 5 15 28 35
Obviously this will only work if both vectors are ascending and unique, which you said they are.
EDIT:
works with duplicates:
vector1 <- c(1,3,10,11,24,26,30,31)
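# test cases for vector2; each assignment overwrites the previous one,
# so the output below is for the last case
vector2 <- c(5,9,15,19,21,23,28,35)
vector2 <- c(0,9,15,19,21,23,28,35)
vector2 <- c(1,3,3,5,7,9,28,35)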
f <- function(v1, v2) {
  # drop leading v2 values that are not greater than the first v1 value
  if (any(idx <- which(min(v1) >= v2)))
    v2 <- v2[-idx]
  # tag each value with its source: '.0' for v1, '.00' for v2
  v1 <- paste0(v1, '.0')
  v2 <- paste0(v2, '.00')
  n <- function(x) as.numeric(x)
  tmp <- c(v1, v2)[order(n(c(v1, v2)))]
  m <- tmp[1]
  # keep a value only if it exceeds the last kept one and comes from the other vector
  idx <- c(TRUE, sapply(1:(length(tmp) - 1), function(x) {
    if (n(tmp[x + 1]) > n(m)) {
      if (gsub('^.*\\.','', tmp[x + 1]) == gsub('^.*\\.','', m))
        FALSE
      else {
        m <<- tmp[x + 1]
        TRUE
      }
    } else FALSE
  }))
  setNames(split(n(tmp[idx]), grepl('\\.00$', tmp[idx])), c('result1','result2'))
}
f(vector1, vector2)
# $result1
# [1] 1 10 30
#
# $result2
# [1] 3 28 35
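The steps in the question can also be written as one pass over the merged, sorted values. This is a sketch rather than a drop-in replacement: it assumes "bigger" means strictly greater at every step, that ties are resolved in favour of vector1, and it does not pad result2 with NA when vector2 runs out. alternate_match is a hypothetical name.

```r
alternate_match <- function(v1, v2) {
  merged <- c(v1, v2)
  from_v2 <- rep(c(FALSE, TRUE), c(length(v1), length(v2)))
  # sort by value; on ties, v1 elements come before v2 elements
  ord <- order(merged, from_v2)
  merged <- merged[ord]
  from_v2 <- from_v2[ord]

  keep <- logical(length(merged))
  last <- -Inf        # last value that was kept
  want_v2 <- FALSE    # start by taking a value from v1
  for (i in seq_along(merged)) {
    if (from_v2[i] == want_v2 && merged[i] > last) {
      keep[i] <- TRUE
      last <- merged[i]
      want_v2 <- !want_v2   # next value must come from the other vector
    }
  }
  list(result1 = merged[keep & !from_v2],
       result2 = merged[keep & from_v2])
}

vector1 <- c(1,3,10,11,24,26,30,31)
out <- alternate_match(vector1, c(5,9,15,19,21,23,28,35))
out$result1   # 1 10 24 30
out$result2   # 5 15 28 35

# with duplicates in the second vector
dup <- alternate_match(vector1, c(1,3,3,5,7,9,28,35))
```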
