Merge and replace values from overlapping matrices - r

I have two overlapping matrices with some shared columns and rows:
m.1 = matrix(c(NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,1,1,1,NA,NA,NA,1,NA,NA), ncol=5)
colnames(m.1) <- c("-2","-1","0","1","2")
rownames(m.1) <- c("-2","-1","0","1","2")
## -2 -1 0 1 2
## -2 NA NA 1 NA NA
## -1 NA 1 1 1 NA
## 0 1 1 1 1 1
## 1 NA 1 1 1 NA
## 2 NA NA 1 NA NA
m.2 = matrix(c(NA,2,NA,2,2,2,NA,2,NA), ncol=3)
colnames(m.2) <- c("-1","0","1")
rownames(m.2) <- c("-1","0","1")
## -1 0 1
## -1 NA 2 NA
## 0 2 2 2
## 1 NA 2 NA
Now I want to pass the element-wise maximum of m.1 and m.2 (the larger value in each shared cell) to a new matrix m.max, which should look like this:
## -2 -1 0 1 2
## -2 NA NA 1 NA NA
## -1 NA 1 2 1 NA
## 0 1 2 2 2 1
## 1 NA 1 2 1 NA
## 2 NA NA 1 NA NA
Based on previous threads, I have meddled with merge(), replace() and match() but cannot get the desired result at all, e.g.
m.max<- merge(m.1,m.2, by = "row.names", all=TRUE, sort = TRUE)
## Row.names -2 -1.x 0.x 1.x 2 -1.y 0.y 1.y
## 1 -1 NA 1 1 1 NA NA 2 NA
## 2 -2 NA NA 1 NA NA NA NA NA
## 3 0 1 1 1 1 1 2 2 2
## 4 1 NA 1 1 1 NA NA 2 NA
## 5 2 NA NA 1 NA NA NA NA NA
Please help! Am I completely on the wrong track? Does this operation require a different kind of object than matrix? For example, I also tried to convert the matrices into raster objects and do cell statistics, but ran into problems because of the unequal dimensions of m.1 and m.2.
Importantly, the answer should also work for much larger objects, and regardless of whether I want to calculate the maximum, minimum or sum.

You can use pmax:
#we create a new matrix as big as m.1 with the values of m.2 in it
mres<-array(NA,dim(m.1),dimnames(m.1))
mres[rownames(m.2),colnames(m.2)]<-m.2
#Then we use pmax
pmax(m.1,mres,na.rm=TRUE)
# -2 -1 0 1 2
#-2 NA NA 1 NA NA
#-1 NA 1 2 1 NA
#0 1 2 2 2 1
#1 NA 1 2 1 NA
#2 NA NA 1 NA NA
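Since the question also asks about the minimum and the sum, here is a minimal sketch of how the same padding step might be reused. The names m.min, m.sum and both.na are mine, and treating a lone NA as 0 in the sum is an assumption about what is wanted:

```r
m.1 <- matrix(c(NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,1,1,1,NA,NA,NA,1,NA,NA), ncol = 5,
              dimnames = list(c("-2","-1","0","1","2"), c("-2","-1","0","1","2")))
m.2 <- matrix(c(NA,2,NA,2,2,2,NA,2,NA), ncol = 3,
              dimnames = list(c("-1","0","1"), c("-1","0","1")))

# pad m.2 out to the shape of m.1 (same step as above)
mres <- array(NA, dim(m.1), dimnames(m.1))
mres[rownames(m.2), colnames(m.2)] <- m.2

# element-wise minimum; na.rm = TRUE keeps the value when only one matrix has one
m.min <- pmin(m.1, mres, na.rm = TRUE)

# element-wise sum (assumption: a lone NA counts as 0, NA only if both cells are NA)
both.na <- is.na(m.1) & is.na(mres)
m.sum <- ifelse(is.na(m.1), 0, m.1) + ifelse(is.na(mres), 0, mres)
m.sum[both.na] <- NA
```

Because everything is vectorised over whole matrices, this should scale to much larger objects as long as the row and column names of the smaller matrix all occur in the larger one.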

Related

Turn value to the left of NA to NA value for entire dataframe

I have the following dataframe
df <- data.frame(a = c(1,2,3,4),
b = c(NA,1,NA,1),
c = c(1,4,5,2),
d = c(1,NA,NA,1))
a b c d
1 1 NA 1 1
2 2 1 4 NA
3 3 NA 5 NA
4 4 1 2 1
I have columns b and d with either NA or 1.
I have columns a and c with my values.
I want all the values to the left of NA values in b and d to be NA
So I want the following df_1 but can't figure out how to get there:
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1
You can try:
df[c(TRUE, FALSE)][is.na(df[c(FALSE, TRUE)])] <- NA
df
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1
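The one-liner relies on logical recycling: c(TRUE, FALSE) picks columns 1 and 3 (the values) and c(FALSE, TRUE) picks columns 2 and 4 (the flags). A sketch of the same idea with the columns spelled out by name, which may be easier to adapt if the columns do not strictly alternate (value.cols and flag.cols are names I made up):

```r
df <- data.frame(a = c(1,2,3,4),
                 b = c(NA,1,NA,1),
                 c = c(1,4,5,2),
                 d = c(1,NA,NA,1))

value.cols <- c("a", "c")  # columns holding the values
flag.cols  <- c("b", "d")  # flag column to the right of each value column

# is.na() on the flag columns gives a logical matrix of the same shape
# as the value columns, which is then used to blank the matching cells
df[value.cols][is.na(df[flag.cols])] <- NA
df
```

The two vectors have to be the same length and in matching order, so that each value column lines up with its flag column.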
You can use this function:
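myFun <- function(df){
  # loop over the rows (seq_along(df) would loop over the columns instead)
  for(i in seq_len(nrow(df))) {
    # assign a real NA, not the string "NA", so the columns stay numeric
    if(is.na(df$b[i]))
      df$a[i] <- NA
    if(is.na(df$d[i]))
      df$c[i] <- NA
  }
  df
}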
Output:
myFun(df)
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1

Conditionally creating matrix or dataframe in R

I have two objects let's call them 1 and 2. They can take either 1 or 2 as values for x variable and depending on that, their y values (binary) are determined as depicted in the image.
For example, if x=1 then only yA can be 1. But if x=2, any of yA, yB and yC for that object can be 1. The constraint is that for each object at most one y can be 1. In the image, blue columns are for object 1 and green columns are for object 2.
Is there an efficient way to do this, given that the number of variables in the original problem is much higher?
EDIT: The objective is to find all possible combinations of the y variables as depicted in the image. The image is only meant to give an idea of the expected outcome.
A bit of a brute-force generation.
First, creating the basic frame of all y* columns:
dat <- data.frame(yA=c(1,NA,NA),yB=c(NA,1,NA),yC=c(NA,NA,1),ign=1)
dat <- merge(dat, dat, by="ign")
names(dat)[-1] <- c("y1A", "y1B", "y1C", "y2A", "y2B", "y2C")
dat
# ign y1A y1B y1C y2A y2B y2C
# 1 1 1 NA NA 1 NA NA
# 2 1 1 NA NA NA 1 NA
# 3 1 1 NA NA NA NA 1
# 4 1 NA 1 NA 1 NA NA
# 5 1 NA 1 NA NA 1 NA
# 6 1 NA 1 NA NA NA 1
# 7 1 NA NA 1 1 NA NA
# 8 1 NA NA 1 NA 1 NA
# 9 1 NA NA 1 NA NA 1
Merge (outer/cartesian) with a frame of x*:
alldat <- merge(data.frame(x1=c(1,1,2),x2=c(1,2,2),ign=1), dat, by="ign")
subset(alldat, (!is.na(y1B) | x1 > 1) & (!is.na(y2B) | x2 > 1), select = -ign)
# x1 x2 y1A y1B y1C y2A y2B y2C
# 5 1 1 NA 1 NA NA 1 NA
# 13 1 2 NA 1 NA 1 NA NA
# 14 1 2 NA 1 NA NA 1 NA
# 15 1 2 NA 1 NA NA NA 1
# 19 2 2 1 NA NA 1 NA NA
# 20 2 2 1 NA NA NA 1 NA
# 21 2 2 1 NA NA NA NA 1
# 22 2 2 NA 1 NA 1 NA NA
# 23 2 2 NA 1 NA NA 1 NA
# 24 2 2 NA 1 NA NA NA 1
# 25 2 2 NA NA 1 1 NA NA
# 26 2 2 NA NA 1 NA 1 NA
# 27 2 2 NA NA 1 NA NA 1
The ign column is merely to force/enable merge to do a cartesian/outer join.
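As a small illustration of that trick, here is a sketch on toy frames (x and y below are made up for the example) comparing the constant-column merge with merge(..., by = NULL), which the merge documentation describes as returning the Cartesian product:

```r
x <- data.frame(x1 = c(1, 2))
y <- data.frame(yA = c(1, NA), yB = c(NA, 1))

# variant 1: add a constant key to both frames and merge on it
cross.ign <- merge(cbind(x, ign = 1), cbind(y, ign = 1), by = "ign")

# variant 2: no key at all; merge then pairs every row of x with every row of y
cross.null <- merge(x, y, by = NULL)

nrow(cross.ign)   # 2 * 2 = 4 rows
nrow(cross.null)  # 2 * 2 = 4 rows
```

Either way the row count is the product of the inputs, so with many more variables the full grid can get large before the subset() step trims it.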

Calculations in R with Missing Values

In the below test data, v4 is calculated out of v1, v2 and v3 as follows:
test$v4 <- (test$v1 + test$v2 + test$v3) / 3
As expected, any row with a missing value returns an NA result for v4:
v1 v2 v3 v4
1 1 1 2 1.333333
2 1 1 2 1.333333
3 1 2 NA NA
4 0 1 NA NA
5 NA NA 0 NA
6 NA 1 0 NA
7 1 2 NA NA
However, I want R to return an NA only when there are two or three NA values. If there is only one NA, I want R to calculate the mean of the two available values.
Can you please advise as to how I can do that?
Thank you.
You can use ifelse and rowSums(is.na()) to apply a different formula to different rows:
dat <- read.table(text= "v1 v2 v3 v4
1 1 1 2 1.333333
2 1 1 2 1.333333
3 1 2 NA NA
4 0 1 NA NA
5 NA NA 0 NA
6 NA 1 0 NA
7 1 2 NA NA")
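# restrict to the input columns so the existing v4 column is not counted
vars <- c("v1", "v2", "v3")
# NA if a row has 2 or more NAs, otherwise the mean of the available values
dat$v4 <- ifelse(rowSums(is.na(dat[vars])) >= 2, NA, rowMeans(dat[vars], na.rm = TRUE))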
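If the threshold might change, the same rule could be wrapped in a small helper that counts observed values instead of missing ones. This is only a sketch; row_mean_min_obs and its min.obs argument are names I invented, not part of any package:

```r
test <- data.frame(v1 = c(1, 1, 1, 0, NA, NA, 1),
                   v2 = c(1, 1, 2, 1, NA, 1, 2),
                   v3 = c(2, 2, NA, NA, 0, 0, NA))

# hypothetical helper: row mean over the available values,
# but NA when fewer than min.obs values are present in the row
row_mean_min_obs <- function(x, min.obs = 2) {
  n.obs <- rowSums(!is.na(x))
  out <- rowMeans(x, na.rm = TRUE)
  out[n.obs < min.obs] <- NA
  out
}

test$v4 <- row_mean_min_obs(test[c("v1", "v2", "v3")])
test$v4
# 1.333333 1.333333 1.5 0.5 NA 0.5 1.5
```

Only row 5 has fewer than two observed values, so it is the only row that stays NA.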

R- Replace all values in rows of dataframe after first NA by NA

I have a dataframe of 3500 observations and 278 variables. For each row going from the first column, I want to replace all values occurring after the first NA by NAs. For instance, I want to go from a dataframe like so:
X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2
To something like
X1 X2 X3 X4 X5
1 3 NA NA NA
1 NA NA NA NA
6 7 NA NA NA
10 1 2 NA NA
I tried using the following nested for loop, but it is not terminating:
for(i in 2:3500){
firstna <- min(which(is.na(df[i,])))
df[i, firstna:278] <- NA
}
Is there a more efficient way to do this? Thanks in advance.
You could do something like this:
# sample data
mat <- matrix(1, 10, 10)
set.seed(231)
mat[sample(100, 7)] <- NA
You can use apply with cumsum and is.na to keep track of where NAs need to be placed (i.e. places across the row where the cumulative sum of NAs is greater than 0). Then, use those locations to assign NAs to the original structure in the appropriate places.
mat[t(apply(is.na(mat), 1, cumsum)) > 0 ] <- NA
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 1 1 1 1 1 NA NA NA NA
# [2,] NA NA NA NA NA NA NA NA NA NA
# [3,] 1 1 1 1 1 1 1 1 1 1
# [4,] 1 1 1 1 1 1 1 1 1 1
# [5,] 1 1 1 NA NA NA NA NA NA NA
# [6,] 1 1 1 1 1 1 1 1 1 1
# [7,] 1 NA NA NA NA NA NA NA NA NA
# [8,] 1 1 1 1 1 1 1 1 1 1
# [9,] 1 1 1 1 1 1 1 1 1 1
#[10,] 1 1 NA NA NA NA NA NA NA NA
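To see why this works, it may help to look at a single row first and then at the shape of the mask. A short sketch (row.vals, m and mask are names used only for this illustration):

```r
# one row: the running count of NAs is 0 until the first NA, then stays >= 1
row.vals <- c(10, 1, 2, NA, 2)
cumsum(is.na(row.vals))      # 0 0 0 1 1
cumsum(is.na(row.vals)) > 0  # FALSE FALSE FALSE TRUE TRUE

# apply(..., 1, f) returns the per-row results as columns,
# so t() is needed to get the mask back into the original orientation
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)  # rows: (1, 3, 5) and (NA, 4, 6)
mask <- t(apply(is.na(m), 1, cumsum)) > 0
dim(mask)  # 2 3, same as dim(m)
m[mask] <- NA
```

Without the t() the mask would be 3 x 2 here and would blank the wrong cells.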
This works fine with data frames, too. Using the provided example data:
d<-read.table(text="
X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2 ", header=TRUE)
d[t(apply(is.na(d), 1, cumsum)) > 0 ] <- NA
# X1 X2 X3 X4 X5
#1 1 3 NA NA NA
#2 1 NA NA NA NA
#3 6 7 NA NA NA
#4 10 1 2 NA NA
We can use rowCumsums from library(matrixStats)
library(matrixStats)
d*NA^rowCumsums(+(is.na(d)))
# X1 X2 X3 X4 X5
#1 1 3 NA NA NA
#2 1 NA NA NA NA
#3 6 7 NA NA NA
#4 10 1 2 NA NA
Or a base R option is
d*NA^do.call(cbind,Reduce(`+`,lapply(d, is.na), accumulate=TRUE))
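Both variants lean on the fact that in R anything raised to the power 0 is 1, even NA, while NA raised to a positive power stays NA. A tiny sketch of that arithmetic on a plain vector:

```r
NA^0  # 1
NA^1  # NA

# the exponent is the running NA count: 0 keeps the value, > 0 turns it into NA
vals <- c(5, 7, 9)
na.count <- c(0, 0, 1)
vals * NA^na.count  # 5 7 NA
```

So multiplying by NA^(cumulative NA count) leaves cells before the first NA untouched and blanks everything from there on.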
I did this using the cumany function from the dplyr package, which returns TRUE for every element from the first one that meets the condition onward.
df <- read.table(text = "X1 X2 X3 X4 X5
1 3 NA 6 9
1 NA 4 6 18
6 7 NA 3 1
10 1 2 NA 2 ",
header = TRUE)
library(plyr)
library(dplyr)
na_row_replace <- function(x){
x[which(cumany(is.na(x)))] <- NA
return(x)
}
adply(df, 1, na_row_replace)

How to find the number of discordant and concordant pairs in R?

I am trying to find the number of discordant and concordant pairs in a clinical trial, and have come across the 'asbio' library which provides the function ConDis.matrix. (http://artax.karlin.mff.cuni.cz/r-help/library/asbio/html/ConDis.matrix.html)
The dataset they give as an example is:
library(asbio)
crab<-data.frame(gill.wt=c(159,179,100,45,384,230,100,320,80,220,320,210),
body.wt=c(14.4,15.2,11.3,2.5,22.7,14.9,1.41,15.81,4.19,15.39,17.25,9.52))
attach(crab)
crabm<-ConDis.matrix(gill.wt,body.wt)
crabm
Which gives a result that looks like:
1 2 3 4 5 6 7 8 9 10 11 12
1 NA NA NA NA NA NA NA NA NA NA NA NA
2 1 NA NA NA NA NA NA NA NA NA NA NA
3 1 1 NA NA NA NA NA NA NA NA NA NA
4 1 1 1 NA NA NA NA NA NA NA NA NA
5 1 1 1 1 NA NA NA NA NA NA NA NA
6 1 -1 1 1 1 NA NA NA NA NA NA NA
7 1 1 0 -1 1 1 NA NA NA NA NA NA
8 1 1 1 1 1 1 1 NA NA NA NA NA
9 1 1 1 1 1 1 -1 1 NA NA NA NA
10 1 1 1 1 1 -1 1 1 1 NA NA NA
11 1 1 1 1 1 1 1 0 1 1 NA NA
12 -1 -1 -1 1 1 1 1 1 1 1 1 NA
The solution I can think of is adding up the 1s and -1s (for concordant and discordant pairs, respectively), but I don't know how to count values in a matrix. Alternatively, if someone has a better way of counting concordant/discordant pairs, I would love to know.
The solution you describe would be:
sum(crabm == 1, na.rm = TRUE)
[1] 57
sum(crabm == -1, na.rm = TRUE)
[1] 7
You could try (C...concordant, D...discordant pairs):
library(DescTools)
tab <- table(crab$gill.wt, crab$body.wt)
ConDisPairs(tab)[c("C","D")]
$C
[1] 57
$D
[1] 7
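If you would rather not depend on a package, the counts can also be sketched in base R by taking the sign of the product of pairwise differences. This assumes the usual definition (a pair is concordant when both variables differ in the same direction, discordant when they differ in opposite directions, and a tie counts as neither); the object names below are mine:

```r
gill.wt <- c(159,179,100,45,384,230,100,320,80,220,320,210)
body.wt <- c(14.4,15.2,11.3,2.5,22.7,14.9,1.41,15.81,4.19,15.39,17.25,9.52)

# pairwise differences for every (i, j); the sign of their product is
# 1 for a concordant pair, -1 for a discordant pair and 0 for a tie
pair.sign <- sign(outer(gill.wt, gill.wt, "-") * outer(body.wt, body.wt, "-"))

# each pair appears twice in the full matrix, so keep the lower triangle only
lower <- pair.sign[lower.tri(pair.sign)]
n.con <- sum(lower == 1)
n.dis <- sum(lower == -1)
c(C = n.con, D = n.dis)
# C = 57, D = 7
```

outer() builds full n x n matrices, so for very large samples a package implementation may be the more memory-friendly choice.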
