Conditionally creating matrix or dataframe in R - r

I have two objects; let's call them 1 and 2. Each takes either 1 or 2 as the value of its x variable, and depending on that, its binary y values are determined as depicted in the image.
For example, if x=1 then only yA can be 1. But if x=2, any of yA, yB and yC for that object can be 1. The constraint is that for each object at most one y can be 1. In the image, blue columns are for object 1 and green columns are for object 2.
Is there an efficient way to do this, given that the number of variables in the original problem is much higher?
EDIT: The objective is to find all the possible combinations of y variables as depicted in the image. The image is only meant to give an idea of the expected outcome.

A bit of a brute-force generation.
First, creating the basic frame of all y* columns:
dat <- data.frame(yA=c(1,NA,NA),yB=c(NA,1,NA),yC=c(NA,NA,1),ign=1)
dat <- merge(dat, dat, by="ign")
names(dat)[-1] <- c("y1A", "y1B", "y1C", "y2A", "y2B", "y2C")
dat
# ign y1A y1B y1C y2A y2B y2C
# 1 1 1 NA NA 1 NA NA
# 2 1 1 NA NA NA 1 NA
# 3 1 1 NA NA NA NA 1
# 4 1 NA 1 NA 1 NA NA
# 5 1 NA 1 NA NA 1 NA
# 6 1 NA 1 NA NA NA 1
# 7 1 NA NA 1 1 NA NA
# 8 1 NA NA 1 NA 1 NA
# 9 1 NA NA 1 NA NA 1
Merge (outer/cartesian) with a frame of x*:
alldat <- merge(data.frame(x1=c(1,1,2),x2=c(1,2,2),ign=1), dat, by="ign")
subset(alldat, (!is.na(y1A) | x1 > 1) & (!is.na(y2A) | x2 > 1), select = -ign)
# x1 x2 y1A y1B y1C y2A y2B y2C
# 1 1 1 1 NA NA 1 NA NA
# 10 1 2 1 NA NA 1 NA NA
# 11 1 2 1 NA NA NA 1 NA
# 12 1 2 1 NA NA NA NA 1
# 19 2 2 1 NA NA 1 NA NA
# 20 2 2 1 NA NA NA 1 NA
# 21 2 2 1 NA NA NA NA 1
# 22 2 2 NA 1 NA 1 NA NA
# 23 2 2 NA 1 NA NA 1 NA
# 24 2 2 NA 1 NA NA NA 1
# 25 2 2 NA NA 1 1 NA NA
# 26 2 2 NA NA 1 NA 1 NA
# 27 2 2 NA NA 1 NA NA 1
The ign column is merely to force/enable merge to do a cartesian/outer join.
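With more objects or more y options, the same table could also be built from expand.grid instead of merging on a dummy column. The following is only a minimal sketch, assuming the rule stated in the question (x = 1 allows only yA, x = 2 allows yA, yB or yC) and exactly one y set per object; the helper name y_combinations and its arguments are hypothetical and not part of the answer above.

```r
# Hypothetical helper: all admissible y combinations for a given x vector.
# Assumption: x = 1 permits only option "A"; x = 2 permits "A", "B" or "C".
y_combinations <- function(x, options = c("A", "B", "C")) {
  # For each object, the options it may choose from
  allowed <- lapply(x, function(xi) if (xi == 1) options[1] else options)
  names(allowed) <- paste0("obj", seq_along(x))
  # Cartesian product of the per-object choices
  choices <- expand.grid(allowed, stringsAsFactors = FALSE)
  # One indicator column per object/option pair: 1 if chosen, NA otherwise
  out <- list()
  for (i in seq_along(x)) {
    for (o in options) {
      out[[paste0("y", i, o)]] <- ifelse(choices[[i]] == o, 1, NA_real_)
    }
  }
  as.data.frame(out)
}

# Object 1 has x = 1, object 2 has x = 2
res <- y_combinations(c(1, 2))
res
```

Calling the helper once per x combination of interest and row-binding the results would give the full table; whether that is faster than the merge approach for a large problem is not something this sketch establishes.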

Related

Turn value to the left of NA to NA value for entire dataframe

I have the following dataframe
df <- data.frame(a = c(1,2,3,4),
b = c(NA,1,NA,1),
c = c(1,4,5,2),
d = c(1,NA,NA,1))
a b c d
1 1 NA 1 1
2 2 1 4 NA
3 3 NA 5 NA
4 4 1 2 1
I have columns b and d with either NA or 1.
I have columns a and c with my values.
I want every value immediately to the left of an NA in b or d to become NA.
So I want the following df_1 but can't figure out how to get there:
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1
You can try:
df[c(TRUE, FALSE)][is.na(df[c(FALSE, TRUE)])] <- NA
df
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1
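The one-liner relies on the value and flag columns strictly alternating. A sketch of the same idea with the column pairs spelled out by name is below; the vectors value_cols and flag_cols are hypothetical names introduced only for illustration, and the pairing (a with b, c with d) is an assumption taken from the example data.

```r
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, 1, NA, 1),
                 c = c(1, 4, 5, 2),
                 d = c(1, NA, NA, 1))

# Assumption: each value column is paired with the flag column to its right
value_cols <- c("a", "c")
flag_cols  <- c("b", "d")

df_1 <- df
# is.na() on the flag columns gives a logical matrix that lines up,
# position by position, with the selected value columns
df_1[value_cols][is.na(df[flag_cols])] <- NA
df_1
```

Naming the columns makes the pairing explicit and keeps working if other columns are added to the data frame.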
You can use this function:
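myFun <- function(df){
  # Loop over rows (not columns) and blank the value left of each NA flag
  for(i in seq_len(nrow(df))) {
    if(is.na(df$b[i]))
      df$a[i] <- NA   # a real NA, not the string "NA"
    if(is.na(df$d[i]))
      df$c[i] <- NA
  }
  df
}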
Output:
myFun(df)
a b c d
1 NA NA 1 1
2 2 1 NA NA
3 NA NA NA NA
4 4 1 2 1

How to extract values of existing variable and paste them in top rows of dataframe (using R)

Probably there's a very easy solution to this but I can't figure it out for some reason. This is what my data (in R) look like (except for value_new which is the exact description of what I need!):
dat<-data.frame("id"=c(1,2,3,4,5,NA,NA,NA,NA,NA),
"value"=c(rep(NA,5),7,NA,4,1,9),
"value_new"=c(7,NA,4,1,9,rep(NA,5)))
I hope this is self-explanatory. What I need is to take the values of "value" from the bottom five rows (where id is NA) and paste them into the first five rows (where id is not NA) of a new variable I'd like to call "value_new".
What is an easy way of doing this? I'd basically need to cut out the bottom half and paste it as new variable(s) in the top section of the dataframe. Hope this makes sense.
dat<-data.frame("id"=c(1,2,3,4,5,NA,NA,NA,NA,NA),
"value"=c(rep(NA,5),7,NA,4,1,9))
dat$value_new = NA
dat$value_new[!is.na(dat$id)] = dat$value[is.na(dat$id)]
dat
# id value value_new
# 1 1 NA 7
# 2 2 NA NA
# 3 3 NA 4
# 4 4 NA 1
# 5 5 NA 9
# 6 NA 7 NA
# 7 NA NA NA
# 8 NA 4 NA
# 9 NA 1 NA
# 10 NA 9 NA
In case you have more rows with a non-NA id compared to NA id you can use:
dat<-data.frame("id"=c(1,2,3,4,5,6,NA,NA,NA,NA,NA),
"value"=c(rep(NA,6),7,NA,4,1,9))
k = sum(is.na(dat$id))
dat$value_new = NA
dat$value_new[!is.na(dat$id)][1:k] = dat$value[is.na(dat$id)]
dat
# id value value_new
# 1 1 NA 7
# 2 2 NA NA
# 3 3 NA 4
# 4 4 NA 1
# 5 5 NA 9
# 6 6 NA NA
# 7 NA 7 NA
# 8 NA NA NA
# 9 NA 4 NA
# 10 NA 1 NA
# 11 NA 9 NA
where k is the number of values you'll replace in the top part of your new column.
dat<-data.frame("id"=c(1,2,3,4,5,NA,NA,NA,NA,NA),
"value"=c(rep(NA,5),7,NA,4,1,9),
"value_new"=c(7,NA,4,1,9,rep(NA,5)))
ind <- which(!is.na(dat$value))[1]
newcol <- `length<-`(dat$value[ind:nrow(dat)], nrow(dat))
dat$value_new2 <- newcol
# id value value_new value_new2
#1 1 NA 7 7
#2 2 NA NA NA
#3 3 NA 4 4
#4 4 NA 1 1
#5 5 NA 9 9
#6 NA 7 NA NA
#7 NA NA NA NA
#8 NA 4 NA NA
#9 NA 1 NA NA
#10 NA 9 NA NA
Short version:
dat$value_new2 <- `length<-`(dat$value[which(!is.na(dat$value))[1]:nrow(dat)], nrow(dat))
This drops the leading run of NAs and pads the end with NAs instead. The id column is not considered here.

Merge and replace values from overlapping matrices

I have two overlapping matrices with some shared columns and rows:
m.1 = matrix(c(NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,1,1,1,NA,NA,NA,1,NA,NA), ncol=5)
colnames(m.1) <- c("-2","-1","0","1","2")
rownames(m.1) <- c("-2","-1","0","1","2")
## -2 -1 0 1 2
## -2 NA NA 1 NA NA
## -1 NA 1 1 1 NA
## 0 1 1 1 1 1
## 1 NA 1 1 1 NA
## 2 NA NA 1 NA NA
m.2 = matrix(c(NA,2,NA,2,2,2,NA,2,NA), ncol=3)
colnames(m.2) <- c("-1","0","1")
rownames(m.2) <- c("-1","0","1")
## -1 0 1
## -1 NA 2 NA
## 0 2 2 2
## 1 NA 2 NA
Now I want to pass the element-wise maximum of m.1 and m.2 (matching cells by row and column name) to a new matrix m.max, which should look like this:
## -2 -1 0 1 2
## -2 NA NA 1 NA NA
## -1 NA 1 2 1 NA
## 0 1 2 2 2 1
## 1 NA 1 2 1 NA
## 2 NA NA 1 NA NA
Based on previous threads, I have meddled with merge(), replace() and match() but cannot get the desired result at all, e.g.
m.max<- merge(m.1,m.2, by = "row.names", all=TRUE, sort = TRUE)
## Row.names -2 -1.x 0.x 1.x 2 -1.y 0.y 1.y
## 1 -1 NA 1 1 1 NA NA 2 NA
## 2 -2 NA NA 1 NA NA NA NA NA
## 3 0 1 1 1 1 1 2 2 2
## 4 1 NA 1 1 1 NA NA 2 NA
## 5 2 NA NA 1 NA NA NA NA NA
Please help! Am I completely on the wrong track? Does this operation require a different kind of object than matrix? For example, I also tried to convert the matrices into raster objects and do cell statistics, but ran into problems because of the unequal dimensions of m.1 and m.2.
Importantly, the answer should also work for much larger objects, and regardless of whether I want to calculate the maximum, minimum or sum.
You can use pmax:
#we create a new matrix as big as m.1 with the values of m.2 in it
mres<-array(NA,dim(m.1),dimnames(m.1))
mres[rownames(m.2),colnames(m.2)]<-m.2
#Then we use pmax
pmax(m.1,mres,na.rm=TRUE)
# -2 -1 0 1 2
#-2 NA NA 1 NA NA
#-1 NA 1 2 1 NA
#0 1 2 2 2 1
#1 NA 1 2 1 NA
#2 NA NA 1 NA NA
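Since the question also asks about the minimum and the sum, the same padding step can be reused for those. This is a sketch under one assumption that the question does not settle: for the sum, an NA in only one matrix is treated as 0, and a cell stays NA only when both matrices are NA there. The names m.min and m.sum are introduced here for illustration.

```r
m.1 <- matrix(c(NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,1,1,1,NA,NA,NA,1,NA,NA),
              ncol = 5,
              dimnames = list(c("-2","-1","0","1","2"), c("-2","-1","0","1","2")))
m.2 <- matrix(c(NA,2,NA,2,2,2,NA,2,NA),
              ncol = 3,
              dimnames = list(c("-1","0","1"), c("-1","0","1")))

# Pad m.2 to the shape of m.1, matching cells by row and column names
mres <- array(NA_real_, dim(m.1), dimnames(m.1))
mres[rownames(m.2), colnames(m.2)] <- m.2

# Element-wise minimum, ignoring an NA that occurs in only one matrix
m.min <- pmin(m.1, mres, na.rm = TRUE)

# Element-wise sum (assumption: a single NA counts as 0,
# and the result is NA only where both cells are NA)
m.sum <- ifelse(is.na(m.1) & is.na(mres), NA,
                ifelse(is.na(m.1), 0, m.1) + ifelse(is.na(mres), 0, mres))
m.min
m.sum
```

pmin and pmax are vectorised, so this should carry over to much larger matrices as long as m.2's row and column names all occur in m.1.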

How to find the number of discordant and concordant pairs in R?

I am trying to find the number of discordant and concordant pairs in a clinical trial, and have come across the 'asbio' library which provides the function ConDis.matrix. (http://artax.karlin.mff.cuni.cz/r-help/library/asbio/html/ConDis.matrix.html)
The dataset they give as an example is:
crab<-data.frame(gill.wt=c(159,179,100,45,384,230,100,320,80,220,320,210),
body.wt=c(14.4,15.2,11.3,2.5,22.7,14.9,1.41,15.81,4.19,15.39,17.25,9.52))
attach(crab)
crabm<-ConDis.matrix(gill.wt,body.wt)
crabm
Which gives a result that looks like:
1 2 3 4 5 6 7 8 9 10 11 12
1 NA NA NA NA NA NA NA NA NA NA NA NA
2 1 NA NA NA NA NA NA NA NA NA NA NA
3 1 1 NA NA NA NA NA NA NA NA NA NA
4 1 1 1 NA NA NA NA NA NA NA NA NA
5 1 1 1 1 NA NA NA NA NA NA NA NA
6 1 -1 1 1 1 NA NA NA NA NA NA NA
7 1 1 0 -1 1 1 NA NA NA NA NA NA
8 1 1 1 1 1 1 1 NA NA NA NA NA
9 1 1 1 1 1 1 -1 1 NA NA NA NA
10 1 1 1 1 1 -1 1 1 1 NA NA NA
11 1 1 1 1 1 1 1 0 1 1 NA NA
12 -1 -1 -1 1 1 1 1 1 1 1 1 NA
The solution I can think of is adding up the 1s and -1s (for concordant and discordant pairs, respectively), but I don't know how to count values in a matrix. Alternatively, if someone has a better way of counting concordant/discordant pairs, I would love to know.
The solution you found was
sum(crabm == 1, na.rm = TRUE)
[1] 57
sum(crabm == -1, na.rm = TRUE)
[1] 7
You could try (C...concordant, D...discordant pairs):
library(DescTools)
tab <- table(crab$gill.wt, crab$body.wt)
ConDisPairs(tab)[c("C","D")]
$C
[1] 57
$D
[1] 7
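If neither package is available, the counts can also be sketched in base R from the signs of the pairwise differences. This assumes the usual definition (a pair is concordant when both variables differ in the same direction, discordant when they differ in opposite directions, and neither when tied in either variable); the variable names dx, dy and s are introduced here for illustration.

```r
crab <- data.frame(
  gill.wt = c(159,179,100,45,384,230,100,320,80,220,320,210),
  body.wt = c(14.4,15.2,11.3,2.5,22.7,14.9,1.41,15.81,4.19,15.39,17.25,9.52))

# Sign of every pairwise difference in each variable
dx <- sign(outer(crab$gill.wt, crab$gill.wt, "-"))
dy <- sign(outer(crab$body.wt, crab$body.wt, "-"))

# Product is 1 for a concordant pair, -1 for a discordant pair, 0 for a tie
s <- dx * dy
lower <- s[lower.tri(s)]   # count each unordered pair once

concordant <- sum(lower == 1)
discordant <- sum(lower == -1)
c(C = concordant, D = discordant)
#  C  D
# 57  7
```

The outer calls build n-by-n matrices, so this is fine for a dozen observations but memory grows quadratically with the number of rows.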

How to implement conditional search to upper direction from each row using dplyr?

This is a sample data frame as below:
df <- data.frame(
A=c(1,2,3,4,5,6,7),
B=c(1,NA,3,2,NA,4,3),
C=c(NA,1,NA,NA,1,NA,NA),
D=c(NA,2,NA,NA,4,NA,NA))
> df
A B C D
1 1 1 NA NA
2 2 NA 1 2
3 3 3 NA NA
4 4 2 NA NA
5 5 NA 1 4
6 6 4 NA NA
7 7 3 NA NA
I want to implement the following manipulation using dplyr pipes in R.
Add a new column E that contains values from D under the following conditions:
From each row, search upward for the previous row where !is.na(C).
If the current row also has !is.na(C), fill column E with the value of D from that previous row.
This is the desired output.
> df2
A B C D E
1 1 1 NA NA NA
2 2 NA 1 2 NA
3 3 3 NA NA NA
4 4 2 NA NA NA
5 5 NA 1 4 2
6 6 4 NA NA NA
7 7 3 NA NA NA
I would prefer to implement the upward search with dplyr pipes.
I know a lag function in base, but it does not work for this problem. I also tried slice in dplyr, but it does not search upward from each row, and I could not get appropriate filtering per row.
I hope you can suggest other solutions.
We can copy the contents of D into E, use tidyr::fill to replace NAs with the most recent non-NA value, and then use lag to get the previous value in E.
library(dplyr)
df %>%
mutate(E = D) %>%
tidyr::fill(E) %>%
mutate(E = replace(lag(E), is.na(D), NA))
# A B C D E
#1 1 1 NA NA NA
#2 2 NA 1 2 NA
#3 3 3 NA NA NA
#4 4 2 NA NA NA
#5 5 NA 1 4 2
#6 6 4 NA NA NA
#7 7 3 NA NA NA
This uses bind_rows to combine the rows where C is NA with the rows where C is not NA, applying lag to D only within the latter:
bind_rows(df%>%
filter(is.na(C))%>%
mutate(E = NA)
,
df%>%
filter(!is.na(C))%>%
mutate(E = lag(D))
)%>%
arrange(A)
A B C D E
1 1 1 NA NA NA
2 2 NA 1 2 NA
3 3 3 NA NA NA
4 4 2 NA NA NA
5 5 NA 1 4 2
6 6 4 NA NA NA
7 7 3 NA NA NA
In data.table this is very simple:
library(data.table)
dt <- as.data.table(df)
dt[!is.na(C), E:=shift(D)][]
A B C D E
1: 1 1 NA NA NA
2: 2 NA 1 2 NA
3: 3 3 NA NA NA
4: 4 2 NA NA NA
5: 5 NA 1 4 2
6: 6 4 NA NA NA
7: 7 3 NA NA NA
Base isn't too bad either:
# base
df2 <- df
df2$E <- NA
ind <- !is.na(df2$C)
df2[ind, 'E'] <- df2[ind, 'D'][c(NA,seq_len(sum(ind)-1))]
df2
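The base version prints no result above, so here is a slightly more explicit sketch of the same idea, using head() to shift D by one position within the rows where C is present. It is an illustration under the same reading of the question as the other answers, not a different method.

```r
df <- data.frame(
  A = c(1, 2, 3, 4, 5, 6, 7),
  B = c(1, NA, 3, 2, NA, 4, 3),
  C = c(NA, 1, NA, NA, 1, NA, NA),
  D = c(NA, 2, NA, NA, 4, NA, NA))

ind <- which(!is.na(df$C))   # rows where C is present
df$E <- NA_real_
# Each such row receives D from the previous such row; the first one has none
df$E[ind] <- c(NA, head(df$D[ind], -1))
df
```

Only row 5 ends up with a value in E (2, taken from D in row 2); all other rows stay NA.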

Resources