Calculate cumulative prevalence of carriage of resistant bugs in R

I have recently started using R. When working on carriage of problematic bacteria, I encountered one problem that I hope somebody could help solve. Apologies if the question is on the easy side.
I want to calculate the cumulative proportion of people who get colonized by the problem bug at various time points (a, b, c), as shown in the dataset "df" below. "0" means a negative test, "1" means a positive test for the resistant bug, and "NA" means the test was not done at that time point. The result should be as described in "x": if the person ever tests positive at any time point (a, b, c), he should have the value "1" in x; if all his tests were negative, he should have the value "0"; and if he never had a test done, the value should be "NA". Is there a good way to calculate this "x" automatically?
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
df <- cbind(a, b, c)
df
x <- c(0, 1, 1, 0, 1, 1, 1, 0, NA, 0)
df <- cbind(df, x)
df
I tried to create the x variable using ifelse, but I run into problems with missing values. For instance, using the following expression:
y <- ifelse(a==1 | b==1 | c==1, 1, ifelse(a==0 | b==0 | c==0, 0, NA))
df <- cbind(df, y)
df
... the resulting column erroneously gets "NA" in rows 1 and 10; when there is a combination of 0 and NA, the result should be 0, not NA.

You can use rowSums:
cols <- c('a', 'b', 'c')
+(rowSums(df[, cols], na.rm = TRUE) > 0) * NA^+(rowSums(!is.na(df[, cols])) == 0)
#[1] 0 1 1 0 1 1 1 0 NA 0
This gives the same result as x, but it might be difficult to understand.
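The one-liner relies on a small trick: in R, NA^0 is 1 while NA^1 stays NA. The first factor is the "ever positive" flag (0 when all tests are missing, because rowSums with na.rm = TRUE returns 0 for an all-NA row), and multiplying it by NA^(all tests missing) leaves it unchanged unless every test is missing. A minimal illustration; the two-row vectors and the names ever_pos and all_missing are hypothetical:

```r
# In R, NA raised to the power 0 is 1, but to the power 1 it stays NA
NA^0
NA^1

# Hypothetical two-row example: row 1 has at least one test, row 2 has none
ever_pos    <- c(1, 0)   # plays the role of +(rowSums(..., na.rm = TRUE) > 0)
all_missing <- c(0, 1)   # plays the role of +(rowSums(!is.na(...)) == 0)

# Row 1 is multiplied by 1 (unchanged); row 2 is multiplied by NA
ever_pos * NA^all_missing
```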
Here is a simpler alternative using apply:
apply(df[, cols], 1, function(x) if(all(is.na(x))) NA else +(any(x == 1, na.rm = TRUE)))
#[1] 0 1 1 0 1 1 1 0 NA 0
This returns NA if all the values in the row are NA; otherwise it checks whether any value is 1.
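The NA rows in the ifelse attempt come from R's three-valued logic: FALSE | NA is NA (the missing test could have been positive), while TRUE | NA is TRUE, so rows that mix 0 and NA never reach the inner ifelse. Because the tests are coded 0/1, "ever positive" is also just the row-wise maximum, so pmax with na.rm = TRUE is another vectorized option. This is a sketch that is not from the answers above, using the vectors a, b, c from the question:

```r
# Data from the question: 0 = negative, 1 = positive, NA = not tested
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)

# Why the ifelse() attempt fails on rows 1 and 10
FALSE | NA  # NA
TRUE | NA   # TRUE

# Row-wise maximum of the 0/1 tests; na.rm = TRUE ignores missing tests,
# and the result is NA only when every test in a row is missing
x <- pmax(a, b, c, na.rm = TRUE)
x
```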

Related

Count the max number of ones in a vector

I am working on the following task.
Suppose that I have the following vector:
(1,1,0,0,0,1,1,1,1,0,0,1,1,1,0)
I need to extract the following information:
the number of sets (runs) of consecutive zeros
the mean length of those runs of consecutive zeros.
For instance, in the vector above
the number of runs is 3, because I have 000, 00 and 0.
The mean number of consecutive zeros is then (3 + 2 + 1) / 3 = 2.
I need this because I have to do the same for several observations, and I am thinking of implementing it inside an apply function.
We could use rle for this. Since there are only binary values, we can apply rle to the entire vector and then extract the lengths that correspond to 0 (!values returns TRUE for 0 and FALSE otherwise):
out <- with(rle(v1), lengths[!values])
Then get the number of runs (the length) and their mean length from the output:
> length(out)
[1] 3
> mean(out)
[1] 2
data
v1 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
Another option is to use regmatches:
> v <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
> s <- paste0(v, collapse = "")
> zeros <- unlist(regmatches(s, gregexpr("0+", s)))
> length(zeros)
[1] 3
> mean(nchar(zeros))
[1] 2
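Since the asker wants to repeat this for several observations, here is a minimal sketch that wraps the rle approach in a function and applies it over the rows of a matrix. The matrix m, its second row, and the function name zero_run_stats are hypothetical; the function also reports the longest run in case "maximum" was meant literally:

```r
# Hypothetical data: each row is one observation (a binary sequence)
m <- rbind(
  c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0),
  c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Summarise the runs of zeros in one binary vector using rle()
zero_run_stats <- function(v) {
  r <- rle(v)
  zero_runs <- r$lengths[r$values == 0]   # lengths of the runs of zeros only
  if (length(zero_runs) == 0) {
    return(c(n_runs = 0, mean_len = NA, max_len = 0))
  }
  c(n_runs = length(zero_runs),
    mean_len = mean(zero_runs),
    max_len = max(zero_runs))
}

# apply() over rows returns one column per observation; t() puts them back in rows
stats <- t(apply(m, 1, zero_run_stats))
stats
```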

R: randomly sample a nonzero element in a vector and replace other elements with 0

Suppose I have a vector
vec <- c(0, 1, 0, 0, 0, 1, 1, 1, 1, 2)
How do I randomly sample a nonzero element and turn the other elements into 0?
Suppose the element sampled was vec[2], then the resulting vector would be
vec <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
I know that I can sample the index of one nonzero element with sample(which(vec != 0), 1), but I am not sure how to proceed from there. Thanks!
You can try the code below
> replace(0 * vec, sample(which(vec != 0), 1), 1)
[1] 0 0 0 0 0 0 0 1 0 0
where
which returns the indices of the nonzero values,
sample picks one of those indices at random, and
replace sets the value at that index to 1 in 0 * vec, so every other element is 0.
Watch out for sample's behavior if which returns only one value: when sample is given a single positive number n, it samples from 1:n instead of returning n:
> vec <- c(rep(0, 9), 1)
> sample(which(vec != 0), 1)
[1] 4
The following preserves the sampled value (instead of turning it into 1) and guards against vectors with only one nonzero value by using rep, which guarantees that sample receives a vector with more than one element:
vec[-sample(rep(which(vec != 0), 2), 1)] <- 0
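A sketch that combines both points: it keeps the sampled element's original value and avoids the single-index pitfall by sampling a position with sample.int(length(idx), 1) rather than passing the indices to sample directly (the R documentation for sample suggests this idiom). The function name keep_one_nonzero is hypothetical:

```r
# Hypothetical helper: keep one randomly chosen nonzero element of vec, zero the rest
keep_one_nonzero <- function(vec) {
  idx <- which(vec != 0)                   # positions of nonzero values
  if (length(idx) == 0) return(vec)        # nothing to sample: return vec unchanged
  keep <- idx[sample.int(length(idx), 1)]  # safe even when length(idx) == 1
  vec[-keep] <- 0                          # zero everything except the kept position
  vec                                      # kept element retains its original value
}

set.seed(42)
keep_one_nonzero(c(0, 1, 0, 0, 0, 1, 1, 1, 1, 2))
keep_one_nonzero(c(rep(0, 9), 1))          # always keeps position 10
```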

Error in converting categorical variables to factor in R

In this tutorial, I tried to use another method for converting categorical variables to factor.
In the article, the following method is used.
library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)
and I replaced the last line with
birthwt[cols] <- as.factor((birthwt[cols]))
but the result is all NA.
What is wrong with that?
as.factor((birthwt[cols])) calls as.factor on a list of 5 vectors. If you do that, R will interpret each of those 5 vectors as a level, and the column headers as the labels, of a single factor variable, which is clearly not what you want:
> as.factor(birthwt[cols])
low race smoke ht ui
<NA> <NA> <NA> <NA> <NA>
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low" "race" "smoke" "ht" "ui"
lapply iterates over a list, calling as.factor on each vector in that list separately. That is what you need here: each variable has to be converted into its own factor, rather than the entire list being converted into a single factor, which is what as.factor(birthwt[cols]) does.
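A small sketch of the lapply approach with a check afterwards. To keep it self-contained it uses a hypothetical toy data frame, dat, standing in for MASS::birthwt rather than the real dataset:

```r
# Hypothetical toy data standing in for MASS::birthwt
dat <- data.frame(low  = c(0, 1, 0, 1),
                  race = c(1, 3, 2, 1),
                  bwt  = c(2523, 2551, 2557, 2594))
cols <- c("low", "race")

# lapply() returns a list with one converted column per element,
# and [<- writes those back into dat column by column
dat[cols] <- lapply(dat[cols], as.factor)

sapply(dat, class)   # low and race are now factors; bwt is untouched
levels(dat$race)
```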

Setting NAs to zeros in matrices with lapply does not seem to work?

I have these matrices.
matr <- list()
for (i in 1:3) {  # assumed loop: i is undefined in the snippet; 3 copies match the output below
matr[[i]] <- c(0, NA, 3, 4, 4,
0, 0, 3, 4, 1,
0, 0, 0, NA, 1,
0, 0, NA, 0, 3,
0, 0, 0, 0, 0)
matr[[i]] <- matrix(matr[[i]], 5, 5)
}
I want to set NA to zero using the following code:
x <- lapply(matr,function(x) x[is.na(x) <- 0])
Then I got this result:
> x
[[1]]
numeric(0)
[[2]]
numeric(0)
[[3]]
numeric(0)
Why does it not return the matrices? Is my code correct? Any help is appreciated.
Since lapply works on lists and returns lists, I think that isn't what you want.
I think using apply here fits better.
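Try:
x <- apply(matr[[1]], 2, function(x){
x[is.na(x)] <- 0
x
})
The number 2 here indicates that you want to operate column-wise instead of row-wise (margin 1 is rows, margin 2 is columns).
Also notice that you put the <- operator inside the brackets. That is valid R but not what you meant: is.na(x) <- 0 is a replacement call whose value is 0, so your function returns x[0], which is numeric(0). The assignment belongs outside the brackets, x[is.na(x)] <- 0, and the function then has to return x.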
EDIT:
It seems that I misunderstood your question.
Here is code that works for an entire list:
lapply(matr, function(x){
apply(x, 2, function(y){
y[is.na(y)] <- 0
y
})
})
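A simpler sketch: logical indexing with is.na() already works on a whole matrix, so the inner apply() is not strictly needed; the anonymous function only has to return x after the assignment. replace() does the same in one line. The three-copy list below is a hypothetical stand-in for the question's matr:

```r
# Rebuild one matrix like the question's (5 x 5, filled column-wise)
m <- matrix(c(0, NA, 3, 4, 4,
              0, 0, 3, 4, 1,
              0, 0, 0, NA, 1,
              0, 0, NA, 0, 3,
              0, 0, 0, 0, 0), 5, 5)
matr <- list(m, m, m)

# Assign into the NA positions of the whole matrix, then return x
# (without the final x, the function would return the assigned value 0)
fixed <- lapply(matr, function(x) {
  x[is.na(x)] <- 0
  x
})

# Equivalent one-liner: replace() returns a modified copy
fixed2 <- lapply(matr, function(x) replace(x, is.na(x), 0))

fixed[[1]]
```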

Vectorized replacement of a subset of a vector

Simple question: I've got two vectors of 0's and 1's, a and b. The b vector has as many entries as there are 1's in a. I would like to replace the 1's in a with the entries from b. Of course I can do this in a for loop, but is there a nice vectorized way to do this?
From
a <- c(0, 1, 1, 0, 1)
b <- c(1, 0, 1)
create
c <- c(0, 1, 0, 0, 1)
This is pretty simple: a[a == 1] <- b
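If you want to keep a unchanged, replace() produces the same result as a copy. This sketch also adds a guard (not part of the answer) checking that b has exactly one entry per 1 in a, since otherwise R silently recycles b or only warns:

```r
a <- c(0, 1, 1, 0, 1)
b <- c(1, 0, 1)

# Guard: b must supply exactly one value per 1 in a
stopifnot(sum(a == 1) == length(b))

# replace() returns a modified copy; a itself is left unchanged
res <- replace(a, a == 1, b)
res
```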
