R - If statement function to set value of new column

I have 2 columns pos and neg, each with an integer value.
I would like to create a new column score, with each element of this column given a value of:
1 if pos > neg
0 if pos = neg
-1 if pos < neg
What would be the best way to do this? I am new to creating functions in R, so any help or direction is appreciated.

We can use ifelse instead of if/else, because ifelse is vectorized:
df1$score <- with(df1, ifelse(pos > neg, 1, ifelse(pos < neg, -1, 0)))
Or take the difference of 'pos' and 'neg' and apply sign(), which returns -1, 0 or 1 when the difference is negative, zero or positive:
df1$score <- with(df1, sign(pos - neg ))
Data:
df1 <- data.frame(pos = c(5, 4, 3, 1, 2), neg = c(5, 3, 4, 1, 3))
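Since the question asks about writing a function, here is a minimal sketch that wraps the sign() approach in a small helper and checks it against the nested ifelse() version on the sample data. The name compute_score is hypothetical and not part of the answer above.

```r
# Hypothetical helper (illustrative only): vectorized score column
compute_score <- function(pos, neg) {
  # sign() works element-wise: negative -> -1, zero -> 0, positive -> 1
  sign(pos - neg)
}

df1 <- data.frame(pos = c(5, 4, 3, 1, 2), neg = c(5, 3, 4, 1, 3))
df1$score <- compute_score(df1$pos, df1$neg)

# The nested ifelse() version should agree element by element
alt <- with(df1, ifelse(pos > neg, 1, ifelse(pos < neg, -1, 0)))
stopifnot(all(df1$score == alt))

print(df1$score)
# [1]  0  1 -1  0 -1
```

No explicit loop or if/else is needed, because both sign() and ifelse() operate on whole columns at once.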

Related

Calculate cumulative prevalence of carriage of resistant bugs

I have recently started using R. When working on carriage of problematic bacteria, I encountered one problem that I hope somebody could help solve. Apologies if the question is on the easy side.
I want to calculate the cumulative proportion of people who get colonized by the problem bug at various time points (a, b, c), as shown in the dataset "df" below. "0" means a negative test, "1" means a positive test for the resistant bug, and "NA" means no test was done at that time point. The result should be as described in "x": if the person ever tests positive at any time point (a, b, c), x should be "1"; if all of their tests were negative, x should be "0"; and if they never had a test done, x should be "NA". Is there a good way to calculate this "x" automatically?
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
df <- cbind(a, b, c)
df
x <- c(0, 1, 1, 0, 1, 1, 1, 0,NA,0)
df <- cbind(df, x)
df
I tried to create the x variable using ifelse, but ran into problems with missing values. For instance, using the following expression:
y <- ifelse(a==1 | b==1 | c==1, 1, ifelse(a==0 | b==0 | c==0, 0, NA))
df <- cbind(df, y)
df
... the resulting column erroneously gets "NA" in rows 1 and 10: when a row combines 0 and NA, the result should be 0, not NA.
You can use rowSums():
cols <- c('a', 'b', 'c')
+(rowSums(df[, cols], na.rm = TRUE) > 0) * NA^+(rowSums(!is.na(df[, cols])) == 0)
#[1] 0 1 1 0 1 1 1 0 NA 0
This gives the same result as x, but it may be hard to follow: +(...) turns TRUE/FALSE into 1/0, and NA^0 is 1 while NA^1 is NA, so only rows with no tests at all become NA.
Here is a simpler alternative using apply():
apply(df[, cols], 1, function(x) if(all(is.na(x))) NA else +(any(x == 1, na.rm = TRUE)))
#[1] 0 1 1 0 1 1 1 0 NA 0
This returns NA if all the values in the row are NA; otherwise it checks whether any value is 1.
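A middle ground between the two answers is sketched below: two readable rowSums() calls, one asking "was any test positive?" and one asking "was any test done?", combined with ifelse(). This is an illustrative alternative, not taken from the answers above, and it builds df as a data frame rather than the cbind() matrix used in the question.

```r
df <- data.frame(
  a = c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA),
  b = c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0),
  c = c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
)
cols <- c("a", "b", "c")
m <- df[, cols]

# TRUE if at least one test in the row is positive (missing tests are ignored)
any_pos <- rowSums(m == 1, na.rm = TRUE) > 0
# TRUE if at least one test in the row was actually done
any_test <- rowSums(!is.na(m)) > 0

# 1/0 from any_pos, but NA for people who were never tested;
# unname() drops the row names that rowSums() carries over
x2 <- unname(ifelse(any_test, as.numeric(any_pos), NA))
print(x2)
# [1]  0  1  1  0  1  1  1  0 NA  0
```

Unlike the question's ifelse(a==1 | b==1 | c==1, ...), na.rm = TRUE keeps a single NA from turning the whole row into NA.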

R function to find count of elements before sum is above a threshold

I'm trying to recreate a function from Sum of first n elements of a vector, but whereas that solution took an argument n to sum the first n elements of a vector, I'd like an argument for the threshold (with a default) that the elements sum up to (or over).
After trying different for and/or while approaches and searching Stack Overflow, I've ended up with the code below, but it is unclear to me how to implement the threshold and set n_elements.
I have this logic which returns 0 for the given vector. It doesn't seem the n_elements = x[i] + 1 part is correct.
theFunc <- function(x, threshold = 5){
  n_elements = 0
  while (sum(head(x)) < threshold){
    n_elements = x[i] + 1
  }
  return(n_elements)
}
Call:
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 0
If the input is as above and the threshold is 5, the function should return 6 (the number of elements), because 0+0+1+1+2+3 = 7 is the first running total above the threshold.
A simple function without a loop is as follows:
theFunc <- function(x, threshold = 5){
  sum(cumsum(x) < threshold) + 1
}
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 6
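The cumsum() answer works because, for non-negative elements, the running total never decreases, so counting the totals below the threshold and adding 1 gives the first position where the threshold is reached. Below is a sketch of two alternatives (the names theFuncLoop and theFuncWhich are hypothetical): a loop closer to the asker's original attempt, and a which()-based version. Unlike sum(cumsum(x) < threshold) + 1, which returns length(x) + 1 when the threshold is never reached, both return NA in that case.

```r
# Loop version: stop at the first element where the running total reaches (>=) the threshold
theFuncLoop <- function(x, threshold = 5) {
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    if (running >= threshold) return(i)
  }
  NA_integer_  # threshold never reached
}

# Vectorized version: first index whose cumulative sum reaches the threshold
theFuncWhich <- function(x, threshold = 5) {
  which(cumsum(x) >= threshold)[1]
}

x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFuncLoop(x)          # 6
theFuncWhich(x)         # 6
theFuncWhich(c(1, 1))   # NA: the total only reaches 2
```

The which() version also stays correct if x contains negative values, since it looks for the first index that reaches the threshold rather than counting totals below it.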

How to define a vector based on a condition on its own cumulative sum?

I have a matrix and I would like to define a vector based on a condition on its own cumulative sum.
For example :
data$m <- c(1, 0, 2, 1, 2)
data$n <- c(2, 1, 1, 2, 2)
I would like to calculate data$x as :
data$x <- data$m * data$n
based on the condition that data$cumsum_x <- cumsum(data$x) is lower than a certain value, for example 5. If data$cumsum_x > 5, I should get data$x = 0
So I should get the following results for data$x :
2 0 2 2 0
Do you know how to do that?
I guess I need a loop, because data$x depends on the cumulative sum up to element n-1?
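No answer is included here, so the following is only a sketch under an assumed reading of the rule: x[i] keeps m[i] * n[i] as long as the running total of the previous x values is below 5, otherwise it is 0. That reading reproduces the expected 2 0 2 2 0. The block shows a general loop and a vectorized shortcut that is only valid when every m * n is non-negative; limit and prod_mn are illustrative names.

```r
data <- data.frame(m = c(1, 0, 2, 1, 2), n = c(2, 1, 1, 2, 2))
limit <- 5  # threshold from the question

# Loop version: x[i] depends on the running total of x[1..i-1]
prod_mn <- data$m * data$n
x <- numeric(length(prod_mn))
running <- 0
for (i in seq_along(prod_mn)) {
  x[i] <- if (running < limit) prod_mn[i] else 0
  running <- running + x[i]
}
data$x <- x

# Vectorized shortcut (assumes all products are non-negative):
# total of the earlier products, shifted by one position
prev_total <- c(0, head(cumsum(prod_mn), -1))
x_vec <- ifelse(prev_total < limit, prod_mn, 0)
stopifnot(identical(x, x_vec))

print(data$x)
# [1] 2 0 2 2 0
```

With non-negative products, once the earlier total reaches the limit every later x is 0, so the x totals and the raw product totals agree up to that point; with negative values only the loop is safe.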

Vectorize loop with repeating indices

I have a vector of indices that contains repeating values:
IN <- c(1, 1, 2, 2, 3, 4, 5)
I would like to use these indices to subtract two vectors:
ST <- c(0, 0, 0, 0, 0, 0, 0)
SB <- c(1, 1, 1, 1, 1, 1, 1)
However, I would like to do the subtractions in order, so that after subtracting the values at the first index (0, 1), the second subtraction "builds off" the result of the first. I would like to end up with a vector FN that looks like this:
c(-2, -2, -1, -1, -1, 0, 0)
This is easy enough to do in a for loop:
for(i in seq_along(IN)){
  ST[IN[i]] <- ST[IN[i]] - SB[IN[i]]
}
But I need to run this loop many times on long vectors and this can take many hours. Is there any way to vectorize this task and avoid a for loop? Maybe using a data.table technique?
Sure. With data.table, it's:
library(data.table)
DT = data.table(ST)
mDT = data.table(IN, SB)[, .(sub = sum(SB)), by=.(w = IN)]
DT[mDT$w, ST := ST - mDT$sub ]
DT
1: -2
2: -2
3: -1
4: -1
5: -1
6: 0
7: 0
Or with base R:
w = sort(unique(IN))
ST[w] <- ST[w] - tapply(SB, IN, FUN = sum)
# [1] -2 -2 -1 -1 -1 0 0
Here is an option using aggregate in base R:
ag <- aggregate(.~IN, data.frame(IN, ST[IN]-SB[IN]), sum)
replace(ST, ag[,1], ag[,2])
#[1] -2 -2 -1 -1 -1 0 0
Or using xtabs:
d <- as.data.frame(xtabs(B~A, data.frame(A=IN, B=ST[IN]-SB[IN])))
replace(ST, d[,1], d[,2])
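For comparison, here is a sketch that reproduces the original loop exactly, using base R's tabulate(). It is an alternative not taken from the answers above. The loop subtracts SB[IN[i]], so position j is reduced by SB[j] once for every occurrence of j in IN, and tabulate() counts those occurrences. The answers above instead sum SB by position (or sum ST[IN] - SB[IN] per group), which matches the loop here because ST starts at zero and SB is all ones. The sketch assumes IN holds positive whole numbers no larger than length(ST); the function names are hypothetical.

```r
IN <- c(1, 1, 2, 2, 3, 4, 5)
ST <- c(0, 0, 0, 0, 0, 0, 0)
SB <- c(1, 1, 1, 1, 1, 1, 1)

# Vectorized: count how often each position appears in IN,
# then subtract SB[j] that many times in a single step
subtract_by_index <- function(ST, SB, IN) {
  hits <- tabulate(IN, nbins = length(ST))
  ST - hits * SB
}

# Reference: the original for loop, wrapped in a function
subtract_loop <- function(ST, SB, IN) {
  for (i in seq_along(IN)) {
    ST[IN[i]] <- ST[IN[i]] - SB[IN[i]]
  }
  ST
}

FN <- subtract_by_index(ST, SB, IN)
print(FN)
# [1] -2 -2 -1 -1 -1  0  0

# Non-trivial inputs to confirm both versions agree
ST2 <- c(10, 20, 30, 40, 50, 60, 70)
SB2 <- 1:7
stopifnot(isTRUE(all.equal(subtract_by_index(ST2, SB2, IN), subtract_loop(ST2, SB2, IN))))
```

tabulate() does a single pass over IN, so this should scale to long vectors without a grouping step.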

R: selecting items matching criteria from a vector

I have a numeric vector in R, which consists of both negative and positive numbers. I want to separate the numbers based on sign (ignoring zero for now) into two separate vectors:
a new vector containing only the negative numbers
another vector containing only the positive numbers
The documentation shows how to do this for selecting rows/columns/cells in a data frame, but this doesn't work with vectors AFAICT.
How can it be done (without a for loop)?
This is easy with logical indexing (including a check for NaN):
d <- c(1, -1, 3, -2, 0, NaN)
positives <- d[d>0 & !is.nan(d)]
negatives <- d[d<0 & !is.nan(d)]
If you want to exclude both NA and NaN, use is.na(), which returns TRUE for both:
d <- c(1, -1, 3, -2, 0, NaN, NA)
positives <- d[d>0 & !is.na(d)]
negatives <- d[d<0 & !is.na(d)]
It can also be done with square brackets alone.
The comparison d_vector > 0 returns a logical (TRUE/FALSE) vector; indexing d_vector with that logical vector in square brackets returns the actual numeric values that satisfy the condition.
d_vector <- c(1, 2, 3, -1, -2, -3)
new_vector <- d_vector > 0             # logical: TRUE where the value is positive
pos_vector <- d_vector[new_vector]     # keep only the positive values
new1_vector <- d_vector < 0            # logical: TRUE where the value is negative
neg_vector <- d_vector[new1_vector]    # keep only the negative values
The purrr package includes some useful functions for filtering vectors:
library(purrr)
test_vector <- c(-5, 7, 0, 5, -8, 12, 1, 2, 3, -1, -2, -3, NA, Inf, -Inf, NaN)
positive_vector <- keep(test_vector, function(x) x > 0)
positive_vector
# [1] 7 5 12 1 2 3 Inf
negative_vector <- keep(test_vector, function(x) x < 0)
negative_vector
# [1] -5 -8 -1 -2 -3 -Inf
You can also use the discard() function, which removes the elements that satisfy the predicate instead of keeping them.
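A base R alternative that needs no explicit is.na() check is to index with which(), which returns only the positions where the condition is TRUE and skips NA results (including those from NaN). The helper name split_by_sign below is hypothetical; it is a sketch that returns both vectors at once.

```r
# Hypothetical helper: split a numeric vector by sign, dropping zero, NA and NaN
split_by_sign <- function(d) {
  list(
    negative = d[which(d < 0)],  # which() ignores NA comparisons
    positive = d[which(d > 0)]
  )
}

d <- c(1, -1, 3, -2, 0, NaN, NA, Inf, -Inf)
parts <- split_by_sign(d)
parts$positive  # 1 3 Inf
parts$negative  # -1 -2 -Inf
```

Plain logical indexing such as d[d > 0] would instead keep an NA for every NA or NaN element, which is why the earlier answer adds !is.na(d).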
