Ifelse() with three conditions - r

I have two vectors:
a<-rep(1:2,100)
b<-sample(a)
I would like to have an ifelse condition that compares each value of a with the corresponding value of b, and does the following:
if a>b 1
if a<b 0
if a=b sample(1:2,length(a),replace=T)
The first two can be done with:
ifelse(a>b,1,0)
but I'm not sure how to incorporate the case where a and b are equal.

How about adding another ifelse:
ifelse(a>b, 1, ifelse(a==b, sample(1:2, length(a), replace = TRUE), 0))
In this case you get 1 if a > b; if a equals b, you get either 1 or 2 (drawn by sample(1:2, length(a), replace = TRUE)); otherwise (so a must be smaller than b) you get 0.

This is an easy way:
(a > b) + (a == b) * sample(2, length(a), replace = TRUE)
This relies on arithmetic with logical values, which R coerces to numbers (TRUE to 1, FALSE to 0).
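A quick sketch that checks the arithmetic trick against the three rules. The seed and the variable `s` are only for illustration: `s` stores the random draws so the a == b case can be compared afterwards.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
a <- rep(1:2, 100)
b <- sample(a)
s <- sample(2, length(a), replace = TRUE)  # one random 1 or 2 per position

# TRUE/FALSE become 1/0 in arithmetic, so exactly one term survives:
# a > b  -> 1 + 0 * s = 1
# a < b  -> 0 + 0 * s = 0
# a == b -> 0 + 1 * s = s
res <- (a > b) + (a == b) * s
table(res)
```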

There is ambiguity in your question. Do you want different random values for all indexes where a==b or one random value for all indexes?
The answer by @Rob will work in the second scenario. For the first scenario I suggest avoiding ifelse:
u<-rep(NA,length(a))
u[a>b] <- 1
u[a<b] <- 0
u[a==b] <- sample(1:2,sum(a==b),replace=TRUE)
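For the other reading of the question (a single random value shared by every position where a == b), one draw is enough, because ifelse() recycles it. A minimal sketch contrasting the two readings; the names `shared`, `u_shared` and `u_each` are illustrative:

```r
a <- rep(1:2, 100)
b <- sample(a)

# One draw, reused for every tie
shared <- sample(1:2, 1)
u_shared <- ifelse(a > b, 1, ifelse(a == b, shared, 0))

# A full-length draw gives each tie its own value
u_each <- ifelse(a > b, 1, ifelse(a == b, sample(1:2, length(a), replace = TRUE), 0))
```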

Related

Randomly select values from a given number list to add to a certain value in r

If I have a set of values such as
c(1,2,5,6,7,15,19,20)
and I want to randomly select 2 values whose sum equals 20. From the list above, possible samples that I would like to see would be
[19,1], [15,5]
How do I do this in R? Any help would be greatly appreciated.
This computes all possible pairs from your input vector, so it might be a problem if the vector is very long.
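getVal <- function(vec, val) {
comb <- combn(vec, 2)          # all pairs, one per column
idx <- colSums(comb) == val    # columns whose pair sums to val
if (any(idx)) {
# drop = FALSE keeps a matrix even if only one column matches
return(comb[, idx, drop = FALSE][, sample(sum(idx), 1)])
}
return(FALSE)
}
vec <- c(1, 4, 6, 9)
val <- 10
getVal(vec, val)
>>[1] 1 9   (or 4 6: the matching pair is picked at random)
val <- 11
getVal(vec, val)
>>[1] FALSE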
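One indexing detail is worth knowing here: when exactly one column of a combn() matrix matches, `comb[, idx]` drops to a plain vector, and a second `[, ...]` on it then fails. Keeping `drop = FALSE` preserves the matrix shape. A small demonstration with an illustrative vector:

```r
comb <- combn(c(1, 4, 9), 2)   # columns: (1,4), (1,9), (4,9)
idx <- colSums(comb) == 10     # only (1,9) matches

dim(comb[, idx])               # NULL: the single match became a vector
m <- comb[, idx, drop = FALSE] # stays a 2 x 1 matrix
m[, sample(ncol(m), 1)]        # safe to pick a random column: 1 9
```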
For a small vector of values you can do an exhaustive search by working out all the combinations of pairs in the values. Example:
> values = c(1,2,5,6,7,15,19,20)
> pairs = matrix(values[t(combn(length(values),2))],ncol=2)
That is a 2-column matrix of all pairs from values. Now sum the rows and look for the target value of 20:
> targets = apply(pairs,1,sum)==20
> pairs[targets,]
[,1] [,2]
[1,] 1 19
[2,] 5 15
The size of pairs grows quadratically: with 100 values, pairs has choose(100, 2) = 4950 rows.
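The same exhaustive search can be written a little more compactly by passing the values themselves (rather than their indices) to combn() and using rowSums() instead of apply(). A sketch:

```r
values <- c(1, 2, 5, 6, 7, 15, 19, 20)
pairs <- t(combn(values, 2))                         # one row per pair, 28 rows here
hits <- pairs[rowSums(pairs) == 20, , drop = FALSE]  # rows that sum to the target
hits                                                 # rows: 1 19 and 5 15
```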
You can do this with the sample() function and a while loop. It isn't the prettiest solution, but it is simple to implement.
First you sample two values from the vector and store them in an object, like:
values <- c(1, 2, 5, 6, 7, 15, 19, 20)
randomTwo <- sample(values, 2)
Then you start your while loop. The loop checks whether the sum of the two sampled values modulo 10 equals 0 (I assumed you meant modulo from the examples in your question; see https://en.wikipedia.org/wiki/Modulo_operation for what it does). If it does not, the loop samples two new values until it does, and you get your two values.
Here's what it looks like:
while (sum(randomTwo) %% 10 != 0) {
randomTwo <- sample(values, 2)
}
This might take more iterations than checking all combos, or fewer, depending on chance. If you only have this small vector, then it's a nice solution. Good luck!
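A variant of the same sampling idea that tests for the target sum directly and gives up after a fixed number of tries, so it cannot loop forever when no qualifying pair exists. The function name `sample_pair` and the `max_iter` cap are illustrative choices, not part of the original answer:

```r
# Hypothetical helper: rejection sampling with an iteration cap
sample_pair <- function(values, target, max_iter = 10000) {
  for (iter in seq_len(max_iter)) {
    pick <- sample(values, 2)              # two distinct positions
    if (sum(pick) == target) return(pick)
  }
  NULL                                     # nothing found within max_iter tries
}

set.seed(42)  # arbitrary, for reproducibility
sample_pair(c(1, 2, 5, 6, 7, 15, 19, 20), 20)
```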
A way that avoids computing an immense matrix (and should be faster for long vectors):
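findpairs <- function(a, sum, num) {
res <- list()
aux <- 1
for (i in 1:length(a)) {
n <- which((a + a[i]) == sum)   # positions j with a[i] + a[j] == sum
for (j in n) {
if (j != i) {                  # skip pairing an element with itself
res[[aux]] <- c(a[i], a[j])
aux <- aux + 1
}
}
}
# each pair appears twice (once per order); keep the first half
return(sample(res[seq_len(length(res) / 2)], num))
}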
a=c(1,2,5,6,19,7,15,20)
a=a[order(a)]
sum=20
findpairs(a,sum,2)
[[1]]
[1] 5 15
[[2]]
[1] 1 19
The issue was that it gave each pair twice.
Edit:
Solved. Just take half of the list, as the other half contains the same pairs the other way around (this relies on a being sorted first, as above).
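Instead of halving the list afterwards, you can keep each unordered pair once by accepting only partners that come later in the vector (j > i). This is a sketch of that alternative with a hypothetical function name; it does not depend on the vector being sorted:

```r
# Hypothetical helper: all pairs a[i], a[j] with i < j and a[i] + a[j] == target
find_pairs <- function(a, target) {
  out <- list()
  for (i in seq_along(a)) {
    j <- which(a == target - a[i])   # candidate partners for a[i]
    j <- j[j > i]                    # later positions only: each pair once, no self-pairs
    for (k in j) out[[length(out) + 1]] <- c(a[i], a[k])
  }
  out
}

pairs <- find_pairs(c(1, 2, 5, 6, 19, 7, 15, 20), 20)
pairs              # list(c(1, 19), c(5, 15))
sample(pairs, 1)   # one pair at random
```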

Select random and unique elements from a vector

Say I have a simple vector with repeated elements:
a <- c(1,1,1,2,2,3,3,3)
Is there a way to randomly select one element from each group of repeated elements? I.e. one random draw indicating which elements to keep would be:
1,4,6 ## here I selected the first 1, the first 2 and the first 3
And another:
1,5,8 ## here I selected the first 1, the second 2 and the third 3
I could do this with a loop over each repeated element, but surely there is a faster way?
EDIT:
Ideally, the solution should also always select an element that is already unique. I.e. my vector could also be:
b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn
Using base R ave we could do something like
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6
unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7
This generates an index for every value of a, groups the indices by a, and then selects one random index in each group.
Using the same logic with sapply and split:
sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)
And it would also work with tapply
tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
The reason why we need to check the length (if(length(x) > 1)) is because from ?sample
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.
Hence, when x is a single number n, sample(x) draws from 1:n (rather than returning n), so we need to check its length.
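The examples in the R documentation for sample suggest a small wrapper, resample(), that always draws from the elements of x and so sidesteps the length-1 case without an explicit check. A sketch of the one-index-per-value selection built on it (the name `pick_one_per_group` is hypothetical), applied to the vector b from the question:

```r
# Wrapper from the examples in ?sample: always draws from x itself
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical helper: one random position for each distinct value of v
pick_one_per_group <- function(v) {
  groups <- split(seq_along(v), v)          # positions of each value
  vapply(groups, resample, integer(1), 1)   # draw one position per group
}

b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
idx <- pick_one_per_group(b)
idx      # named by value; the "4" entry is always 9
b[idx]   # 1 2 3 4
```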

Create a vector with the sum of the positive elements of each column of a m*n numeric matrix in R

I need to create an R function which takes a numeric matrix A of arbitrary format n*m as input and returns a vector that is as long as A's number of columns that contains the sum of the positive elements of each column.
I must do this in 2 ways - the first in a nested loop and the second as a one liner using vector/matrix operations.
So far I have come up with the following code, which creates a vector with as many elements as A has columns, but I can only seem to get it to give me the sum of all positive elements of the matrix instead of each column:
colSumPos <- function(A){
columns <- ncol(A)
v1 <- vector("numeric", columns)
for(i in 1:columns)
{
v1[i] <- sum(A[which(A>0)])   # sums over all of A, not just column i
}
v1
}
Could someone please explain how I get the sum of each column separately and then how I can simplify the code to dispose of the nested loop?
Thanks in advance for the help!
We can use apply with MARGIN=2 to loop through the columns and get the sum of elements that are greater than 0
apply(A, 2, function(x) sum(x[x >0], na.rm = TRUE))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or another option is colSums after replacing the values less than or equal to 0 with NA
colSums(A*NA^(A<=0), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or, more directly:
colSums(replace(A, A<=0, NA), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or, if there are no NA elements (so na.rm = TRUE is not needed), we can multiply by the logical matrix A > 0, which zeroes out the values less than or equal to 0, and make it compact (as @ikop commented):
colSums(A*(A>0))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
data
set.seed(24)
A <- matrix(rnorm(25), 5, 5)
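To make the NA caveat concrete, here is a small hypothetical matrix with one missing value. It also shows pmax(A, 0), a different technique not used in the answer above, which clips negative values to 0 while keeping the matrix dimensions:

```r
A <- matrix(c(1, -2, 3, -1, NA, 4), nrow = 3)  # columns (1, -2, 3) and (-1, NA, 4)

colSums(A * (A > 0))                # 4 NA: the NA propagates
colSums(pmax(A, 0), na.rm = TRUE)   # 4 4: negatives become 0, NA is skipped
```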
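If you need to use a for loop, you can try the following:
sumColum <- function(A){
res <- numeric(ncol(A))   # one total per column
for(j in 1:ncol(A)){
for(i in 1:nrow(A)){
# add the element only if it is present and positive
if(!is.na(A[i, j]) && A[i, j] > 0) res[j] <- res[j] + A[i, j]
}
}
res
}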

How many values of a vector are divisible by 2? Use R

I have an exercise where I have to count how many values of a vector are divisible by 2. I have this random sample:
set.seed(1)
y <- sample(c(0:99, NA), 400, replace=TRUE)
I created a new variable d to see which of the values are or aren't divisible by 2:
d <- y/2 ; d
What I want to do is create a logical vector where whole numbers give TRUE and the rest give FALSE (e.g. 22.0 -> TRUE and 24.5 -> FALSE).
I used this command, but I believe that the answer is wrong since it would only give me the numbers that are in the sample:
sum(d %in% y, na.rm=T)
I also tried this (I found on the internet, but I don't really understand it)
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
sum(is.wholenumber(d),na.rm = T)
Are there other ways to do this, for example using the "%%" operator?
You can sum over the result of the mod operator, like so: sum(1 - y %% 2, na.rm = TRUE) or sum(y %% 2 == 0, na.rm = TRUE). x %% 2 is the remainder after dividing by two, which is why this solution works; na.rm = TRUE is needed because y contains NA values.
Here are three different ways:
length(y[which(y %% 2 == 0)])  # which() drops the NA entries
length(subset(y, y %% 2 == 0))
length(Filter(function(x) x %% 2 == 0, y))
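NA handling matters here because y contains NA. A small sketch on a hypothetical vector shows that plain logical indexing keeps NA entries, while sum(na.rm = TRUE), subset() and Filter() drop them:

```r
y_small <- c(2, 3, NA, 4)          # hypothetical vector with one NA

is_even <- y_small %% 2 == 0       # TRUE FALSE NA TRUE
length(y_small[is_even])           # 3: indexing with NA keeps an NA element
sum(is_even, na.rm = TRUE)         # 2
length(subset(y_small, is_even))   # 2: subset() treats NA as FALSE
length(Filter(function(x) x %% 2 == 0, y_small))  # 2
```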
Since we're talking about division by 2, I would actually go down to the bit level and check whether the last bit of the number is 0 or 1 (a 0 means it is divisible by 2).
Going out on a limb here (I'm not sure how the compiler handles this division by 2), but I think that would likely be more optimized than a division, which is typically fairly expensive.
To do this at the bit level, you can do an AND operation between the number itself and 1; if the result is 1, the number is not divisible by 2:
sum(bitwAnd(y, 1L) == 0L, na.rm = TRUE)
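A quick check on the question's own data that the bit test and the %% test agree, NA positions included. bitwAnd() is base R and applies directly here because y is an integer vector:

```r
set.seed(1)
y <- sample(c(0:99, NA), 400, replace = TRUE)   # as in the question; integer vector

even_bit <- bitwAnd(y, 1L) == 0L   # last bit 0 means even; NA stays NA
even_mod <- y %% 2L == 0L          # remainder-based test

identical(even_bit, even_mod)      # TRUE: same TRUE/FALSE/NA pattern
sum(even_bit, na.rm = TRUE)
```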

How to count TRUE values in a logical vector

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:
z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
sum(z)
# [1] 498
table(z)["TRUE"]
# TRUE
# 498
Which do you prefer? Is there anything even better?
The safest way is to use sum with na.rm = TRUE:
sum(z, na.rm = TRUE) # best way to count TRUE values
which gives 1 for the vector z <- c(TRUE, FALSE, NA) used below.
There are some problems with the other solutions when the logical vector contains NA values.
See for example:
z <- c(TRUE, FALSE, NA)
sum(z) # gives you NA
table(z)["TRUE"] # gives you 1
length(z[z == TRUE]) # f3lix's answer, gives you 2 (because indexing with NA returns NA)
Additionally, the table solution is less efficient (look at the code of the table function).
Also, you should be careful with the "table" solution, in case there are no TRUE values in the logical vector. See for example:
z <- c(FALSE, FALSE)
table(z)["TRUE"] # gives you `NA`
or
z <- c(NA, FALSE)
table(z)["TRUE"] # gives you `NA`
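A compact way to see these edge cases side by side. The helper `count_methods` is hypothetical; it evaluates three of the approaches discussed here on each test vector:

```r
# Hypothetical helper: count TRUE values three ways
count_methods <- function(z) {
  c(sum_na_rm = sum(z, na.rm = TRUE),                  # NA skipped
    which_len = length(which(z)),                      # which() ignores NA
    table_val = as.integer(table(z)["TRUE"]))          # NA when no TRUE is present
}

cases <- list(c(TRUE, FALSE, NA), c(FALSE, FALSE), c(NA, FALSE))
res <- sapply(cases, count_methods)
res   # case 1: 1 1 1; cases 2 and 3: 0 0 NA
```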
Another option which hasn't been mentioned is to use which:
length(which(z))
To provide some context on the "which is faster" question, it's always easiest to test it yourself. I made the vector much larger for comparison:
z <- sample(c(TRUE,FALSE),1000000,rep=TRUE)
system.time(sum(z))
user system elapsed
0.03 0.00 0.03
system.time(length(z[z==TRUE]))
user system elapsed
0.75 0.07 0.83
system.time(length(which(z)))
user system elapsed
1.34 0.28 1.64
system.time(table(z)["TRUE"])
user system elapsed
10.62 0.52 11.19
So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.
Just to add a note regarding NA values and the which function:
> which(c(T, F, NA, NULL, T, F))
[1] 1 4
> which(!c(T, F, NA, NULL, T, F))
[1] 2 5
Note that which only returns the positions of TRUE values, so NA entries are ignored (and NULL simply disappears inside c()).
Another way is
> length(z[z==TRUE])
[1] 498
While sum(z) is nice and short, to me length(z[z==TRUE]) is more self-explanatory. That said, for a task this simple it probably makes little difference.
If it is a large vector, you should probably go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower and table(z)["TRUE"] is about 200x slower than sum(z).
Summing up, sum(z) is the fastest to type and to execute.
Another option is the summary function. It gives counts of the TRUE, FALSE and NA values.
> summary(hival)
Mode FALSE TRUE NA's
logical 4367 53 2076
The which function is a good alternative, especially when you operate on matrices (check ?which and notice the arr.ind argument). But I suggest that you stick with sum, because its na.rm argument can handle NAs in a logical vector.
For instance:
# create dummy variable
set.seed(100)
x <- round(runif(100, 0, 1))
x <- x == 1
# create NA's
x[seq(1, length(x), 7)] <- NA
If you type sum(x) you'll get NA as a result, but if you pass na.rm = TRUE to sum, you'll get the result that you want.
> sum(x)
[1] NA
> sum(x, na.rm=TRUE)
[1] 43
Is your question strictly theoretical, or do you have a practical problem concerning logical vectors?
There's also a package called bit that is specifically designed for fast boolean operations. It's especially useful if you have large vectors or need to do many boolean operations.
z <- sample(c(TRUE, FALSE), 1e8, rep = TRUE)
system.time({
sum(z) # 0.170s
})
system.time({
bit::sum.bit(z) # 0.021s, ~10x improvement in speed
})
I did something similar a few weeks ago. Here's a possible solution; it's written from scratch, so it's a kind of beta release. I'll try to improve it by removing the loops from the code...
The main idea is to write a function that takes 2 (or 3) arguments. The first is a data.frame holding the data gathered from the questionnaire, and the second is a numeric vector with the correct answers (this only applies to single-choice questionnaires). Optionally, a third argument chooses whether to return a numeric vector with the final scores or a data.frame with the scores embedded.
fscore <- function(x, sol, output = 'numeric') {
if (ncol(x) != length(sol)) {
stop('Number of items differs from length of correct answers!')
} else {
inc <- matrix(ncol=ncol(x), nrow=nrow(x))
for (i in 1:ncol(x)) {
inc[,i] <- x[,i] == sol[i]
}
if (output == 'numeric') {
res <- rowSums(inc)
} else if (output == 'data.frame') {
res <- data.frame(x, result = rowSums(inc))
} else {
stop('Type not supported!')
}
}
return(res)
}
I'll try to do this in a more elegant manner with some *ply function. Note that I haven't added an na.rm argument yet... I will.
# create dummy data frame - values from 1 to 5
set.seed(100)
d <- as.data.frame(matrix(round(runif(200,1,5)), 10))
# create solution vector
sol <- round(runif(20, 1, 5))
Now apply a function:
> fscore(d, sol)
[1] 6 4 2 4 4 3 3 6 2 6
If you pass output = 'data.frame', it returns the data.frame with an added result column.
I'll try to fix this one... Hope it helps!
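For reference, the column loop in fscore can be replaced by a single vectorised comparison. This sketch uses sweep() (a base R function, not a *ply function) and adds the na.rm option mentioned above; the name `fscore_vec` and the toy data are hypothetical:

```r
# Hypothetical vectorised scorer: compare column i of x with sol[i], count matches per row
fscore_vec <- function(x, sol, na.rm = FALSE) {
  if (ncol(x) != length(sol)) stop("Number of items differs from length of correct answers!")
  correct <- sweep(as.matrix(x), 2, sol, "==")   # TRUE where the answer is right
  rowSums(correct, na.rm = na.rm)
}

x <- data.frame(q1 = c(1, 2, 3), q2 = c(2, 2, 1))
fscore_vec(x, sol = c(1, 2))   # 2 1 0
```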
I've just had a particular problem where I had to count the number of true statements from a logical vector and this worked best for me...
length(grep(TRUE, (gene.rep.matrix[i,1:6] > 1))) > 5
This takes a subset of the gene.rep.matrix object and applies a logical test, returning a logical vector. The vector is passed as an argument to grep, which returns the positions of any TRUE entries. length then counts how many entries grep found, giving the number of TRUE entries.
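A small check that the grep approach and a plain sum give the same count. The vector below is a made-up stand-in for gene.rep.matrix[i, 1:6]:

```r
row_vals <- c(2.5, 0.3, 1.8, NA, 4.1, 1.2)   # hypothetical values, one NA
hits <- row_vals > 1                         # TRUE FALSE TRUE NA TRUE TRUE

n_grep <- length(grep(TRUE, hits))   # pattern and hits coerced to character; NA never matches
n_sum  <- sum(hits, na.rm = TRUE)    # counts TRUE directly

c(n_grep, n_sum)   # 4 4
```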
