Randomly sampling from each element of a vector - r

Let's say I have a numeric vector X
X <- c(1,42,1,23,5,7)
I would like to create another vector Y with the same number of elements, each of which is a randomly generated whole number between 1 and the corresponding element of X. For example, Y[2] would be randomly selected from between 1 and 42, and Y[4] would be randomly selected from between 1 and 23.
I have tried to use the apply function to do this
Y<-apply(X, 1, sample)
but I am having no luck and generating the error message
Error in apply(X, 1, sample) : dim(X) must have a positive length
Is there a better way to do this?

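You can't use apply on a plain vector; it works only on objects that have dimensions (e.g., matrices). Use sapply instead. Furthermore, you need the argument size = 1 since you want to sample one value for each entry of X. When sample is given a single number n of at least 1, it draws from 1:n, which is exactly the range you want.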
sapply(X, sample, size = 1)
[1] 1 7 1 16 3 6
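A slightly more explicit variant, offered only as a sketch: it assumes every element of X is a whole number of at least 1. sample.int states the "draw from 1 to n" intent directly, and vapply fixes the result type. The seed value is arbitrary and only there to make a run repeatable.

```r
X <- c(1, 42, 1, 23, 5, 7)

set.seed(123)  # arbitrary seed, only for reproducibility

# Draw one integer from 1..n for each upper bound n in X.
# Assumes every element of X is a whole number >= 1.
Y <- vapply(X, function(n) sample.int(n, size = 1L), integer(1))

# Every draw lies between 1 and its own upper bound.
stopifnot(length(Y) == length(X), all(Y >= 1 & Y <= X))
```

Because the upper bound is 1 for the first and third elements, Y[1] and Y[3] are always 1.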

Related

How to safely drop nothing from a vector when the negative index could be integer(0)?

Suppose I have a vector x = 1:10, and it is constructed by concatenating two other vectors a = integer(0) and b = 1:10 together (this is an edge case). I want to split up the combined vector again into a and b later on. I would have thought I could safely separate them with:
i = seq_along(a)
x[i]
x[-i]
But I discovered that when I use x[-integer(0)] I get integer(0) returned, instead of x itself as I naively thought. What is the best way to do this sort of thing?
If you want to use negative indexing and the index may degenerate to integer(0) (for example, the index is computed from which), append a large "out-of-bound" value to the index. Removing an "out-of-bound" position has no side effect.
x <- 1:10
i <- integer(0)
x[-c(i, 11)] ## position 11 is "out-of-bound"
# [1] 1 2 3 4 5 6 7 8 9 10
If you are unsure which "out-of-bound" value to use, here is a canonical choice: 2 ^ 31, because this value exceeds the representable range of a 32-bit signed integer, yet it is not Inf.
An alternative way is to do an if test on length(i). For example:
if (length(i)) x[-i] else x
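Caution: don't use the function ifelse for this purpose. It returns a result with the same length as its test, which is 1 here, so you would get back only the first element.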
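The length test can be wrapped in a small helper. This is a minimal sketch; drop_positions is a hypothetical name, not a base R function. It also shows the edge case from the question and what ifelse does with it.

```r
# Hypothetical helper: drop positions i from x,
# treating an empty i as "drop nothing".
drop_positions <- function(x, i) {
  if (length(i)) x[-i] else x
}

a <- integer(0)
b <- 1:10
x <- c(a, b)
i <- seq_along(a)        # integer(0), because a is empty

x[-i]                    # integer(0): an empty index selects nothing
drop_positions(x, i)     # the whole of x
drop_positions(x, 1:3)   # x without its first three elements

# ifelse() returns a result shaped like its test (length 1 here),
# so only the first element of x comes back.
ifelse(length(i) > 0, x[-i], x)
```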

How to find if two or more consecutive elements of a vector are equal in R

I want to find a way to determine if two or more consecutive elements of a vector are equal.
For example, in vector x=c(1,1,1,2,3,1,3), the first, the second and the third element are equal.
With the following command, I can check that a vector, say y, contains no run of two or more consecutive elements equal to 2 or 3 (it returns TRUE when there is no such run)
all(rle(y)$lengths[which( rle(y)$values==2 | rle(y)$values==3 )]==1)
Is there any other faster way?
EDIT
Let's say we have the vector z=c(1,1,2,1,2,2,3,2,3,3).
I want a vector with three elements as output. The first element will refer to value 1, the second to 2 and the third one to 3. An element of the output vector will be equal to 1 if two or more consecutive elements of z are equal to the corresponding value (1, 2 or 3) and 0 otherwise. So, the output for the vector z will be (1,1,1).
For the vector w=c(1,1,2,3,2,3,1) the output will be 1,0,0, since only the value 1 appears in two consecutive elements, namely in the first and the second position of w.
I'm not entirely sure if I'm understanding your question as it could be worded better. The first part just asks how you find if consecutive elements in a vector are equal. The answer is to use the diff() function combined with a check for a difference of zero:
z <- c(1,1,2,1,2,2,3,2,3,3)
sort(unique(z[which(diff(z) == 0)]))
# [1] 1 2 3
w <- c(1,1,2,3,2,3,1)
sort(unique(w[which(diff(w) == 0)]))
# [1] 1
But your edit example seems to imply you are looking to see if there are repeated units in a vector whose values will only be the integers 1, 2, or 3. Your output will always be X, Y, Z, where
X is 1 if there is at least one "1" repeated, else 0
Y is 2 if there is at least one "2" repeated, else 0
Z is 3 if there is at least one "3" repeated, else 0
Is this correct?
If so, see the following
continuously <- function(x){
  s <- sort(unique(x[which(diff(x) == 0)]))
  output <- c(0,0,0)
  output[s] <- s
  return(output)
}
continuously(z)
# [1] 1 2 3
continuously(w)
# [1] 1 0 0
Assuming your vector is z=c(1,1,2,1,2,2,3,2,3,3), you can do:
(unique(z[c(FALSE, diff(z) == 0)]) >= 0)+0
which returns 1, 1, 1.
When you run the same command on your other sequence:
w=c(1,1,2,3,2,3,1)
(unique(w[c(FALSE, diff(w) == 0)]) >= 0)+0
it returns 1.
You may also try this for an exact output like 1,1,1 or 1,0,0:
(unique(z) %in% z[c(FALSE, diff(z) == 0)])+0 #1,1,1 for z and 1,0,0 with w in place of z
Here %in% tests membership, so the two sides may have different lengths without any recycling.
Logic:
diff takes the difference between each element and the one before it. Since there is always one fewer difference than there are elements, I prepend FALSE so that the logical index has the same length as the vector. Subsetting the original vector with this index keeps the elements that equal their predecessor, and unique reduces them to distinct values. Finally, the comparison >= 0 turns each of them into TRUE, and adding 0 converts TRUE to 1 (any other condition that always holds would do as well).
This assumes your sequence doesn't contain negative numbers, because of the >= 0 check.
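For output that always lines up with a fixed set of values, a run-length based sketch may be easier to reason about. has_run and its values argument are hypothetical names introduced here for illustration; they do not come from the answers above.

```r
# Hypothetical helper: for each entry of `values`, return 1 if it occurs
# in a run of two or more consecutive equal elements of x, else 0.
has_run <- function(x, values = sort(unique(x))) {
  r <- rle(x)
  repeated <- unique(r$values[r$lengths >= 2])
  as.integer(values %in% repeated)
}

z <- c(1,1,2,1,2,2,3,2,3,3)
w <- c(1,1,2,3,2,3,1)

has_run(z, values = 1:3)   # 1 1 1
has_run(w, values = 1:3)   # 1 0 0
```

Passing values explicitly keeps the output at a fixed length even when one of the values never occurs in the input.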

Convert a one column matrix to n x c matrix

I have a (nxc+n+c) by 1 matrix, and I want to drop the last n+c rows and convert the rest into an nxc matrix. Below is what I've tried, but it returns a matrix in which every element of a row is the same. I'm not sure why this is. Could someone help me out please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
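You have an object x of length n*c + n + c, and the problem is in the index you use to extract from it. (If x is a plain vector rather than a one-column matrix, also drop the comma.)
You should do tmp=x[1:(n*c)].
Notice the importance of the parentheses: if you do tmp=x[1:n*c], R takes the range from 1 to n, multiplies it by c to give a new set of indices, and then extracts based on those. That leaves only n values, which matrix() recycles to fill the n x c result, so every row repeats a single element.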
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
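A small worked sketch of the precedence issue follows. The sizes and the contents of x are made up for illustration, and the column count is called k here only to avoid confusion with the base function c.

```r
n <- 3
k <- 2   # number of columns (the question's c)

# Hypothetical input: an (n*k + n + k) by 1 matrix holding 1, 2, 3, ...
x <- matrix(seq_len(n * k + n + k), ncol = 1)

1:n * k      # 2 4 6: ":" binds tighter than "*"
1:(n * k)    # 1 2 3 4 5 6: the intended index

tmp <- x[1:(n * k), ]
Membership <- matrix(tmp, nrow = n, ncol = k)

# Same result without explicit index arithmetic.
Membership2 <- matrix(head(x, n * k), ncol = k)
stopifnot(identical(Membership, Membership2))
```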

Finding peaks in vector

I'm trying to find "peaks" in a vector, i.e. elements for which the nearest neighboring elements on both sides that do not have the same value have lower values.
So, e.g. in the vector
c(0,1,1,2,3,3,3,2,3,4,5,6,5,7)
there are peaks at positions 5,6,7,12 and 14
Finding local maxima and minima comes close, but doesn't quite fit.
This should work. The call to diff(sign(diff(x))) == -2 finds peaks by, in essence, testing for a negative second derivative at/around each of the unique values picked out by rle.
x <- c(0,1,1,2,3,3,3,2,3,4,5,6,5,7)
r <- rle(x)
which(rep(x = diff(sign(diff(c(-Inf, r$values, -Inf)))) == -2,
          times = r$lengths))
# [1] 5 6 7 12 14
(I padded your vector with -Infs so that both elements 1 and 14 have the possibility of being matched, should the nearest different-valued element have a lower value. You can obviously adjust the end-element matching rule by instead setting one or both of these to Inf.)
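The same steps can be wrapped in a function with the intermediate values named. This is only a sketch: find_peaks is a hypothetical name, and it assumes x holds finite numbers with no NA values.

```r
# Hypothetical wrapper around the rle/diff approach.
# Assumes x is numeric, finite, and free of NA.
find_peaks <- function(x) {
  r <- rle(x)
  # Pad with -Inf so the first and last runs can qualify as peaks.
  padded <- c(-Inf, r$values, -Inf)
  # A run is a peak when the slope sign goes from +1 to -1 across it.
  is_peak <- diff(sign(diff(padded))) == -2
  # Expand the per-run flags back to per-element positions.
  which(rep(is_peak, times = r$lengths))
}

find_peaks(c(0,1,1,2,3,3,3,2,3,4,5,6,5,7))   # 5 6 7 12 14
find_peaks(c(1, 3, 3, 1))                    # 2 3
```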

How to create a list from an array of z-scores in R?

I have an array of z-scores that is structured like num [1:27, 1:11, 1:467], so there are 467 entries with 27 rows and 11 columns. Is there a way that I can make a list from this array? For example a list of entries which contain a z-score over 2.0 (not just a list of z scores, a list which identifies which 1:467 entries have z > 2).
Say that your array is called z in your R session. The function you are looking for is which with the argument arr.ind set to TRUE.
m <- which(z > 2, arr.ind=TRUE)
This will give you a selection matrix, i.e. a matrix with three columns, each row holding the three indices (row, column, entry) of one Z-score greater than 2. To know the number of Z-scores greater than 2 you can do
nrow(m)
# Note that 'sum(z > 2)' is easier.
and to get the values
z[m]
# Note that 'z[z > 2]' is easier
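To get from the selection matrix to the entries themselves, one option is to take the distinct values of its third column. The sketch below uses simulated data in place of the real z-scores; the names entries and entries_alt are made up for illustration.

```r
set.seed(42)  # arbitrary seed; the data are simulated, not real z-scores
z <- array(rnorm(27 * 11 * 467), dim = c(27, 11, 467))

m <- which(z > 2, arr.ind = TRUE)

# Third column of m is the index along the 467-long dimension.
entries <- sort(unique(m[, 3]))

# Equivalent check without the selection matrix:
# does any cell within each entry exceed 2?
entries_alt <- which(apply(z > 2, 3, any))

stopifnot(all(entries == entries_alt))
```

If a list object is needed, as.list(entries) converts the vector of entry numbers into one.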

Resources