sampling bug in R? [duplicate] - r

This question already has answers here:
Sample from vector of varying length (including 1)
(4 answers)
Closed 4 years ago.
I am trying to sample one element out of a numeric vector.
When the length of the vector > 1, the result is one of the numbers of the vector, as expected. However when the vector contains one element, it samples a number between 0 and this single number.
For example:
sample(c(100, 1000), 1)
results in either 100 or 1000, however
sample(c(100), 1)
results in different numbers smaller than 100.
What is going on?

Have a look at the Details of the sample function:
"If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x"

This is (unfortunately) expected behavior. See ?sample. The first line of the Details section:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x). See the examples.
Luckily the Examples section provides a suggested fix:
# sample()'s surprise -- example
x <- 1:10
sample(x[x > 8]) # length 2
sample(x[x > 9]) # oops -- length 10!
sample(x[x > 10]) # length 0
## safer version:
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x[x > 8]) # length 2
resample(x[x > 9]) # length 1
resample(x[x > 10]) # length 0
You could, of course, also just use an if statement:
sampled_x = if (length(my_x) == 1) my_x else sample(my_x, size = 1)
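A minimal sketch of the same idea as a one-element sampler, assuming a non-empty input. The name `sample_one` is hypothetical and not part of base R; it draws an index with `sample.int()` rather than passing the vector to `sample()`, so a length-one vector is never expanded to `1:x`.

```r
# Hypothetical helper: draw exactly one element from a non-empty vector.
# sample.int(n, 1) always draws from 1..n, so x is only ever indexed.
sample_one <- function(x) {
  stopifnot(length(x) >= 1)  # assumption: an empty input is treated as an error
  x[sample.int(length(x), 1)]
}

sample_one(c(100, 1000))  # either 100 or 1000
sample_one(100)           # always 100

# For contrast, sample(100, 1) draws from 1:100
draws <- replicate(1000, sample(100, 1))
all(draws >= 1 & draws <= 100)
```

Indexing by a sampled position also works for character or factor vectors, where the `1:x` shortcut does not apply anyway.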

Related

Longest conditional run in a vector in R

Given a single vector, I would like to find the length of the leading stretch that meets this condition: counting stops the first time a run of x > 1 with a length of at least 5 begins.
For example, I got a vector X:
X <- c(2,3,4,0,1,0,0,0,3,2,2,0,3,3,3,3,3,0,0,0)
My desired run has a length of 12; it ends at the beginning of the first run of x > 1 that is at least 5 numbers long.
I know my question is not asked in the most aesthetic way, but I think that I explained it sufficiently.
Maybe you are looking for this -
with(rle(X > 1), {
val <- max(lengths[values & lengths >= 5])
inds <- which(values & lengths == val) - 1
cumsum(lengths)[inds]
})
#[1] 12
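The answer above selects the longest qualifying run. If the intent is strictly the first run of x > 1 with length at least 5, a variant could look like the sketch below. The name `prefix_len` and the choice to return the full vector length when no run qualifies are assumptions, not part of the original answer.

```r
X <- c(2,3,4,0,1,0,0,0,3,2,2,0,3,3,3,3,3,0,0,0)

r <- rle(X > 1)
# Position (within the rle result) of the first TRUE run of length >= 5
first <- which(r$values & r$lengths >= 5)[1]
ends <- cumsum(r$lengths)

# Number of elements before that run starts; assumption: if no run
# qualifies, the whole vector counts
prefix_len <- if (is.na(first)) length(X) else ends[first] - r$lengths[first]
prefix_len
#[1] 12
```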

Why does sample() not work for a single number? [duplicate]

This question already has answers here:
Sample from vector of varying length (including 1)
(4 answers)
Closed 3 years ago.
sample(x,n) The parameters are the vector, and how many times you wish to sample
sample(c(5,9),1) returns either 5 or 9
however,
sample(5,1) returns 1,2,3,4, or 5?
I've read the help section:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1,
sampling via sample takes place from 1:x. Note that this convenience
feature may lead to undesired behaviour when x is of varying length in
calls such as sample(x). See the examples.
But is there a way to make it not do this? Or do I just need to include an if statement to avoid this.
Or do I just need to include an if statement to avoid this.
Yeah, unfortunately. Something like this:
result = if(length(x) == 1) {x} else {sample(x, ...)}
Here's an alternative approach: you simply subset a random value from your vector like this -
set.seed(4)
x <- c(5,9)
x[sample(length(x), 1)]
[1] 9
x <- 5
x[sample(length(x), 1)]
[1] 5

Generate random numbers with 3 to 7 digits in R

How can I generate random numbers of varying length, say between 3 and 7 digits, with equal probability?
At the end I would like the code to come up with a 3 to 7 digit number (with equal probability) consisting of random numbers between 0 and 9.
I came up with this solution but feel that it is overly complicated because of the obligatory generation of a data frame.
options(scipen=999)
t <- as.data.frame(c(1000,10000,100000,1000000,10000000))
round(runif(1, 0,1) * sample_n(t,1, replace = TRUE),0)
Is there a more elegant solution?
Based on the information you provided, I came up with another solution that might be closer to what you want. In the end, it consists of these steps:
randomly pick a number len from [3, 7] determining the length of the output
randomly pick len numbers from [0, 9]
concatenate those numbers
Code to do that:
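# %/% 1 truncates, so each upper bound must be one above the largest wanted value
(len <- runif(1, 3, 8) %/% 1)    # len is one of 3, 4, 5, 6, 7
(s <- runif(len, 0, 10) %/% 1)   # each digit is one of 0, ..., 9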
cat(s, sep = "")
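The three steps can also be sketched with integer sampling instead of truncated `runif()` draws. The function name `random_digit_string` and its arguments are hypothetical. The result is returned as a string, on the assumption that a leading zero should still count as a digit.

```r
# Hypothetical helper: a string of min_len to max_len random digits,
# with every length equally likely.
random_digit_string <- function(min_len = 3, max_len = 7) {
  lens <- seq(min_len, max_len)
  # Index with sample.int() so a single allowed length is not expanded to 1:x
  len <- lens[sample.int(length(lens), 1)]
  digits <- sample(0:9, len, replace = TRUE)
  paste(digits, collapse = "")
}

random_digit_string()
```

If a leading zero is not acceptable, the first digit could be drawn from 1:9 separately.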
I previously provided this answer; it does not meet the requirements though, as became clear after OP provided further details.
Doesn't that boil down to generating a random number between 100 and 9999999?
If so, does this do what you want?
runif(5, 100, 9999999) %/% 1
You could probably also use round, but you'd always have to round down.
Output:
[1] 4531543 9411580 2195906 3510185 1129009
You could use a vectorized approach, and sample from the allowed range of exponents directly in the exponent:
pick.nums <- function(n){floor(10^(sample(3:7,n,replace = TRUE))*runif(n))}
For example,
> set.seed(123)
> pick.nums(5)
[1] 455 528105 89241 5514350 4566147

Check if numbers in a vector are alternating in R

I need to check whether the first number of a vector is smaller than the second, the second is greater than the third, and so on. So far I can calculate the differences between consecutive numbers like this:
n <- sample(3) # suppose n is 1 3 2
diff(n) # outputs 2 -1
I need to check whether the first difference is positive, the second negative, etc. The problem is that I need the program to do this for a vector of length n. How can I implement this?
As it is not very clear what I am trying to do here, I will give a better example:
Let v be the vector c(1,2,4,3).
I need to check whether the first number of the vector is smaller than the second, the second greater than the third, and the third smaller than the fourth.
So I need to check whether 1 < 2 > 4 < 3. (This vector wouldn't meet the requirements.) Every number I get will be > 0 and is guaranteed to appear only once.
This process needs to be generalized to a given n, which is a natural number > 0.
v <- c(1, 2, 4, 3)
all(sign(diff(v)) == c(1, -1))
# [1] FALSE
# Warning message:
# In sign(diff(v)) == c(1, -1) :
# longer object length is not a multiple of shorter object length
We can safely ignore the warning message, since we make deliberate use of "recycling" (which means c(1, -1) is implicitly repeated to match the length of sign(diff(v))).
Edit: taking @digEmAll's comment into account, if you want to allow a negative difference rather than a positive one at the start of the sequence, then this naive change should do it:
diffs <- sign(diff(v))
all(diffs == c(1, -1)) || all(diffs == c(-1, 1))
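A sketch of the same check without the recycling warning, assuming the up-down-up pattern from the question. The name `is_alternating` is hypothetical, and treating a vector of length 0 or 1 as alternating is a choice made here. `rep_len()` builds the expected sign pattern at exactly the required length.

```r
# Hypothetical helper: TRUE if v goes up, down, up, ... starting with "up"
is_alternating <- function(v) {
  d <- sign(diff(v))
  if (length(d) == 0) return(TRUE)  # assumption: nothing to compare counts as TRUE
  expected <- rep_len(c(1, -1), length(d))
  all(d == expected)
}

is_alternating(c(1, 2, 4, 3))
# [1] FALSE
is_alternating(c(1, 3, 2, 4))
# [1] TRUE
```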
If we need to find whether the differences alternate between positive and negative, then
all(rle(as.vector(tapply(n, as.integer(gl(length(n),
2, length(n))), FUN = diff)))$lengths==1)
#[1] TRUE
Also, as @digEmAll commented, here is a variation of my initial response:
all(rle(sign(diff(n)) > 0)$lengths == 1)
data
n <- c(1, 2, 4, 3)

Removing zero lines from dataframe yields dataframe of zero lines

I have a script that has a bunch of quality control checksums and it got caught on a dataset that had no need to remove any samples (rows) due to quality control. However, this script gave me an unexpected result of a dataframe with zero rows. With example data, why does this work:
data(iris)
##get rid of those pesky factors
iris$Species <- NULL
med <- which(iris[, 1] < 4.9)
medtemp <- iris[-med, ]
dim(medtemp)
[1] 134 4
but this returns a dataframe of zero rows:
small <- which(iris[, 1] < 4.0)
smalltemp <- iris[-small, ]
dim(smalltemp)
[1] 0 4
As does this:
x <- 0
zerotemp <- iris[-x, ]
dim(zerotemp)
[1] 0 4
It seems that the smalltemp dataframe should be the same size as iris since there are no rows to remove at all. Why is this?
Copied verbatim from Patrick Burns's R Inferno p. 41 (I hope this constitutes "fair use" -- if someone objects I'll remove it)
negative nothing is something
> x2 <- 1:4
> x2[-which(x2 == 3)]
[1] 1 2 4
The command above returns all of the values in x2 not equal to 3.
> x2[-which(x2 == 5)]
numeric(0)
The hope is that the above command returns all of x2 since no elements are
equal to 5. Reality will dash that hope. Instead it returns a vector of length
zero.
There is a subtle difference between the two following statements:
x[]
x[numeric(0)]
Subtle difference in the input, but no subtlety in the difference in the output.
There are at least three possible solutions for the original problem.
out <- which(x2 == 5)
if(length(out)) x2[-out] else x2
Another solution is to use logical subscripts:
x2[!(x2 %in% 5)]
Or you can, in a sense, work backwards:
x2[ setdiff(seq_along(x2), which(x2 == 5)) ]
Could it be that in your second example, small evaluates to 0?
Taking the zeroth element of a vector will always return the empty vector:
> foo <- 1:3
> foo
[1] 1 2 3
> foo[0]
integer(0)
>
Instead of using which to get your indices, I would use a boolean vector and negate it. That way you can do this:
small <- iris[, 1] < 4.0
smalltemp <- iris[!small, ]
dim(smalltemp)
[1] 150 4
EDIT: I don't think a negative index of 0 (as in your case) is allowed since there is no 0th index and thus R can't exclude that index from your selection. Negative indexing can be interpreted as: "give me back all rows except those with these indices".
It is because of the rules for what to do with an index that is zero. Only strictly positive or strictly negative indices are allowed. [0] returns nothing, and
R> -0 == 0
[1] TRUE
Hence you get nothing where you expected it to drop nothing.
The integer(0) issue is treated as indexing by a NULL, which is documented to work as if indexing by 0, hence the same behaviour.
This is discussed in the R Language Definition manual.
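The guard suggested in the answers can be wrapped in a small function. This is a sketch only, and the name `drop_rows` is hypothetical. It returns the data frame unchanged when the index vector is empty and uses negative indexing otherwise.

```r
# Hypothetical helper: remove rows by position, tolerating an empty index
drop_rows <- function(df, idx) {
  if (length(idx) == 0) return(df)  # nothing to drop: keep every row
  df[-idx, , drop = FALSE]
}

small <- which(iris[, 1] < 4.0)  # integer(0): no Sepal.Length is below 4.0

nrow(iris[-small, ])             # 0, the surprising result from the question
nrow(drop_rows(iris, small))     # 150

med <- which(iris[, 1] < 4.9)
nrow(drop_rows(iris, med))       # 134, same as iris[-med, ]
```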
