I need to create a function of five variables,
a (multiplier)
n (sample size)
c (increment with default 0)
m (modulus)
x0 (Initial seed value)
I need to generate a sequence of random numbers with the recurrence
x_i = (a * x_{i-1} + c) mod m, i = 1, 2, ..., n
and return it as the vector x = (x_1, ..., x_n).
My attempt:
my.unif1 <- function(n, a,c = 0, m, x = x[0]) {
while(n > 0) {
x[n] <- (a*x[n-1]+c)%%m
}
}
It sounds like you want to learn more about Linear Congruential Generators. Here's a resource that will probably help you solve your code problem: https://qualityandinnovation.com/2015/03/03/a-linear-congruential-generator-lcg-in-r/
lcg <- function(a,c,m,run.length,seed) {
x <- rep(0,run.length)
x[1] <- seed
for (i in seq_len(run.length - 1)) { # seq_len() also copes with run.length = 1
x[i+1] <- (a * x[i] + c) %% m
}
U <- x/m # scale all of the x's to
# produce uniformly distributed
# random numbers between [0,1)
return(list(x=x,U=U))
}
> z <- lcg(6,7,23,20,5)
> z
$x
[1] 5 14 22 1 13 16 11 4 8 9 15 5 14 22 1 13 16 11
[19] 4 8
$U
[1] 0.21739130 0.60869565 0.95652174 0.04347826 0.56521739
[6] 0.69565217 0.47826087 0.17391304 0.34782609 0.39130435
[11] 0.65217391 0.21739130 0.60869565 0.95652174 0.04347826
[16] 0.56521739 0.69565217 0.47826087 0.17391304 0.34782609
That could help:
my.fct.1 <- function(x, multiplier, increment, modulus){
increment <- ifelse(missing(increment), 0, increment) # setting the default increment to 0
newval <- (multiplier*x + increment) %% modulus
return(newval)
}
my.fct.2 <- function(x0, n, multiplier, increment, modulus){
if(n == 1){
val <- my.fct.1(x = x0, multiplier = multiplier, increment = increment, modulus = modulus)
vec <- c(x0, val)
return(vec)
}
if(n > 1){
vec <- my.fct.2(x0 = x0, n = n-1, multiplier = multiplier, increment = increment, modulus = modulus)
val <- vec[length(vec)]
newval <- my.fct.1(x = val, multiplier = multiplier, increment = increment, modulus = modulus)
newvec <- c(vec, newval)
return(newvec)
}
}
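my.fct.2 does what is required, and its arguments are largely self-explanatory. Note that it returns n + 1 values, because the seed x0 is included as the first element. Watch out, though: it is a recursive function, which can hurt speed and, for large n, run into R's limit on nested calls.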
And here are some examples of such generated sequences:
> my.fct.2(3, 9, 7, -1, 4)
[1] 3 0 3 0 3 0 3 0 3 0
> my.fct.2(1, 9, 2, 1, 13)
[1] 1 3 7 2 5 11 10 8 4 9
> my.fct.2(0, 17, 5, 3, 7)
[1] 0 3 4 2 6 5 0 3 4 2 6 5 0 3 4 2 6 5
# and here the arguments set to cross check it against @mysteRious's answer
> my.fct.2(5, 20, 6, 7, 23)
[1] 5 14 22 1 13 16 11 4 8 9 15 5 14 22 1 13 16 11 4 8 9
U <- my.fct.2(5, 20, 6, 7, 23)/23
> U
[1] 0.21739130 0.60869565 0.95652174 0.04347826 0.56521739 0.69565217 0.47826087 0.17391304
[9] 0.34782609 0.39130435 0.65217391 0.21739130 0.60869565 0.95652174 0.04347826 0.56521739
[17] 0.69565217 0.47826087 0.17391304 0.34782609 0.39130435
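Because the recursive version above can be slow for large n, a loop-based alternative may be preferable. The following is a minimal sketch, assuming the argument names from the question (n, a, c, m, x0) and assuming that only x_1, ..., x_n are wanted, without the seed:

```r
# Iterative LCG sketch using the argument names from the question.
# Returns x_1, ..., x_n; the seed x0 itself is not included (an assumption).
my.unif1 <- function(n, a, c = 0, m, x0) {
  x <- numeric(n)
  prev <- x0
  for (i in seq_len(n)) {          # seq_len() is safe when n is 0
    prev <- (a * prev + c) %% m    # x_i = (a * x_{i-1} + c) mod m
    x[i] <- prev
  }
  x
}

my.unif1(5, a = 6, c = 7, m = 23, x0 = 5)
# [1] 14 22  1 13 16
```

Dividing the result by m, as in the lcg() answer, rescales the values to [0, 1).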
I'm trying to build this function to find the multiples of 3 from 0 up to half of the argument "number". The argument "n" limits how many results I get back.
function1 <- function(number, n){
half <- number / 2
lessequal <- seq.int(from = 0, to = half, length.out = n)
multiple <- (lessequal %% 3) == 0
return (lessequal [multiple])
}
When I run this function with n = 2
function1 (24, 2)
[1] 0 12
When the expected result would be:
[1] 0 3
If I run it with n = 4, the outcome is still 2 elements instead of 4:
function1 (24, 4)
[1] 0 12
When I expected to get:
[1] 0 3 6 9
What am I doing wrong?
Thanks.
Try with this code:
function1 <- function(number, n){
half <- number / 2
lessequal <- seq.int(from = 0, to = half, by=1)
multiple <- (lessequal %% 3) == 0
vals <- lessequal [multiple]
vals <- vals[1:n]
return (vals)
}
function1 (24, 4)
Output:
function1 (24, 4)
[1] 0 3 6 9
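As for what went wrong in the original: length.out = n does not cap the number of results. It asks seq.int() for n evenly spaced values between from and to, so most multiples of 3 are never generated. A short illustration with number = 24:

```r
half <- 24 / 2

# length.out spreads n points evenly over [0, half]
seq.int(from = 0, to = half, length.out = 2)
# [1]  0 12
seq.int(from = 0, to = half, length.out = 4)
# [1]  0  4  8 12

# stepping by 3 and then truncating gives the intended result
head(seq.int(from = 0, to = half, by = 3), 2)
# [1] 0 3
```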
In most cases you can get this output from seq() alone, without passing number at all.
function1 <- function(n){
seq(0, length.out = n, by = 3)
}
function1(2)
#[1] 0 3
function1(4)
#[1] 0 3 6 9
If the first n multiples of 3 may exceed number/2, you can use the following, which pads the result with NA:
function1 <- function(number, n){
val <- seq(0, number/2, by = 3)
val[1:n]
}
function1(24, 2)
#[1] 0 3
function1(24, 4)
#[1] 0 3 6 9
function1(12, 4)
#[1] 0 3 6 NA
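If the NA padding is not wanted, head() can do the truncation instead, since it returns at most n elements. This is a small sketch; the name first_multiples is made up for illustration:

```r
# Hypothetical helper: at most n multiples of 3 in [0, number / 2], no NA padding
first_multiples <- function(number, n) {
  head(seq(0, number / 2, by = 3), n)
}

first_multiples(24, 4)
# [1] 0 3 6 9
first_multiples(12, 4)
# [1] 0 3 6
```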
We can use %/% (integer division) to build the vector 0:(i %/% 2) and then pick the elements at positions 3*k + 1 for k = 0, ..., n-1. These are the multiples of 3, because the vector starts at 0.
get_odd_n <- function(x, n) lapply(x, function(i) (0:(i%/%2))[3*(0:(n-1))+1])
setNames(get_odd_n(6*1:6+4, 7), 6*1:6+4)
$`10`
[1] 0 3 NA NA NA NA NA
$`16`
[1] 0 3 6 NA NA NA NA
$`22`
[1] 0 3 6 9 NA NA NA
$`28`
[1] 0 3 6 9 12 NA NA
$`34`
[1] 0 3 6 9 12 15 NA
$`40`
[1] 0 3 6 9 12 15 18
I'm looking for a way to identify a growing season which consists of a number of days greater than say 60 between the last frost day of spring and the first frost day in the fall. A general version of this problem is this. If I have a vector of numbers like testVec, I want the item numbers of the beginning and end range of values where the number of items is 5 or greater and all of them are greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.
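The intermediate objects make those three steps easier to follow. The same logical mask also yields the positions the question asks for, via which():

```r
testVec <- c(1, 3, 4, 0, 1, -5, 6, 0, 1, 3, 4, 6, 7, 5, 9, 0)

r <- rle(testVec > 0)   # step 1: runs of "> 0" and "<= 0"
r$lengths
# [1] 3 1 1 1 1 1 7 1
r$values
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

# step 2: a run qualifies if it is TRUE and at least 5 long;
# rep() expands the per-run flag back to one flag per element
keep <- rep(r$lengths * r$values >= 5, r$lengths)

# step 3: positions (rather than values) of the qualifying run
range(which(keep))
# [1]  9 15
```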
1) rleid This also handles any number of sequences including zero. rleid(ok) is a vector the same length as ok such that the first run of identical elements is replaced with 1, the second run with 2 and so on. The result is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" @tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs, because range() spans from the start of the first run to the end of the last:
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
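f3 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
r <- rle(y)
len <- r$lengths * r$values # run lengths, with runs at or below the threshold set to 0
m <- max(len)
if (m < n) return (c(NA, NA))
i <- sum(r$lengths[1:which.max(len)]) # end position of the first longest run
c(i-m+1, i)
}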
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you want the first run of length at least n, ignoring any later run even if it is longer, you could do
f4 <- function(x, threshold, n) {
y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
i <- which(y)[1]
if (is.na(i)) return(c(NA, NA)) # no qualifying run
# the appended FALSE acts as a sentinel when the run reaches the end of x
j <- i + which(!c(y[-seq_len(i)], FALSE))[1] - 1
c(i, j)
}
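Since range() merges multiple runs into one span, it can help to report every qualifying run separately. The following is a sketch built on rle() and cumsum(); the name run_bounds and the data-frame return shape are choices made for illustration, not part of the answers above:

```r
# Hypothetical helper: start and end positions of every run of values
# greater than `threshold` that is at least `n` long
run_bounds <- function(x, threshold, n) {
  y <- x > threshold
  y[is.na(y)] <- FALSE                 # treat NA as "not above threshold"
  r <- rle(y)
  end <- cumsum(r$lengths)             # last position of each run
  start <- end - r$lengths + 1         # first position of each run
  keep <- r$values & r$lengths >= n
  data.frame(start = start[keep], end = end[keep])
}

x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1, 3, 4, 6, 7, 5, 9, 0)
run_bounds(c(x, x), 0, 5)
#   start end
# 1     9  15
# 2    25  31
```

When no run qualifies, the result is a data frame with zero rows, so callers can check nrow() rather than test for NA.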
First, I simplify my question. I want to extract certain ranges from a numeric vector. For example, extracting 3 ranges from 1:20 at the same time :
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I tried to use findInterval() with its rightmost.closed and left.open arguments, but no combination of them achieves the goal.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be a matrix like that :
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use for loop if it's not necessary.
I would be grateful for any help.
I'd probably do it using purrr::map2 or Map, passing your lower-bounds and upper-bounds as arguments and filtering your dataset with a custom function
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
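The base R counterpart with Map() looks much the same. Both map2() and Map() return a list with one element per range, so unlist() is needed to get the single vector asked for in the question:

```r
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)

# one list element per (lower, upper) pair, then flattened
unlist(Map(function(lower, upper) x[x > lower & x < upper],
           lower_bounds, upper_bounds))
# [1]  2  3  4  9 10 11 18 19
```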
You may use data.table::inrange and its incbounds argument. Assuming ranges are in a matrix 'm', as shown in your question:
m <- matrix(v, ncol = 2, byrow = TRUE)
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
You were on the right path, and left.open indeed helps, but rightmost.closed concerns only the last interval rather than the right "side" of each interval. Hence, we need to call findInterval twice, once with the default settings and once with left.open = TRUE, and keep only the values that pass both tests. As you yourself figured out, it looks like an optimal way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19
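The same test can also be written without an explicit per-element function by comparing every value against every bound at once with outer(). This is a sketch of that idea, assuming the ranges are stored in a two-column matrix as in the question:

```r
x <- 1:20
m <- matrix(c(1, 5, 8, 12, 17, 20), ncol = 2, byrow = TRUE)

# inside[i, j] is TRUE when x[i] lies strictly inside range j
inside <- outer(x, m[, 1], ">") & outer(x, m[, 2], "<")

# keep the values that fall inside at least one range
x[rowSums(inside) > 0]
# [1]  2  3  4  9 10 11 18 19
```

The logical matrix has length(x) * nrow(m) entries, so this trades memory for the absence of a loop.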
I don't know what it is called, nested arithmetic progression maybe?
If n is an integer, say n = 50, what I would like is
(1, 2, 3, ..., n, 2, 3, ..., n, 3, 4, ..., n, ..., n-1, n)
that is, the concatenation of
1:n, 2:n, ..., (n-1):n
Is there an easy way of doing this?
Thanks!
The subject says the last subsequence is n but the body of the question says it is (n-1):n. I have assumed (n-1):n but to get the other just change each n-1 in the code to n and each 2 in the code to 1.
1) lapply Assuming we want 1:n, 2:n, ..., (n-1):n iterate over the starting value of each subsequence like this:
n <- 4
unlist(lapply(seq_len(n-1), seq, n))
## [1] 1 2 3 4 2 3 4 3 4
2) sequence Another approach is to transform sequence(seq(n, 2)) like this:
s <- sequence(seq(n, 2))
s + cumsum(s == 1) - 1
## [1] 1 2 3 4 2 3 4 3 4
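In newer versions of R the cumsum() adjustment may be unnecessary. Assuming R 4.0.0 or later, where sequence() accepts a from argument giving the starting value of each subsequence, the same result can be sketched as:

```r
n <- 4
# lengths n, n-1, ..., 2 with starting values 1, 2, ..., n-1
# (the `from` argument is assumed to be available, i.e. R >= 4.0.0)
sequence(seq(n, 2), from = seq_len(n - 1))
# [1] 1 2 3 4 2 3 4 3 4
```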
3) outer
m <- outer(seq_len(n), seq_len(n-1), ">=") * seq(n)
m[m > 0]
## [1] 1 2 3 4 2 3 4 3 4
3a) This variation of (3) also works:
m <- outer(seq_len(n), seq_len(n-1), "+") - 1
m[m <= n]
## [1] 1 2 3 4 2 3 4 3 4
4) Reduce
f <- function(x, y) c(x, seq(y, n))
Reduce(f, 1:(n-1), c())
## [1] 1 2 3 4 2 3 4 3 4
5) Recursion
Recurse <- function(v) {
if (length(v) > 2) c(v, Recall(tail(v, -1))) else v
}
Recurse(1:n)
## [1] 1 2 3 4 2 3 4 3 4
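Using Rcpp (note that, unlike the versions above, this one also emits the final subsequence n:n, as the output shows)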
library(Rcpp)
cppFunction('Rcpp::NumericVector mySeq( int n ) {
Rcpp::IntegerVector vec = seq(0, n);
int total_n = sum( vec );
Rcpp::NumericVector out(total_n);
size_t i, j;
int idx = 0;
int x = 1;
for( i = 0; i < n; i++ ) {
x = i + 1;
for( j = i; j < n; j++) {
out[idx] = x;
x++;
idx++;
}
}
return out;
}')
mySeq(5)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
mySeq(10)
# [1] 1 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 3 4 5 6 7 8 9 10 4 5 6 7 8 9 10 5 6 7 8
# [39] 9 10 6 7 8 9 10 7 8 9 10 8 9 10 9 10 10
And as ever with these multi-option answers, here's a benchmark
library(microbenchmark)
n <- 10000
microbenchmark(
rcpp = { mySeq(n) },
lapply = { lapn(n) },
sequence = { seqn(n) },
outer = { outn(n) },
outer2 = { outn2(n) },
# reduce = { reducen(n) }, ## takes too long
# recurse = { recursen(n) }, ## takes too long
times = 10
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# rcpp 213.9762 220.3786 245.6753 230.6847 262.8544 326.5764 10
# lapply 250.5695 260.5681 288.2523 278.9582 302.9768 367.5507 10
# sequence 1356.2691 1430.5877 1472.6946 1455.7467 1485.3578 1753.4076 10
# outer 2381.8864 2459.8159 2497.1630 2478.9865 2526.9577 2662.0489 10
# outer2 2509.8079 2531.1497 2651.6906 2636.3873 2785.3693 2820.2356 10
Functions
lapn <- function(n) { unlist(lapply(seq_len(n-1), seq, n)) }
seqn <- function(n) {
s <- sequence(seq(n, 2))
# return the shifted sequence, not the unmodified s
return(s + cumsum(s == 1) - 1)
}
outn <- function(n) {
m <- outer(seq_len(n), seq_len(n-1), ">=") * seq(n)
m[m > 0]
}
outn2 <- function(n) {
m <- outer(seq_len(n), seq_len(n-1), "+") - 1
m[m <= n]
}
reducen <- function(n) {
f <- function(x, y) c(x, seq(y, n))
Reduce(f, 1:(n-1), c())
}
recursen <- function(n) {
Recurse <- function(v) {
if (length(v) > 2) c(v, Recall(tail(v, -1))) else v
}
Recurse(1:n)
}
Another option is to fill a matrix column-wise with 1:n and keep its lower triangle. For example,
> n <- 4
> X <- matrix(1:n, nrow = n, ncol = n)
> X
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
[4,] 4 4 4 4
> lower.tri(X, diag = TRUE)
[,1] [,2] [,3] [,4]
[1,] TRUE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE
[3,] TRUE TRUE TRUE FALSE
[4,] TRUE TRUE TRUE TRUE
> x <- X[lower.tri(X, diag = TRUE)]
> x
[1] 1 2 3 4 2 3 4 3 4 4
> x[-length(x)]
[1] 1 2 3 4 2 3 4 3 4
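Wrapped up as a function, the lower-triangle steps above might look like the following sketch. The name tri_seq is hypothetical. Note that it allocates an n-by-n matrix, so memory use grows quadratically with n:

```r
# Hypothetical wrapper around the lower.tri() steps shown above
tri_seq <- function(n) {
  X <- matrix(seq_len(n), nrow = n, ncol = n)  # column j holds 1:n
  x <- X[lower.tri(X, diag = TRUE)]            # column j contributes j:n
  x[-length(x)]                                # drop the trailing n:n term
}

tri_seq(4)
# [1] 1 2 3 4 2 3 4 3 4
```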
Given a vector, say v = 1:10, one can remove elements from v using negative indexing, e.g. v[-1], v[-length(v)], v[-c(2,3)], to remove the first, last and 2nd/3rd element respectively.
I would like to split v by passing in a split index n, taking values 0 to length(v). The code below:
v1 <- v[1:n]
v2 <- v[-c(1:n)]
works perfectly fine except for n = 0. Now I know that 1:n is generally unsafe and should be replaced with seq_len(n), however, the assignment v2 <- v[-seq_len(0)] produces an empty vector.
Is there a way of doing this 'safely' using the bracket subsetting notation? Otherwise I know how to do it using head and tail:
v1 <- head(v, n)
v2 <- tail(v, length(v) - n)
Relevant other q/as:
Complement of empty index vector is empty index vector
You could use an if() statement inside the brackets. For example, this will just return the whole vector if n is zero and remove the sequence 1:n otherwise.
x <- 1:10
n <- 0
x[ if(n == 0) TRUE else -seq_len(n) ] ## n == 0 is !n for the golfers
# [1] 1 2 3 4 5 6 7 8 9 10
n <- 5
x[ if(n == 0) TRUE else -seq_len(n) ]
# [1] 6 7 8 9 10
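The same guard can be packaged so that both halves come back together. This is a sketch only; the name split_at and the list element names are made up for illustration:

```r
# Hypothetical helper: split v after position n, for n in 0..length(v)
split_at <- function(v, n) {
  idx <- seq_len(n)
  list(
    head = v[idx],                        # v[integer(0)] is empty when n == 0
    tail = if (n == 0) v else v[-idx]     # guard: v[-integer(0)] would be empty too
  )
}

split_at(1:10, 0)$tail
# [1]  1  2  3  4  5  6  7  8  9 10
split_at(1:10, 3)$head
# [1] 1 2 3
split_at(1:10, 10)$tail
# integer(0)
```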
v = 1:10
n = 0; split(v, seq_along(v)/min(n,length(v)) <= 1)
#$`FALSE`
# [1] 1 2 3 4 5 6 7 8 9 10
n = 1; split(v, seq_along(v)/min(n,length(v)) <= 1)
#$`FALSE`
#[1] 2 3 4 5 6 7 8 9 10
#$`TRUE`
#[1] 1
n = 10; split(v, seq_along(v)/min(n,length(v)) <= 1)
#$`TRUE`
# [1] 1 2 3 4 5 6 7 8 9 10
n = -10; split(v, seq_along(v)/min(n,length(v)) <= 1)
#$`TRUE`
# [1] 1 2 3 4 5 6 7 8 9 10
n = 100; split(v, seq_along(v)/min(n,length(v)) <= 1)
#$`TRUE`
# [1] 1 2 3 4 5 6 7 8 9 10
Further simplified by thelatemail in comment
split(v, seq_along(v) > n)
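One thing to watch with the split() approach is that a group with no elements is dropped from the result, so for n = 0 the list has only one component. If both components are always needed, one possible workaround is to pass a factor with both levels fixed. In this sketch the FALSE component holds the first n elements and the TRUE component holds the rest:

```r
v <- 1:10
n <- 0

# fixing the levels keeps the empty group in the result
s <- split(v, factor(seq_along(v) > n, levels = c(FALSE, TRUE)))
s[["FALSE"]]   # first n elements
# integer(0)
s[["TRUE"]]    # remaining elements
# [1]  1  2  3  4  5  6  7  8  9 10
```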