Select random element in a list of R?

a<-c(1,2,0,7,5)
Some languages have a picker function that chooses one random element from a. How do I do this in R?

# Sample from the vector 'a' 1 element.
sample(a, 1)

The above answers are technically correct:
sample(a, 1)
However, if you would like to repeat this process many times, say to imitate throwing a die, then you need to sample with replacement:
a <- c(1,2,3,4,5,6)
sample(a, 12, replace=TRUE)
Hope it helps.

Be careful when using sample!
sample(a, 1) works great for the vector in your example, but when the vector has length 1 it may lead to undesired behavior: if a is a single number >= 1, sample() draws from 1:a instead of returning a.
So if you are trying to pick a random item from a vector of varying length, check for the case of length 1!
sampleWithoutSurprises <- function(x) {
  if (length(x) <= 1) {
    return(x)
  } else {
    return(sample(x,1))
  }
}
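To make the pitfall concrete, here is a minimal sketch of what happens with a length-one numeric vector. The value 10 and the number of repetitions are arbitrary choices for illustration, and the wrapper is a compact restatement of the one above.

```r
set.seed(1)

a <- 10                                  # a "vector" of length one
draws <- replicate(1000, sample(a, 1))   # sample() treats this as sampling from 1:10
range(draws)                             # spans values from 1:10, not just 10

# Compact version of the wrapper: return x itself when there is nothing to choose from
sampleWithoutSurprises <- function(x) {
  if (length(x) <= 1) x else sample(x, 1)
}
sampleWithoutSurprises(a)                # always 10
```

The same wrapper also returns an empty vector unchanged, which may or may not be what you want in a simulation.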

This method also behaves correctly when your vector has length one, because it samples a position rather than the values themselves, and it's simple.
a[sample(1:length(a), 1)]

Read this article about generating random numbers in R.
http://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html
You can use sample in this case
sample(a, 1)
The second argument specifies that you want only one random element.
To generate a random number within some range, the runif function is useful.

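An alternative is to select an item from the vector using runif. A fractional index is truncated toward zero, so a draw between 1 and 6 maps to one of the positions 1 to 5, i.e.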
a <- c(1,2,0,7,5)
a[runif(1,1,6)]
Let's say you want a function that picks one element each time it is run (useful in a simulation, for example):
a <- c(1,2,0,7,5)
sample_fun_a <- function() sample(a, 1)
runif_fun_a <- function() a[runif(1,1,6)]
microbenchmark::microbenchmark(sample_fun_a(),
runif_fun_a(),
times = 100000L)
Unit: nanoseconds
sample_fun_a() - 4665
runif_fun_a() - 1400
runif seems to be quicker in this example.
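The runif version above hard-codes the upper bound 6 for a vector of length 5. A hedged sketch of a length-independent variant follows; pick_runif and pick_one are hypothetical helper names, and floor() only makes the index truncation explicit.

```r
# Pick one element via a uniform draw on [1, length(v) + 1), floored to an index
pick_runif <- function(v) v[floor(runif(1, min = 1, max = length(v) + 1))]

# Pick one element by sampling a position, which is also safe for length-one input
pick_one <- function(v) v[sample.int(length(v), 1)]

a <- c(1, 2, 0, 7, 5)
set.seed(123)
picks <- replicate(1000, pick_runif(a))
all(picks %in% a)   # TRUE: every pick is an element of a
pick_one(42)        # 42
pick_runif(42)      # 42
```

Timings like the ones above depend on the machine and R version, so it is worth re-running the benchmark before relying on the difference.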

Related

What is the most efficient way to find the most common value in a vector?

I'm trying to create a function to solve this puzzle:
An Arithmetic Progression is defined as one in which there is a constant difference between the consecutive terms of a given series of numbers. You are provided with consecutive elements of an Arithmetic Progression. There is however one hitch: exactly one term from the original series is missing from the set of numbers which have been given to you. The rest of the given series is the same as the original AP. Find the missing term.
You have to write the function findMissing(list), list will always be at least 3 numbers. The missing term will never be the first or last one.
The next section of code shows my attempt at this function. The site I'm on runs tests against the function, all of which passed, as in they output the correct missing integer.
The problem I'm facing is that it's giving me a timeout error, because it takes too long to run all the tests. There are 102 tests and it's saying it takes over 12 seconds to complete them. Taking more than 12 seconds means the function isn't efficient enough.
After running my own timing tests in RStudio, it seems running the function would take considerably less than 12 seconds, but regardless I need to make it more efficient to be able to complete the puzzle.
I asked on the site forum and someone said "Sorting is expensive, think of another way of doing it without it." I took this to mean I shouldn't be using the sort() function. Is this what they mean?
I've since found a few different ways of getting my_diff, which is currently calculated using the sort() function. All of these ways are even less efficient than the original way of doing it.
Can anyone give me a more efficient way of finding my_diff without the sort, or maybe make other parts of the code more efficient? It's the sort() part which is apparently the inefficient part of the code though.
find_missing <- function(sequence){
len <- length(sequence)
if(len > 3){
my_diff <- as.integer(names(sort(table(diff(sequence)), decreasing = TRUE))[1])
complete_seq <- seq(sequence[1], sequence[len], my_diff)
}else{
differences <- diff(sequence)
complete_seq_1 <- seq(sequence[1],sequence[len],differences[1])
complete_seq_2 <- seq(sequence[1],sequence[len],differences[2])
if(length(complete_seq_1) == 4){
complete_seq <- complete_seq_1
}else{
complete_seq <- complete_seq_2
}
}
complete_seq[!complete_seq %in% sequence]
}
Here are a couple of sample sequences to check the code works:
find_missing(c(1,3,5,9,11))
find_missing(c(1,5,7))
Here are some of the other things I tried instead of sort:
1:
library(pracma)
Mode(diff(sequence))
2:
library(dplyr)
(data.frame(diff_1 = diff(sequence)) %>%
group_by(diff_1) %>%
summarise(count = n()) %>%
ungroup() %>%
filter(count==max(count)))[1]
3:
MaxTable <- function(sequence, mult = FALSE) {
differences <- diff(sequence)
if (!is.factor(differences)) differences <- factor(differences)
A <- tabulate(differences)
if (isTRUE(mult)) {
as.integer(levels(differences)[A == max(A)])
}
else as.integer(levels(differences)[which.max(A)])
}
Here is one way to do this using seq. We can create a sequence from the minimum value to the maximum value of the input with length length(x) + 1, as there is exactly one term missing, and then take the value that is not in the input.
find_missing <- function(x) {
setdiff(seq(min(x), max(x), length.out = length(x) + 1), x)
}
find_missing(c(1,3,5,9,11))
#[1] 7
find_missing(c(1,5,7))
#[1] 3
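This approach takes the diff() of the vector. For an increasing progression, exactly one difference is larger than the others (the gap where the term is missing), and the missing term sits one common difference above the start of that gap.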
find_missing <- function(x) {
diffs <- diff(x)
x[which.max(diffs)] + min(diffs)
}
find_missing(c(1,3,5,9,11))
[1] 7
find_missing(c(1,5,7))
[1] 3
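The diff() approach above relies on the progression being increasing. A minimal sketch of a sign-independent variant is shown here (the name find_missing_by_gap is made up for illustration): locate the gap with the largest absolute difference and return its midpoint.

```r
find_missing_by_gap <- function(x) {
  diffs <- diff(x)
  i <- which.max(abs(diffs))   # position of the double-sized gap
  x[i] + diffs[i] / 2          # the missing term is the midpoint of that gap
}

find_missing_by_gap(c(1, 3, 5, 9, 11))   # 7
find_missing_by_gap(c(1, 5, 7))          # 3
find_missing_by_gap(c(11, 7, 5, 3, 1))   # 9 (decreasing progression)
```

Like the original, this makes a single pass over the differences and needs no sorting or tabulation.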
There is actually a simple formula for this, which will work even if your vector is not sorted...
find_missing <- function(x) {
(length(x) + 1) * (min(x) + max(x))/2 - sum(x)
}
find_missing(c(1,5,7))
[1] 3
find_missing(c(1,3,5,9,11,13,15))
[1] 7
find_missing(c(2,8,6))
[1] 4
It is based on the fact that the sum of the full series should be the average value times the length.
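As a quick sanity check of the sum formula, the sketch below builds a complete progression, removes one interior term, shuffles the rest, and recovers the removed term. The start value, step, and removed position are arbitrary choices for illustration.

```r
# Formula from the answer above: expected sum of the full series minus the observed sum
find_missing <- function(x) {
  (length(x) + 1) * (min(x) + max(x)) / 2 - sum(x)
}

full <- seq(from = 4, by = 3, length.out = 10)   # 4, 7, 10, ..., 31
removed <- full[6]                               # 19
set.seed(7)
observed <- sample(full[-6])                     # order does not matter for the formula

find_missing(observed)                           # 19
```

Because the formula only uses min(), max(), and sum(), it needs one pass over the data and no sorting.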

Which loop to use, R language?

We have to create a function(K) that returns a vector containing all items of the Fibonacci sequence that are smaller than or equal to K. We can assume K is a Fibonacci number. For example, if K is 3 the function would return the vector (1,1,2,3).
In general, a for loop is used when you know how many iterations you need to do, and a while loop is used when you want to keep going until a condition is met.
For this case, it sounds like you get an input K and you want to keep going until you find a Fibonacci term > K, so use a while loop.
ans <- function(n) {
x <- c(1,1)
while (length(x) <= n) {
position <- length(x)
new <- x[position] + x[position-1]
x <- c(x,new)
}
return(x[x<=n])
}
I tried many different loops, and this is the closest I got. It works with every other number, but ans(3) gives 1,1,2 even though it should give 1,1,2,3. I couldn't see what is wrong with this.
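The while-loop advice above can be sketched with the stopping condition placed on the value of the next term rather than on the length of the vector. This is only one possible version, and fib_up_to is a hypothetical name.

```r
fib_up_to <- function(K) {
  x <- c(1, 1)
  nxt <- 2                      # next Fibonacci term to consider
  while (nxt <= K) {            # keep going until the next term would exceed K
    x <- c(x, nxt)
    n <- length(x)
    nxt <- x[n] + x[n - 1]
  }
  x[x <= K]                     # guards the case K < 1
}

fib_up_to(3)   # 1 1 2 3
fib_up_to(8)   # 1 1 2 3 5 8
```

Testing the value means the loop runs only as many times as there are terms up to K, instead of building K + 1 terms and filtering afterwards.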

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
library(MASS)  # provides rnegbin()
sims <- 10
# Because R is not zero-indexed, add one so that 1:M[k] never runs over c(1, 0)
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists, which can hold vectors of different lengths
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]] <- rep(NA, M[k])
  X[[k]] <- rep(NA, M[k])
  for(i in 1:M[k]){
    x[[k]][i] <- runif(1, min=0, max=1)
    if(x[[k]][i] >= 0 && x[[k]][i] <= 0.1056379){
      X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work, and I think it is what you were trying to do, but it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, most of the time there is no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you could do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead, after resetting the seed so that both versions draw the same random numbers:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, R keeps only the first value and warns that the number of items to replace is not a multiple of the replacement length (or stops with an error if x has not been created yet). What you probably want to use is a list, as you will see in the example below.
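As a small illustration of the Maple-style v[1][n] pattern, a list can hold vectors of different lengths, with [[ selecting a vector and [ selecting an element inside it. The list v and its values below are made up for the example.

```r
# v is a hypothetical list holding two vectors of different lengths
v <- list(c(10, 20, 30), c(5, 6))

v[[1]]            # the whole first vector: 10 20 30
v[[1]][2]         # second element of the first vector: 20
v[[2]][1] <- 99   # assign to one element of the second vector
lengths(v)        # 3 2

is.list(v[1])     # TRUE: single brackets return a sub-list, not the vector itself
```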
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0, max=1)
  # as in the question: first distribution when x <= 0.1056379, second otherwise
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
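To see the shape of what lapply returns, here is a self-contained sketch that replaces the rnegbin() draws with a few made-up counts, so it runs without MASS. M and draw_claims are stand-ins for illustration, not the original data or function.

```r
set.seed(1)
M <- c(3, 0, 5)   # hypothetical counts standing in for the simulated negbin values

# One vector of log-normal draws per count, with parameters chosen by a uniform draw
draw_claims <- function(m) {
  u <- runif(m)
  ifelse(u <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

X <- lapply(M, draw_claims)
lengths(X)   # 3 0 5: one vector per element of M, each of length M[k]
X[[3]][2]    # second entry of the third vector
```

A count of zero simply produces an empty vector, so no special handling is needed for it.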

Vectorize loop in R

I have an array named P_Array with 100,000 data points and need to calculate the first-order autocorrelation for subintervals of length 100, i.e. from 1:100 and 2:101 etc. I've written a loop which works just fine, but is very slow.
Tf <- 100000
acf_Array <- rep(0, length.out = Tf-100)
for (t in 1:(Tf-100)){
acf_Array[t] <- acf(P_Array[t:(t+100)])$acf[2]
}
My idea was to use something like
acf_Array[1:(Tf-100)] <- acf(P_Array[(1:(Tf-100)):(101:Tf)])$acf[2]
which, however, does not work. Any suggestions?
Edit
I think this will do the trick
for (t in 1:(Tf-100)){
acf_Array[t] <- cor(P_Array[t:(t+98)], P_Array[(t+1):(t+99)])
}
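To answer the specific question on vectorising the for loop, this is my answer (note the parentheses, because : binds more tightly than + and -, and plot = FALSE, which stops acf() from drawing a plot on every call):
acf_Array <- sapply(1:(Tf-100), function(x) acf(P_Array[x:(x+100)], plot = FALSE)$acf[2])
But as mentioned in the comments, the speed-limiting bit is probably the acf function.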
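When writing index ranges like the ones in this thread, operator precedence is easy to get wrong: in R the : operator is evaluated before + and -. A short check with small, arbitrary numbers:

```r
Tf <- 5
1:Tf - 1      # parsed as (1:Tf) - 1, giving 0 1 2 3 4
1:(Tf - 1)    # 1 2 3 4

x <- 2
x:x + 3       # parsed as (x:x) + 3, giving 5
x:(x + 3)     # 2 3 4 5
```

Without the parentheses, a window like x:x+100 selects a single element rather than 101 consecutive ones.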

vector-matrix multiplication in r

I want to multiply a matrix by each of 1000 random values so as to get 1000 different resultant matrices.
I'm running the following code:
Threshold <- runif(1000,min=0,max=1) #Generating 1000 random variables so that we can see 1000 multiple results of Burstscore
Burstscore <- matrix(data=0,nrow=nrow(Fm2),ncol=ncol(Fpre2))
#Calculating the final burst score
for (i in 1:nrow(Fm2)){
for (j in 1:ncol(Fpre)){ #Dimentions of all the matrices (Fpre,Fm,Growth,TD,Burstscore) are 432,24
{
Burstscore[i,j]= ((as.numeric(Threshold))*(as.numeric(Growth[i,j]))) + ((1-(as.numeric(Threshold)))*(as.numeric(TD[i,j])))
}
}
}
I'm getting the following error -
'Error in Burstscore[i, j] = ((as.numeric(Threshold)) * (as.numeric(Growth[i, :
number of items to replace is not a multiple of replacement length'
You are trying to put 1000 values into one cell of the Burstscore matrix (as you are multiplying each [i,j] element by the entire "Threshold" vector). Apart from this, your code contains unnecessary elements (brackets or as.numeric() statements). And, of course, as said above, it is not fully reproducible, and I had to "invent" several matrices.
I guess that what you want to do is the following:
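Threshold <- runif(1000,min=0,max=1)
# Invented example matrices with the dimensions given in the question (432 x 24)
Growth <- matrix(runif(432*24), ncol=24)
TD <- matrix(runif(432*24), ncol=24)
Burstscore <- vector("list", length(Threshold))
for (i in seq_along(Threshold)) {
  Burstscore[[i]] <- (Threshold[i]*Growth) + ((1-Threshold[i])*TD)
}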
In R, it would be even more elegant to use a lapply() function:
Burstscore <- lapply(Threshold, function(x) (x*Growth)+((1-x)*TD))
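A scaled-down sketch of the same idea, with three weights and tiny made-up matrices so the result is easy to inspect. The values of Growth and TD here are invented for illustration only.

```r
set.seed(42)
Threshold <- runif(3)               # 3 weights instead of 1000
Growth <- matrix(1:6, nrow = 2)     # hypothetical 2 x 3 matrices
TD     <- matrix(6:1, nrow = 2)

# One weighted blend of Growth and TD per threshold value
Burstscore <- lapply(Threshold, function(w) w * Growth + (1 - w) * TD)

length(Burstscore)      # 3: one matrix per threshold
dim(Burstscore[[1]])    # 2 3: same shape as the inputs
```

Each list element keeps the dimensions of the input matrices, so Burstscore[[k]][i, j] gives cell (i, j) of the kth result.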
Finally, I suggest you also put a more meaningful title to your question, so it could potentially be helpful to others also.
