Count even numbers from function output in R

I am working through the Euler Problems, and the problem is to sum the even terms in a Fibonacci sequence up to the length where the last term is < 4e6. I got it eventually but the following method of counting the even numbers did not work, and I am curious as to why.
First, this method of counting even numbers from a sequence works:
numbers <- 1:32
N <- length(numbers)
total <- rep(0,N)
for (i in numbers){
if(i %% 2 == 0) total[i] <-i
}
sum(total) #272
Then, this Fibb sequence works:
Fibb<-function(x){
y <- 1:x
y[1] = 1
y[2] = 2
for (i in 3:x){
y[i] <- y[i-2] + y[i-1]
}
return(y)
}
but the same sum function I used on the first sequence doesn't work:
numbers <- as.integer(Fibb(32)) # 1, 2, 3, 5, 8, 13, 21...
N <- length(numbers)
total <- rep(0,N)
for (i in numbers){
if(i %% 2 == 0) total[i] <-i
}
sum(total) #NA
The total of the third chunk is a large numeric, mostly composed of NAs.
EDIT: What I'd like to know is why the loop in the first block of code runs correctly while the one in the third does not; I copied and pasted lines 6-7 from the first chunk to the third, and the only difference is the "numbers" sequence.
Has anyone encountered a problem like this?
Thanks!

It is because you are using elements of numbers as your index into total.
See how you have for (i in numbers). So (for example) when considering the Fibonacci number 2584 in numbers, you are setting total[2584] <- 2584.
Your eventual total vector is 3524578 elements long (!!) when it only needs to be 32 long. Every element beyond position 32 that you never assign is filled with NA, and the sum of a vector containing NA is NA.
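The padding behaviour can be seen on a small scale. This is only an illustrative sketch with made-up values, not the asker's data:

```r
total <- rep(0, 3)        # a vector of length 3
total[6] <- 6             # assign past the end: R grows the vector
# total is now 0 0 0 NA NA 6 (positions 4 and 5 were never assigned)
padded_sum <- sum(total)                 # NA, because NA propagates through sum()
clean_sum  <- sum(total, na.rm = TRUE)   # 6, once the NAs are dropped
```

Using na.rm = TRUE would hide the symptom here, but the vector would still be millions of elements long, so fixing the index is the better repair.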
Separate out your Fibonacci number (which can be arbitrarily large) from your index into total (which only goes up to 32). To make the index, you can use seq_along(numbers) which is essentially 1:length(numbers). Then use numbers[i] to get that Fibonacci number.
for (i in seq_along(numbers)) {
if(numbers[i] %% 2 == 0) total[i] <- numbers[i] # store the even term itself so that sum(total) adds the even terms
}
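Putting the pieces together, here is a minimal self-contained sketch. The names fib, loop_sum and vec_sum are mine, and the sequence is rebuilt inline rather than calling Fibb() so the block runs on its own. It also shows a vectorised alternative that avoids the loop entirely:

```r
# Same sequence as Fibb(32) in the question: 1, 2, 3, 5, 8, ...
fib <- numeric(32)
fib[1:2] <- c(1, 2)
for (k in 3:32) fib[k] <- fib[k - 2] + fib[k - 1]

# Loop version: iterate over positions, store the even values
total <- numeric(length(fib))
for (k in seq_along(fib)) {
  if (fib[k] %% 2 == 0) total[k] <- fib[k]
}
loop_sum <- sum(total)

# Vectorised version: keep only the even terms, then sum them
vec_sum <- sum(fib[fib %% 2 == 0])

loop_sum   # 4613732
vec_sum    # 4613732
```

Both versions keep total (or the subset) at most 32 elements long, because the Fibonacci values are never used as indices.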

Related

R programming: How to set while loop condition based if all required values in vector have been copied from sample?

I am new to R and I'm trying to see how many iterations are needed to fill a vector with the numbers 1 to 55 (no duplicates) from a random sample generated with runif.
At the moment, the vector has lots of duplicates in it, and the number of iterations being returned is the size of the vector, so I'm not sure if my logic is correct.
The aim of the if statement is to check whether the value from the sample already exists in the vector and, if it does, to choose the next one. But I'm not sure if that is correct, since the next number could also already exist in the vector. Any help would be much appreciated.
numbers=as.integer(runif(800, min=1, max=55)) ## my sample from runif
i=sample(numbers, 1)
## setting up my vector to store 55 unique values (1 to 55)
p=rep(0,55)
## my counters
j=0
n=1
## my while loop
while (p[n] %in% 0){
## if the sample value already exists in the vector, choose the next value from the sample
if (numbers[n] %in% p) {
p[n]=numbers[n+1]
}
else {
p[n] = numbers[n]
}
n = n + 1
j = j + 1
}
I believe that the following is what you want. Instead of a while loop on p, the while loop should search for a new value in numbers.
set.seed(2021) # make the results reproducible
numbers <- sample(55, 800, TRUE)
## setting up my vector to store 55 unique values (1 to 55)
p <- integer(55)
# assign the elements of p one by one
for(j in seq_along(p)){
## if the sample value already exists in the vector,
## choose the next value from the sample
n <- 1
while (numbers[n] %in% p) {
n <- n + 1
}
if(n <= length(numbers)){
p[j] <- numbers[n]
}
}
j
#[1] 55
length(unique(p)) == length(p)
#[1] TRUE
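If the goal is the number of draws needed before every value has appeared at least once, one possible sketch uses match() to find the first position of each value. The helper name draws_to_collect is hypothetical, and this assumes "iterations" means "how far into the sample you have to read":

```r
# Returns the position in `draws` at which the last still-missing value
# of `values` first appears, or NA if some value never appears at all.
draws_to_collect <- function(draws, values) {
  first_seen <- match(values, draws)   # index of first occurrence, NA if absent
  if (anyNA(first_seen)) return(NA_integer_)
  max(first_seen)
}

set.seed(2021)                               # reproducible sample
numbers <- sample(55, 800, replace = TRUE)   # 800 draws from 1..55
needed <- draws_to_collect(numbers, 1:55)

# The distinct values in order of first appearance, with no explicit loop
p <- unique(numbers)
```

On a tiny input, draws_to_collect(c(2, 2, 1, 3, 1), 1:3) returns 4, since the value 3 first shows up at position 4.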

Algorithm that gives you any number n in base 3 in R

I need to write an algorithm that gives you any number n in base 3 in R.
So far I wrote that:
NameOfTheFunction <- function(n) { while (n != 0) {
{q<- n%/%3}
{r <- n%%3}
{return(r)}
q<- n } }
My problem is that I now need to store every r in a vector. I've never done that and don't quite know how to handle it. I searched online but did not find anything really relevant to this particular situation.
After your function, use:
sapply(vector, FUN=function(n) return(NameOfTheFunction(n)))
What sapply does is, for a given vector of your choice, it will repeat the function NameOfTheFunction(n) using every element in your vector in place of n in the function. The result, in this case, will be a vector of every output from your vector.
For example:
vector <- c(10, 100, 1000, 10000)
NameOfTheFunction <- function(n) { while (n != 0) {
{q<- n%/%3}
{r <- n%%3}
{return(r)}
q<- n } }
sapply(vector, NameOfTheFunction)
[1] 1 1 1 1
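Note that the function above returns on the first pass through the loop, so each result is only the last base-3 digit (n %% 3). To keep every remainder, the loop has to collect them before returning. A minimal sketch, assuming non-negative whole numbers as input (to_base3 is a name I made up):

```r
to_base3 <- function(n) {
  stopifnot(n >= 0, n == floor(n))   # assumption: non-negative whole numbers only
  if (n == 0) return(0)
  digits <- numeric(0)
  while (n != 0) {
    digits <- c(n %% 3, digits)   # prepend, so the most significant digit ends up first
    n <- n %/% 3                  # move on to the quotient
  }
  digits
}

to_base3(10)                            # 1 0 1, since 10 = 1*9 + 0*3 + 1
paste(to_base3(100), collapse = "")     # "10201"
lapply(c(10, 100, 1000), to_base3)      # a list, because the digit counts differ
```

lapply is used instead of sapply in the last line because the results have different lengths and cannot be simplified to a single vector.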

Running random operations over vector, conditional

I am doing some modelling and wish to simulate randomness.
I have a total number of runs run_times which is 5 in this example.
A vector run_lengths holds the possible run lengths: if the run length is 3, a 1 is printed 3 times.
The sample_data vector contains 1's and 0's. Runs of 1's are started at random at positions where sample_data == 1; not every such position is to be picked, only a random subset, and a run can be started at most run_times (5) times in total.
There are a few moving parts, for sure.
I am tackling the problem in this manner:
I am able to select run_lengths at random with sample(run_lengths, 1). I am unsure how to select sample_data at random and I'm trying to keep a counter in order to stay under run_times:
run_lengths <- c(2,4,5,6,7,8,1)
run_times <- 5
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
# Randomly select 1's from sample_data, when find 1, randomly print 1's along run_lengths
# Only print a certain amount of times (run_times)
# Pick run_lengths at random == sample(run_lengths,1)
# Pick df$sample 1's at random, how to randomly select????
count <- 0 # keep track of how many random run_lengths is being applied
res <- NULL
while (length(res) < length(sample_data)) {
if (sample_data[length(res)+1] == 1 & count < run_times) { # not sure how to pick sample_data == 1?
res <- c(res, rep(1,sample(run_lengths,1))) # if signal == 1 (randomly) then randomly rep a run_length
count <- count +1 # count how many random reps, run_lengths have been applied
} else {
res <- c(res, 0) # Note if condition is not true, we print 0 vs 1
}
}
res <- res[1:length(sample_data)]
res
I have completed maybe 60% of it. I'm not sure what the best approach is for choosing random 1's from sample_data. I'm also not sure how to keep the number of applied run_lengths under the run_times maximum. I am attempting to keep a count of how often the condition was true; once the maximum is exceeded, any further true conditions should be ignored.
OK, time to put down some code, although I'm still not sure whether it's right:
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
# take indices of sampled data where value == 1
i <- which(sample_data %in% 1)
# now shuffle them all, no replacement - random positions with 1s
p <- sample(i, length(i), replace=FALSE)
print(sample_data[p[1]])
print(sample_data[p[2]])
print(sample_data[p[3]])
...
Is this what you want?
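One possible reading of the requirement is: pick at most run_times random start positions among the 1's, then write a randomly chosen run length of 1's from each start. The sketch below follows that reading only. The shortened sample_data, the variable names, and the choice to let runs overlap or be cut off at the end are all assumptions:

```r
set.seed(42)
sample_data <- c(0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0)
run_lengths <- c(2, 4, 5)
run_times   <- 3

ones <- which(sample_data == 1)          # candidate start positions
n_starts <- min(run_times, length(ones)) # never more starts than candidates

# sample.int() on the length avoids the sample(x, 1) surprise when x has one element
starts <- sort(ones[sample.int(length(ones), n_starts)])
lens   <- run_lengths[sample.int(length(run_lengths), n_starts, replace = TRUE)]

res <- integer(length(sample_data))      # all zeros to begin with
for (k in seq_along(starts)) {
  last <- min(starts[k] + lens[k] - 1, length(res))  # truncate at the end of the vector
  res[starts[k]:last] <- 1L
}
res
```

Because the starts are drawn without replacement, the run_times cap holds by construction and no counter is needed.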

Faster solution to looped grouped RLE calculation

I have a working solution to my problem, but I will not be able to use it because it is so slow (my calculations predict that the whole simulation will take 2-3 years!). Thus I am looking for a better (faster) solution. This is (in essence) the code I am working with:
N=4
x <-NULL
for (i in 1:N) { #first loop
v <-sample(0:1, 1000000, 1/2) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
# second loop
for (j in 1:2000) { #second loop
#count rle for group 1 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==1&v$p==j])))
#count rle for group 2 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==2&v$p==j])))
} #end second loop
} #end first loop
#total rle counts for both group 1 & 2
y <-aggregate(x, list(as.numeric(rownames(x))), sum)
In words: The code generates a coin-flip simulation (v). A group factor is generated (1 & 2). A p.number factor is generated (1:2000). The run lengths are recorded for each p.number (1:2000) for both groups 1 & group 2 (each p.number has runs in both groups). After N loops (the first loop), the total run lengths are presented as a table (aggregate) (that is, the run lengths for each group, for each p.number, over N loops as a total).
I need the first loop because the data that I am working with comes in individual files (so I'm loading the file, calculating various statistics etc and then loading the next file and doing the same). I am much less attached to the second loop, but can't figure out how to replace it with something faster.
What can be done to the second loop to make it (hopefully, a lot) faster?
You are committing the cardinal sin of growing an object within a for() loop in R. Don't (I repeat don't) do this. Allocate sufficient storage for x at the beginning and then fill in x as you go.
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
Then in the inner loop
x[ii, ] <- table(rle(....))
where ii is a loop counter that you initialise to 1 before the first loop and increment within the second loop:
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
ii <- 1
for(i in 1:N) {
.... # stuff here
for(j in 1:2000) {
.... # stuff here
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
} ## end inner loop
} ## end outer loop
Also note that you are reusing index i in both for() loops, which will not work. i is just a normal R object, and so both for() loops will be overwriting it as they progress. Use j for the second loop as I did above.
Try that simple optimisation first and see if that will allow the real simulation to complete in an acceptable amount of time. If not, come back with a new Q showing the latest code and we can think about other optimisations. The optimisation above is simple to do, optimising table() and rle() might take a lot more work. Noting that, you might look at the tabulate() function which does the heavy lifting in table(), which might be one avenue for optimising that particular step.
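As a rough sketch of the preallocation idea combined with tabulate(), the block below fills a fixed-width matrix of run-length counts. The sizes are scaled down, and max_run is a hypothetical upper bound I chose so that every row has the same number of columns; longer runs are lumped into the last column. It counts runs of 0's and 1's together, which is a simplification of the original table(rle(...)) output:

```r
set.seed(1)
n_groups <- 6     # stands in for N * 2000 * 2 in the real simulation
max_run  <- 20    # assumed cap on run length; longer runs go in the last bin

# Allocate the full result once, then fill it row by row
x <- matrix(0L, nrow = n_groups, ncol = max_run)
for (ii in seq_len(n_groups)) {
  flips <- sample(0:1, 50, replace = TRUE)   # toy coin-flip data
  runs  <- rle(flips)
  # tabulate() counts how many runs have length 1, 2, ..., max_run
  x[ii, ] <- tabulate(pmin(runs$lengths, max_run), nbins = max_run)
}

colSums(x)   # total number of runs of each length across all groups
```

Each row always has max_run entries, so the assignment never changes the shape of x and nothing is copied on each iteration.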
If you just want to run rle and table for each combination of the values of v$t and v$p separately, there is no need for the second loop. It is much faster in this way:
values <- v$v + v$t * 10 + v$p * 100
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- table(runlength)
y <- aggregate(unclass(x), list(as.numeric(rownames(x))), sum)
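The trick above works because adding an even multiple of the group codes forces rle() to start a new run whenever the group changes, while %% 2 recovers the original 0/1 value afterwards. A toy illustration with made-up data:

```r
v <- c(1, 1, 1, 1, 0, 0)   # coin flips
g <- c(1, 1, 2, 2, 2, 2)   # group labels

plain <- rle(v)             # ignores groups: one run of four 1's, one run of two 0's
keyed <- rle(v + 10 * g)    # 11 11 21 21 20 20: the run of 1's is split at the group boundary
keyed$values <- keyed$values %% 2   # back to 1 1 0, since 10 * g is always even

plain$lengths   # 4 2
keyed$lengths   # 2 2 2
```

The multipliers must be even (as 10 and 100 are) for the %% 2 step to return the original values, and large enough that distinct group combinations never collide.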
The whole code will look like this. If N is as low as 4, the growing object x will not be a severe problem. But generally I agree with @GavinSimpson that it is not a good programming technique.
N=4
x <-NULL
for (i in 1:N) { #first loop
v <-sample(0:1, 1000000, 1/2) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
values <- v$v + N * 10 + v$t * 100 + v$p * 1000
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- rbind(x, table(runlength))
} #end first loop
y <-aggregate(x, list(as.numeric(rownames(x))), sum) #tota

Subtracting from random values in a weighted matrix in R

and thanks in advance for your help!
This question is related to one I posted before, but I think it deserves its own post because it is a separate challenge.
Last time I asked about randomly selecting values from a matrix after adding a vector. In that example, the matrix and the vector were both binary. Now I would like to change the values in a weighted matrix after adding a weighted vector. Here is some example code to play with.
require(gamlss.dist)
mat1<-matrix(c(0,0,0,0,1,0, 0,10,0,0,0,5, 0,0,0,0,1,0, 0,0,3,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,1,1,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,0,0,1,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1)) #rZIP is a function from gamlss.dist that randomly selects values from a zero-inflated distribution
vec1[ones]<-temp
The values in the vector are sampled from a zero-inflated distribution (thanks to this question). When I bind the vector to the matrix, I want to randomly select a non zero value from the same column, and subtract the vector value from it. I can see a further complication arising if the vector value is greater than the randomly selected value in the same column. In such an instance, it would simply set that value to zero.
Here is some modified code from the earlier question that does not work for this problem but maybe will be helpful.
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec != 0) #select matrix columns where the vector is not zero
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] != 0)
out <- if(length(ones) != 0) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows #this line doesn't work b/c it is not binary
mat[ind] <- 0 #here is where I would like to subtract the vector value
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
Any ideas? Thanks again for all of the fantastic help!
EDIT:
Thanks to help from bnaul down below, I am a lot closer to the answer, but we have run into the same problem we hit last time. The sample function doesn't work properly on columns where there is only one nonzero value. I have fixed this using Gavin Simpson's if else statement (which was the solution in the previous case). I've adjusted the matrix to have columns with only one nonzero value.
mat1<-matrix(c(0,0,0,0,1,0, 0,0,0,0,0,5, 0,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,0,0,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,1,0,0,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1))
vec1[ones]<-temp
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) { # col is one column of mat2; its last element is the vector value
# rows holding nonzero values, excluding the appended vector row (head(col, -1) drops the last element)
nonzero = which(head(col,-1) != 0)
# sample() treats a single number n as 1:n, so a lone nonzero index is used directly
sample_ind = if(length(nonzero) == 1){
nonzero
} else{
sample(nonzero, 1)
}
# subtract the vector value from the selected entry, flooring the result at zero
col[sample_ind] = max(0, col[sample_ind] - tail(col,1))
return(col)
}
)
Thanks again!
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
nonzero = which(head(col,-1) != 0);
sample_ind = sample(nonzero, 1);
col[sample_ind] = max(0, col[sample_ind] - tail(col,1));
return(col)
}
)
I made a couple of simplifications; hopefully they don't conflict with what you had in mind. First, I ignore the requirement that you only operate on the nonzero elements of the vector, since subtracting 0 from anything will not change it. Second, I bind the matrix and vector and then perform the operation column-wise on the result, since this is a bit easier than tracking the indices in two separate data structures and then combining them afterward.
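The one-nonzero-value problem mentioned in the question's edit comes from sample(x, 1) treating a single number x as 1:x. A hedged sketch of a per-column helper that sidesteps this by sampling positions with sample.int() follows. The resample idiom appears in the examples of R's ?sample help page, while subtract_from_random_nonzero is a name I made up, and returning all-zero columns unchanged is my own assumption:

```r
# Sample from the elements of x, even when x has length one
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Subtract `amount` from one randomly chosen nonzero entry, flooring at zero
subtract_from_random_nonzero <- function(column, amount) {
  nonzero <- which(column != 0)
  if (length(nonzero) == 0) return(column)   # assumption: leave all-zero columns alone
  pick <- resample(nonzero, 1)
  column[pick] <- max(0, column[pick] - amount)
  column
}

subtract_from_random_nonzero(c(0, 0, 5, 0), 2)   # 0 0 3 0 (only one candidate)
subtract_from_random_nonzero(c(0, 4, 0), 10)     # 0 0 0   (floored at zero)
```

Inside the apply() call above, this would be used as subtract_from_random_nonzero(head(col, -1), tail(col, 1)), with the vector value re-appended afterwards if the bound row should be kept.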
