while loop with conditions - r

I have been trying to compute a bigger function, and one part of it is a while loop with 2 conditions. For each value of k in a certain range of values (x_min and x_max are computed within the whole function), I am trying to compute a matrix with values from a distribution in which k itself is a parameter. The while loop checks that the necessary conditions for the distribution are met, while the foreach function should run the while loop for every element of k. Since I do not know the exact number of elements in k, I thought the problem might be the preallocation of I. The best I could get was an endless computation within the loops (or R simply crashing). I am thankful for any suggestion!
I <- mat.or.vec(n, 10000)
k <- x[x_min < x & x < x_max]
foreach(k) %do% {
  for(i in 1:100){
    check = 0
    while(check == 0){
      I <- replicate(n = 100, rbinom(n = 250, size = 1, prob = k/250))
      if(sum(I[,i]) == k) check = 1
    }
  }
}
Changing the order unfortunately did not work.
It seems to still have problems. I tried to extract the matrix, but it is reporting "NULL".
n is 250, x_min and x_max are defined from a formula (around 2-8), k is defined as in the code given above, and x are values between 0 and around 10 (also computed within the formula). I would provide you with the whole formula, but it is big and I could narrow down the problems to these parts, so I wanted to keep the problem as simple as possible. Thank you for your help and comments!
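A minimal sketch of one way to restructure the loops, under several assumptions that the post does not confirm: k holds integers (if k contains non-integer values, sum(I[,i])==k can never be TRUE and the while loop never ends), each column is redrawn on its own instead of regenerating the whole matrix, and the results are collected in a list with one matrix per k. The names draw_column and k_values are made up for illustration.

```r
set.seed(1)
n <- 250

# Redraw a single column until its sum equals k (rejection sampling).
# Assumes k is an integer between 0 and n.
draw_column <- function(k, n) {
  repeat {
    col <- rbinom(n, size = 1, prob = k / n)
    if (sum(col) == k) return(col)
  }
}

# Hypothetical integer values standing in for x[x_min < x & x < x_max]
k_values <- c(2, 3, 5)

# One 250 x 100 matrix per value of k, returned explicitly
result <- lapply(k_values, function(k) replicate(100, draw_column(k, n)))

dim(result[[1]])
```

Returning the matrix explicitly matters here: a for loop evaluates to NULL in R, so a foreach body that ends with a for loop hands back NULL for every k, which may be where the reported "NULL" came from.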

Related

How do I perform a simulation to find a z-score (x) with a given probability in R

As the title says, I would like to run a simulation. I was given a probability P(L>x)=0.05, where L follows a normal distribution with mean=0, std=100. I was asked to perform some sort of simulation, IDEALLY using a hit-or-miss approach repeated multiple times, in order to find an appropriate x. I am not allowed to use the qnorm() function. Can you please help me out? Thank you
As we want P(L>x)=0.05, we can create a function that calculates P(L>x)-0.05, and find the x that turns it to 0 (its root) with uniroot:
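prob = function(x){
  n = 10000
  L = rnorm(n, 0, 100)
  # estimated P(L > x) minus the target 0.05
  sum(L > x)/n - 0.05
}
# x is in the upper tail, so the search interval is on the positive side
uniroot(prob, c(0, 400))
Note: the second argument of uniroot is the interval in which it searches for the root. prob must have opposite signs at the two endpoints, so the interval has to contain the x we are looking for.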
This will find a different root every time you run it as L is created inside prob. For better accuracy, you can increase n.
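A hedged variant of the same idea: if the sample is drawn once and reused for every x, the hit-or-miss estimate becomes a fixed step function of x, so the root search no longer chases fresh noise on every call. The names tail_prob and x_hat and the sample size are choices made for this sketch only.

```r
set.seed(1)
n <- 1e6
L <- rnorm(n, mean = 0, sd = 100)   # one fixed sample, reused for every x

# Hit-or-miss estimate of P(L > x): the share of draws that land above x
tail_prob <- function(x) mean(L > x)

# Find x where the estimated tail probability crosses 0.05
x_hat <- uniroot(function(x) tail_prob(x) - 0.05, interval = c(0, 400))$root
x_hat
```

The result should land close to 100 * 1.645 = 164.5, the theoretical 95% point of this distribution, with the remaining error shrinking as n grows.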

Getting incorrect values of theta while trying to implement stochastic gradient descent

I am trying to implement Stochastic Gradient Descent algorithm for logistic regression. I have written a small train function whose job is to get the theta values / coefficients. But the values of theta come out to be incorrect and are same as the one initialised. I could not understand the reason for this. Is it not the correct way to implement stochastic gradient descent?
Here is the code I wrote for it:
train <- function(data, labels, alpha = 0.0009) {
theta <- seq(from = 0, to = 1, length.out = nrow(data))
label <- label[,shuffle]
data <- data[,shuffle]
for(i in seq(1:ncol(data))) {
h = hypothesis(x, theta)
theta <- theta - (alpha * ((h - y) * data[,i]))
}
return(theta)
}
Please note that, each column in the data frame is one input. There are 20K columns and 456 rows. So, 20K input values for training. The corresponding data frame named labels has the correct value for the input training data. So for example column 45 in data has its corresponding y value in column 45 of labels.
In the regression above, I am trying to train a model to predict between label 1 and label 0. So labels is a data frame that consists of 0s and 1s.
I can't debug this for you without a minimal, complete, and verifiable example, but I can offer you a tool to help you debug it:
add browser() in the body of your function like this:
train <- function(data, labels, alpha = 0.001) {
browser()
# ... the rest of your function
Call train with your data. This will open up a browser session. You can enter help (not the function, just help) to get the commands to navigate in the browser, but in general, use n and s to step through the statements (s will step into a nested function call, n will step over). If you do this in RStudio, you can keep an eye on your environment tab to see what the values for, e.g., theta are, and see a current traceback. You can also evaluate any R expression, e.g., tail(theta) in the executing environment. Q exits the browser.
I'd recommend exploring what hypothesis returns in particular (I'd be surprised if it's not almost always 1). But I think you have other issues causing the undesired behavior you described (the return value for theta isn't changing from its initial assignment).
EDIT:
Fix the typo: label should be labels each time.
Compare the sum of your return with the sum of theta as it is initialized, and you'll see that the return value is not the same as your initialized theta. Hope that helped!
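For comparison, a minimal sketch of what a per-observation (stochastic) update could look like with one input per column. It assumes hypothesis() is the usual sigmoid of the dot product (the original definition is not shown), starts theta at zero, and adds an epochs argument that is not in the question. The synthetic data at the bottom is only there to make the sketch runnable.

```r
# Assumed form of the hypothesis: sigmoid of the dot product x . theta
hypothesis <- function(x, theta) 1 / (1 + exp(-sum(x * theta)))

train <- function(data, labels, alpha = 0.01, epochs = 1) {
  data   <- as.matrix(data)
  labels <- as.numeric(unlist(labels))   # also handles a one-row data frame
  theta  <- rep(0, nrow(data))           # one coefficient per feature (row)
  for (e in seq_len(epochs)) {
    shuffle <- sample(ncol(data))        # visit the columns in random order
    for (i in shuffle) {
      x <- data[, i]
      y <- labels[i]
      h <- hypothesis(x, theta)
      theta <- theta - alpha * (h - y) * x   # update from this one observation
    }
  }
  theta
}

# Synthetic check: 3 features (rows), 2000 observations (columns)
set.seed(42)
X <- matrix(rnorm(3 * 2000), nrow = 3)
true_theta <- c(2, -1, 0.5)
y <- rbinom(2000, 1, 1 / (1 + exp(-colSums(X * true_theta))))
theta_hat <- train(X, y, alpha = 0.05, epochs = 5)
theta_hat
```

The points to compare against the original function are that x and y are taken from the current column inside the loop, and that shuffle is actually defined before it is used.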

Monte-Carlo Simulation for the sum of die

I am very new to programming so I apologise in advance for my lack of knowledge.
I want to find the probability of obtaining the sum k when throwing m dice. I am not looking for a direct answer, I just want to ask if I am on the right track and what I can improve.
I begin with a function that calculates the sum of an array of m dice:
function dicesum(m)
j = rand((1:6), m)
sum(j)
end
Now I am trying specific values to see if I can find a pattern (but without much luck). I have tried m = 2 (two dice). What I am trying to do is to write a function which checks whether the sum of the two dice is k and, if it is, calculates the probability. My attempt is very naive but I am hoping someone can point me in the right direction:
m = 2
x, y = rand(1:6), rand(1:6)
z = x+y
if z == dicesum(m)
Probability = ??/6^m
I want to somehow find the number of 'elements' in dicesum(2) in order to calculate the probability. For example, consider the case when dicesum(2) = 8. With two dice, the possible outcomes are (2,6), (6,2), (5,3), (3,5) and (4,4). Since (4,4) can occur in only one way, the probability is 5/36.
I understand that the general case is far more complicated but I just want an idea of how to begin this problem. Thanks in advance for any help.
If I understand correctly, you want to use simulation to approximate the probability of obtaining a sum of k when rolling m dice. What I recommend is creating a function that takes k and m as arguments and repeats the simulation a large number of times. The following might help you get started:
function Simulate(m,k,Nsim=10^4)
    #Initialize the counter
    cnt=0
    #Repeat the experiment Nsim times
    for sim in 1:Nsim
        #Simulate roll of m dice
        s = sum(rand(1:6,m))
        #Increment counter if sum matches k
        if s == k
            cnt += 1
        end
    end
    #Return the estimated probability
    return cnt/Nsim
end
prob = Simulate(3,4)
The estimate is approximately .0131.
You can also perform your simulation in a vectorized style, as shown below. It's less efficient in terms of memory allocation because it creates a vector s of length Nsim, whereas the loop code uses a single integer counter, cnt. Sometimes unnecessary memory allocation can cause performance issues. In this case, it turns out that the vectorized code is about twice as fast, although loops are usually a bit faster. Someone more familiar with the internals of Julia might be able to offer an explanation.
using Statistics  #provides mean in Julia 1.x
function Simulate1(m,k,Nsim=10^4)
    #Simulate roll of m dice Nsim times
    s = sum(rand(1:6,Nsim,m), dims=2)
    #Relative frequency of matches
    prob = mean(s .== k)
    return prob
end
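For small m the simulated value can be checked against an exact count. The sketch below is written in R rather than Julia, and the function names simulate_dice and exact_dice are invented for illustration. It enumerates all 6^m outcomes and compares the exact share with sum k against a simulated estimate.

```r
# Monte Carlo estimate of P(sum of m dice == k)
simulate_dice <- function(m, k, nsim = 1e5) {
  rolls <- matrix(sample(1:6, m * nsim, replace = TRUE), nrow = nsim)
  mean(rowSums(rolls) == k)
}

# Exact probability by listing all 6^m equally likely outcomes
# (only practical for small m)
exact_dice <- function(m, k) {
  outcomes <- expand.grid(rep(list(1:6), m))
  mean(rowSums(outcomes) == k)
}

set.seed(1)
exact_dice(2, 8)      # five of the 36 outcomes sum to 8
simulate_dice(2, 8)   # should be close to the exact value
```

The enumeration grows as 6^m, so it is only a sanity check for a handful of dice. The simulation is the part that scales.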

How do I get started with this?

So I am stuck on this problem for a long time.
I was thinking I should first create the two functions, like this:
n = runif(10000)
int sum = 0
estimator1_fun = function(n){
for(i in 1:10000){
sum = sum + ((n/i)*runif(1))
)
return (sum)
}
and do the same for the other function, and use the mse formula? Am I even approaching this correctly? I tried formatting it, but found that using an image would be better.
Assuming U(0,Theta_0) is the uniform distribution from 0 to Theta_0, and that Theta_0 is a fixed constant, I would proceed as follows:
1. Define Theta_0. Give it a fixed value.
2. Write the function that gives a random number from that distribution
- The sampling function is runif(N, 0, Theta_0).
- Arguments could be Theta_0 and N.
3. Sample it a few thousand (or whatever) times into a vector X.
4. Calculate the two estimates.
5. Repeat steps 3 & 4 for more samples
6. Plot the two estimates against the number of samples and see if it is approaching Theta_0
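The steps above can be sketched as follows. The two estimators from the question were given in an image that is not reproduced here, so this sketch assumes two common choices for U(0, Theta_0), namely twice the sample mean and the sample maximum. Treat both as placeholders for the real ones. The value of theta0 and the sample sizes are also arbitrary.

```r
set.seed(1)
theta0 <- 5       # step 1: fixed true value (arbitrary choice)
n_obs  <- 50      # observations per sample
n_rep  <- 10000   # number of repeated samples

# Steps 3 and 4: draw one sample and compute both (assumed) estimators
one_run <- function() {
  x <- runif(n_obs, min = 0, max = theta0)
  c(moment = 2 * mean(x), maximum = max(x))
}

# Step 5: repeat; the result is a 2 x n_rep matrix, one row per estimator
estimates <- replicate(n_rep, one_run())

# Mean squared error of each estimator around the true theta0
mse <- rowMeans((estimates - theta0)^2)
mse
```

Swapping in the actual estimators only requires changing the two expressions inside one_run.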

Find the second derivative of a log likelihood function

I'm interested in finding the values of the second derivatives of the log-likelihood function for logistic regression with respect to all of my m predictor variables.
Essentially I want to make a vector of m $\partial^2 L / \partial \beta_j^2$ values where j goes from 1 to m.
I believe the second derivative should be $-\sum_{i=1}^{n} x_{ij}^2 \, e^{x_i \beta} / (1 + e^{x_i \beta})^2$ and I am trying to code it in R. I did something dumb when trying to code it and was wondering if there was some sort of sapply function I could use to do it more easily.
Here's the code I tried (I know the sum in the for loop doesn't really do anything, so I wasn't sure how to sum those values).
for (j in 1:m)
{
for (i in 1:n)
{
d2.l[j] <- -1*(sum((x.center[i,j]^2)*(exp(logit[i])/((1 + exp(logit[i])^2)))))
}
}
And logit is just a vector consisting of Xβ if that's not clear.
I'm hazy on the maths (and it's hard to read latex) but purely on the programming side, if logit is a vector with indices i=1,...,n and x.center is an n x m matrix:
for (j in 1:m)
  d2.l[j] <- -sum( x.center[,j]^2 * exp(logit)/(1+exp(logit))^2 )
where the sum sums over i.
If you want to do it "vector-ish", you can take advantage of the fact that if you do matrix * vector (your x.center * exp(logit)/...) this happens column-wise in R which suits your equation:
-colSums(x.center^2 * exp(logit)/(1+exp(logit))^2)
For what it's worth, although the latter is "slicker", I will often use the explicit loop (as with the first example), purely for readability. Otherwise, when I come back in a month's time I get very confused about my i's and j's and what is being summed over when.
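A quick way to convince yourself that the loop and the colSums version agree is to run both on made-up data. Everything below (the sizes, beta, and the random x.center) is synthetic and only serves the comparison.

```r
set.seed(1)
n <- 200
m <- 3

# Synthetic centered predictors and an arbitrary coefficient vector
x.center <- scale(matrix(rnorm(n * m), n, m), scale = FALSE)
beta     <- c(0.5, -1, 2)
logit    <- as.vector(x.center %*% beta)

# Common weight e^{x_i beta} / (1 + e^{x_i beta})^2, one value per row i
w <- exp(logit) / (1 + exp(logit))^2

# Explicit loop over predictors j; sum() runs over observations i
d2.loop <- numeric(m)
for (j in 1:m) d2.loop[j] <- -sum(x.center[, j]^2 * w)

# Vectorized: w is recycled down each column, so row i is scaled by w[i]
d2.vec <- as.numeric(-colSums(x.center^2 * w))

all.equal(d2.loop, d2.vec)
```

The recycling only lines up because w has exactly n elements, one per row of x.center. With a vector of any other length the product would silently mix up rows.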
