I am very new to programming so I apologise in advance for my lack of knowledge.
I want to find the probability of obtaining the sum k when throwing m dice. I am not looking for a direct answer; I just want to ask whether I am on the right track and what I can improve.
I begin with a function that calculates the sum of an array of m dice:
function dicesum(m)
    j = rand(1:6, m)   # roll m dice
    return sum(j)      # sum of the faces
end
Now I am trying specific values to see if I can find a pattern (but without much luck). I have tried m = 2 (two dice). What I am trying to do is write a function which checks whether the sum of the two dice is k and, if it is, calculates the probability. My attempt is very naive but I am hoping someone can point me in the right direction:
m = 2
x, y = rand(1:6), rand(1:6)
z = x+y
if z == dicesum(m)
Probability = ??/6^m
I want to somehow find the number of 'elements' in dicesum(2) in order to calculate the probability. For example, consider the case when dicesum(2) = 8. With two dice, the possible outcomes are (2,6), (6,2), (5,3), (3,5) and (4,4), so the probability is 5/36.
I understand that the general case is far more complicated but I just want an idea of how to begin this problem. Thanks in advance for any help.

If I understand correctly, you want to use simulation to approximate the probability of obtaining a sum of k when rolling m dice. What I recommend is creating a function that takes m and k as arguments and repeats the simulation a large number of times. The following might help you get started:
function Simulate(m, k, Nsim=10^4)
    # Initialize the counter
    cnt = 0
    # Repeat the experiment Nsim times
    for sim in 1:Nsim
        # Simulate roll of m dice
        s = sum(rand(1:6, m))
        # Increment counter if sum matches k
        if s == k
            cnt += 1
        end
    end
    # Return the estimated probability
    return cnt/Nsim
end
prob = Simulate(3, 4)
The estimate is approximately .0131.
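For comparison with the simulated estimate, the exact probability can be built up one die at a time. The following is only a sketch in Python rather than Julia, and the name dice_sum_distribution is made up for illustration. It assumes fair dice and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def dice_sum_distribution(m, sides=6):
    """Exact distribution of the sum of m fair dice (hypothetical helper)."""
    # dist maps a running total to its probability; before any roll the total is 0
    dist = {0: Fraction(1)}
    for _ in range(m):
        nxt = {}
        for total, p in dist.items():
            # Each face is equally likely, so spread p evenly over the new totals
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p / sides
        dist = nxt
    return dist

print(dice_sum_distribution(2)[8])         # 5/36
print(float(dice_sum_distribution(3)[4]))  # 3/216, roughly 0.0139
```

For m = 3 and k = 4 the exact value is 3/216, which is in line with the simulated estimate above.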

You can also perform your simulation in a vectorized style as shown below. It's less efficient in terms of memory allocation because it creates a vector s of length Nsim, whereas the loop code uses a single integer, cnt, to count. Sometimes unnecessary memory allocation can cause performance issues. In this case, it turns out that the vectorized code is about twice as fast. Usually, loops are a bit faster. Someone more familiar with the internals of Julia might be able to offer an explanation.
using Statistics  # provides mean in Julia 1.x
function Simulate1(m, k, Nsim=10^4)
    # Simulate roll of m dice Nsim times
    s = sum(rand(1:6, Nsim, m), dims=2)
    # Relative frequency of matches
    prob = mean(s .== k)
    return prob
end


As the title illustrates, I would like to conduct a simulation test. I was given a probability P(L>x)=0.05, where L follows a normal distribution with mean=0, std=100. I was asked to find an appropriate x by simulation, IDEALLY by repeating a hit-or-miss approach many times. I am not allowed to use the qnorm() function. Can you please help me out? Thank you
As we want P(L>x)=0.05, we can create a function that calculates P(L>x)-0.05, and find the x that turns it to 0 (its root) with uniroot:
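prob = function(x){
  n = 10000
  L = rnorm(n, 0, 100)
  # Estimated P(L > x) minus the target probability
  sum(L > x)/n - 0.05
}
uniroot(prob, c(50, 400))
Note: the second argument of uniroot is the interval in which it will search for the root. The choice is arbitrary as long as prob has opposite signs at the two endpoints. Since P(L>x)=0.05 puts x in the upper tail, the interval has to lie to the right of the mean.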
This will find a different root every time you run it as L is created inside prob. For better accuracy, you can increase n.
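The same hit-or-miss idea can be sketched in Python under slightly different assumptions: the sample is drawn once with a fixed seed, so the estimated probability is a fixed, non-increasing function of x, and a plain bisection replaces uniroot. The function name, the seed and the bracket [mean, mean + 10*std] are illustrative choices, not part of the original answer.

```python
import random
from bisect import bisect_right

def estimate_threshold(p_target=0.05, mean=0.0, std=100.0, n=100_000, seed=0):
    """Estimate x with P(L > x) = p_target for L ~ Normal(mean, std)."""
    rng = random.Random(seed)
    # Draw the sample once and sort it so that "hits" can be counted quickly
    samples = sorted(rng.gauss(mean, std) for _ in range(n))

    def exceed_fraction(x):
        # Fraction of simulated values strictly greater than x (the "hits")
        return (n - bisect_right(samples, x)) / n

    # Assumed bracket: about half the sample exceeds the mean, almost none
    # exceeds mean + 10*std, so any p_target below 0.5 is bracketed
    lo, hi = mean, mean + 10.0 * std
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if exceed_fraction(mid) > p_target:
            lo = mid   # too many hits: the threshold must move right
        else:
            hi = mid   # too few hits: the threshold must move left
    return (lo + hi) / 2.0

print(estimate_threshold())
```

Because the sample is fixed, repeated calls with the same seed return the same x. A larger n narrows the gap to the true quantile.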

I have a list of about 100 000 event probabilities stored in a vector.
I want to know if it is possible to calculate the probability that exactly n of the events occur (e.g. what is the probability that exactly 1000 events occur).
I managed to calculate several probabilities in R :
p is the vector containing all the probabilities
probability of none : prod(1-p)
probability of at least one : 1 - prod(1-p)
I found how to calculate the probability of exactly one event :
sum(p * (prod(1-p) / (1-p)))
But I don't know how to generate a formula for n events.
I do not know R, but I know how I would solve this with programming.
This is a straightforward dynamic programming problem. We start with a vector v = [1.0] of probabilities. Then in untested Python:
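v = [1.0]  # v[j] = probability that exactly j of the events seen so far occur
for p_i in probabilities:
    # No event so far, and event i does not occur either
    next_v = [(1 - p_i) * v[0]]
    for j in range(len(v) - 1):
        # j+1 events: either j before and event i occurs, or j+1 before and it does not
        next_v.append(v[j] * p_i + v[j + 1] * (1 - p_i))
    # Every event so far has occurred, including event i
    next_v.append(v[-1] * p_i)
    # For roundoff errors
    total = sum(next_v)
    for j in range(len(next_v)):
        next_v[j] /= total
    v = next_v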
And now your answers can just be read off the vector: v[n] is the probability that exactly n events occur.
This approach is equivalent to calculating Pascal's triangle row by row, throwing away the old row when you're done.
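With about 100 000 probabilities the full vector grows to 100 001 entries, so the row-by-row update costs on the order of 10^10 multiplications. If only "exactly n events" is needed, the rows can be truncated at n. The sketch below makes that assumption. The name prob_exactly is hypothetical, and the renormalisation step is left out because a truncated row no longer sums to 1.

```python
import math

def prob_exactly(probabilities, n):
    """Probability that exactly n of the independent events occur."""
    # v[j] = probability of exactly j events among those processed so far,
    # kept only for j = 0..n
    v = [1.0] + [0.0] * n
    for p in probabilities:
        # Update from high j to low j so that v[j - 1] still holds the old row
        for j in range(n, 0, -1):
            v[j] = v[j] * (1.0 - p) + v[j - 1] * p
        v[0] *= 1.0 - p
    return v[n]

p = [0.1, 0.5, 0.8]
print(prob_exactly(p, 0), math.prod(1 - q for q in p))  # both about 0.09
print(prob_exactly(p, 1))                               # about 0.46
```

For n = 0 and n = 1 this agrees with the formulas prod(1-p) and sum(p * (prod(1-p) / (1-p))) from the question.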

I have been trying to write a larger function, one part of which is a while loop with two conditions. For each value of k in a certain range (x_min and x_max are computed within the larger function), I am trying to compute a matrix with values from a distribution of which k itself is a parameter. The while loop checks that the necessary conditions for the distribution are met, while the foreach function should run the while loop for every element of k. Since I do not know the exact number of elements in k, I thought the problem might be the preallocation of I. The best I could get was an endless computation within the loops (or R simply crashing). I am thankful for any suggestions!
k<-x[x_min < x & x < x_max]
foreach(k) %do% {
for(i in 1:100){
I<-replicate(n=100,rbinom(n= 250, size=1, prob = k/250))
if(sum(I[,i])==k) check=1
Changing the order unfortunately did not work.
It still seems to have problems. I tried to extract the matrix, but it returns "NULL".
n is 250, x_min and x_max are defined by a formula (around 2-8), k is defined by the expression given above, and x are values between 0 and around 10 (also computed within the formula). I would provide the whole formula, but it is big and I could narrow the problems down to these parts, so I wanted to keep the problem as simple as possible. Thank you for your help and comments!

Objective function to be maximized : pos%*%mu where pos is the weights row vector and mu is the column vector of mean returns of d stocks
Constraints: 1) ones%*%pos = 1 where ones is a row vector of 1's of size 1*d (d is the number of stocks)
2) pos%*%cov%*%t(pos) = rb^2 # where cov is the covariance matrix of size d*d and rb is risk budget which is the free parameter whose values will be changed to draw the efficient frontier
I want to write code for this optimization problem in R but I can't think of any function or library that would help.
PS: solve.QP in the quadprog library has been used to minimize covariance subject to a target return. Can this function also be used to maximize return subject to a risk budget? How should I specify the Dmat matrix and dvec vector for this problem?
mu <- matrix(c(0.01,0.02,0.03),3,1)
cov # predefined covariance matrix of size 3*3
pos <- matrix(c(1/3,1/3,1/3),1,3) # random weights vector
edr <- pos%*%mu # expected daily return on portfolio
m1 <- matrix(1,1,3) # constraint no.1 ( sum of weights = 1 )
m2 <- pos%*%cov # constraint no.2
Amat <- rbind(m1,m2)
bvec <- matrix(c(1,0.1),2,1)
solve.QP(Dmat= ,dvec= ,Amat=Amat,bvec=bvec,meq=2)
How should I specify Dmat and dvec? I want to optimize over pos.
Also, I think I have not specified constraint no.2 correctly. It should make the variance of the portfolio equal to the square of the risk budget.
(Disclaimer: There may be a better way to do this in R. I am by no means an expert in anything related to R, and I'm making a few assumptions about how R is doing things, notably that you're using an interior-point method. Also, there is likely an R package for what you're trying to do, but I don't know what it is or how to use it.)
Minimising risk subject to a target return is a linearly-constrained problem with a quadratic objective, looking like this:
min x^T Q x
subject to sum x_i = 1
sum ret_i x_i >= target
(and x >= 0 if you want to be long-only).
Maximising return subject to a risk budget is quadratically-constrained, however; it looks like this:
max ret^T x
subject to sum x_i = 1
x^T Q x <= riskbudget
(and maybe x >= 0).
Convex quadratic terms in the objective impose less of a computational cost in an interior-point method compared to introducing a convex quadratic constraint. With a quadratic objective term, the Q matrix just shows up in the augmented system. With a convex quadratic constraint, you need to optimise over a more complicated cone containing a second-order cone factor and you need to be careful about how you solve the linear systems that arise.
I would suggest you use the risk-minimisation formulation repeatedly, doing a binary search on the target parameter until you've found a portfolio approximately maximising return subject to your risk budget. I am suggesting this approach because it is likely sufficient for your needs.
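The binary search can be sketched as follows, in Python rather than R. The argument min_variance_for_target stands for whatever routine solves the risk-minimisation problem for a given target return (for instance a wrapper around a QP solver); it is an assumed callable, not an existing library function. The two-asset toy_min_variance below exists only so that the sketch runs on its own. The search assumes that the minimal variance does not decrease as the target grows and that the lowest target is within the risk budget.

```python
def max_return_within_budget(min_variance_for_target, variance_budget, lo, hi, tol=1e-10):
    """Largest target return in [lo, hi] whose minimal variance fits the budget."""
    # Invariant: the target lo is feasible for the budget, the target hi is not
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if min_variance_for_target(mid) <= variance_budget:
            lo = mid   # still within budget, try a higher target
        else:
            hi = mid   # over budget, back off
    return lo

def toy_min_variance(target):
    # Hypothetical example: two uncorrelated assets with mean returns 0.01 and
    # 0.03 and variances 0.04 and 0.09; w is the weight of the second asset
    w_min_var = 0.04 / (0.04 + 0.09)      # global minimum-variance weight
    w_needed = (target - 0.01) / 0.02     # weight required to reach the target
    w = max(w_min_var, w_needed)
    return 0.04 * (1.0 - w) ** 2 + 0.09 * w ** 2

best = max_return_within_budget(toy_min_variance, 0.05, 0.01, 0.03)
print(best)
```

Each iteration costs one solve of the risk-minimisation problem, so a few dozen solves are enough for a tight tolerance on the target.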
If you really want to solve your problem directly, I would suggest using an interface to Todd, Toh, and Tutuncu's SDPT3. This really is overkill; SDPT3 permits you to formulate and solve symmetric cone programs of your choosing. I would also note that portfolio optimisation problems are particularly special cases of symmetric cone programs; other approaches exist that are reportedly very successful. Unfortunately, I'm not studied up on them.

I have a homework problem for my algorithms class asking me to calculate the maximum size of a problem that can be solved in a given number of operations using an O(n log n) algorithm (i.e. n log n = c). I was able to get an answer by approximating, but is there a clean way to get an exact answer?
There is no closed-form solution in terms of elementary functions for this equation. Taking log to be the natural logarithm, you can transform the equation as follows:
n log n = c
log(n^n) = c
n^n = exp(c)
Then, this equation has a solution of the form:
n = exp(W(c))
where W is the Lambert W function (see especially "Example 2"). It has been proved that W cannot be expressed in terms of elementary functions.
However, f(n)=n*log(n) is monotonically increasing for n >= 1, so you can simply use bisection (here in Python, with base-2 logarithms):
import math

def nlogn(c):
    lower = 0.0
    upper = 10e10
    while True:
        middle = (lower + upper) / 2
        # Stop once the interval can no longer be halved in floating point
        if lower == middle or middle == upper:
            return middle
        if middle * math.log(middle, 2) > c:
            upper = middle
        else:
            lower = middle
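As an alternative to bisection, the same root can be found with Newton's method applied to n ln n = c, which usually needs only a handful of iterations. This is a sketch under stated assumptions: solve_n_log_n is a made-up name, c must be positive, and a logarithm in another base is handled by rescaling c, since n log_b(n) = c is the same equation as n ln(n) = c ln(b).

```python
import math

def solve_n_log_n(c, base=math.e, tol=1e-12, max_iter=100):
    """Solve n * log_base(n) = c for n >= 1, given c > 0 (hypothetical helper)."""
    if c <= 0:
        raise ValueError("c must be positive")
    # Rewrite the equation in natural logarithms: n * ln(n) = target
    target = c * math.log(base)
    # Start at or to the right of the root; n*ln(n) is convex there, so the
    # Newton iterates decrease monotonically towards the root
    n = max(target, math.e)
    for _ in range(max_iter):
        # Newton step for f(n) = n*ln(n) - target, with f'(n) = ln(n) + 1
        n_next = (n + target) / (1.0 + math.log(n))
        if abs(n_next - n) <= tol * n_next:
            return n_next
        n = n_next
    return n

n = solve_n_log_n(1e6, base=2)
print(n, n * math.log2(n))
```

The result is a real number; for the homework question the largest solvable integer size would be its floor.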
The O notation only gives you the biggest term in the equation, i.e. the performance of your O(n log n) algorithm could actually be better represented by c = (n log n) + n + 53.
This means that without knowing the exact nature of the performance of your algorithm you wouldn't be able to calculate the exact number of operations required to process a given amount of data.
But it is possible to calculate that the maximum number of operations required to process a data set of size n is more than a certain number, or conversely that the biggest problem set that can be solved, using that algorithm and that number of operations, is smaller than a certain number.
The O notation is useful for comparing two algorithms, e.g. an O(n^2) algorithm is asymptotically faster than an O(n^3) algorithm.
See Wikipedia for more info.
