Equivalent to R's sample(x,y,prob=) in Julia - julia

Julia-users: is there an equivalent to R's sample(x,y,prob=) to sample from a given set of values with weighted probabilities? The rand() function is equivalent to sample(x,y), but as far as I'm aware there's no option to add probability weights... Any help appreciated!

OK - done a bit more digging and wsample from the Distributions package seems to be the answer:
using Distributions
wsample(population, weights, n)
Next time I'll look harder before posting!

wsample also exists in the StatsBase.jl package (I am not sure whether it was added there after the question was first answered).
If you use StatsBase.jl, you can also just use sample:
using StatsBase
sample(population, Weights(weights), n)
In both packages, both functions also let you pass a random number generator and choose whether to sample with replacement.

Related

How to run a hodges-lehmann test in R or SPSS

How can I compute the Hodges-Lehmann (aligned ranks) test in R or SPSS? Are there ready-made functions to call? I understand the formula, but if the Hodges-Lehmann test has already been implemented, there would be no need to write a new function.
I tried hodgeslehmann() in the senstrat package, but it's not what I need. It only computes the Hodges-Lehmann aligned ranks and does not return the test statistic.
An R version, HodgesLehmann(), is provided by the DescTools package.
Note especially that it will supply confidence intervals but only if you make use of the conf.level argument.
And there is SPSS Syntax code here, but you may prefer something that has had better open-source scrutiny.

1 sample t-test from summarized data in R

I can perform a 1 sample t-test in R with the t.test command. This requires actual sets of data. I can't use summary statistics (sample size, sample mean, standard deviation). I can work around this using the BSDA package. But are there any other ways to perform this one-sample t-test in R without the BSDA package?
Many ways. I'll list a few:
directly calculate the p-value by computing the statistic and calling pt with that and the df as arguments, as commenters suggest above (it can be done with a single short line in R - ekstroem shows the two-tailed test case; for the one tailed case you wouldn't double it)
alternatively, if it's something you need a lot, you could convert that into a nice robust function, even adding in tests against non-zero mu and confidence intervals if you like. Presumably if you go this route you'll want to take advantage of the functionality built around the htest class
(code and even a reasonably complete function can be found in the answers to this stats.SE question.)
If samples are not huge (smaller than a few million, say), you can simulate data with the exact same mean and standard deviation and call the ordinary t.test function. If m and s and n are the mean, sd and sample size, t.test(scale(rnorm(n))*s+m) should do (it doesn't matter what distribution you use, so runif would suffice). Note the importance of calling scale there. This makes it easy to change your alternative or get a CI without writing more code, but it wouldn't be suitable if you had millions of observations and needed to do it more than a couple of times.
call a function in a different package that will calculate it -- there's at least one or two other such packages (you don't make it clear whether using BSDA was a problem or whether you wanted to avoid packages altogether)
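As a rough sketch of the first and third options above, using base R only: n, m and s follow the names used in the answer, while mu0 and all the numeric values are made-up assumptions for illustration.

```r
# Hypothetical summary statistics (illustrative values only)
n   <- 25    # sample size
m   <- 5.2   # sample mean
s   <- 1.1   # sample standard deviation
mu0 <- 5     # hypothesised mean (assumed for this example)

# Option 1: compute the statistic directly and call pt()
t_stat    <- (m - mu0) / (s / sqrt(n))
df        <- n - 1
p_two     <- 2 * pt(-abs(t_stat), df)             # two-tailed
p_greater <- pt(t_stat, df, lower.tail = FALSE)   # one-tailed, H1: mu > mu0

# Option 3: simulate data with exactly this mean and sd, then use t.test()
set.seed(1)                                 # any seed gives the same statistic
x   <- as.vector(scale(rnorm(n))) * s + m   # scale() forces mean 0 and sd 1
sim <- t.test(x, mu = mu0)
```

Both routes should agree on the statistic and the two-tailed p-value; the t.test route additionally returns a confidence interval in sim$conf.int and lets you switch the alternative argument without extra code.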

Is there an equivalent to matlab's rcond() function in Julia?

I'm porting some matlab code that uses rcond() to test for singularity, as also recommended here (for matlab singularity testing).
I see that there is a cond() function in Julia (as also in Matlab), but rcond() doesn't appear to be available by default:
ERROR: rcond not defined
I'd assume that rcond(), like the Matlab version, is more efficient than 1/cond(). Is there such a function in Julia, perhaps using an add-on module?
Julia calculates the condition number using the ratio of the maximum to the minimum of the singular values (got to love open source, no more MATLAB black boxes!)
Julia doesn't have an rcond function in Base, and I'm unaware of one in any package. If it did, it'd just be the ratio of the minimum to the maximum instead. I'm not sure why it's efficient in MATLAB, but it's quite possible that whatever the reason is, it doesn't carry through to Julia.
Matlab's rcond is an optimization based on the fact that it's an estimate of the condition number for square matrices. In my testing, and given that its help mentions LAPACK's 1-norm estimator, it appears to use LAPACK's dgecon.f. In fact, this is exactly what Julia does when you ask for the condition number of a square matrix with the 1- or Inf-norm.
So you can simply define
rcond(A::StridedMatrix) = 1/cond(A,1)
You can save Julia from twice-inverting LAPACK's results by manually combining cond(::StridedMatrix) and cond(::LU), but the savings here will almost certainly be immeasurable. Where there is a measurable savings, however, is that you can directly take the norm(A) instead of reconstructing a matrix similar to A through its LU factorization.
rcond(A::StridedMatrix) = LAPACK.gecon!('1', lufact(A).factors, norm(A, 1))
In my tests, this behaves identically to Matlab's rcond (2014b), and provides a decent speedup.

Operations on long numbers in R

I aim to use maximum likelihood methods (usually about 10^5 iterations) with a probability distribution that creates very big integers and very small float values that cannot be stored as a numeric or in a float type.
I thought I would use as.bigq in the gmp package. My issue is that one can only add, subtract, multiply and divide two objects of class/type bigq, while my distribution actually contains logarithm, power, gamma and confluent hypergeometric functions.
What is my best option to deal with this issue?
Should I use another package?
Should I code all these functions for bigq objects?
Coding these functions in R may cause some of them to be very slow, right?
How to write the logarithm function using only the +,-,*,/ operators? Should I approximate this function using a Taylor series expansion?
How to write the power function using only the +,-,*,/ operators when the exponent is not an integer?
How to write the confluent hypergeometric function (the equivalent of the Hypergeometric1F1Regularized[..] function in Mathematica)?
I could eventually write these functions in C and call them from R but it sounds like some complicated work for not much, especially if I have to use the gmp package in C as well to handle these big numbers.
Most likely all your problems can be solved with Rmpfr, which lets you use all of the functions returned by getGroupMembers("Math") with arbitrary precision.
Vignette: http://cran.r-project.org/web/packages/Rmpfr/vignettes/Rmpfr-pkg.pdf
Simple example of what it can do:
library(Rmpfr)
test <- mpfr(rnorm(100, mean = 0, sd = .0001), 240)  # 240 bits of precision
Reduce("*", test)  # product of 100 tiny values, without underflow to 0
I don't THINK it has hypergeometric functions though...
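As a small illustration of the logarithm, power and gamma functions the question asks about, here is a sketch that assumes the Rmpfr package is installed; the input values and variable names are made up for the example.

```r
library(Rmpfr)

# Hypothetical inputs that fall outside the range of a double
tiny <- mpfr("1e-400", precBits = 240)  # would underflow to 0 as a double
big  <- mpfr(300, precBits = 240)

log_tiny   <- log(tiny)     # logarithm
root_tiny  <- tiny^0.5      # power with a non-integer exponent
gamma_big  <- gamma(big)    # would overflow to Inf as a double
lgamma_big <- lgamma(big)   # log-gamma, handy for likelihoods
```

The results stay mpfr objects, so they can be combined further with the usual arithmetic operators and converted back with as.numeric() once they are within double range.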

Optimization in R with arbitrary constraints

I have done it in Excel but need to run a proper simulation in R.
I need to minimize a function F(x) (x is a vector) subject to the constraints that sum(x)=1, all values in x are in [0,1], and another function G(x) > G_0.
I have tried it with optim and constrOptim. Neither of them gives you this option.
The problem you are referring to is (presumably) a non-linear optimization with non-linear constraints. This is one of the most general optimization problems.
The package I have used for these purposes is called nloptr. From my experience, it is both versatile and fast. You can specify both equality and inequality constraints by setting eval_g_eq and eval_g_ineq, respectively. If the Jacobians are known explicitly (can be derived analytically), specify them for faster convergence; otherwise, a numerical approximation is used.
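A minimal sketch of how the problem from the question might be set up in nloptr, assuming the package is installed. The objective F, the constraint function G, the threshold G0 and the starting point are all invented for illustration; note that nloptr expects inequality constraints in the form g(x) <= 0, so G(x) > G_0 is written as G_0 - G(x) <= 0.

```r
library(nloptr)

# Hypothetical problem: F(x) = sum(x^2), G(x) = x[1], G0 = 0.5
G0 <- 0.5
F_obj  <- function(x) sum(x^2)
F_grad <- function(x) 2 * x

# Inequality constraint G0 - G(x) <= 0 and its Jacobian
g_ineq     <- function(x) G0 - x[1]
g_ineq_jac <- function(x) c(-1, 0, 0)

# Equality constraint sum(x) - 1 = 0 and its Jacobian
g_eq     <- function(x) sum(x) - 1
g_eq_jac <- function(x) c(1, 1, 1)

res <- nloptr(
  x0 = rep(1 / 3, 3),
  eval_f = F_obj, eval_grad_f = F_grad,
  lb = rep(0, 3), ub = rep(1, 3),          # box constraints: x in [0, 1]
  eval_g_ineq = g_ineq, eval_jac_g_ineq = g_ineq_jac,
  eval_g_eq = g_eq, eval_jac_g_eq = g_eq_jac,
  opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8, maxeval = 1000)
)
res$solution
```

SLSQP is chosen here because it accepts both equality and inequality constraints; it is gradient-based, which is why the analytic gradients and Jacobians are supplied. For this toy problem the constrained minimum can be worked out by hand as x = (0.5, 0.25, 0.25), which gives a simple sanity check on res$solution.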
Use this list as a general reference to optimization problems.
Write the set of equations using the Lagrange multiplier, then solve using the R command nlm.
You can do this in the OpenMx package (currently hosted at the site listed below; aiming for a 2.0 release on CRAN this year).
It is a general-purpose package mostly used for structural equation modelling, but it also handles nonlinear constraints.
For your case, make an mxModel() with your algebras expressed in mxAlgebra() calls and the constraints in mxConstraint() calls.
When you mxRun() the model, the algebras will be solved within the constraints, if possible.
http://openmx.psyc.virginia.edu/
