My question concerns the fracdiff.sim function in R (from the fracdiff package), whose help page, much like that of arima.sim, is not very clear about initial values.
I understand that stationary processes do not depend on their initial values as time grows, but my aim is to see, in my simulations, the return of my long-memory process (fitted with arfima) to its mean.
Therefore, if the process is ARFIMA(p,d,q), I need to supply at least the last p values of my in-sample process (and possibly q innovations). In other words, I would like to set the length of the burn-in period to 0 and provide starting values instead.
Nevertheless, I am currently unable to do this. I know that fracdiff.sim lets the user choose the length of the burn-in period (which produces the stationary behavior) and the mean of the simulated process (the series is simulated and then shifted so that the means match). There is also a constraint: the length of the burn-in period must be >= p+q. I suspect the innov argument has something to do with it, but I am really not sure.
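For concreteness, here is the kind of call I mean (a minimal sketch assuming the documented n, ar, ma, d, n.start and mu arguments, with arbitrary values):
library(fracdiff)

set.seed(1)
# ARFIMA(1, d, 1): n.start is the burn-in length (must be >= p + q),
# mu is the mean the simulated series is shifted to
sim <- fracdiff.sim(n = 500, ar = 0.5, ma = -0.2, d = 0.3,
                    n.start = 200, mu = 10)
x <- sim$series
# What I cannot see how to do: replace the burn-in with the last p values
# of my in-sample series (and q innovations) so the simulation continues it
ts.plot(x)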
This idea is inspired by the arima.sim function, which has a start.innov argument. However, even if my aim were only to simulate an ARMA(p,q), I am not sure how exactly this argument is used (the help is rather terse): should we supply only q innovations? Should the last p values of the in-sample process be included with them? In which order?
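For reference, the kind of arima.sim call I am unsure about (a sketch only; whether start.innov should hold innovations, past values of the series, or both is exactly my question):
set.seed(2)
# ARMA(1,1): n.start must be >= p + q; start.innov holds the values used
# during the burn-in, innov those for the simulated stretch
y <- arima.sim(model = list(ar = 0.5, ma = -0.2), n = 500,
               n.start = 2,
               start.innov = c(0.3, -0.1),
               innov = rnorm(500))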
To sum up, I want to simulate ARFIMA processes starting from a specific value and having a specific mean, in order to see the return to the mean and not only the long-term behavior. I have found partial solutions for arima.sim on the internet, but nothing conclusive, and if the solution relies on start.innov, how can the problem be solved for ARFIMA processes (fracdiff.sim does not have a start.innov argument)?
Hoping I have been clear enough.
I am trying to follow Chapter 2, on SDT (signal detection theory), in
https://link.springer.com/chapter/10.1007/978-3-030-03499-3_2
It basically says
d'emp = z(HIT) - z(FA)
and, if you don't know z(), to let your computer compute it.
But how? Is there a function for this in R? It cannot be scale() because HIT and FA are single values.
In this book, the z-transformation z() is defined as "the inverse cumulative Gaussian function". I think the sentence "If you are not familiar with the z-transformation just treat it as a function you can find on your computer" is asking readers not to spend too much time on what the z-transformation means, and to focus instead on the calculations of d_emp and b_emp as the difference and the average.
However, if you want to compute the inverse cumulative Gaussian (normal) function, you can use qnorm() from the stats package. Be aware that you may have to specify the mean and sd of the population; by default the function uses mean = 0 and sd = 1.
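For example, a minimal sketch with made-up hit and false-alarm rates:
# qnorm() as the z-transformation (standard normal: mean = 0, sd = 1 by default)
hit_rate <- 0.80   # illustrative proportion of hits
fa_rate  <- 0.15   # illustrative proportion of false alarms

d_emp <- qnorm(hit_rate) - qnorm(fa_rate)           # d'_emp = z(HIT) - z(FA)
b_emp <- -(qnorm(hit_rate) + qnorm(fa_rate)) / 2    # one common criterion measure;
                                                    # check the book's exact definition of b_emp
d_emp  # approximately 1.88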
To learn more:
Inverse of the cumulative gaussian distribution in R
https://www.statology.org/dnorm-pnorm-rnorm-qnorm-in-r/
This is more of a general question, but since I am using R, I have added the R tag.
My training data set has 15,000 entries, of which around 20 I would like to use as the positive set for building the SVM. I wanted to use the remaining, resampled data as my negative set, but I was wondering whether it might be better to take a negative set of the same size (around 20), since otherwise the data are highly imbalanced. Is there an easy, ensemble-based approach in R to pool the classifiers after 1,000 rounds of resampling (for example with the e1071 package)?
Follow-up question: I would like to calculate a score for each prediction afterwards; is it fine to simply multiply the probabilities by 100?
Thanks.
You can try the "class weight" approach, in which the smaller class gets more weight, so that misclassifying the positively labelled class carries a higher cost.
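A minimal sketch with e1071 (it assumes a data frame dat with a factor column y taking the levels "pos" and "neg"; the weights themselves are illustrative):
library(e1071)

# Give the rare positive class a weight proportional to the imbalance
wts <- c(pos = sum(dat$y == "neg") / sum(dat$y == "pos"),
         neg = 1)

fit <- svm(y ~ ., data = dat,
           kernel = "radial",
           class.weights = wts,
           probability = TRUE)   # enables class probabilities at prediction time

pred <- predict(fit, newdata = dat, probability = TRUE)
head(attr(pred, "probabilities"))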
In help(predict.lars) we can read that the parameter s is "a value, or vector of values, indexing the path. Its values depends on the mode= argument. By default (mode="step"), s should take on values between 0 and p (e.g., a step of 1.3 means .3 of the way between step 1 and 2.)"
What does "indexing the path" mean? Also, s must take a value between 1 and p, but what is p? The parameter p is not mentioned elsewhere in the help file.
I know this is basic, but there is not a single question up on SO about predict.lars.
It is easiest to use the mode="norm" option. In this case, s is the L1 norm of the coefficient vector, which plays the role of a regularization budget (a smaller norm corresponds to a larger \lambda).
To understand mode="step", you need to know a little more about the LARS algorithm.
One problem that LARS can solve is the L1-regularized regression problem: min_w ||y - Xw||^2 + \lambda ||w||_1, where y are the outputs, X is a matrix of input vectors, and w are the regression weights.
A simplified explanation of how LARS works is that it greedily builds a solution to this problem by adding or removing dimensions from the regression weight vector.
Each of these greedy steps can be interpreted as a solution to an L1-regularized problem with a decreasing value of \lambda. The sequence of these steps is known as the path.
So, given the LARS path, to get the solution for a user-supplied \lambda, you iterate along the path until the next element is less than the input \lambda, then you take a partial step (\lambda decreases linearly between each step).
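For concreteness, a minimal sketch using the diabetes data that ships with the lars package (the s values are arbitrary; p in the quoted help text presumably means the number of predictors, which equals the number of steps on a plain LAR path):
library(lars)
data(diabetes)   # diabetes$x: predictor matrix, diabetes$y: response

fit <- lars(diabetes$x, diabetes$y, type = "lasso")

# mode = "step": s indexes the path; s = 1.3 means 30% of the way
# between step 1 and step 2
coef(fit, s = 1.3, mode = "step")

# mode = "fraction": s is a number between 0 and 1, the fraction of the
# maximal L1 norm, which avoids having to know the path length or lambda
predict(fit, newx = diabetes$x, s = 0.5, mode = "fraction")$fit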
I have a species abundance dataset with quite a few zeros in it, and even when I set trymax = 1000 for metaMDS(), the program is unable to find a stable solution for the stress. I have already tried combining data (collapsing multiple years together to reduce the number of zeros) and cannot do any more of that. Is it scientifically valid to take what R gives me at the end (the lowest-stress of the 1000 solutions), or should I not be using NMDS at all because it cannot find a stable solution? There seems to be very little information about this on the internet.
One explanation for this is that you are trying to use too few dimensions for the mapping. I presume you are using the default k = 2? If so, try k = 3 and compare its stress with that of the best solution you got from the 1000 tries at k = 2.
I would be a little concerned about taking one solution out of 1000 just because it had the best/lowest stress.
You could also try 1000 more random starts and see whether the analysis converges given more tries. If you saved the output from metaMDS(), you can supply that object to another call to metaMDS() via the previous.best argument. It will then do trymax further random starts, but will compare any lower-stress solutions with the previous best and declare convergence if it finds one similar to it, rather than having to find two similar low-stress solutions within the new starts.
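A minimal sketch of both suggestions with vegan (comm stands in for the species abundance matrix):
library(vegan)

sol_k2 <- metaMDS(comm, k = 2, trymax = 1000)

# Try a third dimension and compare the stress of the best solutions
sol_k3 <- metaMDS(comm, k = 3, trymax = 1000)
sol_k2$stress
sol_k3$stress

# Restart from the stored best solution: trymax further random starts are
# compared against it, so only one new similar low-stress solution is needed
sol_k2b <- metaMDS(comm, k = 2, trymax = 1000, previous.best = sol_k2)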
1: I would like to create a synthetic dataset of 14,000 genes (rows) and 250 samples (columns of the matrix).
How can this be done?
2: After this, I would like to infer gene regulation using, for example, mutual-information algorithms. I know how to do this, and in fact I already have a network.
3: I would like to know whether the network I obtained is due to chance or not. To do this, one common approach is to shuffle the samples or genes 1,000 times, build 1,000 networks, and plot a null distribution against which to validate the network obtained in point 2. This is called bootstrapping.
Is there another method?
Best,
E.
The sample function in R is the basic way to construct random permutations of existing data. It is not entirely clear what you want; you might also simply need to be pointed to the runif function for generating random uniform sequences. If you had 1,000 objects of a particular sort in a vector obj:
obj <- rnorm(1000)         # e.g., 1,000 numeric values
sample(obj)                # returns a permuted copy of obj
# Same as ...
obj[sample(length(obj))]
Whether that is a "null distribution" is up to you to decide. (And a request for "all" the methods to do any particular task in R will be viewed as excessively demanding. There are often a large number of methods, and even asking for the "best" one will increase your chances of getting your question closed.)
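For the matrix in point 1 and the shuffling in point 3, a minimal sketch (plain Gaussian noise is only a placeholder for whatever marginal distribution is appropriate, and build_network()/summary_statistic() are hypothetical stand-ins for the mutual-information step of point 2):
set.seed(1)
# 14,000 genes x 250 samples of placeholder noise
expr <- matrix(rnorm(14000 * 250), nrow = 14000, ncol = 250,
               dimnames = list(paste0("gene", 1:14000),
                               paste0("sample", 1:250)))

# One shuffled dataset: permute each gene's values across samples, which
# breaks gene-gene associations while keeping each gene's own distribution
shuffle_once <- function(m) t(apply(m, 1, sample))

# Null distribution over 1,000 shuffles (build_network() and
# summary_statistic() are hypothetical placeholders):
# null_scores <- replicate(1000, summary_statistic(build_network(shuffle_once(expr))))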