Get quantile for each value - r

Is there an implemented (!) function in R which gives you the empirical quantile for each value? I couldn't find any ...
Let's say we have x
x = c(1,3,4,2)
I want to have the quantile of each element.
[1] 0.25, 0.75, 1, 0.5
Thank you very much!

You can use the ecdf() function:
ecdf(x)(x)
[1] 0.25 0.75 1.00 0.50
ecdf(x) returns a function (the empirical CDF of x), and you then call that function on the elements of x. The syntax admittedly looks strange.
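To make the "function that returns a function" idea concrete, here is a minimal sketch in Python rather than R. The helper name ecdf is only chosen to mirror the R call; it is an illustration of what the empirical CDF computes, not R's implementation.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(t):
        # Fraction of sample values that are <= t.
        return bisect_right(ordered, t) / n

    return cdf


x = [1, 3, 4, 2]
# Same shape as the R call ecdf(x)(x): build the function, then apply it.
print([ecdf(x)(v) for v in x])  # [0.25, 0.75, 1.0, 0.5]
```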

Related

How to create simple covariance in Julia on a matrix

Using Julia 0.5. Given:
Supertech = [-.2 .1 .3 .5];
Slowpoke = [.05 .2 -.12 .09];
How in the world can I get a covariance? In Excel I just say
=covariance.p(Supertech,Slowpoke)
and it gives me the correct answer of -0.004875
For the life of me I can't figure out how to get this to work using StatsBase.cov()
I've tried putting this into a matrix like:
X = [Supertech; Slowpoke]'
which gives me a nice:
4×2 Array{Float64,2}:
-0.2 0.05
0.1 0.2
0.3 -0.12
0.5 0.09
but I can't get this simple thing to work. I keep coming up with dimension mismatches when I try to use the WeightedVector type.
The syntax [-.2 .1 .3 .5] doesn't create a vector, it creates a one-row matrix. The cov function is actually defined in base Julia, but it requires vectors. So you simply need to use the syntax with commas to create vectors in the first place ([-.2, .1, .3, .5]), or you can use the vec function to reshape the matrix to a one-dimensional vector. It also uses the "corrected" covariance by default, whereas Excel is using the "uncorrected" covariance. You can use the third argument to specify that you don't want this correction.
julia> cov(vec(Supertech), vec(Slowpoke))
-0.0065
julia> cov(vec(Supertech), vec(Slowpoke), false)
-0.004875
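The difference between the two results is only the divisor: n - 1 for the "corrected" (sample) covariance and n for the "uncorrected" (population) one. A minimal sketch in plain Python, assuming a hypothetical helper named covariance that is not part of Julia or StatsBase:

```python
def covariance(xs, ys, corrected=True):
    """Covariance of two equal-length sequences.

    corrected=True divides by n - 1 (sample covariance),
    corrected=False divides by n (population covariance, like Excel's covariance.p).
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if corrected else n)


supertech = [-0.2, 0.1, 0.3, 0.5]
slowpoke = [0.05, 0.2, -0.12, 0.09]
print(round(covariance(supertech, slowpoke), 6))         # -0.0065
print(round(covariance(supertech, slowpoke, False), 6))  # -0.004875
```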

extract percentage explained variance in ssa result (Rssa)

I'm working with the Rssa package to decompose time series, which works fine except that I can't get the percentage of variance explained by each eigenvector (if these are the right words for it). However, these percentages are shown in the titles of one of the graphs I can plot with this package.
Let me give an example:
d=rnorm(200,10,3)
plot(d,type="l")
ssa=ssa(d, L = 100,digits=0)
plot(ssa,type="vector") #the percentage I want is in the title of each individual graph
# to reconstruct the trend and the residuals
res <- reconstruct(ssa, groups = list(1))
trend <- res$F1
How do I get these percentages in a vector? Especially since I want to loop over multiple series.
Thank you!
It seems that the code computing each component's share of the weighted norm of the series is hidden in the package.
I extracted the code from Rssa:::.plot.ssa.vectors.1d.ssa and wrapped it in a small function:
component_wnorm <-
function(x) {
  # use at most the first 10 components
  idx <- seq_len(min(nsigma(x), 10))
  total <- wnorm(x)^2
  round(100*x$sigma[idx]^2 / total, digits = 2)
}
component_wnorm(ssa)
[1] 92.02 0.35 0.34 0.27 0.27 0.25 0.22 0.20 0.20 0.18
Recent versions of Rssa provide the function contributions().
Therefore, you can use
> s <- ssa(d, L=100)
> c <- contributions(s)*100
> print(c[1:10], digits = 2)
[1] 92.41 0.28 0.26 0.26 0.26 0.23 0.23 0.21 0.20 0.20

Error in Weibull distribution

file.data has the following values to fit with Weibull distribution,
x y
2.53 0.00
0.70 0.99
0.60 2.45
0.49 5.36
0.40 9.31
0.31 18.53
0.22 30.24
0.11 42.23
Fitting the Weibull distribution function f(x)=1.0-exp(-lambda*x**n) gives an error:
fit f(x) 'data.dat' via lambda, n
and when I finally plot f(x) together with the x-y data, there is a large discrepancy.
Any feedback would be highly appreciated. Thanks!
Several things:
You must skip the first line (if it really is x y).
You must use the correct function (the pdf and not the CDF, see http://en.wikipedia.org/wiki/Weibull_distribution, like you did in https://stackoverflow.com/q/20336051/2604213)
You must use an additional scaling parameter, because your data are not normalized
You must select adequate initial values for the fitting.
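To illustrate the pdf/CDF and scaling points, here is a rough sketch in Python (not gnuplot). It assumes the scale/shape form (x/lambda)**n from the Wikipedia article, which is parametrized differently from the f(x) in the question. The function names are hypothetical, and the factor a stands in for the extra scaling parameter.

```python
import math


def weibull_cdf(x, lam, n):
    """Weibull CDF with scale `lam` and shape `n`; rises from 0 towards 1."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** n))


def scaled_weibull_density(x, lam, n, a):
    """Density-shaped curve; `a` absorbs n/lam and the data's overall scale."""
    if x <= 0:
        # Also covers x == 0, where (x/lam)**(n-1) is undefined for n < 1.
        return 0.0
    z = x / lam
    return a * z ** (n - 1) * math.exp(-(z ** n))


# The CDF can never exceed 1, so it cannot follow y-values such as 42.23;
# the scaled density can, because `a` is free.
print(weibull_cdf(0.11, 0.15, 0.5) <= 1.0)  # True
```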
The following works fine:
f(x) = (x < 0 ? 0 : a*(x/lambda)**(n-1)*exp(-(x/lambda)**n))
n = 0.5
a = 100
lambda = 0.15
fit f(x) 'data.dat' every ::1 via lambda, n, a
set encoding utf8
plot f(x) title sprintf('λ = %.2f, n = %.2f', lambda, n), 'data.dat' every ::1
That gives (with gnuplot 4.6.4) a plot of the fitted f(x) together with the data points.
If that's the actual command you provided to gnuplot, it won't work because you haven't yet defined f(x).

How do I extract ecdf values out of ecdfplot()

If I use the ecdfplot() function of the latticeExtra package how do I get the actual values calculated i.e. the y-values which correspond to the ~x|g input?
I've been looking at ?ecdfplot but there's no description of this. For the usual high-level function ecdf() it works with the command plot=FALSE but this does not work for ecdfplot().
The reason I want to use ecdfplot() rather than ecdf() is that I need to calculate the ecdf() values for a grouping variable. I know I could do this by hand too, but I'm quite convinced that there is a more direct route.
Here is a small example:
u <- rnorm(100,0,1)
mygroup <- c(rep("group1",50),rep("group2",50))
ecdfplot(~u, groups=mygroup)
I would like to extract the y-values given each group for the corresponding x-values.
If you stick with the ecdf() function from the stats package (part of base R), you can simply do as follows:
Create ecdf function with your data:
fun.ecdf <- ecdf(x) # x is a vector of your data
Now use this "ecdf function" to generate the cumulative probabilities of any vector you feed it, including your original, sorted data:
my.ecdf <- fun.ecdf(sort(x))
I know you said you don't want to use ecdf, but in this case it is much easier to use it than to get the data out of the trellis object that ecdfplot returns. (After all, that's all that ecdfplot is doing; it's just doing it behind the scenes.)
In the case of your example, the following will get you a matrix of the y values (where x is your entire input u, though you could choose a different one) for each ECDF:
ecdfs = lapply(split(u, mygroup), ecdf)
ys = sapply(ecdfs, function(e) e(u))
# output:
# group1 group2
# [1,] 0.52 0.72
# [2,] 0.68 0.78
# [3,] 0.62 0.78
# [4,] 0.66 0.78
# [5,] 0.72 0.80
# [6,] 0.86 0.94
# [7,] 0.10 0.26
# [8,] 0.90 0.94
# ...
ETA: If you just want each column to correspond to the 50 x-values in that column, you could do:
ys = sapply(split(u, mygroup), function(g) ecdf(g)(g))
(Note that if the number of values in each group isn't identical, this will end up as a list rather than a matrix with columns.)
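The split-then-apply pattern used above can also be sketched outside R. The following Python version is only an illustration: grouped_ecdf is a hypothetical helper, and the data are made up. It returns one list of ECDF values per group, each evaluated at that group's own x-values, so unequal group sizes are not a problem.

```python
from bisect import bisect_right
from collections import defaultdict


def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def grouped_ecdf(values, groups):
    """Map each group label to the ECDF values of its own observations."""
    by_group = defaultdict(list)
    for value, label in zip(values, groups):
        by_group[label].append(value)
    result = {}
    for label, vals in by_group.items():
        cdf = ecdf(vals)  # one ECDF per group
        result[label] = [cdf(v) for v in vals]
    return result


# Made-up data, three observations per group.
u = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1]
mygroup = ["group1"] * 3 + ["group2"] * 3
print(grouped_ecdf(u, mygroup))
```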

Why Does 1.59 not Equal 1.59

Alright, so I have the strangest issue here. I'm taking the mean of a dependent variable Y, when we partition a space by a particular quantile of an independent variable X.
My issue is that the value returned by R's quantile function does not compare as lying within the range of my independent variable X, even though the value printed to the screen is correct. What makes this stranger is that it only happens with particular quantiles.
Some example code to demonstrate this weird effect:
x<-c(1.49,rep(1.59,86))
quantile(x,0.05) # returns 1.59, the correct value
# However both of these return all values as false
table(x>=quantile(x,0.05))
table(x==quantile(x,0.05))
# But if we take a quantile at 0.075 it works correctly
table(x>=quantile(x,0.075))
Any insight you guys can provide would be appreciated.
The quantile isn't exactly 1.59:
> quantile(x, 0.05)[[1]] == 1.59
[1] FALSE
> quantile(x, 0.05)[[1]] == 1.5900000000000003
[1] TRUE
quantile(..., type = 7) appears to be replacing 1.59 with 0.7000000000000001 * 1.59 + 0.3 * 1.59, which introduces a tiny error that bars the use of exact equality.
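The usual remedy is to compare with a tolerance instead of ==. Below is a hedged sketch in Python: quantile_type7 is a hypothetical helper that follows the linear-interpolation rule of R's default quantile type, but it is not claimed to reproduce R's rounding bit for bit. math.isclose plays the role that all.equal() plays in R.

```python
import math


def quantile_type7(values, p):
    """Linear-interpolation quantile in the style of R's default (type 7)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p          # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    frac = h - lo
    # Weighted average of two neighbours; even when both neighbours are
    # equal, rounding can move the result by a few units in the last place.
    return (1 - frac) * ordered[lo] + frac * ordered[hi]


x = [1.49] + [1.59] * 86
q = quantile_type7(x, 0.05)
print(repr(q))  # shows all significant digits, unlike a rounded printout

# Tolerant version of x >= q: counts the 86 values equal to 1.59.
print(sum(1 for v in x if v >= q or math.isclose(v, q)))  # 86
```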
