Plotting two distributions on the same plot - Julia

I am trying to plot two probability distributions on the same plot. One is a uniform distribution and the other is a chi distribution with three degrees of freedom (more specifically, a Maxwellian). Each has an area under the curve of 1. Here is my code so far:
using Plots, Distributions
gr()
v = -0.5:0.001:1.5
f_s0 = pdf.(Uniform(0,1), v) # uniform distribution with area 1
F_s = pdf.(Chi(3), v) # chi distribution with area 1
plot(v, f_s0, label="f_s0")
plot!(v, F_s, linecolor = :orange, linestyle = :dash, label="F_s (Maxwellian)")
xlabel!("velocity")
ylabel!("probability density")
xlims!(-0.5, 1.5)
ylims!(0, 1.5)
When I run this, I get
DomainError with -0.5:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
Any suggestions?

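The chi distribution's support starts at zero, so evaluate its pdf only on nonnegative values and plot it against that same range:
v_chi = 0:0.001:1.5
F_s = pdf.(Chi(3), v_chi)  # then plot!(v_chi, F_s, ...) instead of plot!(v, F_s, ...)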

This may be an issue in Distributions (the error comes from a log call on the negative argument -0.5). As a workaround, you can define a function that returns 0 outside the support:
using Plots, Distributions
gr()
v = -0.5:0.001:1.5
f_s0 = pdf.(Uniform(0,1), v) # uniform distribution with area 1
# Return 0.0 for x <= 0, where the Chi(3) density is zero, instead of calling pdf there
CHIPDF(x) = x > 0 ? pdf(Chi(3), x) : 0.0
F_s = CHIPDF.(v) # chi distribution with area 1
plot(v, f_s0, label="f_s0")
plot!(v, F_s, linecolor = :orange, linestyle = :dash, label="F_s (Maxwellian)")
xlabel!("velocity")
ylabel!("probability density")
xlims!(-0.5, 1.5)
ylims!(0, 1.5)
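For readers outside Julia, the same support-guard idea can be sketched in plain Python. This is a minimal sketch and not Distributions.jl. The chi pdf with 3 degrees of freedom (the standard Maxwellian) is written out by hand as f(x) = sqrt(2/pi) * x^2 * exp(-x^2/2) for x >= 0, and the helper names are hypothetical. A trapezoid sum over the grid also checks the "area under the curve = 1" claim from the question.

```python
import math

def uniform_pdf(x, a=0.0, b=1.0):
    """Density of Uniform(a, b): 1/(b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def chi3_pdf(x):
    """Density of the chi distribution with 3 degrees of freedom (standard Maxwellian).

    The support is x >= 0, so return 0.0 for negative x instead of evaluating
    the formula there (same idea as the CHIPDF workaround above).
    """
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / 2.0)

def trapezoid(ys, step):
    """Trapezoid-rule area under equally spaced samples ys."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

step = 0.001
n_points = 10501                              # grid from -0.5 to 10.0
grid = [-0.5 + i * step for i in range(n_points)]
uniform_vals = [uniform_pdf(v) for v in grid]
chi_vals = [chi3_pdf(v) for v in grid]

# The uniform density jumps at 0 and 1, so its area is only accurate to about one grid step.
print(round(trapezoid(uniform_vals, step), 3))
print(round(trapezoid(chi_vals, step), 3))
```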

The chi distribution is only defined on the nonnegative numbers. This is a mathematical constraint rather than a programming one. If you are using a chi distribution (a Maxwellian describes a speed, i.e. a magnitude), the quantity you are modeling probably cannot be negative, in which case the answer is to plot it only for nonnegative x values anyway. Nonetheless, if you want to show the uniform distribution going back to zero on the left, use separate grids for the two curves, something like the following:
using Plots, Distributions
vᵤ = -0.1:0.005:4
f_s0 = pdf.(Uniform(0,1), vᵤ) # uniform distribution with area 1
plot(vᵤ, f_s0, label="f_s0", framestyle=:box)
vᵪ = 0:0.005:4
F_s = pdf.(Chi(3), vᵪ) # chi distribution with area 1
plot!(vᵪ, F_s, linecolor = :orange, linestyle = :dash, label="F_s (Maxwellian)")
xlabel!("velocity")
ylabel!("probability density")
xlims!(-0.1, 4)
ylims!(0, 1.5)

Related

Logistic regression plot gives a linear regression line instead of an S-shaped curve [duplicate]

I was plotting the results of a logistic regression, but instead of the expected S curve, I got what looked like a straight line.
This was the code that I was using:
I created a range of values spanning the original x-axis, converted it to a data frame, and then predicted and drew the line.
model = glm(SHOT_RESULT~SHOT_DISTANCE,family='binomial',data = df_2shot)
summary(model)
#Eqn : P(SHOT_RESULT = True) = 1 / (1 + e^-(0.306 - 0.0586(SHOT_DISTANCE)))
r = range(df_2shot$SHOT_DISTANCE) # draws a curve based on prediction
x_range = seq(r[1],r[2],1)
x_range = as.integer(x_range)
y = predict(model,data.frame(SHOT_DISTANCE = x_range),type="response")
plot(df_2shot$SHOT_DISTANCE, df_2shot$SHOT_RESULT, pch = 16,
xlab = "SHOT DISTANCE", ylab = "SHOT RESULT")
lines(x_range,y)
Side note: I was following this tutorial: http://www.theanalysisfactor.com/r-glm-plotting/
Any insights would be appreciated! Thank you! :)
Haha, I see what happened. It is because of the range you plot over. I took the functional form of the curve from your comment line and defined it as a function:
f <- function (x) 1 / (1 + exp(-0.306 + 0.0586 * x))
Now, if we plot
x <- -100 : 100
plot(x, f(x), type = "l")
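A logistic curve is nearly linear in its middle section (here around x = 0.306/0.0586 ≈ 5.2, where the predicted probability is 0.5). Over the range of your data you only see that middle part, which is what you arrived at!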
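To see this numerically, here is a small Python sketch using the coefficients from the question's comment line (0.306 and -0.0586). It compares the fitted curve with its tangent line at the midpoint and reports the largest vertical gap over a window. The helper names and the ±10 window are illustrative choices, not taken from the question's data.

```python
import math

B0, B1 = 0.306, -0.0586  # intercept and slope from the question's comment line

def p_hit(x):
    """Fitted logistic curve P(SHOT_RESULT = True) = 1 / (1 + exp(-(B0 + B1*x)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))

x_mid = -B0 / B1       # where B0 + B1*x = 0, i.e. P = 0.5
slope_mid = B1 / 4.0   # derivative at the midpoint: B1 * p * (1 - p) with p = 0.5

def max_gap_from_tangent(lo, hi, steps=1000):
    """Largest vertical distance between the curve and its tangent line at x_mid on [lo, hi]."""
    gap = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        tangent = 0.5 + slope_mid * (x - x_mid)
        gap = max(gap, abs(p_hit(x) - tangent))
    return gap

print(round(x_mid, 2))
print(max_gap_from_tangent(x_mid - 10, x_mid + 10))  # narrow window: almost a straight line
print(max_gap_from_tangent(-100, 100))               # wide window, as in the answer: clearly S-shaped
```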

Plot: create a cutoff line at a particular point

Consider the following plot:
library(pwr)
pwrt<-pwr.t.test(d=.8,n=c(10,20,30,40,50,60,70,80,90,100),sig.level=.05,type="two.sample",alternative="two.sided")
plot(pwrt$n,pwrt$power,type="b",xlab="sample size",ylab="power", main = "Power curve for t-test d = .8")
I would like to add a vertical line as a 'cutoff' at, for example, power = .9, and also compute the exact x-value (sample size) at this cutoff point.
How do I do this? Any help is much appreciated.
You can calculate the sample size for a given power with the same pwr.t.test function.
From help(pwr.t.test):
Exactly one of the parameters 'd','n','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others.
library(pwr)
N90 <- pwr.t.test(d=.8,power = 0.9,sig.level=.05,type="two.sample",alternative="two.sided")$n
N90
[1] 33.82555
From there, it's simple to add a line and text label.
plot(pwrt$n,pwrt$power,type="b",xlab="sample size",ylab="power", main = "Power curve for t-test d = .8")
abline(v = N90)
text(x = N90 + 7, y = 0.8, labels = paste0("N = ",round(N90,2)))
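The same "solve for n" step can be sketched in Python with only the standard library. This sketch uses the large-sample normal approximation n ≈ 2(z_{1-alpha/2} + z_power)^2 / d^2 per group, not the exact noncentral t calculation that pwr.t.test performs, so it comes out about one subject lower than the 33.83 shown above. The function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def approx_power(n, d, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test with n per group."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * (n / 2) ** 0.5  # mean difference divided by its standard error
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def approx_n(d, power=0.9, alpha=0.05):
    """Per-group sample size from n = 2 * (z_{1-alpha/2} + z_power)^2 / d^2."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return 2 * z_sum ** 2 / d ** 2

n90 = approx_n(d=0.8)
print(round(n90, 2))                       # normal approximation, a bit below pwr's t-based value
print(round(approx_power(n90, d=0.8), 3))  # plugging n back in recovers the target power
```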

Drawing random numbers from a power law distribution in R

I am using the R package "poweRlaw" to estimate and subsequently draw from discrete power law distributions; however, the distribution drawn from the fit does not seem to match the data. To illustrate, consider this example from a guide for this package: https://cran.r-project.org/web/packages/poweRlaw/vignettes/b_powerlaw_examples.pdf. Here we first load an example dataset from the package and then fit a discrete power law.
library("poweRlaw")
data("moby", package = "poweRlaw")
m_pl = displ$new(moby)
est = estimate_xmin(m_pl)
m_pl$setXmin(est)
The fit looks like a good one, as we can't reject the hypothesis that this data is drawn from a power law distribution (p-value > 0.05):
bs = bootstrap_p(m_pl, threads = 8)
bs$p
However, when we draw from this distribution using the built-in function dist_rand(), the resulting distribution is shifted to the right of the original distribution:
set.seed(1)
randNum = dist_rand(m_pl, n = length(moby))
plot(density(moby), xlim = c(0, 1000), ylim = c(0, 1), xlab = "", ylab = "", main = "")
par(new=TRUE)
plot(density(randNum), xlim = c(0, 1000), ylim = c(0, 1), col = "red", xlab = "x", ylab = "Density", main = "")
I am probably misunderstanding what it means to draw from a power law distribution, but does this happen because we only fit the tail of the experimental distribution (so we only draw values above the parameter xmin)? If so, is there any way I can compensate for this so that the fitted distribution resembles the experimental distribution?
So there are a few things going on here.
As you hinted at in your question, if you want to compare distributions, you need to truncate moby, i.e. moby = moby[moby >= m_pl$getXmin()].
Using density() is a bit fraught. It is a kernel density estimator that places a smooth kernel (Gaussian by default) over each discrete point. As the power law has a very long tail, this is suspect.
Comparing the tails of two power law distributions is tricky (simulate some data and see).
Anyway, if you run
set.seed(1)
x = dist_rand(m_pl, n = length(moby))
# Cut off the tail for visualisation
moby = moby[moby >= m_pl$getXmin() & moby < 100]
plot(density(moby), log = "xy")
x = x[ x < 100]
lines(density(x), col = 2)
gives something fairly similar.
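To illustrate the truncation point, here is a Python sketch of inverse-transform sampling from a continuous power law with density proportional to x^(-alpha) for x >= xmin. It is a stand-in under stated assumptions: poweRlaw's displ object is a discrete power law, and the xmin and alpha values below are hypothetical, not the moby fit. Every draw is at least xmin, so simulated values should be compared with the data truncated at xmin rather than the whole data set.

```python
import random
import statistics

def rpowerlaw(n, xmin, alpha, rng):
    """Draw n values from a continuous power law p(x) ∝ x**(-alpha), x >= xmin.

    Inverse-transform sampling: if U ~ Uniform(0, 1), then
    xmin * (1 - U) ** (-1 / (alpha - 1)) has CDF 1 - (x / xmin) ** (1 - alpha).
    """
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

XMIN, ALPHA = 7.0, 2.0  # hypothetical parameters, for illustration only
rng = random.Random(1)
draws = rpowerlaw(20000, XMIN, ALPHA, rng)

print(min(draws) >= XMIN)             # no draw falls below xmin
print(statistics.median(draws))       # compare with the theoretical median below
print(XMIN * 2 ** (1 / (ALPHA - 1)))  # median of the power law: xmin * 2**(1/(alpha - 1))
```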

QQ plot in R from TASSEL pipeline

For my GWAS analysis I am using the TASSEL pipeline. In my GWAS I am studying two correlated traits.
I want to draw a QQ plot for both traits in one plot, like the one the TASSEL program produces.
Does anyone have a suggestion for which R package I can use to do that?
With the qq() command from the qqman package I can draw a QQ plot for each trait separately, but I want one plot that includes both traits, as I did in TASSEL.
Any suggestions?
A QQ-Plot in your case compares quantiles of the empirical distribution of your result to quantiles of the distribution that you'd expect theoretically if the null hypothesis is true.
If you have n data points, it makes sense to compare the n-quantiles, because then the actual quantiles of your empirical distribution are just your data points, ordered.
Under the null hypothesis, p-values follow a uniform distribution. Think about it: that is exactly what they are built for. If a measurement is assigned, for example, a p-value of 0.05, you'd expect a measurement this extreme or more extreme by pure chance (under the null hypothesis) in only 5% of your experiments, if you repeat the experiment very often. A measurement at least as extreme as one with p = 0.5 is expected in 50% of the cases. So, generalizing to any value p, your cumulative distribution function is
CDF(p) = P[measurement with p-value ≤ p] = p,
which (see Wikipedia) is the CDF of the uniform distribution between 0 and 1.
Therefore, the expected n-quantiles for your QQ-Plot are {1/n, 2/n, ..., n/n}. (They represent the case where the null hypothesis is true.)
So now we have the theoretical quantiles (x-axis) and the actual quantiles (y-axis). In R code, this is something like
expected_quantiles <- function(pvalues){
n = length(pvalues)
actual_quantiles = sort(pvalues)
expected_quantiles = seq_along(pvalues)/n
data.frame(expected = expected_quantiles, actual = actual_quantiles)
}
You can take the -log10 of these values and plot them, for example like so:
testdata1 <- c(runif(98,0,1), 1e-4, 2e-5)
testdata2 <- c(runif(96,0,1), 1e-3, 2e-3, 2e-4)
qq <- lapply(list(d1 = testdata1, d2 = testdata2), expected_quantiles)
xlim <- rev(-log10(range(rbind(qq$d1, qq$d2)$expected))) * c(1, 1.1)
ylim <- rev(-log10(range(rbind(qq$d1, qq$d2)$actual))) * c(1, 1.1)
plot(NULL, xlim = xlim, ylim = ylim)
points(x = -log10(qq$d1$expected) ,y = -log10(qq$d1$actual), col = "red")
points(x = -log10(qq$d2$expected) ,y = -log10(qq$d2$actual), col = "blue")
abline(a = 0, b = 1)
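The expected-versus-observed pairing can also be sketched in Python with the standard library only. As in the R answer, the expected quantiles are i/n and the observed quantiles are the sorted p-values. The two traits' p-values below are simulated uniform values plus a few small ones, mirroring testdata1 and testdata2, and the helper name qq_points is hypothetical. Under the null hypothesis the points stay close to the diagonal y = x.

```python
import math
import random

def qq_points(pvalues):
    """Return (-log10 expected, -log10 observed) pairs for a p-value QQ plot.

    Expected n-quantiles under the null (uniform p-values) are 1/n, 2/n, ..., n/n;
    the observed quantiles are simply the sorted p-values.
    """
    n = len(pvalues)
    observed = sorted(pvalues)
    expected = [(i + 1) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]

rng = random.Random(0)
trait1 = [rng.random() for _ in range(98)] + [1e-4, 2e-5]
trait2 = [rng.random() for _ in range(96)] + [1e-3, 2e-3, 2e-4]

# One series per trait; draw both on the same axes together with the line y = x.
for name, pts in (("trait1", qq_points(trait1)), ("trait2", qq_points(trait2))):
    print(name, len(pts), pts[0])  # pts[0] is the most significant point of each trait
```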
