I would like to add a density curve to a histogram while keeping the y-axis with count values. The answer on this forum describes how to add a density curve to a histogram but with the probability on the y-axis. Here is the code:
using Pkg  # needed before Pkg.add can be called
Pkg.add("Plots")
Pkg.add("Distributions")
using Distributions, Plots
dist = Normal(0, 1)
data = rand(dist, 1000)
histogram(data, normalize=true)
plot!(x->pdf(dist, x), xlim=xlims())
It nicely creates the histogram with a density curve and density y-axis. But I was wondering if anyone knows how to add a density curve to a histogram while keeping the y-axis as count values in Julia?
If you still wish to use automatic binning like in the example in the question, then I've found the following will work:
plt = histogram(data, normalize=false)
pre_factor = plt.series_list[1]
factor = pre_factor.plotattributes[:bar_width][1]
plot!(x->length(data)*factor*pdf(dist, x), xlim=xlims())
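The code digs the bar width out of the histogram object so that the pdf can be scaled by length(data) times the bin width, which converts a density into expected counts per bin. It relies on fields of the plot object (series_list, plotattributes), so it may need adjusting between Plots versions.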
Also, you might want to look at charts with 2 y-axes. They do appear in some places, and I don't know how to draw them, but maybe someone else does (or it needs investigation).
The exact fit depends on the bins you use to plot the histogram. It is easiest to create fixed-width bins and then do:
julia> using StatsBase
julia> h = fit(Histogram, data, nbins=20)
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}}
edges:
-3.0:0.5:3.5
weights: [4, 12, 46, 98, 143, 204, 196, 137, 101, 32, 23, 3, 1]
closed: left
isdensity: false
julia> plot(h)
julia> plot!(x->length(data)*step(h.edges[1])*pdf(dist, x))
Related
For example, I want to plot this function y1 = function(x) 2*x/sqrt(x^2 + 1) in range (-2, 12)
But when I tried plot(y1(-2:12), type='l'), the plot came out as a graph made up of several straight lines. How could I make it smoother?
Also, how could I define my own range of x and y shown on the plot? Thanks!
(sorry I can't insert any pictures of my plot because lacking reputation points...)
Use the curve function:
curve(2*x/sqrt(x^2 + 1), -2, 12)
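The jagged look comes from evaluating y1 only at the 15 integers in -2:12. As a rough sketch of the alternative to curve, one can evaluate the function on a finer grid and set the visible ranges with xlim and ylim. The grid size of 500 and the ylim values below are arbitrary choices of mine, not something from the question:

```r
# Function from the question
y1 <- function(x) 2 * x / sqrt(x^2 + 1)

# A fine grid instead of the 15 integer points in -2:12
# (500 is an arbitrary choice; more points give a smoother line)
xs <- seq(-2, 12, length.out = 500)

# xlim and ylim set the visible x and y ranges
# (c(-2.5, 2.5) is an assumed range that comfortably contains the curve)
plot(xs, y1(xs), type = "l",
     xlim = c(-2, 12), ylim = c(-2.5, 2.5),
     xlab = "x", ylab = "y1(x)")
```

curve accepts the same xlim and ylim arguments, plus n for the number of evaluation points, so curve(y1, -2, 12, n = 500, ylim = c(-2.5, 2.5)) should give an equivalent picture.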
I have a vector
v = [..., -10, -10, -10, ..., 1, 2, 5, 6, 7, 9, ...]
The geom_density plots the histogram of this vector in a smooth fashion, like a density function!
How can I use the auc (area under the curve) function of the MESS library to compute the area under the curve of the density plot of such a vector in a given interval, let's say (-1, 3)?
"The geom_density plots the histogram of this vector in a smooth fashion, like a density function!"
Well, that's because geom_density performs a kernel density estimation! So it's not "like a density function", it is a density function.
Under the hood of geom_density it is actually stats::density that performs the density estimation. The kernel density estimates are given such that they define a proper probability density function with unit area under the curve.
We can confirm that by
x <- rnorm(100)
dens <- density(x)
df <- data.frame(x = dens$x, y = dens$y)
sum(df$y) * diff(df$x)[1]
#[1] 1.000952
Close enough.
It's straightforward to integrate the density function over a specific range by summing the corresponding values in df; since you don't provide sample data, I leave that up to you.
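For what it's worth, here is a minimal sketch of that summation for the interval (-1, 3), using simulated data in place of the real vector. It uses base R only rather than MESS; the approxfun/integrate variant is my own addition as a cross-check, and the seed and sample size are arbitrary:

```r
set.seed(1)          # arbitrary seed so the sketch is reproducible
x <- rnorm(100)      # stand-in for the real vector
dens <- density(x)   # evaluated on an equally spaced grid

# Riemann sum over the grid points that fall inside (-1, 3)
dx <- diff(dens$x)[1]
in_range <- dens$x >= -1 & dens$x <= 3
area_sum <- sum(dens$y[in_range]) * dx

# Cross-check: interpolate the estimate and integrate it numerically
# (the density is treated as 0 outside the evaluated grid)
dens_fun <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)
area_int <- integrate(dens_fun, lower = -1, upper = 3,
                      subdivisions = 1000)$value

area_sum
area_int
```

The two numbers should agree to within roughly one grid step times the density height; the sum is the cruder of the two because it ignores where the interval ends fall between grid points.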
I am trying to do the following:
plot a time series in R using a polygonal line
plot one or more horizontal lines superimposed
find the intersections of said line with the horizontal ones
I got this far:
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
p <- plot(c1, type = 'l')
# set thresholds
thresholds <- c(0.7, 0.77)
I can find no way to access the segment line object plotted by R. I really really really would like to do this with base graphics, while realizing that probably there's a ggplot2 concoction out there that would work. Any idea?
abline(h=thresholds, lwd=1, lty=3, col="dark grey")
I will just do one threshold. You can loop through the list to get all of them.
First find the points, x, so that the curve crosses the threshold between x and x+1
shift = (c1 - 0.7)
Lower = which(shift[-1]*shift[-length(shift)] < 0)
Find the actual points of crossing, by finding the roots of Series - 0.7 and plot
shiftedF = approxfun(1:length(c1), c1-0.7)
Intersections = sapply(Lower, function(x) { uniroot(shiftedF, x:(x+1))$root })
points(Intersections, rep(0.7, length(Intersections)), pch=16, col="red")
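To cover all thresholds at once, the same idea can be wrapped in a function. The sketch below is a variation of my own: because the plotted line is piecewise linear, the crossing inside each segment can be computed in closed form instead of calling uniroot. The helper name find_crossings is made up for illustration, and the code assumes the series is indexed 1, 2, 3, ... as as.ts produces here:

```r
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
thresholds <- c(0.7, 0.77)

# Hypothetical helper: x-positions where the polygonal line crosses `threshold`
find_crossings <- function(series, threshold) {
  y <- as.numeric(series) - threshold
  # segments [i, i + 1] whose endpoints lie on opposite sides of the threshold
  lower <- which(y[-1] * y[-length(y)] < 0)
  # linear interpolation inside each such segment
  lower + y[lower] / (y[lower] - y[lower + 1])
}

crossings <- lapply(thresholds, find_crossings, series = c1)

plot(c1, type = "l")
abline(h = thresholds, lwd = 1, lty = 3, col = "dark grey")
for (k in seq_along(thresholds)) {
  points(crossings[[k]], rep(thresholds[k], length(crossings[[k]])),
         pch = 16, col = "red")
}
```

Like the uniroot version, this skips points where the series touches a threshold exactly, since the product of the shifted neighbours is then 0 rather than negative.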
I've been working extensively with R lately and I have a nitpicky plotting question.
I've attached an image of my current plot as reference. As you can see, I've added vertical lines to segment parts of my data inputs. I have 200 'agents' and each of them comes from different categorical subsets which make them all a little different. So, my goal is to keep the bottom axis as the index of my 'agents' vector, but I'd like to add a label to each of my subdivisions at the bottom to make it a little clearer as to why I'm segmenting them with the vertical lines.
Any suggestions?
http://i.imgur.com/YGNdBhg.png?1?1971
You just need to call axis like this:
x = sin(1:100) + rnorm(100, 0,.125)
breaks = c(10,33,85, 96)
plot(x)
abline(v=breaks, lty=2)
axis(1, breaks, as.character(breaks))
If you don't want the default ticks plotted at all (i.e. just the ticks in the "breaks" vector) you just need to modify this slightly:
plot(x, axes=FALSE)
abline(v=breaks, lty=2)
axis(1, breaks, as.character(breaks))
axis(2)
box()
You don't provide any example data or code, so the code I am sending is untested. I am calling the vector of vertical lines vertlines and the vector of labels labels. I define the midpoints of each category using the vertlines and the range of the agent values. Then I add them to the plot using the mtext() function. Give it a try.
vertlines <- c(40, 80, 120, 140, 160, 180)
labels <- letters[1:7]
labelx <- diff(c(1, vertlines, 200))/2 + c(1, vertlines)
mtext(labels, at=labelx, side=1, line=4)
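Here is a self-contained version of the same idea that can be run as is. The agents vector is simulated purely for illustration, and line = 2.5 is just my choice for placing the labels between the tick labels and the bottom of the margin:

```r
set.seed(1)
agents <- rnorm(200)  # made-up stand-in for the 200 agent values

vertlines <- c(40, 80, 120, 140, 160, 180)
labels <- letters[1:7]

# Midpoint of each segment: left edge plus half the segment width
labelx <- diff(c(1, vertlines, 200)) / 2 + c(1, vertlines)

plot(agents, xlab = "")          # index of the agents stays on the x-axis
abline(v = vertlines, lty = 2)   # abline is vectorised over v
mtext(labels, side = 1, at = labelx, line = 2.5)
```

Six dividing lines give seven segments, which is why labels has one more element than vertlines.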
I am plotting a scatter plot(x, y) and want it to be log-valued, so I do: plot(log(x), log(y)). I'd like to deal with cases where some value in x is 0, and thus not on the plot, while the corresponding y value is nonzero.
I'd like to display the scatter with log ticks but natural number values, meaning if in log2 then the ticks should be 2^0, 2^1, 2^2, ... that would allow me to plot 0 values on the scale as well so as to not miss these points.
Here's an example:
> x = c(0, 1, 20, 100, 200, 500)
> y = c(1, 16, 32, 105, 300, 50)
> plot(x, y)
There are six points. If I use:
> plot(log2(x), log2(y))
Only five points are plotted, since (x[1], y[1]) is omitted (the x-value is 0). Therefore, I'd like to plot the log values but have the tick labels be natural numbers that are simply marked on a log scale. Then the same axis can show 0, 2^0 (which is 1, of course), 2^1, 2^2, and so on, and the point (x[1], y[1]) will still be plotted while keeping the log scale.
Side note: I don't think it's fair to downvote a post asking something very reasonable with an example. This is clearly on topic and relevant, and will come up for virtually everyone who is plotting things on a log value and cares about boundary / edge cases.
(I know some people deal with this by adding an arbitrary small constant to all points but I'd like to avoid that as it is messy.) thanks
If I understand correctly, you want to plot x versus y on a log scale?
Here is an example using lattice and latticeExtra:
# Some reproducible data
tm <- data.frame(x=seq(0,10,1),y=seq(0,10,1))
library(lattice)
library(latticeExtra)
xyplot(x ~ y , data=tm,
scales= list(x=list(log=2),
y=list(log=2)),
xscale.components = xscale.components.logpower, ## to get pretty scales
yscale.components = yscale.components.logpower
)
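Note that zeros still drop out of a true log axis, since log2(0) is -Inf. One possible workaround in base graphics, sketched below with the data from the question, is to reserve an extra tick position for 0 one step to the left of 2^0 and label the ticks by hand. This is my own suggestion rather than a feature of either package, and the slot at -1 is an arbitrary choice, so the distance between the 0 tick and the 1 tick carries no meaning:

```r
x <- c(0, 1, 20, 100, 200, 500)
y <- c(1, 16, 32, 105, 300, 50)

# log2 positions; zeros get an artificial slot at -1 (arbitrary choice)
lx <- ifelse(x > 0, log2(x), -1)
ly <- log2(y)

powers <- 0:9  # 2^9 = 512 covers the largest values in the data

plot(lx, ly, xaxt = "n", yaxt = "n", xlab = "x", ylab = "y",
     xlim = c(-1, 9), ylim = c(0, 9))
axis(1, at = c(-1, powers), labels = c(0, 2^powers))  # tick at -1 is labelled 0
axis(2, at = powers, labels = 2^powers)
```

All six points now appear, with tick labels shown as natural numbers; R may skip some labels if they would overlap.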