histogram and pdf in the same graph [duplicate] - r

I'd like to plot a histogram and several PDFs on the same graph. I've tried it for just one PDF with the following code (adapted from code I found on the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
For multiple PDFs, what's the trick?
I've also noticed that the histogram seems to plot the density on the y-axis instead of the number of observations. How can I change this?
Many thanks!

It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
To head off your next question: if you specify freq=TRUE (or just leave it out, since that is the default) and add the curve, then the curve just runs along the bottom (that is the whole purpose of plotting the histogram as a density rather than as frequencies). You can work around this by scaling the expression given to curve by the width of the bins and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Though personally the first option (density scale) without tickmark labels on the y-axis makes more sense to me.
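For the "multiple PDFs" part of the question, one possible approach is to call curve() once per density with add=TRUE on a density-scale histogram. This is a minimal sketch with simulated data; the names dat, params and cols, and the two parameter sets, are illustrative assumptions rather than anything from the original post.

```r
# Simulated data standing in for the asker's `data` (an assumption)
set.seed(1)
a <- 5
b <- 2
dat <- rnorm(100, a, b)

# Histogram on the density scale so PDFs can be overlaid directly
h <- hist(dat, freq = FALSE, col = "grey", breaks = "FD")

# Hypothetical candidate distributions: one row per (mean, sd) pair
params <- data.frame(mean = c(a, mean(dat)), sd = c(b, sd(dat)))
cols <- c("red", "blue")

for (i in seq_len(nrow(params))) {
  # curve() evaluates the expression over x itself; no precomputed grid needed
  curve(dnorm(x, params$mean[i], params$sd[i]), col = cols[i], lwd = 2, add = TRUE)
}
legend("topright", legend = c("true parameters", "sample estimates"),
       col = cols, lwd = 2, bty = "n")
```

Any other density function (dgamma, dlnorm, and so on) could be swapped in for dnorm in the same loop.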

h<-hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit<-seq(min(data),max(data),length=100)
x.norm<-rnorm(n=100000, mean=a, sd=b)
yfit<-dnorm(xfit,mean=mean(x.norm),sd=sd(x.norm))
yfit <- yfit*diff(h$mids[1:2])*length(data)  # scale density to counts: bin width times number of observations
lines(xfit, yfit, col="blue", lwd=2)


How can I plot a smooth line over plot points, like a contour/skyline of the plot?

What I'm looking for is best explained by a picture: A line that "contours" the maxima of my points (like giving the "skyline" of the plot). I have a plot of scattered points with dense, (mostly) unique x coordinates (not equally distributed in either axis). I want a red line surfacing this plot:
What I've found so far is that a simple "draw as line" approach fails because the data is dense, with unique x values and a lot of local maxima and minima (basically at every point). For the same reason, a plain "take the maximum" approach does not work.
Therefore I'm asking: Is there some kind of smoothing option for a plot? Or any existing "skyline" operator for a plot?
I am specifically NOT looking for a "contour plot" or a "skyline plot" (as in Bayesian skylineplot) - the terms would actually describe what I want, but unfortunately are already used for other things.
Here is a minimal version of what I'm working with so far, a negative example of lines not giving the desired results. I uploaded sample data here.
load("xy_lidarProfiles.RData")
plot(x, y,
xlab="x", ylab="y", # axis
pch = 20, # point marker style (1 - 20)
asp = 1 # aspect of x and y ratio
)
lines(x, y, type="l", col = "red") # makes a mess
You will get close to your desired result if you order() by x values. What you want then is a running maximum, which TTR::runMax() provides.
plot(x[order(x)], y[order(x)], pch=20)
lines(x[order(x)], TTR::runMax(y[order(x)], n=10), col="red", lwd=2)
You may adjust the window with the n= parameter.
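If installing TTR is not an option, a trailing running maximum can be sketched in base R. The helper run_max below is hypothetical (it is not part of any package) and is only intended to mirror the trailing-window behaviour used above; the data is synthetic, standing in for the LiDAR profile.

```r
# Synthetic scatter standing in for the LiDAR profile data (an assumption)
set.seed(42)
x <- sort(runif(500, 0, 100))
y <- 10 * sin(x / 15) + 20 + rnorm(500, sd = 3)

# Hypothetical helper: trailing running maximum over the last n points;
# the first n - 1 positions have no full window and are left as NA
run_max <- function(v, n = 10) {
  out <- rep(NA_real_, length(v))
  for (i in seq_along(v)) {
    if (i >= n) out[i] <- max(v[(i - n + 1):i])
  }
  out
}

ord <- order(x)
sky <- run_max(y[ord], n = 10)

plot(x[ord], y[ord], pch = 20)
lines(x[ord], sky, col = "red", lwd = 2)  # NA values are simply not drawn
```

A larger n gives a smoother, higher "skyline"; a smaller n hugs the points more closely.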

Is it possible to create this graph on R?

I'm really new to R and I'm looking to create a graph similar to the one attached. I have tried to create a density plot using both ggplot and the base program.
I have used code ggplot(data, aes(x = Freq)) + geom_density() but the output is incorrect. I'm getting a spike at each number point rather than an overall curve. Every row is one data point of between 1 to 7 and the frequency distributions for one trait is as follows:
1: 500, 2: 550, 3: 700, 4: 1000, 5: 900, 6: 835, 7: 550
As such I have 5035 rows as one row equates to one score.
Any help is much appreciated.
Here is what I wish the plot would look like. (Note I'll add other traits at a later stage, I just wish to add one line at the moment).
There are a few things going on here. The first is generating summary statistics of the data: you just need to call mean and sd in the appropriate way to get the mean and standard deviation from your data. You've not shown your data, so it is difficult to suggest much here.
As for plotting these summary statistics, you can replicate the plot from the original paper easily, but it's pretty bad and I'd suggest you not do that. Stronger lines imply more importance, everything needs to be labelled twice, the y-axis is mislabelled, and on top of all that, drawing nice smooth parametric curves gives a false impression of confidence. I've only scanned the paper, but that sort of data is crying out for a multi-level model of some sort.
I prefer "base" graphics; ggplot is great for exploratory graphics, but if you have hard constraints on what a plot should look like it tends to get in the way. We start with the summary statistics:
df <- read.csv(text="
title, mu, sigma,label, label_x,label_pos
Extraversion, 4.0, 1.08,Extra, 3.85,3
Agreeableness, 5.0, 0.77,Agree, 5.0, 3
Conscientiousness, 4.7, 0.97,Cons, 3.4, 2
Emotional stability,5.3, 0.84,Emot stab,5.9, 4
Intellect, 3.7, 0.86,Intellect,3.7, 3
")
I've just pulled these numbers out of the paper; you'd have to calculate them yourself. The mu column is the mean of the variable, and sigma is the standard deviation. label_x and label_pos are used to draw labels, so they need to be chosen manually (or the plot can be annotated afterwards in something like Inkscape). label_x is the x-axis position, and label_pos says where the label sits in relation to the x-y point (see ?text for info about the pos parameter).
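As a rough illustration of calculating those values yourself, here is a minimal sketch that rebuilds the single trait from the frequency table quoted in the question and computes its mean and standard deviation. The variable names scores, counts and obs are made up for this example.

```r
# Frequency table quoted in the question: scores 1..7 and their counts
scores <- 1:7
counts <- c(500, 550, 700, 1000, 900, 835, 550)

# Expand to one row per observation, as in the asker's 5035-row data
obs <- rep(scores, times = counts)

mu <- mean(obs)
sigma <- sd(obs)

# A normal curve with these parameters, drawn over the 1..7 score range
curve(dnorm(x, mu, sigma), from = 1, to = 7, n = 151,
      xlab = "Score", ylab = "Probability density")
```

With real data you would call mean() and sd() on the score column for each trait and fill them into the mu and sigma columns.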
next we calculate a couple of things:
lwds <- 1 + seq(3, 1, len=5) ^ 2
label_y <- dnorm(df$label_x, df$mu, df$sigma)
i.e. line widths and label y positions, and we can start to make the plot:
# start by setting up plot nicely and setting plot limits
par(bty='l', mar=c(3, 3, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(0, 0.56), yaxs='i')
# loop over data drawing curves
for (i in 1:nrow(df)) {
curve(dnorm(x, df$mu[[i]], df$sigma[[i]]), add=T, n=151, lwd=lwds[[i]])
}
# draw labels
text(df$label_x, label_y, df$label, pos=df$label_pos)
# draw axes
axis(1, lwd=0, lwd.ticks=1)
axis(2, lwd=0, lwd.ticks=1)
box(lwd=1)
# finally, title and legend
title(xlab='Level of state', ylab='Probability density')
legend('topleft', legend=df$title, lwd=lwds, bty='n', cex=0.85)
this gives us something like:
I've also gone with more modern capitalisation, and started the y-axis at zero, as these are probability densities and so can't be negative.
My preference would be for something closer to this:
The thin lines cover 2 standard deviations either side of the mean (i.e. roughly 95% intervals), the thick lines cover 1 SD (68%), and the point is the mean. It's much easier to discriminate each measure and compare across them, and it doesn't artificially make "extraversion" more prominent. The code for this is similar:
par(bty='l', mar=c(3, 8, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(5.3, 0.7))
# draw quantiles
for (i in 1:nrow(df)) {
lines(df$mu[[i]] + df$sigma[[i]] * c(-1, 1), rep(i,2), lwd=3)
lines(df$mu[[i]] + df$sigma[[i]] * c(-2, 2), rep(i,2), lwd=1)
}
# and means
points(df$mu, 1:5, pch=20)
axis(1, lwd=0, lwd.ticks=1)
axis(2, at=1:5, labels=df$title, lwd=0, lwd.ticks=1, las=1)
box()
title(xlab='Level of state')

Access lines plotted by R using basic plot()

I am trying to do the following:
plot a time series in R using a polygonal line
plot one or more horizontal lines superimposed
find the intersections of said line with the horizontal ones
I got this far:
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
p <- plot(c1, type = 'l')
# set thresholds
thresholds <- c(0.7, 0.77)
I can find no way to access the segment line object plotted by R. I really really really would like to do this with base graphics, while realizing that probably there's a ggplot2 concoction out there that would work. Any idea?
abline(h=thresholds, lwd=1, lty=3, col="dark grey")
I will just do one threshold. You can loop through the list to get all of them.
First find the points, x, so that the curve crosses the threshold between x and x+1
shift = (c1 - 0.7)
Lower = which(shift[-1]*shift[-length(shift)] < 0)
Then find the actual crossing points by finding the roots of c1 - 0.7, and plot them:
shiftedF = approxfun(1:length(c1), c1-0.7)
Intersections = sapply(Lower, function(x) { uniroot(shiftedF, x:(x+1))$root })
points(Intersections, rep(0.7, length(Intersections)), pch=16, col="red")
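To cover every threshold, the same idea can be wrapped in a function and looped. The sketch below uses closed-form linear interpolation on each crossing segment instead of uniroot, which should give the same points for a piecewise-linear line. The helper name find_crossings is hypothetical. As in the code above, points that lie exactly on a threshold are not reported.

```r
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
thresholds <- c(0.7, 0.77)

# Hypothetical helper: x positions where the polyline through
# (1, y[1]), (2, y[2]), ... crosses the horizontal line at `thr`
find_crossings <- function(y, thr) {
  y <- as.numeric(y)                      # drop ts attributes before index arithmetic
  d <- y - thr
  i <- which(d[-1] * d[-length(d)] < 0)   # sign change between i and i + 1
  # Linear interpolation within each crossing segment
  i + (thr - y[i]) / (y[i + 1] - y[i])
}

plot(c1, type = "l")
abline(h = thresholds, lwd = 1, lty = 3, col = "dark grey")
for (thr in thresholds) {
  xs <- find_crossings(c1, thr)
  points(xs, rep(thr, length(xs)), pch = 16, col = "red")
}
```

This assumes the series is indexed 1, 2, ..., as it is for as.ts() with default start and frequency; for other time axes the positions would need mapping through time(c1).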

plot lines instead of points R

This is probably a simple question, but I'm not able to find the solution for it.
I have the following plot (I'm using plotCI since I'm not able to fill the points with plot()).
leg<-c("1","2","3","4","5","6","7","8")
Col.rar1<-c(rgb(1,0,0,0.7), rgb(0,0,1,0.7), rgb(0,1,1,0.7),rgb(0.6,0,0.8,0.7),rgb(1,0.8,0,0.7),rgb(0.4,0.5,0.6,0.7),rgb(0.2,0.3,0.2,0.7),rgb(1,0.3,0,0.7))
library(plotrix)
plotCI(test$size,test$Mean,
pch=c(21), pt.bg=Col.rar1,xlab="",ylab="", ui=test$Mean,li= test$Mean)
legend(4200,400,legend=leg,pch=c(21),pt.bg=Col.rar1, bty="n", cex=1)
I want to create the same effect but with continuous lines instead of points.
Any suggestion?
You have two options:
Use the lines() function, which draws lines between (x, y) locations.
Use plot() with type = "l" (for "line").
It is hard to show without a reproducible example, but you can do, for example:
Col.rar1<-c(rgb(1,0,0,0.7), rgb(0,0,1,0.7), rgb(0,1,1,0.7),rgb(0.6,0,0.8,0.7),rgb(1,0.8,0,0.7),rgb(0.4,0.5,0.6,0.7),rgb(0.2,0.3,0.2,0.7),rgb(1,0.3,0,0.7))
x <- seq(0, 5000, length.out=10)
y <- matrix(sort(rnorm(10*length(Col.rar1))), ncol=length(Col.rar1))
plot(x, y[,1], ylim=range(y), ann=FALSE, axes=T,type="l", col=Col.rar1[1])
lapply(seq_along(Col.rar1),function(i){
lines(x, y[,i], col=Col.rar1[i])
points(x, y[,i]) # this is optional
})
When it comes to generating plots where you want lines connected according to some grouping variable, you want to get away from base-R plots and check out lattice and ggplot2. Base-R plots don't have a simple concept of 'groups' in an xy plot.
A simple lattice example:
library( lattice )
dat <- data.frame( x=rep(1:5, times=4), y=rnorm(20), gp=rep(1:4,each=5) )
xyplot( y ~ x, dat, groups=gp, type='b' )
You should be able to use something like this if you have a variable in test similar to the color vector you define.
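For comparison, a rough ggplot2 equivalent of the lattice example might look like the following. It assumes the ggplot2 package is installed and uses the same simulated data layout; the legend title "group" is an arbitrary choice.

```r
library(ggplot2)

# Same simulated layout as the lattice example: 4 groups of 5 points
set.seed(1)
dat <- data.frame(x = rep(1:5, times = 4), y = rnorm(20), gp = rep(1:4, each = 5))

# One line per group; factor() gives discrete colours
p <- ggplot(dat, aes(x = x, y = y, group = gp, colour = factor(gp))) +
  geom_line() +
  geom_point() +
  labs(colour = "group")
print(p)
```

Dropping geom_point() leaves lines only, which is closer to what the question asks for.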

Make y-axis logarithmic in histogram using R [duplicate]

Hi, I'm making a histogram using R, but the counts on the y-axis are so large that I need to put the axis on a logarithmic scale. See my script below:
hplot<-read.table("libl")
hplot
pdf("first_end")
hist(hplot$V1, breaks=24, xlim=c(0,250000000), ylim=c(0,2000000),main="first end mapping", xlab="Coordinates")
dev.off()
So how should I change my script?
thx
You can save the histogram data to tweak it before plotting:
set.seed(12345)
x = rnorm(1000)
hist.data = hist(x, plot=F)
hist.data$counts = log10(hist.data$counts)
dev.new(width=4, height=4)
hist(x)
dev.new(width=4, height=4)
plot(hist.data, ylab='log10(Frequency)')
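One caveat with this approach is that a bin with zero counts becomes -Inf after log10(). A possible workaround, shown here purely as an assumption about what is acceptable for the plot, is to use log10(count + 1) so that empty bins map to 0:

```r
set.seed(12345)
x <- rnorm(1000)
hist.data <- hist(x, plot = FALSE)

# Assumption: plot log10(count + 1) so empty bins map to 0 instead of -Inf
hist.data$counts <- log10(hist.data$counts + 1)

plot(hist.data, ylab = "log10(Frequency + 1)")
```

The shift by one slightly distorts small counts, so the axis label should say so, as it does here.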
Another option would be to use plot(density(hplot$V1), log="y").
It's not a histogram, but it shows just about the same information, and it avoids the illogical part where a bin with zero counts is not well-defined in log-space.
Of course, this is only relevant when your data is continuous and not when it's really categorical or ordinal.
A histogram with the y-axis on the log scale will be a rather odd histogram. Technically it will still fit the definition, but it could look rather misleading: the peaks will be flattened relative to the rest of the distribution.
Instead of using a log transformation, have you considered:
Dividing the counts by 1 million:
h <- hist(hplot$V1, plot=FALSE)
h$counts <- h$counts/1e6
plot(h)
Plotting the histogram as a density estimate:
hist(hplot$V1, freq=FALSE)
You can log your y-values for the plot and add a custom log y-axis afterwards.
Here is an example for a table object of random normal distribution numbers:
# data
count = table(round(rnorm(10000)*2))
# plot
plot(log(count) ,type="h", yaxt="n", xlab="position", ylab="log(count)")
# axis labels
yAxis = c(1,10,100,1000)  # 0 is left out because log(0) is -Inf
# draw axis labels
axis(2, at=log(yAxis),labels=yAxis, las=2)
