Problem with axis limits when plotting curve over histogram [duplicate] - r

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
(1 answer)
Closed 6 years ago.
newbie here. I have a script to create graphs that has a bit that goes something like this:
png(Test.png)
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a curve of a distribution over an histogram. My problem is that the axes limits are apparently set by the histogram instead of the curve, so that the curve sometimes gets out of the Y axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, in the alternative, of the curve only)?? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted and within a series of subplots (par("mfrow")).

Inspired by other answers, this is what i ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.

You could determine the mx <- max(curve_vector, ht$counts) and set ylim=(0, mx), but I rather doubt the code looks like that since [] is not a proper parameter passing idiom and step is not an R plotting function, but rather a model selection function. So I am guessing this is code in Matlab or some other idiom. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
plot(round(ht$breaks), dpois( round(ht$breaks), # plot a Poisson density
mean(ht$counts*round(ht$breaks[-length(ht$breaks)]))),
ylim=c(0, max(ht$density)+.1) , type="l")
plot(ht, freq=FALSE, add=TRUE) # plot the histogram
dev.off()

You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the the highest y-value of the curve (there may be shortcuts to do this depending on the nature of the curve) and the highest bar in the histogram and give this value to the ylim argument when plotting the histogram.

Related

How to find y-axis values given x-axis for multiple graphs

I have a simple question that I don't know the answer to.
Assume having multiple graphs on a plot. I would like to see the exact y-values on all graphs given a specific x.
Here is a sample R code:
x1=c(1,5,7,9,15)
y1=c(50,30,43,33,12)
x2=c(1,3,5.5,6,15)
y2=c(20,55,44,38,10)
plot(x1,y1,type="o",ylim=c(1,60))
points(x2,y2,type="o")
abline(v=c(2.5,4,6,10))
My question is how I can find the exact y-value for any vertical line crossing the plots?
You can create functions that will tell you the values with approxfun.
NewPoints = c(2.5,4,6,10)
f1= approxfun(x1,y1)
f2= approxfun(x2,y2)
Now the values that you want are: f1(NewPoints) and f2(NewPoints). You can see this by plotting:
points(NewPoints, f1(NewPoints), pch=16, col="red")
points(NewPoints, f2(NewPoints), pch=16, col="blue")

R: pROC package: plot ROC curve across specific range?

I would like to plot a segment of an ROC curve over a specific range of x values, instead of plotting the entire curve. I don't want to change the range of the x axis itself. I just want to plot only part of the ROC curve, within a range of x values that I specify.
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$wfns)
plot(rocobj)
That code plots the whole ROC curve. Let's say I just wanted to plot the curve from x=1 to x=.5. How could I do that? Thank you.
The default plot function for roc objects plots the rocobj$sensitivities as a function of rocobj$specificities.
So
plot(rocobj$specificities,rocobj$sensitivities,type="l",xlim=c(1.5,-0.5))
abline(1,-1)
achieves the same as
plot(rocobj)
And
plot(rocobj$specificities[2:6],rocobj$sensitivities[2:6],type="l",xlim=c(1.5,-0.5),ylim=c(0,1))
abline(1,-1)
Gets close to what I think you are after (plots from 0.514 to 1.0). I don't know enough about the package to know if the sensitivities can be calculated at a specific range, or resolution of specificities.
The plot function of pROC uses the usual R semantics for plotting, so you can use the xlim argument as you would for any other plot:
plot(rocobj, xlim = c(1, .5))

R - logistic curve plot with aggregate points

Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1)raw data points
2)the loigistic curve and both
3)Predicted points
4)and aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the real data survival rates for a given levels of x1
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model and all sort of other things but I simply can't plot the two together. Is there a way around this?
I basically want to overimpose the last plot along the same axes.
Besides this specific task I often wonder how to overimpose different plots that plot different variables but have similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor, you need to convert the levels to numeric before you can add the points to the plot.
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
to add the points to the exisiting plot, use points
points (AggBd$size, AggBd$x, pch = 3)
You are best specifying your y-axis. Also maybe using par(new=TRUE)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
obviously remove or change the axis ticks to prevent overlap e.g.
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
giving:

Plot histogram and density function curve on one chart

I have a density function f, and I do MCMC sampling for it. To evaluate the goodness of the sampling, I need to plot the hist and curve within the same chart. The problem of
hist(samples);
curve(dfun,add=TRUE);
is that they are on the different scale: the frequency of a certain bin is usually hundreds, while the maximum of a density function is about 1 or so. What I want to do is to configure two plots at the same height, with one y-axis on the left and the other on the right. Can anyone help? Thank you.
Use the prob=TRUE argument to hist:
hist(samples, prob=TRUE)
curve(dfun,add=TRUE)
Also see this SO question

Make y-axis logarithmic in histogram using R [duplicate]

This question already has answers here:
Histogram with Logarithmic Scale and custom breaks
(7 answers)
Closed 5 years ago.
Hi I'm making histogram using R, but the number of Y axis is so large that I need to turn it into logarithmic.See below my script:
hplot<-read.table("libl")
hplot
pdf("first_end")
hist(hplot$V1, breaks=24, xlim=c(0,250000000), ylim=c(0,2000000),main="first end mapping", xlab="Coordinates")
dev.off()
So how should I change my script?
thx
You can save the histogram data to tweak it before plotting:
set.seed(12345)
x = rnorm(1000)
hist.data = hist(x, plot=F)
hist.data$counts = log10(hist.data$counts)
dev.new(width=4, height=4)
hist(x)
dev.new(width=4, height=4)
plot(hist.data, ylab='log10(Frequency)')
Another option would be to use plot(density(hplot$V1), log="y").
It's not a histogram, but it shows just about the same information, and it avoids the illogical part where a bin with zero counts is not well-defined in log-space.
Of course, this is only relevant when your data is continuous and not when it's really categorical or ordinal.
A histogram with the y-axis on the log scale will be a rather odd histogram. Technically it will still fit the definition, but it could look rather misleading: the peaks will be flattened relative to the rest of the distribution.
Instead of using a log transformation, have you considered:
Dividing the counts by 1 million:
h <- hist(hplot$V1, plot=FALSE)
h$counts <- h$counts/1e6
plot(h)
Plotting the histogram as a density estimate:
hist(hplot$V1, freq=FALSE)
You can log your y-values for the plot and add a custom log y-axis afterwards.
Here is an example for a table object of random normal distribution numbers:
# data
count = table(round(rnorm(10000)*2))
# plot
plot(log(count) ,type="h", yaxt="n", xlab="position", ylab="log(count)")
# axis labels
yAxis = c(0,1,10,100,1000)
# draw axis labels
axis(2, at=log(yAxis),labels=yAxis, las=2)

Resources