Adjusting plot axis in a user-defined function in R

I have a function in R which creates a standard normal plot, and then uses a for loop that calls density plots for the t distribution for various degrees of freedom. The plot looks like:
Note that the density for degrees of freedom = 2 extends above the upper y-axis limit. Is there a way to edit the for loop so that the axis limits are adjusted to the range of the density lines that are drawn?
The code I am using is as follows:
library(RColorBrewer)  # provides brewer.pal()
N <- 1000
n <- c(25, 50, 100, 200)
df <- c(1:4, seq(5, 25, by = 5))
histPlot <- function(data) {
  x <- seq(-4, 4, length = 100)
  y <- dnorm(x, mean = 0, sd = 1)
  plot(x, y, type = "l",
       main = paste("Distribution of size", nrow(data) / 9000, sep = " "),
       xlab = "standard deviation")
  colors <- brewer.pal(n = 9, name = "Spectral")
  i <- 1
  for (d in df) {
    # Each call adds one density curve to the existing plot
    lines(density(data[data$df == d, "t"]), col = colors[i])
    i <- i + 1
  }
  # Draw the legend once, after all curves have been added
  legend("topright", lty = 1, col = c(colors, "black"),
         legend = c(df, "normal"), bty = "o", cex = .8)
}

The lines() calls inside the for loop add to the existing plot; they do not change its axis limits, which are fixed by the initial plot() call.
This means you have to set the ylim parameter in the plot() call. A larger y range makes the plot taller, so the added lines are visible.
Try like this:
plot(x, y, type="l",
main=paste("Distribution of size", nrow(data)/9000, sep=" "),
xlab="standard deviation",
ylim = c(0, 1)) # This line will make the plot higher, i.e. the y axis range will be from 0 to 1
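Hard-coding ylim = c(0, 1) works, but the question asks for limits that follow the curves. A minimal sketch, assuming the data frame has the df and t columns used in the question: compute all the densities first, take the largest y value, and only then call plot(). The simulated data, the name histPlotAuto, and the use of base rainbow() instead of RColorBrewer are illustrative assumptions, not part of the original code.

```r
# Hypothetical simulated data: 1000 draws from a t distribution for each df
set.seed(1)
dfs <- c(1:4, seq(5, 25, by = 5))
sim <- data.frame(df = rep(dfs, each = 1000),
                  t  = unlist(lapply(dfs, function(d) rt(1000, d))))

histPlotAuto <- function(data, dfs) {
  x <- seq(-4, 4, length = 100)
  y <- dnorm(x)
  # Compute every density before plotting so the y range is known in advance
  dens <- lapply(dfs, function(d) density(data[data$df == d, "t"]))
  ymax <- max(y, sapply(dens, function(dd) max(dd$y)))
  plot(x, y, type = "l", ylim = c(0, ymax),
       main = "t densities vs. normal", xlab = "standard deviation")
  cols <- rainbow(length(dfs))
  for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
  legend("topright", lty = 1, col = c(cols, "black"),
         legend = c(dfs, "normal"), cex = 0.8)
  invisible(c(0, ymax))  # return the limits that were used
}

lims <- histPlotAuto(sim, dfs)
```

Because ymax is the maximum over the normal curve and every density, no curve can be cut off at the top.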

Related

How do you implement rgamma and dgamma in a single plot

For an assignment I was asked this:
For the values of
(shape=5,rate=1),(shape=50,rate=10),(shape=.5,rate=.1), plot the
histogram of a random sample of size 10000. Use a density rather than
a frequency histogram so that you can add in a line for the population
density (hint: you will use both rgamma and dgamma to make this plot).
Add an abline for the population and sample mean. Also, add a subtitle
that reports the population variance as well as the sample variance.
My current code looks like this:
library(ggplot2)
set.seed(1234)
x = seq(1, 1000)
s = 5
r = 1
plot(x, dgamma(x, shape = s, rate = r), rgamma(x, shape = s, rate = r), sub =
paste0("Shape = ", s, "Rate = ", r), type = "l", ylab = "Density", xlab = "", main =
"Gamma Distribution of N = 1000")
After running it I get this error:
Error in plot.window(...) : invalid 'xlim' value
What am I doing incorrectly?
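plot() does not take y1 and y2 arguments (see ?plot). Because type, sub, main, xlab and ylab are passed by name, the third unnamed argument (the rgamma() vector) is matched to xlim, which is why you get the invalid 'xlim' value error. You need to plot (or draw a histogram of) one y variable (e.g., from rgamma), then add the second y variable (e.g., from dgamma) using something like lines().
Here's one way to get what you want: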
#specify parameters
s = 5
r = 1
# plot histogram of random draws
set.seed(1234)
N = 1000
hist(rgamma(N, shape=s, rate=r), breaks=100, freq=FALSE)
# add true density curve
x = seq(from=0, to=20, by=0.1)
lines(x=x, y=dgamma(x, shape=s, rate=r))
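The answer above covers one parameter pair. A sketch of the full assignment, assuming the standard Gamma moments (population mean shape/rate, population variance shape/rate^2): it loops over the three pairs with N = 10000, adds vertical lines for the population and sample means, and reports both variances in the subtitle. The density grid starts slightly above 0 because dgamma() is infinite at 0 when shape < 1; the colours and the grid size are arbitrary choices.

```r
# Parameter pairs from the assignment: c(shape, rate)
params <- list(c(5, 1), c(50, 10), c(0.5, 0.1))
N <- 10000
set.seed(1234)
op <- par(mfrow = c(1, 3))
for (p in params) {
  s <- p[1]
  r <- p[2]
  draws <- rgamma(N, shape = s, rate = r)
  hist(draws, breaks = 100, freq = FALSE, xlab = "",
       main = paste0("Gamma(shape = ", s, ", rate = ", r, ")"),
       sub = sprintf("Pop. var = %.2f, sample var = %.2f", s / r^2, var(draws)))
  # Population density; start just above 0 to avoid Inf when shape < 1
  xs <- seq(0.01, max(draws), length.out = 500)
  lines(xs, dgamma(xs, shape = s, rate = r), lwd = 2)
  abline(v = s / r, col = "blue", lwd = 2)        # population mean
  abline(v = mean(draws), col = "red", lty = 2)   # sample mean
}
par(op)
```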

Logarithmic scale plot in R

I want to plot the clustering coefficient and the average shortest-path length as a function of the parameter p of the Watts-Strogatz model, as follows:
And this is my code:
library(igraph)
library(ggplot2)
library(reshape2)
library(pracma)
p <- #don't know how to generate this?
trans <- -1
path <- -1
for (i in p) {
ws_graph <- watts.strogatz.game(1, 1000, 4, i)
trans <-c(trans, transitivity(ws_graph, type = "undirected", vids = NULL,
weights = NULL))
path <- c(path,average.path.length(ws_graph))
}
# Remove the auxiliary initial values
trans <- trans[-1]
path <- path[-1]
#Normalize them
trans <- trans/trans[1]
path <- path/path[1]
x = data.frame(v1 = p, v2 = path, v3 = trans)
plot(p,trans, ylim = c(0,1), ylab='coeff')
par(new=T)
plot(p,path, ylim = c(0,1), ylab='coeff',pch=15)
How should I proceed to make this x-axis?
You can generate the values of p using code like the following:
p <- 10^(seq(-4,0,0.2))
You want your x values to be evenly spaced on a log10 scale. Because a log10 axis plots log10(x), you take evenly spaced values as exponents of 10: raising 10 to a power is exactly the inverse of the log10 transformation.
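A quick check of that reasoning: the values of p are not evenly spaced themselves, but their log10 values are, in steps of 0.2.

```r
p <- 10^(seq(-4, 0, 0.2))
print(length(p))        # 21 values, from 1e-4 up to 1
print(diff(log10(p)))   # every step is 0.2 on the log10 scale
```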
With this, you are already most of the way there. You don't need par(new=TRUE): use plot() for the first series and points() for the second, since points() adds to the current plot instead of redrawing it. Use the argument log = 'x' to tell R you need a logarithmic x axis. It only needs to be set in the plot() call; points() and all other low-level plotting functions (those that add to a plot rather than replace it) respect this setting:
plot(p,trans, ylim = c(0,1), ylab='coeff', log='x')
points(p, path, pch = 15) # ylim and ylab belong to plot(), not points()
EDIT: If you want to replicate the log-axis look of the plot in the question (major and minor ticks), you have to calculate the tick positions yourself. Search the internet for 'R log10 minor ticks' or similar. Below is a simple function which calculates the appropriate positions of major and minor ticks for a log axis:
log10Tck <- function(side, type){
lim <- switch(side,
x = par('usr')[1:2],
y = par('usr')[3:4],
stop("side argument must be 'x' or 'y'"))
at <- floor(lim[1]) : ceiling(lim[2]) # R's function is ceiling(), not ceil()
return(switch(type,
minor = outer(1:9, 10^(min(at):max(at))),
major = 10^at,
stop("type argument must be 'major' or 'minor'")
))
}
After defining this function, you can call it inside axis(), which draws axes. As a suggestion: save the function in its own R script and load that script at the top of your analysis with source(), so you can reuse it in future projects. Before drawing the axes yourself, prevent plot() from drawing the default axes by adding axes = FALSE to the plot call:
plot(p, trans, ylim = c(0,1), ylab = 'coeff', log = 'x', axes = FALSE)
Then you can draw the axes, using the tick positions generated by the new function:
axis(1, at=log10Tck('x','major'), tcl= 0.2) # bottom
axis(3, at=log10Tck('x','major'), tcl= 0.2, labels=NA) # top
axis(1, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # bottom
axis(3, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # top
axis(2) # normal y axis
axis(4) # normal y axis on right side of plot
box()
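Putting the pieces together, a self-contained sketch of the base-graphics route. The two curves below are hypothetical placeholders (not Watts-Strogatz results) so that the example runs without igraph; log10Tck is the function defined above, repeated with ceiling() so the block stands alone.

```r
log10Tck <- function(side, type) {
  # par("usr") is in log10 units on a log axis
  lim <- switch(side,
                x = par("usr")[1:2],
                y = par("usr")[3:4],
                stop("side argument must be 'x' or 'y'"))
  at <- floor(lim[1]):ceiling(lim[2])
  return(switch(type,
                minor = outer(1:9, 10^(min(at):max(at))),
                major = 10^at,
                stop("type argument must be 'major' or 'minor'")))
}

p <- 10^(seq(-4, 0, 0.2))
trans <- 1 / (1 + 5 * p)   # placeholder curve, NOT simulated network data
path  <- exp(-3 * p)       # placeholder curve, NOT simulated network data

plot(p, trans, ylim = c(0, 1), ylab = "coeff", log = "x", axes = FALSE)
points(p, path, pch = 15)
axis(1, at = log10Tck("x", "major"), tcl = 0.2)                # major ticks
axis(1, at = log10Tck("x", "minor"), tcl = 0.1, labels = NA)   # minor ticks
axis(2)
box()
```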
As a third option, since you already load ggplot2 in your original post, you can get the same result with ggplot, without any of the above:
# Your data needs to be in the so-called 'long format' or 'tidy format'
# so that ggplot can make sense of it. Google 'Wickham tidy data' or similar.
# You may also use the function 'gather' from the package 'tidyr' for this
# task, which I find simpler to use.
d2 <- reshape2::melt(x, id.vars = c('v1'), measure.vars = c('v2','v3'))
ggplot(d2) +
aes(x = v1, y = value, color = variable) +
geom_point() +
scale_x_log10()

2 Y axis histogram (normal frequency vs relative frequency)

I would like your help, please.
I have these 2 plots, separately. One shows the frequency and the other, with exactly the same data, shows the relative frequency.
Can you tell me how I can join them in a single plot with 2 y axes (frequency and relative frequency)?
x<- AAA$starch
h<-hist(x, breaks=40, col="lightblue", xlab="Starch ~ Corn",
main="Histogram with Normal Curve", xlim=c(58,70),ylim = c(0,2500),axes=TRUE)
xfit<-seq(min(x),max(x),length=40)
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=3)
library(HistogramTools)
x<- AAA$starch
c <- hist(x,breaks=10, ylab="Relative Frequency", main="Histogram with Normal Curve",ylim=c(0,2500), xlim=c(58,70), axes=TRUE)
PlotRelativeFrequency((c))
Thank you!!
EDIT:
This is just an example image of what I want...
I use doubleYScale from package latticeExtra.
Here is an example (I am not sure about relative frequency calculation) :
library(latticeExtra)
set.seed(42)
firstSet <- rnorm(500,4)
breaks = 0:10
#Cut data into sections
firstSet.cut = cut(firstSet, breaks, right=FALSE)
firstSet.freq = table(firstSet.cut)
#Calculate relative frequency
firstSet.relfreq = firstSet.freq / length(firstSet)
#Parse to a list to use xyplot later and assigning x values
firstSet.list <- list(x = 1:10, y = as.vector(firstSet.relfreq))
#Build histogram and relative frequency curve
hist1 <- histogram(firstSet, breaks = 10, freq = TRUE, col='skyblue', xlab="Starch ~ Corn", ylab="Frequency", main="Histogram with Normal Curve", ylim=c(0,40), xlim=c(0,10), plot=FALSE)
relFreqCurve <- xyplot(y ~ x, firstSet.list, type="l", ylab = "Relative frequency", ylim=c(0,1))
#Build double objects plot
doubleYScale(hist1, relFreqCurve, add.ylab2 = TRUE)
And here is the result, with two y axes on different scales:
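Without latticeExtra, a base-graphics sketch is also possible: since relative frequency is just count / n, the right-hand axis can be drawn at the same heights as the counts with rescaled labels. The data are a hypothetical stand-in for AAA$starch (which is not available here), and the margin and tick choices are arbitrary.

```r
# Hypothetical stand-in for AAA$starch
set.seed(42)
x <- rnorm(5000, mean = 64, sd = 1.5)

op <- par(mar = c(5, 4, 4, 5))   # widen the right margin for the second axis
h <- hist(x, breaks = 40, col = "lightblue", xlab = "Starch ~ Corn",
          main = "Histogram with Normal Curve")
# Normal curve scaled to counts, as in the question
xfit <- seq(min(x), max(x), length = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x)) * diff(h$mids[1:2]) * length(x)
lines(xfit, yfit, col = "blue", lwd = 3)
# Right axis: relative frequency = count / n, placed at the matching heights
ticks <- pretty(c(0, max(h$counts) / length(x)))
ticks <- ticks[ticks * length(x) <= max(h$counts)]   # keep ticks inside the plot
axis(4, at = ticks * length(x), labels = ticks)
mtext("Relative frequency", side = 4, line = 3)
par(op)
```

Both axes describe the same bars, so no second plot object is needed; only the labels differ.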

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The default ticks then fall at -3 to 3, i.e. at ±1 to 3 standard deviations, and the mean is labeled 0, indicating zero deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
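The labeled-axis approach also works for a normal that is not standard: place the ticks at mu + k * s for k = -3..3. The values mu = 100 and s = 15 below are purely illustrative.

```r
mu <- 100; s <- 15   # illustrative parameters
x <- seq(mu - 4 * s, mu + 4 * s, length = 200)
plot(x, dnorm(x, mean = mu, sd = s), type = "l", lwd = 2,
     xaxt = "n", xlab = "", ylab = "")
at <- mu + (-3:3) * s   # tick positions: mean and +/- 1, 2, 3 sd
axis(1, at = at, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
```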
If you prefer to do it the hard way, without R's built-in dnorm() function, or you want to do this outside R, you can use the normal density formula directly:
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
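To confirm that the hand-written formula is the same density that dnorm() computes, compare the two for non-standard parameters (mu = 2 and s = 0.5 here are arbitrary choices for the check):

```r
x <- seq(-4, 4, length = 200)
mu <- 2; s <- 0.5
y_formula <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
y_dnorm <- dnorm(x, mean = mu, sd = s)
print(max(abs(y_formula - y_dnorm)))   # only floating-point rounding differences
```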
An extremely inefficient and unusual, but beautiful, solution based on the idea of Monte Carlo simulation is this:
simulate many draws (or samples) from a given distribution (say the normal) using rnorm, then plot the density of these draws.
The rnorm function takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
Thus, to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws grows, the estimated density converges to the normal density. To illustrate this, compare density plots for 5000, 50000, 500000, and 5 million samples.
In the general case, for example Normal(2, 1):
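The convergence can also be checked numerically rather than by eye: measure the largest gap between the kernel density estimate and dnorm() on the grid that density() returns. The 5 million case is left out here only to keep the run short; the exact errors depend on the seed.

```r
set.seed(1)
sizes <- c(5000, 50000, 500000)
err <- sapply(sizes, function(n) {
  d <- density(rnorm(n))
  max(abs(d$y - dnorm(d$x)))   # worst-case gap on the density() grid
})
names(err) <- sizes
print(err)   # the gap shrinks as the number of draws grows
```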
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this. It makes it easy to add graphical information such as specific areas under a curve, which you usually need when dealing with probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I highlight the area between z = 1 and z = 1.5. You can easily move these parameters according to your problem.
Axis labels are automatic.
This is how to wrap it in a function:
normalCriticalTest <- function(mu, s) {
  x <- seq(mu - 4 * s, mu + 4 * s, length = 200) # x extends 4 sd either side of the mean
  # y follows the formula of the normal density f(x)
  y <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
  plot(x, y, type = "l", lwd = 2, xlim = mu + c(-3.5, 3.5) * s)
  # vertical lines leaving 2.5% of the area in each tail
  abline(v = mu + c(-1.96, 1.96) * s, col = "red")
}
normalCriticalTest(0, 1) # draw a standard normal distribution with the critical lines
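Where 1.96 comes from: it is the 97.5% quantile of the standard normal, so the area beyond ±1.96 is 2.5% on each side. For a general N(mu, s) the same lines sit at mu ± 1.96 * s; mu = 10 and s = 2 below are only an example.

```r
z <- qnorm(0.975)                               # two-sided 5% critical value
print(z)                                        # approximately 1.96
lower_tail <- pnorm(-1.96)                      # area left of -1.96
upper_tail <- pnorm(1.96, lower.tail = FALSE)   # area right of 1.96
print(c(lower_tail, upper_tail))
mu <- 10; s <- 2
print(mu + c(-1, 1) * z * s)                    # critical lines for N(10, 2)
```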

R Density Graph: How can I add a solid line from the x-axis to the top of the density curve

I have a density plot graphed using:
plot(density(x))
What I am interested in doing is creating a line for something like x = 5 from the x-axis to the corresponding spot on the curve.
Like this:
You can do this by first storing the density values in an object, and then retrieving the x and y elements from this object. In the following example I use findInterval to retrieve the y-value for the given x-value:
x <- rnorm(1000) # Sample data
y <- density(x) # Calculate and store density
x0 <- 2 # Desired position on x-axis
y0 <- y$y[findInterval(x0, y$x)] # Corresponding y value
plot(density(x))
segments(x0, 0, x0, y0)
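findInterval() returns the grid point just left of x0, so the segment can end slightly off the curve. A small variation, swapped in here as an alternative rather than taken from the answer, uses approx() to interpolate linearly between the two neighbouring grid points:

```r
set.seed(1)
x <- rnorm(1000)                           # sample data
d <- density(x)
x0 <- 2                                    # desired position on the x axis
y_step <- d$y[findInterval(x0, d$x)]       # left grid point, as above
y_lin  <- approx(d$x, d$y, xout = x0)$y    # linear interpolation at x0
plot(d)
segments(x0, 0, x0, y_lin)
```

With the default 512-point grid the two values differ very little; interpolation matters more when density() is called with a small n.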
If it's really z-scores then just actually plot the density function dnorm(). It also looks like you'd like to have your actual x-axis at 0.
zmax <- 4
curve(dnorm, -zmax, zmax, xaxt = 'n', bty = 'n')
axis(1, -zmax:zmax, pos = 0)
To draw in your line you can use the dnorm function again.
zscore <- 1.65
segments(zscore, 0, zscore, dnorm(zscore))
You could also even shade it out nicely... :)
x <- seq(zscore, zmax, 0.01)
y <- c(0, dnorm(x), 0) # close the shape on the x axis at both ends
polygon(c(zscore, x, zmax), y, density = 20)
You can also use segments() and text() to put labels directly on the graph explaining what the shaded and unshaded areas mean.
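For example, a sketch that shades the tail beyond z = 1.65 and writes the two probabilities on the plot, computing them with pnorm(). The label positions (x = 0 and x = 2.5 at height 0.1) are arbitrary choices.

```r
zmax <- 4
zscore <- 1.65
curve(dnorm, -zmax, zmax, xaxt = "n", bty = "n", ylab = "density")
axis(1, -zmax:zmax, pos = 0)
segments(zscore, 0, zscore, dnorm(zscore))
xs <- seq(zscore, zmax, 0.01)
polygon(c(zscore, xs, zmax), c(0, dnorm(xs), 0), density = 20)
upper <- pnorm(zscore, lower.tail = FALSE)    # area of the shaded tail
text(2.5, 0.1, sprintf("%.1f%%", 100 * upper))         # shaded area
text(0, 0.1, sprintf("%.1f%%", 100 * (1 - upper)))     # unshaded area
```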
