Drawing overlaid sideways plots in R

I have the following R code:
x = c(rep(2,10),rep(4,10))
y1 = c(5.1,3,4.2,4.1,4.8,4.0,5,4.15,3,4.5)
y2 = c(9.1,8,9.2,8.2,7,9.5,8.8,9.3,10,10.4)
y = c(y1,y2)
plot(x,y,pch=16,cex=0.9,xlim=c(0,6),ylim=c(0,13))
This code produces a plot with two bands of dots. I've overlaid normal curves sideways on those bands using PowerPoint. How can I draw these sideways normal curves in R, using the actual means and standard deviations? NOTE: to repeat, the normal curves are not part of the plot; the code above produces only the raw plot.

First, calculate the mean and standard deviation of y1 and y2.
m1<-mean(y1)
s1<-sd(y1)
m2<-mean(y2)
s2<-sd(y2)
Then make two data frames (for convenience) containing y values as a sequence of numbers (wider than the actual range of y1 and y2). Compute the density values that serve as x with dnorm(), using the calculated means and standard deviations, then add 2 or 4 to shift each curve to its band.
df1<-data.frame(yval=seq(1,7,0.1),xval=(dnorm(seq(1,7,0.1),m1,s1)+2))
df2<-data.frame(yval=seq(6,12,0.1),xval=(dnorm(seq(6,12,0.1),m2,s2)+4))
Finally, add the density curves with lines():
plot(x,y,pch=16,cex=0.9,xlim=c(0,6),ylim=c(0,13))
with(df1,lines(xval,yval))
with(df2,lines(xval,yval))
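A variation on the same idea, wrapped in a helper so each band needs one call. The helper name add_sideways_normal and the width/n parameters are hypothetical, not from the answer. One design difference: dnorm() heights depend on the sd, so a tight band gets a tall curve and a wide band a short one. This sketch rescales each curve so its peak sits a fixed distance to the right of the band, which keeps curves from running into neighbouring bands but hides the sd difference in height (it still shows in the vertical spread).

```r
# Data from the question
x  <- c(rep(2, 10), rep(4, 10))
y1 <- c(5.1, 3, 4.2, 4.1, 4.8, 4.0, 5, 4.15, 3, 4.5)
y2 <- c(9.1, 8, 9.2, 8.2, 7, 9.5, 8.8, 9.3, 10, 10.4)
y  <- c(y1, y2)

# Hypothetical helper: draw a normal curve sideways, anchored at x = x0.
# The curve is rescaled so its peak sits `width` units to the right of x0.
add_sideways_normal <- function(yobs, x0, width = 1, n = 201, ...) {
  m <- mean(yobs)
  s <- sd(yobs)
  yy <- seq(m - 3 * s, m + 3 * s, length.out = n)  # +/- 3 sd around the mean
  dens <- dnorm(yy, mean = m, sd = s)
  xx <- x0 + width * dens / max(dens)              # fixed peak width
  lines(xx, yy, ...)                               # extra args go to lines()
  invisible(data.frame(xval = xx, yval = yy))
}

plot(x, y, pch = 16, cex = 0.9, xlim = c(0, 6), ylim = c(0, 13))
c1 <- add_sideways_normal(y1, x0 = 2, width = 1.5)
c2 <- add_sideways_normal(y2, x0 = 4, width = 1.5, col = "red")
```

To keep the raw density heights instead, drop the division by max(dens) and use width as a plain multiplier.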

Related

Erratic behavior of a density plot

I have two numerical variables, T0 and T1, whose densities I plotted with the density command in R. The code is the following:
d0<-density(T0,n=2^14)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
d1<-density(T1,n=2^14)
df_density1<-data.frame(x=d1$x,y=d1$y,stringsAsFactors = FALSE)
Initially, I had left the number of equally spaced points n at its default value of 512, but then I realized that the area under the density curve d1 was not equal to 1 (it was around 13). So I chose n so that the AUC (area under the curve) was close to 1 for both density curves, computing it as follows:
library(zoo)
x <- df_density0$x
y <- df_density0$y
id <- order(x)
AUC0 <- sum(diff(x[id])*rollmean(y[id],2))
x <- df_density1$x
y <- df_density1$y
id <- order(x)
AUC1 <- sum(diff(x[id])*rollmean(y[id],2))
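The AUC computation above is the trapezoidal rule. A base-R version without the zoo dependency is sketched below (the name trapz_auc is hypothetical), together with a sanity check on simulated normal data, which is an assumption and not the question's T0/T1. For ordinary samples, density()'s default grid of 512 points already gives an AUC very close to 1, so if the same check on the real data is far from 1 it may be worth looking at the data itself, for example a range that is very wide relative to the bandwidth.

```r
# Trapezoidal area under a curve; equivalent to
# sum(diff(x[id]) * rollmean(y[id], 2)) with zoo, but base R only
trapz_auc <- function(x, y) {
  id <- order(x)
  x <- x[id]
  y <- y[id]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Sanity check on simulated data with the default grid (n = 512)
set.seed(1)
d512 <- density(rnorm(1000))
auc512 <- trapz_auc(d512$x, d512$y)
print(auc512)
```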
With n=2^14 I obtained AUC0 and AUC1 equal to 0.9999... I plotted these density curves and obtained the following graph (blue for T0, red for T1):
Since the two curves were indistinguishable from each other, I switched the vertical axis to a logarithmic scale:
Is it possible to get such a result? Should I change the kernel argument of the density function?
P.S. To draw these graphs I exported d0 and d1 to two CSV files and imported them into LaTeX. I obtained the same plots in R, though. Moreover:
> nrow(T0)
[1] 9760
> nrow(T1)
[1] 1963
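On the kernel question, one way to check whether the kernel matters is to fit the same sample with several kernels and compare. In density(), bw is the standard deviation of the smoothing kernel for every kernel choice, so the fits below use comparable amounts of smoothing. The sample is simulated (a hypothetical stand-in sized like T0); the point of the sketch is that the curves come out very similar and each integrates to about 1, whereas the bandwidth (bw, adjust) usually has a much larger effect than the kernel.

```r
set.seed(42)
T0 <- rnorm(9760)  # hypothetical stand-in; only the length matches the question

kernels <- c("gaussian", "epanechnikov", "rectangular")
fits <- lapply(kernels, function(k) density(T0, kernel = k))
names(fits) <- kernels

# Trapezoidal AUC of each fit
aucs <- sapply(fits, function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2))
print(aucs)

plot(fits$gaussian, main = "Same sample, three kernels")
lines(fits$epanechnikov, col = "red")
lines(fits$rectangular, col = "blue")
legend("topright", legend = kernels, col = c("black", "red", "blue"), lty = 1)
```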

2d density plot from curves

I have a multi-parameter function whose parameters I infer using MCMC. This means that I have many samples of the parameters, and I can plot the corresponding functions:
# Simulate some parameters. Really, I get these from MCMC sampling.
first = rnorm(1000) # a
second = rnorm(1000) # b
# The function (geometric)
geometric = function(x, a, b) b*(1 - a^(x + 1)/a)
# Plot curves. Perhaps not the most efficient way, but it works.
curve(geometric(x, first[1], second[1]), ylim=c(-3, 3)) # first curve
for(i in 2:length(first)) {
  curve(geometric(x, first[i], second[i]), add=TRUE, col='#00000030') # add others
}
How do I make this into a density plot instead of plotting the individual curves? For example, it's hard to see just how much denser it is around y=0 than around other values.
The following would be nice:
The ability to draw observed values on top (points and lines).
Drawing a contour line in the density, e.g. the 95% highest posterior density (HPD) interval or the 2.5% and 97.5% quantiles.
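One possible approach, sketched under assumptions: evaluate every sampled curve on a common x grid, then summarize column by column. A pointwise 2D histogram (share of draws falling in each y bin at each x) gives the density image, and pointwise quantiles give the 2.5%/50%/97.5% lines. Note these are pointwise quantiles, not an HPD region. The parameter draws are hypothetical: a is kept positive because in R a negative number raised to a non-integer power gives NaN, which is what happens for roughly half of the rnorm() draws in the question on x between 0 and 1.

```r
set.seed(1)
n_draws <- 1000
# Hypothetical draws standing in for MCMC samples (a > 0 avoids NaN)
a <- runif(n_draws, 0.5, 1.5)
b <- rnorm(n_draws)
geometric <- function(x, a, b) b * (1 - a^(x + 1) / a)

xs <- seq(0, 1, length.out = 101)
# Rows = draws, columns = x grid points
Y <- t(sapply(seq_len(n_draws), function(i) geometric(xs, a[i], b[i])))

# Pointwise 2.5%, 50% and 97.5% quantiles across draws (3 x 101)
q <- apply(Y, 2, quantile, probs = c(0.025, 0.5, 0.975))

# Pointwise 2D histogram: share of draws in each y bin at each x
ybreaks <- seq(-3, 3, length.out = 61)
nb <- length(ybreaks) - 1
dens2d <- apply(Y, 2, function(col)
  tabulate(findInterval(col, ybreaks, rightmost.closed = TRUE), nbins = nb) / n_draws)
ymid <- head(ybreaks, -1) + diff(ybreaks) / 2

# image() wants z with dim length(x) x length(y), hence t()
image(xs, ymid, t(dens2d), col = gray.colors(50, start = 1, end = 0),
      xlab = "x", ylab = "y")
lines(xs, q[2, ], lwd = 2)  # pointwise median
lines(xs, q[1, ], lty = 2)  # 2.5% quantile
lines(xs, q[3, ], lty = 2)  # 97.5% quantile
```

Observed values can then be drawn on top with the usual points() and lines() calls, since image() sets up ordinary user coordinates.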

Shade area under a curve [duplicate]

This question already has answers here: Shading a kernel density plot between two points (5 answers). Closed 6 years ago.
I'm trying to shade an area under a curve in R. I can't quite get it right and I'm not sure why. The curve is defined by:
# Define the Mean and Stdev
mean=1152
sd=84
# Create x and y to be plotted
# x is a sequence of numbers shifted to the mean with the width of sd.
# The sequence x includes enough values to show +/-3.5 standard deviations in the data set.
# y is a normal distribution for x
x <- seq(-3.5,3.5,length=100)*sd + mean
y <- dnorm(x,mean,sd)
The plot is drawn with:
# Plot x vs. y as a line graph
plot(x, y, type="l")
The code I'm using to try to color under the curve where x >= 1250 is:
polygon(c( x[x>=1250], max(x) ), c(y[x==max(x)], y[x>=1250] ), col="red")
but here's the result I'm getting:
How can I correctly color the portion under the curve where x >= 1250?
You need to follow the x,y points of the curve with the polygon, then return along the x-axis (from the maximum x value to the point at x=1250, y=0) to complete the shape. The final vertical edge is drawn automatically, because polygon closes the shape by returning to its start point.
polygon(c(x[x>=1250], max(x), 1250), c(y[x>=1250], 0, 0), col="red")
If, rather than dropping the shading all the way down to the x-axis, you prefer its lower edge to sit at the height of the curve at its right-hand end, use the following instead. In the example given, however, the curve drops almost to the x-axis, so it's hard to see the difference visually.
polygon(c(x[x>=1250], 1250), c(y[x>=1250], y[x==max(x)]), col="red")

R: area under curve of ogive?

I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.
I then compute the area under the curve to derive percentages.
I'd like to do something similar using kernel density estimation. I like how the upper and lower bounds are smoothed out by kernel densities (i.e. the min and max extend slightly beyond my hard-coded input).
Either way, I was wondering whether there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function from y data.
I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. an ogive). However, I can't work out how to derive a percentage from this cumulative frequency graph.
What I don't want is an ECDF. I know how to do that, and it is not quite what I am trying to capture. Rather, I want to integrate an ogive between two bounds.
I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate, or more generally for any case where you have y values at equally spaced x values (you can, of course, generalize to variable x intervals as well):
library(zoo)
# Kernel density estimate
# Set n to a higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)
# How to extract the x and y values of the density estimate
#dens$y
#dens$x
# x interval
dx = median(diff(dens$x))
# mean height for each pair of y values
h = rollmean(dens$y, 2)
# Area under curve
sum(h*dx) # 1.000943
# Cumulative area
# cumsum(h*dx)
# Plot density, showing points at which density is calculated
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")
# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
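To get the percentage between two bounds, which is what the question asks for, the cumulative area can be turned into a function. This sketch interpolates the cumulative trapezoidal area linearly, giving a kernel-smoothed CDF; the difference of its values at two bounds is the area under the density between them. It uses base R instead of zoo, a finer grid (n = 512), and hypothetical names Fhat and area_between.

```r
set.seed(67839)
dens <- density(c(rnorm(500, 5, 2), rnorm(200, 20, 3)), n = 512)

# Cumulative trapezoidal area at each grid point (starts at 0)
cum <- c(0, cumsum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2))

# Kernel-smoothed CDF: linear interpolation of the cumulative area
Fhat <- approxfun(dens$x, cum)

# Area under the density between two bounds (bounds must lie inside range(dens$x))
area_between <- function(lo, hi) Fhat(hi) - Fhat(lo)
area_between(0, 10)
```

Outside the density grid, approxfun() returns NA by default, so bounds beyond range(dens$x) would need rule = 2 or explicit clamping.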
UPDATE: adding the ecdf function
To address your comments, look at the two plots produced below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions used above; note that it looks essentially the same as the cumulative-area plot above. The second is the ECDF of a plain standard normal distribution (mean = 0, sd = 1).
set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)
plot(ecdf(rnorm(1000)))
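To see how the two views relate, one can overlay the ECDF and the cumulative area of the kernel density estimate on the same axes. The smooth curve is essentially the ECDF smoothed by the kernel, so the gap between them is mostly governed by the bandwidth. This is a sketch on the same simulated mixture; the name max_gap is illustrative only.

```r
set.seed(67839)
x <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))
dens <- density(x, n = 512)
cum <- c(0, cumsum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2))

Fn <- ecdf(x)
plot(Fn, do.points = FALSE, main = "ECDF (black) vs kernel-smoothed CDF (red)")
lines(dens$x, cum, col = "red")

# Largest vertical gap between the two curves on the density grid
max_gap <- max(abs(Fn(dens$x) - cum))
print(max_gap)
```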

Wireframe plot with small values in R

I have data with small values: X, Y and Z all lie roughly between -1 and 1, like below:
X,Y,Z
-0.858301,-1,1.00916
-0.929151,-1,1.0047
-0.896405,-0.940299,1.00396
-0.960967,-0.944075,1.00035
library(lattice)
wireframe(Z~X+Y,data=sol)
It seems wireframe works only with larger values (1, 2, 3, ...). How do I plot small values?
wireframe can be used in one of two ways:
With a rectangular data matrix where the values of x and y are implied by the shape of the matrix.
wireframe(matrix(rnorm(100),ncol=5),drape=TRUE)
Or with a data frame, where the values of x and y are explicit; here you can use a formula for the relationships between the columns.
df<-expand.grid(x = seq(0,.1,.01), y = seq(0,.1,.01))
df$z<-rnorm(121)
wireframe(z~x*y,data=df,drape=TRUE)
I've found that if you include the line defining the z-axis limits, you can't draw below 1. But if you remove those axis limits and let R choose them itself, it works and you can plot small numbers.
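If explicit z limits are still wanted, a minimal sketch, assuming the original problem was a hard-coded zlim that did not cover the data: derive the limits from the data range instead. The surface below is hypothetical, chosen to vary only slightly around 1 like the Z column in the question.

```r
library(lattice)

# Hypothetical surface with a tiny range around 1
grid <- expand.grid(X = seq(-1, 1, length.out = 21), Y = seq(-1, 1, length.out = 21))
grid$Z <- 1 + 0.01 * grid$X * grid$Y

# z limits taken from the data, padded by 10% of the range on each side
zr <- range(grid$Z)
p <- wireframe(Z ~ X * Y, data = grid, drape = TRUE,
               zlim = zr + c(-0.1, 0.1) * diff(zr))
print(p)
```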
