Smooth curve through points and include the origin in R

I am a beginner in R and started with graphics recently.
I have managed to write a working empirical cumulative distribution function (hand-written, not using the standard ecdf() function) and to plot it. However, the plot has two issues that I am not sure how to solve (I have done my 'research' but have not found a solution).
This is my code:
set.seed(1)
n = 50
x = rpois(n, 2.2)
cdf = function(x,n)
{
  v=c()
  for(z in 1:max(x))
  {
    a = length(x[x<=z])/n
    v = c(v, a)
  }
  plot(v,type="l", main="empirical cumulative distribution function", xlab="x", ylab="cumulative probability", xlim=c(0,6), ylim=c(0,1.0))
}
cdf(x, n)
There are two issues with this plot:
The lines are straight but it should be a smooth curve through all points.
The origin is not included (now the curve starts at x = 1).
How can these issues be resolved in an elegant way?

Try the following spline interpolator:
plot(spline(c(0, v)), type = "l")
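Two details matter when applying this. In the question, v is local to cdf(), so it has to be returned or computed outside the function before spline() can see it. Also, spline() called with a single vector places the points at x = 1, 2, ..., so the prepended 0 would still be drawn at x = 1. A minimal sketch, assuming the ECDF values are computed outside the function and that forcing the curve through (0, 0) is what is wanted (the names knots_x, knots_y and sm are only for illustration):

```r
set.seed(1)
n <- 50
x <- rpois(n, 2.2)

# ECDF values at z = 1, ..., max(x), as in the question's loop
v <- sapply(seq_len(max(x)), function(z) mean(x <= z))

# Prepend 0 and pass explicit x positions so the first point sits at x = 0
knots_x <- 0:max(x)
knots_y <- c(0, v)
sm <- spline(knots_x, knots_y, n = 200)

plot(sm, type = "l", main = "empirical cumulative distribution function",
     xlab = "x", ylab = "cumulative probability", ylim = c(0, 1))
points(knots_x, knots_y)
```

An interpolating spline is not guaranteed to stay monotone or inside [0, 1]; if that matters, spline(..., method = "hyman") may be worth trying. Note also that a true ECDF of count data is a step function and is generally not 0 at x = 0, so the smooth curve is a visual aid rather than the ECDF itself.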

Related

How can I fit a smooth hysteresis in R?

I have a measurement which should follow a hysteresis loop. For visualisation purposes I would like to plot a line approximating the hysteresis to help explain this pattern.
I created an example in the following image using the code below.
I would like an output similar to the green curve. However, I don't have this data directly available, and I don't care whether it is pointy.
Most smoothing functions, such as smooth.spline (plotted in blue), allow no loops. The closest I can find is from the bezier library (plotted in red). It is not clearly visible here, but it does produce a loop; however, it fits poorly, gives some warnings and takes quite some time.
Can you suggest a method?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
with(data,plot(measuredx,measuredy, type = "p"))
with(data,lines(x,y, col='green'))
sp <- with(data,smooth.spline(measuredx, measuredy))
with(sp, lines(x,y, col="blue"))
library(bezier)
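# fit the measured points (columns measuredx and measuredy), not x against measuredx
bf <- bezierCurveFit(as.matrix(data[,c("measuredx","measuredy")]))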
lines(bezier(t=seq(0, 1, length=500), p=bf$p), col="red", cex=0.25)
UPDATE
As it turns out, my actual problem is slightly different, so I asked another question that reflects my actual issue: How to fit a smooth hysteresis in a poorly distributed data set?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
Instead of smoothing data$measuredy directly against data$measuredx, do two separate smooths, each against a time stamp variable, and then combine the fitted values from the two. This is a general way of smoothing a closed curve or a loop. (See also Q & A: Smoothing Continuous 2D Points)
t <- seq_len(nrow(data) + 1)
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]))$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]))$y
with(data, plot(measuredx, measuredy))
lines(xs, ys)
c(data$measuredx, data$measuredx[1]), for example, just ensures that the last value in the vector agrees with the first, so that the data complete a cycle.
The curve is not really closed at the bottom left corner, because smooth.spline smooths rather than interpolates: even though we have ensured that the data vector completes a cycle, the fitted curve need not be closed. A practical workaround is weighted regression, putting a heavy weight on the first and last points to force the curve closed there.
t <- seq_len(nrow(data) + 1)
w <- rep(1, length(t)) ## initially identical weight everywhere
w[c(1, length(w))] <- 100000 ## give heavy weight
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]), w)$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]), w)$y
with(data, plot(measuredx, measuredy, col = 8)) ## col must go inside plot(), not with()
lines(xs, ys, lwd = 2)
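To check numerically whether the fitted loop closes, one can compare the distance between the first and last fitted points with and without the heavy end weights. A rough sketch that regenerates the simulated data from above; the helper name loop_gap is made up for this illustration:

```r
set.seed(12345)
up <- seq(0, 1, length.out = 100)^3
down <- sqrt(seq(1, 0, length.out = 100))
x <- c(seq(0, 1, length.out = length(up)),
       seq(1, 0, length.out = length(down)))
data <- data.frame(measuredx = x + rnorm(length(x)) * 0.01,
                   measuredy = c(up, down) + rnorm(length(x)) * 0.03)

# Hypothetical helper: distance between the first and last fitted points
loop_gap <- function(w = NULL) {
  t <- seq_len(nrow(data) + 1)
  xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]), w)$y
  ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]), w)$y
  n <- length(t)
  sqrt((xs[1] - xs[n])^2 + (ys[1] - ys[n])^2)
}

w <- rep(1, nrow(data) + 1)
w[c(1, length(w))] <- 100000   # heavy weight on both ends, as above

gap_unweighted <- loop_gap()
gap_weighted <- loop_gap(w)
c(unweighted = gap_unweighted, weighted = gap_weighted)
```

If the weighted gap is not small enough for the plot at hand, increasing the end weights further is the obvious knob to turn.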

How to Plot an exponential distribution of spike times over a histogram of them, in R?

This question follows on from my last one. I have been working on plotting spike times as a raster plot for a spike train. I took a firing rate of 100 and generated spike trains for 20 trials. The code for that is:
fr = 100
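dt = 1/1000 # time step: 1 ms, expressed in seconds
duration = 2 # duration in s
nBins = 2000 # number of time bins per spike train (duration / dt)
nTrials = 20 # number of simulated trials
MyPoissonSpikeTrain = function(fr = 100) {
  p = runif(nBins)             # one uniform draw per time bin
  q = ifelse(p < fr*dt, 1, 0)  # spike with probability fr*dt in each bin
  return(q)
}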
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
  clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
  abline(h=i, lwd= 1/4)
  abline(v= dt*which( SpikeMat[i,]== 1))
}
This gives the result:
After all this was done, my next task was to get a vector of inter-spike intervals (ISIs) and plot a histogram of them. Because the ISIs follow an exponential distribution, an exponential density plotted for the same data should match the curve traced by the heights of the histogram bars.
So to get the spike times first (for one trial; after the loop above, i still equals nTrials, so this is the last trial), I used:
spike_times <- c(dt*which( SpikeMat[i, ]==1))
Then, to get a vector of inter-spike intervals and their histogram, I used:
ISI <- diff(spike_times)
hist(ISI, density= 10, col= 'blue', xlab='ISI (s)', ylab='number of occurrences')
and it gave me this plot:
Now, what I want is to plot the exponential distribution over the histogram to show that the inter-spike intervals are exponentially distributed. I am confused about which parameters and which rate to use. If somebody has worked with inter-spike interval plotting, please help. I am sorry if my data seems incomplete; please let me know if I am missing something.
My fellow researcher just gave me these few lines of code:
x <- seq(0, 0.05, length=1000)
y <- dexp(x, rate=100)
lines(x,y)
which gave me, this:
If somebody has any way of making this process more efficient, please help me.
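One thing to watch is the scale: dexp() returns a density, while hist() draws counts by default, so the two only line up when the histogram is drawn with freq = FALSE. A possible sketch, assuming the same simulation parameters as above, pooling the ISIs of all trials instead of one, and using the known firing rate fr as the exponential rate (rate_hat is an illustrative name for the rate estimated from the data):

```r
fr <- 100        # firing rate in Hz
dt <- 1/1000     # bin width in seconds
nBins <- 2000
nTrials <- 20

set.seed(1)
SpikeMat <- t(replicate(nTrials, as.integer(runif(nBins) < fr * dt)))

# Inter-spike intervals (in seconds), computed per trial and then pooled
ISI <- unlist(lapply(seq_len(nTrials), function(i) {
  diff(dt * which(SpikeMat[i, ] == 1))
}))

# Rate estimated from the data, for comparison with fr
rate_hat <- 1 / mean(ISI)

# Density-scale histogram, so that dexp() is directly comparable
hist(ISI, freq = FALSE, breaks = 50, col = "lightblue",
     xlab = "ISI (s)", main = "Inter-spike intervals")
curve(dexp(x, rate = fr), add = TRUE, lwd = 2)
curve(dexp(x, rate = rate_hat), add = TRUE, lty = 2)
```

Because time is discretised into 1 ms bins, the simulated ISIs are really geometric, so the match with the exponential curve is approximate, most visibly in the first bin.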

ggplot2: easy way to plot integral over independent variable?

I'm integrating a function f(t) = 2t (just an example) and would like to plot the integral as a function of time t using
awesome_thing <- function(t) {2*t}
integrate(awesome_thing, lower=0, upper=10)
However, I would like to plot the integral as a function of time in ggplot2, so for this example the plotted points would be (1,1), (2,4), (3,9), ..., (10,100).
Is there an easy way to do this in ggplot (e.g., something similar to how functions are plotted)? I understand I can "manually" evaluate and plot the data for each t, but I thought I'd see if anyone could recommend a simpler way.
Here is a ggplot2 solution using stat_function:
library(ggplot2)
# create a function that is vectorized over the "upper" limit of your
# integral
int_f <- Vectorize(function(f = awesome_thing, lower=0,upper,...){
  integrate(f,lower,upper,...)[['value']] },'upper')
ggplot(data.frame(x = c(0,10)),aes(x=x)) +
  stat_function(fun = int_f, args = list(f = awesome_thing, lower=0))
Not ggplot2, but it shouldn't be difficult to adapt by creating a data frame to pass to that paradigm:
plot(x=seq(0.1,10, by=0.1),
y= sapply(seq(0.1,10, by=0.1) ,
function(x) integrate(awesome_thing, lower=0, upper=x)$value ) ,
type="l")
The trick with the integrate function is that it returns a list, and you need to extract the 'value' element for each choice of the upper limit.
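The data-frame route mentioned above could look roughly like this. It is only a sketch: the names t_grid and df are illustrative, and the ggplot2 part is skipped if the package is not installed.

```r
awesome_thing <- function(t) 2 * t

# Evaluate the integral from 0 to each upper limit on a grid
t_grid <- seq(0.1, 10, by = 0.1)
df <- data.frame(
  t = t_grid,
  integral = sapply(t_grid, function(u) {
    integrate(awesome_thing, lower = 0, upper = u)$value
  })
)

# Plot with ggplot2 when it is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = t, y = integral)) + geom_line()
  print(p)
}
```

For f(t) = 2t the integral from 0 to t is t^2, which gives a quick sanity check on the computed column.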

R superimposing bivariate normal density (ellipses) on scatter plot

There are similar questions on the website, but I could not find an answer to this seemingly very simple problem. I fit a mixture of two gaussians on the Old Faithful Dataset:
if(!require("mixtools")) { install.packages("mixtools"); require("mixtools") }
data_f <- faithful
plot(data_f$waiting, data_f$eruptions)
data_f.k2 = mvnormalmixEM(as.matrix(data_f), k=2, maxit=100, epsilon=0.01)
data_f.k2$mu # estimated mean coordinates for the 2 multivariate Gaussians
data_f.k2$sigma # estimated covariance matrix
I simply want to superimpose two ellipses for the two Gaussian components of the model described by the mean vectors data_f.k2$mu and the covariance matrices data_f.k2$sigma. To get something like:
For those interested, here is the MATLAB solution that created the plot above.
If you are interested in the colors as well, you can use the posterior to get the appropriate groups. I did it with ggplot2, but first I show the colored solution using @Julian's code.
# group data for coloring
data_f$group <- factor(apply(data_f.k2$posterior, 1, which.max))
# plotting
plot(data_f$eruptions, data_f$waiting, col = data_f$group)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]], col=i)
And for my version using ggplot2.
# needs ggplot2 package
require("ggplot2")
# ellipse data
ell <- cbind(data.frame(group=factor(rep(1:length(data_f.k2$mu), each=250))),
do.call(rbind, mapply(ellipse, data_f.k2$mu, data_f.k2$sigma,
npoints=250, SIMPLIFY=FALSE)))
# plotting command
p <- ggplot(data_f, aes(color=group)) +
geom_point(aes(waiting, eruptions)) +
geom_path(data=ell, aes(x=`2`, y=`1`)) +
theme_bw(base_size=16)
print(p)
You can use the ellipse function from package mixtools. The initial problem was that this function swaps x and y relative to your plot. I'll try to figure this out and update the answer. (I'll leave the colors to somebody else...)
plot( data_f$eruptions,data_f$waiting)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]])
Using mixtools internal plotting function:
plot.mixEM(data_f.k2, whichplots=2)
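The apparent x/y swap comes from the column order of faithful (eruptions first, waiting second), which the fitted means and covariances inherit. To keep waiting on the x axis, the parameters can be reordered before drawing. Below is a base-R sketch of that idea. Both the helper ellipse_points and the stand-in parameters are assumptions made for illustration: the helper is hand-written (it is not the mixtools function), and mu and sigma are group means and covariances from a crude split at eruptions < 3, used in place of data_f.k2$mu and data_f.k2$sigma so that the example runs without mixtools.

```r
# Hypothetical helper: boundary of the `level` probability region of a
# bivariate normal N(mu, sigma), returned as an npoints x 2 matrix
ellipse_points <- function(mu, sigma, level = 0.95, npoints = 250) {
  r <- sqrt(qchisq(level, df = 2))
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- rbind(cos(theta), sin(theta))
  # t(chol(sigma)) maps the unit circle onto the covariance ellipse
  t(mu + r * t(chol(sigma)) %*% circle)
}

# Stand-ins for the fitted parameters (column order: eruptions, waiting)
grp <- faithful$eruptions < 3
mu <- list(colMeans(faithful[grp, ]), colMeans(faithful[!grp, ]))
sigma <- list(cov(faithful[grp, ]), cov(faithful[!grp, ]))

# Reorder to (waiting, eruptions) so the ellipses match the plot axes
idx <- c(2, 1)
plot(faithful$waiting, faithful$eruptions, col = ifelse(grp, 1, 2),
     xlab = "waiting", ylab = "eruptions")
for (i in seq_along(mu)) {
  lines(ellipse_points(mu[[i]][idx], sigma[[i]][idx, idx]), col = i)
}
```

With a fitted mixture, the same reordering should apply to each data_f.k2$mu[[i]] and data_f.k2$sigma[[i]].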

What does autoplot.microbenchmark actually plot?

According to the docs, microbenchmark:::autoplot "Uses ggplot2 to produce a more legible graph of microbenchmark timings."
Cool! Let's try the example code:
library("ggplot2")
library("microbenchmark")
tm <- microbenchmark(rchisq(100, 0),
rchisq(100, 1),
rchisq(100, 2),
rchisq(100, 3),
rchisq(100, 5), times=1000L)
autoplot(tm)
I don't see anything about the...squishy undulations in the documentation, but my best guess from this answer by the function creator is that this is like a smoothed series of boxplots of the time taken to run, with the upper and lower quartiles connected over the body of the shape. Maybe? These plots look too interesting not to find out what is going on here.
What is this a plot of?
The short answer is a violin plot:
It is a box plot with a rotated kernel density plot on each side.
The longer more interesting(?) answer. When you call the autoplot function, you are actually calling
## class(tm) is microbenchmark
autoplot.microbenchmark
We can then inspect the actual function call via
R> getS3method("autoplot", "microbenchmark")
function (object, ..., log = TRUE, y_max = 1.05 * max(object$time))
{
y_min <- 0
object$ntime <- convert_to_unit(object$time, "t")
plt <- ggplot(object, ggplot2::aes_string(x = "expr", y = "ntime"))
## Another ~6 lines or so after this
The key line is + stat_ydensity(). Looking at ?stat_ydensity you come to the help page on violin plots.
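To see what the density part contributes, a violin can be drawn by hand in base R: estimate a kernel density of each group's values and mirror it around the group's x position. The sketch below uses made-up log-normal "timings" (not real benchmark output) and illustrative names (timings, dens, half_width); ggplot2 applies its own defaults for trimming and scaling, so its violins will not look identical.

```r
set.seed(42)
# Hypothetical timings for two expressions
timings <- list(expr_a = rlnorm(500, meanlog = 3.0, sdlog = 0.3),
                expr_b = rlnorm(500, meanlog = 3.5, sdlog = 0.2))

# One kernel density estimate per expression
dens <- lapply(timings, density)

# Half-width of each violin, scaled so the widest point is 0.4
half_width <- lapply(dens, function(d) 0.4 * d$y / max(d$y))

y_range <- range(unlist(lapply(dens, function(d) d$x)))
plot(1, type = "n", xlim = c(0.5, length(timings) + 0.5), ylim = y_range,
     xaxt = "n", xlab = "expr", ylab = "time (arbitrary units)")
axis(1, at = seq_along(timings), labels = names(timings))

for (i in seq_along(timings)) {
  d <- dens[[i]]
  h <- half_width[[i]]
  # Left outline going up, right outline coming back down
  polygon(c(i - h, rev(i + h)), c(d$x, rev(d$x)), col = "grey80")
}
```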
