R smoothing functions

I have an R plot that looks like this:
The red line is the attempted smoothing with lines(smooth.spline(x, y, spar=0.000001)). Notice the extremely low spar value, which STILL fails to capture the spike near the end of the graph. This is because of the number of points plotted: 107350. The 20 points near the end are too few to sway the fit, although they are clearly different from the rest.
What kind of R smoothing function could I use that encompasses these points?
Or if a smoother won't do it, how would I be able to "statistically" distinguish these points?
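One possible way to "statistically" distinguish such points is to compare each point with a robust baseline and flag large residuals. The following is only a minimal sketch on simulated data, not a tested answer for the plot above: the simulated series, the window width k = 101 and the cut-off of 5 MADs are all assumptions chosen for illustration.

```r
# Simulated stand-in for the data: mostly noise, with a 20-point spike near the end.
set.seed(1)
n <- 5000
x <- seq_len(n)
y <- rnorm(n)
spike_idx <- 4900:4919
y[spike_idx] <- y[spike_idx] + 10

# Running median as a robust baseline; a window much wider than the spike ignores it.
baseline <- runmed(y, k = 101)
resid <- y - baseline

# Flag points whose residual exceeds 5 robust standard deviations (MAD-based).
threshold <- 5 * mad(resid)
flagged <- which(abs(resid) > threshold)

plot(x, y, pch = ".", col = "grey40")
lines(x, baseline, col = "red")
points(x[flagged], y[flagged], col = "blue", pch = 20)
```

The flagged indices can then be inspected or modelled separately instead of forcing a single smoother through them.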

Related

Confidence interval square in a plot with one variable in each axis in ggplot

Although it might sound easy at first, I do not have a scatterplot, and I think that is what makes this question challenging. I have this plot, which comes from this question.
Summing up, each axis represents a variable that is not connected to the other. It is not an XY scatterplot, as you can see.
I would like to know whether it is possible to draw the 95% confidence interval for the mean of each variable, and to draw a square in the middle of the plot representing the area where the two intervals overlap.
The result might look something like this, bearing in mind that the 95% confidence limits shown do not correspond to reality (they are just to illustrate how it might appear):
Here is another question that deals with this situation, but not using ggplot.

How to plot different ROC curves with different symbols on the line using ROCR package?

I am trying to plot average ROC curves from different models using the ROCR package.
I actually made it work, with each curve in a different color. However, for black-and-white printing, I need to distinguish the curves with different symbols rather than colors. I tried using type="o" and the pch option in plot. However, I guess because the ROCR performance object creates so many points in order to plot an accurate ROC curve, the curves just look like very thick solid lines, and you cannot tell which symbol is used for each curve.
And here is the code that I used:
pred_our_update<-prediction(prob_our_update,label)
perf_our_update<-performance(pred_our_update,"tpr","fpr")
plot(perf_our_update,avg="vertical",spread.estimate="stderror",type="o", pch=1,add=TRUE)
Anyone know how to resolve this?
One easy solution is using the downsampling option to cut down the amount of data actually plotted, which may let the symbols stand out more without making any material difference to the shape of the curves. I don't know your data set size, but perhaps start with:
plot(perf_our_update,avg="vertical",spread.estimate="stderror",downsampling=0.1,type="o", pch=1,add=TRUE)

Drawing circles in R

I'm using plotrix package to draw circles.
And I don't get what is wrong with my code... :-(
I have three points. The first point (1,1) should be the center of the circle. The following two points (1,4) and (4,1) have the same distance/radius to the center.
So the circle in the plot should go through these points, right?
And I don't know why the circle looks wrong. Is there an explanation?
p1 <- c(1,1)
p2 <- c(4,1)
p3 <- c(1,4)
r <- sqrt(sum((p1-p2)^2))
plot(x=c(p1[1], p2[1], p3[1]),
y=c(p1[2], p2[2], p3[2]),
ylim=c(-5,5), xlim=c(-5,5))
draw.circle(x=p1[1], y=p1[2], radius=(r))
abline(v=-5:5, col="#0000FF66")
abline(h=-5:5, col="#0000FF66")
Take a look at the produced output here
As @Baptiste says above, you can use plot(...,asp=1). This will only work if your x and y ranges happen to be the same, though (because it sets the physical aspect ratio of your plot to 1). Otherwise, you probably want to use the eqscplot function from the MASS package. A similar issue arises whenever you try to do careful plots of geometric objects, e.g. Drawing non-intersecting circles
This plot is produced by substituting MASS::eqscplot for plot in your code above:
Note that depending on the details of what R thinks about your monitor configuration etc., the circle may look a bit squashed (even though it goes through the points) when you plot in R's graphics window -- it did for me -- but should look OK in the graphical output.
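For reference, here is a minimal sketch of the asp=1 suggestion using the points from the question. As an assumption made to keep the example free of extra packages, the circle is drawn by hand with lines() rather than with plotrix's draw.circle; the names theta, circle_x and circle_y are introduced only for this illustration.

```r
p1 <- c(1, 1)  # centre of the circle
p2 <- c(4, 1)
p3 <- c(1, 4)
r <- sqrt(sum((p1 - p2)^2))  # radius: distance from the centre to p2

# Circle coordinates computed by hand, so no extra package is needed.
theta <- seq(0, 2 * pi, length.out = 200)
circle_x <- p1[1] + r * cos(theta)
circle_y <- p1[2] + r * sin(theta)

# asp = 1 asks for one x unit to be drawn as long as one y unit.
plot(rbind(p1, p2, p3), xlim = c(-5, 5), ylim = c(-5, 5),
     asp = 1, xlab = "x", ylab = "y")
lines(circle_x, circle_y)
abline(v = -5:5, h = -5:5, col = "#0000FF66")
```

With equal scaling on both axes the circle should pass through (4,1) and (1,4).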

Measuring the limit of a point on a smooth.spline in R

I'm not sure if that's the right terminology.
I've entered some data into R, and I've put a smooth.spline through it using the following command.
smoothingSpline = smooth.spline(year, rate, spar=0.35)
plot(year, rate)
lines(smoothingSpline)
Now I'd like to measure some limits (or where the curve is at a given x point), and maybe do some predictive analysis on points that extend beyond the graph.
Are there commands in R for measuring (or predicting) the points along a curve?
Is ?predict.smooth.spline what you are looking for?
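A minimal sketch of how predict() could be used on such a fit. The year and rate values below are simulated stand-ins (the real data is not shown in the question), and new_years is a hypothetical set of x positions picked for illustration.

```r
# Simulated stand-in for the question's year/rate data.
set.seed(42)
year <- 1990:2019
rate <- 5 + 0.1 * (year - 1990) + rnorm(length(year), sd = 0.3)

smoothingSpline <- smooth.spline(year, rate, spar = 0.35)

# Value of the fitted curve at chosen x positions, including one beyond the data.
new_years <- c(2000.5, 2010, 2025)
fitted_vals <- predict(smoothingSpline, x = new_years)
print(fitted_vals$y)

# First derivative (slope of the curve) at the same positions.
slopes <- predict(smoothingSpline, x = new_years, deriv = 1)
print(slopes$y)
```

predict() returns a list with components x and y. Values requested outside the range of the data (2025 here) are extrapolations and should be treated with caution.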

R: update plot [xy]lims with new points() or lines() additions?

Background:
I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time, and often diverges wildly in simulation (the expectation of the random variable = infinity). I want to plot about 10 of these simulations on a line chart, where the x axis has the iteration number, and the y axis has the cumulative mean up to that point.
Here's my problem:
I'll run the first simulation (each sim. having 10,000 iterations), and build the main plot based on its current range. But often one of the simulations will have a range a few orders of magnitude larger than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?
I can think of two workarounds for this:
1. store each simulation, then pick the one with the largest range, and build the base graph off of that (not elegant, and I'd have to store a lot of data in memory, but would probably be laptop-friendly [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that'd support far more iterations such that it becomes an issue (think high dimensional walks that require much, much larger MC samples for convergence) then jump right in]])
2. find a seed that appears to build a nice looking version of it, and set the ylim manually, which would make the demonstration reproducible.
Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?
I'm not sure if this is possible using base graphics; if someone has a solution I'd love to see it. However, graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.
require(ggplot2)
make some data:
foo <- as.data.frame(cbind(data=rnorm(100), numb=seq_len(100)))
make an initial ggplot object and plot it:
p <- ggplot(foo, aes(numb, data)) + geom_line()
p
make some more data and add it to the plot
foo <- as.data.frame(cbind(data=rnorm(200), numb=seq_len(200)))
p <- p + geom_line(aes(numb, data), data=foo, colour="red")
plot the new object
p
I think (1) is the best option. I actually don't think it is inelegant. I think it would be more computationally intensive to redraw every time you hit a point outside the current xlim or ylim.
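A minimal sketch of option (1) in base graphics. The heavy-tailed draws 1/runif(n), the helper cum_mean and the log-scaled y axis are assumptions made for illustration, not the simulation from the question.

```r
set.seed(123)
n_iter <- 10000
n_sims <- 10

# Heavy-tailed draws with infinite expectation: 1 / U for U ~ Uniform(0, 1).
cum_mean <- function(n) {
  draws <- 1 / runif(n)
  cumsum(draws) / seq_len(n)
}

# One column per simulation; 10000 x 10 doubles take well under 1 MB.
sims <- replicate(n_sims, cum_mean(n_iter))

# The y range is computed from all stored runs before anything is drawn.
y_range <- range(sims)
matplot(sims, type = "l", lty = 1, log = "y", ylim = y_range,
        xlab = "iteration", ylab = "cumulative mean")
```

matplot() would pick the same limits on its own; ylim is passed explicitly here only to make the "compute the range first" step visible.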
Also, I saw in Peter Hoff's book about Bayesian statistics a cool use of ts() instead of lines() for cumulative sums/means. It looks pretty spiffy:
