I'm not sure if that's the right terminology.
I've entered some data into R, and I've put a smooth.spline through it using the following commands.
smoothingSpline <- smooth.spline(year, rate, spar = 0.35)
plot(year, rate)
lines(smoothingSpline)
Now I'd like to read values off the curve (i.e., find where the curve is at a given x value), and maybe do some predictive analysis for points that extend beyond the range of the graph.
Are there commands in R for measuring (or predicting) the points along a curve?
Is ?predict.smooth.spline what you are looking for?
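For example, a quick sketch of how predict() could be used here (the data below are made up so the snippet is self-contained; year and rate mirror the question):
year <- 1980:2009
rate <- sin(seq(0, 3, length.out = 30)) + rnorm(30, sd = 0.1)   # stand-in data
smoothingSpline <- smooth.spline(year, rate, spar = 0.35)
predict(smoothingSpline, x = 1995)        # value of the fitted curve at x = 1995
predict(smoothingSpline, x = 2010:2020)   # extrapolate beyond the observed range
predict() returns a list with components x and y, so predict(smoothingSpline, x = 1995)$y gives the fitted value itself.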
What I have: a PCA scatter chart (plot), plotted in JS. I have Rtools that I've used to push the PCA data to the client side.
What I'm trying to do: plot a confidence ellipse from a formula.
I can't seem to find a straightforward formula for the confidence ellipse. I came across a lot of theory and a lot of examples in R that give you the end result - an ellipse (one can use ggplot2 or packages from CRAN to plot it).
But I'm looking for a formula that I could use on the client side to plug in my scatter-chart points and calculate the ellipse, or even better a function in R that would give me such a formula.
I have the covariance matrix and eigenvectors as well (calculated in R).
All suggestions much appreciated.
I haven't found a formula, but using conf_ell from the Momocs package (Momocs:::conf_ell) I managed to get the vertices and the x,y points of an ellipse.
I will update this answer once I find the second part of my answer - a straightforward formula.
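In the meantime, here is a sketch I put together (not from Momocs) of the standard parametric construction of a confidence ellipse from a 2x2 covariance matrix, which should be easy to port to the client side. scores is assumed to be a two-column matrix of PCA scores:
conf_ellipse_points <- function(scores, level = 0.95, n = 100) {
  ctr <- colMeans(scores)                 # ellipse centre
  ev  <- eigen(cov(scores))               # eigenvalues/eigenvectors of the covariance matrix
  r   <- sqrt(qchisq(level, df = 2))      # radius from the chi-square quantile
  t   <- seq(0, 2 * pi, length.out = n)
  # unit circle scaled by sqrt(eigenvalues), rotated by the eigenvectors, shifted to the centre
  pts <- t(ctr + r * ev$vectors %*% (sqrt(ev$values) * rbind(cos(t), sin(t))))
  colnames(pts) <- c("x", "y")
  pts
}
pts <- conf_ellipse_points(prcomp(iris[, 1:4])$x[, 1:2])   # example usage on PCA scores
In other words, each ellipse point is centre + sqrt(qchisq(level, 2)) * (sqrt(lambda1) * v1 * cos(t) + sqrt(lambda2) * v2 * sin(t)), which only needs the eigenvalues and eigenvectors you already have.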
I have an R plot that looks like this:
The red line is the attempted smoothing with lines(smooth.spline(x, y, spar=0.000001)). Notice the insanely low spar value, which STILL fails to follow the spike near the end of the graph. This is because of the number of points plotted: 107350. The 20 or so points near the end are unable to sway the fit, although it is clearly noticeable that they are different from the rest.
What kind of R smoothing function could I use that encompasses these points?
Or if a smoother won't do it, how would I be able to "statistically" distinguish these points?
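One possible direction (a sketch under my own assumptions, not a definitive answer): rather than forcing the spline through the spike, flag the points that sit far from a robust smooth and treat them separately. Here x and y are the plotted data from the question, and the threshold of 5 MADs is arbitrary.
smoothed <- runmed(y, k = 501)                 # robust running median (k must be odd)
resids   <- y - smoothed
outliers <- abs(resids) > 5 * mad(resids)      # crude rule: more than 5 MADs from the smooth
plot(x, y, col = ifelse(outliers, "red", "grey"))
lines(x, smoothed, col = "blue")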
My question consists of two sub-questions.
I have a graphical illustration presenting (some virtual) worst-case scenarios sampled from history, organized according to two parameters.
Image:
At this moment I have a point cloud. I would like to create a nicely splined density cloud of my results. I would like the 3D spline to take the density of points into account when approximating (so it approximates more broadly where fewer samples are available, and more exactly in denser regions of the space).
Because then, having that density cloud, I would be able to scale the density along each vertical line specified by the two input parameters, and that would make it a likelihood function of each outcome - [the worst-case scenario].
The second part is: I would like to plot it, ideally as semi-transparent 3D regions that would form something like a fog around the densest region.
Uh, wow... that wasn't easy to explain. Sigh. :)
Thanks for reading that far.
Here is a way to generate 3D density plots using the ks package. Since you provided no data, this example is taken directly from the documentation for plot(...) in the ks package.
library(MASS)
library(ks)                                      # kernel density estimation
x <- iris[, 1:3]                                 # three numeric columns -> 3D data
H.pi <- Hpi(x, pilot = "samse")                  # plug-in bandwidth matrix
fhat <- kde(x, H = H.pi, compute.cont = TRUE)    # kernel density estimate with contour levels
plot(fhat, drawpoints = TRUE)                    # 3D contour plot with the data points overlaid
Matlab has a histfit function that plots a histogram and fits a distribution to the bin values.
The distribution's parameters have to be estimated.
How can I implement histfit in R? I searched for a long time, but had no luck.
This post has mentioned it before, but there is no satisfactory solution there. The sn package seems to support several distributions, but not many.
I explored the data with the hist function, and the histogram looks roughly like a gamma distribution.
But if I increase the number of bins and plot it again, the graph shows more detail, and the gamma distribution no longer fits.
fitdistr also fails to find the parameters.
So I want to fit the distribution using just the coarse binned data from the histogram. That is the question; thank you for your help.
The fitdistr function in the MASS package can be used to find parameters for a given distribution (including gamma). The function density and the logspline package (and others) can be used to estimate the density function of the data without assuming a specific distribution.
The lines and curve functions can be used to add an estimated density curve to a plotted histogram (use prob=TRUE when creating the histogram).
If you want to compare your data to a specific distribution then tools like qqplots (qqplot function or others) or visual tests (vis.test in the TeachingDemos package) will probably be better than a histogram and density plot.
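For instance, a rough histfit-style sketch along these lines (the data here are simulated so the snippet is self-contained; substitute your own vector):
library(MASS)
set.seed(1)
dat <- rgamma(500, shape = 2, rate = 0.5)                   # stand-in data
fit <- fitdistr(dat, "gamma")                               # ML estimates of shape and rate
hist(dat, prob = TRUE, breaks = 30)                         # prob = TRUE puts the histogram on the density scale
curve(dgamma(x, shape = fit$estimate["shape"], rate = fit$estimate["rate"]),
      add = TRUE, col = "red", lwd = 2)                     # fitted gamma density
lines(density(dat), col = "blue", lty = 2)                  # non-parametric density for comparison
Note this fits the raw data, not the binned counts; fitting from bins alone is a different problem (see the bda answer below).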
I have to answer it myself: the bda package can fit binned data to several distributions; however, it can only bin the data by rounding.
Background:
I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time and often diverges wildly in simulation (the expectation of the random variable is infinite). I want to plot about 10 of these simulations on a line chart, where the x-axis has the iteration number and the y-axis has the cumulative mean up to that point.
Here's my problem:
I'll run the first simulation (each simulation having 10,000 iterations) and build the main plot based on its range. But often one of the later simulations will have a range a few orders of magnitude larger than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?
I can think of two workarounds for this:
1. Store each simulation, then pick the one with the largest range and build the base graph off of that (not elegant, and I'd have to store a lot of data in memory, but it would probably be laptop-friendly [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that would support far more iterations, such that memory becomes an issue (think high-dimensional walks that require much, much larger MC samples for convergence), then jump right in]]).
2. Find a seed that appears to build a nice-looking version of it, and set the ylim manually, which would make the demonstration reproducible.
Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?
I'm not sure if this is possible using base graphics, if someone has a solution I'd love to see it. However graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.
require(ggplot2)
make some data and get the range:
foo <- data.frame(data = rnorm(100), numb = seq_len(100))
make an initial ggplot object and plot it:
p <- ggplot(foo, aes(numb, data)) + geom_line()
p
make some more data and add it to the plot
foo <- data.frame(data = rnorm(200), numb = seq_len(200))
p <- p + geom_line(aes(numb, data), data = foo, colour = "red")
plot the new object
p
I think (1) is the best option. I actually don't think it's inelegant, and I think it would be more computationally intensive to redraw the plot every time you hit a point greater than xlim or ylim.
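A minimal sketch of option (1), with made-up simulation settings (a half-Cauchy as a stand-in for a variable with infinite expectation): run all the simulations first, then draw the plot once with the full range known.
set.seed(42)
n.sims <- 10
n.iter <- 10000
sims <- replicate(n.sims, cumsum(abs(rcauchy(n.iter))) / seq_len(n.iter))   # each column is one cumulative-mean path
matplot(sims, type = "l", lty = 1, ylim = range(sims),   # ylim is known up front, so nothing falls outside
        xlab = "iteration", ylab = "cumulative mean")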
Also, I saw in Peter Hoff's book about Bayesian statistics a cool use of ts() instead of lines() for cumulative sums/means. It looks pretty spiffy:
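I don't have the book's exact code, but the idea is roughly this (made-up data again):
set.seed(7)
x <- abs(rcauchy(10000))
cum.mean <- cumsum(x) / seq_along(x)
plot(ts(cum.mean), xlab = "iteration", ylab = "cumulative mean")   # ts objects plot as lines by default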