I want to create a 3D plot of densities.
I use the density function to create a 2D plot for a specific set of x values: the function estimates the density and I store the result in a y variable. I then pass a second set of x values to density and get a second set of y values, and so on.
I want to combine those sets into one 3D plot, so that I end up with a surface of densities.
E.g. I have:
x1<-c(1:10)
x2<-c(2:11)
y1<-c(1,1,2,1,3,4,2,3,2,2)
y2<-c(1,2,3,1,3,6,2,8,2,2)
.
.
.
.
Now I want to put the index of each set on the x axis (1 for the first set), the corresponding x values on the y axis, and the densities on the z axis. That gives me one "slice" at x=1, a second "slice" at x=2, and so on, so that I get a density "mountain".
I hope this is understandable; if you have a better idea for how to do it, you are welcome to suggest it!
I would like to do it with the persp function, so an example using that function would be nice.
Thanks a lot for your help.
I'm afraid I can't make head or tail out of your question. But here is how you draw a plot of the sort I think you are looking for from a two dimensional dataset for which you first estimate the bivariate density:
x <- rnorm(1000)
y <- 2 + x*rnorm(1000,1,.1) + rnorm(1000)
library(MASS)
den3d <- kde2d(x, y)
persp(den3d, box=FALSE)
There are many options for persp; check out
?persp
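If the goal is instead to stack several one-dimensional densities side by side, as the question describes, one possible sketch is to evaluate every density on a shared grid and pass the resulting matrix to persp. The simulated samples, the grid size and all variable names below are assumptions made for illustration, not part of the original data:

```r
set.seed(1)
# Hypothetical data: eight samples whose mean drifts upwards
samples <- lapply(1:8, function(i) rnorm(200, mean = i / 2))

# Evaluate every density on the same grid so the curves line up as matrix columns
all_values <- unlist(samples)
grid_from <- min(all_values) - 1
grid_to <- max(all_values) + 1
n_grid <- 64
dens <- sapply(samples, function(s) {
  density(s, from = grid_from, to = grid_to, n = n_grid)$y
})
value_grid <- seq(grid_from, grid_to, length.out = n_grid)

# persp() expects z with one row per x value and one column per y value,
# so the n_grid x 8 matrix is transposed
persp(x = seq_along(samples), y = value_grid, z = t(dens),
      theta = 40, phi = 30, ticktype = "detailed",
      xlab = "set", ylab = "value", zlab = "density")
```

Fixing from, to and n in density is what makes the individual curves comparable; without it each call would choose its own grid.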
Building on Peter's answer: the plot can now be made prettier and interactive with the plotly library.
x <- rnorm(1000)
y <- 2 + x*rnorm(1000,1,.1) + rnorm(1000)
library(MASS)
den3d <- kde2d(x, y)
# the new part:
library(plotly)
plot_ly(x=den3d$x, y=den3d$y, z=den3d$z) %>% add_surface()
My problem is the following:
I have to plot a curve that shows the number of breakdowns (y) against the service life (x), but cumulatively, and that is the point where I struggle!
The solution is shown in the second picture and my code in the first (I think only the type of plot needs to be different).
my code
solution
Thanks so much for any help!
I can't replicate your data, so this is more of a comment than a complete solution.
n <- sum(h$counts) # This should sum up to the number of observations
y <- cumsum(h$counts) / n # Your y values
x <- h$mids # I assume these to be your x-axis value, but this might need an edit.
plot(x = x, y = y, type = "l")
Finally, you can add the vertical and horizontal lines via the abline() function at the respective points.
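Since the original data are not available, here is a self-contained sketch of the same idea on simulated values. The exponential service lives, the 50% reference line and the variable names are assumptions for illustration only:

```r
set.seed(42)
# Hypothetical service lives of 200 units
service_life <- rexp(200, rate = 1 / 5)

h <- hist(service_life, plot = FALSE)          # bin the data without drawing
cum_share <- cumsum(h$counts) / sum(h$counts)  # cumulative share of breakdowns
upper_edge <- h$breaks[-1]                     # right edge of each bin

plot(upper_edge, cum_share, type = "l",
     xlab = "service life", ylab = "cumulative share of breakdowns")

# Reference lines at the first bin edge where at least half have failed
half_life <- upper_edge[which(cum_share >= 0.5)[1]]
abline(h = 0.5, v = half_life, lty = 2)
```

This sketch plots against the right bin edges rather than the midpoints, because each cumulative value counts everything up to the end of its bin; using h$mids as above shifts the curve by half a bin width.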
I have a perspective plot of a locfit model and I wish to add two things to it:
1. Predictor variables as points in the 3D space
2. Color the surface according to the Z axis value
For the first, I have tried to use the trans3d function. But I get the following error even though my variables are in vector format:
Error in cbind(x, y, z, 1) %*% pmat : requires numeric/complex matrix/vector arguments
Here is a snippet of my code
library(locfit)
X <- as.matrix(loc1[,1:2])
Y <- as.matrix(loc1[,3])
zz <- locfit(Y~X,kern="bisq")
pmat <- plot(zz,type="persp",zlab="Amount",xlab="",ylab="",main="Plains",
phi = 30, theta = 30, ticktype="detailed")
x1 <- as.vector(X[,1])
x2 <- as.vector(X[,2])
Y <- as.vector(Y)
points(trans3d(x1,x2,Y,pmat))
My "loc1" data can be found here - https://www.dropbox.com/s/0kdpd5hxsywnvu2/loc1_amountfreq.txt?dl=0
TL,DR: not really in plot.locfit, but you can reconstruct it.
I don't think plot.locfit has good support for this sort of customisation. Supposedly get.data=T in your plot call will plot the original data points (point 1), and it does seem to do so, except if type="persp". So no luck there. Alternatively you can points(trans3d(...)) as you have done, except you need the perspective matrix returned by persp, and plot.locfit.3d does not return it. So again, no luck.
For colouring, typically you make a colour scale (http://r.789695.n4.nabble.com/colour-by-z-value-persp-in-raster-package-td4428254.html) and assign each z facet the colour that goes with it. However, you need the z-values of the surface (not the z-values of your original data) for this, and plot.locfit does not appear to return this either.
So to do what you want, you'll essentially be recoding plot.locfit yourself (not hard, though, just kludgy).
You could put this into a function so you can reuse it.
We:
make a uniform grid of x-y points
calculate the value of the fit at each point
use these to draw the surface (with a colour scale), saving the perspective matrix so that we can
plot your original data
so:
# make a grid of x and y coords, calculate the fit at those points
n.x <- 20 # number of x points in the x-y grid
n.y <- 30 # number of y points in the x-y grid
zz <- locfit(Total ~ Mex_Freq + Cal_Freq, data=loc1, kern="bisq")
xs <- with(loc1, seq(min(Mex_Freq), max(Mex_Freq), length.out=n.x))
ys <- with(loc1, seq(min(Cal_Freq), max(Cal_Freq), length.out=n.y))
xys <- expand.grid(Mex_Freq=xs, Cal_Freq=ys)
zs <- matrix(predict(zz, xys), nrow=length(xs))
# generate a colour scale
n.cols <- 100 # number of colours
palette <- colorRampPalette(c('blue', 'green'))(n.cols) # from blue to green
# palette <- colorRampPalette(c(rgb(0,0,1,.8), rgb(0,1,0,.8)), alpha=T)(n.cols) # if you want transparency for example
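# persp() takes one colour per facet, i.e. (n.x-1) x (n.y-1) values, so average
# the four corner z-values of each facet and split the averages into n.cols bins
z.facet <- (zs[-1, -1] + zs[-1, -ncol(zs)] + zs[-nrow(zs), -1] + zs[-nrow(zs), -ncol(zs)]) / 4
facetcol <- cut(z.facet, n.cols)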
# draw surface, with colours (col=...)
pmat <- persp(x=xs, y=ys, zs, theta=30, phi=30, ticktype='detailed', main="plains", xlab="", ylab="", zlab="Amount", col=palette[facetcol])
# draw your original data
with(loc1, points(trans3d(Mex_Freq,Cal_Freq,Total,pmat), pch=20))
Note: this doesn't look that pretty! You might want to adjust the colours of your colour scale, the transparency of the facets, etc. As for adding a legend, there are other questions that deal with that.
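Because the loc1 file is needed to run the code above, here is a self-contained sketch of the same recipe on R's built-in volcano matrix. The data set, the two marked grid positions and the variable names are stand-ins chosen for illustration; only persp, trans3d, cut and colorRampPalette are the same calls as in the answer:

```r
z <- volcano                     # built-in 87 x 61 elevation matrix
nr <- nrow(z)
nc <- ncol(z)

# One value per facet: the mean of its four corner heights
z_facet <- (z[-1, -1] + z[-1, -nc] + z[-nr, -1] + z[-nr, -nc]) / 4

# Map the facet heights onto a blue-to-green colour scale
n_cols <- 100
pal <- colorRampPalette(c("blue", "green"))(n_cols)
facet_col <- pal[cut(z_facet, n_cols)]

# Draw the surface and keep the perspective matrix that persp() returns
pmat <- persp(x = seq_len(nr), y = seq_len(nc), z = z,
              theta = 30, phi = 30, col = facet_col, border = NA)

# Project two (hypothetical) points of interest onto the plot
px <- c(20, 60)
py <- c(30, 30)
pts <- trans3d(px, py, z[cbind(px, py)], pmat)
points(pts, pch = 20, col = "red")
```

The facet matrix has one row and one column fewer than z, which is why the colours are computed from the corner averages rather than from z directly.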
(PS: what a shame ggplot doesn't do 3D scatter plots.)
My data are pre-processed image data and I want to separate two classes. In theory (and hopefully in practice) the best threshold is the local minimum between the two peaks of the bimodally distributed data.
My testdata is: http://www.file-upload.net/download-9365389/data.txt.html
I tried to follow this thread:
I plotted the histogram and calculated the kernel density function:
datafile <- read.table("....txt")
data <- datafile$V1 # take the first column of the table that was just read
hist(data)
d <- density(data) # returns the density data with defaults
hist(data,prob=TRUE)
lines(d) # plots the results
But how to continue?
I would calculate the first and second derivatives of the density function to find the local extrema, specifically the local minimum. However, I have no idea how to do this in R, and density(test) does not seem to be a normal function. So please help me: how can I calculate the derivatives and find the local minimum of the pit between the two peaks in the density function density(test)?
There are a few ways to do this.
First, using d for the density as in your question, d$x and d$y contain the x and y values for the density plot. The minimum occurs when the derivative dy/dx = 0. Since the x-values are equally spaced, we can estimate dy using diff(d$y), and seek d$x where abs(diff(d$y)) is minimized:
d$x[which.min(abs(diff(d$y)))]
# [1] 2.415785
The problem is that the density curve could also be maximized when dy/dx = 0. In this case the minimum is shallow but the maxima are peaked, so it works, but you can't count on that.
So a second way uses optimize(...) which seeks a local minimum in a given interval. optimize(...) needs a function as argument, so we use approxfun(d$x,d$y) to create an interpolation function.
optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
# [1] 2.415791
Finally, we show that this is indeed the minimum:
hist(data,prob=TRUE)
lines(d, col="red", lty=2)
v <- optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
abline(v=v, col="blue")
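A variation on the first approach that does distinguish minima from maxima is to look at where the sign of diff(d$y) changes: from negative to positive at a minimum, from positive to negative at a maximum. The sketch below uses a simulated bimodal sample in place of the linked data file, and all variable names are illustrative:

```r
set.seed(123)
# Hypothetical bimodal sample standing in for the image data
values <- c(rnorm(500, mean = 0), rnorm(500, mean = 5))
d <- density(values)

# Sign of the slope between neighbouring grid points
slope_sign <- sign(diff(d$y))
turn <- diff(slope_sign)

# Slope goes from falling to rising at a minimum, rising to falling at a maximum
minima_idx <- which(turn > 0) + 1
maxima_idx <- which(turn < 0) + 1

# Keep the deepest minimum that lies between the two highest peaks
peaks <- maxima_idx[order(d$y[maxima_idx], decreasing = TRUE)][1:2]
between <- minima_idx[minima_idx > min(peaks) & minima_idx < max(peaks)]
threshold <- d$x[between[which.min(d$y[between])]]

hist(values, prob = TRUE)
lines(d, col = "red", lty = 2)
abline(v = threshold, col = "blue")
```

Unlike optimize(...), this needs no search interval up front, but it is sensitive to small wiggles in the density, so a larger bandwidth may be needed for noisy data.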
Another approach, which is preferred actually, uses k-means clustering.
df <- read.csv(header=F,"data.txt")
colnames(df) = "X"
# bimodal
km <- kmeans(df,centers=2)
df$clust <- as.factor(km$cluster)
library(ggplot2)
# after_stat() replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
ggplot(df, aes(x=X)) +
  geom_histogram(aes(fill=clust, y=after_stat(count/sum(count))),
                 binwidth=0.5, color="grey50") +
  stat_density(geom="line", color="red")
The data actually looks more trimodal than bimodal.
# trimodal
km <- kmeans(df,centers=3)
df$clust <- as.factor(km$cluster)
library(ggplot2)
ggplot(df, aes(x=X)) +
  geom_histogram(aes(fill=clust, y=after_stat(count/sum(count))),
                 binwidth=0.5, color="grey50") +
  stat_density(geom="line", color="red")
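To turn a two-cluster result into the single threshold the question asks for, note that in one dimension k-means assigns each point to the nearer centre, so the boundary sits halfway between the two centres. A minimal sketch on simulated data (the sample, nstart = 10 and the variable names are assumptions):

```r
set.seed(1)
# Hypothetical bimodal sample standing in for the image data
values <- c(rnorm(300, mean = 0), rnorm(300, mean = 5))

# nstart > 1 reruns k-means from several random starts and keeps the best fit
km <- kmeans(values, centers = 2, nstart = 10)

# Midpoint between the two cluster centres
threshold <- mean(km$centers)

# Every value below the threshold belongs to the same cluster
low_cluster <- unique(km$cluster[values < threshold])
high_cluster <- unique(km$cluster[values > threshold])

hist(values, prob = TRUE)
abline(v = threshold, col = "blue")
```

This threshold is driven by the cluster means rather than by the shape of the density, so it need not coincide with the density minimum when the two groups differ a lot in size or spread.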
I have fit a LOESS local regression to some data and I want to be able to find the X value associated with a given Y value.
plot(cars, main = "Stopping Distance versus Speed")
car_loess <- loess(cars$dist~cars$speed,span=.5)
lines(1:50, predict(car_loess,data.frame(speed=1:50)))
I was hoping that I could use the inverse.predict function from the chemCal package, but that does not work for LOESS objects.
Does anyone have any idea how I might be able to do this calibration in a better way than predicting Y values from a long vector of X values, looking through the resulting fitted Y for the Y value of interest, and taking its corresponding X value?
Practically speaking in the above example, let's say I wanted to find the speed at which the stopping distance is 15.
Thanks!
The predicted line that you added to the plot is not quite right. Use code like this instead:
# plot the loess line
lines(cars$speed, car_loess$fitted, col="red")
You can use the approx() function to get a linear approximation from the loess line at a given y value. It works just fine for the example that you give:
# define a given y value at which you wish to approximate x from the loess line
givenY <- 15
estX <- approx(x=car_loess$fitted, y=car_loess$x, xout=givenY)$y
# add corresponding lines to the plot
abline(h=givenY, lty=2)
abline(v=estX, lty=2)
But, with a loess fit, there may be more than one x for a given y. The approach I am suggesting does not provide you with ALL of the x values for the given y. For example ...
# example with non-monotonic x-y relation
y <- c(1:20, 19:1, 2:20)
x <- seq(y)
plot(x, y)
fit <- loess(y ~ x)
# plot the loess line
lines(x, fit$fitted, col="red")
# define a given y value at which you wish to approximate x from the loess line
givenY <- 15
estX <- approx(x=fit$fitted, y=fit$x, xout=givenY)$y
# add corresponding lines to the plot
abline(h=givenY, lty=2)
abline(v=estX, lty=2)
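One way to recover every crossing rather than a single one is to scan a fine grid for sign changes of fitted value minus target and refine each bracket with uniroot. This is a sketch reusing the zigzag data above; the grid size, the tolerance and the helper name offset_from_target are assumptions:

```r
# Same non-monotonic example as above
y <- c(1:20, 19:1, 2:20)
x <- seq_along(y)
fit <- loess(y ~ x)
givenY <- 15

# Distance between the loess prediction and the target y value
offset_from_target <- function(x_new) {
  predict(fit, data.frame(x = x_new)) - givenY
}

# Scan a fine grid for sign changes; each one brackets a crossing
grid <- seq(min(x), max(x), length.out = 200)
offsets <- offset_from_target(grid)
brackets <- which(diff(sign(offsets)) != 0)

# Refine every bracket with a one-dimensional root finder
roots <- vapply(brackets, function(i) {
  uniroot(offset_from_target, lower = grid[i], upper = grid[i + 1],
          tol = 1e-8)$root
}, numeric(1))

plot(x, y)
lines(x, fit$fitted, col = "red")
abline(h = givenY, lty = 2)
abline(v = roots, lty = 2)
```

How many crossings turn up depends on how closely the loess fit follows the data, so the result is worth checking against the plot.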
I have this problem: I have a heatmap (though I suppose this applies to every plot), and I need to mirror its y-axis.
Here is some example code:
library(gstat)
x <- seq(1,50,length=50)
y <- seq(1,50,length=50)
z <- rnorm(1000)
df <- data.frame(x=x,y=y,z=z)
image(df,col=heat.colors(256))
This will generate the following heatmap
But I need the y-axis mirrored, starting with 0 at the top and 50 at the bottom. Does anybody have a clue as to what I must do to change this?
See the help page for ?plot.default, which specifies
xlim: the x limits (x1, x2) of the plot. Note that ‘x1 > x2’ is
allowed and leads to a ‘reversed axis’.
library(gstat)
x <- seq(1,50,length=50)
y <- seq(1,50,length=50)
z <- rnorm(1000)
df <- data.frame(x=x,y=y,z=z)
So
image(df,col=heat.colors(256), ylim = rev(range(y)))
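The same trick works with base graphics alone. Here is a minimal sketch that uses a plain matrix instead of the data frame above; the 50 x 50 noise matrix and the half-cell padding of the limits are assumptions for illustration:

```r
set.seed(7)
# Hypothetical 50 x 50 grid of noise
x <- 1:50
y <- 1:50
z <- matrix(rnorm(length(x) * length(y)), nrow = length(x))

# Reversed limits flip the axis; the half-cell padding keeps the edge cells whole
flipped_ylim <- rev(range(y) + c(-0.5, 0.5))
image(x, y, z, col = heat.colors(256), ylim = flipped_ylim)

# The plot region now runs from high y at the bottom to low y at the top
usr <- par("usr")
```

The tick labels are flipped along with the data, so no manual relabelling of the axis is needed with this approach.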
Does this work for you (it's a bit of a hack, though)?
df2<-df
df2$y<-50-df2$y #reverse order
image(df2,col=heat.colors(256),yaxt="n") #avoid y axis
axis(2, at=c(0,10,20,30,40,50), labels=c(50,40,30,20,10,0)) #draw y axis manually
The revaxis function in the plotrix package "reverses the sense of either or both the ‘x’ and ‘y’ axes". It doesn't solve your problem (Nick's solution is the correct one) but can be useful when you need to plot a scatterplot with reversed axes.
I would use rev like so:
df <- data.frame(x=x,y=rev(y),z=z)
In case you were not aware, notice that df is actually a function. You might want to be careful when overwriting. If you rm(df), things will go back to normal.
Don't forget to relabel the y axis as Nick suggests.
For the vertical axis increasing in the downward direction, I provided two ways (two different answers) for the following question:
R - image of a pixel matrix?