stat_density_2d doesn't generate contours that match raw data - r

I have a plot showing the points in my data (image 1)
and a contour plot produced using stat_density_2d (image 2)
The contours clearly don't represent the raw data very well. I have used the same code to generate other contour plots that fit the data perfectly (image 3)
The code I am using is:
SolidReg<-ggplot(RhyShp[,c(13,15)], aes(x=Solidity, y=Reg) ) +
stat_density_2d(aes(fill = ..level..), geom = "polygon") +
labs(x = "Solidity", y = "Regularity") +
theme_classic()
RhyShp is the dataframe from my file 5_102_Rhy.csv used to generate images 1 and 2.
Does anyone know why the contour plot doesn't reflect the dataset?
I am not sure why the code would work for one csv but not another....
thanks!

It turned out that the data contained many identical rows. Overplotted points look like a single point in the geom_point() plot, but every copy still counts towards the density estimate, so the contours were skewed. Once these duplicates were removed, the density plot reflected the true density of the data.
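A minimal sketch of how such duplicates can be detected and dropped in base R. The data below is made up for illustration (only the column names Solidity and Reg come from the question), and removing duplicates is only appropriate when the repeated rows are artefacts rather than genuine repeated measurements.

```r
# Hypothetical data: a diffuse cloud plus one point repeated 300 times.
set.seed(42)
cloud <- data.frame(Solidity = runif(200, 0.5, 1), Reg = runif(200, 0.2, 0.9))
stuck <- data.frame(Solidity = rep(0.95, 300), Reg = rep(0.30, 300))
shp <- rbind(cloud, stuck)

# A scatter plot draws the 300 copies as one dot, but a density
# estimate counts every one of them.
n_dup <- sum(duplicated(shp))

# Keep the first occurrence of each (Solidity, Reg) pair.
shp_unique <- shp[!duplicated(shp), ]
```

Comparing nrow(shp) with nrow(shp_unique) before plotting is a quick way to check whether this is the cause.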

Related

How do I add intensity legend of colors after I plot using grid.raster()?

I am doing k-means clustering on a PNG image and have been plotting it using grid::grid.raster(image). I would like to add a legend that shows the intensity as a colour bar (from blue to red) marked with values, essentially indicating the intensity on the image. (image is an array whose third dimension has length 3, giving the red, green and blue channels.)
I thought of using grid.legend() but couldn't figure it out. I am hoping the community can help me out. Following is the image I have been using and after I perform kmeans clustering want a legend beside it that displays intensity on a continuous scale on a color bar.
Also I tried with ggplot2 and could plot the image but still couldn't plot the legend. I am providing the ggplot code for plotting the image. I can extract the RGB channels separately using ggplot2 also, so showing that also helps.
colassign <- rgb(Kmeans2@centers[clusters(Kmeans2),])
library(ggplot2)
ggplot(data = imgVEC, aes(x = x, y = y)) +
geom_point(colour = colassign) +
labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
xlab("x") +
ylab("y")
I did not find a way to do this with grid.raster(), but found a way with ggplot2 by plotting the RGB channels separately. Note: this only works for plotting the panels separately, but that is what I needed. The following code is for the green channel.
#RGB channels are respectively stored in columns 1,2,3.
#x-axis and y-axis values are stored in columns 4,5.
#original image is a nx5 matrix
ggplot(original_img[,c(3,4,5)], aes(x, y)) +
geom_point(aes(colour = segmented_img[,3])) +
scale_color_gradient2()
# scale_color_distiller(palette="RdYlBu") can be used instead of scale_color_gradient2() to get color selections of choice using palette as argument.

Probability density Matrix Subtraction for heatmaps

Pretty new to R and stuck. I am attempting to normalize the 2D probability density of a heat map by subtracting the 2D probability density of another data set. I am looking at where behaviors occur in space. To do this, I want to subtract where the subjects simply spend most of their time from where the behaviors are occurring, to get an idea of the relative density of just the behaviors. I am therefore trying to find the probability density matrices used to plot a heatmap with the following code:
ctrlplot<-ctrl %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray",
# as lowertick, uppertick, interval
limits=c(0,1.3e-05)) #sets the static limit of probabilities.
This works to make the heat plot for either data set plot, however I cannot find where ggplot or stat_density_2d is storing the density data to subtract the two.
Alternatively I have tried to get just the densities for both data sets using the following code and storing it as the variable dens:
n<-100
h<-c(bandwidth.nrd(ctrl$x),bandwidth.nrd(ctrl$y))
dens<-kde2d(ctrl$x,ctrl$y,n=n,h=h)
Now I am not sure how to subtract the resulting z values and get it back into a heat plot. I know there is likely an easy solution for this, but I am definitely stuck. Any advice on how to do this easier, or other suggestions on how to subtract the densities from one another would be greatly appreciated.
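One way the kde2d route could be completed, as a hedged sketch rather than a definitive answer: estimate both densities on the same grid by passing a common lims to kde2d, subtract the z matrices, and reshape the result for plotting. The ctrl and ctx data frames below are simulated stand-ins, and the choice of a diverging fill scale is an assumption.

```r
library(MASS)  # kde2d(), bandwidth.nrd()

# Hypothetical stand-ins for the ctrl and ctx data frames.
set.seed(1)
ctrl <- data.frame(x = rnorm(500, 0, 1), y = rnorm(500, 0, 1))
ctx  <- data.frame(x = rnorm(500, 1, 1), y = rnorm(500, 0.5, 1))

# Both estimates must share one grid, otherwise cell [i, j] refers to
# different locations in the two matrices.
n    <- 100
lims <- c(range(c(ctrl$x, ctx$x)), range(c(ctrl$y, ctx$y)))

dens_ctrl <- kde2d(ctrl$x, ctrl$y, n = n, lims = lims,
                   h = c(bandwidth.nrd(ctrl$x), bandwidth.nrd(ctrl$y)))
dens_ctx  <- kde2d(ctx$x, ctx$y, n = n, lims = lims,
                   h = c(bandwidth.nrd(ctx$x), bandwidth.nrd(ctx$y)))

# Element-wise difference of the two n x n density matrices.
diff_z <- dens_ctrl$z - dens_ctx$z

# Long format for plotting: z is indexed [x, y], and expand.grid() varies
# its first argument fastest, which matches as.vector() on the matrix.
diff_df <- expand.grid(x = dens_ctrl$x, y = dens_ctrl$y)
diff_df$diff <- as.vector(diff_z)

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_diff <- ggplot(diff_df, aes(x, y, fill = diff)) +
    geom_raster() +
    scale_fill_gradient2()  # diverging scale centred on zero
  print(p_diff)
}
```

Because the difference takes both signs, a diverging scale centred on zero tends to read more easily than a sequential palette.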
UPDATE:
I found a way to pull the density data from ggplot. I was able to pull the density data from two different data sets, subtract the vectors and place the densities back into the original data frame using the following code:
ctrlplot<-ctrl %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray")
ctxplot<-ctx %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray")
ctrlplot2<-ggplot_build(ctrlplot)
gbctrl<-ctrlplot2$data[[1]]
densctrl<-gbctrl$density
gbctx<-ggplot_build(ctxplot)
gbctx<-gbctx$data[[1]]
densctx<-gbctx$density
diff_ctrl_ctx<-densctrl-densctx
gbctrl$density<-diff_ctrl_ctx
ctrlplot2$data[[1]]<-gbctrl
ctrlplot2
ctrlplot
However, the last two plots, ctrlplot (original) and ctrlplot2 (subtracted densities), give the same plot. I am not sure whether I am replacing the correct parts of the data frame so that the change carries through to the drawing step, since the original ggplot_build output contains several different lists.

ggplot query or change plot limits

I have a ggplot object returned by a function in an R package. I want to add some elements to this plot before plotting it. But, I do not know the plot limits. Is there a way to query the ggplot object to find the plot limits? Actually, what I'd really like to do is simply set new limits for subsequent plotting, but I understand this is not possible, based on discussions of the impossibility of plotting data against two different y-axes.
For example, say I want to plot a small rectangle in lower-left corner of plot, but not knowing the plot limits, I don't know where to put it:
p = function() return(ggplot() + xlim(-2, 5) + ylim(-3, 5) +
geom_rect(mapping=aes(xmin=1, xmax=2, ymin=1, ymax=2)))
gp = p()
gp = gp + geom_rect(mapping=aes(xmin=0, ymin=0, xmax=0.5, ymax=0.5))
print(gp)
In ggplot2 3.0.0 and later:
ggplot_build(gp)$layout$panel_params[[1]][c("x.range","y.range")]
In ggplot2 2.2.x, where this element was still named panel_ranges:
ggplot_build(gp)$layout$panel_ranges[[1]][c("x.range","y.range")]

R: overlay multiple plots with same y axis ggplot2 (with datapoints and only geom_smooth lines)

Maybe someone knows a simple solution to this:
I have a R script that produces several plots with ggplot2 out of one dataframe. These plots on their own look somewhat like these 2 images:
The problem I have: this script generates about 20 plots, all using different variables and values for the x-axes, and the same variable but different values for the y-axes.
In general the plots look like this:
plot.a <- ggplot(DF[which(DF$a>0&DF$a<200),], aes(x=a, y=myY))+
geom_point(shape=1, color=color2, alpha=alpha) +
labs(title='a vs y', x='a', y='y')+
geom_smooth(method = lm, se = FALSE, color = color)+ ..
plot.b <- ggplot(DF[which(DF$b>0),], aes(x=b, y=myY))+
geom_point(shape=1, color=color2, alpha=alpha) +
labs(title='b vs. y', x='b', y='y')+
geom_smooth(method = lm, se = FALSE, color = color)+ ..
As you can see, the different plots use different columns of the same DF and only the relevant data (e.g. DF$b>0 in plot.b, but DF$a<200 in plot.a), so not all rows are used in every plot.
Now I want to combine all these plots (about 20) into one plot with the same y-axis but different x-axes. I am mainly interested in the geom_smooth trend lines of all plots.
Is there a way to combine all these plots (or only the geom_smooth lines) into a new ggplot with the same y-axis, adding and displaying a new x-axis for every new smooth line?
To make the new plot easier to read, is it possible to give every geom_smooth line and its corresponding x-axis a different color (e.g. color = plottype) with a legend?
Thanks in advance!
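One possible approach, sketched under assumptions: ggplot2 has no built-in way to draw several independent x-axes in a single panel, so a common substitute is to stack the filtered data into long format and give each variable its own facet with a free x-axis and a shared y-axis. The data below is simulated; only the names DF, a, b and myY and the two filters come from the question.

```r
library(ggplot2)

# Hypothetical data standing in for DF.
set.seed(3)
DF <- data.frame(a = runif(100, -50, 300), b = runif(100, -5, 20))
DF$myY <- 0.02 * DF$a + 0.5 * DF$b + rnorm(100)

# Apply each plot's own filter, then stack into long format.
long <- rbind(
  data.frame(variable = "a", x = DF$a, myY = DF$myY)[DF$a > 0 & DF$a < 200, ],
  data.frame(variable = "b", x = DF$b, myY = DF$myY)[DF$b > 0, ]
)

# One trend line per variable, coloured with a legend; the y-axis is
# shared across facets while each facet keeps its own x-axis.
p <- ggplot(long, aes(x = x, y = myY, colour = variable)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ variable, scales = "free_x")
print(p)
```

If all lines really must share one panel, the x values would first have to be rescaled to a common range, which changes what the axis means.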

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location), #$
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2) #$
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) + #$
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must use the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data as boxplots, and then plot the depths of the stations as colored points (assuming each station has only one depth). We specify the two different layers by passing the data to the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
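The original figure is not reproduced here. A minimal sketch of what such dummy data could look like, with the same column names as above; the station names, site labels and all depth values are invented for illustration.

```r
library(ggplot2)

set.seed(7)
stations <- paste0("S", 1:4)

# Hypothetical detections: 2 tagging sites x 4 stations x 25 depths each.
df <- data.frame(
  f1 = factor(rep(c("Site A", "Site B"), each = 100)),
  f2 = factor(rep(rep(stations, each = 25), times = 2)),
  depth = runif(200, 0, 120)
)

# One receiver depth per station.
df2 <- data.frame(f2 = factor(stations), depth = c(40, 75, 110, 140))

# Boxplots of detection depth with station depths overlaid as points.
p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse()
print(p)
```

Both layers share the x and y scales because they use the same aesthetic names, which is why the two depth columns had to be given a common name in Step 1.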

Resources