Pretty new to R and stuck. I am attempting to normalize the 2D probability density of a heat map by subtracting the 2D probability density of another data set. I am looking at where behaviors occur in space; however, to do this I want to subtract out where the subjects just spend most of their time from where the behaviors are occurring, to get an idea of the relative density of just the behaviors. To do this I am trying to find the probability density matrices used to plot a heatmap for the following code:
ctrlplot<-ctrl %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray",
# as lowertick, uppertick, interval
limits=c(0,1.3e-05)) # sets the static limit of probabilities
This works to make the heat plot for either data set plot, however I cannot find where ggplot or stat_density_2d is storing the density data to subtract the two.
Alternatively I have tried to get just the densities for both data sets using the following code and storing it as the variable dens:
n<-100
h<-c(bandwidth.nrd(ctrl$x),bandwidth.nrd(ctrl$y))
dens<-kde2d(ctrl$x,ctrl$y,n=n,h=h)
Now I am not sure how to subtract the resulting z values and get it back into a heat plot. I know there is likely an easy solution for this, but I am definitely stuck. Any advice on how to do this easier, or other suggestions on how to subtract the densities from one another would be greatly appreciated.
UPDATE:
I found a way to pull the density data from ggplot. I was able to pull the density data from two different data sets, subtract the vectors and place the densities back into the original data frame using the following code:
ctrlplot<-ctrl %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray")
ctxplot<-ctx %>% ggplot(aes(x=x, y=y)) +
stat_density_2d(geom = "raster", aes(fill = stat(density)), contour = FALSE)+
scale_fill_gradientn(colours=matlab.like(15), na.value = "gray")
ctrlplot2<-ggplot_build(ctrlplot)
gbctrl<-ctrlplot2$data[[1]]
densctrl<-gbctrl$density
gbctx<-ggplot_build(ctxplot)
gbctx<-gbctx$data[[1]]
densctx<-gbctx$density
diff_ctrl_ctx<-densctrl-densctx
gbctrl$density<-diff_ctrl_ctx
ctrlplot2$data[[1]]<-gbctrl
ctrlplot2
ctrlplot
However the last two plots ctrlplot (original) and ctrlplot2(subtracted densities) give the same plot. Not sure if I am not replacing the correct parts of the data frame so that it updates for the graphing part since there are different lists in the original ggplot_build.
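One likely reason the two plots look identical: in the output of ggplot_build() the fill colours have already been computed and stored in their own column, so overwriting the density column does not change what gets drawn. The two stat_density_2d grids are also evaluated over each data set's own range, so their cells need not line up. A minimal sketch of the kde2d route from the question, evaluating both densities on one shared grid before subtracting, might look like the following. The ctrl and ctx data frames here are random stand-ins (an assumption), and the diverging colour scale is just one possible choice.

```r
library(MASS)     # kde2d()
library(ggplot2)

# Hypothetical stand-ins for ctrl and ctx: data frames with x and y columns.
set.seed(42)
ctrl <- data.frame(x = rnorm(500, 0, 1),   y = rnorm(500, 0, 1))
ctx  <- data.frame(x = rnorm(500, 0.5, 1), y = rnorm(500, 0.5, 1))

# Evaluate both densities on the SAME grid so the cells line up.
n    <- 100
lims <- c(range(c(ctrl$x, ctx$x)), range(c(ctrl$y, ctx$y)))
dens_ctrl <- kde2d(ctrl$x, ctrl$y, n = n, lims = lims)
dens_ctx  <- kde2d(ctx$x,  ctx$y,  n = n, lims = lims)

# z[i, j] is the density at (x[i], y[j]); expand.grid varies x fastest,
# which matches the column-major order of as.vector(z).
diff_df <- expand.grid(x = dens_ctrl$x, y = dens_ctrl$y)
diff_df$diff <- as.vector(dens_ctrl$z - dens_ctx$z)

# Diverging scale: one colour where ctrl is denser, another where ctx is.
p <- ggplot(diff_df, aes(x, y, fill = diff)) +
  geom_raster() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red")
print(p)
```

Swap the order of the subtraction if you want the other data set as the baseline.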
I have a dataframe with three columns "DateTime", "T_ET", and "LAI". I want to plot T_ET (on y-axis) against LAI (on x-axis) along with 0.1-bin LAI averaged values of T_ET on the same plot something like below (Wei et al., 2017):
In the above figure, the y-axis is T_ET or T/(E+T), the x-axis is LAI, the red open diamonds with error bars are the 0.1-bin LAI averages of the black points and their standard deviation, the solid line is a regression of the individual data points (estimated from the bin averages), and n is the number of available data points. Dashed lines are 95% confidence bounds.
How can I obtain the plot similar to above plot? Please find the sample data using the following link: file
or use following sample data:
df <- structure(list(DateTime = structure(c(1478088000, 1478347200, 1478692800, 1478779200, 1478865600, 1478952000, 1479124800, 1479211200, 1479297600, 1479470400), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
T_ET = c(0.996408350852751, 0.904748351479432, 0.28771236118773, 0.364402232484906, 0.452348409759872, 0.415408041501318, 0.629291202120187, 0.812083112145703, 0.992414777441755, 0.818032913071265),
LAI = c(1.3434, 1.4669, 1.6316, 1.6727, 1.8476, 2.0225, 2.3723, 2.5472, 2.7221, 3.0719)),
row.names = c(NA, 10L),
class = "data.frame")
You can do this directly while plotting via stat_summary_bin(). By default, the geom associated with this would be the pointrange geom and uses mean_se(). bins= controls the number of bins, but you can also supply binwidth=. Note that with the pointrange geom, fatten controls the size of the central point:
ggplot(df, aes(LAI, T_ET)) + geom_point() + theme_classic() +
stat_summary_bin(bins=3, color='red', shape=5, fatten=5)
Your sample data is a little light, so here's another example via the diamonds dataset. Here, I'm constructing the same look as the example plot you show by combining the errorbar and point geoms. Please note that apparently setting the width of the errorbar doesn't work correctly with stat_summary_bin().
ggplot(diamonds, aes(carat, price)) + geom_point(size=0.3) +
stat_summary_bin(geom='errorbar', color='red', bins=12, width=0.001) +
stat_summary_bin(geom='point', size=3, shape=5, color='red', bins=12) +
theme_classic()
EDIT: Showing Regression for Binned Data
As indicated in the comments, drawing a regression line based on the binned data and not the original data is possible, but not through the stat_summary_bin() function unless you are okay to use loess. If you're looking for linear regression, you'll need to bin the data outside of ggplot, then plot the regression on the binned data.
The reason for this is probably by design. It's inherently not a good idea to draw a regression line (a way of summarizing data) that is based on summarized data. Regardless, here's one way to do this via the diamonds dataset. We can use the cut() function to cut into separate bins, then summarize the data on those binned values. Due to the way the cut() function labels the output, we have to create our own labels. Since we're cutting into 12 equal pieces in this example, I'm creating 12 evenly-spaced positions on the x axis for our data values to sit into - this may be different in your case, just take care you label according to what the data represents and what makes the most statistical sense.
df <- diamonds
# setting interval labeling
bin_width <- diff(range(df$carat)/12)
bin_labels <- c((range(df$carat)[1] + (bin_width/2))+(0:11*bin_width))
# cutting the data
df$bins <- cut(df$carat, breaks=12, labels=bin_labels)
df$bins <- as.numeric(levels(df$bins)[df$bins]) # convert factor to numeric
ggplot(diamonds, aes(carat, price)) + geom_point(size=0.3) +
stat_summary_bin(geom='errorbar', color='red', bins=12, width=0.001) +
stat_summary_bin(geom='point', size=3, shape=5, color='red', bins=12) +
geom_smooth(data=df, aes(x=bins), method='lm', color='blue') +
theme_classic()
Note that the regression line above is weighting all binned values equally. This is generally not a good idea unless your data is spaced evenly among the dataset. I'd still recommend if you're going to draw a regression line, have it linked to the original data, which is much more representative of the reality within your data. That would look like this:
ggplot(diamonds, aes(carat, price)) + geom_point(size=0.3) +
stat_summary_bin(geom='errorbar', color='red', bins=12, width=0.001) +
stat_summary_bin(geom='point', size=3, shape=5, color='red', bins=12) +
geom_smooth(method='lm', color='green') +
theme_classic()
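For comparison, here is a sketch of the fully aggregated variant, where each bin contributes exactly one point (its centre and its mean price) to the fit. The names n_bins, bin_centres and binned are introduced here for illustration only, and the choice of the mean as the summary is an assumption.

```r
library(ggplot2)

df <- diamonds
n_bins <- 12

# Bin centres, spaced evenly across the range of carat.
bin_width   <- diff(range(df$carat)) / n_bins
bin_centres <- min(df$carat) + bin_width / 2 + (0:(n_bins - 1)) * bin_width

# labels = FALSE makes cut() return the integer bin index for each row.
df$bin <- cut(df$carat, breaks = n_bins, labels = FALSE)

# Collapse to one row per (non-empty) bin: the mean price in that bin.
binned <- aggregate(price ~ bin, data = df, FUN = mean)
binned$carat <- bin_centres[binned$bin]

# The blue line is fitted to the bin means only, one point per bin.
p <- ggplot(diamonds, aes(carat, price)) + geom_point(size = 0.3) +
  geom_point(data = binned, colour = 'red', shape = 5, size = 3) +
  geom_smooth(data = binned, method = 'lm', formula = y ~ x, colour = 'blue') +
  theme_classic()
print(p)
```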
When it comes down to it, drawing a regression line for binned data is summarizing the summarized data rather than summarizing your original data. It's statistical heresy, so use at your own risk. But if you simply must for whatever strange reason... I can't stop you. ;)
I have a plot showing the points in my data (image 1)
and a contour plot produced using stat_density_2d (image 2)
The contours clearly don't represent the raw data very well. I have used the same code to generate other contour plots that fit the data perfectly (image 3)
The code I am using is:
SolidReg<-ggplot(RhyShp[,c(13,15)], aes(x=Solidity, y=Reg) ) +
stat_density_2d(aes(fill = ..level..), geom = "polygon") +
labs(x = "Solidity", y = "Regularity") +
theme_classic()
RhyShp is the dataframe from my file 5_102_Rhy.csv used to generate images 1 and 2.
Does anyone know why the contour plot doesn't reflect the dataset?
I am not sure why the code would work for one csv but not another....
thanks!
Turns out this was an issue with the data containing multiples of identical values which skewed the density without being discernible in the geom_point() plot. Once these duplicates were removed the density plot reflected the true density of the data.
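A minimal sketch of how such duplicates can be counted and dropped before plotting. The shp data frame below is a made-up stand-in for the two RhyShp columns (an assumption), with one point deliberately repeated many times.

```r
# Hypothetical stand-in for RhyShp[, c("Solidity", "Reg")]:
# 200 distinct points plus one of them repeated 300 more times.
set.seed(1)
pts <- data.frame(Solidity = runif(200), Reg = runif(200))
dup <- pts[rep(1, 300), ]
rownames(dup) <- NULL
shp <- rbind(pts, dup)

# duplicated() flags every row that repeats an earlier row.
n_dup <- sum(duplicated(shp))

# unique() keeps the first copy of each distinct row.
shp_unique <- unique(shp)
```

In a scatter plot the repeated rows sit exactly on top of each other and look like a single point, but a density estimate counts every copy, which is why the contours can drift away from what the points seem to show.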
I am attempting to place individual points on a plot using ggplot2, however as there are many points, it is difficult to gauge how densely packed the points are. Here, there are two factors being compared against a continuous variable, and I want to change the color of the points to reflect how closely packed they are with their neighbors. I am using the geom_point function in ggplot2 to plot the points, but I don't know how to feed it the right information on color.
Here is the code I am using:
s1 = rnorm(1000, 1, 10)
s2 = rnorm(1000, 1, 10)
# one task label per value: 1000 rows for task 1, then 1000 rows for task 2
data = data.frame(task_number = as.factor(rep(1:2, each = 1000)),
S = c(s1, s2))
ggplot(data, aes(x = task_number, y = S)) + geom_point()
Which generates this plot:
However, I want it to look more like this image, but with one dimension rather than two (which I borrowed from this website: https://slowkow.com/notes/ggplot2-color-by-density/):
How do I change the colors of the first plot so it resembles that of the second plot?
I think the tricky thing about this is you want to show the original values, and evaluate the density at those values. I borrowed ideas from here to achieve that.
library(dplyr)
data = data %>%
group_by(task_number) %>%
# Use approxfun to interpolate the density back to
# the original points
mutate(dens = approxfun(density(S))(S))
ggplot(data, aes(x = task_number, y = S, colour = dens)) +
geom_point() +
scale_colour_viridis_c()
Result:
One could, of course, come up with a measure of proximity to neighbouring values for each value... However, wouldn't adjusting the transparency basically achieve the same goal (gauging how densely packed the points are)?
geom_point(alpha=0.03)
I would like to plot multiple groups in a stat_density2d plot with alpha values related to the counts of observations in each group. However, the levels formed by stat_density2d seem to be normalized to the number of observations in each group. For example,
temp <- rbind(movies[1:2,],movies[movies$mpaa == "R" | movies$mpaa == "PG-13",])
ggplot(temp, aes(x=rating,y=length)) +
stat_density2d(geom="tile", aes(fill = mpaa, alpha=..density..), contour=FALSE) +
theme_minimal()
Creates a plot like this:
Because I only included 2 points without ratings, they result in densities that look much tighter/stronger than the other two, and so wash out the other two densities. I've tried looking at Overlay two ggplot2 stat_density2d plots with alpha channels and Specifying the scale for the density in ggplot2's stat_density2d but they don't really address this specific issue.
Ultimately, what I'm trying to accomplish with my real data, is I have "power" samples from discrete 2d locations for multiple conditions, and I am trying to plot what their relative powers/spatial distributions are. I am duplicating points in locations relative to their powers, but this has resulted in low power conditions with just a few locations looking the strongest when using stat_density2d. Please let me know if there is a better way of going about doing this!
Thanks!
stat_binhex, which understands ..count.. in addition to ..density.., may work for you:
ggplot(temp, aes(x=rating,y=length)) +
stat_binhex(geom="hex", aes(fill = mpaa, alpha=..count..)) +
theme_minimal()
Although you may want to adjust the bin width.
Not the most elegant R code, but this seems to work. I normalize my real data a bit differently than this, but this gets the solution I found across. I use a for loop where I find the number of observations (the "power") for each condition and add a new stat_density2d layer with the alpha scaled by that value.
temp <- rbind(movies[1:2,],movies[movies$mpaa == "R" | movies$mpaa == "PG-13",])
mpaa = unique(temp$mpaa)
p <- ggplot() + theme_minimal()
for (ii in seq(1,3)) {
ratio = length(which(temp$mpaa == mpaa[ii]))
p <- p + stat_density2d(data=temp[temp$mpaa == mpaa[ii],],
aes(x=rating,y=length,fill = mpaa, alpha=..level..),
geom="polygon",
contour=TRUE,
alpha = ratio/20,
linetype = "blank")
}
print(p)
I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the 'col=f1' argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different plots by referencing the data from within the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
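For reference, a sketch of what such dummy data could look like. The station names, site names and depth values below are invented purely for illustration and are not the data behind the original chart.

```r
library(ggplot2)

# Hypothetical dummy data: 3 stations, 2 tagging locations,
# 120 random detection depths between 0 and 100 m.
set.seed(1)
df <- data.frame(
  f1    = factor(rep(c("SiteA", "SiteB"), times = 60)),
  f2    = factor(rep(c("St1", "St2", "St3"), each = 40)),
  depth = runif(120, min = 0, max = 100)
)

# One fixed receiver depth per station (also made up).
df2 <- data.frame(f2 = factor(c("St1", "St2", "St3")),
                  depth = c(120, 90, 140))

# Boxplots of detection depth per station, coloured by tagging site,
# with the station depths overlaid as blue points on the same reversed axis.
p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse() +
  labs(x = "MPS Station", y = "Depth (m)")
print(p)
```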