I have a filled ggplot contour plot depicting continuous R-squared values on a 100x100 grid. By default, the legend shows the values as a continuous gradient, which follows from the continuous nature of the data.
However, I would like to classify and visualize the data in classes, each covering a range of 0.05 R-squared values (i.e. class 1 = 0.00-0.05, class 2 = 0.05-0.10, etc.). I have tried several functions such as scale_fill_brewer and scale_fill_gradient2. The latter does in fact generate some sort of discrete classes, but the class labels show the break values rather than the class ranges. scale_fill_brewer returns the error that continuous data is being supplied to a discrete scale, which makes sense, although I can't see how to work around it.
To make matters more complex, I prefer to make use of a diversified color palette to allow identification of specific classes more easily. Besides, I have a multitude of different contour plots with different maximum R-squared values. So ideally the code is generic and can be easily used for the other plots as well.
Thus far, this is the code I have:
library(scales)
library(ggplot2)
p1 <- ggplot(res, aes(x=Var1, y=Var2, fill=R2)) +
geom_tile()
p1 +
theme(axis.text.x=element_text(angle=+90)) +
geom_vline(xintercept=c(seq(from = 1, to = 101, by = 5)),color="#8C8C8C") +
geom_hline(yintercept=c(seq(from = 1, to = 101, by = 5)),color="#8C8C8C") +
labs(title = "Contour plot of R^2 values for all possible correlations between Simple Ratio indices & Nitrogen Content", x = "Wavelength 1 (nm)", y = "Wavelength 2 (nm)") +
scale_x_discrete(breaks = c("b450","b475","b500","b525","b550","b575","b600","b625","b650","b675","b700","b725","b750","b775","b800","b825","b850","b875","b900","b925","b950")) +
scale_y_discrete(breaks = c("b450","b475","b500","b525","b550","b575","b600","b625","b650","b675","b700","b725","b750","b775","b800","b825","b850","b875","b900","b925","b950")) +
scale_fill_continuous(breaks = c(seq(from = 0, to = 0.7, by = 0.05)), low = "black", high = "green")
The output at this point looks like this:
You might want to use scale_fill_gradientn(). This is untested, since you did not provide a reproducible example:
library(scales)
library(ggplot2)
ggplot(res, aes(x=Var1, y=Var2, fill=R2)) +
geom_tile() +
scale_fill_gradientn(
colours = terrain.colors(15),
breaks = seq(from = 0, to = 0.7, by = 0.05)
)
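If the goal is discrete classes labelled by their range rather than a gradient, one option is to bin R2 with cut() before plotting and map the resulting factor to fill. The following is only a sketch: `res_demo` and its values are made up to stand in for `res`, and the class width of 0.05 and the "Spectral" palette are assumptions you can change.

```r
library(ggplot2)

# Hypothetical stand-in for `res` (made-up R2 values on a small grid)
set.seed(1)
bands <- paste0("b", seq(450, 475, by = 5))
res_demo <- expand.grid(Var1 = bands, Var2 = bands)
res_demo$R2 <- runif(nrow(res_demo), min = 0, max = 0.7)

# Class edges every 0.05, extended far enough to cover the largest value
class_width <- 0.05
n_classes <- ceiling(max(res_demo$R2) / class_width)
edges <- (0:n_classes) * class_width
class_labels <- sprintf("%.2f-%.2f", head(edges, -1), tail(edges, -1))

# Bin the continuous values; each class is [lower, upper), the last one closed
res_demo$R2_class <- cut(res_demo$R2, breaks = edges, labels = class_labels,
                         right = FALSE, include.lowest = TRUE)

# One colour per class, named so each colour is tied to its class label
class_cols <- setNames(hcl.colors(n_classes, palette = "Spectral"), class_labels)

p_classes <- ggplot(res_demo, aes(x = Var1, y = Var2, fill = R2_class)) +
  geom_tile() +
  scale_fill_manual(values = class_cols, drop = FALSE, name = "R2 class")
p_classes
```

Because the edges are derived from the maximum of the data, the same code should adapt to plots with different maximum R2 values. If the same class needs the same colour in every plot, fixing the edges to a common range (for example 0 to 1) instead would be the simpler choice.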
Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
library(dplyr)    # %>%, group_by(), summarise()
library(forcats)  # as_factor()
library(ggplot2)
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: the grammar of graphics behind ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
Your code is not working because you mix up the layer calls and do not clearly specify which data and aesthetics belong to the scatter layer and which to the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I will skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data = and aes() calls inside each geom_xxxx(). This avoids confusion.
You may want to experiment with geom_jitter() instead of geom_point() to get a somewhat better presentation of your dataset. The "few" points plotted are the result of many data points sharing the same position (and being overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the aesthetics you aim for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour only sets the outline). In this setup, you also need to supply a group value to geom_line() (I just set it to 1, but that is arbitrary; in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
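If "mean ASB vs YOI grouped by Race" instead means one mean per YOI and Race, a possible sketch is to aggregate over both variables and draw one line per race. The data frame `antisocial_demo` below is made up (the real Antisocial.csv is not reproduced here), and the name `means_by_race` is only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for Antisocial.csv (values are made up)
set.seed(42)
antisocial_demo <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 20),
  Race = rep(c(0, 1), times = 30),
  ASB  = rpois(60, lambda = 1.6)
)

# One mean per combination of YOI and Race
means_by_race <- aggregate(ASB ~ YOI + Race, data = antisocial_demo, FUN = mean)
names(means_by_race)[names(means_by_race) == "ASB"] <- "ASB_mean"

# One line and one set of points per race
p_means <- ggplot(means_by_race,
                  aes(x = YOI, y = ASB_mean, colour = factor(Race), group = Race)) +
  geom_line() +
  geom_point(size = 2) +
  scale_x_continuous(breaks = c(90, 92, 94)) +
  labs(colour = "Race")
p_means
```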
Hope this gets you going!
I have two small sets of points, viz. (1,a1),...,(9,a9) and (1,b1),...,(9,b9). I'm trying to interpolate these two sets of points separately using splines with the help of ggplot2. So, what I want is 2 different spline curves interpolating the two sets of points on the same plot (refer to the end of this post).
Since I have very little plotting experience with ggplot2, I copied a code snippet from this answer by Richard Telford. First, I stored the Y-values for the two sets of points in two numeric variables A and B, and wrote the following code:
library(ggplot2)
library(plyr)
A <- c(a1,...,a9)
B <- c(b1,...,b9)
d <- data.frame(x=1:9,y=A)
d2 <- data.frame(x=1:9,y=B)
dd <- rbind(cbind(d, case = "d"), cbind(d2, case = "d2"))
ddsmooth <- plyr::ddply(dd, .(case), function(k) as.data.frame(spline(k)))
ggplot(dd,aes(x, y, group = case)) + geom_point() + geom_line(aes(x, y, group = case), data = ddsmooth)
This produces the following output :
Now, I'm looking for an almost identical plot with the following customizations:
The two spline curves should have different colours
The line width should be user's choice (Like we do in plot function)
A legend (Specifying the colour and the corresponding attribute)
Markings on the X-axis should be 1,2,3,...,9
Hoping for a detailed solution to my problem, though any kind of help is appreciated. Thanks in advance for your time and help.
You have already shaped your data correctly for the plot. It's just a case of associating the case variable with colour and size scales.
Note the following:
I have inferred the values of A and B from your plot
Since the lines are opaque, we plot them first so that the points are still visible
I have added size and colour mappings to the aes call in geom_line
I have selected the colours by passing them as a character vector to scale_colour_manual
I have also selected the sizes of the lines by calling scale_size_manual
I have set the x axis breaks by adding a call to scale_x_continuous
The legend has been added automatically according to the scales used.
ggplot(dd, aes(x, y)) +
geom_line(aes(colour = case, size = case, linetype = case), data = ddsmooth) +
geom_point(colour = "black") +
scale_colour_manual(values = c("red4", "forestgreen"), name = "Legend") +
scale_size_manual(values = c(0.8, 1.5), name = "Legend") +
scale_linetype_manual(values = 1:2, name = "Legend") +
scale_x_continuous(breaks = 1:9)
Created on 2020-07-15 by the reprex package (v0.3.0)
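For reference, the ddply step that builds ddsmooth can also be written in base R with split() and lapply(). This is a sketch under assumptions: the values of A and B are made up (the original a1..a9 and b1..b9 are not given), and `smooth_case` is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical y-values; the original a1..a9 and b1..b9 are not given
A <- c(2.0, 3.5, 3.0, 5.0, 4.5, 6.0, 5.5, 7.0, 8.0)
B <- c(1.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0, 6.5, 6.0)
dd <- rbind(data.frame(x = 1:9, y = A, case = "d"),
            data.frame(x = 1:9, y = B, case = "d2"))

# Interpolate one group: spline() returns a list with components x and y
smooth_case <- function(k) {
  s <- spline(k$x, k$y, n = 81)  # 81 points from 1 to 9, i.e. a step of 0.1
  data.frame(x = s$x, y = s$y, case = k$case[1])
}

# Apply the helper to each case and stack the results
ddsmooth <- do.call(rbind, lapply(split(dd, dd$case), smooth_case))

p_spline <- ggplot(dd, aes(x, y, colour = case)) +
  geom_line(data = ddsmooth) +
  geom_point() +
  scale_x_continuous(breaks = 1:9)
p_spline
```

Each curve passes through its original nine points, because spline() interpolates rather than smooths.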
I would like to plot with ggplot's geom_raster a 2D plot with 2 different gradients, but I do not know if there is a fast and elegant solution for this and I am stuck.
The effect that I would like to see is the overlay of multiple geom_raster, essentially. Also, I would need a solution that scales to N different gradients; let me give an example with N=2 gradients which is easier to follow.
I first create a 101 x 101 grid of positions X and Y (0 to 100 on each axis)
# the domain is 101 points on each axis
domain = seq(0, 100, 1)
# the grid with the data
grid = expand.grid(domain, domain, stringsAsFactors = FALSE)
colnames(grid) = c('x', 'y')
Then I compute one value per grid point; imagine something stupid like this
grid$val = apply(grid, 1, function(w) { w['x'] * w['y'] })
I know how to plot this with a custom white to red gradient
ggplot(grid, aes(x = x, y = y)) +
geom_raster(aes(fill = val), interpolate = TRUE) +
scale_fill_gradient(
low = "white",
high = "red", aesthetics = 'fill')
But now imagine I have another value per grid point
grid$second_val = apply(grid, 1, function(w) { w['x'] * w['y'] + runif(1) })
Now, how do I plot a grid where each position "(x,y)" is coloured with an overlay of:
1 "white to red" gradient with value given by val
1 "white to blue" gradient with value given by second_val
Essentially, in most applications val and second_val will be two 2D density functions and I would like each gradient to represent the density value. I need two different colours to see the different distribution of the values.
I have seen this similar question but don't know how to use that answer in my case.
@Axeman's answer to my question, which you linked to, applies directly to your question as well.
Note that scales::colour_ramp() expects values between 0 and 1, so normalize val and second_val to the range [0, 1] before plotting:
grid$val_norm <- (grid$val-min(grid$val))/diff(range(grid$val))
grid$second_val_norm <- (grid$second_val-min(grid$second_val))/diff(range(grid$second_val))
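Since the question asks for something that scales to N gradients, the normalisation can be wrapped in a small helper and applied to any number of value columns. This is a sketch on made-up data: `normalise01`, `grid_demo` and `value_cols` are hypothetical names. scales::rescale() with its default arguments should give the same result as the helper.

```r
# Rescale a numeric vector linearly to the interval [0, 1]
normalise01 <- function(v) (v - min(v)) / diff(range(v))

# Hypothetical small grid with two value columns (made-up data)
grid_demo <- expand.grid(x = 0:10, y = 0:10)
grid_demo$val <- grid_demo$x * grid_demo$y
set.seed(1)
grid_demo$second_val <- grid_demo$x * grid_demo$y + runif(nrow(grid_demo))

# Normalise every value column in one pass and store it as <name>_norm
value_cols <- c("val", "second_val")
for (col in value_cols) {
  grid_demo[[paste0(col, "_norm")]] <- normalise01(grid_demo[[col]])
}
```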
Now plot using @Axeman's answer. You can plot one layer as a raster and overlay the second with annotate(). I have added transparency (alpha = .5); otherwise you would only be able to see the second layer:
ggplot(grid, aes(x = x, y = y)) +
geom_raster(aes(fill=val)) +
scale_fill_gradient(low = "white", high = "red", aesthetics = 'fill') +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","blue"))(grid$second_val_norm))
Or, you can plot both layers using annotate().
# plot using annotate()
ggplot(grid, aes(x = x, y = y)) +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","red"))(grid$val_norm)) +
annotate(geom="raster", x=grid$x, y=grid$y, alpha=.5,
fill = scales::colour_ramp(c("transparent","blue"))(grid$second_val_norm))
I'm quite new to ggplot, but I like the systematic way in which you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot which automatically then draws the legend? In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plots?
Note that this is just a toy example. Usually the model values are derived from a more complex model, just in case you wanted to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
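Depending on your ggplot2 version, every legend key may show both a line and a point. One possible way to control this is override.aes in guide_legend(). The sketch below rebuilds the toy data from the question; the data frame `special` and the vector `series` are names introduced here for illustration, and the special value is drawn from its own one-row data frame so that it is plotted only once.

```r
library(ggplot2)
library(splines)

set.seed(10)
x <- seq(-1, 1, 0.01)
s <- interpSpline(x, x^2)
df <- data.frame(x = x,
                 y = x^2 + rnorm(length(x), mean = 0, sd = 0.1),
                 fit = predict(s, x)$y)
special <- data.frame(x = 0, y = 0)  # the single highlighted point

# Legend order used for both the breaks and the overrides below
series <- c("Model values", "True values", "Special value")

p_legend <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, colour = "Model values")) +
  geom_point(aes(colour = "True values"), shape = 4) +
  geom_point(data = special, aes(colour = "Special value"), shape = 1) +
  scale_colour_manual(
    name = NULL,
    breaks = series,
    values = c("Model values" = "black",
               "True values" = "red",
               "Special value" = "blue")
  ) +
  # Per-key overrides, in the same order as `series`:
  # a line without a symbol, then two symbols without a line
  guides(colour = guide_legend(
    override.aes = list(linetype = c(1, 0, 0), shape = c(NA, 4, 1))
  ))
p_legend
```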
I am trying to create a graph in which, because there are so many points, the green fades to black at its edges while the center stays green. The code I am currently using to create this graph is:
plot(snb$px,snb$pz,col=snb$event_type,xlim=c(-2,2),ylim=c(1,6))
I looked into contour plotting but that did not work for this. The coloring variable is a factor variable.
Thanks!
This is a great problem for ggplot2.
First, read the data in:
snb <- read.csv('MLB.csv')
With your data frame you could try plotting points that are partly transparent, and setting them to be colored according to the factor event_type:
require(ggplot2)
p1 <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.5)
print(p1)
and then you get this:
Or, you might want to think about plotting this as a heatmap using geom_bin2d(), and plotting facets (subplots) for each different event_type, like this:
p2 <- ggplot(data = snb, aes(x = px, y = pz)) +
geom_bin2d(binwidth = c(0.25, 0.25)) +
facet_wrap(~ event_type)
print(p2)
which makes a plot for each level of the factor, where the color shows the number of data points in each bin, and each bin is 0.25 units on each side. But if you have more than about 5 or 6 levels, this might look pretty bad. From the small data sample you supplied, I got this:
If the levels of the factors don't matter, there are some nice examples here of plots with too many points. You could also try looking at some of the examples on the ggplot website or the R cookbook.
Transparency could help, which is easily achieved, as @BenBolker points out, with adjustcolor:
colvec = adjustcolor(c("black", "green"), alpha.f = 0.2)
plot(snb$px, snb$pz,
col = colvec[snb$event_type],
xlim = c(-2,2),
ylim = c(1,6))
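To see why indexing the colour vector with the factor works: a factor used as an index is treated as its integer codes, so the first level picks the first colour and the second level the second. A small sketch with made-up levels (the actual levels of event_type are not shown in the question):

```r
# Hypothetical factor; levels are sorted alphabetically: "ball" = 1, "strike" = 2
event_type <- factor(c("ball", "strike", "strike", "ball"))

# Two semi-transparent colours: black and green with 20% opacity
col_vec <- adjustcolor(c("black", "green"), alpha.f = 0.2)

# Indexing by the factor uses its integer codes (1, 2, 2, 1)
point_cols <- col_vec[event_type]
print(point_cols)
```

With more than two levels, the colour vector needs one entry per level; otherwise the extra levels index past its end and produce NA colours.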
It's built in to ggplot:
require(ggplot2)
p <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.2)
print(p)