R ggplot2 warns about missing rows/values and drops parts of plot - r

I've got an R data frame that looks like this:
> glimpse(spottingIntensity)
Observations: 28
Variables: 5
$ nClassifications <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,...
$ nPhotosClassified <int> 45816, 25252, 12327, 5286, 2327, 1231, 713, 565, 447, 435, 318, 227, 192, 156,...
$ totalClassifiedPhotos <int> 95781, 95781, 95781, 95781, 95781, 95781, 95781, 95781, 95781, 95781, 95781, 9...
$ proportionOfClassified <dbl> 4.783412e-01, 2.636431e-01, 1.286998e-01, 5.518840e-02, 2.429501e-02, 1.285224...
$ cumulativeProportions <dbl> 0.4783412, 0.7419843, 0.8706842, 0.9258726, 0.9501676, 0.9630198, 0.9704639, 0...
In it, nClassifications and nPhotosClassified are data and the other variables are derived.
I use the following to plot the data with ggplot2:
ggplot(data = spottingIntensity, mapping = aes(x = nClassifications, y = cumulativeProportions)) +
geom_col() +
geom_text(mapping = aes(label = nPhotosClassified), nudge_y = 0.03) +
scale_x_continuous(limits = c(NA, 10),
breaks = seq.int(from = 1, to = 10, by = 1))
Which gives me these warnings:
Warning messages:
1: Removed 18 rows containing missing values (position_stack).
2: Removed 18 rows containing missing values (geom_text).
And this output:
I see that in the plot, the column for nClassifications = 10 is not shown even though data for it exists in my original data frame.
I checked the data frame, and I do have a few "missing rows" for nClassifications = 24, 27, 30, 31, but not for nClassifications = 10.
So:
1. Why isn't the bar in the plot for nClassifications = 10 showing up? How do I fix this? (I expected a bar similar in height to nClassifications = 9)
2. How do I programmatically "fill/complete" my data frame so that there are corresponding rows for nClassifications = 24, 27, 30, 31? In this case, nPhotosClassified <- 0 for those four nClassifications. And with that I can derive the other variables.
Can dplyr/tidyr help with 1. and 2.? Or is there another way? Thank you!
EDIT: Ooooops, I pasted the wrong code snippet before, it's correct now.
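A minimal sketch of one way to approach both points, on toy data rather than the real spottingIntensity (the values and the gaps at 4 and 6 are made up for illustration). It assumes the bar at the upper limit disappears because limits set in scale_x_continuous() discard anything falling outside them, including a bar whose edge extends past the limit, whereas coord_cartesian() only zooms. tidyr::complete() with full_seq() fills in the missing rows.

```r
library(ggplot2)
library(tidyr)
library(dplyr)

# Toy stand-in for spottingIntensity (hypothetical values, gaps at 4 and 6)
toy <- data.frame(nClassifications = c(1, 2, 3, 5, 7),
                  nPhotosClassified = c(50, 25, 12, 8, 5))

# Add rows for the missing nClassifications with a count of 0,
# then re-derive the dependent columns
filled <- toy %>%
  complete(nClassifications = full_seq(nClassifications, period = 1),
           fill = list(nPhotosClassified = 0)) %>%
  mutate(totalClassifiedPhotos = sum(nPhotosClassified),
         proportionOfClassified = nPhotosClassified / totalClassifiedPhotos,
         cumulativeProportions = cumsum(proportionOfClassified))

# Zoom with coord_cartesian() instead of scale limits, so no rows are dropped
p <- ggplot(filled, aes(x = nClassifications, y = cumulativeProportions)) +
  geom_col() +
  geom_text(aes(label = nPhotosClassified), nudge_y = 0.03) +
  scale_x_continuous(breaks = 1:7) +
  coord_cartesian(xlim = c(0.5, 5.5))
```

Widening the scale limit slightly (for example to 10.5 in the original code) may also work, but rows beyond the limit would still be removed with a warning.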

Related

R:mgcv add colorbar to 2D heatmap of GAM

I'm fitting a gam with mgcv and plot the result with the default plot.gam() function. My model includes a 2D-smoother and I want to plot the result as a heatmap. Is there any way to add a colorbar for the heatmap?
I've previously looked into other GAM plotting packages, but none of them provided the necessary visualisation. Please note, this is just a simplification for illustration purposes; the actual model (and reporting needs) is much more complicated.
edited: I initially had swapped y and z in my tensor product, updated to reflect the correct version both in the code and the plot
df.gam<-gam(y~te(x,z), data=df, method='REML')
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), rug=F)
sample data:
structure(list(x = c(3, 17, 37, 9, 4, 11, 20.5, 11.5, 16, 17,
18, 15, 13, 29.5, 13.5, 25, 15, 13, 20, 20.5, 17, 11, 11, 5,
16, 13, 3.5, 16, 16, 5, 20.5, 2, 20, 9, 23.5, 18, 3.5, 16, 23,
3, 37, 24, 5, 2, 9, 3, 8, 10.5, 37, 3, 9, 11, 10.5, 9, 5.5, 8,
22, 15.5, 18, 15, 3.5, 4.5, 20, 22, 4, 8, 18, 19, 26, 9, 5, 18,
10.5, 30, 15, 13, 27, 19, 5.5, 18, 11.5, 23.5, 2, 25, 30, 17,
18, 5, 16.5, 9, 2, 2, 23, 21, 15.5, 13, 3, 24, 17, 4.5), z = c(144,
59, 66, 99, 136, 46, 76, 87, 54, 59, 46, 96, 38, 101, 84, 64,
92, 56, 69, 76, 93, 109, 46, 124, 54, 98, 131, 89, 69, 124, 105,
120, 69, 99, 84, 75, 129, 69, 74, 112, 66, 78, 118, 120, 103,
116, 98, 57, 66, 116, 108, 95, 57, 41, 20, 89, 61, 61, 82, 52,
129, 119, 69, 61, 136, 98, 94, 70, 77, 108, 118, 94, 105, 52,
52, 38, 73, 59, 110, 97, 87, 84, 119, 64, 68, 93, 94, 9, 96,
103, 119, 119, 74, 52, 95, 56, 112, 78, 93, 119), y = c(96.535,
113.54, 108.17, 104.755, 94.36, 110.74, 112.83, 110.525, 103.645,
117.875, 105.035, 109.62, 105.24, 119.485, 107.52, 107.925, 107.875,
108.015, 115.455, 114.69, 116.715, 103.725, 110.395, 100.42,
108.79, 110.94, 99.13, 110.935, 112.94, 100.785, 110.035, 102.95,
108.42, 109.385, 119.09, 110.93, 99.885, 109.96, 116.575, 100.91,
114.615, 113.87, 103.08, 101.15, 98.68, 101.825, 105.36, 110.045,
118.575, 108.45, 99.21, 109.19, 107.175, 103.14, 94.855, 108.15,
109.345, 110.935, 112.395, 111.13, 95.185, 100.335, 112.105,
111.595, 100.365, 108.75, 116.695, 110.745, 112.455, 104.92,
102.13, 110.905, 107.365, 113.785, 105.595, 107.65, 114.325,
108.195, 96.72, 112.65, 103.81, 115.93, 101.41, 115.455, 108.58,
118.705, 116.465, 96.89, 108.655, 107.225, 101.79, 102.235, 112.08,
109.455, 111.945, 104.11, 94.775, 110.745, 112.44, 102.525)), row.names = c(NA,
-100L), class = "data.frame")
It would be easier (IMHO) to do this reliably within the ggplot2 ecosphere.
I'll show a canned approach using my {gratia} package, but also check out {mgcViz}. I'll also suggest a more generic solution using tools from {gratia} to extract information about your model's smooths, which you can then plot yourself using ggplot().
library('mgcv')
library('gratia')
library('ggplot2')
library('dplyr')
# load your snippet of data via df <- structure( .... )
# then fit your model (note you have y as the response & in the tensor product;
# I assume z is the response below and x and y are coordinates)
m <- gam(z ~ te(x, y), data=df, method='REML')
# now visualize the model using {gratia}
draw(m)
This produces:
{gratia}'s draw() methods can't plot everything yet, but where it doesn't work you should still be able to evaluate the data you need using tools in {gratia}, which you can then plot with ggplot() itself by hand.
To get values for your smooths, i.e. the data behind the plots that plot.gam() or draw() display, use gratia::smooth_estimates()
# dist controls what we do with covariate combinations too far
# from support of the data. 0.1 matches mgcv:::plot.gam behaviour
sm <- smooth_estimates(m, dist = 0.1)
yielding
r$> sm
# A tibble: 10,000 × 7
smooth type by est se x y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 te(x,y) Tensor NA 35.3 11.5 2 94.4
2 te(x,y) Tensor NA 35.5 11.0 2 94.6
3 te(x,y) Tensor NA 35.7 10.6 2 94.9
4 te(x,y) Tensor NA 35.9 10.3 2 95.1
5 te(x,y) Tensor NA 36.2 9.87 2 95.4
6 te(x,y) Tensor NA 36.4 9.49 2 95.6
7 te(x,y) Tensor NA 36.6 9.13 2 95.9
8 te(x,y) Tensor NA 36.8 8.78 2 96.1
9 te(x,y) Tensor NA 37.0 8.45 2 96.4
10 te(x,y) Tensor NA 37.2 8.13 2 96.6
# … with 9,990 more rows
In the output, x and y are a grid of values over the range of both covariates (the number of points in the grid in each covariate is controlled by n such that the grid for a 2d tensor product smooth is of size n by n). est is the estimated value of the smooth at the values of the covariates and se its standard error. For models with multiple smooths, the smooth variable uses the internal label that {mgcv} gives each smooth - these are the labels used in the output you get from calling summary() on your GAM.
We can add a confidence interval if needed using add_confint().
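As a rough illustration of what such an interval amounts to, here is a base R sketch on a toy data frame with est and se columns like those shown above. It assumes a pointwise 95% interval from a normal approximation; the column names lower_ci and upper_ci are chosen here for illustration and are not necessarily what add_confint() returns.

```r
# Toy stand-in for the est and se columns of the smooth_estimates() output
sm_toy <- data.frame(est = c(35.3, 35.5, 35.7),
                     se  = c(11.5, 11.0, 10.6))

# Critical value for a two-sided 95% interval under a normal approximation
crit <- qnorm(0.975)

# Pointwise interval: estimate plus/minus critical value times standard error
sm_toy$lower_ci <- sm_toy$est - crit * sm_toy$se
sm_toy$upper_ci <- sm_toy$est + crit * sm_toy$se
```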
Now you can plot your smooth(s) by hand using ggplot(). At this point you have two options:
1. if draw() can handle the type of smooth you want to plot, you can use the draw() method for that object and then build upon it, or
2. plot everything by hand.
Option 1
# evaluate just the smooth you want to plot
smooth_estimates(m, smooth = "te(x,y)", dist = 0.1) %>%
draw() +
geom_point(data = df, alpha = 0.2) # add a point layer for original data
This gets you pretty much what draw() produced when given the model object itself, and you can add to it as you would to any ggplot object. (That is not the case for the object returned by gratia:::draw.gam(), which is wrapped by {patchwork} and needs other ways to interact with the plots.)
Option 2
Here you are in full control
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_viridis_c(option = "plasma")
which produces
A diverging palette is likely better for this, along the lines of the one gratia:::draw.smooth_estimates uses
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_contour(aes(z = est), colour = "black") +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_distiller(palette = "RdBu", type = "div") +
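# symmetric limits so that 0 sits at the midpoint of the diverging palette;
# na.rm = TRUE guards against any NA estimates in the grid
expand_limits(fill = c(-1, 1) * max(abs(sm[["est"]]), na.rm = TRUE))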
which produces
Finally, if {gratia} can't handle your model, I'd appreciate you filing a bug report here so that I can work on supporting as many model types as possible. But do try {mgcViz} as well for an alternative approach to visualising GAMs fitted using {mgcv}.
A base plot solution would be to use fields::image.plot directly. Unfortunately, it requires data in a classic wide format, not the long format needed by ggplot.
We can facilitate plotting by grabbing the object returned by plot.gam(), and then do a little manipulation of the object to get what we need for image.plot()
Following on from @Anke's answer then, instead of plotting with plot.gam() and then using image.plot() to add the legend, we use plot.gam() to get what we need to plot, but do everything in image.plot()
library('fields') # provides image.plot() and plot.surface()
plt <- plot(df.gam)
plt <- plt[[1]] # plot.gam returns a list of n elements, one per plot
# extract the `$fit` variable - this is est from smooth_estimates
fit <- plt$fit
# reshape fit (which is a 1 column matrix) to have dimension 40x40
dim(fit) <- c(40,40)
# plot with image.plot
image.plot(x = plt$x, y = plt$y, z = fit, col = heat.colors(999, rev = TRUE))
contour(x = plt$x, y = plt$y, z = fit, add = TRUE)
box()
This produces:
You could also use the fields::plot.surface() function
l <- list(x = plt$x, y = plt$y, z = fit)
plot.surface(l, type = "C", col = heat.colors(999, rev = TRUE))
box()
This produces:
See ?fields::plot.surface for other arguments to modify the contour plot etc.
As shown, these all have the correct range on the colour bar. It would appear that in @Anke's version the colour bar mapping is off in all of the plots, but mostly only slightly, so it wasn't as noticeable.
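The reshaping step used with image.plot() above relies on R filling matrices column by column, so the vector of fitted values has to be ordered with x varying fastest. A small base R sketch on a made-up 3 x 2 grid (the grid and the stand-in fit are assumptions for illustration, not taken from the model):

```r
# Hypothetical grid: 3 x-values and 2 y-values
x <- c(1, 2, 3)
y <- c(10, 20)

# expand.grid() varies the first variable fastest, giving the "long" layout
grid <- expand.grid(x = x, y = y)

# Stand-in for fitted values evaluated on the long grid
fit <- grid$x + grid$y

# Reshape to the wide layout: rows index x, columns index y, so z[i, j]
# is the value at (x[i], y[j]), as image() and contour() expect
z <- matrix(fit, nrow = length(x), ncol = length(y))
```

Here z[3, 2] holds the value at x = 3, y = 20. If the long vector were ordered with y varying fastest instead, the matrix would need transposing before plotting.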
Following Gavin Simpson's answer and this thread (How to add colorbar with perspective plot in R), I think I've come up with a solution that uses plot.gam() (though I really love that {gratia} takes it into a ggplot universe and will definitely look more into that)
library(mgcv)
library(gratia) # for smooth_estimates()
library(fields) # for image.plot()
df.gam<-gam(y~te(x,z), data=df, method='REML')
sm <- as.data.frame(smooth_estimates(df.gam, dist = 0.1))
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), contour.col='black', rug=F, main='', cex.lab=1.75, cex.axis=1.75)
image.plot(legend.only=T, zlim=range(sm$est), col=heat.colors(999, rev =T), legend.shrink = 0.5, axis.args = list(at =c(-10,-5,0,5, 10, 15, 20)))
I hope I understood correctly that gratia::smooth_estimates() actually pulls out the partial effects.
For my model with multiple terms (and multiple tensor products), this seems to work nicely by indexing the sections of the respective terms in sm. Except for one, where the colorbar and the heatmap don't quite match up. I can't provide the actual underlying data, but I'm adding that plot for illustration in case anyone has any idea. I'm using the same approach as outlined above. In the colorbar, dark red is at 15-20, but in the heatmap the isolines just above 0 already correspond to dark red (while 0 is dark yellowish in the colorbar).

How can I add hatched polygons on an spplot in R?

I have a map which summarizes an indicator of the saturation percentage of real estate by neighborhood in Paris (observed price of real estate / maximum price set by law). I would like to add hatching to the neighborhoods which have fewer than 5 observations in my dataset.
I searched, but I couldn't find a way to do it. Any advice in the right direction is welcomed. Thanks.
Here is my code:
library(sp)
library(sf)
library(rgdal)
library(RColorBrewer)
library(raster)
library(classInt)
library(cartography)
library(grid) # for grid.text() and unit()
#Importation
setwd("path")
shp <- readOGR(dsn="path/to/file",layer="l_qu_paris")
#Breaks
q10 <- classIntervals(map$saturation2, n=7, style="fixed",
fixedBreaks=c(45,69.999999, 79.9999999, 89.9999999, 99.9999999
,109.99999999, 120))
#Colors
my.palette <- colors()[c(73,26,128,10,652,92)]
#Map
##Scale
scale.parameter = 1.1
xshift = 0
yshift = 0
original.bbox = shp@bbox
edges = original.bbox
edges[1, ] <- (edges[1, ] - mean(edges[1, ])) * scale.parameter + mean(edges[1, ]) + xshift
edges[2, ] <- (edges[2, ] - mean(edges[2, ])) * scale.parameter + mean(edges[2, ]) + yshift
#Saturation
idx <- match(shp$l_qu, map$l_qu)
is.na(idx)
concordance <- map[idx, "saturation2"]
shp$saturation2 <- concordance
spplot(shp, "saturation2",col.regions=my.palette,
col = "black", lwd= 1, at = q10$brks,
main=list(label="% de saturation des meublés 1 pièce",cex=1.2,fontfamily="serif"),
xlim = edges[1, ], ylim = edges[2, ])
grid.text("Saturation moyenne (en%)", x=unit(0.95, "npc"), y=unit(0.50, "npc"), rot=90)
Here is my map:
Here is an example of a map that i would like to have:
Here are the polygons in shapefile format: https://www.data.gouv.fr/fr/datasets/quartiers-administratifs/
And here is my dataset:
map <- structure(list(l_qu = c("Amérique", "Archives", "Arsenal", "Arts-et-Métiers",
"Auteuil", "Batignolles", "Bel-Air", "Belleville", "Bercy", "Bonne-Nouvelle",
"Chaillot", "Champs-Elysées", "Charonne", "Chaussée-d'Antin",
"Clignancourt", "Combat", "Croulebarbe", "Ecole-Militaire", "Enfants-Rouges",
"Epinettes", "Europe", "Faubourg-du-Roule", "Faubourg-Montmartre",
"Folie-Méricourt", "Gaillon", "Gare", "Goutte-d'Or", "Grandes-Carrières",
"Grenelle", "Gros-Caillou", "Halles", "Hôpital-Saint-Louis",
"Invalides", "Jardin-des-Plantes", "Javel", "La Chapelle", "Madeleine",
"Mail", "Maison-Blanche", "Monnaie", "Montparnasse", "Muette",
"Necker", "Notre-Dame", "Notre-Dame-des-Champs", "Odéon", "Palais-Royal",
"Parc-de-Montsouris", "Père-Lachaise", "Petit-Montrouge", "Picpus",
"Place-Vendôme", "Plaine de Monceaux", "Plaisance", "Pont-de-Flandre",
"Porte-Dauphine", "Porte-Saint-Denis", "Porte-Saint-Martin",
"Quinze-Vingts", "Rochechouart", "Roquette", "Saint-Ambroise",
"Saint-Fargeau", "Saint-Germain-des-Prés", "Saint-Gervais",
"Saint-Lambert", "Saint-Merri", "Saint-Thomas-d'Aquin", "Saint-Victor",
"Saint-Vincent-de-Paul", "Sainte-Avoie", "Sainte-Marguerite",
"Saint-Georges", "Salpêtrière", "Sorbonne", "Saint-Germain-l'Auxerrois",
"Ternes", "Val-de-Grâce", "Villette", "Vivienne", "Total"),
saturation2 = c(98.188951329533, 85.4938271604938, 83.8463463463464,
90.1460755525873, 98.1726527090667, 90.2186740262059, 92.8743271072797,
72.8549079897508, 99.2356140350877, 90.1234567901235, 114.057904044022,
NA, 87.2208980972528, 91.2562612612613, 97.9518951016991,
86.2770900920801, 91.0239726151895, 92.8305400372439, 88.6514719848053,
73.876877752942, 108.693318725755, 67.3263578578579, 85.8735259484408,
89.2100224414912, 92, 90.6120989320281, 85.8446948520848,
91.4165103088783, 97.2760978594495, 93.60892313074, 102.471730530348,
95.9062868379746, 96, 92.5484278273071, 95.0066946433545,
85.8187074829932, 101.139150713213, 92.1272297297297, 93.0625144594594,
61.8074324324324, 100.173302938197, 99.720856146949, 84.8732544128823,
84.1911355800245, 85.1122672253259, 91.8422003734504, NA,
94.612349767814, 83.2363741480137, 87.0403187718064, 92.0886931496388,
77, 110.943302180685, 100.73486307088, 66.3899425287356,
96.2527514568292, 95.7430893746874, 87.9028997984617, 48,
85.5630809345015, 92.7010730078939, 82.075822827797, 83.1727736726875,
76.2162162162162, 104.534662867996, 98.3510353194912, 78.3333333333333,
103.169134078212, 80.8779605984059, 92.63515704154, 62, 90.3902768982325,
94.1391771653151, 94.8669917042241, 94.4825319797959, 95.4279279279279,
98.2238673533848, 94.0602977590835, 87.5105365473892, 102,
92.5123935729199), numobs = c(6, 4, 4, 6, 36, 15, 4, 4, 3,
2, 16, NA, 36, 3, 32, 9, 22, 13, 11, 6, 31, 5, 15, 14, 4,
22, 3, 64, 29, 58, 7, 18, 4, 13, 23, 2, 8, 4, 47, 12, 16,
49, 50, 9, 33, 26, NA, 15, 10, 10, 23, 2, 13, 15, 2, 12,
8, 31, 1, 17, 22, 42, 7, 3, 4, 74, 4, 7, 13, 6, 2, 23, 18,
16, 17, 1, 24, 44, 8, 4, 1290)), row.names = c(NA, -81L), class = c("tbl_df",
"tbl", "data.frame"))
Neither spplot nor ggplot2 supports textured fills out of the box. Having said that, there is a package called ggpattern which provides custom ggplot2 geoms that support filled areas with geometric and image-based patterns. See the developer's site for more info on ggpattern: https://coolbutuseless.github.io/package/ggpattern/index.html
With ggpattern you can plot 'hatched' or textured geom fills. Below is a working example from the developer's website:
library(ggplot2)
library(ggpattern)
library(maps)
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)
crimesm <- reshape2::melt(crimes, id = 1)
states_map <- map_data("state")
p <- ggplot(crimes, aes(map_id = state)) +
geom_map_pattern(
aes(
# fill = Murder,
pattern_fill = Murder,
pattern_spacing = state,
pattern_density = state,
pattern_angle = state,
pattern = state
),
fill = 'white',
colour = 'black',
pattern_aspect_ratio = 1.8,
map = states_map
) +
expand_limits(x = states_map$long, y = states_map$lat) +
coord_map() +
theme_bw(18) +
labs(title = "ggpattern::geom_map_pattern()") +
scale_pattern_density_discrete(range = c(0.01, 0.3)) +
scale_pattern_spacing_discrete(range = c(0.01, 0.03)) +
theme(legend.position = 'none')
p
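Closer to the case in the question, here is a hedged sketch that hatches only polygons with fewer than 5 observations. It uses two made-up unit squares in place of the Paris neighbourhoods and assumes the data are available as an sf object (an sp object read with readOGR() could be converted with sf::st_as_sf()). The few_obs column and the square() helper are hypothetical names introduced for illustration.

```r
library(ggplot2)
library(ggpattern)
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two toy "neighbourhoods" with values borrowed from the question's data
toy <- st_sf(
  saturation2 = c(98.2, 85.5),
  numobs = c(6, 4),
  geometry = st_sfc(square(0), square(1))
)

# Flag the polygons that should be hatched
toy$few_obs <- toy$numobs < 5

# Fill by saturation; draw stripes only where few_obs is TRUE
p <- ggplot(toy) +
  geom_sf_pattern(aes(fill = saturation2, pattern = few_obs),
                  pattern_fill = "black", pattern_colour = "black",
                  pattern_density = 0.1, pattern_spacing = 0.03) +
  scale_pattern_manual(values = c(`FALSE` = "none", `TRUE` = "stripe"))
```

The pattern density and spacing values are arbitrary starting points and would likely need tuning for a real map.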

Plot a continuous variable discrete [duplicate]

This question already has an answer here:
Making line plot with discrete x-axis in ggplot2
(1 answer)
Closed 3 years ago.
I have this data:
samp
date_block_num sales
<dbl> <dbl>
1 0 131479
2 1 128090
3 2 147142
4 3 107190
5 4 106970
6 5 125381
7 6 116966
8 7 125291
9 8 133332
10 9 127541
# ... with 25 more rows
date_block_num represents a month. I want to plot the date in a time series fashion.
If I use this code, date_block_num will be plotted as a continuous variable (0, 10, 20, etc.) but it should be discrete (1,2,3, etc.).
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line()
If I use this:
samp %>%
ggplot(aes(as.factor(date_block_num), sales)) +
geom_line()
or
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line(aes(group = date_block_num))
I get:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Any idea how to fix this?
dput(samp)
structure(list(date_block_num = c(0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34), sales = c(131479, 128090,
147142, 107190, 106970, 125381, 116966, 125291, 133332, 127541,
130009, 183342, 116899, 109687, 115297, 96556, 97790, 97429,
91280, 102721, 99208, 107422, 117845, 168755, 110971, 84198,
82014, 77827, 72295, 64114, 63187, 66079, 72843, 71056, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -35L))
You should be able to specify the x-axis labels using scale_x_continuous.
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line() +
scale_x_continuous(breaks = samp$date_block_num)
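If a discrete axis is preferred, the "only one observation" message from the factor version can usually be avoided by putting every point in a single group. A minimal sketch using the first ten rows of the question's data (samp_toy is a name chosen here for illustration):

```r
library(ggplot2)

# First ten rows of the question's data
samp_toy <- data.frame(
  date_block_num = 0:9,
  sales = c(131479, 128090, 147142, 107190, 106970,
            125381, 116966, 125291, 133332, 127541)
)

# With a factor on x, each level becomes its own group by default;
# group = 1 tells geom_line() to connect all points as one series
p <- ggplot(samp_toy, aes(as.factor(date_block_num), sales, group = 1)) +
  geom_line()
```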

R - tidyverse/ggplot bar chart with custom discrete data labels and sorted by one variable?

I have a data frame with which I am learning tidyverse methods in R that looks like this:
> glimpse(data)
Observations: 16
Variables: 6
$ True.species <fct> Badger, Blackbird, Brown hare, Domestic cat, Domestic d...
$ misidentified <dbl> 17, 16, 59, 20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, ...
$ missed <dbl> 61, 106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259,...
$ Total <dbl> 78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 77, 369...
$ PrMissed <dbl> 0.7820513, 0.8688525, 0.1060606, 0.5454545, 0.5714286, ...
$ PrMisID <dbl> 0.21794872, 0.13114754, 0.89393939, 0.45454545, 0.42857...
Here is the dput():
data <- structure(list(True.species = structure(c(1L, 2L, 3L, 5L, 6L,
7L, 8L, 9L, 13L, 16L, 17L, 18L, 20L, 21L, 22L, 23L), .Label = c("Badger",
"Blackbird", "Brown hare", "Crow", "Domestic cat", "Domestic dog",
"Grey squirrel", "Hedgehog", "Horse", "Human", "Jackdaw", "Livestock",
"Magpie", "Muntjac", "Nothing", "Pheasant", "Rabbit", "Red fox",
"Red squirrel", "Roe Deer", "Small rodent", "Stoat or Weasel",
"Woodpigeon"), class = "factor"), misidentified = c(17, 16, 59,
20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, 5, 13), missed = c(61,
106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259, 473, 9, 17
), Total = c(78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391,
77, 369, 494, 14, 30), PrMissed = c(0.782051282051282, 0.868852459016393,
0.106060606060606, 0.545454545454545, 0.571428571428571, 0.869565217391304,
0.797101449275362, 0.666666666666667, 0.833333333333333, 0.840909090909091,
0.51150895140665, 0.753246753246753, 0.70189701897019, 0.95748987854251,
0.642857142857143, 0.566666666666667), PrMisID = c(0.217948717948718,
0.131147540983607, 0.893939393939394, 0.454545454545455, 0.428571428571429,
0.130434782608696, 0.202898550724638, 0.333333333333333, 0.166666666666667,
0.159090909090909, 0.48849104859335, 0.246753246753247, 0.29810298102981,
0.0425101214574899, 0.357142857142857, 0.433333333333333)), row.names = c(NA,
-16L), class = "data.frame")
I managed to make a rudimentary plot of what I want with ggplot() as follows:
ggplot(data = data, aes(x = True.species, y = PrMissed)) + geom_bar(stat = "identity")
But there are three things I can't figure out how to do:
1. I want a stacked bar chart where the variables PrMissed and PrMisID are on top of each other. Note that PrMissed + PrMisID == 1 for each row in the data frame, so the final plot would have equally high stacks but each containing two colors (how do I specify them?), one for PrMissed and another for PrMisID.
2. I want the order of the bars to be in ascending order of the PrMissed variable so that Brown hare would be on one end and Small rodent on the other.
3. I prefer this plot to be "flipped" on its side so that the labels (the animal names like "Brown hare") are on the left side and easier to read. An added complexity is that rather than the labels simply saying the animal name, I want them to say the corresponding Total value, so for example Brown hare would get a corresponding axis label like "Brown hare (total = 66)".
I've been trying for a long time and, for the life of me, couldn't figure out an idiomatic way to do this with ggplot(). I know the answer might be simple so please excuse my ignorance. Can anyone help? Thanks in advance.
Here's my answer, which does not require data.table and relies on ggplot2, dplyr and magrittr, plus reshape2 for the reshaping step:
library(ggplot2)
library(reshape2)
library(magrittr)
library(dplyr)
# order Species by PrMissed value
data$True.species <- factor(data$True.species,
levels = data[order(data$PrMissed, decreasing = F),"True.species"])
# reshape to have the stackable values and plot
melt(data,
id.vars = c("True.species", "misidentified", "missed", "Total"),
measure.vars = c("PrMissed", "PrMisID")) %>%
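mutate(x_axis_text = paste0(True.species, " (Total = ", Total, ")"),
# keep the PrMissed ordering: a plain character column would sort alphabetically
x_axis_text = reorder(x_axis_text, as.integer(True.species))) %>%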
ggplot(aes(x = x_axis_text, y = value, fill = variable) ) +
geom_bar(stat = "identity") +
coord_flip()
Which would result in a plot like this
Break down of the code:
Your individual points are done like this.
1) To have stackable values, they all need to be in one column, so we use melt from the reshape2 package to tidy the data and create 2 new columns. One is value, containing the values from 0 to 1, and the other is variable, indicating whether that number belongs to PrMissed or PrMisID.
2) Before melting the data we convert True.species into a factor whose levels are ordered by PrMissed. Use decreasing = T to invert the order if you wish.
3) coord_flip() flips the x and y axes so that the species are on the y axis instead of the x axis and you can easily read them on the left side.
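The question also asked how to choose the two fill colours. One option is scale_fill_manual(). Below is a sketch on three rows taken (rounded) from the question's data, using tidyr::pivot_longer() for the reshaping instead of melt. The colour names are arbitrary, and toy, long and label are names chosen here for illustration.

```r
library(ggplot2)
library(tidyr)
library(dplyr)

# Three rows from the question's data, rounded for brevity
toy <- data.frame(True.species = c("Badger", "Brown hare", "Small rodent"),
                  Total = c(78, 66, 494),
                  PrMissed = c(0.782, 0.106, 0.957),
                  PrMisID = c(0.218, 0.894, 0.043))

long <- toy %>%
  # build the axis label, then order its levels by PrMissed (ascending)
  mutate(label = paste0(True.species, " (total = ", Total, ")"),
         label = reorder(label, PrMissed)) %>%
  # one row per species and proportion type, so the bars can be stacked
  pivot_longer(c(PrMissed, PrMisID),
               names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = label, y = value, fill = variable)) +
  geom_col() +
  # named vector maps each variable to an explicit colour
  scale_fill_manual(values = c(PrMissed = "steelblue", PrMisID = "orange")) +
  coord_flip()
```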
I can help with a data.table and ggplot2 solution:
First, you'll need to make your wide table a long one with melt. Then, you're looking for position = "stack" argument to geom_bar:
Also, please note that naming a table data is a bad idea, as there's already a function called data().
library(data.table)
library(ggplot2)
df <- as.data.table(data) # the question's data frame under a safer name
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value, fill = variable))+
geom_bar(position = "stack", stat = "identity")
I forgot about the sorting... (and rotation of texts, so they are readable):
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value,
fill = variable))+
geom_bar(position = "stack", stat = "identity")+
theme(axis.text.x = element_text(angle = 90))+
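# order the bars by PrMissed; sort(df$True.species) would only give alphabetical order
scale_x_discrete(limits = as.character(df[order(PrMissed), True.species]))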

R: How Plot an Excel Table(Matrix) with R

I have a problem that I still haven't found out how to solve. I want to plot all the values MW1, MW2 and MW3 as a function of "DHT + Procymidone". How can I plot all these values in one graphic so that I get 3 different curves (in different colors and with different numbers, like curve 1, 2, ...)? And I want the labels of the x-values ("DHT + Procymidone") to be like -10, -9, ... , -4 instead of 1,00E-10, ...
DHT + Procymidone   MW 1               MW 2               MW 3
1,00E-10            114,259526780335   111,022461066274   213,212408408682
1,00E-09            115,024187788314   111,083316791613   114,529425136628
1,00E-08            110,517449986348   107,867941606743   125,10230718665
1,00E-07            100,961311263444   98,4219995773135   116,045168653416
1,00E-06            71,2383604211297   73,539659636842    50,3213799775309
1,00E-05            20,3553333652104   36,1345771905088   15,42260866106
1,00E-04            4,06189509055904   18,1246447874679   10,1988107887318
I have shortened your data frame for convenience reasons, so here's an example:
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
library(tidyr)
library(ggplot2)
mydf <- gather(mydat, "grp", "MW", 2:4)
ggplot(mydf, aes(x = DHT_Procymidone, y = MW, colour = grp)) + geom_line()
which gives following plot:
To use ggplot, your data needs to be in long-format. gather does this for you, appending columns MW1-MW3 into one column, while the column names are added as new column values in the grp-column. This group-column allows to identify different groups, i.e. different colored lines in the plot.
Depending on the type of the DHT + Procymidone column, you can, e.g., use format(..., scientific = FALSE) to print numeric values in fixed notation; however, this will result in 0.0000000001 (and not -10).
However, if this data column is a character vector (you can coerce with as.character), this may work:
a <- "1,00E-10"
sub("1,00E", "", a, fixed = TRUE)
> [1] "-10"
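Another route, sketched below under the assumption that the column was read in as text with decimal commas, is to convert the strings to numbers and take log10(), which yields the exponent directly. The names raw, conc and exponent are chosen here for illustration.

```r
# Hypothetical raw values as they might arrive from the spreadsheet
raw <- c("1,00E-10", "1,00E-09", "1,00E-04")

# Swap the decimal comma for a point, then convert to numeric
conc <- as.numeric(sub(",", ".", raw, fixed = TRUE))

# The base-10 logarithm gives the exponent to use on the x axis
exponent <- round(log10(conc))
```

When importing the file with read.table() or a related function, setting dec = "," should make the first conversion step unnecessary.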
Here is an alternative to @Daniel's answer that doesn't rely on ggplot (thanks Daniel for providing the reproducible data).
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
plot(mydat[,2] ~ mydat[,1], type = "l", ylim = c(0,220), xlim = c(-10,-2), xlab = "DHT Procymidone", ylab = "MW")
lines(mydat[,3] ~ mydat[,1], col = "blue")
lines(mydat[,4] ~ mydat[,1], col = "red")
legend(x = -4, y = 200, legend = c("MW1","MW2","MW3"), lty = 1, bty = "n", col = c("black","blue","red"))
To change axis labels see the text in xlab and ylab. To change axis limits see xlim and ylim.
