Problem with colouring a ggplot2 histogram in R

I've got an issue with colouring a ggplot2 histogram.
R chunk:
ggplot(Hospital, aes(x=BodyTemperature)) +
geom_histogram(aes(fill = factor(BodyTemperature))) +
scale_x_continuous(breaks = seq(0, 100, by = 10)) +
ylab("prevalence") +
xlab("BodyTemperature") +
ggtitle("Temperature vs. prevalence")
The histogram should show along the x-axis that the higher the temperature, the worse it is. For example, 36°C should be green, 38°C yellow and 40°C red, going from left to right on the x-axis.
The y-axis should show how often each temperature occurs in the hospital's patient data. BodyTemperature is a vector of 200+ values such as 35.3 or 37.4.
How can this chunk be fixed to produce the colour changes? For a base-graphics (non-ggplot) version I've already written this working R chunk:
```{r, fig.width=8}
library(RColorBrewer)  # provides brewer.pal()
color1 <- rep(brewer.pal(1, "Greens"))
color2 <- rep("#57c4fa", 0)
color3 <- brewer.pal(8, "Reds")
hist(Hospital$BodyTemperature[-357],
breaks = seq(from = 0, to = 100, by = 10),
main = "Temperature vs. prevalence",
ylab = "prevalence",
xlab = "Temperature",
col = c(color1, color2, color3))
```
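In base graphics, hist() recycles or truncates the col vector, so it helps to generate exactly one colour per bin. Below is a minimal sketch, assuming simulated temperatures (the real Hospital data is not available) and a green-to-yellow-to-red ramp built with grDevices::colorRampPalette(); the names body_temp and bin_cols are illustrative only.

```r
# Simulated stand-in for Hospital$BodyTemperature (assumption: real data not shown)
set.seed(13)
body_temp <- 36.5 + rchisq(100, 2)

# One-degree bins covering the whole data range
breaks <- seq(floor(min(body_temp)), ceiling(max(body_temp)), by = 1)
h <- hist(body_temp, breaks = breaks, plot = FALSE)

# Exactly one colour per bin, running from green (cool) to red (hot)
bin_cols <- colorRampPalette(c("green", "yellow", "red"))(length(h$counts))

plot(h,
     col  = bin_cols,
     main = "Temperature vs. prevalence",
     xlab = "Temperature",
     ylab = "prevalence")
```

Because the ramp is spread across whatever bins exist, 36°C is not pinned to exactly green; pinning colours to specific temperatures would need an explicit mapping from bin midpoints (h$mids) to colours.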

The key is to make sure the bin intervals used for the fill scale match those used for the x axis. You can do this by setting the binwidth argument to geom_histogram(), and using ggplot2::cut_width() to break BodyTemperature into the same bins for the fill scale:
set.seed(13)
library(ggplot2)
# example data
Hospital <- data.frame(BodyTemperature = 36.5 + rchisq(100, 2))
ggplot(Hospital, aes(BodyTemperature)) +
geom_histogram(
aes(fill = cut_width(BodyTemperature, width = 1)),
binwidth = 1,
show.legend = FALSE
) +
scale_fill_brewer(palette = "RdYlGn", direction = -1) +
labs(
title = "Temperature vs. Prevalence",
x = "Body Temperature (°C)",
y = "Prevalence"
) +
theme_minimal()
Created on 2022-10-24 with reprex v2.0.2
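To check the claim that cut_width() and binwidth produce the same bins, one can compare the counts from both. This sketch reuses the simulated data from the reprex above and reads the drawn bars back with ggplot2::layer_data(); the variable names are illustrative.

```r
library(ggplot2)

# Same simulated data as in the reprex above
set.seed(13)
Hospital <- data.frame(BodyTemperature = 36.5 + rchisq(100, 2))

# Bins used for the fill colour
fill_bins   <- cut_width(Hospital$BodyTemperature, width = 1)
fill_counts <- as.vector(table(fill_bins))

# Bins actually drawn by geom_histogram()
p <- ggplot(Hospital, aes(BodyTemperature)) +
  geom_histogram(binwidth = 1)
bar_counts <- layer_data(p)$count

# Compare the non-empty bins from both approaches, left to right
all.equal(fill_counts[fill_counts > 0], bar_counts[bar_counts > 0])
```

Both functions default to bins centred on whole numbers when only a width is given, which is why the fill categories line up with the bars.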

Related

set x axis on ggtree heatmap in R

I would like to set the x-axis range on a ggtree heatmap.
This is my code:
p <- ggtree(working_tree,open.angle=15, size=0.1) %<+% avian %<+% color +
aes(color = I(colour)) +
geom_tippoint(size = 2) +
geom_tiplab(size = 3, colour = "black") +
theme_tree2()
# Reverse the time axis with revts() and show the labels as positive numbers
p1 <- revts(p) + scale_x_continuous(labels = abs)
h1 <- gheatmap(p1, landuse,
offset = 15, width = 0.05, font.size = 3, colnames_position = "top", colnames_angle = 0,
colnames_offset_y = 0, hjust = 0) +
scale_fill_manual(breaks = c("Forest", "Jungle rubber", "Rubber", "Oil palm"),
values = c("#458B00", "#76EE00", "#1874CD", "#00BFFF"), name = "Land use system",
na.value = "white")
The problem is that when I add the heatmap, the x-axis range automatically changes to 0 to 60, but the range I want is 0 to 80.
Does anyone know how to do this, or have experience with it?
Update:
I solved it by using scale_x_continuous() like this:
scale_x_continuous(breaks = seq(-80,0,20), labels = abs(seq(-80,0,20)))
For anyone interested in geological timescales in R, I suggest the deeptime package.
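If the axis should always reach 80 regardless of where the data stop, one option (a sketch, not the poster's code) is ggplot2::expand_limits(), which widens the range without clipping anything to the right. The toy data frame below is a hypothetical stand-in for a revts() tree, whose x values are negative ages.

```r
library(ggplot2)

# Hypothetical stand-in for a revts() tree: x values are negative ages
tips <- data.frame(x = c(-55, -30, -10, 0), y = 1:4)

p <- ggplot(tips, aes(x, y)) +
  geom_point() +
  # Force the axis to extend to -80 even though the data stop at -55
  expand_limits(x = -80) +
  # Breaks sit at negative positions but are labelled as positive numbers
  scale_x_continuous(breaks = seq(-80, 0, by = 20), labels = abs)

p
```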

divide the y axis so that scores < 25 occupy most of the plot in ggplot

I want to divide the y-axis of the attached figure so that the part with scores below 25 occupies most of the figure, while the remaining values take up a small upper part.
I searched for this and I'm aware that I should use scale_y_discrete(limits ... I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20, "//", 40:100)), but it doesn't work yet.
I used the attached data, and this is my code:
library(ggpubr)  # provides ggscatter()
p <- ggscatter(data, x = "Year", y = "Score",
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", # regression line
conf.int = TRUE,
cor.coef = FALSE, cor.method = "pearson",
xlab = "Year", ylab = "Score")
p <- p + coord_cartesian(xlim = c(1980, 2020)); p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
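Before plotting, it can help to check numerically what the transform does. This sketch restates the same squeeze transform (built with scales::trans_new(), which newer scales releases also offer as new_transform()) and applies it to a few illustrative scores.

```r
library(scales)

# Same squeeze transform as above: values above `threshold` are compressed
my_transform <- function(threshold = 25, squeeze_factor = 10) {
  force(threshold)
  force(squeeze_factor)
  trans_new(
    name = "trans_squeeze",
    transform = function(x) {
      ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
    },
    inverse = function(x) {
      ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
    }
  )
}

tr <- my_transform(25, 10)
scores <- c(10, 25, 35, 125)

squeezed <- tr$transform(scores)  # 10, 25, 26, 35
restored <- tr$inverse(squeezed)  # back to 10, 25, 35, 125
```

Scores up to 25 are untouched, while 25 to 125 is compressed into 25 to 35, so 100 points above the threshold take the same axis space as 10 points below it.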
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.

Colour segments of density plot by bin

Warning: I am brand new to R!
I've caught the R bug and am having a play with the possibilities, but I'm getting very lost. I want to colour segments of a density plot using a '>' condition to indicate bins. In my head it looks like:
...but not quartile- or %-change-dependent.
My data shows x = duration (number of days) and y = frequency. I would like the plot's colour to split at 3-month intervals up to 12 months, with one colour after that (using working days, i.e. 63 = 3 months).
I've had a go, but I'm really not sure where to start!
ggplot(df3, aes(x=Investigation.Duration))+
geom_density(fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>0],
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>63], color = "white",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>127], color = "light Grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>190], color = "medium grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>253], color = "dark grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>506], color = "black")+
ggtitle ("Investigation duration distribution in 'Wales' complexity sample")+
geom_text(aes(x=175, label=paste0("Mean, 136"), y=0.0053))+
geom_vline(xintercept = c(136.5), color = "red")+
geom_text(aes(x=80, label=paste0("Median, 129"), y=0.0053))+
geom_vline(xintercept = c(129.5), color = "blue")
Any really simple help much appreciated.
Unfortunately, you can't do this directly with geom_density, as "under the hood" it is built with a single polygon, and a polygon can only have a single fill. The only way to do this is to have multiple polygons, and you need to build them yourself.
Fortunately, this is easier than it sounds.
There was no sample data in the question, so we will create a plausible distribution with the same median and mean:
# Simulate data
set.seed(69)
df3 <- data.frame(Investigation.Duration = rgamma(1000, 5, 1/27.7))
round(median(df3$Investigation.Duration))
#> [1] 129
round(mean(df3$Investigation.Duration))
#> [1] 136
# Get the density as a data frame
dens <- density(df3$Investigation.Duration)
dens <- data.frame(x = dens$x, y = dens$y)
# Exclude the artefactual times below zero
dens <- dens[dens$x > 0, ]
# Split into bands of 3 months and group > 12 months together
dens$band <- dens$x %/% 63
dens$band[dens$band > 3] <- 4
# This is the complex bit. For each band we want to add a point on
# the x axis at the upper and lower time limits:
dens <- do.call("rbind", lapply(split(dens, dens$band), function(df) {
df <- rbind(df[1,], df, df[nrow(df),])
df$y[c(1, nrow(df))] <- 0
df
}))
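To see what the banding and the polygon-closing step do, here is the same logic applied to a tiny hypothetical density curve (the x and y values are invented for illustration). Integer division by 63 gives the band index, and duplicating each band's first and last rows with y = 0 closes every band into a polygon that sits on the x-axis.

```r
# Tiny hypothetical density curve spanning two bands of 63 working days
dens <- data.frame(x = c(10, 40, 62, 70, 100, 120),
                   y = c(0.002, 0.004, 0.005, 0.006, 0.005, 0.003))

# Integer division gives the band index: 0 = under 63 days, 1 = 63 to 125 days, ...
dens$band <- dens$x %/% 63
dens$band[dens$band > 3] <- 4  # everything from 252 days (4 * 63) on shares one band

# Duplicate the first and last row of each band and drop them to y = 0,
# so every band becomes a closed polygon sitting on the x-axis
closed <- do.call("rbind", lapply(split(dens, dens$band), function(df) {
  df <- rbind(df[1, ], df, df[nrow(df), ])
  df$y[c(1, nrow(df))] <- 0
  df
}))

closed
```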
Now we have the polygons, it's just a case of drawing and labelling appropriately:
library(ggplot2)
ggplot(dens, aes(x, y)) +
geom_polygon(aes(fill = factor(band), color = factor(band))) +
theme_minimal() +
scale_fill_manual(values = c("#003f5c", "#58508d", "#bc5090",
"#ff6361", "#ffa600"),
name = "Time",
labels = c("Less than 3 months",
"3 to 6 months",
"6 to 9 months",
"9 to 12 months",
"Over 12 months")) +
scale_colour_manual(values = c("#003f5c", "#58508d", "#bc5090",
"#ff6361", "#ffa600"),
guide = guide_none()) +
labs(x = "Days since investigation started", y = "Density") +
ggtitle ("Investigation duration distribution in 'Wales' complexity sample") +
geom_text(aes(x = 175, label = paste0("Mean, 136"), y = 0.0053),
check_overlap = TRUE)+
geom_vline(xintercept = c(136.5), linetype = 2)+
geom_text(aes(x = 80, label = paste0("Median, 129"), y = 0.0053),
check_overlap = TRUE)+
geom_vline(xintercept = c(129.5), linetype = 2)

Extend geom_voronoi past its limits with scale_*_continuous

The task I have set for myself is to make a Voronoi diagram of the www.politicalcompass.org chart of the currently running Democratic candidates. I have coded their positions and combined overlapping points into single observations. I have used two separate ggplot extensions that create Voronoi diagrams.
The problem is that politicalcompass.org's chart goes from -10 to +10 on both axes. When I try to plot the Voronoi diagrams, the tiles only extend to the range of the data and not to the full range of -10 to 10 that I intend to plot. Examples and code below:
https://github.com/McCartneyAC/average_of_polls/blob/master/stupid_voronoi_one.png?raw=true
https://github.com/McCartneyAC/average_of_polls/blob/master/stupid_voronoi_two.png?raw=true
library(tidyverse)
library(ggrepel)
candidates_list_voronoi <- tribble(
~candidate,~party,~economic,~authoritarian,
"Bennet","Democratic",8.5,6,
"Biden","Democratic",5.5,3.5,
"Booker","Democratic",4,2.5,
"Buttigieg/Castro","Democratic",6.5,4.5,
"Delaney","Democratic",4,3.5,
"Gabbard","Democratic",-1.5,-1.5,
"Harris","Democratic",5,4,
"Bullock/Klobuchar","Democratic",5,5,
"Sanders","Democratic",-1.5,-1,
"Sestak","Democratic",5.5,2,
"Warren","Democratic",0.5,1,
"Williamson","Democratic",2,-1.5,
"Yang","Democratic",7,1,
"Hawkins","Green",-5,-3,
"Vohra","Libertarian",10,1.5,
"Corker/Pence","Republican",10,8.5,
"Hogan","Republican",10,8,
"Kasich","Republican",8,9,
"Trump","Republican",8.5,8.5,
"Weld","Republican",9.5,4.5
)
library(ggvoronoi)
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian, label = candidate, fill = candidate)) +
geom_voronoi(color = "black") +
geom_label_repel(fill = "#FFFFFF") +
scale_x_continuous(limits = c(-10,10))+
scale_y_continuous(limits = c(-10,10))
library(ggforce)
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian)) +
geom_voronoi_tile(aes(fill = candidate, group = -1L)) +
geom_voronoi_segment() +
geom_label_repel(aes(label = candidate)) +
scale_x_continuous(limits = c(-10,10))+
scale_y_continuous(limits = c(-10,10))
You can specify the bounding box with the outline argument of geom_voronoi() (see the example in the ggvoronoi vignette).
outline.df <- data.frame(x = c(-10, 10, 10, -10),
y = c(-10, -10, 10, 10))
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian, fill = candidate)) +
geom_voronoi(outline = outline.df,
color = "black")
(Leaving out the labels part since it's not critical to the question.)
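For the ggforce version in the question, geom_voronoi_tile() and geom_voronoi_segment() take a bound argument given as c(xmin, xmax, ymin, ymax). The sketch below uses a handful of rows from candidates_list_voronoi and assumes the ggforce and deldir packages are installed; it is an alternative to the ggvoronoi outline approach, not part of the original answer.

```r
library(ggplot2)
library(ggforce)  # geom_voronoi_tile() also needs the deldir package installed

# A few rows taken from candidates_list_voronoi above
cands <- data.frame(
  candidate     = c("Biden", "Sanders", "Warren", "Trump", "Hawkins"),
  economic      = c(5.5, -1.5, 0.5, 8.5, -5),
  authoritarian = c(3.5, -1, 1, 8.5, -3)
)

p <- ggplot(cands, aes(economic, authoritarian)) +
  # bound = c(xmin, xmax, ymin, ymax) clips the tessellation to the full compass
  geom_voronoi_tile(aes(fill = candidate, group = -1L),
                    bound = c(-10, 10, -10, 10)) +
  geom_voronoi_segment(bound = c(-10, 10, -10, 10)) +
  coord_equal(xlim = c(-10, 10), ylim = c(-10, 10))

p
```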

geom_density returns more observations than expected

I am plotting multiple density curves in one plot using geom_density from ggplot2. I am using a data frame of three different variables with 100 observations each. When I plot two of these variables everything seems OK, but for the third one I get unexpected results, with densities over 400.
Here is the code for the data:
ad <- c(-0.0132492114254477, -0.0131566406997403, -0.0124505699056991, -0.0115071942052754, -0.0137753259532595, -0.0123873418067515, -0.013484307776411, -0.0134860926609266, -0.0126213557468908, -0.0125706300396337, -0.0130604154708213, -0.0128227278939455, -0.0115426841601749, -0.0122782889162225, -0.013070774907749, -0.0119269454694547, -0.0116610578105781, -0.0121781467814678, -0.0124721634549679, -0.012449585895859, -0.0119129861965286, -0.0127578461117945, -0.0128044526445264, -0.013716807434741, -0.0112243706437065, -0.0116435861691951, -0.0114757004236708, -0.0127175755090884, -0.0116204482711493, -0.0130377477108104, -0.0137735602022686, -0.0115581604482711, -0.012729930299303, -0.0112369577695777, -0.0109428317616508, -0.0117127921279212, -0.0115321825884927, -0.0119841820418205, -0.0130280606806068, -0.0135132485991527, -0.0115461937952712, -0.0119339866065326, -0.011019811398114, -0.0129747054803881, -0.0121079158124913, -0.0128866529998634, -0.0121608692086921, -0.0114331529315293, -0.0119070302036353, -0.0119004100041, -0.0117581221812217, -0.011107114937816, -0.0131571764384311, -0.0141545086784201, -0.0100181331146644, -0.0119012190788575, -0.0115824982916497, -0.0113907448407818, -0.0133925816591499, -0.0127234057673909, -0.0131873199398661, -0.0132453409867432, -0.010473172065054, -0.0122787289872899, -0.0118153122864562, -0.0110454803881372, -0.0126237939046056, -0.012450955309553, -0.0121033155664889, -0.0115688861555282, -0.0143594615279486, -0.0119171873718737, -0.0123140139401394, -0.0131844881782151, -0.0107496569632364, -0.0126211343446768, -0.0115844608446084, -0.0114007844745114, -0.0128332786661199, -0.0128161158944922, -0.0114647013803472, -0.011756602432691, -0.0128521142544759, -0.0108213858138581, -0.0125040645073117, -0.0124875495421622, -0.0117613284132842, -0.0127021347546809, -0.0118033675003416, -0.0119659368593686, -0.0116807571409046, -0.0125886674866749, -0.0134783763837637, -0.0127761268279349, -0.0131142927429275, 
-0.0119841902419024, -0.0124082930162635, -0.0117776711767118, -0.0103475632089655, -0.0117088369550362)
jv <- c(-0.0482115384615385,0.0157269230769231,-0.0738038461538462,0.0211679487179487,-0.0435153846153846,-0.123296153846154,-0.0276717948717949,0.0533141025641026,0.0181576923076923,0.0129294871794872,-0.0384320512820513,0.0192589743589744,-0.0173948717948718,-0.0714230769230769,-0.0332628205128205,-0.0706025641025641,0.0366705128205128,0.0291115384615385,-0.0759076923076923,0.00654615384615385,-0.00717435897435898,-0.0177871794871795,0.101819230769231,0.0550935897435897,0.0267064102564103,-0.0546858974358974,-0.0297051282051282,-0.00357179487179487,-0.0270423076923077,-0.0272679487179487,0.0187871794871795,-0.0283602564102564,-0.0277012820512821,-0.105816666666667,0.0205679487179487,-0.0592487179487179,0.0306692307692308,-0.0260294871794872,0.00484615384615385,0.00461666666666667,-0.00527307692307692,-0.0263,-0.0303576923076923,0.0370576923076923,-0.0291346153846154,-0.0259294871794872,-0.0230320512820513,-0.0300089743589744,-0.0328589743589744,0.000247435897435898,-0.0256371794871795,-0.00738333333333333,-0.00796410256410257,0.00740000000000001,0.0251282051282051,-0.0435948717948718,0.0045474358974359,-0.0328589743589744,-0.028224358974359,-0.0188525641025641,-0.0164871794871795,-0.0456153846153846,-0.0882666666666667,0.0340987179487179,-0.0272166666666667,0.0326153846153846,-0.0682730769230769,-0.0203346153846154,-0.0712448717948718,0.0139166666666667,-0.00764487179487179,0.0173282051282051,-0.0299807692307692,0.0117282051282051,0.0266089743589744,-0.0869025641025641,-0.0227051282051282,0.053675641025641,0.0453115384615385,-0.00631794871794872,-0.0243923076923077,0.000192307692307693,-0.0350705128205128,-0.0226307692307692,0.019925641025641,-0.0162,-0.00284615384615385,0.0322615384615385,-0.024424358974359,-0.0704871794871795,-0.00747564102564103,-0.0441782051282051,0.0897589743589744,-0.00944871794871795,0.0320948717948718,-0.00680512820512821,-0.0837705128205128,-0.0299435897435897,-0.0639474358974359,0.0137384615384615)
all <- c(-0.0307303749434931,0.0012851411885914,-0.0431272080297726,0.00483037725633668,-0.0286453552843221,-0.0678417478264527,-0.0205780513241029,0.019914004951588,0.00276816828040077,0.000179428569926724,-0.0257462333764363,0.00321812323251441,-0.0144687779775233,-0.0418506829196497,-0.0231667977102848,-0.0412647547860094,0.0125047275049673,0.00846669584003534,-0.0441899278813301,-0.0029517160248526,-0.00954367258544381,-0.015272512799487,0.0445073890623522,0.0206883911544244,0.00774101980635189,-0.0331647418025463,-0.0205904143143995,-0.00814468519044164,-0.0193313779817285,-0.0201528482143795,0.00250680964245544,-0.0199592084292638,-0.0202156061752925,-0.0585268122181222,0.00481255847814894,-0.0354807550383196,0.00956852409036905,-0.0190068346106538,-0.00409095341722648,-0.00444829096624301,-0.00840963535917408,-0.0191169933032663,-0.0206887518529032,0.0120414934136521,-0.0206212655985533,-0.0194080700896753,-0.0175964602453717,-0.0207210636452518,-0.0223830022813048,-0.00582648705333207,-0.0186976508342006,-0.00924522413557467,-0.0105606395012668,-0.00337725433921004,0.00755503600677036,-0.0277480454368647,-0.0035175311971069,-0.0221248595998781,-0.0208084703167544,-0.0157879849349775,-0.0148372497135228,-0.0294303628010639,-0.0493699193658603,0.010909994480714,-0.0195159894765614,0.0107849521136237,-0.0404484354138413,-0.0163927853470842,-0.0416740936806804,0.00117389025556923,-0.0110021666614102,0.00270550887816572,-0.0211473915854543,-0.000728141525005001,0.007929658697869,-0.0497618492236205,-0.0171447945248683,0.0211374282755648,0.0162391298977093,-0.00956703230622048,-0.0179285045363275,-0.00578214737019165,-0.0239613135374943,-0.0167260775223137,0.00371078825916465,-0.0143437747710811,-0.00730374112971901,0.00977970185342875,-0.0181138632373503,-0.041226558173274,-0.00957819908327283,-0.02838343630744,0.0381402989876053,-0.0111124223883264,0.00949028952597218,-0.00939465922351531,-0.0480894029183882,-0.0208606304601508,-0.0371474995532007,0.00101481229171268)
wang <- c(0.2383,-0.0022,-0.1754,0.0201,-0.2122,-0.2433,-0.0417,-0.087,-0.1733,-0.0926,0.0108,0.1159,0.0116,-0.0188,-0.0521,0.0927,-0.029,-0.1382,-0.1039,-0.1547,0.178,0.1101,0.008,-0.0127,0.0442,0.0036,0.0718,0.0529,-0.0873,-0.4223,-0.016,0.1449,0.1787,0.2187,0.132,0.0556,-0.1027,0.2228,-0.305,-0.1352,0.0763,0.0236,0.2504,-0.046,0.1139,-0.1191,0.0101,0.0876,-0.1283,0.0761,0.1044,-0.0583,0.0929,-0.0966,-0.0196,0.1311,0.0329,-0.2297,0.0595,-0.3032,-0.0741,0.2044,0.0406,0.0533,0.0826,0.0035,-0.0818,-0.0747,-0.218,-9e-04,0.0666,-0.0916,-0.0613,-0.2477,-0.0238,0.1959,-0.3,0.069)
# data frames
df <- data.frame(ad = ad, jv = jv, all = all)
wang <- data.frame(wang = wang)
When I plot df$all using the code below, everything is OK: the plot shows the density values I expected, considering 100 observations.
ggplot() +
geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
geom_density(aes(x = all, colour = 'expected adults'), df, size = 1) + #df$all in this line
geom_vline(aes(xintercept = mean(wang$wang),
colour = 'observed mean')) +
scale_colour_manual("", values = c('observed' = "dodgerblue2",
'expected within' = "darkgoldenrod2",
'expected adults' = 'darkolivegreen4',
'observed mean' = 'red')) +
scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) +
scale_y_continuous(expand = c(0, 0))
However, when I use df$ad instead of df$all in the third geom_density, I get a plot with density values much higher than the number of observations:
ggplot() +
geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
geom_density(aes(x = ad, colour = 'expected adults'), df, size = 1) + #df$ad in this line
geom_vline(aes(xintercept = mean(wang$wang),
colour = 'observed mean')) +
scale_colour_manual("", values = c('observed' = "dodgerblue2",
'expected within' = "darkgoldenrod2",
'expected adults' = 'darkolivegreen4',
'observed mean' = 'red')) +
scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) +
scale_y_continuous(expand = c(0, 0))
I then plotted the histogram along with the density plot for df$ad (code below), and this is what I get:
ggplot() +
geom_density(aes(x = ad, colour = 'density'), df) +
geom_histogram(aes(x = ad), df)
Why do I get such high values when plotting the density for df$ad when I only have 100 observations, as shown by the histogram? And why doesn't this happen when plotting df$all?
Thanks
Because 'geom_density' plots density estimates, whereas 'geom_histogram' gives counts of data falling into different bins. The units on the y-axis for 'geom_density' are not counts. See "Can a probability distribution value exceeding 1 be OK?" for more on what the density estimates actually mean.
Your ad variable is simply much less variable, with a standard deviation of 0.00085 relative to a sd of 0.019 for all.
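To make this concrete: a density curve always encloses an area of 1, so the narrower the spread, the taller the peak. For a normal curve the peak height is 1 / (sd * sqrt(2 * pi)). The sketch below plugs in the two standard deviations quoted above, treating the data as roughly normal (an assumption made only for this back-of-the-envelope check).

```r
# Peak height of a normal density curve = 1 / (sd * sqrt(2 * pi))
peak_height <- function(s) dnorm(0, mean = 0, sd = s)

peak_ad  <- peak_height(0.00085)  # about 469
peak_all <- peak_height(0.019)    # about 21

# The area under the narrow curve is still 1, despite the tall peak
area_ad <- integrate(dnorm, lower = -0.01, upper = 0.01, sd = 0.00085)$value
```

A peak of roughly 469 is in line with the "density over 400" seen for ad.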
Look at scaling to 1 in the help (?geom_density) if you want the densities on the same scale for all variables: aes(x = ad, colour = 'expected adults', y = ..scaled..), written y = after_stat(scaled) in ggplot2 3.3.0 and later. Either way, 'geom_density' is displaying your data correctly, although you may want to explore whether histograms would be a better way to display your data's distributions anyway.
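A minimal sketch of the scaled option, using simulated stand-ins (an assumption) with the two spreads quoted above rather than the real ad and all vectors. after_stat(scaled) is the current spelling of ..scaled..; it divides each curve by its own maximum, so every group peaks at 1.

```r
library(ggplot2)

# Simulated stand-ins (assumption) with the spreads quoted above
set.seed(1)
long <- data.frame(
  value    = c(rnorm(100, mean = -0.012, sd = 0.00085),
               rnorm(100, mean = -0.012, sd = 0.019)),
  variable = rep(c("ad", "all"), each = 100)
)

p <- ggplot(long, aes(value, colour = variable)) +
  # after_stat(scaled) divides each curve by its own maximum
  geom_density(aes(y = after_stat(scaled)))

# Each group's curve now peaks at exactly 1
ld <- layer_data(p)
peaks <- tapply(ld$y, ld$group, max)
peaks
```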
