Warning, I am brand-new to R!
I have caught the R bug and am having a play with the possibilities, but I am getting very lost. I want to colour segments of a density plot using a '>' condition to indicate bins. In my head it looks like:
...but not quartile or % change dependent.
My data show x = duration (number of days) and y = frequency. I would like the plot to change colour at 3-month intervals up to 12 months, with one colour after that (using working days, i.e. 63 = 3 months).
I have had a go, but I'm really not sure where to start!
ggplot(df3, aes(x=Investigation.Duration))+
geom_density(fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>0],
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>63], color = "white",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>127], color = "light Grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>190], color = "medium grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>253], color = "dark grey",
fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>506], color = "black")+
ggtitle ("Investigation duration distribution in 'Wales' complexity sample")+
geom_text(aes(x=175, label=paste0("Mean, 136"), y=0.0053))+
geom_vline(xintercept = c(136.5), color = "red")+
geom_text(aes(x=80, label=paste0("Median, 129"), y=0.0053))+
geom_vline(xintercept = c(129.5), color = "blue")
Any really simple help much appreciated.
Unfortunately, you can't do this directly with geom_density, as "under the hood" it is built with a single polygon, and a polygon can only have a single fill. The only way to do this is to have multiple polygons, and you need to build them yourself.
Fortunately, this is easier than it sounds.
There was no sample data in the question, so we will create a plausible distribution with the same median and mean:
# Simulate data
set.seed(69)
df3 <- data.frame(Investigation.Duration = rgamma(1000, 5, 1/27.7))
round(median(df3$Investigation.Duration))
#> [1] 129
round(mean(df3$Investigation.Duration))
#> [1] 136
# Get the density as a data frame
dens <- density(df3$Investigation.Duration)
dens <- data.frame(x = dens$x, y = dens$y)
# Exclude the artefactual times below zero
dens <- dens[dens$x > 0, ]
# Split into bands of 3 months and group > 12 months together
dens$band <- dens$x %/% 63
dens$band[dens$band > 3] <- 4
# This is the complex bit. For each band we want to add a point on
# the x axis at the upper and lower time limits:
dens <- do.call("rbind", lapply(split(dens, dens$band), function(df) {
  df <- rbind(df[1,], df, df[nrow(df),])
  df$y[c(1, nrow(df))] <- 0
  df
}))
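To see what that closing step does, here is a minimal sketch on a made-up five-point curve. The toy data frame and the helper name close_band are illustrative assumptions, not part of the original code. Each band gets a copy of its first and last row dropped to y = 0, so every polygon starts and ends on the x axis:

```r
# Toy curve: five points falling into two bands (values invented for illustration)
toy <- data.frame(x = c(10, 40, 70, 100, 120),
                  y = c(0.2, 0.5, 0.6, 0.4, 0.1))
toy$band <- toy$x %/% 63

# Hypothetical helper: duplicate the first and last row of one band
# and set the copies to y = 0 so the polygon closes on the x axis
close_band <- function(df) {
  df <- rbind(df[1, ], df, df[nrow(df), ])
  df$y[c(1, nrow(df))] <- 0
  df
}

closed <- do.call("rbind", lapply(split(toy, toy$band), close_band))
print(closed)
```

Band 0 goes from two rows to four and band 1 from three rows to five, with the first and last row of each band sitting at y = 0.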
Now we have the polygons, it's just a case of drawing and labelling appropriately:
library(ggplot2)
ggplot(dens, aes(x, y)) +
geom_polygon(aes(fill = factor(band), color = factor(band))) +
theme_minimal() +
scale_fill_manual(values = c("#003f5c", "#58508d", "#bc5090",
"#ff6361", "#ffa600"),
name = "Time",
labels = c("Less than 3 months",
"3 to 6 months",
"6 to 9 months",
"9 to 12 months",
"Over 12 months")) +
scale_colour_manual(values = c("#003f5c", "#58508d", "#bc5090",
"#ff6361", "#ffa600"),
guide = guide_none()) +
labs(x = "Days since investigation started", y = "Density") +
ggtitle ("Investigation duration distribution in 'Wales' complexity sample") +
geom_text(aes(x = 175, label = paste0("Mean, 136"), y = 0.0053),
check_overlap = TRUE)+
geom_vline(xintercept = c(136.5), linetype = 2)+
geom_text(aes(x = 80, label = paste0("Median, 129"), y = 0.0053),
check_overlap = TRUE)+
geom_vline(xintercept = c(129.5), linetype = 2)
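As a variation on the banding step, cut() can assign the bands and their labels in one go. This is only a sketch, assuming the same 63-working-day quarters as above; the example durations are made up for illustration:

```r
# Example durations in working days (invented for illustration)
days <- c(5, 62, 63, 130, 200, 252, 400)

# Left-closed intervals [0, 63), [63, 126), ... give the same
# grouping as days %/% 63 with everything from 252 up lumped together
band <- cut(days,
            breaks = c(0, 63, 126, 189, 252, Inf),
            labels = c("Less than 3 months", "3 to 6 months",
                       "6 to 9 months", "9 to 12 months",
                       "Over 12 months"),
            right = FALSE)

table(band)
```

Because the result is already a labelled factor, it could be mapped to fill directly, in which case the labels argument of scale_fill_manual would no longer be needed.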
The task I have set for myself is to make a Voronoi diagram of the www.politicalcompass.org chart of the currently running Democratic candidates. I have coded their positions and combined points that overlap into single observations. I have tried two separate ggplot extensions that create Voronoi diagrams.
The problem is that politicalcompass.org's chart goes from -10 to +10 on both axes. When I try to plot the voronoi diagrams, they only extend to their original limits and not to the full range of -10 to 10 that I intend to plot. Examples and code below:
https://github.com/McCartneyAC/average_of_polls/blob/master/stupid_voronoi_one.png?raw=true
https://github.com/McCartneyAC/average_of_polls/blob/master/stupid_voronoi_two.png?raw=true
library(tidyverse)
library(ggrepel)
candidates_list_voronoi <- tribble(
~candidate,~party,~economic,~authoritarian,
"Bennet","Democratic",8.5,6,
"Biden","Democratic",5.5,3.5,
"Booker","Democratic",4,2.5,
"Buttigieg/Castro","Democratic",6.5,4.5,
"Delaney","Democratic",4,3.5,
"Gabbard","Democratic",-1.5,-1.5,
"Harris","Democratic",5,4,
"Bullock/Klobuchar","Democratic",5,5,
"Sanders","Democratic",-1.5,-1,
"Sestak","Democratic",5.5,2,
"Warren","Democratic",0.5,1,
"Williamson","Democratic",2,-1.5,
"Yang","Democratic",7,1,
"Hawkins","Green",-5,-3,
"Vohra","Libertarian",10,1.5,
"Corker/Pence","Republican",10,8.5,
"Hogan","Republican",10,8,
"Kasich","Republican",8,9,
"Trump","Republican",8.5,8.5,
"Weld","Republican",9.5,4.5
)
library(ggvoronoi)
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian, label = candidate, fill = candidate)) +
geom_voronoi(color = "black") +
geom_label_repel(fill = "#FFFFFF") +
scale_x_continuous(limits = c(-10,10))+
scale_y_continuous(limits = c(-10,10))
library(ggforce)
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian)) +
geom_voronoi_tile(aes(fill = candidate, group = -1L)) +
geom_voronoi_segment() +
geom_label_repel(aes(label = candidate)) +
scale_x_continuous(limits = c(-10,10))+
scale_y_continuous(limits = c(-10,10))
You can specify the bounding box with the outline argument of geom_voronoi (see the example in the ggvoronoi vignette).
outline.df <- data.frame(x = c(-10, 10, 10, -10),
y = c(-10, -10, 10, 10))
candidates_list_voronoi %>%
ggplot(aes(economic, authoritarian, fill = candidate)) +
geom_voronoi(outline = outline.df,
color = "black")
(Leaving out the labels part since it's not critical to the question.)
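For the ggforce version in the question, a similar effect should be possible through the bound argument of geom_voronoi_tile() and geom_voronoi_segment(), which accepts a bounding box given as c(xmin, xmax, ymin, ymax). A minimal sketch, assuming ggforce is installed along with the deldir package it needs for drawing Voronoi tiles, and using a shortened, illustrative subset of the candidate table:

```r
library(ggplot2)
library(ggforce)

# Shortened subset of the candidate data, for illustration only
candidates <- data.frame(
  candidate     = c("Biden", "Sanders", "Warren", "Yang", "Hawkins", "Trump"),
  economic      = c(5.5, -1.5, 0.5, 7, -5, 8.5),
  authoritarian = c(3.5, -1, 1, 1, -3, 8.5)
)

# Bounding box in the order xmin, xmax, ymin, ymax
box <- c(-10, 10, -10, 10)

p <- ggplot(candidates, aes(economic, authoritarian)) +
  geom_voronoi_tile(aes(fill = candidate, group = -1L), bound = box) +
  geom_voronoi_segment(bound = box) +
  geom_point() +
  coord_fixed(xlim = c(-10, 10), ylim = c(-10, 10), expand = FALSE)

# print(p) draws the plot
```

Using coord_fixed() rather than scale limits keeps the two axes on the same scale without dropping any tile vertices that sit exactly on the boundary.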
I am plotting multiple density curves in one plot using geom_density from ggplot2. I am using a data frame of three different variables with 100 observations each. When I plot two of these variables everything seems OK, but for the third one I get unexpected results, with density values over 400.
Here is the code for the data:
ad <- c(-0.0132492114254477, -0.0131566406997403, -0.0124505699056991, -0.0115071942052754, -0.0137753259532595, -0.0123873418067515, -0.013484307776411, -0.0134860926609266, -0.0126213557468908, -0.0125706300396337, -0.0130604154708213, -0.0128227278939455, -0.0115426841601749, -0.0122782889162225, -0.013070774907749, -0.0119269454694547, -0.0116610578105781, -0.0121781467814678, -0.0124721634549679, -0.012449585895859, -0.0119129861965286, -0.0127578461117945, -0.0128044526445264, -0.013716807434741, -0.0112243706437065, -0.0116435861691951, -0.0114757004236708, -0.0127175755090884, -0.0116204482711493, -0.0130377477108104, -0.0137735602022686, -0.0115581604482711, -0.012729930299303, -0.0112369577695777, -0.0109428317616508, -0.0117127921279212, -0.0115321825884927, -0.0119841820418205, -0.0130280606806068, -0.0135132485991527, -0.0115461937952712, -0.0119339866065326, -0.011019811398114, -0.0129747054803881, -0.0121079158124913, -0.0128866529998634, -0.0121608692086921, -0.0114331529315293, -0.0119070302036353, -0.0119004100041, -0.0117581221812217, -0.011107114937816, -0.0131571764384311, -0.0141545086784201, -0.0100181331146644, -0.0119012190788575, -0.0115824982916497, -0.0113907448407818, -0.0133925816591499, -0.0127234057673909, -0.0131873199398661, -0.0132453409867432, -0.010473172065054, -0.0122787289872899, -0.0118153122864562, -0.0110454803881372, -0.0126237939046056, -0.012450955309553, -0.0121033155664889, -0.0115688861555282, -0.0143594615279486, -0.0119171873718737, -0.0123140139401394, -0.0131844881782151, -0.0107496569632364, -0.0126211343446768, -0.0115844608446084, -0.0114007844745114, -0.0128332786661199, -0.0128161158944922, -0.0114647013803472, -0.011756602432691, -0.0128521142544759, -0.0108213858138581, -0.0125040645073117, -0.0124875495421622, -0.0117613284132842, -0.0127021347546809, -0.0118033675003416, -0.0119659368593686, -0.0116807571409046, -0.0125886674866749, -0.0134783763837637, -0.0127761268279349, -0.0131142927429275, 
-0.0119841902419024, -0.0124082930162635, -0.0117776711767118, -0.0103475632089655, -0.0117088369550362)
jv <- c(-0.0482115384615385,0.0157269230769231,-0.0738038461538462,0.0211679487179487,-0.0435153846153846,-0.123296153846154,-0.0276717948717949,0.0533141025641026,0.0181576923076923,0.0129294871794872,-0.0384320512820513,0.0192589743589744,-0.0173948717948718,-0.0714230769230769,-0.0332628205128205,-0.0706025641025641,0.0366705128205128,0.0291115384615385,-0.0759076923076923,0.00654615384615385,-0.00717435897435898,-0.0177871794871795,0.101819230769231,0.0550935897435897,0.0267064102564103,-0.0546858974358974,-0.0297051282051282,-0.00357179487179487,-0.0270423076923077,-0.0272679487179487,0.0187871794871795,-0.0283602564102564,-0.0277012820512821,-0.105816666666667,0.0205679487179487,-0.0592487179487179,0.0306692307692308,-0.0260294871794872,0.00484615384615385,0.00461666666666667,-0.00527307692307692,-0.0263,-0.0303576923076923,0.0370576923076923,-0.0291346153846154,-0.0259294871794872,-0.0230320512820513,-0.0300089743589744,-0.0328589743589744,0.000247435897435898,-0.0256371794871795,-0.00738333333333333,-0.00796410256410257,0.00740000000000001,0.0251282051282051,-0.0435948717948718,0.0045474358974359,-0.0328589743589744,-0.028224358974359,-0.0188525641025641,-0.0164871794871795,-0.0456153846153846,-0.0882666666666667,0.0340987179487179,-0.0272166666666667,0.0326153846153846,-0.0682730769230769,-0.0203346153846154,-0.0712448717948718,0.0139166666666667,-0.00764487179487179,0.0173282051282051,-0.0299807692307692,0.0117282051282051,0.0266089743589744,-0.0869025641025641,-0.0227051282051282,0.053675641025641,0.0453115384615385,-0.00631794871794872,-0.0243923076923077,0.000192307692307693,-0.0350705128205128,-0.0226307692307692,0.019925641025641,-0.0162,-0.00284615384615385,0.0322615384615385,-0.024424358974359,-0.0704871794871795,-0.00747564102564103,-0.0441782051282051,0.0897589743589744,-0.00944871794871795,0.0320948717948718,-0.00680512820512821,-0.0837705128205128,-0.0299435897435897,-0.0639474358974359,0.0137384615384615)
all <- c(-0.0307303749434931,0.0012851411885914,-0.0431272080297726,0.00483037725633668,-0.0286453552843221,-0.0678417478264527,-0.0205780513241029,0.019914004951588,0.00276816828040077,0.000179428569926724,-0.0257462333764363,0.00321812323251441,-0.0144687779775233,-0.0418506829196497,-0.0231667977102848,-0.0412647547860094,0.0125047275049673,0.00846669584003534,-0.0441899278813301,-0.0029517160248526,-0.00954367258544381,-0.015272512799487,0.0445073890623522,0.0206883911544244,0.00774101980635189,-0.0331647418025463,-0.0205904143143995,-0.00814468519044164,-0.0193313779817285,-0.0201528482143795,0.00250680964245544,-0.0199592084292638,-0.0202156061752925,-0.0585268122181222,0.00481255847814894,-0.0354807550383196,0.00956852409036905,-0.0190068346106538,-0.00409095341722648,-0.00444829096624301,-0.00840963535917408,-0.0191169933032663,-0.0206887518529032,0.0120414934136521,-0.0206212655985533,-0.0194080700896753,-0.0175964602453717,-0.0207210636452518,-0.0223830022813048,-0.00582648705333207,-0.0186976508342006,-0.00924522413557467,-0.0105606395012668,-0.00337725433921004,0.00755503600677036,-0.0277480454368647,-0.0035175311971069,-0.0221248595998781,-0.0208084703167544,-0.0157879849349775,-0.0148372497135228,-0.0294303628010639,-0.0493699193658603,0.010909994480714,-0.0195159894765614,0.0107849521136237,-0.0404484354138413,-0.0163927853470842,-0.0416740936806804,0.00117389025556923,-0.0110021666614102,0.00270550887816572,-0.0211473915854543,-0.000728141525005001,0.007929658697869,-0.0497618492236205,-0.0171447945248683,0.0211374282755648,0.0162391298977093,-0.00956703230622048,-0.0179285045363275,-0.00578214737019165,-0.0239613135374943,-0.0167260775223137,0.00371078825916465,-0.0143437747710811,-0.00730374112971901,0.00977970185342875,-0.0181138632373503,-0.041226558173274,-0.00957819908327283,-0.02838343630744,0.0381402989876053,-0.0111124223883264,0.00949028952597218,-0.00939465922351531,-0.0480894029183882,-0.0208606304601508,-0.0371474995532007,0.00101481229171268)
wang <- c(0.2383,-0.0022,-0.1754,0.0201,-0.2122,-0.2433,-0.0417,-0.087,-0.1733,-0.0926,0.0108,0.1159,0.0116,-0.0188,-0.0521,0.0927,-0.029,-0.1382,-0.1039,-0.1547,0.178,0.1101,0.008,-0.0127,0.0442,0.0036,0.0718,0.0529,-0.0873,-0.4223,-0.016,0.1449,0.1787,0.2187,0.132,0.0556,-0.1027,0.2228,-0.305,-0.1352,0.0763,0.0236,0.2504,-0.046,0.1139,-0.1191,0.0101,0.0876,-0.1283,0.0761,0.1044,-0.0583,0.0929,-0.0966,-0.0196,0.1311,0.0329,-0.2297,0.0595,-0.3032,-0.0741,0.2044,0.0406,0.0533,0.0826,0.0035,-0.0818,-0.0747,-0.218,-9e-04,0.0666,-0.0916,-0.0613,-0.2477,-0.0238,0.1959,-0.3,0.069)
# data frames
df <- data.frame(ad = ad, jv = jv, all = all)
wang <- data.frame(wang = wang)
When I plot df$all using the code below, everything is OK.
This is the plot I get, with the expected density values considering 100 observations.
ggplot() +
geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
geom_density(aes(x = all, colour = 'expected adults'), df, size = 1) + #df$all in this line
geom_vline(aes(xintercept = mean(wang$wang),
colour = 'observed mean')) +
scale_colour_manual("", values = c('observed' = "dodgerblue2",
'expected within' = "darkgoldenrod2",
'expected adults' = 'darkolivegreen4',
'observed mean' = 'red')) +
scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) +
scale_y_continuous(expand = c(0, 0))
However, when using df$ad instead of df$all in the third geom_density, I get this plot, with density values much higher than the number of observations:
ggplot() +
geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
geom_density(aes(x = ad, colour = 'expected adults'), df, size = 1) + #df$ad in this line
geom_vline(aes(xintercept = mean(wang$wang),
colour = 'observed mean')) +
scale_colour_manual("", values = c('observed' = "dodgerblue2",
'expected within' = "darkgoldenrod2",
'expected adults' = 'darkolivegreen4',
'observed mean' = 'red')) +
scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) +
scale_y_continuous(expand = c(0, 0))
I then plotted the histogram along with the density plot for df$ad (code below), and this is what I get:
ggplot() +
geom_density(aes(x = ad, colour = 'density'), df) +
geom_histogram(aes(x = ad), df)
Why do I get such high values when plotting the density for df$ad when I only have 100 observations, as shown by the histogram? And why does this not happen when plotting df$all?
Thanks
Because geom_density plots density estimates, whereas geom_histogram gives counts of data falling into different bins. The units on the y-axis for geom_density are not counts. See "Can a probability distribution value exceeding 1 be OK?" for more info on what the density estimates actually mean.
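Your ad variable is simply much less spread out, with a standard deviation of about 0.00085 compared with about 0.019 for all. Since the area under a density curve must equal 1, a narrower distribution has to be taller.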
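A quick back-of-the-envelope check, assuming the data are roughly normal (an assumption made for illustration, not something the data guarantee): the peak of a normal density is 1 / (sd * sqrt(2 * pi)), so the smaller the spread, the taller the curve. The helper name peak_height is hypothetical.

```r
# Peak height of a normal density with a given standard deviation
peak_height <- function(sd) 1 / (sd * sqrt(2 * pi))

peak_height(0.00085)  # spread quoted for ad: a peak in the hundreds
peak_height(0.019)    # spread quoted for all: a peak of roughly 20

# Same value from dnorm() evaluated at the mean
dnorm(0, mean = 0, sd = 0.00085)
```

A kernel density estimate is smoothed a little, so its peak will usually be somewhat lower than this, but the order of magnitude is the same.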
If you want the densities on the same scale for all variables, look at the scaled computed variable in the help, ?geom_density: aes(x = ad, colour = 'expected adults', y = ..scaled..), or y = after_stat(scaled) in ggplot2 3.3.0 and later. Either way, geom_density is displaying your data correctly, although you may want to explore whether histograms would be a better way to display your data's distributions.
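To make this concrete, here is a small sketch with simulated data: 100 normal draws with a spread similar to ad. The simulated values are an assumption standing in for the question's data. The curve's height is in the hundreds, yet the area under it is still about 1, and dividing by the maximum gives the 0-to-1 version that the scaled variable provides:

```r
set.seed(1)
# Simulated stand-in for a tightly clustered variable such as ad
x <- rnorm(100, mean = -0.012, sd = 0.00085)

d <- density(x)

max(d$y)                           # peak height, far above 1

# Approximate the area under the curve on the evenly spaced grid
area <- sum(d$y) * diff(d$x[1:2])
area                               # close to 1

# Rescale so the peak is exactly 1, as the scaled variable does
scaled <- d$y / max(d$y)
range(scaled)
```

The rescaled curve keeps its shape, so several variables can share one y axis, at the cost of no longer being a true density.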