Using ggplot2 in R to generate stacked area graph [duplicate]

This question already has an answer here: Making a stacked area plot using ggplot2 (1 answer). Closed 2 years ago.
I'm trying to create a stacked area graph with R using the package ggplot2 with the below data:
> dput(ec.admin1.ma.tall[1:20,])
structure(list(date = structure(c(18346, 18347, 18348, 18349,
18350, 18351, 18352, 18353, 18362, 18363, 18364, 18365, 18366,
18367, 18354, 18374, 18375, 18376, 18379, 18380), class = "Date"),
locations = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("azuay_newcase_avg",
"bolivar_newcase_avg", "canar_newcase_avg", "carchi_newcase_avg",
"chimborazo_newcase_avg", "cotopaxi_newcase_avg", "eloro_newcase_avg",
"esmeraldas_newcase_avg", "galapagos_newcase_avg", "guayas_newcase_avg",
"imbabura_newcase_avg", "loja_newcase_avg", "losrios_newcase_avg",
"manabi_newcase_avg", "moronasant_newcase_avg", "napo_newcase_avg",
"orellana_newcase_avg", "pastaza_newcase_avg", "pichincha_newcase_avg",
"santaelena_newcase_avg", "stodom_newcase_avg", "sucumbios_newcase_avg",
"tungurahua_newcase_avg", "zamchin_newcase_avg"), class = "factor"),
newcases_ma = c(NA, NA, NA, 5.85714285714286, 8.14285714285714,
13.1428571428571, 12.8571428571429, 16.2857142857143, 15.2857142857143,
16.1428571428571, 14.2857142857143, 12.5714285714286, 18,
19.2857142857143, 39.2857142857143, 38.7142857142857, 53.2857142857143,
53, 52.4285714285714, 46)), row.names = c(NA, 20L), class = "data.frame")
> ec.admin1.ma.tall$locations <- factor(ec.admin1.ma.tall$locations)
> ec.admin1.ma.tall$date <- as.Date(ec.admin1.ma.tall$date, "%m/%d/%Y")
> ggplot(ec.admin1.ma.tall, aes(x = date, y = newcases_ma, fill = locations, group = locations)) + geom_area()
The image I get from this code is: [image: stacked area graph plotting number of new cases by region]
However, from plotting the individual regions, I don't believe my plot is accurate. The code for this plot is below:
ggplot(ec.admin1.ma.tall, aes(x = date, y = newcases_ma, fill = locations)) +
geom_col() +
labs(title = "Moving 7-Day Average for New Cases in Admin 1 Regions - Ecuador",
x = "Date", y = "7-Day Moving Average, New Cases") +
theme(axis.text.x = element_text(angle = 90, size = rel(0.5), vjust = 0.5, hjust=1)) +
facet_wrap(~locations, nrow = 6, scales = "free")
[image: bar graph of new cases over time, split by individual regions]
As you can see from the y-axes of these individual regions, none of the values go above 2000 and not many go above 1000 cases. Would anyone know why there is this discrepancy between the individual regions' data and the stacked area graph?

I just took a quick look, but the plots seem reasonable to me and the code looks OK too. Check out the "guayas" small-multiples plot. The peak values early on reach about 1500, which is about the vertical size of the large green section of your stacked area plot. No single region goes over 2000, but a stacked area plot shows the sum across regions, and the sum of guayas and the other regions certainly goes over 2000 at that particular point on the x-axis.
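The point about sums can be checked numerically. The sketch below uses made-up numbers (the data frame `toy` and its values are hypothetical, not the asker's data) to show that the top edge of a stacked area is the per-date total, which can exceed every individual region.

```r
# Hypothetical toy data: three regions observed on the same three dates
toy <- data.frame(
  date = rep(as.Date("2020-04-01") + 0:2, times = 3),
  locations = rep(c("a", "b", "c"), each = 3),
  newcases_ma = c(1500, 1200, 900, 400, 500, 600, 300, 300, 300)
)

# Height of the stacked area at each date = sum over regions on that date
stack_top <- aggregate(newcases_ma ~ date, data = toy, FUN = sum)

# Largest single-region value vs. largest stacked total
max_single <- max(toy$newcases_ma)
max_total <- max(stack_top$newcases_ma)

# Optional plot, built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(date, newcases_ma, fill = locations)) +
    ggplot2::geom_area()
}
```

Here no region exceeds 1500, yet the stacked total on the first date is 2200, which is the same pattern as in the question.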

Related

Specify end points for different groups when plotting regression output in R

I'm hoping to get some help with presenting regression outputs for my Masters thesis. I am assessing the impacts of elephants on woody vegetation, particularly in relation to artificial waterholes. In addition to generally declining with distance from waterholes, the impacts differ substantially between the two vegetation types involved.
I've figured out what seems to me a satisfactory way of plotting this using visreg. In the model output shown below, both distance to waterhole and veg type explained damage, hence my attempt to show both. However, the issue is that I only have samples at the furthest distances from waterholes (x-axis) for the red vegetation type. As you can see, the regression line for the blue veg type extends beyond the last points for this vegetation type. Is there any way I can get the blue line to stop at a smaller distance from the waterhole (x-axis value) than the red line to avoid this?
See code for the model and plot below the visreg plot.
Sample data and code
> dput(vegdata[21:52, c(4,7,33)])
structure(list(distance = c(207L, 202L, 501L, 502L, 1001L, 1004L,
2010L, 1997L, 4003L, 3998L, 202L, 194L, 499L, 494L, 1004L, 1000L,
2008L, 1993L, 4008L, 3998L, 493L, 992L, 1941L, 2525L, 485L, 978L,
1941L, 3024L, 495L, 978L, 1977L, 2952L), vegtype = structure(c(1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("teak",
"term"), class = "factor"), toedl = c(35.48031025, 47.30482718,
25.16709533, 22.29360164, 17.6546533, 12.81605101, 20.34136734,
18.45809334, 11.3578081, 3.490830751, 60.54870317, 44.9863128,
18.81010698, 20.4777188, 30.36994386, 18.7417214, 21.52247156,
18.29685939, 30.26217664, 8.945486104, 43.95749178, 43.54799495,
44.42693993, 50.06207783, 48.05538594, 35.31220933, 52.37339094,
40.51569938, 41.45677007, 58.86629306, 37.80203313, 46.35633342
)), row.names = 21:52, class = "data.frame")
m1<-lm(toedl~vegtype+distance, data=vegdata)
summary(m1)
library(visreg)
visreg(m1, 'distance', by='vegtype', overlay=TRUE, gg=TRUE, points=list(size=2.5), ylab='% old elephant damage', xlab='distance from waterhole')
Regarding the comments about a reproducible example: you can just make a small data frame with representative data, as below. As a general comment, you should also avoid giving your variables the names of base functions such as 'all'.
I'm not sure whether it's possible to use visreg to do what you want, but you can extract the information from your model using predict, then use ggplot to plot it, which may be preferable because ggplot is really good for customizing plots.
library(ggplot2)
library(visreg)
# Create reproducible data example
allData <- data.frame(vegtype = rep(c("t1", "t2"), each = 10),
oedl = c(seq(from = 35, to = 20, length.out = 10),
seq(from = 20, to = 5, length.out = 10)),
sexactd = c(seq(from = -1, to = 1, length.out = 10),
seq(from = -1, to = 2, length.out = 10)))
# Make linear model
oedl6 <- lm(formula = oedl ~ sexactd + vegtype, data = allData)
# Predict the data using the linear model
odelPred <- cbind(allData, predict(oedl6, interval = 'confidence'))
ggplot(odelPred, aes(sexactd, oedl, color = vegtype, fill = vegtype)) +
geom_point() + geom_line(aes(sexactd, fit)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3)
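The lines above stop where each group's data stop because predict() without newdata returns fitted values only at the observed x values. If a denser, smoother line is wanted, one option is to predict on a grid built separately for each group from that group's own x range. This is a sketch reusing the names allData and oedl6 from the answer; `grid` and `ranges` are illustrative names introduced here.

```r
# Same toy data and model as in the answer above
allData <- data.frame(
  vegtype = rep(c("t1", "t2"), each = 10),
  oedl = c(seq(35, 20, length.out = 10), seq(20, 5, length.out = 10)),
  sexactd = c(seq(-1, 1, length.out = 10), seq(-1, 2, length.out = 10))
)
oedl6 <- lm(oedl ~ sexactd + vegtype, data = allData)

# One prediction grid per group, spanning only that group's observed x range
grid <- do.call(rbind, lapply(split(allData, allData$vegtype), function(d) {
  data.frame(
    vegtype = d$vegtype[1],
    sexactd = seq(min(d$sexactd), max(d$sexactd), length.out = 50)
  )
}))

# Fitted values and confidence limits (columns fit, lwr, upr) on the grid
grid <- cbind(grid, predict(oedl6, newdata = grid, interval = "confidence"))

# Largest x value covered by each group's line
ranges <- tapply(grid$sexactd, grid$vegtype, max)

# Optional plot, built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(allData, aes(sexactd, oedl, color = vegtype)) +
    geom_point() +
    geom_line(data = grid, aes(y = fit)) +
    geom_ribbon(data = grid,
                aes(y = fit, ymin = lwr, ymax = upr, fill = vegtype),
                alpha = 0.3, color = NA)
}
```

Each group's line then ends at that group's last observed x value (1 for t1 and 2 for t2 in this toy data).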
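MR Macarthur's solution is great and deserves to be the accepted answer. Visualising a multiple regression model with several predictors in a two-dimensional graph is difficult. Basically, you are limited to one predictor on the x-axis, and you can add a second one (in your case vegtype) as a grouping. One can simply use geom_smooth for this. Note that geom_smooth fits a separate line for each group, which corresponds to a model with an interaction, and by default it draws each line only over the range of that group's data.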
Using your data:
library(tidyverse)
ggplot(vegdata, aes(distance, toedl, color = vegtype)) +
geom_point() +
geom_smooth(method = 'lm')
Created on 2019-12-13 by the reprex package (v0.3.0)

R: ggplot2 multiple regression lines grouped by variable

I have a dataframe (sample below) with 3 columns. My goal is to have the variable "Return" on the y-axis and "BetaRealized" on the x-axis. Based on that, I would like to have two regression lines grouped by "SML" e.g. one regression line for the two "Theoretical" values and one for the 10 "Empirical" values. Preferably I would like to use ggplot2.
I've looked through several other questions but I wasn't able to find one that fits my case. As I am very new to R, I would greatly appreciate any help. Feel free to help me improve my question for future users if necessary.
Reproducible data sample:
structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L))
Following AntoniosK's comment, it seems the solution is to use geom_smooth with a color aesthetic in the following manner. First, turn your sample data into a data frame:
df<-data.frame(structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L)))
Then call ggplot like this:
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+geom_smooth(method=lm, se=FALSE)
The output will look like this: [image: graph]
Additionally, you can add the regression equation using the package ggpubr:
library(ggpubr)
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+stat_smooth(method=lm, se=FALSE)+
stat_regline_equation()
Finally, depending on your objective, it may be useful to use facet_wrap to separate the categories:
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+
stat_smooth(method=lm, se=FALSE)+ facet_wrap(~SML)+
stat_regline_equation()
The image will look like this: [image: graph2]
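With color = SML mapped in ggplot(), geom_smooth(method = lm) fits one regression per level of SML. As a rough check of what the two lines represent, the same fits can be reproduced with lm() on each subset. This is a sketch only: the data frame below is the sample data rounded to a few digits so that the block runs on its own, and `fits` and `coefs` are illustrative names.

```r
# Sample data from the question, rounded for brevity (an approximation)
df <- data.frame(
  SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
  Return = c(0.00136, 0.00327, 0.00403, 0.00515, 0.00492, 0.00501,
             0.00486, 0.00577, 0.00579, 0.00704, 0.00472, 0),
  BetaRealized = c(0.426, 0.577, 0.684, 0.764, 0.834, 0.903,
                   0.976, 1.065, 1.194, 1.509, 0.893, 0)
)

# One linear model per SML group, mirroring the per-group lines in the plot
fits <- lapply(split(df, df$SML), function(d) lm(Return ~ BetaRealized, data = d))

# Matrix of intercepts and slopes, one row per group
coefs <- t(sapply(fits, coef))
print(coefs)
```

Because the "Theoretical" group has only two points, one of them at the origin, its line passes through (0, 0) with slope Return / BetaRealized of the other point.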

R stackedBar chart

Suppose this is my dataset:
Surgery Surv_Prob Group
CV 0.5113 Diabetic
Hip 0.6619 Diabetic
Knee 0.6665 Diabetic
QFox 0.7054 Diabetic
CV 0.5113 Non-Diabetic
Hip 0.6629 Non-Diabetic
Knee 0.6744 Non-Diabetic
QFox 0.7073 Non-Diabetic
How do I plot a stacked bar plot like the one below?
Please note that the values are already cumulative, so the plot should show only a small increase from CV to Hip (delta = 0.6619 - 0.5113 = 0.1506).
The order should be CV -> Hip -> Knee -> QFox.
There may be a way to plot the cumulative values directly. One alternative is to recover the per-surgery increments (the first value, then the successive differences) and plot those as a stacked bar, ordering the Surgery values with factor levels. For the levels I used rev(unique(Surgery)) for convenience, since you want the reverse of the order in which they appear in the dataset. For more complex cases you might need to specify the levels manually.
library(tidyverse)
df %>%
group_by(Group) %>%
mutate(Surv_Prob1 = c(Surv_Prob[1], diff(Surv_Prob)),
Surgery = factor(Surgery, levels = rev(unique(Surgery)))) %>%
ggplot() + aes(Group, Surv_Prob1, fill = Surgery, label = Surv_Prob) +
geom_bar(stat = "identity") +
geom_text(size = 3, position = position_stack(vjust = 0.5))
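The key step in the pipeline is c(Surv_Prob[1], diff(Surv_Prob)), which turns cumulative values into the increments that a stacked bar needs. A small worked check in base R, using the Diabetic values from the question (`surv_cum` and `increments` are illustrative names):

```r
# Cumulative survival probabilities for the Diabetic group (from the question)
surv_cum <- c(CV = 0.5113, Hip = 0.6619, Knee = 0.6665, QFox = 0.7054)

# First value, then successive differences: the height of each stacked segment
increments <- c(surv_cum[1], diff(surv_cum))
print(increments)

# Stacking the segments again recovers the cumulative values
restacked <- cumsum(increments)
print(restacked)
```

The Hip segment comes out as 0.6619 - 0.5113 = 0.1506, matching the delta mentioned in the question.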
data
df <- structure(list(Surgery = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L), .Label = c("CV", "Hip", "Knee", "QFox"), class = "factor"),
Surv_Prob = c(0.5113, 0.6619, 0.6665, 0.7054, 0.5113, 0.6629,
0.6744, 0.7073), Group = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("Diabetic", "Non-Diabetic"), class =
"factor")), class = "data.frame", row.names = c(NA, -8L))

Change x axis origin to a value (not zero) in ggplot2

I am working on generating a tornado plot in R. I am using ggplot2 package with code like the following:
dat <- structure(list(variable = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("# of nodes needed",
"# of nodes owned", "cost per node"), class = "factor"), Level = structure(c(2L,
2L, 2L, 1L, 1L, 1L), .Label = c("high", "low"), class = "factor"),
value = c(-275, -550, -50, 825, 275, 450)), .Names = c("variable",
"Level", "value"), row.names = c(NA, -6L), class = "data.frame")
ggplot(dat, aes(fill=Level,variable,value )) +
geom_bar(position = 'identity',stat = 'identity') + coord_flip()
I am curious as to how to change x-axis origin. Right now, the origin is automatically set to zero, and I want to be able to change it to a variable specified numeric value.
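Not sure if you are still looking for an answer, but I was just solving a similar problem. I used limits and expand in a continuous scale. In your plot, value is mapped to y and then flipped by coord_flip(), so the scale to set is scale_y_continuous (variable is discrete, so scale_x_continuous would fail).
So I guess for you it would look something like this:
ggplot(dat, aes(fill=Level,variable,value )) +
geom_bar(position = 'identity',stat = 'identity') +
scale_y_continuous(limits = c(-600, 900), expand = c(0, 0)) +
coord_flip()
Replace limits = c(-600, 900) with whatever you want the limits of the value axis to be, keeping them wide enough to contain zero and every bar, since bars that fall outside the limits are dropped. This means you have to set the limits manually, but it is the best workaround I came up with for the same problem.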
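A different workaround for moving the origin of the bars themselves is to subtract the desired origin from the values before plotting and add it back in the axis labels. This is a sketch, not the answerer's method: `origin`, `shifted` and `relabel` are illustrative names, and the origin of 100 is an arbitrary assumption.

```r
# Data from the question, rebuilt as a plain data frame
dat <- data.frame(
  variable = rep(c("# of nodes needed", "# of nodes owned", "cost per node"), 2),
  Level = rep(c("low", "high"), each = 3),
  value = c(-275, -550, -50, 825, 275, 450)
)

origin <- 100                        # hypothetical origin for the bars
dat$shifted <- dat$value - origin    # bars are drawn relative to the origin

# Axis labels add the origin back, so the axis still reads in original units
relabel <- function(x) x + origin

# Optional plot, built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(variable, shifted, fill = Level)) +
    geom_bar(position = "identity", stat = "identity") +
    scale_y_continuous(labels = relabel) +
    coord_flip()
}
```

Bars still start at zero on the shifted scale, but that position is labelled with the value of origin, so they appear to grow from the chosen origin.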

How to ggplot two groups of income-segment populations and values

I have a data frame which has two types of 'groups,' the densities of which I would like to overlay on the same graph.
Using ggplot, I tried to graph the density with the following two lines of code:
full$group <- factor(full$group)
ggplot(full, aes(x=income, fill=group)) + geom_density()
The issue with this is that it does not take the frequency variable (freq) into account and simply calculates the frequency itself. That is a problem because there is exactly one row for every income-group combination.
I believe I have two options, each of which has a question:
a) Should I plot the graph using the way the data is currently formatted? If so, how would I do that?
b) Should I reformat the data to make the frequency of each group/income combination equivalent to the freq variable assigned to it? If so, how would I do that?
This is the kind of graph I would like, where "income" = "rating" and "group" = "cond":
dput of 'full':
full <- structure(list(income = c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05, 10000, 19000,29000, 39000, 49000, 75000, 99000, 1e+05),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("one", "two"), class = "factor"),
freq = c(1237, 1791, 743, 291, 256, 212, 29, 11, 921, 1512, 614, 301, 209, 223, 48, 1)), .Names = c("income", "group", "freq"),
row.names = c(NA, 16L), class = "data.frame")
You can repeat the observations by their frequency with
ggplot(full[rep(1:nrow(full), full$freq),]) +
geom_density(aes(x=income, fill=group), color="black", alpha=.75, adjust=4)
Of course, with your data this produces a pretty lousy plot.
When estimating a density, your data should be observations from a continuous distribution. Here you really have a discrete distribution with repeated observations (in a true continuous distribution, the probability of seeing any value more than once is 0).
You could try to smooth this curve by setting the adjust= parameter to a number >1, (like 3 or 4). But really, your input data is just not in an appropriate form for a density plot. A bar plot would be a better choice. Maybe something like
ggplot(full, aes(as.factor(income), freq, fill=group)) +
geom_bar(stat="identity", position="dodge")
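The row-repetition trick used for the density plot can be checked in base R: indexing with rep(seq_len(nrow(full)), full$freq) repeats each row freq times, so the expanded data frame has one row per observation. A sketch using the data from the question (`expanded` and `rows_per_group` are illustrative names):

```r
# Data from the question
full <- data.frame(
  income = rep(c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05), 2),
  group = factor(rep(c("one", "two"), each = 8)),
  freq = c(1237, 1791, 743, 291, 256, 212, 29, 11,
           921, 1512, 614, 301, 209, 223, 48, 1)
)

# Repeat row i exactly freq[i] times
expanded <- full[rep(seq_len(nrow(full)), full$freq), ]

# One row per observation: counts per group match the summed frequencies
rows_per_group <- table(expanded$group)
freq_per_group <- tapply(full$freq, full$group, sum)
print(rows_per_group)
print(freq_per_group)
```

seq_len(nrow(full)) is used instead of 1:nrow(full) only because it also behaves correctly for a data frame with zero rows.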

Resources