npdb dataset and displaying results in charts in R

I'm working with the npdb dataset from the UsingR package.
How can I calculate the amount for each year and show it in a pie chart?
How can I determine, for each state, which ID has the maximum amount value?
How can I create a histogram of the amounts?
I was able to calculate some parts of these, but I don't know how to turn them into charts.

Did you have this in mind?
options(scipen = 999)  # print amounts without scientific notation
library(UsingR)
library(dplyr)
library(ggplot2)  # needed for the plots below
data(npdb)
df <- npdb
# Pie chart: average amount per year, shown as each year's share of the total
q1 <- df %>% group_by(year) %>% summarise(Ave_am = mean(amount))
q1 <- q1 %>%
  arrange(desc(year)) %>%
  mutate(prop = Ave_am / sum(Ave_am) * 100) %>%
  mutate(ypos = cumsum(prop) - 0.5 * prop)  # label position at the middle of each slice
ggplot(q1, aes(x = "", y = prop, fill = year)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  theme_void() +
  theme(legend.position = "none") +
  geom_text(aes(y = ypos, label = year), color = "white", size = 6)
# Bar chart: the row (including ID) with the largest amount in each state
q2 <- df %>% group_by(state) %>% slice(which.max(amount))
ggplot(q2, aes(x = state, y = amount)) +
  geom_bar(stat = "identity") +
  ylab("Max amount")
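The histogram from the question is not covered by the code above. Here is a minimal sketch, assuming the same amount column. The simulated log-normal values are only a stand-in so the snippet runs without UsingR, and the log10 x axis is an assumption that the amounts are strongly right-skewed:

```r
library(ggplot2)

# Stand-in data (assumption): with the real data use `df <- npdb` after data(npdb)
set.seed(1)
df_demo <- data.frame(amount = rlnorm(1000, meanlog = 10, sdlog = 1.5))

# Histogram with 30 bins on a log10 x axis
p_hist <- ggplot(df_demo, aes(x = amount)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  scale_x_log10() +
  labs(x = "Amount (log scale)", y = "Count")
p_hist
```

If the amounts turn out not to be skewed, dropping scale_x_log10() gives a plain linear histogram.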

Related

How do I combine two or more values output in LessR PC function for Donut Charts?

I am trying to create a donut chart with the lessR PieChart (PC) function, but I cannot find a way to combine two or more values options in the chart. For instance, I would like the values to show as counts and percentages at the same time, e.g., 10 (2%). However, the values argument seems to allow only % or counts or prop at one time and does not seem to allow combining options. I have tried using c() to combine "input" and "%", but it didn't work. I have also tried paste0, & and |, but nothing seems to work.
Below is an example code to recreate the problem using the lessR package:
# install.packages("lessR")
library(lessR)
# Sample data of the package
data <- rd("Employee")
# Donut chart showing counts
PieChart(Dept, data = data,
         fill = "viridis",
         values = "input",
         main = NULL)
# Donut chart showing percentages
PieChart(Dept, data = data,
         fill = "viridis",
         values = "%",
         main = NULL)
I want my values to show as both input and percentages, e.g., 15 (20%). If anyone can help, I would greatly appreciate it. Thank you!
I would suggest using ggplot2 to create the pie chart. You can use the following code:
library(tidyverse)
data %>%
group_by(Dept) %>%
summarise(value = n()) %>%
na.omit() %>%
ggplot(aes(x="", y=value, fill=Dept)) +
geom_bar(width=1, stat="identity") +
coord_polar("y", start=0) +
xlab("") +
ylab("Value") +
theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) +
geom_text(aes(label = paste0(value, " (", scales::percent(value / sum(value)), ")")), position = position_stack(vjust = 0.5))
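The key step in the code above is building the combined "count (percent)" label. As a rough sketch, here is the same label built in base R; the department counts are hypothetical and only illustrate the string format:

```r
# Hypothetical counts per department (assumption, not read from the Employee data)
counts <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Percentage of the total, rounded to whole percents
pct <- round(100 * counts / sum(counts))

# Combined label in the form "count (percent%)"
labels <- paste0(counts, " (", pct, "%)")
labels
```

The same character vector can be passed to any plotting function that accepts custom labels.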
Comment:
Maybe you want something like this using geom_label_repel:
library(tidyverse)
library(ggrepel)
data %>%
group_by(Dept) %>%
summarise(value = n()) %>%
na.omit() %>%
mutate(csum = rev(cumsum(rev(value))),
pos = value/2 + lead(csum, 1),
pos = if_else(is.na(pos), value/2, pos)) %>%
ggplot(aes(x="", y=value, fill=Dept)) +
geom_bar(width=1, stat="identity") +
coord_polar("y", start=0) +
xlab("") +
ylab("Value") +
theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) +
geom_label_repel(aes(y = pos, label = paste0(value, " (", scales::percent(value / sum(value)), ")", Dept)), size = 4.5, nudge_x = 1, show.legend = FALSE)
Comment: Donut chart
You can create donut charts in ggplot2 using coord_polar and geom_col. The variable hsize determines the size of the hole in the chart. You can use the following code:
hsize <- 1
library(tidyverse)
library(ggrepel)
data %>%
group_by(Dept) %>%
summarise(value = n()) %>%
na.omit() %>%
mutate(csum = rev(cumsum(rev(value))),
pos = value/2 + lead(csum, 1),
pos = if_else(is.na(pos), value/2, pos)) %>%
ggplot(aes(x=hsize, y=value, fill=Dept)) +
geom_col() +
coord_polar(theta = "y") +
xlim(c(0.2, hsize + 0.5)) +
xlab("") +
ylab("Value") +
theme(axis.text.x=element_blank(),axis.ticks.x=element_blank()) +
geom_label_repel(aes(y = pos, label = paste0(value, " (", scales::percent(value / sum(value)), ")", Dept)), size = 4.5, nudge_x = 1, show.legend = FALSE)
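The csum and pos columns in the code above place each label at the midpoint of its slice. A small sketch with made-up slice sizes (an assumption, chosen so the numbers are easy to follow) shows what the three mutate steps compute:

```r
library(dplyr)

# Made-up slice sizes (assumption)
slices <- tibble(Dept = c("A", "B", "C"), value = c(10, 30, 60))

slices_pos <- slices %>%
  mutate(csum = rev(cumsum(rev(value))),              # this slice plus all later ones
         pos  = value / 2 + lead(csum, 1),            # half of this slice + everything after it
         pos  = if_else(is.na(pos), value / 2, pos))  # the last slice has nothing after it
slices_pos
```

ggplot2 stacks the first group on top, so C covers 0 to 60, B covers 60 to 90 and A covers 90 to 100. The pos values 30, 75 and 95 are the midpoints of those ranges.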

Combining Different Types of Graphs Together (R)

I am trying to learn how to combine different types of graphs together in the R programming language. Suppose I have the following data:
library(dplyr)
library(ggplot2)
date= seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
var <- rnorm(731,10,10)
group <- sample( LETTERS[1:4], 731, replace=TRUE, prob=c(0.25, 0.22, 0.25, 0.25) )
data = data.frame(date, var, group)
data$year = as.numeric(format(data$date,'%Y'))
data$year = as.factor(data$year)
I summarized this data making different types of graphs. For example:
1) Pie Charts:
###Pie
Pie_2014 <- data %>% filter((data$year == "2014"))
Pie_2014 %>%
group_by(group) %>%
summarise(n = n())
Pie_2014_graph = ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014")
Pie_2015 <- data %>% filter((data$year == "2015"))
Pie_2015 %>%
group_by(group) %>%
summarise(n = n())
Pie_2015_graph = ggplot(Pie_2015, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2015")
Pie_total = data %>%
group_by(group) %>%
summarise(n = n())
Pie_total_graph = ggplot(data, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart Average")
2) Bar Plots:
Bar_years = data %>%
group_by(year, group) %>%
summarise(mean = mean(var))
Bar_years_plot = ggplot(Bar_years, aes(fill=group, y=mean, x=year)) +
geom_bar(position="dodge", stat="identity") + ggtitle("Bar Plot All Years")
Bar_total = data %>%
group_by(group) %>%
summarise(mean = n())
Bar_total_plot = ggplot(Bar_total, aes(x=group, y=mean, fill=group)) +
geom_bar(stat="identity")+theme_minimal() + ggtitle("Bar Plot Average")
3) Time Series Plots:
New <- data %>%
mutate(date = as.Date(date)) %>%
group_by(group, month = format(date, "%Y-%m")) %>%
summarise( Mean = mean(var, na.rm = TRUE), Count = n())
#Plot
ts_1 <- ggplot(New) +
geom_line(aes(x=month, y=Mean, colour=group,group=1))+
scale_colour_manual(values=c("red","green","blue", "purple"))+
theme(axis.text.x = element_text(angle=90)) + ggtitle("time seres 1")
ts_2 <- ggplot(New) +
geom_line(aes(x=month, y=Count, colour=group,group=1))+
scale_colour_manual(values=c("red","green","blue", "purple"))+
theme(axis.text.x = element_text(angle=90)) + ggtitle("time seres 2")
All these graphs work perfectly. Now I am looking for a better way to present them.
My question: Is it possible to neatly arrange all these graphs into a window using R and ggplot2?
For example:
Row 1: All Pie Charts (Pie_2014_graph, Pie_2015_graph, Pie_total_graph)
Row 2: All Bar Graphs (Bar_years_plot, Bar_total_plot)
Row 3: All Time Series Graphs (ts_1, ts_2)
Right now, I am creating all these graphs individually, pasting them into MS Paint and manually rearranging them.
Something like this?
All help is greatly appreciated.
Thanks
The code you posted above fails because the pie chart code uses the variable n, but the result of the summarise(n = n()) step is never assigned to anything.
You can either pipe the summarised data straight into ggplot or assign the intermediate result first, with something like this:
Pie_2014 <- data %>%
filter((data$year == "2014")) %>%
group_by(group) %>%
summarise(n = n())
Pie_2014_graph = ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014")
Pie_2015 <- data %>%
filter((data$year == "2015")) %>%
group_by(group) %>%
summarise(n = n())
Pie_2015_graph = ggplot(Pie_2015, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2015")
Pie_total = data %>%
group_by(group) %>%
summarise(n = n())
Pie_total_graph = ggplot(Pie_total, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart Average")
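The other option mentioned above, piping the summarised data straight into ggplot, could look roughly like this. It is only a sketch: the data is simulated with the same structure as in the question, and count(group) is used as a shorthand for group_by(group) %>% summarise(n = n()):

```r
library(dplyr)
library(ggplot2)

# Simulated data with the same structure as in the question (values are random)
set.seed(123)
date <- seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by = "day")
data <- data.frame(date,
                   group = sample(LETTERS[1:4], length(date), replace = TRUE))
data$year <- format(data$date, "%Y")

# Filter, summarise and plot in one pipeline, with no intermediate object
Pie_2014_graph <- data %>%
  filter(year == "2014") %>%
  count(group) %>%
  ggplot(aes(x = "", y = n, fill = group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  ggtitle("Pie Chart 2014")
Pie_2014_graph
```

The trade-off is that the summarised table is not kept, so assigning it first is handier if you also want to inspect the counts.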
After that, arranging the subplots together is pretty straightforward with the patchwork package. Something like this will get you close:
# combine plots
# install.packages('patchwork')
library(patchwork)
(Pie_2014_graph | Pie_2015_graph | Pie_total_graph) /
(Bar_years_plot | Bar_total_plot) /
(ts_1 | ts_2)
EDIT: Following a request for a non-patchwork alternative, here is a version to get you started using cowplot:
library(cowplot)
# arrange subplots in rows
top_row <- plot_grid(Pie_2014_graph, Pie_2015_graph, Pie_total_graph, nrow = 1)
middle_row <- plot_grid(Bar_years_plot, Bar_total_plot)
bottom_row <- plot_grid(ts_1, ts_2)
# arrange our new rows into combined plot
p <- plot_grid(top_row, middle_row, bottom_row, nrow = 3)
p

Creating the pie chart according to the dataframe

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
               stringsAsFactors = FALSE)
How can I create a pie chart of the death, confirmed and recovered fields in this data set by region?
This is a good fit for the tidyverse:
library(tidyverse)
df %>%
as_tibble() %>%
select(region, confirmed, recovered, death) %>%
gather(type, value, -region) %>%
group_by(region,type) %>%
summarise(value= sum(value)) %>%
ggplot(aes(x="", value, fill =region)) +
geom_col(position = position_fill(), color="white") +
ggrepel::geom_text_repel(aes(label = region), direction = "y",
position = position_fill(vjust = 0.5)) +
coord_polar(theta = "y") +
scale_fill_discrete("") +
facet_wrap(~type) +
theme_void() +
theme(legend.position = "bottom")
For labels I used function geom_text_repel from ggrepel package to easily avoid overplotting.
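In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping and summing step with pivot_longer() follows; the three rows are hypothetical stand-ins for the downloaded report, using the same column names as in the answer:

```r
library(dplyr)
library(tidyr)

# Hypothetical rows standing in for the downloaded report (assumption)
toy <- tibble(
  region    = c("Asia", "Asia", "Europe"),
  confirmed = c(100, 50, 200),
  recovered = c(40, 10, 80),
  death     = c(5, 1, 20)
)

# Long format: one row per region and type, then totals per region and type
totals <- toy %>%
  pivot_longer(-region, names_to = "type", values_to = "value") %>%
  group_by(region, type) %>%
  summarise(value = sum(value), .groups = "drop")
totals
```

The resulting table has the same region, type and value columns that the plotting part of the answer expects.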

fill and group bar graphs by different variables

I am trying to create faceted geom_bar graphs with the following characteristics:
The proportion of each answer per question is shown
Each bar is colored according to the response
The plot is faceted by question
I seem to be able to do any two of the conditions, but not all 3.
Question:
Is there a way to facet and calculate proportions using one variable, but colour/fill based on another variable?
Code:
df <- data.frame(
Question = rep(c('A', 'B', 'C'), each = 5),
Resp = sample(c('Yes', 'No', 'Unsure', NA), 15, T, c(0.3,0.3,0.3,0.1)),
stringsAsFactors = F
)
# Plot 1: grouping by question to get the right proportions, but has no colour
ggplot(df, aes(x = Resp, fill = Resp)) +
stat_count(aes(y = ..prop.., group = Question)) +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~ Question)
# Plot 2: grouping by response to get colour, but has wrong proportions
ggplot(df, aes(x = Resp, fill = Resp)) +
stat_count(aes(y = ..prop.., group = Resp)) +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~ Question)
Outputs:
This is a "ggplot2-only" option:
# after_stat() replaces the ..prop.. notation, which is deprecated in recent ggplot2
ggplot(df, aes(x = Resp)) +
  geom_bar(aes(y = after_stat(prop), group = Question, fill = factor(after_stat(x))), position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_discrete(name = "Response", labels = c("No", "Unsure", "Yes", "NA")) +
  facet_wrap(~ Question)
One way could be to calculate the proportions and then plot.
library(dplyr)
library(ggplot2)
df %>%
count(Question, Resp) %>%
group_by(Question) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(Resp, n, fill = Resp) +
geom_col() +
facet_wrap(~Question)
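To check that the proportions are really computed within each question, here is a small sketch of the same count and group_by steps. The responses are fixed example values (an assumption) instead of the random sample, so the result is reproducible:

```r
library(dplyr)

# Fixed example responses (assumption) so the result is reproducible
resp <- data.frame(
  Question = rep(c("A", "B"), each = 4),
  Resp     = c("Yes", "Yes", "No", "Unsure", "No", "No", "No", "Yes")
)

# Count each answer per question, then convert to percent within the question
props <- resp %>%
  count(Question, Resp) %>%
  group_by(Question) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()
props
```

Because sum(n) is evaluated per Question group, the percentages add up to 100 within each facet, not across the whole data frame.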
Plot without facet
df$n <- 1
df <- df %>% group_by(Question, Resp) %>% summarise(n = sum(n))
ggplot(df, aes(x=factor(Question), y=n, fill=Resp)) + geom_col()
Plot with facet
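# After summarise() the data is still grouped by Question, so sum(n) is the total per question
df <- df %>% group_by(Question, Resp) %>% summarise(n = sum(n)) %>% mutate(prop = n / sum(n))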
ggplot(df, aes(x=factor(Resp), y=prop, fill=Resp)) + geom_col() + facet_wrap(~Question)

Overlaying barplot with line graphs using ggplot2

My question is similar to those posted here and here.
I am working on creating a graph in ggplot where I have one bar plot and then want to overlay multiple line graphs. For the purposes of this question, I have reproduced my code for three barplots: one that includes all years (2007-2015) and two from specific years (2007 and 2015). Ultimately I will be overlaying data from 10 different years. The data used can be found here.
library(dplyr)
library(tidyr)
library(gridExtra)
library(ggplot2)
overallpierc<-data[(data$item=="piercing"),]
overp<-overallpierc %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p07<-data[(data$yy=="2007") & (data$item=="piercing"),]
summary(p07)
subp07<-p07 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p15<-data[(data$yy=="2015") & (data$item=="piercing"),]
subp15<-p15 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
grid.arrange(overp, subp07, subp15)
The code I have posted gives me the following figure.
What I am trying to do is plot the frequencies for females in 2007 and 2015 and males in 2007 and 2015 on top of the barplot for total frequencies (where this is also reflected in the legend). Is there a way to do that in R using ggplot2?
UPDATE: I tried using the geom_smooth and geom_line functions to add the lines to my ggplot, as suggested in the comments and in solutions to other users' questions, but I get the following error:
Error: Discrete value supplied to continuous scale
I created a new data frame for a subset that I would like to plot:
df<-data.frame(age=c(15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,40,50,60), val=c(0,5,13,77,70,106,62,51,46,27,46,16,22,16,14,48,21, 3,4))
And then added it to the ggplot code:
overallpierc %>%
filter(age != "15") %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) +
geom_line(data=df,aes(x=as.numeric(age),y=val),colour="blue") +
geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
Others have encountered similar issues and used as.numeric to solve the problem. However, age needs to be treated as a factor for the purposes of plotting.
Based on our discussion in the comments, let's try stacked bars and facets. I think it works but you can decide for yourself.
The stacked bar has the advantage of showing both proportions and total count in the same bar. To compare years, a facet grid places years in rows, so the eye can scan downwards to compare the same age in different years. Note that I kept age as a continuous variable here, rather than a factor.
library(dplyr)
library(ggplot2)
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_grid(yy ~ .) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
Not bad - I can see straight away, for example, an increase in both total and female count at age 30 over time, but perhaps a little small and crowded.
We can use a facet wrap instead of a grid to make the bars clearer, but at the expense of quick visual comparison across years.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_wrap(~yy, ncol = 2) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
One more example which does not address your question in terms of total counts or barplots - but I thought it might be of interest. This code generates a "heatmap" style of plot which is poor for quantitative comparison, but can sometimes give a quick visual impression of interesting features. I think it shows, for example, that females aged 20 in 2014 have the highest total count.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(factor(age), yy)) +
geom_tile(aes(fill = n)) +
facet_grid(sex ~ .) +
scale_fill_gradient2() +
scale_y_reverse(breaks = 2006:2015) +
labs(x = "age", y = "Year")
EDIT:
Based on further discussions in the comments, here is one way to plot age as a factor, using bars for sexes, overlaid with a line for the totals and split by year.
overallpierc %>%
count(yy, sex, age) %>%
ggplot() +
geom_col(aes(factor(age), n, fill = sex), position = "dodge") +
stat_summary(aes(factor(age), n), fun = "sum", geom = "line", group = 1) +  # fun replaces the deprecated fun.y
facet_grid(yy ~ .)
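The "Discrete value supplied to continuous scale" error in the question most likely comes from mapping a numeric age in the line layer and a factor age in the bar layer onto the same x axis. One way around it, sketched here with made-up counts (an assumption, not the piercing data), is to make age a factor with the same levels in both data frames and set group = 1 so the line connects the discrete positions:

```r
library(ggplot2)

# Made-up counts (assumption), not the piercing data
ages <- c(15, 16, 17, 18)
bars <- data.frame(
  age = factor(rep(ages, each = 2), levels = ages),
  sex = rep(c("F", "M"), times = length(ages)),
  n   = c(2, 1, 5, 3, 8, 4, 6, 5)
)
line_df <- data.frame(
  age = factor(ages, levels = ages),  # same levels as the bars
  val = c(3, 8, 12, 11)
)

# Bars first, then the line from the second data frame on the same discrete axis
p_overlay <- ggplot(bars, aes(age, n)) +
  geom_col(aes(fill = sex), position = "dodge") +
  geom_line(data = line_df, aes(x = age, y = val, group = 1), colour = "blue")
p_overlay
```

Further lines, for example one per year and sex, could be added the same way by mapping colour to a grouping column in the line data.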
