R: ggplot2 to show two summary data [duplicate]

This question already has answers here:
How to make a single plot from two dataframes with ggplot2
df1 <- data.frame(
"Item" = c("20170315","20170316","20170409","20170411","20170525"),
"Value" = c(400, 515, 743, 682, 458))
df2 <- data.frame(
"Item" = c("20180102","20180227","20180311","20180318","20180522","20180628"),
"Value" = c(793, 541, 777, 847, 901, 433))
I want to show both data frames in one plot,
like this picture. Have a nice day!

Like this?
Create a column cond and bind the data sets. Then it's an ordinary bar plot with the fill mapped to cond.
df1 <- data.frame(
"Item" = c("20170315","20170316","20170409","20170411","20170525"),
"Value" = c(400, 515, 743, 682, 458))
df2 <- data.frame(
"Item" = c("20180102","20180227","20180311","20180318","20180522","20180628"),
"Value" = c(793, 541, 777, 847, 901, 433))
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
})
bind_rows(
df1 %>% mutate(cond = "A"),
df2 %>% mutate(cond = "B")
) %>%
ggplot(aes(Item, Value, fill = cond)) +
geom_col() +
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))
Created on 2022-08-21 by the reprex package (v2.0.1)
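As a variation on the answer above, the labelling step can be folded into the bind itself: dplyr's bind_rows() accepts a named list together with an .id argument and writes the list names into a new column. This is a minimal sketch that reuses the labels "A" and "B" from the answer; the free-scale facets are an optional addition, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

df1 <- data.frame(
  Item  = c("20170315", "20170316", "20170409", "20170411", "20170525"),
  Value = c(400, 515, 743, 682, 458))
df2 <- data.frame(
  Item  = c("20180102", "20180227", "20180311", "20180318", "20180522", "20180628"),
  Value = c(793, 541, 777, 847, 901, 433))

# The list names ("A", "B") become the values of the new `cond` column
combined <- bind_rows(list(A = df1, B = df2), .id = "cond")

# One panel per source data frame; each panel keeps only its own Item labels
p <- ggplot(combined, aes(Item, Value, fill = cond)) +
  geom_col() +
  facet_wrap(~ cond, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
print(p)
```

Dropping the facet_wrap() line gives the same single-panel plot as the answer.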

Related

Display grouped percentages in Likert plot with Plotly R

I have a dataframe like this:
library(tidyverse)
data <- tibble(Question_num = rep(c("Question_1", "Question_2"),each= 5),
Answer = rep(c('Strongly disagree',
'Disagree',
'Neutral',
'Agree',
'Strongly agree'), 2),
n = c(792, 79, 69, 46, 24, 34, 34, 111, 229, 602),
prop = c(78.4, 7.82, 6.83, 4.55, 2.38, 3.37, 3.37, 11.0, 22.7, 59.6))
where:
Question_num is the label of a question;
Answer is the response mode;
n is a simple count for each response mode;
prop is proportion, in percentage;
I would like to represent it graphically through a dynamic bar graph with divergent colours. Perhaps, this would be a starting point:
library(plotly)
library(RColorBrewer)
data %>%
plot_ly(x = ~prop,
y = ~Question_num,
color = ~Answer) %>%
add_bars(colors = "RdYlBu") %>%
layout(barmode = "stack")
Is it possible, with Plotly in R, to obtain an ordered plot which has the neutral category clearly delineated (in the center) and the percentages summarised by grouping the extreme categories together (even if they are plotted in different colours)? What I would like to obtain is a plot similar to this one:
The plot in the picture is obtained from a dataset in a different format (wide, not long) and with the likert package, which computes everything automatically. Could such a result be achieved with plotly (both for percentages and for counts)? If so, how?
I could not find any documentation to answer this challenging question.
Thank you very much to those who can help me.
The following isn't addressing all of the issues your post is raising (It might be better to split this into multiple questions).
However, I'd like to share what I was able to get so far.
(Sorry for switching from tidyverse to data.table - I'm not familiar with the tidyverse and I'm not planning to familiarize myself with it any time soon).
To get the desired plot we can switch to barmode = 'relative'
Run schema() and navigate:
object ► traces ► bar ► layoutAttributes ► barmode
Determines how bars at the same location coordinate are displayed on the graph. With stack, the bars are stacked on top of one another. With relative, the bars are stacked on top of one another, with negative values below the axis, positive values above.
library(data.table)
library(plotly)
DF <- data.frame(Question_num = rep(c("Question_1", "Question_2"),each= 5),
Answer = rep(c('E - Strongly disagree',
'D - Disagree',
'A - Neutral',
'B - Agree',
'C - Strongly agree'), 2),
n = c(792, 79, 69, 46, 24, 34, 34, 111, 229, 602),
prop = c(78.4, 7.82, 6.83, 4.55, 2.38, 3.37, 3.37, 11.0, 22.7, 59.6))
DT <- as.data.table(DF)
DT[, order := .GRP, by = Answer]
DT[Answer == "A - Neutral", c("n", "prop") := .(n/2, prop/2)][Answer %in% c("E - Strongly disagree", "D - Disagree"), prop := -prop]
DT <- rbindlist(list(DT, DT[Answer == "A - Neutral", .(Question_num = Question_num, Answer = Answer, n = n, prop = -prop, order = order-0.5)]))
setorder(DT, -Question_num, order)
# setorder(DT, order)
fig <- plot_ly(
data = DT,
type = "bar",
x = ~ prop,
y = ~ Question_num,
color = ~ Answer,
colors = c("E - Strongly disagree" = "#a6611a",
"D - Disagree" = "#d2b08c",
"A - Neutral" = "#b3b3b3",
"B - Agree" = "#80c2b8",
"C - Strongly agree" = "#018571"),
text = ~ paste0(prop, "%"),
textfont = list(
size = 12,
color = 'black')
)
fig <- layout(
fig,
barmode = "relative",
xaxis = list(title ="Percentage"),
yaxis = list(
categoryorder = "array",
categoryarray = sort(unique(DT$Question_num), decreasing = TRUE),
title = ""
),
legend = list(orientation = "h")
)
print(fig)
Here a related question can be found.
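The grouped percentages asked about in the question (the two disagree categories together, neutral, the two agree categories together) are not covered by the plot above. A minimal sketch of how they could be computed from the question's long-format data follows; the column name side and the object name likert_summary are hypothetical names introduced for illustration.

```r
library(dplyr)

data <- data.frame(
  Question_num = rep(c("Question_1", "Question_2"), each = 5),
  Answer = rep(c("Strongly disagree", "Disagree", "Neutral",
                 "Agree", "Strongly agree"), 2),
  n    = c(792, 79, 69, 46, 24, 34, 34, 111, 229, 602),
  prop = c(78.4, 7.82, 6.83, 4.55, 2.38, 3.37, 3.37, 11.0, 22.7, 59.6))

# Collapse the five response modes into three sides, then total them per question
likert_summary <- data %>%
  mutate(side = case_when(
    Answer %in% c("Strongly disagree", "Disagree") ~ "negative",
    Answer == "Neutral"                            ~ "neutral",
    TRUE                                           ~ "positive")) %>%
  group_by(Question_num, side) %>%
  summarise(n = sum(n), prop = sum(prop), .groups = "drop")

print(likert_summary)
```

The resulting totals (both counts and percentages) could then be drawn next to the bars as text labels, for example with plotly's add_annotations().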

{gtExtras} column showing in wrong order in {gt} table when grouped

I am making a gt table showing the progress of individuals towards a goal. In the table, there is a column showing a horizontal bar graph of progress towards that goal (if the goal is 50 and the score is 40, the bar is at 80%).
However, when I change the order of the gt rows using the groupname_col argument, the order of the other cells changes, but not the order of the gtExtras gt_plt_bar_pct column, so it shows the wrong bars for the name and score in that row. Instead, that column always seems to follow the order of rows in the input data.
I understand that I can fix this by calling arrange on the df before the gt call, but this doesn't seem like a good solution since I'm going to want to change the order of the rows to view them by different groups. Is this a flaw in gtExtras? Is there a better fix?
thanks!
reprex:
library(tibble)
library(gt)
library(gtExtras)
library(dplyr)
# make dataframe of individuals and their goals
df <- tribble(
~name, ~group, ~score, ~goal,
"Bob", "C", 20, 40,
"Chris", "A", 50, 40,
"Dale", "B", 30, 50,
"Jay", "A", 0, 40,
"Ben", "B", 10, 20
) %>%
# calculate percent towards goal, and cap at 100%
mutate(percent_to_goal = score/goal *100,
percent_to_goal = case_when(percent_to_goal >= 100 ~ 100,
TRUE ~ percent_to_goal))
df %>%
# this fixes the issue, but doesn't seem like a permanent solution
#arrange(group, name) %>%
# make gt table
gt(rowname_col = "name", groupname_col = "group") %>%
# order groups
row_group_order(groups = c("A","B","C")) %>%
# add bar chart column
gt_plt_bar_pct(column = percent_to_goal) %>%
# highlight blue if person reaches their goal
tab_style(
style = list(
cell_fill(color = "lightcyan"),
cell_text(weight = "bold")),
locations = cells_body(
columns = c(goal,score, percent_to_goal),
rows = score >= goal
)
)
Here is the output from the above code. Notice that the lengths of the bars do not always reflect the values of the rows they appear in. Instead, they reflect the order of the original dataset.
EDIT: remove row_group_order. If I run the above code again, but comment out the line meant to rearrange the appearance of groups, the grouping shows up in a different order (order of appearance of groups in the original dataset), and the name and first two columns sort into these groups accordingly, but the bar chart column still does not, and remains in the original order of the dataset. Image below:
As of gtExtras v0.2.4 this bug has been fixed. Thanks for raising it, and for the great reprex!
library(tibble)
library(gt)
library(gtExtras)
library(dplyr)
# make dataframe of individuals and their goals
df <- tribble(
~name, ~group, ~score, ~goal,
"Bob", "C", 20, 40,
"Chris", "A", 50, 40,
"Dale", "B", 30, 50,
"Jay", "A", 0, 40,
"Ben", "B", 10, 20
) %>%
# calculate percent towards goal, and cap at 100%
mutate(percent_to_goal = score/goal *100,
percent_to_goal = case_when(percent_to_goal >= 100 ~ 100,
TRUE ~ percent_to_goal))
df %>%
# make gt table
gt(rowname_col = "name", groupname_col = "group") %>%
# order groups
row_group_order(groups = c("A","B","C")) %>%
# add bar chart column
gt_plt_bar_pct(column = percent_to_goal) %>%
# highlight blue if person reaches their goal
tab_style(
style = list(
cell_fill(color = "lightcyan"),
cell_text(weight = "bold")),
locations = cells_body(
columns = c(goal,score, percent_to_goal),
rows = score >= goal
)
)
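For code that may still run against an older gtExtras, the arrange() workaround from the question can be applied conditionally. This is only a sketch under the assumption that 0.2.4 is the first fixed version, as stated above; needs_presort is a hypothetical helper, not part of gt or gtExtras.

```r
# Hypothetical helper: TRUE when the given version predates the fix
needs_presort <- function(installed, fixed_in = "0.2.4") {
  package_version(installed) < package_version(fixed_in)
}

needs_presort("0.2.3")  # TRUE
needs_presort("0.2.4")  # FALSE

# Usage, assuming gtExtras is installed and `df` is the data frame from the reprex:
# if (needs_presort(as.character(packageVersion("gtExtras")))) {
#   df <- dplyr::arrange(df, group, name)  # pre-sort so the bars line up with the rows
# }
```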

Deal with missing regions which provoke NAs in choropleth map

I have the dataframe below, for which I want to create a choropleth map. I downloaded the Germany shapefile from here and then used this code to create the map. As you can see, the map is created, but because several regions are missing, they are set to NA and get a black color. How can I deal with this issue? Maybe eliminate them or change them to 0? I'm open to other packages such as leaflet if they can solve the issue.
region<-c("09366",
"94130",
"02627",
"95336",
"08525",
"92637",
"95138",
"74177",
"08606",
"94152" )
value<-c( 39.5,
519.,
5.67,
5.10,
5.08,
1165,
342,
775,
3532,
61.1 )
df<-data.frame(region,value)
#shapefile from http://www.suche-postleitzahl.org/downloads?download=zuordnung_plz_ort.csv
library(choroplethr)
library(dplyr)
library(ggplot2)
library(rgdal)
library(maptools)
library(gpclib)
library(readr)
library(R6)
ger_plz <- readOGR(dsn = ".", layer = "plz-gebiete")
gpclibPermit()
#convert the raw data to a data.frame as ggplot works on data.frames
ger_plz@data$id <- rownames(ger_plz@data)
ger_plz.point <- fortify(ger_plz, region="id")
ger_plz.df <- inner_join(ger_plz.point,ger_plz@data, by="id")
head(ger_plz.df)
ggplot(ger_plz.df, aes(long, lat, group=group )) + geom_polygon()
#data file
#df <- produce_sunburst_sequences
# variable name 'region' is needed for choroplethr
ger_plz.df$region <- ger_plz.df$plz
head(ger_plz.df)
#subclass choroplethr's Choropleth to make a class that fits this map
GERPLZChoropleth <- R6Class("GERPLZChoropleth",
inherit = choroplethr:::Choropleth,
public = list(
initialize = function(user.df) {
super$initialize(ger_plz.df, user.df)
}
)
)
#df<-df[,c(6,13)]
#choropleth needs these two columnames - 'region' and 'value'
colnames(df) = c("region", "value")
#df<-df[!(df$region=="Missing_company_zip"),]
#df<-df[!duplicated(df$region), ]
#instantiate new class with data
c <- GERPLZChoropleth$new(df)
#plot the data
c$ggplot_polygon = geom_polygon(aes(fill = value), color = NA)
c$title = "Comparison of number of Inhabitants per Zipcode in Germany"
c$legend= "Number of Inhabitants per Zipcode"
c$set_num_colors(9)
c$render()
Package sf will make your process easier.
library(tidyverse)
library(sf)
df <- data.frame(region = c("09366", "94130", "02627", "95336", "08525", "92637", "95138", "74177", "08606", "94152"),
value = c(39.5, 519, 5.67, 5.1, 5.08, 1165, 342, 775, 3532, 61.1))
germany_sf <- sf::st_read(dsn = "plz-gebiete.shp") %>%
left_join(df, by = c("plz" = "region"))
germany_sf %>%
ggplot() +
geom_sf(alpha = 0.1, size = 0.1, colour = "gray") +
geom_sf(data = . %>% filter(!is.na(value)), aes(fill = value)) +
scale_fill_viridis_c() +
theme_bw()
For a zoomable/interactive option, use {tmap}, a package that wraps {leaflet} for quick, simple maps.
library(tmap)
tmap_mode("view")
tm_shape(shp = germany_sf) +
tm_polygons(col = "value", border.alpha = 0)
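The question also mentions changing the missing regions to 0 instead of leaving them as NA. A minimal sketch of that option follows, using a small data frame as a hypothetical stand-in for the shapefile's attribute table (the two extra postal codes are made up for illustration); the same mutate() step could be appended after the left_join() above.

```r
library(dplyr)

# Hypothetical stand-in for the regions contained in the shapefile
all_regions <- data.frame(plz = c("09366", "94130", "01067", "10115"))
df <- data.frame(region = c("09366", "94130"), value = c(39.5, 519))

filled <- all_regions %>%
  left_join(df, by = c("plz" = "region")) %>%
  # Regions without a match come back as NA; replace them with 0
  mutate(value = coalesce(value, 0))

print(filled)
```

Note that a filled-in 0 is then coloured like a genuine value of 0. If the missing regions should stay visually distinct, keeping the NAs and setting na.value in the fill scale, for example scale_fill_viridis_c(na.value = "white"), may be the better choice.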
I've been messing around with the choroplethr package a bit and had this same question. The "aha" moment was learning that the output from the various x_choropleth functions is actually just a ggplot object. This means you can modify them as you would any ggplot graphic. So if you add something like this in your graphic output pipeline I think it might achieve what you're after:
+ scale_fill_distiller(na.value = "white")
Not sure if some of the other things you're doing here would preclude this from working.
Shout out to this write-up: https://statisticaloddsandends.wordpress.com/2019/07/15/looking-at-flood-insurance-claims-with-choroplethr/

"Nested" barplots, with multiple levels of grouping

How can I group bars in a barplot by a third variable?
I would like to achieve this in base R, without, for example, ggplot2, as in this related question. In another related question the groups of groups are labeled, but not (visually) grouped – as in my example above –, making the plot difficult to read.
Sample data:
groups = c("A", "B")
choices = c("orange", "apple", "beer")
supergroups = c("fruits", "non-fruits")
dat <- data.frame(
group = rep(groups, c(93, 94)),
choice = factor(c(
rep(choices, c(51, 30, 12)),
rep(choices, c(47, 29, 18))
),
levels = choices
),
supergroup = c(
rep(supergroups, c(81, 12)),
rep(supergroups, c(76, 18))
)
)
barplot(table(dat), beside = TRUE)
Which returns the error:
Error in barplot.default(table(dat), beside = TRUE) :
'height' must be a vector or a matrix
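The error arises because table(dat) cross-tabulates all three columns and so returns a three-dimensional array, whereas barplot() accepts only a vector or a matrix. One possible base R approach, sketched here under the assumption that a wider gap is an acceptable way to mark the supergroups, is to flatten the group-by-choice counts into a vector and control the gap before each bar through the space argument. The gap widths and colours are arbitrary choices.

```r
groups <- c("A", "B")
choices <- c("orange", "apple", "beer")
supergroups <- c("fruits", "non-fruits")
dat <- data.frame(
  group = rep(groups, c(93, 94)),
  choice = factor(c(rep(choices, c(51, 30, 12)),
                    rep(choices, c(47, 29, 18))), levels = choices),
  supergroup = c(rep(supergroups, c(81, 12)), rep(supergroups, c(76, 18)))
)

dim(table(dat))  # three dimensions, which barplot() cannot handle

# Two-dimensional counts, flattened column by column:
# A/orange, B/orange, A/apple, B/apple, A/beer, B/beer
counts  <- table(dat$group, dat$choice)
heights <- as.vector(counts)

# Gap before each bar: small between choices, large before the non-fruits
gaps <- c(0.5, 0, 0.5, 0, 2, 0)
cols <- rep(c("grey30", "grey80"), times = 3)
mids <- barplot(heights, space = gaps, col = cols)

# Label each pair of bars with its choice, and each block with its supergroup
axis(1, at = colMeans(matrix(mids, nrow = 2)), labels = choices, tick = FALSE)
mtext(supergroups, side = 1, line = 2.5,
      at = c(mean(mids[1:4]), mean(mids[5:6])))
legend("topright", legend = groups, fill = c("grey30", "grey80"))
```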

Boxplot of specific groups and all groups using ggplot2

I have an example data set which represents my bigger data set I'm dealing with that looks like this:
dat <- data.frame(groupid = c(rep("2ppm", 5), rep("20ppm", 5)),
var1 = c(222, 212, 245, 233, 213, 444, 454, 464, 434, 424),
var2 = c(111, 112, 145, 133, 113, 744, 754, 764, 734, 724));
I want to plot the variables side-by-side grouped by their groupid, hence I did the following:
library(reshape2) # melt() is assumed to come from reshape2
mDat <- melt(dat, id.vars = "groupid")
With this data set, I can easily plot the boxplots needed using ggplot2:
bp <- ggplot(mDat, aes(x = variable, y = value, fill = groupid)) +
geom_boxplot();
So far so good, however, I want to add an additional boxplot to the end of the plot, where all values in both variables are plotted to see the overall spread in a boxplot; I couldn't figure out how to modify the melted data set to get this result, e.g. add another group to groupid called all.
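One way this could be approached, sketched here under the assumption that melt() comes from reshape2, is to append a relabelled copy of the melted data in which both groupid and variable are set to "all". The object names allDat and mAll are hypothetical.

```r
library(reshape2)
library(ggplot2)

dat <- data.frame(groupid = c(rep("2ppm", 5), rep("20ppm", 5)),
                  var1 = c(222, 212, 245, 233, 213, 444, 454, 464, 434, 424),
                  var2 = c(111, 112, 145, 133, 113, 744, 754, 764, 734, 724))
mDat <- melt(dat, id.vars = "groupid")

# Copy every row and relabel it so that all values end up in one extra box
allDat <- mDat
allDat$groupid  <- "all"
allDat$variable <- "all"

# Work with character columns so the new label is not lost as an unknown factor level
mDat$groupid  <- as.character(mDat$groupid)
mDat$variable <- as.character(mDat$variable)
mAll <- rbind(mDat, allDat)

# Fix the x-axis order so that "all" is drawn last
mAll$variable <- factor(mAll$variable, levels = c("var1", "var2", "all"))

bp <- ggplot(mAll, aes(x = variable, y = value, fill = groupid)) +
  geom_boxplot()
print(bp)
```

Setting only variable to "all" and leaving groupid untouched would instead give one overall box per group at the end of the plot.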
Thanks in advance for your help!
