How to build stacked bar chart - r

How can I build a stacked bar chart from this data? The years should be on the x axis, with OLD and NEW differentiated by colour within each bar.
I want to avoid manual coding and automate the process.
df <- structure(list(`1998` = c(11, 826), `2000` = c(217, 620), `2007` = c(625,
212), `2012` = c(836, 1)), class = "data.frame", row.names = c("NEW",
"OLD"))
1998 2000 2007 2012
NEW 11 217 625 836
OLD 826 620 212 1
Expected output:

Looking for something like this?
library(tidyverse)
df %>%
# rownames to column
mutate(type = rownames(.)) %>%
# convert to long data
pivot_longer(-"type") %>%
# plot
ggplot() +
geom_col(aes(x = name, y = value, fill = type))
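To inspect the reshaping step on its own, a minimal sketch follows. It assumes the data frame from the question is stored as df, and it swaps in tibble::rownames_to_column() for mutate(type = rownames(.)); the column names year and count are illustrative choices, not part of the answer above.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# Data from the question; the name `df` is an assumption
df <- structure(list(`1998` = c(11, 826), `2000` = c(217, 620),
                     `2007` = c(625, 212), `2012` = c(836, 1)),
                class = "data.frame", row.names = c("NEW", "OLD"))

# Move the row names (NEW/OLD) into a regular column
with_type <- rownames_to_column(df, "type")

# Reshape to one row per type/year combination
long <- pivot_longer(with_type, -type, names_to = "year", values_to = "count")

# geom_col() stacks the fill groups by default, giving one stacked bar per year
p <- ggplot(long, aes(x = year, y = count, fill = type)) +
  geom_col()
p
```

Because nothing refers to individual years by name, the same code should work unchanged if more year columns are added.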

Related

Plotting/Mutating Data on R

I've been trying to plot data that has been converted from nominal levels into quarterly growth rates.
The original dataset was:
Date GDP Level
2010Q1 457
2010Q2 487
2010Q3 538
2010Q4 589
2011Q1 627
2011Q2 672.2
2011Q3 716.4
2011Q4 760.6
2012Q1 804.8
2012Q2 849
2012Q3 893.2
2012Q4 937.4
This was in an Excel file, which I imported using
dataset <- read_excel("xx")
Then I did the following to mutate it into quarter-on-quarter growth ("QoQ Growth"):
dataset %>%
mutate(`QoQ Growth` = `GDP Level` / lag(`GDP Level`, n = 1) - 1)
I would now like to plot this % growth across time, but I'm not sure what the geom_line code is for a mutated variable. Any help would be truly appreciated! I'm quite new to R and really trying to learn, thanks!
Something like this?
library(tidyverse)
df %>%
mutate(QoQGrowth = (GDPLevel) / lag(GDPLevel, n=1) - 1) %>%
ggplot(aes(factor(Date), QoQGrowth, group=1)) +
geom_line()
Output
Data
df <- structure(list(Date = c("2010Q1", "2010Q2", "2010Q3", "2010Q4",
"2011Q1", "2011Q2", "2011Q3", "2011Q4", "2012Q1", "2012Q2", "2012Q3",
"2012Q4"), GDPLevel = c(457, 487, 538, 589, 627, 672.2, 716.4,
760.6, 804.8, 849, 893.2, 937.4)), class = "data.frame", row.names = c(NA,
-12L))
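To see what the mutate() step in the answer above produces before it reaches ggplot, here is a small sketch on the first four quarters only. The object name growth is an assumption made for illustration.

```r
library(dplyr)

# First four quarters of the data shown above
df <- data.frame(
  Date = c("2010Q1", "2010Q2", "2010Q3", "2010Q4"),
  GDPLevel = c(457, 487, 538, 589)
)

# dplyr::lag() shifts the column down by one row, so each level is
# divided by the previous quarter's level
growth <- df %>%
  mutate(QoQGrowth = GDPLevel / lag(GDPLevel, n = 1) - 1)

# The first quarter has no predecessor, so its growth is NA;
# geom_line() drops that row with a warning when plotting
growth
```

The mutated column behaves like any other column, which is why it can be passed straight to aes() in the same pipeline.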
Package zoo defines an S3 class "yearqtr" and has a function to handle quarterly dates, as.yearqtr. Combined with ggplot2's scale_x_date, formatting quarterly axis labels becomes easier.
dataset <- read.table(text = "
Date 'GDP Level'
2010Q1 457
2010Q2 487
2010Q3 538
2010Q4 589
2011Q1 627
2011Q2 672.2
2011Q3 716.4
2011Q4 760.6
2012Q1 804.8
2012Q2 849
2012Q3 893.2
2012Q4 937.4
", header = TRUE, check.names = FALSE)
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(zoo))
library(ggplot2)
dataset %>%
mutate(Date = as.yearqtr(Date, format= "%Y Q%q"),
Date = as.Date(Date)) %>%
mutate(`QoQ Growth` = `GDP Level` / lag(`GDP Level`, n = 1) - 1) %>%
ggplot(aes(Date, `QoQ Growth`)) +
geom_line() +
scale_x_date(date_breaks = "3 months", labels = as.yearqtr) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2022-03-08 by the reprex package (v2.0.1)
Convert dataset to a zoo object z, use diff.zoo to get the growth, QoQ Growth, and then use autoplot.zoo with scale_x_yearqtr.
library(zoo)
library(ggplot2)
z <- read.zoo(dataset, FUN = as.yearqtr)
`QoQ Growth` <- diff(z, arith = FALSE) - 1
autoplot(`QoQ Growth`) +
scale_x_yearqtr(format = "%YQ%q", n = length(`QoQ Growth`)) +
xlab("")

Make two geom_bar() plots based on different columns in one plot

I have a data frame that looks like this:
Year Women Men
1 2013 145169 889190
2 2014 119064 849778
3 2015 210107 1079592
4 2016 221217 1427639
5 2017 205000 1692592
6 2018 273721 1703456
7 2019 434407 2010493
I want to make a bar chart with geom_bar where x is the year and every year has two bars, one for Women and one for Men. I have found a solution that requires the table to have a different layout, but I'm wondering if there is a way to work with this one. Thank you for any help :)
You can use the following code
library(tidyverse)
df %>%
pivot_longer(cols = -c(Year,Sl), values_to = "Value", names_to = "Name") %>%
ggplot(aes(x = Year, y = Value, fill = Name))+geom_col(position = "dodge")
Data
df = structure(list(Sl = 1:7, Year = 2013:2019, Women = c(145169L,
119064L, 210107L, 221217L, 205000L, 273721L, 434407L), Men = c(889190L,
849778L, 1079592L, 1427639L, 1692592L, 1703456L, 2010493L)), class = "data.frame", row.names = c(NA,
-7L))
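The answer's data includes an Sl column for the row numbers. If the data frame only has Year, Women and Men (with 1 to 7 as row names, as the printout in the question suggests), a variant that names the columns to reshape may be simpler. This is a sketch under that assumption; factor(Year) is an added choice so that every year gets its own axis label.

```r
library(tidyr)
library(ggplot2)

# Assumed layout: no Sl column, row numbers are just row names
df <- data.frame(
  Year  = 2013:2019,
  Women = c(145169, 119064, 210107, 221217, 205000, 273721, 434407),
  Men   = c(889190, 849778, 1079592, 1427639, 1692592, 1703456, 2010493)
)

# Stack the Women and Men columns into Name/Value pairs
long <- pivot_longer(df, cols = c(Women, Men),
                     names_to = "Name", values_to = "Value")

# position = "dodge" places the two bars for each year side by side
p <- ggplot(long, aes(x = factor(Year), y = Value, fill = Name)) +
  geom_col(position = "dodge")
p
```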

How do I group by time in R and plot with ggplot? Can this be done within ggplot?

I'm analysing app data using R and I find myself having to group by time a lot so I can plot it in ggplot, however this doesn't seem easy to do.
my data looks like:
user_id | session_id | timestamp | time_seconds
001 | 123 | 2014-01-01| 251
002 | 845 | 2014-01-01| 514
003 | 741 | 2014-01-02| 141
003 | 477 | 2014-01-03| 221
004 | 121 | 2014-01-03| 120
005 | 921 | 2014-01-04| 60
...
The timestamp column is formatted with as.Date() so it should be recognised as a date by R.
I need to plot line graphs showing no. of sessions over time in ggplot. Is there a simple way to do this within the ggplot code? for example:
ggplot(df, aes(timestamp,count(session_id)))+
geom_line()
I want to count sessions per date. The above code doesn't work; it's just an example to show what I'm after.
What I'd also like to do is then summarise by month. I'd also like to look into specific months and would like to subset the data. Can this be done from that line of code? xlim isn't what I'm after as that just "shortens" the axis.
I've tried using the aggregate function but with mixed results, not really what I've been after.
Thanks.
You can use group_by and summarize from the dplyr-package:
library(dplyr)
library(ggplot2)
df %>%
group_by(timestamp) %>%
summarise(session_count = n()) %>%
ggplot(aes(timestamp, session_count)) +
geom_line()
For summarizing the data by month you can do:
df %>%
mutate(month_timestamp = format(timestamp, "%b %Y")) %>%
group_by(month_timestamp) %>%
summarise(session_count = n()) %>%
ggplot(aes(month_timestamp, session_count, group = 1)) +
geom_line()
The plot here doesn't show anything because there's only one month in your data.
Data
df <- structure(list(user_id = c("001", "002", "003", "003", "004", "005"),
session_id = c("123", "845", "741", "477", "121", "921"),
timestamp = structure(c(16071, 16071, 16072, 16073, 16073, 16074),
class = "Date"),
time_seconds = c(251, 514, 141, 221, 120, 60)),
.Names = c("user_id", "session_id", "timestamp", "time_seconds"),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -6L))
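The question also asks about looking at specific periods without just shortening the axis. One way, sketched here, is to filter the rows before counting, so that only the chosen dates reach ggplot. The cut-off dates are arbitrary examples, and dplyr::count() is used as a shorthand for group_by() plus summarise(n()).

```r
library(dplyr)
library(ggplot2)

# Reduced version of the sample data (only the columns needed here)
df <- data.frame(
  session_id = c("123", "845", "741", "477", "121", "921"),
  timestamp  = as.Date(c("2014-01-01", "2014-01-01", "2014-01-02",
                         "2014-01-03", "2014-01-03", "2014-01-04"))
)

# Keep only the period of interest (example bounds), then count sessions per day
daily <- df %>%
  filter(timestamp >= as.Date("2014-01-02"),
         timestamp <= as.Date("2014-01-31")) %>%
  count(timestamp, name = "session_count")

p <- ggplot(daily, aes(timestamp, session_count)) +
  geom_line()
p
```

Unlike xlim, this removes the rows before the summary is computed, so any later monthly aggregation only sees the selected period.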
This might also be convenient to do with lubridate. The version below builds the month key with format(), e.g.
library(tidyverse)
dat <- data.frame(timestamp = rep(seq.Date(as.Date("2014/01/01"), as.Date("2014/12/24"), "day"), each = 2),
sessions = 1)
dat %>%
mutate(month = format(timestamp, "%Y-%m")) %>%
group_by(month) %>%
summarise(sum_session = sum(sessions)) %>%
ggplot(aes(x = month, y = sum_session, group = 1)) + geom_line()
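For comparison, here is a sketch of the same monthly summary using lubridate::floor_date(), which rounds each date down to the first day of its month. The month column then stays a Date, so it sorts chronologically and works with scale_x_date(). The object name monthly is an assumption for illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Two sessions per day from 2014-01-01 to 2014-12-24, as in the example above
dat <- data.frame(
  timestamp = rep(seq(as.Date("2014-01-01"), as.Date("2014-12-24"), by = "day"),
                  each = 2),
  sessions = 1
)

# Round every date down to the first of its month, then total the sessions
monthly <- dat %>%
  mutate(month = floor_date(timestamp, unit = "month")) %>%
  group_by(month) %>%
  summarise(sum_session = sum(sessions))

# month is a Date, so no group = 1 is needed and the axis can be date-formatted
p <- ggplot(monthly, aes(x = month, y = sum_session)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y")
p
```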

R: plot multiple curves vs one var but for 4 factors

I have a DF that looks like:
id app vac dac
1: 1 1000802 579 455
2: 1 1000803 1284 918
3: 1 1000807 68 66
4: 1 1000809 1470 903
5: 2 1000802 407 188
6: 2 1000803 365 364
7: 2 1000807 938 116
8: 2 1000809 699 570
I need to plot vac and dac for each app on the same canvas as a function of id. I know how to do it for a single app by using melt and bulk-plotting with ggplot, but I'm stuck on how to do it for an arbitrary number of factors/levels.
In this example there will be total 8 curves for 4 app. Any thoughts?
Here's the data frame for tests. Thank you!!
df = structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), app = c(1000802,
1000803, 1000807, 1000809, 1000802, 1000803, 1000807, 1000809
), vac = c(579, 1284, 68, 1470, 407, 365, 938, 699), dac = c(455,
918, 66, 903, 188, 364, 116, 570)), .Names = c("id", "app", "vac",
"dac"), class = c("data.table", "data.frame"), row.names = c(NA,
-8L))
Edit: some clarification on axes,
x axis = id, y axis = values of vac and dac for each of 4 app factors
It is a bit unclear what you are looking for, but if you are looking for a line connecting the values of vac and dac, here is a solution using dplyr and tidyr.
First, gather the vac and dac columns (this is similar to reshape2::melt but with a syntax I find easier to follow). Then, set the variable (which has "vac" and "dac") as your x-locations, the value (from the old vac and dac columns) as your y and then map app and id to aesthetics (here, color and linetype). Set the group to ensure that it connects the right pairs of points, and add geom_line:
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = variable
, y = value
, color = factor(app)
, linetype = factor(id)
, group = paste(app, id))) +
geom_line()
gives
Given the question edit, you can change axes like so:
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = id
, y = value
, color = factor(app)
, linetype = variable
, group = paste(app, variable))) +
geom_line()
gives
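In current tidyr, gather() is superseded by pivot_longer(). A sketch of the second plot written that way follows; the data frame is rebuilt as a plain data.frame here, and interaction() replaces paste() for the grouping, which is a stylistic assumption rather than a required change.

```r
library(tidyr)
library(ggplot2)

# Same values as the question's data, as a plain data.frame
df <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  app = rep(c(1000802, 1000803, 1000807, 1000809), times = 2),
  vac = c(579, 1284, 68, 1470, 407, 365, 938, 699),
  dac = c(455, 918, 66, 903, 188, 364, 116, 570)
)

# Stack vac and dac into variable/value pairs (equivalent to the gather() call)
long <- pivot_longer(df, cols = c(vac, dac),
                     names_to = "variable", values_to = "value")

# One line per app/variable combination: 4 apps x 2 variables = 8 curves
p <- ggplot(long, aes(x = id, y = value,
                      color = factor(app), linetype = variable,
                      group = interaction(app, variable))) +
  geom_line()
p
```

Nothing in the code names a specific app, so it should scale to any number of app levels.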
I'm not sure I understood your question, but I would do something like
ggplot(df,aes(vac,app,group=app)) + geom_point(aes(color=factor(app)))

Draw a graph in R with header elaborate on two columns

I have a table whose header spans two rows (year, then Reported/Detected under each year). How can I draw a 3D graph from this table, or what would be a good way to plot tables with such nested headers? Please suggest alternative ways to achieve this, if any.
Crime Table:
                                  year
                   2014               2015               2016
                   Reported Detected  Reported Detected  Reported Detected
Murder             221      208       178      172       26       20
Murder(Gain)       20       16        11       9         1        1
Dacoity            51       45        44       36        5        1
Robbery            538      316       351      201       23       10
Chain Snatching    528      394       342      229       23       0
Code:
library(tables)
#CLASS 1 CRIMES 2014
c14 <- structure(list(`Reported` = c(221, 20, 51,
538, 528), `Detected` = c(208, 16, 45, 316, 394)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching"), class = "data.frame")
c14
#CLASS 1 CRIMES 2015
c15 <- structure(list(`Reported` = c(178, 11, 44,
351, 342), `Detected` = c(172, 9,
36, 201, 229)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c15
#CLASS 1 CRIMES 31-01-2016
c16 <- structure(list(`Reported` = c(26, 1, 5,
23, 23), `Detected` = c(20, 1,
1, 10, 0)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c16
# rbind with rownames as a column
st <- rbind(
data.frame(c14, year = '2014', what = factor(rownames(c14), levels = rownames(c14)),
row.names= NULL, check.names = FALSE),
data.frame(c15,year = '2015',what = factor(rownames(c15), levels = rownames(c15)),
row.names = NULL,check.names = FALSE),
data.frame(c16,year = '2016',what = factor(rownames(c16), levels = rownames(c16)),
row.names = NULL,check.names = FALSE)
)
crimetable <- tabular(Heading()*what ~ year*(`Reported` +`Detected`)*Heading()*(identity),data=st)
crimetable
As I hate 3D plots for 3-way tables and I like ggplot2, I suggest this:
Gather your data into "long" format:
library(tidyr)
st_long = gather(st, type, count, -c(year, what))
head(st_long, 3)
# year what type count
# 1 2014 Murder Reported 221
# 2 2014 Murder(Gain) Reported 20
# 3 2014 Dacoity Reported 51
As you can see, both the Detected and Reported columns are now stacked into a single column called type. This is useful for ggplot2, as it can easily create facets. Facets are separate elements within the plot that share the same aesthetic components but work on different groups of data:
library(ggplot2)
ggplot(st_long, aes(year, count, group = what, color = what)) +
geom_line() +
facet_wrap(~ type)
(I am not saying that line plot is the only/best plot here, but it is often used when comparing frequencies across different time-points.)
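As one alternative to the line plot, here is a sketch of a dodged bar chart with one panel per crime. The st data frame is rebuilt directly from the numbers in the question, pivot_longer() stands in for gather(), and scales = "free_y" is an added assumption so that rare crimes remain readable next to frequent ones.

```r
library(tidyr)
library(ggplot2)

crimes <- c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching")

# Same values as c14, c15 and c16 in the question, stacked by year
st <- data.frame(
  Reported = c(221, 20, 51, 538, 528, 178, 11, 44, 351, 342, 26, 1, 5, 23, 23),
  Detected = c(208, 16, 45, 316, 394, 172, 9, 36, 201, 229, 20, 1, 1, 10, 0),
  year = rep(c("2014", "2015", "2016"), each = 5),
  what = factor(rep(crimes, times = 3), levels = crimes)
)

# Long format: one row per crime, year and type (Reported/Detected)
st_long <- pivot_longer(st, cols = c(Reported, Detected),
                        names_to = "type", values_to = "count")

# Reported and Detected side by side for each year, one facet per crime
p <- ggplot(st_long, aes(x = year, y = count, fill = type)) +
  geom_col(position = "dodge") +
  facet_wrap(~ what, scales = "free_y")
p
```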
