How to make a funnel chart with bars in R ggplot2?

I want to make a funnel chart in R with ggplot2 as following:
https://chartio.com/assets/c15a30/tutorials/charts/funnel-charts/c7cd4465bc714689646515692b6dbe7c74ae7550a265cd2d6a530f1f34d68ae1/funnel-chart-example.png
My code looks like this, but I don't know how to do the light blue fills between the bars (maybe with a polygon?).
library(ggplot2)
library(reshape2) # for melt()
library(dplyr)
# get data
dat <- read.table(text=
"steps numbers rate
clicks 332835 100.000000
signup 157697 47.379933
cart 29866 8.973215
buys 17012 5.111241",
header = T)
barWidth <- 0.9
# add spacing, melt, sort
total <- subset(dat, rate==100)$numbers
dat$padding <- (total - dat$numbers) / 2
molten <- melt(dat[, -3], id.var='steps')
molten <- molten[order(molten$variable, decreasing = T), ]
molten$steps <- factor(molten$steps, levels = rev(dat$steps))
ggplot(molten, aes(x=steps)) +
geom_bar(aes(y = value, fill = variable),
stat='identity', position='stack') +
geom_text(data=dat,
aes(y=total/2, label= paste(round(rate), '%')),
color='white') +
scale_fill_manual(values = c('grey40', NA) ) +
coord_flip() +
theme(legend.position = 'none') +
labs(x='steps', y='volume')

I needed the same but hadn't found one, so I created a function to do so. It might need some improvements, but it is working well. The example below shows only numbers, but you can also add texts.
x <- c(86307,
34494,
28127,
17796,
12488,
11233
)
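# source() on the gist page URL would fetch HTML rather than R code;
# devtools::source_gist() takes the gist id and sources the R file in it
devtools::source_gist("fd14b58becab4924befef5be239c6011")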
gg_funnel(x, color = viridisLite::plasma(6))

This should be just a comment, since you explicitly asked for a ggplot solution, which this is not - I posted it as an answer purely for reasons of code formatting.
You could consider plotly, which has a funnel type. Something like
library(plotly)
dat %>% mutate(steps=factor(steps, unique(steps)),
rate=sprintf("%.2f%%", rate)) %>%
plot_ly(
type = "funnel",
y = ~steps,
text= ~rate,
x = ~numbers)
could get you started; I do not really grasp the padding you have in your data, so this might not be exactly what you want.
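
For the light blue fills asked about in the question, one possible approach in plain ggplot2 is to draw the bars as centered rectangles and connect each pair of consecutive bars with a trapezoid via geom_polygon(). This is only a sketch under assumptions: the names `bar_half`, `pos`, `xmin`, `xmax` and `gaps` are made up for illustration, and the bars are centered on zero instead of being padded as in the question.

```r
library(ggplot2)

dat <- data.frame(
  steps   = c("clicks", "signup", "cart", "buys"),
  numbers = c(332835, 157697, 29866, 17012)
)

n        <- nrow(dat)
bar_half <- 0.3                    # assumed half height of each bar
dat$pos  <- rev(seq_len(n))        # first step is drawn at the top
dat$xmin <- -dat$numbers / 2       # bars are centered on x = 0
dat$xmax <-  dat$numbers / 2

# One trapezoid (4 corners) per gap between consecutive bars
i <- seq_len(n - 1)
gaps <- data.frame(
  id = rep(i, each = 4),
  x  = as.vector(rbind(dat$xmin[i], dat$xmax[i],
                       dat$xmax[i + 1], dat$xmin[i + 1])),
  y  = as.vector(rbind(dat$pos[i] - bar_half, dat$pos[i] - bar_half,
                       dat$pos[i + 1] + bar_half, dat$pos[i + 1] + bar_half))
)

p <- ggplot() +
  geom_polygon(data = gaps, aes(x = x, y = y, group = id),
               fill = "lightblue") +
  geom_rect(data = dat,
            aes(xmin = xmin, xmax = xmax,
                ymin = pos - bar_half, ymax = pos + bar_half),
            fill = "grey40") +
  geom_text(data = dat,
            aes(x = 0, y = pos,
                label = paste0(round(100 * numbers / max(numbers)), "%")),
            color = "white") +
  scale_y_continuous(breaks = dat$pos, labels = dat$steps) +
  labs(x = "volume", y = "steps")
```

Printing `p` draws the chart. Because the rectangles are positioned explicitly, no coord_flip() or transparent padding bars are needed in this variant.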

Related

Change x-axis names in ggplot

I am not very good at R and need some help.
My ggplot has so many dates on the x-axis that you can't actually read them, and I want to show months instead to give a better overview of the plot.
For example to something like this in the link:
Display the x-axis on ggplot as month only in R
This is the script I'm using:
r <- read.csv("xxdive.csv", header = T, sep = ";")
names(r) <- c("Date", "Number")
r <- data.frame(r)
r$Date <- factor(r$Date, ordered = T)
r[1:2, ]
Date Number
16.02.2015 97
17.02.2015 47
library(tidyverse)
ggplot(r, aes(Date, Number)) +
theme_light() +
ggtitle("16.02.15-10.02.16") +
ylab("Dives") +
geom_line(aes(group = 1), color = "blue")
This shows what kind of data I have.
I have tried using scale etc, but I can't make it work..
I hope this was understandable, and that someone can help me!! :)
I would convert column Date to data type Date
r$Date <- as.Date(r$Date, "%d.%m.%Y");
instead of converting it to data type factor.
r$Date <- factor(r$Date, ordered = T);
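As a minimal sketch of why the conversion matters: once the column is a real Date, scale_x_date() can place one break per month. The three rows below are made-up values in the same dd.mm.yyyy format as the question, and the "%b %Y" label format is just one possible choice.

```r
library(ggplot2)

# Made-up example rows in the question's date format (an assumption)
r <- data.frame(
  Date   = c("16.02.2015", "17.03.2015", "20.04.2015"),
  Number = c(97, 47, 60)
)

# Parse the text into Date objects instead of a factor
r$Date <- as.Date(r$Date, format = "%d.%m.%Y")

p <- ggplot(r, aes(Date, Number)) +
  geom_line(color = "blue") +
  # one tick per month, labelled with abbreviated month and year
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  ylab("Dives") +
  theme_light()
```

With a factor on the x-axis, every distinct date becomes its own label, which is what made the original axis unreadable.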
It's a little tricky without a working example, but try this.
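# install.packages("tidyverse") # only needed once
library(tidyverse)
# the dates are written as dd.mm.yyyy, so col_date() needs an explicit format
r <- read_delim("xxdive.csv", delim = ";",
col_types = list(col_date(format = "%d.%m.%Y"), col_integer()))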
names(r) <- c("Date", "Number")
ggplot(r, aes(Date, Number)) +
geom_line(aes(group = 1), color = "blue") +
scale_x_date(date_breaks = "1 month") +
ylab("Dives") +
ggtitle("16.02.15-10.02.16") +
theme_light()

How to overwrap on geom_bar in ggplot2?

I would like to create a bar chart with ggplot in R.
The sample data is as follows:
Name <- c('Sample1', 'Sample2', 'Sample3')
Total <- c(86020045,30974095,1520609)
Part <- c(41348957, 2956650, 595121)
DT <- data.frame(Name,Total,Part)
DT
ggplot(DT, aes(Name, Total, fill=Name)) +
geom_bar(position="stack",stat="identity")
What I would like is a stacked bar chart that shows each Name's Total count, with the Part count shown inside the bar and its percentage labelled in the middle of the bar.
Is there any way possible to do this? I've been searching on here but haven't been able to find a solution.
Oh... It seems like someone already commented the answer while I was writing it down. I'll post mine anyways since it's slightly different.
DT <- transform(DT, Part0 = Total - Part)
library(reshape2)
DT2 <- melt(DT, id.vars = c("Name", "Total"))
DT2 <- transform(DT2, perc = value/Total * 100)
ggplot(DT2, aes(Name, perc, fill=variable)) +
geom_bar(position="stack",stat="identity") +
geom_text(data = subset(DT2, variable == "Part"), aes(y = (perc),
label = paste0("Total = ", Total, "\n",
"Part = ", value, "\n",
round(perc, 1), "%\n")))
If you use value instead of perc you will get a proportional bar chart, but since the total for Sample3 is a lot smaller than for Sample1, the chart is going to be difficult to read. So I decided to use percentages instead of the actual values.
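If the percentage label should sit in the middle of the Part segment, as the question asks, one option is position_stack(vjust = 0.5). The following is a sketch, not the answer's original code: the `long` data frame is built by hand here, and the Part0 segment is left unlabelled on purpose.

```r
library(ggplot2)

DT <- data.frame(
  Name  = c("Sample1", "Sample2", "Sample3"),
  Total = c(86020045, 30974095, 1520609),
  Part  = c(41348957, 2956650, 595121)
)
DT$perc <- DT$Part / DT$Total * 100

# Long format: one row per Name and segment (Part, and the remainder Part0)
long <- rbind(
  data.frame(Name = DT$Name, variable = "Part",  perc = DT$perc),
  data.frame(Name = DT$Name, variable = "Part0", perc = 100 - DT$perc)
)

p <- ggplot(long, aes(Name, perc, fill = variable)) +
  geom_col() +
  # vjust = 0.5 centers each label inside its own stacked segment
  geom_text(aes(label = ifelse(variable == "Part",
                               paste0(round(perc, 1), "%"), "")),
            position = position_stack(vjust = 0.5))
```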

How to plot the mean of a single factor in a barplot with

I'm having trouble to create a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors. I mean, for each "time" and "dose" I'm plotting two bars (two genotypes).
To be more specific, this is what I mean:
This is my code so far (I actually changed some settings, but I'm presenting just what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I intend to add the mean of each time using points and that these points are just in the middle of the bars of a certain time. How can I proceed?
I tried to add these points using geom_dotplot and geom_point but I did not succeed.
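library(dplyr)
library(ggplot2)
# give the summary column an explicit name so it can be referenced later
time_data <- data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data, time_data, by = "time")
This gives you data with the means attached. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
# geom_text() needs a label aesthetic; dodge it like the bars
geom_text(aes(label = round(mean_b, 1)), position = position_dodge(width = 0.9), vjust = 0)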
You might need to fiddle around with the argument hjust and vjust in the geom_text statement. Maybe the aes one too, I didn't run the program so I don't know.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)
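If points are preferred over a line, a possible variation on the faceted plot is to put one point per time panel at the middle dose. This is a sketch under assumptions: `timeMeans` is a made-up helper, the seed is arbitrary, and "middle" is taken as the midpoint of the dose range.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random example is repeatable
sampleData <- data.frame(
  dose     = 1:3,
  time     = rep(1:3, each = 3),
  genotype = rep(c("AA", "aa"), each = 9),
  b        = rnorm(18, 20, 5)
)

# Mean of b per time, placed at the center of the dose axis
timeMeans <- aggregate(b ~ time, data = sampleData, FUN = mean)
names(timeMeans)[2] <- "meanB"
timeMeans$dose <- mean(range(sampleData$dose))

p <- ggplot(sampleData, aes(x = dose, y = b)) +
  geom_col(aes(fill = genotype), position = "dodge") +
  # one point per facet: x comes from timeMeans$dose, y from meanB
  geom_point(data = timeMeans, aes(y = meanB), size = 3) +
  facet_wrap(~time)
```

Mapping fill only inside geom_col() keeps the point layer from needing a genotype column.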

Plotting a bar graph in R

Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy e.g. make it so that the order of x-axis labels goes from "Before" to "After", you can add this line: scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot function
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.var = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
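The reshaping step that all three answers rely on can also be sketched with tidyr::pivot_longer(). The "Before"/"After" row labels are borrowed from the first answer and are an assumption about what the two rows mean; `long` is a made-up name.

```r
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

# Label the two rows (assumed meaning) and fix their order on the x-axis
rest_change$change <- factor(c("Before", "After"),
                             levels = c("Before", "After"))

# Wide to long: one row per (change, column) pair
long <- pivot_longer(rest_change, cols = -change,
                     names_to = "key", values_to = "value")

p <- ggplot(long, aes(change, value)) +
  geom_col() +
  # free y scales so the small POS values stay readable
  facet_wrap(~key, scales = "free_y")
```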

jitter geom_line()

Is there a way to jitter the lines in geom_line()? I know it kinda defies the purpose of this plot, but if you have a plot with few lines and would like them all to show it could be handy. Maybe some other solution to this visibility problem.
Please see below for code,
A <- c(1,2,3,5,1)
B <- c(3,4,1,2,3)
id <- 1:5
df <- data.frame(id, A, B)
# install.packages("reshape2")
require(reshape2) # for melt
dfm <- melt(df, id=c("id"))
# install.packages("ggplot2")
require(ggplot2)
p1 <- ggplot(data = dfm, aes(x = variable, y = value, group = id,
color = as.factor(id))) +
geom_line() +
labs(x = "id # 1 is hardly visible as it is covered by id # 5") +
scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'black'))
p2 <- ggplot(subset(dfm, id != 5), aes(x = variable, y = value,
group = id, color = as.factor(id))) +
geom_line() +
labs(x = "id # 5 removed, id # 1 is visible") +
scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'black'))
# install.packages("gridExtra")
require(gridExtra) # for grid.arrange
grid.arrange(p1, p2)
You can try
geom_line(position=position_jitter(width=0.02, height=0))
and see if that works well.
If you just want to prevent two lines from overlapping exactly, there is now a better way: position_dodge(), which "adjusts position by dodging overlaps to the side". This is nicer than adding jitter to any line, even when it's not needed.
Avoid ggplot2 lines overlapping exactly using position_dodge()
Code example:
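library(reshape2) # for melt
library(ggplot2)
df <- data.frame(x=1:10, y=1:10, z=1:10)
df.m <- melt(df, id.vars = "x")
# the + must end the line; a line starting with + is parsed as a new statement
ggplot(df.m, aes(x=x, y=value, group=variable, colour=variable)) +
geom_line(position=position_dodge(width=0.2))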
Thanks to position_dodge(), we can now see that there are two lines in the plot, which just happen to coincide exactly.
I tend to use different linestyles, so that, say, a solid blue line "peeks through" a dashed red line on top of it.
Then again, it does depend on what you want to impart to the reader. Keep in mind first and foremost that data should be points and theory lines unless this makes things cluttered. Unless the y and x values are identical, it'll be easier to see the points. (or you could apply the existing jitter function to the x-values)
Next, if you just want to show which runs are in the "bundle" and which are outliers, overlap doesn't matter because it's very unlikely that two outliers will be near-equal.
If you want to show a bunch of near-equal runs, you may prefer (which is to say, your readers will understand better) to plot the deltas against a mean rather than the actual values.
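The line style idea can be sketched on the question's data by mapping linetype to id alongside colour. This is an illustration only: the long data frame is built by hand here instead of with melt(), and which line ends up on top depends on drawing order.

```r
library(ggplot2)

df <- data.frame(id = factor(1:5),
                 A  = c(1, 2, 3, 5, 1),
                 B  = c(3, 4, 1, 2, 3))

# Long format built by hand (same shape as the melt() result in the question)
dfm <- data.frame(
  id       = rep(df$id, times = 2),
  variable = rep(c("A", "B"), each = nrow(df)),
  value    = c(df$A, df$B)
)

# ids 1 and 5 share the same values; a dashed line drawn on top of a
# solid one lets the lower line show through the gaps
p <- ggplot(dfm, aes(x = variable, y = value, group = id,
                     colour = id, linetype = id)) +
  geom_line()
```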
I would like to suggest a solution to a different problem than described, in which the Y axis is a factor, so position_dodge does nothing.
code:
library(tidyverse)
time_raw <- tibble(year=1900:1909,
person_A=c(rep("Rome",2),rep("Jerusalem",8)),
person_B=c(rep("Jerusalem",5),rep("Rome",5)))
achievements <- tribble(~year,~who,~what,
1900,"person_A","born",
1900,"person_B","born",
1909,"person_A","died",
1909,"person_B","died",
1905,"person_A","super star",
1905,"person_B","super star")
SCALE=0.5
jitter_locations <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(place)%>%
filter(!is.na(place)) %>%
mutate(y_place=seq_along(place))
jitter_lines <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(who) %>%
mutate(y_jitter=as.numeric(scale(seq_along(who)))*0.015) # scale() returns a matrix, so convert to a plain vector
data_for_plot <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
filter(!is.na(place)) %>%
left_join(achievements) %>%
left_join(jitter_locations) %>%
left_join(jitter_lines)
data_for_plot %>%
ggplot(aes(x=year,y=y_place+y_jitter,color=who,group=who))+
geom_line(size=2)+
geom_hline(aes(yintercept=y_place),size=50,alpha=0.1)+
geom_point(data = . %>% filter(!is.na(what)),size=5)+
geom_label(aes(label=what),size=3,nudge_y = -0.025)+
theme_bw()+
coord_cartesian(ylim = c(min(jitter_locations$y_place)-0.5*SCALE,
max(jitter_locations$y_place)+0.5*SCALE))+
scale_y_continuous(breaks =
min(jitter_locations$y_place):max(jitter_locations$y_place),
labels = jitter_locations$place)+
scale_x_continuous(breaks =
min(data_for_plot$year):max(data_for_plot$year))+
ylab("Place")
