R ggplot horizontal bars

I'm having a really hard time doing something that is probably quite simple. I read several posts here but couldn't find anything similar to what I need.
I have the following dataframe:
sector <- c("tech", "energy", "retail", "gaming")
curr_sales <- c(10, 18, 15, 7)
avg_sales <- c(8.2, 20.1, 25.0, 4.1)
df <- data.frame(sector, curr_sales, avg_sales)
df$sector <- as.character(df$sector)
My initial goal was to create a plot with horizontal bars, with the sector on the y axis, the current sales (curr_sales) on the x axis, and the bars sorted by current sales.
The following code so far helps to achieve this goal:
ggplot(df, aes(x = reorder(sector, curr_sales), y = curr_sales)) +
geom_bar(stat = "identity") +
coord_flip()
Goal: at this point, I need a way to display the average sales value for each sector (that is, for each horizontal bar). I was hoping to achieve this without adding a second bar for each sector, using instead a marker or a line that makes it easy to see where the average sales sits relative to the current sales value for each sector.
I couldn't find any similar example and any suggestions would be much appreciated.
Thanks

It sounds like you might be able to do this with two geom_bar layers, each with a different y aesthetic. Something like:
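library(ggplot2)
ggplot(df) +
  # fill must be mapped inside aes(); outside it, the sector names would be read as colour names
  geom_bar(stat = "identity", aes(x = reorder(sector, curr_sales), y = curr_sales, fill = sector)) +
  # transparent second bar with a black outline marks avg_sales
  geom_bar(stat = "identity", aes(x = reorder(sector, curr_sales), y = avg_sales), alpha = 0, color = "black") +
  coord_flip() +
  scale_fill_manual(values = c("energy" = "red", "gaming" = "blue", "retail" = "orange", "tech" = "green"))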
And you could play with the second bar to get the exact effect you are looking for (in my example it is transparent with a black outline). This example also has colors.
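If a second bar is more than you need, one possible alternative is to draw the average as a thin marker on top of each bar. This is only a sketch, assuming the df from the question; using geom_errorbar with ymin and ymax both set to avg_sales (so that it collapses to a single tick) is my own choice here, not something taken from the answer above.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  sector = c("tech", "energy", "retail", "gaming"),
  curr_sales = c(10, 18, 15, 7),
  avg_sales = c(8.2, 20.1, 25.0, 4.1)
)

p <- ggplot(df, aes(x = reorder(sector, curr_sales), y = curr_sales)) +
  geom_bar(stat = "identity", fill = "grey70") +
  # ymin == ymax collapses the error bar into a single line at avg_sales
  geom_errorbar(aes(ymin = avg_sales, ymax = avg_sales),
                width = 0.6, color = "red") +
  coord_flip() +
  labs(x = "sector", y = "sales")
p
```

The width argument controls how far the tick extends across the bar, so a value close to the bar width gives a line spanning the whole bar.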

Related

I want to add a total at the top of my bar chart using ggplot2 in R. When I try, I'm getting the value of each row instead of the total

I'm fairly new to R and have created a bar chart using
ggplot(df_mountainorframemodelsales, aes(x = FrameOrMountain, y = salestotal, fill= FrameOrMountain)) +
geom_bar(stat = "identity")+
ggtitle("Frame and Mountain Bikes Total Sales")
Which gives me a standard chart with two columns. I want to add a total at the top of each column; however, when I try, I get a big cluster of numbers at the bottom of the chart. I think it is labelling each individual row rather than the sum of them all together. How do I get the total for all mountains and all frames?
This is the code I've tried
ggplot(df_mountainorframemodelsales, aes(x = FrameOrMountain, y = salestotal, fill= FrameOrMountain)) +
geom_bar(stat = "identity") +
geom_text(aes(label=salestotal), vjust=0) +
ggtitle("Frame and Mountain Bikes Total Sales")
I've tried searching for answers but all the other problems are with stacked bar charts. Can anyone help please?
You can change the data= used by specific geoms, such as
ggplot(mtcars, aes(x = cyl, y = disp, fill = cyl)) +
geom_bar(stat = "identity") +
geom_text(aes(label = disp), vjust = 0,
data = ~ aggregate(disp ~ cyl, data = ., FUN = sum)) +
ggtitle("Total Displacement by Number of Cylinders")
This highlights that you may want to expand= the y-axis to allow room for the top labels. Alternatively, you can change vjust=1.2 or some number over 1 so that it is enclosed within the bar, though this will have problems when bars have extremely low totals (so I think expand= with vjust=0 is safer).
(I'm not saying this is an awesome plot: showing cyl as a discrete rather than a continuous variable would make a lot of sense, and there are perhaps other changes that would make this comparison better. I left those out to stay as close to your original code as possible.)
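An alternative to the per-layer data= is to compute the totals first and plot those. A minimal sketch, again on mtcars; the 10% headroom via expansion() and the use of factor(cyl) are my own choices for illustration, not part of the answer above.

```r
library(ggplot2)

# One row per cyl, holding the summed displacement
totals <- aggregate(disp ~ cyl, data = mtcars, FUN = sum)

p <- ggplot(totals, aes(x = factor(cyl), y = disp, fill = factor(cyl))) +
  geom_col() +
  # vjust below 0 lifts the label slightly above the top of the bar
  geom_text(aes(label = disp), vjust = -0.3) +
  # add 10% space above the tallest bar so the labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = "cyl", fill = "cyl") +
  ggtitle("Total Displacement by Number of Cylinders")
p
```

Because the plotted data frame already has one row per bar, geom_text prints exactly one label per bar.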

what are these gray lines inside the bars of my ggplot bargraph?

I wanted to create a graph to show which continent has the highest co2 emissions total per capita over the years,
so I coded this:
barchart <- ggplot(data = worldContinents,
                   aes(x = Continent,
                       y = `Per.Capita`,
                       colour = Continent)) +
  geom_bar(stat = "identity")
barchart
This is my dataframe:
Geometry is just for some geographical mapping later.
I checked which(is.na(worldContinents$Per.Capita)) to see whether there were NA values but it returned nothing
What's causing these gray lines?
How do I get rid of them?
These are the gray lines inside the bar graph
Thank you
You have a couple of issues here. First of all, I'm guessing you want the fill to be mapped to the continent, not color, which only controls the color of the bars' outlines.
Secondly, there are multiple values for each continent in your data, so they are simply stacking on top of each other. This is the reason for the lines in your bars, and is probably not what you want. If you want the average value per capita in each continent, you either need to summarise your data beforehand or use stat_summary like this:
barchart <- ggplot(data = worldContinents,
aes(x = Continent,
y = `Per.Capita`,
fill = Continent)) +
stat_summary(geom = "col", fun = mean, width = 0.7,
col = "gray50") +
theme_light(base_size = 20) +
scale_fill_brewer(palette = "Spectral")
barchart
Data used
Obviously, we don't have your data, so I used a modified version of the gapminder data set to match your own data structure
worldContinents <- gapminder::gapminder
worldContinents$Per.Capita <- worldContinents$gdpPercap
worldContinents$Continent <- worldContinents$continent
worldContinents <- worldContinents[worldContinents$year == 2007,]
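For the other option mentioned above, summarising the data beforehand, a rough sketch could look like the following. The small data frame is a made-up stand-in for worldContinents (the values are hypothetical), and aggregate() is just one of several ways to get one row per continent.

```r
library(ggplot2)

# Toy stand-in for worldContinents (hypothetical values)
worldContinents <- data.frame(
  Continent = c("Asia", "Asia", "Europe", "Europe", "Africa", "Africa"),
  Per.Capita = c(4, 6, 7, 9, 1, 3)
)

# One row per continent, so nothing is stacked inside a bar
continent_means <- aggregate(Per.Capita ~ Continent,
                             data = worldContinents, FUN = mean)

p <- ggplot(continent_means,
            aes(x = Continent, y = Per.Capita, fill = Continent)) +
  geom_col(width = 0.7)
p
```

With a single value per continent there are no internal segment boundaries left to draw, which is what removes the lines.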

how to plot percentage instead of count, in facet_grid graph?

I am having a hard time plotting percentage instead of count when using facet_grid.
I have the following DF (this is an example, my DF is much longer):
Gu <- c("1", "0", "0", "0", "1", "0")
variable <- c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal")
value <- c("0", "1", "0", "1", "0", "0")
df2 <- data.frame(Gu, variable, value)
and I am trying to plot the "1" values out of the specific variable (either THR or Screw removal) and split the graph by "Gu" (facet grid).
I managed to code it to plot the count, but I can't seem to calculate the percentage (I need the percentage within each variable only, not out of the whole DF).
This is my code:
ggplot(data = df2, aes(x = variable, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_grid(~ Gu, labeller = labeller(
    Gu = c('0' = "Nondisplaced fracture", '1' = "Displaced fracture"))) +
  scale_fill_discrete(name = "Revision", labels = c("THR", "SCREW"))
and this is what I plotted:
I searched this website and the web and couldn't find an answer...
any help will do!
thanks
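One way to read this is that each bar should show the percentage of "1" values within a given variable and Gu group. Under that assumption (it may not be exactly what is wanted), a possible sketch is to compute the percentages first and then plot them with geom_col. The pct data frame and the y-axis label are names I made up for the example.

```r
library(ggplot2)

df2 <- data.frame(
  Gu = c("1", "0", "0", "0", "1", "0"),
  variable = c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal"),
  value = c("0", "1", "0", "1", "0", "0")
)

# value is stored as text; convert it to 0/1 numbers first
df2$value <- as.numeric(as.character(df2$value))

# The mean of a 0/1 column is the share of 1s; one row per Gu x variable
pct <- aggregate(value ~ Gu + variable, data = df2,
                 FUN = function(v) 100 * mean(v))

p <- ggplot(pct, aes(x = variable, y = value, fill = variable)) +
  geom_col() +
  facet_grid(~ Gu, labeller = labeller(
    Gu = c("0" = "Nondisplaced fracture", "1" = "Displaced fracture"))) +
  ylab("% with value 1")
p
```

Combinations that never occur in the data (here "Screw removal" with Gu "1") simply have no row in pct and therefore no bar.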

ggplot: why is the y-scale larger than the actual values for each response?

Likely a dumb question, but I cannot seem to find a solution: I am trying to graph a categorical variable on the x-axis (3 groups) and a continuous variable (a percentage from 0 to 100) on the y-axis. When I do so, I have to either set stat = "identity" in geom_bar or use geom_col.
However, the values still show up at 4000 on the y-axis, even after following the comments from Y-scale issue in ggplot and from Why is the value of y bar larger than the actual range of y in stacked bar plot?.
Here is how the graph keeps coming out:
I also double checked that the x variable is a factor and the y variable is numeric. Why would this still be coming out at 4000 instead of 100, like a percentage?
EDIT:
The y-values are simply responses from participants. I have a large dataset (N = 600) and the y-values are percentages from 0 to 100 given by each participant. So, in each group (N = 200 per group), I have one percentage per participant. I wanted to visually compare the three groups based on the percentages they gave.
This is the code I used to plot the graph.
df$group <- as.factor(df$group)
df$confid<- as.numeric(df$confid)
library(ggplot2)
plot <-ggplot(df, aes(group, confid))+
geom_col()+
ylab("confid %") +
xlab("group")
Are you perhaps trying to plot the mean percentage in each group? Otherwise, it is not clear how a bar plot could easily represent what you are looking for. You could perhaps add error bars to give an idea of the spread of responses.
Suppose your data looks like this:
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
confid = sample(40, 600, TRUE))
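Using your plotting code, we get very similar results to yours, because geom_col() stacks every row on top of the previous one, so each bar's height is the sum of the 200 responses in that group: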
library(ggplot2)
plot <-ggplot(df, aes(group, confid))+
geom_col()+
ylab("confid %") +
xlab("group")
plot
However, if we use stat_summary, we can instead plot the mean and standard error for each group:
ggplot(df, aes(group, confid)) +
stat_summary(geom = "bar", fun = mean, width = 0.6,
fill = "deepskyblue", color = "gray50") +
geom_errorbar(stat = "summary", width = 0.5) +
geom_point(stat = "summary") +
ylab("confid %") +
xlab("group")
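To see where the large y-values come from, it may help to compare the per-group sums with the per-group means directly. This is only an illustrative check on the simulated data above; group_sums and group_means are names introduced for the example.

```r
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# What the geom_col() bars add up to: all rows stacked, i.e. the group sum
group_sums <- aggregate(confid ~ group, data = df, FUN = sum)

# What stat_summary(fun = mean) draws instead: the group mean
group_means <- aggregate(confid ~ group, data = df, FUN = mean)

print(group_sums)
print(group_means)
```

With 200 rows per group, each sum is exactly 200 times the corresponding mean, which is why the stacked bars reach into the thousands even though no single response exceeds the original scale.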

How to change origin line position in ggplot bar graph?

Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers to show them their individual percentile ranking on each trait. Thus, the numbers go from 1 (percentile) to 99 (percentile). Given that a 50 is perfectly average, I'd like the graph to show bars going to the left or right from 50 as the origin line. In bar graphs in ggplot, it seems that the origin line defaults to 0. Is there a way to change the origin line to be at 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me - thanks @ulfelder and @nongkrong. However, I wanted to re-use the code on different data without having to manually adjust the tick labels to fit the new data. To do this in a way that retained ggplot's tick placement, I defined a tiny function and passed it to the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
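A different route, which avoids shifting the data at all, is to draw each bar as a thick segment that starts at 50 and ends at the score. This is a sketch under a few assumptions of mine: the seed is only there to make the fake data reproducible, the segment thickness is arbitrary, and the linewidth argument needs ggplot2 3.4.0 or later (older versions use size instead).

```r
library(ggplot2)

set.seed(1)  # only so the fake data is reproducible
df <- data.frame(
  names = LETTERS[1:10],
  factor = round(rnorm(10, mean = 50, sd = 20), 1)
)

p <- ggplot(df, aes(x = names)) +
  # each "bar" is a thick segment from the baseline (50) to the score
  geom_segment(aes(xend = names, y = 50, yend = factor), linewidth = 6) +
  # visible origin line at the 50th percentile
  geom_hline(yintercept = 50) +
  ylab("Percentile") +
  # ylim here zooms the percentile axis to 0-100 without dropping data
  coord_flip(ylim = c(0, 100))
p
```

Since the y values are left untouched, the axis shows the real percentiles and no relabeling function is needed.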
