barplot 2 bars one stacked the other not - r

Despite some similar questions and my own research, I cannot seem to solve my little problem. Please forgive me if the answer is very easy and I am being silly. I have a data frame:
df<-data.frame(X = c("Germany", "Chile","Netherlands","Papua New Guinea","Cameroon"), R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000))
I would like to produce a barplot with 2 bars (country names on the x-axis, amounts on the y-axis).
The left bar should show Germany, the right one should be stacked with the remaining 4 countries.
Please help and Thank you very much in advance!

One way to solve this is by using ggplot2 and a little bit of manipulating your data frame.
First, add a column to your data frame that indicates which bar a country should be plotted in (Germany or Not-Germany):
df$bar <- ifelse(df$X == "Germany", 1, 0)
Now, create the plot:
library(ggplot2)
ggplot(data = df, aes(x = factor(bar), fill = factor(X), y = R_bar_Ger)) +
  geom_bar(stat = "identity") +
  scale_y_sqrt() +
  labs(x = "Country Group", title = "Square Root Scale", fill = "Country") +
  scale_x_discrete(labels = c("Not Germany", "Germany"))
Note that if you're not familiar with ggplot2, only the ggplot() and geom_bar() lines are necessary for creating the plot; the others are there to make it look nice. Since Germany is orders of magnitude larger than your other countries, this isn't going to look very good without some sort of scaling. ggplot2 has a number of built-in scaling commands that might be worth exploring. Here, I've added the square root scale so you can see that the non-Germany countries actually do get stacked as desired.
The documentation for ggplot2 bar charts can be found here - it's definitely worth a read if you're looking for a powerful plotting tool.
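One caveat about stacking on a transformed axis, offered as my understanding rather than a definitive statement: ggplot2 applies scale transformations such as scale_y_sqrt() before the bars are stacked, so the height of the stacked bar corresponds to the sum of the square roots, not the square root of the sum. The arithmetic below only illustrates how different those two numbers are for the four smaller countries.

```r
# Amounts for the four non-Germany countries, copied from the question
others <- c(620000, 550000, 400000, 320000)

# Height of the stacked bar if each segment is transformed first, then stacked
stacked_after_transform <- sum(sqrt(others))

# Height the total would have if it were transformed as a single value
transform_of_total <- sqrt(sum(others))

print(c(stacked_after_transform, transform_of_total))
```

The stacked bar is therefore useful for seeing the segments, but its total height should not be read off the axis as a country total.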

There are a number of ways to skin a cat, and your exact question will often change as you learn new tools. I probably wouldn't have set the problem specification up this way, but sticking as close to your data and barplot as possible, one way to achieve what I think you want is:
with(aggregate(R_bar_Ger ~ X=="Germany", data=df, sum), barplot(R_bar_Ger, names.arg=c("Other", "Germany")))
So what we're doing here is aggregating Germany and non-Germany figures by addition, and then passing those values to the barplot function along with sensible x-axis labels.
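Note that the one-liner above draws two plain bars, with "Other" shown as a single total. If the individual countries should stay visible as stacked segments in base graphics, one possible sketch is to pass barplot() a matrix with one row per country and one column per bar. The name heights is just an illustrative choice.

```r
df <- data.frame(
  X = c("Germany", "Chile", "Netherlands", "Papua New Guinea", "Cameroon"),
  R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000)
)

# One row per country, one column per bar; a zero means "no segment in this bar"
is_ger <- df$X == "Germany"
heights <- cbind(
  Germany = ifelse(is_ger, df$R_bar_Ger, 0),
  Other   = ifelse(is_ger, 0, df$R_bar_Ger)
)
rownames(heights) <- df$X

# barplot() stacks the rows of a matrix within each column by default
barplot(heights, legend.text = rownames(heights), ylab = "Amount")
```

On a linear axis the "Other" bar will still be almost invisible next to Germany, so the scaling issue raised in the other answer applies here too.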

You'll need to add an additional column to your data first:
df$group <- ifelse(df$X=="Germany","Germany","Other")
Then we can use the following ggplot2 approach:
library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases; ggplot() + geom_col() draws the same stacked bars
ggplot(df, aes(x = factor(group), y = R_bar_Ger, fill = factor(X))) + geom_col()

Related

Making a geom_bar from a dataframe in R

Background
I have a dataframe, df, of athlete injuries:
df <- data.frame(number_of_injuries = c(1,2,3,4,5,6),
number_of_people = c(73,52,43,12,7,2),
stringsAsFactors=FALSE)
The Problem
I'd like to use ggplot2 to make a bar chart or histogram of this simple data using geom_bar or geom_histogram. Important point: I'm pretty novice with ggplot2.
I'd like something where the x-axis shows bins of the number of injuries (number_of_injuries), and the y-axis shows the counts in number_of_people. Like this (from Excel):
What I've tried
I know this is the most trivial dang ggplot issue, but I keep getting errors or weird results, like so:
ggplot(df, aes(number_of_injuries)) +
geom_bar(stat = "count")
Which yields:
I've been in the tidyverse reference website for an hour at this and I can't crack the code.
This causes confusion from time to time. If you already have the counts, do not count the data again with geom_bar(stat = "count"); otherwise you simply get 1 in every category. You want to plot those values as they are, with geom_col:
ggplot(df, aes(x = number_of_injuries, y = number_of_people)) + geom_col()
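To see why the counting stat returns 1 everywhere, it can help to tally the rows by hand. The sketch below uses base R only. The expanded data frame long (one row per person) is a hypothetical object built just for this illustration and is not part of the question.

```r
df <- data.frame(number_of_injuries = c(1, 2, 3, 4, 5, 6),
                 number_of_people   = c(73, 52, 43, 12, 7, 2))

# What a counting stat tallies: rows per x value. Each value occurs exactly once.
rows_per_value <- table(df$number_of_injuries)
print(rows_per_value)

# A counting stat only gives the desired heights when there is one row per person.
long <- df[rep(seq_len(nrow(df)), times = df$number_of_people), , drop = FALSE]
people_per_value <- table(long$number_of_injuries)
print(people_per_value)
```

In other words, geom_bar() with its default stat suits data shaped like long, while geom_col() suits the already summarized df.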

Plotting lines between two points in ggplot2

I'm looking for a way to represent a vector coming off of a point given angle and magnitude in ggplot. I've calculated what the endpoint of these vectors should be, but can't figure out a way to plot this properly in ggplot2. In short, given an observation with (X,Y,vec.x,vec.y), how can I plot a line from (X,Y) to (vec.x,vec.y) that does not show (vec.x,vec.y)?
My first instinct was to use geom_line, but this seems to rely on connecting different observations, so I would need to separate each observation into two observations, one with the original point and one with the vector endpoint. However, this seems fairly messy and like there should be a cleaner way to achieve this. Furthermore, this would make it complicated to show the original points but hide the vector points, as they would be plotted within the same geom_point call.
Here's a sample dataset in the form I'm talking about:
test <- tibble(
x = c(1,2,3,4,5),
y = c(5,4,3,2,1),
vec.x = c(1.5,2.5,3.5,4.5,5.5),
vec.y = c(4,3,2,1,0)
)
test %>%
ggplot() +
geom_point(aes(x=x,y=y),color='red') +
geom_point(aes(x=vec.x,y=vec.y),color='blue')
What I'm hoping to achieve is this, but without the blue dots:
Any thoughts? Apologies if this is a duplicated issue. I did some Googling and was unable to find a similar question for ggplot.
test %>%
  ggplot() +
  geom_point(aes(x=x,y=y),color='red') +
  # omit the next layer if the endpoints should not be shown
  geom_point(aes(x=vec.x,y=vec.y),color='blue') +
  geom_segment(
    aes(x = x,y = y, xend = vec.x,yend = vec.y),
    arrow = arrow(length = unit(0.03,units = "npc")),
    size = 1
  )
Reference: https://ggplot2.tidyverse.org/reference/geom_segment.html
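The question mentions starting from an angle and a magnitude. A minimal sketch of computing the segment endpoints from those is shown below. The column names angle and magnitude, the data frame obs, and the convention that angles are in radians measured counter-clockwise from the x-axis are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical input: a start point plus an angle (radians) and a magnitude
obs <- data.frame(
  x = c(1, 2, 3),
  y = c(5, 4, 3),
  angle = c(0, pi / 2, pi),
  magnitude = c(1, 2, 0.5)
)

# Endpoint = start point + magnitude * unit vector in the direction of 'angle'
obs$vec.x <- obs$x + obs$magnitude * cos(obs$angle)
obs$vec.y <- obs$y + obs$magnitude * sin(obs$angle)

# Only the start points get a geom_point layer, so the endpoints stay hidden
p <- ggplot(obs) +
  geom_point(aes(x = x, y = y), color = "red") +
  geom_segment(aes(x = x, y = y, xend = vec.x, yend = vec.y),
               arrow = arrow(length = unit(0.03, "npc")))
print(p)
```

If the angles are stored in degrees, they would need converting first (multiply by pi / 180).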

Clustered bar chart R using 2 Numeric Variables/Metrics

I want to create a clustered Bar chart in R using 2 numeric variables, e.g:
Movie Genre (X-axis) and Gross$ + Budget$ should be Y-axis
It's a very straightforward chart to create in Excel. However, in R, I have put Genre in my X-axis and Gross$ in Y-axis.
My question is: Where do I need to put another Numeric variable ie Budget$ in my code so that the new Budget$ will be visible beside Gross$ in the chart?
Here is my Code:
ggplot(data=HW, aes(reorder(x=HW$Genre,-HW$Gross...US, sum),
y=HW$Gross...US))+
geom_col()
P.S. In aes I have just put reorder to sort the categories.
Appreciate help!
Could you give us some data so we can recreate it?
I think you are looking for geom_bar() and one of its options, position="dodge", which tells ggplot to put the bars side by side. But without knowing your data and its structure I can't further help you.
Melting the dataset should help in this case. A dummy-data based example below:
Data
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = TRUE)],
                 Gross...US = rnorm(100, 1e6, sd=1e5),
                 Budget...US = rnorm(100, 1e5, sd=1e4))
Code
library(tidyverse)
library(reshape2)
HW %>%
  melt(id.vars = "Genre") %>%   # long format: one row per Genre/variable/value
  ggplot(aes(Genre, value, fill=variable)) + geom_col(position = 'dodge')
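Since the dummy data has many rows per genre, the dodged bars above are, as far as I can tell, drawn on top of each other within each genre rather than summed. If per-genre totals are what is wanted (the question's reorder() call uses sum), one option is to aggregate first. The sketch below does this in base R and draws the clustered bars with barplot(beside = TRUE). The names totals and m are illustrative, and set.seed() is there only to make the dummy data reproducible.

```r
set.seed(1)
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = TRUE)],
                 Gross...US = rnorm(100, 1e6, sd = 1e5),
                 Budget...US = rnorm(100, 1e5, sd = 1e4))

# Sum both measures per genre, giving one row per genre
totals <- aggregate(cbind(Gross...US, Budget...US) ~ Genre, data = HW, FUN = sum)

# Order genres by descending total gross, as the question's reorder() call intends
totals <- totals[order(-totals$Gross...US), ]

# Rows = measures, columns = genres; beside = TRUE draws them side by side
m <- t(as.matrix(totals[, c("Gross...US", "Budget...US")]))
colnames(m) <- totals$Genre
barplot(m, beside = TRUE, legend.text = rownames(m), xlab = "Genre")
```

The aggregated totals data frame could equally be melted and passed to the ggplot2 code above.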

Using ggplot2, how can I label the x axis by only one of the two factors used to create a grouped boxplot?

My question is simple, I think, but Googling and fussing have gotten me nowhere. Hopefully, someone here can help. I have data that was collected from two different treatment groups over three years. I want to generate a boxplot for each combination of treatment and year (to see if the treatment effect varied by year). However, I can't figure out how to label the x axis the way I want to using ggplot2.
Here's some code that creates some faked data kind of similar to my own.
fakeX = factor(rep(c(0,1), times=60))
fakeY = factor(rep(c(1,2,3), each=40))
fakeZ = round(runif(120, 0, 12), digits=1)
df1 = data.frame(fakeX, fakeY, fakeZ)
df1$inter = interaction(df1$fakeX, df1$fakeY)
ggplot(df1, aes(x=inter, y = fakeZ)) +
stat_boxplot(geom='errorbar', lwd=1.75) +
geom_boxplot(outlier.size=5, lwd = 1.75, fatten=1.25,
fill = rep(c("dim gray", "ivory3"), times=3), aes(fill=fakeY)) +
scale_x_discrete(labels = c("2013", "", "2014", "", "2015", ""))
This code produces a set of boxplots like those I'm looking for:
So far so good. However, as you can see, I have treatments labeled by color already (the figure caption will explain which is which), so I just need my axis labels to say which year each set of two boxplots is from. However, I can't figure out the code that will get the year labels centered exactly in between the two boxplots each is for, which is what I think would be the most clear thing to do. Right now, the best I've been able to do is put the year label with the first boxplot from each year. I've tried using the breaks argument inside of scale_x_discrete, to no avail. I've also tried passing a position=position_nudge() call to scale_x_discrete, but that didn't work either. If I add spaces in front of my axis labels, then I can fake what I'm looking for, but I'd have to use trial and error to get the number of spaces needed exactly right for each graph, and it seems like there has to be a better way. Any thoughts?
A more specific example than my comment above:
ggplot(df1,aes(x = fakeY,y = fakeZ,fill = fakeX)) +
stat_boxplot(geom = "errorbar",position = "dodge",lwd = 1.75) +
geom_boxplot(position = "dodge",lwd = 1.75,fatten = 1.25,show.legend = FALSE) +
scale_fill_manual(values = c("dim gray", "ivory3")) +
scale_x_discrete(labels = c('2013','2014','2015'))

Label selected percentage values inside stacked bar plot (ggplot2)

I want to put labels of the percentages on my stacked bar plot. However, I only want to label the largest 3 percentages for each bar. I went through a lot of helpful posts on SO (for example: 1, 2, 3), and here is what I've accomplished so far:
library(ggplot2)
groups<-factor(rep(c("1","2","3","4","5","6","Missing"),4))
site<-c(rep("Site1",7),rep("Site2",7),rep("Site3",7),rep("Site4",7))
counts<-c(7554,6982, 6296,16152,6416,2301,0,
20704,10385,22041,27596,4648, 1325,0,
17200, 11950,11836,12303, 2817,911,1,
2580,2620,2828,2839,507,152,2)
tapply(counts,site,sum)
tot<-c(rep(45701,7),rep(86699,7), rep(57018,7), rep(11528,7))
prop<-sprintf("%.1f%%", counts/tot*100)
data<-data.frame(groups,site,counts,prop)
ggplot(data, aes(x=site, y=counts,fill=groups)) + geom_bar()+
stat_bin(geom = "text",aes(y=counts,label = prop),vjust = 1) +
scale_y_continuous(labels = percent)
I wanted to insert my output image here but don't seem to have enough reputation...But the code above should be able to produce the plot.
So how can I only label the largest 3 percentages on each bar? Also, for the legend, is it possible for me to change the order of the categories? For example put "Missing" at the first. This is not a big issue here but for my real data set, the order of the categories in the legend really bothers me.
I'm new on this site, so if there's anything that's not clear about my question, please let me know and I will fix it. I appreciate any answer/comments! Thank you!
I did this in a sort of hacky manner. It isn't that elegant.
Anyways, I used the plyr package, since the split-apply-combine strategy seemed to be the way to go here.
I recreated your data frame with a variable perc that represents the percentage for each site. Then, for each site, I just kept the 3 largest values for prop and replaced the rest with "".
# I added some variables, and added stringsAsFactors=FALSE
data <- data.frame(groups, site, counts, tot, perc=counts/tot,
prop, stringsAsFactors=FALSE)
# Load plyr
library(plyr)
# Split on the site variable, and keep all the other variables (is there an
# option to keep all variables in the final result?)
data2 <- ddply(data, ~site, summarize,
groups=groups,
counts=counts,
perc=perc,
prop=ifelse(perc %in% sort(perc, decreasing=TRUE)[1:3], prop, ""))
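# I changed some of the plotting parameters.
# Written for current ggplot2: geom_col() plots the values as given, geom_text()
# with position_stack() centres each label in its segment, and percent comes from scales.
library(ggplot2)
ggplot(data2, aes(x=site, y=perc, fill=groups)) + geom_col() +
  geom_text(aes(label = prop), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)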
EDIT: Looks like your scales are wrong in your original plotting code. It gave me results with 7500000% on the y axis, which seemed a little off to me...
EDIT: I fixed up the code.
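The question also asks about the legend order, which the answer above does not cover. For a discrete fill, ggplot2 generally lists legend entries in the order of the factor levels, so one way to move "Missing" to the front is to reorder the levels before plotting. A small sketch follows; groups_reordered is an illustrative name.

```r
groups <- factor(rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4))

# Default level order is alphabetical, which puts "Missing" last
print(levels(groups))

# relevel() moves the chosen level to the first position and keeps the rest in order
groups_reordered <- relevel(groups, ref = "Missing")
print(levels(groups_reordered))
```

Using the reordered factor as the fill variable should change both the legend order and the order in which the segments are stacked. For an arbitrary order, factor(groups, levels = ...) with the full list of levels works the same way.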

Resources