How to visualize several parameters in a barplot - r

I am trying to make a graph where I have the amount on the y-axis (numeric) and office place (categorical) on the x-axis, sorted by region (categorical).
What I have tried to do:
My_df %>%
filter(Context == "Humanitarian") %>%
ggplot(aes(Office_abb, Award_USD)) +
geom_histogram(aes(color=Region, fill = Region)) +
theme(legend.position="bottom") +
ggtitle("Overview of office amount pr. region")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
My error message on the above script is:
Error: stat_bin() must not be used with a y aesthetic.
I would like it to be a stacked or dodged bar chart, but when I tried to add
geom_bar(position = "dodge2")
to the plot, it didn't work either.
It did work with geom_point, but that is not the type of visualization I want.
I am kind of lost on what to do now. I hope someone can help me move on from here! :-)
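One possible way forward, offered as a sketch rather than a confirmed fix: geom_histogram() and geom_bar() count rows by default, so they reject a y aesthetic, whereas geom_col() draws the y values as given and stacks them by default. The toy data frame below is made up for illustration; only the column names come from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; only the column names come from the question
My_df <- data.frame(
  Context    = c("Humanitarian", "Humanitarian", "Humanitarian", "Development"),
  Region     = c("Africa", "Africa", "Asia", "Asia"),
  Office_abb = c("KEN", "KEN", "NPL", "NPL"),
  Award_USD  = c(100, 250, 75, 500)
)

p <- My_df %>%
  filter(Context == "Humanitarian") %>%
  ggplot(aes(Office_abb, Award_USD, fill = Region)) +
  # geom_col() uses Award_USD as the bar height and stacks by default;
  # use geom_col(position = "dodge") for side-by-side bars instead
  geom_col() +
  ggtitle("Overview of office amount pr. region") +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 90, hjust = 1))
p
```

If the offices should also be grouped visually by region, adding facet_grid(~ Region, scales = "free_x", space = "free_x") is one option to try.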

Related

ggplot bar graph by percentages

I am trying to make a bar graph showing ages of first alcohol use by county by percent. I am not quite sure where the mistake is and would appreciate another set of eyes.
Data is publicly available here: https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2020-nsduh-2020-ds0001 although I have cleaned it on my computer.
The percentages are definitely not out of 100 and the numbers are not adjusting for population. They are the same as my chart showing raw numbers.
palc.age.ct<-data1.cleaned%>%
mutate(ALCTRY= na_if(x=ALCTRY, y="Never Used"))%>%
drop_na(ALCTRY)%>%
ggplot(aes(x=ALCTRY, fill=COUTYP4))+
geom_bar (position = "dodge") +
geom_bar(aes(y = (..count..)/sum(..count..)))+
scale_y_continuous(labels=scales::percent)+
theme_minimal()+
labs(title = "First Alcohol Use by Age and Locality",
x="Age Initiated", y="Number Initiated")+
scale_color_viridis(option = "D")
I'm not recreating everything you did, like labelling the bins, but based on the data you can do something like the code below. Note that you need to include position = "dodge" in the same geom_bar() call that calculates the percentage. Otherwise the calculation is done in a different geom than the one creating the grouped bars, which is the reason for your issue.
library(dplyr)
library(ggplot2)
NSDUH_2020 %>%
  select(alctry, COUTYP4) %>%
  # treat values above 66 as missing and turn the county type into a factor
  mutate(alctry = if_else(alctry > 66, NA_integer_, alctry),
         COUTYP4 = forcats::as_factor(COUTYP4)) %>%
  filter(!is.na(alctry)) %>%
  ggplot(aes(x = alctry, fill = COUTYP4)) +
  # compute the share and dodge the bars in the same layer
  geom_bar(aes(y = (..count..)/sum(..count..)), position = "dodge") +
  scale_y_continuous(labels = scales::label_percent(accuracy = .1)) +
  scale_x_binned()
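If the goal is a percentage within each county type rather than a share of all respondents, one alternative is to compute the shares with dplyr first and plot them with geom_col(). This is a sketch with a different denominator than the answer above, and the toy data frame is hypothetical; only the column names ALCTRY and COUTYP4 come from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the cleaned survey
toy <- data.frame(
  ALCTRY  = c(14, 14, 16, 16, 16, 18, 14, 18),
  COUTYP4 = c("Large metro", "Large metro", "Large metro", "Large metro",
              "Nonmetro", "Nonmetro", "Nonmetro", "Nonmetro")
)

pct <- toy %>%
  count(COUTYP4, ALCTRY) %>%        # rows per county type and age
  group_by(COUTYP4) %>%
  mutate(share = n / sum(n)) %>%    # share within each county type
  ungroup()

p <- ggplot(pct, aes(x = factor(ALCTRY), y = share, fill = COUTYP4)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
p
```

Because the shares are computed before plotting, the bars for each county type add up to 100% regardless of how many respondents that county type has.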

Adding extra space above graph with facet_wrap

I'm trying to use facet_wrap in R with 4 plots that have different x-axes. I used scales = "free", which was helpful, but the bars in most of the plots touch the top of the panel. I would like there to be space above the tallest bar, so it doesn't look like it's going off above the limit of the graph. I hope that makes sense - this is my first time posting a question here. I have provided the code that I have and a screenshot of one of the graphs. I haven't been able to find any sort of fix without using cowplot, which is a bit more advanced than I would like.
Code:
wrap_plot <- df %>% ggplot(aes(x=Race, y = Rate, fill = Sex))+
geom_col(position = "dodge")+
labs(x = "Race/Ethnicity",
y = "Rate per 100,000 population")+
theme_classic()+
facet_wrap(~Disease, scales = "free")+
scale_y_continuous(expand = c(0, 0))
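One option that stays within ggplot2, sketched here on made-up data: expand = c(0, 0) removes the padding at both ends of the y-axis, while expansion(mult = c(0, 0.1)) (available in ggplot2 3.3.0 and later) keeps the bars sitting on zero and adds 10% of each panel's range above the tallest bar. The data frame is hypothetical; only the column names come from the question.

```r
library(ggplot2)

# Hypothetical toy data using the column names from the question
df <- data.frame(
  Disease = rep(c("A", "B"), each = 4),
  Race    = rep(c("Group 1", "Group 2"), times = 4),
  Sex     = rep(c("Female", "Female", "Male", "Male"), times = 2),
  Rate    = c(10, 20, 15, 25, 100, 300, 200, 400)
)

wrap_plot <- ggplot(df, aes(x = Race, y = Rate, fill = Sex)) +
  geom_col(position = "dodge") +
  labs(x = "Race/Ethnicity",
       y = "Rate per 100,000 population") +
  theme_classic() +
  facet_wrap(~Disease, scales = "free") +
  # no padding below zero, 10% of each panel's range above the tallest bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
wrap_plot
```

Because the expansion is multiplicative, the headroom scales with each free panel, so it works even when the panels have very different maximum rates.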

How to remove or not show any of the data points above and below the error bar in boxplot and violin plot?

I'm working on a very large dataset containing around 1.6M data points. I'm using a violin plot along with a boxplot to represent the data from each category (there are multiple categories and each has its own set of values).
But the problem I'm facing is that there are a lot of data points (outliers) above the error bar, and because of that the focus of the plot is lost.
At first I thought that removing all the data points above a specific value would help me show what I wanted. But it didn't work, because the error bar range is different for each category, so I lost the majority of the data from other categories.
So now I'm thinking of removing, or not showing, the data points above the error bar for each category individually, for both the box and violin plots. I introduced outlier.shape=NA in geom_boxplot, and it worked fine for the boxplot. Similarly, I want to remove from the violin plot all the data points that are above the error bar in the boxplot.
Here are the plots before and after using outlier.shape=NA.
[Figure: before]
[Figure: after]
Here is my code :
med_violin <- data %>%
left_join(sample_size) %>%
mutate(myaxis = fct_reorder(paste0(Country), Diff, .fun='median')) %>%
ggplot( aes(x=myaxis, y=Diff, fill=Country)) +
geom_violin(width=1.5, color = "black", position = position_dodge(width=1.8), trim = TRUE) +
geom_boxplot(width=0.2, color="white", alpha=0.01, outlier.colour="red", outlier.size=0.1, outlier.shape = NA) +
scale_y_continuous(breaks = c(0,25,50,75,100,125,150,525,550))+
coord_trans(y = squash_axis(150, 525, 15)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
theme(axis.text.x = element_text(size = 8))+
theme(legend.position ="none")+
scale_fill_viridis(discrete = TRUE) +
xlab("")
med_violin
How can I implement the same thing in geom_violin, so that it also does not show the data points above the error bar?
I even tried this: Ignore outliers in ggplot2 geom_violin
But it did not work for me.
Thank you.
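One approach that might work, sketched under assumptions: geom_violin() has no outlier argument, so the data can be filtered per category to the usual boxplot whisker limits (1.5 times the IQR beyond the quartiles) before plotting. The toy data below is hypothetical; only the column names Country and Diff come from the question.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical toy data using the column names from the question
data <- data.frame(
  Country = rep(c("A", "B"), each = 200),
  Diff    = c(rexp(200, 1 / 10), rexp(200, 1 / 40))
)

# Keep only values inside each country's own whisker limits
trimmed <- data %>%
  group_by(Country) %>%
  filter(Diff >= quantile(Diff, 0.25) - 1.5 * IQR(Diff),
         Diff <= quantile(Diff, 0.75) + 1.5 * IQR(Diff)) %>%
  ungroup()

p <- ggplot(trimmed, aes(x = Country, y = Diff, fill = Country)) +
  geom_violin(trim = TRUE) +
  geom_boxplot(width = 0.2, outlier.shape = NA) +
  theme(legend.position = "none")
p
```

Note the trade-off: both the violin density and the boxplot are now estimated from the trimmed data, so the quartiles can shift slightly compared with a boxplot of the full data.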

Overlapping data on columns in a ggplot facet grid

Thanks in advance for humoring a complete newbie to R. I'm working with some data from the GSS for an online class, and I've created a ggplot facet grid. I'm sure I've done this in a super awkward, long way, but I'm trying to get the labels to not overlap each other and instead be centered on the columns.
Here's what I've got so far:
I've created a new dataset from the GSS with the variables 'conpress', 'sex', and 'news', which refer to confidence in the press, gender, and how often someone reads a newspaper. I wanted to get the percentages, not the counts, which is why I did the ..count.. stuff.
gss_press_full <- gss %>% select (conpress, news, sex)
gss_press_clean <-na.omit(gss_press_full)
ggplot(gss_press_clean, aes(x = conpress, y = (..count..)/sum(..count..), fill = sex)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = position_dodge()) + facet_grid(news~.) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = round((..count..)/sum(..count..), 2)), stat = "count", vjust = -0.25) +
labs(title = "Newspaper readership and Press confidence", y = "Percent", x = "Levels of confidence in the Press")
I have been googling for far too long and can't seem to find a way to adjust these labels atop the columns. It seems to be especially tricky since my y variable is being calculated in my ggplot creation, but again, like a complete novice, that was how I cobbled my way to the output. If someone has help on how to streamline this process, I'd appreciate that too!
(I hope I've included enough code to be helpful!)
Again, thanks for any help!
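A likely cause, offered as a sketch rather than a verified fix for this dataset: the bars are dodged but the text layer is not, so every label is drawn at the centre of its x category. Giving geom_text() the same position_dodge() width as the bars should line the labels up with their columns. The toy data is hypothetical, and after_stat(count) is the current spelling of ..count.. in ggplot2 3.3.0 and later.

```r
library(ggplot2)

# Hypothetical toy data using the variable names from the question
gss_press_clean <- data.frame(
  conpress = c("A Great Deal", "Hardly Any", "Hardly Any",
               "A Great Deal", "Hardly Any", "Hardly Any"),
  sex      = c("Male", "Male", "Female", "Female", "Female", "Male"),
  news     = c("Everyday", "Everyday", "Everyday", "Never", "Never", "Never")
)

p <- ggplot(gss_press_clean, aes(x = conpress, fill = sex)) +
  geom_bar(aes(y = after_stat(count / sum(count))),
           position = position_dodge(width = 0.9)) +
  geom_text(aes(y = after_stat(count / sum(count)),
                label = after_stat(round(count / sum(count), 2))),
            stat = "count", vjust = -0.25,
            # same dodge width as the bars, so labels sit above their columns
            position = position_dodge(width = 0.9)) +
  facet_grid(news ~ .) +
  labs(title = "Newspaper readership and Press confidence",
       y = "Percent", x = "Levels of confidence in the Press")
p
```

The text layer inherits fill = sex from the plot-level aes(), which is what lets position_dodge() split the labels into the same groups as the bars.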

geom_dotplot Gaps Between scale_x_discrete

I am trying to make a histodot using geom_dotplot. For some reason, ggplot is giving me what appear to me to be arbitrary breaks in my data along the x axis. My data has been binned with seq(0,22000000,500000), so I would like to see gaps in my data where they actually exist. However, I'm struggling to successfully list those breaks(). I wouldn't expect to see my first gap until after $6,000,000, with that gap lasting until $10,000,000. Bonus points for teaching me why I can't use labels=scales::dollar on my scale_x_discrete.
Here is my data.
library(ggplot2)
data <- read.csv("data2.csv")
ggplot(data,aes(x=factor(constructioncost),fill=designstage)) +
geom_dotplot(binwidth=.8,method="histodot",dotsize=0.75,stackgroups=T) +
scale_x_discrete(breaks=seq(0,22000000,500000)) +
ylim(0,30)
Thanks for any and all help and please, let me know if you have any questions!
Treating the x axis as continuous instead of a factor will give you what you need. However, as you experienced, the enormous range of your cost variable (0 to 21 million) makes ggplot2 choke when you try to treat it as continuous.
Because your values (other than 0) are all at least 500000, dividing the cost by 100000 will put things on a scale that ggplot2 can handle but also give you the variable spacing you want.
Note I had to play around with binwidth when I changed the scale of the data.
# x is in units of $100,000, so breaks every 5 units are every $500,000
ggplot(data, aes(x = constructioncost/100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
  scale_x_continuous(breaks=seq(0, 220, 5)) +
  ylim(0,30)
Then you can change the labels to reflect the whole dollar amounts if you'd like. The numbers are so big you'll likely need to either show fewer labels or change their orientation (or both).
# one label per break: 23 breaks, 23 dollar labels
ggplot(data, aes(x = constructioncost/100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
  scale_x_continuous(breaks = seq(0, 220, 10),
                     labels = scales::dollar(seq(0, 22000000, 1000000))) +
  ylim(0,30) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
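On the bonus question, a plausible explanation (an assumption; the answer above does not address it) is that factor(constructioncost) turns the costs into text labels, while scales::dollar expects numbers. With a continuous axis, the labeller can also undo the rescaling itself through the scale argument of scales::label_dollar(), so the breaks and labels cannot drift out of sync. The toy data below is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data using the column names from the question
data <- data.frame(
  constructioncost = c(0, 500000, 500000, 1500000, 6000000, 10000000, 21000000),
  designstage      = c("Concept", "Concept", "Schematic", "Schematic",
                       "Concept", "Schematic", "Concept")
)

# Labeller that multiplies each break by 1e5 before formatting it as dollars
dollar_from_100k <- scales::label_dollar(scale = 1e5)

p <- ggplot(data, aes(x = constructioncost / 100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method = "histodot", stackgroups = TRUE) +
  scale_x_continuous(breaks = seq(0, 220, 20), labels = dollar_from_100k) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

Passing a labelling function instead of a fixed vector of labels means the breaks can be changed later without having to regenerate a matching label vector.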
