ggplot isn't producing bars on a bar chart in R

I have used the following code:
ggplot(IncomeGroup_count,
       aes(x = Income_Group,
           y = Percentage,
           colour = 'lightblue'))
But all it produces is the x and y axes, with no bars.

It looks like you are missing a geom_bar() layer. ggplot() only sets up the plot: it takes your data and aesthetic mappings and draws the axes. The geom functions, such as geom_bar(), are the layers (added with +) that actually "draw" the data. Since you have not supplied your own data in your question, here is a bar plot built from a data set that ships with R.
#### Load Library ####
library(tidyverse)
#### Plot ####
airquality %>%
ggplot(aes(x=is.na(Ozone)))+
geom_bar()
This gives a bare-bones bar plot. For each row of the airquality data set, is.na(Ozone) is TRUE when the Ozone value is missing and FALSE otherwise, and geom_bar() draws one bar counting the rows in each case.
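To see what geom_bar() is counting, you can compare the bar heights with a direct tally. This sketch uses ggplot2's layer_data(), which returns the data ggplot2 computed for a layer (here including a per-bar count column), and base R's table() for the same numbers.

```r
library(ggplot2)

# geom_bar() uses stat_count(): the bar height is the number of rows in each x group
p <- ggplot(airquality, aes(x = is.na(Ozone))) + geom_bar()

# The counts ggplot2 computed for the first (and only) layer
bar_counts <- layer_data(p)$count

# The same tallies computed directly: FALSE = Ozone present, TRUE = Ozone missing
na_table <- table(is.na(airquality$Ozone))
na_table
```

If the two sets of numbers match, geom_bar() is doing exactly the row counting described above.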
Edit
Again, it's difficult to tell what is going wrong without your data, so, based on your comments below, I'm assuming you want to plot y values that you supply explicitly (such as a precomputed count) rather than have ggplot count the rows. As suggested in the comments, geom_col() is better suited to this. Here is a different example using the same data set:
airquality %>%
group_by(Month) %>%
summarise(Mean_Ozone = mean(Ozone, na.rm = TRUE)) %>%
ggplot(aes(x=Month,
y=Mean_Ozone))+
geom_col()
This draws one bar per month, with each bar's height equal to that month's mean Ozone value.
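Applied to the code in the question, a minimal sketch might look like the following. The data frame below is a hypothetical stand-in for IncomeGroup_count with invented values, since the real data was not shared. It also moves the fixed colour out of aes(): inside aes(), 'lightblue' is treated as a data value to map (producing a legend entry) rather than as a colour, and for bars fill sets the interior while colour only sets the outline.

```r
library(ggplot2)

# Hypothetical stand-in for IncomeGroup_count; values invented for illustration
IncomeGroup_count <- data.frame(
  Income_Group = c("Low", "Middle", "High"),
  Percentage = c(30, 50, 20)
)

p <- ggplot(IncomeGroup_count, aes(x = Income_Group, y = Percentage)) +
  # geom_col() uses the y values as bar heights; a fixed colour goes outside aes()
  geom_col(fill = "lightblue")
p
```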

Related

R ggplot only shows one bar

I am just starting to work with R, so apologies if my question is too basic.
I have an Excel sheet (here's the link: https://file.io/LfsAOdDCVnFq)
from which I am trying to make a simple bar plot as follows:
X: my sample names, i.e. the column called OTU ID in the file
Y: the sum of my variables for each sample, i.e. the column called Sum ZOTUs in the file
So far, I have installed and loaded ggplot2 and tried to plot my data frame, but it only shows one bar, and I don't know what is wrong:
install.packages("readxl")
install.packages("ggplot2")
library(readxl)
library(ggplot2)
ZOTU <- read_excel(file.choose())
ggplot(data=ZOTU, aes(x="OTU ID")) + geom_bar ()
and it shows the plot below.
Can anyone help me fix this?
Thanks
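I can't see your uploaded screenshot of the Excel sheet.
My guess is that you used quotation marks instead of backticks. Inside aes(), "OTU ID" is just a constant string, so every row gets the same x value and geom_bar() draws a single bar. Backticks instead refer to the column named OTU ID, whose name contains a space. Try running this code: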
ggplot(data = ZOTU, aes(x = `OTU ID`)) + geom_bar()
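To see the difference between quotes and backticks without the original file, here is a small sketch on a hypothetical data frame (sample IDs invented). check.names = FALSE keeps the space in the column name, as read_excel() does by default. layer_data() is the ggplot2 function that returns the computed data for a layer, so its row count is the number of bars.

```r
library(ggplot2)

# Hypothetical stand-in for the Excel sheet; check.names = FALSE keeps the space in "OTU ID"
ZOTU <- data.frame(`OTU ID` = c("S1", "S2", "S3", "S3"), check.names = FALSE)

# Quotes: x is the constant string "OTU ID" for every row, so there is one bar
quoted <- ggplot(ZOTU, aes(x = "OTU ID")) + geom_bar()

# Backticks: x refers to the OTU ID column, so there is one bar per distinct sample
backticked <- ggplot(ZOTU, aes(x = `OTU ID`)) + geom_bar()

nrow(layer_data(quoted))
nrow(layer_data(backticked))
```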
First
Your question could be formulated better; please read how to ask a good question and how to create a minimal reproducible example to understand what makes a question answerable.
In R, you have a very good tool for creating reproducible examples: the reprex package.
Also, I would not download anything from a link posted in a random Stack Overflow question, and neither should you.
Try
Execute this code on your computer and see if it helps you understand how ggplot works:
library(ggplot2) # load ggplot
mpg # let's look at the mpg data set included in ggplot2
# Now, a simple bar plot
ggplot(mpg, aes(x = fl)) +
geom_bar()
We use the mpg data as the data for our figure, and we set the x-axis to be the fl column of that data. Finally, we "add" a bar plot to the figure.
By default, geom_bar() plots the number of rows for each distinct value in the column you mapped to the x-axis.
After comments
Following our discussion in the comment section, maybe this is what you want.
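If you have the names (a discrete variable) for the x-axis in one column, and another column with the variable you want to sum and plot on y for each name, try the following. geom_col() stacks the y values of all rows that share a name, so with non-negative values each bar's height is their sum: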
ggplot(data = mpg) +
geom_col(aes(x = manufacturer, y = hwy))
You can check those totals with:
library(tidyverse)
mpg %>% group_by(manufacturer) %>% summarize(total = sum(hwy))
So, in your case, if you have a column with the names you want on the x-axis and another with the values you want summed for each name, use:
ggplot(data = your_data_frame) +
geom_col(aes(x = your_names, y = values_to_be_summed_for_each_name))
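Here is a sketch with the column names from the question, on a hypothetical data frame with invented values. It checks that the top of each stacked bar from geom_col() equals the per-sample sum computed with base R's tapply(). layer_data() is used to read the ymax values ggplot2 computed, and the comparison assumes non-negative values, since negative values stack downward.

```r
library(ggplot2)

# Hypothetical data using the asker's column names; values invented for illustration
ZOTU <- data.frame(
  `OTU ID` = c("S1", "S1", "S2"),
  `Sum ZOTUs` = c(10, 5, 7),
  check.names = FALSE
)

p <- ggplot(ZOTU) +
  geom_col(aes(x = `OTU ID`, y = `Sum ZOTUs`))

# geom_col() stacks rows that share an x value, so the largest ymax in each bar
# is the total for that sample (for non-negative values)
ld <- layer_data(p)
stack_tops <- tapply(ld$ymax, ld$group, max)

# The same totals computed directly
totals <- tapply(ZOTU$`Sum ZOTUs`, ZOTU$`OTU ID`, sum)
totals
```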

I want to create a bar graph that counts the number of times a variable occurs in the dataset

I have just started using R and am currently trying to create a bar graph that shows the number of times each "category" is used. The categories include things like Travel & Events and Sports.
I've tried a few things, but they produce errors:
barplot(freq, main = category) +geom_bar(stat=category)
Error in as.graphicsAnnot(main) : object 'category' not found
ggplot(data=dat, aes(category))
Error in ggplot(data = dat, aes(category)) : object 'dat' not found
The one time I got a graph to appear, it had no data in it, just a bunch of lines.
If I understand your question correctly, I think this is an example of what you are after. I used the starwars data set, which comes with dplyr and is available after loading the tidyverse:
# load the tidyverse
library(tidyverse)
# take a look at the starwars data set
starwars
# show the number of characters in each hair_color category; these counts are what gets plotted below
starwars %>% count(hair_color)
# plot hair colour using ggplot
ggplot(data = starwars, aes(x = hair_color)) +
geom_bar() +
coord_flip()
From the documentation for geom_bar(): "geom_bar() makes the height of the bar proportional to the number of cases in each group", which I think is what you want.
This should work if you substitute your own data for starwars and your variable of interest for hair_color.
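The "object 'dat' not found" error in the question suggests that no data frame called dat existed when ggplot() was called. As a sketch, here is a hypothetical dat with a category column (values invented, using the two categories named in the question), plotted the same way, with base R's table() showing the counts that become the bar heights.

```r
library(ggplot2)

# Hypothetical data frame named dat with a category column; values invented
dat <- data.frame(
  category = c("Sports", "Travel & Events", "Sports", "Sports", "Travel & Events")
)

# geom_bar() counts how many rows fall into each category
p <- ggplot(dat, aes(x = category)) +
  geom_bar() +
  coord_flip()

# The counts that become the bar heights
table(dat$category)
```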

Reorder factored count data in ggplot2 geom_bar

I can find countless examples of reordering x by the corresponding value of y when the data frame is plotted with geom_bar(stat = "identity").
I have yet to find an example with stat = "count". The reorder() function fails because I have no corresponding y.
I have a data frame with a single factor column, "count" (see below for a poor example), in which each level appears many times, as you would expect. I expected that plotting it with
ggplot(df, aes(x=df$count)) + geom_bar()
would order the bars by the frequency of each level, since I assumed factors behave differently from unfactored (character) data, which is displayed alphabetically.
Any ideas on how to reorder them?
This is my current awful effort; sadly, I figured this out last night and then lost my R command history.
If you load the tidyverse at the start of your project, I suggest using fct_infreq() from forcats (which the tidyverse loads). It reorders the factor levels by frequency, and ggplot draws discrete bars in level order. Refer to the column by name inside aes() rather than with df$:
ggplot(df, aes(x = fct_infreq(count))) + geom_bar()
As your categories are words, consider adding coord_flip() so that your bars run horizontally.
ggplot(df, aes(x = fct_infreq(count))) + geom_bar() + coord_flip()
Here is what it looks like with some fish species counts: a horizontal bar chart with species on the vertical axis (really the flipped x-axis) and counts on the horizontal axis (really the flipped y-axis), sorted from least to greatest.
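A small sketch of what fct_infreq() does to the level order, on hypothetical fish observations invented for illustration (the original fish data was not shared). layer_data() is used only to read back the bar heights in left-to-right order.

```r
library(ggplot2)
library(forcats)

# Hypothetical fish species observations, invented for illustration
fish <- data.frame(species = c("trout", "perch", "trout", "pike", "trout", "perch"))

# By default the levels (and so the bars) are alphabetical
levels(factor(fish$species))      # "perch" "pike"  "trout"

# fct_infreq() reorders the levels from most to least frequent
levels(fct_infreq(fish$species))  # "trout" "perch" "pike"

# Refer to the column by name inside aes()
p <- ggplot(fish, aes(x = fct_infreq(species))) + geom_bar()
ld <- layer_data(p)
ld$count[order(ld$group)]         # bar heights from left to right
```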
Converting the counts to a factor and then modifying that factor might help you accomplish what you need. Below, I'm reversing the order of the counts using fct_rev() from the forcats package (part of the tidyverse).
library(tidyverse)
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_rev) %>%
ggplot(aes(n)) + geom_bar()
Alternatively, if you'd like the bars to be arranged large to small, you can use fct_infreq.
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_infreq) %>%
ggplot(aes(n)) + geom_bar()

How to plot parallel coordinates with multiple categorical variables in R

I am having difficulty plotting a parallel coordinates plot with ggparcoord() from the GGally package. Because there are two categorical variables, I want the visualisation to look like the image below. I've found that in ggparcoord, groupColumn accepts only a single variable to group (colour) by. I can use showPoints to mark the values on the axes, but I also need to vary the shape of these markers according to the categorical variables. Is there another package that can help me realise my idea?
Any response will be appreciated! Thanks!
It's not that difficult to roll your own parallel coordinates plot in ggplot2, which will give you the flexibility to customize the aesthetics. Below is an illustration using the built-in diamonds data frame.
To get parallel coordinates, you need to add an ID column so you can identify each row of the data frame, which we'll use as the group aesthetic in ggplot. You also need to scale the numeric values so that they all share the same vertical scale when we plot them. Then you need to take all the columns that you want on the x-axis and reshape them to "long" format. We do all of that on the fly below in a single pipeline using the pipe operator (%>%).
Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpretable, so consider this merely a "proof of concept". Hopefully, you can create something more useful with your data. I've used colour (for the lines) and fill (for the points) aesthetics below. You can use shape or linetype instead, depending on your needs.
library(tidyverse)
theme_set(theme_classic())
# Get 20 random rows from the diamonds data frame after limiting
# to two levels each of cut and color
set.seed(2)
ds = diamonds %>%
filter(color %in% c("D","J"), cut %in% c("Good", "Premium")) %>%
sample_n(20)
ggplot(ds %>%
mutate(ID = 1:n()) %>% # Add ID for each row
mutate_if(is.numeric, ~ as.numeric(scale(.))) %>% # Scale numeric columns (as.numeric() drops the one-column matrix scale() returns)
gather(key, value, c(1,5:10)), # Reshape to "long" format
aes(key, value, group=ID, colour=color, fill=cut)) +
geom_line() +
geom_point(size=2, shape=21, colour="grey50") +
scale_fill_manual(values=c("black","white"))
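gather() still works but is superseded in current tidyr; pivot_longer() performs the same wide-to-long reshape. Below is a sketch of the same preparation using pivot_longer(), with the numeric columns selected by name instead of by position. The column list is my assumption of which diamonds columns c(1, 5:10) refers to (carat, depth, table, price, x, y, z), and the rows sampled may differ from the original code depending on your dplyr version.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # also provides the diamonds data

set.seed(2)
ds <- diamonds %>%
  filter(color %in% c("D", "J"), cut %in% c("Good", "Premium")) %>%
  sample_n(20)

# Numeric columns to place on the x-axis, selected by name
num_cols <- c("carat", "depth", "table", "price", "x", "y", "z")

long <- ds %>%
  mutate(ID = row_number()) %>%                                 # one ID per diamond
  mutate(across(all_of(num_cols), ~ as.numeric(scale(.x)))) %>% # common vertical scale
  pivot_longer(all_of(num_cols), names_to = "key", values_to = "value")

p <- ggplot(long, aes(key, value, group = ID, colour = color, fill = cut)) +
  geom_line() +
  geom_point(size = 2, shape = 21, colour = "grey50") +
  scale_fill_manual(values = c("black", "white"))
```

Each of the 20 diamonds contributes one row per numeric column, so long has 140 rows, and each scaled column has mean zero.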
I haven't used ggparcoord before, but the only straightforward option I found (at least on my first try with the function) was to paste two columns together into a single grouping column. Below is an example. Even with just four category combinations the plot is confusing, but it may be interpretable if there are strong patterns in your data:
library(GGally)
ds$group = with(ds, paste(cut, color, sep="-"))
ggparcoord(ds, columns=c(1, 5:10), groupColumn=11) +
theme(panel.grid.major.x=element_line(colour="grey70"))

Extreme values with ggplot histogram

I am trying to plot a histogram using ggplot(); however, I am unable to deal with extreme values. I would like them to be combined into one bin (called "500 and more", for example).
I have tried scale_x_continuous(breaks = seq(0, 500, by = 50)), but it only removes labels from the x-axis (screenshot attached below). Any ideas on how to deal with this?
I would suggest computing the counts before plotting. With cut() you can set the breaks you need, and then plot the binned counts with geom_bar(). Setting width = 1 inside geom_bar() removes the space between bars.
library(dplyr)
library(ggplot2movies)
data("movies")
df<-movies %>% mutate(length.class=cut(length,breaks=c(seq(0,500,50),10000))) %>%
group_by(length.class) %>% summarise(count=n())
ggplot(df,aes(length.class,count))+geom_bar(stat="identity",width=1)
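To get a bin literally labelled "500 and more", one option is to make the last break Inf and pass your own labels to cut(). This is a sketch on hypothetical movie lengths (invented values standing in for movies$length). right = FALSE makes each bin [a, b), so a length of exactly 500 lands in the last bin. Since cut() returns a factor, geom_bar() can count it directly, and scale_x_discrete(drop = FALSE) keeps empty bins on the axis.

```r
library(ggplot2)

# Hypothetical movie lengths in minutes, standing in for movies$length
movie_length <- c(5, 90, 120, 480, 520, 900)

# Breaks every 50 minutes up to 500, then one open-ended bin
bin_breaks <- c(seq(0, 500, by = 50), Inf)
# Labels "0-50", ..., "450-500", then "500 and more" for the open-ended bin
bin_labels <- c(paste(bin_breaks[1:10], bin_breaks[2:11], sep = "-"), "500 and more")

# right = FALSE gives bins [a, b), so 500 itself falls in "500 and more"
length_class <- cut(movie_length, breaks = bin_breaks, labels = bin_labels, right = FALSE)

# geom_bar() counts the binned values; drop = FALSE keeps empty bins on the x-axis
p <- ggplot(data.frame(length_class), aes(x = length_class)) +
  geom_bar(width = 1) +
  scale_x_discrete(drop = FALSE)

table(length_class)
```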
