Extreme values with ggplot histogram - r

I am trying to plot a histogram using ggplot(), but I am unable to deal with extreme values. I would like them to be combined into one bin (called "500 and more", for example).
I have tried scale_x_continuous(breaks = seq(0,500, by = 50)), but it only removes the labels from the x-axis (plot attached below). Any ideas on how to deal with this?

I would suggest computing the counts before plotting. With cut() you can set the breaks as you need, including one wide final bin for the extreme values, and then plot the result with geom_bar(). Setting width=1 inside geom_bar() removes the space between the bars.
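library(dplyr)
library(ggplot2)        # needed for ggplot() and geom_bar()
library(ggplot2movies)
data("movies")
# Bins of width 50 up to 500, plus one wide bin for everything above 500
df <- movies %>%
  mutate(length.class = cut(length, breaks = c(seq(0, 500, 50), 10000))) %>%
  group_by(length.class) %>%
  summarise(count = n())
ggplot(df, aes(length.class, count)) + geom_bar(stat = "identity", width = 1)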
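
If the last bin should carry a readable label rather than cut()'s default interval text, the labels can be passed to cut() explicitly. Here is a minimal sketch of that idea, assuming made-up integer data (the `toy` data frame, its `len` column and the label wording are illustrative, not from the question). Note that with cut()'s default right-closed intervals a value of exactly 500 falls into the 451-500 bin, so the last bin is labelled "over 500".

```r
library(ggplot2)

# Hypothetical data: mostly moderate integer values plus a few extreme ones
set.seed(1)
toy <- data.frame(len = c(round(runif(200, 1, 450)), 600, 1200, 5100))

# Bin edges: 0, 50, ..., 500, then one open-ended bin for everything above 500
edges <- c(seq(0, 500, by = 50), Inf)

# One label per bin; "1-50" style labels assume integer values
bin_labels <- c(
  paste0(seq(0, 450, by = 50) + 1, "-", seq(50, 500, by = 50)),
  "over 500"
)

toy$len_class <- cut(toy$len, breaks = edges, labels = bin_labels)

# geom_bar() counts the rows per bin itself, so no summarise() step is needed
p <- ggplot(toy, aes(len_class)) +
  geom_bar(width = 1) +
  scale_x_discrete(drop = FALSE)  # keep empty bins on the axis
p
```

Using Inf as the last edge avoids having to guess an upper limit such as 10000.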

Related

Data in R (ggplot2) not showing up in histogram

I have attempted to use ggplot2 and the normal hist() function to display the data required. I messed around with bin widths and number of bins, but I've been getting very similar results to this.
This is my code:
geneCount = read.delim("smc_gene_expression_counts.txt")
geneCount$I98_FBS
geneCount %>% ggplot() + geom_histogram(aes(I98_FBS), binwidth = 500)
Histogram Output:
Examples of Values in Column Used (I98_FBS)

ggplot isn't producing bars on the bar chart

I have used the following code
ggplot(IncomeGroup_count,
aes(x=Income_Group,
y= Percentage,
colour = 'lightblue'))
But the only thing it produces is the x and y axis with no bars.
It looks like you are missing the geom_bar function. ggplot() only sets up the data and the aesthetic mappings; the geom functions then "draw" the plot from them. Here I have constructed a bar plot using data that ships with R, since you have not supplied your own data in your question.
#### Load Library ####
library(tidyverse)
#### Plot ####
airquality %>%
ggplot(aes(x=is.na(Ozone)))+
geom_bar()
This gives you a bare-bones bar plot. It checks each Ozone value in the airquality dataset for NA and draws one bar per outcome: "TRUE" for NA and "FALSE" for not NA:
Edit
Again, it's difficult to guess what is going wrong if you don't share your data, so based on your comments below I'm assuming that you are trying to plot y explicitly, as a value you have already computed. As suggested in the comments, geom_col may be better for this, since it uses the y values directly as bar heights, whereas geom_bar counts rows by default. Using a different example from the same dataset:
airquality %>%
  group_by(Month) %>%
  summarise(Mean_Ozone = mean(Ozone, na.rm = TRUE)) %>%
  ggplot(aes(x = Month,
             y = Mean_Ozone)) +
  geom_col()
You get this:
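
Applied to the shape of data in the question, this could look like the sketch below. The values in `IncomeGroup_count` are invented here, since the real data was not shared; only the column names come from the question. It also moves the colour out of aes(): a constant placed inside aes() is treated as data and mapped through a scale, so it does not set the bar colour.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's IncomeGroup_count data frame
IncomeGroup_count <- data.frame(
  Income_Group = c("Low", "Middle", "High"),
  Percentage   = c(25, 55, 20)
)

# geom_col() uses the y values as bar heights.
# A fixed colour goes outside aes(); fill colours the bar, colour only its outline.
p <- ggplot(IncomeGroup_count, aes(x = Income_Group, y = Percentage)) +
  geom_col(fill = "lightblue")
p
```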

Turning data into horizontal bar graph with ggplot2

I would like to create a horizontal bar graph from my data.
The link to my data is here.
The code that I am using
library(ggplot2)
library(forcats) # provides fct_inorder()
ggplot(data=df , aes(x=fct_inorder(WorkSchedule),y=timing, fill=Value)) + geom_col() + coord_flip()
The output of the plot:
How can I change the x-axis to show time from 04:00 till 03:45 (24h)?
I tried factor(Source) but it does not work.
UPDATE: How can I change the x axis of this graph?
Many thanks
With the function lvls_reorder() from the forcats package, you can specify the order of the levels of your variable.
library(tidyverse) # forcats is included in the tidyverse
df <- df %>%
  mutate(WorkSchedule = lvls_reorder(WorkSchedule, c(3,2,4,5,1)))
If you convert the variable Source to a factor, you can set its level order in the same way.
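
To make the two options concrete, here is a small sketch on made-up labels (the `schedule` vector is hypothetical; the real data is only linked in the question). It shows reordering by position with lvls_reorder() and by name with base R's factor().

```r
library(forcats)

# Hypothetical schedule labels, not the asker's data
schedule <- factor(c("Evening", "Morning", "Night", "Morning", "Evening"))

# Levels are alphabetical by default
levels(schedule)

# Option 1: reorder by position in the current levels
by_index <- lvls_reorder(schedule, c(2, 1, 3))

# Option 2: spell out the desired order by name
by_name <- factor(schedule, levels = c("Morning", "Evening", "Night"))

levels(by_index)
levels(by_name)
```

Naming the levels is usually easier to read than index positions. Keep in mind that after coord_flip() the first level is drawn at the bottom of the axis.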

Reorder factored count data in ggplot2 geom_bar

I find countless examples of reordering X by the corresponding size of Y if the Dataframe for ggplot2 (geom_bar) is read using stat="identity".
I have yet to find an example of stat="count". The reorder function fails as I have no corresponding y.
I have a factored DF of one column, "count" (see below for a poor example), where there are multiple instances of the data as you would expect. However, I expected factored data to be displayed:
ggplot(df, aes(x=df$count)) + geom_bar()
by the order defined from the quantity of each factor, as it is different for unfactored (character) data i.e., will display alphabetically.
Any idea how to reorder?
This is my current awful effort, sadly I figured this out last night, then lost my R command history:
If you load the tidyverse at the start of your project, I suggest using fct_infreq() from forcats, which is part of the tidyverse. Inside aes() you can refer to the column by name, without df$:
ggplot(df, aes(x=fct_infreq(count))) + geom_bar()
As your categories are words, consider adding coord_flip() so that your bars run horizontally.
ggplot(df, aes(x=fct_infreq(count))) + geom_bar() + coord_flip()
This is what it looks like with some fish species counts: A horizontal bar chart with species on the y axis (but really the flipped x-axis) and counts on the horizontal axis (but actually the flipped y-axis). The counts are sorted from least to greatest.
Converting the counts to a factor and then modifying that factor might help accomplish what you need. Below, I'm reversing the order of the counts using fct_rev from the forcats package (part of the tidyverse).
library(tidyverse)
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_rev) %>%
ggplot(aes(n)) + geom_bar()
Alternatively, if you'd like the bars to be arranged large to small, you can use fct_infreq.
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_infreq) %>%
ggplot(aes(n)) + geom_bar()
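
For the original case of one column of raw category values, a minimal sketch might look like this. The `fish` data frame and its `species` column are invented for illustration. It also wraps the factor in fct_rev(), because after coord_flip() the first level is drawn at the bottom.

```r
library(ggplot2)
library(forcats)

# Hypothetical one-column data frame of species names
fish <- data.frame(species = c("cod", "trout", "cod", "eel", "trout", "cod"))

# fct_infreq() orders levels by how often each value occurs, most frequent first
levels(fct_infreq(fish$species))

# After coord_flip() the first level sits at the bottom,
# so fct_rev() puts the most frequent species at the top instead
p <- ggplot(fish, aes(x = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  coord_flip() +
  labs(x = "species")
p
```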

geom_histogram to plot counts/accumulation of each x value and higher

I am trying to create a histogram/bar plot in R to show the counts of each x value I have in the dataset and higher. I am having trouble doing this, and I don't know if I use geom_histogram or geom_bar (I want to use ggplot2). To describe my problem further:
On the X axis I have "Percent_Origins," which is a column in my data frame. On my Y axis - for each of the Percent_Origin values I have occurring, I want the height of the bar to represent the count of rows with that percent value and higher. Right now, if I am to use a histogram, I have:
plot <- ggplot(dataframe, aes(x=dataframe$Percent_Origins)) +
geom_histogram(aes(fill=Percent_Origins), binwidth= .05, colour="white")
What should I change the fill or general code to be to do what I want? That is, plot an accumulation of counts of each value and higher? Thanks!
I think your best bet is to compute the cumulative counts first and then pass them to ggplot. There are several ways to do this, but a simple one (using dplyr) is to sort the data in descending order and number the rows, so that each row's number is a running count from the largest value down. Then, for each distinct value, keep only the row with the largest count, which is the number of rows with that value or higher, and plot the result.
To demonstrate, I am using the built-in iris data.
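library(dplyr)
library(ggplot2)
iris %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(counts = row_number()) %>%   # running count from the largest value down
  group_by(Sepal.Length) %>%
  slice(n()) %>%                      # last row per value: rows with that value or higher
  ggplot(aes(x = Sepal.Length, y = counts)) +
  geom_step(direction = "vh")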
gives:
If you really want bars instead of a line, use geom_col instead. However, note that you either need to fill in gaps (to ensure the bars are evenly spaced across the range) or deal with breaks in the plot.
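
A sketch of the bar version, assuming made-up values in place of Percent_Origins (the `toy` data frame and the `n_at_least` column name are hypothetical). It computes the "this value or higher" counts with a reversed cumulative sum and treats the values as a factor, which is one way to sidestep the gap problem because the bars are then evenly spaced.

```r
library(dplyr)
library(ggplot2)

# Hypothetical percent values standing in for Percent_Origins
toy <- data.frame(pct = c(0.10, 0.10, 0.25, 0.40, 0.40, 0.40))

at_least <- toy %>%
  count(pct) %>%                             # rows per distinct value, ascending
  mutate(n_at_least = rev(cumsum(rev(n))))   # rows with this value or higher

at_least

# Treating pct as a factor spaces the bars evenly regardless of gaps in the values
p <- ggplot(at_least, aes(x = factor(pct), y = n_at_least)) +
  geom_col()
p
```

The trade-off is that a factor axis no longer shows the true distances between values; keeping pct numeric preserves them but leaves gaps where no values occur.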
