Bar plot X axis not in numerical order - r

I'm trying to plot a bar graph with ggplot2. The x values are ranges (x1 = 0-10, x2 = 11-20, x3 = 21-30, ... up to 91-100, and the last range is ">100"). When I plot the graph with the corresponding y values as follows:
ggplot(data=figure1_data, aes(x=Average.Coverage.of.Study, y=Number.of.Studies)) + geom_bar(stat = "Identity")
The ">100" x value comes first in the plot, not at the end where I want it. How do I get it to come after the 91-100 range? Can someone please help? I am very new to R. Much appreciated!! :)

Edited:
You need to order levels by your preference.
df <- data.frame(avg = c("0-10","11-20","21-30","91-100",">100"),
studies = c(1:5))
ggplot(df)+
geom_bar(aes(x = ordered(avg, levels = c(avg)), y = studies), stat = "identity")
or
ggplot(df)+
geom_bar(aes(x = ordered(avg, levels = c("0-10","11-20","21-30","91-100",">100")), y = studies), stat = "identity")
will both give you the same result.
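The reason the ">100" bar moves: when a character column is mapped to x, ggplot2 orders the categories as sorted text, not by the numbers inside the labels. A minimal sketch of fixing the order in the data itself, so that every plot built from it inherits the order. The data frame `fig`, its column names, and its values are made up for illustration.

```r
library(ggplot2)

# Illustrative data; rows are already in the order the bars should appear
fig <- data.frame(
  range   = c("0-10", "11-20", "21-30", "91-100", ">100"),
  studies = c(4, 7, 3, 2, 5)
)

# levels = unique(range) keeps first-appearance order instead of sorted order
fig$range <- factor(fig$range, levels = unique(fig$range))

ggplot(fig, aes(x = range, y = studies)) +
  geom_col()   # geom_col() is shorthand for geom_bar(stat = "identity")
```

This relies on the rows already being in the wanted order; if they are not, spell the levels out by hand as in the second variant above.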

Related

adding a line to a ggplot boxplot

I'm struggling with ggplot2 and I've been looking for a solution online for several hours. Maybe one of you can help? I have a data set that looks like this (several hundred observations):
Y-AXIS      X-AXIS   SUBJECT
2.2796598   F1       1
0.9118639   F1       2
2.7111228   F3       3
2.7111228   F2       4
2.2796598   F4       5
2.3876401   F10      6
...         ...      ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
It is always a good idea to add a reproducible example of your data;
you can always simulate what you need:
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
In ggplot each geom can take a data argument; for your line, just use
a subset of your original data, limited to the desired subject.
Colors and other visual elements for the line are simple to set:
ggplot() +
geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
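One detail both answers share: labels built with `paste0("F", 1:10)` sort as text, so the x axis runs F1, F10, F2, ... rather than F1 to F10. A small sketch that fixes the level order before plotting and then draws one subject's line on top. It reuses the simulated layout from the first answer under a hypothetical name `sim`, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size` for lines).

```r
library(ggplot2)

set.seed(123)
sim <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis   = rep(paste0("F", 1:10), times = 10),
  yaxis   = runif(100, 0, 100)
)

# Explicit levels so that F10 comes last instead of right after F1
sim$xaxis <- factor(sim$xaxis, levels = paste0("F", 1:10))

# The 10 observations of the subject to highlight
one_subject <- subset(sim, subject == 1)

ggplot(sim, aes(xaxis, yaxis)) +
  geom_boxplot() +
  geom_line(
    data = one_subject,
    aes(group = subject),   # a single group, so the 10 points are joined
    colour = "red",
    linetype = "dashed",
    linewidth = 1
  )
```

Other line properties such as `alpha` can be set the same way, outside `aes()`, as constants.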

ggplot2: Why is it displaying the wrong values when set to log10 axis?

I'm using stat_summary to display the mean and, based off my calculations, "type1, G-" should have a mean of ~10^7.3. And that's the value I get from plotting it without a log10 axis. But when I add in the log10 axis, suddenly "type1, G-" shows a value of 10^6.5.
What's going on?
library(ggplot2)
library(reshape2)  # provides melt()
#Data
Type = rep(c("type1", "type2"), each = 6)
Gen = rep(rep(c("G-", "G+"), each = 3), 2)
A = c(4.98E+05, 5.09E+05, 1.03E+05, 3.08E+05, 5.07E+03, 4.22E+04, 6.52E+05, 2.51E+04, 8.66E+05, 8.10E+04, 6.50E+06, 1.64E+06)
B = c(6.76E+07, 3.25E+07, 1.11E+07, 2.34E+06, 4.10E+04, 1.20E+06, 7.50E+07, 1.65E+05, 9.52E+06, 5.92E+06, 3.11E+08, 1.93E+08)
df = melt(data.frame(Type, Gen, A, B))
#Correct, non-log10 version ("type1 G-" has a value over 1e+07)
ggplot(data = df, aes(x =Type,y = value)) +
stat_summary(fun.y="mean",geom="bar",position="dodge",aes(fill=Gen))+
scale_x_discrete(limits=c("type1"))+
coord_cartesian(ylim=c(10^7,10^7.5))
#Incorrect, log10 version ("type1 G-" has a value under 1e+07)
ggplot(data = df, aes(x =Type,y = value)) +
stat_summary(fun.y="mean",geom="bar",position="dodge",aes(fill=Gen))+
scale_y_log10()
You want coord_trans. As its documentation says:
# The difference between transforming the scales and
# transforming the coordinate system is that scale
# transformation occurs BEFORE statistics, and coordinate
# transformation afterwards.
However, you cannot make a barplot with this, since bars start at 0 and log10(0) is not defined. But barplots are usually not a good visualization anyway.
# fun and ylim replace fun.y and limy, which are deprecated since ggplot2 3.3.0
ggplot(data = df, aes(x =Type,y = value)) +
stat_summary(fun="mean",geom="point",position="identity",aes(color=Gen))+
coord_trans(y = "log10", ylim = c(1e5, 1e8)) +
scale_y_continuous(breaks = 10^(5:8))
Obviously you should plot some kind of uncertainty information. I'd recommend a boxplot.
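To see the "before statistics" point in numbers: with `scale_y_log10()` the mean is taken over the log10 values, which corresponds to the geometric mean rather than the arithmetic mean. A quick base R check on the six "type1, G-" values from the question (three from A and three from B, which `melt` stacks into one `value` column):

```r
# The six "type1, G-" values from the question
x <- c(4.98e5, 5.09e5, 1.03e5, 6.76e7, 3.25e7, 1.11e7)

# Mean first, then log: the position shown with coord_trans(y = "log10")
log_of_mean <- log10(mean(x))    # about 7.27

# Log first, then mean: the position shown with scale_y_log10() + stat_summary
mean_of_log <- mean(log10(x))    # about 6.47

round(c(log_of_mean = log_of_mean, mean_of_log = mean_of_log), 2)
```

The two numbers match the roughly 10^7.3 and 10^6.5 reported in the question, so neither plot is miscomputing anything; they are summarising different quantities.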

Plot categorical data as histogram/ bar in R?

I am new to R and have been trying for a few days to plot a histogram / bar chart to view the trend. I have a categorical variable, countryx, which I coded as 1, 2, 3.
I have tried these 2 scripts below and got error messages as follows :
Output 1: blank chart with x and y axis, no stack/bar trend
qplot(DI$countryx,geom = "histogram",ylab = "count",
xlab = "countryx",binwidth=5,colour=I("blue"),fill=I("wheat"))
Output 2: error message- ggplot2 doesn't know how to deal with data of class integer
ggplot(DI$countryX, aes(x=countryx))
+ geom_bar(aes(y=count), stat = "count",position ="stack",...,
width =5,aes=true)
I'd appreciate any advice.
Thank you very much for your help!
Multiple problems with your code. ggplot takes a dataframe, not a vector, but you're supplying a vector. Try this
ggplot(DI, aes(x=countryx, y = count)) + geom_col(width = 5)
As @yeedle mentioned, you need a data.frame (maybe use as.data.frame).
How about:
library(ggplot2)
df <- data.frame(countryx = rep(1:3), count = rbinom(3,10,0.3))
p <- ggplot2::ggplot(df, aes(x = countryx, y = count)) + ylab("count")
p + geom_col(aes(x = countryx, fill = factor(countryx)))
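If the data frame holds one row per observation rather than precomputed counts, `geom_bar()` can do the counting itself. A sketch with made-up data (the `DI` below is a stand-in, not the asker's data). Wrapping the 1/2/3 codes in `factor()` makes the axis discrete.

```r
library(ggplot2)

# Stand-in for the asker's data: one row per observation, country coded 1-3
DI <- data.frame(countryx = c(1, 1, 2, 3, 3, 3, 2, 1, 3, 3))

# What geom_bar() will compute: the number of rows per code
table(DI$countryx)

ggplot(DI, aes(x = factor(countryx))) +
  geom_bar(colour = "blue", fill = "wheat") +   # default stat = "count"
  labs(x = "countryx", y = "count")
```

`geom_col()` is the better fit only when a column of counts already exists, as in the simulated `df` above.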

changing y scale when using fun.y ggplot

This is an example of my data
library(ggplot2)
set.seed(1)
df <- data.frame(Groups = factor(rep(1:10, each = 10)))
x <- sample(1:100, 50)
df[x, "Style"] <- "Lame"
df[-x, "Style"] <- "Cool"
df$Style <- factor(df$Style)
p <- ggplot() + stat_summary(data = df, aes(Groups, Style, fill = Style),
geom = "bar", fun.y = length, position=position_dodge())
(Sorry, this is my first question... I don't know how to present code snippets like head(df) or the actual plot in SO. Please run this code to understand my question.)
So the plot adequately presents the count of every 'Style' per 'Groups'. However, the y axis scale shows the levels of the factor variable 'Style'. Although values I am plotting are originally discrete, the count of every 'Cool' and 'Lame' per 'Groups' is continuous.
How do I change the 'y' scale of my barplot from discrete to continuous in ggplot2, in order to correspond to the count values and not the original factor levels???
You can take advantage of ggplot's grouping and let geom_bar do the counting for you (geom_histogram needs a continuous x in current ggplot2, and Groups is a factor)
p <- ggplot(df, aes(Groups, fill=Style)) + geom_bar(position=position_dodge())
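The counts that end up on the y axis can be checked against a plain cross-tabulation. A sketch on the question's data; `geom_bar()` is used because its default `stat = "count"` accepts a factor on x. The name `counts` is introduced here for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Groups = factor(rep(1:10, each = 10)))
x <- sample(1:100, 50)
df[x, "Style"]  <- "Lame"
df[-x, "Style"] <- "Cool"
df$Style <- factor(df$Style)

# Rows = Groups, columns = Style; these are the bar heights
counts <- table(df$Groups, df$Style)
counts

ggplot(df, aes(Groups, fill = Style)) +
  geom_bar(position = position_dodge()) +
  ylab("count")
```

Because only x and fill are mapped, y is computed by the stat and the axis is continuous.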

normalize ggplot histogram so that first height is 1 (to show growth) in R

I was wondering if there is a way to normalize the heights of the histograms with multiple groups so that their first heights are all = 1. For instance:
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..count..))
p + geom_histogram(position = "dodge")
gives a regular histogram with 3 groups.
Also
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..ncount..))
p + geom_histogram(position = "dodge")
gives a histogram where each group is normalized to have a maximum height of 1.
I want a histogram where each group is normalized so that its first height is 1 (so I can show growth), but I don't know whether there is an appropriate alternative to ..count.. or ..ncount.. for this. If anyone can help me understand the structure of ..count.., I could maybe figure it out from there.
Thanks!
I bet there is a nice way to do everything within ggplot. However, I tend to prefer preparing the desired data set before I plug it into ggplot. If I understood you correctly, you may try something like this:
# build the data frame from the question's vectors
df <- data.frame(results, category)
# convert 'results' to factor and set levels to get an equi-spaced 'results' x-axis
df$results <- factor(df$results, levels = 1:7)
# for each category, count frequency of 'results'
df <- as.data.frame(with(df, table(results, category)))
# normalize: for each category, divide all 'Freq' (heights) with the first 'Freq'
df$freq2 <- with(df, ave(Freq, category, FUN = function(x) x/x[1]))
ggplot(data = df, aes(x = results, y = freq2, fill = category)) +
geom_bar(stat = "identity", position = "dodge")
It looks like ..density.. may do what you want, but I can't for the life of me find documentation on it. As far as I can tell it rescales each group so that its bars integrate to 1, so check that the first bar really comes out at 1 for your data and bin width.
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..density..))
p + geom_histogram(position = "dodge")
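On the structure of `..count..`: it is the number of observations per bin, computed separately for each group, and `..ncount..` is that count divided by the group's largest count. A base R sketch of these quantities on the question's data, next to the "first height = 1" version. It treats each distinct `results` value as its own bin, which is an assumption (`geom_histogram()` chooses bins from `bins` or `binwidth`), and the names `counts`, `ncount` and `first_norm` are introduced here for illustration.

```r
results  <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7), 3)
category <- rep(c("a","b","c"), 15)

# Rows = results value 1..7 (one bin each), columns = category
counts <- table(factor(results, levels = 1:7), category)

# Like ..ncount..: each column divided by its own maximum
ncount <- sweep(counts, 2, apply(counts, 2, max), "/")

# "First height = 1": each column divided by its own first bin
first_norm <- sweep(counts, 2, counts[1, ], "/")

counts
first_norm
```

Dividing by the first bin only works when that bin is non-empty for every group; a zero there gives `Inf` or `NaN`.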
