R: Arranging axis using ggplot2

I am trying to present my data using ggplot2. My dataframe is built up like this:
type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10
.. ... ..
I am trying to present the data by plotting as histograms and boxplots, but I encounter some problems.
For the histograms I used the following code:
ggplot(hisdat, aes(x=count, fill=type)) +
geom_histogram(binwidth=.5, position="dodge")
and that gives me this plot:
As you can see, the counts along the bottom of the plot are ordered so that 10 follows 1 and 100 follows 10; they appear to be sorted by their first digit. How do I get them to run from 1 to 148?
For the boxplot I have the same trouble, and on top of that the plot does not look like a boxplot at all. Is my code wrong?
ggplot(hisdat, aes(x=type, y=count, fill=type)) + geom_boxplot()
It gives me this result:

Since the other part of your question has already been answered in the comments, here is the answer to this part:
How do I get it to go from 1-148?
library(ggplot2)
df <- read.table(header = TRUE, text=
" type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10")
# reorder() orders the x categories by their mean count
ggplot(df, aes(x = reorder(type, count), y = count, fill = type)) + geom_bar(stat = "identity", position = "dodge")
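The ordering described in the question (10 after 1, 100 after 10) is what you typically get when `count` is stored as text (character or factor) instead of as a number, because text is sorted character by character. A minimal sketch, assuming that is what happened when the data was read in; the four rows below are a hypothetical stand-in for the real `hisdat`. Converting with `as.numeric(as.character(...))` works for both character and factor columns. The same conversion is likely what the boxplot needs as well, since a non-numeric `y` is treated as discrete.

```r
library(ggplot2)

# Hypothetical stand-in for hisdat, with count read in as text
hisdat <- data.frame(
  type  = c("exon", "intron", "intron", "exon"),
  count = c("4", "3", "1", "10"),
  stringsAsFactors = FALSE
)

# Text sorts character by character, so "10" comes right after "1"
print(sort(hisdat$count))  # prints "1" "10" "3" "4"

# Convert to numbers; as.character() first keeps this safe for factor columns
hisdat$count <- as.numeric(as.character(hisdat$count))

# count is now continuous, so the x axis runs in numeric order
p_hist <- ggplot(hisdat, aes(x = count, fill = type)) +
  geom_histogram(binwidth = 0.5, position = "dodge")

# With a numeric y, geom_boxplot() can compute quartiles per type
p_box <- ggplot(hisdat, aes(x = type, y = count, fill = type)) +
  geom_boxplot()
```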

Related

Plot line on ggplot2 grouped bar chart

I have this data frame:
`Last Name` Feature Value
<chr> <chr> <dbl>
1 Name1 Resilience 1
2 Name2 Resilience 6
3 Name3 Resilience 2
4 Name1 Self-Discipline 3
5 Name2 Self-Discipline 7
6 Name3 Self-Discipline 4
7 Name1 Assertiveness 6
8 Name2 Assertiveness 7
9 Name3 Assertiveness 6
10 Name1 Activity level 4
and created a grouped barplot with the following code:
bar2 <- ggplot(team_sih_PP1, aes(x = Feature, y = Value, fill = `Last Name`)) +
geom_bar(stat = "identity", position = "dodge") +
coord_cartesian(ylim = c(1, 7)) +
scale_y_continuous(n.breaks = 7) +
scale_fill_manual(values = c("#2a2b63", "#28d5ac", "#f2eff2")) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I also created a new data frame that holds the average values of the 3 Last Names in each Feature:
mean_name means
1 Action 4.000000
2 Reflection 4.000000
3 Flexibility 3.666667
4 Structure 3.666667
I want to add a line that shows the means of each Feature so that it looks something like this:
I managed to plot the line on its own, but not on top of the bar chart. Please help!
Assuming your geom_line() code is otherwise correct, nothing will be drawn unless you set the group aesthetic to a single constant value (e.g. aes(group = 1)). With a discrete x axis, ggplot2 groups the data by the discrete x values by default, so each x position becomes its own group containing a single point, and geom_line() has nothing to connect. Setting group = 1 puts every row into the same group, so the points are joined into one line.
I'd demonstrate with the data you shared, but it does not reproduce the plots you've shown, so here is a representative example.
library(ggplot2)
x_values <- c('These', 'Values', "are", "ordered", "but", "discrete")
set.seed(8675309)
df <- data.frame(
x=rep(x_values, 2),
type=rep(c("A", "B"), each =6),
y=sample(1:10, 12, replace=TRUE)
)
df$x <- factor(df$x, levels=x_values)
d_myline <- data.frame(
x=x_values,
rando=c(1,5,6,10,4,6)
)
p <- ggplot(df, aes(x,y)) +
geom_col(aes(fill=type), position="dodge", width=0.5)
The following code will not draw a line on the plot (there is no error, only a message that each group consists of only one observation, and no line appears):
p + geom_line(data=d_myline, aes(x=x, y=rando))
However, if you set group=1, it shows the line as expected:
p + geom_line(data=d_myline, aes(x=x, y=rando, group=1))
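Applied to the question's grouped bars, a sketch might look like the following. It uses the first nine rows shown in the question, and computes the means with aggregate() (an assumption; the question's own means table uses different feature names). The mean data frame's x column must hold the same Feature values as the bars; otherwise its values are added as new x categories. Mapping fill only inside geom_col() also means the mean layers do not need a `Last Name` column.

```r
library(ggplot2)

# The first nine rows from the question (Last Name / Feature / Value)
team <- data.frame(
  `Last Name` = rep(c("Name1", "Name2", "Name3"), times = 3),
  Feature     = rep(c("Resilience", "Self-Discipline", "Assertiveness"), each = 3),
  Value       = c(1, 6, 2, 3, 7, 4, 6, 7, 6),
  check.names = FALSE
)

# One mean per Feature; the result has columns Feature and Value
feature_means <- aggregate(Value ~ Feature, data = team, FUN = mean)

p <- ggplot(team, aes(x = Feature, y = Value)) +
  # fill is mapped only here, so the mean layers need no Last Name column
  geom_col(aes(fill = `Last Name`), position = "dodge") +
  # group = 1 joins all discrete x positions into a single line
  geom_line(data = feature_means, aes(group = 1)) +
  geom_point(data = feature_means)
```

Alternatively, ggplot2 can compute the means itself with `stat_summary(fun = mean, geom = "line", aes(group = 1))` added to the same plot, which avoids the separate data frame.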

Set free y limits in ggplot2 facets while using coord_cartesian

I have a data frame 'data' with three columns. The first column identifies the compound, the second the concentration of the compound and the third my measured data called 'Area'.
# A tibble: 12 x 3
Compound Conc Area
<chr> <dbl> <dbl>
1 Compound 1 0 247
2 Compound 1 5 44098
3 Compound 1 100 981797
4 Compound 1 1000 7084602
5 Compound 2 0 350
6 Compound 2 5 310434
7 Compound 2 100 6621537
8 Compound 2 1000 49493832
9 Compound 3 0 26
10 Compound 3 5 7707
11 Compound 3 100 174026
12 Compound 3 1000 1600143
I want to create a faceted plot per compound using geom_point, and fit geom_smooth over the complete x range. To look at the lower concentration range in detail, I applied coord_cartesian to limit the x axis to 0-110.
However, each facet's y axis still extends to the compound's overall maximum, including points outside the zoomed x range. Because the scales differ so much between compounds, I can't use a fixed ylim: it would have to be different for each compound (in my real data I have > 20 compounds).
Is there a way to make each facet's y axis run from 0 to the largest value that is actually visible?
The code I have (without any attempt at limiting the y axis) is:
ggplot(data = data, aes(Conc, Area)) +
geom_point(size = 2.5) +
geom_smooth(method = "lm") +
facet_wrap(~Compound, ncol = 3, scales = "free_y") +
theme_bw() +
theme(legend.position = "bottom") +
coord_cartesian(xlim = c(0,110))
I found a workaround that gives the result I want.
I first subset the data to the visible concentration range, then looped over the compounds and plotted each one separately.
The subset was used to set the ylim in coord_cartesian for each plot.
The resulting list of plots can then be arranged in a grid with the gridExtra package.
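library(dplyr)
library(ggplot2)
library(gridExtra)
# rows inside the zoomed x range; used only to set each plot's y limit
data_100 <- data %>%
filter(Conc <= 110)
loop.vector <- unique(data$Compound)
plot_list <- list()
for (i in seq_along(loop.vector)) {
# fit and plot ALL rows of this compound; coord_cartesian only zooms the view
p <- ggplot(subset(data, Compound == loop.vector[i]),
aes(Conc, Area)) +
geom_point(size=2.5) +
geom_smooth(method = "lm", se = FALSE) +
theme_bw() +
theme(legend.position="bottom") +
coord_cartesian(xlim = c(0,110),
ylim = c(0, max(data_100$Area[data_100$Compound==loop.vector[i]]))) +
labs(title = loop.vector[i])
plot_list[[i]] <- p
print(p)
}
# arrange the individual plots in a grid, three per row
grid.arrange(grobs = plot_list, ncol = 3)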
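A faceted alternative is sketched below, assuming a straight-line fit without a confidence band (what geom_smooth(method = "lm", se = FALSE) draws in the loop) is enough. It replaces geom_smooth() with a hand-made fit. lm() is fitted per compound on all concentrations outside ggplot2, the fit is evaluated only over 0-110, and both the points and the predicted line are restricted to that range. Because facet_wrap(scales = "free_y") trains each panel's y scale only on the data it receives, each panel should then end near its largest visible value, and expand_limits(y = 0) pulls the lower end to 0. The names `x_max`, `pred` and `visible` are illustrative; the data is the 12-row tibble from the question.

```r
library(ggplot2)

# The example data from the question
data <- data.frame(
  Compound = rep(paste("Compound", 1:3), each = 4),
  Conc     = rep(c(0, 5, 100, 1000), times = 3),
  Area     = c(247, 44098, 981797, 7084602,
               350, 310434, 6621537, 49493832,
               26, 7707, 174026, 1600143)
)

x_max <- 110  # upper end of the zoomed concentration range

# Fit one straight line per compound on ALL concentrations,
# then evaluate it only over the visible range 0..x_max
pred <- do.call(rbind, lapply(split(data, data$Compound), function(d) {
  fit  <- lm(Area ~ Conc, data = d)
  grid <- data.frame(Conc = seq(0, x_max, length.out = 50))
  grid$Area     <- predict(fit, newdata = grid)
  grid$Compound <- d$Compound[1]
  grid
}))

# Only points inside the visible range take part in each panel's y scale
visible <- subset(data, Conc <= x_max)

p <- ggplot(visible, aes(Conc, Area)) +
  geom_point(size = 2.5) +
  geom_line(data = pred, colour = "blue") +
  facet_wrap(~Compound, ncol = 3, scales = "free_y") +
  expand_limits(y = 0) +  # make every panel include 0
  theme_bw()
```

The trade-off is that the smoother is no longer computed by ggplot2, so switching methods (e.g. to loess) means changing the lm() call by hand.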

how to make a bar plot for character value with percent format by ggplot2

I would like to make a bar plot with percent format.
Here is my data set:
https://drive.google.com/file/d/1xpRqQwzKFuirpKYKcoi1qVYSaiA-D5WX/view?usp=sharing
load('test.Robj')
Here is what part of my data looks like:
res.1.2 branch
AAACCTGCACCAGGCT 0 1
AAACCTGGTCATATGC 7 4
AAACCTGGTTAGTGGG 15 NA
AAACCTGTCCACGCAG 1 NA
AAACCTGTCCACGTTC 17 2
AAACGGGCACCGAATT 0 1
I tried to use this code to plot:
ggplot(test,aes(x = branch, y =factor(1),fill = res.1.2)) +
geom_bar(position = "fill",stat = "identity")+
scale_y_discrete(labels =scales::percent)
I want the y axis to show the percentage of each res.1.2 value out of the total count (a stacked bar chart, similar to a pie chart),
quite similar to this issue
but I got this:
Any suggestion?
If I understand correctly, the OP wants to plot the values of res.1.2, which are of type character, so res.1.2 needs to be coerced to integer for plotting:
# load OP's data
load('test.Robj')
# create plot
library(ggplot2)
ggplot(test,aes(x = branch, y = as.integer(res.1.2), fill = res.1.2)) +
geom_bar(position = "fill",stat = "identity") +
scale_y_continuous(labels = scales::percent)
However, if the OP intends to show the number of occurrences of each value of res.1.2 as share of total within each branch, the code is as follows:
# load OP's data
load('test.Robj')
# create plot
library(ggplot2)
ggplot(test, aes(x = branch, fill = res.1.2)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent)
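To check what position = "fill" is drawing, a small sketch computes the same per-branch shares with table() and prop.table(). It uses only the six rows printed in the question in place of test.Robj (assuming res.1.2 is character and branch is numeric), so with the full data the numbers will differ. Rows whose branch is NA are dropped explicitly: table() ignores them by default, and the plot uses factor(branch) so each branch is a discrete bar.

```r
library(ggplot2)

# The six rows shown in the question, as a stand-in for test.Robj
test <- data.frame(
  res.1.2 = c("0", "7", "15", "1", "17", "0"),
  branch  = c(1, 4, NA, NA, 2, 1),
  row.names = c("AAACCTGCACCAGGCT", "AAACCTGGTCATATGC", "AAACCTGGTTAGTGGG",
                "AAACCTGTCCACGCAG", "AAACCTGTCCACGTTC", "AAACGGGCACCGAATT"),
  stringsAsFactors = FALSE
)

# Share of each res.1.2 value within each branch (each row sums to 1);
# table() leaves out the NA branch by default
shares <- prop.table(table(branch = test$branch, res.1.2 = test$res.1.2), margin = 1)
print(round(shares, 2))

# The same shares drawn as a 100% stacked bar chart
p <- ggplot(subset(test, !is.na(branch)), aes(x = factor(branch), fill = res.1.2)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "branch", y = "share of cells")
```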
The chart shows the counts of each res.1.2 value as a percentage of the total within each branch.

coloring legend in bar chart in R

I'm fairly new to R and need some help with my graph. I'm trying to compare actual and forecast figures, but I cannot get the colours in the legend right. The data looks like this:
hierarchy Actual Forecast
<fctr> <dbl> <dbl>
1 E 9313 5455
2 K 6257 3632
3 O 7183 8684
4 A 1579 6418
5 S 8755 0149
6 D 5897 7812
7 F 1400 8810
8 G 4960 5710
9 R 3032 0412
And the code looks like this:
ggplot(sam4, aes(hierarchy))+ theme_bw() +
geom_bar(aes(y = Actual, colour="Actual"),fill="#66FF33", stat="identity",position="dodge", width=0.40) +
geom_bar(aes(y = Forecast, colour="Forecast"), fill="#FF3300", stat="identity",position="dodge", width=0.2)
The graph ends up looking like this:
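I believe the problem is that your data is not in the shape ggplot2 works best with. Tidy your data frame first, so that there is one row per hierarchy and series (Actual or Forecast) and the series name is a column you can map to fill; scale_fill_manual() then controls both the bar colours and the legend. Check out http://tidyr.tidyverse.org/ to get familiar with the concept of tidy data.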
Using the tidyverse (ggplot is part of it), I tidied up your data and I believe got the plot you want.
library(tidyverse) #includes ggplot
newdata <- gather(sam4, actualorforecast, value, -hierarchy)
ggplot(newdata, aes(x = hierarchy)) +
theme_bw() +
geom_bar(aes(y = value, fill = actualorforecast),
stat = "identity",
width = ifelse(newdata$actualorforecast == "Actual", .4, .2),
position = "dodge") +
scale_fill_manual(values= c(Actual ="#66FF33", Forecast="#FF3300"))
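In current tidyr (1.0.0 and later), gather() is superseded by pivot_longer(), so the same reshaping can be sketched as below. The data is retyped from the question (0149 and 0412 written as 149 and 412), and the hierarchy levels are assumed to follow the printed order. As a simplification, both series share one bar width here instead of the ifelse() widths above.

```r
library(ggplot2)
library(tidyr)

# The data from the question
sam4 <- data.frame(
  hierarchy = factor(c("E", "K", "O", "A", "S", "D", "F", "G", "R"),
                     levels = c("E", "K", "O", "A", "S", "D", "F", "G", "R")),
  Actual    = c(9313, 6257, 7183, 1579, 8755, 5897, 1400, 4960, 3032),
  Forecast  = c(5455, 3632, 8684, 6418, 149, 7812, 8810, 5710, 412)
)

# Long format: one row per hierarchy and series (Actual or Forecast)
newdata <- pivot_longer(sam4, cols = c(Actual, Forecast),
                        names_to = "actualorforecast", values_to = "value")

# fill is mapped to a data column, so the legend shows the fill colours
p <- ggplot(newdata, aes(x = hierarchy, y = value, fill = actualorforecast)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  scale_fill_manual(values = c(Actual = "#66FF33", Forecast = "#FF3300")) +
  theme_bw()
```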

Character values on a continuous axis in R ggplot2

Is there a way to include character values on the axes when plotting continuous data with ggplot2? I have censored data such as:
x y Freq
1 -3 16 3
2 -2 12 4
3 0 10 6
4 2 7 7
5 2 4 3
The last row of data is right-censored. I am plotting it with the code below to produce the following plot:
library(ggplot2)
a1 = data.frame(x=c(-3,-2,0,2,2), y=c(16,12,10,7,4), Freq=c(3,4,6,7,3))
fit = ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x)+1,by=1),
labels = seq(min(a1$x)-1,max(a1$x)+1,by=1),
limits = c(min(a1$x)-1,max(a1$x)+1))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
The 3 points at (2, 4) are right-censored. I would like them to be plotted one unit to the right, with the corresponding x-axis tick labelled '>=2' instead of 3. Any ideas if this is possible?
It is quite possible. I hacked the data so that the censored point (2, 4) becomes (3, 4). Then I modified your labels, which can be anything you want as long as there is exactly one label per break.
library(ggplot2)
a1$x[5] <- 3  # move the censored row from (2, 4) to (3, 4)
ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x),by=1),
labels = c(seq(min(a1$x)-1,max(a1$x)-1,by=1), ">=2"),
limits = c(min(a1$x)-1,max(a1$x)))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
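A variation that leaves the recorded x values untouched is sketched below. It adds a hypothetical `censored` flag, draws censored rows at x + 1 through a separate `x_plot` column, and passes a function as `labels` so that the break at 3 reads ">=2". The names `censored`, `x_plot` and `censor_labels` are illustrative, and the 3 / ">=2" pair is hard-coded for this data.

```r
library(ggplot2)

a1 <- data.frame(x = c(-3, -2, 0, 2, 2), y = c(16, 12, 10, 7, 4), Freq = c(3, 4, 6, 7, 3))

# Hypothetical flag marking the right-censored row (the last one)
a1$censored <- c(FALSE, FALSE, FALSE, FALSE, TRUE)

# Draw censored points one unit to the right of their recorded value
a1$x_plot <- ifelse(a1$censored, a1$x + 1, a1$x)

# Label the break at the censored position ">=2"; all other breaks show their number
censor_labels <- function(breaks) {
  out <- as.character(breaks)
  out[!is.na(breaks) & breaks == 3] <- ">=2"
  out
}

p <- ggplot(a1, aes(x_plot, y)) +
  geom_text(aes(label = Freq), size = 5) +
  theme_bw() +
  scale_x_continuous(breaks = -4:3, labels = censor_labels, limits = c(-4, 3)) +
  scale_y_continuous(breaks = seq(min(a1$y), max(a1$y), by = 2))
```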
