R - order of legend ggplot - r

I have the following data frame:
Author<-c("University","Office", "School","University","Office", "School","University","Office", "School")
Typ<-c("Text", "Text", "Text","Data", "Data","Data", "List", "List", "List")
Number<-c("3","1","6","4","4","2","8","1","1")
df<-data.frame(Typ,Author,Number)
If I apply:
ggplot(df, aes(x=Author, y=Number, fill=Typ)) +
geom_bar(stat='identity') + coord_flip()
then I get a stacked bar plot where the bars are ordered as in the data frame, i.e. Text, Data, List, but the legend is in alphabetical order. Is there any option (not brute force, i.e. not by hand) to rearrange the legend so that it also follows the "given" order of the df, i.e. Text, Data, List?
(Just to clarify: I have many data frames like this one, some of them larger, each with a different "Typ" vector containing more entries. Their order must not change and should also be reflected in the legend. I wrote a routine that plots all of these data frames, so I cannot adjust the legends manually. I am really looking for a routine-friendly solution.)

You could automatically set your levels in the order in which they appear in your data.frame:
df$Typ <- factor(df$Typ, levels = unique(df$Typ))
ggplot(df, aes(x=Author, y=Number, fill=Typ)) +
geom_bar(stat='identity') + coord_flip()
In this way the factor levels follow the order of first appearance in df$Typ, so the legend lists Text, Data, List.
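Since the asker plots many data frames from a routine, the same idea can be wrapped in a small helper. This is only a minimal sketch: the helper name keep_row_order() is hypothetical (it is not part of ggplot2), and Number is stored as numeric here, which is an assumption about what was intended.

```r
library(ggplot2)

# Hypothetical helper: turn one column into a factor whose levels
# follow the order of first appearance in the data frame.
keep_row_order <- function(data, column) {
  data[[column]] <- factor(data[[column]], levels = unique(data[[column]]))
  data
}

df <- data.frame(
  Typ    = c("Text", "Text", "Text", "Data", "Data", "Data", "List", "List", "List"),
  Author = rep(c("University", "Office", "School"), times = 3),
  Number = c(3, 1, 6, 4, 4, 2, 8, 1, 1)  # numeric rather than character (assumption)
)

df <- keep_row_order(df, "Typ")
levels(df$Typ)

# geom_col() is equivalent to geom_bar(stat = 'identity')
p <- ggplot(df, aes(x = Author, y = Number, fill = Typ)) +
  geom_col() +
  coord_flip()
p
```

Inside a plotting routine, the helper could be called on each data frame just before the ggplot() call, so no legend has to be edited by hand.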

Related

How to make stacked bar chart with count values on y axis

I'm trying to create a stacked barchart with gene sequencing data, where for each gene there is a tRF.type and Amino.Acid value. An example data set looks like this:
tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)
I would like the x-axis to represent the amino acid type, the y-axis the number of counts of each tRF type, and the fill of the bars to represent each tRF type.
My code is:
ggplot(chart_data, aes(x = Amino.Acid, y = tRF.type, fill = tRF.type)) +
geom_bar(stat="identity") +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
However, it generates this graph, where the y-axis is labelled with the categories of tRF type. How can I change my code so that the y-axis scale is numerical and represents the counts of each tRF type?
[Image: bar chart]
Welcome to SO, OP. In future questions, please be sure to provide a minimal reproducible example: code, an image (if possible), and at least a representative dataset that demonstrates your question or problem clearly.
TL;DR - don't use stat="identity", just use geom_bar() without providing a stat, since default is to use the counts. This should work:
ggplot(chart_data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()
The dataset provided doesn't adequately demonstrate your issue, so here's one that can work. The example data herein consists of 100 observations and two columns: one called Capitals for randomly-selected uppercase letters and one Lowercase for randomly-selected lowercase letters.
library(ggplot2)
set.seed(1234)
df <- data.frame(
Capitals=sample(LETTERS, 100, replace=TRUE),
Lowercase=sample(letters, 100, replace=TRUE)
)
If I plot similar to your code, you can see the result:
ggplot(df, aes(x=Capitals, y=Lowercase, fill=Lowercase)) +
geom_bar(stat="identity")
You can see, the bars are stacked, but the y axis is all smooshed down. The reason is related to understanding the difference between geom_bar() and geom_col(). Checking the documentation for these functions, you can see that the main difference is that geom_col() will plot bars with heights equal to the y aesthetic, whereas geom_bar() plots by default according to stat="count". In fact, using geom_bar(stat="identity") is really just a complicated way of saying geom_col().
Since your y aesthetic is not numeric, ggplot still tries to treat the discrete levels numerically. It doesn't really work out well, and it's the reason why your axis gets smooshed down like that. What you want, is geom_bar(stat="count").... which is the same as just using geom_bar() without providing a stat=.
One caveat: with the default stat="count", geom_bar() accepts either an x or a y aesthetic, but not both, because the other axis is used for the computed counts. Dropping y fixes the issue and gives the proper chart:
ggplot(df, aes(x=Capitals, fill=Lowercase)) + geom_bar()
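To make the relationship between geom_bar() and geom_col() concrete, here is a hedged sketch that counts the combinations by hand and then plots the precomputed counts with geom_col(). The object names counts and p_col are made up for this illustration; the data frame is the same made-up one used above.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  Capitals  = sample(LETTERS, 100, replace = TRUE),
  Lowercase = sample(letters, 100, replace = TRUE)
)

# Count each Capitals/Lowercase combination explicitly.
# as.data.frame() on a table yields one row per combination plus a Freq column.
counts <- as.data.frame(table(Capitals = df$Capitals, Lowercase = df$Lowercase))
counts <- counts[counts$Freq > 0, ]  # drop combinations that never occur

# geom_col() uses the y aesthetic as the bar height, so y must be numeric.
p_col <- ggplot(counts, aes(x = Capitals, y = Freq, fill = Lowercase)) +
  geom_col()
p_col
```

This should give the same stacked bars as geom_bar() on the raw data; it is mainly useful when the counts already exist in a column.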
You want your y-axis to be a count, not tRF.type. This code should give you the correct plot: I've removed y = tRF.type from ggplot() and stat = "identity" from geom_bar(), which now uses its default of stat = "count".
ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
geom_bar() +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("Count")

Change colors boxplots by groups ggplot2

I have these boxplots in shiny and I would like to change the colors using the vector "cols", change the order of the legend, and rename the x axis. Do you know the best way to do that? I have tried scale_fill_discrete and scale_x_discrete, but they didn't work.
Thanks!
dados7 <- reactive({
dataset1() %>% filter(variable==input$frame) %>%
rename( var8 = regiao, var9 = imp, var10 = metodo)
})
cols<-c("green","orange", "red", "blue","pink","salmon","black")
renderPlotly({
title3<-paste(input$frame, "por região")
if (input$frame=="Taxa_Natalidade")
r<- dados7() %>%
ggplot(aes(x = var10, y = var9)) +
geom_boxplot(aes(fill = var10), position = position_dodge(0.9)) +
facet_wrap(vars(var8))
r
})
Have you tried scale_fill_manual(values=c("blue","green", etc...))? It should work.
Your question is actually a few parts in one. You are looking to:
1. Change the colors for fill to a specific palette.
2. Change the order of the legend.
3. Rename the x axis.
The easy one is #3. All you have to do is rename the x axis by specifying in labs(x="new name for your axis"). If you are changing the x axis scale with a scale_x_* function, you'll want to rename within that function, since labs() is just a convenience function for various scale_*_(name=...) functions.
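As a small illustration of the two ways to rename the axis, here is a sketch using the built-in iris data. The axis title "Iris species" and the capitalized tick labels are made up for this example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# Option 1: rename the axis with labs()
p_labs <- p + labs(x = "Iris species")

# Option 2: if a scale_x_* function is already needed (here to relabel
# the ticks), set the axis title through its name= argument instead.
p_scale <- p + scale_x_discrete(
  name   = "Iris species",
  labels = c(setosa = "Setosa", versicolor = "Versicolor", virginica = "Virginica")
)

p_labs
p_scale
```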
Now, for the other questions, it's not possible to provide a good answer without a good dataset, so I'm going to make up some data using the iris dataset.
set.seed(123)
df <- iris
df$rand_label <- sample(paste0('Type',1:3), nrow(df), replace=TRUE)
Now, see a resulting boxplot to be used to demonstrate:
p <- ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic()
p
Changing Colors
To change fill colors, pass a vector of colors to the values= argument of scale_fill_manual(). Caution: you must supply at least as many colors as there are levels in the factor used to define fill. In this example, df$rand_label contains 3 levels, so we supply a vector of 3 colors:
cols <- c('orange','pink','gray20')
p + scale_fill_manual(values=cols)
If you want to specify to which level the colors are assigned, instead of passing a character vector you can pass either a named vector or a list of "label name" = "color name". Note that order doesn't matter here, since everything is explicitly defined:
cols1 <- c('Type3'='orange','Type2'='pink','Type1'='gray20')
p + scale_fill_manual(values=cols1)
Changing Ordering
You can change the order of the fill legend in two different ways: (1) change the order of the legend and the positioning on the plot, and (2) just change the order of the items in the legend itself. First, I'll show you the #1 case (changing order of legend and positioning on the plot).
Change order of legend and order on plot
Changing both is more typically what you will do, since we often like the order things appear in the legend to match the order in which they appear on the plot. The best way to do this is to refactor the column in question and pass an ordered vector to levels= matching the order you want. You then need to call ggplot() again with your re-leveled factor:
df$rand_label <- factor(df$rand_label, levels=c('Type3','Type1','Type2'))
ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic() + scale_fill_manual(values=cols)
Note that the order of the colors is still applied the same, but the order of the items is different in the plot. The order in which the items appear in the legend is also different.
Change only order in the legend
If you only want to adjust the order of items in the legend, use the breaks= argument of scale_fill_manual(). Here we use it to return the legend to its original order (Type1, Type2, Type3) while keeping the mixed-up ordering on the plot that comes from re-leveling the factor. Also note that since we're passing cols and not the named vector cols1, the colors are applied in the order the levels appear in the legend (not the order of the factor levels):
df$rand_label <- factor(df$rand_label, levels=c('Type3','Type1','Type2'))
ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic() +
scale_fill_manual(values=cols, breaks=c('Type1','Type2','Type3'))
You can also use a similar strategy to reorder the x axis: in this case, you would refactor df$Species and set the levels= according to your preferred order.
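For example, a sketch of reordering the x axis by re-leveling Species could look like this (the particular order chosen is arbitrary and only for illustration):

```r
library(ggplot2)

df <- iris

# Re-level Species; the x axis follows the order of the factor levels.
df$Species <- factor(df$Species, levels = c("virginica", "setosa", "versicolor"))
levels(df$Species)

p_x <- ggplot(df, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  theme_classic()
p_x
```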

Apply ggplot2 across columns

I am working with a dataframe with many columns and would like to produce certain plots of the data using ggplot2, namely, boxplots, histograms, density plots. I would like to do this by writing a single function that applies across all attributes (columns), producing one boxplot (or histogram etc) and then storing that as a given element of a list into which all the boxplots will be chained, so I could later index it by number (or by column name) in order to return the plot for a given attribute.
The issue I have is that, if I try to apply across columns with something like apply(df,2,boxPlot), I have to define boxPlot as a function that takes just a vector x. And when I do so, the attribute/column name and index are no longer retained. So e.g. in the code for producing a boxplot, like
bp <- ggplot(df, aes(x=Group, y=Attr, fill=Group)) +
geom_boxplot() +
labs(title="Plot of length per dose", x="Group", y =paste(Attr)) +
theme_classic()
the function has no idea how to extract the info necessary for Attr from just vector x (as this is just the column data and doesn't carry the column name or index).
(Note the x-axis is a factor variable called 'Group', which has 6 levels A,B,C,D,E,F, within X.)
Can anyone help with a good way of automating this procedure? (Ideally it should work for all types of ggplots; the problem here seems to simply be how to refer to the attribute name, within the ggplot function, in a way that can be applied / automatically replicated across the columns.) A for-loop would be acceptable, I guess, but if there's a more efficient/better way to do it in R then I'd prefer that!
Edit: something like what would be achieved by the top answer to this question: apply box plots to multiple variables. Except that in that answer, with his code you would still need a for-loop to change the indices on y=y[2] in the ggplot code and get all the boxplots. He's also expanded the grid to include different x possibilities (I have only one, the Group factor), but it would be easy to simplify down if the looping problem could be handled.
I'd also prefer just base R if possible--dplyr if absolutely necessary.
Here's an example of iterating over all columns of a data frame to produce a list of plots, while retaining the column name in the ggplot axis label:
library(tidyverse)
plots <-
imap(select(mtcars, -cyl), ~ {
ggplot(mtcars, aes(x = cyl, y = .x)) +
geom_point() +
ylab(.y)
})
plots$mpg
You can also do this without purrr and dplyr:
to_plot <- setdiff(names(mtcars), 'cyl')
plots <-
Map(function(.x, .y) {
ggplot(mtcars, aes(x = cyl, y = .x)) +
geom_point() +
ylab(.y)
}, mtcars[to_plot], to_plot)
plots$mpg
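Another option, closer to the boxplot in the question, is to loop over column names and look each column up with ggplot2's .data pronoun. The following is only a sketch under assumptions: iris stands in for the asker's data, Species plays the role of Group, and the names value_cols and make_boxplot are hypothetical.

```r
library(ggplot2)

df <- iris  # stand-in data; Species plays the role of Group

# Every column except the grouping factor gets its own plot.
value_cols <- setdiff(names(df), "Species")

# Hypothetical helper: build one boxplot for the column named `col`.
make_boxplot <- function(col) {
  ggplot(df, aes(x = Species, y = .data[[col]], fill = Species)) +
    geom_boxplot() +
    labs(title = paste("Plot of", col, "per group"), x = "Group", y = col) +
    theme_classic()
}

plots <- lapply(value_cols, make_boxplot)
names(plots) <- value_cols  # allows indexing by column name

plots$Sepal.Length  # by name
plots[[2]]          # or by position
```

Because the function receives the column name rather than the column values, the name stays available for the axis label and the title.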

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are a couple of things I need to accomplish, and I don't know where to start.
I would like the bars not to be placed at their corresponding nseqs values, but rather next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried setting scales and size to free_x in facet_grid, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
library(ggplot2)
library(reshape2)  ## assumption: melt() here comes from reshape2
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
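The effect of factor() on a numeric column can be checked without any plotting. In this sketch the nseqs values are made up; the point is that the levels of a factor built from numbers are sorted numerically, which is what places the bars side by side in ascending nseqs order.

```r
# Made-up sequence counts, deliberately unsorted and with a repeat.
nseqs <- c(250, 30, 1200, 30)

# factor() on a numeric vector sorts the unique values numerically.
as_numeric_factor <- factor(nseqs)
levels(as_numeric_factor)

# If the column had been read in as text, the default levels would be
# sorted as strings instead, which is usually not the desired order.
nseqs_text <- as.character(nseqs)
as_text_factor <- factor(nseqs_text)
levels(as_text_factor)

# Fix for the text case: derive the levels from the numeric values.
fixed_factor <- factor(nseqs_text, levels = sort(unique(as.numeric(nseqs_text))))
levels(fixed_factor)
```

The last step relies on the numbers converting back to the same strings, which holds for plain integers like these but may not for values that R prints in scientific notation.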

Reorder levels of facet_wrap using ggplot

I have an existing dataset with three factors. I would like to plot these three factors using facet_grid() and have them ordered based on how they are ordered in the dataset instead of alphabetical order. Is this possible to do somehow without modifying my data structure?
Here's the data:
https://dl.dropboxusercontent.com/u/22681355/data.csv
data<-read.csv("data.csv", head=T)
ggplot(data, aes(time,a, color="one")) +
geom_line(linetype=1, size=0.3) +
scale_y_continuous(breaks=seq(0,1,0.2)) +
scale_x_continuous(breaks=seq(100,300,50)) +
theme_bw() +
geom_line(aes(time,b)) +
geom_line(aes(time,c)) +
geom_line(aes(time,d))+facet_wrap(~X.1)
This question comes up quite often on SO. You have to convert the column you are facetting by into a factor whose levels are in the order you want, as follows:
data$X.1 <- factor(data$X.1, levels=unique(data$X.1))
Now, plot it and you'll get the facetted plot in the desired order.
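Since the linked data file may not be available, here is a small self-contained sketch with made-up data. The data frame toy and its values are hypothetical; only the column name X.1 is taken from the question.

```r
library(ggplot2)

# Made-up data: three groups whose order of appearance is not alphabetical.
toy <- data.frame(
  time = rep(c(100, 200, 300), times = 3),
  a    = c(0.1, 0.4, 0.9, 0.2, 0.5, 0.7, 0.3, 0.6, 0.8),
  X.1  = rep(c("zeta", "alpha", "mid"), each = 3)
)

# Levels in order of first appearance, so the panels are drawn as
# zeta, alpha, mid instead of alphabetically.
toy$X.1 <- factor(toy$X.1, levels = unique(toy$X.1))
levels(toy$X.1)

p_facet <- ggplot(toy, aes(time, a)) +
  geom_line() +
  facet_wrap(~X.1)
p_facet
```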
