I have these boxplots in Shiny and I would like to change the colors using the vector "cols", change the order of the legend, and rename the x axis. Do you know the best way to do that? I tried scale_fill_discrete and scale_x_discrete, but they didn't work.
Thanks!
dados7 <- reactive({
dataset1() %>% filter(variable==input$frame) %>%
rename( var8 = regiao, var9 = imp, var10 = metodo)
})
cols<-c("green","orange", "red", "blue","pink","salmon","black")
renderPlotly({
title3<-paste(input$frame, "por região")
if (input$frame=="Taxa_Natalidade")
r<- dados7() %>%
ggplot(aes(x = var10, y = var9)) +
geom_boxplot(aes(fill = var10), position = position_dodge(0.9)) +
facet_wrap(vars(var8))
r
})
Have you tried scale_fill_manual(values=c("blue","green", etc...))?
That should work.
Your question is actually a few parts in one. You are looking to:
1. Change the colors for fill to a specific palette.
2. Change the order of the legend.
3. Rename the x axis.
The easy one is #3. All you have to do is rename the x axis by specifying in labs(x="new name for your axis"). If you are changing the x axis scale with a scale_x_* function, you'll want to rename within that function, since labs() is just a convenience function for various scale_*_(name=...) functions.
Now, for the other questions, it's not possible to provide a good answer without a good dataset, so I'm going to make up some data using the iris dataset.
set.seed(123)
df <- iris
df$rand_label <- sample(paste0('Type',1:3), nrow(df), replace=TRUE)
Now, see a resulting boxplot to be used to demonstrate:
p <- ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic()
p
Changing Colors
To change fill colors, use scale_fill_manual() and pass a vector of colors to the values= argument. Caution: you must supply at least as many colors as there are levels in the factor mapped to fill; with too few, ggplot2 raises an error. In this example, df$rand_label contains 3 levels, so we supply a vector of 3 colors:
cols <- c('orange','pink','gray20')
p + scale_fill_manual(values=cols)
If you want to specify to which level the colors are assigned, instead of passing a character vector you can pass either a named vector or a list of "label name" = "color name". Note that order doesn't matter here, since everything is explicitly defined:
cols1 <- c('Type3'='orange','Type2'='pink','Type1'='gray20')
p + scale_fill_manual(values=cols1)
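For a palette defined ahead of time, such as the 7-color cols vector in the original question, the named vector can also be built from the data rather than typed by hand. The following is a minimal sketch under that assumption; palette7, lvls, cols_named and p_named are illustrative names, not part of the answer above.

```r
library(ggplot2)

# Same made-up data as in the answer above.
set.seed(123)
df <- iris
df$rand_label <- sample(paste0("Type", 1:3), nrow(df), replace = TRUE)

# Palette from the original question (more colors than levels is fine).
palette7 <- c("green", "orange", "red", "blue", "pink", "salmon", "black")

# Pair the first few palette entries with the sorted level names.
lvls <- sort(unique(df$rand_label))
cols_named <- setNames(palette7[seq_along(lvls)], lvls)

p_named <- ggplot(df, aes(x = Species, y = Sepal.Length, fill = rand_label)) +
  geom_boxplot() +
  theme_classic() +
  scale_fill_manual(values = cols_named)
p_named
```

Because the vector is named, each color stays attached to its level even if the factor is releveled later.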
Changing Ordering
You can change the order of the fill legend in two different ways: (1) change the order of the legend and the positioning on the plot, and (2) just change the order of the items in the legend itself. First, I'll show you the #1 case (changing order of legend and positioning on the plot).
Change order of legend and order on plot
Changing both is more typically what you will do, since we often like the order things appear in the legend to match the order in which they appear on the plot. The best way to do this is to refactor the column in question and pass an ordered vector to levels= matching the order you want. You then need to call ggplot() again with your re-leveled factor:
df$rand_label <- factor(df$rand_label, levels=c('Type3','Type1','Type2'))
ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic() + scale_fill_manual(values=cols)
Note that the order of the colors is still applied the same, but the order of the items is different in the plot. The order in which the items appear in the legend is also different.
Change only order in the legend
If you want to adjust the order of items as they appear in the legend, you can use the breaks= argument within scale_fill_manual() to define the order in which the items appear. In this case, we can use this to return the levels to their original order in the plot above, but retain the mixed up ordering we defined by releveling the factor. Also note that since we're just passing cols and not the named vector cols1, the colors are applied according to how the levels appear in the legend (not the way in which they are ordered in the factor):
df$rand_label <- factor(df$rand_label, levels=c('Type3','Type1','Type2'))
ggplot(df, aes(x=Species, y=Sepal.Length, fill=rand_label)) +
geom_boxplot() + theme_classic() +
scale_fill_manual(values=cols, breaks=c('Type1','Type2','Type3'))
You can also use a similar strategy to reorder the x axis: in this case, you would refactor df$Species and set the levels= according to your preferred order.
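As a rough sketch of that idea, the block below relevels Species to reorder the x axis and renames the axis in the same scale call. The axis title and tick labels are made-up placeholders, and p_x is an illustrative name.

```r
library(ggplot2)

df <- iris

# Relevel the factor: the x axis follows the level order.
df$Species <- factor(df$Species, levels = c("virginica", "setosa", "versicolor"))

p_x <- ggplot(df, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  theme_classic() +
  # name= sets the axis title; labels= renames individual ticks (placeholders).
  scale_x_discrete(
    name = "Iris species",
    labels = c(virginica = "Virginica", setosa = "Setosa", versicolor = "Versicolor")
  )
p_x
```

Setting the title inside scale_x_discrete() rather than labs() keeps all x axis settings in one place, in line with the note on renaming earlier in this answer.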
I want to create a stacked barplot where I can have distinct colours for the categories along the x axis (for communication purposes as they relate to a set of strongly colour coded items), but I also want to distinguish between the stacked parts of the bar with two categories, ideally using a pattern. I've found a partial solution using colour transparency, but it's not exactly what I want.
I found some solutions using extra packages to give complex fill patterns using image fills, and some workarounds to plot lines to sit over the bars to create an artificial fill effect, and some that allowed bars to have a colour and pattern fill, but only splitting either based on bars or stacks, not both. So far I have found nothing that just allows a simple pattern fill for the stacks while also allowing the bars to be different colours.
Example:
mydata <- as.data.frame(cbind(letters = c("a","b","c","d","e","f","a","b","c","d","e","f"),
split=c("yes","yes","yes","yes","yes","yes","no","no","no","no","no","no"),
amount= c(2,3,5,3,4,6,7,2,5,7,2,4)))
colfill <- c("red","blue","green","orange","magenta","purple") ## fill for letters variable
## stackfill <- c("solid","striped") ## example of type of fill variable I want for for 'split'
## Make the barplots:
# this one colours bars by 'split':
ggplot(data=mydata, aes(x=letters, y=amount, fill=split)) +
geom_bar(stat="identity",position="stack")+
scale_fill_manual(values=colfill)
# while this one distinguished based on 'letters'
ggplot(data=mydata, aes(x=letters, y=amount, fill=letters)) +
geom_bar(stat="identity",position="stack")+
scale_fill_manual(values=colfill)
# I want to combine both to get something like this, with colour coded 'letters' and pattern coded 'split':
ggplot(data=mydata)+
geom_col(aes(x=letters, y=amount, fill=letters, alpha=split)) +
scale_alpha_discrete(range=c(1,0.5))+
scale_fill_manual(values=colfill)
Appreciate any suggestions!
Thanks,
Try this:
library(ggplot2)
#remotes::install_github("coolbutuseless/ggpattern")
library(ggpattern)
#Data
# data.frame() keeps amount numeric; as.data.frame(cbind(...)) would coerce every column to character
mydata <- data.frame(letters = c("a","b","c","d","e","f","a","b","c","d","e","f"),
split=c("yes","yes","yes","yes","yes","yes","no","no","no","no","no","no"),
amount= c(2,3,5,3,4,6,7,2,5,7,2,4))
colfill <- c("red","blue","green","orange","magenta","purple") ## fill for letters variable
## Make the barplots:
ggplot(data=mydata)+
geom_col(aes(x=letters, y=amount, fill=letters)) +
geom_col_pattern(
aes(letters, amount, pattern_fill = split,fill=letters),
pattern = 'stripe',
colour = 'black'
)+
scale_fill_manual(values=colfill)
I have some diffraction data from XRD. I'd like to plot it all in one chart, but stacked. Because the range of y is quite large, stacking is not so straightforward. There's a link to the data if you wish to play with it, and the simple script is below.
https://www.dropbox.com/s/b9kyubzncwxge9j/xrd.csv?dl=0
library(dplyr)
library(ggplot2)
library(reshape2) # provides melt()
#load it up
xrd <- read.csv("xrd.csv")
#melt it
xrd.m = melt(xrd, id.var="Degrees_2_Theta")
# Reorder so factor levels are grouped together
xrd.m$variable = factor(xrd.m$variable,
levels=sort(unique(as.character(xrd.m$variable))))
names(xrd.m)[names(xrd.m) == "variable"] <- "Sample"
names(xrd.m)[names(xrd.m) == "Degrees_2_Theta"] <- "angle"
#colours use for nearly everything
cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#plot
ggplot(xrd.m, aes(angle, value, colour=Sample, group=Sample)) +
geom_line(position = "stack") +
scale_colour_manual(values=cbPalette) +
theme_linedraw() +
theme(legend.position = "none",
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
labs(x="Degrees 2-theta", y="Intensity - stacked for clarity")
Here is the plot. As you can see, it's not quite stacked.
Here is something I made in Excel a while back. Ugly, but slightly better.
I'm not sure that I will actually use the stacked plot function from R, because from past experience I find it always looks off, and I might instead use the same data manipulation I used in Excel.
It seems that you have a different understanding of the result of applying position="stack" on your geom_line() than what actually is happening. What you're looking to do is probably best served by either using faceting or creating a ridgeline plot. I will give you solutions for both of those approaches here with some example data (sorry, I don't click dropbox links and they will eventually break anyway).
What does position="stack" actually do?
The result of position="stack" will be that your y values of each line will be added, or "stacked", together in the resulting plot. That means that the lines as drawn will only actually accurately reflect the actual value in the data for one of the lines, and the other will be "added on top" of that (stacked). The behavior is best illustrated via an example:
ex <- data.frame(x=c(1,1,2,2,3,3), y=c(1,5,1,2,1,1), grp=rep(c('A','B'),3))
ggplot(ex, aes(x,y, color=grp)) + geom_line()
The y values for "A" are equal to 1 at all values of x. This is the same as indicating position="identity". Now, let's see what happens if we use position="stack":
ggplot(ex, aes(x,y, color=grp)) + geom_line(position="stack")
You should see that the y value plotted for "B" is its actual value in the data, whereas the y value plotted for "A" is the value for "A" added to the value for "B". Hope that makes sense.
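To check this numerically rather than by eye, one option is to inspect the values ggplot2 computes for the layer. This is a small sketch using layer_data(), which returns a layer's data after the position adjustment has been applied; p_stack and stacked are names chosen here for illustration.

```r
library(ggplot2)

ex <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                 y = c(1, 5, 1, 2, 1, 1),
                 grp = rep(c("A", "B"), 3))

p_stack <- ggplot(ex, aes(x, y, color = grp)) + geom_line(position = "stack")

# layer_data() returns the data frame actually drawn for the first layer.
stacked <- layer_data(p_stack)

# group 1 is "A" and group 2 is "B" (alphabetical order of grp).
stacked <- stacked[order(stacked$group, stacked$x), c("group", "x", "y")]
stacked
```

In the result, the y column for group 2 ("B") matches the raw data, while group 1 ("A") holds the sum of both series at each x.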
Faceting
What you're trying to do is take the overlapping lines you have and "separate" them vertically, right? That's not quite stacking, as you likely want to maintain their y values as position="identity" (the default). One way to do that quite easily is to use faceting, which creates what you could call "stacked plots" according to one or two variables in your dataset. In this case, I'm using example data (for reasons outlined above), but you can use this to understand how you want to arrange your own data.
set.seed(1919191)
df <- data.frame(
x=rep(1:100, 5),
y=c(rnorm(100,0,0.1), rnorm(100,0,0.2), rnorm(100,0,0.3), rnorm(100,0,0.4), rnorm(100,0,0.5)),
sample_name=c(rep('A',100), rep('B',100), rep('C',100), rep('D',100), rep('E',100)))
# plot code
p <- ggplot(df, aes(x,y, color=sample_name))
p + geom_line() + facet_grid(sample_name ~ .)
Create a Ridgeline Plot
The other way that kind of does the same thing is to create what is known as a ridgeline plot. You can do this via the package ggridges and here's an example using geom_ridgeline():
library(ggridges) # provides geom_ridgeline()
p + geom_ridgeline(
aes(y=sample_name, height=y),
fill=NA, scale=1, min_height=-Inf)
The idea here is to understand that geom_ridgeline() changes your y axis to be the grouping variable (so we actually have to redefine that in aes()), and the actual y value for each of those groups should be assigned to the height= aesthetic. If you have data that has negative y values (now height= values), you'll also want to set the min_height=, or it will cut them off at 0 by default. You can also change how much each of the groups are separated by playing with scale= (does not always change in the way you think it would, btw).
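A third route, closer to the Excel-style manipulation mentioned in the question, is to shift each sample by a constant offset before plotting. This is only a sketch on the same kind of simulated data; the gap value and the column names offset and y_shifted are assumptions made for illustration.

```r
set.seed(1919191)
df <- data.frame(
  x = rep(1:100, 5),
  y = c(rnorm(100, 0, 0.1), rnorm(100, 0, 0.2), rnorm(100, 0, 0.3),
        rnorm(100, 0, 0.4), rnorm(100, 0, 0.5)),
  sample_name = rep(c("A", "B", "C", "D", "E"), each = 100))

# Use the overall range of y as spacing so neighbouring traces cannot cross.
gap <- diff(range(df$y))

# Sample A gets offset 0, B gets 1 * gap, C gets 2 * gap, and so on.
df$offset <- (as.integer(factor(df$sample_name)) - 1) * gap
df$y_shifted <- df$y + df$offset

# Lowest and highest shifted value per sample.
aggregate(y_shifted ~ sample_name, data = df, FUN = range)
```

Plotting y_shifted instead of y with a plain geom_line() (the default position="identity") then gives vertically separated traces on a single panel. A smaller gap packs the traces more tightly at the cost of possible overlap.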
Related to this question.
If I create a gradient using colorRampPalette, is there a way to have ggplot2 automatically detect the number of colours it will need from this gradient?
In the example below, I have to specify 3 colours will be needed for the 3 cyl values. This requires me knowing ahead of time that I'll need this many. I'd like to not have to specify it and have ggplot detect the number it will need automatically.
myColRamp <- colorRampPalette(c('#a0e2f2', '#27bce1'))
ggplot(mtcars, aes(x = wt, y = mpg, col = as.factor(cyl))) +
geom_point(size = 3) +
scale_colour_manual(values = myColRamp(3)) # How to avoid having to specify 3?
I'm also open to options that don't use colorRampPalette but achieve the same functionality.
I see two options here. One which requires a little customisation. One which has more code but requires no customisation.
Option 1 - Determine number of unique factors from your specific variable
Simply use the length and unique functions to work out how many distinct values are in cyl.
values = myColRamp(length(unique(mtcars$cyl)))
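The same idea can be wrapped in a small helper so the count never has to be typed by hand. This is a sketch; ramp_for() is a hypothetical helper name, not a ggplot2 function.

```r
myColRamp <- colorRampPalette(c("#a0e2f2", "#27bce1"))

# Hypothetical helper: one ramp colour per distinct value of x, named by level.
ramp_for <- function(x, ramp) {
  lvls <- levels(factor(x))
  setNames(ramp(length(lvls)), lvls)
}

cyl_cols <- ramp_for(mtcars$cyl, myColRamp)
cyl_cols
```

The named result can be passed as scale_colour_manual(values = cyl_cols) in the plot from the question, and the helper works unchanged for any other column.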
Option 2 - Build the plot, and see how many colours it used
If you don't want to specify the name of the variable, and want something more general, we can build the plot, and see how many colours ggplot used, then build it again.
To do this, we also have to save our plot as an object, let's call that plot object p.
p <- ggplot(mtcars, aes(x = wt, y = mpg, col = as.factor(cyl))) +
geom_point(size = 3)
#Notice I haven't set the colour option this time
p_built <- ggplot_build(p) #This builds the plot and saves the data based on
#the plot, so x data is called 'x', y is called 'y',
#and importantly in this case, colour is called the
#generic 'colour'.
#Now we can fish out that data and check how many colour levels were used
num_colours <- length(unique(p_built$data[[1]]$colour))
#Now we know how many colours were used, we can add the colour scale to our plot
p <- p + scale_colour_manual(values = myColRamp(num_colours))
Now either just call p or print(p) depending on your use to view it.
I'm plotting a dense scatter plot in ggplot2 where each point might be labeled by a different color:
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))
When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which?
For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels?
I'd like to find a solution that can be translated easily to rpy2.
2016 Update:
The order aesthetic has been deprecated, so at this point the easiest approach is to sort the data.frame so that the green point is at the bottom, and is plotted last. If you don't want to alter the original data.frame, you can sort it during the ggplot call - here's an example that uses %>% and arrange from the dplyr package to do the on-the-fly sorting:
library(dplyr)
ggplot(df %>%
arrange(label),
aes(x = x, y = y, color = label, size = size)) +
geom_point()
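Because the question asks for something that ports easily to rpy2, the same reordering can be done without dplyr. This is a minimal base R sketch that rebuilds the df from the question; df_sorted is an illustrative name and the seed is added only for reproducibility.

```r
set.seed(42)  # seed added here only to make the sketch reproducible
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"
df$size <- 2

# FALSE sorts before TRUE, so rows labelled "point" move to the end
# and are therefore drawn last (on top).
df_sorted <- df[order(df$label == "point"), ]

tail(df_sorted$label, 3)
```

Passing df_sorted to ggplot() in place of the arrange() pipeline should give the same drawing order; sorting on df$label != "point" instead moves that row to the front so it is drawn first.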
Original 2015 answer for ggplot2 versions < 2.0.0
In ggplot2, you can use the order aesthetic to specify the order in which points are plotted. The last ones plotted will appear on top. To apply this, you can create a variable holding the order in which you'd like points to be drawn.
To put the green dot on top by plotting it after the others:
df$order <- ifelse(df$label=="a", 1, 2)
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=order))
Or to plot the green dot first and bury it, plot the points in the opposite order:
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=-order))
For this simple example, you can skip creating a new sorting variable and just coerce the label variable to a factor and then a numeric:
ggplot(df) +
geom_point(aes(x=x, y=y, color=label, size=size, order=as.numeric(factor(df$label))))
ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.
Where this is different, it is noted. For example
geom_line
Connect observations, ordered by x value.
and
geom_path
Connect observations in data order
There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author Hadley
The display of a plot should be invariant to the order of the data frame - anything else is a bug.
With this quote in mind, a layer is drawn in the specified order, so overplotting can be an issue, especially when creating dense scatter plots. So if you want a consistent plot (and not one that relies on the order in the data frame) you need to think a bit more.
Create a second layer
If you want certain values to appear above other values, you can use the subset argument to create a second layer to definitely be drawn afterwards. You will need to explicitly load the plyr package so .() will work.
set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
library(plyr)
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(aes(x = x, y = y, color = label, size = size),
subset = .(label == 'point'))
Update
In ggplot2_2.0.0, the subset argument is deprecated. Use e.g. base::subset to select relevant data specified in the data argument. And no need to load plyr:
ggplot(df) +
geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(data = subset(df, label == 'point'),
aes(x = x, y = y, color = label, size = size))
Or use alpha
Another approach to avoid the problem of overplotting would be to set the alpha (transparency) of the points. This will not be as effective as the explicit second layer approach above; however, with judicious use of scale_alpha_manual you should be able to get something to work.
For example:
# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size,alpha = label)) +
scale_alpha_manual(guide='none', values = c(a = 0.2, point = 1))
The fundamental question here can be rephrased like this:
How do I control the layers of my plot?
In the 'ggplot2' package, you can do this quickly by splitting each different layer into a different command. Thinking in terms of layers takes a little bit of practice, but it essentially comes down to what you want plotted on top of other things. You build from the background upwards.
Prep: Prepare the sample data. This step is only necessary for this example, because we don't have real data to work with.
# Establish random seed to make data reproducible.
set.seed(1)
# Generate sample data.
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
# Initialize 'label' and 'size' default values.
df$label <- "a"
df$size <- 2
# Label and size our "special" point.
df$label[50] <- "point"
df$size[50] <- 4
You may notice that I've added a different size to the example just to make the layer difference clearer.
Step 1: Separate your data into layers. Always do this BEFORE you use the 'ggplot' function. Too many people get stuck by trying to do data manipulation from within the 'ggplot' functions. Here, we want to create two layers: one with the "a" labels and one with the "point" labels.
df_layer_1 <- df[df$label=="a",]
df_layer_2 <- df[df$label=="point",]
You could do this with other functions, but I'm just quickly using the data frame matching logic to pull the data.
Step 2: Plot the data as layers. We want to plot all of the "a" data first and then plot all the "point" data.
ggplot() +
geom_point(
data=df_layer_1,
aes(x=x, y=y),
colour="orange",
size=df_layer_1$size) +
geom_point(
data=df_layer_2,
aes(x=x, y=y),
colour="blue",
size=df_layer_2$size)
Notice that the base plot layer ggplot() has no data assigned. This is important, because we are going to override the data for each layer. Then, we have two separate point geometry layers geom_point(...) that use their own specifications. The x and y axis will be shared, but we will use different data, colors, and sizes.
It is important to move the colour and size specifications outside of the aes(...) function, so we can specify these values literally. Otherwise, the 'ggplot' function will usually assign colors and sizes according to the levels found in the data. For instance, if you have size values of 2 and 5 in the data, it will assign a default size to any occurrences of the value 2 and will assign some larger size to any occurrences of the value 5. An 'aes' function specification will not use the values 2 and 5 for the sizes. The same goes for colors. I have exact sizes and colors that I want to use, so I move those arguments into the 'geom_point' function itself. Also, any specifications in the 'aes' function will be put into the legend, which is often not useful.
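If the sizes should come from a column but still be used literally, one alternative to moving them outside aes() is an identity scale. The sketch below rebuilds the df from the Prep step and sorts it so the special point is drawn last; df_sorted and p_identity are illustrative names.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$size <- 2
df$label[50] <- "point"
df$size[50] <- 4

# Sort so the special point is drawn last, then use the size column as-is.
df_sorted <- df[order(df$label == "point"), ]

p_identity <- ggplot(df_sorted, aes(x = x, y = y, colour = label, size = size)) +
  geom_point() +
  scale_size_identity()   # sizes 2 and 4 are used literally, not rescaled

# Sizes that ggplot2 will actually draw.
unique(layer_data(p_identity)$size)
```

This keeps everything in a single layer, at the cost of relying on row order for which point ends up on top; the two-layer version above makes that order explicit.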
Final note: In this example, you could achieve the wanted result in many ways, but it is important to understand how 'ggplot2' layers work in order to get the most out of your 'ggplot' charts. As long as you separate your data into different layers before you call the 'ggplot' functions, you have a lot of control over how things will be graphed on the screen.
It's plotted in order of the rows in the data.frame. Try this:
df2 <- rbind(df[-50,],df[50,])
ggplot(df2) + geom_point(aes(x=x, y=y, color=label, size=size))
As you see the green point is drawn last, since it represents the last row of the data.frame.
Here is a way to order the data.frame to have the green point drawn first:
df2 <- df[order(-as.numeric(factor(df$label))),]
I'm plotting a set of discrete levels of a factor on the x axis and their relevant mean outcome value on the y-axis, something like this:
ggplot(data, aes(item, outcome)) +
stat_summary(fun.y=mean, geom="point", colour="red",size=3)
the last 'item' I have is the mean, and I would like to make this pop out visually.
Is it possible to have a different shape or color for just one level of the factor item?
Is it possible to physically shift or create a barrier for one level of a factor (as if it were a facet)?
You can easily make the last level a different color (or shape) by adding another factor to your data frame that has two levels: the one you want, and everything else. For instance:
dat <- data.frame(item=rep(letters[1:3],times=3),outcome=runif(9))
dat$grp <- rep(c("grp1","grp1","grp2"),times=3)
ggplot(dat, aes(item, outcome))+
stat_summary(fun=mean,aes(colour=grp), geom="point",size=3) # fun.y was renamed to fun in ggplot2 3.3.0
Then you set the colour aesthetic in aes rather than globally. Once you have this additional variable, you can also facet on it (edited to reflect @Ben Bolker's comment):
ggplot(dat, aes(item, outcome)) +
stat_summary(fun=mean, geom="point",size=3) +
facet_grid(.~grp,scales="free_x",space="free")
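When the item to highlight is always the last factor level, the grouping column can be derived instead of typed out. This is a sketch under that assumption; the level name "mean" and the group labels "item" and "overall mean" are placeholders.

```r
set.seed(1)
dat <- data.frame(item = rep(c("a", "b", "mean"), times = 3),
                  outcome = runif(9))
dat$item <- factor(dat$item, levels = c("a", "b", "mean"))

# Flag the last level of item; everything else shares one group.
last_level <- tail(levels(dat$item), 1)
dat$grp <- ifelse(dat$item == last_level, "overall mean", "item")

table(dat$item, dat$grp)
```

The resulting grp column can then be mapped to colour or used in facet_grid() exactly as in the two plots above.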