For example, I have a basic stacked bar plot:
ggplot(diamonds, aes(x=factor(color),fill=factor(cut)))+geom_bar(position="fill")
and a small subset of diamonds with a "carat" value higher than 3:
subset(diamonds,carat>3)
I want to highlight these particular values on the plot (as points, or as labels if the diamonds had IDs) to see where in the distribution they lie. Is there any way to do something like that?
PS: unfortunately I'm not allowed to post figures.
The following inserts the count of "carat greater than 3" into the bar segments. I've broken the problem down into a number of steps. Step 1: Create a new variable identifying "carat greater than 3". Step 2: Get a summary table of the counts: of diamonds for each color and cut, and of "carat greater than 3" for each color and cut. I used the ddply() function from the plyr package. Step 3: Draw the bar plot without the labels. Step 4: Add to the summary table a variable giving the y positions of the labels. Step 5: Add the geom_text layer to the plot. The data frame for geom_text is the summary table. geom_text() needs aesthetics for the label (in this case, the count for "carat greater than 3"), the y position (calculated in the previous step), and the x position (color).
library(ggplot2)
library(plyr)
# Step 1
diamonds$caratGT3 = ifelse(diamonds$carat > 3, 1, 0)
# Step 2
diamonds2 = ddply(diamonds, .(color, cut), summarize, CountGT3 = sum(caratGT3))
diamonds2$Count = count(diamonds, .(color, cut))[,3]
diamonds2
# Step 3
p = ggplot() + geom_bar(data = diamonds, aes(x=factor(color),fill=factor(cut)))
# Step 4
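# ggplot2 (>= 2.2.0) stacks the first fill level on top, so accumulate
# the counts from the last level of cut upwards
diamonds2 <- ddply(diamonds2, .(color),
  function(x) {
    x$cfreq <- rev(cumsum(rev(x$Count)))  # top edge of each segment
    x$pos <- x$cfreq - x$Count / 2        # midpoint of each segment
    x
  })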
# Step 5
# geom_text() takes fontface, not face
(p <- p + geom_text(data = diamonds2,
  aes(x = factor(color), y = pos, label = CountGT3),
  size = 3, colour = "black", fontface = "bold"))
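Since the plot in the question uses position = "fill", the same idea can also be sketched with a pre-computed summary and position_fill(), which centres each label in its segment without computing y positions by hand. This is only a minimal sketch, assuming dplyr (>= 1.0) is installed; the names counts, n, n_gt3 and p_fill are made up for illustration.

```r
library(ggplot2)
library(dplyr)

# One row per color/cut cell: total count and count of diamonds with carat > 3
counts <- diamonds %>%
  group_by(color, cut) %>%
  summarise(n = n(), n_gt3 = sum(carat > 3), .groups = "drop")

# Proportional stacked bars; labels are shown only where the count is non-zero
p_fill <- ggplot(counts, aes(x = color, y = n, fill = cut)) +
  geom_col(position = "fill") +
  geom_text(aes(label = ifelse(n_gt3 > 0, n_gt3, "")),
            position = position_fill(vjust = 0.5), size = 3)
p_fill
```

Because the text layer inherits the fill mapping, it is grouped and stacked in the same way as the bars, so the labels follow whatever stacking order the installed ggplot2 version uses.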
Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
..but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I’m not sure why you want to arrange the categorical variable on your chart as you do, other than for aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot uses a numerical scale to plot categorical variables. The workaround for your chart is then, for each x value, to plot a transparent point at a y value equal to the largest number of categories in any group. Points are plotted for all x values as a simple way of handling non-overlapping ranges of x values across groups. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
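A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
# The blank labels must differ in length, otherwise they collapse into one category
df %<>%
  rbind(data.frame(let = c(" ", "  ", "   "),
                   value = NA,
                   group = "foo"))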
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
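The padding idea can be made to scale to any number of groups by generating the blank labels programmatically. The following is a rough sketch rather than a tested recipe: pad_groups() is a hypothetical helper written for this example, and it assumes the columns are named let, value and group as in the question.

```r
library(ggplot2)

# Hypothetical helper: pad every group with blank, NA-valued rows so that
# all groups end up with as many categories as the largest group
pad_groups <- function(df) {
  n_max <- max(table(df$group))
  pads <- lapply(split(df, df$group), function(g) {
    n_pad <- n_max - nrow(g)
    if (n_pad == 0) return(NULL)
    data.frame(let = strrep(" ", seq_len(n_pad)),  # " ", "  ", "   ", ...
               value = NA,
               group = g$group[1])
  })
  rbind(df, do.call(rbind, pads))
}

df <- data.frame(let = LETTERS[1:13], value = sample(13),
                 group = rep(c("foo", "bar"), times = c(5, 8)))
padded <- pad_groups(df)

# The NA rows are dropped with a warning, but their labels still take up axis positions
ggplot(padded, aes(x = let, y = value)) +
  geom_point() +
  coord_flip() +
  facet_wrap(~group, scales = "free")
```

The blank labels are only unique within a group, which is enough here because each facet has its own free axis.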
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
I am trying to color the points in a line depending on whether they are above or below the yearly mean in ggplot2, and I cannot find any help for cases where the colors are not matched exactly to values.
I'm using the following code:
ggplot(aes(x = M, y = O)) + geom_line()
I want the line to be one color where O is above mean(O) and another where it is below.
I tried to follow the advice but I just get a split graph when I use:
mutate(color=ifelse(O>mean(O),"green","red")) %>% ggplot(aes(x=M,y=O,color=color))+geom_line()+scale_color_manual(values=c("red", "darkgreen"))
I get the following graph:
This works, but makes a break in the line.
library(tidyverse)
df <- data.frame(
M = 1:5,
O = c(1, 2, 3, 4, 5)
)
df <- mutate(df, above = O > mean(O))
ggplot(df, aes(x=M,y=O, color=above))+geom_line()
Build a variable color to mark your color type.
For points use geom_point(), not geom_line().
Edit: the color aesthetic splits the data into 2 groups. Use group=1 (one value for all) to force a single group.
Advice: Avoid naming a variable O; it is easily confused with 0 (zero).
library(tidyverse)
df <- data.frame(M=rnorm(10), O=rnorm(10)) %>%
  mutate(color = O > mean(O))  # logical flag: TRUE where O is above its mean
#ggplot(df, aes(x=M, y=O, color = color)) + geom_point()
ggplot(df, aes(x=M, y=O, color = color, group=1)) + geom_line() + scale_color_manual(values=c("red", "green"))
# > df
# M O color
# 1 0.05829207 -0.03490925 FALSE
# 2 -0.09255111 -0.52513201 FALSE
# 3 0.44859944 0.19371037 FALSE
# 4 -0.54216222 0.40783749 TRUE
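The effect of group=1 can be checked by inspecting the data ggplot2 builds for the line layer. This is a small sketch on made-up data; toy, p_split, p_single and n_groups are names invented for the example.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(M = 1:10, O = rnorm(10))
toy$above <- toy$O > mean(toy$O)

# Without an explicit group, the color mapping defines two groups (two lines)
p_split <- ggplot(toy, aes(x = M, y = O, color = above)) + geom_line()
# With group = 1 all rows belong to one group, so a single line is drawn
p_single <- ggplot(toy, aes(x = M, y = O, color = above, group = 1)) + geom_line()

# Count the distinct groups in the built layer data
n_groups <- function(p) length(unique(layer_data(p, 1)$group))
n_groups(p_split)
n_groups(p_single)
```

With a single group, each line segment takes the color of the point where it starts, so the color changes at data points rather than exactly where the line crosses the mean.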
I used geom_tile() to plot 3 variables on the same graph, with
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?
To plot the mean of the fill values, you should aggregate your values before plotting. scale_fill_gradient(...) does not work on the data level, but on the visualization level.
Let's start with a toy data frame to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest calculating the mean over your groups of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
# group_by() takes bare column names; quoted strings would create constant columns
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr=mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
and this is the result:
where you are sure that the fill value is exactly the mean of your values.
Is this what you expected?
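One detail worth checking in the aggregation step is that group_by() expects bare column names. A quoted string is treated as a constant, so every row falls into the same group and only one overall mean comes back. The sketch below illustrates this on a reduced toy data set (toy, quoted and bare are names made up here), assuming dplyr (>= 1.0).

```r
library(dplyr)

# 5 x 5 grid of bonus/malus values, replicated 40 times with random values
toy <- expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25))
toy <- toy[rep(seq_len(nrow(toy)), times = 40), ]
toy$value <- runif(nrow(toy), min = 0, max = 50)

# Quoted names: grouping by two constant columns, i.e. a single group
quoted <- toy %>%
  group_by("bonus", "malus") %>%
  summarise(vr = mean(value), .groups = "drop")

# Bare names: one group per bonus/malus combination
bare <- toy %>%
  group_by(bonus, malus) %>%
  summarise(vr = mean(value), .groups = "drop")

nrow(quoted)  # 1
nrow(bare)    # 25
```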
It uses stat_identity as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you see the fill value for the 1/1 combination is 5. If you use factors it's even more clear what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest calculating them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
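That geom_tile() draws one tile per input row, without aggregating, can be confirmed by looking at the built layer data. A minimal sketch using the same toy values as above (tile_df, built and means are names chosen for the example; aggregate() stands in for ddply() so that only base R and ggplot2 are needed):

```r
library(ggplot2)

tile_df <- data.frame(x = c(rep(1:2, 2), 1),
                      y = c(rep(1:2, each = 2), 1),
                      fill = 1:5)

p_tile <- ggplot(tile_df) + geom_tile(aes(x = x, y = y, fill = fill))

# The built data still has one row per input row: nothing was averaged
built <- layer_data(p_tile, 1)
nrow(built)  # 5

# Aggregating beforehand gives one row per x/y cell
means <- aggregate(fill ~ x + y, data = tile_df, FUN = mean)
means
```

The two rows at x = 1, y = 1 (fill values 1 and 5) collapse into a single row with mean 3 in means, while the unaggregated plot simply draws both tiles on top of each other.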
scale_fill_gradient() and geom_tile() apply no statistics (or rather, they apply stat_identity()) to your fill value rf/300. The scale only maps the values onto a continuous color gradient. If you want to apply a transformation only to the colors displayed, you could use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the colors among the tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colours. Hope this helps.
Is it possible to split the fill legend of a ggplot barplot following the values on the x-axis of the plot?
For example using this data:
library(ggplot2)
data <- data.frame(val=c(2,4,5,6,7,8,9),var1=c("A","A","A","B","B","C","C"),
var2=sample(LETTERS[1:7]))
ggplot(data,aes(x=factor(var1),y=val,fill=var2))+geom_bar(stat="identity")
I get the following plot:
I would like to have something like this to make it easier to find what each fill color corresponds to:
An alternative to the solutions in the links in the comments. The solution assumes that the data is available in an aggregated form, and that each category of var2 appears in one and only one category of var1. That is, the number of keys (and their order) in the legend is correct. All that needs to happen is for space to be inserted between the appropriate keys and text dropped into those spaces. It gets the information it needs to construct the plot from the initial plot or its build data.
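The assumption that each var2 category belongs to exactly one var1 category can be checked before manipulating the legend. A small sketch on data shaped like the question's (legend_df, pairs and one_to_one are names invented here):

```r
legend_df <- data.frame(val = c(2, 4, 5, 6, 7, 8, 9),
                        var1 = c("A", "A", "A", "B", "B", "C", "C"),
                        var2 = LETTERS[1:7])

# Distinct var1/var2 combinations; var2 must not repeat among them
pairs <- unique(legend_df[c("var1", "var2")])
one_to_one <- anyDuplicated(pairs$var2) == 0
one_to_one

# A var2 value that occurs under two var1 categories violates the assumption
broken <- rbind(legend_df, data.frame(val = 1, var1 = "C", var2 = "A"))
broken_pairs <- unique(broken[c("var1", "var2")])
anyDuplicated(broken_pairs$var2) == 0
```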
library(ggplot2)
library(gtable)
library(grid)
set.seed(1234)
data <- data.frame(val = c(2,4,5,6,7,8,9),
var1 = c("A","A","A","B","B","C","C"),
var2 = sample(LETTERS[1:7]))
# Sort levels of var2
data$var2 = factor(data$var2, labels = data$var2, levels = data$var2)
p = ggplot(data, aes(x = factor(var1), y = val, fill = var2)) +
geom_bar(stat = "identity")
# Get the ggplot grob
g = ggplotGrob(p)
# Get the legend
leg = g$grobs[[which(g$layout$name == "guide-box")]]$grobs[[1]]
# Get the labels from the ggplot build data
gt = ggplot_build(p)
labels = rev(gt$layout$panel_params[[1]]$x.labels)
## Positions of the labels
# Get the number of keys within each label from the ggplot build data
gt$data[[1]]$x
N = as.vector(table(gt$data[[1]]$x))
N = N[-length(N)]
# Get the positions of the labels in the legend gtable
pos = rev(cumsum(N)) + 3
pos = c(pos, 3)
# Add rows to the legend gtable, and add the labels to the new rows
for(i in seq_along(pos)){
leg = gtable_add_rows(leg, unit(1.5, "lines"), pos = pos[i])
leg = gtable_add_grob(leg, textGrob(labels[i], y = 0.1, just = "bottom"),
t = pos[i] + 1, l = 2)
}
# Put the legend back into the plot
g$grobs[[which(g$layout$name == "guide-box")]]$grobs[[1]] = leg
# Draw it
grid.newpage()
grid.draw(g)
I have a data frame:
x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
id val0 val1 val2
1 a 1 4 7
2 b 2 5 8
3 c 3 6 9
I want to plot a stacked bar plot that shows the percentage of each column. So, each bar represents one row, and each bar is of the same length but in three different colors, each color representing the percentage of val0, val1 and val2.
I tried looking for it, but I am only finding ways to plot a stacked graph, not a stacked proportional graph.
Thanks.
Using ggplot2
For ggplot2 and geom_bar:
1. Work in long format.
2. Pre-calculate the percentages.
For example:
library(ggplot2)
library(reshape2)
library(plyr)
# long format with column of proportions within each id
xlong <- ddply(melt(x, id.vars = 'id'), .(id), mutate, prop = value / sum(value))
ggplot(xlong, aes(x = id, y = prop, fill = variable)) + geom_bar(stat = 'identity')
# note position = 'fill' would work with the value column
ggplot(xlong, aes(x = id, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'fill', aes(fill = variable))
# will return the same plot as above
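The same long-format reshaping and proportion calculation can also be sketched with tidyr and dplyr instead of reshape2 and plyr. This assumes both packages are installed; xlong2 is a name made up for the example.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Long format: one row per id/variable pair, with the share of each value within its id
xlong2 <- x %>%
  pivot_longer(-id, names_to = "variable", values_to = "value") %>%
  group_by(id) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

# geom_col() is shorthand for geom_bar(stat = "identity")
ggplot(xlong2, aes(x = id, y = prop, fill = variable)) +
  geom_col()
```

Each bar then has total height 1, since the proportions within an id sum to 1.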
base R
A table object can be plotted as a mosaic plot using plot. Your x is (almost) a table object.
# get the numeric columns as a matrix
xt <- as.matrix(x[,2:4])
# set the rownames to be the first column of x
rownames(xt) <- x[[1]]
# set the class to be a table so plot will call plot.table
class(xt) <- 'table'
plot(xt)
You could also use mosaicplot directly:
mosaicplot(x[,2:4], main = 'Proportions')
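If equal-width stacked bars are preferred to a mosaic plot, base R's barplot() can draw them from row-wise proportions. A minimal sketch, with m and props as names chosen for the example:

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Numeric columns as a matrix, with the ids as row names
m <- as.matrix(x[, 2:4])
rownames(m) <- x$id

# Proportions within each row (each row sums to 1)
props <- prop.table(m, margin = 1)

# barplot() stacks the rows of its input, so transpose to get one bar per id
barplot(t(props), legend.text = TRUE, xlab = "id", ylab = "Proportion")
```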