ggplot: How does geom_tile calculate the fill? [duplicate] - r

I used geom_tile() to plot 3 variables on the same graph, with
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?

To plot the mean of the fill values, you should aggregate your values before plotting. scale_fill_gradient(...) does not work at the data level, only at the visualization level.
Let's start with a toy data frame to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest calculating the mean over your groups of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
# group_by() takes bare column names; quoted strings would be treated as constants
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr=mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
The result is a plot in which the fill value of each tile is exactly the mean of your values.
Is this what you expected?

It uses stat_identity as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you see, the fill value for the 1/1 combination is 5: both rows with x = 1 and y = 1 are drawn, and the later one (fill = 5) ends up on top. If you use factors it's even clearer what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest calculating them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
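plyr has since been retired, so here is a minimal sketch of the same aggregation using only base R's aggregate(). It reuses the DF from above; the names n_per_tile and n are just illustrative. The first step counts how many rows fall on each tile, which is a quick way to spot overplotting before deciding how to summarise.

```r
DF <- data.frame(x = c(rep(1:2, 2), 1),
                 y = c(rep(1:2, each = 2), 1),
                 fill = 1:5)

# Number of rows that land on each x/y tile; anything above 1 is overplotted
n_per_tile <- aggregate(fill ~ x + y, data = DF, FUN = length)
names(n_per_tile)[3] <- "n"
print(n_per_tile)

# Mean fill per tile, intended to match the ddply() call above
DF1 <- aggregate(fill ~ x + y, data = DF, FUN = mean)
print(DF1)
```

DF1 can then be passed to ggplot() and geom_tile() exactly as in the plyr version.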

scale_fill() and geom_tile() apply no statistics (or rather, they apply stat_identity()) to your fill value = rf/300. The scale just computes how many colors you use and then generates the colors with the munsell function 'mnsl()'. If you want to apply a transformation only to the colors displayed, and since the mapped aesthetic here is fill, you should use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the colors among the tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colours. Hope this helps.

Related

violin_plot() with continuous axis for grouping variable?

The grouping variable for creating a geom_violin() plot in ggplot2 is expected to be discrete for obvious reasons. However my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of transforming it into a factor variable) and specify how to group it within the aesthetic mapping of the geom_violin() object. The width of the groups can be modified with the width argument of cut_width(), depending on the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
By using this approach, all geoms for continuous data and their varying functionalities can be combined with the violin plots, e.g. we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot helpfile for violin plots.
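To see what the grouping in that answer does, here is a small sketch that inspects the output of cut_width() on its own, with no plot involved. The seed and the variable names grp and cross are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
x <- sample(c(1, 2, 5), size = 1000, replace = TRUE)

# cut_width() bins a numeric vector into intervals of the given width
grp <- cut_width(x, width = 1)

# Cross-tabulate the raw values against the non-empty intervals:
# each distinct x value should land in exactly one interval
cross <- table(x, droplevels(grp))
print(cross)
```

Because x takes only a few distinct values here, mapping group = x directly should give the same grouping; cut_width() becomes more useful when neighbouring x values need to be pooled into one violin.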
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks @ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I'm not sure why you want to arrange the categorical variable on your chart as you do other than aesthetics (it does seem to look better). At any rate, a simple workaround which seems to handle general cases is to note that ggplot uses a numerical scale to plot categorical variables. The workaround for your chart is then, for each x value, to plot a transparent point at a y value equal to the largest number of categories in any group (plus 0.5). Points are plotted for all x values as a simple solution to the case of non-overlapping ranges of x values for each group. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
rbind(data.frame(let = c(" ", "  ", "   "),
value = NA,
group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths, so that each becomes its own empty tick. There must be a more elegant solution, though.
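Since the question asks for something that scales to any data frame size, the same padding idea can be sketched in base R. pad_groups() is a hypothetical helper written for this illustration, not part of any package, and it assumes the column names let, value and group from the question.

```r
# Hypothetical helper: pad every group with blank labels (strings of 1, 2, 3, ...
# spaces) until all groups have as many rows as the largest group.
pad_groups <- function(df, label = "let", value = "value", group = "group") {
  counts <- table(df[[group]])
  max_rows <- max(counts)
  pads <- lapply(names(counts), function(g) {
    k <- max_rows - counts[[g]]
    if (k == 0) return(NULL)
    out <- data.frame(strrep(" ", seq_len(k)), NA_real_, g,
                      stringsAsFactors = FALSE)
    names(out) <- c(label, value, group)
    out
  })
  pads <- Filter(Negate(is.null), pads)
  if (length(pads) == 0) return(df)
  rbind(df, do.call(rbind, pads))
}

df <- data.frame(let = LETTERS[1:13], value = 1:13,
                 group = rep(c("foo", "bar"), times = c(5, 8)),
                 stringsAsFactors = FALSE)
padded <- pad_groups(df)
print(table(padded$group))
```

Plotting padded instead of df with the original facet_wrap() call should then give every panel the same number of ticks; ggplot2 will likely warn about the rows it drops because value is NA.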
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

R ggplot conditional color without exact match

I am trying to color the points of a line in ggplot2 depending on whether they are above or below the yearly mean, and I cannot find any help for cases where colors are not matched to exact values.
I'm using the following code:
ggplot(aes(x = M, y = O)) + geom_line()
I want it to be one color if O is above mean(O) and another if it is below.
I tried to follow the advice but I just get a split graph when I use:
mutate(color=ifelse(O>mean(O),"green","red")) %>% ggplot(aes(x=M,y=O,color=color))+geom_line()+scale_color_manual(values=c("red", "darkgreen"))
I get the following graph:
This works, but makes a break in the line.
library(tidyverse)
df <- data.frame(
M = 1:5,
O = c(1, 2, 3, 4, 5)
)
df <- mutate(df, above = O > mean(O))
ggplot(df, aes(x=M,y=O, color=above))+geom_line()
Build a variable color to mark your color type.
For points use geom_point(), not geom_line().
Edit: color option splits the data in 2 groups. Use group=1 (one value for all) to force a single group.
Advice: Avoid naming a variable O, there is a big confusion with 0 (zero).
library(tidyverse)
df <- data.frame(M=rnorm(10), O=rnorm(10)) %>%
mutate(color=ifelse(O > mean(O), T, F))
#ggplot(df, aes(x=M, y=O, color = color)) + geom_point()
ggplot(df, aes(x=M, y=O, color = color, group=1)) + geom_line() + scale_color_manual(values=c("red", "green"))
# > df
# M O color
# 1 0.05829207 -0.03490925 FALSE
# 2 -0.09255111 -0.52513201 FALSE
# 3 0.44859944 0.19371037 FALSE
# 4 -0.54216222 0.40783749 TRUE

Density plot in ggplot [duplicate]

In the data frame below, I would expect the y axis values for density to be 0.6 and 0.4, yet they are 1.0. I feel there is obviously something extremely basic that I am missing about the way I am using ..density.. but am brain freezing. How would I obtain the desired behavior using ..density..? Any help would be appreciated.
df <- data.frame(a = c("yes","no","yes","yes","no"))
m <- ggplot(df, aes(x = a))
m + geom_histogram(aes(y = ..density..))
Thanks,
--JT
As per @Arun's comment:
At the moment, yes and no belong to different groups. To make them part of the same group set a grouping aesthetic:
m <- ggplot(df, aes(x = a , group = 1)) # 'group = 1' sets the group of all x to 1
m + geom_histogram(aes(y = ..density..))
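As a quick sanity check on the numbers the question expects, the proportions can be computed without ggplot2 at all. This is only a sketch in base R; props is an illustrative name.

```r
df <- data.frame(a = c("yes", "no", "yes", "yes", "no"))

# Share of each answer among all rows (a single group)
props <- prop.table(table(df$a))
print(props)
```

With every row in one group, "no" accounts for 2 of 5 rows and "yes" for 3 of 5, which is what group = 1 is meant to reproduce on the y axis. Left to itself, each category forms its own group and therefore sums to 1.0.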

ggplot2-line plotting with TIME series and multi-spline

This question's theme is simple but drives me crazy:
1. how to use melt()
2. how to draw multiple lines in a single image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I've learned from this post, Plotting two variables as lines using ggplot2 on the same graph, how to draw multiple lines for multiple variables, just like this:
The following code produces the above plot. However, the x-axis is really a time series.
library(reshape2)  # provides melt()
df <- read.delim("~/Desktop/df.b", header=F)
colnames(df)<-c("sample",0,15,30,60,120)
df2<-melt(df,id="sample")
ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) + geom_line() + geom_point()
I wish it could treat 0 15 30 60 120 as real numbers to show the time series, rather than as category labels. Even having tried this, I failed.
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df),each=5)), Time = factor(rep(c(0,15,30,60,120),3)),Values = c(df[1,],df[2,],df[3,]))
ggplot(data = df2, aes(x=Time, y= Values, group = sample, colour=sample)) +
geom_line() +
geom_point()
Loooooooooking forward to your help.
Q2:
I've learnt that the following script can add a spline() curve for a single line. What if I wish to apply spline() to all three lines in a single image?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))
Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
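The detour through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes, not the labels. A small sketch, using a stand-in factor f built with the same level order that melt() would produce from the column names:

```r
# Stand-in for df2$variable: a factor whose levels are the time points
f <- factor(c("0", "15", "30", "60", "120"),
            levels = c("0", "15", "30", "60", "120"))

codes  <- as.numeric(f)                # level codes: positions in levels(f)
values <- as.numeric(as.character(f))  # the actual time values

print(codes)
print(values)
```

Plotting the codes would space the time points evenly, which is the same problem as plotting the factor itself.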
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
This draws one smoothed curve per sample (plot not shown).
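If you would rather stay with spline() itself, as in the single-line script from the question, one possible sketch is to split the long data by sample and interpolate each piece. The data below are the raw values from the question, with the time column already numeric; smooth_df and n = 50 are illustrative choices.

```r
df2 <- data.frame(
  sample   = rep(c("a", "b", "c"), each = 5),
  variable = rep(c(0, 15, 30, 60, 120), times = 3),
  value    = c(4.17125, 41.33875, 29.674375, 8.551875, 5.5,
               4.101875, 29.49875, 50.191875, 13.780625, 4.90375,
               3.1575, 29.621875, 78.411875, 25.174375, 7.8012)
)

# Interpolate each sample separately, then stack the results
smooth_df <- do.call(rbind, lapply(split(df2, df2$sample), function(d) {
  s <- spline(d$variable, d$value, n = 50)
  data.frame(sample = d$sample[1], variable = s$x, value = s$y)
}))

str(smooth_df)
```

Since smooth_df has the same column names as df2, it could be drawn by adding geom_line(data = smooth_df) to a plot that maps x = variable, y = value and colour = sample, while keeping geom_point() on the original df2.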
