Density plot in ggplot [duplicate] - r

In the data frame below, I would expect the y-axis values for density to be 0.6 and 0.4, yet they are 1.0. I feel there is obviously something extremely basic that I am missing about the way I am using ..density.., but I am brain freezing. How would I obtain the desired behavior using ..density..? Any help would be appreciated.
df <- data.frame(a = c("yes","no","yes","yes","no"))
m <- ggplot(df, aes(x = a))
m + geom_histogram(aes(y = ..density..))
Thanks,
--JT

As per @Arun's comment:
At the moment, yes and no belong to different groups. To make them part of the same group set a grouping aesthetic:
m <- ggplot(df, aes(x = a , group = 1)) # 'group = 1' sets the group of all x to 1
m + geom_histogram(aes(y = ..density..))
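
For readers on a recent ggplot2 (3.3.0 or later, which is an assumption about your installation), the same grouping idea can be sketched with geom_bar() and after_stat(prop). This is only an illustration, not the original answer's method. The base-R line shows the proportions the bars are expected to have, and the plotting part is guarded so the snippet still runs where ggplot2 is not installed.

```r
df <- data.frame(a = c("yes", "no", "yes", "yes", "no"))

# Proportions the bars should show, computed directly in base R
props <- prop.table(table(df$a))
print(props)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # group = 1 puts "yes" and "no" into one group, so prop is computed
  # across both bars instead of within each bar separately
  p <- ggplot(df, aes(x = a, group = 1)) +
    geom_bar(aes(y = after_stat(prop)))
  print(p)
}
```

Without group = 1, each bar is its own group and is normalised against itself, which is why both bars reach 1.0.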

Related

R ggplot How to Show Probability of Two Variables

I have a distribution of data that is shown below in image 1. My goal is to show the likelihood that a variable is below a particular value for both X and for Y. For instance, I'd like to have a good way to show that ~95% of values are below 8000 on X-axis and below 6500 on the Y-axis. I am confident that there is a simple answer to this. I apologize if this has been asked many times before.
plot1 <- df %>% ggplot(mapping = aes(x = FLUID_TOT)) + stat_ecdf() + theme_bw()
plot2 <- df %>% ggplot(mapping = aes(x = FLUID_TOT, y = y)) + geom_point() + theme_bw()
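
One possible way to mark the 95% levels, sketched on made-up data: the toy data frame below is an assumption (only the column names FLUID_TOT and y come from the question), and quantile() is used to find the value below which 95% of each variable falls. The cut-offs are then drawn as dashed reference lines on the scatter plot.

```r
# Toy data standing in for the asker's data frame (values are invented)
set.seed(1)
toy <- data.frame(FLUID_TOT = rexp(1000, rate = 1 / 2500),
                  y         = rexp(1000, rate = 1 / 2000))

# Marginal 95% cut-offs for each variable
q95 <- c(x = unname(quantile(toy$FLUID_TOT, 0.95)),
         y = unname(quantile(toy$y, 0.95)))
print(q95)

# Share of points that are below BOTH cut-offs at the same time
joint <- mean(toy$FLUID_TOT <= q95[["x"]] & toy$y <= q95[["y"]])
print(joint)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = FLUID_TOT, y = y)) +
    geom_point(alpha = 0.3) +
    geom_vline(xintercept = q95[["x"]], linetype = "dashed") +
    geom_hline(yintercept = q95[["y"]], linetype = "dashed") +
    theme_bw()
  print(p)
}
```

Note that the two lines are marginal cut-offs: the share of points below both at once (joint) cannot exceed 95% and is usually somewhat lower.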

ggplot: How does geom_tile calculate the fill? [duplicate]

I used geom_tile() to plot 3 variables on the same graph with:
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?

To plot the mean of the fill values, you should aggregate your values before plotting. scale_fill_gradient(...) does not work on the data level, only on the visualization level.
Let's start with a toy data frame to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest calculating the mean over each group of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
# group_by() takes bare column names; quoted strings would be treated as constants
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr=mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
The result is a plot in which you are sure that the fill value of each tile is exactly the mean of your values.
Is this what you expected?

It uses stat_identity, as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you see the fill value for the 1/1 combination is 5. If you use factors it's even more clear what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest calculating them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
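
If you would rather not load plyr, the same per-cell mean can be sketched with base R's aggregate(). This is an alternative to the ddply() call above, not what the answer used. It reuses the DF from this answer and guards the plotting part in case ggplot2 is not installed.

```r
DF <- data.frame(x = c(rep(1:2, 2), 1),
                 y = c(rep(1:2, each = 2), 1),
                 fill = 1:5)

# One row per (x, y) cell, with fill replaced by the mean of that cell
DF_mean <- aggregate(fill ~ x + y, data = DF, FUN = mean)
print(DF_mean)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = DF_mean) +
    geom_tile(aes(x = x, y = y, fill = fill))
  print(p)
}
```

The 1/1 cell holds the fill values 1 and 5, so its aggregated value is 3 rather than 5.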

scale_fill_gradient() and geom_tile() apply no statistics (or, more precisely, they apply stat_identity()) to your fill value rf/300. The scale just computes how many colours you use and then generates the colours with the munsell function 'mnsl()'. If you want to apply a transformation only to the colours displayed, you should use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the colour mapping among the tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colours. Hope this helps.

Making smart multilevel histograms

I'm using RStudio Version 0.98.1028 on Windows.
I'd like to make multilevel histogram using ggplot2. Let's say I have a 4D data frame like this:
facet <- as.factor(rep(c('alpha', 'beta', 'gamma'), each = 4, times = 3))
group <- as.factor(rep(c('X', 'Y'), each = 2, times = 9))
type <- as.factor(rep(c('a', 'b'), each = 1, times = 18))
day <- as.factor(rep(1:3, each = 12))
df = data.frame(facet = facet, group = group, type = type, day = day, value = abs(rnorm(36)))
I'd like to make histograms of x = day vs y = value in 3 facets, corresponding to facet, grouping by group and filling by type. In other words I'd like to pile up a and b in a single bar, but keeping separated bars for X and Y. It would look something like
g = ggplot(df, aes(day, value, group = group, fill = type))
g + geom_histogram(stat = 'identity', position = 'dodge') +
facet_grid(facet ~ .)
Unfortunately, with the dodge option I get unstacked histograms, while without it I get 4 bars at each day. Any idea how to solve this problem?
In Excel, one facet would look something like this (image not included).
Thanks in advance!
EB

Well, maybe your question is related to this one on the ggplot group.
A possible solution is the following:
g = ggplot(df, aes(group, value, fill = type))
g + geom_bar(stat = 'identity', position = 'stack') +
facet_grid(facet ~ day)
It's suboptimal because you are using two facets, but this way you obtain one stacked bar per group within each day.

As pointed out by @Matteo, your specific wish is probably not directly achievable with the tooling provided by ggplot2. A little bit of hacking is provided below, which may point in the right direction. I am not endorsing it too much, but I just spent a couple of minutes playing around with it. Maybe you can pick up a few of the elements.
I combined group and day into a single factor and when plotting replaced the x-labels manually with the (non-unique) group names. I then included (in a lazy manner) day labels. I still feel day x facet is the way you should proceed.
df$combinedCategory <- as.factor(paste(df$day,df$group))
library(scales)
g = ggplot(df, aes(combinedCategory, value, fill = type))
g = g + geom_bar(stat='identity',position = 'fill')
g = g + facet_grid(facet ~ .)
g = g + scale_y_continuous(labels = percent)
g = g + scale_x_discrete(labels = rep(c("X","Y"), 3)) # one label per day/group level
g = g + geom_text(aes(x=1.5,y=0.05, label="Day 1"))
g = g + geom_text(aes(x=3.5,y=0.05, label="Day 2"))
g = g + geom_text(aes(x=5.5,y=0.05, label="Day 3"))
g = g + theme_minimal()
g
This gives one filled bar per day/group combination in each facet.

Indeed, it is sufficient to set x = interaction(group, day) in aes(). This was actually my first step, but I was wondering if something more precise existed. Apparently not: the only tricky point here is to create a second row of x-axis labels. Thanks everybody!
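
A minimal sketch of the interaction() approach, assuming the intent is to put the group/day combination on the x axis: it rebuilds the example data from the question (with a seed added only for reproducibility) and uses geom_col(), which is my choice here rather than something from the thread.

```r
set.seed(42)  # only so the toy values are reproducible
facet <- as.factor(rep(c("alpha", "beta", "gamma"), each = 4, times = 3))
group <- as.factor(rep(c("X", "Y"), each = 2, times = 9))
type  <- as.factor(rep(c("a", "b"), each = 1, times = 18))
day   <- as.factor(rep(1:3, each = 12))
df <- data.frame(facet = facet, group = group, type = type, day = day,
                 value = abs(rnorm(36)))

# One x position per group/day combination: X.1, Y.1, X.2, Y.2, X.3, Y.3
df$group_day <- interaction(df$group, df$day)
print(levels(df$group_day))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # a and b are stacked inside each bar; X and Y stay as separate bars
  p <- ggplot(df, aes(x = group_day, y = value, fill = type)) +
    geom_col(position = "stack") +
    facet_grid(facet ~ .)
  print(p)
}
```

The axis labels come out as X.1, Y.1 and so on; a separate second row of day labels still has to be added by hand.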

ggplot2-line plotting with TIME series and multi-spline

This question's theme is simple but drives me crazy:
1. how to use melt()
2. how to deal with multi-lines in single one image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I've learned from the post "Plotting two variables as lines using ggplot2 on the same graph" how to draw multiple lines for multiple variables, just like this:
The following code produces the above plot. However, the x-axis is really a time series.
library(reshape2) # provides melt()
library(ggplot2)
df <- read.delim("~/Desktop/df.b", header=F)
colnames(df)<-c("sample",0,15,30,60,120)
df2<-melt(df,id="sample")
ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) + geom_line() + geom_point()
I wish it would treat 0, 15, 30, 60 and 120 as real numbers to show the time series, rather than as category names. Even after trying the following, I failed.
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df),each=5)), Time = factor(rep(c(0,15,30,60,120),3)),Values = c(df[1,],df[2,],df[3,]))
ggplot(data = df2, aes(x=Time, y= Values, group = sample, colour=sample)) +
geom_line() +
geom_point()
Looking forward to your help.
Q2:
I've learnt that the following script can add a spline() curve for a single line. What if I want to apply spline() to all three lines in a single plot?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))

Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
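
The detour through as.character() matters. A small base-R sketch of why, using a stand-in factor whose levels are in column order (as I understand melt() to produce them):

```r
times <- c("0", "15", "30", "60", "120")
f <- factor(times, levels = times)  # stand-in for df2$variable

# Direct conversion returns the internal level codes, not the times
codes <- as.numeric(f)
print(codes)   # 1 2 3 4 5

# Going through character recovers the actual numbers
vals <- as.numeric(as.character(f))
print(vals)    # 0 15 30 60 120
```

With the level codes on the x axis the points would be evenly spaced, which hides the unequal gaps between 0, 15, 30, 60 and 120.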
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
which draws one smoothed curve for each sample.
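
If you specifically want interpolating splines from spline(), as in the single-line example in the question, one way is to compute them per sample and draw the result with geom_line(). This is a sketch under that assumption, not part of the answer above. It types in the raw data from the question in long format, with variable already numeric.

```r
# Raw data from the question, in long format
df2 <- data.frame(
  sample   = rep(c("a", "b", "c"), each = 5),
  variable = rep(c(0, 15, 30, 60, 120), times = 3),
  value    = c(4.17125,  41.33875,  29.674375, 8.551875,  5.5,
               4.101875, 29.49875,  50.191875, 13.780625, 4.90375,
               3.1575,   29.621875, 78.411875, 25.174375, 7.8012)
)

# Apply spline() to each sample separately and stack the results
smooth <- do.call(rbind, lapply(split(df2, df2$sample), function(g) {
  s <- spline(g$variable, g$value, n = 50)
  data.frame(sample = g$sample[1], variable = s$x, value = s$y)
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df2, aes(x = variable, y = value, colour = sample)) +
    geom_point() +
    geom_line(data = smooth)  # one interpolated curve per sample
  print(p)
}
```

Unlike the geom_smooth() fit, these curves pass through every observed point, so they can overshoot between widely spaced times.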
