two histograms in one plot (ggplot)

I've been searching this site for a way to draw two histograms in one plot.
So far I have:
ggplot()+geom_histogram(data=etapa1, aes(x=AverageTemperature),col="red")+
geom_histogram(data=etapa2, aes(x=AverageTemperature),col="blue")
I get two histograms in different colours, but no legend or label showing which colour belongs to which data set. How can I produce one?

As Spacedman said, it would be better if you described your problem in more detail and gave an example data set.
So I created a random sample that simulates temperatures.
etapa1 <- data.frame(AverageTemperature = rnorm(100000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(100000, 17.4, 2))
#Now, combine your two dataframes into one. First make a new column in each.
etapa1$e <- 'etapa1'
etapa2$e <- 'etapa2'
# combine the two data frames etapa1 and etapa2
combo <- rbind(etapa1, etapa2)
ggplot(combo, aes(AverageTemperature, fill = e)) + geom_density(alpha = 0.2)
To me a density plot seems more natural than a histogram here, since temperature is a continuous variable.
Hope this helps somehow...
If you don't want to combine the two data frames, it is a bit trickier.
Put a constant string in the fill aesthetic of each layer, then use scale_fill_manual to assign a colour and a legend label to each string. Only fill is mapped here, so the scale_colour_manual call below has no visible effect.
ggplot() +
geom_density(data = etapa1, aes(x = AverageTemperature, fill = "r"), alpha = 0.3) +
geom_density(data = etapa2, aes(x = AverageTemperature, fill = "b"), alpha = 0.3) +
scale_colour_manual(name ="etapa", values = c("r" = "red", "b" = "blue"), labels=c("b" = "blue values", "r" = "red values")) +
scale_fill_manual(name ="etapa", values = c("r" = "red", "b" = "blue"), labels=c("b" = "blue values", "r" = "red values"))
You can use geom_histogram() in place of geom_density() in either example.
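As a minimal sketch of the histogram variant with two separate data frames: the constant strings "etapa1" and "etapa2" in aes(fill = ...) are arbitrary legend keys chosen for illustration, and the smaller sample size and binwidth are assumptions to keep the example quick.

```r
library(ggplot2)

set.seed(1001)
# Smaller simulated samples than above, only to keep the sketch fast
etapa1 <- data.frame(AverageTemperature = rnorm(1000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(1000, 17.4, 2))

p <- ggplot() +
  # A constant string inside aes() creates a legend entry for the layer
  geom_histogram(data = etapa1,
                 aes(x = AverageTemperature, fill = "etapa1"),
                 alpha = 0.5, binwidth = 0.25) +
  geom_histogram(data = etapa2,
                 aes(x = AverageTemperature, fill = "etapa2"),
                 alpha = 0.5, binwidth = 0.25) +
  # Map each legend key to the colour it should be drawn in
  scale_fill_manual(name = "etapa",
                    values = c(etapa1 = "red", etapa2 = "blue"))
p
```

Because each histogram is its own layer, the bars are drawn over each other rather than stacked, so the alpha setting is what keeps both visible.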

Using @TimoWagner's example:
set.seed(1001)
etapa1 <- data.frame(AverageTemperature = rnorm(100000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(100000, 17.4, 2))
Here's another way to pack the two data sets together:
combdat <- dplyr::bind_rows(list(dat1=etapa1,dat2=etapa2),
.id="dataset")
Two superimposed histograms:
library(ggplot2)
ggplot(combdat,aes(AverageTemperature,fill=dataset))+
scale_fill_manual(values=c("red","blue"))+
geom_histogram(alpha=0.5,binwidth=0.1,position="identity")

Related

Advice on how to plot side-by-side histograms with a line graph going through them in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line through the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like one graph, but I have had trouble getting the endpoints to match up, since each facet is a separate panel. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))

You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as an argument to stat_density_ridges if you don't like the amount of overlap.

Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
# plyr is not attached, so summarise needs the plyr:: prefix as well
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))

You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")  # fun replaces fun.y, deprecated since ggplot2 3.3.0
This produces a violin for each value of rho, and joins the medians for each violin.

Generating multiple lines for repeat observations in only some factor levels

I am generating density plots for observations. The observations belong to a species and some are also connected to an individual ID.
With the data below, I want to generate a line for each level of IndID for species One and Two, and only a single line for Species Three, which does not include IndID. There are related questions on SO, but not with reproducible data and looking for different results.
library(ggplot2)
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 30), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 6),rep(NA,50) ),
Value = sample(1:20, replace = T))
Keeping the color aesthetic at the Species level, I want to create multiple lines for Species One and Two (green and red) and a single blue line for Species Three.
ggplot(dat, aes(Value)) + geom_density(aes(color = Species), size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))

If you want to be able to tell them apart, you can set the linetype to IndID. Note, however, that you will need to change the NA to some other value to (easily) get it to plot.
I also expanded your data a little bit to give enough values per individual to show meaningful lines. I also used geom_line(stat = "density") instead of geom_density() because it omits the line along the bottom and gives legends with lines instead of boxes.
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 60), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 12),rep("NA",50) ),
Value = sample(1:20, 110, replace = T))
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
gives
If you want the lines to all be solid, you can run:
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red")) +
scale_linetype_manual(values = rep("solid", 6)) +
guides(linetype = "none")
(or use group as @Henrik suggested in zir comment)
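A minimal sketch of the group approach, assuming the expanded data from above with real NA values kept in IndID. The helper column line_id is a hypothetical name introduced here; it gives every Species/IndID pair its own line and Species Three a single line.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(
  Species = c(rep(c("One", "Two"), each = 2, length.out = 60), rep("Three", 50)),
  IndID   = c(rep(letters[1:5], each = 12), rep(NA, 50)),
  Value   = sample(1:20, 110, replace = TRUE)
)

# One grouping key per line: Species plus IndID where an ID exists,
# Species alone where IndID is missing (hypothetical helper column)
dat$line_id <- ifelse(is.na(dat$IndID),
                      as.character(dat$Species),
                      paste(dat$Species, dat$IndID))

p <- ggplot(dat, aes(x = Value, colour = Species, group = line_id)) +
  geom_line(stat = "density") +
  scale_colour_manual(values = c(One = "darkgreen", Three = "blue", Two = "red"))
p
```

With group set explicitly, no linetype scale is needed and the legend shows only the three species colours.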

Change alpha value for certain break values in ggplot geom_point

I have made a scatter plot from 100k+ points, and I would like the coloured points (break values 1 and 2, which are "green", and break value 20, which is "red") to stand out more than the "cornsilk1" points (break values 3 to 19). I have tried the code below, but with no luck.
Any help would be appreciated.
Thanks so much
P.S. Please excuse my juvenile code. I am sure there is a much more efficient way to do this...
plotIA<-ggplot(plotintaobs,aes(x=SD13009PB,y=SD13009PB2,colour=quartile))+geom_point()+labs(x="Phillips Observation 1", y="Phillips Observation 2") + ggtitle("Intra-observer Variation") + mytheme
plotIA+ scale_color_manual(breaks = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"),
values=c("green","green", "cornsilk1", "cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","red"))
plotIA+scale_alpha_manual(values=c(1,1,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,1))

One strategy is to use cut to split the quartiles into your three groups. Then you can use scale_colour_manual and scale_alpha_manual on the new grouping variable:
# some fake data
plotintaobs <- data.frame(SD13009PB = rnorm(20), SD13009PB2 = rnorm(20), quartile = 1:20)
#cut quartile
plotintaobs$q2 <- cut(plotintaobs$quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))
#plot
plotIA <- ggplot(plotintaobs, aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
geom_point() +
scale_colour_manual(values = c("green", "cornsilk1","red")) +
scale_alpha_manual(values = c(1, 0.8, 1))
plotIA
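With very many points, the highlighted ones can still end up hidden underneath the pale ones. A possible refinement, sketched under the assumption that geom_point draws rows in data order, is to sort the data so the "mid" rows are drawn first. The data here are simulated, and ordered_obs is a name introduced for illustration.

```r
library(ggplot2)

set.seed(42)
n <- 2000
# Simulated stand-in for the real data
plotintaobs <- data.frame(SD13009PB  = rnorm(n),
                          SD13009PB2 = rnorm(n),
                          quartile   = sample(1:20, n, replace = TRUE))
plotintaobs$q2 <- cut(plotintaobs$quartile,
                      breaks = c(0, 2, 19, 20),
                      labels = c("low", "mid", "high"))

# FALSE sorts before TRUE, so "mid" rows come first and are drawn first;
# the "low" and "high" points are then painted on top of them
ordered_obs <- plotintaobs[order(plotintaobs$q2 != "mid"), ]

p <- ggplot(ordered_obs,
            aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
  geom_point() +
  scale_colour_manual(values = c(low = "green", mid = "cornsilk1", high = "red")) +
  scale_alpha_manual(values = c(low = 1, mid = 0.4, high = 1))
p
```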

Adding text outside plot doesn't work in r

I have a simple dataset:
11 observations, 1 variable.
I want to plot them adding my own axis names, but when I want to change the position of them, R keeps plotting them in the exact same spot.
Here is my script:
plot(data[,5], xlab = "", xaxt='n')
axis(1, at = 1:11, labels = F)
text(1:11, par("usr")[3] - 0.1, srt = 90, adj = 1, labels = names, xpd = TRUE)
I have tried changing the -0.1 to other numbers, but R keeps placing the labels in the exact same spot. I tried short names like "a", but the result is the same.
Thanks in advance
My data:
10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9
names <- rep("name",11)

My ggplot solution:
# creating the sample dataframe
data <- read.table(text="10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9", header=FALSE)
# adding a names column
data$names <- as.factor(paste0("name",sprintf("%02.0f", seq(1,11,1))))
#creating the plot
library(ggplot2)
# stat = "identity" is needed because the bar heights come from the y aesthetic
ggplot(data, aes(x=names, y=V1)) +
geom_bar(stat="identity", fill = "white", color = "black")
which gives:
When you want to change the order of the bars, you can do that with transform:
# transforming the data (I placed "name04" as the first one)
data2 <- transform(data,
newnames=factor(names,
levels=c("name04","name01","name02","name03","name05","name06","name07","name08","name09","name10","name11"),
ordered =TRUE))
#creating the plot
ggplot(data2, aes(x=newnames, y=V1)) +
geom_bar(stat="identity", fill="white", color="black")
which gives:
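For the original base-graphics approach, one likely reason the labels appear not to move is scale: the y axis here spans thousands of units, so an offset of 0.1 (or any similarly small number) is far too small to see. A minimal sketch, assuming an offset of 3% of the visible axis range is acceptable; the names values, labs and offset are introduced for illustration.

```r
values <- c(10308.9, 10201.6, 12685.3, 3957.93, 7677.1, 9671.7,
            11849.4, 10755.7, 11283.4, 11583.8, 12066.9)
labs <- rep("name", 11)   # named labs to avoid masking base::names()

plot(values, xlab = "", xaxt = "n")
axis(1, at = 1:11, labels = FALSE)

# par("usr") holds the plot region limits: c(x1, x2, y1, y2)
usr <- par("usr")
# Express the offset as a fraction of the y range instead of a fixed 0.1
offset <- 0.03 * (usr[4] - usr[3])

text(1:11, usr[3] - offset, labels = labs,
     srt = 90, adj = 1, xpd = TRUE)
```

Changing the 0.03 factor moves the labels up or down regardless of the units of the data.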

Dotplot with error bars, two series, light jitter

I have a collection of data from several studies. For each study I am interested in the mean of a variable by gender, and in whether it differs significantly between genders. For each study I have the mean and 95% confidence interval for both males and females.
What I would like to do is something similar to this:
I have used several flavours of dotplots (dotplot, dotplot2, Dotplot) but did not quite get there.
Using Dotplot from Hmisc I managed to plot one series with its error bars, but I am at a loss as to how to add the second series.
I used Dotplot and got the vertical ending of the error bars following advice given here.
Here is a working example of the code I am using
data<-data.frame(ID=c("Study1","Study2","Study3"),avgm=c(2,3,3.5),avgf=c(2.5,3.3,4))
data$lowerm <- data$avgm*0.9
data$upperm <- data$avgm*1.1
data$lowerf <- data$avgf*0.9
data$upperf <- data$avgf*1.1
# Create the customized panel function
mypanel.Dotplot <- function(x, y, ...) {
panel.Dotplot(x,y,...)
tips <- attr(x, "other")
panel.arrows(x0 = tips[,1], y0 = y,
x1 = tips[,2], y1 = y,
length = 0.05, unit = "native",
angle = 90, code = 3)
}
library(Hmisc)
Dotplot(data$ID ~ Cbind(data$avgm,data$lowerm,data$upperm), col="blue", pch=20, panel = mypanel.Dotplot,
xlab="measure",ylab="study")
This plots three columns of data: the average for males (avgm) and the lower and upper bounds of the 95% confidence interval (lowerm and upperm). I have three other series for the same studies that do the same job for the female subjects (avgf, lowerf, upperf).
The results I have look like this:
What is missing, in a nutshell:
adding a second series (avgf) with means and confidence intervals defined on three other variables for the same studies
adding some vertical jitter so that they are not one on top of the other but the reader can see both even when they overlap.

Unfortunately I can't help you with Dotplot, but I find it fairly straightforward using ggplot. You just need to rearrange the data slightly.
library(ggplot2)
# grab data for males
df_m <- data[ , c(1, 2, 4, 5)]
df_m$sex <- "m"
names(df_m) <- c("ID", "avg", "lower", "upper", "sex")
df_m
# grab data for females
df_f <- data[ , c(1, 3, 6, 7)]
df_f$sex <- "f"
names(df_f) <- c("ID", "avg", "lower", "upper", "sex")
df_f
# bind the data together
df <- rbind(df_m, df_f)
# plot
ggplot(data = df, aes(x = ID, y = avg, ymin = lower, ymax = upper, colour = sex)) +
geom_point(position = position_dodge(width = 0.2)) +
geom_errorbar(position = position_dodge(width = 0.2), width = 0.1) +
coord_flip() +
scale_colour_manual(values = c(m = "blue", f = "red")) +  # named, so males stay blue as in the question
theme_classic()
# if you want horizontal grid lines, replace the last line (theme_classic()) with:
theme_bw() +
theme(panel.grid.major.y = element_line(colour = "grey", linetype = "dashed"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
