I'm trying to create a histogram with two superimposed density plots. The problem is: I want one density to be a dashed line, which works perfectly, but the dashed line does not appear in the legend, as in the following example:
library(ggplot2)
x <- sort(rnorm(1000))
data <- data.frame(x = x, Normal = dnorm(x, mean(x), sd = sd(x)), Student = dt(x, df = 3))
ggplot(data, aes(y = x)) +
  geom_histogram(aes(x = x, y = after_stat(density)), color = "black", fill = "darkgrey") +
  geom_line(aes(x = x, y = Normal, color = "Normal"), linewidth = 1, linetype = 2) +
  ylab("") + xlab("") + labs(title = "Density estimations") +
  geom_line(aes(x = x, y = Student, color = "Student"), linewidth = 1) +
  scale_color_manual(values = c("Student" = "black", "Normal" = "black"))
Any ideas how to get the dashed line into the legend?
Thank you very much!
Rainer
The ggplot2 way generally wants data in "long" format, with separate columns specifying each aesthetic. In this case, linetype should be mapped as an aesthetic so that it shows up in the legend. The easiest way to do this is to reshape your data into the appropriate format with the reshape2 package:
library(reshape2)
data.m <- melt(data, measure.vars = c("Normal", "Student"), id.vars = "x")
And then modify your plotting code to look something like this:
library(ggplot2)
ggplot(data, aes(y = x)) +
  geom_histogram(aes(x = x, y = after_stat(density)), color = "black", fill = "darkgrey") +
  geom_line(data = data.m, aes(x = x, y = value, linetype = variable), linewidth = 1) +
  ylab("") +
  xlab("") +
  labs(title = "Density estimations")
This results in a histogram with one solid and one dashed density curve, and the legend shows both line types.
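With the default linetype scale, the first factor level (Normal) is drawn solid and the second (Student) dashed. If you want Normal to be the dashed one, as in the question, you can set the mapping explicitly with scale_linetype_manual(). The sketch below is self-contained: it builds the long data with base R instead of reshape2, and the fixed seed and bins = 30 are assumptions added only for reproducibility. It assumes ggplot2 >= 3.4.0 (for linewidth and after_stat()).

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
x <- sort(rnorm(1000))
dens <- data.frame(x = x,
                   Normal = dnorm(x, mean(x), sd = sd(x)),
                   Student = dt(x, df = 3))

# long format: one row per (x, distribution) pair
dens_long <- data.frame(
  x = rep(dens$x, times = 2),
  value = c(dens$Normal, dens$Student),
  variable = factor(rep(c("Normal", "Student"), each = nrow(dens)))
)

p <- ggplot(dens, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 color = "black", fill = "darkgrey") +
  # linetype is mapped inside aes(), so it gets a legend entry
  geom_line(data = dens_long,
            aes(x = x, y = value, linetype = variable), linewidth = 1) +
  # explicit mapping: Normal dashed (as in the question), Student solid
  scale_linetype_manual(values = c(Normal = "dashed", Student = "solid")) +
  labs(x = NULL, y = NULL, title = "Density estimations")
p
```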
You want to reshape the data to long format; that makes it simpler:
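x <- sort(rnorm(1000))
Normal <- dnorm(x, mean(x), sd = sd(x))
Student <- dt(x, df = 3)
y <- c(Normal, Student)
DistBn <- rep(c("Normal", "Student"), each = length(x))
# don't call it 'data': that is also the name of an R function (utils::data)
# x is recycled to match the 2 * length(x) stacked density values
df <- data.frame(x = x, y = y, DistBn = DistBn)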
head(df)
          x           y DistBn
1 -2.986430 0.005170920 Normal
2 -2.957834 0.005621358 Normal
3 -2.680157 0.012126747 Normal
4 -2.601635 0.014864165 Normal
5 -2.544302 0.017179353 Normal
6 -2.484082 0.019930239 Normal
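library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_histogram(aes(x = x, y = after_stat(density)), color = "black", fill = "darkgrey") +
  geom_line(aes(x = x, y = y, linetype = DistBn)) +
  ylab("") + xlab("") + labs(title = "Density estimations") +
  # no color is mapped here, so set the line types instead (Normal dashed, as in the question)
  scale_linetype_manual(values = c("Student" = "solid", "Normal" = "dashed"))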
I am trying to make a scatter plot with ggplot2. Below you can see my data and code.
data=data.frame(
gross_i.2019=seq(1,101),
Prediction=seq(21,121))
ggplot(data=data, aes(x=gross_i.2019, y=Prediction, group=1)) +
geom_point()
This code produces a scatter plot in which all points have the same (default black) color.
Now I want to show the values on the scatter plot in two different colors, one for gross_i.2019 and one for Prediction. I tried the code below with a different color, but it only changes the previous color into the new one.
scatter <- ggplot(data = data, aes(x = gross_i.2019, y = Prediction))
scatter + geom_point(color = "#00AFBB")
Can anybody help me make this plot with two different colors (e.g. black and red), one for gross_i.2019 and one for Prediction?
I may be confused about what you are trying to accomplish, but it doesn't seem like you have two groups of data to plot in two different colors. You have one dependent (Prediction) and one independent (gross_i.2019) variable whose relationship you are plotting. If Prediction and gross_i.2019 are both supposed to be dependent variables of different groups, you need a common independent variable (like time, for example) to plot them against separately. Then you can map color to the group inside aes(), e.g. geom_point(aes(color = group)).
Edit 1: If you want the index (the row number in the data set) to be your independent x axis, you could do the following:
library(tidyverse)
data <- data.frame(gross_i.2019 = seq(1, 101), Prediction = seq(21, 121))
# create a column for the index (row) numbers
data$index <- seq_len(nrow(data))
# use tidyr to pivot the data set into a tidy (long, not wide) data set
data <- data %>% pivot_longer(!index, names_to = "group", values_to = "count")
# map the groups to colors
p <- ggplot(data = data, aes(x = index, y = count, color = group))
p1 <- p + geom_point()
p1
This type of problem generally comes down to reshaping the data: ggplot2 wants long format, and the data is in wide format. See this post on how to reshape data from wide to long format.
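# stack both columns into one ("line"); "time" records which column each value came from
long <- reshape(data,
                ids = row.names(data),
                varying = c("gross_i.2019", "Prediction"),
                v.names = "line",
                direction = "long")
# replace the numeric time index (1, 2) with the original column names
long$time <- c("gross_i.2019", "Prediction")[long$time]
long$id <- as.numeric(long$id)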
library(ggplot2)
ggplot(long, aes(id, line, color = time)) +
geom_point() +
scale_color_manual(values = c("#000000", "#00AFBB"))
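If you would rather keep the data in wide format, another option (not taken from the answers above) is one geom_point() layer per column, with a constant label mapped to colour inside aes(). Each label then becomes a legend entry that scale_colour_manual() can recolour, here with the black and red from the question. This sketch assumes, as in Edit 1, that the row number is the shared x axis; the index column is added for illustration.

```r
library(ggplot2)

data <- data.frame(gross_i.2019 = seq(1, 101), Prediction = seq(21, 121))
data$index <- seq_len(nrow(data))  # assumption: row number as the shared x axis

p <- ggplot(data, aes(x = index)) +
  # a constant string mapped to colour inside aes() creates one legend entry per layer
  geom_point(aes(y = gross_i.2019, colour = "gross_i.2019")) +
  geom_point(aes(y = Prediction, colour = "Prediction")) +
  scale_colour_manual(values = c(gross_i.2019 = "black", Prediction = "red"),
                      name = NULL)
p
```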
Aim
I am trying to change the shape of the geom_point into a cross (not a "plus/addition" sign, but a "death" cross).
Attempt
Let's say I have the following data:
library(tidyverse)
df <- read.table(text="x y
1 3
2 4
3 6
4 7 ", header=TRUE)
I can change the point shape with the shape parameter of geom_point, like this:
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape=2) # change shape
However, none of the built-in shapes is this kind of cross.
Question
How do I change the shape of a point into a cross using ggplot2 in R?
The shape can be set to a Unicode character. The code below uses the skull and crossbones (U+2620), but you can look up a more suitable symbol.
Note that the final result will depend on the font used to generate the plot.
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape="\u2620", size = 10)
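Two other options, as a sketch: the built-in shape 4 is a diagonal cross (×), as opposed to shape 3, the plus sign; and if by "death" cross you mean the Latin cross ✝ (an assumption), its code point U+271D can be used in the same way as the skull. As above, whether that glyph renders depends on the font and graphics device.

```r
library(ggplot2)

df <- read.table(text = "x y
1 3
2 4
3 6
4 7", header = TRUE)

# built-in shape 4 is a diagonal cross; stroke thickens its lines
p_x <- ggplot(df, aes(x = x, y = y)) +
  geom_point(shape = 4, size = 5, stroke = 1.5)

# Unicode LATIN CROSS (U+271D) drawn as a text glyph; rendering depends on the font
p_latin <- ggplot(df, aes(x = x, y = y)) +
  geom_point(shape = "\u271D", size = 8)

p_x
p_latin
```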
Using the iris data set as an example, I can produce a faceted ggplot.
The code is:
library(ggplot2)
data(iris)
y=iris
y$Petal.Width.Range=factor(ifelse(y$Petal.Width<1.3,"Narrow","Wide"))
y$Petal.Length.Range=factor(ifelse(y$Petal.Length<4.35,"Short","Long"))
ggplot(y, aes(Sepal.Length,Sepal.Width)) +
geom_point(alpha=0.5)+
geom_hline(yintercept =3 ,alpha=0.3)+
facet_grid(Petal.Width.Range ~ Petal.Length.Range)
Here I have a horizontal spec line at 3 in each of the 4 panels. What should I do if I want a panel-dependent spec? For example, I can define 4 different specs as follows:
y$threshold=2
y$threshold[(y$Petal.Width.Range=="Narrow")&(y$Petal.Length.Range=="Short")] =2
y$threshold[(y$Petal.Width.Range=="Narrow")&(y$Petal.Length.Range=="Long")] =2.5
y$threshold[(y$Petal.Width.Range=="Wide")&(y$Petal.Length.Range=="Short")] =3.1
y$threshold[(y$Petal.Width.Range=="Wide")&(y$Petal.Length.Range=="Long")] =4
How should I add y$threshold into the ggplot commands please?
One easy solution is to change your hline call to geom_hline(aes(yintercept=threshold), alpha=0.3).
The problem is that this would draw 150 lines on your plot (one per row of the y data.frame). Maybe that's OK with you: the lines would mostly be stacked on top of each other, so you would really only see four lines, in their correct locations.
However, here is another solution where I create a smaller auxiliary data.frame. This is a common approach in ggplot2. Notice how the new data.frame is specified as the data source inside the geom_hline call.
hline_dat = data.frame(Petal.Width.Range=c("Narrow", "Narrow", "Wide", "Wide"),
Petal.Length.Range=c("Short", "Long", "Short", "Long"),
threshold=c(2, 2.5, 3.1, 4))
p = ggplot(y, aes(Sepal.Length,Sepal.Width)) +
geom_point(alpha=0.5) +
geom_hline(data=hline_dat, aes(yintercept=threshold), colour="salmon") +
facet_grid(Petal.Width.Range ~ Petal.Length.Range)
ggsave("plot.png", plot=p, height=4, width=6)
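Since y$threshold from the question already holds the per-panel values, the auxiliary data frame can also be derived from it instead of being typed in again. A sketch reusing the question's column names: unique() on the two facet variables plus threshold keeps one row per facet combination, so geom_hline() draws exactly one line per panel.

```r
library(ggplot2)

y <- iris
y$Petal.Width.Range <- factor(ifelse(y$Petal.Width < 1.3, "Narrow", "Wide"))
y$Petal.Length.Range <- factor(ifelse(y$Petal.Length < 4.35, "Short", "Long"))

# per-panel thresholds, as defined in the question (Narrow/Short keeps the default 2)
y$threshold <- 2
y$threshold[y$Petal.Width.Range == "Narrow" & y$Petal.Length.Range == "Long"] <- 2.5
y$threshold[y$Petal.Width.Range == "Wide" & y$Petal.Length.Range == "Short"] <- 3.1
y$threshold[y$Petal.Width.Range == "Wide" & y$Petal.Length.Range == "Long"] <- 4

# one row per facet combination: only the facet variables and the threshold
hline_dat <- unique(y[, c("Petal.Width.Range", "Petal.Length.Range", "threshold")])

p <- ggplot(y, aes(Sepal.Length, Sepal.Width)) +
  geom_point(alpha = 0.5) +
  geom_hline(data = hline_dat, aes(yintercept = threshold), colour = "salmon") +
  facet_grid(Petal.Width.Range ~ Petal.Length.Range)
p
```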
What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with ggplot2's geom_histogram, since the bottom of each bar is at zero, and the log of zero is -Inf.
My workaround is to use the freqpoly geom, and that more or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to do is put a point at every vertex of the freqpoly curve, with no lines connecting them. Is there an easy way to do this?
The easiest way to get the desired plot is to recast your data and then use geom_point. Since you don't provide an example, I used the standard example for geom_histogram (the movies data set) to show this:
# load packages
library(ggplot2)
library(reshape)
library(ggplot2movies) # the movies data set moved out of ggplot2 into this package
# get data
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x = rating, y = after_stat(density)), binwidth = .001) +
  scale_y_continuous(trans = "log10")
# recast the data: count how many movies have each rating
df1 <- recast(movies, value ~ ., measure.var = "rating")
names(df1) <- c("rating", "number")
# alternative way to recast the data, with base R
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot (note: "number" is a count per rating, not a density)
p <- ggplot(df1, aes(x = rating)) + scale_y_continuous(trans = "log10", name = "count")
# with lines
p + geom_linerange(aes(ymax = number, ymin = .9))
# only points
p + geom_point(aes(y = number))
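Another option that skips the recasting: geom_freqpoly() draws the bins computed by stat_bin() as a line, so asking stat_bin() to draw points instead puts a point at each vertex of the same curve. Empty bins would land at log10(0) = -Inf, which ggplot2 draws at the bottom edge of the panel, so the sketch turns zero densities into NA; those points are dropped (with a "removed rows" warning). The data frame below is hypothetical simulated data named like the question's zcoorddist, binwidth = 0.1 is chosen for that simulated scale, and after_stat() assumes ggplot2 >= 3.3.0.

```r
library(ggplot2)

# hypothetical stand-in for the question's zcoorddist data frame
set.seed(42)
zcoorddist <- data.frame(zcoord = rnorm(5000))

p <- ggplot(zcoorddist, aes(x = zcoord)) +
  # same binning as geom_freqpoly(), drawn as points only;
  # empty bins become NA so they are dropped instead of sitting at -Inf
  stat_bin(aes(y = after_stat(ifelse(density > 0, density, NA))),
           geom = "point", binwidth = 0.1) +
  scale_y_log10(name = "density")
p
```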
I am plotting value ~ date in ggplot2 (in R) with the code below. As you can see, ggplot2 adds more breaks on the x-axis than I have in my data. I just want an x-label every time I have a data point in my data frame. How can I force ggplot2 to show breaks only at the values of my.dates? It seems there is no "breaks" argument for scale_x_date.
library(ggplot2)
my.dates <- as.Date(c("2011-07-22", "2011-07-23",
                      "2011-07-24", "2011-07-28", "2011-07-29"))
my.vals <- c(5, 6, 8, 7, 3)
my.data <- data.frame(date = my.dates, vals = my.vals)
plot(my.dates, my.vals)
p <- ggplot(data = my.data, aes(date, vals)) + geom_line(linewidth = 1.5)
# the old 'format =' argument is now called 'date_labels ='
p <- p + scale_x_date(name = " ", date_labels = "%m/%d")
p
One approach would be to treat the x-axis as numeric and set the breaks and labels arguments of scale_x_continuous().
ggplot(my.data, aes(as.numeric(date), vals)) +
  geom_line(linewidth = 1.5) +
  scale_x_continuous(breaks = as.numeric(my.data$date),
                     labels = format(my.data$date, format = "%m/%d"))
The gap between 7/24 and 7/28 looks a bit strange in my opinion, but I think that's what you want? Let me know if I've misinterpreted.
EDIT
As noted above, I wasn't thrilled with the way the breaks looked, specifically with the gray grid in the background. Here's one way to keep a regular grid and label only the points where we have data. You could do this all within the ggplot call, but I think it's easier to do the processing outside of ggplot. First, create a data frame whose breaks column holds every day number from the first to the last date. Then fill in the labels for the days that have data, and replace the remaining NA entries with " " so that nothing is printed on the x-axis for those entries:
xscale <- data.frame(breaks = seq(min(as.numeric(my.data$date)), max(as.numeric(my.data$date))),
                     labels = NA)
xscale$labels[xscale$breaks %in% as.numeric(my.data$date)] <- format(my.data$date, format = "%m/%d")
xscale$labels[is.na(xscale$labels)] <- " "
This gives us something that looks like:
breaks labels
1 15177 07/22
2 15178 07/23
3 15179 07/24
4 15180
5 15181
6 15182
7 15183 07/28
8 15184 07/29
which can then be passed to the scale like this:
scale_x_continuous(breaks = xscale$breaks, labels = xscale$labels)
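In current ggplot2 versions, scale_x_date() does take a breaks argument, so the axis can also stay a date axis. A sketch, assuming ggplot2 >= 3.4.0 (for linewidth): pass the observed dates as breaks and format them with date_labels. Unlike the numeric-axis approach, the spacing between 07/24 and 07/28 is then drawn to scale as dates.

```r
library(ggplot2)

my.dates <- as.Date(c("2011-07-22", "2011-07-23", "2011-07-24",
                      "2011-07-28", "2011-07-29"))
my.data <- data.frame(date = my.dates, vals = c(5, 6, 8, 7, 3))

p <- ggplot(my.data, aes(date, vals)) +
  geom_line(linewidth = 1.5) +
  # breaks only at the observed dates; date_labels takes a strftime()-style format
  scale_x_date(name = NULL, breaks = my.data$date, date_labels = "%m/%d")
p
```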