ggplot histogram with labels - r

I want to label two different histograms that are in the same plot. By labels, I mean identifying each histogram by color, for example green for x and red for y.
I tried to use the labs function, but it is not working.
ggplot() +
geom_histogram(data=junk, aes(x),fill="green", alpha=.2) +
geom_histogram(data=jun, aes(y), fill="red", alpha=.2)+
labs(x = "something") +
ggtitle("title")
I expect to have both histograms, one green and the other red, with a legend on the right describing each histogram.

For this, you need the data in long format: the values for the green histogram and the values for the red one stacked below one another in a single data frame, plus another column that defines the groups.
df=data.frame(values=rnorm(20000),colorby=c("red_values","green_values"))
ggplot(data=df,aes(x=values,fill=colorby))+
geom_histogram(position="dodge")+
scale_fill_manual(values=c("red_values"="red","green_values"="green"))
For the position argument, you could also try "stack" to see whether it fits your needs better.
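If the two variables start out as separate vectors, as in the question, they first have to be stacked into that long format. A minimal sketch, assuming two hypothetical numeric vectors x and y (simulated here, not the asker's data) and using position = "identity" with transparency so the histograms overlap as in the original attempt:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the two variables in the question
x <- rnorm(1000, mean = 0)
y <- rnorm(1000, mean = 2)

# Long format: one column of values, one column naming the group
long <- data.frame(
  values = c(x, y),
  variable = rep(c("x", "y"), times = c(length(x), length(y)))
)

# Mapping fill inside aes() is what creates the legend
p <- ggplot(long, aes(x = values, fill = variable)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30) +
  scale_fill_manual(values = c(x = "green", y = "red"), name = "Variable") +
  labs(x = "something", title = "title")

# print(p) draws the plot
```

The legend appears because fill is mapped to a column inside aes(); setting fill = "green" outside aes(), as in the question, colors the bars but never produces a legend.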

Related

Using ggplot in R to display dataframe with two different colours depending on value of data. Issue with Data jumping

I am using the below code to plot a data frame on the same plot:
ggplot(df) + geom_line(aes(x = date, y = values, colour = X > 5))
The plot works and looks great, except that when the values are bigger than 5, geom_line starts connecting the points that are above the threshold to each other. I do not want the lines connecting the blue data.
How do I stop this from happening?
Here's an example using the economics dataset included in ggplot2. You see the same thing if we highlight the line based on values above 8000:
ggplot(economics, aes(date, unemploy)) +
geom_line(aes(color=unemploy > 8000))
When you map a discrete variable to an aesthetic such as color, ggplot2 by default also groups your data by that variable. This makes total sense if you have data in long form and want to draw a separate line for each value in a column. In cases like this one, though, you want ggplot2 to change the color of the line based on the data, but not to group based on color. This is why you need to override the group= aesthetic.
To override the grouping that happens when you map color in your line geom, you can just say group=1, or really group= any constant value. This puts every observation in the same group, so the line connects all your points but is colored differently:
ggplot(economics, aes(date, unemploy)) +
geom_line(aes(color=unemploy > 8000, group=1))
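One way to check what the override does is to look at the group column ggplot2 computes for the layer. A small sketch using layer_data() on the same economics example (the variable names p_split, p_single, n_split and n_single are just illustrative):

```r
library(ggplot2)

# Default: the logical colour mapping also splits the series into groups
p_split <- ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000))

# Override: a constant group keeps the series as one continuous line
p_single <- ggplot(economics, aes(date, unemploy)) +
  geom_line(aes(color = unemploy > 8000, group = 1))

# Count the distinct groups ggplot2 computed for each layer
n_split  <- length(unique(layer_data(p_split)$group))
n_single <- length(unique(layer_data(p_single)$group))

cat("groups without override:", n_split, "\n")
cat("groups with group = 1:  ", n_single, "\n")
```

Without the override there is one group per value of the logical condition, and each group is drawn as its own line, which is what produces the unwanted connecting segments.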

How do I add intensity legend of colors after I plot using grid.raster()?

I am doing k-means clustering on a PNG image and have been plotting it using grid::grid.raster(image). I would like to add a legend that shows the intensity as a bar (from blue to red) marked with values, essentially indicating the intensity on the image. (image is an array where the third dimension equals 3, giving the red, green and blue channels.)
I thought of using grid.legend() but couldn't figure it out. After I perform k-means clustering on the image, I want a legend beside it that displays intensity on a continuous scale as a color bar.
I also tried ggplot2 and could plot the image, but still couldn't plot the legend. I am providing the ggplot code for plotting the image. I can also extract the RGB channels separately with ggplot2, so a solution for that case would help too.
colassign <- rgb(Kmeans2@centers[clusters(Kmeans2),])
library(ggplot2)
ggplot(data = imgVEC, aes(x = x, y = y)) +
geom_point(colour = colassign) +
labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
xlab("x") +
ylab("y")
I did not find a way to use grid.raster() properly, but found a way to do it with ggplot2 when plotting the RGB channels separately. Note: this only works for plotting the panels separately, but that is what I needed. The following shows the code for the green channel.
# The RGB channels are stored in columns 1, 2, 3 respectively.
# The x-axis and y-axis values are stored in columns 4, 5.
# The original image is an n x 5 matrix.
ggplot(original_img[,c(3,4,5)], aes(x, y)) +
geom_point(aes(colour = segmented_img[,3])) +
scale_color_gradient2()
# scale_color_distiller(palette="RdYlBu") can be used instead of scale_color_gradient2() to choose the colors via the palette argument.

Adding group mean lines to geom_bar plot and including in legend

I want to create a bar graph which also shows the mean value for the bars in each group, and shows the mean line in the legend.
I have been able to produce a bar chart with means using the code below, which is fine, but I would like to be able to see the mean lines in the legend.
## The data to be graphed is the proportion of persons receiving a treatment
## (num=numerator) in each population (denom=denominator). The population is
## grouped by two age groups (Age) and further divided by a categorical
## variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line.
The syntax below produces a line for the population mean, but it runs between the centres of the groups rather than covering the full width of the bars. It does produce a legend, but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If I try to draw lines for the group means, the lines are not visible (because each one is only a single point).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
This allows me to show the mean lines, using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines!). However, the lines still do not go the full width of the bars. Also my x-axis now needs relabelling to show S1, S2 etc rather than numeric 1,2,3
To sum up: is there a way of showing the error bar lines separately in the legend?
If not, then if I use faceting, how do I correct the legend appearance and relabel the axes with my categorical variables, and is it possible to get the line to go the full width of the plot?
Or is there an alternate solution that I am missing!?
Thanks
To get a legend for geom_errorbar, you need to pass the colour argument inside aes().
As you want only one category (here red), I've created a dummy variable first:
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity") +
geom_errorbar(aes (ymax=groupmeans,
ymin=groupmeans, colour=mean), lwd=2) +
scale_colour_manual(name="",values = "#ff0000")

Readjusting one plot from a two-plots graph

I have two plots in the same graph. I need to adjust the first plot so that its y axis is scaled to its own number of cases (which is lower than for the second plot). As it stands, the first plot is difficult to read because its bars are much smaller than those of the second one. I would like each plot to be scaled proportionally to its own data.
all.scores = rbind(UK_Together.scores, YesScotland.scores)
ggplot(data=all.scores) + # ggplot works on data.frames, always
geom_bar(mapping=aes(x=score, fill=Parti), binwidth=1) +
facet_grid(Parti~.) + # make a separate plot for each hashtag
theme_bw() + scale_fill_brewer() # plain display, nicer colors
Anyone can help?
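One option that may fit here is to let each facet have its own y axis with scales = "free_y" in facet_grid(). A minimal sketch, assuming a hypothetical stand-in for all.scores (the real data is not shown in the question) with one small and one large group:

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for all.scores: 50 cases vs 2000 cases
all.scores <- data.frame(
  score = c(sample(-3:3, 50, replace = TRUE),
            sample(-3:3, 2000, replace = TRUE)),
  Parti = rep(c("UK_Together", "YesScotland"), times = c(50, 2000))
)

p <- ggplot(all.scores) +
  geom_bar(aes(x = score, fill = Parti)) +
  facet_grid(Parti ~ ., scales = "free_y") +  # each panel gets its own y range
  theme_bw() +
  scale_fill_brewer()

# print(p) draws the plot
```

With free y scales the bar heights are no longer directly comparable between the two panels, so the axis labels need to be read per panel.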

ggplot discrete color gradient but highlighting certain categories

I have a scatter plot of time series data grouped by years. It is currently plotted with a discrete color gradient to separate the years. I know however that one or more years are outliers and would like to highlight the points corresponding to them.
As an example using the diamond dataset
ggplot(diamonds,aes(carat,price,colour=color)) + geom_point()
Suppose I know color F does not follow the same relationship and would like to highlight it on the graph. What is the best way to do it?
ggplot(diamonds,aes(carat,price,colour=color)) + geom_point() + scale_colour_brewer(palette="Blues")
I was thinking of using a blue palette but coloring F red, but I don't know how to do the second part. Can someone help please?
You get black and blue by default if you use the strategy of adding 1 to a logical vector:
ggplot(diamonds,aes(carat,price,colour= 1+(color=="F") )) + geom_point()
Because it was numeric, we got a continuous scale (between 1 and 2). To make it blue with a discrete scale (which I think looks equally strange), use as.factor():
ggplot(diamonds,aes(carat,price,colour= as.factor(1+(color=="F") ))) +
geom_point() + scale_colour_brewer(palette="Blues")
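To keep the blue gradient for every category and turn only F red, as the question describes, one possibility is to build the palette by hand and pass it to scale_colour_manual(). A sketch under that assumption, using RColorBrewer directly (the names lv and pal are illustrative):

```r
library(ggplot2)
library(RColorBrewer)

# One "Blues" shade per level of diamonds$color, named by level
lv  <- levels(diamonds$color)
pal <- setNames(brewer.pal(length(lv), "Blues"), lv)

# Replace the shade for the category to highlight
pal["F"] <- "red"

p <- ggplot(diamonds, aes(carat, price, colour = color)) +
  geom_point() +
  scale_colour_manual(values = pal)

# print(p) draws the plot
```

Because the palette is a named vector, each color is matched to its level by name, so the other categories keep their position in the gradient.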
