Gradient Fill in Bar Graph - r

I'm looking at behavior of different groups of people (called Clusters in this data set) and their preference for the type of browser they use. I want to create a bar graph that shows the percentage of each cluster that is using each type of browser.
Here is some code to generate a similar dataset (please ignore that the percentages for each cluster will not add up to 1):
browserNames <- c("microsoft","mozilla","google")
clusterNames <- c("Cluster 1","Cluster 2","Cluster 3")
percentages <- runif(n=length(browserNames)*length(clusterNames),min=0,max=1)
myData<-as.data.frame(list(browserNames=rep(browserNames,3),
clusterNames=rep(clusterNames,each=3),
percentages=percentages))
Here's the code I've been able to come up with so far to get the graph I desire:
library(ggplot2)
library(scales) # for percent
ggplot(myData, aes(x=browserNames, y=percentages, fill=factor(clusterNames))) +
  geom_bar(stat="identity",position="dodge") +
  scale_y_continuous(name="Percent Weight", labels=percent)
I want the fill for each cluster to be a gradient fill with high and low values that I determine. So, in this example, I would like to be able to set 3 high and low values for each cluster that is represented.
I've had trouble with the different scale_fill commands, and I'm new enough to ggplot that I'm probably just doing it wrong. Any ideas?
Edit: Here is a picture of what I'm looking for:
(Original image available at https://www.dropbox.com/s/py6hifejqz7k54v/gradientExample.bmp)

Is this close to what you had in mind?
# color set depends on browser
library(ggplot2)
library(RColorBrewer) # for brewer.pal(...)
# sort by browser, then by percentage, so each browser gets one 3-color ramp
gg <- with(myData, myData[order(browserNames,percentages),])
gg$colors <- 1:9
colors <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
               fill=factor(colors), group=factor(clusterNames))) +
  geom_bar(stat="identity",position="dodge", color="grey70") +
  scale_fill_manual("Browser", values=colors,
                    breaks=c(3,6,9), labels=c("Google","Microsoft","Mozilla"))
# color set depends on cluster
library(RColorBrewer) # for brewer.pal(...)
gg <- with(myData, myData[order(clusterNames,percentages),])
gg$colors <- 1:9
col <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
fill=factor(colors), group=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge", color="grey70") +
scale_fill_manual("Cluster", values=col,
breaks=c(3,6,9), labels=c("Cluster1","Cluster2","Cluster3"))
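If you want to choose the low and high color of each cluster yourself rather than rely on the Brewer palettes, one option is to compute the fill colors by hand and pass them through scale_fill_identity(). The following is only a sketch under one reading of the request: each bar gets a single solid color taken from its cluster's ramp according to its percentage, not a gradient inside the bar. The ramps list, its hex values, and the gradient_color() helper are made up for illustration.

```r
library(ggplot2)

# Hypothetical low/high colors per cluster (assumed values, pick your own)
ramps <- list(
  "Cluster 1" = c(low = "#FEE0D2", high = "#DE2D26"),
  "Cluster 2" = c(low = "#E5F5E0", high = "#31A354"),
  "Cluster 3" = c(low = "#DEEBF7", high = "#3182BD")
)

# Hypothetical helper: map values in [0, 1] to colors between low and high
gradient_color <- function(value, low, high) {
  rgb_matrix <- colorRamp(c(low, high))(value)  # one row of R, G, B per value
  rgb(rgb_matrix[, 1], rgb_matrix[, 2], rgb_matrix[, 3], maxColorValue = 255)
}

set.seed(1)
myData <- data.frame(
  browserNames = rep(c("microsoft", "mozilla", "google"), 3),
  clusterNames = rep(c("Cluster 1", "Cluster 2", "Cluster 3"), each = 3),
  percentages  = runif(9)
)

# Pick each bar's color from the ramp that belongs to its cluster
myData$fillColor <- mapply(
  function(value, cluster) {
    gradient_color(value, ramps[[cluster]]["low"], ramps[[cluster]]["high"])
  },
  myData$percentages, myData$clusterNames
)

p <- ggplot(myData, aes(x = browserNames, y = percentages,
                        fill = fillColor, group = clusterNames)) +
  geom_col(position = "dodge", colour = "grey70") +
  scale_fill_identity()  # use the hex codes in fillColor as they are
p
```

Note that scale_fill_identity() draws no legend by default, so the clusters would need to be labelled some other way.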

Related

How can I define a color palette (normalize) for multiple hexbin plots in R

I want to find a way to set a certain range of a color palette that is used for a hexbin plot to normalize multiple plots in R.
So far I have tried:
library(hexbin)
library(gplots)
my.colors <- function (n)
{
(rich.colors(n))
}
plot(hexbin(lastthousand$V4, lastthousand$V5, xbnds=c(0,35), ybnds=c(0,35)),
     xlab="Green Pucks", ylab="Red Pucks", colramp=my.colors,
     colorcut=seq(0, 1, length=25), lcex=0.66)
Which results in the following plot:
hexbin plot #1
I understand that "colorcut" controls the resolution of the color palette, but I found no way to control the min/max values.
Let's say I have a second plot, 'hexbin plot #2', with counts from 1 (dark blue) to 100 (red). Is there a way to use only colors 1 (dark blue) to 24 (light blue), i.e. only part of the 1 (dark blue) to 100 (red) scale, for hexbin plot #1?
The final goal is to have several hexbin plots next to each other which follow the same colour scheme (min and max based on the one with the highest counts).
-this is my first question here :) and I'm new to R, please be gentle
//edit: For everyone with the same problem: My supervisor suggested to use facets in ggplot2. Will see how that works and return with another edit if it solves the issue.
//edit2: facets did the trick:
library(gplots)
library(ggplot2)
# geom_hex() maps the bin count to fill, so the gradient is set on the fill scale
p <- ggplot(data=lastthousand, aes(x=V4, y=V5)) + geom_hex()
p + facet_grid(. ~ Market) + xlab("green pucks") + ylab("red pucks") + scale_fill_gradientn(colours=rainbow(7))
Maybe this can be useful: https://gist.github.com/wahalulu/1376861
and this for ranges:
https://stackoverflow.com/a/15505591/1600108
https://stackoverflow.com/a/14586941/1600108
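When the plots have to stay separate instead of being facets of one plot, giving every plot the same limits on the fill scale should have a similar effect. The sketch below assumes ggplot2 with the hexbin package installed; the two data frames, the shared_fill() helper, the color list, and the max_count value of 100 are invented for illustration.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed

set.seed(42)
# Hypothetical stand-ins for two data sets with different point densities
sparse <- data.frame(V4 = runif(500, 0, 35),  V5 = runif(500, 0, 35))
dense  <- data.frame(V4 = rnorm(5000, 17, 3), V5 = rnorm(5000, 17, 3))

# Hypothetical helper: one fill scale definition reused by every plot
shared_fill <- function(max_count) {
  scale_fill_gradientn(
    colours = c("darkblue", "lightblue", "yellow", "red"),
    limits  = c(0, max_count)  # same count-to-color mapping in every plot
  )
}

max_count <- 100  # assumed upper bound: the highest bin count across all plots

p1 <- ggplot(sparse, aes(V4, V5)) + geom_hex() + shared_fill(max_count)
p2 <- ggplot(dense,  aes(V4, V5)) + geom_hex() + shared_fill(max_count)
```

Bins whose count falls outside the limits are drawn in the scale's NA color, so max_count should be at least as large as the busiest bin.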

ggplot geom_histogram color by factor not working properly

I am trying to color my stacked histogram according to a factor column, but all the bars have a "green" roof. I want the top of each bar to be the same color as the bar itself. The figure below shows clearly what is wrong: all the bars have a green horizontal line at the top.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c(80, 10, 5, 5) # number of observations per color
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
# refer to columns by name inside aes(); the constant fill goes in the layer
plot <- ggplot(data = data, aes(x = BodyLength, color = factor(color)))
plot <- plot + geom_histogram(fill = "transparent")
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, is there any way to avoid specifying them again in scale_colour_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now. Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
or position="dodge". The identity option is closer to what you want, but since green is the last line drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)
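On the second question (having ggplot2 take the colors straight from the data), scale_colour_identity() uses the values of the mapped column as colors as they are. A minimal sketch, assuming the color column holds valid R color names; the seed and bins = 30 are arbitrary choices:

```r
library(ggplot2)

set.seed(1)
BodyLength <- rnorm(100, mean = 50, sd = 3)
color <- rep(c("black", "blue", "red", "green"), c(80, 10, 5, 5))
data <- data.frame(BodyLength, color)

p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  scale_colour_identity()  # outline colors are read directly from data$color
p
```

The identity scale shows no legend by default, which is usually fine when the colors themselves carry the meaning.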

Heat Color Densities in R

I am trying to build a type of color density plot similar to the one here:
https://stats.stackexchange.com/questions/26676/generating-visually-appealing-density-heat-maps-in-r
But with a different type of data going into it. My real data has many rows, but as an example I have code that builds a data frame with columns X, Y and Score, and I want a color density plot using these static X, Y buckets. Is that possible?
X=seq(0,10,by=1)
Y=seq(50,60,by=1)
total=expand.grid(X,Y)
nrow(total)
total$score=runif(nrow(total), min=0, max=100)
range(total$score)
head(total)
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 100)
col_breaks = c(seq(0,100,length=100))
col=data.frame(as.character(my_palette),col_breaks)
col$num=row.names(col)
head(col)
col$col_breaks=round(col$col_breaks,0)
names(col)[1]="hex"
total$round=round(total$score)
total$color=as.character(col$hex[match(total$round,col$col_breaks)])
plot(total$Var1,total$Var2,col=total$color,xlim=c(0,10),ylim=c(50,60))
I am not trying to hexbin or otherwise confine the data to boxes (I figured that out using conditional rect() calls with colors). I am wondering whether, with this type of data, there is a way to get a more free-flowing heat shape, similar to this:
Or does it need to be continuous data to do something like that?
If I understand your question correctly, I think you can do this in ggplot.
Basically you can use geom_raster to fill in the tiles with an interpolate option so it won't look "blocky". You can then set the gradient to what you want. So for example, based on the sample data you gave me I have set the low, mid, high colours to be blue, white and red respectively. It would simply be the following code:
library(ggplot2)
ggplot(total, aes(x=Var1, y=Var2)) +
geom_raster(aes(fill=score), interpolate=TRUE) +
scale_fill_gradient2(limits=c(0,100), low="blue", mid="white", high="red", midpoint = 50)
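To keep the blue, yellow, red palette from the question instead of the diverging blue, white, red one, scale_fill_gradientn() accepts an arbitrary list of colors. This is a sketch on regenerated sample data; the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)
total <- expand.grid(0:10, 50:60)  # columns are named Var1 and Var2
total$score <- runif(nrow(total), min = 0, max = 100)

p <- ggplot(total, aes(x = Var1, y = Var2, fill = score)) +
  geom_raster(interpolate = TRUE) +  # smooth between neighbouring cells
  scale_fill_gradientn(colours = c("blue", "yellow", "red"),
                       limits = c(0, 100))
p
```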

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots (see geom_dotplot in the manual).
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice that changing the ylim values doesn't affect how the data is displayed; it only changes the labels on the y-axis.
As @joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; you could suggest that the label be changed to "density" by default. The ggplot2 implementation of the dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at this paper.
Is there an easy transformation to make the y axis show actual counts, i.e. "number of observations"? The help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
I introduce an exact approach using @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
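The location of this value has moved again since then: in ggplot2 3.0.0 and later, the panel ranges appear under layout$panel_params. The sketch below tries the known locations in turn; get_x_range() is a hypothetical helper name, not part of ggplot2, and it relies on ggplot2 internals that may change in future versions.

```r
library(ggplot2)

set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth = 0.8)
built <- ggplot_build(g)

# Hypothetical helper: look for the x range where each ggplot2 generation keeps it
get_x_range <- function(built) {
  layout <- built$layout
  if (!is.null(layout$panel_params)) {
    return(layout$panel_params[[1]]$x.range)  # ggplot2 3.0.0 and later
  }
  if (!is.null(layout$panel_ranges)) {
    return(layout$panel_ranges[[1]]$x.range)  # ggplot2 2.2.x
  }
  built$panel$ranges[[1]]$x.range             # older versions
}

width_coordinate_range <- diff(get_x_range(built))
```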

Is it possible to create 3 series (2 lines and one point) faceted plot in ggplot?

I am trying to port code that I wrote with R's base graphics to ggplot2.
The graph I obtained using base graphics is as follows:
I was wondering whether this type of graph can be created in ggplot2. I think we could create it using panels, but is it possible to use faceting for this kind of plot? The major difficulty I encountered is that the maximum and minimum series have the same length, whereas the observed data is not continuous and its sampling interval is quite different.
Any thoughts on arranging the data for this type of plot would be very helpful. Thank you so much.
Jdbaba,
From your comments, you'd like the legend entry for the geom_point series to show just the point. As far as I know, ggplot2 does not yet support this directly. However, there's a workaround given by @Aniko in this post. It's a bit tricky but brilliant, and it works great. Here's a version that I tried out. Hope it is what you expected.
# bind both your data.frames
df <- rbind(tempcal, tempobs)
p <- ggplot(data = df, aes(x = time, y = data, colour = group1,
linetype = group1, shape = group1))
p <- p + geom_line() + geom_point()
p <- p + scale_shape_manual("", values=c(NA, NA, 19))
p <- p + scale_linetype_manual("", values=c(1,1,0))
p <- p + scale_colour_manual("", values=c("#F0E442", "#0072B2", "#D55E00"))
p <- p + facet_wrap(~ id, ncol = 1)
p
The idea is to first create a plot with all necessary attributes set in the aesthetics section, plot what you want, and then change the settings manually using the scale_*_manual functions. For example, you can unset lines with a 0 in scale_linetype_manual. Similarly, you can unset points for lines using NA in scale_shape_manual. Here, the first two values are for group1 = maximum and minimum and the last is for observed. So we set the shape to NA for maximum and minimum, and the linetype to 0 for observed.
And this is the plot:
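Since the original CSV files may not be at hand, here is a self-contained sketch of the same idea on invented data. The column names time, data, group1 and id follow the code above; the values, the site names and the seed are made up. It names the entries of each values vector so that the mapping does not depend on the order of the group1 levels.

```r
library(ggplot2)

set.seed(1)
# Hypothetical calculated series: regular time steps, two envelopes per site
tempcal <- expand.grid(time = 1:20, group1 = c("maximum", "minimum"),
                       id = c("site A", "site B"), stringsAsFactors = FALSE)
tempcal$data <- ifelse(tempcal$group1 == "maximum", 25, 15) + sin(tempcal$time / 3)

# Hypothetical observations: few, irregularly spaced points per site
tempobs <- data.frame(time = rep(c(2, 7, 11, 18), 2), group1 = "observed",
                      id = rep(c("site A", "site B"), each = 4),
                      data = runif(8, 16, 24))

df <- rbind(tempcal, tempobs[, names(tempcal)])

p <- ggplot(df, aes(x = time, y = data, colour = group1,
                    linetype = group1, shape = group1)) +
  geom_line() + geom_point() +
  # NA shape hides the points of the two line series
  scale_shape_manual("", values = c(maximum = NA, minimum = NA, observed = 19)) +
  # linetype 0 (blank) hides the line of the observed series
  scale_linetype_manual("", values = c(maximum = 1, minimum = 1, observed = 0)) +
  scale_colour_manual("", values = c(maximum = "#F0E442", minimum = "#0072B2",
                                     observed = "#D55E00")) +
  facet_wrap(~ id, ncol = 1)
p
```

ggplot2 may warn about removed rows when drawing this, because the NA shapes are dropped on purpose.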
Solution found:
Thanks to Arun and Andrie
Just in case somebody needs the solution of this sort of problem.
The code I used was as follows:
library(ggplot2)
tempcal <- read.csv("temp data ggplot.csv", header=TRUE, sep=",")
tempobs <- read.csv("temp data observed ggplot.csv", header=TRUE, sep=",")
# lines come from the calculated data, points from the observed data
p <- ggplot(tempcal, aes(x=time, y=data)) +
  geom_line(aes(color=group1)) +
  geom_point(data=tempobs, aes(colour=group1)) +
  facet_wrap(~id)
p
The datasets used were https://www.dropbox.com/s/95sdo0n3gvk71o7/temp%20data%20observed%20ggplot.csv
https://www.dropbox.com/s/4opftofvvsueh5c/temp%20data%20ggplot.csv
The plot obtained was as follows:
Jdbaba
