Problems making a graphic in ggplot - r

I am working with ggplot. I want to design a graphic with two continuous variables, and I would like it to look like this:
Where x and y are the continuous variables. My problem is that I can't get circles to show on the line of the plot. I would like the plot to have a circle for each pair of observations from the continuous variables. For example, the attached graphic has a circle for the pairs (1,1), (2,2) and (3,3). Is it possible to get this? (The colour of the line doesn't matter.)

library(ggplot2)
# dummy data
dat <- data.frame(x = 1:5, y = 1:5)
# draw the line first, then the points on top so the circles stay visible
ggplot(dat, aes(x,y,color=x)) +
  geom_line(size=3) +
  geom_point(size=10) +
  scale_colour_continuous(low="blue",high="red")
Playing with low/high will change the colours.
In general, to remove the legend, use + theme(legend.position="none")

Related

Geom_area plot doesn't fill the area between the lines

I want to make an area plot with ggplot(mpg, aes(x=year,y=hwy, fill=manufacturer)) + geom_area(), but I get this:
I'm really new to the R world. Can anyone explain why it does not fill the area between the lines? Thanks!
First of all, there's nothing wrong with your code. It's working as intended and you are correct in the syntax required to do what you are looking to do.
Why don't you get the area geom to plot correctly, then? The simple answer is that you don't have enough distinct x values to draw a proper line for each group (manufacturer): year takes only two values in mpg, 1999 and 2008, and manufacturers typically have several hwy values piled up at each of them. Try the geom_point plot and you'll see what I mean:
ggplot(mpg, aes(x=year,y=hwy)) + geom_point(aes(color=manufacturer))
You need a different dataset. Here's a dummy one that is simply two lines with different slopes. It works as expected because each group (each level of the fill variable) has exactly one y value at every x value:
# dummy dataset
df <- data.frame(
x=rep(1:10,2),
y=c(seq(1,10,length.out=10), seq(1,5,length.out=10)),
z=c(rep('A',10), rep('B', 10))
)
# plot
ggplot(df, aes(x,y)) + geom_area(aes(fill=z))
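The same idea can be applied to mpg itself. The following is a minimal sketch, assuming it is acceptable to collapse the data to one value per manufacturer and year first; taking the mean of hwy is an illustrative choice and not part of the original answer. There are still only two x positions, but each band is now well defined.

```r
library(ggplot2)

# One mean hwy value per (year, manufacturer) pair; the mean is an assumed summary
avg <- aggregate(hwy ~ year + manufacturer, data = mpg, FUN = mean)

# Each manufacturer now contributes a single y per x, so the stacked areas fill cleanly
p <- ggplot(avg, aes(x = year, y = hwy, fill = manufacturer)) +
  geom_area()
print(p)
```

If the spread within each year matters more than a stacked total, a boxplot or the geom_point plot above may be a better fit than an area plot.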

How do I add intensity legend of colors after I plot using grid.raster()?

I am doing kmeans clustering on a png image and have been plotting it using grid::grid.raster(image). But I would like to put a legend which shows the intensity in a bar(from blue to red) marked with values, essentially indicating the intensity on the image. (image is an array where the third dimension equals 3 giving the red, green and blue channels.)
I thought of using grid.legend() but couldn't figure it out. I am hoping the community can help me out. Following is the image I have been using and after I perform kmeans clustering want a legend beside it that displays intensity on a continuous scale on a color bar.
Also I tried with ggplot2 and could plot the image but still couldn't plot the legend. I am providing the ggplot code for plotting the image. I can extract the RGB channels separately using ggplot2 also, so showing that also helps.
colassign <- rgb(Kmeans2@centers[clusters(Kmeans2),])
library(ggplot2)
ggplot(data = imgVEC, aes(x = x, y = y)) +
geom_point(colour = colassign) +
labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
xlab("x") +
ylab("y")
I did not find a way to use grid.raster() properly, but I found a way to do it with ggplot2 by plotting the RGB channels separately. Note: this only works for plotting the panels separately, but that is what I needed. The following shows the code for the green channel.
#RGB channels are respectively stored in columns 1,2,3.
#x-axis and y-axis values are stored in columns 4,5.
#original image is a nx5 matrix
ggplot(original_img[,c(3,4,5)], aes(x, y)) +
geom_point(aes(colour = segmented_img[,3])) +
scale_color_gradient2()
# scale_color_distiller(palette="RdYlBu") can be used instead of scale_color_gradient2() to choose the colours via the palette argument.
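For the blue-to-red intensity bar asked about in the question, one possible sketch is to map the intensity to fill and let ggplot2 draw its default colour bar. The data frame img_df and its intensity column are hypothetical stand-ins for a single image channel, and geom_raster is swapped in here for the geom_point used above.

```r
library(ggplot2)

# Hypothetical stand-in for one image channel: a 50 x 50 grid of intensities in (0, 1]
img_df <- expand.grid(x = 1:50, y = 1:50)
img_df$intensity <- (img_df$x + img_df$y) / 100

# A continuous fill scale gets a colour bar legend by default
p <- ggplot(img_df, aes(x, y, fill = intensity)) +
  geom_raster() +
  scale_fill_gradient(low = "blue", high = "red", name = "Intensity") +
  coord_fixed()
print(p)
```

coord_fixed() keeps the pixels square, which usually matters when the plot represents an image.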

Wrong density values in a histogram with `fill` option in `ggplot2`

I was creating histograms with ggplot2 in R whose bins are separated with colors and noticed one thing. When the bins of a histogram are separated by colors with fill option, the density value of the histogram turns funny.
Here is the data.
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
This is a histogram without fill.
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..))
This is a histogram with fill.
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(aes(y=..density..))
You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.
I thought over this issue for a while. The data can't be wrong for the first histogram was normal. It should be something in ggplot2 or geom_histogram function. I googled "geom_histogram density fill" and couldn't find much help.
I want the end product to look like:
Separated by colors as you see in the second histogram
Size and shape identical to the first histogram
The vertical axis being density
How would you deal with this issue?
I think what you may want is this:
ggplot(df, aes(x = x, fill=b)) +
geom_histogram()
That is, plot the counts rather than the density, since the density requires extra calculations.
One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables, you start to change them into bar charts or something similar.
You will want to work on setting the axis manually if you want it to range from 0 to .4.
The solution is to hand-compute density like this (instead of using the built-in ggplot2 version):
library(ggplot2)
# Generate test data
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(mapping = aes(y = ..count.. / (sum(..count..) * ..width..)))
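To see why the hand-computed version behaves differently, it may help to reproduce the arithmetic outside ggplot2. The sketch below uses base R's hist() on the same data; the bin width of 0.25 is an arbitrary assumption. The built-in density is computed separately for each fill group, so each group's bars integrate to 1 on their own, whereas dividing by the total count makes the whole histogram integrate to 1.

```r
set.seed(42)
x <- rnorm(10000, 0, 1)
b <- x > 1

# Common bins covering the full range of x (bin width is an assumed choice)
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)
width <- 0.25

h_false <- hist(x[!b], breaks = breaks, plot = FALSE)
h_true  <- hist(x[b],  breaks = breaks, plot = FALSE)

# Per-group density: each group is normalised by its own size
area_false <- sum(h_false$density * width)
area_true  <- sum(h_true$density * width)

# Overall density: counts divided by (total count * bin width)
overall <- (h_false$counts + h_true$counts) / (length(x) * width)
area_overall <- sum(overall * width)

print(c(area_false, area_true, area_overall))
```

The small x > 1 group is stretched to the same total area as the large group under per-group normalisation, which is why its bars stick out in the filled histogram.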
When you provide a column name for the fill parameter in ggplot, it groups the variables and plots each group in its own color.
If you want a single color for the plot, just specify the color you want:
FIXED
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..),fill="Blue")

ggplot geom_histogram color by factor not working properly

I am trying to color my stacked histogram according to a factor column, but all the bars have a "green" roof. I want the top of each bar to be the same color as the bar itself. The figure below shows clearly what is wrong: all the bars have a "green" horizontal line at the top.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c(80, 10, 5, 5)
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, any way I don't have to specify them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now...Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
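Another option, offered here as an alternative rather than as the original answer's method, is scale_colour_identity(), which tells ggplot2 to use the values in the column directly as colours. This is a minimal sketch; the seed and the bins value are assumptions added for reproducibility.

```r
library(ggplot2)

set.seed(1)  # illustrative seed, not in the original
data <- data.frame(
  BodyLength = rnorm(100, mean = 50, sd = 3),
  color = rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5))
)

# The column already holds valid R colour names, so no manual mapping is needed
p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  scale_colour_identity()
print(p)
```

By default the identity scale draws no legend, which is often fine when the colour names speak for themselves.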
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
..or position="dodge". The identity option is closer to what you want, but since green is the last outline drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)

Clustering dots in a scatterplot

Let's say I have this data.frame:
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))
and I want to plot df$y vs. df$x.
Since the x values are constant, points that have identical or close y values will be plotted on top of each other in a simple scatterplot, which kind of hides the density of points at such y-values. One solution for that situation is of course to use a violin plot.
I'm looking for another solution - plotting clusters of points instead of the individual points, which will therefore look similar to a bubble plot. In a bubble plot however, a third dimension is required in order to make the bubbles meaningful, which I don't have in my data. Does anyone know of an R function/package that take as input points (and probably a defined radius) and will cluster them and plot them?
You can jitter the x values:
plot(jitter(df$x),df$y)
You could try a hexbin plot, using either the hexbin package or stat_binhex in ggplot2.
http://cran.r-project.org/web/packages/hexbin/
http://docs.ggplot2.org/0.9.3/stat_binhex.html
The other standard approach (vs. jitter) is to use a partially transparent color, so that overlapping points will appear darker than "lone" points.
De gustibus, etc.
Using transparency is another solution. E.g.:
ggplot(df, aes(x=x, y=y)) +
geom_point(alpha=0.2, size=3)
When there is only one x value, a density plot:
ggplot(df, aes(x=y)) +
stat_density(geom="line")
or a violin plot:
ggplot(df, aes(x=x, y=y)) +
geom_violin()
might also be options for displaying your data.
Look at the sunflowerplot function (and the xyTable function that it uses to count overlapping points).
You could also use the my.symbols function from the TeachingDemos package with the results of xyTable to use other shapes (polygrams, for example).
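A rough sketch of the xyTable approach in base R follows. Rounding y to whole numbers is an assumed way of deciding which points count as "close", and the seed is added only for reproducibility. The second plot uses symbols() from base graphics as a simple bubble-style alternative.

```r
set.seed(1)  # illustrative seed
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# Round y so that nearby points share coordinates (the rounding is an assumed choice)
tab <- xyTable(df$x, round(df$y), digits = 6)

# One symbol per unique location; locations with several points get petals
sunflowerplot(tab$x, tab$y, number = tab$number,
              xlab = "x", ylab = "y (rounded)")

# Bubble-style alternative: circle area proportional to the number of points
symbols(tab$x, tab$y, circles = sqrt(tab$number), inches = 0.2,
        xlab = "x", ylab = "y (rounded)")
```

A coarser or finer rounding step changes how aggressively points are merged, so it plays the role of the "defined radius" mentioned in the question.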
