Heatmap/density plot from a grid in ggplot - r

I have a CSV file that's a grid.
Link to the raw csv file on github
I want to plot this using ggplot so that it will look like this plot.
Where the colours are representing the values in the cells. (The provided csv has other values)
I can't get it to work without using x and y aesthetics. Maybe it can be fixed by using rownames and colnames, but at the moment the first row and column are being used as the colnames and rownames, so I need to find a way to change that as well.
I feel like I'm going nuts. I want to use geom_bin2d() or even stat_density_2d(), but couldn't get it to work.

Maybe you can try to melt it and then geom_raster():
library(ggplot2)
library(reshape2)
x = read.csv("https://raw.githubusercontent.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking/master/EPV_grid.csv",header=FALSE)
ggplot(melt(as.matrix(t(x))), aes(Var1,Var2, fill=value)) +
geom_raster() +
scale_fill_viridis_c(direction=-1) +
theme_minimal()
Or:
ggplot(melt(as.matrix(t(x))), aes(Var1,Var2, fill=value)) +
geom_tile() +
scale_fill_viridis_c(direction=-1) +
theme_minimal()
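To see what the melt step does without downloading the file, here is a minimal sketch using a made-up 3 x 4 matrix (the name cells and its values are assumptions, not the real EPV data). It builds the long table by hand with base R, so each cell becomes one row with a column position, a row position and a value, which is the shape geom_raster() needs:

```r
library(ggplot2)

# Hypothetical 3 x 4 grid standing in for the CSV contents
cells <- matrix(seq_len(12) / 12, nrow = 3, ncol = 4)

# One row per cell: row index, column index, cell value
long <- data.frame(
  row   = as.vector(row(cells)),
  col   = as.vector(col(cells)),
  value = as.vector(cells)
)

# x and y come from the cell positions, fill from the cell value
p <- ggplot(long, aes(x = col, y = row, fill = value)) +
  geom_raster() +
  scale_fill_viridis_c(direction = -1) +
  theme_minimal()
```

Printing p draws the heatmap. This is why x and y aesthetics are still needed: the grid positions have to become columns of the data.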

Related

Why does this ggplot only plot the grid without the values?

I am trying to plot a bar chart in ggplot but I keep getting only the grid. Apparently this demonstrates that nothing is drawn at this point, but I would like to understand the simplest way to make the values visible.
library(ggplot2)
testData<-data.frame(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17))
bar <- ggplot(data=testData, aes(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17), fill = "#FFCC00"))
One way I can get the bars drawn is by adding geom_bar:
bar <- ggplot(data=testData, aes(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17), fill = "#FFCC00")) + geom_bar(stat="identity")
Why are the values not plotted in the first bar chart, and what is the simplest way to fix it? What is the idea behind this way of plotting with +, and what is it called?
With the ggplot2 package, calling ggplot() on its own only sets up the basic grid; it's like taking out a piece of graph paper before drawing a graph. Having the grid ready does not put any data on it. That's why running the following command will result in the empty grid in your first example:
ggplot(data=testData, aes(x=x, y=y, fill = "#FFCC00"))
It's not the same as using a function like plot() or hist(), which prep the grid and plot the data at the same time:
plot(x=seq_along(testData$y), y=testData$y)
hist(testData$y)
The "+" in ggplot adds another component (a geom, a scale, a theme and so on) on top of the first blank grid. Each geom or stat added with a "+" is typically called a layer.
So, if we want to make a simple scatterplot, we add points on top of a grid:
testData<-data.frame(x=c(1:6), y=c(10,6,9,28,10,17))
ggplot(data=testData,aes(x=x,y=y)) +
geom_point()
Output:
If we want to add lines to that scatterplot, we can just add one line of code:
ggplot(data=testData,aes(x=x,y=y)) +
geom_point() +
geom_line()
Output:
We can keep adding layers like this if we want. Just note that layers are drawn in the order you add them (i.e. earlier layers sit underneath the ones added after them):
ggplot(data=testData,aes(x=x,y=y)) +
geom_bar(stat="identity",fill="#00BFC4") +
geom_point() +
geom_line()
Output:
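The layering idea can also be checked directly. A rough sketch, assuming a ggplot object exposes its layers as a list through $layers (that holds for the ggplot2 versions I know of, but treat it as an assumption):

```r
library(ggplot2)

testData <- data.frame(x = 1:6, y = c(10, 6, 9, 28, 10, 17))

# ggplot() alone: data and aesthetics are set up, but nothing is drawn yet
empty <- ggplot(testData, aes(x = x, y = y))
length(empty$layers)

# Each geom added with "+" appends one layer to the plot object
with_points <- empty + geom_point()
with_both   <- with_points + geom_line()
length(with_both$layers)
```

The first length() call should report 0 layers and the second 2. Because the plot is an ordinary object, it can be stored in a variable and extended later, which is what the bar <- ggplot(...) line in the question does.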
Also, note that it's recommended not to call your data multiple times within a ggplot call; that can lead to errors.
Don't use:
ggplot(data=testData, aes(x=c("a","b","c","d","e","f"),
y=c(10,6,9,28,10,17), fill = "#FFCC00")) +
geom_bar(stat="identity")
#or
ggplot(data=testData, aes(x=testData$x, y=testData$y, fill = "#FFCC00")) +
geom_bar(stat="identity")
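Instead use (a constant colour such as fill goes outside aes(), otherwise it is treated as a data value rather than a colour):
ggplot(data=testData, aes(x=x, y=y)) +
geom_bar(stat="identity", fill="#FFCC00")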
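One detail worth checking here: anything placed inside aes() is treated as data to be mapped through a scale, so a colour string inside aes() does not set that colour. A small sketch of the difference, using layer_data() from ggplot2 to inspect the fill that actually gets drawn (the object names mapped and fixed are just for illustration):

```r
library(ggplot2)

testData <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                       y = c(10, 6, 9, 28, 10, 17))

# Mapped: "#FFCC00" is treated as a one-level category and gets a default colour
mapped <- ggplot(testData, aes(x = x, y = y, fill = "#FFCC00")) +
  geom_bar(stat = "identity")

# Set: the constant sits outside aes(), so the bars really are #FFCC00
fixed <- ggplot(testData, aes(x = x, y = y)) +
  geom_bar(stat = "identity", fill = "#FFCC00")

# Compare the fill values computed for each plot
unique(layer_data(mapped)$fill)
unique(layer_data(fixed)$fill)
```

The mapped version also produces a legend with a single entry labelled #FFCC00, which is usually a sign that a constant has ended up inside aes().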
If you want to plot data from a data frame(s) not called within the first ggplot() line, then simply add a data argument to the "layers" that use that different data frame, like this:
ggplot(data=testData,aes(x=x,y=y)) +
geom_bar(stat="identity",fill="#00BFC4") +
geom_point(data=differentDf, aes(x=x,y=y)) +
geom_line(data=differentDf, aes(x=x,y=y))
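Since differentDf is not defined above, here is a self-contained sketch with a made-up second data frame (its values are an assumption for illustration). It also shows that when the column names match, the layer can reuse the aes() from the ggplot() call and only needs its own data argument:

```r
library(ggplot2)

testData <- data.frame(x = 1:6, y = c(10, 6, 9, 28, 10, 17))
# Hypothetical second data frame with the same column names
differentDf <- data.frame(x = 1:6, y = c(12, 8, 7, 20, 15, 11))

p <- ggplot(testData, aes(x = x, y = y)) +
  geom_bar(stat = "identity", fill = "#00BFC4") +  # uses testData
  geom_point(data = differentDf) +                 # aes(x, y) is inherited
  geom_line(data = differentDf)

# Check which data each layer ended up with
layer_data(p, 1)$y
layer_data(p, 2)$y
```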

Adding text to facetted histogram

Using ggplot2 I have made facetted histograms using the following code.
library(ggplot2)
library(plyr)
df1 <- data.frame(monthNo = rep(month.abb[1:5],20),
classifier = c(rep("a",50),rep("b",50)),
values = c(seq(1,10,length.out=50),seq(11,20,length.out=50))
)
means <- ddply (df1,
c(.(monthNo),.(classifier)),
summarize,
Mean=mean(values)
)
ggplot(df1,
aes(x=values, colour=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo,ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1)
The vertical line showing means per month is to stay.
But I want to also add text over these vertical lines displaying the mean values for each month. These means are from the 'means' data frame.
I have looked at geom_text and I can add text to plots. But it appears my circumstance is a little different and not so easy. It's a lot simpler to add text in some cases where you just add values of the plotted data points. But cases like this when you want to add the mean and not the value of the histograms I just can't find the solution.
Please help. Thanks.
Having noted the possible duplicate (another answer of mine), the solution here might not be as (initially/intuitively) obvious. You can do what you need if you split the geom_text call into two (for each classifier):
ggplot(df1, aes(x=values, fill=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo, ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="a",]) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="b",])
I'm assuming you can format the numbers to the appropriate precision and place them on the y-axis where you need to with this code.
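As a possible variation on the above (not the answer's exact method), the labels can be formatted up front and drawn with a single geom_text() call. This sketch swaps ddply for base R's aggregate(), adds a label column to means (an assumed name), and pins the text to the top of each panel with y = Inf:

```r
library(ggplot2)

df1 <- data.frame(
  monthNo    = rep(month.abb[1:5], 20),
  classifier = c(rep("a", 50), rep("b", 50)),
  values     = c(seq(1, 10, length.out = 50), seq(11, 20, length.out = 50))
)

# Base-R alternative to ddply: one mean per month/classifier pair
means <- aggregate(values ~ monthNo + classifier, data = df1, FUN = mean)
names(means)[names(means) == "values"] <- "Mean"
means$label <- sprintf("%.1f", means$Mean)  # one decimal place

p <- ggplot(df1, aes(x = values, fill = classifier)) +
  geom_histogram(bins = 30) +
  facet_wrap(~monthNo, ncol = 1) +
  geom_vline(data = means, aes(xintercept = Mean, colour = classifier),
             linetype = "dashed") +
  # inherit.aes = FALSE keeps the histogram's x and fill out of the text layer
  geom_text(data = means, aes(x = Mean, y = Inf, label = label),
            vjust = 1.5, inherit.aes = FALSE)
```

Because means carries the monthNo column, facet_wrap() places each label in the matching panel.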

want to layer aes in ggplot2

I would like to plot another series of data on top of a current graph. The additional data only contains information for 3 (out of 6) spp, which are used in the facet_wrap.
The other series of data is currently a column (in the same data file).
Current graph:
ped.num <- ggplot(data, aes(ped.length, seeds.inflorstem))
ped.num + geom_point(size=2) + theme_bw() + facet_wrap(~spp, scales = "free_y")
Additional layer would be:
aes(ped.length, seeds.filled)
I feel I should be able to plot them using the same y-axis, because they have just slightly smaller values. How do I go about adding this layer?
@ialm's solution should work fine, but I recommend calling the aes function separately in each geom_* because it makes the code easier to read.
ped.num <- ggplot(data) +
geom_point(aes(x=ped.length, y=seeds.inflorstem), size=2) +
theme_bw() +
facet_wrap(~spp, scales="free_y") +
geom_point(aes(x=ped.length, y=seeds.filled))
(You'll always get better answers if you include example data, but I'll take a shot in the dark)
Since you want to plot two variables that are on the same data.frame, it's probably easiest to reshape the data before feeding it into ggplot:
library(reshape2)
# Melting data gives you exactly one observation per row - ggplot likes that
dat.melt <- melt(dat,
id.var = c("spp", "ped.length"),
measure.var = c("seeds.inflorstem", "seeds.filled")
)
# Plotting is slightly different - instead of explicitly naming each variable,
# you'll refer to "variable" and "value"
ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
geom_point(size=2) +
theme_bw() +
facet_wrap(~spp, scales = "free_y")
The seeds.filled values should plot only on the facets for the corresponding species.
I prefer this to Drew's (totally valid) approach of explicitly mapping different layers because you only need a single geom_point() whether you have two variables or twenty and it's easy to map a variety of aesthetics to variable.
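Since no example data was posted, here is a small sketch of what the melt step produces, using made-up numbers (the dat values below are assumptions). Species C has no seeds.filled measurements, and na.rm = TRUE drops those rows, so that species only contributes seeds.inflorstem points:

```r
library(reshape2)

# Hypothetical data: species C has no seeds.filled measurements
dat <- data.frame(
  spp              = rep(c("A", "B", "C"), each = 2),
  ped.length       = c(1.2, 1.5, 2.0, 2.3, 0.8, 1.1),
  seeds.inflorstem = c(10, 12, 20, 22, 5, 6),
  seeds.filled     = c(8, 9, 15, 18, NA, NA)
)

dat.melt <- melt(dat,
                 id.vars      = c("spp", "ped.length"),
                 measure.vars = c("seeds.inflorstem", "seeds.filled"),
                 na.rm        = TRUE)  # drop the missing seeds.filled rows

# Rows per species and variable after melting
table(dat.melt$spp, dat.melt$variable)
```

The melted table can then go straight into the ggplot() call above, with color = variable separating the two series.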

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are a couple of things I need to accomplish, and I don't know where to start.
I would like the bars not to be placed at their corresponding nseqs values, but rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
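To illustrate why the factor conversion gives ascending order, here is a sketch with made-up long-format data (the ggfdata values below are assumptions, not the asker's data). factor() sorts numeric input by value before turning it into labels, so the bars end up side by side in increasing nseqs order:

```r
library(ggplot2)

# Hypothetical long-format data: three samples in two groups
ggfdata <- data.frame(
  group    = c("g1", "g1", "g1", "g1", "g2", "g2"),
  nseqs    = c(300, 300, 20, 20, 1500, 1500),
  variable = rep(c("taxonA", "taxonB"), 3),
  value    = c(0.4, 0.6, 0.7, 0.3, 0.5, 0.5)
)

# Levels are sorted numerically first, then stored as labels
ggfdata$nseqs <- factor(ggfdata$nseqs)
levels(ggfdata$nseqs)

p <- ggplot(ggfdata, aes(x = nseqs, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~group, scales = "free_x")  # each panel keeps only its own bars
```

One caveat: if two samples share the same nseqs they would be stacked into the same bar, in which case ordering the sample column by nseqs might be the safer route.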

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails I will accept an answer that tells me how to get the legend back on to the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me the same plots with different sets of colors.
Provide a vector of colours to the values argument of scale_color_manual() to map the discrete values to manually chosen colours:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
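With seven groups it may be easier to name the colours so that each one is tied to a specific factor level rather than to its position. A sketch with simulated data standing in for the scanned files (the df values and the name group_colours are assumptions):

```r
library(ggplot2)

set.seed(45)
# Hypothetical stand-in for the scanned data
df <- data.frame(
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100)),
  val = c(rnorm(100), rnorm(100, mean = 2, sd = 2))
)

# Names tie each colour to a factor level, whatever order they are listed in
group_colours <- c("2. NonCands" = "blue", "1. Candidates" = "red")

p <- ggplot(df, aes(x = val, color = grp)) +
  geom_density() +
  scale_color_manual(values = group_colours)
```

The legend is kept because colour is still mapped from grp inside aes(); the scale only changes which colours the levels map to.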
