Plot with a grid - r

I am looking for a type of plot that is essentially a grid. For example, there will be 10 columns and 50 rows. For example, something like this:
Each of the boxes (in this case, 10*50 = 500) will have a unique value that I will be providing via a data frame. Based on the unique values, I'll have a function that will assign a colour to each box. So then it becomes a grid to visualize "the range" of each box. I'd also need to label each of the columns (probably vertically so all labels fit) and rows (horizontally).
I just don't know what kind of plot that will be and I don't know if any libraries do this. I'm just looking for some help in finding something that does this. I'd appreciate some help if possible.

How about heatmap?
m=matrix(runif(12),3,4)
rownames(m)=c("Me","You","Him")
colnames(m)=c("We","Us","Them","I")
heatmap(m,NA,NA)
Note that it works on a matrix and not a data frame, because all the values have to be numbers; a data frame whose columns are all numeric can be converted with as.matrix() first.
See the help for other options.

Look at the image function in the graphics package, or the rasterImage function if you want more control.
You could also build the plot up from scratch using the rect function.
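A minimal sketch of the image() route, assuming a 50 x 10 matrix of made-up values (the names m and z and the row/column labels are placeholders, not anything from the question). image() draws the rows of its input along the x axis, so the matrix is transposed and its rows reversed to keep row 1 at the top:

```r
# Hypothetical data: 50 rows x 10 columns of values between 0 and 1
set.seed(1)
m <- matrix(runif(500), nrow = 50, ncol = 10,
            dimnames = list(paste0("row", 1:50), paste0("col", 1:10)))

# Transpose and flip so the picture has the same orientation as the matrix
z <- t(m[nrow(m):1, ])

image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = z,
      axes = FALSE, xlab = "", ylab = "", col = heat.colors(12))
# las = 2 writes the column labels vertically; las = 1 keeps row labels horizontal
axis(1, at = seq_len(ncol(m)), labels = colnames(m), las = 2)
axis(2, at = seq_len(nrow(m)), labels = rev(rownames(m)), las = 1, cex.axis = 0.5)
box()
```

The col argument takes any vector of colours, so a custom palette can be swapped in for heat.colors(12).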

I would go to ggplot2 for this as it allows a high degree of flexibility. In particular geom_tile is useful. If you actually want the panel lines you can comment out the theme(panel.grid.major = element_blank()) + and theme(panel.grid.minor = element_blank()) + lines and of course you can specify the colours as well. The text in each cell is optional; comment out the geom_text call if you don't need that. Note that you can control the size of the plot (rows and columns) simply by resizing the plot window or - if you want to output to a file using png() - by specifying the width and height arguments.
library(ggplot2)
set.seed(1234)
num.els <- 5
mydf <- data.frame(category1 = rep(LETTERS[1:num.els], each = num.els),
                   category2 = rep(1:num.els, num.els),
                   value = runif(num.els^2, 0, 100))
p <- ggplot(mydf, aes(x = category1,
                      y = category2,
                      fill = value)) +
  geom_tile() +
  # optional: print the value in each cell
  geom_text(aes(label = round(value, 2)), size = 4, colour = "black") +
  scale_fill_gradient2(low = "blue", high = "red",
                       limits = c(min(mydf$value), max(mydf$value)),
                       midpoint = median(mydf$value)) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_reverse() +
  theme(panel.grid.minor = element_blank()) +
  theme(panel.grid.major = element_blank()) +
  theme(axis.ticks = element_blank()) +
  theme(panel.background = element_rect(fill = "transparent")) +
  theme(legend.position = "none")
print(p)
Output:
And resized:
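For the 10 x 50 case in the question, where a function picks the colour of each box, something along these lines might work. This is only a sketch: the matrix m, the function pick_colour() and its thresholds are invented for illustration. scale_fill_identity() tells ggplot2 to use the colour strings as they are, and the x labels are rotated so they all fit:

```r
library(ggplot2)

# Hypothetical input: 50 x 10 values with row and column labels
set.seed(42)
m <- matrix(runif(500, 0, 100), nrow = 50, ncol = 10,
            dimnames = list(paste0("row", 1:50), paste0("col", 1:10)))

# Long format: one line per cell
long <- as.data.frame(as.table(m), responseName = "value")
names(long)[1:2] <- c("row", "column")

# Hypothetical colour rule; replace with your own function
pick_colour <- function(v) {
  ifelse(v < 33, "#2C7BB6", ifelse(v < 66, "#FFFFBF", "#D7191C"))
}
long$colour <- pick_colour(long$value)

p <- ggplot(long, aes(x = column, y = row, fill = colour)) +
  geom_tile(colour = "grey80") +
  scale_fill_identity() +
  # put the first row at the top
  scale_y_discrete(limits = rev(levels(long$row))) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(p)
```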

Let's say you have a data frame with "x" and "y" coordinates for each cell of the grid and a variable "z" for each cell, loaded in R under the name "intlgrid":
head(intlgrid)
x y z
243.742 6783.367 0.0035285
244.242 6783.367 0.0037111
244.742 6783.367 0.0039073
"..."
"so on..."
With the ggplot2 package you can easily plot your raster. Install it first:
install.packages("ggplot2")
Once ggplot2 is installed, load it:
library(ggplot2)
Now the code:
ggplot(intlgrid, aes(x,y, fill = z)) + geom_raster() + coord_equal()
And then you get your grid plotted.
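If you want to try this without the real data, here is a small stand-in. The name intlgrid_demo and the z values are made up; only the 0.5 spacing is taken from the head() output above. Note that geom_raster() expects evenly spaced cells; for irregular spacing geom_tile() is the more general choice.

```r
library(ggplot2)

# Hypothetical regular grid with 0.5 spacing, 10 x 5 cells
set.seed(7)
intlgrid_demo <- expand.grid(
  x = seq(243.742, by = 0.5, length.out = 10),
  y = seq(6783.367, by = 0.5, length.out = 5)
)
intlgrid_demo$z <- runif(nrow(intlgrid_demo), 0.003, 0.004)

p <- ggplot(intlgrid_demo, aes(x, y, fill = z)) +
  geom_raster() +
  coord_equal()
print(p)
```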

Related

How to turn my legend horizontal as opposed to vertical with ggplot2?

I am struggling to understand why legend.direction = "horizontal" is not rotating my legend; it is still displayed vertically. Any help would be massively appreciated.
library(phyloseq)
library(fantaxtic) # provides get_top_taxa(), name_taxa() and fantaxtic_bar()
library(ggplot2)
##phylum level
ps_tmp <- get_top_taxa(physeq_obj = ps.phyl, n = 10, relative = TRUE, discard_other = FALSE, other_label = "Other")
ps_tmp <- name_taxa(ps_tmp, label = "Unknown", species = TRUE, other_label = "Other")
phyl <- fantaxtic_bar(ps_tmp, color_by = "phylum", label_by = "phylum",facet_by = "TREATMENT", other_label = "Other", order_alg = "as.is")
phyl + theme(legend.direction = "horizontal", legend.position = "bottom")
Legends for discrete values don't have a formal direction per se and are positioned however ggplot2 decides it can best fit with your data. This is why things like legend.direction won't work here. I don't have the phyloseq package or access to your particular data, so I'll show you how this works and how you can mess with the legend using a reproducible example dataset.
library(ggplot2)
set.seed(8675309)
df <- data.frame(x=LETTERS[1:8], y=sample(1:100, 8))
p <- ggplot(df, aes(x, y, fill=x)) + geom_col()
p
By default, ggplot is putting our legend to the right and organizes it vertically as one column. Here's what happens when we move the legend to the bottom:
p + theme(legend.position="bottom")
Now ggplot thinks it's best to put that legend into 4 columns, 2 rows each. As u/Tech Commodities mentioned, you can use the guides() functions to specify how the legend looks. In this case, we will specify to have 2 columns instead of 4. We only need to supply the number of columns (or rows), and ggplot figures out the rest.
p + theme(legend.position="bottom") +
guides(fill=guide_legend(ncol=2))
So, to get a "horizontally-arranged" legend, you just need to specify that there should be only one row:
p + theme(legend.position="bottom") +
guides(fill=guide_legend(nrow=1))

ggplot2 and R - Applying custom colors to a multi group histogram in long format

I've made a histogram graph that shows the distribution of lidar returns per elevation for three lidar scans I have done.
I've converted my data to long format, with:
one column called 'value', describing the z position of each point
one column called 'variable', containing the name of each scan group
In the attached image you can see the histograms of my three scan groups. I am currently using viridis to color the histogram by scan group (ie. the name of the scan in the variable column). However, I want to match the colours in the graph with colours I already have.
How might I do this?
The hex colours I'd like to use for my three histograms are:
lightgreen = "#62FE96"
lightred = "#FE206B"
darkpurple = "#62278E"
A link to my data - 'density2'
My current code:
library(tidyverse)
library(viridisLite)
library(viridis)
# histogram
p <- density2 %>%
ggplot( aes(x=value,color = variable, show.legend = FALSE)) +
geom_histogram(binwidth = 1, alpha = 0.5, position="identity") +
scale_color_viridis(discrete =TRUE) +
scale_fill_viridis(discrete=TRUE) +
theme_bw() +
labs(fill="") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p + scale_y_sqrt() + theme(legend.position="none") + labs(y = "data pts", x = "elevation (m)")
Any help would be most appreciated!
Delete the scale_color_viridis and scale_fill_viridis lines - these are applying the Viridis color scale. Replace with scale_fill_manual(values = c(lightgreen, lightred, darkpurple)). And in your aesthetic mapping replace color = variable with fill = variable. For a histogram, color refers to the color of the lines outlining each bar, and fill refers to the color each bar is filled in.
This should leave you with:
p <- density2 %>%
ggplot(aes(x = value, fill = variable)) +
geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
scale_fill_manual(values = c(lightgreen, lightred, darkpurple)) +
theme_bw() +
labs(fill = "") +
theme(panel.grid = element_blank())
p + scale_y_sqrt() +
theme(legend.position = "none") +
labs(y = "data pts", x = "elevation (m)")
I've also done some other clean-up. show.legend = FALSE does not belong inside aes() - and your theme(legend.position = "none") should take care of it.
I did not download your data, save it in my working directory, import it into R, and test this code on it. If you need more help, please post a small subset of your data in a copy/pasteable format (e.g., dput(density2[1:20, ]) for the first 20 rows---choose a suitable subset) and I'll be happy to test and adjust.
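One thing that may help if the colours end up on the wrong groups: an unnamed values vector is matched to the groups in level order, whereas a named vector is matched by group name. A sketch with simulated data, since I have not seen density2; the scan names scan_a, scan_b and scan_c are placeholders for whatever is in your variable column:

```r
library(ggplot2)

# Stand-in for density2 (hypothetical scan names and elevations)
set.seed(1)
density2_demo <- data.frame(
  value = c(rnorm(300, 10, 2), rnorm(300, 14, 3), rnorm(300, 18, 2)),
  variable = rep(c("scan_a", "scan_b", "scan_c"), each = 300)
)

# Names must match the values in the 'variable' column
scan_cols <- c(scan_a = "#62FE96", scan_b = "#FE206B", scan_c = "#62278E")

p <- ggplot(density2_demo, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
  scale_fill_manual(values = scan_cols) +
  theme_bw() +
  labs(y = "data pts", x = "elevation (m)")
print(p)
```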

R: ggplot2 density plot shows wrong fill colors

I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get the colors in the right place, you need to specify fill = red_color or fill = green_color (as well as alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), like this:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your dataframes together, reshape them into a longer format (much more appropriate to ggplot) and then add color column that you can use with scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does that answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), because it's inside aes() ggplot essentially creates a new column of data filled with the green_color values in your green_data_frame, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color values in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by putting the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data.
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) keep constants (constant color, constant alpha, etc.) outside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
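Putting points (a) to (c) together, one possible version looks like this. It is a sketch, not the only way to do it; the names long, series and series_cols are mine. The sort() call at the top just shows the text ordering that caused the swap in the original plot.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))

# The hex strings were treated as group labels and ordered as text,
# so "#008000" came before "#FF0000"
sort(c("#FF0000", "#008000"))

# (a) one long data frame with a column saying which series a value belongs to
long <- rbind(
  data.frame(series = "Density of red_variable", val = red_dataframe$red_variable),
  data.frame(series = "Density of green_variable", val = green_dataframe$green_variable)
)

# (c) colours set in the scale, matched to the series by name
series_cols <- c("Density of red_variable" = "#FF0000",
                 "Density of green_variable" = "#008000")

p <- ggplot(long, aes(x = val, fill = series)) +
  geom_density(alpha = 0.5) +  # (b) constant alpha stays outside aes()
  scale_fill_manual(name = "Legend", values = series_cols) +
  xlab("X value") +
  ylab("Density")
print(p)
```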

Creating a density histogram in ggplot2?

I want to create the next histogram density plot with ggplot2. In the "normal" way (base packages) is really easy:
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector,seq(0,1,by=0.1))
labels = 1:(length(breaks)-1)
den = density(vector)
hist(vector,
breaks=breaks,
col=rainbow(length(breaks)),
probability=TRUE)
lines(den)
With ggplot I have reached this so far:
seg <- cut(vector,breaks,
labels=labels,
include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..,
fill=seg)) +
geom_density(aes(x=vector,
y=..density..))
But the "y" scale has the wrong dimension. I have noted that the next run gets the "y" scale right.
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..)) +
geom_density(aes(x=vector,
y=..density..))
I just do not understand it. y=..density.. is there, that should be the height. So why on earth my scale gets modified when I try to fill it?
I do need the colours. I just want a histogram where the breaks and the colours of each block are directionally set according to the default ggplot fill colours.
Manually, I added colors to your percentile bars. See if this works for you.
library(ggplot2)
ggplot(df, aes(x=vector)) +
geom_histogram(breaks=breaks,aes(y=..density..),colour="black",fill=c("red","orange","yellow","lightgreen","green","darkgreen","blue","darkblue","purple","pink")) +
geom_density(aes(y=..density..)) +
scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
theme_bw() + theme(plot.title=element_text(size=20),
axis.title.y=element_text(size = 16, vjust=+0.2),
axis.title.x=element_text(size = 16, vjust=-0.2),
axis.text.y=element_text(size = 14),
axis.text.x=element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
fill=seg results in grouping. You are actually getting a different histogram for each value of seg. If you don't need the colours, you could use this:
ggplot(df) +
geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") +
geom_density(aes(x=vector,y=..density..))
If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
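A rough sketch of that last suggestion, assuming the same vector and breaks as in the question. The bin densities come from hist(plot = FALSE) and are drawn as rectangles, so the fill no longer splits the data into separate histograms. The data frame name bins is my own.

```r
library(ggplot2)

set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

# Let base R compute the density of each bin without plotting
h <- hist(vector, breaks = breaks, plot = FALSE)
bins <- data.frame(
  xmin = head(h$breaks, -1),
  xmax = tail(h$breaks, -1),
  density = h$density,
  seg = factor(seq_along(h$density))
)

p <- ggplot() +
  geom_rect(data = bins,
            aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
            colour = "black") +
  geom_density(data = data.frame(vector = vector), aes(x = vector))
print(p)

# Sanity check: the bar areas (height * width) add up to 1
sum(bins$density * (bins$xmax - bins$xmin))
```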
Or an option with ggpubr
library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA",
"#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)
The confusion about the y-axis might come from density being plotted rather than count. The bar heights are densities, so it is the areas of the bars (height times bin width) that sum to 1, not the heights themselves.

ggplot font size for different elements

I know that after I create a ggplot graph I can use theme_get() to return detail of all the theme elements. This has been very helpful in figuring out things like strip.text.x and the like. But I have two things I can't figure out:
1) In the following ggplot graphic, what is the name of the theme item representing the phrase "Percent of wood chucked by the woodchuck" as I want to resize it to a larger font:
2) How do I reformat the y axis labels to read 10%, 20%, ... instead of .1, .2, ...?
For 1), it is axis.title.y:
p + theme(axis.title.y = element_text(size = 25))
where p is an existing ggplot object.
I don't know about 2) off hand.
For (2) what you want is to use a formatter:
dat <- data.frame(x=1:10,y=1:10)
#For ggplot2 0.8.9
ggplot(dat,aes(x = x/10,y=y/10)) +
geom_point() +
scale_x_continuous(formatter = "percent")
# For ggplot2 0.9.0 and later (percent_format() comes from the scales package)
library(scales)
ggplot(dat,aes(x = x/10,y=y/10)) +
geom_point() +
scale_x_continuous(labels = percent_format())
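Since the question was about the y axis, here is the same idea applied there, with the title size from 1) included. This is a sketch on made-up data; dat and the axis title text are just for illustration.

```r
library(ggplot2)
library(scales)

dat <- data.frame(x = 1:10, y = (1:10) / 10)

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  # show 0.1, 0.2, ... as 10%, 20%, ...
  scale_y_continuous(labels = percent_format()) +
  ylab("Percent of wood chucked by the woodchuck") +
  # enlarge only the y axis title
  theme(axis.title.y = element_text(size = 25))
print(p)
```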

Resources