ggplot | Legend not filled - r

This question is different from this one and this one in the sense that I do get a legend. However, the keys in the legend only have coloured borders; their area is not filled.
I am fairly sure there is an easy argument I forgot to specify, but I have been looking for it for about 2 hours now and am slowly going crazy. Maybe you can help.
First, a reproducible example:
library(ggplot2)
set.seed(1234)
#this only generates coordinates, in my case 72 latitude and 144 longitude ones. ignore it if you want
df <- data.frame(lon = rep(1:144/144*360-180,72), lat = rep(1:72/72*180-90, each = 144), val = runif(72*144) )
world <- map_data("world")
# This function is supposed to colour all points where the val column is less than a specified value in blue, and the others in red.
#' @param cutoff a cutoff value, where all values less than the value are plotted in blue and all coefficients greater than it are plotted in red
#' @return A plot object of a map of the world where the color indicates the value
#' @export
plot_with_cutoff = function(df, cutoff = quantile(df$val, 0.05)){
  df$indicator <- as.factor(ifelse(df$val<cutoff,0,1))
  plot <- ggplot() + geom_tile(data = df, mapping = aes(x = lon, y = lat, color = indicator)) +
    coord_fixed(1.4)+ scale_color_manual("Higher or lower than 5% quantile", values = c("blue","red")) +
    geom_polygon(data = world, aes(x=long, y = lat, group = group), fill = NA, color = "black")
  return(plot)
}
plot_with_cutoff(df)
As you can see, the function works as requested (it's not nice to look at, but it's only an example after all; the data is randomly generated, which it usually isn't). BUT LOOK AT THE LEGEND!
I don't know how to get the "squares" to be filled, and I honestly have no idea what else to do, so any help is greatly appreciated! Thanks in Advance!!!
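One likely fix, sketched here rather than confirmed by the asker: geom_tile() draws the interior of each tile with the fill aesthetic, while colour only controls the outline, so mapping indicator to fill and switching to scale_fill_manual() should fill both the tiles and the legend keys. The coarser grid and the name plot_filled are made up for illustration, and the world outline is left out to keep the sketch short.

```r
library(ggplot2)
set.seed(1234)

# Small illustrative grid (assumption: any regular lon/lat grid behaves the same)
df <- expand.grid(lon = seq(-175, 175, by = 10), lat = seq(-85, 85, by = 10))
df$val <- runif(nrow(df))

# Hypothetical variant of plot_with_cutoff() that maps the indicator to fill
plot_filled <- function(df, cutoff = quantile(df$val, 0.05)) {
  df$indicator <- factor(ifelse(df$val < cutoff, 0, 1))
  ggplot(df, aes(x = lon, y = lat, fill = indicator)) +
    geom_tile() +
    coord_fixed(1.4) +
    # Named values tie each factor level to its colour explicitly
    scale_fill_manual("Higher or lower than 5% quantile",
                      values = c("0" = "blue", "1" = "red"))
}

p <- plot_filled(df)
p
```

The geom_polygon() layer from the original function can be added back unchanged, since it sets fill = NA as a constant and does not take part in the fill scale.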

Related

How to fix overlapping hexagons with geom_hex() and ggplot()?

I am plotting some data, basically I have coordinates and a value for each point. I wanted to make a hexagon map, with each hexagon averaging all the point values that correspond to that hexagon.
I managed to produce the map, but some of the hexagons are overlapping and I am not sure how to fix it.
Here is my code:
pp = ggplot(df, aes(x = lon, y = lat, fill=value, group=value)) +
geom_hex(bins = 50, linewidth = 10)
pp
And the plot:
If you want the hexagons to be colored according to the average value, you will need stat_summary_hex, passing the numeric value to the z aesthetic, which by default is averaged in each hex bin.
Don't group by value - this effectively creates a layer of hexbins for each value, and this is what leads to the bins being in different positions in each group. Also, the values can't be averaged if they are in different groups.
library(ggplot2)
ggplot(df, aes(x = lon, y = lat)) +
stat_summary_hex(aes(z = as.numeric(as.character(value))),
bins = 50, linewidth = 10) +
scale_fill_gradientn(colors = scales::hue_pal()(5))
Note that, at the time of writing, the latest CRAN version of ggplot2 has an issue with hex-binning, and you will need to install the development version to get a decent result here. See this question for further details.
Created on 2023-01-04 with reprex v2.0.2
Data used
set.seed(1)
df <- data.frame(lon = rnorm(1000, 5.5), lat = rnorm(1000, 52.5),
value = factor(sample(0:4, 1000, TRUE)))

feature visualization in tsne plot

I have a table (table 1) where each row corresponds to the feature vector of a gene in a particular patient. The patient IDs are located in the first column (label), while the gene index is located in the second column (geneIndex). The remaining columns hold the feature values (128 dimensions overall).
I was able to perform the tsne reduction on these data to 2D and label clusters according to patient IDs. Here is the code:
library(Rtsne)
experiment<- read.table("test.txt", header=TRUE, sep= "\t")
metadata <- data.frame(sample_id = rownames(experiment),
colour = experiment$label)
data <- as.matrix(experiment[,2:129])
set.seed(1)
tsne <- Rtsne(data)
df <- data.frame(x = tsne$Y[,1],
y = tsne$Y[,2],
colour = metadata$colour)
library(ggplot2)
ggplot(df, aes(x, y, colour = colour)) +
geom_point()
However, my goal is to visualize feature vectors related to geneIndex. For example, I would like to pinpoint geneIndex "3" in red color, while the rest of the points on the plot will have grey color.
I would appreciate any suggestions!
Thank you!
Looking at the data, it seems there are not a lot of 3's, so if you just plot the selected gene in red and all the others in a transparent grey, I think it's hard to see:
df$geneIndex = experiment$geneIndex
plotIndex = function(data,selectedGene){
data$Gene = ifelse(data$geneIndex == selectedGene,selectedGene,"others")
ggplot(data, aes(x, y, colour = Gene))+
geom_point(alpha=0.3,size=1)+
scale_color_manual(values=c("#FF0000E6","#BEBEBE1A"))+
theme_bw()
}
plotIndex(df,3)
Maybe try circling the points by plotting them again, in combination with a new legend:
library(ggnewscale)
plotIndex = function(data,selectedGene){
subdf = subset(data,geneIndex == selectedGene)
ggplot(data, aes(x, y, colour = colour)) +
geom_point(alpha=0.3,size=2,shape=20)+
new_scale_color()+
geom_point(data=subdf,
aes(col=factor(geneIndex)),
shape=1,stroke=0.8,size=2.1)+
scale_color_manual("geneIndex",values="red")+
theme_bw()
}
plotIndex(df,3)
You can drop the ggnewscale library if you don't need a legend. This package might be able to do the above too, but you would need to check.
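Another option, as a minimal sketch on simulated data: draw the "others" first and the selected gene on top, so the few red points are not buried under grey ones. The data frame below is a stand-in for the t-SNE output (it is assumed to have x, y and geneIndex columns), and highlight_gene is a hypothetical helper name.

```r
library(ggplot2)
set.seed(1)

# Simulated stand-in for the t-SNE coordinates (assumption, not the real data)
df <- data.frame(x = rnorm(300), y = rnorm(300),
                 geneIndex = sample(1:10, 300, replace = TRUE))

highlight_gene <- function(data, selected) {
  data$Gene <- ifelse(data$geneIndex == selected, as.character(selected), "others")
  # Named colours keep the mapping stable whatever the selected label sorts to
  pal <- setNames(c("red", "grey80"), c(as.character(selected), "others"))
  ggplot(mapping = aes(x, y, colour = Gene)) +
    # Background layer first, highlighted layer second so it is drawn on top
    geom_point(data = data[data$Gene == "others", ], size = 1) +
    geom_point(data = data[data$Gene != "others", ], size = 2) +
    scale_colour_manual(values = pal)
}

p <- highlight_gene(df, 3)
p
```

Naming the colour vector avoids relying on the alphabetical order of the two labels, which the unnamed values in the earlier function depend on.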

Geom_Point not portioning sizes on ggmap correctly

I am plotting crime levels onto a map of London, but the sizes and colours of the points are not proportionate to the figures I am using. I will leave the code and graphic below; if anyone can explain why this is happening I would really appreciate it. Thanks
London <- c(lon=-0.1278, lat=51.5074)
London_map <- get_googlemap(center = London, zoom = 10)
plot(London_map)
LondonRob <- read.csv("Rob London.csv")
Areas <- as.character(LondonRob$ï..Borough)
locs_geo <- geocode(Areas)
df <- cbind(LondonRob, locs_geo)
Map <- ggmap(London_map) +
geom_point(data = df,
aes(x = lon, y = lat,
size = LondonRob$X2012.13,
color = LondonRob$X2012.13))
Map1 <- Map + labs(color="Robbery")
Map1 <- Map1 + labs(size="Figures")
Map1
scale_size_continuous, which is the default when mapping a continuous variable to size, uses the range argument to determine the scale. By default, this is c(1, 6), and ggplot will translate the variation in the X2012.13 variable to a range from size 1 to size 6.
In many scenarios with points, you don't want this default. Instead, you want to be able to compare the sizes directly, with 0 being no point at all. For that you can use scale_size_area instead.
This has the added benefit that the area is scaled instead of the radius, which performs better visually.
In addition, you very likely should not be doing
aes(x = lon, y = lat, size = LondonRob$X2012.13, color = LondonRob$X2012.13)
but rather
aes(x = lon, y = lat, size = X2012.13, color = X2012.13)
As a general rule, never use $ inside aes: things can go horribly wrong without any errors or warnings.
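Both points can be sketched together on a plain ggplot (no map background, to avoid needing a Google API key). The coordinates and counts below are made up for illustration, and max_size = 10 is an arbitrary choice.

```r
library(ggplot2)

# Made-up counts for illustration; the real data would be the robbery figures
pts <- data.frame(lon = c(-0.30, -0.10, 0.05, 0.20),
                  lat = c(51.45, 51.50, 51.55, 51.60),
                  count = c(0, 100, 400, 900))

# Bare column names inside aes(), no pts$count
p <- ggplot(pts, aes(x = lon, y = lat, size = count, colour = count)) +
  geom_point() +
  # Point area is proportional to the value, and a value of 0 gets size 0
  scale_size_area(max_size = 10)

p
```

With ggmap the same scale can be added to the ggmap(London_map) + geom_point(...) call in the same way.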

Coloring differently adjacent regions on a map with ggplot

I am trying to make a map of different regions in R with ggplot, where adjacent regions don't have the same color, something akin to what the five color theorem describes.
Regions are groups of Californian counties, coded with a number (here the column c20). Using ggplot() and geom_map() with a qualitative scale to color the regions, the closest I get is this:
ggplot() + geom_map(data = data, aes(map_id = geoid, fill = as.factor(c20 %% 12)),
map = county) + expand_limits(x = county$long, y = county$lat) +
coord_map(projection="mercator") +
scale_fill_brewer(palette = "Paired") +
geom_text(data = distcenters, aes(x = clong, y = clat, label = cluster, size = 0.2))
The problem is that adjacent counties from different regions (i.e. with a different number), will sometimes be of the same color. For instance, around Los Angeles, counties from regions 33 & 45 are the same color, and we don't visually differentiate the regions.
Is there a way to do that with ggplot?
Try this. It takes a spatial polygons data frame and returns a vector of colours for each element such that no two adjacent polygons have the same colour.
You need to install the spdep package first.
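nacol <- function(spdf){
  # sample() wrapper that behaves sensibly when x has length 1
  resample <- function(x, ...) x[sample.int(length(x), ...)]
  # unique values, ignoring NA
  nunique <- function(x){unique(x[!is.na(x)])}
  np = nrow(spdf)
  # neighbour list: adjl[[k]] holds the indices of the polygons adjacent to k
  adjl = spdep::poly2nb(spdf)
  cols = rep(NA, np)
  cols[1]=1
  nextColour = 2
  for(k in 2:np){
    # colours already taken by neighbours of polygon k
    adjcolours = nunique(cols[adjl[[k]]])
    if(length(adjcolours)==0){
      # no coloured neighbours yet: reuse any colour already in use
      cols[k]=resample(cols[!is.na(cols)],1)
    }else{
      avail = setdiff(nunique(cols), nunique(adjcolours))
      if(length(avail)==0){
        # every existing colour is taken by a neighbour: open a new one
        cols[k]=nextColour
        nextColour=nextColour+1
      }else{
        cols[k]=resample(avail,size=1)
      }
    }
  }
  return(cols)
}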
Test:
library(spdep)
example(columbus)
columbus$C = nacol(columbus)
plot(columbus,col=columbus$C+1)
This is fairly late, but when searching for the same issue, I found a dev package called MapColoring.
It does exactly what you asked for.
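To see the greedy idea without any spatial packages, here is a minimal base R sketch on a plain adjacency list. It is a simplified variant, not the function above: it always picks the smallest free colour instead of sampling one at random, and greedy_colour and the toy graph are made up for illustration.

```r
# Greedy colouring on an adjacency list: adj[[k]] holds the neighbours of node k
greedy_colour <- function(adj) {
  n <- length(adj)
  cols <- rep(NA_integer_, n)
  for (k in seq_len(n)) {
    # colours already used by neighbours that have been coloured so far
    used <- unique(cols[adj[[k]]])
    used <- used[!is.na(used)]
    # smallest positive integer not used by a neighbour
    candidates <- setdiff(seq_len(length(used) + 1L), used)
    cols[k] <- candidates[1]
  }
  cols
}

# Toy graph: a 4-cycle 1-2-3-4-1 plus the diagonal 1-3
adj <- list(c(2L, 4L, 3L), c(1L, 3L), c(2L, 4L, 1L), c(3L, 1L))
cols <- greedy_colour(adj)

# Check that no edge joins two nodes of the same colour
clash <- any(vapply(seq_along(adj),
                    function(k) any(cols[adj[[k]]] == cols[k]),
                    logical(1)))
print(cols)
print(clash)
```

A greedy pass like this guarantees that neighbours differ, but not that the number of colours is minimal, so a map may end up with more colours than the theoretical bound. The result could be passed to ggplot as a factor in the fill aesthetic.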

Adding points to GGPLOT2 Histogram

I'm trying to produce a histogram that illustrates observed points (a subset) on a histogram of all observations. To make it meaningful, I need to color each point differently and place a legend on the plot. My problem is, I can't seem to get a scale to show up on the plot. Below is an example of what I've tried.
subset <-1:8
results = data.frame(x_data = rnorm(5000),TestID=1:5000)
m <- ggplot(results,aes(x=x_data))
m+stat_bin(aes(y=..density..))+
stat_density(colour="blue", fill=NA)+
geom_point(data = results[results$TestID %in% subset,],
aes(x = x_data, y = 0),
colour = as.factor(results$TestID[results$TestID %in% subset]),
size = 5)+
scale_colour_brewer(type="seq", palette=3)
Ideally, I'd like the points to be positioned on the density line (but I'm really unsure of how to make that work, so I'll settle for positioning them at y = 0). What I need most urgently is a legend which indicates the TestID that corresponds to each of the points in subset.
Thanks a lot to anyone who can help.
This addresses your second point - if you want a legend, you need to include that variable as an aesthetic and map it to a variable (colour in this case). So all you really need to do is move colour = as.factor(results$TestID[results$TestID %in% subset]) inside the call to aes() like so:
ggplot(results,aes(x=x_data)) +
stat_bin(aes(y=..density..))+
stat_density(colour="blue", fill=NA)+
geom_point(data = results[results$TestID %in% subset,],
aes(x = x_data,
y = 0,
colour = as.factor(results$TestID[results$TestID %in% subset])
),
size = 5) +
scale_colour_brewer("Fancy title", type="seq", palette=3)
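For the first point (placing the points on the density line), one possible sketch is to compute a kernel density estimate with density() and interpolate it at the selected x values with approx(). This assumes ggplot2 3.3.0 or later for after_stat(); the names picked and dens are made up here, and the points land only approximately on the drawn line because ggplot computes its own estimate over the data range.

```r
library(ggplot2)
set.seed(42)

results <- data.frame(x_data = rnorm(5000), TestID = 1:5000)

# Subset once, up front, so aes() can use bare column names
picked <- results[results$TestID %in% 1:8, ]
picked$TestID <- factor(picked$TestID)

# Interpolate a kernel density estimate at the picked x values
dens <- density(results$x_data)
picked$dens <- approx(dens$x, dens$y, xout = picked$x_data)$y

p <- ggplot(results, aes(x = x_data)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_line(stat = "density", colour = "blue") +
  # colour is mapped inside aes(), so a legend is generated
  geom_point(data = picked, aes(y = dens, colour = TestID), size = 5) +
  scale_colour_brewer("TestID", palette = "Set1")

p
```

A qualitative palette such as "Set1" is used because TestID is a label rather than an ordered quantity.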
