ggVennDiagram, define decimal places - r

Does anyone know how to change the number of decimal places displayed when using the ggVennDiagram function in R?
# Example code
install.packages("ggVennDiagram")
library(ggVennDiagram)
library(ggplot2) # provides theme() and element_blank()
my_species <- paste("species", 1:50, sep="")
set.seed(2)
x <-list(A = sample(my_species, 12),
B = sample(my_species, 20),
C = sample(my_species, 16),
D = sample(my_species, 2))
# venn plot
ggVennDiagram(x, label="percent") +
theme(axis.text = element_blank(),
legend.position = "none",
axis.ticks = element_blank(),
axis.title = element_blank())
From the source code (https://www.rdocumentation.org/packages/ggVennDiagram/versions/0.3/source), I see that the author defined two decimal places for label="percent".
Can I overwrite this in my R code, so that I have either no decimal places or just one?
# from function "plot_venn()" in source code of ggVennDiagram
counts <- counts %>%
mutate(percent=paste(round(.data$count*100/sum(.data$count),digits = 2),"%",sep=""))
Thank you very much in advance!

There is always a solution, but since this is hard-coded deep inside the package, it's going to get ugly.
In this case, one way to do it is to initialise the figure without percentages and then add them yourself the way ggVennDiagram would have. That requires a bit of backtracking through the code and reaching into the package's internals.
g <- ggVennDiagram(x, label=NULL) +
theme(axis.text = element_blank(),
legend.position = "none",
axis.ticks = element_blank(),
axis.title = element_blank())
g
## Notice label=NULL above. We add the labels ourselves like so:
library(dplyr) # provides %>% and mutate()
region_data <- ggVennDiagram:::four_dimension_ellipse_regions(n.sides=3000)
counts <- ggVennDiagram:::four_dimension_region_values(x)
polygon <- region_data[[1]]
center <- region_data[[2]]
counts <- counts %>%
mutate(percent=paste(round(.data$count*100/sum(.data$count),digits = 1),"%",sep="")) %>%
mutate(label = paste(.data$count,"\n","(",.data$percent,")",sep=""))
data <- merge(counts,center)
g + geom_label(aes_string(label="percent"),data=data,label.size = NA, alpha=.5)
(Note: the code above was copied from the package itself. The work lies in reverse engineering it and figuring out which bits you need, and in which order.)
You should let the package author know about this need and ask them to offer it as a function argument.
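The rounding step itself is plain R, so its effect can be checked in isolation before touching the plot. The following is a minimal sketch with made-up counts; `format_percent` is a hypothetical helper written for illustration, not part of ggVennDiagram.

```r
# Hypothetical helper mirroring the package's paste(round(...), "%") expression,
# with the number of digits exposed as an argument.
format_percent <- function(count, digits = 1) {
  paste0(round(count * 100 / sum(count), digits = digits), "%")
}

counts <- c(1, 2, 4)  # made-up region counts

two_digits <- format_percent(counts, digits = 2)  # what the package hard-codes
one_digit  <- format_percent(counts, digits = 1)
no_digits  <- format_percent(counts, digits = 0)

print(one_digit)
print(no_digits)
```

Note that `round()` drops trailing zeros (25.0 is printed as "25%"); if every label should show the same number of decimals, `sprintf("%.1f%%", value)` is one alternative.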

ggVennDiagram supports configuring the number of percent digits as of version 1.1. Update the package and set label_percent_digit as follows:
ggVennDiagram(x, label_percent_digit = 1, label = "percent")
See https://venn.bio-spring.info/using-ggvenndiagram#setting-region-label for more information.

ggplot - using a vector mask for a raster

I am trying to create a vector mask for a raster. The raster is a color gradient created elsewhere; here I am discussing only the vector mask.
Using the raster and sf packages seems to be overkill for this simple case. The best way I have come up with is to plot the vector object, ggsave it to a raster file, read it back and then overlay it on the original raster.
I would be happy to hear any better suggestions.
In any case, when I write the plot to the file there is always a small frame around it. It may not be visible when displaying the file on screen, but it is problematic in my case.
I could remove the frame, but I cannot rely on color alone and I am not sure that it is always the same size. Here is my example:
library(tidyverse)
library(reshape2)
library(bmp)
pol <- tibble(x = c(1, 3, 5, 4), y = c(3,5, 4, 1))
p <- ggplot(pol) +
geom_polygon(aes(x,y), fill = "red") +
theme(panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank())
ggsave("pol.bmp", p, dpi = "screen")
bmp <- read.bmp("pol.bmp")
bmp <- melt(bmp, varnames = c("y", "x")) %>%
mutate(value = as.factor(value))
ggplot(bmp) +
geom_raster(aes(x,y, fill = value)) +
theme(legend.position="none")
The initial plot
The rasterized plot (ignore colors)
Please advise

export::graph2office moves axis labels around

I have made plots in R (RStudio) with ggplot2. When I export them via export::graph2office, the labels are moved around. However, this only happens when I specify the font for the labels.
library (ggplot2)
library (export)
plot_data <- data.frame (a = runif (1:20), b = seq (1:20))
x11 (width = 3, height = 3)
ggplot (data = plot_data, mapping = aes (x = a, y = b)) +
geom_point () +
labs (x = "my x-label", y = "my y-label") +
theme (panel.background = element_blank(),
panel.border = element_rect (fill = NA, size = 0.7),
axis.ticks = element_line (color = "black", lineend = "round"),
axis.ticks.length = unit (2, "mm"),
axis.text = element_text (color = "black"),
plot.margin = unit(rep (0, 4), "cm"),
text = element_text (size=18,
family="ChantillyLH",
color = "black")
)
graph2office (file = "my_graph", type = "DOC")
Here, you can see the graph in R (to the right) and the exported graph in word (to the left):
The undesired behaviour is more obvious for the y-label in this example, but also the x-label is moved a bit. I wonder if there is a way to fix this.
The same happens when I specify another font family, for example family="Comic Sans MS":
EDIT: it even happens when no text command is given:
The answer probably is: yes, export::graph2office moves axis labels around (so do export::graph2ppt and export::graph2doc). There is no way to fix this. If you want to style your graphs in R and export them as-is into Office, the export::graph2office function is unfortunately not the way to go. However, the function can of course be used as a quick-and-dirty option to produce editable Office graphs.
If your goal is to export graphs in a more reliable manner, CairoSVG might be a much better option (see my answer here: Producing a vector graphics image (i.e. metafile) in R suitable for printing in Word 2007).

R: Counting how many polygons between two

I was trying to recreate a map showing how many municipalities away you are from Cracow:
and to change the city from Cracow to Wrocław. The map was done in GIMP.
I got a shapefile (available here: http://www.gis-support.pl/downloads/powiaty.zip). I read the documentation of shapefile packages such as maptools, rgdal and sf, but I couldn't find a function that counts this automatically, and I would rather not do it manually.
Is there a function to do that?
Credits: The map was done by Hubert Szotek on https://www.facebook.com/groups/mapawka/permalink/1850973851886654/
I am not that experienced at network analysis, so I must confess not to understand every single line of code as follows. But it works! A lot of the material was adapted from here: https://cran.r-project.org/web/packages/spdep/vignettes/nb_igraph.html
This is the final result:
Code
# Load packages
library(raster) # loads shapefile
library(igraph) # build network
library(spdep) # builds network
library(RColorBrewer) # for plot colour palette
library(ggplot2) # plots results
# Load Data
powiaty <- shapefile("powiaty/powiaty")
Firstly the poly2nb function is used to calculate neighbouring regions:
# Find neighbouring areas
nb_q <- poly2nb(powiaty)
This creates our spatial mesh, which we can see here:
# Plot original results
coords <- coordinates(powiaty)
plot(powiaty)
plot(nb_q, coords, col="grey", add = TRUE)
This is the bit where I am not 100% sure what is happening. Basically, it works out the shortest path between every pair of polygons in the network and returns a matrix of these distances.
# Sparse matrix
nb_B <- nb2listw(nb_q, style="B", zero.policy=TRUE)
B <- as(nb_B, "symmetricMatrix")
# Calculate shortest distance
g1 <- graph.adjacency(B, mode="undirected")
dg1 <- diameter(g1)
sp_mat <- shortest.paths(g1)
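To make the shortest-path step less opaque, here is a minimal base R sketch of the same idea on a toy network: counting how many borders have to be crossed to get from one region to every other. The five-region layout and the `hop_distance` helper are made up for illustration; igraph does this for all pairs at once and far more efficiently.

```r
# Toy neighbour structure for five regions: 1 - 2 - 3 - 4, plus 2 - 5
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(2, 5))
adj <- matrix(0, nrow = 5, ncol = 5)
adj[edges] <- 1          # mark each pair as neighbours
adj[edges[, 2:1]] <- 1   # and the reverse direction (undirected graph)

# Breadth-first search: level k holds every region reachable in exactly k steps
hop_distance <- function(adj, start) {
  dist <- rep(NA_integer_, nrow(adj))
  dist[start] <- 0L
  frontier <- start
  level <- 0L
  while (length(frontier) > 0) {
    level <- level + 1L
    # regions touching the current frontier that have not been visited yet
    touched <- colSums(adj[frontier, , drop = FALSE]) > 0
    frontier <- which(touched & is.na(dist))
    dist[frontier] <- level
  }
  dist  # NA marks regions that cannot be reached at all
}

from_region_1 <- hop_distance(adj, 1)
print(from_region_1)
```

Each row of the sp_mat matrix above corresponds to one such vector, computed with a different region as the starting point.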
Having made the calculations, the data can now be formatted to get into plotting format, so the shortest path matrix is merged with the spatial dataframe.
I wasn't sure what would be best to use as the ID for referring to datasets so I chose the jpt_kod_je variable.
# Name used to identify data
referenceCol <- powiaty$jpt_kod_je
# Rename spatial matrix
sp_mat2 <- as.data.frame(sp_mat)
names(sp_mat2) <- paste0("Ref", referenceCol)
sp_mat2$id <- rownames(powiaty@data) # added after renaming, so it stays the last column
# Add distance to shapefile data
powiaty@data <- cbind(powiaty@data, sp_mat2)
powiaty@data$id <- rownames(powiaty@data)
The data is now in a suitable format to display. Using the basic function spplot we can get a graph quite quickly:
displaylayer <- "Ref1261" # id for Krakow
# Plot the results as a basic spplot
spplot(powiaty, displaylayer)
I prefer ggplot for plotting more complex graphs as you can control the styling easier. However it is a bit more picky about how the data is fed into it, so we need to reformat the data for it before we build the graph:
# Or if you want to do it in ggplot
filtered <- data.frame(id = sp_mat2[,ncol(sp_mat2)], dist = sp_mat2[[displaylayer]])
ggplot_powiaty <- fortify(powiaty)
ggplot_powiaty <- merge(x = ggplot_powiaty, y = filtered, by = "id")
names(ggplot_powiaty)
And the plot. I have customised it a bit by removing elements which aren't required and added a background. Also, to make the region at the centre of the search black, I subset the data using ggplot_powiaty[ggplot_powiaty$dist == 0, ], and then plot this as another polygon.
ggplot(ggplot_powiaty, aes(x = long, y = lat, group = group, fill = dist)) +
geom_polygon(colour = "black") +
geom_polygon(data =ggplot_powiaty[ggplot_powiaty$dist == 0, ],
fill = "grey60") +
labs(title = "Distance of Counties from Krakow", caption = "Mikey Harper") +
scale_fill_gradient2(low = "#d73027", mid = "#fee08b", high = "#1a9850", midpoint = 10) +
theme(
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
panel.border = element_blank())
To plot for Wrocław as shown at the top of the post, just change displaylayer <- "Ref0264" and update the title.

Edit labels in tooltip for plotly maps using ggplot2 in r

I know this question has been asked a number of times, but I think some of the underlying syntax for plotly has changed since those questions were asked. Using ggplotly() to create a choropleth map gives the default tooltip of long, lat, group, and one of my variables from my aesthetics. I understand that the tooltip maps only what's in the aesthetics. All I want to do is customize the tooltip so it displays some of the variables in my dataset (including those not mapped to aesthetics) and not others (such as the coordinates). Below is a reproducible example and what I've tried so far. I followed the advice given in response to other questions, to no avail.
#Load dependencies
library(rgeos)
library(stringr)
library(rgdal)
library(maptools)
library(ggplot2)
library(plotly)
#Function to read shapefile from website
dlshape=function(shploc, shpfile) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
fp <- file.path(temp, f)
return(readOGR(".",shpfile))
})
}
austria <- dlshape(shploc="http://biogeo.ucdavis.edu/data/gadm2.8/shp/AUT_adm_shp.zip",
"AUT_adm1")[[1]]
#Create random data to add as variables
austria@data$example1<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
austria@data$example2<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
austria@data$example3<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
#Fortify shapefile to use w/ ggplot
austria.ft <- fortify(austria, region="ID_1")
data<-merge(austria.ft, austria, region="id", by.x = "id", by.y = "ID_1")
#Save as ggplot object
gg<-ggplot(data, aes(x = long, y = lat, fill = example1, group = group)) +
geom_polygon() + geom_path(color="black",linetype=1) +
coord_equal() +
scale_fill_gradient(low = "lightgrey", high = "darkred", name='Index') +xlab("")+ylab("") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
#Plot using ggplotly
ggplotly(gg)
From here I've tried two different approaches. The more successful of the two gets me there in part. I can add new variables to the tooltip, but I cannot do two things: 1) I cannot get rid of other variables already displayed by default (from the aesthetics) and 2) I cannot rename the variables to something other than their column name in the dataset (for example, I would like to label "example3" as "Example III"). Here is that approach:
#Save as a new ggplot object except this time add ``label = example3`` to the aesthetics
gg2<-ggplot(data, aes(x = long, y = lat, fill = example1, group = group, label = example3)) +
geom_polygon() + geom_path(color="black",linetype=1) +
coord_equal() +
scale_fill_gradient(low = "lightgrey", high = "darkred", name='Index') +xlab("")+ylab("") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
#Save as plotly object then plot
gg2 <- plotly_build(gg2)
gg2
I also tried adding the following but it did nothing:
gg2$data[[1]]$text <- paste("Example I:", data$example1, "<br>",
"Example II:", data$example2, "<br>",
"Example III:", data$example3)
Any help is much appreciated!
UPDATE: I updated plotly by installing from GitHub instead of CRAN. Using this updated version (4.0.0) I've made it part of the way there.
gg2$x$data[[2]]$text <- paste("Example I:", data$example1, "<br>",
"Example II:", data$example2, "<br>",
"Example III:", data$example3)
gg2
What happens now simply baffles me. This adds an additional tooltip separate from the previous one. The new tooltip is exactly what I want, but both of them appear: not at once, but as I move my mouse around. See the two screenshots below:
Notice those tooltips are from the same unit (Tirol). Could this be a bug in the package? This does not occur when displaying other graphs, such as a time series instead of a map. Also note that I assigned the label "Example I" (or II or III) and this does not show on the new tooltip I added.
UPDATE #2: I figured out that the old tooltip (with long and lat shown) only appears when hovering over the borders so I got rid of the geom_path(color="black",linetype=1) command (as to remove the borders) and now I've managed to successfully solve that problem. However, I'm still unable to modify the labels that appear in the tooltip.
UPDATE #3: I figured out how to edit the labels but FOR ONLY ONE VARIABLE. Which is nuts! Here's my workflow from start to finish:
#Load dependencies
library(rgeos)
library(stringr)
library(rgdal)
library(maptools)
library(ggplot2)
library(plotly)
#Function to read shapefile from website
dlshape=function(shploc, shpfile) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
fp <- file.path(temp, f)
return(readOGR(".",shpfile))
})
}
austria <- dlshape(shploc="http://biogeo.ucdavis.edu/data/gadm2.8/shp/AUT_adm_shp.zip",
"AUT_adm1")[[1]]
#Create random data to add as variables
austria@data$example1<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
austria@data$example2<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
austria@data$example3<-sample(seq(from = 1, to = 100, by = 1), size = 11, replace = TRUE)
#Fortify shapefile to use w/ ggplot
austria.ft <- fortify(austria, region="ID_1")
data<-merge(austria.ft, austria, region="id", by.x = "id", by.y = "ID_1")
#Save as ggplot object
gg<-ggplot(data, aes(x = long, y = lat, fill = example1, group = group, text = paste("Province:", NAME_1))) +
geom_polygon(color="black", size=0.2) +
coord_equal() +
scale_fill_gradient(low = "lightgrey", high = "darkred", name='Index') +xlab("")+ylab("") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
gg <- plotly_build(gg)
gg
That produces the following plot:
Notice that "Province" is now capitalized (it was not before). The trick was adding text = paste("Province:", NAME_1) to the aesthetics. HOWEVER, when I tried to add additional label changes using text2=paste("Example III:", example1), the following occurs:
Notice that it cannot render text2 the same way it renders text1. So instead I simply try adding a duplicate without the text2 like in the following: text=paste("Example III:", example1) -which produces the following odd result:
I'm beginning to think something as simple as toggling "legend" options in plotly's ggplot conversion is impossible.
UPDATE #4: So I decided to approach this another way and change the variable names themselves. I would have done this from the start, except I wasn't sure if or how ggplot2 accepts variables with spaces. I figured out that wrapping the name in backticks, as in `variable`, can work. So I went ahead and relabeled the variables. It works, kind of. The problem is that the text appears with quotation marks around it, and now I need a way to get rid of these. Any ideas, anyone? Thanks! Here is an image of what I mean by quotations in the text:
I am new to plotly too but have come across a similar problem for my ggplot2 bubble plots when using ggplotly(). I have finally found a solution that works for me and thought it might help you, too, although I haven't tried it for choropleth maps.
Your first question was to customize the tooltip so it displays some of the variables in the dataset (including those not mapped to aesthetics).
In your UPDATE #3 you introduce text = paste("Province:", NAME_1) into your aes. If you want to add a second line of custom variables or text, just keep adding it inside the brackets: text = paste("Province:", NAME_1, "Example III:", example1). To add a line break between the two, add <br> at the spot where you want the break to be, like: text = paste("Province:", NAME_1, "<br>", "Example III:", example1)
Your second question was to customize the tooltip so it does NOT display other (default) variables (that are mapped to aesthetics, such as the coordinates).
I found this very easy addition to the ggplotly() function that did the trick for me: ggplotly(gg, tooltip = c("text")). In my case, this removed ALL default variables shown in the tooltip and only showed those that are custom specified with text above. You can add other variables back in by doing ggplotly(gg, tooltip = c("text","x")). The order of the variables shown in the tooltip will be the same as the order specified in the tooltip argument. I found this documented here: https://github.com/ropensci/plotly/blob/master/R/ggplotly.R
This solution worked (in principle) for me using R 3.1.1 and plotly 3.4.13
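As a small illustration of the first point, the tooltip string can be built and inspected as an ordinary column before it is handed to the plot. This is only a sketch: the `regions` data frame, its values and the `tooltip` column name are made up, and the ggplotly() call is left as a comment because it depends on the plot object from the question.

```r
# Made-up stand-in for the merged map data
regions <- data.frame(
  NAME_1   = c("Tirol", "Wien"),
  example1 = c(12, 87),
  example3 = c(40, 5)
)

# One tooltip string per row; <br> starts a new line inside the tooltip
regions$tooltip <- paste0(
  "Province: ", regions$NAME_1, "<br>",
  "Example I: ", regions$example1, "<br>",
  "Example III: ", regions$example3
)

print(regions$tooltip)

# In the plot this column would be mapped with aes(text = tooltip), and then:
# ggplotly(gg, tooltip = "text")
```

Building the string up front makes it easy to choose any label wording ("Example III" rather than "example3") without renaming the underlying columns.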

How can I make a Frequency distribution bar plot in ggplot2?

Sample of the dataset.
nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305
I would like to make a Percentage Bar plot/Histogram like this question: RE: Alignment of numbers on the individual bars with ggplot2
The max value of NQ for full dataset is 21 and minimum value is 0.00005
But I am unable to adapt the code as I don't have a Freq column and I have one series.
I have made a mockup of the figure I am trying to make.
Could you please help?
Would that work for you?
nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = F) # Your data
nq$V2 <- cut(nq$V1, 5, include.lowest = T)
nq2 <- aggregate(V1 ~ V2, nq, length)
nq2$V3 <- nq2$V1/sum(nq2$V1)
library(ggplot2)
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) +
geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2, y = V1 + 0.4), size = 5) +
theme_bw() +
scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
ylab("No. of Cases") + xlab("") +
scale_y_continuous(expand = c(0,0)) +
theme(
axis.title.y = element_text(size = 20, face = "bold", angle = 0),
panel.grid.major = element_blank() ,
panel.grid.minor = element_blank() ,
panel.border = element_blank() ,
panel.background = element_blank(),
axis.line = element_line(color = 'black', size = 2),
axis.text.x = element_text(face="bold"),
axis.text.y = element_text(face="bold")
)
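The binning and percentage steps can also be checked on their own, without any plotting. Below is a minimal base R sketch using the sample values from the question; the bin edges in `edges` are an assumption chosen for illustration (the answer above lets cut() pick five equal-width bins instead).

```r
nq <- c(0.140843018, 0.152855833, 0.193245919, 0.156860105, 0.171658019,
        0.186281942, 0.290739146, 0.162779517, 0.164694042, 0.171658019,
        0.195866609, 0.166967913, 0.136841748, 0.108907644, 0.264136384,
        0.356655651, 0.250508305)

edges <- seq(0.10, 0.40, by = 0.05)   # assumed bin edges
bins <- cut(nq, breaks = edges)       # assign each value to a bin
counts <- as.vector(table(bins))      # number of cases per bin
labels <- sprintf("%.1f%%", 100 * counts / sum(counts))

freq <- data.frame(bin = levels(bins), count = counts, label = labels)
print(freq)
```

A table like `freq` has exactly the shape that geom_bar(stat = "identity") and geom_text() expect: one row per bar, with the count as the height and the label as the text.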
I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot, as it looks like @DavidArenburg has done. But if you feel like hacking ggplot, here's what I ended up doing.
First, some sample data.
set.seed(15)
dd<-data.frame(x=sample(1:25, 100, replace=T, prob=25:1))
br <- seq(0,25, by=5) # break points
My first attempt was
library(ggplot2)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1),
vjust=-.5, breaks=br, stat="bin")
but that didn't make "pretty labels",
so I thought I'd use the percent() function from the scales package to make it pretty. However, silly ggplot doesn't really make it possible to use functions with ..().. variables because it evaluates them in the data.frame only (then the empty baseenv()). It doesn't have a way to find the function you use. So this is when I turned to hacking. First I'll extract the "Layer" definition from ggplot and the map_statistic from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases.)
orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
Here's the line that's causing grief. I would prefer it if the function resolved names that are not found in the data.frame in the plot environment. So I decided to change it with
body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
So now I can use functions with ..().. variables. So I can do
library(scales)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., ymax=..count..+2,
label=percent(..density..*..width..)),
vjust=-.5, breaks=br, stat="bin")
to get
I'm not sure why ggplot has this default behavior. There could be some good reason for it, but I don't know what it is. Note that this changes how ggplot will behave for the rest of the session. You can change back to the default with
assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)
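For readers on a much newer ggplot2, the same labels may be reachable without patching internals. The sketch below assumes ggplot2 3.3.0 or later, where after_stat() is available; it was not part of the original answer, and it uses base sprintf() in place of scales::percent() to keep the example dependency-free.

```r
library(ggplot2)

set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)  # break points

p <- ggplot(dd, aes(x)) +
  geom_histogram(breaks = br, fill = "grey70", colour = "black") +
  # second layer: same binning, drawn as text placed at the bar tops
  stat_bin(
    aes(y = after_stat(count),
        label = after_stat(sprintf("%.0f%%", 100 * count / sum(count)))),
    geom = "text", breaks = br, vjust = -0.5
  )

# Inspect the computed labels without drawing the plot
bin_labels <- layer_data(p, 2)$label
print(bin_labels)
```

Because the label is computed inside after_stat(), it is evaluated after binning, so no session-wide modification of ggplot2 is involved.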
