How to control stripe transparency using ggforestplot/geom_stripes?

I was hoping to have some help in modifying the stripe transparency/shading color in the ggforestplot package. Please see the image below ("lighten" indicates the stripes I need to lighten). What is the best way of modifying the following code to do that?
Thank you so much for any pointers!
# Load and attach the packages
library(ggforestplot)
library(ggplot2)
library(tidyverse)
# Reproducible dataset
df <- ggforestplot::df_linear_associations %>% filter( trait == "BMI", dplyr::row_number() <= 30)
# Draw a forestplot
ggforestplot::forestplot(
df = df,
name = name,
estimate = beta,
se = se)+
geom_point(shape = 15, size = 5) +
geom_stripes( odd ="#00000000", even = "#00000000") +
theme(legend.position="none",
panel.background = element_rect(fill = "transparent",colour = NA),
plot.background = element_rect(fill = "transparent",colour = NA))

The issue is that ggforestplot::forestplot already adds a geom_stripes layer with hard-coded default values for odd and even. Adding another geom_stripes has no effect on this underlying stripes layer; the new layer is simply drawn on top of the points, vertical lines, and so on. To adjust the transparency you could (and, to the best of my knowledge, need to) hack the plot's internals:
# Load and attach the packages
library(ggforestplot)
library(ggplot2)
library(tidyverse)
# Reproducible dataset
df <- ggforestplot::df_linear_associations %>% filter( trait == "BMI", dplyr::row_number() <= 30)
# Draw a forestplot
p <- ggforestplot::forestplot(
df = df,
name = name,
estimate = beta,
se = se) +
geom_point(shape = 15, size = 5) +
theme(legend.position="none",
panel.background = element_rect(fill = "transparent"))
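# The stripes layer is the first layer of the plot; overwrite its fixed `odd` colour
# ("#00000000" is fully transparent)
p$layers[[1]]$aes_params$odd <- "#00000000"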
p
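The same layer-editing trick can be tried on any ggplot object. The sketch below uses only ggplot2 and mtcars (not ggforestplot) and looks the layer up by its geom class instead of assuming it is the first one. The plot and the colour value are made up for illustration; for the forest plot you would search for the class name of the stripes geom, which I have not verified here.

```r
library(ggplot2)

# A plot whose second layer has a fixed (non-mapped) colour
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_vline(xintercept = 3) +
  geom_point(colour = "red")

# Find the layer by geom class rather than hard-coding the index
geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
idx <- which(geoms == "GeomPoint")

# Overwrite the fixed parameter stored on that layer
p$layers[[idx]]$aes_params$colour <- "#0000FF33"

# The built layer data now carries the new colour
new_colours <- unique(layer_data(p, idx)$colour)
```

Printing `geoms` first is a cheap way to check which index the layer of interest actually has before editing it.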


Which R package can I use to plot kernel density plots per year?

I measured the number of occurrences of exclamation marks in the abstract and title of papers per year. Now, I want to show the distribution of this number for each individual year using a kernel density estimation. I want to plot my data in a way that I found in another publication (Plavén-Sigray et al. eLife 2017, https://elifesciences.org/articles/2772):
Do you have any idea how I could achieve this using R? I would be glad if you could provide a package.
I added some toy data along with what I tested so far.
library(ggplot2)
set.seed(176)
df = data.frame(
id = 1:2000,
amount = sample(0:3, 2000, replace = TRUE),
year = sample(1990:2010, 2000, replace = TRUE)
)
ggplot(df, aes(x = year, y = amount) ) +
geom_density_2d() +
geom_density_2d_filled() +
geom_density_2d(colour = "black")
I get the following result which is not really what I want:
Any help would be appreciated. Thank you in advance!
You can get a plot like this in ggplot directly without additional packages. Here's a full reprex:
set.seed(1)
df <- data.frame(year = rep(1920:2000, each = 100),
amount = rnorm(8100, rep(120:200, each = 100), 20))
library(tidyverse)
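df %>%
group_by(year) %>%
# One density curve per year, evaluated on a grid shared by all years.
# reframe() (dplyr >= 1.1.0) allows several rows per group; on older dplyr use summarize().
reframe(Amount = density(amount, from = min(df$amount),
to = max(df$amount))$x,
Density = density(amount, from = min(df$amount),
to = max(df$amount))$y) %>%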
ggplot(aes(year, Amount, fill = Density)) +
geom_raster(interpolate = TRUE) +
scale_fill_viridis_c(option = "magma") +
theme_minimal(base_size = 20) +
coord_cartesian(expand = 0) +
theme(legend.position = "top",
legend.key.width = unit(3, "cm"),
legend.title = element_text(vjust = 1))
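The reason for passing the same `from` and `to` to every density() call is that all years are then evaluated on one shared grid, so the raster cells line up. A minimal base R sketch of that point, using two made-up samples (`x1` and `x2` are hypothetical stand-ins for two years):

```r
set.seed(1)
x1 <- rnorm(100, mean = 0)
x2 <- rnorm(100, mean = 3)

# Common evaluation range across both groups
rng <- range(c(x1, x2))

d1 <- density(x1, from = rng[1], to = rng[2])
d2 <- density(x2, from = rng[1], to = rng[2])

# Both curves are evaluated at exactly the same x positions
same_grid <- identical(d1$x, d2$x)
n_points <- length(d1$x)  # density() uses n = 512 points by default
```

Without `from` and `to`, each group would get its own grid based on its own range, and the rows of the heat map would no longer match.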

rayshader 3D ggplots - issue with z values

I am trying to exploit some features of R to plot some crypto data. I am trying to use some examples found at https://www.tylermw.com/3d-ggplots-with-rayshader/
but I am having an issue: I do not get any data on the z-axis. I have time and price on the x- and y-axes, and I want to plot the volume (the third column of the file) on the z-axis. At the moment I do not see the z values of the volume.
I am sending the R script that I am using:
library(tidyverse)
library(viridis)
library(rayshader)
library(ggplot2)
dec = "."
btcUSDT <- read.csv("test orderbook/data/BTC_USDT 20220720_r.txt",header=TRUE,sep=";",dec=dec,strip.white = TRUE)
head(btcUSDT)
class(btcUSDT$time)
class(btcUSDT$price)
class(btcUSDT$volume)
btcUSDT %>%
ggplot(aes(x = time, y = price) ) +
geom_tile(aes(fill = volume),size=1,color="black") +
scale_x_continuous("Time") +
scale_y_discrete("Price") +
ggtitle("USDT order book") +
labs(caption = "2022-07-23 data") +
theme(axis.text = element_text(size = 12),
title = element_text(size = 12,face="bold"),
panel.border= element_rect(size=2,color="black",fill=NA)) ->
nn_gg
plot_gg(nn_gg, multicore = TRUE, width = 6, height = 5.5, scale = 300,
background = "#afceff",shadowcolor = "#3a4f70")
And I am sending a few rows of my sample data:
head(btcUSDT)
time price volume
1 1.658332e+12 24177.8 1.533
2 1.658332e+12 24178.3 1.535
3 1.658332e+12 24179.1 3.650
4 1.658332e+12 24179.8 3.950
5 1.658332e+12 24179.9 4.241
6 1.658332e+12 24180.0 35.546
class(btcUSDT$time)
[1] "numeric"
class(btcUSDT$price)
[1] "numeric"
class(btcUSDT$volume)
[1] "numeric"
And an image result that I get
Please note that the samples from https://www.tylermw.com/3d-ggplots-with-rayshader/ are reproduced properly in my RStudio, so all libraries work fine.
The file with data that I am using is available here Data used
Can anyone help to solve the problem?
Thanks
I have changed the data set, and the z-axis values are now plotted, but they do not seem to have the correct size (the z values seem small). In the previous data set the z range was [0, 60]; now it is [0, 300], and this is the new result that I get:
new 3d snapshot
The R script is pretty much the same:
library(tidyverse)
library(viridis)
library(rayshader)
library(ggplot2)
dec = "."
binancebtcUSDT <- read.csv("test orderbook/data/btcusdtbinance 20220720.csv",header=TRUE,sep=",",dec=dec,strip.white = TRUE)
head(binancebtcUSDT)
class(binancebtcUSDT$time)
class(binancebtcUSDT$price)
class(binancebtcUSDT$volume)
max(binancebtcUSDT$price)
min(binancebtcUSDT$price)
max(binancebtcUSDT$volume)
min(binancebtcUSDT$volume)
binancebtcUSDT %>%
ggplot(aes(x = time, y = price)) +
geom_tile(aes(color = volume),size=3, alpha= 0.1) +
scale_color_gradient(low = "dark red", high = "steelblue", na.value = NA) +
scale_x_continuous("Time", expand=c(0,0)) +
scale_y_discrete("Price", expand=c(0,0)) +
ggtitle("Binance BTC/USDT order book") +
labs(caption = "2022-07-20 data") +
coord_cartesian(ylim=c(23418.3, 24196.3), xlim=c(1658331502525,1658339456644)) +
theme(axis.text = element_text(size = 12),
title = element_text(size = 12,face="bold"),
panel.border= element_rect(size=2,color="black",fill=NA)) ->
binance_plt
plot_gg(binance_plt, multicore = TRUE, width = 6, height = 5.5, scale = 300,
background = "#afceff",shadowcolor = "#3a4f70")
I have tried to change the scale in plot_gg, but only the z scale outside the plot changes. This is the new dataset that I am using: new data set
Any idea how to solve the new issue? Thanks
I have found out that the data set contains a few elements in the range 100-300, and this causes the representation issue. Is there a way to change the scale of the z values?
It looks like "fill" is not the right aesthetic to use with geom_tile here; you need to map volume to color instead:
btcUSDT %>%
ggplot(aes(x = time, y = price) ) +
#geom_point(aes(color=volume))+
geom_tile(aes(color = volume),size=3, alpha= 0.1)+
scale_color_gradient(low = "dark red", high = "steelblue", na.value = NA)
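On the follow-up about a few large volumes dominating the height: one option, sketched here on made-up data, is to compress the variable before mapping it to color. The `book` data frame, the column names `volume_capped` and `volume_log`, and the 95% cut-off are all assumptions for illustration; whether the resulting 3D heights look right in plot_gg is something to check on the real data.

```r
set.seed(42)

# Hypothetical stand-in for the order book: mostly small volumes plus a few large ones
book <- data.frame(volume = c(runif(95, 0, 60), runif(5, 100, 300)))

# Option 1: cap everything above the 95th percentile
cap <- quantile(book$volume, 0.95)
book$volume_capped <- pmin(book$volume, cap)

# Option 2: log-transform (the +1 keeps zero volumes finite)
book$volume_log <- log10(book$volume + 1)
```

Either new column could then be used in place of `volume` in `aes(color = ...)`.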

Specification curve "choices" plot using ggplot2

I have a small dataset of estimates from many regressions of an outcome variable on a main treatment variable and then various sets of control variables (in fact, all possible combinations of those control variables). The table of estimates is as follows:
df <-
structure(list(control_set = c("cen21_hindu_pct", "cen83_urban_pct",
"cen21_hindu_pct + cen83_urban_pct", "NONE"), xest = c(0.0124513609978549,
0.00427174623249021, 0.006447506098051, 0.0137107176362076),
xest_conf_low = c(0.00750677700140716, -0.00436301983024899,
-0.0013089334064237, 0.00925185534519074), xest_conf_high = c(0.0173959449943027,
0.0129065122952294, 0.0142039456025257, 0.0181695799272245
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
I want to make the two plots for the classic "specification curve analysis." The top plot is simply the set of estimates ordered by the magnitude of the estimate on the main treatment variable (no issue here):
df %>%
arrange(xest) %>%
mutate(specifications = 1:nrow(.)) %>%
ggplot(aes(x = specifications, y = xest, ymin = xest_conf_low, ymax = xest_conf_high)) +
geom_pointrange(alpha = 0.1, size = 0.6, fatten = 1) +
labs(x = "", y = "Estimate\n") +
theme_bw()
My problem is with the aligned plot underneath that describes the control-set choices. Directly underneath each coefficient dot and whisker from the plot just made I want a plot that indicates the set of corresponding control variables that were included in that model (i.e. the list of controls in the control_set column in the df data frame row). So the plot I need in this example would look just like this:
This is a (failed) sketch of what I tried to get there, by modifying the earlier estimation dataset in long form, but I couldn't get multiple ticks to show vertically: (Note, this bit of code won't run)
# forplot %>%
# arrange(xest) %>%
# mutate(specifications = 1:nrow(.)) %>%
# mutate(value = "|") %>%
# ggplot(aes(specifications, term)) +
# geom_text(aes(label = value)) +
# scale_color_manual(values = c("lightblue")) +
# labs(x = "\nSpecification number", y = "") +
# theme_bw()
How can I use ggplot2 to make the plot-figure shown above from the information in the data frame, df?
If we store your first plot as a, the choices panel can be built as b and stacked underneath it with patchwork:
library(patchwork)
b <- tibble(specifications = c(1,2,2,3),
control_set = rep(c("cen83_urban_pct", "cen21_hindu_pct"), each = 2)) %>%
ggplot(aes(specifications, control_set)) +
geom_text(aes(label = "|"), size = 5) +
coord_cartesian(xlim = c(1,4)) +
labs(x = NULL, y = NULL) +
theme_bw()+
theme(axis.ticks = element_blank(),
axis.text.x = element_blank())
a/b + plot_layout(heights = c(3,1))
If you want to generate the key automatically, you might use something like this:
library(dplyr)
library(tidyr) # separate_rows() comes from tidyr
df %>%
arrange(xest) %>% # same ordering as the top plot
mutate(specifications = 1:n()) %>%
select(specifications, control_set) %>%
separate_rows(control_set, sep = "\\+") %>%
mutate(control_set = trimws(control_set)) %>% # trim the spaces left around "+" by the split above
...
If you want to relabel the numbers in the y-axis with the control_set labels you can add
+ scale_y_continuous(breaks = df$xest, labels = df$control_set)
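As a rough sketch of how the automatic key could be built end to end: the snippet below is one possible continuation, not necessarily what the answer had in mind. `est` is a hypothetical copy of the question's table with rounded estimates, and dropping the "NONE" row is my assumption about how the no-controls model should appear (no tick at all).

```r
library(dplyr)
library(tidyr)

# Rounded copy of the question's estimates (illustrative values only)
est <- tibble(
  control_set = c("cen21_hindu_pct", "cen83_urban_pct",
                  "cen21_hindu_pct + cen83_urban_pct", "NONE"),
  xest = c(0.0125, 0.0043, 0.0064, 0.0137)
)

key <- est %>%
  arrange(xest) %>%                                  # same order as the top panel
  mutate(specifications = row_number()) %>%
  separate_rows(control_set, sep = "\\s*\\+\\s*") %>% # one row per control, spaces trimmed
  filter(control_set != "NONE") %>%                  # assumption: no tick for "no controls"
  select(specifications, control_set)
```

`key` has the same shape as the hand-written tibble used for plot b, so it can be passed to the same ggplot() call.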

For loop to create a list of ggplots always save the same coordinates for points and segments

First you need to load these packages:
library(ggplot2)
library(ggrepel)
I have a dataframe "dframe" like this:
V1 V2 V3 V4 V5 V6 V7 Groups
0.05579838 -0.44781204 -0.164612982 -0.05362210 -0.23103516 -0.04923499 -0.06634579 1
0.14097924 -0.35582736 0.385218841 0.18004788 -0.18429327 0.29398646 0.69460669 2
0.10699838 -0.38486299 -0.107284020 0.16468591 0.48678593 -0.70776085 0.20721932 3
0.22720072 -0.30860464 -0.197930310 -0.24322096 -0.30969028 -0.04460600 -0.08420536 4
0.24872635 -0.23415141 0.410406163 0.07072146 -0.09302970 0.01662256 -0.21683816 5
0.24023670 -0.27807097 -0.096301697 -0.02373198 0.28474825 0.27397862 -0.29397324 6
0.30358363 0.05630646 -0.115190308 -0.51532428 -0.08516130 -0.08785924 0.12178198 7
0.28680386 0.07609196 0.488432908 -0.13082951 0.00439161 -0.17572986 -0.25854047 8
0.30084361 0.06323714 -0.008347161 -0.26762137 0.40739524 0.22959024 0.19107494 9
0.27955675 0.22533959 -0.095640072 -0.27988676 -0.04921808 -0.10662521 0.19934074 10
0.25209125 0.22723231 0.408770841 0.13013867 -0.03850484 -0.23924023 -0.16744745 11
0.29377963 0.13650247 -0.105720288 -0.00316556 0.29653723 0.25568169 0.06087608 12
0.24561895 0.28729625 -0.167402464 0.24251060 -0.22199262 -0.17182828 0.16363196 13
0.25150342 0.25298115 -0.147945172 0.43827820 0.02938933 0.01778563 0.15241257 14
0.30902922 -0.01299330 -0.261085058 0.13509982 -0.40967529 -0.11366113 -0.06020937 15
0.28696274 -0.12896680 -0.196764195 0.39259942 0.08362863 0.25464125 -0.29386260 16
Here is a reproducible dataframe that you can use from Mark Peterson:
dframe <-
rnorm(70) %>%
matrix(nrow = 10) %>%
as_tibble() %>%
setNames(paste0("V", 1:ncol(.))) %>%
mutate(Groups = 1:nrow(.)
, Label = 1:nrow(.))
I created a table of combinations of columns I want to be used from my dataframe:
#Create all possible combinations
combs<-expand.grid(seq(7),seq(7))
#Remove duplicate and order
combs<-combs[combs$Var1 != combs$Var2,]
combs<-combs[order(combs[,1]),]
Then I wrote a for loop that is supposed to generate a list of ggplots, one plot per combination:
list_EVplots<-list()
for(i in seq(nrow(combs))){
list_EVplots[[paste(combs[i,1],"&",combs[i,2])]]<- ggplot(data=dframe) +
ggtitle(paste("Eigenvector Plot - Pairwise",
"correlation with","adjustment")) +
geom_point(aes(x = dframe[,combs[i,1]], y = dframe[,combs[i,2]],
color = Groups)) +
geom_segment(aes(x = rep(0,nrow(dframe)), y = rep(0,nrow(dframe)),
xend = dframe[,combs[i,1]], yend = dframe[,combs[i,2]],
color = Groups),
size = 1, arrow = arrow(length = unit(0.3,"cm"))) +
geom_label_repel(aes(x = dframe[,combs[i,1]], y = dframe[,combs[i,2]],
label = rownames(dframe))) +
scale_color_manual(values=colors) +
xlab(paste0("Eigenvector ",combs[i,1])) +
ylab(paste0("Eigenvector ",combs[i,2])) +
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 13),
legend.text = element_text(size=12)) +
geom_hline(yintercept = 0, linetype="dashed") +
geom_vline(xintercept = 0, linetype="dashed")
}
After running this for loop, I obtain my list "list_EVplots".
Problem: the iterations seem to work for xlab() and ylab(), and they also work for the names of the plots in the list, but the coordinates in geom_point(aes()) and geom_segment(aes()) do not change. The coordinates stay the same when they obviously should change!
I think the coordinates stay locked on the ones used for the first plot, from the first iteration.
If anyone has the solution for that I would be very grateful for your help.
Working under Linux 16.04 with R Studio. R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
I tried with a subsetted dataframe containing only the columns I wanted to work with instead of the 8-column dataframe, but that didn't work.
Expected: The list should contain different plots: all plots should be different.
Problem: All plots have the same coordinates for dots and segments in the list.
The simplest answer is often the easiest one: try to avoid using for loops in places where lapply is more appropriate. I don't see anything obvious in your code that suggests where the problem lies, but I am guessing that it is a problem in the deeply nested [] statements.
Here is an approach using lapply and aes_string to handle the variables. If you want something other than a full pairwise set of plots, you may have to modify the calls to the two lapply's a bit.
First, some reproducible data (made using dplyr). Note that I made the Labels explicit instead of relying on the rownames (this is good practice, and far easier to use in calls to ggplot).
dframe <-
rnorm(70) %>%
matrix(nrow = 10) %>%
as_tibble() %>%
setNames(paste0("V", 1:ncol(.))) %>%
mutate(Groups = 1:nrow(.)
, Label = 1:nrow(.))
Then, I am pulling out the columns that you want to use for your plots. I am naming them so that the returned list has the column names automatically assigned.
my_cols <-
names(dframe)[1:7] %>%
setNames(.,.)
Then, just set up a nested lapply to work through all of the pairwise comparisons:
plot_list <-
lapply(my_cols, function(col1){
lapply(my_cols, function(col2){
if(col1 == col2){
return(NULL)
}
ggplot(dframe) +
ggtitle(paste("Eigenvector Plot - Pairwise",
"correlation with","adjustment")) +
geom_point(aes_string(x = col1
, y = col2
, color = "Groups")) +
geom_segment(aes_string(xend = col1
, yend = col2
, color = "Groups")
, x = 0
, y = 0
, size = 1
, arrow = arrow(length = unit(0.3,"cm"))) +
geom_label_repel(aes_string(x = col1
, y = col2
, label = "Label")) +
xlab(paste0("Eigenvector ", col1)) +
ylab(paste0("Eigenvector ", col2)) +
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 13),
legend.text = element_text(size=12)) +
geom_hline(yintercept = 0, linetype="dashed") +
geom_vline(xintercept = 0, linetype="dashed")
})
})
Note that you did not include the colors that you wanted to use for the groups, so I left the defaults instead.
The plots come out correctly and this should be easier to work through.
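For what it is worth, a likely cause of the original symptom is that aes() does not evaluate its arguments when the layer is created; the expressions are only evaluated when the plot is built or printed. By then the loop has finished, so an expression that refers to `i` sees its final value in every plot. The sketch below reproduces this on a tiny made-up data frame (`d`, `cols`, and `make_plot` are hypothetical names) and shows one alternative to aes_string, the `.data` pronoun inside a function.

```r
library(ggplot2)

d <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
cols <- c("V1", "V2", "V3")

# Lazy version: aes() stores the expression d[[cols[i]]] and evaluates it at build time
lazy_plots <- list()
for (i in seq_along(cols)) {
  lazy_plots[[i]] <- ggplot(d) + geom_point(aes(x = d[[cols[i]]], y = V1))
}
# The loop is over, so i is 3 and even the first plot resolves cols[i] to "V3"
lazy_x <- layer_data(lazy_plots[[1]])$x

# Safer version: each call keeps its own `col`, looked up in the layer data via .data
make_plot <- function(col) {
  ggplot(d) + geom_point(aes(x = .data[[col]], y = V1))
}
safe_plots <- lapply(cols, make_plot)
safe_x <- layer_data(safe_plots[[1]])$x
```

Here `lazy_x` holds the V3 values although the first plot was meant to show V1, while `safe_x` holds the V1 values as intended.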

Finding the random x-values used by geom_jitter

I want to be able to select observations from a box plot with jittered points over top. I have been somewhat successful by having the point click find the category, look at the y-value and select the observation. The following code shows my progress so far:
# ------------------------------Load Libraries---------------------------------
library(shiny)
library(ggplot2)
library(dplyr)
# -------------------------Print Boxplot to Screen-----------------------------
ui <- fluidPage(plotOutput('irisPlot', click = 'irisClick'))
server <- function(input, output){
# --------------------------Store Clicked Points-------------------------------
clicked <- reactiveValues(rows = rep(TRUE,nrow(iris)))
# ---------------------------Modify the Dataset--------------------------------
IRIS <- reactive({iris %>% mutate(index = clicked$rows)})
# ---------------------Select Points Through Plot Click------------------------
observeEvent(
input$irisClick,{
nS <- iris %>% mutate(selected = rep(FALSE,nrow(iris)))
lvls <- levels(iris$Species)
plant <- lvls[round(input$irisClick$x)]
pxl <- which(
sqrt((iris$Sepal.Width-input$irisClick$y)^2) %in%
min(sqrt((iris$Sepal.Width-input$irisClick$y)^2))
)
point <- iris[pxl,'Sepal.Width']
nS[nS$Species == plant & nS$Sepal.Width %in% point,'selected'] <- TRUE
clicked$rows <- xor(clicked$rows, nS$selected)
})
# --------------------------Generate the Boxplot-------------------------------
output$irisPlot <- renderPlot({
set.seed(1)
ggplot(IRIS(), aes(x = Species, y = Sepal.Width))+
geom_boxplot(na.rm = TRUE,outlier.shape = NA)+
geom_jitter(
na.rm = TRUE,
width = .8,
aes(shape = index, size = index, colour = index)
)+
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA),
legend.position = "none"
)+
scale_shape_manual(values = c('FALSE'= 1,'TRUE'= 19))+
scale_size_manual(values = c('FALSE' = 4, 'TRUE'= 2))+
scale_colour_manual(values = c('TRUE' = "#428BCA", 'FALSE' = '#FAA634'))
})
}
shinyApp(ui, server)
As I said, the code mostly works, but it can be inconsistent. Sometimes it can't find a point; other times it selects a large group of points or selects a point on the opposite side of the box plot. I figure the best way to solve this is to use both an x and a y coordinate to select the point. However, since the x-values are randomly generated, I need geom_jitter() to tell me which x-values it is using for a given plot, and I have not been able to find any way to access this. Any help finding this information would be greatly appreciated.
My thanks to aosmith for telling me about the layer_data() function and to Peter Ellis for suggesting that I use geom_point() instead of geom_jitter(). Both comments were instrumental in helping me solve my problem.
What I had to do was create a new plot object in the global environment to jitter the points. Then use the layer_data() function to return the newly created x-values.
Finally, using those x-values, I created a new plot object and layered the points over top using geom_point(). Here is the completed code for anyone interested.
# ------------------------------Load Libraries---------------------------------
library(shiny)
library(ggplot2)
library(dplyr)
# ----------------------------Generate X Coords--------------------------------
set.seed(1)
g1 <- ggplot(iris, aes(x = Species, y = Sepal.Width))+
geom_boxplot(na.rm = TRUE,outlier.shape = NA)+
geom_jitter(na.rm = TRUE,width = .8)
xPoints <- layer_data(g1, i = 2)$x
# -------------------------Print Boxplot to Screen-----------------------------
ui <- fluidPage(
plotOutput('irisPlot', click = 'irisClick')
)
server <- function(input, output){
# --------------------------Store Clicked Points-------------------------------
clicked <- reactiveValues(rows = rep(TRUE,nrow(iris)))
rand <- reactiveValues(x = rep(NA,nrow(iris)))
# ---------------------------Modify the Dataset--------------------------------
IRIS <- reactive({iris %>% mutate(index = clicked$rows)})
# ---------------------Select Points Through Plot Click------------------------
observeEvent(
input$irisClick,{
nS <-data.frame( iris, x = xPoints)
point <- nearPoints(
df = nS,
coordinfo = input$irisClick,
xvar = 'x',
yvar = 'Sepal.Width',
allRows = TRUE
)
clicked$rows <- xor(clicked$rows, point$selected_)
})
# --------------------------Generate the Boxplot-------------------------------
output$irisPlot <- renderPlot({
ggplot(IRIS(), aes(x = Species, y = Sepal.Width))+
geom_boxplot(na.rm = TRUE,outlier.shape = NA)+
geom_point(
aes(
x = xPoints,
y = iris$Sepal.Width,
shape = index,
size = index,
colour = index
),
inherit.aes = FALSE
)+
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA),
legend.position = "none"
)+
scale_shape_manual(values = c('FALSE'= 1,'TRUE'= 19))+
scale_size_manual(values = c('FALSE' = 4, 'TRUE'= 2))+
scale_colour_manual(values = c('TRUE' = "#428BCA", 'FALSE' = '#FAA634'))
})
output$x <- renderPlot({
})
}
shinyApp(ui, server)
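To see in isolation what layer_data() returns for a jittered layer, here is a small sketch outside Shiny. The width of 0.2 is chosen only for illustration. Note that the jitter is redrawn every time the plot is built, which is presumably why the answer stores the x-values once and then plots them with geom_point().

```r
library(ggplot2)

set.seed(1)
g <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_jitter(width = 0.2)

# One row per point, with x holding the jittered positions actually used
jit <- layer_data(g, i = 1)

# Each jittered x stays within 0.2 of its category position (1, 2 or 3)
offsets <- as.numeric(jit$x) - as.numeric(iris$Species)
```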
Just for the benefit of people like me who might be googling this problem: I solved it very easily using Peter Ellis's suggestion of jittering the points myself with jitter().
I'm making it into an answer because I thought it should be more visible; I nearly missed it when I was looking at this page.
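A minimal sketch of that approach, assuming the iris box plot from the question: the x positions are jittered once with base R's jitter() and stored, so the plot and any click handling can use the same coordinates. The `pts` data frame and the amount of 0.3 are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Jitter the category positions (1, 2, 3) once and keep them
pts <- data.frame(
  x = jitter(as.numeric(iris$Species), amount = 0.3),
  y = iris$Sepal.Width
)

# Draw the stored positions with geom_point() instead of geom_jitter()
g <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = pts, aes(x = x, y = y), inherit.aes = FALSE)
```

Because `pts$x` is fixed, it can be passed as the `xvar` column to nearPoints() in the same way as in the layer_data() version above.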
