What is wrong with my custom colour palette in this plot? - r

I'm using ggsurvplot to draw some Kaplan-Meier curves.
Five curves should be plotted and I want control over their colours.
Here is the output of the survfit object being plotted:
> elective_30Decadesurv
Call: survfit(formula = elective30Surv ~ electives$Decade)
                      n events median 0.95LCL 0.95UCL
electives$Decade=50  14      0     NA      NA      NA
electives$Decade=60 173      2     NA      NA      NA
electives$Decade=70 442      5     NA      NA      NA
electives$Decade=80 168      4     NA      NA      NA
electives$Decade=90   2      0     NA      NA      NA
Here is a working plot using the default colour palette, "hue":
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = "hue",
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
See plot in section 3.1.4 of this webpage for the output of the above
The Decade group has 5 entries, so I'm trying to provide five colours to palette.
However, both:
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = c("#440154",
"#3B528B",
"#21908C",
"#5DC863",
"#5DC863"
),
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
And:
> fiveColours <- c("#440154",
"#3B528B",
"#21908C",
"#5DC863",
"#5DC863"
)
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = fiveColours,
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
Give the same error:
Error in names(.cols) <- grp.levels :
'names' attribute [5] must be the same length as the vector [4]
What vector is length [4]?
Is 'names' attribute my colour vector?
If I take one of the colours out of the custom palette, e.g. fiveColours <- c("#440154","#3B528B","#21908C","#5DC863"), I get this error instead:
Error: Insufficient values in manual scale. 5 needed but only 4 provided.
This implies that the number of colours provided is correct and that something else is causing the issue.
I've troubleshot to the limits of my own ability. Help please!
FYI:
> electives %>% select(Decade) %>% group_by(Decade) %>% summarise(n())
# A tibble: 5 x 2
  Decade `n()`
  <fct>  <int>
1 50        14
2 60       173
3 70       442
4 80       168
5 90         2
This should confirm the number of levels in the Decade variable. Here is how the survival object and survfit were generated:
> elective5Surv <- Surv(electives$surv5Y, electives$dead5Y)
> elective_5Decadesurv <- survfit(elective5Surv ~ electives$Decade)

Ok, I have sorted my own mistake by proof-reading!
Of the five hex colours I’d provided, two were identical (not on purpose.)
I changed the fifth colour to a different hex value (what it was meant to be in the first place) and it works now.
Thanks, Rui, for your response earlier, it helped me down the path!
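A quick way to catch this kind of slip before plotting is to check the palette for repeated values. This is a minimal base-R sketch using the fiveColours vector from the question. That ggsurvplot ends up with only the distinct colours internally is an assumption drawn from the error message, not something confirmed from its source.

```r
fiveColours <- c("#440154", "#3B528B", "#21908C", "#5DC863", "#5DC863")

# Position of the first repeated entry (0 would mean no duplicates)
anyDuplicated(fiveColours)            # 5

# Number of distinct colours actually supplied
length(unique(fiveColours))           # 4

# The repeated value(s)
fiveColours[duplicated(fiveColours)]  # "#5DC863"
```

The palette holds only four distinct values, which would be consistent with the "vector [4]" in the error message.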

Related

How to adjust column width in dataframe of forestploter function

I am trying to create a forest plot using the forestploter package. I get a beautiful graph, but I cannot see all of it: some columns are far wider than their contents need, even when the strings are short, which makes the whole graph too wide to view. Can someone help me with this? Also, is it possible to centre-align the data frame contents uniformly? Please help me with this.
The code and relevant data are:
###Required packages###
library(grid)
library(forestploter)
library(rmeta)
library(gridExtra)
#Data entered#
df <- data.frame(Study=c("A","B","C","D","Summary"),
nA = c(24,187,36,26,273),
median_A = c(4.9,5.69,8.866995074,8.5,NA),
Q1A =c(3,2.86,4.495073892,2,NA),
Q3A =c(8.5,9.78,14.96305419,32,NA),
nP = c(23,193,36,26,278),
median_P = c(7.2,6.79,8.990147783,12.5,NA),
Q1P =c(3.4,3.59,4.002463054,2,NA),
Q3P =c(10.9,10.12,12.06896552,43,NA),
W = c("10.6%","80.8%","8.0%","0.70%",NA),
E=c(-2.3,-1.1,-0.123152709,-4,-1.16881587),
UL=c(1.161473203,0.156288294,3.881699516,10.02689306,-0.039791047),
LL=c(-5.761473203,-2.356288294,-4.128004935,-18.02689306,-2.297840692))
#Calculate SE for box size#
df$SE <- (df$UL-df$E)/1.96
#Column for Confidence intervals for Drug A and Placebo, with 2 significant digit#
df$IQRA <- sprintf("%.2f (%.2f to %.2f)",df$median_A,df$Q1A, df$Q3A)
df$IQRP <- sprintf("%.2f (%.2f to %.2f)",df$median_P,df$Q1P, df$Q3P)
#Column for Confidence intervals for NET EFFECT, with 2 significant digit#
df$MD <- sprintf("%.2f (%.2f to %.2f)", df$E, df$LL, df$UL)
#Create a column with space for forest plot#
df$" "<- paste(rep(" ", 16), collapse = " ")
##Forest plot theme##
#To be modified as needed#
ftn <-forest_theme(
base_size = 16,
base_family = "serif",
ci_pch = 15,
ci_col = "black",
ci_lty = 1,
ci_lwd = 1,
ci_Theight = 0.25,
legend_name = " ",
legend_position = "right",legend_value = "",
xaxis_lwd = 1,
xaxis_cex = 0.7,
refline_lwd = 1,
refline_lty = "dashed",
refline_col = "red",
summary_fill = "blue",
summary_col = "blue",
footnote_cex = 0.4,
footnote_fontface = "plain",
footnote_col = "black",
title_just = c("center"),
title_cex = 1.1,
title_fontface = "bold",
title_col = "black",
show.rownames = FALSE)
##Table in Order for Forest plot##
#First get Column names#
colnames(df)
df2 <-df[,c(1,2,15,6,16,18,17)]
#Make NA cells empty
df2[5,3] <-c(" ")
df2[5,5] <-c(" ")
##Forestplot##
plot<-forest(df2,
est = df$E,
lower = df$LL,
upper = df$UL,
sizes = (df$SE/10),
ci_column = 6,
ref_line = 0,
arrow_lab = c("Drug A Better", "Placebo Better"),
xlim = c(-7, 6),
is_summary = c(FALSE,FALSE,FALSE,FALSE,TRUE),
xlog = FALSE,
ticks_digits = 0,ticks_at = c(-6,0,6),
theme = ftn)
##Show plot
print(plot, autofit = FALSE)
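One part of the width is set directly in the code above: the blank column that hosts the confidence intervals is a string of spaces, and its length depends on the repeat count passed to rep(). The base-R sketch below only shows how that count changes the string length. blank_col is a hypothetical helper, not part of forestploter, and that the drawn column width follows the string width is an assumption here.

```r
# Hypothetical helper: n spaces joined by single spaces, as in the code above
blank_col <- function(n) paste(rep(" ", n), collapse = " ")

nchar(blank_col(16))  # 31 characters, the value used above
nchar(blank_col(8))   # 15 characters, a narrower CI column
```

Because the spaces are joined with another space, n repeats give a string of 2n - 1 characters.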

Seeking help to generate a similar heatmap as attached

I am trying to generate a heatmap like the one in the attached figure. I have already tried pheatmap and the code is as follows:
breaks_2 <- seq(min(0), max(2), by = 0.1)
pheatmap::pheatmap(
mat = data,
cluster_cols = F,
cluster_rows = F,
scale = "column",
border_color = "white",
color = inferno(20),
show_colnames = TRUE,
show_rownames = FALSE,
breaks = breaks_2
)
But this does not seem to work. As far as I understand, either I am defining the breaks incorrectly or I need to use a package other than pheatmap. Any suggestions would be really helpful.
The color scale in pheatmap adjusts to the range of the input data. If you want anything above a certain value to be coloured daffodil, then simply send pheatmap a copy of your data with the highest values rounded to 2.
Suppose you have a data frame like this, with values anywhere between 0 and 3:
set.seed(1)
data <- as.data.frame(matrix(runif(64, 0, 3), nrow = 8))
names(data) <- LETTERS[1:8]
data
#> A B C D E F G H
#> 1 0.7965260 1.8873421 2.1528555 0.801662 1.4806239 2.46283888 2.1969412 0.9488151
#> 2 1.1163717 0.1853588 2.9757183 1.158342 0.5586528 1.94118058 2.0781947 1.5559028
#> 3 1.7185601 0.6179237 1.1401055 0.040171 2.4821200 2.34879829 1.4328589 1.9860152
#> 4 2.7246234 0.5296703 2.3323357 1.147164 2.0054002 1.65910893 2.5836284 1.2204906
#> 5 0.6050458 2.0610685 2.8041157 2.609073 2.3827196 1.58915874 1.3142913 2.7386278
#> 6 2.6951691 1.1523112 0.6364276 1.021047 0.3238309 2.36806870 0.7343918 0.8808101
#> 7 2.8340258 2.3095243 1.9550213 1.446240 2.1711328 0.06999361 0.2120371 1.3771972
#> 8 1.9823934 1.4930977 0.3766653 1.798697 1.2338233 1.43169020 0.2983985 0.9971840
Some of the values are greater than two. We want all of these to appear in the same colour on our heatmap, so we create a copy of our data for plotting and cap every value greater than 2 at exactly 2:
data_2 <- data
data_2[] <- lapply(data_2, function(x) { x[x > 2] <- 2; x })
So now if we run pheatmap on data_2, we see that all the values that were greater than 2 in our original data frame are coloured daffodil.
library(viridis)
library(pheatmap)
breaks_2 <- seq(0, 2, by = 0.1)
pheatmap(
mat = data_2,
cluster_cols = F,
cluster_rows = F,
border_color = "white",
scale = 'none',
color = inferno(22),
show_colnames = TRUE,
show_rownames = FALSE,
legend_breaks = breaks_2
)
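The capping step can also be written with base R's pmin(), which returns the smaller of each value and the limit. A minimal sketch, rebuilding the example data so it runs on its own; data_capped is a name introduced here for illustration:

```r
set.seed(1)
data <- as.data.frame(matrix(runif(64, 0, 3), nrow = 8))
names(data) <- LETTERS[1:8]

# Capping as written above
data_2 <- data
data_2[] <- lapply(data_2, function(x) { x[x > 2] <- 2; x })

# Same cap with pmin(): each value becomes min(value, 2)
data_capped <- data
data_capped[] <- lapply(data_capped, pmin, 2)

# Compare the two results and check the new maximum
all.equal(data_2, data_capped)
max(unlist(data_capped))
```

Either version can be passed to pheatmap as mat; the original data stays untouched.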

R: Customizing Scatterplots

I am using the R programming language. I am trying to follow the answer posted in this previous stackoverflow post (scatterplot3d: regression plane with residuals) and add a "plane" to a scatterplot.
Suppose I have the following data:
my_data <- data.frame(read.table(header=TRUE,
row.names = 1,
text="
weight height age
1 2998.958 15.26611 53
2 3002.208 18.08711 52
3 3008.171 16.70896 49
4 3002.374 17.37032 55
5 3000.658 18.04860 50
6 3002.688 17.24797 45
7 3004.923 16.45360 47
8 2987.264 16.71712 47
9 3011.332 17.76626 50
10 2983.783 18.10337 42
11 3007.167 18.18355 50
12 3007.049 18.11375 53
13 3002.656 15.49990 42
14 2986.710 16.73089 47
15 2998.286 17.12075 52
"))
I adapted the code to fit my example:
library(scatterplot3d)
model_1 <- lm(age ~ weight + height, data = my_data)
# scatterplot
s3d <- scatterplot3d(my_data$height, my_data$weight, my_data$age, pch = 19, type = "p", color = "darkgrey",
main = "Regression Plane", grid = TRUE, box = FALSE,
mar = c(2.5, 2.5, 2, 1.5), angle = 55)
# regression plane
s3d$plane3d(model_1, draw_polygon = TRUE, draw_lines = TRUE,
polygon_args = list(col = rgb(.1, .2, .7, .5)))
# overlay positive residuals
wh <- resid(model_1) > 0
s3d$points3d(my_data$height, my_data$weight, my_data$age, pch = 19)
Problem: However, the "plane" does not appear in the resulting plot.
Desired Result:
Can someone please show me what I am doing wrong?
Thanks
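The order of height and weight caused the problem. plane3d() takes the coefficients from model_1 in formula order (intercept, weight, height), so weight has to be on the x-axis and height on the y-axis: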
s3d <- scatterplot3d(my_data$weight, my_data$height,my_data$age, pch = 19, type = c("p"), color = "darkgrey",
main = "Regression Plane", grid = TRUE, box = FALSE,
mar = c(2.5, 2.5, 2, 1.5), angle = 55)
# regression plane
s3d$plane3d(model_1, draw_polygon = TRUE, draw_lines = TRUE,
polygon_args = list(col = rgb(.1, .2, .7, .5)))
# overlay positive residuals
wh <- resid(model_1) > 0
s3d$points3d(my_data$weight[wh], my_data$height[wh], my_data$age[wh], pch = 19)
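To see which predictor belongs on which axis, it can help to print the coefficient names of the fitted model. A minimal base-R sketch, using only the first five rows of the question's data to stay short:

```r
# First five rows of the data from the question
my_data <- data.frame(
  weight = c(2998.958, 3002.208, 3008.171, 3002.374, 3000.658),
  height = c(15.26611, 18.08711, 16.70896, 17.37032, 18.04860),
  age    = c(53, 52, 49, 55, 50)
)

model_1 <- lm(age ~ weight + height, data = my_data)

# Coefficients come out in formula order: intercept, then weight, then height
names(coef(model_1))
# "(Intercept)" "weight" "height"
```

With this order, the first predictor (weight) is the x variable and the second (height) is the y variable in the scatterplot3d() call.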

ERROR: unused argument (output.results = TRUE)

This is what my data looks like:
> dput(head(GDP_NUTS2,5))
structure(list(Regiao = c("T", "N", "Ag", "C", "AML"), t2000 = c(12529.42964,
10054.60679, 13045.59069, 10621.51789, 18104.36306), t2001 = c(13142.7713,
10652.46712, 13920.41552, 11101.08412, 18865.55149), t2002 = c(13714.17406,
11001.34917, 14612.37052, 11507.36163, 19812.29293), t2003 = c(13985.02689,
11031.7278, 15137.89461, 11884.96687, 20165.68892), t2004 = c(14537.15966,
11354.02317, 15479.68985, 12364.05053, 21068.05117), t2005 = c(15107.92333,
11875.44359, 16237.49791, 12754.40299, 21829.31373), t2006 = c(15816.27567,
12439.6426, 17046.29326, 13378.47797, 22714.25829), t2007 = c(16660.99538,
13229.02402, 17981.40383, 14044.39707, 23847.44923), t2008 = c(16971.19746,
13579.51144, 18226.74178, 14091.85326, 24347.83971), t2009 = c(16606.6617,
13243.19054, 17038.45595, 13974.46502, 23794.44899), t2010 = c(16986.91604,
13677.38358, 16976.83391, 14284.14565, 24119.66719), t2011 = c(16655.71238,
13491.68626, 16347.69468, 14011.54637, 23503.1765), t2012 = c(15963.69251,
13111.6173, 16059.51047, 13623.68635, 22118.01701), t2013 = c(16257.04222,
13473.68717, 16301.87448, 13919.18355, 22337.24739), t2014 = c(16596.21219,
13935.07757, 16974.57715, 14220.1043, 22491.62875), t2015 = c(17322.0514,
14570.33755, 17851.78088, 14983.95312, 23101.89351), t2016 = c(18033.44444,
15283.33044, 19251.57661, 15620.77307, 23800.20038), t2017 = c(19006.33518,
16083.53849, 20893.19975, 16410.11278, 24938.22636), t2018 = c(19938.15583,
17031.94867, 22131.96942, 17242.70015, 25974.24055), t2019 = c(20755.955,
17712.44223, 23145.30242, 18045.54697, 26970.71178)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
I'm using the "REAT" package to test the absolute beta convergence comparing years 2000 (t2000) and 2019 (t2019) with OLS (Ordinary Least Squares) estimation using function betaconv.ols().
I've used this code:
betaconv.ols(GDP_NUTS2$t2000, 2000, GDP_NUTS2$t2019, 2019, output.results = TRUE)
I tried other versions of the code, but my major problem is output.results = TRUE, because I always get this error:
Error in betaconv.ols(GDP_NUTS2$t2000, 2000, GDP_NUTS2$t2019, 2019, output.results = TRUE) : unused argument (output.results = TRUE)
I've been searching for alternatives to output.results, but with no success.
Any help will be much appreciated.
Based on the args of the function, the argument is print.results, not output.results:
> args(betaconv.ols)
function (gdp1, time1, gdp2, time2, conditions = NULL, beta.plot = FALSE,
beta.plotPSize = 1, beta.plotPCol = "black", beta.plotLine = FALSE,
beta.plotLineCol = "red", beta.plotX = "Ln (initial)", beta.plotY = "Ln (growth)",
beta.plotTitle = "Beta convergence", beta.bgCol = "gray95",
beta.bgrid = TRUE, beta.bgridCol = "white", beta.bgridSize = 2,
beta.bgridType = "solid", print.results = FALSE)
NULL
betaconv.ols(GDP_NUTS2$t2000, 2000, GDP_NUTS2$t2019, 2019, print.results = TRUE)
Output:
Absolute Beta Convergence
Model coefficients (Estimation method: OLS)
              Estimate  Std. Error   t value  Pr (>|t|)
Alpha     1.537689e-01 0.048509886  3.169847 0.05048663
Beta     -1.341938e-02 0.005137275 -2.612158 0.07953682
Lambda    7.110647e-04          NA        NA         NA
Halflife  9.748018e+02          NA        NA         NA
Model summary
           Estimate  F value df 1 df 2    Pr (>F)
R-Squared 0.6946059 6.823372    1    3 0.07953682
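The same check works for any "unused argument" error: list the function's formal arguments and look for the name you expected. The sketch below uses demo_fn, a hypothetical stand-in with a shortened signature shaped like betaconv.ols(), so that it runs without the REAT package.

```r
# Hypothetical stand-in; the real betaconv.ols() has more arguments
demo_fn <- function(gdp1, time1, gdp2, time2, print.results = FALSE) NULL

arg_names <- names(formals(demo_fn))

"output.results" %in% arg_names           # FALSE: hence "unused argument"
"print.results" %in% arg_names            # TRUE

# Search by fragment when only part of the name is remembered
grep("results", arg_names, value = TRUE)  # "print.results"
```

With REAT loaded, replacing demo_fn with betaconv.ols in the formals() call should give the argument names shown by args() above.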

Incremental Plot in R Notebook Error

I am trying to create an incremental HE plot in an R notebook that will be compiled to HTML (so a .R file). The function I am using heplot() from the heplots package uses an add = TRUE parameter to overlay the graphic over the previous one, which is useful when there are multiple groups you wish to compare.
If I run this as an R notebook, I get the following error:
Error in polygon(E.ellipse, col = last(fill.col), border = last(col), :
plot.new has not been called yet
Calls: <Anonymous> ... withVisible -> eval -> eval -> heplot -> heplot.mlm -> polygon
I believe this is because the R notebook is not keeping the previous plot in memory when it evaluates the second plot.
Here is a reproducible example of the problematic R notebook file (save as .R):
#' ---
#' title: "Incremental HE Plots Test"
#' author: "Matthew Sigal"
#' date: "08 Jun 2016"
#' ---
#' ## Load package and data:
library(heplots)
data(Rohwer, package="heplots")
#' ## Multivariate models for two subsets:
rohwer.ses1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data = Rohwer, subset = SES == "Hi")
rohwer.ses2 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data = Rohwer, subset = SES == "Lo")
#' ## Overlaid visualization:
heplot(rohwer.ses2, col = c("red", rep("black",5), "blue"),
hypotheses = list("B=0, Low SES" = c("n", "s", "ns", "na", "ss")),
level = 0.5, cex = 1.25,
fill = c(TRUE, FALSE), fill.alpha = 0.05,
xlim = c(-15, 110), ylim = c(40, 110),
label.pos = c(1, rep(NULL, 5), 1))
#' ## High SES students:
heplot(rohwer.ses1, col = c("red", rep("black", 5), "blue"),
hypotheses = list("B=0, High SES" = c("n", "s", "ns", "na", "ss")),
level = 0.5, cex = 1.25,
add = TRUE, # place both plots on same graphic
error = TRUE, # error ellipse is not drawn by default with add = TRUE
fill = c(TRUE, FALSE), fill.alpha = 0.05,
xlim = c(-15, 110), ylim = c(40, 110))
I thought that maybe using a chunk option, such as fig.show="hold" might work, but this did not solve the issue.
If I knit this in an Rmarkdown document, it works as expected (save as .Rmd):
---
title: "Rmd Test"
author: "Matthew Sigal"
date: "June 9, 2016"
output: html_document
---
## Test
```{r}
library(heplots)
data(Rohwer, package="heplots")
rohwer.ses1 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data = Rohwer, subset = SES == "Hi")
rohwer.ses2 <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data = Rohwer, subset = SES == "Lo")
heplot(rohwer.ses2, col = c("red", rep("black",5), "blue"),
hypotheses = list("B=0, Low SES" = c("n", "s", "ns", "na", "ss")),
level = 0.5, cex = 1.25,
fill = c(TRUE, FALSE), fill.alpha = 0.05,
xlim = c(-15, 110), ylim = c(40, 110),
label.pos = c(1, rep(NULL, 5), 1))
heplot(rohwer.ses1, col = c("red", rep("black", 5), "blue"),
hypotheses = list("B=0, High SES" = c("n", "s", "ns", "na", "ss")),
level = 0.5, cex = 1.25,
add = TRUE, # place both plots on same graphic
error = TRUE, # error ellipse is not drawn by default with add = TRUE
fill = c(TRUE, FALSE), fill.alpha = 0.05,
xlim = c(-15, 110), ylim = c(40, 110))
```
So, my question is: how can I get the Rnotebook compiler to act similarly to the Rmarkdown compiler?
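Apparently, my issue was very minor: the #' comment before the second HE plot started a new chunk, so the second heplot() call with add = TRUE ran in a chunk that had no existing plot to draw on, which is what caused the error.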
A working script simply removes the #' from the comment between the two calls to heplot()!
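The effect of that one line can be illustrated with a rough base-R sketch. It is not how the notebook compiler is implemented; count_chunks is a hypothetical helper that only mimics the rule that a line starting with #' ends the current code chunk, and the heplot() calls are shortened and kept as plain strings.

```r
# Hypothetical helper: count code chunks, where a chunk starts at a code line
# that is either the first line or directly follows a #' text line
count_chunks <- function(lines) {
  is_text <- grepl("^#'", lines)
  starts <- !is_text & c(TRUE, head(is_text, -1))
  sum(starts)
}

broken <- c("#' ## Overlaid visualization:",
            "heplot(rohwer.ses2)",
            "#' ## High SES students:",
            "heplot(rohwer.ses1, add = TRUE)")

# Remove the #' prefix from the comment between the two calls
fixed <- sub("^#' ## High", "## High", broken)

count_chunks(broken)  # 2: the add = TRUE call sits in its own chunk
count_chunks(fixed)   # 1: both calls share one chunk
```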
