ggcorrplot(): Changing Variable Labels on X and Y Axis - r

I'm currently working on a correlation plot with gradient for a dataset involving social factors and outcomes such as grades.
The variable names are not that accessible, and I was wondering how to change, for example, "famrel" to "Family Relationship" on the axis.
I am using ggcorrplot() as well as ggplotly to add interactivity.
Any help would be much appreciated! I've been googling for two hours but cannot for the life of me find an applicable solution that doesn't involve altering the original dataframe.
df_corr <- select_if(df, is.numeric)
df_corr
corr <- round(cor(df_corr), 1)
p.mat <- cor_pmat(df_corr)
corr.plot <- ggcorrplot(corr,
  hc.order = TRUE,
  type = "lower"
)
ggplotly(corr.plot)
Above is my code; I have also attached a screenshot of the resulting chart.
I tried googling as well as searching stack overflow for the answer to my question, but was unsuccessful.

In addition to @Kat's option of directly editing the dataframe, you can also change the column names and row names directly in corr. Also be aware that select_if() has been superseded; select(where(...)) is the current form.
library(tidyverse)
library(ggcorrplot)
library(plotly)
data(mtcars)
df <- mtcars %>% select(mpg, gear, disp, drat)
df_corr <- df %>% select(where(is.numeric)) # changed here
corr <- round(cor(df_corr), 1)
colnames(corr) <- c("mpg_new", "gear_new", "disp_new", "drat_new")
rownames(corr) <- c("mpg_new", "gear_new", "disp_new", "drat_new")
p.mat <- cor_pmat(df_corr)
corr.plot <- ggcorrplot(corr, hc.order = TRUE, type = "lower")
ggplotly(corr.plot)
If you only want to change a single value on the axes (e.g., famrel), then you can edit a single rowname and a single column name like this:
colnames(corr)[colnames(corr) == "mpg"] <- "mpg_new"
rownames(corr)[rownames(corr) == "mpg"] <- "mpg_new"
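If several labels need changing, a named lookup vector avoids repeating that line once per variable. This is a minimal sketch in base R only; label_map and relabel() are illustrative names, not part of ggcorrplot, and mtcars stands in for the real data:

```r
# Correlation matrix for a few mtcars columns (stand-in for the real data)
corr <- round(cor(mtcars[, c("mpg", "gear", "disp", "drat")]), 1)

# Hypothetical lookup: old name -> display label; names not listed are kept as-is
label_map <- c(mpg = "Miles per Gallon", gear = "Number of Gears")

# Replace every name found in the lookup, leave the others untouched
relabel <- function(nms, map) {
  hit <- nms %in% names(map)
  nms[hit] <- unname(map[nms[hit]])
  nms
}

rownames(corr) <- relabel(rownames(corr), label_map)
colnames(corr) <- relabel(colnames(corr), label_map)
print(colnames(corr))
```

The relabeled corr can then be passed to ggcorrplot() as above; the original dataframe is not modified.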

Related

Ordering colors on colored bar for dendrogram in R

The vignette for the R package dendextend (https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html) gives an example of using the colored_bars function with cutreeDynamic from package dynamicTreeCut as follows:
# let's get the clusters
library(dynamicTreeCut)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram
# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)
library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
branches_attr_by_clusters(clusters, values = cols) %>%
color_labels(col = true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
# Get the clusters to have proper colors.
# fix the order of the colors to match the branches.
colored_bars(clusters, dend, sort_by_labels_order = FALSE)
The following line reorders the colors to match the branches:
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
I wish to apply this method to my own data which has many more clusters, but I am unclear on how the revised ordering of the colors was determined. This example uses a custom ordering for the iris data. Can anyone explain how this order was determined and is there a way to automate this?
Just for starters, your example code above using data(iris) is missing two necessary packages: library(dplyr), to be able to use the pipe operator %>%, and library(dendextend), for the label colors from color_labels().
To answer your question: the ordering comes from the levels(clusters)[-1] <- cols[-5][c(1,4,2,3)] line of code. As you mention, this is custom to this specific dataset, but I am unaware of why the authors picked this specific order. If you do not want to set the order yourself and want R to do it automatically, then sort_by_labels_order = TRUE must be set in the colored_bars() call. Here, it is set to FALSE since the authors use a custom order.
If it is set to TRUE, then, quoting the R documentation directly, "the colors vector/matrix should be provided in the order of the original data order (and it will be re-ordered automatically to the order of the dendrogram)". For more information, see ?colored_bars.
The following code shows the difference between setting the parameter to FALSE and to TRUE.
# let's get the clusters
library(dynamicTreeCut)
library(dplyr)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram
# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)
library(colorspace)
library(dendextend)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
branches_attr_by_clusters(clusters, values = cols) %>%
color_labels(col = true_species_cols)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
plot(dend2);colored_bars(clusters, dend, sort_by_labels_order = FALSE)
# here R automatically assigned the colors
plot(dend2);colored_bars(clusters, dend, sort_by_labels_order = TRUE)
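To make the two orderings concrete, here is a minimal base R sketch of what "original data order" versus "dendrogram order" means for a color vector. It swaps in cutree() with k = 4 for cutreeDynamic() to stay dependency-free, and the palette is arbitrary; both are assumptions for illustration only:

```r
x <- as.matrix(iris[, -5])
hc <- hclust(dist(x))
dend <- as.dendrogram(hc)

# Cluster ids in the ORIGINAL row order of the data (cutree() used as a stand-in)
clusters <- cutree(hc, k = 4)

# Arbitrary illustrative palette, one color per cluster id
palette_cols <- c("red", "green", "blue", "orange")

# One color per observation, still in original data order;
# this is the form described for sort_by_labels_order = TRUE
cols_data_order <- palette_cols[clusters]

# The same colors rearranged to follow the leaves left to right;
# this is the form needed when sort_by_labels_order = FALSE
cols_dend_order <- cols_data_order[order.dendrogram(dend)]

# With dendextend loaded, either call should draw the same bar:
# colored_bars(cols_data_order, dend, sort_by_labels_order = TRUE)
# colored_bars(cols_dend_order, dend, sort_by_labels_order = FALSE)
```

Building the color vector by indexing a palette with the cluster ids, rather than relabeling factor levels by hand, scales to any number of clusters.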

Defining particular sub-groups in survival analysis in R

This is the code in R that produces a Kaplan-Meier plot of Overall Survival for a population broken down by Stage.
library(tidyverse)
library(forcats)
library(broom)
library(survival)
library(Hmisc)
library(gmodels)
library(lazyeval)
library(plotrix)
library(summariser)
library(magrittr)
library(survminer)
library(dplyr)
library(lattice)
library(Formula)
library(lubridate)
library(ggfortify)
library(readxl)
icccdata = read_excel("ICCC_All_20072016.xls")
head(icccdata)
km <- with(icccdata, Surv(Time, Status))
# STAGE specific OVERALL SURVIVAL
survival_object2 <- Surv(icccdata$Time, icccdata$CancerSurvival)
str(survival_object2)
my_survfit_STAGE_OS <- survfit(survival_object2 ~ Stage, data = icccdata)
print(my_survfit_STAGE_OS, print.rmean = TRUE)
dat_my_survfit_STAGE_OS <- fortify(my_survfit_STAGE_OS)
ggsurvplot(my_survfit_STAGE_OS, risk.table = TRUE, xlab = "Time (years)", censor = T)
The Stage data consists of the values 0, I, II, III, IV.
I want to be able to just show the values for Stage I, without having the Stage 0, II, III, or IV displayed. I'd appreciate some help with the code to separate out a single sub-group.
I would recommend building your own ggplot() instead of using ggsurvplot().
This can be done by using surv_summary() from the survminer package. This is also what ggsurvplot() is using behind the scenes.
E.g.:
df <- surv_summary(my_survfit_STAGE_OS)
df %>% filter(Stage == "I") %>%
ggplot(aes(x = time, y = surv, col = Stage)) +
geom_step()
If you want to plot stage I and II you could use %in% like
df %>% filter(Stage %in% c("I", "II")) %>% ggplot(....)
In ggsurvplot(), a dataframe from surv_summary() is passed to ggsurvplot_df(), and the plot is then built with ggplot() based on the user options.
Check out the R source code here:
https://github.com/kassambara/survminer/blob/master/R/ggsurvplot_df.R
If you want an at-risk table, this can be created using ggrisktable().
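Another option, if only one stage is ever needed, is to subset the data before fitting, so that the fit contains a single curve. This is a minimal sketch using the lung dataset shipped with the survival package as a stand-in, since the asker's icccdata is not available; sex == 1 plays the role of Stage == "I":

```r
library(survival)

# Keep only the sub-group of interest before fitting
one_group <- subset(lung, sex == 1)

# With ~ 1 on the right-hand side the fit has a single curve and no strata
fit_one <- survfit(Surv(time, status) ~ 1, data = one_group)
print(fit_one)

# Base-graphics Kaplan-Meier curve for that single group
plot(fit_one, xlab = "Days", ylab = "Survival probability")
```

With the data from the question, the equivalent fit would presumably be survfit(Surv(Time, CancerSurvival) ~ 1, data = subset(icccdata, Stage == "I")), which can then be plotted the same way as the full fit.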

Change position of legend in plot of pec object

I am trying to plot the prediction error curve from pec package but I can't change the legend position and size. There's an example from pec package:
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But this shows a big legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But this doesn't show a legend.
How can I change the position of the legend? And is it also possible to plot this graph using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, make a dataframe from them, and plot it.
library(ggplot2)
# packages for the pipe and pivot_longer, you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # I suppose you know how to custom axis names etc.
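The same extraction also works with base graphics, where the legend position and size are set directly through legend(). This is a sketch only: brier_like is a hypothetical stand-in list that mimics the time and AppErr elements used above, with made-up values, so it runs without fitting any models:

```r
# Hypothetical stand-in for the pec object: only the two elements used above
brier_like <- list(
  time = seq(0, 10, by = 1),
  AppErr = list(
    Weibull = seq(0, 0.20, length.out = 11),
    CoxPH = seq(0, 0.25, length.out = 11)
  )
)

# One column of prediction error per model
err <- do.call(cbind, brier_like[["AppErr"]])
model_cols <- seq_len(ncol(err))

# Step curves, one per model
matplot(brier_like[["time"]], err, type = "s", lty = 1, col = model_cols,
        xlab = "Time", ylab = "Prediction error")

# Position is a keyword ("topright", "bottomleft", ...); cex scales the size
legend("topright", legend = colnames(err), col = model_cols,
       lty = 1, cex = 0.8, bty = "n")
```

With a real pec object, brier_like would be replaced by brier, assuming it exposes the same two elements as in the answer above.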

How to plot a large number of density plots with different categorical variables

I have a dataset in which I have one numeric variable and many categorical variables. I would like to make a grid of density plots, each showing the distribution of the numeric variable for different categorical variables, with the fill corresponding to subgroups of each categorical variable. For example:
library(tidyverse)
library(nycflights13)
dat <- flights %>%
select(carrier, origin, distance) %>%
mutate(origin = origin %>% as.factor,
carrier = carrier %>% as.factor)
plot_1 <- dat %>%
ggplot(aes(x = distance, fill = carrier)) +
geom_density()
plot_1
plot_2 <- dat %>%
ggplot(aes(x = distance, fill = origin)) +
geom_density()
plot_2
I would like to find a way to quickly make these two plots. Right now, the only way I know how to do this is to create each plot individually, and then use grid.arrange() to put them together. However, my real dataset has something like 15 categorical variables, so this would be very time intensive!
Is there a quicker and easier way to do this? I believe that the hardest part about this is that each plot has its own legend, so I'm not sure how to get around that stumbling block.
This solution gives all the plots in a list. Here we make a single function that accepts the variable you want to plot, and then use lapply() with a vector of all the variables you want to plot.
fill_variables <- vars(carrier, origin)
func_plot <- function(fill_variable) {
dat %>%
ggplot(aes(x = distance, fill = !!fill_variable)) +
geom_density()
}
plotlist <- lapply(fill_variables, func_plot)
If you have no idea what those !! mean, I recommend watching this 5 minute video that introduces the key concepts of tidy evaluation. This is what you want to use when you create these sorts of wrapper functions to do stuff programmatically. I hope this helps!
Edit: If you want to feed an array of strings instead of a quosure, you can change !!fill_variable for !!sym(fill_variable) as follows:
fill_variables <- c('carrier', 'origin')
func_plot <- function(fill_variable) {
dat %>%
ggplot(aes(x = distance, fill = !!sym(fill_variable))) +
geom_density()
}
plotlist <- lapply(fill_variables, func_plot)
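Once the plots are in a list, they can be laid out in one grid without arranging them by hand. This is a minimal sketch, assuming the gridExtra package is installed; it uses mtcars instead of flights to stay small, and the .data pronoun as another way to refer to a column by its name as a string:

```r
library(ggplot2)
library(gridExtra)

# Small stand-in dataset with two categorical columns
dat_small <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
fill_vars <- c("cyl", "gear")

# One density plot per categorical variable, each with its own legend
plot_one <- function(fill_var) {
  ggplot(dat_small, aes(x = mpg, fill = .data[[fill_var]])) +
    geom_density(alpha = 0.5)
}
plots <- lapply(fill_vars, plot_one)

# Lay the whole list out in a grid in one call
grid.arrange(grobs = plots, ncol = 2)
```

Because each plot is a separate ggplot object, every panel keeps its own legend, which is the stumbling block mentioned in the question.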
Alternative solution
As @djc wrote in the comments: "I'm having trouble passing the column names into 'fill_variables'. Right now I am extracting column names using the following code..."
You can separate the categorical and continuous variables with sapply(flights, is.character) for the categorical ones and !sapply(flights, is.character) for the continuous ones, and then pass the results into the wrapper function given by mgiormenti.
Full code is given below:
library(tidyverse)
library(nycflights13)
# Column names, split into categorical (character) and continuous (everything else)
cat_vars <- names(flights)[sapply(flights, is.character)]
cont_vars <- names(flights)[!sapply(flights, is.character)]
# Keep all columns so that every name above can be looked up in dat
dat <- flights %>%
  mutate(across(where(is.character), as.factor))
# Density of distance, filled by one categorical variable
func_plot_cat <- function(cat_var) {
  dat %>%
    ggplot(aes(x = distance, fill = .data[[cat_var]])) +
    geom_density()
}
# Scatter plot of one continuous variable against distance
# (geom_point() needs a y aesthetic, so the variable is mapped to y)
func_plot_cont <- function(cont_var) {
  dat %>%
    ggplot(aes(x = distance, y = .data[[cont_var]])) +
    geom_point()
}
plotlist_cat_vars <- lapply(cat_vars, func_plot_cat)
plotlist_cont_vars <- lapply(cont_vars, func_plot_cont)
print(plotlist_cat_vars)
print(plotlist_cont_vars)

ggvis barplot: negative values

I'm trying to draw a barplot using ggvis, for some data where for each variable I have both a negative and a positive value. It would be similar to this example from ggplot2.
However, when I try something similar in ggvis, I end up with basically no plot at all, just some weird lines.
Example data:
df <- data.frame(
direction=rep(c("up", "down"), each=3),
value=c(1:3, -c(1:3)),
x=rep(c("A", "B", "C"), 2))
This works, for all positive values:
df %>%
mutate(value.pos=abs(value)) %>%
ggvis(x=~x, y=~value.pos) %>%
group_by(direction) %>%
layer_bars(stack=TRUE)
This gives me nothing:
df %>%
ggvis(x=~x, y=~value) %>%
group_by(direction) %>%
layer_bars(stack=TRUE)
I've also tried various combinations of plotting them one by one, e.g.:
df %>%
spread(key=direction, value=value) %>%
ggvis(x=~x, y=~up) %>%
layer_bars() %>%
layer_bars(x=~x, y=~down)
So far, no luck. I suspect I'm missing some simple solution...
I don't think ggvis lets you produce stacked bar plots with negative values within the same groups as positive data.
This is because if an x value appears more than once in the data, then ggvis will sum up the y values at each x. I had thought that since you plotted the vector 1:3, they canceled out, but that's not the case.
As of now, I do not believe that dodged bar plots exist for this. It also messes with the labels.
You can produce the plot non-stacked, while filling in the position.
df %>%
group_by(direction) %>%
ggvis(x=~x, y=~value, fill = ~direction) %>%
layer_bars(stack = FALSE)
Anyways, you might consider avoiding ggvis for any production work since it is under development, and hasn't been updated in a couple of months.
@shayaa
Thanks, this does seem to be working, although it will probably require some tweaking, and may not look as nice as if I were using ggplot2. Actually, the reason I am using ggvis is that I would like to combine it with shiny, to make a small interactive web version. For example:
df <- data.frame(
direction=rep(c("up", "down"), each=3),
value=c(1:3, -c(1:3)),
x=rep(c("A", "B", "C"), 2))
plot_fct <- function(letter) {
df %>%
filter(x==letter) %>%
ggvis(x=~x, y=~value, fill = ~direction) %>%
layer_bars(stack = FALSE) %>%
scale_numeric("y", domain=c(NA,NA))
}
ui <- fluidPage(
sidebarPanel(
selectInput("letter", "Choose letter", c("A", "B", "C"), selected="A")
),
mainPanel(
ggvisOutput("letter_barplot")
)
)
server <- function(input, output) {
plot_fct(letter=reactive(input$letter)) %>% bind_shiny("letter_barplot")
}
runApp(shinyApp(ui, server))
However, it does not seem to work for me anyway, since there is some issue with the reactive being of class character. I keep getting the error:
Error in eval(substitute(expr), envir, enclos) :
comparison (1) is possible only for atomic and list types
Guess I'll have to keep trying.
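The error most likely comes from passing the reactive itself into plot_fct(), so that filter(x == letter) compares a column against a function rather than a string. One possible fix, sketched below under the assumption that bind_shiny() accepts a reactive expression returning a ggvis object, is to read input$letter inside reactive() and pass its value on:

```r
library(shiny)
library(ggvis)
library(dplyr)

df <- data.frame(
  direction = rep(c("up", "down"), each = 3),
  value = c(1:3, -(1:3)),
  x = rep(c("A", "B", "C"), 2)
)

# `letter` is a plain character string here, not a reactive
plot_fct <- function(letter) {
  df %>%
    filter(x == letter) %>%
    ggvis(x = ~x, y = ~value, fill = ~direction) %>%
    layer_bars(stack = FALSE)
}

ui <- fluidPage(
  sidebarPanel(
    selectInput("letter", "Choose letter", c("A", "B", "C"), selected = "A")
  ),
  mainPanel(
    ggvisOutput("letter_barplot")
  )
)

server <- function(input, output) {
  # input$letter is evaluated inside reactive(), so plot_fct() gets its value
  reactive(plot_fct(input$letter)) %>% bind_shiny("letter_barplot")
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app interactively
```

The only change to the server logic is where reactive() wraps: around the whole plot call instead of around the input.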

Resources