I am trying to plot the prediction error curve from the pec package, but I can't change the legend's position or size. Here is an example from the pec package:
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But this shows a big legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But then no legend is shown at all.
How can I change the position of the legend? Also, is it possible to plot this graph using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, build a data frame from them, and plot it.
library(ggplot2)
# packages for the pipe and pivot_wider, you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # I suppose you know how to custom axis names etc.
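Since the question was about the legend's position and size, here is a minimal sketch of how those are usually controlled in ggplot2 through theme(). The toy data frame and its values are made up purely for illustration; it only mimics the long layout (time, models, values) built above and is not output from pec.

```r
library(ggplot2)

# Hypothetical stand-in for the long data frame built above:
# one row per (time, model) pair, with made-up error values.
toy <- data.frame(
  time   = rep(0:4, times = 2),
  models = rep(c("CoxPH", "Weibull"), each = 5),
  values = c(0.00, 0.05, 0.10, 0.14, 0.17,
             0.00, 0.06, 0.11, 0.16, 0.20)
)

p <- ggplot(toy, aes(x = time, y = values, color = models)) +
  geom_line() +
  labs(x = "Time", y = "Prediction error", color = "Model") +
  theme(
    legend.position = "bottom",               # or "top", "left", "right", "none"
    legend.text     = element_text(size = 8), # size of the legend entries
    legend.title    = element_text(size = 9)  # size of the legend title
  )
p
```

The same theme() call can be appended to the plot built from the brier object, assuming the column names match.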
What I'm currently stuck on is trying to plot each column of my dataframe as its own histogram in ggplot. I attached a screenshot below:
Ideally I would be able to compare the values in every 'Esteem' column side-by-side by plotting multiple histograms.
I tried using the melt() function to reshape my dataframe, and then feed into ggplot() but somewhere along the way I'm going wrong...
You could pivot to long, then facet by column:
library(tidyr)
library(ggplot2)
esteem81_long <- esteem81 %>%
pivot_longer(
Esteem81_1:Esteem81_10,
names_to = "Column",
values_to = "Value"
)
ggplot(esteem81_long, aes(Value)) +
geom_bar() +
facet_wrap(vars(Column))
Or for a list of separate plots, just loop over the column names:
plots <- list()
for (col in names(esteem81)[-1]) {
plots[[col]] <- ggplot(esteem81) +
geom_bar(aes(.data[[col]]))
}
plots[["Esteem81_4"]]
Example data:
set.seed(13)
esteem81 <- data.frame(Subject = c(2,6,7,8,9))
for (i in 1:10) {
esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}
esteem_long <- esteem81 %>% pivot_longer(cols = -c(Subject))
plot <- ggplot(esteem_long, aes(x = value)) +
geom_histogram(binwidth = 1) +
facet_wrap(vars(name))
plot
I'm using pivot_longer() from tidyr and ggplot2 for the plotting.
The line pivot_longer(cols = -c(Subject)) reads as "apart from the Subject column, pivot all the other columns into long form." I've left the default new column names (name and value); if you rename them, be sure to change the downstream code to match.
geom_histogram automates the binning and tallying of the data into histogram format - change the binwidth parameter to suit your desired outcome.
facet_wrap() allows you to specify a grouping variable (here name) and will replicate the plot for each group.
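To illustrate the point about renaming the pivoted columns, here is a small sketch under the same example data. The names item and score are chosen here only for illustration; the key point is that aes() and facet_wrap() must refer to whatever names_to and values_to were set to.

```r
library(tidyr)
library(ggplot2)

set.seed(13)
# Same shape as the example data above: a Subject column plus ten item columns.
esteem81 <- data.frame(Subject = c(2, 6, 7, 8, 9))
for (i in 1:10) {
  esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}

# "item" and "score" are illustrative names, not part of the original answer.
esteem_long <- pivot_longer(
  esteem81,
  cols      = -Subject,
  names_to  = "item",
  values_to = "score"
)

# Downstream code has to use the new names in aes() and facet_wrap().
p <- ggplot(esteem_long, aes(x = score)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(item))
p
```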
When using the base R boxplot function, I can pass my data frame directly as the argument and a perfect boxplot emerges, e.g.:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
[Image: the boxplot made with the base R boxplot function]
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?
geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
pivot_longer(everything(), values_to="Value", names_to="Variable") %>%
ggplot() +
geom_boxplot(aes(x=Variable, y=Value))
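If you would rather not load tidyr, base R's stack() produces the same kind of long layout. This is only a sketch of an alternative, not a replacement for the pivot_longer approach; note that stack() names its output columns values and ind, and the fill mapping is added here because the question mentioned it.

```r
library(ggplot2)

baseline <- c(0, 0, 0, 0, 1)
post_cap <- c(1, 5, 5, 6, 11)
qx314    <- c(0, 0, 0, 3, 7)
naive_capqx <- data.frame(baseline, post_cap, qx314)

# stack() returns two columns: `values` (the numbers) and
# `ind` (the name of the source column, stored as a factor).
long <- stack(naive_capqx)

p <- ggplot(long, aes(x = ind, y = values, fill = ind)) +
  geom_boxplot() +
  labs(x = NULL, y = "Value", fill = "Variable")
p
```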
Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
gather("key", "value") %>% # lengthen: one row per (column name, value) pair
group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
geom_boxplot()
I have XY data (a 2D tSNE embedding of high dimensional data) which I'd like to scatter plot. The data are assigned to several clusters, so I'd like to color code the points by cluster and then add a single label for each cluster, that has the same color coding as the clusters, and is located outside (as much as possible) from the cluster's points.
Any idea how to do this using R in either ggplot2 and ggrepel or plotly?
Here's the example data (the XY coordinates and cluster assignments are in df and the labels in label.df) and the ggplot2 part of it:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none")
The geom_label_repel() function in the ggrepel package lets you add labels to a plot while "repelling" them so that they do not overlap other elements. It takes a slight addition to your existing code: summarize the data to get the coordinates where the labels should go (here I chose the upper-left region of each cluster, i.e. the min of x and the max of y) and merge the result with your existing data frame containing the cluster labels. Pass this data frame to geom_label_repel() and map the variable that holds the label text in aes().
library(dplyr)
library(ggplot2)
library(ggrepel)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
label.df_2 <- df %>%
group_by(cluster) %>%
summarize(x = min(x), y = max(y)) %>%
left_join(label.df, by = "cluster") # name the join key explicitly
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none") +
ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
I'm an EViews user, and EViews can draw a scatter plot matrix with very little effort.
In the following graph, I have 13 different groups of data, and EViews plots one group against each of the other 12 groups, in 12 panels within one graph, each with a regression line.
How can I produce the same graph in RStudio?
Here is an example on how to do the requested plot in ggplot:
First some data:
z <- matrix(rnorm(1000), ncol= 10)
The basic idea here is to convert the wide matrix to long format where the variable that is compared to all others is duplicated as many times as there are other variables. Each of these other variables gets a specific label in the key column. ggplot likes the data in this format
library(tidyverse)
z %>%
as.data.frame() %>% #convert matrix to a data.frame with columns V1..V10 (as.tibble() is deprecated)
gather(key, value, 2:10) %>% #convert to long format specifying variable columns 2:10
mutate(key = factor(key, levels = paste0("V", 1:10))) %>% #specify levels so the facets go in the correct order to avoid V10 being before V2
ggplot() +
geom_point(aes(value, V1))+ #plot points
geom_smooth(aes(value, V1), method = "lm", se = FALSE)+ #plot lm fit without se
facet_wrap(~key) #facet by key
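As gather() has since been superseded in tidyr, the same reshaping can be sketched with pivot_longer(). This is a rough equivalent under the same assumptions as above (a 10-column matrix, with the first column compared against the other nine); the seed is added here only to make the sketch reproducible.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # added for reproducibility; not in the original answer
z <- matrix(rnorm(1000), ncol = 10)

long <- pivot_longer(
  as.data.frame(z),        # columns are named V1..V10
  cols      = -V1,         # keep V1, lengthen the other nine columns
  names_to  = "key",
  values_to = "value"
)
# Fix the facet order so that V10 does not sort before V2.
long$key <- factor(long$key, levels = paste0("V", 2:10))

p <- ggplot(long, aes(x = value, y = V1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(vars(key))
p
```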
I am using a geom_bar plot in ggplotly, and it renders negative bars positive. Any ideas why this might be the case, and in particular how to solve this?
library(ggplot2)
library(plotly)
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(-13.53, 16.81, 16.24, 17.42)
)
# Bar graph, time on x-axis, color fill grouped by sex -- use position_dodge()
g <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
ggplotly(g)
Why would the first bar be in a positive direction, with a negative value?
The versions I am using are the latest:
plotly_3.4.13
ggplot2_2.1.0
If you write your plotly object to another variable you can modify the plotly properties including the 'data' it uses to render the plot.
For your specific example append this to your code:
#create plotly object to manipulate
gly<-ggplotly(g)
#confirm existing data structure/values
gly$x$data[[1]]
# see $y has values of 13.53, 16.81 which corresponds to first groups absolute values
#assign to original data
gly$x$data[[1]]$y <- dat1$total_bill[grep("Female",dat1$sex)]
#could do for second group too if needed
gly$x$data[[2]]$y <- dat1$total_bill[grep("Male",dat1$sex)]
#to see ggplotly object with changes
gly
I have come up with a general solution that works in cases where facet wrap is being used. Here is an example of the problem with toy data:
set.seed(45)
df <- data.frame( group=rep(1:4,5), TitleX=rep(1:5,4), TitleY=sample(-5:5,20, replace = TRUE))
h <- ggplot(df) + geom_bar(aes(TitleX,TitleY),stat = 'identity') + facet_wrap(~group)
h
When we use ggplotly we see what OP saw, which is that the negatives have disappeared:
gly <- ggplotly(h)
gly
I wrote a function that checks each facet's data for entries where the y value in the hover text is given as 0, which seems to go hand in hand with the issue I am addressing, and flips the sign of the corresponding y values:
library(dplyr) # needed for %>% and mutate()
fix_bar_ly <- function(element,yname){
tmp <- as.data.frame(element[c("y","text")])
tmp <- tmp %>% mutate(
y=ifelse(grepl(paste0(yname,": 0$"),text),
ifelse(y!=0,-y,y),
y)
)
element$y <- tmp$y
element
}
Now I apply this function to the data for each facet:
data.list <- gly$x$data
m <- lapply(data.list,function(x){fix_bar_ly(x,"TitleY")})
gly$x$data <- m
gly
For some reason the spaces between the bars have disappeared ... but at least the values are negative in the appropriate places.