ggplotly plots negative bars in positive direction

I am using a geom_bar plot in ggplotly, and it renders negative bars positive. Any ideas why this might be the case, and in particular how to solve this?
library(ggplot2)
library(plotly)
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(-13.53, 16.81, 16.24, 17.42)
)
# Bar graph, time on x-axis, color fill grouped by sex -- use position_dodge()
g <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
ggplotly(g)
Why would the first bar be in a positive direction, with a negative value?
The versions I am using are the latest:
plotly_3.4.13
ggplot2_2.1.0

If you assign your plotly object to a variable, you can modify its properties, including the 'data' it uses to render the plot.
For your specific example append this to your code:
#create plotly object to manipulate
gly<-ggplotly(g)
#confirm existing data structure/values
gly$x$data[[1]]
# $y holds 13.53, 16.81: the absolute values of the first group (Female)
#assign to original data
gly$x$data[[1]]$y <- dat1$total_bill[grep("Female",dat1$sex)]
#could do for second group too if needed
gly$x$data[[2]]$y <- dat1$total_bill[grep("Male",dat1$sex)]
#to see ggplotly object with changes
gly

I have come up with a general solution that also works when facet_wrap is used. Here is an example of the problem with toy data:
set.seed(45)
df <- data.frame( group=rep(1:4,5), TitleX=rep(1:5,4), TitleY=sample(-5:5,20, replace = TRUE))
h <- ggplot(df) + geom_bar(aes(TitleX,TitleY),stat = 'identity') + facet_wrap(~group)
h
When we use ggplotly, we see what the OP saw: the negative bars have disappeared:
gly <- ggplotly(h)
gly
I wrote a function that checks, in each facet's data list, for bars whose hover text gives the y value as 0 (which seems to go hand in hand with the sign problem) and negates their y values:
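library(dplyr) # for %>% and mutate()
fix_bar_ly <- function(element,yname){
# one row per bar: the rendered y value and its hover text
tmp <- as.data.frame(element[c("y","text")])
tmp <- tmp %>% mutate(
# hover text reads "<yname>: 0" but y is nonzero: treat it as a negative bar and flip the sign
y=ifelse(grepl(paste0(yname,": 0$"),text),
ifelse(y!=0,-y,y),
y)
)
element$y <- tmp$y
element
}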
Now I apply this function to the data for each facet:
data.list <- gly$x$data
m <- lapply(data.list,function(x){fix_bar_ly(x,"TitleY")})
gly$x$data <- m
gly
For some reason the spaces between the bars have disappeared ... but at least the values are negative in the appropriate places.
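The sign-flipping rule inside fix_bar_ly() can also be written without dplyr. A minimal base R sketch: fix_bar_ly_base is an illustrative name, not part of any package, and the trace below is hypothetical (made-up y and text values, shaped like one entry of gly$x$data, of which only y and text are used).

```r
# Base R variant of the same rule used in fix_bar_ly() (no dplyr needed)
fix_bar_ly_base <- function(element, yname) {
  # TRUE where the hover text ends in "<yname>: 0"
  zero_text <- grepl(paste0(yname, ": 0$"), element$text)
  # flip the sign only for those bars with a nonzero rendered y
  element$y <- ifelse(zero_text & element$y != 0, -element$y, element$y)
  element
}

# Hypothetical trace, for illustration only
trace <- list(
  y = c(3, 2, 4),
  text = c("TitleX: 1<br />TitleY: 3",
           "TitleX: 2<br />TitleY: 0",
           "TitleX: 3<br />TitleY: 4")
)
fixed <- fix_bar_ly_base(trace, "TitleY")
fixed$y  # 3 -2 4
```

It can be applied the same way as the original, e.g. gly$x$data <- lapply(gly$x$data, fix_bar_ly_base, yname = "TitleY").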

Related

Change position of legend in plot of pec object

I am trying to plot the prediction error curve from the pec package, but I can't change the legend position and size. Here is an example from the pec package:
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But it shows a big legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But then no legend is shown.
How can I change the position of the legend? And is it also possible to plot this graph using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, make a data frame from them and plot it.
library(ggplot2)
# packages for the pipe and pivot_longer; you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # I suppose you know how to custom axis names etc.
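For the legend part of the question: once the curves are drawn with ggplot, the legend is placed with theme(legend.position = ...). The sketch below also does the reshaping with base functions instead of tidyr. brier_mock is a hypothetical stand-in with the shape the answer reads from brier (a time vector plus one equal-length vector per model in AppErr); its numbers are made up.

```r
library(ggplot2)

# Hypothetical stand-in for brier[["time"]] and brier[["AppErr"]] (made-up numbers)
brier_mock <- list(
  time = c(0, 500, 1000),
  AppErr = list(
    Weibull = c(0, 0.08, 0.15),
    CoxPH   = c(0, 0.07, 0.14),
    CPH     = c(0, 0.07, 0.13)
  )
)
n_time <- length(brier_mock$time)

# Base R long format: the models are stacked one after another
df_long <- data.frame(
  time   = rep(brier_mock$time, times = length(brier_mock$AppErr)),
  models = rep(names(brier_mock$AppErr), each = n_time),
  values = unlist(brier_mock$AppErr, use.names = FALSE)
)

p <- ggplot(df_long, aes(x = time, y = values, color = models)) +
  geom_line() +
  theme(legend.position = "bottom")  # also "top", "left", "right" or "none"
```

With the real object, brier_mock would be replaced by brier.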

ggplot add horizontal line inside for loop [duplicate]

Summary: When I use a "for" loop to add layers to a violin plot (in ggplot), the only layer that is added is the one created by the final loop iteration. Yet in explicit code that mimics the code that the loop would produce, all the layers are added.
Details: I am trying to create violin graphs with overlapping layers, to show the extent to which estimate distributions do or do not overlap for several survey question responses, stratified by place. I want to be able to include any number of places, so I have one column in my dataframe for each place, and am trying to use a "for" loop to generate one ggplot layer per place. But the loop only adds the layer from the loop's final iteration.
This code illustrates the problem, and some suggested approaches that failed:
library(ggplot2)
# Create a dataframe with 500 random normal values for responses to 3 survey questions from two cities
topic <- c("Poverty %","Mean Age","% Smokers")
place <- c("Chicago","Miami")
n <- 500
mean <- c(35, 40,58, 50, 25,20)
var <- c( 7, 1.5, 3, .25, .5, 1)
df <- data.frame( topic=rep(topic,rep(n,length(topic)))
,c(rnorm(n,mean[1],var[1]),rnorm(n,mean[3],var[3]),rnorm(n,mean[5],var[5]))
,c(rnorm(n,mean[2],var[2]),rnorm(n,mean[4],var[4]),rnorm(n,mean[6],var[6]))
)
names(df)[2:dim(df)[2]] <- place # Name those last two columns with the corresponding place name.
head(df)
# This "for" loop seems to only execute the final loop (i.e., where p=3)
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
g <- g + geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
}
g
# But mimicking what the for loop does in explicit code works fine, resulting in both "place"s being displayed in the graph.
g <- ggplot(df, aes(factor(topic), df[,2]))
g <- g + geom_violin(aes(y = df[,2], colour = place[2-1]), alpha = 0.3)
g <- g + geom_violin(aes(y = df[,3], colour = place[3-1]), alpha = 0.3)
g
## per http://stackoverflow.com/questions/18444620/set-layers-in-ggplot2-via-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
df1 <- df[,c(1,p)]
g <- g + geom_violin(aes(y = df1[,2], colour = place[p-1]), alpha = 0.3)
}
g
# but got the same undesired result
# per http://stackoverflow.com/questions/15987367/how-to-add-layers-in-ggplot-using-a-for-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in names(df)[-1]) {
cat(p,"\n")
g <- g + geom_violin(aes_string(y = p, colour = p), alpha = 0.3) # produced this error: Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
# g <- g + geom_violin(aes_string(y = p ), alpha = 0.3) # produced this error: Error: stat_ydensity requires the following missing aesthetics: y
}
g
# but that failed to produce any graphic, per the errors noted in the "for" loop above
This happens because of ggplot's "lazy evaluation". It is a common problem when ggplot is used this way (making the layers separately in a loop, rather than having ggplot do it for you, as in @hrbrmstr's solution).
ggplot stores the arguments to aes(...) as expressions, and only evaluates them when the plot is rendered. So, in your loops, something like
aes(y = df[,p], colour = place[p-1])
gets stored as is, and evaluated when you render the plot, after the loop completes. At this point, p=3 so all the plots are rendered with p=3.
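The effect can be reproduced in base R without ggplot2. This is an analogy, not ggplot2's actual mechanism (ggplot2 captures aes() arguments as rlang quosures): expressions stored with quote() inside a loop all see the final value of p when evaluated afterwards, while bquote() splices in the value p has at that iteration.

```r
env <- environment()

# Store one unevaluated expression per iteration, as aes(y = df[,p]) does
stored <- vector("list", 2)
for (p in 2:3) stored[[p - 1]] <- quote(p * 10)
# Evaluated after the loop, when p is 3 for every expression
lazy <- vapply(stored, eval, numeric(1), envir = env)
lazy  # 30 30

# bquote() substitutes the current value of p into each expression
injected_exprs <- vector("list", 2)
for (p in 2:3) injected_exprs[[p - 1]] <- bquote(.(p) * 10)
injected <- vapply(injected_exprs, eval, numeric(1), envir = env)
injected  # 20 30
```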
So the "right" way to do this is to use melt(...) in the reshape2 package to convert your data from wide to long format, and let ggplot manage the layers for you. I put "right" in quotes because in this particular case there is a subtlety. When calculating the distributions for the violins using the melted data frame, ggplot uses the grand total (for both Chicago and Miami) as the scale. If you want violins based on frequency scaled individually, you need to use loops (sadly).
The way around the lazy evaluation problem is to put any reference to the loop index in the data=... definition. That argument is not stored as an expression; the actual data is stored in the plot definition. So you could do this:
g <- ggplot(df,aes(x=topic))
for (p in 2:length(df)) {
gg.data <- data.frame(topic=df$topic,value=df[,p],city=names(df)[p])
g <- g + geom_violin(data=gg.data,aes(y=value, color=city))
}
g
which gives the same result as yours. Note that the index p does not show up in aes(...).
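Another way to sidestep the loop-index problem is to build the layers with lapply(). Each call of the anonymous function gets its own environment, so the column name that aes() captures is the one for that iteration. This sketch assumes ggplot2 3.0.0 or later (for the .data pronoun); the small df is a made-up stand-in for the question's data.

```r
library(ggplot2)
set.seed(1)

# Made-up stand-in for the question's wide df: one column per place
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age"), each = 50),
  Chicago = c(rnorm(50, 35, 7), rnorm(50, 58, 3)),
  Miami   = c(rnorm(50, 40, 1.5), rnorm(50, 50, 0.25))
)

# `col` lives in each call's own environment, so it is not overwritten later
layers <- lapply(names(df)[-1], function(col) {
  geom_violin(aes(y = .data[[col]], colour = col), alpha = 0.3)
})

# A list of layers can be added to a plot with +
g <- ggplot(df, aes(x = topic)) + layers
# print(g) to draw it
```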
Update: A note about scale="width" (mentioned in a comment). This causes all the violins to have the same width, which is not the same scaling as in the OP's original code. IMO this is not a great way to visualize the data, as it suggests there is much more data in the Chicago group.
library(reshape2)
gg <- melt(df, id.vars = "topic") # long format: topic, variable (place), value
ggplot(gg) +geom_violin(aes(x=topic,y=value,color=variable),
alpha=0.3,position="identity",scale="width")
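You can do it without a loop:
library(reshape2)
library(ggplot2)
df.2 <- melt(df) # topic is used as the id variable; the place columns become "variable"/"value"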
gg <- ggplot(df.2, aes(x=topic, y=value))
gg <- gg + geom_violin(position="identity", aes(color=variable), alpha=0.3)
gg
You can use aes_() rather than aes(), which appears to stop the lazy evaluation (note that aes_() has been soft-deprecated since ggplot2 3.0.0 in favour of tidy evaluation). I found this answer on a closed question that links here (Update a ggplot using a for loop (R)), but thought it should be here since the other question was closed.
While reshaping the data is generally preferred, newer versions of ggplot2 (3.0.0 and later) let you use !! to inject values into aes(). For example, you can do
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
g <- g + geom_violin(aes(y = df[,!!p], colour = place[!!p-1]), alpha = 0.3)
}
g
This gives the desired result: !! forces evaluation when the layer is created, rather than leaving it lazy as is the default.
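The question's aes_string() attempt can also be written in the tidy-evaluation style. A sketch assuming ggplot2 3.0.0 or later: .data[[col]] looks a column up by its name, but the name itself is still only resolved when the plot is built, so inside a for loop !! is needed to fix the current name. The data frame is a made-up stand-in for the question's df.

```r
library(ggplot2)
set.seed(1)

# Made-up stand-in for the question's wide df: one column per place
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age"), each = 50),
  Chicago = c(rnorm(50, 35, 7), rnorm(50, 58, 3)),
  Miami   = c(rnorm(50, 40, 1.5), rnorm(50, 50, 0.25))
)

g <- ggplot(df, aes(x = topic))
for (col in names(df)[-1]) {
  # !! injects the current column name now; .data[[...]] then looks that
  # column up in the layer's data when the plot is built
  g <- g + geom_violin(aes(y = .data[[!!col]], colour = !!col), alpha = 0.3)
}
```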

boxplots with missing values in R - ggplot

I am trying to make boxplots for a matrix (athTp) with 6 variables (columns) but with many missing values:
ggplot(athTp)+geom_boxplot()
But maybe I am doing something wrong...
I also tried making separate box plots and then arranging them in a grid, but the final plot was very small (at the desired dimensions), losing many details.
q1 <- ggplot(athTp,aes(x="V1", y=athTp[,1]))+ geom_boxplot()
# ...continue with the other 5 columns (q2 to q6)
library(gridExtra) # grid.arrange() comes from gridExtra
qq <- grid.arrange(q1,q2,q3,q4,q5,q6, ncol=6)
ggsave("plot.pdf",plot = qq, width = 8, height = 8, units = "cm")
Do you have any ideas?
Thanks in advance!
# ok so your data has 6 columns like this
set.seed(666)
dat <- data.frame(matrix(runif(60,1,20),ncol=6))
names(dat) <- letters[1:6]
head(dat)
# so let's get in long format like ggplot likes
library(reshape2)
longdat <- melt(dat)
head(longdat)
# and try your plot call again specifying that we want a box plot per column
# which is now indicated by the "variable" column
# [remember you should specify the x and y axes with `aes()`]
library(ggplot2)
ggplot(longdat, aes(x=variable, y=value)) + geom_boxplot(aes(colour = variable))
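The sample data above has no missing values, while athTp in the question is a matrix with many. A sketch under those assumptions, using a made-up matrix with some entries set to NA: reshape it to long format in base R and drop the NA rows explicitly (ggplot2 would otherwise drop them itself, with a warning). The names m and longm are illustrative.

```r
library(ggplot2)
set.seed(666)

# Made-up stand-in for athTp: a 10 x 6 numeric matrix with 15 missing values
m <- matrix(runif(60, 1, 20), ncol = 6, dimnames = list(NULL, paste0("V", 1:6)))
m[sample(length(m), 15)] <- NA

# as.vector() reads the matrix column by column, so repeat each name nrow(m) times
longm <- data.frame(
  variable = factor(rep(colnames(m), each = nrow(m)), levels = colnames(m)),
  value    = as.vector(m)
)
longm <- longm[!is.na(longm$value), ]  # keep only the observed values

p <- ggplot(longm, aes(x = variable, y = value)) +
  geom_boxplot(aes(colour = variable))
# ggsave("plot.pdf", plot = p, width = 8, height = 8, units = "cm")
```

Because all six boxplots share one panel, there is no need to arrange separate plots in a grid.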

R plot two series of means with 95% confidence intervals

I am trying to plot the following data
factor <- as.factor(c(1,2,3))
V1_mean <- c(100,200,300)
V2_mean <- c(350,150,60)
V1_stderr <- c(5,9,3)
V2_stderr <- c(12,9,10)
plot <- data.frame(factor,V1_mean,V2_mean,V1_stderr,V2_stderr)
I want to create a plot with factor on the x-axis, value on the y-axis and separate lines for V1 and V2 (hence the points are the values of V1_mean on one line and V2_mean on the other). I would also like to add error bars for these means based on V1_stderr and V2_stderr.
Many thanks
I'm not sure regarding your desired output, but here's a possible solution.
First of all, I wouldn't name your data plot, as plot() is a commonly used base R function.
Second, when you want to plot two lines in ggplot you'll usually have to tidy your data into long format, using functions such as melt (from the reshape2 package) or gather (from the tidyr package).
Here's a possible approach:
library(ggplot2)
library(reshape2)
dat <- data.frame(factor, V1_mean, V2_mean, V1_stderr, V2_stderr)
mdat <- cbind(melt(dat[1:3], "factor"), melt(dat[c(1, 4:5)], "factor"))
names(mdat) <- make.names(names(mdat), unique = TRUE)
ggplot(mdat, aes(factor, value, color = variable)) +
geom_line(aes(group = variable)) + # You can also add `geom_point() + ` if you want to see the actual points
geom_errorbar(aes(ymin = value - value.1, ymax = value + value.1))
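The question title asks for 95% confidence intervals, while these error bars span one standard error. A sketch that widens them to an approximate 95% interval, assuming a normal approximation (mean ± qnorm(0.975) × SE, i.e. about ±1.96 SE), and that builds the long format in base R instead of with two melt() calls:

```r
library(ggplot2)

# Long format: one row per (factor level, series)
ci <- data.frame(
  factor   = factor(rep(c(1, 2, 3), times = 2)),
  variable = rep(c("V1", "V2"), each = 3),
  mean     = c(100, 200, 300, 350, 150, 60),  # V1_mean, then V2_mean
  stderr   = c(5, 9, 3, 12, 9, 10)            # V1_stderr, then V2_stderr
)

z <- qnorm(0.975)  # about 1.96 under the normal approximation
ci$lower <- ci$mean - z * ci$stderr
ci$upper <- ci$mean + z * ci$stderr

p <- ggplot(ci, aes(x = factor, y = mean, colour = variable, group = variable)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1)
```

For small samples a t quantile would be more appropriate than 1.96, but that requires the sample sizes, which the question does not give.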

ggplot legend displays levels not present in data

I have a plot whose legend should contain two levels. ggplot shows a legend with six levels, including four which do not appear in the data frame. A simple reproduction of the problem is shown below:
x <- seq(from=1, to=10, by=0.5)
y.2 <- x^2
y.3 <- x^3
exponent.2 <- 2
exponent.3 <- 3
data2 <- data.frame(x=x, y=y.2, exponent = exponent.2)
data3 <- data.frame(x=x, y=y.3, exponent = exponent.3)
data <- rbind(data2, data3)
p <- ggplot(data,aes(x,y,group=exponent, color=exponent)) + geom_line()
p
I am obviously doing something wrong, but need help in finding the problem.
ggplot2 interprets exponent as a continuous variable; thus it displays a number of breaks similarly to what pretty(c(2, 3)) would return.
You can use colour = factor(exponent), or specify the colour breaks explicitly.
Try
p <- ggplot(data,aes(x,y,group=factor(exponent), color=factor(exponent))) + geom_line()
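To see where extra legend entries can come from, pretty(c(2, 3)) returns six values (2.0 to 3.0 in steps of 0.2). The second suggestion, keeping exponent numeric but fixing the colour breaks, can be sketched as follows; scale_colour_continuous(breaks = ...) is standard ggplot2, and the data rebuilds the question's example.

```r
library(ggplot2)

# Rebuild the question's data
x <- seq(from = 1, to = 10, by = 0.5)
data <- rbind(
  data.frame(x = x, y = x^2, exponent = 2),
  data.frame(x = x, y = x^3, exponent = 3)
)

print(pretty(c(2, 3)))  # 2.0 2.2 2.4 2.6 2.8 3.0

# exponent stays continuous, but only the values present in the data are labelled
p2 <- ggplot(data, aes(x, y, group = exponent, colour = exponent)) +
  geom_line() +
  scale_colour_continuous(breaks = c(2, 3))
```

With a continuous scale the legend is still a colour bar; factor(exponent) gives two discrete keys instead.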
