ggplot legend displays levels not present in data - r

I have a plot whose legend should contain two levels. ggplot2 shows a legend with six levels, including four that do not appear in the data frame. A simple reproduction of the problem is shown below:
library(ggplot2)
x <- seq(from=1, to=10, by=0.5)
y.2 <- x^2
y.3 <- x^3
exponent.2 <- 2
exponent.3 <- 3
data2 <- data.frame(x=x, y=y.2, exponent = exponent.2)
data3 <- data.frame(x=x, y=y.3, exponent = exponent.3)
data <- rbind(data2, data3)
p <- ggplot(data,aes(x,y,group=exponent, color=exponent)) + geom_line()
p
I am obviously doing something wrong, but need help in finding the problem.

ggplot2 interprets exponent as a continuous variable, so the legend displays a set of breaks similar to what pretty(c(2, 3)) would return.
You can use colour = factor(exponent), or specify the colour breaks explicitly.
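Both options can be sketched on the question's data as follows. This is a minimal sketch rather than the answerer's own code: passing `breaks = c(2, 3)` and `guide = "legend"` to `scale_colour_continuous()` is one way to restrict a continuous scale to the two values, and the object names `p_factor` and `p_breaks` are chosen here for illustration.

```r
library(ggplot2)

# Rebuild the data from the question
x <- seq(from = 1, to = 10, by = 0.5)
data <- rbind(
  data.frame(x = x, y = x^2, exponent = 2),
  data.frame(x = x, y = x^3, exponent = 3)
)

# The kind of breaks a continuous scale picks for the range 2..3
pretty(c(2, 3))

# Option 1: treat exponent as a discrete variable
p_factor <- ggplot(data, aes(x, y, colour = factor(exponent))) +
  geom_line()

# Option 2: keep exponent continuous, but only show breaks at 2 and 3
p_breaks <- ggplot(data, aes(x, y, group = exponent, colour = exponent)) +
  geom_line() +
  scale_colour_continuous(breaks = c(2, 3), guide = "legend")
```

Option 1 gives two clearly distinct colours; option 2 keeps the colour gradient and only changes which values are labelled in the legend.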

Try
p <- ggplot(data,aes(x,y,group=factor(exponent), color=factor(exponent))) + geom_line()

Related

Give color to scatter plot points based on value threshold

I have a data.frame with two columns whose values range from -10 to 10, and I want to plot it with ggplot.
I need to color the points whose values are greater than 8 or less than -8.
How can I do this with geom_point() in ggplot?
I agree with the comments above; in any case, I think this is what you are looking for:
p <- runif(100, min=-10, max=10)
g <- 1:100
dat <- data.frame(p, g)
dat$colors <- 1
dat[which(dat$p < (-8) | dat$p > 8),"colors"] <- 0
library(ggplot2)
ggplot(dat, aes(x=g, y=p, group=colors)) + geom_point(aes(color=as.factor(colors)))
Edit:
In a previous version of this answer the different colors were expressed as a continuous variable. I changed this to a dichotomous format with as.factor.
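The same idea can be written without a separate 0/1 column by mapping a logical condition to colour. This is a sketch under assumptions: the column name `extreme`, the fixed seed, and the grey/red palette are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # only so the random example is reproducible
dat <- data.frame(g = 1:100, p = runif(100, min = -10, max = 10))

# TRUE for points beyond the threshold in either direction
dat$extreme <- abs(dat$p) > 8

plt <- ggplot(dat, aes(x = g, y = p, colour = extreme)) +
  geom_point() +
  scale_colour_manual(
    name = "|p| > 8",
    values = c("FALSE" = "grey50", "TRUE" = "red")
  )
```

Because a logical column is already discrete, no as.factor conversion is needed here.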

Plot with one line for each column and time-series on the x-axis R

You can find my dataset here.
From this data, I wish to plot (one line for each):
x$y[,1]
x$y[,5]
x$y[,1]+x$y[,5]
Therefore, more clearly, in the end, each of the following will be represented by one line:
y0,
z0,
y0+z0
My x-axis (time-series) will be from x$t.
I have tried the following, but the time-series variable is problematic and I cannot figure out how I can exactly plot it. My code is:
Time <- x$t
X0 <- x$y[,1]
Z0 <- x$y[,5]
X0.plus.Z0 <- X0 + Z0
xdf0 <- cbind(Time,X0,Z0,X0.plus.Z0)
xdf0.melt <- melt(xdf0, id.vars="Time")
ggplot(data = xdf0.melt, aes(x=Time, y=value)) + geom_line(aes(colour=Var2))
The error comes from applying melt to an object that is not a data.frame: cbind returns a matrix here. Build a data.frame instead, like this:
xdf0 <- cbind.data.frame(Time,X0,Z0,X0.plus.Z0)
xdf0.melt <- reshape2::melt(xdf0, id.vars="Time")
ggplot(data = xdf0.melt, aes(x=Time, y=value)) + geom_line(aes(colour=variable))
You don't have to go through the melt process: since you have just 3 lines to plot, it's fine to plot them separately:
ggplot(data=xdf0) + aes(x=Time) +
geom_line(aes(y=X0), col="red") +
geom_line(aes(y=Z0), col="blue") +
geom_line(aes(y=X0.plus.Z0))
However, you don't get the legend.
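One way to get a legend while still plotting the lines separately is to map colour to a constant label inside aes() and assign the actual colours with a manual scale. The data below are a hypothetical stand-in (sine and cosine curves), since the question's dataset is not shown.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data
Time <- seq(0, 10, by = 0.1)
xdf0 <- data.frame(Time = Time, X0 = sin(Time), Z0 = cos(Time))
xdf0$X0.plus.Z0 <- xdf0$X0 + xdf0$Z0

# Each layer maps colour to a label; the manual scale turns labels into colours
p <- ggplot(xdf0, aes(x = Time)) +
  geom_line(aes(y = X0, colour = "X0")) +
  geom_line(aes(y = Z0, colour = "Z0")) +
  geom_line(aes(y = X0.plus.Z0, colour = "X0 + Z0")) +
  scale_colour_manual(
    name = "Series",
    values = c("X0" = "red", "Z0" = "blue", "X0 + Z0" = "black")
  ) +
  ylab("value")
```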
A remark about your example: you are trying to plot values of very different orders of magnitude, so you can't really see anything.
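If the series really do differ by orders of magnitude, one possible workaround is to give each series its own panel with a free y-axis. This is a sketch with made-up values; the long-format table is built by hand in base R here (instead of melt) only to keep the example dependency-free.

```r
library(ggplot2)

# Hypothetical series on very different scales
Time <- 1:50
wide <- data.frame(Time = Time, X0 = 1e6 * exp(-Time / 10), Z0 = Time / 50)
wide$X0.plus.Z0 <- wide$X0 + wide$Z0

# Long format: one row per (Time, variable) pair, like the melt() output
series <- c("X0", "Z0", "X0.plus.Z0")
long <- data.frame(
  Time = rep(wide$Time, times = length(series)),
  variable = factor(rep(series, each = nrow(wide)), levels = series),
  value = unlist(wide[series], use.names = FALSE)
)

# One panel per series, each with its own y-axis range
p_facets <- ggplot(long, aes(Time, value, colour = variable)) +
  geom_line() +
  facet_wrap(~variable, ncol = 1, scales = "free_y")
```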
How about
matplot(xdf0, type = 'l')
?

ggplotly plots negative bars in positive direction

I am using a geom_bar plot in ggplotly, and it renders negative bars positive. Any ideas why this might be the case, and in particular how to solve this?
library(ggplot2)
library(plotly)
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(-13.53, 16.81, 16.24, 17.42)
)
# Bar graph, time on x-axis, color fill grouped by sex -- use position_dodge()
g <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
ggplotly(g)
Why would the first bar be in a positive direction, with a negative value?
The versions that I am using are the latest:
plotly_3.4.13
ggplot2_2.1.0
If you write your plotly object to another variable you can modify the plotly properties including the 'data' it uses to render the plot.
For your specific example append this to your code:
#create plotly object to manipulate
gly<-ggplotly(g)
#confirm existing data structure/values
gly$x$data[[1]]
# see $y has values of 13.53, 16.81 which corresponds to first groups absolute values
#assign to original data
gly$x$data[[1]]$y <- dat1$total_bill[grep("Female",dat1$sex)]
#could do for second group too if needed
gly$x$data[[2]]$y <- dat1$total_bill[grep("Male",dat1$sex)]
#to see ggplotly object with changes
gly
I have come up with a general solution that works in cases where facet wrap is being used. Here is an example of the problem with toy data:
set.seed(45)
df <- data.frame( group=rep(1:4,5), TitleX=rep(1:5,4), TitleY=sample(-5:5,20, replace = TRUE))
h <- ggplot(df) + geom_bar(aes(TitleX,TitleY),stat = 'identity') + facet_wrap(~group)
h
When we use ggplotly we see what OP saw, which is that the negatives have disappeared:
gly <- ggplotly(h)
gly
I wrote a function that will check for the instances in each facet list where the y values in the text are given as 0, which seems to be a comorbid issue with the one I am currently addressing:
library(dplyr) # provides %>% and mutate()
fix_bar_ly <- function(element,yname){
tmp <- as.data.frame(element[c("y","text")])
tmp <- tmp %>% mutate(
y=ifelse(grepl(paste0(yname,": 0$"),text),
ifelse(y!=0,-y,y),
y)
)
element$y <- tmp$y
element
}
Now I apply this function to the data for each facet:
data.list <- gly$x$data
m <- lapply(data.list,function(x){fix_bar_ly(x,"TitleY")})
gly$x$data <- m
gly
For some reason the spaces between the bars have disappeared ... but at least the values are negative in the appropriate places.
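The sign-flipping logic can also be sketched in base R and checked on a mock trace without plotly or dplyr. The function name `fix_bar_base` and the mock values are hypothetical; the sketch assumes each trace has `y` and `text` vectors of equal length, as in the `gly$x$data` entries used above.

```r
# Base-R variant of fix_bar_ly(): flip the sign of y wherever the hover text
# reports "<yname>: 0" but the stored y is non-zero.
fix_bar_base <- function(element, yname) {
  # Leave traces without y or text untouched
  if (is.null(element$y) || is.null(element$text)) return(element)
  zero_label <- grepl(paste0(yname, ": 0$"), element$text)
  flip <- zero_label & element$y != 0
  element$y[flip] <- -element$y[flip]
  element
}

# Mock trace (hypothetical values) standing in for one entry of gly$x$data
trace <- list(
  y = c(3, 2, 0, 5),
  text = c("TitleY: 3", "TitleY: 0", "TitleY: 0", "TitleY: 0")
)
fixed <- fix_bar_base(trace, "TitleY")
fixed$y
```

It would be applied the same way as the original, for example with lapply over `gly$x$data`.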

adding layer to a plot in R

Taking some generic data
A <- c(1997,2000,2000,1998,2000,1997,1997,1997)
B <- c(0,0,1,0,0,1,0,0)
df <- data.frame(A,B)
counts <- t(table(A,B))
frac <- counts[1,]/(counts[2,]+counts[1,])
C <- c(1998,2001,2000,1995,2000,1996,1998,1999)
D <- c(1,0,1,0,0,1,0,1)
df2 <- data.frame(C,D)
counts2 <- t(table(C,D))
frac2 <- counts2[1,]/(counts2[2,]+counts2[1,])
If we then want to create a scatterplot of the two datasets on one scale, we can do:
plot(frac, pch=22)
points(frac2, pch=19)
But this leaves two problems:
First, we want the year values (which appear as df$A and df2$C) along the x-axis.
Second, we want the x-axis to adjust its scale automatically when the second dataset is added.
A solution using ggplot2 or base R would be appreciated.
ggplot will do the scaling for you. You can convert the fracs to data.frames and use them with ggplot:
library(ggplot2)
ggplot(data.frame(y=frac, x=names(frac)), aes(x, y)) +
geom_point(col="salmon") +
geom_point(data=data.frame(y=frac2, x=names(frac2)), aes(x, y), col="steelblue") +
theme_bw()
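Note that names(frac) is a character vector, so the x-axis above is discrete. If the years should sit on a numeric axis instead, a variant along these lines may help. It is a sketch: the helper `frac_by_year` and the `dataset` labels are hypothetical names introduced here, and it assumes both 0 and 1 occur in each flag vector.

```r
library(ggplot2)

A <- c(1997, 2000, 2000, 1998, 2000, 1997, 1997, 1997)
B <- c(0, 0, 1, 0, 0, 1, 0, 0)
C <- c(1998, 2001, 2000, 1995, 2000, 1996, 1998, 1999)
D <- c(1, 0, 1, 0, 0, 1, 0, 1)

# Hypothetical helper: fraction of zeros per year, with year kept numeric
frac_by_year <- function(year, flag, label) {
  counts <- table(flag, year)
  frac <- counts["0", ] / colSums(counts)
  data.frame(
    year = as.numeric(names(frac)),
    frac = as.numeric(frac),
    dataset = label
  )
}

# Stack both datasets so one scale covers all years
both <- rbind(frac_by_year(A, B, "df"), frac_by_year(C, D, "df2"))

p <- ggplot(both, aes(year, frac, colour = dataset)) +
  geom_point() +
  theme_bw()
```

Combining the two datasets into one data frame also produces a legend that distinguishes them.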

Plotting Luv colors; replicating figure 6.11 from the ggplot2 book

I am trying to replicate figure 6.11 from Hadley Wickham's ggplot2 book, which plots R colors in Luv space; the colors of points represent themselves, and no legend is necessary.
Here are two attempts:
library(colorspace)
myColors <- data.frame("L"=runif(10000, 0,100),"a"=runif(10000, -100, 100),"b"=runif(10000, -100, 100))
myColors <- within(myColors, Luv <- hex(LUV(L, a, b)))
myColors <- na.omit(myColors)
g <- ggplot(myColors, aes(a, b, color=Luv), size=2)
g + geom_point() + ggtitle ("mycolors")
Second attempt:
other <- data.frame("L"=runif(10000),"a"=runif(10000),"b"=runif(10000))
other <- within(other, Luv <- hex(LUV(L, a, b)))
other <- na.omit(other)
g <- ggplot(other, aes(a, b, color=Luv), size=2)
g + geom_point() + ggtitle("other")
There are a couple of obvious problems:
1. These graphs don't look anything like the figure. Any suggestions on the code needed?
2. The first attempt generates a lot of NA fields in the Luv column (only ~3100 named colors out of 10,000 runs, versus ~9950 in the second run). If L is supposed to be between 0-100 and u and v between -100 and 100, why do I have so many NAs in the first run? I have tried rounding, it doesn't help.
3. Why do I have a legend?
Many thanks.
You're getting strange colors because aes(color = Luv) says "assign a color to each unique string in column Luv". If you assign color outside of aes, as below, it means "use these explicit colors". I think something like this should be close to the figure you presented.
require(colorspace)
x <- sRGB(t(col2rgb(colors())))
storage.mode(x@coords) <- "numeric" # as(..., "LUV") doesn't like integers for some reason
y <- as(x, "LUV")
DF <- as.data.frame(y@coords)
DF$col <- colors()
ggplot(DF, aes( x = U, y = V)) + geom_point(colour = DF$col)
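An alternative to passing colours outside aes() is to keep the mapping and add scale_colour_identity(), which tells ggplot2 to use the values as literal colours and draws no legend by default. The sketch below swaps in base R's grDevices::convertColor() for the colorspace conversion, so the coordinates may not match the colorspace result exactly; the column names `L`, `U`, `V` are assigned by hand.

```r
library(ggplot2)

# Named R colours as sRGB values in [0, 1]
rgb01 <- t(col2rgb(colors())) / 255

# Convert to CIE Luv with base R (not the colorspace package)
luv <- convertColor(rgb01, from = "sRGB", to = "Luv")
DF <- data.frame(
  L = luv[, 1], U = luv[, 2], V = luv[, 3],
  col = colors(),
  stringsAsFactors = FALSE
)

# Map the colour column inside aes() and have ggplot2 use it as-is
p <- ggplot(DF, aes(x = U, y = V, colour = col)) +
  geom_point() +
  scale_colour_identity()
```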
