Unexpected theme change in ggplot2 - r

I'm getting unexpected behavior in the look of ggplot2. When I plot large amounts of data, it appears the default theme changes from theme_grey to something like theme_bw. I can reproduce this on the particular dataset I'm working on, but cannot reproduce it on simulated data.
At any rate, here's the code:
ggplot(df2, aes(x = Sequence, y = y, color = as.factor(group))) +
geom_point(shape=19, alpha = 0.8)
nrow(df2)
[1] 4330
results in:
Now, if I take a subset of the data:
df3 <- slice(df2, 1:10)
ggplot(df3, aes(x = Sequence, y = y, color = as.factor(group))) +
geom_point(shape=19, alpha = 0.8)
results in:
I have tried:
uninstalling and reinstalling ggplot2
manually specifying a theme
unloading all packages except ggplot2
working outside of a project
Sample of 5 obs:
> dput(df2[1:5, ])
structure(list(Sequence = c("1", "2", "3", "4", "5"), group = c(0,
0, 0, 0, 0), y = c(7711.945, 7695.075, 3432.585, 8081.19, 7344.455
)), .Names = c("Sequence", "group", "y"), row.names = c(NA, 5L
), class = "data.frame")

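Your input for 'x' is discrete: the dput output shows that Sequence is a character vector, which ggplot2 treats the same way as a factor. With thousands of discrete values on the x axis, ggplot2 most likely draws a white grid line at every value, and together those lines cover the grey panel background, so it looks as if the theme had changed. The following code reproduces the issue with a factor, and the final call, which converts x to numeric, fixes it.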
library(ggplot2)
# make some test input
n <- 5000
df <- data.frame(x = factor(1:n), y = rnorm(n), group = sample(0:1, n, replace = TRUE))
# Using x "as is", which is currently a factor
ggplot(df, aes(x = x, y = y, color = as.factor(group))) + geom_point(shape = 19, alpha = 0.8)
# Converting to numeric gives the desired result
ggplot(df, aes(x = as.numeric(x), y = y, color = as.factor(group))) + geom_point(shape = 19, alpha = 0.8)
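One caveat about the conversion, shown as a small base R sketch with made-up values (seq_chr and seq_fct are illustrative names, not part of the question's data): as.numeric() parses a character vector but returns the internal level codes of a factor. The two agree for factor(1:n), but not in general.

```r
# Hypothetical example values, not taken from the question's data
seq_chr <- c("10", "20", "30")
seq_fct <- factor(seq_chr)

num_from_chr   <- as.numeric(seq_chr)                # parses the text: 10 20 30
codes_from_fct <- as.numeric(seq_fct)                # level codes: 1 2 3
num_from_fct   <- as.numeric(as.character(seq_fct))  # recovers the values: 10 20 30
```

If the column is a factor whose labels are numbers, going through as.character() first is the safer route.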

Related

Adding three reference lines to a ggplot2 box plot works, but why can't I add two reference lines?

I have an example data frame:
#Libraries
library(tidyverse)
#Create example data
ex <- data.frame(id = 1:300,
                 event = rep(c("f", "s", "t"), 100),
                 x = rnorm(300, 50, 20))
And I need to make one plot with three horizontal reference lines and one plot with two horizontal reference lines.
The plot with three horizontal reference lines works
## Example Plot One (Three Reference Lines) ##
#Create boxplot function
ex_triple_plot <- function(data){
  #Create datasets for references
  References <- data.frame(x = c(-Inf, Inf, -Inf, Inf, -Inf, Inf),
                           y = c(60, 50, 40),
                           References = factor(c(60, 50, 40),
                                               labels = c("High",
                                                          "Medium",
                                                          "Low")))
  #Create Plots
  ggplot(data, aes(x = event, y = x)) +
    geom_boxplot(outlier.shape = NA) +
    geom_jitter(alpha = 0.6) +
    geom_line(aes(x, y, color = References), References) +
    scale_color_manual(values = c("red", "red", "red"))
}
#Plot
ex_triple_plot(ex)
But when I try to make a plot with two horizontal reference lines, the reference lines do not show up
## Example Plot Two (Two Reference Lines) ##
#Create boxplot function
ex_double_plot <- function(data){
  #Create datasets for references
  References <- data.frame(x = c(-Inf, Inf, -Inf, Inf),
                           y = c(60, 40),
                           References = factor(c(60, 40),
                                               labels = c("High",
                                                          "Low")))
  #Create Plots
  ggplot(data, aes(x = event, y = x)) +
    geom_boxplot(outlier.shape = NA) +
    geom_jitter(alpha = 0.6) +
    geom_line(aes(x, y, color = References), References) +
    scale_color_manual(values = c("red", "red"))
}
#Plot
ex_double_plot(ex)
Does anybody know what I'm doing wrong here?
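A likely cause, offered as a sketch rather than a confirmed diagnosis: data.frame() recycles the shorter y and References columns to the length of x. With three levels the recycled values happen to pair every level with both -Inf and Inf, but with two levels each level is paired with the same x endpoint twice, so each line has zero length. The base R snippet below shows the recycling and one way to build both endpoints explicitly with rep(). The names refs_recycled, refs_fixed, endpoints_recycled and endpoints_fixed are illustrative.

```r
# What the two-line version builds: y is recycled to 60, 40, 60, 40
refs_recycled <- data.frame(x = c(-Inf, Inf, -Inf, Inf),
                            y = c(60, 40))
# Number of distinct x endpoints per y value (1 means a zero-length line)
endpoints_recycled <- tapply(refs_recycled$x, refs_recycled$y,
                             function(v) length(unique(v)))

# Building both endpoints for every level explicitly
refs_fixed <- data.frame(x = rep(c(-Inf, Inf), times = 2),
                         y = rep(c(60, 40), each = 2),
                         References = factor(rep(c("High", "Low"), each = 2),
                                             levels = c("High", "Low")))
endpoints_fixed <- tapply(refs_fixed$x, refs_fixed$y,
                          function(v) length(unique(v)))
```

Passing a data frame like refs_fixed to the existing geom_line() call should then give each level a proper segment. geom_hline() with a yintercept aesthetic and one row per level is another option. Note also that factor(c(60, 50, 40), labels = ...) assigns labels to the sorted levels, so in the original code 40 is labelled "High" and 60 is labelled "Low".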

ggplot missing plot with x-axis factor

The following works fine:
my_df <- data.frame(x_val = 1:10, y_val = sample(1:20,10),
labels = sample(c("a", "b"), 10, replace = T))
ggplot(data = my_df, aes(x = x_val, y = y_val)) + geom_line()
but if I change x_val to a factor, I get a blank plot and a message:
my_df <- data.frame(x_val = 1:10, y_val = sample(1:20,10),
labels = sample(c("a", "b"), 10, replace = T))
my_df$x_val <- as.factor(my_df$x_val)
ggplot(data = my_df, aes(x = x_val, y = y_val)) + geom_line()
message:
geom_path: Each group consists of only one observation. Do you
need to adjust the group aesthetic?
I can obviously drop the factor conversion, but I need it in order to replace the x-axis labels with scale_x_discrete(breaks = 1:10, labels = my_df$labels), an approach I borrowed from another post.
Any thoughts?
Can you just leave x_val as numeric and use scale_x_continuous(breaks = 1:10,labels= my_df$labels) instead?
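A minimal sketch of that suggestion, assuming ggplot2 is installed. It also shows a second option that is my assumption rather than part of the reply above: keep a factor on x but set group = 1, so that geom_line() treats all points as one series. set.seed() and the object names p_continuous and p_grouped are illustrative.

```r
library(ggplot2)

set.seed(1)  # only to make the sample data reproducible
my_df <- data.frame(x_val = 1:10, y_val = sample(1:20, 10),
                    labels = sample(c("a", "b"), 10, replace = TRUE))

# Option 1: keep x numeric and relabel the continuous axis
p_continuous <- ggplot(my_df, aes(x = x_val, y = y_val)) +
  geom_line() +
  scale_x_continuous(breaks = 1:10, labels = my_df$labels)

# Option 2: use a factor on x, but put every point in a single group
p_grouped <- ggplot(my_df, aes(x = factor(x_val), y = y_val, group = 1)) +
  geom_line() +
  scale_x_discrete(labels = my_df$labels)
```

The message in the question appears because a discrete x puts every observation in its own group by default, and a line needs at least two points per group.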

How to add two different magnitudes of point size in a ggplot bubbles chart?

I just came across the attached graph, in which two colors of geom_point are used (I believe it was made with ggplot2). Similarly, I would like dots of one color to range in size from 1 to 5, and dots of another color to cover the range 10 to 50. However, I have no clue how to show two different size ranges of points in one graph.
At the basic step I have:
library(ggplot2)
library(reshape2) # melt()
library(forcats)  # fct_rev()
a <- c(1,2,3,4,5)
b <- c(10,20,30,40,50)
Species <- factor(c("Species1","Species2","Species3","Species4","Species5"))
bubba <- data.frame(Sample1=a,Sample2=b,Species=Species)
bubba$Species <- factor(bubba$Species, levels=bubba$Species)
xm <- melt(bubba,id.vars = "Species", variable.name="Samples", value.name = "Size")
str(xm)
ggplot(xm,aes(x= Samples,y= fct_rev(Species)))+geom_point(aes(size=Size))+scale_size(range = range(xm$Size))+theme_bw()
Does anyone have a clue where I should look? Thanks!
I've got an approach that gets 90% of the way there, but I'm not sure how to finish the deed. To get a single legend for size, I used a transformation to convert input size to display size. That makes the legend appearance conform to the display. What I don't have figured out yet is how to apply a similar transformation to the fill so that both can be integrated into the same legend.
Here's the transformation, which in this case shrinks everything 10 or more:
library(scales)
library(dplyr) # if_else()
shrink_10s_trans = trans_new("shrink_10s",
  transform = function(y){
    yt = if_else(y >= 10, y*0.1, y)
    return(yt)
  },
  inverse = function(yt){
    return(yt) # Not 1-to-1 function, picking one possibility
  }
)
Then we can use this transformation on the size to selectively shrink only the dots that are 10 or larger. This works out nicely for the legend, aside from integrating the fill encoding with the size encoding.
ggplot(xm,aes(x= Samples,y= fct_rev(Species), fill = Size < 10))+
geom_point(aes(size=Size), shape = 21)+
scale_size_area(trans = shrink_10s_trans, max_size = 10,
breaks = c(1,2,3,10,20,30,40),
labels = c(1,2,3,10,20,30,40)) +
scale_fill_manual(values = c(rgb(136,93,100, maxColorValue = 255),
rgb(236,160,172, maxColorValue = 255))) +
theme_bw()
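To see why the transformation above is flagged as "not 1-to-1", here is a small base R check. It restates the same rule with ifelse() instead of dplyr's if_else() to stay dependency-free; shrink_10s, sizes and shrunk are illustrative names.

```r
# Same rule as the transform above, written with base R's ifelse()
shrink_10s <- function(y) ifelse(y >= 10, y * 0.1, y)

sizes  <- c(1, 2, 3, 10, 20, 30)
shrunk <- shrink_10s(sizes)
shrunk  # 1 and 10 collapse to the same display value, as do 2 and 20, 3 and 30
```

Because several inputs share one output, no exact inverse exists, which is why the fill aesthetic is needed to tell the two ranges apart.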
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
bubba$Species <- factor(bubba$Species, levels = bubba$Species)
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = Size, color = Size)) +
scale_color_continuous(breaks = c(1,2,3,10,20,30), guide = guide_legend()) +
scale_size(range = range(xm$Size), breaks = c(1,2,3,10,20,30)) +
theme_bw()
Here's a kludge. I haven't had time to figure out the legend yet. Note that 1 and 10 are drawn at the same size but in a different colour, as are 3 and 40.
# Create data frame
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
# Restructure data
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
# Calculate bubble size
bubble_size <- function(val){
ifelse(val > 3, (1/15) * val + (1/3), val)
}
# Calculate bubble colour
bubble_colour <- function(val){
ifelse(val > 3, "A", "B")
}
# Calculate bubble size and colour
# (%<>% comes from magrittr, mutate() from dplyr)
library(magrittr)
library(dplyr)
xm %<>%
  mutate(bub_size = bubble_size(Size),
         bub_col = bubble_colour(Size))
# Plot data
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = bub_size, fill = bub_col), shape = 21, colour = "black") +
theme(panel.grid.major = element_line(colour = alpha("gray", 0.5), linetype = "dashed"),
text = element_text(family = "serif"),
legend.position = "none") +
scale_size(range = c(1, 20)) +
scale_fill_manual(values = c("brown", "pink")) +
ylab("Species")
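A quick base R check of what the size rule in this kludge does to each input value. One thing it shows, which may or may not be intended: because the threshold is 3, the Sample1 values 4 and 5 are also rescaled, so they end up smaller than 1 and receive the "A" colour. sizes_in and sizes_out are illustrative names.

```r
# Same size rule as in the answer above
bubble_size <- function(val) {
  ifelse(val > 3, (1 / 15) * val + (1 / 3), val)
}

sizes_in  <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
sizes_out <- bubble_size(sizes_in)
data.frame(Size = sizes_in, bub_size = round(sizes_out, 2))
```

Raising the threshold (for example to 5) would keep all of Sample1 unscaled, at the cost of changing which values share a size.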
I think you are looking for bubble plots in R
https://www.r-graph-gallery.com/bubble-chart/
That said, you probably want to build the left and right sides of the graphic separately and then combine them.

ggplot change line color specified by x axis values

Code to reproduce:
myDat <- data.frame(Event = rep(c("Arrival", "Departure"), 3),
AtNode = c("StationA", "StationA", "Track", "Track", "StationB", "StationB"),
Lane = c("Lane1", "Lane1", "Lane2", "Lane2", "Lane1", "Lane1"),
atTime = c(10, 12, 18, 20, 34, 36),
Type = c("Station", "Station", "Track", "Track", "Station", "Station"),
Train = 1 )
ggplot(data =myDat, aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))), group = Train, colour = Lane ))+
geom_point(data = myDat)+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),])
Now I need to project the two green points (Y = "Track") onto the orange line and color the segment between the projected points the same color as the points.
Expected result (without the points at Y = "Track"):
Thanks in advance for every hint or trick!
Cheers
I don't think your output is the right way of showing what you want. You have factors on your y-axis, which means it ranges between 1 and 3.
Therefore, projecting a line there means nothing in terms of y-axis values.
For me, the correct way of showing your data would be like this
ggplot(data =myDat,
aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))),
group = AtNode, colour = Lane ))+
geom_point()+
geom_line() +
labs(y = 'AtNode')
However, to do it the way you asked, you can use some simple geometry to project your line segment onto the path between the two stations
slope = (3 - 1) / (34 - 12) # the path rises from level 1 at x = 12 to level 3 at x = 34
y1 = 1 + slope * (18 - 12) #y projection given x = 18
y2 = 1 + slope * (20 - 12) #y projection given x = 20
foo = data.frame(x = c(18,20), y = c(y1, y2), Lane = "Lane2")
ggplot(data = myDat, aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))), group = 1, colour = Lane ))+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),]) +
geom_line(data = foo, aes(x = x, y = y, color = Lane), size = 1) +
scale_y_discrete(drop = FALSE)
I don't think there is a quick solution to this, but you could do something like this:
myDat$AtNode <- factor(myDat$AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))) #Generate factor here so we can use in imputation calculation
impute_rows <- which(myDat$Type == "Track") #Select rows to impute
slope_df <- myDat[impute_rows + c(-1,1), ] #Select rows before and after imputation to calculate slope
line <- lm(as.numeric(AtNode) ~ atTime, data = slope_df) #Get slope of line so we can do the calculations
df <- data.frame(x = myDat[impute_rows, "atTime"], y = myDat[impute_rows, "atTime"]*line$coefficients[["atTime"]] + line$coefficients[["(Intercept)"]], Lane = myDat[impute_rows,"Lane"], Train = myDat[impute_rows,"Train"])
ggplot(data =myDat, aes(x = atTime, y=AtNode, group = Train, colour = Lane ))+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),]) +
geom_path(data = df, aes(x = x, y = y), size = 2) +
scale_y_discrete(drop = FALSE)
The idea is as follows:
Identify the rows you want to impute: which()
Identify the rows before and after the ones to impute: slope_df
Fit the line you want to impute along, using the rows before and after (slope_df): lm()
Generate data based on the line: df <- data.frame(...)
Note that you also need the scale_y_discrete(drop = FALSE) so that the Track level isn't removed from the plot.
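The interpolation step can also be checked on its own. The sketch below uses base R's approx() in place of lm(), which is a swapped-in technique rather than the method used above. The endpoint values come from myDat in the question; t_known, y_known, t_track and projected are illustrative names.

```r
# Endpoints of the path around the Track rows, taken from myDat in the question:
# departure from StationA (level 1) at t = 12, arrival at StationB (level 3) at t = 34
t_known <- c(12, 34)
y_known <- c(1, 3)

# Times of the Track events to project onto the path
t_track <- c(18, 20)

# Linear interpolation between the two known points
projected <- approx(x = t_known, y = y_known, xout = t_track)
projected$y
```

The resulting y values sit between levels 1 and 3 on the numeric positions behind the discrete axis, which is why scale_y_discrete(drop = FALSE) is still needed when plotting them.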

Mix color and fill aesthetics in ggplot

I wonder if it is possible to change the main fill colour according to a categorical variable.
Here is a reproducible example
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = c(rep('a', times = 10),
rep('b', times = 10)),
val = rep(1:10, times = 2))
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(color = grp,
fill = val,
size = val))
Of course it is easy to change the circle colour/shape, according to the variable grp, but I'd like to have the a group in shades of red and the b group in shades of blue.
I also thought about using facets, but don't know if the fill gradient can be changed for the two panels.
Does anyone know if that can be done without gridExtra?
Thanks!
I think there are two ways to do this. The first is using the alpha aesthetic for your val column. This is a quick and easy way to accomplish your goal but may not be exactly what you want:
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(alpha=val,
fill = grp,
size = val)) + theme_minimal()
The second way would be to do something similar to this post: Vary the color gradient on a scatter plot created with ggplot2. I edited the code slightly so it's not a range from white to your color of interest but from a lighter color to a darker color. This requires a little bit of work and the scale_fill_identity function, which takes a variable that already holds the colors you want and maps them directly to each point (so it doesn't do any scaling).
This code is:
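library(plyr)   # ddply()
library(scales) # rescale()
#Rescale val to [0,1]
df$scaled_val <- rescale(df$val)
# grp must be a factor so that as.numeric() returns the group index
# (data.frame() no longer creates factors by default since R 4.0)
df$grp <- factor(df$grp)
low_cols <- c("firebrick1","deepskyblue")
high_cols <- c("darkred","deepskyblue4")
df$col <- ddply(df, .(grp), function(x)
  data.frame(col=apply(colorRamp(c(low_cols[as.numeric(x$grp)[1]], high_cols[as.numeric(x$grp)[1]]))(x$scaled_val),
                       1,function(x)rgb(x[1],x[2],x[3], maxColorValue=255)))
)$col
df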
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(
fill = col,
size = val)) + theme_minimal() +scale_fill_identity()
Thanks to this other post I found a way to visualize the fill bar in the legend, even though that wasn't what I meant to do.
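Here's the output
And the code (it uses dplyr for the pipeline and scales for rescale())
library(dplyr)
library(scales)
library(ggplot2)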
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = factor(c(rep('a', times = 10),
rep('b', times = 10)),
levels = c('a', 'b')),
val = rep(1:10, times = 2)) %>%
group_by(grp) %>%
mutate(scaledVal = rescale(val)) %>%
ungroup %>%
mutate(scaledValOffSet = scaledVal + 100*(as.integer(grp) - 1))
scalerange <- range(df$scaledVal)
# one (low, high) pair per group: offsets 0 and 100 for the two levels of grp
gradientends <- scalerange + rep(c(0,100), each=2)
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(fill = scaledValOffSet,
size = val)) +
scale_fill_gradientn(colours = c('white',
'darkred',
'white',
'deepskyblue4'),
values = rescale(gradientends))
Basically one should rescale fill values (e.g. between 0 and 1) and separate them using another order of magnitude, provided by the categorical variable grp.
This is not what I wanted though: the snippet can be improved, of course, to make the whole thing less manual, but still lacks the simple usual discrete fill legend.
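The offset arithmetic behind this trick can be checked in base R, for the two-group case. This is a sketch under stated assumptions: rescale01 is a hypothetical stand-in for scales::rescale(), ave() replaces the dplyr pipeline, and all object names are illustrative.

```r
# Base R stand-in for scales::rescale(): map a vector onto [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

val <- as.numeric(rep(1:10, times = 2))
grp <- factor(rep(c("a", "b"), each = 10), levels = c("a", "b"))

# Rescale within each group, then shift group b by 100
scaled_val <- ave(val, grp, FUN = rescale01)
scaled_val_offset <- scaled_val + 100 * (as.integer(grp) - 1)

# Gradient anchor points: (low, high) for each group, i.e. 0, 1, 100, 101
gradient_ends <- range(scaled_val) + rep(c(0, 100), each = 2)
anchor_positions <- rescale01(gradient_ends)
```

Each group thus occupies its own narrow band of the fill scale, and the four anchor positions line up with the four colours passed to scale_fill_gradientn(): white to darkred for a, white to deepskyblue4 for b.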
