Tidy up ggplot with Jitter and fixing axis labels? - r

Hi guys, I have this very messy plot. How can I:
- rotate the x-axis text so that you can actually read it
- not include every y value on the y-axis (maybe have the y-axis in intervals of 5)
- add jitter so that the plot is easier to read
- remove the NA values (I tried to, but I guess it did not work)
- remove the legend (had to crop it for confidentiality)
Here is my code:
data <- ndpdata[which(ndpdata$FC.Fill.Size==20),] #20 fill size
library(tidyr)
my_df_long <- gather(data, group, y, -FC.Batch.Nbr)
data = my_df_long[2075:2550,]
ggplot(data, aes(FC.Batch.Nbr, y, color=FC.Batch.Nbr), na.rm=TRUE) + geom_point()

To rotate the x axis add this to your ggplot:
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
If you don't want to include every value on the y axis, you can set breaks:
scale_y_continuous(breaks = c(251,270,290,310,325))
To add jitter points try position = "jitter" inside geom_point():
geom_point(position = "jitter")
To remove NAs, drop the rows with a missing y value from your data before plotting:
data <- data[!is.na(data$y), ]
To remove legend add this to your ggplot:
theme(legend.position = "none")
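Putting the pieces together, a minimal sketch might look like the following. The data frame here is a made-up stand-in for the asker's long-format data (the batch names and values are assumptions), and the y breaks use seq() to get the intervals of 5 asked for in the question rather than hand-picked values.

```r
library(ggplot2)

# Hypothetical stand-in for the long-format data in the question
set.seed(1)
data <- data.frame(
  FC.Batch.Nbr = rep(paste0("B", 1001:1010), each = 5),
  y = c(round(runif(48, 250, 325)), NA, NA)
)

# Drop rows whose y value is missing before plotting
data <- data[!is.na(data$y), ]

p <- ggplot(data, aes(FC.Batch.Nbr, y, color = FC.Batch.Nbr)) +
  # Jitter horizontally only, so the y values stay exact
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  # One break every 5 units instead of one per observed value
  scale_y_continuous(breaks = seq(250, 325, by = 5)) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "none"
  )
p
```

Setting height = 0 in position_jitter() is a design choice: the points spread sideways within each batch, but the measured values are not perturbed.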

Related

How do you recenter y-axis at 0.01 using ggplot2

I am trying to make a grouped bar plot comparing the concentration of contaminants (in ug/g) from 2 different locations, with the y axis in log scale. I have some values that are below one but greater than zero. The y axis is centering my bars at y=1, making some of the bars look negative. Is there a way to make my bars start at 0.01 instead of 1?
I tried coord_cartesian( ylim=c(0.01,100), expand = FALSE ) but that didn't do it. I also tried to use coord_trans(y='log10') to log transform the y axis instead of scale_y_continuous(trans='log10'), but I was getting the error message "Transformation introduced infinite values in y-axis" even though I have no zero values.
Any help would be much appreciated,
Thank you.
my code is below:
malcomp2 %>%
ggplot(aes(x= contam, y= ug_values, fill= Location))+
geom_col(data= malcomp2,
mapping = aes(x= contam, y= ug_values, fill = Location),
position = position_dodge(.9), #makes the bars grouped
stat_identity(),
colour = "black", #adds black lines around bars
width = 0.8,
size = 0.3)+
ylab(expression(Concentration~(mu*g/g)))+ #adds mu character to y axis
coord_cartesian( ylim=c(0.01,100), expand = FALSE ) + #force bars to start at 0
theme_classic()+ #get rid of grey grid
scale_y_continuous(trans='log10', #change to log scale
labels = c(0.01,0.1,1,10,100))+ #change axis to not sci notation
annotation_logticks(sides = 'l')+ #add log ticks to y axis only
scale_x_discrete(name = NULL, #no x axis label
limits = c('Dieldrin','Mirex','PBDEs','CHLDs','DDTs','PCBtri','PCBquad',
'PCBhept'), #changes order of x axis
labels = c('Dieldrin','Mirex','PBDEs','CHLDs','DDTs','PCB 3','PCB 4-6','PCB7+'))+
scale_fill_manual("Location", #rename legend
values = c('turquoise2','gold'), #change colors
labels = c( 'St.Andrew Bay', 'Sapelo'))+ #change names on legend
theme(legend.title = NULL,
legend.key.size = unit(15, "pt"),
legend.position = c(0.10,0.95)) #places legend in upper left corner
One hack would be to rescale your data so that the baseline sits at 1 and then relabel the axis with the original values, as in the code that follows. People have asked this question before on SO, and a more satisfactory approach might be to use geom_rect instead, as described here:
Setting where y-axis bisects when using log scale in ggplot2 geom_bar
ggplot(data.frame(contam = 1:5, ug_values = 10^(-2:2)*100),
aes(contam, ug_values)) +
geom_col() +
scale_y_continuous(trans = 'log10', limits = c(1,10000),
breaks = c(1,10,100,1000,10000),
labels = c(0.01,0.1,1,10,100))
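The geom_rect idea mentioned above could look roughly like the sketch below. It uses made-up data (the data frame df, its values, the helper column x, and floor_value are assumptions for illustration, not the asker's malcomp2) and draws each bar explicitly from a floor of 0.01 up to its value, so nothing has to be rescaled.

```r
library(ggplot2)

# Made-up example data; not the asker's malcomp2
df <- data.frame(
  contam = c("Dieldrin", "Mirex", "PBDEs"),
  ug_values = c(0.05, 2, 40)
)
df$x <- seq_len(nrow(df))  # numeric bar positions
floor_value <- 0.01        # where every bar should start

p <- ggplot(df) +
  # Each bar is a rectangle from the floor up to the observed value
  geom_rect(aes(xmin = x - 0.4, xmax = x + 0.4,
                ymin = floor_value, ymax = ug_values),
            colour = "black") +
  # Put the contaminant names back on the numeric x positions
  scale_x_continuous(breaks = df$x, labels = df$contam) +
  scale_y_log10(limits = c(floor_value, 100),
                breaks = c(0.01, 0.1, 1, 10, 100),
                labels = c("0.01", "0.1", "1", "10", "100")) +
  ylab(expression(Concentration ~ (mu * g / g)))
p
```

This sketch does not handle dodged groups; for two locations per contaminant, the xmin and xmax offsets would need to be computed per group.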

Adding different secondary x axis for each facet in ggplot2

I would like to add a different secondary axis to each facet. Here is my working example:
library(ggplot2)
library(data.table)
#Create the data:
data<-data.table(cohort=sample(c(1946,1947,1948),10000,replace=TRUE),
works=sample(c(0,1),10000,replace=TRUE),
year=sample(seq(2006,2013),10000,replace=TRUE))
data[,age_cohort:=year-cohort]
data[,prop_works:=mean(works),by=c("cohort","year")]
#Prepare data for plotting:
data_to_plot<-unique(data,by=c("cohort","year"))
#Plot what I want:
ggplot(data_to_plot,aes(x=age_cohort,y=prop_works))+geom_point()+geom_line()+
facet_wrap(~ cohort)
The plot shows how many people of a particular cohort work at a given age. I would like to add a secondary x axis showing which year corresponds to a particular age for different cohorts.
Since you have the actual values you want to use in your dataset, one work around is to plot them as an additional geom_text layer:
ggplot(data_to_plot,
aes(x = age_cohort, y = prop_works, label = year))+
geom_point() +
geom_line() +
geom_text(aes(y = min(prop_works)),
hjust = 1.5, angle = 90) + # rotate to save space
expand_limits(y = 0.44) +
scale_x_continuous(breaks = seq(58, 70, 1)) + # ensure x-axis breaks are at whole numbers
scale_y_continuous(labels = scales::percent) +
facet_wrap(~ cohort, scales = "free_x") + # show only relevant age cohorts in each facet
theme(panel.grid.minor.x = element_blank()) # hide minor grid lines for cleaner look
You can adjust the hjust value in geom_text() and y value in expand_limits() for a reasonable look, depending on your desired output's dimensions.
(More data wrangling would be required if there are missing years in the data, but I assume that isn't the case here.)

changing ggplot legend unit scale

This question is motivated by a previous post illustrating various ways to change how axis scales are plotted in a ggplot figure, from the default exponential notation to the full integer value (when one's axis values are very large). While I am able to convert the axis scales from exponential notation to full values, I am unclear how one would achieve the same goal for the values appearing in the legend.
While I understand that one can manually change the length of the legend scale with "scale_color..." or "scale_fill..." followed by the "limits" argument, this does not appear to be a solution to getting my legend values to show "6000000000" rather than "6e+09" (or "0" rather than "0e+00" for that matter).
The following example should suffice. My hope is someone can point out how to implement the 'scales' package to apply for legend scales rather than axes scales.
Thanks very much.
library(ggplot2)
library(scales)
Data <- data.frame(
pi = c(2,71,828,1828,45904,523536,2874713,52662497,757247093,6999595749),
e = c(3,14,159,2653,58979,311599,7963468,54418516,1590576171, 99),
face = 1:10)
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000))
myplot
Use the comma formatter from the scales package in scale_color_gradientn by setting labels = comma, e.g.:
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000), labels = comma)
myplot
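Any function that maps break values to strings can be passed as labels, so the same effect is available without the scales package. A minimal sketch, where plain_labels is a hypothetical helper name (not part of ggplot2 or scales) built on base R's format():

```r
library(ggplot2)

# Hypothetical helper: plain labels with thousands separators, no exponent
plain_labels <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

Data <- data.frame(
  pi = c(2, 71, 828, 1828, 45904, 523536, 2874713, 52662497, 757247093, 6999595749),
  e = c(3, 14, 159, 2653, 58979, 311599, 7963468, 54418516, 1590576171, 99),
  face = 1:10
)

myplot <- ggplot(Data, aes(x = face, y = e, colour = pi)) +
  geom_point() +
  # Same labelling function for the axis and for the colour legend
  scale_y_continuous(labels = plain_labels) +
  scale_color_gradientn(colours = rainbow(2), limits = c(0, 7e9),
                        labels = plain_labels)
myplot
```

Dropping big.mark = "," from the helper would give "6000000000" rather than "6,000,000,000", which is the exact form asked for in the question.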

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from 0.5 to 27.5, while in the other plot the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits.
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type, which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent.
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
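Another route, not used in the answers above, is the patchwork package, which stacks plots and lines up their panel areas. The sketch below is an assumption about how that could be applied here: it gives both plots the same x range with coord_cartesian() and reuses the 0.7/0.3 relative heights from the gtable edit.

```r
library(ggplot2)
library(patchwork)  # assumption: not used in the original answers

data <- data.frame(x = rep(1:27, each = 5), y = rep(1:5, times = 27))
other_data <- data.frame(x = 1:27, y = 50:76)

# Give both plots the same x range so the bars and points share positions
yes <- ggplot(data, aes(x = x, y = y)) +
  geom_point() + geom_line() +
  coord_cartesian(xlim = c(0, 28))
no <- ggplot(other_data, aes(x = x, y = y)) +
  geom_col() +
  coord_cartesian(xlim = c(0, 28))

# "/" stacks the plots vertically; patchwork aligns the panel edges
aligned <- no / yes + plot_layout(heights = c(0.7, 0.3))
aligned
```

Because the panels themselves are aligned, the differing widths of the y-axis labels no longer shift one plot relative to the other.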

Using coord_flip() with facet_wrap(scales = "free_y") in ggplot2 seems to give unexpected facet axis tick marks and tick labels

I am trying to create a faceted plot with flipped coordinates where one and only one of the axes is allowed to vary for each facet:
require(ggplot2)
p <- qplot(displ, hwy, data = mpg)
p + facet_wrap(~ cyl, scales = "free_y") + coord_flip()
This plot is not satisfactory to me because the wrong tick marks and tick labels are repeated for each plot. I want tick marks on every horizontal axis not on every vertical axis.
This is unexpected behaviour because the plot implies that the horizontal axis tick marks are the same for the top panels as they are for the bottom ones, but they are not. To see this run:
p <- qplot(displ, hwy, data = mpg)
p + facet_wrap(~ cyl, scales = "fixed") + coord_flip()
So my question is: is there a way to remove the vertical axis tick marks for the right facets and add horizontal axis tick marks and labels to the top facets?
As Paul insightfully points out below, the example I gave can be addressed by swapping x and y in qplot() and avoiding coord_flip(). However, this does not work for all geoms. For example, if I want a horizontal faceted bar plot with free horizontal axes, I could run:
c <- ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
c + facet_wrap(~cut, scales = "free_y") + coord_flip()
These facets have variable horizontal axes but repeated vertical axis tick marks instead of repeated horizontal axis tick marks. I do not think Paul's trick will work here because, unlike scatter plots, bar plots are not rotationally symmetric.
I would be very interested to hear any partial or complete solutions.
Using coord_flip in conjunction with facet_wrap is the problem. First you define a certain axis to be free (the x axis) and then you swap the axis, making the y axis free. Right now this is not reproduced well in ggplot2.
In your first example, I would recommend not using coord_flip, but just swapping the variables around in your call to qplot, and using free_x:
p <- qplot(hwy, displ, data = mpg)
p + facet_wrap(~ cyl, scales = "free_x")
This is the second or third time I have run into this problem myself. I have found that I can hack my own solution by defining a custom geom.
geom_bar_horz <- function (mapping = NULL, data = NULL, stat = "bin", position = "stack", ...) {
GeomBar_horz$new(mapping = mapping, data = data, stat = stat, position = position, ...)
}
GeomBar_horz <- proto(ggplot2:::Geom, {
objname <- "bar_horz"
default_stat <- function(.) StatBin
default_pos <- function(.) PositionStack
default_aes <- function(.) aes(colour=NA, fill="grey20", size=0.5, linetype=1, weight = 1, alpha = NA)
required_aes <- c("y")
reparameterise <- function(., df, params) {
df$width <- df$width %||%
params$width %||% (resolution(df$x, FALSE) * 0.9)
OUT <- transform(df,
xmin = pmin(x, 0), xmax = pmax(x, 0),
ymin = y - .45, ymax = y + .45, width = NULL
)
return(OUT)
}
draw_groups <- function(., data, scales, coordinates, ...) {
GeomRect$draw_groups(data, scales, coordinates, ...)
}
guide_geom <- function(.) "polygon"
})
This is just copying the geom_bar code from the ggplot2 GitHub repository and then switching the x and y references to make a horizontal barplot in standard Cartesian coordinates.
Note that you must use position='identity' and possibly also stat='identity' for this to work. If you need to use a position other than identity, then you will have to edit the collide function for it to work properly.
I've just been trying to do a horizontal barplot and ran into this problem where I wanted to use scales = "free_x". In the end, it seemed easier to create the conventional (vertical) barplot and rotate the text so that, if you tip your head to the left, it looks like the plot that you want. Then, once your plot is completed, rotate the PDF/image output(!)
ggplot(data, aes(x, y)) +
geom_bar(stat = "identity") +
facet_grid(var ~ group, scales = "free", space = "free_x", switch = "both") +
theme(axis.text.y = element_text(angle=90), axis.text.x = element_text(angle = 90),
strip.text.x = element_text(angle = 180))
The keys to this are switch = "both", which moves the facet labels to the other axis, and element_text(angle = 90), which rotates the axis labels and text.
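For the horizontal bar case, newer releases may make these workarounds unnecessary. A minimal sketch, assuming ggplot2 3.3.0 or later (where geom_bar() draws horizontal bars when the discrete variable is mapped to y), and reusing the diamonds example from the question:

```r
library(ggplot2)

# Map the discrete variable to y: the bars are drawn horizontally,
# so coord_flip() is not needed
p <- ggplot(diamonds, aes(y = clarity, fill = cut)) +
  geom_bar() +
  # The count axis is now genuinely the x axis, so free_x frees it per facet
  facet_wrap(~ cut, scales = "free_x")
p
```

Since no coordinate flip is involved, the scales argument refers to the axes as they appear on screen, and each facet gets its own horizontal axis.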
