Why does "scale_x_discrete" screw up the zoom on my plot? - r

I'm trying to plot my data in R, but manually relabeling the x axis keeps creating weird extra space on either side of the plot (see pictures).
Here's my code:
ACL_data_frame <- data.frame(ACLdata, Pre_PHQ_Score, Post_PHQ_Score)
colnames(ACL_data_frame)[c(328:329)] <- c("PHQ.1", "PHQ.2")
ACL_data_frame_long <- reshape(ACL_data_frame, direction="long", varying=328:329, sep=".")
ACL_data_frame_long$Condition <- factor(ACL_data_frame_long$Condition)
ggplot(data = ACL_data_frame_long, aes(x = time, y = PHQ, linetype = Condition)) +
stat_summary(fun=mean, geom="line", size=1.25) +
labs(x="Time", y="Depression (PHQ-9)") +
scale_x_discrete(limits=c("1","2"),
labels=c("Pre", "Post")) +
scale_linetype_discrete(name="Condition",
breaks=c("1", "2"),
labels=c("Control", "Intervention"))
The relevant columns in my data basically look like this (but longer):
time  PHQ        Condition
1     0.6666667  1
1     1.1111111  2
2     0.7777778  2
2     1.3333333  1
Does anyone know how to get rid of the weird space on either side of the plot (see pics)? I tried xlim, but that gives an error. Removing scale_x_discrete also fixes it, but then I can't figure out another way to label the x axis correctly: there are only 2 time points, so labels for Time 1.25, Time 1.5, etc. don't make sense. They just need to be labeled "Pre" and "Post".
Plot using scale_x_discrete, with correct x axis labels but incorrect margins
Plot without scale_x_discrete, with correct margins but incorrect x axis labels

As your time column is numeric, you get a continuous scale by default. Therefore, to set the breaks and labels, use scale_x_continuous(breaks = 1:2, labels = c("Pre", "Post")). Adding scale_x_discrete switches to a discrete scale, which by default pads the lower and upper ends of the scale by a fixed 0.6 units. In contrast, scale_x_continuous pads each end by 5% of the data range. With only two time points (a range of 1), that is 0.05 units versus 0.6 units on each side, hence the extra space.
library(ggplot2)
# time is numeric, so keep the continuous scale and just name its two breaks
ggplot(data = ACL_data_frame_long, aes(x = time, y = PHQ, linetype = Condition)) +
stat_summary(fun = mean, geom = "line", linewidth = 1.25) + # use size = 1.25 in ggplot2 < 3.4.0
labs(x = "Time", y = "Depression (PHQ-9)") +
scale_x_continuous(breaks = 1:2, labels = c("Pre", "Post")) +
scale_linetype_discrete(
name = "Condition",
breaks = c("1", "2"),
labels = c("Control", "Intervention")
)
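To see the difference in padding concretely, here is a small sketch, assuming ggplot2 >= 3.3.0 (where the built panel parameters expose `x.range`). The data frame `df` and the helper `x_panel_range()` are hypothetical, modelled on the four example rows in the question. It builds the same summary-line plot once with numeric time and scale_x_continuous, and once with time converted to a factor and scale_x_discrete, then reads back the x range that ggplot2 actually draws.

```r
library(ggplot2)

# Hypothetical toy data based on the example rows shown in the question
df <- data.frame(
  time = c(1, 1, 2, 2),
  PHQ = c(0.6666667, 1.1111111, 0.7777778, 1.3333333),
  Condition = factor(c(1, 2, 2, 1))
)

# Hypothetical helper: x range of the first panel after expansion is applied
x_panel_range <- function(p) {
  ggplot_build(p)$layout$panel_params[[1]]$x.range
}

base <- ggplot(df, aes(y = PHQ, linetype = Condition)) +
  stat_summary(fun = mean, geom = "line")

# Numeric x -> continuous scale, padded by 5% of the data range on each side
p_cont <- base + aes(x = time) +
  scale_x_continuous(breaks = 1:2, labels = c("Pre", "Post"))

# Factor x -> discrete scale, padded by 0.6 units on each side;
# group = Condition is needed so each condition is drawn as one line
p_disc <- base + aes(x = factor(time), group = Condition) +
  scale_x_discrete(labels = c("Pre", "Post"))

x_panel_range(p_cont)  # expected: 0.95 2.05
x_panel_range(p_disc)  # expected: 0.40 2.60
```

If you do prefer the factor version, the padding can be reduced with scale_x_discrete(labels = c("Pre", "Post"), expand = expansion(add = 0.1)), since `expansion()` controls the same padding.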

Related

How do you recenter y-axis at 0.01 using ggplot2

I am trying to make a grouped bar plot comparing the concentration of contaminants (in ug/g) from 2 different locations (see the site comparison figure), with the y axis on a log scale. I have some values that are below 1 but greater than 0. The y axis is centering my bars at y = 1, making some of the bars look negative. Is there a way to make my bars start at 0.01 instead of 1?
I tried coord_cartesian(ylim=c(0.01,100), expand = FALSE), but that didn't do it. I also tried using coord_trans(y='log10') to log-transform the y axis instead of scale_y_continuous(trans='log10'), but I got the error message "Transformation introduced infinite values in y-axis" even though I have no zero values.
Any help would be much appreciated,
Thank you.
My code is below:
malcomp2 %>%
ggplot(aes(x= contam, y= ug_values, fill= Location))+
geom_col(data= malcomp2,
mapping = aes(x= contam, y= ug_values, fill = Location),
position = position_dodge(.9), #makes the bars grouped (geom_col already uses stat_identity)
colour = "black", #adds black lines around bars
width = 0.8,
size = 0.3)+
ylab(expression(Concentration~(mu*g/g)))+ #adds mu character to y axis
coord_cartesian( ylim=c(0.01,100), expand = FALSE ) + #force bars to start at 0
theme_classic()+ #get rid of grey grid
scale_y_continuous(trans='log10', #change to log scale
labels = c(0.01,0.1,1,10,100))+ #change axis to not sci notation
annotation_logticks(sides = 'l')+ #add log ticks to y axis only
scale_x_discrete(name = NULL, #no x axis label
limits = c('Dieldrin','Mirex','PBDEs','CHLDs','DDTs','PCBtri','PCBquad',
'PCBhept'), #changes order of x axis
labels = c('Dieldrin','Mirex','PBDEs','CHLDs','DDTs','PCB 3','PCB 4-6','PCB7+'))+
scale_fill_manual("Location", #rename legend
values = c('turquoise2','gold'), #change colors
labels = c( 'St.Andrew Bay', 'Sapelo'))+ #change names on legend
theme(legend.title = NULL,
legend.key.size = unit(15, "pt"),
legend.position = c(0.10,0.95)) #places legend in upper left corner
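One hack would be to rescale your data so that the baseline you want sits at 1, and then relabel the axis. On a log scale, geom_col still starts every bar at 0 in transformed space, i.e. at 10^0 = 1 in data space, which is why bars below 1 appear to point downwards. (With coord_trans, the bars instead start at an untransformed 0, and log10(0) is -Inf, hence the "infinite values" warning.) People have asked this question before on SO, and a more satisfactory approach might be to use geom_rect instead, like here: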
Setting where y-axis bisects when using log scale in ggplot2 geom_bar
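library(ggplot2)
# Toy data: the values 0.01 to 100 multiplied by 100, so the smallest sits at 1 (the log-scale baseline)
ggplot(data.frame(contam = 1:5, ug_values = 10^(-2:2)*100),
aes(contam, ug_values)) +
geom_col() +
# relabel the breaks so they show the original, unscaled values
scale_y_continuous(trans = 'log10', limits = c(1,10000),
breaks = c(1,10,100,1000,10000),
labels = c(0.01,0.1,1,10,100))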
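A minimal sketch of the geom_rect idea, assuming ungrouped bars (no dodging by Location). The data frame `df`, its values and the `baseline` variable are hypothetical. Each bar is drawn as a rectangle from an explicit lower edge of 0.01 up to its value, so nothing depends on where log10(1) = 0 falls.

```r
library(ggplot2)

# Hypothetical data with values below and above 1 (ug/g), for illustration only
df <- data.frame(
  contam = factor(c("Dieldrin", "Mirex", "PBDEs"),
                  levels = c("Dieldrin", "Mirex", "PBDEs")),
  ug_values = c(0.05, 0.5, 20)
)
df$x <- as.integer(df$contam)  # numeric position of each category

baseline <- 0.01  # where the bars should start on the log axis

p <- ggplot(df) +
  # Each bar is a rectangle from the baseline up to its value
  geom_rect(aes(xmin = x - 0.4, xmax = x + 0.4,
                ymin = baseline, ymax = ug_values)) +
  # Put the category names back on the numeric x positions
  scale_x_continuous(breaks = df$x, labels = levels(df$contam)) +
  scale_y_log10(limits = c(baseline, 100),
                breaks = 10^(-2:2), labels = c(0.01, 0.1, 1, 10, 100))
# print(p) to draw the plot
```

For grouped bars, the same approach works if you offset `x` per Location yourself (for example x - 0.2 and x + 0.2) and narrow the rectangles accordingly.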

Ggplot2 in R gives incorrect coloring when creating overlapping demographic pyramids

I am creating overlapping demographic pyramids in R with the ggplot2 library to compare demographic data from two different sources.
However, I have run into problems with ggplot2 and the colouring when using the alpha parameter. I have tried to make sense of ggplot2 and the geom_bar structure, but so far it has gotten me nowhere. The idea is to draw four geom_bars, where pairs of geom_bars overlap each other (males and females, respectively). I'd have no problems if I didn't need to use alpha to show the differences in my data.
I would really appreciate some answers on where I am going wrong here. As an R programmer I am pretty close to a beginner, so bear with me if my code looks weird.
Below is my code which results in the image also shown below. I have altered my demographic data to be random for this question.
library(ggplot2)
# Here I randomise my data for StackOverflow
poptest<-data.frame(matrix(NA, nrow = 101, ncol = 5))
poptest[,1]<- seq(0,100)
poptest[,2]<- rpois(n = 101, lambda = 100)
poptest[,3]<- rpois(n = 101, lambda = 100)
poptest[,4]<- rpois(n = 101, lambda = 100)
poptest[,5]<- rpois(n = 101, lambda = 100)
colnames(poptest) <- c("age","A_males", "A_females","B_males", "B_females")
myLimits<-c(-250,250)
myBreaks<-seq(-250,250,50)
# Plot demographic pyramid
poptestPlot <- ggplot(data = poptest) +
geom_bar(aes(age,A_females,fill="black"), stat = "identity", alpha=0.75, position = "identity")+
geom_bar(aes(age,-A_males, fill="black"), stat = "identity", alpha=0.75, position="identity")+
geom_bar(aes(age,B_females, fill="white"), stat = "identity", alpha=0.5, position="identity")+
geom_bar(aes(age,-B_males, fill="white"), stat = "identity", alpha=0.5, position="identity")+
coord_flip()+
#set the y-axis which (because of the flip) shows as the x-axis
scale_y_continuous(name = "",
limits = myLimits,
breaks = myBreaks,
#give the values on the y-axis a name, to remove the negatives
#give abs() command to remove negative values
labels = paste0(as.character(abs(myBreaks))))+
#set the x-axis which (because of the flip) shows as the y-axis
scale_x_continuous(name = "age",breaks=seq(0,100,5)) +
#remove the legend
theme(legend.position = 'none')+
# Annotate geom_bars
annotate("text", x = 100, y = -200, label = "males",size=6)+
annotate("text", x = 100, y = 200, label = "females",size=6)
# show results in a separate window
x11()
print(poptestPlot)
This is what I get as result: (sorry, as a StackOverflow noob I can't embed my pictures)
Ggplot2 result
The colouring is really nonsensical. Black is not black and white is not white. Instead it may use some sort of default coloring because R or ggplot2 can't interpret my code.
I welcome any and all answers. Thank you.
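You are trying to map "black" to data points. Inside aes(), "black" is treated as a data value (a category), not as a colour, so ggplot assigns it a colour from the default fill palette. To use it literally, you would have to add a manual scale and tell ggplot to fill each instance of "black" with the colour black. There is a shortcut for this called scale_fill_identity() (scale_colour_identity() for the colour aesthetic). However, if each layer only ever has one value, it is much easier to set fill outside aes(). This way the whole geom is filled in black or white respectively: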
poptestPlot <- ggplot(data = poptest) +
geom_bar(aes(age,A_females),fill="black", stat = "identity", alpha=0.75, position = "identity")+
geom_bar(aes(age,-A_males), fill="black", stat = "identity", alpha=0.75, position="identity")+
geom_bar(aes(age,B_females), fill="white", stat = "identity", alpha=0.5, position="identity")+
geom_bar(aes(age,-B_males), fill="white", stat = "identity", alpha=0.5, position="identity")+
coord_flip()+
#set the y-axis which (because of the flip) shows as the x-axis
scale_y_continuous(name = "",
limits = myLimits,
breaks = myBreaks,
#give the values on the y-axis a name, to remove the negatives
#give abs() command to remove negative values
labels = paste0(as.character(abs(myBreaks))))+
#set the x-axis which (because of the flip) shows as the y-axis
scale_x_continuous(name = "age",breaks=seq(0,100,5)) +
#remove the legend
theme(legend.position = 'none')+
# Annotate geom_bars
annotate("text", x = 100, y = -200, label = "males",size=6)+
annotate("text", x = 100, y = 200, label = "females",size=6)
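For completeness, here is a sketch of the identity-scale shortcut mentioned above, on a small hypothetical data set with only the two female layers. The values stay inside aes(), but scale_fill_identity() uses each value as the literal fill colour instead of looking it up in a palette (and by default draws no legend).

```r
library(ggplot2)

set.seed(1)  # reproducible hypothetical data
poptest <- data.frame(age = 0:10,
                      A_females = rpois(11, 100),
                      B_females = rpois(11, 100))

p <- ggplot(poptest) +
  # "black"/"white" are mapped as data values...
  geom_col(aes(age, A_females, fill = "black"), alpha = 0.75, position = "identity") +
  geom_col(aes(age, B_females, fill = "white"), alpha = 0.5, position = "identity") +
  # ...and the identity scale passes them through unchanged as colours
  scale_fill_identity() +
  coord_flip()

unique(layer_data(p, 1)$fill)  # "black"
unique(layer_data(p, 2)$fill)  # "white"
```

Setting fill outside aes(), as in the answer, is simpler when the colour never varies within a layer; the identity scale is mainly useful when a data column already holds colour names.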

plotting labels outside of plot in ggplot

I have a plot made from the following code:
variable=c("A","B","C","D","E")
value=c(1,2,3,4,5);
type=c("A","B","A","A","B")
temp<-data.frame(var=factor(variable),val=value,type=factor(type))
p<-ggplot(temp,aes(var,val,color=type))+geom_point(aes(colour="type"))
p<-p+coord_flip()+theme(plot.margin = unit(c(1,5,1,1), "lines"),legend.position = "none")
How can I add labels for the values (now on the x-axis) on the right side of the plot, at the correct level? That is, I want it to say "5 4 3 2 1" vertically on the right side, at the height of the corresponding variable.
Thanks
If you make "variable" the y-axis labels rather than the actual values of the plot, you can use sec_axis with a 1:1 transformation to show the values on the other side:
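library(ggplot2)
temp <- data.frame(val = value, var = value, type = type)
p <- ggplot(temp,aes(var,val,color=type)) +
geom_point() + # colour already comes from color = type above
theme(plot.margin = unit(c(1,5,1,1), "lines"), legend.position = "none")
# left axis: one letter per value; right axis: the values themselves (1:1 transformation)
p <- p + scale_y_continuous(breaks = value, labels = variable,
sec.axis = sec_axis(~.*1, breaks = value))
p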
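An alternative sketch that keeps the letters on the axis and writes the values as text in the right margin the question already reserves with plot.margin. It maps val to x and var to y directly, which gives the same orientation as coord_flip(), places each label at x = Inf (the right edge of the panel) and turns off clipping with coord_cartesian(clip = "off"), assuming ggplot2 >= 3.0.0. The hjust value is an arbitrary choice.

```r
library(ggplot2)

variable <- c("A", "B", "C", "D", "E")
value <- c(1, 2, 3, 4, 5)
type <- c("A", "B", "A", "A", "B")
temp <- data.frame(var = factor(variable), val = value, type = factor(type))

p <- ggplot(temp, aes(val, var, colour = type)) +
  geom_point() +
  # One label per row, pinned just past the right edge of the panel
  geom_text(aes(x = Inf, label = val), hjust = -0.5, colour = "black") +
  # Allow text outside the panel area to be drawn
  coord_cartesian(clip = "off") +
  theme(plot.margin = unit(c(1, 5, 1, 1), "lines"), legend.position = "none")
# print(p) to draw the plot
```

Because each label uses the same y as its point, the values always line up with the corresponding variable, even if the factor levels are reordered.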

Invert the ordering of labels in the ggplot color scale after discretizing a continuous variable [duplicate]

This question was closed as a duplicate of: Flip ordering of legend without altering ordering in plot (3 answers).
Consider the following sample data set:
mydata="theta,rho,Response
0,0.8400000,0.0000000
40,0.8400000,0.4938922
80,0.8400000,0.7581434
120,0.8400000,0.6675656
160,0.8400000,0.2616592
200,0.8400000,-0.2616592
240,0.8400000,-0.6675656
280,0.8400000,-0.7581434
320,0.8400000,-0.4938922
0,0.8577778,0.0000000
40,0.8577778,0.5152213
80,0.8577778,0.7908852
120,0.8577778,0.6963957
160,0.8577778,0.2729566
200,0.8577778,-0.2729566
240,0.8577778,-0.6963957
280,0.8577778,-0.7908852
320,0.8577778,-0.5152213
0,0.8755556,0.0000000
40,0.8755556,0.5367990
80,0.8755556,0.8240077
120,0.8755556,0.7255612
160,0.8755556,0.2843886
200,0.8755556,-0.2843886
240,0.8755556,-0.7255612
280,0.8755556,-0.8240077
320,0.8755556,-0.5367990
0,0.8933333,0.0000000
40,0.8933333,0.5588192
80,0.8933333,0.8578097
120,0.8933333,0.7553246
160,0.8933333,0.2960542
200,0.8933333,-0.2960542
240,0.8933333,-0.7553246
280,0.8933333,-0.8578097
320,0.8933333,-0.5588192
0,0.9111111,0.0000000
40,0.9111111,0.5812822
80,0.9111111,0.8922910
120,0.9111111,0.7856862
160,0.9111111,0.3079544
200,0.9111111,-0.3079544
240,0.9111111,-0.7856862
280,0.9111111,-0.8922910
320,0.9111111,-0.5812822
0,0.9288889,0.0000000
40,0.9288889,0.6041876
80,0.9288889,0.9274519
120,0.9288889,0.8166465
160,0.9288889,0.3200901
200,0.9288889,-0.3200901
240,0.9288889,-0.8166465
280,0.9288889,-0.9274519
320,0.9288889,-0.6041876
0,0.9466667,0.0000000
40,0.9466667,0.6275358
80,0.9466667,0.9632921
120,0.9466667,0.8482046
160,0.9466667,0.3324593
200,0.9466667,-0.3324593
240,0.9466667,-0.8482046
280,0.9466667,-0.9632921
320,0.9466667,-0.6275358
0,0.9644444,0.0000000
40,0.9644444,0.6512897
80,0.9644444,0.9997554
120,0.9644444,0.8803115
160,0.9644444,0.3450427
200,0.9644444,-0.3450427
240,0.9644444,-0.8803115
280,0.9644444,-0.9997554
320,0.9644444,-0.6512897
0,0.9822222,0.0000000
40,0.9822222,0.6751215
80,0.9822222,1.0363380
120,0.9822222,0.9125230
160,0.9822222,0.3576658
200,0.9822222,-0.3576658
240,0.9822222,-0.9125230
280,0.9822222,-1.0363380
320,0.9822222,-0.6751215
0,1.0000000,0.0000000
40,1.0000000,0.6989533
80,1.0000000,1.0729200
120,1.0000000,0.9447346
160,1.0000000,0.3702890
200,1.0000000,-0.3702890
240,1.0000000,-0.9447346
280,1.0000000,-1.0729200
320,1.0000000,-0.6989533"
foobar <- read.csv(text = mydata)
Of course, Response is a continuous variable, and it should be plotted with a continuous color scale. However, I'm being asked to use a discrete color scale, so I need to discretize Response. My natural approach would be the same as in the second answer to this question:
easiest way to discretize continuous scales for ggplot2 color scales?
i.e.
library(ggplot2)
ggplot(data = foobar, aes(x = theta, y = rho, fill = cut(Response, breaks = 5))) +
geom_tile() +
coord_polar(theta = "x", start = -pi/9) +
scale_x_continuous(breaks = seq(0, 360, by = 45)) +
scale_y_continuous(limits = c(0, 1)) +
scale_fill_brewer(palette = "RdYlGn", direction = -1, name = "Response")
However, I would like the labels to be plotted in decreasing order, i.e., the same order ggplot2 would use if it were a continuous variable. In my example, this means that the label (0.644,1.08], corresponding to the red color, should be on top, and the label (-1.08,-0.644], corresponding to the green color, should be at the bottom of the legend. How can I get that?
You can use the reverse argument of guide_legend() to reverse the order of the legend without changing which colour each bin gets:
scale_fill_brewer(palette = "RdYlGn", direction = -1, name = "Response",
guide = guide_legend(reverse = TRUE))
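A self-contained version of the full plot with the reversed legend, as a sketch. It uses a hypothetical stand-in for the asker's data: the same theta/rho grid, but a made-up response rho * sin(theta). It also swaps scale_y_continuous(limits = c(0, 1)) for expand_limits(y = 0), because with a hard upper limit of 1 the outermost tiles (which extend slightly past rho = 1) are dropped.

```r
library(ggplot2)

# Hypothetical stand-in data: same theta/rho grid, made-up response
grid <- expand.grid(theta = seq(0, 320, by = 40),
                    rho = seq(0.84, 1, length.out = 10))
grid$Response <- grid$rho * sin(grid$theta * pi / 180)

# Discretise once so the bins can be inspected
grid$Response_bin <- cut(grid$Response, breaks = 5)

p <- ggplot(grid, aes(x = theta, y = rho, fill = Response_bin)) +
  geom_tile() +
  coord_polar(theta = "x", start = -pi / 9) +
  scale_x_continuous(breaks = seq(0, 360, by = 45)) +
  expand_limits(y = 0) +  # keep the empty centre without clipping tiles
  # reverse = TRUE flips only the legend; colours stay attached to the same bins
  scale_fill_brewer(palette = "RdYlGn", direction = -1, name = "Response",
                    guide = guide_legend(reverse = TRUE))

levels(grid$Response_bin)  # lowest bin first; the legend now lists it last
```

Reordering the factor levels with rev() would also flip the legend, but it would reassign the palette too, which is why the guide-level option is the better fit here.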

How Do I produce a specific number of Y Axis Tick Marks on a Scatterplot in R

Below is my code. I just want the y-axis ticks to be in 5% increments; at the moment there are too many ticks, because they correspond to each point.
library(ggplot2)
ggplot(data = data, aes(x = X, y = Y, label = Label)) +
geom_point() +
scale_colour_manual(values = c("steelblue4", "chartreuse4", "gold", "firebrick2")) +
geom_text(aes(color = Goal), position = "jitter", hjust=0.5, vjust=1.1, size = 2.3) +
labs(title = "Google", x = "Correlation Coefficient", y = "Top-Box %")
Try adding this layer to your ggplot:
scale_y_continuous(breaks=seq(min(data$Y),max(data$Y),(max(data$Y)-min(data$Y))/20))
The breaks argument takes a vector that lets you specify the breaks manually. The seq() function is handy for generating equally spaced values from the lowest to the highest value in data$Y: with a step of (max - min)/20 it returns 21 breaks, i.e. 20 equal intervals. You could also wrap seq() in round() to clean up the potentially messy numbers that result from (max() - min())/20.
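If you want ticks at literal 5% steps rather than 20 equal intervals, breaks also accepts a function of the scale limits. A minimal sketch, assuming Y ("Top-Box %") is stored as a proportion between 0 and 1; the helper `step_breaks()` and the data are hypothetical:

```r
library(ggplot2)

# Hypothetical helper: breaks every `step` units covering the range `lims`
step_breaks <- function(lims, step = 0.05) {
  seq(floor(lims[1] / step) * step, ceiling(lims[2] / step) * step, by = step)
}

# Hypothetical data, assuming Y is a proportion (0-1)
data <- data.frame(X = c(0.1, 0.4, 0.7), Y = c(0.12, 0.30, 0.47))

p <- ggplot(data, aes(X, Y)) +
  geom_point() +
  # ggplot2 calls step_breaks() with the y limits, so ticks land on multiples of 0.05
  scale_y_continuous(breaks = step_breaks, labels = scales::percent)

step_breaks(range(data$Y))  # 0.10 0.15 ... 0.50
```

If Y is already stored in percent (0-100), use step = 5 and drop the scales::percent labels.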
