Changing alpha doesn't affect anything in ggplot2 - r

I'm somewhat new to R and ggplot2 so this question is likely somewhat low-level. But I've done a fair amount of experimenting and found no answers online, so I thought I'd ask here.
When I add alpha to my graph, the graph appears as follows:
[Image: "Some alpha"]
However, no matter how I change the value of alpha, I get no changes in the graph. I tried alpha=.9 and alpha=1/10000, and there was no difference whatsoever in the graph.
Yet it seems that the 'alpha' term is doing something. When I remove the 'alpha' from the code, I get the following graph:
[Image: "No alpha"]
Here's my code. Thanks!
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
h <- floor(x/3600)
m <- floor(x %% 60)
s <- round(60*(x %% 1)) # Round to nearest second
lab <- sprintf('%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
lab <- gsub('^00:', '', lab) # Remove leading 00: if present
lab <- gsub('^0', '', lab) # Remove leading 0 if present
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y),out.format="mon-year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 7200)) + # adds tick marks every 2 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) +
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds, alpha=1/2, size=5),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(),
position=position_identity(),
)
print(p)
dev.off()

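You need to put the alpha specification outside the mapping statement. Inside aes(), alpha=1/2 is treated as a data variable that happens to hold a single constant value, and it is mapped through the alpha scale rather than used directly, so every constant you try produces the same plot. Setting it as a fixed parameter applies the value as given, as in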
layer(
data=mydata,
mapping=aes(x=day, y=seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=1/2, size=5),
position=position_identity(),
)
I'm more used to expressing this somewhat more compactly as
geom_point(data=mydata,
mapping=aes(x=day, y=seconds),
alpha=1/2,size=5)
The rest of the excluded stuff represents default values, I believe ...
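For readers on a current ggplot2, where layer() no longer takes geom_params, the same distinction can be sketched with geom_point() alone. This is a minimal illustration, not the asker's script: the toy data frame and the names p_mapped and p_fixed are made up for the example.

```r
library(ggplot2)

# Hypothetical toy data standing in for the email log
toy <- data.frame(day = 1:200, seconds = (1:200 * 431) %% 86400)

# alpha inside aes(): the constant is mapped through the alpha scale,
# so 0.9 and 1/10000 end up looking identical
p_mapped <- ggplot(toy, aes(x = day, y = seconds)) +
  geom_point(aes(alpha = 1/2), size = 5)

# alpha outside aes(): the value is applied directly to every point
p_fixed <- ggplot(toy, aes(x = day, y = seconds)) +
  geom_point(alpha = 1/2, size = 5)
```

Printing p_fixed with different alpha values should now change the transparency, while p_mapped stays the same whatever constant is used.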
See also: Why does the ggplot legend show the "colour" parameter?

Related

generating a manhattan plot with ggplot

I've been trying to generate a Manhattan plot using ggplot, which I finally got to work. However, I cannot get the points to be colored by chromosome, despite having tried several different examples I've seen online. I'm attaching my code and the resulting plot below. Can anyone see why the code is failing to color points by chromosome?
library(tidyverse)
library(vroom)
# threshold to drop really small -log10 p values so I don't have to plot millions of uninformative points. Just setting to 0 since I'm running for a small subset
min_p <- 0.0
# reading in data to brassica_df2, converting to data frame, removing characters from AvsDD p value column, converting to numeric, filtering by AvsDD (p value)
brassica_df2 <- vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD"))
brassica_df2 <- as.data.frame(brassica_df2)
brassica_df2$AvsDD <- gsub("1:2=","",as.character(brassica_df2$AvsDD))
brassica_df2$AvsDD <- as.numeric(brassica_df2$AvsDD)
brassica_df2 <- filter(brassica_df2, AvsDD > min_p)
# setting significance threshold
sig_cut <- -log10(1)
# setting ylim for graph
ylim <- (max(brassica_df2$AvsDD) + 2)
# setting up labels for x axis
axisdf <- as.data.frame(brassica_df2 %>% group_by(chromosome) %>% summarize(center=( max(position) + min(position) ) / 2 ))
# making manhattan plot of statistically significant SNP shifts
manhplot <- ggplot(data = filter(brassica_df2, AvsDD > sig_cut), aes(x=position, y=AvsDD), color=as.factor(chromosome)) +
geom_point(alpha = 0.8) +
scale_x_continuous(label = axisdf$chromosome, breaks= axisdf$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(axisdf$chromosome)))) +
geom_hline(yintercept = sig_cut, lty = 2) +
ylab("-log10 p value") +
ylim(c(0,ylim)) +
theme_classic() +
theme(legend.position = "n")
print(manhplot)
I think you just need to move your color=... argument inside the call to aes():
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD),
color=as.factor(chromosome))
becomes...
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD, color=as.factor(chromosome)))
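The reason this matters is that an argument given to ggplot() outside aes() is not an aesthetic mapping, so no colour scale is ever trained on chromosome. A small self-contained sketch of the corrected call is below; the toy data frame and its values are invented for illustration and are not the asker's data.

```r
library(ggplot2)

# Hypothetical toy data: two chromosomes, five positions each
toy <- data.frame(
  chromosome = rep(c("chr1", "chr2"), each = 5),
  position   = 1:10,
  AvsDD      = c(2, 3, 1, 4, 2, 5, 1, 3, 2, 4)
)

# color is inside aes(), so it is mapped to the data
manh <- ggplot(toy, aes(x = position, y = AvsDD,
                        color = as.factor(chromosome))) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = c("#276FBF", "#183059")) +
  theme_classic() +
  theme(legend.position = "none")

# Colours actually assigned to the points once the scale is applied
built_colours <- unique(layer_data(manh, 1)$colour)
```

Inspecting built_colours is just a quick way to confirm that both manual colours are in use without looking at the rendered plot.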

R: point at which geom_smooth drops below a certain value

Hi stack overflow community,
I hope the two interrelated questions I am asking are not too nooby. I tried several google searches but could not find a solution.
I use R to plot the findings of a linguistic "experiment", in which I checked to what extent two grammatical constructions yield acceptable descriptions of an event, depending on how far it unfolds. My data look similar to this:
event,PFV.alone,PFV.and.PART
0.01,0,1
0.01,0,1
0.05,0,1
0.05,0,1
0.05,0,1
0.1,0,1
0.1,0,1
0.25,0,1
0.25,0,1
0.25,0,1
0.3,0,1
0.3,0,1
0.33,0,1
0.33,0,1
0.33,0,1
0.33,0,1
....
0.67,1,0.5
0.75,1,0.5
0.75,1,0
0.75,1,0
0.75,1,0
0.8,1,1
0.8,1,0
0.8,1,0
0.8,1,0
0.85,1,1
0.85,1,0
0.9,1,0
0.9,1,0
0.9,1,0
0.95,1,0
As you can see, for each of the two constructions there are "plateaus" where acceptability is 0 or 1 and then there's a "transitional" area. In order to illustrate the "plateaus" I use geom_segment and to create a smooth "transition" for the scattered data in between, I use geom_smooth. Here's my code:
#after loading datafile into "Daten":
p <- ggplot(data = Daten,
aes(x=event, y=PFV.and.PART, xmin=0, ymin=0, xmax=1, ymax=1))
p + geom_blank() +
coord_fixed()+
xlab("Progress of the event") +
ylab("Acceptability") +
geom_segment(x=0, xend=1, y=0.5,yend=0.5, linetype="dotted") +
geom_smooth(data=(subset(Daten, event==0.33 | event ==0.9)),
aes(color="chocolate"),
method="loess", fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(x=0,xend=0.33,y=1,yend=1, color="chocolate", size=1) +
geom_segment(x=0.9,xend=1,y=0,yend=0, color="chocolate", size=1) +
geom_smooth(data=(subset(Daten, event==0.33 | event==0.67)),
aes(x = event, y = PFV.alone, color="cyan4"),
method="lm",fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(color="cyan4",x=0,xend=0.33,y=0,yend=0,size=1) +
geom_segment(color="cyan4", x=0.67,xend=1,y=1,yend=1, size=1) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(breaks = c (0,0.5,1), labels = scales::percent)+
labs(color='Construction')+
scale_color_manual(labels = c("PFV + PART", "PFV alone"),
values = c("chocolate", "cyan4")) +
theme(legend.position=c(0.05, 0.8),
legend.justification = c("left", "top"),
legend.background = element_rect(fill = "darkgray"))
This code produces a nice graph, but there's one calculation and one plot-related issue that I need help with.
First, and most importantly, I'd like to find out at what point exactly the geom_smooth (loess) curve for "PFV.and.PART" drops down to 0.5, i.e. hits 50% acceptability. I fear that this might involve some quite complex code?
Related to the preceding point, I'd like to mark the area/line where both curves are above 0.5 (50% acceptability), or, to speak in terms of what I am trying to show: the percentages of the event at which both constructions yield a description that is at least 50% acceptable. This, of course, would be based on point 1, as it is necessary to determine the right limit, whereas the left limit does not constitute a problem, as it seems to lie at x=0.5, y=0.5.
I'd really appreciate any help and I hope that I have provided all the necessary information. Please excuse me if this question has been addressed elsewhere.
Here's one approach, which involves fitting a loess model outside of ggplot:
library(tidyverse) # provides tibble(), add_column(), %>% and ggplot2
# Generate some data
set.seed(2019)
my_dat <- c(sample(c(1,0.5, 0),33, prob = c(0.85,0.15,0), replace = TRUE),
sample(c(1,0.5, 0),33, prob = c(0.1,0.7,0.1), replace = TRUE),
sample(c(1,0.5,0),34, prob = c(0,0.15,0.85), replace = TRUE))
df <- tibble(x = 1:100, y = my_dat)
# fit a loess model
m1 <- loess(y~x, data = df)
df <- df %>%
add_column(pred = predict(m1)) # predict using the loess model
# plot
df %>%
ggplot(aes(x,y))+
geom_point() +
geom_line(aes(y = pred))
# search for a value of x that gives a prediction of 0.5
f <- function(x) { 0.5 - predict(m1)[x]}
uniroot(f, interval = c(1, 100))
# $root
# [1] 53.99997
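A variant of the same idea, sketched in base R only, evaluates the fitted curve at arbitrary x values through predict(..., newdata = ) instead of indexing the vector of fitted values. The toy data, the model name fit and the helper gap are assumptions made for this illustration, so the crossing point it finds is not the one from the question's data.

```r
# Hypothetical toy data: acceptability falls from 1 to 0 as the event progresses
set.seed(1)
toy <- data.frame(x = seq(0, 1, by = 0.01))
toy$y <- pmin(1, pmax(0, 1 - toy$x + rnorm(nrow(toy), sd = 0.05)))

# Fit the loess curve outside ggplot
fit <- loess(y ~ x, data = toy)

# Signed distance of the fitted curve from the 0.5 level at any x
gap <- function(x) predict(fit, newdata = data.frame(x = x)) - 0.5

# uniroot() needs gap() to change sign between the interval ends
crossing <- uniroot(gap, interval = c(0, 1))$root
```

The value in crossing could then be used as the right-hand limit of a shaded region, for example as the xmax of an annotate("rect", ...) call.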

How to split 300 months into groups of 60 for ggplot2 x-axis readability in r

I have been searching for 2 or 3 days now trying to find a resolution for my problem without success. I apologize if this is really easy or is already out there, but I can't find it. I have 3 data frames that can vary in length from 1 to 300. It is only possible to display about 60 values along the ggplot x-axis without it becoming unreadable and I don't want to omit values, so I am trying to find a way to calculate how long each data frame is and split it into "x" plots of no more than 60 each.
So far I have tried: facet_grid, facet_wrap, transform and split. "Split" works ok for splitting the data, but it adds an "X1." "X2." ... "Xn." to the front of the variable names (where n is the number of partitions it broke the data into). So when I call ggplot2, it can't find my original variable names ("Cost" and "Month") because they look like X1.Cost X1.Month, X2.Cost etc...How do I fix this?
I'm open to any suggestions, especially if I can fix both issues (not hard coding into 60 rows at a time AND breaking into graphs with smaller x-axis ranges). Thanks in advance for your patience and help.
Stephanie (desperate grad student)
Here is some stub code:
```{r setup, include=FALSE}
xsz <- 60 # would like not to have to hardcode this
ix1 <- seq(1:102) # would like to break into 2 or 3 approx equal graphs #
fcost <- sample(0:200, 102)
f.df <- data.frame("Cost" = fcost, "Month" = ix1)
fn <- nrow(f.df)
fr <- rep(1:ceiling(fn/xsz),each=xsz)[1:fn]
fd <- split(f.df,fr)
fc <- numeric(length(fd))
for (i in 1:length(fd)){
print(ggplot(as.data.frame(fd[i]), aes(Month, Cost)) +
geom_line(colour = "darkred", size = .5) +
geom_point(colour = "red", size = 1) +
labs(x = "Projected Future Costs (monthly)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6)))
}
```
When I run it, I get:
Error in eval(expr, envir, enclos) : object 'Month' not found
When I do:
names(as.data.frame(fd[1]))
I get:
[1] "X1.Cost" "X1.Month"
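Use [[]] for lists. fd[i] returns a one-element list, and as.data.frame() prefixes the column names with the name of that element, which is where "X1." comes from; fd[[i]] returns the data frame itself.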
print(ggplot(as.data.frame(fd[[i]]), aes(Month, Cost)) +
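The difference between the two kinds of subsetting can be checked on a tiny example. The four-row data frame below is hypothetical and only mirrors the column names from the question.

```r
# Hypothetical data frame with the question's column names
f.df <- data.frame(Cost = c(10, 20, 30, 40), Month = 1:4)
fd <- split(f.df, c(1, 1, 2, 2))

class(fd[1])     # a list that contains one data frame
class(fd[[1]])   # the data frame itself

single_names <- names(as.data.frame(fd[1]))  # names gain the "X1." prefix
double_names <- names(fd[[1]])               # original names survive
```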
To answer your other question, you have to create a new variable with a plot number. Here I'm using rep.
f.df$plot_number <- rep(1:ceiling(nrow(f.df)/60), each=60, length.out=nrow(f.df)) # ceiling() so a partial final group still gets its own plot number
Then, you create a list of plots in a loop
plots <- list() # new empty list
for (i in unique(f.df$plot_number)) {
p = ggplot(f.df[f.df$plot_number==i,], aes(Month, Cost)) +
geom_line(colour = "darkred", size = .5) +
geom_point(colour = "red", size = 1) +
labs(x = "Projected Future Costs (monthly)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6))
plots[[paste0("p",i)]] <- p # add each plot into plot list
}
With the gridExtra package, you can then arrange your plots into a single figure.
library(gridExtra)
do.call("grid.arrange", c(plots, ncol=1))
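If roughly equal-sized plots are preferred over one full plot plus a short remainder, the group numbers can be computed from the row count. This is a sketch under the assumption that 60 points per plot is the readability limit; max_per_plot, n_plots and group_sizes are illustrative names.

```r
max_per_plot <- 60                      # readability limit from the question
n <- 102                                # number of months in the example
n_plots <- ceiling(n / max_per_plot)    # fewest plots that respect the limit

# Row i goes to group ceiling(i * n_plots / n), so group sizes
# differ from each other by at most one row
plot_number <- ceiling(seq_len(n) * n_plots / n)
group_sizes <- table(plot_number)
```

With 102 rows this gives two groups of 51 rather than 60 and 42, and plot_number can replace the rep() based column in the loop above.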

ggplot2 integer multiple of minor breaks per major break

This answer shows how you can specify where the minor breaks should go. In the documentation it says that minor_breaks can be a function. This, however, takes as input the plot limits not, as I expected, the location of the major gridlines below and above.
It doesn't seem very simple to make a script that will return me, say, 4 minors per major. This is something I would like to do since I have a script that I want to use on multiple different datasets. I don't know the limits beforehand, so I can't hard code them in. I can of course create a function that gets the values I need from the dataset before plotting, but it seems overkill.
Is there a general way to state the number of minor breaks per major break?
You can extract the majors from the plot, and from there calculate what minors you want and set it for your plot.
df <- data.frame(x = 0:10,
y = 0:10)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$panel$ranges[[1]]$x.major_source
multiplier <- 4
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1)
p + scale_x_continuous(minor_breaks = minors)
I think scales::extended_breaks is the default function for a continuous scale. You can set the number of breaks in this function, and make the number of minor_breaks an integer multiple of the number of breaks.
library(ggplot2)
library(scales)
nminor <- 7
nmajor <- 5
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_point() +
scale_y_continuous(breaks = extended_breaks(n = nmajor), minor_breaks = extended_breaks(n = nmajor * nminor) )
Using ggplot2 version 3, I have to modify Eric Watt's code above a bit to get it to work (I can't comment on that instead since I don't have a 50 reputation yet)
library(ggplot2)
df <- data.frame(x = 0:10,
y = 10:20)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$layout$panel_params[[1]]$x.major_source;majors
multiplier <- 10
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1);minors
p + scale_x_continuous(minor_breaks = minors)
If I copy paste the same code in my editor, it doesn't create majors (NULL), and so the next line gives an error.
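Since the fields inside the built plot object appear to differ between ggplot2 versions, one way to avoid them is to pass a function as minor_breaks that recomputes the majors from the limits it receives and then subdivides them. This is only a sketch: minor_per_major is a hypothetical helper, and it assumes the scale's major breaks come from extended_breaks() with the same n_major, which may not hold if breaks are set some other way.

```r
library(scales)

# Hypothetical helper: returns a function suitable for `minor_breaks =`
minor_per_major <- function(n_minor = 4, n_major = 5) {
  function(limits) {
    # Recompute the major breaks from the limits passed in
    majors <- extended_breaks(n = n_major)(limits)
    # Split every major interval into n_minor equal parts
    seq(min(majors), max(majors),
        length.out = (length(majors) - 1) * n_minor + 1)
  }
}

# Quick check outside of any plot
minors <- minor_per_major(n_minor = 4)(c(0, 10))
```

In a plot it would be used as p + scale_x_continuous(minor_breaks = minor_per_major(4)). Minor gridlines beyond the outermost major break are not produced by this sketch.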

Trouble with placing and formatting dates in ggplot2 graph using chron

I've been trying to add appropriate dates on the x-axis of my graph, but can't figure out how to do it in a sane way. What I want is pretty simple: a date at every January 1st in between the minimum and maximum of my data set.
I don't want to include the month - just '2008' or '2009' or whatever is fine. A great example would be this graph:
[Image: "example graph"]
Except I want the date on every year, rather than every other year.
I can't seem to figure this out. My dates are defined as days since 1/1/1970, and I've included a method dateEPOCH_formatter which converts the epoch format to a format using the chron package. I've figured out how to make a tick mark and date at the origin of the graph and every 365 days thereafter, but that's not quite the same thing.
Another minor problem is that, mysteriously, the line chron(floor(y), out.format="mon year",origin.=epoch) outputs a graph with axis markers like 'Mar 2008', but changing the line to chron(floor(y), out.format="year",origin.=epoch) doesn't give me a result like '2008' - it just results in the error:
Error in parse.format(format[1]) : unrecognized format year
Calls: print ... as.character.times -> format -> format.dates -> parse.format
Execution halted
Here's my code - thanks for the help.
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
h <- floor(x/3600)
m <- floor(x %% 60)
s <- round(60*(x %% 1)) # Round to nearest second
lab <- sprintf('%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
lab <- gsub('^00:', '', lab) # Remove leading 00: if present
lab <- gsub('^0', '', lab) # Remove leading 0 if present
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y), out.format="mon year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 14400)) + # adds tick marks every 4 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) +
theme_bw() +
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=5/8, size=2, color="#A9203E"),
position=position_identity(),
)
print(p)
dev.off()
I think it will be much easier to use the built-in function scale_x_date with date_format and date_breaks from the scales package. These should work with most date classes in R, such as Date, chron, etc.
For example:
library(ggplot2)
library(chron)
library(scales)
# some example data
days <- seq(as.Date('01-01-2000', format = '%d-%m-%Y'),
as.Date('01-01-2010', format = '%d-%m-%Y'), by = 1)
days_chron <- as.chron(days)
mydata <- data.frame(day = days_chron, y = rnorm(length(days)))
# the plot
ggplot(mydata, aes(x=days, y= y)) + geom_point() +
scale_x_date(breaks = date_breaks('year'), labels = date_format('%Y'))
To show how intuitive and easy these functions are, here is what you would use if you wanted month-year labels every 6 months. Note that this requires a very wide plot or very small axis labels.
ggplot(mydata, aes(x=days, y= y)) + geom_point() +
scale_x_date(breaks = date_breaks('6 months'), labels = date_format('%b-%Y'))
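For data stored as days since 1970-01-01, as in the question, a possible route is to convert the column to Date first and let scale_x_date() choose yearly breaks. The sketch below uses made-up data and assumes a ggplot2 version that supports the date_breaks and date_labels arguments of scale_x_date().

```r
library(ggplot2)

# Hypothetical data: `day` counts days since 1970-01-01, as in the question
mydata <- data.frame(day = 13879:14700,
                     seconds = (13879:14700 * 431) %% 86400)

# Convert epoch days to Date so that date scales can be used
mydata$date <- as.Date(mydata$day, origin = "1970-01-01")

p <- ggplot(mydata, aes(x = date, y = seconds)) +
  geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
```

With yearly breaks the labels should read 2008, 2009 and so on, without the month.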
