I am testing some ggplot2 templates, and I am interested in the slope chart, available here:
Slope Chart - Link
It works perfectly.
I am trying to make a similar one, but with only one group, using the following code:
library(dplyr)
library(ggplot2)
theme_set(theme_classic())
source_df <- read.csv("https://raw.githubusercontent.com/jkeirstead/r-slopegraph/master/cancer_survival_rates.csv")
source_df <- filter(source_df, group == "Thyroid")
Then I copy the remaining code from the example:
# Define functions. Source: https://github.com/jkeirstead/r-slopegraph
tufte_sort <- function(df, x="year", y="value", group="group", method="tufte", min.space=0.05) {
## First rename the columns for consistency
ids <- match(c(x, y, group), names(df))
df <- df[,ids]
names(df) <- c("x", "y", "group")
## Expand grid to ensure every combination has a defined value
tmp <- expand.grid(x=unique(df$x), group=unique(df$group))
tmp <- merge(df, tmp, all.y=TRUE)
df <- mutate(tmp, y=ifelse(is.na(y), 0, y))
## Cast into a matrix shape and arrange by first column
require(reshape2)
tmp <- dcast(df, group ~ x, value.var="y")
ord <- order(tmp[,2])
tmp <- tmp[ord,]
min.space <- min.space*diff(range(tmp[,-1]))
yshift <- numeric(nrow(tmp))
## Start at "bottom" row
## Repeat for rest of the rows until you hit the top
for (i in 2:nrow(tmp)) {
## Shift subsequent row up by equal space so gap between
## two entries is >= minimum
mat <- as.matrix(tmp[(i-1):i, -1])
d.min <- min(diff(mat))
yshift[i] <- ifelse(d.min < min.space, min.space - d.min, 0)
}
tmp <- cbind(tmp, yshift=cumsum(yshift))
scale <- 1
tmp <- melt(tmp, id=c("group", "yshift"), variable.name="x", value.name="y")
## Store these gaps in a separate variable so that they can be scaled ypos = a*yshift + y
tmp <- transform(tmp, ypos=y + scale*yshift)
return(tmp)
}
plot_slopegraph <- function(df) {
ylabs <- subset(df, x==head(x,1))$group
yvals <- subset(df, x==head(x,1))$ypos
fontSize <- 3
gg <- ggplot(df,aes(x=x,y=ypos)) +
geom_line(aes(group=group),colour="grey80") +
geom_point(colour="white",size=8) +
geom_text(aes(label=y), size=fontSize, family="American Typewriter") +
scale_y_continuous(name="", breaks=yvals, labels=ylabs)
return(gg)
}
## Prepare data
df <- tufte_sort(source_df,
x="year",
y="value",
group="group",
method="tufte",
min.space=0.05)
df <- transform(df,
x=factor(x, levels=c(5,10,15,20),
labels=c("5 years","10 years","15 years","20 years")),
y=round(y))
## Plot
plot_slopegraph(df) + labs(title="Estimates of % survival rates") +
theme(axis.title=element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust=0.5,
family = "American Typewriter",
face="bold"),
axis.text = element_text(family = "American Typewriter",
face="bold"))
The problem is that the connecting lines disappear if I keep only one value of source_df$group (in this case Thyroid), as shown below:
If I add just one more group to the same column, everything is fine and the connecting line appears.
Is there a way to get the lines in this situation too? I have tried many things, including removing the rows that contain NA values, without success, so I do not know how to fix this problem, or whether it can be fixed at all.
Thank you in advance for any reply!
May I suggest a much easier way, using the {ggh4x} package, which provides a geom that behaves like base R's type = "b". You can remove the points, and plot text instead.
You'll get your result in three lines of code :)
library(tidyverse)
library(ggh4x)
source_df <- read.csv("https://raw.githubusercontent.com/jkeirstead/r-slopegraph/master/cancer_survival_rates.csv")
source_df <- filter(source_df, group == "Thyroid")
ggplot(source_df, aes(year, value)) +
## set shape to NA
geom_pointpath(aes(group = group, mult = 1), shape = NA) +
geom_text(aes(label = value))
Created on 2021-12-30 by the reprex package (v2.0.1)
I fixed it simply by adding the line:
df <- df[complete.cases(df), ]
Add it before the plotting code. The problem was that many rows with NA values were generated, and this line removes every row that contains an NA.
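A likely source of those NA rows is the loop header `for (i in 2:nrow(tmp))` in tufte_sort(). With a single group, nrow(tmp) is 1, and in R `2:1` counts down instead of producing an empty sequence, so the loop body still runs and can write past the end of yshift. The sketch below only illustrates that pitfall and one possible guard; it is not part of the original slopegraph code, and the variable n stands in for nrow(tmp).

```r
n <- 1                       # one group, so the wide table has a single row

2:n                          # counts down: 2 1
seq_len(n)[-1]               # integer(0), so a loop over it is skipped

yshift <- numeric(n)
for (i in seq_len(n)[-1]) {  # possible replacement for `for (i in 2:nrow(tmp))`
  yshift[i] <- NA            # never reached when n == 1
}
yshift                       # still length 1, and no NA was introduced
```

With this guard the single-group case should no longer produce the extra rows, so complete.cases() would not be needed, although keeping it does no harm.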
Related
After many questions on how to make boxplots with facets and significance levels, particularly this and this, I still have one more little problem.
I managed to produce the plot shown below, which is exactly what I want.
The problem I am facing now is when I have very few, or no significant comparisons; in those cases, the whole space dedicated to the brackets showing the significance levels is still preserved, but I want to get rid of it.
Please check this MWE with the iris dataset:
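library(data.table)  # for data.table(), used in the p-value loop below
library(reshape2)    # loaded after data.table so that its melt()/dcast() are the ones used
library(ggplot2)
library(ggsignif)    # for geom_signif()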
data(iris)
iris$treatment <- rep(c("A","B"), length(iris$Species)/2)
mydf <- melt(iris, measure.vars=names(iris)[1:4])
mydf$treatment <- as.factor(mydf$treatment)
mydf$variable <- factor(mydf$variable, levels=sort(levels(mydf$variable)))
mydf$both <- factor(paste(mydf$treatment, mydf$variable), levels=(unique(paste(mydf$treatment, mydf$variable))))
a <- combn(levels(mydf$both), 2, simplify = FALSE)#this 6 times, for each lipid class
b <- levels(mydf$Species)
CNb <- relist(
paste(unlist(a), rep(b, each=sum(lengths(a)))),
rep.int(a, length(b))
)
CNb
CNb2 <- data.frame(matrix(unlist(CNb), ncol=2, byrow=T))
CNb2
#new p.values
pv.df <- data.frame()
for (gr in unique(mydf$Species)){
for (i in 1:length(a)){
tis <- a[[i]] #variable pair to test
as <- subset(mydf, Species==gr & both %in% tis)
pv <- wilcox.test(value ~ both, data=as)$p.value
ddd <- data.table(as)
asm <- as.data.frame(ddd[, list(value=mean(value)), by=list(both=both)])
asm2 <- dcast(asm, .~both, value.var="value")[,-1]
pf <- data.frame(group1=paste(tis[1], gr), group2=paste(tis[2], gr), mean.group1=asm2[,1], mean.group2=asm2[,2], log.FC.1over2=log2(asm2[,1]/asm2[,2]), p.value=pv)
pv.df <- rbind(pv.df, pf)
}
}
pv.df$p.adjust <- p.adjust(pv.df$p.value, method="BH")
colnames(CNb2) <- colnames(pv.df)[1:2]
# merge with the CN list
pv.final <- merge(CNb2, pv.df, by.x = c("group1", "group2"), by.y = c("group1", "group2"))
# fix ordering
pv.final <- pv.final[match(paste(CNb2$group1, CNb2$group2), paste(pv.final$group1, pv.final$group2)),]
# set signif level
pv.final$map.signif <- ifelse(pv.final$p.adjust > 0.05, "", ifelse(pv.final$p.adjust > 0.01,"*", "**"))
# subset
G <- pv.final$p.adjust <= 0.05
CNb[G]
P <- ggplot(mydf,aes(x=both, y=value)) +
geom_boxplot(aes(fill=Species)) +
facet_grid(~Species, scales="free", space="free_x") +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
geom_signif(test="wilcox.test", comparisons = combn(levels(mydf$both),2, simplify = F),
map_signif_level = F,
vjust=0.5,
textsize=4,
size=0.5,
step_increase = 0.06)
P2 <- ggplot_build(P)
#pv.final$map.signif <- "" #UNCOMMENT THIS LINE TO MOCK A CASE WHERE THERE ARE NO SIGNIFICANT COMPARISONS
#pv.final$map.signif[c(1:42,44:80,82:84)] <- "" #UNCOMMENT THIS LINE TO MOCK A CASE WHERE THERE ARE JUST A COUPLE OF SIGNIFICANT COMPARISONS
P2$data[[2]]$annotation <- rep(pv.final$map.signif, each=3)
# remove non significants
P2$data[[2]] <- P2$data[[2]][P2$data[[2]]$annotation != "",]
# and the final plot
png(filename="test.png", height=800, width=800)
plot(ggplot_gtable(P2))
dev.off()
Which produces this plot:
The plot above is exactly what I want... But I am facing cases where there are no significant comparisons, or very few. In these cases, a lot of vertical space is left empty.
To exemplify those scenarios, we can uncomment the line:
pv.final$map.signif <- "" #UNCOMMENT THIS LINE TO MOCK A CASE WHERE THERE ARE NO SIGNIFICANT COMPARISONS
So when there are no significant comparisons I get this plot:
If we uncomment this other line instead:
pv.final$map.signif[c(1:42,44:80,82:84)] <- "" #UNCOMMENT THIS LINE TO MOCK A CASE WHERE THERE ARE JUST A COUPLE OF SIGNIFICANT COMPARISONS
We are in a case where there are only a couple of significant comparisons, and obtain this plot:
So my question here is:
How to adjust the vertical space to the number of significant comparisons, so no vertical space is left there?
There might be something I could change in step_increase or in y_position inside geom_signif(), so I only leave space for the significant comparisons in CNb[G]...
One option is to pre-calculate the p-values for each combination of both levels and then select only the significant ones for plotting. Since we then know up front how many are significant, we can adjust the y-ranges of the plots to account for that. However, it doesn't look like geom_signif is capable of doing only within-facet calculations for the p-value annotations (see the help for the manual argument). Thus, instead of using ggplot's faceting, we instead use lapply to create a separate plot for each Species and then use grid.arrange from the gridExtra package to lay out the individual plots as if they were faceted.
(To respond to the comments, I want to emphasize that the plots are all still created with ggplot2, but we create what would have been the three facet panels of a single plot as three separate plots and then lay them out together as if they had been faceted.)
The function below is hard-coded for the data frame and column names in the OP, but can of course be generalized to take any data frame and column names.
library(gridExtra)
library(tidyverse)
# Change data to reduce number of statistically significant differences
set.seed(2)
df = mydf %>% mutate(value=rnorm(nrow(mydf)))
# Function to generate and lay out the plots
signif_plot = function(signif.cutoff=0.05, height.factor=0.23) {
# Get full range of y-values
y_rng = range(df$value)
# Generate a list of three plots, one for each Species (these are the facets)
plot_list = lapply(split(df, df$Species), function(d) {
# Get pairs of x-values for current facet
pairs = combn(sort(as.character(unique(d$both))), 2, simplify=FALSE)
# Run wilcox test on every pair
w.tst = pairs %>%
map_df(function(lv) {
p.value = wilcox.test(d$value[d$both==lv[1]], d$value[d$both==lv[2]])$p.value
data.frame(levs=paste(lv, collapse=" "), p.value)
})
# Record number of significant p.values. We'll use this later to adjust the top of the
# y-range of the plots
num_signif = sum(w.tst$p.value <= signif.cutoff)
# Plot significance levels only for combinations with p <= signif.cutoff
p = ggplot(d, aes(x=both, y=value)) +
geom_boxplot() +
facet_grid(~Species, scales="free", space="free_x") +
geom_signif(test="wilcox.test", comparisons = pairs[which(w.tst$p.value <= signif.cutoff)],
map_signif_level = F,
vjust=0,
textsize=3,
size=0.5,
step_increase = 0.08) +
theme_bw() +
theme(axis.title=element_blank(),
axis.text.x = element_text(angle=45, hjust=1))
# Return the plot and the number of significant p-values
return(list(num_signif, p))
})
# Get the highest number of significant p-values across all three "facets"
max_signif = max(sapply(plot_list, function(x) x[[1]]))
# Lay out the three plots as facets (one for each Species), but adjust so that y-range is same
# for each facet. Top of y-range is adjusted using max_signif.
grid.arrange(grobs=lapply(plot_list, function(x) x[[2]] +
scale_y_continuous(limits=c(y_rng[1], y_rng[2] + height.factor*max_signif))),
ncol=3, left="Value")
}
Now run the function with four different significance cutoffs:
signif_plot(0.05)
signif_plot(0.01)
signif_plot(0.9)
signif_plot(0.0015)
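The core idea of the answer above (test every pair first, keep only the significant pairs, then size the headroom from their count) can be sketched in base R without any plotting. This is only an illustration on one iris species; the names long, signif_pairs, step and y_top are hypothetical, and the value of step is an arbitrary assumption that plays the role of height.factor.

```r
# One "facet": a single species, reshaped to long form (columns: values, ind)
d    <- subset(iris, Species == "setosa")
long <- stack(d[, 1:4])

# All pairs of x-axis levels within this facet
pairs <- combn(levels(long$ind), 2, simplify = FALSE)

# Wilcoxon test for each pair; ties trigger a warning about exact p-values
p_vals <- vapply(pairs, function(lv) {
  suppressWarnings(
    wilcox.test(long$values[long$ind == lv[1]],
                long$values[long$ind == lv[2]])$p.value
  )
}, numeric(1))

# Keep only the significant comparisons; these would go to geom_signif()
signif_pairs <- pairs[p_vals <= 0.05]

# Reserve vertical space in proportion to the number of brackets to draw
step  <- 0.5                                   # assumed headroom per bracket
y_top <- max(long$values) + step * length(signif_pairs)
```

When no pair is significant, signif_pairs is empty and y_top equals the data maximum, so no extra space is reserved.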
You can try the following. Although this answer is similar to my answer here, I have now wrapped it in a function.
library(tidyverse)
library(broom)     # for tidy()
library(ggsignif)
# 1. your data
set.seed(2)
df <- as_tibble(iris) %>%
mutate(treatment=rep(c("A","B"), length(iris$Species)/2)) %>%
gather(key, value, -Species, -treatment) %>%
mutate(value=rnorm(n())) %>%
mutate(key=factor(key, levels=unique(key))) %>%
mutate(both=interaction(treatment, key, sep = " "))
# 2. pairwise.wilcox.test for 1) validation and 2) to calculate the ylim
Wilcox <- df %>%
split(., .$Species) %>%
map(~tidy(pairwise.wilcox.test(.$value, .$both, p.adjust.method = "none"))) %>%
map(~filter(.,.$p.value < 0.05)) %>%
bind_rows(.id="Species") %>%
mutate(padjust=p.adjust(p.value, method = "BH"))
# 3. calculate y range
Ylim <- df %>%
summarise(Min=round(min(value)),
Max=round(max(value))) %>%
mutate(Max=Max+0.5*group_by(Wilcox, Species) %>% count() %>% with(.,max(n))) %>%
c()
# 4. the plot function
foo <- function(df, Ylim, Signif=0.05){
P <- df %>%
ggplot(aes(x=both, y=value)) +
geom_boxplot(aes(fill=Species)) +
facet_grid(~Species) +
ylim(Ylim$Min, Ylim$Max)+
theme(axis.text.x = element_text(angle=45, hjust=1)) +
geom_signif(comparisons = combn(levels(df$both),2,simplify = F),
map_signif_level = F, test = "wilcox.test" ) +
stat_summary(fun.y=mean, geom="point", shape=5, size=4) +
xlab("")
# 5. remove not significant values and add step increase
P_new <- ggplot_build(P)
P_new$data[[2]] <- P_new$data[[2]] %>%
filter(as.numeric(as.character(annotation)) < Signif) %>%
group_by(PANEL) %>%
mutate(index=(as.numeric(group[drop=T])-1)*0.5) %>%
mutate(y=y+index,
yend=yend+index) %>%
select(-index) %>%
as.data.frame()
# the final plot
plot(ggplot_gtable(P_new))
}
foo(df, Ylim)
Trying other data:
set.seed(12345)
df <- as_tibble(iris) %>%
mutate(treatment=rep(c("A","B"), length(iris$Species)/2)) %>%
gather(key, value, -Species, -treatment) %>%
mutate(value=rnorm(n())) %>%
mutate(key=factor(key, levels=unique(key))) %>%
mutate(both=interaction(treatment, key, sep = " "))
foo(df, list(Min=-3,Max=5))
Of course, you can add the Ylim calculation to the function as well. In addition, you can change or add ggtitle() and ylab(), and change the colours.
I would like to make a facet grid plot of quite long data which is in a list of different dataframes. Here is some example data:
library(tidyverse)
library(ggplot2)
L <- list()
for (i in 1:10) {
DATE <- seq( as.POSIXct("2016-01-01 21:00:00", tz="UTC"),
as.POSIXct("2016-02-01 21:00:00", tz="UTC"),
by = 600
)
V1 <- round(runif(length(DATE)) - 0.3,0)
L[[i]] <- tibble(DATE, V1)
}
names(L) <- LETTERS[1:10]
What I actually need is a facet_grid() with each dataframe in one row, showing whether V1 is 1 or 0, like a barcode or something similar.
My only success was to bind all list elements into one dataframe, with the name of each list element as a new column. This seems rather circuitous.
for (i in 1:10) {
L[[i]] <- mutate(L[[i]], list.name = names(L)[i] )
}
DF <- do.call(rbind.data.frame, L)
ggplot(DF) +
geom_step(aes(DATE, V1), size = 0.1) +
facet_grid(list.name ~ .)
I would like the area under the geom_step() line to be filled, as it would be with geom_area() (I know it is hard to see in the example). Is there another geom that fits better? Since my real data is much longer, it takes hours to create the plot. Is there a better way to use facet_grid() on list elements?
How about using geom_raster?
DF <- map_df(L, I) %>% mutate(V1 = as.factor(V1))
ggplot(DF, aes(x = DATE, y = list.name, fill = V1)) +
geom_raster() +
scale_fill_manual(values = c("white", "black"))
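The plot above uses list.name as the y aesthetic, so it relies on that column having been added to each list element, as the loop in the question does. A minimal base R sketch of that step is shown here on a small stand-in for L (two elements instead of ten, with made-up values). With dplyr loaded, bind_rows(L, .id = "list.name") should give an equivalent result in one call.

```r
# Small stand-in for the list L from the question
set.seed(42)
DATE <- seq(as.POSIXct("2016-01-01 21:00:00", tz = "UTC"),
            by = 600, length.out = 6)
L <- list(A = data.frame(DATE, V1 = rbinom(6, 1, 0.2)),
          B = data.frame(DATE, V1 = rbinom(6, 1, 0.2)))

# Tag each element with its list name, then stack everything into one frame
DF <- do.call(rbind,
              Map(function(d, nm) cbind(d, list.name = nm), L, names(L)))
rownames(DF) <- NULL

# geom_raster() with scale_fill_manual() expects a discrete fill variable
DF$V1 <- factor(DF$V1, levels = c(0, 1))
str(DF)
```

Fixing the factor levels to 0 and 1 keeps the white/black colour assignment stable even if one element happens to contain only zeros.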
This is a follow-up to the posts here and here.
I have successfully implemented the split violin plot in ggplot2 for my data (two median estimator densities, for two cases) that need to be compared. Now I would like to add confidence intervals, and I am following the code posted in the links above:
EDIT: A reproducible example
tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)
Here, I am building the densities and extracting the (x, y) pairs. Then I compute the quantiles:
# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)
densities$dens <- ifelse(densities$drop_case=="S",densities$dens*1,densities$dens)
conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975))[1],quantile(tmp,c(0.025,0.975))[2],quantile(tmp.2,c(0.025,0.975))[1],quantile(tmp.2,c(0.025,0.975))[2]))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)
Now I am trying to extract the values inside the densities, as noted in the linked posts:
# Find data points in densities
val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
x.here <- densities$loc
y.here <- densities$dens
your.number<- conf$intervals[i]
pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
val.tmp[i] <- x.here[pos.tmp]
val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2
conf$length <- ifelse(conf$drop_case=="B",conf$length*-1,conf$length)
conf$length <- ifelse(conf$drop_case=="S",conf$length*1,conf$length)
ggplot(densities,aes(dens, loc, fill = factor(drop_case)))+
geom_polygon()+
scale_x_continuous(breaks = 0, name = info$Name)+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_point(data = conf, aes(x = positions, y = length, fill = factor(drop_case), group = factor(drop_case))
,shape = 21, colour = "black", show.legend = FALSE)
Unfortunately, the points are then not mapped onto the densities, but are placed elsewhere in the plane.
There are a few small mistakes in the code. Firstly, within that for loop, you can't set x.here and y.here to all of the density and location values, since that includes both groups. Secondly, since the signs are already changed in densities, there is no need to use those ifelse statements afterwards. Thirdly, you would only need the top ifelse anyway, since the bottom one does absolutely nothing. Finally, you had the x and y mappings in geom_point the wrong way around!
There are other things one could change to make the code more readable, but I'm short on time, so I'll leave those as they are.
Below is the full adjusted code:
tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)
# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)
conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975)), quantile(tmp.2,c(0.025,0.975))))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)
val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
x.here <- densities$loc[densities$drop_case == conf$drop_case[i]]
y.here <- densities$dens[densities$drop_case == conf$drop_case[i]]
your.number<- conf$intervals[i]
pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
val.tmp[i] <- x.here[pos.tmp]
val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2
ggplot(densities, aes(dens, loc, fill = drop_case)) +
geom_polygon()+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_point(data = conf, aes(x = length, y = positions, fill = drop_case),
shape = 21, colour = "black", show.legend = FALSE)
This results in:
I would personally prefer a plot with line segments:
ggplot(densities, aes(dens, loc, fill = factor(drop_case)))+
geom_polygon()+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_segment(data = conf, aes(x = length, xend = 0, y = positions, yend = positions))
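The nearest-point lookup done in the for loop above can also be written with which.min(), applied to one density at a time so that the two groups never mix. The helper nearest_point() below is hypothetical (it is not from the linked posts) and is only a compact sketch of the same idea for a single group.

```r
# Hypothetical helper: return the density grid point closest to a target value
nearest_point <- function(dens, target) {
  i <- which.min(abs(dens$x - target))   # index of the closest grid location
  data.frame(loc = dens$x[i], dens = dens$y[i])
}

set.seed(1)
tmp <- rnorm(1000)
d   <- density(tmp)                      # 512 grid points by default
q   <- quantile(tmp, c(0.025, 0.975))

# One row per quantile: where it sits on the grid and the density height there
ci <- do.call(rbind, lapply(q, function(v) nearest_point(d, v)))
ci
```

For the mirrored group ("B" in the code above), the dens column would still need its sign flipped before plotting.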
I'm trying to write a custom scatterplot matrix function in ggplot2 using facet_grid. My data have two categorical variables and one numeric variable.
I'd like to facet (make the scatterplot rows/cols) according to one of the categorical variables and change the plotting symbol according to the other categorical.
I do so by first constructing a larger dataset that includes all combinations (combs) of the categorical variable from which I'm creating the scatterplot panels.
My questions are:
How to use geom_rect to white-out the diagonal and upper panels in facet_grid (I can only make the middle ones black so far)?
How can you move the titles of the facets to the bottom and left hand sides respectively?
How does one remove tick axes and labels for the top left and bottom right facets?
Thanks in advance.
require(ggplot2)
# Data
nC <- 5
nM <- 4
dat <- data.frame(
Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM))
# Change factors to characters
dat <- within(dat, {
Control <- as.character(Control)
measure <- as.character(measure)
})
# Check, lapply(dat, class)
# Define scatterplot() function
scatterplotmatrix <- function(data,...){
controls <- with(data, unique(Control))
measures <- with(data, unique(measure))
combs <- expand.grid(1:length(controls), 1:length(measures), 1:length(measures))
# Add columns for values
combs$value1 = 1
combs$value2 = 0
for ( i in 1:NROW(combs)){
combs[i, "value1"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,2]], select = value)
combs[i, "value2"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,3]], select = value)
}
for ( i in 1:NROW(combs)){
combs[i,"Control"] <- controls[combs[i,1]]
combs[i,"Measure1"] <- measures[combs[i,2]]
combs[i,"Measure2"] <- measures[combs[i,3]]
}
# Final pairs plot
plt <- ggplot(combs, aes(x = value1, y = value2, shape = Control)) +
geom_point(size = 8, colour = "#F8766D") +
facet_grid(Measure2 ~ Measure1) +
ylab("") +
xlab("") +
scale_x_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
scale_y_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
geom_rect(data = subset(combs, subset = Measure1 == Measure2), colour='white', xmin = -Inf, xmax = Inf,ymin = -Inf,ymax = Inf)
return(plt)
}
# Call
plt1 <- scatterplotmatrix(dat)
plt1
I'm not aware of a way to move the panel strips (the labels) to the bottom or left. Also, it's not possible to format the individual panels separately (e.g., turn off the tick marks for just one facet). So if you really need these features, you will probably have to use something other than, or in addition to ggplot. You should really look into GGally, although I've never had much success with it.
As far as leaving some of the panels blank, here is a way.
nC <- 5; nM <- 4
set.seed(1) # for reproducible example
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM))
scatterplotmatrix <- function(data,...){
require(ggplot2)
require(data.table)
require(plyr) # for .(...)
DT <- data.table(data,key="Control")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("Control","H","x","V","y"))
fmt <- function(x) format(x,nsmall=1)
plt <- ggplot(gg, aes(x,y,shape = Control)) +
geom_point(subset=.(as.numeric(H)<as.numeric(V)),size=5, colour="#F8766D") +
facet_grid(V ~ H) +
ylab("") + xlab("") +
scale_x_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05)) +
scale_y_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05))
return(plt)
}
scatterplotmatrix(dat)
The main feature of this is the use of subset=.(as.numeric(H)<as.numeric(V)) in the call to geom_point(...). This subsets the dataset so you only get a point layer when the condition is met, i.e. in facets where as.numeric(H)<as.numeric(V). This works because I've left the H and V columns as factors, and as.numeric(...) applied to a factor returns the integer level codes, not the labels.
The rest is just a more compact (and much faster) way of creating what you called comb.
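As far as I know, newer ggplot2 releases no longer accept the subset argument in geom_point(), but a layer's data argument can be a function that filters the plot data, and facet_grid() has a switch argument that moves the strips to the bottom and left. The sketch below assumes such a newer ggplot2 version. It rebuilds the paired data with base merge() instead of data.table, which is a substitution and not the method used above.

```r
library(ggplot2)

set.seed(1)
nC <- 5; nM <- 4
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
                  measure = rep(letters[1:nM], each = nC),
                  value   = runif(nC * nM))

# Self-join on Control so every measure is paired with every measure
gg <- merge(dat, dat, by = "Control", suffixes = c(".x", ".y"))
names(gg) <- c("Control", "H", "x", "V", "y")
gg$H <- factor(gg$H)
gg$V <- factor(gg$V)

plt <- ggplot(gg, aes(x, y, shape = Control)) +
  # Draw points only below the diagonal; the other panels stay empty
  geom_point(data = function(d) d[as.integer(d$H) < as.integer(d$V), ],
             size = 5, colour = "#F8766D") +
  # switch = "both" puts the strips at the bottom and on the left
  facet_grid(V ~ H, switch = "both") +
  labs(x = NULL, y = NULL)
plt
```

All 16 panels are still laid out because the facets are computed from the full plot data, while the point layer only sees the filtered rows.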
I am trying to make a plot with horizontal lines, where the data2 and data3 points should fall within the data1 range. This should give overlapping lines in different colours, but I get an error that says:
Error in strsplit(filename, "\\.") : non-character argument
Here is the data and code. Any suggestions would be appreciated.
data1 <- data.frame(Start=c(10),End=c(19))
data2 <- data.frame(Start=c(5),End=c(15))
data3 <- data.frame(Start=c(6),End=c(18))
filter_data2 <- data2[data2$Start >= (data1$Start-(data1$Start/2)) & data2$End <= (data1$End+(data1$End/2)), ]
filter_data3 <- data3[data3$Start >= (data1$Start-(data1$Start/2)) & data3$End <= (data1$End+(data1$End/2)), ]
data1 <- data.frame(rep(1,nrow(data1)),data1)
colnames(data1) <- c("ID","start","end")
data2 <- data.frame(rep(2,nrow(filter_data2)),filter_data2)
colnames(data2) <- c("ID","start","end")
data3 <- data.frame(rep(3,nrow(filter_data3)),filter_data3)
colnames(data3) <- c("ID","start","end")
dat1 <- rbind(data1,data2,data3)
pdf("overlap.pdf")
p <- ggplot(dat1, aes(x=(max(start)-max(start)/2), y = ID, colour=ID))
p <- p + geom_segment(aes(xend =(max(end)+max(end)/2), ystart = ID, yend = ID))
p <- p + scale_colour_brewer(palette = "Set1")
ggsave(p)
There are two problems in your code. First, if you want to use scale_colour_brewer(), the ID values should be converted to a factor:
p <- ggplot(dat1, aes(x=(max(start)-max(start)/2), y = ID, colour=as.factor(ID)))
Next, to save the ggplot2 plot you have two options.
With ggsave(), you must provide a file name, and the format is taken from its extension. In this case the call to pdf() is unnecessary.
ggsave(filename="plot.pdf", plot=p)
With pdf(), you must add print(p) and then dev.off(). In this case you don't need ggsave().
pdf("overlap.pdf")
print(p)
dev.off()
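Putting both fixes together, a minimal end-to-end sketch could look like the following. It assumes the intent is one horizontal segment per row running from start to end, which differs from the constant x mapping in the question. The data frame is typed in by hand from the three filtered rows, and the output path in tempdir() is only a placeholder.

```r
library(ggplot2)

# The three rows that survive the filtering step in the question
dat1 <- data.frame(ID    = 1:3,
                   start = c(10, 5, 6),
                   end   = c(19, 15, 18))

# ID is mapped as a factor so that the discrete Brewer palette applies
p <- ggplot(dat1, aes(x = start, xend = end, y = ID, yend = ID,
                      colour = factor(ID))) +
  geom_segment() +
  scale_colour_brewer(palette = "Set1", name = "ID")

# ggsave() needs a file name; the .pdf extension selects the device
out <- file.path(tempdir(), "overlap.pdf")
ggsave(filename = out, plot = p, width = 6, height = 4)
```

Passing plot = p explicitly avoids relying on the last plot displayed, which matters when the plot was never printed.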