I want to create a plot with custom axis tick spacing. What I want to achieve is similar to this plot:
I can specify axis tick locations using breaks argument, but I can't change the distance between them.
dat <- data.frame(x = runif(100), y = runif(100))
ggplot(dat, aes(x,y)) + geom_point() + scale_x_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1)) + scale_y_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1))
What I essentially want is to focus on a specific interval (say 0:0.2) and have bigger spacing for this interval and squish the rest (0.2:1).
Right now I do that by creating two graphs for my desired intervals, and glue them together with grid.arrange, but I was wondering if there was a solution that enables me to generate the plot in one go.
This is my current solution:
q1<-ggplot(dat, aes(x,y)) + geom_point() + ylim(c(0.2,1)) + xlim(c(0,0.2))+ theme(axis.text.x = element_blank(), axis.title.x = element_blank(), axis.ticks.x=element_blank())
q2<-ggplot(dat, aes(x,y)) + geom_point() + ylim(c(0.2,1)) + xlim(c(0.2,1))+ theme(axis.text = element_blank(), axis.title = element_blank(), axis.ticks=element_blank())
q3<-ggplot(dat, aes(x,y)) + geom_point() + ylim(c(0,0.2)) + xlim(c(0,0.2))
q4<-ggplot(dat, aes(x,y)) + geom_point() + ylim(c(0,0.2)) + xlim(c(0.2,1))+ theme(axis.text.y = element_blank(), axis.title.y = element_blank(), axis.ticks.y=element_blank())
grid.arrange(q1,q2,q3,q4)
OK, first the obligatory comment: squishing part of an axis so that a position on the plot no longer maps directly to the data is generally not a good idea.
That said, here is how you can do it. We can make a function factory that produces a transformation object with the scales package. The function factory accepts the range it should squish and the factor by which to squish it. I haven't tested it exhaustively, but I think it works correctly.
library(ggplot2)
library(scales)
squish_trans <- function(range, factor = 10) {
force(range)
force(factor)
forward <- function(x) {
test_between <- x > range[1] & x < range[2]
test_over <- x >= range[2]
between <- ((x - range[1]) / factor) + range[1]
over <- (x - range[2] + diff(range) / factor) + range[1]
ifelse(test_over, over,
ifelse(test_between, between, x))
}
reverse <- function(x) {
test_between <- x > range[1] & x < range[1] + diff(range) / factor
test_over <- x >= range[1] + diff(range) / factor
between <- ((x - range[1]) * factor) + range[1]
over <- (x - range[1]) - diff(range) / factor + range[2]
ifelse(test_over, over,
ifelse(test_between, between, x))
}
trans_new(
"squish_trans",
transform = forward,
inverse = reverse
)
}
Now we simply pass the result of the function factory as the trans argument, with the range you want to squish. Notice that the 0.2-1 range (80% of the data range) now takes up 0.08/0.28 ≈ 0.29 (~29%) of the axis range, because we squish by a factor of 10.
dat <- data.frame(x = runif(100), y = runif(100))
ggplot(dat, aes(x,y)) + geom_point() +
scale_x_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1),
trans = squish_trans(c(0.2, Inf))) +
scale_y_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1),
trans = squish_trans(c(0.2, Inf)))
Created on 2021-02-05 by the reprex package (v1.0.0)
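The arithmetic behind that share can be checked without drawing anything. The sketch below is a simplified stand-in for the forward mapping, valid only for the case range = c(lower, Inf) used above; squish_above is a hypothetical helper name, not part of scales or ggplot2.

```r
# Hypothetical helper: forward mapping for the special case range = c(lower, Inf).
# Values above `lower` are compressed by `factor`; values at or below it are untouched.
squish_above <- function(x, lower = 0.2, factor = 10) {
  ifelse(x > lower, (x - lower) / factor + lower, x)
}

breaks <- c(0, 0.1, 0.2, 0.4, 0.8, 1)
pos <- squish_above(breaks)   # where each break lands on the transformed axis
pos
# 0.00 0.10 0.20 0.22 0.26 0.28

# Share of the transformed axis occupied by the 0.2-1 interval
share <- (pos[6] - pos[3]) / (pos[6] - pos[1])
round(share, 3)
# 0.286
```

So the interval that holds 80% of the data range ends up on a bit under 29% of the axis, while 0-0.2 gets the remaining 71%.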
dat <- data.frame(x = runif(100), y = runif(100))
ggplot(dat, aes(x,y)) +
geom_point() +
scale_x_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1)) +
scale_y_continuous(breaks=c(0,0.1,0.2,0.4,0.8,1))
dat$condx <- ifelse(dat$x > 0.2, "x2", "x1")
dat$condy <- ifelse(dat$y > 0.2, "y1", "y2")
dat$condxy <- paste(dat$condx, dat$condy)
ggplot(dat, aes(x, y, group=condxy)) +
geom_point() +
scale_x_continuous(breaks=c(0,0.05,0.1,0.15,0.2,0.4,0.6,0.8,1)) +
scale_y_continuous(breaks=c(0,0.05,0.1,0.15,0.2,0.4,0.6,0.8,1)) +
facet_grid(condy~condx, scales="free")
(Related to Two scales in the same axis)
I'm trying to produce a scatter plot with geom_point where the points are circumscribed by a smoothed polygon, with geom_polygon.
Here's my point data:
set.seed(1)
df <- data.frame(x=c(rnorm(30,-0.1,0.1),rnorm(30,0,0.1),rnorm(30,0.1,0.1)),y=c(rnorm(30,-1,0.1),rnorm(30,0,0.1),rnorm(30,1,0.1)),val=rnorm(90),cluster=c(rep(1,30),rep(2,30),rep(3,30)),stringsAsFactors=F)
I color each point according to the interval that df$val falls in. Here's the interval data:
intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=F)
Assigning colors and intervals to the points:
df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
if(is.na(x)){
return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)],stringsAsFactors=F))
} else{
idx <- which(intervals.df$start <= x & intervals.df$end >= x)
return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx],stringsAsFactors=F))
}
})))
Preparing the colors for the legend, which will show each interval:
df$interval <- factor(df$interval,levels=intervals.df$interval)
colors <- intervals.df$col
names(colors) <- intervals.df$interval
Here's where I constructed the smoothed polygons (using a function courtesy of this link):
clusters <- sort(unique(df$cluster))
cluster.cols <- c("#ff00ff","#088163","#ccbfa5")
splinePolygon <- function(xy,vertices,k=3, ...)
{
# Assert: xy is an n by 2 matrix with n >= k.
# Wrap k vertices around each end.
n <- dim(xy)[1]
if (k >= 1) {
data <- rbind(xy[(n-k+1):n,], xy, xy[1:k, ])
} else {
data <- xy
}
# Spline the x and y coordinates.
data.spline <- spline(1:(n+2*k), data[,1], n=vertices, ...)
x <- data.spline$x
x1 <- data.spline$y
x2 <- spline(1:(n+2*k), data[,2], n=vertices, ...)$y
# Retain only the middle part.
cbind(x1, x2)[k < x & x <= n+k, ]
}
library(data.table)
hulls.df <- do.call(rbind,lapply(1:length(clusters),function(l){
dt <- data.table(df[which(df$cluster==clusters[l]),])
hull <- dt[, .SD[chull(x,y)]]
spline.hull <- splinePolygon(cbind(hull$x,hull$y),100)
return(data.frame(x=spline.hull[,1],y=spline.hull[,2],val=NA,cluster=clusters[l],col=cluster.cols[l],interval=NA,stringsAsFactors=F))
}))
hulls.df$cluster <- factor(hulls.df$cluster,levels=clusters)
And here's my ggplot command:
library(ggplot2)
p <- ggplot(df,aes(x=x,y=y,colour=interval))+geom_point(cex=2,shape=1,stroke=1)+labs(x="X", y="Y")+theme_bw()+theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+scale_color_manual(drop=FALSE,values=colors,name="DE")
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster),color=hulls.df$col,fill=NA)
which produces:
My question is: how do I add a legend for the polygons under the legend for the points? I want it to be a legend with 3 lines colored according to the cluster colors, with the corresponding cluster number beside each line.
Slightly different output, only changing the last line of your code, it may solve your purpose:
p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill=cluster),alpha=0.1)
Say, you want to add a legend of the_factor. My basic idea is,
(1) put the_factor into mapping by using unused aes arguments; aes(xx = the_factor)
(2) if (1) affects something, delete the effect by using scale_xx_manual()
(3) modify the legend by using guides(xx = guide_legend(override.aes = list()))
In your case, the fill and alpha aesthetics are unused. fill is the better choice because mapping it has no visible effect on the points (shape 1 is hollow). So I used aes(fill=as.factor(cluster)).
p <- ggplot(df,aes(x=x,y=y,colour=interval, fill=as.factor(cluster))) + # add aes(fill=...)
geom_point(cex=2, shape=1, stroke=1) +
labs(x="X", y="Y",fill="cluster") + # add fill="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
guides(fill = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
Of course, you can make the same graph by using aes(alpha = the_factor). Because alpha does affect the points, you need to neutralise it with scale_alpha_manual().
g <- ggplot(df, aes(x=x,y=y,colour=interval)) +
geom_point(cex=2, shape=1, stroke=1, aes(alpha=as.factor(cluster))) + # add aes(alpha)
labs(x="X", y="Y",alpha="cluster") + # add alpha="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
scale_alpha_manual(values=c(1,1,1)) + # add
guides(alpha = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
g <- g+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
What you are asking for is two colour scales. My understanding is that this is not possible. But you can give the impression of having two colour scales with a bit of a cheat and using the filled symbols (shapes 21 to 25).
p <- ggplot(df, aes(x = x, y = y, fill = interval)) +
geom_point(cex = 2, shape = 21, stroke = 1, colour = NA)+
labs(x = "X", y = "Y") +
theme_bw() +
theme(legend.key = element_blank(), panel.border = element_blank(), strip.background = element_blank()) +
scale_fill_manual(drop=FALSE, values=colors, name="DE") +
geom_polygon(data = hulls.df, aes(x = x, y = y, colour = cluster), fill = NA) +
scale_colour_manual(values = cluster.cols)
p
Alternatively, use a filled polygon with a low alpha
p <- ggplot(df,aes(x=x,y=y,colour=interval))+
geom_point(cex=2,shape=1,stroke=1)+
labs(x="X", y="Y")+
theme_bw() +
theme(legend.key = element_blank(),panel.border=element_blank(), strip.background=element_blank()) +
scale_color_manual(drop=FALSE,values=colors,name="DE", guide = guide_legend(override.aes = list(fill = NA))) +
geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill = cluster), alpha = 0.2, show.legend = TRUE) +
scale_fill_manual(values = cluster.cols)
p
But this might make the point colours difficult to see.
Is there any way to set the break step size in ggplot without defining a sequence? For example:
x <- 1:10
y <- 1:10
df <- data.frame(x, y)
# Plot with auto scale
ggplot(df, aes(x,y)) + geom_point()
# Plot with breaks defined by sequence
ggplot(df, aes(x,y)) + geom_point() +
scale_y_continuous(breaks = seq(0,10,1))
# Plot with automatic sequence for breaks
ggplot(df, aes(x,y)) + geom_point() +
scale_y_continuous(breaks = seq(min(df$y),max(df$y),1))
# Does this exist?
ggplot(df, aes(x,y)) + geom_point() +
scale_y_continuous(break_step = 1)
You may say I am being lazy but there have been a few occasions where I have had to change the min and max limits of my seq due to the addition of error bars. So I just want to say...use a break size of x, with automatic scale limits.
You can define your own function to pass to the breaks argument. An example that would work in your case would be
f <- function(y) seq(floor(min(y)), ceiling(max(y)))
Then
ggplot(df, aes(x,y)) + geom_point() + scale_y_continuous(breaks = f)
gives
You could modify this to pass the step of the breaks, e.g.
f <- function(k) {
step <- k
function(y) seq(floor(min(y)), ceiling(max(y)), by = step)
}
then
ggplot(df, aes(x,y)) + geom_point() + scale_y_continuous(breaks = f(2))
would create a y-axis with ticks at 2, 4, .., 10, etc.
You can take this even further by writing your own scale function
my_scale <- function(step = 1, ...) scale_y_continuous(breaks = f(step), ...)
and just call it like
ggplot(df, aes(x,y)) + geom_point() + my_scale()
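To see what such a breaks function produces, it can be called directly on a pair of limits. In the sketch below, step_breaks is a hypothetical name for the same factory, and the limits c(0.55, 10.45) are an assumed example of a padded range for data spanning 1 to 10; the exact limits ggplot2 passes in are not taken from the answer.

```r
# Hypothetical name for the factory described above: returns a function that
# builds a break sequence with the requested step from whatever limits it is given.
step_breaks <- function(step = 1) {
  function(limits) seq(floor(min(limits)), ceiling(max(limits)), by = step)
}

# Assumed example limits (data range 1-10 with a little padding)
brk <- step_breaks(2)(c(0.55, 10.45))
brk
# 0 2 4 6 8 10
```

Breaks that fall outside the visible range (here 0) are simply not drawn, which is why the axis shows ticks from 2 upward.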
> # Does this exist?
> ggplot(df, aes(x,y)) + geom_point() +
> scale_y_continuous(break_step = 1)
If you're looking for an off-the-shelf solution, then you can use the scales::breaks_width() function like so:
scale_y_continuous(breaks = scales::breaks_width(1))
The scales package also includes handy functions to control breaks easily in "special" scales such as date-time, e.g. scale_x_datetime(breaks='6 hours').
In the code below I build a 40x1000 data frame where in each column I have the cumulative means for successive random draws from an exponential distribution with parameter lambda = 0.2.
I add an additional column to host the specific number of the "draw".
I also calculate the rowmeans as df_means.
How do I add df_means (as a black line) on top of all my simulated RVs? I don't understand ggplot well enough to do this.
df <- data.frame(replicate(1000,cumsum(rexp(40,lambda))/(1:40)))
df$draw <- seq(1,40)
df_means <- rowMeans(df)
Molten <- melt(df, id.vars="draw")
ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none") + geom_line(df_means)
How would I add plot(df_means, type="l") to my ggplot, below?
Thank you,
You can make another data.frame with the means and ids and use that to draw the line,
df_means <- rowMeans(df[, names(df) != "draw"])  # average the simulations only, not the draw index
means <- data.frame(id=1:40, mu=df_means)
ggplot(Molten, aes(x=draw, y=value, colour=variable)) +
geom_line() +
theme(legend.position = "none") +
geom_line(data=means, aes(x=id, y=mu), color="black")
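One detail worth checking is which columns go into the row means: once the draw column has been added to df, a plain rowMeans(df) averages it in as well. A minimal base-R sketch, assuming lambda = 0.2 as stated in the question and using only 5 simulated columns to keep it small (sims, mu_all and mu_sims are hypothetical names):

```r
set.seed(42)
lambda <- 0.2  # rate parameter stated in the question

# 40 draws x 5 simulations of cumulative means (5 instead of 1000, for brevity)
sims <- data.frame(replicate(5, cumsum(rexp(40, lambda)) / (1:40)))
sims$draw <- seq_len(40)

mu_all  <- rowMeans(sims)                           # also averages the draw index
mu_sims <- rowMeans(sims[, names(sims) != "draw"])  # simulations only

# Data frame in the shape used for the black line
means <- data.frame(id = sims$draw, mu = mu_sims)
head(means, 3)
```

With 1000 simulated columns the distortion from the extra column is small, but excluding it keeps the line an exact mean of the simulated paths.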
As described here
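# Summarise y at each x with a single-value function such as mean.
# group = 1 pools all simulated lines; `fun` needs ggplot2 >= 3.3.0 (older versions use fun.y).
stat_sum_single <- function(fun, geom = "line", ...) {
  stat_summary(mapping = aes(group = 1), fun = fun, colour = "black", geom = geom, ...)
}
k <- ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none")
k + stat_sum_single(mean) # gives you the required plot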
I am trying to create a Cleveland dot plot for two categories, in this case J and K. The problem is that the elements A, B, C are in both categories, so R keeps farting. I have made a simple example:
x <- c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17])
type <- c(rep("J",10),rep("K",10))
y <- rnorm(n=20,10,2)
data <- data.frame(x,y,type)
data
data$type <- as.factor(data$type)
nameorder <- data$x[order(data$type,data$y)]
data$x <- factor(data$x,levels=nameorder)
ggplot(data, aes(x=y, y=x)) +
geom_segment(aes(yend=x), xend=0, colour="grey50") +
geom_point(size=3, aes(colour=type)) +
scale_colour_brewer(palette="Set1", limits=c("J","K"), guide=FALSE) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
Ideally, I would want a dot plot for both categories(J,K) individually with each factor(vector x) decreasing with respect to the y vector. What ends up happening is that both categories aren't going from biggest to smallest and are erratic at the end instead. Please help!
Unfortunately, factors can only have one set of levels. The only way I've found to do this is to create two separate data.frames from your data and re-level the factor in each. For example:
data <- data.frame(
x = c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17]),
y = rnorm(n=20,10,2),
type= c(rep("J",10),rep("K",10))
)
data$type <- as.factor(data$type)
J<-subset(data, type=="J")
J$x <- reorder(J$x, J$y, max)
K<-subset(data, type=="K")
K$x <- reorder(K$x, K$y, max)
Now we can plot them with
ggplot(mapping = aes(x=y, y=x, xend=0, yend=x)) +
geom_segment(data=J, colour="grey50") +
geom_point(data=J, size=3, aes(colour=type)) +
geom_segment(data=K, colour="grey50") +
geom_point(data=K, size=3, aes(colour=type)) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
which results in
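To confirm that each subset really gets its own level order, the levels can be inspected after the reorder() calls. This is a small sketch with made-up values (the data frame d and its numbers are hypothetical, chosen so the orderings are easy to read off):

```r
# Made-up data: A and B appear in both types, with different y values in each
d <- data.frame(
  x    = c("A", "B", "C", "A", "B", "D"),
  y    = c(3, 1, 2, 1, 5, 4),
  type = rep(c("J", "K"), each = 3),
  stringsAsFactors = FALSE
)

J <- subset(d, type == "J")
J$x <- reorder(J$x, J$y, max)  # levels sorted by y within J
K <- subset(d, type == "K")
K$x <- reorder(K$x, K$y, max)  # levels sorted by y within K

levels(J$x)
# "B" "C" "A"
levels(K$x)
# "A" "D" "B"
```

A sits last in J but first in K, which a single shared factor could not express.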
I don't know what I am missing in this code:
set.seed(12345)
require(ggplot2)
AData <- data.frame(Glabel=LETTERS[1:7], A=rnorm(7, mean = 0, sd = 1), B=rnorm(7, mean = 0, sd = 1))
TData <- data.frame(Tlabel=LETTERS[11:20], A=rnorm(10, mean = 0, sd = 1), B=rnorm(10, mean = 0, sd = 1))
i <- 2
j <- 3
p <- ggplot(data=AData, aes(AData[, i], AData[, j])) + geom_point() + theme_bw()
p <- p + geom_text(aes(data=AData, label=Glabel), size=3, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes(xend = TData[ ,i], yend=TData[ ,j]),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
p <- p + geom_text(data=TData, aes(label=Tlabel), size=3, vjust=1.35, colour="black")
The last line of the code produces the error. Please point out how I can fix this problem. Thanks in advance.
I have no idea what you are trying to do, but the line that fails is the last line, because you haven't mapped new x and y variables in the mapping. geom_text() needs x and y coords but you only provide the label argument, so ggplot takes x and y from p, which has only 7 rows of data whilst Tlabel is of length 10. That explains the error. I presume you mean to plot at x = A and y = B of TData? If so, this works:
p + geom_text(data=TData, mapping = aes(A, B, label=Tlabel),
size=3, vjust=1.35, colour="black")
(This might get a better answer on the ggplot mailing list.)
It looks like you're trying to display some kind of biplot ... the root of your problem is that you're violating the idiom of ggplot, which wants you to specify variables in a way that's consistent with the scope of the data.
Maybe this does what you want, via some aes_string trickery that substitutes the names of the desired columns ...
varnames <- colnames(AData)[-1]
v1 <- varnames[1]
v2 <- varnames[2]
p <- ggplot(data=AData,
aes_string(x=v1, y=v2)) + geom_point() + theme_bw()
## took out redundant 'data', made size bigger so I could see the labels
p <- p + geom_text(aes(label=Glabel), size=7, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes_string(xend = v1, yend=v2),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
## added colour so I could distinguish this second set of labels
p <- p + geom_text(data=TData,
aes(label=Tlabel), size=10, vjust=1.35, colour="blue")