Adding text outside plot doesn't work in R

I have a simple dataset:
11 observations, 1 variable.
I want to plot them with my own axis labels, but when I try to change the position of the labels, R keeps plotting them in exactly the same spot.
Here is my script:
plot(data[,5], xlab = "", xaxt='n')
axis(1, at = 1:11, labels = F)
text(1:11, par("usr")[3] - 0.1, srt = 90, adj = 1, labels = names, xpd = TRUE)
I have tried changing the -0.1 to other numbers, but R keeps placing the labels in exactly the same spot. I also tried short names like "a", but the result is the same.
Thanks in advance
My data:
10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9
names <- rep("name",11)
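For the base R code above, one likely reason the labels never seem to move is that the offset in par("usr")[3] - 0.1 is measured in data (user) units. With values between roughly 4,000 and 12,700, a shift of 0.1 is far too small to see. A minimal sketch, assuming the 11 values listed above and placeholder labels name01 to name11, that scales the offset to the visible y range and widens the bottom margin so the rotated labels are not clipped:

```r
# The 11 values from the question
values <- c(10308.9, 10201.6, 12685.3, 3957.93, 7677.1, 9671.7,
            11849.4, 10755.7, 11283.4, 11583.8, 12066.9)
labs <- sprintf("name%02d", 1:11)   # placeholder labels

op <- par(mar = c(7, 4, 2, 1))      # taller bottom margin for rotated labels
plot(values, xlab = "", xaxt = "n")
axis(1, at = 1:11, labels = FALSE)

usr <- par("usr")                   # c(x1, x2, y1, y2) in data units
offset <- 0.03 * diff(usr[3:4])     # 3% of the visible y range, not a fixed 0.1
text(1:11, usr[3] - offset, labels = labs,
     srt = 90, adj = 1, xpd = TRUE)
par(op)                             # restore the original margins
```

If you would rather position the labels in margin lines than in data units, mtext(labs, side = 1, at = 1:11, line = 1, las = 2) is another option; its line argument does not depend on the data scale.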

My ggplot solution:
# creating the sample dataframe
data <- read.table(text="10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9", header=FALSE)
# adding a names column
data$names <- as.factor(paste0("name",sprintf("%02.0f", seq(1,11,1))))
#creating the plot
require(ggplot2)
ggplot(data, aes(x=names, y=V1)) +
geom_bar(stat="identity", fill = "white", color = "black")
which gives:
When you want to change the order of the bars, you can do that with transform:
# transforming the data (I placed "name04" as the first one)
data2 <- transform(data,
newnames=factor(names,
levels=c("name04","name01","name02","name03","name05","name06","name07","name08","name09","name10","name11"),
ordered =TRUE))
#creating the plot
ggplot(data2, aes(x=newnames, y=V1)) +
geom_bar(stat="identity", fill="white", color="black")
which gives:
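If you would rather order the bars by their values than list the levels by hand, reorder() can build the factor levels for you. A minimal sketch, assuming the same 11 values and name01 to name11 labels as above (the column name by_value is just illustrative):

```r
library(ggplot2)

data <- data.frame(V1 = c(10308.9, 10201.6, 12685.3, 3957.93, 7677.1, 9671.7,
                          11849.4, 10755.7, 11283.4, 11583.8, 12066.9))
data$names <- factor(sprintf("name%02d", 1:11))

# reorder() sorts the levels of names by the mean of V1 within each level;
# with one value per level, the bars run from smallest to largest.
data$by_value <- reorder(data$names, data$V1)

p <- ggplot(data, aes(x = by_value, y = V1)) +
  geom_col(fill = "white", color = "black")
p
```

For a descending order, reorder(data$names, -data$V1) reverses the sort.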


How to align multiple legends and avoid overlapping in ggplot?

I am trying to create a plot that combines 2 separate legends and a grid of multiple plots. The issue I'm having is that I'm finding it difficult to align the legends so that they are visible and not overlapping. Hopefully the example below will explain what I mean.
To begin I am going to create 2 plots. In these two plots I am only interested in the legends, and I am discarding the actual plot (so please ignore the actual plots in these two plots). To get just the legend I am using the cowplot package.
library(ggplot2)
library(cowplot)
# -------------------------------------------------------------------------
# plot 1 ------------------------------------------------------------------
# create fake data
dfLegend_1 <- data.frame(x = LETTERS[1:10], y = c(1:10))
# set colours
pointColours <- c(A = "#F5736A", B = "#D58D00", C = "#A0A300",
D = "#36B300", E = "#00BC7B", F = "#00BCC2",
G = "#00ADF4", H = "#928DFF", I = "#E568F0",
J = "#808080")
# plot
ggLegend_1 <- ggplot(dfLegend_1, aes(x=x, y=y))+
geom_point(aes(fill = pointColours), shape = 22, size = 10) +
scale_fill_manual(values = unname(pointColours),
label = names(pointColours),
name = 'Variable') +
theme(legend.key.size = unit(0.5, "cm")) +
theme_void()
# get legend
legend_1 <- get_legend(ggLegend_1)
# -------------------------------------------------------------------------
# plot 2 ------------------------------------------------------------------
# Create fake data
dflegend_2 <- data.frame(
x = runif(100),
y = runif(100),
z2 = abs(rnorm(100))
)
# plot
ggLegend_2 <- ggplot(dflegend_2, aes(x=x, y = y))+
geom_point(aes(color = z2), shape = 22, size = 10) +
scale_color_gradientn(
colours = rev(colorRampPalette(c('steelblue', '#f7fcfd', 'orange'))(5)),
limits = c(0,10),
name = 'Gradient',
guide = guide_colorbar(
frame.colour = "black",
ticks.colour = "black"
))
# get legend
legend_2 <- get_legend(ggLegend_2)
Then I am creating many plots (in this example, I am creating 20 individual plots) and plotting them on a grid:
# create data
dfGrid <- data.frame(x = rnorm(10), y = rnorm(10))
# make a list of plots
plotList <- list()
for(i in 1:20){
plotList[[i]] <- ggplot(dfGrid) +
geom_ribbon(aes(x = x, ymin = min(y), ymax = 0), fill = "red", alpha = .5) +
geom_ribbon(aes(x = x, ymin = min(0), ymax = max(y)), fill = "blue", alpha = .5) +
theme_void()
}
# plot them on a grid
gridFinal <- cowplot::plot_grid(plotlist = plotList)
Finally, I am joining the two legends together and adding them to the grid of many plots:
# add the legends together into one single plot
legendFinal <- plot_grid(legend_2, legend_1, ncol = 1)
# plot everything on the same plot
plot_grid(gridFinal, legendFinal, rel_widths = c(3, 1))
This results in something that looks like this:
As you can see, the legends overlap and are not very well spaced. I was wondering if there is any way to fit everything in whilst having the legends appropriately spaced and readable?
I should also note that, in general, there can be any number of variables and any number of gridded plots.
One option to fix your issue would be to switch to patchwork to glue your plots and the legends together. In particular, I use the design argument to assign more space to the Variable legend. However, you should be aware that legends are much less flexible than plots: the size of a legend is set in absolute units and will not adjust to the available space. Hence, I'm not sure whether my solution will fit your desire for a "one-size-fits-all" approach.
library(patchwork)
design <-
"
ABCDEU
FGHIJV
KLMNOV
PQRSTV
"
plotList2 <- c(plotList, list(legend_2, legend_1))
wrap_plots(plotList2) +
plot_layout(design = design)
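Because the layout has to cope with any number of gridded plots, the design string can be generated rather than typed. A minimal sketch of a hypothetical helper, make_design(), assuming single capital letters for the areas (so at most 24 plots plus the two legends) and assuming that patchwork assigns areas to plots in alphabetical order of their letters, which matches the order of plotList2:

```r
# Hypothetical helper: patchwork design string for n plots in `ncol` columns,
# plus one extra column for two legends. The first legend takes the top row,
# the second spans all remaining rows. Empty plot cells are marked "#".
make_design <- function(n, ncol) {
  stopifnot(n + 2 <= 26)                  # one capital letter per area
  nrow <- ceiling(n / ncol)
  if (nrow < 2) stop("need at least two rows for the two-legend column")
  cells <- c(LETTERS[seq_len(n)], rep("#", nrow * ncol - n))
  grid <- matrix(cells, nrow = nrow, ncol = ncol, byrow = TRUE)
  legend_col <- c(LETTERS[n + 1], rep(LETTERS[n + 2], nrow - 1))
  rows <- apply(cbind(grid, legend_col), 1, paste, collapse = "")
  paste(rows, collapse = "\n")
}

cat(make_design(20, 5), "\n")
# ABCDEU
# FGHIJV
# KLMNOV
# PQRSTV
```

For the 20 plots above, make_design(length(plotList), 5) reproduces the hand-written design and can be passed to plot_layout(design = ...) in its place. The legend sizes are still absolute, so very small grids may still need manual tuning.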

R colour code plot by rownames for principal component analysis

I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assuming a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do however is colour code my data points based on the city, which is the row names of my dataset. I also want to use these cities (i.e. rownames) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
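Attempt 2 in the question produced no colours because as.numeric() turns character strings such as "New York" into NA. Converting the row names to a factor gives integer codes that base graphics can use as colour indices, and the factor levels can feed a legend. A minimal sketch, reusing the example dataset from the question:

```r
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c('New York', 'Seattle', 'Washington DC', 'Dallas',
                        'Chicago', 'Los Angeles', 'Minneapolis')
pca <- prcomp(dataset, scale = TRUE)

# A factor stores each city as an integer code (levels sorted alphabetically),
# unlike as.numeric() on the raw strings, which returns NA.
city <- factor(rownames(dataset))
cols <- as.integer(city)   # colour index into the default palette

plot(pca$x[, 1], pca$x[, 2], col = cols, pch = 19,
     xlab = "First PC", ylab = "Second PC")
text(pca$x[, 1], pca$x[, 2], labels = rownames(dataset), cex = 0.7, pos = 3)
legend("bottomright", legend = levels(city),
       col = seq_along(levels(city)), pch = 19, cex = 0.7)
```

With more cities than the default palette has colours, a longer palette such as hcl.colors(nlevels(city)) indexed by cols would be needed.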
Personally, I think the resulting aesthetics are nicer (and easier to adapt to your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from Excel) with ranges on the y-axis (also calculated in Excel) and cell counts on the x-axis, and I would like to draw a horizontal line at a specific value within the ranges, like a reference line. I tried using geom_hline(yintercept = 450), but I suspect that is naive and does not work that way for a value within a range. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
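If the reference value will not always fall exactly on a bin boundary, the discrete position can be computed instead of worked out by eye. Bin k is drawn centred at position k and covers values from start + (k - 1) * width up to start + k * width, so a value v sits at (v - start) / width + 0.5 on the axis. A minimal sketch, assuming equal-width bins of 50 starting at 0 as in the example data frame; value_to_pos() is a hypothetical helper, not part of ggplot2:

```r
library(ggplot2)

set.seed(1)
d <- data.frame(CTRL = sample(100, 10),
                Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
d$Bin.range <- factor(d$Bin.range, d$Bin.range)  # keep numeric order

# Hypothetical helper: map a numeric value onto the discrete bin axis,
# assuming equal-width bins that start at `start`.
value_to_pos <- function(value, start = 0, width = 50) {
  (value - start) / width + 0.5
}

g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
g + geom_hline(yintercept = value_to_pos(450), linetype = 2)  # line at 9.5
```

For example, value_to_pos(475) gives 10, the centre of the 450-499.9 bar. The linear mapping only holds while all bins have the same width.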

Stagger axis labels, new feature in ggplot2

Hi there: I need to plot a factor with 81 different categories, each with a different frequency count. Each factor name is a 4-letter category. It looks like this. As you can see, it is pretty tough to read the factor labels. I'd like to stagger the y-axis labels according to this suggestion. However, this issue on GitHub suggests that something has changed in ggplot2 and that the hjust and vjust options no longer work. Does anyone have any suggestions to make this plot look better, in particular to make the factor levels readable?
#libraries
# install.packages('stringi')
library(ggplot2)
library(stringi)
#fake data
var<-stri_rand_strings(81, 4, pattern='[HrhEgeIdiFtf]')
var1<-rnorm(81, mean=175, sd=75)
#data frame
out<-data.frame(var, var1)
#set levels for plotting
out$var<-factor(out$var, levels=out$var[order(out$var1, decreasing=FALSE)])
#PLot
out.plot<-ggplot(out, aes(x=var, y=var1))+
geom_point()+coord_flip()
#Add staggered axis option
out.plot+theme(axis.text.y = element_text(hjust = grid::unit(c(-2, 0, 2), "points")))
To stagger the labels, you could add spaces to the labels in the dataframe.
# Libraries
library(ggplot2)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Add spacing, and set levels for plotting
out = out[order(out$var1), ]
out$var = paste0(out$var, c("", "    ", "        "))
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
out.plot
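Since version 3.3.0, ggplot2 can also stagger axis labels itself through guide_axis(n.dodge = ...), which avoids padding the labels. A minimal sketch, assuming ggplot2 >= 3.3.0; it swaps x and y instead of using coord_flip(), and it uses placeholder labels from sprintf() rather than stringi so that the levels are guaranteed to be unique:

```r
library(ggplot2)

set.seed(12345)
out <- data.frame(var = sprintf("v%02d", 1:81),
                  var1 = rnorm(81, mean = 175, sd = 75))
out$var <- factor(out$var, levels = out$var[order(out$var1)])

# Put the categories on y directly and let the axis guide dodge the
# labels over three staggered positions.
out.plot <- ggplot(out, aes(x = var1, y = var)) +
  geom_point() +
  scale_y_discrete(guide = guide_axis(n.dodge = 3))
out.plot
```

Unlike the padding approach, the labels themselves are left untouched, so they still match the data values exactly.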
Alternatively, draw the original plot, then edit it. Here, I use the grid function editGrob() to do the editing.
# Libraries
library(ggplot2)
library(gtable)
library(grid)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Set levels for plotting
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
# Get the ggplot grob
g = ggplotGrob(out.plot)
# Get a hierarchical list of component grobs
grid.ls(grid.force(g))
Look through the list to find the section referring to the left axis. The relevant bit is:
axis-l.6-3-6-3
  axis.line.y..zeroGrob.232
  axis
    axis.1-1-1-1
      GRID.text.229
    axis.1-2-1-2
You will need to set up a path from 'axis-l', through 'axis', through 'axis', and on to 'GRID.text'.
# make the relevant column a little wider
g$widths[3] = unit(2.5, "cm")
# The edit
g = editGrob(grid.force(g),
gPath("axis-l", "axis", "axis", "GRID.text"),
x = unit(c(-1, 0, 1), "npc"),
grep = TRUE)
# Draw the plot
grid.newpage()
grid.draw(g)
Another option is to find your way through the structure to the relevant grob to make the edit.
# Get the grob
g <- ggplotGrob(out.plot)
# Get the y axis
index <- which(g$layout$name == "axis-l") # Which grob
yaxis <- g$grobs[[index]]
# Get the ticks (labels and marks)
ticks <- yaxis$children[[2]]
# Get the labels
ticksL <- ticks$grobs[[1]]
# Make the edit
ticksL$children[[1]]$x <- rep(unit.c(unit(c(1,0,-1),"npc")), 27)
# Put the edited labels back into the plot
ticks$grobs[[1]] <- ticksL
yaxis$children[[2]] <- ticks
g$grobs[[index]] <- yaxis
# Make the relevant column a little wider
g$widths[3] <- unit(2.5, "cm")
# Draw the plot
grid.newpage()
grid.draw(g)
Sandy mentions adding spaces to the labels.
With a discrete axis, you can also simply add line breaks to alternate cases. In my case I wanted to stagger alternate ones:
scale_x_discrete(labels=paste0(c("","\n"),net_change$TZ_t))
Where net_change$TZ_t is my ordered factor. It extends to 'triple' levels easily with c("","\n","\n\n").
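Because paste0() on a data column follows row order, the staggering only lines up with the axis when there is exactly one row per level and the rows are already in level order. Passing a function to labels avoids that, since ggplot2 calls it with the axis breaks. A minimal sketch with made-up data; stagger() and the zone labels are illustrative only:

```r
library(ggplot2)

# Prefix every second label with a newline so alternate labels sit one line
# lower; use c("", "\n", "\n\n") for three levels of staggering.
stagger <- function(x) paste0(c("", "\n"), x)

df <- data.frame(zone = factor(sprintf("zone%02d", 1:20)), n = 1:20)

p <- ggplot(df, aes(x = zone, y = n)) +
  geom_col() +
  scale_x_discrete(labels = stagger)
p
```

The newline trick suits a horizontal axis; for labels on a vertical axis, as in the question, padding with spaces or guide_axis(n.dodge = ...) is the closer equivalent.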

ggplot plot axis ticks and labels separately

I am looking for a way to create ticks and labels in different positions on a ggplot.
Sample code
#load libraries
library(ggplot2)
library(reshape2)
#create data
df <-data.frame(A=1:6,B=c(0.6,0.5,0.4,0.2,0.3,0.8),C=c(0.4,0.5,0.6,0.8,0.7,0.2),D=c("cat1","cat1","cat1","cat2","cat2","cat2"))
df
df1 <- melt(df,measure.vars=c("B","C"))
#plot
p <- ggplot()+
geom_bar(data=df1,aes(x=A,y=value,fill=variable),stat="identity")+
theme(axis.title=element_blank(),legend.position="none")
print(p)
In this figure, the default has the ticks and labels at the same positions (defined by breaks). And the x-axis line is missing altogether due to the theme.
Instead, I would like to have ticks at these positions
tpoint <- c(1,3,4,6)
and labels at these positions
lpoint <- data.frame(pos=c(2,5),lab=c("cat1","cat2"))
Eventually, I would like a figure something like the one shown below, with either a partial or a full x-axis line:
This puts my labels in place
p1 <- p + scale_x_discrete(breaks=lpoint$pos,labels=lpoint$lab)
But the ticks are in the wrong place and multiple scales are not possible?
The closest I could come to your desired output is this:
dfannotate <- data.frame(x = c(2, 5), xmin = c(1, 4), xmax = c(3, 6), y = -.01, height=.02)
dfbreaks = data.frame(lim = 1:6, lab = c('', 'cat1', '', '', 'cat2', ''))
p + geom_errorbarh(data = dfannotate, aes(x, y, xmin=xmin, xmax=xmax, height=height)) +
scale_x_discrete(limits=dfbreaks$lim, labels=dfbreaks$lab) +
scale_y_continuous(expand = c(0, 0), limits=c(-0.02, 1.02)) +
theme(axis.ticks.x = element_line(linetype=0))
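Another route, which keeps x continuous instead of relying on scale_x_discrete() with a numeric column, is to put the labels on a continuous scale and draw the ticks and axis line by hand with annotate(). A minimal sketch, assuming the tpoint and lpoint positions from the question; the long-format data is built by hand, with the columns the plot uses, so the example does not need reshape2:

```r
library(ggplot2)

# Long-format version of the question's data (columns A, variable, value)
df1 <- data.frame(
  A = rep(1:6, times = 2),
  variable = rep(c("B", "C"), each = 6),
  value = c(0.6, 0.5, 0.4, 0.2, 0.3, 0.8,
            0.4, 0.5, 0.6, 0.8, 0.7, 0.2)
)
tpoint <- c(1, 3, 4, 6)
lpoint <- data.frame(pos = c(2, 5), lab = c("cat1", "cat2"))

p2 <- ggplot(df1, aes(x = A, y = value, fill = variable)) +
  geom_col() +
  # labels only at the label positions
  scale_x_continuous(breaks = lpoint$pos, labels = lpoint$lab) +
  # hand-drawn ticks at the tick positions, just below y = 0
  annotate("segment", x = tpoint, xend = tpoint, y = 0, yend = -0.03) +
  # partial axis line from the first to the last tick
  annotate("segment", x = min(tpoint), xend = max(tpoint), y = 0, yend = 0) +
  theme(axis.title = element_blank(), legend.position = "none",
        axis.ticks.x = element_blank())
p2
```

For a full-width axis line, geom_hline(yintercept = 0) could replace the second annotate() call.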
