R: Connecting Points in Arbitrary Order [duplicate] - r

This question already has answers here:
Can I input my own order for a ggplot path?
(3 answers)
Closed 11 months ago.
I am working with the R programming language.
I generated the following random data set in R and made a plot of these points:
library(ggplot2)
set.seed(123)
x_cor = rnorm(5,100,100)
y_cor = rnorm(5,100,100)
my_data = data.frame(x_cor,y_cor)
x_cor y_cor
1 43.95244 271.50650
2 76.98225 146.09162
3 255.87083 -26.50612
4 107.05084 31.31471
5 112.92877 55.43380
ggplot(my_data, aes(x=x_cor, y=y_cor)) + geom_point() + ggtitle("Travelling Salesman Example")
Suppose I want to connect these dots together in the following order: 1 with 3, 3 with 4, 4 with 5, 5 with 2, 2 with 1
I can make a new variable that contains this ordering:
my_data$order = c(3, 1, 4, 5, 2)
Is it possible to make this kind of graph using ggplot2?
I tried the following code - but this connects the points based on the order they appear in, and not the custom ordering:
ggplot(my_data, aes(x = x_cor, y = y_cor)) +
geom_path() +
geom_point(size = 2)
I could probably manually re-shuffle the dataset to match this ordering - but is there an easier way to do this?
In the past, I have made these kind of graphs using "igraph" - but is it possible to make them with ggplot2?
Can someone please show me how to do this?
Thanks!

You can order your data like so:
my_data$order = c(1, 5, 2, 3, 4)
ggplot(my_data[order(my_data$order),], aes(x = x_cor, y = y_cor)) +
geom_path() +
geom_point(size = 2)
If you want to close the path, use geom_polygon:
ggplot(my_data[order(my_data$order),], aes(x = x_cor, y = y_cor)) +
geom_polygon(fill = NA, color = "black") +
geom_point(size = 2)
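If it is easier to write down the visiting sequence than the rank of each row, order() can convert one into the other. A minimal sketch, assuming the tour from the question (1, 3, 4, 5, 2); the names visit and path_data are introduced here for illustration and are not part of the original answer:

```r
# Data from the question
set.seed(123)
x_cor <- rnorm(5, 100, 100)
y_cor <- rnorm(5, 100, 100)
my_data <- data.frame(x_cor, y_cor)

# Hypothetical helper vector: row numbers in the order they should be visited
visit <- c(1, 3, 4, 5, 2)

# order() of a permutation gives its inverse: the rank of each row in the tour
my_data$order <- order(visit)

# Equivalent shortcut: index the rows by the visiting sequence directly
path_data <- my_data[visit, ]

# Plot only if ggplot2 is available; call print(p) to draw the figure
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(path_data, aes(x = x_cor, y = y_cor)) +
    geom_path() +
    geom_point(size = 2)
}
```

Here order(visit) reproduces the c(1, 5, 2, 3, 4) used above, so both routes feed the same row order to geom_path().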

Related

adding a line to a ggplot boxplot

I'm struggling with ggplot2 and I've been looking for a solution online for several hours. Maybe one of you can give me a help? I have a data set that looks like this (several 100's of observations):
Y-AXIS     X-AXIS  SUBJECT
2.2796598  F1      1
0.9118639  F1      2
2.7111228  F3      3
2.7111228  F2      4
2.2796598  F4      5
2.3876401  F10     6
...        ...     ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
It is always a good idea to add a reproducible example of your data;
you can always simulate what you need:
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
In ggplot2, each geom can take its own data argument; for your line, just use
a subset of your original data, limited to the desired subject.
Colors and other visual elements of the line are simple to set as arguments to geom_line():
ggplot() +
geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
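One thing to watch with labels like F1 to F10: as a default factor they sort alphabetically (F1, F10, F2, ...), which also decides the left-to-right order in which the line is drawn. A small sketch, assuming simulated data shaped like the first answer's, that fixes the level order before subsetting one subject (default_levels and one_subject are names introduced here):

```r
set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis   = rep(paste0("F", 1:10), times = 10),
  yaxis   = runif(100, 0, 100)
)

# Default factor levels are alphabetical: "F1" "F10" "F2" ...
default_levels <- levels(factor(simulated_data$xaxis))

# Set the levels explicitly so the axis runs F1, F2, ..., F10
simulated_data$xaxis <- factor(simulated_data$xaxis,
                               levels = paste0("F", 1:10))

# The ten observations of the subject to highlight
one_subject <- simulated_data[simulated_data$subject == 1, ]
```

one_subject can then be passed as the data argument of geom_line(), together with group = 1, as in the answers above.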

ordering facet_wrap levels by frequency

I'm trying to do a waffle chart for the championships won by F1 drivers so far. The chart comes out good but it comes out with alphabetical labels. I want it to start from the most titles won to the least.
I've tried ordering and fct_relevel. But nothing works. Below is the code
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7)+
facet_wrap(~Champions, nrow = 3, strip.position = "bottom",labeller = label_wrap_gen(6))
And this is the result I'm looking for.
You can find the entire code here.
The dataset looks like
Season Champions Team one
1 a x 1
2 a x 1
3 b y 1
4 a x 1
5 c z 1
Here's a solution using forcats (also part of the tidyverse package).
fct_infreq() orders factors according to their frequency in the data, and you can use that to specify the ordering of the levels in your data.
dfc$Champions <- factor(dfc$Champions, levels=levels(fct_infreq(dfc$Champions)))
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7) +
facet_wrap(~Champions, nrow = 3, strip.position = "bottom", labeller = label_wrap_gen(6))
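For reference, the same ordering can be had without forcats. A base-R sketch on a made-up data frame (the values below are placeholders, not the real championship data): table() counts each champion, sort() ranks the counts, and the resulting names become the factor levels that facet_wrap() follows.

```r
# Placeholder data in the shape shown in the question
dfc <- data.frame(
  Season    = 1:6,
  Champions = c("b", "a", "a", "c", "a", "c"),
  one       = 1
)

# Count titles per champion, most frequent first
title_counts <- sort(table(dfc$Champions), decreasing = TRUE)

# Use that order as the factor levels; facets are laid out in level order
dfc$Champions <- factor(dfc$Champions, levels = names(title_counts))

levels(dfc$Champions)
```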

Is there a way to create a ggplot facetted scatterplot with n colours, such that the colours alternate or are randomised from the palette?

I've got 40 subjects in my dataset, 4 in each group and I want to create a plot that shows a line for each subject replicate (3 replicates); colouring them by subject, shaping by replicate. The problem I have is that the colours are so similar in each facet (group) that I can't really tell them apart.
My main body of code for the plot is:
ggplot(T_S, aes(x=as.numeric(Day), y=variable, colour=as.factor(Subj))) +
geom_point(aes(shape=as.factor(Rep))) +
geom_line(aes(linetype=as.factor(Rep))) +
facet_wrap(.~Group,ncol=3) +
theme_bw() +
theme(legend.position="none")
And an example of what I mean by not being able to distinguish between colours is below using the viridis package. Is there a way to get the colours to alternate between the dark purple and yellow within each facet?
[Example with the Viridis Package][1]
Other things I've tried:
scale_color_brewer(palette="Dark2")
scale_fill_manual(values = wes_palette("GrandBudapest1", n = 38))
scale_color_gradientn(colours = rainbow(40))
I also looked into the PolyChrome and randomcoloR packages, but can't see how they work with ggplot2. Any other suggestions also welcome!
Thanks in advance for your help.
[1]: https://i.stack.imgur.com/2iHXY.png
It's difficult for anyone to give you a tested, working solution on Stack Overflow if you don't include a sample of your data in the question. However, after a bit of reverse-engineering I have created what should be a reasonable approximation of your data structure:
set.seed(69)
T_S <- data.frame(Day = rep(c(4, 11, 16, 25), 120),
Subj = rep(1:40, each = 12),
Rep = rep(rep(1:3, each = 4), 40),
Group = factor(rep(1:10, each = 48), c(2:10, 1)),
variable = c(replicate(120, cumsum(runif(4, 0, 400)) + 250)))
And we can see that with your plotting code, we get similar results:
library(ggplot2)
ggplot(T_S, aes(x=as.numeric(Day), y=variable, colour=as.factor(Subj))) +
geom_point(aes(shape=as.factor(Rep))) +
geom_line(aes(linetype=as.factor(Rep))) +
facet_wrap(.~Group,ncol=3) +
theme_bw() +
theme(legend.position="none") +
scale_colour_viridis_d()
The reason for this is that your subject numbers are consecutive and correlate completely with the group. For the purposes of plotting, we want just four colors - one for each subject in each panel. The easiest way to achieve this is to renumber the subjects to 1:4 within each panel.
T_S$Subj <- T_S$Subj %% 4 + 1
So now the exact same plotting code gives us:
ggplot(T_S, aes(x=as.numeric(Day), y=variable, colour=as.factor(Subj))) +
geom_point(aes(shape=as.factor(Rep))) +
geom_line(aes(linetype=as.factor(Rep))) +
facet_wrap(.~Group,ncol=3) +
theme_bw() +
theme(legend.position="none") +
scale_colour_viridis_d()
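The modulo trick relies on subject IDs being consecutive, with exactly four per group. A slightly more general sketch, assuming only that each subject belongs to one group, numbers subjects by order of first appearance within their group using base R's ave(); the column name SubjInGroup is introduced here for illustration:

```r
# Only the two columns needed for the renumbering
T_S <- data.frame(
  Subj  = rep(1:40, each = 12),
  Group = factor(rep(1:10, each = 48), c(2:10, 1))
)

# Within each group, replace the subject ID by its position among the
# group's distinct subjects (1, 2, 3, 4 here)
T_S$SubjInGroup <- ave(T_S$Subj, T_S$Group,
                       FUN = function(s) match(s, unique(s)))

# Cross-tabulate to check that every group uses the labels 1 to 4
table(T_S$Group, T_S$SubjInGroup)
```

Mapping colour = as.factor(SubjInGroup) in the plotting code then gives four well-separated colours per panel.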
It appears that it can be done through the package "RColorBrewer", by the combination of colours through different palettes.
Using the simulated data from Allan Cameron:
require(ggplot2)
require(RColorBrewer)
set.seed(69)
T_S <- data.frame(Day = rep(c(4, 11, 16, 25), 120),
Subj = rep(1:40, each = 12),
Rep = rep(rep(1:3, each = 4), 40),
Group = factor(rep(1:10, each = 48), c(2:10, 1)),
variable = c(replicate(120, cumsum(runif(4, 0, 400)) + 250)))
Now we can combine the colours from different palettes:
mycolours = c(brewer.pal(name="Accent", n = 7),
brewer.pal(name="Dark2", n = 7),
brewer.pal(name="Paired", n = 7),
brewer.pal(name="Set3", n = 7),
brewer.pal(name="Dark2", n = 7),
brewer.pal(name="Set1", n = 7))
The palettes don't really matter in this instance, but I tried to choose them so they stood out. There is great documentation with full details on the colour palettes here: https://www.datanovia.com/en/blog/the-a-z-of-rcolorbrewer-palette/. More or fewer colours can be chosen as well; in this instance there are 42 entries (note that "Dark2" is listed twice, so its seven colours repeat). Problems may occur if you need, say, 1000 colours, as there aren't that many in this package.
Finally the code to plot the data:
ggplot(T_S, aes(x=as.numeric(Day), y= variable, colour = as.factor(Subj))) +
geom_point(aes(shape = as.factor(Rep))) +
geom_line(aes(linetype = as.factor(Rep))) +
facet_wrap(.~Group, ncol = 3) +
theme_bw() +
theme(legend.position="none") +
scale_color_manual(values = mycolours)
https://imgur.com/fGsXy4k
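If each subject should keep its own colour but neighbours within a facet need to be far apart on the palette, one option is to interleave a sequential palette. A sketch, assuming 40 subjects numbered so that each consecutive block of four shares a facet, and using base R's hcl.colors() (R 3.6 or later) as a stand-in for the viridis package:

```r
n_subjects <- 40
per_facet  <- 4

# 40 colours running from dark purple to yellow
pal <- hcl.colors(n_subjects, palette = "viridis")

# Fill a 10 x 4 matrix column-wise, then read it row-wise:
# 1, 11, 21, 31, 2, 12, 22, 32, ...
idx <- as.vector(t(matrix(seq_len(n_subjects), ncol = per_facet)))

# Subjects 1-4 now get colours 1, 11, 21 and 31 of the palette, and so on
mycolours <- pal[idx]
```

Passing this vector to scale_color_manual(values = mycolours) would apply it in the plotting code above.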

Increase the size of variable-size points in ggplot2 scatter plot [duplicate]

This question already has an answer here:
defining minimum point size in ggplot2 - geom_point
(1 answer)
Closed 9 years ago.
I am plotting a scatter plot where each point has a different size corresponding to the number of observations. Below is the example of code and the image output:
rm(list = ls())
require(ggplot2)
mydf <- data.frame(x = c(1, 2, 3),
y = c(1, 2, 3),
count = c(10, 20, 30))
ggplot(mydf, aes(x = x, y = y)) + geom_point(aes(size = count))
ggsave(file = '2013-11-25.png', height = 5, width = 5)
This is quite nice, but is there a way to increase the sizes of all of the points? In particular, as it currently is, the point for "10" is too small and thus very hard to see.
Use:
<your ggplot code> + scale_size_continuous(range = c(minSize, maxSize))
where minSize is your minimum point size and maxSize is your maximum point size.
Example:
ggplot(mydf, aes(x = x, y = y)) +
geom_point(aes(size = count)) +
scale_size_continuous(range = c(3, 7))
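To confirm what range does, the sizes ggplot2 actually computes can be inspected with layer_data(). A sketch, assuming ggplot2 is installed; the smallest and largest counts should map to the two ends of the requested range:

```r
mydf <- data.frame(x = c(1, 2, 3),
                   y = c(1, 2, 3),
                   count = c(10, 20, 30))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mydf, aes(x = x, y = y)) +
    geom_point(aes(size = count)) +
    scale_size_continuous(range = c(3, 7))

  # Point sizes after scaling, one per row of mydf
  computed_sizes <- layer_data(p)$size
  print(range(computed_sizes))
}
```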

Adding xbreaks to ggplot

Hello, I have generated the following plot (image not shown) using the following code:
library(ggplot2,quietly=TRUE)
args <- commandArgs(TRUE)
data <-read.table(args[1],header=T,sep="\t")
pdf(args[2])
p1 <- ggplot( data,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std, ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylab(expression(Delta*"Energy (Design - Wild Type)" )) +
xlab( "Mutation" ) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
ggtitle(expression("PG9 Mutational Analysis"))
p1
dev.off()
As you can see I've sort of grouped the positions using the fill layer so you can group them together. Ideally though I would like to put a break in the xbreak when the position changes instead of grouping them by the color.
So it would go like orange bar, orange bar, break, green bar, break, cyan bar, break, etc...
*EDIT:
Here is what the data from an input table looks like:
position Mutation difference difference_std
97 R97L -0.3174848488298787 0.2867477631591484
97 R97N 0.5333956571765566 0.35232170408577224
99 A99H -0.2294999853769939 0.24017957128601522
99 A99S -0.45171049187057877 0.0013816966425459903
101 G101S 0.5315110947026147 0.08483810927415139
102 P102S -0.04872141488960846 0.02890820273131048
103 D103Y 0.6692000007629395 0.07312815307204293
....
So all the positions would be grouped together with a break on either side.
I hope I'm making sense. Is there an easy way to do this?
I think the easiest way would be to use faceting (using the sample data from the question). To get a new variable for the faceting, first order your data frame by Mutation. Then add a new column, pos2, by applying cumsum() and diff() to the position column; this numbers each run of consecutive rows that share a position (idea of @agstudy).
df2<-df[order(df$Mutation),]
df2$pos2<-cumsum(c(0,diff(df2$position)) != 0)
df2
position Mutation difference difference_std pos2
3 99 A99H -0.22949999 0.240179571 0
4 99 A99S -0.45171049 0.001381697 0
7 103 D103Y 0.66920000 0.073128153 1
5 101 G101S 0.53151109 0.084838109 2
6 102 P102S -0.04872141 0.028908203 3
1 97 R97L -0.31748485 0.286747763 4
2 97 R97N 0.53339566 0.352321704 4
Now use the new column pos2 for faceting. With theme() and its strip.text= and strip.background= arguments you can remove the strip texts and the grey background.
ggplot(df2,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),
stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std,
ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
theme(strip.text=element_blank(),
strip.background=element_blank())+
facet_grid(.~pos2,scales="free_x",space="free_x")
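The grouping column can be checked in isolation. A self-contained sketch that rebuilds the seven sample rows from the question (values rounded) and derives pos2; it also shows a variant, pos_rank, introduced here as an assumption for the case where the facets should follow numeric position rather than alphabetical Mutation order:

```r
df <- data.frame(
  position   = c(97, 97, 99, 99, 101, 102, 103),
  Mutation   = c("R97L", "R97N", "A99H", "A99S", "G101S", "P102S", "D103Y"),
  difference = c(-0.317, 0.533, -0.229, -0.452, 0.532, -0.049, 0.669)
)

# Sort by Mutation, then start a new group whenever position changes
df2 <- df[order(df$Mutation), ]
df2$pos2 <- cumsum(c(0, diff(df2$position)) != 0)

# Variant (assumption): number groups by increasing position instead
df2$pos_rank <- as.integer(factor(df2$position))

df2
```

Because pos2 depends on row order, it has to be recomputed whenever the data frame is re-sorted; pos_rank does not.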
UPDATE - empty levels
Another possibility is to use scale_x_discrete() with the limits= argument and add empty levels where you need space between bars (actual levels). The problem with this approach is that you need to add those levels manually.
For example, using the same data as in the previous example (the ordered question data):
ggplot(df2,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),
stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std,
ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
scale_x_discrete(limits=c("A99H", "A99S","", "D103Y","","G101S",
"","P102S","","R97L", "R97N"))
I think you first need to create a new variable that groups the labels.
dat$group <- cumsum(c(0,diff(dat$position)) != 0)
Mutation difference position group
20 0wYpO 0.93746859 4 0
17 63L00 -0.57833783 4 0
3 6hfEp -1.01620448 3 1
1 FvAz4 0.09496127 2 2
8 ghNTj -1.10180158 3 3
14 GxYzD 0.41997924 3 3
Then, I don't think there is an easy way to add a gap to your plot. But you can create a barplot by group and add the mutation names with geom_text. Here is a first version; I hope that someone more experienced with ggplot2 can help to adjust the texts in the middle.
ggplot( dat,aes(x=factor(group),y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(Mutation)),stat="identity",
position="dodge") +
geom_text(aes(label=Mutation),position=position_dodge(width=0.75),angle=90)+
theme_bw() +
theme(legend.position="none")
PS: The code below can be used to generate data:
N <- 21
Mutation <- replicate(N,paste(sample(c(0:9, letters, LETTERS),
5, replace=TRUE),
collapse=""))
difference <- rnorm(N)
position <- c(4, 4, 3, 2, 3, 3, 2, 3, 2,
5, 2, 2, 2, 5, 5,
5, 2, 4, 5, 4, 2)
difference_std <- sd(difference)
dat <- data.frame(Mutation,difference)
dat <- dat[order(dat$Mutation),]
dat$position <- position
dat$group <- cumsum(c(0,diff(dat$position)) != 0)

Resources