ordering facet_wrap levels by frequency - r

I'm trying to make a waffle chart of the championships won by F1 drivers so far. The chart comes out fine, but the facets appear in alphabetical order. I want them ordered from the most titles won to the fewest.
I've tried ordering and fct_relevel, but nothing works. Below is the code:
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7)+
facet_wrap(~Champions, nrow = 3, strip.position = "bottom",labeller = label_wrap_gen(6))
And this is the result I'm looking for.
You can find the entire code here
The dataset looks like
Season Champions Team one
1      a         x    1
2      a         x    1
3      b         y    1
4      a         x    1
5      c         z    1

Here's a solution using forcats (also part of the tidyverse package).
fct_infreq() orders factors according to their frequency in the data, and you can use that to specify the ordering of the levels in your data.
library(forcats)
dfc$Champions <- factor(dfc$Champions, levels=levels(fct_infreq(dfc$Champions)))
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7) +
facet_wrap(~Champions, nrow = 3, strip.position = "bottom", labeller = label_wrap_gen(6))
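For reference, the same frequency ordering can be sketched in base R without forcats. The toy vector below is an assumption standing in for dfc$Champions; sort(table(x), decreasing = TRUE) plays the role that fct_infreq() plays above.

```r
# Toy stand-in for dfc$Champions (hypothetical data, not the F1 dataset)
champions <- c("a", "a", "b", "a", "c", "c")

# Count each champion, then list the names from most to least frequent
freq_levels <- names(sort(table(champions), decreasing = TRUE))

# Rebuild the factor with its levels in that order
champions_f <- factor(champions, levels = freq_levels)
levels(champions_f)
# [1] "a" "c" "b"
```

facet_wrap() lays out panels in factor level order, so a factor built this way should give the most frequent champion first.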

Related

R: Connecting Points in Arbitrary Order [duplicate]

I am working with the R programming language.
I generated the following random data set in R and made a plot of these points:
library(ggplot2)
set.seed(123)
x_cor = rnorm(5,100,100)
y_cor = rnorm(5,100,100)
my_data = data.frame(x_cor,y_cor)
x_cor y_cor
1 43.95244 271.50650
2 76.98225 146.09162
3 255.87083 -26.50612
4 107.05084 31.31471
5 112.92877 55.43380
ggplot(my_data, aes(x=x_cor, y=y_cor)) + geom_point() + ggtitle("Travelling Salesman Example")
Suppose I want to connect these dots together in the following order: 1 with 3, 3 with 4, 4 with 5, 5 with 2, 2 with 1
I can make a new variable that contains this ordering:
my_data$order = c(3, 1, 4, 5, 2)
Is it possible to make this kind of graph using ggplot2?
I tried the following code - but this connects the points based on the order they appear in, and not the custom ordering:
ggplot(my_data, aes(x = x_cor, y = y_cor)) +
geom_path() +
geom_point(size = 2)
I could probably manually re-shuffle the dataset to match this ordering - but is there an easier way to do this?
In the past, I have made these kind of graphs using "igraph" - but is it possible to make them with ggplot2?
Can someone please show me how to do this?
Thanks!
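Store each point's position along the path in the order column (point 1 is visited first, point 2 fifth, point 3 second, and so on) and sort the data by it: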
my_data$order = c(1, 5, 2, 3, 4)
ggplot(my_data[order(my_data$order),], aes(x = x_cor, y = y_cor)) +
geom_path() +
geom_point(size = 2)
If you want to close the path, use geom_polygon:
ggplot(my_data[order(my_data$order),], aes(x = x_cor, y = y_cor)) +
geom_polygon(fill = NA, color = "black") +
geom_point(size = 2)
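If you would rather write down the sequence in which the points are visited (1, 3, 4, 5, 2 in the question), order() converts between that sequence and the per-point ranks used above. A small base R check; the names visit and path_rank are made up for this sketch:

```r
# Sequence in which the points are visited: 1 -> 3 -> 4 -> 5 -> 2
visit <- c(1, 3, 4, 5, 2)

# Rank of each point along the path (point 2 is visited fifth, and so on)
path_rank <- order(visit)
path_rank
# [1] 1 5 2 3 4

# Sorting by the ranks recovers the visiting sequence
order(path_rank)
# [1] 1 3 4 5 2
```

So my_data[visit, ] selects the rows in the same order as my_data[order(my_data$order), ].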

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should scale to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I'm not sure why you want to arrange the categorical variable on your chart this way other than aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot positions the levels of a categorical variable on a numerical scale. For each x value, plot a transparent point at a y value just above the largest number of categories in any group (max_rows + 0.5 in the code below), so that every panel is stretched to the same height. Points are plotted for all x values as a simple way to handle groups whose ranges of x values do not overlap. Note that the code maps let to y and value to x instead of using coord_flip(). I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
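To see what the invisible layer is anchored to, here are the per-group counts for the example data, computed with base R only (the value column is left out because it does not affect the counts). As far as I understand, within each free-scale panel the categories sit at positions 1, 2, ..., so a point at max_rows + 0.5 extends every panel to the height of the largest group.

```r
# Same grouping as in the example above
df <- data.frame(let = LETTERS[1:19],
                 group = rep(c("foo", "bar", "bar2"), times = c(5, 8, 6)))

num_rows <- xtabs(~ group, df)   # number of categories in each panel
max_rows <- max(num_rows)        # size of the largest panel

num_rows
max_rows
# [1] 8
max_rows + 0.5                   # y position of the transparent points
# [1] 8.5
```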
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
  rbind(data.frame(let = c(" ", "  ", "   "),
                   value = NA,
                   group = "foo"))
I just add three more entries for foo whose labels are blank strings (i.e., just spaces) of different lengths, so that each one is a distinct level. There must be a more elegant solution, though.
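The padding idea can be generalized so the number of blank rows is computed rather than typed in. This is only a sketch under my own assumptions (base R, the question's two-group data with made-up values, and the helper names counts, target and pad invented here):

```r
# Toy data with the same shape as the question's data frame
df <- data.frame(let = LETTERS[1:13], value = 1:13,
                 group = rep(c("foo", "bar"), times = c(5, 8)),
                 stringsAsFactors = FALSE)

counts <- table(df$group)   # rows per group
target <- max(counts)       # size of the largest group

# Build (target - n) padding rows for every group that falls short
pad <- do.call(rbind, lapply(names(counts), function(g) {
  k <- target - counts[[g]]
  if (k == 0) return(NULL)
  data.frame(let = strrep(" ", seq_len(k)),  # " ", "  ", ... are distinct labels
             value = NA_real_,
             group = g,
             stringsAsFactors = FALSE)
}))

df_padded <- rbind(df, pad)
table(df_padded$group)      # every group now has the same number of rows
```

Plotting df_padded instead of df should then give each panel the same number of axis entries; ggplot2 will warn about the rows it drops because value is NA.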
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

ggplot2: create ordered group bar plot - (use reorder)

I want to create a grouped bar plot while keeping the bars ordered. For a single column (not a grouped bar plot), using the reorder function is obvious, but I'm not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example.
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot and flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The above plot is not ordered. If I want to order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column (to the original data frame):
d.nfl$points <- c(12, 3, 45, 5)
Team1 Win points
1 Vikings 20 12
2 Chicago 13 3
3 GreenBay 9 45
4 Detroit 12 5
To generate a grouped bar plot, we first need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The above ggplot is unordered.
How do I make an ordered grouped bar plot (in ascending order)?
This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and just only use the Win elements in computing the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
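Why the multiplication works: reorder() sorts the levels by the mean of X within each team (mean is its default summary function), and the points rows contribute 0, so each team's score is Win / 2, which keeps the Win ordering. A base R check, with the long data frame built by hand here instead of with melt():

```r
d.nfl <- data.frame(Team1 = c("Vikings", "Chicago", "GreenBay", "Detroit"),
                    Win = c(20, 13, 9, 12),
                    points = c(12, 3, 45, 5))

# Hand-built long format with the same columns as the melted data
long <- data.frame(Team1 = rep(d.nfl$Team1, times = 2),
                   variable = rep(c("Win", "points"), each = nrow(d.nfl)),
                   value = c(d.nfl$Win, d.nfl$points))

# Score each row: the Win value for Win rows, 0 for points rows
score <- long$value * (long$variable == "Win")
team_score <- tapply(score, long$Team1, mean)   # each team's mean is Win / 2
team_score

long$ordered_after_melting <- reorder(long$Team1, X = score)
levels(long$ordered_after_melting)
# [1] "GreenBay" "Detroit"  "Chicago"  "Vikings"
```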
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
As to the easiest, I would recommend just keeping the orderedTeam variable around while melting. Your code seems to work hard to leave it out, it's quite easy to keep it in.
The challenge your question presents is how to reorder a factor Team1 based on a subset of the values in a melted column.
The comments to your question from @alistaire and @joran link to great answers.
The tl;dr answer is to just apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(d.nfl.melt$Team1)
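#Current order is alphabetical (this assumes Team1 is a factor; with the R >= 4.0 default of stringsAsFactors = FALSE it is a character column and levels() returns NULL)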
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()

compare different datasets with stacked bar graphs in R

I need to compare two different methods, each of which has 3 different results, in one graph using stacked bars.
I want to draw a plot so that the x axis shows the experiment and the y axis shows the results, with each bar made up of the 3 results stacked.
experiment method result1 result2 result3
1          m1     1       2       3
1          m2     4       5       6
2          m1     7       8       9
2          m2     10      11      12
3          m1     13      14      15
3          m2     16      17      18
I have this code for comparing two data sets. How can I change it?
library(ggplot2);
pdf(file = '$filename.pdf', width=5, height=5);
data1 <- as.matrix(read.table('$INPUT_FILE1', header = T));
data1.experiment <- as.numeric(data1[,\"Experiment\"]);
data1.obs <- as.numeric(data1[,\"Result1\"]);
data1.method <- as.factor(data1[,\"Method\"]);
df <- data.frame(data1.experiment, data1.method, data1.obs);
orderlist = c("70", "100", "130", "160", "190", "260");
ggplot(df, aes(x = data1.experiment, y = data1.obs, fill = data1.method), ylim=c(60000, 2800000)) +
geom_bar(stat='identity', position='dodge')+
labs(x='$xlabel',y='$ylabel', fill='Methods') +
scale_fill_manual(values = c('red','blue'), labels = c('DTB-MAC', 'IEEE802.11P')) +
scale_x_continuous(breaks = orderlist)+
theme(legend.position = c(1, 1), legend.justification = c(1, 1), legend.background = element_rect(colour = NA, fill = 'white'));
You said that you need to compare the methods. If you put experiment on the x-axis and result on the y-axis, how will you represent method? My way of doing it is to use facets. Here is how to do it with ggplot2:
dat <- read.csv("data.csv")
library(reshape2)
library(ggplot2)
dat1 <- melt(dat,id.vars = c("experiment","method"))
p <- ggplot(dat1,aes(experiment,value,fill=variable))+geom_bar(stat="identity")+
facet_wrap(~method,nrow=1)
p
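The key step is reshaping the table to long format, one row per experiment, method and result. Here is a sketch of what that step produces, using base R's stack() in place of reshape2::melt(); the data frame d is typed in from the question's table, and the column names value and variable are set by hand to mimic melt's output:

```r
# The question's table, typed in by hand
d <- data.frame(experiment = rep(1:3, each = 2),
                method = rep(c("m1", "m2"), times = 3),
                result1 = c(1, 4, 7, 10, 13, 16),
                result2 = c(2, 5, 8, 11, 14, 17),
                result3 = c(3, 6, 9, 12, 15, 18))

result_cols <- c("result1", "result2", "result3")

# Repeat the id columns once per result column and stack the results below each other
ids <- d[rep(seq_len(nrow(d)), times = length(result_cols)), c("experiment", "method")]
long <- cbind(ids, stack(d[result_cols]))
names(long)[3:4] <- c("value", "variable")
rownames(long) <- NULL

head(long)
nrow(long)
# [1] 18
```

Each stacked bar is then the sum of the three value rows for one experiment and method, e.g. 1 + 2 + 3 = 6 for experiment 1 with m1.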
This sort of multi-dimensional chart is best explored using the ggplot2 package. I will assume here that the data you have pasted is stored in the data.frame d:
require(reshape2) ## needed to have all experiments in one variable
require(ggplot2) ## needed for the great visualizations
d <- melt(d, id.vars=c("experiment", "method"))
ggplot(d, aes(x=factor(experiment), y=value, fill=variable)) +
geom_bar(stat="identity") +
facet_wrap(~method)
You can polish the graph further using custom labels, but that is too long to explore here. The questions with the ggplot2 tag have lots of great examples.
EDIT: Corrected to show the methods too, as already answered by @user2743244

Adding xbreaks to ggplot

Hello, I have generated the following plot using this code:
library(ggplot2,quietly=TRUE)
args <- commandArgs(TRUE)
data <-read.table(args[1],header=T,sep="\t")
pdf(args[2])
p1 <- ggplot( data,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std, ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylab(expression(Delta*"Energy (Design - Wild Type)" )) +
xlab( "Mutation" ) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
ggtitle(expression("PG9 Mutational Analysis"))
p1
dev.off()
As you can see, I've sort of grouped the positions using the fill layer so that bars at the same position share a color. Ideally, though, I would like to put a gap on the x-axis wherever the position changes, instead of distinguishing the groups by color.
So it would go like orange bar, orange bar, break, green bar, break, cyan bar, break, etc...
*EDIT:
Here is what the data from an input table looks like:
position Mutation difference difference_std
97 R97L -0.3174848488298787 0.2867477631591484
97 R97N 0.5333956571765566 0.35232170408577224
99 A99H -0.2294999853769939 0.24017957128601522
99 A99S -0.45171049187057877 0.0013816966425459903
101 G101S 0.5315110947026147 0.08483810927415139
102 P102S -0.04872141488960846 0.02890820273131048
103 D103Y 0.6692000007629395 0.07312815307204293
....
So all the positions would be grouped together with a break on either side.
I hope I'm making sense. Is there an easy way to do this?
J
I think the easiest way would be to use faceting (using the sample data from the question). To get a new variable for faceting, first order your data frame by Mutation. Then add a new column pos2 that uses cumsum() and diff() on the position column to give each run of equal positions its own number (idea from @agstudy).
df2<-df[order(df$Mutation),]
df2$pos2<-cumsum(c(0,diff(df2$position)) != 0)
df2
position Mutation difference difference_std pos2
3 99 A99H -0.22949999 0.240179571 0
4 99 A99S -0.45171049 0.001381697 0
7 103 D103Y 0.66920000 0.073128153 1
5 101 G101S 0.53151109 0.084838109 2
6 102 P102S -0.04872141 0.028908203 3
1 97 R97L -0.31748485 0.286747763 4
2 97 R97N 0.53339566 0.352321704 4
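The cumsum()/diff() idiom can be unpacked step by step. The sketch below uses the position values in the sorted order shown above; the intermediate names steps and changed are made up for illustration:

```r
# position column of df2, in Mutation order
position <- c(99, 99, 103, 101, 102, 97, 97)

steps   <- c(0, diff(position))  # change from the previous row (0 for the first row)
changed <- steps != 0            # TRUE wherever a new run of positions starts
pos2    <- cumsum(changed)       # running count of run starts = run id

pos2
# [1] 0 0 1 2 3 4 4
```

Rows that share a pos2 value end up in the same facet, so adjacent bars with the same position stay together.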
Now use new column pos2 for facetting. With theme() and strip.text= and strip.background= you can remove strip texts and grey background.
ggplot(df2,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),
stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std,
ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
theme(strip.text=element_blank(),
strip.background=element_blank())+
facet_grid(.~pos2,scales="free_x",space="free_x")
UPDATE - empty levels
Another possibility is to use scale_x_discrete() with the limits= argument and add empty levels in the places where you need space between bars (the actual levels). The problem with this approach is that you need to add those levels manually.
For example, using the same data as in the previous example (the ordered question data):
ggplot(df2,aes(x=Mutation,y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(position)),
stat="identity",position="dodge") +
geom_errorbar(aes(y=difference, ymin=difference-difference_std,
ymax=difference+difference_std )) +
theme(legend.key = element_blank()) +
ylim(-2.5,2.5) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,size=9,hjust=1))+
theme(legend.position="none")+
scale_x_discrete(limits=c("A99H", "A99S","", "D103Y","","G101S",
"","P102S","","R97L", "R97N"))
I think you first need to create a new variable that groups the labels.
dat$group <- cumsum(c(0,diff(dat$position)) != 0)
Mutation difference position group
20 0wYpO 0.93746859 4 0
17 63L00 -0.57833783 4 0
3 6hfEp -1.01620448 3 1
1 FvAz4 0.09496127 2 2
8 ghNTj -1.10180158 3 3
14 GxYzD 0.41997924 3 3
I don't think there is an easy way to add a gap to your plot, but you can create a bar plot by group and add the gene names with geom_text. Here is a first version; I hope that someone more experienced with ggplot2 can help to center the text labels.
ggplot( dat,aes(x=factor(group),y=difference) ) +
geom_bar( width=.75,aes(fill=as.factor(Mutation)),stat="identity",
position="dodge") +
geom_text(aes(label=Mutation),position=position_dodge(width=0.9),angle=90)+
theme_bw() +
theme(legend.position="none")
PS: The code below can be used to generate data:
N <- 21
Mutation <- replicate(N,paste(sample(c(0:9, letters, LETTERS),
5, replace=TRUE),
collapse=""))
difference <- rnorm(N)
position <- c(4, 4, 3, 2, 3, 3, 2, 3, 2,
5, 2, 2, 2, 5, 5,
5, 2, 4, 5, 4, 2)
difference_std <- sd(difference)
dat <- data.frame(Mutation,difference)
dat <- dat[order(dat$Mutation),]
dat$position <- position
dat$group <- cumsum(c(0,diff(dat$position)) != 0)
