Compare different datasets with stacked bar graphs in R

I need to compare two methods, each of which produces three results, in a single graph using stacked bars.
I want the x axis to show the experiment and the y axis to show the results, with each bar stacking the three results.
experiment method result1 result2 result3
1 m1 1 2 3
1 m2 4 5 6
2 m1 7 8 9
2 m2 10 11 12
3 m1 13 14 15
3 m2 16 17 18
I have this code for comparing two data sets. How can I change it?
library(ggplot2);
pdf(file = '$filename.pdf', width=5, height=5);
data1 <- as.matrix(read.table('$INPUT_FILE1', header = T));
data1.experiment <- as.numeric(data1[,"Experiment"]);
data1.obs <- as.numeric(data1[,"Result1"]);
data1.method <- as.factor(data1[,"Method"]);
df <- data.frame(data1.experiment, data1.method, data1.obs);
orderlist = c("70", "100", "130", "160", "190", "260");
ggplot(df, aes(x = data1.experiment, y = data1.obs, fill = data1.method), ylim=c(60000, 2800000)) +
geom_bar(stat='identity', position='dodge')+
labs(x='$xlabel',y='$ylabel', fill='Methods') +
scale_fill_manual(values = c('red','blue'), labels = c('DTB-MAC', 'IEEE802.11P')) +
scale_x_continuous(breaks = orderlist)+
theme(legend.position = c(1, 1), legend.justification = c(1, 1), legend.background = element_rect(colour = NA, fill = 'white'));

You said that you need to compare the methods. If you put the experiment on the x axis and the result on the y axis, how will you represent the method? My way of doing it is to use facets. Here is how to do it with ggplot2:
dat <- read.csv("data.csv")
library(reshape2)
library(ggplot2)
dat1 <- melt(dat,id.vars = c("experiment","method"))
p <- ggplot(dat1,aes(experiment,value,fill=variable))+geom_bar(stat="identity")+
facet_wrap(~method,nrow=1)
p
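To make the reshaping step concrete, here is a minimal self-contained sketch. It assumes the question's table is typed inline rather than read from data.csv, and it wraps experiment in factor() so that each experiment gets its own tick; the axis and legend labels are illustrative choices, not part of the original answer.

```r
library(reshape2)
library(ggplot2)

# The question's table typed inline (the answer above reads it from "data.csv").
dat <- data.frame(
  experiment = c(1, 1, 2, 2, 3, 3),
  method     = c("m1", "m2", "m1", "m2", "m1", "m2"),
  result1    = c(1, 4, 7, 10, 13, 16),
  result2    = c(2, 5, 8, 11, 14, 17),
  result3    = c(3, 6, 9, 12, 15, 18)
)

# Long format: one row per (experiment, method, result column) combination.
dat1 <- melt(dat, id.vars = c("experiment", "method"))
head(dat1, 3)

# One panel per method; each bar stacks the three results of one experiment.
p <- ggplot(dat1, aes(factor(experiment), value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~method, nrow = 1) +
  labs(x = "experiment", y = "result", fill = "result")
# print(p) draws the chart
```

After melt(), the three result columns become a single value column plus a variable column naming the result, which is what fill = variable relies on.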

This sort of multi-dimensional chart is best explored using the ggplot2 package. I will assume here that the data you have pasted is stored in the data.frame d:
require(reshape2) ## needed to gather all results into one variable
require(ggplot2) ## needed for the great visualizations
d <- melt(d, id.vars=c("experiment", "method"))
ggplot(d, aes(x=factor(experiment), y=value, fill=variable)) +
geom_bar(stat="identity") +
facet_wrap(~method)
You can polish the graph further using custom labels, but that is too long to explore here. The questions with the ggplot2 tag have lots of great examples.
EDIT: Corrected to show the methods too, as already answered by #user2743244

Related

ordering facet_wrap levels by frequency

I'm trying to make a waffle chart of the championships won by F1 drivers so far. The chart looks fine, but the facets are labelled in alphabetical order. I want them ordered from the most titles won to the least.
I've tried ordering and fct_relevel, but nothing works. Below is the code:
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7)+
facet_wrap(~Champions, nrow = 3, strip.position = "bottom",labeller = label_wrap_gen(6))
And this is the result I'm looking for.
You can find the entire code here
The dataset looks like
Season Champions Team one
1 a x 1
2 a x 1
3 b y 1
4 a x 1
5 c z 1
Here's a solution using forcats (also part of the tidyverse package).
fct_infreq() orders factors according to their frequency in the data, and you can use that to specify the ordering of the levels in your data.
library(forcats)
dfc$Champions <- factor(dfc$Champions, levels=levels(fct_infreq(dfc$Champions)))
ggplot(data = dfc, aes(fill=Champions, values=one)) +
geom_waffle(color = "cornsilk", size=0.25, n_rows=7) +
facet_wrap(~Champions, nrow = 3, strip.position = "bottom", labeller = label_wrap_gen(6))
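The effect of fct_infreq() on the level order can be checked without drawing anything. The sketch below uses hypothetical toy data (not the F1 dataset) in which the alphabetical order and the frequency order differ.

```r
library(forcats)

# Hypothetical toy data: "c" appears 3 times, "b" twice, "a" once.
dfc <- data.frame(Champions = c("a", "b", "b", "c", "c", "c"), one = 1)

# Default factor levels are alphabetical.
levels(factor(dfc$Champions))

# fct_infreq() reorders the levels so the most frequent comes first.
dfc$Champions <- fct_infreq(dfc$Champions)
levels(dfc$Champions)
```

Since facet_wrap() lays out panels in level order, reordering the levels before plotting is what changes the panel order.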

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
..but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I'm not sure why you want to arrange the categorical variable this way, other than for aesthetics (it does look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot positions categorical values on a numerical scale. For your chart, plot a transparent point in every panel at a position just above the largest number of categories in any group (max_rows + .5 in the code below), which stretches each panel's categorical axis to the same length. The transparent points are plotted for all x values, which is a simple way to cope with groups whose x ranges do not overlap. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
rbind(data.frame(let = c(" ", "  ", "   "),
value = NA,
group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

ggplot: How does geom_tile calculate the fill? [duplicate]

I used geom_tile() to plot 3 variables on the same graph, with:
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?
To plot the mean of the fill values, you should aggregate your values before plotting. scale_fill_gradient(...) works at the visualization level, not at the data level.
Let's start with a toy Dataframe to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest calculating the mean over your groups of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
# group by the bare column names; quoted strings would create constant columns instead
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr=mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
and this is the result:
where you are sure that the fill value is exactly the mean of your values.
Is this what you expected?
It uses stat_identity as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you see the fill value for the 1/1 combination is 5. If you use factors it's even more clear what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest to calculate them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
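For readers without plyr, the same pre-aggregation can be sketched with base R's aggregate(). This is a swapped-in alternative to the ddply() call above, not what the answer used; it reuses the DF from the example.

```r
# Same toy data as above: the (x = 1, y = 1) cell occurs twice, with fill 1 and 5.
DF <- data.frame(x = c(rep(1:2, 2), 1),
                 y = c(rep(1:2, each = 2), 1),
                 fill = 1:5)

# One row per (x, y) cell, with fill replaced by the mean of that cell.
DF1 <- aggregate(fill ~ x + y, data = DF, FUN = mean)
DF1
```

The duplicated (1, 1) cell collapses to a single row whose fill is the mean of 1 and 5, so the tile colour no longer depends on which row happens to be drawn last.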
The fill scale and geom_tile() apply no statistics (or rather, they apply stat_identity()) to your fill value rf/300. The scale just works out which colours to use and then generates them with the munsell function mnsl(). If you want to apply a transformation only to the colours displayed, you should use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the colour mapping between tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colours. Hope this helps.

Drawing a multiple line ggplot figure

I am working on a figure that should contain 3 different lines on the same graph. The data frame I am working on is the following:
I would like to use ind (my data point) on the x axis and then draw 3 different lines using the data from the columns med, b and c.
I only managed to draw one line.
Could you please help me? The code I am using now is:
ggplot(data=f, aes(x=ind, y=med, group=1)) +
geom_line(aes())+ geom_line(colour = "darkGrey", size = 3) +
theme_bw() +
theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank())
The key is to spread columns in question into a new variable. This happens in the gather() step in the below code. The rest is pretty much boiler plate ggplot2.
library(ggplot2)
library(tidyr)
xy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
ind = 1:10)
# we "spread" a and b into a new variable
xy <- gather(xy, key = myvariable, value = myvalue, a, b)
ggplot(xy, aes(x = ind, y = myvalue, color = myvariable)) +
theme_bw() +
geom_line()
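In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same idea, assuming the same toy xy data frame and, unlike the code above, reshaping every column except ind so that all three series are drawn (set.seed() is added only for reproducibility):

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # only so the random toy data is reproducible
xy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), ind = 1:10)

# Every column except `ind` is stacked into a name column and a value column.
xy_long <- pivot_longer(xy, cols = -ind,
                        names_to = "myvariable", values_to = "myvalue")

# One line per original column, coloured by column name.
p <- ggplot(xy_long, aes(x = ind, y = myvalue, color = myvariable)) +
  theme_bw() +
  geom_line()
# print(p) draws the chart
```

For the question's data, the same call with cols = c(med, b, c) should give the three requested lines.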
With melt and ggplot:
library(reshape2)
library(ggplot2)
df$ind <- 1:nrow(df)
head(df)
a b med c ind
1 -87.21893 -84.72439 -75.78069 -70.87261 1
2 -107.29747 -70.38214 -84.96422 -73.87297 2
3 -106.13149 -105.12869 -75.09039 -62.61283 3
4 -93.66255 -97.55444 -85.01982 -56.49110 4
5 -88.73919 -95.80307 -77.11830 -47.72991 5
6 -86.27068 -83.24604 -86.86626 -91.32508 6
df <- melt(df, id='ind')
ggplot(df, aes(ind, value, group=variable, col=variable)) + geom_line(lwd=2)

Labels y-axis change

I am struggling a lot with a graph and I don't know what is going wrong. I have the following dataframe:
And then I use the following dataframe:
df <- read.table(text ="YEAR Eucaris Niet.Eucaris
1 8 81867 0.1527756
2 9 91507 0.1533734
3 10 102755 0.1733875
4 11 116491 0.1648633
5 12 55133 0.1771800
6 13 67115 0.1449571", header =TRUE)
This works, but when I expand the dataframe:
r <- c(14,56849)
df <- rbind(df, r)
the graph shows 8, 10, 12 instead of 8, 9, 10, etc.
Why is this happening?
Using ggplot2, by modifying scale_x_continuous:
library(ggplot2)
graph <- ggplot(df, aes(x = YEAR, y=Eucaris)) +
geom_line(linetype="dashed", size=1, colour="blue") +
geom_point(size=4, shape=22, colour="darkred", fill="pink")+
scale_x_continuous(breaks = 1:14)
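With a numeric x, ggplot2 chooses the axis breaks automatically, and the step it picks can change when the range of the data grows. Rather than hard-coding 1:14, the breaks can be derived from the data. The sketch below rebuilds only the YEAR and Eucaris columns from the question, including the added row for year 14; year_breaks is a name introduced here for illustration.

```r
library(ggplot2)

# YEAR and Eucaris from the question, including the added row (14, 56849).
df <- data.frame(
  YEAR    = 8:14,
  Eucaris = c(81867, 91507, 102755, 116491, 55133, 67115, 56849)
)

# One break per year, whatever the range of the data turns out to be.
year_breaks <- seq(min(df$YEAR), max(df$YEAR), by = 1)

graph <- ggplot(df, aes(x = YEAR, y = Eucaris)) +
  geom_line(linetype = "dashed", colour = "blue") +
  geom_point(size = 4, shape = 22, colour = "darkred", fill = "pink") +
  scale_x_continuous(breaks = year_breaks)
# print(graph) draws the chart
```

This keeps one tick per year if more rows are appended later.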
