ggplot2 expecting square matrix even though matrix is not symmetric

Hi, I am trying to plot a heat map in ggplot2, using a matrix with 9 rows and 10 columns.
I melt the matrix with melt() from reshape2, after converting it with as.matrix(), and get the following output:
A1 = melt(as.matrix(A))
Var1 Var2 value
1 1 X0.05 8.690705e-01
2 2 X0.05 1.930320e-01
3 3 X0.05 1.474900e-02
4 4 X0.05 3.498176e-04
5 5 X0.05 2.451419e-06
6 6 X0.05 4.946808e-09
7 7 X0.05 2.832895e-12
8 8 X0.05 4.563140e-16
9 9 X0.05 2.055474e-20
10 1 X0.1 5.906241e-01
11 2 X0.1 7.416265e-01
12 3 X0.1 2.311771e-01
13 4 X0.1 3.892639e-02
14 5 X0.1 3.361408e-03
15 6 X0.1 1.445629e-04
16 7 X0.1 3.043528e-06
17 8 X0.1 3.103555e-08
18 9 X0.1 1.522292e-10
The output is correct, with each column represented by 9 values.
I then rescale the values within each column and get the following output:
A2 = ddply(A1, .(Var2), transform, rescale = rescale(value))
Var1 Var2 value rescale
1 1 X0.05 8.690705e-01 1.000000e+00
2 2 X0.05 1.930320e-01 2.221132e-01
3 3 X0.05 1.474900e-02 1.697101e-02
4 4 X0.05 3.498176e-04 4.025192e-04
5 5 X0.05 2.451419e-06 2.820737e-06
6 6 X0.05 4.946808e-09 5.692068e-09
7 7 X0.05 2.832895e-12 3.259684e-12
8 8 X0.05 4.563140e-16 5.250361e-16
9 9 X0.05 2.055474e-20 0.000000e+00
10 1 X0.1 5.906241e-01 7.963902e-01
11 2 X0.1 7.416265e-01 1.000000e+00
12 3 X0.1 2.311771e-01 3.117163e-01
13 4 X0.1 3.892639e-02 5.248786e-02
14 5 X0.1 3.361408e-03 4.532480e-03
15 6 X0.1 1.445629e-04 1.949266e-04
16 7 X0.1 3.043528e-06 4.103651e-06
17 8 X0.1 3.103555e-08 4.164269e-08
18 9 X0.1 1.522292e-10 0.000000e+00
Everything still looks fine and when I plot the heat map the output is correct, so far so good
ggplot(A2, aes(Var2, Var1)) + geom_tile(aes(fill = rescale), colour = "white") + scale_fill_gradient(low = "light blue", high = "dark blue")
The problem comes up when I add custom labels, where the y axis goes from 1 to 9 (displaying the number of heterozygote individuals) and the x-axis goes from 0.05 to 0.5 (displaying the minor allele frequency)
x = c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50)
y = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
ggplot(A2, aes(Var2, Var1)) + geom_tile(aes(fill = rescale), colour = "white") + scale_fill_gradient(low = "light blue", high = "dark blue") + scale_x_discrete(labels= x, expression("Minor allele frequency")) + scale_y_discrete(labels= y, expression("Number of Heterozygotes"))
However, this time the y axis is all messed up.
It seems to me that ggplot automatically assumes a 10x10 matrix and tries to add the missing labels. I tried to find an option in reshape2 where I could declare the shape of the matrix, but I was unable to find one. Has anyone come across this problem? Any help would be much appreciated, thanks in advance.

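Here is one approach. You can change the tick mark labels on the x axis with scale_x_discrete. As for y, Var1 is numeric, so ggplot2 treats it as a continuous variable and a discrete scale does not fit it; I converted Var1 to a factor so that each row gets its own tick.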
ggplot(mydf, aes(x = Var2, y = as.factor(Var1), fill = rescale)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "light blue", high = "dark blue") +
scale_x_discrete(breaks=c("X0.05", "X0.1"), labels=c("0.05", "0.1")) +
xlab("Minor allele frequency") +
ylab("Number of Heterozygotes")
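A self-contained sketch of the same idea for the full 9 x 10 case. The matrix below is made up (binomial probabilities standing in for the real A, which the question does not show), the per-column rescaling is done in base R instead of with ddply and rescale, and the names maf, long and p are placeholders chosen for illustration.

```r
library(reshape2)
library(ggplot2)

# Made-up 9 x 10 matrix standing in for the asker's A (assumption):
# rows = number of heterozygotes, columns = minor allele frequency
maf <- seq(0.05, 0.50, by = 0.05)
A <- sapply(maf, function(p) dbinom(1:9, size = 9, prob = p))
dimnames(A) <- list(as.character(1:9), paste0("X", maf))

# Long format: one row per cell, with columns Var1, Var2, value
long <- melt(A)

# Rescale each column (each Var2 group) to the range [0, 1]
long$rescale <- ave(long$value, long$Var2,
                    FUN = function(v) (v - min(v)) / (max(v) - min(v)))

# factor(Var1) makes the y axis discrete, so all 9 rows get a tick;
# the label function strips the leading "X" from the column names
p <- ggplot(long, aes(x = Var2, y = factor(Var1), fill = rescale)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "light blue", high = "dark blue") +
  scale_x_discrete(labels = function(x) sub("^X", "", x)) +
  xlab("Minor allele frequency") +
  ylab("Number of Heterozygotes")
print(p)
```

Passing a function to labels avoids typing all ten breaks by hand, which is the main difference from the two-column version above.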

Related

Stacked/Dodged barplot using base R with x-axis is numerical data

I have looked at all the barplot questions on the site but still couldn't figure out what to do with my dataset. I don't know if it's a duplicate, but any help would be much appreciated.
My dataset
Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
I want to create a barplot where HC and Scenario are on the x-axis, and NPV1 and NPV2 give the bar heights and are distinguished by different patterns. Each region name should appear once, centered under its 4 scenarios.
Thanks a lot.
Expected output is something like this.
Further to my above comments, I'm quite unclear about how you'd like to visualise your data. What exactly would you like to show on the x axis?
As a start, perhaps you are after something like this?
library(tidyverse)
df %>%
gather(key, val, -Region, -Scenario, -HC) %>%
unite(x, Region, Scenario, HC) %>%
ggplot(aes(x, val, fill = key)) +
geom_col()
Here categories on the x-axis are of the form <Region>_<Scenario>_<HC>.
Update
To achieve a plot similar to the one you're showing you can do the following
library(tidyverse)
df %>%
gather(key, val, -Region, -Scenario, -HC) %>%
ggplot(aes(HC, val, fill = key)) +
geom_col(position = "dodge2") +
facet_wrap(~Region, nrow = 1, strip.position = "bottom") +
theme_minimal() +
theme(strip.placement = "outside")
Explanation: strip.position = "bottom" ensures that strip labels are at the bottom, and strip.placement = "outside" ensures that strip labels are below the axis labels (to be precise, between the axis labels and axis title).
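For readers on a recent tidyr, gather() is superseded by pivot_longer(). The sketch below repeats the faceted plot with that function; it is an equivalent under that substitution, not the answerer's code, and the name long is a placeholder. The data frame is the sample data from the question.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text =
"Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)

# Stack NPV1 and NPV2 into key/val pairs, keeping Region, Scenario and HC
long <- pivot_longer(df, cols = c(NPV1, NPV2),
                     names_to = "key", values_to = "val")

# One panel per region, bars dodged by key, region label below the axis
p <- ggplot(long, aes(HC, val, fill = key)) +
  geom_col(position = "dodge2") +
  facet_wrap(~Region, nrow = 1, strip.position = "bottom") +
  theme_minimal() +
  theme(strip.placement = "outside")
print(p)
```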
Sample data
df <- read.table(text =
"Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)

R - Multiple plot with ggplot

I have this small dataset
map red_team blue_team
1 7 8
2 21 32
3 11 22
4 10 8
I am trying to create a multiplot where each individual plot represents one of the maps (1, 2, 3 and 4) and contains two bars, one for red_team and one for blue_team, on the X axis, with the score on the Y axis.
This is what I currently have.
ggplot(winners_and_score, aes(red_team)) + geom_bar() + facet_wrap(~ map)
I'm having trouble displaying the score for both teams.
Thanks.
require(reshape2)
require(ggplot2)
# toy data
df = data.frame(map = 1:4, red_team = sample(7:21, 4, replace=TRUE),
blue_team = sample(8:32, 4, replace=TRUE))
df.melted <- melt(df, id='map')
> df.melted
map variable value
1 1 red_team 8
2 2 red_team 15
3 3 red_team 17
4 4 red_team 19
5 1 blue_team 22
6 2 blue_team 32
7 3 blue_team 31
8 4 blue_team 18
# making the plot
ggplot(data=df.melted, aes(x=variable, y=value, fill=variable)) +
geom_bar(stat='identity') +
facet_wrap(~map) +
theme_bw()
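The toy data above is random, so its numbers differ from the question's. As a sketch, here is the same melt-then-facet approach applied to the four rows given in the question; the names scores, scores_long and p are placeholders, and geom_col() is used as shorthand for geom_bar(stat = 'identity').

```r
library(reshape2)
library(ggplot2)

# The four rows from the question
scores <- data.frame(map = 1:4,
                     red_team = c(7, 21, 11, 10),
                     blue_team = c(8, 32, 22, 8))

# Long format: one row per (map, team) pair, team name in 'variable'
scores_long <- melt(scores, id.vars = "map")

# One panel per map, one bar per team, bar height = score
p <- ggplot(scores_long, aes(x = variable, y = value, fill = variable)) +
  geom_col() +
  facet_wrap(~map) +
  theme_bw()
print(p)
```

The key step is the reshape: once both teams live in one value column, a single y aesthetic can show both scores.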

Plotting several X,Y column pairs as data series, while excluding (0,0) points

I'm trying to plot three data series in a single plot. The X and Y coordinates of each series are in separate columns in my data frame:
X1 Y1 X2 Y2 X3 Y3
1 0 1 0 2 0 3
2 1 2 1 3 1 4
3 2 3 2 4 2 5
4 3 4 3 5 3 6
5 4 5 4 6 4 7
6 5 6 5 7 5 8
7 6 7 6 8 6 9
8 0 0 7 9 7 8
9 0 0 8 8 0 0
10 0 0 9 7 0 0
Since the trailing (0,0) data points of each series are invalid, only this subset of points should eventually be plotted:
X1 Y1 X2 Y2 X3 Y3
1 0 1 0 2 0 3
2 1 2 1 3 1 4
3 2 3 2 4 2 5
4 3 4 3 5 3 6
5 4 5 4 6 4 7
6 5 6 5 7 5 8
7 6 7 6 8 6 9
8 7 9 7 8
9 8 8
10 9 7
Additionally, the X-axis of the first series should be inverted:
Even without cleaning up the data frame first, I struggled to plot the column pairs as individual series in ggplot2 (see 'legend').
require(ggplot2)
report <- function(df){
plot = ggplot(data=df, aes(x=-X1, y=Y1, size=3)) + #inverted X-axis of series 1
layer(geom="point") +
geom_point(aes(X2, Y2, colour="red", size=2)) +
geom_point(aes(X3, Y3, colour="blue", size=1)) +
xlab("X") + ylab("Y")
print(plot)
}
X1 = c(0,1,2,3,4,5,6,0,0,0)
Y1 = c(1,2,3,4,5,6,7,0,0,0)
X2 = c(0,1,2,3,4,5,6,7,8,9)
Y2 = c(2,3,4,5,6,7,8,9,8,7)
X3 = c(0,1,2,3,4,5,6,7,0,0)
Y3 = c(3,4,5,6,7,8,9,8,0,0)
df <- data.frame(X1,Y1,X2,Y2,X3,Y3)
colnames(df) <- c("X1","Y1","X2","Y2","X3","Y3")
report(df)
What would be the best way to get rid of the invalid (0,0) data points in each series, and how should I plot them properly?
I think you actually want to transform your data.frame in order to make your ggplot call more concise. Here is the updated version to plot your data correctly using the dplyr package to transform the data.
In response to a comment requesting additional info on dplyr: it provides the %>% operator, which simply passes the value on its left into the function on its right as the first argument. This allows for much more readable R code. The mutate function adds the Series variable, set manually from the knowledge of which points belong to which series. The filter function then removes the (0,0) points, which you indicated were not wanted. You can inspect df after these operations to see the final output. Hope this helps interpret the code below. Also here is a link to the dplyr page.
library(dplyr)
library(ggplot2)
df <- rbind.data.frame(
data.frame(X=-X1, Y=Y1),
data.frame(X=X2, Y=Y2),
data.frame(X=X3, Y=Y3))
df <- df %>%
mutate(Series=rep(c('S1', 'S2', 'S3'), each=10)) %>%
filter(!(X == 0 & Y == 0))
png('foo.png')
ggplot(df) + geom_point(aes(x=X, y=Y, color=Series, size=Series))
dev.off()
Also, if you want to manually set the values of color and size and add lines as in your ideal example plot, here is a more complex ggplot command:
ggplot(df, aes(x=X, y=Y, color=Series, size=Series)) +
geom_point() + geom_line(size=1) + theme_bw() +
scale_color_manual(values=c('black', 'red', 'blue')) +
scale_size_manual(values=seq(4,2,-1))
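To make the description of %>% and filter concrete, here is a small sketch on the question's own vectors. It checks that the piped call and the ordinary nested call give the same result, and counts the points left in each series. The names long, piped and nested are placeholders.

```r
library(dplyr)

# Vectors copied from the question
X1 <- c(0,1,2,3,4,5,6,0,0,0); Y1 <- c(1,2,3,4,5,6,7,0,0,0)
X2 <- c(0,1,2,3,4,5,6,7,8,9); Y2 <- c(2,3,4,5,6,7,8,9,8,7)
X3 <- c(0,1,2,3,4,5,6,7,0,0); Y3 <- c(3,4,5,6,7,8,9,8,0,0)

# Stack the three X/Y pairs and label each block of 10 rows
long <- rbind(data.frame(X = -X1, Y = Y1),
              data.frame(X = X2, Y = Y2),
              data.frame(X = X3, Y = Y3))
long$Series <- rep(c("S1", "S2", "S3"), each = 10)

# x %>% f(y) is the same call as f(x, y)
piped  <- long %>% filter(!(X == 0 & Y == 0))
nested <- filter(long, !(X == 0 & Y == 0))
identical(piped, nested)

# Points kept per series: 7 for S1, 10 for S2, 8 for S3
table(piped$Series)
```

Note that the first point of series 1, (0, 1), survives because only rows where both X and Y are zero are dropped.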

Apply function to all possible values of a variable

I would like to get as many plots as factors/values in a variable.
For example, I would like to plot the following variables (v1, v2, v3, v4, v5, v6, v7, v8), which I have defined as a scale, for every value of the variable country. In this case I should get a total of three different plots.
I know how to plot them separately; for example, in this case I would have used the following:
basicgraph(Data[country==1, scale1] )
basicgraph(Data[country==2, scale1] )
basicgraph(Data[country==3, scale1] )
I would like my function to plot as many graphs as there are factors/values (without specifying the number of factors/values). I have tried with "apply" but I can't really make it work, so any clue would be helpful.
I have a dataset that looks like:
v1 v2 v3 v4 v5 v6 v7 v8 country
1 NA NA NA NA NA NA NA NA 1
2 5 5 5 5 5 4 5 5 2
3 4 5 3 5 4 5 5 5 3
4 5 5 5 4 2 4 4 5 1
5 4 3 5 4 4 5 4 5 2
6 5 5 5 2 3 4 3 5 3
7 NA NA NA NA NA NA NA NA 1
8 3 5 5 5 4 5 4 4 2
9 4 5 5 4 5 5 4 5 3
10 2 4 4 5 4 5 4 5 1
11 4 5 5 3 4 4 4 5 2
12 4 5 4 4 5 4 4 5 3
13 5 5 4 3 3 5 5 5 1
14 3 5 1 2 3 1 4 5 2
I have defined the scale as:
scale1 <- names(Data) %in% c( "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8")
I have defined a plot function by:
basicgraph<-function(df, title, lab)
{
for(i in 1:length(df))
{
y <- melt(df)
z <- with(y, as.data.frame(table(variable, value, exclude = NULL)))
z <- z[!is.na(z$variable), ]
z$scale <- z$variable
levelss<-levels(z$variable)
}
theme_nogrid <- function (base_size = 12, base_family = "")
{
theme_bw(base_size = base_size, base_family = base_family) %+replace%
theme(panel.grid = element_blank()) +
theme(axis.text.x =element_text(size = base_size * 0.8 , lineheight = 0.9,
vjust = 0.5, hjust=1, angle=90))
}
plot1<-function(z) {
ggplot(data = z, aes(x = variable, y = value, size = Freq))+
geom_point(aes(size = Freq, stat = "identity", position = "identity"), shape = 20, color="black", alpha=0.6) +
scale_size_continuous(range = c(3,15)) +
scale_x_discrete(breaks=levelss,labels=lab)+
xlab("")+ # Add/change the x-axis title
ylab("Response")+ # Add/change the y-axis title
ggtitle(title)+ # Title at the top
theme_nogrid()
}
}
This is a pretty confusing question and example. I think you want to produce a different graph for each country value? In that case I'd suggest something like this:
library(reshape2)
library(ggplot2)
Data_m <- melt(Data, id.vars="country") # melt the data into 'long' format
f <- function(d) { # function that produces a graph and waits
print(qplot(variable, value, data=d) + ggtitle(unique(d$country)))
readline()
}
library(plyr)
d_ply(Data_m, .(country), f) # produces three separate graphs
The d_ply call splits Data_m into three parts and repeatedly calls f on each, producing a graph of that subset of the data, without knowing anything about the data being graphed.
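The same split-and-apply idea can be sketched without plyr, using base R's split() and lapply() to build a named list of plots, one per country value. This is an alternative under that substitution, not the answerer's code. The data frame holds six rows and the first two items from the question's table, and the names plots and d are placeholders.

```r
library(reshape2)
library(ggplot2)

# Six rows (rows 2-6 and 10) and two items taken from the question's table
Data <- data.frame(v1 = c(5, 4, 5, 4, 5, 2),
                   v2 = c(5, 5, 5, 3, 5, 4),
                   country = c(2, 3, 1, 2, 3, 1))

# Long format, keeping country as the id variable
Data_m <- melt(Data, id.vars = "country")

# split() makes one data frame per country; lapply() builds one plot for each
plots <- lapply(split(Data_m, Data_m$country), function(d) {
  ggplot(d, aes(variable, value)) +
    geom_point() +
    ggtitle(paste("Country", unique(d$country)))
})

length(plots)   # one plot per distinct country value
names(plots)    # list names are the country values as strings

# Draw them one after another
for (p in plots) print(p)
```

Keeping the plots in a list means they can be printed, saved, or arranged later, and the number of countries never has to be written down.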

R: {reshape}: (melt.data.frame) How do I replicate a column?

I have an array of iterations in an MCMC algorithm. The rows represent draws from a distribution. The columns represent parameters (variables) in the distribution. For ease of exposition: assume two variables, five iterations. So I have:
> draws <- data.frame( iteration = c(1:5),
alpha = rnorm(5,0,1),
beta = rnorm(5,0,1))
iteration alpha beta
1 1 -0.3157940 0.2122465
2 2 1.0087298 -0.2346733
3 3 1.0366165 0.3472915
4 4 -2.4256564 0.9863279
5 5 -0.6089072 -1.1213000
When I melt the dataset, I get:
> melt(draws)
Using as id variables
variable value
1 iteration 1.0000000
2 iteration 2.0000000
3 iteration 3.0000000
4 iteration 4.0000000
5 iteration 5.0000000
6 alpha -0.1042616
7 alpha 1.0707001
8 alpha 0.2166865
9 alpha 0.0771617
10 alpha -0.8893614
11 beta -0.4846693
12 beta -1.5950729
13 beta -0.7178340
14 beta 1.0149766
15 beta -0.3128256
But I want to hold iteration out so that I get the equivalent of (hand edited):
> melt(draws)
Using as id variables
iteration variable value
1 1 alpha -0.1042616
2 2 alpha 1.0707001
3 3 alpha 0.2166865
4 4 alpha 0.0771617
5 5 alpha -0.8893614
6 1 beta -0.4846693
7 2 beta -1.5950729
8 3 beta -0.7178340
9 4 beta 1.0149766
10 5 beta -0.3128256
Supply the id variable to melt:
melt(draws, id = "iteration")
Gives:
iteration variable value
1 1 alpha -0.02765436
2 2 alpha -1.42138702
3 3 alpha 0.83525096
4 4 alpha -1.10677555
5 5 alpha 0.72465936
6 1 beta 0.59269720
7 2 beta -0.32164072
8 3 beta -1.31097204
9 4 beta 0.94993620
10 5 beta 0.20919169
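As a sketch of why the id variable has to be named: when no id or measure variables are given, reshape2's melt() uses only factor, character or logical columns as id variables, and iteration is numeric, so it is melted along with alpha and beta. The example below uses reshape2 rather than the older reshape package from the question, and the names all_melted and held_out are placeholders.

```r
library(reshape2)

set.seed(1)  # arbitrary seed so the draws are reproducible
draws <- data.frame(iteration = 1:5,
                    alpha = rnorm(5, 0, 1),
                    beta = rnorm(5, 0, 1))

# No id variable given: all three numeric columns are stacked
all_melted <- suppressMessages(melt(draws))
dim(all_melted)   # 15 rows, 2 columns (variable, value)

# iteration held out as the id variable
held_out <- melt(draws, id.vars = "iteration")
dim(held_out)     # 10 rows, 3 columns (iteration, variable, value)
```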
Bah. I always ask a question right before finding the answer...
I had been reading help(melt.array), but when I converted to a data.frame, to post my question, it eventually led me to the answer in help(melt.data.frame).
To get what I want, I will use:
myMelt <- melt(draws, id.vars = "iteration")
So that I can then make a faceted plot:
ggplot(myMelt, aes(x = iteration,y = value)) + geom_point() + stat_smooth() + facet_grid(variable ~ ., scales="free")
