ggplot2 - geom_ribbon bug? - r

This code throws an error and I can't figure out why...
library( plyr )
library( ggplot2 )
library( grid )
library( proto )
# the master dataframe
myDF = structure(list(Agg52WkPrceRange = c(2L, 2L, 2L, 2L, 2L, 2L, 3L,
5L, 3L, 5L, 3L, 5L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 3L, 4L, 3L, 4L, 4L, 4L, 4L), OfResidualPntReturn52CWk = c(0.201477324,
0.22350293, 0.248388728, 0.173871456, 0.201090654, 0.170666183,
0.18681883, 0.178840521, 0.159744891, 0.129811042, 0.13209741,
0.114989407, 0.128347625, 0.100945992, 0.057017002, 0.081123718,
0.018900252, 0.021784814, 0.081931816, 0.059067844, 0.095879746,
0.038977508, 0.078895248, 0.051344317, 0.077515295, 0.011776214,
0.099216033, 0.054714439, 0.022879951, -0.079558277, -0.050889584,
-0.006934821, -0.003407085, 0.032545474, -0.003387139, 0.030418511,
0.053942523, 0.051398537, 0.073482355, 0.087963039, 0.079555591,
-0.040490418, -0.130754663, -0.125826649, -0.141766316, -0.150708718,
-0.171906882, -0.174623614, -0.212945405, -0.174480554), IndependentVariableBinned = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 3L, 10L, 3L, 10L, 4L, 10L, 4L, 2L, 4L, 4L,
4L, 5L, 2L, 2L, 2L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 3L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L, 9L,
10L, 10L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), class = "factor")), .Names = c("Agg52WkPrceRange",
"OfResidualPntReturn52CWk", "IndependentVariableBinned"), row.names = 28653:28702, class = "data.frame")
# secondary data frame
meansByIndependentVariableBin = ddply( myDF , .( IndependentVariableBinned ) , function( df ) mean( df[[ "OfResidualPntReturn52CWk" ]] ) )
# construct the plot
thePlot = ggplot( myDF , aes_string( x = "IndependentVariableBinned" , y = "OfResidualPntReturn52CWk" ) )
thePlot = thePlot + geom_point( data = meansByIndependentVariableBin , aes( x = IndependentVariableBinned , y = V1 ) )
thePlot = thePlot + geom_line( data = meansByIndependentVariableBin , aes( x = IndependentVariableBinned , y = V1 , group = 1 ) )
thePlot = thePlot + geom_ribbon( data = meansByIndependentVariableBin , aes( group = 1 , x = IndependentVariableBinned , ymin = V1 - 1 , ymax = V1 + 1 ) )
# print - error!
print( thePlot )
I've tried with/without group=1. The error is:
Error in eval(expr, envir, enclos) :
object 'OfRelStrength52CWk' not found
but I'm not sure how that is relevant. I must be missing something obvious. Take away the last geom (ribbon) and it plots just fine!

There is no bug in geom_ribbon. The error arises because you map y = OfResidualPntReturn52CWk in your ggplot call, and every layer inherits that mapping unless it overrides it. geom_ribbon does not override y, so it looks for that column in the data frame you pass to it (meansByIndependentVariableBin), which does not contain it; hence the error. Note also that, although you map y = OfResidualPntReturn52CWk in the ggplot call, no layer in your plot actually uses it, so it is immaterial to the plot.
Here is how to do it correctly (if I am understanding what you intend to do in this plot)
MIVB = meansByIndependentVariableBin
thePlot = ggplot(myDF , aes(x = IndependentVariableBinned)) +
geom_point(aes(y = OfResidualPntReturn52CWk)) +
geom_point(data = MIVB, aes(y = V1), colour = 'red') +
geom_line(data = MIVB , aes(y = V1, group = 1), colour = 'red') +
geom_ribbon(data = MIVB, aes(group = 1, ymin = V1 - 1 , ymax = V1 + 1),
alpha = 0.2)
Here is the output it produces
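Another way around the inherited y mapping, sketched here with small made-up data frames (raw, means and their column names are hypothetical, not from the question), is to tell the ribbon layer not to inherit the plot-level aesthetics at all by passing inherit.aes = FALSE:

```r
library(ggplot2)

# Hypothetical data: raw observations and per-bin means (all names are made up).
raw <- data.frame(
  bin = rep(1:3, each = 4),
  ret = c(0.20, 0.22, 0.25, 0.17, 0.10, 0.06, 0.08, 0.02, -0.05, -0.01, 0.03, -0.13)
)
means <- aggregate(ret ~ bin, data = raw, FUN = mean)
names(means)[2] <- "avg"

p <- ggplot(raw, aes(x = bin, y = ret)) +
  geom_point() +
  # inherit.aes = FALSE stops this layer from picking up y = ret,
  # a column that does not exist in `means`.
  geom_ribbon(data = means,
              aes(x = bin, ymin = avg - 0.1, ymax = avg + 0.1),
              inherit.aes = FALSE, alpha = 0.2)

# Data actually used by the ribbon layer (layer 2): one row per bin.
ribbon_data <- layer_data(p, 2)
```

With inherit.aes = FALSE the layer has to state every aesthetic it needs (here x, ymin and ymax), which is why x is repeated inside the ribbon's aes().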
Here is another way to do it, without computing the means in advance. I have also used mean ± standard error for the ribbon, as I find the choice of ± 1 arbitrary.
myDF$IndependentVariableBinned = as.numeric(myDF$IndependentVariableBinned)
thePlot = ggplot(myDF , aes(x = IndependentVariableBinned, y =
OfResidualPntReturn52CWk)) +
geom_point() +
geom_point(stat = 'summary', fun.y = 'mean', colour = 'red') +
geom_line(stat = 'summary', fun.y = 'mean', colour = 'red') +
geom_ribbon(stat = 'summary', fun.data = 'mean_se', alpha = 0.2)
This produces

@Ramnath is spot on. Your initial call to ggplot is not needed, as all of the layers you are plotting come from the summarized data.frame made by ddply(). You can also simplify your call to ddply() by using the summarize function:
meansByIndependentVariableBin2 = ddply( myDF , .( IndependentVariableBinned )
, summarize, means = mean(OfResidualPntReturn52CWk) )
I would then plot your graph as such:
ggplot(meansByIndependentVariableBin2, aes(x = as.numeric(IndependentVariableBinned), y = means)) +
geom_ribbon(aes(ymin = (means - 1), ymax = (means + 1)), alpha = .4) +
geom_point() +
geom_line()
Is that what you had in mind? I added an alpha to the ribbon layer so we can see the lines and points clearly.
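If a data-driven band is preferred over the fixed ± 1, the standard error can be computed in the same summary step. A minimal base-R sketch, assuming a made-up data frame df with hypothetical columns bin and ret:

```r
# Hypothetical example data: two bins with three returns each.
df <- data.frame(
  bin = factor(c("1", "1", "1", "2", "2", "2")),
  ret = c(0.1, 0.2, 0.3, 0.0, 0.2, 0.4)
)

# Standard error of the mean: sd / sqrt(n).
se <- function(x) sd(x) / sqrt(length(x))

# One row per bin with the mean, its standard error, and the band limits.
summary_df <- data.frame(
  bin   = levels(df$bin),
  means = as.numeric(tapply(df$ret, df$bin, mean)),
  se    = as.numeric(tapply(df$ret, df$bin, se))
)
summary_df$ymin <- summary_df$means - summary_df$se
summary_df$ymax <- summary_df$means + summary_df$se
```

The ymin and ymax columns can then be mapped directly in geom_ribbon in place of means - 1 and means + 1.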

Related

How to prevent R from alphabetically ranking data in ggplot and specify the order in which data is plotted (Data + Code + Graphs provided)?

I'm trying to fix an issue with my GGBalloonPlot graph with regards to how R processes the axis labels.
By default R plots the data using the labels ranked in reverse alphabetical order, but to reveal the pattern of the data, the data need to be plotted in a specific order. The only way I've been able to trick the software is by manually adding a prefix to each label in my .csv table so that R would rank them properly in my output. This is time consuming since I need to manually order the data first before adding the prefix and then plotting.
I would like to input a character vector (or something like that) which would essentially specify the order in which I want to have the data plotted which would reveal the pattern without the need for a prefix in the label name.
I have made some attempts with "scale_y_discrete" without success. I would also like to do the same thing for the X axis since I've had to use the same "trick" to display the columns in the proper non-alphabetical order which offsets the position of the labels. Any idea on how to get GGplot to display my values as seen in the graph without having to "trick" the software since this is quite time consuming ?
Data + Code
#Assign data to "Stack_Overflow_DummyData"
Stack_Overflow_DummyData <- structure(list(Species = structure(c(8L, 3L, 1L, 5L, 6L, 2L,
7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L,
7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L), .Label = c("Ani", "Cal",
"Can", "Cau", "Fis", "Ort", "Sem", "Zan"), class = "factor"),
Species_prefix = structure(c(8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 8L, 7L, 6L, 5L, 4L, 3L,
2L, 1L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("ac.Cau",
"ad.Sem", "af.Cal", "ag.Ort", "as.Fis", "at.Ani", "be.Can",
"bf.Zan"), class = "factor"), Dist = structure(c(2L, 3L,
5L, 2L, 1L, 1L, 4L, 5L, 2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L, 2L,
3L, 5L, 2L, 1L, 1L, 4L, 5L, 2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L
), .Label = c("End", "Ind", "Pan", "Per", "Wid"), class = "factor"),
Region = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Cen", "Col",
"Far", "Nor"), class = "factor"), Region_prefix = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("a.Far", "b.Nor", "c.Cen", "d.Col"), class = "factor"),
Frequency = c(75, 50, 25, 50, 0, 0, 0, 0, 11.1, 22.2, 55.6,
55.6, 11.1, 0, 5.6, 0, 0, 2.7, 36.9, 27.9, 65.8, 54.1, 37.8,
28.8, 0, 0, 0, 3.1, 34.4, 21.9, 78.1, 81.3)), class = "data.frame", row.names = c(NA,
-32L))
# Plot Data With Prefix Trick
library(ggplot2)
library(ggpubr)
# make color base on Dist, size and alpha dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
# Add Frequency Values Next to the circles
# Plot Data Without Prefix Trick
library(ggplot2)
library(ggpubr)
# make color base on Dist, size and alpha dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
# Add Frequency Values Next to the circles
Here below are the graphs
Good Graph.
Using the label prefix trick with the visible pattern in the data:
Wrong Graph (R default).
Without the prefix trick when GGplot automatically orders the data/labels and the graph makes no sense:
To sum up, I would like the Good graph output without having to have to previously add a prefix in my labels.
Many Thanks in advance for your help.
For the axis labels, I would first define a function that strips the prefix from each break label:
shlab <- function(lbl_brk){
sub("^[a-z]+\\.","",lbl_brk) # removes the starts of strings as a. or ab.
}
Then, to change the labels, use scale_x_discrete and scale_y_discrete with labels = shlab. The help page for scale_x_discrete notes that one of the options for labels is "a function that takes the breaks as input and returns labels as output".
For the colours, it is enough to change the values in scale_fill_manual; for the legend key sizes, use guides, like so:
library(ggplot2)
library(ggpubr)
shlab <- function(lbl_brk){
sub("^[a-z]+\\.","",lbl_brk)
}
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix", size = "Frequency", size.range = c(1, 9), fill = "Dist") +
scale_x_discrete(labels = shlab) +
scale_y_discrete(labels = shlab) +
scale_fill_manual(values = c("green", "blue", "red", "black", "white")) +
guides(fill = guide_legend(override.aes = list(size=8))) +
theme_set(theme_gray() + theme(legend.key=element_blank())) + # Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) + # Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4) # Add Frequency Values Next to the circles
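To check what the labelling function does on its own, here is a small base-R sketch that applies shlab to a few of the prefixed labels from the question, plus one label without a prefix (the input vector is chosen only for illustration):

```r
# Same helper as above: drop a leading run of lowercase letters followed by a dot.
shlab <- function(lbl_brk) {
  sub("^[a-z]+\\.", "", lbl_brk)
}

prefixed <- c("ac.Cau", "bf.Zan", "a.Far", "Zan")
stripped <- shlab(prefixed)
print(stripped)
```

Labels without a matching prefix pass through unchanged, so the function is safe to use on a mix of prefixed and plain labels.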
UPDATE:
With the new dataset and vector labels:
library(ggplot2)
library(ggpubr)
# make color base on Dist, size and alpha dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
scale_y_discrete(limits = c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")) +
scale_x_discrete(limits = c("Far", "Nor", "Cen", "Col")) +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
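An alternative to passing limits to the scales is to set the factor levels on the column itself, since ggplot2 orders a discrete axis by the levels of the factor. A minimal base-R sketch of that reordering (the vector names species, f_default, wanted and f_ordered are made up for the example):

```r
# Species labels in the order they appear in the data.
species <- c("Zan", "Can", "Ani", "Fis", "Ort", "Cal", "Sem", "Cau")

# Default: levels are sorted alphabetically.
f_default <- factor(species)

# Explicit: levels follow the order given in `wanted`.
wanted <- c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")
f_ordered <- factor(species, levels = wanted)

print(levels(f_default))
print(levels(f_ordered))
```

Assigning the result back to the data frame column (for example with Stack_Overflow_DummyData$Species <- factor(Stack_Overflow_DummyData$Species, levels = wanted)) would then fix the order for every plot built from it, without any prefix in the labels.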

New error when producing boxplot

So I had this script working yesterday on a different data set, and it actually worked once on this data set, but when I tried to combine it with another figure using plot_grid, I got this error:
Error:
T_SHOW_BACKTRACE environmental variable.
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
Now when I try to construct the boxplot itself, I get the same error...
Here is my data:
dput(SUICMass)
structure(list(ChillTime = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("2", "4", "6", "24",
"27", "29", "31"), class = "factor"), Mass = c(1.2687, 1.5417,
1.6898, 1.7655, 2.413, 2.0333, 2.0824, 1.2676, 1.4916, 2.1585,
2.2453, 1.3624, 1.2951, 2.4209, 2.0804, 1.9227, 1.9032, 2.1063,
1.7601, 1.9905, 1.9837, 1.6312, 1.8567, 1.4433, 1.9369, 2.1029,
2.0265, 1.3212, 1.2971, 1.5823, 1.4759, 1.2745, 0.714, 1.5693,
1.7906, 1.8607, 1.8851, 1.9192, 1.6307, 1.4269, 1.7011, 0.8249,
1.7198, 1.3939, 1.394, 2.1527, 1.288, 1.4724, 1.5264, 1.6562,
1.5796, 1.4982, 1.2794, 1.6021, 0.6345, 2.4041, 2.0246, 1.8398,
1.349, 2.0156, 1.1563, 2.0462)), .Names = c("ChillTime", "Mass"
), row.names = c(NA, -62L), class = "data.frame")
Here is my code:
library(ggplot2)
library(multcompView)
library(plyr)
library(gridExtra)
library(cowplot)
## Box plot for Susans WMA population
SUICMass <- read.csv('SUICMass_Test_June_28_2017.csv', header = TRUE)
SUICMass$ChillTime <- factor(SUICMass$ChillTime, levels=c("2", "4", "6", "24", "27", "29", "31"))
generate_label_df <- function(SUICMassTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- SUICMassTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
SUICMassmodel=lm(SUICMass$Mass~SUICMass$ChillTime )
SUICMassANOVA=aov(SUICMassmodel)
# Tukey test to study each pair of treatment :
SUICMassTUKEY <- TukeyHSD(x=SUICMassANOVA, 'SUICMass$ChillTime', conf.level=0.95)
labels<-generate_label_df(SUICMassTUKEY , "SUICMass$ChillTime")#generate labels using function
names(labels)<-c('Letters','ChillTime')#rename columns for merging
SUICMassyvalue<-aggregate(.~ChillTime, data=SUICMass, max)# obtain letter position for y axis using the per-group maximum
SUICMassfinal<-merge(labels,SUICMassyvalue) #merge dataframes
SUICMassPlot <- ggplot(SUICMass, aes(x = ChillTime, y = Mass)) +
stat_boxplot(geom ='errorbar', width=.2) +
geom_blank() +
theme_bw() +
theme(panel.border = element_rect(fill=NA, colour = "black", size=0.75)) +
theme(axis.text.x = element_text(face="bold")) +
theme(axis.text.y = element_text(face="bold")) +
labs(x = 'Time (weeks)', y = 'Mass (g)') +
ggtitle(expression(atop(bold("Fresh Mass"), atop(italic("(Sarah's - UIC Colony)"))))) +
theme(plot.title = element_text(hjust = 0.5, vjust = -0.6, face='bold')) +
geom_boxplot(fill = 'dodgerblue1', stat = "boxplot") +
geom_text(data = SUICMassfinal, aes(x = ChillTime, y = Mass, label = Letters),vjust=-2,hjust=.5) +
scale_y_continuous(limit = c(0, 3.5))
I can't figure out what the issue is here, because sometimes I can get the script to work and other times not.

How do I make a dot plot with a continuous x-axis (ggplot2)?

I'm trying to create a vertically oriented double plot with a line plot above and dot plot below, with both on the same (continuous, date) x-axis. I've successfully placed the two plots on a common axis and finished the (upper) line plot, but when I try to change the (lower) dot plot's x-axis from categorical to continuous, all my dots bunch up in the middle of the plot.
I only include here my code for the dot plot for simplicity, but if it turns out I need to show you the full double plot, I can do that.
Here's a small subset of my data, then my code, as far as I've gotten with it:
data <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), .Label = c("11/11/2016", "12/16/2016", "12/2/2016", "12/23/2016"
), class = "factor"), factor = c(2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L
), temp = c(-19.85, -19.94, -20.77, -21.3, -21.71, -21.88, -22.03,
-22.74, -22.86, -18.88, -19.02, -19.22, -19.32, -19.32, -19.55,
-19.68, -20.23, -20.32, -21.37, -16.63, -19.01, -19.67, -20.47,
-21.14, -21.23, -23.01, -24.43, -24.61, -24.76, -15.9, -18.87,
-19.02, -19.16, -19.44, -19.62, -22.38, -24.37, -24.92, -26.9
)), .Names = c("date", "factor", "temp"), class = "data.frame", row.names = c(NA,
-39L))
library(ggplot2)
library(scales)
#format date and order date levels (the second line here gives me a warning, but seems to do what I want it to)..
data$date <- as.Date(data$date, "%m/%d/%Y")
data$date.chr <- factor(data$date, as.character(data$date))
data$date.chr <- as.Date(data$date.chr)
#now plot..
ggplot(data, aes(x = date.chr, fill = factor(factor), y = temp)) +
geom_dotplot(binaxis = 'y', stackdir = 'center', method = 'histodot', binwidth = 0.3, position=position_dodge(0.8)) +
scale_x_date(date_breaks = "2 weeks", labels = date_format("%e %b"), limits = as.Date(c("2016-11-04","2016-12-23"))) +
labs(title="", x="", y="response temp (°C)") +
theme_minimal() +
theme(axis.title.y = element_text(vjust=1)) +
theme(legend.position="top") +
guides(fill = guide_legend(override.aes = list(size=10)))
(My session info:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1)
Any suggestions how I can (dot) plot this data on a continuous x-axis? (again, so I can line it up with the date axis in a plot above it)
I'm not sure if this is what you are looking for, but let's see:
data$date <- as.Date(data$date, "%m/%d/%Y")
data$date.chr <- factor(data$date)
#create dummy variable to get both the position and "filling" right
data$datefact <- paste(data$factor,data$date.chr)
The trick here is to set the "group" argument in geom_dotplot to the dummy variable created before:
ggplot(data, aes(x = date, y = temp)) +
# geom_point() +
geom_dotplot(aes(x = date, group = datefact, fill = factor(factor)),binaxis = 'y',
stackdir = 'center',
method = 'histodot',
binwidth = 0.3)+
scale_x_date(date_breaks = "2 weeks", labels = date_format("%e %b"), limits = as.Date(c("2016-11-04","2016-12-23"))) +
labs(title="", x="", y="response temp (°C)") +
theme_minimal() +
theme(axis.title.y = element_text(vjust=1)) +
theme(legend.position="top") +
guides(fill = guide_legend(override.aes = list(size=10)))
giving:
Is this what you wanted ?
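To see what the helper column does, here is a minimal base-R sketch on a made-up four-row data frame d (not the question's data): pasting the factor and the date yields one distinct key per date and factor combination, which is what the group aesthetic needs so that each combination is stacked separately.

```r
# Made-up data: two dates, two factor values on each date.
d <- data.frame(
  date   = as.Date(c("2016-11-11", "2016-11-11", "2016-12-02", "2016-12-02")),
  factor = c(2L, 3L, 2L, 3L)
)

# One key per (factor, date) combination, as in the answer above.
d$datefact <- paste(d$factor, d$date)

# interaction() builds an equivalent grouping factor without string pasting.
d$grp <- interaction(d$factor, d$date, drop = TRUE)

print(d$datefact)
```

Either column can be mapped to group; the pasted string is simply easier to read when inspecting the data.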

ggplot barchart with grouped confidence interval

a <- structure(list(
X1 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
.Label = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"), class = "factor"),
X2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("A", "B", "C"), class = "factor"),
value = c(0.03508924, 0.03054929, 0.03820896, 0.18207091, 0.25985142, 0.03909991, 0.03079736,
0.41436334, 0.02957787, 0.03113289, 0.03239794, 0.1691519, 0.16368845, 0.0287741, 0.02443448,
0.33474091, 0.03283068, 0.02668754, 0.03597605, 0.17098721, 0.23048966, 0.0385765, 0.02597068, 0.36917749),
se = c(0.003064016, 0.003189752, 0.003301929, 0.006415592, 0.00825635, 0.003479607,
0.003195332, 0.008754099, 0.005594554, 0.006840959, 0.006098068, 0.012790908, 0.014176414,
0.006249045, 0.005659445, 0.018284739, 0.005051873, 0.004719352, 0.005487301, 0.011454206,
0.01290797, 0.005884275, 0.004738851, 0.014075813)),
.Names = c("X1", "X2", "value", "se"), class = "data.frame", row.names = c(NA, -24L))
I'm plotting the above data (kept in dataset "a"), and I can't get the confidence intervals to sit in the middle of each bar in the grouped chart. My attempts so far have only managed to put lines on the side of each bar, not in the middle as in the geom_errorbar help file. I've tried to manipulate the dodge parameters, but it only made it worse.
The chart needs to stay flipped over; in the code below I used geom_linerange, but geom_errorbar would be even better.
Another thing I haven't quite managed to do is to change the scale into whole numbers (without multiplying the original table).
I've used the code below on a<-a[1:16,] (the first two groups).
When I use the same code on the full table I get even worse results with the confidence intervals.
Would anyone be able to help? Many thanks in advance.
limits <- aes(ymax = value + se, ymin=value - se)
p<-ggplot(data = a, aes(x = X1, y =value))+
geom_bar(aes(fill=X2),position = "dodge") +
scale_x_discrete(name="")+
scale_fill_manual(values=c("grey80","black","red"))+
scale_y_continuous(name="%")+
theme(axis.text.y = element_text(face='bold'),
legend.position ="top",
legend.title=element_blank())+
coord_flip()
p + geom_linerange(limits)
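Try this:
# the question's data frame is called `a`; stat = "identity" makes geom_bar use the y values as given
p<-ggplot(data = a, aes(x = X1, y =value,fill= X2))+
geom_bar(stat = "identity", position=position_dodge(.9)) +
# use the same dodge width as the bars so each error bar sits in the middle of its bar
geom_errorbar(aes(ymax = value + 2* se, ymin=value,colour = X2),position=position_dodge(.9))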
p <- p + scale_x_discrete(name="")+
scale_fill_manual(values=c("grey80","black","red"))+
scale_y_continuous(name="%")+
theme(axis.text.y = element_text(face='bold'),
legend.position ="top",
legend.title=element_blank())+
coord_flip()
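A self-contained variant of the same idea, assuming a ggplot2 version that provides geom_col. The data frame d is made up in the shape of the question's data, the error bars are drawn as value ± se rather than the one-sided bars above, and the labels function that shows proportions as whole-number percentages is an addition not taken from the answer:

```r
library(ggplot2)

# Made-up data in the same shape as `a`: category, group, proportion, standard error.
d <- data.frame(
  X1    = factor(rep(c("V1", "V2"), times = 2)),
  X2    = factor(rep(c("A", "B"), each = 2)),
  value = c(0.18, 0.26, 0.17, 0.16),
  se    = c(0.006, 0.008, 0.013, 0.014)
)

# One dodge specification shared by both layers keeps them aligned.
dodge <- position_dodge(width = 0.9)

p <- ggplot(d, aes(x = X1, y = value, fill = X2)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = value - se, ymax = value + se),
                position = dodge, width = 0.25) +
  # Label proportions as whole-number percentages without changing the data.
  scale_y_continuous(name = "%", labels = function(x) x * 100) +
  coord_flip()

# x positions of the bars (layer 1) and the error bars (layer 2) after dodging.
bar_x <- as.numeric(layer_data(p, 1)$x)
err_x <- as.numeric(layer_data(p, 2)$x)
```

Comparing bar_x with err_x shows whether the two layers were dodged to the same positions; the width argument of geom_errorbar only changes the length of the caps, not where the bar is centred.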

visualize associations between two groups of data

Each data point has a pairing of A and B, and there are multiple entries in A and multiple entries in B, i.e. multiple syndromes and multiple diagnoses, although each data point has a single syndrome-diagnosis pair.
Examples, suggestions, or ideas much appreciated
Here's what the data is like. I want to see connections between values of A and B (how many GGs are linked to TTs, etc.). Both are nominal data types.
ID,A ,B
1,GG,TT
2,AA,SS
3,BB,XX
4,DD,SS
5,DD,TT
6,CC,XX
7,HH,ZZ
8,AA,TT
9,CC,RR
10,DD,ZZ
11,AA,XX
12,AA,TT
13,DD,SS
14,DD,XX
15,AA,YY
16,CC,ZZ
17,FF,SS
18,FF,XX
19,BB,VV
20,GG,VV
21,GG,SS
22,AA,RR
23,AA,TT
24,AA,SS
25,CC,VV
26,CC,TT
27,FF,RR
28,GG,UU
29,CC,TT
30,BB,ZZ
31,II,TT
32,FF,RR
33,BB,SS
34,GG,YY
35,FF,RR
36,BB,VV
37,II,RR
38,CC,YY
39,FF,VV
40,AA,XX
41,AA,ZZ
42,GG,VV
43,BB,UU
44,II,UU
45,II,SS
46,DD,SS
47,AA,UU
48,BB,VV
49,GG,TT
50,BB,TT
Since your data is bipartite, I would suggest plotting points in the first factor on one side, points in the other factor on the other, with lines between them, like so:
The code I used to generate this was:
## Make up data.
data <- data.frame(X1=sample(state.region, 10),
X2=sample(state.region, 10))
## Set up plot window.
plot(0, xlim=c(0,1), ylim=c(0,1),
type="n", axes=FALSE, xlab="", ylab="")
factor.to.int <- function(f) {
(as.integer(f) - 1) / (length(levels(f)) - 1)
}
segments(factor.to.int(data$X1), 0, factor.to.int(data$X2), 1,
col=data$X1)
axis(1, at = seq(0, 1, by = 1 / (length(levels(data$X1)) - 1)),
labels = levels(data$X1))
axis(3, at = seq(0, 1, by = 1 / (length(levels(data$X2)) - 1)),
labels = levels(data$X2))
This is what I do. A darker colour indicates a more important combination of A and B.
dataset <- data.frame(A = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE), B = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
library(ggplot2)
ggplot(Counts, aes(x = A, y = B, fill = Freq)) + geom_tile() + scale_fill_gradient(low = "white", high = "black")
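The counting step can be checked on data in the question's own format. A minimal base-R sketch using the first eight rows of the posted table (the names pairs_df and observed are made up for the example):

```r
# First eight A/B pairs from the question's data.
pairs_df <- data.frame(
  A = c("GG", "AA", "BB", "DD", "DD", "CC", "HH", "AA"),
  B = c("TT", "SS", "XX", "SS", "TT", "XX", "ZZ", "TT"),
  stringsAsFactors = TRUE
)

# Cross-tabulate and flatten to one row per (A, B) combination with its count.
Counts <- as.data.frame(with(pairs_df, table(A, B)))

# Keep only the combinations that actually occur.
observed <- Counts[Counts$Freq > 0, ]
print(observed)
```

Counts contains every possible combination, including those with Freq equal to 0, which is what geom_tile needs for a complete grid; observed is the shorter table to use when only existing links should be drawn.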
Or if you prefer lines
library(ggplot2)
dataset <- data.frame(A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), B = sample(letters[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend, size = Freq)) +
geom_segment() + scale_x_continuous(breaks = 0:1, labels = c("A", "B")) +
scale_y_continuous(breaks = 1:5, labels = letters[1:5])
This third option adds labels to the data points using geom_text().
library(ggplot2)
dataset <- data.frame(
A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE),
B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
)
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend)) +
geom_segment(aes(size = Freq)) +
scale_x_continuous(breaks = 0:1, labels = c("A", "B")) +
scale_y_continuous(breaks = -1) +
geom_text(aes(x = X, y = Y, label = A), colour = "red", size = 7, hjust = 1, vjust = 1) +
geom_text(aes(x = Xend, y = Yend, label = B), colour = "red", size = 7, hjust = 0, vjust = 0)
Maybe mosaicplot:
X <- structure(list(
ID = 1:50,
A = structure(c(6L, 1L, 2L, 4L, 4L, 3L, 7L, 1L, 3L, 4L, 1L, 1L, 4L, 4L, 1L, 3L, 5L, 5L, 2L, 6L, 6L, 1L, 1L, 1L, 3L, 3L, 5L, 6L, 3L, 2L, 8L, 5L, 2L, 6L, 5L, 2L, 8L, 3L, 5L, 1L, 1L, 6L, 2L, 8L, 8L, 4L, 1L, 2L, 6L, 2L), .Label = c("AA","BB", "CC", "DD", "FF", "GG", "HH", "II"), class = "factor"),
B = structure(c(3L, 2L, 6L, 2L, 3L, 6L, 8L, 3L, 1L, 8L, 6L, 3L, 2L, 6L, 7L, 8L, 2L, 6L, 5L, 5L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 4L, 3L, 8L, 3L, 1L, 2L, 7L, 1L, 5L, 1L, 7L, 5L, 6L, 8L, 5L, 4L, 4L, 2L, 2L, 4L, 5L, 3L, 3L), .Label = c("RR", "SS", "TT", "UU", "VV", "XX", "YY", "ZZ"), class = "factor")
), .Names = c("ID", "A", "B"), class = "data.frame", row.names = c(NA, -50L)
)
mosaicplot(with(X,table(A,B)))
For your example dataset:
Thanks! I think that the connectivity between elements in each class is best visualized by the link graph examples given by both Jonathon and Thierry. Thierry's second example, which shows the magnitude, is definitely where I will start.
update
Thanks everyone for your ideas and tips!
I came across the bipartite package, which has functions to visualize this kind of data. I think it's a clean visualization of the relationships I am trying to show.
Here is what I did:
library(bipartite)
dataset <- data.frame(
A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE),
B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
)
datamat <- as.matrix(table(dataset$A, dataset$B))
visweb(datamat, text = "interaction", textsize = .8)
giving:
visweb result
(I couldn't put the image in as a new user.)
