This is related to another question: Plot weighted frequency matrix.
I have this graphic (produced by the code below in R):
#Set the number of bets and number of trials and % lines
numbet <- 36
numtri <- 1000
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)
I very much like the way that this plot is built up and shows the more frequent paths as darker than the rarer paths (but it is not clear enough for a print presentation). What I would like to do is to produce some kind of hexbin or heatmap for the numbers. On thinking about it, it seems that the plot will have to incorporate different sized bins (see my back of the envelope sketch):
My question then: If I simulate a million runs using the code above, how can I present it as a heatmap or hexbin, with the different sized bins as shown in the sketch?
To clarify: I do not want to rely on transparency to show the rarity of a trial passing through a part of the plot. Instead I would like to denote rarity with heat and show a common pathway as hot (red) and a rare pathway as cold (blue). Also, I do not think the bins should be the same size because the first trial has only two places where the path can be, but the last has many more. Hence the fact I chose a changing bin scale, based on that fact. Essentially I am counting the number of times a path passes through the cell (2 in col 1, 3 in col 2 etc) and then colouring the cell based on how many times it has been passed through.
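For illustration, the counting described above (how many paths pass through each cell) could be sketched as follows. This is only a sketch under assumptions: it uses rbinom in place of sample, keeps the 1/6 win probability from the first code block, and the names wins and counts are made up for the example.

```r
# Sketch: count how many simulated paths pass through each (bet, wins) cell.
set.seed(1)          # arbitrary seed, only for reproducibility
numbet <- 36
numtri <- 1000

# wins[b, t] = number of wins after b bets in trial t
wins <- replicate(numtri, cumsum(rbinom(numbet, size = 1, prob = 1/6)))

# counts[b, k + 1] = number of trials with exactly k wins after b bets
counts <- sapply(0:numbet, function(k) rowSums(wins == k))

# Bet b can only occupy the first b + 1 cells (0 to b wins)
counts[1:3, 1:4]
```

Each row of counts sums to numtri, and row b has at most b + 1 non-zero cells, which is the "2 in col 1, 3 in col 2" structure of the sketch.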
UPDATE: I already had a plot similar to @Andrie's, but I am not sure it is much clearer than the top plot. It is the discontinuous nature of this graph that I do not like (and why I want some kind of heatmap). I think that because the first column has only two possible values, there should not be a huge visual gap between them, and so on. Hence why I envisaged the different sized bins. I still feel that the binning version would show a large number of samples better.
Update: This website outlines a procedure to plot a heatmap:
To create a density (heatmap) plot version of this we have to effectively enumerate the occurrence of these points at each discrete location in the image. This is done by setting up a grid and counting the number of times a point coordinate "falls" into each of the individual pixel "bins" at every location in that grid.
Perhaps some of the information on that website can be combined with what we have already?
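A rough sketch of that grid-counting procedure applied to the simulated paths, using equal-sized bins, might look like this. The 20 vertical bins, the seed and the blue-to-red palette are assumptions chosen for the example, not something taken from the website.

```r
# Sketch: bin the relative frequencies on a regular grid and draw the counts.
set.seed(1)
numbet <- 36
numtri <- 1000

# relfreq[b, t] = relative frequency of wins after b bets in trial t
relfreq <- replicate(numtri,
                     cumsum(rbinom(numbet, size = 1, prob = 1/6)) / seq_len(numbet))

ybreaks <- seq(0, 1, length.out = 21)                 # 20 equal-height bins
ybin <- cut(as.vector(relfreq), breaks = ybreaks, include.lowest = TRUE)
bet  <- rep(seq_len(numbet), times = numtri)          # matches column-major order

# grid[b, j] = number of paths falling into vertical bin j at bet b
grid <- table(bet, ybin)
z <- matrix(as.vector(grid), nrow = numbet)

ymid <- (ybreaks[-1] + ybreaks[-length(ybreaks)]) / 2
image(x = seq_len(numbet), y = ymid, z = z,
      col = colorRampPalette(c("blue", "red"))(50),   # cold = rare, hot = common
      xlab = "Bet", ylab = "Relative frequency")
```

With equal-sized bins the early bets leave most cells empty, which is the visual gap that different sized bins are meant to avoid.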
Update: I took some of what Andrie wrote with some of this question, to arrive at this, which is quite close to what I was conceiving:
numbet <- 20
numtri <- 100
prob=1/6
#Fill a matrix
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(1-prob, prob), replace = TRUE) # 1 = win, with probability prob
xcum[i, ] <- c(i, cumsum(x)/(1:numbet)) # relative frequency after each bet
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))
mxcum <- reshape(data.frame(xcum), varying=1+1:numbet,
idvar="trial", v.names="outcome", direction="long", timevar="bet")
#from the other question
require(MASS)
dens <- kde2d(mxcum$bet, mxcum$outcome)
filled.contour(dens)
I don't quite understand what's going on, but this seems to be more like what I wanted to produce (obviously without the different sized bins).
Update: This is similar to the other plots here. It is not quite right:
library(hexbin)
plot(hexbin(x=mxcum$bet, y=mxcum$outcome))
Last try. As above:
image(mxcum$bet, mxcum$outcome)
This is pretty good. I would just like it to look like my hand-drawn sketch.
Edit
I think the following solution does what you ask for.
(Note that this is slow, especially the reshape step)
numbet <- 32
numtri <- 1e5
prob=5/6
#Fill a matrix
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
xcum[i, ] <- c(i, cumsum(x)/(1:numbet)) # relative frequency after each bet
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))
mxcum <- reshape(data.frame(xcum), varying=1+1:numbet,
idvar="trial", v.names="outcome", direction="long", timevar="bet")
library(plyr)
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
mxcum3 <- ddply(mxcum2, .(bet), summarize,
ymin=c(0, head(seq_along(V1)/length(V1), -1)),
ymax=seq_along(V1)/length(V1),
fill=(V1/sum(V1)))
head(mxcum3)
library(ggplot2)
p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) +
geom_rect(aes(fill=fill), colour="grey80") +
scale_fill_gradient(name="Outcome", labels=scales::percent, low="blue", high="red") +
scale_y_continuous(labels=scales::percent) +
xlab("Bet")
print(p)
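Since the simulation and the reshape step are the slow parts, one possible shortcut is to skip the simulation and colour the same cells by their exact binomial probabilities. This swaps in a different technique (dbinom rather than counting simulated paths), so treat it as a sketch. It assumes a win probability of 1/6 and uses base graphics instead of ggplot2; cells and pal are illustrative names.

```r
# Sketch: one rectangle per (bet, wins) cell, coloured by exact probability.
numbet <- 32
p <- 1/6             # assumed probability of a win

cells <- do.call(rbind, lapply(seq_len(numbet), function(b) {
  k <- 0:b                                  # possible number of wins after b bets
  data.frame(bet  = b,
             wins = k,
             ymin = k / (b + 1),            # b + 1 equal-height cells per bet
             ymax = (k + 1) / (b + 1),
             prob = dbinom(k, size = b, prob = p))
}))

pal <- colorRampPalette(c("blue", "red"))(100)        # cold = rare, hot = common
shade <- pal[1 + floor(99 * cells$prob / max(cells$prob))]

plot(NA, xlim = c(0.5, numbet + 0.5), ylim = c(0, 1),
     xlab = "Bet", ylab = "Cell position within bet")
rect(cells$bet - 0.5, cells$ymin, cells$bet + 0.5, cells$ymax,
     col = shade, border = "grey80")
```

For a large number of trials the simulated cell frequencies should approach these probabilities, so this can also serve as a check on the simulated plot.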
FYI: This is more of an extended comment than an answer.
To me, this new plot looks like a stacked bar where each bar's height is equal to the intersection points of the upper and lower line at the next trial.
The way that I would approach this is to treat "Trials" as a categorical variable. Then we can search each row of xcum for elements that are equal. If they are, then we can consider this to be a point of intersection whose minima also represents the multiple defining the height of our bars.
x <- t(xcum)
x <- x[duplicated(x),]
x[x==0] <- NA
Now we have the multiples of the actual points, we need to figure out how to take it to the next step and find a way of binning the information. That means we need to make a decision about how many points will represent each grouping. Let's write some points out for posterity.
Trial 1 (2) = 1, 0.5 # multiple = 0.5
Trial 2 (3) = 1, 0.66, 0.33 # multiple = 0.33
Trial 3 (4) = 1, 0.75, 0.5, 0.25 # multiple = 0.25
Trial 4 (5) = 1, 0.8, 0.6, 0.4, 0.2 # multiple = 0.2
Trial 5 (6) = 1, 0.8333335, 0.6666668, 0.5000001, 0.3333334, 0.1666667
...
Trial 36 (35) = 1, 0.9722223, ..., 0.02777778 # multiple = 0.05555556 / 2
In other words, for each Trial there are n-1 points to plot. In your drawing you have 7 bins. So we need to figure out the multiples for each bin.
Let's cheat and divide the last two columns by two; we know from visual inspection that the minimum is lower than 0.05.
x[,35:36] <- x[,35:36] / 2
Then find the minimum of each column:
x <- apply(x, 2, function(col) min(col, na.rm=TRUE))[-1] # Drop the 1
x <- x[c(1,2,3,4,8,17,35)] # I'm just guessing here by the "look" of your drawing.
The clearest way to do this is to create each bin separately. Obviously, this could be done automatically later. Remembering that each point is
bin1 <- data.frame(bin = rep("bin1",2), Frequency = rep(x[1],2))
bin2 <- data.frame(bin = rep("bin2",3), Frequency = rep(x[2],3))
bin3 <- data.frame(bin = rep("bin3",4), Frequency = rep(x[3],4))
bin4 <- data.frame(bin = rep("bin4",5), Frequency = rep(x[4],5))
bin5 <- data.frame(bin = rep("bin5",9), Frequency = rep(x[5],9))
bin6 <- data.frame(bin = rep("bin6",18), Frequency = rep(x[6],18))
bin7 <- data.frame(bin = rep("bin7",36), Frequency = rep(x[7],36))
df <- rbind(bin1,bin2,bin3,bin4,bin5,bin6,bin7)
ggplot(df, aes(bin, Frequency, color=Frequency)) + geom_bar(stat="identity", position="stack")
Related
Let's say I have a histogram with two overlapping groups. Here's a possible command from ggplot2 and a pretend output graph.
ggplot(data, aes(x=Variable1, fill=BinaryVariable)) + geom_histogram(position="identity")
So what I have is the frequency or count of each event. What I'd like to do instead is to get the difference between the two events in each bin. Is this possible? How?
For example, if we do RED minus BLUE:
Value at x=2 would be ~ -10
Value at x=4 would be ~ 40 - 200 = -160
Value at x=6 would be ~ 190 - 25 = 155
Value at x=8 would be ~ 10
I'd prefer to do this using ggplot2, but another way would be fine. My dataframe is set up with items like this toy example (dimensions are actually 25000 rows x 30 columns) EDITED: Here is example data to work with GIST Example
ID Variable1 BinaryVariable
1 50 T
2 55 T
3 51 N
.. .. ..
1000 1001 T
1001 1944 T
1002 1042 N
As you can see from my example, I'm interested in a histogram to plot Variable1 (a continuous variable) separately for each BinaryVariable (T or N). But what I really want is the difference between their frequencies.
So, in order to do this we need to make sure that the "bins" we use for the histograms are the same for both levels of your indicator variable. Here's a somewhat naive solution (in base R):
df = data.frame(y = c(rnorm(50), rnorm(50, mean = 1)),
x = rep(c(0,1), each = 50))
#full hist
fullhist = hist(df$y, breaks = 20) #specify more breaks than probably necessary
#create histograms for 0 & 1 using breaks from full histogram
zerohist = with(subset(df, x == 0), hist(y, breaks = fullhist$breaks))
oneshist = with(subset(df, x == 1), hist(y, breaks = fullhist$breaks))
#combine the hists
combhist = fullhist
combhist$counts = zerohist$counts - oneshist$counts
plot(combhist)
So we specify how many breaks should be used (based on values from the histogram on the full data), and then we compute the differences in the counts at each of those breaks.
PS It might be helpful to examine what the non-graphical output of hist() is.
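As a small sketch of what that non-graphical output looks like and how the shared breaks line the counts up, the following uses made-up data in the same spirit as the example above; the object names are illustrative.

```r
# Sketch: inspect hist() output and difference two groups on shared breaks.
set.seed(2)
y0 <- rnorm(50)               # stand-in for the x == 0 group
y1 <- rnorm(50, mean = 1)     # stand-in for the x == 1 group

# Breaks computed once on the pooled data, then reused for both groups
brks <- hist(c(y0, y1), breaks = 20, plot = FALSE)$breaks
h0 <- hist(y0, breaks = brks, plot = FALSE)
h1 <- hist(y1, breaks = brks, plot = FALSE)

# The pieces of a "histogram" object that matter here
str(h0[c("breaks", "counts", "mids")])

# Per-bin difference in counts, one row per bin
data.frame(mid = h0$mids, diff = h0$counts - h1$counts)
```

Because both histograms use the same breaks, the two counts vectors have the same length and can be subtracted element by element.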
Here's a solution that uses ggplot as requested.
The key idea is to use ggplot_build to get the rectangles computed by stat_histogram. From that you can compute the differences in each bin and then create a new plot using geom_rect.
setup and create a mock dataset with lognormal data
library(ggplot2)
library(data.table)
theme_set(theme_bw())
n1<-500
n2<-500
k1 <- exp(rnorm(n1,8,0.7))
k2 <- exp(rnorm(n2,10,1))
df <- data.table(k=c(k1,k2),label=c(rep('k1',n1),rep('k2',n2)))
Create the first plot
p <- ggplot(df, aes(x=k,group=label,color=label)) + geom_histogram(bins=40) + scale_x_log10()
Get the rectangles using ggplot_build
p_data <- as.data.table(ggplot_build(p)$data[[1]])[,.(count,xmin,xmax,group)]
p1_data <- p_data[group==1]
p2_data <- p_data[group==2]
Join on the x-coordinates to compute the differences. Note that the y-values in the built data aren't the counts but the (stacked) y-coordinates of the first plot, which is why the count column is used here.
newplot_data <- merge(p1_data, p2_data, by=c('xmin','xmax'), suffixes = c('.p1','.p2'))
newplot_data <- newplot_data[,diff:=count.p1 - count.p2]
setnames(newplot_data, old=c('count.p1','count.p2'), new=c('k1','k2'))
df2 <- melt(newplot_data,id.vars =c('xmin','xmax'),measure.vars=c('k1','diff','k2'))
make the final plot
ggplot(df2, aes(xmin=xmin,xmax=xmax,ymax=value,ymin=0,group=variable,color=variable)) + geom_rect()
Of course the scales and legends still need to be fixed, but that's a different topic.
I posted the following question on https://stats.stackexchange.com/questions/117578/density-distribution-of-outcomes-of-2-dice-rolled but did not get any response.
This question is related to: A histogram with a bar for each frequency value
Two dice are rolled and the sum is plotted. The histogram is as expected, but the density graph shows different densities for 2 and 12 and the plot is asymmetric. Why is this so?
Amongst the outcomes of 2 dice rolled, the chances of 2 are equal to that of 12. Why is it that the density graph is of unequal values?
num.dices <- 2L
num.rolls <- 100000L
outcomes <- matrix(sample(1:6, num.dices * num.rolls, replace = TRUE),
nrow = num.rolls, ncol = num.dices)
sums <- rowSums(outcomes)
Histogram:
ggplot(data.frame(sums), aes(x=factor(sums)))+geom_histogram()
Density plot:
ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density()
I also tried:
ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density(aes(y = ..count..))
It seems to me that you are doing 11 different kernel density estimations with 11 different bandwidths. Instead do:
ggplot(data.frame(sums), aes(x=sums, fill=2))+geom_density()
or you could add group=1 if you insist on keeping the extra arguments:
ggplot(data.frame(sums), aes(x=sums, fill=factor(sums)))+geom_density(aes(group=1))
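As a quick check that the underlying distribution really is symmetric, the exact probabilities of the 11 possible sums can be tabulated, and a single kernel estimate on the numeric sums reports a single bandwidth. This is only a sketch; the 10,000 simulated rolls and the names pmf and rolls are assumptions for the example.

```r
# Sketch: exact distribution of the sum of two dice.
all_sums <- as.vector(outer(1:6, 1:6, "+"))   # all 36 equally likely outcomes
pmf <- table(all_sums) / 36
round(pmf, 3)                                  # P(2) equals P(12), P(3) equals P(11), ...

# One kernel density estimate on the numeric sums uses one bandwidth
set.seed(4)
rolls <- rowSums(matrix(sample(1:6, 2 * 1e4, replace = TRUE), ncol = 2))
d <- density(rolls)
d$bw
```

Mapping x to factor(sums) splits the data into one group per level, so each group gets its own estimate; keeping x numeric (or forcing group=1) avoids that.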
The specific example: imagine x is some continuous variable between 0 and 10, the red line is the distribution of "goods" and the blue line is the distribution of "bads". I'd like to see whether there is value in incorporating this variable into checking for 'goodness', but first I'd like to quantify the amount of stuff in the areas where blue > red.
Because this is a distribution chart, the scales look the same, but in reality there are 98 times more goods in my sample, which complicates things: it's not actually just measuring the area under the curve, but rather measuring the bad sample where its distribution is greater than the red.
I've been working to learn R, but am not even sure how to approach this one. Any help appreciated.
EDIT
sample data:
http://pastebin.com/7L3Xc2KU <- a few million rows of that, essentially.
the graph is created with
graph <- qplot(sample_x, bad_is_1, data=sample_data, geom="density", color=bad_is_1)
The only way I can think of to do this is to calculate the area between the curves using simple trapezoids. First we manually compute the densities
d0 <- density(sample_data$sample_x[sample_data$bad_is_1==0])
d1 <- density(sample_data$sample_x[sample_data$bad_is_1==1])
Now we create functions that will interpolate between our observed density points
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)
Next we find the x range of the overlap of the densities
ovrng <- c(max(min(d0$x), min(d1$x)), min(max(d0$x), max(d1$x)))
and divide that into 500 sections
i <- seq(min(ovrng), max(ovrng), length.out=500)
Now we calculate the distance between the density curves
h <- f0(i)-f1(i)
and using the formula for the area of a trapezoid we add up the area for the regions where d0>d1 (that is, where h>=0; flip the sign of h to get the regions where d1>d0 instead)
area<-sum( (h[-1]+h[-length(h)]) /2 *diff(i) *(h[-1]>=0))
# [1] 0.1957627
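For the case in the question (regions where the "bad" density exceeds the "good" one), a self-contained sketch of the same trapezoid idea could look like the following. The simulated data, its parameters and the names good, bad and excess are assumptions; clipping the difference at zero with pmax is a slight variation on the indicator used above.

```r
# Sketch: area between two density curves where the second exceeds the first.
set.seed(1)
good <- rnorm(5000, mean = 4, sd = 1)     # made-up "good" sample
bad  <- rnorm(500,  mean = 6, sd = 1.5)   # made-up "bad" sample

d0 <- density(good)
d1 <- density(bad)
f0 <- approxfun(d0$x, d0$y)               # interpolators over the density grids
f1 <- approxfun(d1$x, d1$y)

# x range where both densities are defined
lo <- max(min(d0$x), min(d1$x))
hi <- min(max(d0$x), max(d1$x))
xs <- seq(lo, hi, length.out = 500)

# Keep only the part of the gap where bad > good, then apply the trapezoid rule
excess <- pmax(f1(xs) - f0(xs), 0)
area <- sum((excess[-1] + excess[-length(excess)]) / 2 * diff(xs))
area
```

Both densities integrate to 1 here, so the result is a share of the "bad" distribution rather than a count; multiplying by the group size would turn it into an approximate number of observations.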
We can plot the region using
plot(d0, main="d0=black, d1=green")
lines(d1, col="green")
jj<-which(h>0 & seq_along(h) %% 5==0); j<-i[jj];
segments(j, f1(j), j, f1(j)+h[jj])
Here's a way to shade the area between two density plots and calculate the magnitude of that area.
# Create some fake data
set.seed(10)
dat = data.frame(x=c(rnorm(1000, 0, 5), rnorm(2000, 0, 1)),
group=c(rep("Bad", 1000), rep("Good", 2000)))
# Plot densities
# Use y=..count.. to get counts on the vertical axis
p1 = ggplot(dat) +
geom_density(aes(x=x, y=..count.., colour=group), lwd=1)
Some extra calculations to shade the area between the two density plots
(adapted from this SO question):
pp1 = ggplot_build(p1)
# Create a new data frame with densities for the two groups ("Bad" and "Good")
dat2 = data.frame(x = pp1$data[[1]]$x[pp1$data[[1]]$group==1],
ymin=pp1$data[[1]]$y[pp1$data[[1]]$group==1],
ymax=pp1$data[[1]]$y[pp1$data[[1]]$group==2])
# We want ymax and ymin to differ only when the density of "Good"
# is greater than the density of "Bad"
dat2$ymax[dat2$ymax < dat2$ymin] = dat2$ymin[dat2$ymax < dat2$ymin]
# Shade the area between "Good" and "Bad"
p1a = p1 +
geom_ribbon(data=dat2, aes(x=x, ymin=ymin, ymax=ymax), fill='yellow', alpha=0.5)
Here are the two plots:
To get the area (number of values) in specific ranges of Good and Bad, use the density function on each group (or you can continue to work with the data pulled from ggplot as above, but this way you get more direct control over how the density distribution is generated):
## Calculate densities for Bad and Good.
# Use same number of points and same x-range for each group, so that the density
# values will line up. Use a higher value for n to get a finer x-grid for the density
# values. Use a power of 2 for n, because the density function rounds up to the nearest
# power of 2 anyway.
bad = density(dat$x[dat$group=="Bad"],
n=1024, from=min(dat$x), to=max(dat$x))
good = density(dat$x[dat$group=="Good"],
n=1024, from=min(dat$x), to=max(dat$x))
## Normalize so that densities sum to number of rows in each group
# Number of rows in each group
counts = tapply(dat$x, dat$group, length)
bad$y = counts[1]/sum(bad$y) * bad$y
good$y = counts[2]/sum(good$y) * good$y
## Results
# Number of "Good" in region where "Good" exceeds "Bad"
sum(good$y[good$y > bad$y])
# [1] 1931.495 (out of 2000 total in the data frame)
# Number of "Bad" in region where "Good" exceeds "Bad"
sum(bad$y[good$y > bad$y])
# [1] 317.7315 (out of 1000 total in the data frame)
I am a new R user and am having trouble graphing some data in a bar plot. Sorry in advance if this is really easy to do, and I just can’t figure it out.
I have six sets of data: 3 data sets for car #1 at 1, 5, and 10 yrs, and 3 data sets for car #2 at 1, 5, and 10 yrs, where the measurements for each car at each age consist of 1) the total number of dents on the car's exterior and 2) the number of dents that remove paint. I want to make a bar plot with 6 bars, one for each car at each age, where the bar height is the number of dents that remove paint, with standard deviation bars.
Here’s what I’ve been trying so far (only 2 data sets included):
car1yr1 = c(rep(0, 101), rep(1, 9)) #car has 9 dents that remove paint
car1yr5 = c(rep(0, 131), rep(1, 19)) #car has 19 dents that remove paint
sd1 = sd(car1yr1)
sd2 = sd(car1yr5)
stdv = c(sd1, sd2)
car1yr1 = car1yr1[1:150]
dentsCar1 = data.frame("Car1Yr1" = car1yr1, "Car1Yr5" = car1yr5)
barplot(as.matrix(dentsCar1, ylim = c(0, 50), beside = TRUE))
I’ve found an example of error bars: arrows(bar, x, bar, x+ -(stdv), length = 0.15, angle = 90), but I can’t get this to work with my numbers. Also, in this example, the y-axis stops at 15, but the bars Car1Yr5 goes until 19. How can I draw a y-axis up to 20 or 30?
Again, I’m new at R and any help would be greatly appreciated. I’ve been trying to solve this on my own off and on for about 2 weeks. Thanks.
I am a little confused by your data... I am assuming from your example that car 1 at year 1 has 101 dents that did not remove paint and 9 that did, and car 1 at year 5 has 131 that did not and 19 that did.
Now calculating the standard deviation on the number of dents does not make much sense to me... you are plotting count data, so you should not have any standard deviation unless you have, say, many cars of the same model and you want to see the variability between cars.
The best thing to do would be to calculate the % of dents that removed paint by doing:
car1yr1 = c(rep(0, 101), rep(1, 9)) #car has 9 dents that remove paint
car1yr5 = c(rep(0, 131), rep(1, 19)) #car has 19 dents that remove paint
# The total number of observations is the total number of dents
total.dents.1 <- length(car1yr1)
total.dents.5 <- length(car1yr5)
# The dents that remove paint are marked as 1, the others with 0,
# so we can just sum all of the data to get the number of paint-removing dents
dents.paint.1 <- sum(car1yr1)
dents.paint.5 <- sum(car1yr5)
# Alternatively you can use
# dents.paint.1 <- length(which(car1yr1==1))
# Calculate the %
dents.paint.perc.1 <- dents.paint.1/total.dents.1
dents.paint.perc.5 <- dents.paint.5/total.dents.5
df <- data.frame(dents.paint.perc.1, dents.paint.perc.5)
# Plot the data.
# ylim specifies the limits of the y axis
# ylab defines the axis title.
# las=1 puts the labels on the y axis horizontally
# names defines the labels on the x axis
barplot(as.matrix(df)*100, ylim=c(0,20),
ylab="% dents removing paint", las=1,
names=c("Car 1 year 1", "Car 1 year 5"))
In general it would be much better to put all your data in a single list, so that you can use the *apply family of functions to perform repetitive operations on all of your datasets. This will give you cleaner and more manageable code. Also, if you add more data it will automagically be added to the plot.
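A minimal sketch of that list-based approach, using the two data sets from the question (the list name dents and the labels are just illustrative choices):

```r
# Sketch: keep every data set in one named list and summarise with sapply.
dents <- list(
  "Car 1 year 1" = c(rep(0, 101), rep(1, 9)),    # 9 of 110 dents remove paint
  "Car 1 year 5" = c(rep(0, 131), rep(1, 19))    # 19 of 150 dents remove paint
)

# The mean of a 0/1 vector is the proportion of 1s, so this gives the percentages
pct <- sapply(dents, function(d) 100 * mean(d))

# The list names become the bar labels
barplot(pct, ylim = c(0, 20), ylab = "% dents removing paint", las = 1)
```

Adding another car or age is then just one more element in the list; the summary and the plot pick it up without further changes.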
I'm trying to get a fine-grain visualisation of critical values I got from posthoc Tukey. There are some good guidelines out there for visualizing pairwise comparisons, but I need something more refined. The idea is that I would have a plot where each small square would represent a critical value from the matrix below, coded in such manner that:
if the value is higher or equal to 5.45 - it's a black square;
if the value is lower or equal to -5.45 - it's a gray square;
if the value is between -5.45 and 5.45 - it's a white square.
The data matrix is here.
Or maybe you would have better suggestion how to visualize those critical values?
EDIT: Following comments from @Aaron and @DWin I want to provide a bit more context for the above data and justification for my question. I am looking at the mean ratings of acceptability for seven virtual characters, each of them animated on 5 different levels. So, I have two factors there - character (7 levels) and motion (5 levels). Because I have found an interaction between those two factors, I decided to look at differences between the means for all the characters for all levels of motion, which resulted in this massive matrix, as an output of posthoc Tukey. It's probably too much detail now, but please don't throw me out to Cross Validated, they will eat me alive...
This is fairly straightforward with image:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
image(x=1:35, y=1:35, as.matrix(d), breaks=c(min(d), -5.45, 5.45, max(d)),
col=c("grey", "white", "black"))
For just half, set half to missing with d[upper.tri(d)] <- NA and add na.rm=TRUE to the
min and max functions.
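Put together, the half-matrix variant might look like the sketch below. It uses a made-up antisymmetric 35 x 35 matrix as a stand-in for the real data file, so the values (and the resulting picture) are purely illustrative.

```r
# Sketch: show only the lower triangle, with NA-safe break points.
set.seed(3)
m <- matrix(rnorm(35 * 35, sd = 6), nrow = 35)
d <- m - t(m)                      # stand-in for the matrix of pairwise differences

d[upper.tri(d)] <- NA              # blank out the mirrored upper half

brks <- c(min(d, na.rm = TRUE), -5.45, 5.45, max(d, na.rm = TRUE))
image(x = 1:35, y = 1:35, z = d, breaks = brks,
      col = c("grey", "white", "black"))
```

Cells set to NA are simply not drawn by image, so the upper half stays blank.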
Here is a ggplot2 solution. I'm sure there are simpler ways to accomplish this -- I guess I got carried away!
library(ggplot2)
# Load data.
postH = read.table("~/Downloads/postH.dat")
names(postH) = paste("item", 1:35, sep="") # add column names.
postH$item_id_x = paste("item", 1:35, sep="") # add id column.
# Convert data.frame to long form.
data_long = reshape2::melt(postH, id.vars="item_id_x", variable.name="item_id_y")
# Convert to factor, controlling the order of the factor levels.
data_long$item_id_y = factor(as.character(data_long$item_id_y),
levels=paste("item", 1:35, sep=""))
data_long$item_id_x = factor(as.character(data_long$item_id_x),
levels=paste("item", 1:35, sep=""))
# Create critical value labels in a new column.
data_long$critical_level = ifelse(data_long$value >= 5.45, "high",
ifelse(data_long$value <= -5.45, "low", "middle"))
# Convert to labels to factor, controlling the order of the factor levels.
data_long$critical_level = factor(data_long$critical_level,
levels=c("high", "middle", "low"))
# Named vector for ggplot's scale_fill_manual
critical_level_colors = c(high="black", middle="white", low="grey80")
# Calculate grid line positions manually.
x_grid_lines = seq(0.5, length(levels(data_long$item_id_x)), 1)
y_grid_lines = seq(0.5, length(levels(data_long$item_id_y)), 1)
# Create plot.
plot_1 = ggplot(data_long, aes(xmin=as.integer(item_id_x) - 0.5,
xmax=as.integer(item_id_x) + 0.5,
ymin=as.integer(item_id_y) - 0.5,
ymax=as.integer(item_id_y) + 0.5,
fill=critical_level)) +
theme_bw() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
coord_cartesian(xlim=c(min(x_grid_lines), max(x_grid_lines)),
ylim=c(min(y_grid_lines), max(y_grid_lines))) +
scale_x_continuous(breaks=seq(1, length(levels(data_long$item_id_x))),
labels=levels(data_long$item_id_x)) +
scale_y_continuous(breaks=seq(1, length(levels(data_long$item_id_y))),
labels=levels(data_long$item_id_y)) +
scale_fill_manual(name="Critical Values", values=critical_level_colors) +
geom_rect() +
geom_hline(yintercept=y_grid_lines, colour="grey40", linewidth=0.15) +
geom_vline(xintercept=x_grid_lines, colour="grey40", linewidth=0.15) +
theme(axis.text.y=element_text(size=9)) +
theme(axis.text.x=element_text(size=9, angle=90)) +
ggtitle("Critical Values Matrix")
# Save to pdf file.
pdf("plot_1.pdf", height=8.5, width=8.5)
print(plot_1)
dev.off()
If you set this up with findInterval as an index into the bg, col, and/or pch arguments (although they are all squares at the moment), you should find the code fairly compact and understandable.
You'll need to get the data in long format first; here's one way:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
dat <- within(as.data.frame(as.table(d)),
{ Var1 <- as.numeric(Var1)
Var2 <- as.numeric(Var2) })
Then the code is as follows; pch=22 uses filled squares, bg sets the fill color of the square, col sets the border color, and cex=1.5 just makes them a little bigger than the default.
plot(dat$Var1, dat$Var2,
bg = c("grey", "white", "black")[1+findInterval(dat$Freq, c(-5.45,5.45))],
col="white", cex=1.5, pch = 22)
You need the 1+ in there because the values would be 0,1,2 and your indices need to start with 1.
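To make the indexing concrete, here is a tiny sketch with a few hand-picked values (the vector vals is made up for the example). Note that findInterval uses left-closed intervals by default, so a value of exactly -5.45 lands in the middle class rather than the low one.

```r
# Sketch: how findInterval turns values into colour indices.
vals <- c(-7, -5.45, 0, 5.45, 6)
idx <- findInterval(vals, c(-5.45, 5.45))
idx
# 0 1 1 2 2

# Shift by one so the result can index a vector of three colours
cols <- c("grey", "white", "black")[1 + idx]
cols
# "grey" "white" "white" "black" "black"
```

If values exactly equal to -5.45 should count as "low", the cut points (or the left.open argument of findInterval) would need adjusting.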
To close this out, I used most of the suggestions from @DWin and @Aaron to create the plot below. The lightest level of gray stands for non-significant values. I also used rect to draw lines above the axis names to better differentiate between conditions:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
#remove upper half of the values (as they are mirrored values)
d[upper.tri(d)] <- NA
dat <- within(as.data.frame(as.table(d)),{
Var1 <- as.numeric(Var1)
Var2 <- as.numeric(Var2)})
par(mar=c(6,3,3,6))
colPh=c("gray50","gray90","black")
plot(dat$Var1,dat$Var2,bg = colPh[1+findInterval(dat$Freq, c(-5.45,5.45))],
col="white",cex=1.2,pch = 21,axes=F,xlab="",ylab="")
labDis <- rep(c("A","B","C","D","E"),times=7)
labChar <- c(1:7)
axis(1,at=1:35,labels=labDis,cex.axis=0.5,tick=F,line=-1.4)
axis(1,at=seq(3,33,5),labels=labChar, tick=F)
#drawing lines above axis for better identification
segments(seq(1,31,5), 0, seq(5,35,5), 0) # one horizontal line per group of five
axis(4,at=1:35,labels=labDis,cex.axis=0.5,tick=F,line=-1.4)
axis(4,at=seq(3,33,5),labels=labChar,tick=F)
#drawing lines above axis for better identification
segments(36, seq(1,31,5), 36, seq(5,35,5)) # one vertical line per group of five
legend("topleft",legend=c("not significant","p<0.01","p<0.05"),pch=16,
col=c("gray90","gray50","black"),cex=0.7,bty="n")