I've made a quick example data frame for this below. Basically I want to create a line plot with the average value as a line and a shadow around that line representing the range of the values. I realise I'll likely have to find the row min/max, but I'm unsure how to do this for rows and also don't know how I would go about plotting it.
TEST <- data.frame(a=c(1,5,7,2), b=c(3,8,2,5), c=c(6,10,2,1))
TEST$mean <- rowMeans(TEST)
Any help appreciated - Thanks
It is probably easily done with base R too, but here's a ggplot approach.
Adding Min and Max and an index for the x axis:
TEST <- transform(TEST, Min = pmin(a, b, c), Max = pmax(a, b, c), indx = seq_len(nrow(TEST)))
Plotting, using geom_ribbon:
library(ggplot2)
ggplot(TEST) +
geom_line(aes(indx, mean), group = 1) +
geom_ribbon(aes(x = indx, ymax = Max, ymin = Min), alpha = 0.6, fill = "skyblue")
Just to add another option, here's a possible solution using only base R:
TEST <- data.frame(a=c(1,5,7,2), b=c(3,8,2,5), c=c(6,10,2,1))
# compute mean, min and max of rows
means <- rowMeans(TEST)
maxs <- apply(TEST,1,max)
mins <- apply(TEST,1,min)
# create x-coordinates
xcoords <- 1:nrow(TEST)
# create an empty plot to make space for everything
plot(x=c(min(xcoords),max(xcoords)),y=c(min(mins),max(maxs)),
type="n", main="Average",xlab="X",ylab="Y")
# add min-max ranges (color is DodgerBlue with 80/255 of opacity,
# for rgb values of colors see http://en.wikipedia.org/wiki/Web_colors)
rangecolor <- rgb(30,144,255,alpha=80,maxColorValue=255)
polygon(x=c(xcoords,rev(xcoords)),y=c(maxs,rev(means)),col=rangecolor,border=NA)
polygon(x=c(xcoords,rev(xcoords)),y=c(mins,rev(means)),col=rangecolor,border=NA)
# add average line (black)
meancolor <- "black"
lines(x=xcoords,y=means,col=meancolor)
Result:
For future reuse, you can also wrap it into a helpful function:
plotLineWithRange <- function(x, yVal, yMin, yMax,
                              lineColor = "Black", rangeColor = "LightBlue",
                              main = "", xlab = "X", ylab = "Y") {
  if (missing(x)) {
    x <- 1:length(yVal)
  }
  stopifnot(length(yVal) == length(yMin) && length(yVal) == length(yMax))
  plot(x = c(min(x), max(x)), y = c(min(yMin), max(yMax)),
       type = "n", main = main, xlab = xlab, ylab = ylab)
  polygon(x = c(x, rev(x)), y = c(yMax, rev(yVal)), col = rangeColor, border = NA)
  polygon(x = c(x, rev(x)), y = c(yMin, rev(yVal)), col = rangeColor, border = NA)
  lines(x = x, y = yVal, col = lineColor)
}
# usage example:
plotLineWithRange(yVal=means,yMin=mins,yMax=maxs,main="Average")
I have an object from ggplot2, say myPlot, how can I identify the ranges for the x and y axes?
It doesn't seem to be a simple multiple of the data values' range, because one can rescale plots, modify axes' ranges, and so on. findFn (from sos) and Google don't seem to be turning up relevant results, other than how to set the axes' ranges.
I am using ggplot2 version 2; I am not sure if this is the same as in previous versions.
Suppose you have saved your plot in the plt object.
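For a concrete object to test against, here is a minimal sketch (the data set is arbitrary):
library(ggplot2)
plt <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
It is then easy to extract the ranges: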
# y-range
layer_scales(plt)$y$range$range
# x-range
layer_scales(plt)$x$range$range
In the case of a facet plot, you can access the scales of individual facets using layer_scales(plot, row_idx, col_idx). For example, to access the facet at the first row and second column:
# y-range
layer_scales(plt, 1, 2)$y$range$range
# x-range
layer_scales(plt, 1, 2)$x$range$range
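A minimal self-contained sketch (data set and facet variable chosen arbitrarily; scales = "free_x" gives each column its own x scale):
library(ggplot2)
plt <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(. ~ drv, scales = "free_x")
layer_scales(plt, 1, 2)$x$range$range  # x range of the panel in row 1, column 2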
In newer versions of ggplot2, you can find this information among the output of ggplot_build(p), where p is your ggplot object.
For older versions of ggplot (< 0.8.9), the following solution works:
And until Hadley releases the new version, this might be helpful. If you do not set the limits in the plot, there will be no info in the ggplot object. However, in that case you can use the defaults of ggplot2 and get the xlim and ylim from the data (see the sketch after the example below).
> ggobj = ggplot(aes(x = speed, y = dist), data = cars) + geom_line()
> ggobj$coordinates$limits
$x
NULL
$y
NULL
Once you set the limits, they become available in the object:
> bla = ggobj + coord_cartesian(xlim = c(5,10))
> bla$coordinates$limits
$x
[1] 5 10
$y
NULL
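For the unset case, here is a rough sketch of recovering the default limits from the data yourself, assuming ggplot2's default 5% expansion for continuous scales (an assumption about the defaults, not part of the original answer):
xr <- range(cars$speed)
yr <- range(cars$dist)
xr + c(-1, 1) * 0.05 * diff(xr)  # approximate default x limits
yr + c(-1, 1) * 0.05 * diff(yr)  # approximate default y limits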
November 2018 UPDATE
As of ggplot2 version 3.1.0, the following works:
obj <- qplot(mtcars$disp, bins = 5)
# x range
ggplot_build(obj)$layout$panel_params[[1]]$x.range
# y range
ggplot_build(obj)$layout$panel_params[[1]]$y.range
A convenience function:
get_plot_limits <- function(plot) {
gb = ggplot_build(plot)
xmin = gb$layout$panel_params[[1]]$x.range[1]
xmax = gb$layout$panel_params[[1]]$x.range[2]
ymin = gb$layout$panel_params[[1]]$y.range[1]
ymax = gb$layout$panel_params[[1]]$y.range[2]
list(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)
}
get_plot_limits(obj)
Until the next update...
Get the y-range with
ggplot_build(myPlot)$panel$ranges[[1]]$y.range
and the x-range with
ggplot_build(myPlot)$panel$ranges[[1]]$x.range
In version 2.2.0 this has to be done as follows:
# y-range
ggplot_build(plot.object)$layout$panel_ranges[[1]]$y.range
# x-range
ggplot_build(plot.object)$layout$panel_ranges[[1]]$x.range
As of Aug 2018, you can extract the x and y axis ranges with the following.
ggplot_build(obj)$layout$panel_scales_x[[1]]$range$range
ggplot_build(obj)$layout$panel_scales_y[[1]]$range$range
As mentioned here: https://gist.github.com/tomhopper/9076152#gistcomment-2624958 there is a difference between the two options:
#get ranges of the data
ggplot_build(obj)$layout$panel_scales_x[[1]]$range$range
ggplot_build(obj)$layout$panel_scales_y[[1]]$range$range
#get ranges of the plot axis
ggplot_build(obj)$layout$panel_params[[1]]$x.range
ggplot_build(obj)$layout$panel_params[[1]]$y.range
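To see the difference concretely, a small sketch (the plot is arbitrary): the scale range is the raw data range, while the panel range includes the expansion padding ggplot2 adds around the data.
library(ggplot2)
obj <- ggplot(cars, aes(speed, dist)) + geom_point()
ggplot_build(obj)$layout$panel_scales_x[[1]]$range$range  # data range of x
ggplot_build(obj)$layout$panel_params[[1]]$x.range        # slightly wider axis range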
Here is a set of convenience functions to take a list of plots, extract the common y-axis range, and replace it. I needed it because I used different data sets within one graph arranged via ggarrange:
require(ggplot2)
# get the visible scales from single plots
get_plot_view_ylimits <- function(plot) {
  gb = ggplot_build(plot)
  ymin = gb$layout$panel_params[[1]]$y.range[1]
  ymax = gb$layout$panel_params[[1]]$y.range[2]
  message(paste("limits are:", ymin, ymax))
  list(ymin = ymin, ymax = ymax)
}
# change the limits of a single plot, using a list of limits
change_plot_ylimits <- function(plot, nlimits){
  p <- plot + ggplot2:::limits(unlist(nlimits, use.names = FALSE), "y")
}
# adjust the scales of multiple plots:
# takes a list of plots, passes back an adjusted list of plots
adjust_plots_shared_ylimits <- function(plotList) {
  # read limits
  first <- TRUE
  for (plot in plotList) {
    if (first) {
      nlimits <- get_plot_view_ylimits(plot)
      first <- FALSE
    } else {
      altLimits <- get_plot_view_ylimits(plot)
      nlimits$ymin <- min(nlimits$ymin, altLimits$ymin)
      nlimits$ymax <- max(nlimits$ymax, altLimits$ymax)
    }
  }
  message(paste("new limits are:", nlimits$ymin, nlimits$ymax))
  # adjust limits
  lapply(plotList, change_plot_ylimits, nlimits)
}
I thought this might also be useful for others.
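A hypothetical usage sketch (the plots and data here are placeholders, not from the original; ggpubr::ggarrange is one way to arrange the result):
library(ggplot2)
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(wt, qsec)) + geom_point()
adjusted <- adjust_plots_shared_ylimits(list(p1, p2))
# e.g. ggpubr::ggarrange(plotlist = adjusted, ncol = 2)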
This is a potential workaround for you! It works unless you change the axis limits in the layout of the plot. It essentially takes the range from the data in the plot, so it works better when the axis is changed through filtering data rather than by using the layout function.
Here's the code!
# load ggplot2
library(ggplot2)
# A basic scatterplot
p <-ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
geom_point(size=6)
# p$data returns the dataset used to create the plot (iris)
head(p$data)
# Choose plot variable you want range for
range(p$data[,"Sepal.Length"]) # * c(0.95, 1.05)
It's not a perfect solution, but it's a quick and easy workaround. Hope it helped!
Update:
ggplot2 version 3.3.2 now uses this code:
xmin <- ggplot_build(myPlot)$layout$panel_params[[1]]$x_range[1]
xmax <- ggplot_build(myPlot)$layout$panel_params[[1]]$x_range[2]
I'm working with a data frame of size 2 x 400. I need to graph this (let's call it data set A) on the same graph as the main data set for my project.
All I need is the general shape of data set A's graph, i.e. I only need to see the trend.
The scale that data set A takes place on happens to be much smaller than that of the main graph, so data set A just looks like a horizontal line.
I decided to scale data set A by multiplying it by a factor of... I tried various values to get the optimum vertical scaling, which leads me to the problem I'm having.
When trying to find the ideal multiplicative factor by trial and error, I expected the general shape of data set A's graph to retain its shape and only vary in its relative vertical points, i.e. the horizontal coordinates of all maxes and mins shouldn't move, and only the vertical points should be moving. But this wasn't happening, and I'd like to know why.
Here's data set A (yellow) when multiplied by a factor of 3:
factor of 5:
The yellow dots are the geom_point and the yellow curve is the corresponding geom_smooth.
EDIT:
Here is my original code. I haven't had much formal training with code, so I apologize for any messiness!
library("ggplot2")
library("dplyr")
# READ IN DATA
temp_data <-read.table(col.names = "y",
"C:/Users/Ben/Documents/Visual Studio 2013/Projects/Home/Home/steamdata2.txt")
boilpoint <- which(temp_data$y == "boil") # JUST A MARKER..
temp_data <- filter(temp_data, y != "boil") # GETTING RID OF THE MARKER ENTRY
# DON'T KNOW WHY BUT I HAD TO DO THIS INTERMEDIATE STEP
# BEFORE I COULD CONVERT FROM FACTOR -> NUMERIC
temp_data$y <- as.character(temp_data$y)
# CONVERTING TO NUMERIC
temp_data$y <- as.numeric(temp_data$y)
# GETTING RID OF BASICALLY THE LAST ENTRY WHICH HAS THE LARGEST VALUE
temp_data <- filter(temp_data, y<max(temp_data$y))
# ADD ANOTHER COLUMN WITH THE ROW NUMBER,
# BECAUSE I DON'T KNOW HOW TO ACCESS THIS FOR GGPLOT
temp_data <- transform(temp_data, x = 1:nrow(temp_data))
n <- nrow(temp_data) # Num of readings
period <- temp_data[n,1] # (sec)
RpS <- n / period # Avg Readings per Second
MIN <- min(temp_data$y)
MAX <- max(temp_data$y)
# DERIVATIVE OF ORIGINAL
deriv <- data.frame(matrix(ncol=2, nrow=n))
# ADD ANOTHER COLUMN TO ACCESS ROW NUMBERS FOR GGPLOT LATER
colnames(deriv) <- c("y","x")
deriv <- transform(deriv, x = c(1:n))
# FILL DERIVATIVE DATAFRAME
deriv[1, 1] <- 0
for (i in 2:n) {
  deriv[i - 1, 1] <- temp_data[i, 1] - temp_data[i - 1, 1]
}
deriv <- filter(deriv, y != 0)
# DID THE SAME FOR SECOND DERIVATIVE
dderiv <- data.frame(matrix(ncol = 2, nrow = nrow(deriv)))
colnames(dderiv) <- c("y", "x")
dderiv <- transform(dderiv, x=rep(0, nrow(deriv)))
dderiv[1, 1] <- 0
for (i in 2:nrow(deriv)) {
  dderiv$y[i - 1] <- (deriv$y[i] - deriv$y[i - 1]) /
    (deriv$x[i] - deriv$x[i - 1])
  dderiv$x[i - 1] <- deriv$x[i] + (deriv$x[i] - deriv$x[i - 1]) / 2
}
dderiv <- filter(dderiv, y!=0)
# HERE'S WHERE I FACTOR BY VARIOUS MULTIPLES
deriv <- MIN + deriv * 3
dderiv <- MIN + dderiv * 3
graph <- ggplot(temp_data, aes(x, y)) + geom_smooth()
graph <- graph + geom_point(data = deriv, color = "yellow")
graph <- graph + geom_smooth(data = deriv, color = "yellow")
graph <- graph + geom_point(data = dderiv, color = "green")
graph <- graph + geom_smooth(data = dderiv, color = "green")
graph <- graph + geom_vline(xintercept = boilpoint, color = "red")
graph <- graph + xlab("Readings (n)") +
ylab(expression(paste("Temperature (",degree,"C)")))
graph <- graph + xlim(c(0,n)) + ylim(c(MIN, MAX))
It's hard to check without your raw data, but I'm 99% sure that your main problem is that you're hard-coding the y limits with ylim(c(MIN, MAX)). This is exacerbated by accidentally scaling both variables in your deriv and dderiv data frame, not just y.
I was able to debug the problem when I noticed that your top "scale by 3" graph has a lot more yellow points than your bottom "scale by 5" graph.
The quick fix: don't scale the row numbers, only the y values. That is, replace this
# scales entire data frame: bad!
deriv <- MIN + deriv * 3
dderiv <- MIN + dderiv * 3
with this:
# only scale y
deriv$y <- MIN + deriv$y * 3
dderiv$y <- MIN + dderiv$y * 3
I think there is another problem too: even with my correction above, negative values of your derivatives will be excluded. If deriv$y or dderiv$y is ever negative, then MIN + deriv$y * 3 will be less than MIN, and since your y axis begins at MIN it won't be plotted.
So I think the whole fix would be to instead do something like
# keep the original y values around so we can experiment with scaling
# without running *all* the code again
deriv$y_orig <- deriv$y
# multiplicative scale
# fill in the value of `prop` to be the proportion of the vertical plot area
# that you want taken up by the derivative
deriv$y <- deriv$y_orig * diff(c(MIN, MAX)) / diff(range(deriv$y_orig)) * prop
# shift into plot range
# fill in the value of `intercept` to be the y value of the
# lowest point of this line
deriv$y <- deriv$y + MIN - min(deriv$y) + 1
I normally don't answer questions that aren't reproducible with data because I hate lack of clarity and I hate the inability to test. However, your question was very clear and I'm pretty sure this will work even without testing. Fingers crossed!
A few other, more general comments:
It's good you know that to convert factor to numeric you need to go via character. It's an annoyance, but if you want to understand more here's the r-faq on it.
I'm not sure why you bother with (deriv$x[i] - deriv$x[i - 1]) in your for loop. Since you define x to be 1, 2, 3, ... the difference is always 1. I'm more confused by why you divide by 2 in the second derivative.
Your for loop can probably be replaced by the diff() function. (See below.)
You seem to have just gotten your foot in the dplyr door, so I used base functions in my recommendation. Keep working with dplyr, I think you'll like it. The big dplyr function you're not using is mutate. It works like base::transform for adding new columns.
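For instance, a tiny illustration of the parallel (the data frame d is a made-up example):
d <- data.frame(x = 1:3)
transform(d, z = x * 2)       # base R
dplyr::mutate(d, z = x * 2)   # dplyr equivalent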
I dislike that you've created all these different data frames, it clutters things up. I think your code could be simplified to something like this
all_data = filter(temp_data, y != "boil") %>%
  mutate(y = as.numeric(as.character(y))) %>%
  filter(y < max(y)) %>%
  mutate(
    x = 1:n(),
    deriv = c(NA, diff(y)) / c(NA, diff(x)),
    dderiv = c(NA, diff(deriv)) / 2
  )
Rather than having separate data frames for the original data, first derivative and second derivative, this puts them all in the same data frame.
The big benefit of having things in one data frame is that you could then "gather" it into a nice, long (rather than wide) tidy format and simplify your plotting call:
library(tidyr)
long_data = gather(all_data, key = "function", value = "y", y, deriv, dderiv)
Then your ggplot call would look more like this:
graph <- ggplot(long_data, aes(x, y, color = `function`)) +
geom_smooth() +
geom_point() +
geom_vline(xintercept = boilpoint, color = "red") +
scale_color_manual(values = c("green", "yellow", "blue")) +
xlab("Readings (n)") +
ylab(expression(paste("Temperature (",degree,"C)"))) +
xlim(c(0,n)) + ylim(c(MIN, MAX))
With data in long format, you'd have a column of your data (I've named it "function") that maps to color, so you don't have to add all the layers one at a time, and you get a nicely generated legend!
I have a matrix with x rows (i.e. the number of draws) and y columns (the number of observations). They represent a distribution of y forecasts.
Now I would like to make a sort of 'heat map' of the draws. That is, I want to plot a 'confidence interval' (not really a confidence interval, just all the values with shading in between), but as a 'heat map'. That means that if, for instance, a lot of draws for observation y = y* were around 1 but there was also a draw of 5 for that same observation, then the area of the confidence interval around 1 should be darker (but the whole area between 1 and 5 is still shaded).
To be totally clear: I like, for instance, the plot in the answer here, but then I would want the grey confidence interval to instead be colored as intensities (i.e. some areas are darker).
Could someone please tell me how I could achieve that?
Thanks in advance.
Edit: As per request: example data.
Example of the first 20 values of the first column (i.e. y[1:20,1]):
 [1]  0.032067416 -0.064797792  0.035022338  0.016347263  0.034373065
 [6]  0.024793101 -0.002514447  0.091411355 -0.064263536 -0.026808208
[11]  0.125831185 -0.039428744  0.017156454 -0.061574540 -0.074207109
[16] -0.029171227  0.018906181  0.092816957  0.028899699 -0.004535961
So, the hard part of this is transforming your data into the right shape, which is why it's nice to share something that really looks like your data, not just a single column.
Let's say your data is a matrix with 10,000 rows and 10 columns. I'll just use a uniform distribution, so it will be a boring plot at the end:
n = 10000
k = 10
mat = matrix(runif(n * k), nrow = n)
Next, we'll calculate quantiles for each column using apply, transpose, and make it a data frame:
dat = as.data.frame(t(apply(mat, MARGIN = 2, FUN = quantile, probs = seq(.1, 0.9, 0.1))))
Add an x variable (since we transposed, each x value corresponds to a column in the original data)
dat$x = 1:nrow(dat)
We now need to get it into a "long" form, grouped by the min and max values for a certain deviation group around the median, and of course get rid of the pesky percent signs introduced by quantile:
library(dplyr)
library(tidyr)
dat_long = gather(dat, "quantile", value = "y", -x) %>%
mutate(quantile = as.numeric(gsub("%", "", quantile)),
group = abs(50 - quantile))
dat_ribbon = dat_long %>% filter(quantile < 50) %>%
mutate(ymin = y) %>%
select(x, ymin, group) %>%
left_join(
dat_long %>% filter(quantile > 50) %>%
mutate(ymax = y) %>%
select(x, ymax, group)
)
dat_median = filter(dat_long, quantile == 50)
And finally we can plot. We'll plot a transparent ribbon for each "group", that is the 10%-90% interval, 20%-80% interval, ..., 40%-60% interval, and then a single line at the median (50%). Using transparency, the middle will be darker as it has more ribbons overlapping on top of it. This doesn't go from the minimum to the maximum, but it will if you set the probs in the quantile call to go from 0 to 1 instead of .1 to .9 (see the sketch after the plotting code).
library(ggplot2)
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.2) +
geom_line(aes(y = y), data = dat_median, color = "white")
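If you do want the ribbons to span the full min-max range, only the probs argument changes; a sketch (then rebuild dat_long and the ribbons as above):
dat = as.data.frame(t(apply(mat, MARGIN = 2, FUN = quantile, probs = seq(0, 1, 0.1))))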
Worth noting that this is not a conventional heatmap. A heatmap usually implies that you have 3 variables, x, y, and z (color), where there is a z-value for every x-y pair. Here you have two variables, x and y, with y depending on x.
That is not a lot to go on, but I would probably start with the hexbin package and its hexbinplot function. Several alternatives are presented in this SO post: Formatting and manipulating a plot from the R package "hexbin".
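To sketch the binned-density idea against the simulated mat from the previous answer (the bin count is arbitrary, and geom_hex needs the hexbin package installed):
library(ggplot2)
draws_long <- data.frame(x = rep(seq_len(ncol(mat)), each = nrow(mat)),
                         y = as.vector(mat))
ggplot(draws_long, aes(x, y)) + geom_hex(bins = 30)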
In R I have created a simple matrix of one column yielding a list of numbers with a set mean and a given standard deviation.
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
I now would like to plot how these numbers differ from the mean. I can do this in Excel as shown below:
But I would like to use ggplot2 to create a graph in R. In the Excel graph I have cheated by using a line graph, but if I could do this as columns it would be better. I have tried using a scatter plot but I can't work out how to turn this into deviations from the mean.
Perhaps you want:
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
par(las=1,bty="l") ## cosmetic preferences
plot(x, r, col = "green", pch=16) ## draws the points
## if you don't want points at all, use
## plot(x, r, type="n")
## to set up the axes without drawing anything inside them
segments(x0=x, y0=4, x1=x, y1=r, col="green") ## connects them to the mean line
abline(h=4)
If you were plotting around 0 you could do this automatically with type="h":
plot(x,r-4,type="h", col="green")
To do this in ggplot2:
library("ggplot2")
theme_set(theme_bw()) ## my cosmetic preferences
ggplot(data.frame(x,r))+
geom_segment(aes(x=x,xend=x,y=mean(r),yend=r),colour="green")+
geom_hline(yintercept=mean(r))
Ben's answer using ggplot2 works great, but if you don't want to manually adjust the line width, you could do this:
# Half of Ben's data
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(50,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
# New variable for the difference between each value and the mean
value <- r - mean(r)
ggplot(data.frame(x, value)) +
# geom_bar anchors each bar at zero (which is the mean minus the mean)
geom_bar(aes(x, value), stat = "identity"
, position = "dodge", fill = "green") +
# but you can change the y-axis labels with a function, to add the mean back on
scale_y_continuous(labels = function(x) {x + mean(r)})
In base R it's quite simple: just do
plot(r, col = "green", type = "l")
abline(4, 0)
You also tagged ggplot2, so in that case it will be a bit more complicated, because ggplot requires creating a data frame and then melting it.
library(ggplot2)
library(reshape2)
df <- melt(data.frame(x = 1:100, mean = 4, r = r), 1)
ggplot(df, aes(x, value, color = variable)) +
geom_line()
I have data in R with overlapping points.
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)
How can I plot these points so that the points that are overlapped are proportionally larger than the points that are not. For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot with only one point.
Here's one way using ggplot2:
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()
By default, stat_sum uses the proportion of instances. You can use raw counts instead by doing something like:
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = ..n..))
Here's a simpler (I think) solution:
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
size <- sapply(1:length(x), function(i) { sum(x==x[i] & y==y[i]) })
plot(x,y, cex=size)
## Tabulate the number of occurrences of each coordinate
df <- data.frame(x, y)
df2 <- cbind(unique(df), value = with(df, tapply(x, paste(x,y), length)))
## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
## to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
You didn't really ask for this approach but alpha may be another way to address this:
library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)
You need to add the parameter cex to your plot function. First, use the functions as.data.frame and table to reduce your data to unique (x,y) pairs and their frequencies:
new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency
The only downside is that it converts the numeric data to factors, and as.numeric on a factor returns the underlying level codes rather than the original values, so convert via as.character first, then plot:
plot(as.numeric(as.character(new.data$x)), as.numeric(as.character(new.data$y)),
     cex = new.data$Freq)
You may also want to try sunflowerplot.
sunflowerplot(x,y)
Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.
So, option 1: plot each point with transparency --- ninja'd by Tyler!
option 2: use jitter to push your data around slightly so the plotted points don't overlap.
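A minimal sketch of the jitter idea in base R (the amount is chosen arbitrarily):
plot(jitter(x, amount = 0.1), jitter(y, amount = 0.1), pch = 16)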
A solution using lattice and table (similar to @R_User's, but with no need to remove the zero-frequency rows, since lattice does the job):
library(lattice)
dt <- as.data.frame(table(x, y))
xyplot(dt$y ~ dt$x, cex = dt$Freq^2, col = dt$Freq)