Use wordlayout results for ggplot geom_text - r

The R package wordcloud has a very useful function called wordlayout. It takes the initial positions of words and their respective sizes and rearranges them so that they do not overlap. I would like to use the results of this function to do a geom_text plot in ggplot.
I came up with the following example but soon realized that there seems to be a big difference between cex (wordlayout) and size (geom_text), since words drawn with the graphics package appear much larger.
Here is my sample code. Plot 1 is the original wordcloud plot, which has no overlaps:
library(wordcloud)
library(tm)
library(ggplot2)
samplesize=100
textdf <- data.frame(label=sample(stopwords("en"),samplesize,replace=TRUE),x=sample(c(1:1000),samplesize,replace=TRUE),y=sample(c(1:1000),samplesize,replace=TRUE),size=sample(c(1:5),samplesize,replace=TRUE))
#plot1
plot.new()
pdf(file="plot1.pdf")
textplot(textdf$x,textdf$y,textdf$label,textdf$size)
dev.off()
#plot2
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot2.pdf")
#plot3
new_pos <- wordlayout(x=textdf$x,y=textdf$y,words=textdf$label,cex=textdf$size)
textdf$x <- new_pos[,1]
textdf$y <- new_pos[,2]
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot3.pdf")
#plot4
textdf$x <- new_pos[,1]+0.5*new_pos[,3]#this is the way the wordcloud package rearranges the positions. I took this out of the textplot function
textdf$y <- new_pos[,2]+0.5*new_pos[,4]
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot4.pdf")
Is there a way to overcome this cex/size difference and reuse wordlayout for ggplots?

cex stands for character expansion and is the factor by which text is magnified relative to the default, specified by cin (set on my installation to 0.15 in by 0.2 in); see ?par for more details.
@hadley explains that ggplot2 sizes are measured in mm. Therefore cex=1 would correspond to size=3.81 or size=5.08, depending on whether it is scaled by the width or the height. Of course, font selection may cause differences.
In addition, to use absolute sizes, you need to put the size specification outside aes(); otherwise ggplot2 treats it as a variable to map and chooses the scale itself, e.g.:
ggplot(textdf,aes(x,y))+geom_text(aes(label = label),size = textdf$size*3.81)
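The arithmetic behind those two numbers can be sketched as a small helper. This is only an illustration of the conversion described above: the function name cex_to_mm is hypothetical, and the defaults of 0.15 in by 0.2 in are the cin values quoted in this answer, which may differ on other devices (check par("cin")).

```r
# Convert a base-graphics cex value to a size in mm, following the
# reasoning above: cex scales the default character size (cin, in inches),
# and 1 inch = 25.4 mm.
# Assumption: cin = c(0.15, 0.2) as reported in the answer; query
# par("cin") on your own device to get the actual values.
cex_to_mm <- function(cex, cin = c(width = 0.15, height = 0.2)) {
  list(
    by_width  = cex * cin[["width"]]  * 25.4,
    by_height = cex * cin[["height"]] * 25.4
  )
}

sizes <- cex_to_mm(1)
sizes$by_width   # 3.81
sizes$by_height  # 5.08
```

Either value could then be passed as the multiplier in the geom_text() call above; which one matches the base-graphics output more closely depends on the font.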

Sadly I think you're going to find the short answer is no! I think the package handles the text vector mapping differently from ggplot2, so you can tinker with size and font face/family, etc. but will struggle to replicate exactly what the package is doing.
I tried a few things:
1) Try to plot the grobs from textdata using annotation_custom
require(plyr)
require(grid)
# FIRST TRY PLOT INDIVIDUAL TEXT GROBS
qplot(0:1000,0:1000,geom="blank") +
alply(textdf,1,function(x){
annotation_custom(textGrob(label=x$label,0,0,c("center","center"),gp=gpar(cex=x$size)),x$x,x$x,x$y,x$y)
})
2) Run the wordlayout() function, which should readjust the text positions, although it is hard to tell which font it assumes (this similarly doesn't work)
# THEN USE wordlayout() TO GET CO-ORDS
plot.new()
# keep the result: a matrix with columns x, y, width, ht and the words as row names
w <- wordlayout(textdf$x,textdf$y,words=textdf$label,cex=textdf$size,xlim=c(min(textdf$x),max(textdf$x)),ylim=c(min(textdf$y),max(textdf$y)))
# row.names=NULL avoids problems with duplicated words as row names
plotdata <- data.frame(word=rownames(w),x=w[,1],y=w[,2],w=w[,3],h=w[,4],row.names=NULL)
# PLOT WORDCLOUD DATA
qplot(0:1000,0:1000,geom="blank") +
alply(plotdata,1,function(x){
annotation_custom(textGrob(label=x$word,0,0,c("center","center"),gp=gpar(cex=x$h*40)),x$x,x$x,x$y,x$y)
})
Here's a cheat if you just want to overplot other ggplot functions on top of it (although the co-ords don't seem to match up exactly between the data and the plot). It basically images the wordcloud, removes the margins, and under-plots it at the same scale:
# make a png file of just the panel
plot.new()
png(filename="bgplot.png")
par(mar=c(0.01,0.01,0.01,0.01))
textplot(textdf$x,textdf$y,textdf$label,textdf$size,xaxt="n",yaxt="n",xlab="",ylab="",asp=1)
dev.off()
# library to get PNG file
require(png)
# then plot it behind the panel
qplot(0:1000,0:1000,geom="blank") +
annotation_custom(rasterGrob(readPNG("bgplot.png"),0,0,1,1,just=c("left","bottom")),0,1000,0,1000) +
coord_fixed(1,c(0,1000),c(0,1000))

Related

R ggbiplot for PCA results: why is the resulting plot so narrow and how to adjust the width?

I run a PCA and usually plot the results with ggplot2, but I recently discovered ggbiplot, which can show arrows for the variables.
ggbiplot seems to work OK, though it has some problems (like the impossibility of changing the point size, hence the layer workaround I use in the MWE).
The problem I am facing now is that, while ggplot2 adjusts the plot width to the plotting area, ggbiplot does not. With my data, the ggbiplot is horribly narrow and leaves horribly wide vertical margins, even though it spans the same x-axis interval as the ggplot2 plot (it is, in fact, the same plot).
I am using the iris data here, so I had to make the png extra wide for the problem to become evident. Please check the MWE below:
library(ggplot2)
library(ggbiplot)
data(iris)
head(iris)
pca.obj <- prcomp(iris[,1:4],center=TRUE,scale.=TRUE)
pca.df <- data.frame(Species=iris$Species, as.data.frame(pca.obj$x))
rownames(pca.df) <- NULL
png(filename="test1.png", height=500, width=1000)
print(#or ggsave()
ggplot(pca.df, aes(x=PC1, y=PC2)) +
geom_point(aes(color=Species), cex=3)
)
dev.off()
P <- ggbiplot(pca.obj,
obs.scale = 1,
var.scale=1,
ellipse=T,
circle=F,
varname.size=3,
groups=iris$Species, #no need for coloring, I'm making the points invisible
alpha=0) #invisible points, I add them below
P$layers <- c(geom_point(aes(color=iris$Species), cex=3), P$layers) #add geom_point in a layer underneath (only way I have to change the size of the points in ggbiplot)
png(filename="test2.png", height=500, width=1000)
print(#or ggsave()
P
)
dev.off()
This code produces the following two images.
ggplot2 output (desired plot width):
ggbiplot output (plot too narrow for plotting area):
See how ggplot2 adjusts the plot width to the plot area, while ggbiplot does not. With my data, the ggbiplot plot is extremely narrow and leaves large vertical margins.
My question here is: How to make ggbiplot behave as ggplot2? How can I adjust the plot width to my desired plotting area (png size) with ggbiplot? Thanks!
Change the ratio argument in coord_equal() to some value smaller than 1 (default in ggbiplot()) and add it to your plot. From the function description: "Ratios higher than one make units on the y axis longer than units on the x-axis, and vice versa."
P + coord_equal(ratio = 0.5)
NOTE: as @Brian noted in the comments, "changing the aspect ratio would bias the interpretation of the length of the principal component vectors, which is why it's set to 1."
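The effect of the ratio argument can be sketched with plain ggplot2, without ggbiplot. This is an illustration only: it rebuilds the PCA scores from the question and assumes the ggbiplot object responds to coord_equal() the same way an ordinary ggplot object does.

```r
library(ggplot2)

# Same PCA as in the question, plotted with plain ggplot2.
pca.obj <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
pca.df  <- data.frame(Species = iris$Species, pca.obj$x)

p <- ggplot(pca.df, aes(x = PC1, y = PC2, colour = Species)) +
  geom_point(size = 3)

# ratio = 1: one unit on y is as long as one unit on x (ggbiplot's default).
p_equal <- p + coord_equal(ratio = 1)

# ratio = 0.5: one unit on y is half as long as one unit on x,
# so the panel becomes wider relative to its height.
p_wide <- p + coord_equal(ratio = 0.5)
```

Printing p_equal and p_wide into the same 1000 x 500 png device shows the difference; keep in mind the caveat above about distorting the principal component vectors.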

Plotted raster output in R won't eliminate legend margin

In R, I have a raster object generated from a kernel density analysis using the ks package. I convert this into a raster object (from the raster package) and try to draw that raster object to a PNG using plot(). I want the png to have exactly one pixel for every pixel in the raster object. Simple enough, right? By default of course, I get all sorts of extraneous junk added to the plot. I can remove most of this using the various settings in plot() or par(), but no matter what I do, I don't seem able to get rid of the space formerly taken up by the legend on the right side of the plot.
library('ks')
library('raster')
# generate the data
set.seed(1)
x = matrix(rnorm(1000,1,0.5),500)
xpix = 100
ypix = 100
# calculate the density function
k = kde(
x,
H=matrix(c(0.1,0,0,0.1),2),
xmin=c(0,0),
xmax=c(1,1),
gridsize=c(xpix,ypix)
)
# convert to raster
r = raster(k)
# plot the image to PNG
png('file.png',width=xpix,height=ypix)
par(
mar=c(0,0,0,0),
bty='n',
bg='black',
plt=c(0,1,0,1)
)
plot(
r,
legend=FALSE,
axes=FALSE,
plt=c(0,1,0,1)
)
# see that 'plt' did not change
print(par())
dev.off()
If I check par before closing the device, I can see that the 'plt' value is not what I set it to; it shows the right margin, where the plotting area has been nudged over to make space for the non-legend. Sample code is above, and the image it generates is linked to here.
Incidentally, I was able to achieve the correct effect with the image() function instead of plot(), though that introduced its own problems, namely that transparency no longer worked. Can I solve this with plot()? It's very frustrating that I'm so close but just can't seem to change the size of the plot area! I don't want to use another graphics package if there is any way to make the base function work.

Automatically resize bars in ggplot for uniformity across several graphs R

I generate several bar graphs in a loop, and they all resize according to the output size (I assume from the plot/device size?) rather than according to the bar size. This means that plots with two bars have fat bars, and plots with, say, 6 bars have thin bars; both outputs are the same size though. The code below represents my script with reproducible data (I make many other aes/theme changes in mine).
I'd like the output plot to resize (in the dimension of bar width) so that the bars are always the same width across different graphs, but the output images change size according to the number of (same width) bars.
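library(ggplot2)
my_factors <- c("vs","cyl","carb")
for (current_factor in my_factors) {
  # .data[[...]] looks the column up by its name; aes(factor(current_factor))
  # would treat the string itself as a single constant level
  p <- ggplot(mtcars, aes(factor(.data[[current_factor]]))) +
    geom_bar() + coord_flip()
  # name each file after the column it shows
  ggsave(paste0(current_factor,".png"), p)
}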
Sorry if I've missed something glaring, I am new to ggplot, and R. I'm from MATLAB so the whole "device" thing still confuses me! In MATLAB I'd specify the bar size explicitly (i.e. not relatively), and the output would resize accordingly.
You can use this foo function
library(lazyeval)
library(ggplot2)
foo <- function(data,i, height_rate = 0.1){
height <- eval(substitute(length(unique(data$i))))
ld <- as.lazy_dots(list(lazy(i)))
ld <- as.lazy_dots(lapply(ld, function(x){
try(x$expr <- as.name(x$expr), silent=TRUE)
x
}))
x <- make_call(quote(aes),make_call(quote(factor),ld))
ggplot(data, eval(x$expr))+
geom_bar(width = height_rate*height)+
coord_flip()
}
foo(mtcars,"cyl")
Because of the lazyeval package,
foo(mtcars,cyl)
also works. A disadvantage of this code is that it only accepts the exact name of a column, so using it in a for loop requires some further development. Hope it helps.
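Another way to approach what the question asks for (the same bar thickness in every image, with the image size varying instead) is to scale the saved height with the number of bars. The following is a minimal sketch, not part of the answer above: the helper names bar_plot_height and save_bar_plot, and the values 0.4 in per bar and 1 in for margins, are hypothetical and would need tuning for a real theme.

```r
library(ggplot2)

# Height of the output image in inches: a fixed allowance per bar plus a
# constant for axes and margins. Both numbers are illustrative guesses.
bar_plot_height <- function(data, column, inches_per_bar = 0.4, margin_in = 1) {
  n_bars <- length(unique(data[[column]]))
  n_bars * inches_per_bar + margin_in
}

# Build the flipped bar chart for one column and save it with a height
# proportional to its number of bars (bars run horizontally after
# coord_flip(), so their thickness is governed by the image height).
save_bar_plot <- function(data, column, width_in = 5) {
  p <- ggplot(data, aes(factor(.data[[column]]))) +
    geom_bar() +
    coord_flip() +
    xlab(column)
  ggsave(paste0(column, ".png"), p,
         width = width_in,
         height = bar_plot_height(data, column),
         units = "in")
  invisible(p)
}
```

Calling save_bar_plot(mtcars, "cyl") for each name in my_factors would then write one png per column. The bars will only be approximately equal across files, because axis text and padding take up a share of the height that this sketch treats as constant.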

Exporting graphs in R

I have two graphs that I plotted in R and I want to export them as a high-resolution picture for publication.
For example:
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
I usually export this graph by:
dev.copy(jpeg,'test.jpeg',width=80,height=150,units="mm",res=200)
dev.off()
However, I always find this process a bit troublesome. The graph that was plotted in R does not necessarily look like the one that I exported. Therefore, I am wondering if there is a way to specify the dimensions and resolution of graphs before I plot them so that I can visually inspect the graphs before I export them?
Thank you
You can try:
png('out.png')
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
dev.off()
As baptiste said, jpeg is the worst format you can choose. You should take a look at the help for the bmp and png functions (with ?bmp and ?png). Both bmp and png have height, width, and res arguments that you can use to specify the dimensions and resolution of the output. Also, I wouldn't recommend the use of dev.copy. As you could see, the output is not always what you expect.
To add to Bonifacio2's answer: if you call the device function first, you can also define your margins, window size, etc. before doing any actual plotting. That way you have full control over all figure specs.
pdf(file='test.pdf',width=80/25.4,height=150/25.4) #pdf() takes inches and has no units argument, so convert from mm; I prefer pdf, because they are editable files
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
dev.off()
You can use cowplot package to combine multiple panels in several different ways. For example, in your case, we export one plot with two panels arranged in two rows and one column. I assume that you prefer to use base-R 'plot' function instead of ggplot.
library(cowplot)
p1 <- ~{
plot(a,b)
}
p2 <- ~{
plot(b,a)
}
png("plot.png",
width = 3.149606, # 80 mm
height = 5.905512, # 150 mm
units = 'in',
res = 500)
plot_grid(p1, p2, labels = "AUTO", nrow = 2, ncol = 1)
dev.off()
Note that you can either remove the labels if not needed or print small letters by using "auto". Regarding size of the text, axis-labels etc, use the standard arguments for generic plot function of base-R. I hope this answer helps you. Best wishes.

How to plot matrix with background color varying according to entry?

I wanted to ask for any general idea about plotting this kind of plot in R which can compare for example the overlaps of different methods listed on the horizontal and vertical side of the plot? Any sample code or something
Many thanks
A ggplot2-example:
# data generation
df <- matrix(runif(25), nrow = 5)
# bring data to long format
require(reshape2)
dfm <- melt(df)
# plot
require(ggplot2)
ggplot(dfm, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = round(value, 2)))
The corrplot package and corrplot function in that package will create plots similar to what you show above, that may do what you want or give you a starting point.
If you want more control then you could plot the colors using the image function, then use the text function to add the numbers. You can either make the margins large enough to place the text in them (see the axis function for the common way to add text labels in the margin), or leave enough space internally (maybe use rasterImage instead of image) and use text to do the labelling. Look at the xpd argument to par if you want to add the lines, and at the grconvertX and grconvertY functions to help with the coordinates of the line segments.
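A minimal sketch of the image-plus-text approach described above, using random data like the ggplot2 example. The method names M1 to M5 and the colour palette are placeholders, and the labels go in the margins via axis().

```r
# Random 5 x 5 matrix standing in for the overlaps between methods.
set.seed(1)
m <- matrix(runif(25), nrow = 5,
            dimnames = list(paste0("M", 1:5), paste0("M", 1:5)))
cell_labels <- round(m, 2)

# Leave room in the bottom and left margins for the method names.
op <- par(mar = c(4, 4, 1, 1))

# image() draws z[i, j] at (x[i], y[j]), so pass t(m) to put the
# columns of m on the x axis and the rows of m on the y axis.
image(seq_len(ncol(m)), seq_len(nrow(m)), t(m),
      axes = FALSE, xlab = "", ylab = "", col = heat.colors(12))

# Method names in the margins.
axis(1, at = seq_len(ncol(m)), labels = colnames(m), tick = FALSE)
axis(2, at = seq_len(nrow(m)), labels = rownames(m), tick = FALSE, las = 1)

# Write each value in the centre of its cell: col(m) gives the x
# position and row(m) the y position of every entry.
text(col(m), row(m), labels = cell_labels)

par(op)
```

Note that row 1 ends up at the bottom of the plot; reversing the row order of m before plotting would put it at the top, as in a printed matrix.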

Resources