Create "arty" mosaic pictures with R (*not* statistical mosaic plots)

I'd like to play around with images a bit and wondered if there are ways in R to produce mosaic pictures like these.
I guess for the background image one could use readJPEG (package jpeg) and rasterImage from package graphics.
But I'm lost with respect to how to compute and cluster color values etc. in order to arrange the foreground pictures.
EDIT
I found this post which goes "in the right direction". But I guess if you create a "true" mosaic where the actual picture is purely made up of small pictures (as opposed to having a combination of background and foreground pictures and finding the right amount of transparency as in this example), you have the problem that you'll need hundreds or possibly even thousands of pictures.

Thought this was a nice challenge to waste a few hours on. Here is a proof of concept function:
library(jpeg)
library(png)
library(plyr)
reduceCol <- function(x,dim=c(1,1))
{
arr <- array(dim=c(nrow(x),ncol(x),4))
cols <- col2rgb(c(x),alpha=TRUE)
arr[,,1] <- matrix(cols[1,],nrow(x),ncol(x),byrow=TRUE)
arr[,,2] <- matrix(cols[2,],nrow(x),ncol(x),byrow=TRUE)
arr[,,3] <- matrix(cols[3,],nrow(x),ncol(x),byrow=TRUE)
arr[,,4] <- matrix(cols[4,],nrow(x),ncol(x),byrow=TRUE)
Res <- array(dim=c(dim,4))
if (dim[1]>1) seqRows <- as.numeric(cut(1:nrow(x),dim[1])) else seqRows <- rep(1,nrow(x))
if (dim[2]>1) seqCols <- as.numeric(cut(1:ncol(x),dim[2])) else seqCols <- rep(1,ncol(x))
for (i in 1:dim[1])
{
for (j in 1:dim[2])
{
for (z in 1:4)
{
Res[i,j,z] <- mean(arr[seqRows==i,seqCols==j,z])
}
}
}
return(Res)
}
rgbarr2colmat <- function(mat)
{
Res <- array(dim=dim(mat)[1:2])
for (i in 1:dim(mat)[1])
{
for (j in 1:dim(mat)[2])
{
Res[i,j] <- rgb(mat[i,j,1],mat[i,j,2],mat[i,j,3],mat[i,j,4],maxColorValue=255)
}
}
return(Res)
}
artymosaic <- function(BG,pics,res=c(10,10))
{
BGreduced <- reduceCol(BG,res)
Picmeancol <- lapply(pics,reduceCol)
blockPic <- array(dim=res)
for (i in 1:res[1])
{
for (j in 1:res[2])
{
blockPic[i,j] <- which.min(sapply(Picmeancol,function(x)sum(abs(BGreduced[i,j,]-x))))
}
}
blockPic <- t(blockPic)
blockPic <- blockPic[,ncol(blockPic):1]
# Open empty plot:
par(mar=c(0,0,0,0))
plot(1,xlim=c(0,1),ylim=c(0,1),type="n",bty="n",axes=FALSE)
# plot mosaics:
seqRows <- seq(0,1,length=res[1]+1)
seqCols <- seq(0,1,length=res[2]+1)
for (i in 1:res[1])
{
for (j in 1:res[2])
{
rasterImage(pics[[blockPic[i,j]]],seqRows[i],seqCols[j],seqRows[i+1],seqCols[j+1],interpolate=FALSE)
}
}
}
artymosaic uses the background in raster format as first argument, a list of pictures in raster format as second and the resolution (numeric(2)) as third argument.
Here is an example with the R logo made up of pictures of computers. I downloaded some pictures from Google and uploaded them to http://sachaem47.fortyseven.versio.nl/files/pics/mosaic.zip. If these are extracted into a mosaic folder, and the R logo (http://cran.r-project.org/Rlogo.jpg) is downloaded to the working directory, we can make the "arty mosaic" as follows.
bg <- readJPEG("Rlogo.jpg")
BG <- as.raster(bg)
jpgs <- lapply(list.files("mosaic/",pattern="\\.jpg",full.names=TRUE),readJPEG)
pics <- lapply(jpgs,as.raster)
png("test.png")
artymosaic(BG,pics,c(50,50))
dev.off()
Looks spectacular, right? The major drawback here is that I reuse the same image wherever it is the closest match, and that I only used very few images. That could be changed, but it would require many more images, which would make the function run much longer. Again, this is a proof of concept.
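The matching step inside artymosaic (pick, for each background block, the picture whose mean colour is closest in summed absolute difference) can be sketched on its own. This is only an illustration with made-up colour values; nearest_tile, tile_means and block_means are hypothetical names, not part of the function above.

```r
# Mean colour of each candidate picture: one row per picture, columns R, G, B.
# The values are invented for illustration.
tile_means <- rbind(c(250, 10, 10),
                    c(10, 250, 10),
                    c(10, 10, 250))

# Mean colour of two background blocks (also invented).
block_means <- rbind(c(200, 30, 40),
                     c(20, 20, 180))

# Index of the picture whose mean colour has the smallest summed absolute
# difference from the block colour (the same distance artymosaic uses).
nearest_tile <- function(block, tiles) {
  which.min(rowSums(abs(sweep(tiles, 2, block))))
}

idx <- apply(block_means, 1, nearest_tile, tiles = tile_means)
idx
```

With more pictures, only tile_means grows; the lookup per block stays a single vectorised distance computation.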

Related

Problem with getting an image of a graph made in ggplot2 [duplicate]

In R, I use the function savePlot to save graphs into image files. But my colleague can only open .jpgs and .gifs (probably because he's on vacation, reading emails on his mobile phone). I hate to create JPEGs because the boxplots in particular look very ugly (whiskers blurred etc.). But the savePlot function only supports the following types:
type = c("wmf", "emf", "png", "jpg", "jpeg", "bmp",
"tif", "tiff", "ps", "eps", "pdf")
How can I save a plot as a GIF in R?
EDIT: if possible, ideal solution should work without installing ImageMagick (so that the R script is easily portable).
R doesn't have a native GIF graphics driver, mostly (completely?) due to the patent-encumbrances of the GIF format: see http://tolstoy.newcastle.edu.au/R/help/05/02/12809.html .
There is a function in the caTools package (write.gif()) but it is specifically designed for writing images. If you wanted to use it you have to do something hacky to convert your plot to an image first (e.g. save as PNG and then read it back into R as an image). For example:
png("myPlot.png")
plot(rnorm(1000),rnorm(1000))
dev.off()
library(png)
P1 <- readPNG("myPlot.png")
library(caTools)
write.gif(P1,"myPlot.gif")
showGIF <- function(fn) system(paste("display",fn))
showGIF("myPlot.gif")
unlink("myPlot.gif") ## clean up
?write.gif() has a lot of stuff about color indexing that I didn't read but that might be important for more complex graphs ...
The animation package has a saveGIF() function to save GIFs, but (1) it is designed for saving multi-frame animations (not general graphics), and (2) it does it by calling ImageMagick.
It's easier just to construct that function yourself.
install ImageMagick (http://imagemagick.org)
save as a PNG, then use ImageMagick to convert.
For example:
png("myPlot.png")
plot(rnorm(1000),rnorm(1000))
dev.off()
system("convert myPlot.png myPlot.gif")
unlink("myPlot.png") ## clean up
showGIF("myPlot.gif")
unlink("myPlot.gif") ## clean up
Of course you can wrap either of these in a function if you want to use them regularly.
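A minimal sketch of such a wrapper for the ImageMagick route, assuming the convert executable is on the PATH (on some systems the command is named differently). plot_to_gif and its arguments are hypothetical names chosen for illustration.

```r
# Hypothetical helper: draw a plot to a temporary PNG, then convert it to GIF
# with ImageMagick. 'plot_fun' is a function of no arguments that does the drawing.
plot_to_gif <- function(plot_fun, gif_file) {
  # Assumption: ImageMagick's 'convert' is installed and visible on the PATH.
  if (!nzchar(Sys.which("convert"))) {
    stop("ImageMagick 'convert' was not found on the PATH")
  }
  png_file <- tempfile(fileext = ".png")
  on.exit(unlink(png_file))            # clean up the intermediate PNG
  png(png_file)
  tryCatch(plot_fun(), finally = dev.off())  # always close the device
  status <- system2("convert", c(shQuote(png_file), shQuote(gif_file)))
  invisible(status == 0)               # TRUE if the conversion succeeded
}

# Usage (not run here, since it needs ImageMagick):
# plot_to_gif(function() plot(rnorm(1000), rnorm(1000)), "myPlot.gif")
```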
UPDATE: I spent a while longer on this, to try to get a pure-R solution, but don't yet have a working solution. Suggestions or edits welcome ...
## needs ImageMagick: just for testing ...
showGIF <- function(fn) system(paste("display",fn))
The main function:
saveGIF <- function(fn,verbose=FALSE,debug=FALSE) {
require(png)
require(caTools)
tmpfn <- tempfile()
on.exit(unlink(tmpfn))
savePlot(tmpfn,type="png")
P1 <- readPNG(tmpfn)
dd <- dim(P1)
P1 <- aperm(P1,c(3,1,2),resize=TRUE) ## P1[,1,15]
dim(P1) <- c(dd[3],prod(dd[1:2]))
P1 <- t(P1)
if (verbose) cat("finding unique colours ...\n")
P1u <- unique(P1)
rgbMat <- function(x) {
rgb(x[,1],x[,2],x[,3])
}
if (verbose) cat("creating colour index ...\n")
pp <- paste(P1[,1],P1[,2],P1[,3],sep=".")
## make sure factor is correctly ordered
ind <- as.numeric(factor(pp,levels=unique(pp)))
if (verbose) cat("finding colour palette ...\n")
if (nrow(P1u)>256) {
if (verbose) cat("kmeans clustering ...\n")
kk <- kmeans(P1u,centers=256)
ind <- kk$cluster[ind]
pal <- rgbMat(kk$centers)
} else {
pal <- rgbMat(P1u)
}
## test:
if (debug) {
dev.new()
par(mar=rep(0,4))
image(t(matrix(ind-1,nrow=dd[1])),col=pal,axes=FALSE,ann=FALSE)
}
if (verbose) cat("writing GIF ...\n")
indmat <- matrix(ind-1,nrow=dd[1])
storage.mode(indmat) <- "integer"
write.gif(indmat,fn,col=as.list(pal),scale="never")
}
X11.options(antialias="none")
image(matrix(1:64,nrow=8),col=rainbow(10))
saveGIF("tmp.gif",verbose=TRUE,debug=TRUE)
showGIF("tmp.gif")

When and why does profvis() show "Sources not available"?

I am profiling R code a lot, and make heavy use of profvis().
For some functions, the top half of the browser window shows the source, and sometimes it doesn't. I can't make out when that is the case, it seems random to me.
Does anyone know when and why profvis can't show the code in the top window? One situation where it happens is this piece of code:
simulation <- function(p=1e4, n=100){
df <- list()
for(i in 1:p){ # simulate p columns of data
df[[paste0("Var", i)]] <- rnorm(n)
}
df <- as.data.frame(df)
return(apply(df, 2, mean))
}
profvis(simulation())
When profvis() is given only a single function call, nothing is shown in the upper part of the flame graph. I think the reason is that, with only one call, highlighting the bar at the base of the graph is unnecessary.
Example code:
library(profvis)
simulation <- function(p=1e4, n=100){
df <- list()
for(i in 1:p){ # simulate p columns of data
df[[paste0("Var", i)]] <- rnorm(n)
}
df <- as.data.frame(df)
return(apply(df, 2, mean))
}
profvis(simulation()) # With only one function call, the source isn't shown.
profvis({ # With two or more function calls, the source is shown.
simulation()
sum(iris[,1])
})

Is there any way to build two plots at the same time in R?

I want to make two 2x2 plots at the same time. My motivation is that I originally had logic like this:
pdf(...)
par(mfrow=c(2,2))
for (subplot_i in c(1,2,3,4)) {
plot(f(subplot_i))
}
dev.off()
but now I want to do something like
pdf(..., window=1)
pdf(..., window=2)
par(mfrow=c(2,2))
for (subplot_i in c(1,2,3,4)) {
plot(f(subplot_i), window=1)
plot(g(subplot_i), window=2)
}
dev.off(window=1)
dev.off(window=2)
that is, build up two plots at the same time.
I could separate it into two loops, but this would double the processing that occurs inside of the loop. I could move all of this processing outside of the loop, but it would come at some considerable effort.
So is what I want possible?
Appears to be possible!
f <- function(x) { x }
g <- function(x) { x^2 }
pdf('a.pdf')
par(mfrow=c(2,2))
pdf('b.pdf')
par(mfrow=c(2,2))
for (subplot_i in c(1,2,3,4)) {
plot(f(subplot_i))
dev.set(dev.prev())
plot(g(subplot_i))
dev.set(dev.next())
}
dev.off()
dev.off()
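A variation on the same idea, as a sketch: store each device number with dev.cur() right after opening it, then switch with dev.set() by number rather than relying on dev.prev()/dev.next(). The variable names and the use of tempdir() for the output files are just choices for this example.

```r
f <- function(x) x
g <- function(x) x^2

file_a <- file.path(tempdir(), "a.pdf")
file_b <- file.path(tempdir(), "b.pdf")

# Open both devices and remember their numbers.
pdf(file_a); par(mfrow = c(2, 2)); dev_a <- dev.cur()
pdf(file_b); par(mfrow = c(2, 2)); dev_b <- dev.cur()

for (subplot_i in 1:4) {
  dev.set(dev_a); plot(f(subplot_i))   # goes to a.pdf
  dev.set(dev_b); plot(g(subplot_i))   # goes to b.pdf
}

# Close each device explicitly by number.
dev.off(dev_a)
dev.off(dev_b)
```

This keeps working if other graphics devices happen to be open, because each plot is sent to a device by its number rather than by its position in the device list.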

Plotting during a loop in RStudio

I am implementing a solution to the Traveling Salesman Problem (TSP) in R (simulated annealing) and I want to output the current best path periodically. I have searched quite a bit for how to output plots during a for loop and have thus far failed.
I use RStudio, and want to see the graphs as they are generated. If you have ever watched TSP solvers do their thing, you will understand how cool it is to watch. Here is a sample of the graphics output I want to see http://www.staff.science.uu.nl/~beuke106/anneal/anneal.html
I don't think that the memory usage will be a problem (during about 500,000 iterations, I am only expecting 50-100 plots). Here is a sample function, where we would expect to see 10 different plots during the time the function runs:
Plotz <- function(iter = 1000000, interval = 100000) {
x <- 1:10
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
plot(x, y)
}
}
return(c(x, y))
}
Plotz()
When I run this, all I see is the final plot produced (in RStudio). How can I see the plots as they're generated?
Also: I am on Ubuntu (whatever the newest stable release is). Don't know if that is relevant.
Thank you everyone in advance.
EDIT: Per Captain Murphy's suggestion, I tried running this in the Linux terminal, and the graphics appeared. I think the question of "How to do this in RStudio?" is still relevant, however. It's such a good program, so maybe someone has an idea of what could be done to get this to work?
EDIT2: As Thilo stated, this is a known bug in RStudio. If anyone has any other ideas to solve this without the software itself being fixed, then there is still something to discuss. Otherwise, consider this question solved.
Calling Sys.sleep(0) should cause the plot to draw. Unlike the X11 solution, this will work on server versions of RStudio as well.
(I was surprised that dev.flush() did not give the result you were hoping for, that might be a bug.)
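Applied to the function from the question, that suggestion might look like the sketch below. The smaller iter and interval defaults and the plot title are my own changes to keep the example quick; whether the plots actually appear one by one depends on the RStudio version.

```r
Plotz <- function(iter = 1000, interval = 100) {
  x <- 1:10
  y <- NULL
  for (i in seq_len(iter)) {
    y <- runif(10)
    if (i %% interval == 0) {
      plot(x, y, main = paste("iteration", i))
      Sys.sleep(0)  # give the graphics device a chance to redraw
    }
  }
  invisible(c(x, y))
}

res <- Plotz()
```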
One thing you can do is open an X11 window and plot in there:
x11()
Plotz()
That should work the same as running it in terminal.
Following up on @JoeCheng's answer and @RGuy's comment on that answer: as I worked out with the RStudio folks, the problem seems to primarily arise when there is too much plotting going on in too short a timespan. The solution is twofold:
Sys.sleep(0) helps force an update to the plotting window.
Plotting updates every Wth loop rather than every loop.
For instance, on my computer (i7, RStudio Server), the following code does not update until the loop completes:
N <- 1000
x <- rep(NA,N)
plot(c(0,1)~c(0,N), col=NA)
for(i in seq(N)) {
Sys.sleep(.01)
x[i] <- runif(1)
iseq <- seq(i-99,i)
points( x[i]~i )
Sys.sleep(0)
}
The following code updates in real-time, despite having the same number of points to be plotted:
N <- 1000
x <- rep(NA,N)
plot(c(0,1)~c(0,N), col=NA)
for(i in seq(N)) {
Sys.sleep(.01)
x[i] <- runif(1)
iseq <- seq(i-99,i)
if(i%%100==0) {
points( x[iseq]~iseq )
Sys.sleep(0)
}
}
In other words, it's the number of calls to the plotting functions that seems to matter, not the amount of data to be plotted.
If you want to save the plots as well you could just open a new device in the loop and close it afterwards.
Plotz <- function(iter = 1000, interval = 100) {
x <- 1:10
p <- 0 #plot number
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
png(file=paste(i,"png",sep="."))
p <- p + 1; plot(x, y)
dev.off()
}
}
return(c(x, y))
}
Plotz <- function(iter = 1000, interval = 100) {
x <- 1:10
p <- 0 #plot number
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
p <- p + 1; plot(x, y)
readline("Please press the Enter key to see the next plot if there is one.")
}
}
return(c(x, y))
}
Plotz()
You can also use the back arrows on the plots tab of the lower left pane of the RStudio interface in order to view the plots.
You can use the animation package to layer your plots into a GIF.

Automating great-circle map production in R

I've taken some of the things I learned in a Flowing Data great circle mapping tutorial and combined them with code linked in the comments to prevent weird things from happening when R plots trans-equatorial great circles. That gives me this:
airports <- read.csv("/home/geoff/Desktop/DissertationData/airports.csv", header=TRUE)
flights <- read.csv("/home/geoff/Desktop/DissertationData/ATL.csv", header=TRUE, as.is=TRUE)
library(maps)
library(geosphere)
checkDateLine <- function(l){
n<-0
k<-length(l)
k<-k-1
for (j in 1:k){
n[j] <- l[j+1] - l[j]
}
n <- abs(n)
m<-max(n, na.rm=TRUE)
ifelse(m > 30, TRUE, FALSE)
}
clean.Inter <- function(p1, p2, n, addStartEnd){
inter <- gcIntermediate(p1, p2, n=n, addStartEnd=addStartEnd)
if (checkDateLine(inter[,1])){
m1 <- midPoint(p1, p2)
m1[,1] <- (m1[,1]+180)%%360 - 180
a1 <- antipode(m1)
l1 <- gcIntermediate(p1, a1, n=n, addStartEnd=addStartEnd)
l2 <- gcIntermediate(a1, p2, n=n, addStartEnd=addStartEnd)
l3 <- rbind(l1, l2)
l3
}
else{
inter
}
}
# Unique months
monthyear <- unique(flights$month)
# Color
pal <- colorRampPalette(c("#FFEA00", "#FF0043"))
colors <- pal(100)
for (i in 1:length(monthyear)) {
png(paste("monthyear", monthyear[i], ".png", sep=""), width=750, height=500)
map("world", col="#191919", fill=TRUE, bg="black", lwd=0.05)
fsub <- flights[flights$month == monthyear[i],]
fsub <- fsub[order(fsub$cnt),]
maxcnt <- max(fsub$cnt)
for (j in 1:length(fsub$month)) {
air1 <- airports[airports$iata == fsub[j,]$airport1,]
air2 <- airports[airports$iata == fsub[j,]$airport2,]
p1 <- c(air1[1,]$long, air1[1,]$lat)
p2 <- c(air2[1,]$long, air2[1,]$lat)
inter <- clean.Inter(p1,p2,n=100, addStartEnd=TRUE)
colindex <- round( (fsub[j,]$cnt / maxcnt) * length(colors) )
lines(inter, col=colors[colindex], lwd=1.0)
}
dev.off()
}
I'd like to automate the production of maps for a large dataset containing all scheduled commercial routes — dummy sample — shared between ATL and other airports in the global network (airports.csv is linked to in the Flowing Data post). Preferably, I'd produce one map per month that I would use as frame in a short video depicting changes in the Atlanta airport network space.
The problem: I can't get the loop to produce any more than one PNG—from only the first unique month in each CSV—each time I run it. I'm fairly certain Aaron Hardin's code 'breaks' the automation as it is used in the Flowing Data tutorial. After three days of messing with it and chasing down any relevant R how-to's, I realize I simply lack the chops to reconcile one with the other. Can anybody help me automate the process?
There's a dissertation acknowledgement in it for you!
Too much information for a comment, so I post an answer instead. Here is what I think (and read to the end to see what could potentially be the problem):
I have tried to run your code on the original data in the Flowing Data tutorial. (Obviously you have to add a column for monthly data, so I simply added a line to randomise the month.)
airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv",
header=TRUE)
flights <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv",
header=TRUE, as.is=TRUE)
# Add column with random data for month
flights$month <- sample(month.abb[1:4], nrow(flights), replace=TRUE)
Whenever I have a loop that takes a long time to run, I generally stick a bit of code in there that gives me a progress check. Use what takes your fancy: print, cat, tcltk::tkProgressBar. I use message:
for (i in 1:length(monthyear)) {
message(i)
#
# your code here
#
}
Anyway, I then ran your code. Everything works exactly as it should. Since I sampled four months' worth of data, I get:
The message with the current iteration of i prints four times
Four PNG plots, each with a dark world map and bright yellow lines. Here is one of the four plots:
So, why does it work on my machine and not yours?
I can only guess, but my guess is that you haven't set the working directory. There is no setwd in your code, and the call to png just gives the filename. I suspect your files are being written to whatever the working directory happens to be on your system.
By default, on my installation, the working directory is:
getwd()
[1] "C:/Program Files/eclipse 3.7"
To solve this, do one of the following:
Use setwd() to set your working directory at the top of your script.
Or use the full path and file name in your call to png()
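A sketch of the second option, building a full path for every PNG before the loop. The output folder under tempdir() and the three month labels are placeholders for illustration; in the real script monthyear would be unique(flights$month) and the plot() call would be the map-drawing code.

```r
# Hypothetical output folder; replace with a directory of your choice.
out_dir <- file.path(tempdir(), "maps")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

monthyear <- c("Jan", "Feb", "Mar")  # stand-in for unique(flights$month)

# One full file path per month, so the files do not depend on getwd().
out_files <- file.path(out_dir, paste0("monthyear", monthyear, ".png"))

for (i in seq_along(monthyear)) {
  message("Writing ", out_files[i])
  if (capabilities("png")) {           # skip drawing if no PNG device is available
    png(out_files[i], width = 750, height = 500)
    plot(1, main = monthyear[i])       # placeholder for the map and lines
    dev.off()
  }
}
```

Printing the path with message() also makes it obvious where each file ends up.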

Resources