How to process all text elements of ggplot through a function before output? - r

I'm creating a set of functions for enciphering data plots as a hook for data literacy lessons.
For example, this function performs an alphabetic shift of +/- X on any string input:
enciphR <- function(x, alg, key = FALSE) {
  # split into single characters, all lower case
  x.vec <- tolower(unlist(strsplit(x, fixed = TRUE, split = "")))
  # define new alphabet
  alphabet <- 1:26 + alg
  alphabet.shifted.idx <- sapply(alphabet, function(x) {
    if (x > 26) x - 26 else if (x < 1) x + 26 else x
  })
  alphabet.shifted <- letters[alphabet.shifted.idx]
  keyMat <- cbind(IN = letters, OUT = alphabet.shifted)
  # encipher: if nonletter, leave it alone, else look it up in the key
  x1.1 <- as.vector(sapply(x.vec, function(s) {
    if (!s %in% letters) s else keyMat[match(s, keyMat[, "IN"]), "OUT"]
  }, USE.NAMES = FALSE))
  x2 <- paste0(x1.1, collapse = "")
  if (key) {
    out <- list(IN = x, OUT = x2, KEY = keyMat)
  } else {
    out <- x2
  }
  return(out)
}
enciphR(letters,+1) yields "bcdefghijklmnopqrstuvwxyza"
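For comparison, the same kind of shift can be sketched in a few lines with base R's chartr(). This is only an illustration, not the function used in the rest of the question: caesar_shift is a hypothetical name, and unlike enciphR() it keeps a character vector as separate elements instead of collapsing it into one string.

```r
# Hypothetical helper: shift letters by `alg` positions, leave everything else alone.
caesar_shift <- function(x, alg) {
  k <- alg %% 26                               # normalise negative or large shifts
  shifted <- letters[((0:25 + k) %% 26) + 1]   # rotated alphabet
  chartr(paste(letters, collapse = ""),
         paste(shifted, collapse = ""),
         tolower(x))                           # lower-case first, as enciphR() does
}

caesar_shift("Hello, World!", 1)   # "ifmmp, xpsme!"
caesar_shift("abc", -1)            # "zab"
```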
I want to feed a ggplot object into another function, ggCiphR(), which uses the enciphR() helper to encode all text elements of that object according to the supplied cipher algorithm.
So, start with a figure:
data("AirPassengers")
require(ggplot2);require(reshape2)
df <- data.frame(Passengers=as.numeric(AirPassengers), year = as.factor(trunc(time(AirPassengers))), mnth = month.abb[cycle(AirPassengers)])
(gg<-qplot(mnth,Passengers,aes(col=year),data=df)+geom_line(aes(group=year,col=year)))
Now I define the ggCiphR function:
ggCiphR <- function(ggGraph, alg) {
  g <- ggGraph
  pass2enciphR <- function(x) enciphR(x, alg = alg)
  # process all labels
  g$labels <- lapply(g$labels, pass2enciphR)
  # process custom legend if present (sometimes works, depending on ggplot object)
  try(if (length(g$scales$scales) > 0) {
    for (i in seq_along(g$scales$scales)) {
      # encipher the scale's name, not the scale object itself
      g$scales$scales[[i]]$name <- enciphR(g$scales$scales[[i]]$name, alg = alg)
    }
  })
  g
}
ggCiphR(gg,+1)
Yields:
This is getting there. Axis labels and legend title have been enciphered, but not tick values. Also, if my factor was Airline, instead of Year, I'd like that to be sent through enciphR, as well.
I've searched high and low and cannot for the life of me figure out how to modify x- and y-axis values and factor levels from the gg object. I don't want to have to recast the dataframe and regenerate the ggplot; ideally I can do this programmatically using the gg object. It would also (ideally) be flexible enough to work with different classes of data in X and Y. Thoughts?

Related

Add text to a ggpairs() scatterplot?

Dumb but maddening question: how can I add text labels to my scatterplot points in a ggpairs(...) plot? ggpairs(...) is from the GGally library. The normal geom_text(...) function doesn't seem to be an option, as it takes x,y arguments and ggpairs creates an NxN matrix of differently-styled plots.
Not showing data, but imagine I have a column called "ID" with id's of each point that's displayed in the scatterplots.
Happy to add data if it helps, but not sure it's necessary. And maybe the answer is simply that it isn't possible to add text labels to ggpairs(...)?
library(ggplot2)
library(GGally)
ggpairs(hwWrld[, c(2,6,4)], method = "pearson")
Note: Adding labels is for my personal reference. So no need to tell me it would look like an absolute mess. It will. I'm just looking to identify my outliers.
Thanks!
It is most certainly possible. Looking at the documentation for ?GGally::ggpairs, there are three arguments, upper, lower and diag, which the details section of the documentation describes as follows:
Upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete' and 'na'. Each element of the list may be a function or a string
... (more description)
If a function is supplied as an option to upper, lower, or diag, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.
Thus a way to "make a label" would be to overwrite the default value of a plot. For example if we wanted to write "hello world" in the upper triangle we could do something like:
library(ggplot2)
library(GGally)
#' Plot continuous upper function, by adding text to the standard plot
#' text is placed straight in the middle, over anything already residing there!
continuous_upper_plot <- function(data, mapping, text, ...){
  p <- ggally_cor(data, mapping, ...)
  if(!is.data.frame(text))
    text <- data.frame(text = text)
  lims <- layer_scales(p)
  p + geom_label(data = text,
                 aes(x = mean(lims$x$range$range),
                     y = mean(lims$y$range$range),
                     label = text),
                 inherit.aes = FALSE)
}
ggpairs(iris, upper = list(continuous = wrap(continuous_upper_plot,
                                             text = 'hello world')))
with the end result being:
There are 3 things to note here:
1. I've decided to add the text in the function itself. If your text is part of your existing data, simply using the mapping (aes) argument when calling the function will suffice. And this is likely also better, as you are looking to add text to specific points.
2. If you have any additional arguments to a function (outside data and mapping) you will need to use wrap to add these to the call.
3. The function documentation specifically says that arguments should be data, mapping rather than the standard for ggplot2, which is mapping, data. As such, for any of the ggplot functions a small wrapper switching their positions will be necessary to overwrite the default arguments for ggpairs.
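To make the first and second notes concrete, here is a minimal sketch of a panel function that labels every point from a column of the data. It assumes a reasonably recent ggplot2 (for the .data pronoun) and GGally; labelled_points, label_col and the ID column added to iris are hypothetical names chosen for illustration, not part of either package.

```r
library(ggplot2)
library(GGally)

# Hypothetical panel function following the function(data, mapping, ...) API:
# a scatterplot with one text label per point.
# `label_col` is an assumed extra argument naming the column that holds the IDs.
labelled_points <- function(data, mapping, label_col, ...) {
  ggplot(data, mapping) +
    geom_point(...) +
    geom_text(aes(label = .data[[label_col]]), size = 2, vjust = -0.5)
}

# Stand-in for the "ID" column described in the question
iris_id <- transform(iris, ID = seq_len(nrow(iris)))

# Extra arguments (here label_col) are supplied through wrap()
p_labelled <- ggpairs(
  iris_id, columns = 1:3,
  lower = list(continuous = wrap(labelled_points, label_col = "ID"))
)
p_labelled
```

Only the lower triangle is replaced here, so the upper and diagonal panels keep their defaults.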

Adding points after the fact with ggplot2; user defined function

I believe the answer to this is that I cannot, but rather than give in utterly to depraved desperation, I will turn to this lovely community.
How can I add points (or any additional layer) to a ggplot after already plotting it? Generally I would save the plot to a variable and then just tack on + geom_point(...), but I am trying to include this in a function I am writing. I would like the function to make a new plot if plot=T, and add points to the existing plot if plot=F. I can do this with the basic plotting package:
fun <- function(df,plot=TRUE,...) {
...
if (!plot) { points(dYdX~Time.Dec,data=df2,col=col) }
else { plot(dYdX~Time.Dec,data=df2,...) }}
I would like to run this function numerous times with different dataframes, resulting in a plot with multiple series plotted.
For example,
fun(df.a,plot=T)
fun(df.b,plot=F)
fun(df.c,plot=F)
fun(df.d,plot=F)
The problem is that because functions in R don't have side-effects, I cannot access the plot made in the first command. I cannot save the plot to -> p, and then recall p in the later functions. At least, I don't think I can.
Have your function return a ggplot plot object that you can feed to the next function call, like this:
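ggfun = function(df, oldplot, plot=T){
  ...
  if(plot){
    # new plot: the point layer inherits its data from ggplot()
    outplot = ggplot(df, ...) + geom_point(...)
  }else{
    # existing plot: add a point layer with its own data
    outplot = oldplot + geom_point(data=df, ...)
  }
  print(outplot)
  return(outplot)
}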
Remember to assign the plot object returned to a variable:
cur.plot = ggfun(...)
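A runnable sketch of the same idea follows. It assumes every data frame has columns named x and y; add_series and the toy data frames are hypothetical and stand in for the elided parts of ggfun above.

```r
library(ggplot2)

# Hypothetical helper: start a new plot when `oldplot` is NULL,
# otherwise add another point layer to the plot that was passed in.
add_series <- function(df, oldplot = NULL, colour = "black") {
  layer <- geom_point(data = df, mapping = aes(x = x, y = y), colour = colour)
  if (is.null(oldplot)) ggplot() + layer else oldplot + layer
}

df.a <- data.frame(x = 1:5, y = (1:5)^2)
df.b <- data.frame(x = 1:5, y = 2 * (1:5))

p <- add_series(df.a)                                # new plot
p <- add_series(df.b, oldplot = p, colour = "red")   # add to the existing plot
p
```

Because each layer carries its own data, the function never needs access to a plot it did not receive as an argument.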

How to only change parameters for "lower" plots in the ggpairs function from GGally package

I have the following example
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
Resulting in a really nice figure:
But my problem is that in the real dataset I have too many points, so I would like to change the parameters for the point geom. I want to reduce the dot size and use a lower alpha value. However, I cannot do this with the "params" option, because it applies to all plots, not just the lower ones:
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
params=c(alpha=1/10),
axisLabels='show'
)
resulting in this plot:
Is there a way to apply parameters to only "lower" plots - or do I have to use the ability to create custom plots as suggested in the topic How to adjust figure settings in plotmatrix?
In advance - thanks!
There doesn't seem to be any elegant way to do it, but you can bodge it by writing a function to get back the existing subchart calls from the ggpairs() object and then squeezing the params in before the last bracket. [Not very robust: it will only work if the graphs are already valid.]
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
g<-ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
add_p<-function(g,i,params){
  side=length(g$columns) # get number of cells per side
  lapply(i,function(i){
    s<-as.character(g$plots[i]) # get existing call as a template
    l<-nchar(s)
    p<-paste0(substr(s,1,l-1),",",params,")") # append params before last bracket
    r<-(i-1)%/%side+1 # work out the row on the grid (1-based)
    c<-(i-1)%%side+1 # and the column, so cells in the last column do not come out as 0
    array(c(p,r,c)) # return the sub-plot and position data
  })
}
rep_cells<-c(4,7,8)
add_params<-"alpha=0.3, size=0.1, color='red'"
ggally_data<-g$data # makes sure that the internal parameter picks up your data (it always calls its data 'ggally_data')
calls<-add_p(g,rep_cells,params=add_params) #call the function
for(i in 1:length(calls)){g<-putPlot(g,calls[[i]][1],as.numeric(calls[[i]][2]),as.numeric(calls[[i]][3]))}
g # call the plot
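In GGally versions that provide the wrap() helper described in the ggpairs documentation quoted earlier on this page, the same effect can probably be had without editing calls as strings. The sketch below assumes such a version is installed; the alpha and size values and the name g_lower are arbitrary choices for illustration.

```r
library(ggplot2)
library(GGally)

data(diamonds, package = "ggplot2")
set.seed(1)  # arbitrary seed, only so the sample is reproducible
diamonds.samp <- diamonds[sample(nrow(diamonds), 200), ]

# wrap() attaches the extra parameters to the lower-triangle panels only;
# the upper and diagonal panels keep their defaults.
g_lower <- ggpairs(
  diamonds.samp, columns = 8:10,
  lower = list(continuous = wrap("points", alpha = 0.3, size = 0.1)),
  axisLabels = "show"
)
g_lower
```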

How to put ggplot2 ticks labels between dollars?

In order to "convert" a ggplot2 graphic to a PDF LaTeX graphic with the tikzDevice package, I'd like to put the axis tick labels between two $. Of course I can do it if I specify the tick labels manually, but how can I do it when using the automatic tick labels? (In particular, when using dates on the x-axis it is hard to specify the labels manually.)
Update - solution with axis labels formatter
Based on @agstudy's answer, I have written the dollarify() formatter for numerical labels:
dollarify <- function(){
function(x) paste0("$",x,"$")
}
and the datify() formatter for dates:
datify <- function(){
function(x){
split <- stringr::str_split_fixed(as.character(x),"-",3)
out <- character(nrow(split))
for(i in 1:length(out)){
out[i] <- paste0("\\formatdate{", split[i,3], "}{", split[i,2], "}{", split[i,1], "}")
}
out
}
}
which generates a LaTeX code to be used with the datetime package:
\usepackage[ddmmyyyy]{datetime}
Below is a screenshot of a rendering, using the following scale for the x-axis:
scale_x_date(breaks="2 months", labels=datify())
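As a usage sketch for the numeric case, dollarify() can be handed to a continuous scale in the same way. The mtcars plot is only a stand-in example, and the formatter is repeated so the snippet runs on its own.

```r
library(ggplot2)

# Same formatter factory as above
dollarify <- function() {
  function(x) paste0("$", x, "$")
}

# The scale calls the returned function on its automatically chosen breaks
p_dollar <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  scale_y_continuous(labels = dollarify())
p_dollar

dollarify()(c(10, 20))   # "$10$" "$20$"
```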
It is not clear what you want to do but I think you are looking for an axis labels formatter.
## forma :you can give here any date format
dollar_date_format <- function (forma = "%H:%M"){
function(x) paste0("$",format(x,forma),"$")
}
Then, using some data (please provide a reproducible example next time), you can use it like this:
DF <- data.frame(time=Sys.time()+1:10,count=1:10)
library(ggplot2)
qplot(x=time,y=count,data=DF)+
scale_x_datetime(labels = dollar_date_format(forma = "%M:%S"))+
xlab("Time (dollars)") +
theme(axis.text.x =element_text(size=20))
If you want to generically be able to modify the axis labels, you can run the plot, use ggplot_build() to get the plotted labels back, then add e.g. scale_x_continuous()/scale_x_date() with custom labels on the rendered breaks. You will need to tweak it depending on datatypes (look in the build variable to see what data's available).
You might want to use $x/y.labels or $x/y.major_source depending on datatype
x=c("2013-03-22","2013-04-24","2013-07-01","2013-09-13")
y=c(1,2,3,4)
#any ggplot object
g<-qplot(as.Date(x),y)
#call the rendered axis labels
build<-ggplot_build(g)
xrng<-data.frame(build$panel$ranges[[1]]$x.major_source,stringsAsFactors=FALSE)
yrng<-data.frame(build$panel$ranges[[1]]$y.labels,stringsAsFactors=FALSE)
colnames(xrng)<-"value"
colnames(yrng)<-"value"
#create custom labels
xrng$lab<-paste0("$",row.names(xrng),"$")
yrng$lab<-paste0("$",yrng$value,"$")
#re-render with custom labels
g+scale_x_date(breaks=as.Date(xrng$value),labels=xrng$lab) +
scale_y_continuous(breaks=as.numeric(yrng$value),labels=yrng$lab)
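The build$panel$ranges structure used above belongs to older ggplot2 releases. As a hedged sketch of the same idea for newer releases, assuming ggplot2 3.3.0 or later, where the rendered breaks appear to live under layout$panel_params, something like the following may work; inspect the build object on your own version before relying on it.

```r
library(ggplot2)

x <- as.Date(c("2013-03-22", "2013-04-24", "2013-07-01", "2013-09-13"))
g <- ggplot(data.frame(x = x, y = 1:4), aes(x, y)) + geom_point()

# Assumption: in ggplot2 >= 3.3.0 each panel's x entry is a view scale
# exposing get_breaks() and get_labels()
xs   <- ggplot_build(g)$layout$panel_params[[1]]$x
brks <- xs$get_breaks()     # numeric; breaks outside the limits are NA
labs <- xs$get_labels()
keep <- !is.na(brks)

# Re-render with the same breaks and dollar-wrapped labels
g2 <- g + scale_x_date(
  breaks = as.Date(brks[keep], origin = "1970-01-01"),
  labels = paste0("$", labs[keep], "$")
)
g2
```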

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them as lines (a line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
  ## firstpoint is the y value of the first point in the series
  lastpt <- firstpoint
  lastptind <- 1
  addpoint <- function(nextpt, ...){
    pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
    points(pts, ... )
    lastpt <<- nextpt
    lastptind <<- lastptind + 1
  }
  return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
  myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
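The non-closure variant mentioned above could look roughly like this. It is a minimal sketch using the first few values from the question; prev_i, prev_y and n_drawn are hypothetical bookkeeping variables.

```r
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437)

plot(NA, NA, xlim = c(1, length(PupilLeft)), ylim = range(PupilLeft),
     xlab = "Index", ylab = "PupilLeft")

# Keep track of the previous point by hand instead of inside a closure
prev_i  <- 1
prev_y  <- PupilLeft[1]
n_drawn <- 0
for (i in 2:length(PupilLeft)) {
  # any test deciding whether to plot this point would go here
  lines(c(prev_i, i), c(prev_y, PupilLeft[i]), type = "o", col = "red")
  prev_i  <- i
  prev_y  <- PupilLeft[i]
  n_drawn <- n_drawn + 1
}
```

Each pass draws one segment from the stored previous point to the current one, which is the step the original single-point loop was missing.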
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0,y0, x1,y1
npups <-length(pupil_data$PupilLeft)
segments(1:(npups-1), pupil_data$PupilLeft[-npups], # the starting points
2:npups, pupil_data$PupilLeft[-1] ) # the ending points

Resources