I am trying to write a function that will produce what I regard as a real dot plot (unlike the Cleveland variety, I require a univariate scatterplot with the dots stacked for (nearly) equal values). I have come close:
In this illustration, the dots you see are actually rotated text strings of lower-case "o"s. It is done this way because I need the dot spacing to stay constant if the plot is re-scaled. However, I'd like something better than lower-case "o"s, for example, filled dots instead of circles. This could be done if I could access the font that is used for the standard plotting symbols (pch = 1:25 in the plot function and relatives). Then I could make a text string with that font and get what's needed. Does anybody know how to do that?
PS - No, a histogram with lots of bins is not an acceptable substitute.
I did find a way to get the desired dot plot using low-level graphics parameters (namely "usr", the actual user coordinates of the plotting area, and "cxy", the character size). The recordGraphics() function wraps the part that needs to be changed when the graph is resized. Here's the function:
dot.plot = function(x, pch = 16, bins = 50, spacing = 1, xlab, ...) {
if(missing(xlab))
xlab = as.character(substitute(x))
# determine dot positions
inc = diff(pretty(x, n = bins)[1:2])
freq = table(inc * round(x / inc, 0))
xx = rep(as.numeric(names(freq)), freq)
yy = unlist(lapply(freq, seq_len))
# make the order of the dots the same as the order of the data
idx = seq_along(x)
idx[order(x)] = idx
xx = xx[idx]
yy = yy[idx]
# make a blank plot
plot(xx, yy, type = "n", axes = FALSE, xlab = xlab, ylab = "")
# draw scale
axis(1)
ylow = par("usr")[3]
abline(h = ylow) # extend to full width
# draw points and support resizing
recordGraphics({
yinc = 0.5 * spacing * par("cxy")[2]
points(xx, ylow + yinc * (yy - .5), pch = pch, ...)
},
list(),
environment(NULL))
invisible()
}
The spacing argument may be used if you want a tighter or looser gap between dots. An example...
with(iris, dot.plot(Sepal.Length, col = as.numeric(Species)))
This is a better solution than trying to do it with text, but also a little bit scary because of the warnings you see in the documentation for recordGraphics
Related
How can such a non-linear transformation be done?
here is the code to draw it
my.sin <- function(ve,a,f,p) a*sin(f*ve+p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1+s2+10+1:100
par(mfrow=c(1,2),mar=rep(2,4))
plot(s,t="l",main = "input") ; abline(h=seq(10,120,by = 5),col=8)
plot(s*7,t="l",main = "output")
abline(h=cumsum(s)/10*2,col=8)
don't look at the vector, don't look at the values, only look at the horizontal grid, only the grid matters
####UPDATE####
I see that my question is not clear to many people, I apologize for that...
Here are examples of transformations only along the vertical axis, maybe now it will be more clear to you what I want
link Source
#### UPDATE 2 ####
Thanks for your answer, this looks like what I need, but I have a few more questions if I may.
To clarify, I want to explain why I need this, I want to compare vectors with each other that are non-linearly distorted along the horizontal axis .. Maybe there are already ready-made tools for this?
You mentioned that there are many ways to do such non-linear transformations, can you name a few of the best ones in my case?
how to make the function f() more non-linear, so that it consists, for example, not of one sinusoid, but of 10 or more. Тhe figure shows that the distortion is quite simple, it corresponds to one sinusoid
and how to make the function f can be changed with different combinations of sinusoids.
set.seed(126)
par(mar = rep(2, 4),mfrow=c(1,3))
s <- cumsum(rnorm(100))
r <- range(s)
gridlines <- seq(r[1]*2, r[2]*2, by = 0.2)
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
f <- function(x) 2 * sin(x)/2 + x
plot(s, t = "l", main = "input+new greed")
abline(h = f(gridlines), col = 8)
plot(f(s), t = "l", main = "output")
abline(h = f(gridlines), col = 8)
If I understand you correctly, you wish to map the vector s from the regular spacing defined in the first image to the irregular spacing implied by the second plot.
Unfortunately, your mapping is not well-defined, since there is no clear correspondence between the horizontal lines in the first image and the second image. There are in fact an infinite number of ways to map the first space to the second.
We can alter your example a bit to make it a bit more rigorous.
If we start with your function and your data:
my.sin <- function(ve, a, f, p) a * sin(f * ve + p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1 + s2 + 10 + 1:100
Let us also create a vector of gridlines that we will draw on the first plot:
gridlines <- seq(10, 120, by = 2.5)
Now we can recreate your first plot:
par(mar = rep(2, 4))
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
Now, suppose we have a function that maps our y axis values to a different value:
f <- function(x) 2 * sin(x/5) + x
If we apply this to our gridlines, we have something similar to your second image:
plot(s, t = "l", main = "input")
abline(h = f(gridlines), col = 8)
Now, what we want to do here is effectively transform our curve so that it is stretched or compressed in such a way that it crosses the gridlines at the same points as the gridlines in the original image. To do this, we simply apply our mapping function to s. We can check the correspondence to the original gridlines by plotting our new curves with a transformed axis :
plot(f(s), t = "l", main = "output", yaxt = "n")
axis(2, at = f(20 * 1:6), labels = 20 * 1:6)
abline(h = f(gridlines), col = 8)
It may be possible to create a mapping function using the cumsum(s)/10 * 2 that you have in your original example, but it is not clear how you want this to correspond to the original y axis values.
Response to edits
It's not clear what you mean by comparing two vectors. If one is a non-linear deformation of the other, then presumably you want to find the underlying function that produces the deformation. It is possible to create a function that applies the deformation empirically simply by doing f <- approxfun(untransformed_vector, transformed_vector).
I didn't say there were many ways of doing non-linear transformations. What I meant is that in your original example, there is no correspondence between the grid lines in the original picture and the second picture, so there is an infinite choice for which gridines in the first picture correspond to which gridlines in the second picture. There is therefore an infinite choice of mapping functions that could be specified.
The function f can be as complicated as you like, but in this scenario it should at least be everywhere non-decreasing, such that any value of the function's output can be mapped back to a single value of its input. For example, function(x) x + sin(x)/4 + cos(3*(x + 2))/5 would be a complex but ever-increasing sinusoidal function.
I have a plot() where multiple colour shadings represent the same thing. I would like to add a legend that conveys this by having dual-coloured boxes (see example below). Is there any way to do this using legend() or a similar command? Alternatively, is there a way to identify the precise coordinates of these boxes so I can plot a polygon() triangle over it?
Note: legend() does return the coordinates of the outer box and the top left of each labels, but I am not sure if this is sufficient to calculate where the coloured boxes are.
This is a bit of a hack, but you could put two legends on top of another. Unfortunately, there is no left triangle pch which would have been exactly as you wanted.
plot(1)
legend("bottomright",c("Label 1", "Label 2"),pch=22,col=c("red","blue"),pt.bg=c("red","blue"), pt.cex=1.8)
legend("bottomright",c("Label 1", "Label 2"),pch=21,col=c("green","orange"),pt.bg=c("green","orange"))
A slightly dirty hack can allow you to get the legend() function to give you the necessary information. A smarter person than me would probably work out how legend() calculates box positioning and replicate this outside the function. Note that editing standard R functions is probably not recommended.
If you have not edited R functions yet, an easy (and temporary) way to access it, is typing
fix(legend)
Typing
rm(legend)
later will undo your changes.
Find this section that says fill <- rep and add the lines indicated by the comments:
fillList <- NULL ## added
if (mfill) {
if (plot) {
fill <- rep(fill, length.out = n.leg)
rect2(left = xt, top = yt + ybox/2, dx = xbox, dy = ybox,
col = fill, density = density, angle = angle,
border = border)
fillList <- data.frame(left = xt, top = yt + ybox/2, dx = xbox, dy = ybox) ## added
}
xt <- xt + dx.fill
}
Find the very last line and change it to
invisible(list(rect = list(w = w, h = h, left = left, top = top),
text = list(x = xt, y = yt), fillList=fillList)) ## modified
Now call legend via
output <- legend(...) ## replace ... with whatever you want to have as inputs
and the draw triangles using the information returned by legend() like so:
with(output$fillList[1,], { ## first box
polygon(c(left, left+dx, left+dx), c(top, top, top-dy), col=myColour, border=NA)
})
I am trying to use a larger font size for those who have poor eyesight.
library(quantmod)
getSymbols("SPY", from="2013-11-01", to=Sys.Date())
chart_Series(SPY)
myPars <-chart_pars()
myPars$cex<-1.5
chart1 <- chart_Series(SPY, pars=myPars)
chart1
However, when I do this, only a part of the y axis numbers scale are shown.
Is it possible to shift the chart, so the y axis numbers scale are not cut off.
Thank you for your help.
When I try your code (Note this is in R 3.1.0 in R studio though), the y axis numbers scale doesn't appear to be cut off. Nevertheless, you can adjust the chart_pars() (as you've already partly done) and chart_theme() to achieve what you want.
To reduce crowding on the y-axis, you can adjust the margin parameters, $mar, of chart_pars(). Increase the left (and or right) margin parameter values to remove crowding of the y-axis numbers. You can also consider removing either the left or right y-axis scale to save more space. Here's an example, with explanations:
library(quantmod)
getSymbols("SPY", from="2013-11-01", to=Sys.Date())
myPars <- chart_pars()
myPars$mar <- c(3, 2, 0, .2) # default is c(3, 1, 0, 1) # bottom, left, top, right
myPars$cex <- 1.5 #' Increase font size of both x and y axis scale ticks
mychartTheme <- chart_theme()
mychartTheme$rylab = FALSE #' Don't show y-axis on right side of plot to save space
# mychartTheme$lylab = TRUE #' Show y-axis ticks on left side of plot? Default is TRUE for both left and right sides.
chart1 <- chart_Series(SPY, pars=myPars, theme = mychartTheme)
chart1
The plot you'll get from this code is:
Furthermore, in case you're interested in editing the number of decimal places displayed on the y-axis scale numbers (e.g. in FX for currencies quoted in pips at 1e-4), you can edit the source code of chart_Series at certain lines to get what you desire.
For example, to plot to 4 decimal places on the LEFT y-axis only, and offset the printing of the y-axis numbers to the left (so they don't plot under the bars near the left margin of the plot), you could edit lines 143-147 of chart_Series like so (create a copy of chart_Series with the following edits):
#' my.chart_Series is identical to the definition of chart_Series, with these minor edits
my.chart_Series <- function (x, name = deparse(substitute(x)), type = "candlesticks",
subset = "", TA = "", pars = chart_pars(), theme = chart_theme(),
clev = 0, ...)
{
cs <- new.replot()
....
[lines 143-147]: if (theme$lylab) {
cs$add(expression(text(1 - 1/3 - max(strwidth(alabels)),
alabels, sprintf("%.4f", alabels), #alabels, noquote(format(alabels, justify = "left", digits = 4)),
col = theme$labels, offset = -2, cex = 0.9, pos = 4,
xpd = TRUE)), expr = TRUE)
} #' default offset = 0
....
}
And then in your R script to see this effect write something like this:
source('my.chart_Series.R')
environment(my.chart_Series) <- environment(get("chart_Series", envir = asNamespace("quantmod")))
assignInNamespace(x = "chart_Series", value = my.chart_Series, ns = "quantmod")
myPars <- chart_pars()
myPars$mar <- c(3, 3, 0, .2) # default is c(3, 1, 0, 1) # bottom, left, top, right
myPars$cex <- 1.0 #' Increase font size of both x and y axis scale ticks
mychartTheme <- chart_theme()
mychartTheme$rylab = FALSE #' Don't show y-axis on right side of plot to save space
# mychartTheme$lylab = TRUE #' Show y-axis ticks on left side of plot? Default is TRUE for both left and right sides.
chart1 <- quantmod:::chart_Series(SPY, pars=myPars, theme = mychartTheme) #' Note the need to prepend the quantmod namespace to the modified visible chart_Series function in quantmod.
chart1
This will give you this plot:
Also, to reduce the number of ticks drawn on the y axis, you can also modify line 128 in chart_Series like so:
p <- pretty(ylim, n = 5) #' The original source code (quantmod 0.4-0) is p <- pretty(ylim, 10)
See the R help documentation for constraints on the n argument in the R function pretty. This gives:
One of the things that most bugs me about R is the separation of the plot, points, and lines commands. It's somewhat irritating to have to change plot to whatever variant for the first plot done, and to have to replot from scratch if you failed to have set the correct the ylim and xlim initially. Wouldn't it be nice to have one command that:
Picks lines, points or both via an argument, as in plot(..., type = "l") ?
By default, chooses whether to create a new plot, or add to an existing one according to whether the current device is empty or not.
Rescales axes automatically if the added elements to the plot exceed the current bounds.
Has anyone done anything like this? If not, and there's no strong reason why this isn't possible, I'll answer this myself in a bit...
Some possible functionality that may help with what you want:
The matplot function uses base graphics and will plot several sets of points or lines in one step, figuring out the correct ranges in one step.
There is an update method for lattice graphics that can be used to add/change things in the plot and will therefore result in automatic recalculation of things like limits and axes.
If you add additional information (useing +) to a ggplot2 plot, then the things that are automatically calculated will be recalculated.
You already found zoomplot and there is always the approach of writing your own function like you did.
Anyway, this is what I came up with: (It uses zoomplot from TeachingDemos)
fplot <- function(x, y = NULL, type = "l", new = NULL, xlim, ylim, zoom = TRUE,...){
require(TeachingDemos)
if (is.null(y)){
if (length(dim(x)) == 2){
y = x[,2]
x = x[,1]
} else {
y = x
x = 1:length(y)
}
}
if ( is.null(new) ){
#determine whether to make a new plot or not
new = FALSE
if (is.null(recordPlot()[[1]])) new = TRUE
}
if (missing(xlim)) xlim = range(x)
if (missing(ylim)) ylim = range(y)
if (new){
plot(x, y, type = type, xlim = xlim, ylim = ylim, ...)
} else {
if (type == "p"){
points(x,y, ...)
} else {
lines(x,y, type = type, ...)
}
if (zoom){
#rescale plot
xcur = par("usr")[1:2]
ycur = par("usr")[3:4]
#shrink coordinates and pick biggest
xcur = (xcur - mean(xcur)) /1.08 + mean(xcur)
ycur = (ycur - mean(ycur)) /1.08 + mean(ycur)
xlim = c(min(xlim[1], xcur[1]), max(xlim[2], xcur[2]))
ylim = c(min(ylim[1], ycur[1]), max(ylim[2], ycur[2]))
#zoom plot
zoomplot(xlim, ylim)
}
}
}
So you can do, e.g.
dev.new()
fplot(1:4)
fplot(1:4 +1, col = 2)
fplot(0:400/100 + 1, sin(0:400/10), type = "p")
dev.new()
for (k in 1:20) fplot(sort(rnorm(20)), type = "b", new = (k==1) )
par(mfrow) and log axis don't currently work well with zooming, but, it's a start...
I just came a cross this nice code that makes this scatter matrix plot:
(source: free.fr)
And wanted to implement it to a likret scale variables (integers of 1 to 5) by making the dot's sizes/colors (in the lower triangle) differ according to how many options of that type occurs (like the effect the jitter might have given me).
Any idea on how to do this on the base plotting mechanism ?
Update:
I made the following function, but don't know how to have the scale of the dots always be "good", what do you think ?
panel.smooth2 <- function (x, y, col = par("col"), bg = NA, pch = par("pch"),
cex = 1, col.smooth = "red", span = 2/3, iter = 3, ...)
{
require(reshape)
z <- merge(data.frame(x,y), melt(table(x ,y)),sort =F)$value
z <- z/ (4*max(z))
symbols( x, y, circles = z,#rep(0.1, length(x)), #sample(1:2, length(x), replace = T) ,
inches=F, bg="blue", fg = bg, add = T)
# points(x, y, pch = pch, col = col, bg = bg, cex = cex)
ok <- is.finite(x) & is.finite(y)
if (any(ok))
lines(stats::lowess(x[ok], y[ok], f = span, iter = iter),
col = col.smooth, ...)
}
a1 <- sample(1:5, 100, replace = T)
a2 <- sample(1:5, 100, replace = T)
a3 <- sample(1:5, 100, replace = T)
aa <- data.frame(a1,a2,a3)
pairs(aa , lower.panel=panel.smooth2)
You can use 'symbols' (analogous to the methods 'lines', 'abline' et al.)
This method will give you fine-grained control over both symbols size and color in a single line of code.
Using 'symbols' you can set the symbol size, color, and shape. Shape and size are set by passing in a vector for the size of each symbol and binding it to either 'circles', 'squares', 'rectangles', or 'stars', e.g., 'stars' = c(4, 3, 5, 1). Color is set with 'bg' and/or 'fg'.
symbols( x, y, circles = circle_radii, inches=1/3, bg="blue", fg=NULL)
If i understand the second part of your question, you want to be reasonably sure that the function you use to scale the symbols in your plot does so in a meaningful way. The 'symbols' function scales (for instance) the radii of circles based on values in a 'z' variable (or data.frame column, etc.) In the line below, I set the max symbol size (radius) as 1/3 inches--every symbol except for the largest has a radius some fraction smaller, scaled by the ratio of the value of that dat point over the largest value. than that one in proportion to Is this a good choice? I don't know--it seems to me that diameter or particularly circumference might be better. In any event, that's a trivial change. In sum, 'symbols' with 'circles' passed in will scale the radii of the symbols in proportion to the 'z' coordinate--probably best suited for continuous variables. I would use color ('bg') for discrete variables/factors.
One way to use 'symbols' is to call your plot function and pass in type='n' which creates the plot object but suppresses drawing the symbols so that you can draw them with the 'symbols' function next.
I would not recommend 'cex' for this purpose. 'cex' is a scaling factor for both text size and symbols size, but which of those two plot elements it affects depends on when you pass it in--if you set it via 'par' then it acts on most of the text appearing on the plot; if you set it within the 'plot' function then it affects symbols size.
Sure, just use cex:
set.seed(42)
DF <- data.frame(x=1:10, y=rnorm(10)*10, z=runif(10)*3)
with(DF, plot(x, y, cex=z))
which gives you varying circle sizes. Color can simply be a fourth dimension.