How can I prevent resizing of fonts, plot objects etc. in R? - r

I want to have multiple plots in the same image, with a different number of plots depending on the image. To be precise, I first create a 1x2 matrix of plots, and then a 3x2 matrix of plots. I want to use the same basic settings for these two images - especially the same font sizes, since this is for a paper and the font size has to be at least 6 pt in a plot.
In order to achieve this, I wrote the following code for R:
filename = "test.png"
font.pt = 6 # font size in pts (1/72 inches)
total.w = 3 # total width in inches
plot.ar = 4/3 # aspect ratio for single plot
mat.col = 2 # number of columns
mat.row = 1 # number of rows
dpi = 300
plot.mar = c(3, 3, 1, 2) + 0.1 # margins in lines (bottom, left, top, right)
plot.mgp = c(2, 1, 0)
plot.w = total.w / mat.col - 0.2 * plot.mar[2] - 0.2 * plot.mar[4] # panel width in inches (one margin line = 0.2 in)
plot.h = plot.w / plot.ar
total.h = (plot.h + 0.2 * plot.mar[1] + 0.2 * plot.mar[3]) * mat.row
png(filename, width = total.w, height = total.h, res = dpi * 12 / font.pt, units = "in")
par(mfrow = c(mat.row, mat.col), mai = 0.2 * plot.mar, mgp = plot.mgp)
plot(1, 1, axes = TRUE, type = 'p', pch = 20, xlab = "Y Test", ylab = "X Test")
dev.off()
As you can see, I set a total width of 3 inches and then calculate the total height of the image so that the aspect ratio of the plots is correct. The font size enters only as a scaling factor on the resolution.
Anyway, the problem is that the font size changes significantly when I go from mat.row = 1 to mat.row = 3. Other things change as well, for example the axis labelling and the margins, even though I specifically set those in inches beforehand. Have a look:
When 3 rows are set (cropped image):
When only 1 row is set (cropped image):
How can I prevent this? As far as I can see, I did everything I could. This took me quite a while, so I'd like to get this to work instead of switching to ggplot2 and learning everything from scratch again. It's also small enough that I really hope I'm just missing something very obvious.

In ?par we can find:
In a layout with exactly two rows and columns the base value of "cex"
is reduced by a factor of 0.83: if there are three or more of either
rows or columns, the reduction factor is 0.66.
Therefore, when you change your mfrow value from c(1, 2) to c(3, 2), the base cex changes from 1 (no reduction for a 1x2 layout) to 0.66 (three or more rows). cex affects font size and text line height.
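You can verify this directly (a quick sketch; run it on a fresh device, since setting mfrow resets the base cex):
par(mfrow = c(1, 2)); par("cex")  # 1    - no reduction for a 1x2 layout
par(mfrow = c(2, 2)); par("cex")  # 0.83 - exactly two rows and two columns
par(mfrow = c(3, 2)); par("cex")  # 0.66 - three or more rows (or columns)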
So you can manually specify the cex value for your plots:
par(mfrow = c(mat.row, mat.col), mai = 0.2 * plot.mar, mgp = plot.mgp, cex = 1)
Hope this is what you need.
Plot for mat.row = 1 (cropped):
And plot for mat.row = 3 (cropped):

Related

non-linear 2d object transformation by horizontal axis

How can such a non-linear transformation be done?
Here is the code to draw it:
my.sin <- function(ve, a, f, p) a * sin(f * ve + p) # amplitude a, frequency f, phase p, evaluated over vector ve
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1+s2+10+1:100
par(mfrow=c(1,2),mar=rep(2,4))
plot(s,t="l",main = "input") ; abline(h=seq(10,120,by = 5),col=8)
plot(s*7,t="l",main = "output")
abline(h=cumsum(s)/10*2,col=8)
Don't look at the vector or the values; look only at the horizontal grid. Only the grid matters.
#### UPDATE ####
I see that my question is not clear to many people; I apologize for that.
Here are examples of transformations along the vertical axis only; maybe now it is clearer what I want.
#### UPDATE 2 ####
Thanks for your answer, this looks like what I need, but I have a few more questions if I may.
To clarify why I need this: I want to compare vectors with each other that are non-linearly distorted along the horizontal axis. Maybe there are already ready-made tools for this?
You mentioned that there are many ways to do such non-linear transformations; can you name a few that would be best in my case?
How can I make the function f() more non-linear, so that it consists of, for example, not one sinusoid but 10 or more? The figure shows that the distortion is quite simple; it corresponds to a single sinusoid.
And how can the function f be built from different combinations of sinusoids?
set.seed(126)
par(mar = rep(2, 4),mfrow=c(1,3))
s <- cumsum(rnorm(100))
r <- range(s)
gridlines <- seq(r[1]*2, r[2]*2, by = 0.2)
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
f <- function(x) 2 * sin(x)/2 + x
plot(s, t = "l", main = "input + new grid")
abline(h = f(gridlines), col = 8)
plot(f(s), t = "l", main = "output")
abline(h = f(gridlines), col = 8)
If I understand you correctly, you wish to map the vector s from the regular spacing defined in the first image to the irregular spacing implied by the second plot.
Unfortunately, your mapping is not well-defined, since there is no clear correspondence between the horizontal lines in the first image and the second image. There are in fact an infinite number of ways to map the first space to the second.
We can alter your example a little to make it more rigorous.
If we start with your function and your data:
my.sin <- function(ve, a, f, p) a * sin(f * ve + p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1 + s2 + 10 + 1:100
Let us also create a vector of gridlines that we will draw on the first plot:
gridlines <- seq(10, 120, by = 2.5)
Now we can recreate your first plot:
par(mar = rep(2, 4))
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
Now, suppose we have a function that maps our y axis values to a different value:
f <- function(x) 2 * sin(x/5) + x
If we apply this to our gridlines, we have something similar to your second image:
plot(s, t = "l", main = "input")
abline(h = f(gridlines), col = 8)
Now, what we want to do here is effectively transform our curve so that it is stretched or compressed in such a way that it crosses the gridlines at the same points as the gridlines in the original image. To do this, we simply apply our mapping function to s. We can check the correspondence to the original gridlines by plotting the new curve with a transformed axis:
plot(f(s), t = "l", main = "output", yaxt = "n")
axis(2, at = f(20 * 1:6), labels = 20 * 1:6)
abline(h = f(gridlines), col = 8)
It may be possible to create a mapping function using the cumsum(s)/10 * 2 that you have in your original example, but it is not clear how you want this to correspond to the original y axis values.
Response to edits
It's not clear what you mean by comparing two vectors. If one is a non-linear deformation of the other, then presumably you want to find the underlying function that produces the deformation. It is possible to create a function that applies the deformation empirically simply by doing f <- approxfun(untransformed_vector, transformed_vector).
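For example, here is a minimal sketch of that empirical approach, reusing the gridlines and mapping function from above (the names x_ref, y_ref and f_emp are just illustrative):
# known pairs of original and deformed values (taken here from the gridlines example)
x_ref <- seq(10, 120, by = 2.5)
y_ref <- 2 * sin(x_ref / 5) + x_ref
# piecewise-linear interpolation between the pairs; rule = 2 clamps values outside the range
f_emp <- approxfun(x_ref, y_ref, rule = 2)
f_emp(c(20, 60, 100))  # apply the recovered deformation to new values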
I didn't say there were many ways of doing non-linear transformations. What I meant is that in your original example there is no correspondence between the grid lines in the first picture and the second picture, so there is an infinite choice for which gridlines in the first picture correspond to which gridlines in the second picture. There is therefore an infinite choice of mapping functions that could be specified.
The function f can be as complicated as you like, but in this scenario it should at least be everywhere non-decreasing, such that any value of the function's output can be mapped back to a single value of its input. For example, function(x) x + sin(x)/4 + cos(3*(x + 2))/5 would be a complex but ever-increasing sinusoidal function.
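A quick numeric check (a sketch) that this particular combination stays increasing over a range:
f2 <- function(x) x + sin(x)/4 + cos(3 * (x + 2))/5
all(diff(f2(seq(0, 50, by = 0.01))) > 0)  # TRUE: the function never decreases on this grid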

Saving dataframe to pdf adjust width

I found that grid.table can be used to plot a dataframe to a pdf file, as described here. I want to save a dataframe to a landscape A4 format, but it does not seem to scale the table so that it fits nicely within the borders of the pdf.
Code
library(gridExtra)
set.seed(1)
strings <- c('Wow this is nice', 'I need some coffee', 'Insert something here', 'No ideas left')
table <- as.data.frame(matrix(ncol = 9, nrow = 30, data = sample(strings,30, replace = TRUE)))
pdf("test.pdf", paper = 'a4r')
grid.table(table)
dev.off()
Output
Not the whole table is shown in the pdf:
Question
How can I make sure that the dataframe is scaled to fit within the landscape A4? I don't specifically need gridExtra or the default pdf device; I can use any other package if that makes this easier to fix.
EDIT
I came across this other question; apparently one can figure out the required height and width of the tableGrob:
tg = gridExtra::tableGrob(table)
h = grid::convertHeight(sum(tg$heights), "mm", TRUE)
w = grid::convertWidth(sum(tg$widths), "mm", TRUE)
ggplot2::ggsave("test.pdf", tg, width=w, height=h, units = 'mm')
Here h = 172.2 and w = 444.3 mm, which exceed the size of a standard A4, namely 210 x 279 mm. So I know what causes the problem, but I still can't figure out how to scale the table down to fit it on an A4.
I figured out I could add the scale parameter to ggsave. I wrote a simple function to get the optimal scale:
optimal.scale <- function(w,h, wanted.w, wanted.h) max(c(w/wanted.w, h/wanted.h))
I added 0.1 to the scale to add a margin to the plot, so that the text is not directly on the edge of the paper. Then I passed the resulting scale to ggsave:
tg = gridExtra::tableGrob(table)
h = grid::convertHeight(sum(tg$heights), "mm", TRUE)
w = grid::convertWidth(sum(tg$widths), "mm", TRUE)
scale = optimal.scale(w,h, 279, 210) + 0.1 #A4 = 279 x 210 in landscape
ggplot2::ggsave("test.pdf", tg, width = 279, height = 210, units = 'mm' , scale = scale)
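As a rough check with the dimensions reported above (a sketch; the exact numbers depend on fonts and padding):
optimal.scale(444.3, 172.2, 279, 210)  # = max(444.3/279, 172.2/210), roughly 1.59: width is the limiting dimension
# with the extra 0.1 margin, ggsave renders onto a canvas about 1.69 times A4, so the fixed-size
# table fits, and everything shrinks by roughly that factor once the oversized page is reduced to A4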
Now my table fits on the A4:

R corrplot - color relying on value

I have a binary data.frame (53115 rows, 520 columns) and I want to plot a correlation plot. I want to colour it based on the values: correlation values >= 0.95 in red, everything else in blue.
correl <- abs(round(cor(bin_mat), 2))
pdf("corrplot.pdf", width = 200, height = 200)
a <- corrplot(correl, order = "hclust", addCoef.col = "black", number.cex=0.8, cl.lim = c(0,1), col=c(rep("deepskyblue",19) ,"red"))
dev.off()
I get the correlation plot, but in many cases the colouring is wrong (see the 0.91 cells below).
How can I get the colouring right?
In general the corrplot library is quite weird when it comes to cl.lim and colors. For some reason it doesn't seem to matter whether you set cl.lim or not: the colors are still distributed over the range -1 to 1.
So in your case just use 39 blue colors instead of 19 (to cover the range from -1 to 1):
cors <- cor(iris[,-5])
cors[cbind(c(1,2), c(2,1))] <- 0.912
corrplot(cors, col=c(rep("blue", 39), "red"), cl.lim=c(-1,1), addCoef.col="black")
And the result:
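In other words (a sketch, assuming corrplot keeps spreading whatever palette it gets evenly over [-1, 1]), the colour vector can be built from the threshold instead of counting bins by hand:
n_bins <- 40                                          # 40 colours over [-1, 1] gives bins 0.05 wide
threshold <- 0.95
n_blue <- round((threshold - (-1)) / (2 / n_bins))    # 39 bins below the threshold
my_cols <- c(rep("deepskyblue", n_blue), rep("red", n_bins - n_blue))
corrplot(cors, col = my_cols, cl.lim = c(-1, 1), addCoef.col = "black")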

How do I appropriately use polygon() to shade the confidence interval of a plot and appropriately scale it so that everything can be seen?

Currently I have a simulated data set describing the concentration-time profile of a drug at 10 different doses, with five replications ("subjects") for each dosing group. I am trying to create one file with plots for each dose, with the median concentration plotted as a dashed line, and a shaded confidence interval which I hoped to achieve with polygon().
# vector with the simulated doses
dvec <- c(0.1, 0.3, 1, 3, 10, 30, 100, 300, 500, 1000)
win.metafile("indv_plot.wmf", width = 25, height = 20)
par(mfrow = c(2, 5))
for (bb in 1:10) {
  ## sumdat is a data file which has summary statistics and CI
  dose_d <- sumdat[sumdat$DOSE == dvec[bb], ]
  plot(dose_d$TIME, dose_d$OBS_MED,
       type = "n", lty = 2, col = "black",
       xlab = "Time (hr)", ylab = "Plasma Concentration",
       main = paste("DOSE=", dvec[bb]))
  polygon(c(dose_d$TIME, rev(dose_d$TIME)), c(dose_d$LPL_MED, rev(dose_d$UPL_MED)),
          col = "salmon", border = "red")
}
dev.off()
There are two problems I have when running this code:
polygon() seems to be successful in generating the shaded region I want, but my dashed line is nowhere to be seen on the graph. How do I appropriately overlay this so that both can be seen?
In the .wmf file, the y-axes are extremely misleading (scaled so that each of the 10 graphs looks nearly identical!) and cut off part of the shaded region, so that you can no longer see the peak of the curve. How do I fix this scaling? Please note that the doses cover a 10,000-fold range.
EDIT: Here is a sample of the spreadsheet I'm working with with the numbers slightly different from the ones I'm working with.
DOSE TIME OBS_MED LPL_MED UPL_MED
0.1 0.25 0.00133825 0.001223836 0.002291141
0.1 0.5 0.001747625 0.002151059 0.003686252
0.1 1 0.00308325 0.003017057 0.005157501
0.1 2 0.003539375 0.003388839 0.005425594
0.1 3 0.002771875 0.003205603 0.004896142
0.1 5 0.002286875 0.002368057 0.003719701
0.1 7 0.0020495 0.00164708 0.002937914
0.1 12 0.001414625 0.000644596 0.001710477
30 0.25 0.760858151 0.275588118 0.470376128
30 0.5 0.749280163 0.468870272 0.774292746
30 1 1.264732715 0.677246853 1.069407039
30 2 1.219044091 0.769589233 1.148778861
30 3 1.084113451 0.70481485 1.06030292
30 5 1.014486376 0.527557896 0.791142911
30 7 0.600676092 0.368193808 0.610086631
30 12 0.287301205 0.138354359 0.371749849
Your polygon is drawn after the line is plotted, and without transparency in the polygon fill there is no way you will see the line. Plot the polygon first, then add the median with lines() (note that with type = "n", as in the code shown, plot() draws no line at all, so an explicit lines() call is needed either way).
You are depending on the automatic axis choices made by the plot function, which are based on the data passed to plot() (the median) rather than on the polygon. You can specify your own limits via ylim as needed, as explained in the man pages.
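Putting both points together, here is a sketch of the loop body (assuming sumdat and dvec as in the question; ylim_bb is just an illustrative name): compute per-dose limits, draw the band first, then add the dashed median with lines().
for (bb in 1:10) {
  dose_d <- sumdat[sumdat$DOSE == dvec[bb], ]
  # y-limits that cover both the confidence band and the median for this dose only
  ylim_bb <- range(dose_d$LPL_MED, dose_d$UPL_MED, dose_d$OBS_MED)
  plot(dose_d$TIME, dose_d$OBS_MED, type = "n", ylim = ylim_bb,
       xlab = "Time (hr)", ylab = "Plasma Concentration",
       main = paste("DOSE=", dvec[bb]))
  polygon(c(dose_d$TIME, rev(dose_d$TIME)), c(dose_d$LPL_MED, rev(dose_d$UPL_MED)),
          col = "salmon", border = "red")
  lines(dose_d$TIME, dose_d$OBS_MED, lty = 2, col = "black")  # dashed median drawn on top
}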

Bubble chart for integer variables where the largest bubble has a diameter of 1 (on the x or y axis scale)?

I want to achieve the following outcomes:
Rescale the size of the bubbles such that the largest bubble has a diameter of 1 (on whichever of the x and y axes has the more compressed scale).
Rescale the size of the bubbles such that the smallest bubble has a diameter of 1 mm.
Have a legend whose first and last points are the minimum non-zero frequency and the maximum frequency.
The best I have been able to do is as follows, but I need a more general solution where the value of maxSize is computed rather than hard-coded. If I were doing it with traditional R graphics I would use par("pin") to work out the size of the plot area and work backwards, but I cannot figure out how to access this information with ggplot2. Any suggestions?
library(ggplot2)
agData = data.frame(
  class = rep(1:7, 3),
  drv = rep(1:3, rep(7, 3)),
  freq = as.numeric(xtabs(~ class + drv, data = mpg))
)
agData = agData[agData$freq != 0,]
rng = range(agData$freq)
mn = rng[1]
mx = rng[2]
minimumArea = mx - mn
maxSize = 20
minSize = max(1,maxSize * sqrt(mn/mx))
qplot(class, drv, data = agData, size = freq) + theme_bw() +
  scale_area(range = c(minSize, maxSize),
             breaks = seq(mn, mx, minimumArea/4), limits = rng)
Here is what it looks like so far:
When no ggplot2, lattice, or other high-level package seems to do the job without hours of fine-tuning, I always revert to base graphics. The following code gets you what you want, and after it I show another example based on how I would have plotted it.
Note however that I have set the maximum radius (not diameter) to 1 cm; just divide size.range by 2 if you want to work with diameters instead. I just thought radius gave me nicer plots, and you'll probably want to adjust things anyway.
size.range <- c(.1, 1) # Min and max radius of circles, in cm
# Calculate the relative radius of each circle
radii <- sqrt(agData$freq)
radii <- diff(size.range)*(radii - min(radii))/diff(range(radii)) + size.range[1]
# Plot in two panels
mar0 <- par("mar")
layout(t(1:2), widths=c(4,1))
# Panel 1: The circles
par(mar=c(mar0[1:3],.5))
symbols(agData$class, agData$drv, radii, inches=size.range[2]/cm(1), bg="black")
# Panel 2: The legend
par(mar=c(mar0[1],.5,mar0[3:4]))
symbols(c(0,0), 1:2, size.range, xlim=c(-4, 4), ylim=c(-2,4),
inches=1/cm(1), bg="black", axes=FALSE, xlab="", ylab="")
text(0, 3, "Freq")
text(c(2,0), 1:2, range(agData$freq), col=c("black", "white"))
# Reset par settings
par(mar=mar0)
Now follows my suggestion. The largest circle has a radius of 1 cm and the areas of the circles are proportional to agData$freq, without forcing a size for the smallest circle. Personally I think this is easier to read (both code and figure) and looks nicer.
with(agData, symbols(class, drv, sqrt(freq),
inches=size.range[2]/cm(1), bg="black"))
with(agData, text(class, drv, freq, col="white"))
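A quick check of that design choice (a sketch): because the radius passed to symbols() is sqrt(freq), circle area grows linearly with freq.
freq <- c(5, 10, 20)
pi * sqrt(freq)^2  # 5*pi, 10*pi, 20*pi: doubling freq doubles the ink on the page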
