Reduce space between ticks in ggplot2 - r

I made a time series plot of several random walks and by now I understand how to extract a certain part of it and how to change the ticks from years to months. But even after long testing I don't get how to manipulate the x-axis in my graph properly.
Right now, it displays 50 year-steps and only every second white vertical grid line is labelled (why? In every tutorial I watch all lines are labelled instead). What I want to achieve is to change the scaling, so less space is used horizontally (i.e. reduce the space between all the ticks on the x-axis), so the first tick would be at 2000, the second (not the third as is currently the case) at 2050, and so on. I think this should be somehow achievable with breaks, but I can't figure it out. Finally the plot starts and ends too early on the left and on the right, but I believe I can handle that.
Here is the plot:
set.seed(21)
n <- 2500
x <- matrix(replicate(20,cumsum(sample(c(-1, 1), n, TRUE))),nrow = 2500,ncol=20)
aa <- x
rnames <- seq(as.Date("2010-01-01"), length=dim(aa)[1], by="1 month") - 1
rownames(aa) <- format(as.POSIXlt(rnames, format = "%Y-%m-%d"), format = "%d.%m.%Y")
colnames(aa) <- paste0("aa", 1:ncol(aa))
library("ggplot2")
library("reshape2")
library("scales")
aa <- melt(aa, id.vars = rownames(aa))
names(aa) <- c("time","id","value")
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()

By default, ggplot adds a minor grid line (that is, a grid line without a tick mark or tick label) between each major grid line. To include only major grid lines add scale_x_date(minor_breaks=NULL). (If you're not seeing minor grid lines in the tutorial videos you've watched, my guess is that they are there, but difficult or impossible to see due to insufficient resolution and/or small size of the video image.)
To reduce the physical distance between tick marks, you would need to change the aspect ratio of the plot. For example, if you want the vertical extent of the plot to be, say 3", then you would need to shrink the horizontal extent until you get a small enough distance between tick marks. First, let's create a plot:
p1 <- ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line(show.legend=FALSE) +
scale_x_date(minor_breaks=NULL)
Here are two examples of rendering the plot:
UPDATE: To answer the comment: For the plots above, I used grid.arrange to create the plot layout and then saved it as a png from the RStudio plot window. I used the widths argument to make one plot thinner than the other.
library(gridExtra)
grid.arrange(p1, p1, widths=c(0.6,0.4), ncol=2)
However, you can adjust the size precisely in many different ways, depending on what format you desire. For example:
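# PNG format (width and height in pixels)
png("wide.png", width=500, height=500)
print(p1)
dev.off()
png("narrow.png", width=300, height=500)
print(p1)
dev.off()
# PDF format (width and height in inches)
pdf("wide.pdf", width=5, height=5)
print(p1)
dev.off()
pdf("narrow.pdf", width=3, height=5)
print(p1)
dev.off()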
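As a rough sketch of how the pieces above fit together, the following sets the tick spacing explicitly and writes a narrow image with ggsave(). The data frame walk, the 50-year break width, and the output file name are assumptions made for illustration; they are not taken from the original plot.

```r
library(ggplot2)

# Stand-in for the question's data: a single random walk with monthly dates
set.seed(21)
n <- 2500
walk <- data.frame(
  time  = seq(as.Date("2010-01-01"), by = "1 month", length.out = n),
  value = cumsum(sample(c(-1, 1), n, replace = TRUE))
)

p <- ggplot(walk, aes(x = time, y = value)) +
  geom_line() +
  scale_x_date(
    date_breaks  = "50 years",  # one labelled tick every 50 years
    date_labels  = "%Y",
    minor_breaks = NULL,        # drop the unlabelled grid lines in between
    expand       = c(0, 0)      # no padding to the left and right of the data
  )

# A narrower image means less physical distance between the ticks
out_file <- file.path(tempdir(), "narrow.png")
ggsave(out_file, plot = p, width = 3, height = 5, units = "in", dpi = 100)
```

The expand argument is one way to deal with the plot starting too early and ending too late; whether zero padding looks right depends on the data.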

Related

Add data labels to spineplot in R

iFacColName <- "hireMonth"
iTargetColName <- "attrition"
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- as.factor(c(1,1,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1))
sp <- spineplot(iFacVector,iTargetVector,xlab=iFacColName,ylab=iTargetColName,main=paste0(iFacColName," vs. ",iTargetColName," Spineplot"))
spLabelPass <- sp[,2]/(sp[,1]+sp[,2])
spLabelFail <- 1-spLabelPass
text(seq_len(nrow(sp)),rep(.95,length(spLabelPass)),labels=as.character(spLabelPass),cex=.8)
For some reason, the text() function only plots one label far to the right of the graph. I have used this format to apply data labels to other types of graphs, so I am confused.
EDIT: added more code to make example work
You're not placing your labels inside the plotting region. It only extends to around 1.3 on the x axis. Try plotting something like
text(
cumsum(prop.table(table(iFacVector))),
rep(.95, length(spLabelPass)),
labels = as.character(round(spLabelPass, 1)),
cex = .8
)
and you'll get something like
These are obviously not the right positions for the labels, but you should be able to figure that out by yourself. (You're going to have to subtract half of the frequency for each bar from the cumulative frequency and account for the fact that the bars are padded with some amount of whitespace.)
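One way to sketch that midpoint calculation is shown here. It assumes that spineplot() separates the bars of a categorical x variable by a gap of 2% of the plot width; the names gap, barWidth, leftEdge and barMid are made up for this example, and the gap value may need adjusting if the labels look off-centre.

```r
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- as.factor(c(1,1,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1))

sp <- spineplot(iFacVector, iTargetVector)
spLabelPass <- sp[, 2] / (sp[, 1] + sp[, 2])

# Assumption: bars are separated by a gap of 2% of the plot width
gap <- 0.02
barWidth <- as.numeric(prop.table(table(iFacVector)))

# Left edge of each bar (previous widths plus gaps), then its midpoint
leftEdge <- c(0, head(cumsum(barWidth + gap), -1))
barMid <- leftEdge + barWidth / 2

text(barMid, rep(0.95, length(barMid)),
     labels = round(spLabelPass, 2), cex = 0.8)
```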

Plotting a scaled version of a variable using ggplot2. Need to show the scale too

I am trying to plot a variable over time using ggplot2.
My current plot looks like this:
However, I want the scaled values with significant numbers shown on the axis. The scale needs to be shown at the top left corner. Something like this:
I don't want to scale the plot. The axis needs to show the significant numbers and the exponential scale needs to be shown at the top left.
Is this what you had in mind? (My own MWE:)
library(ggplot2)
library(grid) # provides grid.draw()
df <- data.frame( x=rnorm(10), y=seq(1e8,10e8,by=1e8))
p <- ggplot(df)+
geom_point(aes(x=x,y=y/1e8))+
geom_text(aes(x=-Inf,y=Inf,label=as.character(paste("10^8"))), parse=T, hjust=1, vjust=0.8, size=5/14*10)+
scale_y_continuous(name="y")
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
grid.draw(gt)
Basically, divide by the power of ten (1e8 here) and then write it manually at the top left. The parse=T renders the power as an actual superscript. That factor of 5/14 was mentioned in another post as the rough ratio between geom_text size and the text size you set elsewhere on the graph (no idea why, but I've found that 5/14*text-size-you-want gives a decent-looking geom_text). Finally, the last three lines prevent the text outside the plot panel from being cropped (you wouldn't see the geom_text otherwise). Hope this gives you something to work with.
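A possible variation, sketched here under the assumption that a subtitle is an acceptable place for the multiplier: leave the data untouched, rescale only the axis labels with a labelling function, and put the power of ten in the subtitle, which ggplot2 draws above the panel on the left by default. No clipping changes are needed in that case.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(10), y = seq(1e8, 10e8, by = 1e8))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  # Only the labels are divided by 1e8; the plotted values stay as they are
  scale_y_continuous(name = "y", labels = function(v) v / 1e8) +
  # plotmath expression, rendered as a multiplication sign followed by 10^8
  labs(subtitle = expression("" %*% 10^8))

print(p)
```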

How to add second y axis to seqfplot with sequence frequency?

I'm working with TraMineR to do a sequence analysis of educational data. I can get R to produce a plot of the 10 most frequent sequences in the data using code similar to the following:
library(TraMineR)
##Loading the data
data(actcal)
##Creating the labels and defining the sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels=actcal.lab)
## 10 most frequent sequences in the data
actcal.freq <- seqtab(actcal.seq)
actcal.freq
## Plotting the object
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, cex.legend=.75, withlegend="right")
However, I'd also like to have the frequencies of each sequence (which are in the object actcal.freq) along the right side of the plot. For example, the first sequence in the plot created by the code above represents 37.9% of the data (as the plot currently shows). Per the seqtab, this is 757 subjects. I'd like the number 757 to appear on the right y-axis (and so on for the other sequences).
Is this possible? I've played around with axis(side=4, ...) but never been able to get it to reproduce the spacing of the left y-axis.
OK. This is a bit of a mess, but the function resets the par setting if you include a legend by default, so you need to turn that off. Then you can set the axis a bit more easily, and then we can go back for the legend. This should work with your test data above.
#add padding to the right for axis and legend
par("mar"=c(5,4,4,8)+.1)
#plot w/o legend (so the par settings are not reset)
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, withlegend=FALSE)
#plot right axis with freqs
axis(4, at = seq(.7, by=1.2, length.out=length(attr(actcal.freq,"freq")$Freq)),
labels = rev(attr(actcal.freq,"freq")$Freq),
mgp = c(1.5, 0.5, 0), las = 1, tick = FALSE)
#now put the legend on
legend("right", legend=attr(actcal.seq, "labels"),
fill=attr(actcal.seq, "cpal"),
inset=-.3, bty="o", xpd=NA, cex=.75)
You may need to play a bit with the margins and especially the inset= parameter of the legend to get it placed correctly. I hope your real data isn't too different from this, because you really have to dig through the function to see how it does the formatting to get things to match up.

Set margins to cater for large legend

I'm trying to figure out a way to calculate the height of a legend for a plot prior to setting the margins of the plot. I intend to place the legend below the plot below the x-axis labels and title.
As it is part of a function which plots a range of things, the legend can grow and shrink in size to cater for 2 items, up to 15 or more, so I need to figure out how I can do this dynamically rather than hard-coding. So, in the end I need to dynamically set the margin and some other bits and pieces.
The key challenge is to figure out the height of the legend to feed into par(mar) prior to drawing the plot, but after dissecting the base codes for legend however, it seems impossible to get a solid estimate of the height value unless the plot is actually drawn (chicken and egg anyone?)
Here's what I've tried already:
get a height using the legend$rect$h output from the base legend function (which seems to give a height value which is incorrect unless the plot is actually drawn)
calculate the number of rows in the legend (easy) and multiply this by the line height (in order to do this, it seems you'd need to translate into inches; the base legend code uses yinch and I've also tried grconvertY, but neither of those works unless a plot has been drawn).
Another challenge is to work out the correct y value for placement of the legend - I figure that once I've solved the first challenge, the second will be easy.
EDIT:
After a day of sweating over how this is (not) working, I have a couple of insights and a couple of questions. For the sake of clarity, this is what my function essentially does:
step 1) set the margins
step 2) create the barplot on the left axis
step 3) re-set the usr coordinates - this is necessary to ensure alignment of the right axis otherwise it plots against the x-axis scale. Not good when they are markedly different.
step 4) create the right axis
step 5) create a series of line charts on the right axis
step 6) do some labelling of the two axes and the x-axis
step 7) add in the legend
Here are the questions
Q1) What units are things reported in? I'm interested in margin lines and coordinates (user-coordinates), inches is self explanatory. - I can do some conversions using grconvertY() but I'm not sure what I'm looking at and what I should be converting to - the documentation isn't so great.
Q2) I need to set the margin in step 1 so that there is enough room at the bottom of the chart for the legend. I think I'm getting that right; however, I need to set the legend after the right axis and line charts are set, which means that the user coordinates (and the pixel value of an inch) have changed. Because of Q1 above I'm not sure how to translate one system to the other. Any ideas in this regard would be appreciated.
After another day of sweating over this here's what solved it mostly for me.
I pulled apart the code for the core legend function and compiled this:
#calculate legend buffer
cin <- par("cin")
Cex <- par("cex")
yc <- Cex * cin[2L] #cin (inches) * magnification
yextra <- 0
ymax <- yc * max(1, strheight("Example", units = "inches", cex = Cex)/yc)
ychar <- yextra + ymax #coordinates
legendHeight <- (legendLines * ychar) + yc # in inches
Which is essentially mimicking the way the core function calculates legend height but returns the height in inches rather than in user coordinates. legendLines is the number of lines in the legend.
After that, it's a doddle to work out how to place the legend, and to set the lower margin correctly. I'm using:
#calculate inches per margin line
inchesPerMarLine<-par("mai")[1]/par("mar")[1]
To calculate the number of inches per margin line, and the following to set the buffers (for the axis labels and title, and the bottom of the chart), and the margin of the plot.
#set buffers
bottomBuffer = 1
buffer=2
#calculate legend buffer
legBuffer <- legendHeight/inchesPerMarLine
#start the new plot
plot.new()
# set margin
bottomMargin <- buffer + legBuffer + bottomBuffer
par(mar=c(bottomMargin,8,3,5))
The plot is made
barplot(data, width=1, col=barCol, names.arg=names, ylab="", las=1 ,axes=F, ylim=c(0,maxL), axis.lty=1)
And then the legend is placed. I've used a different method to extract the legend width, which does have some challenges when there is a legend with 1 point; however, it works OK for now. Assigning the result of legend() to a variable lets you access the width of the box as l$rect$w. plot=FALSE stops the legend being drawn on the plot just yet; trace=TRUE only prints the intermediate calculations.
ycoord <- -1*(yinch(inchesPerMarLine*buffer)*1.8)
l<-legend(x=par("usr")[1], y=ycoord, inset=c(0,-0.25), legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3, trace=TRUE,plot=FALSE)
lx <- mean(par("usr")[1:2]-(l$rect$w/2))
legend(x=lx, y=ycoord, legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3)
For completeness, this is how I calculate the number of lines in the legend. Note - the number of columns in the legend is 3. labelSeries is the list of legend labels.
legendLines <- ceiling(nrow(labelSeries)/3)
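Pulling the steps together, a minimal self-contained sketch might look like the following. The labels, the bar heights, the four margin lines reserved for the axis annotation, and the simplified height estimate (legend rows plus one line of padding) are all assumptions for illustration, not the exact calculation from the function described above.

```r
legendText <- paste("Series", 1:7)          # hypothetical legend labels
nCol <- 3
legendLines <- ceiling(length(legendText) / nCol)

# Height of one line of text in inches, known before any plot is drawn
lineHeightIn <- par("cex") * par("cin")[2]
# Simplified estimate: one line per legend row plus one line of padding
legendHeightIn <- (legendLines + 1) * lineHeightIn

# Convert inches to margin lines using the current margin settings
inchesPerMarLine <- par("mai")[1] / par("mar")[1]
legendMarLines <- legendHeightIn / inchesPerMarLine

# Reserve 4 lines for the axis annotation plus room for the legend
axisLines <- 4
par(mar = c(axisLines + legendMarLines, 4, 2, 2))
barplot(c(3, 5, 2), names.arg = c("a", "b", "c"))

# yinch() converts inches to user units once the plot exists
yTop <- par("usr")[3] - yinch(axisLines * inchesPerMarLine)
legend(x = mean(par("usr")[1:2]), y = yTop, xjust = 0.5,
       legend = legendText, fill = rainbow(length(legendText)),
       ncol = nCol, bty = "n", xpd = NA)    # xpd = NA allows drawing in the margin
```

On the units question: par("mar") is in lines of text, par("mai") is in inches, and par("usr") holds the user coordinates of the plot region, so the ratio of mai to mar gives inches per margin line.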

Multiple plot in the same figure

I have several data sets and I need to plot them compactly in a picture like this:
I already tried par(), layout() and ggplot(), but the plots are displayed too far from each other.
I need them to be very close, as if they were in the same plot with a different y (e.g. plot1 y=0, plot2 y=1, plot3 y=3 and so on).
Can someone help me?
That can also be achieved using layout(), but maybe an easier approach is to set the graphical parameters in a suitable way.
The function par() lets you specify the number of panels in a single figure using the argument mfrow. It takes a vector of two numbers that specify the number of sub-figure rows and columns. For example, c(2,1) would create two rows of figures, but only a single column. That's what is in your example figure. You can change the number of figure rows to the number of sub-figures you would like to plot vertically.
In addition, the margins around each sub-figure can be set using the argument mar. The margins are specified in the order 1. bottom, 2. left, 3. top, and 4. right. Making the bottom and top margins smaller draws your sub-figures closer together.
In R this could look something like the following:
# Simulate some random data
a<-runif(10000)
b<-runif(10000)
# Open a new plot window
# width: 7 inches, height: 1 inch
x11(width=7, height=1)
# Specify the number of sub-figures
# Specify the margins (top and bottom are 0.1, left and right are 2)
# Needs some experimenting with to get these right
par(mfrow=c(2,1), mar=c(0.1,2,0.1,2))
# Plot the figures
barplot(a)
barplot(b)
The resulting figure should roughly resemble this:
Here is a ggplot version using facet_grid:
library(ggplot2)
df <- data.frame(a=runif(3e3), b=rep(letters[1:3], 1e3), c=rep(1:1e3, 3))
ggplot(df, aes(y=a, x=c)) + geom_bar(stat="identity") + facet_grid(b ~ .)
