How to change the spacing of values in a plot in R? - r

I am trying to plot run time against chunk size, with chunk sizes of 1000, 10000, 100000, and 1000000 on the x-axis. I create the plot using the plot() and axis() commands:
plot(chunk, totTime, main="Runtime with Different Chunks", xaxt = "n",ylim = c(4,5),ylab="Runtime (sec)", xlab = "Size of Chunk", type="l")
axis(side = 1, c(1000,10000,100000,1000000))
I get a plot that looks like this.
I've tried axp in plot() and at in axis(), but the spacing stays the same. Is there a way to change how the plot spaces the values along the x-axis so the graph looks cleaner?

Try converting to the log scale but labeling your x-axis according to the set values you want (here, xvalues). This will put equal spacing between orders of magnitude:
# Sample data
totTime <- c(4.4, 4.01, 4.01, 4.8)
chunk <- c(1000, 10000, 100000, 1000000)
# Values desired on the x-axis
xvalues <- c(1000, 10000, 100000, 1000000)
# Plot
plot(
log(chunk), # note the log scale
totTime,
main = "Runtime with Different Chunks",
xaxt = "n",
ylim = c(4, 5),
ylab = "Runtime (sec)",
xlab = "Size of Chunk",
type = "l"
)
axis(side = 1,
at = log(xvalues), # note the log scale
labels = xvalues)
Output:
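A possible alternative, sketched here on the assumption that base graphics' built-in log axis is acceptable: pass log = "x" to plot() instead of transforming the data by hand. The tick positions are then given in the original units. The tick_labels name and the comma formatting are illustrative choices, not part of the answer above.

```r
# Sample data (same values as above)
totTime <- c(4.4, 4.01, 4.01, 4.8)
chunk <- c(1000, 10000, 100000, 1000000)

# Plain-number labels with thousands separators (illustrative choice)
tick_labels <- formatC(chunk, format = "d", big.mark = ",")

# log = "x" lets R apply the log transform, so `at` stays in original units
plot(chunk, totTime, log = "x", xaxt = "n", ylim = c(4, 5), type = "l",
     main = "Runtime with Different Chunks",
     xlab = "Size of Chunk", ylab = "Runtime (sec)")
axis(side = 1, at = chunk, labels = tick_labels)
```

Either way, each order of magnitude takes up the same horizontal distance.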

Related

formatting the x-axis exponential plot in R as a^x?

I have generated this plot in R with some strange numbers format in the x-axis:
I want the x-axis numbers in the format a^x, e.g. 2^6, 6^6, 10^6. This would simplify the x-axis so that values can be read at all points. Do you have any suggestions?
Here my code :
data=read.csv("my_file.csv",row.names = 1)
plot(genes~Prot,cex=1.5,data, function(x) 10^x, xlab="Proteome size(codons)",ylim=c(0,30), ylab="Genes in pathway")
abline(lm(prot~genes,data),lty=2, lwd=3,col="black")
Use xaxt = 'n' as an argument to plot to turn off the x-axis labelling. Then use the Axis function to set tick marks and label as required.
# Generating some data
power <- seq(1, 6, length.out = 20)
Prot = 10^power
genes <- runif(20, min = 5, max = 30)
# plotting
plot(x= Prot, y= genes, cex=1.5, xlab="Proteome size(codons)", ylab="Genes in pathway", xaxt = 'n', log = 'xy')
Axis(at = c(2^6, 6^6, 10^6), side = 1, labels = c('2^6', '6^6', '10^6'), las = 1)
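If the labels should be typeset as real superscripts rather than the literal text '2^6', one option is to pass plotmath expressions as labels. A minimal sketch, assuming the same kind of simulated data as above; set.seed() and the names tick_at and tick_labels are added here for illustration only:

```r
set.seed(1)  # added so the simulated data is reproducible
power <- seq(1, 6, length.out = 20)
Prot <- 10^power
genes <- runif(20, min = 5, max = 30)

tick_at <- c(2^6, 6^6, 10^6)
tick_labels <- expression(2^6, 6^6, 10^6)  # plotmath: drawn with superscripts

plot(x = Prot, y = genes, cex = 1.5, log = "x", xaxt = "n",
     xlab = "Proteome size (codons)", ylab = "Genes in pathway")
axis(side = 1, at = tick_at, labels = tick_labels)
```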

R barplots: specify intervals of date-based x-axis

I've been producing different sets of charts, all in R base. I have a problem though with barplots. I've formatted the x-axis to show the dates by year, however, many years show up several times. I would like each year to only show up once.
Here's my example code:
library(quantmod)
start <- as.Date("01/01/2010", "%d/%m/%Y")
#Download FRED data
tickers <- c("WTISPLC", "DCOILBRENTEU")
fred <- lapply(tickers, function(sym) {na.omit(getSymbols(sym, src="FRED", auto.assign=FALSE, return.class = "zoo"))})
df <- do.call(merge, fred)
#Subset for start date
df <- subset(df, index(df)>=start)
#Create bar plot
par(mar = c(5,5,5,5))
barplot(df[,2], names.arg=format(index(df), "%Y"), ann=FALSE, bty="n", tck=-0, col=1:1, border=NA, space=0); title(main="Example chart", ylab="y-axis")
This example should be reproducible and show clearly what I mean. Now, I've been researching how to add a separate x-axis and how to define that axis. So, I've tried to add the following code:
#Plot bars but without x-axis
barplot(df[,2], names.arg=format(index(df), "%Y"), ann=FALSE, bty="n", tck=-0, xaxt="n", col=1:1, border=NA, space=0); title(main="Example chart", ylab="y-axis")
# Set x-axis parameters
x_min <- min(index(df))
x_max <- max(index(df))
xf="%Y"
#Add x-axis
axis.Date(1, at=seq(as.Date(x_min), x_max, "years"), format=xf, las=1, tck=-0)
This does not give me an error message, but it also does absolutely nothing in terms of drawing an x-axis.
Please do not provide a solution for ggplot. Even though I like ggplot, these barplots are part of a bigger project for me, all using R base and I would not like to introduce ggplot into this project now.
Thanks!
If you are not limited to barplot, you may use the following very simple solution, which uses plot.zoo behind the scenes:
# only use what you want, and avoid multiple plots
df2 <- df[ , 2]
# use plot.zoo's functionality
plot(df2, main = "Example Chart", ylab = "y-axis", xlab = "")
This yields the following plot:
I know it is not a barplot, but I don't see what a barplot would add here. Please let me know, whether this is what you want or not.
Edit 1
If you do want to use barplot you may use the following code:
### get index of ts in year format
index_y <- format(index(df), "%Y")
### logical vector with true if it is the start of a new year
index_u <- !duplicated(index_y)
### index of start of new year for tick marks
at_tick <- which(index_u)
### label of start of new year
labels <- index_y[index_u]
### draw barplot without X-axis, and store in bp
### bp (bar midpoints) is used to set the ticks right with the axis function
bp <- barplot(df[,2], xaxt = "n", ylab= "y-axis")
axis(side = 1, at = bp[at_tick] , labels = labels)
yielding the following plot:
Please let me know, whether this is what you want.
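The same idea can be tried without downloading anything. A self-contained sketch, assuming simulated monthly values in place of the FRED series (dates, values and first_of_year are made-up names for illustration):

```r
set.seed(42)
# Made-up monthly series standing in for the downloaded data
dates <- seq(as.Date("2010-01-01"), as.Date("2013-12-31"), by = "month")
values <- runif(length(dates), min = 40, max = 110)

years <- format(dates, "%Y")
first_of_year <- !duplicated(years)  # TRUE at the first bar of each year

# barplot() returns the bar midpoints; use them to place one tick per year
bp <- barplot(values, xaxt = "n", ylab = "y-axis")
axis(side = 1, at = bp[first_of_year], labels = years[first_of_year])
```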
Edit 2
We need to take into account two bits of information, when explaining why the ticks and labels group together at the left-hand side.
(1) In barplot, space defines the amount of space before each bar (as a fraction of the average bar width). When it is not set, it defaults to 0.2 for a plain vector of heights (see ?barplot for details). In the illustration below, we use spaces of 0.0, 0.5, and 2.0.
(2) Barplot returns a numeric vector with the midpoints of the bars drawn (again see the help pages for more detailed info). We can use these midpoints to add information to the graph, like we do in the following excerpt: after storing the result of barplot in bp, we use bp to set the ticks: axis(... at = bp[at_tick] ... ).
When we add space, the location of the bar midpoints changes. So, when we want to use the bar midpoints after adding space, we need to be sure we have the right information. Simply stated, use the vector returned by the barplot call in which you set space. If you don't, the graph will be messed up: if you keep using the bar midpoints from the call with space=0 while increasing space, the ticks and labels will group at the left-hand side.
Below, I illustrate this with your data limited to 3 months in 2017.
In the top layer, 3 barplots are drawn with space equal to 0.0, 0.5 and 2.0. The information used to calculate the location of ticks and labels is recalculated and saved at every plot.
In the bottom layer, the same 3 barplots are drawn, but the information used to draw the ticks and labels is only created with the first plot (space=0.0)
# Subset for NEW start for illustration of space and bp
start2 <- as.Date("01/10/2017", "%d/%m/%Y")
df2 <- subset(df, index(df)>=start2)
### get index of ts in month format, define ticks and labels
index_y2 <- format(index(df2), "%m")
at_tick2 <- which(!duplicated(index_y2))
labels2 <- index_y2[!duplicated(index_y2)]
par(mfrow = c(2,3))
bp2 <- barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 0.0, main ="Space = 0.0")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
bp2 <- barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 0.5, main ="Space = 0.5")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
bp2 <- barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 2.0, main ="Space = 2.0")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
### the lower layer
bp2 <- barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 0.0, main ="Space = 0.0")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 0.5, main ="Space = 0.5")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
barplot(df2[,2], xaxt = "n", ylab= "y-axis", space= 2.0, main ="Space = 2.0")
axis(side = 1, at = bp2[at_tick2] , labels = labels2)
par(mfrow = c(1,1))
Have a look here:
Top layer: bp recalculated every time
Bottom layer: bp space=0 reused
Cutting and pasting the commands in your console may illustrate the effects better than the pic above.
I hope this helps.
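To see the numbers behind this, barplot() can be called with plot = FALSE so that it only returns the midpoints without drawing anything. A small sketch with made-up heights; for the default bar width of 1, the i-th midpoint works out to i * (1 + space) - 0.5:

```r
heights <- c(3, 5, 2, 4)  # made-up data

# Midpoints returned by barplot() for three different values of space
mid_0  <- as.numeric(barplot(heights, space = 0,   plot = FALSE))
mid_05 <- as.numeric(barplot(heights, space = 0.5, plot = FALSE))
mid_2  <- as.numeric(barplot(heights, space = 2,   plot = FALSE))

print(mid_0)   # 0.5 1.5 2.5 3.5
print(mid_05)  # 1.0 2.5 4.0 5.5
print(mid_2)   # 2.5 5.5 8.5 11.5
```

Midpoints stored from the space = 0 call stay between 0.5 and 3.5, which is why ticks placed with them bunch up on the left once the bars are spread out.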
You could use the axis function. I used match to obtain the indices of the dates on the axis:
space=1
#Plot bars but without x-axis
barplot(df[,2], names.arg=format(index(df), "%Y"), ann=FALSE, bty="n", tck=-0, xaxt="n",
col=1:1, border=NA, space=space); title(main="Example chart", ylab="y-axis")
# Set x-axis parameters
x_min <- min(index(df))
x_max <- max(index(df))
#Add x-axis
axis(1, at=match(seq(as.Date(x_min), x_max, "years"),index(df))*(1+space),
labels = format(seq(as.Date(x_min), x_max, "years"),"%Y"),lwd=0)
Hope this helps!

How to consistently skip (or not skip) x axis labels in R base graphics

I want to create a figure where for various reasons I need to specify the axis labels myself. But when I specify my labels (some have one digit, some two digits) R suppresses every other two-digit label because it decides there isn't enough room to show them all, but it leaves all of the one-digit labels, leaving the axis looking lopsided.
Is there a way to suppress labels consistently across the whole axis, based on whether any of them need to be skipped? Note: I have a lot of plots with varying scales, so I was looking for something I could use for all of them - I don't want to render all the labels for every plot, or to skip every other label in every plot. Suppressing labels will be desirable for some plots and not for others. I just want to skip every other label consistently, if that's what R chooses to do for the particular plot.
(Here is an example figure of what I mean. What I want is for the "6%" label to also be suppressed in the x axis.)
Example code:
library(labeling)
df <- data.frame("estimate" = c(9.81, 14.29, 12.94),
"lower" = c(4.54, 6.25, 5.12),
"upper" = c(12.85, 20.12, 15.84))
ticks <- extended(min(df$lower), max(df$upper), m = 5, only.loose = TRUE,
Q=c(2, 5, 10))
png("examplePlot.png", width = 1200, height = 900, pointsize = 10, res = 300)
bars <- barplot(df$estimate, horiz = TRUE, col = "white", border = NA,
xlim = c(min(ticks), max(ticks)), xaxt = "n", main = "Example")
arrows(df$lower, bars, df$upper, bars, code = 3, angle = 90, length = 0.03)
points(df$estimate, bars, pch = 20)
tickLabels <- paste(ticks, "%", sep = "")
axis(1, at=ticks, labels = tickLabels, cex.axis=1)
axis(2, at = bars, labels = c("c", "b", "a"), lwd = 0, las = 2)
dev.off()
This depends on the size of the plot, so you'll have to plot each label separately:
axis(1, lwd.ticks = 1, labels = FALSE, at = ticks) # plot line and ticks
i <- seq(1,length(ticks),2) # which labels to plot
for(ii in i)
axis(1, at = ticks[ii], labels = tickLabels[ii], cex.axis = 1, lwd = 0)
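To decide per plot whether to thin the labels, one possibility is to measure the label widths with strwidth() and skip every other label only when neighbours would collide. This is a rough sketch, not R's own overlap rule: the helper name draw_axis_consistent and the one-character gap are assumptions made for illustration, it expects ticks in increasing order, and it only handles horizontal axes (side 1 or 3).

```r
draw_axis_consistent <- function(ticks, labels, side = 1, cex.axis = 1) {
  # Draw the axis line and all tick marks, without labels
  axis(side, at = ticks, labels = FALSE)
  # Label widths in user (x-axis) units on the current device
  widths <- strwidth(labels, units = "user", cex = cex.axis)
  gap <- strwidth("m", units = "user", cex = cex.axis)  # assumed minimum gap
  n <- length(ticks)
  # Room each pair of neighbours needs: half of each label plus the gap
  needed <- (widths[-n] + widths[-1]) / 2 + gap
  crowded <- any(diff(ticks) < needed)
  # Either every label, or every other label across the whole axis
  keep <- if (crowded) seq.int(1L, n, by = 2L) else seq_len(n)
  # One axis() call per label, so R cannot drop labels on its own
  for (i in keep) {
    axis(side, at = ticks[i], labels = labels[i], lwd = 0, cex.axis = cex.axis)
  }
  invisible(keep)
}

# Usage with made-up ticks on an empty plot
ticks <- seq(4, 22, by = 2)
plot(range(ticks), c(0, 1), type = "n", xaxt = "n", xlab = "", ylab = "")
kept <- draw_axis_consistent(ticks, paste0(ticks, "%"))
```

Because the widths are measured on the open device, the function has to be called after the plot is created, and the result can differ between a small PNG and a large window.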

How to create an x axis break on the density plot?

I am trying to do a density plot of a dataset that has a wide range.
data=c(-10,-20,-20,-18,-17,1000,10000, 500, 500, 500, 500000)
plot(density(data))
As you can see in the figure, we cannot see much.
Is there a way to make an axis break (or several) on the x axis to better visualise the distribution of the data? Or is there a way to plot certain ranges of the data in several graphs and then paste them together?
Thanks a lot!
There is a function gap.plot() in package plotrix but I think it has some problems (see How to plot “multiple” curves with a break through y-data-range in R?). I recommend you draw two plots.
## use small margins and relatively big outer margins (to write labels).
old.par <- par(mfrow = c(1, 2), mar = rep(0.5, 4), oma = c(4, 4, 1, 1))
plot(density(data), xlim = c(-1000, 29000), main = "", bty="c") # diff 30000
abline(v = par("usr")[2], lty=2) # keep the same diff of xlim to avoid misleading
plot(density(data), xlim = c(471000, 501000), main = "", yaxt ="n", bty="]") # diff 30000
abline(v = par("usr")[1], lty=2)
par(old.par)

Mixed plot with histogram and superimposed line plot in same figure

I know there are strong opinions about mixing plot types in the same figures, especially if there are two y axes involved. However, this is a situation in which I have no alternative - I need to create a figure using R that follows a standard format - a histogram on one axis (case counts), and a superimposed line graph showing an unrelated rate on an independent axis.
The best I have been able to do is stacked ggplot2 facets, but this is not as easy to interpret for the purposes of this analysis as the combined figure. The people reviewing this output will need it in the format they are used to.
I'm attaching an example below.
Any ideas?
For etiquette purposes, sample data below:
y1<-sample(0:1000,20,rep=TRUE)
y2<-sample(0:100,20,rep=TRUE)
x<-1981:2000
I feel your pain; I have had to recreate plots before, and even did it in SAS once.
If it's a one-off, I'd be tempted to go old-school. Something like this:
# Generate some data
someData <- data.frame(Year = 1987:2009,
                       mCases = rpois(23, 3),
                       pVac = sample(55:80, 23, TRUE))
par(mar = c(5, 5, 5, 5))
with(someData, {
  # Generate the barplot; BP holds the bar midpoints
  BP <- barplot(mCases, ylim = c(0, 18), names.arg = Year,
                yaxt = "n", xlab = "", ylab = "Measles cases in Thousands")
  axis(side = 2, at = 2*1:9, las = 1)
  box()
  # Add the % Vaccinated as a line on a second y-axis
  par(new = TRUE)
  plot(BP, pVac, type = "l", ylim = c(0, 100), axes = FALSE, ylab = "", xlab = "")
  axis(side = 4, las = 1)
  # Shift each value label above or below the line
  nudge <- ifelse(pVac > median(pVac), 2, -2)
  text(BP, pVac + nudge, pVac)
  mtext(side = 4, "% Vaccinated", line = 3)
  par(new = FALSE)
})
Try library(plotrix)
library(plotrix)
## Create sample data
y2<-sample(0:80,20,rep=TRUE)
x2<-sort(sample(1980:2010,20,rep=F))
y1<-sample(0:18,20,rep=TRUE)
x1<-sort(sample(1980:2010,20,rep=F))
x<-1980:2010
twoord.plot(x1,y1,x2,y2,
lylim=c(0,18),rylim=c(0,100),type=c("bar","l"),
ylab="Measles Cases in thousands",rylab="% Vaccinated",
lytickpos=seq(0,18,by=2),rytickpos=seq(0,100,by=10),ylab.at=9,rylab.at=50,
lcol=3,rcol=4)
