R Plotting Series of Data with Secondary axis - r

I would like to plot a data frame with 4 columns:
Quarters <- c("Q1","Q2","Q3")
Series1 <- c("1%","2%","3%")
Series2 <- c("4%","5%","6%")
Series3 <- c("1000","2000","3000")
df <- data.frame(Quarters,Series1,Series2,Series3)
Quarters as x-axis, Series1 & Series2 as left y-axis, Series3 as right y-axis and a legend.
I have seen some solutions with ggplot using scale_y_continuous, but then the secondary y-axis has to be a multiple of the primary axis. I do not want that, as the data will be dynamic and the ratio might not hold in all instances.
Any suggestions for how I might go about creating this? Perhaps ggplot is not the way to go?

I don't know about ggplot2, but you can use par(new = TRUE) in base R to plot a graph on top of another one.
If you remove the right axis from the first plot and add it manually on the second one it should look good.
Quarters <- c(1,2,3)
Series1 <- c(0.01,0.02,0.03)
Series2 <- c(0.04,0.05,0.06)
Series3 <- c(1000,2000,3000)
par(mar = c(5,5,2,5)) # Leaves some space for the second axis
plot(Quarters,Series1,type="l",ylim=c(0,0.1))
lines(Quarters,Series2,col="red")
par(new=TRUE) # Draw the next plot on top of the current one
plot(Quarters,Series3,type="l",axes=FALSE, xlab=NA, ylab=NA,col="blue") # Removes axis and labels so they don't overlap
axis(side = 4) # Adds secondary axis
Does this work for you?
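The answer above retypes the data as numbers by hand. A minimal sketch of doing that conversion in code and then adding the quarter labels and a legend the question asks for; pct_to_num is a hypothetical helper name, and the colours and legend position are arbitrary choices rather than anything from the original answer:

```r
# Data as posted in the question: every column is a character vector.
df <- data.frame(
  Quarters = c("Q1", "Q2", "Q3"),
  Series1  = c("1%", "2%", "3%"),
  Series2  = c("4%", "5%", "6%"),
  Series3  = c("1000", "2000", "3000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turns "2%" into 0.02
pct_to_num <- function(x) as.numeric(sub("%", "", x, fixed = TRUE)) / 100

df$Series1 <- pct_to_num(df$Series1)
df$Series2 <- pct_to_num(df$Series2)
df$Series3 <- as.numeric(df$Series3)
x <- seq_along(df$Quarters)  # numeric positions for the quarters

par(mar = c(5, 5, 2, 5))  # room on the right for the second axis
plot(x, df$Series1, type = "l", xaxt = "n",
     xlab = "Quarter", ylab = "Series1 / Series2",
     ylim = range(0, df$Series1, df$Series2))
axis(1, at = x, labels = df$Quarters)  # quarter names on the x-axis
lines(x, df$Series2, col = "red")

par(new = TRUE)  # overlay Series3 with its own y-range
plot(x, df$Series3, type = "l", axes = FALSE, xlab = "", ylab = "", col = "blue")
axis(side = 4)
mtext("Series3", side = 4, line = 3)
legend("topleft", legend = c("Series1", "Series2", "Series3"),
       col = c("black", "red", "blue"), lty = 1)
```

Because each plot() call picks its own y-range, the two axes are independent and no fixed ratio between them is assumed.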

ggplot2 is perfectly fine and deals with dual-axis very well. You would use sec.axis within scale_y_continuous or scale_y_discrete (or really just about any valid scale_y_) call:
scale_y_continuous(
"Casualties* due to:",
sec.axis = sec_axis(~. *0.001,
name="Aircraft passengers carried, bn",
labels = scaleFUN,
breaks = seq(0,3, by=0.5)),
limits = c(0,3000),
breaks = seq(0,3000, by=500),
labels = comma
)
The code above creates two axes. The primary axis, on the left, runs from 0 to 3000 in steps of 500. The secondary axis runs from 0 to 3 in steps of 0.5, but there's no reason why it has to follow that scale: sec_axis derives it from the primary axis through the formula you supply (here ~. *0.001), so it does not have to be a simple multiple of the primary axis as long as the formula is a one-to-one transformation.
Using the above technique you can get a dual-axis plot. If it is helpful, I put up the full ggplot code to recreate the plot in this post. It is done completely in ggplot2, including the horizontal legend and secondary axis.
Good luck!
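For the data in the question, one way to cope with a ratio that changes is to compute it from the data on every run. This is only a sketch under assumptions: the name ratio, the helper column Series3_scaled and the "largest value meets largest value" rule are illustrative choices, not part of the answer above. The plotting part is skipped when ggplot2 is not installed.

```r
Quarters <- c("Q1", "Q2", "Q3")
Series1 <- c(0.01, 0.02, 0.03)
Series2 <- c(0.04, 0.05, 0.06)
Series3 <- c(1000, 2000, 3000)
df <- data.frame(Quarters, Series1, Series2, Series3)

# Assumption: line up the largest Series3 value with the largest
# percentage value; recomputed whenever the data changes.
ratio <- max(df$Series3) / max(df$Series1, df$Series2)
df$Series3_scaled <- df$Series3 / ratio  # Series3 in primary-axis units

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Quarters, group = 1)) +
    geom_line(aes(y = Series1, colour = "Series1")) +
    geom_line(aes(y = Series2, colour = "Series2")) +
    geom_line(aes(y = Series3_scaled, colour = "Series3")) +
    scale_y_continuous(
      name = "Series1 / Series2",
      # The secondary axis undoes the scaling so it reads in Series3 units
      sec.axis = sec_axis(~ . * ratio, name = "Series3")
    ) +
    labs(colour = NULL)
  print(p)
}
```

The secondary axis is still a transformation of the primary one, but since the factor comes from the data it does not need to be known in advance.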

Related

Y axis to percent using barplot [duplicate]

I'm plotting a graph using this
plot(dates,returns)
I would like to have the returns expressed as percentages instead of numbers. 0.1 would become 10%. Also, the numbers on the y-axis appear tilted 90 degrees on the left. Is it possible to make them appear horizontally?
Here is one way, using las=1 to turn the labels on the y-axis horizontal and axis() to draw a new y-axis with adjusted labels.
dates <- 1:10
returns <- runif(10)
plot(dates, returns, yaxt="n") # suppress the default y-axis
axis(2, at=pretty(returns), labels=pretty(returns) * 100, las=1) # labels in percent, drawn horizontally
If you use ggplot you can use the scales package.
library(scales)
plot + scale_y_continuous(labels = percent)
library(scales)
dates <- 1:100
returns <- runif(100)
yticks_val <- pretty_breaks(n=5)(returns)
plot(dates, returns, yaxt="n")
axis(2, at=yticks_val, labels=percent(yticks_val))
Highlights:
No need to explicitly add "%"
Manually fix the number of y-ticks to be consistent with further plots. Here I chose 5.
Combining two answers together #rengis #vladiim
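If you would rather not depend on the scales package, the same labels can be built in base R with sprintf. A minimal sketch; pct_labels is a hypothetical helper name, and rounding to whole percent is an assumption:

```r
# Hypothetical helper: 0.1 -> "10%", rounded to whole percent
pct_labels <- function(x) sprintf("%.0f%%", 100 * x)

set.seed(1)
dates <- 1:10
returns <- runif(10)
ticks <- pretty(returns)

plot(dates, returns, yaxt = "n")  # suppress the default y-axis
axis(2, at = ticks, labels = pct_labels(ticks), las = 1)  # horizontal % labels
```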

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly centered in a small range (1-10), but a significant number of points (say, 10%) are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
neither of which is what I want.
update
Following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and then get "ugly" x-axis labels like 1.1754, 1.2985 etc. The other, which I prefer, is to control the actual bin margins used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
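The same workaround can also address the part of the question about making the bins match the axis labels: choose the bin edges yourself in the original units and pass their log10 values to hist() as breaks. A minimal sketch, assuming strictly positive data; the half-decade spacing of edges is an arbitrary choice:

```r
set.seed(42)
x <- rlnorm(200, sdlog = 2)  # long-tailed, strictly positive example data

# Bin edges in the original units, spaced half a decade apart and
# wide enough to cover the whole range of x.
edges <- 10^seq(floor(log10(min(x))), ceiling(log10(max(x))), by = 0.5)

# Bin the log-transformed data with the log-transformed edges ...
h <- hist(log10(x), breaks = log10(edges), axes = FALSE,
          xlab = "x (log scale)", main = "Histogram with chosen bin edges")
axis(2)
# ... and label every edge with its value in the original units.
axis(1, at = log10(edges), labels = signif(edges, 2))
```

Every tick now sits exactly on a bar boundary, so the axis labels and the bars agree.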
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT: new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=FALSE) # compute the histogram without drawing it
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=TRUE,freq=TRUE,col="blue") # draw the bars on top of the grid lines
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2 look. Making the background grey is not that straightforward, but it is doable if you draw a rectangle the size of your plot region and fill it with grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
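The grey background mentioned above could be sketched like this: par("usr") returns the limits of the plot region, and rect() fills that region before the bars are drawn. The explicit xlim and ylim are assumptions added here so that the bars fit completely inside the region:

```r
set.seed(1)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
H <- hist(log10(x), plot = FALSE)  # compute only, no drawing

# Empty plot that fixes the coordinate system
plot(H$mids, H$counts, type = "n",
     xlim = range(H$breaks), ylim = c(0, max(H$counts)),
     xlab = "log10(X)", ylab = "Counts", main = "Histogram of X")

usr <- par("usr")  # c(x1, x2, y1, y2) of the plot region
rect(usr[1], usr[3], usr[2], usr[4], col = "lightgrey", border = NA)
abline(h = pretty(H$counts), col = "white")  # grid lines over the grey

plot(H, add = TRUE, col = "blue")  # bars on top of the background
box()  # redraw the frame covered by the rectangle
```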
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range.

How can I make legend next to my piechart in R?

I have made a pie chart in R with the following code:
#make slices
slices <- c(19, 26, 55)
# Define some colors
colors <- c("yellow2","olivedrab3","orangered3")
# Calculate the percentage for each day, rounded to one decimal place
slices_labels <- round(slices/sum(slices) * 100, 1)
# Concatenate a '%' char after each value
slices_labels <- paste(slices_labels, "%", sep="")
# Create a pie chart with defined heading and custom colors and labels
pie(slices, main="Sum", col=colors, labels=slices_labels, cex=0.8)
# Create a legend at the right
legend("topright", c("DH","UT","AM"), cex=0.7, fill=colors)
But I want the legend next to my piechart. I have also tried the following code: legend("centreright", c("DH","UT","AM"), cex=0.7, fill=colors).
But this does not give me a legend next to my pie chart.
Which code do I have to use to make a legend next to my pie chart in the middle?
You can play with the x and y argument from legend (cf ?legend):
legend(.9, .1, c("DH","UT","AM"), cex = 0.7, fill = colors)
However, a pie chart may not be the best way to represent your data, because our eyes are not very good at assessing angles. The only use case where a pie chart seems reasonable to me is when comparing 2 categories, because we are used to reading clock faces and can therefore assess these proportions rather easily.
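As an alternative to explicit coordinates, legend() also accepts the keyword "right" ("centreright" is not one of its keywords), which centres the legend vertically at the right-hand side. A minimal sketch; the wider right margin and the negative inset that pushes the legend into that margin are assumptions you may need to tune for your device size:

```r
slices <- c(19, 26, 55)
colors <- c("yellow2", "olivedrab3", "orangered3")
slices_labels <- paste0(round(slices / sum(slices) * 100, 1), "%")

# Wider right margin; xpd = TRUE allows drawing outside the plot region
op <- par(mar = c(2, 2, 3, 8), xpd = TRUE)
pie(slices, main = "Sum", col = colors, labels = slices_labels, cex = 0.8)

# "right" centres the legend vertically; the negative inset moves it outward
lg <- legend("right", legend = c("DH", "UT", "AM"), fill = colors,
             cex = 0.7, inset = c(-0.25, 0))
par(op)  # restore the previous graphics settings
```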

How can I make my vertical labels fit within my plotting window?

I'm creating a histogram in R which displays the frequency of several events in a vector. Each event is represented by an integer in the range [1, 9]. I'm displaying the label for each count vertically below the chart. Here's the code:
hist(vector, axes = FALSE, breaks = chartBreaks)
axis(1, at = tickMarks, labels = eventTypes, las = 2, tick = FALSE)
Unfortunately, the labels are too long, so they are cut off by the bottom of the window. How can I make them visible? Am I even using the right chart?
Look at help(par), in particular fields mar (for the margin) and oma (for outer margin).
It may be as simple as
par(mar=c(10,3,1,1)) # extra large bottom margin (the default bottom margin is 5.1 lines)
hist(vector, axes = FALSE, breaks = chartBreaks)
axis(1, at = tickMarks, labels = eventTypes, las = 2, tick = FALSE)
This doesn't sound like a job for a histogram - the event is not a continuous variable. A barplot or dotplot may be more suitable.
Some dummy data
set.seed(123)
vec <- sample(1:9, 100, replace = TRUE)
vec <- factor(vec, labels = paste("My long event name", 1:9))
A barplot is produced via the barplot() function. We provide it with the counts of each event, using the table() function for convenience. Here we need to rotate the labels using las = 2 and create some extra space for the labels in the margin
## lots of extra space in the margin for side 1
op <- par(mar = c(10,4,4,2) + 0.1)
barplot(table(vec), las = 2)
par(op) ## reset
A dotplot is produced via function dotchart() and has the added convenience of sorting out the plot margins for us
dotchart(table(vec))
The dotplot has the advantage over the barplot of using much less ink to display the same information and focuses on the differences in counts across groups rather than the magnitudes of the counts.
Note how I've set the data up as a factor. This allows us to store the event labels as the labels for the factor - thus automating the labelling of the axes in the plots. It also is a natural way of storing data like I understand you to have.
Perhaps adding \n into your labels so they will wrap onto 2 lines? It's not optimal, but it may work.
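The wrapping idea could be automated with strwrap(), which splits each label at spaces. A minimal sketch; the wrap width of 12 characters and the bottom margin of 6 lines are arbitrary assumptions, and the dummy data mirrors the example from the other answer:

```r
set.seed(123)
vec <- sample(1:9, 100, replace = TRUE)
vec <- factor(vec, levels = 1:9, labels = paste("My long event name", 1:9))

# Break each label into lines of fewer than 12 characters, joined by "\n"
wrapped <- vapply(levels(vec),
                  function(s) paste(strwrap(s, width = 12), collapse = "\n"),
                  character(1), USE.NAMES = FALSE)

op <- par(mar = c(6, 4, 2, 2) + 0.1)  # less bottom margin than unwrapped labels need
barplot(table(vec), names.arg = wrapped, las = 2, cex.names = 0.7)
par(op)  # restore the previous margins
```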
You might want to look at this post from Cross Validated

