Histogram using qplot of ggplot2 looks different from the book

I am new to R and am learning from the book Hands-On Programming with R. I have a simple task: plot a histogram using qplot. However, the book's graph and mine differ even though we use the same command:
library("ggplot2")
x <- c(1, 2, 2, 2, 3, 3)
qplot(x, binwidth = 1)
Unlike my histogram, the one in the book uses intervals of the form [1, 2), so its bars start at 1 rather than 0.5. I'd appreciate any help in figuring out what's wrong here.

I don't think you're doing anything wrong. I'm getting the same plot as you are getting with the code from the book.
I added xlim and I'm getting a slightly better plot.
qplot(x, binwidth = 1, xlim = c(0, 4))

My best guess is one of the following:
The qplot function was updated at some point (though I think this is unlikely), so the labels changed.
The book tries to reduce confusion by showing only part of the code, while actually doing some relabeling behind the scenes (which ends up causing extra confusion). This could be done rather simply:
qplot(c(1, 2, 2, 2, 3, 3), binwidth = 1, xlab = 'x') +
  scale_x_continuous(breaks = c(1, 2, 3), labels = c('[1, 2)', '[2, 3)', '[3, Inf)'))
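If the aim is for the bars themselves to span [1, 2), [2, 3) and [3, 4), as in the book, rather than only relabelling the ticks, one option is to pass the bin edges explicitly. This is a sketch assuming a recent ggplot2, where geom_histogram() forwards the breaks and closed arguments to stat_bin(); it calls ggplot() directly because qplot() is deprecated from ggplot2 3.4.0 onward.

```r
library(ggplot2)

x <- c(1, 2, 2, 2, 3, 3)

# Explicit bin edges 1, 2, 3, 4; closed = "left" makes each bin of the form [a, b)
p <- ggplot(data.frame(x = x), aes(x = x)) +
  geom_histogram(breaks = c(1, 2, 3, 4), closed = "left")
print(p)

# Inspect the computed bins: edges and counts per bin
layer_data(p)[, c("xmin", "xmax", "count")]
```

With the default settings, current ggplot2 versions place bins of width 1 so that they are centred on the integers (boundaries at 0.5, 1.5, 2.5, ...), which is why the plot in the question starts at 0.5.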

Related

How to find the first extrema (peaks) in a time series and extract the values in R

I am trying to find the peaks in a time series. In this case I am looking for the first minimum and the first maximum (extrema) of the following data:
library(data.table)
data <- data.table(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(1, -1, 2, 3, 3, 1, 3, 1))
I am able to do it with the stat_peaks function of the ggpmisc package and the argument span = NULL. But now I want to extract the values of these peaks, so I tried find_peaks (also from the ggpmisc package), but there I can't use the argument span = NULL.
How can I extract the values found by the stat_peaks and stat_valleys functions? I can only see the peak values in the visualization; I am not able to find them in the data.
library(ggplot2)
library(ggpmisc)   # stat_peaks(), stat_valleys()
library(magrittr)  # %>%
data %>%
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  stat_peaks(col = "red", span = NULL, ignore_threshold = 0.01) +
  stat_valleys(col = "blue", span = NULL, ignore_threshold = 0.01)
The package you are using (ggpmisc) is an extension to ggplot2, so it is mainly meant for graphing. To get the locations of the peaks and valleys you can use other packages; e.g. the functions quantmod::findPeaks and quantmod::findValleys might be one solution. See this post for more details.
I don't know whether these functions agree with the package you are using; I assume they use different criteria to find the peaks and valleys.
Note that these functions return the index of the position after each peak (or valley) in the time series. You can get the positions of the peaks and valleys themselves by correcting for this:
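peaks <- quantmod::findPeaks(data$y) - 1      # indices of the peaks
valleys <- quantmod::findValleys(data$y) - 1  # indices of the valleys
data[peaks]    # rows of data at the peaks
data[valleys]  # rows of data at the valleys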
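If you prefer not to depend on quantmod, a similar rule (a local maximum where the slope changes from rising to falling, a local minimum where it changes from falling to rising) can be sketched in base R. find_extrema() is a hypothetical helper, not a function from ggpmisc or quantmod, and its criterion need not match what stat_peaks()/stat_valleys() use. It uses a plain data.frame so it runs without data.table. Note that a flat top such as y[4] == y[5] is reported at both ends.

```r
# Hypothetical helper: indices of local maxima and minima, based on
# sign changes of the first differences of y
find_extrema <- function(y) {
  s <- sign(diff(y))   # -1 falling, 0 flat, 1 rising
  d <- diff(s)         # d[i] compares the slope before and after y[i + 1]
  list(peaks   = which(d < 0) + 1,   # slope decreases at y[i + 1]
       valleys = which(d > 0) + 1)   # slope increases at y[i + 1]
}

data <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(1, -1, 2, 3, 3, 1, 3, 1))

ext <- find_extrema(data$y)
ext$peaks    # 4 5 7
ext$valleys  # 2 6

# First maximum and first minimum, as rows of the data
first_max <- data[ext$peaks[1], ]
first_min <- data[ext$valleys[1], ]
```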

Rotate y axis TEXT labels in plot.zoo

I would like to rotate the y-axis labels to horizontal, but I can't find an answer that doesn't use ggplot.
Is there a way to rotate them in plot.zoo?
The labels I mean are the panel labels Series 1 to Series 5.
library(xts)  # also attaches zoo, which provides plot.zoo
data <- xts(matrix(rnorm(1000), ncol = 5),
            order.by = as.Date(1:200, origin = "1970-01-01"))
plot.zoo(data)
Use las=1 like this:
plot.zoo(data, las = 1)
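For reference, las is the standard graphics parameter for the orientation of axis tick labels (see ?par). A small base-R sketch of its four values, independent of zoo:

```r
# las: 0 = parallel to the axis (default), 1 = always horizontal,
#      2 = perpendicular to the axis, 3 = always vertical
op <- par(mfrow = c(2, 2))
for (l in 0:3) {
  plot(1:10, las = l, main = paste("las =", l))
}
par(op)  # restore the previous panel layout
```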
Update
The question later clarified that it was referring to the ylab. plot.zoo uses mtext for that and hard codes it; however, we could hack it using trace:
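library(xts)
# Inject a local mtext() at the start of plot.zoo that draws the labels
# smaller (cex = 0.7) and horizontal (las = 1)
trace(plot.zoo,
      quote(mtext <- function(...) graphics::mtext(..., cex = 0.7, las = 1)))
plot.zoo(data, oma = c(6, 5, 5, 0))  # widened outer margins so the labels fit
untrace(plot.zoo)  # restore the original plot.zoo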

Reduce the amount of y axis tick values in R (pirateplot)

Since I updated R to version 3.5.2, my pirateplots (yarrr package) show more y-axis ticks than before. Before, they only showed ticks at whole-number values (1, 2, 3, etc.), but now they also show ticks at .5 values (1, 1.5, 2, 2.5, 3, etc.). This is despite the data and scripts being exactly the same as before.
Do you know how I can remove the .5 ticks and make the plots look like they did before?
I think the yaxt.y argument is what you're looking for, as it allows you to override the default y-axis construction.
Here's a plot using the default arguments:
library(yarrr)
pirateplot(weight ~ Diet, data = ChickWeight)
Now with the yaxt.y argument
pirateplot(weight ~ Diet, data = ChickWeight, yaxt.y = seq(0, 400, 25))
You can also specify the grid line widths with gl.lwd:
pirateplot(weight ~ Diet, data = ChickWeight, yaxt.y = seq(0, 400, 25), gl.lwd = c(.5, 1.5), gl.col = "black")
Hope this helps!
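To get back to ticks at whole numbers only, one option along the same lines is to compute an integer sequence spanning the data and pass it to yaxt.y. A sketch on hypothetical 1 to 5 rating data (the ratings data frame is made up for illustration); the pirateplot() call only runs if yarrr is installed:

```r
# Hypothetical example data: scores on a 1-5 scale for two groups
set.seed(1)
ratings <- data.frame(group = rep(c("A", "B"), each = 20),
                      score = sample(1:5, 40, replace = TRUE))

# Whole-number tick positions covering the observed range
int_ticks <- seq(floor(min(ratings$score)), ceiling(max(ratings$score)), by = 1)

if (requireNamespace("yarrr", quietly = TRUE)) {
  yarrr::pirateplot(score ~ group, data = ratings, yaxt.y = int_ticks)
}
```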

Why is the last x-axis label omitted?

In the following example, the last x-axis label ("4.0") is omitted.
df <- data.frame(x = c(1, 2, 3.8), y = c(1, 2, 3))
#png(filename = "cutoff.png")
plot(df$x, df$y, xaxt = "n")
axis(side = 1, at = seq(0, 4, 0.5), labels = seq(0, 4, 0.5))
#dev.off()
How to prevent this behaviour?
Your axis limits do not include 4; you need to override the default limits of the plot (which are derived from the data) using xlim:
plot(df$x, df$y, xaxt = "n", xlim = c(1, 4))
Note that when you use axis, the values you give in at are also used as the labels unless you override them, so your script doesn't need to specify labels; it can become:
axis(side = 1, at = seq(0, 4, 0.5))
As @griffinevo answered (+1), if you want the axis limits to go to 4, you must specify that using xlim. However, it is probably worth explaining how the default limits are computed. This is explained in the documentation, but in a slightly obscure place: on the help page ?par, search for xaxs. There you will see:
Style "r" (regular) first extends the data range by 4 percent at each
end and then finds an axis with pretty labels that fits within the
extended range.
In your case, the data range from 1 to 3.8, so plot will look for pretty labels inside the range
1 - 0.04*(3.8-1) = 0.888
to
3.8 + 0.04*(3.8-1) = 3.912
4 is outside this range and so will not appear as an axis label. For completeness, it is worth noting that "pretty" sounds like just a word, but actually has a technical meaning here, related to the pretty function. If you look at the help page ?pretty, you will see the description:
Compute a sequence of about n+1 equally spaced ‘round’ values which
cover the range of the values in x. The values are chosen so that they
are 1, 2 or 5 times a power of 10.
There is additional detail on the help page.
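The computation above can be checked directly in R. This sketch reproduces the 4% extension, compares it with the limits plot() actually uses (par("usr")), and shows which of the requested tick positions fall inside those limits; axis() does not draw ticks outside the plot region, which is why 4.0 disappears.

```r
x <- c(1, 2, 3.8)
y <- c(1, 2, 3)

# xaxs = "r": extend the data range by 4% at each end
ext  <- 0.04 * diff(range(x))
lims <- range(x) + c(-ext, ext)
lims  # 0.888 3.912

# The x limits plot() actually uses
plot(x, y, xaxt = "n")
usr <- par("usr")[1:2]

# Requested tick positions that lie inside the plot region
at <- seq(0, 4, 0.5)
inside <- at[at >= usr[1] & at <= usr[2]]
inside  # 1.0 1.5 2.0 2.5 3.0 3.5 (0, 0.5 and 4 are outside)
```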

multiple barplots in same graph maintain the same axes and same bar width

Very stupid question but I can't find the answer online:
Need to plot multiple barplots together like this:
i=1:4
main = paste ("Location ", i)
windows()
par(mfrow=c(2,2))
a<-table(c(rep(10, 6), rep(4, 32)))
b<-table(c(rep(9, 6), rep(10, 32), rep(11,4)))
c<-table(c(rep(10, 6)))
d<-table(c(rep(10, 3), rep(9, 3), rep(8, 5)))
barplot(a, main=main[1], xlab='RSSI')
barplot(b, main=main[2], xlab='RSSI')
barplot(c, main=main[3], xlab='RSSI')#, width=0.5
barplot(d, main=main[4], xlab='RSSI')
1) Is it possible to keep the axis scale constant, so that it is the same on every graph?
2) Is it possible to keep the bar width constant across the graphs? I tried the width argument, but it does not seem to work, and I would like the width to be constant and fixed between graphs.
Thanks
EDIT: for 2), I know you can add zeros so that all the graphs have the same number of classes, but since I am plotting in a for loop (which I removed to keep the example simple), I would rather use another approach, if possible.
Thanks again
Possibly it was only for your example that you kept the data for each plot in separate vectors. Anyway, if the number of locations were much bigger, your workspace would soon be cluttered with small vectors, and you would have to call table and barplot many times.
It would be much easier to work with the data stored in a data frame, regardless of whether you plot using base R functions or ggplot. Furthermore, it might be easier to compare counts for the different levels of RSSI among the different locations if the same set of classes is used in each plot, i.e. if RSSI classes with zero counts are also included. You might also use the same y-axis scale across locations. Here is a small example with ggplot:
library(ggplot2)
# create a data frame with the data in your vectors
# 'x' is the value, and 'loc' the location of each registration
df <- data.frame(x = c(rep(10, 6), rep(4, 32),
                       rep(9, 6), rep(10, 32), rep(11, 4),
                       rep(10, 6),
                       rep(10, 3), rep(9, 3), rep(8, 5)),
                 loc = c(rep("a", 6 + 32), rep("b", 6 + 32 + 4), rep("c", 6), rep("d", 3 + 3 + 5)))
# plot using geom_bar, which by default counts the cases for each level of x - no need for 'table';
# facet_wrap uses the same (fixed) x and y scales in every panel by default
ggplot(data = df, aes(x = factor(x))) +
  geom_bar() +
  facet_wrap(~ loc)
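The same idea (one common set of RSSI classes with zero counts kept, plus a shared y scale) also works in base R inside a loop, which avoids padding each table by hand. A sketch using the values from the question; vals, all_levels, tabs and ymax are illustrative names:

```r
# Raw values per location (same data as in the question)
vals <- list(
  a = c(rep(10, 6), rep(4, 32)),
  b = c(rep(9, 6), rep(10, 32), rep(11, 4)),
  c = rep(10, 6),
  d = c(rep(10, 3), rep(9, 3), rep(8, 5))
)

# Every RSSI class seen at any location; factor levels keep zero counts
all_levels <- sort(unique(unlist(vals)))
tabs <- lapply(vals, function(v) table(factor(v, levels = all_levels)))
ymax <- max(sapply(tabs, max))  # common y-axis limit

op <- par(mfrow = c(2, 2))
for (i in seq_along(tabs)) {
  barplot(tabs[[i]], main = paste("Location", i), xlab = "RSSI", ylim = c(0, ymax))
}
par(op)
```

Because every table has the same classes, each panel has the same number of bars, so they come out equally wide.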
Have a look at the xlim and ylim arguments described in ?barplot:
barplot(a, main=main[1], xlab='RSSI', xlim=c(0, 4), ylim=c(0, 32))
barplot(b, main=main[2], xlab='RSSI', xlim=c(0, 4), ylim=c(0, 32))
barplot(c, main=main[3], xlab='RSSI', xlim=c(0, 4), ylim=c(0, 32))
barplot(d, main=main[4], xlab='RSSI', xlim=c(0, 4), ylim=c(0, 32))
(xlim and width influence each other.)
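One way to see why a shared xlim keeps the bars equally wide: barplot() invisibly returns the bar midpoints, and with the default width = 1 and space = 0.2 the bars sit at the same x positions in every panel, while xlim fixes how many x units each panel shows. A small sketch using tables a and b from the question:

```r
# Tables from the question (locations a and b)
a <- table(c(rep(10, 6), rep(4, 32)))
b <- table(c(rep(9, 6), rep(10, 32), rep(11, 4)))

op <- par(mfrow = c(1, 2))
# Same xlim in both panels: one x unit has the same physical width,
# so bars of the default width = 1 look equally wide
mid_a <- barplot(a, xlim = c(0, 4), ylim = c(0, 32), main = "a")
mid_b <- barplot(b, xlim = c(0, 4), ylim = c(0, 32), main = "b")
par(op)

as.vector(mid_a)  # 0.7 1.9
as.vector(mid_b)  # 0.7 1.9 3.1
```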
