Formatting histograms in R

I'm trying to fit a Variance-Gamma distribution to empirical data of 1-minute logarithmic returns. To visualize the results, I plotted two histograms together: empirical and theoretical.
(a is the vector of empirical data; VG_sim_rescaled holds the values simulated from the fitted distribution)
SP_hist <- hist(a,
col = "lightblue",
freq = FALSE,
breaks = seq(min(a), max(a), length.out = 141),
border = "white",
main = "",
xlab = "Value",
xlim = c(-0.001, 0.001))
hist(VG_sim_rescaled,
freq = FALSE,
breaks = seq(min(VG_sim_rescaled), max(VG_sim_rescaled), length.out = 141),
xlab = "Value",
main = "",
col = "orange",
add = TRUE)
(empirical histogram in blue, theoretical histogram in orange)
However, after plotting the two histograms together, I started wondering about two things:
1. In both histograms I set freq = FALSE, so I expected the y-axis to lie in the range (0, 1). In the actual plot, however, the y-axis values exceed 3,000. How can this happen, and how can I fix it?
2. I need to change the bin size (the width of the buckets) and the density per unit length of the x-axis. How can I do this?
Thank you for your help.

freq=FALSE means that the histogram shows densities: the total area of the histogram (the sum of bar width times bar height) is normalized to one. As your x-axis has a very small range (about 10^(-4)), the y-values must be quite large for the area to add up to one.
The reliable way to control the bins is to provide a vector of break points to the parameter breaks. The parameter also accepts a single number, but hist treats it only as a suggestion (the break points are rounded to "pretty" values), so the actual number of bins may differ from the one you asked for. Thus try the following:
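To see the normalization numerically, here is a small sketch on simulated data; the normal sample with sd = 2e-4 is a hypothetical stand-in for the returns, not the asker's data. hist(..., plot = FALSE) returns the freq = FALSE bar heights in $density, and multiplying them by the bin widths and summing gives the total area.

```r
# Hypothetical stand-in for 1-minute log returns: a very small spread
set.seed(1)
x <- rnorm(10000, mean = 0, sd = 2e-4)

# Compute the histogram without drawing it; $density holds the freq = FALSE heights
h <- hist(x, breaks = 100, plot = FALSE)

# Total area = sum of (bar height * bar width); equal to 1 up to rounding error
area <- sum(h$density * diff(h$breaks))
print(area)

# The bars are nevertheless much taller than 1 because they are so narrow
print(max(h$density))
```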
# x is your data vector
bins <- 6 # number of cells
breaks <- seq(min(x), max(x), length.out = bins + 1) # bins + 1 equally spaced break points
hist(x, freq = FALSE, breaks = breaks)
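For the second question (bucket width and density per unit length), one option is to fix the bin width instead of the number of bins and build the break points from it. In this sketch the simulated vector a and the width 2e-5 are hypothetical values; the last step checks that the density hist reports is count / (n * bin width), i.e. the density per unit length of the x-axis.

```r
# Hypothetical data standing in for the empirical returns (vector a in the question)
set.seed(42)
a <- rnorm(5000, mean = 0, sd = 2e-4)

# Hypothetical bucket width; breaks are integer multiples of it covering the data
width <- 2e-5
k <- seq(floor(min(a) / width), ceiling(max(a) / width))
breaks <- k * width

h <- hist(a, breaks = breaks, freq = FALSE, col = "lightblue",
          border = "white", main = "", xlab = "Value")

# Density per unit length of the x-axis = count / (n * bin width)
dens_manual <- h$counts / (length(a) * diff(h$breaks))
print(all.equal(h$density, dens_manual))
```

Halving width roughly halves the counts per bin, while the density values stay on the same per-unit-length scale.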

Related

Histogram overlay not visible

I need to overlay a normal distribution curve, fitted to a dataset, on a histogram of the same dataset.
The histogram and the normal curve each look right on their own, but when I combine them using add = TRUE in the curve function, the curve just stays a flat line.
I tried adjusting xlim and ylim, but I'm not getting the intended result, and I'm confused about how to set the x and y limits to suit both the histogram and the curve.
Any suggestions? My dataset contains the daily walking distances of 100 individuals, ranging from min = 0.4 km to max = 10 km.
library(readxl)
bd.m <- read_excel('walking.xlsx')[[1]] # assumes the distances are in the first column
hist(bd.m, ylim = c(0,10))
curve(dnorm(x, mean = mean(bd.m), sd = sd(bd.m)), add = TRUE, col = 'red')
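You need to set freq = FALSE in the call to hist. By default, hist plots counts, which are far larger than the values of the normal density, so the curve is squashed into a flat line near zero. With freq = FALSE the histogram is drawn on the same density scale as dnorm(). For example: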
dt <- rnorm(1000, 2)
hist(dt, freq = FALSE)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = 'red')
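If the top of the curve is still clipped, the y-limits can be computed from both objects. This sketch uses hypothetical uniform walking distances in place of walking.xlsx; it takes the larger of the tallest bar and the peak of the normal density (at the mean, equal to 1 / (sd * sqrt(2 * pi))) as the upper limit. With add = TRUE, curve() takes its x-range from the existing plot, so xlim usually needs no adjustment.

```r
# Hypothetical daily walking distances in km, standing in for walking.xlsx
set.seed(1)
walk_km <- runif(100, min = 0.4, max = 10)

m <- mean(walk_km)
s <- sd(walk_km)

# Histogram heights on the density scale, computed without drawing
h <- hist(walk_km, plot = FALSE)

# Upper y-limit: whichever is taller, the histogram or the normal curve's peak
y_max <- max(h$density, dnorm(m, mean = m, sd = s))

hist(walk_km, freq = FALSE, ylim = c(0, y_max), main = "", xlab = "Distance (km)")
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")
```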

How do you implement rgamma and dgamma in a single plot

For an assignment I was asked this:
For the values of (shape=5,rate=1),(shape=50,rate=10),(shape=.5,rate=.1), plot the histogram of a random sample of size 10000. Use a density rather than a frequency histogram so that you can add in a line for the population density (hint: you will use both rgamma and dgamma to make this plot). Add an abline for the population and sample mean. Also, add a subtitle that reports the population variance as well as the sample variance.
My current code looks like this:
library(ggplot2)
set.seed(1234)
x = seq(1, 1000)
s = 5
r = 1
plot(x, dgamma(x, shape = s, rate = r), rgamma(x, shape = s, rate = r), sub =
paste0("Shape = ", s, "Rate = ", r), type = "l", ylab = "Density", xlab = "", main =
"Gamma Distribution of N = 1000")
After running it I get this error:
Error in plot.window(...) : invalid 'xlim' value
What am I doing incorrectly?
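plot() does not take y1 and y2 arguments. See ?plot. Your third positional argument (the rgamma() draws) is matched to xlim, which is why you get the invalid 'xlim' error. Instead, plot (or draw a histogram of) one y variable (e.g., the draws from rgamma), then add the second y variable (e.g., from dgamma) using something like lines().
Here's one way to get what you want: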
#specify parameters
s = 5
r = 1
# plot histogram of random draws
set.seed(1234)
N = 1000
hist(rgamma(N, shape=s, rate=r), breaks=100, freq=FALSE)
# add true density curve
x = seq(from=0, to=20, by=0.1)
lines(x=x, y=dgamma(x, shape=s, rate=r))
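The full assignment asks for three parameter pairs, a sample size of 10000, mean lines and a variance subtitle. Extending the answer above, a sketch could loop over the pairs using the standard Gamma(shape, rate) moments, mean = shape/rate and variance = shape/rate^2. The colours, breaks = 100 and the 1 x 3 panel layout are arbitrary choices. Note that all three pairs share the population mean 5 but have different population variances (5, 0.5 and 50).

```r
params <- list(c(shape = 5, rate = 1),
               c(shape = 50, rate = 10),
               c(shape = 0.5, rate = 0.1))
N <- 10000
set.seed(1234)

op <- par(mfrow = c(1, 3))
results <- lapply(params, function(p) {
  s <- p[["shape"]]
  r <- p[["rate"]]
  draws <- rgamma(N, shape = s, rate = r)
  pop_mean <- s / r    # population mean of Gamma(shape, rate)
  pop_var  <- s / r^2  # population variance of Gamma(shape, rate)

  # Density histogram of the sample, with both variances in the subtitle
  hist(draws, breaks = 100, freq = FALSE, xlab = "",
       main = paste0("shape = ", s, ", rate = ", r),
       sub = sprintf("pop. var = %.2f, sample var = %.2f", pop_var, var(draws)))
  # Population density curve plus vertical lines for the two means
  curve(dgamma(x, shape = s, rate = r), add = TRUE, col = "red", lwd = 2)
  abline(v = pop_mean, col = "blue", lwd = 2)
  abline(v = mean(draws), col = "darkgreen", lty = 2, lwd = 2)

  c(pop_mean = pop_mean, sample_mean = mean(draws),
    pop_var = pop_var, sample_var = var(draws))
})
par(op)
```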

Two-sided barplot in R with different data

I was wondering whether it's possible to make a two-sided barplot (e.g. Two sided bar plot ordered by date) that shows Data A above the axis and Data B below it for each x-value.
For example, Data A could be a person's age and Data B the size (height) of the same person. The problem, and the main difference from the examples above, is that A and B obviously have completely different units and y-ranges.
Example:
X = c("Anna","Manuel","Laura","Jeanne") # Name of the Person
A = c(12,18,22,10) # Age in years
B = c(112,186,165,120) # Size in cm
Any ideas how to solve this? I don't mind a horizontal or a vertical solution.
Thank you very much!
Here's code that gets you a solid draft of what I think you want using barplot from base R. I'm just making one series negative for the plotting, then manually setting the labels in axis to reference the original (positive) values. You have to make a choice about how to scale the two series so the comparison is still informative. I did that here by dividing height in cm by 10, which produces a range similar to the range for years.
# plot the first series, but manually set the range of the y-axis to set up the
# plotting of the other series. Set axes = FALSE so you can get the y-axis
# with labels you want in a later step.
barplot(A, ylim = c(-25, 25), axes = FALSE)
# plot the second series, making whatever transformations you need as you go. Use
# add = TRUE to add it to the first plot; use names.arg to get X as labels; and
# repeat axes = FALSE so you don't get an axis here, either.
barplot(-B/10, add = TRUE, names.arg = X, axes = FALSE)
# add a line for the x-axis if you want one
abline(h = 0)
# now add a y-axis with labels that makes sense. I set lwd = 0 so you just
# get the labels, no line.
axis(2, lwd = 0, tick = FALSE, at = seq(-20,20,5),
labels = c(rev(seq(0,200,50)), seq(5,20,5)), las = 2)
# now add y-axis labels
mtext("age (years)", 2, line = 3, at = 12.5)
mtext("height (cm)", 2, line = 3, at = -12.5)
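The hand-picked scale factor (10) and tick labels can also be derived from the data. The sketch below is one hypothetical way to do that: it rescales B so that its tallest bar matches the tallest bar of A, and uses pretty() to choose ticks for each series, placing each tick at its rescaled position but labelling it with the original value.

```r
X <- c("Anna", "Manuel", "Laura", "Jeanne") # name of the person
A <- c(12, 18, 22, 10)                      # age in years
B <- c(112, 186, 165, 120)                  # height in cm

# Scale factor chosen so that the tallest B bar is as long as the tallest A bar
k <- max(B) / max(A)
lim <- 1.1 * max(A)

barplot(A, ylim = c(-lim, lim), axes = FALSE)
barplot(-B / k, add = TRUE, names.arg = X, axes = FALSE)
abline(h = 0)

# Ticks in original units for each series, kept inside the plotting range
top_ticks <- pretty(c(0, max(A)))
top_ticks <- top_ticks[top_ticks <= lim]
bottom_ticks <- pretty(c(0, max(B)))
bottom_ticks <- bottom_ticks[bottom_ticks > 0 & bottom_ticks / k <= lim]

# Place each tick at its plotting position but label it with the original value
axis(2, at = top_ticks, labels = top_ticks, las = 2)
axis(2, at = -bottom_ticks / k, labels = bottom_ticks, las = 2)
mtext("age (years)", 2, line = 3, at = lim / 2)
mtext("height (cm)", 2, line = 3, at = -lim / 2)
```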
Result with par(mai = c(0.5, 1, 0.25, 0.25)):

Fine tuning addADX() to avoid truncating the trend curves

I am using the following code to look at the past 9 months of a stock.
library(quantmod)
getSymbols("AMZN")
candleChart(to.weekly(AMZN),multi.col=TRUE,theme="white",subset='last 9 months')
addADX()
You can see that the red line is mostly cut off because it lies largely below the value of 20. I want to modify the y-axis range of addADX() so that it always shows all three lines. How can I do that?
The input parameters of addADX() only control the computation of the directional movement index, not the graphical parameters; those are set according to the average directional index.
A simple workaround to display the positive and negative direction index completely is to compute the directional movement index by yourself with ADX() from the TTR package and then add it to the previous chart with addTA(), which allows more customisation.
library(quantmod)
getSymbols("AMZN")
dat <- to.weekly(AMZN)
candleChart(dat, multi.col = TRUE, theme = "white", subset = "last 9 months")
adx <- ADX(HLC(dat), n = 14, maType = "EMA", wilder = TRUE)[, c("DIp", "DIn", "ADX")]
addTA(adx, col = c("green", "red", "blue"), lwd = c(1, 1, 2), legend = NULL)

Adding color to circular data based on group membership

I'm trying to color specific points in my circular data based on group membership. I have two groups: individuals with a certain medical condition and healthy controls. I've converted their data from degrees to radians and plotted it, but I haven't managed to change the color of the points selectively based on the factor variable I have.
Note that I've loaded library(circular), which doesn't allow me to use ggplot. Here's the syntax I've been working with:
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=w$dx, axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
As you can see, I passed the factor variable (which has two levels) to col, but everything is still drawn in one color. Any suggestions?
It seems plot.circular does not support assigning multiple colours to its points. Here's one potential workaround:
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units="degrees"), 10, control.circular=list(units="degrees"))
bcirc2 <- rvonmises(100, circular(0, units="degrees"), 10, control.circular=list(units="degrees"))
bcirc <- c(bcirc1, bcirc2)
dx <- c(rep(1,100),rep(2,100))
## start with blank plot, then add group-specific points
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=NA,
axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
points(bcirc[dx==1], col=rgb(1,0,0,0.1), cex=2) # note: a loop would be cleaner if dealing with >2 levels
points(bcirc[dx==2], col=rgb(0,0,1,0.1), cex=2)
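Following the comment about a loop for more than two levels, here is a sketch that draws one points() call per level of a grouping factor. The three groups, their mean directions and the colour vector are hypothetical; the angles are generated as plain numbers in degrees and then wrapped in a single circular object.

```r
library(circular)
set.seed(1)

# Hypothetical data: three groups with different mean directions (degrees)
group_means <- c(0, 90, 180)
n_per_group <- 50
dx <- factor(rep(seq_along(group_means), each = n_per_group))
angles <- unlist(lapply(group_means, function(m)
  as.numeric(rvonmises(n_per_group, circular(m, units = "degrees"), 10,
                       control.circular = list(units = "degrees")))))
bcirc <- circular(angles, units = "degrees")
cols <- c("red", "blue", "darkgreen") # one colour per factor level

# Blank plot first, then one points() call per level of the grouping factor
plot(bcirc, col = NA, axes = FALSE)
for (i in seq_along(levels(dx))) {
  points(bcirc[dx == levels(dx)[i]], col = cols[i], cex = 1.5)
}
```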
Inspired by Paul Regular's example, here is a version using the same data where one condition is plotted stacking inwards and the other is plotted stacking outwards.
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc2 <- rvonmises(100, circular(0, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc <- data.frame(condition = c(
rep(1,length(bcirc1)),
rep(2,length(bcirc2)) ),
angles = c(bcirc1,
bcirc2) )
## start with blank plot, then add group-specific points
dev.new(); par(mai = c(1, 1, 0.1,0.1))
plot(circular(subset(bcirc, condition == 1)$angles, units = 'degrees'), stack = TRUE, bins = 60, shrink = 1, col = 1, sep = 0.005, tcl.text = -0.073, # text outside
axes = TRUE, xlab = "Basal sCORT", ylab = "Basal sAA")
par(new = TRUE)
plot(circular(subset(bcirc, condition == 2)$angles, units = 'degrees'), stack = TRUE, bins = 60, shrink = 1.05, col = 2,
sep = -0.005, axes = FALSE) # inner circle, no axes, stacks inwards
