How to fit R histogram within axes limits [0,1] - r

Suppose I generate data using x <- rnorm(10000) and then plot a simple histogram using hist(x).
This obviously shows that the data is normal, but the x and y axes are determined by the values generated. How could I adjust x so that the histogram will still appear as a normal curve, but on a plot whose bounds are x=[0,1] and y=[0,1]. I tried using this normalization method from another answer, https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range, and setting xlim and ylim to c(0,1), but the result was not what I wanted, as it basically just fills up the entire plot.

I'm not sure what you mean by 'fills up the whole plot'. This code seems to work fine:
x <- rnorm(1000)
z <- (x - min(x))/(max(x) - min(x))
hist(z)
Then if you want the y-axis on a scale of 0-1:
hist1 <- hist(z)
hist1$counts <- hist1$counts/sum(hist1$counts)
plot(hist1, ylim = c(0,1)) ## Looks squished to me if you include the ylim argument

Related

How to make custom spacing for y-axis in R?

I want to make a plot in R where the spacing between ticks on the y-axis all have the same distance and the tick labels are a custom list of values, for example:
set.seed(1)
n <- 10
x <- 1:n
y <- rnorm(n)
plot(x, y, axes = FALSE, ylim=c(-2,2))
axis(1)
axis(2, seq(-2,2,1), c(-100,-10,0,5,1000))
gets me a plot where the distance between the y-axis ticks are equal but clearly the true distance between values is not equal, i.e., -100 to - 10 is not the same distance as 5 to 1000, numerically.
Now this works, but the problem with this solution is that the data is not correctly mapped to the right position in the plot. As in, I would like for the data to be plotted correctly based on the original scale. So either I need a way to simply change the y-axis to be plotted on a different scale, or for the data to be transformed to a new scale that matches my axis(2, seq(-2,2,1), c(-100,-10,0,5,1000)) command.
I guess what I am saying is I want the equivalent of plot(x, y, log = "y") but I don't actually want the log scale, I just want the tick marks to be even spaced based on values I want shown, i.e., -100,-10,0,5,1000
Your example is a bit hard to implement because you are setting the plot boundaries from -2 to 2 and then wanting axis labels that go from -100 to 1000. It should work if you use at and set the boundaries of the initial plot to match the axis parameters. I've modified your example to spread the data across the plot more evenly:
set.seed(1)
n <- 10
x <- 1:n
y <- 100*rnorm(n)
yticks = c(-100,-10,0,5,200)
plot(x, y, axes = FALSE, ylim=c(-100,200))
axis(1)
axis(2,at = yticks,labels=yticks)

R how to automatically adjust y axis when using basic plot with xlim

I'm trying to use base R (and would like to stick to it for this problem) to plot a specific portion of a dataset.
My example data looks like below:
x <- c(1:100)
y <- sort(runif(100, min=0, max=1000))
When I plot this with plot(x,y, type='l'), I get a plot with a y axis that shows 0 to 1000. However, when I plot only a specific x range, my y axis still shows 0 to 1000. I would like to zoom in to reduce the y axis range.
For example,
plot(x,y, type='l', xlim=c(40,60))
plot(x,y, type='l', xlim=c(80,90))
both produces plots with a y axis that ranges c(0,1000). But I'd like to zoom in so that the y axis range for the first plot is something like c(300,700) and that for the second plot is c(700,1000). (300, 700 and 1000 are all arbitrary numbers just to illustrate the purpose to really zoom into the plot). Is there a way to do this without setting specific ylim?
I'd like to avoid using ylim because I'm plotting and saving in a for loop and I can't write a ylim that is suitable for all plots. I've thought of doing something like ylim = max(y)*1.5, but again, since I'm cutting the y values off based on xlim, this doesn't help with zooming in whenever xlim changes.
Subset the relevant data and plot that
lower = 40
upper = 60
ind = which(x >= lower & x <= upper)
plot(x[ind], y[ind], type = "l")

Add points and colored surface to locfit perspective plot

I have a perspective plot of a locfit model and I wish to add two things to it
Predictor variables as points in the 3D space
Color the surface according to the Z axis value
For the first, I have tried to use the trans3d function. But I get the following error even though my variables are in vector format:
Error in cbind(x, y, z, 1) %*% pmat : requires numeric/complex matrix/vector arguments
Here is a snippet of my code
library(locfit)
X <- as.matrix(loc1[,1:2])
Y <- as.matrix(loc1[,3])
zz <- locfit(Y~X,kern="bisq")
pmat <- plot(zz,type="persp",zlab="Amount",xlab="",ylab="",main="Plains",
phi = 30, theta = 30, ticktype="detailed")
x1 <- as.vector(X[,1])
x2 <- as.vector(X[,2])
Y <- as.vector(Y)
points(trans3d(x1,x2,Y,pmat))
My "loc1" data can be found here - https://www.dropbox.com/s/0kdpd5hxsywnvu2/loc1_amountfreq.txt?dl=0
TL,DR: not really in plot.locfit, but you can reconstruct it.
I don't think plot.locfit has good support for this sort of customisation. Supposedly get.data=T in your plot call will plot the original data points (point 1), and it does seem to do so, except if type="persp". So no luck there. Alternatively you can points(trans3d(...)) as you have done, except you need the perspective matrix returned by persp, and plot.locfit.3d does not return it. So again, no luck.
For colouring, typically you make a colour scale (http://r.789695.n4.nabble.com/colour-by-z-value-persp-in-raster-package-td4428254.html) and assign each z facet the colour that goes with it. However, you need the z-values of the surface (not the z-values of your original data) for this, and plot.locfit does not appear to return this either.
So to do what you want, you'll essentially be recoding plot.locfit yourself (not hard, though just cludgy).
You could put this into a function so you can reuse it.
We:
make a uniform grid of x-y points
calculate the value of the fit at each point
use these to draw the surface (with a colour scale), saving the perspective matrix so that we can
plot your original data
so:
# make a grid of x and y coords, calculate the fit at those points
n.x <- 20 # number of x points in the x-y grid
n.y <- 30 # number of y points in the x-y grid
zz <- locfit(Total ~ Mex_Freq + Cal_Freq, data=loc1, kern="bisq")
xs <- with(loc1, seq(min(Mex_Freq), max(Mex_Freq), length.out=20))
ys <- with(loc1, seq(min(Cal_Freq), max(Cal_Freq), length.out=30))
xys <- expand.grid(Mex_Freq=xs, Cal_Freq=ys)
zs <- matrix(predict(zz, xys), nrow=length(xs))
# generate a colour scale
n.cols <- 100 # number of colours
palette <- colorRampPalette(c('blue', 'green'))(n.cols) # from blue to green
# palette <- colorRampPalette(c(rgb(0,0,1,.8), rgb(0,1,0,.8)), alpha=T)(n.cols) # if you want transparency for example
# work out which colour each z-value should be in by splitting it
# up into n.cols bins
facetcol <- cut(zs, n.cols)
# draw surface, with colours (col=...)
pmat <- persp(x=xs, y=ys, zs, theta=30, phi=30, ticktype='detailed', main="plains", xlab="", ylab="", zlab="Amount", col=palette[facetcol])
# draw your original data
with(loc1, points(trans3d(Mex_Freq,Cal_Freq,Total,pmat), pch=20))
Note - doesn't look that pretty! might want to adjust say your colour scale colours, or the transparency of the facets, etc. Re: adding legend, there are some other questions that deal with that.
(PS: what a shame ggplot doesn't do 3D scatter plots.)

log-transformed density function not plotting correctly

I'm trying to log-transform the x axis of a density plot and get unexpected results. The code without the transformation works fine:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
If I add scale_x_log10(), I get the following result:
Apart from the y values having been rescaled, something seems to have happened to the x values as well -- the peaks of the density function are not quite where the points are.
Am I using the log transformation incorrectly here?
The shape of the density curve changes after the transformation because the distribution of the data has changed and the bandwidths are different. If you set a bandwidth of (bw=1000) prior to the transformation and 10 afterward, you will get two normal looking densities (with different y-axis values because the support will be much larger in the first case). Here is an example showing how varying bandwidths change the shape of the density.
data = data.frame(x=c(1,2,10,11,1000), y=0)
## Examine how changing bandwidth changes the shape of the curve
par(mfrow=c(2,1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main="No Transform")
points(data, pch=19)
plot(density(log10(data$x)), ylim=c(0,2), main="Log-transform w/ varying bw")
points(log10(data$x), data$y, pch=19)
for (i in 1:10)
points(density(log10(data$x), bw=0.02*i), col=greys[i], type="l")
legend("topright", paste(0.02*1:10), col=greys, lty=2, cex=0.8)

log="y" convert only y-axis label or also y-coordinates of my data?

I'm building a plot in R and I have used the plot() function, with log="y" parameter.
Does that mean that ONLY the y-axis labels will be converted in log scale OR that also the y-coordinates of my data will be converted in log-scale?
Thank you
When using log = "y" it plots the log-transformed y-values with the labels on the original scale -- the opposite of what you seem to suggest.
Compare these three plots:
x <- rnorm(50)
y <- 2*exp(x) + rexp(50)
plot(x, y) # y-scale, y-scale-labels
plot(x, y, log = "y") # log-y-scale, y-scale-labels
plot(x, log(y)) # log-y-scale, log-y-scale labels
Notice that the last two plots only differs in the y-axis labels. Both are still correct as the axis titles are also different.

Resources