Make y-axis logarithmic in histogram using R [duplicate]

This question already has answers here:
Histogram with Logarithmic Scale and custom breaks
(7 answers)
Closed 5 years ago.
Hi, I'm making a histogram in R, but the counts on the y-axis are so large that I need to put that axis on a logarithmic scale. Here is my script:
hplot<-read.table("libl")
hplot
pdf("first_end")
hist(hplot$V1, breaks=24, xlim=c(0,250000000), ylim=c(0,2000000),main="first end mapping", xlab="Coordinates")
dev.off()
So how should I change my script?
thx

You can save the histogram data to tweak it before plotting:
set.seed(12345)
x = rnorm(1000)
hist.data = hist(x, plot=F)
hist.data$counts = log10(hist.data$counts)
dev.new(width=4, height=4)
hist(x)
dev.new(width=4, height=4)
plot(hist.data, ylab='log10(Frequency)')

Another option would be to use plot(density(hplot$V1), log="y").
It's not a histogram, but it shows just about the same information, and it avoids the illogical part where a bin with zero counts is not well-defined in log-space.
Of course, this is only relevant when your data is continuous and not when it's really categorical or ordinal.
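To make the zero-count problem concrete, here is a minimal sketch. The data are made up (a normal sample plus one artificial outlier at 12) purely to force some empty bins; it is not the asker's data. It shows that log10(0) is -Inf and then draws only the non-empty bins as vertical lines on a log y-axis, which is one possible workaround rather than the only one.

```r
set.seed(1)
# hypothetical data: one far outlier leaves empty bins between it and the bulk
x <- c(rnorm(1000), 12)
h <- hist(x, plot = FALSE)

n_empty <- sum(h$counts == 0)   # number of empty bins
log_zero <- log10(0)            # -Inf, which cannot be placed on a log axis

# keep only the non-empty bins and draw them at the bin midpoints
keep <- h$counts > 0
plot(h$mids[keep], h$counts[keep], type = "h", log = "y",
     xlab = "x", ylab = "Frequency (log scale)")
```

The empty bins simply leave gaps in the plot, which is arguably more honest than assigning them an arbitrary height.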

A histogram with the y-axis on the log scale will be a rather odd histogram. Technically it will still fit the definition, but it could look rather misleading: the peaks will be flattened relative to the rest of the distribution.
Instead of using a log transformation, have you considered:
Dividing the counts by 1 million:
h <- hist(hplot$V1, plot=FALSE)
h$counts <- h$counts/1e6
plot(h)
Plotting the histogram as a density estimate:
hist(hplot$V1, freq=FALSE)

You can log your y-values for the plot and add a custom log y-axis afterwards.
Here is an example for a table object of random normal distribution numbers:
# data
count = table(round(rnorm(10000)*2))
# plot
plot(log(count) ,type="h", yaxt="n", xlab="position", ylab="log(count)")
# axis labels (0 is left out because log(0) is -Inf and cannot be drawn)
yAxis = c(1,10,100,1000)
# draw axis labels
axis(2, at=log(yAxis),labels=yAxis, las=2)

Related

Plot continuous data with discrete colors

I found some similar questions but the answers didn't solve my problem.
I am trying to plot a time series of two variables as a scatterplot, using the date to color the points. In this example, I created a simple dataset (see below) and I want to plot all data with timesteps in the 1960s, 70s, 80s and 90s with one colour per decade.
Using the standard plot command (plot(x,y,...)) it works the way it should, but when I try the ggplot library something strange happens; I guess I am missing something. Does anyone have an idea how to solve this and generate a correct plot?
Here is my code using the standard plot command with a colorbar
# generate data frame with test data
x <- seq(1,40)
y <- seq(1,40)
year <- c(rep(seq(1960,1969),2),seq(1970,1989,2),seq(1990,1999))
df <- data.frame(x,y,year)
# define interval and assign color to interval
myinterval <- seq(1959,1999,10)
mycolors <- rainbow(4)
colbreaks <- findInterval(df$year, vec = myinterval, left.open = T)
# basic plot
layout(array(1:2,c(1,2)),widths =c(5,1)) # divide the device area in two panels
par(oma=c(0,0,0,0), mar=c(3,3,3,3))
plot(x,y,pch=20,col = mycolors[colbreaks])
# add colorbar
ncols <- length(myinterval)-1
colbarlabs <- seq(1960,2000,10)
par(mar=c(5,0,5,5))
image(t(array(1:ncols, c(ncols,1))), col=mycolors, axes=F)
box()
axis(4, at=seq(0.5/(ncols-1)-1/(ncols-1),1+1/(ncols-1),1/(ncols-1)), labels=colbarlabs, cex.axis=1, las=1)
abline(h=seq(0.5/(ncols-1),1,1/(ncols-1)))
mtext("year",side=3,line=0.5,cex=1)
As I would like to use ggplot package, as I do for other plots, I tried this version with ggplot
# plot with ggplot
require(ggplot2)
ggplot(df, aes(x=x,y=y,color=year)) + geom_point() +
scale_colour_gradientn(colours= mycolors[colbreaks])
but it didn't work the way I thought it would. Obviously, there is something wrong with the color coding. Also, the colorbar looks strange. I also tried scale_color_manual and scale_color_gradient2, but I got more errors (Error in continuous_scale).
Any idea how to solve this and generate a plot that matches the standard plot, including a colorbar?
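One likely cause is that year is a continuous variable, so ggplot builds a continuous gradient, and mycolors[colbreaks] hands that gradient 40 colours instead of 4. A minimal sketch of one way around this, assuming a discrete legend is an acceptable substitute for the colorbar: bin the years into a factor with cut() and map that factor to a manual colour scale. The decade column and its labels are names introduced here for illustration.

```r
# same test data as in the question
x <- seq(1, 40)
y <- seq(1, 40)
year <- c(rep(seq(1960, 1969), 2), seq(1970, 1989, 2), seq(1990, 1999))
df <- data.frame(x, y, year)

# bin the years into decades; cut() uses (lo, hi] intervals by default,
# which matches findInterval(..., left.open = TRUE) in the base version
df$decade <- cut(df$year, breaks = seq(1959, 1999, 10),
                 labels = c("1960s", "1970s", "1980s", "1990s"))
mycolors <- rainbow(4)

# only draw the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y, colour = decade)) +
    geom_point() +
    scale_colour_manual(values = mycolors, name = "year")
  print(p)
}
```

Because decade is a factor, scale_colour_manual receives exactly one colour per level, which avoids the continuous_scale error.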

Plot a histogram with densities past initialization

This code will create two plots.
a = c(0,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,8,9,9,9)
b = hist(a, freq=FALSE)
dev.new()
plot(b)
The first is a histogram of the density (which I want to have). But if one would like to plot b later on, it will always be plotted as frequency.
Is there any chance to plot the histogram as density past initialisation?
You just need to change the argument in plot
plot(b,freq=FALSE)

How do I create a histogram in R with logarithmic x and y axes? [duplicate]

This question already has answers here:
Histogram with Logarithmic Scale and custom breaks
(7 answers)
Closed 10 years ago.
I have a vector of integers, quotes, and I want to check whether it follows a power-law distribution by plotting the frequency of the data points with both the x and y axes logarithmic. However, I am not quite sure how to accomplish this in R. I can currently create a histogram using
hist(quotes, breaks = max(quotes))
But the axes are all linear.
There's probably a better way to do this, but this (basically) works:
data = rlnorm(1000,0,1)  # positive values only, so log() is defined for every point
r <- hist(log(data))
plot(r$breaks[-1],log(r$counts))
EDIT: Better solution:
r <- hist(data)
plot(r$breaks[-1], r$counts, log='xy', type='h')
# or alternatively:
barplot(r$counts, log="y", col="white", names.arg=r$breaks[-1])
The barplot version doesn't have a transformed x axis for reasons that will become clear if you try it with the x axis transformed.
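For integer data like the asker's, another possible route is to skip hist() and tabulate the values directly, then plot frequency against value on log-log axes. A minimal sketch follows; the quotes vector here is a simulated heavy-tailed stand-in, not the asker's data, and the exponent 1.5 is an arbitrary choice for the simulation.

```r
set.seed(1)
# hypothetical stand-in for the asker's vector: heavy-tailed positive integers
quotes <- ceiling(runif(5000)^(-1 / 1.5))

# table() only lists values that actually occur, so no zero counts arise
counts <- table(quotes)
values <- as.integer(names(counts))
freqs <- as.vector(counts)

# on log-log axes a power law shows up as a roughly straight line
plot(values, freqs, log = "xy", xlab = "value", ylab = "frequency")
```

Since every plotted frequency is at least 1, the log y-axis never has to deal with log(0).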

histogram and pdf in the same graph [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Fitting a density curve to a histogram in R
I'd like to plot on the same graph the histogram and various pdf's. I've tried for just one pdf with the following code (adopted from code I've found in the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
For multiple pdf's what's the trick?
I've also noticed that the histogram seems to plot the density (on the y-axis) instead of the number of observations. How can I change this?
Many thanks!
It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
To head off your next question: if you specify freq=TRUE (or just leave it out, since it is the default) and add the curve, the curve just runs along the bottom (avoiding that is the whole purpose of plotting the histogram as a density rather than as frequencies). You can work around this by scaling the expression given to curve by the bin width and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Though personally the first option (density scale) without tickmark labels on the y-axis makes more sense to me.
h<-hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit<-seq(min(data),max(data),length=100)
x.norm<-rnorm(n=100000, mean=a, sd=b)
yfit<-dnorm(xfit,mean=mean(x.norm),sd=sd(x.norm))
yfit <- yfit*diff(h$mids[1:2])*length(data)  # scale the density by bin width and sample size
lines(xfit, yfit, col="blue", lwd=2)
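For the part of the question about several pdfs, a minimal sketch: draw the histogram on the density scale once, then call curve() with add=TRUE for each additional density. The sample obs, the parameter values and the colours are made up for illustration. Note that inside curve() the name x refers to the plotting grid, so the data are stored under a different name.

```r
set.seed(1)
a <- 5
b <- 2
obs <- rnorm(100, a, b)   # hypothetical sample standing in for the real data

# density-scale histogram, so pdfs can be overlaid without rescaling
hist(obs, freq = FALSE, col = "grey", breaks = "FD")

# first pdf: normal with the known parameters
curve(dnorm(x, mean = a, sd = b), col = 2, add = TRUE)

# second pdf: normal with parameters estimated from the sample
m_hat <- mean(obs)
s_hat <- sd(obs)
curve(dnorm(x, mean = m_hat, sd = s_hat), col = 4, lty = 2, add = TRUE)

# a kernel density estimate can be added the same way with lines()
lines(density(obs), col = 3)
```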

Problem with axis limits when plotting curve over histogram [duplicate]

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
(1 answer)
Closed 6 years ago.
newbie here. I have a script to create graphs that has a bit that goes something like this:
png(Test.png)
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a curve of a distribution over a histogram. My problem is that the axis limits are apparently set by the histogram instead of the curve, so that the curve sometimes goes beyond the Y-axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, alternatively, of the curve only)? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted, and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what I ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could compute mx <- max(curve_vector, ht$counts) and set ylim=c(0, mx), but I rather doubt the code looks like that, since [] is not a proper parameter-passing idiom and step is not an R plotting function but a model selection function. So I am guessing this is code in Matlab or some other language. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
plot(round(ht$breaks), dpois( round(ht$breaks), # plot a Poisson density
mean(ht$counts*round(ht$breaks[-length(ht$breaks)]))),
ylim=c(0, max(ht$density)+.1) , type="l")
plot(ht, freq=FALSE, add=TRUE) # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the highest y-value of the curve (there may be shortcuts to do this depending on the nature of the curve) and the highest bar in the histogram, and pass the larger of the two to the ylim argument when plotting the histogram.
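That last suggestion can be sketched as follows. The data, the normal curve and the grid size of 200 points are assumptions chosen for illustration; the curve's maximum is found by simply evaluating it on a grid spanning the histogram's range.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)   # hypothetical data

# compute the histogram without drawing it
ht <- hist(x, breaks = 20, plot = FALSE)

# evaluate the curve on a grid covering the histogram's x range
xs <- seq(min(ht$breaks), max(ht$breaks), length.out = 200)
curve_y <- dnorm(xs, mean = 5, sd = 2)

# upper y limit: the higher of the tallest bar and the curve's peak
y_max <- max(curve_y, ht$density)

plot(ht, freq = FALSE, ylim = c(0, y_max))
lines(xs, curve_y, col = 2)
```

Inside a loop over several panels, the same three steps (compute ht, evaluate the curve, take the max) can be repeated per panel before each plot() call.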
