How to plot two datasets in the same graph using poweRlaw - r

I have two datasets that I need to plot in the same graph.
The following is the code I used to plot the data. How can I plot both datasets in the same plot? How do I set the graph legend on the x-axis? I tried setting it, but it didn't work and I got an error.
m_bs = conpl$new(sample_data1$V1)
m_eq = conpl$new(sample_data2$V1)
est_bs = estimate_xmin(m_bs, xmax=5e+5)
est_eq = estimate_xmin(m_eq, xmax=Inf)
m_bs$setXmin(est_bs)
m_eq$setXmin(est_eq)
plot(m_bs)
lines(m_bs)
d = plot(m_eq, draw =FALSE)
points(d$x, d$y, col=2)
lines(m_eq,col=2,lwd=2)
Kindly let me know thanks.

Your code works fine for me when I use simulated data. However, I think your problem is with your data. In particular, you need to set the xlim values in your plot command so that both datasets fit on the axis. Something like:
min_x = min(sample_data1$V1, sample_data2$V1)
max_x = max(sample_data1$V1, sample_data2$V1)
plot(m_bs, xlim=c(min_x, max_x))
should do the trick. To add a legend, just use the legend function:
legend("bottomleft", col=1:2, legend = c("BS", "EQ"), lty=1)
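For reference, here is a minimal end-to-end sketch of the approach above. The original datasets are not shown, so the simulated data (drawn with rplcon), the seed, and the names x_bs and x_eq are assumptions used only for illustration.

```r
library(poweRlaw)

# Simulated stand-ins for sample_data1$V1 and sample_data2$V1 (assumed data)
set.seed(1)
x_bs <- rplcon(500, xmin = 1, alpha = 2.5)
x_eq <- rplcon(500, xmin = 10, alpha = 2)

# Fit a continuous power law to each dataset and set the estimated xmin
m_bs <- conpl$new(x_bs)
m_eq <- conpl$new(x_eq)
m_bs$setXmin(estimate_xmin(m_bs))
m_eq$setXmin(estimate_xmin(m_eq))

# First dataset: widen the x-axis so the second dataset fits too
plot(m_bs, xlim = range(x_bs, x_eq))
lines(m_bs)

# Second dataset: get the plotting coordinates without drawing, then add them
d <- plot(m_eq, draw = FALSE)
points(d$x, d$y, col = 2)
lines(m_eq, col = 2, lwd = 2)

legend("bottomleft", col = 1:2, legend = c("BS", "EQ"), lty = 1)
```

Using range() over both vectors gives the same limits as the separate min and max calls, in one step.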

Related

Multiple Pen's Parade Graphs on the same Plot

I'm doing stochastic dominance analysis with different income distributions using Pen's Parade. I can plot a single Pen's Parade using the Pen function from the ineq package, but I need a visual comparison and I want multiple lines in the same image. I don't know how to extract the values from the function, so I can't do this.
I have the following reproducible example:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x)
Pen(y)
I obtain the following plots:
I want to obtain something like the following:
You can use add = TRUE:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x); Pen(y, add = TRUE)
From help("Pen"):
add logical. Should the plot be added to an existing plot?
While the solution mentioned by M-M in the comments is a more general solution, in this specific case it produces a busy Y axis:
Pen(x)
par(new = TRUE)
Pen(y)
I would generalize the advice for plotting functions in this way:
Check the plotting function's help file. If it has an add argument, use that.
Otherwise, use the par(new = TRUE) technique
Update
As M-M helpfully mentions in the comments, their more general solution will not produce a busy Y axis if you manually suppress the Y axis on the second plot:
Pen(x)
par(new = TRUE)
Pen(y, yaxt = "n")
Looking at ?ineq::Pen(), it seems to work like plot(); therefore, the following works for you.
Pen(x)
Pen(y, add=T)
Note: However, add=T cuts off part of your data, since the second plot has points that fall outside the limits of the first.
Update on using par(new=T):
Using par(new=T) basically means overlaying two plots on top of each other; hence, it is important to draw them on the same scale. We can achieve that by setting the same axis limits. Likewise, when using the add=T argument, it is desirable to set the axis limits so as not to lose any part of the data. This is the best practice for overlaying two plots.
Pen(x, ylim=c(0,38), xlim=c(0,1))
par(new=T)
Pen(y, col="red", ylim=c(0,38), xlim=c(0,1), yaxt='n', xaxt='n')
Essentially, you can do the same with add=T.
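As a sketch of the add=T variant with shared limits, the y-range can be computed from the data instead of hard-coded. This assumes that Pen() with its default scaled = TRUE plots each value divided by the mean of its vector, which is consistent with the ylim of about 38 used above; the colours and the legend are illustrative choices.

```r
library(ineq)

set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)

# Assumption: with scaled = TRUE, Pen() draws values divided by their mean,
# so the shared y-range is computed on that scale
ylim <- range(x / mean(x), y / mean(y))

# First parade sets up the axes with limits wide enough for both series
Pen(x, ylim = ylim, col = "black")
# Second parade is drawn into the existing plot
Pen(y, add = TRUE, col = "red")
legend("topleft", legend = c("x", "y"), col = c("black", "red"), lty = 1)
```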

Time Series Analysis using ts.plot and abline()

Please explain which transformation I should be using in the code below to apply a white noise (WN) model.
Below is the code, which uses differencing; I did not use log() because the series is decaying:
data <- c(60088,48398,54687,43337,47839,43480,53297,46882,45387,47186,42794,43274,31486,29036,25242,21792,23699,19161)
diff_data <- diff(data)
ts.plot(diff_data)
model_wn <- arima(diff_data, order = c(0, 0, 0))
coeff<-model_wn$coef
ts.plot(data)
abline(0, coeff)
Please explain two things:
With ts.plot and abline, the line drawn by abline is not visible in the graph.
What can I do with time series analysis on the above data?
'abline' has some parameters that you can specify, for example:
If you want a horizontal line, you need to specify h = y-value.
If you want a vertical line, you need to specify v = x-value.
Your plot is produced by:
ts.plot(data)
If you want a horizontal line in your plot, add this code after the above code:
abline(h = 40000, lty = "dashed", col = "black")
'lty' is the line type and 'col' is the line color.
Similarly, if you want a vertical line, replace 'h' with 'v' in the above code. But remember that the value of 'v' should be within the bounds of your x-variable values.
Hope this helps answer your question.
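As a possible explanation of why the original abline(0, coeff) call shows nothing: it draws a line with intercept 0 and a slope equal to the fitted mean of the differences (about -2400), so over time points 1 to 18 the line lies entirely below zero, while the data sit between roughly 19,000 and 60,000. The sketch below is one reading of what was intended, not the only one; the name drift and the anchoring at the first observation are assumptions.

```r
data <- c(60088, 48398, 54687, 43337, 47839, 43480, 53297, 46882, 45387,
          47186, 42794, 43274, 31486, 29036, 25242, 21792, 23699, 19161)
diff_data <- diff(data)

# White noise model on the differences: the only coefficient is the mean
model_wn <- arima(diff_data, order = c(0, 0, 0))
drift <- unname(model_wn$coef["intercept"])

# On the differenced series, the fitted mean is a horizontal line
ts.plot(diff_data)
abline(h = drift, lty = "dashed")

# On the original series, the same estimate acts as a slope per time step.
# Anchoring the line at the first observation (time 1) keeps it in view.
ts.plot(data)
abline(a = data[1] - drift, b = drift, col = "red")
```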

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly concentrated in a small range (1-10), but a significant number of points (say, 10%) are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives one result,
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives another,
neither of which is what I want.
Update:
Following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and then get "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the actual bin margins used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
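To make the bin margins themselves match the axis labels, one option in base graphics is to pass explicit bin edges to hist on the log10 scale and label the axis at a subset of those edges. This is only a sketch: the edge spacing of four bins per decade, the seed, and the names edges and ticks are assumptions, and the data reuse the simulated example above.

```r
set.seed(42)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)

# Hypothetical bin edges: four per decade, wide enough to cover all of x
lo <- floor(log10(min(x)))
hi <- ceiling(log10(max(x)))
edges <- 10^seq(lo, hi, by = 0.25)

# Bin on the log10 scale using exactly these edges, without default axes
h <- hist(log10(x), breaks = log10(edges), axes = FALSE,
          xlab = "x", main = "Histogram of x (log scale)")
axis(2)

# Label every whole decade; each label sits exactly on a bin edge
ticks <- 10^(lo:hi)
axis(1, at = log10(ticks), labels = ticks)
```

Because the breaks are supplied rather than computed by hist, every tick label coincides with a bar margin.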
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use the sliders to view the distribution in a dynamically selected range.

Add a grid to an "ecdfplot" in R

I am using "ecdfplot" from the latticeExtra package to plot my error. I want to add gridlines.
The following does not seem to work:
ecdfplot(err)
grid(ny=10)
It gives the following (gridless) result:
I really would love to give a "graphical summary" where the quantiles are indicated by lines, and their intersections with the data are shown on the x-axis.
Can you tell me how to add gridlines?
How about adding vertical lines at a particular x-location?
Try the argument axis = axis.grid
require(latticeExtra)
data(singer, package = "lattice")
ecdfplot(~height, data = singer, add=TRUE, axis = axis.grid, par.settings = theEconomist.theme())
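The base grid() call has no effect here because lattice plots are drawn with the grid graphics system rather than base graphics, so additions have to go through a panel function. For vertical lines at particular x-locations, a custom panel function is one option. The sketch below marks three quantiles; the chosen probabilities and the names probs and q are assumptions made for illustration.

```r
library(latticeExtra)
data(singer, package = "lattice")

# Quantiles to mark; the probabilities are an arbitrary illustrative choice
probs <- c(0.25, 0.5, 0.75)
q <- quantile(singer$height, probs)

p <- ecdfplot(~height, data = singer,
              panel = function(x, ...) {
                panel.grid(h = -1, v = -1)               # grid at the tick positions
                panel.ecdfplot(x, ...)                   # the ECDF itself
                panel.abline(h = probs, v = q, lty = 2)  # quantile guide lines
              })
print(p)
```

Each dashed horizontal line meets the curve at the corresponding dashed vertical line, whose position on the x-axis is the quantile value.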
