Power law distribution with R

I tried to visualize a power law p(x) = x^(-2.5) with the following R code. On a log scale the tail of the histogram becomes very noisy, which is expected, as can be seen here
But now, and this is my problem, I read an article where the author says I have to use a cumulative distribution function to remove this noise in the tail. For me it doesn't work, as can be seen here
library(ggplot2)
chol_r <- read.table("C:\\Users\\me\\Desktop\\1M_just_random_py.txt",
                     header = FALSE)
chol <- (chol_r)**(-2.5) # this is p(x)
chol2 = (1/1.5)*chol_r**(-1.5) # the cumulative distribution function
qplot(chol2,
      geom = "histogram",
      binwidth = 0.001, # 0.001 or 0.38
      main = "Histogram",
      xlab = "Numbers",
      fill = I("blue"),
      col = I("red"),
      log = "xy")
Does anybody know what I am doing wrong, or how I can get a straight falling line without that noise?
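One way to read the article's advice is to skip the binning entirely and plot the empirical complementary CDF, P(X >= x), of the samples on log-log axes. For a density proportional to x^(-2.5) this should fall on a straight line with slope -1.5, and because no bins are involved the tail is much smoother. The sketch below is only an illustration under assumptions that are not in the question: the samples are generated here by inverse-transform sampling with a hypothetical lower cutoff xmin = 1 and a hypothetical sample size n, instead of being read from the original text file.

```r
set.seed(1)

alpha <- 2.5   # exponent of the density p(x) ~ x^(-alpha)
xmin  <- 1     # assumed lower cutoff (not given in the question)
n     <- 1e5   # assumed sample size

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# xmin * (1 - U)^(-1 / (alpha - 1)) follows the power law above.
x <- xmin * (1 - runif(n))^(-1 / (alpha - 1))

# Empirical complementary CDF: fraction of samples >= each sorted value.
xs   <- sort(x)
ccdf <- (n:1) / n

plot(xs, ccdf, log = "xy", type = "l",
     xlab = "x", ylab = "P(X >= x)", main = "Empirical CCDF")

# Theoretical CCDF for comparison: (x / xmin)^(-(alpha - 1)).
lines(xs, (xs / xmin)^(-(alpha - 1)), lty = 2, col = "red")

# Slope of the empirical CCDF on log-log axes, expected near -(alpha - 1).
slope <- unname(coef(lm(log10(ccdf) ~ log10(xs)))[2])
```

The fitted slope is only a quick sanity check on the plot; it is not a recommended way to estimate a power-law exponent.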

Related

Lag 0 is not plotted in ggCcf

With the following code I plotted the cross-correlation of my data. Everything works well, but the plot does not show lag 0, which is highly important for my study.
library(forecast) # provides ggCcf
p = ggCcf(
  df_ccf$Asia_Co,
  df_ccf$EU_USA,
  lag.max = 10,
  type = c("correlation", "covariance"),
  plot = TRUE,
  na.action = na.contiguous)
plot(p)
The plot looks like this:
Head of data:
I encountered the same issue; it might be a bug in ggCcf from the forecast package. I couldn't get ggCcf to work, no matter what I tried. Anyone who wants to reproduce this behaviour can try:
ggCcf(c(1,2,3,4),c(2,3,4,6))
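The workaround is to use the base R ccf function:
max_lag = 10
result = ccf(series1, series2, lag.max = max_lag)
y = as.vector(result$acf) # cross-correlation values, one per lag
x = -max_lag:max_lag      # lags from -max_lag to max_lag, including lag 0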
You can use these two vectors to plot the ccf with ggplot2, choosing an appropriate ylim.
The downside is less convenience, but the upside is that you can add some flair/styling to your plot now that you are doing everything yourself anyway ;).
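A minimal sketch of that workaround could look like the following. The two series are simulated placeholders (the question's df_ccf is not available), and the dashed lines use the same white-noise 95% bounds that base R draws on its own acf plots, which is my choice and not something the answer specifies.

```r
library(ggplot2)

set.seed(1)
# Placeholder data standing in for the two real series.
series1 <- cumsum(rnorm(100))
series2 <- series1 + rnorm(100)

max_lag <- 10
result  <- ccf(series1, series2, lag.max = max_lag, plot = FALSE)

# One row per lag, from -max_lag to max_lag, so lag 0 is included.
ccf_df <- data.frame(lag = as.vector(result$lag),
                     acf = as.vector(result$acf))

# Approximate 95% bounds under the white-noise assumption.
ci <- qnorm(0.975) / sqrt(result$n.used)

p <- ggplot(ccf_df, aes(x = lag, y = acf)) +
  geom_hline(yintercept = 0) +
  geom_hline(yintercept = c(-ci, ci), linetype = "dashed", colour = "blue") +
  geom_segment(aes(xend = lag, yend = 0)) +
  scale_x_continuous(breaks = -max_lag:max_lag) +
  ylim(-1, 1) +
  labs(x = "Lag", y = "CCF")
print(p)
```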

R: How does stat_density_ridges modify and plot data?

<Disclaimer(s) - (1) This is my first post, so please be gentle, specifically regarding formatting and (2) I did try to dig as much as I could on this topic before posting the question here>
I have a simple data vector containing returns of 40 portfolios on the same day:
Year Return
Now -17.39862061
Now -12.98954582
Now -12.98954582
Now -12.86928749
Now -12.37044334
Now -11.07007504
Now -10.68971539
Now -10.07578182
Now -9.984867096
Now -8.764036179
Now -8.698093414
Now -8.594026566
Now -8.193638802
Now -7.818599701
Now -7.622627735
Now -7.535216808
Now -7.391239166
Now -7.331315517
Now -5.58059597
Now -5.579797268
Now -4.525201797
Now -3.735909224
Now -2.687532902
Now -2.65363884
Now -2.177522898
Now -1.977644682
Now -1.353205681
Now -0.042584345
Now 0.096564181
Now 0.275416046
Now 0.638839543
Now 1.959529042
Now 3.715519428
Now 4.842819691
Now 5.475946426
Now 6.380955219
Now 6.535937309
Now 8.421762466
Now 8.556800842
Now 10.39185524
I am trying to plot these returns so that I can compare them against other days (for example, the rest of my history). I tried to use stat_density_ridges as per the code block below
library(ggplot2)
library(ggridges)
ggplot(data = data.plot, aes(x = Return, y = Year, fill = factor(..quantile..))) +
  stat_density_ridges(geom = "density_ridges_gradient", calc_ecdf = TRUE,
                      quantiles = c(0.025, 0.5, 0.975),
                      quantile_lines = TRUE)
As you can see, the "Year" in this case is the same for every row, i.e. there is no height parameter, yet I get a nice ridge chart. While the chart is beautiful to behold, I am at a loss to determine how the plotting function computes the density in this case, especially the height.
This is the output chart I get (I have omitted the formatting code here since it doesn't make a difference to my question):
Portfolio Return Distribution Plots - US versus Europe
I tried digging into the code of the function itself, but came up blank. The documentation didn't help (except perhaps to hint that the function plots continuous distributions).
Any help, or guidance, or even a nudge in the right direction would be extremely helpful.
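As a nudge in that direction: as far as I understand it, stat_density_ridges needs no height aesthetic because it estimates one. It runs a kernel density estimate over the x values of each group and uses the estimated density as the ridge height, which the geom then rescales for display. The sketch below is a rough base R approximation of that idea using the 40 returns listed above. It is not the ggridges implementation; the Gaussian kernel, R's default bandwidth, and the use of plain sample quantiles for the vertical lines are my assumptions.

```r
returns <- c(-17.39862061, -12.98954582, -12.98954582, -12.86928749,
             -12.37044334, -11.07007504, -10.68971539, -10.07578182,
             -9.984867096, -8.764036179, -8.698093414, -8.594026566,
             -8.193638802, -7.818599701, -7.622627735, -7.535216808,
             -7.391239166, -7.331315517, -5.58059597, -5.579797268,
             -4.525201797, -3.735909224, -2.687532902, -2.65363884,
             -2.177522898, -1.977644682, -1.353205681, -0.042584345,
             0.096564181, 0.275416046, 0.638839543, 1.959529042,
             3.715519428, 4.842819691, 5.475946426, 6.380955219,
             6.535937309, 8.421762466, 8.556800842, 10.39185524)

# Kernel density estimate: Gaussian kernel, default bandwidth (bw.nrd0),
# evaluated on 512 equally spaced points.
d <- density(returns)

# Sample quantiles at the same probabilities as in the question.
q <- quantile(returns, c(0.025, 0.5, 0.975))

# The curve height is the estimated density, not a column in the data.
plot(d$x, d$y, type = "l", xlab = "Return", ylab = "Estimated density")
abline(v = q, lty = 2)

# Sanity check: the area under the estimated density is close to 1.
area <- sum(d$y) * diff(d$x[1:2])
```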

Why aren't any points showing up in the qqcomp function when using plotstyle="ggplot"?

I want to compare the fit of different distributions to my data in a single plot. The qqcomp function from the fitdistrplus package pretty much does exactly what I want to do. The only problem I have however, is that it's mostly written using base R plot and all my other plots are written in ggplot2. I basically just want to customize the qqcomp plots to look like they have been made in ggplot2.
From the documentation (https://www.rdocumentation.org/packages/fitdistrplus/versions/1.0-14/topics/graphcomp) I get that this is totally possible by setting plotstyle="ggplot". If I do this however, no points are showing up on the plot, even though it worked perfectly without the plotstyle argument. Here is a little example to visualize my problem:
library(fitdistrplus)
library(ggplot2)
set.seed(42)
vec <- rgamma(100, shape=2)
fit.norm <- fitdist(vec, "norm")
fit.gamma <- fitdist(vec, "gamma")
fit.weibull <- fitdist(vec, "weibull")
model.list <- list(fit.norm, fit.gamma, fit.weibull)
qqcomp(model.list)
This gives the following output:
While this:
qqcomp(model.list, plotstyle="ggplot")
gives the following output:
Why are the points not showing up? Am I doing something wrong here or is this a bug?
EDIT:
So I haven't figured out why this doesn't work, but there is a pretty easy workaround. The function call qqcomp(model.list, plotstyle="ggplot") still returns a ggplot object, which includes the data used to make the plot. Using that data, one can easily write one's own plotting function that does exactly what one wants. It's not very elegant, but until someone finds out why it's not working as expected I will just use this method.
I was able to reproduce your error and indeed, it's really intriguing. Maybe you should contact the developers of this package to report this bug.
Otherwise, you can reproduce this Q-Q plot with ggplot2 and stat_qq by passing the corresponding quantile function and its parameters (stored in $estimate):
library(ggplot2)
df = data.frame(vec)
ggplot(df, aes(sample = vec)) +
  stat_qq(distribution = qgamma, dparams = as.list(fit.gamma$estimate), color = "green") +
  stat_qq(distribution = qnorm, dparams = as.list(fit.norm$estimate), color = "red") +
  stat_qq(distribution = qweibull, dparams = as.list(fit.weibull$estimate), color = "blue") +
  geom_abline(slope = 1, color = "black") +
  labs(title = "Q-Q Plots", x = "Theoretical quantiles", y = "Empirical quantiles")
Hope it will help you.

Knn Regression in R

I am investigating Knn regression methods and later Kernel Smoothing.
I wish to demonstrate these methods using plots in R. I have generated a data set using the following code:
x = runif(100,0,pi)
e = rnorm(100,0,0.1)
y = sin(x)+e
I have been trying to follow a description of how to use "knn.reg" in 9.2 here:
https://daviddalpiaz.github.io/r4sl/k-nearest-neighbors.html#regression
grid2=data.frame(x)
knn10 = FNN::knn.reg(train = x, test = grid2, y = y, k = 10)
My predicted values seem reasonable to me but when I try to plot a line with them on top of my x~y plot I don't get what I'm hoping for.
plot(x,y)
lines(grid2$x,knn10$pred)
I feel like I'm missing something obvious and would really appreciate any help or advice you can offer, thank you for your time.
You just need to sort the x values before plotting the lines.
plot(x,y)
ORD = order(grid2$x)
lines(grid2$x[ORD],knn10$pred[ORD])
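The reason sorting helps is that lines() joins points in the order they are given, so unsorted x values make the line jump back and forth across the plot. An alternative that avoids the problem is to predict on an evenly spaced grid, which is already sorted. The sketch below illustrates this with a small hand-written k-NN average (knn_predict is a hypothetical helper, used here instead of FNN::knn.reg so that the example runs with base R only).

```r
set.seed(42)
x <- runif(100, 0, pi)
e <- rnorm(100, 0, 0.1)
y <- sin(x) + e

# Hypothetical helper: for each new point, average the responses of the
# k training points whose x values are closest to it.
knn_predict <- function(x_train, y_train, x_new, k = 10) {
  sapply(x_new, function(x0) {
    nearest <- order(abs(x_train - x0))[1:k]
    mean(y_train[nearest])
  })
}

# An evenly spaced grid is sorted, so lines() draws left to right.
grid <- seq(0, pi, length.out = 200)
pred <- knn_predict(x, y, grid, k = 10)

plot(x, y)
lines(grid, pred, col = "red", lwd = 2)
```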

rarecurve() plotted with Standard Error

Does rarecurve() (vegan) accept standard error for plotting?
If so, how can I plot such a curve?
I am following a classical script for this, with the BCI dataset:
library(vegan)
data(BCI)
S <- specnumber(BCI)
(raremax <- min(rowSums(BCI)))
Srare <- rarefy(BCI, raremax)
plot(S, Srare, xlab = "Observed No. of Species", ylab = "Rarefied No. of Species")
abline(0, 1)
rarecurve(BCI, step = 20, sample = raremax, col = "blue", cex = 0.6)
Statistically speaking, having such a function would be helpful to most vegan users.
Thank you!
André
rarecurve does not give you SE. The reason is obvious and already given to you: there is enough clutter without extra curves. If you really want to do this, you must do it manually. That is not too complicated, because the rarefy function accepts a vector of sample sizes and gives you all the numbers you need. The following draws a basic plot using one site of the Barro Colorado data set:
library(vegan)
data(BCI)
sum(BCI[1,]) # site 1, 448 tree stems
N <- seq(2, 448, by=8)
S <- rarefy(BCI[1,], N, se = TRUE)
plot(N, S[1,], type="l", lwd=3)
lines(N, S[1,] + 2*S[2,]) ## 2*SE is good enough for 95% CI
lines(N, S[1,] - 2*S[2,])
Statistically speaking, this gives you only the error caused by the subsampling process assuming that the observed data have no random variation. To me this makes little sense, and I find the rarefaction SE's misleading and meaningless. That does not stop me providing them in vegan.
