I have the following algorithm:
Step 1. Generate u1, u2 ~ U(0,1).
Step 2. Define v1 = 2*u1 - 1, v2 = 2*u2 - 1, and s = v1^2 + v2^2.
Step 3. If s > 1, return to Step 1.
Step 4. If s <= 1, set x = v1*(-2*log(s)/s)^(1/2) and y = v2*(-2*log(s)/s)^(1/2).
Here is my approach to implement this algorithm in R:
PolarMethod1 <- function(N)
{
  x <- numeric(N)
  y <- numeric(N)
  z <- numeric(N)
  i <- 1
  while (i <= N)
  {
    u1 <- runif(1)
    u2 <- runif(1)
    v1 <- (2 * u1) - 1
    v2 <- (2 * u2) - 1
    s <- (v1^2) + (v2^2)
    if (s <= 1)
    {
      x[i] <- ((-2 * log(s) / s)^(1/2)) * v1
      y[i] <- ((-2 * log(s) / s)^(1/2)) * v2
      z[i] <- (x[i] + y[i]) / sqrt(2) # standardization: (x + y)/sqrt(2) is again N(0,1)
      i <- i + 1
    }
    # if s > 1 the pair is rejected and the loop simply retries with fresh uniforms
  }
  return(z)
}
z <- PolarMethod1(10000)
hist(z, freq = FALSE, nclass = 10, ylab = "Density", col = "purple", xlab = "z values")
curve(dnorm(x), from = -3, to = 3, add = TRUE)
The code, fortunately, runs without errors and works quite well with N = 1000, but when I change to N = 10000, instead of approximating the curve better, the histogram fits it visibly worse (plot for N = 10000 shown here, contrasted with the plot for N = 1000).
Why is that?
Is there something wrong with my code? The approximation is supposed to improve as N increases.
Note: I added z to the code to combine both variables in the output.
Why is there a difference between 1,000 and 10,000 runs?
When you run 1,000 simulations the z values usually range from about -3.2 to 3.2. But if you increase the runs to 10,000 you will obtain more extreme values: z will range from roughly -4 to 4.
The histogram is binning the z results into 10 bins. A larger range in z results in wider bins, and wider bins usually fit the probability density worse.
Your bin width for 1,000 runs is approximately 0.5, but for 10,000 runs it is 1.
You ask for 10 bins when you draw the histogram, but that's only a suggestion. You actually got 8, because there is no way to divide the range from -4 to 4 into 10 bins that ends on nice round numbers, whereas 8 bins have very nice boundaries.
If you want more bins, then don't specify nclass. The default gave me 20 bins. Or specify breaks = "Scott", which uses a different rule to select bins. I saw about 80 bins using this option.
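For example, both alternatives can be tried directly on the z vector from above:
hist(z, freq = FALSE, ylab = "Density", col = "purple", xlab = "z values") # default breaks, about 20 bins
hist(z, freq = FALSE, breaks = "Scott", ylab = "Density", col = "purple", xlab = "z values") # Scott's rule, many more bins
curve(dnorm(x), from = -4, to = 4, add = TRUE)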
Related
The logistic map (a map is a function that takes its value at any time step to its value at the next time step) is a model that has its roots in the prediction of animal population sizes. It has become famous, in part, due to special cases of its parameterization that exhibit surprising chaotic behavior. The logistic map equation is
x_{i+1} = r * x_i * (1 - x_i)
where x_i ∈ [0,1] is the ratio of the current population size to the maximum possible size at time i, x_{i+1} is that ratio at the next generation, and r is the driving rate, representing animal reproduction and death. For r < 3.5 the population eventually reaches a stable size or oscillates between a set of fixed values. However, if r > 3.5 then the system destabilizes and exhibits chaotic behavior!
That is background or context for the following problem statement:
Generate a set of points S = {r, x} where, for each r ∈ [1.0, 4.1] in increments of 0.001025, there will be a sequence of x_i values for i = 0,...,16. So, for each r value there will be 17 x_i values. Use x_0 = 0.01. Depending on your implementation, you may find the rbind function useful. It may take a few seconds for the code to run since it will generate a lot of points in S. No more than 10 lines of R code.
Admittedly, this is a lab assignment; however, I am not a student in the class. I am learning R, and I am trying to work through the online assignments and come up with a solution myself. I have tried to create the set of points to plot, and based on manual verification of a few points, the set looks accurate.
binded <- NULL
for (j in 0:3024) {
  x <- numeric(17)          # one trajectory of 17 values
  x[1] <- 0.01              # x_0
  r <- 1 + (j * 0.001025)   # sweeps r from 1.0 up to about 4.1
  for (i in 1:(17 - 1)) {
    x[i + 1] <- r * x[i] * (1 - x[i])
  }
  binded <- rbind(binded, cbind(r, x))
}
When I invoke plot(binded, pch='.'), RStudio displays the result as a straight line, so I am unsure whether I am using plot correctly, or even whether I am generating all the points correctly. If I decrease the maximum value of j to something less than 2000, a proper plot appears; it is only when j iterates all the way up to 3024 that the result is a straight line.
I believe your code is correct. What happens is that once r exceeds 4, the iterations become wildly unstable and diverge toward -infinity. This huge variation in the y values compresses the scale and makes the plot look like a flat line.
Cutting off the tail end of the matrix makes a very interesting plot:
plot(binded[-which(binded[,2]<0),], pch=".")
If you do want to plot the entire matrix, consider manually setting your y-axis limits to [0,1]. This way, the plot won't be stretched down to -1e24.
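For instance:
plot(binded, pch = ".", ylim = c(0, 1))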
As an added bonus, here's a version in a different plotting library that has points colored by i.
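A sketch of that idea with ggplot2 (one possible choice of library), assuming the binded matrix from above and dropping the diverged negative values:
library(ggplot2)
df <- data.frame(r = binded[, 1],
                 x = binded[, 2],
                 i = rep(0:16, times = 3025)) # generation index within each r
ggplot(subset(df, x >= 0), aes(r, x, colour = i)) +
  geom_point(size = 0.1)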
I want to calculate the following integral using the hit-and-miss method:
I = ∫_0^1 x^3 dx
I know how to solve it analytically, but I cannot find the right R code to calculate it, generate (for example) 100,000 random points, and then plot them like the figure shown here.
Thank you.
1. Generate two vectors of the desired length from the uniform distribution:
l <- 10000
x <- runif(l)
y <- runif(l)
2. The approximation of the integral is the proportion of cases where the (x, y) points fall below the function you want to integrate:
sum(y < x^3) / l
(The exact value is ∫_0^1 x^3 dx = 1/4, so the estimate should come out close to 0.25.)
3. For the plot, you just have to plot the points, changing their color depending on whether they are above or below the curve, and add the function with curve():
plot(x, y, col = 1 + (y < x^3))
curve(x^3, add = TRUE, col = 3)
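Putting the three steps together as a single function (the function name is just illustrative; n = 100000 matches the 100,000 points asked for in the question):
hitAndMiss <- function(n = 100000) {
  x <- runif(n)                # horizontal coordinates
  y <- runif(n)                # vertical coordinates
  estimate <- sum(y < x^3) / n # fraction of points under the curve
  plot(x, y, col = 1 + (y < x^3), pch = ".")
  curve(x^3, add = TRUE, col = 3, lwd = 2)
  estimate
}
hitAndMiss() # should return a value close to 0.25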
I am trying to plot this formula. As x approaches 0 from the right, y should approach infinity, so my curve should climb steeply near the y-axis. Instead it gets cut off at about y = 23.
my_formula <- function(x) { 7.9 * x^(-0.5) - 1.3 }
curve(my_formula, col = "red", from = 0, to = 13, xlim = c(0, 13), ylim = c(0, 50), axes = TRUE, xlab = NA, ylab = NA)
I tried to play with the from= parameter, and actually got what I needed when I put from = -4.8, but I have no idea why this works. In fact x never goes below 0, and from/to are supposed to represent the range of x values. Do they? If someone could explain this to me, that would be amazing! Thank you!
By default, curve only chooses 101 x-values within the (from, to) range, set by the default value of the n argument. In your case this means there aren't many values that are close enough to 0 to show the full behaviour of the function. Increasing the number of values that are plotted with something like n=500 helps:
curve(my_formula,col="red",from=0 ,to=13,
xlim=c(0,13),ylim=c(0,50),axes=T, xlab=NA, ylab=NA,
n=500)
This is due mainly to the fact that my_formula(0) is Inf. So plotting from = 0 to to = 13 in curve means your first two values are, by default (with 101 points, as @Marius notes):
# x
seq(0, 13, length.out=101)[1:2]
#[1] 0.00 0.13
# y
my_formula(seq(0, 13, length.out=101)[1:2])
#[1] Inf 20.61066
And R will not plot infinite values, so no line is drawn joining the first point to the second.
If you get as close to 0 on your x axis as is possible on your system, you can make this work a-okay. For instance:
curve(my_formula, col="red", xlim=c(0 + .Machine$double.eps, 13), ylim=c(0,50))
I have a bimodal asymmetric distribution which I would like to cut at both ends. The specific part of it is that I would like to calculate symmetric boundaries on the appropriate side of each 'bell'. The figure shows an extreme case of separation between bells for simplicity.
In this case the red cuts were selected by eye, and the blue lines, offset by 1500 on each side, represent an arbitrary value that could potentially be passed through a function for the trim. My goal would be to subset everything between the blue lines.
hist(p3_cut$x,50)
abline(v=c(6200,7600),col='red')
abline(v=c(6200-1500,7600+1500),col='blue')
My guess is that the problem here is basically to find the 'edges' of each curve. I cannot use the halfway point between the means; I need something that recognizes a frequency change from 0 (or a very low value) to something relatively high.
A somewhat general answer. Depending on the problem, you might need to adjust the bandwidth of the density estimate (the adjust argument of the density function):
# get the density of x and normalize it so its maximum is one
dens <- density(x, adjust = 0.1)
dens$y <- dens$y / max(dens$y)
# keep all x where the density is higher than some fraction of the max (here 1%)
min_frac <- 0.01
x_keep <- dens$x[dens$y > min_frac]
# find the position of the gap in x, and get the x just before and after the gap
gap_pos <- which.max(diff(x_keep))
left_cut <- x_keep[gap_pos]
right_cut <- x_keep[gap_pos + 1]
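The detected cuts can then be checked visually against the histogram, mirroring the abline calls from the question (assuming x holds the raw values, e.g. x <- p3_cut$x):
hist(x, 50)
abline(v = c(left_cut, right_cut), col = 'red')
abline(v = c(left_cut - 1500, right_cut + 1500), col = 'blue')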
Using this code and changing the adjust parameter of the density function, I was able to calculate almost perfect cuts, at least for this case. I am positive that this approach is flexible enough for most situations similar to this one. I show the results for the proposed cuts.
Not sure whether this should go on Cross Validated or not, but we'll see. Basically I recently obtained data from an instrument (masses of compounds from 0 to 630), which I binned into 0.025-wide bins before plotting the histogram seen below.
I want to identify the bins that are of high frequency and stand out against the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curve on top of the points that have almost blurred together into a black lump, and then selecting the bins that sit above that curve for further investigation; that's what I'm trying to do. I plotted a kernel density estimate to see if I could overlay it on top of my histogram and use it to identify points that sit above the curve. However, the density plot makes no headway with this, as the densities are too low in value (see the second plot). Does anyone have any recommendations on how I can go about solving this problem? The blue line represents the overlaid density function and the red line represents the ideal solution (I need a way of somehow automating this in R).
The data below is only part of my dataset, so it's not really a good representation of my plot (which contains about 300,000 points), and as my bin sizes are quite small (0.025), there's just a huge spread of data (in total there are 25,000 or so bins).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that holds the density for each bin, a simple way to find outliers would be:
look at a window around each bin, say ±50 bins:
current.bin <- 1
window.size <- 50
# parentheses are needed here: ':' binds tighter than '-' and '+' in R,
# and max/min keep the indices inside the vector near its ends
window <- bin.densities[max(1, current.bin - window.size):min(length(bin.densities), current.bin + window.size)]
find the 5% lower and 95% upper quantile values (or really any values you think work):
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
# final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with the window size and the quantile percentages until the results look reasonable. Again, not exactly super complex math, but hopefully it helps.
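To run the check over every bin rather than a single current.bin, the same logic can be wrapped in a loop; the sketch below is in the same untested spirit, and the function name is just illustrative:
findOutlierBins <- function(bin.densities, window.size = 50) {
  n <- length(bin.densities)
  is.outlier <- logical(n)
  for (current.bin in seq_len(n)) {
    # clip the window so the indices stay within 1..n
    window <- bin.densities[max(1, current.bin - window.size):min(n, current.bin + window.size)]
    lower.quant <- quantile(window, 0.05)
    upper.quant <- quantile(window, 0.95)
    is.outlier[current.bin] <- bin.densities[current.bin] > upper.quant ||
      bin.densities[current.bin] < lower.quant
  }
  is.outlier
}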