I want to create a set of 10 log-spaced numbers from zero to some big number M (say M = 60,000) in R.
First, I tried lseq() from the emdbook package. The problem with lseq(), however, is that it cannot handle 0 as a starting point, because it tries to calculate log(0) and fails.
Next, I tried to use logspace() from the pracma package in the following way:
Numbers <- log(logspace(0,M,10),base=10)
This works fine for values of M up to about 340. Beyond that, most of the numbers in the set become Inf because the intermediate exponentials overflow.
Is there any other way in R to create a set of log-spaced numbers from zero to some big number M that can handle zero as a starting point and does not turn most of the numbers into Inf?
Correct me if I am wrong, but can't you just calculate the logspaces for lower numbers and then multiply? They should be linearly related, right? Just look at this output:
library(pracma)
> log(logspace(0,60, 10), base = 10)[1:5]
[1] 0.000000000000000 6.666666666666667 13.333333333333334 20.000000000000000 26.666666666666668
> log(logspace(0,600, 10), base = 10)[1:5]
[1] 0.000000000000000 66.666666666666671 133.333333333333343 200.000000000000000 266.666666666666686
> x1 <- (log(logspace(0,600, 10), base = 10)*100)[2]
> x1
[1] 6666.666666666667
> x2 <- seq(0 , 9, 1)*x1
> x2
[1] 0.000000000000 6666.666666666667 13333.333333333334 20000.000000000000 26666.666666666668
[6] 33333.333333333336 40000.000000000000 46666.666666666672 53333.333333333336 60000.000000000000
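To see why this works (a quick check of my own, assuming pracma is installed): log(logspace(a, b, n), base = 10) is just a linear sequence from a to b, so the scaled result matches seq() directly:
library(pracma)
M <- 60000
x1 <- (log(logspace(0, 600, 10), base = 10) * 100)[2]
x2 <- seq(0, 9) * x1
all.equal(x2, seq(0, M, length.out = 10))  # TRUE: the scaled values run from 0 to 60,000 in equal steps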
I'm pretty new to R, so apologies in advance if this question is poorly constructed. Basically, I have a piecewise function whose value I need to calculate for a large number of rows. My current function looks something like this:
f <- function(x) {
  (x <= 1000) * x^2 +
  (x > 1000 & x <= 2000) * x^3 +
  (x > 2000 & x <= 3000) * x^4 +
  (x > 4000) * x^5
}
However, I need to be able to create or generalize this function for a variety of different sets of breakpoints (e.g. 1500, 2500, 3500, etc.) and for different numbers of breakpoints. Also, given the large number of rows to be processed, the function has to be vectorized. Any advice?
Edit:
To clarify, I built the function above from a table of breakpoints (1000, 2000, 3000, 4000) and associated powers to raise x to (2, 3, 4, 5). However, I need to be able to take multiple such tables, each with varying breakpoints and numbers of breakpoints (potentially 100 or so), and apply the resulting piecewise function to a large number of rows.
A vectorised version of your function with additional breaks and power arguments can be written this way:
f <- function(x, breaks, power) {
  x^power[as.numeric(cut(x, breaks))]
}
as.numeric(cut(...)) gets the position of each x value among the breaks; the square bracket then looks up the corresponding entry of the power vector and raises that x to the correct power. Tests:
Some breakpoints and powers:
> bp <- c(10,20,30,40)
> po = c(2,3,4)
Note that values outside the breakpoints give NA, and cut's intervals are open on the left by default, so the lowest breakpoint itself is excluded:
> f(9,bp,po)
[1] NA
> f(10,bp,po)
[1] NA
So the first valid x has to be above 10:
> f(11,bp,po)
[1] 121
And gets us 11^2 as expected. So 20 gets squared and 21 gets cubed:
> f(20,bp,po)
[1] 400
> f(21,bp,po)
[1] 9261
Good so far. Vectorised?
> f(19:22, bp, po)
[1] 361 400 9261 10648
Yes - the change from square to cube happens between 20 and 21.
See the help for cut (its right argument) if you want the intervals to be closed on the left rather than the right.
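As a hedged sketch of how this could look with breakpoints on the scale of the original question (the -Inf/Inf endpoints are my own addition so that every x falls into some interval):
f <- function(x, breaks, power) {
  x^power[as.numeric(cut(x, breaks))]
}
bp <- c(-Inf, 1000, 2000, 3000, Inf)  # four intervals
po <- c(2, 3, 4, 5)                   # one power per interval
f(c(500, 1500, 2500, 3500), bp, po)   # 500^2, 1500^3, 2500^4, 3500^5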
From what I understand of your example code, you basically want to minimize the coding and also want it to be dynamic, so that you can vary the breaks and powers.
Below is sample code that tries to do that.
f <- function(x, breakPoints, powerX) {
  cutX <- cut(x, breaks = breakPoints)         # which interval each x falls in
  cutX1 <- factor(cutX, labels = powerX)       # relabel the intervals with their powers
  retX <- x ^ as.numeric(as.character(cutX1))  # raise each x to its interval's power
  retX
}
x1 <- sample(1:10000, 1000)
x1 <- x1[order(x1)]
breakPoints1 <- c(min(x1)-1, 1000, 2000, 3000, max(x1))
powerX1 <- c(2, 3, 4, 5)
newX1 <- f(x1, breakPoints1, powerX1)
head(newX1) # manual check whether the values make sense
head(x1)
This code will do that.
But my suggestion is to test this code as much as possible so that you can use it reliably. Hope it is useful to you.
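As one quick spot check of my own, the result agrees with looking up the powers by interval number directly:
all(newX1 == x1 ^ powerX1[as.numeric(cut(x1, breakPoints1))])  # TRUE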
Can someone explain the results of a typical dt() call? The help page says that it should return the density function. However, in my code below, what does the first value, 0.2067, represent? The second value?
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
Two things were confused here:
dt gives you the density; this is why it decreases for large numbers:
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
pt gives the distribution function. This is the probability of being smaller than or equal to x.
This is why the values go to 1 as x increases:
pt(x, df=3)
[1] 0.8044989 0.9303370 0.9711656 0.9859958 0.9923038 0.9953636 0.9970069 0.9979617 0.9985521 0.9989358
A "probability density" is not really a true probability, since probabilities are bounded in [0,1] while densities are not. The integral of densities across their domain is normalized to exactly 1. So densities are really the first derivatives of the probability function. This code may help:
plot(x = seq(-10, 10, length = 100),
     y = dt(seq(-10, 10, length = 100), df = 3))
The value of 0.207 for dt at x=1 says that at x=1 the cumulative probability is increasing at a rate of 0.207 per unit increase in x. (And since the t-distribution is symmetric, that is also the value of dt with 3 df at x=-1.)
A bit of coding to instantiate the dt(x,df=3) function (see ?dt) and then integrate it:
> dt3 <- function(x) { gamma((4)/2)/(sqrt(3*pi)*gamma(3/2))*(1+x^2/3)^-((3+1)/2) }
> dt3(1)
[1] 0.2067483
> integrate(dt3, -Inf, Inf)
1 with absolute error < 7.2e-08
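A further check of my own that the density is the derivative of the distribution function: integrating dt3 up to a point recovers the corresponding pt value.
integrate(dt3, -Inf, 1)$value  # ~0.8044989, matching the first pt value above
pt(1, df = 3)                  # 0.8044989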
I am calculating z-scores to see if a value is far from the mean/median of the distribution.
I had originally done it using the mean and then turned these into two-sided p-values. But now, using the median, I noticed that there are some NAs in the p-values.
I determined this occurs for values that are very far from the median, and it looks to be related to the pnorm calculation.
"
'qnorm' is based on Wichura's algorithm AS 241 which provides
precise results up to about 16 digits. "
Does anyone know a way around this? I would like the very small p-values.
Thanks,
> z<- -12.5
> 2-2*pnorm(abs(z))
[1] 0
> z<- -10
> 2-2*pnorm(abs(z))
[1] 0
> z<- -8
> 2-2*pnorm(abs(z))
[1] 1.332268e-15
As an intermediate step, you are actually calculating p-values very close to 1:
options(digits=22)
z <- c(-12.5,-10,-8)
pnorm(abs(z))
# [1] 1.0000000000000000000000 1.0000000000000000000000 0.9999999999999993338662
2-2*pnorm(abs(z))
# [1] 0.000000000000000000000e+00 0.000000000000000000000e+00 1.332267629550187848508e-15
I think you will be better off using the low p-values (close to zero) but I am not good enough at math to know whether the error at close-to-one p-values is in the AS241 algorithm or the floating point storage. Look how nicely the low values show up:
pnorm(z)
# [1] 3.732564298877713761239e-36 7.619853024160526919908e-24 6.220960574271784860433e-16
Keep in mind that 1 - pnorm(x) is equivalent to pnorm(-x). So 2 - 2*pnorm(abs(x)) is equivalent to 2*(1 - pnorm(abs(x))), which is equivalent to 2*pnorm(-abs(x)), so just go with:
2 * pnorm(-abs(z))
# [1] 7.465128597755427522478e-36 1.523970604832105383982e-23 1.244192114854356972087e-15
which should get more precisely what you are looking for.
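Note (my own observation) that even where the original formula is nonzero, it is limited by the spacing of doubles near 1, while the direct tail keeps full precision; compare the z = -8 values above:
z <- -8
2 - 2 * pnorm(abs(z))  # 1.332268e-15, rounded at the resolution available near 1
2 * pnorm(-abs(z))     # 1.244192e-15, full double precision in the tail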
One thought: you would need an exp() with higher precision to convert back, but you might be able to use log(p) to get slightly more precision in the tails; otherwise you are effectively at 0 for the non-log p-values in terms of the range that can be calculated:
> z<- -12.5
> pnorm(abs(z),log.p=T)
[1] -7.619853e-24
Converting back to the p value doesn't work well, but you could compare on log(p)...
> exp(pnorm(abs(z),log.p=T))
[1] 1
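Combining this with the -abs(z) trick from the previous answer (a sketch of my own): the two-sided log p-value stays finite and comparable even when the ordinary p-value underflows:
z <- c(-8, -10, -12.5, -40)
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)  # log of 2 * pnorm(-abs(z))
log_p       # finite log p-values, directly comparable across z
exp(log_p)  # only the z = -40 entry underflows to 0 on the ordinary scale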
pnorm is the function that gives the probability P(X <= x) for a given x. If you do not specify more arguments, the default distribution is the normal with mean 0 and standard deviation 1.
By symmetry, pnorm(a) = 1 - pnorm(-a).
Because of floating-point rounding, probabilities very close to 1 get rounded to exactly 1, while probabilities very close to 0 can still be represented accurately. So, using this formula with negative arguments, you can calculate the values you need.
> pnorm(0.25)
[1] 0.5987063
> 1-pnorm(-0.25)
[1] 0.5987063
> pnorm(20)
[1] 1
> pnorm(-20)
[1] 2.753624e-89
Assume I have two different parameter vectors used as inputs to a function, and that the two vectors are of different lengths. Is there a way to output all the possible values of that function? I know that if I pass two vectors of different lengths, the shorter one is just recycled, so that doesn't work. I can solve this "manually", as you can see below, but I'd like a more efficient way of calculating all the possible combinations. An example follows:
Assume I'm using the dbinom() function that has as inputs x (number of "successes" from the sample), n (number of observations in the sample), and p (the probability of "success" for each x). n stays constant at 20; however, my x varies from 7,8,9,...,20 ("x7" below) or 0,1 ("x1" below). Also, I want to evaluate dbinom() at different values of p, specifically from 0 to 1 in .1 increments ("p" below). As you can see the three parameter vectors x7, x1, and p are all of different lengths 14, 2, and 11, respectively.
> p<-seq(from=0,to=1,by=.1)
> x7<-seq.int(7,20)
> x1<-c(0,1)
> n<-20
I can evaluate each combination by using the x vectors (x7 and x1) in dbinom() and then selecting a value for the remaining parameter p. As you can see below, I used the vectors x7 and x1 and then "manually" changed p to equal 0, .1, .2, ..., 1.
> sum(dbinom(x7,n,.1))
[1] 0.002386089
> sum(dbinom(x7,n,.1))+sum(dbinom(x1,n,.1))
[1] 0.3941331
> sum(dbinom(x7,n,.2))+sum(dbinom(x1,n,.2))
[1] 0.1558678
> sum(dbinom(x7,n,.3))+sum(dbinom(x1,n,.3))
[1] 0.3996274
> sum(dbinom(x7,n,.4))+sum(dbinom(x1,n,.4))
[1] 0.7505134
> sum(dbinom(x7,n,.5))+sum(dbinom(x1,n,.5))
[1] 0.9423609
> sum(dbinom(x7,n,.6))+sum(dbinom(x1,n,.6))
[1] 0.9935345
> sum(dbinom(x7,n,.7))+sum(dbinom(x1,n,.7))
[1] 0.999739
> sum(dbinom(x7,n,.8))+sum(dbinom(x1,n,.8))
[1] 0.9999982
> sum(dbinom(x7,n,.9))+sum(dbinom(x1,n,.9))
[1] 1
> sum(dbinom(x7,n,1))+sum(dbinom(x1,n,1))
[1] 1
Basically, I want to know if there is a way to get R to print all the sums above from 0.3941331,0.1558678,...,1 with a single line of input or some other more efficient way of varying the parameter p without simply copying and changing p on each line.
*I'm new to Stackoverflow, so I apologize in advance if I have not formulated my question conventionally.
You are using dbinom with a range of x values and then summing. Instead, use pbinom, which calculates the probability P(X <= q) (or P(X > q) if lower.tail = FALSE).
Thus you can calculate P(X > 6) + P(X <= 1) (which is what you appear to want to calculate):
pbinom(q = 6, prob = p, size = n, lower.tail = FALSE) + pbinom(q = 1, prob = p, size = n)
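A minimal usage sketch under the question's setup (q = 6 and q = 1 are the cut points implied by x7 = 7:20 and x1 = c(0, 1)); it should reproduce the sums computed by hand above:
p <- seq(0, 1, by = 0.1)
n <- 20
pbinom(q = 6, prob = p, size = n, lower.tail = FALSE) + pbinom(q = 1, prob = p, size = n)
# e.g. ~0.3941331 at p = 0.1 and ~0.9423609 at p = 0.5, matching the manual results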
How can I get more significant digits in R? Specifically, I have the following example:
> dpois(50, lambda= 5)
[1] 1.967673e-32
However when I get the p-value:
> 1-ppois(50, lambda= 5)
[1] 0
Obviously, the p-value is not 0. In fact, it should be greater than 1.967673e-32, since I'm summing a bunch of probabilities. How do I get the extra precision?
Use lower.tail=FALSE:
ppois(50, lambda= 5, lower.tail=FALSE)
## [1] 2.133862e-33
Asking R to compute the upper tail is much more accurate than computing the lower tail and subtracting it from 1: given the inherent limitations of floating-point precision, R can't distinguish 1 - eps from 1 for values of eps less than .Machine$double.neg.eps, typically around 1e-16 (see ?.Machine).
This issue is discussed in ?ppois:
Setting ‘lower.tail = FALSE’ allows to get much more precise
results when the default, ‘lower.tail = TRUE’ would return 1, see
the example below.
Note also that your comment about the value needing to be greater than dpois(50, lambda=5) is not quite right; ppois(x,...,lower.tail=FALSE) gives the probability that the random variable is greater than x, as you can see (for example) by seeing that ppois(0,...,lower.tail=FALSE) is not exactly 1, or:
dpois(50,lambda=5) + ppois(50,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
ppois(49,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
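A small illustration of my own of the floating-point limit mentioned above:
.Machine$double.neg.eps                    # ~1.1e-16: roughly the smallest eps for which 1 - eps != 1
1 - 1e-17 == 1                             # TRUE: the subtraction has no effect at all
ppois(50, lambda = 5, lower.tail = FALSE)  # 2.133862e-33, the tail computed directly
1 - ppois(50, lambda = 5)                  # 0: ppois(50, lambda = 5) has already been rounded to exactly 1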