So I am trying to find 3 things given a certain function in the x domain when transformed into the spectral domain.
the Amplitude
The Frequency
The Phase
In R (statistical software) I have coded the following function:
y=7*cos(2*pi*(seq(-50,50,by=.01)*(1/9))+32)
fty=fft(y,inverse=F)
angle=atan2(Im(fty), Re(fty))
x=which(abs(fty)[1:(length(fty)/2)]==max(abs(fty)[1:(length(fty)/2)]))
par(mfcol=c(2,1))
plot(seq(-50,50,by=.01),y,type="l",ylab = "Cosine Function")
plot(abs(fty),xlim=c(x-30,x+30),type="l",ylab="Spectral Density in hz")
I know I can compute the frequency manually by taking the bin value and dividing it by the size of the interval(total time of the domain). Since the bins started at 1, when it should be zero, it would thus be frequency=(BinValue-1)/MaxTime, which does get me the 1/9'th I have in the function above.
I have two quick questions:
First) I am having trouble computing the phase, is there a prebuilt R function that can give me the phase? From a manual calculation, the density function peaks at 12 (see bottom graph), shouldn't the value then the value of the phase be 2*pi+angle[12] but I am getting a value of
angle[12] [1] -2.558724
which puts the phase at 2*pi+angle[12]=3.724462. But that's wrong the phase should be 32 radians. What am I doing wrong?
Second) Is there a function that can automatically convert abs(fty)[12]=34351.41 , to the amplitude number I have in front of the cosine, which is 7?
Related
I'm having trouble interpreting the results I got from relrisk. My data is a multiple point process containing two marks (two rodents species AA and RE), I want to know if they are spatially segregated or not.
> summary(REkm)
Marked planar point pattern: 46 points
Average intensity 0.08101444 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Multitype:
frequency proportion intensity
AA 15 0.326087 0.02641775
RE 31 0.673913 0.05459669
Window: rectangle = [4, 38] x [0.3, 17] units
x 16.7 units)
Window area = 567.8 square units
relkm <- relrisk(REkm)
plot(relkm, main="Relrisk default")
The bandwidth of this relrisk estimation is automatically selection by default(bw.relrisk), but when I tried other numeric number using sigma= 0.5 or 1, the results are somehow kind of weird.
How did this happened? Was it because the large proportion of blank space of my ppp?
According to chapter.14 of Spatial Point Patterns books and the previous discussion, I assume the default of relrisk will show the ratio of intensities (case divided by control, in my case: RE divided by AA), but if I set casecontrol=FALSE, I can get the spatially-varying probability of each type.
Then why the image of type RE in the Casecontrol=False looks exactly same as the relrisk estimation by default? Or they both estimate p(RE)=λRE/ λRE+λAA for each sites?
Any help will be appreciated! Thanks a lot!
That's two questions.
Why does the image for RE when casecontrol=FALSE look the same as the default output from relrisk?
The definitive source of information about spatstat functions is the online documentation in the help files. The help file for relrisk.ppp gives full details of the behaviour of this function. It says that the calculation of probabilities and risks is controlled by the argument relative. If relative=FALSE (the default), the code calculates the spatially varying probability of each type. If relative=TRUE it calculates the relative risk of each type i, defined as the ratio of the probability of type i to the probability of type c where c is the type designated as the control. If you wanted the relative risk then you should set relative=TRUE.
Very different results obtained when setting sigma=0.5 compared to the automatically selected bandwidth.
Your example output says that the window is 34 by 17 units. A smoothing bandwidth of sigma=0.5 is very small for this region. Imagine each data point being replaced by a blurry circle of radius 0.5; there would be a lot of empty space. The smoothing procedure is encountering numerical problems which are causing the funky artefacts.
You could try a range of different values of sigma, say from 1 to 15, and decide which value produces the most satisfactory result.
The plot of relrisk(REkm, casecontrol=FALSE) suggests that the automatic bandwidth selector bw.relriskppp chose a much larger value of sigma, perhaps about 10. You can investigate this by
b <- bw.relriskppp(REkm)
print(b)
plot(b)
The print command will print the chosen value of sigma that was used in the default calculation. The plot command will show the cross-validation criterion which was maximised to select the bandwidth. This gives you an idea of the range of values of sigma that are acceptable according to the automatic selector.
Read the help file for bw.relriskppp about the different options available for bandwidth selection method. Maybe a different choice of method would give you a more acceptable result from your viewpoint.
Im trying to apply the fourier phase shift theorem to a complex signal in R. However, only the magnitude of my signal shifts as I expect it. I think it should be possible to apply this theorem to complex signals, so probably I make an error somewhere. My guess is that there is an error in the frequency axis I calculate.
How do I correctly apply the fourier shift theorem to a complex signal (using R)?
i = complex(0,0,1)
t.in = (1+i)*matrix(c(1,0,0,0,0,0,0,0,0,0))
n.shift = 5
#the output of fft() has the mean / 0 frequency at the first element
#it then increases to the highest frequency, flips to negative frequencies
#and then increases again to the negative frequency closest to 0
N = length(t.in)
if (N%%2){#odd
kmin = -(N-1)/2
kmax = (N-1)/2
} else {#even
kmin = -N/2
kmax = N/2-1
#center frequency negative, is that correct?
}
#create frequency axis for fft() output, no sampling frequency or sample duration needed
k = (kmin:kmax)
kflip = floor(N/2)
k = k[c((kflip+1):N,1:kflip)]
f = 2*pi*k/N
shiftterm = exp( -i*n.shift*f )
T.in = fft(t.in)
T.out = T.in*shiftterm
t.out = fft(T.out, inverse=T)/N
par(mfrow=c(2,2))
plot(Mod(t.in),col="green");
plot(Mod(t.out), col="red");
plot(Arg(t.in),col="green");
plot(Arg(t.out),col="red");
As you can see the magnitude of the signal is nicely shifted, but the phase is scrambled. I think the negative frequencies are where my error is, but I cant see it.
What am I doing wrong?
The questions about fourier phase shift theorem I could find:
real 2d signal in python
real 2d signal in matlab
real 1d signal in python
math question about what fourier shift does
But these were not about complex signals.
Answer
As Steve suggested in the comments, I checked the phase on the 6th element.
> Arg(t.out)[6]
[1] 0.7853982
> Arg(t.in)[1]
[1] 0.7853982
So the only element that has a magnitude (at least one order of magnitude higher than the EPS) does have the phase that I expected.
TL;DR The result from the original approach in the question was already correct, we see the Gibbs Phenomenon sliding by.
Just discard low magnitude elements?
If ever the phase of elements that should be zero will be a problem I can run t.out[Mod(t.out)<epsfactor*.Machine$double.eps] = 0 where in this case epsfactor has to be 10 to get rid of the '0' magnitude elements.
Adding that line before plotting gives the following result, which is what I expected to get beforehand. However, the 'scrambled' phase might actually be accurate in most cases as I'll explain below.
The original result really was correct
Just setting low magnitude elements to 0 does not make the phase of the shifted signal more intuitive however. This is a plot where I apply a 4.5 sample shift, the phase is still 'scrambled'.
Applying fourier shift equivalent to downsmapling shifted fourier interpolation
It occurred to me that applying a non-integer number of elements phase shift is equivalent to fourier interpolating the signal and then downsample the interpolated signal at points between the original elements. Since the vector I used as input is an impulse function, the fourier interpolated signal is just not well behaved. Then the signal after applying the fourier phase shift theorem can be expected to have exactly the phase that the fourier interpolated signal has, as seen below.
Gibbs Ringing
Its just at the discontinuities where phase is not well behaved and where small rounding errors might cause large errors in the reconstructed phase. So not really related to low magnitude but to not well defined fourier transform of the input vector. This is called Gibbs Ringing, I could use low-pass filtering with a gaussian filter to decrease it.
Questions related to fourier interpolation and phase shift
symbolic approach in R to estimate fourier transform error
non integer signal shift by use of linear interpolation
downsampling complex signal
fourier interpolation application
estimating sub-sample shift between two signals using fourier transforms
estimating sub-sample shift between two signals without interpolation
I'm using the survival library. After computing the Kaplan-Meier estimator of a survival function:
km = survfit(Surv(time, flag) ~ 1)
I know how to compute percentiles:
quantile(km, probs = c(0.05,0.25,0.5,0.75,0.95))
But, how do I compute the mean survival time?
Calculate Mean Survival Time
The mean survival time will in general depend on what value is chosen for the maximum survival time. You can get the restricted mean survival time with print(km, print.rmean=TRUE). By default, this assumes that the longest survival time is equal to the longest survival time in the data. You can set this to a different value by adding an rmean argument (e.g., print(km, print.rmean=TRUE, rmean=250)).
Extract Value of Mean Survival Time and Store in an Object
In response to your comment: I initially figured one could extract the mean survival time by looking at the object returned by print(km, print.rmean=TRUE), but it turns out that print.survfit doesn't return a list object but just returns text to the console.
Instead, I looked through the code of print.survfit (you can see the code by typing getAnywhere(print.survfit) in the console) to see where the mean survival time is calculated. It turns out that a function called survmean takes care of this, but it's not an exported function, meaning R won't recognize the function when you try to run it like a "normal" function. So, to access the function, you need to run the code below (where you need to set rmean explicitly):
survival:::survmean(km, rmean=60)
You'll see that the function returns a list where the first element is a matrix with several named values, including the mean and the standard error of the mean. So, to extract, for example, the mean survival time, you would do:
survival:::survmean(km, rmean=60)[[1]]["*rmean"]
Details on How the Mean Survival Time is Calculated
The help for print.survfit provides details on the options and how the restricted mean is calculated:
?print.survfit
The mean and its variance are based on a truncated estimator. That is,
if the last observation(s) is not a death, then the survival curve
estimate does not go to zero and the mean is undefined. There are four
possible approaches to resolve this, which are selected by the rmean
option. The first is to set the upper limit to a constant,
e.g.,rmean=365. In this case the reported mean would be the expected
number of days, out of the first 365, that would be experienced by
each group. This is useful if interest focuses on a fixed period.
Other options are "none" (no estimate), "common" and "individual". The
"common" option uses the maximum time for all curves in the object as
a common upper limit for the auc calculation. For the
"individual"options the mean is computed as the area under each curve,
over the range from 0 to the maximum observed time for that curve.
Since the end point is random, values for different curves are not
comparable and the printed standard errors are an underestimate as
they do not take into account this random variation. This option is
provided mainly for backwards compatability, as this estimate was the
default (only) one in earlier releases of the code. Note that SAS (as
of version 9.3) uses the integral up to the last event time of each
individual curve; we consider this the worst of the choices and do not
provide an option for that calculation.
Using the tail formula (and since our variable is non negative) you can calculate the mean as the integral from 0 to infinity of 1-CDF, which equals the integral of the Survival function.
If we replace a parametric Survival curve with a non parametric KM estimate, the survival curve goes only until the last time point in our dataset. From there on it "assumes" that the line continues straight. So we can use the tail formula in a "restricted" manner only until some cut-off point, which we can define (default is the last time point in our dataset).
You can calculate it using the print function, or manually:
print(km, print.rmean=TRUE) # print function
sum(diff(c(0,km$time))*c(1,km$surv[1:(length(km$surv)-1)])) # manually
I add 0 in the beginning of the time vector, and 1 at the beginning of the survival vector since they're not included. I only take the survival vector up to the last point, since that is the last chunk. This basically calculates the area-under the survival curve up to the last time point in your data.
If you set up a manual cut-off point after the last point, it will simply add that area; e.g., here:
print(km, print.rmean=TRUE, rmean=4) # gives out 1.247
print(km, print.rmean=TRUE, rmean=4+2) # gives out 1.560
1.247+2*min(km$surv) # gives out 1.560
If the cut-off value is below the last, it will only calculate the area-under the KM curve up to that point.
There's no need to use the "hidden" survival:::survmean(km, rmean=60).
Use just summary(km)$table[,5:6], which gives you the RMST and its SE. The CI can be calculated using appropriate quantile of the normal distribution.
I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t exp^{r \Delta t}(1+\pi(exp((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1))$$
The only thing here which is stochastic is $\epsilon$, which is represented by a Brownian motion with N(0,1).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normal distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected from these (the normal distribution represent the shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
your $x$ are a sampled distribution of your shocks, i.e. they are $\epsilon_t=0$.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can pretty much type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.
I have a continuous univariate xts object of length 1000, which I have converted into a data.frame called x to be used by the package RHmm.
I have already chosen that there are going to be 5 states and 4 gaussian distributions in the mixed distribution.
What I'm after is the expected mean value for the next observation. How do I go about getting that?
So what I have so far is:
a transition matrix from running the HMMFit() function
a set of means and variances for each of the gaussian distributions in the mixture, along with their respective proportions, all of which was also generated form the HMMFit() function
a list of past hidden states relating to the input data when using the output of the HMMFit function and putting it into the viterbi function
How would I go about getting the next hidden state (i.e. the 1001st value) from what I've got, and then using it to get the weighted mean from the gaussian distributions.
I think I'm pretty close just not too sure what the next part is...The last state is state 5, do I use the 5th row in the transition matrix somehow to get the next state?
All I'm after is the weighted mean for what is to be expect in the next observation, so the next hidden state isn't even necessary. Do I multiply the probabilities in row 5 by each of the means, weighted to their proportion for each state? and then sum it all together?
here is the code I used.
# have used 2000 iterations to ensure convergence
a <- HMMFit(x, nStates=5, nMixt=4, dis="MIXTURE", control=list(iter=2000)
v <- viterbi(a,x)
a
v
As always any help would be greatly appreciated!
Next predicted value uses last hidden state last(v$states) to get probability weights from the transition matrix a$HMM$transMat[last(v$states),] for each state the distribution means a$HMM$distribution$mean are weighted by proportions a$HMM$distribution$proportion, then its all multiplied together and summed. So in the above case it would be as follows:
sum(a$HMM$transMat[last(v$states),] * .colSums((matrix(unlist(a$HMM$distribution$mean), nrow=4,ncol=5)) * (matrix(unlist(a$HMM$distribution$proportion), nrow=4,ncol=5)), m=4,n=5))