How to calculate amplitude from spectrum() in R

I have a signal and I need to get the actual magnitude of a frequency component found with spectrum(). Consider the following signal:
f <- 5
n <- 500
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
S.signal <- spectrum(signal, log="no")
Using spectrum() I get a periodogram with a single sharp peak at the signal frequency. I can verify the amplitude of the peak using:
> max(S.signal$spec)
[1] 16698.45
How can I convert this value, 16698.45, to the actual magnitude of the signal at that frequency, which is 4, or at least something close to it?

There is no relation between the amplitude of your signal and the amplitude of your spectrum here. The Fourier transform of a sine is a delta function at the corresponding frequency, that is, an infinitely narrow peak with infinite amplitude.
The fact that you find a finite value for the amplitude of your spectrum is due to the sampling of your signal, which causes a loss of information. You can see that here:
f <- 5
n <- 1000
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
S.signal <- spectrum(signal, log="no")
max(S.signal$spec)
[1] 25261.03
You have better sampling, so you get a value closer to the real value of the spectrum (which is infinity here).
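If what you are after is the amplitude of the sinusoidal component rather than the spectral density, you can read it off a properly scaled FFT instead; a minimal sketch, assuming the signal from the question:
f <- 5
n <- 500
signal <- 4*sin(2*pi*f*seq(0, 10, 1/n))
# two-sided FFT coefficients scaled to per-component amplitudes
amp <- 2 * Mod(fft(signal)) / length(signal)
max(amp)  # approximately 4, no matter the sampling rate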

A late answer, but in case it helps others. As previous answers state, it is not a question of how to convert the spectral density to an amplitude, but rather, having found a signal in our density spectrum, how to extract the amplitude at the dominant frequency. I found the custom function proposed in this post useful.
An example implementing it with original poster's example:
power_spec <- function(y, samp.freq, ...){
  N <- length(y)
  fk <- fft(y)
  fk <- fk[2:(length(fk)/2 + 1)]  # positive-frequency half, DC bin dropped
  fk <- 2 * fk / N                # scale so Mod() gives component amplitudes
  freq <- (1:length(fk)) * samp.freq / (2 * length(fk))
  data.frame(amplitude = Mod(fk), freq = freq)
}
f <- 5
n <- 500
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
x <- power_spec(signal, samp.freq = n)  # samples are 1/n s apart, so fs = n
plot(x$amplitude ~ x$freq, type = 'l', xlim = c(0, 10))
We find a peak with an amplitude of 4 at f = 5.
Please up-vote the original post where this custom function came from if it helps you too!

If your signal is really what you show in your code, a sin() function, then you should get an impulse/peak at only one location, and everywhere else it should simply be zero.

Related

Simulating a process n times in R

I've written an R script (sourced from here) simulating the path of a geometric Brownian motion of a stock price, and I need the simulation to run 1000 times so that I generate 1000 paths of the process U_t = S_t * e^(-mu*t), by discretizing the law of motion derived from U_t, which is the bottom line of the solution to the question posted here.
The process has n = 252 steps with discretization step 1/252, volatility sigma = 0.4 and instantaneous drift mu, which I've treated as zero, although I'm not sure about this. I can generate one single path, but I'm struggling to simulate all 1000 of them: I'm unsure which variables I need to change, or whether there's an issue in my for loop that's restricting me from generating all 1000 paths. Could it also be that the script is simulating each individual point for 252 realizations instead of simulating the full process, and if so, would this restrict me from generating all 1000 paths? Is it also possible that the array U hasn't been correctly generated by me? U[0] must equal 1, and so too must the first realization U(1) = 1. The code is below; I'm pretty stuck trying to figure this out, so any help is appreciated.
#Simulating Geometric Brownian motion (GMB)
tau <- 1 #time to expiry
N <- 253 #number of sub intervals
dt <- tau/N #length of each time sub interval
time <- seq(from=0, to=N, by=dt) #time moments in which we simulate the process
length(time) #it should be N+1
mu <- 0 #GBM parameter 1
sigma <- 0.4 #GBM parameter 2
s0 <- 1 #GBM parameter 3
#simulate Geometric Brownian motion path
dwt <- rnorm(N, mean = 0, sd = 1) #standard normal sample of N elements
dW <- dwt*sqrt(dt) #Brownian motion increments
W <- c(0, cumsum(dW)) #Brownian motion at each time instant N+1 elements
#Define U Array and set initial values of U
U <- array(0, c(N,1)) #array of U
U[0] = 1
U[1] <- s0 #first element of U is s0. with the for loop we find the other N elements
for(i in 2:length(U)){
U[i] <- (U[1]*exp(mu - 0.5*sigma^2*i*dt + sigma*W[i-1]))*exp(-mu*i)
}
#Plot
plot(ts(U), main = expression(paste("Simulation of Ut")))
This question is quite difficult to answer since there are a lot of unclear things, at least to me.
To begin with, length(time) is equal to 64010, not N + 1, which would be 254.
If I understand correctly, the Brownian motion function returns the position in one dimension at a given time. Hence, to calculate this position for each time, the following can be enough:
s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time), 0, sqrt(time)))  # W_t ~ N(0, t)
However, this calculates 64010 points, not 253. If you replicate it 1000 times, it gives 64010000 points, which is quite a lot.
> B <- 1000
> res <- replicate(B, {
+ s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time), 0, sqrt(time)))
+ })
> length(res)
[1] 64010000
> dim(res)
[1] 64010 1000
I know I'm missing the second part, the one explained here, but I actually don't fully understand what you need there. If you can write out the formula, maybe I can help you.
In general, avoid using for loops in R to iterate over vectors. R is a vectorized language, and there is usually no need for them. If you want to run the same code B times, the replicate(B, { your code }) function is your friend.
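For instance, a minimal sketch of 1000 full paths of U_t = S_t * e^(-mu*t), assuming mu = 0, sigma = 0.4, s0 = 1 and N = 252 steps as in the question; unlike the one-liner above, the cumsum() makes each column a genuine path:
tau <- 1; N <- 252; dt <- tau / N
mu <- 0; sigma <- 0.4; s0 <- 1
t.grid <- seq(dt, tau, by = dt)
B <- 1000
paths <- replicate(B, {
  W <- cumsum(rnorm(N, mean = 0, sd = sqrt(dt)))            # Brownian path W_t
  S <- s0 * exp((mu - 0.5 * sigma^2) * t.grid + sigma * W)  # GBM S_t
  c(s0, S) * exp(-mu * c(0, t.grid))                        # U_t = S_t * e^(-mu*t)
})
dim(paths)                                      # (N + 1) x B, one column per path
matplot(paths[, 1:10], type = "l", ylab = "U")  # a few of the 1000 paths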

Is there a way to replicate Excel's goal seek in R involving sumif

I have what I think is a fairly simple ask but I don't seem to be finding a solution for it.
Here's a reproducible example
Cost <- c(100,200,300,400,500)
Savings <- c(10,20,30,40,50)
Fees <- c(20,20,20,20,20)
Data <- data.frame(Cost,Savings,Fees)
ROI <- function(x) {
  sum(subset(Data, Cost >= x)$Savings) / sum(subset(Data, Cost >= x)$Fees)
}
Basically I want to find the value of Cost (x) which will make the ROI equal to 2.
In Excel what I do is
Assuming you have Cost, Savings, and Fees in columns A, B, and C respectively.
In cell E2, I wrote the function below
SUMIF($A$2:$A$6,H2,$B$2:$B$6)/SUMIF($A$2:$A$6,H2,$C$2:$C$6)
Where H2 has the following function: CONCATENATE(">",G2)
G2 is the parameter of interest here.
I then go to Data > What-If Analysis > Goal Seek
Set Cell E2 to Value 2 by changing G2
Is there a way to replicate Excel's steps in R?
Thank you!
P.S. In R, if I call
ROI(201) or ROI(299)
I get 2.
What I want to find is the 201.
You could try the uniroot() function to find the value x where your ROI function minus 2 equals 0.
Cost <- c(100,200,300,400,500)
Savings <- c(10,20,30,40,50)
Fees <- c(20,20,20,20,20)
Data <- data.frame(Cost,Savings,Fees)
ROI <- function(x, targetROI) {
  y <- sum(subset(Data, Cost >= x)$Savings) / sum(subset(Data, Cost >= x)$Fees)
  y - targetROI
}
uniroot(ROI, c(0,500), targetROI=2)
Note: since there is a step change in your function, there are multiple roots of this function, from x = 201 to x = 300.
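Because ROI() is a step function of x, every x between two consecutive Cost values gives the same ratio, so you can also scan the candidate thresholds directly; a small sketch, assuming the Data frame defined above:
candidates <- sort(unique(Data$Cost))
ratios <- sapply(candidates, function(x)
  sum(Data$Savings[Data$Cost >= x]) / sum(Data$Fees[Data$Cost >= x]))
data.frame(threshold = candidates, ROI = ratios)
The ratio reaches 2 at the 300 row, and since every x in (200, 300] selects the same subset of rows, all of those x are roots; uniroot() simply returns one point from that interval.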

generating random x and y coordinates with a minimum distance

Is there a way in R to generate random coordinates with a minimum distance between them?
E.g. what I'd like to avoid
x <- c(0,3.9,4.1,8)
y <- c(1,4.1,3.9,7)
plot(x~y)
This is a classical problem from stochastic geometry. Completely random points in space where the number of points falling in disjoint regions are independent of each other corresponds to a homogeneous Poisson point process (in this case in R^2, but could be in almost any space).
An important feature is that the total number of points has to be random before you can have independence of the counts of points in disjoint regions.
For the Poisson process points can be arbitrarily close together. If you define a process by sampling the Poisson process until you don't have any points that are too close together you have the so-called Gibbs Hardcore process. This has been studied a lot in the literature and there are different ways to simulate it. The R package spatstat has functions to do this. rHardcore is a perfect sampler, but if you want a high intensity of points and a big hard core distance it may not terminate in finite time... The distribution can be obtained as the limit of a Markov chain and rmh.default lets you run a Markov chain with a given Gibbs model as its invariant distribution. This finishes in finite time but only gives a realisation of an approximate distribution.
In rmh.default you can also simulate conditional on a fixed number of points. Note that when you sample in a finite box there is of course an upper limit to how many points you can fit with a given hard core radius, and the closer you are to this limit the more problematic it becomes to sample correctly from the distribution.
Example:
library(spatstat)
beta <- 100; R <- 0.1
win <- square(1) # Unit square for simulation
X1 <- rHardcore(beta, R, W = win) # Exact sampling -- beware it may run forever for some par.!
plot(X1, main = paste("Exact sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X1) # Observed min. nearest neighbour dist.
#> [1] 0.102402
Approximate simulation
model <- rmhmodel(cif="hardcore", par = list(beta=beta, hc=R), w = win)
X2 <- rmh(model)
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X2, main = paste("Approx. sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X2) # Observed min. nearest neighbour dist.
#> [1] 0.1005433
Approximate simulation conditional on number of points
X3 <- rmh(model, control = rmhcontrol(p=1), start = list(n.start = 42))
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X3, main = paste("Approx. sim. given n =", 42))
minnndist(X3) # Observed min. nearest neighbour dist.
#> [1] 0.1018068
OK, how about this? You just generate random number pairs without restriction and then remove the ones which are too close. This could be a great start:
minimumDistancePairs <- function(x, y, minDistance){
  i <- 1
  repeat{
    # Euclidean distances from point i to all points (Pythagorean theorem)
    tooClose <- sqrt((x - x[i])^2 + (y - y[i])^2) < minDistance
    tooClose[i] <- FALSE       # the distance to oneself is always zero
    if (any(tooClose)) {       # if too close to any other point
      x <- x[-i]               # remove element from x
      y <- y[-i]               # and remove element from y
    } else {                   # otherwise...
      i <- i + 1               # continue with the next element
    }
    if (i > length(x)) break
  }
  data.frame(x, y)
}
minimumDistancePairs(
c(0,3.9,4.1,8)
, c(1,4.1,3.9,7)
, 1
)
will lead to
x y
1 0.0 1.0
2 4.1 3.9
3 8.0 7.0
Be aware, though, of the fact that these are not truly random numbers anymore (however you solve the problem).
You can use rejection sampling: https://en.wikipedia.org/wiki/Rejection_sampling
The principle is simple: you resample until your data satisfy the condition. Note that in the transcript below the pair is resampled while the squared distance is greater than 1; to enforce a minimum distance you would reverse the inequality, as shown after the transcript.
> set.seed(1)
>
> x <- rnorm(2)
> y <- rnorm(2)
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 6.565578
> while((x[1]-x[2])^2+(y[1]-y[2])^2 > 1) {
+ x <- rnorm(2)
+ y <- rnorm(2)
+ }
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 0.9733252
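For the minimum-distance problem itself the condition is flipped, so the pair is resampled until the points are at least distance 1 apart; the same sketch with the reversed inequality:
set.seed(1)
x <- rnorm(2)
y <- rnorm(2)
# resample while the two points are too close together
while ((x[1]-x[2])^2 + (y[1]-y[2])^2 < 1) {
  x <- rnorm(2)
  y <- rnorm(2)
}
(x[1]-x[2])^2 + (y[1]-y[2])^2  # guaranteed to be at least 1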
The following is a naive hit-and-miss approach which for some choices of parameters (which were left unspecified in the question) works well. If performance becomes an issue, you could experiment with the package gpuR which has a GPU-accelerated distance matrix calculation.
rand.separated <- function(n, x0, x1, y0, y1, d, trials = 1000){
  for(i in 1:trials){
    nums <- cbind(runif(n, x0, x1), runif(n, y0, y1))
    if(min(dist(nums)) >= d) return(nums)
  }
  return(NA) # no luck
}
This repeatedly draws samples of size n in [x0,x1] x [y0,y1] and throws the sample away if it doesn't satisfy the minimum-distance constraint. As a safety, trials guards against an infinite loop. If solutions are hard to find or n is large, you might need to increase trials.
For example:
> set.seed(2018)
> nums <- rand.separated(25,0,10,0,10,0.2)
> plot(nums)
runs almost instantly and produces a scatter plot of 25 points, no two of which are closer than 0.2.
I'm not sure what you are asking. If you want random coordinates within the bounds given by x and y:
c(
  runif(1, min = x[1], max = y[1]),
  runif(1, min = x[2], max = y[2]),
  runif(1, min = y[3], max = x[3]),
  runif(1, min = y[4], max = x[4])
)

Bandpass filter in R using fft

I have a time series z with sampling frequency fs = 12 (monthly data), and I would like to apply a bandpass filter using the fft, keeping periods between 10 and 15 months. This is how I would proceed:
y <- as.data.frame(fft(z))
y$freq <- ..
y$y <- ifelse(y$freq>= 1/10 & y$freq<= 1/15,y$y,0)
zz <- fft(y$y, inverse = TRUE)/length(z)
plot zz in the time domain...
However, I don't know how to derive the frequencies of the fft and I don't know how to plot zz in the time domain. Can someone help me?
I have a function that wraps fft() a bit:
function(y, samp.freq, ...){
  N <- length(y)
  fk <- fft(y)
  fk <- fk[2:(length(fk)/2 + 1)]  # positive-frequency half, DC bin dropped
  fk <- 2 * fk / N                # scale so Mod() gives component amplitudes
  freq <- (1:length(fk)) * samp.freq / (2 * length(fk))
  return(data.frame(fur = fk, freq = freq))
}
y holds the values of your signal, and samp.freq is its sampling frequency. The output is a data.frame with two columns: fur contains the complex numbers obtained from the fast Fourier transform (Mod(fur) gives the amplitude, Arg(fur) the phase), and freq is the vector of corresponding frequencies.
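With the frequencies in hand, here is a hedged sketch of the zero-out-and-invert approach described in the question, using the built-in co2 series as a stand-in for your monthly z:
z  <- as.numeric(co2)                 # stand-in for your monthly series z
fs <- 12                              # 12 samples per year
N  <- length(z)
zk <- fft(z)
freq <- (0:(N-1)) * fs / N            # frequency of each FFT bin, cycles/year
freq <- pmin(freq, fs - freq)         # fold bins above the Nyquist frequency
zk[freq < 12/15 | freq > 12/10] <- 0  # keep periods of 10 to 15 months only
zz <- Re(fft(zk, inverse = TRUE)) / N # back to the time domain
plot(zz, type = "l")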
But for frequency filtering I highly recommend using the signal package.
For example using Butterworth filter:
library('signal')
bf <- butter(2, c(low, high), type = "pass")
signal.filtered <- filtfilt(bf, signal.noisy)
In this case the interval should be defined as c(Low.freq, High.freq) * (2/samp.freq), where Low.freq and High.freq are the borders of the frequency interval. More information can be found in the package documentation and the Octave reference guide.
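Filling in the placeholders for the monthly example (assumed values; co2 again stands in for z):
library(signal)
samp.freq <- 12                       # samples per year (monthly data)
Low.freq  <- 12/15                    # cycles/year for a 15-month period
High.freq <- 12/10                    # cycles/year for a 10-month period
bf <- butter(2, c(Low.freq, High.freq) * (2/samp.freq), type = "pass")
z  <- as.numeric(co2)                 # stand-in for your monthly series z
signal.filtered <- filtfilt(bf, z)
plot(signal.filtered, type = "l")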
Also, notice that with fft you can only get frequencies up to (sampling frequency)/2.

What is the formula to calculate the Gini with sample weights?

I need your help to explain how I can obtain the same result as this function does:
gini(x, weights=rep(1,length=length(x)))
http://cran.r-project.org/web/packages/reldist/reldist.pdf --> page 2. Gini
Let's say we need to measure the income of a population N. To do that, we can divide the population N into K subgroups, and from each subgroup k we take n_k individuals and ask for their income. As a result, we get the individual incomes, and each individual has a particular sample weight representing their contribution to the population N. Here is an example that I simply took from the previous link; the dataset is from NLS:
rm(list=ls())
cat("\014")
library(reldist)
data(nls)
help(nls)
# Convert the wage growth from (log. dollar) to (dollar)
y <- exp(recent$chpermwage);y
# Compute the unweighted estimate
gini_y <- gini(y)
# Compute the weighted estimate
gini_yw <- gini(y,w=recent$wgt)
# --- Here is the result ---
# gini_y  = 0.3418394
# gini_yw = 0.3483615
I know how to compute the Gini without weights with my own code, so I would like to keep the command gini(y) in my code without any doubts. The only thing I'm concerned about is how gini(y, w) operates to obtain the result 0.3483615. I tried the following alternative calculation, based on the CDF approach in Section 9.5 of the book "Relative Distribution Methods in the Social Sciences" by Mark S. Handcock, to see whether I could come up with the same result as gini_yw:
#-------------------------
# test how gini computes with the sample weights
z <- exp(recent$chpermwage) * recent$wgt
gini_z <- gini(z)
# Result gini_z = 0.3924161
As you see, my calculation gini_z is different from the result of gini(y, weights). If any of you knows how to build the correct computation to obtain exactly gini_yw = 0.3483615, please give me your advice. Thanks a lot, friends.
function (x, weights = rep(1, length = length(x)))
{
    ox <- order(x)
    x <- x[ox]                            # sort incomes in increasing order
    weights <- weights[ox]/sum(weights)   # reorder and normalise the weights
    p <- cumsum(weights)                  # cumulative population share
    nu <- cumsum(weights * x)
    n <- length(nu)
    nu <- nu/nu[n]                        # cumulative income share (Lorenz curve)
    sum(nu[-1] * p[-n]) - sum(nu[-n] * p[-1])  # trapezoidal Gini formula
}
This is the source code for the function gini, which can be seen by entering gini into the console. No parentheses or anything else.
EDIT:
This can be done for any function or object really.
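To see the formula at work, here is the same computation as a standalone function (a hypothetical wgini, just for checking) applied to toy data:
wgini <- function(x, w = rep(1, length(x))) {
  ox <- order(x)
  x <- x[ox]
  w <- w[ox] / sum(w)
  p  <- cumsum(w)              # cumulative population share
  nu <- cumsum(w * x)
  nu <- nu / nu[length(nu)]    # cumulative income share
  sum(nu[-1] * p[-length(p)]) - sum(nu[-length(nu)] * p[-1])
}
x <- c(10, 20, 30, 40)
w <- c(1, 2, 3, 4)
wgini(x, w)                    # same value as reldist::gini(x, weights = w)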
This is a bit late, but one may be interested in the concentration/diversity measures contained in the SciencesPo package.
