How does one calculate the correct amplitude with R's fft() function? [closed]

In R, I am generating uncorrelated values in the time domain with rnorm(). Then I apply fft() to these values; however, the mean amplitude of the spectrum is only about 0.88 instead of 1. Is there anything I am not aware of?
Here is an MWE:
# dt <- 0.01 # time step
nSteps <- 100000 # Number of time steps
# df <- 1/(nSteps*dt) # frequency resolution
# t <- 0:(nSteps-1)*dt # time vector
y <- rnorm(nSteps, mean=0, sd=1) # generate uncorrelated data. Should result in a white noise spectrum with sd=1
y_sq_sum <- sum(y^2)
# We ignore cutting to the Nyquist frequency.
# f <- 0:(nSteps-1)*df
fft_y <- abs(fft(y))/sqrt(length(y))
fft_y_sq_sum <- sum(fft_y^2)
print(paste("Check for Parseval's theorem: y_sq_sum = ", y_sq_sum, "; fft_y_sq_sum = ", fft_y_sq_sum, sep=""))
print(paste("Mean amplitude of my fft spectrum: ", mean(fft_y)))
print(paste("The above is typically around 0.88, why is it not 1?"))

This question doesn't belong on Stack Overflow; it's more of a Cross Validated kind of thing. But here's an answer anyway:
Parseval's theorem says that the mean of fft_y^2 should be 1. The square root is a concave function, so Jensen's inequality says the mean of sqrt(fft_y^2) will be less than 1. Since fft_y is non-negative in your definition, fft_y = sqrt(fft_y^2), so its mean falls below 1.
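The exact value can be pinned down, too. For white Gaussian input, each FFT bin is approximately complex Gaussian, so its magnitude is approximately Rayleigh distributed; under your normalization the Rayleigh mean is sqrt(pi)/2 ≈ 0.886, which is the 0.88 you observe. A minimal check, assuming nothing beyond the setup in your MWE:
set.seed(42)
y <- rnorm(100000)
fft_y <- abs(fft(y)) / sqrt(length(y))
mean(fft_y^2) # ~ 1, as Parseval's theorem requires
mean(fft_y)   # ~ 0.886
sqrt(pi) / 2  # 0.8862269, the Rayleigh mean under this normalization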

Related

Please convert this R code for bit error probability into MATLAB [closed]

Solution to the bit error probability problem:
prob <- function(E, m) {
  # prob is the estimated error probability for given values of the
  # signal-to-noise ratio E and the sample size m
  stopifnot(E >= 0 & m > 0)  # reject negative E; m should be at least 1
  n <- rnorm(m)              # a random sample of size m from the N(0,1) distribution
  m <- mean(n < -sqrt(E))    # the proportion of values in n below the negative square root of E
  return(m)                  # the estimated error probability
}
E=seq(0,2,by=0.001)
sam=1000
y=sapply(E,prob,m=sam)
p=10*log10(E)
plot(p, log(y),
     main = "Graph For The Error Probabilities",
     xlab = expression(10 * log[10](E)),
     ylab = "log(Error Probability)",
     type = "l")
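A quick sanity check, assuming the quantity being simulated is P(Z < -sqrt(E)) for Z ~ N(0,1): its exact value is pnorm(-sqrt(E)), so the exact curve can be overlaid on the plot above:
exact <- pnorm(-sqrt(E))          # closed-form error probability
lines(p, log(exact), col = "red") # overlay the exact curve in red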
You can try MATLAB code like the following. Note that in a MATLAB script, local functions must be defined at the end of the file:
clc;
clear;
close all;

E = 0:0.001:2;
sam = 1000;
y = arrayfun(@(x) prob(x, sam), E);  % apply prob to each value of E
p = 10*log10(E);
plot(p, log(y));
title('Graph For The Error Probabilities');
xlabel('10log_{10}(E)');
ylabel('log(Error Probability)');

function y = prob(E, m)
    % estimated error probability for signal-to-noise ratio E and sample size m
    assert(E >= 0 & m > 0);
    n = randn(1, m);            % m draws from N(0,1)
    y = mean(n + sqrt(E) < 0);  % proportion below -sqrt(E)
end
OUTPUT (MATLAB): [plot omitted]
OUTPUT (R): [plot omitted]

Coding a simple loop for a sliding window [closed]

I have the following problem. I have a time series consisting of 2659 observations. I need to perform a statistical test over a sliding window of length 256, and each time I want to extract the p-value from the test and gather these into a time series vector. For this test (a runs test), I want the threshold to be a moving average computed over the same rolling window. Here is my attempt (in R):
x<- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
for(i in 1:2404){
runs <- runs.test(x[i:i+255], threshold = mean(x[i:i+255]))
ret[i] <- runs$p.value
}
The index starts from 1 but stops at 2404 because each window spans 256 observations: the first window goes from 1 to 256, the second from 2 to 257, and so on, until the last one ends at 2404 + 255 = 2659. I hope I made my problem clear; I do not understand why it does not work. Of course I also need to plot the result over time, so that all the p-values appear in one plot. I hope you can help me.
PS: Please, set a seed if you propose an example so that I can reproduce your results.
Use rollapplyr with the indicated function.
library(zoo)
pv <- function(xx) runs.test(xx, threshold = mean(xx))$p.value
out <- rollapplyr(x, 256, pv, fill = NA)
Note
library(randtests)
set.seed(123)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
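Assuming zoo's defaults, rollapplyr right-aligns each window and fill = NA pads the start, so out lines up with x:
length(out)     # 2659, same length as x
sum(is.na(out)) # 255 leading NAs before the first complete window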
Two changes to your existing code should make it work:
set.seed(0)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
ret <- rep(NA, length(x))
for(i in 1:2404){
runs <- runs.test(x[i:(i+255)], threshold = mean(x[i:(i+255)]))
ret[i] <- runs$p.value
}
The first change is to initialize the ret variable before the loop: ret <- rep(NA, length(x)).
The second change is to add the parentheses, i.e. x[i:(i+255)]. If you write x[i:i+255], the : operator binds tighter than +, so i:i+255 evaluates to the single index i+255 and you get just one value, x[i+255].
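A one-liner illustrates the precedence issue:
i <- 1
i:i+255   # 256, evaluated as (i:i) + 255
i:(i+255) # 1 2 3 ... 256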

Different dimensions of distributions of topics [closed]

I would like to divide all documents into 10 topics, and the model converges well, except for the dimensions of the topic distributions and their covariance matrix.
Why is the topic distribution a 9-dimensional vector instead of 10, and why is the covariance matrix 9x9 instead of 10x10?
I have used library(topicmodels) and the function CTM() to fit the topic model on Chinese text.
My code is below:
library(rJava)
library(Rwordseg)
library(NLP)
library(tm)
library(tmcn)
library(topicmodels)
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Law.scel", "Law")
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\NationalInstitution.scel", "NationalInstitution")
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Place.scel", "Place")
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Psychology.scel", "Psychology")
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Politics.scel", "Politics")
listDict()
# read file
d.vec <- segmentCN("samgovWithoutID.csv", returnType = "tm")
samgov.segment <- read.table("samgovWithoutID.segment.csv", header = TRUE, fill = TRUE, stringsAsFactors = F, sep = ",",fileEncoding='utf-8')
fix(samgov.segment)
# create DTM (document-term matrix)
d.corpus <- Corpus(VectorSource(samgov.segment$content))
inspect(d.corpus[1:10])
d.corpus <- tm_map(d.corpus, removeWords, stopwordsCN())
ctrl <- list(removePunctuation = TRUE, removeNumbers = TRUE, stopwords = stopwordsCN(), wordLengths = c(2, Inf)) # keep terms of at least 2 characters
d.dtm <- DocumentTermMatrix(d.corpus, control = ctrl)
inspect(d.dtm[1:10, 110:112])
# implement topic models
ctm10<-CTM(d.dtm,k=10, control=list(seed=2014012692))
Terms10 <- terms(ctm10, 10)
Terms10[,1:10]
ctm20<-CTM(d.dtm,k=20, control=list(seed=2014012692))
Terms20 <- terms(ctm20, 20)
Terms20[,1:20]
The result in RStudio (highlighted part) and the relevant help document: [screenshots omitted]
A probability distribution over 10 values has 9 free parameters: once I tell you the probability of the first 9, the probability of the last value has to be one minus the sum of those probabilities.
A 10-dimensional logistic normal distribution is equivalent to sampling a 10-dimensional vector from a Gaussian distribution and then "squashing" that vector by exponentiating it and normalizing it to sum to 1.0. There are an infinite number of 10-dimensional vectors that will exponentiate and normalize to the same 10-dimensional probability distribution -- you just have to add an arbitrary constant c to each value. That's because the mean of the Gaussian has 10 free parameters, one more than the more constrained distribution.
There are several ways to make the Gaussian "identifiable". One is to fix one of the elements of the mean vector to be 0.0. That's why you see a 9-dimensional mean and covariance matrix: the 10th value is always 0 with no variance.
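A tiny illustration of that non-identifiability, using a hypothetical softmax helper (not part of topicmodels):
softmax <- function(v) exp(v) / sum(exp(v))
v <- rnorm(10)
all.equal(softmax(v), softmax(v + 3)) # TRUE: shifting every element by a constant
                                      # yields the same topic distribution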

'Non-conformable arguments' in R code [closed]

:) I previously wrote an R function, "LeastSquaresDegreeN.R", that computes a least-squares polynomial of arbitrary order to fit whatever data I put into it. The code works, because I can reproduce results I got previously. However, when I try to put new data into it I get a "non-conformable arguments" error.
"Error in Conj(t(Q))%*%t(b) : non-conformable arguments"
An extremely simple example of data that should work:
t <- seq(1,100,1)
fifthDegree <- t^5
LeastSquaresDegreeN(t,fifthDegree,5)
This should output and plot a polynomial f(t) = t^5 (up to rounding errors).
However I get "Non-conformable arguments" error even if I explicitly make these vectors:
t <- as.vector(t)
fifthDegree <- as.vector(fifthDegree)
LeastSquaresDegreeN(t,fifthDegree,5)
I've tried putting in the transpose of these vectors too - but nothing works.
Surely the solution is really simple. Help!? Thank you!
Here's the function:
LeastSquaresDegreeN <- function(t, b, deg)
{
# Usage: t is independent variable vector, b is function data
# i.e., b = f(t)
# deg is desired polynomial order
# deg <- deg + 1 is a little adjustment to make the R loops index correctly.
deg <- deg + 1
t <- t(t)
dataSize <- length(b)
A <- mat.or.vec(dataSize, deg) # Built-in R function to create zero
# matrix or zero vector of arbitrary size
# Given basis phi(z) = 1 + z + z^2 + z^3 + ...
# Define matrix A
for (i in 0:deg-1) {
A[1:dataSize,i+1] = t^i
}
# Compute QR decomposition of A. Pull Q and R out of QRdecomp
QRdecomp <- qr(A)
Q <- qr.Q(QRdecomp, complete=TRUE)
R <- qr.R(QRdecomp, complete=TRUE)
# Perform Q^* b^T (Conjugate transpose of Q)
c <- Conj(t(Q))%*%t(b)
# Find x. R isn't square - so we have to use qr.solve
x <- qr.solve(R, c)
# Create xPlot (which is general enough to plot any degree
# polynomial output)
xPlot = x[1,1]
for (i in 1:deg-1){
xPlot = xPlot + x[i+1,1]*t^i
}
# Now plot it. Least squares "l" plot first, then the points in red.
plot(t, xPlot, type='l', xlab="independent variable t", ylab="function values f(t)", main="Data Plotted with Nth Degree Least Squares Polynomial", col="blue")
points(t, b, col="red")
} # End
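For what it's worth, a minimal sketch of what likely triggers the error, assuming b is passed as a plain numeric vector: t(b) is then a 1 x n row matrix, which cannot stand on the right of the n x n matrix Conj(t(Q)).
Q <- diag(3)
b <- c(1, 2, 3)
# Conj(t(Q)) %*% t(b) # error: non-conformable arguments (3x3 times 1x3)
Conj(t(Q)) %*% b      # works: R treats the plain vector b as a 3x1 column
Separately, be aware that 0:deg-1 parses as (0:deg) - 1 in R, so that loop starts at i = -1; writing 0:(deg-1) is probably what was intended (the same precedence trap as in the sliding-window question above).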

Given a sample of random variables, and n, how do I find the ecdf of the sum of n Xs? [closed]

I can't fit X to a common distribution so currently I just have X ~ ecdf(sample_data).
How do I calculate the empirical distribution of sum(X1 + ... + Xn), given n? X1 to Xn are iid.
To estimate the distribution of that sum, you can repeatedly sample with replacement (and then take the sum of) n variates from sample_data. (sample() places equal probability mass on each element of sample_data, just as the ecdf does, so you don't need to calculate ecdf(sample_data) as an intermediate step.)
# Create some example data
sample_data <- runif(100)
n <- 10
X <- replicate(1000, sum(sample(sample_data, size=n, replace=TRUE)))
# Plot the estimated distribution of the sum of n variates.
hist(X, breaks=40, col="grey", main=expression(sum(x[i], i==1, n)))
box(bty="l")
# Plot the ecdf of the sum
plot(ecdf(X))
First, generalize and simplify: solve for step-function CDFs X and Y that are independent but not identically distributed. For every step jump x_i in X's CDF and every step jump y_j in Y's, there will be a corresponding step jump at x_i + y_j in the CDF of X + Y, so the CDF of X + Y is characterized by the list (in pseudocode):
sorted(x + y for x in X for y in Y)
That means if there are k points in X's CDF, there will be k^n in (X1 + ... + Xn). We can cut that down to a manageable number at the end by throwing away all but k again, but clearly the intermediate calculations will be costly in time and space.
Also, note that even though the original CDF is an ECDF for X, the result will not be an ECDF for (X1 + ... + Xn), even if you keep all k^n points.
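In R, assuming x_jumps and y_jumps hold the jump locations of the two step CDFs (hypothetical names), that construction is:
x_jumps <- c(0.1, 0.5, 0.9) # jump locations of X's CDF
y_jumps <- c(0.2, 0.4)      # jump locations of Y's CDF
sort(as.vector(outer(x_jumps, y_jumps, "+"))) # the 3 * 2 = 6 jump locations of the CDF of X + Y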
In conclusion, use Josh's solution.
