This
n = length(s)
# n = 25920169
nfft = 8192
noverlap = Int64(floor(nfft/2))
window = hanning(nfft)
#sp = spectrogram(s, n, noverlap; nfft=nfft, fs=1, window=window)
sp = periodogram(s; nfft=nfft, fs=1, window=window)
throws the error
nfft must be >= n
But the documentation says:
If length(s) < nfft, then the input is padded with zeros.
Doesn't it mean that nfft < n should be correct?
I think that the FFT length nfft should be at least the signal length n to prevent aliasing.
The periodogram function uses the FFT internally, with the transform length denoted nfft. In theory, when using the FFT, the signal is discrete and periodic in both the time domain and the frequency domain, and the period is given by nfft. So if you specify an nfft that is less than the signal length n, you introduce aliasing in the time domain: the signal is folded onto itself to make it periodic with period nfft.
For example, if you have a sequence 1 2 3 4 5, assuming that your period is also 5, you have
1 2 3 4 5
          1 2 3 4 5
                    1 2 3 4 5
--------------------------------
... 1 2 3 4 5 ...
i.e., the original sequence. Now assume you have a period of 3, then it looks like
1 2 3 4 5
      1 2 3 4 5
            1 2 3 4 5
------------------------
... 5 7 3 ...
When you take the FFT of a sequence with n > nfft, you are effectively working with this aliased sequence.
You can manually allow for n > nfft by applying the wrap(x, nfft) function below and feeding its output to periodogram. MATLAB does exactly that.
function wrap(x, nfft)
    y = zeros(eltype(x), nfft)
    for (i, xi) in enumerate(x)
        y[mod1(i, nfft)] += xi
    end
    y
end
For example:
wrap(1:5,3)
3-element Vector{Int64}:
5
7
3
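To see why wrapping is a legitimate way to handle n > nfft, here is a minimal sketch in R (not Julia, and not the DSP.jl implementation). It assumes a small R port of wrap and compares the FFT of the wrapped sequence with the DFT sum of the full sequence evaluated directly at the nfft frequencies 2*pi*k/nfft. The names n_idx and direct are made up for this illustration.

```r
# R port of wrap(): sum the samples whose positions agree modulo nfft
wrap <- function(x, nfft) {
  y <- numeric(nfft)
  for (i in seq_along(x)) {
    j <- (i - 1) %% nfft + 1   # 1-based equivalent of Julia's mod1(i, nfft)
    y[j] <- y[j] + x[i]
  }
  y
}

x <- 1:5
nfft <- 3
y <- wrap(x, nfft)   # 5 7 3, as in the example above

# DFT sum over all five samples, evaluated at the frequencies 2*pi*k/nfft
n_idx <- seq_along(x) - 1
direct <- sapply(0:(nfft - 1),
                 function(k) sum(x * exp(-2i * pi * k * n_idx / nfft)))

# The nfft-point FFT of the wrapped sequence gives the same values
all.equal(fft(y), direct)
```

So wrapping does not lose the spectrum; it only samples it on a coarser frequency grid than the n samples could support.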
I am working with the R programming language.
I have the following data set:
my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
I want to create a new variable that generates a single random integer for each row based on the corresponding value of "n". I tried to do this with the following code:
my_data$rand = sample.int(my_data$n,1)
But this is not working (the same random number is repeated 5 times).
I also tried to define a function to do this:
my_function <- function(x){sample.int(x,1)}
transform(my_data, new_column= my_function(my_data$n) )
But this is also not working (the same random number is again repeated 5 times).
In the end, I am trying to achieve something like this :
my_data$rand = c(sample.int(15,1), sample.int(3,1), sample.int(51,1), sample.int(8,1), sample.int(75,1))
Can someone please show me how to do this for larger datasets without having to manually specify each "sample.int" command?
Thanks!
When you say "based on value of n" what do you mean by that exactly? Based on n how?
Guess#1: at each row, you want to draw one random number with possible values being 1 to n.
Guess#2: at each row, you want to draw n random numbers for possible values between 0 and 1.
The second option is harder, but option #1 can be done with a loop:
my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
my_data$rand = NA
set.seed(123)
for(i in 1:nrow(my_data)){
  my_data$rand[i] = sample(1:(my_data$n[i]), size = 1)
}
my_data
  id  n rand
1  1 15   15
2  2  3    3
3  3 51   51
4  4  8    6
5  5 75   67
We can use sapply to go over all rows in my_data, and generate one sample.int per iteration.
my_data$rand <- sapply(1:nrow(my_data), function(x) sample.int(my_data[x, 2], 1))
  id  n rand
1  1 15    7
2  2  3    2
3  3 51   28
4  4  8    6
5  5 75    9
You can do this efficiently by a single call to runif(), multiplying by n, and rounding up:
transform(my_data, rand = ceiling(runif(n) * n))
  id  n rand
1  1 15   13
2  2  3    1
3  3 51   41
4  4  8    1
5  5 75    9
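The attempts in the question call sample.int() once with a size of 1, so they produce a single value that is then recycled across all rows. The fix in every answer is to draw once per row. As a minimal sketch of the same idea, assuming only base R, vapply() makes that one-draw-per-row step explicit and also checks that each draw is a single integer. The seed value is arbitrary.

```r
my_data <- data.frame(id = 1:5, n = c(15, 3, 51, 8, 75))

set.seed(42)  # arbitrary seed, only so the draws are reproducible

# One independent draw from 1..k for each row's k
my_data$rand <- vapply(my_data$n, function(k) sample.int(k, 1), integer(1))

# Every draw must lie between 1 and that row's n
stopifnot(all(my_data$rand >= 1), all(my_data$rand <= my_data$n))
my_data
```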
I have a few years of data and I am trying to look at the duration of events within the dataset. For example, I would like to know the duration of "Strong wind events". I can do this by:
wind.df <- data.frame(ws = c(6,7,8,9,1,7,6,1,2,3,4,10,4,1,2))
r <- rle(wind.df$ws>=6)
sequence <- unlist(sapply(r$lengths, seq))
wind.df$strong.wind.duration <- sequence
BUT, if the wind speed goes below a threshold for only two datapoints, I want to keep counting. If the wind speed is below a threshold for more than two, then I want to reset the counter.
So the output would look like:
## manually creating a desired output ###
wind.df$desired.output <- c(1,2,3,4,5,6,7,1,2,3,4,5,6,7,8)
You can do this with a customized function that loops over your wind speeds and counts the consecutive numbers above a threshold:
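numerate = function(nv, threshold = 6){
  counter = 1
  clist = numeric(length(nv))
  low = TRUE
  for(i in seq_along(nv)){
    ## Reset the counter when this value and the next two are all below
    ## the threshold (indices past the end give NA, dropped by na.rm)
    if(max(nv[i:(i+2)], na.rm=TRUE) < threshold && !low){
      counter = 1
      low = TRUE
    }
    if(nv[i] >= threshold){low = FALSE}
    clist[i] = counter
    counter = counter + 1
  }
  return(clist)
}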
wind.df <- data.frame(ws = c(6,7,8,9,1,7,6,1,2,3,4,10,4,1,2))
wind.df$desired.output = numerate(wind.df$ws)
The output of this function would be:
> print(wind.df)
   ws desired.output
1   6              1
2   7              2
3   8              3
4   9              4
5   1              5
6   7              6
7   6              7
8   1              1
9   2              2
10  3              3
11  4              4
12 10              5
13  4              1
14  1              2
15  2              3
The desired output you wrote in your question is wrong: the last three elements of the wind speed are 4, 1, 2. That is more than two consecutive values below 6 after a value at or above 6, so the counter has to be reset.
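The same rule can also be expressed with rle(), which the question already uses. The following is only a sketch under one assumption about the intended rule: a new count starts exactly at a below-threshold run longer than max_gap points. The function name count_with_gaps and the max_gap parameter are hypothetical and do not come from the question or the answer above.

```r
# Hypothetical helper: number the observations within segments, where a new
# segment starts only at a below-threshold run longer than max_gap
count_with_gaps <- function(ws, threshold = 6, max_gap = 2) {
  if (length(ws) == 0) return(integer(0))
  r <- rle(ws >= threshold)
  starts <- !r$values & r$lengths > max_gap  # long calm runs reset the counter
  starts[1] <- TRUE                          # the first run always opens a segment
  segment <- rep(cumsum(starts), r$lengths)  # segment id for every observation
  ave(seq_along(ws), segment, FUN = seq_along)
}

ws <- c(6, 7, 8, 9, 1, 7, 6, 1, 2, 3, 4, 10, 4, 1, 2)
count_with_gaps(ws)
```

For this wind series it gives the same counts as the loop above. The two versions differ at the very end of a series: a dip of one or two points in the last positions keeps counting here, whereas the loop resets because it ignores the positions past the end.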
I'm trying to understand how the %% operator works in R:
10 %% 10 # 0
20 %% 10 # 0
I'm not sure about these two results:
10 %% 20 # 10
2 %% 8 # 2
Can you help me understand the last two results? I'm a little confused.
Nothing wrong:
10 = 1 * 10 + 0
20 = 2 * 10 + 0
10 = 0 * 20 + 10
2 = 0 * 8 + 2
The modulo is the number after +.
In general, for two numbers a and b, there is
a = floor(a / b) * b + (a %% b)
Let's write a toy function:
foo <- function(a,b) c(quotient = floor(a / b), modulo = a %% b)
foo(10, 10)
#quotient modulo
# 1 0
foo(20, 10)
#quotient modulo
# 2 0
foo(10, 20)
#quotient modulo
# 0 10
foo(2, 8)
#quotient modulo
# 0 2
Update: Instead of using floor(a / b) to get quotient, we can also use a %/% b.
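As a quick check of the identity above, here is a short sketch that applies %/% and %% to the four pairs from the question in one vectorized call. The variable names are only illustrative.

```r
a <- c(10, 20, 10, 2)   # dividends from the question
b <- c(10, 10, 20, 8)   # divisors from the question

quotient <- a %/% b     # 1 2 0 0
modulo   <- a %% b      # 0 0 10 2

# a = quotient * b + modulo holds for every pair
stopifnot(all(a == quotient * b + modulo))
data.frame(a, b, quotient, modulo)
```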
I'll offer another explanation. Take this problem:
20 %% 10 = 0
Instead of evaluating the modulo, start with simple division:
20 / 10 = 2
As you know, the answer "2" means that it takes two sets of 10 to get 20. Note that we can also write the answer this way with the decimal, 2.0.
The decimal is important. When the decimal is .0, we have no remainder. We have complete sets. If division yields a 0 decimal, then the modulo evaluates to zero.
Now consider this:
11/3 = 3.667
That tail part, the 0.667, is the portion of a set of 3 that remains after we form all the full sets of 3 that we can. Splitting the answer at the decimal point, we get:
#Splitting the answer into its components - 3 full sets, 0.667 partial sets
3.0 + 0.667 = 3.667
So if we want to know the actual remaining quantity, we can multiply 0.667 by the divisor, 3:
0.667 * 3 ≈ 2
This is the remainder. It is the quantity that remains after all full sets of 3 are formed. It's the same result we get using modulo:
11 %% 3 = 2
The same applies here. Given this problem,
10 %% 20 = 10
we can divide normally and get:
10 / 20 = 0.5
Reading this out, we have 0 full groups of 20 (left side); we only have half a set, 0.5, of 20.
0.5 * 20 = 10
This is equivalent to:
10 %% 20 = 10
10 is thus the remainder: since no full set of 20 fits into 10, all 10 are left over.
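The "fractional part times divisor" reasoning can be written as a small function. This is a sketch of the explanation only, with a made-up name frac_remainder. In practice %% is the right tool, because the fractional route goes through floating-point division and may be off by a tiny rounding error.

```r
# Remainder via the fractional part of the division, as described above
frac_remainder <- function(a, b) {
  frac <- a / b - floor(a / b)  # the part after the decimal point
  frac * b                      # scale it back to the units of the divisor
}

frac_remainder(11, 3)   # approximately 2
frac_remainder(10, 20)  # 10

# Compare with %% using a tolerance, because 11 / 3 is not exact in floating point
all.equal(frac_remainder(11, 3), 11 %% 3)
```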
I was also very confused, but if you understand that the result of the %% operator is the REMAINDER of a division, it is very easy.
Eg. 75%%4 = 3
and I noticed if the dividend is lower than the divisor, then R returns the same dividend value.
Eg.
4%%75 = 4
10 %% 20 = 10
2 %% 8 = 2
Syntax
remainder <- dividend %% divisor
Details
The only thing missing from the documentation was a statement of which side is the dividend and which side is the divisor. Wikipedia describes the two terms as:
What is being divided is called the dividend, which is divided by the divisor, and the result is called the quotient. In the example, 20 is the dividend, 5 is the divisor, and 4 is the quotient.
However, in comparison with the division operation, the modulo operation is not returning the quotient. Instead, it is returning the remainder.
Examples
The modulo operation is easiest to understand when the dividend is greater than the divisor.
12 %% 11
# quotient is 1.090909
# remainder is 1
12 %% 10
# quotient is 1.2
# remainder is 2
12 %% 9
# quotient is 1.333333
# remainder is 3
12 %% 8
# quotient is 1.5
# remainder is 4
12 %% 7
# quotient is 1.714286
# remainder is 5
12 %% 6
# quotient is 2
# remainder is 0
# 12 is divisible by 6
12 %% 5
# quotient is 2.4
# remainder is 2
12 %% 4
# quotient is 3
# remainder is 0
# 12 is divisible by 4
12 %% 3
# quotient is 4
# remainder is 0
# 12 is divisible by 3
12 %% 2
# quotient is 6
# remainder is 0
# 12 is divisible by 2
12 %% 1
# quotient is 12
# remainder is 0
# any whole number is divisible by 1
While trying to understand some results of x modulo y in R, I found this page. To explain some "quirky" results to myself, I wrote the R script below. I had read that the result of the modulo operator is supposed to always be positive, but this is not the case in R, and the definition and example provided here explain the logic that seems to be used. The definition x mod y = x - (floor(x/y) * y) seems to always hold in R. Put in a more standard way, the remainder r of the division x / y satisfies x = k*y + r, where k = floor(x/y) is an integer (and r is an integer too when x and y are).
In R, with x = 2 and y = -5, x mod y = -3: here k = floor(2/-5) = -1, so r = x - k*y = 2 - 5 = -3.
Still, this is kind of quirky in a mathematical sense, because the "integer part product" (k*y = 5) actually exceeds the dividend (x = 2), which makes the remainder (r) a negative integer...
x <- 2
y <- -5
q <- x / y          # -0.4
k <- floor(x / y)   # -1
ky <- k * y         # 5
r <- x - ky         # -3
x %% y              # -3
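Because the quotient is rounded down with floor(), a nonzero remainder always ends up with the same sign as the divisor. A short sketch over the four sign combinations makes that visible. The vectors are chosen only for illustration.

```r
x <- c(2, -2,  2, -2)
y <- c(5,  5, -5, -5)

k <- x %/% y   #  0 -1 -1  0   (floor of x / y)
r <- x %% y    #  2  3 -3 -2

# The defining identity holds, and each remainder has the sign of its divisor
stopifnot(all(x == k * y + r), all(sign(r) == sign(y)))
```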
I need help simulating a dataset.
It is supposed to simulate all possible outcomes on a signal detection theory task (participants are presented with trials and have to decide whether or not they detected given signal). Now, I need a dataset of all possible values for varying number of trials.
Say there are 10 trials, 5 with the signal present and 5 with the signal absent. I am only interested in correct detections (hits) and false alarms (Type I errors). A participant can correctly detect between 1 and 5 signals (I don't need 0's) and make between 1 and 5 false alarms. With all possible combinations, that would be a dataset containing two variables with 5^2 cases each. To make things more complicated, the number of trials is also variable. The number of both signal and non-signal trials can vary between 1 and 20, but the total number of trials cannot be less than 3 (either 1 S trial and 2 Non-S trials, or the other way around). And for each possible combination of trials, there is a group of possible combinations of hits and false alarms.
What I need is a dataset with 5 variables (total N, N of S trials, N of Non-S trials, N of Hits, and N of False Alarms) with all the possible values.
EXAMPLE
Here are all possible data for a total N of 4. Note that Signal + Noise = N_total, that N_Hit takes the values 1 to Signal, and that N_FA takes the values 1 to Noise.
N_total Signal Noise N_Hit N_FA
      4      1     3     1    1
      4      1     3     1    2
      4      1     3     1    3
      4      2     2     1    1
      4      2     2     1    2
      4      2     2     2    1
      4      2     2     2    2
      4      3     1     1    1
      4      3     1     2    1
      4      3     1     3    1
I'm an R novice so any help at all would be much appreciated!
Hope the description is clear.
I created a function, which uses the number of trials as parameter.
myfunc <- function(n) {
  # create a data frame of all combinations
  grid <- expand.grid(rep(list(seq_len(n - 1)), 4))
  # remove invalid combinations (keep valid ones)
  grid <- grid[grid[[3]] <= grid[[1]] &        # number of hits <= number of signals
               grid[[4]] <= grid[[2]] &        # false alarms <= noise
               (grid[[1]] + grid[[2]]) == n, ] # signal and noise sum to total n
  # remove signal and noise > 20
  grid <- grid[!rowSums(grid[1:2] > 20), ]
  # sort rows (columns are extracted as vectors, which is what order() expects)
  grid <- grid[order(grid[[1]], grid[[3]], grid[[4]]), ]
  # add total number of trials
  res <- cbind(n, grid)
  # remove row names, add column names and return the object
  return(setNames("rownames<-"(res, NULL),
                  c("N_total", "Signal", "Noise", "N_Hit", "N_FA")))
}
Use the function:
> myfunc(4)
   N_total Signal Noise N_Hit N_FA
1        4      1     3     1    1
2        4      1     3     1    2
3        4      1     3     1    3
4        4      2     2     1    1
5        4      2     2     1    2
6        4      2     2     2    1
7        4      2     2     2    2
8        4      3     1     1    1
9        4      3     1     2    1
10       4      3     1     3    1
How to apply this function to the values 3-40:
lapply(3:40, myfunc)
This will return a list of data frames.
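Since the goal is a single dataset, the list can be stacked with do.call(rbind, ...). For large totals, building the full four-way grid and then filtering creates many rows that are immediately discarded. A lighter alternative, sketched here, builds only the valid rows for each split into signal and noise trials. The function name sdt_outcomes and the max_per_type parameter are hypothetical and are not part of the answer above.

```r
# Hypothetical alternative: enumerate only valid rows for one total n_total
sdt_outcomes <- function(n_total, max_per_type = 20) {
  signal <- seq_len(n_total - 1)
  noise  <- n_total - signal
  keep   <- signal <= max_per_type & noise <= max_per_type
  parts <- Map(function(s, nz) {
    # N_FA is listed first so that it varies fastest, matching the example table
    combos <- expand.grid(N_FA = seq_len(nz), N_Hit = seq_len(s))
    data.frame(N_total = n_total, Signal = s, Noise = nz,
               N_Hit = combos$N_Hit, N_FA = combos$N_FA)
  }, signal[keep], noise[keep])
  do.call(rbind, parts)  # NULL when no split satisfies the limits
}

res <- sdt_outcomes(4)

# One data frame for all totals from 3 to 40
all_outcomes <- do.call(rbind, lapply(3:40, sdt_outcomes))
```

For a total of 4 this reproduces the ten rows of the example in the same order.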