I am having problems with the syntax of the peakpat option of the findpeaks function in the pracma R package (v. 2.1.1). I am using R 3.4.3 x64 on Windows.
I would like the function to identify peaks that may have two repeated values, and I believe the option peakpat is how I can do this.
This question has been asked before; however, I haven't been able to find an example of how to implement the option Hans refers to. This seems very basic, and I am also quite a beginner when it comes to coding. In the help file online, it says the following about peakpat:
"define a peak as a regular pattern, such as the default pattern '[+]1,[-]1,'; if a pattern is provided, the parameters nups and ndowns are not taken into account."
I'm having problems interpreting what "[+]1,[-]1" means. Any ideas? I've tried variations of what I think this means, but each attempt results in NULL. Please see my example below, any help/insight is greatly appreciated.
# Example:
install.packages("pracma")
library(pracma)
subset = c(570,584,500,310,261,265,272,313,314,315,330,360,410,410,360,365,368,391,390,414)
# Plots
plot(subset)
lines(subset)
# findpeaks without defining repeated values;
# the result does not identify the peak at subset[13:14] (repeated 'peak' values)
result = findpeaks(subset)
pks1 = data.matrix(result[,1])
locs1 = data.matrix(result[,2])
# findpeaks with my futile attempt at defining peakpat
result = findpeaks(subset, nups=2, ndowns=nups, zero = "0", peakpat="[+]2,[-]2,")
result = findpeaks(subset, nups=1, ndowns=nups, zero = "0", peakpat="[+]1,[-]1,")
result = findpeaks(subset, nups=1, ndowns=nups, zero = "0", peakpat="[+]{,1},[-]{,1}")
result = findpeaks(subset, nups=1, ndowns=nups, zero = "0", peakpat="[+]{1,},[-]{1,}")
result = findpeaks(subset, nups=2, ndowns=nups, zero = "0", peakpat="[2],[2]")
result = findpeaks(subset, nups=2, ndowns=nups, zero = "0", peakpat="[1],[1]")
# all of the above results in NULL
Thank you!
The documentation isn't too helpful in this case, but you can get some clues by inspecting the function body.
Typing the function name into the console lets you inspect its source. Without going into complete detail, this line is helpful:
peakpat <- sprintf("[+]{%d,}[-]{%d,}", nups, ndowns)
This shows us that the default arguments correspond to a peakpat of "[+]{1,}[-]{1,}".
This also explains why, if you specify peakpat, you don't need to specify anything for nups and ndowns: they are ignored.
A pattern that does what you're after, allowing a flat run of repeated values at the peak:
result <- findpeaks(subset, peakpat = "[+]{1,}[0]{1,}[-]{1,}")
The numbers inside the braces specify a repetition range (the standard regex quantifier {min,max}). So if you wanted to limit your search to peaks that have a repeated value of at most length 3:
result <- findpeaks(subset, peakpat = "[+]{1,}[0]{1,2}[-]{1,}")
The function works by turning your data into a string and applying a regular expression, so the usual rules for regex should apply.
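To make that concrete, here is a rough sketch of the mechanism (not the package's exact source, but the same idea as far as I can tell): the signs of the consecutive differences are mapped to a string of +, 0 and - characters, and peakpat is matched against that string.
x <- c(1, 2, 3, 3, 2, 1)                               # a peak with a repeated top value
s <- sign(diff(x))                                     # 1  1  0 -1 -1
xc <- paste(c("-", "0", "+")[s + 2], collapse = "")    # "++0--"
gregexpr("[+]{1,}[0]{1,}[-]{1,}", xc)[[1]]             # sustained-peak pattern: match at position 1
gregexpr("[+]{1,}[-]{1,}", xc)[[1]]                    # default pattern: no match (-1), because of the 0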
This is my final solution for finding peaks, both with the default peak definition and with sustained peaks defined via peakpat (per the answer by Callum):
# EXAMPLE: findpeaks
library(pracma)
subset = c(570,584,500,310,261,265,272,313,314,315,330,360,410,410,360,365,368,391,390,414)
# Plots
plot(subset)
lines(subset)
# findpeaks without sustained peaks:
# the result does not identify the peak at subset[13:14] (repeated 'peak' values)
result = findpeaks(subset)
pks1 = result[,1]
locs1 = result[,2]
# findpeaks by defining sustained peaks (2 or more consecutive values):
result = findpeaks(subset, peakpat = "[+]{1,}[0]{1,}[-]{1,}")
pks2 = result[,1]
locs2 = result[,2]
Related
Can someone help me with this? I got the cut_interval code to work for a single test column, but can't seem to get it to work in a for loop to have it run on all of the columns.
#Bin worker data into three groups (low/medium/high %methylation) for the cpg cg10757709
#This code works
cg10757709_interval <- cut_interval(cpgs$cg10757709, n=3, labels = c("low","med","high"))
View(cg10757709_interval)
#Write a loop so that data for each of the significant cpgs will be binned into low, medium, and high groups
#This code gives an error (that more elements are supplied than there are to replace)
cpgs_interval <- matrix(ncol = length(cpgs), nrow = 29)
for (i in seq_along(cpgs)) {
cpgs_interval[[i]] <- cut_interval(cpgs[[i]], n=3, labels = c("low","med","high"))
}
View(cpgs_interval)
The error says "Error in cpgs_interval[[i]] <- cut_interval(cpgs[[i]], n = 3, labels = c("low", : more elements supplied than there are to replace". Should I not be using a matrix for cpgs_interval? Or is something else the problem? I'm rather new to writing for loops. Thanks.
In your example, cpgs_interval is a matrix. If you want to put the variable into the ith column of the matrix, you could do:
for (i in seq_along(cpgs)) {
cpgs_interval[,i] <- cut_interval(cpgs[[i]], n=3, labels = c("low","med","high"))
}
That said, you might be better off making cpgs_interval a data frame; then you'll retain the factors rather than converting them to text.
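For example, something along these lines should work (a sketch, assuming cpgs is a data frame of numeric columns; cut_interval comes from ggplot2):
library(ggplot2)  # provides cut_interval

# lapply keeps each binned column as a factor; as.data.frame reassembles them
cpgs_interval <- as.data.frame(
  lapply(cpgs, cut_interval, n = 3, labels = c("low", "med", "high"))
)
str(cpgs_interval)  # every column is a factor with levels low/med/high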
I've been encountering an issue when optimizing a strategy using the apply.paramset function in quantstrat.
The issue I am having appears to be the same as the one here:
Quantstrat: apply.paramset fails due to combine error for certain paramater distributions, but not others
The optimization works well if all of the parameter combinations return at least one transaction; however, if one of the combinations doesn't return a transaction, then the results for all of the combinations are lost/NULL, typically with the following error:
"simpleError in match.names(clabs, names(xi)): names do not match previous names"
I have provided a short example below with one parameter combination that produces a number of transactions (.nlist = 10 and .sdlist = 1) and one combination that doesn't produce any transactions (.nlist = 10 and .sdlist = 20).
I would like to be able to extract the tradeStats for this optimization from the "results" environment, but due to the error the results become NULL.
What would be the best way to solve this problem?
# Demo from https://github.com/braverock/quantstrat/blob/master/demo/bbandParameters.R
require(foreach,quietly=TRUE)
require(iterators)
require(quantstrat)
demo('bbands',ask=FALSE)
strategy.st='bbands'
# Here I have chosen only a single parameter for the first distribution and two for the second; sd = 1 will work but sd = 20 will fail
.nlist = 10
.sdlist = c(1, 20)
# Here are parameters that will successfully produce the results and tradeStats
#.nlist = c(10,12)
#.sdlist = 1
# number of random samples of the parameter distribution to use for random run
.nsamples = 2
add.distribution(strategy.st,
paramset.label = 'BBparams',
component.type = 'indicator',
component.label = 'BBands',
variable = list(n = .nlist),
label = 'nFAST'
)
add.distribution(strategy.st,
paramset.label = 'BBparams',
component.type = 'indicator',
component.label = 'BBands',
variable = list(sd = .sdlist),
label = 'nSLOW'
)
results <- apply.paramset(strategy.st,
paramset.label='BBparams',
portfolio.st=portfolio.st,
account.st=account.st,
nsamples=.nsamples,
verbose=TRUE)
Hopefully I have provided enough information, if not please let me know and I'll try to elaborate.
Thanks in advance
Edit: The obvious answer would be not to choose parameter ranges that are too large, in other words not to cast the net too wide. But even with narrower ranges this issue can still arise if you apply a small parameter optimisation to a large number of assets/datasets. For example, applying the parameters .nlist = c(10,12) and .sdlist = 1 (which are successful in the above example) to a number of different stocks may not work for all of them, and will therefore run into this problem too.
Any input would be greatly appreciated.
The 20-standard-deviation indicator results in no trades, so the rbind() of the tradeStats from the different param.combo strategy results throws an error. We should handle these errors more gracefully, at least informing the user and allowing the code to continue executing. I have created an issue here - https://github.com/braverock/quantstrat/issues/121.
Your code will work fine with a more realistic standard deviation, say .sdlist = c(1, 2).
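If it helps to see the mechanism outside of quantstrat: the combine step is essentially an rbind() of per-combination tradeStats data frames, and rbind() fails with exactly that message when the pieces do not share column names. The data frames below are made up to show the error; they are not real tradeStats output.
a <- data.frame(Num.Txns = 42, Net.Trading.PL = 100)
b <- data.frame(Num.Txns = 0,  Note = "no trades")   # same number of columns, different names
# rbind(a, b)
# Error in match.names(clabs, names(xi)) : names do not match previous names

# one generic way to combine such pieces defensively: pad missing columns with NA first
all_cols <- union(names(a), names(b))
pad <- function(d) { d[setdiff(all_cols, names(d))] <- NA; d[all_cols] }
do.call(rbind, lapply(list(a, b), pad))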
I am trying to solve the following constrained maximization problem.
The example here is simply a minimal recreation of my problem.
I have a dataframe as follows:
Obs=c(1,2,3,4,5)
Var1=c(11,15,16,19,20)
Var2=c(1.5,22,0.9,1.7,.1)
Var3=c(2.6,2.5,3.5,3.6,2.1)
Value_One = c(10,12.5,8.4,7.5,2.6)
Cost = c(1.1,1.2,1.3,1.6,1.7)
Value_overall = c(10,21,31,4,29)
df=data.frame(Obs,Var1,Var2,Var3,Value_One,Cost,Value_overall)
var_sel=c('Var1','Var2')
coeff_sel=c(2.5,4.5)
gamma=.7
I have to run a constrained optimization problem, an example of which is as follows (note that the exact values do not matter; please feel free to change them):
Value_func = function(x){
Value_var=x$Cost
# negative sign, since constrOptim performs minimisation
-((x$Value_overall+gamma*(x$Value_null-
(as.matrix(x[var_sel])%*%(as.matrix(coeff_sel)))))-2*x[Cost])
}
#Please feel free to change the values below.
#I just want to know where I am going wrong. The exact values do not matter here.
for (i2 in 1:nrow(df)){
x=df[i2,]
zzz=constrOptim(-1.2, Value_func, NULL,ui=1,ci=-1.3)
}
What I want to do is to run the above for each row of the dataframe. When I run the above example, I get the following error:
Error: $ operator is invalid for atomic vectors
Called from: f(theta, ...)
I looked for a solution and found this, but it does not seem to apply to my case (R $ operator is invalid for atomic vectors in constraOptim).
Please help. Thanks in advance.
This produces a result without error. Changes to your code include:
Added Value_null to the data.frame.
Changed the function argument to costs and modified the function after the matrix multiplication.
Saved the results to zzz as a list instead of overwriting a single object.
If you design this as a matrix in the first place, you could utilize apply.
df <- data.frame(Obs=c(1,2,3,4,5)
,Var1=c(11,15,16,19,20)
,Var2=c(1.5,22,0.9,1.7,.1)
,Var3=c(2.6,2.5,3.5,3.6,2.1)
,Value_One = c(10,12.5,8.4,7.5,2.6)
,Cost = c(1.1,1.2,1.3,1.6,1.7)
,Value_overall = c(10,21,31,4,29)
#added to match
,Value_null = 5
)
var_sel=c('Var1','Var2')
coeff_sel=c(2.5,4.5)
gamma=.7
Value_func = function(costs){
# negative sign, since constrOptim performs minimisation
-((x$Value_overall+gamma*(x$Value_null-
(as.matrix(x[var_sel])%*%(as.matrix(coeff_sel)))))-2*costs)
}
zzz <- list()
for (i2 in 1:nrow(df)){
  x = df[i2, ]
  zzz[[i2]] = constrOptim(1, Value_func, NULL, ui = 1, ci = -1.3, x$Cost)
}
Or the apply approach. I don't like that I'm assigning x <<- z but it gives results.
Value_func = function(costs){
# negative sign, since constrOptim performs minimisation
-((x['Value_overall']+gamma*(x['Value_null']-
(x[var_sel]%*%(coeff_sel))))-2*costs)
}
apply(df, 1, function(z) {
  x <<- z
  constrOptim(1, Value_func, NULL, ui = 1, ci = -1.3, z['Cost'])
})
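If you want to avoid the x <<- z assignment altogether, one variation (a sketch using the same made-up values as above) is to build the objective as a closure over the current row inside the apply call. Because apply() hands each row over as a named numeric vector, plain element-wise multiplication replaces the matrix product.
results <- apply(df, 1, function(z) {
  # the objective closes over this row, so no global x is needed
  obj <- function(costs) {
    -((z["Value_overall"] + gamma * (z["Value_null"] -
         sum(z[var_sel] * coeff_sel))) - 2 * costs)
  }
  # NULL gradient means Nelder-Mead is used, as in the calls above
  constrOptim(1, obj, NULL, ui = 1, ci = -1.3)
})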
I have the following MATLAB code and I'm working on translating it to R:
nproc=40
T=3
lambda=4
tarr = zeros(1, nproc);
i = 1;
while (min(tarr(i,:))<= T)
tarr = [tarr; tarr(i, :)-log(rand(1, nproc))/lambda];
i = i+1;
end
tarr2=tarr';
X=min(tarr2);
stairs(X, 0:size(tarr, 1)-1);
It is the Poisson process from the renewal-process perspective. I've done my best in R, but something is wrong in my code:
nproc<-40
T<-3
lambda<-4
i<-1
tarr=array(0,nproc)
lst<-vector('list', 1)
while(min(tarr[i]<=T)){
tarr<-tarr[i]-log((runif(nproc))/lambda)
i=i+1
print(tarr)
}
tarr2=tarr^-1
X=min(tarr2)
plot(X, type="s")
The loop prints an aleatory number of arrays and only the last is saved by tarr after it.
The result has to look like...
Thank you in advance. All interesting and supportive comments will be rewarded.
Adding on to the previous comment, there are a few things happening in the Matlab script that are not in the R code:
[tarr; tarr(i, :)-log(rand(1, nproc))/lambda]: from my understanding, you are adding another row to your matrix and populating it with tarr(i, :)-log(rand(1, nproc))/lambda.
You will need to use a different method as Matlab and R handle this type of thing differently.
One glaring thing that stands out to me is that you seem to be treating R's tarr[i] and Matlab's tarr(i, :) as equivalents, but they are very different. What I think you are trying to achieve is all the columns of a given row i, which in R would be tarr[i, ].
The use of min is also different: R's min() returns the minimum of the whole matrix (a single number), while Matlab's min() returns the minimum of each column. For this in R you can use Rfast::colMins from the Rfast package.
I am not very familiar with the stairs part, but something like ggplot2::qplot(..., geom = "step") may work.
Now, I have tried to create something that works in R, but I am not really sure what the required output is. Nevertheless, hopefully some of the basics can help you get it done on your side. Below is a quick attempt!
nproc <- 40
T0 <- 3
lambda <- 4
i <- 1
tarr <- matrix(rep(0, nproc), nrow = 1, ncol = nproc)
while(min(tarr[i, ]) <= T0){
# Major alteration, create a temporary row from previous row in tarr
temp <- matrix(tarr[i, ] - log((runif(nproc))/lambda), nrow = 1)
# Join temp row to tarr matrix
tarr <- rbind(tarr, temp)
i = i + 1
}
# I am not sure what was meant by tarr' in the Matlab script; I took it as the inverse of tarr,
# which in Matlab would be tarr.^(-1)??
tarr2 = tarr^(-1)
library(ggplot2)
library(Rfast)
min_for_each_col <- colMins(tarr2, value = TRUE)
qplot(seq_along(min_for_each_col), sort(min_for_each_col), geom="step")
As you can see, I have sorted min_for_each_col so that the plot is actually a stair plot and not some random stepwise plot. I think there is still a discrepancy, since in the Matlab code 0:size(tarr, 1)-1 gives the number of rows less 1, but I can't figure out why, if grabbing colMins (and there are 40 columns), we would create around 20 steps. But I might be completely misunderstanding! Also, I have changed T to T0, since in R T exists as TRUE and is not good to overwrite!
Hope this helps!
I downloaded GNU Octave today so I could actually run the Matlab code. After watching the code run, I made a few tweaks to the great answer by @Croote
nproc <- 40
T0 <- 3
lambda <- 4
i <- 1
tarr <- matrix(rep(0, nproc), nrow = 1, ncol = nproc)
while(min(tarr[i, ]) <= T0){
temp <- matrix(tarr[i, ] - log(runif(nproc))/lambda, nrow = 1) #fixed paren
tarr <- rbind(tarr, temp)
i = i + 1
}
tarr2 = t(tarr) #takes transpose
library(ggplot2)
library(Rfast)
min_for_each_col <- colMins(tarr2, value = TRUE)
qplot(seq_along(min_for_each_col), sort(min_for_each_col), geom="step")
Edit: Some extra plotting tweaks -- this seems to be closer to the original
qplot(min_for_each_col, seq_along(min_for_each_col), geom = "step", ylab = "", xlab = "")
#or with ggplot2
df1 <- data.frame(min_for_each_col, index = seq_along(min_for_each_col))
ggplot() +
geom_step(data = df1, mapping = aes(x = min_for_each_col, y = index), color = "blue") +
labs(x = "", y = "")
I'm not too familiar with renewal processes or matlab so bear with me if I misunderstood the intention of your code. That said, let's break down your R code step by step and see what is happening.
The first 4 lines assign numbers to variables.
The fifth line creates an array with 40 (nproc) zeros.
The sixth line (which doesn't seem to be used later) creates an empty vector with mode 'list'.
The seventh line starts a while loop. I suspect this line is supposed to say while the min value of tarr is less than or equal to T ...
or it's supposed to say while i is less than or equal to T ...
It actually takes the minimum of a single boolean value (tarr[i] <= T). Now this can work because TRUE and FALSE are treated like numbers. Namely:
TRUE == 1 # returns TRUE
FALSE == 0 # returns TRUE
TRUE == 0 # returns FALSE
FALSE == 1 # returns FALSE
However, since the value of tarr[i] depends on a random number (see line 8), this could lead to the same code running differently each time it is executed. This might explain why the code "prints an aleatory number of arrays".
The eighth line seems to overwrite the assignment of tarr with the computation on the right. It takes the single value tarr[i] and subtracts from it log(runif(nproc)/lambda), which gives 40 different values. These forty values from the last pass through the loop are what end up stored in tarr.
If you want to store all forty values from each pass through the loop, I'd suggest storing them in, say, a matrix or data frame instead. If that's what you want to do, here's an example of storing it in a matrix:
for(i in 1:nrow(yourMatrix)){
  # computations that produce one row per iteration
  yourMatrix[i, ] <- rowCreatedByComputations
}
See this answer for more info about that. Also, since it's a set number of values per run, you could keep them in a vector and simply append to the vector each loop like this:
vector <- c(vector,newvector)
The ninth line increases i by one.
The tenth line prints tarr.
The eleventh line closes the loop statement.
Then after the loop tarr2 is assigned 1/tarr. Again this will be 40 values from the last time through the loop (line 8)
Then X is assigned the min value of tarr2.
This single value is plotted in the last line.
Also note that runif samples from the uniform distribution -- if you're looking for a Poisson distribution see: Poisson
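For what it's worth, the -log(runif(nproc))/lambda construction in the original code is the inverse-transform way of drawing exponential inter-arrival times (which is what a Poisson process is built from), so in R it is equivalent in distribution to rexp():
lambda <- 4
nproc <- 40
u_draw <- -log(runif(nproc)) / lambda   # inverse-transform sampling of Exp(rate = lambda)
e_draw <- rexp(nproc, rate = lambda)    # the built-in equivalent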
Hope this helped! Let me know if there's more I can do to help.
I am trying to compare mean similarity between 3 subsets of data using the com.sim function (simba-package), but I’m having trouble getting the function to ignore missing values and correctly run the analysis.
Some background on my data and what I've done so far: my data is binary, but unlike the kinds of data for which the function is written, I'm working with skeletal remains, which are typically incomplete and fragmented. Thus, ~10% of my data matrix has missing values.
When I run this command in R
com.sim(mydata, subs, simil = "jaccard", binary = TRUE, permutations = 1000, alpha = 0.05, bonfc = TRUE)
I get the following error message:
Error in diffmean(as.numeric(sim(veg[subs == (comb[x, 1]), ], method = simil)), :
There are NA values. Consider setting na.rm accordingly
I subsequently modified the code of the function to the following (modification: the na.rm = TRUE added to the diffmean call):
if (binary) {
    tmp <- lapply(c(1:nrow(comb)), function(x) diffmean(
        as.numeric(sim(veg[subs == (comb[x, 1]), ], method = simil)),
        as.numeric(sim(veg[subs == (comb[x, 2]), ], method = simil)),
        na.rm = TRUE))
Now the function runs, but it is excluding all cases with at least one missing value (which is nearly half the data set!). It seems to be deleting cases with NA listwise, whereas I'd prefer pairwise deletion, so that similarity coefficients can still be calculated between cases with missing values (just excluding the variables with NA from that pair's calculation), as sketched below. Is there any way to accomplish this within com.sim? I know other functions such as simil (proxy package) can handle missing values when calculating a matrix of Jaccard coefficients, but it seems the sim functions in simba weren't built this way.
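To make concrete what I mean by pairwise deletion, here is a toy sketch I cobbled together (not simba code, and the data are made up): for each pair of cases, only the variables observed in both enter the Jaccard calculation.
m <- matrix(c(1, 0, 1, NA,
              1, 1, NA, 0,
              0, 1, 1, 1), nrow = 3, byrow = TRUE)

jaccard_pairwise <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)             # variables observed in both cases
  sum(a[ok] & b[ok]) / sum(a[ok] | b[ok])
}
jaccard_pairwise(m[1, ], m[2, ])           # uses only the first two variables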
I have zero coding experience (is it obvious?) and so I would appreciate any help or advice on options to pursue!
Thank you very much, and please let me know if I can provide additional information.
Best,
Matt