Plotting loess with standard errors in R causes integer overflow

I am attempting to use predict with a loess object in R. There are 112406 observations. There is one particular line inside stats:::predLoess which attempts to multiply N*M1, where N = M1 = 112406. This causes an integer overflow and the function bombs out. The line of code that does this is the following (copied from the predLoess source):
L <- .C(R_loess_ise, as.double(y), as.double(x), as.double(x.evaluate[inside,
]), as.double(weights), as.double(span), as.integer(degree),
as.integer(nonparametric), as.integer(order.drop.sqr), as.integer(sum.drop.sqr),
as.double(span * cell), as.integer(D), as.integer(N), as.integer(M1),
double(M1), L = double(N * M1))$L
Has anyone solved this or found a solution to this problem? I am using R 2.13. The name of this forum is fitting for this problem.

It sounds like you're trying to get predictions for all N=112406 observations. First, do you really need to do this? For example, if you want graphical output, it's faster just to get predictions on a small grid over the range of your data.
If you do need 112406 predictions, you can split your data into subsets (say of size 1000 each) and get predictions on each subset independently. This avoids forming a single gigantic matrix inside predLoess.
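A minimal sketch of both suggestions, using simulated data in place of the 112406 observations. The variable names, the grid size of 100, and the chunk size of 500 are illustrative assumptions, not values from the question.

```r
# Simulated stand-in for the real data (assumption: one predictor x, response y).
set.seed(1)
n <- 2000
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)

fit <- loess(y ~ x)

# Option 1: predict with standard errors on a small grid, enough for a plot.
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
p <- predict(fit, newdata = grid, se = TRUE)

plot(x, y, pch = ".", col = "grey")
lines(grid$x, p$fit)
lines(grid$x, p$fit + 2 * p$se.fit, lty = 2)  # fit plus 2 standard errors
lines(grid$x, p$fit - 2 * p$se.fit, lty = 2)  # fit minus 2 standard errors

# Option 2: predict at every observation, one chunk of rows at a time.
chunks <- split(seq_len(n), ceiling(seq_len(n) / 500))
pred_list <- lapply(chunks, function(idx) {
  predict(fit, newdata = data.frame(x = x[idx]), se = TRUE)
})
fit_all <- unlist(lapply(pred_list, `[[`, "fit"), use.names = FALSE)
se_all  <- unlist(lapply(pred_list, `[[`, "se.fit"), use.names = FALSE)
```

In both cases the number of evaluation points per call (M1 in the predLoess source quoted in the question) stays small, so N * M1 stays far below the integer limit.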

Related

How to use escalc function?

I am working on a meta-analysis using the metafor package. I want to calculate effect sizes with the package but am running into some trouble. I have a file with ~200 rows containing the control/test means, variances, and sample sizes. For each row I would like to calculate the effect size using the escalc function with the SMD measure.
My current code is as follows:
# escalc function
escalc <- function(measure, ai, bi, ci, di, n1i, n2i, x1i, x2i, t1i, t2i, m1i, m2i, sd1i, sd2i, xi, mi, ri, ti, sdi, r2i, ni, yi, vi, sei,
data, slab, subset, include, add=1/2, to="only0", drop00=FALSE, vtype="LS", var.names=c("yi","vi"), add.measure=FALSE, append=TRUE, replace=TRUE, digits, ...)
# apply data and add effect size col to data frame
data$ES <- escalc(measure = SMD, dat$MRE1, dat$MTE2, dat$VRE1, dat$VTE2, dat$NR1, dat$NR2, data = dat)
When I run this code once there seems to be no problem/error (if I run the code more than once it says "Error: C stack usage 15925888 is too close to the limit" - unsure what this means) but my dataframe does not have a new column with the ES for each study. When I highlight the new variable and click enter (to see what the data looks like) it says NULL so I don't think it actually ran. How can I get a summary of the effect sizes?
I am unsure what I am doing wrong or how to see what the effect sizes I've calculated are. I've been reading the metafor documentation and am unsure what I am doing wrong (https://cran.r-project.org/web/packages/metafor/metafor.pdf). Do I need to calculate escalc for each paper? Any help is greatly appreciated.
Thank you!
You should use:
dat <- escalc(measure="SMD", m1i=MRE1, m2i=MTE2, sd1i=sqrt(VRE1), sd2i=sqrt(VTE2), n1i=NR1, n2i=NR2, data=dat)
Note that arguments sd1i and sd2i take the SDs, so if you have the variances, you need to take their square roots.
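A small sketch of that call on made-up numbers, assuming metafor is installed. The column names follow the question, but the three rows of data are invented for illustration. Note that in the signature quoted in the question the first positional arguments after measure are ai, bi, ci, di, so the means, SDs and sample sizes have to be passed by name.

```r
# Toy data (invented values): means, variances and sample sizes per study.
dat <- data.frame(
  MRE1 = c(10.2, 8.5, 12.1), MTE2 = c(9.1, 8.9, 10.4),
  VRE1 = c(4.00, 2.25, 6.25), VTE2 = c(3.61, 2.56, 5.76),
  NR1  = c(20, 35, 15),       NR2  = c(22, 30, 15)
)

if (requireNamespace("metafor", quietly = TRUE)) {
  # escalc returns the data frame with the effect sizes (yi) and their
  # sampling variances (vi) appended, so assign the result back to dat.
  dat <- metafor::escalc(measure = "SMD",
                         m1i = MRE1, m2i = MTE2,
                         sd1i = sqrt(VRE1), sd2i = sqrt(VTE2),
                         n1i = NR1, n2i = NR2, data = dat)
  print(dat)           # one row per study, with yi and vi
  print(summary(dat))  # adds standard errors and confidence intervals
}
```

There is no need to call escalc once per paper: the arguments are vectors, so all rows are handled in a single call.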

Xgboost - how to make a custom loss function which depends on the value of another column, as well as the error

I am having an issue implementing recency weighting for xgboost training in R (i.e. passing a weight vector to xgb.DMatrix). Although the weighting affects the learning-curve readout for the training set, it does not appear to have any impact on the model actually produced: performance on the test set is identical.
I can't seem to get to the bottom of this issue or generate a reproducible example. So instead I would like to pass the Date column of the features to a custom loss function, something like:
custom_loss <- function(preds,dat) {
labels <- getinfo(dat,"label")
dates <- [a vector corresponding to the dates associated with each prediction]
grad = f(dates)*-2*(labels - preds)
hess = f(dates)*2
[where f is an increasing function of the value in dates, so later samples matter more when training]
return(list(grad=grad,hess=hess))
}
But I can't seem to figure out how to do this, any suggestions?
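One possible approach, sketched under assumptions rather than tested against real training data: keep the dates outside the DMatrix and let the objective capture them in a closure. The helper names recency_weight, weighted_sq_grad and make_custom_loss, and the half-life weighting, are hypothetical. The sketch also assumes that preds arrive in the same row order as the dates vector.

```r
# Hypothetical weighting function f: newer dates get larger weights.
recency_weight <- function(dates, half_life_days = 365) {
  age <- as.numeric(max(dates) - dates)  # age in days relative to newest row
  0.5 ^ (age / half_life_days)
}

# Gradient and hessian of w * (label - pred)^2 with respect to pred.
weighted_sq_grad <- function(preds, labels, w) {
  list(grad = w * -2 * (labels - preds),
       hess = w * 2)
}

# Build an objective with the usual (preds, dat) signature that
# remembers the per-row weights derived from dates.
make_custom_loss <- function(dates) {
  w <- recency_weight(dates)
  function(preds, dat) {
    labels <- xgboost::getinfo(dat, "label")
    stopifnot(length(preds) == length(w))  # guards against row mismatch
    weighted_sq_grad(preds, labels, w)
  }
}
```

The function returned by make_custom_loss(dates) could then be supplied as the custom objective when training, with dates being the Date column for the training rows only.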

Low-pass filtering of a matrix

I'm trying to write a low-pass filter in R, to clean a "dirty" data matrix.
I did a google search, came up with a dazzling range of packages. Some apply to 1D signals (time series mostly, e.g. How do I run a high pass or low pass filter on data points in R? ); some apply to images. However I'm trying to filter a plain R data matrix. The image filters are the closest equivalent, but I'm a bit reluctant to go this way as they typically involve (i) installation of more or less complex/heavy solutions (imageMagick...), and/or (ii) conversion from matrix to image.
Here is sample data:
r<-(0:360)/360*(2*pi)
x<-cos(r)
y<-sin(r)
z<-outer(x,y,"*")
noise<-0.3*matrix(runif(length(x)*length(y)),nrow=length(x))
zz<-z+noise
image(zz)
What I'm looking for is a filter that will return a "cleaned" matrix (i.e. something close to z, in this case).
I'm aware this is a rather open-ended question, and I'm also happy with pointers ("have you looked at package so-and-so"), although of course I'd value sample code from users with experience on signal processing !
Thanks.
One option is to fit a non-linear regression model and use its fitted values as the cleaned signal.
For example, a polynomial regression fitted to a single column gives a smooth approximation of the original data (the purple curve in the plot that accompanied this answer).
Following the same logic, you can do the same thing for all columns of the zz matrix:
# Fit a degree-2 polynomial to each column and collect the fitted values
predictions <- matrix(, nrow = nrow(zz), ncol = 0)
for (i in 1:ncol(zz)) {
  pred <- as.matrix(fitted(lm(zz[, i] ~ poly(1:nrow(zz), 2, raw = TRUE))))
  predictions <- cbind(predictions, pred)
}
Then you can plot the predictions,
par(mfrow=c(1,3))
image(z,main="Original")
image(zz,main="Noisy")
image(predictions,main="Predicted")
Note that I used a polynomial regression of degree 2; you can change the degree for a better fit across the columns. Alternatively, you can use other, more powerful non-linear prediction methods (SVM, ANN, etc.) to get a more accurate model.
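As a different technique from the regression above, a plain low-pass filter can be sketched in base R as a separable moving average built on stats::filter, applied down the columns and then along the rows. The function name smooth_matrix and the window width k = 15 are assumptions for illustration; circular = TRUE is used only because the sample data is periodic, and would need rethinking for non-periodic data.

```r
# Separable moving-average (box) filter for a numeric matrix.
smooth_matrix <- function(m, k = 15) {
  stopifnot(k %% 2 == 1)  # odd window so it is centred on each cell
  w <- rep(1 / k, k)
  ma <- function(v) as.numeric(stats::filter(v, w, sides = 2, circular = TRUE))
  by_col <- apply(m, 2, ma)     # smooth each column
  t(apply(by_col, 1, ma))       # then smooth each row
}

# Sample data in the spirit of the question.
set.seed(42)
r <- (0:360) / 360 * (2 * pi)
z <- outer(cos(r), sin(r), "*")
zz <- z + 0.3 * matrix(runif(length(r)^2), nrow = length(r))

zs <- smooth_matrix(zz)

par(mfrow = c(1, 3))
image(z,  main = "Original")
image(zz, main = "Noisy")
image(zs, main = "Moving average")
```

A larger k removes more noise but also flattens the underlying signal more. Because the uniform noise in the sample data has a non-zero mean, the smoothed matrix keeps that constant offset.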

Converting R script to SAS

I want to add noise to a dataset. This is a fairly straightforward procedure in R. I sample from a Laplace distribution and then add/multiply/whatever that vector to the vector I want to add noise to.
The issue is, my colleague is asking for the code in SAS. I have not used SAS since graduate school and my project has been put on hold until I can get my colleague up to speed in SAS.
My code is pretty simple :
library ("rmutil")
vector <- c (1,2,3,1,2,3,1,2,3)
vector_prop <- vector/sum(vector)
noise <- rlaplace(9, m=1, s=.1)
new_vector <- vector_prop * noise
I am turning my vector I want to add noise to into a proportion, then drawing from a laplace distribution. Finally I multiply those draws with my proportion vector.
Any idea would be helpful as the SAS documentation was difficult to follow. I imagine they feel the same way with R documentation.
Assuming your data is in a data set called have with a variable called vector_prop, the following code should work. Because of the nature of random number streams, it will not reproduce the values drawn in R, but don't you end up with a different data set each time anyway?
data want;
set have;
call streaminit(24); *fixes random number stream for reproducibility;
new_var = vector_prop * rand('laplace', 1, 0.1);
run;
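To sanity-check the SAS output from the R side without depending on rmutil, here is a base-R sketch of Laplace sampling by inverse CDF. The function name rlaplace_base is hypothetical, and it assumes the location/scale parameterization with density exp(-|x - m| / s) / (2 s), which is my understanding of what both rmutil::rlaplace and rand('laplace', ...) use.

```r
# Draw n Laplace(m, s) values by inverting the CDF of a centred uniform.
rlaplace_base <- function(n, m = 0, s = 1) {
  u <- runif(n, -0.5, 0.5)
  m - s * sign(u) * log(1 - 2 * abs(u))
}

set.seed(24)
vector <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
vector_prop <- vector / sum(vector)
noise <- rlaplace_base(length(vector), m = 1, s = 0.1)
new_vector <- vector_prop * noise

# Under this parameterization the mean is m and the variance is 2 * s^2,
# which gives two summary statistics to compare between R and SAS.
big <- rlaplace_base(1e5, m = 1, s = 0.1)
c(mean = mean(big), var = var(big))
```

The individual draws will not match between the two systems, so comparing the mean and variance of a large sample is the practical check.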

Plotting coda mcmc objects giving error in plot.new

I have an R package I am working on that returns output from a Metropolis-Hastings sampler. The output consists of, among other things, matrices where the columns are the variables and the rows are the samples from the posterior. I convert these into coda mcmc objects with this code:
colnames(results$beta) = x$data$Pops
results$beta = mcmc(results$beta, thin = thin)
where thin is 183 and beta is a 21 x 15 matrix (this is a toy example). The summary.mcmc method works fine, but plot.mcmc gives me:
Error in plot.new() : figure margins too large
I have done a bit of debugging. All the values are finite, there are no NA's, the limits of the axes seem to be being set okay, and there are enough panels (2 plots each with 4 rows and 2 columns) I think. Is there something I am missing in the coercion into the mcmc object?
Package source and all associated files can be found on http://github.com/jmcurran/rbayesfst. A script which will produce the error quickly is in the unexported function mytest, so you'll need
rbayesfst:::mytest()
to get it to run.
There has been a suggestion that this has been answered already in this question, but I would like to point out that it is not me setting any of the par values, but plot.mcmc. So my question is not about par or plot, but about what (if anything) I am doing wrong in turning a matrix into an mcmc object that cannot be plotted by plot.mcmc. It can't be the size of the matrix, because I have had examples with many more dimensions directly from rjags that worked fine.
