How to use escalc function? - r

I am working on a meta-analysis using the metafor package and want to calculate effect sizes with the escalc function, but I am running into some trouble. I have a file with ~200 rows containing the control/test means, variances, and sample sizes, and for each row I would like to calculate the effect size using the SMD measure.
My current code is as follows:
# escalc function
escalc <- function(measure, ai, bi, ci, di, n1i, n2i, x1i, x2i, t1i, t2i, m1i, m2i, sd1i, sd2i, xi, mi, ri, ti, sdi, r2i, ni, yi, vi, sei,
data, slab, subset, include, add=1/2, to="only0", drop00=FALSE, vtype="LS", var.names=c("yi","vi"), add.measure=FALSE, append=TRUE, replace=TRUE, digits, ...)
# apply data and add effect size col to data frame
data$ES <- escalc(measure = SMD, dat$MRE1, dat$MTE2, dat$VRE1, dat$VTE2, dat$NR1, dat$NR2, data = dat)
When I run this code once there seems to be no problem/error (if I run it more than once it says "Error: C stack usage 15925888 is too close to the limit" - I'm unsure what this means), but my data frame does not get a new column with the ES for each study. When I print the new variable to see what the data looks like, it says NULL, so I don't think it actually ran. How can I get a summary of the effect sizes?
I am unsure what I am doing wrong or how to see the effect sizes I've calculated. I've been reading the metafor documentation (https://cran.r-project.org/web/packages/metafor/metafor.pdf) but am still stuck. Do I need to call escalc separately for each paper? Any help is greatly appreciated.
Thank you!

First, don't copy the function signature from the help page into your script: that redefines escalc as a function whose body is the very next expression (your data$ES <- escalc(...) line), so running the code only defines a function (hence the NULL column), and calling it afterwards recurses until you hit the C stack limit. Just call the escalc function that metafor provides, with named arguments:
dat <- escalc(measure="SMD", m1i=MRE1, m2i=MTE2, sd1i=sqrt(VRE1), sd2i=sqrt(VTE2), n1i=NR1, n2i=NR2, data=dat)
Note that the SDs are the input for the arguments sd1i and sd2i, so if you have the variances, you need to take their square roots.
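With the default append = TRUE, escalc() returns your data frame with two new columns: yi (the SMD for each row) and vi (its sampling variance). So once the call above runs, you can inspect them directly, e.g.:
head(dat)        # original columns plus the new yi and vi columns
summary(dat$yi)  # quick numeric summary of the computed effect sizes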

Related

How to estimate a parameter in R - sample with replacement

I have a txt file with numbers that looks like this (but with 100 numbers) -
[1] 7.1652348 5.6665965 4.4757553 4.8497086 15.2276296 -0.5730937
[7] 4.9798067 2.7396933 5.1468304 10.1221489 9.0165661 65.7118194
[13] 5.5205704 6.3067488 8.6777177 5.2528503 3.5039562 4.2477401
[19] 11.4137624 -48.1722034 -0.3764006 5.7647536 -27.3533138 4.0968204
I need to estimate the MLE of the theta parameter for this distribution -
[image: the density f(x | theta) from the original question]
I need to estimate theta from a sample of 1000 observations drawn with replacement, save the sample, and plot a histogram.
How can I estimate theta from my sample? I have no information about a normal distribution.
I wrote something like this -
data<-read.table(file.choose(), header = TRUE, sep= "")
B <- 1000
sample.means <- numeric(data)
sample.sd <- numeric(data)
for (i in 1:B) {
MySample <- sample(data, length(data), replace = TRUE)
sample.means <- c(sample.means,mean(MySample))
sample.sd <- c(sample.sd,sd(MySample))
}
sd(sample.sd)
but it doesn't work.
This question incorporates multiple different ones, so let's tackle each step by step.
First, you will need to draw a random sample from your population (with replacement). Assuming your 100 population observations sit in a vector named pop:
rs <- sample(pop, 1000, replace = TRUE)
gives you your vector of random samples. If you want to save it, you can write it to disk in several formats; see this related question (How to Export/Import Vectors in R?).
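For instance (the file names here are just placeholders):
saveRDS(rs, "random_sample.rds")                                             # compact R-native format
write.table(rs, "random_sample.txt", row.names = FALSE, col.names = FALSE)   # plain text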
In a second step, you can use the mle() function from the stats4 package (https://stat.ethz.ch/R-manual/R-devel/library/stats4/html/mle.html) and specify the objective function explicitly.
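A minimal sketch of how that looks, assuming (purely as a placeholder, since your density is only given as an image) a Cauchy density with location theta:
library(stats4)
negloglik <- function(theta) -sum(dcauchy(rs, location = theta, log = TRUE))  # negative log-likelihood
fit <- mle(negloglik, start = list(theta = median(rs)))                       # rs is the resampled vector from above
coef(fit)
The only part you would change for your actual task is the body of negloglik, replacing dcauchy with the density f(x|theta) from your assignment.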
However, the second part of your question is more of a statistical/conceptual question than an R question, IMO.
Try to understand what MLE actually does. You do not need normally distributed variables. The idea behind MLE is to choose theta such that, under the resulting distribution, the observed sample is the most probable. See https://en.wikipedia.org/wiki/Maximum_likelihood_estimation for more details, or some YouTube videos if you'd like a more intuitive approach.
I assume that the description of your task states that f(x|theta) is the conditional joint density function and that the observations x are iid?
What you want to do in this case is to select theta such that the squared difference between the observations x and the parameter theta is minimized.
For your statistical understanding, in such cases it makes sense to log-transform the likelihood instead of dealing with the non-linear product directly.
Minimizing the squared difference is equivalent to maximizing the log-transformed function, since the sum enters with a negative sign (the product was in the denominator) and the log, as well as the +1, are monotone transformations.
This leaves you with the maximization problem:
[equation image from the original answer]
And the first-order condition:
[equation image from the original answer]
Obviously, you would also have to check that you are actually dealing with a maximum via the second-order condition but I'll omit that at this stage for simplicity.
The algorithm in R does nothing more than solve this maximization problem.
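As a quick numeric illustration of that last point (the sample below is simulated only to have something to run; under the squared-error criterion the maximizer is simply the sample mean):
set.seed(1)
x <- rnorm(100, mean = 5)                   # placeholder sample
obj <- function(theta) -sum((x - theta)^2)  # the squared-error objective
optimize(obj, interval = range(x), maximum = TRUE)$maximum
mean(x)                                     # the closed-form solution of the first-order condition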
Hope this helps for your understanding. Maybe some smarter people can give some additional input.

Xgboost - how to make a custom loss function which depends on the value of another column, as well as the error

I am having an issue implementing recency weighting for xgboost training in R (i.e. passing a weight vector to xgb.DMatrix): although the weighting affects the learning-curve readout for the training set, it does not appear to have any impact at all on the actual model produced - performance on the test set is identical.
I can't seem to get to the bottom of this issue or generate a reproducible example. So instead I would like to pass the Date column of the features to a custom loss function, something like:
custom_loss <- function(preds, dat) {
  labels <- getinfo(dat, "label")
  dates <- [a vector corresponding to the dates associated with each prediction]
  grad <- f(dates) * -2 * (labels - preds)
  hess <- f(dates) * 2
  [where f is an increasing function of the value in dates, so later samples matter more when training]
  return(list(grad = grad, hess = hess))
}
But I can't seem to figure out how to do this. Any suggestions?
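One possible workaround, sketched below: getinfo() only exposes fields such as "label", "weight" and "base_margin", so the dates have to come from outside the DMatrix; you can build the objective inside a closure that captures a date vector aligned row-for-row with the training matrix. The names dates_numeric and make_recency_obj and the weighting formula are illustrative assumptions, not part of the xgboost API.
library(xgboost)

make_recency_obj <- function(dates_numeric) {
  # rescale dates to increasing weights in [0.5, 1.5]; any increasing f() would do
  recency <- (dates_numeric - min(dates_numeric)) /
             (max(dates_numeric) - min(dates_numeric)) + 0.5
  function(preds, dtrain) {
    labels <- getinfo(dtrain, "label")
    grad <- recency * -2 * (labels - preds)  # recency-weighted squared-error gradient
    hess <- recency * 2                      # recency-weighted hessian
    list(grad = grad, hess = hess)
  }
}

# dtrain <- xgb.DMatrix(data = X, label = y)   # X, y assumed to exist, rows in the same order as dates_numeric
# bst <- xgb.train(params = list(max_depth = 4, eta = 0.1), data = dtrain,
#                  nrounds = 100, obj = make_recency_obj(dates_numeric))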

Weibull distribution with weighted data

I have some time-to-event data from which I need to generate around 200 shape/scale parameter pairs for subgroups in a simulation model. I have analysed the data, and it best follows a Weibull distribution.
Normally, I would use the fitdistrplus package and fitdist(x, "weibull") to do so; however, this data has been matched using kernel matching and I have a variable of weighting values called km, so the fit needs to incorporate the weights, which isn't something fitdist can do as far as I can tell.
With my gamma-distributed data, instead of using fitdist I did the calculation manually using the wtd.mean and wtd.var functions from the Hmisc package, which worked well. However, a similar formula for the Weibull is eluding me.
I've been testing a few options and comparing them against the fitdist results:
test_data <- rweibull(100, 0.676, 946)
fitweibull <- fitdist(test_data, "weibull", method = "mle", lower = c(0,0))
fitweibull$estimate
shape scale
0.6981165 935.0907482
I first tested this: The Weibull distribution in R (ExtDist)
library(bbmle)
m1 <- mle2(y~dweibull(shape=exp(lshape),scale=exp(lscale)),
data=data.frame(y=test_data),
start=list(lshape=0,lscale=0))
which gave me lshape = -0.3919991 and lscale = 6.852033
The other thing I've tried is eweibull from the EnvStats package.
eweibull <- eweibull(test_data)
eweibull$parameters
shape scale
0.698091 935.239277
However, while these are giving results, I still don't think I can fit my data with the weights into any of these.
Edit: I have also tried the similarly named eWeibull from the ExtDist package (which I'm not 100% sure still works, but it does have a Weibull function that takes a weight!). I get a lot of error messages about the inputs being non-computable (NA or infinite). If I do it with map, i.e. map(test_data, test_km, eWeibull), I get NULL for all 100 values. If I try it just with test_data, I get a long string of errors associated with optimx.
I have also tried fitDistr from propagate, which gives errors that the weights should be a specific length. For example, if both are set to length 100, I get an error that weights should be length 94; if I set it to 94, it tells me it has to be length 132.
I need to be able to pass either a set of pre-weighted mean/var/sd etc data into the calculation, or have a function that can take data and weights and use them both in the calculation.
After much trial and error, I edited the eweibull function from the EnvStats package to use wtd.mean(x, w) and sqrt(wtd.var(x, w)) in place of mean(x) and sd(x). This now runs and outputs weighted values.
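An alternative that avoids editing package internals is to maximise the weighted Weibull log-likelihood directly with optim(). A minimal sketch, assuming x is the event-time vector and w the km weights:
wtd_weibull_mle <- function(x, w) {
  negloglik <- function(par) {
    shape <- exp(par[1]); scale <- exp(par[2])   # log-parameterised to keep both positive
    -sum(w * dweibull(x, shape = shape, scale = scale, log = TRUE))
  }
  fit <- optim(c(0, log(mean(x))), negloglik)
  c(shape = exp(fit$par[1]), scale = exp(fit$par[2]))
}
# Sanity check against fitdist with uniform weights:
# wtd_weibull_mle(test_data, rep(1, length(test_data)))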

Meta analysis from mean-effectsizes for overlapping samples

I am running a meta-analysis and use the metafor library to calculate Fisher z-transformed values from correlations.
meta1 <- escalc(ri=TESTR, ni=N, measure="ZCOR", data=subdata2)
As some of the studies I include in my meta-analysis overlap in samples (i.e. in study XY, 5 effect sizes are reported for the same N), I need to calculate means of the standardized z-values. To indicate overlapping samples, I gave all effect sizes IDs (in Excel) which are equal if the samples overlap.
To run the final meta-analysis, I would like R to average the standardized effect sizes within each ID.
So the idea is:
IF Effect_SIZE_ID (a variable) is the same in two lines of my df, then sum both effect sizes and divide by two (i.e. calculate the mean), and provide this result in a new column.
As I am a full newbie, please let me know if you require further specification!
Thank you so much in advance.
Leon
Have a look at the summaryBy command in the doBy package.
mymean <- summaryBy(SD_effect ~ ID, FUN = mean, data = data)
Should work in general (if you provide some sample data it is easy to check if that does what you need).
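For example, with made-up data (the column names ID and SD_effect are assumptions; adjust them to your own data frame):
library(doBy)

dat <- data.frame(ID        = c(1, 1, 2, 3, 3, 3),
                  SD_effect = c(0.2, 0.4, 0.5, 0.1, 0.3, 0.5))
summaryBy(SD_effect ~ ID, FUN = mean, data = dat)
#   ID SD_effect.mean
# 1  1            0.3
# 2  2            0.5
# 3  3            0.3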

plotting loess with standard errors in R causes integer overflow

I am attempting to use predict with a loess object in R. There are 112406 observations. One particular line inside stats:::predLoess attempts to multiply N*M1 where N = M1 = 112406. This causes an integer overflow and the function bombs out. The line of code that does this is the following (copied from the predLoess source):
L <- .C(R_loess_ise, as.double(y), as.double(x),
        as.double(x.evaluate[inside, ]), as.double(weights),
        as.double(span), as.integer(degree), as.integer(nonparametric),
        as.integer(order.drop.sqr), as.integer(sum.drop.sqr),
        as.double(span * cell), as.integer(D), as.integer(N),
        as.integer(M1), double(M1), L = double(N * M1))$L
Has anyone solved this or found a solution to this problem? I am using R 2.13. The name of this forum is fitting for this problem.
It sounds like you're trying to get predictions for all N=112406 observations. First, do you really need to do this? For example, if you want graphical output, it's faster just to get predictions on a small grid over the range of your data.
If you do need 112406 predictions, you can split your data into subsets (say of size 1000 each) and get predictions on each subset independently. This avoids forming a single gigantic matrix inside predLoess.
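A minimal sketch of both ideas, assuming the model was fitted as fit <- loess(y ~ x, data = dat):
# 1. Predict on a small grid, which is all you need for a plot with standard errors
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 200))
p <- predict(fit, newdata = grid, se = TRUE)
grid$fit <- p$fit
grid$se  <- p$se.fit

# 2. Or predict for every observation in chunks of 1000 rows
idx <- split(seq_len(nrow(dat)), ceiling(seq_len(nrow(dat)) / 1000))
all_preds <- unlist(lapply(idx, function(i) predict(fit, newdata = dat[i, ])))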
