Wrong risk ratio/relative risk calculation with escalc() from the metafor package for meta-analysis

I have a quick (and I'm sorry if silly) question.
I am using the metafor package and this function:
data <- escalc(measure = "RR", ai = 1, bi = 0, ci = 43, di = 443, append = TRUE)
to calculate the relative risk.
While my manual calculation gives 11.30 (log(RR) = 2.425), the escalc function returns 8.40 (log(RR) = 2.13).
I actually took these values from a research paper and know for a fact that 11.30 is the correct result, but I have no clue why this calculation gives 8.40.
Any help is very welcome.

When one of the cells is a 0, escalc applies the +1/2 continuity correction by default. If you don't want that, use add=0. See help(escalc).
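For illustration, a minimal sketch using the counts from the question; the numbers in the comments are the corresponding hand calculations:

library(metafor)

# Default: a table with a zero cell gets 1/2 added to every cell,
# so log(RR) = log((1.5/2) / (43.5/487)) ~ 2.13, i.e. RR ~ 8.40
escalc(measure = "RR", ai = 1, bi = 0, ci = 43, di = 443)

# Suppress the continuity correction to reproduce the hand calculation:
# log(RR) = log((1/1) / (43/486)) ~ 2.425, i.e. RR ~ 11.30
escalc(measure = "RR", ai = 1, bi = 0, ci = 43, di = 443, add = 0)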


There is something I don't understand about this plot

I am looking at this code. Previously, v-transforms were applied and VT-ARMA copula models were fitted; now it applies a Shapiro test to the residuals and wants to plot 4 graphs:
https://i.stack.imgur.com/gTtBU.png
These 4 plots should come out of plot(vtcop, plotoption=3) etc. I have never used this plotoption argument; I think it belongs to the tscopula package, but I have already searched the help and read the PDF that documents the tscopula package, and there is no such "plotoption".
Can anyone tell me why it reports "unused argument" at this point?
This code is from the paper by Alexander McNeil, "Modelling Volatile Time Series with V-Transforms and Copulas".
Thank you very much. Good day.

CCF - general problems

I am working on my bachelor thesis, where I want to look into the lagged cross-correlation of a time series of search query volumes (= x) with the price of bitcoin (= y).
I have already created several CCF plots using the ccf() function in R (plot image omitted).
I saw in the description of R's acf-function that ccf only works with one y and one x series. I was wondering if someone knows a way to put several of those plots into one, especially since I can categorize positively correlated and negatively correlated ones.
Further, I was wondering about the dashed blue line representing the confidence bound: at what level is it? 0.05? 0.01?
These are two questions in one.
Question 1: combining plots
This question has been asked before; please look it up (a minimal sketch follows the links below):
Combining plots created by R base, lattice, and ggplot2
Combine plots in R
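For the combining itself, a minimal sketch with base graphics' par(mfrow = ...) works for ccf plots; the series below (btc, q_pos, q_neg) are made-up stand-ins for the bitcoin price and a positively and a negatively correlated query volume:

set.seed(1)
btc   <- cumsum(rnorm(200))     # made-up price series
q_pos <- btc + rnorm(200)       # hypothetical positively related query volume
q_neg <- -btc + rnorm(200)      # hypothetical negatively related query volume

op <- par(mfrow = c(1, 2))      # 1 row, 2 columns of plots
ccf(q_pos, btc, main = "positively correlated")
ccf(q_neg, btc, main = "negatively correlated")
par(op)                         # restore the previous graphics settings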
Question 2: confidence intervals in the ccf plot
The plot gives you the confidence intervals. The manual advises caution with these, even though ci.type = "white" is the default setting. This default simply draws bounds based on the quantiles of a standard normal distribution; it does not take the statistical properties of your data into account, and in my opinion it is altogether useless. The manual recommends ci.type = "ma", but that only works for autocorrelations: if you try using it with cross-correlations, you will get the warning "can use ci.type=‘ma’ only if first lag is 0". For autocorrelations the first lag is 0, so the option works; ccf, whose lags run from -k to +k, does not satisfy this.
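To make that concrete: plot.acf (which also draws ccf plots) uses ci = 0.95 by default, and with ci.type = "white" the dashed lines sit at +/- qnorm((1 + ci)/2)/sqrt(n), where n is the number of observations used. A small sketch with made-up series reproducing those bounds:

set.seed(1)
x <- rnorm(200); y <- rnorm(200)                  # made-up series
cc <- ccf(x, y, plot = FALSE)

clim <- qnorm((1 + 0.95) / 2) / sqrt(cc$n.used)   # height of the dashed lines
plot(cc)                                          # default: ci = 0.95, ci.type = "white"
abline(h = c(-clim, clim), col = "red", lty = 3)  # coincides with the default band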
Further support
I hope it is not against the code of conduct to offer further support.
The ccf function has some peculiarities that aren't well explained in the manual. Since I had trouble with ccf myself, I wrote it all down here for everybody.
Because I wanted meaningful confidence intervals, I developed an improved version of 'ccf' (link to repository in case anyone is interested) myself. It offers confidence intervals, and the ccf object returned by the new function is compatible with the output of stats::ccf() but contains more information; additional functions make it more useful.

1 sample t-test from summarized data in R

I can perform a one-sample t-test in R with the t.test command, but this requires an actual set of data; I can't use summary statistics (sample size, sample mean, standard deviation). I can work around this using the BSDA package, but are there any other ways to accomplish this one-sample t-test in R without the BSDA package?
Many ways. I'll list a few:
directly calculate the p-value by computing the statistic and calling pt with that and the df as arguments, as commenters suggest above; it can be done with a single short line in R (ekstroem shows the two-tailed case; for the one-tailed case you wouldn't double it). See the sketch after this list.
alternatively, if it's something you need a lot, you could convert that into a nice robust function, even adding in tests against non-zero mu and confidence intervals if you like. Presumably if you go this route you'll want to take advantage of the functionality built around the htest class
(code and even a reasonably complete function can be found in the answers to this stats.SE question.)
If samples are not huge (smaller than a few million, say), you can simulate data with the exact same mean and standard deviation and call the ordinary t.test function. If m and s and n are the mean, sd and sample size, t.test(scale(rnorm(n))*s+m) should do (it doesn't matter what distribution you use, so runif would suffice). Note the importance of calling scale there. This makes it easy to change your alternative or get a CI without writing more code, but it wouldn't be suitable if you had millions of observations and needed to do it more than a couple of times.
call a function in a different package that will calculate it -- there are at least one or two other such packages (you don't make it clear whether using BSDA was a problem or whether you wanted to avoid packages altogether)
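A minimal sketch of the direct pt() calculation and the simulation-based approach, assuming made-up summary statistics m, s, n and a null value mu0:

# Assumed (made-up) summary statistics and null hypothesis
m <- 5.2; s <- 1.4; n <- 30; mu0 <- 5

# Direct calculation: t statistic and two-sided p-value via pt()
tstat <- (m - mu0) / (s / sqrt(n))
p.two.sided <- 2 * pt(-abs(tstat), df = n - 1)

# Simulation trick: data with exactly this mean and sd, then the ordinary t.test();
# scale() makes the sample mean and sd exact, so the generating distribution is irrelevant
x <- as.numeric(scale(rnorm(n))) * s + m
t.test(x, mu = mu0)     # its p-value matches p.two.sided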

How does R calculate the p-values in logistic regression

What type of p-values does R calculate in a binomial logistic regression, and where is this documented?
When I read the documentation for ?glm I find no reference to the calculation of the p-values.
The p-values are calculated by the function summary.glm. See ?summary.glm for a (very brief) bit about how those are calculated.
For more information, look at the source code by typing
summary.glm
at the R command prompt. There you will find the lines of code where an object pvalue is created. Follow the code back to see how the components of the p-value calculation are (conditionally) calculated.
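As an illustration (not part of the original answer), the Wald p-values that summary() reports for a binomial glm can be reproduced by hand; mtcars is used only as a convenient built-in data set:

fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))    # standard errors from the estimated covariance matrix
z   <- est / se                 # Wald z statistics
p   <- 2 * pnorm(-abs(z))       # two-sided p-values (normal reference, dispersion fixed at 1)

cbind(p, summary(fit)$coefficients[, "Pr(>|z|)"])   # the two columns agree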
The authors of R wrote the help system with several principles in mind: compactness (don't write more than is needed, it's not a textbook), accuracy, and a curious and well-educated audience. It really was written for other statisticians. The "curious" part of that opening sentence was included to raise the question of why you did not also follow the various links on the ?glm page: to summary.glm, where you would have found one answer to your ambiguous question, or to anova.glm, where you would have found another possible answer. The help authors do expect that you will follow those links, read the whole page, and execute the examples. You will notice that even after you get to summary.glm there is no mention of "binary logistic regression", since they pretty much assume that you are well grounded in statistics and have a copy of McCullagh and Nelder handy, or, if not, that you will go read the references.
The other principle: sometimes it is the code itself (given the open-source nature of R) that provides the documentation. Technically glm doesn't print anything and print.glm doesn't print p-values; it would be print.summary.glm or print.anova.glm doing any printing. Part of learning R is learning that the results printed to the console will have gone through an eval-print loop, and that output can be tailored with object-class-specific functions.
These assumptions are just part of what many people see as a "steep learning curve for R" (although I would have called it a shallow curve if plotted with time/effort on x-axis.)

m-estimate for continuous values

I'm building a custom regression tree and want to use the m-estimate for pruning.
Does anyone know how to calculate that?
http://www.ailab.si/blaz/predavanja/UISP/slides/uisp07-RegTrees.ppt might help (slide 12: what should Em look like?)
There are a lot of m-estimates. They all boil down to recasting your estimation problem as a minimization problem. If you use squared error as the function you're minimizing, you just get the sample mean. If you use the absolute value of the error, you get the sample median. The idea is to use a function that is a compromise between these two so that you get some of the efficiency of the mean and some of the robustness of the median.
Once you've picked your function, finding an m-estimate is just an optimization problem. So your question really boils down to one of finding optimization software. If your optimization problem is convex (and you can pick your m-estimator so that the problem is convex) then there's a lot of high quality software out there.
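As a sketch of that compromise (one possible choice of loss, not the only one), here is a Huber-type M-estimate of location found by direct one-dimensional minimization; huber_loss and m_estimate are just illustrative helper names:

# Huber loss: quadratic for small residuals (like the mean), linear for large ones (like the median)
huber_loss <- function(r, k = 1.345) {
  ifelse(abs(r) <= k, 0.5 * r^2, k * abs(r) - 0.5 * k^2)
}

# M-estimate of location: minimize the summed loss of the standardized residuals
m_estimate <- function(x, k = 1.345) {
  s   <- mad(x)                                  # robust scale estimate
  obj <- function(mu) sum(huber_loss((x - mu) / s, k))
  optimize(obj, interval = range(x))$minimum     # convex objective, so a 1-D search suffices
}

set.seed(1)
x <- c(rnorm(50), 20)                            # data with one gross outlier
c(mean = mean(x), median = median(x), m_est = m_estimate(x))
# MASS::huber() and MASS::rlm() implement production-quality versions of the same idea.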
