How to estimate a country-specific reproduction number (R0) in R

I am a new R user and I am trying to estimate a country-specific reproduction number ("R0") for a disease called "Guinea worm".
I tried to install the R0 package but I can't figure out how it works.
I have the number of cases reported over a range of years, the total population per year, and a uniform distribution specifying the generation-time function.
Is it possible to estimate R0 with these data? Thank you for any help you can provide.

Yes, you can. Get started with the estimate.R function from the R0 package, and follow the worked example in its documentation.
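A minimal sketch of what that could look like. The case counts below are placeholders, and the uniform generation time is approximated with the R0 package's "empirical" distribution type (its generation.time() supports "empirical", "gamma", "weibull", and "lognormal", so a uniform distribution has to be expressed as equal empirical weights):

```r
library(R0)

# Hypothetical yearly case counts -- replace with your own data
cases <- c(3190, 1797, 1060, 542, 148, 126)

# Uniform generation time over 5 time steps, expressed as equal
# empirical weights (probabilities must sum to 1; index 1 is time 0)
gt <- generation.time("empirical", val = c(0, rep(1/5, 5)))

# Estimate R with, e.g., the exponential-growth and maximum-likelihood methods
res <- estimate.R(epid = cases, GT = gt, methods = c("EG", "ML"))
res
```

Note that the time step of the generation-time distribution must match the time step of your incidence data (here, years), and methods such as "AR" additionally use the population size you mentioned via the pop.size argument.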

Related

Create a new dataframe to do piecewise linear regression on percentages after doing serial crosstabs in R

I am working with R. I need to identify the predictors of a higher active trial start percentage over time (StartDateMonthsYrs). I will do linear regression with Percent.Active as the dependent variable.
My original dataframe is attached, and my obtained active trial start percentage over time (named Percent.Active) is presented here.
So, I need to assess whether federally sponsored trials, industry-sponsored trials, or other-sponsored trials were associated with a higher active trial start percentage over time. I have many other variables that I need to assess, but this is a sample of my data.
I am thinking of doing many crosstabs for each variable (e.g. Federal & Active, then Industry & Active, etc.) for each month (maybe with the help of lapply), then accumulating the obtained percentages in a second sheet, and running the analysis based on that.
My code for the linear regression is as follows:
q.lm0 <- lm(Percent.Active ~ Time.point + xyz, data = data.percentage); summary(q.lm0)
I'm a little bit confused. You write 'associated'. If you really want to look for association, then yes, a crosstab might be possible, and sufficient, since association is not the same as causation (which is further derived from correlation, if there is a theory behind it). If you are looking for correlation and insights over time, a regression with the lm function is not useful.
If you want a regression-type analysis, there are R packages such as plm that can deal with panel data, which you clearly have (time points, the trial labels of interest, and repeated time points for these labels). See this post for information about the package: https://stackoverflow.com/questions/2804001/panel-data-with-binary-dependent-variable-in-r
I'm pointing this out because your Percent.Active variable is only a binary 0/1 outcome, and I'm not sure whether that is on purpose. However, even if your outcome is not binary, the plm package might help, and you will find other packages mentioned in that post.
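A rough sketch of the plm approach, assuming the data is reshaped into long format with one row per sponsor type per month. The column names (Sponsor, StartDateMonthsYrs, Time.point) are placeholders based on the question, not known column names:

```r
library(plm)

# Declare the panel structure: individual = sponsor type, time = month
pdat <- pdata.frame(data.percentage,
                    index = c("Sponsor", "StartDateMonthsYrs"))

# Fixed-effects ("within") model of active-start percentage over time
fe <- plm(Percent.Active ~ Time.point, data = pdat, model = "within")
summary(fe)
```

With a binary 0/1 outcome, a panel logit (e.g. pglm, or the approaches in the linked post) would be more appropriate than a linear panel model.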

R - replicate weight survey

Currently I'm interested in learning how to obtain information from the American Community Survey PUMS files. I have read some of the ACS documentation and found that to use the replicate weights I must apply the variance formula given there.
Thanks to Google I also found that there's the survey package and its svrepdesign function to help me get this done:
https://www.rdocumentation.org/packages/survey/versions/3.33-2/topics/svrepdesign
Now, even though I'm getting into R and learning statistics and have a SQL background, there are two BIG problems:
1 - I have no idea what that formula means, and I would really like to understand it before going any further.
2 - I don't understand how the svrepdesign function works or how to use it.
I'm not looking for someone to solve my life/problems, but I would really appreciate if someone points me in the right direction and gives a jump start.
Thank you for your time.
When you use svrepdesign, you are specifying a design with replicate weights, and it uses the formula you quoted to calculate the standard errors.
The American Community Survey has 80 replicate weights, so the statistic you are interested in is first calculated with the full-sample weights (X), and then the same statistic is calculated with each of the 80 replicate weights (X_r).
You should read this: https://usa.ipums.org/usa/repwt.shtml
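A sketch of a typical svrepdesign call for ACS PUMS person files, assuming the standard ACS weight columns (PWGTP for the full-sample weight, PWGTP1 through PWGTP80 for the replicates) and a data frame `pums` already loaded; PINCP is used purely as an example analysis variable:

```r
library(survey)

des <- svrepdesign(
  data       = pums,
  weights    = ~PWGTP,          # full-sample person weight
  repweights = "PWGTP[0-9]+",   # regex matching the 80 replicate weights
  type       = "other",
  scale      = 4 / 80,          # ACS successive-differences scale factor
  rscales    = rep(1, 80),
  mse        = TRUE
)

# Example: a mean with replicate-weight standard errors
svymean(~PINCP, des, na.rm = TRUE)
```

The scale = 4/80 argument is where the formula from the documentation enters: the squared deviations of the 80 replicate estimates from the full-sample estimate are summed and multiplied by 4/80 to give the variance.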

How do I solve this by R, with power.t.test maybe?

A graduate program is interested in estimating the average annual income of its alumni. Although no prior information is available to estimate the population variance, it is known that most alumni incomes lie within a $10,000 range. The graduate program has N = 1,000 alumni. Determine the sample size necessary to estimate the mean with a bound on the error of estimation of $500.
I know how to deal with it statistically, but I don't know whether I have to use R.
power.t.test requires four arguments: delta, sig.level, sd, and power (since n is what I want).
I know that sd can be estimated from the $10,000 range as sd = 10000/4 = 2500,
but how do I deal with the other three?
Addition:
I googled how to do this statistically (mathematically).
It is from the book Elementary Survey Sampling by R.L. Scheaffer and W. Mendenhall, page 88. Stack Overflow doesn't allow me to add a picture yet, so I'll just share the link here:
https://books.google.co.jp/books?id=nUYJAAAAQBAJ&pg=PA89&lpg=PA89&dq=Although+no+prior+information+is+available+to+estimate+the+population+variance&source=bl&ots=Kqt7Cc5FFv&sig=Vx2bBRyi2KfrgMGkaC0f1EnfTWM&hl=en&sa=X&redir_esc=y#v=onepage&q&f=false
With the formula provided, I can calculate that the sample size required to solve the question is 91. Can anyone show me how to do this with R, please?
Thanks in advance!
ps. Sorry about my crap English and crap formatting... I'm not familiar with this website yet.
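The textbook formula the question cites can be evaluated directly in base R; power.t.test is not the right tool here, since it solves for power in a hypothesis test rather than for a bound on the error of estimation under finite-population sampling. A sketch using the finite-population sample-size formula n = N·σ² / ((N − 1)·D + σ²) with D = B²/4:

```r
N     <- 1000          # population size (alumni)
B     <- 500           # desired bound on the error of estimation
sigma <- 10000 / 4     # sd approximated as range/4
D     <- B^2 / 4

n <- N * sigma^2 / ((N - 1) * D + sigma^2)
ceiling(n)             # 91, matching the hand calculation
```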

R: library(BTYD): possible to add more predictors?

I've been using the BTYD package for predicting customer churn and the number of orders in the future, but I find the included models (Pareto/NBD, BG/NBD and BG/BB) limited in that they only take recency, frequency, age and monetary value into account. Using these values I get accuracy of up to 80%, but I'm sure this could be improved by incorporating more meaningful predictors into the model. Is that possible in this package? I couldn't find any information about it in the vignette. Any help is welcome, thanks!
Kasia

Buy Till You Die (BTYD) Model Validation in R

I ran the BTYD package in R, which predicts the number of transactions that a customer is expected to make in the future. These expected values that I get are not integers but are in the form 0.14, 0.79, 1.85, etc. In reality, however, a customer will only make an integral number of transactions - I have this data as well. My question is - how do I validate the performance of my model? What tests can I use to check that my model is predicting close enough results? Or is there a maximum likelihood function that will give me integral values of my expected transactions, through which I can compare the actual and expected results?
Any help will be appreciated.
I know the post is rather old, but just in case you are still interested... The decimal values are fine; if you want integers, round them sensibly, but my suggestion would be to stick with the decimals.
For validation, you usually need a dataset of past data which you split in two: the first half is used to calibrate the model (you derive the four parameters of the BTYD model), and you test it on the second half by comparing the actual trend to the predicted trend.
You can find examples on how to compare actual trends vs predicted trends on the guide here:
BTYD walkthrough
Good luck
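A sketch of that calibration/holdout split with the Pareto/NBD helpers from the BTYD package, following the function names used in its walkthrough vignette. The event log `elog` (columns cust, date, sales), the cutoff date, and the holdout length are all placeholder assumptions:

```r
library(BTYD)

# Split the event log at a cutoff date into calibration and holdout halves
data    <- dc.ElogToCbsCbt(elog, per = "week", T.cal = as.Date("1997-09-30"))
cal.cbs <- data$cal$cbs    # calibration-period frequency/recency matrix

# Fit the four Pareto/NBD parameters on the calibration half only
params <- pnbd.EstimateParameters(cal.cbs)

# Predict each customer's holdout-period transactions and compare
# against the actual holdout counts
T.star <- 39   # holdout length in weeks (example value)
x.star <- data$holdout$cbs[, "x.star"]
pred   <- pnbd.ConditionalExpectedTransactions(params, T.star,
                                               cal.cbs[, "x"],
                                               cal.cbs[, "t.x"],
                                               cal.cbs[, "T.cal"])
cor(x.star, pred)   # one simple measure of predictive agreement
```

The walkthrough's plotting helpers (e.g. pnbd.PlotTrackingCum) give the actual-vs-predicted trend comparison described above.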
