A graduate program is interested in estimating the average annual income of its alumni. Although no prior information is available to estimate the population variance, it is known that most alumni incomes lie within a $10,000 range. The graduate program has N = 1,000 alumni. Determine the sample size necessary to estimate the average income with a bound on the error of estimation of $500.
I know how to solve this by hand, but I don't know how to do it in R.
power.t.test needs four of its arguments to be supplied: delta, sig.level, sd and power (n is left out, since n is what I want).
I know that sd can be approximated from the $10,000 range as 10000/4 = 2500,
but how do I deal with the other three?
Addition:
I searched for how to do this mathematically.
The problem is from the book Elementary Survey Sampling by R. L. Scheaffer and W. Mendenhall, page 88. StackOverflow doesn't allow me to add a picture yet, so I am just sharing the link here.
https://books.google.co.jp/books?id=nUYJAAAAQBAJ&pg=PA89&lpg=PA89&dq=Although+no+prior+information+is+available+to+estimate+the+population+variance&source=bl&ots=Kqt7Cc5FFv&sig=Vx2bBRyi2KfrgMGkaC0f1EnfTWM&hl=en&sa=X&redir_esc=y#v=onepage&q&f=false
With the formulae provided there, I calculated that the required sample size is 91. Can anyone show me how to do this in R, please?
Thanks in advance!
ps. Sorry about my crap English and crap format... I have not been familiar with this website yet.
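power.t.test is built around the power of a t-test rather than a bound on the error of estimation, so one option is to code the textbook formula directly. The following is a minimal sketch, assuming the simple-random-sampling formula n = N*sigma^2 / ((N - 1)*D + sigma^2) with D = B^2/4 and sigma approximated as range/4. The function name srs_sample_size is made up for illustration and is not part of any package.

```r
# Sample size for estimating a population mean under simple random sampling
# from a finite population of size N, with bound B on the error of estimation.
#   n = N * sigma^2 / ((N - 1) * D + sigma^2),  where D = B^2 / 4
srs_sample_size <- function(N, B, sigma) {
  D <- B^2 / 4
  n <- N * sigma^2 / ((N - 1) * D + sigma^2)
  ceiling(n)  # round up so the bound is not exceeded
}

sigma <- 10000 / 4  # assumption: standard deviation is roughly range / 4
n <- srs_sample_size(N = 1000, B = 500, sigma = sigma)
print(n)  # 91
```

The unrounded value is about 90.99, which is why the answer is 91.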
So I am analyzing a multi-site trial of rice breeding lines at 4 environments. The simplified data is here:
https://drive.google.com/file/d/1jilVXX8JMkZCDVtIRmrwzB55kgR2GtYB/view?usp=sharing
I am running a simple model with an unstructured variance in sommer. I have already done it in lme4 and nlme, but let's just say I want to stick with sommer. The model is:
library(sommer)
# d is the data set linked above
m3 <- mmer(RDM ~ ENV,
           random = ~ vsr(usr(ENV), GEN),  # unstructured GEN-by-ENV covariance
           rcov = ~ units,
           data = d)
Pretty simple, no? However, very quickly I get the error:
System is singular (V). Stopping the job. Try a bigger number of tolparinv.
So I tried a bigger tolparinv value (I can't make the model any simpler). But the smallest value that makes the function work is 1000. So my question is: what are the implications of this?
Moreover, let's say it is acceptable to run the model like that. What happens then is that many of my variance components are negative, which doesn't make much sense.
Could somebody please shed some light on this? So, the concrete questions are:
Why does the singularity arise so quickly?
What happens if I increase tolparinv this much?
Is that why my variance is negative?
And most importantly: is this fixable? How?
Thank you!!!
In the help file for the Kest function in spatstat there is a warning section stating:
"The estimator of K(r) is approximately unbiased for each fixed r. Bias increases with r and depends on the window geometry. For a rectangular window it is prudent to restrict the r values to a maximum of 1/4 of the smaller side length of the rectangle. Bias may become appreciable for point patterns consisting of fewer than 15 points."
I would like to know in what sense the estimator of K(r) becomes biased with increasing r and for point patterns with fewer than 15 points?
Any advice on this matter would be greatly appreciated!
I have read the book "Spatial point patterns" (Baddeley et al., 2015) but I can't seem to find the answer there (or in any other literature). I may of course have missed that section of the book, if so please let me know.
I don't know where the n = 15 threshold comes from historically, but it is probably related to the fact that the estimate of K(r) is only ratio-unbiased. Typically what we can estimate directly is X(r) = lambda^2 * K(r), where lambda is the true intensity of the process. We then combine the estimate of this quantity, X_est(r) say, with an estimate of lambda^2, lambda^2_est say, and estimate K(r) as K_est(r) = X_est(r) / lambda^2_est. The numerator and denominator are thus unbiased estimates of the right things, but their ratio is not. The problem is worst when lambda^2 is poorly estimated, i.e., when we have few data points.
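To get a feel for the size of this effect, here is a toy calculation in base R. It is only a sketch under simplifying assumptions: the number of points N is taken to be Poisson with mean mu, lambda^2 is estimated by N(N - 1)/area^2 (unbiased for a Poisson count), and only the denominator is studied, ignoring edge correction and any correlation with the numerator. The function name denominator_bias is made up for illustration.

```r
# Relative bias of 1 / lambda^2_est when lambda^2_est = N (N - 1) / area^2
# and N ~ Poisson(mu), conditional on N >= 2 so the estimate is non-zero.
# A value of 1 would mean "unbiased"; values above 1 mean 1 / lambda^2 is
# overestimated on average. The area cancels out of the ratio.
denominator_bias <- function(mu, n_max = 5000) {
  n <- 2:n_max
  p <- dpois(n, mu)
  p <- p / sum(p)               # condition on N >= 2
  sum(p * mu^2 / (n * (n - 1)))
}

bias_few  <- denominator_bias(15)   # about 15 points expected
bias_many <- denominator_bias(200)  # about 200 points expected
print(c(few = bias_few, many = bias_many))
```

In this toy setting the inflation is substantial for patterns of around 15 points and nearly vanishes for a few hundred points, in line with the ratio-unbiasedness argument.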
Currently I'm interested in learning how to obtain information from the American Community Survey PUMS files. I have read some of the ACS documentation and found that to use the replicate weights I must apply the following formula:
SE(X) = sqrt( (4/80) * sum over r = 1..80 of (X_r - X)^2 )
Thanks to Google I also found that there is the survey package and its svrepdesign function to help me get this done:
https://www.rdocumentation.org/packages/survey/versions/3.33-2/topics/svrepdesign
Now, even though I'm getting into R and learning statistics and have a SQL background, there are two BIG problems:
1 - I have no idea what that formula means, and I would really like to understand it before going any further.
2 - I don't understand how the svrepdesign function works or how to use it.
I'm not looking for someone to solve my life/problems, but I would really appreciate if someone points me in the right direction and gives a jump start.
Thank you for your time.
When you use svrepdesign, you are specifying that the design has replicate weights, and it uses the formula you provided to calculate the standard errors.
The American Community Survey has 80 replicate weights, so it first calculates the statistic you are interested in with the full sample weights (X), then it calculates the same statistic with each of the 80 replicate weights (X_r). The standard error is then built from the squared differences between each X_r and X.
You should read this: https://usa.ipums.org/usa/repwt.shtml
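To make the formula less abstract, here is a minimal base R sketch of the calculation SE = sqrt((4/80) * sum((X_r - X)^2)). The data frame is random toy data, not real PUMS records; the column names PWGTP and PWGTP1 to PWGTP80 follow the PUMS person-weight naming, while income and the helper weighted_mean are assumptions made up for illustration.

```r
set.seed(42)

# Hypothetical toy data standing in for a PUMS person file.
n_rec <- 200
pums <- data.frame(
  income = rlnorm(n_rec, meanlog = 10, sdlog = 0.5),
  PWGTP  = runif(n_rec, 10, 50)   # full-sample weight
)
rep_cols <- paste0("PWGTP", 1:80) # 80 replicate weights
for (col in rep_cols) {
  pums[[col]] <- pums$PWGTP * runif(n_rec, 0.5, 1.5)
}

weighted_mean <- function(x, w) sum(x * w) / sum(w)

# X: the statistic computed with the full-sample weights.
X <- weighted_mean(pums$income, pums$PWGTP)

# X_r: the same statistic recomputed once per replicate weight.
X_r <- vapply(rep_cols,
              function(col) weighted_mean(pums$income, pums[[col]]),
              numeric(1))

# Standard error from the replicate estimates.
se <- sqrt(4 / 80 * sum((X_r - X)^2))
print(c(estimate = X, se = se))
```

With the survey package, a commonly used specification for ACS PUMS is svrepdesign with type = "JK1", scale = 4/80 and rscales = rep(1, 80); treat that as something to check against the package documentation rather than as a definitive recipe.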
I am a new R user and I am trying to estimate a country specific reproduction number ("R0") for a disease called "Guinea worm".
I tried to install the R0 package but I can't figure out how it works.
I have the number of cases reported over a range of years, the total population per year and a uniform distribution specifying the generation time function.
Is it possible to estimate R0 with these data? Thank you for any help you can provide.
Yes, you can. You can get started with the estimate.R function from the R0 package. Work through the example given in its documentation.
I recently started working with a huge dataset provided by a medical emergency service. I have about 25,000 spatial points of incidents.
I am searching books and internet for quite some time and am getting more and more confused about what to do and how to do it.
The points are, of course, very clustered. I calculated the K, L and G functions for them, and they confirm serious clustering.
I also have population point dataset - one point for every citizen, that is similarly clustered as incidents dataset (incidents happen to people, so there is a strong link between these two datasets).
I want to compare these two datasets to find out whether they are similarly distributed. I want to know whether there are places with more incidents than the population would suggest. In other words, I want to use the population dataset to explain the intensity and then check whether the incident dataset corresponds to that intensity. The assumption is that incidents should occur at random with respect to the population.
I want to get a plot of the region with information where there are more or less incidents than expected if the incidents were randomly happening to people.
How would you do it with R?
Should I use Kest or Kinhom to calculate the K function? I have read the descriptions, but I still don't understand the basic difference between them.
I tried using Kcross, but as far as I can tell, one of the two datasets is expected to be CSR (completely spatially random).
I also found Kcross.inhom. Should I use that one for my data?
How can I get a plot (image) of the incident deviations relative to the population?
I hope I asked clearly.
Thank you for taking the time to read my question, and even more thanks if you can answer any part of it.
Best regards!
Jernej
I do not have time to answer all your questions in full, but here are some pointers.
DISCLAIMER: I am a coauthor of the spatstat package and the book Spatial Point Patterns: Methodology and Applications with R so I have a preference for using these (and I genuinely believe these are the best tools for your problem).
Conceptual issue: How big is your study region and does it make sense to treat the points as distributed everywhere in the region or are they confined to be on the road network?
For now I will assume they can occur anywhere in the region.
A simple approach would be to estimate the population density using density.ppp and then use ppm to fit a Poisson model to the incidents with the population density as the intensity. This would probably be a reasonable null model, and if it fits the data well you can basically say that incidents happen "completely at random in space when controlling for the uneven population density". More information on density.ppp and ppm is in chapters 6 and 9 of the book, respectively, and of course in the spatstat help files.
If you use summary statistics like the K/L/G/F/J-functions you should always use the inhom versions to take the population density into account. This is covered in chapter 7 of the book.
It could also be interesting to look at the relative risk (relrisk) after combining all your points into a marked point pattern with two types (background and incidents). See chapter 14 of the book.
Unfortunately, only chapters 3, 7 and 9 of the book are available as free sample chapters to download, but I hope you have access to it at your library or have the option of buying it.
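The pointers above could be sketched in spatstat roughly as follows. This is only an illustration on synthetic data: the window, the simulated population pattern pop, the incidents inc (drawn as a random 10% subset of the population, so the null model holds by construction) and the bandwidth sigma = 1 are all assumptions chosen for the example, not recommendations for the real data.

```r
library(spatstat)

set.seed(1)
W <- owin(c(0, 10), c(0, 10))

# Hypothetical synthetic data: a population concentrated around the centre,
# and incidents obtained by keeping each person independently with prob. 0.1.
pop <- rpoispp(function(x, y) 50 * exp(-((x - 5)^2 + (y - 5)^2) / 8), win = W)
inc <- rthin(pop, P = 0.1)

# 1. Kernel estimate of the population density (arbitrary bandwidth here).
pop_dens <- density(pop, sigma = 1, positive = TRUE)

# 2. Poisson null model: incident intensity proportional to population density.
fit <- ppm(inc ~ offset(log(pop_dens)))

# 3. Inhomogeneous K-function, using the fitted intensity as the baseline.
K <- Kinhom(inc, lambda = predict(fit))

# 4. Relative risk: pixel image of the probability that a point is an incident.
both <- superimpose(background = pop, incident = inc)
rr <- relrisk(both, sigma = 1)
plot(rr, main = "Estimated probability that a point is an incident")
```

Areas where rr is clearly above its typical level would be candidates for "more incidents than expected given the population"; in practice the bandwidth would need to be chosen with care rather than fixed by hand.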