Predicting using an exponential model - r

I have the following data:
Days Total cases
1 3
2 3
3 5
4 6
5 28
6 30
7 31
8 34
9 39
10 48
11 63
12 70
13 82
14 91
15 107
16 112
17 127
18 146
19 171
20 198
21 258
22 334
23 403
24 497
25 571
26 657
27 730
28 883
29 1024
30 1139
31 1329
32 1635
33 2059
34 2545
35 3105
36 3684
37 4289
38 4778
39 5351
40 5916
41 6729
42 7600
43 8452
44 9210
45 10453
46 11484
47 12370
48 13431
49 14353
50 15724
51 17304
52 18543
53 20080
54 21372
I defined days as 'days' and total cases as 'cases1'. I run the following code:
exp.mod <- lm(log(cases1)~days)
I get a good model with reasonable residuals and p-value.
But when I run the following:
predict(exp.mod, data.frame(days=60))
I get the value 11.66476, which doesn't seem to be correct.
I need to get the predicted value and also include the prediction in a plot of the exponential model.
Hope that clarifies the issue.

You should consider the ETS models from the forecast package.
Below is an example.
library(dplyr)
library(forecast)
library(fpp2) # provides the ausair example series
ausair %>% ets() %>% forecast() %>% autoplot()
I suggest checking the free book written by Prof. Rob J Hyndman and Prof. George Athanasopoulos, who are also authors of the forecast package.
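On the 11.66476 in the question: because the model regresses log(cases1) on days, predict() returns a value on the log scale, so the forecast in cases is its exponential (in R, that means wrapping the predict() call in exp()). Below is a minimal sketch of the same fit, written in plain Python only so that it is self-contained. The helper name fit_log_linear is made up for illustration, and the data are the 54 rows from the question.

```python
import math

# Total cases for days 1..54, copied from the question.
cases = [3, 3, 5, 6, 28, 30, 31, 34, 39, 48, 63, 70, 82, 91, 107, 112, 127,
         146, 171, 198, 258, 334, 403, 497, 571, 657, 730, 883, 1024, 1139,
         1329, 1635, 2059, 2545, 3105, 3684, 4289, 4778, 5351, 5916, 6729,
         7600, 8452, 9210, 10453, 11484, 12370, 13431, 14353, 15724, 17304,
         18543, 20080, 21372]
days = list(range(1, len(cases) + 1))


def fit_log_linear(x, y):
    """Ordinary least squares of log(y) on x; returns (intercept, slope)."""
    log_y = [math.log(v) for v in y]
    x_bar = sum(x) / len(x)
    y_bar = sum(log_y) / len(log_y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_log_linear(days, cases)
log_pred = intercept + slope * 60      # what predict() reports: log scale
cases_pred = math.exp(log_pred)        # back-transformed to a case count
print(round(log_pred, 3), round(cases_pred))
```

Since exp(11.66476) is roughly 116,000, the value reported in the question is consistent with a forecast of that many cases on day 60. For the plot, the same back-transformation applies: evaluate exp() of the fitted line over a grid of days and overlay it on the raw counts.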

Related

Translate this R geometric problem using numpy random geometric

How can I translate this geometric distribution problem to numpy?
Products produced by a machine have a 3% defect rate.
What is the probability that the first defective item occurs in the fifth item inspected?
P(X = 5) = P(first 4 non-defective) P(5th defective) = (0.97^4)(0.03)
In R:
> dgeom(x = 4, prob = .03)
[1] 0.02655878
The convention in R is to record X as the number of failures that occur before the first success.
Is my numpy code OK?
result = np.random.geometric(p=0.03, size=1000)
print(result);
result = (result == 5).sum() / 1000.
print(result * 1000,"%");
I get 17 % as a result with numpy. Is that OK? It seems wrong, because there is only a 3% defect rate.
This is the numpy result array:
""" [ 31 20 37 9 47 31 22 7 44 15 52 15 4 14 36 45 26 27
9 48 30 5 7 17 7 24 121 22 23 49 2 26 25 8 4 5
3 27 70 71 3 1 19 22 103 18 14 20 34 45 8 169 11 63
29 71 30 79 75 19 56 9 5 8 15 44 8 12 40 29 46 2
144 69 65 1 4 90 20 187 100 52 46 76 3 105 12 110 31 3
113 18 6 15 127 22 6 7 3 18 123 41 69 104 13 18 2 8
52 35 54 27 74 22 31 27 3 15 21 26 13 3 32 10 131 20
I guess that 31 is the number of integrity checks before a failure, and likewise for 20, 37, etc.
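This is what I would do:
import numpy as np
np.random.seed(1)
# 1 marks a defective item; the defect rate in the question is 3%
tests = np.random.choice([0,1], size=(1000,5), p=[0.97,0.03])
# first defect in the fifth item; argmax returns 0 for rows with no defect, hence the second check
((np.argmax(tests, axis=1) == 4) & (tests[:,4] == 1)).mean()
# should land near 0.97**4 * 0.03 ≈ 0.0266, up to sampling noise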
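A note on the two conventions, since they explain most of the confusion in the question. R's dgeom counts the failures before the first defect (hence x = 4), while np.random.geometric returns the number of the trial on which the first defect occurs, so comparing the draws against 5 is the right check. The 17 that was printed is a count out of 1000 draws, i.e. 1.7%, because the fraction was multiplied by 1000 rather than 100. The sketch below uses only the standard library (the function names are made up for illustration); it computes the exact value and checks it by simulation.

```python
import random


def geom_pmf_trials(k, p):
    """P(first defect on trial k), with trials numbered from 1."""
    return (1 - p) ** (k - 1) * p


def first_defect_trial(p, rng):
    """Inspect items until one is defective; return its position (1-based)."""
    k = 1
    while rng.random() >= p:   # non-defective with probability 1 - p
        k += 1
    return k


p = 0.03
exact = geom_pmf_trials(5, p)            # same quantity as dgeom(4, 0.03) in R

rng = random.Random(1)
n = 50_000
hits = sum(first_defect_trial(p, rng) == 5 for _ in range(n))
estimate = hits / n

print(f"exact    = {exact:.6f}")
print(f"estimate = {estimate:.4f}  ({100 * estimate:.2f} %)")
```

With only 1000 draws the standard error of the estimated proportion is about half a percentage point, so an observed 1.7% against an exact 2.66% is not surprising.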

Data fitting by the method of maximum likelihood for a new distribution

I would like to know how one is able to fit any distribution to a given set of data using the method of MLEs. As a particular example, could anyone suggest a working code that would give the correct results for the MLEs for $\theta$ and $\beta$ when the generalised Lindley distribution described in https://rivista-statistica.unibo.it/article/viewFile/6836/7039 is applied to the data: 5.1, 6.3, 10.8, 12.1, 18.5, 19.7, 22.2, 23, 30.6, 37.3, 46.3, 53.9, 59.8, 66.2 on pg. 156? How can this then be used to fit the distribution over a histogram?
I'm going to answer a now-deleted question of yours that is very similar, based on a problem in Parari et al. (data on air conditioning from Proschan: Table 7.2, with results in Table 7.5).
In general you need to know reasonable starting values in order to do general-purpose maximum-likelihood estimation (this is admittedly a chicken-and-egg problem, but there are lots of solutions: use the method of moments, pick reasonable values based on the problem, eyeball a graph to pick values, etc.). In this case, since the authors gave you the answers, you can use those parameters to pick starting values of the right order of magnitude. More generally, you need to know something about reasonable orders of magnitude in order to get started.
Note that there are lots of fussy numerical issues with general maximum likelihood estimation: e.g. see chapter 7 here ...
Data (x) and likelihood function (dBEPL) are defined below. I am defining the density function and using the formula interface to mle2() ...
dd <- data.frame(x)
library(bbmle)
## parameters from table
ovec <- list(alpha=.7945,beta=0.1509,omega=6.7278,
a=0.2035,b=0.2303)
## starting vals -- right order of magnitude
svec <- list(alpha=0.8,beta=0.2,omega=10,a=0.2,b=0.2)
m1 <- mle2( x ~ dBEPL(alpha,beta,omega,a,b),
data=dd,
start=svec,
control=list(parscale=unlist(svec)))
coef(m1) ## NOT the same as reported, but close
par(las=1,bty="l")
hist(x,col="gray",freq=FALSE,breaks=40,ylim=c(0.,0.014))
with(as.list(coef(m1)),curve(dBEPL(x,alpha,beta,omega,a,b,log=FALSE),add=TRUE,
col=2,lwd=2,n=201))
x <- scan(textConnection("
194 413 90 74 55 23 97 50 359 50 130 487 57 102
15 14 10 57 320 261 51 44 9 254 493 33 18 209 41
58 60 48 56 87 11 102 12 5 14 14 29 37 186 29 104
7 4 72 270 283 7 61 100 61 502 220 120 141 22 603 35
98 54 100 11 181 65 49 12 239 14 18 39 3 12 5 32 9 438
43 134 184 20 386 182 71 80 188 230 152 5 36 79 59 33
246 1 79 3 27 201 84 27 156 21 16 88 130 14 118 44 15 42
106 46 230 26 59 153 104 20 206 566 34 29 26 35 5 82 31
118 326 12 54 36 34 18 25 120 31 22 18 216 139 67 310
346 210 57 76 14 111 97 62 39 30 7 44 11 63 23 22 23 14
18 13 34 16 18 130 90 163 208 124 70 16 101 52
208 95 62 11 191 14 71"))
dBEPL <- function(x,alpha,beta,omega,a,b,log=TRUE) {
  r <- log(alpha*beta^2*omega/(beta(a,b)*(beta+1)))+
    log(1+x^alpha)+(alpha-1)*log(x)-beta*x^alpha+(omega*a-1)*
    log(1-(1+beta*x^alpha/(beta+1))*exp(-beta*x^alpha))+
    (b-1)*log(1-(1-(1+beta*x^alpha/(beta+1))*exp(-beta*x^alpha))^omega)
  if (log) return(r) else return(exp(r))
}
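The general recipe in this answer (write the log-density, pick a starting value of the right order of magnitude, maximise numerically) can be sketched on a much simpler case. The block below is an illustration in plain Python, not the bbmle fit. As a stand-in it uses the one-parameter Lindley density f(x; θ) = θ²/(1+θ) · (1+x) · exp(−θx), applied to the 14 observations quoted in the question, and the function names are made up. Because this particular model also has a closed-form MLE, the numerical answer can be checked against it.

```python
import math

# The 14 observations quoted in the question.
x = [5.1, 6.3, 10.8, 12.1, 18.5, 19.7, 22.2, 23, 30.6, 37.3,
     46.3, 53.9, 59.8, 66.2]
n = len(x)
x_bar = sum(x) / n


def loglik(theta):
    """Log-likelihood of the one-parameter Lindley model (stand-in model)."""
    return (n * (2 * math.log(theta) - math.log(1 + theta))
            + sum(math.log(1 + xi) for xi in x)
            - theta * sum(x))


def score(theta):
    """First derivative of loglik with respect to theta."""
    return n * (2 / theta - 1 / (1 + theta) - x_bar)


def score_slope(theta):
    """Second derivative of loglik with respect to theta."""
    return n * (-2 / theta ** 2 + 1 / (1 + theta) ** 2)


# Starting value of the right order of magnitude: the mean of this model is
# roughly 2 / theta for small theta, so 1 / x_bar is in the right range.
theta = 1 / x_bar
for _ in range(50):                      # Newton iterations on the score
    step = score(theta) / score_slope(theta)
    theta -= step
    if abs(step) < 1e-12:
        break

# Closed-form MLE for this model, used only as a check on the iteration.
closed_form = (-(x_bar - 1)
               + math.sqrt((x_bar - 1) ** 2 + 8 * x_bar)) / (2 * x_bar)
print(theta, closed_form, loglik(theta))
```

For a multi-parameter density without a closed form, the same structure applies with a general-purpose optimiser in place of the Newton loop. Overlaying the fit on a histogram then amounts to evaluating the density at the estimates over a grid and drawing it on a density-scaled histogram, as the curve(..., add=TRUE) call does above.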

R Points in Polygon

I was wondering if you could help me with the following. I am trying to calculate the number of points that fall within each US state polygon. There are 52 states total. The point data and the polygon data are both in the same transformation.
I can run the function:
over(Transformed.States, clip.points)
Which returns:
0 1 2 3 4 5 6 7 8 9 10
4718 NA 488 2688 4454 3762 2041 NA 5 NA 3620
11 12 13 14 15 16 17 18 19 20 21
412 3042 2028 3390 2755 4250 3275 2484 466 4255 1
22 23 24 25 26 27 28 29 30 31 32
3238 744 4125 2926 927 495 3541 4640 3039 895 620
33 34 35 36 37 38 39 40 41 42 43
4069 4671 3801 1012 4023 626 1158 4627 217 13 4055
44 45 46 47 48 49 50 51
573 3456 NA 4670 4505 903 4172 4641
However, I want to write this so that each polygon is given a value equal to the number of points inside it, which can then be plotted, for example:
plot(points.in.state)
What would be the best function for this, so that I still have the polygon data but with the new points-in-polygon counts attached?
The end goal is to make a graduated symbol map for each state based on the number of points in each state.
Thanks!
Jim
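What is being asked for is one count per polygon rather than one matched point per polygon. As a language-neutral illustration of that counting step, here is a sketch in plain Python with a textbook ray-casting test. The square "states" and the points are toy data, and none of the names come from the sp package.

```python
def point_in_polygon(px, py, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through py,
        # and does it cross that line to the right of px?
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


# Toy polygons standing in for states (hypothetical data).
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "C": [(0, 2), (4, 2), (4, 4), (0, 4)],
}
points = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0),
          (1.0, 3.0), (3.5, 3.5), (9.0, 9.0)]

# One count per polygon; polygons with no points get 0 rather than NA.
counts = {name: sum(point_in_polygon(px, py, ring) for px, py in points)
          for name, ring in polygons.items()}
print(counts)
```

Storing such a vector of counts as an extra column alongside the polygon attributes keeps the geometry and the values together, which is what a graduated symbol map needs.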

Making a Bland-Altman plot for 2 columns of a data set using R

Can someone help me on this? I would like to make a Bland-Altman plot using R, for 2 columns of my data, the columns are forearm and forearm2 in the data below, but I have no idea how.
> data_2
Sex Forearm Height Age Forearm2
1 Male 17 182 55 26
2 Male 18 185 103 28
3 Male 20 171 49 25
4 Male 18 176 58 25
5 Male 21 158 57 23
6 Female 21 155 43 25
7 Male 18 199 114 29
8 Male 19 176 90 25
9 Male 17 191 68 29
10 Male 23 176 52 25
11 Female 19 153 34 24
12 Female 19 160 56 26
13 Male 19 170 47 25
14 Male 22 178 62 25
15 Female 21 174 49 27
16 Male 22 162 40 24
17 Female 23 172 82 27
18 Female 19 185 99 28
19 Female 18 168 66 25
20 Female 17 155 45 24
21 Male 17 182 83 27
22 Female 17 164 42 25
23 Female 18 162 73 26
24 Male 18 185 68 28
25 Female 18 146 50 23
26 Female 23 169 47 25
27 Female 18 160 51 24
28 Female 18 170 69 25
29 Male 24 185 57 27
30 Male 24 167 46 24
31 Female 25 169 47 26
32 Female 24 164 50 25
33 Female 21 155 47 235
34 Female 24 158 37 24
35 Female 23 177 88 27
36 Female 23 155 36 24
37 Male 19 170 47 24
38 Female 21 170 48 26
39 Female 23 160 74 25
40 Male 21 180 100 26
41 Male 19 186 95 27
42 Male 21 181 65 26
Here is part of what I have done, but I am getting nothing. I saw this on Wikipedia, so I tried it out to see what I get.
> BAplot=function(data_2$forearm,data_2$forearm2){
(data_2$forearm,data_2$forearm2,
xlab="Mean size(mm)",
ylab="Difference(mm)",
ylim=c(15,40),pch=42)
abline(0,0)}
Error: unexpected '$' in "BAplot=function(data_2$"
Running library(sos); ???altman turns up the packages MethComp, pairedData, epade, ResearchMethods, and mcr. The MethComp help page warns that the plot command may be deprecated and removed, so I'd grab it while you can :-) or use one of the other packages.
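Two remarks on the attempt in the question. The error arises because a function's formal arguments must be plain names, not expressions such as data_2$forearm, and the columns in data_2 are capitalised (Forearm, Forearm2). The quantities behind a Bland-Altman plot are simple: the x-axis is the mean of each pair of measurements, the y-axis is their difference, and horizontal lines mark the mean difference and the mean ± 1.96 standard deviations of the differences. Below is a minimal sketch in plain Python on the first five rows of data_2; the variable names are illustrative and the plotting itself is left out.

```python
from statistics import mean, stdev

# First five rows of data_2 from the question: Forearm and Forearm2.
forearm = [17, 18, 20, 18, 21]
forearm2 = [26, 28, 25, 25, 23]

pair_means = [(a + b) / 2 for a, b in zip(forearm, forearm2)]   # x-axis
diffs = [a - b for a, b in zip(forearm, forearm2)]              # y-axis

bias = mean(diffs)                 # horizontal centre line
sd = stdev(diffs)                  # sample standard deviation of differences
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd   # limits of agreement

for m, d in zip(pair_means, diffs):
    print(m, d)
print(bias, lower, upper)
```

Plotting diffs against pair_means and adding horizontal lines at bias, lower and upper gives the usual picture. Any of the packages listed above can be used instead of computing these by hand.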

Generating Stacked bar plots

I have a dataframe with 3 columns
$x -- at http://pastebin.com/SGrRUJcA
$y -- at http://pastebin.com/fhn7A1rj
$z -- at http://pastebin.com/VmVvdHEE
that I wish to use to generate a stacked barplot. All of these columns hold integer data. The stacked barplot should have the levels along the x-axis and the data for each level along the y-axis. The stacks should then correspond to each of $x, $y and $z.
UPDATE: I now have the following:
counted <- data.frame(table(myDf$x),variable='x')
counted <- rbind(counted,data.frame(table(myDf$y),variable='y'))
counted <- rbind(counted,data.frame(table(myDf$z),variable='z'))
counted <- counted[counted$Var1!=0,] # to get rid of 0th level??
stackedBp <- ggplot(counted,aes(x=Var1,y=Freq,fill=variable))
stackedBp <- stackedBp+geom_bar(stat='identity')+scale_x_discrete('Levels')+scale_y_continuous('Frequency')
stackedBp
which generates the stacked bar plot (image not reproduced here).
Two issues remain:
1. The x-axis labeling is not correct. For some reason, it goes: 46, 47, 53, 54, 38, 40.... How can I order it naturally?
2. I also wish to remove the 0th label.
I've tried using +scale_x_discrete(breaks = 0:50, labels = 1:50) but this doesn't work.
NB. axis labeling issue: Dataframe column appears incorrectly sorted
Not completely sure what you're wanting to see... but reading ?barplot says the first argument, height must be a vector or matrix. So to fix your initial error:
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
barplot(as.matrix(myDf))
If you provide a reproducible example and a more specific description of your desired output you can get a better answer.
Or if I were to guess wildly (and use ggplot)...
library(ggplot2)
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
myDf.counted <- data.frame(table(myDf$x),variable='x')
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$y),variable='y'))
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$z),variable='z'))
ggplot(myDf.counted,aes(x=Var1,y=Freq,fill=variable))+geom_bar(stat='identity')
I'm surprised that didn't blow up in your face. Cross-classifying the joint occurrence of three different vectors each of length 35204 would often consume many gigabytes of RAM (and would possibly create lots of useless 0's as you found). Maybe you wanted to examine instead the results of sapply(myDf, table)? This then creates three separate tables of counts.
It's a rather irregular result and would need further work to get it into a matrix form but you might want to consider using densityplot to display the comparative distributions which I think is your goal.
$x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
126 711 1059 2079 3070 2716 2745 3329 2916 2671 2349 2457 2055 1303 892 692
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
559 799 482 299 289 236 156 145 100 95 121 133 60 34 37 13
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
15 12 56 10 4 7 2 14 13 28 30 20 16 62 74 58
49 50
40 15
$y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3069 32 1422 1376 1780 1556 1937 1844 1967 1699 1910 1924 1047 894 975 865
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
635 1002 710 908 979 848 678 908 696 491 417 412 499 411 421 217
32 33 34 35 36 37 39 42 46 47 53 54
265 182 121 47 38 11 2 2 1 1 1 4
$z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
31 202 368 655 825 1246 900 1136 1098 1570 1613 1144 1107 1037 1239 1372
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1306 1085 843 867 813 1057 1213 1020 1210 939 725 644 617 602 739 584
32 33 34 35 36 37 38 39 40 41 42 43
650 733 756 681 684 657 544 416 220 48 7 1
The density plot is really simple to create in lattice:
library(lattice)
densityplot(~x+y+z, myDf)
