Dynamic GMM in R

I am dealing with large panel data, here is just my sample panel data:
Year PERMNO A B SIC
1991 a 3 11 2
1991 b 4 12 2
1991 c 5 15 1
1992 a 4 10 2
1992 b 4 14 2
1992 c 3 11 1
1993 a 2 9 2
1993 b 3 15 2
Dynamic GMM enables us to estimate the A-B relation while including both past A levels and fixed effects to account for the dynamic aspects of the A-B relation and time-invariant unobservable heterogeneity.
Specifically, I have the following:
A(t) = alpha + beta*B(t) + gamma1*A(t-1) + gamma2*A(t-2) + controls + Year_dummy + Industry_dummy
Currently, I am running:
lm(A ~ B + Controls + lag_A + lag_A_2 + factor(SIC_2) + factor(Year), data = data)
The reason I am not using "plm" is because normally we have to set the index:
pdata.frame(data, index = c("PERMNO", "Year"))
However, my fixed effects are Year and industry, not the individual PERMNO. Each industry contains multiple PERMNOs. In "lm", fixed effects can be implemented by adding factor(SIC_2) + factor(Year).
How can I run this dynamic GMM in R? I notice some people refer to "pgmm".
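One point worth noting: Arellano-Bond style GMM first-differences the model, which wipes out any time-invariant heterogeneity. Since each PERMNO stays in one industry, the industry dummies are absorbed by the differencing, so you can still index by PERMNO and Year and let effect = "twoways" supply the year dummies. A minimal sketch with pgmm() from "plm", assuming your data frame is named data with columns A, B, PERMNO, Year (controls omitted):
library(plm)
# Index by firm and year as usual; first-differencing removes all
# time-invariant firm effects, which subsume the industry (SIC) effect.
pdata <- pdata.frame(data, index = c("PERMNO", "Year"))
# lag(A, 1:2) gives the two lags of the dependent variable; the part
# after | declares lags 2 and deeper of A as GMM instruments.
gmm_fit <- pgmm(A ~ lag(A, 1:2) + B | lag(A, 2:99),
                data = pdata, effect = "twoways", model = "twosteps")
summary(gmm_fit, robust = TRUE)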

Nested logit model using panel data in R

I am new to R and I would love it if you could help me with this, because I am having serious difficulties.
I have unbalanced panel data that shows monthly company performance compared to the rest of the market in terms of $$ (e.g. this month company 1 made $1000 more than the market average). Each company decided on a strategy when it entered the market (1 through 8). These strategies are nested in two groups (a, b), so that strategies 1, 2, and 3 belong to group a, while strategies 4 through 8 belong to group b. I need a ranking of the strategies from best to worst.
I have discretized my DV so that it now only shows whether a company performed higher or lower than the market in a given month. However, I am not sure this is the right way, because I then lose how much better or worse companies performed each month.
My data looks like this:
ID Main Strategy YearMonth DiffPerformance Control1 Control2 DiffPerformanceHL
1 a 2 201706 9.037 2 57 H
1 a 2 201707 4.371 2 57 H
1 a 2 201708 1.633 2 57 H
1 a 2 201709 -3.521 2 59 L
1 a 2 201710 13.096 2 59 H
1 a 2 201711 5.070 2 60 H
1 a 2 201712 4.25 2 60 H
2 b 5 201904 6.78 4 171 H
2 b 5 201905 -15.26 4 169 L
2 b 5 201906 7.985 4 169 H
Where ID is the company, Main is the group (a or b), Strategy is 1 through 8 and nested as previously stated, YearMonth is the specific month, DiffPerformance is the DV as a continuous variable, Control1 is static over time and categorical (1 through 6), Control2 is a count control variable that changes over time, and DiffPerformanceHL is the discretized DV.
Can you please help me figure out how to create a nested logit model in R? I would be super appreciative.
Thanks
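Not a full answer, but a heavily hedged sketch of the mechanics: nested logit models a choice among alternatives, so you would first recast the data with Strategy as the chosen alternative (e.g. one row per company, since the strategy is fixed at entry) and company-level covariates as regressors. With the "mlogit" package the nesting is passed as a named list; the names company_df, strategy_chosen, and avgPerf below are hypothetical:
library(mlogit)
# One row per company; strategy_chosen holds the strategy (1-8) the
# company picked, avgPerf a company-level summary such as average
# DiffPerformance. Both names are made up for illustration.
choice_data <- mlogit.data(company_df, choice = "strategy_chosen",
                           shape = "wide", id.var = "ID")
# The nests argument groups alternatives 1-3 into a and 4-8 into b.
nested_fit <- mlogit(strategy_chosen ~ 1 | avgPerf + Control1,
                     data = choice_data,
                     nests = list(a = c("1", "2", "3"),
                                  b = c("4", "5", "6", "7", "8")))
summary(nested_fit)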

Cannot introduce offset into negative binomial regression

I'm performing a count data analysis in R on a dataset called 'doctor':
V2 V3 L4 V5
1 1 32 10.866795 1
2 2 104 10.674706 1
3 3 206 10.261581 1
4 4 186 9.446440 1
5 5 102 8.578665 1
6 1 2 9.841080 2
7 2 12 9.275472 2
8 3 28 8.649974 2
9 4 28 7.857481 2
10 5 31 7.287561 2
The best model according to stepwise AIC was V3 ~ V2 + L4 + V5 + V2:L4:V5. Now I want to set L4 as the offset and perform a negative binomial regression including the interaction, so I used nbinom = glm.nb(V3 ~ V2 + V5 + V2:V5, offset = L4), but I get this error message: Error in glm.control(...) : unused argument (offset = L4). What have I done wrong here?
Offsets are entered using an offset() term in the model formula (glm.nb() lives in the MASS package, and since the default link is log, the offset should already be on the log scale, as your L4 values appear to be):
library(MASS)
nbinom <- glm.nb(V3 ~ V2 + V5 + V2:V5 + offset(L4), data = doctor)
Also, you can write V2*V5 as shorthand for V2 + V5 + V2:V5.
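If you want to sanity-check the offset, one common trick (a sketch, assuming the data frame is named doctor and L4 is the log of the exposure) is to estimate the coefficient on L4 freely and see whether it is close to 1:
library(MASS)
# Offset model: coefficient on L4 fixed at 1 by construction.
m_offset <- glm.nb(V3 ~ V2 + V5 + V2:V5 + offset(L4), data = doctor)
# Free model: L4 enters as an ordinary covariate.
m_free <- glm.nb(V3 ~ V2 + V5 + V2:V5 + L4, data = doctor)
coef(summary(m_free))["L4", ]  # an estimate near 1 supports the offset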

Frequency distribution using binCounts

I have a dataset of customer ages, and I want to make a frequency distribution with 9-year age bins.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome would be similar to the table below; the variable names can differ (as you wish).
Could I use binCounts for this? If yes, could you help me with the code, as I am not sure about bx and idxs in:
binCounts(x, idxs = NULL, bx, right = FALSE)
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!
I don't know binCounts or even the package it is in, but here is a base R approach:
data.frame(table(cut(Ages, 0:7 * 9 + 37)))
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit <- c(37, 46, 55, 64, 73, 82, 91, 100)
Labels <- paste(head(lowerlimit, -1) + 1, lowerlimit[-1], sep = "-") # add 1 to the lower bounds to get 38, 47, etc.
group <- cut(Ages, lowerlimit, labels = Labels) # determine which group each age belongs to
tab <- table(group) # form a frequency table
as.data.frame(tab) # transform the table into a data frame
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages, s <- 0:7 * 9 + 37, paste(head(s + 1, -1), s[-1], sep = "-"))))
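As for binCounts() itself: I believe it comes from the matrixStats package, where bx is the vector of bin boundaries, idxs can be left NULL to use all elements, and right = TRUE makes the bins right-closed like cut(). A sketch under those assumptions:
library(matrixStats)
# Same boundaries as above; right = TRUE counts ages in (lo, hi] bins.
# Returns a plain integer vector, so attach labels yourself if needed.
binCounts(Ages, bx = 0:7 * 9 + 37, right = TRUE)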

How can I filter out rows from linear regression based on another linear regression

I would like to conduct a linear regression in three steps: 1) run the regression on all data points, 2) take out the 10 outliers as found by the absolute value of rstandard, 3) run the regression again on the new data frame.
I know how to do it manually, but this is very awkward. Is there a way to do it automatically? Can it be done for taking out columns as well?
Here is my toy data frame and code (I'll take out 2 top outliers):
df <- read.table(text = "userid target birds wolfs
222 1 9 7
444 1 8 4
234 0 2 8
543 1 2 3
678 1 8 3
987 0 1 2
294 1 7 16
608 0 1 5
123 1 17 7
321 1 8 7
226 0 2 7
556 0 20 3
334 1 6 3
225 0 1 1
999 0 3 11
987 0 30 1 ",header = TRUE)
model<- lm(target~ birds+ wolfs,data=df)
rstandard <- abs(rstandard(model))
df<-cbind(df,rstandard)
g<-subset(df,rstandard > sort(unique(rstandard),decreasing=T)[3])
g
userid target birds wolfs rstandard
4 543 1 2 3 1.189858
13 334 1 6 3 1.122579
modelNew<- lm(target~ birds+ wolfs,data=df[-c(4,13),])
I don't see how you could do this without estimating two models, the first to identify the most influential cases and the second on the data without those cases. You could simplify your code and avoid cluttering the workspace, however, by doing it all in one shot, with the subsetting process embedded in the call to estimate the "final" model. Here's code that does this for the example you gave:
model <- lm(target ~ birds + wolfs,
data = df[-(as.numeric(names(sort(abs(rstandard(lm(target ~ birds + wolfs, data=df))), decreasing=TRUE)))[1:2]),])
Here, the initial model, evaluation of influence, and ensuing subsetting of the data are all built into the code that comes after the first data =.
Also, note that the resulting model will differ from the one your code produced. That's because your g did not correctly identify the two most influential cases, as you can see if you just eyeball the results of abs(rstandard(lm(target ~ birds + wolfs, data=df))). I think it has to do with your use of unique(), which seems unnecessary, but I'm not sure.
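If you want this as a reusable step, here is a small sketch (the helper name is my own invention): using order() on the absolute standardized residuals returns row positions directly, which avoids both the row-name bookkeeping and the unique() issue:
# Fit, drop the k cases with the largest |rstandard|, and refit.
refit_without_outliers <- function(formula, data, k = 2) {
  fit <- lm(formula, data = data)
  worst <- order(abs(rstandard(fit)), decreasing = TRUE)[seq_len(k)]
  lm(formula, data = data[-worst, ])
}
modelNew <- refit_without_outliers(target ~ birds + wolfs, df, k = 2)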

Competing risk analysis of interval data

I study competing risks and use R.
I would like to use the model in Fine and Gray (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.
I found the cmprsk package.
However, I have an “interval data” configuration with a starting time t0 and an ending time t1 for each interval, t1 being the exit or right-censoring time when it is the last interval for a given entity. Here is an extract of the dataset:
entity t0 t1 cov
1 0 3 12
1 3 7 4
1 7 9 1
2 2 3 2
2 3 10 9
3 0 10 11
4 0 1 0
4 1 6 21
4 6 7 12
...
I cannot find how to implement this with cmprsk, while it is supported, for example, in the survival package (Surv(time, time2, ...)).
Is it possible to do this with cmprsk, or should I move to another package?
I know there is a Stata command (stcrreg) that does this, but I prefer working with R.
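One possible route, sketched under assumptions rather than tested: recent versions of the survival package provide finegray(), which builds a weighted dataset for the Fine-Gray subdistribution hazard that you then fit with coxph(). Here I assume your data frame is d, that event is a factor recording what happened at t1 (with censoring as its first level, and "cause1" as a placeholder level name), and that your survival version accepts (start, stop] data with an id argument:
library(survival)
# finegray() expands the data with Fine-Gray censoring weights for the
# event of interest.
fg_data <- finegray(Surv(t0, t1, event) ~ cov, data = d,
                    etype = "cause1", id = entity)
# Fit the subdistribution hazard model with the generated weights.
fg_fit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ cov,
                weights = fgwt, data = fg_data)
summary(fg_fit)
If your survival version does not accept interval input there, note that crr() in cmprsk takes a single failure time per subject (time-varying effects only via its cov2/tf arguments), so some restructuring to one row per entity may be unavoidable.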
