How to use fixed time effects in R?

I have a panel dataset with 40 variables for many cities over the period 1980-2014, and I'm trying to run a multiple linear regression using only three of the variables, but I also want time dummies for each year to control for unobserved shocks over time.
Should I create a dummy for each year? That would create too many columns.
I don't know how to create the set of time dummies as a single column in R (as one variable).
I searched online but couldn't find help.
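One way to avoid building 35 separate dummy columns is to let R expand them for you by treating the year as a factor. A minimal sketch, assuming a data frame panel with hypothetical columns y, x1, x2, x3, city, and year:
# lm() expands factor(year) into one dummy per year (minus a base
# year) automatically, so no extra columns are needed.
fit <- lm(y ~ x1 + x2 + x3 + factor(year), data = panel)
summary(fit)
# Alternatively, the plm package has time fixed effects built in:
library(plm)
pfit <- plm(y ~ x1 + x2 + x3, data = panel,
            index = c("city", "year"),
            model = "within", effect = "time")  # "twoways" adds city effects too
summary(pfit)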

Related

Regress a variable on variables from the date before

For an econometrics project, I'm using R to estimate some effects with panel data.
To check whether strict exogeneity is too restrictive, I'm running the following 2SLS estimation to predict Y_it (sales) from X_it (some variables) using a first-difference model.
I need to regress each component of Delta_X_it (= X_it - X_it-1) on a constant and all components of Delta_X_it-1.
Then I regress Delta_Y_it on the fitted values of Delta_X_it.
The second step will be easy to implement once the first is done, but the first step is the problem. I have already first-differenced all variables by group (here, by Store), but I don't know how to tell R to regress a variable at time t on variables at time t-1 while grouping by Store. Any idea how to do so?
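One possible approach is to build the lagged columns within each Store first (for example with dplyr) and then run ordinary lm() calls. A sketch assuming a data frame df with hypothetical columns Store, t, dY, dX1, and dX2 that already hold the first differences:
library(dplyr)
# Delta_X_{it-1}: lag each differenced column within its Store
df <- df %>%
  group_by(Store) %>%
  arrange(t, .by_group = TRUE) %>%
  mutate(dX1_lag = lag(dX1),
         dX2_lag = lag(dX2)) %>%
  ungroup()
# First stage: each component of Delta_X_it on a constant and all
# components of Delta_X_{it-1}
fs1 <- lm(dX1 ~ dX1_lag + dX2_lag, data = df)
fs2 <- lm(dX2 ~ dX1_lag + dX2_lag, data = df)
# Second stage: Delta_Y_it on the first-stage fitted values
# (predict() returns NA where the lags are NA, and lm() drops those rows)
df$dX1_hat <- predict(fs1, newdata = df)
df$dX2_hat <- predict(fs2, newdata = df)
ss <- lm(dY ~ dX1_hat + dX2_hat, data = df)
summary(ss)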

Create a new dataframe to do piecewise linear regression on percentages after doing serial crosstabs in R

I am working with R. I need to identify the predictors of a higher active trial start percentage over time (StartDateMonthsYrs). I will run a linear regression with Percent.Active as the dependent variable.
My original dataframe is attached, and the active trial start percentage over time that I obtained (named Percent.Active) is presented here.
So I need to assess whether federally sponsored, industry sponsored, or other sponsored trials were associated with a higher active trial start percentage over time. I have many other variables that I need to assess, but this is a sample of my data.
I am thinking of doing crosstabs for each variable (e.g. Federal & Active, then Industry & Active, etc.) for each month (maybe with the help of lapply), then accumulating the obtained percentages in the second sheet and running the analysis based on that.
My code for the linear regression is as follows:
q.lm0 <- lm(Percent.Active ~ Time.point + xyz, data = data.percentage); summary(q.lm0)
I'm a little confused. You write 'associated'. If you really want to look for association, then yes, a crosstab might be possible and sufficient, as association is not the same as causation (which is further inferred from correlation, if there is a theory behind it). If you are looking for correlation and insights over time, a plain regression with the base lm() function is not very useful.
If you want a regression-type analysis, there are packages in R such as plm that can deal with panel data, which you clearly have (time points, trial labels of interest, and repeated time points for these labels). Look at this post for info about the package: https://stackoverflow.com/questions/2804001/panel-data-with-binary-dependent-variable-in-r
I'm writing this because your Percent.Active variable is only a binary 0/1 outcome, and I'm not sure if this is on purpose. However, even if your outcome is not binary, the plm package might help, and you will find other packages mentioned in that post.
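If you go the plm route, a minimal sketch of what the model could look like, assuming a data frame trials with hypothetical columns TrialID, StartDateMonthsYrs, Percent.Active, and Sponsor:
library(plm)
# Declare the panel structure: trials observed over months
pdat <- pdata.frame(trials, index = c("TrialID", "StartDateMonthsYrs"))
# Time fixed effects absorb month-specific shocks; Sponsor varies
# across trials, so its coefficient survives the demeaning
m <- plm(Percent.Active ~ Sponsor, data = pdat,
         model = "within", effect = "time")
summary(m)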

Is it possible to use the evtree package in R for panel data / over multiple years?

I would like to know whether it is possible to use evtree over multiple years.
I have an unbalanced panel dataset (8 years) with two groups based on a binary dependent variable (dv). For every year, the dv value for each observation can be either 0 or 1 and thus determines group membership. I also have multiple predictor variables (pv), whose influence on dv might change over time.
evtree generally seems like the correct approach to me (at least for a single year). My goal is to train the evtree model over multiple periods (to capture possible temporal effects) in order to classify the two groups as well as possible.
Any help is highly appreciated.
Thanks in advance!
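A minimal sketch of one possible approach: pool all eight years into one data frame and let the year enter as an ordinary partitioning variable, so the tree can split on time where the pv-dv relationship changes. The column names dv, pv1, pv2, and year are hypothetical:
library(evtree)
panel$dv <- factor(panel$dv)  # classification requires a factor response
fit <- evtree(dv ~ pv1 + pv2 + year, data = panel,
              control = evtree.control(minbucket = 20, maxdepth = 4))
plot(fit)
table(predicted = predict(fit), observed = panel$dv)  # in-sample confusion matrix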

Force range in R Kaplan-Meier time component

I have a numeric time variable and I want to explore my data using R's Kaplan-Meier implementation. The common way is:
library(survival)
km <- survfit(Surv(time, event) ~ 1)
summary(km)
But I have specific ranges of time that I want processed, say 1-3, 4-6, and so on, and my method below returns an error.
km <- survfit(Surv(time=c(c(1,3),c(4,6),c(7,9)),event)~1)
summary(km)
Is this possible with different code, or do I have to change the time variable to suit my needs (i.e. recode it into categories)?
The time variable you provide should include the survival times of the individual observations in your dataset.
Not sure if I understand your question correctly, but if you want the KM estimates at specific time points, you can get those by using the times argument of the summary function for survfit objects:
summary(km, times=c(1,3,4,6,7,9))
Of course you could also plot the KM-curve using plot(km) to visualize it.
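For a self-contained illustration, here is the same pattern on the lung dataset that ships with the survival package; the chosen time points are arbitrary:
library(survival)
km <- survfit(Surv(time, status) ~ 1, data = lung)
summary(km, times = c(100, 300, 500))  # estimates at the requested time points
plot(km, xlab = "Days", ylab = "Survival probability")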

survival package, right censored data

I need to account for right-censored data in the analysis of my dataset. I am using the survival package; the data cover cancer treatment tactics and when each patient last checked in with my client's clinic.
Is there a suggested method or manipulation of the standard survival package to account for right-censored data?
Our rows are unique individual patients...
Here are the columns that are filled out:
our treatment type (constant)
days since original diagnosis
'censored': the number of patients who were last heard from on that day. We are now uncertain whether they are still alive or dead, since they stopped attending the clinic. They should be removed from the probability estimate at all future points.
the number of patients who died on that day (counting from original diagnosis)
So do you recommend a manipulation of the standard survival package, or another package? I have seen survSNP, survPRESMOOTH, and survBIVAR, which may perhaps help. I want to avoid recalculating individual columns/fields and creating new R objects, since this is a small part of a very large dataset.
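Since the rows aggregate patients by day, one option is to expand the daily counts into one row per patient with a status indicator, after which Surv() handles right censoring directly. A sketch assuming a data frame agg with hypothetical columns day, n_died, and n_censored:
library(survival)
# One row per patient: status 1 = died, 0 = right-censored (last seen)
long <- data.frame(
  time   = rep(agg$day, times = agg$n_died + agg$n_censored),
  status = unlist(mapply(function(d, cens) c(rep(1, d), rep(0, cens)),
                         agg$n_died, agg$n_censored, SIMPLIFY = FALSE))
)
# The standard survival package accounts for right censoring via the
# status argument of Surv(); no extra package is needed
km <- survfit(Surv(time, status) ~ 1, data = long)
summary(km)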
