I am working on a time series project.
I would like to add exogenous variables to my regression. The exogenous variables have a seasonal component, and I don't know whether I need to remove the seasonality before including them in the regression or whether I can simply include them as they are.
Can anyone help? Are there any econometrics references on this?
Thanks a lot.
To start, if you are looking for a guide to the computation, have a look at this excellent external reference, which lists many packages that are useful for time series analysis.
You might also have a look at this brief description of exogenous variables. In general, a standard regression model assumes that all of its independent variables are exogenous.
You might also want to look at this other excellent resource, which covers time series analysis and de-seasonalization. I would suggest that you de-seasonalize the time series before you do any variable selection on it. This simple step lets you see the trend in the data and flatten it, so that you can look objectively at the predictions between groups.
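To make the two options from the question concrete, here is a minimal base R sketch on simulated monthly data. Everything in it is an assumption for illustration: the series `x` and `y`, the true slope of 2, and the choice of monthly dummies as the de-seasonalization method. With that particular method, regressing the raw series with monthly dummies and regressing the de-seasonalized series give the same slope (the Frisch-Waugh-Lovell result); other seasonal adjustment methods will generally not match exactly.

```r
set.seed(123)
n_years <- 10
month <- factor(rep(1:12, times = n_years))
season <- sin(2 * pi * as.integer(month) / 12)  # shared seasonal pattern

# Hypothetical data: x and y are both seasonal; the true effect of x on y is 2
x <- 3 * season + rnorm(length(month))
y <- 1 + 2 * x + 5 * season + rnorm(length(month))

# Option A: keep the raw series and add monthly dummies to the regression
fit_dummies <- lm(y ~ x + month)

# Option B: de-seasonalize y and x first (residuals on monthly dummies),
# then regress the adjusted series on each other
y_ds <- resid(lm(y ~ month))
x_ds <- resid(lm(x ~ month))
fit_deseason <- lm(y_ds ~ x_ds)

# Option C: ignore seasonality altogether, for comparison
fit_naive <- lm(y ~ x)

slope_dummies  <- unname(coef(fit_dummies)["x"])
slope_deseason <- unname(coef(fit_deseason)["x_ds"])
slope_naive    <- unname(coef(fit_naive)["x"])
print(c(dummies = slope_dummies, deseason = slope_deseason, naive = slope_naive))
```

Note that the standard errors reported for Option B do not account for the degrees of freedom used up by the de-seasonalization step, so Option A is usually the more convenient of the two for inference.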
Related
I created a Random-Forest Regression model for time-series data in R that has three predictors and one output variable.
Is there a way to find (perhaps in more absolute terms) how changes in a specific variable affect the prediction output?
I know about variable importance, but I am not trying to find the variables that have the biggest effect. Instead, I want to see how the prediction output would change if I picked input variable X_1 and increased (or decreased) its value.
Does it even make sense to do this? Is it even possible with a random-forest model? Rereading my question a few times made me doubtful, but any insight or recommendation would be greatly appreciated.
I would guess that what this question is actually about is called exploratory data analysis (EDA). For starters, I would calculate the correlations between the variables to get a feeling for the strength of the [linear] relationship between each pair of variables. I would also look at scatter plots of the variables to get a feeling for the relationships. Depending on the variables, a [linear] regression could tell you how an increase in variable x1 would affect variable x2.
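A different technique from the one suggested above, but one that targets the question directly, is a partial dependence calculation: fix X_1 at a value for every row, average the model's predictions, and repeat over a grid of values. Below is a minimal sketch under stated assumptions: the data are simulated, the helper `partial_dependence` is a hypothetical name written for this example, and the time-series structure of the real data is ignored.

```r
library(randomForest)

set.seed(1)
n <- 300
# Hypothetical data with three predictors; y depends strongly on X_1
dat <- data.frame(X_1 = runif(n), X_2 = runif(n), X_3 = runif(n))
dat$y <- 5 * dat$X_1 + sin(2 * pi * dat$X_2) + rnorm(n, sd = 0.2)

rf <- randomForest(y ~ X_1 + X_2 + X_3, data = dat, ntree = 200)

# Hypothetical helper: average prediction when `var` is forced to each grid value
partial_dependence <- function(model, data, var, grid) {
  sapply(grid, function(v) {
    data[[var]] <- v                      # overwrite the variable for every row
    mean(predict(model, newdata = data))  # average over the other predictors
  })
}

grid <- seq(0.1, 0.9, by = 0.2)
pd <- partial_dependence(rf, dat, "X_1", grid)
print(data.frame(X_1 = grid, mean_prediction = pd))
```

The differences between successive entries of `pd` give a rough answer to "what happens to the prediction if X_1 goes up by this much". The randomForest package also ships a `partialPlot` function that draws a similar curve.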
I have a dataset containing repeated measures and quite a lot of variables per observation. Therefore, I need to find a way to select explanatory variables in a smart way. Regularized Regression methods sound good to me to address this problem.
Upon looking for a solution, I found out about the glmmLasso package quite recently. However, I have difficulties defining a model. I found a demo file online, but since I'm a beginner with R, I had a hard time understanding it.
(demo: https://rdrr.io/cran/glmmLasso/src/demo/glmmLasso-soccer.r)
Since I cannot share the original data, I would suggest you use the soccer dataset (the same dataset used in glmmLasso demo file). The variable team is repeated in observations and should be taken as a random effect.
# sample data
library(glmmLasso)
data("soccer")
I would appreciate it if you could explain the parameters lambda and family, and how to tune them.
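As a rough sketch of what a model definition on the soccer data might look like, loosely following the linked demo: `lambda` is the strength of the lasso penalty on the fixed effects (larger values shrink more coefficients to exactly zero), and `family` is a GLM family object describing the response distribution and link, such as `gaussian(link = "identity")` or `poisson(link = log)`. The lambda grid, the choice of covariates, and selecting lambda by the fitted object's `bic` component are assumptions made for this illustration, not a recommended recipe.

```r
library(glmmLasso)
data("soccer")

soccer <- data.frame(soccer)
soccer$team <- as.factor(soccer$team)  # grouping variable for the random intercept

# Standardize the covariates so the penalty treats them on a common scale
covars <- c("transfer.spendings", "ave.unfair.score", "ball.possession",
            "tackles", "ave.attend", "sold.out")
soccer[, covars] <- scale(soccer[, covars])

form <- points ~ transfer.spendings + ave.unfair.score + ball.possession +
  tackles + ave.attend + sold.out

# Assumed grid of penalty values; the best range depends on the data
lambdas <- c(500, 100, 50, 10, 1)
bic <- rep(Inf, length(lambdas))

for (j in seq_along(lambdas)) {
  fit <- try(glmmLasso(form, rnd = list(team = ~1),
                       family = gaussian(link = "identity"),
                       lambda = lambdas[j], data = soccer),
             silent = TRUE)
  if (!inherits(fit, "try-error")) bic[j] <- fit$bic
}

# Refit with the lambda that gave the smallest BIC
best_lambda <- lambdas[which.min(bic)]
best_fit <- glmmLasso(form, rnd = list(team = ~1),
                      family = gaussian(link = "identity"),
                      lambda = best_lambda, data = soccer)
summary(best_fit)
```

Cross-validation over the same grid is an alternative to BIC if prediction accuracy is the main goal.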
I'm working with a large data set with repeated patients over multiple months with ordered outcomes on a severity scale from 1 to 5. I was able to analyze the first set of patients using the polr function to run a basic ordinal logistic regression model, but now want to analyze association across all the time points using a longitudinal ordinal logistic model. I can't seem to find any clear documentation online or on this site so far explaining which package to use and how to use it. I am also an R novice so any simple explanations would be incredibly useful. Based on some initial searching it seems like the mixor function might be what I need though I am not sure how it works. I found it on this site
https://cran.r-project.org/web/packages/mixor/vignettes/mixor.pdf
Would appreciate a simple explanation of how to use this function if this is the right one, or would happily take any alternate suggestions with an explanation.
Thank you in advance for your help!
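As one possible alternate suggestion (not the mixor function mentioned above), the `clmm` function in the ordinal package fits a cumulative link mixed model, which is an ordinal logistic regression with a random effect per patient. The sketch below uses simulated data; the variable names `patient`, `month`, and `severity`, and all simulation settings, are assumptions for illustration only.

```r
library(ordinal)

set.seed(2024)
n_patients <- 100
months <- 0:5

# Hypothetical long-format data: one row per patient per month
d <- expand.grid(month = months, patient = factor(seq_len(n_patients)))

# Patient-specific random intercepts plus a time trend on a latent scale
u <- rnorm(n_patients)
latent <- -0.4 * d$month + u[as.integer(d$patient)] + rlogis(nrow(d))

# Cut the latent scale into an ordered 1-5 severity score
d$severity <- cut(latent, breaks = c(-Inf, -2, -0.5, 0.5, 2, Inf),
                  labels = 1:5, ordered_result = TRUE)

# Ordinal logistic model with a random intercept for each patient
fit <- clmm(severity ~ month + (1 | patient), data = d)
summary(fit)
```

The fixed-effect coefficient on `month` is read the same way as in `polr`, as a log odds ratio for moving to a higher severity category, while the random intercept accounts for repeated measurements on the same patient.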
I am working with R. I need to identify the predictors of higher Active trial start percentage over time (StartDateMonthsYrs). I will do linear regression with Percent.Active as the dependent variable.
My original dataframe is attached, and the Active trial start percentage over time that I obtained (named Percent.Active) is presented here.
So, I need to assess whether federally sponsored trials, industry sponsored trials or other sponsored trials were associated with a higher active trial start percentage over time. I have many other variables that I need to assess, but this is a sample of my data.
I am thinking of doing a crosstab for each variable (e.g. Federal & Active, then Industry & Active, etc.) in each month (maybe with the help of lapply), then accumulating the resulting percentages in the second sheet and running the analysis on that.
My code for the linear regression is as follows:
q.lm0 <- lm(Percent.Active ~ Time.point + xyz, data = data.percentage)
summary(q.lm0)
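The per-month crosstab step described above can be done in one call instead of many separate crosstabs. Here is a minimal base R sketch on simulated data: the trial-level data frame `trials`, its `Sponsor` and `Active` columns, and the model `q.lm1` are hypothetical names chosen for this example and do not come from the attached data.

```r
set.seed(1)

# Hypothetical trial-level data: 12 months, 30 trials per month,
# 10 trials per sponsor type in each month
trials <- data.frame(
  StartDateMonthsYrs = rep(1:12, each = 30),
  Sponsor = rep(c("Federal", "Industry", "Other"), times = 120),
  Active = rbinom(360, size = 1, prob = 0.4)  # 1 = active, 0 = not active
)

# Percentage of active trials for every month-by-sponsor cell
data.percentage <- aggregate(Active ~ StartDateMonthsYrs + Sponsor,
                             data = trials,
                             FUN = function(a) 100 * mean(a))
names(data.percentage) <- c("Time.point", "Sponsor", "Percent.Active")

# The interaction term lets the time trend differ between sponsor types
q.lm1 <- lm(Percent.Active ~ Time.point * Sponsor, data = data.percentage)
summary(q.lm1)
```

The `Time.point:Sponsor` coefficients are the part that speaks to whether one sponsor type gained active share faster than the reference type over time.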
I'm a little confused. You write 'associated'. If you really are looking for association, then yes, a crosstab may be possible and sufficient, since association is not the same as causation (which can only be inferred from correlation when there is a theory behind it). If you are looking for correlation and insights over time, a plain regression with the lm function is not well suited.
If you want a regression-type analysis, there are R packages such as plm that can deal with panel data, and you clearly have panel data (time points, the trial labels you are interested in, and repeated time points for these labels). Look at this post for information about the package: https://stackoverflow.com/questions/2804001/panel-data-with-binary-dependent-variable-in-r
I'm writing this because your Percent.Active variable appears to be only a binary 0/1 outcome, and I'm not sure whether that is on purpose. Even if your outcome is not binary, the plm package might help, and you will find other packages mentioned in that post.
I am new to using R as I usually use Stata. I want to estimate a state space model on some time series data with time varying coefficients. From what I have gathered this is not possible to do in Stata.
I have downloaded the dlm package in R and I am trying to run the dlmModReg command to regress my dependent variable on a single explanatory variable. I would like to allow the intercept and beta coefficient to vary over time.
If anyone could show me an example of the code I want to run I think that would be enough for me to work out how to do this. The examples I have found online are vague or use terminology that I am not familiar with as a new R user. Any help or comments are greatly appreciated.
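A minimal sketch of a regression with a time-varying intercept and slope using the dlm package is shown below. The data are simulated, the helper name `build_tvp` is hypothetical, and letting both coefficients follow random walks with variances estimated by maximum likelihood is an assumption about what the model should look like.

```r
library(dlm)

set.seed(42)
n <- 200
x <- rnorm(n)

# Hypothetical data: intercept and slope both drift slowly over time
alpha <- cumsum(rnorm(n, sd = 0.05))
beta  <- 1 + cumsum(rnorm(n, sd = 0.05))
y <- alpha + beta * x + rnorm(n, sd = 0.5)

# Build function: parm holds log-variances, so exp() keeps them positive.
# dV is the observation variance; dW holds the random-walk variances of
# the intercept and the slope.
build_tvp <- function(parm) {
  dlmModReg(x, addInt = TRUE, dV = exp(parm[1]), dW = exp(parm[2:3]))
}

# Estimate the three variances by maximum likelihood
fit <- dlmMLE(y, parm = rep(0, 3), build = build_tvp)
mod <- build_tvp(fit$par)

# Kalman filter, then smoother, to recover the coefficient paths
filt <- dlmFilter(y, mod)
smoothed <- dlmSmooth(filt)

# Row 1 of smoothed$s is the pre-sample state, so drop it
coefs <- smoothed$s[-1, , drop = FALSE]
colnames(coefs) <- c("intercept", "beta")
head(coefs)
```

Setting an element of `dW` to zero in `build_tvp` would hold the corresponding coefficient constant over time, which is a simple way to let only the slope vary.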