I have a question regarding conditional lagging of variables. The data structure is as follows: paired variables of S&P 1500 CEO characteristics, identified by a company key and a financial year. One company can have multiple rows for the same financial year (multiple CEOs in that year). I would like to look up the value of a third variable (called AT) from the last row of the previous financial year within the same key (same company).
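One possible approach, sketched in base R: take the last row of each company-year, shift its year forward by one, and join it back. The column names `gvkey` and `fyear` and the toy values are assumptions for illustration (only `AT` comes from the question), and the sketch assumes rows within a company-year are already in chronological order.

```r
# Hypothetical toy data: gvkey = company key, fyear = financial year (assumed names)
ceo <- data.frame(
  gvkey = c(1, 1, 1, 1, 2, 2),
  fyear = c(2010, 2010, 2011, 2012, 2011, 2012),
  AT    = c(100, 110, 120, 130, 50, 60)
)

# Step 1: keep only the last row of each company-year
# (assumes rows are in chronological order within each company-year)
is_last <- !duplicated(ceo[c("gvkey", "fyear")], fromLast = TRUE)
last_per_year <- ceo[is_last, ]

# Step 2: shift the year forward by one so it lines up with the following year
prev <- data.frame(
  gvkey   = last_per_year$gvkey,
  fyear   = last_per_year$fyear + 1,
  AT_prev = last_per_year$AT
)

# Step 3: left join back onto the full data; years without a predecessor get NA
out <- merge(ceo, prev, by = c("gvkey", "fyear"), all.x = TRUE)
out
```

With this layout, "previous financial year" means exactly `fyear - 1`; if a company skips a year, `AT_prev` is `NA` rather than the value from the most recent earlier year.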
I need to do a term paper with R (never did it before) and I have the following problem that I cannot solve.
I have a dataset with all countries of all years since 1950 (so one row (observation) is one country in one year, the next row is the same country one year later and so on). Now I need to construct a new variable, which is filled with the average value of the previous three years of a given variable.
Specifically it is about the democracy level of a country. So I have the variable of the democracy level of a country for a year T, and I need a new variable, which indicates the "democracy growth" of the previous three years T(-3,0).
How can I construct this new variable?
As I said, I have never used R before, but I need to use mutate(), then address year-3, year-2 and year-1, add them up and divide by 3. But how do I address the previous three years? With case_when() or something similar?
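A minimal sketch of how this could look with dplyr, using `lag()` inside `mutate()` rather than `case_when()`. The column names `country`, `year` and `democracy` and the toy values are assumptions, and the sketch assumes each country has one row per year with no gaps, so that "one row back" really is "one year back".

```r
library(dplyr)

# Hypothetical toy data; column names are assumptions
dem <- data.frame(
  country   = rep(c("A", "B"), each = 5),
  year      = rep(2000:2004, times = 2),
  democracy = c(1, 2, 3, 4, 5, 10, 10, 10, 13, 16)
)

dem_avg <- dem %>%
  arrange(country, year) %>%   # make sure rows are in year order
  group_by(country) %>%        # lag within each country only
  mutate(
    # mean of the three preceding years; NA until three earlier years exist
    democracy_avg3 = (dplyr::lag(democracy, 1) +
                      dplyr::lag(democracy, 2) +
                      dplyr::lag(democracy, 3)) / 3
  ) %>%
  ungroup()

dem_avg
```

Grouping by country before lagging matters: without it, the first years of one country would pick up the last years of the country listed above it.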
I have a question regarding the filtering of a loan dataset for my upcoming thesis.
My dataset consists of loan data which is reported for 5 years on a quarterly basis. The column of interest is the 'Loan Identifier' as well as the 'Cut-Off-Date'. I just want to observe the loans (via Loan Identifier) that exist at the first reporting date (first quarter) for every upcoming quarter (cut-off-date).
For example, if the first cut-off date contains the loans with the identifiers c("1001","1002","1003") and the second cut-off date, one quarter later, contains the loans with the identifiers c("1002","1003","1004"), R should keep only the identifiers that already existed in the first quarter ("1002","1003"), so that new loans added during the analysis period are completely ignored.
Is there also the possibility to do that all in one file? Or should I extract the data of each cut-off-date in a new table?
Thanks and best regards!
I am thinking about storing the identifiers of all loans in the first quarter in a vector. After that, I would split up the loan dataset by cut-off date and merge the vector with the new tables via left_join, so that every loan that does not match the vector is disregarded.
As I have multiple loan pools with 15 pool-cut-off dates, this seems very impractical for me. Maybe there is a smarter and more effective solution.
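For what it is worth, this can probably stay in one table without splitting by cut-off date. A minimal base R sketch, assuming hypothetical column names `loan_id` and `cutoff` and a single pool:

```r
# Hypothetical toy data for one pool; column names are assumptions
loans <- data.frame(
  loan_id = c("1001", "1002", "1003", "1002", "1003", "1004"),
  cutoff  = as.Date(c("2020-03-31", "2020-03-31", "2020-03-31",
                      "2020-06-30", "2020-06-30", "2020-06-30"))
)

# Identifiers present at the first cut-off date
first_cutoff <- min(loans$cutoff)
initial_ids  <- unique(loans$loan_id[loans$cutoff == first_cutoff])

# Keep only rows whose loan already existed at the first cut-off date
loans_initial <- loans[loans$loan_id %in% initial_ids, ]
loans_initial
```

With several pools, the same filter could be applied per pool, for example by splitting on a pool column and running these three steps on each piece.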
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
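One way to address both points is a "long" data frame with one row per runner per race, instead of one matrix per race. The sketch below uses made-up teams, runners and values, and the column names (`race_date`, `team`, `runner`, `grade`, `time_sec`, `place`) are assumptions. The team score here is simply the sum of finishing places, which is a simplification of real cross country scoring.

```r
# Hypothetical results for two races; all names and values are made up
race1 <- data.frame(
  race_date = as.Date("2023-09-02"),
  team      = c("North", "North", "South", "South"),
  runner    = c("A", "B", "C", "D"),
  grade     = c(11, 12, 10, 12),
  time_sec  = c(1005, 1040, 1012, 1100),
  place     = c(1, 3, 2, 4)
)
race2 <- data.frame(
  race_date = as.Date("2023-09-16"),
  team      = c("North", "North", "South", "South"),
  runner    = c("A", "B", "C", "D"),
  grade     = c(11, 12, 10, 12),
  time_sec  = c(1000, 1020, 995, 1090),
  place     = c(2, 3, 1, 4)
)

# Stack all races into one table; the date column makes it a time series
results <- rbind(race1, race2)

# Team score per race date (simplified: sum of finishing places)
team_scores <- aggregate(place ~ team + race_date, data = results, FUN = sum)
team_scores
```

Because name and grade are ordinary columns, they can be kept without changing the shape of the data, and each new race is just more rows.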
I have a data frame in R that examines the ELO rating of college football teams over the course of several decades.
Data Layout
Each row is a specific game, and the team listed under the Team.A column is a winning team while the team under Team.B is a losing team. Also, the ELO scores under Elo.A represent the score for Team.A and the ELO scores under Elo.B represent the score for Team.B for those games, respectively.
I want to create a time-series that, for instance, looks at all of the ELO scores in Elo.A and Elo.B for Minnesota. Is there a way in R that can pull the date and scores in both of those columns for that one school?
How about:
df[df$Team.A == "Minnesota" | df$Team.B == "Minnesota", ]
You can also select specific columns by passing c(...) in the space after the ','.
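To get a single time series rather than two Elo columns, the rows could be stacked so that each game contributes the selected team's own rating. A minimal sketch, assuming the date column is called `Date` (the question does not name it) and using made-up ratings:

```r
# Hypothetical toy data; the Date column name and all values are assumptions
games <- data.frame(
  Date   = as.Date(c("2019-09-01", "2019-09-08", "2019-09-15")),
  Team.A = c("Minnesota", "Iowa", "Minnesota"),
  Team.B = c("Iowa", "Minnesota", "Wisconsin"),
  Elo.A  = c(1500, 1520, 1510),
  Elo.B  = c(1480, 1495, 1490)
)

team <- "Minnesota"

# Games the team won: its rating is in Elo.A
as_winner <- games[games$Team.A == team, c("Date", "Elo.A")]
names(as_winner) <- c("Date", "Elo")

# Games the team lost: its rating is in Elo.B
as_loser <- games[games$Team.B == team, c("Date", "Elo.B")]
names(as_loser) <- c("Date", "Elo")

# Stack both and sort by date to get one series for the team
elo_series <- rbind(as_winner, as_loser)
elo_series <- elo_series[order(elo_series$Date), ]
elo_series
```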
I have a fairly large CSV file with data on volatility. The file contains numerous columns, starting with the country (i) and the year for which the volatility is reported; there are also columns for the volatility (measured as the standard deviation of the logs of the exchange rate), the logarithm of the exchange rate, and the exchange rate itself. My sample includes 152 countries (accordingly there are 152 columns each for the measured volatility, the logs of the exchange rate, and the exchange rate). The column headers look like this:
"i" "year" "vol0" "vol1" "vol2" "vol3" "vol4"
"lER0" "lER1" "lER2" "lER3" "lER4"
"ER0" "ER1" "ER2" "ER3" "ER4"
Now I am faced with the task of producing summary statistics on this data. To present them in a comparable and neat way, I want to group the countries (i), or rather their volatilities, into different "region groups". The regions are defined in another file (for the United States, say, it would look like this: country id: 67, region: NAm).
Now to my question: how do I evaluate the data by group? That is, how do I assign the countries to the groups, and then how do I compute the summary statistics per group?
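A rough sketch of one way to go about it in base R: reshape the volatility columns into long format, merge in the region file, then aggregate per region. It rests on a loud assumption that the number in a column name such as `vol2` is the country id used in the region file; the region column names `country_id` and `region` and all values are made up for illustration.

```r
# Hypothetical toy data; ASSUMES the suffix of "volN" is the country id
vol_wide <- data.frame(
  year = c(2000, 2001),
  vol0 = c(0.10, 0.20),
  vol1 = c(0.30, 0.40),
  vol2 = c(0.50, 0.70)
)
# Hypothetical region file; column names are assumptions
regions <- data.frame(
  country_id = c(0L, 1L, 2L),
  region     = c("NAm", "NAm", "EU")
)

# Wide to long: one row per country-year
vol_cols <- grep("^vol[0-9]+$", names(vol_wide), value = TRUE)
vol_long <- data.frame(
  year       = rep(vol_wide$year, times = length(vol_cols)),
  country_id = rep(as.integer(sub("^vol", "", vol_cols)), each = nrow(vol_wide)),
  vol        = unlist(vol_wide[vol_cols], use.names = FALSE)
)

# Assign each country to its region, then summarise per region
vol_long     <- merge(vol_long, regions, by = "country_id")
region_stats <- aggregate(vol ~ region, data = vol_long, FUN = mean)
region_stats
```

Swapping `mean` for `sd`, `median` or another function gives the other summary statistics per region; the same reshaping would apply to the `lER` and `ER` columns.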