I have a data frame in R that examines the ELO rating of college football teams over the course of several decades.
Data Layout
Each row is a specific game. The team listed under the Team.A column is the winning team, and the team under Team.B is the losing team. Likewise, Elo.A holds the ELO score for Team.A and Elo.B holds the ELO score for Team.B in that game.
I want to create a time-series that, for instance, looks at all of the ELO scores in Elo.A and Elo.B for Minnesota. Is there a way in R that can pull the date and scores in both of those columns for that one school?
How about:
df[df$Team.A == "Minnesota" | df$Team.B == "Minnesota", ]
You can also select specific columns by putting c(...) in the space after the ','.
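A minimal sketch of that subsetting, plus one possible way to collapse the two rating columns into a single series for the team. The toy values and the Date column name are assumptions, since the question only names Team.A, Team.B, Elo.A and Elo.B.

```r
# Toy data; the Date column name and all values are assumed for illustration
df <- data.frame(
  Date   = as.Date(c("1990-09-01", "1990-09-08", "1990-09-15")),
  Team.A = c("Minnesota", "Iowa", "Michigan"),
  Team.B = c("Iowa", "Wisconsin", "Minnesota"),
  Elo.A  = c(1510, 1495, 1600),
  Elo.B  = c(1490, 1480, 1505),
  stringsAsFactors = FALSE
)

team <- "Minnesota"

# Rows where the team appears on either side, keeping only the needed columns
games <- df[df$Team.A == team | df$Team.B == team,
            c("Date", "Team.A", "Team.B", "Elo.A", "Elo.B")]

# Take Elo.A when the team won (Team.A) and Elo.B when it lost (Team.B)
elo_series <- data.frame(
  Date = games$Date,
  Elo  = ifelse(games$Team.A == team, games$Elo.A, games$Elo.B)
)
elo_series <- elo_series[order(elo_series$Date), ]
```

The resulting elo_series has one row per game with a single Elo column, which is probably easier to plot against Date than two separate columns.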
Related
I have a question regarding the filtering of a loan dataset for my upcoming thesis.
My dataset consists of loan data which is reported for 5 years on a quarterly basis. The column of interest is the 'Loan Identifier' as well as the 'Cut-Off-Date'. I just want to observe the loans (via Loan Identifier) that exist at the first reporting date (first quarter) for every upcoming quarter (cut-off-date).
For example, if the first cut-off date contains loans with the identifiers c("1001","1002","1003") and the second cut-off date, one quarter later, contains c("1002","1003","1004"), R should keep only the identifiers that already existed in the first quarter ("1002","1003"), so that loans added during the analysis are ignored completely.
Is there also the possibility to do that all in one file? Or should I extract the data of each cut-off-date in a new table?
Thanks and best regards!
I am thinking about storing the identifiers of the first-quarter loans in a vector. After that, I would split the loan dataset by cut-off date and merge the vector with each new table via left_join, so that every loan that does not match the vector is disregarded.
As I have multiple loan pools with 15 pool-cut-off dates, this seems very impractical for me. Maybe there is a smarter and more effective solution.
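One possible approach that stays in a single table, sketched in base R. The column names LoanID and CutOffDate and the toy dates are assumptions standing in for 'Loan Identifier' and 'Cut-Off-Date'.

```r
# Toy data; LoanID and CutOffDate are assumed stand-ins for the real column names
loans <- data.frame(
  LoanID     = c("1001", "1002", "1003", "1002", "1003", "1004"),
  CutOffDate = as.Date(c("2015-03-31", "2015-03-31", "2015-03-31",
                         "2015-06-30", "2015-06-30", "2015-06-30")),
  stringsAsFactors = FALSE
)

# Identifiers that are present at the earliest cut-off date
first_date <- min(loans$CutOffDate)
first_ids  <- unique(loans$LoanID[loans$CutOffDate == first_date])

# Keep only those loans at every cut-off date, without splitting the table
tracked <- loans[loans$LoanID %in% first_ids, ]
```

With several pools, the same filter could presumably be applied within each pool, using that pool's own earliest cut-off date instead of the global minimum.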
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
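A rough sketch of one alternative layout: a "long" data frame with one row per runner per race, which can hold name and grade alongside time and score and can be stacked across races by date. All team names, runner names, values and dates below are made up for illustration.

```r
# One toy race, one row per runner; every value here is hypothetical
race1 <- data.frame(
  team  = c("North", "North", "South", "South"),
  name  = c("A. Lee", "B. Cruz", "C. Diaz", "D. Park"),
  grade = c(11, 12, 10, 12),
  time  = c(16.5, 17.1, 16.9, 17.4),  # minutes
  score = c(1, 3, 2, 4),
  stringsAsFactors = FALSE
)
# A second toy race: same runners, slightly faster times
race2 <- transform(race1, time = time - 0.2)

races <- list(race1, race2)
dates <- as.Date(c("2019-09-07", "2019-09-21"))

# Tag each race with its date, then stack everything into one table
tagged <- lapply(seq_along(races), function(i) {
  data.frame(date = dates[i], races[[i]], stringsAsFactors = FALSE)
})
all_races <- do.call(rbind, tagged)

# Team score per race date, ready to plot over time
team_totals <- aggregate(score ~ date + team, data = all_races, FUN = sum)
```

Keeping runner-level rows means team-level columns such as team_score can be recomputed whenever needed, rather than being stored in separate matrices.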
So I have one dataset (DF1) that includes baseball players, the year, and their stats in that year. I have another (DF2) that lists the players, the year, and their salary in that year.
I would like to add the salary column information to DF1 when player name AND year match in both datasets.
I tried
DF1$Salary <- DF2$salary[match(DF1$playerID, DF2$playerID)]
But I realized that with this approach the salary is only correct for each player's first year. I need the match to happen only when both the year and the player ID are the same. Can someone help me? Thanks!
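A minimal sketch of matching on both keys in base R. The column name yearID and the toy values are assumptions; playerID and salary come from the attempt above.

```r
# Toy data; yearID and all values are assumed for illustration
DF1 <- data.frame(playerID = c("a01", "a01", "b02"),
                  yearID   = c(2001, 2002, 2001),
                  ERA      = c(3.10, 2.85, 4.20),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(playerID = c("a01", "a01", "b02"),
                  yearID   = c(2001, 2002, 2001),
                  salary   = c(500000, 750000, 300000),
                  stringsAsFactors = FALSE)

# Option 1: left join on both keys (rows of DF1 without a salary get NA)
merged <- merge(DF1, DF2, by = c("playerID", "yearID"), all.x = TRUE)

# Option 2: keep DF1's row order by matching on a combined player/year key
key1 <- paste(DF1$playerID, DF1$yearID)
key2 <- paste(DF2$playerID, DF2$yearID)
DF1$Salary <- DF2$salary[match(key1, key2)]
```

Note that merge() may reorder rows, while the match() variant leaves DF1 as it was and only adds a column.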
Firstly, I'm new to R and I apologize. So I'm working with data involving prescriptions. Since it's on a secure VM, I can't copy and paste, but the data structure looks like this:
Patient ID | Medication | Start Date | End Date
There are multiple rows for each patient, since each patient has been prescribed more than one medication.
What I want to do is the following:
For each patient, find out which medications overlap each other in time, and then return how many overlapping prescriptions that patient has. Is there a way to do this in R?
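One way this could be approached in base R is a self-join per patient followed by an interval-overlap test. This is only a sketch: the column names are assumed versions of the layout above, the dates are invented, and end dates are treated as inclusive.

```r
# Toy data; column names and dates are assumed for illustration
rx <- data.frame(
  PatientID  = c(1, 1, 1, 2, 2),
  Medication = c("A", "B", "C", "A", "D"),
  StartDate  = as.Date(c("2020-01-01", "2020-01-15", "2020-03-01",
                         "2020-01-01", "2020-02-01")),
  EndDate    = as.Date(c("2020-01-31", "2020-02-15", "2020-03-31",
                         "2020-01-31", "2020-02-28")),
  stringsAsFactors = FALSE
)
rx$row <- seq_len(nrow(rx))  # row id, used to keep each pair only once

# Pair every prescription with every other one for the same patient
pairs <- merge(rx, rx, by = "PatientID", suffixes = c(".x", ".y"))

# Two date ranges overlap when each one starts before the other ends
overlaps <- pairs[pairs$row.x < pairs$row.y &
                  pairs$StartDate.x <= pairs$EndDate.y &
                  pairs$StartDate.y <= pairs$EndDate.x,
                  c("PatientID", "Medication.x", "Medication.y")]

# Number of overlapping pairs per patient, including patients with none
overlap_counts <- table(factor(overlaps$PatientID,
                               levels = unique(rx$PatientID)))
```

Here overlaps lists which medications overlap and overlap_counts gives the number of overlapping pairs per patient. The self-join grows quadratically with the number of prescriptions per patient, so it may need rethinking for very large data.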
I am new to R but finding it a powerful solution when working with education data across a state. I have grades for about 11,000 students over the span of two years. Most students have 12 rows in my dataset, as most schools work on a semester system. Many schools, however, work on a trimester or quarter system, meaning there are more or fewer rows and, therefore, more or fewer grades. The grades are relatively close throughout each semester/trimester/quarter, and I have already converted the letter grades into numeric values. A column titled 'TERM' identifies which system the school is under (SEM1/2, TRI1/2/3, QTR1/2/3/4). I am wondering if anyone has an idea as to how best to organize this data by TERM so I have something normalized.
df <- data.frame(STUDENT = c('stu1', 'stu1', 'stu2', 'stu2', 'stu2'), TERM = c('sem1', 'sem2', 'tri1', 'tri2', 'tri3'), LETTER = c('a', 'c', 'a', 'b', 'a'), GRADE = c(4, 2, 4, 3, 4))
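One possible way to put the different systems on a common footing, sketched under assumptions: split TERM into the system and the term number, express each term as a fraction of the school year, and also compute one mean grade per student. The column names and the terms-per-year lookup below are illustrative choices, not something given in the question.

```r
# Toy data modelled on the example above; column names are assumed
grades <- data.frame(
  student = c("stu1", "stu1", "stu2", "stu2", "stu2"),
  term    = c("sem1", "sem2", "tri1", "tri2", "tri3"),
  grade   = c(4, 2, 4, 3, 4),
  stringsAsFactors = FALSE
)

# Split TERM into the system (sem/tri/qtr) and the term number
grades$system <- tolower(sub("[0-9]+$", "", grades$term))
grades$index  <- as.integer(sub("^[A-Za-z]+", "", grades$term))

# Terms per year for each system (assumed lookup)
n_terms <- c(sem = 2, tri = 3, qtr = 4)

# Position of each term within the year, comparable across systems
grades$year_frac <- unname(grades$index / n_terms[grades$system])

# One summary value per student, regardless of how many terms they have
student_mean <- aggregate(grade ~ student, data = grades, FUN = mean)
```

Whether a per-student mean or a position-in-year variable is the better normalization depends on the analysis; both are shown only as starting points.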