qqPlotting a subset of data in R

I have R data that looks like this.
Year Total
2005 238.79
2005 165.46
2005 196.07
2005 135.28
2005 180.30
2005 237.95
2005 714.74
2005 828.19
2005 516.19
2005 279.76
2005 281.88
2005 338.68
The leftmost column, Year, goes from 2005 to 2009. I want to do a qqPlot of Total using only the rows that have 2005 in the Year column. How can I do this?

Another option is to use subset(), which might seem more natural:
tmp <- subset(dat, subset = Year == 2005, select = Total)
# subset() returns a data frame here, so pull out the numeric column
qqnorm(tmp$Total)
qqline(tmp$Total)
Do note that subset() is not recommended for use in programming, as the syntactic sugar that makes it work (evaluating Year inside the data frame) can break when it runs inside other functions or environments. Using it interactively like this is what subset() was designed for.
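For use inside a function, plain indexing is the safer route. Here is a minimal sketch, assuming a data frame with Year and Total columns like the one in the question; the helper name total_for_year() is made up for illustration and is not part of any package:

```r
# Hypothetical helper (not from the original answer): returns the Total
# values for one year using standard indexing instead of subset().
total_for_year <- function(data, yr) {
  # Columns are referenced explicitly through `data`, so nothing depends
  # on the environment the function happens to be called from.
  data[data$Year == yr, "Total"]
}

# Small made-up example data in the same shape as the question's.
dat <- data.frame(
  Year  = c(2005, 2005, 2008, 2009),
  Total = c(238.79, 165.46, 714.74, 279.76)
)

x <- total_for_year(dat, 2005)

# plot.it = FALSE computes the QQ coordinates without drawing anything;
# drop that argument to get the actual plot.
qq <- qqnorm(x, plot.it = FALSE)
```

The result x is an ordinary numeric vector, so it can go straight into qqnorm() and qqline().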

First, some example data:
dat <- read.table(text="Year Total
2005 238.79
2005 165.46
2005 196.07
2005 135.28
2005 180.30
2005 237.95
2008 714.74
2008 828.19
2008 516.19
2009 279.76
2009 281.88
2009 338.68", header = TRUE)
If you want a normal QQ plot:
qqnorm(dat[dat$Year == 2005, "Total"])

Related

How can I arrange a group within a data frame based on year?

I have a data frame ("df") which I want to order based on year for a specific group based on Ticker.
year Ticker at
2009 FLWS 286.127
2003 FLWS 214.796
2007 FLWS 352.507
2008 FLWS 371.338
2004 FLWS 261.552
2005 FLWS 251.952
2010 FLWS 256.086
2011 FLWS 256.951
2006 FLWS 346.634
2007 SRCE 4447.104
2009 SRCE 4542.100
2003 SRCE 3330.153
2010 SRCE 4445.281
2011 SRCE 4374.071
2005 SRCE 3511.277
I want to have the data frame in order of year (ascending) for each group of Ticker. I've tried using base R (order) and the dplyr package (group_by, arrange) but I am a complete newbie to any sort of coding so needless to say I have been struggling.
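One possible approach in base R, as a sketch only: order() accepts several keys, so sorting by Ticker first and year second gives ascending years within each Ticker. The data frame below copies a few rows from the question; the name sorted is just for illustration.

```r
# A few rows taken from the question's data.
df <- data.frame(
  year   = c(2009, 2003, 2007, 2007, 2009, 2003),
  Ticker = c("FLWS", "FLWS", "FLWS", "SRCE", "SRCE", "SRCE"),
  at     = c(286.127, 214.796, 352.507, 4447.104, 4542.100, 3330.153),
  stringsAsFactors = FALSE
)

# order() sorts by the first key (Ticker) and breaks ties with the
# second key (year); both are ascending by default.
sorted <- df[order(df$Ticker, df$year), ]

# Reset the row names so they run 1..n in the new order.
rownames(sorted) <- NULL
```

With dplyr loaded, arrange(df, Ticker, year) should give the same row order; group_by() is not needed just for sorting.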

How do I invert the order of a variable in a tibble?

I have some fantasy football data from my league. 12 teams x 8 years = 96 observations. I'm trying to create tibble(year, team, record). The team and record variables are organized correctly, but my year column is in the wrong order. Its current order is below, but I need to reverse it so that 2019 starts at the top and 2012 is the last observation. Each value in the year column repeats 12 times since there are 12 teams. There are no NA values. Thanks in advance.
year team record
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2013
2013
2013
.
.
.
2019
I'm dumb, this was quite easy. I'll leave it for others and I'll accept any other answer that works. I just inverted year numerically. year <- year[96:1] then did tibble(year, team, record)
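A slightly more general version of the same idea, as a sketch: rev() reverses a vector without hard-coding its length. The year vector below is rebuilt from the description in the question (8 seasons, 12 teams each), so it is an assumption about the real data.

```r
# Rebuild the year column as described: each season repeated 12 times,
# in ascending order (2012 first, 2019 last).
year <- rep(2012:2019, each = 12)

# rev() reverses the vector; equivalent to year[96:1] for 96 values.
year_desc <- rev(year)
```

After this, tibble(year = year_desc, team, record) would work as in the post, assuming team and record are already in the matching order.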

Functions on a Matrix in R

Let's say I have a dataset with a column representing the years.
Years
2007
2008
2009
2011
2015
I want to subtract each row from the row below it and save the result in a new column. For the data above, I want a function that subtracts 2007 from 2008 (the answer is 1) and saves it in a new column; the next would be 2009 - 2008, then 2011 - 2009. The resulting matrix should look like
Year Gap
2007 1
2008 1
2009 2
2011 4
2015 .
and so on
How can I make a function in R that will do this for me?
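One way this could be done, as a sketch: base R's diff() already computes the difference between each element and the one before it. The wrapper name add_gap() is made up for illustration, and NA stands in for the "." shown in the last row of the desired output.

```r
# Hypothetical helper: returns the gap to the next value for each element.
add_gap <- function(x) {
  # diff() gives x[i + 1] - x[i], which is one element shorter than x,
  # so pad with NA for the last row (it has no row below it).
  c(diff(x), NA)
}

dat <- data.frame(Years = c(2007, 2008, 2009, 2011, 2015))
dat$Gap <- add_gap(dat$Years)
```

For the example years this gives gaps of 1, 1, 2 and 4, with NA in the final row.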

Create a local id for a combination of 2 columns [duplicate]

I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:
business_id year
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2009
UsFtqoBl7naz8AVUBZMjQQ 2004
UsFtqoBl7naz8AVUBZMjQQ 2005
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2008
cE27W9VPgO88Qxe4ol6y_g 2010
I want to turn it into this:
business_id year yr_id
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2009 2
UsFtqoBl7naz8AVUBZMjQQ 2004 1
UsFtqoBl7naz8AVUBZMjQQ 2005 2
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2008 2
cE27W9VPgO88Qxe4ol6y_g 2010 3
In other words, I want the ID to be sequential to the year, but local to the business_id, so that it resets when the program finds another business_id.
Is this something that is easily achievable in R?
I found this other question in SO, and the answer effectively answers this question, so this should be marked as duplicate.
https://stackoverflow.com/a/27896841/4858065
The way to achieve this is:
library(dplyr)
df %>% group_by(business_id) %>%
  mutate(year_id = dense_rank(year))
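If dplyr is not available, a base R sketch of the same dense ranking is possible with ave(). This is an alternative to, not a copy of, the linked answer; the short business_id values below are placeholders for the real ids.

```r
# Placeholder ids ("a", "b") stand in for the long business_id strings.
df <- data.frame(
  business_id = c("a", "a", "a", "b", "b"),
  year        = c(2007, 2007, 2009, 2004, 2005),
  stringsAsFactors = FALSE
)

# Within each business_id, map every year to its position among the
# sorted unique years of that group. Repeated years share an id and
# the ids have no gaps, which is what dense_rank() does.
df$yr_id <- ave(df$year, df$business_id,
                FUN = function(y) match(y, sort(unique(y))))
```

The id restarts at 1 for every new business_id, as requested in the question.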

Alter output of ddply

Is it possible to alter the output of ddply? I wondered if it was possible to present the unique results for a subset on ONE row instead of giving each result a new row. E.g.
ID Season Year
5074 Summer 2008
5074 Summer 2009
5074 Winter 2008
5074 Winter 2009
5074 Winter 2010
Into...
ID Season Year
5074 Summer 2008,2009
5074 Winter 2008,2009,2010
I often use ddply to manually diagnose the results of for-loops etc., and presenting the results like this would reduce the length of the output and make the check go much faster.
Cheers!
First load in the data
dd = read.table(textConnection("ID Season Year
5074 Summer 2008
5074 Summer 2009
5074 Winter 2008
5074 Winter 2009
5074 Winter 2010"), header=TRUE)
then just use ddply as normal, splitting by ID and Season
library(plyr)
ddply(dd, .(ID, Season), summarise, Year=paste(Year, collapse=","))
We use the collapse argument in paste to return a single character string per group. Since you want to use this as a check, it might be worth using sort on Year, i.e.
paste(sort(Year), collapse=",")
dat <- read.table(text="ID Season Year
5074 Summer 2008
5074 Summer 2009
5074 Winter 2008
5074 Winter 2009
5074 Winter 2010", header = TRUE)
The output can be transformed using aggregate:
aggregate(Year ~ ID + Season, data = dat, paste)
# ID Season Year
#1 5074 Summer 2008, 2009
#2 5074 Winter 2008, 2009, 2010
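If a single comma-separated string per row is wanted from aggregate as well, a small variation should do it. This is a sketch, not part of the original answer: it passes a function that sorts and collapses the years, and the name res is only for illustration.

```r
dat <- read.table(text = "ID Season Year
5074 Summer 2008
5074 Summer 2009
5074 Winter 2008
5074 Winter 2009
5074 Winter 2010", header = TRUE)

# collapse = "," joins the years of each group into one string;
# sort() makes the order inside the string predictable.
res <- aggregate(Year ~ ID + Season, data = dat,
                 FUN = function(y) paste(sort(y), collapse = ","))
```

Here Year ends up as an ordinary character column with one string per ID and Season combination.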
This is a perfect fit for the new nice printing of lists in data.table version 1.8.2
library(data.table)
DT <- as.data.table(dd)
DT[,list(Year = list(Year)), by = list(ID, Season)]
## ID Season Year
## 1: 5074 Summer 2008,2009
## 2: 5074 Winter 2008,2009,2010
The good thing about the results in this format is the fact that it is just the printing that is affected, you can still access the results without any string splitting
DT[(ID==5074)&(Season == 'Summer'), Year]
## [1] 2008 2009
DT[(ID==5074)&(Season == 'Winter'), Year]
## [1] 2008 2009 2010
