Functions on a Matrix in R

Let's say I have a dataset with a column representing the years.
Years
2007
2008
2009
2011
2015
I want to subtract each row from the row below it and save the answer to a new column. For the data above, I want a function that subtracts 2007 from 2008 (the answer is 1) and saves this answer to a new column; the next values would be 2009 - 2008 and 2011 - 2009. The resulting matrix should look like this:
Year Gap
2007 1
2008 1
2009 2
2011 4
2015 .
and so on
How can I make a function in R that will do this for me?
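One way to do this is with base R's diff(), which returns the differences between consecutive elements (one fewer than the input). Padding with NA gives the last row, which has no following year, an empty gap. This is a minimal sketch assuming Year is numeric; the helper name add_gap is hypothetical.

```r
# Hypothetical helper: gap between each year and the next one
add_gap <- function(years) {
  c(diff(years), NA)  # last year has no successor, so its gap is NA
}

# Data-frame version
df <- data.frame(Year = c(2007, 2008, 2009, 2011, 2015))
df$Gap <- add_gap(df$Year)
print(df)

# Matrix version: cbind() a named column onto a one-column matrix
m <- matrix(c(2007, 2008, 2009, 2011, 2015), ncol = 1,
            dimnames = list(NULL, "Year"))
m <- cbind(m, Gap = add_gap(m[, "Year"]))
print(m)
```

If you want the gap to the previous year instead (NA in the first row), use c(NA, diff(years)).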

Related

How can I arrange a group within a data frame based on year?

I have a data frame (df) that I want to sort by year within each group defined by Ticker.
year Ticker at
2009 FLWS 286.127
2003 FLWS 214.796
2007 FLWS 352.507
2008 FLWS 371.338
2004 FLWS 261.552
2005 FLWS 251.952
2010 FLWS 256.086
2011 FLWS 256.951
2006 FLWS 346.634
2007 SRCE 4447.104
2009 SRCE 4542.100
2003 SRCE 3330.153
2010 SRCE 4445.281
2011 SRCE 4374.071
2005 SRCE 3511.277
I want the data frame sorted by year (ascending) within each Ticker group. I've tried base R (order) and the dplyr package (group_by, arrange), but I am a complete newbie to any sort of coding, so needless to say I have been struggling.
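A minimal base-R sketch using the data from the question: order() sorts by its first argument and breaks ties with the next one, so ordering by Ticker and then by year keeps each Ticker's rows together in ascending year.

```r
df <- data.frame(
  year   = c(2009, 2003, 2007, 2008, 2004, 2005, 2010, 2011, 2006,
             2007, 2009, 2003, 2010, 2011, 2005),
  Ticker = c(rep("FLWS", 9), rep("SRCE", 6)),
  at     = c(286.127, 214.796, 352.507, 371.338, 261.552, 251.952,
             256.086, 256.951, 346.634,
             4447.104, 4542.100, 3330.153, 4445.281, 4374.071, 3511.277)
)

# Sort by Ticker first, then by year within each Ticker (both ascending)
df_sorted <- df[order(df$Ticker, df$year), ]
rownames(df_sorted) <- NULL  # optional: renumber rows 1..n
print(df_sorted)
```

With dplyr, the equivalent is df %>% arrange(Ticker, year). Note that arrange() ignores group_by() groups unless you pass .by_group = TRUE, which may be why group_by() followed by arrange(year) did not seem to work.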

How do I invert the order of a variable in a tibble?

I have some fantasy football data from my league: 12 teams x 8 years = 96 observations. I'm trying to create tibble(year, team, record). The team and record variables are in the correct order, but my year column is not. Its current order is shown below; I need to reverse it so that 2019 is at the top and 2012 is the last observation. Each value in the year column repeats 12 times, since there are 12 teams. There are no NA values. Thanks in advance.
year team record
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2012
2013
2013
2013
.
.
.
2019
I'm dumb, this was quite easy. I'll leave the question up for others, and I'll accept any other answer that works. I just reversed the year vector with year <- year[96:1] and then called tibble(year, team, record).
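A sketch of the same idea without hard-coding 96, assuming (as in the question) years 2012 to 2019 with each year repeated once per team. Base R's rev() reverses a whole vector; building the descending sequence directly with rep() is another option.

```r
# Assumed example: 8 seasons (2012-2019), each year repeated for 12 teams
year <- rep(2012:2019, each = 12)

# rev() reverses the whole vector without hard-coding its length (96)
year_rev <- rev(year)

# Same result as indexing backwards over every position
year_idx <- year[length(year):1]

# Alternatively, generate the descending sequence directly
year_desc <- rep(2019:2012, each = 12)
```

Reversing only year works here because team and record are already in the target order. If the whole table needs flipping instead, reverse all rows together, e.g. df[nrow(df):1, ].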

Create a local id for a combination of 2 columns [duplicate]

This question already has answers here:
R - add column that counts sequentially within groups but repeats for duplicates
I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:
business_id year
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2009
UsFtqoBl7naz8AVUBZMjQQ 2004
UsFtqoBl7naz8AVUBZMjQQ 2005
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2008
cE27W9VPgO88Qxe4ol6y_g 2010
I want to turn it into this:
business_id year yr_id
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2009 2
UsFtqoBl7naz8AVUBZMjQQ 2004 1
UsFtqoBl7naz8AVUBZMjQQ 2005 2
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2008 2
cE27W9VPgO88Qxe4ol6y_g 2010 3
In other words, I want the ID to follow the order of the years but be local to each business_id, so that it restarts at 1 whenever a new business_id begins.
Is this something that is easily achievable in R?
I found this other question on SO, and its answer effectively answers this question, so this should be marked as a duplicate.
https://stackoverflow.com/a/27896841/4858065
The way to achieve this is:
library(dplyr)
df %>% group_by(business_id) %>%
  mutate(yr_id = dense_rank(year))
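If you would rather not depend on dplyr, here is a base-R sketch of the same dense rank within groups using ave(): for each business_id, match() gives the position of each year among that group's sorted distinct years, so tied years share an ID and the count restarts for every business. The data is the example from the question.

```r
df <- data.frame(
  business_id = c(rep("vcNAWiLM4dR7D2nwwJ7nCA", 3),
                  rep("UsFtqoBl7naz8AVUBZMjQQ", 2),
                  rep("cE27W9VPgO88Qxe4ol6y_g", 4)),
  year = c(2007, 2007, 2009, 2004, 2005, 2007, 2007, 2008, 2010)
)

# Dense rank of year within each business_id:
# position of each year among that group's sorted distinct years
df$yr_id <- ave(df$year, df$business_id,
                FUN = function(y) match(y, sort(unique(y))))
print(df)
```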

Simple filtering in R, but with more than one value

I am well aware of how to extract some data based on a condition, but whenever I try multiple conditions, a struggle ensues. I have some data and I only want to extract certain years from the df. Here is an example df:
year value
2006 3
2007 4
2007 3
2008 5
2008 4
2008 4
2009 5
2009 9
2010 2
2010 8
2011 3
2011 8
2011 7
2012 3
2013 4
2012 6
Now let's say I just want 2008, 2009, 2010, and 2011. I try
df<-df[df$year == c("2008", "2009", "2010", "2011"),]
That doesn't work, so then I tried:
df<-df[df$year == "2008" & df$year == "2009"
& df$year == "2010" & df$year == "2011",]
No error messages, just an empty df. What am I missing?
You need to use %in%, not ==:
df[df$year %in% c(2008, 2009, 2010, 2011),]
year value
4 2008 5
5 2008 4
6 2008 4
7 2009 5
8 2009 9
9 2010 2
10 2010 8
11 2011 3
12 2011 8
13 2011 7
As answered, %in% works, but so does |. The & operator is AND logic, meaning that the year would need to equal 2008, 2009, 2010 AND 2011 at the same time. No single value can do that, so every row is dropped. What you want is the OR operator:
df<-df[df$year == "2008" | df$year == "2009" | df$year == "2010" | df$year == "2011",]
If you don't like %in%, try the function is.element. You might find it more intuitive.
df[is.element(el=df[,"year"], table=2008:2011),]
Careful, though: switching el and table gives different results, and it can be confusing which way round you want them. For this example, just remember that table holds the years you want to keep.
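A quick check, using the example data from the question, that the approaches in these answers select the same rows. subset() is included for comparison; the variable names are only for illustration.

```r
df <- data.frame(
  year  = c(2006, 2007, 2007, 2008, 2008, 2008, 2009, 2009,
            2010, 2010, 2011, 2011, 2011, 2012, 2013, 2012),
  value = c(3, 4, 3, 5, 4, 4, 5, 9, 2, 8, 3, 8, 7, 3, 4, 6)
)
keep <- 2008:2011

by_in      <- df[df$year %in% keep, ]
by_or      <- df[df$year == 2008 | df$year == 2009 |
                 df$year == 2010 | df$year == 2011, ]
by_element <- df[is.element(el = df$year, table = keep), ]
by_subset  <- subset(df, year %in% keep)

# Swapping the arguments asks a different question:
# "which of the years in keep occur in df$year?" (length 4, not 16)
swapped <- is.element(el = keep, table = df$year)
```

The swapped call returns one TRUE/FALSE per element of keep, so it cannot be used to index the rows of df.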
The question has been answered, but I wanted to add a comment about why your first attempt gives an unexpected result. It is a good example of R's vector recycling.
I'm guessing you got
year value
5 2008 4
12 2011 8
Why has R done this? R recycles the vector c("2008", "2009", "2010", "2011") to the length of df$year and compares element by element, like this:
year value compare
2006 3 2008
2007 4 2009
2007 3 2010
2008 5 2011
2008 4 2008
2008 4 2009
2009 5 2010
2009 9 2011
2010 2 2008
2010 8 2009
2011 3 2010
2011 8 2011
2011 7 2008
2012 3 2009
2013 4 2010
2012 6 2011
Do you see what's about to happen? When you run
df<-df[df$year == c("2008", "2009", "2010", "2011"),]
it returns the rows where the year column and the compare column are equal. You didn't get a warning because, by chance, the length of your comparison vector (4) divides the number of rows (16), so R assumed the recycling was intentional.
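A small sketch that makes the recycling visible, using the question's data. rep_len() builds the recycled compare column explicitly; the comparison uses numeric years rather than the original strings, which gives the same result here because == coerces both sides to a common type.

```r
df <- data.frame(
  year  = c(2006, 2007, 2007, 2008, 2008, 2008, 2009, 2009,
            2010, 2010, 2011, 2011, 2011, 2012, 2013, 2012),
  value = c(3, 4, 3, 5, 4, 4, 5, 9, 2, 8, 3, 8, 7, 3, 4, 6)
)
wanted <- c(2008, 2009, 2010, 2011)

# What == actually does: recycle `wanted` to 16 elements, then compare pairwise
compare <- rep_len(wanted, nrow(df))
eq_rows <- which(df$year == wanted)   # same as which(df$year == compare)

# What %in% does: for each year, is it anywhere in `wanted`?
in_rows <- which(df$year %in% wanted)

print(eq_rows)
print(in_rows)
```

Rows 5 and 12 are the only positions where the recycled value happens to equal the year.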
This is essentially the same as @Metrics's answer:
subset(df, year %in% c(2008, 2009, 2010, 2011))
And if you need help with %in%, see ?match (%in% is documented on the same help page).

qqPlotting a subset of data in R

I have R data that looks like this.
Year Total
2005 238.79
2005 165.46
2005 196.07
2005 135.28
2005 180.30
2005 237.95
2005 714.74
2005 828.19
2005 516.19
2005 279.76
2005 281.88
2005 338.68
The leftmost column, Year, goes from 2005 to 2009. I want to make a qqPlot of Total using only the rows that have 2005 in the Year column. How can I do this?
Another option is to use subset(), which might seem more natural:
# $Total turns the one-column result into a numeric vector, which qqnorm() needs
tmp <- subset(dat, subset = Year == 2005, select = Total)$Total
qqnorm(tmp)
qqline(tmp)
Do note that subset() is not recommended for programming: the non-standard evaluation that makes it convenient can break when it is called inside other functions or environments. Using it interactively, as here, is what subset() was designed for.
First, some example data:
dat <- read.table(text="Year Total
2005 238.79
2005 165.46
2005 196.07
2005 135.28
2005 180.30
2005 237.95
2008 714.74
2008 828.19
2008 516.19
2009 279.76
2009 281.88
2009 338.68", header = TRUE)
If you want a normal QQ plot:
qqnorm(dat[dat$Year == 2005, "Total"])
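A fuller sketch with the example data above: extract Total for 2005 as a plain numeric vector, draw the normal QQ plot, and add a reference line with qqline(). qqnorm() also invisibly returns the plotted points. If by "qqPlot" you meant car::qqPlot from the car package, the same subsetted vector can be passed to it.

```r
dat <- read.table(text = "Year Total
2005 238.79
2005 165.46
2005 196.07
2005 135.28
2005 180.30
2005 237.95
2008 714.74
2008 828.19
2008 516.19
2009 279.76
2009 281.88
2009 338.68", header = TRUE)

# Pull out Total for Year 2005 as a plain numeric vector
totals_2005 <- dat$Total[dat$Year == 2005]

# Normal QQ plot plus reference line; qqnorm() invisibly returns the points
qq <- qqnorm(totals_2005, main = "Normal Q-Q plot: Total, 2005")
qqline(totals_2005)
```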
