Creating a column of unique identifiers and sequencing along subjects - r

I have a data set where I am attempting to sequence along subject observations and then create a column that provides their birth year. The data looks like this:
Name <- c("Joe Smith", "Joe Smith","Joe Smith","Joe Smith", "Tom Watson", "Tom Watson", "Tom Watson", "Carl Nelle", "Carl Nelle", "Carl Nelle", "Carl Nelle", "Joe Smith", "Joe Smith", "Joe Smith", "Joe Smith")
Year <- c(2001, 2002, 2003, 2004, 2014, 2015, 2016, 2006, 2007, 2008, 2009, 1997, 1998, 1999, 2000)
Var1 <- round(rnorm(n = length(Name), mean = 10, sd = 2), 1)
Var2 <- round(rnorm(n = length(Name), mean = 30, sd = 10), 0)
data <- data.frame(Name, Year, Var1, Var2)
data
Name Year Var1 Var2
1 Joe Smith 2001 8.9 23
2 Joe Smith 2002 9.8 45
3 Joe Smith 2003 11.1 43
4 Joe Smith 2004 11.7 63
5 Tom Watson 2014 11.7 47
6 Tom Watson 2015 13.2 28
7 Tom Watson 2016 9.5 30
8 Carl Nelle 2006 9.5 44
9 Carl Nelle 2007 11.2 32
10 Carl Nelle 2008 12.2 24
11 Carl Nelle 2009 5.6 15
12 Joe Smith 1997 10.5 38
13 Joe Smith 1998 10.3 14
14 Joe Smith 1999 9.2 27
15 Joe Smith 2000 7.1 49
I used the dplyr package to create the sequence of observations for each subject like so:
library(dplyr)

data <- data %>%
  group_by(Name) %>%
  mutate(id = row_number())
Name Year Var1 Var2 id
1 Joe Smith 2001 8.9 23 1
2 Joe Smith 2002 9.8 45 2
3 Joe Smith 2003 11.1 43 3
4 Joe Smith 2004 11.7 63 4
5 Tom Watson 2014 11.7 47 1
6 Tom Watson 2015 13.2 28 2
7 Tom Watson 2016 9.5 30 3
8 Carl Nelle 2006 9.5 44 1
9 Carl Nelle 2007 11.2 32 2
10 Carl Nelle 2008 12.2 24 3
11 Carl Nelle 2009 5.6 15 4
12 Joe Smith 1997 10.5 38 5
13 Joe Smith 1998 10.3 14 6
14 Joe Smith 1999 9.2 27 7
15 Joe Smith 2000 7.1 49 8
My first problem with this is that the second Joe Smith doesn't get his own id sequence. This is a problem because several people in the dataset can have the same name. Is there a way to correct this?
The second issue is that I need to create a column called "Birth.Year", which holds the first year the person appears in the database. It would look like this:
Name Year Var1 Var2 id Birth.Year
1 Joe Smith 2001 8.9 23 1 2001
2 Joe Smith 2002 9.8 45 2 2001
3 Joe Smith 2003 11.1 43 3 2001
4 Joe Smith 2004 11.7 63 4 2001
5 Tom Watson 2014 11.7 47 1 2014
6 Tom Watson 2015 13.2 28 2 2014
7 Tom Watson 2016 9.5 30 3 2014
8 Carl Nelle 2006 9.5 44 1 2006
9 Carl Nelle 2007 11.2 32 2 2006
10 Carl Nelle 2008 12.2 24 3 2006
11 Carl Nelle 2009 5.6 15 4 2006
12 Joe Smith 1997 10.5 38 5 1997
13 Joe Smith 1998 10.3 14 6 1997
14 Joe Smith 1999 9.2 27 7 1997
15 Joe Smith 2000 7.1 49 8 1997
Is there a way to accomplish these tasks in dplyr or do I need to write a specific function?

Here's a way using the lag function, which lets us check whether each Name matches the previous Name. Note that we need to replace the first value (which is NA) with FALSE.
This solution assumes that if rows with the same Name aren't grouped together, they belong to different people.
data <- data.frame(Name, Year, Var1, Var2, stringsAsFactors = FALSE)
data %>%
  mutate(Foo1 = Name != lag(Name),
         Foo2 = cumsum(ifelse(is.na(Foo1), FALSE, Foo1))) %>%
  group_by(Name, Foo2) %>%
  mutate(id = row_number(),
         BirthYear = min(Year))
Name Year Var1 Var2 Foo1 Foo2 id BirthYear
<chr> <dbl> <dbl> <dbl> <lgl> <int> <int> <dbl>
1 Joe Smith 2001 9.0 30 NA 0 1 2001
2 Joe Smith 2002 11.8 47 FALSE 0 2 2001
3 Joe Smith 2003 6.9 23 FALSE 0 3 2001
4 Joe Smith 2004 8.6 37 FALSE 0 4 2001
5 Tom Watson 2014 10.7 35 TRUE 1 1 2014
6 Tom Watson 2015 9.4 30 FALSE 1 2 2014
7 Tom Watson 2016 7.5 25 FALSE 1 3 2014
8 Carl Nelle 2006 10.7 32 TRUE 2 1 2006
9 Carl Nelle 2007 6.6 25 FALSE 2 2 2006
10 Carl Nelle 2008 10.9 34 FALSE 2 3 2006
11 Carl Nelle 2009 13.5 18 FALSE 2 4 2006
12 Joe Smith 1997 10.1 34 TRUE 3 1 1997
13 Joe Smith 1998 12.0 34 FALSE 3 2 1997
14 Joe Smith 1999 7.3 40 FALSE 3 3 1997
15 Joe Smith 2000 10.8 26 FALSE 3 4 1997
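
If the data.table package is available, its rleid() helper gives the same run-based grouping a bit more directly. This is only a sketch of an alternative, with the same assumption as above that rows belonging to one physical person are stored consecutively:
library(dplyr)
library(data.table)  # for rleid()

data %>%
  group_by(person = rleid(Name)) %>%   # new group id every time Name changes
  mutate(id = row_number(),
         Birth.Year = min(Year)) %>%
  ungroup()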

Related

Error for NA using group_by or aggregate function [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate]

I've recently picked up R programming and have been looking through some group_by/aggregate questions posted here to help me learn. A question came to mind earlier today: how can group_by/aggregate handle NA data rather than 0?
Given the table and code below (credits to max_lim for allowing me to use his data set), what happens when NA fields exist (which happens quite often)?
Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,NA,16,12,8,NA,31,NA,3,20,39,34,27,50,NA,NA,NA,NA)
Duck = c(12,12,6,NA,NA,NA,28,13,31,50,33,20,NA,9,19,2,NA,7)
Chicken = c(100,120,80,50,NA,10,27,31,NA,43,NA,28,37,NA,NA,NA,5,43)
Sheep = c(30,20,10,NA,16,13,10,20,20,17,48,12,30,NA,20,NA,27,49)
Horse = c(25,20,16,11,NA,12,14,NA,43,42,10,12,42,NA,16,7,NA,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)
   Farm   Year Cow Duck Chicken Sheep Horse
1  Farm 1 2020  22   12     100    30    25
2  Farm 1 2020  NA   12     120    20    20
3  Farm 1 2019  16    6      80    10    16
4  Farm 1 2019  12   NA      50    NA    11
5  Farm 1 2018   8   NA      NA    16    NA
6  Farm 1 2018  NA   NA      10    13    12
7  Farm 2 2020  31   28      27    10    14
8  Farm 2 2020  NA   13      31    20    NA
9  Farm 2 2019   3   31      NA    20    43
10 Farm 2 2019  20   50      43    17    42
11 Farm 2 2018  39   33      NA    48    10
12 Farm 2 2018  34   20      28    12    12
13 Farm 3 2020  27   NA      37    30    42
14 Farm 3 2020  50    9      NA    NA    NA
15 Farm 3 2019  NA   19      NA    20    16
16 Farm 3 2019  NA    2      NA    NA     7
17 Farm 3 2018  NA   NA       5    27    NA
18 Farm 3 2018  NA    7      43    49    42
If I were to use aggregate(.~Farms + Year, Data, mean) here, I would get Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate, which I assume is because the mean function isn't able to account for NA.
Does anyone know how we can modify the aggregate/group_by call to account for the NA by calculating the average using only years without NA data? i.e.
2020: 10, 2019: NA, 2018: 20, 2017: NA, 2016: 15 -> the average (after discarding the NA years 2019 and 2017) will be (10 + 20 + 15) / 3 = 15.
The ideal output would be as follows:
  Farm   Year Cow                                 Duck                   Chicken Sheep Horse
  Farm 1 2020 22 (avg = 22/1 as one entry is NA)  12                     110     25    22.5
  Farm 1 2019 14                                  6                      65      10    13.5
  Farm 1 2018 8                                   N.A. (as it's all NA)  10      14.5  12
  Farm 2 2020 31                                  20.5                   29      15    14
  Farm 2 2019 11.5                                40.5                   43      18.5  42.5
  Farm 2 2018 36.5                                26.5                   28      30    11
  Farm 3 2020 ...                                 ...                    ...     ...   ...
  Farm 3 2019 ...                                 ...                    ...     ...   ...
  Farm 3 2018 ...                                 ...                    ...     ...   ...
Here is a way to create the desired data.frame. I think your expected output has one error in row 2 (Sheep), where mean(NA, 10) is equal to 5 and not 10.
library(dplyr)
Using aggregate
Data %>%
  aggregate(. ~ Year + Farms, ., FUN = mean, na.rm = TRUE, na.action = NULL) %>%
  arrange(Farms, desc(Year)) %>%
  as.data.frame() %>%
  mutate_if(is.numeric, ~replace(., is.nan(.), NA))
Using summarize
Data %>%
  group_by(Year, Farms) %>%
  summarize(MeanCow = mean(Cow, na.rm = TRUE),
            MeanDuck = mean(Duck, na.rm = TRUE),
            MeanChicken = mean(Chicken, na.rm = TRUE),
            MeanSheep = mean(Sheep, na.rm = TRUE),
            MeanHorse = mean(Horse, na.rm = TRUE)) %>%
  arrange(Farms, desc(Year)) %>%
  as.data.frame() %>%
  mutate_if(is.numeric, ~replace(., is.nan(.), NA))
Solution for both
Year Farms Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0 110 25.0 22.5
2 2019 Farm 1 14.0 6.0 65 10.0 13.5
3 2018 Farm 1 8.0 NA 10 14.5 12.0
4 2020 Farm 2 31.0 20.5 29 15.0 14.0
5 2019 Farm 2 11.5 40.5 43 18.5 42.5
6 2018 Farm 2 36.5 26.5 28 30.0 11.0
7 2020 Farm 3 38.5 9.0 37 30.0 42.0
8 2019 Farm 3 NA 10.5 NA 20.0 11.5
9 2018 Farm 3 NA 7.0 24 38.0 42.0
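
For what it's worth, on newer dplyr versions (1.0 or later) the per-column means can be written more compactly with across(). This is just a sketch using the same Data and column names as above:
library(dplyr)  # version 1.0 or later

Data %>%
  group_by(Farms, Year) %>%
  summarise(across(Cow:Horse, ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>%  # NA-ignoring means
  arrange(Farms, desc(Year)) %>%
  mutate(across(Cow:Horse, ~ replace(.x, is.nan(.x), NA)))                      # all-NA groups back to NA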

Match tables using 2 criteria in R

I have just started coding in R. I am trying to manipulate data but I have run into the following issue:
I have 2 different tables (simplified).
The first one (player_df) is as follows:
name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack
...
The second table (salary_df) is the salary, in millions, by club and experience:
experience FCB BAYERN Juve Real Chelsea
1 1.5 1.3 1 4 3
2 2.5 2 2.4 5 4
3 3.4 3.1 3.5 6.3 5
4 5 4.5 6.7 9 6
5 7.1 6.9 9 12 7
6 9 8 10 15 10
7 10 9 12 16 15
8 14 12 13 19 16
9 14.5 17 15 20 17
10 15 19 17 23 18
...
I would like to add a new column to the first table, named say salary_estimation, which takes two variables into account: here, experience and club.
For example, for "luc", who plays in "FCB" and has "2" years of experience, the output should be "2.5".
In Excel this would be an INDEX/MATCH formula, but in R I don't know which function to use.
How should I approach the problem?
Data:
df1 <- read.table(text = 'name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack', header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = 'experience FCB BAYERN Juve Real Chelsea
1 1.5 1.3 1 4 3
2 2.5 2 2.4 5 4
3 3.4 3.1 3.5 6.3 5
4 5 4.5 6.7 9 6
5 7.1 6.9 9 12 7
6 9 8 10 15 10
7 10 9 12 16 15
8 14 12 13 19 16
9 14.5 17 15 20 17
10 15 19 17 23 18', header = TRUE, stringsAsFactors = FALSE)
Code:
library(data.table)
setDT(df2)[, Chelsea := as.numeric(Chelsea)]  # make Chelsea double so all salary columns share one type
# reshape the salary table to long format: one row per (experience, Club)
df2 <- melt(df2, id.vars = "experience", variable.name = "Club", value.name = "Salary")
# look up each player's salary; players without a match get NA
df2[df1, on = c("experience", "Club"), nomatch = NA]
Output:
# experience Club Salary name age Position
# 1: 2 FCB 2.5 luc 18 Goalkeeper
# 2: 9 Real 20.0 jean 26 midfielder
# 3: 14 FCB NA ronaldo 32 Goalkeeper
# 4: 9 Real 20.0 jean 26 midfielder
# 5: 11 Liverpool NA messi 35 midfielder
# 6: 6 Chelsea 10.0 tevez 27 Attack
# 7: 9 Juve 15.0 inzaghi 34 Defender
# 8: 17 Bayern NA kwfni 40 Attack
# 9: 9 Real 20.0 Blabla 25 midfielder
# 10: 11 Liverpool NA wdfood 33 midfielder
# 11: 7 Chelsea 15.0 player2 28 Attack
# 12: 10 Juve 17.0 player3 34 Defender
# 13: 17 Bayern NA fgh 40 Attack
One possible solution is to join the first table (let's say it is player_df) with a "long format" version of the second table (salary_df), using experience and Club as keys. You can do this with the tidyverse package.
library(tidyverse)
player_df %>%
  mutate(Club = str_to_title(Club)) %>%
  left_join(
    salary_df %>%
      pivot_longer(-experience, names_to = "Club", values_to = "salary_estimation") %>%
      mutate(Club = str_to_title(Club)))
# Joining, by = c("experience", "Club")
# # A tibble: 13 x 6
# name experience Club age Position salary_estimation
# <chr> <dbl> <chr> <dbl> <chr> <dbl>
# 1 luc 2 Fcb 18 Goalkeeper 2.5
# 2 jean 9 Real 26 midfielder 20
# 3 ronaldo 14 Fcb 32 Goalkeeper NA
# 4 jean 9 Real 26 midfielder 20
# 5 messi 11 Liverpool 35 midfielder NA
# 6 tevez 6 Chelsea 27 Attack 10
# 7 inzaghi 9 Juve 34 Defender 15
# 8 kwfni 17 Bayern 40 Attack NA
# 9 Blabla 9 Real 25 midfielder 20
# 10 wdfood 11 Liverpool 33 midfielder NA
# 11 player2 7 Chelsea 28 Attack 15
# 12 player3 10 Juve 34 Defender 17
# 13 fgh 17 Bayern 40 Attack NA
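
For completeness, the Excel INDEX/MATCH idea can also be reproduced in base R with match() and matrix indexing. This is only a sketch (it assumes df1 and df2 as defined above, and note that club names are matched case-sensitively, so "Bayern" will not find the "BAYERN" column):
sal <- as.matrix(df2[-1])                              # salary grid with clubs as columns
row_idx <- match(df1$experience, df2$experience)       # row of the matching experience (NA if absent)
col_idx <- match(df1$Club, colnames(sal))              # column of the matching club (NA if absent)
df1$salary_estimation <- sal[cbind(row_idx, col_idx)]  # NA wherever either lookup fails
df1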

Filter a dataframe by keeping rows with dates of at least three days in a row, preferably with dplyr

I would like to filter a dataframe based on its date column. I would like to keep the rows where I have at least 3 consecutive days. I would like to do this as effeciently and quickly as possible, so if someone has a vectorized approached it would be good.
I tried to inspire myself from the following link, but it didn't really go well, as it is a different problem:
How to filter rows based on difference in dates between rows in R?
I tried to do it with a for loop, I managed to put an indicator on the dates who are not consecutive, but it didn't give me the desired result, because it keeps all dates that are in a row even if they are less than 3 in a row.
tf is my dataframe:
library(lubridate)  # for %m+% and days()

for(i in 2:(nrow(tf) - 1)){
  if(tf$Date[i] != tf$Date[i + 1] %m+% days(-1)){
    if(tf$Date[i] != tf$Date[i - 1] %m+% days(1)){
      tf$Date[i] = as.Date(0)
    }
  }
}
The first 22 rows of my dataframe look something like this:
Date RR.x RR.y Y
1 1984-10-20 1 10.8 1984
2 1984-11-04 1 12.5 1984
3 1984-11-05 1 7.0 1984
4 1984-11-09 1 22.9 1984
5 1984-11-10 1 24.4 1984
6 1984-11-11 1 19.0 1984
7 1984-11-13 1 5.9 1984
8 1986-10-15 1 10.3 1986
9 1986-10-16 1 18.1 1986
10 1986-10-17 1 11.3 1986
11 1986-11-17 1 14.1 1986
12 2003-10-17 1 7.8 2003
13 2003-10-25 1 7.6 2003
14 2003-10-26 1 5.0 2003
15 2003-10-27 1 6.6 2003
16 2003-11-15 1 26.4 2003
17 2003-11-20 1 10.0 2003
18 2011-10-29 1 10.0 2011
19 2011-11-04 1 11.4 2011
20 2011-11-21 1 9.8 2011
21 2011-11-22 1 5.6 2011
22 2011-11-23 1 20.4 2011
The result should be:
Date RR.x RR.y Y
4 1984-11-09 1 22.9 1984
5 1984-11-10 1 24.4 1984
6 1984-11-11 1 19.0 1984
8 1986-10-15 1 10.3 1986
9 1986-10-16 1 18.1 1986
10 1986-10-17 1 11.3 1986
13 2003-10-25 1 7.6 2003
14 2003-10-26 1 5.0 2003
15 2003-10-27 1 6.6 2003
20 2011-11-21 1 9.8 2011
21 2011-11-22 1 5.6 2011
22 2011-11-23 1 20.4 2011
One possibility could be:
df %>%
  mutate(Date = as.Date(Date, format = "%Y-%m-%d"),
         diff = c(0, diff(Date))) %>%
  group_by(grp = cumsum(diff > 1 & lead(diff, default = last(diff)) == 1)) %>%
  filter(if_else(diff > 1 & lead(diff, default = last(diff)) == 1, 1, diff) == 1) %>%
  filter(n() >= 3) %>%
  ungroup() %>%
  select(-diff, -grp)
Date RR.x RR.y Y
<date> <int> <dbl> <int>
1 1984-11-09 1 22.9 1984
2 1984-11-10 1 24.4 1984
3 1984-11-11 1 19 1984
4 1986-10-15 1 10.3 1986
5 1986-10-16 1 18.1 1986
6 1986-10-17 1 11.3 1986
7 2003-10-25 1 7.6 2003
8 2003-10-26 1 5 2003
9 2003-10-27 1 6.6 2003
10 2011-11-21 1 9.8 2011
11 2011-11-22 1 5.6 2011
12 2011-11-23 1 20.4 2011
Here's a base solution:
DF$Date <- as.Date(DF$Date)
rles <- rle(cumsum(c(1, diff(DF$Date) != 1)))  # run ids: a new id starts whenever the gap between dates is not 1 day
rles$values <- rles$lengths >= 3               # mark runs of at least 3 consecutive days
DF[inverse.rle(rles), ]                        # expand the marks back to rows and subset
Date RR.x RR.y Y
4 1984-11-09 1 22.9 1984
5 1984-11-10 1 24.4 1984
6 1984-11-11 1 19.0 1984
8 1986-10-15 1 10.3 1986
9 1986-10-16 1 18.1 1986
10 1986-10-17 1 11.3 1986
13 2003-10-25 1 7.6 2003
14 2003-10-26 1 5.0 2003
15 2003-10-27 1 6.6 2003
20 2011-11-21 1 9.8 2011
21 2011-11-22 1 5.6 2011
22 2011-11-23 1 20.4 2011
Similar approach in dplyr
DF %>%
  mutate(Date = as.Date(Date)) %>%
  add_count(IDs = cumsum(c(1, diff(Date) != 1))) %>%
  filter(n >= 3)
# A tibble: 12 x 6
Date RR.x RR.y Y IDs n
<date> <int> <dbl> <int> <dbl> <int>
1 1984-11-09 1 22.9 1984 3 3
2 1984-11-10 1 24.4 1984 3 3
3 1984-11-11 1 19 1984 3 3
4 1986-10-15 1 10.3 1986 5 3
5 1986-10-16 1 18.1 1986 5 3
6 1986-10-17 1 11.3 1986 5 3
7 2003-10-25 1 7.6 2003 8 3
8 2003-10-26 1 5 2003 8 3
9 2003-10-27 1 6.6 2003 8 3
10 2011-11-21 1 9.8 2011 13 3
11 2011-11-22 1 5.6 2011 13 3
12 2011-11-23 1 20.4 2011 13 3
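
The same run-length idea also translates to data.table, if that is preferred. A sketch, assuming DF as above:
library(data.table)

dt <- as.data.table(DF)
dt[, Date := as.Date(Date)]
dt[, run := cumsum(c(TRUE, diff(Date) != 1))]  # new run id whenever the gap to the previous date is not 1 day
dt[, if (.N >= 3) .SD, by = run][, !"run"]     # keep only runs of 3 or more consecutive days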

Remove rows with NA values and delete those observations in another year [duplicate]

This question already has answers here:
Filter rows in R based on values in multiple rows (2 answers). Closed 5 years ago.
I find it a bit hard to find the right words for what I'm trying to do.
Say I have this dataframe:
library(dplyr)
# A tibble: 74 x 3
country year conf_perc
<chr> <dbl> <dbl>
1 Canada 2017 77
2 France 2017 45
3 Germany 2017 60
4 Greece 2017 33
5 Hungary 2017 67
6 Italy 2017 38
7 Canada 2009 88
8 France 2009 91
9 Germany 2009 93
10 Greece 2009 NA
11 Hungary 2009 NA
12 Italy 2009 NA
Now I want to delete the rows that have NA values in 2009 but then I want to remove the rows of those countries in 2017 as well. I would like to get the following results:
# A tibble: 74 x 3
country year conf_perc
<chr> <dbl> <dbl>
1 Canada 2017 77
2 France 2017 45
3 Germany 2017 60
4 Canada 2009 88
5 France 2009 91
6 Germany 2009 93
We can use any() after grouping by 'country':
library(dplyr)
df1 %>%
  group_by(country) %>%
  filter(!any(is.na(conf_perc)))
# A tibble: 6 x 3
# Groups: country [3]
# country year conf_perc
# <chr> <int> <int>
#1 Canada 2017 77
#2 France 2017 45
#3 Germany 2017 60
#4 Canada 2009 88
#5 France 2009 91
#6 Germany 2009 93
Base R solution:
foo <- df$year == 2009 & is.na(df$conf_perc)                      # 2009 rows with missing values
bar <- df$year == 2017 & df$country %in% unique(df$country[foo])  # the 2017 rows of those same countries
df[-c(which(foo), which(bar)), ]
# country year conf_perc
# 1 Canada 2017 77
# 2 France 2017 45
# 3 Germany 2017 60
# 7 Canada 2009 88
# 8 France 2009 91
# 9 Germany 2009 93
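
The dplyr logic above (drop every country that has any NA) can also be written in one line of base R with ave(). Just a sketch, assuming the data frame is named df as in the base solution:
df[!ave(is.na(df$conf_perc), df$country, FUN = any), ]  # keep countries with no NA in any year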

Using mutate() to efficiently create data frame

I have this local data frame:
Source: local data frame [792 x 3]
team player_name g
1 Anaheim PERRY_COREY 31
2 Anaheim GETZLAF_RYAN 22
3 Dallas BENN_JAMIE 25
4 Pittsburgh CROSBY_SIDNEY 20
5 Toronto KESSEL_PHIL 27
6 Edmonton HALL_TAYLOR 16
7 Dallas SEGUIN_TYLER 24
8 Montreal VANEK_THOMAS 19
9 Colorado LANDESKOG_GABRIEL 18
10 Chicago SHARP_PATRICK 22
.. ... ... ..
I want to be able to rank the teams based on their average number of goals (g) per player. Here is what I did (really feels suboptimal):
library(dplyr)
d1 <- select(df, team, g, player_name)
c1 <- count(d1, team, wt = g)
c2 <- count(d1, team, wt = n_distinct(player_name))
c3 <- cbind(c1, c2[,2])
c4 <- c3[,2] / c3[,3]
c5 <- cbind(c3, c4)
colnames(c5) <- c("team", "ttgpt", "ttnp", "agpp")
c6 <- mutate(c5, rank = row_number(desc(c4)))
c7 <- filter(c6, rank <=10)
c8 <- arrange(c7, rank)
And here is the result of c8:
team ttgpt ttnp agpp rank
1 Chicago 177 23 7.695652 1
2 Colorado 164 23 7.130435 2
3 Anaheim 180 26 6.923077 3
4 NY_Rangers 153 23 6.652174 4
5 Boston 179 27 6.629630 5
6 San_Jose 157 25 6.280000 6
7 Dallas 155 25 6.200000 7
8 St._Louis 148 24 6.166667 8
9 Ottawa 160 26 6.153846 9
10 Philadelphia 140 23 6.086957 10
I would like to recreate this table with consistent use of %>%.
See CSV for reproducible example: playerstats.csv
OK, from what you said:
library(dplyr)

df <- read.csv("../Downloads/playerstats.csv", header = TRUE, sep = ",")
df %>%
  group_by(Team) %>%
  summarise(ttgp = sum(G),
            ttnp = n_distinct(Player.Name),
            agp = sum(G) / n_distinct(Player.Name)) %>%
  mutate(rank = rank(desc(agp))) %>%
  filter(rank <= 10) %>%
  arrange(rank)
Source: local data frame [10 x 5]
Team ttgp ttnp agp rank
1 Chicago 177 23 7.695652 1
2 Colorado 164 23 7.130435 2
3 Anaheim 180 26 6.923077 3
4 NY Rangers 153 23 6.652174 4
5 Boston 179 27 6.629630 5
6 San Jose 157 25 6.280000 6
7 Dallas 155 25 6.200000 7
8 St. Louis 148 24 6.166667 8
9 Ottawa 160 26 6.153846 9
10 Philadelphia 140 23 6.086957 10
Note that I am not sure what you mean by ttgpt and ttnp, so I tried to guess.
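
With more recent dplyr (1.0 or later) the same ranking can be written a bit more compactly. This is only a sketch that keeps just the average-goals column, since ttgp and ttnp are not needed for the ranking itself:
library(dplyr)

df %>%
  group_by(Team) %>%
  summarise(agp = sum(G) / n_distinct(Player.Name)) %>%  # average goals per distinct player
  arrange(desc(agp)) %>%
  slice_head(n = 10)                                     # top 10 teams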
