Using daply to Split Apply Combine - r

I have been researching a solution to my problem and I think it lies somewhere in daply. I need to take my data frame, split it by boat name, add 0.1 to Net # every time the activity changes, and then combine the data sets. My data frame looks like this:
Boat Net # Activity
Ray F 40 Lift
Dawn 67 Lift
Ray F 40 Set
Dawn 67 Set
Ray F 40 Lift
Ray F 40 Set
Ray F 40 Lift
Dawn 67 Lift
After I apply the functions I need the frame to look like this. Essentially adding 0.1 to the net # each time Activity = Set, but the boats are independent of each other.
Boat Net # Activity
Ray F 40.0 Lift
Dawn 67.0 Lift
Ray F 40.1 Set
Dawn 67.1 Set
Ray F 40.1 Lift
Ray F 40.2 Set
Ray F 40.2 Lift
Dawn 67.1 Lift
I have been using this function to add 0.1 to net # for every change in Activity, and it has worked really well but does not take into consideration the boat name.
df$`Net #` <- df$`Net #` + seq(0, 1, by = 0.1)[with(df, cumsum(c(TRUE, Activity[-1]!= Activity[-length(Activity)])))] + 1
Initially I tried to use split() and then apply the function, but that did nothing, so I switched to daply. I tried this and got the following error:
daply(df, df$Boat, .fun = df$`Net #` + seq(0, 1, by = 0.1)[with(df, cumsum(c(TRUE, Activity[-1]!= Activity[-length(Activity)])))] + 1)
Error in parse(text = x) : <text>:1:6: unexpected symbol
1: Dawn Marie
^
I think I am on the right path but any help would be great.

Using the dplyr package and the %>% operator (assuming the `Net #` column has been renamed to Net):
library(dplyr)
df <- df %>% group_by(Boat) %>% mutate(Net = Net + cumsum(Activity == "Set") * 0.1) %>% ungroup()
we have the answer:
Boat Net Activity
1 Ray F 40.0 Lift
2 Dawn 67.0 Lift
3 Ray F 40.1 Set
4 Dawn 67.1 Set
5 Ray F 40.1 Lift
6 Ray F 40.2 Set
7 Ray F 40.2 Lift
8 Dawn 67.1 Lift
The same code but without the %>% if you prefer:
df <- ungroup(mutate(group_by(df, Boat), Net = Net + cumsum(Activity == "Set") * 0.1))
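If you would rather not load a package, the same per-boat running count can be sketched in base R with ave(). This is only an illustration: it rebuilds the sample data from the question and assumes the column is called Net rather than `Net #`.

```r
# Sample data from the question (column renamed from `Net #` to Net)
df <- data.frame(
  Boat     = c("Ray F", "Dawn", "Ray F", "Dawn", "Ray F", "Ray F", "Ray F", "Dawn"),
  Net      = c(40, 67, 40, 67, 40, 40, 40, 67),
  Activity = c("Lift", "Lift", "Set", "Set", "Lift", "Set", "Lift", "Lift"),
  stringsAsFactors = FALSE
)

# Running count of "Set" rows within each boat, returned in the original row order
sets_so_far <- ave(as.numeric(df$Activity == "Set"), df$Boat, FUN = cumsum)

# Add 0.1 per Set seen so far for that boat
df$Net <- df$Net + 0.1 * sets_so_far
df
```

ave() keeps the rows in their original order, so no re-sorting is needed after the grouped calculation.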

Related

How to create a more concise table with these 2 variables? (R programming)

I am using the dataset nba_ht_wt, which can be imported as text (with readr) from the URL http://users.stat.ufl.edu/~winner/data/nba_ht_wt.csv. The question I am trying to tackle is: "What percentage of players have a BMI over 25, which is considered 'overweight'?"
I already created a new variable in the table called highbmi, which corresponds to bmi > 25. This is my code, but the resulting table is hard to read. How could I get a more concise and easier-to-read table?
nba_ht_wt = nba_ht_wt %>% mutate(highbmi = bmi>25)
tab = table(nba_ht_wt$highbmi, nba_ht_wt$Player)
100*prop.table(tab,1)
I am using R programming.
There is no variable called bmi in the data provided, so I will guess that it is calculated via the formula Weight/Height^2, where height is in meters.
data <- read.csv("http://users.stat.ufl.edu/~winner/data/nba_ht_wt.csv")
head(data)
Player Pos Height Weight Age
1 Nate Robinson G 69 180 29
2 Isaiah Thomas G 69 185 24
3 Phil Pressey G 71 175 22
4 Shane Larkin G 71 176 20
5 Ty Lawson G 71 195 25
6 John Lucas III G 71 157 30
I am no expert, but it looks to me like height and weight have their names swapped for some reason.
So I will make this adjustment to calculate bmi:
data$bmi <- data$Height/(data$Weight/100)**2
And now we can answer "What percentage of players have a BMI over 25, which is considered 'overweight'?" with a simple line of code:
mean(data$bmi > 25)
Multiply this number by 100 to get the answer as a percentage. So the answer will be 1.782178%.
Assuming the formula: weight (lb) / [height (in)]^2 * 703 (source: https://www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/index.html), you could do:
library(data.table)
nba_ht_wt <- fread("http://users.stat.ufl.edu/%7Ewinner/data/nba_ht_wt.csv")
nba_ht_wt[, highbmi:=(Weight / Height**2 * 703)>25][,
.(`% of Players`=round(.N/dim(nba_ht_wt)[1]*100,2)), by="highbmi"][]
#> highbmi % of Players
#> 1: TRUE 45.35
#> 2: FALSE 54.65
... or plug in the formula into the previous response for a base R solution.
This simple formula might not be really appropriate for basketball players, obviously.
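A base R version of that calculation might look like the sketch below. To stay self-contained it uses only the six rows printed by head(data) earlier, and it assumes Height is in inches and Weight is in pounds, which is what the CDC formula expects. The name pct_high is introduced here for illustration only.

```r
# First six rows of the data, copied from the head(data) output above
nba <- data.frame(
  Player = c("Nate Robinson", "Isaiah Thomas", "Phil Pressey",
             "Shane Larkin", "Ty Lawson", "John Lucas III"),
  Height = c(69, 69, 71, 71, 71, 71),       # assumed to be inches
  Weight = c(180, 185, 175, 176, 195, 157)  # assumed to be pounds
)

# CDC formula for imperial units: weight (lb) / height (in)^2 * 703
nba$bmi     <- nba$Weight / nba$Height^2 * 703
nba$highbmi <- nba$bmi > 25

# Share of players over the threshold, as a percentage
pct_high <- 100 * mean(nba$highbmi)

# A compact one-way table instead of a player-by-player cross-tabulation
round(100 * prop.table(table(highbmi = nba$highbmi)), 2)
```

Tabulating highbmi on its own, rather than against Player, is what keeps the table short. On the full file, replace the hand-built data frame with the imported one.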

Calculate the angles between lat/long coordinates with moving animal

I want to work out what the green and yellow angles are on the diagrams. The red line is the movement of animal 1 with each numbered point representing its location at each time point (rowid in the data). A3 represents the position of animal 3 when animal 1 is at point 2.
To work out the orange angle, I think I need to work out the angle drawn in black and then do 180° minus the black angle, but I'm not sure how to work that out either.
I want to work this out for each time point in the data, a sample of which I've included below. In some cases I don't have the location of animal 3, which is fine, and the angle can be NA. I've included 2 diagrams to show the different situations that could occur. The CRS is 4326. Any help is much appreciated!
rowid,id,t_,lon,lat,Animal3.lon,Animal3.lat
1,Animal 1,01/01/2017 06:19,-9.95545,3.777097,#N/A,#N/A
2,Animal 1,01/01/2017 08:45,-9.93917,3.774998,-9.95192,3.789981
3,Animal 1,01/01/2017 16:34,-9.94561,3.779115,-9.94959,3.783688
4,Animal 1,01/02/2017 08:18,-9.94575,3.784986,-9.94617,3.798219
5,Animal 1,01/02/2017 15:57,-9.94198,3.794307,-9.94861,3.802043
6,Animal 1,01/03/2017 07:24,-9.9353,3.783469,-9.9472,3.795541
7,Animal 1,01/03/2017 17:44,-9.93446,3.775781,-9.93526,3.81313
8,Animal 1,01/03/2017 19:33,-9.94091,3.773766,#N/A,#N/A
9,Animal 1,01/04/2017 06:33,-9.93553,3.775065,-9.93203,3.799718
10,Animal 1,01/04/2017 17:01,-9.93588,3.779135,-9.93348,3.796017
11,Animal 1,01/05/2017 08:43,-9.92929,3.774276,-9.93471,3.794776
12,Animal 1,01/05/2017 16:43,-9.92989,3.778653,-9.93755,3.803964
Here's my solution based on a simplified version of the matrix you provided. Credit to @mdsummer's function (linked in the original post), which I modified only slightly.
library(tidyverse)
library(maptools)
library(sf)
# Function - get angle from set of three points
trackAngle <- function(xy) {
  if (any(is.na(xy))) { return(NA) }
  angles <- abs(c(trackAzimuth(xy), 0) -
                c(0, rev(trackAzimuth(xy[nrow(xy):1, ]))))
  angles <- ifelse(angles > 180, 360 - angles, angles)
  angles[is.na(angles)] <- 180
  angles[-c(1, length(angles))]
}
# Original Matrix
animal_mat <- t(matrix(c(-9.95545,3.777097,NA, NA,
-9.93917,3.774998,-9.95192,3.789981,
-9.94561,3.779115,-9.94959,3.783688,
-9.94575,3.784986,-9.94617,3.798219,
-9.94198,3.794307,-9.94861,3.802043,
-9.9353,3.783469,-9.9472,3.795541,
-9.93446,3.775781,-9.93526,3.81313,
-9.94091,3.773766,NA, NA,
-9.93553,3.775065,-9.93203,3.799718,
-9.93588,3.779135,-9.93348,3.796017,
-9.92929,3.774276,-9.93471,3.794776,
-9.92989,3.778653,-9.93755,3.803964), 4, 12))
## Reformat to get lists of points
# Other animal angles
animal_pts1 <- map(2:nrow(animal_mat), function(idx){
  animal_pt <- unname(animal_mat[idx,][c(1,2)])
  other_pt <- unname(animal_mat[idx,][c(3,4)])
  animal_last_pt <- unname(animal_mat[idx-1,][c(1,2)])
  animal_pts_mat <- rbind(animal_last_pt, animal_pt, other_pt)
  animal_pts_mat
})
# Main animal angles
animal_pts2 <- map(2:(nrow(animal_mat)-1), function(idx){
  animal_pt <- unname(animal_mat[idx,][c(1,2)])
  animal_last_pt <- unname(animal_mat[idx-1,][c(1,2)])
  animal_next_pt <- unname(animal_mat[idx+1,][c(1,2)])
  animal_pts_mat <- rbind(animal_last_pt, animal_pt, animal_next_pt)
  animal_pts_mat
})
## Angles
# Other animal angles (11)
other_animal_angles <- 180 - map(animal_pts1, ~trackAngle(.x)) %>% unlist
# Main animal angles (10)
main_animal_angles <- 180 - map(animal_pts2, ~trackAngle(.x)) %>% unlist
## Combining angles info
angles_tbl <- tibble(main = main_animal_angles, other = other_animal_angles[1:10])
## Final dataframe
angles_tbl %>% mutate(tot = main+other)
# A tibble: 10 x 3
main other tot
<dbl> <dbl> <dbl>
1 155. 138. 292.
2 56.0 16.4 72.4
3 23.3 0.451 23.8
4 126. 62.5 189.
5 25.4 167. 192.
6 78.8 175. 254.
7 176. NA NA
8 81.3 68.3 150.
9 131. 13.0 144.
10 134. 141. 275.
Note that the tot column contains values greater than 180 with this solution, so something may be wrong here (possibly with the CRS?). Technically, it is possible to get values greater than 180, although I don't know whether that makes sense in practice, because I lack the domain knowledge to judge which angles are possible in this setting.
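As a cross-check that does not depend on maptools, the angle at a vertex can be sketched in base R from two direction vectors. This is a rough illustration, not the method used above: it assumes the points are close enough together for a flat-plane approximation (longitude differences scaled by the cosine of the vertex latitude), and the function and argument names are made up for this example.

```r
# Interior angle in degrees at p_vertex, formed by p_prev -> p_vertex -> p_other.
# Each point is c(lon, lat) in decimal degrees.
# Assumption: short distances, so a local flat-plane approximation is acceptable.
angle_at <- function(p_prev, p_vertex, p_other) {
  pts <- c(p_prev, p_vertex, p_other)
  if (anyNA(pts)) return(NA_real_)

  # Shrink east-west differences by cos(latitude) so both axes share a scale
  k <- cos(p_vertex[2] * pi / 180)
  u <- c((p_prev[1]  - p_vertex[1]) * k, p_prev[2]  - p_vertex[2])
  v <- c((p_other[1] - p_vertex[1]) * k, p_other[2] - p_vertex[2])

  # atan2 of |cross product| and dot product gives an angle in [0, 180]
  cross <- u[1] * v[2] - u[2] * v[1]
  dot   <- sum(u * v)
  atan2(abs(cross), dot) * 180 / pi
}

# Rows 1 and 2 of the sample data: animal 1 moves from point 1 to point 2,
# and animal 3 is observed while animal 1 is at point 2
interior <- angle_at(c(-9.95545, 3.777097), c(-9.93917, 3.774998),
                     c(-9.95192, 3.789981))
deviation <- 180 - interior  # the "180 minus the black angle" step from the question
```

Because the result is always between 0 and 180, it does not say whether the other point lies to the left or right of the track; keeping the sign of the cross product would be one way to recover that.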

Conditional constraints in RGLPK library for R

I'm dipping my feet into R after using Excel for many years and have a question. I am thoroughly impressed with how much faster R is: it used to take Excel over an hour to do 10,000 simulations, and R did 25,000 of the same sim in 4 mins. Awesome.
This is fantasy football related, as I am trying to create a lineup optimizer in R and found the Rglpk library to be a good option. Multiple other questions on SO helped me get to where I am today; however, I have hit a roadblock. Here are some of the other topics.
Fantasy football linear programming in R with RGLPK
Rglpk - Fantasy Football Lineup Optimiser - Rbind of For Loop Output
Rglpk - Fantasy Football Lineup Optimiser - Forcing the Inclusion of a Player
Here is my stock optimizer
# stock optimal lineups solver
name <- myData$Name
pos <- myData$Pos
pts <- myData$Projection
cost <- myData$Salary
team <- myData$Team
opp <- myData$Opp
num.players <- length(name)
f <- pts
var.types <- rep("B", num.players)
A <- rbind(as.numeric(pos=="QB")
, as.numeric(pos=="RB")
, as.numeric(pos=="WR")
, as.numeric(pos=="TE")
, as.numeric(pos=="K")
, as.numeric(pos=="D")
,cost)
dir <- c("=="
,"=="
,"=="
,"=="
,"=="
,"=="
,"<=")
b <- c(1
, 2
, 3
, 1
, 1
, 1
, 60000)
library(Rglpk)
sol <- Rglpk_solve_LP(obj = f
, mat = A
, dir = dir
, rhs = b
, types = var.types
, max=TRUE)
myData[sol$solution == 1,]
sprintf('Cost is:$%i', sum(cost[sol$solution > 0]))
sprintf('Projected Points is: %f', sol$optimum)
Here is a link to the data I'm using.
https://www.dropbox.com/s/d5m8jjnq32f0cpe/Week6NFLProjections.csv?dl=0
I'm also at the point where I can loop the code to create multiple lineups by setting the objective equal to the previous score - .01. As a side note, this process slows down significantly as it goes on (say by lineup #50). Is this normal, and is there a more efficient way to loop this?
My real question is how can I add some more extensive constraints. In Fantasy football it is useful to "pair" players from the same team together and I can't figure out how I would put that into the constraints.
For a simple pairing example how could I add a constraint so that my "optimal lineup" would have the D and K from the same team? I actually have been able to work around this question by just combining the D+K in the CSV file but am interested in how I would code that into R.
A more complex pairing scenario would be to have my QB and just 1 of the (3)WR/(1)TE be on the same team.
Another would be to make sure none of the offensive players is playing vs my own defense.
Any help would be greatly appreciated. Can't seem to find an answer to this anywhere.
Try something similar to this; you'll just need to modify it to suit your situation. I've taken this directly from my own code. Basically, I input the players I want and create a separate data frame with these. Then I optimise the leftover positions and rbind everything together to create the final lineup. This loops through and gives as many lineups as the user wants.
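library(Rglpk)
# Players to force into every lineup, entered as a comma-separated list
Inclusions <- readline("Enter players to include into optimal lineups: ")
Inclusions <- trimws(unlist(strsplit(Inclusions, ",")))
Inclusions_table <- Data[Data$Player.Name %in% Inclusions, ]
Inclusions_no <- nrow(Inclusions_table)
# Remove the forced players from the pool that is optimised
Data <- Data[!Data$Player.Name %in% Inclusions, ]
# readline() returns a string, so convert the count to an integer
Lineup_no <- as.integer(readline("How many lineups to be generated?: "))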
num.players <- length(Data$Player.Name)
obj<-Data$fpts
var.types<-rep("B",num.players)
subscore<-1000
Lineups <- list()
for(i in 1:Lineup_no)
{
matrix <- rbind(as.numeric(Data$Position == "QB"), # num QB
as.numeric(Data$Position == "RB"), # num RB
as.numeric(Data$Position == "RB"), # num RB
as.numeric(Data$Position == "WR"), # num WR
as.numeric(Data$Position == "WR"), # num WR
as.numeric(Data$Position == "TE"), # num TE
as.numeric(Data$Position == "TE"), # num TE
as.numeric(Data$Position %in% c("RB", "WR", "TE")), # Num RB/WR/TE
as.numeric(Data$Position == "DEF"),# num DEF
Data$Salary,Data$fpts)
direction <- c("==",
">=",
"<=",
">=",
"<=",
">=",
"<=",
"==",
"==",
"<=","<")
opt_var<-subscore-0.01
rhs <- c(1 - sum(Inclusions_table$Position == "QB"),
         max(0, 2 - sum(Inclusions_table$Position == "RB")),
         4 - sum(Inclusions_table$Position == "RB"),
         max(0, 2 - sum(Inclusions_table$Position == "WR")),
         4 - sum(Inclusions_table$Position == "WR"),
         max(0, 1 - sum(Inclusions_table$Position == "TE")),
         2 - sum(Inclusions_table$Position == "TE"),
         7 - sum(Inclusions_table$Position == "RB") -
           sum(Inclusions_table$Position == "WR") -
           sum(Inclusions_table$Position == "TE"),
         1 - sum(Inclusions_table$Position == "DEF"),
         100000 - sum(Inclusions_table$Salary),
         opt_var)
sol <- Rglpk_solve_LP(obj = obj, mat = matrix, dir = direction, rhs = rhs,
types = var.types, max = TRUE)
Lineup<-data.frame(Data[sol$solution==1,])
subscore<-sum(Lineup$fpts)
Lineup<-rbind(Lineup,Inclusions_table)
Lineup<-Lineup[order(Lineup$Position),]
Salary<-sum(Lineup$Salary)
Score<-sum(Lineup$fpts)
print(Lineup)
print(Salary)
print(Score)
Lineups[[i]]<-Lineup
}
Data is my data set and looks like this for reference:
Position Player.Name Team Opponent Salary PPG fpts Pos_Rank upper lower Off_Snaps Pct_Off
1056 TE A.J. Derby Patriots Bills 5000 0 0.0000 82 0 0 NA <NA>
462 RB Aaron Ripkowski Packers Falcons 6000 1.8 1.3116 75 1.8852 0.01 22 25%
78 QB Aaron Rodgers Packers Falcons 19350 20.6 18.4292 1 19.9689 17.2 87 100%
1466 WR Adam Humphries Buccaneers Raiders 7650 8.1 9.4808 46 11.2125 7.5664 38 51%
1808 WR Albert Wilson Chiefs Colts 5000 4.3 5.6673 74 6.2438 4.78 11 21%
1252 WR Aldrick Robinson Falcons Packers 5000 3.8 2.9114 96 3.2836 2.0152 10 15%
636 RB Alex Collins Seahawks Saints 6000 2.7 1.5992 69 2.1513 0.41 1 2%
Hopefully you can modify this example to suit you.
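For the simple pairing case from the question (D and K from the same team), one possible formulation is to add one extra row per team to the constraint matrix: +1 for that team's kickers, -1 for that team's defense, with the row forced to equal 0. Combined with "exactly one K" and "exactly one D", that should leave only same-team pairs feasible. The sketch below uses a made-up four-player pool, so the data and the names players and pair_rows are hypothetical, and it only calls the solver if Rglpk is installed.

```r
# Toy player pool (hypothetical data, not the CSV from the question)
players <- data.frame(
  name = c("K1", "K2", "D1", "D2"),
  pos  = c("K", "K", "D", "D"),
  team = c("A", "B", "A", "B"),
  pts  = c(9, 8, 5, 10),
  stringsAsFactors = FALSE
)
teams <- unique(players$team)

# One row per team: +1 for that team's K, -1 for that team's D.
# Requiring each row to equal 0 ties the kicker's team to the defense's team.
pair_rows <- t(sapply(teams, function(tm) {
  as.numeric(players$pos == "K" & players$team == tm) -
    as.numeric(players$pos == "D" & players$team == tm)
}))

# Position counts first, then the pairing rows appended underneath
A   <- rbind(as.numeric(players$pos == "K"),
             as.numeric(players$pos == "D"),
             pair_rows)
dir <- c("==", "==", rep("==", length(teams)))
b   <- c(1, 1, rep(0, length(teams)))

# Solve only when Rglpk is available
if (requireNamespace("Rglpk", quietly = TRUE)) {
  sol <- Rglpk::Rglpk_solve_LP(obj = players$pts, mat = A, dir = dir, rhs = b,
                               types = rep("B", nrow(players)), max = TRUE)
  print(players[sol$solution == 1, ])
}
```

In the stock optimizer from the question, the same pair_rows would be appended to A with rbind(), with matching "==" entries added to dir and zeros added to b. The QB-plus-one-receiver case is harder because "exactly one of several" is not a plain per-team equality, so it is not covered by this sketch.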

How to prepare data including cardinal directions to work with rayleigh.test {circular}

I have data in a form like this:
quantity direction
10 n
5 e
6 ne
12 n
20 nw
5 s
8 n
1 sw
3 se
2 ne
6 nw
8 n
2 se
3 e
4 w
9 nw
on which I want to run the rayleigh.test from circular package (For more information why I want to do this check: https://stats.stackexchange.com/questions/198701/check-for-significant-difference-between-numbers-of-sightings-per-cardinal-direc). I guess that I have to use the circular function up front to prepare the data but I have no clue how to do that. The allowed values for the units argument of this function are “radians”, “degrees”, “hours” and I can't figure out how to fit my directions into that.
How can I get rayleigh.test to accept cardinal directions as input?
I can't judge whether the Rayleigh test is OK with non-continuous data, but here is a small example of how to map your characters to degrees:
df <- data.frame( quantity = c(37,5,6) , direction = c("n", "ne" , "n") )
df$direction <- as.factor(df$direction)
# create a map from character to degrees:
map <- setNames( c( 0, 45) , c("n", "ne") )
levels(df$direction) <- map[ levels(df$direction) ]
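Extending that idea to all eight compass points, a possible end-to-end sketch is shown below. It assumes that quantity is a count of sightings, so each direction is repeated that many times before testing, and that north is 0 degrees with angles increasing clockwise. The names obs, compass and angles are introduced here for illustration, and the test itself only runs if the circular package is installed.

```r
# First four rows of the example data from the question
obs <- data.frame(
  quantity  = c(10, 5, 6, 12),
  direction = c("n", "e", "ne", "n"),
  stringsAsFactors = FALSE
)

# Lookup table: compass point -> bearing in degrees (north = 0, clockwise)
compass <- c(n = 0, ne = 45, e = 90, se = 135,
             s = 180, sw = 225, w = 270, nw = 315)
obs$degrees <- unname(compass[obs$direction])

# One angle per sighting (assumes quantity is a count of observations)
angles <- rep(obs$degrees, times = obs$quantity)

# Run the test only when the circular package is available
if (requireNamespace("circular", quietly = TRUE)) {
  circ <- circular::circular(angles, units = "degrees", template = "geographics")
  print(circular::rayleigh.test(circ))
}
```

Looking values up in a named numeric vector keeps the result numeric, whereas reassigning factor levels leaves them stored as character labels that still need converting.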

Using dplyr::summarize() function for multi-step arithmetical process?

So I've got some golf data that I'm messing with in R:
player rd hole shot distToPin distShot
E. Els 1 1 1 525 367.6
E. Els 1 1 2 157.4 130.8
E. Els 1 1 3 27.5 27.4
E. Els 1 1 4 1.2 1.2
E. Els 1 2 1 222 216.6
E. Els 1 2 2 6.8 6.6
E. Els 1 2 3 0.3 0.3
E. Els 2 1 1 378 244.4
E. Els 2 1 2 135.9 141.6
E. Els 2 1 3 6.7 6.9
E. Els 2 1 4 0.1 0.1
I'm trying to make an "efficiency" computation. Basically, I want to compute the following formula (which I made up, if you can't tell) by round:
E = hole yardage / (sum(distance of all shots) - hole yardage)
And ultimately, I want my results to look like this:
rd efficiency
E.Els 1 205.25
2 25.2
That efficiency column is the averaged result of the efficiency for each hole over the entire round. The issue that I'm having is I can't quite figure out how to do such a complex calculation using dplyr::summarize():
efficiency <- df %>%
group_by(player, rd) %>%
summarize(efficiency = (sum(distShot) - distToPin))
But the problem with that particular script is that it returns the error:
Error: expecting a single value
I think my problem is that, were it to run, it wouldn't be able to tell WHICH distToPin to subtract. The one I want is obviously the first distToPin of each hole, i.e. the actual hole length (unfortunately, I don't have a column of just "hole yardage"). I want to pull that first distToPin of each hole out and use it within my summarize() arithmetic. Is this even possible?
I'm guessing that there is a way to do these types of complex, multi-step calculations within the summarize function, but maybe there's not! Any ideas or advice?
You seem to be missing some steps. Here is a deliberately labored version to show that, using dplyr. It assumes that your data frame is named golfdf:
# The round column is called rd in the sample data
golfdf %>%
  group_by(player, rd, hole) %>%
  summarise(hole.length = first(distToPin), shots.length = sum(distShot)) %>%
  group_by(player, rd) %>%
  summarise(efficiency = sum(hole.length) / (sum(shots.length) - sum(hole.length)))
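The question describes the efficiency column as the average of the per-hole efficiencies within a round. A variant that computes the ratio per hole first and then averages is sketched below, using the sample rows from the question. The names hole_eff and per_round are introduced here, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Sample data from the question
golfdf <- data.frame(
  player    = "E. Els",
  rd        = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  hole      = c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1),
  shot      = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4),
  distToPin = c(525, 157.4, 27.5, 1.2, 222, 6.8, 0.3, 378, 135.9, 6.7, 0.1),
  distShot  = c(367.6, 130.8, 27.4, 1.2, 216.6, 6.6, 0.3, 244.4, 141.6, 6.9, 0.1)
)

per_round <- golfdf %>%
  # Step 1: one efficiency value per hole; the first distToPin is the hole length
  group_by(player, rd, hole) %>%
  summarise(hole_eff = first(distToPin) / (sum(distShot) - first(distToPin)),
            .groups = "drop") %>%
  # Step 2: average those per-hole values within each round
  group_by(player, rd) %>%
  summarise(efficiency = mean(hole_eff), .groups = "drop")

per_round
```

On the sample rows this gives 205.25 for round 1 and 25.2 for round 2, matching the desired output in the question. Note that a hole where the shots add up to exactly the hole length would divide by zero.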
