Computing a few difficult metrics from an integer vector in R

Computing a few difficult metrics from an integer vector in R - r

For some context, I am working with sports / basketball data. The following vector is for 1 NBA game, and contains the number of points that the home team was ahead or behind at any given point in the game.
dput(leads_vector)
c(0, 0, 0, 0, 0, 0, 0, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 4, 2,
5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 11, 11, 9, 9, 9, 9, 9, 9, 9, 9, 11,
11, 9, 9, 9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 13, 15,
14, 14, 13, 13, 13, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16,
16, 13, 13, 11, 11, 11, 11, 11, 9, 9, 9, 7, 9, 9, 9, 10, 10,
11, 11, 11, 11, 11, 11, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11,
12, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 15, 15, 15, 13, 13, 13, 13, 15, 12, 12, 12, 9,
9, 9, 9, 9, 11, 11, 11, 11, 13, 13, 10, 10, 10, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 10, 8, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 11, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 12, 10, 12, 12, 12,
12, 14, 14, 14, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 15,
16, 16, 16, 16, 14, 14, 11, 11, 11, 11, 11, 11, 9, 9, 9, 9, 9,
9, 9, 10, 11, 11, 9, 9, 9, 9, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1,
3, 3, 3, 3, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6,
6, 6, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 8, 8, 7, 7, 7,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 11,
9, 9, 9, 9, 9, 9, 10, 11, 11, 11, 8, 11, 8, 10, 10, 11, 11, 11,
11, 11, 9, 11, 11, 11, 10, 10, 10, 12, 12, 12, 12, 13, 13, 16,
16, 16, 16, 17, 18, 19, 19, 19, 19, 19, 18, 18, 18, 20, 20, 20,
20, 20, 20, 20, 18, 18, 18, 16, 16, 16, 13, 13, 13, 11, 10, 10,
10, 10, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
These vectors always start with 0, since the game begins tied at 0-0. leads_vector[100] equals 14, which means the home team was winning by 14 at this point in the game. Note that the numbers in the vector repeat, since the score can remain the same for several plays in a row in a basketball game.
The 4 metrics I would like to compute are:
Biggest Lead
Number of times the game was tied
Longest run (consecutive points for one team)
Lead changes
Biggest Lead is easy to compute:
biggest_lead <- abs(max(leads_vector))
Number of times the game was tied is a bit more difficult to compute:
times_tied <- sum(leads_vector[2:length(leads_vector)] == 0 & leads_vector[1:(length(leads_vector)-1)] != 0)
times_tied checks for all instances in the vector where the value is 0 (the score is tied), and the preceding value in the vector is not 0. This ensures that each sequence of zeros counts as the score being tied only once.
I am not sure how to compute longest run. The longest run in the game is the largest monotonically increasing or decreasing sequence in the vector. Just using the eye test, I notice a long run of 8 at leads_vector[38:65].
Number of lead changes is difficult to compute as well. It would be equal to the number of times the lead went from positive to negative in this vector. The following leads_vector:
c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
... would have 4 lead changes (from 3 to -3, from -3 to 2, from 2 to -3, and from -2 to 0 to 1).
Any help with this is appreciated!
EDIT - longest run is the tough stat to compute here, but i'm working on it.
EDIT2 - i think longest run will be easier to compute if i remove repeat values from leads_vector. but i cannot use duplicated() function, because that will remove duplicates in different places in the vector. Instead i'd want to only remove repeat values next to each other (get c(0, -2, 5, 3, 5, 8, 10, 11, 9, 11, 9, 11, ... ))

Computing of longest run:
compute_longest_run <- function(x) {
# Collapse repetitions
x_unique <- rle(x)$values
# Compute score change
score_change <- diff(x_unique)
# Need to compute sum of all subvectors with the same sign
run_side <- sign(score_change)
run_id <- c(1, cumsum(diff(run_side) != 0) + 1)
run_value <- tapply(score_change, run_id, sum)
max(abs(run_value))
}
compute_longest_run(leads_vector)
#> [1] 10

#biggest_lead
with(rle(leads_vector), max(abs(values)))
#number_ties
with(rle(leads_vector), sum(values == 0))
#longest_run
#lead_changes
length(rle(leads_vector[leads_vector != 0] < 0)$values)

I found out how to compute lead changes using the sign() and diff() function. First I need to filter out the values where the lead equals 0, since these are not lead changes for my calculations, even though R's sign() function has different values for (+), (-) and 0. I have this:
lead_changes <- sum(diff(sign(leads_vector[leads_vector != 0]))) / 2
For longest run, I think starting with this, to remove repeat values, is a good start:
lead_changes[c(TRUE, lead_changes[-1] != hL[-length(hLlead_changes])]

Related

Marginal effects from the multinomial model

I am trying to get the marginal effects from a multinomial model derived from the mlogit package but it shows an error. Can anyone provide some guidance to solve the problem? Many thanks!
# data
df1 <- structure(list(Y = c(3, 4, 1, 2, 3, 4, 1, 5, 2, 3, 4, 2, 1, 4,
1, 5, 3, 3, 3, 5, 5, 4, 3, 5, 4, 2, 5, 4, 3, 2, 5, 3, 2, 5, 5,
4, 5, 1, 2, 4, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 2, 4, 1, 5, 3, 1,
5, 2, 3, 4, 2, 4, 5, 2, 4, 1, 4, 2, 1, 5, 3, 2, 1, 4, 4, 1, 5,
1, 1, 1, 4, 5, 5, 3, 2, 3, 3, 2, 4, 4, 5, 3, 5, 1, 2, 5, 5, 1,
2, 3), D = c(12, 8, 6, 11, 5, 14, 0, 22, 15, 13, 18, 3, 5, 9,
10, 28, 9, 16, 17, 14, 26, 18, 18, 23, 23, 12, 28, 14, 10, 15,
26, 9, 2, 30, 18, 24, 27, 7, 6, 25, 13, 8, 4, 16, 1, 4, 5, 18,
21, 1, 2, 19, 4, 2, 16, 17, 23, 15, 13, 21, 24, 14, 27, 6, 20,
6, 19, 8, 7, 23, 11, 11, 1, 22, 21, 4, 27, 6, 2, 9, 18, 30, 26,
22, 10, 1, 4, 7, 26, 15, 26, 18, 30, 1, 11, 29, 25, 3, 19, 15
), x1 = c(13, 12, 4, 3, 16, 16, 15, 13, 1, 15, 10, 16, 1, 17,
7, 13, 12, 6, 8, 16, 16, 11, 7, 16, 5, 13, 12, 16, 17, 6, 16,
9, 14, 16, 15, 5, 7, 2, 8, 2, 9, 9, 15, 13, 9, 4, 16, 2, 11,
13, 11, 6, 4, 3, 7, 4, 12, 2, 16, 14, 3, 13, 10, 11, 10, 4, 11,
16, 8, 12, 14, 9, 4, 16, 16, 12, 9, 10, 6, 1, 3, 8, 7, 7, 5,
16, 17, 10, 4, 15, 10, 8, 3, 13, 9, 16, 12, 7, 4, 11), x2 = c(12,
19, 18, 19, 15, 12, 15, 16, 15, 11, 12, 16, 17, 14, 12, 17, 17,
16, 12, 20, 11, 11, 15, 14, 18, 10, 14, 13, 10, 14, 18, 18, 18,
17, 18, 14, 16, 19, 18, 16, 18, 14, 17, 10, 16, 12, 16, 15, 11,
18, 19, 15, 19, 11, 16, 10, 20, 14, 10, 12, 10, 15, 13, 15, 11,
20, 11, 12, 16, 16, 11, 15, 11, 11, 10, 10, 16, 11, 20, 17, 20,
17, 16, 11, 18, 19, 18, 14, 17, 11, 16, 11, 18, 14, 15, 16, 11,
14, 11, 13)), class = "data.frame", row.names = c(NA, -100L))
library(mlogit)
mld <- mlogit.data(df1, choice="Y", shape="wide") # shape data for `mlogit()`
mlfit <- mlogit(Y ~ 1 | D + x1 + x2, reflevel="1", data=ml.d) # fit the model
effects(mlfit) # this shows the following error:
Error in if (rhs %in% c(1, 3)) { : argument is of length zero
Called from: effects.mlogit(mlfit)

I believe you are missing the covariate information that needs to be put there, so if you use effects(mlfit, covariate = 'D'), It should work. Now the error is coming because the default of covariate is NULL. NULL is special in R, it has no(zero) length and hence you are getting argument of length zero. Please let me know if it fixes your issue.
As per documentation of effects.mlogit , it says:
covariate
the name of the covariate for which the effect should be computed,
I am getting this output at my end:
R>effects(mlfit, covariate = 'D')
1 2 3
-0.003585105992 -0.070921137682 -0.026032167377
4 5
0.078295227196 0.022243183855

Plotting immunophenogram: Arguments implying different numbe rof rows error

I have encountered an error:
Error in data.frame(start = c(0, 2.5, 5, 7.5, 10, 15, seq(20, 39), 0, :
arguments imply differing number of rows: 31, 30, 36
While im trying to plot an immunophenogram. The code that made the error is below. Please help!
data_a <- data.frame(
start = c(0, 2.5, 5, 7.5, 10, 15, seq(20, 39), 0, 5, 10, 15, 20),
end = c(2.5, 5, 7.5, 10, 15, seq(20, 40), 0, 5, 10, 15, 20),
y1 = c(rep(2.6, 26), rep(0.4, 4)),
y2 = c(rep(5.6, 26), rep(2.2, 4)),
z = c(MIG[c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)],
Activated_B_cell, Active_CD4_T_cell, Active_CD8_T_cell,
Effector_Memory_CD4_T_cell, Effector_Memory_CD8_T_cell, MDSC,
Natural_Killer_cell, Neutrophil, Regulatory_T_cell, MHC,
Immune_Checkpoint, Angiogenesis_Avastin_drug_response_marker,
Catalytic_Activity, Hypoxia_monitor, Immune_Response_Signature,
Stimulatory_dentritic_cell_regulation, Tumor_Secretory_Cytokines,
Wnt_signaling),
vcol = c(unlist(lapply(MIG[c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)],
mapcolors)),
unlist(lapply(c(
Activated_B_cell, Active_CD4_T_cell, Active_CD8_T_cell,
Effector_Memory_CD4_T_cell, Effector_Memory_CD8_T_cell, MDSC,
Natural_Killer_cell, Neutrophil, Regulatory_T_cell, MHC,
Immune_Checkpoint, Angiogenesis_Avastin_drug_response_marker,
Catalytic_Activity, Hypoxia_monitor, Immune_Response_Signature,
Stimulatory_dentritic_cell_regulation, Tumor_Secretory_Cytokines,
Wnt_signaling),
mapbw))),
label = c(unique_ips_genes[c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)],
"Activated B cell", "Active CD4 T cell", "Active CD8 T cell",
"Effector Memory CD4 T cell", "Effector Memory CD8 T cell", "MDSC",
"Natural Killer cell", "Neutrophil", "Regulatory T cell", "MHC",
"Immune Checkpoint", "Angiogenesis - Avastin drug response marker",
"Catalytic Activity", "Hypoxia monitor", "Immune Response Signature",
"Stimulatory dentritic cell regulation", "Tumor Secretory Cytokines",
"Wnt signaling")
)

Using text3D in ribbon plot in R

I want to construct a 3D ribbon plot with the following data.
structure(c(10, 10, 10, 10, 10, 10, 21, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 10, 10, 10, 10, 21, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 19,
10, 10, 10, 21, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
20, 10, 20, 9, 9, 9, 9, 9, 21, 9, 9, 9, 18, 9, 9, 9, 9, 9, 9,
9, 9, 19, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 16, 8, 8, 8, 8,
8, 18, 8, 8, 19, 8, 9, 9, 9, 9, 9, 9, 21, 20, 9, 9, 9, 9, 9,
9, 9, 9, 19, 9, 9, 18, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 8,
8, 8, 8, 8, 8, 19, 8, 8, 18, 8, 7, 7, 14, 7, 16, 7, 21, 20, 7,
18, 7, 7, 7, 7, 14, 7, 19, 7, 7, 16, 7, 8, 8, 16, 8, 8, 8, 20,
19, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 16, 8, 8,
8, 20, 19, 16, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 17,
8, 16, 8, 20, 18, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 7,
7, 16, 16, 16, 7, 18, 20, 7, 21, 16, 7, 7, 7, 7, 7, 19, 7, 7,
7, 7), .Dim = c(21L, 12L), .Dimnames = list(c("colmA", "colmB",
"colmC", "colmD", "colmE", "colmF", "colmG", "colmH", "colmI",
"colmJ", "colmK", "colmL", "colmM", "colmN", "colmO", "colmP",
"colmQ", "colmR", "colmS", "colmT", "colmU"), c("2005", "2006",
"2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014",
"2015", "2016")))
I have to work out a code in the meanwhile as I did not get any response. Here is the code.
ribbon3D(x = 1:21, y = 1:12, z = tf14, scale = T, expand = 0.01, bty = "g", along = "y",
col = "pink", border = "black", shade = 0.2, ltheta = -90, lphi = 30, space = 0.5,
ticktype = "detailed", d = 2, curtain = T, xlab = "", ylab = "", zlab = "")
# Use text3D to label x axis
text3D(x = 1:21, y = rep(0.5, 21), z = rep(1, 21),
labels = rownames(tf14),
add = TRUE, adj = 0, lphi = 30, ltheta = -90)
# Use text3D to label y axis
text3D(x = rep(0.5, 12), y = 1:12, z = rep(1, 12),
labels = colnames(tf14),
add = TRUE, adj = 1, lphi = 30, ltheta = -90)
But, the image that I get is not the desired one. The axis labels are cluttered and the side on which years are displayed needs to be right hand side. Also, the height of the ribbons is too low.
Can somebody improve the code?

Error in data.frame() arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 2

I am using this command in R Studio to split the data present in one column:
CTE.info <- data.frame(strsplit(as.character(CTE$V11),'|',fixed=TRUE))
But, I am getting the error:
Error in data.frame("orderItems", "79542;2;24.000;24.000;5.310", "Credit;1;-15.000;-15.000;.000", :
arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 24
Could someone assist and let me know how can this be sorted?

You can make the length of the list element same and it should work.
lst <- strsplit(as.character(CTE$V11),'|',fixed=TRUE)
d1 <- data.frame(lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- paste0('V', seq_along(d1))
data
CTE <- data.frame(V11= c('a|b|c', 'a|b', 'a|b|c|d'))

R circular LOESS function over 24 hours (a day)

I have data for free parking slots over hours and days.
Here's a random sample of 100.
sl <- list(EmptySlots = c(7, 6, 20, 5, 16, 20, 24, 5, 24, 24, 15, 11,
8, 6, 13, 2, 21, 6, 1, 6, 9, 1, 8, 0, 20, 9, 20, 11, 22, 24,
1, 2, 12, 6, 8, 2, 23, 18, 8, 3, 20, 2, 1, 0, 5, 21, 1, 4, 20,
15, 24, 12, 4, 14, 2, 4, 20, 16, 2, 10, 2, 1, 24, 9, 22, 7, 6,
3, 20, 13, 1, 16, 12, 5, 2, 7, 4, 1, 6, 1, 1, 2, 0, 13, 24, 6,
13, 7, 24, 24, 15, 6, 10, 1, 2, 9, 5, 2, 11, 15), hour = c(8,
16, 23, 14, 18, 7, 17, 15, 19, 19, 17, 17, 16, 14, 17, 12, 19,
10, 10, 13, 16, 10, 16, 11, 12, 9, 0, 15, 16, 21, 10, 11, 17,
11, 16, 15, 23, 7, 16, 14, 18, 14, 14, 9, 15, 2, 10, 9, 19, 17,
20, 16, 12, 17, 12, 9, 23, 9, 15, 17, 10, 12, 18, 17, 18, 17,
13, 10, 7, 8, 10, 18, 11, 11, 12, 17, 12, 9, 14, 15, 10, 11,
10, 10, 20, 16, 18, 15, 21, 18, 17, 13, 8, 11, 15, 16, 11, 9,
12, 18))
A quick way to calculate a LOESS function via ggplot2.
sl <- as.data.frame(sl)
library(ggplot2)
qplot(hour, EmptySlots, data=sl, geom="jitter") + theme_bw() + stat_smooth(size = 2)
What is the best way to tell the LOESS function that 0 and 24 are neighbours? I.e. the line on the left and the right should be the same value if we were to estimate it this way.
Pointers on where to start will do fine.

I'd be tempted just to replicate the data on either side:
library(ggplot2)
empty <- c(7, 6, 20, 5, 16, 20, 24, 5, 24, 24, 15, 11, 8, 6, 13, 2, 21, 6, 1, 6, 9, 1, 8, 0, 20, 9, 20, 11, 22, 24, 1, 2, 12, 6, 8, 2, 23, 18, 8, 3, 20, 2, 1, 0, 5, 21, 1, 4, 20, 15, 24, 12, 4, 14, 2, 4, 20, 16, 2, 10, 2, 1, 24, 9, 22, 7, 6, 3, 20, 13, 1, 16, 12, 5, 2, 7, 4, 1, 6, 1, 1, 2, 0, 13, 24, 6, 13, 7, 24, 24, 15, 6, 10, 1, 2, 9, 5, 2, 11, 15)
hour <- c(8, 16, 23, 14, 18, 7, 17, 15, 19, 19, 17, 17, 16, 14, 17, 12, 19, 10, 10, 13, 16, 10, 16, 11, 12, 9, 0, 15, 16, 21, 10, 11, 17, 11, 16, 15, 23, 7, 16, 14, 18, 14, 14, 9, 15, 2, 10, 9, 19, 17, 20, 16, 12, 17, 12, 9, 23, 9, 15, 17, 10, 12, 18, 17, 18, 17, 13, 10, 7, 8, 10, 18, 11, 11, 12, 17, 12, 9, 14, 15, 10, 11, 10, 10, 20, 16, 18, 15, 21, 18, 17, 13, 8, 11, 15, 16, 11, 9, 12, 18)
emptyrep <- rep.int(empty,3)
hourrep <- c(hour,hour+24,hour-24)
sl <- data.frame(empty=emptyrep, hour=hourrep)
qplot(hour, empty, data=sl, geom="jitter") + theme_bw() + geom_smooth(method="loess",size = 1.5,span=0.2) + coord_cartesian(xlim=c(0,24))
... just like joran said a few minutes earlier (woops)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Computing a few difficult metrics from an integer vector in R - r

#biggest_lead with(rle(leads_vector), max(abs(values))) #number_ties with(rle(leads_vector), sum(values == 0)) #longest_run #lead_changes length(rle(leads_vector[leads_vector != 0] < 0)$values)

Related

Marginal effects from the multinomial model

Plotting immunophenogram: Arguments implying different numbe rof rows error

Using text3D in ribbon plot in R

Error in data.frame() arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 2

R circular LOESS function over 24 hours (a day)

Categories

Resources