R circular LOESS function over 24 hours (a day)

I have data for free parking slots over hours and days.
Here's a random sample of 100.
sl <- list(EmptySlots = c(7, 6, 20, 5, 16, 20, 24, 5, 24, 24, 15, 11,
8, 6, 13, 2, 21, 6, 1, 6, 9, 1, 8, 0, 20, 9, 20, 11, 22, 24,
1, 2, 12, 6, 8, 2, 23, 18, 8, 3, 20, 2, 1, 0, 5, 21, 1, 4, 20,
15, 24, 12, 4, 14, 2, 4, 20, 16, 2, 10, 2, 1, 24, 9, 22, 7, 6,
3, 20, 13, 1, 16, 12, 5, 2, 7, 4, 1, 6, 1, 1, 2, 0, 13, 24, 6,
13, 7, 24, 24, 15, 6, 10, 1, 2, 9, 5, 2, 11, 15), hour = c(8,
16, 23, 14, 18, 7, 17, 15, 19, 19, 17, 17, 16, 14, 17, 12, 19,
10, 10, 13, 16, 10, 16, 11, 12, 9, 0, 15, 16, 21, 10, 11, 17,
11, 16, 15, 23, 7, 16, 14, 18, 14, 14, 9, 15, 2, 10, 9, 19, 17,
20, 16, 12, 17, 12, 9, 23, 9, 15, 17, 10, 12, 18, 17, 18, 17,
13, 10, 7, 8, 10, 18, 11, 11, 12, 17, 12, 9, 14, 15, 10, 11,
10, 10, 20, 16, 18, 15, 21, 18, 17, 13, 8, 11, 15, 16, 11, 9,
12, 18))
Here is a quick way to fit a LOESS curve via ggplot2:
sl <- as.data.frame(sl)
library(ggplot2)
qplot(hour, EmptySlots, data=sl, geom="jitter") + theme_bw() + stat_smooth(size = 2)
What is the best way to tell the LOESS function that 0 and 24 are neighbours? I.e. the line on the left and the right should be the same value if we were to estimate it this way.
Pointers on where to start will do fine.

I'd be tempted just to replicate the data on either side:
library(ggplot2)
empty <- c(7, 6, 20, 5, 16, 20, 24, 5, 24, 24, 15, 11, 8, 6, 13, 2, 21, 6, 1, 6, 9, 1, 8, 0, 20, 9, 20, 11, 22, 24, 1, 2, 12, 6, 8, 2, 23, 18, 8, 3, 20, 2, 1, 0, 5, 21, 1, 4, 20, 15, 24, 12, 4, 14, 2, 4, 20, 16, 2, 10, 2, 1, 24, 9, 22, 7, 6, 3, 20, 13, 1, 16, 12, 5, 2, 7, 4, 1, 6, 1, 1, 2, 0, 13, 24, 6, 13, 7, 24, 24, 15, 6, 10, 1, 2, 9, 5, 2, 11, 15)
hour <- c(8, 16, 23, 14, 18, 7, 17, 15, 19, 19, 17, 17, 16, 14, 17, 12, 19, 10, 10, 13, 16, 10, 16, 11, 12, 9, 0, 15, 16, 21, 10, 11, 17, 11, 16, 15, 23, 7, 16, 14, 18, 14, 14, 9, 15, 2, 10, 9, 19, 17, 20, 16, 12, 17, 12, 9, 23, 9, 15, 17, 10, 12, 18, 17, 18, 17, 13, 10, 7, 8, 10, 18, 11, 11, 12, 17, 12, 9, 14, 15, 10, 11, 10, 10, 20, 16, 18, 15, 21, 18, 17, 13, 8, 11, 15, 16, 11, 9, 12, 18)
emptyrep <- rep.int(empty,3)
hourrep <- c(hour,hour+24,hour-24)
sl <- data.frame(empty=emptyrep, hour=hourrep)
qplot(hour, empty, data=sl, geom="jitter") + theme_bw() + geom_smooth(method="loess",size = 1.5,span=0.2) + coord_cartesian(xlim=c(0,24))
... just like joran said a few minutes earlier (whoops).
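The same wrap-around idea can be sketched in base R without ggplot2. This is only an illustration: the data below are synthetic stand-ins (not the parking sample), and `surface = "direct"` is an assumed choice so that the fit is evaluated exactly at each requested hour rather than interpolated.

```r
# Synthetic stand-in for the parking data (an assumption, not the real sample)
set.seed(1)
hour  <- sample(0:23, 200, replace = TRUE)
empty <- 12 + 8 * cos(2 * pi * hour / 24) + rnorm(200, sd = 2)

# Tile the data one day to the left and one day to the right
wrapped <- data.frame(
  hour  = c(hour - 24, hour, hour + 24),
  empty = rep(empty, 3)
)

# surface = "direct" evaluates the local regression exactly at each point
fit <- loess(empty ~ hour, data = wrapped, span = 0.2,
             control = loess.control(surface = "direct"))

# Evaluate the smoother only inside the original day
grid <- data.frame(hour = seq(0, 24, by = 0.5))
grid$fitted <- predict(fit, newdata = grid)

# The two ends of the day should agree up to floating-point error
ends <- grid$fitted[c(1, nrow(grid))]
print(ends)
```

Because the neighbourhood around hour 0 is a shifted copy of the neighbourhood around hour 24, the two local fits see the same points, which is what makes the curve close up.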


Marginal effects from the multinomial model

I am trying to get the marginal effects from a multinomial model derived from the mlogit package but it shows an error. Can anyone provide some guidance to solve the problem? Many thanks!
# data
df1 <- structure(list(Y = c(3, 4, 1, 2, 3, 4, 1, 5, 2, 3, 4, 2, 1, 4,
1, 5, 3, 3, 3, 5, 5, 4, 3, 5, 4, 2, 5, 4, 3, 2, 5, 3, 2, 5, 5,
4, 5, 1, 2, 4, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 2, 4, 1, 5, 3, 1,
5, 2, 3, 4, 2, 4, 5, 2, 4, 1, 4, 2, 1, 5, 3, 2, 1, 4, 4, 1, 5,
1, 1, 1, 4, 5, 5, 3, 2, 3, 3, 2, 4, 4, 5, 3, 5, 1, 2, 5, 5, 1,
2, 3), D = c(12, 8, 6, 11, 5, 14, 0, 22, 15, 13, 18, 3, 5, 9,
10, 28, 9, 16, 17, 14, 26, 18, 18, 23, 23, 12, 28, 14, 10, 15,
26, 9, 2, 30, 18, 24, 27, 7, 6, 25, 13, 8, 4, 16, 1, 4, 5, 18,
21, 1, 2, 19, 4, 2, 16, 17, 23, 15, 13, 21, 24, 14, 27, 6, 20,
6, 19, 8, 7, 23, 11, 11, 1, 22, 21, 4, 27, 6, 2, 9, 18, 30, 26,
22, 10, 1, 4, 7, 26, 15, 26, 18, 30, 1, 11, 29, 25, 3, 19, 15
), x1 = c(13, 12, 4, 3, 16, 16, 15, 13, 1, 15, 10, 16, 1, 17,
7, 13, 12, 6, 8, 16, 16, 11, 7, 16, 5, 13, 12, 16, 17, 6, 16,
9, 14, 16, 15, 5, 7, 2, 8, 2, 9, 9, 15, 13, 9, 4, 16, 2, 11,
13, 11, 6, 4, 3, 7, 4, 12, 2, 16, 14, 3, 13, 10, 11, 10, 4, 11,
16, 8, 12, 14, 9, 4, 16, 16, 12, 9, 10, 6, 1, 3, 8, 7, 7, 5,
16, 17, 10, 4, 15, 10, 8, 3, 13, 9, 16, 12, 7, 4, 11), x2 = c(12,
19, 18, 19, 15, 12, 15, 16, 15, 11, 12, 16, 17, 14, 12, 17, 17,
16, 12, 20, 11, 11, 15, 14, 18, 10, 14, 13, 10, 14, 18, 18, 18,
17, 18, 14, 16, 19, 18, 16, 18, 14, 17, 10, 16, 12, 16, 15, 11,
18, 19, 15, 19, 11, 16, 10, 20, 14, 10, 12, 10, 15, 13, 15, 11,
20, 11, 12, 16, 16, 11, 15, 11, 11, 10, 10, 16, 11, 20, 17, 20,
17, 16, 11, 18, 19, 18, 14, 17, 11, 16, 11, 18, 14, 15, 16, 11,
14, 11, 13)), class = "data.frame", row.names = c(NA, -100L))
library(mlogit)
mld <- mlogit.data(df1, choice="Y", shape="wide") # shape data for `mlogit()`
mlfit <- mlogit(Y ~ 1 | D + x1 + x2, reflevel="1", data=mld) # fit the model
effects(mlfit) # this shows the following error:
Error in if (rhs %in% c(1, 3)) { : argument is of length zero
Called from: effects.mlogit(mlfit)
I believe you are missing the covariate argument. If you call effects(mlfit, covariate = 'D'), it should work. The error arises because covariate defaults to NULL. NULL is special in R: it has length zero, which is why you get "argument is of length zero". Please let me know if this fixes your issue.
The documentation of effects.mlogit describes the argument as follows:
covariate
the name of the covariate for which the effect should be computed,
I am getting this output at my end:
R> effects(mlfit, covariate = 'D')
              1               2               3
-0.003585105992 -0.070921137682 -0.026032167377
              4               5
 0.078295227196  0.022243183855
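The mechanism behind the error can be reproduced in base R. The sketch below is not mlogit's code; `covariate` and `cond` are stand-in names used only to show why a NULL value makes an `if` condition fail.

```r
covariate <- NULL                 # mimics leaving the argument at its default
print(length(covariate))          # 0

# %in% on NULL yields logical(0), which is neither TRUE nor FALSE
cond <- covariate %in% c(1, 3)

# if() needs exactly one TRUE/FALSE value, so a zero-length condition errors
msg <- tryCatch(
  if (cond) "matched" else "no match",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Passing any non-NULL value gives the condition length one, which is why naming a covariate makes the error go away.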

Using text3D in ribbon plot in R

I want to construct a 3D ribbon plot with the following data (the matrix tf14 used in the code below).
tf14 <- structure(c(10, 10, 10, 10, 10, 10, 21, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 10, 10, 10, 10, 21, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 19,
10, 10, 10, 21, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
20, 10, 20, 9, 9, 9, 9, 9, 21, 9, 9, 9, 18, 9, 9, 9, 9, 9, 9,
9, 9, 19, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 16, 8, 8, 8, 8,
8, 18, 8, 8, 19, 8, 9, 9, 9, 9, 9, 9, 21, 20, 9, 9, 9, 9, 9,
9, 9, 9, 19, 9, 9, 18, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 8,
8, 8, 8, 8, 8, 19, 8, 8, 18, 8, 7, 7, 14, 7, 16, 7, 21, 20, 7,
18, 7, 7, 7, 7, 14, 7, 19, 7, 7, 16, 7, 8, 8, 16, 8, 8, 8, 20,
19, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 16, 8, 8,
8, 20, 19, 16, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 17,
8, 16, 8, 20, 18, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 7,
7, 16, 16, 16, 7, 18, 20, 7, 21, 16, 7, 7, 7, 7, 7, 19, 7, 7,
7, 7), .Dim = c(21L, 12L), .Dimnames = list(c("colmA", "colmB",
"colmC", "colmD", "colmE", "colmF", "colmG", "colmH", "colmI",
"colmJ", "colmK", "colmL", "colmM", "colmN", "colmO", "colmP",
"colmQ", "colmR", "colmS", "colmT", "colmU"), c("2005", "2006",
"2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014",
"2015", "2016")))
Since I did not get any response, I worked out some code in the meantime. Here it is:
library(plot3D)
ribbon3D(x = 1:21, y = 1:12, z = tf14, scale = TRUE, expand = 0.01, bty = "g", along = "y",
col = "pink", border = "black", shade = 0.2, ltheta = -90, lphi = 30, space = 0.5,
ticktype = "detailed", d = 2, curtain = TRUE, xlab = "", ylab = "", zlab = "")
# Use text3D to label x axis
text3D(x = 1:21, y = rep(0.5, 21), z = rep(1, 21),
labels = rownames(tf14),
add = TRUE, adj = 0, lphi = 30, ltheta = -90)
# Use text3D to label y axis
text3D(x = rep(0.5, 12), y = 1:12, z = rep(1, 12),
labels = colnames(tf14),
add = TRUE, adj = 1, lphi = 30, ltheta = -90)
But, the image that I get is not the desired one. The axis labels are cluttered and the side on which years are displayed needs to be right hand side. Also, the height of the ribbons is too low.
Can somebody improve the code?

How to get the true node value in igraph

I have read some network data into igraph (R) and would like to store the nodes in a list. Here's what I have done:
G = read_graph("somegraph.graphml",format="graphml")
x = list(V(G))
> x
+ 15/15 vertices, from ecb3920:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
My question is: how do I get the true value, i.e. the actual node id in my data, from V(G)? Thanks.
> dput(G)
structure(list(15, FALSE, c(13, 7, 9, 14, 10, 5, 4, 11, 6, 7,
14, 4, 13, 9, 10, 5, 5, 13, 9, 6, 7, 14, 12, 10, 14, 10, 11,
13, 9, 10, 12, 14, 8, 7, 11, 12, 8, 13, 14, 9, 11, 13, 13, 12,
14, 10, 13, 12, 14, 12, 13, 13, 14, 14), c(0, 0, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6,
6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10,
10, 10, 11, 11, 12, 12, 13), c(6, 11, 5, 15, 16, 8, 19, 1, 9,
20, 33, 32, 36, 2, 13, 18, 28, 39, 4, 14, 23, 25, 29, 45, 7,
26, 34, 40, 22, 30, 35, 43, 47, 49, 0, 12, 17, 27, 37, 41, 42,
46, 50, 51, 3, 10, 21, 24, 31, 38, 44, 48, 52, 53), c(1, 0, 6,
5, 2, 4, 3, 11, 15, 8, 9, 13, 14, 7, 12, 10, 16, 19, 20, 18,
23, 22, 17, 21, 25, 24, 33, 32, 28, 29, 26, 30, 27, 31, 36, 39,
34, 35, 37, 38, 40, 41, 45, 43, 42, 44, 47, 46, 48, 49, 50, 51,
52, 53), c(0, 0, 0, 0, 0, 2, 5, 7, 11, 13, 18, 24, 28, 34, 44,
54), c(0, 2, 2, 7, 16, 24, 26, 34, 40, 42, 46, 49, 51, 53, 54,
54), list(c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(id = c("1351920706", "500102244", "1454425532",
"1625050630", "510838353", "1262640078", "681721364", "1351920717",
"1260750116", "1524975171", "1070293410", "727198538", "715215233",
"1351920666", "500920034")), .Names = "id"), list()), <environment>), class = "igraph")
Just for closure (and to summarise from our chat): Based on the sample data you give, you can extract additional data for every vertex by accessing the corresponding vertex attribute with $.
So
V(G)$id
returns
#[1] "1351920706" "500102244" "1454425532" "1625050630" "510838353"
#[6] "1262640078" "681721364" "1351920717" "1260750116" "1524975171"
#[11] "1070293410" "727198538" "715215233" "1351920666" "500920034"
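A small self-contained sketch of the same idea, assuming the igraph package is installed. The toy graph stands in for the GraphML file and reuses three ids from the question purely for illustration.

```r
library(igraph)

# Toy graph standing in for the GraphML file (an assumption)
g <- make_ring(3)
V(g)$id <- c("1351920706", "500102244", "1454425532")

# Printing V(g) shows igraph's internal indices, not the stored ids
idx <- as_ids(V(g))

# The ids from the data live in the "id" vertex attribute
ids <- V(g)$id
print(vertex_attr_names(g))   # which vertex attributes exist

# Wrap in a list if a list is really what is needed
ids_list <- as.list(ids)
print(ids)
```

`vertex_attr_names()` is a quick way to check which attribute the file's ids ended up in if it is not called `id`.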

Computing a few difficult metrics from an integer vector in R

For some context, I am working with sports / basketball data. The following vector is for 1 NBA game, and contains the number of points that the home team was ahead or behind at any given point in the game.
dput(leads_vector)
c(0, 0, 0, 0, 0, 0, 0, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 4, 2,
5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 11, 11, 9, 9, 9, 9, 9, 9, 9, 9, 11,
11, 9, 9, 9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 13, 15,
14, 14, 13, 13, 13, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16,
16, 13, 13, 11, 11, 11, 11, 11, 9, 9, 9, 7, 9, 9, 9, 10, 10,
11, 11, 11, 11, 11, 11, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11,
12, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 15, 15, 15, 13, 13, 13, 13, 15, 12, 12, 12, 9,
9, 9, 9, 9, 11, 11, 11, 11, 13, 13, 10, 10, 10, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 10, 8, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 11, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 12, 10, 12, 12, 12,
12, 14, 14, 14, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 15,
16, 16, 16, 16, 14, 14, 11, 11, 11, 11, 11, 11, 9, 9, 9, 9, 9,
9, 9, 10, 11, 11, 9, 9, 9, 9, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1,
3, 3, 3, 3, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6,
6, 6, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 8, 8, 7, 7, 7,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 11,
9, 9, 9, 9, 9, 9, 10, 11, 11, 11, 8, 11, 8, 10, 10, 11, 11, 11,
11, 11, 9, 11, 11, 11, 10, 10, 10, 12, 12, 12, 12, 13, 13, 16,
16, 16, 16, 17, 18, 19, 19, 19, 19, 19, 18, 18, 18, 20, 20, 20,
20, 20, 20, 20, 18, 18, 18, 16, 16, 16, 13, 13, 13, 11, 10, 10,
10, 10, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
These vectors always start with 0, since the game begins tied at 0-0. leads_vector[100] equals 14, which means the home team was winning by 14 at this point in the game. Note that the numbers in the vector repeat, since the score can remain the same for several plays in a row in a basketball game.
The 4 metrics I would like to compute are:
Biggest Lead
Number of times the game was tied
Longest run (consecutive points for one team)
Lead changes
Biggest Lead is easy to compute:
biggest_lead <- max(abs(leads_vector))
Number of times the game was tied is a bit more difficult to compute:
times_tied <- sum(leads_vector[2:length(leads_vector)] == 0 & leads_vector[1:(length(leads_vector)-1)] != 0)
times_tied checks for all instances in the vector where the value is 0 (the score is tied), and the preceding value in the vector is not 0. This ensures that each sequence of zeros counts as the score being tied only once.
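A worked check of that logic on a short vector may help. The vector `x` below is made up for illustration; the run-length comparison is an alternative way to count the same thing.

```r
x <- c(0, 0, 2, 0, 0, -1, 0, 3)   # toy lead vector, not real game data

prev <- x[-length(x)]   # every value except the last
curr <- x[-1]           # every value except the first

# A new tie starts where the current value is 0 and the previous one is not
times_tied <- sum(curr == 0 & prev != 0)
print(times_tied)

# Same count via run lengths: zero runs, minus the opening 0-0 run
times_tied_rle <- sum(rle(x)$values == 0) - 1
print(times_tied_rle)
```

Subtracting one in the run-length version relies on the vector starting at 0, which the question says is always the case.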
I am not sure how to compute longest run. The longest run in the game is the largest monotonically increasing or decreasing sequence in the vector. Just using the eye test, I notice a long run of 8 at leads_vector[38:65].
Number of lead changes is difficult to compute as well. It would be equal to the number of times the lead went from positive to negative, or from negative to positive, in this vector. For example, the following leads_vector:
c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
... would have 4 lead changes (from 3 to -3, from -3 to 2, from 2 to -3, and from -2 to 0 to 1).
Any help with this is appreciated!
EDIT - Longest run is the tough stat to compute here, but I'm working on it.
EDIT2 - I think longest run will be easier to compute if I remove repeated values from leads_vector. But I cannot use the duplicated() function, because that would also remove duplicates that occur in different places in the vector. Instead I want to remove only repeated values that are next to each other (to get c(0, -2, 5, 3, 5, 8, 10, 11, 9, 11, 9, 11, ... )).
Computing the longest run:
compute_longest_run <- function(x) {
  # Collapse repetitions
  x_unique <- rle(x)$values
  # Compute score change
  score_change <- diff(x_unique)
  # Need to compute sum of all subvectors with the same sign
  run_side <- sign(score_change)
  run_id <- c(1, cumsum(diff(run_side) != 0) + 1)
  run_value <- tapply(score_change, run_id, sum)
  max(abs(run_value))
}
compute_longest_run(leads_vector)
#> [1] 10
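To see what each step produces, here is the same sequence of operations traced on a short vector. The input `x` is made up for illustration; the steps mirror the function body.

```r
x <- c(0, 0, 2, 2, 5, 3, 3, 4, 6, 9)   # toy lead vector, not real game data

x_unique <- rle(x)$values        # 0 2 5 3 4 6 9  (adjacent repeats collapsed)
score_change <- diff(x_unique)   # 2 3 -2 1 2 3   (points scored per change)
run_side <- sign(score_change)   # 1 1 -1 1 1 1   (which team scored)

# A new run id starts whenever the scoring side flips
run_id <- c(1, cumsum(diff(run_side) != 0) + 1)   # 1 1 2 3 3 3

# Net points per run: 5, -2, 6
run_value <- tapply(score_change, run_id, sum)
longest <- max(abs(run_value))
print(longest)
```

Here the home team's closing stretch (1 + 2 + 3 points) is the longest run, at 6.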
#biggest_lead
with(rle(leads_vector), max(abs(values)))
#number_ties
with(rle(leads_vector), sum(values == 0))
#longest_run
#lead_changes (number of sign runs among non-zero leads, minus the first one)
length(rle(leads_vector[leads_vector != 0] < 0)$values) - 1
I found out how to compute lead changes using the sign() and diff() functions. First I need to filter out the values where the lead equals 0, since a tie is not a lead change for my calculations, even though R's sign() function returns different values for positive, negative and zero inputs. Each remaining sign flip shows up as a difference of +2 or -2, so I sum the absolute differences and divide by 2:
lead_changes <- sum(abs(diff(sign(leads_vector[leads_vector != 0])))) / 2
For longest run, I think removing adjacent repeated values is a good start:
leads_vector[c(TRUE, leads_vector[-1] != leads_vector[-length(leads_vector)])]
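Both ideas can be checked on small inputs. The first vector is the lead-change example from the question, for which 4 lead changes are expected; the second vector `y` is made up to show the removal of adjacent repeats.

```r
# Lead changes on the example vector from the question
x <- c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
s <- sign(x[x != 0])                  # drop ties, keep only who is ahead
lead_changes <- sum(diff(s) != 0)     # count the sign flips
print(lead_changes)

# Removing adjacent repeats on a toy vector
y <- c(0, 0, -2, -2, 5, 5, 3, 5)
dedup <- y[c(TRUE, y[-1] != y[-length(y)])]
print(dedup)

# rle() collapses adjacent repeats in the same way
same_as_rle <- identical(dedup, rle(y)$values)
print(same_as_rle)
```

Counting flips with `diff(s) != 0` avoids the division by 2 and cannot cancel out when the lead swings back and forth.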

sorting columns from lowest to highest values (i.e. 1, 2, 3 etc, not 1, 10, 11...2, 20, 21... etc)

I have a dataset with 50 thousand rows that I want to sort according to the values in one of the columns. The numbers in the column go from 1 to 30, and when I do the following
data=data[order(data$columnname),]
it gets sorted so that the order of the values is like this
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 3, 30, 4, 5, 6, 7, 8, 9
How could I sort it so that it looks like this?
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
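It seems to me that your column is not numeric (it is probably stored as character or factor). Try this:
data$columnname <- as.numeric(as.character(data$columnname)) # as.character() guards against factor codes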
data=data[order(data$columnname),]
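A short sketch of why the type matters, using a made-up vector `vals`. It also shows the factor case, where as.numeric() alone returns the internal level codes rather than the printed numbers.

```r
vals <- c("1", "10", "2", "30", "3")   # numbers stored as text (toy data)

# Text is compared character by character, so "10" sorts before "2"
as_text <- sort(vals)
print(as_text)

# After conversion the values sort numerically
as_num <- sort(as.numeric(vals))
print(as_num)

# Factor pitfall: as.numeric() on a factor returns level codes
f <- factor(vals)
codes  <- as.numeric(f)
values <- as.numeric(as.character(f))
print(codes)
print(values)
```

Going through as.character() first is harmless for character columns and necessary for factor columns, so it is a safe default when the column type is unknown.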
