I am trying to get the marginal effects from a multinomial model derived from the mlogit package but it shows an error. Can anyone provide some guidance to solve the problem? Many thanks!
# data
df1 <- structure(list(Y = c(3, 4, 1, 2, 3, 4, 1, 5, 2, 3, 4, 2, 1, 4,
1, 5, 3, 3, 3, 5, 5, 4, 3, 5, 4, 2, 5, 4, 3, 2, 5, 3, 2, 5, 5,
4, 5, 1, 2, 4, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 2, 4, 1, 5, 3, 1,
5, 2, 3, 4, 2, 4, 5, 2, 4, 1, 4, 2, 1, 5, 3, 2, 1, 4, 4, 1, 5,
1, 1, 1, 4, 5, 5, 3, 2, 3, 3, 2, 4, 4, 5, 3, 5, 1, 2, 5, 5, 1,
2, 3), D = c(12, 8, 6, 11, 5, 14, 0, 22, 15, 13, 18, 3, 5, 9,
10, 28, 9, 16, 17, 14, 26, 18, 18, 23, 23, 12, 28, 14, 10, 15,
26, 9, 2, 30, 18, 24, 27, 7, 6, 25, 13, 8, 4, 16, 1, 4, 5, 18,
21, 1, 2, 19, 4, 2, 16, 17, 23, 15, 13, 21, 24, 14, 27, 6, 20,
6, 19, 8, 7, 23, 11, 11, 1, 22, 21, 4, 27, 6, 2, 9, 18, 30, 26,
22, 10, 1, 4, 7, 26, 15, 26, 18, 30, 1, 11, 29, 25, 3, 19, 15
), x1 = c(13, 12, 4, 3, 16, 16, 15, 13, 1, 15, 10, 16, 1, 17,
7, 13, 12, 6, 8, 16, 16, 11, 7, 16, 5, 13, 12, 16, 17, 6, 16,
9, 14, 16, 15, 5, 7, 2, 8, 2, 9, 9, 15, 13, 9, 4, 16, 2, 11,
13, 11, 6, 4, 3, 7, 4, 12, 2, 16, 14, 3, 13, 10, 11, 10, 4, 11,
16, 8, 12, 14, 9, 4, 16, 16, 12, 9, 10, 6, 1, 3, 8, 7, 7, 5,
16, 17, 10, 4, 15, 10, 8, 3, 13, 9, 16, 12, 7, 4, 11), x2 = c(12,
19, 18, 19, 15, 12, 15, 16, 15, 11, 12, 16, 17, 14, 12, 17, 17,
16, 12, 20, 11, 11, 15, 14, 18, 10, 14, 13, 10, 14, 18, 18, 18,
17, 18, 14, 16, 19, 18, 16, 18, 14, 17, 10, 16, 12, 16, 15, 11,
18, 19, 15, 19, 11, 16, 10, 20, 14, 10, 12, 10, 15, 13, 15, 11,
20, 11, 12, 16, 16, 11, 15, 11, 11, 10, 10, 16, 11, 20, 17, 20,
17, 16, 11, 18, 19, 18, 14, 17, 11, 16, 11, 18, 14, 15, 16, 11,
14, 11, 13)), class = "data.frame", row.names = c(NA, -100L))
library(mlogit)
mld <- mlogit.data(df1, choice="Y", shape="wide") # shape data for `mlogit()`
mlfit <- mlogit(Y ~ 1 | D + x1 + x2, reflevel="1", data=mld) # fit the model
effects(mlfit) # this shows the following error:
Error in if (rhs %in% c(1, 3)) { : argument is of length zero
Called from: effects.mlogit(mlfit)
I believe you are missing the covariate argument: effects(mlfit, covariate = 'D') should work. The error occurs because covariate defaults to NULL. NULL has length zero in R, so the if() condition inside effects.mlogit() receives a zero-length argument, hence "argument is of length zero". Please let me know if this fixes your issue.
The documentation of effects.mlogit describes the argument as follows:
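The error message itself can be reproduced in base R without mlogit. This is a minimal sketch, assuming (as the traceback suggests) that the internal value rhs is left as NULL when no covariate is supplied; rhs below is only a stand-in variable, not the package's actual code path.

```r
# Stand-in for the internal value that stays unset when covariate = NULL
rhs <- NULL

# %in% on a zero-length left-hand side returns logical(0), not FALSE
membership <- rhs %in% c(1, 3)
length(membership)

# if() needs exactly one TRUE/FALSE, so a zero-length condition is an error
msg <- tryCatch(
  if (rhs %in% c(1, 3)) "matched" else "not matched",
  error = function(e) conditionMessage(e)
)
msg
```

Passing covariate explicitly avoids this zero-length condition.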
covariate
the name of the covariate for which the effect should be computed,
I am getting this output at my end:
R>effects(mlfit, covariate = 'D')
1 2 3
-0.003585105992 -0.070921137682 -0.026032167377
4 5
0.078295227196 0.022243183855
I want to construct a 3D ribbon plot with the following data (a 21 x 12 matrix, called tf14 in the code below).
structure(c(10, 10, 10, 10, 10, 10, 21, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 10, 10, 10, 10, 21, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 10, 10, 10, 19,
10, 10, 10, 21, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
20, 10, 20, 9, 9, 9, 9, 9, 21, 9, 9, 9, 18, 9, 9, 9, 9, 9, 9,
9, 9, 19, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 16, 8, 8, 8, 8,
8, 18, 8, 8, 19, 8, 9, 9, 9, 9, 9, 9, 21, 20, 9, 9, 9, 9, 9,
9, 9, 9, 19, 9, 9, 18, 9, 8, 8, 16, 8, 16, 8, 21, 20, 8, 8, 8,
8, 8, 8, 8, 8, 19, 8, 8, 18, 8, 7, 7, 14, 7, 16, 7, 21, 20, 7,
18, 7, 7, 7, 7, 14, 7, 19, 7, 7, 16, 7, 8, 8, 16, 8, 8, 8, 20,
19, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 16, 8, 8,
8, 20, 19, 16, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 8, 8, 17,
8, 16, 8, 20, 18, 8, 21, 8, 8, 8, 8, 16, 8, 18, 8, 8, 8, 8, 7,
7, 16, 16, 16, 7, 18, 20, 7, 21, 16, 7, 7, 7, 7, 7, 19, 7, 7,
7, 7), .Dim = c(21L, 12L), .Dimnames = list(c("colmA", "colmB",
"colmC", "colmD", "colmE", "colmF", "colmG", "colmH", "colmI",
"colmJ", "colmK", "colmL", "colmM", "colmN", "colmO", "colmP",
"colmQ", "colmR", "colmS", "colmT", "colmU"), c("2005", "2006",
"2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014",
"2015", "2016")))
Since I did not get any response, I worked out some code in the meantime. Here it is.
library(plot3D) # provides ribbon3D() and text3D()
ribbon3D(x = 1:21, y = 1:12, z = tf14, scale = TRUE, expand = 0.01, bty = "g", along = "y",
col = "pink", border = "black", shade = 0.2, ltheta = -90, lphi = 30, space = 0.5,
ticktype = "detailed", d = 2, curtain = TRUE, xlab = "", ylab = "", zlab = "")
# Use text3D to label x axis
text3D(x = 1:21, y = rep(0.5, 21), z = rep(1, 21),
labels = rownames(tf14),
add = TRUE, adj = 0, lphi = 30, ltheta = -90)
# Use text3D to label y axis
text3D(x = rep(0.5, 12), y = 1:12, z = rep(1, 12),
labels = colnames(tf14),
add = TRUE, adj = 1, lphi = 30, ltheta = -90)
But, the image that I get is not the desired one. The axis labels are cluttered and the side on which years are displayed needs to be right hand side. Also, the height of the ribbons is too low.
Can somebody improve the code?
I want to generate a Sankey plot to visualize movements to different areas using sankeyNetwork() from the networkD3 package in R. I tried to mimic some examples as closely as possible, but when I run sankeyNetwork(), no output is generated. On top of that, R doesn't show any warnings or errors, so I can't tell what I did wrong or how to fix it. A sample data frame and the code are below.
library(networkD3)
nodes <- data.frame(area = c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j", "k", "l", "m", "n"))
links2 <- data.frame(source = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 11, 12, 13, 13),
target = c(2, 8, 10, 11, 13, 0, 4, 5, 6, 7, 10, 11, 13, 0, 4, 9, 10, 12, 13, 0, 5, 6, 7, 10, 11, 13, 7, 10, 12,
0, 10, 11, 12, 13, 8, 9, 10, 11, 12, 13, 9, 10, 13, 10, 12, 13, 0, 11, 12, 13, 0, 14, 0, 0),
value = c(14, 4, 6, 23, 3, 6, 36, 3, 4, 4, 3, 12, 3, 24, 3, 6, 19, 3, 9, 3, 6, 3, 11, 9, 3, 22, 3, 3, 10, 3, 4,
3, 3, 9, 12, 5, 16, 13, 3, 10, 3, 4, 9, 7, 4, 4, 77, 4, 6, 6, 27, 3, 3, 3))
sankeyNetwork(Links = links2, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "area",
fontSize= 12, nodeWidth = 30)
You refer to 15 unique nodes in your links2 data frame, but you only have 14 unique nodes in your nodes data frame.
length(unique(c(links2$source, links2$target)))
# [1] 15
length(nodes$area)
# [1] 14
If you add another node, it will work...
library(networkD3)
nodes <- data.frame(area = c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j", "k", "l", "m", "n", "o"))
links2 <- data.frame(source = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 11, 12, 13, 13),
target = c(2, 8, 10, 11, 13, 0, 4, 5, 6, 7, 10, 11, 13, 0, 4, 9, 10, 12, 13, 0, 5, 6, 7, 10, 11, 13, 7, 10, 12,
0, 10, 11, 12, 13, 8, 9, 10, 11, 12, 13, 9, 10, 13, 10, 12, 13, 0, 11, 12, 13, 0, 14, 0, 0),
value = c(14, 4, 6, 23, 3, 6, 36, 3, 4, 4, 3, 12, 3, 24, 3, 6, 19, 3, 9, 3, 6, 3, 11, 9, 3, 22, 3, 3, 10, 3, 4,
3, 3, 9, 12, 5, 16, 13, 3, 10, 3, 4, 9, 7, 4, 4, 77, 4, 6, 6, 27, 3, 3, 3))
sankeyNetwork(Links = links2, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "area",
fontSize= 12, nodeWidth = 30)
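Because sankeyNetwork() fails silently here, it can help to check the link indices against the node table before plotting. The following is a rough sketch in base R: check_links() is a hypothetical helper (not part of networkD3), and it assumes the links use zero-based indices, so the largest valid index is nrow(nodes) - 1.

```r
# Hypothetical helper: list index problems that would leave the plot empty
check_links <- function(links, nodes, source = "source", target = "target") {
  ids <- c(links[[source]], links[[target]])
  last_valid <- nrow(nodes) - 1  # zero-based: valid indices are 0..last_valid
  problems <- character(0)
  if (min(ids) < 0) {
    problems <- c(problems, "negative index found")
  }
  if (max(ids) > last_valid) {
    problems <- c(problems,
                  paste0("index ", max(ids), " exceeds last node index ", last_valid))
  }
  problems
}

# Small illustration: a link pointing at index 14 needs 15 nodes
small_links <- data.frame(source = c(0, 1, 12), target = c(1, 13, 14), value = c(3, 4, 5))
nodes14 <- data.frame(area = letters[1:14])
nodes15 <- data.frame(area = letters[1:15])

check_links(small_links, nodes14)  # reports the out-of-range index
check_links(small_links, nodes15)  # character(0): nothing to report
```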
For some context, I am working with sports / basketball data. The following vector is for 1 NBA game, and contains the number of points that the home team was ahead or behind at any given point in the game.
dput(leads_vector)
c(0, 0, 0, 0, 0, 0, 0, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 4, 2,
5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 11, 11, 9, 9, 9, 9, 9, 9, 9, 9, 11,
11, 9, 9, 9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 13, 15,
14, 14, 13, 13, 13, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16,
16, 13, 13, 11, 11, 11, 11, 11, 9, 9, 9, 7, 9, 9, 9, 10, 10,
11, 11, 11, 11, 11, 11, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11,
12, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 15, 15, 15, 13, 13, 13, 13, 15, 12, 12, 12, 9,
9, 9, 9, 9, 11, 11, 11, 11, 13, 13, 10, 10, 10, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 10, 8, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 11, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 12, 10, 12, 12, 12,
12, 14, 14, 14, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 15,
16, 16, 16, 16, 14, 14, 11, 11, 11, 11, 11, 11, 9, 9, 9, 9, 9,
9, 9, 10, 11, 11, 9, 9, 9, 9, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1,
3, 3, 3, 3, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6,
6, 6, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 8, 8, 7, 7, 7,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 11,
9, 9, 9, 9, 9, 9, 10, 11, 11, 11, 8, 11, 8, 10, 10, 11, 11, 11,
11, 11, 9, 11, 11, 11, 10, 10, 10, 12, 12, 12, 12, 13, 13, 16,
16, 16, 16, 17, 18, 19, 19, 19, 19, 19, 18, 18, 18, 20, 20, 20,
20, 20, 20, 20, 18, 18, 18, 16, 16, 16, 13, 13, 13, 11, 10, 10,
10, 10, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
These vectors always start with 0, since the game begins tied at 0-0. leads_vector[100] equals 14, which means the home team was winning by 14 at this point in the game. Note that the numbers in the vector repeat, since the score can remain the same for several plays in a row in a basketball game.
The 4 metrics I would like to compute are:
Biggest Lead
Number of times the game was tied
Longest run (consecutive points for one team)
Lead changes
Biggest Lead is easy to compute:
biggest_lead <- max(abs(leads_vector))
Number of times the game was tied is a bit more difficult to compute:
times_tied <- sum(leads_vector[2:length(leads_vector)] == 0 & leads_vector[1:(length(leads_vector)-1)] != 0)
times_tied checks for all instances in the vector where the value is 0 (the score is tied), and the preceding value in the vector is not 0. This ensures that each sequence of zeros counts as the score being tied only once.
I am not sure how to compute longest run. The longest run in the game is the largest monotonically increasing or decreasing sequence in the vector. Just using the eye test, I notice a long run of 8 at leads_vector[38:65].
Number of lead changes is difficult to compute as well. It would be equal to the number of times the lead went from positive to negative, or from negative to positive, in this vector. The following leads_vector:
c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
... would have 4 lead changes (from 3 to -3, from -3 to 2, from 2 to -3, and from -2 to 0 to 1).
Any help with this is appreciated!
EDIT - longest run is the tough stat to compute here, but I'm working on it.
EDIT2 - I think longest run will be easier to compute if I remove repeated values from leads_vector. But I cannot use the duplicated() function, because that would also remove duplicates that occur in different places in the vector. Instead I only want to remove repeated values that are next to each other (to get c(0, -2, 5, 3, 5, 8, 10, 11, 9, 11, 9, 11, ... )).
Computing the longest run:
compute_longest_run <- function(x) {
# Collapse repetitions
x_unique <- rle(x)$values
# Compute score change
score_change <- diff(x_unique)
# Need to compute sum of all subvectors with the same sign
run_side <- sign(score_change)
run_id <- c(1, cumsum(diff(run_side) != 0) + 1)
run_value <- tapply(score_change, run_id, sum)
max(abs(run_value))
}
compute_longest_run(leads_vector)
#> [1] 10
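To make the steps easier to follow, here is a compact variant of the same idea, written as a sketch rather than a replacement: collapse repeats, take the score changes, group consecutive changes with the same sign, and sum each group. The name longest_run() and the small toy vector are illustrative assumptions, not from the original answer.

```r
# Sketch: largest total swing made of consecutive same-direction score changes
longest_run <- function(x) {
  steps <- diff(rle(x)$values)          # score changes with repeats collapsed
  if (length(steps) == 0) return(0)     # constant vector: no scoring at all
  runs <- rle(sign(steps))              # consecutive changes in one direction
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  totals <- mapply(function(s, e) sum(steps[s:e]), starts, ends)
  max(abs(totals))
}

toy <- c(0, 0, -2, -2, 0, 2, 4, 2, 5)
# steps are -2, 2, 2, 2, -2, 3; the three +2 steps form one run worth 6
longest_run(toy)  # 6
```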
# biggest_lead
with(rle(leads_vector), max(abs(values)))
# number_ties (counts every run of zeros, including the 0-0 start)
with(rle(leads_vector), sum(values == 0))
# longest_run
# lead_changes (number of same-sign runs minus one)
length(rle(leads_vector[leads_vector != 0] < 0)$values) - 1
I found out how to compute lead changes using the sign() and diff() functions. First I filter out the values where the lead equals 0, since a tie is not a lead change for my calculations (sign() returns 1, -1, or 0). After that, each change of side shows up as a difference of +2 or -2, so I take absolute values before summing; without abs() the differences would cancel each other out. I have this:
lead_changes <- sum(abs(diff(sign(leads_vector[leads_vector != 0])))) / 2
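As a quick check of the sign()/diff() idea, the sketch below applies it to the small example vector from the question, which is expected to have 4 lead changes. The function name count_lead_changes() is an illustrative assumption; it counts non-zero differences directly rather than dividing a sum by 2.

```r
# Sketch: count how often the leading side flips, ignoring ties
count_lead_changes <- function(leads) {
  side <- sign(leads[leads != 0])  # +1 home ahead, -1 home behind
  sum(diff(side) != 0)             # each flip gives a non-zero difference
}

example_leads <- c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
count_lead_changes(example_leads)  # 4
```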
For longest run, I think removing adjacent repeated values like this is a good start:
leads_vector[c(TRUE, leads_vector[-1] != leads_vector[-length(leads_vector)])]
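The idea of keeping a value only when it differs from its predecessor can be sketched on a short made-up vector. The helper name drop_adjacent_repeats() is an assumption for illustration; rle() from base R gives the same values and is used here as a cross-check.

```r
# Sketch: keep the first element, then every element that differs from the previous one
drop_adjacent_repeats <- function(x) {
  if (length(x) < 2) return(x)
  x[c(TRUE, x[-1] != x[-length(x)])]
}

x <- c(0, 0, -2, -2, 0, 0, 2, 2, 4, 2, 5)
drop_adjacent_repeats(x)

# rle() collapses runs of equal values, so its values should match
identical(drop_adjacent_repeats(x), rle(x)$values)
```

Unlike duplicated(), this keeps values that reappear later in the vector, such as the second 0 and the second 2 here.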
Forenote: this is a follow-up question to this one.
I've programmed a Boggle Game Solver in R (see this github page for source code), and find its performance disappointing.
Here's how it works...
# Say we have the following set of letters
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
"l", "e", "r", "o", "c", "f", "i", "e")
# We get the list of paths (permutations) from a pre-existing list
paths <- paths.by.length[[6]] # 6th element corresponds to 8-element "paths"
dim(paths) # [1] 183472 8
# The following function is the key here,
# mapping the 183,472 combinations to the 16 letters
candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))
# The only remaining thing is to intersect the candidate words
# with the actual words from our dictionary
dict.words <- dict.fr$mot[dict.fr$taille == 8]
valid.words <- intersect(candidates, dict.words)
Reproducible example for 13-letter words candidates
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o", "l", "e", "r", "o", "c", "f", "i", "e")
n.letters <- 13
paths <- structure(list(V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), V2 = c(2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2), V3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
V4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), V5 = c(7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7), V6 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6), V7 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), V8 = c(9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9), V9 = c(10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10), V10 = c(11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 13, 13, 13, 13,
13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14), V11 = c(8, 8,
12, 12, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 14, 14,
14, 14, 14, 14, 14, 11, 11, 11, 11, 11, 11, 11, 11), V12 = c(12,
12, 15, 15, 16, 15, 15, 12, 12, 14, 16, 12, 12, 15, 15, 11,
11, 11, 11, 15, 15, 15, 8, 12, 12, 12, 15, 15, 16, 16), V13 = c(15,
16, 14, 16, 15, 12, 16, 8, 16, 13, 12, 8, 15, 12, 14, 8,
12, 15, 16, 11, 12, 16, 12, 8, 15, 16, 12, 16, 12, 15)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13"), row.names = c(NA, 30L), class = "data.frame")
candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))
For such a small path list, this is pretty fast. But the actual number of paths for 13-letter words is 2,644,520, so it can take a minute or even more to find all candidates. Using doSNOW, I am able to parallelize the searches, which reduces the total time significantly, but there is a huge drawback: with a normal loop, I can exit/break as soon as I reach the point where no more words are found. This is not obvious (impossible?) to do with parallel processes.
So my question is: can you think of a better function/algorithm for this task? Some websites provide solutions to a Boggle game in a matter of seconds... Either they generated all possible letter combinations and stored the results in a database (!), or they clearly use a better algorithm (and probably a compiled language) to achieve those results.
Any ideas?
Using the cpp_str_split function from the Rcpp Gallery (it has to be compiled with Rcpp first), the running time is now reduced to about 3 seconds for 2,644,520 paths.
library(stringi)
paths <- data.frame(matrix(sample(1:16, 13*2644520, TRUE), ncol=13))
a1 <- stri_c(bog.letters[t(as.matrix(paths))], collapse="")
candidates <- cpp_str_split(a1, 13)[[1]]
For 2,644,520 paths, the apply approach takes about 80 seconds on my notebook.
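If compiling C++ is not an option, most of the cost of the apply() call can probably also be avoided in base R by looking up all letters in one step and pasting whole columns together. This is only a sketch on a tiny made-up path matrix; the names small.paths, letter.matrix and columns are assumptions, and no timing is claimed for the full 2,644,520 paths.

```r
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
                 "l", "e", "r", "o", "c", "f", "i", "e")

# Tiny stand-in for the real path table: each row is one path of cell indices
small.paths <- matrix(c(1, 2, 3,
                        2, 3, 4), nrow = 2, byrow = TRUE)

# One vectorized lookup for every cell, reshaped back to paths x positions
letter.matrix <- matrix(bog.letters[small.paths], nrow = nrow(small.paths))

# paste0() is vectorized over rows, so it is called once with one argument per column
columns <- lapply(seq_len(ncol(letter.matrix)), function(j) letter.matrix[, j])
candidates.fast <- do.call(paste0, columns)

# Row-by-row version from the question, for comparison
candidates.slow <- apply(small.paths, 1, function(x) paste(bog.letters[x], collapse = ""))

candidates.fast
```

For a data frame of paths, as.matrix(paths) would be needed before the lookup.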