Using mlogit in R with dependent and independent categorical variables

I have two vectors (A and B) with categorical data for 36 subjects. A_i = j means that subject i falls into category j of type 1, and B_i = k means that subject i falls into category k of type 2, with i = 1:36, j = 1:5 and k = 1:6.
library(mlogit)
AB <- read.csv("C:/.../AB.csv")
head(AB)
Subject A B
1 1 1 3
2 2 3 3
3 3 1 6
4 4 1 3
5 5 1 2
6 6 1 4
I would like to find a probability for every category combination. That is, with what probability does a subject choose categories j and k, for all j = 1:5 and k = 1:6?
I was told the probit/logit model was a great tool to use for this problem and I tried estimating it in R.
mldata<-mlogit.data(AB, choice="A", alt.var="B", shape="long", id.var = "Subject")
This gives me an error, and I cannot find my mistake.
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1.3", "1.3", "1.6", :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1.3’, ‘2.2’, ‘2.3’, ‘3.1’,‘3.5’,‘4.2’,‘4.3’, ‘5.3’, ‘5.4’, ‘6.5’, ‘7.3’, ‘8.2’, ‘8.3’
I tried looking through the help files, but they have not helped me much.
I hope someone can point out the mistake(s) I'm making.
Thank you very much for your help.

Post the output of dput(A) and dput(B) and specify what the first couple of answers should be. It looks like you want rowSums(.)/6 across some logical operation on those two matrices. Probably:
rowSums(A==B)/6
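For the stated goal of a probability for every category combination, one simple option (not the logit model the question mentions) is to tabulate empirical joint frequencies. The following is a minimal base R sketch, using only the six rows shown by head(AB) and assuming the category ranges 1:5 and 1:6 given in the question. The names counts and joint_prob are illustrative.

```r
# The six rows printed by head(AB) in the question
AB <- data.frame(Subject = 1:6,
                 A = c(1, 3, 1, 1, 1, 1),
                 B = c(3, 3, 6, 3, 2, 4))

# Fix the factor levels so that unobserved combinations show up with count 0
AB$A <- factor(AB$A, levels = 1:5)
AB$B <- factor(AB$B, levels = 1:6)

# 5 x 6 contingency table of counts, then cell counts divided by the total
counts <- table(A = AB$A, B = AB$B)
joint_prob <- prop.table(counts)

# Empirical probability of the combination A = 1, B = 3
joint_prob["1", "3"]
```

With only a handful of subjects many cells are zero, so these are raw frequencies rather than model-based estimates.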

Related

Creating a variable with randomized number from an old variable (with a higher population)

I am not very good at R, and I have a problem. I want to run a linear regression between two variables from different datasets, but I run into the problem that one dataset is much bigger than the other. To get around this, I want to create a smaller variable with the same number of observations, randomly selected from the larger dataset's variable. What is the command for that? If any further details are needed, please let me know. Thank you so much for your help!
I tried to fit a linear regression using the two datasets, but because one is larger than the other it did not work, and the following error appeared:
Error in model.frame.default(formula = lobby_expenditure$expend ~ compustat$lct, :
variable lengths differ (found for 'compustat$lct')
Here is a simple example; y comes from d2, and a random sample of rows from d1 is selected for x:
d1=data.frame(x=rnorm(100))
d2=data.frame(y=rnorm(10))
lm(d2$y~d1[sample(1:nrow(d1),nrow(d2)),"x"])
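A reproducible variant of the example above can be sketched as follows. It assumes the same simulated data frames d1 and d2. The seed value and the names idx, d and fit are illustrative choices, not part of the original answer.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

d1 <- data.frame(x = rnorm(100))   # the larger dataset
d2 <- data.frame(y = rnorm(10))    # the smaller dataset

# Draw as many row indices from d1 as d2 has rows (without replacement)
idx <- sample(nrow(d1), nrow(d2))

# Put both variables in one data frame so their lengths match
d <- data.frame(y = d2$y, x = d1$x[idx])

fit <- lm(y ~ x, data = d)
coef(fit)
```

Storing the sampled indices in idx makes it possible to check afterwards which rows of the larger dataset were used.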
To draw a random sample of rows, use dplyr::sample_n.
Example dataset:
library(readr)  # provides read_table
library(dplyr)  # provides sample_n
df2 <- read_table('Individual Site
1 A
2 B
3 A
4 C
5 C
6 B
7 A
8 B
9 C')
With sample_n(df2, 2), where 2 is the number of rows you want, you get random rows. Your output may differ from the following, since the sample is random.
#A tibble: 2 x 2
Individual Site
<dbl> <chr>
1 4 C
2 5 C

getting an error when creating a lagged variable in R

Hi, I created a lagged variable following the answer to a related question. It says that to create a lagged variable I need to use:
library(data.table)
data = data[, lag.value:=c(NA, value[-.N]), by=groups]
Or, alternatively:
data = data[, lag.value := shift(value, 1L), keyby = groups]
This is what I took from the answers to another related question. (I might not be entirely right about the second method, because it is a bit complicated there, so please correct me if it's wrong.)
In any case, whichever of these methods I use, I get an error:
Error in `[.data.frame`(data, , `:=`(lag.value, c(NA, :
unused argument (by = groups)
Could you please explain what I'm doing wrong here and what I should do to avoid the error?
data:
time value groups
1 3 a
2 3 a
3 4 a
4 4 a
1 1 b
2 2 b
3 5 b
4 5 b
and the variable I want to create is lag.value which is value lagged by 1 within groups
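The error message names `[.data.frame`, which indicates that data is being subset as a plain data.frame rather than a data.table, so by is not a recognised argument there. A minimal sketch, assuming that this is the cause and using the example data above, converts the object with setDT() first:

```r
library(data.table)

# Example data from the question, built as an ordinary data.frame
data <- data.frame(time   = rep(1:4, times = 2),
                   value  = c(3, 3, 4, 4, 1, 2, 5, 5),
                   groups = rep(c("a", "b"), each = 4))

# Convert to a data.table by reference; `:=` and `by` only work on data.tables
setDT(data)

# Lag value by one row within each group; := modifies data in place,
# so no reassignment is needed
data[, lag.value := shift(value, 1L), by = groups]

data
```

as.data.table(data) would also work if a copy is preferred over conversion in place.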

R error: Error in `row.names<-.data.frame`(`*tmp*`, value = value)

I just made up a data set to test the function "mlogit", which stands for "multinomial logistic regression model".
The data is simply:
head(dat)
y x1 x2 x3
1 4 1 18 4
2 5 1 20 5
3 2 1 25 3
4 3 0 26 6
5 4 0 26 8
6 3 1 27 4
Then when I type
fit <- mlogit(y ~ x1 + x2 + x3, data=dat)
the following message appears:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
Does anyone know why or how to solve it?
The help states:
The ‘data’ argument may be an ordinary ‘data.frame’. In this case,
some supplementary arguments should be provided and are passed to
‘mlogit.data’.
You have not given any supplementary arguments. Note that I consider this poor documentation because it does not state which supplementary arguments should be provided.
From the examples, it seems that "shape" and "choice" should at least be set:
# a data.frame in wide format with two missing prices
Fishing2 <- Fishing
Fishing2[1, "price.pier"] <- Fishing2[3, "price.beach"] <- NA
mlogit(mode~price+catch|income, Fishing2, shape="wide", choice="mode", varying = 2:9)
# a data.frame in long format with three missing lines
data("TravelMode", package = "AER")
Tr2 <- TravelMode[-c(2, 7, 9),]
mlogit(choice~wait+gcost|income+size, Tr2, shape = "long",
chid.var = "individual", alt.var="mode", choice = "choice")
By the way, welcome to Stack Overflow! Here are some tips on writing a better question and thus increasing the chance of a good answer.
You should state the package from which your command comes. I'm assuming it is from the mlogit package, but a command named mlogit may exist in more than one package.
You should give a minimal example. You give the output of the head command, but it's not clear whether the error can be reproduced with that. library(mlogit) should also be included in your minimal example.
You should read the help for the command. Help files can be intimidating and very technical, but you don't have to understand everything in them. In your example, I'm guessing that "some supplementary arguments should be provided" would have jumped out at you. If you're not sure how to access help for the command mlogit, you can use ?mlogit or help(mlogit).

covariance matrix from a community list with grouping factors

I am still learning to use data.table (from the data.table package) and even after looking for help on the web and the help files, I am still struggling to do what I want.
I have a large data table with over 60 columns (the first three corresponding to factors and the remaining to response variables, in this case different species) and several rows corresponding to the different levels of the treatments and the species abundances. A very small version looks like this:
library(data.table)
TEST <- data.table(Time = c("0", "0", "0", "7", "7", "7", "12"),
                   Zone = c("1", "1", "0", "1", "0", "0", "1"),
                   quadrat = c(1, 2, 3, 1, 2, 3, 1),
                   Sp1 = c(0, 4, 29, 9, 1, 2, 10),
                   Sp2 = c(20, 17, 11, 15, 32, 15, 10),
                   Sp3 = c(1, 0, 1, 1, 1, 1, 0))
setkey(TEST, Time)
TEST
Time Zone quadrat Sp1 Sp2 Sp3
1: 0 1 1 0 20 1
2: 0 1 2 4 17 0
3: 0 0 3 29 11 1
4: 12 1 1 10 10 0
5: 7 1 1 9 15 1
6: 7 0 2 1 32 1
7: 7 0 3 2 15 1
I need to calculate the sum of the covariances for each Zone x quadrat group. If I only had the species list for a given Zone x quadrat combination, then I could use the cov() function but using cov() in the same way that I would use mean() or sum() in
Abundance = TEST[,lapply(.SD,mean),by="Zone,quadrat"]
does not work as I get the following error message:
Error in cov(value) : supply both 'x' and 'y' or a matrix-like 'x'
I understand why but I cannot figure out how to solve this.
What I exactly want is to be able to get, for each Zone x quadrat combination, the covariance matrix of all the species across all the sampling Time points. From each matrix, I then need to calculate the sum of the covariances of all pairs of species, so that then I can have a sum of covariance for each Zone x quadrat combination.
Any help would be greatly appreciated, Thanks.
From the help provided above by @Frank and some additional searching that I did around the use of the upper.tri function, the following code works:
Cov= TEST[,sum(cov(.SD)[upper.tri(cov(.SD), diag = FALSE)]), by='Zone,quadrat', .SDcols=paste('Sp',1:3,sep='')]
The initially proposed version, in which upper.tri() did not appear inside [ ], only extracted logical values from the covariance matrix. Setting diag = FALSE allowed me to exclude the diagonal values before summing the upper triangle of the matrix. In my case, I didn't care whether it was the upper or lower triangle, but I'm sure that using lower.tri() would work equally well.
I hope this helps other users who might encounter a similar issue.
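To illustrate what the expression inside the data.table call computes, here is a base R sketch for a single group, the Zone 1, quadrat 1 rows of TEST. The object names g, cm, pair_sum and check are illustrative only.

```r
# Species abundances for Zone 1, quadrat 1 at Time 0, 7 and 12 (from TEST)
g <- data.frame(Sp1 = c(0, 9, 10),
                Sp2 = c(20, 15, 10),
                Sp3 = c(1, 1, 0))

# 3 x 3 covariance matrix of the species across the sampling times
cm <- cov(g)

# Sum of the covariances of all distinct species pairs:
# the upper triangle without the diagonal (the variances)
pair_sum <- sum(cm[upper.tri(cm, diag = FALSE)])

# Cross-check: the matrix is symmetric, so the same value is
# (sum of all entries minus the diagonal) divided by 2
check <- (sum(cm) - sum(diag(cm))) / 2

pair_sum
```

Groups with a single row (for example Zone 1, quadrat 2 in the small TEST table) have no defined covariance, so cov() returns NA for them.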

ChoiceModelR - Hierarchical Bayes Multinomial Logit Model

I hope that some of you are a bit experienced with the R package ChoiceModelR by Sermas and Colias, to estimate a Hierarchical Bayes Multinomial Logit Model. Actually, I am quite a newbie on both R and Hierarchical Bayes. However, I tried to get some estimates by using the script provided by Sermas and Colias in the help file. I have a data set in the same structure as they use (ID, choice set, alternative, independent variables, and choice variable). I have four independent variables all of them binary coded as categorical variables, none of them restricted. I have eight choice sets with three alternatives within each set as well as one no-choice-option as fourth alternative. I tried the following script:
library (ChoiceModelR)
data <- read.delim("Z:/KLU/CSR/CBC/mp3_vio.txt")
xcoding=c(0,0,0,0)
mcmc = list(R = 10, use = 10)
options = list(none=FALSE, save=TRUE, keep=1)
attlevels=c(2,2,2,2)
c1=matrix(c(0,0,0,0),2,2)
c2=matrix(c(0,0,0,0),2,2)
c3=matrix(c(0,0,0,0),2,2)
c4=matrix(c(0,0,0,0),2,2)
constraints = list(c1, c2, c3, c4)
out = choicemodelr(data, xcoding, mcmc = mcmc, options = options, constraints = constraints)
and have got the following error message:
Error in 1:nalts[i] : result would be too long a vector
In addition: There were 50 or more warnings (use warnings() to see the first 50). The mentioned warnings are of the following:
In max(temp[temp[, 2] == j, 3]) : no non-missing arguments to max; returning -Inf
In max(temp[temp[, 2] == j, 3]) : no non-missing arguments to max; returning -Inf
Actually, I have no idea what went wrong, as I used the same data structure, even though I have more independent variables, more choice sets, and more alternatives within a choice set. It would be fantastic if anybody could shed some light on this.
I know that this may not be helpful since you posted so long ago, but if it comes up again in the future, this could prove useful.
One of the most common reasons for this error (in my experience) has been that either the scenario variable or the alternative variable is not in ascending order within your data.
id scenario alt x1 ... y
1 1 1 4 1
1 1 2 1 0
1 3 1 4 2
1 3 2 5 0
2 1 4 3 1
2 1 5 1 0
2 2 1 4 2
2 2 2 3 0
This dataset will give you errors since the scenario and alternative variables must be ascending, and they must not skip any values. Just to fully reiterate what I mean, the scenario and alt variables must be reordered as follows in order to work:
id scenario alt x1 ... y
1 1 1 4 1
1 1 2 1 0
1 2 1 4 2
1 2 2 5 0
2 1 1 3 1
2 1 2 1 0
2 2 1 4 2
2 2 2 3 0
I work with ChoiceModelR quite frequently, and this is what has caused these errors for me in the past. If you have a github account, you can also post your data (or modified data) there if you end up wanting to have other users take a look.
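The reordering described above can be sketched in base R. This is an illustration under assumptions, not part of ChoiceModelR: it renumbers scenario within each id, and alt within each id and scenario, so that both run 1, 2, ... without gaps. That is only appropriate if the original labels carry no meaning beyond their order. The data frame d reproduces the id, scenario and alt columns of the first table above.

```r
# id, scenario and alt columns from the problematic example table
d <- data.frame(id       = c(1, 1, 1, 1, 2, 2, 2, 2),
                scenario = c(1, 1, 3, 3, 1, 1, 2, 2),
                alt      = c(1, 2, 1, 2, 4, 5, 1, 2))

# Renumber scenarios consecutively (1, 2, ...) within each respondent
d$scenario <- ave(d$scenario, d$id,
                  FUN = function(s) match(s, sort(unique(s))))

# Renumber alternatives consecutively within each respondent and scenario
d$alt <- ave(d$alt, d$id, d$scenario,
             FUN = function(a) match(a, sort(unique(a))))

# Sort so that id, scenario and alt are all in ascending order
d <- d[order(d$id, d$scenario, d$alt), ]
d
```

After this step the scenario and alt columns match the corrected table shown above.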
