How to extract or predict latent class membership in gmnl? - r

Let's say you run the example for a latent class model from ?gmnl:
library(mlogit)
library(gmnl)
## Examples using the Electricity data set from the mlogit package
data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
varying = 3:26, shape = "wide", sep = "")
## Estimate a LC model with 2 classes
Elec.lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
data = Electr,
subset = 1:3000,
model = 'lc',
panel = TRUE,
Q = 2)
summary(Elec.lc)
You get a fitted model with coefficient estimates for two classes (class 1 & 2). Is there a way to extract (or predict) for each observation, what the most likely class is that this observation belongs to?
After several helpful comments and lots of digging, it seems that there is an undocumented feature that allows you to get predicted class probabilities, which are stored in Wnq. You get one entry per observation and the number of columns matches the number of latent classes (Q = 2 from above), and entries sum to 1.
## Get class probabilities
head(Elec.lc$Wnq)
init
[1,] 0.5547805 0.4452195
[2,] 0.5547805 0.4452195
[3,] 0.5547805 0.4452195
[4,] 0.5547805 0.4452195
[5,] 0.5547805 0.4452195
[6,] 0.5547805 0.4452195

The fitted model contains a matrix called prob.alt which gives the probability of each choice, so you can do:
predictions <- apply(Elec.cor$prob.alt,1, which.max)
predictions
#> [1] 1 1 2 3 1 4 4 3 3 3 2 1 2 2 3 1 1 1 2 3 4 4 4 1 1 4 1 1 4 4 4 2 4 3 1 2 4
#> [38] 4 4 1 1 4 1 1 4 4 4 2 1 1 2 3 4 4 4 2 4 3 4 2 1 4 2 2 2 2 4 2 1 3 4 3 4 4
#> [75] 4 1 4 2 3 2 2 1 3 3 4 3 4 1 1 4 2 1 4 4 2 2 2 2 2 2 1 4 2 2 2 2 1 2 2 4 3
#> [112] 1 1 1 2 3 4 4 4 2 4 3 4 1 1 4 2 1 4 4 2 2 1 4 2 2 2 2 1 2 1 2 4 3 2 2 2 2
#> [149] 1 4 2 2 2 1 2 1 4 3 2 2 2 1 2 1 1 4 2 1 4 2 2 2 2 1 2 1 1 4 3 2 2 2 2 1 4
#> [186] 2 2 2 2 4 2 1 4 3 2 2 2 2 2 1 1 4 2 1 4 4 3 2 2 4 4 1 3 4 1 2 4 3 1 1 1 2
#> [223] 3 4 4 4 1 2 4 2 3 4 4 1 3 4 2 3 3 2 4 1 1 4 4 4 2 1 3 1 2 1 1 2 3 1 4 4 2
#> [260] 4 3 2 1 2 4 2 3 3 4 1 3 4 2 3 3 4 4 4 4 4 1 3 2 3 1 3 3 1 4 2 1 4 4 2 2 1
#> [297] 3 1 1 4 2 4 1 2 4 1 1 4 4 4 2 1 1 2 3 4 4 4 2 4 3 4 1 1 1 2 3 1 4 4 3 4 3
#> [334] 2 1 1 4 1 1 4 4 2 2 1 3 1 3 1 4 2 2 2 2 1 2 1 3 4 3 2 2 2 2 1 4 3 2 2 2 1
#> [371] 2 4 4 1 3 4 2 3 3 2 1 3 3 3 3 4 1 1 4 1 1 4 4 2 2 2 4 2 3 4 4 4 1 4 2 3 2
#> [408] 1 4 3 2 2 2 1 2 1 1 4 3 1 1 2 3 4 4 4 3 3 3 2 1 2 4 3 4 4 4 3 4 3 4 3 4 1
#> [445] 1 4 1 1 4 4 4 2 1 4 2 2 2 2 1 2 1 3 4 3 1 4 2 2 2 2 1 2 4 2 4 3 3 3 4 1 1
#> [482] 4 2 1 4 4 2 2 2 2 3 1 1 1 2 3 4 4 4 2 2 4 2 3 4 4 4 3 4 2 3 2 2 4 2 3 4 4
#> [519] 1 1 4 2 3 2 2 4 1 1 4 4 4 2 2 3 1 3 2 1 2 2 1 4 4 2 2 2 4 2 1 4 3 2 2 2 4
#> [556] 2 1 1 4 2 1 4 2 2 2 2 1 2 1 2 4 3 1 1 2 3 4 4 4 2 4 3 4 2 4 4 4 3 4 2 3 3
#> [593] 3 1 3 3 1 1 2 3 1 4 4 3 4 3 2 1 2 2 2 2 1 4 3 2 2 2 2 2 2 4 2 3 3 4 1 3 4
#> [630] 2 3 3 2 3 1 1 4 4 4 2 2 3 1 3 1 1 2 3 1 4 4 3 3 3 4 1 4 4 4 3 4 1 4 3 1 1
#> [667] 3 3 2 2 3 1 1 1 2 3 1 4 4 2 1 4 2 2 2 2 1 2 1 1 4 2 1 1 2 3 4 4 4 2 4 3 4
#> [704] 1 2 2 2 2 1 4 2 2 2 2 4 2 2 2 2 2 1 4 3 2 2 2 4 2 1 4 2 2 2 2 4 2 1 3 4 3
#> [741] 1 4 3 2 2 2 2 2 1 1
If we compare these predictions to the actual choice, we see that the prediction is correct about 50% of the time (the values in the diagonal are correct):
table(predictions, Electricity$choice[1:750])
#>
#> predictions 1 2 3 4
#> 1 78 35 28 32
#> 2 40 129 40 33
#> 3 16 27 57 24
#> 4 27 36 38 110
Created on 2022-08-06 by the reprex package (v2.0.1)

I have a feeling that this object Wnq is not class membership probabilities though.
Even in your example above, when calling Elec.lc$Wnq, you seem to have obtained a list of probabilities of class membership for your individuals, but critically they are all equal across individuals.
When looking for this I also found myself with the same problem. I think Elec.lc$Wnq is just the mean of class membership probabilities.
I have not looked throughly in the gmnl code, but I think the object Qir is what you should look for ?

Related

Appending into data frame with a for loop [duplicate]

This question already has answers here:
Return a data frame from function
(2 answers)
Closed 6 years ago.
I want to read some file then removes the NA values from those read and then give the number of observation that left after removing the NAs
i have wrote this script but the result was something so weird
complete <- function(directory, id){
fileList <- list.files(directory, full.names = TRUE)[id]
datafamelist <- data.frame(id = numeric(), nobs = numeric())
for(Rfile in fileList){
cleandata <- na.omit(read.csv(file = Rfile))
datafamelist <- rbind(datafamelist, c(cleandata$ID, nrow(cleandata)))
}
datafamelist
}
and the result was something like that :
complete("~/Desktop/DataSets/specdata", 1:5)
X1L X1L.1 X1L.2 X1L.3 X1L.4 X1L.5 X1L.6 X1L.7 X1L.8 X1L.9 X1L.10 X1L.11 X1L.12 X1L.13 X1L.14 X1L.15 X1L.16 X1L.17 X1L.18 X1L.19
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.20 X1L.21 X1L.22 X1L.23 X1L.24 X1L.25 X1L.26 X1L.27 X1L.28 X1L.29 X1L.30 X1L.31 X1L.32 X1L.33 X1L.34 X1L.35 X1L.36 X1L.37
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.38 X1L.39 X1L.40 X1L.41 X1L.42 X1L.43 X1L.44 X1L.45 X1L.46 X1L.47 X1L.48 X1L.49 X1L.50 X1L.51 X1L.52 X1L.53 X1L.54 X1L.55
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.56 X1L.57 X1L.58 X1L.59 X1L.60 X1L.61 X1L.62 X1L.63 X1L.64 X1L.65 X1L.66 X1L.67 X1L.68 X1L.69 X1L.70 X1L.71 X1L.72 X1L.73
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.74 X1L.75 X1L.76 X1L.77 X1L.78 X1L.79 X1L.80 X1L.81 X1L.82 X1L.83 X1L.84 X1L.85 X1L.86 X1L.87 X1L.88 X1L.89 X1L.90 X1L.91
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.92 X1L.93 X1L.94 X1L.95 X1L.96 X1L.97 X1L.98 X1L.99 X1L.100 X1L.101 X1L.102 X1L.103 X1L.104 X1L.105 X1L.106 X1L.107 X1L.108
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
X1L.109 X1L.110 X1L.111 X1L.112 X1L.113 X1L.114 X1L.115 X1L.116 X117L
1 1 1 1 1 1 1 1 1 117
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5
instead of being like this :
## id nobs
## 1 1 117
## 2 2 000
## 3 3 000
## 4 4 000
## 5 5 000
where the 000 is the number of observed values that supposed to be there
Try to read and form your dataframe like this
setwd("<Your Directory>")
file_list <- list.files()
for (file in file_list){
# if the merged dataset doesn't exist, create it
if (!exists("rawdata")){
rawdata <- read.csv(file)
}
# if the merged dataset does exist, append to it
if (exists("rawdata")){
temp_dataset <- read.csv(file)
rawdata<-rbind(rawdata, temp_dataset)
rm(temp_dataset)
}
}
For NAs, you can check which column contain NA and work according
to check NA, use summary

R predict.glm when newdata has fewer levels

I attempted to prove to myself that predict() will not give incorrect predictions, when labels and levels (the underlying integer for the factor level) of newdata do not match that of the train data.
I think I did prove that, and I'm sharing that code below, but I'd just like to ask what exactly R is doing when predicting for newdata. I know it is not appending newdata to training data, does it translate the factor labels of newdata into the corresponding representation of train data before predicting?
options(stringsAsFactors = TRUE)
dat <- data.frame(x = rep(c("cat", "dog", "bird", "horse"), 100), y = rgamma(100, shape=3, scale = 300))
model <- glm(y~., family = Gamma(link = "log"), data = dat)
coefficients(model)
# (Intercept) xcat xdog xhorse
# 6.5816536 0.2924488 0.3586094 0.2740487
newdata1 <- data.frame(x = "cat")
newdata2 <- data.frame(x = "bird")
newdata3 <- data.frame(x = "dog")
predict.glm(object = model, newdata = newdata1, type = "response")
# 1
# 966.907
exp(6.5816536 + 0.2924488) #intercept + cat coef
# [1] 966.9071
predict.glm(object = model, newdata = newdata2, type = "response")
# 1
# 721.7318
exp(6.5816536)
# [1] 721.7318
predict.glm(object = model, newdata = newdata3, type = "response")
# 1
# 1033.042
exp(6.5816536 + 0.3586094)
# [1] 1033.042
unclass(dat$x)
# [1] 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3
# [87] 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4
# [173] 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3
# [259] 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4
# [345] 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4 2 3 1 4
# attr(,"levels")
# [1] "bird" "cat" "dog" "horse"
unclass(newdata1$x)
# [1] 1
# attr(,"levels")
# [1] "cat"
unclass(newdata2$x)
# [1] 1
# attr(,"levels")
# [1] "bird"
Model object has an xlevels recording factor levels used for model estimation. For your example, we have:
model$xlevels
#$x
#[1] "bird" "cat" "dog" "horse"
When your new data is presented in prediction, factor levels will be matched. For example, your newdata1 will be matched to "cat" levels, and this is the second level in xlevels. Thus, predict will have no difficulty finding the correct coefficients for that level.

How do I add a vector where I collapse scores from individuals within pairs?

I have done an experiment in which participants have solved a task in pairs, with another participant. Each participant has then received a score for how well they did the task. Pairs have gone through different amounts of trials.
I have a data frame similar to the one below:
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)
participant pair trial score
1 1 1 2
1 1 2 3
2 1 1 6
2 1 2 3
3 2 1 4
3 2 2 7
3 2 3 3
4 2 1 1
4 2 2 8
4 2 3 5
5 3 1 4
6 3 1 3
I would like to add a new vector to the data frame, where each participant gets the numeric difference between their own score and the other participant's score within each trial.
Does someone have an idea about how one might do that?
It should end up looking something like this:
participant pair trial score difference
1 1 1 2 4
1 1 2 3 0
2 1 1 6 4
2 1 2 3 0
3 2 1 4 3
3 2 2 7 1
3 2 3 3 2
4 2 1 1 3
4 2 2 8 1
4 2 3 5 2
5 3 1 4 1
6 3 1 3 1
Here's a solution that involves first reordering data such that each sequential pair of rows corresponds to a single pair within a single trial. This allows us to make a single call to diff() to extract the differences:
data <- data[order(data$trial,data$pair,data$participant),];
data$diff <- rep(diff(data$score)[c(T,F)],each=2L)*c(-1L,1L);
data;
## participant pair trial score diff
## 1 1 1 1 2 -4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 -3
## 11 5 3 1 4 1
## 12 6 3 1 3 -1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 -1
## 9 4 2 2 8 1
## 7 3 2 3 3 -2
## 10 4 2 3 5 2
I assumed you wanted the sign to capture the direction of the difference. So, for instance, if a participant has a score 4 points below the other participant in the same trial-pair, then I assumed you would want -4. If you want all-positive values, you can remove the multiplication by c(-1L,1L) and add a call to abs():
data$diff <- rep(abs(diff(data$score)[c(T,F)]),each=2L);
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 3
## 11 5 3 1 4 1
## 12 6 3 1 3 1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 1
## 9 4 2 2 8 1
## 7 3 2 3 3 2
## 10 4 2 3 5 2
Here's a solution built around ave() that doesn't require reordering the whole data.frame first:
data$diff <- ave(data$score,data$trial,data$pair,FUN=function(x) abs(diff(x)));
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 2 1 1 2 3 0
## 3 2 1 1 6 4
## 4 2 1 2 3 0
## 5 3 2 1 4 3
## 6 3 2 2 7 1
## 7 3 2 3 3 2
## 8 4 2 1 1 3
## 9 4 2 2 8 1
## 10 4 2 3 5 2
## 11 5 3 1 4 1
## 12 6 3 1 3 1
Here's how you can get the score of the other participant in the same trial-pair:
data$other <- ave(data$score,data$trial,data$pair,FUN=rev);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 2 1 1 2 3 3
## 3 2 1 1 6 2
## 4 2 1 2 3 3
## 5 3 2 1 4 1
## 6 3 2 2 7 8
## 7 3 2 3 3 5
## 8 4 2 1 1 4
## 9 4 2 2 8 7
## 10 4 2 3 5 3
## 11 5 3 1 4 3
## 12 6 3 1 3 4
Or, assuming the data.frame has been reordered as per the initial solution:
data$other <- c(rbind(data$score[c(F,T)],data$score[c(T,F)]));
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Alternative, using matrix() instead of rbind():
data$other <- c(matrix(data$score,2L)[2:1,]);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Here is an option using data.table:
library(data.table)
setDT(data)[,difference := abs(diff(score)), by = .(pair, trial)]
data
# participant pair trial score difference
# 1: 1 1 1 2 4
# 2: 1 1 2 3 0
# 3: 2 1 1 6 4
# 4: 2 1 2 3 0
# 5: 3 2 1 4 3
# 6: 3 2 2 7 1
# 7: 3 2 3 3 2
# 8: 4 2 1 1 3
# 9: 4 2 2 8 1
#10: 4 2 3 5 2
#11: 5 3 1 4 1
#12: 6 3 1 3 1
A slightly faster option would be:
setDT(data)[, difference := abs((score - shift(score))[2]) , by = .(pair, trial)]
If we need the value of the other pair:
data[, other:= rev(score) , by = .(pair, trial)]
data
# participant pair trial score difference other
# 1: 1 1 1 2 4 6
# 2: 1 1 2 3 0 3
# 3: 2 1 1 6 4 2
# 4: 2 1 2 3 0 3
# 5: 3 2 1 4 3 1
# 6: 3 2 2 7 1 8
# 7: 3 2 3 3 2 5
# 8: 4 2 1 1 3 4
# 9: 4 2 2 8 1 7
#10: 4 2 3 5 2 3
#11: 5 3 1 4 1 3
#12: 6 3 1 3 1 4
Or using dplyr:
library(dplyr)
data %>%
group_by(pair, trial) %>%
mutate(difference = abs(diff(score)))
# participant pair trial score difference
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 2 4
#2 1 1 2 3 0
#3 2 1 1 6 4
#4 2 1 2 3 0
#5 3 2 1 4 3
#6 3 2 2 7 1
#7 3 2 3 3 2
#8 4 2 1 1 3
#9 4 2 2 8 1
#10 4 2 3 5 2
#11 5 3 1 4 1
#12 6 3 1 3 1

Can I have different aggregation rules for different columns in acast?

Brain afunctional today: How do I tell acast to return different aggregations?
# the rows and columns have integer names
Rgames> foo
1 2
1 1 1
2 2 2
3 3 3
4 4 4
1 1 4
2 2 8
3 3 2
4 4 1
Rgames> mfoo<-melt(foo)
Rgames> mfoo
Var1 Var2 value
1 1 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 1 1 1
6 2 1 2
7 3 1 3
8 4 1 4
9 1 2 1
10 2 2 2
11 3 2 3
12 4 2 4
13 1 2 4
14 2 2 8
15 3 2 2
16 4 2 1
Rgames> acast(mfoo,Var1~Var2,function(x)x[1]-x[2])
1 2
1 0 -3
2 0 -6
3 0 1
4 0 3
# what I would like is the casting formula to return
1 2
1 1 -3
2 2 -6
3 3 1
4 4 3
With the caveat that this is a simple example. In the general case, there will be rows with unique names -- but never more than two rows with a given name, so my x[1]-x[2] won't ever fail.
Or should I just use this:
aggregate(foo[,2],by=list((foo[,1])),function(x)x[1]-x[2])

Episode count for each row

I'm sure this has been asked before but for the life of me I can't figure out what to search for!
I have the following data:
x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4
And I would like to output a running count that resets everytime either x or y changes value.
x y o
1 3 1
1 3 2
1 3 3
1 2 1
1 2 2
2 2 1
2 4 1
3 4 1
3 4 2
Try something like
df<-read.table(header=T,text="x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4")
cbind(df,o=sequence(rle(paste(df$x,df$y))$lengths))
> cbind(df,o=sequence(rle(paste(df$x,df$y))$lengths))
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2
After seeing #ttmaccer's I see my first attempt with ave was wrong and this is perhaps what is needed:
> dat$o <- ave(dat$y, list(dat$y, dat$x), FUN=seq )
# there was a warning but the answer is corect.
> dat
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2

Resources