R - Count duplicate values for each row

I'm working on a data frame for which I need to calculate Fleiss's Kappa for inter-rater agreement. I'm using the 'irr' package for that.
Besides that, I need to count, for each observation, how many raters are in agreement.
My data looks like this:
a b c
1 1 1 1
2 1 2 2
3 2 3 2
4 3 3 1
5 4 2 1
I'm expecting something like this, where count is the number of raters in agreement:
a b c count
1 1 1 1 3
2 1 2 2 2
3 2 3 2 2
4 3 3 1 2
5 4 2 1 0
Thanks a lot.

Alternative solution if your data is in a data frame called abc:
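as.numeric(apply(abc, 1, function(x) {
  ux <- unique(x)                  # distinct ratings in this row
  tab <- tabulate(match(x, ux))    # how many raters gave each distinct rating
  mode <- ux[tab == max(tab)]      # most frequent rating(s)
  # one clear mode: count the raters who gave it; a tie between modes: NA
  if (length(mode) == 1) length(which(x == mode)) else NA
}))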
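Running it on the example data gives:
[1] 3 2 2 2 NA
The last row is NA rather than the 0 in the expected output: all three ratings differ, so there is no single most common rating to count.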
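To get 0 instead of NA when no two raters agree, a minimal sketch is to take the size of the largest group of identical ratings in each row and report 0 when that size is 1. This is an alternative, not the answer above: it also behaves differently on ties, returning the size of a tied group (for example 2 for four raters split 2-2) instead of NA.

```r
# Example ratings from the question: one row per observation, one column per rater
abc <- data.frame(a = c(1, 1, 2, 3, 4),
                  b = c(1, 2, 3, 3, 2),
                  c = c(1, 2, 2, 1, 1))

# For each row, count the raters in the largest group giving the same rating;
# a largest group of size 1 means nobody agrees, so report 0
abc$count <- apply(abc, 1, function(x) {
  largest <- max(table(x))
  if (largest > 1) largest else 0
})
abc
```

With the example data this adds count = 3, 2, 2, 2, 0, matching the table in the question.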

Related

How to generate permutations in data.frame format in R?

I am preparing a questionnaire that asks which transport mode respondents would use under different conditions of travel time and cost.
There are three transport modes, two levels of travel time and three levels of travel cost, as below:
mode <- 1:3
time <- 1:2
cost <- 1:3
I would like to generate all combinations of travel time and cost for each transport mode, but I do not know how to do it easily in R.
Each questionnaire item shows all three modes together, each with its own time and cost, as in the example below. comb is the combination number of each set of modes.
comb mode time cost
1 1 1 1
1 2 1 1
1 3 1 1
2 1 1 1
2 2 2 1
2 3 2 1
3 1 2 1
3 2 1 2
3 3 1 1
4 1 1 3
4 2 2 3
4 3 1 1
5 1 1 2
5 2 2 1
5 3 1 3
6 1 1 1
6 2 1 1
6 3 1 1
7 1 1 1
7 2 1 1
7 3 1 1
8 1 1 1
8 2 1 1
8 3 1 1
... continues until all combinations are covered
I used expand.grid(), but it returns just 18 combinations of mode, time and cost (3*2*3) without taking permutations across the set of transport modes into account. I also tried several permutation functions, but they did not give the result I want. I would prefer the result as a data.frame with a grouping variable such as comb in the example.
Permute groups in a data.frame R
How to calculate permutations of group labels with R?
Any simple way to generate all combinations would be highly appreciated.
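One way to read the example is that each comb gives every one of the three modes its own (time, cost) level, independently of the others. Under that assumption there are 2 * 3 = 6 levels per mode and 6^3 = 216 combinations, and a sketch built from two expand.grid() calls can list them in the long comb/mode/time/cost format shown above. The names tc_levels, level_id, idx and long are illustrative; the row order differs from the question's example, and any combinations that should not appear in the questionnaire would still need to be filtered out afterwards.

```r
n_modes <- 3

# Every (time, cost) level a single mode can take: 2 * 3 = 6 rows
tc_levels <- expand.grid(time = 1:2, cost = 1:3)

# One row per questionnaire item: which of the 6 levels each mode gets (6^3 = 216 rows)
level_id <- expand.grid(rep(list(seq_len(nrow(tc_levels))), n_modes))

# Reshape to long format with n_modes consecutive rows per comb.
# t() makes as.vector() read level_id row by row: modes 1, 2, 3 of comb 1, then comb 2, ...
idx <- as.vector(t(as.matrix(level_id)))
long <- data.frame(
  comb = rep(seq_len(nrow(level_id)), each = n_modes),
  mode = rep(seq_len(n_modes), times = nrow(level_id)),
  tc_levels[idx, ],
  row.names = NULL
)
head(long, 6)
```

The first comb gives every mode time 1 and cost 1, as in the question's example; later combs vary mode 1 fastest.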

R merge matrices with function

I would like to merge two matrices of different lengths on their common row names using a function:
My first matrix (T) looks similar to this:
1 2 3 4
1 -4 3 2 2
1 2 1 1 5
2 3 -2 4 6
2 -2 1 -1 -9
Now I want to fill a new matrix (M) using a function: for each row name and column, M should contain the number of values >= 0 in the matching rows of T, plus 1:
1 2 3 4
1 2 3 3 3
2 2 2 2 2
I tried the following code, which I found here on the forum, but it does not work:
merge.default(as.data.frame(M), as.data.frame(T), by = "row.names", function(x){colSums(T[,]>0)+1})
Do you have an idea where my mistake is?
Thank you very much
EDIT: my desired output is matrix M, which is currently empty:
M now:
1 2 3 4
1
2
M after the merge, filled using the function:
colSums(T[,] >= 0) + 1
1 2 3 4
1 2 3 3 3
2 2 2 2 2
M[1,1] = 2, as column 1 of the rows of T named 1 (-4 and 2) contains one value >= 0, and then I add 1 to it.
M[1,2] = 3: two values >= 0 (3 and 1), plus 1.
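A merge is not really needed here: the desired M is a column-wise count grouped by row name, which base R's rowsum() computes directly. A minimal sketch, assuming the data matrix is stored as tmat (renamed from T, which otherwise masks R's shorthand for TRUE) and that the row names are the grouping key. It uses >= 0 as the question describes; note that the attempted code used > 0.

```r
# Data matrix from the question; duplicated row names define the groups
tmat <- matrix(c(-4,  3,  2,  2,
                  2,  1,  1,  5,
                  3, -2,  4,  6,
                 -2,  1, -1, -9),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("1", "1", "2", "2"), as.character(1:4)))

# (tmat >= 0) * 1 turns the logical test into 0/1 numbers, because rowsum()
# needs numeric input; rowsum() then adds up the rows that share a row name
M <- rowsum((tmat >= 0) * 1, group = rownames(tmat)) + 1
M
```

This reproduces the desired rows 2 3 3 3 and 2 2 2 2.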

Proportion of dataset equal to a value

I have the following dataset called asteroids:
3 4 3 3 1 4 1 3 2 3
1 1 4 2 3 3 2 6 1 1
3 3 2 2 2 2 1 3 2 1
6 1 3 2 2 1 2 2 4 2
I need to find out what proportion of the values in this dataset equal 1.
If you have a specific value in mind you can just do an equality comparison and then use mean on the resulting logical vector.
> asteroids <- scan(what=numeric())
1: 3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2
41:
Read 40 items
> mean(asteroids == 1)
[1] 0.25
This works because the equality comparison gives TRUE and FALSE values, which become 1s and 0s when coerced to numbers, so mean returns the proportion of TRUEs.
I assumed asteroids was a vector. You don't say in your question, but if it's a different kind of structure you'll probably need to coerce it into a vector first.
Assuming that 'asteroids' is a data.frame, unlist it, get the table and find the proportion with prop.table.
prop.table(table(unlist(asteroids)==1))
# FALSE TRUE
# 0.75 0.25
Or, as @Richard Scriven mentioned, we can compare the data.frame directly to get a logical matrix and use table on it, since a 'matrix' is a vector with a dim attribute.
prop.table(table(asteroids == 1))
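The mean() approach from the first answer also works on a data frame without unlisting, because comparing a data.frame with a single value returns a logical matrix. A short sketch, assuming asteroids is read as a data frame of the four rows shown in the question:

```r
asteroids <- read.table(text = "
3 4 3 3 1 4 1 3 2 3
1 1 4 2 3 3 2 6 1 1
3 3 2 2 2 2 1 3 2 1
6 1 3 2 2 1 2 2 4 2
")

is_one <- asteroids == 1   # logical matrix with the same shape as the data
mean(is_one)               # proportion of TRUEs, i.e. of entries equal to 1
# [1] 0.25
```

If the data can contain missing values, mean(is_one, na.rm = TRUE) ignores them.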

Creating a fractional factorial design in R without prohibited pairs

I'm trying to write R code for a choice-based conjoint study.
I can create a factorial design using AlgDesign or conjoint; however, some combinations of attribute levels should never appear together.
Using an example from the web:
#Creating a full factorial design
library(AlgDesign)
ffd <- gen.factorial(c(2,2,4), varNames=c("Discount","Amount","Price"), factors="all")
ffd
Discount Amount Price
1 1 1 1
2 2 1 1
3 1 2 1
4 2 2 1
5 1 1 2
6 2 1 2
7 1 2 2
8 2 2 2
9 1 1 3
10 2 1 3
11 1 2 3
12 2 2 3
13 1 1 4
14 2 1 4
15 1 2 4
16 2 2 4
But what if "Discount" 2 ("no discount") should never be paired with "Amount" 1 ("20% discount")?
Is there a way to tell AlgDesign or conjoint or some other factorial design to remove any prohibited pairs from the design?
Any advice would be appreciated.
You could always generate ffd as you did there, and then remove the rows which meet your criteria, e.g. ffd$Discount == 2 & ffd$Amount == 1. The easy-ish way is to keep all the rows which do not meet the condition:
ffd <- ffd[ffd$Discount != 2 | ffd$Amount != 1, ]
Repeat for each condition you want to reject.
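With several prohibited pairs, one option is to list them in a small table and drop every design row that matches any of them, instead of repeating the subsetting by hand. A sketch under these assumptions: the design is rebuilt with base R's expand.grid(), which gives the same 16 level combinations as the gen.factorial() call above but with integer columns rather than factors, and the second prohibited pair (Amount 2 with Price 4) is hypothetical, added only to show more than one rule.

```r
ffd <- expand.grid(Discount = 1:2, Amount = 1:2, Price = 1:4)

# One prohibited pair per row; the second row is a made-up example
banned <- data.frame(var1 = c("Discount", "Amount"),
                     lev1 = c(2, 2),
                     var2 = c("Amount", "Price"),
                     lev2 = c(1, 4),
                     stringsAsFactors = FALSE)

keep <- rep(TRUE, nrow(ffd))
for (i in seq_len(nrow(banned))) {
  # rows where both attributes take the prohibited levels
  hit <- ffd[[banned$var1[i]]] == banned$lev1[i] &
         ffd[[banned$var2[i]]] == banned$lev2[i]
  keep <- keep & !hit
}
ffd_ok <- ffd[keep, ]
nrow(ffd_ok)
```

Here 4 rows break the first rule and 2 break the second, leaving 10. For a fractional design, the filtered ffd_ok could then be used as the candidate set from which runs are selected.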

paste values within categories defined by multiple columns

I want to pivot the result column in df horizontally, creating a data set with a separate row for each region, state, county combination, where the columns are ordered by year and then city.
I also want to identify each row in the new data set by region, state and county, and remove the white space between the four result columns. The code below does all of that, but I suspect it is not very efficient.
Is there a way to do this with reshape2 without creating a unique identifier for each group and numbering the observations within each group? Is there a way to use apply in place of the for-loop to remove white space from a matrix? (Here "matrix" means a block of columns, not the mathematical or programming construct.) I realize those are two separate questions and maybe I should post each one separately.
Given that I can already achieve the desired result and am only looking to improve the code, I do not know whether I should even post this, but I am hoping to learn. Thanks for any advice.
df <- read.table(text= "
region state county city year result
1 1 1 1 1 1
1 1 1 2 1 2
1 1 1 1 2 3
1 1 1 2 2 4
1 1 2 3 1 4
1 1 2 4 1 3
1 1 2 3 2 2
1 1 2 4 2 1
1 2 1 1 1 0
1 2 1 2 1 NA
1 2 1 1 2 0
1 2 1 2 2 0
1 2 2 3 1 2
1 2 2 4 1 2
1 2 2 3 2 2
1 2 2 4 2 2
2 1 1 1 1 9
2 1 1 2 1 9
2 1 1 1 2 8
2 1 1 2 2 8
2 1 2 3 1 1
2 1 2 4 1 0
2 1 2 3 2 1
2 1 2 4 2 0
2 2 1 1 1 2
2 2 1 2 1 4
2 2 1 1 2 6
2 2 1 2 2 8
2 2 2 3 1 3
2 2 2 4 1 3
2 2 2 3 2 2
2 2 2 4 2 2
", header=TRUE, na.strings=NA)
desired.result <- read.table(text= "
region state county results
1 1 1 1234
1 1 2 4321
1 2 1 0.00
1 2 2 2222
2 1 1 9988
2 1 2 1010
2 2 1 2468
2 2 2 3322
", header=TRUE, colClasses=c('numeric','numeric','numeric','character'))
# redefine variables for package reshape2 creating a unique id for each
# region, state, county combination and then number observations in
# each of those combinations
library(reshape2)
id.var <- df$region*100000 + df$state*1000 + df$county
obsnum <- sequence(rle(id.var)$lengths)
df2 <- dcast(df, region + state + county ~ obsnum, value.var = "result")
# remove spaces between columns of results matrix
# with a for-loop. How can I use apply to do this?
x <- df2[,4:(4+max(obsnum)-1)]
# use a dot to represent a missing observation
x[is.na(x)] = '.'
x.cat = character(nrow(x))
for(i in 1:nrow(x)) {
x.cat[i] = paste(x[i,], collapse="")
}
df3 <- cbind(df2[,1:3],x.cat)
colnames(df3) <- c("region", "state", "county", "results")
df3
df3 == desired.result
EDIT:
Matthew Lundberg's answer below is excellent. Afterwards I realized I also needed to create an output data set in which the four result columns above contain numeric, rational numbers and are separated by a space. So, I have posted a possible way to do that below, modifying Matthew's answer. I do not know whether this is accepted protocol, but the new scenario seems so closely related to the original post that I did not think I should post a new question.
I think this does what you want:
df$result <- as.character(df$result)
df$result[is.na(df$result)] <- '.'
aggregate(result ~ county+state+region, data=df, paste0, collapse='')
county state region result
1 1 1 1 1234
2 2 1 1 4321
3 1 2 1 0.00
4 2 2 1 2222
5 1 1 2 9988
6 2 1 2 1010
7 1 2 2 2468
8 2 2 2 3322
This relies on your data frame being sorted in the proper order (as yours is).
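If the input might not already be sorted, one option is to sort explicitly by group, then year, then city before aggregating, because aggregate() keeps the original row order within each group. A sketch using only the first eight rows of df (region 1, state 1), shuffled on purpose to mimic unsorted input; the names shuffled, sorted and out are illustrative.

```r
df <- read.table(text = "
region state county city year result
1 1 1 1 1 1
1 1 1 2 1 2
1 1 1 1 2 3
1 1 1 2 2 4
1 1 2 3 1 4
1 1 2 4 1 3
1 1 2 3 2 2
1 1 2 4 2 1
", header = TRUE)

# Scramble the rows to mimic unsorted input
shuffled <- df[c(8, 3, 5, 1, 7, 2, 6, 4), ]

# Sort by group, then year, then city so paste0 sees the values in the intended order
sorted <- shuffled[order(shuffled$region, shuffled$state, shuffled$county,
                         shuffled$year, shuffled$city), ]
sorted$result <- as.character(sorted$result)
sorted$result[is.na(sorted$result)] <- "."   # dot marks a missing observation

out <- aggregate(result ~ county + state + region, data = sorted,
                 FUN = paste0, collapse = "")
out
```

Despite the shuffled input, out$result is "1234" and "4321", as in desired.result.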
Matthew Lundberg's answer is excellent. Afterwards I realized I also needed to create an output data set in which the four result columns above contain numeric, rational numbers and are separated by a space. So, here I provide a possible way to do that using a modification of Matthew's answer. I do not know whether this is accepted protocol, but the new scenario seems so closely related to the original post that I did not think I should post a new question.
The first two lines are modifications of Matthew's answer.
df$result[is.na(df$result)] <- 'NA'
df2 <- aggregate(result ~ county+state+region, data=df, paste)
Then I specify that NA represents missing observations and use apply to obtain the numeric output.
df2$result[df2$result=='NA'] = NA
new.df <- data.frame(df2[,1:3], apply(df2$result,2,as.numeric))
The output is below. Note that, for this output, I added 0.5 to each value of df shown in the original post.
county state region X1 X2 X3 X4
1 1 1 1.5 2.5 3.5 4.5
2 1 1 4.5 3.5 2.5 1.5
1 2 1 0.5 NA 0.5 0.5
2 2 1 2.5 2.5 2.5 2.5
1 1 2 9.5 9.5 8.5 8.5
2 1 2 1.5 0.5 1.5 0.5
1 2 2 2.5 4.5 6.5 8.5
2 2 2 3.5 3.5 2.5 2.5
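It may also be possible to skip the round trip through the 'NA' string: when FUN returns each group's values unchanged, aggregate() stores them as a numeric matrix column, and na.action = na.pass keeps rows with missing results instead of dropping them. A sketch under these assumptions: every group has the same number of rows (four here), the data are already sorted by year then city within each group, and only eight rows of df (region 1, state 2) are used to keep the example short.

```r
df <- read.table(text = "
region state county city year result
1 2 1 1 1 0
1 2 1 2 1 NA
1 2 1 1 2 0
1 2 1 2 2 0
1 2 2 3 1 2
1 2 2 4 1 2
1 2 2 3 2 2
1 2 2 4 2 2
", header = TRUE)

# FUN returns each group's values unchanged; because every group has 4 values,
# aggregate() stores them as a 4-column matrix. na.pass keeps the NA row.
df2 <- aggregate(result ~ county + state + region, data = df,
                 FUN = function(v) v, na.action = na.pass)

# Spread the matrix column into ordinary numeric columns X1..X4
new.df <- data.frame(df2[, 1:3], df2$result)
new.df
```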
In my original post I asked how to remove spaces between columns in a data set using apply. That did not prove necessary thanks to Matthew Lundberg's answer to my larger question. Nevertheless, removing spaces between columns of a data set is something I frequently have to do. For completeness, here I post a way to do that using paste0 and apply that arose, in part, from Matthew's answer.
To remove the spaces between all columns of the data set x:
x <- read.table(text= "
A B C D
1 1 1 1
1 1 2 2
1 NA 1 3
1 1 2 4
1 2 1 5
1 2 NA 6
1 2 1 7
1 2 2 8
", header=TRUE, na.strings=NA)
# use a dot to represent a missing observation
x[is.na(x)] = '.'
y <- as.data.frame(apply(x, 1, function(i) paste0(i, collapse='')))
colnames(y) <- 'result'
y
Gives:
result
1 1111
2 1122
3 1.13
4 1124
5 1215
6 12.6
7 1217
8 1228
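Because apply() first converts the data frame to a character matrix (and, when numeric and character columns are mixed, numeric columns are passed through format(), which can pad multi-digit numbers with spaces), another option is to hand the columns straight to paste0() with do.call(), which pastes them element by element. A sketch on the same x:

```r
x <- read.table(text = "
A B C D
1 1 1 1
1 1 2 2
1 NA 1 3
1 1 2 4
1 2 1 5
1 2 NA 6
1 2 1 7
1 2 2 8
", header = TRUE)
x[is.na(x)] <- "."   # use a dot to represent a missing observation

# do.call() supplies each column as a separate argument, so paste0() joins
# A, B, C and D row by row without converting x to a matrix first
y <- data.frame(result = do.call(paste0, x), stringsAsFactors = FALSE)
y
```

The same idea works for a subset of columns, e.g. do.call(paste0, x[, 2:3]) for B and C only.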
The following code removes the spaces between just the second and third columns:
z <- as.data.frame(apply(x[,2:3], 1, function(i) paste0(i, collapse='')))
y <- data.frame(x[,1], z, x[,4])
colnames(y) <- c('A','BC','D')
y
Giving:
A BC D
1 1 11 1
2 1 12 2
3 1 .1 3
4 1 12 4
5 1 21 5
6 1 2. 6
7 1 21 7
8 1 22 8
