Creating a matrix with all combinations within a budget - r

I am attempting to create a matrix that contains all combinations of numbers within a range such that each row sums to a specific value. I am not sure whether there is a function for this or whether I need to write one myself. I have tried the combn function, but it does not constrain the sum, so the matrix gets large very quickly.
Example: rows of 3 numbers that sum to 5:
5,0,0
4,1,0
4,0,1
3,2,0
3,0,2
3,1,1
2,3,0
2,0,3
2,2,1
2,1,2
etc.

These combinatorial objects are called partitions (strictly speaking, ordered ones that allow zeros, as in your example, are weak compositions), and their computation is implemented by the partitions package.
Depending on what you really want, use one of the following:
library(partitions)
## The first argument says you want to enumerate all partitions in which the
## second argument (5) is broken into three summands, each of which can take a
## maximum value of 5.
blockparts(rep(5,3),5) ## Equiv: blockparts(c(5,5,5), 5)
#
# [1,] 5 4 3 2 1 0 4 3 2 1 0 3 2 1 0 2 1 0 1 0 0
# [2,] 0 1 2 3 4 5 0 1 2 3 4 0 1 2 3 0 1 2 0 1 0
# [3,] 0 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
restrictedparts(5,3)
#
# [1,] 5 4 3 3 2
# [2,] 0 1 2 1 2
# [3,] 0 0 0 1 1
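If you would rather not depend on the partitions package, the same rows can be generated directly with a short recursive function in base R, without first building every combination and filtering. This is a sketch that assumes all entries are non-negative integers bounded only by the total; weak_compositions is a hypothetical helper name, not a function from any package, and its row order differs from blockparts.

```r
# Hypothetical helper: all ways to write `total` as an ordered sum of
# `k` non-negative integers, one way per row.
weak_compositions <- function(total, k) {
  if (k == 1) return(matrix(total, nrow = 1))
  # Fix the first entry, then recurse on what is left for the other k - 1 entries
  do.call(rbind, lapply(total:0, function(first) {
    cbind(first, weak_compositions(total - first, k - 1), deparse.level = 0)
  }))
}

res <- weak_compositions(5, 3)
head(res)
```

Because only valid rows are ever built, the result has exactly 21 rows for this example instead of the 216 that a full grid over 0:5 would contain.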

Perhaps this does what you want:
x <- expand.grid(replicate(3, list(0:5)))
x[rowSums(x) == 5, ]
# Var1 Var2 Var3
# 6 5 0 0
# 11 4 1 0
# 16 3 2 0
# 21 2 3 0
# 26 1 4 0
# 31 0 5 0
# 41 4 0 1
# 46 3 1 1
# 51 2 2 1
# 56 1 3 1
# 61 0 4 1
# 76 3 0 2
# 81 2 1 2
# 86 1 2 2
# 91 0 3 2
# 111 2 0 3
# 116 1 1 3
# 121 0 2 3
# 146 1 0 4
# 151 0 1 4
# 181 0 0 5
expand.grid and combn are somewhat related, but I find expand.grid to be more applicable to these types of problems.
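To see why the filtering approach gets expensive: expand.grid builds all (n + 1)^k rows for values in 0:n, while only choose(n + k - 1, k - 1) of them (the stars-and-bars count) have the required sum. For n = 5 and k = 3 that is 216 rows generated versus 21 kept; for n = 10 and k = 6 it is already 1,771,561 versus 3,003. A quick check of the small case:

```r
n <- 5  # target row sum
k <- 3  # number of columns
grid <- expand.grid(replicate(k, list(0:n)))
kept <- grid[rowSums(grid) == n, ]

# Rows generated, rows kept, and the stars-and-bars count
c(generated = nrow(grid), kept = nrow(kept), formula = choose(n + k - 1, k - 1))
```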
There is also the permutations function from the "gtools" package:
library(gtools)
x <- permutations(6, 3, v = 0:5, set = FALSE, repeats.allowed=TRUE)
x[rowSums(x) == 5, ]

Fill entire vector based on a certain appearance of another vector

I have the following data:
players <- rep(1:3, each = 3)
trial <- rep(1:3, times = 3)  # trials 1 2 3 for each player
choice <- c(1, 0, 0, 0, 0, 0, 0, 1, 0)
gamematrix <- data.frame(cbind(players, trial, choice))
players trial choice
1 1 1 1
2 1 2 0
3 1 3 0
4 2 1 0
5 2 2 0
6 2 3 0
7 3 1 0
8 3 2 1
9 3 3 0
Now I want to create a new vector, win: every row of a participant who has at least one choice of "1" should get the value "3", and all other rows should get "0":
players trial choice win
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
In the simple example above, player "1" had a "1" in the first trial and player "3" had one in the second trial, so all of their rows get the value "3" in the new vector.
Any ideas how to do it? thanks!
A base R option using ave + ifelse
within(
gamematrix,
win <- ave(choice,players,FUN = function(x) ifelse(any(x==1),3,0))
)
giving
players trial choice win
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
Update
If your criterion depends on the first two values of choice (here: both of them must be 1), you can try
within(
gamematrix,
win <- ave(choice,players,FUN = function(x) ifelse(all(head(x,2)==1),3,0))
)
which gives
players trial choice win
1 1 1 1 0
2 1 2 0 0
3 1 3 0 0
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 0
8 3 2 1 0
9 3 3 0 0
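If the intended rule is instead that a player wins when any of their first two choices is 1 (the follow-up question quoted further down reads that way), swapping all() for any() should be enough. This sketch assumes the rows are already ordered by trial within each player, because head() simply takes the first two rows of each group.

```r
players <- rep(1:3, each = 3)
trial <- rep(1:3, times = 3)
choice <- c(1, 0, 0, 0, 0, 0, 0, 1, 0)
gamematrix <- data.frame(players, trial, choice)

# A player gets 3 if at least one of their first two choices is 1, else 0
gamematrix <- within(
  gamematrix,
  win <- ave(choice, players, FUN = function(x) ifelse(any(head(x, 2) == 1), 3, 0))
)
gamematrix$win
```

Here player 1 (choice 1 in trial 1) and player 3 (choice 1 in trial 2) both get 3, and player 2 gets 0.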
Try this dplyr approach:
library(dplyr)
# Flag every row of a player who made at least one choice of 1
gamematrix <- gamematrix %>% group_by(players) %>%
mutate(win = ifelse(any(choice == 1), 3, 0))
Output:
# A tibble: 9 x 4
# Groups: players [3]
players trial choice win
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
There is no reason for this data to be a data.frame. Keep it as a numeric matrix; then you can do it with vectorized functions only, by first finding the players who chose 1 and then flagging all of their rows:
m <- as.matrix(gamematrix)
won <- unique(m[m[, "choice"] == 1, "players"])
cbind(m, win = (m[, "players"] %in% won) * 3)
For your second case:
"I would like it to be only for those players who had "choice=1" in the first N (e.g., first 2 trials)"
won2 <- unique(m[m[, "choice"] == 1 & m[, "trial"] <= 2, "players"])
cbind(m, win = (m[, "players"] %in% won2) * 3)
Vectorized solutions are usually more performant than solutions incorporating a buried loop (e.g. ave).
An option with rowsum from base R
gamematrix$win <- with(gamematrix, 3 * players %in%
names(which(rowsum(choice, players)[,1] > 0)))
gamematrix$win
#[1] 3 3 3 0 0 0 3 3 3

Is there an R function similar to pandas.crosstab that generates joint frequency tables with named attributes?

I'd like to create frequency tables iteratively, both for single variables ([var1, Y] or [var2, Y]) and for joint variables ([var1, var2, Y]).
The R code below can only produce the single frequency table and the joint frequency table separately.
c1 <- ftable(variable[[1]], data1[,3])
# Fund
#
# b 21
# c 206
# d 1127
c1 <- ftable(variable[[3]], data1[,3])
# x.2 a b c d
# x.1
# b 0 9 4 8
# c 0 116 51 39
# d 5 542 291 289
#variable[[3]] is a joint variable of variable[[1]] and variable[[2]]
as.matrix(as.vector(t(c1)))
# [,1]
# [1,] 0
# [2,] 9
# [3,] 4
# [4,] 8
# [5,] 0
# [6,] 116
# [7,] 51
# [8,] 39
# [9,] 5
# [10,] 542
# [11,] 291
# [12,] 289
ftable(variable[[1]], variable[[2]], data1[,3])
# Fund
#
# b a 0
# b 9
# c 4
# d 8
# c a 0
# b 116
# c 51
# d 39
# d a 5
# b 542
# c 291
# d 289
Is there a way to generate frequency tables together but also keep the named attribute?
You can use addmargins to add margins (row and column sums) to a table.
For example:
data(mtcars)
addmargins(table(mtcars[c("cyl", "gear")]))
# gear
# cyl 3 4 5 Sum
# 4 1 8 2 11
# 6 2 4 1 7
# 8 12 0 2 14
# Sum 15 12 5 32
ftable(addmargins(table(mtcars[c("cyl", "gear", "carb")])))
# carb 1 2 3 4 6 8 Sum
# cyl gear
# 4 3 1 0 0 0 0 0 1
# 4 4 4 0 0 0 0 8
# 5 0 2 0 0 0 0 2
# Sum 5 6 0 0 0 0 11
# 6 3 2 0 0 0 0 0 2
# 4 0 0 0 4 0 0 4
# 5 0 0 0 0 1 0 1
# Sum 2 0 0 4 1 0 7
# 8 3 0 4 3 5 0 0 12
# 4 0 0 0 0 0 0 0
# 5 0 0 0 1 0 1 2
# Sum 0 4 3 6 0 1 14
# Sum 3 3 4 3 5 0 0 15
# 4 4 4 0 4 0 0 12
# 5 0 2 0 1 1 1 5
# Sum 7 10 3 10 1 1 32
I first use table to create the table, because addmargins expects the output of table, not of ftable. For the three-dimensional table, I then use ftable to print the result in a more readable layout.
Generating all possible tables
# Select columns interesting to use in table
dta <- mtcars[c("cyl", "vs", "am", "gear", "carb")]
# Generate all possible combinations of columns
combinations <- unlist(lapply(1:ncol(dta),
function(x) combn(1:ncol(dta), x, simplify = FALSE)), recursive = FALSE)
# For each combination calculate a table
tables <- lapply(combinations, function(cols) ftable(dta[cols]))
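Since the question asks for results that keep their names, one possible extension of the code above is to label each table by the variables it covers and to keep plain table objects, so that addmargins() can still be applied to any of them. This is a sketch, not a full pandas.crosstab equivalent; the "var1:var2" naming scheme is an arbitrary choice.

```r
# Columns to cross-tabulate
dta <- mtcars[c("cyl", "vs", "am", "gear", "carb")]

# Every non-empty subset of column names (single variables and joint ones)
combinations <- unlist(lapply(seq_len(ncol(dta)),
  function(k) combn(names(dta), k, simplify = FALSE)), recursive = FALSE)

# table() rather than ftable(), so addmargins() can still be used later
tables <- lapply(combinations, function(cols) table(dta[cols]))

# Name each table after the variables it uses, e.g. "cyl:gear"
names(tables) <- vapply(combinations, paste, character(1), collapse = ":")

addmargins(tables[["cyl:gear"]])
```

With five columns this gives 2^5 - 1 = 31 named tables; any of them can be printed with ftable() when a flat layout is easier to read.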

Sum rows in a group, starting when a specific value occurs

I want to accumulate the values of a column till the end of the group, though starting the addition when a specific value occurs in another column. I am only interested in the first instance of the specific value within a group. So if that value occurs again within the group, the addition column should continue to add the values. I know this sounds like a rather strange problem, so hopefully the example table makes sense.
The following data frame is what I have now:
> df = data.frame(group = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),occurs = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0))
> df
group numToAdd occurs
1 1 1 0
2 1 1 0
3 1 3 1
4 1 2 0
5 2 4 0
6 2 2 1
7 2 1 0
8 2 3 0
9 2 2 0
10 3 1 0
11 3 2 1
12 3 1 1
13 4 2 0
14 4 3 0
15 4 2 0
Thus, whenever a 1 occurs within a group, I want a cumulative sum of the values from the column numToAdd, until a new group starts. This would look like the following:
> finalDF = data.frame(group = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),occurs = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0),added = c(0,0,3,5,0,2,3,6,8,0,2,3,0,0,0))
> finalDF
group numToAdd occurs added
1 1 1 0 0
2 1 1 0 0
3 1 3 1 3
4 1 2 0 5
5 2 4 0 0
6 2 2 1 2
7 2 1 0 3
8 2 3 0 6
9 2 2 0 8
10 3 1 0 0
11 3 2 1 2
12 3 1 1 3
13 4 2 0 0
14 4 3 0 0
15 4 2 0 0
Thus, the added column is 0 until a 1 occurs within the group, then accumulates the values from numToAdd until it moves to a new group, turning the added column back to 0. In group three, a value of 1 is found a second time, yet the cumulative sum continues. Additionally, in group 4, a value of 1 is never found, so the value within the added column remains 0.
I've played around with dplyr, but can't get it to work. The following attempt only outputs the total sum, not the increasing cumulative sum at each row.
library(dplyr)
df =
df %>%
mutate(added=ifelse(occurs == 1,cumsum(numToAdd),0)) %>%
group_by(group)
Try
df %>%
group_by(group) %>%
mutate(added= cumsum(numToAdd*cummax(occurs)))
# group numToAdd occurs added
# 1 1 1 0 0
# 2 1 1 0 0
# 3 1 3 1 3
# 4 1 2 0 5
# 5 2 4 0 0
# 6 2 2 1 2
# 7 2 1 0 3
# 8 2 3 0 6
# 9 2 2 0 8
# 10 3 1 0 0
# 11 3 2 1 2
# 12 3 1 1 3
# 13 4 2 0 0
# 14 4 3 0 0
# 15 4 2 0 0
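The key step is cummax(occurs): within a group it is 0 until the first 1 and stays 1 from then on, so multiplying numToAdd by it zeroes out everything before the first occurrence, and cumsum() accumulates the rest. A base R sketch of the same idea uses ave(), which processes each group separately and keeps the original row order (this variant is not from the answers here):

```r
df <- data.frame(group    = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),
                 numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),
                 occurs   = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0))

# 0 before the first 1 in each group, 1 from that row onward
started <- ave(df$occurs, df$group, FUN = cummax)
# Running total of numToAdd within each group, counting only started rows
df$added <- ave(df$numToAdd * started, df$group, FUN = cumsum)
df$added
```

Unlike the split/unlist answers, this does not require the rows to be sorted by group.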
Or using data.table
library(data.table)#v1.9.5+
i1 <-setDT(df)[, .I[(rleid(occurs) + (occurs>0))>1], group]$V1
df[, added:=0][i1, added:=cumsum(numToAdd), by = group]
Or the same cummax idea as in the dplyr answer:
setDT(df)[,added := cumsum(numToAdd * cummax(occurs)) , by = group]
You can use split-apply-combine in base R with something like:
df$added <- unlist(lapply(split(df, df$group), function(x) {
y <- rep(0, nrow(x))
pos <- cumsum(x$occurs) > 0
y[pos] <- cumsum(x$numToAdd[pos])
y
}))
df
# group numToAdd occurs added
# 1 1 1 0 0
# 2 1 1 0 0
# 3 1 3 1 3
# 4 1 2 0 5
# 5 2 4 0 0
# 6 2 2 1 2
# 7 2 1 0 3
# 8 2 3 0 6
# 9 2 2 0 8
# 10 3 1 0 0
# 11 3 2 1 2
# 12 3 1 1 3
# 13 4 2 0 0
# 14 4 3 0 0
# 15 4 2 0 0
To add another base R approach:
df$added <- unlist(lapply(split(df, df$group), function(x) {
c(x[,'occurs'][cumsum(x[,'occurs']) == 0L],
cumsum(x[,'numToAdd'][cumsum(x[,'occurs']) != 0L]))
}))
# group numToAdd occurs added
# 1 1 1 0 0
# 2 1 1 0 0
# 3 1 3 1 3
# 4 1 2 0 5
# 5 2 4 0 0
# 6 2 2 1 2
# 7 2 1 0 3
# 8 2 3 0 6
# 9 2 2 0 8
# 10 3 1 0 0
# 11 3 2 1 2
# 12 3 1 1 3
# 13 4 2 0 0
# 14 4 3 0 0
# 15 4 2 0 0
Another base R:
df$added <- unlist(lapply(split(df,df$group),function(x){
cumsum((cumsum(x$occurs) > 0) * x$numToAdd)
}))

Keep the 2 highest values of each row in a list of matrices r

Suppose I have a list of matrices:
$`2010`
1 2 3 4
1 0 3 5 6
2 5 1 9 5
3 0 0 0 0
4 10 10 10 0
$`2011`
1 2 3 4
1 0 2 3 6
2 5 0 3 1
3 2 4 0 1
4 2 1 2 1
Code to create the matrices:
cntry<-c(1,2,3,4)
a<-c(0,5,0,10)
b<-c(3,1,0,10)
c<-c(5,9,0,10)
d<-c(6,5,0,0)
k<-data.frame(a,b,c,d)
k<-as.matrix(k)
dimnames(k)<-list(cntry,cntry)
e<-c(0,5,2,2)
f<-c(2,0,4,1)
g<-c(3,3,0,2)
h<-c(6,1,1,1)
l<-data.frame(e,f,g,h)
l<-as.matrix(l)
dimnames(l)<-list(cntry,cntry)
list<-list(k,l)
names(list)<-2010:2011
I want to keep the two highest values in each row and replace the remaining, smaller values in the same row with 0s.
If several cells tie for one of the two highest values, I want to keep all of them as they are (for example: 10 10 10 0 -> 10 10 10 0, 5 1 9 5 -> 5 0 9 5). All other cells of the row should be set to 0.
The results should look like this:
$`2010`
1 2 3 4
1 0 0 5 6
2 5 0 9 5
3 0 0 0 0
4 10 10 10 0
$`2011`
1 2 3 4
1 0 0 3 6
2 5 0 3 0
3 2 4 0 0
4 2 0 2 0
I'm not sure how to approach this problem, so any help is highly welcome!
Here's one approach:
lapply(list, function(x) {
t(apply(x, 1, function(y) {
y[!y %in% tail(sort(y), 2)] <- 0
y
}))
})
## $`2010`
## 1 2 3 4
## 1 0 0 5 6
## 2 5 0 9 5
## 3 0 0 0 0
## 4 10 10 10 0
##
## $`2011`
## 1 2 3 4
## 1 0 0 3 6
## 2 5 0 3 0
## 3 2 4 0 0
## 4 2 0 2 0
This works by iterating over elements of the list (with lapply), treating each in turn as the object x, and then iterating over the rows of that x (with apply(x, 1, ...)) calling the row y and applying a function to it.
The function applied to the row y of list element x is:
function(y) {
y[!y %in% tail(sort(y), 2)] <- 0
y
}
which identifies the two highest values of the row (tail(sort(y), 2)), builds a logical vector marking the elements of y that are not equal to either of them (with !y %in% ...), and assigns 0 to those elements. Finally, it returns the modified y. Because apply() returns the result for each row as a column, the outer t() transposes the matrix back to its original orientation.
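Tracing the row function on the second row of the 2010 matrix (5 1 9 5) shows why ties are kept: tail(sort(y), 2) returns 5 and 9, and %in% matches every cell equal to either value, so both 5s survive. The variable names below are only for illustration.

```r
y <- c(5, 1, 9, 5)             # row 2 of the 2010 matrix
top2 <- tail(sort(y), 2)       # the two largest values: 5 9
to_zero <- !y %in% top2        # TRUE only for cells equal to neither value
y[to_zero] <- 0
y
```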

calculate the modularity on each step

library(igraph)
g=graph.famous("Zachary")
c=edge.betweenness.community(g)
a=membership(c)
c$merges
b=community.to.membership(g,c$merges,steps=33)
[1] 1 1 2 1 3 3 3 1 4 5 3 1 1 1 4 4 3 1 4 1 4 1 4 4 2 2 4 2 2 4 4 2 4 4
[1] 9 4 0 0 0 0 5 0 3 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 6 0 0 0 0 7 0 0 8
I expect b to be the same as membership(c) because there are 33 steps. Why is b different from membership(c)?
This seems to be a bug in igraph because even the following does not work:
> library(igraph)
> g <- graph.full(3)
> c <- edge.betweenness.community(g)
> community.to.membership(g, c$merges, steps=1)$membership
This gives me a membership vector of 1 2 0, which is clearly wrong; the correct result should be 1 0 0 or 0 1 1. You can work around the bug by subtracting 1 from the merge matrix (community.to.membership apparently expects 0-based vertex indices, while c$merges is 1-based):
> g <- graph.famous("zachary")
> c <- edge.betweenness.community(g)
> membership(c)
[1] 1 1 2 1 3 3 3 1 4 5 3 1 1 1 4 4 3 1 4 1 4 1 4 4 2 2 4 2 2 4 4 2 4 4
> community.to.membership(g, c$merges-1, steps=29)$membership
[1] 0 0 2 0 3 3 3 0 1 4 3 0 0 0 1 1 3 0 1 0 1 0 1 1 2 2 1 2 2 1 1 2 1 1
These two membership vectors are essentially the same (with a bit of reindexing). Note that you only need 29 steps to reach the same membership vector because there are 34 vertices in the graph and you have 5 communities, so you need to perform 34-5=29 merges. Performing 33 steps would get you a single community only.
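The functions in the question come from older igraph releases; in current versions they are named make_graph(), cluster_edge_betweenness() and so on, and cut_at() returns the membership for a given number of communities (no) or merge steps (steps). The sketch below assumes a recent igraph release: it cuts the dendrogram at 5 communities and, for the question in the title, computes the modularity of the partition for every number of communities from 1 to 34.

```r
library(igraph)

g  <- make_graph("Zachary")
eb <- cluster_edge_betweenness(g)

# Cut the dendrogram into 5 communities (34 - 5 = 29 merges)
m5 <- cut_at(eb, no = 5)

# Modularity of the partition with k communities, for k = 1..34
mods <- sapply(seq_len(vcount(g)), function(k) modularity(g, cut_at(eb, no = k)))

# Number of communities with the highest modularity
which.max(mods)
```

If cut_at() behaves as assumed, m5 should describe the same partition as membership(eb), up to relabelling of the communities.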
I will file a bug report for this in igraph's bug tracker.