I am trying to simulate a game in R. For that I need to choose a random player out of n_players who begins the first round. The remaining players then follow in a random order in the first round; however, in all subsequent rounds the same order of players as in the first round must be kept. Does anyone have an idea how to do this?
Create a sequence of numbers from 1 up to n, say n = 10:
x <- 1:10
Think of these as the tag numbers of the players. You can then use R's sample function (read the documentation via the ?sample command) to create another sequence whose order has been shuffled randomly:
y <- sample(x, 10, replace = FALSE)
Now your y variable is the order in which your players are selected, one by one. You can access each chosen player just as you would access an element of a vector, e.g. y[1] is the player who begins. Finally, the vector y is also the sequence in which these players take their turns in all subsequent rounds.
Test run:
x <- 1:10
x
# [1] 1 2 3 4 5 6 7 8 9 10
y <- sample(x, 10, replace = FALSE)
y
# [1] 2 4 1 8 9 7 5 6 10 3
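To tie this back to the game: a minimal sketch of the simulation loop, where n_rounds and play_turn() are hypothetical placeholders for your own game logic:
n_players <- 10
n_rounds <- 5   # hypothetical number of rounds

## shuffle once: the first element is the starting player and the
## whole vector is the fixed turn order for every round
turn_order <- sample(1:n_players)

for (round in 1:n_rounds) {
  for (player in turn_order) {
    # play_turn(player, round)   # your game logic would go here
  }
}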
I have a vector of numbers stored in R.
my_vector <- c(1,2,3,4,5)
I want to add two to each number.
my_vector + 2
[1] 3 4 5 6 7
However, I want there to only be a twenty percent chance of adding two to the numbers in my vector each time I run the code. Is there a way to code this in R?
What I mean is, if I run the code, the output could be:
[1] 3 4 5 6 9
Or perhaps
[1] 5 4 5 6 7
i.e. there is only a 20% chance that any one number in the vector will get two added to it.
my_vector + 2 * sample(c(TRUE, FALSE), length(my_vector), prob = c(0.2, 0.8), replace = TRUE)
That will add 2 to a variable number of elements (which is what you were asking), but sometimes people want exactly 20% of the elements to have 2 added, in which case it would be:
my_vector + 2 * sample(c(TRUE, rep(FALSE, 4)))  # shuffles one TRUE among five, so exactly one element gets +2
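An equivalent way to draw the independent 20% coin flips is rbinom(); a minimal sketch (set.seed() is only there to make the example reproducible):
my_vector <- c(1, 2, 3, 4, 5)

set.seed(42)  # only for reproducibility of the example
## each element independently gets +2 with probability 0.2
my_vector + 2 * rbinom(length(my_vector), size = 1, prob = 0.2)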
Using the walktrap.community approach for defining communities within my graph works great - of all the algorithms I tested it performs the best. The caveat is that in the case of a fully connected graph with no self linkages (every node connects to each other node, but not itself) each node is assigned its own community.
I am not experienced in network analysis, but this seems like an interesting case, and it's certainly not desired behavior. How can I avoid this splitting in my actual data?
library(igraph)
match.mat <- matrix(TRUE, nrow = 8, ncol = 8)
diag(match.mat) <- TRUE    # with self linkages
topology <- which(match.mat, arr.ind = TRUE)
g <- graph.data.frame(topology, directed = FALSE)
cm <- walktrap.community(g)
membership(cm)
# 2 3 4 5 6 7 8 1
# 1 1 1 1 1 1 1 1
plot(cm, g)

diag(match.mat) <- FALSE   # without self linkages
topology <- which(match.mat, arr.ind = TRUE)
g <- graph.data.frame(topology, directed = FALSE)
cm <- walktrap.community(g)
membership(cm)
# 2 3 4 5 6 7 8 1
# 1 2 3 4 5 6 7 8
plot(cm, g)
Conceptually I'm not sure how the lack of self linkages would lead to every node being split; maybe all candidate communities are tied and therefore split? But the case with all self linkages would seem equivalent in that regard.
Thanks!
The Walktrap algorithm is described in this paper: http://www-rp.lip6.fr/~latapy/Publis/communities.pdf
If you read the paper carefully, you will note that Walktrap builds a node distance measure based on the transition matrix of a random walk. This transition matrix needs to be ergodic, so its underlying adjacency matrix needs to be connected and non-bipartite. Non-bipartiteness is achieved by adding self loops to the nodes; therefore, you need to add a self loop to each node in your graph. It might be a good idea to include this correction in the igraph package in the future, but as far as I know it uses the C implementation of Latapy and Pons, and for that one the graph needs to have self loops. Hope this answers your question!
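As a minimal sketch of that fix, using the 8-node example from the question: add one self loop per vertex before running Walktrap.
library(igraph)

## rebuild the questioner's graph without self linkages
match.mat <- matrix(TRUE, nrow = 8, ncol = 8)
diag(match.mat) <- FALSE
topology <- which(match.mat, arr.ind = TRUE)
g <- graph.data.frame(as.data.frame(topology), directed = FALSE)

## add a self loop to every vertex so the random walk's transition
## matrix is non-bipartite (and hence ergodic)
g.loops <- add.edges(g, rep(seq_len(vcount(g)), each = 2))
membership(walktrap.community(g.loops))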
I am using a package for a GLM which reads a data frame in chunks. It requires that all levels of a factor occur in every chunk. I am looking for a good strategy to rearrange the observations so as to maximise the probability of having all levels in every chunk.
The example would be
c(4,7,4,4,4,4,4,4,4,4,4,7,4,4,8,8,5,5)
for a chunk size of 8, the best rearrangement would be
c(4,7,5,8,4,4,4,4,4,7,5,8,4,4,4,4,4,4)
Is there some elegant way to shuffle the data around?
Just saw the comments. The function itself is called bigglm (it reads the data chunkwise). The vectors should be of equal length. The question is really just about rearranging the data so that as many levels as possible are present in each chunk.
An example of the column of the data frame can be found here:
(https://www.dropbox.com/s/cth8kwcq9ph5j0p/d1.RData?dl=0)
The most important thing in this case is that as many levels as possible are present in as many chunks as possible. The smaller the chunk, the less memory is needed when reading in. I think it would be a good starting point to assume 10 chunks.
I think I understand what you are asking for, though admittedly I am not familiar with the function that reads the data in by chunks with stringsAsFactors = TRUE, makes a-priori assumptions about the makeup of the data, and offers no way to impose other characteristics on the factors. I offer in advance the suggestion that either you are misinterpreting the function or you are mis-applying it to your specific data problem.
I'm easily wrong in problems like this, so I'll try to address the inferred problem regardless.
You claim that the function will read in the first 8 elements and do its processing on them. It must know that there are (in this case) four factor levels to be considered; the easiest way, as you are asking, is to have each of these levels present in each chunk. Once it has processed the first 8 rows, it will read the second 8 elements. In the case of your sample data this does not work, since the second 8 elements do not include a 5.
I'll define slightly augmented data later to remedy this.
Assumptions / Rules
- the number of unique values overall in the data must be no larger than the size of each chunk;
- each factor must have at least as many occurrences as the number of chunks to be read; and
- all chunks have precisely chunksize elements in them (i.e., they are full), except that the last chunk will have between 1 and chunksize elements in it; ergo,
- the last chunk has at least as many elements as there are unique values.
Function Definition
Given those rules, here's some code. This is most certainly not the only solution, and it may not perform well with significantly large datasets (I have not done extensive testing).
myfunc <- function(x, chunksize = 8) {
  numChunks <- ceiling(length(x) / chunksize)
  uniqx <- unique(x)
  lastChunkSize <- chunksize * (1 - numChunks) + length(x)
  ## check to see if it is mathematically possible
  if (length(uniqx) > chunksize)
    stop('more factors than can fit in one chunk')
  if (any(table(x) < numChunks))
    stop('not enough of at least one factor to cover all chunks')
  if (lastChunkSize < length(uniqx))
    stop('last chunk will not have all factors')
  ## indices of each unique value; lapply (not sapply) so we always
  ## get a list, even when all levels occur equally often
  allIndices <- lapply(uniqx, function(z) which(z == x))
  ## fill one of each unique x into every chunk
  chunks <- lapply(1:numChunks, function(i) sapply(allIndices, `[`, i))
  ## all occurrences beyond the first numChunks of each level
  remainder <- unlist(lapply(allIndices, tail, n = -numChunks))
  ## spread the leftovers over the room remaining in each chunk
  remainderCut <- split(remainder,
                        ceiling(seq_along(remainder) / (chunksize - length(uniqx))))
  ## combine them all together, wary of empty lists
  finalIndices <- sapply(1:numChunks,
                         function(i) {
                           if (i <= length(remainderCut))
                             c(chunks[[i]], remainderCut[[i]])
                           else
                             chunks[[i]]
                         })
  x[unlist(finalIndices)]
}
Supporting Execution
In your offered data, you have 18 elements requiring three chunks. Your data will fail on two accounts: three of the elements only occur twice, so the third chunk will most certainly not contain all elements; and your last chunk will only have two elements, which cannot contain each of the four.
I'll augment your data to satisfy both misses, with:
dat3 <- c(4,7,5,7,8,4,4,4,4,4,4,7,4,4,8,8,5,5,5,5)
which still will not work unadjusted, if for no other reason than that its last chunk would contain only four 5's.
The solution:
myfunc(dat3, chunksize = 8)
## [1] 4 7 5 8 4 4 4 4   4 7 5 8 4 4 5 5   4 7 5 8
(spaces were added to the output for easy inspection). Each chunk has 4, 7, 5, 8 as its first four elements, therefore all factors are covered in each chunk.
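A quick way to confirm that property programmatically (a small check, not part of the function itself):
out <- myfunc(dat3, chunksize = 8)
## split the result into chunks of 8 and verify that every chunk
## contains all of the unique levels
chunked <- split(out, ceiling(seq_along(out) / 8))
sapply(chunked, function(ch) all(unique(dat3) %in% ch))
##    1    2    3
## TRUE TRUE TRUE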
Breakdown
A quick walkthrough (using debug(myfunc)), assuming x = dat3 and chunksize = 8. Jumping down the code:
## Browse[2]> uniqx
## [1] 4 7 5 8
## Browse[2]> allIndices
## [[1]]
## [1] 1 6 7 8 9 10 11 13 14
## [[2]]
## [1] 2 4 12
## [[3]]
## [1] 3 17 18 19 20
## [[4]]
## [1] 5 15 16
This shows the indices for each unique element. For example, there are 4's located at indices 1, 6, 7, etc.
## Browse[2]> chunks
## [[1]]
## [1] 1 2 3 5
## [[2]]
## [1] 6 4 17 15
## [[3]]
## [1] 7 12 18 16
There are three chunks to be filled, and this list starts forming those chunks. In this example, we have placed indices 1, 2, 3, and 5 in the first chunk. Looking back at allIndices, you'll see that these represent the first instance of each of uniqx, so the first chunk now contains c(4, 7, 5, 8), as do the other two chunks.
At this point, we have satisfied the basic requirement that each unique element be found in every chunk. The rest of the code fills with the remaining elements.
## Browse[2]> remainder
## [1] 8 9 10 11 13 14 19 20
These are all indices that have so far not been added to the chunks.
## Browse[2]> remainderCut
## $`1`
## [1] 8 9 10 11
## $`2`
## [1] 13 14 19 20
Though we have three chunks, we only have two lists here. This is fine; we have nothing (and need nothing) to add to the last chunk. We will then zip-merge these with chunks to form a list of index vectors. (Note: you might be tempted to try mapply(function(a, b) c(a, b), chunks, remainderCut), but you may notice that if remainderCut is not the same length as chunks, as we see here, then its values are recycled. Not acceptable. Try it.)
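If you do prefer the mapply() route, one safe variant (sketched here with the chunks and remainderCut from the walkthrough) is to pad the shorter list with NULLs first, so there is nothing left to recycle:
## pad the shorter list with NULLs so both have numChunks elements
padded <- c(remainderCut,
            vector("list", length(chunks) - length(remainderCut)))
## c(chunk, NULL) is just the chunk, so the last chunk is unchanged
finalIndices <- mapply(c, chunks, padded, SIMPLIFY = FALSE)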
## Browse[2]> finalIndices
## [[1]]
## [1] 1 2 3 5 8 9 10 11
## [[2]]
## [1] 6 4 17 15 13 14 19 20
## [[3]]
## [1] 7 12 18 16
Remember, each number represents the index from within x (originally dat3). We then unlist this split-vector and apply the indices to the data.
I have the same graph represented at two different times, g.t0 and g.t1. g.t1 differs from g.t0 in having one additional edge but maintains the same vertices.
I want to compare the communities in g.t0 and g.t1, that is, to test whether the vertices moved to a different community from t0 to t1. I tried the following
library(igraph)
m <- matrix(c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0),nrow=4,ncol=4)
g.t0 <- graph.adjacency(m)
memb.t0 <- membership(edge.betweenness.community(g.t0))
V(g.t0)
# Vertex sequence:
# [1] 1 2 3 4
memb.t0
# [1] 1 2 2 3
g.t1 <- add.edges(g.t0,c(1,2))
memb.t1 <- membership(edge.betweenness.community(g.t1))
V(g.t1)
# Vertex sequence:
# [1] 1 2 3 4
memb.t1
# [1] 1 1 1 2
But of course the problem is that the indexing of the communities always starts from 1. In the example it therefore looks as if all the vertices have moved to a different community, while the most intuitive reading is that actually only vertex 1 changed community, joining vertices 2 and 3.
How could I approach the problem of counting the number of vertices that changed communities from t0 to t1?
Actually, this is not an easy question. In general you need to match the communities in the two graphs using some rule or criterion that the matching optimizes. As the two partitions can have different numbers of communities, the matching is not necessarily bijective.
Several methods and quantities have been proposed for this problem; a number of them are implemented in igraph, see
http://igraph.org/r/doc/compare.html
compare.communities(memb.t1, memb.t0, method="vi")
# [1] 0.4773856
compare.communities(memb.t1, memb.t0, method="nmi")
# [1] 0.7020169
compare.communities(memb.t1, memb.t0, method="rand")
# [1] 0.6666667
See the references in the igraph manual for the details about the methods.
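If you additionally want a literal count of moved vertices, one rough approach (my own sketch, not an igraph function) is to greedily match each t1 community to the t0 community it overlaps most, and then count the mismatches:
tab <- table(t0 = memb.t0, t1 = memb.t1)
## for each t1 community, the t0 label it overlaps most
best <- setNames(rownames(tab)[apply(tab, 2, which.max)],
                 colnames(tab))
## vertices whose old label differs from their matched new label
sum(memb.t0 != best[as.character(memb.t1)])
# [1] 1   (only vertex 1 moved)
Note that this greedy matching is not guaranteed to be optimal when several communities overlap heavily; the measures in compare.communities avoid the matching problem altogether.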