Fixing the First and Last Numbers in a Random List - r

I used the code below to generate a random sequence of numbers (it corresponds to an edge list for a graph; see "Generating Random Graphs According to Some Conditions") such that:
The first and last "nodes" are the same (e.g. starts at "1" and ends at "1")
Each node is visited exactly once
See below:
library(dplyr)  # provides tibble() and lead()
d = 15
relations = data.frame(tibble(
from = sample(d),  # random permutation of 1..d
to = lead(from, default=from[1]),
))
> relations
from to
1 1 11
2 11 7
3 7 5
4 5 10
5 10 13
6 13 9
7 9 15
8 15 2
9 2 3
10 3 4
11 4 8
12 8 6
13 6 12
14 12 14
15 14 1
If I re-run the code above, it will (naturally) produce a different list:
relations
from to
1 6 9
2 9 2
3 2 5
4 5 8
5 8 13
6 13 1
7 1 14
8 14 3
9 3 11
10 11 12
11 12 7
12 7 15
13 15 4
14 4 10
15 10 6
Can I do something so that each time I generate a new random set of numbers, I can fix the first and last number to a specific number?
For instance, could I make it so that the first number and the last number are always "7"?
#example 1
from to
1 7 11
2 11 1
3 1 5
4 5 10
5 10 13
6 13 9
7 9 15
8 15 2
9 2 3
10 3 4
11 4 8
12 8 6
13 6 12
14 12 14
15 14 7
#example 2
from to
1 7 9
2 9 2
3 2 5
4 5 8
5 8 13
6 13 1
7 1 14
8 14 3
9 3 11
10 11 12
11 12 6
12 6 15
13 15 4
14 4 10
15 10 7
In the above examples (example 1, example 2), I took the first two random lists I made and manually replaced the first number and last number with 7 - and then replaced the replacement numbers as well.
But is there a way to "automatically" do this instead of making a manual correction?
For example, I think I figured out how to do this:
#run twice to make sure the output is correct
relations = data.frame(tibble(
from = sample(d),  # random permutation of 1..d
to = lead(from, default=from[1]),
))
orig_first = relations[1,1]
relations[1,1] = 7
relations[15,2] = 7
relation = relations[-c(1,15),]
r1 = relations[1,]
r2 = relations[15,]
final_relation = rbind(r1, relation, r2)
#output 1 : seems correct (starts with 7, ends with 7, all nodes visited exactly once)
from to
1 7 8
2 8 4
3 4 7
4 7 13
5 13 1
6 1 14
7 14 6
8 6 9
9 9 11
10 11 10
11 10 12
12 12 2
13 2 5
14 5 15
15 15 7
#output 2: looks correct
from to
1 7 9
2 9 2
3 2 1
4 1 6
5 6 3
6 3 10
7 10 11
8 11 14
9 14 12
10 12 7
11 7 13
12 13 4
13 4 15
14 15 8
15 8 7
Am I doing this correctly? Is there an easier way to do this?
Thank you!

Here is one way to do this: put the fixed number first and shuffle only the remaining numbers.
library(dplyr)
set.seed(2021)
d = 15
fix_num <- 7
relations = tibble(
from = c(fix_num, sample(setdiff(1:d, fix_num))),
to = lead(from, default=from[1]),
)
relations
# A tibble: 15 x 2
# from to
# <dbl> <dbl>
# 1 7 8
# 2 8 6
# 3 6 11
# 4 11 15
# 5 15 4
# 6 4 14
# 7 14 9
# 8 9 10
# 9 10 3
#10 3 5
#11 5 12
#12 12 13
#13 13 1
#14 1 2
#15 2 7
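The same idea can be wrapped in a small function together with a check that the result really is a closed tour. This is only a sketch in base R: the names make_cycle and is_valid_cycle are made up for illustration and are not part of the answer above. A check like this also shows that the manual-replacement outputs in the question are not valid, because 7 appears twice in the from column and another node is dropped.

```r
# Hypothetical helper: random closed tour over 1..d that starts and ends at fix_num.
make_cycle <- function(d, fix_num) {
  rest <- setdiff(seq_len(d), fix_num)
  rest <- rest[sample.int(length(rest))]   # shuffle; safe even if only one value is left
  from <- c(fix_num, rest)
  to <- c(from[-1], from[1])               # same as dplyr::lead(from, default = from[1])
  data.frame(from = from, to = to)
}

# Hypothetical check: starts/ends at fix_num, visits every node once, edges chain up.
is_valid_cycle <- function(rel, d, fix_num) {
  n <- nrow(rel)
  n == d &&
    rel$from[1] == fix_num &&
    rel$to[n] == fix_num &&
    setequal(rel$from, seq_len(d)) &&
    all(rel$to[-n] == rel$from[-1])
}

set.seed(2021)
rel <- make_cycle(15, 7)
is_valid_cycle(rel, 15, 7)

# "output 1" from the question, typed in by hand
q_out1 <- data.frame(
  from = c(7, 8, 4, 7, 13, 1, 14, 6, 9, 11, 10, 12, 2, 5, 15),
  to   = c(8, 4, 7, 13, 1, 14, 6, 9, 11, 10, 12, 2, 5, 15, 7)
)
is_valid_cycle(q_out1, 15, 7)
```

The first call should report a valid tour and the second should not, since node 3 never appears in from.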

Related

Convert dataframe from vertical to horizontal

I already checked many questions and I don't seem to find the suitable answer.
I have this df
df = data.frame(x = 1:10,y=11:20)
the output
x y
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
I just wish the output to be:
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
thanks
Try t() like below
> data.frame(t(df), check.names = FALSE)
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
A transpose should do it
setNames(data.frame(t(df)), df[,"x"])
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
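One thing to keep in mind with either answer: t() goes through a matrix, so it only keeps numbers as numbers when every column is numeric. A small sketch (the mixed data frame is a made-up example, not from the question):

```r
df <- data.frame(x = 1:10, y = 11:20)

# All-numeric data frame: the transposed result keeps numeric values
wide <- setNames(data.frame(t(df)), df[, "x"])
dim(wide)
unlist(wide["y", ])

# Made-up mixed-type data frame: transposing coerces everything to character
mixed <- data.frame(x = 1:3, label = c("a", "b", "c"))
mixed_t <- t(mixed)
is.character(mixed_t)
```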

How to find all pairs of two lists, and categorize them without repetitions?

We are preparing for a program where 18 people should discuss topics in such a way that in each round they form pairs, and then they switch until everyone has talked to everyone. That means 153 discussions: 9 pairs talking in parallel in each round, for 17 rounds. I tried to formulate a matrix showing who should talk to whom in order to avoid chaos, but could not succeed. For the sake of simplicity everyone is given a number, so the bottom line is that I need all pairs of combinations of the numbers from 1 to 18 (I did that with the combn function), but then these pairs should be rearranged over the 17 rounds so that each number appears only once per round. Any ideas?
Let's first look at a simpler problem with 6 persons. A matrix can list who (rows) is talking to whom (columns) in which round (entry); the full_ls(6) output further down is such a matrix.
So for example in round 1 we have the following pairs:
(1-2), (3-5), (4-6)
For round 2 we would have:
(1-3), (2-6), (4-5)
and so on.
Thus, basically we are looking for a symmetric Latin square (i.e. in each row and in each column each entry appears only once, cf. Latin Squares on Wikipedia).
The inner Latin square (everything except the last row and column) can be easily generated via an addition table:
inner_ls <- function(k) {
res <- outer(0:(k-1), 0:(k-1), function(i, j) (i + j) %% k)
## replace zeros by k
res[res == 0] <- k
## replace diagonal by NA
diag(res) <- NA
res
}
inner_ls(5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] 1 NA 3 4 5
# [3,] 2 3 NA 5 1
# [4,] 3 4 5 NA 2
# [5,] 4 5 1 2 NA
So all that is left is to append the last row (and column) with the missing round number:
full_ls <- function(k) {
i_ls <- inner_ls(k - 1)
last_row <- apply(i_ls, 1, function(row) {
rounds <- 1:(k - 1)
rounds[!rounds %in% row]
})
res <- cbind(rbind(i_ls, last_row), c(last_row, NA))
rownames(res) <- colnames(res) <- 1:k
res
}
full_ls(6)
# 1 2 3 4 5 6
# 1 NA 1 2 3 4 5
# 2 1 NA 3 4 5 2
# 3 2 3 NA 5 1 4
# 4 3 4 5 NA 2 1
# 5 4 5 1 2 NA 3
# 6 5 2 4 1 3 NA
With that you get your assignment matrix as follows:
full_ls(18)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# 1 NA 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# 2 1 NA 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 2
# 3 2 3 NA 5 6 7 8 9 10 11 12 13 14 15 16 17 1 4
# 4 3 4 5 NA 7 8 9 10 11 12 13 14 15 16 17 1 2 6
# 5 4 5 6 7 NA 9 10 11 12 13 14 15 16 17 1 2 3 8
# 6 5 6 7 8 9 NA 11 12 13 14 15 16 17 1 2 3 4 10
# 7 6 7 8 9 10 11 NA 13 14 15 16 17 1 2 3 4 5 12
# 8 7 8 9 10 11 12 13 NA 15 16 17 1 2 3 4 5 6 14
# 9 8 9 10 11 12 13 14 15 NA 17 1 2 3 4 5 6 7 16
# 10 9 10 11 12 13 14 15 16 17 NA 2 3 4 5 6 7 8 1
# 11 10 11 12 13 14 15 16 17 1 2 NA 4 5 6 7 8 9 3
# 12 11 12 13 14 15 16 17 1 2 3 4 NA 6 7 8 9 10 5
# 13 12 13 14 15 16 17 1 2 3 4 5 6 NA 8 9 10 11 7
# 14 13 14 15 16 17 1 2 3 4 5 6 7 8 NA 10 11 12 9
# 15 14 15 16 17 1 2 3 4 5 6 7 8 9 10 NA 12 13 11
# 16 15 16 17 1 2 3 4 5 6 7 8 9 10 11 12 NA 14 13
# 17 16 17 1 2 3 4 5 6 7 8 9 10 11 12 13 14 NA 15
# 18 17 2 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15 NA
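To turn the matrix into a per-round list of pairs, one option is to look up each round number in the matrix and keep only the cells above the diagonal. The sketch below repeats the two functions from the answer so it runs on its own; round_pairs is a hypothetical helper name, not something from the answer.

```r
inner_ls <- function(k) {
  res <- outer(0:(k - 1), 0:(k - 1), function(i, j) (i + j) %% k)
  res[res == 0] <- k      # replace zeros by k
  diag(res) <- NA         # nobody talks to themselves
  res
}

full_ls <- function(k) {
  i_ls <- inner_ls(k - 1)
  last_row <- apply(i_ls, 1, function(row) {
    rounds <- 1:(k - 1)
    rounds[!rounds %in% row]
  })
  res <- cbind(rbind(i_ls, last_row), c(last_row, NA))
  rownames(res) <- colnames(res) <- 1:k
  res
}

# Hypothetical helper: pairs (one per row) that talk in round r of matrix m
round_pairs <- function(m, r) {
  idx <- which(m == r, arr.ind = TRUE)                  # NA cells are ignored
  idx <- idx[idx[, "row"] < idx[, "col"], , drop = FALSE]  # each pair only once
  unname(idx[order(idx[, "row"]), , drop = FALSE])
}

m <- full_ls(18)
schedule <- lapply(1:17, function(r) round_pairs(m, r))

schedule[[1]]                   # the 9 pairs of round 1
round_pairs(full_ls(6), 1)      # pairs (1-2), (3-5), (4-6) of the small example
```

Each element of schedule is a 9 x 2 matrix, and stacking all 17 of them gives the 153 distinct pairs.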

divide dataframe into subgroups based on several columns successively in R

I have to sort a datapool with following structure into subgroups based on the value of 3 columns in R, but I cannot figure it out.
What I want to do is:
First, sort the datapool based on the column V1, the datapool should be divided into three subgroups according to the value of V1 (the value of V1 should be sorted by descending at first).
Sort each of the 3 subgroups into another 3 subgroups according to the value of V2, now we should have 9 subgroups.
Similarly, subdivide each of the 9 groups into 3 groups again, resulting in 27 subgroups altogether.
The following data is only a small example; the real data has 1545 firms.
Firm value V1 V2 V3
1 7 7 11 8
2 9 9 11 7
3 8 14 8 10
4 9 9 7 14
5 8 11 15 14
6 9 10 9 7
7 8 8 6 14
8 4 8 11 14
9 8 10 13 10
10 2 11 6 13
11 3 5 12 14
12 5 12 15 12
13 1 9 13 7
14 4 5 14 7
15 5 10 5 9
16 5 8 13 14
17 2 10 10 7
18 5 12 12 9
19 7 6 11 7
20 6 9 14 14
21 6 14 9 14
22 8 6 6 7
23 9 11 9 5
24 7 7 6 9
25 10 5 15 11
26 4 6 10 9
27 4 13 14 8
And the result should be:
Firm value V1 V2 V3
5 8 11 15 14
12 5 12 15 12
27 4 13 14 8
21 6 14 9 14
18 5 12 12 9
23 9 11 9 5
10 2 11 6 13
3 8 14 8 10
6 9 10 9 7
20 6 9 14 14
9 8 10 13 10
13 1 9 13 7
8 4 8 11 14
2 9 9 11 7
17 2 10 10 7
4 9 9 7 14
7 8 8 6 14
15 5 10 5 9
16 5 8 13 14
25 10 5 15 11
14 4 5 14 7
11 3 5 12 14
1 7 7 11 8
19 7 6 11 7
26 4 6 10 9
24 7 7 6 9
22 8 6 6 7
I have tried for a long time, also searched Google without success. :(
As @Codoremifa said, data.table can be used here:
require(data.table)
DT <- data.table(dat)
DT[order(V1),G1:=rep(1:3,each=9)]
DT[order(V2),G2:=rep(1:3,each=3),by=G1]
DT[order(V3),G3:=1:3,by='G1,G2']
Now your groups are labeled using the additional columns G1, G2 and G3. To sort, so that it's easier to see the groups, use
setkey(DT,G1,G2,G3)
A couple of the OP's columns are just noise unrelated to the question; to verify that this works by eye, try DT[,list(V1,V2,V3,G1,G2,G3)]
EDIT: The OP did not specify a means of dealing with ties. I guess it makes sense to use the value in the later columns to break ties, so...
DT <- data.table(dat)
DT[order(rank(V1)+rank(V2)/100+rank(V3)/100^2),
G1:=rep(1:3,each=9)]
DT[order(rank(V2)+rank(V3)/100),
G2:=rep(1:3,each=3),by=G1]
DT[order(V3),
G3:=1:3,by='G1,G2']
setkey(DT,G1,G2,G3)
DT[27:1] (the result backwards) is
Firm value V1 V2 V3 G1 G2 G3
1: 5 8 11 15 14 3 3 3
2: 12 5 12 15 12 3 3 2
3: 27 4 13 14 8 3 3 1
4: 21 6 14 9 14 3 2 3
5: 9 8 10 13 10 3 2 2
6: 18 5 12 12 9 3 2 1
7: 10 2 11 6 13 3 1 3
8: 3 8 14 8 10 3 1 2
9: 23 9 11 9 5 3 1 1
10: 20 6 9 14 14 2 3 3
11: 16 5 8 13 14 2 3 2
12: 13 1 9 13 7 2 3 1
13: 8 4 8 11 14 2 2 3
14: 17 2 10 10 7 2 2 2
15: 2 9 9 11 7 2 2 1
16: 4 9 9 7 14 2 1 3
17: 15 5 10 5 9 2 1 2
18: 6 9 10 9 7 2 1 1
19: 11 3 5 12 14 1 3 3
20: 25 10 5 15 11 1 3 2
21: 14 4 5 14 7 1 3 1
22: 26 4 6 10 9 1 2 3
23: 1 7 7 11 8 1 2 2
24: 19 7 6 11 7 1 2 1
25: 7 8 8 6 14 1 1 3
26: 24 7 7 6 9 1 1 2
27: 22 8 6 6 7 1 1 1
Firm value V1 V2 V3 G1 G2 G3
Here is an answer using transform and then ddply from plyr. I don't address the ties, which really means that in case of a tie the value from the lowest row number is used first. This is what the OP shows in the example output.
First, order the dataset in descending order of V1 and create three groups of 9 by creating a new variable, fv1.
dat1 = transform(dat1[order(-dat1$V1),], fv1 = factor(rep(1:3, each = 9)))
Then order the dataset in descending order of V2 and create three groups of 3 within each level of fv1.
require(plyr)
dat1 = ddply(dat1[order(-dat1$V2),], .(fv1), transform, fv2 = factor(rep(1:3, each = 3)))
Finally order the dataset by the two factors and V3. I use arrange from plyr for typing efficiency compared to order
(finaldat = arrange(dat1, fv1, fv2, -V3) )
This isn't a particularly generalizable answer, as the group sizes are known in advance for the factors. If the V3 group size was larger than one, a similar process as for V2 would be needed.
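If the group sizes should not be hard-coded, one possible alternative is to compute the tercile from the rank within each group. This is a sketch in base R using ave(), not the method of either answer; tercile is a made-up helper name, and breaking ties by row order (ties.method = "first") is an assumption, so tied rows may land in different groups than in the expected output of the question.

```r
# Example data from the question (the value column is left out, it is not used)
dat <- data.frame(
  Firm = 1:27,
  V1 = c(7, 9, 14, 9, 11, 10, 8, 8, 10, 11, 5, 12, 9, 5, 10, 8, 10, 12,
         6, 9, 14, 6, 11, 7, 5, 6, 13),
  V2 = c(11, 11, 8, 7, 15, 9, 6, 11, 13, 6, 12, 15, 13, 14, 5, 13, 10, 12,
         11, 14, 9, 6, 9, 6, 15, 10, 14),
  V3 = c(8, 7, 10, 14, 14, 7, 14, 14, 10, 13, 14, 12, 7, 7, 9, 14, 7, 9,
         7, 14, 14, 7, 5, 9, 11, 9, 8)
)

# Hypothetical helper: 1 = top third, 3 = bottom third; ties broken by row order
tercile <- function(v) ceiling(3 * rank(-v, ties.method = "first") / length(v))

dat$G1 <- tercile(dat$V1)                              # 3 groups on V1
dat$G2 <- ave(dat$V2, dat$G1, FUN = tercile)           # 3 groups on V2 within G1
dat$G3 <- ave(dat$V3, dat$G1, dat$G2, FUN = tercile)   # 3 groups on V3 within G1, G2

sorted <- dat[order(dat$G1, dat$G2, dat$G3), ]
head(sorted)
```

Because the group is derived from the rank and the group length, the same three lines should also work for the 1545-firm data, with group sizes differing by at most one when the count is not divisible by 3.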

making sort order in merge() numeric

I have two easy matrices (or df's) to merge:
a <- cbind(one=0:15, two=0:15, three=0:15)
b <- cbind(one=0:15, two=0:15, three=0:15)
#a <- data.frame(one=0:15, two=0:15, three=0:15)
#b <- data.frame(one=0:15, two=0:15, three=0:15)
No problem: after sorting on column one, column one is output ascending nicely from 0 to 15:
merge(a,b,by=c("one"), sort=T)
one two.x three.x two.y three.y
1 0 0 0 0 0
2 1 1 1 1 1
3 2 2 2 2 2
4 3 3 3 3 3
5 4 4 4 4 4
6 5 5 5 5 5
7 6 6 6 6 6
8 7 7 7 7 7
9 8 8 8 8 8
10 9 9 9 9 9
11 10 10 10 10 10
12 11 11 11 11 11
13 12 12 12 12 12
14 13 13 13 13 13
15 14 14 14 14 14
16 15 15 15 15 15
But wait: when merging on two columns --- both numeric --- the sort order suddenly seems alphabetic.
merge(a,b,by=c("one", "two"), sort=T)
one two three.x three.y
1 0 0 0 0
2 1 1 1 1
3 10 10 10 10
4 11 11 11 11
5 12 12 12 12
6 13 13 13 13
7 14 14 14 14
8 15 15 15 15
9 2 2 2 2
10 3 3 3 3
11 4 4 4 4
12 5 5 5 5
13 6 6 6 6
14 7 7 7 7
15 8 8 8 8
16 9 9 9 9
Eww, gross. What's going on? And what do I do?
Based on @joran's comments, it looks like if you want the rows to be sorted in any particular order, you should explicitly set it yourself.
If the order you'd like is one in which the rows have increasing values of one or more columns, you can use the function order(), like this:
X <- merge(a, b, by = c("one", "two"))
X[with(X, order(one, two)),]
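A runnable version of the above with the matrices from the question. The two-key output in the question looks as if the keys were compared as text; whatever the cause, ordering explicitly on the numeric columns does not depend on it. Passing sort = FALSE here is just a choice to make clear that the final order comes from order() alone.

```r
a <- cbind(one = 0:15, two = 0:15, three = 0:15)
b <- cbind(one = 0:15, two = 0:15, three = 0:15)

# Merge on both key columns without relying on merge()'s own sorting
X <- merge(a, b, by = c("one", "two"), sort = FALSE)

# Order numerically on the key columns and reset the row names
X <- X[order(X$one, X$two), ]
rownames(X) <- NULL
X
```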

How to create a dataframe with different number of values?

When I create a dataframe I do:
dt = data.frame(a=c(1:5),b=c(1:20))
dt
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 1 6
7 2 7
8 3 8
9 4 9
10 5 10
11 1 11
12 2 12
13 3 13
14 4 14
15 5 15
16 1 16
17 2 17
18 3 18
19 4 19
20 5 20
as you can see the value of the first column (a) are repeated.
How can I create different "columns" with different number of values?
Thanks
H
Use a list. A data.frame is a special kind of list in which all elements are of the same length.
list(a=c(1:5),b=c(1:20))
$a
[1] 1 2 3 4 5
$b
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
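If a data.frame is still needed later, a common workaround is to pad the shorter vectors with NA so that all columns have the same length. This is a sketch of that idea, not something the answer above proposes; the names cols and padded are made up.

```r
cols <- list(a = 1:5, b = 1:20)
lengths(cols)                 # number of values per element

# Pad every element with NA up to the longest length, then build the data.frame
n <- max(lengths(cols))
padded <- as.data.frame(lapply(cols, function(v) {
  length(v) <- n              # extending a vector fills the new slots with NA
  v
}))
tail(padded)
```

Column a then holds 1 to 5 followed by NA values instead of being recycled.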
