Argument "partial" of the sort function in R - r

?sort states that the partial argument may be NULL or a vector of indices for partial sorting.
I tried:
x <- c(1,3,5,2,4,6,7,9,8,10)
sort(x)
## [1] 1 2 3 4 5 6 7 8 9 10
sort(x, partial=5)
## [1] 1 3 4 2 5 6 7 9 8 10
sort(x, partial=2)
## [1] 1 2 5 3 4 6 7 9 8 10
sort(x, partial=4)
## [1] 1 2 3 4 5 6 7 9 8 10
I am not sure what partial means when sorting a vector.

As ?sort states,
If partial is not NULL, it is taken to contain indices of elements of the result
which are to be placed in their correct positions in the sorted array by partial sorting.
In other words, the following assertion is always true:
stopifnot(sort(x, partial=pt_idx)[pt_idx] == sort(x)[pt_idx])
for any x and pt_idx, e.g.
x <- sample(100) # input vector
pt_idx <- sample(1:100, 5) # indices for partial arg
This behavior is different from the one described in the Wikipedia article on partial sorting: in the case of R's sort(), we are not necessarily computing the k smallest elements.
For example, if
print(x)
## [1] 91 85 63 80 71 69 20 39 78 67 32 56 27 79 9 66 88 23 61 75 68 81 21 90 36 84 11 3 42 43
## [31] 17 97 57 76 55 62 24 82 28 72 25 60 14 93 2 100 98 51 29 5 59 87 44 37 16 34 48 4 49 77
## [61] 13 95 31 15 70 18 52 58 73 1 45 40 8 30 89 99 41 7 94 47 96 12 35 19 38 6 74 50 86 65
## [91] 54 46 33 22 26 92 53 10 64 83
and
pt_idx
## [1] 5 54 58 95 8
then
sort(x, partial=pt_idx)
## [1] 1 3 2 4 5 6 7 8 11 12 9 10 13 15 14 16 17 18 23 30 31 27 21 32 36 34 35 19 20 37
## [31] 38 33 29 22 26 25 24 28 39 41 40 42 43 48 46 44 45 47 51 50 52 49 53 54 57 56 55 58 59 60
## [61] 62 64 63 61 65 66 70 72 73 69 68 71 67 79 78 82 75 81 80 77 76 74 89 85 88 87 83 84 86 90
## [91] 92 93 91 94 95 96 97 99 100 98
Here x[5], x[54], ..., x[8] are placed in their correct positions - and we cannot say anything else about the remaining elements. HTH.
EDIT: Partial sorting may reduce the sorting time, of course, if you are interested in, e.g., only some of the order statistics.
require(microbenchmark)
x <- rnorm(100000)
microbenchmark(sort(x, partial=1:10)[1:10], sort(x)[1:10])
## Unit: milliseconds
##                           expr       min        lq    median        uq      max neval
##  sort(x, partial = 1:10)[1:10]  2.342806  2.366383  2.393426  3.631734 44.00128   100
##                  sort(x)[1:10] 16.556525 16.645339 16.745489 17.911789 18.13621   100
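As a small usage sketch (my addition, building on the benchmark above, not part of the original answer): if you only need a single order statistic, one partial index is enough; kth_smallest is a hypothetical helper name.
kth_smallest <- function(x, k) sort(x, partial = k)[k]   # hypothetical helper
kth_smallest(rnorm(100000), 10)   # the 10th smallest value, without a full sort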

Regarding the statement "Here x[5], x[54], ..., x[8] are placed in their correct positions": I don't think that is correct. It should be: in the result, i.e. the partially sorted x, result[5], result[54], ..., result[8] will hold the right values from x.
quote from R manual:
If partial is not NULL, it is taken to contain indices of elements of
the result which are to be placed in their correct positions in the
sorted array by partial sorting. For each of the result values in a
specified position, any values smaller than that one are guaranteed to
have a smaller index in the sorted array and any values which are
greater are guaranteed to have a bigger index in the sorted array.
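To see that guarantee concretely, here is a small check (my own sketch, not from the original answers): for a single partial position p, every element before position p in the result is no larger than the value at p, and every element after it is no smaller.
set.seed(42)
x <- sample(1000)
p <- 300
y <- sort(x, partial = p)
stopifnot(all(y[1:(p - 1)] <= y[p]), all(y[(p + 1):length(y)] >= y[p]))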

Related

Can RStudio show number of changes done to an object after a line is run (as in Stata)?

In Stata, when changing values of variables (or doing other related operations), the output includes a comment reporting the number of changes made.
Is there a way to obtain similar commentary in RStudio?
For instance, sometimes I want to check how many changes a command made (partly to see if command worked, or to count the extent of a potential problem in the data). Currently, I have to inspect the data manually or do a pretty uninformative comparison using all(), for instance.
Base R doesn't do this, but you could write a function to do it, and then instead of saying
x <- y
you'd say
x <- showChanges(x, y)
For example,
library(waldo)
showChanges <- function(oldval, newval) {
  print(compare(oldval, newval))
  newval
}
set.seed(123)
x <- 1:100
x <- showChanges(x, x + rbinom(100, size = 1, prob = 0.01))
#> `old[21:27]`: 21 22 23 24 25 26 27
#> `new[21:27]`: 21 22 23 25 25 26 27
x
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 25 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
Created on 2021-10-21 by the reprex package (v2.0.0)
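If all you want is the count of changed elements (rather than waldo's detailed diff), a lighter sketch along the same lines could simply compare old and new values; countChanges below is a hypothetical helper, not an existing function.
countChanges <- function(oldval, newval) {
  message("Number of changes: ", sum(oldval != newval, na.rm = TRUE))
  newval
}
set.seed(123)
x <- 1:100
x <- countChanges(x, x + rbinom(100, size = 1, prob = 0.01))
#> Number of changes: 1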

What should I do when the if statement in a for loop is false and I don't want to do anything in that case, but just move on to test the next value?

I want to find multiples of 2 between 0 and 100 and save these multiples in a vector.
This is my code:
i <- c(0:100)
a <- c()
for (value in i) {
  if (i %% 2 == 0) {
    a[i+1] <- i
  }
}
#> Warning in if (i%%2 == 0) {: the condition has length > 1 and only the first
#> element will be used
#> Warning in if (i%%2 == 0) {: the condition has length > 1 and only the first
#> element will be used
#> Warning in if (i%%2 == 0) {: the condition has length > 1 and only the first
#> element will be used
...
print(a)
#> [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#> [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
#> [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
#> [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
#> [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
#> [91] 90 91 92 93 94 95 96 97 98 99 100
Created on 2020-06-12 by the reprex package (v0.3.0)
The result that I expected is "0, 2, 4, 6, 8, 10, 12, ...".
Where am I wrong?
Based on the way 'a' is initialized (i.e. as a NULL vector), we can concatenate the 'value' if the condition is satisfied:
a <- c()
for(value in i) if(value %%2 == 0) a <- c(a, value)
a
#[1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
#[35] 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
In the OP's code, the condition inside if is evaluated with the whole vector i instead of with 'value', resulting in the warning message, because if/else expects a single TRUE/FALSE.
This can be done without a loop in R as these are vectorized operations
i[!i %% 2]
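The same vectorized filter can also be written with an explicit comparison, which some may find more readable (my addition; it is equivalent to the line above):
i[i %% 2 == 0]   # same result as i[!i %% 2]: 0, 2, 4, ..., 100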
Instead of checking every value of i, why not generate a sequence with a step of 2?
i <- 0:100
seq(min(i), max(i), 2)
# [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
#[19] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
#[37] 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100

In igraph, which network specifications allow groups of nodes to have the same distribution?

I am currently trying to generate a network where the degree distribution has a large variance, but with a sufficient number of nodes at each degree. For example, in igraph, if we use the Barabasi-Albert network, we can do:
g <- sample_pa(n=100,power = 1,m = 10)
g_adj <- as.matrix(as_adj(g))
rowSums(g_adj)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[29] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
[57] 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
[85] 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
The above shows the degree on each of the 100 nodes. The problem for me is that I would like to only have 10-15 unique degree values, so that instead of having 93 94 95 96 97 98 99 at the end, we have instead, for example, 93 for each of the last 7 nodes. In other words, when I call
unique(rowSums(g_adj))
I'd like at most 10-15 values. Is there a way to "cluster" the nodes instead of having so many different unique degree values? Thanks.
You may use sample_degseq, which generates random graphs with a given degree sequence. For instance,
degrees <- seq(1, 61, length = 10) # Ten different degrees
times <- rep(10, 10) # Giving each of the degrees to ten vertices
g <- sample_degseq(rep(degrees, times = times), method = "vl")
table(degree(g))
# 1 7 14 21 27 34 41 47 54 61
# 10 10 10 10 10 10 10 10 10 10
Note that you may need to play with degrees and times, as ultimately rep(degrees, times = times) needs to be a graphical sequence.
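As an aside (my own sketch, not part of the original answer): recent versions of igraph provide is_graphical(), which you can use to check a candidate degree sequence before calling sample_degseq(); the integer degrees below mirror the table above.
library(igraph)
deg <- rep(c(1, 7, 14, 21, 27, 34, 41, 47, 54, 61), each = 10)
is_graphical(deg)   # TRUE here, so sample_degseq() can realize this sequence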

How to write OR condition inside which in R

I am unable to figure out how I can write an OR condition inside which in R.
This statement does not work:
which(value>100 | value<=200)
I know it is a very basic thing, but I am unable to find the right solution.
Every value is either larger than 100 or smaller than or equal to 200. Maybe you need other numbers, or & instead of |? Otherwise, there is no problem with that statement; the syntax is correct:
> value <- c(110, 2, 3, 4, 120)
> which(value>100 | value<=200)
[1] 1 2 3 4 5
> which(value>100 | value<=2)
[1] 1 2 5
> which(value>100 & value<=200)
[1] 1 5
> which(iris$Species == "setosa" | iris$Species == "versicolor")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
does work. Remember to fully qualify the names of the variables you are selecting, as with iris$Species in the example at hand (and not just Species).
Have a look at the documentation for which (?which).
Also note that whatever you do with which can generally be done another way, often faster and more cleanly.
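For example (my own sketch): if the goal is to extract the matching values rather than their positions, logical indexing does it directly, without which:
value <- c(110, 2, 3, 4, 120)
value[value > 100 & value <= 200]
# [1] 110 120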

In R, how do I locally shuffle a vector's elements

I have the following vector in R. Think of it as a vector of numbers.
x = c(1,2,3,4,...100)
I want to randomize this vector "locally" based on some input number, the "locality factor". For example, if the locality factor is 3, then the first 3 elements are taken and randomized, followed by the next 3 elements, and so on. Is there an efficient way to do this? I know that if I use sample, it would jumble up the whole vector.
Thanks in advance.
Arun didn't like how inefficient my other answer was, so here's something very fast just for him ;)
It requires just one call each to runif() and order(), and doesn't use sample() at all.
x <- 1:100
k <- 3
n <- length(x)
x[order(rep(seq_len(ceiling(n/k)), each=k, length.out=n) + runif(n))]
# [1] 3 1 2 6 5 4 8 9 7 11 12 10 13 14 15 18 16 17
# [19] 20 19 21 23 22 24 27 25 26 29 28 30 33 31 32 36 34 35
# [37] 37 38 39 40 41 42 43 44 45 47 48 46 51 49 50 52 54 53
# [55] 55 57 56 58 60 59 62 63 61 66 64 65 68 67 69 71 70 72
# [73] 75 74 73 76 77 78 81 80 79 84 82 83 86 85 87 89 88 90
# [91] 93 92 91 94 96 95 97 98 99 100
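Why this works (my gloss, not the original author's explanation): each element is assigned its block number plus uniform noise in [0, 1); ordering by that key keeps the blocks in their original order but randomly permutes elements within each block.
x <- 1:100; k <- 3; n <- length(x)   # same setup as above
key <- rep(seq_len(ceiling(n/k)), each = k, length.out = n) + runif(n)
# integer part of key = block id, fractional part = random tie-breaker within the block,
# so order(key) preserves block order while shuffling inside each block
x[order(key)]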
General solution:
Edit: As #MatthewLundberg comments, the issue I pointed out with "repeating numbers in x" can be easily overcome by working on seq_along(x), which would mean the resulting values will be indices. So, it'd be like so:
k <- 3
x <- c(2,2,1, 1,3,4, 4,6,5, 3)
x.s <- seq_along(x)
y <- sample(x.s)
x[unlist(split(y, (match(y, x.s)-1) %/% k), use.names = FALSE)]
# [1] 2 2 1 3 4 1 4 5 6 3
Old answer:
The bottleneck here is the number of calls to the function sample. And as long as your numbers don't repeat, I think you can do this with just one call to sample, in this manner:
k <- 3
x <- 1:20
y <- sample(x)
unlist(split(y, (match(y,x)-1) %/% k), use.names = FALSE)
# [1] 1 3 2 5 6 4 8 9 7 12 10 11 13 14 15 17 16 18 19 20
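To see why repeated values are a problem for this old approach (my own illustration): match() returns only the first occurrence, so duplicated values all map to the same position and the block assignment goes wrong; the seq_along() edit above avoids this by working with indices.
x <- c(2, 2, 1, 1, 3, 4)   # repeated values
y <- sample(x)
match(y, x)                # every duplicate maps to its first occurrence in x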
To put everything together in a function (I like the name scramble from #Roland's):
scramble <- function(x, k=3) {
  x.s <- seq_along(x)
  y.s <- sample(x.s)
  idx <- unlist(split(y.s, (match(y.s, x.s)-1) %/% k), use.names = FALSE)
  x[idx]
}
scramble(x, 3)
# [1] 2 1 2 3 4 1 5 4 6 3
scramble(x, 3)
# [1] 1 2 2 1 4 3 6 5 4 3
To reduce the answer (and get it faster) even more, following #flodel's comment:
scramble <- function(x, k=3L) {
  x.s <- seq_along(x)
  y.s <- sample(x.s)
  x[unlist(split(x.s[y.s], (y.s-1) %/% k), use.names = FALSE)]
}
For the record, the boot package (a recommended package shipped with R) includes a function permutation.array() that can be used for just this purpose:
x <- 1:100
k <- 3
ii <- boot:::permutation.array(n = length(x),
                               R = 2,
                               strata = (seq_along(x) - 1) %/% k)[1,]
x[ii]
# [1] 2 1 3 6 5 4 9 7 8 12 11 10 15 13 14 16 18 17
# [19] 21 19 20 23 22 24 26 27 25 28 29 30 33 31 32 36 35 34
# [37] 38 39 37 41 40 42 43 44 45 46 47 48 51 50 49 53 52 54
# [55] 57 55 56 59 60 58 63 61 62 65 66 64 67 69 68 72 71 70
# [73] 75 73 74 76 77 78 79 80 81 82 83 84 86 87 85 89 88 90
# [91] 93 91 92 94 95 96 97 98 99 100
This will drop elements at the end (with a warning):
locality <- 3
x <- 1:100
c(apply(matrix(x, nrow=locality, ncol=length(x) %/% locality), 2, sample))
## [1] 1 2 3 4 6 5 8 9 7 12 10 11 13 15 14 16 18 17 19 20 21 22 24 23 26 25 27 28 30 29 32 33 31 35 34 36 38 39 37
## [40] 42 40 41 43 44 45 47 48 46 51 49 50 54 52 53 55 57 56 58 59 60 62 61 63 64 65 66 67 69 68 71 72 70 74 75 73 78 77 76
## [79] 80 81 79 83 82 84 87 85 86 88 89 90 92 93 91 96 94 95 99 98 97
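If dropping the leftover elements is not acceptable, one possible tweak (my sketch, not the original answer) is to shuffle only the complete blocks and append the remaining tail unchanged:
locality <- 3
x <- 1:100
n_full <- (length(x) %/% locality) * locality              # elements in complete blocks
shuffled <- c(apply(matrix(x[seq_len(n_full)], nrow = locality), 2, sample))
c(shuffled, x[seq_len(length(x) - n_full) + n_full])       # tail (here: 100) kept as-is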
v <- 1:16
scramble <- function(vec, n) {
  res <- tapply(vec, (seq_along(vec) + n - 1) %/% n,
                FUN = function(x) x[sample.int(length(x), size = length(x))])
  unname(unlist(res))
}
set.seed(42)
scramble(v,3)
#[1] 3 2 1 6 5 4 9 7 8 12 10 11 15 13 14 16
scramble(v,4)
#[1] 2 3 1 4 5 8 6 7 10 12 9 11 14 15 16 13
I like Matthew's approach way better, but here is how I did it:
x <- 1:100
fact <- 3
y <- ceiling(length(x)/fact)
unlist(lapply(split(x, rep(1:y, each = fact)[1:length(x)]), function(x) {
  if (length(x) == 1) return(x)
  sample(x)
}), use.names = FALSE)
## [1] 3 1 2 6 4 5 8 9 7 11 10 12 13 15 14 17 16 18
## [19] 20 21 19 24 23 22 26 27 25 29 30 28 31 32 33 35 34 36
## [37] 39 37 38 41 42 40 45 43 44 47 46 48 51 49 50 52 53 54
## [55] 57 56 55 59 60 58 63 62 61 64 66 65 67 68 69 70 71 72
## [73] 75 73 74 77 76 78 80 79 81 82 84 83 85 86 87 90 89 88
## [91] 92 91 93 96 94 95 98 99 97 100
