Finding solutions to the equation a + b + gcd(a, b) = n

I have a question: given n, we need to find two distinct numbers a and b such that a + b + gcd(a, b) = n. How can we find them?

Write a = gcd(a, b) * da and b = gcd(a, b) * db, where da and db are coprime. Then
a + b + gcd(a, b) =
gcd(a, b) * da + gcd(a, b) * db + gcd(a, b) =
gcd(a, b) * (da + db + 1)
So take an arbitrary factorization of n into two divisors, assign one divisor >= 3 to the sum d = da + db + 1, and the other divisor to gcd(a, b).
Then split the value d - 1 into two coprime parts da and db.
Example (a few of the many possible solutions for the same n):
n = 42 = 6 * 7
da + db + 1 = 6
da = 2 // arbitrary split; da and db are coprime, OK
db = 3
gcd = 7
a = 14
b = 21
n = 14 + 21 + 7 = 42
da = 1 //arbitrary subdivision
db = 4
a = 7
b = 28
n = 7 + 28 + 7 = 42
da + db + 1 = 7
da = 2 // Error: gcd(da, db) > 1, they are not coprime, so this split is not suitable
db = 4
da = 1 //it's OK
db = 5
gcd = 6
6 + 30 + 6 = 42
n = 14 * 3
da = 5, db = 8
15 + 24 + 3 = 42
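To make the construction concrete, here is a minimal brute-force sketch in R (the helper names gcd2 and find_ab are mine, not part of the original answer); it walks over the divisors g of n, treats n / g as d = da + db + 1, and returns the first coprime split it finds:

gcd2 <- function(x, y) if (y == 0) x else gcd2(y, x %% y)  # Euclid's algorithm

find_ab <- function(n) {
  for (g in which(n %% seq_len(n - 1) == 0)) {   # g runs over the divisors of n below n
    d <- n / g                                   # candidate for d = da + db + 1
    if (d < 3) next                              # need da >= 1 and db >= 1
    for (da in seq_len((d - 1) %/% 2)) {
      db <- d - 1 - da
      if (da != db && gcd2(da, db) == 1)         # coprime parts keep gcd(a, b) == g
        return(c(a = g * da, b = g * db))
    }
  }
  NULL                                           # no suitable split found
}

find_ab(42)  # returns a = 1, b = 40 here: 1 + 40 + gcd(1, 40) = 42

In fact a = 1, b = n - 2 works for every n >= 4, since gcd(1, n - 2) = 1, so a solution always exists once n >= 4.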

Related

How to find the base value in modular arithmetic?

I need an efficient formula of some kind that will allow one to figure out the original message (msg) from the following formula: C = msg^e mod N. If a user is provided with C, e and N, is there an efficient way to calculate msg? In this example, C is the ciphertext, e is the public key and N is a public modulus.
I have done some research on what modular arithmetic is all about and looked over some detailed explanations; however, no articles have shown me how to figure out a problem such as this.
Modulus is a non-reversible operation. At best, you know that msg ^ e = C + k*N, and need to determine the value of k.
Consider the following simple case:
e = 2
N = 10
msg = 1 | C = 1 'Note1
msg = 2 | C = 4 'Note2
msg = 3 | C = 9 'Note3
msg = 4 | C = 6 'Note4
msg = 5 | C = 5
msg = 6 | C = 6 'Note4
msg = 7 | C = 9 'Note3
msg = 8 | C = 4 'Note2
msg = 9 | C = 1 'Note1
msg = 10 | C = 0
It should immediately be obvious that if C = 6, this could mean that msg = 6 or that msg = 4 (or msg = 24, et cetera, ad infinitum), with no way to tell the difference without more information.
However, given the same msg under different known values of e and N, you can narrow down the possibilities like so:
e = 3
N = 10
Msg = 1 | C = 1
Msg = 2 | C = 8
Msg = 3 | C = 7
Msg = 4 | C = 4 'We can now see that this
Msg = 5 | C = 5
Msg = 6 | C = 6 'Is different from this
Msg = 7 | C = 3
Msg = 8 | C = 2
Msg = 9 | C = 9
Msg = 10 | C = 0
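For toy moduli like these you can build the whole table and invert it by brute force. A small R sketch (the helper name recover_msg is mine; this is hopeless at real RSA sizes, where N has hundreds of digits):

recover_msg <- function(C, e, N) {
  tbl <- vapply(1:N, function(m) m^e %% N, numeric(1))  # C for every candidate msg
  which(tbl == C)                                       # every msg that maps to C
}

recover_msg(6, 2, 10)  # 4 6 -- ambiguous, as the first table shows
recover_msg(4, 3, 10)  # 4   -- unique here, since x -> x^3 mod 10 happens to be a bijection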

How to find the inverse of f(x) = (6x mod 13)?

Finding the inverse of a simpler function is easy: I swap x and y in the equation and solve for y. But here I get stuck at a certain point.
y = (6*x) mod 13
x = (6*y) mod 13
The inverse of that function will only be defined for values between 0 and 12. Also, for every possible y (between 0 and 12) there will be an infinite number of possible x that fulfill the equation.
Let's try to solve for y
x = (6*y) mod 13
x + n*13 = (6*y)
y = (x + n*13)/6 | x ∈ {0,…,12}, n ∈ ℕ
where n is an unknown non-negative integer that could take any value.
To compute the inverse of y = 6 * x mod 13, I am first going to solve for x and replace x with y (and vice versa) later.
Since y = 6 * x mod 13, x = 6^(-1) * y mod 13, where 6^(-1) is the modular multiplicative inverse of 6 for the modulus 13. Your task now becomes finding 6^(-1) mod 13. In other words, you have to find m such that 6 * m = 1 mod 13.
Note that 6 * 2 = 12 = -1 mod 13. Squaring both sides, 6 * 6 * 2 * 2 = 1 mod 13, or 6 * 24 = 1 mod 13. Since 24 = 11 mod 13, therefore 6 * 11 = 1 mod 13 and 11 = 6^(-1) mod 13.
Thus, our equation for x now becomes x = 11 * y mod 13. Replacing y -> x and x -> y, the inverse of the given function is given by y = 11 * x mod 13.
This nifty Python script can be used to test the validity of our result:
def f(x):
    return (6 * x) % 13

def f_inv(x):
    return (11 * x) % 13

for x in range(13):
    print(x, f_inv(f(x)), x == f_inv(f(x)))
When run, it gives the following output:
0 0 True
1 1 True
2 2 True
3 3 True
4 4 True
5 5 True
6 6 True
7 7 True
8 8 True
9 9 True
10 10 True
11 11 True
12 12 True
This validates that f^(-1)(x) = 11 * x mod 13 satisfies the required property that f^(-1)(f(x)) = x.
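A more general route than spotting 6 * 2 = -1 mod 13 is the extended Euclidean algorithm, which finds the inverse for any modulus. A small sketch in R (the helper name mod_inv is mine):

mod_inv <- function(a, m) {
  # Extended Euclid: maintain the invariant r = s * a + t * m;
  # only the s coefficients are needed for the inverse of a.
  r0 <- a; r1 <- m
  s0 <- 1; s1 <- 0
  while (r1 != 0) {
    q <- r0 %/% r1
    tmp <- r0 - q * r1; r0 <- r1; r1 <- tmp
    tmp <- s0 - q * s1; s0 <- s1; s1 <- tmp
  }
  stopifnot(r0 == 1)  # an inverse exists only when gcd(a, m) == 1
  s0 %% m
}

mod_inv(6, 13)  # 11, agreeing with the derivation above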

Find list of values in list of ranges in R

I have two data frames:
set.seed(123)
myData<-data.frame(id=1:10, pos=21:30)
refData<-data.frame(id=letters[1:15], pos=sample(10:40,15))
looking like that
> myData
id pos
1 21
2 22
3 23
4 24
5 25
6 26
7 27
8 28
9 29
10 30
> refData
id pos
a 18
b 33
c 21
d 34
e 35
f 11
g 23
h 31
i 22
j 20
k 30
l 19
m 32
n 39
o 36
I want an extended version of myData: for each row in myData I want to check whether there is an entry in refData whose position is within a distance of 2, and if so, I want the matching refData ids pasted into a new column of myData.
In the end my new data frame should look like that:
id pos newColumn
1 21 c, g, i, j, l
2 22 c, g, i, j
3 23 c, g, i
4 24 g, i
5 25 g
6 26
7 27
8 28 k
9 29 h, k
10 30 h, k, m
Obviously, I could do that with the following loop, which works fine:
myData$newColumn <- rep(NA, nrow(myData))
for (i in 1:nrow(myData)) {
  ww <- which(abs(refData$pos - myData$pos[i]) <= 2)
  myData$newColumn[i] <- paste(refData$id[ww], collapse = ", ")
}
But I'm looking for a really fast way to do this, since my real myData has about 10^6 entries and my real refData has about 10^7 entries.
I really appreciate any help and ideas for a fast way to do this!
You could try:
myData$newColumn = lapply(myData$pos,
                          function(x) paste(refData$id[abs(refData$pos - x) < 3], collapse = ', '))
Output:
id pos newColumn
1 1 21 c, g, i, j, l
2 2 22 c, g, i, j
3 3 23 c, g, i
4 4 24 g, i
5 5 25 g
6 6 26
7 7 27
8 8 28 k
9 9 29 h, k
10 10 30 h, k, m
Hope this helps!
Another option would be
myData$newColumn <- sapply(myData$pos, function(x) paste(refData$id[refData$pos >= x-2 & refData$pos <= x+2], collapse = ", "))
A benchmark with n = 1000 shows @Florian's solution slightly ahead:
set.seed(123)
myData<-data.frame(id=1:1000, pos=sample(21:30, 1000, replace = T))
refData<-data.frame(id=sample(letters[1:15], 1000, replace = T), pos=sample(10:40, 1000, replace = T))
myData$newColumn<-rep(NA, nrow(myData))
library(microbenchmark)
microbenchmark(
  for (i in 1:nrow(myData)) {
    ww <- which(abs(refData$pos - myData$pos[i]) <= 2)
    myData$newColumn[i] <- paste(refData[ww, "id"], collapse = ", ")
  },
  myData$newColumn2 <- sapply(myData$pos, function(x) paste(refData$id[refData$pos >= x - 2 & refData$pos <= x + 2], collapse = ", ")),
  myData$newColumn3 <- lapply(myData$pos, function(x) paste(refData$id[abs(refData$pos - x) < 3], collapse = ", "))
)
Unit: milliseconds
                 expr      min       lq     mean   median       uq       max neval cld
 for loop (newColumn) 62.97657 64.74155 70.01541 68.81024 71.02023 206.80477   100   c
  sapply (newColumn2) 46.55872 47.90585 50.75397 50.42333 53.42990  58.01813   100   b
  lapply (newColumn3) 36.69362 37.34244 39.70480 38.54905 42.49614  46.27513   100   a
Your current problem has two main bottlenecks: (1) the nrow(myData) * nrow(refData) comparisons, and (2) the creation of possibly large character vectors by concatenating refData$id.
To overcome the first one, one way (since myData$pos is, or can be, sorted) is to use findInterval to locate the ranges that each refData$pos falls into with respect to myData$pos +/- the allowed distance (here 2). This reduces the computational complexity to about nrow(refData) * log(nrow(myData)), or possibly even less.
To save some typing:
a = myData$pos
b = refData$pos
As a start, we need to find the interval of a + 2 where each b is found:
i = findInterval(b, a + 2L, all.inside = TRUE, left.open = TRUE)
#> i
# [1] 1 9 1 9 9 1 1 8 1 1 7 1 9 9 9
We specify the intervals as (lower, upper] and avoid falling outside of the 1:(length(a) - 1) range, so we can easily calculate the first index where b is within 2 units of a:
i1 = ifelse(abs(b - a[i + 1L]) <= 2, i + 1L, NA)
i2 = ifelse(abs(b - a[i]) <= 2, i, NA)
ii = pmin(i1, i2, na.rm = TRUE)
#> ii
# [1] NA NA 1 NA NA NA 1 9 1 1 8 1 10 NA NA
We also need to locate the [lower, upper) interval of a - 2 where each b falls, and find the last index of a where b is within 2 units:
j = findInterval(b, a - 2L, all.inside = TRUE, left.open = FALSE)
j1 = ifelse(abs(b - a[j + 1L]) <= 2, j + 1L, NA)
j2 = ifelse(abs(b - a[j]) <= 2, j, NA)
jj = pmax(j1, j2, na.rm = TRUE)
#> jj
# [1] NA NA 3 NA NA NA 5 10 4 2 10 1 10 NA NA
Now we are left with the first (ii) and last (jj) index of myData$pos (a) at which each refData$pos (b) lies within 2 units (missing values denote no match).
A way to overcome the second bottleneck is to avoid it altogether, if we can keep working with this format.
Nonetheless, to proceed with representing the matches as concatenated refData$ids, we could probably use the IRanges package from here on in the hope of something efficient:
library(IRanges)
nr = 1:nrow(myData)
myrng = IRanges(nr, nr)
refrng = IRanges(ifelse(is.na(ii), 0L, ii), ifelse(is.na(jj), 0L, jj)) ## replace NA with 0
ovrs = findOverlaps(myrng, refrng)
tapply(refData$id[subjectHits(ovrs)], factor(queryHits(ovrs), nr), toString)
# 1 2 3 4 5
#"c, g, i, j, l" "c, g, i, j" "c, g, i" "g, i" "g"
# 6 7 8 9 10
# NA NA "k" "h, k" "h, k, m"
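At the sizes mentioned in the question (about 10^6 rows against 10^7), another route worth trying is a non-equi join in data.table, which likewise avoids the quadratic scan; a sketch under the same column names as above (not benchmarked here):

library(data.table)
md <- as.data.table(myData[, c("id", "pos")])
rd <- as.data.table(refData)
md[, c("lo", "hi") := .(pos - 2L, pos + 2L)]  # the +/- 2 window per myData row
joined <- rd[md, .(id1 = i.id, id2 = x.id),   # one row per (myData, refData) match
             on = .(pos >= lo, pos <= hi), allow.cartesian = TRUE]
out <- joined[, .(newColumn = toString(na.omit(id2))), by = id1]

Keeping the result in the long one-row-per-match form and skipping the toString() step avoids the second bottleneck entirely, as noted above.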

Best practices for dynamic functions in R - when a value is a function of itself at an earlier timestep

I have a series of time steps:
t <- 1:10
I have a starting value:
x <- 0.5
I combine them into a data frame:
myDF <- data.frame(T = t, X = NA)
and some dynamic update such that my value is a function of itself at t - 1:
myDF$X[1] <- x
for (i in 2:10) myDF$X[i] <- myDF$X[i - 1] + myDF$X[i - 1] * (myDF$T[i] - myDF$T[i - 1])
This simple example would yield the following:
T X
1 0.5
2 1
3 2
4 4
5 8
6 16
7 32
8 64
9 128
10 256
What is the best way to program this kind of thing in R? I have developed a number of what feel like brute force solutions. I have come close with mutate(), arrange() and lag() in dplyr, but I can only get the first two values to calculate because it does not evaluate iteratively.
Thanks!
Seems you need this (with dplyr):
library(dplyr)
x <- 0.5
myDF <- data.frame(T = t)
myDF %>% mutate(X = x * cumprod(1 + T - lag(T, default = first(T))))
# T X
#1 1 0.5
#2 2 1.0
#3 3 2.0
#4 4 4.0
#5 5 8.0
#6 6 16.0
#7 7 32.0
#8 8 64.0
#9 9 128.0
#10 10 256.0
An informal derivation: let x = myDF$X and T = myDF$T. Then
x[i] = x[i-1] + x[i-1] * (T[i] - T[i-1])
     = x[i-1] * (1 + T[i] - T[i-1])
     = x[i-2] * (1 + T[i-1] - T[i-2]) * (1 + T[i] - T[i-1])
...
     = x[1] * (1 + T[2] - T[1]) * ... * (1 + T[i] - T[i-1])
# so the vectorized solution is
x = x[1] * cumprod(1 + T - lag(T, default = T[1]))
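The cumprod() trick works because this particular recursion unrolls into a product. For recursions that do not, a base-R fallback is Reduce() with accumulate = TRUE, which really does evaluate step by step (a sketch reusing t and the starting value from above):

t <- 1:10
x0 <- 0.5
X <- Reduce(function(prev, i) prev + prev * (t[i] - t[i - 1]),  # x[i] from x[i-1]
            seq_along(t)[-1], init = x0, accumulate = TRUE)
data.frame(T = t, X = X)  # reproduces the 0.5, 1, 2, ..., 256 column above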

r and igraph Help, Assortativity Coefficient with Weighted Edges. Remaining weights vs. Total weights. (undirected graph)

A combined network-analysis and igraph/R question. It is cross-posted with Mathematics (and I will hopefully not be laughed away).
I am trying to find the assortativity coefficient for an undirected weighted graph. Edges have weights, reflecting flows of value between nodes (if you're curious, it's a graph of a marketplace with goods being exchanged between traders).
I see the standard Newman (2002) implementation of assortativity may be applied to weighted networks, replacing degree count with degree strength, a la

r = \frac{M^{-1}\sum_i s_{j(i)} s_{k(i)} - \left[ M^{-1}\sum_i \frac{1}{2}\left(s_{j(i)} + s_{k(i)}\right) \right]^2}{M^{-1}\sum_i \frac{1}{2}\left(s_{j(i)}^2 + s_{k(i)}^2\right) - \left[ M^{-1}\sum_i \frac{1}{2}\left(s_{j(i)} + s_{k(i)}\right) \right]^2}

(where i indexes the graph's M edges, F(i) = {j(i), k(i)} denotes the pair of nodes connected by edge i, and s_\phi is strength, i.e. total weight: the sum of edge weights on edges connected to node \phi.)
And I see this implemented by the R package igraph via assortativity(g, types1 = graph.strength(g)), where types1 holds the node weights.
The issue
If s_\phi were just degrees, one could use either total or remaining degrees and get the same answer either way (subtracting 1 from every degree is an affine shift, which leaves the correlation unchanged). However, using total weight (graph.strength(g) above), I am worried about introducing upward bias in my assortativity metric, since each node weight includes the weight of the very edge we are iterating over (the edge which connects those two nodes). Although I can't point to a citation at the moment, it seems like we should be finding the correlation of remaining weights.
That is to say, where s_\phi above is the sum of weights on edges connected to node \phi, including the edge that connects them, it should instead be the remaining weight: the sum of weights on edges connected to node \phi, excluding edge i.
Firstly, I am curious whether this is an issue. Working with this type of graph, will assortativity using node weights bias my correlation estimate upward? Does it just not matter?
Secondly, I am curious whether this is possible with the igraph package in R, or the NetworkX Python library. Perhaps an implementation of a weighted degree-dependent nearest-neighbor degree? It doesn't look like it, but I thought I'd ask.
I put together an example to show you what I mean:
Set up
library(igraph)
g <- read.table(
text =
" a b c d e f
a 0 4 5 8 1 0
b 4 0 1 4 8 9
c 5 1 0 4 1 0
d 8 4 4 0 2 7
e 1 8 1 2 0 1
f 0 9 0 7 1 0
",
header = T
)
g <- as.matrix(g)
g <- graph.adjacency(g, mode="undirected", weighted=TRUE)
E(g)$width <- E(g)$weight; plot(g)
igraph
assortativity.degree(g) # degree assortativity (ie unweighted)
[1] -0.4444444
assortativity(g, types1 = graph.strength(g), directed = F) #assortativity with weights
[1] -0.2421219
my remaining weights assortativity
assortativity.weightEdge <- function(g){
linkedNodes <- data.frame(
n1 = rep(NA, length(E(g))),
n2 = rep(NA, length(E(g))),
s1 = rep(NA, length(E(g))),
s2 = rep(NA, length(E(g))),
s1_remaining = rep(NA, length(E(g))),
s2_remaining = rep(NA, length(E(g))),
k_nn_1 = rep(NA, length(E(g))),
k_nn_2 = rep(NA, length(E(g)))
)
# standard Newman 2002, http://arxiv.org/pdf/cond-mat/0205405v1.pdf
# but using the "remaining strengths"
num1 = 0
num2 = 0
den1 = 0
#iterate over edges
for (i in 1:length(E(g))){
n1 = ends(g,E(g))[i,1]
n2 = ends(g,E(g))[i,2]
s1 = sum(g[n1,]) - g[n1,n2] # remaining strength: total weight minus this edge's weight
s2 = sum(g[n2,]) - g[n1,n2] # "
linkedNodes$n1[i] <- n1
linkedNodes$n2[i] <- n2
linkedNodes$s1_remaining[i] <- s1
linkedNodes$s2_remaining[i] <- s2
num1 = num1 + s1 * s2
num2 = num2 + s1 + s2
den1 = den1 + (s1^2 + s2^2)
}
num1 = num1 / length(E(g))
num2 = num2 / (length(E(g)) * 2)
num2 = num2 * num2
den1 = den1 / (length(E(g)) * 2)
print(
paste(
"Assortativity, remaining weights:",
(num1-num2) / (den1-num2))
)
##########
num1 = 0
num2 = 0
den1 = 0
#iterate over edges
for (i in 1:length(E(g))){
n1 = ends(g,E(g))[i,1]
n2 = ends(g,E(g))[i,2]
s1 = sum(g[n1,])
s2 = sum(g[n2,])
linkedNodes$s1[i] <- s1
linkedNodes$s2[i] <- s2
num1 = num1 + s1 * s2
num2 = num2 + s1 + s2
den1 = den1 + (s1^2 + s2^2)
}
num1 = num1 / length(E(g))
num2 = num2 / (length(E(g)) * 2)
num2 = num2 * num2
den1 = den1 / (length(E(g)) * 2)
print(
paste(
"Assortativity, total weights:",
(num1-num2) / (den1-num2))
)
return(linkedNodes)
}
returns:
assortativity.weightEdge(g)
[1] "Assortativity, remaining weights: -0.3579013116802"
[1] "Assortativity, total weights: -0.242121917988836"
n1 n2 s1 s2 s1_remaining s2_remaining k_nn_1 k_nn_2
1 a b 18 26 14 22 NA NA
2 a c 18 11 13 6 NA NA
3 a d 18 25 10 17 NA NA
4 a e 18 13 17 12 NA NA
5 b c 26 11 25 10 NA NA
6 b d 26 25 22 21 NA NA
7 b e 26 13 18 5 NA NA
8 b f 26 17 17 8 NA NA
9 c d 11 25 7 21 NA NA
10 c e 11 13 10 12 NA NA
11 d e 25 13 23 11 NA NA
12 d f 25 17 18 10 NA NA
13 e f 13 17 12 16 NA NA
For some evidence, note that the total-weights assortativity matches igraph's. And as suspected, the remaining-weights assortativity is a bit lower (more negative) than the one derived with total weights.
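For reference, the remaining-weights loop above can be collapsed into a few vectorized lines, because Newman's r is just the Pearson correlation of end strengths taken over the symmetrized edge list. A sketch (strength() is the newer name for graph.strength(); this should reproduce the remaining-weights figure printed above):

ep <- ends(g, E(g))                            # the two endpoint names of each edge
s1_rem <- strength(g)[ep[, 1]] - E(g)$weight   # strength minus this edge's own weight
s2_rem <- strength(g)[ep[, 2]] - E(g)$weight
cor(c(s1_rem, s2_rem), c(s2_rem, s1_rem))      # correlate over both edge orientations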
