I want to make a short URL service for 2 million assets, but I want to use the fewest possible characters.
What is the math equation that I would need to use to figure it out? I know it has something to do with factorials, right?
It's not a factorial problem, but an exponential one.
If x is the number of possible characters, you need to solve the following equation for y:
x^y = 2000000
If you want to use all numbers and case-sensitive alpha [0-9A-Za-z], you have 62 possible values. This means you need to solve:
62^y = 2000000
y*log(62) = log(2000000)
y = log(2000000) / log(62)
y = 3.5154313828...
Of course, you can't have 3.5 characters in your URL, so you would need 4. If you want to change the character set you are using for your URLs, simply re-solve the problem above using the number of values in your set.
Note: Solving this equation assumes fixed-length URLs. For variable-length URLs, see Rob's answer.
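The fixed-length calculation above can also be done with integer arithmetic, which avoids any floating-point rounding near exact powers. This is only a minimal sketch; the function name min_fixed_length is made up for illustration.

```python
def min_fixed_length(n_items: int, alphabet_size: int) -> int:
    """Return the smallest y such that alphabet_size ** y >= n_items."""
    if n_items < 1 or alphabet_size < 2:
        raise ValueError("need n_items >= 1 and alphabet_size >= 2")
    length = 0
    capacity = 1  # alphabet_size ** 0
    # Grow the length until the number of distinct strings covers n_items.
    while capacity < n_items:
        capacity *= alphabet_size
        length += 1
    return length


print(min_fixed_length(2_000_000, 62))  # 4, matching the rounded-up logarithm
```

The loop multiplies instead of taking logarithms, so the result stays exact even for very large counts.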
@jheddings is close, and got the right answer, but the math was not quite correct. Don't forget you are not limited to all the permutations of characters of a specific length. You can also leverage URLs of length 1 through y characters. Therefore we want to solve this sum for y:
x + x^2 + x^3 + ... + x^y = 2000000
Fortunately, there is a closed form for that sum:
x + x^2 + x^3 + ... + x^y = x*(x^y - 1)/(x-1) = 2000000
x is the number of possible characters in our range. For simplicity's sake, let's assume it only includes lowercase, uppercase, and numbers (26+26+10 = 62).
Then we get the following equation:
2000000 = (62^(y+1) - 62)/(62-1)
2000000 = (62^(y+1) - 62)/(61)
2000000 * 61 = 62^(y+1) - 62
122000000 = 62^(y+1) - 62
122000000 + 62 = 62^(y+1)
122000062 = 62^(y+1)
log(122000062) = (y+1) * log(62)
log(122000062) / log(62) = y+1
4.511492 = y+1
3.511492 = y
And, as you said, 3.5 characters is impossible so 4 are required. Admittedly the difference doesn't matter in this case. However, in certain scenarios (especially when dealing with base 2) it is very important.
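The variable-length count can be checked the same way by accumulating x + x^2 + ... + x^y until it covers the number of assets. This is a rough sketch, and min_variable_length is a hypothetical name, not something from the answer above.

```python
def min_variable_length(n_items: int, alphabet_size: int) -> int:
    """Smallest y such that x + x^2 + ... + x^y >= n_items (x = alphabet_size)."""
    if n_items < 1 or alphabet_size < 2:
        raise ValueError("need n_items >= 1 and alphabet_size >= 2")
    length = 0
    total = 0   # number of strings of length 1..length
    block = 1   # alphabet_size ** length
    while total < n_items:
        block *= alphabet_size
        total += block
        length += 1
    return length


print(min_variable_length(2_000_000, 62))  # 4
print(min_variable_length(6, 2))           # 2
```

The second call shows the base 2 case where the distinction matters: 6 items fit in strings of length 1 and 2 (2 + 4 = 6), whereas fixed-length strings would need 3 characters because 2^2 = 4 is too few.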
Number of possible short URLs = (Number of possible different characters in ID) raised to the power of (Length of ID in url)
For instance, if you're only using lowercase characters (of which there are 26) and your URLs look like http://domain.com/XXXXX (for your unique id's of 5 characters), then you can make 26^5 = 11,881,376 short urls.
If you were using upper and lower case letters, you'd have 52, so 52^5 = 380,204,032 possible short URLs, et cetera.
You need to answer a number of questions, like what kinds of characters you want to allow in your set.
All letters and all digits? base 36 (5 characters can fit 2mil+)
Distinguish between upper and lowercase? That gets you to base 62 (4 characters)
Remove easily-mistaken characters and numbers (e.g. i/l 0/o)? roughly base 32 (also 5 characters)
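To compare those options side by side, something like the following sketch could be used. The exact alphabets are assumptions: in particular, the "unambiguous" set here is just base 36 with i, l, 0 and o removed, which happens to leave 32 characters.

```python
import string
from itertools import count


def min_length(n_items: int, alphabet: str) -> int:
    """Shortest fixed length whose string count covers n_items."""
    base = len(alphabet)
    return next(y for y in count(1) if base ** y >= n_items)


BASE36 = string.digits + string.ascii_lowercase

ALPHABETS = {
    "base 36": BASE36,
    "base 62": string.digits + string.ascii_letters,
    # Assumed "unambiguous" set: drop the easily confused i, l, 0, o.
    "base 32": "".join(c for c in BASE36 if c not in "il0o"),
}

for name, alphabet in ALPHABETS.items():
    print(name, len(alphabet), min_length(2_000_000, alphabet))
```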
You can often solve this kind of problem without any math wizardry.
26+26+10 = 62 characters
Try 1. 62 = 62
Try 2. 62*62 = 3,844
Try 3. 62*62*62 = 238,328
Try 4. 62*62*62*62 = 14,776,336
So 4 is your answer :)
According to the HTTP/URI Spec you can additionally use the following "unreserved characters": ALPHA / DIGIT / "-" / "." / "_" / "~"
That adds an additional 4 characters to your radix and thus
Math.log(2000000) / Math.log(66) = 3.4629721616408813
Although this still means you will end up with a 4 character URL path at maximum.
How can I convert a z3.String to a sequence of ASCII values?
For example, here is some code that I thought would check whether the ASCII values of all the characters in the string add up to 100:
import z3
def add_ascii_values(password):
    return sum(ord(character) for character in password)
password = z3.String("password")
solver = z3.Solver()
ascii_sum = add_ascii_values(password)
solver.add(ascii_sum == 100)
print(solver.check())
print(solver.model())
Unfortunately, I get this error:
TypeError: ord() expected string of length 1, but SeqRef found
It's apparent that ord doesn't work with z3.String. Is there something in Z3 that does?
The accepted answer dates back to 2018, and things have changed in the meantime, so the proposed solution no longer works with z3. In particular:
Strings are now formalized by SMTLib. (See https://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml)
Unlike the previous version (where strings were simply sequences of bit vectors), strings are now sequences of Unicode characters. So, the coding used in the previous answer no longer applies.
Based on this, the following would be how this problem would be coded, assuming a password of length 3:
from z3 import *

s = Solver()

# Ord of character at position i
def OrdAt(inp, i):
    return StrToCode(SubString(inp, i, 1))

# Adding ascii values for a string of a given length
def add_ascii_values(password, len):
    return Sum([OrdAt(password, i) for i in range(len)])

# We'll have to force a constant length
length = 3
password = String("password")
s.add(Length(password) == length)
ascii_sum = add_ascii_values(password, length)
s.add(ascii_sum == 100)

# Also require characters to be printable so we can view them:
for i in range(length):
    v = OrdAt(password, i)
    s.add(v >= 0x20)
    s.add(v <= 0x7E)

print(s.check())
print(s.model()[password])
Note: Due to https://github.com/Z3Prover/z3/issues/5773, to be able to run the above, you need a version of z3 that you downloaded on Jan 12, 2022 or afterwards! As of this date, none of the released versions of z3 contain the functions used in this answer.
When run, the above prints:
sat
" #!"
You can check that it satisfies the given constraint, i.e., the ord of characters add up to 100:
>>> sum(ord(c) for c in " #!")
100
Note that we no longer have to worry about modular arithmetic, since OrdAt returns an actual integer, not a bit-vector.
2022 Update
The answer below, written back in 2018, no longer applies: strings in SMTLib received a major update, so the code given is outdated. I'm keeping it here for archival purposes, and in case you happen to have a really old z3 that you cannot upgrade for some reason. See the other answer for a variant that works with the new Unicode strings in SMTLib: https://stackoverflow.com/a/70689580/936310
Old Answer from 2018
You're conflating Python strings and Z3 Strings; and unfortunately the two are quite different types.
In Z3py, a String is simply a sequence of 8-bit values. And what you can do with a Z3 String is actually quite limited; for instance you cannot iterate over the characters like you did in your add_ascii_values function. See this page for what the allowed functions are: https://rise4fun.com/z3/tutorialcontent/sequences (This page lists the functions in SMTLib parlance; but the equivalent ones are available from the z3py interface.)
There are a few important restrictions/things that you need to keep in mind when working with Z3 sequences and strings:
You have to be very explicit about the lengths; in particular, you cannot sum over strings of arbitrary symbolic length. There are a few things you can do without specifying the length explicitly, but these are limited. (Like regex matches, substring extraction etc.)
You cannot extract a character out of a string. This is an oversight in my opinion, but SMTLib just has no way of doing so for the time being. Instead, you get a list of length 1. This causes a lot of headaches in programming, but there are workarounds. See below.
Anytime you loop over a string/sequence, you have to go up to a fixed bound. There are ways to program so you can cover "all strings up to length N" for some constant "N", but they do get hairy.
Keeping all this in mind, I'd go about coding your example like the following; restricting password to be precisely 10 characters long:
from z3 import *

s = Solver()

# Work around the fact that z3 has no way of giving us an element at an index. Sigh.
ordHelperCounter = 0
def OrdAt(inp, i):
    global ordHelperCounter
    v = BitVec("OrdAtHelper_%d_%d" % (i, ordHelperCounter), 8)
    ordHelperCounter += 1
    s.add(Unit(v) == SubString(inp, i, 1))
    return v

# Your original function, but note the addition of len parameter and use of Sum
def add_ascii_values(password, len):
    return Sum([OrdAt(password, i) for i in range(len)])

# We'll have to force a constant length
length = 10
password = String("password")
s.add(Length(password) == length)
ascii_sum = add_ascii_values(password, length)
s.add(ascii_sum == 100)

# Also require characters to be printable so we can view them:
for i in range(length):
    v = OrdAt(password, i)
    s.add(v >= 0x20)
    s.add(v <= 0x7E)

print(s.check())
print(s.model()[password])
The OrdAt function works around the problem of not being able to extract characters. Also note how we use Sum instead of sum, and how all "loops" are of fixed iteration count. I also added constraints to make all the ascii codes printable for convenience.
When you run this, you get:
sat
":X|#`y}###"
Let's check it's indeed good:
>>> len(":X|#`y}###")
10
>>> sum(ord(character) for character in ":X|#`y}###")
868
So, we did get a length 10 string; but how come the ord's don't sum up to 100? Now, you have to remember sequences are composed of 8-bit values, and thus the arithmetic is done modulo 256. So, the sum actually is:
>>> sum(ord(character) for character in ":X|#`y}###") % 256
100
To avoid the overflows, you can either use larger bit-vectors, or more simply use Z3's unbounded Integer type Int. To do so, use the BV2Int function, by simply changing add_ascii_values to:
def add_ascii_values(password, len):
    return Sum([BV2Int(OrdAt(password, i)) for i in range(len)])
Now we'd get:
unsat
That's because each of our characters has at least value 0x20 and we wanted 10 characters; so there's no way to make them all sum up to 100. And z3 is precisely telling us that. If you increase your sum goal to something more reasonable, you'd start getting proper values.
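The bound argument can be sanity-checked in plain Python without z3. This is only a sketch of the arithmetic; sum_is_reachable is a hypothetical helper, and it assumes every character is constrained to the printable range 0x20..0x7E as in the code above.

```python
def sum_is_reachable(length: int, target: int, lo: int = 0x20, hi: int = 0x7E) -> bool:
    """True if `length` integers, each in [lo, hi], can add up to `target`."""
    # Every integer between the minimum and maximum total is attainable,
    # because one character can be bumped by 1 at a time.
    return length * lo <= target <= length * hi


print(sum_is_reachable(10, 100))  # False: the minimum total is 10 * 0x20 = 320
print(sum_is_reachable(3, 100))   # True: 3 * 0x20 = 96 <= 100
```

So with 10 printable characters, any target from 320 to 1260 should be satisfiable once the arithmetic is done over unbounded integers.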
Programming with z3py is different than regular programming with Python, and z3 String objects are quite different than those of Python itself. Note that the sequence/string logic isn't even standardized yet by the SMTLib folks, so things can change. (In particular, I'm hoping they'll add functionality for extracting elements at an index!).
Having said all this, going over the https://rise4fun.com/z3/tutorialcontent/sequences would be a good start to get familiar with them, and feel free to ask further questions.
I know how to convert from base b to decimal, but I don't understand what base b actually is. I know that to convert to base 10 we multiply the digits by powers of the base, but what exactly is base b (for example, base 5)?
In order to follow this, we should understand the difference between a number and its representation. Let's start with the (natural) numbers. There are two special numbers: zero and one. Zero is the neutral element of addition (i.e. you can add zero to anything without changing it) and one is the neutral element of multiplication. Every other number can be induced by these two numbers. Start with zero. Then, subsequently add one.
A common representation for numbers is the decimal system. However, this is purely arbitrary and any other system could be used as well. There is nothing intrinsic in the number twelve that would require us to write it as 12. The nice thing is that all arithmetic rules are defined on the numbers themselves, not on their representations. Five plus six will always be eleven. No matter how you represent them. You may have already noticed that I use number words when I talk about the numbers and any other representation if I talk about the representation.
Ok, so we have our numbers. Now we need a way to represent them. Imagine we have three symbols a, b, and c. We could just assign the first three numbers to them
a (zero)
b (one)
c (two)
But then we are out of symbols. As you know, the positional numeral systems solve this by introducing another position. Then, just continue as before. Assign the next few numbers in order
ba (three)
bb (four)
bc (five)
ca (six)
cb (seven)
cc (eight)
You might want to continue with a third position:
baa (nine)
bab (ten)
bac (eleven)
...
The base of this system is three (or ba) because we have three symbols. We can observe that the digits in the second position stand for an addition of a multiple of three (b. stands for three + ., c. stands for two times three + . ...) Expressed in base ba, this is: b. = b * ba + ., c. = c * ba + .. This continues to all positions and you can generalize that a number formed of digits dn ... d1 d0 can be expressed by the well-known formula:
n = Sum(i) di * base^i
The intuition behind this formula is that there will be base numbers with one digit, base^2 numbers with two digits and so on. And the di * base^i term skips the first few of them (as many such that the first digit matches, then the second and so on).
We can check this at the example of bac which should be eleven:
n = b * ba^c + a * ba^b + c * ba^a
= one * three^two + zero * three^one + two * three^zero
= nine + zero + two
= eleven
= bac
Remember that the arithmetic rules apply to the numbers and not to the representations? So since we know the definition of our number (second line in the above formula), we can use any other number representation. For example, the decimal one:
n = one * three^two + zero * three^one + two * three^zero
= 1 * 3^2 + 0*3^1 + 2*3^0
= 9 + 0 + 2
= 11 (decimal)
But we could also use another base, e.g. base-8:
n = one * three^two + zero * three^one + two * three^zero
= 1 * 3^2 + 0*3^1 + 2*3^0
= 11 + 0 + 2
= 13 (octal)
So basically, these systems arise naturally by assigning digit sequences systematically to subsequent numbers. The conversion is so simple because the positional equation applies to the numbers, not to the representations.
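The symbol assignment described above can be sketched in a few lines of Python. The function names to_symbols and from_symbols are invented for this illustration; the base is simply the number of symbols supplied.

```python
def to_symbols(number: int, symbols: str) -> str:
    """Write a non-negative integer using the given digit symbols."""
    base = len(symbols)
    if number == 0:
        return symbols[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(symbols[remainder])  # least significant digit first
    return "".join(reversed(digits))


def from_symbols(text: str, symbols: str) -> int:
    """Apply n = sum(d_i * base^i) to recover the number."""
    base = len(symbols)
    value = 0
    for ch in text:
        value = value * base + symbols.index(ch)
    return value


print(to_symbols(11, "abc"))       # bac
print(to_symbols(11, "01234567"))  # 13
print(from_symbols("bac", "abc"))  # 11
```

The number eleven stays the same throughout; only the set of symbols, and therefore the written representation, changes.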
I hope this answer was not too abstract and helped you.
I need to represent very large numbers (on the order of 8 kB or more) using a mathematical expression consisting of smaller numbers. One form of this would be factoring, as in 81 = 9 * 9. But for some numbers, factoring would not work. In that case, a more complex expression would be required.
For example, the number 78125 can be represented by two smaller numbers 78125 = 5 ^ 7. But 78121 cannot be factored. So the expression would be 78121 = 5 ^ 7 - 4. For the number sizes I'm interested in, the expression might be something like n = (511 ^ 213) * (327 ^ 400)
What type of algorithm would allow me to calculate these types of short-cut expressions?
(Edited to try to make it more clear)
In R, I need to create a vector b = (1, 1+pi, 1+2pi, 1+3pi,...,1+19pi). I am unsure how to do this. I keep trying to use the seq command (i.e. seq(1, 1+npi n = 1:19) and that's totally wrong!), but don't know the proper syntax to make it work, thus it never does.
Any help would be appreciated.
R needs the multiplication operator.
b <- 1+ seq(0,19)*pi
Or slightly faster in situations where speed might matter:
b <- 1+ seq.int(0,19)*pi
You could use the equivalent:
b <- 1+ 0:19*pi
Because the ":" operator has very high precedence (see ?Syntax), this is reasonably safe. Just be careful about precedence when a minus or plus sign might be parsed as a binary operator: spaces are ignored, and unary minus has higher precedence than the single colon, but binary minus or plus has lower precedence:
> 1: 5+5
[1] 6 7 8 9 10
You should use simply 0:19 * pi + 1. Using seq is not so nice: seq(1, 1 + 19 * pi, by = pi) or seq(1, 1 + 19 * pi, length = 20).
Additional details:
X is any positive integer 6 digits or less.
X is left-padded with zeros to maintain a width of 6.
Please explain your answer :)
(This might be better in the Math site, but figured it involves programming functions)
The picture from the German Wikipedia article is very helpful:
You see that 6 consecutive bits from the original bytes generate a Base64 value. To generate + or / (codes 62 and 63), you'd need the bitstrings 111110 and 111111, so at least 5 consecutive bits set.
However, look at the ASCII codes for 0...9:
00110000
00110001
00110010
00110011
00110100
00110101
00110110
00110111
00111000
00111001
No matter how you concatenate six of those, there won't be more than 3 consecutive bits set. So it's not possible to generate a Base64 string that contains + or / this way, Y will always be alphanumeric.
EDIT: In fact, you can even rule other Base64 values out like 000010 (C), so this leads to nice follow-up questions/puzzles like "How many of the 64 values are possible at all?".
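The argument can be checked by brute force. The sketch below assumes the input is the ASCII string of six decimal digits and that standard Base64 is used. Since Base64 encodes each group of 3 bytes independently, it is enough to try all 1000 three-digit strings; reachable_base64_chars is a made-up name.

```python
import base64
from itertools import product


def reachable_base64_chars() -> set:
    """Collect every Base64 character that can appear when encoding digit strings."""
    seen = set()
    # Each 3-byte group maps to 4 output characters on its own, so the two
    # halves of a 6-digit string can be examined separately.
    for digits in product("0123456789", repeat=3):
        chunk = "".join(digits).encode("ascii")
        seen.update(base64.b64encode(chunk).decode("ascii"))
    return seen


chars = reachable_base64_chars()
print("+" in chars or "/" in chars)        # False
print(len(chars), "".join(sorted(chars)))
```

Under these assumptions the set contains 25 of the 64 values, which answers the follow-up question, and C is not among them.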