Easier way to convert a character to an integer? - julia

Still getting a feel for what's in the Julia standard library. I can convert strings to their integer representation via the Int() constructor, but when I call Int() with a Char I don't get the integer value of the digit:
julia> Int('3')
51
Currently I'm calling string() first:
intval = Int(string(c)) # doesn't work anymore, see note below
Is this the accepted way of doing this? Or is there a more standard method? It's coming up quite a bit in my Project Euler exercise.
Note: This question was originally asked before Julia 1.0. Since it was asked, the int function was renamed to Int and became a method of the Int type object. The method Int(::String) for parsing a string to an integer was removed because of the potentially confusing difference in behavior between that and Int(::Char) discussed in the accepted answer.

The short answer is you can do parse(Int, c) to do this correctly and efficiently. Read on for more discussion and details.
The code in the question as originally asked doesn't work anymore because Int(::String) was removed from the language because of the confusing difference in behavior between it and Int(::Char). Prior to Julia 1.0, the former parsed a string as an integer whereas the latter gave the Unicode code point of the character, which meant that Int("3") would return 3 whereas Int('3') would return 51. The modern working equivalent of what the questioner was using would be parse(Int, string(c)). However, you can skip converting the character to a string (which is quite inefficient) and just write parse(Int, c).
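For example, here are both approaches in the REPL (output shown as I'd expect it on a recent Julia release):
julia> c = '3'
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)

julia> parse(Int, c)          # parses the digit value directly from the Char
3

julia> parse(Int, string(c))  # also works, but allocates a temporary String
3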
What does Int(::Char) do and why does Int('3') return 51? That is the code point value assigned to the character 3 by the Unicode Consortium, which was also the ASCII code point for it before that. Obviously, this is not the same as the digit value of the character. It would be nice if these matched, but they don't. The code points 0-9 are a bunch of non-printing "control characters" starting with the NUL byte that terminates C strings. The code points for decimal digits are at least contiguous, however:
julia> [Int(c) for c in "0123456789"]
10-element Vector{Int64}:
48
49
50
51
52
53
54
55
56
57
Because of this you can compute the value of a digit by subtracting the code point of 0 from it:
julia> [Int(c) - Int('0') for c in "0123456789"]
10-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
Since subtraction of Char values works and subtracts their code points, this can be simplified to [c-'0' for c in "0123456789"]. Why not do it this way? You can! That is exactly what you'd do in C code. If you know your code will only ever encounter c values that are decimal digits, then this works well. It doesn't, however, do any error checking whereas parse does:
julia> c = 'f'
'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)
julia> parse(Int, c)
ERROR: ArgumentError: invalid base 10 digit 'f'
Stacktrace:
[1] parse(::Type{Int64}, c::Char; base::Int64)
# Base ./parse.jl:46
[2] parse(::Type{Int64}, c::Char)
# Base ./parse.jl:41
[3] top-level scope
# REPL[38]:1
julia> c - '0'
54
Moreover, parse is a bit more flexible. Suppose you want to accept f as a hex "digit" encoding the value 15. To do that with parse you just need to use the base keyword argument:
julia> parse(Int, 'f', base=16)
15
julia> parse(Int, 'F', base=16)
15
As you can see it parses upper or lower case hex digits correctly. In order to do that with the subtraction method, your code would need to do something like this:
'0' <= c <= '9' ? c - '0' :
'A' <= c <= 'F' ? c - 'A' + 10 :
'a' <= c <= 'f' ? c - 'a' + 10 : error()
Which is actually quite close to the implementation of the parse(Int, c) method. Of course at that point it's much clearer and easier to just call parse(Int, c) which does this for you and is well optimized.
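If you did want to roll this yourself, say to avoid the keyword argument, a helper along these lines would do it. Note that digitvalue is just an illustrative name made up here, not anything from Base, and parse(Int, c, base=16) already covers the same ground with better error messages:
# Sketch only: `digitvalue` is a hypothetical helper, not part of Base.
function digitvalue(c::Char)
    '0' <= c <= '9' ? c - '0' :
    'A' <= c <= 'F' ? c - 'A' + 10 :
    'a' <= c <= 'f' ? c - 'a' + 10 :
    throw(ArgumentError("invalid hex digit: $c"))
end

digitvalue('f')  # expected: 15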

Related

How do you find the unicode value of a character in Julia?

I'm looking for something like Python's ord(char) for Julia that returns an integer.
I think you're looking for codepoint. From the documentation:
codepoint(c::AbstractChar) -> Integer
Return the Unicode codepoint (an unsigned integer) corresponding to the character c (or throw an exception if c does not represent a valid character). For Char, this is a UInt32 value, but AbstractChar types that represent only a subset of Unicode may return a different-sized integer (e.g. UInt8).
For example:
julia> codepoint('a')
0x00000061
To get the exact equivalent of Python's ord function, you might want to convert the result to a signed integer:
julia> Int(codepoint('a'))
97
You can also just do:
julia> Int('a')
97
If you have a String:
julia> s="hello";
julia> Int(s[1])
104
julia> Int(s[2])
101
julia> Int(s[5])
111

ord() Function or ASCII Character Code of String with Z3 Solver

How can I convert a z3.String to a sequence of ASCII values?
For example, here is some code that I thought would check whether the ASCII values of all the characters in the string add up to 100:
import z3
def add_ascii_values(password):
    return sum(ord(character) for character in password)
password = z3.String("password")
solver = z3.Solver()
ascii_sum = add_ascii_values(password)
solver.add(ascii_sum == 100)
print(solver.check())
print(solver.model())
Unfortunately, I get this error:
TypeError: ord() expected string of length 1, but SeqRef found
It's apparent that ord doesn't work with z3.String. Is there something in Z3 that does?
The accepted answer dates back to 2018, and things have changed in the meantime which make the proposed solution no longer work with z3. In particular:
Strings are now formalized by SMTLib. (See https://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml)
Unlike the previous version (where strings were simply sequences of bit vectors), strings are now sequences of Unicode characters. So, the coding used in the previous answer no longer applies.
Based on this, the following would be how this problem would be coded, assuming a password of length 3:
from z3 import *

s = Solver()

# Ord of character at position i
def OrdAt(inp, i):
    return StrToCode(SubString(inp, i, 1))

# Adding ascii values for a string of a given length
def add_ascii_values(password, len):
    return Sum([OrdAt(password, i) for i in range(len)])

# We'll have to force a constant length
length = 3
password = String("password")
s.add(Length(password) == length)

ascii_sum = add_ascii_values(password, length)
s.add(ascii_sum == 100)

# Also require characters to be printable so we can view them:
for i in range(length):
    v = OrdAt(password, i)
    s.add(v >= 0x20)
    s.add(v <= 0x7E)

print(s.check())
print(s.model()[password])
Note Due to https://github.com/Z3Prover/z3/issues/5773, to be able to run the above, you need a version of z3 that you downloaded on Jan 12, 2022 or afterwards! As of this date, none of the released versions of z3 contain the functions used in this answer.
When run, the above prints:
sat
" #!"
You can check that it satisfies the given constraint, i.e., the ord of characters add up to 100:
>>> sum(ord(c) for c in " #!")
100
Note that we no longer have to worry about modular arithmetic, since OrdAt returns an actual integer, not a bit-vector.
2022 Update
The answer below, written back in 2018, no longer applies, as strings in SMTLib received a major update and the code given is now outdated. Keeping it here for archival purposes, and in case you happen to have a really old z3 that you cannot upgrade for some reason. See the other answer for a variant that works with the new unicode strings in SMTLib: https://stackoverflow.com/a/70689580/936310
Old Answer from 2018
You're conflating Python strings and Z3 Strings; and unfortunately the two are quite different types.
In Z3py, a String is simply a sequence of 8-bit values. And what you can do with a Z3 String is actually quite limited; for instance you cannot iterate over the characters like you did in your add_ascii_values function. See this page for what the allowed functions are: https://rise4fun.com/z3/tutorialcontent/sequences (This page lists the functions in SMTLib parlance; but the equivalent ones are available from the z3py interface.)
There are a few important restrictions/things that you need to keep in mind when working with Z3 sequences and strings:
You have to be very explicit about the lengths; In particular, you cannot sum over strings of arbitrary symbolic length. There are a few things you can do without specifying the length explicitly, but these are limited. (Like regex matches, substring extraction etc.)
You cannot extract a character out of a string. This is an oversight in my opinion, but SMTLib just has no way of doing so for the time being. Instead, you get a list of length 1. This causes a lot of headaches in programming, but there are workarounds. See below.
Anytime you loop over a string/sequence, you have to go up to a fixed bound. There are ways to program so you can cover "all strings upto length N" for some constant "N", but they do get hairy.
Keeping all this in mind, I'd go about coding your example like the following; restricting password to be precisely 10 characters long:
from z3 import *

s = Solver()

# Work around the fact that z3 has no way of giving us an element at an index. Sigh.
ordHelperCounter = 0
def OrdAt(inp, i):
    global ordHelperCounter
    v = BitVec("OrdAtHelper_%d_%d" % (i, ordHelperCounter), 8)
    ordHelperCounter += 1
    s.add(Unit(v) == SubString(inp, i, 1))
    return v

# Your original function, but note the addition of len parameter and use of Sum
def add_ascii_values(password, len):
    return Sum([OrdAt(password, i) for i in range(len)])

# We'll have to force a constant length
length = 10
password = String("password")
s.add(Length(password) == 10)

ascii_sum = add_ascii_values(password, length)
s.add(ascii_sum == 100)

# Also require characters to be printable so we can view them:
for i in range(length):
    v = OrdAt(password, i)
    s.add(v >= 0x20)
    s.add(v <= 0x7E)

print(s.check())
print(s.model()[password])
The OrdAt function works around the problem of not being able to extract characters. Also note how we use Sum instead of sum, and how all "loops" are of fixed iteration count. I also added constraints to make all the ascii codes printable for convenience.
When you run this, you get:
sat
":X|#`y}###"
Let's check it's indeed good:
>>> len(":X|#`y}###")
10
>>> sum(ord(character) for character in ":X|#`y}###")
868
So, we did get a length 10 string; but how come the ord's don't sum up to 100? Now, you have to remember sequences are composed of 8-bit values, and thus the arithmetic is done modulo 256. So, the sum actually is:
>>> sum(ord(character) for character in ":X|#`y}###") % 256
100
To avoid the overflows, you can either use larger bit-vectors, or more simply use Z3's unbounded Integer type Int. To do so, use the BV2Int function, by simply changing add_ascii_values to:
def add_ascii_values(password, len):
    return Sum([BV2Int(OrdAt(password, i)) for i in range(len)])
Now we'd get:
unsat
That's because each of our characters has at least value 0x20 and we wanted 10 characters; so there's no way to make them all sum up to 100. And z3 is precisely telling us that. If you increase your sum goal to something more reasonable, you'd start getting proper values.
Programming with z3py is different than regular programming with Python, and z3 String objects are quite different than those of Python itself. Note that the sequence/string logic isn't even standardized yet by the SMTLib folks, so things can change. (In particular, I'm hoping they'll add functionality for extracting elements at an index!).
Having said all this, going over the https://rise4fun.com/z3/tutorialcontent/sequences would be a good start to get familiar with them, and feel free to ask further questions.

'Big' fractions in Julia

I've run across a little problem when trying to solve a Project Euler problem in Julia. I've basically written a recursive function which produces fractions with increasingly large numerators and denominators. I don't want to post the code for obvious reasons, but the last few fractions are as follows:
1180872205318713601//835002744095575440
2850877693509864481//2015874949414289041
6882627592338442563//4866752642924153522
At that point I get an OverflowError(), presumably because the numerator and/or denominator now exceeds 19 digits. Is there a way of handling 'Big' fractions in Julia (i.e. those with BigInt-type numerators and denominators)?
Addendum:
OK, I've simplified the code and disguised it a bit. If anyone wants to wade through 650 Project Euler problems to try to work out which question it is, good luck to them – there will probably be around 200 better solutions!
function series(limit::Int64, i::Int64=1, n::Rational{Int64}=1//1)
    while i <= limit
        n = 1 + 1//(1 + 2n)
        println(n)
        return series(limit, i + 1, n)
    end
end

series(50)
If I run the above function with, say, 20 as the argument it runs fine. With 50 I get the OverflowError().
Julia defaults to using machine integers. For more information on this see the FAQ: Why does Julia use native machine integer arithmetic?.
In short: the most efficient integer operations on any modern CPU involves computing on a fixed number of bits. On your machine, that's 64 bits.
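You can check the native word size and the resulting Int limits directly from the REPL; the values below are what I'd expect on a typical 64-bit machine:
julia> Sys.WORD_SIZE
64

julia> typemax(Int)
9223372036854775807

julia> typemin(Int)
-9223372036854775808
The examples below start at typemax(Int) - 2, which is why adding 3 is exactly enough to wrap around.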
julia> 9223372036854775805 + 1
9223372036854775806
julia> 9223372036854775805 + 2
9223372036854775807
julia> 9223372036854775805 + 3
-9223372036854775808
Whoa! What just happened!? That's definitely wrong! It's more obvious if you look at how these numbers are represented in binary:
julia> bitstring(9223372036854775805 + 1)
"0111111111111111111111111111111111111111111111111111111111111110"
julia> bitstring(9223372036854775805 + 2)
"0111111111111111111111111111111111111111111111111111111111111111"
julia> bitstring(9223372036854775805 + 3)
"1000000000000000000000000000000000000000000000000000000000000000"
So you can see that those 63 bits "ran out of space" and rolled over — the 64th bit there is called the "sign bit" and signals a negative number.
There are two potential solutions when you see overflow like this: you can use "checked arithmetic" — like the rational code does — that ensures you don't silently have this problem:
julia> Base.Checked.checked_add(9223372036854775805, 3)
ERROR: OverflowError: 9223372036854775805 + 3 overflowed for type Int64
Or you can use a bigger integer type — like the unbounded BigInt:
julia> big(9223372036854775805) + 3
9223372036854775808
So an easy fix here is to remove your type annotations and dynamically choose your integer types based upon limit:
function series(limit, i=one(limit), n=one(limit)//one(limit))
    while i <= limit
        n = 1 + 1//(1 + 2n)
        println(n)
        return series(limit, i + 1, n)
    end
end
julia> series(big(50))
#…
1186364911176312505629042874//926285732032534439103474303
4225301286417693889465034354//3299015554385159450361560051
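The reason this works is that one(limit) is a BigInt when limit is, so the starting rational, and everything computed from it, is a Rational{BigInt} rather than a Rational{Int64}. A quick REPL check of that:
julia> typeof(one(big(50)) // one(big(50)))
Rational{BigInt}

julia> typeof(1 // 1)
Rational{Int64}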

Find numeric placement of letters

Looking to find the numeric placement of letters in a random letter vector using a function equivalent to foo.
myletters = ["a","c","b","d","z"]
foo(myletters)
# [1,3,2,4,26]
Edit: If you're looking for the numeric distance from 'a', here's one solution:
julia> Int.(first.(["a","c","b","d","z"])) - Int('a') + 1
5-element Array{Int64,1}:
1
3
2
4
26
It will gracefully handle unicode (those simply are later code points and thus will have larger values) and longer strings (by only looking at the first character). Capitals, numbers, and some symbols will appear as negative numbers since their code points come before a.
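For example, with single characters standing in for the one-character strings (illustrative values):
julia> Int('A') - Int('a') + 1
-31

julia> Int('é') - Int('a') + 1
137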
Previous answer: I think you're looking for sortperm. It gives you a vector of indices that, if you index back into the original array with it, will put it in sorted order.
julia> sortperm(["a","c","b","d"])
4-element Array{Int64,1}:
1
3
2
4
I came up with the somewhat convoluted solution:
[reshape((1:26)[myletters[i] .== string.('a':'z')],1)[1] for i=1:length(myletters)]
Or using map
map(x -> reshape((1:26)[x .== string.('a':'z')],1)[1], myletters)
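If it helps, here is a less convoluted variant along the same lines; a sketch assuming the input is single lowercase a-z letters, with foo just being the placeholder name from the question:
# Hypothetical simpler version: look up each letter's position in 'a':'z'.
foo(letters) = [findfirst(==(only(l)), 'a':'z') for l in letters]

foo(["a","c","b","d","z"])  # expected: [1, 3, 2, 4, 26]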

Why 2 ^ 3 ^ 4 = 0 in Julia?

I just read a post from Quora:
http://www.quora.com/Is-Julia-ready-for-production-use
At the bottom, there's an answer said:
2 ^ 3 ^ 4 = 0
I tried it myself:
julia> 2 ^ 3 ^ 4
0
Personally I don't consider this a bug in the language. We can add parenthesis for clarity, both for Julia and for our human beings:
julia> (2 ^ 3) ^ 4
4096
So far so good; however, this doesn't work:
julia> 2 ^ (3 ^ 4)
0
Since I'm learning, I'd like to know: how does Julia evaluate this expression to 0? What's the evaluation precedence?
julia> typeof(2 ^ 3 ^ 4)
Int64
I'm surprised I couldn't find a duplicate question about this on SO yet. I figure I'll answer this slightly differently than the FAQ in the manual since it's a common first question. Oops, I somehow missed: Factorial function works in Python, returns 0 for Julia
Imagine you've been taught addition and multiplication, but never learned any numbers higher than 99. As far as you're concerned, numbers bigger than that simply don't exist. So you learned to carry ones into the tens column, but you don't even know what you'd call the column you'd carry tens into. So you just drop them. As long as your numbers never get bigger than 99, everything will be just fine. Once you go over 99, you wrap back down to 0. So 99+3 ≡ 2 (mod 100). And 52*9 ≡ 68 (mod 100). Any time you do a multiplication with more than two factors of 10, your answer will be zero: 25*32 ≡ 0 (mod 100). Now, after you do each computation, someone could ask you "did you go over 99?" But that takes time to answer… time that could be spent computing your next math problem!
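To make that analogy concrete, here's the same mod-100 arithmetic spelled out in Julia:
julia> mod(99 + 3, 100)
2

julia> mod(52 * 9, 100)
68

julia> mod(25 * 32, 100)
0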
This is effectively how computers natively do arithmetic, except they do it in binary with 64 bits. You can see the individual bits with the bits function:
julia> bits(45)
"0000000000000000000000000000000000000000000000000000000000101101"
As we multiply it by 2, 101101 will shift to the left (just like multiplying by 10 in decimal):
julia> bits(45 * 2)
"0000000000000000000000000000000000000000000000000000000001011010"
julia> bits(45 * 2 * 2)
"0000000000000000000000000000000000000000000000000000000010110100"
julia> bits(45 * 2^58)
"1011010000000000000000000000000000000000000000000000000000000000"
julia> bits(45 * 2^60)
"1101000000000000000000000000000000000000000000000000000000000000"
… until it starts falling off the end. If you multiply more than 64 twos together, the answer will always be zero (just like multiplying more than two tens together in the example above). We can ask the computer if it overflowed, but doing so by default for every single computation has some serious performance implications. So in Julia you have to be explicit. You can either ask Julia to check after a specific multiplication:
julia> Base.checked_mul(45, 2^60) # or checked_add for addition
ERROR: OverflowError()
in checked_mul at int.jl:514
Or you can promote one of the arguments to a BigInt:
julia> bin(big(45) * 2^60)
"101101000000000000000000000000000000000000000000000000000000000000"
In your example, ^ associates to the right, so 2 ^ 3 ^ 4 is evaluated as 2 ^ (3 ^ 4) = 2^81, and you can see that the answer is 1 followed by 81 zeros when you use big integer arithmetic:
julia> bin(big(2) ^ 3 ^ 4)
"1000000000000000000000000000000000000000000000000000000000000000000000000000000000"
For more details, see the FAQ: why does julia use native machine integer arithmetic?

Resources