I am currently working on a project that will become the foundation for some bigger research on Rust's crypto landscape, and I am currently facing an issue with RustCrypto's AES implementation when using CBC as the block mode.
I wrote benchmarks using criterion and its cycles-per-byte plugin for ECB, CBC, CTR and GCM to measure how they perform in terms of cycles per byte, and I am getting very reasonable results for ECB, CTR and GCM. In fact, the results for these three modes are exactly as they are supposed to be.
CBC, however, does not produce the expected results, and I do not understand how or why.
Here's the premise: all benchmarks are executed on an i7-8700K with a 3.7 GHz core frequency and Turbo Boost disabled. According to Intel's specifications, this CPU has an AES pipeline length of 4.
The correct number of cycles per byte should therefore be 2.5 cpb for CBC-128 encryption. This I was able to verify with a basic benchmark in C.
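For reference, the expected figure follows directly from CBC's serial dependency chain (assuming the "pipeline length" above is the per-round aesenc latency): each block's ciphertext is needed before the next block can start, so the 4-cycle latency applies to every one of AES-128's 10 rounds:

```
cycles per block = 10 rounds × 4 cycles/round = 40 cycles
cycles per byte  = 40 cycles / 16 bytes       = 2.5 cpb
```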
Now the problem: my CBC-128 benchmarks in Rust produce a result of 4.5 cpb, so 2 cpb above the "normal".
Could somebody take a look at my code?
Here's the content of aes-cbc.rs:
use aes::cipher::{BlockEncryptMut, KeyIvInit};
use aes::{Aes128, Aes192, Aes256};
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use criterion_cycles_per_byte::CyclesPerByte;
use RustCrypto_AES_Benchmarks as benches;
type Aes128CbcEnc = cbc::Encryptor<Aes128>;
type Aes192CbcEnc = cbc::Encryptor<Aes192>;
type Aes256CbcEnc = cbc::Encryptor<Aes256>;
pub const KB: usize = 1024;
fn bench(c: &mut Criterion<CyclesPerByte>) {
    let mut group = c.benchmark_group("aes-cbc");

    let mut cipher128 = Aes128CbcEnc::new(&Default::default(), &Default::default());
    let mut cipher192 = Aes192CbcEnc::new(&Default::default(), &Default::default());
    let mut cipher256 = Aes256CbcEnc::new(&Default::default(), &Default::default());

    for size in &[KB, 2 * KB, 4 * KB, 8 * KB, 16 * KB] {
        let mut buf = vec![Default::default(); *size / 16];

        group.throughput(Throughput::Bytes(*size as u64));

        group.bench_function(BenchmarkId::new("encrypt-128", size), |b| {
            b.iter(|| cipher128.encrypt_blocks_mut(&mut buf));
        });
        group.bench_function(BenchmarkId::new("encrypt-192", size), |b| {
            b.iter(|| cipher192.encrypt_blocks_mut(&mut buf));
        });
        group.bench_function(BenchmarkId::new("encrypt-256", size), |b| {
            b.iter(|| cipher256.encrypt_blocks_mut(&mut buf));
        });
    }

    group.finish();
}

criterion_group!(
    name = benches;
    config = Criterion::default().with_measurement(CyclesPerByte);
    targets = bench
);
criterion_main!(benches);
The entire project can be found here:
https://github.com/Schmid7k/RustCrypto-AES-Benchmarks
I wanted to reinvent the wheel (a reference-counting smart pointer), and I am not sure how to properly free the memory leaked with Box::into_raw(). Once the reference count goes to zero, I don't know how to efficiently free the memory that is being pointed to.
I originally went with
impl<T> Drop for SafePtr<T> {
    fn drop(&mut self) {
        //println!("drop, {} refs", self.get_refs());
        self.dec_refs();
        let ref_count = self.get_refs();
        if ref_count == 0usize {
            unsafe {
                let _ = Box::from_raw(self.ptr);
                let _ = Box::from_raw(self.refs);
            };
            println!("Dropped all pointed values");
        };
    }
}
but I was wondering if ptr::drop_in_place() would work the same, if not better, since it won't have to construct a Box just to drop it.
As you can see from the documentation on into_raw, drop_in_place alone is not enough; to free the memory you also have to call dealloc:
use std::alloc::{dealloc, Layout};
use std::ptr;

let x = Box::new(String::from("Hello"));
let p = Box::into_raw(x);
unsafe {
    ptr::drop_in_place(p);
    dealloc(p as *mut u8, Layout::new::<String>());
}
As for performance, both methods compile to the exact same instructions, so I'd just use drop(Box::from_raw(ptr)) to save myself the hassle of remembering the dealloc where applicable.
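To make the equivalence concrete, here is a small sketch (the Tracked type and the drop counter are made up for illustration) showing that both routes run the destructor exactly once and free the allocation:

```rust
use std::alloc::{dealloc, Layout};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// A type whose destructor we can observe. It carries a field so that the
// Box actually allocates (dealloc must not be called for zero-sized types).
struct Tracked(u64);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Runs both deallocation strategies and returns how many drops were observed.
fn run_both() -> usize {
    // Variant 1: drop the value in place, then free the allocation manually.
    let p = Box::into_raw(Box::new(Tracked(1)));
    unsafe {
        ptr::drop_in_place(p);
        dealloc(p as *mut u8, Layout::new::<Tracked>());
    }

    // Variant 2: rebuild the Box and let its Drop impl do both steps.
    let p = Box::into_raw(Box::new(Tracked(2)));
    unsafe {
        drop(Box::from_raw(p));
    }

    DROPS.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(run_both(), 2);
}
```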
I'm currently trying to write a function that is generally equivalent to numpy's tile. Currently, each time I try to return an (altered or unaltered) clone of the input array, I get an error about an overflow, and cargo prompts me to increase the recursion limit. However, this function isn't recursive, so I'm assuming it's happening somewhere in the implementation.
here is the stripped down function, (full version):
pub fn tile<A, D1, D2>(arr: &Array<A, D1>, reps: Vec<usize>) -> Array<A, D2>
where
    A: Clone,
    D1: Dimension,
    D2: Dimension,
{
    let num_of_reps = reps.len();

    // just clone the array if reps is all ones
    let mut res = arr.clone();
    let mut bail_flag = true;
    for &x in reps.iter() {
        if x != 1 {
            bail_flag = false;
        }
    }

    if bail_flag {
        let mut res_dim = res.shape().to_owned();
        _new_shape(num_of_reps, res.ndim(), &mut res_dim);
        res.to_shape(res_dim);
        return res;
    }

    ...
    // otherwise do extra work
    ...

    return res.reshape(shape_out);
}
This is the actual error I'm getting on returning res:
overflow evaluating the requirement `&ArrayBase<_, _>: Neg`
consider increasing the recursion limit by adding a `#![recursion_limit = "1024"]` attribute to your crate (`mfcc_2`)
required because of the requirements on the impl of `Neg` for `&ArrayBase<_, _>`
511 redundant requirements hidden
required because of the requirements on the impl of `Neg` for `&ArrayBase<OwnedRepr<A>, D1>`rustcE0275
I looked at the implementation of Neg in ndarray; it doesn't seem to be recursive, so I'm a little confused as to what is going on.
P.S. I'm aware there are other errors in this code, as those appeared after I switched from A to f64 (the actual type I plan on using the function with), but those are mostly trivial to fix. Still, if you have suggestions on any error you see, I appreciate them nonetheless.
I am trying to port this Python prime number generator to Rust using Rust generators and this generator-to-iterator wrapper.
My problem is that the original implementation is recursive, and I didn't manage to get past the following error:
error[E0720]: opaque type expands to a recursive type
--> src/main.rs:27:29
|
27 | fn recursive_generator() -> impl Iterator<Item = u64> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^ expands to a recursive type
|
= note: expanded type is `GeneratorToIterator<[generator#src/main.rs:28:25:
48:6 {u64, (), impl std::iter::Iterator}]>`
Below is the implementation of the recursive generator producing this error:
fn recursive_generator() -> impl Iterator<Item = u64> {
    GeneratorToIterator(move || {
        // Yield a few values
        yield 1;
        yield 2;
        yield 3;

        // Initialize the inner generator
        let mut inner_value: u64 = 0;
        let mut inner_generator = recursive_generator();

        // Get the first value of the inner generator
        match inner_generator.next() {
            Some(x) => inner_value += x,
            None => {}
        }

        // Yield some other values
        yield 4;
        yield 5;
        yield 6;
    })
}
The full implementation (with the GeneratorToIterator definition) can be found here.
I found this related question but I did not manage to compile their gen_to_iter function.
EDIT: Thanks to @bluss' answer, I've finally been able to implement a working version of the prime number generator in Rust.
The error explanation E0720 mentions that an impl Trait type must expand to one that doesn't contain other impl Trait types, but here the type is of course recursive, since that's the point.
This can be worked around by using boxed trait objects instead: Box<dyn Iterator<Item = u64>> works well here and avoids the problem.
Either adjust recursive_generator to return Box<dyn Iterator<Item = u64>>, or change the line of the recursive call to use the boxed iterator just for that recursive case.
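A minimal sketch of the boxing workaround, using a depth-limited recursion in place of the prime generator (the countdown function is made up for illustration). Returning Box<dyn Iterator<Item = u64>> gives the recursive case a fixed-size type, which sidesteps E0720:

```rust
// A recursive function returning an iterator. Because the recursive call
// also returns a boxed iterator, the return type no longer expands
// infinitely the way an `impl Iterator` return type would.
fn countdown(n: u64) -> Box<dyn Iterator<Item = u64>> {
    if n == 0 {
        Box::new(std::iter::empty())
    } else {
        Box::new(std::iter::once(n).chain(countdown(n - 1)))
    }
}

fn main() {
    let v: Vec<u64> = countdown(3).collect();
    assert_eq!(v, vec![3, 2, 1]);
}
```

The trade-off is one heap allocation and dynamic dispatch per level of recursion, which is usually acceptable for generator-style code.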
I'm trying to remove some elements from a vector based on a predicate, and to collect the result. Here's a (non-working) example with the expected result:
let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];
let drained: Vec<i32> = v.iter().filter(|e| (*e) % 2 == 0).drain(..).collect();
assert_eq!(v, vec![1, 3, 5]);
assert_eq!(drained, vec![2, 4, 6]);
This results in the error
error[E0599]: no method named `drain` found for type `std::iter::Filter<std::slice::Iter<'_, i32>, [closure#src/main.rs:4:45: 4:62]>` in the current scope
--> src/main.rs:4:64
|
4 | let drained: Vec<i32> = v.iter().filter(|e| (*e) % 2 == 0).drain(..).collect();
| ^^^^^
There are several alternatives I looked at, none of them seem to be doing what I want:
Vec::retain removes the elements from the vector, but doesn't give back ownership of the removed elements.
v.drain(..).filter(condition).collect() returns the correct value for drained but empties the whole vector.
Not in stable Rust 1.33.0. There's an unstable nightly feature called drain_filter that does exactly what you want:
#![feature(drain_filter)]

fn main() {
    let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];
    let drained: Vec<i32> = v.drain_filter(|&mut e| e % 2 == 0).collect();
    assert_eq!(v, vec![1, 3, 5]);
    assert_eq!(drained, vec![2, 4, 6]);
}
As a stable workaround, you may be able to use Iterator::partition, but it does not reuse the memory:
fn main() {
    let v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];
    let (drained, v): (Vec<_>, Vec<_>) = v.into_iter().partition(|&e| e % 2 == 0);
    assert_eq!(v, vec![1, 3, 5]);
    assert_eq!(drained, vec![2, 4, 6]);
}
The documentation states that Vec::retain will operate in place and visit each element, in order, exactly once.
fn drain_where<T: Copy, Pred: Fn(&T) -> bool>(source: &mut Vec<T>, pred: Pred) -> Vec<T> {
    let mut drained: Vec<T> = Vec::new();
    source.retain(|item| {
        if pred(item) {
            drained.push(*item);
            false
        } else {
            true
        }
    });
    drained
}
I can suggest a few other ways to do this, plus my benchmarks.
N.B. I compare all the methods by a few criteria:
Does it support an external source of truth (my use case)? E.g. Vec::retain supports that, meaning that you can write code like
// conditions: &[bool]
assert_eq!(conditions.len(), my_vec.len());
let mut cond = conditions.iter().copied();
my_vec.retain(move |_| cond.next().unwrap());
Is the method supported by third-party Vecs, namely ArrayVec, TinyVec, SmallVec, FixedSliceVec and others?
Is it fast?
So, let's begin.
Sort slice, then split slice
Features:
Can support external source of truth — No ❌. It calls the closure O(n log n) times in an unspecified order, so it supports only predicates computed directly from the values.
Third-party support — Excellent ✅. You can use it on anything convertible to a mutable slice.
Is it fast — In the generic case, no ❌. It runs in O(n log n) time while the other methods run in O(n) time. However, if preserving the original relative order is not important to you, you can use sort_unstable_by_key, which doesn't allocate memory at all and can make this the fastest approach in some scenarios.
Implementation:
v.sort_by_key(|x| predicate(x));
// `false` sorts before `true`, so the predicate-false elements come first
// and the boundary is found with the negated predicate.
let split_pos = v.partition_point(|x| !predicate(x));
let (false_slice, true_slice) = v.split_at_mut(split_pos);
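A self-contained version of the snippet above, with a concrete predicate (splitting odds from evens; the sort_split helper name is mine). Since false orders before true, the stable sort puts predicate-false elements first and partition_point takes the negated predicate:

```rust
// Splits `v` into (predicate-false, predicate-true) halves; the stable
// sort preserves relative order within each half.
fn sort_split(mut v: Vec<i32>, predicate: impl Fn(&i32) -> bool) -> (Vec<i32>, Vec<i32>) {
    // false < true, so elements failing the predicate sort to the front.
    v.sort_by_key(|x| predicate(x));
    // Index of the first element for which the predicate holds.
    let split_pos = v.partition_point(|x| !predicate(x));
    let (no, yes) = v.split_at(split_pos);
    (no.to_vec(), yes.to_vec())
}

fn main() {
    let (odds, evens) = sort_split(vec![1, 2, 3, 4, 5, 6], |x| x % 2 == 0);
    assert_eq!(odds, vec![1, 3, 5]);
    assert_eq!(evens, vec![2, 4, 6]);
}
```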
Vec::drain_filter
Can support external source of truth — Yes ✅. Visits items in their original order exactly once.
Third-party support — Non-existent ❌. Also, you can't even use it in stable Rust, and its tracking issue has been suffering from bikeshedding for 5 years now (as of 2022-07).
Is it fast — Yes ✅.
Code
let removed_items: Vec<_> = v.drain_filter(|x| predicate(x)).collect();
MaybeUninit trick using unsafe code
Well, I wrote it myself.
Features:
Can support external source of truth — Yes ✅. Visits items in their original order exactly once.
Third-party support — Supported ✅. Note that you must audit their implementations of retain to ensure they cannot panic themselves.
Is it fast — Yes ✅. In my benchmarks it is faster than Vec::drain_filter.
This implementation makes 2 assumptions:
The internal layouts of Vec<T> and Vec<MaybeUninit<T>> are the same. There is no reason why they wouldn't be, because the memory layouts of T and MaybeUninit<T> are the same, but I also verify this using asserts. Note that the asserts would be removed by the compiler's optimizer because they are always true.
retain doesn't panic itself (it can only propagate panics from an element's drop or from the predicate). This is true for std::vec::Vec, but you need to verify it for third-party crates.
The algorithm is simple:
reinterpret the initial vector Vec<T> as Vec<MaybeUninit<T>>;
wrap our predicate in a new predicate that moves the items we want to remove into external storage;
let Vec::retain handle the removal of items.
Also, the only reason for using MaybeUninit and unsafe is to avoid double frees, so if your elements implement Copy, this algorithm can be implemented in safe Rust.
However, in that case you could just use filter(...).collect() plus retain, with almost the same performance.
So, the code is below, with comments explaining why it is safe (note that I didn't test it using sanitizers or Miri, so use it at your own risk):
/// Returns removed values.
fn retain_unsafe_generic<T: Sized>(
    v: &mut Vec<T>,
    mut which_to_keep: impl FnMut(&T) -> bool,
) -> Vec<T> {
    use std::mem::{transmute, MaybeUninit};

    /// # Safety
    /// Caller must ensure that if it makes living copies of inner items,
    /// those items are removed from the original vec before the original
    /// reference becomes usable again.
    unsafe fn as_uninits<T: Sized>(v: &mut Vec<T>) -> &mut Vec<MaybeUninit<T>> {
        let orig_ptr = v.as_ptr();
        let orig_cap = v.capacity();
        let orig_size = v.len();
        let v: &mut Vec<MaybeUninit<T>> = unsafe {
            // Safety: since `MaybeUninit` has the same memory layout
            // as the wrapped type, we assume that we can treat a vec of T
            // as a vec of MaybeUninit<T>. This assumption is checked by
            // the asserts below.
            //
            // Lifetimes of elements must be correctly enforced by the caller.
            transmute(v)
        };
        // Check that the layout of the Vec with the different element type
        // remains the same.
        assert_eq!(v.len(), orig_size);
        assert_eq!(v.capacity(), orig_cap);
        assert_eq!(v.as_ptr(), orig_ptr.cast());
        v
    }

    let mut res: Vec<T> = Vec::with_capacity(v.len());
    let v = unsafe {
        // Safety: we keep the result reference only in the `retain` call.
        // We remove all moved elements using retain.
        as_uninits(v)
    };
    v.retain(
        // Safety: `Vec::retain` removes all items whose values we moved into `res`.
        // It won't call `drop::<T>` for removed values
        // because `MaybeUninit` never drops wrapped values.
        |x| unsafe {
            // Safety: this is safe because `Vec::retain` visits elements
            // sequentially, so we haven't moved the value out of `x` yet.
            // https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain
            let val = &*x.as_ptr();
            if which_to_keep(val) {
                return true;
            }
            res.reserve(1);
            // Any panic before this point is safe because
            // 1. We haven't moved the value out of `x` yet;
            // 2. In case of a panic in the predicate, `Vec::retain` preserves
            //    the current value.
            // Here we could probably use `Vec::push`,
            // but the compiler currently fails to remove the capacity check
            // in `Vec::push`, which made this function slower than
            // `Vec::drain_filter`: https://godbolt.org/z/7fhnnMh46
            // And `Vec::push(x.assume_init_read())` is an unsafe operation
            // too anyway.
            let old_len = res.len();
            // Safety: we just allocated memory for this slot.
            let dst = res.as_mut_ptr().add(old_len);
            // Safety: since we cannot panic until the end of the closure
            // and `Vec::retain` won't panic and will remove `x`,
            // making a bitwise copy of `x` is safe.
            x.as_ptr().copy_to_nonoverlapping(dst, 1);
            // Safety: we just wrote the additional value.
            res.set_len(old_len + 1);
            false
        },
    );
    res
}
Benchmarks
The benchmark code is long, so here is a link to the gist: https://gist.github.com/AngelicosPhosphoros/7ee482316bc1c83945f88308954e0d7e
It tries to split the odd numbers away from the Vec using all three algorithms listed above.
Results:

| algorithm     | Mixed | Odds first | Evens first |
|---------------|-------|------------|-------------|
| sort-split    | 465us | 35us       | 10us        |
| drain_filter  | 26us  | 24us       | 22.5us      |
| retain-uninit | 17us  | 21us       | 19us        |
As you can see, the retain-based approach won in all cases except the one where sort-split doesn't actually have anything to do.
That is mainly because Vec::retain has been rigorously optimized over the years.
I found an answer on SO that explained how to write a randomly weighted drop system for a game. I would prefer to write this code in a more functional-programming style, but I couldn't figure out a way to do that for this code. I'll inline the pseudo code here:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
How could this be written in a style that's more functional? I am using CoffeeScript and underscore.js, but I'd prefer this answer to be language agnostic because I'm having trouble thinking about this in a functional way.
Here are two more functional versions in Clojure and JavaScript, but the ideas here should work in any language that supports closures. Basically, we use recursion instead of iteration to accomplish the same thing, and instead of breaking in the middle we just return a value and stop recursing.
Original pseudo code:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
Clojure version (objects are just treated as clojure maps):
(defn recursive-version
  [r objects]
  (loop [t 0
         others objects]
    (let [obj (first others)
          new_t (+ t (:weight obj))]
      (if (> new_t r)
        obj
        (recur new_t (rest others))))))
JavaScript version (using underscore for convenience).
Be careful, because this could blow out the stack.
This is conceptually the same as the clojure version.
var js_recursive_version = function(objects, r) {
    var main_helper = function(t, others) {
        var obj = _.first(others);
        var new_t = t + obj.weight;
        if (new_t > r) {
            return obj;
        } else {
            return main_helper(new_t, _.rest(others));
        }
    };
    return main_helper(0, objects);
};
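The same accumulate-and-recurse shape translates to other languages too. Here is a hedged sketch in Rust (the function name and the (weight, item) pair representation are my own), where slice patterns stand in for the first/rest split:

```rust
// Walk the (weight, item) list, accumulating weights until the running
// total exceeds r; recurse on the rest of the slice instead of looping.
fn pick_recursive<'a, T>(items: &'a [(f64, T)], r: f64, t: f64) -> Option<&'a T> {
    match items.split_first() {
        None => None,
        Some(((w, item), rest)) => {
            let t = t + w;
            if t > r {
                Some(item)
            } else {
                pick_recursive(rest, r, t)
            }
        }
    }
}

fn main() {
    let items = [(50.0, "foo"), (35.0, "bar"), (15.0, "baz")];
    // 0..50 -> foo, 50..85 -> bar, 85..100 -> baz
    assert_eq!(pick_recursive(&items, 10.0, 0.0), Some(&"foo"));
    assert_eq!(pick_recursive(&items, 60.0, 0.0), Some(&"bar"));
    assert_eq!(pick_recursive(&items, 90.0, 0.0), Some(&"baz"));
}
```

As with the JavaScript version, deep recursion could exhaust the stack; Rust does not guarantee tail-call elimination.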
You can implement this with a fold (aka Array#reduce, or Underscore's _.reduce):
An SSCCE:
items = [
  {item: 'foo', weight: 50}
  {item: 'bar', weight: 35}
  {item: 'baz', weight: 15}
]

r = Math.random() * 100

{item} = items.reduce (memo, {item, weight}) ->
  if memo.sum > r
    memo
  else
    {item, sum: memo.sum + weight}
, {sum: 0}

console.log 'r:', r, 'item:', item
You can run it many times at coffeescript.org and see that the results make sense :)
That being said, I find the fold a bit contrived, as you have to remember both the selected item and the accumulated weight between iterations, and it doesn't short-circuit when the item is found.
Maybe a compromise solution between pure FP and the tedium of reimplementing a find algorithm can be considered (using _.find):
total = 0
{item} = _.find items, ({weight}) ->
  total += weight
  total > r
Runnable example.
I find (no pun intended) this algorithm much more accessible than the first one (and it should perform better, as it doesn't create intermediate objects, and it does short-circuit).
Update/side-note: the second algorithm is not "pure" because the function passed to _.find is not referentially transparent (it has the side effect of modifying the external total variable), but the whole of the algorithm is referentially transparent. If you were to encapsulate it in a findItem = (items, r) -> function, the function would be pure and would always return the same output for the same input. That's a very important thing, because it means that you can get the benefits of FP while using some non-FP constructs (for performance, readability, or whatever reason) under the hood :D
I think the underlying task is randomly selecting 'events' (objects) from array os with a frequency defined by their respective weights. The approach is to map (i.e. search) a random number (with uniform distribution) onto the stairstep cumulative probability distribution function.
With positive weights, their cumulative sum increases from 0 to 1. The code you gave us simply searches starting at the 0 end. To maximize speed with repeated calls, precalculate the sums, and order the events so the largest weights come first.
It really doesn't matter whether you search with iteration (looping) or recursion. Recursion is nice in a language that tries to be "purely functional", but it doesn't help with understanding the underlying mathematical problem. And it doesn't help you package the task into a clean function. The Underscore functions are another way of packaging the iterations, but they don't change the basic functionality. Only any and all exit early when the target is found.
For a small os array this simple search is sufficient. But with a large array, a binary search will be faster. Looking in Underscore I find that sortedIndex uses this strategy. From Lo-Dash (an Underscore drop-in): "Uses a binary search to determine the smallest index at which the value should be inserted into array in order to maintain the sort order of the sorted array."
The basic use of sortedIndex is:
os = [{name: 'one',   weight: .7},
      {name: 'two',   weight: .25},
      {name: 'three', weight: .05}]

t = 0; cumweights = (t += o.weight for o in os)
i = _.sortedIndex(cumweights, R)
os[i]
You can hide the cumulative sum calculation with a nested function like:
osEventGen = (os) ->
  t = 0; xw = (t += y.weight for y in os)
  return (R) ->
    i = _.sortedIndex(xw, R)
    return os[i]

osEvent = osEventGen(os)
osEvent(.3)
# { name: 'one', weight: 0.7 }
osEvent(.8)
# { name: 'two', weight: 0.25 }
osEvent(.99)
# { name: 'three', weight: 0.05 }
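The same precompute-then-binary-search idea can be sketched in Rust, with partition_point playing the role of _.sortedIndex (the event_picker name and the (item, weight) representation are my own):

```rust
// Precomputes cumulative weights once, then returns a picker closure that
// binary-searches for the first cumulative weight exceeding r.
fn event_picker<'a, T>(items: &'a [(T, f64)]) -> impl Fn(f64) -> &'a T {
    let mut t = 0.0;
    let cum: Vec<f64> = items
        .iter()
        .map(|(_, w)| {
            t += w;
            t
        })
        .collect();
    move |r: f64| {
        // First index whose cumulative weight exceeds r.
        let i = cum.partition_point(|&c| c <= r);
        &items[i].0
    }
}

fn main() {
    let os = [("one", 0.7), ("two", 0.25), ("three", 0.05)];
    let pick = event_picker(&os);
    assert_eq!(*pick(0.3), "one");
    assert_eq!(*pick(0.8), "two");
    assert_eq!(*pick(0.99), "three");
}
```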
In CoffeeScript, Jed Clinger's recursive search could be written like this:
foo = (x, r, t=0) ->
  [y, x...] = x
  t += y
  return [y, t] if x.length == 0 or t > r
  return foo(x, r, t)
A loop version using the same basic idea is:
foo = (x, r) ->
  t = 0
  while x.length and t <= r
    [y, x...] = x   # the [first, rest] split
    t += y
  y
Tests on jsPerf (http://jsperf.com/sortedindex) suggest that sortedIndex is faster when os.length is around 1000, but slower than the simple loop when the length is more like 30.