How to rename all keys in a serde_json::Map?

Let's say I have a &mut std::collections::HashMap, and I want to turn all the keys into uppercase. The following code does the trick:
use std::collections::HashMap;
fn keys_to_upper<T>(map: &mut HashMap<String, T>) {
let mut tmp = Vec::with_capacity(map.len());
for (key, val) in map.drain() {
tmp.push((key.to_ascii_uppercase(), val));
}
for (key, val) in tmp {
map.insert(key, val);
}
}
Unfortunately, I don't have a HashMap but a &mut serde_json::Map, and I want to turn all the keys into uppercase. There is no .drain() method. I could use .into_iter() instead, but that would only give me mutable references to the keys and values. To insert them into the map again I would have to clone them, which would hurt performance.
Is there some way here to get around the absence of the .drain() method?

A nice tool in your Rust programmer toolbox: std::mem::take.
This lets you turn a &mut T into a T if the type implements Default (if it doesn't, but the type still has a dummy/cheap value you can use, then std::mem::replace is your function of choice).
Applied to your current use-case this gives:
use serde_json::{Map, Value};
fn keys_to_upper(map: &mut Map<String, Value>) {
*map = std::mem::take(map)
.into_iter()
.map(|(k, v)| (k.to_ascii_uppercase(), v))
.collect();
}
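To make the difference between the two functions concrete, here is a minimal std-only sketch. It applies std::mem::take to the HashMap version from the question, and std::mem::replace to a type without a Default impl. The Name type and the shout function are hypothetical, introduced only for illustration.

```rust
use std::collections::HashMap;
use std::mem;

// `take` swaps in `HashMap::default()` (an empty map) and hands back the
// original map by value, so keys and values are moved, never cloned.
fn keys_to_upper<T>(map: &mut HashMap<String, T>) {
    *map = mem::take(map)
        .into_iter()
        .map(|(k, v)| (k.to_ascii_uppercase(), v))
        .collect();
}

// Hypothetical type with no `Default` impl.
struct Name(String);

// `replace` does the same job when you supply the cheap placeholder yourself.
fn shout(name: &mut Name) {
    let old = mem::replace(name, Name(String::new()));
    *name = Name(old.0.to_uppercase());
}
```

In both cases the placeholder is only visible between the take/replace and the assignment, so callers never observe it unless the code in between panics.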

Related

Stack of references in unsafe Rust, but ensuring that the unsafeness does not leak out of the stack?

I'm implementing some recursive code, where function instances deeper down in the call stack may need to refer to data from prior frames. However, I only have non-mut access to those data, so I receive those data as references. As such, I would need to keep references to those data in a stack data structure that can be accessed from the deeper instances.
To illustrate:
// I would like to implement this RefStack class properly, without per-item memory allocations
struct RefStack<T: ?Sized> {
content: Vec<&T>,
}
impl<T: ?Sized> RefStack<T> {
fn new() -> Self { Self{ content: Vec::new() } }
fn get(&self, index: usize) -> &T { self.content[index] }
fn len(&self) -> usize { self.content.len() }
fn with_element<F: FnOnce(&mut Self)>(&mut self, el: &T, f: F) {
self.content.push(el);
f(self);
self.content.pop();
}
}
// This is just an example demonstrating how I would need to use the RefStack class
fn do_recursion(n: usize, node: &LinkedListNode, st: &mut RefStack<str>) {
// get references to one or more items in the stack
// the references should be allowed to live until the end of this function, but shouldn't prevent me from calling with_element() later
let tmp: &str = st.get(rng.gen_range(0, st.len()));
// do stuff with those references (println is just an example)
println!("Item: {}", tmp);
// recurse deeper if necessary
if n > 0 {
let (head, tail): (_, &LinkedListNode) = node.get_parts();
manager.get_str(head, |s: &str| // the actual string is a local variable somewhere in the implementation details of get_str()
st.with_element(s, |st| do_recursion(n - 1, tail, st))
);
}
// do more stuff with those references (println is just an example)
println!("Item: {}", tmp);
}
fn main() {
do_recursion(100, list /* gotten from somewhere else */, &mut RefStack::new());
}
In the example above, I'm concerned about how to implement RefStack without any per-item memory allocations. The occasional allocations by the Vec is acceptable - those are few and far in between. The LinkedListNode is just an example - in practice it's some complicated graph data structure, but the same thing applies - I only have a non-mut reference to it, and the closure given to manager.get_str() only provides a non-mut str. Note that the non-mut str passed into the closure may only be constructed in the get_str() implementation, so we cannot assume that all the &str have the same lifetime.
I'm fairly certain that RefStack can't be implemented in safe Rust without copying out the str into owned Strings, so my question is how this can be done in unsafe Rust. It feels like I might be able to get a solution such that:
The unsafeness is confined to the implementation of RefStack
The reference returned by st.get() should live at least as long as the current instance of the do_recursion function (in particular, it should be able to live past the call to st.with_element(), and this is logically safe since the &T that is returned by st.get() isn't referring to any memory owned by the RefStack anyway)
How can such a struct be implemented in (unsafe) Rust?
It feels that I could just cast the element references to pointers and store them as pointers, but I will still face difficulties expressing the requirement in the second bullet point above when casting them back to references. Or is there a better way (or by any chance is such a struct implementable in safe Rust, or already in some library somewhere)?
I think storing raw pointer is the way to go. You need a PhantomData to store the lifetime and get proper covariance:
use std::marker::PhantomData;
struct RefStack<'a, T: ?Sized> {
content: Vec<*const T>,
_pd: PhantomData<&'a T>,
}
impl<'a, T: ?Sized> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),_pd: PhantomData
}
}
fn get(&self, index: usize) -> &'a T {
unsafe { &*self.content[index] }
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
where 'a: 't,
{
self.content.push(el);
let mut tmp = RefStack {
content: std::mem::take(&mut self.content),
_pd: PhantomData,
};
f(&mut tmp);
self.content = tmp.content;
self.content.pop();
}
}
The only unsafe code is in converting the pointer back into a reference.
The tricky part is getting with_element right. I think that the where 'a: 't is implicit, because the whole impl depends on it (but better safe than sorry).
The last problem is how to convert a RefStack<'a, T> into a RefStack<'t, T>. I'm pretty sure I could just std::mem::transmute it. But that would be an extra unsafe to pay attention to, and creating a new temporary stack is quite trivial.
About the 't lifetime
You may think that this 't lifetime is not actually needed, but not adding it may cause subtle unsoundness, as the callback could call get() and get values with a lifetime 'a that is actually longer than the inserted value.
For example this code should not compile. With the 't it correctly fails, but without it it compiles and causes undefined behavior:
fn breaking<'a, 's, 'x>(st: &'s mut RefStack<'a, i32>, v: &'x mut Vec<&'a i32>) {
v.push(st.get(0));
}
fn main() {
let mut st = RefStack::<i32>::new();
let mut y = Vec::new();
{
let i = 42;
st.with_element(&i, |stack| breaking(stack, &mut y));
}
println!("{:?}", y);
}
About panic!.
When doing this kind of unsafe thing, particularly when calling user code, as we are doing now in with_element, we have to consider what would happen if it panics. In the OP code, the last object will not be popped, and when the stack is unwound, any drop implementation could see the now dangling reference. My code is ok in case of panics because, if f(&mut tmp) panics, the dangling references die in the local temporary tmp while self.content is empty.
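Another common way to keep a push/pop pair balanced across a panic is a drop guard. The sketch below shows the idea on a plain Vec<T> rather than on RefStack; PopGuard and with_pushed are hypothetical names, and this is not the code from the answer above.

```rust
/// Pops the last element of the wrapped Vec when dropped.
/// Drop also runs during unwinding, so the pop happens even on panic.
struct PopGuard<'v, T>(&'v mut Vec<T>);

impl<T> Drop for PopGuard<'_, T> {
    fn drop(&mut self) {
        self.0.pop();
    }
}

/// Pushes `value`, runs `f` on the stack, then pops again,
/// whether `f` returns normally or panics.
fn with_pushed<T, R, F>(stack: &mut Vec<T>, value: T, f: F) -> R
where
    F: FnOnce(&mut Vec<T>) -> R,
{
    stack.push(value);
    let mut guard = PopGuard(stack);
    // The guard owns the only access path to the Vec while `f` runs.
    f(&mut *guard.0)
}
```

The guard only restores the length of the vector; it says nothing about lifetimes, so for RefStack it would complement the 't parameter rather than replace it.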
Disclaimer: this answer originally used traits, and it was a nightmare; Francis Gagne pointed out rightly that using an Option for the tail was a much better alternative, hence the answer was much simplified.
Given the structure of your usage, with the stack in RefStack following the usage of the stack frames, you can simply put elements on the stack frames and build a stack from that.
The main advantage of such an approach is that it is entirely safe. You can review the whole code here, or follow the blow-by-blow description below.
The key idea is to build a so-called cons-list.
#[derive(Debug)]
struct Stack<'a, T> {
element: &'a T,
tail: Option<&'a Stack<'a, T>>,
}
impl<'a, T> Stack<'a, T> {
fn new(element: &'a T) -> Self { Stack { element, tail: None } }
fn top(&self) -> &T { self.element }
fn get(&self, index: usize) -> Option<&T> {
if index == 0 {
Some(self.element)
} else {
self.tail.and_then(|tail| tail.get(index - 1))
}
}
fn tail(&self) -> Option<&'a Stack<'a, T>> { self.tail }
fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> { Stack { element, tail: Some(self) } }
}
A simple example of usage is:
fn immediate() {
let (a, b, c) = (0, 1, 2);
let root = Stack::new(&a);
let middle = root.push(&b);
let top = middle.push(&c);
println!("{:?}", top);
}
Which just prints the stack, yielding:
Stack { element: 2, tail: Some(Stack { element: 1, tail: Some(Stack { element: 0, tail: None }) }) }
And a more elaborate recursive version:
fn recursive(n: usize) {
fn inner(n: usize, stack: &Stack<'_, i32>) {
if n == 0 {
print!("{:?}", stack);
return;
}
let element = n as i32;
let stacked = stack.push(&element);
inner(n - 1, &stacked);
}
if n == 0 {
println!("()");
return;
}
let element = n as i32;
let root = Stack::new(&element);
inner(n - 1, &root);
}
Which prints:
Stack { element: 1, tail: Some(Stack { element: 2, tail: Some(Stack { element: 3, tail: None }) }) }
The one downside is that get performance may not be so good; it has linear complexity. On the other hand, cache-wise sticking to the stack frames is pretty nice. If you mostly access the first few elements, I expect it'll be good enough.
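If you need to visit every element, walking the tail links once avoids calling get repeatedly, which would be quadratic overall. Below is a minimal sketch that repeats a trimmed copy of the Stack definition so it is self-contained; the to_vec helper is a hypothetical addition, not part of the original answer.

```rust
struct Stack<'a, T> {
    element: &'a T,
    tail: Option<&'a Stack<'a, T>>,
}

impl<'a, T> Stack<'a, T> {
    fn new(element: &'a T) -> Self {
        Stack { element, tail: None }
    }

    fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> {
        Stack { element, tail: Some(self) }
    }

    /// Hypothetical helper: collects all elements from the top of the
    /// stack down to the root in a single linear walk.
    fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut node = Some(self);
        while let Some(n) = node {
            out.push(n.element);
            node = n.tail;
        }
        out
    }
}
```

This does allocate a Vec for the result, so it only makes sense when you need all the elements at once.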
Disclaimer: A different answer; with a different trade-off.
Compared to my other answer, this answer presents a solution that is:
unsafe: it's encapsulated, but subtle.
simpler to use.
simpler code, likely faster.
The idea is to still use the stack to bind the lifetimes of the references, yet store all references in a single Vec for O(1) random access. So we're building a stack on the stack, but not storing the references themselves on the stack. Alright?
The full code is available here.
The stack itself is very easily defined:
struct StackRoot<T: ?Sized>(Vec<*const T>);
struct Stack<'a, T: ?Sized>{
len: usize,
stack: &'a mut Vec<*const T>,
}
impl<T: ?Sized> StackRoot<T> {
fn new() -> Self { Self(vec!()) }
fn stack(&mut self) -> Stack<'_, T> { Stack { len: 0, stack: &mut self.0 } }
}
The implementation of Stack is trickier, as always when unsafe is involved:
impl<'a, T: ?Sized> Stack<'a, T> {
fn len(&self) -> usize { self.len }
fn get(&self, index: usize) -> Option<&'a T> {
if index < self.len {
// Safety:
// - Index is in bounds as per above branch.
// - Lifetime of reference is guaranteed to be at least 'a (see push).
Some(unsafe { &**self.stack.get_unchecked(index) })
} else {
None
}
}
fn push<'b>(&'b mut self, element: &'b T) -> Stack<'b, T>
where
'a: 'b
{
// Stacks could have been built and forgotten, resulting in `self.stack`
// containing references to further elements, so that the newly pushed
// element would not be at index `self.len`, as expected.
//
// Note that on top of being functionally important, it's also a safety
// requirement: `self` should never be able to access elements that are
// not guaranteed to have a lifetime longer than `'a`.
self.stack.truncate(self.len);
self.stack.push(element as *const _);
Stack { len: self.len + 1, stack: &mut *self.stack }
}
}
impl<'a, T: ?Sized> Drop for Stack<'a, T> {
fn drop(&mut self) {
self.stack.truncate(self.len);
}
}
Do note the unsafe here; the invariant is that the 'a parameter is always stricter than the lifetimes of the elements pushed into the stack so far.
By refusing to access elements pushed by other members, we thus guarantee that the lifetime of the returned reference is valid.
It does require a generic definition of do_recursion, however generic lifetime parameters are erased at code generation, so there's no code bloat involved:
fn do_recursion<'a, 'b>(nodes: &[&'a str], stack: &mut Stack<'b, str>)
where
'a: 'b
{
let tmp: &str = stack.get(stack.len() - 1).expect("Not empty");
println!("{:?}", tmp);
if let [head, tail @ ..] = nodes {
let mut new = stack.push(head);
do_recursion(tail, &mut new);
}
}
A simple main to show it off:
fn main() {
let nodes = ["Hello", ",", "World", "!"];
let mut root = StackRoot::new();
let mut stack = root.stack();
let mut stack = stack.push(nodes[0]);
do_recursion(&nodes[1..], &mut stack);
}
Resulting in:
"Hello"
","
"World"
"!"
Based on rodrigo's answer, I implemented this slightly simpler version:
struct RefStack<'a, T: ?Sized + 'static> {
content: Vec<&'a T>,
}
impl<'a, T: ?Sized + 'static> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),
}
}
fn get(&self, index: usize) -> &'a T {
self.content[index]
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F>(&mut self, el: &'t T, f: F)
where
F: FnOnce(&mut RefStack<'t, T>),
'a: 't,
{
let mut st = RefStack {
content: std::mem::take(&mut self.content),
};
st.content.push(el);
f(&mut st);
st.content.pop();
self.content = unsafe { std::mem::transmute(st.content) };
}
}
The only difference to rodrigo's solution is that the vector is represented as vector of references instead of pointers, so we don't need the PhantomData and the unsafe code to access an element.
When a new element is pushed to the stack in with_element(), we require that it has a shorter lifetime than the existing elements with the 'a: 't bound. We then create a new stack with the shorter lifetime, which is possible in safe code since we know that the data the references in the vector point to lives for the longer lifetime 'a. We then push the new element with lifetime 't to the new vector, again in safe code, and only after we have removed that element again do we move the vector back to its original place. This requires unsafe code since we are extending the lifetime of the references in the vector from 't to 'a this time. We know this is safe, since the vector is back to its original state, but the compiler doesn't know this.
I feel this version represents the intent better than rodrigo's almost identical version. The type of the vector is always "correct", in that it describes that the elements are actually references, not raw pointers, and it always assigns the correct lifetime to the vector. And we use unsafe code exactly in the place where something potentially unsafe happens: when extending the lifetime of the references in the vector.

Is there a way to convert a ChunksMut<T> from Vec::chunks_mut to a slice &mut [T]?

I'm filling a vector in parallel, but for this generalized question, I've only found hints and no answers.
The code below works, but I want to switch to Rng::fill instead of iterating over each chunk. It might not be possible to have multiple mutable slices inside a single Vec; I'm not sure.
extern crate rayon;
extern crate rand;
extern crate rand_xoshiro;
use rand::{Rng, SeedableRng};
use rand_xoshiro::Xoshiro256StarStar;
use rayon::prelude::*;
use std::{iter, env};
use std::sync::{Arc, Mutex};
// i16 because I was filling up my RAM for large tests...
fn gen_rand_vec(data: &mut [i16]) {
let num_threads = rayon::current_num_threads();
let mut rng = rand::thread_rng();
let mut prng = Xoshiro256StarStar::from_rng(&mut rng).unwrap();
// lazy iterator of fast, unique RNGs
// Arc and Mutex are just so it can be accessed from multiple threads
let rng_it = Arc::new(Mutex::new(iter::repeat(()).map(|()| {
let new_prng = prng.clone();
prng.jump();
new_prng
})));
// generates random numbers for each chunk in parallel
// par_chunks_mut is parallel version of chunks_mut
data.par_chunks_mut(data.len() / num_threads).for_each(|chunk| {
// I used extra braces because it might be required to unlock Mutex.
// Not sure.
let mut prng = { rng_it.lock().unwrap().next().unwrap() };
for i in chunk.iter_mut() {
*i = prng.gen_range(-1024, 1024);
}
});
}
It turns out that a ChunksMut iterator gives slices. I'm not sure how to glean that from the documentation. I figured it out by reading the source:
#[derive(Debug)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct ChunksMut<'a, T:'a> {
v: &'a mut [T],
chunk_size: usize
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T> Iterator for ChunksMut<'a, T> {
type Item = &'a mut [T];
#[inline]
fn next(&mut self) -> Option<&'a mut [T]> {
if self.v.is_empty() {
None
} else {
let sz = cmp::min(self.v.len(), self.chunk_size);
let tmp = mem::replace(&mut self.v, &mut []);
let (head, tail) = tmp.split_at_mut(sz);
self.v = tail;
Some(head)
}
}
}
I guess it's just experience; to others it must be obvious that an iterator of type ChunksMut<T> from Vec<T> yields items of type &mut [T]. That makes sense now. It just wasn't very clear with the intermediate struct.
pub fn chunks_mut(&mut self, chunk_size: usize) -> ChunksMut<T>
// ...
impl<'a, T> Iterator for ChunksMut<'a, T>
Reading this, it looked like the iterator returned objects of type T, the same as Vec<T>.iter(), which wouldn't make sense.
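Because each item is a &mut [T], a chunk can be passed straight to any function that takes a mutable slice. Here is a minimal std-only sketch; fill_chunk and fill_by_chunks are hypothetical helpers, and slice::fill stands in for the random fill. Assuming Rng::fill accepts a mutable integer slice, as the question expects, a call like prng.fill(chunk) would slot into the same place.

```rust
// Stand-in for any API that wants a whole mutable slice at once.
fn fill_chunk(chunk: &mut [i16], value: i16) {
    chunk.fill(value);
}

// Each `chunk` is a disjoint `&mut [i16]` borrowed from `data`, so several
// mutable slices into the same buffer can coexist safely.
// Note: `chunks_mut` panics if `chunk_size` is 0.
fn fill_by_chunks(data: &mut [i16], chunk_size: usize) {
    for (i, chunk) in data.chunks_mut(chunk_size).enumerate() {
        fill_chunk(chunk, i as i16);
    }
}
```

The last chunk may be shorter than chunk_size when the length is not an exact multiple.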

Is it possible to collect a &mut from an iterator?

I am trying to learn more about ownership. Here is some code that doesn't work because collect doesn't let you get a &mut Vec<String>:
fn search(word: &str, data: &mut Vec<String>) {
data = data
.iter()
.filter(|x| x.contains(word))
.collect::<&mut Vec<String>>();
}
I think I could just return a cloned version, but is this the only/preferred way to do it?
No, it is not possible. For this to be possible, collect would have to return a reference to something it created, and that's not possible.
You are looking for Vec::retain:
fn search(word: &str, data: &mut Vec<String>) {
data.retain(|x| x.contains(word));
}
If you didn't want to mutate the passed-in data, you would indeed need to return a new Vec:
fn search<'a>(word: &str, data: &'a [String]) -> Vec<&'a String> {
data.iter().filter(|x| x.contains(word)).collect()
}
See also:
Is there any way to return a reference to a variable created in a function?
Why is it discouraged to accept a reference to a String (&String), Vec (&Vec) or Box (&Box) as a function argument?

Is it safe to use a raw pointer to access the &T of a RefCell<HashMap<T>>?

I have a cache-like structure which internally uses a HashMap:
impl Cache {
fn insert(&mut self, k: u32, v: String) {
self.map.insert(k, v);
}
fn borrow(&self, k: u32) -> Option<&String> {
self.map.get(&k)
}
}
Now I need internal mutability. Since HashMap does not implement Copy, my guess is that RefCell is the path to follow. Writing the insert method is straightforward, but I encountered problems with the borrow function. I could return a Ref<String>, but since I'd like to cache the result, I wrote a small Ref wrapper:
struct CacheRef<'a> {
borrow: Ref<'a, HashMap<u32, String>>,
value: &'a String,
}
This won't work since value references borrow, so the struct can't be constructed. I know that the reference is always valid: The map can't be mutated, because Ref locks the map. Is it safe to use a raw pointer instead of a reference?
struct CacheRef<'a> {
borrow: Ref<'a, HashMap<u32, String>>,
value: *const String,
}
Am I overlooking something here? Are there better (or faster) options? I'm trying to avoid RefCell due to the runtime overhead.
I'll complement @Shepmaster's safe but not quite as efficient answer with the unsafe version. For this, we'll pack some unsafe code in a utility function.
use std::cell::Ref;
use std::ops::Deref;
fn map_option<'a, T, F, U>(r: Ref<'a, T>, f: F) -> Option<Ref<'a, U>>
where
F: FnOnce(&'a T) -> Option<&'a U>
{
let stolen = r.deref() as *const T;
let ur = f(unsafe { &*stolen }).map(|sr| sr as *const U);
match ur {
Some(u) => Some(Ref::map(r, |_| unsafe { &*u })),
None => None
}
}
I'm pretty sure this code is correct. Although the compiler is rather unhappy with the lifetimes, they work out. We just have to inject some raw pointers to make the compiler shut up.
With this, the implementation of borrow becomes trivial:
fn borrow<'a>(&'a self, k: u32) -> Option<Ref<'a, String>> {
map_option(self.map.borrow(), |m| m.get(&k))
}
The utility function only works for Option<&T>. Other containers (such as Result) would require their own modified copy, or else GATs or HKTs to implement generically.
I'm going to ignore your direct question in favor of a definitely safe alternative:
impl Cache {
fn insert(&self, k: u32, v: String) {
self.map.borrow_mut().insert(k, v);
}
fn borrow<'a>(&'a self, k: u32) -> Option<Ref<'a, String>> {
let borrow = self.map.borrow();
if borrow.contains_key(&k) {
Some(Ref::map(borrow, |hm| {
hm.get(&k).unwrap()
}))
} else {
None
}
}
}
Ref::map allows you to take a Ref<'a, T> and convert it into a Ref<'a, U>. The ugly part of this solution is that we have to look up the key in the hashmap twice because I can't figure out how to make the ideal solution work:
Ref::map(borrow, |hm| {
hm.get(&k) // Returns an `Option`, not a `&...`
})
This might require Generic Associated Types (GATs) and even then the return type might be a Ref<Option<T>>.
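On newer toolchains (Rust 1.63 and later), the standard library's Ref::filter_map covers exactly this case: the closure returns an Option<&U>, and the call yields a Result that is Ok with the mapped Ref, or Err with the original guard. Here is a minimal sketch of the cache using it. The Cache struct layout and the new constructor are assumptions, since the question does not show them.

```rust
use std::cell::{Ref, RefCell};
use std::collections::HashMap;

// Assumed layout: the question only shows the methods, not the struct.
struct Cache {
    map: RefCell<HashMap<u32, String>>,
}

impl Cache {
    fn new() -> Self {
        Cache { map: RefCell::new(HashMap::new()) }
    }

    fn insert(&self, k: u32, v: String) {
        self.map.borrow_mut().insert(k, v);
    }

    // Single lookup, no unsafe: `filter_map` returns `Err(original_ref)`
    // when the closure yields `None`, and `.ok()` drops that guard.
    fn borrow(&self, k: u32) -> Option<Ref<'_, String>> {
        Ref::filter_map(self.map.borrow(), |m| m.get(&k)).ok()
    }
}
```

As with the other Ref-based variants, the map stays borrowed for as long as the returned Ref is alive, so calling insert while holding one will panic.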
As mentioned by Shepmaster, it is better to avoid unsafe when possible.
There are multiple possibilities:
Ref::map, with double look-up (as illustrated by Shepmaster's answer),
Ref::map with sentinel value,
Cloning the return value.
Personally, I'd consider the latter first. Store Rc<String> in your map and your method can easily return an Option<Rc<String>>, which completely sidesteps the issues:
fn get(&self, k: u32) -> Option<Rc<String>> {
self.map.borrow().get(&k).cloned()
}
As a bonus, your cache is not "locked" any longer while you use the result.
Or, alternatively, you can work around the fact that Ref::map does not like Option by using a sentinel value:
fn borrow<'a>(&'a self, k: u32) -> Ref<'a, str> {
let borrow = self.map.borrow();
Ref::map(borrow, |map| map.get(&k).map(|s| &s[..]).unwrap_or(""))
}

How can I invoke an unknown Rust function with some arguments using reflection?

I'm having a lot of fun playing around with Rust having been a C# programmer for a long time but I have a question around reflection. Maybe I don't need reflection in this case but given that Rust is strongly typed I suspect I do (I would definitely need it in good ol' C#, bless its cotton socks).
I have this situation:
use std::collections::HashMap;
fn invoke_an_unknown_function(
hashmap: HashMap<String, String>,
// Something to denote a function I know nothing about goes here
) {
// For each key in the hash map, assign the value
// to the parameter argument whose name is the key
// and then invoke the function
}
How would I do that? I'm guessing I need to pass in some sort of MethodInfo as the second argument to the function and then poke around with that to get the arguments whose name is the key in the hash map and assign the values but I had a look around for the reflection API and found the following pre-Rust 1.0 documentation:
Module std::reflect
Module std::repr
[rust-dev] Reflection system
None of these give me enough to go on to get started. How would I implement the function I describe above?
Traits are the expected way to implement a fair amount of what reflection is (ab)used for elsewhere.
trait SomeInterface {
fn exposed1(&self, a: &str) -> bool;
fn exposed2(&self, b: i32) -> i32;
}
struct Implementation1 {
value: i32,
has_foo: bool,
}
impl SomeInterface for Implementation1 {
fn exposed1(&self, _a: &str) -> bool {
self.has_foo
}
fn exposed2(&self, b: i32) -> i32 {
self.value * b
}
}
fn test_interface(obj: &dyn SomeInterface) {
println!("{}", obj.exposed2(3));
}
fn main() {
let impl1 = Implementation1 {
value: 1,
has_foo: false,
};
test_interface(&impl1);
}
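For the specific shape in the question, a hash map of named string arguments, one possible trait-based sketch is below. Invokable and Greeter are hypothetical names, and each implementor parses the arguments it expects itself. This is not reflection: the set of callable things is fixed at compile time by whoever implements the trait.

```rust
use std::collections::HashMap;

/// Hypothetical trait: anything that can be called with string-keyed arguments.
trait Invokable {
    fn invoke(&self, args: &HashMap<String, String>) -> Result<String, String>;
}

/// Hypothetical implementor that expects the arguments `name` and `times`.
struct Greeter;

impl Invokable for Greeter {
    fn invoke(&self, args: &HashMap<String, String>) -> Result<String, String> {
        let name = args.get("name").ok_or("missing argument: name")?;
        let times = args
            .get("times")
            .ok_or("missing argument: times")?
            .parse::<usize>()
            .map_err(|e| format!("bad value for times: {}", e))?;
        let greeting = format!("Hello, {}! ", name).repeat(times);
        Ok(greeting.trim_end().to_string())
    }
}

/// The function from the question, with a trait object as the second argument.
fn invoke_an_unknown_function(
    hashmap: HashMap<String, String>,
    target: &dyn Invokable,
) -> Result<String, String> {
    target.invoke(&hashmap)
}
```

The trade-off is that argument names and types are checked at run time inside each invoke implementation, so mistakes surface as Err values rather than compile errors.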
