interfacing Rust with Berkeley DB - berkeley-db

I have an existing C++ program that uses Berkeley DB as a storage backend. I would like to rewrite it in Rust. Is there a way to write a Foreign Function Interface in Rust to use Berkeley DB? I have found the tutorial Rust Foreign Function Interface, but it seems too simple an example for the complicated C structs used in BDB; for example, to open a database I need to declare a DB struct and call DB->open(). But I don't know how to do this using the example shown in the tutorial.
Can anyone help with this?

Looking at the C API of BDB, I found that it consists of C structures whose members are pointers to functions. The tutorial does not explain this (which is very strange), but Rust does support pointers to foreign functions; they are also described in the Rust reference manual.
You can declare all required structures based on the ones defined in db.h. As long as the Rust declarations are marked #[repr(C)], so that their memory layout matches the C structures, you can pass these structures to and from the library and expect correct pointers to be present in them.
For example, your DB->open() call could look like this:
#[repr(C)]
struct DB {
    open: unsafe extern "C" fn(),
}
let db = ...; // Get DB from somewhere
unsafe { (db.open)() } // Parentheses around db.open are needed to disambiguate field access
This, however, really should be wrapped in some kind of impl-based interface, because calling extern functions is an unsafe operation and you do not want your users to put unsafe around every database interaction.
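A minimal sketch of such an impl-based wrapper follows. Everything in it is an assumption made for illustration: RawDb is a drastically simplified stand-in for the real DB struct in db.h, and fake_open is a Rust function playing the role of the function pointer that libdb would install. The point is only the shape: a #[repr(C)] struct holding a function pointer, and a safe method that hides the unsafe call.

```rust
use std::os::raw::c_int;

// Hypothetical, simplified stand-in for the C struct; the real DB has many more members.
#[repr(C)]
pub struct RawDb {
    open: unsafe extern "C" fn(db: *mut RawDb, flags: c_int) -> c_int,
    is_open: c_int,
}

// Stand-in for the function the C library would store in the struct.
unsafe extern "C" fn fake_open(db: *mut RawDb, flags: c_int) -> c_int {
    if flags < 0 {
        return -1; // pretend negative flags are invalid
    }
    unsafe { (*db).is_open = 1 };
    0
}

// Safe wrapper: users never write `unsafe` themselves.
pub struct Db {
    raw: RawDb,
}

impl Db {
    pub fn new() -> Db {
        Db { raw: RawDb { open: fake_open, is_open: 0 } }
    }

    // Calls through the function pointer and maps the C return code to a Result.
    pub fn open(&mut self, flags: c_int) -> Result<(), c_int> {
        let open = self.raw.open; // copy the pointer out before borrowing the struct
        let rc = unsafe { open(&mut self.raw, flags) };
        if rc == 0 { Ok(()) } else { Err(rc) }
    }

    pub fn is_open(&self) -> bool {
        self.raw.is_open != 0
    }
}
```

With this shape, callers write db.open(0)? and only the wrapper module needs to be audited for unsafe code.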

Given the sheer size and complexity of the DB struct, there doesn't appear to be a "clean" way to expose the whole thing to Rust. A tool similar to C2HS to generate the FFI from C headers would be nice, but alas we don't have one.
Note also that the Rust FFI can't currently call into C++ libraries, so you'll have to use the C API instead.
I'm not familiar with the DB APIs at all, but it appears plausible to create a small support library in C to actually create an instance of the DB struct, then expose the public members of the struct __db via getter and setter functions.
Your implementation might look something like this:
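use std::os::raw::c_void;
// Functions exported by the small C support library described above.
#[link(name = "rust_dbhelper")]
extern "C" {
    fn create_DB() -> *mut c_void;
    fn free_DB(db: *mut c_void);
}
pub struct DB {
    db: *mut c_void, // private: opaque handle owned by this wrapper
}
impl Drop for DB {
    fn drop(&mut self) {
        // Calling a foreign function is unsafe
        unsafe { free_DB(self.db) };
    }
}
// Private mirror of the application-visible members of struct __db
#[repr(C)]
struct DBAppMembers {
    pgsize: u32,
    priority: DBCachePriority, // DBCachePriority is assumed to be declared elsewhere
    // Additional members omitted for brevity
}
impl DB {
    pub fn new() -> DB {
        DB {
            db: unsafe { create_DB() },
        }
    }
    pub fn set_pgsize(&mut self, pgsize: u32) {
        unsafe {
            // Reinterpret the opaque handle as a pointer to the mirrored members
            let x = self.db as *mut DBAppMembers;
            (*x).pgsize = pgsize;
        }
    }
    // Additional methods omitted for brevity
}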
You can save yourself from some additional work by specifically calling C functions with the DB.db member as a parameter, but that requires working in an unsafe context, which should probably be avoided where possible. Otherwise, each function exported by libdb will need to have its own wrapper in your native struct DB.

Related

Using dyn async traits (with async-trait crate) in spawned tokio task

I'm working on an asynchronous rust application which utilizes tokio. I'd also like to define some trait methods as async and have opted for the async-trait crate rather than the feature in the nightly build so that I can use them as dyn objects. However, I'm running into issues trying to use these objects in a task spawned with tokio::spawn. Here's a minimal complete example:
use std::time::Duration;
use async_trait::async_trait;
#[tokio::main]
async fn main() {
// These two lines are based on the examples for dyn traits in the async-trait crate
let value = MyStruct::new();
let object = &value as &dyn MyTrait;
tokio::spawn(async move {
object.foo().await;
});
}
#[async_trait]
trait MyTrait {
async fn foo(&self);
}
struct MyStruct {}
impl MyStruct {
fn new() -> MyStruct {
MyStruct {}
}
}
#[async_trait]
impl MyTrait for MyStruct {
async fn foo(&self) {
tokio::time::sleep(Duration::from_secs(1)).await;
}
}
When I compile this I get the following output:
error: future cannot be sent between threads safely
--> src/main.rs:11:18
|
11 | tokio::spawn(async move {
| __________________^
12 | | object.foo().await;
13 | | });
| |_____^ future created by async block is not `Send`
|
= help: the trait `Sync` is not implemented for `dyn MyTrait`
note: captured value is not `Send` because `&` references cannot be sent unless their referent is `Sync`
--> src/main.rs:12:9
|
12 | object.foo().await;
| ^^^^^^ has type `&dyn MyTrait` which is not `Send`, because `dyn MyTrait` is not `Sync`
note: required by a bound in `tokio::spawn`
--> /home/wilyle/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/task/spawn.rs:163:21
|
163 | T: Future + Send + 'static,
| ^^^^ required by this bound in `spawn`
error: could not compile `async-test` due to previous error
(The results are similar when making object boxed with let object: Box<dyn MyTrait> = Box::new(MyStruct::new()); and when moving the construction fully inside the tokio::spawn call)
By messing around and trying a few things I found that I could solve the issue by boxing object and adding additional trait bounds. Replacing the first two lines of main in my example with the following seems to work just fine:
let object: Box<dyn MyTrait + Send + Sync> = Box::new(MyStruct::new());
So I have two questions:
1. Why doesn't my original example work? Is it some inconsistency between the two libraries I'm trying to use, or am I approaching async programming in Rust incorrectly?
2. Is the solution of adding additional trait bounds the right way to solve this? I'm rather new to Rust and have only been programming with it for a few months, so I wouldn't be surprised to hear I'm just approaching this wrong.
If you're not sure what Send and Sync mean, check out their documentation in std::marker. Something to note is that if T is Sync, then &T is Send.
Question #2 is simple: yes this is the right way to do it. async-trait uses Pin<Box<dyn Future + Send>> as its return type for basically the same reasons. Note that you can only add auto traits to trait objects.
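The same bounds show up with plain OS threads, which makes them easy to try without tokio. The sketch below uses only the standard library; Greeter, English, greet_on_thread and greet_by_ref are hypothetical names invented for this illustration. An owned Box<dyn Greeter + Send> can be moved into a thread, while a shared reference can only cross threads when the trait object is also Sync.

```rust
use std::thread;

// Hypothetical trait and type, standing in for MyTrait and MyStruct.
trait Greeter {
    fn greet(&self) -> String;
}

struct English;

impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

// Moving an owned trait object to another thread needs `+ Send`
// (and `'static`, which Box<dyn Trait> has by default).
fn greet_on_thread(g: Box<dyn Greeter + Send>) -> String {
    thread::spawn(move || g.greet()).join().unwrap()
}

// Sharing a reference with another thread needs `+ Sync`,
// because &T is Send only when T is Sync.
fn greet_by_ref(g: &(dyn Greeter + Sync)) -> String {
    thread::scope(|s| s.spawn(|| g.greet()).join().unwrap())
}
```

Removing + Send from the first signature or + Sync from the second produces the same kind of "cannot be sent between threads safely" error as in the question.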
For question #1, there are two issues: Send and 'static.
Send
When you cast something as dyn MyTrait, you're removing all the original type information and replacing it with the type dyn MyTrait. That means you lose the auto-implemented Send and Sync traits on MyStruct. The tokio::spawn function requires Send.
This issue isn't inherent to async; it arises because tokio::spawn runs the future on its thread pool, possibly sending it to another thread. You can run the future without tokio::spawn, for example like this:
fn main() {
let runtime = tokio::runtime::Runtime::new().unwrap();
let value = MyStruct::new();
let object = &value as &dyn MyTrait;
runtime.block_on(object.foo());
}
The block_on function runs the future on the current thread, so Send is not necessary. And it blocks until the future is done, so 'static is also not needed. This is great for things that are created at runtime and contain the entire logic of the program, but for dyn Trait types you usually have other things going on that makes this not as useful.
'static
When something requires 'static, it means that all references need to live as long as 'static. One way of satisfying that is to remove all references. In an ideal world you could do:
let object = value as dyn MyTrait;
However, rust doesn't support dynamically sized types on the stack or as function arguments. We're trying to remove all references, so &dyn MyTrait isn't going to work (unless you leak or have a static variable). Box lets you have ownership over dynamically sized types by putting them on the heap, eliminating the lifetime.
You need Send for this because the upgrade from Sync to Send only happens with &, not Box. Instead, Box<T> is Send when T is Send.
Sync is more subtle. While spawn doesn't require Sync, the async block does require the trait object to be Send + Sync in order to be Send itself. Since foo takes &self, it returns a Future that holds &self. That future is then polled, so in between polls &self could be sent between threads. And as before, &T is Send if T is Sync. However, if you change it to foo(&mut self), it compiles without + Sync. That makes sense, since the compiler can now check that it's not being used concurrently, but it seems to me that the &self version could be allowed in the future.
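The rules used in this answer can be checked at compile time with a small helper. This is a sketch using only the standard library; is_send, is_sync and auto_trait_checks are hypothetical helper names, and MyTrait here is an empty stand-in for the trait in the question. Each call compiles only if the bound actually holds.

```rust
use std::cell::Cell;

// These functions compile only when T satisfies the bound.
fn is_send<T: Send>() -> bool {
    true
}

fn is_sync<T: Sync>() -> bool {
    true
}

// Empty stand-in for the trait from the question.
trait MyTrait {}

fn auto_trait_checks() -> bool {
    // i32 is Sync, therefore &i32 is Send.
    is_sync::<i32>()
        && is_send::<&i32>()
        // Cell<i32> is Send but not Sync, so `is_send::<&Cell<i32>>()` would not compile.
        && is_send::<Cell<i32>>()
        // Box<T> is Send when T is Send; no Sync needed for an owned box.
        && is_send::<Box<dyn MyTrait + Send>>()
        // A shared reference to a trait object is Send only with `+ Sync`.
        && is_send::<&(dyn MyTrait + Sync)>()
        // An exclusive reference only needs the referent to be Send.
        && is_send::<&mut (dyn MyTrait + Send)>()
}
```

The last line corresponds to the foo(&mut self) case: an exclusive borrow can move between threads without the referent being Sync.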

Rust, std::cell::Cell - get immutable reference to inner data

Looking through the documentation for std::cell::Cell, I don't see anywhere how I can retrieve a non-mutable reference to inner data. There is only the get_mut method: https://doc.rust-lang.org/std/cell/struct.Cell.html#method.get_mut
I don't want to use this function because I want to have &self instead of &mut self.
I found an alternative solution of taking the raw pointer:
use std::cell::Cell;
struct DbObject {
key: Cell<String>,
data: String
}
impl DbObject {
pub fn new(data: String) -> Self {
Self {
key: Cell::new("some_uuid".into()),
data,
}
}
pub fn assert_key(&self) -> &str {
// setup key in the future if is empty...
let key = self.key.as_ptr();
unsafe {
let inner = key.as_ref().unwrap();
return inner;
}
}
}
fn main() {
let obj = DbObject::new("some data...".into());
let key = obj.assert_key();
println!("Key: {}", key);
}
Is there any way to do this without using unsafe? If not, perhaps RefCell will be more practical here?
Thank you for help!
First off, if you have a &mut T, you can trivially get a &T out of it. So you can use get_mut to get &T.
But to get a &mut T from a Cell<T> you need that cell to be mutable, as get_mut takes a &mut self parameter. And this is by design the only way to get a reference to the inner object of a cell.
By requiring the use of a &mut self method to get a reference out of a cell, you make it possible to check for exclusive access at compile time with the borrow checker. Remember that a cell enables interior mutability, and has a method set(&self, val: T), that is, a method that can modify the value of a non-mut binding! If there was a get(&self) -> &T method, the borrow checker could not ensure that you do not hold a reference to the inner object while setting the object, which would not be safe.
TL;DR: By design, you can't get a &T out of a non-mut Cell<T>. Use get_mut (which requires a mut cell), or set/replace (which work on a non-mut cell). If this is not acceptable, then consider using RefCell, which can get you a &T out of a non-mut instance, at some runtime cost.
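As a rough illustration of the RefCell route, here is a sketch using only the standard library. KeyedObject, key and set_key are hypothetical names loosely modelled on the DbObject from the question. The getter takes &self and returns a Ref guard that dereferences to &str; the borrow is checked at runtime instead of at compile time.

```rust
use std::cell::{Ref, RefCell};

// Hypothetical type modelled on the question's DbObject.
struct KeyedObject {
    key: RefCell<String>,
}

impl KeyedObject {
    fn new(key: &str) -> Self {
        Self { key: RefCell::new(key.to_string()) }
    }

    // Shared access through &self; the returned guard derefs to &str.
    fn key(&self) -> Ref<'_, str> {
        Ref::map(self.key.borrow(), |s| s.as_str())
    }

    // Mutation through &self; panics if a Ref from key() is still alive.
    fn set_key(&self, key: &str) {
        *self.key.borrow_mut() = key.to_string();
    }
}
```

The runtime cost mentioned above is the borrow counter inside RefCell, and the failure mode is a panic if set_key is called while a guard returned by key() is still held.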
In addition to @mcarton's answer: to keep interior mutability sound, that is, to prevent a mutable reference from coexisting with other references, there are three different approaches:
1. Use unsafe, with the possibility of undefined behavior. This is what UnsafeCell does.
2. Perform runtime checks, at the cost of runtime overhead. This is the approach RefCell, RwLock and Mutex use.
3. Restrict the operations that can be done with the abstraction. This is what Cell, Atomic* and OnceCell (and thus the Lazy types built on it) do (note that the thread-safe types also have runtime overhead because they need to provide some sort of locking). Each provides a different set of allowed operations:
   - Cell and Atomic* do not let you get a reference to the contained value, only replace it as a whole (basically get() and set(), though convenience methods such as swap() are provided on top of these). Projection (cell-of-slice to slice-of-cells) is also available for Cell (field projection is possible, but not provided as part of std).
   - OnceCell allows you to assign only once and only then take a shared reference, guaranteeing that when you assign you have no references, and that while you have shared references you can no longer assign.
Thus, when you need to be able to take a reference into the content, you cannot choose Cell as it was not designed for that - the obvious choice is RefCell, indeed.
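If the key in the question is only ever set once ("setup key in the future if is empty"), the OnceCell approach from the list above may also fit. The sketch below assumes std::cell::OnceCell, which is available in the standard library from Rust 1.70; LazyKeyObject and has_key are hypothetical names, and the fixed "some_uuid" string stands in for whatever key generation the real code would do.

```rust
use std::cell::OnceCell;

// Hypothetical variant of the question's DbObject with a write-once key.
struct LazyKeyObject {
    key: OnceCell<String>,
    data: String,
}

impl LazyKeyObject {
    fn new(data: String) -> Self {
        Self { key: OnceCell::new(), data }
    }

    // Initializes the key on first use, then hands out a plain &str.
    fn assert_key(&self) -> &str {
        self.key.get_or_init(|| "some_uuid".to_string()).as_str()
    }

    fn has_key(&self) -> bool {
        self.key.get().is_some()
    }

    fn data(&self) -> &str {
        &self.data
    }
}
```

Because the value can never be replaced after initialization, handing out &str from &self is sound without any unsafe code or runtime borrow counter.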

Pointers as function arguments when implementing a structure

Why is there a & symbol before self in the full_name() function but not in the to_tuple() function? When I look at them, the usage of self is similar in both functions, so why use &? Also, when I add & to to_tuple() or delete it from full_name(), it throws an error. Can someone explain it?
fn full_name(&self) -> String {
format!("{} {}", self.first_name, self.last_name)
}
fn to_tuple(self) -> (String, String) {
(self.first_name, self.last_name)
}
full_name does not consume self, it uses a reference via &self: The members are only used via references as arguments to format!(), so a reference suffices.
to_tuple (as the name to_... suggests) consumes self: It moves the members from self into the returned tuple. Since the original self is no longer valid memory after the move (self no longer owns the memory), it has to be consumed, hence a move via self.
You can change full_name to take self, that is, to move ownership. This would be inconvenient, though, as calling the function would consume the struct without any need to.
to_tuple could be changed to not consume self, but it would then need to .clone() (make a copy of) the members, which is costly.
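The three variants can be put side by side in a short sketch. The Person struct and the to_tuple_cloned name are assumptions, since the question does not show the surrounding type; only full_name and to_tuple come from the original code.

```rust
// Hypothetical struct; the question only shows the two methods.
struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    // Borrows self: only reads the fields, so the caller keeps the value.
    fn full_name(&self) -> String {
        format!("{} {}", self.first_name, self.last_name)
    }

    // Consumes self: the Strings are moved into the tuple without copying.
    fn to_tuple(self) -> (String, String) {
        (self.first_name, self.last_name)
    }

    // Borrowing alternative: the value stays usable, at the cost of two clones.
    fn to_tuple_cloned(&self) -> (String, String) {
        (self.first_name.clone(), self.last_name.clone())
    }
}
```

After calling to_tuple the original value can no longer be used, whereas full_name and to_tuple_cloned can be called any number of times.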

Create interface wrapping existing types with pointer-receiver methods

I need to test an app which uses Google Cloud Pubsub, and so must wrap its types pubsub.Client and pubsub.Subscriber for testing purposes. However, despite several attempts I can't get an interface around them which compiles.
The definitions of the methods I'm trying to wrap are:
func (s *Subscription) Receive(
ctx context.Context, f func(context.Context, *Message)) error
func (c *Client) Subscription(id string) *Subscription
Here is the current code. The Receiver interface (wrapper around Subscriber) seems to work, but I suspect it may need to change in order to fix SubscriptionMaker, so I've included both.
Note: I've tried several variations of where to reference and dereference pointers, so please don't tell me to change that unless you have an explanation of why your suggested configuration is the correct one or you've personally verified it compiles.
import (
"context"
"cloud.google.com/go/pubsub"
)
type Receiver interface {
Receive(context.Context, func(ctx context.Context, msg *pubsub.Message)) (err error)
}
// Pubsub subscriptions implement Receiver
var _ Receiver = &pubsub.Subscription{}
type SubscriptionMaker interface {
Subscription(name string) (s Receiver)
}
// Pubsub clients implement SubscriptionMaker
var _ SubscriptionMaker = pubsub.Client{}
Current error message:
common_types.go:21:5: cannot use "cloud.google.com/go/pubsub".Client literal (type "cloud.google.com/go/pubsub".Client) as type SubscriptionMaker in assignment:
"cloud.google.com/go/pubsub".Client does not implement SubscriptionMaker (wrong type for Subscription method)
have Subscription(string) *"cloud.google.com/go/pubsub".Subscription
want Subscription(string) Receiver
First, for most uses, the pstest package is probably a much easier approach for testing pubsub. But of course, your specific question can apply to any library, and the approach below can be useful for many things, not just mocking pubsub.
Your broader goal of using interfaces to mock a library like this is doable. But it is complicated when the library you wish to mock returns concrete types that you cannot mock (probably due to unexported fields). The approach is often more involved than it is worth, as there may be easier ways to test your code.
But if you're intent on doing this, the approach you must take is to wrap the entire package in interfaces, not just the specific methods you wish to mock.
You would need to wrap any types that you wish to mock which are returned by or accepted by your interface, too. This usually means you also need to modify your production code (not just your test code), so this can sometimes be a deal-breaker for existing code bases.
Where I have usually done this before is when mocking something like the standard library's sql driver, but the same approach can be applied here. In essence, you would need to create a wrapper package for your pubsub library, which you use even in your production code. Again, this can be quite intrusive on existing codebases, but for the sake of illustration, using your defined interfaces:
package mypubsub
import (
    "context"
    "cloud.google.com/go/pubsub"
)
// Receiver matches the signature of (*pubsub.Subscription).Receive.
type Receiver interface {
    Receive(context.Context, func(context.Context, *pubsub.Message)) error
}
type SubscriptionMaker interface {
    Subscription(string) Receiver
}
You can then wrap the default implementation, for use in production code:
// defaultClient wraps the default pubsub Client functionality.
type defaultClient struct {
    *pubsub.Client
}
// Subscription returns the concrete *pubsub.Subscription as a Receiver.
func (d defaultClient) Subscription(name string) Receiver {
    return d.Client.Subscription(name)
}
Naturally, you'd need to expand this package to wrap most or all of the pubsub package you're using. This can be a bit daunting.
But once you've done that, then use your mypubsub package everywhere in your code, instead of directly depending on the pubsub package. And now you can easily swap out a mock implementation anywhere you need for testing.
It can't be done.
When defining the type signature of a method on an interface, it must match exactly. func (c *Client) Subscription(id string) *Subscription returns a *Subscription, and a *Subscription is a valid Receiver, but it does not count as conforming to the interface method Subscription(string) Receiver. Go requires precise matching for function signatures, not the duck-typing style that it usually uses for interfaces.

How to return reference to locally allocated struct/object? AKA error: `foo` does not live long enough

Here's a simplified example of what I'm doing:
struct Foo ...
impl io::Read for Foo ...
fn problem<'a>() -> io::Result<&'a mut io::Read> {
// foo does not live long enough, because it gets allocated on the stack
let mut foo = Foo{ v: 42 };
Ok(&mut foo)
}
Rust playground is here.
Obviously, the problem is that foo is allocated on the stack, so if we return a reference to it, the reference outlives the object.
In C, you'd get around this by using malloc to allocate the object on the heap, and the caller would need to know to call free when appropriate. In a GCed language, this would just work since foo would stick around until there are no references to it. Rust is really clever, and kind of in-between, so I'm not sure what my options are.
I think one option would be to return a managed pointer type. Is Box the most appropriate? (I found a guide to pointers in rust, but it is way outdated.)
The reason I'm returning a reference is that in reality I need to return any of several structs which implement Read. I suppose another option would be to create an enum to wrap each of the possible structs. That would avoid heap allocation, but seems needlessly awkward.
Are there other options I haven't thought of?
Replacing the reference with a Box compiles successfully:
fn problem() -> io::Result<Box<dyn io::Read>> {
    let foo = Foo { v: 42 };
    // The Box owns foo on the heap, so no lifetime parameter is needed
    Ok(Box::new(foo))
}
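A slightly fuller sketch shows how this covers the "any of several structs which implement Read" case from the question. It uses only the standard library; the field type of Foo, its Read implementation and the make_reader function are assumptions made for illustration, since the question elides them.

```rust
use std::io::{self, Cursor, Read};

// Assumed shape of the question's Foo; the original definition is elided.
struct Foo {
    v: u8,
}

// Illustrative Read impl: fills the buffer with the byte `v`, never reaching EOF.
impl Read for Foo {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        for b in buf.iter_mut() {
            *b = self.v;
        }
        Ok(buf.len())
    }
}

// Returns one of several concrete readers behind the same boxed trait object.
fn make_reader(in_memory: bool) -> io::Result<Box<dyn Read>> {
    if in_memory {
        Ok(Box::new(Cursor::new(vec![1u8, 2, 3])))
    } else {
        Ok(Box::new(Foo { v: 42 }))
    }
}
```

The caller owns the Box, and the reader is freed automatically when the Box is dropped, which is the Rust counterpart of the malloc/free pattern described in the question.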
Can you use a static variable? It looks like in both C and Rust a static variable lasts as long as the program does, even if it's a static local.
http://rustbyexample.com/scope/lifetime/static_lifetime.html
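For comparison, the enum option raised in the question needs neither a heap allocation nor a static. The sketch below uses only the standard library; AnyReader, its variants and make_any_reader are hypothetical names, and the two wrapped reader types are arbitrary stand-ins for the "several structs" the question mentions.

```rust
use std::io::{self, Cursor, Read};

// One variant per concrete reader type that may be returned.
enum AnyReader {
    Memory(Cursor<Vec<u8>>),
    Nothing(io::Empty),
}

// Forward read() to whichever reader the enum currently holds.
impl Read for AnyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            AnyReader::Memory(c) => c.read(buf),
            AnyReader::Nothing(e) => e.read(buf),
        }
    }
}

// The enum is returned by value, so no Box and no lifetime parameter are involved.
fn make_any_reader(in_memory: bool) -> AnyReader {
    if in_memory {
        AnyReader::Memory(Cursor::new(vec![7, 8]))
    } else {
        AnyReader::Nothing(io::empty())
    }
}
```

The trade-off is that every new reader type requires a new variant and a new match arm, which is the awkwardness the question anticipates.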
