When designing a program in a functional style, I think about designing a base layer of functions that operate on a single object. Then, if I need to operate on a collection of those objects I start building on top of that base layer using traditional functional glue like mapping, filtering, reducing, etc.
For example, lets say I have a DB backed application that has Users and Tasks, where Users are assigned Tasks.
I may have a function defined like
def doesUserPerformTask?(taskId, userId)
// Go to DB to see if this userId does this taskId
// return userid if success or else nil
end
Later down the road, I am given a list of user id's and want to know which of them perform task X. Perfect, I already have the function doesUserPerformTask? and it has been battle tested all over other places in the code, so I can just map over the user id list and call that function for each of them and then filter the results.
While this is a great benefit of functional design, I have an efficiency problem that each element (i.e. user id) passed to map requires a DB hit. I now need to create an entirely new function that operates on a list of userId's.
I keep running into this problem when designing DB backed programs in a functional style, where I keep having to write new functions that don't build off the base layer of functions and ending up with lots of functions written specifically for both operating on single items and collections of items.
Is there a better way to organize DB backed programs written in a functional style?
Why not pass the actual object into the function?
def doesUserPerformTask?(task, user)
// Check the user object directly
// return true or (false|nil)
end
Then write a wrapper that will fetch the user and task from the DB
def doesUserPerformTaskFromDB?(taksId, userId)
// DB calls here
if doesUserPerformTask?(task, user) ? user.id : nil
end
Then write a wrapper for collections
def whichUsersPerformTask?(task)
// fetch users from DB
// map non-db function over collection
end
Then again, unless you're going to use that user collection for something else wouldn't it be better to depend upon the DB query to get the users you need (via whichever query language)? Seems like there are a few options that make this both efficient and DRY.
Related
What would a Cosmos stored procedure look like that would set the PumperID field for every record to a default value?
We are needing to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have this), and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single purpose .NET Core console app and use the SDK to query for the items that require changes, and perform the updates. I've used this approach to rename properties, for example. This works for any Cosmos database and doesn't require deploying any stored procs or otherwise.
Ideally, it is designed to be idempotent so it can be run multiple times if several passes are required to catch new data coming in. If the item count is large, one could optionally use the SDK operations to scale up throughput on start and scale back down when finished. For performance run it close to the endpoint on an Azure Virtual Machine or Function.
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure function or VM. See Change Feed Processor to learn more and examples to start with.
With Change Feed you will want to start it to read from the beginning of the container. To do this see Reading Change Feed from the beginning.
Then within your delegate you will read each item off the change feed, check it's value and then call ReplaceItemAsync() to write back if it needed to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
Console.WriteLine("Started handling changes...");
foreach (MyType item in changes)
{
if(item.PumperID == null)
{
item.PumperID = "some value"
//call ReplaceItemAsync(), etc.
}
}
Console.WriteLine("Finished handling changes.");
}
I have several functions that deal with database interactions. (like readModelById, updateModel, findModels, etc) that I try to use in a functional style.
In OOP, I'd create a class that takes DB-connection-parameters in the constructor, creates the database-connection and save the DB-handle in the instance. The functions then would just use the DB-handle from "this".
What's the best way in FP to deal with this? I don't want to hand around the DB handle throughout the entire application. I thought about partial application on the functions to "bake in" the handle, but that creates ugly boilerplate code, doing it one by one and handing it back.
What's the best practice/design pattern for things like this in FP?
There is a parallel to this in OOP that might suggest the right approach is to take the database resource as parameter. Consider the DB implementation in OOP using SOLID principles. Due to Interface Segregation Principle, you would end up with an interface per DB method and at least one implementation class per interface.
// C#
public interface IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day);
}
public class GetRegistrationsImpl : IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day)
{
...
}
private readonly DbResource _db;
public GetRegistrationsImpl(DbResource db)
{
_db = db;
}
}
Then to execute your use case, you pass in only the dependencies you need instead of the whole set of DB operations. (Assume that ISaveRegistration exists and is defined like above).
// C#
public async Task Register(
IGetRegistrations a,
ISaveRegistration b,
RegisterRequest requested
)
{
var registrations = await a.GetRegistrations(requested.Date);
// examine existing registrations and determine request is valid
// throw an exception if not?
...
return await b.SaveRegistration( ... );
}
Somewhere above where this code is called, you have to new up the implementations of these interfaces and provide them with DbResource.
var a = new GetRegistrationsImpl(db);
var b = new SaveRegistrationImpl(db);
...
return await Register(a, b, request);
Note: You could use a DI framework here to attempt to avoid some boilerplate. But I find it to be borrowing from Peter to pay Paul. You pay as much in having to learn a DI framework and how to make it behave as you do for wiring the dependencies yourself. And it is another tech new team members have to learn.
In FP, you can do the same thing by simply defining a function which takes the DB resource as a parameter. You can pass functions around directly instead of having to wrap them in classes implementing interfaces.
// F#
let getRegistrations (db: DbResource) (day: DateTime) =
...
let saveRegistration (db: DbResource) ... =
...
The use case function:
// F#
let register fGet fSave request =
async {
let! registrations = fGet request.Date
// call your business logic here
...
do! fSave ...
}
Then to call it you might do something like this:
register (getRegistrations db) (saveRegistration db) request
The partial application of db here is analogous to constructor injection. Your "losses" from passing it to multiple functions is minimal compared to the savings of not having to define interface + implementation for each DB operation.
Despite being in a functional-first language, the above is in principle the same as the OO/SOLID way... just less lines of code. To go a step further into the functional realm, you have to work on eliminating side effects in your business logic. Side effects can include: current time, random numbers, throwing exceptions, database operations, HTTP API calls, etc.
Since F# does not require you to declare side effects, I designate a border area of code where side effects should stop being used. For me, the use case level (register function above) is the last place for side effects. Any business logic lower than that, I work on pushing side effects up to the use case. It is a learning process to do that, so do not be discouraged if it seems impossible at first. Just do what you have to for now and learn as you go.
I have a post that attempts to set the right expectations on the benefits of FP and how to get them.
I'm going to add a second answer here taking an entirely different approach. I wrote about it here. This is the same approach used by MVU to isolate decisions from side effects, so it is applicable to UI (using Elmish) and backend.
This is worthwhile if you need to interleave important business logic with side effects. But not if you just need to execute a series of side effects. In that case just use a block of imperative statements, in a task (F# 6 or TaskBuilder) or async block if you need IO.
The pattern
Here are the basic parts.
Types
Model - The state of the workflow. Used to "remember" where we are in the workflow so it can be resumed after side effects.
Effect - Declarative representation of the side effects you want to perform and their required data.
Msg - Represents events that have happened. Primarily, they are the results of side effects. They will resume the workflow.
Functions
update - Makes all the decisions. It takes in its previous state (Model) and a Msg and returns an updated state and new Effects. This is a pure function which should have no side effects.
perform - Turns a declared Effect into a real side effect. For example, saving to a database. Returns a Msg with the result of the side effect.
init - Constructs an initial Model and starting Msg. Using this, a caller gets the data it needs to start the workflow without having to understand the internal details of update.
I jotted down an example for a rate-limited emailer. It includes the implementation I use on the backend to package and run this pattern, called Ump.
The logic can be tested without any instrumentation (no mocks/stubs/fakes/etc). Declare the side effects you expect, run the update function, then check that the output matches with simple equality. From the linked gist:
// example test
let expected = [SendEmail email1; ScheduleSend next]
let _, actual = Ump.test ump initArg [DueItems ([email1; email2], now)]
Assert.IsTrue(expected = actual)
The integrations can be tested by exercising perform.
This pattern takes some getting-used-to. It reminds me a bit of an Erlang actor or a state machine. But it is helpful when you really need your business logic to be correct and well-tested. It also happens to be a proper functional pattern.
I have a primary node in my database called 'questions', when I create a ref to that node and bring it into my project as a $asObject(), I can modify the individual questions and $save() the collection without any problems, however as soon as I try to limit the object, by priority, the $save() deletes everything off of the object!
this works fine:
db.questions = $firebase(fb.questions).$asObject();
// later :
db.questions.$save();
// db.questions is an object with many 'questions', which I can edit and resave as I please
but as soon as I switch my code to this:
db.questions = $firebase(fb.questions.startAt(auth.user.id).endAt(auth.user.id)).$asObject();
// later :
db.questions.$save();
// db.questions is an empty firebase object without any 'questions!'
Is there some limitation to limited objects (pun not intended) and their ability to be changed and saved?? The saving actually saves updates to the questions to the database, but somehow nukes the local $firebase object...
First line of synchronized arrays ($asArray) documentation:
Synchronized arrays should be used for any list of objects that will be sorted, iterated, and which have unique ids.
First line of synchronized objects ($asObject) documentation:
Objects are useful for storing key/value pairs, and singular records that are not used as a collection.
As demonstrated, if you are going to work with a collection and employ limit, it would behoove you to use a tool designed for collections (i.e. $asArray).
If you were to recreate the behavior of $save using the Firebase SDK, it would look like this:
var ref = new Firebase(URL).limit(10);
// ref.set(data); // throws an error!
ref.ref().set(data); // replaces the entire path; same as $save
Thus, the behavior here exactly matches the SDK. You cannot, technically, call set() on a query instance and this doesn't make any sense, really. What does limit(10) mean to a JSON object? If you call set, which 10 unordered keys should be set? There is no correlation here and limit() really only makes sense with a collection of data, not a list of key/value pairs.
Hope that helps.
On my meteor project users can post events and they have to choose (via an autocomplete) in which city it will take place. I have a full list of french cities and it will never be updated.
I want to use a collection and publish-subscribes based on the input of the autocomplete because I don't want the client to download the full database (5MB). Is there a way, for performance, to tell meteor that this collection is "static"? Or does it make no difference?
Could anyone suggest a different approach?
When you "want to tell the server that a collection is static", I am aware of two potential optimizations:
Don't observe the database using a live query because the data will never change
Don't store the results of this query in the merge box because it doesn't need to be tracked and compared with other data (saving memory and CPU)
(1) is something you can do rather easily by constructing your own publish cursor. However, if any client is observing the same query, I believe Meteor will (at least in the future) optimize for that so it's still just one live query for any number of clients. As for (2), I am not aware of any straightforward way to do this because it could potentially mess up the data merging over multiple publications and subscriptions.
To avoid using a live query, you can manually add data to the publish function instead of returning a cursor, which causes the .observe() function to be called to hook up data to the subscription. Here's a simple example:
Meteor.publish(function() {
var sub = this;
var args = {}; // what you're find()ing
Foo.find(args).forEach(function(document) {
sub.added("client_collection_name", document._id, document);
});
sub.ready();
});
This will cause the data to be added to client_collection_name on the client side, which could have the same name as the collection referenced by Foo, or something different. Be aware that you can do many other things with publications (also, see the link above.)
UPDATE: To resolve issues from (2), which can be potentially very problematic depending on the size of the collection, it's necessary to bypass Meteor altogether. See https://stackoverflow.com/a/21835534/586086 for one way to do it. Another way is to just return the collection fetch()ed as a method call, although this doesn't have the benefits of compression.
From Meteor doc :
"Any change to the collection that changes the documents in a cursor will trigger a recomputation. To disable this behavior, pass {reactive: false} as an option to find."
I think this simple option is the best answer
You don't need to publish your whole collection.
1.Show autocomplete options only after user has inputted first 3 letters - this will narrow your search significantly.
2.Provide no more than 5-10 cities as options - this will keep your recordset really small - thus no need to push 5mb of data to each user.
Your publication should look like this:
Meteor.publish('pub-name', function(userInput){
var firstLetters = new RegExp('^' + userInput);
return Cities.find({name:firstLetters},{limit:10,sort:{name:1}});
});
I have a batch apex class where i'm building collections of websites and emails, so that i can use those collections to filter other other queries which will be made into collections. With all collections set, i want to run through a final loop of the scope to perform business processes.
Mockup:
for(Object o : scope)
{
listEmails.add(o.Email);
listWebsites.add(o.Websites);
}
Map<String, Account> accounts = Gather all accounts where website not in :listWebsties; //Website is key
List<String, Contact> contacts = Gather all contacts where email not in :listEmails; //Email is key
for(Object o : scope)
{
Account = accounts.get(o.website);
Contact = contacts.get(o.Email);
Perform business logic here
}
The problem is when i run this batch it stays processing for hours. When working with a rather small database this works fine. But in working in a larger environment perhaps this is not the best solution.
Can anyone help me speed up the batch process with a more effective approach?
Is there anyway to post the entire batch apex class? Or help understand the data more?
It looks like from your map that all of your accounts (in theory) have unique websites and all of your contacts have unique emails?
I assume you build those maps by hand? That is you loop over the accounts and do a
map.put(account.website,account)?
Do you have any system debug statements to confirm your map sizes?
What happens if there is no account or no contact when you call accounts.get()?
And the business logic - is it more looping?
And are you using Batch variables in a static manner - i.e. you can have a counter to count the total number of records processed. If so, is your variable a list? that can be dangerous of course.
Also what object is your scope object? Not that it matters, but I'd think you'd want to have your scope be the Accounts themselves or the Contacts themselves.
I'd try adding system.debug statements to your batch to verify it's running and to see where the infinite loop may be occurring.