Salesforce Batch Apex Class - Querying Against Large Data Sets - collections

I have a Batch Apex class where I'm building collections of websites and emails, so that I can use those collections to filter other queries, whose results are in turn put into collections. With all collections set, I want to run through a final loop of the scope to perform business processes.
Mockup:
for(Object o : scope)
{
listEmails.add(o.Email);
listWebsites.add(o.Website);
}
Map<String, Account> accounts = Gather all accounts where website not in :listWebsites; //Website is key
Map<String, Contact> contacts = Gather all contacts where email not in :listEmails; //Email is key
for(Object o : scope)
{
Account account = accounts.get(o.Website);
Contact contact = contacts.get(o.Email);
//Perform business logic here
}
The problem is that when I run this batch, it stays in processing for hours. With a rather small database this works fine, but in a larger environment it is perhaps not the best solution.
Can anyone help me speed up the batch process with a more effective approach?

Is there any way you can post the entire Batch Apex class, or help us understand the data better?
From your maps it looks like all of your accounts (in theory) have unique websites and all of your contacts have unique emails?
I assume you build those maps by hand? That is, you loop over the accounts and do a
map.put(account.website,account)?
Do you have any System.debug statements to confirm your map sizes?
What happens if there is no account or no contact when you call accounts.get()?
And the business logic: is it more looping?
Are you keeping state in batch variables across executions, e.g. a counter of the total number of records processed? If so, is that variable a list? That can be dangerous, of course.
Also, what object is your scope object? Not that it matters, but I'd think you'd want your scope to be the Accounts themselves or the Contacts themselves.
I'd try adding System.debug statements to your batch to verify it's running and to see where an infinite loop may be occurring.
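The questions above about hand-built maps, duplicate keys and missing lookups can be illustrated with a small sketch. It is written in Python rather than Apex, uses plain dictionaries in place of SObjects, and the names build_lookup and match_scope are hypothetical. It only shows the pattern of indexing once and then doing constant-time lookups in the final loop, not the original batch class.

```python
from collections import defaultdict


def build_lookup(records, key_field):
    """Group records by key_field so duplicate keys are kept, not overwritten."""
    lookup = defaultdict(list)
    for record in records:
        key = record.get(key_field)
        if key:  # ignore records whose key is missing or blank
            lookup[key].append(record)
    return lookup


def match_scope(scope, accounts, contacts):
    """Return (scope item, matching accounts, matching contacts) for each item."""
    accounts_by_website = build_lookup(accounts, "Website")
    contacts_by_email = build_lookup(contacts, "Email")
    matches = []
    for item in scope:
        # .get() with a default avoids a null result when nothing matches
        found_accounts = accounts_by_website.get(item.get("Website"), [])
        found_contacts = contacts_by_email.get(item.get("Email"), [])
        matches.append((item, found_accounts, found_contacts))
    return matches
```

Grouping into lists makes it visible when two accounts share a website, which a plain map.put would silently hide by keeping only the last one.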


MDriven ECO_ID duplicates

We appear to have a problem with MDriven generating the same ECO_ID for multiple objects. For the most part it seems to happen in conjunction with unexpected process shutdowns and/or server shutdowns, but it does also happen during normal activity.
Our system consists of one ASP.NET application and one WinForms application. The ASP.NET app is set up in IIS to use a single worker process. We have a mixture of WebForms and MVC, including ApiControllers. We're using a rather old version of the ECO packages: 7.0.0.10021. We're on VS 2017, target framework is 4.7.1.
We have it configured to use 64-bit integers for object IDs. Database is Firebird. SQL configuration is set to use ReadCommitted transaction isolation.
As far as I can tell we have configured EcoSpaceStrategyHandler with EcoSpaceStrategyHandler.SessionStateMode.Never, which should mean that EcoSpaces are not reused at all, right? (Why would I even use EcoSpaceStrategyHandler in this case, instead of just creating EcoSpace normally with the new keyword?)
We have created MasterController : Controller and MasterApiController : ApiController classes that we use for all our controllers. These have an EcoSpace property that simply does this:
if (ecoSpace == null)
{
if (ecoSpaceStrategyHandler == null)
ecoSpaceStrategyHandler = new EcoSpaceStrategyHandler(
EcoSpaceStrategyHandler.SessionStateMode.Never,
typeof(DiamondsEcoSpace),
null,
false
);
ecoSpace = (DiamondsEcoSpace)ecoSpaceStrategyHandler.GetEcoSpace();
}
return ecoSpace;
I.e. if no strategy handler has been created, create one specifying no pooling and no session state persisting of eco spaces. Then, if no ecospace has been fetched, fetch one from the strategy handler. Return the ecospace. Is this an acceptable approach? Why would it be better than simply doing this:
if (ecoSpace == null)
ecoSpace = new DiamondsEcoSpace();
return ecoSpace;
In aspx we have a master page that has an EcoSpaceManager. It has been configured to use a pool but SessionStateMode is Never. It has EnableViewState set to true. Is this acceptable? Does it mean that EcoSpaces will be pooled but inactivated between round trips?
It is possible that we receive multiple incoming API calls in tight succession, so that one API call hasn't been completed before the next one comes in. I assume that this means that multiple instances of MasterApiController can execute simultaneously but in separate threads. There may of course also be MasterController instances executing MVC requests and also the WinForms app may be running some batch job or other.
But as far as I understand id reservation is made at the beginning of any UpdateDatabase call, in this way:
update "ECO_ID" set "BOLD_ID" = "BOLD_ID" + :N;
select "BOLD_ID" from "ECO_ID";
If the returned value is K, this will reserve N new IDs ranging from K - N to K - 1. Using ReadCommitted transactions everywhere should ensure that the update locks the ID data row, forcing any concurrent save operations to wait, then fetches the update result without interference from other transactions, then commits. At that point any other pending save operation can proceed with its own ID reservation. I fail to see how this could result in the same ID being used for multiple objects.
I should note that it does seem like it sometimes produces id duplicates within one single UpdateDatabase, i.e. when saving a set of new related objects, some of them end up with the same id. I haven't really confirmed this though.
Any ideas what might be going on here? What should I look for?
The issue is most likely that you use ReadCommitted isolation.
This allows two systems to start a transaction at the same time, read the current value, each increase it by its own batch size, and then save one after the other.
You must use Serializable isolation for key generation, i.e. only read things that are not currently part of a write operation.
MDriven uses two settings for isolation level: UpdateIsolationLevel and FetchIsolationLevel.
Set your UpdateIsolationLevel to Serializable.
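The interleaving described in this answer can be shown with a toy model. The sketch below is plain Python with a dictionary standing in for the ECO_ID row; it does not reproduce Firebird's locking or MDriven's actual code, and the function names are hypothetical. It only illustrates why two writers that both read the counter before either writes it back end up with overlapping ID ranges.

```python
def reserve_ids(counter, n):
    """Bump the counter by n in one step and return the reserved range K - n .. K - 1."""
    counter["BOLD_ID"] += n
    k = counter["BOLD_ID"]
    return range(k - n, k)


def interleaved_reservations(counter, n_a, n_b):
    """Model two writers that both read the counter before either saves."""
    seen_by_a = counter["BOLD_ID"]
    seen_by_b = counter["BOLD_ID"]          # B sees the same starting value as A
    counter["BOLD_ID"] = seen_by_a + n_a    # A saves
    counter["BOLD_ID"] = seen_by_b + n_b    # B saves and overwrites A's update
    return (range(seen_by_a, seen_by_a + n_a),
            range(seen_by_b, seen_by_b + n_b))


# Serialized reservations never overlap.
serial = {"BOLD_ID": 100}
first = reserve_ids(serial, 5)
second = reserve_ids(serial, 3)

# Interleaved reservations hand out the same IDs twice.
racy = {"BOLD_ID": 100}
range_a, range_b = interleaved_reservations(racy, 5, 3)
duplicates = sorted(set(range_a) & set(range_b))
```

In the serialized case the second caller starts where the first one ended; in the interleaved case both ranges start at the same value, so every ID in the shorter range is a duplicate.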

How to set a field for every document in a Cosmos db?

What would a Cosmos stored procedure look like that would set the PumperID field for every record to a default value?
We are needing to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have this), and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single purpose .NET Core console app and use the SDK to query for the items that require changes, and perform the updates. I've used this approach to rename properties, for example. This works for any Cosmos database and doesn't require deploying any stored procs or otherwise.
Ideally, it is designed to be idempotent so it can be run multiple times if several passes are required to catch new data coming in. If the item count is large, one could optionally use the SDK operations to scale up throughput on start and scale back down when finished. For performance run it close to the endpoint on an Azure Virtual Machine or Function.
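A minimal sketch of what "idempotent" means for this repair, assuming the documents have already been read into memory as dictionaries. It is plain Python with no Cosmos SDK calls; the function name and the default value are hypothetical, and writing each changed document back to the container is left out.

```python
DEFAULT_PUMPER_ID = "default-pumper"  # hypothetical default value


def repair_documents(documents, default=DEFAULT_PUMPER_ID):
    """Set PumperID to the default on documents that have the field.

    Returns the documents that were actually changed, so a caller would
    only need to write those back.
    """
    changed = []
    for doc in documents:
        if "PumperID" not in doc:
            continue  # not all documents have this field; leave them alone
        if doc["PumperID"] == default:
            continue  # already repaired on an earlier pass
        doc["PumperID"] = default
        changed.append(doc)
    return changed


docs = [
    {"id": "1", "PumperID": "old"},
    {"id": "2"},
    {"id": "3", "PumperID": None},
]
first_pass = repair_documents(docs)
second_pass = repair_documents(docs)
```

Because documents already at the default are skipped, a second pass changes nothing, which is what makes it safe to rerun the job to pick up data that arrived in between.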
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure function or VM. See Change Feed Processor to learn more and examples to start with.
With Change Feed you will want to start it to read from the beginning of the container. To do this see Reading Change Feed from the beginning.
Then within your delegate you will read each item off the change feed, check its value, and call ReplaceItemAsync() to write it back if it needed to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
Console.WriteLine("Started handling changes...");
foreach (MyType item in changes)
{
if(item.PumperID == null)
{
item.PumperID = "some value";
//call ReplaceItemAsync(), etc.
}
}
Console.WriteLine("Finished handling changes.");
}

Optimize Firebase database design

I am having trouble designing the database of my app. In the app users are allowed to create jobs and then using GeoFire I find people nearby.
This is my design for the jobs so far:
As you can see there are the users and then the workers. After pushing the new job to the user's unique ID (UID) under serviceUsers, I then use GeoFire to find the workerUsers that are nearby. I then push the job under the UIDs of those workerUsers.
Now here are my questions:
I am basically creating copies of these jobs. Once for the person who created it (under serviceUsers) and once for every nearby workerUsers.
Is this inefficient? Should I rather pass some kind of pointer instead of the whole job object to the nearby users?
And here is the more important question: if the design is fine as it is, how would I go about handling the case where the creator of the job deletes it? I would then need to find each copy in workerUsers and delete the job with that job's UID. Does Firebase support queries for this?
Thank you very much in advance!
"I am basically creating copies of these jobs. Once for the person who
created it (under serviceUsers) and once for every nearby workerUsers.
Is this inefficient? Should I rather pass some kind of pointer instead
of the whole job object to the nearby users?"
Every job should have a UUID which can act as a "pointer" (I'd rather call it a key). Then every user should include a job UUID, not a whole copy, so you can refer to it. I won't completely replicate your use case, but you should get an idea.
{
users: {
exampleUserId: {
jobs: ['exampleUUID']
}
},
jobs: {
exampleUUID: {
name: 'awesome job'
}
}
}
"If the design is fine as it is, how would I go on about when the
creator of the job deletes it? I would then need to find each job in
workerUsers and delete the job with the jobs UID. Does Firebase
support queries for this?"
It does support it, but you should implement my suggestion from above to do it in a sane way. After this, you can create a cloud function whose job should sound like this: "When a job with given UUID is removed, then go through every user and remove a reference to it if it exists"
exports.checkReferences = functions.database.ref('/jobs/{uuid}').onWrite(event => {
// check information here
if (!event.data.val()) {
// job was removed! get its uuid and iterate through users and remove the uuid from them
}
});
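The cleanup step left as a comment in the function above can be sketched outside Firebase. The example below is plain Python operating on a dictionary shaped like the suggested structure (users holding job UUIDs, jobs stored once). The function name remove_job is hypothetical and no Firebase API is used.

```python
def remove_job(db, job_uuid):
    """Delete a job and strip its UUID from every user's jobs list.

    Returns the IDs of the users that held a reference to the job.
    """
    db["jobs"].pop(job_uuid, None)
    touched = []
    for user_id, user in db["users"].items():
        jobs = user.get("jobs", [])
        if job_uuid in jobs:
            user["jobs"] = [j for j in jobs if j != job_uuid]
            touched.append(user_id)
    return touched


db = {
    "users": {
        "creator": {"jobs": ["exampleUUID"]},
        "workerA": {"jobs": ["exampleUUID", "otherUUID"]},
        "workerB": {"jobs": []},
    },
    "jobs": {
        "exampleUUID": {"name": "awesome job"},
        "otherUUID": {"name": "another job"},
    },
}
affected = remove_job(db, "exampleUUID")
```

Since each user stores only the key, deleting a job means removing one small value per user rather than hunting down full copies of the job object.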

Firebase client-side fan-out performance

For my new app I use this method:
https://firebase.googleblog.com/2015/10/client-side-fan-out-for-data-consistency_73.html
I think it is a good method for a person who has fewer than 1 million followers. I tried it, and up to that number it works fine. But for a person who has 10 million followers the client crashes, because it has to fetch a big array of 10 million followers
and then use it to build another big array of 10 million activity paths.
I just wanted to raise this point: I think this solution only works for apps with a small number of users. In the end we are forced to use server-side solutions, and this is bad for the overall efficiency of the app.
It would be nice if Firebase offered a function that does this on its side, at less cost to the client. I am thinking of a feature like the following; here is an example in JavaScript:
var obj = { created: time }
var path = "FollowersActivity/uid/"
var followers = 'root.child("Followers").child("uid").val()'
function massSaved(obj, path, followers)
On the Firebase side, the server would get all children under the "followers" path and, in a foreach loop, append each follower's name to the "path" string and save all the objects. This way the client sends only a few strings to the Firebase server, without fetching all the followers and building another big array of activities. My example probably would not work as written, because I do not know the Firebase infrastructure; it is only meant to suggest a way to complete these operations entirely on the server side.
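The loop described here, appending each follower name to a path and saving the same object under it, can be sketched in plain Python. This is not a Firebase API. The function name fan_out_batches and the batch size are assumptions, and the sketch only shows how such a loop could work through followers in fixed-size chunks instead of building one array with every path in it.

```python
from itertools import islice


def fan_out_batches(activity, path_prefix, followers, batch_size=500):
    """Yield dicts mapping '<path_prefix><follower>' to activity, one chunk at a time."""
    iterator = iter(followers)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield {f"{path_prefix}{follower}": activity for follower in chunk}


activity = {"created": 1700000000}
follower_names = (f"user{i}" for i in range(1200))  # a generator, never fully in memory
batch_sizes = [
    len(batch)
    for batch in fan_out_batches(activity, "FollowersActivity/uid/", follower_names)
]
```

Each yielded dictionary could be written as one multi-path update, so memory use is bounded by the batch size rather than by the follower count.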

Designing a DB backed application in a functional style

When designing a program in a functional style, I think about designing a base layer of functions that operate on a single object. Then, if I need to operate on a collection of those objects I start building on top of that base layer using traditional functional glue like mapping, filtering, reducing, etc.
For example, let's say I have a DB backed application that has Users and Tasks, where Users are assigned Tasks.
I may have a function defined like
def doesUserPerformTask?(taskId, userId)
// Go to DB to see if this userId does this taskId
// return userid if success or else nil
end
Later down the road, I am given a list of user IDs and want to know which of them perform task X. Perfect, I already have the function doesUserPerformTask? and it has been battle tested all over other places in the code, so I can just map over the user ID list, call that function for each of them, and then filter the results.
While this is a great benefit of functional design, I have an efficiency problem that each element (i.e. user id) passed to map requires a DB hit. I now need to create an entirely new function that operates on a list of userId's.
I keep running into this problem when designing DB backed programs in a functional style, where I keep having to write new functions that don't build off the base layer of functions and ending up with lots of functions written specifically for both operating on single items and collections of items.
Is there a better way to organize DB backed programs written in a functional style?
Why not pass the actual object into the function?
def doesUserPerformTask?(task, user)
// Check the user object directly
// return true or (false|nil)
end
Then write a wrapper that will fetch the user and task from the DB
def doesUserPerformTaskFromDB?(taskId, userId)
// DB calls here to load task and user
doesUserPerformTask?(task, user) ? user.id : nil
end
Then write a wrapper for collections
def whichUsersPerformTask?(task)
// fetch users from DB
// map non-db function over collection
end
Then again, unless you're going to use that user collection for something else wouldn't it be better to depend upon the DB query to get the users you need (via whichever query language)? Seems like there are a few options that make this both efficient and DRY.
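The layering suggested in this answer can be sketched as follows. It is Python rather than the Ruby-like pseudocode above, with a dictionary standing in for the database; all names and the data shape are hypothetical. The point is that the pure check is written once and both the single-item and the collection wrappers reuse it, with the collection wrapper loading its data in one go.

```python
def user_performs_task(task, user):
    """Pure check on already-loaded objects; no database access."""
    return task["id"] in user["task_ids"]


def user_performs_task_from_db(db, task_id, user_id):
    """Single-item wrapper: load both objects, then reuse the pure check."""
    task = db["tasks"][task_id]
    user = db["users"][user_id]
    return user_id if user_performs_task(task, user) else None


def users_performing_task(db, task_id, user_ids):
    """Collection wrapper: load everything once, then filter with the pure check."""
    task = db["tasks"][task_id]
    users = [db["users"][uid] for uid in user_ids]  # stands in for one batched query
    return [user["id"] for user in users if user_performs_task(task, user)]


db = {
    "tasks": {"t1": {"id": "t1"}},
    "users": {
        "u1": {"id": "u1", "task_ids": ["t1"]},
        "u2": {"id": "u2", "task_ids": []},
        "u3": {"id": "u3", "task_ids": ["t1", "t9"]},
    },
}
performers = users_performing_task(db, "t1", ["u1", "u2", "u3"])
```

The database access lives only in the wrappers, so the battle-tested logic in user_performs_task is shared whether one user or a whole list is being checked.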
