Is it possible to add new CompletableFutures to CompletableFuture.allOf() dynamically?

I have a database with 3 tables:
Table A which contains data of A objects
Table B which contains data of B objects
Table C which contains data of C objects
A objects can have 0 or 1 B objects
B objects can have 0 or 1 C objects
(I know these could be in just one table, but it's just for the example.)
I want to make a csv file from the whole database:
each line should contain exactly one A object, optionally its B object, and optionally its C object.
For each table there is an asynchronous repository that returns a CompletionStage. When I fetch the A objects from repositoryA, I get back a CompletionStage<List<A>>. When it completes, I create a Map for every A object, fill it with the data of A, and call repositoryB.getB(A.id), which returns a CompletionStage<Optional<B>>. If B is not present, I append a new line to my CSV file with the data in the map. If B is present, I add its values to the map and call repositoryC.getC(B.id), which returns a CompletionStage<Optional<C>>. If C is present, I add its values to the map and then append the line to the CSV file; if it is not, I just append the line.
The CSV is complete when all CompletionStages have completed. I tried to use CompletableFuture.allOf(), but since I don't know at the beginning how many CompletionStages there will be, I can't pass all of them to allOf, so I think I would need to add the CompletionStages dynamically somehow. Is that possible?
Currently I have a working solution, but it blocks after every B and C fetch, so I want to make the whole code non-blocking.
This is my non-blocking attempt, but it does not work well: some of the B and C futures aren't added to the list of futures, so the code does not wait for their completion:
CompletableFuture<List<CompletableFuture>> generateCSV = repositoryA.getAs().thenApplyAsync(listA -> {
    List<CompletableFuture> futures = new ArrayList<>();
    for (A a : listA) {
        Map<String, String> values = new HashMap<>();
        addAvaluesToMap(values, a);
        CompletableFuture Bfuture = repositoryB.getB(a.id).thenAcceptAsync(optionalB -> {
            if (optionalB.isPresent()) {
                addValuesToMap(values, optionalB.get());
                CompletableFuture Cfuture = repositoryC.getC(optionalB.get().id).thenAcceptAsync(optionalC -> {
                    if (optionalC.isPresent()) {
                        addAvaluesToMap(values, optionalC.get());
                    }
                    addMapValuesToCSV(values);
                });
                // Runs inside the B callback, possibly after `futures` was already returned.
                futures.add(Cfuture);
            } else {
                addMapValuesToCSV(values);
            }
        });
        futures.add(Bfuture);
    }
    return futures;
});
generateCSV.thenApplyAsync(futureList -> CompletableFuture.allOf(futureList.toArray(new CompletableFuture<?>[0])))
    .thenAccept(dummy -> { System.out.println("CSV generation done"); });

One possible plan to achieve non-blocking processing:
1. Create a CompletableFuture for each A object (possibly populated with its B and C objects).
2. Asynchronously gather the A objects from the CompletableFutures.
3. Write the gathered A objects to the CSV file.
Notes: in the example below I use addAvaluesToMap and addMapValuesToCSV on the assumption that you already have them working. I also assume that your use of CompletableFutures is justified by your goals.
An implementation of the approach described above:
public void generateCSV() {
    repositoryA.getAs().thenAccept(listA -> {
        // One future per A; each completes only after its B and C lookups have finished.
        List<CompletableFuture<A>> futures = listA.stream()
            .map(a -> repositoryB.getB(a.id).thenComposeAsync(optionalB ->
                optionalB.map(b -> repositoryC.getC(b.id).thenComposeAsync(optionalC -> {
                    a.setB(b);
                    return optionalC.map(c -> {
                        b.setC(c);
                        return CompletableFuture.completedFuture(a);
                    }).orElse(CompletableFuture.completedFuture(a));
                })
                ).orElse(CompletableFuture.completedFuture(a)))
                // The repositories return CompletionStage, so convert before collecting.
                .toCompletableFuture()
            ).collect(Collectors.toList());
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
            .thenAccept(v -> futures.stream()
                .map(CompletableFuture::join) // every future is already complete here
                .forEach(a -> {
                    Map<String, String> values = new HashMap<>();
                    addAvaluesToMap(values, a);
                    addMapValuesToCSV(values);
                })
            )
            .exceptionally(throwable -> {
                System.out.println("Failed generating CSV. Error: " + throwable);
                return null;
            });
    }).exceptionally(throwable -> {
        System.out.println("Failed to get list of As. Error: " + throwable);
        return null;
    });
}
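For readers who want to run the idea in isolation, here is a minimal, self-contained sketch of the same technique: thenCompose flattens the nested B and C lookups into one future per A, so allOf only ever needs the list of per-A futures. The class name, the getAs/getB/getC stubs, and the toy data are all hypothetical stand-ins for the real repositories, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

public class CsvPipelineSketch {

    // Hypothetical stand-in for repositoryA: three A ids.
    static CompletionStage<List<Integer>> getAs() {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(1, 2, 3));
    }

    // Hypothetical stand-in for repositoryB: only odd A ids have a B.
    static CompletionStage<Optional<String>> getB(int aId) {
        return CompletableFuture.supplyAsync(
                () -> aId % 2 == 1 ? Optional.of("b" + aId) : Optional.<String>empty());
    }

    // Hypothetical stand-in for repositoryC: only "b1" has a C.
    static CompletionStage<Optional<String>> getC(String bId) {
        return CompletableFuture.supplyAsync(
                () -> bId.equals("b1") ? Optional.of("c1") : Optional.<String>empty());
    }

    // Builds the CSV line for one A. The returned future completes only
    // after the B lookup and, if needed, the C lookup have finished.
    static CompletableFuture<String> lineFor(int aId) {
        return getB(aId).thenCompose(optB -> {
            if (!optB.isPresent()) {
                return CompletableFuture.completedFuture(aId + ",,");
            }
            String b = optB.get();
            return getC(b).thenApply(optC -> aId + "," + b + "," + optC.orElse(""));
        }).toCompletableFuture();
    }

    // Collects all lines without blocking inside the pipeline.
    static CompletableFuture<List<String>> allLines() {
        return getAs().thenCompose(ids -> {
            List<CompletableFuture<String>> lines = ids.stream()
                    .map(CsvPipelineSketch::lineFor)
                    .collect(Collectors.toList());
            return CompletableFuture.allOf(lines.toArray(new CompletableFuture<?>[0]))
                    .thenApply(v -> lines.stream()
                            .map(CompletableFuture::join) // already complete at this point
                            .collect(Collectors.toList()));
        }).toCompletableFuture();
    }

    public static void main(String[] args) {
        // join() here is only for the demo's main thread.
        allLines().join().forEach(System.out::println);
    }
}
```

With the toy data above, this prints 1,b1,c1 then 2,, then 3,b3, in the order of the A list. Nothing is added to a shared list from inside a callback, which is what caused futures to be missed in the question's attempt.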

You are using a relational database. It should be easier and more performant to write a database query that returns the data in the shape you need than to assemble it in Java. An SQL query can join the three tables very easily and produce rows that are straightforward to write out as CSV. Databases perform these operations much more efficiently than a hand-written implementation.
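As a rough sketch of that suggestion, the query below uses two LEFT JOINs so that every A row appears once, with NULLs where B or C is missing. The table names (a, b, c) and column names (id, data, a_id, b_id) are assumptions made up for illustration, since the question does not show the schema, and the formatter does no CSV escaping.

```java
public class JoinQuerySketch {

    // Hypothetical table and column names; adjust to the real schema.
    static final String QUERY =
            "SELECT a.id, a.data, b.data, c.data "
          + "FROM a "
          + "LEFT JOIN b ON b.a_id = a.id "
          + "LEFT JOIN c ON c.b_id = b.id";

    // Turns one result row into a CSV line.
    // NULL columns produced by the LEFT JOINs become empty fields.
    // No quoting or escaping is done in this sketch.
    static String toCsvLine(Object... columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            if (columns[i] != null) {
                sb.append(columns[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(QUERY);
        // A row for an A that has a B but no C.
        System.out.println(toCsvLine(1, "a1", "b1", null));
    }
}
```

Each row returned by such a query maps directly to one CSV line, so no per-row follow-up lookups are needed.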


Cannot get Realm result for objects filtered by the latest (nsdate) value of a property of a collection property swift (the example is clearer)

I have the following model:
class Process: Object {
    @objc dynamic var processID: Int = 1
    let steps = List<Step>()
}
class Step: Object {
    @objc private dynamic var stepCode: Int = 0
    @objc dynamic var stepDateUTC: Date? = nil
    var stepType: ProcessStepType {
        get {
            return ProcessStepType(rawValue: stepCode) ?? .created
        }
        set {
            stepCode = newValue.rawValue
        }
    }
}
enum ProcessStepType: Int { // to review - real value
    case created = 0
    case scheduled = 1
    case processing = 2
    case paused = 3
    case finished = 4
}
A process can be started, processing, paused, resumed (putting it in the processing step again), paused, resumed again, and so on. The current step is the one with the latest stepDateUTC.
I am trying to get all Processes whose last step has stepType processing, i.e. where stepCode is 2 for the latest stepDateUTC.
I came up with the following predicate, which doesn't work. Any idea of the right way to perform such a query?
My best attempt is the following. Is it possible to get this result with a single Realm query?
let processes = realm.objects(Process.self).filter(NSPredicate(format: "ANY steps.stepCode = 2 AND NOT (ANY steps.stepCode = 4)"))
let ongoingprocesses = processes.filter(){$0.steps.sorted(byKeyPath: "stepDateUTC", ascending: false).first!.stepType == .processing}
What I hoped would work:
NSPredicate(format: "steps[LAST].stepCode = \(ProcessStepType.processing.rawValue)")
I understand [LAST] is not supported by Realm (as per the cheatsheet), but is there any way I could achieve my goal through a Realm query?
There are a few ways to approach this and it doesn't appear the date property is relevant because lists are stored in sequential order (as long as they are not altered), so the last element in the List was added last.
This first piece of code will filter for processes where the last element is 'processing'. I coded this long-handed so the flow is more understandable.
let results = realm.objects(Process.self).filter { p in
    // A process without steps has no last step to inspect.
    if p.steps.isEmpty {
        return false
    }
    let lastIndex = p.steps.count - 1
    let step = p.steps[lastIndex]
    let type = step.stepType
    if type == .processing {
        return true
    }
    return false
}
Note that Realm objects are lazily loaded - which means thousands of objects have a low memory impact. By filtering using Swift, the objects are filtered in memory so the impact is more significant.
The second piece of code is what I would suggest as it makes filtering much simpler, but would require a slight change to the Process model.
class Process: Object {
    @objc dynamic var processID: Int = 1
    let stepHistory = List<Step>() //RENAMED: the history of the steps
    @objc dynamic var name = ""
    //ADDED: new property tracks current step
    @objc dynamic var current_step = ProcessStepType.created.index
}
My thought here is that the Process model keeps a 'history' of steps that have occurred so far, and then what the current_step is.
I also modified the ProcessStepType enum to make it more filterable friendly.
enum ProcessStepType: Int { // to review - real value
    case created = 0
    case scheduled = 1
    case processing = 2
    case paused = 3
    case finished = 4
    //this is used when filtering
    var index: Int {
        switch self {
        case .created:
            return 0
        case .scheduled:
            return 1
        case .processing:
            return 2
        case .paused:
            return 3
        case .finished:
            return 4
        }
    }
}
Then to return all processes where the last step in the list is 'processing' here's the filter
let results2 = realm.objects(Process.self).filter("current_step == %@", ProcessStepType.processing.index)
The final thought is to add some code to the Process model so when a step is added to the list, the current_step var is also updated. Coding that is left to the OP.

DocumentDB Change Feed and saving Checkpoint

After reading the documentation, I'm having a hard time conceptualizing the change feed. Let's take the code from the documentation below. The second change feed is picking up the changes from the last time it was run via the checkpoints. Let's say it is being used to create summary data and there was an issue and it needed to be re-run from a prior time. I don't understand the following:
How to specify a particular time the checkpoint should start. I understand I can save the checkpoint dictionary and use that for each run, but how do you get the changes from X time to maybe rerun some summary data
Secondly, let's say we are rerunning some summary data and we save the last checkpoint used for each summarized data so we know where that one left off. How does one know that a record is in or before that checkpoint?
Code that runs from collection beginning and then from last checkpoint:
Dictionary<string, string> checkpoints = await GetChanges(client, collection, new Dictionary<string, string>());
await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-201", MetricType = "Temperature", Unit = "Celsius", MetricValue = 1000
});
await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-212", MetricType = "Pressure", Unit = "psi", MetricValue = 1000
});
// Returns only the two documents created above.
checkpoints = await GetChanges(client, collection, checkpoints);
//
private async Task<Dictionary<string, string>> GetChanges(
    DocumentClient client,
    string collection,
    Dictionary<string, string> checkpoints) {
    List<PartitionKeyRange> partitionKeyRanges = new List<PartitionKeyRange>();
    FeedResponse<PartitionKeyRange> pkRangesResponse;
    do {
        pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(collection);
        partitionKeyRanges.AddRange(pkRangesResponse);
    }
    while (pkRangesResponse.ResponseContinuation != null);
    foreach (PartitionKeyRange pkRange in partitionKeyRanges) {
        string continuation = null;
        checkpoints.TryGetValue(pkRange.Id, out continuation);
        IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
            collection,
            new ChangeFeedOptions {
                PartitionKeyRangeId = pkRange.Id,
                StartFromBeginning = true,
                RequestContinuation = continuation,
                MaxItemCount = 1
            });
        while (query.HasMoreResults) {
            FeedResponse<DeviceReading> readChangesResponse = query.ExecuteNextAsync<DeviceReading>().Result;
            foreach (DeviceReading changedDocument in readChangesResponse) {
                Console.WriteLine(changedDocument.Id);
            }
            checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
        }
    }
    return checkpoints;
}
DocumentDB supports check-pointing only by the logical timestamp returned by the server. If you would like to retrieve all changes from X minutes ago, you would have to "remember" the logical timestamp corresponding to the clock time (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), then use that to retrieve changes.
Change feed uses logical time in place of clock time because it can be different across various servers/partitions. If you would like to see change feed support based on clock time (with some caveats on skew), please propose/upvote at https://feedback.azure.com/forums/263030-documentdb/.
To save the last checkpoint per partition key/document, you can just save the corresponding version of the batch in which it was last seen (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), like Fred suggested in his answer.
How to specify a particular time the checkpoint should start.
You could try to provide a logical version/ETag (such as 95488) instead of providing a null value as RequestContinuation property of ChangeFeedOptions.

Java 8 Streams API to filter Map entries

I have the following container in Java that I need to work on:
Map<String, List<Entry<Parameter, String>>>
Where Parameter is an enumerated type defined as follows:
public enum Parameter {
Param1,
Param2,
Param3
}
The code below shows how I initialize the map structure - effectively putting 2 rows in the container.
Map<String, List<Entry<Parameter, String>>> myMap = new HashMap<String, List<Entry<Parameter, String>>>() {{
    put("SERVICE1", new ArrayList<Entry<Parameter, String>>() {{
        add(new AbstractMap.SimpleEntry<>(Parameter.Param1, "val1"));
        add(new AbstractMap.SimpleEntry<>(Parameter.Param2, "val2"));
        add(new AbstractMap.SimpleEntry<>(Parameter.Param3, "val3"));
    }});
    put("SERVICE2", new ArrayList<Entry<Parameter, String>>() {{
        add(new AbstractMap.SimpleEntry<>(Parameter.Param1, "val4"));
        add(new AbstractMap.SimpleEntry<>(Parameter.Param2, "val5"));
        add(new AbstractMap.SimpleEntry<>(Parameter.Param3, "val6"));
    }});
}};
I need to use the java 8 streams api to find the val1 and val2 values from "SERVICE1" but I do not know the correct java streams filter and mapping syntax.
The nearest thing I can come up with is the following, but it only filters at the top level, and it returns a list of lists rather than the list of (Parameter.Param1, "val1") and (Parameter.Param2, "val2") entries that I am looking for from the SERVICE1 row.
List<List<Entry<Parameter, String>>> listOfLists = myMap.entrySet().stream()
    .filter(next -> next.getKey().equals("SERVICE1"))
    .map(Map.Entry::getValue)
    .collect(Collectors.toList());
listOfLists.size();
If you only need the "val1" and "val2" values, you can first use getOrDefault to get the corresponding list, and then filter on the entries' keys to get entries with Param1 or Param2 as key, and finally apply map again to get the values of these entries.
List<String> list =
    myMap.getOrDefault("SERVICE1", Collections.emptyList())
        .stream()
        .filter(e -> e.getKey() == Parameter.Param1 || e.getKey() == Parameter.Param2)
        .map(Map.Entry::getValue)
        .collect(Collectors.toList());
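A self-contained version of this approach may help when trying it out. The sketch below wraps the getOrDefault pipeline in a hypothetical helper, valuesFor, and adds a second hypothetical helper, entriesFor, which shows how the attempt from the question could be flattened with flatMap instead of producing a list of lists. The class name and helper names are illustrative only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.stream.Collectors;

public class ServiceParamFilter {

    enum Parameter { Param1, Param2, Param3 }

    // Small helper so the entry type is explicit for Arrays.asList.
    static Entry<Parameter, String> entry(Parameter p, String v) {
        return new SimpleEntry<>(p, v);
    }

    // Same two rows as in the question, built with plain put calls.
    static Map<String, List<Entry<Parameter, String>>> sampleMap() {
        Map<String, List<Entry<Parameter, String>>> map = new HashMap<>();
        map.put("SERVICE1", Arrays.asList(
                entry(Parameter.Param1, "val1"),
                entry(Parameter.Param2, "val2"),
                entry(Parameter.Param3, "val3")));
        map.put("SERVICE2", Arrays.asList(
                entry(Parameter.Param1, "val4"),
                entry(Parameter.Param2, "val5"),
                entry(Parameter.Param3, "val6")));
        return map;
    }

    // Values of the wanted parameters for one service; empty list if the service is unknown.
    static List<String> valuesFor(Map<String, List<Entry<Parameter, String>>> map,
                                  String service, Set<Parameter> wanted) {
        return map.getOrDefault(service, Collections.emptyList())
                .stream()
                .filter(e -> wanted.contains(e.getKey()))
                .map(Entry::getValue)
                .collect(Collectors.toList());
    }

    // Variant of the question's attempt: flatMap merges the inner lists into one stream.
    static List<Entry<Parameter, String>> entriesFor(
            Map<String, List<Entry<Parameter, String>>> map, String service) {
        return map.entrySet().stream()
                .filter(e -> e.getKey().equals(service))
                .flatMap(e -> e.getValue().stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<Parameter> wanted = EnumSet.of(Parameter.Param1, Parameter.Param2);
        System.out.println(valuesFor(sampleMap(), "SERVICE1", wanted)); // [val1, val2]
        System.out.println(entriesFor(sampleMap(), "SERVICE1").size()); // 3
    }
}
```

The getOrDefault form is the more direct of the two because it looks up the key instead of scanning every map entry.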
Also you might be interested to look at Efficiency of Java "Double Brace Initialization"?
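To make that pointer concrete, here is a small sketch of what double brace initialization actually creates: an anonymous subclass of HashMap with an instance initializer block, rather than a plain HashMap. The class and method names are hypothetical and the map content is reduced to one entry for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class DoubleBraceSketch {

    // Double brace initialization: the outer braces declare an anonymous
    // HashMap subclass, the inner braces are its instance initializer.
    static Map<String, String> doubleBrace() {
        return new HashMap<String, String>() {{
            put("SERVICE1", "val1");
        }};
    }

    // Plain alternative: a regular HashMap filled with ordinary put calls.
    static Map<String, String> plain() {
        Map<String, String> map = new HashMap<>();
        map.put("SERVICE1", "val1");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(doubleBrace().getClass() == HashMap.class); // false
        System.out.println(plain().getClass() == HashMap.class);       // true
        System.out.println(doubleBrace().equals(plain()));             // true
    }
}
```

Both maps hold the same entries, but each double brace site adds one extra class to the program, which is the cost the linked discussion is about.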

Add "blocking" to Swift for-loop

I am using Swift in a project, and using SQLite.swift for database handling. I am trying to retrieve the most recent entry from my database like below:
func returnLatestEmailAddressFromEmailsTable() -> String {
    let dbPath = NSSearchPathForDirectoriesInDomains(.DocumentDirectory, .UserDomainMask, true).first as String
    let db = Database("\(dbPath)/db.sqlite3")
    let emails = db["emails"]
    let email = Expression<String>("email")
    let time = Expression<Int>("time")
    var returnEmail: String = ""
    for res in emails.limit(1).order(time.desc) {
        returnEmail = res[email]
        println("from inside: \(returnEmail)")
    }
    return returnEmail
}
I am trying to test the returned string from the above function like this:
println("from outside: \(returnLatestEmailAddressFromEmailsTable())")
Note how I print the value from both inside and outside of the function. Inside, it works every single time. I am struggling with the "from outside:" part.
Sometimes the function returns the correct email, but sometimes it returns "" (presumably, the value was not set in the for loop).
How can I add "blocking" functionality so calling returnLatestEmailAddressFromEmailsTable() will always first evaluate the for loop, and only after this return the value?

How use a variable name to point different data types with the same name?

I have two lists: one stores the names of the filterable columns (of type DropDown), and the other stores the values to load into those filterable columns.
List<string> filterableFields = new List<string>() { "A_B", "C_D", "E_F" };
List<string> AB, CD , EF;
At run time I get the data from a web service, and I have written a function to extract the values for these filterable fields and store them in the second list.
private void prepareListForFilterableColumns(XDocument records)
{
    foreach (var currentField in filterableFields)
    {
        var values = (from xml in records.Descendants(z + "row")
                      let val = (string)xml.Attribute("ows_" + currentField.Replace("_", "_x0020_"))
                      where val != ""
                      orderby val
                      select val
                     ).Distinct();
        switch (currentField)
        {
            case "A_B": AB = values.ToList(); break;
            case "C_D": CD = values.ToList(); break;
        }
    }
}
Instead of hard-coding the assignment in the switch block, I was thinking I could take the name from the first list, e.g. "A_B", remove the "_" from it, and use the result to refer to the matching second list and assign values.ToList() to it.
I understand that C# is a statically typed language, so I'm not sure this can be achieved, but if it can, it will make my function generic.
Thanks a lot in advance for time and help.
Vishal
You could use a dictionary of lists of strings instead of 3 lists to store the values.
Dictionary<string, List<string>> valLists = new Dictionary<string, List<string>>();
Make the keys of the dictionary equal to the filterable names: "AB", "CD", and so on.
Then, instead of AB you would use valLists["AB"], and you could reference each list by its string key.
The other option would be to use reflection but that would be slower and unnecessarily a bit more complicated.
