How to simplify a pipeline with aggregation and accumulation?

I'm trying to design a pipeline that reads data from PubsubIO and aggregates it into a single output that fires every 60 seconds.
Input:
00:00:01 -> "1"
00:00:21 -> "2"
00:00:41 -> "3"
00:01:01 -> "4"
00:01:21 -> "5"
00:01:51 -> "6"
Expected output:
00:01:00 -> "1,2,3"
00:02:00 -> "1,2,3,4,5,6"
Here is my code:
pipeline
    .apply("Reading PubSub",
        PubsubIO
            .readMessagesWithAttributes()
            .fromSubscription("..."))
    .apply("Get message",
        ParDo.of(new DoFn<PubsubMessage, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                PubsubMessage ref = c.element();
                c.output(new String(ref.getPayload()));
            }
        }))
    .apply("Window",
        Window.<String>into(new GlobalWindows())
            .triggering(
                Repeatedly.forever(
                    AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardSeconds(60))))
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes())
    .apply("Accumulate result to iterable",
        Combine.globally(new CombineIterableAccumulatorFn<>()))
    .apply("toString()", ToString.elements())
    .apply("Write to file",
        TextIO
            .write()
            .withWindowedWrites()
            .withNumShards(1)
            .to("result"));
This is my CombineFn implementation for aggregating the data into an Iterable:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import org.apache.beam.sdk.transforms.Combine;

public class CombineIterableAccumulatorFn<T> extends Combine.CombineFn<T, List<T>, Iterable<T>> {
    @Override
    public List<T> createAccumulator() {
        return new ArrayList<>();
    }

    @Override
    public List<T> addInput(List<T> accumulator, T input) {
        accumulator.add(input);
        return accumulator;
    }

    @Override
    public List<T> mergeAccumulators(Iterable<List<T>> accumulators) {
        // Flatten all partial lists into one list.
        return StreamSupport.stream(accumulators.spliterator(), false)
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    @Override
    public Iterable<T> extractOutput(List<T> accumulator) {
        return accumulator;
    }
}
With this implementation I get the following output:
00:01:00 -> "1,2,3"
00:02:00 -> "1,2,3
1,2,3,4,5,6"
To remove the duplicated "1,2,3" line at 00:02:00, I have to add an additional windowing block after
    .apply("Accumulate result to iterable",
        Combine.globally(new CombineIterableAccumulatorFn<>()))
like this:
    .apply("Window",
        Window
            // the CombineFn outputs Iterable<String>, so that is the element type here
            .<Iterable<String>>into(new GlobalWindows())
            .triggering(
                Repeatedly.forever(
                    AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardSeconds(60))))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes())
It all looks very complex. Is there a better way to implement this?
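The difference between the two accumulation modes used above can be modeled without Beam. The sketch below is a hypothetical plain-Java simulation (PaneSimulation and fire are illustrative names, not Beam APIs) of a single global window whose trigger fires 60 seconds after the first element of each pane; it ignores watermarks, bundling, and late data. Fed the question's input, accumulating mode yields exactly the expected sequence of panes ("1,2,3", then "1,2,3,4,5,6"), while discarding mode emits only the elements that arrived since the previous firing ("4,5,6").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one global window with a processing-time trigger that
// fires periodSeconds after the first element of each pane (no Beam involved).
class PaneSimulation {

    static List<String> fire(long[] arrivalSeconds, String[] values,
                             long periodSeconds, boolean accumulating) {
        List<String> firings = new ArrayList<>(); // contents emitted by each firing
        List<String> pane = new ArrayList<>();    // elements currently held in state
        long deadline = -1;                       // -1 means no firing is pending
        for (int i = 0; i < values.length; i++) {
            // Fire the pending pane if its deadline passed before this arrival.
            if (deadline >= 0 && arrivalSeconds[i] >= deadline) {
                firings.add(String.join(",", pane));
                if (!accumulating) {
                    pane.clear(); // discarding mode: drop what was already emitted
                }
                deadline = -1;
            }
            pane.add(values[i]);
            if (deadline < 0) {
                // The first element since the last firing starts the delay.
                deadline = arrivalSeconds[i] + periodSeconds;
            }
        }
        if (deadline >= 0) {
            firings.add(String.join(",", pane)); // flush the last pending pane
        }
        return firings;
    }

    public static void main(String[] args) {
        long[] t = {1, 21, 41, 61, 81, 111};
        String[] v = {"1", "2", "3", "4", "5", "6"};
        System.out.println(fire(t, v, 60, true));  // accumulating mode
        System.out.println(fire(t, v, 60, false)); // discarding mode
    }
}
```

The model treats an element that arrives exactly at a deadline as belonging to the next pane; on a real runner that ordering depends on processing-time scheduling.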

Related

Picocli: arbitrary length of paired parameters

In Picocli, is it possible to accept an arbitrary number of paired parameters? For example:
grades Abby 4.0 Billy 3.5 Caily 3.5 Danny 4.0
where each pair must have a name and a grade but the total length is unknown, i.e.:
grades <name> <grade> [<name> <grade>]*
A parameter map is the closest thing that appears to work, e.g.
@Parameters(index = "0..*") Map<String, Float> grades;
would parse:
grades Abby=4.0 Billy=3.5 Caily=3.5 Danny=4.0
into the map, but it would be nicer without the equals signs.
Update: picocli 4.3 has been released with improved support for positional parameters in argument groups.
import java.math.BigDecimal;
import java.util.List;
import picocli.CommandLine;
import picocli.CommandLine.ArgGroup;
import picocli.CommandLine.Command;
import picocli.CommandLine.Parameters;

@Command(name = "grades", mixinStandardHelpOptions = true, version = "grades 1.0")
public class Grades implements Runnable {
    static class StudentGrade {
        @Parameters(index = "0") String name;
        @Parameters(index = "1") BigDecimal grade;
    }

    @ArgGroup(exclusive = false, multiplicity = "1..*")
    List<StudentGrade> gradeList;

    @Override
    public void run() {
        gradeList.forEach(e -> System.out.println(e.name + ": " + e.grade));
    }

    public static void main(String[] args) {
        System.exit(new CommandLine(new Grades()).execute(args));
    }
}
Running the above program with this input:
Alice 3.5 Betty 4.0 "X Æ A-12" 3.5 Zaphod 3.4
Produces the following output:
Alice: 3.5
Betty: 4.0
X Æ A-12: 3.5
Zaphod: 3.4
Prior to picocli 4.3, applications can do the following to accomplish this:
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Parameters;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

@Command(name = "grades", mixinStandardHelpOptions = true, version = "grades 1.0")
public class Grades implements Runnable {
    @Parameters(arity = "2",
            description = "Each pair must have a name and a grade.",
            paramLabel = "(NAME GRADE)...", hideParamSyntax = true)
    List<String> gradeList;

    @Override
    public void run() {
        System.out.println(gradeList);
        // Pair up the flat list: even index = name, following odd index = grade.
        Map<String, BigDecimal> map = new LinkedHashMap<>();
        for (int i = 0; i < gradeList.size(); i += 2) {
            map.put(gradeList.get(i), new BigDecimal(gradeList.get(i + 1)));
        }
        System.out.println(map);
    }

    public static void main(String[] args) {
        int exitCode = new CommandLine(new Grades()).execute(args);
        System.exit(exitCode);
    }
}
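The manual pairing loop in run() can be factored into a small helper that can be tested on its own and that rejects a dangling name without a grade. This is a plain-Java sketch, not a picocli API; GradePairs and toMap are hypothetical names, and the size check is purely defensive.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of picocli): turns a flat
// NAME GRADE NAME GRADE ... list into an insertion-ordered map.
class GradePairs {
    static Map<String, BigDecimal> toMap(List<String> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected NAME GRADE pairs, got " + flat.size() + " values");
        }
        Map<String, BigDecimal> grades = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            // Even index = name, following odd index = grade.
            grades.put(flat.get(i), new BigDecimal(flat.get(i + 1)));
        }
        return grades;
    }
}
```

As with the original loop, a repeated name overwrites the earlier grade; collect into a List of pairs instead if duplicates must be kept.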

Kotlin - group elements by a key under some conditions with new value type

I'm trying to use Kotlin collection operations to implement the following logic:
Say the type Classroom has a list of Student as a field, e.g. classroom.getStudents() returns a list of students.
Now I have a mixed list of Student objects that I need to group by one of their fields, say major, with the values of the resulting map being Classroom objects.
So I need to convert List<Student> to Map<Student.major, Classroom>.
Also, for some values of major, for example major == chemistry, I need to group by another criterion as well, say firstname, so the keys for chemistry would be major_firstname.
Here's an example, I have a list of Student(major, firstname):
[
Student("chemistry", "rafael"),
Student("physics", "adam"),
Student("chemistry", "michael"),
Student("math", "jack"),
Student("chemistry", "rafael"),
Student("biology", "kevin")
]
I need the result to be:
{
"math" -> Classroom(Student("math", "jack")),
"physics" -> Classroom(Student("physics", "adam")),
"chemistry_michael" -> Classroom(Student("chemistry", "michael")),
"chemistry_rafael" -> Classroom(Student("chemistry", "rafael"), Student("chemistry", "rafael")),
"biology" -> Classroom(Student("biology", "kevin"))
}
I've tried groupBy, flatMapTo and associateBy, but as far as I understand, none of these groups by a condition.
I'll answer the first part, since Roland posted an answer for the second part (although I haven't tried it).
Assuming your classes are:
class Student(val major: String, val firstName: String)
class Classroom(val studentList: MutableList<Student>) {
fun getStudents(): MutableList<Student> {
return studentList
}
}
and with an initialization like:
val list = mutableListOf<Student>(
Student("chemistry", "rafael"),
Student("physics", "adam"),
Student("chemistry", "michael"),
Student("math", "jack"),
Student("chemistry", "rafael"),
Student("biology", "kevin"))
val classroom = Classroom(list)
val allStudents = classroom.getStudents()
you can have a result list:
val finalList: MutableList<Pair<String, Classroom>> = mutableListOf()
allStudents.map { it.major }.distinctBy { it }.forEach { major ->
finalList.add(major to Classroom(allStudents.filter { it.major == major }.toMutableList()))
}
With the following code:
finalList.forEach {
println(it.first + "->")
it.second.getStudents().forEach { println(" " + it.major + ", " + it.firstName) }
}
this will be printed:
chemistry->
chemistry, rafael
chemistry, michael
chemistry, rafael
physics->
physics, adam
math->
math, jack
biology->
biology, kevin
What you need is actually a combination of those methods. There are other ways to achieve it, but here is one possible example using groupBy and flatMap:
val result = students.groupBy { it.major }
.flatMap { (key, values) -> when (key) {
"chemistry" -> values.map { it.firstname }
.distinct()
.map { firstname -> "chemistry_$firstname" to ClassRoom(values.filter { it.firstname == firstname }) }
else -> listOf(key to ClassRoom(values))
}
}.toMap()
Assuming the following data classes:
data class Student(val major: String, val firstname: String)
data class ClassRoom(val students : List<Student>)
If you also want a map with all students grouped by major, the following suffices:
val studentsPerMajor = students.groupBy { it.major }
        .mapValues { (_, values) -> ClassRoom(values) }
If you would rather keep working with that map instead of recomputing everything from the source, that is also possible; for example, the following returns your desired map based on studentsPerMajor:
val result = studentsPerMajor.flatMap { (key, classroom) -> when (key) {
"chemistry" -> classroom.students.map { it.firstname }
.distinct()
.map { firstname -> "chemistry_$firstname" to ClassRoom(classroom.students.filter { it.firstname == firstname }) }
else -> listOf(key to classroom)
}
}.toMap()
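For comparison, the same result can be obtained in one pass with a computed grouping key instead of the two-step groupBy/flatMap: the key is the major, except for chemistry, where it becomes chemistry_firstname. The sketch below uses Java streams (Collectors.groupingBy with a map factory and collectingAndThen); the Student, ClassRoom, and ConditionalGrouping classes are hypothetical Java counterparts of the Kotlin data classes. In Kotlin the same idea would be a single groupBy on that key followed by mapValues.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical Java counterparts of the Kotlin data classes.
class Student {
    final String major;
    final String firstname;

    Student(String major, String firstname) {
        this.major = major;
        this.firstname = firstname;
    }
}

class ClassRoom {
    final List<Student> students;

    ClassRoom(List<Student> students) {
        this.students = students;
    }
}

class ConditionalGrouping {
    // Chemistry students are keyed by major_firstname, everyone else by major.
    static String keyOf(Student s) {
        return s.major.equals("chemistry") ? s.major + "_" + s.firstname : s.major;
    }

    static Map<String, ClassRoom> group(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                ConditionalGrouping::keyOf,
                LinkedHashMap::new, // keep keys in first-seen order
                Collectors.collectingAndThen(Collectors.toList(), ClassRoom::new)));
    }
}
```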

Dictionary contains a certain value swift 3

I want to check if a string exists in any of the values in my Dictionary
Dictionary<String, AnyObject>
I know arrays have .contains, so I would think a dictionary does too. When I start typing contains, Xcode suggests the following:
countDic.contains(where: { ((key: String, value: AnyObject)) -> Bool in
<#code#>
})
I just don't understand how to use this. I know I need to return a Bool inside the closure, but I don't understand where to put the String I'm looking for. Any help would be great.
contains(where:) checks if any element of the collection satisfies
the given predicate, so in your case it would be
let b = countDic.contains { (key, value) -> Bool in
value as? String == givenString
}
or, directly applied to the values view of the dictionary:
let b = countDic.values.contains { (value) -> Bool in
value as? String == givenString
}
In both cases it is necessary to (optionally) cast the AnyObject
to a String in order to compare it with the given string.
It would be slightly easier with a dictionary of type
Dictionary<String, String> because strings are Equatable,
and the contains(_:) method can be used:
let b = countDic.values.contains(givenString)
Since your values are AnyObject (Any in Swift 3), you have to check whether each value is a string and, if so, whether it contains the substring.
let countDic : [String:Any] = ["alpha" : 1, "beta" : "foo", "gamma" : "bar"]
countDic.contains { (key, value) -> Bool in
if let string = value as? String { return string.contains("oo") }
return false
}
However, if you want to check whether any of the values is equal to (rather than contains) a string, you could also use filter and isEmpty:
!countDic.filter { (key, value) -> Bool in
value as? String == "foo"
}.isEmpty
It may help to learn the basic usage of contains(where:) for dictionaries first:
For [String: Int]:
let myIntDict1: [String: Int] = [
"a" : 1,
"b" : 101,
"c" : 2
]
let myIntDict1ContainsIntGreaterThan100 = myIntDict1.contains {
key, value in //<- `value` is inferred as `Int`
value > 100 //<- true when value > 100, false otherwise
}
print(myIntDict1ContainsIntGreaterThan100) //->true
For [String: String]:
let myStringDict1: [String: String] = [
"a" : "abc",
"b" : "def",
"c" : "ghi"
]
let myStringDict1ContainsWordIncludingLowercaseE = myStringDict1.contains {
key, value in //<- `value` is inferred as `String`
value.contains("e") //<- true when value contains "e", false otherwise
}
print(myStringDict1ContainsWordIncludingLowercaseE) //->true
So, with [String: AnyObject]:
let myAnyObjectDict1: [String: AnyObject] = [
"a" : "abc" as NSString,
"b" : 101 as NSNumber,
"c" : "ghi" as NSString
]
let myAnyObjectDict1ContainsWordIncludingLowercaseE = myAnyObjectDict1.contains {
key, value in //<- `value` is inferred as `AnyObject`
//`AnyObject` may not have the `contains(_:)` method, so you need to check with `if-let-as?`
if let stringValue = value as? String {
return stringValue.contains("e") //<- true when value is a String and contains "e"
} else {
return false //<- false otherwise
}
}
print(myAnyObjectDict1ContainsWordIncludingLowercaseE) //->false
So, in your case:
let countDic: [String: AnyObject] = [
"a" : 1 as NSNumber,
"b" : "no" as NSString,
"c" : 2 as NSNumber
]
let countDicContainsString = countDic.contains {
key, value in //<- `value` is inferred as `AnyObject`
value is String //<- true when value is a String, false otherwise
}
print(countDicContainsString) //->true
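For readers coming from Java, the same "does any value satisfy a predicate" check is Stream.anyMatch over the map's values(); the instanceof test plays the role of Swift's as? String cast, so non-String values are simply skipped. This is a sketch; AnyValueCheck and anyStringValueContains are hypothetical names.

```java
import java.util.Map;

// Hypothetical Java analogue of Swift's dictionary contains(where:) check.
class AnyValueCheck {
    // True if at least one value is a String containing the given substring;
    // non-String values (e.g. Integer) never match.
    static boolean anyStringValueContains(Map<String, Object> map, String needle) {
        return map.values().stream()
                .anyMatch(v -> v instanceof String && ((String) v).contains(needle));
    }
}
```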

Swift Dictionary Filter

It looks like the filter function on a Swift (2.x) dictionary returns an array of tuples. Is there an elegant way to turn it back into a dictionary?
let dictionary: [String: String] = [
"key1": "value1",
"key2": "value2",
"key3": "value3"
]
let newTupleArray: [(String, String)] = dictionary.filter { (tuple: (key: String, value: String)) -> Bool in
return tuple.key != "key2"
}
let newDictionary: [String: String] = Dictionary(dictionaryLiteral: newTupleArray) // Error: cannot convert value of type '[(String, String)]' to expected argument type '[(_, _)]'
If you are looking for a more functional approach:
let result = dictionary.filter {
$0.0 != "key2"
}
.reduce([String: String]()) { (var aggregate, elem) in
aggregate[elem.0] = elem.1
return aggregate
}
reduce here is used to construct a new dictionary from the filtered tuples.
Edit: since var parameters have been deprecated in Swift 2.2, you need to create a local mutable copy of aggregate:
let result = dictionary.filter {
$0.0 != "key2"
}
.reduce([String: String]()) { aggregate, elem in
var newAggregate = aggregate
newAggregate[elem.0] = elem.1
return newAggregate
}
You can extend Dictionary so that it takes a sequence of tuples as initial values:
extension Dictionary {
public init<S: SequenceType where S.Generator.Element == (Key, Value)>(_ seq: S) {
self.init()
for (k, v) in seq { self[k] = v }
}
}
and then do
let newDictionary = Dictionary(newTupleArray)
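A Java analogue, for comparison: streaming a map's entrySet() through filter and collecting with Collectors.toMap turns the filtered entries straight back into a map, which is the job reduce does in the Swift answers. MapFilter.withoutKey is a hypothetical helper; the LinkedHashMap factory only preserves the source iteration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: filter a map by key and collect the result back into a map.
class MapFilter {
    static Map<String, String> withoutKey(Map<String, String> source, String excluded) {
        return source.entrySet().stream()
                .filter(e -> !e.getKey().equals(excluded))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,          // merge function; keys are already unique
                        LinkedHashMap::new)); // keep the source iteration order
    }
}
```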

Linq retrieve non repeated values

I have two tables, Event_Day and Event_Session, which look like this:
Event_Day
Event_Day_Id(PK) Event_Id DayNo Day_Date
420 120 1 20/6/2013
421 120 2 21/6/2013
422 120 3 22/6/2013
Event_Session
Event_Session_Id(PK) Event_Id Event_Day_Id
170 120 420
171 120 420
172 120 420
173 120 421
174 120 421
175 120 421
I want to retrieve the following by comparing data from these two tables using LINQ:
Event_Day_Id DayNo DayDate
420 1 21/6/2013
421 2 22/6/2013
Please help me retrieve this data using LINQ.
You want to use the Enumerable.Distinct LINQ method (see its documentation). First, join the two tables (suppose your data sets are eventSessions and eventDays):
var dayInfo = from sess in eventSessions
join day in eventDays
on sess.Event_Day_Id equals day.Event_Day_Id
select new { Event_Day_Id = sess.Event_Day_Id, DayNo = day.DayNo, DayDate = day.Day_Date };
If you are unfamiliar with LINQ equijoins, see the documentation for the join clause.
After that, you want to use the Distinct method:
var uniqueDayInfo = dayInfo.Distinct();
Note that Distinct also has an overload that takes an IEqualityComparer, for cases where you do not want to use the default equality comparison. The initial LINQ query (the one that initializes the dayInfo variable) ends with a projection to an anonymous type. The default equality comparison for anonymous types compares all of their properties using each property's default equality. See the documentation on anonymous types for more about their overridden Equals method.
If all the selected properties are simple (ints, DateTimes, etc.), this should be sufficient for the Distinct to work as desired without providing the optional IEqualityComparer argument.
If you have a helper class to create a comparer (shown below) you can do it like this:
var unique_session = Sessions.Distinct(
new GenComp<Event_Session>((a,b) =>
(a.Event_Id == b.Event_Id) && (a.Event_Day_Id == b.Event_Day_Id),
(a) => a.Event_Id.GetHashCode()+a.Event_Day_Id.GetHashCode()));
var result = unique_session.Join(Days,
s => new { Event_Id = s.Event_Id, Event_Day_Id = s.Event_Day_Id },
d => new { Event_Id = d.Event_Id, Event_Day_Id = d.Event_Day_Id },
(s, d) => new { Event_Day_Id = d.Event_Day_Id,
DayNo = d.DayNo,
DayDate = d.Day_Date });
Here is the helper class
public class GenComp<T> : IEqualityComparer<T>
{
public Func<T, T, bool> comp { get; private set; }
public Func<T, int> hash { get; private set; }
public GenComp(Func<T, T, bool> inComp, Func<T,int> inHash)
{
comp = inComp;
hash = inHash;
}
public GenComp(Func<T, T, bool> inComp)
{
comp = inComp;
hash = null;
}
public bool Equals(T x, T y)
{
return comp(x, y);
}
public int GetHashCode(T obj)
{
return hash == null ? obj.GetHashCode() : hash(obj);
}
}
Full source code test which runs under LinqPad is here: https://gist.github.com/hoganlong/5820080
Note: I suggest LinqPad at LinqPad.com for solving these types of problems - it rocks.
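For comparison, here is the same result expressed with Java streams as a semi-join: collect the set of day IDs referenced by any session, then keep the days whose ID is in that set. This is a different technique from join-then-Distinct (the Set removes the repeats, so no separate distinct step is needed). EventDay, EventSession, and SemiJoin are hypothetical Java counterparts of the tables; joining on Event_Day_Id alone is enough because it is the primary key of Event_Day.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical Java counterparts of the Event_Day and Event_Session rows.
class EventDay {
    final int eventDayId;
    final int eventId;
    final int dayNo;
    final String dayDate;

    EventDay(int eventDayId, int eventId, int dayNo, String dayDate) {
        this.eventDayId = eventDayId;
        this.eventId = eventId;
        this.dayNo = dayNo;
        this.dayDate = dayDate;
    }
}

class EventSession {
    final int eventSessionId;
    final int eventId;
    final int eventDayId;

    EventSession(int eventSessionId, int eventId, int eventDayId) {
        this.eventSessionId = eventSessionId;
        this.eventId = eventId;
        this.eventDayId = eventDayId;
    }
}

class SemiJoin {
    // Returns each day that at least one session refers to, in the days' order.
    static List<EventDay> daysWithSessions(List<EventDay> days, List<EventSession> sessions) {
        // The Set holds every referenced day ID once, so repeated sessions collapse here.
        Set<Integer> usedDayIds = sessions.stream()
                .map(s -> s.eventDayId)
                .collect(Collectors.toSet());
        return days.stream()
                .filter(d -> usedDayIds.contains(d.eventDayId))
                .collect(Collectors.toList());
    }
}
```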
