I am implementing a simple data analytics feature with RxJava, where a topic subscriber asynchronously processes the data published to a topic and deposits the output in Redis.
When a message is received, the Spring component publishes it to an Observable. To avoid blocking the submission, I used RxJava's Async utilities to do this asynchronously.
@Override
public void onMessage(final TransactionalMessage message) {
    Async.start(new Func0<Void>() {
        @Override
        public Void call() {
            analyser.process(message);
            return null;
        }
    });
}
I have two points of confusion in implementing the other processing parts: 1) creating an asynchronous Observable with buffering, and 2) computing different logic in parallel, based on message type, over a list of messages.
After much experimenting I found two ways to create the async Observable, and I am not sure which one is the right, or better, approach.
Way one:
private static final class Analyzer {

    private Subscriber<? super TransactionalMessage> subscriber;

    public Analyzer() {
        OnSubscribe<TransactionalMessage> f = subscriber -> this.subscriber = subscriber;
        Observable.create(f).observeOn(Schedulers.computation())
                .buffer(5, TimeUnit.SECONDS, 5, Schedulers.io())
                .skipWhile((list) -> list == null || list.isEmpty())
                .subscribe(t -> compute(t));
    }

    public void process(TransactionalMessage message) {
        subscriber.onNext(message);
    }
}
Way two:
private static final class Analyser {

    private PublishSubject<TransactionalMessage> subject;

    public Analyser() {
        subject = PublishSubject.create();
        Observable<List<TransactionalMessage>> observable = subject
                .buffer(5, TimeUnit.SECONDS, 5, Schedulers.io())
                .observeOn(Schedulers.computation());
        observable.subscribe(new Observer<List<TransactionalMessage>>() {

            @Override
            public void onCompleted() {
                log.debug("[Analyser] onCompleted(), completed!");
            }

            @Override
            public void onError(Throwable e) {
                log.error("[Analyser] onError(), exception, ", e);
            }

            @Override
            public void onNext(List<TransactionalMessage> t) {
                compute(t);
            }
        });
    }

    public void process(TransactionalMessage message) {
        subject.onNext(message);
    }
}
The TransactionalMessage comes in different types, and I want to perform different computations based on the type. One approach I tried is to filter the list by each type and process the results separately, but this looks bad and I don't think it runs in parallel. What is the right way to process them in parallel?
protected void compute(List<TransactionalMessage> messages) {
    Observable<TransactionalMessage> observable = Observable.from(messages);

    Observable<String> observable2 = observable
            .filter(new Func1<TransactionalMessage, Boolean>() {
                @Override
                public Boolean call(TransactionalMessage t) {
                    return t.getMsgType().equals(OttMessageType.click.name());
                }
            })
            .flatMap(new Func1<TransactionalMessage, Observable<String>>() {
                @Override
                public Observable<String> call(TransactionalMessage t) {
                    return Observable.just(t.getMsgType() + t.getAppId());
                }
            });

    Observable<String> observable3 = observable
            .filter(new Func1<TransactionalMessage, Boolean>() {
                @Override
                public Boolean call(TransactionalMessage t) {
                    return t.getMsgType().equals(OttMessageType.image.name());
                }
            })
            .flatMap(new Func1<TransactionalMessage, Observable<String>>() {
                @Override
                public Observable<String> call(TransactionalMessage t) {
                    return Observable.just(t.getMsgType() + t.getAppId());
                }
            });

    // I sense some code smell in filtering on type and processing it.
    Observable.merge(observable2, observable3)
            .subscribe(new Action1<String>() {
                @Override
                public void call(String t) {
                    // save it to redis
                    System.out.println(t);
                }
            });
}
I suggest thinking about Subjects before attempting to use create.
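For instance, one detail worth checking with the Subject-based variant (a minimal sketch, assuming RxJava 1.x and that onMessage may be invoked from more than one listener thread): PublishSubject.onNext is not serialized, so the subject can be wrapped with toSerialized():
// RxJava 1.x: wrap the subject so concurrent onNext() calls are serialized.
// TransactionalMessage is the asker's domain type.
Subject<TransactionalMessage, TransactionalMessage> subject =
        PublishSubject.<TransactionalMessage>create().toSerialized();

// process() can then forward messages from any thread:
// subject.onNext(message);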
If you want parallel processing done based on some categorization, you could use groupBy along with observeOn to achieve the desired effect:
Observable.range(1, 100)
        .groupBy(v -> v % 3)
        .flatMap(g ->
                g.observeOn(Schedulers.computation())
                        .reduce(0, (a, b) -> a + b)
                        .map(v -> g.getKey() + ": " + v)
        )
        .toBlocking().forEach(System.out::println);
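Applied to the question's compute(List<TransactionalMessage> messages), a rough sketch along the same lines might look like this (computeForType is a hypothetical per-type helper, not something from the original code):
// Sketch only: group the buffered list by message type and process each group
// on the computation scheduler; computeForType(...) stands in for whatever
// per-type logic and Redis write you need.
Observable.from(messages)
        .groupBy(TransactionalMessage::getMsgType)
        .flatMap(group ->
                group.observeOn(Schedulers.computation())
                        .map(msg -> computeForType(group.getKey(), msg)))
        .subscribe(result -> {
            // save result to Redis
        });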
There is quite a simple case I would like to implement:
I have a base topic and a DLT topic:
MessageBus:
Topic: my_topic
DltTopic: my_dlt_topic
Broker: event-serv:9092
So, those topics are already predefined; I don't need to create them automatically.
The only thing I need is to handle broken messages automatically, without retries, because retrying them doesn't make any sense. So I have something like this:
@KafkaListener(topics = ["#{config.messageBus.topic}"], groupId = "group_id")
@RetryableTopic(
    dltStrategy = DltStrategy.FAIL_ON_ERROR,
    autoCreateTopics = "false",
    attempts = "1"
)
@Throws(IOException::class)
fun consume(rawMessage: String?) {
    ...
}

@DltHandler
fun processMessage(rawMessage: String?) {
    kafkaTemplate.send(config.messageBus.dltTopic, rawMessage)
}
That, of course, doesn't work properly.
I also tried to specify a kafkaTemplate:
@Bean
fun kafkaTemplate(
    config: Config,
    producerFactory: ProducerFactory<String, String>
): KafkaTemplate<String, String> {
    val template = KafkaTemplate(producerFactory)
    template.defaultTopic = config.messageBus.dltTopic
    return template
}
However, that does not change the situation.
In the end, I believe there is an obvious solution, so please give me a hint about it.
See the documentation.
@SpringBootApplication
public class So69317126Application {

    public static void main(String[] args) {
        SpringApplication.run(So69317126Application.class, args);
    }

    @RetryableTopic(attempts = "1", autoCreateTopics = "false", dltStrategy = DltStrategy.FAIL_ON_ERROR)
    @KafkaListener(id = "so69317126", topics = "so69317126")
    void listen(String in) {
        System.out.println(in);
        throw new RuntimeException();
    }

    @DltHandler
    void handler(String in) {
        System.out.println("DLT: " + in);
    }

    @Bean
    RetryTopicNamesProviderFactory namer() {
        return new RetryTopicNamesProviderFactory() {

            @Override
            public RetryTopicNamesProvider createRetryTopicNamesProvider(Properties properties) {
                if (properties.isMainEndpoint()) {
                    return new SuffixingRetryTopicNamesProviderFactory.SuffixingRetryTopicNamesProvider(properties) {

                        @Override
                        public String getTopicName(String topic) {
                            return "so69317126";
                        }
                    };
                }
                else if (properties.isDltTopic()) {
                    return new SuffixingRetryTopicNamesProviderFactory.SuffixingRetryTopicNamesProvider(properties) {

                        @Override
                        public String getTopicName(String topic) {
                            return "so69317126.DLT";
                        }
                    };
                }
                else {
                    throw new IllegalStateException("Shouldn't get here - attempts is only 1");
                }
            }
        };
    }
}
so69317126: partitions assigned: [so69317126-0]
so69317126-dlt: partitions assigned: [so69317126.DLT-0]
foo
DLT: foo
This is a Kafka broker configuration, so you must set it on the server. The relevant property is:
auto.create.topics.enable (true by default)
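If you did want to change it, it would go in the broker's server.properties (shown here only as an illustration of where the property lives, not something from the original question):
# Kafka broker setting: disables automatic topic creation
auto.create.topics.enable=false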
Is there an example of how to create a GlobalKTable to keep a count from a KStream, using Spring Cloud Stream and the functional approach?
Is implementing the Processor interface the right approach?
@Bean
public Consumer<KStream<String, Long>> processorsample() {
    return input -> input.process(() -> new Processor<String, Long>() {

        // state store handle (the field was implied but not declared in the original snippet)
        private KeyValueStore<String, Long> state;

        @Override
        public void init(ProcessorContext context) {
            if (state == null) {
                state = (KeyValueStore<String, Long>) context.getStateStore("mystate");
            }
        }

        @Override
        public void process(String key, Long value) {
            if (state != null && key != null) {
                Long currentCount = state.get(key);
                if (currentCount == null) {
                    state.put(key, value);
                } else {
                    state.put(key, currentCount + value);
                }
            }
        }

        @Override
        public void close() {
            if (state != null) {
                state.close();
            }
        }
    }, "mystate");
}
According to the documentation, GlobalKTables are read-only; you cannot modify a global table during processing.
Since a GlobalKTable is a consumer of a Kafka topic, you can just send your data to the GlobalKTable's source topic and, eventually, it will be added to the table. But you cannot be sure that the GlobalKTable will be updated immediately.
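For the counting part itself, a minimal functional-style sketch could aggregate on the KStream and emit the running totals to the output binding's topic; the bean name and the "counts-store" store name here are assumptions, not from the question. A GlobalKTable can then be built from whatever topic that output binding writes to.
import java.util.function.Function;

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CountTopology {

    // Sum incoming Long values per key and stream the running totals out again.
    @Bean
    public Function<KStream<String, Long>, KStream<String, Long>> counter() {
        return input -> input
                .groupByKey()
                .reduce(Long::sum, Materialized.as("counts-store"))
                .toStream();
    }
}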
PlaceAutocompleteFragment is deprecated. I need alternative code that does the same thing.
Here's the code sample:
PlaceAutocompleteFragment autocompleteFragment = (PlaceAutocompleteFragment)
        getFragmentManager().findFragmentById(R.id.place_autocomplete_fragment);
autocompleteFragment.setOnPlaceSelectedListener(new PlaceSelectionListener() {
    @Override
    public void onPlaceSelected(Place place) {
        // TODO: Get info about the selected place.
        destination = place.getName().toString();
        destinationLatLng = place.getLatLng();
    }

    @Override
    public void onError(Status status) {
        // TODO: Handle the error.
    }
});
Try migrating to com.google.android.libraries.places.widget.AutocompleteSupportFragment.
There is more information here:
https://developers.google.com/places/android-sdk/autocomplete#option_1_embed_an_autocompletesupportfragment
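A rough equivalent with the new Places SDK might look like the sketch below. It assumes Places.initialize(...) has already been called with your API key, that the fragment declared under R.id.place_autocomplete_fragment in your layout is now an AutocompleteSupportFragment, and that destination/destinationLatLng are the same fields as in the original snippet.
// Classes come from com.google.android.libraries.places.*; Arrays is java.util.Arrays.
AutocompleteSupportFragment autocompleteFragment = (AutocompleteSupportFragment)
        getSupportFragmentManager().findFragmentById(R.id.place_autocomplete_fragment);

// The new SDK requires you to declare which place fields you want back.
autocompleteFragment.setPlaceFields(Arrays.asList(Place.Field.NAME, Place.Field.LAT_LNG));

autocompleteFragment.setOnPlaceSelectedListener(new PlaceSelectionListener() {
    @Override
    public void onPlaceSelected(Place place) {
        destination = place.getName();
        destinationLatLng = place.getLatLng();
    }

    @Override
    public void onError(Status status) {
        // Handle the error.
    }
});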
Given an object like this:
Matcher matcher = pattern.matcher(sql);
with usage like so:
Set<String> matches = new HashSet<>();
while (matcher.find()) {
matches.add(matcher.group());
}
I'd like to replace this while loop with something more object-oriented, like so:
new Iterator<String>() {
    @Override
    public boolean hasNext() {
        return matcher.find();
    }

    @Override
    public String next() {
        return matcher.group();
    }
}
so that I can easily e.g. make a Stream of matches, stick to using fluent APIs and such.
The thing is, I don't know and can't find a more concise way to create this Stream or Iterator. An anonymous class like above is too verbose for my taste.
I had hoped to find something like IteratorFactory.from(matcher::find, matcher::group) or StreamSupport.of(matcher::find, matcher::group) in the jdk, but so far no luck. I've no doubt libraries like apache commons or guava provide something for this, but let's say I can't use those.
Is there a convenient factory for Streams or Iterators that takes a hasNext/next method combo in the jdk?
In Java 9 you could do it via:
Set<String> result = matcher.results()
        .map(MatchResult::group)
        .collect(Collectors.toSet());
System.out.println(result);
In Java 8 you would need a back-port for this, taken from Holger's fabulous answer.
EDIT
There is, by the way, a single method, tryAdvance, that could incorporate find/group; something like this:
static class MyIterator extends AbstractSpliterator<String> {

    private Matcher matcher;

    public MyIterator(Matcher matcher) {
        // I can't think of a better way to estimate the size here,
        // maybe you can figure out a better one
        super(matcher.regionEnd() - matcher.regionStart(), 0);
        this.matcher = matcher;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        while (matcher.find()) {
            action.accept(matcher.group());
            return true;
        }
        return false;
    }
}
And usage for example:
Pattern p = Pattern.compile("\\d");
Matcher m = p.matcher("12345");
Set<String> result = StreamSupport.stream(new MyIterator(m), false)
.collect(Collectors.toSet());
This class I wrote embodies what I wanted to find in the JDK. Apparently it just doesn't exist. Eugene's accepted answer offers a Java 9 Stream solution, though.
public static class SearchingIterator<T> implements Iterator<T> {

    private final BooleanSupplier advancer;
    private final Supplier<T> getter;
    private Optional<T> next;

    public SearchingIterator(BooleanSupplier advancer, Supplier<T> getter) {
        this.advancer = advancer;
        this.getter = getter;
        search();
    }

    private void search() {
        boolean hasNext = advancer.getAsBoolean();
        next = hasNext ? Optional.of(getter.get()) : Optional.empty();
    }

    @Override
    public boolean hasNext() {
        return next.isPresent();
    }

    @Override
    public T next() {
        T current = next.orElseThrow(IllegalStateException::new);
        search();
        return current;
    }
}
Usage:
Matcher matcher = Pattern.compile("\\d").matcher("123");
Iterator<String> it = new SearchingIterator<>(matcher::find, matcher::group);
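And if the end goal is a Stream of matches, as in the question, the iterator can be wrapped with the standard JDK spliterator helpers; a small usage sketch (the ORDERED/NONNULL characteristics are assumptions that hold for Matcher.group() results):
// Uses java.util.Spliterator(s) and java.util.stream.*
Stream<String> matches = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL),
        false);
Set<String> result = matches.collect(Collectors.toSet());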
We are using Storm to process streaming data and store it into HDFS. We have got everything working but have one issue. I understand that we can specify the number of tuples after which the data gets flushed to HDFS using a SyncPolicy, something like this:
SyncPolicy syncPolicy = new CountSyncPolicy(Integer.parseInt(args[3]));
The question I have is: can the data also be flushed after a timeout? For example, we have set the SyncPolicy above to 1000 tuples. If for whatever reason we get 995 tuples and then the data stops coming in for a while, is there any way that Storm can flush the 995 records to HDFS after a specified timeout (5 seconds)?
Thanks in advance for any help on this!
Shay
Yes, if you send a tick tuple to the HDFS bolt, it will cause the bolt to try to sync to the HDFS file system. All this happens in the HDFS bolt's execute function.
To configure tick tuples for your topology, set Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS in your topology config. In Java, setting it to every 300 seconds would look like this:
Config topologyConfig = new Config();
topologyConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
StormSubmitter.submitTopology("mytopology", topologyConfig, builder.createTopology());
You'll have to adjust that last line depending on your circumstances.
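If you would rather not change the tick frequency for the whole topology, one alternative (a sketch, assuming you own or subclass the bolt's code) is to scope it to the HDFS bolt by overriding getComponentConfiguration():
// Per-component tick tuples: only this bolt receives them, every 300 seconds.
// Uses java.util.HashMap/Map.
@Override
public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}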
There is an alternative solution for this problem.
First, let's clarify the sync policy: if your sync policy is 1000, then HdfsBolt only syncs the data after 1000 tuples, by calling hsync() in execute(). That only flushes the buffer by pushing data towards the disk, but for faster writes the disk may use its cache rather than writing to the file directly.
The data is written to the file only when the size of the data matches your rotation policy, which you need to specify at bolt creation time.
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
So, to flush the records to the file after a timeout, separate tick tuples from normal tuples in the execute method and compute the time difference between the two; if the difference is greater than the timeout period, write the data to the file.
By handling the tick tuple differently you can also avoid having the tick tuples themselves written to your file.
See the code below for a better understanding:
public class CustomHdfsBolt1 extends AbstractHdfsBolt {

    private static final Logger LOG = LoggerFactory.getLogger(CustomHdfsBolt1.class);

    private transient FSDataOutputStream out;
    private RecordFormat format;
    private long offset = 0L;
    private int tickTupleCount = 0;
    private String type;
    private long normalTupleTime;
    private long tickTupleTime;

    public CustomHdfsBolt1() {
    }

    public CustomHdfsBolt1(String type) {
        this.type = type;
    }

    public CustomHdfsBolt1 withFsUrl(String fsUrl) {
        this.fsUrl = fsUrl;
        return this;
    }

    public CustomHdfsBolt1 withConfigKey(String configKey) {
        this.configKey = configKey;
        return this;
    }

    public CustomHdfsBolt1 withFileNameFormat(FileNameFormat fileNameFormat) {
        this.fileNameFormat = fileNameFormat;
        return this;
    }

    public CustomHdfsBolt1 withRecordFormat(RecordFormat format) {
        this.format = format;
        return this;
    }

    public CustomHdfsBolt1 withSyncPolicy(SyncPolicy syncPolicy) {
        this.syncPolicy = syncPolicy;
        return this;
    }

    public CustomHdfsBolt1 withRotationPolicy(FileRotationPolicy rotationPolicy) {
        this.rotationPolicy = rotationPolicy;
        return this;
    }

    public CustomHdfsBolt1 addRotationAction(RotationAction action) {
        this.rotationActions.add(action);
        return this;
    }

    protected static boolean isTickTuple(Tuple tuple) {
        return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
    }

    public void execute(Tuple tuple) {
        try {
            if (isTickTuple(tuple)) {
                tickTupleTime = Calendar.getInstance().getTimeInMillis();
                // time elapsed since the last normal tuple (tick minus normal,
                // so the difference is positive once the data has gone quiet)
                long timeDiff = tickTupleTime - normalTupleTime;
                long diffInSeconds = TimeUnit.MILLISECONDS.toSeconds(timeDiff);
                if (diffInSeconds > 5) { // specify the value you want.
                    this.rotateWithOutFileSize(tuple);
                }
            } else {
                normalTupleTime = Calendar.getInstance().getTimeInMillis();
                this.rotateWithFileSize(tuple);
            }
        } catch (IOException var6) {
            LOG.warn("write/sync failed.", var6);
            this.collector.fail(tuple);
        }
    }

    public void rotateWithFileSize(Tuple tuple) throws IOException {
        syncHdfs(tuple);
        this.collector.ack(tuple);
        if (this.rotationPolicy.mark(tuple, this.offset)) {
            this.rotateOutputFile();
            this.offset = 0L;
            this.rotationPolicy.reset();
        }
    }

    public void rotateWithOutFileSize(Tuple tuple) throws IOException {
        syncHdfs(tuple);
        this.collector.ack(tuple);
        this.rotateOutputFile();
        this.offset = 0L;
        this.rotationPolicy.reset();
    }

    public void syncHdfs(Tuple tuple) throws IOException {
        byte[] e = this.format.format(tuple);
        synchronized (this.writeLock) {
            this.out.write(e);
            this.offset += (long) e.length;
            if (this.syncPolicy.mark(tuple, this.offset)) {
                if (this.out instanceof HdfsDataOutputStream) {
                    ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
                } else {
                    this.out.hsync();
                }
                this.syncPolicy.reset();
            }
        }
    }

    public void closeOutputFile() throws IOException {
        this.out.close();
    }

    public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
        LOG.info("Preparing HDFS Bolt...");
        this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
        this.tickTupleCount = 0;
        this.normalTupleTime = 0;
        this.tickTupleTime = 0;
    }

    public Path createOutputFile() throws IOException {
        Path path = new Path(this.fileNameFormat.getPath(),
                this.fileNameFormat.getName((long) this.rotation, System.currentTimeMillis()));
        this.out = this.fs.create(path);
        return path;
    }
}
You can directly use this class in your project.
Thanks,