Replication factor: 3 larger than available brokers: 1 in @EmbeddedKafka - spring-kafka

I want to test a Kafka transaction.
kafkaTemplate.executeInTransaction { tx ->
    tx.sendDefault("abacaba") // Should I do .get() ??
    tx.sendDefault("abacaba")
}
And I get the following log when the test starts:
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
2023-01-27 16:18:17.831 INFO 81975 --- [quest-handler-4] kafka.server.ZkAdminManager
: [Admin Manager on Broker 0]: Error processing create topic request
CreatableTopic(name='__transaction_state', numPartitions=50, replicationFactor=3,
assignments=[], configs=[CreateableTopicConfig(name='compression.type',
value='uncompressed'), CreateableTopicConfig(name='cleanup.policy', value='compact'),
CreateableTopicConfig(name='min.insync.replicas', value='2'),
CreateableTopicConfig(name='segment.bytes', value='104857600'),
CreateableTopicConfig(name='unclean.leader.election.enable', value='false')])
I tried setting the replication factor but it doesn't work :(
Help me, please.

You didn't say in your question that you deal with @EmbeddedKafka. See its JavaDocs for more info:
/**
 * @return the number of brokers
 */
@AliasFor("value")
int count() default 1;
When you have enough brokers in the cluster, you can request a replication factor for this or that topic, but not more than the number of brokers, of course.
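For example, here is a minimal sketch of the two usual ways to resolve this error in a test (the test class names are hypothetical; transaction.state.log.* are standard Kafka broker settings):
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.kafka.test.context.EmbeddedKafka

// Option 1: keep a single broker, but let it create the internal
// __transaction_state topic with replication factor 1.
@SpringBootTest
@EmbeddedKafka(
    count = 1,
    brokerProperties = [
        "transaction.state.log.replication.factor=1",
        "transaction.state.log.min.isr=1"
    ]
)
class SingleBrokerTransactionTest

// Option 2: start as many embedded brokers as the replication
// factor being requested.
@SpringBootTest
@EmbeddedKafka(count = 3)
class ThreeBrokerTransactionTest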


(Airflow) EMR steps operator -> EMR steps sensor; sensor failed -> re-trigger the operator before it

I want to handle failovers.
But when the sensor fails, only the sensor itself is retried; I want the operator before it to be triggered again as well.
This is the flow chart:
a -> a_sensor (failed) -> a (retry) -> a_sensor -> (done)
Can I do this?
I recommend waiting for the EMR job in the operator itself. Even if this keeps the task running and occupying a worker slot, it doesn't consume many resources, and you can easily manage the timeout, cleanup, and retry strategy:
import time
from airflow.models import BaseOperator

class EmrOperator(BaseOperator):
    ...
    def execute(self, context):
        self.run_job()
        self.wait_job()

    def wait_job(self):
        while not self.is_finished():
            time.sleep(10)

    def on_kill(self):
        self.cleanup()
And you can use the official EmrAddStepsOperator, which already supports this.
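A minimal sketch of that approach, assuming a recent apache-airflow-providers-amazon package (the SPARK_STEPS variable is illustrative, and the wait_for_completion flag depends on your provider version):
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

add_steps = EmrAddStepsOperator(
    task_id="add_steps",
    job_flow_id="{{ ti.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    steps=SPARK_STEPS,          # your step definitions
    wait_for_completion=True,   # keep the task running until the steps finish
)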
And if you want to implement what you mentioned in the question: Airflow doesn't support retrying a group of tasks yet, but you can achieve it using callbacks:
a = EmrOperator(..., retries=0)
a_sensor = EmrSensor(..., retries=0, on_failure_callback=emr_a_callback)

def emr_a_callback(context):
    ti = context["ti"]
    dag_run = context["dag_run"]
    max_retries = 3
    # assumes something pushes "retry_num" to XCom on each attempt
    retry_num = ti.xcom_pull(task_ids=ti.task_id, key="retry_num")
    if retry_num > max_retries:
        return  # do nothing
    task_a = dag_run.get_task_instance("<task a id>")
    task_a.state = None  # reset task a's state
    ti.state = None      # reset the sensor's state

Slurm: all CPUs in a node are allocated by a job which needs only a subset of CPUs

I have every node configured as follows in slurm.conf:
NodeName=node1 NodeAddr=xxx.xxx.xxx.xxx State=UNKNOWN Procs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=128000 TmpDisk=65536
When I run the following command:
srun -n 2 sleep 60
I find that all the cores in the node are allocated to this job. If another job wants to run on this node, it is blocked until the previous job finishes.
scontrol shows the job information as follows:
JobId=51 JobName=sleep
UserId=hadoop(1002) GroupId=hadoop(1002) MCS_label=N/A
Priority=4294901703 Nice=0 Account=hadoop QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:12 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2018-07-16T21:46:56 EligibleTime=2018-07-16T21:46:56
StartTime=2018-07-16T21:46:56 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2018-07-16T21:46:56
Partition=TOTAL AllocNode:Sid=node1:25124
ReqNodeList=(null) ExcNodeList=(null)
NodeList=xxx.xxx.xxx
BatchHost=xxx.xxx.xxx
NumNodes=1 NumCPUs=32 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=32,mem=125G,node=1,billing=32
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=125G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=sleep
WorkDir=/home/hadoop
Power=
Using sacct to get the job history, I get the following output:
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
51 sleep TOTAL hadoop 32 COMPLETED 0:0
51.0 sleep hadoop 2 COMPLETED 0:0
Showing the partition information:
PartitionName=TOTAL
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO
MaxCPUsPerNode=UNLIMITED
Nodes=xxxxxxx
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=96 TotalNodes=3 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
Something seems wrong.
The problem is caused by SelectType. I had left it at the default value, which I believe is select/linear. As mentioned in the Select Plugin Design Guide, select/linear is node-centric:
The select/linear and select/cons_res plugins have similar modes of operation. The obvious difference is that data structures in select/linear are node-centric, while those in select/cons_res contain information at a finer resolution (sockets, cores, threads, or CPUs depending upon the SelectTypeParameters configuration parameter).
I changed SelectType to select/cons_res and restarted the whole cluster, and the problem was solved.
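For reference, a minimal sketch of the corresponding slurm.conf change (CR_Core is one common choice for SelectTypeParameters; pick whichever resolution matches your scheduling policy):
SelectType=select/cons_res
SelectTypeParameters=CR_Core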

MariaDB + MaxScale Replication Error: The slave I/O thread stops because a fatal error is encountered when it tried to SELECT @master_binlog_checksum

I am trying to set up real-time data streaming to Kafka with MaxScale CDC on MariaDB version 10.0.32. After configuring replication, I am getting the status:
"The slave I/O thread stops because a fatal error is encountered when it tried to SELECT @master_binlog_checksum".
Below are all of my configurations:
MariaDB - Configuration
server-id = 1
#report_host = master1
#auto_increment_increment = 2
#auto_increment_offset = 1
log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
binlog_format = row
binlog_row_image = full
# not fab for performance, but safer
#sync_binlog = 1
expire_logs_days = 10
max_binlog_size = 100M
# slaves
#relay_log = /var/log/mysql/relay-bin
#relay_log_index = /var/log/mysql/relay-bin.index
#relay_log_info_file = /var/log/mysql/relay-bin.info
#log_slave_updates
#read_only
MaxScale Configuration
[server1]
type=server
address=192.168.56.102
port=3306
protocol=MariaDBBackend
[Replication]
type=service
router=binlogrouter
version_string=10.0.27-log
user=myuser
passwd=mypwd
server_id=3
#binlogdir=/var/lib/maxscale
#mariadb10-compatibility=1
router_options=binlogdir=/var/lib/maxscale,mariadb10-compatibility=1
#slave_sql_verify_checksum=1
[Replication Listener]
type=listener
service=Replication
protocol=MySQLClient
port=5308
Starting Replication
CHANGE MASTER TO MASTER_HOST='192.168.56.102', MASTER_PORT=5308, MASTER_USER='myuser', MASTER_PASSWORD='mypwd', MASTER_LOG_POS=328, MASTER_LOG_FILE='mariadb-bin.000018';
START SLAVE;
Replication Status
Master_Host: 192.168.56.102
Master_User: myuser
Master_Port: 5308
Connect_Retry: 60
Master_Log_File: mariadb-bin.000018
Read_Master_Log_Pos: 328
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 4
Relay_Master_Log_File: mariadb-bin.000018
**Slave_IO_Running: No**
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 328
Relay_Log_Space: 248
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: **The slave I/O thread stops because a fatal error is encountered when it tried to SELECT @master_binlog_checksum. Error:**
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
The binlogrouter performs the following query to set the value of @master_binlog_checksum (real replication slaves perform the same query):
SET @master_binlog_checksum = @@global.binlog_checksum
Checking its output will probably explain why replication won't start. Most likely the SET query failed, which is why the subsequent SELECT @master_binlog_checksum query returns unexpected results.
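To verify by hand, one could run the same statements with a mysql client against the binlogrouter listener (port 5308 in this configuration) and compare against the real master; a diagnostic sketch:
SET @master_binlog_checksum = @@global.binlog_checksum;
SELECT @master_binlog_checksum;
-- on the real master, for comparison:
SHOW GLOBAL VARIABLES LIKE 'binlog_checksum';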
In cases like these, it is recommended to open a bug report on the MariaDB Jira under the MaxScale project. This way the possibility of a bug is ruled out and if it turns out to be a configuration problem, the documentation can be updated to more clearly explain how to configure MaxScale.

Setting the TCP keepalive interval on the Hiredis async context

I'm writing a wrapper around hiredis to enable publish / subscribe functionality with reconnects should a redis node go down.
I'm using the asynchronous redis API.
So I have a test harness that sets up a publisher and subscriber. The harness then shuts down the slave VM from which the subscriber is reading.
However, the disconnect callback isn't called until much later (when I'm destructing the Subscription object that contains the corresponding redisAsyncContext).
I thought that the solution to this might be using tcp keepalive.
So I found that there's a suitable redis function in net.h:
int redisKeepAlive(redisContext *c, int interval);
However, the following appears to show that the redisKeepAlive function has been omitted from the library on purpose:
$ nm libhiredis.a --demangle | grep redisKeepAlive
0000000000000030 T redisKeepAlive
U redisKeepAlive
$ nm libhiredis.a -u --demangle | grep redisKeepAlive
U redisKeepAlive
Certainly when I try to use the call, the linker complains:
Subscription.cpp:167: undefined reference to `redisKeepAlive(redisContext*, int)'
collect2: error: ld returned 1 exit status
Am I out of luck - is there a way to set the TCP keepalive interval on the Hiredis async context?
Update
I've found this:
int redisEnableKeepAlive(redisContext *c);
But setting this on the asyncContext->c and adjusting REDIS_KEEPALIVE_INTERVAL seems to have no effect.
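For clarity, a sketch of what that attempt looks like (redisAsyncConnect, redisEnableKeepAlive, and the embedded c member are hiredis API; error handling elided):
redisAsyncContext *ac = redisAsyncConnect("127.0.0.1", 6379);
if (ac != NULL && ac->err == 0)
    redisEnableKeepAlive(&ac->c); /* acts on the embedded redisContext */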
I found that the implementation of redisKeepAlive contains code that shows how to get direct access to the underlying socket descriptor:
int redisKeepAlive(redisContext *c, int interval) {
    int val = 1;
    int fd = c->fd;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) == -1) {
        __redisSetError(c, REDIS_ERR_OTHER, strerror(errno));
        return REDIS_ERR;
    }
    /* ... followed by platform-specific setsockopt calls (TCP_KEEPIDLE,
     * TCP_KEEPINTVL, TCP_KEEPCNT) that apply the interval ... */
}
Maybe this'll help someone.
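Building on that, here is a minimal sketch of applying a keepalive interval directly to the socket behind a redisAsyncContext. This is a hypothetical helper using Linux-specific TCP keepalive options, not part of hiredis:
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <hiredis/async.h>

/* Hypothetical helper, mirroring what redisKeepAlive does internally. */
static int set_keepalive_interval(redisAsyncContext *ac, int interval)
{
    int fd = ac->c.fd;  /* socket behind the async context */
    int on = 1;
    int cnt = 3;        /* probes before the peer is declared dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    /* idle time before the first probe and interval between probes */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &interval, sizeof(interval)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
        return -1;
    return 0;
}
One would call this right after redisAsyncConnect, before entering the event loop.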

Using telnetlib. How to make use of a response after writing a command

I am extremely new to Python and after several weeks of study and practice programming I have begun my home automation project.
My aim is to interact with a Java based service that runs on my Windows machine called C-Gate. This then interprets and communicates with the much more complicated commands sent & received by my Clipsal C-Bus automation system.
So far I have managed to create a connection to C-Gate using telnetlib and write/read commands.
What I have been trying to figure out for some time now is how to use the responses given by C-Gate, extract a particular value such as the level of a light, and turn it into a variable I can use.
From there I should be able to expand my understanding and begin building an interface to use/monitor these values.
Here is my code so far:
import telnetlib
HOST = "localhost"
PORT = "20023"
def read_until(cue, timeout=2):
    print tn.read_until(cue, timeout)

def tn_write(text):
    tn.write(text)
tn = telnetlib.Telnet(HOST, PORT)
tn_write('project load MYHOUSE\r\n')
read_until('200 OK.')
tn_write('project use MYHOUSE\r\n')
read_until('200 OK.')
tn_write('net open 254\r\n')
read_until('200 OK: //MYHOUSE/254')
tn_write('project start\r\n')
read_until('200 OK.')
tn_write('get 254/56/1 level\r\n')
tn_write('get 254/56/2 level\r\n')
tn_write('get 254/56/3 level\r\n')
tn_write('get 254/56/4 level\r\n')
tn_write('get 254/56/5 level\r\n')
tn_write('get 254/56/6 level\r\n')
tn_write('get 254/56/7 level\r\n')
tn_write('get 254/56/8 level\r\n')
read_until('300 //MYHOUSE/254/56/1: level=0\r\n')
This then prints the following responses:
201 Service ready: Clipsal C-Gate Version: v2.9.5 (build 2460) #cmd-syntax=1.0
200 OK.
200 OK.
200 OK: //MYHOUSE/254
200 OK.
300 //MYHOUSE/254/56/1: level=100
300 //MYHOUSE/254/56/2: level=0
300 //MYHOUSE/254/56/3: level=0
300 //MYHOUSE/254/56/4: level=0
300 //MYHOUSE/254/56/5: level=0
300 //MYHOUSE/254/56/6: level=0
300 //MYHOUSE/254/56/7: level=0
300 //MYHOUSE/254/56/8: level=0
You can perform this task in a few ways, but I would recommend using regular expressions. First of all, the telnet code should be modified a little bit:
def read_until(cue, timeout=2):
    return tn.read_until(cue, timeout)

telnetReturnedValue = read_until('200 OK: //MYHOUSE/254')
Example code which would extract the value of "level":
import re

rePattern = r'level=(\d+)'
matchTuple = re.search(rePattern, telnetReturnedValue)
if matchTuple is not None:
    levelValue = matchTuple.groups()[0]
    print(levelValue)
The re library documentation: http://docs.python.org/2/library/re.html
UPDATE:
Answering whittie83's more detailed question: there are many ways to write this program. It is still about Python basics, so I would recommend learning from at least some tutorial, for example: http://docs.python.org/2/tutorial/.
Anyway, I would put the regexp code into a function like this:
import re

def parseTelnetOutput(output, rePattern=r'level=(\d+)'):
    print(output)
    matchTuple = re.search(rePattern, output)
    if matchTuple is not None:
        someValue = matchTuple.groups()[0]
        return someValue

# telnet function definitions and telnet initiation

levelsList = []  # empty list for the extracted levels

# some telnet commands...
tn_write('project load MYHOUSE\r\n')
print(read_until('200 OK.'))  # not parsing, only printing

tn_write('net open 254\r\n')
parseResult = parseTelnetOutput(read_until('200 OK: //MYHOUSE/254/56/1'))
levelsList.append(parseResult)  # adding the extracted level to the list

tn_write('net open 254\r\n')
parseResult = parseTelnetOutput(read_until('200 OK: //MYHOUSE/254/56/2'))
levelsList.append(parseResult)  # adding the extracted level to the list
# etc...

for pr in levelsList:  # iterating over the list of extracted levels
    print(pr)
This is only a suggestion. You can merge the 3 functions into 1, embed the variables and functions in a class, and so on. It is up to you.
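For the repeated get commands from the question, a minimal sketch that folds them into a loop (assuming the tn_write and the modified, returning read_until defined above):
levelsList = []
for group in range(1, 9):  # groups 254/56/1 through 254/56/8
    tn_write('get 254/56/%d level\r\n' % group)
    output = read_until('\r\n')  # one "300 ... level=N" reply per command
    levelsList.append(parseTelnetOutput(output))
print(levelsList)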
