Slow data reading from Google BigTable - airflow

Airflow 1.10.14 and Composer 1.15.2, Google Bigtable on GCP.
I'm getting this error:
{taskinstance.py:1152} ERROR - <_MultiThreadedRendezvous of RPC that terminated with
status = StatusCode.ABORTED
details = "Error while reading table 'projects/pyproject/instances/bigtable-02/tables/mytable' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)"
debug_error_string = "{"created":"@1649401290.125577144","description":"Error received from peer ipv4:142.251.6.95:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Error while reading table 'projects/pyproject/instances/bigtable-02/tables/mytable' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)","grpc_status":10}"
>
Traceback (most recent call last):
File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 980, in _run_raw_task
result = task_copy.execute(context=context)
File "/home/airflow/gcs/plugins/minutebardata.py", line 55, in execute
tk.get_data(dt=self.dt)
File "/home/airflow/gcs/plugins/scraper/ticker.py", line 46, in get_data
df0 = self.dbhook._load_all_minubebar_tickers(ticker)
File "/home/airflow/gcs/plugins/scraper/db_wrapper.py", line 583, in _load_all_minubebar_tickers
dfs = [self.process(row, ticker) for row in rows if row is not None]
File "/home/airflow/gcs/plugins/scraper/db_wrapper.py", line 583, in <listcomp>
dfs = [self.process(row, ticker) for row in rows if row is not None]
File "/opt/python3.6/lib/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 485, in __iter__
response = self._read_next_response()
File "/opt/python3.6/lib/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 474, in _read_next_response
return self.retry(self._read_next, on_error=self._on_error)()
File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/opt/python3.6/lib/python3.6/site-packages/google/cloud/bigtable/row_data.py", line 470, in _read_next
return six.next(self.response_iterator)
File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 416, in __next__
return self._next()
File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 803, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with
status = StatusCode.ABORTED
details = "Error while reading table 'projects/pyproject/instances/bigtable-02/tables/mytable' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)"
debug_error_string = "{"created":"@1649401290.125577144","description":"Error received from peer ipv4:142.251.6.95:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Error while reading table 'projects/pyproject/instances/bigtable-02/tables/mytable' : Response was not consumed in time; terminating connection.(Possible causes: slow client data read or network problems)","grpc_status":10}"
>
I use the following approach:
from google.cloud.bigtable.row_set import RowSet

row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=ticker + b"#" + startdt,
    end_key=ticker + b"#" + enddt)

rows = table.read_rows(row_set=row_set)
This code works properly when I run it locally. However, when I run it in GCP I get the error above.
Could you give me any hints for finding the solution?
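One thing that may help (not a definitive fix, just a sketch based on the error's hint about a slow client read): drain the row stream quickly and do the heavy per-row processing afterwards, or split the key range into smaller chunks. The snippet below assumes ticker, startdt and enddt are bytes, and process stands in for the per-row function shown in the traceback; the client setup is illustrative.

from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="pyproject")
table = client.instance("bigtable-02").table("mytable")

row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=ticker + b"#" + startdt,
    end_key=ticker + b"#" + enddt)

# Consume the gRPC stream promptly; defer the slow per-row work
# until all rows have been read.
raw_rows = [row for row in table.read_rows(row_set=row_set) if row is not None]
dfs = [process(row, ticker) for row in raw_rows]

If the range is very large, reading several narrower start_key/end_key ranges (for example, one per day) keeps each stream short enough that the server is less likely to abort it.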

Related

user id not found even when user exists

I have been trying to retrieve data from Instagram using instagramy, but I keep getting errors saying the profile doesn't exist, even though the profile username was copied from Instagram itself.
from instagramy.plugins.analysis import analyze_users_popularity
import pandas as pd

session_id = "58094758320%3AcPGOQQFP3YK2sq%3A0%3AAYeC7jQGmpaEYf2il0Evg60SXDeJarTsjUjc5TG7RQ"

# Instagram user_id of IPL teams
teams = ["chennaiipl", "mumbaiindians",
         "royalchallengersbangalore", "kkriders",
         "delhicapitals", "sunrisershyd",
         "kxipofficial"]

data = analyze_users_popularity(teams, session_id)
df = pd.DataFrame(data)
df
and I keep getting this error:
HTTPError: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\siddh\OneDrive\Desktop\instagram scrapper.py", line 17, in <module>
data = analyze_users_popularity(teams ,session_id)
File "C:\Users\siddh\anaconda3\lib\site-packages\instagramy\plugins\analysis.py", line 16, in analyze_users_popularity
user = InstagramUser(username, sessionid)
File "C:\Users\siddh\anaconda3\lib\site-packages\instagramy\InstagramUser.py", line 61, in __init__
data = self.get_json()
File "C:\Users\siddh\anaconda3\lib\site-packages\instagramy\InstagramUser.py", line 83, in get_json
raise UsernameNotFound(self.url.split("/")[-2])
UsernameNotFound: InstagramUser('chennaiipl') not Found

AirflowException("Task received SIGTERM signal")

I'm running Airflow with Docker Swarm on 5 servers. After two months of use, some DAGs started failing with errors like this. The errors occur in DAGs that use a custom Hive operator (similar to the built-in one), and no errors occurred during the first two months. (Nothing changed in the DAGs...)
Also, when I retry the DAG, sometimes it succeeds and sometimes it fails.
The really weird thing about this issue is that the Hive job itself did not fail: after the task was marked as failed in the Airflow webserver (SIGTERM), the query completed 1~10 minutes later.
As a result, the flow looks like this:
Task start -> 5~10 mins -> error (SIGTERM, Airflow) -> 1~10 mins -> Hive job success (Hadoop log)
[2023-01-09 08:06:07,583] {local_task_job.py:208} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
[2023-01-09 08:06:07,588] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 135213
[2023-01-09 08:06:07,588] {taskinstance.py:1236} ERROR - Received SIGTERM. Terminating subprocesses.
[2023-01-09 08:13:42,510] {taskinstance.py:1463} ERROR - Task failed with exception
Traceback (most recent call last):
File "/opt/airflow/dags/common/operator/hive_q_operator.py", line 81, in execute
cur.execute(statement) # hive query custom operator
File "/home/airflow/.local/lib/python3.8/site-packages/pyhive/hive.py", line 454, in execute
response = self._connection.client.ExecuteStatement(req)
File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 280, in ExecuteStatement
return self.recv_ExecuteStatement()
File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 292, in recv_ExecuteStatement
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 173, in read
self._read_frame()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 177, in _read_frame
header = self._trans_read_all(4)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 210, in _trans_read_all
return read_all(sz)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TSocket.py", line 150, in read
buff = self.handle.recv(sz)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1238, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
I already restarted the Airflow server and nothing changed.
Is there any helpful guide for me?
Thanks :)
Here is the failed task's log (Flower log):
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 88, in execute_command
_execute_in_fork(command_to_exec)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork
raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow.exceptions.AirflowException: Celery command failed on host: 8be4caa25d17

exception.ReadTimeout not handling properly

I looked through similar questions before posting, and one helped with the continue statement. It is still not handling the ReadTimeout exception. I know it is user error somewhere. I am pinging a list of my subdomains, and the script fails once it gets past the active ones. After an inactive subdomain, I need it to proceed to the next one in the list.
for line in f:
    line = line.strip()
    r = requests.get("https://" + line, timeout=2)
    try:
        if r.status_code == 200:
            g.write(line + " is active!")
            print(line + " is active!")
            continue
    except (requests.exceptions.ConnectionTimeout, requests.exceptions.MaxRetryError,
            requests.exceptions.ConnectTimeoutError, requests.exceptions.ConnectTimeout,
            requests.exceptions.HTTPError, requests.exceptions.ReadTimeout,
            requests.exceptions.Timeout, requests.exceptions.ConnectionError):
        continue
Here is the traceback:
Traceback (most recent call last):
File "is_alive.py", line 21, in <module>
is_alive()
File "is_alive.py", line 13, in is_alive
r = requests.get("https://" + line, timeout = 2)
File "/.local/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/.local/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/.local/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/.local/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/.local/lib/python3.8/site-packages/requests/adapters.py", line 504, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='111.111.111.111', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f6eb79ad7c0>, 'Connection to 111.111.111.111 timed out. (connect timeout=2)'))
After looking around and wording my question in different ways, I found the simplest approach to be this:
for line in f:
    line = line.strip()
    r = subprocess.call(['ping', '-c', '2', '-W 2', line])
    if r == 0:
        g.write(line + ' is alive!\n')
        print(line + ' is alive!')
    else:
        continue
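For what it's worth, the original requests-based loop can likely be made to work too: the requests.get call has to sit inside the try block, otherwise the timeout is raised before the handler is ever reached. A minimal sketch, assuming f and g are the open file handles from the question:

import requests

for line in f:
    line = line.strip()
    try:
        # The request itself must be inside the try block so that
        # connection and read timeouts are actually caught.
        r = requests.get("https://" + line, timeout=2)
        if r.status_code == 200:
            g.write(line + " is active!\n")
            print(line + " is active!")
    except (requests.exceptions.Timeout,
            requests.exceptions.ConnectionError,
            requests.exceptions.HTTPError):
        continue

requests.exceptions.Timeout already covers both ConnectTimeout and ReadTimeout, so the long tuple of exception classes is not needed.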

why I got the errors PartitionOwnedError and ConsumerStoppedException when starting a few consumers

I use pykafka to fetch messages from a Kafka topic, then do some processing and update MongoDB. Since pymongo can only update one item at a time, I start 100 processes. But when they start, some processes raise the errors PartitionOwnedError and ConsumerStoppedException. I don't know why.
Thank you.
kafka_cfg = conf['kafka']
kafka_client = KafkaClient(kafka_cfg['broker_list'])
topic = kafka_client.topics[topic_name]

balanced_consumer = topic.get_balanced_consumer(
    consumer_group=group,
    auto_commit_enable=kafka_cfg['auto_commit_enable'],
    zookeeper_connect=kafka_cfg['zookeeper_list'],
    zookeeper_connection_timeout_ms=kafka_cfg['zookeeper_conn_timeout_ms'],
    consumer_timeout_ms=kafka_cfg['consumer_timeout_ms'],
)

while 1:
    for msg in balanced_consumer:
        if msg is not None:
            try:
                value = eval(msg.value)
                id = long(value.pop("id"))
                value["when_update"] = datetime.datetime.now()
                query = {"_id": id}
                result = collection.update_one(query, {"$set": value}, True)
            except Exception, e:
                log.error("Fail to update: %s, msg: %s", e, msg.value)
>
Traceback (most recent call last):
File "dump_daily_summary.py", line 182, in <module>
dump_daily_summary.run()
File "dump_daily_summary.py", line 133, in run
for msg in self.balanced_consumer:
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 745, in __iter__
message = self.consume(block=True)
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 734, in consume
raise ConsumerStoppedException
pykafka.exceptions.ConsumerStoppedException
>
Traceback (most recent call last):
File "dump_daily_summary.py", line 182, in <module>
dump_daily_summary.run()
File "dump_daily_summary.py", line 133, in run
for msg in self.balanced_consumer:
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 745, in __iter__
message = self.consume(block=True)
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 726, in consume
self._raise_worker_exceptions()
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 271, in _raise_worker_exceptions
raise ex
pykafka.exceptions.PartitionOwnedError
PartitionOwnedError: check whether some background processes are consuming in the same consumer_group; there may not be enough available partitions to start another consumer.
ConsumerStoppedException: you can try upgrading your pykafka version (https://github.com/Parsely/pykafka/issues/574)
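A quick way to sanity-check the first point (a sketch with placeholder broker list and topic name) is to compare the topic's partition count with the number of consumer processes you intend to start; a balanced consumer group can never have more active members than partitions:

from pykafka import KafkaClient

# Placeholder values; substitute your own broker list and topic name.
client = KafkaClient(hosts="broker1:9092,broker2:9092")
topic = client.topics["my_topic"]

num_partitions = len(topic.partitions)
num_consumers = 100  # number of processes you plan to start

if num_consumers > num_partitions:
    print("Topic has only %d partitions; starting %d consumers will leave some "
          "idle and can trigger rebalance errors" % (num_partitions, num_consumers))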
I met the same problem as you, but I was confused by the other solutions, such as adding enough partitions for the consumers or upgrading the pykafka version.
In fact, my setup already satisfied those conditions.
Here are the versions of the tools:
python 2.7.10
kafka 2.11-0.10.0.0
zookeeper 3.4.8
pykafka 2.5.0
Here is my code:
class KafkaService(object):
    def __init__(self, topic):
        self.client_hosts = get_conf("kafka_conf", "client_host", "string")
        self.topic = topic
        self.con_group = topic
        self.zk_connect = get_conf("kafka_conf", "zk_connect", "string")

    def kafka_consumer(self):
        """kafka-consumer client, using pykafka

        :return: {"id": 1, "url": "www.baidu.com", "sitename": "baidu"}
        """
        from pykafka import KafkaClient

        consumer = ""
        try:
            kafka = KafkaClient(hosts=str(self.client_hosts))
            topic = kafka.topics[self.topic]
            consumer = topic.get_balanced_consumer(
                consumer_group=self.con_group,
                auto_commit_enable=True,
                zookeeper_connect=self.zk_connect,
            )
        except Exception as e:
            logger.error(str(e))

        while True:
            message = consumer.consume(block=False)
            if message:
                print "message:", message.value
                yield message.value
The two exceptions (ConsumerStoppedException and PartitionOwnedError) are raised by the consume(block=True) function of pykafka.balancedconsumer.
Of course, I recommend reading the source code of that function.
There is an argument block=True; after changing it to False, the program no longer runs into these exceptions.
After that, the Kafka consumers work fine.
This behavior is affected by a longstanding bug that was recently discovered and is currently being fixed. The workaround we've used in production at Parse.ly is to run our consumers in an environment that automatically restarts them when they crash with these errors, until all partitions are owned.

pexpect python throw error

Although this is my first attempt at using pexpect, the Python 3 script is pretty simple; yet it fails.
#!/usr/bin/env python3
import sys
import pexpect

SSH_NEWKEY = r'Are you sure you want to continue connecting \(yes/no\)\?'

child = pexpect.spawn("ssh -i /user/aws/key.pem ec2-user#xxx.xxx.xxx.xxx date")
i = child.expect([pexpect.TIMEOUT, SSH_NEWKEY])
if i == 1:
    child.sendline('yes')
print(child.before)
The SSH_NEWKEY is the only response I'm expecting, but the example showed a list containing pexpect.TIMEOUT in it so I used it.
$ ./test.py
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 144, in read_nonblocking
s = os.read(self.child_fd, size)
OSError: [Errno 5] Input/output error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 97, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/usr/local/lib/python3.4/site-packages/pexpect/pty_spawn.py", line 455, in read_nonblocking
return super(spawn, self).read_nonblocking(size)
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 149, in read_nonblocking
raise EOF('End Of File (EOF). Exception style platform.')
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./min.py", line 15, in <module>
i = child.expect( [ pexpect.TIMEOUT, SSH_NEWKEY ] )
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 315, in expect
timeout, searchwindowsize, async)
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 339, in expect_list
return exp.expect_loop(timeout)
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 102, in expect_loop
return self.eof(e)
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 49, in eof
raise EOF(msg)
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0x7f70ea4fbcf8>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-i', '/user/aws/key.pem', 'ec2-user#xxx.xxx.xxx.xxx', 'date']
searcher: None
buffer (last 100 chars): b''
before (last 100 chars): b'Fri May 6 13:50:18 EDT 2016\r\n'
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: 0
flag_eof: True
pid: 31293
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
What am I missing?
CentOS 6.4
python 3.4.3
An EOF error is being raised during your expect call. This means that the received output does not match SSH_NEWKEY and reaches end of file within the timeout period. To catch this exception, you should change your expect line to read:
i = child.expect([pexpect.TIMEOUT, SSH_NEWKEY, pexpect.EOF])
You can then make your if more robust:
if i == 1:
    child.sendline('yes')
elif i == 0:
    print("Timeout")
elif i == 2:
    print("EOF")
print(child.before)
This doesn't explain why you are not receiving a response containing the expected string - it's hard to know without seeing more code, but it's likely because your expected response is slightly wrong. If you manually run the SSH command, you should be able to see the response to expect, and you can then put that response into your code.
You can also print child.before after your expect call, or print child.read() instead of your expect call to see what is being sent back as a response.
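Putting it together, a minimal corrected sketch (the key path and host are taken from the question; note that a real ssh target is normally written as user@host, and the prompt regex may need adjusting for your environment):

#!/usr/bin/env python3
import pexpect

SSH_NEWKEY = r'Are you sure you want to continue connecting \(yes/no\)\?'

child = pexpect.spawn("ssh -i /user/aws/key.pem ec2-user@xxx.xxx.xxx.xxx date")
i = child.expect([pexpect.TIMEOUT, SSH_NEWKEY, pexpect.EOF])

if i == 0:
    print("Timeout waiting for output")
elif i == 1:
    # First connection to this host: accept the new host key.
    child.sendline('yes')
    child.expect(pexpect.EOF)
    print(child.before)
elif i == 2:
    # The command ran to completion without a host-key prompt;
    # its output is available in child.before.
    print(child.before)

In the debug output above, exitstatus is 0 and before already contains the date output, so the ssh command actually succeeded; the EOF simply means the host-key prompt never appeared.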
