Emitting dronekit.io vehicle's attribute changes using flask-socket.io - flask-socketio

I'm trying to send data from my dronekit.io vehicle using flask-socket.io. Unfortunately, I got this log:
Starting copter simulator (SITL)
SITL already Downloaded and Extracted.
Ready to boot.
Connecting to vehicle on: tcp:127.0.0.1:5760
>>> APM:Copter V3.3 (d6053245)
>>> Frame: QUAD
>>> Calibrating barometer
>>> Initialising APM...
>>> barometer calibration complete
>>> GROUND START
* Restarting with stat
latitude -35.363261
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
longitude 149.1652299
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
Here is my code:
sample.py
from dronekit import connect, VehicleMode
from flask import Flask
from flask_socketio import SocketIO, emit
import dronekit_sitl
import time
sitl = dronekit_sitl.start_default()
connection_string = sitl.connection_string()
print("Connecting to vehicle on: %s" % (connection_string,))
vehicle = connect(connection_string, wait_ready=True)
def arm_and_takeoff(aTargetAltitude):
print "Basic pre-arm checks"
while not vehicle.is_armable:
print " Waiting for vehicle to initialise..."
time.sleep(1)
print "Arming motors"
vehicle.mode = VehicleMode("GUIDED")
vehicle.armed = True
while not vehicle.armed:
print " Waiting for arming..."
time.sleep(1)
print "Taking off!"
vehicle.simple_takeoff(aTargetAltitude)
while True:
if vehicle.location.global_relative_frame.alt>=aTargetAltitude*0.95:
print "Reached target altitude"
break
time.sleep(1)
last_latitude = 0.0
last_longitude = 0.0
last_altitude = 0.0
#vehicle.on_attribute('location.global_relative_frame')
def location_callback(self, attr_name, value):
global last_latitude
global last_longitude
global last_altitude
if round(value.lat, 6) != round(last_latitude, 6):
last_latitude = value.lat
print "latitude ", value.lat, "\n"
emit("latitude", value.lat)
if round(value.lon, 6) != round(last_longitude, 6):
last_longitude = value.lon
print "longitude ", value.lon, "\n"
emit("longitude", value.lon)
if round(value.alt) != round(last_altitude):
last_altitude = value.alt
print "altitude ", value.alt, "\n"
emit("altitude", value.alt)
app = Flask(__name__)
socketio = SocketIO(app)
if __name__ == '__main__':
socketio.run(app, host='0.0.0.0', port=5000, debug=True)
arm_and_takeoff(20)
I know because of the logs that I should not do any HTTP request inside "vehicle.on_attribute" decorator method and I should search for information on how to solve this problem but I didn't found any info about the error.
Hope you could help me.
Thank you very much,
Raniel

The emit() function by default returns an event back to the active client. If you call this function outside of a request context there is no concept of active client, so you get this error.
You have a couple of options:
indicate the recipient of the event and the namespace that you are using, so that there is no need to look them up in the context. You can do this by adding room and namespace arguments. Use '/' for the namespace if you are using the default namespace.
emit to all clients by adding broadcast=True as an argument, plus the namespace as indicated in #1.

Related

Active BLE Scanning (BlueZ) - Issue with DBus

I've started a project where I need to actively (all the time) scan for BLE Devices. I'm on Linux, using Bluez 5.49 and I use Python to communicate with dbus 1.10.20).
I' m able to start scanning, stop scanning with bluetoothctl and get the BLE Advertisement data through DBus (GetManagedObjects() of the BlueZ interface). The problem I have is when I let the scanning for many hours, dbus-deamon start to take more and more of the RAM and I'm not able to find how to "flush" what dbus has gathered from BlueZ. Eventually the RAM become full and Linux isn't happy.
So I've tried not to scan for the entire time, that would maybe let the Garbage collector do its cleanup. It didn't work.
I've edited the /etc/dbus-1/system.d/bluetooth.conf to remove any interface that I didn't need
<policy user="root">
<allow own="org.bluez"/>
<allow send_destination="org.bluez"/>
</policy>
That has slow down the RAM build-up but didn't solve the issue.
I've found a way to inspect which connection has byte waiting and confirmed that it comes from blueZ
Connection :1.74 with pid 3622 '/usr/libexec/bluetooth/bluetoothd --experimental ' (org.bluez):
IncomingBytes=1253544
PeakIncomingBytes=1313072
OutgoingBytes=0
PeakOutgoingBytes=210
and lastly, I've found that someone needs to read what is waiting in DBus in order to free the memory. So I've found this : https://stackoverflow.com/a/60665430/15325057
And I receive the data that BlueZ is sending over but the memory still built-up.
The only way I know to free up dbus is to reboot linux. which is not ideal.
I'm coming at the end of what I understand of DBus and that's why I'm here today.
If you have any insight that could help me to free dbus from BlueZ messages, it would be highly appreciated.
Thanks in advance
EDIT Adding the DBus code i use to read the discovered devices:
#!/usr/bin/python3
import dbus
BLUEZ_SERVICE_NAME = "org.bluez"
DBUS_OM_IFACE = "org.freedesktop.DBus.ObjectManager"
DEVICES_IFACE = "org.bluez.Device1"
def main_loop(subproc):
devinfo = None
objects = None
dbussys = dbus.SystemBus()
dbusconnection = dbussys.get_object(BLUEZ_SERVICE_NAME, "/")
bluezInterface = dbus.Interface(dbusconnection, DBUS_OM_IFACE)
while True:
try:
objects = bluezInterface.GetManagedObjects()
except dbus.DBusException as err:
print("dbus Error : " + str(err))
pass
all_devices = (str(path) for path, interfaces in objects.items() if DEVICES_IFACE in interfaces.keys())
for path, interfaces in objects.items():
if "org.bluez.Adapter1" not in interfaces.keys():
continue
device_list = [d for d in all_devices if d.startswith(path + "/")]
for dev_path in device_list:
properties = objects[dev_path][DEVICES_IFACE]
if "ServiceData" in properties.keys() and "Name" in properties.keys() and "RSSI" in properties.keys():
#[... Do someting...]
Indeed, Bluez flushes memory when you stop discovering. So in order to scan continuously you need start and stop the discovery all the time. I discover for 6 seconds, wait 1 second and then start discovering for 6 seconds again...and so on. If you check the logs you will see it deletes a lot of stuff when stopping discovery.
I can't really reproduce your error exactly but my system is not happy running that fast while loop repeatedly getting the data from GetManagedObjects.
Below is the code I ran based on your code with a little bit of refactoring...
import dbus
BLUEZ_SERVICE_NAME = "org.bluez"
DBUS_OM_IFACE = "org.freedesktop.DBus.ObjectManager"
ADAPTER_IFACE = "org.bluez.Adapter1"
DEVICES_IFACE = "org.bluez.Device1"
def main_loop():
devinfo = None
objects = None
dbussys = dbus.SystemBus()
dbusconnection = dbussys.get_object(BLUEZ_SERVICE_NAME, "/")
bluezInterface = dbus.Interface(dbusconnection, DBUS_OM_IFACE)
while True:
objects = bluezInterface.GetManagedObjects()
for path in objects:
name = objects[path].get(DEVICES_IFACE, {}).get('Name')
rssi = objects[path].get(DEVICES_IFACE, {}).get('RSSI')
service_data = objects[path].get(DEVICES_IFACE, {}).get('ServiceData')
if all((name, rssi, service_data)):
print(f'{name} # {rssi} = {service_data}')
#[... Do someting...]
if __name__ == '__main__':
main_loop()
I'm not sure what you are trying to do in the broader project but if I can make some recommendations...
A more typical way of scanning for service/manufacturer data is to subscribe to signals in D-Bus that trigger callbacks when something of interest happens.
Below is some code I use to look for iBeacons and Eddystone beacons. This runs using the GLib event loop which is maybe something you have ruled out but is more efficient on resources.
It does use different Python dbus bindings as I find pydbus more "pythonic".
I have left the code in processing the beacons as it might be a useful reference.
import argparse
from gi.repository import GLib
from pydbus import SystemBus
import uuid
DEVICE_INTERFACE = 'org.bluez.Device1'
remove_list = set()
def stop_scan():
"""Stop device discovery and quit event loop"""
adapter.StopDiscovery()
mainloop.quit()
def clean_beacons():
"""
BlueZ D-Bus API does not show duplicates. This is a
workaround that removes devices that have been found
during discovery
"""
not_found = set()
for rm_dev in remove_list:
try:
adapter.RemoveDevice(rm_dev)
except GLib.Error as err:
not_found.add(rm_dev)
for lost in not_found:
remove_list.remove(lost)
def process_eddystone(data):
"""Print Eddystone data in human readable format"""
_url_prefix_scheme = ['http://www.', 'https://www.',
'http://', 'https://', ]
_url_encoding = ['.com/', '.org/', '.edu/', '.net/', '.info/',
'.biz/', '.gov/', '.com', '.org', '.edu',
'.net', '.info', '.biz', '.gov']
tx_pwr = int.from_bytes([data[1]], 'big', signed=True)
# Eddystone UID Beacon format
if data[0] == 0x00:
namespace_id = int.from_bytes(data[2:12], 'big')
instance_id = int.from_bytes(data[12:18], 'big')
print(f'\t\tEddystone UID: {namespace_id} - {instance_id} \u2197 {tx_pwr}')
# Eddystone URL beacon format
elif data[0] == 0x10:
prefix = data[2]
encoded_url = data[3:]
full_url = _url_prefix_scheme[prefix]
for letter in encoded_url:
if letter < len(_url_encoding):
full_url += _url_encoding[letter]
else:
full_url += chr(letter)
print(f'\t\tEddystone URL: {full_url} \u2197 {tx_pwr}')
def process_ibeacon(data, beacon_type='iBeacon'):
"""Print iBeacon data in human readable format"""
print('DATA:', data)
beacon_uuid = uuid.UUID(bytes=bytes(data[2:18]))
major = int.from_bytes(bytearray(data[18:20]), 'big', signed=False)
minor = int.from_bytes(bytearray(data[20:22]), 'big', signed=False)
tx_pwr = int.from_bytes([data[22]], 'big', signed=True)
print(f'\t\t{beacon_type}: {beacon_uuid} - {major} - {minor} \u2197 {tx_pwr}')
def ble_16bit_match(uuid_16, srv_data):
"""Expand 16 bit UUID to full 128 bit UUID"""
uuid_128 = f'0000{uuid_16}-0000-1000-8000-00805f9b34fb'
return uuid_128 == list(srv_data.keys())[0]
def on_iface_added(owner, path, iface, signal, interfaces_and_properties):
"""
Event handler for D-Bus interface added.
Test to see if it is a new Bluetooth device
"""
iface_path, iface_props = interfaces_and_properties
if DEVICE_INTERFACE in iface_props:
on_device_found(iface_path, iface_props[DEVICE_INTERFACE])
def on_device_found(device_path, device_props):
"""
Handle new Bluetooth device being discover.
If it is a beacon of type iBeacon, Eddystone, AltBeacon
then process it
"""
address = device_props.get('Address')
address_type = device_props.get('AddressType')
name = device_props.get('Name')
alias = device_props.get('Alias')
paired = device_props.get('Paired')
trusted = device_props.get('Trusted')
rssi = device_props.get('RSSI')
service_data = device_props.get('ServiceData')
manufacturer_data = device_props.get('ManufacturerData')
if address.casefold() == '00:c3:f4:f1:58:69':
print('Found mac address of interest')
if service_data and ble_16bit_match('feaa', service_data):
process_eddystone(service_data['0000feaa-0000-1000-8000-00805f9b34fb'])
remove_list.add(device_path)
elif manufacturer_data:
for mfg_id in manufacturer_data:
# iBeacon 0x004c
if mfg_id == 0x004c and manufacturer_data[mfg_id][0] == 0x02:
process_ibeacon(manufacturer_data[mfg_id])
remove_list.add(device_path)
# AltBeacon 0xacbe
elif mfg_id == 0xffff and manufacturer_data[mfg_id][0:2] == [0xbe, 0xac]:
process_ibeacon(manufacturer_data[mfg_id], beacon_type='AltBeacon')
remove_list.add(device_path)
clean_beacons()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('-d', '--duration', type=int, default=0,
help='Duration of scan [0 for continuous]')
args = parser.parse_args()
bus = SystemBus()
adapter = bus.get('org.bluez', '/org/bluez/hci0')
bus.subscribe(iface='org.freedesktop.DBus.ObjectManager',
signal='InterfacesAdded',
signal_fired=on_iface_added)
mainloop = GLib.MainLoop()
if args.duration > 0:
GLib.timeout_add_seconds(args.duration, stop_scan)
adapter.SetDiscoveryFilter({'DuplicateData': GLib.Variant.new_boolean(False)})
adapter.StartDiscovery()
try:
print('\n\tUse CTRL-C to stop discovery\n')
mainloop.run()
except KeyboardInterrupt:
stop_scan()

Removing Airflow task logs

I'm running 5 DAG's which have generated a total of about 6GB of log data in the base_log_folder over a months period. I just added a remote_base_log_folder but it seems it does not exclude logging to the base_log_folder.
Is there anyway to automatically remove old log files, rotate them or force airflow to not log on disk (base_log_folder) only in remote storage?
Please refer https://github.com/teamclairvoyant/airflow-maintenance-dags
This plugin has DAGs that can kill halted tasks and log-cleanups.
You can grab the concepts and can come up with a new DAG that can cleanup as per your requirement.
We remove the Task logs by implementing our own FileTaskHandler, and then pointing to it in the airflow.cfg. So, we overwrite the default LogHandler to keep only N task logs, without scheduling additional DAGs.
We are using Airflow==1.10.1.
[core]
logging_config_class = log_config.LOGGING_CONFIG
log_config.LOGGING_CONFIG
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
FOLDER_TASK_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}'
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
LOGGING_CONFIG = {
'formatters': {},
'handlers': {
'...': {},
'task': {
'class': 'file_task_handler.FileTaskRotationHandler',
'formatter': 'airflow.job',
'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
'filename_template': FILENAME_TEMPLATE,
'folder_task_template': FOLDER_TASK_TEMPLATE,
'retention': 20
},
'...': {}
},
'loggers': {
'airflow.task': {
'handlers': ['task'],
'level': JOB_LOG_LEVEL,
'propagate': False,
},
'airflow.task_runner': {
'handlers': ['task'],
'level': LOG_LEVEL,
'propagate': True,
},
'...': {}
}
}
file_task_handler.FileTaskRotationHandler
import os
import shutil
from airflow.utils.helpers import parse_template_string
from airflow.utils.log.file_task_handler import FileTaskHandler
class FileTaskRotationHandler(FileTaskHandler):
def __init__(self, base_log_folder, filename_template, folder_task_template, retention):
"""
:param base_log_folder: Base log folder to place logs.
:param filename_template: template filename string.
:param folder_task_template: template folder task path.
:param retention: Number of folder logs to keep
"""
super(FileTaskRotationHandler, self).__init__(base_log_folder, filename_template)
self.retention = retention
self.folder_task_template, self.folder_task_template_jinja_template = \
parse_template_string(folder_task_template)
#staticmethod
def _get_directories(path='.'):
return next(os.walk(path))[1]
def _render_folder_task_path(self, ti):
if self.folder_task_template_jinja_template:
jinja_context = ti.get_template_context()
return self.folder_task_template_jinja_template.render(**jinja_context)
return self.folder_task_template.format(dag_id=ti.dag_id, task_id=ti.task_id)
def _init_file(self, ti):
relative_path = self._render_folder_task_path(ti)
folder_task_path = os.path.join(self.local_base, relative_path)
subfolders = self._get_directories(folder_task_path)
to_remove = set(subfolders) - set(subfolders[-self.retention:])
for dir_to_remove in to_remove:
full_dir_to_remove = os.path.join(folder_task_path, dir_to_remove)
print('Removing', full_dir_to_remove)
shutil.rmtree(full_dir_to_remove)
return FileTaskHandler._init_file(self, ti)
Airflow maintainers don't think truncating logs is a part of airflow core logic, to see this, and then in this issue, maintainers suggest to change LOG_LEVEL avoid too many log data.
And in this PR, we can learn how to change log level in airflow.cfg.
good luck.
I know it sounds savage, but have you tried pointing base_log_folder to /dev/null? I use Airflow as a part of a container, so I don't care about the files either, as long as the logger pipe to STDOUT as well.
Not sure how well this plays with S3 though.
For your concrete problems, I have some suggestions.
For those, you would always need a specialized logging config as described in this answer: https://stackoverflow.com/a/54195537/2668430
automatically remove old log files and rotate them
I don't have any practical experience with the TimedRotatingFileHandler from the Python standard library yet, but you might give it a try:
https://docs.python.org/3/library/logging.handlers.html#timedrotatingfilehandler
It not only offers to rotate your files based on a time interval, but if you specify the backupCount parameter, it even deletes your old log files:
If backupCount is nonzero, at most backupCount files will be kept, and if more would be created when rollover occurs, the oldest one is deleted. The deletion logic uses the interval to determine which files to delete, so changing the interval may leave old files lying around.
Which sounds pretty much like the best solution for your first problem.
force airflow to not log on disk (base_log_folder), but only in remote storage?
In this case you should specify the logging config in such a way that you do not have any logging handlers that write to a file, i.e. remove all FileHandlers.
Rather, try to find logging handlers that send the output directly to a remote address.
E.g. CMRESHandler which logs directly to ElasticSearch but needs some extra fields in the log calls.
Alternatively, write your own handler class and let it inherit from the Python standard library's HTTPHandler.
A final suggestion would be to combine both the TimedRotatingFileHandler and setup ElasticSearch together with FileBeat, so you would be able to store your logs inside ElasticSearch (i.e. remote), but you wouldn't store a huge amount of logs on your Airflow disk since they will be removed by the backupCount retention policy of your TimedRotatingFileHandler.
Usually apache airflow grab the disk space due to 3 reasons
1. airflow scheduler logs files
2. mysql binaly logs [Major]
3. xcom table records.
To make it clean up on regular basis I have set up a dag which run on daily basis and cleans the binary logs and truncate the xcom table to make the disk space free
You also might need to install [pip install mysql-connector-python].
To clean up scheduler log files I do delete them manually two times in a week to avoid the risk of logs deleted which needs to be required for some reasons.
I clean the logs files by [sudo rm -rd airflow/logs/] command.
Below is my python code for reference
'
"""Example DAG demonstrating the usage of the PythonOperator."""
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
from airflow.utils.dates import days_ago
from airflow.operators.bash import BashOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator
args = {
'owner': 'airflow',
'email_on_failure':True,
'retries': 1,
'email':['Your Email Id'],
'retry_delay': timedelta(minutes=5)
}
dag = DAG(
dag_id='airflow_logs_cleanup',
default_args=args,
schedule_interval='#daily',
start_date=days_ago(0),
catchup=False,
max_active_runs=1,
tags=['airflow_maintenance'],
)
def truncate_table():
import mysql.connector
connection = mysql.connector.connect(host='localhost',
database='db_name',
user='username',
password='your password',
auth_plugin='mysql_native_password')
cursor = connection.cursor()
sql_select_query = """TRUNCATE TABLE xcom"""
cursor.execute(sql_select_query)
connection.commit()
connection.close()
print("XCOM Table truncated successfully")
def delete_binary_logs():
import mysql.connector
from datetime import datetime
date = datetime.today().strftime('%Y-%m-%d')
connection = mysql.connector.connect(host='localhost',
database='db_name',
user='username',
password='your_password',
auth_plugin='mysql_native_password')
cursor = connection.cursor()
query = 'PURGE BINARY LOGS BEFORE ' + "'" + str(date) + "'"
sql_select_query = query
cursor.execute(sql_select_query)
connection.commit()
connection.close()
print("Binary logs deleted successfully")
t1 = PythonOperator(
task_id='truncate_table',
python_callable=truncate_table, dag=dag
)
t2 = PythonOperator(
task_id='delete_binary_logs',
python_callable=delete_binary_logs, dag=dag
)
t2 << t1
'
I am surprized but it worked for me. Update your config as below:
base_log_folder=""
It is test in minio and in s3.
Our solution looks a lot like Franzi's:
Running on Airflow 2.0.1 (py3.8)
Override default logging configuration
Since we use a helm chart for airflow deployment it was easiest to push an env there, but it can also be done in the airflow.cfg or using ENV in dockerfile.
# Set custom logging configuration to enable log rotation for task logging
AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS: "airflow_plugins.settings.airflow_local_settings.DEFAULT_LOGGING_CONFIG"
Then we added the logging configuration together with the custom log handler to a python module we build and install in the docker image. As described here: https://airflow.apache.org/docs/apache-airflow/stable/modules_management.html
Logging configuration snippet
This is only a copy on the default from the airflow codebase, but then the task logger gets a different handler.
DEFAULT_LOGGING_CONFIG: Dict[str, Any] = {
'version': 1,
'disable_existing_loggers': False,
'formatters': {
'airflow': {'format': LOG_FORMAT},
'airflow_coloured': {
'format': COLORED_LOG_FORMAT if COLORED_LOG else LOG_FORMAT,
'class': COLORED_FORMATTER_CLASS if COLORED_LOG else 'logging.Formatter',
},
},
'handlers': {
'console': {
'class': 'airflow.utils.log.logging_mixin.RedirectStdHandler',
'formatter': 'airflow_coloured',
'stream': 'sys.stdout',
},
'task': {
'class': 'airflow_plugins.log.rotating_file_task_handler.RotatingFileTaskHandler',
'formatter': 'airflow',
'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
'filename_template': FILENAME_TEMPLATE,
'maxBytes': 10485760, # 10MB
'backupCount': 6,
},
...
RotatingFileTaskHandler
And finally the custom handler which is just a merge of the logging.handlers.RotatingFileHandler and the FileTaskHandler.
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""File logging handler for tasks."""
import logging
import os
from pathlib import Path
from typing import TYPE_CHECKING, Optional
import requests
from airflow.configuration import AirflowConfigException, conf
from airflow.utils.helpers import parse_template_string
if TYPE_CHECKING:
from airflow.models import TaskInstance
class RotatingFileTaskHandler(logging.Handler):
"""
FileTaskHandler is a python log handler that handles and reads
task instance logs. It creates and delegates log handling
to `logging.FileHandler` after receiving task instance context.
It reads logs from task instance's host machine.
:param base_log_folder: Base log folder to place logs.
:param filename_template: template filename string
"""
def __init__(self, base_log_folder: str, filename_template: str, maxBytes=0, backupCount=0):
self.max_bytes = maxBytes
self.backup_count = backupCount
super().__init__()
self.handler = None # type: Optional[logging.FileHandler]
self.local_base = base_log_folder
self.filename_template, self.filename_jinja_template = parse_template_string(filename_template)
def set_context(self, ti: "TaskInstance"):
"""
Provide task_instance context to airflow task handler.
:param ti: task instance object
"""
local_loc = self._init_file(ti)
self.handler = logging.handlers.RotatingFileHandler(
filename=local_loc,
mode='a',
maxBytes=self.max_bytes,
backupCount=self.backup_count,
encoding='utf-8',
delay=False,
)
if self.formatter:
self.handler.setFormatter(self.formatter)
self.handler.setLevel(self.level)
def emit(self, record):
if self.handler:
self.handler.emit(record)
def flush(self):
if self.handler:
self.handler.flush()
def close(self):
if self.handler:
self.handler.close()
def _render_filename(self, ti, try_number):
if self.filename_jinja_template:
if hasattr(ti, 'task'):
jinja_context = ti.get_template_context()
jinja_context['try_number'] = try_number
else:
jinja_context = {
'ti': ti,
'ts': ti.execution_date.isoformat(),
'try_number': try_number,
}
return self.filename_jinja_template.render(**jinja_context)
return self.filename_template.format(
dag_id=ti.dag_id,
task_id=ti.task_id,
execution_date=ti.execution_date.isoformat(),
try_number=try_number,
)
def _read_grouped_logs(self):
return False
def _read(self, ti, try_number, metadata=None): # pylint: disable=unused-argument
"""
Template method that contains custom logic of reading
logs given the try_number.
:param ti: task instance record
:param try_number: current try_number to read log from
:param metadata: log metadata,
can be used for steaming log reading and auto-tailing.
:return: log message as a string and metadata.
"""
# Task instance here might be different from task instance when
# initializing the handler. Thus explicitly getting log location
# is needed to get correct log path.
log_relative_path = self._render_filename(ti, try_number)
location = os.path.join(self.local_base, log_relative_path)
log = ""
if os.path.exists(location):
try:
with open(location) as file:
log += f"*** Reading local file: {location}\n"
log += "".join(file.readlines())
except Exception as e: # pylint: disable=broad-except
log = f"*** Failed to load local log file: {location}\n"
log += "*** {}\n".format(str(e))
elif conf.get('core', 'executor') == 'KubernetesExecutor': # pylint: disable=too-many-nested-blocks
try:
from airflow.kubernetes.kube_client import get_kube_client
kube_client = get_kube_client()
if len(ti.hostname) >= 63:
# Kubernetes takes the pod name and truncates it for the hostname. This truncated hostname
# is returned for the fqdn to comply with the 63 character limit imposed by DNS standards
# on any label of a FQDN.
pod_list = kube_client.list_namespaced_pod(conf.get('kubernetes', 'namespace'))
matches = [
pod.metadata.name
for pod in pod_list.items
if pod.metadata.name.startswith(ti.hostname)
]
if len(matches) == 1:
if len(matches[0]) > len(ti.hostname):
ti.hostname = matches[0]
log += '*** Trying to get logs (last 100 lines) from worker pod {} ***\n\n'.format(
ti.hostname
)
res = kube_client.read_namespaced_pod_log(
name=ti.hostname,
namespace=conf.get('kubernetes', 'namespace'),
container='base',
follow=False,
tail_lines=100,
_preload_content=False,
)
for line in res:
log += line.decode()
except Exception as f: # pylint: disable=broad-except
log += '*** Unable to fetch logs from worker pod {} ***\n{}\n\n'.format(ti.hostname, str(f))
else:
url = os.path.join("http://{ti.hostname}:{worker_log_server_port}/log", log_relative_path).format(
ti=ti, worker_log_server_port=conf.get('celery', 'WORKER_LOG_SERVER_PORT')
)
log += f"*** Log file does not exist: {location}\n"
log += f"*** Fetching from: {url}\n"
try:
timeout = None # No timeout
try:
timeout = conf.getint('webserver', 'log_fetch_timeout_sec')
except (AirflowConfigException, ValueError):
pass
response = requests.get(url, timeout=timeout)
response.encoding = "utf-8"
# Check if the resource was properly fetched
response.raise_for_status()
log += '\n' + response.text
except Exception as e: # pylint: disable=broad-except
log += "*** Failed to fetch log file from worker. {}\n".format(str(e))
return log, {'end_of_log': True}
def read(self, task_instance, try_number=None, metadata=None):
"""
Read logs of given task instance from local machine.
:param task_instance: task instance object
:param try_number: task instance try_number to read logs from. If None
it returns all logs separated by try_number
:param metadata: log metadata,
can be used for steaming log reading and auto-tailing.
:return: a list of listed tuples which order log string by host
"""
# Task instance increments its try number when it starts to run.
# So the log for a particular task try will only show up when
# try number gets incremented in DB, i.e logs produced the time
# after cli run and before try_number + 1 in DB will not be displayed.
if try_number is None:
next_try = task_instance.next_try_number
try_numbers = list(range(1, next_try))
elif try_number < 1:
logs = [
[('default_host', f'Error fetching the logs. Try number {try_number} is invalid.')],
]
return logs, [{'end_of_log': True}]
else:
try_numbers = [try_number]
logs = [''] * len(try_numbers)
metadata_array = [{}] * len(try_numbers)
for i, try_number_element in enumerate(try_numbers):
log, metadata = self._read(task_instance, try_number_element, metadata)
# es_task_handler return logs grouped by host. wrap other handler returning log string
# with default/ empty host so that UI can render the response in the same way
logs[i] = log if self._read_grouped_logs() else [(task_instance.hostname, log)]
metadata_array[i] = metadata
return logs, metadata_array
def _init_file(self, ti):
"""
Create log directory and give it correct permissions.
:param ti: task instance object
:return: relative log path of the given task instance
"""
# To handle log writing when tasks are impersonated, the log files need to
# be writable by the user that runs the Airflow command and the user
# that is impersonated. This is mainly to handle corner cases with the
# SubDagOperator. When the SubDagOperator is run, all of the operators
# run under the impersonated user and create appropriate log files
# as the impersonated user. However, if the user manually runs tasks
# of the SubDagOperator through the UI, then the log files are created
# by the user that runs the Airflow command. For example, the Airflow
# run command may be run by the `airflow_sudoable` user, but the Airflow
# tasks may be run by the `airflow` user. If the log files are not
# writable by both users, then it's possible that re-running a task
# via the UI (or vice versa) results in a permission error as the task
# tries to write to a log file created by the other user.
relative_path = self._render_filename(ti, ti.try_number)
full_path = os.path.join(self.local_base, relative_path)
directory = os.path.dirname(full_path)
# Create the log file and give it group writable permissions
# TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
# operator is not compatible with impersonation (e.g. if a Celery executor is used
# for a SubDag operator and the SubDag operator has a different owner than the
# parent DAG)
Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)
if not os.path.exists(full_path):
open(full_path, "a").close()
# TODO: Investigate using 444 instead of 666.
os.chmod(full_path, 0o666)
return full_path
Maybe a final note; the links in the airflow UI to the logging will now only open the latest logfile, not the older rotated files which are only accessible by means of SSH or any other interface to access the airflow logging path.
I don't think that there is a rotation mechanism but you can store them in S3 or google cloud storage as describe here : https://airflow.incubator.apache.org/configuration.html#logs

How to get the Tor ExitNode IP with Python and Stem

I'm trying to get the external IP that Tor uses, as mentioned here. When using something like myip.dnsomatic.com, this is very slow. I tried what was suggested in the aforementioned link (python + stem to control tor through the control port), but all you get is circuit's IPs with no assurance of which one is the one on the exitnode, and, sometimes the real IP is not even among the results.
Any help would be appreciated.
Also, from here, at the bottom, Amine suggests a way to renew the identity in Tor. There is an instruction, controller.get_newnym_wait(), which he uses to wait until the new connection is ready (controller is from Control in steam.control), isn't there any thing like that in Steam (sorry, I checked and double/triple checked and couldn't find nothing) that tells you that Tor is changing its identity?
You can get the exit node ip without calling a geoip site.
This is however on a different stackexchange site here - https://tor.stackexchange.com/questions/3253/how-do-i-trap-circuit-id-none-errors-in-the-stem-script-exit-used-py
As posted by #mirimir his code below essentially attaches a stream event listener function, which is then used to get the circuit id, circuit fingerprint, then finally the exit ip address -
#!/usr/bin/python
import functools
import time
from stem import StreamStatus
from stem.control import EventType, Controller
def main():
print "Tracking requests for tor exits. Press 'enter' to end."
print
with Controller.from_port() as controller:
controller.authenticate()
stream_listener = functools.partial(stream_event, controller)
controller.add_event_listener(stream_listener, EventType.STREAM)
raw_input() # wait for user to press enter
def stream_event(controller, event):
if event.status == StreamStatus.SUCCEEDED and event.circ_id:
circ = controller.get_circuit(event.circ_id)
exit_fingerprint = circ.path[-1][0]
exit_relay = controller.get_network_status(exit_fingerprint)
t = time.localtime()
print "datetime|%d-%02d-%02d %02d:%02d:%02d % (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)
print "website|%s" % (event.target)
print "exitip|%s" % (exit_relay.address)
print "exitport|%i" % (exit_relay.or_port)
print "fingerprint|%s" % exit_relay.fingerprint
print "nickname|%s" % exit_relay.nickname
print "locale|%s" % controller.get_info("ip-to-country/%s" % exit_relay.address, 'unknown')
print
You can use this code for check current IP (change SOCKS_PORT value to yours):
import re
import stem.process
import requesocks
SOCKS_PORT = 9053
tor_process = stem.process.launch_tor()
proxy_address = 'socks5://127.0.0.1:{}'.format(SOCKS_PORT)
proxies = {
'http': proxy_address,
'https': proxy_address
}
response = requesocks.get("http://httpbin.org/ip", proxies=proxies)
print re.findall(r'[\d.-]+', response.text)[0]
tor_process.kill()
If you want to use socks you should do:
pip install requests[socks]
Then you can do:
import requests
import json
import stem.process
import stem
SOCKS_PORT = "9999"
tor = stem.process.launch_tor_with_config(
config={
'SocksPort': SOCKS_PORT,
},
tor_cmd= 'absolute_path/to/tor.exe',
)
r = requests.Session()
proxies = {
'http': 'socks5://localhost:' + SOCKS_PORT,
'https': 'socks5://localhost:' + SOCKS_PORT
}
response = r.get("http://httpbin.org/ip", proxies=proxies)
self.current_ip = response.json()['origin']

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of own indexes.
Since I found no way how to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it in the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
"""Import catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog')
parent_path=''
if obj and not obj():
importer = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
__traceback_info__ = path
print [importer]
if importer:
print importer.name
if importer.name:
path = '%s%s' % (parent_path, 'usercatalog')
print path
filename = '%s%s' % (path, importer.suffix)
print filename
body = context.readDataFile(filename)
if body is not None:
importer.filename = filename # for error reporting
importer.body = body
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
importObjects(sub, path+'/', context)
def exportCatalogTool(context):
"""Export catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog', None)
if tool is None:
logger = context.getLogger('catalog')
logger.info('Nothing to export.')
return
parent_path=''
exporter = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
if exporter:
if exporter.name:
path = '%s%s' % (parent_path, 'usercatalog')
filename = '%s%s' % (path, exporter.suffix)
body = exporter.body
if body is not None:
context.writeDataFile(filename, body, exporter.mime_type)
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace
def p(dt, dd):
print '%-16s%s' % (dt+':', dd)
def main(self):
"""
Export the portal_user_catalog
"""
g = globals()
print '#' * 79
for a in ('__package__', '__module__'):
if a in g:
p(a, g[a])
p('self', self)
set_trace()
exportCatalogTool(self)
However, wenn I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which didn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, but how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there has been an integration once but was broken ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests the source of the problem to be some transaction matter:
>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
... puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related with transactions.
I had
import transaction
...
class Browser(BrowserView):
...
def processNewUser(self):
....
transaction.commit()
before, but apparently this was not good enough (and/or perhaps not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.

Development Mode For uWSGI/Pylons (Reload new code)

I have a setup such that an nginx server passes control off to uWsgi, which launches a pylons app using the following in my xml configuration file:
<ini-paste>...</ini-paste>
Everything is working nicely, and I was able to set it to debug mode using the following in the associated ini file, like:
debug = true
Except debug mode only prints out errors, and doesn't reload the code everytime a file has been touched. If I was running directly through paste, I could use the --reload option, but going through uWsgi complicates things.
Does anybody know of a way to tell uWsgi to tell paste to set the --reload option, or to do this directly in the paste .ini file?
I used something like the following code to solve this, the monitorFiles(...) method is called on application initialization, and it monitors the files, sending the TERM signal when it sees a change.
I'd still much prefer a solution using paster's --reload argument, as I imagine this solution has bugs:
import os
import time
import signal
from deepthought.system import deployment
from multiprocessing.process import Process
def monitorFiles():
if deployment.getDeployment().dev and not FileMonitor.isRunning:
monitor = FileMonitor(os.getpid())
try: monitor.start()
except: print "Something went wrong..."
class FileMonitor(Process):
isRunning = False
def __init__(self, masterPid):
self.updates = {}
self.rootDir = deployment.rootDir() + "/src/python"
self.skip = len(self.rootDir)
self.masterPid = masterPid
FileMonitor.isRunning = True
Process.__init__(self)
def run(self):
while True:
self._loop()
time.sleep(5)
def _loop(self):
for root, _, files in os.walk(self.rootDir):
for file in files:
if file.endswith(".py"):
self._monitorFile(root, file)
def _monitorFile(self, root, file):
mtime = os.path.getmtime("%s/%s" % (root, file))
moduleName = "%s/%s" % (root[self.skip+1:], file[:-3])
moduleName = moduleName.replace("/",".")
if not moduleName in self.updates:
self.updates[moduleName] = mtime
elif self.updates[moduleName] < mtime:
print "Change detected in %s" % moduleName
self._restartWorker()
self.updates[moduleName] = mtime
def _restartWorker(self):
os.kill(self.masterPid, signal.SIGTERM)
Use the signal framework in 0.9.7 tree
http://projects.unbit.it/uwsgi/wiki/SignalFramework
An example of auto-reloading:
import uwsgi
uwsgi.register_signal(1, "", uwsgi.reload)
uwsgi.add_file_monitor(1, 'myfile.py')
def application(env, start_response):
...

Resources