Buffering data for media player hangs tests in PyQt5

I need to play audio from memory. For demonstration purposes, I loaded my .mp3 from the hard drive into memory. The audio loads and plays correctly, but not when run from a test. Here's my code:
from PyQt5.QtCore import QBuffer, QIODevice, QByteArray
from PyQt5.QtMultimedia import QMediaPlayer, QMediaContent
with open(r"C:\test_audio.mp3", "rb") as f:
    q_byte_array = QByteArray(f.read())
player = QMediaPlayer()
buffer = QBuffer(player)
buffer.setData(q_byte_array)
buffer.open(QIODevice.ReadOnly)
player.setMedia(QMediaContent(), buffer)
player.play()
When I try to run a test for this part, it hangs indefinitely. Here's a minimal example of a test function that gets stuck.
from PyQt5.QtCore import QBuffer, QIODevice
from PyQt5.QtMultimedia import QMediaPlayer, QMediaContent
def test_media_player():
    player = QMediaPlayer()
    buffer = QBuffer()
    buffer.open(QIODevice.ReadWrite)
    player.setMedia(QMediaContent(), buffer)
(It hangs even if the QBuffer is filled with audio data.)
There was a suggestion that it may need the application event loop. Is there an example of how to set that up for testing, or is there some other solution?
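One direction to try, sketched below, assumes the hang comes from the missing application instance and event loop: create (or reuse) a QApplication inside the test and pump its events with processEvents() instead of blocking in exec_(). If you run tests with pytest, the pytest-qt plugin's qapp/qtbot fixtures manage this application instance for you.
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QBuffer, QIODevice
from PyQt5.QtMultimedia import QMediaPlayer, QMediaContent
def test_media_player():
    # Reuse a running application instance if there is one, otherwise create it,
    # so QtMultimedia has an event loop to dispatch its signals on.
    app = QApplication.instance() or QApplication(sys.argv)
    player = QMediaPlayer()
    buffer = QBuffer()
    buffer.open(QIODevice.ReadWrite)
    player.setMedia(QMediaContent(), buffer)
    # Pump pending events; app.exec_() would block the test runner.
    app.processEvents()
This is only a sketch, not a verified fix; whether setMedia() still blocks may also depend on the multimedia backend available on the test machine.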

Related

I want to create communication between a Raspberry Pi Pico and my Windows PC

I need to build the communication with MicroPython since I need it for school. The next issue that I can't seem to get done is that the communication needs to go from a Python program on the PC to the Raspberry Pi Pico and back. The farthest I've gotten is this.
A program on the raspberry:
import sys
import utime
while(True):
    x = sys.stdin.buffer.read()
    if x == "1":
        sys.stdout.print(x)
    utime.sleep(1)
    if x == 'end':
        break
and a program on my pc:
import serial
from time import sleep
class Handler:
    TERMINATOR = '\n'.encode('UTF8')
    def __init__(self, device='COM19', baud=115200, timeout=1):
        self.serial = serial.Serial(device, baud, timeout=timeout)
    def receive(self) -> str:
        line = self.serial.read_until(self.TERMINATOR)
        return line.decode('UTF8').strip()
    def send(self, text: str):
        line = text
        self.serial.write(line.encode('UTF8'))
    def close(self):
        self.serial.close()
sender = Handler('COM19', 115200, 1)
while(True):
    x = input()
    sender.send(x)
    sleep(2)
    print(sender.receive())
    if x == 'end':
        break
This code is absolutely not mine and is an amalgam of what I was able to find on the internet. What I am trying to do is type a number into the console of my computer program, send it to the Raspberry Pi Pico, have the Pico send it back, and read it on my PC. But I wasn't able to get that response. Any help would be fine, either pointers or solutions. Thank you in advance.
Fixed it: I finally realized that serial.write() doesn't append \r\n, so everything was being put onto one long line and never read on the Pi Pico.
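For anyone hitting the same wall, a minimal sketch of the fix: on the PC side, send text + '\r\n' so a terminator actually goes over the wire, and on the Pico read one line at a time. Using sys.stdin.readline() is my assumption about how to consume a single message; the original sys.stdin.buffer.read() waits for EOF and therefore never returns.
# MicroPython on the Pico: echo each terminated line back over USB serial
import sys
while True:
    x = sys.stdin.readline().strip()   # returns once a '\n' arrives
    if x == 'end':
        break
    if x:
        print(x)                       # print() appends '\n', which read_until() on the PC needs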

How can I utilize asyncio to make third party file operations faster?

I am using a third-party library called isort. isort provides a function that opens and reads a file. In order to speed this up, I attempted to change the function isort.check_file to make it perform asynchronously. check_file takes a file path; however, my current attempt does not work.
...
coroutines = [self.check_file('c:\\example1.py'), self.check_file('c:\\example2.py')]
loop = asyncio.get_event_loop()
result = loop.run_until_complete(asyncio.gather(*coroutines))
...
async def check_file(self, changed_file):
    return isort.check_file(changed_file)
However, this does not seem to work. How can I make the library call isort.check_file work correctly with asyncio.gather?
Better understanding of IO Bottleneck and GIL
What your async function check_file is doing is exactly the same as it would be doing without async in front. To get any meaningful asynchronous performance gain, you must use some sort of awaitable - which requires the await keyword.
So basically what you did is:
import time
async def wait(n):
    time.sleep(n)
Which does absolutely no good for asynchronous operations.
To make such a synchronous function asynchronous - assuming it is mostly IO-bound - you can use asyncio.to_thread instead.
import asyncio
import time
async def task():
    await asyncio.to_thread(time.sleep, 10)  # <- await + something that's awaitable
    # similar to await asyncio.sleep(10) now
async def main():
    tasks = [task() for _ in range(10)]
    await asyncio.gather(*tasks)
asyncio.run(main())
That essentially moves the IO-bound operation out of the main thread, so the main thread can do its work without waiting for the IO to finish.
But there's a catch: Python's Global Interpreter Lock (GIL).
Due to a limitation of CPython - the official Python implementation - only one Python interpreter thread can run at any given moment, stalling all others.
Then how do we achieve better performance just by moving the IO to a different thread? Simply by releasing the GIL during IO operations.
IO operations basically go like this:
Thread 1: "Hey OS, please do this IO work for me. Wake me up when it's done."
Thread 1 goes to sleep.
Some time later, the OS pokes Thread 1:
"Your IO operation is done, take this and get back to work."
So all the thread does in the meantime is nothing - and for such cases, aka IO-bound work, the GIL can be safely released to let other threads run. Built-in functions like time.sleep, open(), etc. implement this GIL-release logic in their C code.
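To illustrate that point with a tiny standalone demo (my own example, not from the question): two threads that each sleep for one second finish in about one second total, not two, because time.sleep releases the GIL while waiting.
import threading
import time
def blocking_io():
    time.sleep(1)  # releases the GIL while sleeping
start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"2 threads sleeping 1s each took {time.perf_counter() - start:.2f}s")  # ~1s, not ~2s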
This doesn't change much in asyncio, which is internally a bunch of event checks and callbacks. Each asyncio.Task works somewhat like a thread - tasks ask the main loop to wake them up when their IO operation is done.
Now that these simplified concepts are sorted out, we can go back to your question.
CPU Bottleneck and IO Bottleneck
Basically, what you are up against is not an IO bottleneck. It's mostly a CPU (and similar) bottleneck.
Loading merely a few KB of text from a local drive and then running tons of intense Python code afterward doesn't count as an IO-bound operation.
Testing
Let's consider the following test case:
run isort.check_file on 10000 scripts:
Synchronously, just like normal Python code
Multithreaded, with 2 threads
Multiprocessed, with 2 processes
Asynchronously, using asyncio.to_thread
We can expect that:
Multithreading will be slower than the synchronous code, as there's very little IO work
Multiprocessing spends time spawning processes and communicating, so it will be slower for short workloads and faster for longer ones
The asyncio version will be even slower than the multithreaded one, because asyncio has to deal with threads, which it is not really designed for
With folder structure of:
├─ main.py
└─ import_messes
├─ lib_0.py
├─ lib_1.py
├─ lib_2.py
├─ lib_3.py
├─ lib_4.py
├─ lib_5.py
├─ lib_6.py
├─ lib_7.py
├─ lib_8.py
└─ lib_9.py
We'll check each of these 1000 times, for a total of 10000 checks.
Each file is filled with random imports I grabbed from asyncio.
from asyncio.base_events import *
from asyncio.coroutines import *
from asyncio.events import *
from asyncio.exceptions import *
from asyncio.futures import *
from asyncio.locks import *
from asyncio.protocols import *
from asyncio.runners import *
from asyncio.queues import *
from asyncio.streams import *
from asyncio.subprocess import *
from asyncio.tasks import *
from asyncio.threads import *
from asyncio.transports import *
Source code (main.py):
"""
asynchronous isort demo
"""
import pathlib
import asyncio
import itertools
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from timeit import timeit
import isort
from isort import format
# target dir with modules
FILE = pathlib.Path("./import_messes")
# Monkey-patching isort.format.create_terminal_printer to suppress Terminal bombarding.
# Totally not required nor recommended for normal use
class SuppressionPrinter:
    def __init__(self, *_, **__):
        pass
    def success(self, *_):
        pass
    def error(self, *_):
        pass
    def diff_line(self, *_):
        pass
isort.format.BasicPrinter = SuppressionPrinter
# -----------------------------
# Test functions
def filelist_gen():
    """Chain directory list multiple times to get meaningful difference"""
    yield from itertools.chain.from_iterable([FILE.iterdir() for _ in range(1000)])
def isort_synchronous(path_iter):
    """Synchronous usual isort use-case"""
    # return list of results
    return [isort.check_file(file) for file in path_iter]
def isort_thread(path_iter):
    """Threading isort"""
    # prepare thread pool
    with ThreadPoolExecutor(max_workers=2) as executor:
        # start loading
        futures = [executor.submit(isort.check_file, file) for file in path_iter]
        # return list of results
        return [fut.result() for fut in futures]
def isort_multiprocess(path_iter):
    """Multiprocessing isort"""
    # prepare process pool
    with ProcessPoolExecutor(max_workers=2) as executor:
        # start loading
        futures = [executor.submit(isort.check_file, file) for file in path_iter]
        # return list of results
        return [fut.result() for fut in futures]
async def isort_asynchronous(path_iter):
    """Asyncio isort using to_thread"""
    # create coroutines that delegate sync funcs to threads
    coroutines = [asyncio.to_thread(isort.check_file, file) for file in path_iter]
    # run coroutines and wait for results
    return await asyncio.gather(*coroutines)
if __name__ == '__main__':
    # run once, no repetition
    n = 1
    # synchronous runtime
    print(f"Sync func.: {timeit(lambda: isort_synchronous(filelist_gen()), number=n):.4f}")
    # threading demo
    print(f"Threading : {timeit(lambda: isort_thread(filelist_gen()), number=n):.4f}")
    # multiprocessing demo
    print(f"Multiproc.: {timeit(lambda: isort_multiprocess(filelist_gen()), number=n):.4f}")
    # asyncio to_thread demo
    print(f"to_thread : {timeit(lambda: asyncio.run(isort_asynchronous(filelist_gen())), number=n):.4f}")
Run results
Sync func.: 18.1764
Threading : 18.3138
Multiproc.: 9.5206
to_thread : 27.3645
(The above results were measured on an NVMe SSD.)
You can see that isort.check_file is not an IO-bound operation on fast IO devices. Therefore, your best bet is multiprocessing, if it is really needed on such fast drives.
If the number of files is low in the above "fast IO device" situation, say a hundred or fewer, multiprocessing will suffer even more than asyncio.to_thread, because the cost of spawning, communicating with, and killing processes overwhelms multiprocessing's benefits.
However, with slow IO devices like HDDs, threading/async is a totally valid idea and will give a great boost in performance.
Experiment with your use case and adjust the core/thread count (max_workers) to best fit your environment.
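If you do end up going the multiprocessing route, applying it to the original snippet would look roughly like this (a sketch; the paths are the question's placeholders, and with only a couple of files the process-spawn overhead discussed above will dominate):
from concurrent.futures import ProcessPoolExecutor
import isort
if __name__ == '__main__':
    files = ['c:\\example1.py', 'c:\\example2.py']
    # each check_file call runs in its own worker process, sidestepping the GIL
    with ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(isort.check_file, files))
    print(results)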

Problem with event loops, Qt5 GUI, and ipywidgets in a Jupyter notebook

I am trying to integrate interactive ipywidgets with a loop in my code that also performs other tasks (in this case, acquiring data from some hardware attached to the computer and updating live plots).
In the past, I could do this by using IPython.kernel.do_one_iteration() in my while loop: this would trigger a sync of the ipywidget changes and I would be able to retrieve them from the python widget objects. A minimal example is here:
import ipywidgets as widgets
from time import sleep
import IPython
do_one_iteration = IPython.get_ipython().kernel.do_one_iteration
w = widgets.ToggleButton()
display(w)
i=0
while True:
    do_one_iteration()
    print(i, w.value, end="\r")
    w.description = str(i)
    sleep(0.5)
    i += 1
Here, the loop prints out the ticker integer along with the state of the widget. (In the real code, I would also acquire data, update plots, and change plot/acquisition settings depending on the user's interaction with the widgets.)
With ipykernel 5.3.2 and ipython 7.16.1, this worked fine: if the widget changed, calling do_one_iteration() synced the widget states to the kernel and I could retrieve it from my while loop.
After an upgrade (to 6.4.1 and 7.29.0), this no longer works. It seems that do_one_iteration() is now a coroutine: I get the warning "coroutine 'Kernel.do_one_iteration' was never awaited" if I use the above code.
With some help from a friend, we found a way to do this with threading and asyncio:
%gui asyncio
import asyncio
import ipywidgets as widgets
button = widgets.ToggleButton()
display(button)
text = widgets.Text()
display(text)
text.value= str(button.value)
stop_button = widgets.ToggleButton()
stop_button.description = "Stop"
display(stop_button)
async def f():
    i = 0
    while True:
        i += 1
        text.value = str(i) + " " + str(button.value)
        await asyncio.sleep(0.2)
        if stop_button.value == True:
            return
asyncio.create_task(f());
And this works (it also adds a stop button, and uses a text output widget instead of printing). But to throw a spanner in the works, I need to use a library that itself uses a Qt GUI event loop. Some more puzzling suggests that this should be the code to make it work:
%gui qt5
import asyncio
import ipywidgets as widgets
import qasync
button = widgets.ToggleButton()
display(button)
text = widgets.Text()
display(text)
text.value= str(button.value)
stop_button = widgets.ToggleButton()
stop_button.description = "Stop"
display(stop_button)
async def f():
    i = 0
    while True:
        i += 1
        text.value = str(i) + " " + str(button.value)
        await asyncio.sleep(0.2)
        if stop_button.value == True:
            return
from qtpy import QtWidgets
APP = QtWidgets.QApplication.instance()
loop = qasync.QEventLoop(APP)
asyncio.set_event_loop(loop)
asyncio.create_task(f());
But with this code, the updates do not propagate, and I get the following error on the terminal running my notebook server:
[IPKernelApp] ERROR | Error in message handler
Traceback (most recent call last):
File "/Users/gsteele/anaconda3/envs/myenv2/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 457, in dispatch_queue
await self.process_one()
File "/Users/gsteele/anaconda3/envs/myenv2/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 440, in process_one
t, dispatch, args = await self.msg_queue.get()
RuntimeError: Task <Task pending name='Task-2'
coro=<Kernel.dispatch_queue() running at
/Users/gsteele/anaconda3/envs/myenv2/lib/python3.9/site-packages/ipykernel/kernelbase.py:457>
cb=[IOLoop.add_future.<locals>.<lambda>() at /Users/gsteele/anaconda3/envs/myenv2/lib/python3.9/site-packages/tornado/ioloop.py:688]>
got Future <Future pending> attached to a different loop
It seems that somehow my ipywidgets events are propagating to the wrong event loop.
And now my question is: does anybody know what is going on here?
It's hard for me to identify if this is a "bug", and if so, in which software package do things go wrong? ipykernel? Or tornado? Or ipywidgets? Or asyncio? Or maybe I'm missing something?
Any thoughts highly welcome, thanks!
Found at least a partial solution: using the nest_asyncio package allows me to keep using do_one_iteration(), just by adding the following to the first code block:
import nest_asyncio
nest_asyncio.apply()
and then using await do_one_iteration() instead of calling it directly.
(see https://github.com/ipython/ipykernel/issues/825)
For my purposes, this solves my issue since I don't need asynchronous interaction with my GUI. The problem of the %gui qt5 interaction with the event loop in the asynchronous versions of the code is still a mystery though...
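For completeness, a minimal sketch of that workaround applied to the first example above (the nest_asyncio lines and the await are the only changes; the top-level await works because IPython autoawaits in notebook cells):
import nest_asyncio
nest_asyncio.apply()
import ipywidgets as widgets
from time import sleep
import IPython
do_one_iteration = IPython.get_ipython().kernel.do_one_iteration
w = widgets.ToggleButton()
display(w)
i = 0
while True:
    await do_one_iteration()      # do_one_iteration is now a coroutine, so await it
    print(i, w.value, end="\r")
    w.description = str(i)
    sleep(0.5)
    i += 1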

Extract hyperlinks from a spooled PDF file in Python

I am getting my form data from the frontend and reading it using FastAPI as shown below:
@app.post("/file_upload")
async def upload_file(pdf: UploadFile = File(...)):
    print("Content = ", pdf.content_type, pdf.filename, pdf.spool_max_size)
    return {"filename": "Success"}
Now what I need to do is extract hyperlinks from these spooled files with the help of pypdfextractor (pdfx) as shown below:
import pdfx
from os.path import exists
from config import availableUris
def getHrefsFromPDF(pdfPath:str)->dict:
    if not(exists(pdfPath)):
        raise FileNotFoundError("PDF File not Found")
    pdf = pdfx.PDFx(pdfPath)
    return pdf.get_references_as_dict().get('url', [])
But I am not sure how to convert spool file (Received from FAST API) to pdfx readable file format.
Additionally, I also tried to study the bytes that come out of the file. When I try to do this:
data = await pdf.read()
the data type shows as bytes. When I try to convert it using the str function, it gives a Unicode-escaped string that is total gibberish to me. I also tried to decode it using "utf-8", which throws a UnicodeDecodeError.
FastAPI gives you a SpooledTemporaryFile. You may be able to use that file object directly if there is some API in pdfx which will work on a file object rather than a str representing a path (!). Otherwise, make a new temporary file on disk and work with that:
from tempfile import TemporaryDirectory
from pathlib import Path
import pdfx
@app.post("/file_upload")
async def upload_file(pdf: UploadFile = File(...)):
    with TemporaryDirectory() as d:  # Adding the file into a temporary storage for re-reading purposes
        tmpf = Path(d) / "pdf.pdf"
        with tmpf.open("wb") as f:
            f.write(pdf.file.read())  # pdf.file is the underlying SpooledTemporaryFile (synchronous read)
        p = pdfx.PDFx(str(tmpf))
        ...
It may be that pdfx.PDFx will take a Path object. I'll update this answer if so. I've kept the read-write loop synchronous for ease, but you can make it asynchronous if there is a reason to do so.
Note that it would be better to find a way of doing this with the SpooledTemporaryFile.
As to your data showing as bytes: well, pdfs are (basically) binary files: what did you expect?
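Putting the answer together with the question's getHrefsFromPDF-style logic, an end-to-end endpoint might look roughly like this (a sketch, not tested against pdfx; pdf.file is the underlying SpooledTemporaryFile, so reading it is synchronous):
from tempfile import TemporaryDirectory
from pathlib import Path
from fastapi import FastAPI, File, UploadFile
import pdfx
app = FastAPI()
@app.post("/file_upload")
async def upload_file(pdf: UploadFile = File(...)):
    with TemporaryDirectory() as d:
        tmpf = Path(d) / "pdf.pdf"
        # write the upload to disk so pdfx can open it by path
        tmpf.write_bytes(pdf.file.read())
        urls = pdfx.PDFx(str(tmpf)).get_references_as_dict().get('url', [])
    return {"filename": pdf.filename, "urls": urls}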

CUDA IPC Memcpy + MPI fails in Theano, works in pycuda

For learning purposes, I wrote a small C Python module that is supposed to perform an IPC cuda memcopy to transfer data between processes. For testing, I wrote equivalent programs: one using theano's CudaNdarray, and the other using pycuda. The problem is, even though the test programs are nearly identical, the pycuda version works while the theano version does not. It doesn't crash: it just produces incorrect results.
Below is the relevant function in the C module. Here is what it does: every process has two buffers, a source and a destination. Calling _sillycopy(source, dest, n) copies n elements from each process's source buffer to the neighboring process's dest array. So, if I have two processes, 0 and 1, process 0 will end up with process 1's source buffer and process 1 will end up with process 0's source buffer.
Note that to transfer cudaIpcMemHandle_t values between processes, I use MPI (this is a small part of a larger project which uses MPI). _sillycopy is called by another function, "sillycopy" which is exposed in Python by the standard Python C API methods.
void _sillycopy(float *source, float* dest, int n, MPI_Comm comm) {
    int localRank;
    int localSize;
    MPI_Comm_rank(comm, &localRank);
    MPI_Comm_size(comm, &localSize);
    // Figure out which process is to the "left".
    // m() performs a mod and treats negative numbers
    // appropriately
    int neighbor = m(localRank - 1, localSize);
    // Create a memory handle for *source and do a
    // wasteful Allgather to distribute to other processes
    // (could just use an MPI_Sendrecv, but irrelevant right now)
    cudaIpcMemHandle_t *memHandles = new cudaIpcMemHandle_t[localSize];
    cudaIpcGetMemHandle(memHandles + localRank, source);
    MPI_Allgather(
        memHandles + localRank, sizeof(cudaIpcMemHandle_t), MPI_BYTE,
        memHandles, sizeof(cudaIpcMemHandle_t), MPI_BYTE,
        comm);
    // Open the neighbor's mem handle so we can do a cudaMemcpy
    float *sourcePtr;
    cudaIpcOpenMemHandle((void**)&sourcePtr, memHandles[neighbor], cudaIpcMemLazyEnablePeerAccess);
    // Copy!
    cudaMemcpy(dest, sourcePtr, n * sizeof(float), cudaMemcpyDefault);
    cudaIpcCloseMemHandle(sourcePtr);
    delete [] memHandles;
}
Now here is the pycuda example. For reference, using int() on a_gpu and b_gpu returns the pointer to the underlying buffer's memory address on the device.
import sillymodule # sillycopy lives in here
import simplempi as mpi
import pycuda.driver as drv
import numpy as np
import atexit
import time
mpi.init()
drv.init()
# Make sure each process uses a different GPU
dev = drv.Device(mpi.rank())
ctx = dev.make_context()
atexit.register(ctx.pop)
shape = (2**26,)
# allocate host memory
a = np.ones(shape, np.float32)
b = np.zeros(shape, np.float32)
# allocate device memory
a_gpu = drv.mem_alloc(a.nbytes)
b_gpu = drv.mem_alloc(b.nbytes)
# copy host to device
drv.memcpy_htod(a_gpu, a)
drv.memcpy_htod(b_gpu, b)
# A few more host buffers
a_p = np.zeros(shape, np.float32)
b_p = np.zeros(shape, np.float32)
# Sanity check: this should fill a_p with 1's
drv.memcpy_dtoh(a_p, a_gpu)
# Verify that
print(a_p[0:10])
sillymodule.sillycopy(
int(a_gpu),
int(b_gpu),
shape[0])
# After this, b_p should have all one's
drv.memcpy_dtoh(b_p, b_gpu)
print(b_p[0:10])
And now the theano version of the above code. Rather than using int() to get the buffers' address, the CudaNdarray way of accessing this is via the gpudata attribute.
import os
import simplempi as mpi
mpi.init()
# selects one GPU per process
os.environ['THEANO_FLAGS'] = "device=gpu{}".format(mpi.rank())
import theano.sandbox.cuda as cuda
import time
import numpy as np
import time
import sillymodule
shape = (2 ** 24, )
# Allocate host data
a = np.ones(shape, np.float32)
b = np.zeros(shape, np.float32)
# Allocate device data
a_gpu = cuda.CudaNdarray.zeros(shape)
b_gpu = cuda.CudaNdarray.zeros(shape)
# Copy from host to device
a_gpu[:] = a[:]
b_gpu[:] = b[:]
# Should print 1's as a sanity check
print(np.asarray(a_gpu[0:10]))
sillymodule.sillycopy(
a_gpu.gpudata,
b_gpu.gpudata,
shape[0])
# Should print 1's
print(np.asarray(b_gpu[0:10]))
Again, the pycuda code works perfectly and the theano version runs, but gives the wrong result. To be precise, at the end of the theano code, b_gpu is filled with garbage: neither 1's nor 0's, just random numbers as though it were copying from a wrong place in memory.
My original theory about why this was failing had to do with CUDA contexts. I wondered whether theano was doing something with them that meant the CUDA calls made in sillycopy were run under a different CUDA context than the one used to create the GPU arrays. I don't think this is the case because:
I spent a lot of time digging deep in theano's code and saw no funny business being played with contexts
I would expect such a problem to result in a bad crash rather than an incorrect result, and a crash is not what happens.
A secondary thought is whether this has to do with the fact that theano spawns several threads, even when using a CUDA backend, which can be verified by running "ps huH p ". I don't know how threads might affect anything, but I have run out of obvious things to consider.
Any thoughts on this would be greatly appreciated!
For reference: the processes are launched in the normal OpenMPI way:
mpirun --np 2 python test_pycuda.py
