"HTTPError: 429 Client Error: Too Many Requests for url" in Python [w/ Azure developer account] - microsoft-cognitive

I keep getting this exception when trying to run Python code on a small sample directory of 5-20 images.
I have a developer account with Microsoft (as of this morning) and have inquired with Azure Support, but this issue wasn't solvable through chat alone. They've instructed me to post this here, so I'm sorry if it comes off as an eye-roll to everyone else!
There's very little documentation on this anywhere on the web. Maybe this'll help someone else, because this is potentially a game-changer for people in graphics and media who have a huge amount of disorganized images like I do.
Note that this code DID work once last night! But only once. No idea what went wrong!
# API reference:
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa
# Reference: https://ledge.ai/microsoft-computer-vision-api/
# Overview: runs AI analysis on the images in the img folder and renames the files.
# Usage: python3 cv_demo.py
# Note: change the subscription key to your own
import requests
import glob
import os
import time
import urllib.parse

subscription_key = "i do have a real subscription key don't worry"
assert subscription_key
vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"
analyze_url = vision_base_url + "analyze"

# Rename the files
def file_rename(list_1, list_2):
    for i in range(len(list_1)):
        os.rename(list_1[i], './img/' + list_2[i] + '.jpg')

def ms_computer_vision_api(filepath):
    headers = {'Ocp-Apim-Subscription-Key': subscription_key,
               'Content-Type': 'application/octet-stream'}
    params = urllib.parse.urlencode({
        # Request parameters
        'visualFeatures': 'Categories,Tags,Description,Faces'
    })
    img = open(filepath, 'rb')
    img_byte = img.read()
    response = requests.post(analyze_url, data=img_byte, headers=headers, params=params)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Collect the image files into a list
    image_file = glob.glob('./img/*')
    vision_file_name = []
    start = time.time()
    # Call the Computer Vision API and collect the results
    for i in range(len(image_file)):
        json_data = ms_computer_vision_api(image_file[i])
        # Get the generated caption
        file_name = json_data['description']['captions'][0]['text']
        vision_file_name.append(file_name)
    # Replace spaces in the captions with underscores for use in file names
    for i in range(len(vision_file_name)):
        vision_file_name[i] = vision_file_name[i].replace(' ', '_')
    file_rename(image_file, vision_file_name)
    # Print the elapsed time
    print("elapsed_time:{0}".format(time.time() - start) + "[sec]")
File "C:\Users\MyName\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://renameimages.cognitiveservices.azure.com/vision/v2.0/analyze?visualFeatures=Description

Based on your code, this error comes from the loop that calls the Cognitive Services API.
Each Cognitive Service has a TPS (transactions per second) limit and reports a 429 error when you exceed it. Even if a specific Cognitive Service has a higher limit, such as 50 TPS, you may still get 429 errors. You should always use the following policy to avoid 429s in the future.
The following explains what 429 means and how to handle it.
HTTP 429 indicates RateLimitExceeded, meaning you are making too many API calls per second or minute.
When HTTP 429 happens, you must wait some time before calling the API again, or else the next call will also be refused. Usually we retry the operation using something like an exponential back-off retry policy to handle the 429 error:
2.1) Check the HTTP response code in your code.
2.2) When the HTTP response code is 429, retry the operation after N seconds, which you can define yourself, such as 10 seconds…
For example, the following is a 429 response. You can set your wait time to (26 + n) seconds. (PS: you can define n yourself here, such as n = 5…)
{
    "error": {
        "statusCode": 429,
        "message": "Rate limit is exceeded. Try again in 26 seconds."
    }
}
2.3) If the retry succeeds, continue with the next operation.
2.4) If the retry fails with 429 again, retry the operation after N*N seconds (which you can also define yourself); this is the exponential back-off retry policy.
2.5) If that retry fails with 429 too, retry the operation after N*N*N seconds…
2.6) Always wait for the current operation to succeed; the waiting time grows exponentially (see the sketch below).
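
For reference, here is a minimal sketch of this back-off policy in Python. It reuses the ms_computer_vision_api helper from the question above; the base delay N and the attempt count are placeholder values you can tune.

import time
import requests

def call_with_backoff(filepath, base_delay=10, max_attempts=5):
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            return ms_computer_vision_api(filepath)
        except requests.exceptions.HTTPError as err:
            if err.response.status_code != 429:
                raise             # only retry when throttled
            time.sleep(delay)     # wait N, then N*N, then N*N*N, ... seconds
            delay *= base_delay   # exponential back-off
    raise RuntimeError("still throttled after {} attempts".format(max_attempts))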

Related

requests.get(url) is hanging after 5 iterations

I am attempting to run a web-scraping algorithm on Indeed using BeautifulSoup and loop through the different pages. However, after 2-6 iterations, requests.get(url) hangs and stops finding the next page. I have read that it might have something to do with the server blocking me, but that would have blocked the original requests, and it also says online that Indeed allows web scraping. I have also heard that I should set a header, but I am unsure how to do that. I am running the latest version of Safari on macOS 12.4.
A solution I came up with, though it does not answer the question specifically, is to use a try/except statement and set a timeout value on the request. Once the timeout is reached, control enters the except branch, sets a boolean flag, and the loop continues to try again. The code is inserted below.
while i < 10:
    url = get_url('software intern', '', i)
    print("Parsing Page Number:" + str(i + 1))
    error = False
    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.Timeout as err:
        error = True
    if error:
        print("Trying to connect to webpage again")
        continue
    i += 1
I am leaving the question unanswered for now, however, as I still don't know the root cause of this issue and this solution is just a workaround.
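
As for setting a header, a minimal sketch follows. It assumes url comes from get_url as in the loop above, and the User-Agent string is just an example value, not anything prescribed by Indeed.

import requests

# Identify the client with a browser-like User-Agent header (example value).
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_4)"}
response = requests.get(url, headers=headers, timeout=10)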

Is there a restriction on the no. of HERE API Calls I can make in a loop (using R)

I am trying to loop through a list of origin destination lat long locations to get the transit time. I am getting the following error when I loop. However when I do a single call (without looping), I get an output without error. I use the freemium HERE-API and I am allowed 250k transactions a month.
for (i in 1:nrow(test))
{
  call <- paste0("https://route.api.here.com/routing/7.2/calculateroute.json",
                 "?app_id=", "appid",
                 "&app_code=", "appcode",
                 "&waypoint0=geo!", y$dc_lat[i], ",", y$dc_long[i],
                 "&waypoint1=geo!", y$store_lat[i], ",", y$store_long[i],
                 "&mode=", "fastest;truck;traffic:enabled",
                 "&trailerscount=", "1",
                 "&routeattributes=", "sh",
                 "&maneuverattributes=", "di,sh",
                 "&limitedweight=", "20")
  response <- fromJSON(call, simplify = TRUE)
  Traffic_time = (response[["response"]][["route"]][[1]][["summary"]][["trafficTime"]]) / 60
  Base_time = (response[["response"]][["route"]][[1]][["summary"]][["baseTime"]]) / 60
  print(Traffic_time)
}
Error in file(con, "r"): cannot open the connection to 'https://route.api.here.com/routing/7.2/calculateroute.json?app_id=appid&app_code=appcode&waypoint0=geo!45.1005200,-93.2452000&waypoint1=geo!45.0978500,-95.0413620&mode=fastest;truck;traffic:enabled&trailerscount=1&routeattributes=sh&maneuverattributes=di,sh&limitedweight=20'
Traceback:
As per the error, this suggests that there is a problem with the file at your end; it could be corrupt, so it is worth trying a different file extension. You can also try restarting your IDE. The number of API calls you can make depends on the plan you have opted for (freemium or pro). You can find more details here: https://developer.here.com/faqs

SIM5320E - POST request with large data is slow

I have built a prototype using a Raspberry Pi and a SIM5320E module. The goal is to send a large amount of data (~100 KB) over HTTP using this 3G module.
I have followed the instructions specified in section 16.5 (HTTPS) of the AT Command set for the SIM5320:
https://cdn-shop.adafruit.com/datasheets/SIMCOM_SIM5320_ATC_EN_V2.02.pdf
And it worked fine, except that it is slow.
From what I understand from the documentation (and seen from my tests), the data to be sent must be divided in chunks of max 4096 bytes.
Every chunk must be sent to what is called the "sending buffer" using the command AT+CHTTPSSEND.
Every now and then, we must check that the sending buffer does not have too much data in cache using the AT+CHTTPSSEND? command.
The last AT+CHTTPSSEND command commits all sending data.
My problem is that every AT+CHTTPSSEND takes around 10 seconds to complete, which means that my HTTP request will take around 250 seconds to complete.
Anybody knows what might cause this slowness?
Here is some code to illustrate the issue:
def send_chunk(self, chunk):
    # Send chunk
    self._send('CHTTPSSEND={}'.format(len(chunk)), wait_for=">")
    self._send_raw(chunk.encode())
    # Check how much data is left in the sending buffer
    # Wait for this data to be under 3Kb
    data_left = 3001
    while data_left > 3000:
        response = self._send('CHTTPSSEND?', wait_for="+CHTTPSSEND:")
        data_left = int(response.strip().split(" ")[1])
        time.sleep(2)
And here are the logs I get:
>> AT+CHTTPSSEND=4096 -> This command takes ~10 seconds
<< >
>> Sending chunk of data
<< OK
>> AT+CHTTPSSEND?
<< +CHTTPSSEND: 0

Write and read from a serial port

I am using the following Python script to write AT+CSQ to serial port ttyUSB1.
But I cannot read anything back.
However, when I issue AT+CSQ in minicom, I get the expected results.
What may be the issue with this script?
Logs:
Manual Script
root#imx6slzbha:~# python se.py
Serial is open
Serial is open in try block also
write data: AT+CSQ
read data:
read data:
read data:
read data:
Logs:
Minicom console
1. ate
OK
2. at+csq
+CSQ: 20,99
3. at+csq=?
OKSQ: (0-31,99),(99)
How can I receive these results in the following python script?
import serial, time

# initialization and open the port
# possible timeout values:
#    1. None: wait forever, blocking call
#    2. 0: non-blocking mode, return immediately
#    3. x, x is bigger than 0, float allowed, timeout blocking call
ser = serial.Serial()
ser.port = "/dev/ttyUSB1"
ser.baudrate = 115200
ser.bytesize = serial.EIGHTBITS     # number of bits per byte
ser.parity = serial.PARITY_NONE     # set parity check: no parity
ser.stopbits = serial.STOPBITS_ONE  # number of stop bits
ser.timeout = None                  # blocking read
#ser.timeout = 0                    # non-blocking read
ser.timeout = 3                     # timeout blocking read
ser.xonxoff = False                 # disable software flow control
ser.rtscts = False                  # disable hardware (RTS/CTS) flow control
ser.dsrdtr = False                  # disable hardware (DSR/DTR) flow control
ser.writeTimeout = 2                # timeout for write

try:
    ser.open()
    print("Serial is open")
except Exception, e:
    print "error open serial port: " + str(e)
    exit()

if ser.isOpen():
    try:
        print("Serial is open in try block also")
        ser.flushInput()   # flush input buffer, discarding all its contents
        ser.flushOutput()  # flush output buffer, aborting current output
                           # and discarding all that is in the buffer
        # write data
        ser.write("AT+CSQ")
        time.sleep(1)
        # ser.write("AT+CSQ=?\x0D")
        print("write data: AT+CSQ")
        # print("write data: AT+CSQ=?\x0D")
        time.sleep(2)  # give the serial port some time to receive the data
        numOfLines = 1
        while True:
            response = ser.readline()
            print("read data: " + response)
            numOfLines = numOfLines + 1
            if numOfLines >= 5:
                break
        ser.close()
    except Exception, e1:
        print "error communicating...: " + str(e1)
else:
    print "cannot open serial port"
You have two very fundamental flaws in your AT command handling:
time.sleep(1)
and
if (numOfLines >= 5):
How bad are they? Nothing will ever work until you fix them, and by that I mean completely changing the way you send and receive commands and responses.
Sending AT commands to a modem is a communication protocol like any other protocol, where certain parts and behaviours are required and not optional. Just like you would not write an HTTP client that completely ignores the responses it gets back from the HTTP server, you must never write a program that sends AT commands to a modem and completely ignores the responses the modem sends back.
AT commands are a link layer protocol with a window size of 1 (one). Therefore, after sending a command line, the sender MUST wait until it has received a response from the modem saying that it is finished processing the command line, and that kind of response is called a final result code.
If the modem uses 70ms before it responds with a final result code you have to wait at least 70ms before continuing, if it uses 4 seconds you have to wait at least 4 seconds before continuing, if it uses several minutes (and yes, there exists AT commands that can take minutes to complete) you have to wait for several minutes. If the modem has not responded in an hour, your only options are 1) continue waiting, 2) just give up or 3) disconnect, reconnect and start all over again.
This is why sleep is such a horrible approach that in the very best case is a time wasting ticking bomb. It is as useful as kicking dogs that stand in your way in order to get them to move. Yes it might actually work some times, but at some point you will be sorry for taking that approach...
And regarding numOfLines, there is no way anyone can know in advance exactly how many lines a modem will respond with. What if your modem just responds with a single line containing the ERROR final result code? The code will deadlock.
So this line-number counting has to go away completely; instead, your code should send a command line and then wait for the final result code by reading and parsing the response lines from the modem.
But before diving too deep into that answer, start by reading the V.250 specification, at least all of chapter 5. This is the standard that defines the basics of AT commands, and it will for instance teach you the difference between a command and a command line, and how to correctly terminate a command line, which you are not doing, so the modem never starts processing the commands you send.
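
To make that concrete, here is a minimal sketch (Python 3 with pyserial, unlike the Python 2 script above) that terminates the command line with a carriage return and reads until a final result code arrives; the result codes listed below are the common V.250 ones.

import serial

FINAL_RESULT_CODES = {"OK", "ERROR", "NO CARRIER", "BUSY", "NO ANSWER", "NO DIALTONE"}

def send_at_command(ser, command):
    # A command line must be terminated with CR, or the modem never processes it.
    ser.write((command + "\r").encode())
    lines = []
    while True:
        raw = ser.readline()
        if not raw:
            # ser.timeout expired with no data at all
            raise TimeoutError("no final result code received for " + command)
        line = raw.decode(errors="replace").strip()
        if not line:
            continue  # skip blank separator lines
        lines.append(line)
        # Stop only when the modem signals it is done with the command line.
        if line in FINAL_RESULT_CODES or line.startswith(("+CME ERROR", "+CMS ERROR")):
            return lines

ser = serial.Serial("/dev/ttyUSB1", 115200, timeout=3)
print(send_at_command(ser, "AT+CSQ"))  # e.g. ['AT+CSQ', '+CSQ: 20,99', 'OK']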

Callback from "multiprocessing" with CFFI segfaults after ~100 iterations

A PyPy callback, that works perfectly (in an infinite loop) when implemented (straightforwardly) as method of a Python object, segfaults after approximately 100 iterations when I move the Python object into a separate multiprocessing process.
In the main code I have:
import multiprocessing as mp

class Task(object):
    def __init__(self, com, lib):
        self.com = com  # communication queue
        self.lib = lib  # ffi library
        self.proc = mp.Process(target=self.spawn, args=(self.com,))
        self.register_callback()

    def spawn(self, com):
        print('%s spawned.' % self.name)
        # loop (keeping 'self' alive) until BREAK:
        while True:
            cmd = com.get()
            if cmd == self.BREAK:
                break
        print("%s stopped." % self.name)

    # @ffi.callback("int(void*, Data*)")  # old cffi (ABI mode)
    def callback(self, data):
        # <work on data>
        return 1

    def register_callback(self):
        s = ffi.new_handle(self)
        self.lib.register_callback(s, self.callback)  # C-call
The idea is that multiple tasks should serve an equal number of callbacks concurrently. I have no clue what may cause the segfault, especially since it runs fine for the first ~100 iterations or so. Help much appreciated!
Solution
The handle 's' is garbage collected when 'register_callback()' returns. Making the handle an attribute of 'self' before passing it keeps it alive.
Standard CPython (cffi 1.6.0) segfaulted at the first iteration (i.e., garbage collection was immediate) and gave me a crucial, informative error message, whereas PyPy segfaulted after approximately 100 iterations without providing a message. Both run fine now.
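
In code, the fix is a small change to register_callback() from the snippet above; a minimal sketch, assuming the same ffi and lib objects:

def register_callback(self):
    # Store the handle on self so it outlives this method call; otherwise it
    # is garbage collected and the C side is left holding a dangling pointer.
    self._handle = ffi.new_handle(self)
    self.lib.register_callback(self._handle, self.callback)  # C-call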
