Encode image file to base64 - python-3.6

I'm having trouble converting an image to base64 and sending it through an XML-RPC client. The XML-RPC server responds with this error:
a bytes-like object is required, not '_io.BufferedReader'
import base64
with open(full_path, 'rb') as imgFile:
    image = base64.b64encode(imgFile)

You passed the file object itself, but base64.b64encode() needs the binary data (a bytes-like object). Read the file first:
import base64
with open(full_path, 'rb') as imgFile:
    # read() returns the file's contents as bytes, which b64encode accepts
    image = base64.b64encode(imgFile.read())
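Note that b64encode returns bytes. If the server expects a base64 string, decode the result to str. Python's xmlrpc.client marshals a bytes value as a <base64> element on its own, so passing the b64encode output as bytes would encode it twice; wrapping the raw bytes in xmlrpc.client.Binary is the other option. A sketch of both, using a throwaway file with made-up bytes in place of the real image (encode_image is a hypothetical helper):

```python
import base64
import os
import tempfile
import xmlrpc.client

def encode_image(path: str) -> str:
    """Read a file in binary mode and return its base64 encoding as ASCII text."""
    with open(path, 'rb') as img_file:
        raw = img_file.read()                  # bytes, not the file object
    return base64.b64encode(raw).decode('ascii')

# Demo: a throwaway file with made-up bytes stands in for full_path.
sample_bytes = b'\xff\xd8\xff\xe0fake-jpeg-bytes'
with tempfile.NamedTemporaryFile(delete=False, suffix='.jpg') as tmp:
    tmp.write(sample_bytes)
    full_path = tmp.name

image = encode_image(full_path)
print(image)

# Alternative: hand the raw bytes to xmlrpc.client.Binary; the XML-RPC
# client base64-encodes Binary values itself when it marshals the call.
with open(full_path, 'rb') as img_file:
    payload = xmlrpc.client.Binary(img_file.read())
os.remove(full_path)
```

Which form to send depends on what the server-side method expects: a str holding base64 text, or an XML-RPC binary value.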

Related

Extract Hyperlink from a spool pdf file in Python

I am getting form data from the frontend and reading it with FastAPI as shown below:
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/file_upload")
async def upload_file(pdf: UploadFile = File(...)):
    print("Content = ", pdf.content_type, pdf.filename, pdf.spool_max_size)
    return {"filename": "Success"}
Now I need to extract hyperlinks from these spooled files using pdfx, as shown below:
import pdfx
from os.path import exists
from config import availableUris

def getHrefsFromPDF(pdfPath: str) -> list:
    if not exists(pdfPath):
        raise FileNotFoundError("PDF File not Found")
    pdf = pdfx.PDFx(pdfPath)
    # Take the list of 'url' references, or an empty list if there are none
    return pdf.get_references_as_dict().get('url', [])
But I am not sure how to convert the spooled file received from FastAPI into something pdfx can read.
I also tried inspecting the bytes that come out of the file:
data = await pdf.read()
The data type shows as bytes. When I convert it with str() I get a string of escaped bytes that is total gibberish to me, and decoding it as "utf-8" raises UnicodeDecodeError.
FastAPI gives you an UploadFile wrapping a SpooledTemporaryFile. You may be able to use that file object directly if pdfx has an API that works on a file object rather than a str representing a path. Otherwise, make a new temporary file on disk and work with that:
from pathlib import Path
from tempfile import TemporaryDirectory

import pdfx
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/file_upload")
async def upload_file(pdf: UploadFile = File(...)):
    # Copy the upload into a temporary file on disk so pdfx can re-read it by path
    with TemporaryDirectory() as d:
        tmpf = Path(d) / "pdf.pdf"
        with tmpf.open("wb") as f:
            # Synchronous read of the underlying SpooledTemporaryFile
            f.write(pdf.file.read())
        p = pdfx.PDFx(str(tmpf))
        ...
It may be that pdfx.PDFx will also accept a Path object; I'll update this answer if so. I've kept the read and write synchronous for simplicity, but you can make them asynchronous if there is a reason to do so.
Note that it would be better to find a way of doing this with the SpooledTemporaryFile.
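On that point: UploadFile exposes the SpooledTemporaryFile as its .file attribute, and shutil.copyfileobj can stream it to disk in chunks rather than reading it all into memory at once. A minimal stdlib-only sketch; spooled_to_path and the upload.pdf name are hypothetical, and the demo builds its own SpooledTemporaryFile instead of going through FastAPI:

```python
import shutil
import tempfile
from pathlib import Path

def spooled_to_path(spooled, directory: str) -> Path:
    """Copy a file-like object (such as UploadFile.file) into a real file on disk."""
    spooled.seek(0)                           # rewind in case something already read from it
    target = Path(directory) / "upload.pdf"   # hypothetical file name
    with target.open("wb") as out:
        shutil.copyfileobj(spooled, out)      # copies in chunks instead of one big read()
    return target

# Demo: a SpooledTemporaryFile stands in for what FastAPI hands you.
with tempfile.SpooledTemporaryFile(max_size=1024) as spooled:
    spooled.write(b"%PDF-1.4\nfake body\n")
    with tempfile.TemporaryDirectory() as d:
        path = spooled_to_path(spooled, d)
        copied = path.read_bytes()   # a path-only API would be given str(path) here
print(path.name, len(copied))
```

The file only exists while the TemporaryDirectory block is open, so any path-based processing has to happen inside it.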
As for your data showing up as bytes: PDFs are (basically) binary files, so that is expected, and much of their content is not valid UTF-8 text.
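If you want to inspect those bytes rather than read them as text, the meaningful check is the header: a PDF starts with the ASCII marker %PDF- followed by a version number, and many PDFs put a comment line of high bytes right after it, which is one reason UTF-8 decoding fails. A small sketch with a made-up sample; looks_like_pdf is a hypothetical helper:

```python
def looks_like_pdf(data: bytes) -> bool:
    """PDF files begin with the ASCII marker %PDF- followed by the version."""
    return data.startswith(b"%PDF-")

sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"   # made-up header with a binary comment line
print(looks_like_pdf(sample))

try:
    sample.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False   # 0xE2 0xE3 is not a valid UTF-8 sequence
print(decoded_ok)
```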

How to update payload info for python scraping

I have a Python scraper that works for this site:
https://dhhr.wv.gov/COVID-19/Pages/default.aspx
It scrapes the tooltips from one of the graphs, which you reach by clicking the "Positive Case Trends" link on the page above.
Here is my code:
import re
import requests
import json
from datetime import date
url4 = 'https://wabi-us-gov-virginia-api.analysis.usgovcloudapi.net/public/reports/querydata?synchronous=true'
# payload:
x=r'{"version":"1.0.0","queries":[{"Query":{"Commands":[{"SemanticQueryDataShapeCommand":{"Query":{"Version":2,"From":[{"Name":"c","Entity":"Case Data"}],"Select":[{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Lab Report Date"},"Name":"Case Data.Lab Add Date"},{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Daily Confirmed Cases"}},"Function":0},"Name":"Sum(Case Data.Daily Confirmed Cases)"},{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"c"}},"Property":"Daily Probable Cases"}},"Function":0},"Name":"Sum(Case Data.Daily Probable Cases)"}]},"Binding":{"Primary":{"Groupings":[{"Projections":[0,1,2]}]},"DataReduction":{"DataVolume":4,"Primary":{"BinnedLineSample":{}}},"Version":1}}}]},"CacheKey":"{\"Commands\":[{\"SemanticQueryDataShapeCommand\":{\"Query\":{\"Version\":2,\"From\":[{\"Name\":\"c\",\"Entity\":\"Case Data\"}],\"Select\":[{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Lab Report Date\"},\"Name\":\"Case Data.Lab Add Date\"},{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Daily Confirmed Cases\"}},\"Function\":0},\"Name\":\"Sum(Case Data.Daily Confirmed Cases)\"},{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"c\"}},\"Property\":\"Daily Probable Cases\"}},\"Function\":0},\"Name\":\"Sum(Case Data.Daily Probable Cases)\"}]},\"Binding\":{\"Primary\":{\"Groupings\":[{\"Projections\":[0,1,2]}]},\"DataReduction\":{\"DataVolume\":4,\"Primary\":{\"BinnedLineSample\":{}}},\"Version\":1}}}]}","QueryId":"","ApplicationContext":{"DatasetId":"fb9b182d-de95-4d65-9aba-3e505de8eb75","Sources":[{"ReportId":"dbabbc9f-cc0d-4dd0-827f-5d25eeca98f6"}]}}],"cancelQueries":[],"modelId":339580}'
x=x.replace("\\\'","'")
json_data = json.loads(x)
final_data2 = requests.post(url4, json=json_data, headers={'X-PowerBI-ResourceKey': 'ab4e5874-7bbf-44c9-9443-0701abdee612'}).json()
print(json.dumps(final_data2))
The issue is that on some days it stops working, because the payload and the X-PowerBI-ResourceKey header value change, and I have to find the new values in the Network tab of the browser's developer tools and paste them into my source by hand. Is there a way to obtain these programmatically from the web page and construct them in my code?
I'm fairly sure the resource key is part of the iframe URL, encoded as base64:
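from base64 import b64decode
from bs4 import BeautifulSoup
import json
import requests

resp = requests.get('https://dhhr.wv.gov/COVID-19/Pages/default.aspx')
soup = BeautifulSoup(resp.text, 'html.parser')
# Take everything after the first '=' in the iframe src; splitting on every '='
# would throw away any base64 '=' padding at the end
data = soup.find_all('iframe')[0]['src'].split('=', 1)[1]
decoded = json.loads(b64decode(data).decode())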
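The decoding step can be checked offline. This sketch assumes the iframe src looks like ...?r=<base64 JSON> and that the resource key sits under a "k" field of the decoded JSON; both are assumptions, as are the example.com URL and the "t" value below. It also restores '=' padding in case the URL omitted it, since b64decode raises an "Incorrect padding" error on unpadded input. decode_embed_param is a hypothetical helper:

```python
import base64
import json

def decode_embed_param(src: str) -> dict:
    """Decode the base64 JSON carried in an embed URL (assumed form: ...?r=<base64>)."""
    token = src.split("=", 1)[1]         # everything after the first '=' keeps any padding
    token += "=" * (-len(token) % 4)     # restore padding if the URL dropped it
    return json.loads(base64.b64decode(token).decode("utf-8"))

# Synthetic example: the URL, the "k"/"t" field names and the "t" value are hypothetical.
payload = {"k": "ab4e5874-7bbf-44c9-9443-0701abdee612", "t": "tenant-id"}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
src = "https://example.com/view?r=" + encoded.rstrip("=")   # simulate stripped padding

decoded = decode_embed_param(src)
headers = {"X-PowerBI-ResourceKey": decoded["k"]}
print(headers)
```

With the real page, the headers dict would then be passed to requests.post in place of the hard-coded key.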

Corrupt data when decoding base64 to an image in Qt

I wrote a program that decodes a base64 string to an image. Here is a sample:
QFile file("./image.jpg");
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
{
    return;
}
QByteArray raw = file.readAll().toBase64();
QImage image;
image.loadFromData(QByteArray::fromBase64(raw), "JPG");
image.save("output.jpg", "JPG");
The output of the program is:
Corrupt JPEG data: 65 extraneous bytes before marker 0xc0
Quantization table 0x01 was not defined
I couldn't find anything useful with Google. I only read the image file, encode it with base64, then decode it. Could you tell me what's wrong with my code?
I figured out what was wrong. When opening the image file, I used the QIODevice::Text open mode, which translates end-of-line terminators to '\n' while reading. But an image is a binary file, so I removed the QIODevice::Text option (the call becomes file.open(QIODevice::ReadOnly)), and after that the code ran correctly.
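The base64 step was never the problem: encoding and decoding round-trips any bytes exactly, so the damage happens when the file is read. Python's own text mode translates CR LF to LF much as QIODevice::Text does, so the effect can be reproduced in a few lines. This is a Python analogue, not Qt code; the bytes are made up, and latin-1 is used only so that every byte maps to exactly one character:

```python
import base64
import os
import tempfile

# Made-up bytes shaped like the start of a JPEG, including a CR LF pair (0x0D 0x0A).
original = bytes([0xFF, 0xD8, 0xFF, 0xDB, 0x00, 0x43, 0x0D, 0x0A, 0x10, 0xFF, 0xD9])

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original)
    path = tmp.name

with open(path, "rb") as f:                     # binary mode: bytes come back untouched
    binary_bytes = f.read()
with open(path, "r", encoding="latin-1") as f:  # text mode: CR LF is translated to LF on read
    text_bytes = f.read().encode("latin-1")
os.remove(path)

print(base64.b64decode(base64.b64encode(binary_bytes)) == original)
print(base64.b64decode(base64.b64encode(text_bytes)) == original, len(original) - len(text_bytes))
```

Every dropped byte shifts the rest of the data, which is why the JPEG decoder then reports extraneous bytes and missing tables.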

How to process and save HTTP body as-is in Haskell?

I have tried the following code to download HTML, but it transforms non-ASCII characters into sequences of escaped characters like <U+009B> and 0033200400\0031\0031.
openURL x = getResponseBody =<< simpleHTTP (getRequest x)

download url path = do
  src <- openURL url
  writeFile path src
How can I change this code to write the HTTP response exactly as received? And how should one search and manipulate strings in such content?
A string shown as "\1234\5678" is actually only two characters long: the data is preserved, but you need to interpret it correctly. Probably the best way to do that is to use Text, which is a compact, array-backed representation of Unicode text rather than a linked list of Chars.
To do this, you need a slightly more general interface in HTTP: mkRequest :: BufferType ty => RequestMethod -> URI -> Request ty. Text does not directly instantiate BufferType, so we go through ByteString, which represents binary chunks of data and has no particular interpretation of their encoding.
We can then use decodeUtf8 to turn the raw UTF-8 bytes into Text:
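import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
import Network.HTTP
import Network.URI (URI)

-- Fetch the body as raw bytes, then decode those bytes as UTF-8 text
fetchText :: URI -> IO T.Text
fetchText uri = do
  rawData <- getResponseBody =<< simpleHTTP (mkRequest GET uri) :: IO B.ByteString
  return (decodeUtf8 rawData)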
Note that decodeUtf8 is partial: on invalid UTF-8 it throws an exception, which cannot be caught in pure code and has to be handled somewhere up in your IO stack. If that is undesirable, for example because there's a good chance you're downloading text that isn't valid UTF-8, use decodeUtf8' instead, which returns an Either.
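For comparison, Python has the same choice: bytes.decode() raises on invalid UTF-8, much like decodeUtf8. A hypothetical helper, decode_utf8_safe, that returns the error as a value instead, roughly in the spirit of decodeUtf8' and its Either result:

```python
from typing import Union

def decode_utf8_safe(raw: bytes) -> Union[str, UnicodeDecodeError]:
    """Return the decoded text, or the error object instead of raising (like an Either)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err

good = decode_utf8_safe("naïve".encode("utf-8"))
bad = decode_utf8_safe(b"\xff\xfe not utf-8")
print(repr(good), type(bad).__name__)
```

The caller then has to check which case it got, which is exactly the point of the total version.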

Haskell Network.HTTP incorrectly downloading image

I'm trying to download images using the Network.HTTP module and having little success.
import Network.HTTP

main = do
  jpg <- get "http://www.irregularwebcomic.net/comics/irreg2557.jpg"
  writeFile "irreg2557.jpg" jpg
  where
    get url = simpleHTTP (getRequest url) >>= getResponseBody
The output file appears in the current directory, but fails to display in Chromium or Ristretto. Ristretto reports "Error interpreting JPEG image file (Not a JPEG file: starts with 0xc3 0xbf)".
writeFile :: FilePath -> String -> IO ()
String. That's your problem, right there. String is for Unicode text. Attempting to store binary data in it will lead to corruption. It's not clear in this case whether the corruption is being done by simpleHTTP or by writeFile, but it's ultimately unimportant: you're using the wrong type, and something is corrupting the data when confronted with bytes that don't make up a valid Unicode encoding.
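The 0xc3 0xbf in Ristretto's message is consistent with this: it is the UTF-8 encoding of the character U+00FF, which is what the JPEG's leading 0xFF byte becomes if it is held as a character and written out through a UTF-8 text writer. A Python analogue of both the broken and the binary-safe write (made-up bytes, and assuming a UTF-8 text encoding on output):

```python
import os
import tempfile

jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE0])   # first bytes of a JPEG (SOI + APP0 markers)

# Wrong: treat each byte as a character and write it through a UTF-8 text writer.
as_text = jpeg_start.decode("latin-1")          # one character per byte, like a String of bytes
fd, path = tempfile.mkstemp(suffix=".jpg")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(as_text)
with open(path, "rb") as f:
    text_written = f.read()

# Right: write the bytes untouched in binary mode.
with open(path, "wb") as f:
    f.write(jpeg_start)
with open(path, "rb") as f:
    binary_written = f.read()
os.remove(path)

print(text_written[:2].hex(), binary_written[:2].hex())
```

The text write doubles every high byte into a two-byte sequence, so the file no longer starts with the JPEG marker.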
As for fixing this, newer versions of the HTTP package are polymorphic in their return type, and can return the raw bytes in a ByteString. You just need to change how you're writing the bytes to the file, so that the compiler won't infer that you want a String.
import qualified Data.ByteString as B
import Network.HTTP
import Network.URI (parseURI)

main = do
  jpg <- get "http://www.irregularwebcomic.net/comics/irreg2557.jpg"
  B.writeFile "irreg2557.jpg" jpg
  where
    get url = let uri = case parseURI url of
                          Nothing -> error $ "Invalid URI: " ++ url
                          Just u -> u
              in simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
The construction to get a polymorphic Request is a bit clumsy. If issue #1 ever gets fixed then using getRequest url will suffice.
