google.cloud translation of PDF documents (en to zh) returns a protobuf Repeated instead of writing a PDF: how to write the PDF with Python? - google-translate

Python code: translating an English PDF to a Chinese PDF, but the translation result is not written to a PDF.

The returned object of response.document_translation.byte_stream_outputs is <class 'proto.marshal.collections.repeated.Repeated'>, which means it is list-like. You need to loop through the bytes_stream object and write each chunk to a file.
See the code snippet below for the implementation:
bytes_stream = response.document_translation.byte_stream_outputs
f = open('2_after.pdf', 'wb')
for data in bytes_stream:
    f.write(data)
f.close()
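For context, a minimal end-to-end sketch of the call that produces that response, assuming the v3 client; PROJECT_ID and the input file name 2_before.pdf are placeholders, not from the original question:

import json
from google.cloud import translate_v3 as translate

client = translate.TranslationServiceClient()
parent = "projects/PROJECT_ID/locations/global"  # PROJECT_ID is a placeholder

with open("2_before.pdf", "rb") as src:  # hypothetical input file name
    response = client.translate_document(
        request={
            "parent": parent,
            "source_language_code": "en",
            "target_language_code": "zh-CN",
            "document_input_config": {
                "content": src.read(),
                "mime_type": "application/pdf",
            },
        }
    )

# byte_stream_outputs is a repeated bytes field, so write each chunk in order
with open("2_after.pdf", "wb") as out:
    for chunk in response.document_translation.byte_stream_outputs:
        out.write(chunk)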

Related

Find sequencing reads with insertions longer than a given length

I'm trying to isolate, from a BAM file, those sequencing reads that have insertions longer than a given number of base pairs (let's say 50 bp). I guess I can do that using the CIGAR string, but I don't know any easy way to parse it and keep only the reads that I want. This is what I need:
Read1 -> 2M1I89M53I2M
Read2 -> 2M1I144M
I should keep only Read1.
Thanks!
Most likely I'm late, but ...
Probably you want the MC tag, not the CIGAR. I use BWA, and the information on insertions is stored in the MC tag. But I may be mistaken.
Use the pysam module to parse the BAM file, and regular expressions to parse the MC tags.
Example code:
import pysam
import re

input_file = pysam.AlignmentFile('input.bam', 'rb')
output_file = pysam.AlignmentFile('found.bam', 'wb', template=input_file)
for Read in input_file:
    try:
        TagMC = Read.get_tag('MC')
    except KeyError:  # read has no MC tag
        continue
    InsertionsTags = re.findall(r'\d+I', TagMC)  # raw string avoids an invalid-escape warning
    if not InsertionsTags:
        continue
    InsertionLengths = [int(Item[:-1]) for Item in InsertionsTags]
    # Keep the read if its longest insertion exceeds the threshold; min() would
    # wrongly drop Read1 from the example, whose insertions are 1I and 53I.
    if max(InsertionLengths) > 50:
        output_file.write(Read)
input_file.close()
output_file.close()
Hope that helps.
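For what it's worth, the SAM spec defines MC as the mate's CIGAR string, so if you want insertions in the read itself you can inspect its own CIGAR directly; pysam exposes it as cigartuples, a list of (operation, length) pairs where operation code 1 is an insertion. A minimal sketch of that variant, reusing the file names from the snippet above:

import pysam

MIN_INSERTION = 50  # keep reads with at least one insertion longer than this

with pysam.AlignmentFile('input.bam', 'rb') as bam_in, \
     pysam.AlignmentFile('found.bam', 'wb', template=bam_in) as bam_out:
    for read in bam_in:
        if read.cigartuples is None:  # unmapped reads carry no CIGAR
            continue
        # cigartuples: list of (operation, length); operation 1 == insertion (BAM_CINS)
        if any(op == 1 and length > MIN_INSERTION for op, length in read.cigartuples):
            bam_out.write(read)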

Web scraping for downloading certain .csv files

I have this question. I need to download certain .csv files from a website, as the title says, and I'm having trouble doing it. I'm very new to programming, and especially to this topic (web scraping).
from bs4 import BeautifulSoup as BS
import requests

DOMAIN = 'https://datos.gob.ar'
URL = 'https://datos.gob.ar/dataset/cultura-mapa-cultural-espacios-culturales/'
FILETYPE = ".csv"

def get_soup(url):
    return BS(requests.get(url).text, 'html.parser')

for link in get_soup(URL).find_all('a'):
    file_link = link.get('href')
    if file_link and FILETYPE in file_link:  # skip anchors without an href
        print(file_link)
This code shows all available .csv files, but I just need to download the ones that end with "biblioteca popular.csv", "cine.csv" and "museos.csv".
Maybe it's a very simple task, but I can't figure it out.
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/456d1087-87f9-4e27-9c9c-1d9734c7e51d/download/biblioteca_especializada.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/01c6c048-dbeb-44e0-8efa-6944f73715d7/download/biblioteca_popular.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/8d0b7f33-d570-4189-9961-9e907193aebc/download/casas_bicentenario.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/4207def0-2ff7-41d5-9095-d42ae8207a5d/download/museos.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/392ce1a8-ef11-4776-b280-6f1c7fae16ae/download/cine.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/87ebac9c-774c-4ef2-afa7-044c41ee4190/download/teatro.csv
You can extract the JavaScript object housing that info, which would otherwise be loaded into the page you see by JavaScript running in the browser. You then need to do some Unicode code point cleaning and string cleaning, and parse the result as JSON. You can use a keyword list to select the desired URLs from it.
Unicode cleaning method by @Mark Tolonen.
import json
import requests
import re

URL = 'https://datos.gob.ar/dataset/cultura-mapa-cultural-espacios-culturales/'
r = requests.get(URL)
search = ["Bibliotecas Populares", "Salas de Cine", "Museos"]

# grab the "#graph" JSON-LD array embedded in the page source and strip layout whitespace
s = re.sub(r'\n\s{2,}', '', re.search(r'"#graph": (\[[\s\S]+{0}[\s\S]+)}}'.format(search[0]), r.text).group(1))
# decode \uXXXX escapes, drop escaped quotes, then parse as JSON
data = json.loads(re.sub(r'\\"', '', re.sub(r'\\u([0-9a-fA-F]{4})', lambda m: chr(int(m.group(1), 16)), s)))

for i in data:
    if 'schema:name' in i:
        name = i['schema:name']
        if name in search:
            print(name)
            print(i['schema:url'])
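To actually save the matched files rather than just print their URLs, a small follow-up sketch; the URLs here are copied from the expected output listed in the question, so in practice you would feed in the schema:url values found above:

import requests
from pathlib import Path

urls = [
    "https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/4207def0-2ff7-41d5-9095-d42ae8207a5d/download/museos.csv",
    "https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/392ce1a8-ef11-4776-b280-6f1c7fae16ae/download/cine.csv",
]

for url in urls:
    filename = url.rsplit('/', 1)[-1]  # last path segment, e.g. museos.csv
    Path(filename).write_bytes(requests.get(url).content)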

How do I turn a file's contents into a dictionary?

I have a function that I want to open .dat files with, to extract data from them, but the problem is I don't know how to turn that data back into a dictionary to store in a variable. Currently, the data in the files is stored like this: "{"x":0,"y":1}" (it takes up only one line of the file, which is just the normal structure of a dictionary).
Below is just the function where I open the .dat file and try to extract stuff from it.
from tkinter import filedialog as fd

def openData():
    file = fd.askopenfile(filetypes=[("Data", ".dat"), ("All Files", ".*")])
    if file is None:  # check before touching file.name, or a cancelled dialog crashes
        return
    filepath = file.name
    with open(filepath, "r") as f:
        contents = dict(f.read())  # this is the line that fails
        print(contents["x"])  # let's say there is a key called "x" in that dictionary
This is the error that I get from it (not because the key "x" is missing from the dict, trust me):
Exception in Tkinter callback
Traceback (most recent call last):
File "...\AppData\Local\Programs\Python\Python39\lib\tkinter\__init__.py", line 1892, in __call__
return self.func(*args)
File "...\PycharmProjects\[this project]\main.py", line 204, in openData
contents = dict(f.read())
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Process finished with exit code 0
Update: I tried using json and it worked, thanks to @match.
import json
from tkinter import filedialog as fd

def openData():
    file = fd.askopenfile(filetypes=[("Data", ".dat"), ("All Files", ".*")])
    if file is None:
        return
    filepath = file.name
    with open(filepath, "r") as f:
        contents = dict(json.load(f))  # json.load already returns a dict here
        print(contents["x"])
You need to parse the data to turn the string back into a data structure. Fortunately, Python provides a function for safely parsing Python literals: ast.literal_eval(). It works here because {"x":0,"y":1} is a valid Python dict literal as well as valid JSON. E.g.:
import ast
...
with open("/path/to/file", "r") as data:
    dictionary = ast.literal_eval(data.read())
Reference: Stack Overflow

Jupyter: suppress %%file magic output

When using IPython's %%file magic to write the content of a notebook cell to a file in the current working directory, is there a way to suppress the Created file ... info text displayed on execution of the cell?
Sometimes creating files in this way is super handy (for example when using a Matlab kernel), but this is a huge problem with respect to version control: I don't want the structure of my local filesystem to be present in code that others work on as well.
Source for this function:
@cell_magic
def writefile(self, line, cell):
    """Write the contents of the cell to a file.

    The file will be overwritten unless the -a (--append) flag is specified.
    """
    args = magic_arguments.parse_argstring(self.writefile, line)
    if re.match(r'^(\'.*\')|(".*")$', args.filename):
        filename = os.path.expanduser(args.filename[1:-1])
    else:
        filename = os.path.expanduser(args.filename)

    if os.path.exists(filename):
        if args.append:
            print("Appending to %s" % filename)
        else:
            print("Overwriting %s" % filename)
    else:
        print("Writing %s" % filename)

    mode = 'a' if args.append else 'w'
    with io.open(filename, mode, encoding='utf-8') as f:
        f.write(cell)
File: /usr/local/lib/python3.6/dist-packages/IPython/core/magics/osm.py
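The excerpt shows the notice comes from plain print() calls, so one possible workaround (a sketch, not a built-in IPython option; writefile_quiet is a made-up name) is to register a custom cell magic that delegates to %%writefile while redirecting its stdout:

import contextlib
import io

from IPython import get_ipython
from IPython.core.magic import register_cell_magic

@register_cell_magic
def writefile_quiet(line, cell):
    """Like %%writefile, but discards the 'Writing ...' status message."""
    with contextlib.redirect_stdout(io.StringIO()):
        get_ipython().run_cell_magic('writefile', line, cell)

After running this once in the notebook, use %%writefile_quiet myscript.m in place of %%file.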

Stuck with string.translate function in Python 3

import os

def rename_files():
    # (1) get file names from a folder
    file_list = os.listdir("my_directory")
    #print(file_list)
    os.chdir("my_directory")
    saved_path = os.getcwd()
    print("Current work directory is " + saved_path)
    os.getcwd()
    # (2) for each file, rename filename
    for file_name in file_list:
        os.rename(file_name, file_name.translate(None, "0123456789"))
    os.chdir("my_directory")

rename_files()
And after this I got an error:
TypeError: translate() takes exactly one argument (2 given)
str.translate in Python 3.x accepts just one argument, i.e. the translation table.
From the docs:

str.translate(table)
    Return a copy of the string in which each character has been mapped through the given translation table.

You can create the required translation table using str.maketrans:

table = str.maketrans(dict.fromkeys('0123456789'))
file_name.translate(table)
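Applied to the rename loop from the question (a sketch; str.maketrans('', '', '0123456789') is the equivalent three-argument deletion form of the table above):

import os

def rename_files():
    file_list = os.listdir("my_directory")
    os.chdir("my_directory")
    # the third argument of str.maketrans lists characters to delete
    table = str.maketrans('', '', '0123456789')
    for file_name in file_list:
        os.rename(file_name, file_name.translate(table))

rename_files()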
