Align text inside a table docx Python - docx

How to align the text to right with python.
from docx import Document
document = Document('example.docx')
????

Try This, if your docx doesn't created or exist yet then use Document() intead of Document("example.docx")
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document("example.docx")
para = doc.add_paragraph('This is text.')
para.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.RIGHT
doc.save("example.docx")
If you want to do this in table so try this
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document("example.docx")
table = doc.add_table(rows=0, columns=2)
row=table.add_row().cells
p=row[0].add_paragraph('left justified text')
p.alignment=WD_ALIGN_PARAGRAPH.RIGHT
doc.save("example.docx")

Related

How to convert div tags to a table?

I want to extract the table from this website https://www.rankingthebrands.com/The-Brand-Rankings.aspx?rankingID=37&year=214
Checking the source of that website, I noticed that somehow the table tag is missing. I assume that this table is a summary of multiple div classes. Is there any easy approach to convert this table to excel/csv? I badly have coding skills/experience...
Appreciate any help
There are a few way to do that. One of which (in python) is (pretty self-explanatory, I believe):
import lxml.html as lh
import csv
import requests
url = 'https://www.rankingthebrands.com/The-Brand-Rankings.aspx?rankingID=37&year=214'
req = requests.get(url)
doc = lh.fromstring(req.text)
headers = ['Position', 'Name', 'Brand Value', 'Last']
with open('brands.csv', 'a', newline='') as fp:
#note the 'a' in there - for 'append`
file = csv.writer(fp)
file.writerow(headers)
#with the headers out of the way, the heavier xpath lifting begins:
for row in doc.xpath('//div[#class="top100row"]'):
pos = row.xpath('./div[#class="pos"]//text()')[0]
name = row.xpath('.//div[#class="name"]//text()')[0]
brand_val = row.xpath('.//div[#class="weighted"]//text()')[0]
last = row.xpath('.//div[#class="lastyear"]//text()')[0]
file.writerow([pos,name,brand_val,last])
The resulting file should be at least close to what you're looking for.

WebScraping for downloading certain .csv files

I have this question. I need to download certain .csv files from a website as the title said, and i'm having troubles doing it. I'm very new on programming and especially with this topic(web scraping)
from bs4 import BeautifulSoup as BS
import requests
DOMAIN = 'https://datos.gob.ar'
URL = 'https://datos.gob.ar/dataset/cultura-mapa-cultural-espacios-culturales/'
FILETYPE = ".csv"
def get_soup(url):
return BS(requests.get(url).text, 'html.parser')
for link in get_soup(URL).find_all('a'):
file_link = link.get('href')
if FILETYPE in file_link:
print(file_link)
this code shows all avaibable .csv files but I just need to download those which end up with "biblioteca popular.csv" , "cine.csv" and "museos.csv"
Maybe it's a very simple task but I can not finding out
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/456d1087-87f9-4e27-9c9c-1d9734c7e51d/download/biblioteca_especializada.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/01c6c048-dbeb-44e0-8efa-6944f73715d7/download/biblioteca_popular.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/8d0b7f33-d570-4189-9961-9e907193aebc/download/casas_bicentenario.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/4207def0-2ff7-41d5-9095-d42ae8207a5d/download/museos.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/392ce1a8-ef11-4776-b280-6f1c7fae16ae/download/cine.csv
https://datos.cultura.gob.ar/dataset/37305de4-3cce-4d4b-9d9a-fec3ca61d09f/resource/87ebac9c-774c-4ef2-afa7-044c41ee4190/download/teatro.csv
You can extract the JavaScript object housing that info which otherwise would be loaded to where you see if by JavaScript running in the browser. You then need to do some Unicode code point cleaning and string cleaning and parse as JSON. You can use a key word list to select from desired urls.
Unicode cleaning method by #Mark Tolonen
import json
import requests
import re
URL = 'https://datos.gob.ar/dataset/cultura-mapa-cultural-espacios-culturales/'
r = requests.get(URL)
search = ["Bibliotecas Populares", "Salas de Cine", "Museos"]
s = re.sub( r'\n\s{2,}', '', re.search(r'"#graph": (\[[\s\S]+{0}[\s\S]+)}}'.format(search[0]), r.text).group(1))
data = json.loads(re.sub(r'\\"', '', re.sub(r'\\u([0-9a-fA-F]{4})',lambda m: chr(int(m.group(1),16)),s)))
for i in data:
if 'schema:name' in i:
name = i['schema:name']
if name in search:
print(name)
print(i['schema:url'])

How can I extract text from Multiple URLs using beautifulsoup?

I am doing lead generation and want to extract text for a handful of URLs. Here is my code to extract for one URL. What should i do if i want to extract for more than one URL and save it into a dataframe?
import urllib
from urllib.request import urlopen as urlopen
from bs4 import BeautifulSoup
url = 'https://www.wdtl.com/'
html = urlopen(url).read()
soup = BeautifulSoup(html)
for script in soup(["script", "style"]):
script.extract() # rip it out
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = '\n'.join(chunk for chunk in chunks if chunk)
print(text)
If I understand you correctly, you can get there using this simplified method. Let's see if it works for you:
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
headers={'User-Agent':'Mozilla/5.0'}
url = 'https://www.wdtl.com/'
resp = requests.get(url,headers = headers)
soup = bs(resp.content, "lxml")
#first, find the links
links = soup.find_all('link',href=True)
#create a list to house the links
all_links= []
#find each link and add it to the list
for link in links:
if 'http' in link['href']: #the soup contains many non-http links; this will remove them
all_links.append(link['href'])
#finally, load the list into a dataframe
df = pd.DataFrame(all_links)

extracting key-value data from javascript json type data with bs4

I am trying to extract some information from HTML of a web page.
But neither regex method nor list comprehension method works.
At http://bitly.kr/RWz5x, there is some key called encparam enclosed in getjason from a javascript tag which is 49th from all script elements of the page.
Thank you for your help in advance.
sam = requests.get('http://bitly.kr/RWz5x')
#html = sam.text
html=sam.content
soup = BeautifulSoup(html, 'html.parser')
scripts = soup.find_all('script')
#your_script = [script for script in scripts if 'encparam' in str(script)][0]
#print(your_script)
#print(scripts)
pattern = re.compile("(\w+): '(.*?)'")
fields = dict(re.findall(pattern, scripts.text))
Send your request to the following url which you can find in the sources tab:
import requests
from bs4 import BeautifulSoup as bs
import re
res = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
soup = bs(res.content, 'lxml')
r = re.compile(r"encparam: '(.*)'")
data = soup.find('script', text=r).text
encparam = r.findall(data)[0]
print(encparam)
It is likely you can avoid bs4 altogether:
import requests
import re
r = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
p = re.compile(r"encparam: '(.*)'")
encparam = p.findall(r.text)[0]
print(encparam)
If you actually want the encparam part in the string:
import requests
import re
r = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
p = re.compile(r"(encparam: '\w+')")
encparam = p.findall(r.text)[0]
print(encparam)

How to export an R citation output into endnote?

How can I easily import citations from R into endnote obtained by for instance
citation("ggplot2")
Is there a good workflow for this or do I manually have to do it?
How automated this can be will depend on what Endnote can import. It seems BibTeX import is not currently possible out of the box, and requires some extra software. See for example: http://www.lib.uts.edu.au/content/faq/how-can-i-import-bibliography-endnote-bibtex-latex-what-about-converting-other-way-endno
Read ?bibentry and in particular the argument style and the Details section. See if Endnote can import data in any of those formats? I doubt it, but I have never used Endnote.
If not, we can go the BibTeX route if you install something that allows you to import BibTeX into Endnote.
> utils:::print.bibentry(citation("ggplot2"), style = "Bibtex")
#Book{,
author = {Hadley Wickham},
title = {ggplot2: elegant graphics for data analysis},
publisher = {Springer New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://had.co.nz/ggplot2/book},
}
To get this out into a file for passing to an import utility, you can use capture.output()
capture.output(utils:::print.bibentry(citation("ggplot2"), style = "Bibtex"),
file = "endnote_import.bib")
Which gives a file with the following content:
$ cat endnote_import.bib
#Book{,
author = {Hadley Wickham},
title = {ggplot2: elegant graphics for data analysis},
publisher = {Springer New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://had.co.nz/ggplot2/book},
}
which you should be able to import with third party tools.

Resources