I have searched multiple articles but am unable to get IPython (Python 2.7) to export data to a CSV file. I do not receive an error message to troubleshoot the specific problem, and when I include print(new_links) I get the desired output; thus, the issue is writing to the CSV.
Any suggestions on next steps are much appreciated!
Thanks!
import csv
import requests
import lxml.html as lh
url = 'http://wwwnc.cdc.gov/travel/destinations/list'
page = requests.get(url)
doc = lh.fromstring(page.content)
new_links = []
for link_node in doc.iterdescendants('a'):
    try:
        new_links.append(link_node.attrib['href'])
    except KeyError:
        pass
cdc_part1 = open("cdc_part1.csv", 'wb')
wr = csv.writer(cdc_part1, dialect='excel')
wr.writerow(new_links)
Check to make sure new_links is a list of lists. If it is, write one inner list per row:
for row in new_links:
    wr.writerow(row)
In your code, though, new_links is a flat list of strings, so wr.writerow(new_links) puts every link into a single row, and the loop above would split each link into individual characters; wrap each item (wr.writerow([row])) to get one link per row.
I would also check the open statement's file path and mode, and close the file (cdc_part1.close()) so the output is flushed to disk. Check if you can get it to work with 'w'.
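For reference, here is a minimal sketch of both write patterns (the links are placeholder data; note that in Python 3 the csv module wants text mode 'w' with newline='' rather than Python 2's 'wb'):

```python
import csv

new_links = ["/travel/a", "/travel/b", "/travel/c"]  # placeholder data

# Python 3: open in text mode with newline='' (the Python 2 idiom is 'wb').
# The with block also guarantees the file is closed and flushed.
with open("cdc_part1.csv", "w", newline="") as f:
    wr = csv.writer(f, dialect="excel")
    wr.writerow(new_links)       # all links in one row
    for link in new_links:
        wr.writerow([link])      # one link per row
```

The with block removes the most common cause of an empty CSV with no error: the file never being closed, so the buffer is never flushed.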
I encountered an issue while using the code from https://codingandfun.com/scraping-sec-edgar-python/
I tried to contact the authors of the website, but that didn't work out. I am hoping to get some help here; thank you in advance.
It seems that when I get to the print(download) step, the output is some weird special characters instead of organized firm URLs. Is there something wrong with the SEC master.idx? Could someone help me identify the issue?
Here is the code:
import bs4 as bs
import requests
import pandas as pd
import re
company = 'Facebook Inc'
filing = '10-Q'
year = 2020
quarter = 'QTR3'
#get name of all filings
download = requests.get(f'https://www.sec.gov/Archives/edgar/full-index/{year}/{quarter}/master.idx').content
download = download.decode("utf-8").split('\n')
print (download)
You need to declare your user-agent as described here; otherwise you will download an HTML page prompting you to do so.
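A minimal sketch of the fix, reusing the asker's URL pattern; the contact string is a placeholder you should replace with your own name and email:

```python
import requests

# SEC EDGAR returns an HTML notice instead of the index when no
# descriptive User-Agent is supplied. The contact info below is a placeholder.
HEADERS = {"User-Agent": "Sample Company admin@example.com"}

def fetch_master_index(year, quarter):
    """Download the quarterly master index and split it into lines."""
    url = f"https://www.sec.gov/Archives/edgar/full-index/{year}/{quarter}/master.idx"
    download = requests.get(url, headers=HEADERS, timeout=30).content
    return download.decode("utf-8").split("\n")
```

Calling fetch_master_index(2020, 'QTR3') then replaces the bare requests.get(...) line in the question.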
I'm a newbie to Beautiful Soup. Can anyone suggest how to scrape the Excel files for the past 14 days? My understanding is to loop over the dates and save each file. Thanks
https://www.hkexnews.hk/reports/sharerepur/sbn.asp
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.hkexnews.hk/reports/sharerepur/sbn.asp")
soup=BeautifulSoup(res.text,"lxml")
Now we find the table with the find method, use find_all to get all td tags, and append each link to the list lst.
main_data = soup.find("table").find_all("td")
lst = []
for data in main_data:
    try:
        url = data.find("a").get('href')[1:]
        main_url = "https://www.hkexnews.hk/reports/sharerepur" + url
        lst.append(main_url)
    except AttributeError:
        pass
Now iterate through lst and request each URL to download the data to an Excel file.
for i, url in enumerate(lst):
    resp = requests.get(url)
    with open(f'test_{i}.xls', 'wb') as output:
        output.write(resp.content)
    print(i)
Image: (File being created in Local)
I came across a problem when I tried to use xlrd to import an .xls file and create a dataframe in Python.
Here is my file format:
xls file format
When I run:
import os
import pandas as pd
import xlrd
for filename in os.listdir("."):
    if filename.startswith("report_1"):
        df = pd.read_excel(filename)
It shows "XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Report g'".
I am pretty sure there is nothing wrong with xlrd (version 1.0.0), because when I remove the first row the dataframe can be created.
Is there any way I can load the original file format?
Try the following, which skips the title line that sits above the header row (header=0 is already the default, so it would not change anything):
df = pd.read_excel(filename, skiprows=1)
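skiprows behaves the same across the pandas readers, so its effect can be checked on an in-memory CSV with a synthetic title line standing in for the "Report g..." row:

```python
import io
import pandas as pd

# Synthetic file: a report title line sits above the real header row.
raw = "Report generated 2020-01-01\ncol_a,col_b\n1,2\n3,4\n"

# skiprows=1 drops the title line so the header is parsed correctly;
# the same parameter is accepted by pd.read_excel.
df = pd.read_csv(io.StringIO(raw), skiprows=1)
print(list(df.columns))  # ['col_a', 'col_b']
```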
I am trying to read a simple stock page with the following code. The last line returns an error. I have double-checked that the URL works and have also tried multiple URLs as a check. Any help please?
import urllib.request
url = "https://www.google.com"
data = urllib.request.urlopen(url).read()
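Without the exact traceback it is hard to say what fails, but one common cause is the default urllib User-Agent being rejected. A hedged sketch that attaches a browser-like header (the value is an arbitrary example) and surfaces the underlying reason:

```python
import urllib.request
import urllib.error

def fetch(url):
    """Fetch a URL, printing the underlying reason if it fails."""
    # Some sites reject urllib's default User-Agent; this value is an example.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        return urllib.request.urlopen(req).read()
    except urllib.error.URLError as exc:
        # exc.reason often pinpoints the cause (SSL verification, DNS, 403...)
        print("fetch failed:", getattr(exc, "reason", exc))
        raise
```

Running your URL through fetch should at least show whether the failure is HTTP-level (e.g. 403) or an SSL/DNS problem.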
The following code was automatically generated by the RStudio "Import Dataset" function. However, when I hit the Import button, I got a different result. The "Data Preview" section of Import Dataset recognised my login credentials and returned 10 years of data, but when the data was imported it didn't recognise the credentials and returned 5 years of data. Is there any way to resolve this so that importing gives the same result as the Preview?
library(readr)
ReportProcess4CSV <-
read_csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?reportType=is&period=12&dataType=A&order=asc&columnYear=10&rounding=3&denominatorView=raw&t=MSFT",
skip = 1)
View(ReportProcess4CSV)