I have a free account on PythonAnywhere, where I am trying to run the following script. It works fine locally.
I am wondering whether the error I get has a technical cause, or whether PythonAnywhere simply forbids scraping certain websites from its platform.
Do you know of other free hosts where I would be allowed to scrape anything?
import requests
from bs4 import BeautifulSoup as bs

def scrapMarketwatch(address):
    # creating formatting data from scrapdata
    r = requests.get(address)
    c = r.content
    sup = bs(c, "html.parser")
    print(sup)

scrapMarketwatch('http://www.marketwatch.com/investing/future/sp%20500%20futures')
print('\n\n\n PARAGRAPH \n SPACE \n\n\n')
scrapMarketwatch('https://www.bloomberg.com/quote/USDJPY:CUR')
I get the following error:
  File "/usr/local/lib/python3.6/dist-packages/requests/packages/urllib3/util/retry.py", line 376, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bloomberg.com', port=443): Max retries exceeded with url: /quote/USDJPY:CUR (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sylvester83/scrapit/try2.py", line 20, in <module>
    scrapMarketwatch('https://www.bloomberg.com/quote/USDJPY:CUR')
  File "/home/sylvester83/scrapit/try2.py", line 10, in scrapMarketwatch
    r = requests.get(address)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 485, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.bloomberg.com', port=443): Max retries exceeded with url: /quote/USDJPY:CUR (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))
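PythonAnywhere free accounts can only access external sites that are on PythonAnywhere's whitelist; requests to any other site are refused by the proxy, which is where the 403 Forbidden in your traceback comes from. The permitted sites are ones that offer a machine-readable API. You can ask for other sites to be added, but not if you are going to scrape them.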
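One way to tell a proxy refusal apart from a failure at the target site is to inspect the exception text. The following is a minimal sketch, assuming the message has the same shape as the one in the traceback from the question. The helper name `looks_like_proxy_block` is hypothetical, and the string match is a heuristic rather than anything provided by `requests`.

```python
def looks_like_proxy_block(error_text):
    """Heuristically detect a proxy refusal from an exception message.

    Returns True when the text mentions both a proxy error and a
    403 tunnel failure, as in the traceback from the question.
    """
    text = error_text.lower()
    return "proxyerror" in text and "tunnel connection failed: 403" in text


# Message copied from the traceback in the question.
message = (
    "HTTPSConnectionPool(host='www.bloomberg.com', port=443): Max retries "
    "exceeded with url: /quote/USDJPY:CUR (Caused by ProxyError('Cannot "
    "connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))"
)
print(looks_like_proxy_block(message))  # prints True
```

In a real script, the text would typically come from `str(exc)` inside an `except requests.exceptions.ProxyError as exc:` block wrapped around `requests.get(address)`.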
I'm trying to import data from MongoDB into Azure Machine Learning with a Python script. I use the following script:
import pymongo as pymongo
import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    client = pymongo.MongoClient("SERVER:USERNAME:PASSWORD")
    db = client['DATABASE']
    coll = db['COLLECTION']
    cursor = coll.find().limit(10)
    df = pd.DataFrame(list(cursor))
    return df,
This gives me the following error:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 199, in batch
odfs = mod.azureml_main(*idfs)
File "C:\temp\416f67ae321a4f7b9a2d5eda63aa127c.py", line 23, in azureml_main
df = pd.DataFrame(list(cursor))
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 977, in next
if len(self.__data) or self._refresh():
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 902, in _refresh
self.__read_preference))
File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 813, in __send_message
**kwargs)
File "C:\pyhome\lib\site-packages\pymongo\mongo_client.py", line 728, in _send_message_with_response
server = topology.select_server(selector)
File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 121, in select_server
address))
File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 97, in select_servers
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: SERVERNAME:XXXXX:[WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond,SERVERNAME:XXXXX: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond,SERVERNAME:XXXXX: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Process returned with non-zero exit code 1
Is this caused by not whitelisting any IP addresses? I can't find any information on what kind of IP address Azure ML connects from. Is there a workaround for this issue?
That error has nothing to do with IP whitelisting; it means the client could not connect to your MongoDB server. Check your connection string and that your server is running. The connection string should look something like
mongodb://username:password@server:27017/yourdatabase?authSource=admin
First check that it works from your chosen command prompt / shell using
mongo "mongodb://username:password@server:27017/yourdatabase?authSource=admin"
then change your python connection to:
client = pymongo.MongoClient("<working connection string>")
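If the username or password contains characters such as `@` or `:`, they need to be percent-escaped before they go into a `mongodb://user:password@host` URI. Here is a minimal sketch of assembling such a string with the standard library. The function name `build_mongo_uri` and all argument values are placeholders for illustration, not part of pymongo.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database, auth_source="admin"):
    """Assemble a mongodb:// URI, percent-escaping the credentials."""
    return "mongodb://{user}:{pwd}@{host}:{port}/{db}?authSource={auth}".format(
        user=quote_plus(username),
        pwd=quote_plus(password),
        host=host,
        port=port,
        db=database,
        auth=auth_source,
    )


# Placeholder values, not real credentials.
uri = build_mongo_uri("username", "p@ss:word", "server", 27017, "yourdatabase")
print(uri)
# prints mongodb://username:p%40ss%3Aword@server:27017/yourdatabase?authSource=admin
```

The resulting string can then be passed to `pymongo.MongoClient(uri)` once it has been confirmed to work from the shell.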
I have set up a Raspberry Pi web server on my school LAN. Other people connect to it with Arduinos, and sometimes when they connect I get this error:
Exception happened during processing of request from ('172.17.17.66', 49153)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 310, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
error: [Errno 104] Connection reset by peer
Can someone tell me what this means? Is it a problem with my server or with their socket? If needed, I can post my code.
The last line:
error: [Errno 104] Connection reset by peer
means the client dropped the connection while the server was still reading the request. I'd look into the Arduino code first.
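Since the reset originates on the client side, the server can reasonably note it and carry on instead of printing a full traceback. Below is a minimal sketch using Python 3's `http.server`, not the Python 2.7 `BaseHTTPServer` shown in the traceback. The class name `QuietResetHTTPServer` and its `reset_clients` attribute are made up for illustration.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietResetHTTPServer(HTTPServer):
    """HTTPServer that records client resets instead of printing a traceback."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Addresses of clients that dropped the connection mid-request.
        self.reset_clients = []

    def handle_error(self, request, client_address):
        # socketserver calls this from inside an except block, so
        # sys.exc_info() describes the exception raised by the handler.
        exc_type = sys.exc_info()[0]
        if exc_type is not None and issubclass(exc_type, ConnectionResetError):
            self.reset_clients.append(client_address)
            return
        # Anything else still gets the default traceback output.
        super().handle_error(request, client_address)
```

It would be started like any other `HTTPServer`, for example `QuietResetHTTPServer(("0.0.0.0", 8000), BaseHTTPRequestHandler).serve_forever()`, with your own handler class in place of the base one. Inspecting `reset_clients` afterwards shows which devices are dropping connections.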
I'm running Airflow in a clustered environment on two AWS EC2 instances: one for the master and one for the worker. The worker node, though, periodically throws this error when running "$airflow worker":
[2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
Traceback (most recent call last):
File "/usr/bin/airflow", line 27, in <module>
args.func(args)
File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
run_job.run()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
self._execute()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
self.heartbeat()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
self.heartbeat_callback(session=session)
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
[2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
subprocess.check_call(command, shell=True)
File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed
When this error occurs the task is marked as failed on Airflow and thus fails my DAG when nothing actually went wrong in the task.
I'm using Redis as my queue and PostgreSQL as my meta-database. Both are external AWS services. I'm running all of this in my company's environment, which is why the full name of the server is ip-1.2.3.4.eco.tanonprod.comanyname.io. It looks like Airflow wants this full name somewhere, but I have no idea where to set it so that it gets ip-1.2.3.4.eco.tanonprod.comanyname.io instead of just ip-1.2.3.4.
The really weird thing about this issue is that it doesn't always happen. It seems to just randomly happen every once in a while when I run the DAG. It's also occurring on all of my DAGs sporadically so it's not just one DAG. I find it strange though how it's sporadic because that means other task runs are handling the IP address for whatever this is just fine.
Note: I've changed the real IP address to 1.2.3.4 for privacy reasons.
Answer:
https://github.com/apache/incubator-airflow/pull/2484
This is exactly the problem I am having and other Airflow users on AWS EC2-Instances are experiencing it as well.
The hostname is set when the task instance runs, via self.hostname = socket.getfqdn(), where socket is Python's standard-library socket module.
The comparison that triggers this error is:
fqdn = socket.getfqdn()
if fqdn != ti.hostname:
    logging.warning("The recorded hostname {ti.hostname} "
                    "does not match this instance's hostname "
                    "{fqdn}".format(**locals()))
    raise AirflowException("Hostname of job runner does not match")
It seems like the hostname on the EC2 instance is changing while the worker is running. Perhaps try manually setting the hostname as described here https://forums.aws.amazon.com/thread.jspa?threadID=246906 and see if that sticks.
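To see why a short name and a fully qualified name trip this check, it can help to reproduce the strict string comparison outside Airflow. The sketch below assumes nothing about Airflow internals beyond the snippet quoted above. `check_hostname` is a hypothetical helper, and it raises a plain `RuntimeError` rather than `AirflowException`.

```python
import socket


def check_hostname(recorded_hostname, get_hostname=socket.getfqdn):
    """Raise if the current hostname differs from the recorded one.

    Uses the same exact-match comparison as the quoted snippet. The
    `get_hostname` argument is injectable so the check can be exercised
    without renaming the machine.
    """
    current = get_hostname()
    if current != recorded_hostname:
        raise RuntimeError(
            "recorded hostname {!r} does not match current hostname {!r}".format(
                recorded_hostname, current
            )
        )
    return current


# Compare what the two standard-library calls report on this machine.
print("getfqdn():    ", socket.getfqdn())
print("gethostname():", socket.gethostname())
```

If the two printed values differ, or if `getfqdn()` returns different results across runs on the worker, that would be consistent with the intermittent mismatch described in the question.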
I had a similar problem on my Mac. Setting hostname_callable = socket:gethostname in airflow.cfg fixed it.
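The value `socket:gethostname` reads as a module name and an attribute name separated by a colon. As an illustration of how a string in that form can be resolved to a function, here is a sketch of the general pattern. It is not Airflow's own loader, and `resolve_callable` is a hypothetical helper.

```python
import importlib


def resolve_callable(spec):
    """Turn a 'module:attribute' string into the object it names."""
    module_name, _, attr_name = spec.partition(":")
    if not module_name or not attr_name:
        raise ValueError("expected 'module:attribute', got {!r}".format(spec))
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Resolve the value from the answer and call the resulting function.
get_hostname = resolve_callable("socket:gethostname")
print(get_hostname())
```

With this setting the worker would report the short hostname from `socket.gethostname()` instead of the value from `socket.getfqdn()`.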
Personally when running on my Mac, I found that I got similar errors to this when the Mac would sleep while I was running a long job. The solution was to go into System Preferences -> Energy Saver and then check "Prevent computer from sleeping automatically when the display is off."
I'm having trouble establishing a WebSocket connection between a client and a server (both on localhost).
I'm using Nginx, uWSGI and Flask.
I'm getting the following error printed out in the uWSGI log when I try to establish a WebSocket connection (note that normal GET and POST works):
you need to build uWSGI with SSL support to use the websocket handshake api function !!!
Traceback (most recent call last):
File "/Users/user/Documents/Development/virtualenv/flask/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/user/Documents/Development/virtualenv/flask/lib/python2.7/site-packages/flask_uwsgi_websocket/_gevent.py", line 63, in __call__
environ.get('HTTP_ORIGIN', ''))
IOError: unable to complete websocket handshake
My question is therefore: How can I build uWSGI with SSL support?
We have a server running Plone 4 that hosts many of our sites. On one of them, we have just noticed that PloneFormGen is not sending any mail.
I have tried:
Recreating the form
Deleting & recreating the mailer in the form
Altering the site from address (avoiding any spam issues like sending to and from the same address)
All the other sites on this server can send mail fine from PFG. I am not sure which log files to check, but below are the last few lines of my events.log file, which, as far as I can tell, mention missing adapters. Can someone translate this for me?
------
2012-07-11T15:13:38 WARNING PloneFormGen Designated action adapter 'formsavedataadapter.2012-07-03.5018819612' is missing; ignored.
------
2012-07-11T15:13:38 WARNING PloneFormGen Designated action adapter 'formmaileradapter.2012-07-11.9678428439' is missing; ignored.
------
2012-07-11T15:13:38 WARNING PloneFormGen Designated action adapter 'formmaileradapter.2012-07-11.9935785303' is missing; ignored.
------
2012-07-11T15:13:38 ERROR MailDataManager [Errno -2] Name or service not known
Traceback (most recent call last):
File "/usr/local/Plone/buildout-cache/eggs/Plone-4.0.4-py2.6.egg/Products/CMFPlone/patches/sendmail.py", line 9, in _catch
return func(*args, **kwargs)
File "/usr/local/Plone/buildout-cache/eggs/zope.sendmail-3.5.2-py2.6.egg/zope/sendmail/mailer.py", line 46, in send
connection = self.smtp(self.hostname, str(self.port))
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 239, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 295, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 273, in _get_socket
return socket.create_connection((port, host), timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/socket.py", line 547, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
------
2012-07-11T15:13:38 ERROR MailDataManager [Errno -2] Name or service not known
Traceback (most recent call last):
File "/usr/local/Plone/buildout-cache/eggs/Plone-4.0.4-py2.6.egg/Products/CMFPlone/patches/sendmail.py", line 9, in _catch
return func(*args, **kwargs)
File "/usr/local/Plone/buildout-cache/eggs/zope.sendmail-3.5.2-py2.6.egg/zope/sendmail/mailer.py", line 46, in send
connection = self.smtp(self.hostname, str(self.port))
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 239, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 295, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 273, in _get_socket
return socket.create_connection((port, host), timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/socket.py", line 547, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
------
2012-07-11T15:30:18 INFO CMFFormController /usr/local/Plone/buildout-cache/eggs/Products.PloneFormGen-1.7a2-py2.6.egg/Products/PloneFormGen/skins/PloneFormGen/fg_base_view_p3.cpt: No default action specified for status success, content type ANY. Users of IE can submit pages using the return key, resulting in no button in the REQUEST. Please specify a default action for this case.
------
2012-07-11T15:30:18 INFO CMFFormController /usr/local/Plone/buildout-cache/eggs/Products.PloneFormGen-1.7a2-py2.6.egg/Products/PloneFormGen/skins/PloneFormGen/fg_embedded_view_p3.cpt: No default action specified for status success, content type ANY. Users of IE can submit pages using the return key, resulting in no button in the REQUEST. Please specify a default action for this case.
------
2012-07-11T15:30:32 INFO CMFFormController /usr/local/Plone/buildout-cache/eggs/Products.PloneFormGen-1.7a2-py2.6.egg/Products/PloneFormGen/skins/PloneFormGen/fg_base_view_p3.cpt: No default action specified for status success, content type ANY. Users of IE can submit pages using the return key, resulting in no button in the REQUEST. Please specify a default action for this case.
------
2012-07-11T15:30:32 INFO CMFFormController /usr/local/Plone/buildout-cache/eggs/Products.PloneFormGen-1.7a2-py2.6.egg/Products/PloneFormGen/skins/PloneFormGen/fg_embedded_view_p3.cpt: No default action specified for status success, content type ANY. Users of IE can submit pages using the return key, resulting in no button in the REQUEST. Please specify a default action for this case.
------
2012-07-11T15:39:18 WARNING PloneFormGen Designated action adapter 'formsavedataadapter.2012-07-03.5018819612' is missing; ignored.
------
2012-07-11T15:39:18 WARNING PloneFormGen Designated action adapter 'formmaileradapter.2012-07-11.9678428439' is missing; ignored.
------
2012-07-11T15:39:18 WARNING PloneFormGen Designated action adapter 'formmaileradapter.2012-07-11.9935785303' is missing; ignored.
------
2012-07-11T15:39:18 ERROR MailDataManager [Errno -2] Name or service not known
Traceback (most recent call last):
File "/usr/local/Plone/buildout-cache/eggs/Plone-4.0.4-py2.6.egg/Products/CMFPlone/patches/sendmail.py", line 9, in _catch
return func(*args, **kwargs)
File "/usr/local/Plone/buildout-cache/eggs/zope.sendmail-3.5.2-py2.6.egg/zope/sendmail/mailer.py", line 46, in send
connection = self.smtp(self.hostname, str(self.port))
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 239, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 295, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 273, in _get_socket
return socket.create_connection((port, host), timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/socket.py", line 547, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
------
2012-07-11T15:39:18 ERROR MailDataManager [Errno -2] Name or service not known
Traceback (most recent call last):
File "/usr/local/Plone/buildout-cache/eggs/Plone-4.0.4-py2.6.egg/Products/CMFPlone/patches/sendmail.py", line 9, in _catch
return func(*args, **kwargs)
File "/usr/local/Plone/buildout-cache/eggs/zope.sendmail-3.5.2-py2.6.egg/zope/sendmail/mailer.py", line 46, in send
connection = self.smtp(self.hostname, str(self.port))
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 239, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 295, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/smtplib.py", line 273, in _get_socket
return socket.create_connection((port, host), timeout)
File "/usr/local/Plone/Python-2.6/lib/python2.6/socket.py", line 547, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
I can provide other information please just tell me where to look if possible.
Thanks.
This traceback means that your server is unable to connect to the mail server:
gaierror: [Errno -2] Name or service not known
It's an operating-system-level failure.
DNS resolution of the mail server name failed. The reason could be
A firewalled server
Misconfigured DNS
A mistyped mail server name
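The name lookup can be tested on its own, outside Plone, with the same `getaddrinfo` call that fails in the traceback. This is a minimal sketch. `can_resolve` is a hypothetical helper, and the hostname passed to it should be whatever is configured as the site's SMTP host.

```python
import socket


def can_resolve(hostname, port=25):
    """Return True if the operating system's resolver can look up `hostname`."""
    try:
        socket.getaddrinfo(hostname, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # Same exception class as in the log: the name could not be resolved.
        return False
    return True


# "localhost" is only a stand-in; substitute the configured mail server name.
print(can_resolve("localhost"))
```

Running this on the Plone server with the mail host from the broken site, and again with the host from a working site, should show whether the configured name is the problem.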
Perhaps this site has an incorrect mail server configured at Site Setup>Mail?