Using Python mechanize to log in to websites - web-scraping

I am using Python mechanize to log in to the social website https://www.pinterest.com/login/, and I run into this error:
HTTP Error 403: request disallowed by robots.txt
Here is my code:
import urllib
import re
import mechanize
browser=mechanize.Browser()
browser.open("https://www.pinterest.com/login/")
browser.select_form(nr=0)
browser.form['username_or_email']='xxx@example.com'
browser.form['password']='xxx'
browser.submit()
print browser
What is wrong? Thank you!
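The 403 in this error is raised on the client side: by default mechanize fetches the site's robots.txt and refuses to open URLs that the file disallows. A minimal sketch of that kind of check follows, using the standard library's urllib.robotparser rather than mechanize itself. The rules in SAMPLE_RULES and the helper name is_allowed are made up for illustration; they are not Pinterest's actual robots.txt or part of mechanize.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not Pinterest's real robots.txt.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /login/",
]


def is_allowed(url, rules=SAMPLE_RULES, user_agent="*"):
    """Return True if the given robots.txt rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines instead of downloading a file
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("https://www.pinterest.com/login/"))
    print(is_allowed("https://www.pinterest.com/about/"))
```

In mechanize itself, this check can be switched off by calling browser.set_handle_robots(False) before browser.open().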

Related

push_notebook does not work in Google Colab Jupyter Notebook

I am using Bokeh on Google Colab. Has anybody used push_notebook in a Google Colab Jupyter notebook? I am trying to run the following code in a Jupyter notebook on Google Colab, but it gets stuck on the push_notebook() call:
from ipywidgets import interact
import numpy as np
from bokeh.io import push_notebook, show, output_notebook
from bokeh.plotting import figure
output_notebook()
x = np.linspace(0, 2*np.pi, 2000)
y = np.sin(x)
p = figure(title="ff", plot_height=300, plot_width=600, y_range=(-5, 5))
r = p.line(x, y, color="red", line_width=2)
def update(f, w=1, A=1, phi=0):
    print("fff")
    if f == "sin":
        func = np.sin
    elif f == "cos":
        func = np.cos
    elif f == "tan":
        func = np.tan
    r.data_source.data['y'] = A * func(w*x + phi)
    push_notebook()
show(p, notebook_handle=True)
interact(update, f=["sin", "cos", "tan"], w=(0, 100), A=(1, 5), phi=(0, 20, 0.1))
Can anybody suggest what is wrong with the code and how it can be run on Google Colab?
push_notebook does not and cannot work on Google Colab, because Google's notebook implementation does not allow the necessary websocket connections to be opened. Nothing can be done about this unless Google makes changes on their end.
ref: https://github.com/bokeh/bokeh/issues/9302

How to resolve: ImportError: cannot import name 'HttpNtlmAuth' in python3 script?

I have installed both requests and requests_ntlm modules using "sudo python3 -m pip install requests" (and requests_ntlm respectively) and both installs were successful.
When I then attempt to do "from requests import HttpNtlmAuth", I get an error stating "cannot import name 'HttpNtlmAuth'". I do not get this error on my "import requests" line.
When I do a "sudo python3 -m pip list", I see both are installed and are the latest versions.
I've not encountered this error before, only "cannot import module", so I'm unfamiliar with how to resolve this.
EDIT 1: Additional information. When I run this script from command line as "sudo", it works. Because I am running my python script from within a PHP file using "exec", I don't particularly want to run this as a root user. Is there a way around this, or possibly running the exec statement with sudo?
The HttpNtlmAuth class is in the requests_ntlm package, not in requests, so you'll need:
import requests
from requests_ntlm import HttpNtlmAuth
Then you'll be able to instantiate your authentication:
session = requests.Session()
session.auth = HttpNtlmAuth('domain\\username','password')
session.get(url)  # url is the NTLM-protected address you want to request
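The EDIT in the question notes that the script behaves differently under sudo than when launched from PHP's exec. One possible cause (an assumption, not something the question confirms) is that the two runs use a different interpreter or see a different site-packages directory. A small standard-library sketch for checking that; describe_environment is a hypothetical helper name:

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Report which interpreter is running and where each module is found."""
    report = {"executable": sys.executable, "modules": {}}
    for name in module_names:
        # find_spec returns None when a top-level module cannot be imported
        spec = importlib.util.find_spec(name)
        report["modules"][name] = spec.origin if spec else None
    return report


if __name__ == "__main__":
    info = describe_environment(["requests", "requests_ntlm"])
    print(info["executable"])
    for name, location in info["modules"].items():
        print(name, "->", location or "NOT FOUND")
```

Running it once from the shell and once through exec, then comparing the two outputs, shows whether both runs see the same installation.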

How to port a Python urllib2 app (a web scraper) that uses Beautiful Soup 4 to use the requests package instead

I am trying to update a web scraper app that uses Beautiful Soup 4 in Python 3 in Anaconda to use the requests package instead of urllib, urllib2 and urllib3.
urllib and urllib2 don't exist in the Anaconda channels, and from what I have read the requests package has made urllib and urllib2 obsolete. I am still rather new to Python programming for web scraping, and don't yet fully understand all the concepts and internal subtleties of these 4 packages.
I replaced "urllib2.urlopen()" with "requests.get()" as follows:
import requests
from bs4 import BeautifulSoup
# replace the following line with "page = requests.get(url)"
# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page,"lxml")
I get the following error message with no explanation in the bs4 module:
File "C:\ProgramData\Anaconda3\lib\site-packages\bs4\__init__.py", line 246, in __init__
elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
This error message puts me deep into the bowels of __init__.py in bs4.
I cannot find an explanation of how to port urllib or urllib2 code to requests with Beautiful Soup 4.
Can anyone provide an explicit guide on how to port urllib / urllib2 apps to use requests with beautiful soup in Python 3?
Anaconda / conda does not import urllib or urllib2 into Python 3 environments.
Thank you.
Rich
The error occurs because you are passing the Response object itself to BeautifulSoup, which expects the HTML markup. Pass page.text instead of the response object:
# page = urllib2.urlopen(url)
page = requests.get(url)
soup_page = BeautifulSoup(page.text, "lxml")
You may also want to read the requests documentation.

How can I send HTTP Basic Authentication headers in RSelenium? [duplicate]

I am trying to enter credentials in the authentication prompt of the URL given below, but the code below gives me an error. Can you help me with this?
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
url = "http://the-internet.herokuapp.com/basic_auth"
driver.get(url)
time.sleep(5)
alert = driver.switch_to.alert
alert.authenticate('admin','admin')
time.sleep(4)
alert.accept()
I have also tried:
ActionChains(driver).send_keys("admin").send_keys(Keys.TAB).send_keys("admin").perform()
This does not work either.
With Selenium 3.4.0, geckodriver v0.18.0, Mozilla Firefox 53.0 and Python 3.6.1, you can bypass the Basic Authentication popup by embedding the username and password in the URL itself, as follows.
This solution opens the URL http://the-internet.herokuapp.com/basic_auth and authenticates with a valid username and password:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary('C:\\Program Files\\Mozilla Firefox\\firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary, executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe")
driver.get("http://admin:admin@the-internet.herokuapp.com/basic_auth")
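Embedding credentials in a URL uses the form scheme://user:password@host/path. If the username or password contains characters such as @, : or /, they need to be percent-encoded first. A minimal sketch using only the standard library; with_credentials is a hypothetical helper, not part of Selenium:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, username, password):
    """Return url with percent-encoded Basic Auth credentials in front of the host."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # safe="" makes quote() also encode "/", so "@", ":" and "/" cannot break the URL
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(with_credentials("http://the-internet.herokuapp.com/basic_auth", "admin", "admin"))
```

The returned string can be passed to driver.get() in place of a hand-written URL.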
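Another approach, on Windows, is to type into the prompt through WScript.Shell:
import time
import win32com.client
def test_1_authentication(self):
    self.driver.get("https://the-internet.herokuapp.com/basic_auth")
    # Send keystrokes to the active window, which is the authentication prompt
    shell = win32com.client.Dispatch("WScript.Shell")
    shell.Sendkeys("admin")
    time.sleep(3)
    shell.Sendkeys("{TAB}")
    time.sleep(3)
    shell.Sendkeys("admin")
    time.sleep(3)
    shell.Sendkeys("{ENTER}")
    time.sleep(3)
The code above also worked for me :)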

py2app: Compiles app but app has error on opening

I am working on a Python application in Python3.6 that I would like to convert into a standalone application that can be ported easily to other devices. I tried using py2app as in this tutorial.
Everything works well until I get to the point of actually creating the app. The creation process does not throw any error and it creates the .app file. However, when I try to run it, a window pops up saying there is an error and gives me the options of terminating or opening the console. I tried opening the console, but I cannot find any substantive information in the error messages.
These are the import statements that I have:
from urllib.request import urlopen, build_opener
from bs4 import BeautifulSoup, SoupStrainer
import ssl
import urllib
import sys
import subprocess
from tkinter import *
from tkinter.ttk import *
import webbrowser
from unidecode import unidecode
As far as I know the only 2 packages that aren't standard with python are bs4 and unidecode. My setup.py file looks like this:
from setuptools import setup
APP = ['GUImain.py']
DATA_FILES = ['logo.png']
OPTIONS = {'argv_emulation': True,
           'iconfile': 'logo.png',
           'includes': ['unidecode', 'bs4']}
setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)
I haven't seen any other errors like this one in any of my searching. I have seen some suggestions that py2app doesn't fully support Python 3.6. Does anyone know how I can figure out what error is being thrown? Any suggestions on different tools to use and tutorials on how to use them?
