I'm having trouble finding a unique element or set of elements to identify a password field. There are two attributes I want to experiment with, but I haven't figured out how to deal with all the quotation marks. Typically there are only two sets, and I know one must be single and the other double, but how should, e.g., three sets be managed?
(Is this impossible? Should I take another approach, such as a path-like one using descendants/children?)
Website I'm working on:
https://myibd.investors.com/secure/signin.aspx?eurl=https://marketsmith.investors.com/
My code so far:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# create driver object and launch the webpage
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
driver.get("https://myibd.investors.com/secure/signin.aspx?eurl=https://marketsmith.investors.com/")
# switch to the iframe we need
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[id = 'signin-iframe']"))
# create variable for the login field
login_field = driver.find_element_by_css_selector("input[name='username'][data-gigya-placeholder='Email']")
# variable for password field
pswd_field = driver.find_element_by_css_selector()
The two tags & values I'd like to experiment with:
gigya-expression:data-gigya-placeholder="screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"
gigya-expression:aria-label="screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"
Edit 1
Two new attempts that did not work:
1.) I tried using backslashes \ to escape the quotes inside the value as suggested below.
password_field = driver.find_element_by_css_selector("input[gigya-expression:data-gigya-placeholder='\"screenset.translations[\'PASSWORD_132128826476804690_PLACEHOLDER\']\"]")
2.) I took @Laif's idea a little further and found this article on using escape characters for single and double quotes, \' and \" respectively.
password_field = driver.find_element_by_css_selector("input[gigya-expression:data-gigya-placeholder='"screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"']")
Edit 2
workaround using XPath
I haven't figured out how to deal with the problem in CSS, but I've gotten what I need using XPath. After inspecting the target field, you can right-click the HTML and choose Copy > Copy XPath.
copied XPath:
//*[@id="gigya-login-form"]/div[2]/div[3]/div[2]/input
python code:
pswd_field = driver.find_element_by_xpath("//*[@id='gigya-login-form']/div[2]/div[3]/div[2]/input")
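Since the script above already creates wait and imports By and EC, the same XPath can also go through an explicit wait so the field is only grabbed once it is present. A minimal sketch:
# Wait until the field is actually in the DOM before grabbing it
pswd_field = wait.until(EC.presence_of_element_located(
    (By.XPATH, "//*[@id='gigya-login-form']/div[2]/div[3]/div[2]/input")))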
If quotations are giving you trouble, remember that you can escape any character inline with \.
For example, if I were to assign s = "\"\"\"\"", print(s) would print out """".
Both of these are valid strings as well, because the single quotes are enveloped by double quotes:
gigya-expression:data-gigya-placeholder="screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"
gigya-expression:aria-label="screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"
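As a concrete sketch of the three-level case (assuming the gigya-expression:aria-label attribute is literally present on the rendered input element): a triple-quoted Python string supplies the outermost level, the CSS attribute value uses double quotes, and the single quotes inside then need no escaping at all. The colon in the attribute name does have to be escaped for CSS, which a raw string keeps literal:
# Level 1: Python triple quotes; level 2: CSS double quotes;
# level 3: the single quotes inside the value, untouched.
selector = r'''input[gigya-expression\:aria-label="screenset.translations['PASSWORD_132128826476804690_PLACEHOLDER']"]'''
pswd_field = driver.find_element_by_css_selector(selector)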
Related
I have a piece of code that I made in Google Colab that essentially just scrapes a piece of data from a website, shown below:
#imports
#<div class="sc-aef7b723-0 dDQUel priceTitle">
#<div class="priceValue ">
from bs4 import BeautifulSoup
import requests
import time
url = 'https://coinmarketcap.com/currencies/index-cooperative/'
HTML = requests.get(url)
soup = BeautifulSoup(HTML.text, 'html.parser')
text = soup.find('div', attrs={'class':'sc-aef7b723-0 dDQUel priceTitle'}).find('div', attrs={'class':'priceValue '}).text
print(text)
I need this to run as a py file on my computer, but when it runs as a py file, I get the error:
text = soup.find('div', attrs={'class':'sc-aef7b723-0 dDQUel priceTitle'}).find('div', attrs={'class':'priceValue '}).text
AttributeError: 'NoneType' object has no attribute 'text'
I was wondering why this happened as it is the exact same code. All of my packages are at the most recent version as well.
You just need to remove the trailing space in the class name, attrs={'class': 'priceValue'}, because when you run the specified web page through html.parser it corrects the HTML in some ways.
In this case it removes the trailing space that is present on the web page, because it doesn't really make sense to have a trailing space in a class name. Spaces are only needed when an element has more than one class.
So the parsed web page that you store in your soup variable has that div looking like this: <div class="priceValue"><span>$1.74</span></div>. The soup.find function does care about trailing spaces, so it couldn't match the class 'priceValue ' against 'priceValue'.
To match the class regardless of leading or trailing whitespace, you could've used the soup.select function, which uses CSS selectors to match elements and therefore doesn't care about the spaces. You could've found the element like this (with any amount of trailing and/or leading whitespace):
css_selected_value = soup.select("[class= priceValue ]")[0].text
print(css_selected_value)
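Alternatively, a class selector matches individual class tokens rather than the raw attribute string, so whitespace around the name never matters. A small sketch:
# .priceValue matches the class token regardless of surrounding spaces
price_div = soup.select_one('div.priceValue')
if price_div is not None:
    print(price_div.text)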
That being said, I'm not sure why your code works properly on Google Colab; I've never tried it. Maybe I'll dig into it later.
After some issues trying to connect to the database via Jupyter, it works now.
However, when trying to contact the database, there are some issues with 2 tables (the others works well).
FYI:
I'm working with data from Germany, so maybe the problem is with the special characters ü, ä, ö?
I've been thinking of telling the connection that a special encoding is needed, but I have no idea how to do it.
import cx_Oracle
import pandas as pd
# The connection
conn = cx_Oracle.Connection(user='', password='', dsn='')
# The Query
SQL_Query = pd.read_sql_query(
'''select * from CB_CONTRACTS''', conn)
# Define the DF
df_CRH = pd.DataFrame(SQL_Query)
# Display the DF
df_CRH.head()
The output should be a data frame containing all the query.
As I said, it works for 8/10 tables except those 2 that I've been struggling with.
The Error Message:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 23: character maps to <undefined>
cx_Oracle.connect() accepts encoding and nencoding parameters; see the cx_Oracle.connect() documentation. Alternatively, you can set the NLS_LANG environment variable before starting Python. Update: cx_Oracle version 8 uses the UTF-8 character set by default and ignores the character set component of NLS_LANG; see Character sets and globalization.
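For example, a minimal sketch for cx_Oracle versions before 8, with the credentials elided as in the question:
# Connection forcing a UTF-8 character set
# (cx_Oracle 8+ always uses UTF-8 and ignores these parameters)
conn = cx_Oracle.connect(user='', password='', dsn='',
                         encoding='UTF-8', nencoding='UTF-8')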
cursor.var() has an encodingErrors parameter; see Cursor.var and https://github.com/oracle/python-cx_Oracle/issues/162
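A sketch of the encodingErrors route, installed through an output type handler so undecodable bytes are replaced instead of raising UnicodeDecodeError:
def output_type_handler(cursor, name, default_type, size, precision, scale):
    # Re-create string columns with lenient decoding: "replace" swaps
    # bad bytes for the Unicode replacement character
    if default_type in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
        return cursor.var(default_type, size, arraysize=cursor.arraysize,
                          encodingErrors="replace")

conn.outputtypehandler = output_type_handler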
I am trying to get all the meanings in the "noun" heading of the word the user enters.
This is my code for now:
import requests
from bs4 import BeautifulSoup
word=raw_input("Enter word: ").lower()
url=('http://www.dictionary.com/browse/'+word)
r=requests.get(url)
soup=BeautifulSoup(r.content,"html.parser")
try:
    meaning=soup.find("div",attrs={"class":"def-content"}).get_text()
    print "Meaning of",word,"is: "
    print meaning
except AttributeError:
    print "Sorry, we were not able to find the word."
    pass
finally:
    print "Thank you for using our dictionary."
Now suppose the user enters the word "today" and my output will be:
this present day: Today is beautiful.
I don't understand why it leaves so many spaces and why the part
"Today is beautiful"
doesn't come down.
Anyway, when you look up that word on this site, you can see there are 2 meanings, yet my program only shows one.
I want the output to be:
1.this present day:
Today is beautiful.
2.
this present time or age:
the world of today.
Can anyone explain what's wrong and how I can fix it?
I have no idea what's wrong, so please don't think I didn't try.
You are only getting the first noun meaning with the above code.
I have rewritten the code as below:
from bs4 import BeautifulSoup
import requests
word = raw_input("Enter word: ").lower()
url = ('http://www.dictionary.com/browse/' + word)
r = requests.get(url)
bsObj = BeautifulSoup(r.content, "lxml")
nouns = bsObj.find("section", {"class": "def-pbk ce-spot"})
data = nouns.findAll('div', {'class': 'def-content'})
count = 1
for item in data:
    temp = ' '.join(item.get_text().strip().split())
    print str(count) + '. ' + temp
    count += 1
Explanation:
Yes. Assuming the website shows the noun meanings first, I retrieve the first section, which contains the complete noun data. Then I find all the meanings under that section, store them in the data variable, iterate over it in a loop, and fetch the text of each meaning. Finally, to remove all the extra spaces, I split the fetched text and join it back with single spaces, adding a number at the beginning.
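With the word "today", the loop would print something like the numbered output requested in the question:
1. this present day: Today is beautiful.
2. this present time or age: the world of today.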
try:
    meaning = soup.find(attrs={"class": "def-pbk ce-spot"}).get_text(separator="\n", strip=True)
You can strip the whitespace from the text by passing strip=True to get_text().
The reason you didn't get all the text is that your selector was wrong; you should make its range bigger.
I added separator='\n' to get_text() to format the output.
If you have any questions, you can read the BeautifulSoup documentation.
I'm brand new to R Project.
I've tried to import using the file path, and no matter what path I use I get the same error. It says the backslash and the first character after it are an unrecognized escape.
I can import files by using the file.choose() function and manually navigating to my file, but I need to be able to use the file path method in code so that I can run multiple iterative steps without having to be there to choose the file at every point.
Does anyone have any ideas on why this error might occur? Is there anything wrong with my code? Is there some kind of configuration I need to do?
Thanks.
Data1 <- read.table(file="\Head-Location-001\MarketingAnalysis\Competitive Intelligence\Stick Rate\Last week\Test.cvs", sep=",", header=TRUE)
Error: '\C' is an unrecognized escape in character string starting ""\fknp-sfs-001\fknmktanlys\
Replace all the \ with \\.
R is trying to escape the next character, in this case the C, so to insert a literal \ you need to insert an escaped backslash, which is \\.
Alternatively, replacing them with / works as well.
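A sketch of the corrected call under both options (assuming the same path and file name as in the question):
# Backslashes doubled so R treats each as a literal character
Data1 <- read.table(file="\\Head-Location-001\\MarketingAnalysis\\Competitive Intelligence\\Stick Rate\\Last week\\Test.cvs", sep=",", header=TRUE)
# Or with forward slashes, which R accepts on Windows too
Data1 <- read.table(file="/Head-Location-001/MarketingAnalysis/Competitive Intelligence/Stick Rate/Last week/Test.cvs", sep=",", header=TRUE)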
Plone shows the special characters from my native language (Brazilian Portuguese) correctly in its pages. However, when I use a spt page I created, it shows escape sequences, e.g.:
Educa\xc3\xa7\xc3\xa3o
instead of
Educação
(by the way, it means Education). I'm writing a Python function to replace the escape sequences with the UTF-8 characters, but I have a feeling that I'm slaving away without need.
Are you interpolating catalog search results? Those are, by necessity (the catalog cannot handle unicode), UTF-8 encoded.
Just use the .decode method on strings to turn them into unicode again:
value = value.decode('utf8')
A better way is to use the safe_unicode function: https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/utils.py#L458
from Products.CMFPlone.utils import safe_unicode
value = safe_unicode(value)
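For example, with the exact string from the question (classic Plone runs on Python 2):
# The catalog hands back UTF-8 encoded bytes; decoding restores the text
title = 'Educa\xc3\xa7\xc3\xa3o'
print title.decode('utf8')   # prints: Educação
print safe_unicode(title)    # same result, and safe if title is already unicode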