lxml.etree.XPathEvalError: Invalid predicate - python-3.6

I get the following lxml.etree.XPathEvalError: Invalid predicate error:
Traceback (most recent call last):
File "check_1337.py", line 18, in
//div[@class = "_3iyw"]//div[@class = "_6beq _7cdk _6beo"]//div[@class = "_7om2 _3gim _ 7cdk"]//div [@class = "5s61"]//div[@class = "_7cdi"]')
File "src\lxml\etree.pyx", line 1583, in lxml.etree._Element.xpath
File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid predicate
from the following code:
from lxml.etree import HTML
import requests

url = "https://m.facebook.com/?_rdr"
response = requests.get(url)
root = HTML(response.content)
tempII = root.find("body").xpath('//div[@id = "viewport"]//div[@id = "page"]//div[@id = "rootcontainer]//div[@class = "async_compose _2v9s"]//div[@id = "MRoot"]\
//div[@class = "_3iyw"]//div[@class = "_6beq _7cdk _6beo"]//div[@class = "_7om2 _3gim _ 7cdk"]//div [@class = "5s61"]//div[@class = "_7cdi"]')
print(tempII)
Can you help me find out the reason for this error?

Take a look at rootcontainer in your xpath call.
Before this word you put a double quote (this is OK),
but failed to put the matching double quote after it.
Another detail: a bit later you have @class = "_7om2 _3gim _ 7cdk".
Are you sure there should be four classes here (_7om2, _3gim,
_ and 7cdk)?
Using "_" as a class name is a weird practice.
Maybe the last two classes should be a single class, _7cdk?
Note that a bit earlier you have just _7cdk.
This second flaw is not likely to cause an exception, but the result of find
will probably be empty.
Be cautious as you write such predicates; errors like these are quite
difficult to spot.
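For illustration, here is a predicate with the quoting fixed, run against a tiny hypothetical snippet (the markup is a stand-in, not Facebook's real page; only the class name _7cdi is taken from the question), using just the standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page:
html = '<div id="rootcontainer"><div class="_7cdi">hello</div></div>'
root = ET.fromstring(html)

# The failing expression was missing a closing quote: [@id = "rootcontainer]
# With both quotes in place the predicate parses and matches:
matches = root.findall(".//div[@class='_7cdi']")
print([m.text for m in matches])  # ['hello']
```

lxml's xpath() accepts the same [@class='...'] predicate syntax, so the fix carries over unchanged.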

Related

Find sequencing reads with insertions longer than number

I'm trying to isolate, from a BAM file, those sequencing reads that have insertions longer than some number (let's say 50 bp). I guess I can do that using the CIGAR string, but I don't know an easy way to parse it and keep only the reads that I want. This is what I need:
Read1 -> 2M1I89M53I2M
Read2 -> 2M1I144M
I should keep only Read1.
Thanks!
Most likely I'm late, but ...
Probably you want the MC tag, not CIGAR. I use BWA, and information on insertions is stored in the MC tag. But I may be mistaken.
Use the pysam module to parse the BAM file, and regular expressions to parse MC tags.
Example code:
import pysam
import re

input_file = pysam.AlignmentFile('input.bam', 'rb')
output_file = pysam.AlignmentFile('found.bam', 'wb', template=input_file)
for Read in input_file:
    try:
        TagMC = Read.get_tag('MC')
    except KeyError:
        continue
    InsertionsTags = re.findall(r'\d+I', TagMC)
    if not InsertionsTags:
        continue
    InsertionLengths = [int(Item[:-1]) for Item in InsertionsTags]
    # Keep the read if its *longest* insertion exceeds 50 bp
    # (using min() here would drop reads that also have short insertions,
    # e.g. Read1 above with both 1I and 53I)
    if max(InsertionLengths) > 50:
        output_file.write(Read)
input_file.close()
output_file.close()
Hope that helps.
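If the insertions you care about are in the read's own CIGAR rather than the MC tag, the same regular-expression filter can be applied to Read.cigarstring (a pysam AlignedSegment attribute). The predicate itself needs no BAM file to test; the two reads below are the ones from the question:

```python
import re

def has_long_insertion(cigar, min_len=50):
    # True if any insertion operation (I) in the CIGAR string is longer than min_len
    return any(int(n) > min_len for n in re.findall(r'(\d+)I', cigar))

print(has_long_insertion('2M1I89M53I2M'))  # True  (53I > 50)
print(has_long_insertion('2M1I144M'))      # False (only 1I)
```

In the pysam loop this becomes: if Read.cigarstring and has_long_insertion(Read.cigarstring): output_file.write(Read).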

How do I turn a file's contents into a dictionary?

I have a function that I want to open .dat files with, to extract data from them, but the problem is I don't know how to turn that data back into a dictionary to store in a variable. Currently, the data in the files is stored like this: {"x":0,"y":1} (it uses up only one line of the file, which is just the normal structure of a dictionary).
Below is just the function where I open the .dat file and try to extract stuff from it.
def openData():
    file = fd.askopenfile(filetypes=[("Data", ".dat"), ("All Files", ".*")])
    filepath = file.name
    if file is None:
        return
    with open(filepath, "r") as f:
        contents = dict(f.read())
        print(contents["x"])  # let's say there is a key called "x" in that dictionary
This is the error that I get from it: (not because the key "x" is not in dict, trust me)
Exception in Tkinter callback
Traceback (most recent call last):
File "...\AppData\Local\Programs\Python\Python39\lib\tkinter\__init__.py", line 1892, in __call__
return self.func(*args)
File "...\PycharmProjects\[this project]\main.py", line 204, in openData
contents = dict(f.read())
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Process finished with exit code 0
Update: I tried using json and it worked, thanks to @match:
def openData():
    file = fd.askopenfile(filetypes=[("Data", ".dat"), ("All Files", ".*")])
    filepath = file.name
    if file is None:
        return
    with open(filepath, "r") as f:
        contents = dict(json.load(f))
        print(contents["x"])
You need to parse the data to get a data structure from the string. Fortunately, Python provides a function for safely parsing Python literals: ast.literal_eval(). E.g.:
import ast
...
with open("/path/to/file", "r") as data:
    dictionary = ast.literal_eval(data.read())
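Both approaches parse the one-line format shown in the question; json is stricter (double-quoted keys only), while ast.literal_eval also accepts Python-style literals. A quick sketch comparing them:

```python
import ast
import json

text = '{"x":0,"y":1}'            # the format shown in the question
print(json.loads(text))           # {'x': 0, 'y': 1}
print(ast.literal_eval(text))     # {'x': 0, 'y': 1}

# literal_eval also accepts Python-style literals that strict JSON rejects:
print(ast.literal_eval("{'x': 0, 'y': 1}"))  # {'x': 0, 'y': 1}
```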

Python 3.7- PhantomJS - Driver.get(url) with 'Window handle/name is invalid or closed?'

Using two functions to scrape a website results in a driver.get error.
I've tried different variations of while and for loops to get this to work. Now I get a driver.get error. The initial function works on its own, but when running both functions one after another I get this error.
import requests, sys, webbrowser, bs4, time
import urllib.request
import pandas as pd
from selenium import webdriver

driver = webdriver.PhantomJS(executable_path='C:\\PhantomJS\\bin\\phantomjs.exe')
jobtit = 'some+job'
location = 'some+city'
urlpag = ('https://www.indeed.com/jobs?q=' + jobtit + '&l=' + location + '%2C+CA')

def initial_scrape():
    data = []
    try:
        driver.get(urlpag)
        results = driver.find_elements_by_tag_name('h2')
        print('Finding the results for the first page of the search.')
        for result in results:  # loop 2
            job_name = result.text
            link = result.find_element_by_tag_name('a')
            job_link = link.get_attribute('href')
            data.append({'Job': job_name, 'link': job_link})
            print('Appending the first page results to the data table.')
            if result == len(results):
                return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data

def second_scrape():
    data = []
    try:
        #driver.get(urlpag)
        pages = driver.find_element_by_class_name('pagination')
        print('Variable nxt_pg is ' + str(nxt_pg))
        for page in pages:
            page_ = page.find_element_by_tag_name('a')
            page_link = page_.get_attribute('href')
            print('Taking a look at the different page links..')
            for page_link in range(1, pg_amount, 1):
                driver.click(page_link)
                items = driver.find_elements_by_tag_name('h2')
                print('Going through each new page and getting the jobs for ya...')
                for item in items:
                    job_name = item.text
                    link = item.find_element_by_tag_name('a')
                    job_link = link.get_attribute('href')
                    data.append({'Job': job_name, 'link': job_link})
                    print('Appending the jobs to the data table....')
            if page_link == pg_amount:
                print('Oh boy! pg_link == pg_amount...time to exit the loops')
                return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data
Expected:
Initial Function
Get website from urlpag
Find element by tag name and loop through elements while appending to a list.
When done with all elements, exit and return the list.
Second Function
While still on urlpag, find element by class name and get the links for the next pages to scrape.
As we have each page to scrape, go through each page scraping and appending the elements to a different table.
Once we reach our pg_amount limit - exit and return the finalized list.
Actual:
Initial Function
Get website from urlpag
Find element by tag name and loop through elements while appending to a list.
When done with all elements, exit and return the list.
Second Function
Finds class pagination, prints the nxt_pg variable and then throws the error below.
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\Scripts\Indeedscraper\indeedscrape.py", line 23, in initial_scrape
driver.get(urlpag)
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: {"errorMessage":"Currently Window handle/name is invalid (closed?)"
For individuals having this error, I ended up switching to chromedriver and using that instead for webscraping. It appears that using the PhantomJS driver will sometimes return this error.
I was having the same issue, until I placed my driver.close() after I was done interacting with selenium objects. I ended up closing the driver at the end of my script just to be on the safe side.
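The close-the-driver-only-at-the-end advice can be illustrated with a try/finally pattern. FakeDriver below is a stand-in so the sketch runs without a browser; with real Selenium you would construct the driver (e.g. webdriver.Chrome()) instead:

```python
class FakeElement:
    def __init__(self, text):
        self.text = text

class FakeDriver:
    # Stand-in for a Selenium WebDriver, used only to illustrate the flow.
    def __init__(self):
        self.closed = False
    def get(self, url):
        pass
    def find_elements_by_tag_name(self, tag):
        return [FakeElement('Job A'), FakeElement('Job B')]
    def quit(self):
        self.closed = True

def scrape_all(driver, url):
    # Collect data, but leave closing the driver to the caller.
    driver.get(url)
    return [el.text for el in driver.find_elements_by_tag_name('h2')]

driver = FakeDriver()
try:
    jobs = scrape_all(driver, 'https://example.com')
finally:
    driver.quit()  # close once, after *all* driver interaction is finished

print(jobs)  # ['Job A', 'Job B']
```

Because the driver is closed in exactly one place, a failure in one scrape function can no longer leave later driver.get calls talking to a closed window.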

Get pre-existing variable through user input

I'm making a script that asks for input. A few variables have been set according to the number of options.
import pyautogui

option1 = '.\\Folder'
userChoice = input()

def func(userChoice):
    pyautogui.locateOnScreen(userChoice)
But if the user types option1 then userChoice == 'option1' instead of '.\\Folder', which leads to a 'No such file or directory' error.
How could I solve this problem?
You can use Python's try-except. An example from the official docs:
import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
Update: Based on your comment, I think you need to do something like this:
choices = {'option1': './/Folder'}
userChoice = input()
try:
    function = choices[userChoice]
except KeyError:
    raise ValueError('invalid input')
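A small variant of the dictionary lookup above, using dict.get so the error message can mention the bad input (option2 and the error text are made up for illustration):

```python
choices = {'option1': '.\\Folder', 'option2': '.\\OtherFolder'}

def resolve(user_choice):
    # dict.get returns None for unknown keys instead of raising KeyError
    path = choices.get(user_choice)
    if path is None:
        raise ValueError('invalid input: ' + user_choice)
    return path

print(resolve('option1'))  # .\Folder
```

The resolved path is then what gets passed to pyautogui.locateOnScreen, not the option name itself.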

Return constant values in graphite in json format

How do we obtain a series of constant values in Graphite? I have checked the function constantLine(x); however, it draws a constant line on the graph, whereas I need the values in JSON format.
The function identity(t) returns x(t) = t; what we need is y(t) = constant.
Currently, it seems to me that we need to inject data points into the Graphite DB. Is there a way we can do this without that?
[graphite web uri]/render?target=FUNC(x)&format=json.
Edit:
I have tried the constantLine(x) function
I get:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/graphite/webapp/graphite/render/views.py", line 104, in renderView
seriesList = evaluateTarget(requestContext, target)
File "/opt/graphite/webapp/graphite/render/evaluator.py", line 10, in evaluateTarget
result = evaluateTokens(requestContext, tokens)
File "/opt/graphite/webapp/graphite/render/evaluator.py", line 21, in evaluateTokens
return evaluateTokens(requestContext, tokens.expression)
File "/opt/graphite/webapp/graphite/render/evaluator.py", line 27, in evaluateTokens
func = SeriesFunctions[tokens.call.func]
KeyError: u'contantLine'
graphite.com/render/?target=constantLine(123.456)&format=json
returns
[{"target": "123.456", "datapoints": [[123.456, 1381399794]]}]
Isn't this what you want? In case you are expecting graphite to return pure 123.456, you will have to edit the code and override how results are displayed.
Add a new function customConstantLine(requestContext, value) in functions.py
Override the rendering class to print pure 123.456 when the returned function is customConstantLine().
Heads-up: Your edit has to also account for cases wherein multiple targets are returned.
EDIT:
Patch your functions.py! In case you do not want to upgrade your Graphite, add:
def constantLine(requestContext, value):
    start = timestamp(requestContext['startTime'])
    end = timestamp(requestContext['endTime'])
    step = (end - start) / 1.0
    series = TimeSeries(str(value), start, end, step, [value, value])
    return [series]
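If plain constantLine output is acceptable, another option is to post-process the JSON on the client side instead of patching Graphite. The payload below is the one shown earlier in this answer:

```python
import json

# Payload copied from the render response above (timestamp from the original post):
payload = '[{"target": "123.456", "datapoints": [[123.456, 1381399794]]}]'

series = json.loads(payload)
for s in series:
    # Each datapoint is a [value, timestamp] pair; keep just the values
    values = [value for value, ts in s["datapoints"]]
    print(s["target"], values)  # 123.456 [123.456]
```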
