requests_html render() throwing OSError: [WinError 14001] - web-scraping

Hello, I'm trying to do web scraping with the Python module requests-html to handle dynamic content on the page https://www.monster.com/jobs/search?q=Software+Engineer&where=. My code is:
from requests_html import HTMLSession
url = 'https://www.monster.com/jobs/search?q=Software+Engineer&where='
session = HTMLSession()
response = session.get(url)
response.html.render()
But when I run response.html.render(), I get this error:
OSError: [WinError 14001] The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail
The first time I ran render(), I got:
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
[W:pyppeteer.chromium_downloader]
chromium download done.
[W:pyppeteer.chromium_downloader] chromium extracted to: C:\Users\user\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429
However, that path doesn't exist, even though pyppeteer is installed (pyppeteer==0.2.5). Does anyone have an idea what is going on?

You're having this issue because the Chromium setup failed.
You can either try reinstalling requests-html, or do what I did: switch from the Windows Store Python to the installer from the Python website, then install requests-html again.
After everything was set up correctly with the downloaded Python, I switched back to Python 3.9 from the Store, and everything still works.
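Before reinstalling, it can help to confirm which interpreter you are actually running. The sketch below is a heuristic and an assumption on my part, not something from the answer: Microsoft Store builds of Python normally run from a path containing a WindowsApps folder or a PythonSoftwareFoundation.Python... package folder. One plausible but unconfirmed explanation for the missing local-chromium folder is that the Store package redirects writes under AppData to a private per-package copy, so the extracted Chromium may not be where the pyppeteer log says it is. The function name looks_like_store_python is hypothetical.

```python
import sys
from pathlib import PureWindowsPath


def looks_like_store_python(executable: str = sys.executable) -> bool:
    """Heuristic check for a Microsoft Store build of Python.

    Assumption: Store builds run from a path that contains a 'WindowsApps'
    folder or a 'PythonSoftwareFoundation.Python...' package folder.
    PureWindowsPath is used so the check also works on non-Windows hosts.
    """
    parts = [part.lower() for part in PureWindowsPath(executable).parts]
    return "windowsapps" in parts or any(
        part.startswith("pythonsoftwarefoundation.python") for part in parts
    )


if __name__ == "__main__":
    if looks_like_store_python():
        print("Microsoft Store Python detected; consider the python.org installer.")
    else:
        print("This does not look like the Microsoft Store Python.")
```

If it reports a Store build, installing Python from python.org and reinstalling requests-html there is the route the answer describes.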

Related

OSError: [WinError 193]

I'll try to be careful; my first question on here got me blocked, lol. I'm testing out my scraping skills on a random church website, and I keep getting the error in the title. Can someone see what I'm doing wrong? I updated my CVs, installed about 10 packages (based on some past answers), and still nothing.
import subprocess
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
subprocess.call(['chromedriver_win32.zip'], shell=True)
website = "https://www.bethanyfga.org/"
path = "C:/Users/calde/OneDrive/Desktop/chromedriver_win32.zip"
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service)
driver.get(website)
I installed several OpenCV versions, uninstalled and reinstalled other unrelated packages, and imported subprocess.
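One likely cause, not confirmed in this thread: WinError 193 means Windows was asked to run a file that is not a valid Win32 application, and in the code above executable_path points at chromedriver_win32.zip rather than at the chromedriver.exe inside it (the subprocess.call on the .zip does not run anything useful either). A minimal sketch, assuming the archive contains a file named chromedriver.exe or chromedriver, that extracts the driver with the standard zipfile module so the extracted path can be passed to Service. The helper name extract_chromedriver and the destination folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_chromedriver(zip_path: str, dest_dir: str) -> Path:
    """Extract the chromedriver binary from the downloaded archive.

    Returns the path of the extracted executable, which is what
    Service(executable_path=...) expects instead of the .zip itself.
    Assumption: the archive holds a file named 'chromedriver.exe' (Windows)
    or 'chromedriver', possibly inside a subfolder.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if Path(member).name in ("chromedriver.exe", "chromedriver"):
                # extract() recreates any subfolders and returns the new path
                return Path(archive.extract(member, dest))
    raise FileNotFoundError(f"no chromedriver binary found in {zip_path}")


# Usage with Selenium 4 (paths taken from the question; adjust to your machine):
# from selenium import webdriver
# from selenium.webdriver.chrome.service import Service
# exe = extract_chromedriver(
#     "C:/Users/calde/OneDrive/Desktop/chromedriver_win32.zip",
#     "C:/Users/calde/OneDrive/Desktop/chromedriver",
# )
# driver = webdriver.Chrome(service=Service(executable_path=str(exe)))
# driver.get("https://www.bethanyfga.org/")
```

Extracting the archive once by hand and pointing path at the resulting chromedriver.exe should have the same effect; the function just makes that step explicit.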

Deno 500 error from dev.jspm.io installing dependencies

I'm trying to run my first Deno script, which is pretty much taken from the denoDB docs. It just tries to connect to a database with a SQLite3 connector (I'm on a MacBook Pro, so SQLite should be installed):
import { Database, SQLite3Connector } from 'https://deno.land/x/denodb/mod.ts';
const connector = new SQLite3Connector({
filepath: './db.sqlite',
});
export const db = new Database(connector);
I'm running deno run api/db.ts and I get this error after a few successful downloads:
Download https://deno.land/std@0.149.0/encoding/hex.ts
Download https://deno.land/std@0.149.0/hash/_wasm/lib/deno_hash.generated.mjs
error: Import 'https://dev.jspm.io/inherits@2.0' failed: 500 Internal Server Error
at https://raw.githubusercontent.com/Zhomart/dex/930253915093e1e08d48ec0409b4aee800d8bd0c/lib-dyn/deps.ts:4:26
I've deleted /Users/<me>/Library/Caches/deno/deps/https and rerun the script a few times, but I still can't get past this. Following the URL https://dev.jspm.io/inherits@2.0 in my browser also gives me an error. What is going on here? There's not much code, and I imagine it isn't broken for everybody. What do I need to do to get this script to run without issues?
EDIT: it seems to be a library error https://github.com/eveningkid/denodb/issues/348
This error is caused by a nested dependency from a project that is no longer maintained.
See this for more info: https://jspm.org/jspm-dev-release
The point is that dev.jspm.io is now jspm.dev.
One way to fix this is to fork the affected project and update its dependencies.
Also, if you're not using Deno Deploy, you can use this as a replacement for denodb: https://raw.githubusercontent.com/joeldesante/denodb/master/mod.ts
Just note that this script is no longer maintained either, but it will fix your problem.
Edit
I just made a quick and dirty fix for Deno Deploy; use this as a dependency instead of denodb: https://raw.githubusercontent.com/ninjinskii/denodb/master/mod.ts
Again, I may not maintain this script forever.
The best outcome would be an update from these libraries' maintainers.
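The "fork and update dependencies" route above can be partly automated. A minimal sketch, run against a local clone of your fork (for example of the dex repository whose lib-dyn/deps.ts contains the failing import), that rewrites every dev.jspm.io specifier in .ts files to jspm.dev. Two assumptions: jspm.dev serves the same package@version paths as dev.jspm.io did, and everything that imported the original repository is then changed to import your fork. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# Per the answer above, dev.jspm.io has moved to jspm.dev.
OLD_PREFIX = "https://dev.jspm.io/"
NEW_PREFIX = "https://jspm.dev/"


def rewrite_jspm_imports(source: str) -> str:
    """Point every dev.jspm.io specifier at jspm.dev.

    Assumption: jspm.dev serves the same package@version paths.
    """
    return source.replace(OLD_PREFIX, NEW_PREFIX)


def patch_fork(repo_dir: str) -> List[Path]:
    """Rewrite all .ts files under a local fork in place; return changed files."""
    changed = []
    for path in sorted(Path(repo_dir).rglob("*.ts")):
        text = path.read_text(encoding="utf-8")
        patched = rewrite_jspm_imports(text)
        if patched != text:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    import sys

    for changed_file in patch_fork(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("patched", changed_file)
```

After committing and pushing the patched fork, the raw.githubusercontent.com URL for your fork replaces the original commit URL wherever it is imported.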

XDG_RUNTIME_DIR , propagateSizeHints() errors

When I run this script with "python3 script.py", everything works fine.
(I tried running the script as root and as another user too.)
import music21
import os
# "qt.qpa.xcb: could not connect to display"
# "qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found."
os.putenv("QT_QPA_PLATFORM", "offscreen") #handle error above in my case
us = music21.environment.UserSettings()
us['musescoreDirectPNGPath'] = '/usr/bin/musescore3'
score = music21.converter.parse("myfile.musicxml")
score.write('musicxml.pdf', fp='song.pdf')
But when my Django backend executes this code as a function in response to an API call, it doesn't work. I get:
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-myusername'
This plugin does not support propagateSizeHints()
And if I remove the "QT_QPA_PLATFORM" part, I get the "qt.qpa.xcb" error.
I have no idea what to do.
I also tried adding the following line to my gunicorn setup script, but nothing changed.
export QT_QPA_PLATFORM=offscreen
Any ideas how to fix it? What could be wrong?
Server OS: Ubuntu 20.04
You probably need to also set XDG_RUNTIME_DIR. See https://github.com/cuthbertLab/music21/issues/260#issuecomment-834489173
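Setting both variables from inside the Django process could look like the sketch below. This is my own minimal sketch, not code from the linked issue. Assumptions: a POSIX host (os.getuid), that MuseScore inherits the worker's environment when music21 launches it, and that a per-user directory under the temp dir is acceptable as the runtime directory. The helper name ensure_headless_qt_env is hypothetical.

```python
import os
import tempfile


def ensure_headless_qt_env() -> None:
    """Give Qt what it needs when running under gunicorn/Django with no desktop.

    Assumptions: a POSIX host (os.getuid) and that the worker process passes
    its environment on to the MuseScore subprocess started by music21.
    """
    # Same setting as in the question; assigning through os.environ also
    # updates the real process environment that child processes inherit.
    os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")
    if not os.environ.get("XDG_RUNTIME_DIR"):
        runtime_dir = os.path.join(tempfile.gettempdir(), f"runtime-{os.getuid()}")
        os.makedirs(runtime_dir, exist_ok=True)
        os.chmod(runtime_dir, 0o700)  # the XDG spec expects a private 0700 directory
        os.environ["XDG_RUNTIME_DIR"] = runtime_dir
```

In the view, call ensure_headless_qt_env() before music21.converter.parse(...) and score.write(...). Exporting XDG_RUNTIME_DIR in the environment that actually starts the gunicorn workers should be an equivalent alternative.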

Installing a package from private GitLab server on Windows

I am struggling to install a package from a GitLab repository on a Windows computer.
I found various hints but still have trouble installing my package from GitLab. First, I generated a public and private key with puttygen.exe. I had to edit the files afterwards, removing comments and other extras so that they look like the files on my Unix system. Now both the public and private key files are a single line each.
I tried to install my package via devtools::install_git, which takes a very long time, and I get the error message
Error: Failed to install 'unknown package' from Git:
Error in 'git2r_remote_ls': Failed to authenticate SSH session: Unable to send userauth-publickey request
And with devtools::install_gitlab I get a different error message, and I suspect the generated link doesn't match my GitLab server.
Error: Failed to install 'unknown package' from GitLab:
cannot open URL 'https://gitlab.rlp.net/api/v4/projects/madejung%2FMQqueue.git/repository/files/DESCRIPTION/raw?ref=master'
My complete code to test at the moment is
creds <- git2r::cred_ssh_key(publickey="~/.ssh/id_rsa_gitlab.pub",
privatekey="~/.ssh/id_rsa_gitlab")
devtools::install_git(
url='git@gitlab.rlp.net:madejung/MQqueue.git',
quiet=FALSE,
credentials=creds)
devtools::install_gitlab(
repo='madejung/MQqueue.git',
host='gitlab.rlp.net',
quiet=FALSE,
credentials=creds
)
My id_rsa_gitlab.pub file looks like this and is just a single line:
ssh-rsa AAAA....fiwbw== rsa-key-20200121
The id_rsa_gitlab file has just the code:
AAABA.....3WNSIAGE=
Update
On my Mac it works as expected after installing the libssh2 library via Homebrew and recompiling git2r with install.packages("git2r", type = "source").
So the working code on my machine is:
creds <- git2r::cred_ssh_key(publickey="~/.ssh/id_rsa_gitlab.rlp.net.pub",
privatekey="~/.ssh/id_rsa_gitlab.rlp.net")
devtools::install_git(
url='git@gitlab.rlp.net:madejung/MQqueue.git',
quiet=FALSE,
credentials=creds
)
For some strange reason, the devtools::install_git call takes about a minute before it finally fails. I have no idea where the problem is.
After struggling for almost a day, I found a solution I can live with...
I first created a PAT (Personal Access Token) in my GitLab account and granted full API access. For some reason read-only access didn't work, and I'm too tired now to figure out why.
After this I still had problems installing my package, and for some reason the wininet download method doesn't work.
I used capabilities("libcurl") to check whether libcurl is available on my Windows machine, which it was, and tried to override wininet with libcurl by passing method='libcurl' to the install function. That was not enough, so I set the download.file.method option directly.
options("download.file.method"='libcurl')
devtools::install_gitlab(
repo='madejung/MQqueue',
auth_token='Ho...SOMETHING...xugzb',
host='gitlab.rlp.net',
quiet=FALSE, force=TRUE
)
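To test the token and the project path outside R, the request that install_gitlab appears to make (judging from the URL in its error message above) can be reproduced in Python. This is a hedged sketch: the repository files raw endpoint and the PRIVATE-TOKEN header are part of the GitLab API v4, but whether devtools/remotes sends exactly this request is an assumption. Note that the failing URL contains madejung%2FMQqueue.git, while the working call passes repo='madejung/MQqueue' without the .git suffix. The helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def description_url(host: str, repo: str, ref: str = "master") -> str:
    """Build the GitLab API v4 URL for a repository's DESCRIPTION file.

    The project path must be URL-encoded ('/' -> '%2F') and should not
    carry a trailing '.git' (compare the failing URL in the question).
    """
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    project = quote(repo, safe="")
    return (
        f"https://{host}/api/v4/projects/{project}"
        f"/repository/files/DESCRIPTION/raw?ref={quote(ref, safe='')}"
    )


def fetch_description(host: str, repo: str, token: str, ref: str = "master") -> str:
    """Download DESCRIPTION using a personal access token (PAT)."""
    request = Request(description_url(host, repo, ref), headers={"PRIVATE-TOKEN": token})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Example (needs network access and a real token):
# print(fetch_description("gitlab.rlp.net", "madejung/MQqueue", token="<your PAT>"))
```

If fetch_description succeeds with a given token but install_gitlab still fails, the problem is more likely on the R download side (for example the wininet versus libcurl setting) than in the token's scope.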

Twisted web server not running a .rpy file

I'm a toddler with Twisted. I am trying to run a Twisted web server using the command
twistd web --resource-script=~/Desktop/step/ecdemo.rpy
My file (ecdemo.rpy) is located in the step folder on my desktop.
The traceback when I visit the page (127.0.0.1:8080/ecdemo.rpy) shows
<type 'exceptions.IOError'>: [Errno 2] No such file or directory: '~/Desktop/step/ecdemo.rpy'
However, if I run the same file with the command
python ecdemo.rpy, it runs smoothly.
The program simply handles a GET request from an HTTP page.
I know it's something basic that I don't know, but if you could help me get started, I would come up with better problems...
Thanks for the help.
Your shell didn't expand ~ into your home directory. Try this instead:
twistd web --resource-script ~/Desktop/step/ecdemo.rpy
Notice I removed the = between the option name and its value. This will probably let your shell turn ~ into /home/whoever.
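The same thing happens if twistd is launched from Python rather than from an interactive shell: without shell=True, subprocess performs no tilde expansion at all, so the path has to be expanded before it is passed along. A minimal sketch, assuming twistd is on PATH; the helper name twistd_web_command is hypothetical.

```python
import os
import subprocess
from typing import List


def twistd_web_command(resource_script: str) -> List[str]:
    """Build a twistd command line with '~' already expanded.

    subprocess without shell=True does no tilde expansion, so the home
    directory is filled in here with os.path.expanduser.
    """
    script_path = os.path.expanduser(resource_script)
    return ["twistd", "web", f"--resource-script={script_path}"]


if __name__ == "__main__":
    # Requires Twisted to be installed; twistd web listens on port 8080 by default.
    subprocess.run(twistd_web_command("~/Desktop/step/ecdemo.rpy"), check=True)
```

Because the path is already absolute when it reaches twistd, keeping the = in the option is harmless here.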
