OSError: [WinError 193] - web-scraping

Imma try and be careful, my first question on here got me blocked lol. Testing out my scraping skills on a random church website and I keep getting the error on the title. Can someone see what I'm doing wrong? Updated my CV's, installed like 10 packages(based on some past answers) and still nothing.
import subprocess
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
subprocess.call(['chromedriver_win32.zip'], shell=True)
website = "https://www.bethanyfga.org/"
path = "C:/Users/calde/OneDrive/Desktop/chromedriver_win32.zip"
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service)
driver.get(website)
installed several openCV's, uninstalled and reinstalled other, unrelated, packages. Imported subprocess.

Related

Deno 500 error from dev.jspm.io installing dependencies

I'm trying to run my first deno script which is pretty much from the denoDB docs, it just tries to connect to a database with a SQLite3 connector (I'm on a Macbook pro so it should be installed):
import { Database, SQLite3Connector } from 'https://deno.land/x/denodb/mod.ts';
const connector = new SQLite3Connector({
filepath: './db.sqlite',
});
export const db = new Database(connector);
I'm running deno run api/db.ts and I get this error after a few successful downloads:
Download https://deno.land/std#0.149.0/encoding/hex.ts
Download https://deno.land/std#0.149.0/hash/_wasm/lib/deno_hash.generated.mjs
error: Import 'https://dev.jspm.io/inherits#2.0' failed: 500 Internal Server Error
at https://raw.githubusercontent.com/Zhomart/dex/930253915093e1e08d48ec0409b4aee800d8bd0c/lib-dyn/deps.ts:4:26
I've deleted my /Users/<me>/Library/Caches/deno/deps/https and reran the script a few times but I still can't get past this. In my browser trying to follow the URL https://dev.jspm.io/inherits#2.0 does give me an error. What is going on here? There's not much code and I imagine it's not broken for everybody. What do I need to do to get this script to run without issues?
EDIT: it seems to be a library error https://github.com/eveningkid/denodb/issues/348
This is an error caused by a nested depedency, from a project that is not maintained.
See this for more info: [https://jspm.org/jspm-dev-release]
The point is dev.jspm.io is now jspm.dev
A way to fix this is to fork and update depedencies.
Another thing, if you're not using deno deploy, you can just use this as a replacement for your denodb: https://raw.githubusercontent.com/joeldesante/denodb/master/mod.ts
Just note that this script is no more maintained either, but it will fix your problem
Edit
I just made a dirty quick fix for deno deploy use this as a depedency isntead of denodb: https://raw.githubusercontent.com/ninjinskii/denodb/master/mod.ts
Again, i may not maintain this script forever.
The best thing that can happen is an update from these libs maintainers

vscode jupyter notebook error Error: Session cannot generate requests

Error message in vs code when using jupyter extension connected to remote server using ssh.
Error: Session cannot generate requests
Error: Session cannot generate requests
at w.executeCodeCell (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:327199)
at w.execute (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:326520)
at w.start (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:322336)
at async t.CellExecutionQueue.executeQueuedCells (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:336863)
at async t.CellExecutionQueue.start (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:336403)
I got this error after running the code below.
import pandas as pd
from itertools import product
pd.DataFrame(product(item_table, user_table), columns = ['item_id', 'user_id'])
product function outputs all combinations of the given tables.
item_table has 39729 number of items(39729 by 1)
user_table has 251350 users(251350 by 1).
And the above code outputs 251350 x 39729 combination table.
Therefore I guess this is because of the large computation but I want to know the meaning of error messages and want to know how to solve the problem.
I have encountered the same problem (so so).
It happened when I tried to import tensorflow.keras.
I was no longer able to import packages.
I just changed of conda environment, and then got back to the one I was working in and it worked (but trying to import keras still caused the same problem).

requests_html render() throwing OSError: [WinError 14001]

Hello I'm trying to do web scraping with the python module requests-html to handle dynamic content on the page https://www.monster.com/jobs/search?q=Software+Engineer&where=. My code is:
from requests_html import HTMLSession
url = 'https://www.monster.com/jobs/search?q=Software+Engineer&where='
session = HTMLSession()
response = session.get(url)
response.html.render()
but when I run response.html.render() I get this error
OSError: [WinError 14001] The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail
The first time I ran render() I got
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
[W:pyppeteer.chromium_downloader]
chromium download done.
[W:pyppeteer.chromium_downloader] chromium extracted to: C:\Users\user\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429
however the file path doesn't exist but pyppeteer is actually an installed package (pyppeteer==0.2.5). Does anyone have an idea what is going on?
You're having this issue because chromium setup failed.
You can either try to reinstall request_html or what I did was switching from the python from the Windows store to the download from the python website and then installing request_html again.
After having everything setup correctly with the downloaded python I switched back to python 3.9 from the store and everything is still working.

pyinstaller ImportError: C extension: No module named np_datetime not built

I am running a virtual environment with Python 2.7 for my program.
There seems to be a problem after creating the executable file on windows.
I ran venv/Scripts/pyinstaller.exe -F main.py
everything seems fine. But when i click on the created executable main.exe.
There is an error.
Tried and tested
I have re-installed of pandas and pyinstaller
Implemented the hook-pandas.py to the hooks folder in the environment.
hook-pandas
Ensured the environment is activated.
Checked that the program is running fine before building executable.
Re-created the environment.
Yet after all that, I am prompted with this issue [see Importerror] when I run the executable file.
It is an extreme pain to debug this because the command prompt displaying the error will not pause but close almost immediately.
Similar issues
Looking for Suggestions
I am hoping for suggestions to troubleshoot Pyinstaller. Any resources to read up on would be nice.
Usually, I have no trouble with python as Pycharm has several handy debugging tools that will help me identify the problem
I ran into the same problem and found this thread, but I managed to solve it borrowing from the reference you posted (about pandas._libs.tslibs.timedeltas), so thank you for that!
In that article, the module that resulted in the ImportError was, in fact pandas._libs.tslibs.timedeltas, if you look at the poster's logs. But the error you and I ran into refers to np_datetime instead. So, from the traceback logs, I finally figured out that the code we have to write in hook-pandas.py should be the following:
hiddenimports = ['pandas._libs.tslibs.np_datetime']
Maybe that alone will solve your problem, HOWEVER, in my case, once I solved the np_datetime issue, other very similar ImportError problems arose (also related to hiddenimports regarding pandas), so, in case you run into the same issues, just define hiddenimports as follows:
hiddenimports = ['pandas._libs.tslibs.np_datetime','pandas._libs.tslibs.nattype','pandas._libs.skiplist']
TL;DR:
You can first try to write
hiddenimports = ['pandas._libs.tslibs.np_datetime']
into hook-pandas.py. However, if for some reason you run into the exact same issues I did afterwards, try
hiddenimports = ['pandas._libs.tslibs.np_datetime','pandas._libs.tslibs.nattype','pandas._libs.skiplist']
If you wish to dive deeper (or run into a different pandas ImportError than the ones I did), this is the code in pandas's __init__.py referenced in your traceback log (lines 23 to 35):
from pandas.compat.numpy import *
try:
from pandas._libs import (hashtable as _hashtable,
lib as _lib,
tslib as _tslib)
except ImportError as e: # pragma: no cover
# hack but overkill to use re
module = str(e).replace('cannot import name ', '')
raise ImportError("C extension: {0} not built. If you want to import "
"pandas from the source directory, you may need to run "
"'python setup.py build_ext --inplace --force' to build "
"the C extensions first.".format(module))
From that I went into the
C:\Python27\Lib\site-packages\pandas_libs
and
C:\Python27\Lib\site-packages\pandas_libs\tslibs
folders and found the exact names of the modules that resulted the errors.
I hope that solves your problem as it did mine.
Cheers!

Apache spark-shell error import jars

I have a local spark 1.5.2 (hadoop 2.4) installation on Windows as explained here.
I'm trying to import a jar file that I created in Java using maven (the jar is jmatrw that I uploaded on here on github). Note the jar does not include a spark program and it has no dependencies to spark. I tried the following steps, but no one seems to work in my installation:
I copied the library in "E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar"
Edit spark-env.sh and add SPARK_CLASSPATH="E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar"
In a command window I run > spark-shell --jars "E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar", but it says "Warning: skip remote jar"
In the spark shell I tried to do scala> sc.addJar("E:/installprogram/spark-1.5.2-bin-hadoop2.4/lib/jmatrw-v0.1-beta.jar"), it says "INFO: added jar ... with timestamp"
When I type scala> import it.prz.jmatrw.JMATData, spark-shell replies with error: not found: value it.
I spent lot of time searching on Stackoverflow and on Google, indeed a similar Stakoverflow question is here, but I'm still not able to import my custom jar.
Thanks
There are two settings in 1.5.2 to reference an external jar. You can add it for the driver or for the executor(s).
I'm doing this by adding settings to the spark-defaults.conf, but you can set these at spark-shell or in SparkConf.
spark.driver.extraClassPath /path/to/jar/*
spark.executor.extraClassPath /path/to/jar/*
I don't see anything really wrong with the way you are doing it, but you could try the conf approach above, or setting these using SparkConf
val conf = new SparkConf()
conf.set("spark.driver.extraClassPath", "/path/to/jar/*")
val sc = new SparkContext(conf)
In general, I haven't enjoyed working with Spark on Windows. Try to get onto Unix/Linux.

Resources