PDFPageCountError: Unable to get page count - pdf2image

I am trying to use pdf2image, but I am getting this error:
PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'C:\Users\user_name\Desktop\folder_name\folder2_name\folder3_name\007-084841-1 to 31 Dec'22': No error.
It is confusing because it doesn't give an actual reason; it just says 'No error'.
My code is:
doc = convert_from_path("C:\\Users\\user_name\\Desktop\\folder_name\\folder2_name\\folder3_name\\007-084841-1 to 31 Dec'22")
path, fileName = os.path.split("C:\\Users\\user_name\\Desktop\\folder_name\\folder2_name\\folder3_name\\007-084841-1 to 31 Dec'22")
fileBaseName, fileExtension = os.path.splitext(fileName)
for page_number, page_data in enumerate(doc):
    txt = pytesseract.image_to_string(Image.fromarray(page_data)).encode("utf-8")
    print("Page # {} - {}".format(str(page_number), txt))
Can anyone help me please?
I don't know what to try as the error message just says Unable to open...: No error
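For what it's worth, poppler's "I/O Error: Couldn't open file ...: No error" generally means it could not find or open a file at that exact path; note that the path being passed here has no .pdf extension. Below is a minimal sketch of the usual pdf2image + pytesseract flow, assuming a hypothetical file name and poppler location (convert_from_path already returns PIL images, so no fromarray conversion is needed):

import os
import pytesseract
from pdf2image import convert_from_path

# Hypothetical path; the file must exist at this exact location, extension included.
pdf_path = r"C:\Users\user_name\Desktop\folder_name\some_file.pdf"

# On Windows, poppler may also need to be located explicitly (assumed install path below).
pages = convert_from_path(pdf_path, dpi=200, poppler_path=r"C:\poppler\Library\bin")

fileBaseName, fileExtension = os.path.splitext(os.path.basename(pdf_path))

for page_number, page in enumerate(pages):
    # Each page is already a PIL Image, so it can be passed straight to pytesseract.
    txt = pytesseract.image_to_string(page)
    print("Page # {} - {}".format(page_number, txt))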

Related

rscopus package in R on MacBook - Invalid API key error

I am trying to use the Scopus API for the first time. I have the API key and the institution token. However, I am still getting an error when I try to use it in R on my Mac. Here is my code:
library(rscopus)
set_api_key(MY_KEY)
hdr=inst_token_header(MY_TOKEN)
key=get_api_key()
print(rscopus::get_api_key(), reveal=TRUE)
have_api_key()
auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE)
The error message is:
> library(rscopus)
>
> set_api_key(MY_KEY)
> hdr=inst_token_header(MY_TOKEN)
> key=get_api_key()
> print(rscopus::get_api_key(), reveal=TRUE)
[1] "MY_KEY"
> have_api_key()
[1] TRUE
>
> if (have_api_key()) {
+ auth = elsevier_authenticate(api_key=key)
+ }
HTTP specified is: https://api.elsevier.com/authenticate
Warning message:
In elsevier_authenticate(api_key = key) : Forbidden (HTTP 403).
> auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE)
$`service-error`
$`service-error`$status
$`service-error`$status$statusCode
[1] "AUTHENTICATION_ERROR"
$`service-error`$status$statusText
[1] "Invalid API Key: valid apikey credentials required."
Error in get_complete_author_info(...) : Service Error
I tried
if (have_api_key()) {
auth = elsevier_authenticate(api_key=key)
}
but I get the error:
HTTP specified is: https://api.elsevier.com/authenticate
Warning message:
In elsevier_authenticate(api_key = key) : Forbidden (HTTP 403).
I have tried using auth_token_header(MY_TOKEN) instead of inst_token_header(MY_TOKEN) but the code is still not working.
I have also taken the following step in my terminal:
export Elsevier_API=MY_KEY > ~/.bash_profile
source ~/.bash_profile
I am still getting the error. However, the combination of key and institution token works here: https://dev.elsevier.com/scopus.html
Can anyone please help me debug this issue?
Thank You!
I figured out the issue. So, the correct way to query would be to pass the headers argument as well:
auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE, headers=hdr)
And now the code will work! :)
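For anyone who hits the same thing, the full working sequence assembled from the snippets above (MY_KEY and MY_TOKEN stand in for your own credentials) is:

library(rscopus)

set_api_key(MY_KEY)                  # personal API key
hdr <- inst_token_header(MY_TOKEN)   # institution token wrapped as a request header

# The key alone returns 403 / AUTHENTICATION_ERROR here, so the header has to go with the query:
auth_info <- process_author_name(last_name = "Muschelli", first_name = "John",
                                 verbose = FALSE, headers = hdr)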

filterAndTrim : Error in add(bin) : record does not start with '#'

I'm using dada2 version ‘1.22.0’ on Windows 10. I have a list of compressed (.gz) fastq files, and when I use the function filterAndTrim I get this error message:
Error in add(bin) : record does not start with '#'
But when I check whether I can read the .gz file with library(ShortRead):
library(ShortRead)
fn <- "path/to/example.fastq.gz"
reads <- readFastq(fn)
it doesn't give any error message.
I don't understand why the function filterAndTrim gives the error message
Error in add(bin) : record does not start with '#'
Do you have any solution?
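In case it helps to narrow this down, here is a small diagnostic sketch (assuming your file paths are in a vector fns) that streams each file in chunks with ShortRead, roughly the way the filtering code reads them, so the file that triggers the error can be identified:

library(ShortRead)

fns <- c("path/to/example1.fastq.gz", "path/to/example2.fastq.gz")  # placeholder paths

for (fn in fns) {
  res <- tryCatch({
    strm <- FastqStreamer(fn, n = 1e5)          # read the file in chunks
    n_reads <- 0
    while (length(chunk <- yield(strm))) {
      n_reads <- n_reads + length(chunk)
    }
    close(strm)
    sprintf("OK (%d reads)", n_reads)
  }, error = function(e) paste("FAILED:", conditionMessage(e)))
  cat(fn, "->", res, "\n")
}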

Error when trying to parse an HTTP request in R

I'm using the R package httr to get an HTTP response for a specific link.
When trying to parse the content of the response, I'm getting the error:
Fehler in parse(text = script_content) : <text>:1:10: Unerwartete(s) '['
1: {"lines":[
Translated to English it says something like this (sorry for my error messages being in German):
Error in parse(text = script_content) : <text>:1:10: Unexpected '['
1: {"lines":[
It seems as if there is a problem with the format/encoding of the text. Here is my code:
script <-
GET(
url = "https://my_url.which_origin_is_not_important/my_script.R",
authenticate(username, pass)
)
script_content <- content(script, as = "text", encoding = "ISO-8859-1")
parsed_condent <- parse(text = script_content )
The value of script_content looks like this:
"{\"lines\":[{\"text\":\"################## FUNCTION ##################\"},{\"text\":\"\"},{\"text\":\"library(log4r)\"}],\"start\":0,\"size\":32,\"isLastPage\":true,\"limit\":500,\"nextPageStart\":null}"
Some more background on this operation: I'm trying to source code that currently sits inside a private repository. I wrote the code I'm trying to source myself, and I made sure the issue is not coming from within the code.
I got the solution from: Sourcing R files in a private github folder
Thanks for any advice!!
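Looking at the value of script_content, it is JSON (the repository API wraps the file as a list of line objects) rather than R source, which would explain why parse() stops at the '['. A minimal sketch, assuming the jsonlite package, that rebuilds the script text from that JSON before parsing it:

library(httr)
library(jsonlite)

# script_content is the JSON string shown above, not R code, so parse() cannot handle it directly.
parsed <- fromJSON(script_content)

# Reassemble the actual script from the "text" field of each line, then parse and evaluate it.
script_text <- paste(parsed$lines$text, collapse = "\n")
eval(parse(text = script_text))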

Error when scraping 2019 data with nflscrapR. Just started getting it

Everything was working great up to last Tuesday. Ran it again this weekend and today, getting the error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string: '\037‹\b'
install.packages("devtools")
devtools::install_github(repo = "maksimhorowitz/nflscrapR")
library(nflscrapR)
pbp_2019 <- scrape_season_play_by_play(2019, weeks = 9)
I expected to get the data as always, but this error above always pops up.
Any ideas?
I just redownloaded nflscrapR. I had to add 'force = TRUE' to the 'install_github()' command to get it to actually redownload.
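In code, that reinstall looks like this (force = TRUE makes install_github reinstall even though the package is already present):

devtools::install_github(repo = "maksimhorowitz/nflscrapR", force = TRUE)
library(nflscrapR)
pbp_2019 <- scrape_season_play_by_play(2019, weeks = 9)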

file_get_html() not working for one particular webpage

I want to call a simple DOM file
I tested with other links and it works, but with this URL it's not working.
My code is:
$bnadatos = file_get_html("http://www.rofex.com.ar/cem/FyO.aspx");
foreach($bnadatos->find('[#id="ctl00_ContentPlaceHolder1_gvFyO"]') as $i){
    echo "datos:";
    echo $i->innertext;
}
Response is a blank page.
What's wrong?
I solved it with:
$arrContextOptions = array(
    "ssl" => array(
        "verify_peer" => false,
        "verify_peer_name" => false,
    ),
);
$response = file_get_html("https://www.rofex.com.ar/cem/FyO.aspx", false, stream_context_create($arrContextOptions));
foreach($response->find('[#id="ctl00_gvwDDF"]/tbody/tr[2]/td[2]') as $i){
    echo $i->innertext;
}
Thank you @maio290 for lighting my road.
This is just a guess, but do you have your error reporting on?
Out of the box, this is not working with the simple-html-dom library:
Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed in /var/www/html/dom.php on line 83
Warning: file_get_contents(): Failed to enable crypto in /var/www/html/dom.php on line 83
Warning: file_get_contents(http://www.rofex.com.ar/cem/FyO.aspx): failed to open stream: operation failed in /var/www/html/dom.php on line 83
Fatal error: Call to a member function find() on boolean in /var/www/html/test.php on line 11
A fix for this can be found here. With that in place, I still get a blank page, which is due to a redirect response (301 Moved Permanently). To fix this, you need to modify
'follow_location' => false
to
'follow_location' => true
So now we get the proper site content. You can modify the selector to $html->find('#ctl00_ContentPlaceHolder1_gvFyO'); this will find all elements with the id ctl00_ContentPlaceHolder1_gvFyO. See the documentation as a reference.
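Putting the two adjustments together, the stream context would look roughly like this (a sketch based only on the options discussed above):

$context = stream_context_create(array(
    "ssl" => array(
        "verify_peer"      => false,   // skip certificate verification, as in the workaround above
        "verify_peer_name" => false,
    ),
    "http" => array(
        "follow_location" => true,     // follow the 301 redirect so the real page content comes back
    ),
));

$html = file_get_html("https://www.rofex.com.ar/cem/FyO.aspx", false, $context);
foreach ($html->find('#ctl00_ContentPlaceHolder1_gvFyO') as $i) {
    echo $i->innertext;
}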
