I am trying to use the pdftables package to extract data from a PDF into a CSV file.
install.packages("pdftables")
library(pdftables)
write.csv(head(iris), file = "test.csv", row.names = FALSE)
# Manual step: open test.csv and print it to PDF as "test.pdf"
convert_pdf("test.pdf", "test2.csv")
However, I am getting the following error:
Error in get_content(input_file, format, api_key) : Bad Request
(HTTP 400).
What's the fix here?
Did you get an API token?
To use the package, you first need to sign up for the PDFTables API to get an API token (they offer a free plan that allows up to 50 pages).
See: https://cran.r-project.org/web/packages/pdftables/README.html
To use the PDFTables R package, you need to run the following command:
convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single",
            message = TRUE, api_key = "insert_API_key")
Make sure you replace insert_API_key with your API key, and change the file path and/or format.
More info here: https://pdftables.com/blog/convert-pdf-to-excel-r
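Applied to the example above, a minimal end-to-end sketch (the PDFTABLES_API_KEY environment variable is just a convention for keeping the key out of the script; the package only sees the string passed to api_key):
library(pdftables)
# Read the key from an environment variable rather than hard-coding it.
# The variable name is a convention here, not something pdftables looks up.
my_key <- Sys.getenv("PDFTABLES_API_KEY")
# Same files as in the question: test.pdf was printed from test.csv.
convert_pdf("test.pdf", "test2.csv", api_key = my_key)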
I have just received Academic Twitter Developer privileges and am attempting to scrape some tweets. I updated RStudio, regenerated a new bearer token once I got updated Academic Twitter access, and get_bearer() returns my new bearer token. However, I continue to get the following error:
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
something went wrong. Status code: 403
In addition: Warning messages:
1: Recommended to specify a data path in order to mitigate data loss when ingesting large amounts of data.
2: Tweets will not be stored as JSONs or as a .rds file and will only be available in local memory if assigned to an object.
Additionally, I have tried specifying a data path, but I think I am confused as to what this means. I think that's where my issue lies: does the data path mean a specific file path on my computer?
Below is the code I was attempting to use. This code worked previously with my professor's bearer token, which they used just to show the output:
tweets <-
  get_all_tweets(
    query = "#BlackLivesMatter",
    start_tweets = "2020-01-01T00:00:00Z",
    end_tweets = "2020-01-05T00:00:00Z",
    n = 100)
Thanks in advance!
Status code 403 means forbidden. You may want to check the error codes reference page of the Twitter API.
Perhaps your bearer token is misspelled?
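On the warnings: data_path is simply a directory on your computer where academictwitteR stores the returned tweets as JSON files, so a large pull is not lost if the session dies. A hedged sketch of the same call with the token and a storage directory made explicit (the directory name is a placeholder):
tweets <-
  get_all_tweets(
    query = "#BlackLivesMatter",
    start_tweets = "2020-01-01T00:00:00Z",
    end_tweets = "2020-01-05T00:00:00Z",
    n = 100,
    bearer_token = get_bearer(),  # make the token in use explicit
    data_path = "tweet_data/")    # local folder that will hold the JSON files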
I'm trying to run code in R to get some data via the fredr package, but I'm having trouble understanding the error R shows me.
The code I have:
library(fredr)
fredr_set_key("...")
cpi <- fredr::fredr(series_id = "CPIAUCSL",
                    observation_start = as.Date("1960-01-01"),
                    observation_end = as.Date("2005-12-01"))
The error I get:
Error in (function (endpoint, ..., to_frame = TRUE, print_req = FALSE) :
400: Bad Request. The value for variable api_key is not a 32 character alpha-numeric lower-case string. Read https://research.stlouisfed.org/docs/api/api_key.html for more information.
This code runs perfectly in the computer of my professor (who is a Windows user) so I think that the problem may be related to my Mac but I'm really not sure.
Mac OS 10.15.4
Did you use your API key? You should request one here: https://research.stlouisfed.org/docs/api/api_key.html
Then replace ... with your API key.
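The error message also tells you what a valid key looks like: a 32-character lower-case alphanumeric string. A small sketch of the two usual ways to supply it (the key shown is a placeholder; fredr reads the FRED_API_KEY environment variable):
library(fredr)
# Option 1: set the key for the current session (placeholder value).
fredr_set_key("abcdefghijklmnopqrstuvwxyz123456")
# Option 2: add the line FRED_API_KEY=yourkey to ~/.Renviron so it is
# picked up automatically on startup.
fredr_has_key()  # should return TRUE once the key is visible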
I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data everything seems to work fabulously with the filterStream function (to clarify: the function captures Twitter data, and just running it produces the JSON file, saved in the working directory, that is needed in the further steps):
filterStream( file.name="tweets_test.json",
track="NFL", tweets=20, oauth=credential, timeout=10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally, it seemed that the error came from special characters ({, then \) within the json file that I tried to clean up following the suggestion from this post, however, not much came out of it.
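Rewriting the capture so that it ends with a newline silences the readLines() warning (that warning only means the final line has no terminating newline), although it may not resolve the parsing errors; a minimal base-R sketch (the _fixed file name is just illustrative):
lines <- readLines("tweets_test.json", warn = FALSE)  # suppress the final-line warning
writeLines(lines, "tweets_test_fixed.json")           # writeLines terminates every line
tweet_df <- parseTweets(tweets = "tweets_test_fixed.json")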
I found out about the streamR package from this post, which shows the process as very straightforward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!
I am trying to load a .RData file into my R Notebook in DSX. I have followed the instructions in this notebook (https://apsportal.ibm.com/exchange/public/entry/view/90a34943032a7fde0ced0530d976ca82) but am still unable to load my data. So far, I have been successful in the following steps:
1. I have loaded my dataset into object storage.
2. I inserted my credentials using the Insert to code -> Insert Credentials button. This seemed to work as expected.
3. In the next cell, I chose the Insert to code -> Insert textConnection object option. This also seemed to work as expected.
The output of step 3 was as follows:
Your data file was loaded into a textConnection object and you can process the data with your package of choice.
data.1 <- getObjectStorageFileWithCredentials_xxxxxxxxxx("projectname", "file.RData")
After this, since my file is a .RData file, I typed the following command:
data <- load("file.RDA")
When I ran this cell, I got the following output:
Warning message in readChar(con, 5L, useBytes = TRUE):
“cannot open compressed file 'file.RDA', probable reason 'No such file or directory'”
Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection
Traceback:
load("file.RDA")
readChar(con, 5L, useBytes = TRUE)
When I type in the following command to print the dataset:
data
I get the following output:
X.html..h1.Forbidden..h1..p.Access.was.denied.to.this.resource...p...html.
Please can someone help?
Thanks,
Venky
Here is a workaround, given that load() can't read from a response object (the only way to read objects from Object Storage is the REST API).
I tried to use rawConnection instead of textConnection, but it does not seem to help.
So instead of passing the object read from Object Storage directly to the load or readRDS function, you can write it to the GPFS of the attached Spark service and read it from there, the same as reading a local file.
Change these lines in the generated code:
rawdata <- content(httr::GET(url = access_url, add_headers ("Content-Type" = "application/json", "X-Auth-Token" = x_subject_token)), as="raw")
rawdata
Basically, instead of returning text, return a raw object, and then write that out as a binary object to local GPFS.
data.3 <- getObjectStorageFileWithCredentials_216c032f3f574763ae975c6a83a0d523("testObjectStorage", "sample.rdata")
writeBin(data.3,"sample.rdata")
Now read it back using readRDS or load.
load("sample.rdata")
To see the loaded data frame:
ls()
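As a side note, which reader applies depends on how the file was written; a small sketch of both cases (file names illustrative):
load("sample.rdata")         # .RData/.rda: restores objects under their saved names
df <- readRDS("sample.rds")  # .rds: holds a single object that you assign yourself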
I hope it helps.
Thanks,
Charles.
I need to import an Excel file directly from the NYSE website. The spreadsheet URL is https://quotespeed.morningstar.com/exportChartDataToExcel.jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E777D89877788ECB . I tried using the gdata package and changing https to http, but it still doesn't work. Does anybody know a solution to this issue?
EDIT: It has to be imported into R directly from the website (project requirement).
Without information about why the gdata package does not work for you, I have to assume. Make sure you have Perl installed - you can download it at http://www.activestate.com/activeperl
This works for me:
library('gdata')
## URL broken into multiple lines for readability
url <- paste("https://quotespeed.morningstar.com/exportChartDataToExcel.",
"jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-",
"6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E",
"777D89877788ECB", sep = "")
url <- gsub("https", "http",url)
data <- read.xls(url, perl = "C:/Perl64/bin/perl.exe")
Without perl = "path_to_perl.exe" I got the error
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
Alternatively, use the RCurl package to download the file and Hadley's readxl package to read the Excel file.
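A hedged sketch of that approach, reusing the url object built above (it assumes the endpoint really serves a binary Excel file rather than an HTML table, which you should verify):
library(RCurl)
library(readxl)
# read_excel() cannot read from a URL, so download to a temporary file first.
tmp <- tempfile(fileext = ".xls")
bits <- getBinaryURL(url, ssl.verifypeer = FALSE)  # skip cert check, as with the http trick
writeBin(bits, tmp)
data <- read_excel(tmp)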