Parsing error in jsonlite reading multiple validated json files in R - r

I'm new to JSON files in R. I have some JSON files of stock data, scraped online. The weird thing is that each individual JSON file can be read perfectly using fromJSON from the jsonlite library. I have checked individual files on jsonlint.com, and the site confirmed that they are valid JSON.
I have also tried validate() from jsonlite on single files, but there the files failed validation.
However, when I use lapply to read all the files, R gives me this error:
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
No data available for the given
Here is my code:
library(tidyr)
library(jsonlite)
ls <- list.files(pattern = "*.json")
Data.fromJson <- lapply(ls, fromJSON)
Sorry, but I think the problem might be with the data itself, so I don't want to trim it down into a dummy example, as that might hide the issue. Instead, I have uploaded a few of the JSON files to my Google Drive; here is the link:
https://drive.google.com/drive/folders/1zM4vj1TIseFKBSiNWe5yMY9BJPg-CIsv?usp=sharing
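One way to narrow this down, offered as a minimal sketch rather than a definitive fix: read each file inside tryCatch so that a single bad file does not abort the whole lapply, then list the files that failed. The quoted error fragment ("No data available for the given") hints that at least one file may hold a plain-text message instead of JSON, but that is only a guess from the message. Note also that validate() expects JSON text rather than a file path, so the sketch reads the file contents first. The helper names read_json_safely and is_valid_json_file, and the json_dir variable, are made up for illustration.

```r
library(jsonlite)

# Hypothetical helper: parse one file, returning NULL (plus a message) on failure
read_json_safely <- function(path) {
  tryCatch(
    fromJSON(path),
    error = function(e) {
      message("Could not parse ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: validate() takes JSON text, so pass the file contents
is_valid_json_file <- function(path) {
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  isTRUE(validate(txt))
}

json_dir <- "."  # assumption: the files sit in the working directory
# `pattern` is a regular expression, so escape the dot and anchor the end
files <- list.files(json_dir, pattern = "\\.json$", full.names = TRUE)

parsed <- lapply(files, read_json_safely)
names(parsed) <- basename(files)

# Files that did not parse; these are the ones to open and inspect by hand
failed <- files[vapply(parsed, is.null, logical(1))]
print(failed)
```

Once the failing files are known, they can be skipped or downloaded again, and the remaining entries of parsed used as before.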

Related

Error when parsing JSON file into R - how to fix?

Using the package rtweet, I have streamed some tweets and saved them in a JSON file.
When using the following: tweets_df <- parse_stream('file.json'), I get the following error during the process:
Does anyone have any idea how to fix this so that the JSON file can be read into R as a data frame?
Have you tried it this way? I don't personally use rtweet but work with json files.
#load library to read json
library(jsonlite)
json_data <- fromJSON("db.json")
It reads the file as a nested list, which you can then convert to a data frame using
df <- rlist::list.stack(json_data, fill = TRUE)
You might have to adapt it, for example by looping over the users if your JSON file contains several.
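If the streamed file holds one JSON object per line (newline-delimited JSON, which is an assumption about how the stream was saved), fromJSON will reject it because the file as a whole is not a single JSON document. In that case jsonlite's stream_in may be a better fit. A small self-contained sketch with made-up tweet records:

```r
library(jsonlite)

# Made-up stand-in for the streamed file: one JSON object per line
tmp <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "text": "first"}',
  '{"id": 2, "text": "second"}'
), tmp)

# stream_in() reads the connection line by line and binds the records
# into a data frame; it opens and closes the connection itself
tweets_df <- stream_in(file(tmp), verbose = FALSE)
str(tweets_df)
```

With a real file, the temporary file would be replaced by the path to the saved stream, e.g. file("file.json").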

Importing to R an Excel file saved as web-page

I would like to open an Excel file saved as webpage using R and I keep getting error messages.
The desired steps are:
1) Upload the file into RStudio
2) Change the format into a data frame / tibble
3) Save the file as an xls
The message I get when I open the file in Excel is that the file format (excel webpage format) and extension format (xls) differ. I have tried the steps in this answer, but to no avail. I would be grateful for any help!
I don't expect anybody will be able to give you a definitive answer without a link to the actual file. The complication is that many services will write files as .xls or .xlsx without them being valid Excel format. This is done because Excel is so common and some non-technical people feel more confident working with Excel files than a csv file. Now, the files will have been stored in a format that Excel can deal with (hence your warning message), but R's libraries are more strict and don't see the actual file type they were expecting, so they fail.
That said, the below steps worked for me when I last encountered this problem. A service was outputting .xls files which were actually just HTML tables saved with an .xls file extension.
1) Download the file to work with it locally. You can script this of course, e.g. with download.file(), but this step helps eliminate other errors involved in working directly with a webpage or connection.
2) Load the full file with readHTMLTable() from the XML package
library(XML)
dTemp = readHTMLTable([filename], stringsAsFactors = FALSE)
This will return a list of dataframes. Your result set will quite likely be the second element or later (see ?readHTMLTable for an example with explanation). You will probably need to experiment here and explore the list structure as it may have nested lists.
3) Extract the relevant list element, e.g.
df = dTemp[[2]]  # double brackets return the data frame itself, not a one-element list
You also mention writing out the final data frame as an xls file which suggests you want the old-style format. I would suggest the package WriteXLS for this purpose.
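The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the file here is a made-up HTML table written to a temporary path with an .xls extension, and the column names are invented. With a real file the table of interest may well be a later list element.

```r
library(XML)

# Made-up stand-in for the real file: an HTML table saved with an .xls extension
fake_xls <- tempfile(fileext = ".xls")
writeLines(c(
  "<html><body><table>",
  "<tr><th>school</th><th>pupils</th></tr>",
  "<tr><td>North</td><td>120</td></tr>",
  "<tr><td>South</td><td>95</td></tr>",
  "</table></body></html>"
), fake_xls)

# Parse every <table> in the file into a list of data frames
tables <- readHTMLTable(fake_xls, header = TRUE, stringsAsFactors = FALSE)
str(tables, max.level = 1)  # see how many tables came back

df <- tables[[1]]  # this toy file contains a single table

# Cells arrive as character; let R guess better column types
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```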
I seriously doubt the Excel file is 'saved as a web page'. I'm pretty sure the file just sits on a server and all you have to do is fetch it. Some kinds of files (in particular Excel and h5) are binary rather than text files. Downloading them needs an added setting (mode = "wb") to tell R that the file is binary and should be handled accordingly.
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download.file(url=myurl, destfile="localcopy.xlsx", mode="wb")
Or use the downloader package and try something like this:
library(downloader)
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download(myurl, destfile="localcopy.xlsx", mode="wb")
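Which of the two answers applies depends on what the downloaded file really contains, and that can be checked from its first bytes. The sketch below uses only base R; the helper name sniff_format is made up, and the check is deliberately rough (for instance, a file that starts with a byte-order mark would be reported as unknown). It relies on two well-known signatures: zip containers such as .xlsx start with the bytes 50 4B ("PK"), and legacy binary .xls files start with D0 CF 11 E0.

```r
# Hypothetical helper: guess the real format from the first bytes of a file
sniff_format <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8)
  if (length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0x50, 0x4B)))) {
    "zip container (e.g. .xlsx)"
  } else if (length(bytes) >= 4 &&
             identical(bytes[1:4], as.raw(c(0xD0, 0xCF, 0x11, 0xE0)))) {
    "OLE2 binary (e.g. legacy .xls)"
  } else if (grepl("^\\s*<", rawToChar(bytes[bytes != as.raw(0)]), useBytes = TRUE)) {
    # Leading "<" after optional whitespace: most likely HTML or XML text
    "text markup (HTML or XML)"
  } else {
    "unknown"
  }
}

# Usage, with the local copy from the download step:
# sniff_format("localcopy.xlsx")
```

If the result is text markup, the readHTMLTable route is the one to try; if it is a zip container or OLE2 binary, a regular Excel reader should work once the file has been downloaded in binary mode.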

PDF File Import R

I have multiple .pdf-files (stored in a local folder) that contain text. I would like to import the .pdf-files (i.e., the texts) into R. I applied the function 'read_dir' (R package: textreadr):
library("textreadr")
Data <- read_dir("<MY PATH>")
The function works well. BUT: for several files whose names include special characters (i.e., letters such as 'ć'; e.g., 'filenameć.pdf'), the function did not work (error message: 'The following files failed to read in and were removed:' …).
What can I do?
Renaming the files might be a workaround. I tried to rename them via R, but that did not work (probably for the same reason).
I did not want to rename the files manually :)
Follow-Up (only for experts):
For several files, I got one of the following error messages (and I have no idea why):
PDF error: Mismatch between font type and embedded font file
or
PDF error: Couldn't find trailer dictionary
Any suggestions or hints how to solve this issue?
The issue likely concerns the encoding of the file names. If you absolutely want to use R to rename the files for you, the function you want is iconv: determine the encoding of the file names and then convert them to UTF-8.
However, a much better approach would be to rename them using bash from the command line. Can you provide a more complete set of examples?
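For the R route, a rough sketch of the renaming idea looks like this. It goes one step further than the answer and converts the names to plain ASCII, so that no special characters remain at all. The helper names ascii_names and rename_to_ascii are made up, the from = "UTF-8" default is an assumption about how the names are stored on your system, and every byte that cannot be converted is replaced by "_". Two different names could end up identical after conversion, so check the mapping before renaming anything.

```r
# Hypothetical helper: replace every non-ASCII byte in a name with "_"
ascii_names <- function(x, from = "UTF-8") {
  iconv(x, from = from, to = "ASCII", sub = "_")
}

# Hypothetical helper: rename the matching files in `dir` and report the mapping
rename_to_ascii <- function(dir, pattern = "\\.pdf$", from = "UTF-8") {
  old <- list.files(dir, pattern = pattern)
  new <- ascii_names(old, from = from)
  changed <- old != new
  ok <- file.rename(file.path(dir, old[changed]), file.path(dir, new[changed]))
  data.frame(old = old[changed], new = new[changed], renamed = ok)
}

# Usage (the path is the placeholder from the question):
# rename_to_ascii("<MY PATH>")
```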

Wrong encoding with fromJSON of jsonlite library

I have converted a json data from a .json file to an R object with using fromJSON() of jsonlite library like this:
library(jsonlite)
jsonR<-fromJSON(txt="data.json")
If I explore the string values of the jsonR object, I see some strange character sequences.
For example, if a string value in the original "data.json" was 😩, then R reads it as \xf0\u009f\u0098©. And when I write this value back to a file with cat(), it becomes <f0>.
Can anyone suggest how to keep the original encoding intact during the conversion?
There must be something wrong with your requested URL. If that's not the problem, try it with the following packages:
library(RCurl) or library(RJSONIO)
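Another thing that may be worth trying, assuming the file itself is UTF-8 and the garbling happens on the R side: read the text with an explicit encoding, hand the text (not the path) to fromJSON, and write strings back as raw UTF-8 bytes. This is a sketch with a made-up one-field JSON file, not a confirmed fix for the setup in the question.

```r
library(jsonlite)

# Made-up stand-in for data.json: a UTF-8 file containing an emoji
tmp <- tempfile(fileext = ".json")
writeLines(enc2utf8('{"mood": "\U0001F629"}'), tmp, useBytes = TRUE)

# Read the text, declaring it as UTF-8, then parse the text itself
txt <- paste(readLines(tmp, encoding = "UTF-8", warn = FALSE), collapse = "\n")
jsonR <- fromJSON(txt)

# Write the value back without re-encoding it to the native locale
out <- tempfile(fileext = ".txt")
writeLines(enc2utf8(jsonR$mood), out, useBytes = TRUE)
```

The useBytes = TRUE argument tells writeLines to emit the bytes as they are, which avoids a round trip through the native encoding.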

Reading an Excel file into an R dataframe from a zipped folder

I have an Excel file (.xls extension) that is inside a zipped folder that I would like to read as a dataframe into R. I loaded the gdata library and set up my working directory to the folder that houses the zipped folder.
When I type in the following syntax:
data_frame1 <- read.xls( unz("./Data/Project1.zip","schools.xls"))
I get the following error messages:
Error in path.expand(xls) : invalid 'path' argument
Error in file.exists(tfn) : invalid 'file' argument
I'm guessing that I'm missing some arguments in the syntax, but I'm not entirely sure what else needs to be included.
Thanks for your help! This R newbie really appreciates it!
Unfortunately, after a quick survey of all the xls functions I know, there is no xls reading function that can recognize the unz output (I would love to be proven wrong here). If it were a csv file, it would work fine. As it stands, until such a function is written, you must do the loading in two steps: extraction and then loading.
To give you a little more control, you can specify which file to unzip as well as the directory to place the files with unzip.
# default exdir is current directory
unzip(zipfile="./Data/Project1.zip", files = "schools.xls", exdir=".")
dataframe_1 <- read.xls("schools.xls")
Sadly, this also means that you must do cleanup afterwards if you don't want the 'xls' file hanging around.
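The cleanup can be automated by extracting into a temporary directory and removing it when the function exits. The sketch below uses only base R for the extraction; the helper name read_from_zip is made up, and the reader function is passed in so that any file-based reader can be used.

```r
# Hypothetical helper: extract one member of a zip archive to a temporary
# directory, read it with `reader`, and delete the extracted copy afterwards
read_from_zip <- function(zipfile, member, reader, ...) {
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  # Runs when the function exits, whether reading succeeded or not
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  path <- unzip(zipfile, files = member, exdir = exdir)
  if (length(path) != 1L) {
    stop("Could not extract '", member, "' from ", zipfile)
  }
  reader(path, ...)
}

# Usage with the names from the question (requires the gdata package):
# data_frame1 <- read_from_zip("./Data/Project1.zip", "schools.xls", gdata::read.xls)
```

Because the reader is an argument, the same helper works for csv files with read.csv or for other Excel readers.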
