Convert JSON file with multiple lines to R dataframe

I'm using rjson to read a JSON file into R. However, the fromJSON(file="file.json") command only reads the first line of the file. Here's the JSON:
{"id":"a","emailAddress":"a#a.com","name":"abc"}
{"id":"b","emailAddress":"b#b.com","name":"def"}
{"id":"c","emailAddress":"c#c.com","name":"ghi"}
How do I get all 3 rows into an R dataframe? Note that the above content lives in a single file.

I found a hacky way to do this: first I read the whole file into a single string with readr, then I split it on newlines ("\n"), and finally I parse each line with fromJSON and bind the results into one data frame:
library(jsonlite)
library(readr)
json_raw <- readr::read_file("file.json")       # whole file as one string
json_lines <- unlist(strsplit(json_raw, "\n"))  # one element per JSON record
json_df <- do.call(rbind, lapply(json_lines,
  function(x) as.data.frame(jsonlite::fromJSON(x))))
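That said, this is newline-delimited JSON (NDJSON), and jsonlite ships a streaming reader for exactly that format, which makes the split-and-bind step unnecessary:
library(jsonlite)
# stream_in() parses one JSON record per line and returns a data frame
json_df <- stream_in(file("file.json"))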

Related

rjson reads 35-column 40-row file as one long row

I am trying to test unicode-heavy imports with various R packages. I'm through everything but JSON because of a persistent error: the file is read in as one long, single-row data frame. The file is available here.
I think I am following the instructions in the help. I have tried two approaches:
Read the data into an object, then convert to a data frame.
raw_json_data <- read_file("World Class.json")
test_json <- fromJSON(raw_json_data)
as.data.frame(test_json)
Read the file using fromJSON() then convert to a data frame. I happen to be using R's new pipe here, but that doesn't seem to matter.
rjson_json <- fromJSON(file = "World Class.json") |>
  as.data.frame()
In every attempt, I get the same result: a data frame with one row and 1400 columns. Is there a step I am missing in this conversion?
EDIT: I am not looking for the answer "Use package X instead". The rjson package seems to read in the JSON data, which has a quite simple structure. The problem is that the as.data.frame() call results in a one-row, 1400-column data frame, and I'm asking why that is.
Try the jsonlite package instead.
library(jsonlite)
## next line gives warning: JSON string contains (illegal) UTF8 byte-order-mark!
json_data <- fromJSON("World Class.json") # from file
dim(json_data)
[1] 40 35
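As to the why: rjson::fromJSON(file = ...) returns a nested list, here 40 records of 35 fields each, and as.data.frame() flattens every one of the 40 x 35 = 1400 leaf values into its own column. That is the one-row, 1400-column data frame you are seeing. If you want to stay with rjson, you can bind the records yourself; a minimal sketch, assuming every field is a scalar:
library(rjson)
json_list <- fromJSON(file = "World Class.json")  # list of 40 records
# turn each record into a one-row data frame, then stack them
json_df <- do.call(rbind, lapply(json_list, as.data.frame))
dim(json_df)  # 40 35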

Importing CSV file by read.csv but the function recognize wrong number of columns

I tried to import the CSV file from here: https://covid19.who.int/WHO-COVID-19-global-table-data.csv using the read.csv function:
WHO_data <- read.csv("https://covid19.who.int/WHO-COVID-19-global-table-data.csv")
But the WHO_data I got has only 12 columns, and the first column is treated as row names.
I tried another method, reading it as a tibble instead of a data frame:
library(readr)
WHO_data <- read_csv("https://covid19.who.int/WHO-COVID-19-global-table-data.csv")
It then gives the warning below:
Warning: 1 parsing failure.
row col expected actual file
1 -- 12 columns 13 columns 'https://covid19.who.int/WHO-COVID-19-global-table-data.csv'
Can anyone help me explain why this happens and how to fix this?
The file is improperly formatted: there is an extra comma at the end of the second line (the first data row), so that row has 13 fields while the header has 12. When the header has one fewer field than the data rows, read.csv follows its documented rule and uses the first column as row names. You can read the raw lines, remove the comma, then pass the result to read.csv. For example
file <- "https://covid19.who.int/WHO-COVID-19-global-table-data.csv"
rows <- readLines(file)
rows[2] <- gsub(",$", "", rows[2])
WHO_data <- read.csv(text=rows)
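You can see that row-name rule in miniature with inline data (the values here are made up for illustration):
# header has 2 fields, the data rows have 3: the first field becomes row names
read.csv(text = "a,b\nx,1,2\ny,3,4")
#   a b
# x 1 2
# y 3 4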
Here is another solution based on the data.table package. If you want a data.frame back (as opposed to a data.table), additionally pass the argument data.table=FALSE to the fread function:
library(data.table)
file <- "https://covid19.who.int/WHO-COVID-19-global-table-data.csv"
WHO_data <- fread(file, select = 1:12, fill = TRUE)  # select drops the stray 13th column; fill pads any short rows

rbind.fill based on common pattern R

I have a lot of text files whose names follow this format:
building_000000.txt
building_window_roof_000123.txt
building_window_roof_000126.txt
...
which I have listed using this command
files_list <- list.files(pattern="txt")
What I want to do is bind all files (data frames) whose names match the pattern "building_roof_window_\\d+" into a single .txt file by using mget(ls). I also want to use rbind.fill because not all data frames have the same number of columns. So this is what I tried:
building_roof_window <- do.call("rbind.fill", mget(ls(pattern="^building[_]roof[_]window[_]\\\\\\d+")))
But the result is an empty dataframe.
What am I missing? Is it perhaps due to the sloppy use of regex?
There are two problems. First, mget(ls(...)) looks up objects in your workspace, not files on disk, so unless each file has already been read into a variable of that exact name it matches nothing, which is why you get an empty result. Second, the regex escaping: inside an R string, \\d is how you write the \d character class. Select the filenames with the correct regex, read each file, then bind:
files_list <- list.files(pattern = 'building_roof_window_\\d+.*\\.txt$')
# adjust the read.table() arguments (sep, header, ...) to your files' format
data_list <- lapply(files_list, read.table, header = TRUE)
building_roof_window <- do.call(plyr::rbind.fill, data_list)
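If you would rather avoid plyr, dplyr's bind_rows() fills missing columns with NA the same way (a sketch, under the same assumptions about the files' format):
library(dplyr)
data_list <- lapply(files_list, read.table, header = TRUE)
building_roof_window <- bind_rows(data_list)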

Extracting a single cell value from multiple csv files in R

I have 500 .csv files with data that looks like:
[sample data screenshot]
I want to extract one cell (e.g. B4, whose value is 0.477) per csv file and combine those values into a single csv. What are some recommendations on how to do this easily?
You can try something like this:
library(readr)  # for read_lines() and write_lines()
# store the full paths of the csv files in the folder as a character vector
all.fi <- list.files("/path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE)
ans <- sapply(all.fi, function(i) {
  eachline <- read_lines(i, skip = 3, n_max = 1)  # read only the 4th line of the file
  unlist(strsplit(eachline, ","))[2]              # 2nd comma-separated field, i.e. cell B4
})
write_lines(ans, "/path/to/output.csv")
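A base R equivalent, assuming the files are small enough to read in full and have no header row:
# read each file whole and index row 4, column 2 (i.e. cell B4) directly
vals <- sapply(all.fi, function(f) read.csv(f, header = FALSE)[4, 2])
write.csv(data.frame(value = vals), "/path/to/output.csv")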
I cannot add a comment, so I will write my comment here.
Since your data is large and difficult to load file by file, try this: Importing multiple .csv files into R. It is similar to the first part of your problem. For the second part: save each file as a data.frame (as in the comment of @Bruno Zamengo) and then use the select and merge functions in R to pick out the values you need and combine them into a single csv file. I used this idea in my project. Do not forget to use lapply.

How to combine multiple datasets into one in R?

I have 3 text files, each of which has 14 similar columns. I want to first read these 3 files (data frames) and then combine them into one data frame. Following is what I have tried after finding some help on the R mailing list:
file_name <- list.files(pattern='sEMA*') # CREATING A LIST OF FILE NAMES OF FILES HAVING 'sEMA' IN THEIR NAMES
NGSim <- lapply (file_name, read.csv, sep=' ', header=F, strip.white=T) # READING ALL THE TEXT FILES
This piece of code reads all the files but does not combine them into one data frame. I have tried data.frame(NGSim), but R gives an error: cannot allocate vector of size 4.2 Mb. How can I combine the files into one single data frame?
Like this:
do.call(rbind, NGSim)
or, with plyr:
library(plyr)
rbind.fill(NGSim)
or
ldply(NGSim)
If file size is an issue, you may want to use data.table functions instead of less efficient base functions like read.csv():
library(data.table)
NGSim <- data.frame(rbindlist(lapply(list.files(pattern = 'sEMA*'), fread)))
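If the columns ever stop lining up across files, rbindlist() can match by name and pad missing columns with NA (a sketch reusing the same file pattern):
library(data.table)
NGSim <- data.frame(rbindlist(lapply(list.files(pattern = 'sEMA*'), fread),
                              use.names = TRUE, fill = TRUE))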
