Converting JSON file with nested lists to R Dataframe

I have downloaded a JSON dataset from https://ped.uspto.gov/peds/#!/#%2FapiDocumentation and need to convert it to a data frame in R. I checked the guidance provided in "Flatten nested lists in dataframe after JSON import in R" and "Convert nested json file to Dataframe in R", but these were not helpful in my case.
Below are my R code and output. The output shows that several columns are nested lists. Kindly suggest how to convert these into a data frame. Thanks.
library(jsonlite)

df1 <- fromJSON(
  "E:\\IIMU\\Databases\\USPTO\\2020-2022-pairbulk-full-20220609-json\\2022.json",
  simplifyVector = TRUE,
  simplifyDataFrame = TRUE,
  simplifyMatrix = TRUE,
  flatten = TRUE
)
R output (screenshot not reproduced here): several columns of df1 are still nested lists.
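For reference, a minimal sketch of one common way to expand the remaining list columns (assuming each holds named lists or data frames; some_list_column below is a hypothetical placeholder, not an actual column name in the PEDS data):
library(dplyr)
library(tidyr)

# identify which columns are still lists after flattening
list_cols <- names(df1)[sapply(df1, is.list)]
print(list_cols)

# 'some_list_column' is a hypothetical placeholder column name.
# Expand one list column into extra rows (each element becomes its own row) ...
df_long <- df1 %>% unnest_longer(some_list_column)

# ... or into extra columns (each named element becomes its own column)
df_wide <- df1 %>% unnest_wider(some_list_column, names_sep = "_")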

Related

Using Character as Naming Convention in R

I am analyzing a data set and have created a function that summarizes most of my columns. The goal of my script is to automate the creation and extraction of summary tables (more or less data frames).
To generalize as much as possible, I want to pass a character string to my function to be used to name columns, rows, files and more.
What I am working with currently:
NameFun <- function(df, name) {
  ## Name the first column
  colnames(df)[1] <- "name"
  ## Write DF to Excel workbook
  write.xlsx(df, "Workbook.xlsx", sheetName = "name",
             col.names = TRUE, row.names = TRUE, append = TRUE)
}
The objective here is to input a character, "name", and use it within the function. I have tried "eval", "assign", and "get" with no luck. I have tried a few other approaches, but either R doesn't recognize the name in the environment, nothing happens at all, or it rejects the idea of passing a character altogether.
I am open to any other solutions that would help generalize my script even more. Each summary table will have a unique name but report the same number of columns and the same type of metrics. Ideally, I would be able to pass a list of column names to the function and loop it through the whole data set.
Thanks!
-J
You could probably do this:
library(xlsx)  # provides write.xlsx() with the sheetName/append arguments used here

# Initialize a list to hold your results
ll <- list()

# Run this in a loop (or multiple times) to collect your summaries
ll[[name]] <- summary_Method(...)  # or pass the df

NameFun <- function(name, ll, df) {
  ll[[name]] <- df
  ll  # return the updated list; R does not modify ll in place
}

# Write the list of data frames to an Excel file, one sheet per name
lapply(names(ll), function(x)
  write.xlsx(ll[[x]], "Workbook.xlsx", sheetName = x, append = TRUE))
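For comparison, a minimal sketch of passing the character straight through to the naming arguments (assuming the xlsx package, whose write.xlsx() accepts sheetName, col.names, row.names and append as used above): the key is to use the argument itself rather than the quoted literal "name".
library(xlsx)

NameFun <- function(df, name) {
  ## use the character argument, not the literal string "name"
  colnames(df)[1] <- name
  write.xlsx(df, "Workbook.xlsx", sheetName = name,
             col.names = TRUE, row.names = TRUE, append = TRUE)
}

## hypothetical usage: one sheet per element of a named list of summaries
# for (nm in names(summary_list)) NameFun(summary_list[[nm]], nm)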

Merging of multiple excel files in R

I am running into a basic problem in R. I have to merge 72 Excel files of the same type, with the same variables, into a single data set. I have used the code below for merging, but this is not practical for so many files. Can anyone help me, please?
data1 <- read.csv("D:/Customer_details1/data-01.csv")
data2 <- read.csv("D:/Customer_details2/data-02.csv")
data3 <- read.csv("D:/Customer_details3/data-03.csv")
data_merged  <- merge(data1, data2, all.x = TRUE, all.y = TRUE)
data_merged2 <- merge(data_merged, data3, all.x = TRUE, all.y = TRUE)
First, if the extensions are .csv, they're not Excel files, they're .csv files.
We can leverage the apply family of functions to do this efficiently.
First, let's create a list of the files:
setwd("D://Customer_details1/")
# create a list of all files in the working directory with the .csv extension
files <- list.files(pattern="*.csv")
Let's use purrr::map in this case, although we could also use lapply - updated to map_dfr to remove the need for reduce, by automatically rbind-ing into a data frame:
library(purrr)
mainDF <- files %>% map_dfr(read.csv)
You can pass additional arguments to read.csv if you need to: map_dfr(read.csv, ...)
Note that for rbind to work the column names have to be the same, and I'm assuming they are based on your question.
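For completeness, a minimal base-R sketch of the lapply route mentioned above (assuming all 72 files sit in one directory and share the same column names):
# list the full paths of every .csv file in the directory
files <- list.files("D:/Customer_details1", pattern = "\\.csv$", full.names = TRUE)

# read each file, then stack the resulting data frames row-wise
all_data <- lapply(files, read.csv)
mainDF   <- do.call(rbind, all_data)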
#Method I: read every sheet of a single workbook and row-bind them
library(readxl)
library(tidyverse)

path <- "C:/Users/Downloads"
setwd(path)

# import one sheet by name
fn.import <- function(x) read_excel("country.xlsx", sheet = x)

sheet <- excel_sheets("country.xlsx")
data_frame <- lapply(setNames(sheet, sheet), fn.import)
data_frame <- bind_rows(data_frame, .id = "Sheet")
print(data_frame)

#Method II: rio imports and row-binds in one call
install.packages("rio")
library(rio)

path <- "C:/Users/country.xlsx"
data <- import_list(path, rbind = TRUE)

Convert XML Doc to CSV using R - Or search items in XML file using R

I have a large, unorganized XML file that I need to search to determine whether certain ID numbers are in it. I would like to use R to do so, but because of the format I am having trouble converting the file to a data frame, or even a list, so I can extract it to a csv. I figured I can search easily if it is in csv format. So, I need help understanding how to convert and extract it properly, or how to search the document for values using R. Below is the code I have used to try to convert the doc, but several errors occur with my various attempts.
## Method 1. I tried to convert to a data frame, but each column is not the same length.
require(XML)
require(plyr)
file<-"EJ.XML"
doc <- xmlParse(file,useInternalNodes = TRUE)
xL <- xmlToList(doc)
data <- ldply(xL, data.frame)
datanew <- read.table(data, header = FALSE, fill = TRUE)
## Method 2. I tried to convert it to a list; the file extracts, but it only lists 2 words from the file.
data<- xmlParse("EJ.XML")
print(data)
head(data)
xml_data<- xmlToList(data)
class(data)
topxml <- xmlRoot(data)
topxml <- xmlSApply(topxml,function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml), row.names = NULL)
write.csv(xml_df, file = "MyData.csv",row.names=FALSE)
I am going to do some research on how to search within R as well, but I assume the file needs to be in a data frame or a list to do so either way. Any help is appreciated! Attached is a screenshot of the data. I am interested in finding entity ID numbers that match a list I have in an Excel doc.
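As a starting point for the search-only route, here is a minimal sketch using the XML package from Method 1 (the XPath //entity_id is a hypothetical placeholder; substitute the real element or attribute name from your file):
library(XML)

doc <- xmlParse("EJ.XML", useInternalNodes = TRUE)

# extract every value of a hypothetical <entity_id> element via XPath
ids_in_file <- xpathSApply(doc, "//entity_id", xmlValue)

# the ID numbers you are checking for (example values; read yours from the Excel list)
wanted_ids <- c("12345", "67890")

# which of the wanted IDs appear anywhere in the XML file?
wanted_ids[wanted_ids %in% ids_in_file]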

Export SpectraObjects to csv in ChemoSpec

I am using ChemoSpec to analyse FTIR spectra in R.
I was able to import several csv files using files2SpectraObject, applied some of the data pre-processing procedures, such as normalization and binning, and generated new SpectraObjects with the results.
Is it possible to export data back to csv format from the generated SpectraObjects?
So far I tried this
write.table(ftirbin, "E:/ftirbin.txt", sep="\t")
and got this:
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""Spectra"" to a data.frame
Thanks in advance!
G
If you look at ?Spectra you'll see how a Spectra object is stored. The intensity values are in your_object$data, and the frequency values are in your_object$freq. So you can't export the whole object (it's not a data frame, but rather a list), but you can export the pieces. To export the frequencies in the first column, and the samples in the following columns, you can do this (example uses a built in data set, SrE.IR):
tmp <- cbind(SrE.IR$freq, t(SrE.IR$data))
colnames(tmp) <- c("freq", SrE.IR$names)
tmp <- as.data.frame(tmp) # it was a matrix
Then you can write it out using write.csv or write.table (check the arguments to avoid row numbers).
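For example, a small sketch (the output file name is arbitrary):
# write the frequency/intensity table to csv, without row numbers
write.csv(tmp, "SrE_IR_export.csv", row.names = FALSE)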

How to write multiple data frames to a single csv file in a loop in R?

I would like to write a list of data frames, "neighbours_dataframe", to a single CSV file.
I use this loop to write the data frames to multiple files:
for (i in 1:vcount(karate)) {
  write.csv(neighbours_dataframe[[i]],
            file = as.character(V(karate3)$name[i]), row.names = FALSE)
}
If I use this code instead:
for (i in 1:vcount(karate)) {
  write.csv(neighbours_dataframe[[i]], file = "karate3.csv", row.names = FALSE)
}
it gives me just the last data frame in the csv file.
How can I get a single CSV file that contains all the data frames, with the column header written only once (from the first data frame) and every subsequent data frame appended below it?
thank you in advance
Two methods; the first is likely to be a little faster if neighbours_dataframe is a long list (though I haven't tested this).
Method 1: Convert the list of data frames to a single data frame first
As suggested by jbaums.
library(dplyr)
# rbind_all() has since been removed from dplyr; bind_rows() is its replacement
neighbours_dataframe_all <- bind_rows(neighbours_dataframe)
write.csv(neighbours_dataframe_all, "karate3.csv", row.names = FALSE)
Method 2: use a loop, appending
As suggested by Neal Fultz.
for (i in seq_along(neighbours_dataframe)) {
  write.table(
    neighbours_dataframe[[i]],
    "karate3.csv",
    append = i > 1,       # append everything after the first data frame
    sep = ",",
    row.names = FALSE,
    col.names = i == 1    # write the header only for the first data frame
  )
}
