How to get public GitHub JSON into R as a list of lists?

I would like to experiment with some soccer data made publicly available by Statsbomb on their GitHub page. Here's a link to one of the JSON files available there:
https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/7298.json
My question is, how can I get this into R? I have tried the following:
httr::content(httr::GET("https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/7298.json"))
However, this simply returns a length-1 character vector with the whole JSON squeezed into one string. I would prefer to have it as a list of lists. How could I do this?
Thanks !!
EDIT: here is Statsbomb's public github repo - if it helps at all!

If you want to turn the JSON file into an R object, you'll need to actually parse the data, not just download the file. The jsonlite package makes this easy:
url <- "https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/7298.json"
mydata <- jsonlite::read_json(url)
mydata is then a big list with all the parsed values from the JSON object.
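To see what the parsed result looks like without hitting the network, here is a minimal sketch that writes a small JSON array to a temporary file and reads it back with jsonlite::read_json. The field names and values are invented for illustration and are not taken from the Statsbomb file.

```r
library(jsonlite)

# Made-up sample: an array of objects, each with a nested object
json_text <- '[
  {"id": 1, "type": {"id": 35, "name": "Starting XI"}},
  {"id": 2, "type": {"id": 30, "name": "Pass"}}
]'

# Write the sample to a temporary file so read_json() has something to read
json_path <- tempfile(fileext = ".json")
writeLines(json_text, json_path)

# read_json() keeps the nesting by default (simplifyVector = FALSE),
# so the result is a list of lists
events <- read_json(json_path)

length(events)          # number of top-level elements
events[[2]]$type$name   # drill into a nested value
```

Passing simplifyVector = TRUE instead asks jsonlite to collapse the result into vectors and data frames where it can.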

Related

how to read json data and retrieve all the values in the tabular format in R

{"_id":{"$oid":"58eb2d682d546d63a0252c1e"},"CookieID":"2202256445201745143942335","RespCode":"201","Dateval":"05042017155753521","urlval":"http://taxguru.in/goods-and-service-tax/sales-tax-forms-demystified.html"}
{"_id":{"$oid":"58eb2d682d546d63a0252c1f"},"CookieID":"493511602017451518184","RespCode":"300","Dateval":"05042017155751163","urlval":"http://taxguru.in/income-tax/understanding-44ab-tax-audit.html"}
{"_id":{"$oid":"58eb2d682d546d63a0252c20"},"CookieID":"201745145852229","RespCode":"300","Dateval":"05042017155750765","urlval":"http://taxguru.in/income-tax/capital-gains-income-sale-agricultural-land.html"}
I need to read my JSON file in R using the rjson package and get the entries in the form of a table.
I tried the following code:
library(rjson)
exposure=fromJSON(file="expodata.json")
expo_data=as.data.frame(exposure)
# View(expo_data)
When I view the data, it shows me only the first row and not the others.
Where am I going wrong? How can I get all the rows when I view it?
I am a beginner and struggling to work around this. Can someone help?
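The file shown above holds one JSON object per line (newline-delimited JSON) rather than a single JSON document, which would explain why only the first record comes back. One possible approach, sketched here with jsonlite instead of the rjson package the question asks about, is jsonlite::stream_in, which reads one record per line. The sample records are shortened, made-up stand-ins for the real file.

```r
library(jsonlite)

# Shortened, made-up records in the same one-object-per-line layout
ndjson_lines <- c(
  '{"_id":{"$oid":"a1"},"CookieID":"111","RespCode":"201"}',
  '{"_id":{"$oid":"a2"},"CookieID":"222","RespCode":"300"}',
  '{"_id":{"$oid":"a3"},"CookieID":"333","RespCode":"300"}'
)

# Stand-in for expodata.json
expo_path <- tempfile(fileext = ".json")
writeLines(ndjson_lines, expo_path)

# stream_in() parses each line as its own JSON record and returns a data frame
expo_data <- stream_in(file(expo_path), verbose = FALSE)

# The nested "_id" object becomes a data frame column; flatten() spreads it
# into ordinary columns
expo_data <- jsonlite::flatten(expo_data)

nrow(expo_data)
```

If rjson must be used, the same idea applies: read the file line by line and parse each line separately before combining the results.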

Get a list from R string that contains a csv

For one of my projects I need to import a dataset (a CSV file) outside of R and then assign it to an R variable from the Ruby side of the project (this is done with rinruby and already works).
In my R script I now need to create a list out of that CSV data.
The variable contains an escaped string that holds the original CSV.
data <- "\"\",\"futime\",\"fustat\",\"age\",\"resid.ds\",\"rx\",\"ecog.ps\"\n\"1\",59,1,72.3315,2,1,1\n\"2\",115,1,74.4932,2,1,1\n\"3\",156,1,66.4658,2,1,2\n\"4\",421,0,53.3644,2,2,1\n\"5\",431,1,50.3397,2,1,1\n\"6\",448,0,56.4301,1,1,2\n\"7\",464,1,56.937,2,2,2\n\"8\",475,1,59.8548,2,2,2\n\"9\",477,0,64.1753,2,1,1\n\"10\",563,1,55.1781,1,2,2\n\"11\",638,1,56.7562,1,1,2\n\"12\",744,0,50.1096,1,2,1\n\"13\",769,0,59.6301,2,2,2\n\"14\",770,0,57.0521,2,2,1\n\"15\",803,0,39.2712,1,1,1\n\"16\",855,0,43.1233,1,1,2\n\"17\",1040,0,38.8932,2,1,2\n\"18\",1106,0,44.6,1,1,1\n\"19\",1129,0,53.9068,1,2,1\n\"20\",1206,0,44.2055,2,2,1\n\"21\",1227,0,59.589,1,2,2\n\"22\",268,1,74.5041,2,1,2\n\"23\",329,1,43.137,2,1,1\n\"24\",353,1,63.2192,1,2,2\n\"25\",365,1,64.4247,2,2,1\n\"26\",377,0,58.3096,1,2,1"
And I would like to convert this to an R list.
So my approach is basically to call read.csv(data_as_string) but unfortunately the signature is read.csv(file_where_data_lies).
How can this be done?
Thanks so much!
As Therkel mentioned above, myfunc(file = textConnection(data)) did exactly what I wanted. Thanks!
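A minimal sketch of that approach, using a shortened three-row version of the string from the question. It shows both the textConnection route and the text argument of read.csv, then converts the data frame to a list. Treating the first column as row names (row.names = 1) is an assumption about what is wanted.

```r
# Shortened version of the CSV string from the question
data <- "\"\",\"futime\",\"fustat\",\"age\"\n\"1\",59,1,72.3315\n\"2\",115,1,74.4932\n\"3\",156,1,66.4658"

# Option 1: wrap the string in a connection and pass it as the file
con <- textConnection(data)
df_con <- read.csv(file = con, row.names = 1)

# Option 2: hand the string over directly through the text argument
df_txt <- read.csv(text = data, row.names = 1)

# A data frame is already a list of columns; as.list() makes that explicit
data_list <- as.list(df_txt)
names(data_list)
```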

Command to easily share the contents of an R data frame

I have a data frame loaded successfully in R.
I would like to give the data in df to someone else so they can use it quickly and easily, without having to load the file into a data frame again.
What is the command to output the whole data of df (not the str())?
You can save the object to an .RData file using save or save.image, depending on your needs. The first saves specific objects, while the latter dumps the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option, as @user1945827 mentioned, is dput, which produces a string that can be parsed in another R session. This will not work for complex objects (such as S4 objects).
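Both options can be sketched as follows. The data frame here is a made-up example, and a temporary file stands in for wherever the .RData file would actually be shared.

```r
# Made-up data frame standing in for df
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

# Option 1: save the object to a file and load it elsewhere
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)

restored <- new.env()            # load into a separate environment
load(rda_path, envir = restored) # so the original df is not overwritten

# Option 2: turn the object into R code that can be pasted into a message
code <- paste(capture.output(dput(df)), collapse = "\n")
cat(code, "\n")

# The recipient evaluates the pasted text to rebuild the object
df_copy <- eval(parse(text = code))
```

The dput route is convenient for small objects in emails or forum posts, while save is better suited to larger data.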

How to put datasets into an R package

I am creating my own R package and I was wondering what are the possible methods that I can use to add (time-series) datasets to my package. Here are the specifics:
I have created a package subdirectory called data and I am aware that this is the location where I should save the datasets that I want to add to my package. I am also cognizant of the fact that the files containing the data may be .rda, .txt, or .csv files.
Each series of data that I want to add to the package consists of a single column of numbers (eg. of the form 340 or 4.5) and each series of data differs in length.
So far, I have saved all of the datasets into a .txt file. I have also successfully loaded the data using the data() function. Problem not solved, however.
The problem is that each series of data loads as a factor except for the series greatest in length. The series that load as factors contain missing values (of the form '.'). I had to add these missing values in order to make each column of data the same in length. I tried saving the data as unequal columns, but I received an error message after calling data().
A consequence of adding missing values to get the data to load is that once the data is loaded, I need to remove the NA's in order to get on with my analysis of the data! So, this clearly is not a good way of doing things.
Ideally (I suppose), I would like the data to load as numeric vectors or as a list. In this way, I wouldn't need the NA's appended to the end of each series.
How do I solve this problem? Should I save all of the data into one single file? If so, in what format should I do it? Perhaps I should save the datasets into a number of files? Again, in which format? What is the best practical way of doing this? Any tips would greatly be appreciated.
I'm not sure I understood your question correctly, but if you edit your data into the form you want and save it with
save(myediteddata, file="data.rda")
the data should be loaded exactly the way you saw it in R.
To load all files in the data directory, you should add
LazyData: true
to the DESCRIPTION file in your package.
If this doesn't help, you could post one of your files and a printout of the format you want; this will help us to help you ;)
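As a sketch of how saving to .rda avoids the padding problem described in the question, series of different lengths can be stored together as a named list in a single file. The object name, the values, and the temporary directory standing in for the package's data/ folder are all made up for illustration.

```r
# Made-up series of unequal length, kept as a named list of numeric vectors
my_series <- list(
  short = c(340, 4.5, 12),
  long  = c(1.2, 3.4, 5.6, 7.8, 9)
)

# Temporary directory standing in for the package's data/ folder
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "my_series.rda")

# The saved object keeps its name, types and lengths
save(my_series, file = rda_path)

# Load into a fresh environment to check what comes back
env <- new.env()
load(rda_path, envir = env)
str(env$my_series)
```

No missing values need to be appended, because the list does not require its elements to have equal length.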
In addition to saving as rda files you could also choose to load them as numeric with:
read.table( ... , colClasses="numeric")
Or as non-factor-text:
read.table( ..., as.is=TRUE) # which does pretty much the same as stringsAsFactors=FALSE
read.table( ..., colClasses="character")
It also appears that the data function would accept these arguments, since it is documented to be a simple wrapper for read.table(..., header=TRUE).
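For the padded .txt layout described in the question, a rough sketch of reading the columns as numeric could look like this. It assumes the '.' placeholders should be treated as missing values via na.strings, which is an addition to the arguments mentioned above, and the sample text is made up.

```r
# Made-up stand-in for the .txt file: two series padded with "." to equal length
raw_text <- "short long\n340 1.2\n4.5 3.4\n. 5.6\n"

# Declare "." as the missing-value marker so every column can be read as numeric
series <- read.table(
  text = raw_text,
  header = TRUE,
  na.strings = ".",
  colClasses = "numeric"
)

# Drop the padding to get a list of numeric vectors of differing length
series_list <- lapply(series, function(col) col[!is.na(col)])
str(series_list)
```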
Preferred saving location of your data depends on its format.
As Hadley suggested:
If you want to store binary data and make it available to the user,
put it in data/. This is the best place to put example datasets.
If you want to store parsed data, but not make it available to the
user, put it in R/sysdata.rda. This is the best place to put data
that your functions need.
If you want to store raw data, put it in inst/extdata.
I suggest you have a look at the linked chapter as it goes into detail about working with data when developing R packages.
You'll need to create the data file and include it in the R package, and you may want to also document it. Here's how to do both.
Create the data file and include it in R package
Create a directory inside the package called /data and place any data in it. Use only .rda and .RData files.
When creating the rda/RData file from an R object, make sure the R object is named what you want it to be named when it's used in the package and use save() to create it. Example:
save(river_fish, file = "data/river_fish.rda", version = 2)
Add this on a new line in the file called DESCRIPTION:
LazyData: true
Documenting the dataset
Document the dataset by placing a string with the dataset name after the documentation:
#' This is data to be included in my package
#'
#' @author My Name \email{blahblah@@roxygen.org}
#' @references \url{data_blah.com}
"data-name"
Here and here are some nice examples from dplyr.
Notes
To access the data in the package, run river_fish or whatever the name of the dataset is. Nothing more is needed.
Using version = 2 when calling save() keeps your data object readable by older R versions (i.e. prior to 3.5.0), which prevents this warning:
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R.
No need to use load() in the R package (just call the object directly instead e.g. river_fish will be enough to yield the data from data/river_fish.rda), but in the event you do wish to load an rda/RData file for some reason (e.g. playing around or testing), this will do it:
load("data/river_fish.rda")
Informative sources here and here

How to print the structure of an R object to the console

What is the command for printing the structure of R objects so the object can be re-created by running the printed output?
The output often contains the structure function, and you can copy and paste the output into your code in order to easily create an object for a reproducible example.
I'm breaking my head all morning over this command that I ought to know.
The function is dput (or dump).
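A short sketch of the difference between the two, using a made-up list: dput prints an expression that recreates the object, while dump writes an assignment (name included) that can be read back with source.

```r
# Made-up object for illustration
scores <- list(team = "A", goals = c(2L, 0L, 3L))

# dput() prints the expression that rebuilds the object
dput(scores)
# list(team = "A", goals = c(2L, 0L, 3L))

# dump() takes object names and writes "scores <- ..." to a file
dump_path <- tempfile(fileext = ".R")
dump("scores", file = dump_path)

# source() the file into a separate environment to recreate the object
env <- new.env()
source(dump_path, local = env)
```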
I think the str function (short for "structure") gives more consistently useful information than dump.
From the R help:
Compactly Display the Structure of an Arbitrary R Object
Description
Compactly display the internal structure of an R object, a diagnostic function and an
alternative to summary (and to some extent, dput). Ideally, only one line for each
‘basic’ structure is displayed. It is especially well suited to compactly display the
(abbreviated) contents of (possibly nested) lists. The idea is to give reasonable
output for any R object. It calls args for (non-primitive) function objects.
