How can I convert a JSON file in R to a dataframe?

I want to load the data from a JSON file into R to make a new dataframe. However, the JSON file consists of other links with data, so I can't seem to find the actual data in the JSON file. I got the JSON file from this website: https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json
This is the code I used:
library(rjson)
JSONList1 <- fromJSON(file = "utrecht2.json")
print(JSONList1)
JSONList1_df <- as.data.frame(JSONList1)
When I use this code, I get only 1 observation with 411 variables.
Any idea how to do this? I'm a beginner and I've never worked with JSON files.

Maybe try fromJSON() from the jsonlite package:
library(jsonlite)
JSONList1 <- fromJSON("https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json")
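For this particular file the parsed object is a nested list, so a quick way to find the tabular part is to inspect the structure first. A minimal sketch (the element index is an assumption about this file's layout; verify with str() before relying on it):

```r
library(jsonlite)

# parse the file; jsonlite simplifies JSON arrays of objects into data.frames
res <- fromJSON("p-route-autoparkeren.json")
str(res, max.level = 1)  # inspect the top level to locate the tabular piece

# assuming the first element holds the table
df <- res[[1]]
head(df)
```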

There are several packages offering JSON import abilities. If I use the one I am involved with (RcppSimdJson), the resulting data appears to contain a data.frame as the first list element.
d <- RcppSimdJson::fload("https://ckan.dataplatform.nl/dataset/467dc230-20e0-4c3a-8240-dccbfc20807a/resource/531cc276-b88e-49bb-a97f-443707936a12/download/p-route-autoparkeren.json")
> class(d)
[1] "list"
> class(d[[1]])
[1] "data.frame"
>
> head(d[[1]])
dynamicDataUrl
1 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/8d85bbdb-8bbd-4a24-b35f-85f21186ec04
2 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/21b0388a-56f7-4cba-8fd3-4a1c914f5fe2
3 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/45434989-3252-4c85-8731-c856b02c390c
4 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/9064b206-7e62-402d-ae62-f25a0e47571b
5 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/5829fb06-ee4a-4762-946c-ed6209edf7d5
6 http://opendata.technolution.nl/opendata/parkingdata/v1/dynamic/e4da517a-ef32-426d-821c-96e29ac5ac80
staticDataUrl
1 http://opendata.technolution.nl/opendata/parkingdata/v1/static/8d85bbdb-8bbd-4a24-b35f-85f21186ec04
2 http://opendata.technolution.nl/opendata/parkingdata/v1/static/21b0388a-56f7-4cba-8fd3-4a1c914f5fe2
3 http://opendata.technolution.nl/opendata/parkingdata/v1/static/45434989-3252-4c85-8731-c856b02c390c
4 http://opendata.technolution.nl/opendata/parkingdata/v1/static/9064b206-7e62-402d-ae62-f25a0e47571b
5 http://opendata.technolution.nl/opendata/parkingdata/v1/static/5829fb06-ee4a-4762-946c-ed6209edf7d5
6 http://opendata.technolution.nl/opendata/parkingdata/v1/static/e4da517a-ef32-426d-821c-96e29ac5ac80
limitedAccess identifier name
1 FALSE 8d85bbdb-8bbd-4a24-b35f-85f21186ec04 P06 - Sluisstraat
2 FALSE 21b0388a-56f7-4cba-8fd3-4a1c914f5fe2 3 - Burcht
3 FALSE 45434989-3252-4c85-8731-c856b02c390c P01 - Stationsplein
4 FALSE 9064b206-7e62-402d-ae62-f25a0e47571b Jaarbeurs P3 - Jaarbeurs P3
5 FALSE 5829fb06-ee4a-4762-946c-ed6209edf7d5 P03 - Dek Stadspoort
6 FALSE e4da517a-ef32-426d-821c-96e29ac5ac80 PG-Pieter Vreedeplein
locationForDisplay
1 NA
2 WGS84, 52.4387428557465, 4.82805132865906
3 WGS84, 52.2573226613971, 6.16240739822388
4 WGS84, 52.0854991774024, 5.10619640350342
5 WGS84, 52.256324421386, 6.15569114685059
6 WGS84, 51.5582297848141, 5.08894979953766
>
I would expect the other packages to behave similarly.

Related

Reading a JSON file (with 1 key to many values mapping) in R

I have a file named data.json. It has the following contents:
{
"ID":["1","2","3","4","5","6","7","8" ],
"Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
"Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5" ],
"StartDate":[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
"7/30/2013","6/17/2014"],
"Dept":[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}
In RStudio, I have installed the 'rjson' package and have the following code:
library("rjson")
myData <- fromJSON(file="data.json")
print(myData)
As per the description of the fromJSON() function, it should read the contents of the 'data.json' file into an R object 'myData'. When I executed it, I got the following error:
Error in fromJSON(file = "data.json") :
not all data was parsed (0 chars were parsed out of a total of 3 chars)
I validated the structure of the 'data.json' file on https://jsonlint.com/. It was valid.
I searched stackoverflow.com and found the following page: Error in fromJSON("employee.json") : not all data was parsed (0 chars were parsed out of a total of 13 chars)
My code already follows the answers given there, but the 'data.json' file is still not getting parsed.
I would be grateful if you could point out what mistake I am making in the R program or the JSON file, as I am new to both.
Thank You.
I can confirm the error for rjson, but jsonlite::fromJSON appears to work.
jsonlite::fromJSON('foo.dat') |> as.data.frame()
# ID Name Salary StartDate Dept
# 1 1 Rick 623.3 1/1/2012 IT
# 2 2 Dan 515.2 9/23/2013 Operations
# 3 3 Michelle 611 11/15/2014 IT
# 4 4 Ryan 729 5/11/2014 HR
# 5 5 Gary 843.25 3/27/2015 Finance
# 6 6 Nina 578 5/21/2013 IT
# 7 7 Simon 632.8 7/30/2013 Operations
# 8 8 Guru 722.5 6/17/2014 Finance
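One possible cause (an assumption, since I can't see the file's bytes): the error reports that very few characters were found, which is consistent with rjson choking on a UTF-8 byte-order mark or an encoding mismatch. Reading the text yourself first can sidestep that:

```r
# read the raw text, strip a possible BOM, and hand the string to rjson
txt <- paste(readLines("data.json", warn = FALSE, encoding = "UTF-8"),
             collapse = "\n")
txt <- sub("^\ufeff", "", txt)   # drop a leading byte-order mark if present
myData <- rjson::fromJSON(txt)
as.data.frame(myData)
```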

Loading multiple files into R at the same time (with similar file names)

I am trying to load multiple files into an R environment. I have tried something like the following:
files <- list.files(pattern = ".Rda", recursive = TRUE)
lapply(files,load,.GlobalEnv)
This only loads one data file (incorrectly). The problem I am finding is that all the files have the same names across the years. For example, "Year1/beer/beer.Rda" also exists as "Year2/beer/beer.Rda".
I am trying to rename the data files upon import so that beer1 and beer2 will correspond to beer year 1 and beer year 2, etc.
Does anybody have a better method of loading in the data? I have more than 2 years' worth of data.
File names:
[1] "Year1/beer/beer.Rda" "Year1/blades/blades.Rda" "Year1/carbbev/carbbev.Rda"
[4] "Year1/cigets/cigets.Rda" "Year1/coffee/coffee.Rda" "Year1/coldcer/coldcer.Rda"
[7] "Year1/deod/deod.Rda" "Year1/diapers/diapers.Rda" "Year1/factiss/factiss.Rda"
[10] "Year1/fzdinent/fzdinent.Rda" "Year1/fzpizza/fzpizza.Rda" "Year1/hhclean/hhclean.Rda"
[13] "Year1/hotdog/hotdog.Rda" "Year1/laundet/laundet.Rda" "Year1/margbutr/margbutr.Rda"
[16] "Year1/mayo/mayo.Rda" "Year1/milk/milk.Rda" "Year1/mustketc/mustketc.Rda"
[19] "Year1/paptowl/paptowl.Rda" "Year1/peanbutr/peanbutr.Rda" "Year1/photo/photo.Rda"
[22] "Year1/razors/razors.Rda" "Year1/saltsnck/saltsnck.Rda" "Year1/shamp/shamp.Rda"
[25] "Year1/soup/soup.Rda" "Year1/spagsauc/spagsauc.Rda" "Year1/sugarsub/sugarsub.Rda"
[28] "Year1/toitisu/toitisu.Rda" "Year1/toothbr/toothbr.Rda" "Year1/toothpa/toothpa.Rda"
[31] "Year1/yogurt/yogurt.Rda" "Year2/beer/beer.Rda" "Year2/blades/blades.Rda"
[34] "Year2/carbbev/carbbev.Rda" "Year2/cigets/cigets.Rda" "Year2/coffee/coffee.Rda"
[37] "Year2/coldcer/coldcer.Rda" "Year2/deod/deod.Rda" "Year2/diapers/diapers.Rda"
[40] "Year2/factiss/factiss.Rda" "Year2/fzdinent/fzdinent.Rda" "Year2/fzpizza/fzpizza.Rda"
[43] "Year2/hhclean/hhclean.Rda" "Year2/hotdog/hotdog.Rda" "Year2/laundet/laundet.Rda"
[46] "Year2/margbutr/margbutr.Rda" "Year2/mayo/mayo.Rda" "Year2/milk/milk.Rda"
[49] "Year2/mustketc/mustketc.Rda" "Year2/paptowl/paptowl.Rda" "Year2/peanbutr/peanbutr.Rda"
[52] "Year2/photo/photo.Rda" "Year2/razors/razors.Rda" "Year2/saltsnck/saltsnck.Rda"
[55] "Year2/shamp/shamp.Rda" "Year2/soup/soup.Rda" "Year2/spagsauc/spagsauc.Rda"
[58] "Year2/sugarsub/sugarsub.Rda" "Year2/toitisu/toitisu.Rda" "Year2/toothbr/toothbr.Rda"
[61] "Year2/toothpa/toothpa.Rda" "Year2/yogurt/yogurt.Rda"
One solution is to parse the file names and assign them as names to elements in a list of data frames. We'll use some sample data that has monthly sales for beer brands across two years that were saved as CSV files into two subdirectories, year1 and year2.
We will use lapply() to read the files into a list of data frames, and then use the names() function to name each element by prefixing year<x>. to the file name (excluding .csv).
fileList <- c("year1/beer.csv","year2/beer.csv")
data <- lapply(fileList,function(x){
read.csv(x)
})
# generate data set names to be assigned to elements in the list
fileNameTokens <- strsplit(fileList,"/|[.]")
theNames <- unlist(lapply(fileNameTokens,function(x){
paste0(x[1],".",x[2])
}))
names(data) <- theNames
# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6,]
...and the output.
> data[["year1.beer"]][1:6,]
Month Item Sales
1 1 Budweiser 83047
2 2 Budweiser 38374
3 3 Budweiser 47287
4 4 Budweiser 18417
5 5 Budweiser 23981
6 6 Budweiser 55471
>
Next, we'll print the first few rows of the second file.
> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
Month Item Sales
1 1 Budweiser 23847
2 2 Budweiser 33847
3 3 Budweiser 44400
4 4 Budweiser 35333
5 5 Budweiser 18710
6 6 Budweiser 63108
>
If one needs to access the files directly without relying on the list names, they can be assigned to the parent environment within the lapply() function via the assign() function, as noted in the other answer.
# alternate form, assigning directly to parent environment
data <- lapply(fileList,function(x){
# x is the filename, parse into strings to generate data set name
fileNameTokens <- unlist(strsplit(x,"/|[.]"))
assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), read.csv(x),pos=1)
})
head(year1.beer)
...and the output.
> head(year1.beer)
Month Item Sales
1 1 Budweiser 83047
2 2 Budweiser 38374
3 3 Budweiser 47287
4 4 Budweiser 18417
5 5 Budweiser 23981
6 6 Budweiser 55471
>
The technique also works with RDS files as follows.
data <- lapply(fileList,function(x){
# x is the filename, parse into strings to generate data set name
fileNameTokens <- unlist(strsplit(x,"/|[.]"))
assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), readRDS(x),pos=1)
})
head(year1.beer)
...and the output.
> head(year1.beer)
Month Item Sales
1 1 Budweiser 83047
2 2 Budweiser 38374
3 3 Budweiser 47287
4 4 Budweiser 18417
5 5 Budweiser 23981
6 6 Budweiser 55471
>
One option might be to load the files in a new environment and then assign them to a custom named object in the parent environment.
This is modified from https://stackoverflow.com/a/5577647/6561924
# first create custom names for objects (e.g. add folder names)
file_names <- gsub("/", "_", files)
file_names <- gsub("\\.Rda", "", file_names)
# function to load objects in new environ
load_obj <- function(f, f_name) {
env <- new.env()
nm <- load(f, env)[1] # load into new environ and capture name
assign(f_name, env[[nm]], pos = 1) # pos 1 is parent env
}
# load all
mapply(load_obj, files, file_names)
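An alternative sketch that keeps everything in one named list instead of writing into the global environment (assumes each .Rda file contains a single object; the element name Year1_beer_beer is derived from the paths shown in the question):

```r
files <- list.files(pattern = "\\.Rda$", recursive = TRUE)

data_list <- lapply(files, function(f) {
  env <- new.env()
  nm <- load(f, envir = env)[1]  # load into a scratch environment
  env[[nm]]                      # return the loaded object itself
})

# name each element after its path, e.g. "Year1_beer_beer"
names(data_list) <- gsub("\\.Rda$", "", gsub("/", "_", files))
head(data_list[["Year1_beer_beer"]])
```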

How to convert a list with identically named fields to a data.frame in R

I have a list in which every element has fields with the same names (only the values differ), and I need to convert it into a data.frame whose column names match the field names. Following is my list:
Data input (data input in json format.json)
library(rjson)
data <- fromJSON(file = "data input in json format.json")
head(data,3)
[[1]]
[[1]]$floors
[1] 5
[[1]]$elevation
[1] 15
[[1]]$bmi
[1] 23.7483
[[2]]
[[2]]$floors
[1] 4
[[2]]$elevation
[1] 12
[[2]]$bmi
[1] 23.764
[[3]]
[[3]]$floors
[1] 3
[[3]]$elevation
[1] 9
[[3]]$bmi
[1] 23.7797
And my expected data.frame is,
floors elevation bmi
5 15 23.7483
4 12 23.7640
3 9 23.7797
Can you help me figure this out?
Thanks in advance.
You can use jsonlite.
library(jsonlite)
Then use fromJSON() and specify the path to your file (or alternatively a URL or the raw text) in the argument txt:
fromJSON(txt = 'path/to/json/file.json')
The result is:
floors elevation bmi
1 5 15 23.7483
2 4 12 23.7640
3 3 9 23.7797
If you prefer rjson, you could first read the file as before:
data <- rjson::fromJSON(file = 'path/to/json/file.json')
Then use do.call() and rbind.data.frame() to convert the list to a dataframe:
do.call("rbind.data.frame", data)
As an alternative to do.call(), use data.table's rbindlist(), which is faster:
data.table::rbindlist(data)
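To see the conversions side by side, here is a self-contained sketch on an in-memory copy of the list (dplyr::bind_rows() is an extra option not mentioned above):

```r
data <- list(
  list(floors = 5, elevation = 15, bmi = 23.7483),
  list(floors = 4, elevation = 12, bmi = 23.7640),
  list(floors = 3, elevation = 9,  bmi = 23.7797)
)

do.call("rbind.data.frame", data)  # base R
data.table::rbindlist(data)        # data.table, fast on long lists
dplyr::bind_rows(data)             # dplyr, tolerant of missing fields
```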

Is there any way to read a .dat file from MovieLens into RStudio?

I am trying to use Import Dataset in RStudio to read ratings.dat from MovieLens.
Basically it has this format:
1::1::5::978824268
1::1022::5::978300055
1::1028::5::978301777
1::1029::5::978302205
1::1035::5::978301753
So I need to replace :: with : or ' or whitespace, etc. I use Notepad++; it loads the file quite fast (compared to Notepad) and can view very big files easily. However, when I do the replacement, it shows some strange characters:
"LF"
From some research, this is \n (line feed or line break). But I do not know why these are not shown when the file loads and only appear after the replacement. And when I load the file into RStudio, it still detects "LF" rather than a line break, which causes an error in data reading.
What is the solution for that? Thank you!
PS: I know there is Python code for converting this, but I don't want to use it. Is there any other way?
Try this:
url <- "http://files.grouplens.org/datasets/movielens/ml-10m.zip"
## this part is agonizingly slow
tf <- tempfile()
download.file(url,tf, mode="wb") # download archived movielens data
files <- unzip(tf, exdir=tempdir()) # unzips and returns a vector of file names
ratings <- readLines(files[grepl("ratings.dat$",files)]) # read ratings.dat file
ratings <- gsub("::", "\t", ratings)
# this part is much faster
library(data.table)
ratings <- fread(paste(ratings, collapse="\n"), sep="\t")
# Read 10000054 rows and 4 (of 4) columns from 0.219 GB file in 00:00:07
head(ratings)
# V1 V2 V3 V4
# 1: 1 122 5 838985046
# 2: 1 185 5 838983525
# 3: 1 231 5 838983392
# 4: 1 292 5 838983421
# 5: 1 316 5 838983392
# 6: 1 329 5 838983392
Alternatively (this uses the download code from jlhoward; he also updated his code to avoid the built-in functions and switch to data.table while I wrote this, but mine's still faster/more efficient :-)
library(data.table)
# i try not to use variable names that stomp on function names in base
URL <- "http://files.grouplens.org/datasets/movielens/ml-10m.zip"
# this will be "ml-10m.zip"
fil <- basename(URL)
# this will download to getwd() since you prbly want easy access to
# the files after the machinations. the nice thing about this is
# that it won't re-download the file and waste bandwidth
if (!file.exists(fil)) download.file(URL, fil)
# this will create the "ml-10M100K" dir in getwd(). if using
# R 3.2+ you can do a dir.exists() test to avoid re-doing the unzip
# (which is useful for large archives or archives compressed with a
# more CPU-intensive algorithm)
unzip(fil)
# fast read and slicing of the input
# fread will only split on a single delimiter so the initial fread
# will create a few blank columns. the [] expression filters those
# out. the "with=FALSE" is part of the data.table inanity
mov <- fread("ml-10M100K/ratings.dat", sep=":")[, c(1,3,5,7), with=FALSE]
# saner column names, set efficiently via data.table::setnames
setnames(mov, c("user_id", "movie_id", "rating", "timestamp"))
mov
## user_id movie_id rating timestamp
## 1: 1 122 5 838985046
## 2: 1 185 5 838983525
## 3: 1 231 5 838983392
## 4: 1 292 5 838983421
## 5: 1 316 5 838983392
## ---
## 10000050: 71567 2107 1 912580553
## 10000051: 71567 2126 2 912649143
## 10000052: 71567 2294 5 912577968
## 10000053: 71567 2338 2 912578016
## 10000054: 71567 2384 2 912578173
It's quite a bit faster than built-in functions.
Small improvement to #hrbrmstr's answer:
mov <- fread("ml-10M100K/ratings.dat", sep=":", select=c(1,3,5,7))

read.delim() in R doesn't read .txt files keeping original values for variables

Hi everybody, I am trying to load a .txt file in R and I am having trouble. My original .txt file has the following structure (I can't add a dput() version because the file loads incorrectly, but I include the variables and their values):
ID Key.Number
1708888894 4222200000549012
0208823891 0002200000549111
0508823891 1717100000549111
0999923891 1717100000549111
0708888894 0002200000591111
The file is named "Testing.txt" and contains two variables, ID and Key.Number, separated by tabs. To read the file I used this code:
test=read.delim("Testing.txt")
test
And I got this:
ID Key.Number
1 1708888894 4.222200e+15
2 208823891 2.200001e+12
3 508823891 1.717100e+15
4 999923891 1.717100e+15
5 708888894 2.200001e+12
As you can see, the leading zero was omitted from the ID column, and all Key.Number values are in scientific notation. I have also tried read.table(), but because of the nature of the source file (Testing is only an example) the column names end up in the first row, and when I use col.names = test[1,] I don't get the original names. This is what I got with read.table():
V1 V2
1 ID Key.Number
2 1708888894 4222200000549012
3 0208823891 0002200000549111
4 0508823891 1717100000549111
5 0999923891 1717100000549111
6 0708888894 0002200000591111
Many thanks for your help and advice; it is important to me.
DF <- read.table(text="ID Key.Number
1708888894 4222200000549012
0208823891 0002200000549111
0508823891 1717100000549111
0999923891 1717100000549111
0708888894 0002200000591111",
colClasses="character", header=TRUE, blank.lines.skip=FALSE)
# ID Key.Number
# 1
# 2 1708888894 4222200000549012
# 3 0208823891 0002200000549111
# 4 0508823891 1717100000549111
# 5 0999923891 1717100000549111
# 6 0708888894 0002200000591111
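Applied directly to the file from the question, the same colClasses fix should work with read.delim() (a sketch, assuming "Testing.txt" is tab-separated with a header row as described):

```r
# read both columns as character so leading zeros survive and long
# key numbers are not coerced into scientific notation
test <- read.delim("Testing.txt", colClasses = "character")
str(test)
```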
