rjson::fromJSON() fails to read a file that jsonlite::fromJSON() reads fine. Here's an example.
file test.json contents:
{"name": "Sanjay",
"unit_price": 130848,
"amount": 11,
"up_to_data_sales": 45725}
the jsonlite fromJSON outputs:
jsonlite::fromJSON("test.json")
$name
[1] "Sanjay"
$unit_price
[1] 130848
$amount
[1] 11
$up_to_data_sales
[1] 45725
But the same throws an error in rjson package.
rjson::fromJSON("test.json")
Error in rjson::fromJSON("test.json") : parseTrue: expected to see 'true' - likely an unquoted string starting with 't'.
Why does this error occur?
What is the reason rjson package was launched when jsonlite existed?
Well:
stringdist::stringdist("rjson", "jsonlite")
## [1] 5
That's a modest difference to begin with.
However, your assertion seems to be amiss:
library(magrittr)
rjson::fromJSON('{"name": "Sanjay",
"unit_price": 130848,
"amount": 11,
"up_to_data_sales": 45725}') %>% str()
## List of 4
## $ name : chr "Sanjay"
## $ unit_price : num 130848
## $ amount : num 11
## $ up_to_data_sales: num 45725
jsonlite::fromJSON('{"name": "Sanjay",
"unit_price": 130848,
"amount": 11,
"up_to_data_sales": 45725}') %>% str()
## List of 4
## $ name : chr "Sanjay"
## $ unit_price : int 130848
## $ amount : int 11
## $ up_to_data_sales: int 45725
Apart from jsonlite using a more diminutive data type for the numbers, they both parse the JSON fine.
So there's an issue with your file that you failed to disclose in the question.
A further incorrect assertion
-rw-rw-r-- 1 bob staff 2690 Jul 30 2007 rjson_0.1.0.tar.gz
-rw-rw-r-- 1 bob staff 400196 Dec 3 2013 jsonlite_0.9.0.tar.gz
not to mention:
-rw-rw-r-- 1 bob staff 873843 Oct 4 2010 RJSONIO_0.3-1.tar.gz
rjson came first. (dir listings came from the CRAN mirror sitting next to me).
You can actually read about the rationale and impetus behind jsonlite here: https://arxiv.org/abs/1403.2805 (which I got off the CRAN page for jsonlite).
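1) Why is the error coming? - The error is due to a mistake in the call syntax.
The first argument of rjson::fromJSON() is json_str, so a bare "test.json" is parsed as JSON text rather than opened as a file (hence the complaint about an unquoted string starting with 't'). rjson only reads from a file when the path is passed as 'file=', whereas jsonlite::fromJSON() accepts a path directly.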
# For example:
y <- rjson::fromJSON(file = "Input.json")
x <- jsonlite::fromJSON("Input.json")
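To see the difference without depending on the asker's file, here is a minimal sketch that writes a throwaway JSON file and calls rjson both ways. It assumes the rjson package is installed; the temp file and the variable names are made up for illustration.

```r
# Sketch only: assumes the rjson package is installed.
path <- tempfile(fileext = ".json")
writeLines('{"name": "Sanjay", "amount": 11}', path)

if (requireNamespace("rjson", quietly = TRUE)) {
  # Unnamed first argument is json_str, so the path itself is parsed as JSON and fails.
  as_text <- tryCatch(rjson::fromJSON(path), error = function(e) "parse error")
  # Naming the argument makes rjson open and read the file.
  from_file <- rjson::fromJSON(file = path)
  print(as_text)
  str(from_file)
}
```

The exact error message depends on the first character of the string being parsed, which is why the original call complained about 't'.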
2) What is the reason rjson package was launched when jsonlite existed?
First, rjson was launched before jsonlite and second, there is a difference in the way they read files:
For example, consider the following input:
{
"id": 1,
"prod_info": [
{
"product": "xyz",
"brand": "pqr",
"price": 500
},
{
"product": "abc",
"brand": "klm",
"price": 5000
}
]
}
prod_info in the above input is an array of 2 objects. jsonlite reads it as a data frame, while rjson reads it as a list of lists.
Outputs:
x
$id
[1] 1
$prod_info
product brand price
1 xyz pqr 500
2 abc klm 5000
y
$id
[1] 1
$prod_info
$prod_info[[1]]
$prod_info[[1]]$product
[1] "xyz"
$prod_info[[1]]$brand
[1] "pqr"
$prod_info[[1]]$price
[1] 500
$prod_info[[2]]
$prod_info[[2]]$product
[1] "abc"
$prod_info[[2]]$brand
[1] "klm"
$prod_info[[2]]$price
[1] 5000
class(x$prod_info)
[1] "data.frame"
class(y$prod_info)
[1] "list"
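If you want jsonlite to return nested lists as well, its simplification can be switched off. A minimal sketch, assuming jsonlite is installed; the JSON string is a compact copy of the input above, and the variable names are made up.

```r
# Sketch only: assumes the jsonlite package is installed.
txt <- '{"id": 1, "prod_info": [
  {"product": "xyz", "brand": "pqr", "price": 500},
  {"product": "abc", "brand": "klm", "price": 5000}]}'

if (requireNamespace("jsonlite", quietly = TRUE)) {
  as_df   <- jsonlite::fromJSON(txt)                          # default: simplify where possible
  as_list <- jsonlite::fromJSON(txt, simplifyVector = FALSE)  # keep everything as nested lists
  print(class(as_df$prod_info))
  print(class(as_list$prod_info))
}
```

Which form is more convenient depends on whether the records are regular enough to live in a data frame.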
The question has already been answered, but regarding differences between the two packages, I got bitten by one recently: how empty dictionaries are handled.
With rjson
> rjson::fromJSON("[]")
list()
> rjson::fromJSON("{}")
list()
Whereas, with jsonlite:
> jsonlite::fromJSON("[]")
list()
> jsonlite::fromJSON("{}")
named list()
That is, with rjson, you can't tell the difference between an empty list and an empty dictionary.
The translation to JSON works with both however, e.g. toJSON(structure(list(), names=character(0))) yields "{}".
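On the R side, the two cases can be told apart by checking for a names attribute. A small base R sketch; is_empty_object() is a hypothetical helper of mine, not something either package provides.

```r
# Base R only. is_empty_object() is a hypothetical helper, not part of rjson or jsonlite.
is_empty_object <- function(x) {
  is.list(x) && length(x) == 0 && !is.null(names(x))
}

empty_array  <- list()                                   # plain empty list
empty_object <- structure(list(), names = character(0))  # prints as "named list()"

is_empty_object(empty_array)
is_empty_object(empty_object)
```

This only helps with jsonlite output, since rjson returns a plain list() for both inputs.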
1) R version 3.4.4 (2018-03-15)
my.timedate <- as.POSIXlt('2016-01-01 16:00:00')
# print(attributes(my.timedate))
print(my.timedate[['hour']])
[1] 16
2) R version 3.5.0 (2018-04-23)
my.timedate <- as.POSIXlt('2016-01-01 16:00:00')
# print(attributes(my.timedate))
print(my.timedate[['hour']])
Error in FUN(X[[i]], ...) : subscript out of bounds
I think that is a known change in R 3.5.0 where the list elements of a POSIXlt need to be unpackaged explicitly. Using R 3.5.0:
edd@rob:~$ docker run --rm -ti r-base:3.5.0 \
R -q -e 'print(unclass(as.POSIXlt("2016-01-01 16:00:00"))[["hour"]])'
> print(unclass(as.POSIXlt("2016-01-01 16:00:00"))[["hour"]])
[1] 16
>
>
edd@rob:~$
whereas with R 3.4.* one does not need the unclass() as you showed:
edd@rob:~$ docker run --rm -ti r-base:3.4.3 \
R -q -e 'print(as.POSIXlt("2016-01-01 16:00:00")[["hour"]])'
> print(as.POSIXlt("2016-01-01 16:00:00")[["hour"]])
[1] 16
>
>
edd@rob:~$
I don't find a corresponding NEWS file entry though so not entirely sure if it is on purpose...
Edit: As others have noted, the corresponding NEWS entry is the somewhat opaque
* Single components of "POSIXlt" objects can now be extracted and
replaced via [ indexing with 2 indices.
From ?POSIXlt:
As from R 3.5.0, one can extract and replace single components via [ indexing with two indices (see the examples).
The example is a little opaque, but shows the idea:
leapS[1 : 5, "year"]
If you look at the source, though, you can see what's happening:
`[.POSIXlt`
#> function (x, i, j, drop = TRUE)
#> {
#> if (missing(j)) {
#> .POSIXlt(lapply(X = unclass(x), FUN = "[", i, drop = drop),
#> attr(x, "tzone"), oldClass(x))
#> }
#> else {
#> unclass(x)[[j]][i]
#> }
#> }
#> <bytecode: 0x7fbdb4d24f60>
#> <environment: namespace:base>
When j is supplied, the method uses j to pick the component out of unclass(x), where x is the POSIXlt object, and then uses i to subset that component. So with R 3.5.0, you use [ and preface the part of the datetime you want with the index of the datetime in the vector:
my.timedate <- as.POSIXlt('2016-01-01 16:00:00')
my.timedate[1, 'hour']
#> [1] 16
as.POSIXlt(seq(my.timedate, by = 'hour', length.out = 10))[2:5, 'hour']
#> [1] 17 18 19 20
Note that $ subsetting still works as usual:
my.timedate$hour
#> [1] 16
See ?DateTimeClasses (same as ?POSIXlt):
As from R 3.5.0, one can extract and replace single components via [ indexing with two indices
See also similar description in R NEWS CHANGES IN R 3.5.0.
Thus:
my.timedate[1, "hour"]
# [1] 16
# or leave the i index empty to select a component
# from all date-times in a vector
as.POSIXlt(c('2016-01-01 16:00:00', '2016-01-01 17:00:00'))[ , "hour"]
# [1] 16 17
See also Examples in the help text.
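For code that has to run on both sides of the 3.5.0 change, the options mentioned in these answers can be put side by side. A minimal base R sketch; the time zone is fixed only to keep the example reproducible, and the variable names are made up.

```r
# Base R only; the two-index form needs R >= 3.5.0.
lt <- as.POSIXlt(c("2016-01-01 16:00:00", "2016-01-01 17:00:00"), tz = "UTC")

by_dollar  <- lt$hour                 # works before and after 3.5.0
by_unclass <- unclass(lt)[["hour"]]   # works before and after 3.5.0
by_index   <- lt[, "hour"]            # two-index form, R >= 3.5.0 only

print(by_dollar)
print(by_unclass)
print(by_index)
```

If older R versions matter, $ or unclass() is probably the safer choice.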
Total R noob here.
I am having difficulty creating a list of stock tickers.
Here's the situation:
I've created a dataframe of tickers pulled in from Quandl's API.
x1<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker'))
I then try to put this dataframe into a list.
x2<-as.list(x1)
So that I can then use the API to pull data for all the tickers in the list.
x3<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker','dimension','datekey','revenue'),
dimension='ART', calendardate='2015-12-31',ticker=c(x2))
But, alas, this doesn't work.
Compare this, however, with when I pull specific tickers:
Quandl.datatable('SHARADAR/SF1', ticker=c('AAPL', 'TSLA'))
z = list('AAPL','TSLA')
The code behaves itself:
x3<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker','dimension','datekey','revenue'),
dimension='ART', calendardate='2015-12-31',ticker=z)
This is because each ticker is its own component in the list(z):
[[1]]
[1] "AAPL"
[[2]]
[1] "TSLA"
Whereas for x2 all the tickers are stored as a single list component:
[1] "AAPL", "TSLA", etc.
Therefore it'd be swell if I could find a way to convert vector x2 into a list where each element is its own component.
Thanks a bunch (and for your patience as well!)
This should work:
x = sapply(1:5000, list)
The length is 5000:
length(x)
[1] 5000
All elements are integers:
all(sapply(x, is.integer) == TRUE)
[1] TRUE
This also works with character vectors:
sapply(c('AAPL', 'MSFT', 'AMZN'), list)
$AAPL
[1] "AAPL"
$MSFT
[1] "MSFT"
$AMZN
[1] "AMZN"
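Note that sapply() names the components after the tickers. If an unnamed list like list('AAPL', 'TSLA') is wanted, as.list() on the vector gives that directly. A small base R sketch with made-up variable names:

```r
tickers <- c("AAPL", "MSFT", "AMZN")

via_sapply  <- sapply(tickers, list)  # one component per ticker, named after the tickers
via_as_list <- as.list(tickers)       # one component per ticker, no names

names(via_sapply)
names(via_as_list)
identical(via_as_list, unname(via_sapply))
```

Whether the names matter depends on how the receiving function treats a named list, which I have not checked for Quandl.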
One option could be as:
x1 <- c(list(),1:5000)
str(x1)
# List of 5000
# $ : int 1
# $ : int 2
# $ : int 3
# $ : int 4
# $ : int 5
# $ : int 6
# $ : int 7
# $ : int 8
#...
#.....
x1 is a one-column data frame. Because a data.frame really is a list under the hood, as.list() just gives you a list of columns, in this case list(x1$column1).
You need to run as.list on a vector to get the result you want. Either of these will work:
as.list(x1$your_column_name)
as.list(x1[["your_column_name"]])
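The difference is easy to check on a stand-in data frame. In this sketch x1 is built by hand rather than pulled from Quandl, and the column name ticker is an assumption.

```r
# x1 is a hand-made stand-in for the Quandl result; the column name is assumed.
x1 <- data.frame(ticker = c("AAPL", "TSLA", "MSFT"), stringsAsFactors = FALSE)

whole_frame <- as.list(x1)         # a list of columns
per_ticker  <- as.list(x1$ticker)  # a list of individual tickers

length(whole_frame)  # 1
length(per_ticker)   # 3
```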
I'm trying to read information from the Zillow API and am running into some data structure issues in R. My outputs are supposed to be xml and appear to be, but aren't behaving like xml.
Specifically, the object that GetSearchResults() returns to me is in a format similar to XML, but not quite right to read in R's XML reading functions.
Can you tell me how I should approach this?
#set directory
setwd('[YOUR DIRECTORY]')
# setup libraries
library(dplyr)
library(XML)
library(ZillowR)
library(RCurl)
# setup api key
set_zillow_web_service_id('[YOUR API KEY]')
xml = GetSearchResults(address = '120 East 7th Street', citystatezip = '10009')
data = xmlParse(xml)
This throws the following error:
Error: XML content does not seem to be XML
The Zillow API documentation clearly states that the output should be XML, and it certainly looks like it. I'd like to be able to easily access various components of the API output for larger-scale data manipulation / aggregation. Let me know if you have any ideas.
This was a fun opportunity for me to get acquainted with the Zillow API. My approach, following How to parse XML to R data frame, was to convert the response to a list, for ease of inspection. The onerous bit was figuring out the structure of the data through inspecting the list, particularly because each property might have some missing data. This was why I wrote the getValRange function to deal with parsing the Zestimate data.
results <- xmlToList(xml$response[["results"]])
getValRange <- function(x, hilo) {
ifelse(hilo %in% unlist(dimnames(x)), x["text",hilo][[1]], NA)
}
out <- apply(results, MARGIN=2, function(property) {
zpid <- property$zpid
links <- unlist(property$links)
address <- unlist(property$address)
z <- property$zestimate
zestdf <- list(
amount=ifelse("text" %in% names(z$amount), z$amount$text, NA),
lastupdated=z$"last-updated",
valueChange=ifelse(length(z$valueChange)==0, NA, z$valueChange),
valueLow=getValRange(z$valuationRange, "low"),
valueHigh=getValRange(z$valuationRange, "high"),
percentile=z$percentile)
list(id=zpid, links, address, zestdf)
})
data <- as.data.frame(do.call(rbind, lapply(out, unlist)),
row.names=seq_len(length(out)))
Sample output:
> data[,c("id", "street", "zipcode", "amount")]
id street zipcode amount
1 2098001736 120 E 7th St APT 5A 10009 2321224
2 2101731413 120 E 7th St APT 1B 10009 2548390
3 2131798322 120 E 7th St APT 5B 10009 2408860
4 2126480070 120 E 7th St APT 1A 10009 2643454
5 2125360245 120 E 7th St APT 2A 10009 1257602
6 2118428451 120 E 7th St APT 4A 10009 <NA>
7 2125491284 120 E 7th St FRNT 1 10009 <NA>
8 2126626856 120 E 7th St APT 2B 10009 2520587
9 2131542942 120 E 7th St APT 4B 10009 1257676
# setup libraries
pacman::p_load(dplyr,XML,ZillowR,RCurl) # I use pacman, you don't have to
# setup api key
set_zillow_web_service_id('X1-mykey_31kck')
xml <- GetSearchResults(address = '120 East 7th Street', citystatezip = '10009')
dat <- unlist(xml)
str(dat)
Named chr [1:653] "120 East 7th Street" "10009" "Request successfully
processed" "0" "response" "results" "result" "zpid" "text"
"2131798322" "links" ...
- attr(*, "names")= chr [1:653] "request.address" "request.citystatezip" "message.text" "message.code" ...
dat <- as.data.frame(dat)
dat <- gsub("text","", dat$dat)
I'm not exactly sure what you wanted to do with these results but they're there and they look fine:
head(dat, 20)
[1] "120 East 7th Street"
[2] "10009"
[3] "Request successfully processed"
[4] "0"
[5] "response"
[6] "results"
[7] "result"
[8] "zpid"
[9] ""
[10] "2131798322"
[11] "links"
[12] "homedetails"
[13] ""
[14] "http://www.zillow.com/homedetails/120-E-7th-St-APT-5B-New-York-NY-10009/2131798322_zpid/"
[15] "mapthishome"
[16] ""
[17] "http://www.zillow.com/homes/2131798322_zpid/"
[18] "comparables"
[19] ""
[20] "http://www.zillow.com/homes/comps/2131798322_zpid/"
As stated previously, the trick is to get the API into a list (as opposed to XML). Then it becomes quite simple to pull out whatever data you are interested in.
I wrote an R package that simplifies this. Take a look on github - https://github.com/billzichos/homer. It comes with a vignette.
Assuming the Zillow ID of the property you were interested in was 36086728, the code would look like this:
home_estimate("36086728")
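The list-first approach these answers describe can be tried without an API key. This is only a sketch: it assumes the XML package is installed, and the XML snippet is made up and far simpler than a real Zillow response.

```r
# Sketch only: assumes the XML package is installed. The snippet is invented for illustration.
txt <- paste0(
  "<result><zpid>123</zpid>",
  "<address><street>120 E 7th St</street><zipcode>10009</zipcode></address>",
  "</result>"
)

if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::xmlParse(txt, asText = TRUE)
  out <- XML::xmlToList(XML::xmlRoot(doc))  # nested R list mirroring the XML tree
  str(out)
}
```

Real responses carry attributes and optional fields, so expect extra levels and missing entries compared with this toy input.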
I have read an ascii (.spe) file into R. This file contains one column of, mostly, integers. However R is interpreting these integers incorrectly, probably because I am not specifying the correct format or something like that. The file was generated in Ortec Maestro software. Here is the code:
library(SDMTools)
strontium<-read.table("C:/Users/Hal 2/Desktop/beta_spec/strontium 90 spectrum.spe",header=F,skip=2)
str_spc<-vector(mode="numeric")
for (i in 1:2037)
{
str_spc[i]<-as.numeric(strontium$V1[i+13])
}
Here, for example, strontium$V1[14] has the value 0, but R is interpreting it as a 10. I think I may have to convert the data to some other format, or something like that, but I'm not sure and I'm probably googling the wrong search terms.
Here are the first few lines from the file:
$SPEC_ID:
No sample description was entered.
$SPEC_REM:
DET# 1
DETDESC# MCB 129
AP# Maestro Version 6.08
$DATE_MEA:
10/14/2014 15:13:16
$MEAS_TIM:
1516 1540
$DATA:
0 2047
Here is a link to the file: https://www.dropbox.com/sh/y5x68jen487qnmt/AABBZyC6iXBY3e6XH0XZzc5ba?dl=0
Any help appreciated.
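One likely cause of the 0-shows-up-as-10 symptom, assuming an R version before 4.0 where read.table() turned text columns into factors by default: as.numeric() on a factor returns the internal level codes, not the printed values. A small base R sketch with made-up data:

```r
# stringsAsFactors = TRUE mimics read.table's default before R 4.0.
v <- data.frame(V1 = c("10", "0", "3", "DET#"), stringsAsFactors = TRUE)$V1

codes  <- as.numeric(v)                                   # level codes, not the values
values <- suppressWarnings(as.numeric(as.character(v)))   # the values; non-numbers become NA

print(codes)
print(values)
```

Converting through as.character() first, or reading with stringsAsFactors = FALSE, avoids the surprise.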
I saw someone had made a parser for SPE Spectra files in python and I can't let that stand without there being at least a minimally functioning R version, so here's one that parses some of the fields, but gets you your data:
library(stringr)
library(gdata)
library(lubridate)
read.spe <- function(file) {
tmp <- readLines(file)
tmp <- paste(tmp, collapse="\n")
records <- strsplit(tmp, "\\$")[[1]]
records <- records[records!=""]
spe <- list()
spe[["SPEC_ID"]] <- str_match(records[which(startsWith(records, "SPEC_ID"))],
"^SPEC_ID:[[:space:]]*([[:print:]]+)[[:space:]]+")[2]
spe[["SPEC_REM"]] <- strsplit(str_match(records[which(startsWith(records, "SPEC_REM"))],
"^SPEC_REM:[[:space:]]*(.*)")[2], "\n")
spe[["DATE_MEA"]] <- mdy_hms(str_match(records[which(startsWith(records, "DATE_MEA"))],
"^DATE_MEA:[[:space:]]*(.*)[[:space:]]$")[2])
spe[["MEAS_TIM"]] <- strsplit(str_match(records[which(startsWith(records, "MEAS_TIM"))],
"^MEAS_TIM:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["ROI"]] <- str_match(records[which(startsWith(records, "ROI"))],
"^ROI:[[:space:]]*(.*)[[:space:]]$")[2]
spe[["PRESETS"]] <- strsplit(str_match(records[which(startsWith(records, "PRESETS"))],
"^PRESETS:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["ENER_FIT"]] <- strsplit(str_match(records[which(startsWith(records, "ENER_FIT"))],
"^ENER_FIT:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["MCA_CAL"]] <- strsplit(str_match(records[which(startsWith(records, "MCA_CAL"))],
"^MCA_CAL:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["SHAPE_CAL"]] <- str_match(records[which(startsWith(records, "SHAPE_CAL"))],
"^SHAPE_CAL:[[:space:]]*(.*)[[:space:]]*$")[2]
spe_dat <- strsplit(str_match(records[which(startsWith(records, "DATA"))],
"^DATA:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["SPE_DAT"]] <- as.numeric(gsub("[[:space:]]", "", spe_dat)[-1])
return(spe)
}
dat <- read.spe("strontium 90 spectrum.Spe")
str(dat)
## List of 10
## $ SPEC_ID : chr "No sample description was entered."
## $ SPEC_REM :List of 1
## ..$ : chr [1:3] "DET# 1" "DETDESC# MCB 129" "AP# Maestro Version 6.08"
## $ DATE_MEA : POSIXct[1:1], format: "2014-10-14 15:13:16"
## $ MEAS_TIM : chr "1516 1540"
## $ ROI : chr "0"
## $ PRESETS : chr [1:3] "None" "0" "0"
## $ ENER_FIT : chr "0.000000 0.002529"
## $ MCA_CAL : chr [1:2] "3" "0.000000E+000 2.529013E-003 0.000000E+000 keV"
## $ SHAPE_CAL: chr "3\n3.100262E+001 0.000000E+000 0.000000E+000"
## $ SPE_DAT : num [1:2048] 0 0 0 0 0 0 0 0 0 0 ...
head(dat$SPE_DAT)
## [1] 0 0 0 0 0 0
It needs some polish and there's absolutely no error checking (i.e. for missing fields), but no time today to deal with that. I'll finish the parsing and make a minimal package wrapper for it over the next couple days.
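If only the counts are needed, a much smaller base R reader may be enough. This is a sketch under assumptions: read_spe_counts() is a hypothetical helper, and it assumes that "$DATA:" is followed by a "first last" channel line and then one count per line, as in the excerpt from the question.

```r
# Hypothetical helper: extracts only the $DATA block.
# Assumes "$DATA:" is followed by a "first last" channel line, then one count per line.
read_spe_counts <- function(lines) {
  start <- which(trimws(lines) == "$DATA:")
  channels <- scan(text = lines[start + 1], quiet = TRUE)  # e.g. 0 2047
  n <- channels[2] - channels[1] + 1
  as.numeric(trimws(lines[start + 1 + seq_len(n)]))
}

# Tiny made-up file body for illustration
demo <- c("$SPEC_ID:", "No sample description was entered.",
          "$DATA:", "0 3", "       0", "       5", "      12", "       7",
          "$ROI:", "0")
read_spe_counts(demo)
```

On a real file the call would be something like read_spe_counts(readLines("strontium 90 spectrum.spe")), with the same lack of error checking as the longer function.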
How does one determine which architectures are supported by an installation of R? On a standard windows install, one may look for the existence of R_HOME/bin/*/R.exe where * is the architecture (typically i386 or x64). On a standard mac install from CRAN, there are no subdirectories.
I can query R for the default architecture using something like:
$ R --silent -e "sessionInfo()[[1]][[2]]"
> sessionInfo()[[1]][[2]]
[1] "x86_64"
but how do I know on mac/linux whether any sub-architectures are installed, and if so what they are?
R.version, R.Version(), R.version.string, and version provide detailed information about the version of R running.
Update, based on a better understanding of the question. This isn't a complete solution, but it seems you can get fairly close via a combination of the following commands:
# get all the installed architectures
arch <- basename(list.dirs(R.home('bin'), recursive=FALSE))
# handle different operating systems
if(.Platform$OS.type == "unix") {
arch <- gsub("exec","",arch)
if(all(arch == ""))
arch <- R.version$arch
} else { # Windows
# any special handling
}
Note that this won't work if you've built R from source and installed the different architectures in various different places. See 2.6 Sub-architectures of the R Installation and Administration manual for more details.
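For the running session itself, base R already exposes both the build architecture and the sub-architecture name. A minimal sketch; arch_info is just a made-up name, and this says nothing about other sub-architectures that may be installed alongside.

```r
# Base R only: describes the running session, not every installed sub-architecture.
arch_info <- list(
  build_arch = R.version$arch,   # CPU architecture this R was built for
  sub_arch   = .Platform$r_arch  # sub-architecture directory name, "" when none is used
)
str(arch_info)
```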
Using Sys.info() you have a lot of information on your system.
Maybe it can help here:
Sys.info()["machine"]
machine
"x86_64"
EDIT
One workaround to see every possible architecture is to download the log files from the RStudio CRAN mirror. It's not complete, but it's a good estimate of what you need.
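start <- as.Date('2012-10-01')
today <- as.Date('2013-07-01')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')
files <- file.path("/tmp", basename(urls))
# download any daily log that is not in /tmp yet (needed before the files can be read)
missing <- !file.exists(files)
invisible(Map(download.file, urls[missing], files[missing]))
# read.csv handles the gzip-compressed files directly
list_data <- lapply(files, read.csv, stringsAsFactors = FALSE)
data <- do.call(rbind, list_data)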
str(data)
## 'data.frame': 10694506 obs. of 10 variables:
## $ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
## $ time : chr "00:30:13" "00:30:15" "02:30:16" "02:30:16" ...
## $ size : int 35165 212967 167199 21164 11046 42294 435407 326143 119459 868695 ...
## $ r_version: chr "2.15.1" "2.15.1" "2.15.1" "2.15.1" ...
## $ r_arch : chr "i686" "i686" "x86_64" "x86_64" ...
## $ r_os : chr "linux-gnu" "linux-gnu" "linux-gnu" "linux-gnu" ...
## $ package : chr "quadprog" "lavaan" "formatR" "stringr" ...
## $ version : chr "1.5-4" "0.5-9" "0.6" "0.6.1" ...
## $ country : chr "AU" "AU" "US" "US" ...
## $ ip_id : int 1 1 2 2 2 2 2 1 1 3 ...
unique(data[["r_arch"]])
## [1] "i686" "x86_64" NA "i386" "i486"
## [6] "i586" "armv7l" "amd64" "000000" "powerpc64"
## [11] "armv6l" "sparc" "powerpc" "arm" "armv5tel"