R convert dataframe to JSON

I have a data frame that I'd like to convert to JSON format.
My data frame, called res1:
library(rjson)
res1 <- structure(list(id = c(1, 2, 3, 4, 5), value = structure(1:5, .Label = c("server1",
"server2", "server3", "server4", "server5"), class = "factor")), .Names = c("id",
"value"), row.names = c(NA, -5L), class = "data.frame")
When I do:
toJSON(res1)
I get this:
{"id":[1,2,3,4,5],"value":["server1","server2","server3","server4","server5"]}
I need the JSON output to look like this instead. Any ideas?
[{"id":1,"value":"server1"},{"id":2,"value":"server2"},{"id":3,"value":"server3"},{"id":4,"value":"server4"},{"id":5,"value":"server5"}]

The jsonlite package exists to address exactly this problem: "A practical and consistent mapping between JSON data and R objects."
Its toJSON function produces the desired result with the default options:
library(jsonlite)
x <- toJSON(res1)
cat(x)
## [{"id":1,"value":"server1"},{"id":2,"value":"server2"},
## {"id":3,"value":"server3"},{"id":4,"value":"server4"},
## {"id":5,"value":"server5"}]

How about
library(rjson)
x <- toJSON(unname(split(res1, 1:nrow(res1))))
cat(x)
# [{"id":1,"value":"server1"},{"id":2,"value":"server2"},
# {"id":3,"value":"server3"},{"id":4,"value":"server4"},
# {"id":5,"value":"server5"}]
By using split() we break the large data.frame into a list of one-row data.frames. Removing the names from that list makes toJSON wrap the results in a JSON array rather than a named object.
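To see why unname() matters, here is a sketch of the intermediate steps with a two-row version of res1:

```r
library(rjson)

res1 <- data.frame(id = c(1, 2), value = c("server1", "server2"))

# split() returns a *named* list of one-row data.frames
named <- split(res1, 1:nrow(res1))
names(named)
# [1] "1" "2"

# With names kept, rjson emits a named object keyed by "1", "2", ...
cat(toJSON(named))
# {"1":{"id":1,"value":"server1"},"2":{"id":2,"value":"server2"}}

# Dropping the names makes it emit a JSON array instead
cat(toJSON(unname(named)))
# [{"id":1,"value":"server1"},{"id":2,"value":"server2"}]
```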

Nowadays you can simply call jsonlite::write_json() directly on the data frame.
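A minimal sketch, writing to a temporary file (write_json() uses the same row-wise default as toJSON()):

```r
library(jsonlite)

res1 <- data.frame(id = c(1, 2), value = c("server1", "server2"))

# write_json() serializes the data frame row-wise and writes it to disk
path <- tempfile(fileext = ".json")
write_json(res1, path)

cat(readLines(path, warn = FALSE))
# [{"id":1,"value":"server1"},{"id":2,"value":"server2"}]
```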

You can also use library(jsonify)
jsonify::to_json( res1 )
# [{"id":1.0,"value":"server1"},{"id":2.0,"value":"server2"},{"id":3.0,"value":"server3"},{"id":4.0,"value":"server4"},{"id":5.0,"value":"server5"}]

Related

Joining character and numeric json elements in R

I am trying to join two JSON objects into a single JSON object in R using jsonlite.
The API that I am using needs a JSON object whose first element is the column names of a data frame, followed by the numeric values of each row. As a simple illustration, if I have the following:
set.seed(123)
df <- data.frame(A = rnorm(2), B = rnorm(2), C = rnorm(2))
Which I need to look like:
[["A", "B", "C"], [-0.5605,1.5587,0.1293],[-0.2302,0.0705,1.7151]]
But the following attempts fail at achieving the above:
c( jsonlite::toJSON( names(df) ), jsonlite::toJSON( df, "values" ))
paste0( jsonlite::toJSON( names(df) ), jsonlite::toJSON( df, "values" ))
Neither attempt produces the desired structure, and I haven't found any other suggestions for how to achieve this.
Any ideas would be appreciated.
An option is to split the data by row into a list (asplit with MARGIN = 1), concatenate (c) it with the column names, and apply toJSON:
library(jsonlite)
toJSON(c(list(names(df)), asplit(df, 1)))
#[["A","B","C"],[-0.5605,1.5587,0.1293],[-0.2302,0.0705,1.7151]]

How to build a new list base on another file and sort it in a certain way

I have a list called datalist that contains multiple data files; it looks like this:
Now I have a df that looks like this:
structure(list(Order = c(1, 2, 3, 4), Data = c("Bone Scan", "Brain Scan",
"", "Cancer History")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
How can I build a new data list which only contain the data that is in df$Data and stored in the order that appears in df?
Try subsetting datalist with df$Data; it returns the data in the same order as df$Data.
result <- datalist[df$Data]
We can also use purrr; note that pluck() extracts a single element at a time (a vector of names would be treated as recursive indices, like [[), so map over the names instead:
library(purrr)
map(df$Data, ~ pluck(datalist, .x))
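Here is a sketch with a made-up datalist standing in for the question's files (the real contents weren't shown); entries of df$Data with no matching name, like the empty string in the example, come back as NA slots you may want to drop first:

```r
# Hypothetical stand-in for the question's list of files
datalist <- list(
  "Cancer History" = data.frame(x = 1),
  "Bone Scan"      = data.frame(x = 2),
  "Brain Scan"     = data.frame(x = 3),
  "Blood Test"     = data.frame(x = 4)
)

df <- data.frame(Order = 1:3,
                 Data  = c("Bone Scan", "Brain Scan", "Cancer History"))

# `[` subsets by name and preserves the order of df$Data
result <- datalist[df$Data]
names(result)
# [1] "Bone Scan"      "Brain Scan"     "Cancer History"
```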

How to remove additional numbers in each cell in a dataframe

I am doing some data analyzing with R. I read a csv file. I would like to eliminate 000,000,000 from each cell. How can I get rid of only 000? I tried to use grep(), but it dropped rows.
This is the dataframe:
You can try this. I have included dummy data based on your screenshot (and please pay attention to the comment by @andrew_reece):
#Code
df$NewVar <- trimws(gsub('000','',df$VIOLATIONS_RAW),whitespace = ',')
Output:
VIOLATIONS_RAW NewVar
1 202,403,506,000 202,403,506
2 213,145,123 213,145,123
3 212,000 212
4 123,000,000,000 123
Some data used:
#Data
df <- structure(list(VIOLATIONS_RAW = c("202,403,506,000", "213,145,123",
"212,000", "123,000,000,000")), row.names = c(NA, -4L), class = "data.frame")
We could also do it in a more general way, removing zero-only fields of any length:
df$VIOLATIONS_RAW <- trimws(gsub("(?<=,)0+(?=(,|$))", "",
df$VIOLATIONS_RAW, perl = TRUE), whitespace=",")
df$VIOLATIONS_RAW
#[1] "202,403,506" "213,145,123" "212" "123"
data
df <- structure(list(VIOLATIONS_RAW = c("202,403,506,000", "213,145,123",
"212,000", "123,000,000,000")), row.names = c(NA, -4L), class = "data.frame")

Define empty data.table directly with the correct data type

In order to make my function more failsafe, I need to create an empty data.table, which does have a specific number of columns and a predefined data.type. This is to allow the later call to dplyr::union even though the data.table is empty.
Therefore, I would like to create an empty data.table and define the data types of the columns directly. This works for numeric or character columns, but fails for Date columns.
I found a possible solution in entry 2.4 of the data.table FAQ, but it seems a bit odd to first fill the data.table with placeholder values and remove them afterwards.
Code to replicate the issue:
library(data.table)
library(dplyr)
dt.empty <- data.table("Date" = character()
, "Char.Vector" = character()
, "Key.Variable" = character()
, "ExchangeRate" = numeric()
)
dt.Union <- data.table( "Date" = as.Date(c("2000-01-01", "2001-01-01"))
, "Char.Vector" = as.character(c("a", "b"))
, "Key.Variable" = as.character(c("x1", "x2"))
, "ExchangeRate" = as.numeric(c(2,1.4))
)
dplyr::union(dt.Union
, dt.empty)
Error: not compatible:
- Incompatible type for column `Date`: x Date, y character
- Incompatible type for column `ExchangeRate`: x numeric, y character
I could solve this by using dt.Union[0] to create dt.empty, but I thought perhaps there exists an easier way to do this.
You can follow the advice of FAQ 2.4 the first time if you're not sure how to write a length-zero vector for some class:
> dput(dt.Union[0])
structure(list(Date = structure(numeric(0), class = "Date"),
Char.Vector = character(0), Key.Variable = character(0),
ExchangeRate = numeric(0)), row.names = c(NA, 0L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x7ffd8d0ebee0>)
You can take the list(...) part out and your code becomes
myDT = setDT(list(
Date = structure(numeric(0), class = "Date"),
Char.Vector = character(0),
Key.Variable = character(0),
ExchangeRate = numeric(0)
))
More generally, dput(x[0L]) will show code to recreate the zero-length version of any vector.
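As an alternative to dput(), you can often write the zero-length typed vector directly: as.Date(character()) is a length-zero Date, so the empty table can be declared in one go. A sketch using the column names from the question:

```r
library(data.table)

# Zero-length vectors already carrying the right classes
dt.empty <- data.table(
  Date         = as.Date(character()),
  Char.Vector  = character(),
  Key.Variable = character(),
  ExchangeRate = numeric()
)

sapply(dt.empty, class)
#          Date  Char.Vector Key.Variable ExchangeRate
#        "Date"  "character"  "character"    "numeric"
```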

Splitting a column based on select character?

I have a dataframe with many columns. For one of the columns ('cols'), it roughly has this structure:
'x\y\z'
Some of the rows are 'x\y\z' and others are 'x\y'. I am only interested in the 'y' portion of the row.
I have been looking through various posts on stackoverflow by people with similar questions, but I have not been able to find a solution that works. The closest that I got was this (which resulted in an error):
x = strsplit(df['cols'], "\")
I have a feeling I may not be utilizing a package correctly. Any help would be great!
Edit: Included sample structure and expected output
Current structure:
cols
'test\foo\bar'
'test\foo'
'test\bar'
'test\foo\foo'
Expected output:
cols
'foo'
'foo'
'bar'
'foo'
We need to escape the backslash twice, once for the R string and once for the regex, which is why the pattern has four backslashes:
df$cols <- sapply(strsplit(df$cols, "\\\\"), `[`, 2)
df$cols
#[1] "foo" "foo" "bar" "foo"
Or with sub
sub("^\\w+.(\\w+).*", "\\1", df$cols)
#[1] "foo" "foo" "bar" "foo"
data
df <- structure(list(cols = c("test\\foo\\bar", "test\\foo", "test\\bar",
"test\\foo\\foo")), .Names = "cols", class = "data.frame", row.names = c(NA,
-4L))
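The strsplit route can be replayed step by step on its own (a sketch; "\\\\" in the pattern means one literal backslash, escaped once for the R string and once for the regex):

```r
x <- c("test\\foo\\bar", "test\\foo")

# Split each string on the literal backslash
parts <- strsplit(x, "\\\\")
parts
# [[1]] "test" "foo" "bar"
# [[2]] "test" "foo"

# `[`, 2 pulls out the second piece of each split
sapply(parts, `[`, 2)
# [1] "foo" "foo"
```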
You can have a look at a great package for data manipulation: tidyr
Then:
df = tidyr::separate(df, col = cols, into = c("x", "y", "z"), sep="\\\\")
(note the escaped backslash)