I want to make a data frame from the following JSON sample:
{"gender": "M", "age": 68, "id": "e2127556f4f64592b11af22de27a7932", "became_member_on": "20180426", "income": 70000}
{"gender": null, "age": 118, "id": "8ec6ce2a7e7949b1bf142def7d0e0586", "became_member_on": "20170925", "income": null}
{"gender": null, "age": 118, "id": "68617ca6246f4fbc85e91a2a49552598", "became_member_on": "20171002", "income": null}
{"gender": "M", "age": 65, "id": "389bc3fa690240e798340f5a15918d5c", "became_member_on": "20180209", "income": 53000}
{"gender": null, "age": 118, "id": "8974fc5686fe429db53ddde067b88302", "became_member_on": "20161122", "income": null}
{"gender": null, "age": 118, "id": "c4863c7985cf408faee930f111475da3", "became_member_on": "20170824", "income": null}
{"gender": null, "age": 118, "id": "148adfcaa27d485b82f323aaaad036bd", "became_member_on": "20150919", "income": null}
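We can use jsonlite::stream_in, which reads newline-delimited JSON (one object per line). Here str1 is the sample string defined under "Data" below: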
out <- jsonlite::stream_in(textConnection(str1))
str(out)
#'data.frame': 7 obs. of 5 variables:
# $ gender : chr "M" NA NA "M" ...
# $ age : int 68 118 118 65 118 118 118
# $ id : chr "e2127556f4f64592b11af22de27a7932" "8ec6ce2a7e7949b1bf142def7d0e0586" "68617ca6246f4fbc85e91a2a49552598" "389bc3fa690240e798340f5a15918d5c" ...
# $ became_member_on: chr "20180426" "20170925" "20171002" "20180209" ...
# $ income : int 70000 NA NA 53000 NA NA NA
If we are reading from a file:
out <- jsonlite::stream_in(file('yourfile.json'))
Or with ndjson::stream_in, which takes a file path and here returns a tibble:
out <- ndjson::stream_in('yourfile.json', 'tbl')
Data:
str1 <- '{"gender": "M", "age": 68, "id": "e2127556f4f64592b11af22de27a7932", "became_member_on": "20180426", "income": 70000}
{"gender": null, "age": 118, "id": "8ec6ce2a7e7949b1bf142def7d0e0586", "became_member_on": "20170925", "income": null}
{"gender": null, "age": 118, "id": "68617ca6246f4fbc85e91a2a49552598", "became_member_on": "20171002", "income": null}
{"gender": "M", "age": 65, "id": "389bc3fa690240e798340f5a15918d5c", "became_member_on": "20180209", "income": 53000}
{"gender": null, "age": 118, "id": "8974fc5686fe429db53ddde067b88302", "became_member_on": "20161122", "income": null}
{"gender": null, "age": 118, "id": "c4863c7985cf408faee930f111475da3", "became_member_on": "20170824", "income": null}
{"gender": null, "age": 118, "id": "148adfcaa27d485b82f323aaaad036bd", "became_member_on": "20150919", "income": null}'
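For the file case, here is a minimal sketch of the round trip, assuming the file holds one JSON object per line. The temporary file stands in for 'yourfile.json', and the names ndjson_lines, tmp and members are only illustrative. The date conversion at the end is optional; it is there because became_member_on arrives as character.

```r
library(jsonlite)

# Two sample records in newline-delimited JSON (one object per line)
ndjson_lines <- c(
  '{"gender": "M", "age": 68, "became_member_on": "20180426", "income": 70000}',
  '{"gender": null, "age": 118, "became_member_on": "20170925", "income": null}'
)

# Write them to a temporary file (a stand-in for 'yourfile.json')
tmp <- tempfile(fileext = ".json")
writeLines(ndjson_lines, tmp)

# stream_in opens and closes the unopened file connection itself;
# JSON null becomes NA in the resulting data frame
members <- stream_in(file(tmp), verbose = FALSE)

# Optional: turn the yyyymmdd strings into Date values
members$became_member_on <- as.Date(members$became_member_on, format = "%Y%m%d")
str(members)
```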
I have a JSON file, shown below, and would like to extract its data into an R data frame like the one that follows. The JSON object holds a list of values for each date, and I would like those values in the data frame, one row per date and header. How should I build this?
Desired output data frame:
Jan-18 a 5
Jan-18 b 0
Jan-18 c 9
Jan-18 d 0
Jan-18 e 5
Jan-19 a 4
Jan-19 b 0
Jan-19 c 26
Jan-19 d 0
Jan-19 e 35
value_headers = ['a', 'b', 'c', 'd', 'e']
Input JSON content:
{
"default": {
"timelineData": [
{
"time": "1610928000",
"formattedTime": "Jan 18, 2021",
"formattedAxisTime": "Jan 18",
"value": [
5,
0,
9,
0,
5
],
"hasData": [
true,
false,
true,
false,
true
],
"formattedValue": [
"5",
"0",
"9",
"0",
"5"
]
},
{
"time": "1611014400",
"formattedTime": "Jan 19, 2021",
"formattedAxisTime": "Jan 19",
"value": [
4,
0,
26,
0,
35
],
"hasData": [
true,
false,
true,
false,
true
],
"formattedValue": [
"4",
"0",
"26",
"0",
"35"
]
}
],
"averages": [
5,
1,
34,
25,
25
]
}
}
With the tidyverse, it could be something like this:
library(jsonlite)
library(tidyverse)
json_dt <- fromJSON('{
"default": {
"timelineData": [
{
"time": "1610928000",
"formattedTime": "Jan 18, 2021",
"formattedAxisTime": "Jan 18",
"value": [
5,
0,
9,
0,
5
],
"hasData": [
true,
false,
true,
false,
true
],
"formattedValue": [
"5",
"0",
"9",
"0",
"5"
]
},
{
"time": "1611014400",
"formattedTime": "Jan 19, 2021",
"formattedAxisTime": "Jan 19",
"value": [
4,
0,
26,
0,
35
],
"hasData": [
true,
false,
true,
false,
true
],
"formattedValue": [
"4",
"0",
"26",
"0",
"35"
]
}
],
"averages": [
5,
1,
34,
25,
25
]
}
}')
tibble(
time = json_dt$default$timelineData$formattedTime,
value = json_dt$default$timelineData$formattedValue
) %>%
unnest(value) %>%
group_by(time) %>%
mutate(
letter = letters[1:n()],
value = as.integer(value),
time = str_replace(time, ",.*", ""),
time = str_replace(time, " ", "-")
)
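If you would rather label the rows with the question's value_headers than with letters, a base R variant might look like the sketch below. It assumes every "value" array has the same length as value_headers, and it uses a trimmed copy of the JSON; the names json_txt, timeline, rows and long are only illustrative.

```r
library(jsonlite)

# Trimmed copy of the input: only the fields used here
json_txt <- '{"default": {"timelineData": [
  {"formattedAxisTime": "Jan 18", "value": [5, 0, 9, 0, 5]},
  {"formattedAxisTime": "Jan 19", "value": [4, 0, 26, 0, 35]}
]}}'

value_headers <- c("a", "b", "c", "d", "e")

# simplifyVector = FALSE keeps plain nested lists, so each entry is handled explicitly
timeline <- fromJSON(json_txt, simplifyVector = FALSE)$default$timelineData

# One small data frame per date; the date string is recycled across the headers
rows <- lapply(timeline, function(entry) {
  data.frame(
    time   = sub(" ", "-", entry$formattedAxisTime),
    header = value_headers,
    value  = unlist(entry$value),
    stringsAsFactors = FALSE
  )
})

long <- do.call(rbind, rows)
long
```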
Edit: I have cleaned up the question a bit and added a bounty. I will be away for a few days, but getting this resolved would be a huge help.
I would like to use d3 to build a d3.hierarchy for a tree model of basketball data. Essentially, I want to create a bracket structured as such:
...where the graph / model is a tree where each node has exactly two children (except for all of the end / leaf nodes, of course). This is a textbook example of when you'd want to use the d3.tree() and d3.hierarchy() functionalities, but it requires a JSON in a fairly specific format for the d3.hierarchy command. In particular, for a bracket of 8 basketball teams in a tournament that goes 8 - 4 - 2 - 1, the JSON data needs to be formatted like this:
const playoffData = {
"name": "Rockets",
"round": 4,
"id": 15,
"children": [
{
"name": "Rockets",
"round": 3,
"id": 14,
"children": [
{
"name": "Rockets",
"round": 2,
"id": 9,
"children": [
{
"name": "Rockets",
"round": 1,
"id": 1
},
{
"name": "Timberwolves",
"round": 1,
"id": 8
}
]
},
{
"name": "Jazz",
"round": 2,
"id": 12,
"children": [
{
"name": "Jazz",
"round": 1,
"id": 4
},
{
"name": "Thunder",
"round": 1,
"id": 5
}
]
}
]
},
{
"name": "Warriors",
"round": 3,
"id": 13,
"children": [
{
"name": "Warriors",
"round": 2,
"id": 10,
"children": [
{
"name": "Warriors",
"round": 1,
"id": 2
},
{
"name": "Spurs",
"round": 1,
"id": 7
}
]
},
{
"name": "Pelicans",
"round": 2,
"id": 11,
"children": [
{
"name": "Pelicans",
"round": 1,
"id": 3
},
{
"name": "Trail Blazers",
"round": 1,
"id": 6
}
]
}
]
}
]
};
Note the nested nature of the JSONs. The root node corresponds with the winner of the bracket, and leaf nodes correspond to teams in the first round of the bracket.
I have the following R dataframe of basketball data for the bracket:
> dput(mydata)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15), teamname = c("Rockets", "Warriors", "Trail Blazers",
"Jazz", "Thunder", "Pelicans", "Spurs", "Timberwolves", "Rockets",
"Warriors", "Pelicans", "Jazz", "Rockets", "Warriors", "Rockets"
), conference = c("West", "West", "West", "West", "West", "West",
"West", "West", "West", "West", "West", "West", "West", "West",
"West"), seeding = c(1, 2, 3, 4, 5, 6, 7, 8, NA, NA, NA, NA,
NA, NA, NA), round = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3,
3, 4), child1 = c(NA, NA, NA, NA, NA, NA, NA, NA, 1, 2, 3, 4,
9, 11, 13), child2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 8, 7,
6, 5, 12, 10, 14), wins = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), completed = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
), winprobs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA)), .Names = c("id", "teamname", "conference", "seeding",
"round", "child1", "child2", "wins", "losses", "completed", "winprobs"
), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 17L, 18L, 19L,
20L, 25L, 26L, 29L), class = "data.frame")
> mydata
id teamname conference seeding round child1 child2 wins losses completed winprobs
1 1 Rockets West 1 1 NA NA 0 0 FALSE NA
2 2 Warriors West 2 1 NA NA 0 0 FALSE NA
3 3 Trail Blazers West 3 1 NA NA 0 0 FALSE NA
4 4 Jazz West 4 1 NA NA 0 0 FALSE NA
5 5 Thunder West 5 1 NA NA 0 0 FALSE NA
6 6 Pelicans West 6 1 NA NA 0 0 FALSE NA
7 7 Spurs West 7 1 NA NA 0 0 FALSE NA
8 8 Timberwolves West 8 1 NA NA 0 0 FALSE NA
17 9 Rockets West NA 2 1 8 0 0 FALSE NA
18 10 Warriors West NA 2 2 7 0 0 FALSE NA
19 11 Pelicans West NA 2 3 6 0 0 FALSE NA
20 12 Jazz West NA 2 4 5 0 0 FALSE NA
25 13 Rockets West NA 3 9 12 0 0 FALSE NA
26 14 Warriors West NA 3 11 10 0 0 FALSE NA
29 15 Rockets West NA 4 13 14 0 0 FALSE NA
As you can see, my R data frame has one row for each node in my d3 graph. Note the tree structure in particular, and the child1 and child2 helper columns that identify each node's children: the children of the final round (id 15) are the two nodes from the previous round (13 and 14), the children of id 13 (a semifinal) are 9 and 12, and so on. The first 8 rows are the first round, so they are leaf nodes and have no children.
It's a bit long, but I wanted to include the whole JSON and R data frame to keep things clear. I would also like the other data frame columns (wins, losses, winprobs) included in the JSON structure; for brevity, I did not show these in the JSON above.
A last note: while I work mainly in R, this is a d3 graph, so there is quite a bit of JavaScript coding that I must do for it. My opinion is that R is better for this type of data manipulation, but since we're dealing with a nested JSON object, maybe JavaScript is better. If there is an easier solution that uses JavaScript to map a flat JSON version of the R data frame into the desired nested JSON, that would be sufficient as well.
Any help with this is appreciated! I promise to select a top answer once I return to award the bounty.
Here is a tidyverse solution.
We reformat your data and split the data frame into four data frames, one per round.
Then we join those, nesting the relevant columns at each step.
Finally we use toJSON to finish the job:
library(tidyverse)
my.split <- mydata %>% # mydata is the data frame from the question
gather(temp,children,child1,child2) %>%
select(-temp) %>%
select(name= teamname,round,id,children) %>% # change here to keep more columns
distinct %>%
split(.$round)
my.split[[1]] %>%
select(-children) %>%
right_join(my.split[[2]],by=c(id="children"),suffix=c("",".y")) %>%
nest(1:3) %>% # change here to keep more columns
setNames(names(my.split[[1]])) %>%
right_join(my.split[[3]],by=c(id="children"),suffix=c("",".y")) %>%
nest(1:4) %>% # change here to keep more columns
setNames(names(my.split[[1]])) %>%
right_join(my.split[[4]],by=c(id="children"),suffix=c("",".y")) %>%
nest(1:4) %>% # change here to keep more columns
setNames(names(my.split[[1]])) %>%
jsonlite::toJSON(pretty=TRUE)
output:
[
{
"name": "Rockets",
"round": 4,
"id": 15,
"children": [
{
"name": "Rockets",
"round": 3,
"id": 13,
"children": [
{
"name": "Rockets",
"round": 2,
"id": 9,
"children": [
{
"name": "Rockets",
"round": 1,
"id": 1
},
{
"name": "Timberwolves",
"round": 1,
"id": 8
}
]
},
{
"name": "Jazz",
"round": 2,
"id": 12,
"children": [
{
"name": "Jazz",
"round": 1,
"id": 4
},
{
"name": "Thunder",
"round": 1,
"id": 5
}
]
}
]
},
{
"name": "Warriors",
"round": 3,
"id": 14,
"children": [
{
"name": "Pelicans",
"round": 2,
"id": 11,
"children": [
{
"name": "Trail Blazers",
"round": 1,
"id": 3
},
{
"name": "Pelicans",
"round": 1,
"id": 6
}
]
},
{
"name": "Warriors",
"round": 2,
"id": 10,
"children": [
{
"name": "Warriors",
"round": 1,
"id": 2
},
{
"name": "Spurs",
"round": 1,
"id": 7
}
]
}
]
}
]
}
]
You can try this recursive function together with jsonlite::toJSON():
get_node <- function(df, id) {
node <- as.list(df[df$id == id, c("teamname", "round", "id")])
names(node) = c("name", "round", "id")
id1 <- df[df$id == id,]$child1
id2 <- df[df$id == id,]$child2
if (!is.na(id1) && !is.na(id2)) {
child1 <- get_node(df, id1)
child2 <- get_node(df, id2)
if (child1$name == node$name)
node$children <- list(child1, child2)
else if (child2$name == node$name)
node$children <- list(child2, child1)
else
stop("Input data is inconsistent!")
}
node
}
jsonlite::toJSON(get_node(mydata, 15), pretty = TRUE, auto_unbox = TRUE)
With your data I get the following JSON:
{
"name": "Rockets",
"round": 4,
"id": 15,
"children": [
{
"name": "Rockets",
"round": 3,
"id": 13,
"children": [
{
"name": "Rockets",
"round": 2,
"id": 9,
"children": [
{
"name": "Rockets",
"round": 1,
"id": 1
},
{
"name": "Timberwolves",
"round": 1,
"id": 8
}
]
},
{
"name": "Jazz",
"round": 2,
"id": 12,
"children": [
{
"name": "Jazz",
"round": 1,
"id": 4
},
{
"name": "Thunder",
"round": 1,
"id": 5
}
]
}
]
},
{
"name": "Warriors",
"round": 3,
"id": 14,
"children": [
{
"name": "Warriors",
"round": 2,
"id": 10,
"children": [
{
"name": "Warriors",
"round": 1,
"id": 2
},
{
"name": "Spurs",
"round": 1,
"id": 7
}
]
},
{
"name": "Pelicans",
"round": 2,
"id": 11,
"children": [
{
"name": "Pelicans",
"round": 1,
"id": 6
},
{
"name": "Trail Blazers",
"round": 1,
"id": 3
}
]
}
]
}
]
}
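The question also asks for further columns (wins, losses) in each node. One way to get them is to generalise the recursion, roughly as sketched below. build_node and its extra argument are hypothetical names, the three-row bracket is a stand-in for the question's data frame, and unlike get_node this version keeps the children in child1, child2 order rather than putting the winner first.

```r
library(jsonlite)

# Tiny stand-in for the question's data frame: two leaves and their parent
bracket <- data.frame(
  id       = c(1, 8, 9),
  teamname = c("Rockets", "Timberwolves", "Rockets"),
  round    = c(1, 1, 2),
  child1   = c(NA, NA, 1),
  child2   = c(NA, NA, 8),
  wins     = c(0, 0, 0),
  losses   = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: like get_node, but copies the columns named in `extra`
build_node <- function(df, id, extra = c("wins", "losses")) {
  row  <- df[df$id == id, ]
  node <- c(list(name = row$teamname, round = row$round, id = row$id),
            as.list(row[extra]))
  kids <- c(row$child1, row$child2)
  # Leaf nodes have NA children and get no "children" element
  if (!anyNA(kids)) {
    node$children <- lapply(kids, function(k) build_node(df, k, extra))
  }
  node
}

tree <- build_node(bracket, 9)
toJSON(tree, pretty = TRUE, auto_unbox = TRUE)
```

With the full data frame, the call would be build_node(mydata, 15), assuming the column names match those in the question.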
My data frame contains data as follows:
Tester  W1  W2  W3   A    P    WD(%)  TS(Hrs.)  AT(Hrs.)  SU(%)
a       60  40  102  202  150  100    120       120       100
b       30  38  46   114  150  76     135       120       100
c       25  30  52   107  150  71     120       120       100
By using the package jsonlite I have converted to json format:
{
"Tester": [ "a", "b", "c" ],
"W1": [ 60, 30, 25],
"W2": [ 40, 38, 30 ],
"W3": [ 102, 46, 52 ],
"A": [ 202, 114, 107 ],
"P": [ 150, 150, 150 ],
"WD...": [ 100, 76, 71 ],
"TS.Hrs..": [ 120, 135, 120 ],
"AT.Hrs..": [ 120, 120, 120 ],
"SU...": [ 100, 100, 100 ]
}
But my requirement is to get the JSON format like:
[ {
"Tester": "a",
"W1": 60,
"W2": 40,
"W3": 102,
"A": 202,
"P": 150,
"WD(%)": 100,
"TS(Hrs.)": 120,
"AT(Hrs.)": 120,
"SU(%)": 100
} ]
Can someone please help me?
The output you're seeing is what jsonlite produces when the data set is a list rather than a data frame:
library(jsonlite)
toJSON(as.list(head(iris)))
{"Sepal.Length":[5.1,4.9,4.7,4.6,5,5.4],"Sepal.Width":[3.5,3,3.2,3.1,3.6,3.9],"Petal.Length":[1.4,1.4,1.3,1.5,1.4,1.7],"Petal.Width":[0.2,0.2,0.2,0.2,0.2,0.4],"Species":["setosa","setosa","setosa","setosa","setosa","setosa"]}
Make sure that your data set is indeed a data frame and you will see the expected output:
library(jsonlite)
toJSON(head(iris), pretty = TRUE)
[
{
"Sepal.Length": 5.1,
"Sepal.Width": 3.5,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.9,
"Sepal.Width": 3,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.7,
"Sepal.Width": 3.2,
"Petal.Length": 1.3,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.6,
"Sepal.Width": 3.1,
"Petal.Length": 1.5,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 5,
"Sepal.Width": 3.6,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 5.4,
"Sepal.Width": 3.9,
"Petal.Length": 1.7,
"Petal.Width": 0.4,
"Species": "setosa"
}
]
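The keys in the question's output ("WD...", "TS.Hrs..") suggest the column names were also altered when the data frame was created. A small sketch, assuming you build or read the data with check.names = FALSE so that names such as "WD(%)" survive; testers is an illustrative subset of the question's table, not the full data.

```r
library(jsonlite)

# check.names = FALSE stops R from rewriting "WD(%)" as "WD..."
testers <- data.frame(
  Tester  = c("a", "b", "c"),
  W1      = c(60, 30, 25),
  `WD(%)` = c(100, 76, 71),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Default for data frames: one JSON object per row
json_rows <- toJSON(testers, pretty = TRUE)
json_rows

# For comparison, the column-wise layout from the question
json_cols <- toJSON(testers, dataframe = "columns", pretty = TRUE)
json_cols
```

If the data comes from a file, read.csv also accepts check.names = FALSE.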
I am using GraphPlot to draw directed graphs with roughly 100 vertices. I replace each vertex with a small rectangular or square image by defining the VertexRenderingFunction. The images often overlap. Is there a way to get Mathematica to space the vertices further apart so that they do not overlap?
I have tried the various obvious options for 'Method' ("SpringElectricalEmbedding", "SpringEmbedding", "HighDimensionalEmbedding", "CircularEmbedding", "RandomEmbedding", "LinearEmbedding").
trans = {1 -> 1, 2 -> 1, 3 -> 1, 4 -> 1, 5 -> 1, 6 -> 1, 7 -> 1,
8 -> 1, 9 -> 1, 10 -> 1, 11 -> 1, 12 -> 1, 13 -> 1, 14 -> 1,
15 -> 1, 16 -> 1, 17 -> 1, 18 -> 13, 19 -> 1, 20 -> 13, 21 -> 13,
22 -> 70, 23 -> 1, 24 -> 1, 25 -> 1, 26 -> 1, 27 -> 13, 28 -> 13,
29 -> 1, 30 -> 13, 31 -> 13, 32 -> 1, 33 -> 19, 34 -> 70, 35 -> 70,
36 -> 1, 37 -> 1, 38 -> 1, 39 -> 39, 40 -> 13, 41 -> 2, 42 -> 13,
43 -> 1, 44 -> 2, 45 -> 1, 46 -> 52, 47 -> 2, 48 -> 68, 49 -> 49,
50 -> 19, 51 -> 78, 52 -> 1, 53 -> 1, 54 -> 39, 55 -> 13, 56 -> 56,
57 -> 13, 58 -> 13, 59 -> 1, 60 -> 36, 61 -> 1, 62 -> 52, 63 -> 2,
64 -> 68, 65 -> 19, 66 -> 56, 67 -> 4, 68 -> 76, 69 -> 19,
70 -> 78, 71 -> 1, 72 -> 39, 73 -> 52, 74 -> 56, 75 -> 23,
76 -> 76, 77 -> 56, 78 -> 78};
image = {{1, 0, 0, 0, 0}, {0, 1, 0, 0, 0}};
GraphPlot[trans, DirectedEdges -> True, VertexLabeling -> True,
VertexRenderingFunction -> (Inset[
ArrayPlot[image, ImageSize -> 15, Mesh -> True], #1] &)]
Edit [I started over, based on the example you gave]:
Using your trans and image you could try:
p = ArrayPlot[image, ImageSize -> 35, Mesh -> True];
Graph[trans, DirectedEdges -> True, VertexLabels -> Placed[p, Tooltip],
ImagePadding -> 10, ImageSize -> 500]
The images will appear in tooltips when you mouse over each vertex. You could use different images for different vertex labels if you wish; just use a list of rules.
The picture below shows what it looks like (without the tooltips).
Click on link to see how it works with tooltips.