I have time-series data that looks like the following:
"data": {
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720,
1652169266156,
1652169267146,
1652169272796,
1652169299338
],
"V": [
1,
2,
3,
10,
6,
1252,
1555
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818,
1652168787377,
1652168835734,
1652169266108,
1652169266125,
1652169272798,
1652169299328
],
"V": [
1,
3,
4,
6,
12,
15,
16,
17,
1
]
},
"c": {
"T": [
1652167960194,
1652168787377,
1652169266108,
1652169272798,
1652169299328
],
"V": [
1,
3,
17,
18,
1
]
}}
Inside the sub-documents there are times (T) and values (V).
I can process the data as a whole, but if I want to process only two of the sub-documents, how can I do that?
I can project like the following:
| project data["a"], data["b"]
but then I cannot process the time. How can I accomplish this?
Expected output:
One column with the time, and one column per requested key (i.e. a, b) for the values:
Time, A, B
0:55, 1, 2
let requested_columns = dynamic(["a","b"]);
datatable(data:dynamic)
[
dynamic
(
{
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720,
1652169266156,
1652169267146,
1652169272796,
1652169299338
],
"V": [
1,
2,
3,
10,
6,
1252,
1555
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818,
1652168787377,
1652168835734,
1652169266108,
1652169266125,
1652169272798,
1652169299328
],
"V": [
1,
3,
4,
6,
12,
15,
16,
17,
1
]
},
"c": {
"T": [
1652167960194,
1652168787377,
1652169266108,
1652169272798,
1652169299328
],
"V": [
1,
3,
17,
18,
1
]
}
}
)
]
| mv-expand data                             // one row per sub-document ("a", "b", "c"), each as a single-key bag
| extend key = tostring(bag_keys(data)[0])   // name of the sub-document on this row
| where key in (requested_columns)           // keep only the requested keys
| mv-expand T = data[key].T to typeof(long), V = data[key].V to typeof(long)   // expand the parallel T/V arrays in lockstep
| evaluate pivot(key, take_any(V), T)        // one value column per key, one row per T
| order by T asc
T              a     b
1652167961657        1
1652167964645  1
1652168781684  2
1652168781720  3     3
1652168781818        4
1652168787377        6
1652168835734        12
1652169266108        15
1652169266125        16
1652169266156  10
1652169267146  6
1652169272796  1252
1652169272798        17
1652169299328        1
1652169299338  1555
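Note that T in the output above is still the raw epoch-millisecond value from the source arrays. If you want a human-readable Time column (as in the expected output), one option is to convert it at the end of the query; a minimal sketch to append after the order by, with Time as a column name of my own choosing:
| extend Time = unixtime_milliseconds_todatetime(T)   // epoch milliseconds -> datetime
| project-away T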
Related
[
{
"_id": "",
"at": IsoDate(2022-11-19 10:00:00),
"areaId": 3,
"data": [
{
"name": "a",
"sec": 34,
"x": 10.3,
"y": 23.3
},
{
"name": "a",
"sec": 36,
"x": 10.3,
"y": 23.3
},
{
"name": "b",
"sec": 37,
"x": 10.3,
"y": 23.3
}
]
},
{
"_id": "",
"at": IsoDate(2022-11-19 10:00:00),
"areaId": 3,
"data": [
{
"name": "a",
"sec": 10,
"x": 10.3,
"y": 23.3
},
{
"name": "b",
"sec": 12,
"x": 10.3,
"y": 23.3
}
]
}
]
I have a collection that indicates, in packets of minutes, during which seconds people were in which area. My goal is to find, as efficiently as possible, the date when the person with the specified name was last in the area.
Can you help me with this?
Example output: the last date 'a' was found in polygon = 3:
2022-11-19 10:01:10 (with seconds)
One option is:
Use the first 3 steps to find the matching document.
The 4th step adds the seconds to the at date:
db.collection.aggregate([
  // 1. keep only documents that contain an entry for inputName (a variable holding the requested name, e.g. "a")
  {$match: {data: {$elemMatch: {name: inputName}}}},
  // 2. sort by the packet timestamp, newest first
  {$sort: {at: -1}},
  // 3. keep the most recent matching document
  {$limit: 1},
  // 4. add the largest matching "sec" value to the "at" date
  {$project: {
    res: {
      $dateAdd: {
        startDate: "$at",
        unit: "second",
        amount: {$reduce: {
          input: "$data",
          initialValue: 0,
          in: {$cond: [
            {$eq: ["$$this.name", inputName]},
            {$max: ["$$this.sec", "$$value"]},
            "$$value"
          ]}
        }}
      }
    }
  }}
])
I have a dataset which looks like the following
{
"metadata":"d_meta_v_1.5.9",
"data": {
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720
],
"V": [
1,
2,
3
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818
],
"V": [
1,
3,
4
]
},
"c": {
"T": [
1652167960194,
1652168787377
],
"V": [
1,
3
]
}
}
}
I want to select certain columns and also carry the metadata through to the end. Part of this is already working in my previous question here.
How can I get my desired output?
Metadata, Time, a, b
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
let requested_columns = dynamic(["a","b"]);
datatable(doc:dynamic)
[
dynamic
(
{
"metadata":"d_meta_v_1.5.9",
"data": {
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720
],
"V": [
1,
2,
3
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818
],
"V": [
1,
3,
4
]
},
"c": {
"T": [
1652167960194,
1652168787377
],
"V": [
1,
3
]
}
}
}
)
]
| project metadata = doc.metadata, data = doc.data
| mv-expand data = data
| extend key = tostring(bag_keys(data)[0])
| where key in (requested_columns)
| mv-expand T = data[key].T to typeof(long), V = data[key].V to typeof(long)
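// passing metadata as an extra grouping column to pivot() below is what carries it through to the output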
| evaluate pivot(key, take_any(V), metadata, T)
| order by T asc
metadata        T              a  b
d_meta_v_1.5.9  1652167961657     1
d_meta_v_1.5.9  1652167964645  1
d_meta_v_1.5.9  1652168781684  2
d_meta_v_1.5.9  1652168781720  3  3
d_meta_v_1.5.9  1652168781818     4
I want to loop over a JSON array like this:
[
{
"id": 1,
"count" : 30
},
{
"id": 2,
"count" : 10
},
{
"id": 3,
"count" : 5
},
{
"id": 4,
"count" : 15
}
]
So I would like a query that projects a TotalCount column: it should go over the JSON array, sum all the count values (30 + 10 + 5 + 15), and display the result as a new column.
You can use mv-apply to do so.
For example:
datatable(d: dynamic) [
dynamic([
{
"id": 1,
"count": 30
},
{
"id": 2,
"count": 10
},
{
"id": 3,
"count": 5
},
{
"id": 4,
"count": 15
}
]),
dynamic([
{
"id": 1,
"count": 3
},
{
"id": 2,
"count": 1
},
{
"id": 3,
"count": 50
},
{
"id": 4,
"count": 1
}
]),
]
| mv-apply d on (
    // within this sub-query, d is one array element per row; aggregate back to a single row per input record
    summarize result = sum(tolong(d['count']))
)
result
60
55
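For comparison, the same sum can be obtained with mv-expand followed by summarize; the difference is that you then need your own per-row key to group the expanded elements back together, which mv-apply handles for you. A minimal sketch, where row_id (tagged with new_guid() purely as an arbitrary per-row marker) and TotalCount are names of my own choosing:
datatable(d: dynamic) [
    dynamic([{"id": 1, "count": 30}, {"id": 2, "count": 10}, {"id": 3, "count": 5}, {"id": 4, "count": 15}])
]
| extend row_id = new_guid()                                 // tag each input row so its elements can be regrouped
| mv-expand d                                                // one row per array element
| summarize TotalCount = sum(tolong(d['count'])) by row_id
| project TotalCount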
I have some JSON data that looks like:
{
"p": {
"d": {
"a" : {
"r": "foo",
"g": 1
},
"b": {
"r": "bar",
"g": 2
}
},
"c": {
"e": {
"r": "baz",
"g": 1
}
},
...
}
}
I want something like:
{
"d": [
"a",
"b"
],
"c": [
"e"
]
}
I can get the list of keys on the first level under "p" with jq '.p|keys', and the structure and keys on the second level with jq '.p|map(.|keys)', but I can't figure out how to combine it.
Use map_values instead of map to map the values of a JSON object while preserving the keys:
jq '.p | map_values(keys)'
On jq versions lower than 1.5, map_values is not defined; instead, you can use .[] |=:
jq '.p | .[] |= keys'
In general
Top level keys:
curl -s https://crates.io/api/v1/crates/atty | jq '. |= keys'
[
"categories",
"crate",
"keywords",
"versions"
]
Two levels of keys:
curl -s https://crates.io/api/v1/crates/atty | jq '.| map_values(keys)'
{
"crate": [
"badges",
"categories",
"created_at",
"description",
"documentation",
"downloads",
"exact_match",
"homepage",
"id",
"keywords",
"links",
"max_version",
"name",
"newest_version",
"recent_downloads",
"repository",
"updated_at",
"versions"
],
"versions": [
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16
],
"keywords": [
0,
1,
2
],
"categories": []
}
As shell functions:
topLevelJsonKeys() {
  curl -s "$1" | jq '. |= keys'
  # EXAMPLE:
  # topLevelJsonKeys https://crates.io/api/v1/crates/atty
}
topLevelJsonKeys2() {
  curl -s "$1" | jq '. | map_values(keys)'
  # EXAMPLE:
  # topLevelJsonKeys2 https://crates.io/api/v1/crates/atty
}
Here is a solution which uses reduce and setpath
.p
| reduce keys[] as $k (
    .                                  # start from .p itself
    ; setpath([$k]; .[$k] | keys)      # replace the value at $k with the list of its keys
  )
This is the structure of the GitHub stats API data for a repository. I am using the dplyr and tidyjson libraries to list the number of commits ("c"), deletions ("d"), lines of code added ("a"), and the corresponding week ("w") for every user in a repository.
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
}
I am able to extract the weeks and author data separately, but then I am unable to join them together.
inp_file=read_json("The JSON file")
dat=as.tbl_json(inp_file)
dat%>%
enter_object("weeks") %>%
gather_array %>%
spread_values(week=jstring("w"),add=jstring("a"),del=jstring("d"),comm=jstring("c"))
enter_object("author") %>%
spread_values(handle=jstring("login"))
At no point am I able to jump from the author object to the weeks object to link the 2 of them. Is there any way I can do this? Appreciate any help.
tidyjson is nice, but I am not sure it is necessary in this case. Here is one way to achieve what I think is the desired result.
library(jsonlite)
library(dplyr)
df1 <- fromJSON(
'
[
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
]
'
)
# now the weeks column will actually be nested data.frames
# we can sort of join the weeks with the author information
# like this
df_joined <- df1 %>%
do(
data.frame(
.[["author"]],
bind_rows(.[["weeks"]])
)
)
Solution with tidyjson. It looks like your JSON has a bit of trouble in it, and that it perhaps should be an array? Fixed version below.
Using the development version from devtools::install_github('jeremystan/tidyjson')
In any case, it is not necessary to enter_object for both objects. Rather, you can use a more complex path to grab the handle for author before entering the weeks object.
json <- '[
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
]'
json %>% as.tbl_json %>%
gather_array() %>%
spread_values(handle=jstring('author','login')) %>% ## useful tip
enter_object("weeks") %>%
gather_array %>%
spread_values(week=jstring("w"),add=jstring("a")
,del=jstring("d"),comm=jstring("c"))
# A tbl_json: 4 x 8 tibble with a "JSON" attribute
# `attr(., "JSON")` document.id array.index handle array.index.2 week add del comm
# <chr> <int> <int> <chr> <int> <chr> <chr> <chr> <chr>
#1 {"w":1428192000... 1 1 ttuser1234 1 1428192000 0 0 0
#2 {"w":1428796800... 1 1 ttuser1234 2 1428796800 0 0 0
#3 {"w":1428192000... 1 2 coder1234 1 1428192000 212 79 5
#4 {"w":1428796800... 1 2 coder1234 2 1428796800 146 67 1
Of course, you can always split the data into two separate pipelines, but this seems like a nicer solution for this example.