This is the structure of the GitHub stats API data for a repository. I am using the dplyr and tidyjson libraries to list the number of commits ("c"), deletions ("d"), lines of code added ("a"), and the corresponding week ("w") for every user in a repository.
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
}
I am able to extract the weeks and author data separately, but then I am unable to join them together.
inp_file = read_json("The JSON file")
dat = as.tbl_json(inp_file)

# weeks pipeline
dat %>%
  enter_object("weeks") %>%
  gather_array %>%
  spread_values(week = jstring("w"), add = jstring("a"), del = jstring("d"), comm = jstring("c"))

# author pipeline
dat %>%
  enter_object("author") %>%
  spread_values(handle = jstring("login"))
At no point am I able to jump from the author object to the weeks object to link the 2 of them. Is there any way I can do this? Appreciate any help.
tidyjson is nice, but I am not sure it is necessary in this case. Here is one way to achieve what I think is the desired result.
library(jsonlite)
library(dplyr)
df1 <- fromJSON(
'
[
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
]
'
)
# now the weeks column will actually be nested data.frames
# we can sort of join the weeks with the author information
# like this
df_joined <- df1 %>%
do(
data.frame(
.[["author"]],
bind_rows(.[["weeks"]])
)
)
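If tidyr is also available, a possibly simpler variant (a sketch, not tested against this exact data) is to flatten the nested author data frame into plain columns and then unnest the weeks data frames:
library(tidyr)

df_unnested <- df1 %>%
  jsonlite::flatten() %>%   # turns the nested author data.frame into author.login / author.id columns
  unnest(weeks)             # one row per author per week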
Solution with tidyjson. It looks like your JSON has a bit of trouble in it; perhaps it should be an array? A fixed version is below.
Using the development version from devtools::install_github('jeremystan/tidyjson')
In any case, it is not necessary to enter_object for both objects. Rather, you can use a more complex path to grab the handle for author before entering the weeks object.
json <- '[
{
"total": 5,
"weeks": [
{
"w": 1428192000,
"a": 0,
"d": 0,
"c": 0
},
{
"w": 1428796800,
"a": 0,
"d": 0,
"c": 0
}
],
"author": {
"login": "ttuser1234",
"id": 111111111
}
},
{
"total": 18,
"weeks": [
{
"w": 1428192000,
"a": 212,
"d": 79,
"c": 5
},
{
"w": 1428796800,
"a": 146,
"d": 67,
"c": 1
}
],
"author": {
"login": "coder1234",
"id": 22222222
}
}
]'
json %>% as.tbl_json %>%
gather_array() %>%
spread_values(handle=jstring('author','login')) %>% ## useful tip
enter_object("weeks") %>%
gather_array %>%
spread_values(week=jstring("w"),add=jstring("a")
,del=jstring("d"),comm=jstring("c"))
# A tbl_json: 4 x 8 tibble with a "JSON" attribute
# `attr(., "JSON")` document.id array.index handle array.index.2 week add del comm
# <chr> <int> <int> <chr> <int> <chr> <chr> <chr> <chr>
#1 {"w":1428192000... 1 1 ttuser1234 1 1428192000 0 0 0
#2 {"w":1428796800... 1 1 ttuser1234 2 1428796800 0 0 0
#3 {"w":1428192000... 1 2 coder1234 1 1428192000 212 79 5
#4 {"w":1428796800... 1 2 coder1234 2 1428796800 146 67 1
Of course, you can always split the data into two separate pipelines, but this seems like a nicer solution for this example.
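For completeness, a rough sketch of that two-pipeline approach (assuming the same development version of tidyjson): spread the author handle and the week values separately, then join on the document.id / array.index keys that gather_array() produces.
library(dplyr)

authors <- json %>% as.tbl_json %>%
  gather_array() %>%                       # array.index identifies the author
  spread_values(handle = jstring("author", "login"))

weeks <- json %>% as.tbl_json %>%
  gather_array() %>%
  enter_object("weeks") %>%
  gather_array("week.index") %>%           # one row per week per author
  spread_values(week = jstring("w"), add = jstring("a"),
                del = jstring("d"), comm = jstring("c"))

joined <- as.data.frame(weeks) %>%
  inner_join(as.data.frame(authors), by = c("document.id", "array.index"))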
Related
[
{
"_id": "",
"at": IsoDate(2022-11-19 10:00:00),
"areaId": 3,
"data": [
{
"name": "a",
"sec": 34,
"x": 10.3,
"y": 23.3
},
{
"name": "a",
"sec": 36,
"x": 10.3,
"y": 23.3
},
{
"name": "b",
"sec": 37,
"x": 10.3,
"y": 23.3
}
]
},
{
"_id": "",
"at": IsoDate(2022-11-19 10:00:00),
"areaId": 3,
"data": [
{
"name": "a",
"sec": 10,
"x": 10.3,
"y": 23.3
},
{
"name": "b",
"sec": 12,
"x": 10.3,
"y": 23.3
}
]
}
]
I have a collection that records, in packets of minutes, during which seconds people were in which area. My goal is to find, as efficiently as possible, the last date on which the person with a given name was in the area.
Can you help me with this?
Example output: the last date 'a' was found in area (polygon) 3:
2022-11-19 10:01:10 (with the seconds added)
One option is:
Use the first 3 stages to find the matching document.
The 4th stage adds the seconds to the at date.
db.collection.aggregate([
{$match: {data: {$elemMatch: {name: inputName}}}},
{$sort: {at: -1}},
{$limit: 1},
{$project: {
res: {
$dateAdd: {
startDate: "$at",
unit: "second",
amount: {$reduce: {
input: "$data",
initialValue: 0,
in: {$cond: [
{$eq: ["$$this.name", inputName]},
{$max: ["$$this.sec", "$$value"]},
"$$value"
]}
}}
}
}
}}
])
See how it works on the playground example
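If you prefer, the $reduce expression could be swapped for a $filter / $map / $max combination; a rough, untested equivalent of the $project stage would be:
{$project: {
  res: {
    $dateAdd: {
      startDate: "$at",
      unit: "second",
      amount: {$max: {
        $map: {
          input: {$filter: {input: "$data", cond: {$eq: ["$$this.name", inputName]}}},
          in: "$$this.sec"
        }
      }}
    }
  }
}}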
I have a dataset which looks like the following
{
"metadata":"d_meta_v_1.5.9",
"data": {
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720
],
"V": [
1,
2,
3
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818
],
"V": [
1,
3,
4
]
},
"c": {
"T": [
1652167960194,
1652168787377
],
"V": [
1,
3
]
}
}
}
I want to select certain columns and also carry the metadata along to the end. Part of this was addressed in my previous question here.
How can I get my desired output?
Metadata, Time, a, b
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
d_meta_v_1.5.9, <Time>, <value of a>, <value of b>
let requested_columns = dynamic(["a","b"]);
datatable(doc:dynamic)
[
dynamic
(
{
"metadata":"d_meta_v_1.5.9",
"data": {
"a": {
"T": [
1652167964645,
1652168781684,
1652168781720
],
"V": [
1,
2,
3
]
},
"b": {
"T": [
1652167961657,
1652168781720,
1652168781818
],
"V": [
1,
3,
4
]
},
"c": {
"T": [
1652167960194,
1652168787377
],
"V": [
1,
3
]
}
}
}
)
]
| project metadata = doc.metadata, data = doc.data
| mv-expand data = data
| extend key = tostring(bag_keys(data)[0])
| where key in (requested_columns)
| mv-expand T = data[key].T to typeof(long), V = data[key].V to typeof(long)
| evaluate pivot(key, take_any(V), metadata, T)
| order by T asc
metadata         T              a   b
d_meta_v_1.5.9   1652167961657      1
d_meta_v_1.5.9   1652167964645  1
d_meta_v_1.5.9   1652168781684  2
d_meta_v_1.5.9   1652168781720  3   3
d_meta_v_1.5.9   1652168781818      4
I want to loop over a JSON array like this:
[
{
"id": 1,
"count" : 30
},
{
"id": 2,
"count" : 10
},
{
"id": 3,
"count" : 5
},
{
"id": 4,
"count" : 15
}
]
So I would like a query that projects a TotalCount column: it should go over the JSON array, sum all the count values (30 + 10 + 5 + 15), and display the result as a new column.
You can use mv-apply to do so.
For example:
datatable(d: dynamic) [
dynamic([
{
"id": 1,
"count": 30
},
{
"id": 2,
"count": 10
},
{
"id": 3,
"count": 5
},
{
"id": 4,
"count": 15
}
]),
dynamic([
{
"id": 1,
"count": 3
},
{
"id": 2,
"count": 1
},
{
"id": 3,
"count": 50
},
{
"id": 4,
"count": 1
}
]),
]
| mv-apply d on (
summarize result = sum(tolong(d['count']))
)
result
60
55
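Each output row corresponds to one input record: 60 = 30 + 10 + 5 + 15 for the first array and 55 = 3 + 1 + 50 + 1 for the second.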
This is not a data analysis issue, so I don't have data to reproduce.
I installed the paws package from this GitHub page to extract facial features (e.g., smile) via Amazon Rekognition. I am doing this as part of a study to test performance against Microsoft Azure and Face++. By the way, I replaced "AccessKeyHere" and "SecretKeyHere" with the appropriate security keys.
library(paws)
Sys.setenv(
AWS_ACCESS_KEY_ID = "AccessKeyHere",
AWS_SECRET_ACCESS_KEY = "SecretKeyHere",
AWS_REGION = "us-east-1"
)
ec2 <- paws::ec2()
resp <- ec2$run_instances(
ImageId = "ami-f973ab84",
InstanceType = "t2.micro",
KeyName = "default",
MinCount = 1,
MaxCount = 1,
TagSpecifications = list(
list(
ResourceType = "instance",
Tags = list(
list(Key = "webserver", Value = "production")
)
)
)
)
Unfortunately, I get this error:
Error: InvalidKeyPair.NotFound: The key pair 'default' does not exist
I tried following the Setting Up Credentials document on the GitHub page, without success.
The results I want would look something along the lines of this (taken directly from the Amazon demo):
{
"FaceDetails": [
{
"BoundingBox": {
"Width": 0.20394515991210938,
"Height": 0.4204871356487274,
"Left": 0.1556132435798645,
"Top": 0.11629478633403778
},
"AgeRange": {
"Low": 20,
"High": 38
},
"Smile": {
"Value": true,
"Confidence": 98.88771057128906
},
"Eyeglasses": {
"Value": true,
"Confidence": 99.87944030761719
},
"Sunglasses": {
"Value": true,
"Confidence": 99.51188659667969
},
"Gender": {
"Value": "Female",
"Confidence": 99.98441314697266
},
"Beard": {
"Value": false,
"Confidence": 99.99455261230469
},
"Mustache": {
"Value": false,
"Confidence": 99.99205017089844
},
"EyesOpen": {
"Value": true,
"Confidence": 100
},
"MouthOpen": {
"Value": true,
"Confidence": 99.64435577392578
},
"Emotions": [
{
"Type": "ANGRY",
"Confidence": 0.5140029191970825
},
{
"Type": "DISGUSTED",
"Confidence": 0.36493897438049316
},
{
"Type": "SURPRISED",
"Confidence": 1.5832388401031494
},
{
"Type": "CALM",
"Confidence": 7.553433418273926
},
{
"Type": "CONFUSED",
"Confidence": 2.7683539390563965
},
{
"Type": "SAD",
"Confidence": 0.1280381977558136
},
{
"Type": "HAPPY",
"Confidence": 87.08799743652344
}
],
"Landmarks": [
{
"Type": "eyeLeft",
"X": 0.23317773640155792,
"Y": 0.2868470251560211
},
{
"Type": "eyeRight",
"X": 0.3252476453781128,
"Y": 0.27732565999031067
},
{
"Type": "mouthLeft",
"X": 0.2494768351316452,
"Y": 0.4339924454689026
},
{
"Type": "mouthRight",
"X": 0.32560691237449646,
"Y": 0.42571622133255005
},
{
"Type": "nose",
"X": 0.29963040351867676,
"Y": 0.3560841381549835
},
{
"Type": "leftEyeBrowLeft",
"X": 0.18990693986415863,
"Y": 0.25858017802238464
},
{
"Type": "leftEyeBrowRight",
"X": 0.2559714913368225,
"Y": 0.23907452821731567
},
{
"Type": "leftEyeBrowUp",
"X": 0.22477854788303375,
"Y": 0.23571543395519257
},
{
"Type": "rightEyeBrowLeft",
"X": 0.3101874887943268,
"Y": 0.23408983647823334
},
{
"Type": "rightEyeBrowRight",
"X": 0.3540191650390625,
"Y": 0.24142536520957947
},
{
"Type": "rightEyeBrowUp",
"X": 0.3341374397277832,
"Y": 0.2246120721101761
},
{
"Type": "leftEyeLeft",
"X": 0.21425437927246094,
"Y": 0.28872400522232056
},
{
"Type": "leftEyeRight",
"X": 0.2506107687950134,
"Y": 0.28627288341522217
},
{
"Type": "leftEyeUp",
"X": 0.23298975825309753,
"Y": 0.2797400951385498
},
{
"Type": "leftEyeDown",
"X": 0.2338254302740097,
"Y": 0.29329705238342285
},
{
"Type": "rightEyeLeft",
"X": 0.3053741455078125,
"Y": 0.2805119752883911
},
{
"Type": "rightEyeRight",
"X": 0.33686137199401855,
"Y": 0.2753002941608429
},
{
"Type": "rightEyeUp",
"X": 0.3239244222640991,
"Y": 0.2698554992675781
},
{
"Type": "rightEyeDown",
"X": 0.32346177101135254,
"Y": 0.28338298201560974
},
{
"Type": "noseLeft",
"X": 0.27390313148498535,
"Y": 0.37751662731170654
},
{
"Type": "noseRight",
"X": 0.3062724471092224,
"Y": 0.373584508895874
},
{
"Type": "mouthUp",
"X": 0.29330143332481384,
"Y": 0.4100639820098877
},
{
"Type": "mouthDown",
"X": 0.2929871082305908,
"Y": 0.4546505808830261
},
{
"Type": "leftPupil",
"X": 0.23317773640155792,
"Y": 0.2868470251560211
},
{
"Type": "rightPupil",
"X": 0.3252476453781128,
"Y": 0.27732565999031067
},
{
"Type": "upperJawlineLeft",
"X": 0.14384371042251587,
"Y": 0.3039131164550781
},
{
"Type": "midJawlineLeft",
"X": 0.1776188313961029,
"Y": 0.4594067335128784
},
{
"Type": "chinBottom",
"X": 0.2889330983161926,
"Y": 0.5328735709190369
},
{
"Type": "midJawlineRight",
"X": 0.3430669903755188,
"Y": 0.441012978553772
},
{
"Type": "upperJawlineRight",
"X": 0.3498701751232147,
"Y": 0.28120794892311096
}
],
"Pose": {
"Roll": -4.4155192375183105,
"Yaw": 10.105213165283203,
"Pitch": 0.32932278513908386
},
"Quality": {
"Brightness": 60.6755256652832,
"Sharpness": 94.08262634277344
},
"Confidence": 99.99998474121094
}
]
}
If I could advance to this stage, it would be fantastic. But it would be even nicer if the extracted data looked consistent with my Microsoft Azure results:
anger contempt disgust fear happiness neutral sadness surprise
emotion 0 0 0 0 0 1 0 0
emotion1 0 0 0 0 0 0.997 0.002 0
emotion2 0 0.001 0 0 0 0.994 0.004 0.001
emotion3 0 0 0 0 0 0.965 0.035 0
The error is with this line:
KeyName = "default",
It is referring to an Amazon EC2 Key Pair that should be attached to the Amazon EC2 instance. However, there is no keypair named default. Therefore, it fails.
To fix it, instead of default you should use the name of a key pair that has actually been created. You can see a list of key pairs in the EC2 management console. You could also remove this line (not specifying a KeyName at all), but then you would not be able to log in to the instance.
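As an aside, since the end goal is face analysis rather than running a server, an EC2 instance may not be needed at all: paws also provides a Rekognition client. A rough sketch (untested; the bucket and object names are placeholders):
library(paws)

svc <- paws::rekognition()

# DetectFaces on an image stored in S3; Attributes = "ALL" requests the full
# FaceDetails block (Smile, Emotions, Landmarks, ...) shown in the sample output.
resp <- svc$detect_faces(
  Image = list(
    S3Object = list(
      Bucket = "my-example-bucket",  # placeholder
      Name   = "photo.jpg"           # placeholder
    )
  ),
  Attributes = list("ALL")
)

# Emotion scores for the first detected face:
resp$FaceDetails[[1]]$Emotions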
I'm trying to read a JSON object which has nested lists. It looks like this:
[{
"id": 70070037001,
"text": "List 1",
"isleaf": 0,
"children": [
{
"oid": 100,
"text": "Innerlistobject100",
"isleaf": 0,
"children": [
{
"sid": 1000,
"text": "Innerlistobject1000",
"isleaf": 1
},
{
"sid": 2000,
"text": "Innerlistobject2000",
"isleaf": 1
}
]
},
{
"oid": 200,
"text": "Innerlistobject200",
"isleaf": 0,
"children": [
{
"sid": 1000,
"text": "Innerlistobject1000",
"isleaf": 1
},
{
"sid": 2000,
"text": "Innerlistobject2000",
"isleaf": 1
}
]
}
]
}]
ref: https://sourceforge.net/p/pljson/discussion/935365/thread/375c0293/ - where the person is creating the object, but I want to do the opposite and read it.
Do I have to iterate like this (note that the nested key is again called children)?
Declare
   JSON_Obj          json;
   l_Children_List   json_list;
   l_Child_JSON_Obj  json;
   jSON_child_val    json_value;
Begin
   IF (JSON_Obj.exist('children')) THEN
      IF (JSON_Obj.get('children').is_array) THEN
         l_Children_List := json_list(JSON_Obj.get('children'));
         FOR i IN 1 .. l_Children_List.COUNT LOOP
            jSON_child_val   := l_Children_List.get(i);
            l_Child_JSON_Obj := json(jSON_child_val);
            -- ...and then repeat the same exist / is_array / FOR-loop logic
            -- on l_Child_JSON_Obj for the "children" array nested inside it
         END LOOP;
      END IF;
   END IF;
End;
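You don't need to hand-roll nested loops for this. Oracle's json_table can unpack both levels of the children arrays in a single query using NESTED PATH clauses: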
with json_example as (
select '{
"id": 70070037001,
"text": "List 1",
"isleaf": 0,
"children": [
{
"oid": 100,
"text": "Innerlistobject100",
"isleaf": 0,
"children": [
{
"sid": 1000,
"text": "Innerlistobject1000",
"isleaf": 1
},
{
"sid": 2000,
"text": "Innerlistobject2000",
"isleaf": 1
}
]
},
{
"oid": 200,
"text": "Innerlistobject200",
"isleaf": 0,
"children": [
{
"sid": 1000,
"text": "Innerlistobject1000",
"isleaf": 1
},
{
"sid": 2000,
"text": "Innerlistobject2000",
"isleaf": 1
}
]
}
]
}' as json_document
from dual
)
SELECT tab.*
FROM json_example a
join json_table (a.json_document, '$'
COLUMNS
(id NUMBER PATH '$.id'
,text VARCHAR2(50) PATH '$.text'
,isleaf NUMBER PATH '$.isleaf'
,NESTED PATH '$.children[*]'
COLUMNS
(oid NUMBER PATH '$.oid'
,otext VARCHAR2(150) PATH '$.text'
,oisleaf NUMBER PATH '$.isleaf'
,NESTED PATH '$.children[*]'
COLUMNS
(sid NUMBER PATH '$.sid'
,stext VARCHAR2(250) PATH '$.text'
,sisleaf NUMBER PATH '$.isleaf'
)
)
)
) tab on 1=1
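With the sample document above, this should return one row per innermost child (four rows here), with the root id/text values and the parent oid/otext values repeated on each row.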