How to keep FULL_TRANSITIVE compatibility while adding new types to a nested map in an Avro schema?

I have an existing Avro schema with a field that holds a nested map of maps of a record type (let's call it RecordA for now). Is it possible to add a new record type, RecordB, to this nested map of maps while maintaining FULL_TRANSITIVE compatibility?
My thinking was that as long as the inner maps get defaulted to an empty map, the data still adheres to the schema, so the change is backward and forward compatible.
I've tried redefining the type map<map<RecordA>> maps as map<map<union{RecordA, RecordB}>> maps in an .avdl file, but the schema registry tells me this is not compatible.
I've also tried defaulting each map individually to an empty map ({ }) in a generated .avsc file, but the schema registry says that's incompatible as well.
I do want to acknowledge that I know map<map<..>> is a bad practice, but what's been done has been done.
Registered Schema (original) .avdl:
record Outer {
map<map<RecordA>> maps;
}
record RecordA {
string value;
string updateTime;
}
Attempt with .avdl:
record Outer {
map<map<union{RecordA, RecordB}>> maps = {};
}
record RecordA {
string value;
string updateTime;
}
record RecordB {
union{null, array<string>} values = null;
union{null, string} updateTime = null;
}
Attempt with .avsc:
{
"name" : "maps",
"type" : {
"type" : "map",
"values" : {
"type" : "map",
"values" : [ {
"type" : "record",
"name" : "RecordA",
"fields" : [ {
"name" : "value",
"type" : "string"
}, {
"name" : "updateTime",
"type" : "string"
} ],
"default": { }
}, {
"type" : "record",
"name" : "RecordB",
"fields" : [ {
"name" : "value",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "values",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "updateTime",
"type" : [ "null", "string" ],
"default" : null
} ],
"default": { }
} ]
}
},
"default" : { }
}
The end goal is to have a map of maps to a record that has a field that can be either a string or an array<string>. The original schema was registered in the schema registry with that field typed as a plain string (no union with null and no default), so I believe the map values need to be a union of record types, one for each version of the field.
Each attempt returned the following from the schema registry compatibility API:
{
"is_compatible": false
}
Any insight would be very much appreciated!
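One way to see why both attempts are rejected: FULL_TRANSITIVE requires every version to read data written by every other registered version, in both directions. Under the Avro specification's schema resolution rules, a reader union can read a non-union writer value if one of its branches matches, but when the writer's schema is a union and the reader's is not, resolution fails whenever the branch actually written does not match the reader. So a consumer still on map<map<RecordA>> could receive a RecordB it cannot resolve, which breaks forward (and therefore full) compatibility. Field defaults do not change this, because a reader only uses a field default when the writer's record lacks that field, and the maps field exists in both versions. The sketch below is a deliberately simplified model of those rules (records compared by name only), not Schema Registry's actual checker; canRead and the schema objects are hypothetical.

```javascript
// Simplified model of Avro schema resolution, limited to the shapes in this
// question. Assumption: records are compared by name only; real Avro also
// resolves their fields, aliases and numeric promotions.
// A schema is a primitive name ("string"), an array (a union), or an object
// such as { type: "map", values }, { type: "array", items }, { type: "record", name }.
function canRead(reader, writer) {
  // Writer union: every branch the writer might emit must be readable.
  if (Array.isArray(writer)) {
    return writer.every((branch) => canRead(reader, branch));
  }
  // Reader union vs. non-union writer: some branch must match.
  if (Array.isArray(reader)) {
    return reader.some((branch) => canRead(branch, writer));
  }
  // Primitives match only themselves.
  if (typeof reader === "string" || typeof writer === "string") {
    return reader === writer;
  }
  if (reader.type !== writer.type) {
    return false;
  }
  switch (reader.type) {
    case "map":
      return canRead(reader.values, writer.values);
    case "array":
      return canRead(reader.items, writer.items);
    case "record":
      return reader.name === writer.name;
    default:
      return false;
  }
}

const recordA = { type: "record", name: "RecordA" };
const recordB = { type: "record", name: "RecordB" };
// Registered version: map<map<RecordA>>
const v1 = { type: "map", values: { type: "map", values: recordA } };
// Proposed version: map<map<union{RecordA, RecordB}>>
const v2 = { type: "map", values: { type: "map", values: [recordA, recordB] } };

const backward = canRead(v2, v1); // new reader, old data
const forward = canRead(v1, v2); // old reader, new data
console.log({ backward, forward }); // → { backward: true, forward: false }
```

Because FULL_TRANSITIVE checks both directions against every registered version, the forward failure alone is enough for the registry to reject the change.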

Related

API Gateway and DynamoDB PutItem for String Set

I can't figure out how to correctly call PutItem for a String Set in DynamoDB through API Gateway. If I call it the way I would for a List of Maps, I get objects back. Example data is below.
{
"eventId": "Lorem",
"eventName": "Lorem",
"companies": [
{
"companyId": "Lorem",
"companyName": "Lorem"
}
],
"eventTags": [
"Lorem",
"Lorem"
]
}
And my example mapping template for companies:
"companies" : {
"L": [
#foreach($elem in $inputRoot.companies) {
"M": {
"companyId": {
"S": "$elem.companyId"
},
"companyName": {
"S": "$elem.companyName"
}
}
} #if($foreach.hasNext),#end
#end
]
}
I've tried listing it as a String Set, but it still errors out with "Start of structure or map found where not expected" or says that serialization failed.
"eventTags" : {
"SS": [
#foreach($elem in $inputRoot.eventTags) {
"S":"$elem"
} #if($foreach.hasNext),#end
#end
]
}
What is the proper way to call PutItem for converting an array of strings to a String Set?
If you are using the AWS SDK for JavaScript, you can use the DocumentClient API (docClient.createSet) to store the SET data type.
docClient.createSet converts an array into the SET data type:
var AWS = require('aws-sdk'); // AWS SDK for JavaScript (v2)
var docClient = new AWS.DynamoDB.DocumentClient();
// table, year and title are assumed to be defined elsewhere
var params = {
TableName: table,
Item: {
"yearkey": year,
"title": title,
"product": docClient.createSet(['milk', 'veg']) // stored as a String Set (SS)
}
};
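For the mapping-template route in the question: in the low-level DynamoDB JSON format that PutItem receives, a String Set is written as "SS" followed by a plain array of strings, not an array of {"S": ...} objects, which is why the eventTags template fails to serialize. A sketch in JavaScript of the full low-level item for the example body (toDynamoItem is a hypothetical helper name). It also dedupes the tags, since DynamoDB rejects sets with duplicate members, and the example data repeats "Lorem".

```javascript
// Hypothetical helper: builds the low-level DynamoDB item for the request body
// shown in the question.
function toDynamoItem(input) {
  const item = {
    eventId: { S: input.eventId },
    eventName: { S: input.eventName },
    // A list of maps: each element is wrapped as { M: { ... } }.
    companies: {
      L: input.companies.map((c) => ({
        M: {
          companyId: { S: c.companyId },
          companyName: { S: c.companyName },
        },
      })),
    },
  };
  // A String Set is a plain array of strings, not an array of { S: ... } objects.
  // DynamoDB rejects empty sets and duplicate set members, so dedupe and
  // leave the attribute out when there are no tags.
  const tags = [...new Set(input.eventTags || [])];
  if (tags.length > 0) {
    item.eventTags = { SS: tags };
  }
  return item;
}

// The example body from the question (note the duplicated "Lorem" tag).
const body = {
  eventId: "Lorem",
  eventName: "Lorem",
  companies: [{ companyId: "Lorem", companyName: "Lorem" }],
  eventTags: ["Lorem", "Lorem"],
};
console.log(JSON.stringify(toDynamoItem(body).eventTags)); // → {"SS":["Lorem"]}
```

In the VTL mapping template, the equivalent would presumably be to emit each tag as a bare string inside the array, e.g. "SS": [ #foreach($elem in $inputRoot.eventTags) "$elem"#if($foreach.hasNext),#end #end ], keeping in mind that the template itself does not remove duplicates.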

Loading mapbox with Firebase database

I'm trying to learn Firebase and Mapbox and wanted to integrate the two. Firebase stores some of my data in the following format:
{
"messages" : {
"-KUE2EwfvbI48Azw01Hv" : {
"geometry" : {
"coordinates" : [ 28.6618976, 77.22739580000007 ],
"type" : "Point"
},
"properties" : {
"description" : "xyz",
"hashtag" : "#xyz",
"imageUrl" : "xyz.jpg",
"name" : "Xyz Xyz",
"photoUrl" : "xyz.jpg",
"title" : "XYZ"
},
"type" : "Issue"
},
"-KUD2EwfvbI48Azw01Hv" : {
"geometry" : {
"coordinates" : [ 12.9715987, 77.59456269999998 ],
"type" : "Point"
},
"properties" : {
"description" : "xyz",
"hashtag" : "#xyz",
"imageUrl" : "xyz.jpg",
"name" : "Xyz Xyz",
"photoUrl" : "xyz.jpg",
"title" : "XYZ"
},
"type" : "Issue"
}
}
}
Is there a way to load the data and plot it on Mapbox? The examples require a GeoJSON file hosted somewhere. How can we use the Firebase database to plot on Mapbox in realtime?
Sorry if my question is ambiguous. I'm willing to provide more information if needed :D
Thanks!
You can load the data, but you first have to convert it to a valid GeoJSON object.
Here is a JSFiddle using the data you provided:
https://jsfiddle.net/mkrv9uuy/
var firebaseGeojsonFeatures = [];
for (var key in firebaseData.messages) {
var f = firebaseData.messages[key];
f.type = "Feature";
firebaseGeojsonFeatures.push(f);
}
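A slightly fuller sketch of the same conversion, which wraps the features in a FeatureCollection (a bare array of features is not itself a GeoJSON object). One assumption worth checking: the sample coordinates look like [latitude, longitude] (28.66, 77.23 is Delhi; 12.97, 77.59 is Bangalore), whereas GeoJSON positions are [longitude, latitude], so this version swaps them. The function name messagesToFeatureCollection and the kind and firebaseKey property names are illustrative.

```javascript
// Converts the Firebase "messages" object into a GeoJSON FeatureCollection.
// Assumption: coordinates are stored as [latitude, longitude]; GeoJSON wants
// [longitude, latitude], so they are swapped.
function messagesToFeatureCollection(messages) {
  const features = Object.entries(messages || {}).map(([key, msg]) => {
    const [lat, lng] = msg.geometry.coordinates;
    return {
      type: "Feature",
      geometry: { type: "Point", coordinates: [lng, lat] },
      // Keep the original "Issue" type and the Firebase key as properties
      // (illustrative property names) instead of overwriting them.
      properties: { ...msg.properties, kind: msg.type, firebaseKey: key },
    };
  });
  return { type: "FeatureCollection", features };
}

// Trimmed copy of the sample data from the question.
const sampleMessages = {
  "-KUE2EwfvbI48Azw01Hv": {
    geometry: { coordinates: [28.6618976, 77.22739580000007], type: "Point" },
    properties: { title: "XYZ", hashtag: "#xyz" },
    type: "Issue",
  },
  "-KUD2EwfvbI48Azw01Hv": {
    geometry: { coordinates: [12.9715987, 77.59456269999998], type: "Point" },
    properties: { title: "XYZ", hashtag: "#xyz" },
    type: "Issue",
  },
};

const geojson = messagesToFeatureCollection(sampleMessages);
console.log(geojson.features.length); // → 2
```

For realtime updates, one option (assuming the namespaced Firebase JS SDK and a Mapbox GL JS map that already has a GeoJSON source added) is to call this function inside a firebase.database().ref("messages").on("value", ...) listener with snapshot.val(), and pass the result to map.getSource(yourSourceId).setData(...).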

Using Usergrid, how do I get related entities nested in a single JSON response rather than only links to them?

When I query /mycollections?ql=Select * where name='dfsdfsdfsdfsdfsdf' I get
{
"action" : "get",
"application" : "859e6180-de8a-11e4-9360-f1aabbc15f58",
"params" : {
"ql" : [ "Select * where name='dfsdfsdfsdfsdfsdf'" ]
},
"path" : "/mycollections",
"uri" : "http://localhost:8080/myorg/myapp/mycollections",
"entities" : [ {
"uuid" : "2ff8961a-dea8-11e4-996b-63ce373ace35",
"type" : "mycollection",
"name" : "dfsdfsdfsdfsdfsdf",
"created" : 1428577466865,
"modified" : 1428577466865,
"metadata" : {
"path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35",
"connections" : {
"relations" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations"
}
}
} ],
"timestamp" : 1428589309204,
"duration" : 53,
"organization" : "myorg",
"applicationName" : "myapp",
"count" : 1
}
Now if I query /mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations I get the second entity
{
"action" : "get",
"application" : "859e6180-de8a-11e4-9360-f1aabbc15f58",
"params" : { },
"path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations",
"uri" : "http://localhost:8080/myorg/myapp/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations",
"entities" : [ {
"uuid" : "56a1185a-dec1-11e4-9ac0-e9343f86b604",
"type" : "secondcollection",
"name" : "coucou",
"created" : 1428588269141,
"modified" : 1428588269141,
"metadata" : {
"connecting" : {
"relations" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations/56a1185a-dec1-11e4-9ac0-e9343f86b604/connecting/relations"
},
"path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations/56a1185a-dec1-11e4-9ac0-e9343f86b604"
}
} ],
"timestamp" : 1428589668542,
"duration" : 51,
"organization" : "myorg",
"applicationName" : "myapp"
}
Instead of returning the path to the related entity, I want Usergrid to nest the entity directly in the first JSON response, so that I only need to make a single HTTP request instead of two.
You can't. Usergrid isn't designed that way. You need to write an additional wrapper REST endpoint that combines the calls into one response.
I'm not sure which database you are using. If you are using a document database like MongoDB, you can write Node.js scripts to do this manipulation. Apigee has Volos (volos.js); check whether it lets you do this kind of scripting.
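A minimal sketch of what such a wrapper endpoint might do, assuming the response shapes shown in the question: run the collection query, then follow each entity's metadata.connections paths (which are relative to the application URL, http://localhost:8080/myorg/myapp here) and nest the results. queryWithConnections, fetchJson and the connected property are hypothetical names; fetchJson is passed in so the logic does not depend on a particular HTTP client.

```javascript
// Hypothetical wrapper-endpoint logic: run the collection query, then follow
// each entity's metadata.connections paths and nest the connected entities.
// fetchJson is injected (e.g. u => fetch(u).then(r => r.json()) on Node 18+),
// so the logic can be tried without a live Usergrid server.
async function queryWithConnections(appUrl, collection, ql, fetchJson) {
  const first = await fetchJson(
    `${appUrl}/${collection}?ql=${encodeURIComponent(ql)}`
  );
  for (const entity of first.entities || []) {
    const connections = (entity.metadata && entity.metadata.connections) || {};
    entity.connected = {};
    for (const [name, path] of Object.entries(connections)) {
      // Paths such as "/mycollections/<uuid>/relations" are relative to the
      // application URL, e.g. http://localhost:8080/myorg/myapp
      const related = await fetchJson(appUrl + path);
      entity.connected[name] = related.entities || [];
    }
  }
  return first;
}
```

The wrapper still makes the extra request per connection; it just does so server side, so the client sees a single response.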

Problems with parent/child documents in Elasticsearch

We work with two types of documents in Elasticsearch (ES): items and slots, where items are the parents of slot documents.
We define the index with the following command:
curl -XPOST 'localhost:9200/items' -d @itemsdef.json
where itemsdef.json has the following definition
{
"mappings" : {
"item" : {
"properties" : {
"id" : {"type" : "long" },
"name" : {
"type" : "string",
"_analyzer" : "textIndexAnalyzer"
},
"location" : {"type" : "geo_point" }
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"activityIndexAnalyzer" : {
"alias" : ["activityQueryAnalyzer"],
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textIndexAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textQueryAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
}
},
"filter" : {
"spanish_stop" : {
"type" : "stop",
"ignore_case" : true,
"enable_position_increments" : true,
"stopwords_path" : "analysis/spanish-stopwords.txt"
},
"spanish_synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/spanish-synonyms.txt"
},
"word_delimiter_impl" : {
"type" : "word_delimiter",
"generate_word_parts" : true,
"generate_number_parts" : true,
"catenate_words" : true,
"catenate_numbers" : true,
"split_on_case_change" : false
}
}
}
}
}
Then we add the child document definition using the following command:
curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json
Where slotsdef.json has the following definition:
{
"slot" : {
"_parent" : {"type" : "item"},
"_routing" : {
"required" : true,
"path" : "parent_id"
},
"properties": {
"id" : { "type" : "long" },
"parent_id" : { "type" : "long" },
"activity" : {
"type" : "string",
"_analyzer" : "activityIndexAnalyzer"
},
"day" : { "type" : "integer" },
"start" : { "type" : "integer" },
"end" : { "type" : "integer" }
}
}
}
Finally we perform a bulk index with the following command:
curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json
Where testbulk.json holds the following data:
{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}
Through the ES Head plugin we can see that the definitions look OK. We tested the analyzers to check that they have been loaded and that they work. Both documents appear in the ES Head browser view. But if we try to retrieve the child item using the API, ES responds that it does not exist:
$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}
When we import 50 documents, all parent documents can be retrieved through the API, but only SOME of the requests for child documents get a successful response.
My guess is that it has something to do with how docs are distributed across shards and with routing, which I don't fully understand.
Any clue how to retrieve individual child documents? ES Head shows they have been stored, but HTTP GETs to localhost:9200/items/slot/XXX randomly respond with "exists":false.
Child documents are routed using their parent's id. So, to retrieve a child document, you need to pass the parent id in the routing parameter of your request:
curl "localhost:9200/items/slot/126?routing=35"
If the parent id is not available, you will have to search for the child documents:
curl "localhost:9200/items/slot/_search?q=id:126"
or switch to an index with a single shard.
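Why the GETs fail only some of the time: Elasticsearch picks a shard roughly as hash(routing) % number_of_primary_shards, with routing defaulting to the document's _id. The slots were indexed with _parent, so they were routed by the parent id, while a plain GET for slot 126 routes by 126 and reaches the right shard only when the two hashes happen to land on the same shard. The sketch below illustrates this; the FNV-1a hash is a stand-in chosen for simplicity, not the hash Elasticsearch actually uses, and shardFor is a hypothetical name.

```javascript
// Stand-in 32-bit FNV-1a string hash (NOT Elasticsearch's real routing hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Shard choice: hash of the routing value modulo the number of primary shards.
function shardFor(routing, numShards) {
  return fnv1a(String(routing)) % numShards;
}

const numShards = 5; // the default primary shard count in older Elasticsearch versions
// Slot 126 was indexed with _parent 35, so it was routed by "35".
const storedOn = shardFor(35, numShards);
// GET /items/slot/126 without ?routing routes by the child's own _id.
const lookedUpOn = shardFor(126, numShards);
// These may or may not be equal, which is why some child GETs succeed and others don't.
console.log({ storedOn, lookedUpOn });
// With a single shard every routing value maps to shard 0, so the mismatch disappears.
console.log(shardFor(126, 1)); // → 0
```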

Freebase MQL query for topic summary and image?

I'm trying to write an MQL query to be executed using the Freebase APIs. I would like to retrieve the topic summary and the image for the topic.
I have worked out the query below, which gets me the images associated with the Bill Gates topic.
MQL:
[
{
"/common/topic/image" : [
{
"id" : null
}
],
"name" : "bill gates",
"type" : "/people/person"
}
]
Results:
[
{
"/common/topic/image" : [
{
"id" : "/guid/9202a8c04000641f8000000004fb4c01"
},
{
"id" : "/wikipedia/images/commons_id/4486276"
}
],
"name" : "Bill Gates",
"type" : "/people/person"
}
]
For those who haven't run into MQL before but are interested in playing around with it, check out the Freebase MQL Query Editor.
billg profile page http://i.friendfeed.com/c31a22d9e439eb67b0feeb4ffd64c3b5ed9a8e16
UPDATE
Query that I ended up using:
[
{
"/common/topic/image" : [
{
"id" : null
}
],
"article" : [
{
"content" : null
}
],
"name" : "bill gates",
"type" : "/common/topic"
}
]
Its results, shown below, can be combined with narphorium's answer to retrieve the actual data:
[
{
"/common/topic/image" : [
{
"id" : "/guid/9202a8c04000641f8000000004fb4c01"
},
{
"id" : "/wikipedia/images/commons_id/4486276"
}
],
"article" : [
{
"content" : null
},
{
"content" : "/guid/9202a8c04000641f800000000903535d"
}
],
"name" : "Bill Gates",
"type" : "/common/topic"
}
]
The images and topic summaries are stored separately in the content store and are accessible via another web service API.
For example, Bill Gates' image can be accessed like this:
http://www.freebase.com/api/trans/raw/guid/9202a8c04000641f8000000004fb4c01
Similarly, the GUID for the topic summary can be found by replacing /common/topic/image with /common/topic/article in your query. The results can be accessed again like this:
http://www.freebase.com/api/trans/raw/guid/9202a8c04000641f8000000008bfed35
You can read more about the content store here.
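Combining the two steps, a small sketch that turns the ids from the MQL result into content-store URLs using the /api/trans/raw pattern shown above. contentUrls and TRANS_RAW are hypothetical names; only /guid/ ids are shown working with trans/raw in the examples, so the sketch skips null contents and other id forms rather than guessing how they map.

```javascript
// Hypothetical helper: turns the ids returned by the MQL query into
// content-store URLs using the /api/trans/raw pattern from the answer.
const TRANS_RAW = "http://www.freebase.com/api/trans/raw";

function contentUrls(topic) {
  // Only /guid/... ids are shown with trans/raw above, so null contents
  // and other id forms are skipped rather than guessed at.
  const toUrls = (ids) =>
    ids
      .filter((id) => typeof id === "string" && id.startsWith("/guid/"))
      .map((id) => TRANS_RAW + id);
  return {
    images: toUrls((topic["/common/topic/image"] || []).map((img) => img.id)),
    articles: toUrls((topic.article || []).map((a) => a.content)),
  };
}

// First (and only) element of the query result shown in the question.
const topic = {
  "/common/topic/image": [
    { id: "/guid/9202a8c04000641f8000000004fb4c01" },
    { id: "/wikipedia/images/commons_id/4486276" },
  ],
  article: [{ content: null }, { content: "/guid/9202a8c04000641f800000000903535d" }],
  name: "Bill Gates",
  type: "/common/topic",
};
console.log(contentUrls(topic).images[0]);
// → http://www.freebase.com/api/trans/raw/guid/9202a8c04000641f8000000004fb4c01
```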
Freebase's new image service can now be used to get images by Freebase id; e.g., for Bill Gates the image URL is:
https://usercontent.googleapis.com/freebase/v1/image/en/bill_gates
More about this service can be found at: http://wiki.freebase.com/wiki/Image_Service
