Schema editor collection sampling is missing fields - azure-cosmosdb

I am attempting to use the ODBC Schema Editor to connect to several Cosmos DB collections for reporting purposes (using Power BI). While I can successfully generate a schema for one collection, another is not working correctly.
The document in question includes a request object that should contain multiple fields. When I sample the collection in Schema Editor, the resulting schema is missing every array of objects (and anything that contains an array of objects) that should appear under the request object; they are simply not listed in the resulting schema. Several other fields are properly split out into their own tables, but those tables are always empty when the schema is applied, which does not reflect the underlying data; I would expect to see rows in them. The behavior does not change if the same collection is re-sampled.
Here's an example:
[screenshot: JSON selection]
Does anyone know how I can get the schema editor to recognize all of my data? I'm not sure what to share that would be helpful but I'm happy to provide more if there's something that would be informative.
EDIT: Unless I'm misunderstanding how to query Cosmos DB, the issue shows up even when I query the data directly through Data Explorer. In the screenshot below, you can see that if I select c.request.preparedBy, preparedBy has a mail property:
[screenshot: preparedBy]
However, if I query c.request.preparedBy.mail directly, I see nothing but blanks, which is exactly what appeared in the Schema Editor:
[screenshot: preparedBy.mail]
Thinking that maybe there was a limit to how many layers of depth I could query, I tried selecting from request instead of the entire collection. Interestingly, even though I see preparedBy when I select * from request, request.preparedBy again returns nothing but empty braces.
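For reference, reconstructed from the screenshots (with c as the default container alias in Data Explorer), the queries described above are roughly:

SELECT c.request.preparedBy FROM c
SELECT c.request.preparedBy.mail FROM c
SELECT * FROM c.request r
SELECT r.preparedBy FROM c.request r

The first shows preparedBy objects that include a mail property; the second and fourth return only blanks or empty braces; the third shows preparedBy in its output.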

Related

How to structure data in Firestore for multiple user entries

This is my first time using a NoSQL database and I'm really struggling to work out how to structure my data.
I have an app that predicts a user's mood, and the user can then select whether that's right or not. So I need to save both the prediction and the actual result. I want to be able to pull the latest result from Firebase and display it in the app.
I understand how I'd do this in an SQL DB, and I understand how to write an SQL query to get that data back out.
For my Firebase DB I thought of the following structure:
the document name is the user's ID, and it stores multiple arrays based on the timestamp. But I can't seem to use orderBy on a document, only on a collection, so I'm not sure how to get this data back.
The fact that this seems so difficult leads me to believe I've implemented the DB wrong to begin with.
The structure of the DB is as follows:
[screenshot: database structure]
I should add that it all works fine for the USER_TABLE, as it's one document ID and a single entry, so I've no problem retrieving that.
Thanks for your help!
orderBy is an instruction to the database to order documents on the server before it returns them to your app. To order the entries stored inside a single document, you can do that in your application code after it receives the document(s).
There is in itself nothing wrong with storing these entries in a single document. Just keep in mind that:
A document can be at most 1 MB in size, so make sure this fits your maximum number of entries.
Firestore only ever returns full documents, so you will either get all entries in a document, or none of them.
You won't be able to order or filter the entries inside a single document. If that is a requirement for you, consider storing each entry in its own document in a subcollection, as in the sketch below. Note that this will increase the number of documents each user reads, though, which will increase the cost.
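A minimal sketch of that subcollection approach, assuming the Node.js firebase-admin SDK and hypothetical collection names users and entries:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Store each mood entry as its own document in a subcollection under the user.
async function addEntry(userId, prediction, actual) {
  await db.collection('users').doc(userId).collection('entries').add({
    prediction,
    actual,
    timestamp: admin.firestore.FieldValue.serverTimestamp(),
  });
}

// Pull the latest entry by ordering the subcollection on timestamp.
async function getLatestEntry(userId) {
  const snap = await db.collection('users').doc(userId)
    .collection('entries')
    .orderBy('timestamp', 'desc')
    .limit(1)
    .get();
  return snap.empty ? null : snap.docs[0].data();
}

Because each entry is its own document, orderBy and limit now work on the server, at the cost of one document read per entry returned.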

Can I get consistent order of fields from a doc.get().data() query in a Firestore database?

I have a Firestore database with data like this:
[screenshot: document data]
Now, I access this data with doc('mydoc').get().data() and it returns the data. But even without the data changing, if I make the same call again and again, I get a different response. I mean, the data is the same, but the order of the fields is different each time.
Here are my logs from two calls; see how the field order is random? Not just between objects in the same request, but between the same object in different requests.
[screenshot: logs from two calls]
I'm accessing this data in a Cloud Function and serving it as an API endpoint. I want to cache the response if the data (in the database) hasn't changed, but I can't, because the data (as returned by doc.get().data()) is constantly changing.
From what I could find, this might stem from ProtoBuf encoding.
My question: is there any way to get a consistent response to a firebase query when the underlying data isn't changing?
And if not, is my only option to JSON.stringify() the whole object before putting it into Firestore? (I don't need to query within document objects.)
Edit for clarity: I am not expecting to know in advance the order of the fields being returned. I am expecting (hoping) that the order will be the same each time.
JSON object fields are unordered as per the JSON spec. Individual implementations of JSON are free to rearrange order however they see fit, and there's no surefire way to guarantee an order. See e.g. this answer.
This isn't a Firestore-specific problem, this is just generally how JSON objects work. You cannot and should not depend on the order of fields for any parsing or representation.
If display order is extremely important to you, you might want to investigate libraries like ordered-json.
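If the goal is just a stable representation for caching or comparison, one option is to serialize with sorted keys. A minimal sketch in plain JavaScript (stableStringify is a hypothetical helper, not part of the Firestore SDK):

// Recursively serialize a value with object keys in sorted order,
// so identical data always produces an identical string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ':' + stableStringify(value[k]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

// Usage: compare serialized snapshots to decide whether a cached response can be reused.
const body = stableStringify(doc.data());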

BizTalk WCF-SQL typed stored procedure response schema

I'm generating the response schema for a typed stored procedure; the stored procedure does some database updates before returning the final result set. The response schema generated by Visual Studio contains quite a bit of garbage.
Is there a way to force it to generate a cleaner schema?
The StoredProcedureResultset4 is the only one that matters.
Here's the same answer I gave on MSDN. Unfortunately, the marked answer will not work for you, since there is no way (or it's really, really hard) to capture and suppress result sets from a called stored procedure.
The cause is related to the Stored Procedure code.
The Wizard will only generate schema types for elements that are returned in the response from SQL Server. Meaning, the stored procedure is emitting results for those updates, so you're getting metadata for them.
The way to solve this is to modify the SP code so that it does not emit a result from any operation that shouldn't produce one. Basically, if you see it in the results window in SQL Server Management Studio, you will get schema for it.
status and message are presumably the result of another SP, so one way to suppress them is to assign the result to a temp table, thus redirecting it from the output stream.
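For example, a minimal T-SQL sketch of that temp-table redirect (dbo.GetReportData, dbo.InnerProc, dbo.ReportData, and the columns are all hypothetical; match them to your actual result sets):

CREATE PROCEDURE dbo.GetReportData
AS
BEGIN
    -- INSERT ... EXEC captures the inner procedure's result set in the
    -- temp table instead of sending it to the client.
    CREATE TABLE #suppressed (status INT, message NVARCHAR(400));
    INSERT INTO #suppressed (status, message)
    EXEC dbo.InnerProc;

    -- Only this final SELECT reaches the caller, so the wizard generates
    -- schema for a single result set.
    SELECT d.Id, d.Name
    FROM dbo.ReportData d;
END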
However, if StoredProcedureResultset4 is all that matters, that's all you have to use. There's nothing wrong with just ignoring all the other results, provided they always appear in the same order.
Just to be clear, you would still have to write a wrapper that suppresses the unwanted results; simply invoking the original SP from a new SP will not change the output, and you'll still get the extra result sets.
In fact, a wrapper would be the harder implementation, since you'd have to capture and examine all result sets, which I don't think is possible.
The more correct way to do this in BizTalk would be a Port Map that strips the unwanted content.

Get all values of some parameter for all documents in MarkLogic

I'm trying to get the 'xxx' parameter of all documents in MarkLogic using a query like:
(/doc/document)/xxx
But since we have a very big document database, I get the error "Expanded tree cache full on host". I don't have admin rights for this server, so I can't change the configuration. I assume I could use ranges while getting documents, like:
(/doc/document)[1 to 1000]/xxx
and then
(/doc/document)[1000 to 2000]/xxx
etc., but I'm concerned that I don't know how this works. For example, what happens if the database changes during this process (e.g. a new document is added)? How will that affect the resulting document list? I also don't know which order is used when I query with ranges...
Please clarify: is this approach appropriate, or is there another way to get some parameter of all documents?
Depending on how big your database is, there may be no way to get all the values in one transaction.
Suppose you have a trillion documents: the result set will be bigger than can be returned in one transaction.
Is that important? Only your business case can tell.
The most efficient way of getting all "xxx" values is with a range index. You can see how this works with cts:element-values (https://docs.marklogic.com/cts:element-values).
You do need to be able to create a range index over the element "xxx" to do this (ask your DBA).
Then cts:element-values() returns only the values, and the chances of being able to return most or all of them in memory in a single transaction are much higher than with XPath (/doc/document/xxx), which, as you wrote, actually returns all the "xxx" elements (not just their values). That most likely requires loading every document matching /doc, then parsing it and returning the xxx element, which can be both slow and inefficient.
A range index just stores the values and you can retrieve those without ever having to load the actual document.
In general, when working with large datasets, learning how to access data in MarkLogic using only indexes will produce the fastest results.
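A minimal XQuery sketch, assuming an element range index has already been created on the xxx element (the index type must match the values, e.g. string):

xquery version "1.0-ml";

(: Read the distinct values of <xxx> straight from the range index,
   without ever loading the documents themselves. :)
cts:element-values(xs:QName("xxx"))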

How can I create a segment data feature

I have a task to extend my web application to give users the ability to segment their own data (i.e. choose their own fields and add their criteria using And/Or, etc.), so I'm creating something similar to a query builder tool, but lighter. I'm not worrying about the front end for the moment; I am just trying to focus on how to do this in the back end.
My only thought so far is to store their "Segment" as an XML document (serialized in the DB) which contains all of their columns and criteria and how they map to the database. Then, when the segment is called, a mapping class deserializes this XML document, maps the fields, builds a SQL query, and returns the query results. The problem I see with this is that if the database setup changes (likely), I then have a serialized XML document which knows nothing about those changes.
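For illustration, a hypothetical serialized segment (element names, fields, and operators are made up) might look like:

<segment name="HighValueCustomers">
  <match mode="And">
    <criterion field="Customer.Status" operator="Equals" value="Active" />
    <criterion field="Order.Total" operator="GreaterThan" value="100" />
  </match>
</segment>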
Has anyone tackled a similar situation?
I had a similar problem and posted a question on here with what could be a potential solution to your own issue.
Dynamic linq query with multiple/unknown criteria
See how you get on with that.
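In case it helps, here is a rough C# sketch of the idea behind that approach: deserialize the stored criteria, then compose LINQ expression predicates with And/Or. The Criterion type, field names, and operators are hypothetical:

using System;
using System.Linq.Expressions;

public record Criterion(string Field, string Op, string Value);

public static class SegmentBuilder
{
    // Build a predicate for one criterion against entity type T.
    public static Expression<Func<T, bool>> ToPredicate<T>(Criterion c)
    {
        var param = Expression.Parameter(typeof(T), "x");
        var member = Expression.PropertyOrField(param, c.Field);
        var value = Expression.Constant(Convert.ChangeType(c.Value, member.Type), member.Type);
        Expression body = c.Op switch
        {
            "Equals" => Expression.Equal(member, value),
            "GreaterThan" => Expression.GreaterThan(member, value),
            _ => throw new NotSupportedException(c.Op),
        };
        return Expression.Lambda<Func<T, bool>>(body, param);
    }

    // Combine two predicates with AndAlso, reusing the first parameter.
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var body = Expression.AndAlso(
            left.Body, Expression.Invoke(right, left.Parameters[0]));
        return Expression.Lambda<Func<T, bool>>(body, left.Parameters[0]);
    }
}

Because criteria reference fields by name, a schema change surfaces as a missing property when the predicate is built, which you can catch and report instead of running a stale SQL string. Note that Expression.Invoke works for LINQ to Objects; for EF you'd typically need a library like LINQKit to expand the invocation.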
