Why are $t and $v referenced in Databricks once the Cosmos Table API is connected? - azure-cosmosdb

The Cosmos Table API is connected through Databricks.
Endpoint = "https://accountname.documents.azure.com:443/"
MasterKey = "keyyyyy"
Cfg1 = {
    "spark.cosmos.accountEndpoint": Endpoint,
    "spark.cosmos.accountKey": MasterKey,
    "spark.cosmos.database": "TablesDB",
    "spark.cosmos.container": "Deals_Metainfo",
    "spark.cosmos.read.inferSchema.enabled": "true"
}
df = spark.read.format("cosmos.oltp").options(**Cfg1) \
    .option("spark.cosmos.read.inferSchema.enabled", "true").load()
print(df.count())
Results were as below (screenshot omitted).
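As background to the title question: the Table API persists each entity property in the underlying document store as a small wrapper object holding the property's type under $t and its value under $v, and the Spark OLTP connector surfaces that raw shape, which is why the columns appear this way. A quick way to confirm the inferred structure (my addition, not from the original post):
# Each Table API property should show up as a struct containing
# $t (type tag) and $v (actual value).
df.printSchema()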
One more question here: I created another data frame from the above results.
df2 = df.select("CreationDate.$v", "$pk", "filename.$v", "Sno.$v", "Status.$v") \
    .toDF("CreationDate", "PrimaryKey", "Filename", "Sno", "Status")
Results were as below (screenshot omitted).
I created a temp view from that:
df2.createOrReplaceTempView("v_filenames_data")
So if I try to query the view for status, results are not appearing (screenshot omitted).
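For reference, a query against the view would look something like the sketch below; the filter value 'Completed' is a hypothetical example, not taken from the question. If the Status column was not fully flattened to a plain string (e.g., it is still a $t/$v struct), an equality filter like this silently returns no rows.
# Hypothetical query against the temp view; 'Completed' is an assumed value.
spark.sql("SELECT Filename, Status FROM v_filenames_data WHERE Status = 'Completed'").show()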

Related

CosmosDB SDK python odata.error InvalidInput

I have a table in a Cosmos DB where I would like to store some query results coming in from Azure Data Explorer.
In Data Explorer (using the Python SDK) I run a query to filter and aggregate my data, and I would like to push the query result into the Cosmos DB.
I did set all the configurations and the connection, but when I run the command upsert_item or create_item I get the following error:
azure.cosmos.exceptions.CosmosHttpResponseError: Status code: 400
{"odata.error":{"code":"InvalidInput","message":{"lang":"en-us","value":"Request url is invalid.\r\nActivityId: d8994113-504e-4198-9976-41316aaafb5f, documentdb-dotnet-sdk/2.11.0 Host/64-bit MicrosoftWindowsNT/6.2.9200.0\nRequestID:d8994113-504e-4198-9976-41316aaafb5f\n"}}}
This is how I configured azure-cosmos:
url = "url"
key = {"masterkey": "my-key"}
client = CosmosClient(url, key)
database_name = "database"
container_name = "table"
database = client.get_database_client(database_name)
container = database.get_container_client(container_name)
for i in df:
    container.upsert_item(i)
df is the result of my query from Azure Data Explorer.
The error, I believe, is due to how I am passing the body of the query to upsert_item.
Any advice about this?
UPDATE:
So I tried to follow the sample and updated my Cosmos DB to the SQL API.
My current code now looks like this:
url = "url"
key = {"masterkey": "my-key"}
client = CosmosClient(url, key)
database_name = "database"
container_name = "table"
database = client.get_database_client(database_name)
container = database.get_container_client(container_name)
data = json.dumps(test)
data_dict = json.loads(data)
for i in data_dict:
    container.create_item(body=i)
I converted the data frame to JSON, but when I run the for loop I get this error:
AttributeError: 'str' object has no attribute 'get'
And I have no idea what I am doing wrong.
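A hedged sketch of what usually fixes this (my addition; it assumes test is a pandas DataFrame, which the question does not state): iterating over the dict that json.loads returns yields its keys as strings, so create_item receives a str instead of a mapping. Converting the frame to a list of per-row dicts sidesteps the JSON round-trip entirely.
import uuid

# Assumes `test` is a pandas DataFrame; to_dict(orient="records") yields one dict per row.
records = test.to_dict(orient="records")
for item in records:
    # Cosmos DB items need a string "id"; generate one if the row lacks it (hypothetical field).
    item["id"] = str(item.get("id", uuid.uuid4()))
    container.create_item(body=item)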
Thanks for your reply; I think we need more contact to debug this.
Here's my code, and it worked well:
from azure.cosmos import exceptions, CosmosClient, PartitionKey, ProxyConfiguration
import family

# Initialize the Cosmos client
endpoint = "https://cosmosdb_name.documents.azure.com:443/"
key = 'primary_key_in_keys_blade'
client = CosmosClient(endpoint, key)
database_name = 'AzureSampleFamilyDatabase'
database = client.create_database_if_not_exists(id=database_name)
# </create_database_if_not_exists>
# <create_container_if_not_exists>
container_name = 'FamilyContainer'
container = database.create_container_if_not_exists(
    id=container_name,
    partition_key=PartitionKey(path="/lastName"),
    offer_throughput=400
)
family_items_to_create = [
    {
        "id": "112233",
        "name": "andersen"
    },
    {
        "id": "455646",
        "name": "johnson"
    },
    {
        "id": "999999",
        "name": "smith"
    }
]
# <create_item>
for family_item in family_items_to_create:
    container.upsert_item(body=family_item)
Per your error, I think you need to check the data format of data_dict, or you can use my sample data for testing to rule out other problems. From where I stand, I don't see why you'd want to use json.loads() here.
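To illustrate the likely failure mode (my illustration, not from the original answer): iterating over a dict yields its keys, which are strings, and a string has no .get method, which matches the AttributeError above.
# Iterating a dict yields its string keys, not its values.
data_dict = {"id": "1", "name": "andersen"}
for i in data_dict:
    print(type(i))  # <class 'str'> -- so create_item(body=i) would fail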

DoGet with multiple parameters not being recognized

I'm currently trying to connect a Lua script with a GS WebApp. The connection is working, but due to my lack of knowledge of GScripting I'm not sure why it isn't saving my data correctly.
On the Lua side I'm just hard-coding a random name and a simple numerical userid.
local HttpService = game:GetService("HttpService")
local scriptID = scriptlink
local WebApp
local function updateSpreadSheet ()
    local playerData = (scriptID .. "?userid=123&name:Jhon Smith")
    WebApp = HttpService:GetAsync(playerData)
end
do
    updateSpreadSheet()
end
On the Google Script side I'm only saving the data on the last row and then adding the value of the userid and the name.
function doGet(e) {
  console.log(e)
  // console.log(f)
  callName(e.parameter.userid, e.parameter.name);
}
function callName(userid, name) {
  // Get the last row and add the name provided
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange(sheet.getLastRow() + 1, 1).setValues([userid], [name]);
}
However, the only data the script is saving is the name, bypassing the userid for reasons I have yet to discover.
setValues() requires a 2D array, and the range dimensions should correspond to that array. The script is only getting a 1 x 1 range, and the setValues argument is not a 2D array. Fix the syntax, or use appendRow:
sheet.getRange(sheet.getLastRow() + 1, 1, 1, 2).setValues([[userid, name]]);
// or
sheet.appendRow([userid, name]);
References:
appendRow

Application Insights and Azure Stream Analytics Query export the whole custom dimensions as string

I have set up a continuous export from Application Insights into Blob storage. With a Stream Analytics job I'm able to get the JSON files out into a SQL DB. So far so good.
Also, with help from Phani Rahul Sivalenka, I'm able to query the individual properties of custom dimensions as described here: Application Insights and Azure Stream Analytics Query a custom JSON property.
My custom dimensions look like this when exported manually into a CSV file:
"{""OperatingSystemVersion"":""10.0.18362.418"",""OperatingSystem"":""WINDOWS"",""RuntimePlatform"":""UWP"",""Manufacturer"":""LENOVO"",""ScreenHeight"":""696"",""IsSimulator"":""False"",""ScreenWidth"":""1366"",""Language"":""it"",""IsTablet"":""False"",""Model"":""LENOVO_BI_IDEAPAD4Q_BU_idea_FM_""}"
In addition to the single columns, I'd like to have the whole custom dimensions object as a string in a SQL table column (varchar(max)).
In the "Test results" of my Stream Analytics output query I see the column formatted as above, but when actually writing into the SQL DB, all my tests ended up having only the value "Array" or "Record" in my SQL table column.
What do I have to do in the Stream Analytics query to get the whole custom dimensions value as a string, so that I'm able to write it into the SQL table as a whole string?
What do I have to do in the Data Stream Query to get the whole custom dimensions value as a string and I'm able to write this into SQL Table as a whole string?
You could use a UDF to merge all the key-value pairs of a single row into one JSON-formatted string.
UDF:
function main(raw) {
    let str = "{";
    for (let key in raw) {
        str = str + "\"" + key + "\":\"" + raw[key] + "\",";
    }
    str += "}";
    return str;
}
SQL:
SELECT udf.jsonstring(INPUT1) FROM INPUT1
Output: (screenshot omitted)
The answer brought me onto the right track.
The above script didn't include the values as expected, so I modified it to work as needed:
function main(dimensions) {
    let str = "{";
    for (let i in dimensions) {
        let dim = dimensions[i];
        for (let key in dim) {
            str = str + "\"" + key + "\":\"" + dim[key] + "\",";
        }
    }
    str += "}";
    return str;
}
Selecting:
WITH pageViews as (
    SELECT
        V.ArrayValue.name as pageName
        , *
        , customDimensions = UDF.flattenCustomDimensions(A.context.custom.dimensions)
        , customDimensionsString = UDF.createCustomDimesionsString(A.context.custom.dimensions)
    FROM [AIInput] as A
    CROSS APPLY GetElements(A.[view]) as V
)
With this I'm getting the custom dimensions string as follows in my SQL table:
{"Language":"tr","IsSimulator":"False","ScreenWidth":"1366","Manufacturer":"Hewlett-Packard","OperatingSystem":"WINDOWS","IsTablet":"False","Model":"K8K51ES#AB8","OperatingSystemVersion":"10.0.17763.805","ScreenHeight":"696","RuntimePlatform":"UWP",}
(Note the trailing comma before the closing brace; strict JSON parsers will reject it, so the UDF may need to trim the last comma before appending "}".)

Multiple commits to neo4j from R

I have collected some tweets using the twitteR package and thereafter exported them to a Neo4j database following Nicole White's various tutorials. I extract the tweets to a dataframe called kdf and then use functions from stringr for basic cleaning up, as demonstrated by Nicole. I then send this to Neo4j from R. The essential part of my code is:
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/", username="xxxx", password="xxxx")
clear(graph)
addConstraint(graph, "Tweet", "id")
addConstraint(graph, "User", "username")
addConstraint(graph, "Hashtag", "hashtag")
addConstraint(graph, "Tags", "ent_tag")
query = "
CREATE (tweet:Tweet {id: {tweetID}})
SET tweet.text = {text}
CREATE (user:User {name: {Username}})
CREATE (user)-[:TWEETED]->(tweet)
FOREACH(reply_to_sn IN CASE {reply_to_sn} WHEN NULL then [] else [{reply_to_sn}] END |
MERGE (replytouser:User {username:{reply_to_sn}})
CREATE (tweet)-[:IN_REPLY_TO]->(replytouser)
)
FOREACH(retweet_sn IN CASE {retweet_sn} WHEN NULL THEN [] ELSE [{retweet_sn}] END |
MERGE(retweet_user:User {username: {retweet_sn}})
CREATE (tweet)-[:RETWEET_OF]->(retweet_user)
)
FOREACH(hastag_nodes IN CASE {hashtag_nodes} WHEN NULL then [] else [{hashtag_nodes}] END |
MERGE (h:Hashtag {hashtag :{hashtag_nodes}})
CREATE (tweet)-[:HASHTAG]->(h)
)
FOREACH(mentioned_users IN CASE {mentioned_users} WHEN NULL then [] else [{mentioned_users}] END |
MERGE (m:User {username :{mentioned_users}})
CREATE (tweet)-[:MENTIONED]->(m)
)
"
tx = newTransaction(graph)
for (i in 1:nrow(kdf)) {
    row = kdf[i, ]
    appendCypher(tx, query,
                 tweetID=row$id,
                 text=row$text,
                 Username=row$screenName,
                 reply_to_sn=row$replyToSN,
                 retweet_sn=getRetweetSN(row$text),
                 hashtag_nodes=getHashtags(row$text),
                 mentioned_users=getMentions(row$text))
}
commit(tx)
What I have done thereafter is extract named entities for all the text using Watson's Alchemy API. These are stored in a dataframe called ent_tbl, which contains three variables: tweetid, etext and etype. Now I am trying to export this data too to the same Neo4j database and join on the id of the tweets. This is the other part of the code:
query = "
MATCH(t:ent_tag {id : $twid, type :$etype, text :$etext})
MATCH(tw:tweet {tweetID : $twid })
CREATE (tw)-[:HAS_ENT]->(t)
"
tx = newTransaction(graph)
for (i in 1:nrow(ent_tbl)) {
    row = ent_tbl[i, ]
    appendCypher(tx, query,
                 twid=row$tweetid,
                 etype=row$etype,
                 etext=row$etext)
}
commit(tx)
While I do not get any errors on committing this, summary(graph) does not show me the relationship between the tags (t) and the tweets (tw) that I expected to see.
> summary(graph)
This To That
1 User TWEETED Tweet
2 Tweet RETWEET_OF User
3 Tweet HASHTAG Hashtag
4 Tweet MENTIONED User
5 Tweet IN_REPLY_TO User
Why would this happen?
This is my db.schema in Neo4j: (screenshot omitted)
That is because the MATCH does not find any tag or tweet, so it breaks. If you want to add data to existing nodes, you should match them by ID and then set their properties. And you have to be consistent with labels and upper/lower case. I think this is what you are looking for:
query = "
MATCH(t:Tags {ent_tag : $twid})
MATCH(tw:Tweet {tweetID : $twid })
SET t.type=$etype, t.text=$etext
CREATE (tw)-[:HAS_ENT]->(t)
"
tx = newTransaction(graph)
for (i in 1:nrow(ent_tbl)) {
    row = ent_tbl[i, ]
    appendCypher(tx, query,
                 twid=row$tweetid,
                 etype=row$etype,
                 etext=row$etext)
}
commit(tx)

Losing some z3c relation data on restart

I have the following code which is meant to programmatically assign relation values to a custom content type.
# Imports implied by the original snippet (exact paths are my assumption):
from zope.component import getUtility
from zope.event import notify
from zope.intid.interfaces import IIntIds
from zope.lifecycleevent import ObjectModifiedEvent
from z3c.relationfield import RelationValue
from Products.CMFCore.utils import getToolByName

intids = getUtility(IIntIds)
publications = ...  # some data
catalog = getToolByName(context, 'portal_catalog')
for pub in publications:
    if pub['custom_id']:
        results = catalog(custom_id=pub['custom_id'])
        if len(results) == 1:
            obj = results[0].getObject()
            measures = []
            for m in pub['measure']:
                if m in context.objectIds():
                    m_id = intids.getId(context[m])
                    relation = RelationValue(m_id)
                    measures.append(relation)
            obj.measures = measures
            obj.reindexObject()
            notify(ObjectModifiedEvent(obj))
Snippet of schema for custom content type
measures = RelationList(
title=_(u'Measure(s)'),
required=False,
value_type=RelationChoice(title=_(u'Measure'),
source=ObjPathSourceBinder(object_provides='foo.bar.interfaces.measure.IMeasure')),
)
When I run my script everything looks good. The problem is that when my template for the custom content tries to call "pub/from_object/absolute_url", the value is blank, but only after a restart. Interestingly, I can get other attributes of pub/from_object after a restart, just not its URL.
from_object retrieves the referencing object from the relation catalog, but doesn't put the object back in its proper Acquisition chain. See http://docs.plone.org/external/plone.app.dexterity/docs/advanced/references.html#back-references for a way to do it that should work.
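For convenience, here is a sketch of the approach those docs describe, reconstructed from memory (so treat the exact calls as an assumption and defer to the linked page): look the relations up in the zc.relation catalog and resolve each relation's from_id back to a properly wrapped object via the intid utility.
from Acquisition import aq_inner
from zc.relation.interfaces import ICatalog
from zope.component import getUtility
from zope.intid.interfaces import IIntIds

def back_references(source_object, attribute_name):
    """Return the objects that reference source_object via attribute_name."""
    catalog = getUtility(ICatalog)
    intids = getUtility(IIntIds)
    result = []
    for rel in catalog.findRelations({
        'to_id': intids.getId(aq_inner(source_object)),
        'from_attribute': attribute_name,
    }):
        obj = intids.queryObject(rel.from_id)
        if obj is not None:
            result.append(obj)
    return result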
