azure kql parse function - unable to parse ? using regex (zero or one time) - azure-data-explorer

I'm trying to parse this line:
01/11/1011 11:11:11: LOG SERVER = 1 URL = /one/one.aspx/ AccountId = 1111 MainId = 1111 UserAgent = Browser = Chrome , Version = 11.0, IsMobile = False, IP = 1.1.1.1 MESSAGE = sample message TRACE = 1
using this parse statement:
parse-where kind=regex flags=i message with
timestamp:datetime
":.*LOG SERVER = " log_server:string
".*URL = " url:string
".*AccountId = " account_id:string
".*MainId = " main_id:string
".*?UserAgent = " user_agent:string
",.*Version = " version:string
",.*IsMobile = " is_mobile:string
",.*IP = " ip:string
".*MESSAGE = " event:string
".*TRACE = " trace:string
The problem is that some records are missing one of the "key = value" pairs, although the order of the remaining fields stays the same.
To match every kind of row, I tried making the delimiter optional by wrapping it in (<name_of_column>)?, for example:
"(,.*Version = )?" version:string
but it fails every time.

I think parse/parse-where operators are more useful when you have well formatted inputs - the potentially missing values in this case would make it tricky/impossible to use these operators.
If you control the formatting of the input strings, consider normalizing it to always include all fields and/or add delimiters and quotes where appropriate.
Otherwise, you could use the extract function to parse it - the following expression would work even if some lines are missing some fields:
| extend
timestamp = extract("(.*): .*", 1, message, typeof(datetime)),
log_server = extract(".*LOG SERVER = ([^\\s]*).*", 1, message),
url = extract(".*URL = ([^\\s]*).*", 1, message),
account_id = extract(".*AccountId = ([^\\s]*).*", 1, message),
main_id = extract(".*MainId = ([^\\s]*).*", 1, message),
user_agent = extract(".*UserAgent = ([^,]*).*", 1, message),
version = extract(".*Version = ([^,]*).*", 1, message),
is_mobile = extract(".*IsMobile = ([^,]*).*", 1, message),
ip = extract(".*IP = ([^\\s]*).*", 1, message),
event = iff(message has "TRACE", extract(".*MESSAGE = (.*) TRACE.*", 1, message), extract(".*MESSAGE = (.*)", 1, message)),
trace = extract(".*TRACE = (.*)", 1, message)
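For comparison, the same one-pattern-per-field idea can be sketched outside Kusto. The Python below is only an illustration, assuming the sample line from the question; FIELD_PATTERNS and parse_log_line are hypothetical names, and a field that is absent comes back as None instead of failing the whole row.

```python
import re

# Hypothetical table: column name -> regex with exactly one capture group.
# The patterns mirror the extract() calls above.
FIELD_PATTERNS = {
    "log_server": r"LOG SERVER = (\S*)",
    "url": r"URL = (\S*)",
    "account_id": r"AccountId = (\S*)",
    "main_id": r"MainId = (\S*)",
    "user_agent": r"UserAgent = ([^,]*)",
    "version": r"Version = ([^,]*)",
    "is_mobile": r"IsMobile = ([^,]*)",
    "ip": r"IP = (\S*)",
    # MESSAGE runs up to " TRACE = " if present, otherwise to the end of the line.
    "event": r"MESSAGE = (.*?)(?: TRACE = |$)",
    "trace": r"TRACE = (.*)",
}


def parse_log_line(line):
    """Return a dict of fields; a key that is missing from the line maps to None."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, line)
        row[name] = match.group(1).strip() if match else None
    return row
```

Because every field is searched for independently, dropping "Version = 11.0," from a line only changes the version entry; the other fields are still found.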


Telegraf json v2 parsing

I'm trying to pick out the local_temperature property from a Zigbee TRV.
Here is my section from /etc/telegraf/telegraf.conf
[[inputs.mqtt_consumer]]
servers = ["tcp://127.0.0.1:1883"]
topics = [
"zigbee2mqtt/Home/+/Radiator",
]
data_format = "json_v2"
[[inputs.mqtt_consumer.json_v2]]
measurement_name = "temperature"
[[inputs.mqtt_consumer.topic_parsing]]
topic = "_/room/_"
[[inputs.mqtt_consumer.json_v2.field]]
path = "local_temperature"
type = "float"
Error from /var/log/syslog
Feb 21 11:03:07 mini31 telegraf[23428]: 2022-02-21T11:03:07Z E! [inputs.mqtt_consumer] Error in plugin: metric parse error: expected tag at 1:36: "{\"battery\":97,\"boost_heating\":\"OFF\",\"boost_heating_countdown\":0,\"boost_heating_countdown_time_set\":300,\"child_lock\":\"UNLOCK\",\"current_heating_setpoint\":8,\"eco_mode\":\"OFF\",\"eco_temperature\":12,\"linkquality\":110,\"local_temperature\":17.5,\"local_temperature_calibration\":-2,\"max_temperature\":24,\"min_temperature\":8,\"position\":25,\"preset\":\"programming\",\"programming_mode\":\"09:00/19°C 12:00/13°C 14:00/19°C 17:00/8°C 06:00/8°C 12:00/8°C 14:30/8°C 17:30/8°C 06:00/8°C 12:30/8°C 14:30/8°C 18:30/8°C\",\"window\":\"CLOSED\",\"window_detection\":\"OFF\"}"
What does it even mean?
[[inputs.mqtt_consumer]]
servers = ["tcp://127.0.0.1:1883"]
topics = [
"zigbee2mqtt/Home/+/Radiator",
]
data_format = "json_v2"
[[inputs.mqtt_consumer.json_v2]]
measurement_name = "temperature"
[[inputs.mqtt_consumer.topic_parsing]]
topic = "zigbee2mqtt/Home/+/Radiator"
tags = "_/_/room/_"
[[inputs.mqtt_consumer.json_v2.field]]
path = "local_temperature"
rename = "temperature"
type = "float"
[[inputs.mqtt_consumer.json_v2]]
measurement_name = "valve"
[[inputs.mqtt_consumer.json_v2.field]]
path = "position"
rename = "valve"
type = "float"
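To make the topic_parsing part less mysterious, here is a rough Python model of what the config above asks for. It is not Telegraf's implementation; parse_topic_tags and extract_field are hypothetical helpers, and the rule that the tags template must have as many segments as the topic is an assumption of this sketch.

```python
import json


def parse_topic_tags(topic, tags_template):
    """Pair each topic segment with the template segment at the same position.

    A template segment of "_" means "ignore"; any other name becomes a tag key.
    """
    topic_parts = topic.split("/")
    template_parts = tags_template.split("/")
    if len(topic_parts) != len(template_parts):
        raise ValueError("tags template and topic have different segment counts")
    return {
        name: value
        for name, value in zip(template_parts, topic_parts)
        if name != "_"
    }


def extract_field(payload, path, rename=None):
    """Pick one top-level key out of a JSON payload and store it as a float."""
    document = json.loads(payload)
    return {rename or path: float(document[path])}
```

With a topic such as zigbee2mqtt/Home/Kitchen/Radiator (Kitchen is a made-up room name), the template "_/_/room/_" yields the tag room=Kitchen, while a three-segment template like "_/room/_" does not line up with the four-segment topic.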

How can I pass a line break through an API Patch or Post call in R to Microsoft Dataverse?

I am trying to populate a table in Microsoft Dataverse using an API Patch call through R. The field in Dataverse is a multiline text field with a limit of 4,000 characters, but using a Patch call I can't seem to get text to break across lines. I have other entries in the table that break across lines where they had been brought in by using Access as the front end, which I'm trying to not have to use.
### This is where I create the multiline text
MPC.Comments$Comment = paste(paste("Meeting Date: ", month(Sys.Date()), "/", year(Sys.Date()), sep = ""),
paste("Plan Cost: $", round(MPC.Comments$plantot, 0), sep = ""),
paste("%PC Remain: ", round((round(MPC.Comments$plantot, 0) - round(MPC.Comments$acttot, 0) - round(MPC.Comments$comsub, 0))/round(MPC.Comments$plantot, 0)*100,1), "%", sep = ""),
paste("Date Next PM Gate: ", MPC.Comments$eventdate, sep = ""),
paste("Est. Monthly Spend: $", MPC.Comments$esttot, sep = ""),
paste("Status Update: ", MPC.Comments$MPCStatus, sep = ""),
paste("Action Items: ", MPC.Comments$MPCAction, sep = ""),
sep = " <\\r\\n>")
### This is where I try to push it to the database
if (nrow(MPC.Comments) > 0) {
for (i in 1:nrow(MPC.Comments)) {
### Set POST parameters
MPC.Comments.Temp = list(
crfd0_projno = MPC.Comments$ProjNo[i],
crfd0_fyp = MPC.Comments$FYP[i],
crfd0_comment = MPC.Comments$Comment[i],
statuscode = 1)
POST("https://[REDACTED].crm4.dynamics.com/api/data/v9.2/crfd0_mpccommentses",
add_headers(Authorization = paste("Bearer", API.AuthKey, sep = " ")),
body = MPC.Comments.Temp,
encode = "json",
verbose())}}
No matter what special character I put in as my separator, it either comes through as nothing (if not escaped like \n) or as the physical string (if escaped like \\n). I have tried \n, \r\n, , <br>, and \u000a. Clearly there is a piece missing here that the data needs to be sent as something other than a string with a carriage return character. Does anyone know how to do this?
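The difference between an escaped and an unescaped separator can be seen in isolation with a small Python sketch. This only illustrates how JSON encoding treats the two strings; it is not a confirmed fix for Dataverse, and the sample comment text is made up.

```python
import json

# One real line-feed character between the two parts.
real_newline = "Plan Cost: $100\nStatus Update: ok"
# A literal backslash followed by the letter n (what "\\n" produces).
escaped_text = "Plan Cost: $100\\nStatus Update: ok"

# What would travel over the wire in a JSON request body.
wire_real = json.dumps({"comment": real_newline})
wire_escaped = json.dumps({"comment": escaped_text})

# What a JSON-aware receiver would decode on the other side.
decoded_real = json.loads(wire_real)["comment"]
decoded_escaped = json.loads(wire_escaped)["comment"]
```

In the first case the encoder writes the line feed as the two-character escape \n and the receiver turns it back into a real line break. In the second case the backslash itself is escaped, so the receiver ends up with the visible text \n, which matches the "physical string" behaviour described above.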

How to include / exclude filter statement in R httr query for Localytics

I can successfully query data from Localytics using R, such as the following example:
r <- POST(url = "https://api.localytics.com/v1/query",
body=list(app_id=<APP_ID>,
metrics=c("occurrences","users"),
dimensions=c('a:URI'),
conditions=list(day = c("between", "2020-02-11", "2020-03-12"),
event_name = "Content Viewed",
"a:Item URI" = "testing")
),
encode="json",
authenticate(Key,Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
But what I would like to do is create a function so I can do this quickly and not have to copy/paste data.
The issue I am running into is with the line "a:Item URI" = "testing", where I am filtering all searches by the Item URI where they all equal "testing", but sometimes, I don't want to include the filter statement, so I just remove that line entirely.
When I wrote my function, I tried something like the following:
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = list(day = c("between", as.character(from), as.character(to)),
event_name = eventName,
filterDim = filterCriteria)
),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
But my attempt at adding filterDim and filterCriteria only produces the error Unprocessable Entity. (Keep in mind that there are lots of variables I can filter by, not just "a:Item URI", so I need to be able to change the dimension name as well.)
How can I include a statement, where if I need to filter, I can incorporate that line, but if I don't need to filter, that line isn't included?
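conditions is just a list, so you can add elements to it conditionally. In your attempt, filterDim = filterCriteria inside list() creates an element literally named filterDim rather than one named by the variable's value. Here we build the list first, then use an if statement to test whether both values were passed and, if so, add them with [[ ]].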
get_localytics <- function(appID, metrics, dimensions, from = Sys.Date()-30,
to = Sys.Date(), eventName = "Content Viewed",
Key, Secret, filterDim = NULL, filterCriteria = NULL){
conditions <- list(day = c("between", as.character(from), as.character(to)),
event_name = eventName)
if (!is.null(filterDim) && !is.null(filterCriteria)) {
conditions[[filterDim]] <- filterCriteria
}
r <- httr::POST(url = "https://api.localytics.com/v1/query",
body = list(app_id = appID,
metrics = metrics,
dimensions = dimensions,
conditions = conditions),
encode="json",
authenticate(Key, Secret),
accept("application/json"),
content_type("application/json"))
stop_for_status(r)
result <- paste(rawToChar(r$content),collapse = "")
document <- fromJSON(result)
df <- document$results
return(df)
}
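The same pattern, sketched in Python for readers who are not using R: build the mandatory conditions first, then add the optional filter only when both parts are supplied. build_conditions and its parameter names are hypothetical, and nothing here calls the Localytics API.

```python
from datetime import date, timedelta


def build_conditions(event_name="Content Viewed", start=None, end=None,
                     filter_dim=None, filter_criteria=None):
    """Return the conditions mapping, with the filter key only when requested."""
    end = end or date.today()
    start = start or end - timedelta(days=30)
    conditions = {
        "day": ["between", start.isoformat(), end.isoformat()],
        "event_name": event_name,
    }
    # The key is taken from the variable's value, not from its name.
    if filter_dim is not None and filter_criteria is not None:
        conditions[filter_dim] = filter_criteria
    return conditions
```

Calling build_conditions(filter_dim="a:Item URI", filter_criteria="testing") adds an "a:Item URI" entry; calling it with no filter arguments leaves the mapping with only day and event_name.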

How to scrape additional data points from Zillow using R

I inherited a file from a previous coworker to use R to pull Zillow "Zestimate" and "Rent Zestimate" data for properties, and then output these data points to a CSV file. However, I am very new to coding and have not been successful with pulling additional information that I know is available. I have searched the site for answers, but since I am still trying to learn how to code I haven't been successful with making my own edits to the current code. Any help I can get adding code to pull any of these additional data points would be much appreciated.
Property details (sqft, year built, beds, baths, property type)
Zestimate range (high and low)
Rent Zestimate range (high and low)
Last sold date and price
Price history (latest event, date, and price) (not sure this can be scraped)
Tax history (latest year and property taxes) (not sure this can be scraped)
Current code:
houseAddsSplit = read.csv(houseAddsFileLocation)
zillowAdds = paste(houseAddsSplit$STREET, houseAddsSplit$CITY, houseAddsSplit$STATE, houseAddsSplit$ZIP, sep = " ")
library(ZillowR)
library(XML)
set_zillow_web_service_id(zwsId)
zpidList = NULL
zestimate = NULL
rentZestimate = NULL
for(i in 1:length(zillowAdds)){
print(paste("Processing house: ", i, ", address: ", zillowAdds[i]))
print(zillowAdds[i])
houseZpidClean = "ERR"
houseZestClean = "ERR"
houseRentZestClean = "ERR"
houseInfo = try(GetSearchResults(address = zillowAdds[i], citystatezip = as.character(houseAddsSplit$ZIP[i]), rentzestimate = TRUE))
# while(houseInfo$message$code != "0"){
#   houseInfo = try(GetSearchResults(address = cipAdds[i], citystatezip = as.character(cipLoans$ZIP[i]), rentzestimate = TRUE))
#   Sys.sleep(runif(1, 3, 5))
# }
if(houseInfo$message$code == "0"){
houseZpid = try(xmlElementsByTagName(houseInfo$response, "zpid", recursive = TRUE))
houseZest = try(xmlElementsByTagName(houseInfo$response, "amount", recursive = TRUE))
houseZpidAlmostClean = try(toString.XMLNode(houseZpid$results.result.zpid))
houseZestAC = try(toString.XMLNode(houseZest$results.result.zestimate.amount))
houseRentZestAC = try(toString.XMLNode(houseZest$results.result.rentzestimate.amount))
houseZpidClean = try(substr(houseZpidAlmostClean, 7, nchar(houseZpidAlmostClean) - 7))
houseZestClean = try(substr(houseZestAC, 24, nchar(houseZestAC) - 9))
houseRentZestClean = try(substr(houseRentZestAC, 24, nchar(houseRentZestAC) - 9))
}
closeAllConnections()
zpidList[i] = houseZpidClean
print(paste("zpid: ", houseZpidClean))
zestimate[i] = houseZestClean
print(paste("zestimate: ", houseZestClean))
rentZestimate[i] = houseRentZestClean
print(paste("rent zestimate: ", houseRentZestClean))
Sys.sleep(runif(1, 7, 10))
}
outputData = cbind(houseAddsSplit, zestimate, rentZestimate)
write.csv(outputData, paste(writeToFolder, "/zillowPullOutput.csv", sep = ""))
print(paste("All done. File written to", paste(writeToFolder, "/zillowPullOutput.csv", sep = "")))
Hopefully you have solved this by now, but the GetSearchResults API does not return all of the data points you are looking for. You may have to call the GetUpdatedPropertyDetails API to get them.

UK Postcode to Census Data using the API

Using the Office for National Statistics website I can get a census summary for a UK postcode.
https://neighbourhood.statistics.gov.uk/dissemination/
I expected that I should be able to do the same thing using the API.
https://neighbourhood.statistics.gov.uk/HTMLDocs/downloads/QuickStart-Guide-V2.1.pdf
But it isn't clear to me how to get from the postcode to the neighbourhood (or Lower Layer Super Output Area as the Office for National Statistics calls them). It seems that I need to use the Delivery endpoint like this.
http://neighbourhood.statistics.gov.uk/NDE2/Deli/getChildAreaTables?ParentAreaId=276980&LevelTypeId=141&Datasets=67
But how do I find out which parameters to use for a specific postcode?
It looks like three calls are required to get a dataset.
import xml.etree.ElementTree as ElementTree
import json
import requests
API_KEY = "YOUR_API_KEY"
def get_area_id(level_type, postcode):
    """ Get the area id for the postcode
    :param level_type: The resolution you are interested in. 14 = ward level data.
    :param postcode: A UK postcode
    :return: string area identifier
    """
    base_url = "http://neighbourhood.statistics.gov.uk/NDE2/Disco/FindAreas"
    payload = {'HierarchyId': '27', 'Postcode': postcode}
    response = requests.get(base_url, params=payload)
    xml = ElementTree.fromstring(response.content)
    namespaces = {'ns1': 'http://neighbourhood.statistics.gov.uk/nde/v1-0/discoverystructs'}
    xpath_for_area = './/ns1:Area'
    areas = xml.findall(xpath_for_area, namespaces)
    ward_area_id = ''
    for area in areas:
        level_type_id = area.find('ns1:LevelTypeId', namespaces).text
        if level_type_id == str(level_type):  # find the Ward (=14)
            ward_area_id = area.find('ns1:AreaId', namespaces).text
    return ward_area_id
def get_ext_code(area_id):
    """ Get the ext code (whatever that is) from an area id
    :param area_id: the area id for a postcode
    :return: the ext code for an area (I think is the GSS code)
    """
    base_url = "http://neighbourhood.statistics.gov.uk/NDE2/Disco/GetAreaDetail"
    payload = {'AreaId': area_id}
    response = requests.get(base_url, params=payload)
    xml = ElementTree.fromstring(response.content)
    namespaces = {'ns1': 'http://neighbourhood.statistics.gov.uk/nde/v1-0/discoverystructs',
                  'structure': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure'}
    xpath_for_ext_code = './/ns1:ExtCode'
    ext_code = xml.find(xpath_for_ext_code, namespaces).text
    return ext_code
def get_data(data_set, geog_code):
    """ Get the data for a geographical code
    :param data_set: string identifier from http://www.nomisweb.co.uk/census/2011/quick_statistics
    :param geog_code: the ext code for the geographical area
    :return: a json object with the data
    """
    base_url = "http://data.ons.gov.uk/ons/api/data/dataset/"
    payload = {'apikey': API_KEY, 'context': 'Census', 'geog': '2011WARDH', 'dm/2011WARDH': geog_code,
               'totals': 'false', 'jsontype': 'json-stat'}
    r = requests.get(base_url + "/" + data_set + ".json", params=payload)
    obj = json.loads(r.text)
    return obj
def process(json_object, data_set):
    data = {}
    values = json_object[data_set]['value']
    index = json_object[data_set]['dimension'][json_object[data_set]['dimension']['id'][1]]['category']['index']
    labels = json_object[data_set]['dimension'][json_object[data_set]['dimension']['id'][1]]['category']['label']
    for l in labels:
        num = index[l]
        count = values[str(num)]
        data[labels[l]] = count
    return data
area_id = get_area_id(14, "SW1A 0AA")
gss_code = get_ext_code(area_id)
data_returned = get_data("QS208EW", gss_code)  # QS208EW = religion
print(process(data_returned, "QS208EW"))
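To show what the process step expects without calling the service, here is a small offline sketch. The sample object is entirely made up (labels and counts are placeholders, not census figures); its shape is inferred only from how process indexes the response above.

```python
# Hypothetical, hand-written stand-in for a JSON-stat style response.
sample = {
    "QS208EW": {
        "value": {"0": 100, "1": 60, "2": 40},
        "dimension": {
            "id": ["geography", "category"],
            "category": {
                "category": {
                    "index": {"A": 0, "B": 1, "C": 2},
                    "label": {
                        "A": "All categories",
                        "B": "Group one",
                        "C": "Group two",
                    },
                }
            },
        },
    }
}


def process(json_object, data_set):
    """Map each category label to its count, using the second dimension."""
    dataset = json_object[data_set]
    second_dimension = dataset["dimension"]["id"][1]
    category = dataset["dimension"][second_dimension]["category"]
    data = {}
    for key, label in category["label"].items():
        position = category["index"][key]          # label key -> position
        data[label] = dataset["value"][str(position)]  # position -> count
    return data
```

The lookup goes label key, then index position, then value, which is why the values are addressed with str(position) rather than by label.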
Have you tried looking at the code in the VBA example?
Function RunAreas()
Dim txtResponse
Dim postcode As String
Dim extCode
Set rootSheet = GetSheet("Query")
Set areaSheet = GetSheet("Areas")
endPoint = "http://neighbourhood.statistics.gov.uk/NDE2/Disco/FindAreas?HierarchyId=27&Postcode="
postcode = rootSheet.Range("A2").Value
Application.StatusBar = "Getting areas for " + postcode
txtResponse = GetAreas(postcode)
delim = "<delim>"
data = GetElements(txtResponse, "Area")
If UBound(data) < 0 Then
Application.StatusBar = False
MsgBox "Postcode " + postcode + " not found", vbExclamation
Exit Function
End If
For i = 0 To UBound(data)
curLevelType = GetValue(data(i), "LevelTypeId")
curHierarchy = GetValue(data(i), "HierarchyId")
curId = GetValue(data(i), "AreaId")
curName = GetValue(data(i), "Name")
Select Case curLevelType
Case 15
extCode = UpdateArea("Output Area", 2, curId, curName, curHierarchy)
Case 14
extCode = UpdateArea("Ward", 3, curId, curName, curHierarchy)
Case 13
extCode = UpdateArea("LA", 4, curId, curName, curHierarchy)
Case 11
extCode = UpdateArea("Region", 5, curId, curName, curHierarchy)
Case 10
extCode = UpdateArea("Country", 6, curId, curName, curHierarchy)
End Select
Next
MsgBox ("Areas Found")
Application.StatusBar = "Get Areas completed"
End Function
