Replace occurrences of one dataset in another - r

I have a dataset called Messages that contains C# errors. I have a second dataset called Usernames that contains a list of usernames. I want to remove every occurrence of any username from the messages. No message should contain more than one occurrence of a username. I thought I could do this with gsubfn, but it output all NULLs. Can someone set me straight on the best way to do this?
usrNm <- c(dataset2$username)
stripUsername <- function(x) {gsubfn(usrNm,'',x)}
noUsernames <- within(dataset,{Message=stripUsername(dataset$Message)})
+----------------------------------+----------------------------------+ +--------------+
| Message | Expected output | | Username |
+----------------------------------+----------------------------------+ +--------------+
| User: Mary.Jane sent bad data | User: sent bad data | | Mary.Jane |
+----------------------------------+----------------------------------+ +--------------+
| Error occurred in System.Module. | Error occurred in System.Module. | | Robert.Frost |
+----------------------------------+----------------------------------+ +--------------+
| Hello, world! | Hello, world! | | BB.Wolf |
+----------------------------------+----------------------------------+ +--------------+
| Tracing request by Robert.Frost! | Tracing request by ! |
+----------------------------------+----------------------------------+

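Here is one way, using stringi. Fixed matching treats the "." in each username literally, and vectorize_all = FALSE applies every username to every message instead of pairing them element by element: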
library(stringi)
stri_replace_all_fixed(dataset$Message, dataset2$Username, '', vectorize_all = FALSE)
Output
[1] "User: sent bad data" "Error occurred in System.Module."
[3] "Hello, world!" "Tracing request by !"
Data
dataset <- data.frame(
Message = c("User: Mary.Jane sent bad data", "Error occurred in System.Module.", "Hello, world!", "Tracing request by Robert.Frost!"),
stringsAsFactors = FALSE
)
dataset2 <- data.frame(
Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf")
)
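For comparison, the same logic can be sketched outside R. The JavaScript below is only an illustration under assumptions: stripUsernames is a hypothetical helper, not part of any library, and the data is copied from the question. It uses split/join so that each username is matched literally, which mirrors the fixed matching in the answer above.

```javascript
// Hypothetical helper: remove every username from every message.
// split/join performs literal (non-regex) replacement, so "." matches only a dot.
function stripUsernames(messages, usernames) {
  return messages.map((msg) =>
    usernames.reduce((text, name) => text.split(name).join(""), msg)
  );
}

const messages = [
  "User: Mary.Jane sent bad data",
  "Error occurred in System.Module.",
  "Hello, world!",
  "Tracing request by Robert.Frost!",
];
const usernames = ["Mary.Jane", "Robert.Frost", "BB.Wolf"];

console.log(stripUsernames(messages, usernames));
```

Removing a name leaves the text around it untouched, so the first message keeps both of the spaces that surrounded "Mary.Jane".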

Related

How to replace empty spaces with values from adjacent column that needs to be separated?

Hi everyone. I'm sorry for my English. I need to extract the domain from the email addresses in a table of congress participants. If an address ends in a country-code domain, that code should be copied into the country column, which is incomplete. The database is relatively large. An example is below.
| email | country |
| -------- | -------------- |
| naco#gmail.com | CO |
| monic45814#gmail.com | AR |
| jsalazar#chapingo.mx | |
| andresramirez#urosario.edu.co | |
| jeimy861491#hotmail.com | CL |
| jytvc#hotmail.com | |
Outcome should be
| email | country |
| -------- | -------------- |
| naco#gmail.com | CO |
| monic45814#gmail.com | AR |
| jsalazar#chapingo.mx | MX |
| andresramirez#urosario.edu.co | CO |
| jeimy861491#hotmail.com | CL |
| jytvc#hotmail.com | *NA* |
Thank you so much.
You can use str_extract to get the string after the last "." and if_else to skip rows that already have a country and rows whose e-mail doesn't end in a country code:
library(dplyr)
library(stringr)
df %>%
  mutate(country = if_else(is.na(country) & str_extract(email, "[^.]+$") != "com", toupper(str_extract(email, "[^.]+$")), country))
A small but important PS: I would always recommend using fake data when your example involves personal data such as e-mail addresses.
Here is a solution in base R.
Suppose:
df<-data.frame(email,country)
Then:
tld <- sub(".*\\.", "", df$email)  # text after the last "."
df$country <- ifelse(is.na(df$country) & tld != "com", toupper(tld), as.character(df$country))  # as.character keeps NA as NA
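The rule both answers apply (keep an existing country, otherwise take the text after the last "." unless it is "com") can also be sketched outside R. The JavaScript below is only an illustration under assumptions: fillCountry is a hypothetical helper, a missing country is represented as null, and "com" is the only ending treated as a non-country domain, as in the answers above.

```javascript
// Hypothetical helper: fill a missing country from the e-mail's last domain label.
function fillCountry(rows) {
  return rows.map(({ email, country }) => {
    if (country) return { email, country }; // keep values that are already set
    const tld = email.split(".").pop(); // text after the last "."
    // Assumption: only "com" is treated as a non-country ending.
    return { email, country: tld === "com" ? null : tld.toUpperCase() };
  });
}

// Sample rows taken from the question's table.
const rows = [
  { email: "naco#gmail.com", country: "CO" },
  { email: "jsalazar#chapingo.mx", country: null },
  { email: "andresramirez#urosario.edu.co", country: null },
  { email: "jytvc#hotmail.com", country: null },
];

console.log(fillCountry(rows));
```

A real table would probably need a longer list of generic endings (for example "org" or "net") than the single "com" check used here.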

Kusto query for grouping AppInsights messages

I need the messages in Azure AppInsights grouped by the existence of particular substrings in the messages and the counts of these messages.
At the end, here is what the grouping would look like
messages count
-------- -------
foomessages <say, 300>
barmessages <say, 450>
:
:
where
foomessages = All messages containing the substring "foo" etc.
How can I construct a query for this?
datatable(log: string) [
"hello world",
"this is a test",
"this is a world test",
"another test"
]
| summarize
LogsWithWorld = countif(log has "world"),
LogsWithTest = countif(log has "test")
| project Result = pack_all()
| mv-expand Result
| extend Message = tostring(bag_keys(Result)[0])
| extend Count = tolong(Result[Message])
| project Message, Count
The produced result is:
| Message       | Count |
|---------------|-------|
| LogsWithWorld | 2     |
| LogsWithTest  | 3     |

Grouping similar column string values

I have a table in Azure Log Analytics where messages are logged.
There aren't many distinct messages, but each one contains a variable part such as a user ID or a timestamp.
I need to count the distinct message types grouped by one hour intervals, ignoring the variable elements in every message (UUID and timestamp in this case).
I don't know all the message types.
I cannot touch anything else, I am forced to work with this table.
Example data:
timestamp | message
----------|--------------------------------------------------------
| Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0
| Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c
| Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Type C message <variable_string>
Desired output:
timestamp | distinct_message | count
----------------------------|--------------------------------------------|------
10/2/2019, 10:00:00.000 AM | Message type A for user id | 25
10/2/2019, 10:00:00.000 AM | Another message type containing timestamp | 13
10/2/2019, 10:00:00.000 AM | Type C message | 0
10/2/2019, 11:00:00.000 AM | Message type A for user id | 4
10/2/2019, 11:00:00.000 AM | Another message type containing timestamp | 6
10/2/2019, 11:00:00.000 AM | Type C message | 2
This is what I've managed to create, but my knowledge of KQL is quite limited.
let regex_uid = "[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+";
traces
| where timestamp > ago(1d)
| extend message = replace(regex_uid, "", message)
| extend message = replace("[0-9]+", "", message)
| extend message = iif(message startswith "Type C message", "Type C message", message )
| project timestamp, message, operation_Name
| summarize count(operation_Name) by bin(timestamp, 1h), message
Is there any better way to do this?
Another option to consider is the reduce operator, which groups similar strings into patterns and counts them: https://learn.microsoft.com/en-us/azure/kusto/query/reduceoperator
The output won't be identical to the one in your question, but if I understand your intention correctly, it follows the same principle.
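To make the logic of the question's query explicit (strip the variable parts, then count per hour), here is a rough sketch outside KQL. It is an illustration only, not a replacement for the query: normalize, countPerHour and the sample timestamps are hypothetical, and the UUID pattern assumes the usual 8-4-4-4-12 hexadecimal layout.

```javascript
// Assumed UUID layout: 8-4-4-4-12 hexadecimal digits.
const UUID = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// Hypothetical helper: remove UUIDs, then any remaining digit runs
// (including the ":" separators of a time), and tidy the whitespace.
function normalize(message) {
  return message
    .replace(UUID, "")
    .replace(/\d+(:\d+)*/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: count normalized messages per one-hour bucket.
function countPerHour(rows) {
  const counts = new Map();
  for (const { timestamp, message } of rows) {
    const hour = new Date(timestamp);
    hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour (UTC)
    const key = `${hour.toISOString()} | ${normalize(message)}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Sample rows: the timestamps are made up for illustration.
const rows = [
  { timestamp: "2019-10-02T10:05:00Z", message: "Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0" },
  { timestamp: "2019-10-02T10:40:00Z", message: "Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c" },
  { timestamp: "2019-10-02T11:15:00Z", message: "Another message type containing timestamp 11:15:00" },
];

console.log(countPerHour(rows));
```

Like the query in the question, this only counts message types that actually occur in a bucket, so it would not produce the zero-count row shown in the desired output.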

Normalized Collection not working with reference of child of child

Angular NormalizedCollection is not working when the reference 'bookings.reservationFor.apartmentId' is passed. It worked perfectly when I created a sample Firebase entry that defined apartmentId directly as the value of 'reservationFor' and passed 'bookings.reservationFor' in the select() parameter.
I am new to AngularFire. Please let me know what is wrong with this code.
Using Firebase v2.2.9; AngularJs v1.5.6; AngularFire v1.2.0;
firebase structure
FB
|
--apartment
|   |
|   --apartment1
|   |   |
|   |   --name: "test name"
|   |   --address: "test address"
|   --apartment2
|       |
|       --name: "test name"
|       --address: "test address"
|
--booking
    |
    --"-JFZG3coHOAblHZ7XSjK"
    |   |
    |   --date: "booking date 1"
    |   --reservatonFor:
    |       |
    |       --apartmentId: "apartment1"
    |
    --"-KJKJASDIUOPIWE9WEeJ"
    |   |
    |   --date: "booking date 2"
    |   --reservatonFor:
    |       |
    |       --apartmentId: "apartment2"
    |
    --"-YtUTRGJLNL876F3SSwS"
        |
        --date: "booking date 3"
        --reservatonFor:
            |
            --apartmentId: "apartment1"
Controller
function mainCtrlFunc($scope, $firebaseArray) {
  var baseRef = new Firebase(firebaseUrl);
  var norm = new Firebase.util.NormalizedCollection(
    [baseRef.child("booking"), "bookings"],
    [baseRef.child("apartment"), "apartments", "bookings.reservationFor.apartmentId"]
  ).select(
    "bookings.date",
    "apartments.name"
  ).ref();
  $scope.bookings = $firebaseArray(norm);
}
Error:
Firebase.child failed: First argument was an invalid path: "[object Object]". Paths must be non-empty strings and can't contain ".", "#", "$", "[", or "]"

Drools - Decision tables without constraints

I need to write a rule with no constraints in a decision table.
i.e.:
rule ...
when
$p : Person()
then
$p.setCity("none");
end
I tried this:
| 1 | RuleTable example |
| 2 | CONDITION | ACTION |
| 3 | p:Person() | |
| 4 | name | p.setCity("$param"); |
| 5 | description | config person |
| 6 | | none |
But when I run the application, it throws this exception:
person cannot be resolved
Exception in thread "main" java.lang.IllegalArgumentException: No se puede parsear base de conocimiento.
It probably fails because the table has no real condition.
Try putting $param == $param as the condition.
Use a condition like the one shown in the picture. It will generate DRL such as:
rule "XYZ"
when
doc:Document()
then
doc.setX("Y");
end
