EBNF Definition of Identifier - bnf

The EBNF definition of an identifier is (a-zA-Z, _ ){a-zA-Z0-9, _ }. Can someone explain this definition and give me a valid identifier according to it?

The syntax of EBNF-like languages differs a lot.
Normally I would define something like this:
letter = "a" | "b" | ... | "z" | "A" | ... | "Z";
digit = "0" | "1" | "2" | ... | "9";
identifier = letter , { letter | digit | "_" } ;
Your form looks like a mixture of EBNF and regex.
It is hard to tell what this means if I don't know which language we are talking about.
But by pure guessing, I would say it describes a C-like identifier (e.g. variable name) like "myVar_0123ab".
The identifier has to start with a letter or an underscore '_', followed by any number of letters, underscores and digits.
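Read that way, the rule maps directly onto a regular expression. The following is a minimal sketch in Python, assuming the guessed reading above is the intended one; the name is_identifier is made up for illustration.

```python
import re

# Assumed reading of (a-zA-Z, _){a-zA-Z0-9, _}:
# the first character is a letter or "_", and the rest
# (zero or more characters) are letters, digits, or "_".
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["myVar_0123ab", "_tmp", "x", "9lives", "has-dash", ""]:
        print(repr(candidate), is_identifier(candidate))
```

Under this reading "myVar_0123ab", "_tmp" and "x" are accepted, while "9lives", "has-dash" and the empty string are rejected.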


Convert EBNF to BNF

I am struggling to convert this EBNF to BNF. Using the image:
I converted this to EBNF and would like to now convert this to BNF.
For the EBNF I have two alternatives:
number_constant ::= ( | "-") digit+ ("." digit+ | )
number_constant ::= "-"? digit+ ("." digit+)?
The part where I am struggling is the middle of the diagram: I have digit defined as 1-9, so I can't use digit as a keyword. I was thinking of breaking the diagram down, starting with the first part:
<min> ::= ' ' | "-"
Then for the mid part:
<dig> ::= <digit> | <digit> <dig>
Combined this would look simply like:
<number_constant> ::= <min> <dig> <last_part>
Then I am unsure of the last part.
Any help is appreciated.
Your dig solution seems correct.
The last part can be implemented with:
<last_part> ::= "." <dig> | ""
Extended BNF sure lets you have things a lot more concise.
Here's a variation based on the semantics of what goes into making up a decimal number:
<number_constant> ::= <integer>
                    | <integer> '.' <whole_number>
<integer> ::= <whole_number>
            | '-' <whole_number>
<whole_number> ::= Digit
                 | <whole_number> Digit
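To check such a grammar against sample inputs, the sign / digits / optional fraction decomposition can be turned into a small hand-written recognizer. This is only a sketch in Python: it assumes a digit is any of 0-9 and that the sign is either "-" or empty, and the function names are invented for illustration.

```python
DIGITS = "0123456789"  # assumption: a digit is any of 0-9


def parse_dig(s: str, i: int) -> int:
    """<dig> ::= <digit> | <digit> <dig>

    Return the index just past the run of digits starting at i,
    or -1 if there is not at least one digit there.
    """
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return j if j > i else -1


def is_number_constant(s: str) -> bool:
    """<number_constant> ::= <min> <dig> <last_part>"""
    i = 0
    # <min> ::= "" | "-"
    if i < len(s) and s[i] == "-":
        i += 1
    # <dig>: one or more digits are required
    i = parse_dig(s, i)
    if i == -1:
        return False
    # <last_part> ::= "." <dig> | ""
    if i < len(s) and s[i] == ".":
        i = parse_dig(s, i + 1)
        if i == -1:
            return False
    # the whole input must have been consumed
    return i == len(s)


if __name__ == "__main__":
    for text in ["42", "-12.5", "-", "1.", ".5", "1.2.3"]:
        print(repr(text), is_number_constant(text))
```

Each grammar rule becomes one step in the function, which makes it easy to see that "1." and ".5" are rejected because both sides of the decimal point need at least one digit.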

How to replace text sequences ending in a fixed pattern within a long text string in R?

I have a column within a data frame containing long text sequences (often in the thousands of characters) of the format:
abab(VR) | ddee(NR) | def(NR) | fff(VR) | oqq | pqq | ppf(VR)
i.e. a string, a suffix in brackets, then a delimiter
I'm trying to work out the syntax in R to delete the items that end in (VR), including the trailing pipe if present, so that I'm left with:
ddee(NR) | def(NR) | oqq | pqq
I cannot work out the regular expression (or gsub) that will remove these entries and would like to request if anyone could help me please.
If you want to use gsub, you can remove the pattern in two stages:
gsub(" \\| $", "", gsub("\\w+\\(VR\\)( \\| )?", "", s))
# firstly remove all words ending with (VR) and optional | following the pattern and
# then remove the possible | at the end of the string
# [1] "ddee(NR) | def(NR) | oqq | pqq"
regular expression \\w+\\(VR\\) will match words ending with (VR), parentheses are escaped by \\;
( \\| )? matches optional delimiter |, this makes sure it will match the pattern both in the middle and at the end of the string;
a possible | left over at the end of the string is removed by the second (outer) gsub.
Here is a method using strsplit and paste with the collapse argument:
paste(sapply(strsplit(temp, split=" +\\| +"),
function(i) { i[setdiff(seq_along(i), grep("\\(VR\\)$", i))] }),
collapse=" | ")
[1] "ddee(NR) | def(NR) | oqq | pqq"
We split on the pipe and spaces, then feed the resulting list to sapply which uses the grep function to drop any elements of the vector that end with "(VR)". Finally, the result is pasted together.
I added a subsetting method with setdiff so that vectors without any "(VR)" will return without any modification.
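For comparison, the same split, filter and join approach can be sketched in Python. This is not part of the R answers; the function name drop_vr is made up, and the sketch assumes items are separated by a pipe with optional surrounding whitespace.

```python
import re


def drop_vr(text: str) -> str:
    """Remove every item ending in "(VR)" from a pipe-delimited string."""
    # Split on the pipe, swallowing any whitespace around it.
    items = re.split(r"\s*\|\s*", text.strip())
    # Keep only the items that do not end with the "(VR)" suffix.
    kept = [item for item in items if not item.endswith("(VR)")]
    # Re-join with the original " | " delimiter.
    return " | ".join(kept)


if __name__ == "__main__":
    s = "abab(VR) | ddee(NR) | def(NR) | fff(VR) | oqq | pqq | ppf(VR)"
    print(drop_vr(s))
```

Because the delimiter is only re-inserted between surviving items, no trailing pipe is left behind, and a string without any "(VR)" entries comes back unchanged.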

neo4j graphity how to implement

I have been trying for quite some time to implement Graphity on Neo4j,
but I can't find a way to build the queries. Does anyone have any leads?
For example, in the Neo4j documentation for Graphity there is a query that gets only the first element of the chain. How do I get the second one?
Also, why is there an ORDER BY in the query? Isn't the algorithm supposed to eliminate that?
Here is the query:
MATCH p=(me { name: 'Jane' })-[:jane_knows*]->(friend),(friend)-[:has]->(status)
RETURN me.name, friend.name, status.name, length(p)
ORDER BY length(p)
[UPDATED]
That is a variable-length query (notice the * in the relationship pattern), and it gets all the elements in the chain in N result rows (where N is the length of the chain). Each result row's path will contain the previous row's path (if there was a previous row) plus the next element in the chain. And, because every row has a different path length, ordering by the path length makes sense.
If you want to see the names (in order) of all the statuses for each friend, this query should do that:
MATCH p=(me { name: 'Jane' })-[:jane_knows*]->(friend)
WITH me, friend, LENGTH(p) AS len
MATCH (friend)-[:has|next*]->(status)
RETURN me.name, friend.name, COLLECT(status.name), len
ORDER BY len;
With the same data as in the linked example, the result is:
+-----------------------------------------------------+
| me.name | friend.name | COLLECT(status.name) | len |
+-----------------------------------------------------+
| "Jane" | "Bill" | ["Bill_s1","Bill_s2"] | 1 |
| "Jane" | "Joe" | ["Joe_s1","Joe_s2"] | 2 |
| "Jane" | "Bob" | ["Bob_s1"] | 3 |
+-----------------------------------------------------+

Google Chrome: How to find out the name of a Transition from its id in the History SQLite db

I am reading Google Chrome History from its SQLite DB.
Table Name: Visits
Structure:
+-----+------------------+-----------+---------+------------+-----+
| cid | name             | type      | notnull | dflt_value | pk  |
+-----+------------------+-----------+---------+------------+-----+
| "0" | "id"             | "INTEGER" | "0"     | "NULL"     | "1" |
| "1" | "url"            | "INTEGER" | "1"     | "NULL"     | "0" |
| "2" | "visit_time"     | "INTEGER" | "1"     | "NULL"     | "0" |
| "3" | "from_visit"     | "INTEGER" | "0"     | "NULL"     | "0" |
| "4" | "transition"     | "INTEGER" | "1"     | "0"        | "0" |
| "5" | "segment_id"     | "INTEGER" | "0"     | "NULL"     | "0" |
| "6" | "visit_duration" | "INTEGER" | "1"     | "0"        | "0" |
+-----+------------------+-----------+---------+------------+-----+
I was trying to find out what transition means, and I found the link : Page Transitions. According to it, Google Chrome stores a transition value which identifies the type of transition between pages. These are stored in the history database to separate visits, and are reported by the renderer for page navigations.
There are many types of transitions, like LINK, TYPED, etc.
In the SQLite table, Google Chrome stores them as integer values.
Problem
How do I figure out the transition type from the integer value?
There are some more tables in the DB, but none of them contains a mapping that explains the meaning of these values.
Other tables are:
Probably a little late, but I'll just leave this here for someone else.
Here is the relevant code from Chromium source -
https://github.com/adobe/chromium/blob/cfe5bf0b51b1f6b9fe239c2a3c2f2364da9967d7/content/public/common/page_transition_types.cc
The basic idea is to take the integer value from the database and apply a bitwise AND with the mask 0xff. No conversion to hex is needed; the hex form is only easier to read.
The result is the core transition type as an integer.
Run it through a switch statement to get the string value back.
For example, in JavaScript you can do the following:
>> 822083585 & 0xff
1
>> 1610612736 & 0xff
0
based on #jayarma S answer and on https://github.com/adobe/chromium/blob/cfe5bf0b51b1f6b9fe239c2a3c2f2364da9967d7/content/public/common/page_transition_types.h
You can map the Transition Types as follows:
LINK = 0
TYPED = 1
AUTO_BOOKMARK = 2
AUTO_SUBFRAME = 3
MANUAL_SUBFRAME = 4
GENERATED = 5
START_PAGE = 6
FORM_SUBMIT = 7
RELOAD = 8
KEYWORD = 9
KEYWORD_GENERATED = 10
You can get these core transition type values by applying the core mask: 0xFF
There are also qualifiers that further describe the transition. They are bit flags, so several can be set on the same value:
FORWARD_BACK = 0x01000000
FROM_ADDRESS_BAR = 0x02000000
HOME_PAGE = 0x04000000
CHAIN_START = 0x10000000
CHAIN_END = 0x20000000
CLIENT_REDIRECT = 0x40000000
SERVER_REDIRECT = 0x80000000
IS_REDIRECT_MASK = 0xC0000000
You can get these qualifier transition type values by applying the qualifier mask: 0xFFFFFF00
Here is an SQLite query to get the transition types:
select u1.title as to_url_title,
u1.url as to_url,
CASE vs.transition & 0xff
WHEN 0
THEN 'LINK'
WHEN 1
THEN 'TYPED'
WHEN 2
THEN 'AUTO_BOOKMARK'
WHEN 3
THEN 'AUTO_SUBFRAME'
WHEN 4
THEN 'MANUAL_SUBFRAME'
WHEN 5
THEN 'GENERATED'
WHEN 6
THEN 'START_PAGE'
WHEN 7
THEN 'FORM_SUBMIT'
WHEN 8
THEN 'RELOAD'
WHEN 9
THEN 'KEYWORD'
WHEN 10
THEN 'KEYWORD_GENERATED'
ELSE NULL
END core_transition_type,
CASE vs.transition & 0xFFFFFF00
WHEN 0x01000000
THEN 'FORWARD_BACK'
WHEN 0x02000000
THEN 'FROM_ADDRESS_BAR'
WHEN 0x04000000
THEN 'HOME_PAGE'
WHEN 0x10000000
THEN 'CHAIN_START'
WHEN 0x20000000
THEN 'CHAIN_END'
WHEN 0x40000000
THEN 'CLIENT_REDIRECT'
WHEN 0x80000000
THEN 'SERVER_REDIRECT'
WHEN 0xC0000000
THEN 'IS_REDIRECT_MASK'
ELSE NULL
END qualifier_transition_type
from visits as vs
join urls u1 on u1.id = vs.url
order by vs.visit_time DESC;
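Note that the second CASE in the query compares the whole masked value against one constant, so it yields NULL whenever more than one qualifier bit is set. The sketch below, in Python, tests each flag separately instead. It reuses the values from the lists above; the function name decode_transition and the handling of negative numbers are assumptions made for illustration.

```python
CORE_MASK = 0xFF

CORE_TYPES = {
    0: "LINK", 1: "TYPED", 2: "AUTO_BOOKMARK", 3: "AUTO_SUBFRAME",
    4: "MANUAL_SUBFRAME", 5: "GENERATED", 6: "START_PAGE",
    7: "FORM_SUBMIT", 8: "RELOAD", 9: "KEYWORD", 10: "KEYWORD_GENERATED",
}

# Individual qualifier bits; IS_REDIRECT_MASK is left out because it is a
# combination of CLIENT_REDIRECT and SERVER_REDIRECT, not a flag of its own.
QUALIFIERS = {
    0x01000000: "FORWARD_BACK",
    0x02000000: "FROM_ADDRESS_BAR",
    0x04000000: "HOME_PAGE",
    0x10000000: "CHAIN_START",
    0x20000000: "CHAIN_END",
    0x40000000: "CLIENT_REDIRECT",
    0x80000000: "SERVER_REDIRECT",
}


def decode_transition(value: int):
    """Return (core_type_name, [qualifier_names]) for a transition value."""
    # Assumption: if the value was stored as a signed 32-bit integer it may be
    # negative; masking brings it back to the unsigned bit pattern.
    value &= 0xFFFFFFFF
    core = CORE_TYPES.get(value & CORE_MASK, "UNKNOWN")
    flags = [name for bit, name in QUALIFIERS.items() if value & bit]
    return core, flags


if __name__ == "__main__":
    for raw in (822083585, 1610612736):
        print(raw, hex(raw), decode_transition(raw))
```

For 822083585 (0x31000001) this gives TYPED with FORWARD_BACK, CHAIN_START and CHAIN_END, and for 1610612736 (0x60000000) it gives LINK with CHAIN_END and CLIENT_REDIRECT.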

How to strsplit using '|' character, it behaves unexpectedly?

I would like to split a character string at the pattern "|"
but
unlist(strsplit("I am | very smart", " | "))
[1] "I" "am" "|" "very" "smart"
or
gsub(pattern="|", replacement="*", x="I am | very smart")
[1] "*I* *a*m* *|* *v*e*r*y* *s*m*a*r*t*"
The problem is that by default strsplit interprets " | " as a regular expression, in which | has special meaning (as "or").
Use the fixed argument:
unlist(strsplit("I am | very smart", " | ", fixed=TRUE))
# [1] "I am" "very smart"
Side effect is faster computation.
stringr alternative:
unlist(stringr::str_split("I am | very smart", stringr::fixed(" | ")))
| is a metacharacter. You need to escape it (using \\ before it).
> unlist(strsplit("I am | very smart", " \\| "))
[1] "I am" "very smart"
> sub(pattern="\\|", replacement="*", x="I am | very smart")
[1] "I am * very smart"
Edit: The reason you need two backslashes is that the single backslash prefix is reserved for special symbols such as \n (newline) and \t (tab). For more information look in the help page ?regex. The other metacharacters are . \ | ( ) [ { ^ $ * + ?
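The same behaviour shows up in any regex engine, not only in R. As a small illustration in Python (the helper names are made up), splitting on an unescaped " | " treats the pipe as alternation between two single spaces, while an escaped pipe or a plain literal split gives the expected result.

```python
import re

TEXT = "I am | very smart"


def split_regex_unescaped(s: str) -> list:
    # " | " as a regex means: a single space OR a single space,
    # so every space in the string becomes a split point.
    return re.split(r" | ", s)


def split_regex_escaped(s: str) -> list:
    # Escaping the pipe makes it match a literal "|" character.
    return re.split(r" \| ", s)


def split_literal(s: str) -> list:
    # str.split never interprets its argument as a regex,
    # comparable to fixed=TRUE in strsplit.
    return s.split(" | ")


if __name__ == "__main__":
    print(split_regex_unescaped(TEXT))
    print(split_regex_escaped(TEXT))
    print(split_literal(TEXT))
```

The first call reproduces the surprising five-element result from the question; the other two return the two expected pieces.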
If you are parsing a table, then calling read.table might be a better option. Tiny example:
> txt <- textConnection("I am | very smart")
> read.table(txt, sep='|')
V1 V2
1 I am very smart
So I would suggest fetching the wiki page with RCurl, grabbing the interesting part of the page with XML (which also has a really neat function to parse HTML tables) and, if HTML format is not available, calling read.table with a specified sep. Good luck!
Pipe '|' is a metacharacter, used as an 'OR' operator in regular expressions.
Try the following (the backslashes must be doubled inside an R string):
unlist(strsplit("I am | very smart", "\\s+\\|\\s+"))
