Graphite: Match digit suffix

THIS IS SPECIFIC TO GRAPHITE
I have the following series:
Foo
Bar1
Bar2
Baz1
Baz2
I'd like to only match series with a digit suffix. But when I use *[0-9], only Bar1 and Baz1 are matched.

Related

What does "strict order" of sequence groups mean?

3.5. Sequence Group section says
Elements enclosed in parentheses are treated as a single element,
whose contents are strictly ordered. Thus,
elem (foo / bar) blat
matches (elem foo blat) or (elem bar blat), and
Given that exactly one element is allowed, what exactly does the "strictly ordered" part add?
The sequence group (a b c) is just for grouping the given elements. This is used for repetitions
like
*(a b c)
The entries inside the sequence group must appear in the given order. Then the sequence group itself can appear/match based on other operators (like the * operator above).
The example shown in 3.5 Sequence Group is only there to show the difference between
elem (foo / bar) blat
and
elem foo / bar blat
namely that the () is needed when you want to match elem foo blat or elem bar blat.
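The effect of the parentheses can be sketched with a regular-expression analogy in R. This is an illustration only, not ABNF itself: the two patterns below are assumed translations of the two rules, relying on the fact that in ABNF concatenation binds more tightly than the alternative operator (/).

```r
# Regex analogy (an assumption for illustration, not ABNF syntax).
grouped   <- "^elem (foo|bar) blat$"   # elem (foo / bar) blat
ungrouped <- "^(elem foo|bar blat)$"   # elem foo / bar blat

inputs <- c("elem foo blat", "elem bar blat", "elem foo", "bar blat")

grepl(grouped, inputs)    # TRUE TRUE FALSE FALSE
grepl(ungrouped, inputs)  # FALSE FALSE TRUE TRUE
```

Without the group, the alternation splits the whole rule in two, so neither elem foo blat nor elem bar blat is accepted.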

Matching character followed by exactly 1 digit

I need to align the formatting of some clinical trial IDs to merge two databases. For example, in database A patient 123 visit 1 is stored as '123v01' and in database B just '123v1'.
I can match A to B by grep match those containing 'v0' and strip out the trailing zero to just 'v', but for academic interest & expanding R / regex skills, I want to reverse match B to A by matching only those containing 'v' followed by only 1 digit, so I can then separately pad that digit with a leading zero.
For a reprex:
string <- c("123v1", "123v01", "123v001")
I can match those with >= 2 digits following a 'v', then inverse subset
> idx <- grepl("v(\\d{2})", string)
> string[!idx]
[1] "123v1"
But there must be a way to match 'v' followed by just a single digit only? I have tried the lookarounds
# Negative look ahead "v not followed by 2+ digits"
grepl("v(?!\\d{2})", string)
# Positive look behind "single digit following v"
grepl("(?<=v)\\d{1})", string)
But both return an 'invalid regex' error
Any suggestions?
You need to set the perl=TRUE flag on your grepl function.
e.g.
grepl("v(?!\\d{2})", string, perl=TRUE)
[1] TRUE FALSE FALSE
See this question for more info.
You may use
grepl("v\\d(?!\\d)", string, perl=TRUE)
The v\d(?!\d) pattern matches v, then one digit, and then makes sure there is no digit immediately to the right of the current location (i.e. after the v + 1 digit).
See the regex demo.
Note that you need to enable PCRE regex flavor with the perl=TRUE argument.
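Since the stated goal is to pad the single digit with a leading zero, the same lookahead can be used directly in a replacement. A minimal sketch, assuming the IDs have the shape shown in the reprex (the variable name padded is only illustrative):

```r
string <- c("123v1", "123v01", "123v001")

# Capture the single digit after 'v' (only when no further digit follows)
# and re-insert it with a leading zero.
padded <- sub("v(\\d)(?!\\d)", "v0\\1", string, perl = TRUE)
padded
# [1] "123v01"  "123v01"  "123v001"
```

Strings that already have two or more digits after the v are left unchanged, because the lookahead fails for them.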

Function for assigning a value to types of digrams in a string

I want to write a function that assigns a value to a string based on the alternance of its characters that belong to different classes.
I defined 3 types of classes:
digits <- "[^0-9]"
alphabetical <- "[^a-zA-Z]"
punctuation <- "[^[:punct:]]"
I want the function to:
scan the string left to right
assign the value 1 if two consecutive characters belong to different classes, 0 otherwise, performing the sum at the end.
the more alternances the string has, the higher the value.
For example, for:
123d4ss
I want the function to assign the value '3', because the ordered characters switch first from digit to alphabetical, then from alphabetical to digit and then from digit to alphabetical again.
The following regular expression defines three types of groups: digits, letters, and punctuations.
If we count the number of occurrences of such groups, the result will be the number you want plus one.
library(stringr)
regex <- "([0-9]+)|([a-zA-Z]+)|([[:punct:]]+)"
s <- "123d4ss" # digit -> alpha -> digit -> alpha
str_count(s, regex) # gets 4
s <- ",!" # punct only
str_count(s, regex) # gets 1
s <- ",1!Aa9" # punct -> digit -> punct -> alpha -> digit
str_count(s, regex) # gets 5
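If you would rather get the number of switches directly, without the plus one, a base R alternative is to classify every character and count the positions where the class changes. This is a sketch, not the approach above: the helper names char_class and count_switches are made up for illustration, and characters outside the three classes (e.g. spaces) are assumed to form a fourth class, "other".

```r
# Classify each character as digit, alpha, punct or other (assumed extra class).
char_class <- function(ch) {
  ifelse(grepl("[0-9]", ch), "digit",
  ifelse(grepl("[a-zA-Z]", ch), "alpha",
  ifelse(grepl("[[:punct:]]", ch), "punct", "other")))
}

# Count how often two consecutive characters belong to different classes.
count_switches <- function(s) {
  chars <- strsplit(s, "")[[1]]
  if (length(chars) < 2) return(0L)
  classes <- char_class(chars)
  sum(classes[-1] != classes[-length(classes)])
}

count_switches("123d4ss")  # 3
count_switches(",!")       # 0
count_switches(",1!Aa9")   # 4
```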

Partial String Matching by Row

I'm trying to create a new column in a data frame that holds the number of characters that match between two strings, counting from the left side of both strings.
Each row has a comparison string, which we want to test against a user-given string. Given a dataframe:
df <- data.frame(x=c("yhf", "rnmqjk", "wok"), y=c("yh", "rnmj", "ok"))
x y
1 yhf yh
2 rnmqjk rnmj
3 wok ok
Where x is our comparison string and y is our given string, I'm looking to have the values "2, 3, 0" output in column z, like so:
x y z
1 yhf yh 2
2 rnmqjk rnmj 3
3 wok ok 0
Essentially, I'm looking to have the given strings (y) checked from left to right against a comparison string (x); as soon as the characters don't line up, the rest of the string should not be checked and the number of matches so far should be recorded.
Thank you in advance!
This code works for your example:
df$z <- mapply(function(x, y) which.max(x != y),
               strsplit(as.character(df$x), split=""),
               strsplit(as.character(df$y), split="")) - 1
df
x y z
1 yhf yh 2
2 rnmqjk rnmj 3
3 wok ok 0
As an outline, strsplit splits a string vector into a list of character vectors. Here, each element of each vector is a single character (because of the split="" argument). The which.max function returns the first position at which its argument reaches its maximum. Since the vectors returned by x != y are logical, which.max returns the first position where a difference is observed. mapply takes a function and lists, and applies the provided function to corresponding elements of the lists.
Note that this produces warnings that the lengths of the strings don't match. This could be addressed in a couple of ways; the easiest is wrapping the function in suppressWarnings if the messages bug you.
As the OP notes in the comments, if there are instances where the entire word matches, then which.max returns 1. To return the length of the string instead, I'd add a second line of code that combines logical subsetting with the nchar function:
df$z[as.character(df$x) == as.character(df$y)] <-
  nchar(as.character(df$x[as.character(df$x) == as.character(df$y)]))
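An alternative that avoids both the recycling warnings and the whole-word special case is to compare only the overlapping part of the two strings. This is a sketch rather than the code above; common_prefix_length is a hypothetical helper name, and when one string is a complete prefix of the other it is assumed that the length of the shorter string should be returned.

```r
# Number of leading characters shared by x and y (hypothetical helper).
common_prefix_length <- function(x, y) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  n <- min(length(a), length(b))        # only compare the overlapping part
  if (n == 0) return(0L)
  mismatch <- a[seq_len(n)] != b[seq_len(n)]
  if (any(mismatch)) which(mismatch)[1] - 1L else n
}

df <- data.frame(x = c("yhf", "rnmqjk", "wok"),
                 y = c("yh", "rnmj", "ok"),
                 stringsAsFactors = FALSE)
df$z <- mapply(common_prefix_length, df$x, df$y, USE.NAMES = FALSE)
df$z
# [1] 2 3 0
```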

How to sort naturally in realm.io?

I have a field "name" (String) with values like "FOO1000", "FOO1100", "FOO150" etc. and use realm-io for Java (Android, 0.89.1) to get them.
When I use .findAllSorted("name", Sort.ASCENDING) they're not sorted naturally. The output will be
FOO1000
FOO1100
FOO150
What I want to achieve is a natural sorting like
FOO150
FOO1000
FOO1100
Is there a way to have natural sorting?
Realm uses a standard sorting algorithm. It will sort one character at a time, and since 5 is bigger than 1 it comes last. You can't expect it to interpret parts of a string as a number.
If you had 150, 1000, 1100 as a number and sorted that you would get the sort order you expect. If you need to have a string combined with a number you should have FOO0150 instead of FOO150 and it would be sorted as you expect.
Realm does not have natural sorting by default. Padding the number with leading zeroes will simulate natural sorting.
e.g.
Foo 100
Foo 2000
Foo 40
Foo 9
Becomes (add more zeroes as you require)
Foo 0000100
Foo 0002000
Foo 0000040
Foo 0000009
Which yields a naturally sorted output
Foo 0000009
Foo 0000040
Foo 0000100
Foo 0002000
You may wish to create a new field that stores the padded value separately.
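The padding step itself is independent of Realm. A minimal sketch in R, the language used elsewhere on this page, assuming every value is a non-numeric prefix followed by digits; pad_number and the width of 7 are illustrative choices, not part of any Realm API:

```r
# Zero-pad the trailing number of each value to a fixed width (hypothetical helper).
pad_number <- function(x, width = 7) {
  prefix <- sub("[0-9]+$", "", x)    # everything before the trailing digits
  digits <- sub("^[^0-9]*", "", x)   # the trailing digits themselves
  sprintf(paste0("%s%0", width, "d"), prefix, as.integer(digits))
}

ids <- c("FOO1000", "FOO1100", "FOO150")
pad_number(ids)
# [1] "FOO0001000" "FOO0001100" "FOO0000150"

# Sorting by the padded key yields the natural order.
ids[order(pad_number(ids))]
# [1] "FOO150"  "FOO1000" "FOO1100"
```

In the Realm case, the padded string would be what you store in the separate field and sort on.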
