Find pattern containing string not preceded by other string - r

I would like to use the grepl function in R to find out whether a string contains something, but only on the condition that it is not preceded by something else.
So, for example, say I wanted to find a string which includes the pattern 'xx', as long as it is not preceded by 'yy'. So:
'123xx45' would return TRUE
'123yy4xx5' would also return TRUE as the 'yy' is not immediately preceding 'xx'
However '123yyxx45' would return FALSE.
Please let me know if anything is unclear or you would like a better example.

How about grepl('(?<!yy)xx', c('123yy4xx5','123xx45','123yyxx45'), perl=TRUE)?

your.data <- c('123yy4xx5','123xx45','123yyxx45')
grepl("xx",your.data) & !grepl("yyxx",your.data)
[1] TRUE TRUE FALSE
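
For comparison, here is a minimal Python sketch of the two answers above. Python is used only for illustration; the helper names and the fourth sample string are assumptions and do not come from the question. Python's re module accepts the same (?<!yy) negative lookbehind that perl=TRUE enables in R. The two approaches agree on the question's examples, but they differ once a string contains both 'yyxx' and a separate 'xx'.

```python
import re


def xx_not_after_yy(s):
    """True if some 'xx' in s is not immediately preceded by 'yy'."""
    return re.search(r"(?<!yy)xx", s) is not None


def xx_without_yyxx(s):
    """True if s contains 'xx' and contains no 'yyxx' anywhere."""
    return "xx" in s and "yyxx" not in s


# The first three strings are from the question; the fourth is an assumed edge case.
samples = ["123yy4xx5", "123xx45", "123yyxx45", "yyxx1xx"]

lookbehind_results = [xx_not_after_yy(s) for s in samples]
two_test_results = [xx_without_yyxx(s) for s in samples]

print(lookbehind_results)  # [True, True, False, True]
print(two_test_results)    # [True, True, False, False]
```

Which behaviour is right depends on whether a single 'yyxx' should disqualify the whole string or only that one occurrence.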

Related

R: Searching for a certain, delimited string

I'm looking for a way in R to search for a certain, delimited string.
In my example I need to receive TRUE if a cell contains "HDT2", but not if it contains "HDT21" or "HDT24" and so on, even though those strings contain HDT2 as well.
So right now I am using
grepl("HDT2",data.label[d,2])
in a for-loop to check each row of the second column of data.label for "HDT2". The problem is that this also returns TRUE if there is more than just "HDT2". For example, it also returns TRUE for "HDT21" or "HDT24", which is not what I want.
Is there a way to only check for a certain, delimited string?
Thanks!
EDIT: The strings I have to check are longer than just "HDT2". The string is for example "HDT2 (Arm 1: reference)".
You can use the following regular expression in grepl(). This will return true for an exact match of "HDT2", with nothing coming before or after it.
grepl("^HDT2$",data.label[d,2])
Usage:
> grepl("^HDT2$", "HDT2")
[1] TRUE
> grepl("^HDT2$", "AHDT2")
[1] FALSE
> grepl("^HDT2$", "HDT2 (Arm 1: reference)")
[1] FALSE
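
The anchored pattern above rejects the longer label mentioned in the EDIT. One possible alternative, sketched here in Python purely for illustration, is to use word boundaries (\b) instead of ^ and $. The function name and the sample labels other than those in the question are assumptions. In R the equivalent pattern should be "\\bHDT2\\b", though that has not been tested against the asker's data.

```python
import re

# \b matches between a word character (letter, digit, underscore)
# and a non-word character or the start/end of the string.
HDT2_LABEL = re.compile(r"\bHDT2\b")


def has_hdt2(label):
    """True if 'HDT2' appears as a whole token inside label."""
    return HDT2_LABEL.search(label) is not None


labels = [
    "HDT2",
    "HDT2 (Arm 1: reference)",
    "HDT21 (Arm 1: reference)",  # assumed variant of the label in the EDIT
    "HDT24",
    "AHDT2",
]
matches = [has_hdt2(label) for label in labels]
print(matches)  # [True, True, False, False, False]
```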

(R) IF statement: stop & warn if FALSE else continue

I'm making a function and before it does any of the hard stuff I need it to check that all the column names listed in the 'samples' dataset are also present in the 'grids' dataset (the function maps one onto the other).
all(names(samples[expvar]) %in% names(grids))
This does that: the code within all() asks whether all the names in the list ('expvar') of columns in 'samples' are also names in 'grids'. For a correct expvar of length 3, the %in% output would be TRUE TRUE TRUE. 'all' asks if all are TRUE, so the output here is TRUE. I want to make an IF statement along the lines of:
if(all(names(samples[expvar]) %in% names(grids)) = FALSE) {stop("Not all expvar column names found as column names in grids")}
No else needed, it'll just carry on. The problem is that the '= FALSE' is redundant because all() is a logically evaluable statement... is there a "carry on" function, e.g.
if(all(etc)) CARRYON else {stop("warning")}
Or, can anyone think of a way I can restructure this to make it work?
You're looking for the function stopifnot.
However you don't need to implement it as
if (okay) {
  # do stuff
} else {
  stop()
}
which is what you have. Instead you can do
if (!okay) {
  stop()
}
# do stuff
since the lines will execute in sequential order. But, again, it might be more readable to use stopifnot, as in:
stopifnot(okay)
# do stuff
I would code it:
if(!all(...))
  stop(...)
... rest of program ...
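
The same "check first, then carry on" shape can be sketched in Python, used here only for illustration. The function names, parameter names and return value are hypothetical; only the idea of failing early when an expvar name is missing from the grids columns comes from the question.

```python
def require_columns(expvar, grid_columns):
    """Raise ValueError unless every name in expvar is also in grid_columns."""
    missing = [name for name in expvar if name not in grid_columns]
    if missing:
        raise ValueError(
            "expvar columns not found in grids: " + ", ".join(missing)
        )


def map_samples(expvar, grid_columns):
    # Guard clause: execution stops here if the check fails.
    require_columns(expvar, grid_columns)
    # No else branch is needed; the hard work would simply follow.
    return "mapped " + str(len(expvar)) + " columns"


print(map_samples(["depth", "temp"], ["depth", "temp", "salinity"]))
```

Listing the missing names in the error message is a design choice that makes the failure easier to diagnose than a bare stop.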

Allow only alphanumeric character

I want to allow only alphanumeric passwords. I have written the following code to check this,
but it is not working:
Regex.IsMatch(txtpassword.Text, "^[a-zA-Z0-9_]*$") never returns false,
even if I type the password test (which does not contain any number).
ElseIf Regex.IsMatch(txtpassword.Text, "^[a-zA-Z0-9_]*$") = False Then
    div_msg.Attributes.Add("class", "err-msg")
    lblmsg.Text = "password is incorrect"
I have tried this also
Dim r As New Regex("^[a-zA-Z0-9]+$")
Dim bool As Boolean
bool = r.IsMatch(txtpassword.Text)
and for txtpassword.Text = '4444', bool comes out true. I don't know what is wrong.
First of all, the '_' is not a valid alpha-numeric character.
See http://en.wikipedia.org/wiki/Alphanumeric
And, second, take another look at your regular expression.
[a-zA-Z0-9_]*
This matches zero or more characters, each of which is a letter, a digit, or '_'.
Used without the ^ and $ anchors, it can match an empty string anywhere in the input, so a password such as '&#&#^$' would return TRUE.
You probably want to test for one or more characters that are NOT alphanumeric. If that test returns TRUE, then throw the error.
Hope this helps.
Try the following Expression:
([^a-zA-Z0-9]+)
This will match if your Password contains any character that is not alphanumeric.
If you get a match, do your error handling.
So based on the Regex that you have in the question, it appears you want a password with at least one lower-case letter, one upper-case letter, one number, and an _; so here is a Regex that will do that:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*_).{4,8}$
The {4,8} indicates the length of the password; you can set that accordingly.
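
To make the behaviour discussed in these answers concrete, here is a small Python sketch (Python is used only for illustration, and all names are hypothetical). It assumes the asker may actually want "letters and digits only, with at least one of each", which would explain why 'test' and '4444' were expected to fail. That reading is a guess, not something the question states.

```python
import re

# Letters and digits only, at least one character.
ONLY_ALNUM = re.compile(r"[A-Za-z0-9]+")
# Letters and digits only, with at least one digit and at least one letter.
LETTER_AND_DIGIT = re.compile(r"(?=.*[0-9])(?=.*[A-Za-z])[A-Za-z0-9]+")
# The character class from the question without ^ and $.
UNANCHORED = re.compile(r"[a-zA-Z0-9_]*")


def is_alphanumeric(password):
    # fullmatch requires the whole string to match, like ^...$ in .NET.
    return ONLY_ALNUM.fullmatch(password) is not None


def has_letter_and_digit(password):
    return LETTER_AND_DIGIT.fullmatch(password) is not None


print(is_alphanumeric("test"))        # True: only letters, which is allowed
print(is_alphanumeric("4444"))        # True: only digits, which is allowed
print(is_alphanumeric("&#&#^$"))      # False
print(has_letter_and_digit("test"))   # False: no digit
print(has_letter_and_digit("test1"))  # True
# An unanchored * pattern matches the empty string, so it always "matches".
print(UNANCHORED.search("&#&#^$") is not None)  # True
```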

What is wrong with this code? Why does the List not identify the value?

what is wrong with this code?
bool claimExists;
string currentClaimControlNo = "700209308399870";
List<string> claimControlNo = new List<string>();
claimControlNo.Add("700209308399870");
if (claimControlNo.Contains(currentClaimControlNo.Substring(0, 14)))
    claimExists = true;
else
    claimExists = false;
Why does the check on claimControlNo above come out false?
Since I know the value exists, how can I fix the code?
It's reporting false because you aren't asking whether the list contains the currentClaimControlNo, you're asking whether it contains a string that is the first fourteen characters of the fifteen-character string currentClaimControlNo.
Try this instead:
claimExists = claimControlNo.Any(ccn => ccn.StartsWith(currentClaimControlNo.Substring(0,14)));
Your count is wrong. There are 15 characters. Your substring is cutting off the last 0 which fails the condition.
Because you're shaving off the last digit in your substring.
if you change the line
if (claimControlNo.Contains(currentClaimControlNo.Substring(0, 14)))
to
if (claimControlNo.Contains(currentClaimControlNo.Substring(0, 15)))
it works.
Because contains on a list looks for the whole item, not a substring:
currentClaimControlNo.Substring(0, 14)
"70020930839987"
Is not the same as
700209308399870
You're missing a digit, hence why your list search is failing.
I think you are trying to find something in the list that contains that substring. Don't use the list's Contains method. If you are trying to find an item in the list that contains the substring, do this:
claimExists = claimControlNo.Any(item => item.Contains(currentClaimControlNo.Substring(0, 14)));
This goes through each item in claimControlNo and each item can then check if it contains the substring.
Why do it this way? The Contains method on a string
Returns a value indicating whether the specified System.String object occurs within this string.
Which is what you want.
Contains on a list, however
Determines whether an element is in the System.Collections.Generic.List.
They aren't the same, hence your confusion
Do you really need this explaining?
You are calling Substring for 14 characters when the string is of length 15. Then you are checking if your list (which only has one item of length 15) contains an item of length 14. It doesn't even need to check the value; the length is enough to determine it is not a match.
The solution of course is to not do the Substring, as it makes no sense.
Which would look like this:
if (claimControlNo.Contains(currentClaimControlNo))
    claimExists = true;
else
    claimExists = false;
Then again, perhaps you know you are trimming the search, and are in fact looking for anything that has a partial match within the list?
If this is the case, then you can simply loop the list and do a Contains on each item. Something like this:
bool claimExists = false;
string searchString = currentClaimControlNo.Substring(0, 14);
foreach (var s in claimControlNo)
{
    if (s.Contains(searchString))
    {
        claimExists = true;
        break;
    }
}
Or use a slightly more complex (certainly more complex than I can remember off the top of my head) LINQ query. Quick guess (it's probably right to be fair, I am pretty freaking awesome):
bool claimExists = claimControlNo.Any(x => x.Contains(searchString));
Check it:
// str will be equal to 70020930839987
var str = currentClaimControlNo.Substring(0, 14);
List<string> claimControlNo = new List<string>();
claimControlNo.Add("700209308399870");
The value str isn't contained in the list.
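
The difference between "the list contains this exact item" and "some item in the list contains this substring" can be shown in a few lines. This is a Python sketch for illustration only, with hypothetical variable names; the claim number is the one from the question.

```python
claim_control_nos = ["700209308399870"]
current = "700209308399870"

# First 14 of the 15 characters, like Substring(0, 14) in the question.
prefix = current[:14]
print(prefix)  # 70020930839987

# List membership compares whole items, so the 14-character prefix is not found.
whole_item_match = prefix in claim_control_nos
# The full 15-character value is an item of the list.
full_value_match = current in claim_control_nos
# Ask each item whether it contains (or starts with) the prefix.
partial_match = any(prefix in item for item in claim_control_nos)
prefix_match = any(item.startswith(prefix) for item in claim_control_nos)

print(whole_item_match, full_value_match, partial_match, prefix_match)
# False True True True
```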

Find word (not containing substrings) in comma separated string

I'm using a LINQ query where I do something like this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
                              Select New SubViewModel With {
                                  .SOMEVALUE1 = a.SOMEVALUE1,
                                  ...
                                  ...
                                  .SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
                              }).ToList()
Now my problem is that this doesn't search for words but for substrings, so for example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns true because "EW" is contained in "EWM".
What I need is to find whole words (not substrings) in the comma-separated string!
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, @"\b" + Regex.Escape(a.SOMEVALUE1) + @"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to import the System.Text.RegularExpressions namespace at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
what about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(","c).Contains(a.SOMEVALUE1), True, False)
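
The options above can be compared side by side in a short Python sketch (illustration only; the function names and the "EW-M,KI" string are assumptions). Splitting on commas compares whole entries, while the \b approach treats any non-word character as a separator, so the two can disagree when entries contain characters such as '-'.

```python
import re


def contains_entry(csv, word):
    """True if word equals one of the comma-separated entries."""
    return word in csv.split(",")


def contains_word_regex(csv, word):
    """True if word occurs between word boundaries anywhere in csv."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, csv) is not None


csv = "EWM,KI,KP"

print("EW" in csv)                     # True: plain substring test, the original problem
print(contains_entry(csv, "EW"))       # False
print(contains_entry(csv, "KI"))       # True
print(contains_word_regex(csv, "EW"))  # False
print(contains_word_regex(csv, "KI"))  # True

# Assumed edge case: '-' also counts as a word boundary for the regex.
print(contains_entry("EW-M,KI", "EW"))       # False
print(contains_word_regex("EW-M,KI", "EW"))  # True
```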
