order by accented characters in oracle - oracle11g

I have a situation like below:
select p.last_name from person p where user_id = 'mprakash' and contains(last_name , 'fré%') > 0
Result:
FreNormal
Frederic
Frédéric
Frêdéric
result comes as above......accented characters always comes last even if i am giving the search condition with using accented characters 'fré%'
Is there something through which we can get result below:
Frédéric
Frêdéric
FreNormal
Frederic
or if I give search as contains(last_name , 'frê%') > 0
then result comes as below :
Frêdéric
Frédéric
FreNormal
Frederic
and if i search as contains(last_name , 'fre%') > 0 then result would be
FreNormal
Frederic
Frêdéric
Frédéric

Related

Regex in if else statement in R

I have a rather simple question. I am trying to get the if else statement below to work.
It is supposed to assign '1' if the if statement is met, 0 otherwise.
My problem is that I cannot get the regex in the if statement to work ('\w*|\W*). It is supposed to specify the condition that the string either is "Registration Required" or Registration required followed by any character. I cannot specify the exact cases, because following the "Registration required" (in the cases where something follows), it will usually be a date (varying for each observation) and a few words.
Registration_cleaned <- c()
for (i in 1:length(Registration)) {
if (Registration[i] == ' Registration Required\\w*|\\W*') {
Meta_Registration_cleaned <- 1
} else {
Meta_Registration_cleaned <- 0
}
Registration_cleaned <- c(Registration_cleaned, Meta_Registration_cleaned)
}
You may use transform together with ifelse function to set the Meta_Registration_cleaned.
For matching the regular expression grep function can be used with pattern "Registration Required\w*".
Registration <- data.frame(reg = c("Registration Required", "Registration Required ddfdqf","some str", "Regixxstration Required ddfdqf"),stringsAsFactors = F)
transform(Registration,Meta_Registration_cleaned = ifelse(grepl("Registration Required\\w*",Registration[,"reg"]), 1, 0))
Gives result:
reg Meta_Registration_cleaned
1 Registration Required 1
2 Registration Required ddfdqf 1
3 some str 0
4 Regixxstration Required ddfdqf 0
I might have misunderstood the OP completely, because I have understood the question entirely differently than anyone else here.
My comment earlier suggested looking for the regex at the end of the string.
Registration <- data.frame(reg = c("Registration Required", "Registration Required ddfdqf","Registration Required 10/12/2000"),stringsAsFactors = F)
#thanks #user1653941 for drafting the sample vector
Registration$Meta_Registration_cleaned <- grepl('Registration required$', Registration$reg, ignore.case = TRUE)
Registration
1 Registration Required TRUE
2 Registration Required ddfdqf FALSE
3 Registration Required 10/12/2000 FALSE
I understand the OP as such that the condition is: Either the string "Registration required" without following characters, or... anything else. Looking forward to the OPs comment.

R regex match things other than known characters

For a text field, I would like to expose those that contain invalid characters. The list of invalid characters is unknown; I only know the list of accepted ones.
For example for French language, the accepted list is
A-z, 1-9, [punc::], space, àéèçè, hyphen, etc.
The list of invalid charactersis unknown, yet I want anything unusual to resurface, for example, I would want
This is an 2-piece à-la-carte dessert to pass when
'Ã this Øs an apple' pumps up as an anomalie
The 'not contain' notion in R does not behave as I would like, for example
grep("[^(abc)]",c("abcdef", "defabc", "apple") )
(those that does not contain 'abc') match all three while
grep("(abc)",c("abcdef", "defabc", "apple") )
behaves correctly and match only the first two. Am I missing something
How can we do that in R ? Also, how can we put hypen together in the list of accepted characters ?
[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+
The above regex matches any of the following (one or more times). Note that the parameter ignore.case=T used in the code below allows the following to also match uppercase variants of the letters.
a-z Any lowercase ASCII letter
1-9 Any digit in the range from 1 to 9 (excludes 0)
[:punct:] Any punctuation character
The space character
àâæçéèêëîïôœùûüÿ Any valid French character with a diacritic mark
- The hyphen character
See code in use here
x <- c("This is an 2-piece à-la-carte dessert", "Ã this Øs an apple")
gsub("[a-z1-9[:punct:] àâæçéèêëîïôœùûüÿ-]+", "", x, ignore.case=T)
The code above replaces all valid characters with nothing. The result is all invalid characters that exist in the string. The following is the output:
[1] "" "ÃØ"
If by "expose the invalid characters" you mean delete the "accepted" ones, then a regex character class should be helpful. From the ?regex help page we can see that a hyphen is already part of the punctuation character vector;
[:punct:]
Punctuation characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? # [ \ ] ^ _ ` { | } ~
So the code could be:
x <- 'Ã this Øs an apple'
gsub("[A-z1-9[:punct:] àéèçè]+", "", x)
#[1] "ÃØ"
Note that regex has a predefined, locale-specific "[:alpha:]" named character class that would probably be both safer and more compact than the expression "[A-zàéèçè]" especially since the post from ctwheels suggests that you missed a few. The ?regex page indicates that "[0-9A-Za-z]" might be both locale- and encoding-specific.
If by "expose" you instead meant "identify the postion within the string" then you could use the negation operator "^" within the character class formalism and apply gregexpr:
gregexpr("[^A-z1-9[:punct:] àéèçè]+", x)
[[1]]
[1] 1 8
attr(,"match.length")
[1] 1 1

R sprintf in sqldf's like

I would like to do a looping query in R using sqldf to that select all non-NULL X.1 variable with date "11/12/2015" and at 9AM. Example :
StartDate X.1
11/12/2015 09:14 A
11/12/2015 09:36
11/12/2015 09:54 A
The date is in variable that generated from other query
nullob<-0
dayminnull<-as.numeric(sqldf("SELECT substr(Min(StartDate),1,03)as hari from testes")) # this produce "11/12/2015"
for (i in 1 : 12){
dday<-mdy(dayminnull)+days(i) #go to next day
sqlsql <- sprintf("SELECT count([X.1]) FROM testes where StartDate like '% \%s 09: %'", dday)
x[i]<-sqldf(sqlsql)
nullob<-nullob+x[i]
}
And it comes with error : Error in sprintf("SELECT count([X.1]) FROM testes WHERE StartDate like '%%s 09%'", :
unrecognised format specification '%'
Please hellp. thank you in advance
It's not super clear in the documentation, but a % followed by a %, that is %%, is the way to tell sprintf to use a literal %. We can test this fairly easily:
sprintf("%% %s %%", "hi")
[1] "% hi %"
For your query string, this should work:
sprintf("SELECT count([X.1]) FROM testes where StartDate like '%% %s 09: %%'", dday)
From ?sprintf:
The string fmt contains normal characters, which are passed through to
the output string, and also conversion specifications which operate on
the arguments provided through .... The allowed conversion
specifications start with a % and end with one of the letters in the
set aAdifeEgGosxX%. These letters denote the following types:
... [Documentation on aAdifeEgGosxX]
%: Literal % (none of the extra formatting characters given below are permitted in this case).

symbols being printed when english alphabet wanted

I am writing this code and have recently come across an error. I have no idea why this is happening. In theory, the english alphabet should be being printed. However, instead of the english alphabet, symbols are being printed instead.
I can not paste the symbols for some reason, but if you ran the code yourself, you'll understand what I mean.
My full code is posted below.
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFHIJKLMNOPQRSTUVWXYZ0123456789"
choice = input("Would you like to encrypt or decrypt? [e/d]: ")
if choice == "e":
message = input("Please insert the message you would like to use: ")
keyword = input("Please insert the keyword you would like to use: ")
ik = len(keyword)
i = 0
string = ''
for A in message:
message1 = (ord(A)) - 96
key1 = (ord(keyword[i])) - 96
addition = message1 + key1
string += (chr(addition))
if i >= ik:
i = 0
else:
i += 1
print (string)
You need to add back the 96 you originally took away :) Alternatively, use the Caesar cipher formula as adding back 96 will still result in symbols appearing (I did the ocr coursework already)
addition = message1 + key1 + 96
your code will not work if the keyword is shorter than the message, so use the modulo operator (%) on i with the length of the keyword inside the line:
key1 = (ord(keyword[i])) - 96

Regular Expression To exclude sub-string name(job corps) Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"

Regular Expression To exclude sub-string name(job corps)
Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"
I have written something like below :
^((?!job corps).)(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!#$%^&*]).*$
I tested with the above regular expression, not working for special character.
can anyone guide on this..
If I understand well your requirements, you can use this pattern:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[^#]*$
If you only want to allow characters from [a-zA-Z0-9^#$%&*] changes the pattern to:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[a-zA-Z0-9^#$%&*]*$
details:
^ # start of the string
(?! # not followed by any of these cases
[^a-z]*$ # non lowercase letters until the end
|
[^A-Z]*$ # non uppercase letters until the end
|
[^0-9]*$
|
[^!#$%^&*]*$
|
.*?job corps # any characters and "job corps"
)
[^#]* # characters that are not a #
$ # end of the string
demo
Note: you can write the range #$%& like #-& to win a character.
stribizhev, your answer is correct
^(?!.job corps)(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[!#$%^&])(?!.#).$
can verify the expression in following url:
http://www.freeformatter.com/regex-tester.html

Resources