Capital letters in first letter - capitalization

In Python, I want a program that turns the first letter of each word into a capital letter.
For example:
turn "a red apple is sweeter than a green apple" into "A Red Apple is Sweeter Than A Green Apple"
How can I do this?
I've tried this:
d = input('insert a quote')
def mydic(d):
    dic = {}
    for i in d:
        palavras = dic.keys()
        if i in palavras:
            dic[i] += 1
        else:
            dic[i] = 1
    return dic

You could use the title() method.
For example:
sentence = str(input("Insert a quote: ")).title()
print(sentence)
Input: a red apple is sweeter than a green apple
Output: A Red Apple Is Sweeter Than A Green Apple

What you want to do is this:
split the input string into words, i.e. string.split(' ') splits a given string on spaces and returns a list.
for each word, capitalize the first letter, i.e. word[:1].upper() + word[1:] uppercases the first letter and leaves the rest of the word unchanged.
Join the capitalized words back into a single string, separated by spaces, and return it.
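The steps above can be sketched as follows. This is only a minimal sketch; the function name capitalize_words is made up for illustration. Unlike title(), it leaves everything after the first letter alone, so "it's" becomes "It's" rather than "It'S".

```python
def capitalize_words(text):
    """Uppercase the first letter of every space-separated word."""
    # Step 1: split on single spaces (empty strings from repeated spaces are kept).
    words = text.split(' ')
    # Step 2: uppercase only the first character; word[:1] is safe for empty strings.
    capitalized = [word[:1].upper() + word[1:] for word in words]
    # Step 3: glue the words back together with the same separator.
    return ' '.join(capitalized)


print(capitalize_words("a red apple is sweeter than a green apple"))
# A Red Apple Is Sweeter Than A Green Apple
```

Splitting on ' ' and joining with ' ' keeps the original spacing intact, which text.split() without an argument would not.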

Related

Regex to match only semicolons not in parenthesis [duplicate]

I have the following string:
Almonds ; Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt); Cashews
I want to replace the semicolons that are not inside parentheses with commas. There can be any number of nested brackets and any number of semicolons within the brackets, and the result should look like this:
Almonds , Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt), Cashews
This is my current code:
x <- "Almonds ; Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt); Cashews"
gsub(";(?![^(]*\\))",",",x,perl=TRUE)
[1] "Almonds , Roasted Peanuts (Peanuts, Canola Oil (Antioxidants (319; 320)); Salt), Cashews "
The problem I am facing is that when there is a nested () inside a bigger bracket, my regex still replaces a semicolon inside the outer bracket with a comma.
Can I please get some help on regex that will solve the problem? Thank you in advance.
The pattern ;(?![^(]*\)) means matching a semicolon, and assert that what is to the right is not a ) without a ( in between.
That assertion will be true for a nested opening parenthesis, and will still match the ;
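The same behaviour can be reproduced in Python, whose re module supports this kind of negative lookahead. This is just a small sketch to make the failure visible; the variable names are arbitrary.

```python
import re

text = ("Almonds ; Roasted Peanuts (Peanuts; Canola Oil "
        "(Antioxidants (319; 320)); Salt); Cashews")

# Same pattern as in the question: a semicolon that is NOT followed by
# ")" with no "(" in between.
naive = re.compile(r";(?![^(]*\))")

result = naive.sub(",", text)
print(result)
# The semicolon after "Peanuts" is followed by "(" before any ")",
# so the lookahead lets it through even though it sits inside brackets.
```

The only semicolons the lookahead protects are those whose next bracket character is a closing one, which is not the same as being inside brackets.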
You could use a recursive pattern to match nested parentheses, so that the parts you don't want to change are matched first, and then use a SKIP FAIL approach.
Then you can match the semicolons and replace them with a comma.
[^;]*(\((?>[^()]+|(?1))*\))(*SKIP)(*F)|;
In parts, the pattern matches
[^;]* Match 0+ times any char except ;
( Capture group 1
\( Match the opening (
(?> Atomic group
[^()]+ Match 1+ times any char except ( and )
| Or
(?1) Recurse the whole first sub pattern (group 1)
)* Close the atomic group and optionally repeat
\) Match the closing )
) Close group 1
(*SKIP)(*F) Skip what is matched
| Or
; Match a semicolon
See a regex demo and an R demo.
x <- c("Almonds ; Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt); Cashews",
"Peanuts (32.5%); Macadamia Nuts (14%; PPPG(AHA)); Hazelnuts (9%); nuts(98%)")
gsub("[^;]*(\\((?>[^()]+|(?1))*\\))(*SKIP)(*F)|;",",",x,perl=TRUE)
Output
[1] "Almonds , Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt), Cashews"
[2] "Peanuts (32.5%), Macadamia Nuts (14%; PPPG(AHA)), Hazelnuts (9%), nuts(98%)"
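For comparison, here is a rough Python sketch of the same replacement. Python's built-in re module supports neither recursion nor (*SKIP)(*F), so this swaps in a different technique: a plain loop that tracks the bracket depth. The function name is hypothetical, and treating a stray ")" as depth 0 is an assumption about how unbalanced input should be handled.

```python
def replace_top_level_semicolons(text, replacement=","):
    """Replace every ';' that is not inside any parentheses."""
    depth = 0          # how many "(" are currently open
    out = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)   # assumption: ignore unmatched ")"
        # Only semicolons at depth 0 are outside all brackets.
        out.append(replacement if ch == ";" and depth == 0 else ch)
    return "".join(out)


samples = [
    "Almonds ; Roasted Peanuts (Peanuts; Canola Oil (Antioxidants (319; 320)); Salt); Cashews",
    "Peanuts (32.5%); Macadamia Nuts (14%; PPPG(AHA)); Hazelnuts (9%); nuts(98%)",
]
for sample in samples:
    print(replace_top_level_semicolons(sample))
```

On balanced input this should give the same result as the recursive pattern; on unbalanced input the two approaches may differ.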

Difference between stem and normalized_stem in wowool lexicons

I am using wowool but in the lexicons I don't see any difference between stem or normalized_stem. When should I use one or the other?
My sample is from the documentation: "I like kiwis. KIWIS are good."
Both seem to match with
lexicon: (input="stem") : { kiwi } =Fruit;
and
lexicon: (input="normalized_stem") : { kiwi } =Fruit;
This is normal, because the root form of KIWIS is kiwi, so both the stem and the normalized_stem will match.
If you use Kiwi with an initial capital, then only the normalized_stem will match. The reason is that Kiwi is treated as a proper noun, so it is not stemmed.
I advise you to look at the stems of the words when you are trying to decide whether to use stem or normalized_stem.
// Wowool Source
lexicon: (input="stem") { kiwi } =S_Fruit;
lexicon: (input="normalized_stem") { kiwi } =NS_Fruit;
./wow -l en -i "I like kiwis. I like Kiwis are good. Kiwis" --domains rules
-- EyeOnText WoWoolConsole 2.1.0
1:Process:stream_16840253095957608044 (42b/42b)
Language:english
s(0,13)
{Sentence
t(0,1) "I" (init-cap, init-token)['I':Pron-Pers, +1p, +sg]
t(2,6) "like" ['like':V-Pres, +inf, +positive]
{NS_Fruit
{S_Fruit
t(7,12) "kiwis" ['kiwi':Nn-Pl]
}S_Fruit }NS_Fruit
t(12,13) "." ['.':Punct-Sent]
}Sentence
s(14,36)
{Sentence
t(14,15) "I" (init-cap, init-token)['I':Pron-Pers, +1p, +sg]
t(16,20) "like" ['like':V-Pres, +inf, +positive]
t(21,26) "Kiwis" (init-cap, nf, nf-lex)['Kiwis':Prop-Std]
t(27,30) "are" ['be':V-Pres-Pl-be]
t(31,35) "good" ['good':Adj-Std]
t(35,36) "." ['.':Punct-Sent]
}Sentence
s(37,42)
{Sentence
{NS_Fruit
{S_Fruit
t(37,42) "Kiwis" (init-cap, init-token)['kiwi':Nn-Pl]
}S_Fruit }NS_Fruit }Sentence

Text Mining R Package & Regex to handle Replace Smart Curly Quotes

I've got a bunch of texts like the one below, with different smart quotes for single and double quotes. With the packages I'm aware of, all I can do is remove those characters, but I want them to be replaced with normal quotes.
textclean::replace_non_ascii("You don‘t get “your” money’s worth")
Received Output: "You dont get your moneys worth"
Expected Output: "You don't get "your" money's worth"
I would also appreciate it if someone has a regex to replace all such quotes in one shot.
Thanks!
Use two gsub operations: 1) to replace double curly quotes, 2) to replace single quotes:
> gsub("[“”]", "\"", gsub("[‘’]", "'", text))
[1] "You don't get \"your\" money's worth"
See the online R demo. Tested in both Linux and Windows, and works the same.
The [“”] construct is a positive character class that matches any single char defined in the class.
To normalize all chars similar to double quotes, you might want to use
> sngl_quot_rx = "[ʻʼʽ٬‘’‚‛՚︐]"
> dbl_quot_rx = "[«»״“”„‟≪≫《》〝〞〟＂″‶]"
> res = gsub(dbl_quot_rx, "\"", gsub(sngl_quot_rx, "'", `Encoding<-`(text, "UTF-8")))
> cat(res, sep="\n")
You don't get "your" money's worth
Here, [«»״“”„‟≪≫《》〝〞〟＂″‶] matches
« 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
» 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
״ 05F4 HEBREW PUNCTUATION GERSHAYIM
“ 201C LEFT DOUBLE QUOTATION MARK
” 201D RIGHT DOUBLE QUOTATION MARK
„ 201E DOUBLE LOW-9 QUOTATION MARK
‟ 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
≪ 226A MUCH LESS-THAN
≫ 226B MUCH GREATER-THAN
《 300A LEFT DOUBLE ANGLE BRACKET
》 300B RIGHT DOUBLE ANGLE BRACKET
〝 301D REVERSED DOUBLE PRIME QUOTATION MARK
〞 301E DOUBLE PRIME QUOTATION MARK
〟 301F LOW DOUBLE PRIME QUOTATION MARK
＂ FF02 FULLWIDTH QUOTATION MARK
″ 2033 DOUBLE PRIME
‶ 2036 REVERSED DOUBLE PRIME
The [ʻʼʽ٬‘’‚‛՚︐] is used to normalize some chars similar to single quotes:
ʻ 02BB MODIFIER LETTER TURNED COMMA
ʼ 02BC MODIFIER LETTER APOSTROPHE
ʽ 02BD MODIFIER LETTER REVERSED COMMA
٬ 066C ARABIC THOUSANDS SEPARATOR
‘ 2018 LEFT SINGLE QUOTATION MARK
’ 2019 RIGHT SINGLE QUOTATION MARK
‚ 201A SINGLE LOW-9 QUOTATION MARK
‛ 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
՚ 055A ARMENIAN APOSTROPHE
︐ FE10 PRESENTATION FORM FOR VERTICAL COMMA
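For readers working in Python rather than R, the same normalization can be sketched with str.translate. The code points are taken from the two lists above; the names SINGLE_QUOTE_LIKE, DOUBLE_QUOTE_LIKE and normalize_quotes are made up for this illustration.

```python
# Code points from the lists above, written as numbers to avoid
# any copy/paste or encoding trouble with the characters themselves.
SINGLE_QUOTE_LIKE = [0x02BB, 0x02BC, 0x02BD, 0x066C, 0x2018,
                     0x2019, 0x201A, 0x201B, 0x055A, 0xFE10]
DOUBLE_QUOTE_LIKE = [0x00AB, 0x00BB, 0x05F4, 0x201C, 0x201D, 0x201E,
                     0x201F, 0x226A, 0x226B, 0x300A, 0x300B, 0x301D,
                     0x301E, 0x301F, 0xFF02, 0x2033, 0x2036]

# str.translate accepts a dict that maps code points to replacement strings.
QUOTE_TABLE = {cp: "'" for cp in SINGLE_QUOTE_LIKE}
QUOTE_TABLE.update({cp: '"' for cp in DOUBLE_QUOTE_LIKE})


def normalize_quotes(text):
    """Replace quote look-alikes with plain ASCII ' and "."""
    return text.translate(QUOTE_TABLE)


sample = "You don\u2018t get \u201cyour\u201d money\u2019s worth"
print(normalize_quotes(sample))
# You don't get "your" money's worth
```

A translation table does the whole job in one pass and needs no regex, which helps when the list of look-alike characters grows.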
There's a function in {proustr} to normalize punctuation, called pr_normalize_punc() :
https://github.com/ColinFay/proustr#pr_normalize_punc
It turns :
=> ″‶« »“”`´„“ into "
=> ՚ ’ into '
=> … into ...
For example :
library(proustr)
a <- data.frame(text = "Il l՚a dit : « La ponctuation est chelou » !")
pr_normalize_punc(a, text)
# A tibble: 1 x 1
text
* <chr>
1 "Il l'a dit : \"La ponctuation est chelou\" !"
For your text :
pr_normalize_punc(data.frame( text = "You don‘t get “your” money’s worth"), text)
# A tibble: 1 x 1
text
* <chr>
1 "You don‘t get \"your\" money's worth"
We can use gsub here for a base R option, replacing one kind of curly quote at a time.
text <- "You don‘t get “your” money’s worth"
new_text <- gsub("“(.*?)”", "\"\\1\"", text)
new_text <- gsub("’", "'", new_text)
new_text
[1] "You don‘t get \"your\" money's worth"
I have assumed here that your curly quotes are always balanced, i.e. they always wrap a word. If not, then you might have to do more work.
Doing a blanket replacement of opening/closing double curly quotes may not play out as intended, if you want them to remain as is when not quoting a word.

unix grep command

I have a text file named "file1" containing the following data :
apple
appLe
app^e
app\^e
Now the commands given are :
1.)grep app[\^lL]e file1
2.)grep "app[\^lL]e" file1
3.)grep "app[l\^L]e" file1
4.)grep app[l\^L]e file1
output in 1st case : app^e
output in 2nd case :
apple
appLe
app^e
output in 3rd case :
apple
appLe
app^e
output in 4th case :
apple
appLe
app^e
why so..?
Please help..!
1.)grep app[\^lL]e file1
The escape (\) is removed by the shell before grep sees it, so this is equivalent to app[^lL]e. Because ^ is now the first character inside the brackets, it negates the set, which matches any character except l or L.
2.)grep "app[\^lL]e" file1
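This time the double quotes stop the shell from removing the \, so grep sees app[\^lL]e. The ^ is no longer the first character inside the brackets, so it is taken literally and the set matches ^ or L or l (and, because a backslash is an ordinary character inside a bracket expression, also a literal \).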
3.)grep "app[l\^L]e" file1
^ negates the set only if it is the first character, so this matches ^ or L or l (and again a literal \, which is not special inside brackets).
4.)grep app[l\^L]e file1
The shell removes the \, so grep sees app[l^L]e. Since ^ is not the first character inside the brackets, the escape makes no difference, and the set matches ^ or L or l.
In the first case, grep app[\^lL]e file1, the pattern is not quoted on the command line, so the shell processes it first and removes the backslash. The search pattern effectively becomes
app[^lL]e
and means: "app", then any symbol but "l" or "L", then "e". The only line that fits is
app^e
In the other cases, ^ is either escaped and matched literally or, in addition, it is not the first character inside the brackets.
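To see what grep actually receives in each case, the shell's quote handling can be imitated with Python's shlex module. This is only a sketch: shlex follows POSIX quoting rules but does not perform filename expansion, so it assumes no file in the current directory happens to match the unquoted patterns. The helper name pattern_seen_by_grep is made up.

```python
import shlex

commands = [
    r'grep app[\^lL]e file1',      # 1: unquoted
    r'grep "app[\^lL]e" file1',    # 2: double-quoted
    r'grep "app[l\^L]e" file1',    # 3: double-quoted
    r'grep app[l\^L]e file1',      # 4: unquoted
]


def pattern_seen_by_grep(command):
    """Return the second word of the command after shell-style quote removal."""
    return shlex.split(command)[1]


for number, command in enumerate(commands, start=1):
    print(number, pattern_seen_by_grep(command))
# 1 app[^lL]e
# 2 app[\^lL]e
# 3 app[l\^L]e
# 4 app[l^L]e
```

Only in case 1 does ^ end up as the first character inside the brackets, which is why only that command negates the set.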

How do associations, #NS and #NV work in UniData Dictionaries?

Does anyone have a quick example of how Associations, #NS and #NV work in UniData?
I’m trying to work out associations in dictionary items but cannot get them to do anything.
For example, in a record
<1,1> = A
<1,2> = B
<2,1> = Apple
<2,2> = Banana
I created 3 dictionary items, LETTER, FRUIT and COMBO, as follows:
LETTER:
<1> = D
<2> = 1
<3> =
<3> = Letter
<4> = 6L
<5> = M
<6> = COMBO
FRUIT:
<1> = D
<2> = 1
<3> =
<3> = Letter
<4> = 6L
<5> = M
<6> = COMBO
COMBO:
<1> = PH
<2> = LETTER FRUIT
Doing a LIST LETTER FRUIT or LIST COMBO gives the same result as when LETTER and FRUIT do not have an association declared in 6.
At this point I thought it might group multivalues together when SELECTing, so I created another record as follows:
<1,1> = A
<1,2> = B
<2,1> = Banana
<2,2> = Apple
Doing SELECT MyFile WITH LETTER = "A" and FRUIT = "Apple" selects both records, so that cannot be it either.
I then tried changing LETTER to be:
<1> = I
<2> = EXTRACT(#RECORD,1,#NV,1);EXTRACT(FRUIT,1,#NV,1);#1:" (":#2:")" : #NS
<3> =
<3> = Letter
<4> = 6L
<5> = M
<6> = COMBO
I was hoping that a LIST MyFile LETTER would bring back all the different letters with their associated fruit in parentheses. That didn't work either, as LETTER now only ever displayed the first multivalue instead of all of them. For example:
LIST MyFile LETTER 14:05:22 26 FEB 2010 1
MyFile.... LETTER..............
RECORD2 A (Banana)1
RECORD A (Apple)1
2 records listed
The manuals don’t go any further than saying the word “association”. Is anyone able to clarify this for me?
Often, NV and NS only work when you use BY-EXP in your LIST or SELECT statements. You need to use modifiers that specifically look at multivalues and subvalues.
WHEN is one, and BY-EXP is another. There are others, but I'm not sure what they are off the top of my head. I primarily use BY-EXP and BY-EXP-DSND.
LIST MyFile BY-EXP LETTER = "A" BY-EXP FRUIT ="Apple" LETTER FRUIT LETTER.COMBO
To bring back all the combinations, you need to do the following:
LIST MyFile BY-EXP LETTER LETTER FRUIT LETTER.COMBO
Change the following virtual field from 'LETTER' to say 'LETTER.COMBO' or something along those lines:
<1> = I
<2> = EXTRACT(#RECORD,1,#NV,1);EXTRACT(FRUIT,1,#NV,1);#1:" (":#2:")" : #NS
<3> =
<3> = Letter
<4> = 6L
<5> = M
<6> = COMBO
Hope that helps.
-Nathan
To answer part of my own question:
Only WHEN is affected by the association; WITH is not. If you turn on UDT.OPTIONS 94 and do
LIST MyFile WHEN LETTER = "A" AND FRUIT="Apple" COMBO
when using my D-Type definition of LETTER, I get
LIST MyFile WHEN LETTER = "A" AND FRUIT="Apple" LETTER FRUIT 16:06:42 26 FEB 2010 1
MyFile.... LETTER.............. FRUIT...............
RECORD A Apple
1 record listed
Which is what one would expect.
To use the WHEN clause you need to be in ECLTYPE U, not P. It would be helpful if this were clearer, but oh well...
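The difference between the two selections described in this thread can be illustrated with a toy model in Python. This is not UniData code and makes no claim about how UniData implements it; it only mimics the observed results, assuming WITH tests each field independently while WHEN on associated fields compares values at the same multivalue position. The function names are made up.

```python
# The two records from the question, with multivalues as lists.
records = {
    "RECORD":  {"LETTER": ["A", "B"], "FRUIT": ["Apple", "Banana"]},
    "RECORD2": {"LETTER": ["A", "B"], "FRUIT": ["Banana", "Apple"]},
}


def select_with(records, letter, fruit):
    """WITH-like test: each field may match at any position."""
    return [key for key, rec in records.items()
            if letter in rec["LETTER"] and fruit in rec["FRUIT"]]


def select_when(records, letter, fruit):
    """WHEN-like test: both values must sit at the same position."""
    return [key for key, rec in records.items()
            if (letter, fruit) in zip(rec["LETTER"], rec["FRUIT"])]


print(select_with(records, "A", "Apple"))   # ['RECORD', 'RECORD2']
print(select_when(records, "A", "Apple"))   # ['RECORD']
```

The position-by-position pairing is what the association between LETTER and FRUIT expresses in this model.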

Resources