I have a long string from which I would like to remove runs of consecutive uppercase words (two or more in a row), together with any punctuation that directly follows the last uppercase word of the run.
At the same time, I would like to keep single uppercase words and uppercase words that are part of a "mixed" word (see reprex).
I'm struggling to implement the "consecutive words" condition in the reprex.
string <- "Lorem ipsum DOLOR SIT AMET? consectetuer adipiscing elit. Morbi gravida libero NEC velit. Morbi scelerisque luctus velit. ETIAM-123 dui sem, fermentum vitae, SAGITTIS ID? malesuada in, quam. Proin mattis lacinia justo. Vestibulum facilisis auctor urna. Aliquam IN LOREM SIT amet leo accumsan"
#remove all consecutive UPPERCASE words including punctuation (--> DOLOR SIT AMET?), but not single uppercase words (--> NEC) or "mixed" words with uppercase and digits (--> ETIAM-123)
#this doesn't work:
library(stringr) # also makes the re-exported %>% pipe available
string %>%
stringr::str_remove_all("\\b[:upper:]+\\b")
#> [1] "Lorem ipsum ? consectetuer adipiscing elit. Morbi gravida libero velit. Morbi scelerisque luctus velit. -123 dui sem, fermentum vitae, ? malesuada in, quam. Proin mattis lacinia justo. Vestibulum facilisis auctor urna. Aliquam amet leo accumsan"
Created on 2020-05-30 by the reprex package (v0.3.0)
Any hints are appreciated :)
You may use
string <- "Lorem ipsum DOLOR SIT AMET? consectetuer adipiscing elit. Morbi gravida libero NEC velit. Morbi scelerisque luctus velit. ETIAM-123 dui sem, fermentum vitae, SAGITTIS ID? malesuada in, quam. Proin mattis lacinia justo. Vestibulum facilisis auctor urna. Aliquam IN LOREM SIT amet leo accumsan"
gsub("\\s*\\b\\p{Lu}{2,}(?:\\s+\\p{Lu}{2,})+\\b[\\p{P}\\p{S}]*", "", string, perl=TRUE)
Output:
[1] "Lorem ipsum consectetuer adipiscing elit. Morbi gravida libero NEC velit. Morbi scelerisque luctus velit. ETIAM-123 dui sem, fermentum vitae, malesuada in, quam. Proin mattis lacinia justo. Vestibulum facilisis auctor urna. Aliquam amet leo accumsan"
See the R demo and the regex demo.
Details
\s* - 0 or more whitespaces
\b - word boundary
\p{Lu}{2,} - two or more capital letters
(?:\s+\p{Lu}{2,})+ - 1 or more occurrences of 1+ whitespaces followed by 2 or more uppercase letters
\b - a word boundary
[\p{P}\p{S}]* - 0 or more punctuation or symbol characters
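To see what each part of the pattern does, it can be run on short fragments of the sample string. This is a minimal sketch; `remove_upper_runs()` is a hypothetical name for a wrapper around the same gsub() call from the answer:

```r
# Hypothetical wrapper around the gsub() pattern from the answer above
remove_upper_runs <- function(x) {
  gsub("\\s*\\b\\p{Lu}{2,}(?:\\s+\\p{Lu}{2,})+\\b[\\p{P}\\p{S}]*", "", x, perl = TRUE)
}

remove_upper_runs("Lorem ipsum DOLOR SIT AMET? consectetuer")  # "Lorem ipsum consectetuer"
remove_upper_runs("gravida libero NEC velit.")                  # single word kept
remove_upper_runs("velit. ETIAM-123 dui")                       # mixed word kept
remove_upper_runs("Aliquam IN LOREM SIT amet")                  # "Aliquam amet"
remove_upper_runs("DOLOR SIT amet")                             # " amet"
```

The last call shows one caveat: the leading \s* only consumes whitespace before a run, so a run at the very start of the string leaves the following space behind. Wrapping the call in trimws() handles that if it matters.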
Perhaps this?
stringr::str_remove_all(string, "([[:upper:]]+ )+[[:upper:]]+( |[:punct:])*")
#> [1] "Lorem ipsum consectetuer adipiscing elit. Morbi gravida libero NEC velit. Morbi scelerisque luctus velit. ETIAM-123 dui sem, fermentum vitae, malesuada in, quam. Proin mattis lacinia justo. Vestibulum facilisis auctor urna. Aliquam amet leo accumsan"
Created on 2020-05-30 by the reprex package (v0.3.0)
I would like to apply a two-color gradient to text.
After some searching, this is the CSS I ended up with:
p{
background: linear-gradient(red, blue);
-webkit-text-fill-color: transparent;
background-clip:text;
-webkit-background-clip:text;
}
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus elementum vitae sem at ornare. Cras in massa et ante suscipit dignissim. Donec finibus, erat ac vehicula fringilla, magna nibh molestie est, sit amet pellentesque magna augue in lectus. Sed volutpat enim augue, tempor luctus lectus laoreet a. Integer et nibh nunc. Quisque ac est a sapien blandit rhoncus. Cras maximus mi eu quam auctor facilisis. Aenean a nisl lacus. Vivamus elementum aliquet magna, vel efficitur augue luctus vitae. Sed sit amet lectus feugiat, gravida urna id, imperdiet neque. In nec nulla et ante tristique dapibus eu ac eros. Vestibulum vitae tellus mattis, facilisis massa a, mollis ante. Curabitur et orci laoreet, porttitor ipsum ac, mattis elit. Proin in feugiat dolor, non volutpat velit.</p>
This gives the desired result, but "background-clip: text;" does not pass the W3C validator.
Is there another way to achieve this result that does pass W3C validation?
Thank you in advance
PS: I can only use HTML/CSS.
PS2: Sorry if the question has already been asked.
I have a string, @Model.inventory.overview, that contains — and that I display in my view.
When I try to output it so that the special HTML character is rendered, it shows up as the text — with both
@Html.Raw(Model.inventory.overview)
and
@MvcHtmlString.Create(Model.inventory.overview)
This is the value of @Model.inventory.overview:
Lorem ipsum dolor sit—amet, consectetur adipiscing elit.
Mauris eget feugiat nibh. Fusce rhoncus ex et nunc fringilla, ut
fermentum tortor volutpat. Praesent mollis efficitur magna auctor
sollicitudin. Morbi pulvinar, justo ut efficitur rutrum, dui metus
varius magna, vitae molestie leo elit vel turpis. Nullam quis ipsum
nec erat maximus dictum sit amet sed ligula. Vestibulum tincidunt
dolor non—justo accumsan, eu euismod neque rutrum. Donec in
lacinia est.
I have also tried the following:
@Html.Raw(HttpUtility.HtmlDecode(@model.ContentBody));
Still not working.
@Html.Raw(Html.Encode(Model.inventory.overview)) was the solution after all.
I looked at the W3C list of HTML ASCII characters and suspected that — wasn't "exactly" an ASCII character, which led me to this site soon after. Glad I was helpful.
I'm having trouble figuring out how to modify my input text in order to get strwrap to start a new line at a given place without an extra line in between (a paragraph break).
My desired output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vivamus malesuada ante eget lacus aliquam aliquet. Morbi a
nulla in tortor rutrum pulvinar.
Duis auctor condimentum magna ac commodo. Phasellus quis
elementum purus, at ornare magna. Quisque sit amet vehicula
risus. Suspendisse et et scelerisque velit:
item #1
item #2
item #3
I can use \n to get the paragraph break, which works fine, but how do I get a new line without the paragraph break, as in the list of items at the bottom? When I use \r...
txt <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vivamus malesuada ante eget lacus aliquam aliquet. Morbi a nulla
in tortor rutrum pulvinar.
\n
Duis auctor condimentum magna ac commodo. Phasellus quis elementum purus,
at ornare magna. Quisque sit amet vehicula risus. Suspendisse et
scelerisque velit:
\r
item #1
item #2
item #3"
writeLines(strwrap(txt, width=60))
... I get an unexpected result: a line break, but with an extra leading space, and some of the text run together or missing:
#Lorem ipsum dolor sit amet, consectetur adipiscing elit.
#Vivamus malesuada ante eget lacus aliquam aliquet. Morbi a
#nulla in tortor rutrum pulvinar.
#
#Duis auctor condimentum magna ac commodo. Phasellus quis
#elementum purus, at ornare magna. Quisque sit amet vehicula
# item #1 item #2se et scelerisque velit:
#item #3
What do I need to replace \r with in order to get a single line break, like the one between "velit:" and "item #1" in the desired output above? I've read the strwrap documentation and worked through its example, but haven't found the answer. Thanks for your help.
On the off-chance that someone else finds this question in the future, I'll share the solution I used here. As Wiktor points out above, strwrap does not have this functionality. What I ended up doing was a workaround that edits the text after it goes through strwrap.
I add an arbitrary character sequence to the beginning of each line of the input that I want to start on its own line. Below I use "\r" for this. I send the input through strwrap, add two empty lines to the end, and then go through the output line by line, deleting "\r" and removing the empty line (paragraph break) that precedes it. Here's the code:
txt <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vivamus malesuada ante eget lacus aliquam aliquet. Morbi a nulla
in tortor rutrum pulvinar.
\n
Duis auctor condimentum magna ac commodo. Phasellus quis elementum purus,
at ornare magna. Quisque sit amet vehicula risus. Suspendisse et
scelerisque velit:
\n\r item #1
\n\r item #2
\n\r item #3"
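sink("output.txt")
# Wrap, then add two empty lines so the last text line is also written
lines <- c(strwrap(txt, width=100), "", "")
invisible(lapply(seq_along(lines), function(index) {
  if (index != 1) { # line 1 has no predecessor; it is written when index == 2
    # Write the previous line unless the current line contains the marker;
    # this drops the empty line (paragraph break) strwrap put before it
    if (!grepl("\r ", lines[index])) {
      writeLines(gsub("\r ", "", lines[index-1]))
    }
  }
}))
sink()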
This produces a .txt file with the output text formatted as desired: the list of items at the bottom is separated by single line breaks, while paragraph breaks marked in the input with just "\n" are treated normally.
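An alternative that avoids sink() and the second pass is to split the input at the marker first and run strwrap() on each piece, so no paragraph break is ever created between forced lines. This is a sketch, not the approach from the answer above; `wrap_with_breaks()` is a hypothetical helper, and it assumes (like the workaround) that "\r" marks a forced line break and that paragraphs are separated by a blank line:

```r
# Hypothetical helper. Assumptions: a blank line separates paragraphs,
# and "\r" marks a forced single line break.
wrap_with_breaks <- function(txt, width = 60) {
  # Split into paragraphs on blank lines, as strwrap() itself does
  paras <- trimws(strsplit(txt, "\n[ \t\n]*\n")[[1]])
  paras <- paras[nzchar(paras)]
  out <- character(0)
  for (i in seq_along(paras)) {
    if (i > 1) out <- c(out, "")  # one empty line between paragraphs
    # Split the paragraph at each forced-break marker
    pieces <- trimws(strsplit(paras[i], "\r", fixed = TRUE)[[1]])
    pieces <- pieces[nzchar(pieces)]
    for (piece in pieces) {
      # strwrap() reflows the soft newlines inside each piece
      out <- c(out, strwrap(piece, width = width))
    }
  }
  out
}

txt <- "Duis auctor condimentum magna ac commodo. Phasellus quis
elementum purus, at ornare magna.

Suspendisse et scelerisque velit:\ritem #1\ritem #2\ritem #3"
writeLines(wrap_with_breaks(txt, width = 60))
```

The result is a plain character vector, so writeLines(wrap_with_breaks(txt), "output.txt") writes it to a file directly.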
I have a text file with 100 articles. Each article ends with the word Document, followed by a space and then a 25-character alphanumeric code.
Here is how four of the articles end. The alphanumeric code has no set pattern.
Document AFNR000020161206ecc700006
Document TEKMET0020161202ecc200008
Document AFNR000020161130ecc10001o
Document AFNR000020161127ecbs00018
My code to read the text file into R and split it:
textfile <- "Text.txt"
TextData <-readLines(textfile)
head(TextData)
length(TextData)
nchar(TextData)
TextData = strsplit(TextData, "<Document>" "[a-zA-Z0-9]")
I am stuck on using strsplit to split the text at each "Document <alphanumeric>" marker.
Once it is split, I can create a corpus:
library(tm)
doc.vec <- VectorSource(TextData)
corpusDoc <- Corpus(doc.vec)
summary(corpusDoc)
Thank you
It's not entirely clear what your data looks like, but assuming it is just one long string and the "Document alphanumeric code" is in line with the rest of the text, the following should work:
# mock data
TextData <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam egestas sapien eu leo rutrum, non commodo metus auctor. Quisque id libero quis augue bibendum auctor sed vel odio. Aliquam quam odio, maximus vitae elit at, pharetra volutpat nulla. Nam iaculis mattis lectus, sit amet euismod neque ornare consectetur. Cras nec nibh sit amet massa laoreet tempor sit amet in sem. Sed pulvinar sapien risus, molestie cursus augue rhoncus sed. Donec ullamcorper tellus vel tortor finibus pretium. Sed non tristique nisi. Document AFNR000020161206ecc700006 Vestibulum quis risus pulvinar elit blandit faucibus sed et massa. Phasellus non arcu vulputate, aliquam felis sit amet, pellentesque lorem. Praesent ut felis pellentesque, tincidunt risus imperdiet, vehicula ante. Donec odio sapien, vulputate sed semper at, pharetra sit amet dui. Fusce aliquam ullamcorper nunc in ullamcorper. Suspendisse vitae ex aliquam turpis vestibulum semper at ut quam. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nam varius, risus sed feugiat mollis, nunc urna hendrerit mi, eu cursus erat mi quis ligula. Maecenas sit amet sagittis tellus. Donec ultricies faucibus ipsum id mattis. Ut lacinia, diam nec dignissim vestibulum, nibh augue tincidunt dolor, eu dictum felis augue in elit. Sed et scelerisque felis. Document TEKMET0020161202ecc200008 Aenean ut erat mattis, convallis orci eget, tincidunt massa. Integer dictum in erat et ornare. Donec et cursus eros. Aenean condimentum erat in lacus dictum, ac convallis tortor venenatis. Donec luctus dapibus aliquam. Maecenas et ipsum ac lacus convallis luctus. Phasellus volutpat risus sit amet volutpat vestibulum. Vestibulum eu elit sed massa imperdiet congue id interdum odio. Document AFNR000020161130ecc10001o Proin et accumsan nisi. Suspendisse tempus accumsan mollis. Integer aliquam fermentum consequat. Nunc sit amet suscipit tellus, in fringilla diam. Nulla rutrum elit nec blandit varius. 
Praesent vehicula nibh orci, nec facilisis sem vulputate non. Cras vel ipsum eleifend, vulputate ante congue, facilisis ligula. Integer ac mollis nibh. Ut vitae lacus eget mauris ultrices iaculis non eget diam. Praesent placerat lorem id ante maximus cursus. Ut quis lacus nec turpis tincidunt sagittis lacinia at tortor. Cras vitae posuere diam. Maecenas ut convallis lacus, in commodo neque. Sed rhoncus cursus arcu, nec pharetra odio lacinia quis. Sed nec neque libero. Etiam sit amet purus eros. Document AFNR000020161127ecbs00018"
# split on desired string
TextData <- strsplit(TextData, "Document [a-zA-Z0-9]{25}")
# if you want it as a vector
TextData <- unlist(TextData)
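It seems the problem was the way you defined the regular expression in your strsplit call. strsplit() takes a single pattern, so two adjacent strings ("<Document>" "[a-zA-Z0-9]") are a syntax error; the literal "<Document>" would never match because the text contains no angle brackets; and [a-zA-Z0-9] without a quantifier matches only one character, whereas {25} matches the whole code.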
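The answer assumes the whole file is one long string, but readLines() returns one element per line. A minimal sketch for that case collapses the lines first and then splits. `text_lines` is a made-up stand-in for readLines("Text.txt"), and the \\s* around the marker (to drop the surrounding newlines) is an addition, not part of the answer's pattern:

```r
# Made-up stand-in for readLines("Text.txt"): one element per line
text_lines <- c("First article text.",
                "More of the first article.",
                "Document AFNR000020161206ecc700006",
                "Second article text.",
                "Document TEKMET0020161202ecc200008")

# Collapse into a single string so the marker can be matched across lines
full_text <- paste(text_lines, collapse = "\n")

# Split on "Document" + 25 alphanumerics, also consuming surrounding whitespace
articles <- strsplit(full_text, "\\s*Document [a-zA-Z0-9]{25}\\s*")[[1]]

# Drop empty pieces (e.g. when the text starts with a marker)
articles <- articles[nzchar(articles)]
articles
```

The resulting character vector can then be passed to VectorSource() as in the question.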
I have a div [projItemsContent] whose height varies with its content. The problem is that I want two other divs [projItemsLeft, projItemsRight] to have the same height as the projItemsContent div. Any suggestions?
div.projItems{width:360px;min-height:200px;_height:200px;background:#000}
div.projItemsLeft{float:left;width:30px;background:#990}
div.projItemsRight{float:left;width:30px;background:#900}
div.projItemsContent{float:left;width:300px;background:#ccc}
<div class="projItems">
<div class="projItemsLeft"> </div>
<div class="projItemsContent">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur sit amet nunc eu ligula tincidunt faucibus.
Curabitur eget magna neque, sed porta sem. Fusce eu lorem at orci dapibus faucibus ut eu mi. In eget ligula risus.
Sed id lectus lorem. Integer elit dui, bibendum vitae dictum a, mollis sodales diam. Morbi vehicula lobortis semper.
Suspendisse potenti. Proin eu convallis lectus. Praesent ut sem at enim condimentum dictum vitae id elit.
Phasellus id dolor ante, hendrerit tempus lorem. Proin nisi nibh, convallis et sollicitudin in, interdum vitae nibh.
Fusce ullamcorper dictum nunc, eget bibendum ipsum viverra quis. Aliquam vitae leo non metus ultricies tempus in id libero.
Vivamus mauris tortor, convallis ut luctus at, elementum sed velit. Cras cursus tempus erat adipiscing lacinia.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam diam risus, sollicitudin sed venenatis a, molestie in turpis.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur sit amet nunc eu ligula tincidunt faucibus.
Curabitur eget magna neque, sed porta sem. Fusce eu lorem at orci dapibus faucibus ut eu mi. In eget ligula risus.
Sed id lectus lorem. Integer elit dui, bibendum vitae dictum a, mollis sodales diam. Morbi vehicula lobortis semper.
Suspendisse potenti. Proin eu convallis lectus. Praesent ut sem at enim condimentum dictum vitae id elit.
Phasellus id dolor ante, hendrerit tempus lorem. Proin nisi nibh, convallis et sollicitudin in, interdum vitae nibh.
Fusce ullamcorper dictum nunc, eget bibendum ipsum viverra quis. Aliquam vitae leo non metus ultricies tempus in id libero.
Vivamus mauris tortor, convallis ut luctus at, elementum sed velit. Cras cursus tempus erat adipiscing lacinia.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam diam risus, sollicitudin sed venenatis a, molestie in turpis.
</div>
<div class="projItemsRight"> </div>
</div>
Encapsulate the divs:
div.projItems{width:360px;min-height:200px;_height:200px;background:#000}
div.projItemsLeft{float:left;width:360px;background:#990}
div.projItemsRight{float:right;width:330px;background:#900}
div.projItemsContent{float:left;width:300px;background:#ccc}
<div class="projItems">
<div class="projItemsLeft">
<div class="projItemsRight">
<div class="projItemsContent">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur sit amet nunc eu ligula tincidunt faucibus.
Curabitur eget magna neque, sed porta sem. Fusce eu lorem at orci dapibus faucibus ut eu mi. In eget ligula risus.
Sed id lectus lorem. Integer elit dui, bibendum vitae dictum a, mollis sodales diam. Morbi vehicula lobortis semper.
Suspendisse potenti. Proin eu convallis lectus. Praesent ut sem at enim condimentum dictum vitae id elit.
Phasellus id dolor ante, hendrerit tempus lorem. Proin nisi nibh, convallis et sollicitudin in, interdum vitae nibh.
Fusce ullamcorper dictum nunc, eget bibendum ipsum viverra quis. Aliquam vitae leo non metus ultricies tempus in id libero.
Vivamus mauris tortor, convallis ut luctus at, elementum sed velit. Cras cursus tempus erat adipiscing lacinia.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam diam risus, sollicitudin sed venenatis a, molestie in turpis.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur sit amet nunc eu ligula tincidunt faucibus.
Curabitur eget magna neque, sed porta sem. Fusce eu lorem at orci dapibus faucibus ut eu mi. In eget ligula risus.
Sed id lectus lorem. Integer elit dui, bibendum vitae dictum a, mollis sodales diam. Morbi vehicula lobortis semper.
Suspendisse potenti. Proin eu convallis lectus. Praesent ut sem at enim condimentum dictum vitae id elit.
Phasellus id dolor ante, hendrerit tempus lorem. Proin nisi nibh, convallis et sollicitudin in, interdum vitae nibh.
Fusce ullamcorper dictum nunc, eget bibendum ipsum viverra quis. Aliquam vitae leo non metus ultricies tempus in id libero.
Vivamus mauris tortor, convallis ut luctus at, elementum sed velit. Cras cursus tempus erat adipiscing lacinia.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam diam risus, sollicitudin sed venenatis a, molestie in turpis.
</div>
Right column content
<br style="clear:left;" /></div>
Left column content
<br style="clear:right;" /></div>
</div>
Google turned up this: http://matthewjamestaylor.com/blog/perfect-3-column.htm
I once used this as a guide: http://bluerobot.com/web/layouts/layout3.html
And this too is a good resource: http://www.thenoodleincident.com/tutorials/box_lesson/boxes.html