How to extract substrings dynamically - r

From the string
s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit,
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."
I want to extract the text after the letters within the |-symbols.
My approach:
words <- list("tree","house","street","car")
for(word in words){
expression <- paste0("^.*\\|",word,"\\|\\s*(.+?)\\s*\\|.*$")
print(sub(expression, "\\1", s))
}
This works fine for all but the last word, car, for which it instead returns the entire string s.
How can I modify the regex so that, for the last element of the words list, it prints dolore magna aliqua.?
Edit: Previously the list of expressions was a, b, c, d. Solutions to that specific problem do not generalize well.

Try this:
library(stringi)
s <- '|a| Lorem ipsum dolor sit amet, |b| consectetur adipiscing elit,
|c| sed do eiusmod tempor incididunt ut labore et |d| dolore magna aliqua.'
stri_split_regex(s, '\\|[:alpha:]\\|')
[[1]]
[1] "" " Lorem ipsum dolor sit amet, "
[3] " consectetur adipiscing elit, \n" " sed do eiusmod tempor incididunt ut labore et "
[5] " dolore magna aliqua."

You can try this pattern
library(stringr)
s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit,
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."
str_extract_all(s, regex("(?<=\\|)\\w+(?=\\|)"))
#[1] "tree" "house" "street" "car"
(?<=\\|): lookbehind, a position preceded by |; \\| is the escape for a literal |
\\w+: one or more word characters
(?=\\|): lookahead, a position followed by |

I suggest extracting all the words with corresponding values using stringr::str_match_all:
s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit,
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."
words1 <- list("tree","house","street","car")
library(stringr)
expression <- paste0("\\|(", paste(words1, collapse="|"),")\\|\\s*([^|]*)")
result <- str_match_all(s, expression)
lapply(result, function(x) x[,-1])
Output:
[[1]]
     [,1]     [,2]
[1,] "tree"   "Lorem ipsum dolor sit amet, "
[2,] "house"  "consectetur adipiscing elit, \n"
[3,] "street" "sed do eiusmod tempor incididunt ut labore et "
[4,] "car"    "dolore magna aliqua."
The regex is
\|(tree|house|street|car)\|\s*([^|]*)
Details:
\| - a | char
(tree|house|street|car) - Group 1: one of the words
\| - a | char
\s* - 0 or more whitespace chars
([^|]*) - Group 2: any 0 or more chars other than |.
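The loop in the question fails on car because its pattern requires a closing | after the captured text, and nothing follows the last segment, so sub() finds no match and returns s unchanged. As a rough base-R sketch of the same [^|]* idea applied one word at a time (the helper name extract_after is made up for illustration, and the words are assumed to contain no regex metacharacters):

```r
s <- paste0("|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit,\n",
            "|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua.")
words <- c("tree", "house", "street", "car")

# Hypothetical helper: return the text that follows |word| up to the next | (or end of string)
extract_after <- function(word, text) {
  pattern <- paste0("\\|", word, "\\|\\s*([^|]*)")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  # m[1] is the full match, m[2] the captured group; no match gives character(0)
  if (length(m) == 0) NA_character_ else trimws(m[2])
}

res <- vapply(words, extract_after, character(1), text = s)
res[["car"]]
# "dolore magna aliqua."
```

Because [^|]* also matches at the end of the string, the last word needs no special case.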

Related

Is there a way to achieve the behaviour of Appsilon's shiny.router for Rmd documents? (selectInput rendering dynamic pages within a Rmd doc)

I have a shiny app in which you select one of 100 options from a couple of select inputs to show one of 100 Rmds/html pages.
Once you have chosen an option, an Rmd is rendered and displayed in the app but it is slow to render each time. Once that Rmd is loaded, you can choose another option to see a different Rmd
Since Rmd are more responsive than shiny apps, is there a way for me to recreate the same functionality (Choose an option, that links you to the correct Rmd, but you are still able to select a different option and go to that option's Rmd) but completely contained within an Rmd or family of Rmds?
Thank you

Does it help?
---
title: Test
output:
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
runtime: shiny_prerendered
---
# Page 0
```{r context='render'}
npages <- 3
links <- paste0("#section-page-", 1:npages)
names(links) <- paste0("Page ", 1:npages)
onChange <- '
function(value){
const a = document.createElement("a");
document.body.append(a);
a.href = value;
a.click();
a.remove();
}
'
selectizeInput(
"sel",
"Select a page",
choices = as.list(links),
options = list(
onChange = I(onChange)
)
)
```
```{r echo=FALSE}
backlink <- function(){
tags$a("Back to selection", href = "#section-page-0")
}
```
# Page 1
blablabla...
```{r context="render"}
backlink()
```
# Page 2
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
```{r context="render"}
backlink()
```
# Page 3
```{r context='render'}
uiOutput("contentBox", inline = TRUE)
```
```{r context='server'}
content <- reactive({
x <- rnorm(1)
tags$span(x, id = 'myspan')
})
output$contentBox <- renderUI({
content()
})
```
```{r context="render"}
backlink()
```
EDIT
Here is the same flex dashboard but this one does not use Shiny (except the Shiny widgets, but no Shiny server). It uses the JavaScript library select2 because I like it (I find the native dropdown lists are not pretty).
---
title: "Navigating without Shiny"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
    pandoc_args:
      header-includes: select2_css.html
      include-after: select2_js.html
---
```{js}
$(document).ready(function() {
$("#sel").select2({
width: "resolve"
});
$("#sel").on("select2:select", function(e){
const a = document.createElement("a");
document.body.append(a);
a.href = e.params.data.id;
a.click();
a.remove();
});
});
```
```{r setup, include=FALSE}
library(flexdashboard)
library(htmltools)
```
# Page 0
```{r results='asis'}
npages <- 3
links <- paste0("#page-", 1:npages)
names(links) <- paste0("Page ", 1:npages)
shiny::selectInput(
"sel",
"Select a page",
choices = as.list(links),
selectize = FALSE,
width = "20%"
)
```
```{r echo=FALSE}
backlink <- function(){
# without the Shiny runtime the page anchors are "#page-N", matching the links above
tags$a("Back to selection", href = "#page-0")
}
```
# Page 1
blablabla...
```{r results='asis'}
backlink()
```
# Page 2
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
```{r results='asis'}
backlink()
```
# Page 3
```{r results='asis'}
backlink()
```
File select2_css.html:
<link rel="stylesheet" href="select2.min.css"></link>
File select2_js.html:
<script src="select2.min.js"></script>
Of course I downloaded the two select2.min files.
Edit by OP:
I was unable to get the selectInput to render in a static Rmd so I used crosstalk to the same effect
```{r}
sd <- SharedData$new(data.frame(n = names(links), l = links), group = 'grp', key = unname(links))
crosstalk::filter_select(id = 'selles',
label = 'select a page',
sharedData = sd,
group = ~names(links),
allLevels = F,
multiple = F)
```
```{js}
var ct_filter = new crosstalk.FilterHandle('grp');
// Get notified when this group's filter changes
ct_filter.on("change", function(e) {
// e.value gives the filter
const a = document.createElement("a");
document.body.append(a);
a.href = e.value;
a.click();
a.remove();
});
```

detect any pattern from a character vector within another character vector in R

I would like to return a logical vector with TRUE values for all elements in which any element from another character vector is detected.
Example data:
lorem <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
"Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.",
"Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.",
"Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
As an example, I would like to search the elements 'sit' and 'non'.
I tried
str_detect(lorem, c('sit', 'non'))
and
str_detect(lorem, c('non', 'sit'))
which showed me that the second argument is probably being recycled, so the call str_detect(lorem, c('sit', 'non')) actually occurs as follows:
c(str_detect(lorem[1], 'sit'), str_detect(lorem[2], 'non'), str_detect(lorem[3], 'sit'), str_detect(lorem[4], 'non'))
I eventually came up with the following solution:
multi_string_detect<-function(x,y){
temp<-sapply(y, function(z){str_detect(x, z)})
apply(temp, 1, any)
}
multi_string_detect(lorem, c('sit', 'non'))
[1] TRUE FALSE FALSE TRUE
Is there a clean/simpler alternative to my multi_string_detect function?

Another option is to collapse the pattern into a single string with |
library(stringr)
str_detect(lorem, str_c(c('non', 'sit'), collapse = "|"))
#[1] TRUE FALSE FALSE TRUE
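For comparison, here is a small base-R sketch of the same idea without stringr. The lorem vector below is a shortened stand-in for the question's data, and the second variant (literal matching with fixed = TRUE) is an addition for the case where the search terms might contain regex metacharacters:

```r
# Shortened stand-in for the question's vector (assumption, not the original data)
lorem <- c("Lorem ipsum dolor sit amet",
           "Ut enim ad minim veniam",
           "Duis aute irure dolor",
           "Excepteur sint occaecat cupidatat non proident")
patterns <- c("sit", "non")

# Variant 1: collapse the terms into one alternation and match once
hits_regex <- grepl(paste(patterns, collapse = "|"), lorem)

# Variant 2: match each term literally, then OR the logical vectors together
hits_fixed <- Reduce(`|`, lapply(patterns, grepl, x = lorem, fixed = TRUE))

hits_regex
# [1]  TRUE FALSE FALSE  TRUE
```

Both variants return one logical value per element of lorem, so neither depends on how the pattern vector would be recycled.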

Creating a single column vector from a list column in R

I'm currently trying to divide up a dataset of text documents (coded in UTF-8) by paragraph in R, but I'm having trouble getting them into the format I want for tidytext, which is a single column of the different paragraphs.
My data so far looks something like this:
list <- c("Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. Id mel aeque deleniti.
Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.
Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.",
"Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. Id mel aeque deleniti.
Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.
Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.",
"Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. Id mel aeque deleniti.
Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.
Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.")
df <- as.data.frame(list)
df_spl <- str_split(df$list, "\n", n = Inf)
df_spl
Basically it's a large list of different vectors that have different paragraphs in them from each original row.
What I ultimately want is a single column vector with all the list items, like this:
vector <- c("Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. Id mel aeque deleniti.", "Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.", "Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.", "Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. Id mel aeque deleniti." "Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.", "Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.", "Lorem ipsum dolor sit amet, movet omittantur ut vel, vim an offendit prodesset. Sumo summo intellegam vel ei, dicunt persecuti vim ne. Lorem noluisse at est. Per ex postulant philosophia, ut vel amet affert tantas, pro ne consetetur scriptorem. 
Id mel aeque deleniti.", "Nam ut erat eligendi, pro eu minim molestie persequeris. Civibus interesset te nec, cu aeque fabellas luptatum has. Ad usu nominati tractatos. Eu voluptatum disputationi vis, alienum delicatissimi pri eu. Et molestie copiosae nam, ex vix ignota dignissim. Dico suas illum at mea, no case modus antiopam sea.", "Ius te copiosae lobortis contentiones. Est ceteros dissentiet ne, qui malis iuvaret tacimates an. Vivendo erroribus nec no. No quo corpora indoctum iracundia, mel ad mollis accusam praesent. Sit at admodum sensibus mediocrem, no pri decore nemore.")
I've already tried commands like cbind(), stack(), and unnest(), but none of them have gotten me that single column :(
Any help would be greatly, greatly appreciated! Thanks!!

We can unlist the list element into a vector and paste it if we need a single string:
out <- paste(unlist(df_spl), collapse=" ")

To turn a list into a vector you can use:
unlist(df_spl)
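A minimal sketch of the whole route from documents to a one-column data frame, using base strsplit() in place of str_split(). The docs vector and the column name text are placeholders chosen for illustration:

```r
# Two toy documents with paragraphs separated by newlines (placeholder data)
docs <- c("First paragraph.\nSecond paragraph.",
          "Third paragraph.\nFourth paragraph.")

# strsplit() returns a list with one character vector per document
paragraph_list <- strsplit(docs, "\n", fixed = TRUE)

# unlist() flattens that list into a single character vector
paragraphs <- unlist(paragraph_list)

# One row per paragraph, in a single column
para_df <- data.frame(text = paragraphs, stringsAsFactors = FALSE)
nrow(para_df)
# [1] 4
```

A data frame with one character column like this is the shape that tidytext functions such as unnest_tokens() expect as input.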

Wrap horizontal legend across multiple rows

Suppose I have data like the following:
lab <- "A really really long string!"
dat <- data.frame(grp = paste(1:6,lab),x=1:6,y=runif(6))
When plotting a legend with strings this long, sometimes it can be a challenge to get the legend to fit nicely. If I have to I can always abbreviate the strings to shorten them, but I was wondering if it's possible (most likely using some grid magic) to 'wrap' a legend across multiple rows or columns. For instance, say I position the legend on the bottom, horizontally:
ggplot(dat,aes(x=x,y=y,colour=grp)) + geom_point() +
theme(legend.position="bottom",legend.direction="horizontal")
Is it possible to get this legend to display as two rows of three, rather than one row of six?

To wrap long strings, use strwrap.
lipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur ullamcorper tellus vitae libero placerat aliquet egestas tortor semper. Maecenas pellentesque euismod tristique. Donec semper interdum magna, commodo vehicula ante hendrerit vitae. Maecenas at diam sollicitudin magna mollis lobortis. In nibh elit, tincidunt eu lobortis ac, molestie a felis. Proin turpis leo, iaculis non commodo quis, venenatis at justo. Duis in magna vel erat fringilla gravida quis non nisl. Nunc lacus magna, varius eu luctus vel, luctus tristique sapien. Suspendisse mi dolor, vestibulum at facilisis elementum, lacinia vitae metus. Etiam ut nisl urna, vel tempus mi. In hac habitasse platea dictumst. Quisque pretium volutpat felis, nec tempor diam faucibus at. Praesent volutpat posuere sapien, eu vulputate risus molestie vitae. Proin iaculis quam non leo porttitor hendrerit."
strwrap(lipsum)
cat(strwrap(lipsum), sep = "\n")
# Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur ullamcorper tellus
# vitae libero placerat aliquet egestas tortor semper. Maecenas pellentesque euismod
# tristique. Donec semper interdum magna, commodo vehicula ante hendrerit vitae. Maecenas
# at diam sollicitudin magna mollis lobortis. In nibh elit, tincidunt eu lobortis ac,
# molestie a felis. Proin turpis leo, iaculis non commodo quis, venenatis at justo. Duis
# in magna vel erat fringilla gravida quis non nisl. Nunc lacus magna, varius eu luctus
# vel, luctus tristique sapien. Suspendisse mi dolor, vestibulum at facilisis elementum,
# lacinia vitae metus. Etiam ut nisl urna, vel tempus mi. In hac habitasse platea
# dictumst. Quisque pretium volutpat felis, nec tempor diam faucibus at. Praesent
# volutpat posuere sapien, eu vulputate risus molestie vitae. Proin iaculis quam non leo
# porttitor hendrerit.
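To use this on legend labels, each label can be wrapped separately and its pieces rejoined with newlines. This is only a sketch: the helper name wrap_label and the width of 15 are arbitrary choices for illustration.

```r
# Hypothetical helper: wrap every element of x and rejoin its lines with "\n"
wrap_label <- function(x, width = 20) {
  vapply(
    x,
    function(one) paste(strwrap(one, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

labels <- paste(1:3, "A really really long string!")
wrapped <- wrap_label(labels, width = 15)
cat(wrapped[1], "\n")
```

The wrapped vector can then be used as the grouping variable in place of the original labels, so each legend entry spans several short lines.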

Try this. I wrote this for very long titles but it works for any long string.
You still have to figure out the linelength for your instance.
# splits the title of a plot if it is too long
splittitle=function(title,linelength=40)
{
spltitle<-strsplit(title,' ')
splt<-as.data.frame(spltitle)
title2<-NULL
title3<-NULL
titlelength<-round(nchar(title)/round(nchar(title)/linelength))
dimsplt<-dim(splt)
n=1
doonce2=0
for(m in 1:round(nchar(title)/linelength)){
doonce=0
doonce2=0
for(l in n:dimsplt[1]){
if(doonce==0){title2<-title3}
title2=paste(title2,splt[l,],sep=' ')
if(doonce2==0){if(nchar(title2)>=(titlelength*m)){title3=paste(title2,'\n',sep='')
n<-(l+1)
doonce2=1}
}
doonce=1
}
}
title2
}
lab <- "A really really long string!A really really long string!A really really long string!A really really long string!A really really long string!A really really long string!A really really long string!A really really long string!"
lab2<-splittitle(lab)
cat(lab)
cat(lab2)
library('ggplot2')
# theme() is used below because current ggplot2 no longer provides opts()
# 1: original labels
dat <- data.frame(grp = paste(1:6,lab),x=1:6,y=runif(6))
ggplot(dat,aes(x=x,y=y,colour=grp)) + geom_point() +
theme(legend.position="bottom",legend.direction="horizontal")
# 2: labels wrapped with splittitle
dat <- data.frame(grp = paste(1:6,lab2),x=1:6,y=runif(6))
ggplot(dat,aes(x=x,y=y,colour=grp)) + geom_point() +
theme(legend.position="bottom",legend.direction="horizontal")

The earlier mentioned splittitle almost works, but for example
> splittitle("abc defg hi jkl m", 6)
[1] " abc defg\n hi\n jkl m"
does not really give you what you want...
One trick is to use RGraphics::splitString which
"Splits a single string into multiple lines (by inserting line breaks)
so that the output will fit within the current viewport."
Then you just change the viewport temporarily. The function below did the trick for me, but it is only a quick-and-dirty solution. I used it to wrap a legend title.
library(RGraphics)
multiLines <- function(text, maxWidth=11) {
textLen = nchar(text)
maxHeight = ceiling(textLen/maxWidth)*1.5
vp=viewport(width=maxWidth,height=maxHeight, default.units="char")
pushViewport(vp) #activate the viewport
text2 = splitString(text) #given vp, split the text
popViewport() #get rid of it
return(text2)
}

Splitting a string in ASP Classic

So here's my string:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam elit lacus, dignissim quis laoreet non, cursus id eros. Etiam lacinia tortor vel purus eleifend accumsan. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Quisque bibendum vestibulum nisl vitae volutpat.
I need to split it every 100 characters (full words only) until all the characters are used up.
So we'd end up with:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam elit lacus, dignissim quis laoreet non,
and
cursus id eros. Etiam lacinia tortor vel purus eleifend accumsan. Pellentesque habitant morbi tristique
and
senectus et netus et malesuada fames ac turpis egestas. Quisque bibendum vestibulum nisl vitae volutpat.
Any ideas on the best way to do that?

Since Daniel replied with actual code similar to my description, I'm gonna go with a different suggestion. I might be one character off with count. This code prints the start/end offsets and the substrings. What YOU need to do is modify this to save the strings in an array instead:
<%
Dim LoremIpsum
LoremIpsum = "Lorem ipsum dolor sit amet....."
Response.Write LoremIpsum & "<br>"
SplitWords LoremIpsum, 100
Function SplitWords(text, maxlen)
    Dim c, i, j, l, s
    l = Len(text)
    i = 1
    j = maxlen
    Do While (j < l And Response.IsClientConnected)
        ' Walk back from position j to the nearest space
        c = Mid(text, j, 1)
        Do While (c <> " " And j > i)
            j = j - 1
            c = Mid(text, j, 1)
        Loop
        Response.Write(i & "<br>")
        Response.Write(j & "<br>")
        s = Mid(text, i, j-i)
        Response.Write(s & "<br>")
        i = j
        j = j + maxlen
    Loop
    ' Print whatever remains after the last full chunk
    Response.Write(Mid(text, i) & "<br>")
End Function
%>

First you may want to split your string with the space character as a delimiter. Then start with an empty string, iterate over each word in the array, concatenate each word to the new string until the number of words exceeds 100:
str = "Lorem ipsum ...."
words = Split(str)
stringSection = ""
wordCounter = 0
FOR EACH word IN words
stringSection = stringSection & word
wordCounter = wordCounter + 1
IF wordCounter >= 100 THEN
Response.Write(stringSection & "<BR /><BR />")
wordCounter = 0
stringSection = ""
ELSE
stringSection = stringSection & " "
END IF
NEXT
Response.Write(stringSection & "<BR /><BR />")
Note that the last Response.Write is necessary to output the final stringSection, even if it has not reached 100 words.

I needed to count the spaces as well as have it as a function...here is what I came up with...
Function wordSubstring(txtString,maxLen)
words = Split(txtString," ")
charCounter = 0
stringSection = ""
For Each word IN words
stringSection = stringSection & word
charCounter = len(stringSection)
if charCounter >= maxLen Then
wordSubstring=stringSection
exit For
else
stringSection = stringSection & " "
end If
Next
wordSubstring = stringSection
End Function
