multiline comment() attributes for functions - r

Say I have the following function:
sqrt_x = function(x) {
  sqrtx = x^0.5
  return(list("sqrtx" = sqrtx))
}
attr(sqrt_x, "comment") <- "This is a comment to be placed on two different lines"
If I type
comment(sqrt_x)
I get
[1] "This is a comment to be placed on two different lines"
What I want, however, is for the comment to be returned on two different lines (it could also be more lines and different comment elements). Any ideas appreciated.

As Andrie stated: you need to insert newline characters.
If you don't want to have to manually specify where the newlines go, then you can use strwrap to create breaks at convenient points, so that your string doesn't exceed a specified width.
msg <- strwrap("This is a comment to be placed on two different lines", width = 20)
cat(msg, sep = "\n")
# This is a comment
# to be placed on two
# different lines
A complete solution could look something like:
#Add comment as normal
comment(sqrt_x) <- "This is a comment to be placed on two different lines"
#Display using this function
multiline_comment <- function(x, width = getOption("width") - 1L)
{
  cat(strwrap(comment(x), width = width), sep = "\n")
}
multiline_comment(sqrt_x, 20)

You can use \n to insert a newline. Printing the comment with cat then shows it the way you want:
attr(sqrt_x, "comment") <- "This is a comment to be placed on two\ndifferent lines"
cat(comment(sqrt_x))
This is a comment to be placed on two
different lines

This is a bit of a hack, and maybe not what you want, but if you provide a multi-element character vector, and the lines are long enough that R's default formatting decides they should be on multiple lines, you may get what you want:
comment(sqrt_x) <- c("This is a comment ",
"to be placed on two different lines")
comment(sqrt_x)
## [1] "This is a comment "
## [2] "to be placed on two different lines"
You could use format to pad automatically:
comment(sqrt_x) <- format(c("This is a comment",
"to be placed on two different lines"),
width=50)
(As shown elsewhere, you could also use strwrap() to break up a single long string into parts.)
If you're absolutely desperate to have this and you don't like the extra spaces, you could mask the built-in comment function with something like @RichieCotton's multiline version:
comment <- function(x, width = getOption("width") - 1L) {
  cat(strwrap(base::comment(x), width = width), sep = "\n")
}
but this is probably a bad idea.
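If masking comment feels too risky, one possible alternative is to store the comment as a multi-element character vector and print it with writeLines, which writes one element per line. This is only a sketch: the helper name show_comment is made up for illustration and is not part of base R.

```r
# Toy function from the question
sqrt_x <- function(x) {
  sqrtx <- x^0.5
  list(sqrtx = sqrtx)
}

# comment<- accepts a character vector; use one element per intended line
comment(sqrt_x) <- c("This is a comment",
                     "to be placed on two different lines")

# Hypothetical helper (not in base R): print each element on its own line
show_comment <- function(x) {
  writeLines(comment(x))
  invisible(comment(x))
}

show_comment(sqrt_x)
```

This leaves base::comment untouched, so other code that relies on it keeps working.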

Related

How can I delete half of the words contained in .txt-files using R?

I aim to compare the first half of texts with the entirety of the same texts. I have already done multiple analyses using the full texts, which I simply loaded into R with the help of the readtext function (and some functions to attach variables like the session number). I used the same function(s) to load my texts again and now want to delete the second half of each text.
My idea was to count the words in each string first, which I did using:
library(stringr)
dataframe$numwords <- str_count(dataframe$text, "\\w+")
The next step would be to use a for-loop to delete half the number of "numwords" from each row in the text column. However, I don't know how to do this. Is there a better way?
My dataframe looks like this (Note: The text in my data frames contains on average about 6000 words per row.)
text                                      session_no  patient_code  numwords
I do not feel well today.                 05          2006X         6
My anxiety is getting worse. Why?         05          2007X         6
I can not do anything right, as always.   10          2006X         8
Edit: Is there a way to keep the punctuation? I am searching the text for specific ngrams. Doing this without punctuation may lead to false alarms, as the detection tool may find a match in text originally coming from two separate sentences.
With the following, we take the text column and split it into words using strsplit().
Then we use lapply() to calculate how many words make up half of each text.
Finally, we return only the first half of each text, but we lose all punctuation in the process.
lapply(strsplit(dataframe$text, split = "\\W+"), function(words) {
  half <- round(length(words) / 2, 0)
  # seq_len() avoids the 1:0 trap when half rounds down to zero
  paste(words[seq_len(half)], collapse = " ")
})
Edit
If we want to keep punctuation, then we need to make some adjustments.
Our regex now keeps the delimiter, but has the side effect of keeping some spaces as "words", so we have to remove them. We also use trimws() to remove trailing whitespace.
lapply(strsplit(dataframe$text, split = "(?<=\\W)", perl = TRUE), function(words) {
  words <- words[words != " "]
  half <- round(length(words) / 2, 0)
  # seq_len() avoids the 1:0 trap when half rounds down to zero
  new_text <- paste(words[seq_len(half)], collapse = "")
  trimws(new_text)
})
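A simpler variant of the approaches above, offered as a sketch rather than a definitive solution: split on whitespace only, so punctuation stays attached to the word it follows. The helper name first_half and the toy data frame are assumptions made for illustration. Note that this version rounds the half upwards with ceiling() and counts whitespace-separated tokens, which may differ slightly from a "\\w+" word count.

```r
# Toy data mirroring the question's 'text' column
dataframe <- data.frame(
  text = c("I do not feel well today.",
           "My anxiety is getting worse. Why?",
           "I can not do anything right, as always."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first half of the whitespace-separated tokens
first_half <- function(text) {
  vapply(strsplit(text, "\\s+"), function(tokens) {
    half <- ceiling(length(tokens) / 2)
    # head() is safe for any half, including zero
    paste(head(tokens, half), collapse = " ")
  }, character(1))
}

dataframe$first_half <- first_half(dataframe$text)
dataframe$first_half
```

Because vapply() returns a character vector, the result can be assigned straight back to a column, which a list from lapply() would not allow as cleanly.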

Wrap string at a certain length and replace wrappings with '\n'

I have a string that looks like:
str <- c("some text", "another one, that is a bit longer", "yet another one")
Now I want to wrap that string to a certain width and afterwards replace all those wrappings with '\n'.
The following code does what I want (here: wrapping at maximum width 10):
str <- sapply(str, function(x) {paste0(strwrap(x, width = 10), collapse = "\n")})
names(str) <- NULL
str
The expected output is:
# [1] "some text" "another\none, that\nis a bit\nlonger"
# [3] "yet\nanother\none"
However this code seems fairly complicated to me (given the simplicity of the question). Are there more concise options to achieve what I want?
Some more context about the why:
My string contains labels of an igraph object. I want to wrap those labels at certain lengths. But of course the number of labels must stay constant throughout the processing.
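One possibly more concise option, sketched here under the assumption that each label should stay a single string: use vapply() with USE.NAMES = FALSE, which removes the need for the separate names(str) <- NULL step and guarantees one output element per input label. The helper name wrap_labels is hypothetical.

```r
labels <- c("some text", "another one, that is a bit longer", "yet another one")

# Hypothetical helper: wrap each element and rejoin its pieces with "\n"
wrap_labels <- function(x, width) {
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

wrapped <- wrap_labels(labels, width = 10)
wrapped
```

The character(1) template makes vapply() fail loudly if any label ever produced more or fewer than one string, which protects the requirement that the number of labels stays constant.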

extracting only relevant comments from a list of comments

Continuing with my exploration into text analysis, I have encountered yet another roadblock. I understand the logic but don't know how to do it in R.
Here's what I want to do:
I have 2 CSVs: 1. contains 10,000 comments; 2. contains a list of words.
I want to select all those comments that contain any of the words in the 2nd CSV. How can I go about it?
example:
**CSV 1:**
this is a sample set
the comments are not real
this is a random set of words
hope this helps the problem case
thankyou for helping out
i have learned a lot here
feel free to comment
**CSV 2**
sample
set
comment
**Expected output:**
this is a sample set
the comments are not real
this is a random set of words
feel free to comment
Please note:
Different forms of a word should also be matched, e.g. comment and comments both count.
We can use grep after pasting together the elements of the second dataset.
v1 <- scan("file2.csv", what ="")
lines1 <- readLines("file1.csv")
grep(paste(v1, collapse="|"), lines1, value=TRUE)
#[1] "this is a sample set" "the comments are not real"
#[3] "this is a random set of words" "feel free to comment"
First create two objects called lines and words.to.match from your files. You could do it like this:
lines <- read.csv('csv1.csv', stringsAsFactors=FALSE)[[1]]
words.to.match <- read.csv('csv2.csv', stringsAsFactors=FALSE)[[1]]
Let's say they look like this:
lines <- c(
'this is a sample set',
'the comments are not real',
'this is a random set of words',
'hope this helps the problem case',
'thankyou for helping out',
'i have learned a lot here',
'feel free to comment'
)
words.to.match <- c('sample', 'set', 'comment')
You can then compute the matches with two nested *apply-functions:
matches <- mapply(
  function(words, line)
    any(sapply(words, grepl, line, fixed=TRUE)),
  list(words.to.match),
  lines
)
matched.lines <- lines[which(matches)]
What's going on here? I use mapply to compute a function over each line in lines, taking words.to.match as the other argument. Note that the cardinality of list(words.to.match) is 1. I just recycle this argument across each application. Then, inside the mapply function I call an sapply function to check whether any of the words match the line (I check for the match via grepl).
This is not necessarily the most efficient solution, but it's a bit more intelligible to me. Another way you could compute matches is:
matches <- lapply(words.to.match, grepl, lines, fixed=TRUE)
matches <- do.call("rbind", matches)
matches <- apply(matches, 2, any)
I dislike this solution because you need to do a do.call("rbind",...), which is a bit hacky.
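If the do.call("rbind", ...) step is the sticking point, one way to avoid it (a sketch, not necessarily the fastest option) is to combine the per-word logical vectors directly with Reduce() and the element-wise OR operator. It reuses the lines and words.to.match objects defined above and matches substrings, so comment also catches comments.

```r
lines <- c(
  'this is a sample set',
  'the comments are not real',
  'this is a random set of words',
  'hope this helps the problem case',
  'thankyou for helping out',
  'i have learned a lot here',
  'feel free to comment'
)
words.to.match <- c('sample', 'set', 'comment')

# One logical vector per word: TRUE where that word occurs in a line
hits <- lapply(words.to.match, grepl, x = lines, fixed = TRUE)

# Element-wise OR across the words: TRUE where any word matched
matches <- Reduce(`|`, hits)
matched.lines <- lines[matches]
matched.lines
```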

Wrapping and centering title in R

I have a long title that incorporates italics. I tried using \n to move half of the title onto a new line with limited success. I have it mostly figured out, but now I can't center the second line.
title(main=expression(paste("Inoculated \n", italic("Petunia x hybrida\n"), "`Dreams Red` mortality\n as a function of irrigation treatment" )))
I counted three "\n"s in that expression, so I guess you want 4 lines of title. That's fairly easy with nested use of plotmath atop:
title(main=expression(atop( atop(Inoculated,
italic("Petunia x hybrida")),
atop("\'Dreams Red\'"~mortality,
'as a function of irrigation treatment') )))
I was also guessing you actually didn't want backticks and so escaped single-quotes.
The task of 'writing' to a graphics device with an expression containing a number of lines not a power of 2 is a bit more complex.
You can do it with atop, which is actually for formatting math, but is frequently used for this purpose. It puts its first argument on the top and its second on the bottom, so paste as necessary.
plot(x = rnorm(10))
title(main = expression(atop(paste('Inoculated ', italic("Petunia x hybrida"),
"`Dreams Red` mortality"),
"as a function of irrigation treatment")))

Adding a newline in a substitute() expression

I'm trying to annotate a plot in ggplot with relevant data from a regression model.
I've followed the suggestions in this SO post and tried to modify the function to have a couple additional items in a newline in the plot.
This is my attempt at a new function:
lm_eqn = function(m){
eq <- substitute(italic(y) == a %.% italic(x)^b*","~~italic(r)^2~"="~r2*","~~italic(n)~"="~nn*","~~italic(p-value)~"="~pv,
list(a = format(exp(coef(m)[1]), digits = 3),
b = format(coef(m)[2], digits = 3),
r2 = format(summary(m)$r.squared, digits = 3),
nn=format(summary(m)$df[2]),
pv=format(summary(m)$coefficients[,4][2])))
as.character(as.expression(eq));
}
It produces the expected output: all in one line. But I'd like to split the text in two lines, the second one starting with italic(n)=. But if I introduce a \n, it throws an error when it finds \n. If I introduce the \n inside the quotes: "\n" then it seems to be ignored and the text remains in one line. I haven't found any reference as to how to introduce a newline in such an expression. Your kind help will be much appreciated.
Thanks.
EDIT: following a comment by @Tim, I present rewritten code and an adjusted question.
\n cannot be used in plotmath expressions. You could perhaps break the expression into two parts and use annotate to add each part where you want it. Or use atop. See the related question "Line break in expression()?".
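To make the atop suggestion concrete, here is a minimal sketch of how the function from the question might be rearranged so that the second line starts at italic(n). The name lm_eqn_two_lines and the example model on the built-in cars data are assumptions made for illustration only, and atop centers both lines, which may or may not be the layout you want.

```r
# Hypothetical two-line variant of lm_eqn(): atop() stacks its two arguments
lm_eqn_two_lines <- function(m) {
  s <- summary(m)
  eq <- substitute(
    atop(italic(y) == a %.% italic(x)^b * "," ~~ italic(r)^2 ~ "=" ~ r2,
         italic(n) ~ "=" ~ nn * "," ~~ italic(p-value) ~ "=" ~ pv),
    list(a  = format(exp(coef(m)[[1]]), digits = 3),
         b  = format(coef(m)[[2]], digits = 3),
         r2 = format(s$r.squared, digits = 3),
         nn = format(s$df[2]),
         pv = format(s$coefficients[2, 4], digits = 3))
  )
  # Return the plotmath expression as a single string
  as.character(as.expression(eq))
}

# Example model, chosen only for illustration
m <- lm(log(dist) ~ log(speed), data = cars)
label <- lm_eqn_two_lines(m)
label
```

The returned string could then be passed to a text layer that parses plotmath, for example annotate("text", ..., label = label, parse = TRUE) in ggplot2.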
