replace string with a random character from a selection - r

How can I take a string and replace every instance of ".", ",", or " " (i.e. dot, comma or space) with one random character selected from c('|', ':', '#', '*')?
Say I have a string like this:
Aenean ut odio dignissim augue rutrum faucibus. Fusce posuere, tellus eget viverra mattis, erat tellus porta mi, at facilisis sem nibh non urna. Phasellus quis turpis quis mauris suscipit vulputate. Sed interdum lacus non velit. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;
To get one random character, we can treat the characters as a vector and then use the sample function to pick one. I assume I first need to search for the dot, comma or space, and then use gsub to replace all of them?

Given your clarification, try this one:
x <- c("this, is nice.", "nice, this is.")
gr <- gregexpr("[., ]", x)
regmatches(x,gr) <- lapply(lengths(gr), sample, x=c('|',':','#','*'))
x
#[1] "this|*is#nice:" "nice#|this*is:"

Here is another option with chartr. It maps characters by position, so pat needs exactly one replacement character for each of '.', ',' and ' ':
pat <- paste(sample(c('|', ':', '#', '*'), 3), collapse="")
chartr('., ', pat, x)
#[1] "this|*is*nice:" "nice|*this*is:"
data
x <- c("this, is nice.", "nice, this is.")

Related

How to extract repeated patterns from a string

I need to extract certain patterns from the text below.
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Budget 2016-2017
Curabitur dictum gravida mauris. Budget 2015-2016 mauris ut leo. Cras
viverra metus rhoncus sem
I need to get the 'Budget \d{4}-\d{4}' part of the text so it looks like:
[1] "Budget 2016-2017" "Budget 2015-2016"
You can get what you want with the following:
library(stringr)
string <- "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Budget 2016-2017 Curabitur dictum gravida mauris. Budget 2015-2016 mauris ut leo. Cras viverra metus rhoncus sem"
unlist(str_extract_all(string, 'Budget [0-9]{4}-[0-9]{4}'))
Result:
> unlist(str_extract_all(string, 'Budget [0-9]{4}-[0-9]{4}'))
[1] "Budget 2016-2017" "Budget 2015-2016"
Something close, though because the .* is greedy it keeps only the last match:
s <- "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Budget 2016-2017 Curabitur dictum gravida mauris. Budget 2015-2016 mauris ut leo. Cras viverra metus rhoncus sem"
gsub(".*(Budget [0-9]{4}-[0-9]{4}).*", "\\1", s)
[1] "Budget 2015-2016"

Is there in R something like the "here document" in bash?

My script contains the line
lines <- readLines("~/data")
I would like to keep the content of the file data (verbatim) in the script itself. Is there an R "read_the_following_lines" function? Something similar to the "here document" in the bash shell?
Multi-line strings are going to be as close as you get. It's definitely not the same (since you still have to watch out for quotes), but it works pretty well for what you're trying to achieve (and with more than just read.table):
here_lines <- 'line 1
line 2
line 3
'
readLines(textConnection(here_lines))
## [1] "line 1" "line 2" "line 3" ""
here_csv <- 'thing,val
one,1
two,2
'
read.table(text=here_csv, sep=",", header=TRUE, stringsAsFactors=FALSE)
## thing val
## 1 one 1
## 2 two 2
here_json <- '{
"a" : [ 1, 2, 3 ],
"b" : [ 4, 5, 6 ],
"c" : { "d" : { "e" : [7, 8, 9]}}
}
'
jsonlite::fromJSON(here_json)
## $a
## [1] 1 2 3
##
## $b
## [1] 4 5 6
##
## $c
## $c$d
## $c$d$e
## [1] 7 8 9
here_xml <- '<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>a
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<ZONE>3</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$9.37</PRICE>
<AVAILABILITY>030699</AVAILABILITY>
</PLANT>
</CATALOG>
'
str(xml <- XML::xmlParse(here_xml))
## Classes 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
print(xml)
## <?xml version="1.0"?>
## <CATALOG>
## <PLANT><COMMON>Bloodroot</COMMON><BOTANICAL>Sanguinaria canadensis</BOTANICAL><ZONE>4</ZONE>a
## <LIGHT>Mostly Shady</LIGHT><PRICE>$2.44</PRICE><AVAILABILITY>031599</AVAILABILITY></PLANT>
## <PLANT>
## <COMMON>Columbine</COMMON>
## <BOTANICAL>Aquilegia canadensis</BOTANICAL>
## <ZONE>3</ZONE>
## <LIGHT>Mostly Shady</LIGHT>
## <PRICE>$9.37</PRICE>
## <AVAILABILITY>030699</AVAILABILITY>
## </PLANT>
## </CATALOG>
Pages 90f. of An introduction to R state that it is possible to write R scripts like this (I quote the example modified from there):
chem <- scan()
2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70

print(chem)
Write these lines into a file, and give it the name, say, heredoc.R. If you then execute that script non-interactively by typing in your terminal
Rscript heredoc.R
you will get the following output
Read 24 items
[1] 2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
[13] 5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70
So you see that the data provided in the file are saved in the variable chem. The function scan(.) reads from the connection stdin() by default. stdin() refers to user input from the console in interactive mode (a call to R without a specified script), but when an input script is being read, the lines of the script that follow the call are read instead *). The empty line after the data is important because it marks the end of the data.
This also works with tabular data:
tab <- read.table(file=stdin(), header=T)
A B C
1 1 0
2 1 0
3 2 9

summary(tab)
When using readLines(.), you must specify the number of lines read; the approach with the empty line does not work here:
txt <- readLines(con=stdin(), n=5)
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi ultricies diam
sed felis mattis, id commodo enim hendrerit. Suspendisse iaculis bibendum eros,
ut mattis eros interdum sit amet. Pellentesque condimentum eleifend blandit. Ut
commodo ligula quis varius faucibus. Aliquam accumsan tortor velit, et varius
sapien tristique ut. Sed accumsan, tellus non iaculis luctus, neque nunc
print(txt)
You can overcome this limitation by reading one line at a time until a line is empty or matches some other predefined string. Note, however, that you may run out of memory if you read a large (>100 MB) file this way, because each time you append a string to the data read so far, all of the data is copied to another place in memory. See the chapter "Growing objects" in The R Inferno:
txt <- c()
repeat {
  x <- readLines(con = stdin(), n = 1)
  if (x == "") break  # you can use any EOF string you want here
  txt <- c(txt, x)
}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi ultricies diam
sed felis mattis, id commodo enim hendrerit. Suspendisse iaculis bibendum eros,
ut mattis eros interdum sit amet. Pellentesque condimentum eleifend blandit. Ut
commodo ligula quis varius faucibus. Aliquam accumsan tortor velit, et varius
sapien tristique ut. Sed accumsan, tellus non iaculis luctus, neque nunc
print(txt)
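To sidestep the growing-object problem mentioned above, a common pattern is to collect the lines in a list and flatten it once at the end, which avoids repeatedly copying the whole character vector. A rough sketch:
buf <- list()
i <- 0
repeat {
  x <- readLines(con = stdin(), n = 1)
  if (length(x) == 0 || x == "") break  # stop at end of input or an empty line
  i <- i + 1
  buf[[i]] <- x
}
txt <- unlist(buf)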
*) If you want to read from standard input in an R script, for example because you want to create a reusable script that can be called with any input data (Rscript reusablescript.R < input.txt or some-data-generating-command | Rscript reusablescript.R), use file("stdin") rather than stdin().
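For instance, a minimal reusablescript.R along those lines might look like this (the file name and output are purely illustrative):
input <- readLines(file("stdin"))
cat("Read", length(input), "lines\n")
print(input)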
Since R 4.0.0 there is a new syntax for raw strings, as noted in the changelog, which largely allows heredoc-style documents to be created.
Additionally, from help(Quotes):
The delimiter pairs [] and {} can also be used, and R can be used in place of r. For additional flexibility, a number of dashes can be placed between the opening quote and the opening delimiter, as long as the same number of dashes appear between the closing delimiter and the closing quote.
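A few quick illustrations of those forms; each evaluates to an ordinary character string:
r"(backslashes and "quotes" need no escaping here: \n stays literal)"
r"{braces work as delimiters too}"
r"---[dashes let the string contain )" without ending it]---"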
As an example, one can use it to write out a shell script (on a system with a bash shell):
file_raw_string <-
r"(#!/bin/bash
echo $#
for word in "$@";
do
echo "This is the word: '${word}'."
done
exit 0
)"
writeLines(file_raw_string, "print_words.sh")
system("bash print_words.sh Word/1 w#rd2 LongWord composite-word")
or even another R script:
file_raw_string <- r"(
x <- lapply(mtcars[,1:4], mean)
cat(
paste(
"Mean for column", names(x), "is", format(x,digits = 2),
collapse = "\n"
)
)
cat("\n")
cat(r"{ - This is a raw string where \n, "", '', /, \ are allowed.}")
)"
writeLines(file_raw_string, "print_means.R")
source("print_means.R")
#> Mean for column mpg is 20
#> Mean for column cyl is 6.2
#> Mean for column disp is 231
#> Mean for column hp is 147
#> - This is a raw string where \n, "", '', /, \ are allowed.
Created on 2021-08-01 by the reprex package (v2.0.0)
A way to get multi-line strings without worrying about quotes (only backticks) is:
as.character(quote(`
all of the crazy " ' ) characters, except
backtick and bare backslashes that aren't
printable, e.g. \n works but a \ and c with no space between them would fail`))
What about some more recent tidyverse syntax?
SQL <- c("
SELECT * FROM patient
LEFT OUTER JOIN projectpatient ON patient.patient_id = projectpatient.patient_id
WHERE projectpatient.project_id = 16;
") %>% stringr::str_replace_all("[\r\n]"," ")

Deleting lots of characters from a list in R

I have a list of characters with sentences. I have about 10000+ lines. I want to delete 1000+ words from it. So I have a character vector with the words to be deleted. I am using the approach as follows:
c <- gsub(pattern = wordsToBeDeleted, replacement = "", x = mainList)
This only uses the first word of the vector. How can I get this done?
gsub only takes one pattern at a time, but you could combine it with Reduce:
#sample data
sentences<-c(
"Morbi in tempus metus, quis commodo eros",
"Cum sociis natoque penatibus et magnis dis parturient montes",
"Nulla diam quam, imperdiet vitae blandit eu",
"Nullam nec pellentesque sapien, ac mollis mauris")
words<-c("quis","eros","diam","nec")
Now we loop over all the words, removing them from the sentences:
Reduce(function(a, b) gsub(b, "", a, fixed = TRUE), words, sentences)
which gives us
[1] "Morbi in tempus metus, commodo "
[2] "Cum sociis natoque penatibus et magnis dis parturient montes"
[3] "Nulla quam, imperdiet vitae blandit eu"
[4] "Nullam pellentesque sapien, ac mollis mauris"
How about trying this recipe:
sentences = tolower(c("I don't like you.", "But I do like this."))
dropWords = tolower(c("I", "like"))
splitSentences = strsplit(sentences, " ")
purged = lapply(X=splitSentences, FUN=setdiff, y=dropWords)
purged
[[1]]
[1] "don't" "you."
[[2]]
[1] "but" "do" "this."
I also recommend using tolower there since it will take care of case differences.
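If you want the result back as sentences rather than vectors of words, paste the pieces together afterwards (a small follow-up sketch):
sapply(purged, paste, collapse = " ")
[1] "don't you."   "but do this."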

extract semi-structured text from Word documents

I want to text-mine a set of files based on the form below. I can create a corpus where each file is a document (using tm), but I'm thinking it might be better to create a corpus where each section of the second table on the form is a document with the following metadata:
Author : John Smith
DateTimeStamp: 2013-04-18 16:53:31
Description :
Heading : Current Focus
ID : Smith-John_e.doc Current Focus
Language : en_CA
Origin : Smith-John_e.doc
Name : John Smith
Title : Manager
TeamMembers : Joe Blow, John Doe
GroupLeader : She who must be obeyed
where Name, Title, TeamMembers and GroupLeader are extracted from the first table on the form. In this way, each chunk of text to be analyzed would maintain some of its context.
What is the best way to approach this? I can think of 2 ways:
somehow parse the corpus I have into child corpora.
somehow parse the document into subdocuments and make a corpus from those.
Any pointers would be much appreciated.
This is the form:
Here is an RData file of a corpus with 2 documents. exc[[1]] came from a .doc and exc[[2]] came from a docx. They both used the form above.
Here's a quick sketch of a method, hopefully it might provoke someone more talented to stop by and suggest something more efficient and robust... Using the RData file in your question, I found that the doc and docx files have slightly different structures and so require slightly different approaches (though I see in the metadata that your docx is 'fake2.txt', so is it really docx? I see in your other Q that you used a converter outside of R, that must be why it's txt).
library(tm)
First get custom metadata for the doc file. I'm no regex expert, as you can see, but roughly it's: strip the punctuation, remove the field label (e.g. "Name"), then trim the leading and trailing spaces...
# create User-defined local meta data pairs
meta(exc[[1]], type = "corpus", tag = "Name1") <- gsub("^\\s+|\\s+$","", gsub("Name", "", gsub("[[:punct:]]", '', exc[[1]][3])))
meta(exc[[1]], type = "corpus", tag = "Title") <- gsub("^\\s+|\\s+$","", gsub("Title", "", gsub("[[:punct:]]", '', exc[[1]][4])))
meta(exc[[1]], type = "corpus", tag = "TeamMembers") <- gsub("^\\s+|\\s+$","", gsub("Team Members", "", gsub("[[:punct:]]", '', exc[[1]][5])))
meta(exc[[1]], type = "corpus", tag = "ManagerName") <- gsub("^\\s+|\\s+$","", gsub("Name of your", "", gsub("[[:punct:]]", '', exc[[1]][7])))
Now have a look at the result
# inspect
meta(exc[[1]], type = "corpus")
Available meta data pairs are:
Author :
DateTimeStamp: 2013-04-22 13:59:28
Description :
Heading :
ID : fake1.doc
Language : en_CA
Origin :
User-defined local meta data pairs are:
$Name1
[1] "John Doe"
$Title
[1] "Manager"
$TeamMembers
[1] "Elise Patton Jeffrey Barnabas"
$ManagerName
[1] "Selma Furtgenstein"
Do the same for the docx file
# create User-defined local meta data pairs
meta(exc[[2]], type = "corpus", tag = "Name2") <- gsub("^\\s+|\\s+$","", gsub("Name", "", gsub("[[:punct:]]", '', exc[[2]][2])))
meta(exc[[2]], type = "corpus", tag = "Title") <- gsub("^\\s+|\\s+$","", gsub("Title", "", gsub("[[:punct:]]", '', exc[[2]][4])))
meta(exc[[2]], type = "corpus", tag = "TeamMembers") <- gsub("^\\s+|\\s+$","", gsub("Team Members", "", gsub("[[:punct:]]", '', exc[[2]][6])))
meta(exc[[2]], type = "corpus", tag = "ManagerName") <- gsub("^\\s+|\\s+$","", gsub("Name of your", "", gsub("[[:punct:]]", '', exc[[2]][8])))
And have a look
# inspect
meta(exc[[2]], type = "corpus")
Available meta data pairs are:
Author :
DateTimeStamp: 2013-04-22 14:06:10
Description :
Heading :
ID : fake2.txt
Language : en
Origin :
User-defined local meta data pairs are:
$Name2
[1] "Joe Blow"
$Title
[1] "Shift Lead"
$TeamMembers
[1] "Melanie Baumgartner Toby Morrison"
$ManagerName
[1] "Selma Furtgenstein"
If you have a large number of documents, wrapping these meta() calls in a function and applying it over the documents (with lapply or a loop) would be the way to go, as sketched below.
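A rough sketch of that idea, reusing the same calls as above (the line positions and tag names are assumptions carried over from the hand-written version, and they differ between the doc and docx layouts):
for (i in seq_along(exc)) {
  doc <- exc[[i]]
  meta(exc[[i]], type = "corpus", tag = "Name") <-
    gsub("^\\s+|\\s+$", "", gsub("Name", "", gsub("[[:punct:]]", "", doc[3])))
  meta(exc[[i]], type = "corpus", tag = "Title") <-
    gsub("^\\s+|\\s+$", "", gsub("Title", "", gsub("[[:punct:]]", "", doc[4])))
  # ...and likewise for TeamMembers and ManagerName...
}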
Now that we've got the custom metadata, we can subset the documents to exclude that part of the text:
# create new corpus that excludes part of doc that is now in metadata. We just use square bracket indexing to subset the lines that are the second table of the forms (slightly different for each doc type)
excBody <- Corpus(VectorSource(c(paste(exc[[1]][13:length(exc[[1]])], collapse = ","),
paste(exc[[2]][9:length(exc[[2]])], collapse = ","))))
# get rid of all the white spaces
excBody <- tm_map(excBody, stripWhitespace)
Have a look:
inspect(excBody)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID
[[1]]
|CURRENT RESEARCH FOCUS |,| |,|Lorem ipsum dolor sit amet, consectetur adipiscing elit. |,|Donec at ipsum est, vel ullamcorper enim. |,|In vel dui massa, eget egestas libero. |,|Phasellus facilisis cursus nisi, gravida convallis velit ornare a. |,|MAIN AREAS OF EXPERTISE |,|Vestibulum aliquet faucibus tortor, sed aliquet purus elementum vel. |,|In sit amet ante non turpis elementum porttitor. |,|TECHNOLOGY PLATFORMS, INSTRUMENTATION EMPLOYED |,| Vestibulum sed turpis id nulla eleifend fermentum. |,|Nunc sit amet elit eu neque tincidunt aliquet eu at risus. |,|Cras tempor ipsum justo, ut blandit lacus. |,|INDUSTRY PARTNERS (WITHIN THE PAST FIVE YEARS) |,| Pellentesque facilisis nisl in libero scelerisque mattis eu quis odio. |,|Etiam a justo vel sapien rhoncus interdum. |,|ANTICIPATED PARTICIPATION IN PROGRAMS, EITHER APPROVED OR UNDER DEVELOPMENT |,|(Please include anticipated percentages of your time.) |,| Proin vitae ligula quis enim vulputate sagittis vitae ut ante. |,|ADDITIONAL ROLES, DISTINCTIONS, ACADEMIC QUALIFICATIONS AND NOTES |,|e.g., First Aid Responder, Other languages spoken, Degrees, Charitable Campaign |,|Canvasser (GCWCC), OSH representative, Social Committee |,|Sed nec tellus nec massa accumsan faucibus non imperdiet nibh. |,,
[[2]]
CURRENT RESEARCH FOCUS,,* Lorem ipsum dolor sit amet, consectetur adipiscing elit.,* Donec at ipsum est, vel ullamcorper enim.,* In vel dui massa, eget egestas libero.,* Phasellus facilisis cursus nisi, gravida convallis velit ornare a.,MAIN AREAS OF EXPERTISE,* Vestibulum aliquet faucibus tortor, sed aliquet purus elementum vel.,* In sit amet ante non turpis elementum porttitor. ,TECHNOLOGY PLATFORMS, INSTRUMENTATION EMPLOYED,* Vestibulum sed turpis id nulla eleifend fermentum.,* Nunc sit amet elit eu neque tincidunt aliquet eu at risus.,* Cras tempor ipsum justo, ut blandit lacus.,INDUSTRY PARTNERS (WITHIN THE PAST FIVE YEARS),* Pellentesque facilisis nisl in libero scelerisque mattis eu quis odio.,* Etiam a justo vel sapien rhoncus interdum.,ANTICIPATED PARTICIPATION IN PROGRAMS, EITHER APPROVED OR UNDER DEVELOPMENT ,(Please include anticipated percentages of your time.),* Proin vitae ligula quis enim vulputate sagittis vitae ut ante.,ADDITIONAL ROLES, DISTINCTIONS, ACADEMIC QUALIFICATIONS AND NOTES,e.g., First Aid Responder, Other languages spoken, Degrees, Charitable Campaign Canvasser (GCWCC), OSH representative, Social Committee,* Sed nec tellus nec massa accumsan faucibus non imperdiet nibh.,,
Now the documents are ready for text mining, with the data from the upper table moved out of the document and into the document metadata.
Of course all of this depends on the documents being highly regular. If there are different numbers of lines in the first table in each doc, then the simple indexing method might fail (give it a try and see what happens) and something more robust will be needed.
UPDATE: A more robust method
Having read the question a little more carefully, and having learned a bit more about regex, here's a method that is more robust and doesn't depend on indexing specific lines of the documents. Instead, we use regular expressions to extract the text between two words to build the metadata and split the document.
Here's how we make the User-defined local meta data (a method to replace the one above)
library(gdata) # for the trim function
txt <- paste0(as.character(exc[[1]]), collapse = ",")
# inspect the document to identify the words on either side of the string
# we want, so 'Name' and 'Title' are on either side of 'John Doe'
extract <- regmatches(txt, gregexpr("(?<=Name).*?(?=Title)", txt, perl=TRUE))
meta(exc[[1]], type = "corpus", tag = "Name1") <- trim(gsub("[[:punct:]]", "", extract))
extract <- regmatches(txt, gregexpr("(?<=Title).*?(?=Team)", txt, perl=TRUE))
meta(exc[[1]], type = "corpus", tag = "Title") <- trim(gsub("[[:punct:]]","", extract))
extract <- regmatches(txt, gregexpr("(?<=Members).*?(?=Supervised)", txt, perl=TRUE))
meta(exc[[1]], type = "corpus", tag = "TeamMembers") <- trim(gsub("[[:punct:]]","", extract))
extract <- regmatches(txt, gregexpr("(?<=your).*?(?=Supervisor)", txt, perl=TRUE))
meta(exc[[1]], type = "corpus", tag = "ManagerName") <- trim(gsub("[[:punct:]]","", extract))
# inspect
meta(exc[[1]], type = "corpus")
Available meta data pairs are:
Author :
DateTimeStamp: 2013-04-22 13:59:28
Description :
Heading :
ID : fake1.doc
Language : en_CA
Origin :
User-defined local meta data pairs are:
$Name1
[1] "John Doe"
$Title
[1] "Manager"
$TeamMembers
[1] "Elise Patton Jeffrey Barnabas"
$ManagerName
[1] "Selma Furtgenstein"
Similarly we can extract the sections of your second table into separate
vectors and then you can make them into documents and corpora or just work
on them as vectors.
txt <- paste0(as.character(exc[[1]]), collapse = ",")
CURRENT_RESEARCH_FOCUS <- trim(gsub("[[:punct:]]","", regmatches(txt, gregexpr("(?<=CURRENT RESEARCH FOCUS).*?(?=MAIN AREAS OF EXPERTISE)", txt, perl=TRUE))))
[1] "Lorem ipsum dolor sit amet consectetur adipiscing elit Donec at ipsum est vel ullamcorper enim In vel dui massa eget egestas libero Phasellus facilisis cursus nisi gravida convallis velit ornare a"
MAIN_AREAS_OF_EXPERTISE <- trim(gsub("[[:punct:]]","", regmatches(txt, gregexpr("(?<=MAIN AREAS OF EXPERTISE).*?(?=TECHNOLOGY PLATFORMS, INSTRUMENTATION EMPLOYED)", txt, perl=TRUE))))
[1] "Vestibulum aliquet faucibus tortor sed aliquet purus elementum vel In sit amet ante non turpis elementum porttitor"
And so on. I hope that's a bit closer to what you're after. If not, it might be best to break down your task into a set of smaller, more focused questions, and ask them separately (or wait for one of the gurus to stop by this question!).

two column beamer/sweave slide with grid graphic

I'm trying to make a presentation on ggplot2 graphics using beamer + Sweave. Some slides should have two columns: the left one for the code, the right one for the resulting graphic. Here's what I tried:
\documentclass[xcolor=dvipsnames]{beamer}
\usepackage{/Library/Frameworks/R.framework/Resources/share/texmf/tex/latex/Sweave}
\usepackage[english]{babel}
\usepackage{tikz}
\usepackage{amsmath,amssymb}% AMS standards
\usepackage{listings}
\usetheme{Madrid}
\usecolortheme{dove}
\usecolortheme{rose}
\SweaveOpts{pdf=TRUE, echo=FALSE, fig=FALSE, eps=FALSE, tidy=T, width=4, height=4}
\title{Reproducible data analysis with \texttt{ggplot2} \& \texttt{R}}
\subtitle{subtitle}
\author{Baptiste Augui\'e}
\date{\today}
\institute{Here}
\begin{document}
\begin{frame}[fragile]
\frametitle{Some text to show the space taken by the title}
\begin{columns}[t] \column{0.5\textwidth}
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
\column{0.5\textwidth}
\begin{figure}[!ht]
\centering
<<fig=TRUE>>=
grid.rect(gp=gpar(fill="slateblue"))
@
\end{figure}
\end{columns}
\end{frame}
\begin{frame}[fragile]
\frametitle{Some text to show the space taken by the title}
\begin{columns}[t]
\column{0.5\textwidth}
<<echo=TRUE,fig=FALSE>>=
library(ggplot2)
p <-
qplot(mpg, wt, data=mtcars, colour=cyl) +
theme_grey(base_family="Helvetica")
@
\column{0.5\textwidth}
\begin{figure}[!ht]
\centering
<<fig=TRUE>>=
print(p)
@
\end{figure}
\end{columns}
\end{frame}
\end{document}
And the two pages of output.
I have two issues with this output:
the echoed Sweave code ignores the columns environment and spans both columns
the column margins around either graphic are unnecessarily wide
Any ideas?
Thanks.
As for the first issue, the easy way is to set keep.source=TRUE in SweaveOpts. For fancier control, see fancyvrb and FAQ #9 of the Sweave manual.
The width of the figure can be set by \setkeys{Gin}{width=1.0\textwidth}
Here is a slight modification:
... snip ...
\SweaveOpts{pdf=TRUE, echo=FALSE, fig=FALSE, eps=FALSE, tidy=T, width=4, height=4, keep.source=TRUE}
\title{Reproducible data analysis with \texttt{ggplot2} \& \texttt{R}}
... snip ...
\begin{document}
\setkeys{Gin}{width=1.1\textwidth}
... snip...
<<echo=TRUE,fig=FALSE>>=
library(ggplot2)
p <-
qplot(mpg,
wt,
data=mtcars,
colour=cyl) +
theme_grey(base_family=
"Helvetica")
@

Resources