How to add text at beginning of Word document? - r

I'd like to add content at the beginning of a Word document. Important note: this document already has content, and I want to add content BEFORE the text that is already in the file. Something like:
# input file text
blah blah blah
blah blah blah
# output file text
This added paragraph1
blah blah blah
blah blah blah
I'm using the officer package in R. I'm trying to open a file, add a line at the beginning of the file, and save it under a different name:
library('officer')
sample_doc <- read_docx("inputfile.docx")
cursor_begin(sample_doc)
sample_doc <- body_add_par(sample_doc, "This added paragraph1")
print(sample_doc, target = "outputfile.docx")
Unfortunately, the cursor_begin command doesn't seem to work: the new paragraph is appended to the end of the document. I don't know if I'm reading something wrong in the documentation. Could someone give me a hint?
EDIT:
There was a suggestion below to use pos="before" to indicate where to insert the text - before or after the cursor. For example
body_add_par(sample_doc, "This added paragraph1", pos="before")
Unfortunately, this solution works only for docs with one paragraph of text. With only one paragraph of text, setting pos='before' moves the text up a line whether or not you use cursor_begin. Using this solution for more than one paragraph still gives something like:
# input file text
blah blah blah
blah blah blah
# output file text
blah blah blah
This added paragraph1
blah blah blah
so it is not the solution I'm looking for.

Actually, I think that cursor_begin is working, but maybe not the way that you think. It is selecting the first paragraph. But when you use body_add_par the default is pos="after". You need "before". Also, when you call cursor_begin you must save the result back into sample_doc.
This should work for you:
library('officer')
sample_doc <- read_docx("inputfile.docx")
sample_doc <- cursor_begin(sample_doc)
sample_doc <- body_add_par(sample_doc,
"This added paragraph1", pos="before")
print(sample_doc, target = "outputfile.docx")

Related

Scrapy - Remove comma and whitespace from getall() results

Would there be an effective way to directly remove commas from the results yielded via getall()?
As an example, the data I'm trying to retrieve is in this format:
<div>
Text 1
<br>
Text 2
<br>
Text 3
</div>
My current selector for this is:
response.xpath("//div//text()").getall()
Which does get the correct data but they come out as:
Text 1,
Text 2,
Text 3
instead of
Text 1
Text 2
Text 3
I understand that they get recognized as a list which is the reason for the commas but would there be a direct function to remove them without affecting the commas from the text itself?
I'm just going to leave the solution I used in case someone needs it:
tc = response.xpath("//div//text()").getall() #xpath selector
tcl = "".join(tc) #used to convert the list into a string
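A minimal sketch of a variation on the join above, in case the individual lines should stay separate. It assumes the text nodes come back with surrounding newlines (the `raw_nodes` list below is a hypothetical stand-in for what `getall()` returns on the sample markup), and `clean_text_nodes` is an illustrative helper name, not part of Scrapy. The commas are not in the data itself; they most likely come from how the list is displayed or exported, so joining the list into one string removes them without touching commas inside the text.

```python
def clean_text_nodes(nodes):
    """Strip surrounding whitespace from each text node and drop empty ones."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]


# Hypothetical stand-in for response.xpath("//div//text()").getall()
raw_nodes = ["\nText 1\n", "\nText 2\n", "\nText 3\n"]

cleaned = clean_text_nodes(raw_nodes)  # list of trimmed strings
joined = "\n".join(cleaned)            # one string, one item per line
print(joined)
```

Joining with "\n" rather than "" keeps each item on its own line, which matches the desired output shown in the question.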

How to create a link to a header in restructuredtext?

I have a document in reStructuredText like:
Header 1
========
and from any other point (it might be in the same 'rst' file or a different one) I want to create a hyperlink to that header, so that when a user clicks on it, they get to the page with the header Header 1.
How to do that?
I tried to put the following line in the other document (according to this documentation):
see :ref:`Header 1`
but what I get is the following:
see Header 1
without any link...
I also tried to follow this documentation:
What I put into the rst file is the following:
see `Header 1`_
and what I see is the following link:
see `Header 1`_
which does not look very nice ...
Your first link was almost correct. You need to add a label preceding the section header, separated by a blank line. See Inline markup, Cross-referencing arbitrary locations, using the :ref: role.
In your case:
.. _header-1-label-name:

Header 1
========

Some text

Here is a section reference: :ref:`header-1-label-name`.

Here is a section reference with a title: :ref:`Header 1 with a title <header-1-label-name>`.
In addition to the accepted answer: in my case the label you add (here .. _header-1-label-name:) only worked once it contained a dash; a simple .. _label: didn't. Took me a while to figure that out.

How do I stop Paw from converting my text input into dynamic values?

I paste something with :blah and :blah is converted to a dynamic value blah. How do I stop this?

XQuery - Why there is difference in result?

<Docs>
<Doc>
<Title>Electromagnetic Fields</Title>
<Info>
<Vol name="Physics"/>
<Year>2006</Year>
</Info>
<SD>
<Info>
<Para>blah blah blah.<P>blh blah blah.</P></Para>
</Info>
</SD>
<LD>
<Info>
<Para>blah blah blah.<P>blah blah blah.</P></Para>
<Para>blah blah blah.<P>blah blah blah.</P></Para>
<Para>blah blah blah.<P>emf waves blah.</P></Para>
<Para>blah blah blah.<B>emf waves</B> blah.</Para>
<Para>blah blah blah.<P>emf waves blah.</P></Para>
<Para>blah waves blah.<B>emf</B> waves blah.</Para>
<Para>emf blah blah.<I>waves blah.</I></Para>
<Para>blah blah blah.<B>emf waves</B> blah.</Para>
<Para>blah blah blah.<P><I>emf</I> waves blah.</P></Para>
</Info>
</LD>
</Doc>
</Docs>
Query 1 -
for $x in ft:search("Article", ("emf","waves"), map{'mode':='all words'})/ancestor::*:Doc
return $x/Title
I am getting 62 Hits
Query 2 -
for $x in ft:search("Article", ("emf","waves"), map{'mode':='all words'})
return $x/ancestor::*:Doc/Title
I am getting 159 Hits
Query 3 -
for $x in doc("Article")/Doc[Info[Vol/@name="Physics" and Year ge "2006" and Year le "2010"]]
[SD/Info/Para/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/text() contains text {"emf","waves"} all words or
LD/Info/Para/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/B/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/I/text() contains text {"emf","waves"} all words or
SD/Info/Para/P/U/text() contains text {"emf","waves"} all words]
return $x/Title
This results in 224 hits. In the 3rd one, I am using all the nodes which are actually present. I, B and U are for Italic, Bold and Underline the text.
Why is there this difference?
Queries 1 and 2 pretty much look the same, however the path expression in Q1 results in Doc elements. So if there are multiple matching nodes below a single Doc, that Doc will count just once in Q1, whereas each node is counted individually in Q2. This is due to the fact that the node sequence resulting from a path expression, by definition, is duplicate-free.
Q3 is different: Q1 and Q2 depend on the properties of the full-text index, whereas Q3 doesn't. If, for example, the index is case-sensitive, you'll get fewer results from it than from a contains text predicate.
So from the quoted counts, I'd assume that the text index comes up with 159 matching nodes in 62 documents, while being specified as more restrictive than a plain contains text.
Your first query searches for Doc elements which have a certain property, and returns one result for each such Doc element.
Your second query searches for nodes of any kind which have a (related) property, and returns one result for each such node.
Your third query searches for text nodes which have another (related) property.
Whenever there are Doc elements containing more than one node matching the full-text search criterion, the first and second queries will return different numbers of hits. And similarly for the third query, vis-a-vis the others.
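The difference between the first two counts can be illustrated outside XQuery. The following is only a sketch with made-up data: the `hits` list is hypothetical and stands for the nodes a full-text search might return, each paired with the Doc it belongs to.

```python
# Hypothetical full-text hits: (document id, matching node) pairs.
hits = [
    ("doc-1", "Para[3]"),
    ("doc-1", "Para[5]"),
    ("doc-2", "Para[1]"),
]

# Query 2 style: the loop runs once per matching node,
# so a Doc with two matching nodes is reported twice.
per_node = [doc_id for doc_id, _node in hits]

# Query 1 style: the path expression yields a duplicate-free
# sequence of Doc elements, so each Doc is reported once.
per_doc = list(dict.fromkeys(per_node))

print(len(per_node), len(per_doc))
```

Here three matching nodes sit in two documents, which mirrors the 159 versus 62 relationship in the question.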

displaying text from InnerText

When I try to display the InnerText of an XML element, I get something like this:
I need this spacing \r\n\r\n\r\nsecond lot of spacing\r\n\r\nMore spacing\r\n\r\n
I know you can replace \r\n with <br>, but is there no function that automatically converts the text to HTML for you? And why does it use \r and \n? Many thanks.
You can use <pre> tag - it will show the text as-is like you see it in text editor:
For example:
<pre><%=MyText%></pre>
Better practice for ASP.NET is:
<pre id="myPlaceholder" runat="server"></pre>
Then assign its value from code behind:
myPlaceholder.InnerHtml = MyText;
As for your question "why does it use \r and \n" those are carriage return and line feed characters, aka newline characters - when you have such text:
line 1
line 2
Then code reading it will give line 1\nline 2 or line 1\r\nline 2, depending on how exactly it's stored.
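If the <br> route mentioned in the question is preferred over <pre>, the conversion is small enough to write by hand. The sketch below is in Python rather than ASP.NET, purely to show the idea; `text_to_html` is a hypothetical helper name, not a built-in of any framework.

```python
import html


def text_to_html(text):
    """Escape text for HTML and turn every line break into <br>."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n first,
    # so each line break produces exactly one <br>.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape before inserting tags so the <br> elements stay intact.
    return html.escape(normalized).replace("\n", "<br>")


sample = "line 1\r\nline 2\nline 3"
print(text_to_html(sample))
```

Escaping before inserting the tags matters: doing it the other way round would turn the <br> elements into visible text.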
