span across columns with hwrite - r

Is it possible to span a heading across multiple columns with hwrite (or any other HTML-creating package)? I can sort of fake it with dataframe pieces nested within a larger table, but it's not quite a real span (and it looks ugly).
I did not see a version of this in the examples, but maybe one exists elsewhere.
Thanks,
Tom

Edit: I should add that the print.xtable method does html, also (I shouldn't assume that is known). Use the type = "html" option.
No experience with html, but I do the following with LaTeX.
In the xtable package, the print.xtable method has an add.to.row option that allows you to do just that. For add.to.row you supply a list of two components: the first is a list of row positions, and the second is a character vector of commands to insert at those positions. From ?print.xtable:
add.to.row -- a list of two components. The first component (which should be called 'pos') is a list contains the position of rows on which extra commands should be added at the end, The second component (which should be called 'command') is a character vector of the same length of the first component which contains the command that should be added at the end of the specified rows. Default value is NULL, i.e. do not add commands.
For LaTeX I use the following homemade function, which adds a "(1)", "(2)", ... label spanning each pair of coefficient and t-stat columns.
my.add.to.row <- function(x) {
  first <- "\\hline \\multicolumn{1}{c}{} & "
  middle <- paste(paste("\\multicolumn{2}{c}{(", seq(x), ")}", sep = ""), collapse = " & ")
  last <- paste("\\\\ \\cline {", 2, "-", 1 + 2 * x, "}", collapse = "")
  string <- paste(first, middle, last, collapse = "")
  list(pos = list(-1), command = string)
}
HTH.

I can't see an obvious way of generating a table with headers that cross multiple columns. Here's a really awful hack that might solve your problem though.
Generate your table as normal.
In the source code for that page, the first row of the table will look something like
<td someattribute="somevalue">First column name</td><td someattribute="somevalue">Second column name</td>
You can read the file into R, either with htmlTreeParse from the XML package, or plain old readLines.
Now replace the offending bit of html with the correct value. The stringr package may well help here.
<td someattribute="somevalue" colspan="2">Column name spanning two columns</td>
And write back out to file.
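A minimal sketch of the replacement step, using only base R. The table markup, the class attribute, and the header text are made up for illustration; a real hwrite or xtable page will carry different attributes, so the search string has to be copied from the actual file.

```r
# Hypothetical page source; in practice this would be html <- readLines("table.html")
html <- c(
  "<table>",
  "<tr><td class=\"head\">Estimate</td><td class=\"head\">t value</td></tr>",
  "<tr><td>1.25</td><td>3.10</td></tr>",
  "</table>"
)

# The two header cells to merge, exactly as they appear in the source
old_cells <- "<td class=\"head\">Estimate</td><td class=\"head\">t value</td>"
# One cell spanning both columns
new_cell <- "<td class=\"head\" colspan=\"2\">Model 1</td>"

# fixed = TRUE treats the pattern as literal text, not as a regular expression
patched <- sub(old_cells, new_cell, html, fixed = TRUE)

# Write the result back out (to a temporary file here)
out_file <- tempfile(fileext = ".html")
writeLines(patched, out_file)
```

Matching literally avoids having to escape the angle brackets and quotes, but it also means the search string must match the file character for character.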

Related

Use substr with start and stop words, instead of integers

I want to extract information from downloaded HTML code. The HTML code is given as a string. The required information is stored in between specific HTML expressions. For example, if I want to have every headline in the string, I have to search for "H1>" and "/H1>" and the text between these HTML expressions.
So far, I used substr(), but I had to calculate the position of "H1>" and "/H1>" first.
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
startposition = c(21,55) # calculated with gregexpr
stopposition = c(28, 63) # calculated with gregexpr
substr(htmlcode, startposition[1], stopposition[1])
substr(htmlcode, startposition[2], stopposition[2])
The output is correct, but calculating every single start and stop position is a lot of work. Instead, I am looking for a function similar to substr() where you can use start and stop words instead of positions. For example, like this:
function(htmlcode, startword = "H1>", stopword = "/H1>")
I'd agree that using a package built for html processing is probably the best way to handle the example you give. However, one potential way to sub-string a string based on character values would be to do the following.
Step 1: Define a simple function to return the positions of a character string within a string. In this example I am only using fixed character strings.
strpos_fixed <- function(string, char) {
  # start positions of every literal (non-regex) match of char
  a <- gregexpr(char, string, fixed = TRUE)
  as.integer(a[[1]])
}
Step 2: Define your new sub-string function using the strpos_fixed() function you just defined.
char_substr <- function(string, start, stop) {
  x <- strpos_fixed(string, start) + nchar(start)  # first character after each start word
  y <- strpos_fixed(string, stop) - 1              # last character before each stop word
  z <- cbind(x, y)
  apply(z, 1, function(x) substr(string, x[1], x[2]))
}
Step 3: Test
htmlcode = " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
htmlcode2 = " some html code <H1>baa dee ya</H1> some other code <H1>say do you remember?</H1>"
htmlcode3<- "<x>baa dee ya</x> skdjalhgfjafha <x>dancing in september</x>"
char_substr(htmlcode,"<H1>","</H1>")
char_substr(htmlcode2,"<H1>","</H1>")
char_substr(htmlcode3,"<x>","</x>")
You have two options here. First, use a package that has been developed explicitly for the parsing of HTML structures, e.g., rvest. There are a number of tutorials online.
Second, for edge cases where you may need to extract from strings that are not necessarily well-formatted HTML you should use regular expressions. One of the simpler implementations for this comes from stringr::str_match:
library(stringr)
# 1. the parentheses define regex groups
# 2. ".*?" means any character, non-greedy
# 3. so together we are matching the expression <H1>some text or characters of any length</H1>
str_match(htmlcode, "(<H1>)(.*?)(</H1>)")
This will yield a matrix where the columns are (in order) the fully matched string followed by each regex group we specified. To get the text between the <H1> tags, pull the second group, which is the 3rd column. Note that str_match returns only the first match in each string; str_match_all returns every match.
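For comparison, here is a sketch of the same start-word/stop-word idea in base R that returns every match in one call. The function name extract_between and its argument names are hypothetical, and it assumes the text between the delimiters does not contain a line break.

```r
# Return every piece of text found between start_word and stop_word.
extract_between <- function(text, start_word, stop_word) {
  # \\Q...\\E quotes the delimiters so they are matched literally;
  # .*? is the non-greedy part in between
  pattern <- paste0("\\Q", start_word, "\\E.*?\\Q", stop_word, "\\E")
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # Strip the delimiters from both ends of each hit
  substr(hits, nchar(start_word) + 1, nchar(hits) - nchar(stop_word))
}

htmlcode <- " some html code <H1>headline</H1> some other code <H1>headline2</H1> "
headlines <- extract_between(htmlcode, "<H1>", "</H1>")
headlines
# [1] "headline"  "headline2"
```

When nothing matches, the function returns an empty character vector, so the result can be used in a loop without a special case.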

Method to paste whisker template

I am writing a program that renders whisker templates in a loop, and I want to paste the results together. For every template, my code looks like this:
script[i] <- whisker.render(template_pdf[i], data = parameter)
Is there a way to paste all the script[i] elements together? I know I could paste all the chunks first and then call whisker.render only once, but that would cause some trouble in my particular case. Pasting all the script[i] together would be more convenient for me.
When script is a character vector you can do
paste(script, collapse = "\n")
with \n (newline) the character inserted between the script elements.
When script is a list, you can do
do.call(paste, c(script, sep ="\n"))
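A small self-contained check of both variants. The strings below are stand-ins for rendered templates, so whisker itself is not needed here; the unlist() line is an extra alternative for the list case, not part of the answer above.

```r
# Stand-ins for the rendered templates
script_vec <- c("line one", "line two", "line three")
script_list <- list("line one", "line two", "line three")

# Character vector: collapse joins the elements into one string
joined_vec <- paste(script_vec, collapse = "\n")

# List: pass each element as a separate argument to paste
joined_list <- do.call(paste, c(script_list, sep = "\n"))

# Alternative for a list of single strings: flatten first, then collapse
joined_alt <- paste(unlist(script_list), collapse = "\n")

cat(joined_vec, "\n")
```

All three results are the same single string, with a newline between the pieces.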

R indent output

is it possible to indent output in R?
e.g.
cat("text1\n")
indent.switch(indent=4)
cat("random text\n")
print("another random text")
indent.switch(indent=0)
cat("text2\n")
resulting in
text1
    random text
    another random text
text2
I searched for this a few months ago, found nothing and am now searching again.
My current idea is to "overwrite" (I forgot the special term) the functions cat and/or print with an additional argument like:
cat("random text", indent=4)
Only I'm stuck with this and I don't like this procedure very much.
Any ideas?
Edit:
I should be more specific; nevertheless, thank you for the \t (omg, I totally forgot this -.-) and for pointing out that I can format it inside cat.
The given solutions work, but they only solve my second-choice path.
A switch as shown in my first code example does not exist, I suppose?
My problem is that I have parts of a bigger program which have multiple subscripts, and the output of each subscript should be indented. This is absolutely possible with "\t" or just blanks inside cat(), but it has to be done in every command, which I don't like very much.
Solution
I used Chris C's code and extended it in a very easy way. (Thank you very much Chris!)
define.catt <- function(ntab = NULL, nspace = NULL){
  catt <- function(input = NULL){
    if(!is.null(ntab)) cat(paste0(paste(rep("\t", ntab), collapse = ""), input))
    if(!is.null(nspace)) cat(paste0(paste(rep(" ", nspace), collapse = ""), input))
    if(is.null(ntab) && is.null(nspace)) cat(input)
  }
  return(catt)
}
The same way you used \n to print a newline, you can use \t to print a tab.
E.g.
cat("Parent level \n \t Child level \n \t \t Double Child \n \t Child \n Parent level")
Evaluates to
Parent level
         Child level
                 Double Child
         Child
 Parent level
As an alternative, you can create a derivative of cat called catt and alter options depending on the script. For example.
define.catt <- function(ntab = NULL){
  catt <- function(input = NULL){
    cat(paste0(paste(rep("\t", ntab), collapse = ""), input))
  }
  return(catt)
}
You would then set catt with however many tabs you wanted by
catt <- define.catt(ntab = 1)
catt("hi")
        hi
catt <- define.catt(ntab = 2)
catt("hi")
                hi
And just use catt() instead of cat().
You may consider the very versatile function capture.output(...), which evaluates the '...' list of expressions provided as main input arguments, and stores the text output (as if it would be displayed in the console) into a character vector instead. Then, you simply have to modify the strings as desired: here you want to add some leading spaces to each string. Finally, you write the strings to the console.
These can be done all in one line of nested calls. For example:
writeLines(paste("    ", capture.output(print(head(iris))), sep=""))
I recommend reading the help page for capture.output and trying it for other purposes. Because its main input is the flexible '...' argument list, you can include, for instance, a call to a home-made function, and thus do almost anything. The indentation itself is simply done with paste once capture.output has done its work.
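The capture.output idea can be wrapped so that a whole block of cat() and print() calls is indented at once, which is close to the switch asked for in the question. This is only a sketch: with_indent is a hypothetical helper name, and it indents standard output only, not messages or warnings.

```r
# Evaluate expr, capture what it prints, and re-emit it indented.
with_indent <- function(expr, indent = 4) {
  # expr is evaluated lazily, inside capture.output, so its output is captured
  lines <- capture.output(expr)
  indented <- paste0(strrep(" ", indent), lines)
  writeLines(indented)
  invisible(indented)
}

cat("text1\n")
with_indent({
  cat("random text\n")
  print("another random text")
}, indent = 4)
cat("text2\n")
```

The output appears only after the whole block has finished, because it is collected first and written afterwards. For a long-running subscript that may matter.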

R: Add paste() elements to file

I'm using base::paste in a for loop:
for (k in 1:length(summary$pro))
{
  if (k == 1)
    mp <- summary$pro[k]
  else
    mp <- paste(mp, summary$pro[k], sep = ",")
}
mp comes out as one big string, where the elements are separated by commas.
For example mp is "1,2,3,4,5,6"
Then, I want to put mp in a file, where each of its elements is added to a separate column in the same row. My code for this is:
write.table(mp, file = recompdatafile, sep = ",")
However, mp just appears in the CSV as one big string as opposed to being divided up. How can I achieve my desired format?
FYI
I've also tried converting mp to a list, and strsplit()-ing it, neither of which have worked.
Once I've added summary$pro to the file, how can I also add summary$me (which has the same format), in one row with multiple columns?
Thanks,
n.i.
If you want to write something to a file, write.table() isn't the only way. If you want to avoid headers and quotes and such, you can use the more direct cat. For example
cat(summary$pro, sep=",", file="filename.txt")
will write out the vector of values from summary$pro separated by commas more directly. You don't need to build a string first. (And building a string one element at a time as you did above is a bad practice anyway. Most functions in R can operate on an entire vector at a time, including paste).
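To also put summary$me on a second row, one option is to append a second line to the same file. The sketch below uses a made-up results list in place of summary and a temporary file in place of recompdatafile; write_row is a hypothetical helper, not a base R function.

```r
# Made-up data standing in for summary$pro and summary$me
results <- list(pro = 1:6, me = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5))
out_file <- tempfile(fileext = ".csv")

# Write one vector as one comma-separated line.
# paste(..., collapse = ",") replaces the element-by-element loop.
write_row <- function(values, file, append = TRUE) {
  cat(paste(values, collapse = ","), "\n", sep = "", file = file, append = append)
}

write_row(results$pro, out_file, append = FALSE)  # start a fresh file
write_row(results$me, out_file)                   # add the second row
readLines(out_file)
# [1] "1,2,3,4,5,6"             "0.5,1.5,2.5,3.5,4.5,5.5"
```

Each call produces one row, with one column per element when the file is opened as a CSV.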

readline is considering every record in the spreadsheet as a new line [R]

I am trying to create a function that will calculate the frequency count of keywords using the tm package. The function works fine if the text pasted into readline is free-form text without a newline. The problem is that when I paste a bunch of text copied from a spreadsheet, readline treats each record as a new line.
keyword <- function() {
  x <- readline(as.character('Input text here: '))
  x <- Corpus(VectorSource(x))
  ...
  tdm <- TermDocumentMatrix(x)
  ...
  tdm
}
Here's the full code: https://github.com/CSCDataAnalytics/PM-Analysis/blob/master/Keyword.R
How can I prevent this from happening or at least consider a bunch of text of every row from the spreadsheet as one vector only?
If I'm understanding you correctly, the problem is when the user pastes the text from another application: the newline is causing R to stop accepting the subsequent lines.
One technique (fragile as it may be) is to look for a specific line, such as an empty line "" or a period ".". It's a little fragile because now you need (1) assurance that the data will "never" include that as a whole line, and (2) it is easily appended by the user.
Try:
endofinput <- ""
totalstr <- ""
while(! endofinput == (x <- readline('prompt (empty string when done): ')))
  totalstr <- paste(totalstr, x)
In this case, the empty string is the catch, and when the while loop is done, totalstr contains all input separated by a space (this can be changed in the paste function).
NB: one problem with this technique is that it is "growing" the vector totalstr, which will eventually cause performance penalties (depending on the size of the input data): on every loop iteration, more memory is allocated and the entire string is copied along with the new line of text. There are more verbose ways to side-step this problem (e.g., pre-allocate a vector larger than your anticipated input data), but if you aren't anticipating 1000s of lines then you may be able to accept this naive programming for simplicity.
Another option would be to have the user save the data to a text file and use file.choose() and readLines() to get your data.
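A sketch of that file-based route. A temporary file with made-up rows stands in for whatever the user saved from the spreadsheet; interactively, the path would come from file.choose() instead.

```r
# Stand-in for the saved spreadsheet text; interactively: path <- file.choose()
path <- tempfile(fileext = ".txt")
writeLines(c("first row of text", "second row of text", "", "third row"), path)

rows <- readLines(path)      # one element per line of the file
rows <- rows[nzchar(rows)]   # drop empty lines
x <- paste(rows, collapse = " ")
x
# [1] "first row of text second row of text third row"
```

Because all lines are read at once and pasted in a single call, this also avoids growing a string inside a loop.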
Try collapsing the data into a single string after using readline
x <- paste(readline(as.character('Input text here: ')), collapse=' ')
