docx4j XHTMLImporter ignores &nbsp; (non-breaking space) - xhtml

The XHTMLImporter from docx4j is not converting &nbsp; into MS Word non-breaking spaces.
The following code is used:
public void convert() throws Exception {
    String stringFromFile = FileUtils.readFileToString(new File("tmp.xhtml"), "UTF-8");
    String unescaped = stringFromFile;
    System.out.println("Unescaped: " + unescaped);
    // Setup font mapping
    RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
    rfonts.setAscii("Century Gothic");
    XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);
    // Create an empty docx package
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();
    // Convert the XHTML, and add it into the empty docx we made
    XHTMLImporter XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    XHTMLImporter.setHyperlinkStyle("Hyperlink");
    wordMLPackage.getMainDocumentPart().getContent().addAll(
        XHTMLImporter.convert(unescaped, null));
    System.out.println(
        XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));
    wordMLPackage.save(new java.io.File("OUT_from_XHTML.docx"));
}
When the XHTML input looks like this:
<p style="LINE-HEIGHT: 120%; MARGIN: 0in 0in 0pt"
class="MsoNormal"><span
style="LINE-HEIGHT: 120%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 10pt; mso-fareast-font-family: 'Times New Roman'">Up
to Age 30<span
style="mso-spacerun: yes"> </span>
2.30<span
style="mso-spacerun: yes"> </span>
3.30</span></p>
then the docx output looks like this:
<w:r>
<w:rPr>
<w:rFonts w:ascii="Courier New"/>
<w:b w:val="false"/>
<w:i w:val="false"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
<w:t>
2.30</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Courier New"/>
<w:b w:val="false"/>
<w:i w:val="false"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
<w:t>
3.30</w:t>
</w:r>
When the document is opened in Word 2013, there are no spaces at all.

I haven't dug too deep into the docx4j sources; I just call
String escaped = unescaped.replace("&nbsp;", "\u00A0");
Unfortunately, in the Word document it became a regular space, but that wasn't critical in my case.

This works!
String escaped = unescaped.replace("&nbsp;", "\u00A0");
Each &nbsp; is replaced by \u00A0, which adds a space.
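The replacement suggested in both answers can be isolated into a small preprocessing step that runs before XHTMLImporter.convert. The following is a minimal sketch, not docx4j API: the class NbspPreprocessor and the method replaceNbspEntities are hypothetical names, and the sketch covers only the string replacement itself. As one answer notes, Word may still display the result as an ordinary space.

```java
final class NbspPreprocessor {

    /** Replaces every "&nbsp;" entity with the literal U+00A0 character. */
    static String replaceNbspEntities(String xhtml) {
        return xhtml.replace("&nbsp;", "\u00A0");
    }

    public static void main(String[] args) {
        String fixed = replaceNbspEntities("Up to Age 30&nbsp;&nbsp;2.30");
        // U+00A0 is invisible when printed, so count the characters instead.
        long count = fixed.chars().filter(c -> c == '\u00A0').count();
        System.out.println("U+00A0 characters: " + count);
    }
}
```

The returned string would then take the place of unescaped in the call to convert shown in the question.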

Related

sap.m.FormattedText not working in the sap.m.CustomTile

I want to build a demo with sap.m.FormattedText nested in a sap.m.CustomTile in JSBin. I do not know why I get this instead of the text SYSTEM SIZE:
The string I used is:
var sString = "&lt;p style=&quot;font-size:20px; color:#808080; padding-left:40px; &quot; &gt; SYSTEM SIZE &lt;/p&gt;";
var oFtext = new sap.m.FormattedText();
oFtext.setHtmlText(sString);
All code is in the jsbin example.
When I use the same string in the view it works:
<CustomTile>
<Vbox>
<FormattedText htmlText='
&lt;p style="font-size:20px; color:#808080; padding-left:40px; margin-bottom:0px; "&gt; SYSTEM SIZE &lt;/p&gt;
'/>
</Vbox>
</CustomTile>
Thanks for any advice.
You are escaping the <, > and ' characters. This is necessary in the XML file (it is part of the XML specification, as you can see in this question on SO) but not in the JS file.
Write the HTML normally and it will work: "<p style='font-size:20px; color:#808080; padding-left:40px; ' > SYSTEM SIZE </p>"

QString.replace not working

I am trying to process HTML data held in a QString. The data has encoded HTML tags, e.g. "&lt;" etc. I want to convert these to the appropriate symbols.
I have been trying a number of approaches but none seem to work, which suggests I am missing something really simple.
Here is the code (amended to fix typos reported by earlier comments):
QString theData = "&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.0//EN&quot; &quot;http://www.w3.org/TR/REC-html40/strict.dtd&quot;&gt;
&lt;html&gt;&lt;head&gt;&lt;meta name=&quot;qrichtext&quot; content=&quot;1&quot; /&gt;&lt;style type=&quot;text/css&quot;&gt;
p, li { white-space: pre-wrap; }
&lt;/style&gt;&lt;/head&gt;&lt;body style=&quot; font-family:'Arial'; font-size:20pt; font-weight:400; font-style:normal;&quot;&gt;
&lt;table border=&quot;0&quot; style=&quot;-qt-table-type: root; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;&quot;&gt;
&lt;tr&gt;
&lt;td style=&quot;border: none;&quot;&gt;
&lt;p style=&quot; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;&quot;&gt;&lt;span style=&quot; font-size:14pt; color:#4cb8ff;&quot;&gt;This is text on the second page. This page contains a embedded image,&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;&quot;&gt;&lt;span style=&quot; font-size:14pt; color:#4cb8ff;&quot;&gt;and audio.&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/body&gt;&lt;/html&gt;";
QString t2 = theData.replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "'");
The value of t2 however is the same as theData after the replaces.
There is no definition of t1 in your code; I suppose you mean theData (and no double dot). The QString::replace functions modify the string in place and return a reference to it (*this).
QString s = "abc";
s.replace("a", "z").replace("b", "z");
// s = "zzc";
// if you don't want to alter s
QString s = "abc";
QString t = s;
t.replace("a", "z").replace("b", "z");
But there is a better way to escape/unescape HTML strings:
// html -> plain text
QTextDocument doc;
doc.setHtml(theData);
QString t2 = doc.toPlainText();
// plain text -> html
QString plainText = "#include <QtCore>";
QString htmlText = plainText.toHtmlEscaped();
// htmlText == "#include &lt;QtCore&gt;"
If you only want to convert html entities, I use the following function, complementary to QString::toHtmlEscaped():
QString fromHtmlEscaped(QString html) {
    html.replace("&quot;", "\"", Qt::CaseInsensitive);
    html.replace("&gt;", ">", Qt::CaseInsensitive);
    html.replace("&lt;", "<", Qt::CaseInsensitive);
    html.replace("&amp;", "&", Qt::CaseInsensitive);
    return html;
}
In all cases, it should hold that str == fromHtmlEscaped(str.toHtmlEscaped()).
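The order of the replacements in fromHtmlEscaped matters: &amp; has to be decoded last, otherwise text such as &amp;lt; would be decoded twice and end up as <. The sketch below illustrates this in plain Java rather than Qt, purely to show the ordering. HtmlEntities, escape and unescape are hypothetical names, and only the four entities produced by QString::toHtmlEscaped() are handled.

```java
final class HtmlEntities {

    /** Encodes the four HTML metacharacters; '&' must be handled first. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Decodes the four entities; "&amp;" must be handled last. */
    static String unescape(String html) {
        return html.replace("&quot;", "\"")
                   .replace("&gt;", ">")
                   .replace("&lt;", "<")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String original = "#include <QtCore> // literal &lt; stays as typed";
        String roundTrip = unescape(escape(original));
        System.out.println(original.equals(roundTrip));
    }
}
```

Decoding &amp; first, as in the chain of replace calls in the question, would break the round trip for any text that already contains something that looks like an entity.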

Capitalize after a point in css

Since many people write in upper case, I added text-transform: lowercase; so that the text is written in lower case, and ::first-letter so that a capital letter is created at the beginning of each sentence, after each period.
text-transform: lowercase; works fine, but ::first-letter only creates a capital letter at the beginning of the first sentence, not after each period!
Is it possible to capitalize after a period with CSS?
Keep all the data in a variable and split it at the period. Then display each array element inside its own paragraph. This might work.
var str = "What ever you want to do. Please do it here.";
var res = str.split(".");
then use a for loop and getElementById to replace the content
Try this:
var str = 'ABC. DEF. XYZ';
var str2 = str.toLowerCase();
var str3 = str2.replace(/\. /g, '.</span> <span class="caps">');
$('#output').html('<span class="caps">' + str3 + '</span>');
.caps {
display: inline-block;
}
.caps::first-letter {
text-transform: uppercase;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="output"></div>
Convert the entire string to lower case; then replace each ". " with span elements; apply CSS rules to the span elements so that they are inline-block and have their first letter capitalised; and, just to tidy up, add an opening <span> before the replacement string to match the closing tag at the end of the first sentence.
<html>
<head>
<style>
p::first-letter {
    font-weight: bold;
    color: red;
}
</style>
<script>
function myFunction() {
    var str = document.getElementById("data").innerHTML;
    // Split into sentences; the last element is the empty string after the final period.
    var res = str.split(".");
    var data = "";
    for (var i = 0; i < res.length - 1; i++) {
        data = data + "<p>" + res[i] + ".</p>";
    }
    document.getElementById("data").innerHTML = data;
}
</script>
</head>
<body onload="myFunction()">
<div id="data">What ever you want to do. Please do it here.</div>
</body>
</html>
This will automatically change the data on load.
If you have more questions, leave me a message in grandamour.

Flying Saucer font for unicode characters

I am generating a PDF using the Grails export plugin (basically, Flying Saucer). My GSP page is a UTF-8 page (or at least its properties show that it is UTF-8; there is also a <?xml version="1.0" encoding="UTF-8"?> directive at the beginning of the GSP page). At first the generated PDF properly contained the umlaut characters "äöüõ", but Cyrillic characters were missing from the PDF (not rendered at all). Then I changed my CSS file as described in the documentation by adding the following:
@font-face {
src: url(ARIALUNI.TTF);
-fs-pdf-font-embed: embed;
-fs-pdf-font-encoding: UTF-8;
}
body {
font-family: "Arial Unicode MS", Arial, sans-serif;
}
ArialUni.ttf is also deployed to the server. But now both umlaut and Cyrillic characters are rendered as boxes. If I change the -fs-pdf-font-encoding property value to Identity-H, umlaut characters are rendered properly, but Cyrillic characters are rendered as question marks.
Any ideas which font can be used to properly render both umlaut and Cyrillic characters? Or maybe my CSS is somehow wrong? Any hints would be much appreciated.
Update 1:
I have also tried the following CSS (which was generated by http://fontface.codeandmore.com/):
@font-face {
font-family: 'ArialUnicodeMS';
src: url('arialuni.ttf');
src: url('arialuni.eot?#iefix') format('embedded-opentype'),
url('arialuni.woff') format('woff'),
url('arialuni.ttf') format('truetype'),
url('arialuni.svg#arialuni') format('svg');
font-weight: normal;
font-style: normal;
-fs-pdf-font-embed: embed;
-fs-pdf-font-encoding: UTF-8;
}
body {
font-family:'ArialUnicodeMS';
}
I've also added <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>.
I also tried running Grails with -Dfile.encoding=UTF-8, as mentioned here: http://grails.1312388.n4.nabble.com/PDF-plugin-Having-problems-with-instalation-td2297840.html, but nothing helps. Cyrillic characters are not shown at all. Any other ideas what might be the problem?
BTW: I am packaging my PDF as a zip and sending it back to the browser in the response like this:
response.setHeader "Content-disposition", "attachment; filename=test.zip"
response.setHeader "Content-Encoding", "UTF-8"
response.contentType = 'application/zip'
response.outputStream << zip
response.outputStream.flush()
response.outputStream.close()
Do I need to consider encoding while zipping? I do it like this:
public static byte[] zipBytes(Map<String, ByteArrayOutputStream> fileNameToByteContentMap) throws IOException {
ByteArrayOutputStream zipBaos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(zipBaos);
fileNameToByteContentMap.eachWithIndex {String fileName, ByteArrayOutputStream baos, i ->
byte[] content = baos.buf
ZipEntry entry = new ZipEntry(fileName)
entry.setSize(content.length)
zos.putNextEntry(entry)
zos.write(content)
zos.closeEntry()
}
zos.close()
return zipBaos.toByteArray();
}
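On the zipping question: a zip entry stores raw bytes, so character encoding plays no role at that step. One detail in the method above is worth checking, though. baos.buf is the internal buffer of ByteArrayOutputStream, which is usually larger than the data written to it, so each entry may end up with trailing padding bytes, whereas toByteArray() returns exactly the bytes that were written. Below is a minimal plain-Java sketch of the same method with that change. The class name ZipSketch is hypothetical, and whether the padding has anything to do with the rendering problem is not established.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipSketch {

    /** Zips each named in-memory stream into one archive and returns its bytes. */
    static byte[] zipBytes(Map<String, ByteArrayOutputStream> files) throws IOException {
        ByteArrayOutputStream zipBaos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipBaos)) {
            for (Map.Entry<String, ByteArrayOutputStream> file : files.entrySet()) {
                // toByteArray() copies only the bytes actually written,
                // unlike the internal buffer, which may contain unused space.
                byte[] content = file.getValue().toByteArray();
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(content);
                zos.closeEntry();
            }
        }
        // The archive is complete once the ZipOutputStream has been closed.
        return zipBaos.toByteArray();
    }
}
```

The returned array can be written to response.outputStream in the same way as zip in the snippet above.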
I managed to "enable" Unicode characters (Cyrillic or Czech) in Java code by registering a TrueType font from my resources (CALIBRI.TTF) with the renderer.
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.pdf.BaseFont;
...
ITextRenderer renderer = new ITextRenderer();
URL fontResourceURL = getClass().getResource("fonts/CALIBRI.TTF");
//System.out.println("font-path:"+fontResourceURL.getPath());
/* HERE comes my solution: */
renderer.getFontResolver().addFont(fontResourceURL.getPath(),
BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.setDocument(doc, null);
renderer.layout();
baos = new ByteArrayOutputStream();
renderer.createPDF(baos);
baos.flush();
result = baos.toByteArray();
...
Finally I added the font-family 'Calibri' to the css section of my document:
...
<style type="text/css">
span { font-size: 11pt; font-family: Calibri; }
...
For some reason it started working with the following CSS and .ttf file, which was generated by face-kit-generator:
@font-face {
src: url('arialuni.ttf');
-fs-pdf-font-embed: embed;
-fs-pdf-font-encoding: Identity-H;
}
body {
font-family: Arial Unicode MS, Lucida Sans Unicode, Arial, verdana, arial, helvetica, sans-serif;
font-size: 8.8pt;
}
The weird thing is that if I put the font into some folder, let's say "fonts", it will find the font but the characters won't be rendered.

Hiding a row from a page that is passed as html file into a stringbuilder type object

I have an HTML page for a preview functionality.
I read this HTML into a StringBuilder object and replace its content from another page.
Now I want a certain section to be hidden under a specific circumstance.
Currently that section is a row.
How can I do that?
The following code is the section I want to hide:
<tr id="rowcontent" bgcolor="E96F00">
<td style="font-family: Arial; font-size: 14px; font-weight: Bold; color:white;">Course Content Link</td>
</tr>
<tr>
<td style="font-family: calibri; font-size: 14px;">#CourseContent#<BR> </td>
</tr>
This is how I am using the above html :
file = Server.MapPath(".") + "\\Template\\Level100_Template.htm";
string input = string.Empty;
if (File.Exists(file))
{
sr = File.OpenText(file);
//input += Server.HtmlEncode(sr.ReadToEnd());
input += sr.ReadToEnd();
x.Append(input);
sr.Close();
}
This is how I am replacing the content section:
if (dt.Rows[0]["CourseContentPath"].ToString() != string.Empty)
{
    x.Replace("#CourseContent#", "<A href='" + CourseContentLink + "' target=_blank onclick=\"window.open(this.href, this.target, 'height=1000px,width=1000px'); return false;\">Click here</A> to find the course content");
}
How can I hide the entire section in a particular case?
The simplest way is to use this code:
string startStr = "<tr id=\"rowcontent\"", endStr = "</tr>", html = x.ToString();
int start = html.IndexOf(startStr);
if (start >= 0)
{
    // Find the second </tr> after the start so that both rows (heading and content) are removed.
    int end = html.IndexOf(endStr, html.IndexOf(endStr, start) + endStr.Length);
    x.Remove(start, end - start + endStr.Length);
}
