We want to realize the following:
Generate a PDF from a template, which means setting values in AcroFields.
Create a big details table (the structure of the table is also in the template). In this process, the details will occupy more than one page.
If the details table spans multiple pages, the header of the table should also appear at the top of each new page.
We found some examples on the following websites:
http://kuujinbo.info/cs/itext_template1.aspx
http://kuujinbo.info/cs/itext_template2.aspx
But the code for the detail functions is omitted there.
Add content; the code for _do_form_fields(), _get_transaction_details(), and _transaction_summary() is omitted, since those methods only return strings to add to ColumnText. ColumnText is smart; each call to Go() renders as much text as will fit on the current page and returns a status code that tells you: (1) how much text remains to be written, and/or (2) how much space is still available on the page. On each iteration you add text to the current page, call ColumnText.HasMoreText() to inspect the status, and then Document.NewPage() if necessary.
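For what it's worth, here is a minimal sketch of that loop in Java iText 5 (the C# iTextSharp calls shown on that site map almost one-to-one); the column coordinates, the file name, and the placeholder rows are assumptions:

```java
import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfWriter;

public class DetailsColumn {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document,
                new FileOutputStream("details.pdf"));
        document.open();
        ColumnText ct = new ColumnText(writer.getDirectContent());
        for (int i = 1; i <= 500; i++) {
            ct.addText(new Phrase("Detail row " + i + "\n")); // placeholder rows
        }
        ct.setSimpleColumn(36, 36, 559, 806);  // llx, lly, urx, ury of the column
        int status = ct.go();                  // renders what fits on this page
        while (ColumnText.hasMoreText(status)) {
            document.newPage();                // text remains: start a new page
            ct.setSimpleColumn(36, 36, 559, 806);
            status = ct.go();
        }
        document.close();
    }
}
```

For the repeating table header mentioned above, note that iText's PdfPTable has setHeaderRows(1), which repeats the first row at the top of every page the table spans; such a table can be fed to ColumnText via addElement().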
Has anyone been in the same situation before? We would appreciate any tips or suggestions you could offer.
Thank you.
best regards,
Cheng Gong
You are already making a mistake in the first step of your requirements.
You say "Generate a PDF from a template, which means setting values in AcroFields."
The first part is OK: you want to generate a PDF with a template. However, this doesn't mean setting values in AcroFields. That's only one option. It's the option you take if you consider PDF to be the digital equivalent of paper. The form is static: every coordinate is fixed. You just fill out data at the appropriate places. If the data doesn't fit the designated areas, you're out of luck. I already referred to chapter 6 of my book in a comment. You can also see how AcroForms work in a longer tutorial: https://www.youtube.com/watch?v=6YwDME0Fl1c (This tutorial is almost completely dedicated to creating a report from a data set.)
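To make the AcroFields option concrete, a minimal sketch with iText 5 in Java; the template and field names here are made up:

```java
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class FillAcroForm {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader("template.pdf");  // static AcroForm template
        PdfStamper stamper = new PdfStamper(reader,
                new FileOutputStream("filled.pdf"));
        AcroFields form = stamper.getAcroFields();
        form.setField("customer_name", "Cheng Gong");      // hypothetical field name
        stamper.setFormFlattening(true);                   // fix the values in place
        stamper.close();
        reader.close();
    }
}
```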
Another way to create PDF from a template is by using the XML Forms Architecture. In this case (if you have a pure XFA form), your PDF is a container for XML. You can then inject XML data into this form and the form will adapt itself depending on the data. A one-page form can easily grow into a 20-page document when filled out. This is explained in this video: https://www.youtube.com/watch?v=h0wzj84tnmw (Note that the video dates from 2012. The product I present has been finished and the results are much better now.)
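Filling such a form is nearly a one-liner once you have the XML; a sketch with iText 5 (assuming 5.3 or newer), where both file names are placeholders:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class FillXfaForm {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader("xfa_template.pdf");
        PdfStamper stamper = new PdfStamper(reader,
                new FileOutputStream("filled.pdf"));
        // Inject the XML data set; the form re-flows and grows to fit the data
        stamper.getAcroFields().getXfa()
               .fillXfaForm(new FileInputStream("data.xml"));
        stamper.close();
        reader.close();
    }
}
```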
Alternatives to this approach could be to create a template in HTML. I often refer to this solution as a poor man's XFA solution. This solution requires XML Worker. You can see an example in this video: https://www.youtube.com/watch?v=clWoDrEEl50
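A minimal sketch of that approach, assuming template.html is the HTML template with the data already merged in (for example by a templating engine):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HtmlToPdf {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document,
                new FileOutputStream("report.pdf"));
        document.open();
        // XML Worker parses the XHTML and writes it into the document
        XMLWorkerHelper.getInstance().parseXHtml(writer, document,
                new FileInputStream("template.html"));
        document.close();
    }
}
```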
This is a general answer. I couldn't be more specific because your question isn't clear. You first need to make up your mind regarding the approach. Right now, you talk about AcroFields and at the same time about ColumnText. In the long tutorial, this is described as the hard way. See also the corresponding online samples. It is confusing that you're asking a very difficult question before asking the simple ones. Unless, of course, you already have the answers to those simple questions. If so, please share them.
I have been exploring Azure Form Recognizer for one of my projects, where we want to perform OCR on some handwritten text.
The problem is that when we give scanned images to the tool to process, it sometimes doesn't even recognize the text written on them (even when it is clearly written). I tried multiple types of images, applying enhancements as well as B/W and colored copies, but it doesn't work.
Sometimes it recognizes the values of two fields as one, which leads to incorrect data: one field is completely blank and the other holds the other field's value along with its own.
When there is NO VALUE in the tagged field in the testing data, it tries to read a value from some other place that is not even close to that field, or is sometimes untagged.
Could you please help with these queries?
Thanks in advance.
Can you please also share sample forms? Please make sure the data is anonymized and contains no real data.
Please contact customer service to debug this issue.
Thanks,
Neta - MSFT
I am relatively new to Drupal. We have a site and I've been asked to jump in and make some changes. I'm working on customizing the output of the Webforms module. I'm having trouble doing so because I can't seem to find a reference to the various data structures Webforms uses.
For example, I need to change something in a preprocess hook. Passed into the hook is a structure called $variables. I can see that attributes are being added to the piece I want to change, so I know I'm in the right hook. What I want to do is add something to the text. But I can't figure out where in $variables the text is so I can change it.
I'm sure what I need to change is in there, but I can't seem to get at it. All the documentation I've found on the web is either "paste this code in" or assumes you know the data structures.
So:
1. Is there a reference anywhere to these structures? $variables is one. $submission, $components are others. There are probably more. I know their contents vary widely with the specific webform, but looking for a general reference.
2. How can I see the contents of one of the structures from inside a hook? I've tried a lot of things, but no luck. Would be great to either have it output to the Apache log, or show up on the screen, something...
Any help would be greatly appreciated. It feels like there's real power here, but I can't get at it because I'm missing some basics.
I would say you need to install 2 modules to figure out what is going on...
First, Devel, which allows you to use the dmp function. This will output a whole array to the message area.
And then my new favorite module, Search Krumo.
A webform is generated from a large array of data, and finding the bit that is relevant to you can often be difficult just by looking through the dmp output. Search Krumo puts a search box in the message area, allowing you to search for any instance of a string in the whole array structure. When you've found the bit that is relevant, it also lets you copy the path to that array element, so you can easily modify values buried deep in nested arrays.
EDIT:
If you don't want the output on the screen but would rather log it then use Devel Debug Log. Very useful for debugging ajax requests etc.
If you just need to log simple strings rather than whole arrays, then the dd function is useful, combined with tail -f /tmp/drupal_debug.txt (assuming you have SSH access).
I'm currently looking for a way to make a dynamic checklist-type document for my job to be used for software upgrades. Right now, we have a generic Word checklist that has all the steps for upgrading a client's software, but due to its nature, not all steps apply to each client, and to list all possible options would make it difficult to navigate and difficult to use, which goes against its purpose.
What I'm looking for is a way to input information (checkboxes, drop-downs, and text fields), and based on that information, produce a list of tasks in some format that is user-readable. For example, if I check one box to indicate that they have a certain feature installed, then add 3 items to the task list.
Is InfoPath the right tool for the job, or am I barking up the wrong tree?
From what you describe, I'd say InfoPath is a very good choice for your project. My first thought would be to work in two different views. The first view would be for your people to input the information about what features are installed (there can be hidden content that only shows if certain answers are given, making it less unwieldy than your Word form). Then I'd have another view designed for printing out and giving to the client, containing only the task list info derived from the data in the first view. Bark away!
It could be a project well beyond my skills right now but I've got around one full month to spend on it so I think I can do it. What I want to build is this: Gather news about a specific subject from various sources. Easy, right? Just get the rss feeds and display them on a page. Well, I want something more advanced: Duplicates removed and customized presentation (that is, be able to define/change the format in which the news headlines are displayed).
I've played a bit with Yahoo Pipes and some other tools and I am facing two big problems:
Some sources don't provide RSS feeds. How do I create one?
What's the best method to find and remove duplicates? I thought about comparing the headlines and checking whether there is a match greater than, say, 50%. Is that good practice, though?
Please add any other things (problems, suggestions, whatever) I might not have considered.
Duplication is a nasty issue. What I eventually ended up doing (a rough code sketch follows the list):
1. Strip out all HTML tags except for links (Although I started using regex, I was burned. I eventually moved to custom parsing to remove tags)
2. Strip out all whitespace
3. Case-desensitize
4. Hash all that with MD5.
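In code, that pipeline might look like the Java sketch below. Note the warning in step 1: the author was burned by regex-based tag stripping and moved to custom parsing, so the regex here is only a stand-in for a real tag remover.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DedupKey {
    // Fingerprint a comment: drop all tags except links, drop all whitespace,
    // lowercase everything, then MD5 the result.
    static String fingerprint(String html) throws Exception {
        String s = html.replaceAll("(?i)<(?!/?a\\b)[^>]*>", ""); // keep <a> and </a>
        s = s.replaceAll("\\s+", "").toLowerCase();
        byte[] md5 = MessageDigest.getInstance("MD5")
                                  .digest(s.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, md5).toString(16);              // hex key for dedup
    }
}
```

Two comments with identical visible text but different link targets then hash to different keys, which is exactly the property explained next.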
Here's why you leave the link in:
A comment might be as simple as "Yes, this sucks". "Yes, this sucks" could be a common comment. BUT if the text "this sucks" is linked to different things, then it is not a duplicate comment.
Additionally, you will find that HTML tag escaping is weird with RSS feeds. You would think that a stray < would be double-encoded: (I think) &amp;lt;
But it is not. It is encoded &lt;
But so too are HTML tags: &lt;p&gt;
I eventually copied all the known HTML tags as parsed by Mozilla Firefox and manually recognized those tags.
Creating an RSS feed from HTML is quite nasty and I can only point you to services such as Spinn3r, which are fantastic at de-duplication and content extraction. These services typically use probability-based algorithms that are above me. I know of one provider that got away with regexing pages (They had to know that a certain page was MySpace-based or Blogger-based) but they did not perform admirably.
You might want to try to use the YQL module to scrape a webpage that doesn't provide RSS. Here's a sample of a YQL statement to scrape HTML.
About duplicates, take a look at this pipe.
Customized presentation: if you want it truly customized, you'll have to manipulate the pipe results yourself, e.g. get them as JSON and manipulate them with JavaScript, or process them server-side.
I am tasked with writing a program that, given a search term and the HTML source of a page representing search results from some unknown search engine (it can really be anything: a blog, a shop, Google, eBay, ...), needs to build a data structure of the results containing "what's in the results": a title for each result, the "details" link, the position within the results, etc. It is not known whether the results page contains any of the data at all, or whether there are any search results. The goal is to feed the data structure into another program that extracts meaning.
What I am looking for is not BeautifulSoup or a regexp, but rather some clever ideas or algorithms on how to interpret the HTML source. What do I do to find out what part of the page constitutes a single result item? How do I filter out the markup noise to extract the important bits? What would you do? Pointers to fields of research covering what I am trying to do are also greatly appreciated.
Thanks, Simon
I doubt that there exists a silver-bullet algorithm that, without any training, will just work on any arbitrary search query output.
However, this task can be solved, and is actually solved in many applications, but with a different approach. First you have to define the general structure of a single search result item based on what you are actually going to do with it (it could be name, date, link, description snippet, etc.), and then write a number of HTML parsers that will extract the necessary fields from the search result output of particular web sites.
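As a starting point, that structure can be a plain class; the field names below are only assumptions about what you might need:

```java
// One possible shape for a parsed search result item (field names are assumptions)
public class SearchResultItem {
    public String title;        // the clickable headline of the result
    public String detailsLink;  // URL of the "details" page
    public String snippet;      // short description text, if present
    public int position;        // rank of the item within the result page
}
```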
I know it is not a super sexy solution, but it is probably the only one that works. And it is not rocket science: writing parsers is actually extremely simple; you can make a dozen per day. If you look into the HTML source of a search result, you will notice that the output results are typically very structured and marked with specific div sections or class attributes, so it is very easy to find them in the document. You don't even have to use any complicated HTML parsing library for that; something grep-like will be enough.
For example, on this particular page your question starts with <div class="post-text"> and ends with </div>. Everything in between is the post text, with some HTML formatting that you may want to remove along with extra spaces and "\n". And this <div class="post-text"> appears on the page only once.
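A grep-like extractor for that case fits in a few lines of plain Java. This sketch assumes the marker occurs once and that the post contains no nested div tags, which is usually acceptable for a known, simple layout:

```java
public class GrepLikeParser {
    // Return the cleaned text between a known start marker and the next </div>.
    static String extract(String html, String startMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) return null;                // marker not on this page
        start += startMarker.length();
        int end = html.indexOf("</div>", start);
        if (end < 0) return null;                  // malformed page
        return html.substring(start, end)
                   .replaceAll("<[^>]+>", " ")     // strip the HTML formatting
                   .replaceAll("\\s+", " ")        // collapse extra spaces and "\n"
                   .trim();
    }

    public static void main(String[] args) {
        String page = "<html><div class=\"post-text\">Hello <b>world</b>\n</div></html>";
        System.out.println(extract(page, "<div class=\"post-text\">")); // Hello world
    }
}
```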
Once you go to large scale with your retrieval application, you will find out that there is not that big a variety of search engines across sites, and you will be able to re-use already created parsers for sites using similar search engines.
The only thing you have to remember is built-in self-testing. Sites tend to upgrade and change their design from time to time. If your application is going to live for some time, you will need to include in your parsers some logic that checks the validity of their results and notifies you every time the search output has changed and is no longer compatible with your parser. Then you will have to modify the particular parser or write a new one.
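Such a self-test can be very small. A sketch, reusing the item class from above; what counts as "suspicious" output is an assumption you would tune per site:

```java
import java.util.List;

public class ParserSelfTest {
    // Empty output or items missing mandatory fields usually mean the site
    // layout changed and the parser needs attention.
    static boolean looksValid(List<SearchResultItem> items) {
        if (items.isEmpty()) return false;
        for (SearchResultItem item : items) {
            if (item.title == null || item.detailsLink == null) return false;
        }
        return true;
    }
}
```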
Hope this helps.