We are thinking of using the ConvertAPI component to handle PDF conversion in our application.
But we are still unclear about the limits of PDF generation and how load is handled.
How much load will it support for PDF conversion? (e.g. if we send 100 requests at a time, will it work without crashing?)
What are the file-size limits for PDF conversion? (e.g. if I send a document of around 800 MB to 1024 MB, will it be able to handle the conversion?)
100 simultaneous file uploads are inefficient. Around 4 concurrent uploads works best (though it depends heavily on the situation). If you are really planning to convert 100 × 1 GB files simultaneously, please consult support.
The hard limit for processed files is 1 GB. The rest depends on file complexity and the conversion being performed.
The best would be to register and try it for free with your files.
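If it helps to see the queuing side of this, below is a minimal Python sketch (not ConvertAPI's official client; the endpoint URL and form field are placeholders) that keeps roughly four uploads in flight at a time instead of firing 100 at once:

```python
# Minimal sketch: cap concurrent conversion requests at ~4 instead of sending 100 at once.
# CONVERT_URL and the "file" field are placeholders, not the real ConvertAPI interface.
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import requests

CONVERT_URL = "https://example.com/convert"  # placeholder endpoint

def convert_one(path: Path) -> bytes:
    """Upload a single document and return the converted PDF bytes."""
    with path.open("rb") as f:
        resp = requests.post(CONVERT_URL, files={"file": f}, timeout=600)
    resp.raise_for_status()
    return resp.content

def convert_all(paths, max_workers=4):
    """Convert many documents, but never more than `max_workers` at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_one, p): p for p in paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```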
We are working on a project where, following an analysis of data using R, we use rmarkdown to render an HTML report which is returned to the users who uploaded the original dataset. This will be part of a complex online system involving multiple steps. One of the requirements is that the rendered HTML be serialized and saved in a SQL database for the system to return to users.
My question is: is there a way to render the markdown directly to an object in R to allow for direct serialization? We want to avoid saving to disk unless absolutely necessary, as there will be multiple parallel processes doing similar tasks and resources might be limited. From my research so far it doesn't seem possible, but I would appreciate any insight.
You are correct; it's not possible, due to the architecture of rmarkdown.
If you have this level of control over your server, you could create a RAM disk, using part of your memory to simulate a hard drive. Rendering still writes to a file path, but the actual hard disk won't be used.
There is a way to take files from the user in a Shiny app using inputFile. I would like to accept PDF files from users. How can I make it secure? I mean that a user should not be able to upload a 200+ gigabyte file, a virus, etc. Other possible concerns, or your own experience, would be very helpful: where to store those files, how to collect them, and so on.
This question is more about sanitizing your inputs than it is about shiny. However, it is still valid and an important aspect to consider whenever you allow file uploads in your application.
One way would be to only allow PDFs, but inputFile doesn't really support limiting which files can be uploaded (yet). However, you can restrict the size of file uploads by placing this at the top of your app.R file: options(shiny.maxRequestSize = 30*1024^2). This equates to 30 MB, as demonstrated in this vignette.
This is obviously not the ideal solution. One alternative might be to use an HTML method for uploading files, and then pass those files along to Shiny with some JavaScript.
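The validation itself is framework-agnostic; as a rough illustration (in Python rather than R, with an assumed size limit and file path), a server-side check might cap the size and verify the PDF magic bytes before accepting the upload. A header check is not a virus scan, so treat scanning as a separate step.

```python
# Framework-agnostic sketch of server-side upload checks: size cap plus PDF magic bytes.
# The size limit and path are illustrative; in Shiny the equivalent logic would live in server().
import os

MAX_UPLOAD_BYTES = 30 * 1024 ** 2  # 30 MB, mirroring shiny.maxRequestSize above

def looks_like_valid_pdf(path: str) -> bool:
    """Reject uploads that are too large or that do not start with the %PDF- header."""
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        return False
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```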
I'm working with an RDF dataset generated as part of our data collection which consists of around 1.6M small files totalling 6.5G of text (ntriples) and around 20M triples. My problem relates to the time it's taking to load this data into a Sesame triple store running under Tomcat.
I'm currently loading it from a Python script via the HTTP api (on the same machine) using simple POST requests one file at a time and it's taking around five days to complete the load. Looking at the published benchmarks, this seems very slow and I'm wondering what method I might use to load the data more quickly.
I did think that I could write Java to connect directly to the store and so do without the HTTP overhead. However I read in an answer to another question here that concurrent access is not supported, so that doesn't look like an option.
If I were to write Java code to connect to the HTTP repository does the Sesame library do some special magic that would make the data load faster?
Would grouping the files into larger chunks help? This would cut down the HTTP overhead for sending the files. What size of chunk would be good? This blog post suggests 100,000 lines per chunk (it's cutting a larger file up, but the idea would be the same).
Thanks,
Steve
If you are able to work in Java instead of Python I would recommend using the transactional support of Sesame's Repository API to your advantage - start a transaction, add several files, then commit; rinse & repeat until you've sent all files.
If that is not an option then indeed chunking the data into larger files (or larger POST request bodies - you of course do not necessarily need to physically modify your files) would help. A good chunk size would probably be around 500,000 triples in your case - it's a bit of a guess to be honest, but I think that will give you good results.
You can also cut down on overhead by using gzip compression on the POST request body (if you don't do so already).
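Since your loader is already a Python script, here is a rough sketch of the chunk-and-compress idea; the repository URL, chunk size and MIME type are assumptions to adapt to your setup, and a gzip-encoded body only helps if your Sesame/Tomcat installation accepts it:

```python
# Sketch: gather many small .nt files into ~500k-triple chunks and POST each chunk once,
# gzip-compressing the request body. URL, chunk size and MIME type are assumptions.
import glob
import gzip
import requests

REPO_URL = "http://localhost:8080/openrdf-sesame/repositories/myrepo/statements"  # adjust
CHUNK_TRIPLES = 500_000

def iter_chunks(file_pattern, chunk_size=CHUNK_TRIPLES):
    """Yield blocks of N-Triples lines gathered across many input files."""
    buf = []
    for path in glob.glob(file_pattern):
        with open(path, "rb") as f:
            for line in f:
                buf.append(line)
                if len(buf) >= chunk_size:
                    yield b"".join(buf)
                    buf = []
    if buf:
        yield b"".join(buf)

for chunk in iter_chunks("data/*.nt"):
    resp = requests.post(
        REPO_URL,
        data=gzip.compress(chunk),
        headers={
            "Content-Type": "text/plain",  # N-Triples; adjust to what your Sesame version expects
            "Content-Encoding": "gzip",    # only if the server accepts gzipped bodies
        },
    )
    resp.raise_for_status()
```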
I need to parse a large trace file (up to 200-300 MB) in a Flex application. I started using JSON instead of XML hoping to avoid these problems, but it did not help much. When the file is bigger than 50 MB, the JSON decoder can't handle it (I am using as3corelib).
I have done some research and found some options:
Try to split the file: I would really like to avoid this; I don't want to change the current format of the trace files and, in addition, it would be very uncomfortable to handle.
Use a database: I was thinking of writing the trace into a SQLite database and then reading from there, but that would force me to modify the program that creates the trace file.
From your experience, what do you think of these options? Are there better options?
The program that writes the trace file is in C++.
Using AMF will give you much smaller data sizes for transfer because it is a binary, not text format. That is the best option. But, you'll need some middleware to translate the C++ program's output into AMF data.
Check out James Ward's census application for more information about benchmarks when sharing data:
http://www.jamesward.com/census/
http://www.jamesward.com/2009/06/17/blazing-fast-data-transfer-in-flex/
Maybe you could parse the file in chunks, without splitting the file itself. That supposes some work on the as3corelib JSON parser, but it should be doable, I think.
I found this library, which is a lot faster than the official one: https://github.com/mherkender/actionjson
I am using it now and it works perfectly. It also has an asynchronous decoder and encoder.
What my users will do is select a PDF document on their machine, upload it to my website, where I will convert into an HTML document for display on the website. The document will be stored in a database after conversion.
What's the best way to convert a PDF to HTML?
I have been handed a requirement where a user would create a "news" story as a PDF and then upload it to the server, where it will be converted to HTML and displayed on the website.
Any document creation software that can save documents as PDF can also save them as HTML. I'm assuming the issue is that your users will be creating rich documents (lots of embedded images), which results in multiple files, and that your requirements stem from a desire to make uploading these documents as simple as possible for the user.
There are numerous conversion packages that can probably do this for you; however, when you're talking about rich content, you are talking about text plus images. Those images have to be stored somewhere and served somehow, and whatever conversion method you use will require you to examine all image sources to make sure they point to valid locations on your server.
I would like to suggest an alternative way of doing this that you can take to your team: implement one of the many blog APIs for publishing content. There are free and commercial software packages that use these APIs to publish content directly to a website, such as Windows Live Writer and Microsoft Word. Your users can simply create their content and upload it directly to your website without having to publish it as PDF first and then upload it. The process becomes much smoother for your users, and you get the posts in a form that doesn't require you to spend thousands of dollars on developing or buying conversion code.
The two most common APIs are the MetaWeblog API and the Movable Type API. Both are very simple and easy to implement. I think this way would be a MUCH better alternative than what you're thinking about doing.
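To give a feel for how little client code the blog-API route needs, here is a rough Python sketch of posting through the MetaWeblog API over XML-RPC; the endpoint, blog id and credentials are placeholders for whatever your site would expose:

```python
# Sketch of publishing a story through the MetaWeblog API (XML-RPC).
# The endpoint, blog id and credentials below are placeholders for your own site.
import xmlrpc.client

endpoint = "https://example.com/xmlrpc.php"   # your site's XML-RPC endpoint
blog_id, username, password = "1", "author", "secret"

server = xmlrpc.client.ServerProxy(endpoint)
post = {
    "title": "Quarterly results",
    "description": "<p>Body of the story as HTML, images referenced by URL.</p>",
}
post_id = server.metaWeblog.newPost(blog_id, username, password, post, True)
print("published post", post_id)
```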
I don't think converting a PDF to an HTML string is necessarily the best idea, especially if you want to export it back as PDF. PDF files often contain binary elements such as images, so you may be better off converting the file to ASCII via an encoding such as Base64. That way you will have an ASCII string you can save into a text field in the DB and then convert back out. Could you expand more on the main requirement?
My recommendation would be to not do it that way IF POSSIBLE (but we all know what managers are like) so...
I would recommend that you stay away from converting the PDF to/from HTML (unless you can find a commercial solution, it will be nigh on impossible). Instead, as has already been mentioned, store it as a Base64-encoded string, a BLOB, or some other binary format in the database, and then display it to the user with some sort of PDF viewer plugin for the browser.
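To make the Base64 suggestion concrete, a minimal sketch of the round trip (the file name and the storage step are illustrative):

```python
# Minimal sketch of the Base64 round trip: PDF bytes -> text column -> PDF bytes.
import base64

with open("report.pdf", "rb") as f:
    pdf_bytes = f.read()

encoded = base64.b64encode(pdf_bytes).decode("ascii")   # safe to store in a text field
# ... store `encoded` in the database, later read it back ...
restored = base64.b64decode(encoded)

assert restored == pdf_bytes
```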
All it took was a simple google search for "PDF to HTML": http://www.gnostice.com/pdf2manyOverview_x.asp. I'm sure there are others.
So while it's 'possible', you may want to explain to your manager that this isn't the best content management solution.
Why not use iTextSharp to read the PDF content? Then you could save both the binary PDF and the text content to the database. You could then let users search the content and download the PDF.
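iTextSharp is a .NET library; purely to illustrate the same idea in a runnable form, here is a sketch using the Python pypdf package instead (a different library, and the table layout is hypothetical): extract the text for searching and keep the original bytes for download.

```python
# Illustration of the same idea with pypdf (not iTextSharp): keep the original PDF bytes
# for download and the extracted text for searching, side by side in one table.
import sqlite3
from pypdf import PdfReader

def store_pdf(db_path: str, pdf_path: str) -> None:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    with sqlite3.connect(db_path) as con:
        con.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT, content BLOB, body_text TEXT)")
        con.execute("INSERT INTO docs VALUES (?, ?, ?)", (pdf_path, pdf_bytes, text))
```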
You should look into DynamicPDF. They have a converter (currently in beta) that serves exactly this purpose. We have used their products with great success (especially for dumping Reporting Services reports directly to PDF).
Ref: http://www.dynamicpdf.com/