I've started working with the Azure Cognitive Services Computer Vision APIs. I'm not sure what the difference is between RecognizePrintedText and RecognizeText. RecognizeText seems to return printed text even when I pass the argument 'handwritten'. The output of the two endpoints is similar, so I'm not sure what the different use cases are. Any ideas?
Cheers,
Are you saying that mode=Handwritten/Printed is not working? If so, the following post discusses it, but it has no answer so far:
How to get only handwritten text as results from Microsoft handwriting recognition API?
I think the difference is that the printed mode is optimized for printed text and the handwritten mode is optimized for handwritten text.
That said, the handwritten mode will not return only handwritten text. It will generally capture more of the handwritten text than the printed mode does, although there will be cases where this does not hold.
I have compared both tools on scanned printed text: both retrieve the text, although with different levels of quality.
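To make the comparison concrete, here is a minimal sketch of how the two calls differ at the REST level. It assumes the v2.0 routes (`/vision/v2.0/ocr` for what the SDKs call RecognizePrintedText, and `/vision/v2.0/recognizeText` with a `mode` parameter); the host name is a placeholder, and the helper names `ocr_url` and `recognize_text_url` are made up for illustration. It only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode

# Assumption: a v2.0-style Computer Vision resource; this host is a placeholder.
ENDPOINT = "https://example-region.api.cognitive.microsoft.com"


def ocr_url(language="unk", detect_orientation=True):
    """URL for the OCR route (RecognizePrintedText in the SDKs)."""
    query = urlencode({
        "language": language,
        "detectOrientation": str(detect_orientation).lower(),
    })
    return f"{ENDPOINT}/vision/v2.0/ocr?{query}"


def recognize_text_url(mode="Handwritten"):
    """URL for the Recognize Text route; mode is a hint, not a filter."""
    if mode not in ("Handwritten", "Printed"):
        raise ValueError("mode must be 'Handwritten' or 'Printed'")
    return f"{ENDPOINT}/vision/v2.0/recognizeText?{urlencode({'mode': mode})}"


print(ocr_url())
print(recognize_text_url("Handwritten"))
```

As far as I know, the OCR route returns its result directly in the response body, while Recognize Text accepts the image and hands back an Operation-Location header that you poll for the result. In both cases the key goes in the Ocp-Apim-Subscription-Key header.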
I'm researching different products for my organization. We are looking for a solution that will replace our current text mining software - DataWatch Monarch. We need some type of software that will be able to extract only relevant data from PDF reports and prepare it to be stored in a database.
DataWatch is causing a bottleneck for our organization because of its learning curve and limitations. I started trying to do this by programming in R; however, we need a more streamlined approach.
If you know of any easy to use, highly effective, text miners or report-text-extractor-like software please share. I will be looking into Scribe Software, SiMX, RapidMiner, and some others.
RapidMiner can extract info from PDFs no problem using the Text Processing extension. Start with the Read Document operator and go from there.
Storing in a database is also straightforward - set up your database connection in the "Manage Database Connections" menu and then use the "Write Database" operator.
Is it possible to get transcription of word translated by Google API?
Same as in translate.google.com right under the textarea.
I need translation and transcription of single words only, like in a dictionary, so maybe Google offers some dictionary API, simpler than translation but with transcription?
Look at this picture, second red arrow points to transcription of word "transcription"
The Google Translation API does not give you translation variants for a single word, and it does not provide text-to-speech functionality. It can detect the source language automatically and return the translated text.
I want to do speech <-> text with mixed-language inputs.
Initially only Chinese & English, but eventually more language pairs. Vast majority of speech will be English, but small amounts of Chinese will be included. The application is kind of a "conversational verbal dictionary":
speech-to-text with mixed-language input: "How do you say 猫?"
text-to-speech with mixed-language input: "The English word for 猫 is Cat." I would want this to be spoken with the voice/accent of a native English speaker.
I noticed that the text-to-speech demo at this URL can handle sentences like this IF you choose the "Chinese-CN", "Chinese-HK", or "Chinese-TW" accent, but not if you choose any of the "English-*" accents. This doesn't work for me because I need a native English-speaking accent ...
Bing Speech to Text supports mixed-language input for a small set of Chinese/English vocabulary. The set is small, however, and we do not have plans to expand to other languages this year.
Text To Speech is only available for the voices published.
I try to write all data analysis reports using R Markdown, because I can have a reproducible document that I can share in several output formats (PDF, HTML and MS Word).
However, most of my colleagues use MS Word and they have no idea about R, Markdown, etc.
One advantage of using R Markdown is that I can generate my report in MS Word and directly share it with my colleagues.
The disadvantage is that collaboration becomes cumbersome for me, because I receive feedback on MS Word as well (typically using track changes) and I have to manually introduce those changes back into the .rmd file.
So, my question is: how can I simplify the process (i.e. make it as automatic as possible) of getting the changes in the MS Word document into the .Rmd?
Are there any tools out there that can help me out?
P.S. Getting my colleagues to become R-literate is not an option :(
I haven't yet tried what I'm proposing, but here is how I plan to handle this, since I have exactly the same need. First, there are two distinct scenarios:
I am the lead author, or I am responsible for the statistical analysis: I will require all collaborators to learn and use markdown (not R Markdown, just generic markdown) and I'll instruct them not to touch any R code. I believe markdown is easy enough that anyone who is competent enough to collaborate on an article with data analysis is more than competent to learn markdown. For teaching them, the key features for people familiar with working with Microsoft Word and track changes are the following:
Basic markdown references: I would give them the core R Markdown references, which are their Pandoc Markdown documentation and their R Markdown cheat sheet.
Track changes: Collaborators would simply edit the markdown in plain text and submit their edited version. To view and reconcile differences, I would simply use a diff tool; I would find a good online one to teach my collaborators how to diff changes.
Comments between authors: I would select one of the options for markdown comments and teach my collaborators to use that when needed. The modified HTML comment (<!--- Pandoc-enhanced HTML comment -->) is the one I would probably use.
Reference management: I use Zotero, so I would use Better BibTeX for Zotero to handle references. The nice thing about this is that although I would have to handle the references myself, collaborators can directly add references to the Zotero group library. In fact, using citation keys, it should be simple for collaborators to learn how to insert references themselves into the markdown text.
I am NOT the lead author and I am NOT responsible for the statistical analysis: I would use whatever workflow the lead author uses (e.g. if the lead author uses Word with tracked changes, I'll use the same things).
I want to note that the only part that seems harder than Microsoft Word's normal workflow is replacing track changes with diff. I'm not aware of a tool that makes incorporating diff files as easy as how Word reconciles changes, but if such a tool exists, then the process should be more seamless.
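For anyone who has not used a diff tool before, here is a minimal sketch of what comparing two plain-text versions looks like, using Python's standard difflib. The file names and the report text are made up for illustration; in practice you would read the two .Rmd files from disk.

```python
import difflib

# Illustrative text only: the author's version and a collaborator's edited copy.
original = """# Results

The mean difference was 2.3 points.
We did not observe any outliers.
""".splitlines(keepends=True)

edited = """# Results

The mean difference was 2.3 points (95% CI 1.1 to 3.5).
We did not observe any outliers.
""".splitlines(keepends=True)

# Lines starting with "-" were removed, lines starting with "+" were added.
diff = list(difflib.unified_diff(
    original, edited,
    fromfile="report.Rmd", tofile="report_collaborator.Rmd",
))
print("".join(diff))
```

This only shows the changes; accepting or rejecting each one is still manual, which is the gap compared with Word.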
I believe we would need to work on several packages in order to make true collaboration possible between users of Word and RMarkdown. I would be happy to collaborate with anyone interested in making this happen.
Adding a CriticMarkup plugin for RStudio. https://github.com/CriticMarkup/CriticMarkup-toolkit/
Having an R package that can scrape Word documents along with tracked changes. The officer package can already read Word documents, but not the tracked changes. It would also be extremely useful if this package could add simple RMarkdown formatting to the scrapes, e.g. for bold, subscripts and perhaps even tables to facilitate the subsequent matching of Word text to the RMarkdown file.
https://github.com/davidgohel/officer/issues/132
Write a package that can translate the scraped tracked changes into CriticMarkup within the RMarkdown file.
Generate a key (paragraph)->(lines) that matches paragraphs scraped from Word (without any of the tracked changes) to lines in the RMarkdown. The problem is that we don't know what was generated using code, and what was directly written as Rmd. The first step would be to find lines in the RMarkdown file that should form paragraphs (exclude R chunks, but not inline R). Then, ensuring the order remains the same, compare these lines (with newlines removed) to paragraphs scraped from the Word document, using a regexp symbol for "any char, any length" in place of inline R chunks. Next, split paragraphs that contain inline chunks into sub-paragraphs, so that tracked changes and comments can be applied more easily to the inline code or to the text before or after it. Finally, the paragraphs that could not be matched were likely generated within code chunks and should be matched to the appropriate code chunks, determined from the order of the paragraphs.
Using the generated key, apply tracked changes (as CriticMarkup) to the RMarkdown file. Any changes made to code chunks should be reported as a CriticMarkup comment around that code chunk (or group of code chunks if there is no markdown in between chunks).
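The matching step could be sketched roughly as follows. This is only an illustration in Python of the "any char, any length" idea, not an existing package: the function names `paragraph_pattern` and `build_key` and the sample paragraphs are hypothetical, and it assumes the paragraphs have already been extracted from both files.

```python
import re

# Inline R code such as `r round(x, 1)`; its rendered value is unknown in advance.
INLINE_R = re.compile(r"`r [^`]+`")


def paragraph_pattern(rmd_paragraph):
    """Escape the literal text and replace each inline R chunk with a wildcard."""
    text = " ".join(rmd_paragraph.split())  # join wrapped lines, normalise spaces
    literal_parts = INLINE_R.split(text)
    return re.compile("(.*?)".join(re.escape(p) for p in literal_parts), re.DOTALL)


def build_key(rmd_paragraphs, word_paragraphs):
    """Map Word paragraph index -> Rmd paragraph index, keeping document order."""
    key = {}
    next_rmd = 0
    for w_idx, word_text in enumerate(word_paragraphs):
        normalised = " ".join(word_text.split())
        for r_idx in range(next_rmd, len(rmd_paragraphs)):
            if paragraph_pattern(rmd_paragraphs[r_idx]).fullmatch(normalised):
                key[w_idx] = r_idx
                next_rmd = r_idx + 1
                break
        # Word paragraphs left out of the key were probably produced by a code chunk.
    return key


rmd = [
    "The mean was `r round(m, 1)` units\nacross all sites.",
    "We enrolled `r n` participants.",
]
word = [
    "The mean was 4.2 units across all sites.",
    "Table 1. Summary",
    "We enrolled 120 participants.",
]
print(build_key(rmd, word))  # {0: 0, 2: 1}
```

The capture groups in each pattern mark where the inline code's output sits, which is what would allow splitting a paragraph into the parts before, inside and after the inline chunk.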
I suggest you try trackdown https://claudiozandonella.github.io/trackdown/
trackdown offers a simple answer to collaborative writing and editing of R Markdown (or Sweave) documents. Using trackdown, the local .Rmd (or .Rnw) file is uploaded as plain-text in Google Drive where, thanks to the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing of the narrative part of the document. After integrating all authors’ contributions, the final document can be downloaded and rendered locally.
Using Google Docs, anyone can collaborate on the document as no programming experience is required, they only have to focus on the narrative text ignoring code jargon.
Moreover, you can hide code chunks by setting hide_code = TRUE (they will be automatically restored when downloaded). This prevents collaborators from inadvertently making changes to the code that might corrupt the file.
You can also upload the actual output (i.e., the resulting compiled document) to Google Drive together with the .Rmd (or .Rnw) document. This helps collaborators to evaluate the overall layout, figures and tables, and it allows them to use comments on the PDF to propose and discuss suggestions.
I know this is an old post, but for future askers, there is now a package available that can do (mostly) this:
The {redoc} package can output to Word, and by storing the R code internally within the Word document, it can also dedoc() a Word file back into RMarkdown. It uses the Critic Markup syntax discussed in another answer.
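In case the CriticMarkup syntax is unfamiliar, here is a small sketch of what tracked changes look like in it: {++ ++} for additions, {-- --} for deletions and {~~ ~> ~~} for substitutions. This is not how redoc works internally; it is a hypothetical helper (`to_critic_markup`) that compares two strings word by word with Python's difflib.

```python
import difflib


def to_critic_markup(original, revised):
    """Express word-level differences between two strings in CriticMarkup."""
    old_words, new_words = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = " ".join(old_words[i1:i2])
        new = " ".join(new_words[j1:j2])
        if tag == "equal":
            pieces.append(old)
        elif tag == "insert":
            pieces.append("{++" + new + "++}")
        elif tag == "delete":
            pieces.append("{--" + old + "--}")
        else:  # "replace": old words swapped for new ones
            pieces.append("{~~" + old + "~>" + new + "~~}")
    return " ".join(pieces)


print(to_critic_markup("The effect was large.", "The effect was moderate."))
# The effect was {~~large.~>moderate.~~}
```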