U-SQL OcrExtractor not working

I am trying to extract text from a JPG file using OcrExtractor. I get the error: cannot implicitly convert type Cognition.Vision.OcrExtractor to Microsoft.Analytics.Interfaces.IProcessor

The Processor has been changed into an Extractor and a new Processor has been introduced. See here for details: https://github.com/Azure/AzureDataLake/blob/master/docs/Release_Notes/2018/2018_Winter/USQL_Release_Notes_2018_Winter.md#u-sql-cognitive-library-additions

Related

BlueSky Statistics - Character Encoding Problem

I am loading a data set whose characters are encoded in ISO 8859-9 ("Latin 5") on Windows 10 (Microsoft has assigned code page 28599, a.k.a. Windows-28599, to ISO-8859-9 in Windows).
The data set is originally in Excel.
Whenever I run an analysis, or any other operation, on a variable whose name contains a character specific to this code page (ISO 8859-9), I get an error like:
Error: undefined columns selected
BSkyFreqResults <- BSkyFrequency(vars = c("MesleÄŸi"), data = Turnudep_raw_data_5)
Error: object 'BSkyFreqResults' not found
BSkyFormat(BSkyFreqResults)
The two characters ÄŸ in "MesleÄŸi" are originally a single Turkish character, ğ (g with a breve).
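The garbling is consistent with UTF-8 bytes being decoded with a single-byte Windows code page. Below is a minimal Python sketch of that mechanism, assuming Windows-1252 is the wrong code page involved. Which code page BlueSky actually applies internally is not confirmed here.

```python
# Illustrative only: shows how one character can turn into two.
# Assumption: the UTF-8 bytes are being decoded as Windows-1252.
original = "Mesleği"

utf8_bytes = original.encode("utf-8")   # "ğ" (U+011F) becomes the two bytes C4 9F
garbled = utf8_bytes.decode("cp1252")   # each byte is read as its own character

print(garbled)

# The damage is reversible as long as no byte was lost along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If this is what happens, the column name stored in the data set and the name passed to the analysis function no longer match, which would fit the "undefined columns selected" error.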
Variable names that contain only letters from the US code page work normally in BlueSky operations.
Using Save As in Excel with the web option set to UTF-8 does not help. Exporting to a CSV file does not work either, whether the file is saved as is or as UTF-8.
How can I load this data into BlueSky so that it works?
This same data set works in RStudio:
> Sys.getlocale('LC_CTYPE')
[1] "Turkish_Turkey.1254"
And also in SPSS:
Language is set to Unicode
It also works in Jamovi
I also get an error when I start BlueSky, which may be relevant to this problem:
Python-CFFI error
From cffi callback <function _consolewrite_ex at 0x000002A36B441F78>:
Traceback (most recent call last):
File "rpy2\rinterface_lib\callbacks.py", line 132, in _consolewrite_ex
File "rpy2\rinterface_lib\conversion.py", line 133, in _cchar_to_str_with_maxlen
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 15: invalid start byte
Since then I re-downloaded and re-installed BlueSky, but I still get this Python-CFFI error every time I start the software.
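The byte named in the traceback can be reproduced in isolation. This is a small Python sketch, assuming the console text reaching rpy2 is encoded in the Turkish Windows code page 1254. That assumption rests on the locale reported by RStudio above, not on anything the traceback itself states.

```python
# Assumption: the offending text is in Windows-1254, where byte 0xFC is "ü".
raw = b"\xfc"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print(err.reason)  # 0xFC can never start a UTF-8 sequence

# Decoding with the Turkish code page succeeds.
as_turkish = raw.decode("cp1254")
print(as_turkish)
```

Under that assumption, any localized console message containing "ü" would trigger the error at startup, before a data set is even loaded.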
I want to work with BlueSky and will appreciate any help in resolving this problem.
Thanks in advance
Here is a link for reproducing the problem.
The zip file contains a data source of 2 cases both in Excel and BlueSky format, a BlueSky Markdown file to show how the error is produced and an RMarkdown file for redundancy (probably useless).
UPDATE: The Python error (Python-CFFI error) appears to be related to the Region settings in Windows.
If the region is USA (Turnudep_reprex_Windows_Region_USA-Settings.jpg), the Python error does NOT appear.
If the region is Turkey (Turnudep_reprex_Windows_Region_Turkey-Settings.jpg), the Python error DOES appear.
Unfortunately, setting the region and language to USA eliminates the Python error message but not the other problem. All operations with the Turkish variable names still end in an error.
This may be a problem that only the BlueSky developers can solve ...
Any help or suggestion will be greatly appreciated.
UPDATE FOR VERSION 10.2: The Python error (Python-CFFI error) is eliminated in this version. All other problems persist. I also notice that I cannot rename variables whose names contain characters outside the US code page. For example, if a variable is named "HastaNo", I can run analyses with it and rename it in the editor. If a variable is named "Mesleği", I cannot run analyses with it AND I CANNOT CHANGE THAT NAME in the editor to "Meslegi" or anything else that would make it usable in analysis.
UPDATE FOR VERSION: BlueSky Statistics Version 10.2.1, R package version 8.70
No change from Version 10.2. Variable names that contain a non-ASCII character cause an error AND cannot be changed in BlueSky Statistics.
For version 10, according to user manual chapter 15.1.3 you can adjust the encoding setting. (answer has been edited for more clarity)

TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got NoneType)

When I try to generate an interview from a DOCX file in Docassemble, it raises the error:
TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got NoneType)
The problem is in the DOCX file. Create a new DOCX file in Windows and insert just one field in it as a test. If that works, add the other fields.
There's actually an open pull request to fix this bug in docxcompose, an upstream dependency of Docassemble. For a specific DOCX file, you should be able to fix it by opening the file in Word and then saving, as it appears to be an issue with very small errors that Word automatically fixes on save.
https://github.com/4teamwork/docxcompose/pull/58

Locate invalid character causing error in R xmlToDataFrame()

For background I am very new to R, and have almost no experience with XML files.
I wrote a webscraper using the RSelenium package that downloads XML files for multiple states and years from this website, and then wrote code that reads in each file and appends it to one file and exports a CSV. My webscraper successfully downloads all of the files I need, and the next segment of code is able to successfully read all but two of the downloaded xml files.
The first file that I am unable to read into an R dataframe can be retrieved by selecting the following options on this page: http://www.slforms.universalservice.org/DRT/Default.aspx
Year=2013
State=PA
Click radio button for "XML Feed"
Click checkbox for "select data points"
Click checkbox for "select all data points"
Click "build data file"
I try to read the resulting XML file into R using xmlToDataFrame:
install.packages("XML")
require("XML")
data_table<-xmlToDataFrame("/users/datafile.xml")
When I do, I get an error:
xmlParseCharRef: invalid xmlChar value 19
Error: 1: xmlParseCharRef: invalid xmlChar value 19
The other examples I've seen of invalid character errors using xmlToDataFrame usually give two coordinates for the problematic character, but since only the value "19" is given, I'm not sure how to locate the problematic character.
Once I do find the invalid character, would there be a way to alter the text of the xml file directly to escape the invalid character, so that xmlToDataFrame will be able to read in the altered file?
It's a bad encoding on this line of XML:
31 to $26,604.98 to remove: the ineligible entity MASTERY CHARTER SCHOOLS 
but the document seems to have other encoding issues as well.
The TSV works fine, so you might think about using that instead.
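If the XML feed is still needed, one way to find the offending spot is to scan the file for numeric character references whose value XML 1.0 does not allow (value 19 is a control character outside the permitted set). The Python sketch below does that and can also strip such references. The function names and the sample string are made up for illustration, and since the document seems to have other encoding issues, this may not be enough on its own.

```python
import re

# Matches decimal (&#19;) and hexadecimal (&#x13;) character references.
CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def is_valid_xml_char(code):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )

def ref_code(match):
    """Numeric value of a matched character reference."""
    if match.group(1) is not None:
        return int(match.group(1), 16)
    return int(match.group(2))

def find_invalid_char_refs(text):
    """Return (line, column, reference) for each invalid reference, 1-based."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in CHAR_REF.finditer(line):
            if not is_valid_xml_char(ref_code(m)):
                hits.append((line_no, m.start() + 1, m.group(0)))
    return hits

def strip_invalid_char_refs(text):
    """Remove invalid character references, leaving valid ones untouched."""
    return CHAR_REF.sub(
        lambda m: m.group(0) if is_valid_xml_char(ref_code(m)) else "", text
    )

# Hypothetical sample, not taken from the real feed.
sample = "<a>ok &#38; fine</a>\n<b>bad &#19; here</b>"
hits = find_invalid_char_refs(sample)
cleaned = strip_invalid_char_refs(sample)
print(hits)
```

To apply it to the downloaded file, read the file as text, write the cleaned string to a new file, and point xmlToDataFrame at that copy.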

read.bib gives lex fatal error - end of buffer missed, {bibtex} package

I am attempting to create a script which will distribute a number of pdfs into a folder tree according to tags. I have the file metadata (including filepath) in a bibtex format. I have tried a number of work-arounds to import the metadata, but so far have been unable to get the filepath, year, title, and tags into a single data frame.
When I try to import using read.bib (which seems the simplest solution) I get the following error:
dbase_full <- read.bib("C:/Users/WILIAM-PLAN/Desktop/My Collection 23 07.bib")
Error in read.bib("C:/Users/WILIAM-PLAN/Desktop/My Collection 23 07.bib") :
lex fatal error:
fatal flex scanner internal error--end of buffer missed
I have looked up the error, but the language of the 'under the hood' part of the {bibtex} package (lex scanners etc.) is beyond me.
Is there a quick fix for this error?
If not, is there another way to get the file metadata from bibtex into a dataframe?
I had the same problem.
The cause is that some fields in the .bib file (such as abstract) can contain very long lines.
You need to split and wrap those lines.
I hope this is useful.
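The splitting and wrapping can be scripted rather than done by hand. The Python sketch below re-wraps any overlong line at whitespace. The function name and the 80-character width are arbitrary choices for illustration, not limits documented by the {bibtex} package.

```python
import textwrap

def wrap_long_lines(bib_text, width=80):
    """Re-wrap any line longer than `width`, leaving shorter lines untouched."""
    out = []
    for line in bib_text.split("\n"):
        if len(line) <= width:
            out.append(line)
        else:
            # Break only at whitespace so words and braces stay intact.
            out.extend(
                textwrap.wrap(
                    line,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
    return "\n".join(out)

# Hypothetical entry with one very long abstract line.
sample = "@article{key,\n  abstract = {" + "word " * 60 + "end},\n}"
wrapped = wrap_long_lines(sample)
print(wrapped)
```

Fields that hold file paths containing spaces may be better left unwrapped, since a line break inside a path could change how it is read back. It is safer to write the result to a copy of the .bib file and import that copy with read.bib.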

Converting .obj file to .sdkmesh using MeshConvert tool

The DirectX SDK provides a utility that converts .x or .obj files to .sdkmesh, called the MeshConvert tool.
But I have tried many times and it is not working.
It shows this message "Cannot Load specified input file"
(I input a .obj file named samp.obj and typed "meshconvert /sdkmesh samp.obj".)
Can anyone please help me solve it?
P.S. I'm on Windows 7 and entering the above command in the DirectX SDK Command Prompt.
Thanks in advance!
I have read that the .obj file type listed by MeshConvert.exe refers to a binary form of .x, not the popular Wavefront Object Model format. I'm still looking for a way to do this myself.
