Plone: Folderish content type generated with Dexterity. How do I populate its file, image and rich text fields?

I have made a new content type using Dexterity. I now wish to create content from a Python script. All is well with the line below: the item is generated in the target folder with the correct id and date. But how do you pass the file data to the file field, the image data to the image field and the rich text data to the rich_text field?
target.invokeFactory(type_name="content_type_name", id=id, date=date, file=file, image=image, rich_text=rich_text)
The date I could figure out; Dexterity wants a Python datetime object:
datetime.datetime(2011,1,1)
Thank you very much for your help - I am sure I am missing something quite elementary here, but haven't found it - probably because I am looking in the wrong place.

For the file field use plone.namedfile.NamedFile, for the image field use plone.namedfile.NamedImage, and for rich text use plone.app.textfield.value.RichTextValue.
For example:
from plone.namedfile import NamedFile
from plone.namedfile import NamedImage
from plone.app.textfield.value import RichTextValue
file = NamedFile("<html></html>", "text/html", u"text.html")
# hypothetical path: any binary image data in a byte string works here
with open("logo.gif", "rb") as f:
    logo = f.read()
image = NamedImage(logo, filename="logo.gif")
rich_text = RichTextValue(u"<p>A paragraph</p>", 'text/html',
                          'text/x-html-safe', 'utf-8')
target.invokeFactory(type_name="content_type_name", id=id, date=date, file=file,
                     image=image, rich_text=rich_text)
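If the file and image data come from files on disk, the byte string, content type and filename that NamedFile(data, contentType, filename) takes can be gathered with the standard library alone. A minimal sketch; load_file_for_namedfile is a hypothetical helper name, and the MIME type is only guessed from the file extension:

```python
import mimetypes
import os
from typing import Tuple


def load_file_for_namedfile(path: str) -> Tuple[bytes, str, str]:
    """Return (data, content_type, filename) for a file on disk.

    Hypothetical helper: the tuple is meant to be passed on as
    NamedFile(data, contentType=content_type, filename=filename).
    """
    with open(path, "rb") as f:
        data = f.read()
    # Guess the MIME type from the extension; fall back to a generic binary type.
    content_type, _encoding = mimetypes.guess_type(path)
    if content_type is None:
        content_type = "application/octet-stream"
    # Only the base name is stored as the file name, not the full path.
    return data, content_type, os.path.basename(path)
```

Usage would then look like `data, ctype, fname = load_file_for_namedfile(path)` followed by `NamedImage(data, contentType=ctype, filename=fname)`.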


Not able to access certain JSON properties in Autoloader

I have a JSON file that is loaded by two different Autoloaders.
One uses schema evolution and, apart from replacing spaces in the JSON property names, writes the JSON directly to a Delta table, and I can see all the values are there properly.
In the second one I am mapping to a defined schema and only use a subset of properties. So I use a lot of withColumn calls and then a select to narrow down to my defined column list.
Autoloader definition:
df = (spark
.readStream
.format('cloudFiles')
.option('cloudFiles.format', 'json')
.option('multiLine', 'true')
.option('cloudFiles.schemaEvolutionMode','rescue')
.option('cloudFiles.includeExistingFiles','true')
.option('cloudFiles.schemaLocation', bronze_schema)
.option('cloudFiles.inferColumnTypes', 'true')
.option('pathGlobFilter','*.json')
.load(upload_path)
.transform(lambda df: remove_spaces_from_columns(df))
.withColumn(...
Writer:
df.writeStream.format('delta') \
.queryName(al_stream_name) \
.outputMode('append') \
.option('checkpointLocation', checkpoint_path) \
.option('mergeSchema', 'true') \
.trigger(once = True) \
.table(bronze_table)
The issue is that some of the source columns load fine and I get their values, while others are always null in the output table.
For example:
.withColumn('vl_rating', col('risk_severity.value')) # works
.withColumn('status', col('status.name')) # always null
...
.select(
'rating',
'status',
...
The JSON is quite simple: these are all string values, and they are always populated. The same code works against another similar JSON file in another Autoloader without issue.
I have run out of ideas for fault-finding on this. My imports are minimal, and outside of Autoloader the JSON loads fine.
For example:
%python
import pyspark.sql.functions as psf
jsontest = spark.read.option('inferSchema','true').json('dbfs:....json')
df = jsontest.withColumn('status', psf.col('status.name')).select('status')
display(df)
This returns the values of the status.name property in the JSON file.
Any ideas would be greatly appreciated.
I have found, broadly, what is causing this. Interesting cause!
I am scanning a whole directory of JSON files, and the schema evolves over time (as expected). But when I clear out the Autoloader schema and checkpoint directories and only scan the latest JSON file, it all works correctly.
So my guess is that something in schema evolution with the older JSON files puts Autoloader into a state where it no longer passes certain properties through the stream to the writer.
If anyone has recommendations on how to implement some data quality analysis in an Autoloader, I would be most appreciative if you would share them.
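A plausible explanation, not confirmed in the thread: according to the Databricks documentation, the 'rescue' schema evolution mode never evolves the stored schema, and data that does not fit it is written to the _rescued_data column instead of failing the stream. If the schema inferred from the older files typed status differently (say, as a plain string) or lacked status.name, those values could land in _rescued_data and the selected column would read as null; rows where _rescued_data is not null are therefore worth monitoring as a data quality signal. To find such drift outside Spark, a minimal stdlib sketch, assuming each file holds a single JSON object (as with multiLine) and that a local copy of the files is available; leaf_types and schema_drift are hypothetical names:

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterator, Set, Tuple


def leaf_types(obj, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, type_name) for every leaf value of a parsed JSON object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from leaf_types(value, path)
    else:
        # Lists and scalars are treated as leaves in this sketch.
        yield prefix, type(obj).__name__


def schema_drift(directory: str) -> Dict[str, Dict[str, Set[str]]]:
    """Map each dotted path to {type_name: {file names}} across all *.json files.

    A path that occurs with more than one type, or only in some files,
    is a candidate for the nulls seen after schema inference.
    """
    report = defaultdict(lambda: defaultdict(set))
    for file in sorted(Path(directory).glob("*.json")):
        with open(file, encoding="utf-8") as f:
            doc = json.load(f)
        for path, type_name in leaf_types(doc):
            report[path][type_name].add(file.name)
    return {path: dict(types) for path, types in report.items()}
```

For example, if status is a string in older files but an object with a name field in newer ones, the report lists both status and status.name, each tied to a different set of files.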

How to find a file with Python when its name contains a non-ASCII character (ö)

I'm working on an MTG auto-sorter, and some of the cards have unusual names that Python doesn't seem to find. I'm looking for a file (which I know is in the right place) called 8_Jötun_Grunt.png, using this:
for card_name in card_names:
    # Fetch the image - name can be found based on the card's information
    card_info['name'] = card_name
    img_name = '%s/card_img/png/%s/%s_%s.png' % (Config.data_dir, card_info['set'],
                                                 card_info['collector_number'],
                                                 fetch_data.get_valid_filename(card_info['name']))
    card_img = cv2.imread(img_name)
    # If the image doesn't exist, download it from the URL
    if card_img is None:
        fetch_data.fetch_card_image(card_info,
                                    out_dir='%s/card_img/png/%s' % (Config.data_dir, card_info['set']))
        card_img = cv2.imread(img_name)
    if card_img is None:
        print('WARNING: card %s is not found!' % img_name)
I get an error in cmd when this runs.
This leads me to think that it can't recognize the file name, but I'm reading the name from a database that I can't change. Any ideas?
I wouldn't be surprised if OpenCV couldn't handle file paths with Unicode characters.
You could try adding the code from the answer to this SO question.
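One commonly suggested workaround (an assumption here, since the linked answer is not included) is to open the file with Python's own file API, which accepts Unicode paths, and decode the bytes with OpenCV instead of calling cv2.imread on the path. The sketch below also compares names after NFC Unicode normalization, because "ö" can be stored either as a single code point or as "o" plus a combining diaeresis, and the two spellings do not compare equal. find_file_normalized and read_image_bytes are hypothetical helper names:

```python
import os
import unicodedata
from typing import Optional


def find_file_normalized(directory: str, wanted_name: str) -> Optional[str]:
    """Return the path of the entry in `directory` whose name equals
    `wanted_name` after NFC normalization, or None if nothing matches."""
    target = unicodedata.normalize("NFC", wanted_name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return os.path.join(directory, entry)
    return None


def read_image_bytes(directory: str, wanted_name: str) -> Optional[bytes]:
    """Read the raw file bytes through Python's file API, which handles
    Unicode paths, instead of letting cv2.imread open the path itself."""
    path = find_file_normalized(directory, wanted_name)
    if path is None:
        return None
    with open(path, "rb") as f:
        return f.read()
```

The returned bytes can then be decoded with `cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)`, which, like cv2.imread, returns None if decoding fails.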

Save an Excel sheet as PDF programmatically through PowerBuilder

There is a requirement to save an Excel sheet as a PDF file programmatically through PowerBuilder (PowerBuilder 12.5.1).
I run the code below; however, I am not getting the right results. Please let me know if I should do something different.
OLEObject ole_excel;
ole_excel = create OLEObject;
IF ( ole_excel.ConnectToObject(ls_DocPath) = 0 ) THEN
ole_excel.application.activeworkbook.SaveAs(ls_DocPath,17);
ole_excel.application.activeworkbook.ExportAsFixedFormat(0,ls_DocPath);
END IF;
....... (Parsing values from excel)
DESTROY ole_excel;
I have searched through this community and others for a solution but no luck so far. I tried using two different commands that I found during this search. Both of them return a null object reference error. It would be great if someone can point me in the right direction.
It looks to me like you need a reference to the 'activeworkbook'. This would be of type OLEObject, so the declaration would be similar to: OLEObject lole_workbook.
Then you need to set this to the active workbook. Look in the Excel VBA help for how the active workbook is obtained (Application.ActiveWorkbook). In PB you would then need to do something like
lole_workbook = ole_excel.application.activeworkbook
This gives PB a reference to the active workbook. Then call SaveAs etc. on it, like this: lole_workbook.SaveAs(ls_DocPath, 17)
The Workbook.SaveAs() documentation says that SaveAs() has the following parameters:
SaveAs(Filename, FileFormat, Password, WriteResPassword, ReadOnlyRecommended, CreateBackup, AccessMode, ConflictResolution, AddToMru, TextCodepage, TextVisualLayout, Local)
We need the first two parameters:
FileName - full path with file name and extension, for instance: c:\myfolder\file.pdf
FileFormat - a predefined constant that represents the target file format.
According to Google (Microsoft does not list a PDF constant in the XlFileFormat enumeration), the FileFormat value for PDF is 57.
So try the following call:
ole_excel.application.activeworkbook.SaveAs(ls_DocPath, 57);

How to run boilerpipe's ArticleExtractor and get document statistics?

There's something I'm not quite understanding about the use of boilerpipe's ArticleExtractor class. That said, I am also very new to Java, so perhaps my basic knowledge of this environment is at fault.
Anyhow, I'm trying to use boilerpipe to extract the main article from some raw HTML source I have collected. The HTML source is stored in a java.lang.String variable (let's call it htmlstr) that holds the raw HTML contents of a webpage.
I know how to run boilerpipe to print the extracted text to the output window as follows:
java.lang.String htmlstr = "<!DOCTYPE.... ****html source**** ... </html>";
java.lang.String article = ArticleExtractor.INSTANCE.getText(htmlstr);
System.out.println(article);
However, I'm not sure how to run BP by first instantiating an instance of the ArticleExtractor class, then calling it with the 'TextDocument' input datatype. The TextDocument datatype is itself somehow constructed from BP's 'TextBlock' datatype, and perhaps I am not doing this correctly...
What is the proper way to construct a TextDocument type variable from my htmlstr string variable?
So my problem is using the process method of BP's ArticleExtractor class, as opposed to just calling its getText method as in the example above. In other words, I'm not sure how to use the
ArticleExtractor.process(TextDocument doc);
method.
It is my understanding that one must run this ArticleExtractor process method first, so that the same "TextDocument doc" variable can then be used for getting document stats with BP's
TextDocumentStatistics(TextDocument doc, boolean contentOnly)
constructor? I would like to use the stats to estimate how good the filtering was.
Any code examples someone could help me out with?
The code below is written in Jython (conversion to Java should be straightforward).
1) How to get a TextDocument from an HTML string:
import de.l3s.boilerpipe.sax.HTMLDocument as HTMLDocument
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput as BoilerpipeSAXInput
import de.l3s.boilerpipe.extractors.ArticleExtractor as ArticleExtractor
import de.l3s.boilerpipe.document.TextDocumentStatistics as TextDocumentStatistics
htmlDoc = HTMLDocument(rawHtmlString)
inputSource = htmlDoc.toInputSource()
boilerpipeSaxInput = BoilerpipeSAXInput(inputSource)
textDocument = boilerpipeSaxInput.getTextDocument()
2) How to process the TextDocument using ArticleExtractor (continued from above):
content = ArticleExtractor.INSTANCE.getText(textDocument)
getText(textDocument) calls ArticleExtractor.INSTANCE.process(textDocument) and then returns textDocument.getContent(), so calling process() yourself followed by getContent() is equivalent.
3) How to get the TextDocumentStatistics (continued from above):
process() has already marked each TextBlock of textDocument as content or non-content, so the statistics can be computed on the same document:
content_stats = TextDocumentStatistics(textDocument, True)  # True = article content blocks only
all_stats = TextDocumentStatistics(textDocument, False)  # all blocks, for comparison
print content_stats.getNumWords(), content_stats.avgNumWords()
Note: the Javadoc that accompanies the boilerpipe 1.2 jar is useful for further reference.
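To make the idea behind content-only statistics concrete, here is a small Python model. It is not boilerpipe's code: Block is a hypothetical stand-in for TextBlock, words are counted by whitespace splitting (boilerpipe tokenizes differently), and an empty selection returns 0.0 by choice here. Comparing the average over content blocks with the average over all blocks gives a rough sense of how much short boilerplate the extractor discarded:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical, simplified stand-in for boilerpipe's TextBlock."""
    text: str
    is_content: bool

    @property
    def num_words(self) -> int:
        # Whitespace tokenization; boilerpipe's own word counting differs.
        return len(self.text.split())


def avg_words_per_block(blocks: List[Block], content_only: bool) -> float:
    """Average words per block, optionally over content blocks only,
    mirroring the idea of TextDocumentStatistics(doc, contentOnly)."""
    selected = [b for b in blocks if b.is_content or not content_only]
    if not selected:
        return 0.0
    return sum(b.num_words for b in selected) / len(selected)
```

With a three-word navigation block and two article blocks of seven and five words, the content-only average is 6.0 and the all-blocks average is 5.0.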

Unicode errors after upgrade to 2.1.0 final

I've recently upgraded a DjangoCMS project from 2.1.0beta3 to 2.1.0 final, and I've started getting Unicode errors during page editing. There was a large volume of production content that was migrated forward with South. I get the error when, using TinyMCE, I try to insert another plugin, such as an image, into a text plugin, or when I try to add a plugin to a placeholder.
URL:
/admin/cms/page/188/edit-plugin/673/edit-plugin/676/
Stack Trace:
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/template/__init__.py", line 849, in render
return _render_value_in_context(output, context)
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/template/__init__.py", line 829, in _render_value_in_context
value = force_unicode(value)
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/utils/encoding.py", line 88, in force_unicode
raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128). You passed in <django.forms.forms.BoundField object at 0xb73cedec> (<class 'django.forms.forms.BoundField'>)
If I repeatedly try to create the plugin, the ID (676 here) increments,
so it looks like the error happens when the form is rendered. This
affects link, picture, and teaser plugins, but not text, file or
snippet plugins.
I'd appreciate any help in isolating the cause here.
Thanks,
Michael
This is a bit of a late answer, but I had some problems with Unicode and solved them by declaring the source code encoding; see also http://evanjones.ca/python-utf8.html
I put
# -*- coding: utf-8 -*-
at the top of the offending files and everything was sorted.
It turns out that this was a data migration issue. The ultimate solution was to force UTF-8 encoding in the relevant MySQL tables using commands like:
alter table cms_page convert to character set utf8;
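Reading the error message helps confirm this diagnosis: byte 0xc3 is the lead byte of the two-byte UTF-8 encoding of characters U+00C0 to U+00FF (é, ö, ü and so on), so the bytes were UTF-8 but were being decoded as ASCII. A coding declaration only tells Python how to read the literals in that source file; it does not change how data coming from the database is decoded. A small Python 3 sketch (find_non_ascii is a hypothetical helper) that locates the first byte ASCII rejects and confirms the data decodes as UTF-8:

```python
from typing import Optional, Tuple


def find_non_ascii(raw: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (position, byte_value, utf8_text) for the first byte the ASCII
    codec rejects, or None if `raw` is pure ASCII.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8 either.
    """
    try:
        raw.decode("ascii")
        return None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte, as in "position 12".
        return exc.start, raw[exc.start], raw.decode("utf-8")
```

For instance, `"Hello world é".encode("utf-8")` fails ASCII decoding at position 12 with byte 0xc3, the same shape as the error in the question, while decoding it as UTF-8 succeeds.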
