plone.memoize: cache depending on a function's return value

I'm trying to cache the return value of a function, but only when it is not None.
In the following example, if someFunction managed to obtain data from some-url, it makes sense to cache the result for an hour.
If the data could not be obtained, caching the result for an hour (or more) makes no sense; five minutes is more appropriate, so the server for some-domain.com has some time to recover.
import time
import socket
import urllib2

from plone.memoize import ram

def _cachekey(method, self, lang):
    return (lang, time.time() // (60 * 60))

@ram.cache(_cachekey)
def someFunction(self, lang='en'):
    try:
        data = urllib2.urlopen('http://some-url.com/data.txt', timeout=10).read()
    except socket.timeout:
        data = None
    except urllib2.URLError:
        data = None
    return expensive_compute(data)
Calling method(self, lang) inside _cachekey to inspect the result would not make much sense, since that would run the expensive download on every cache lookup.

As this code would be too long for a comment, I'll post it here in the hope that it helps others:
# initialize the cache
import time
import socket
import urllib2

from zope.app.cache import ram

my_cache = ram.RAMCache()
my_cache.update(maxAge=3600, maxEntries=20)

_marker = object()

def _cachekey(lang, minutes):
    # the key changes every `minutes` minutes, so older entries
    # simply become unreachable and eventually expire
    return {'lang': lang, 'period': time.time() // (60 * minutes)}

def someFunction(self, lang='en'):
    # look for a result cached within the last hour (successful download)
    # or within the last five minutes (failed download)
    cached_result = my_cache.query('someFunction', _cachekey(lang, 60), _marker)
    if cached_result is _marker:
        cached_result = my_cache.query('someFunction', _cachekey(lang, 5), _marker)
    if cached_result is not _marker:
        return cached_result
    # not found: download, compute and add to cache
    try:
        data = urllib2.urlopen('http://some-url.com/data.txt', timeout=10).read()
    except socket.timeout:
        data = None
    except urllib2.URLError:
        data = None
    if data is not None:
        # cache the computed value for 1 hour
        computed = expensive_compute(data)
        my_cache.set(computed, 'someFunction', _cachekey(lang, 60))
    else:
        # cache the failure for 5 minutes, so the download server has
        # time to recover instead of being hit on every page load
        computed = None
        my_cache.set(None, 'someFunction', _cachekey(lang, 5))
    return computed

In this case you cannot generalize "cache unless the result is None": results cached by the decorator can depend only on the input values, never on the return value.
Instead, build the caching mechanism inside your function rather than relying on a decorator.
This then becomes a generic, non-Plone-specific Python problem of how to cache values.
Here is an example of how to build manual caching using RAMCache:
https://developer.plone.org/performance/ramcache.html#using-custom-ram-cache

Related

Interrupt ZStream mapMPar processing

I have the following code which, because of Excel max row limitations, is restricted to ~1million rows:
ZStream.unwrap(generateStreamData).mapMPar(32) { m =>
  streamDataToCsvExcel
}
All fairly straightforward and it works perfectly. I keep track of the number of rows streamed, and then stop writing data. However I want to interrupt all the child fibers spawned in mapMPar, something like this:
ZStream.unwrap(generateStreamData).interruptWhen(effect.true).mapMPar(32) { m =>
  streamDataToCsvExcel
}
Unfortunately the process is interrupted immediately here. I'm probably missing something obvious...
Editing the post as it needs some clarity.
My stream of data is generated by an expensive process in which data is pulled from a remote server, (this data is itself calculated by an expensive process) with n Fibers.
I then process the streams and then stream them out to the client.
Once the processed row count has reached ~1 million, I then need to stop pulling data from the remote server (i.e. interrupt all the Fibers) and end the process.
Here's what I can come up with after your clarification. The ZIO 1.x version is a bit uglier because of the lack of .dropRight.
Basically, we can use takeUntilM to count the size of the elements we've seen and stop once we reach the maximum size (and then use .dropRight, or the additional filter, to discard the last element that took it over the limit).
This ensures that:
- streamDataToCsvExcel only runs up to the last possible message before hitting the size limit, and
- because streams are lazy, expensiveQuery only runs for as many messages as fit within the limit (or N+1 if the last value is discarded because it would go over the limit).
import zio._
import zio.stream._

object Main extends zio.App {
  override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] = {
    val expensiveQuery = ZIO.succeed(Chunk(1, 2))
    val generateStreamData = ZIO.succeed(ZStream.repeatEffect(expensiveQuery))
    def streamDataToCsvExcel = ZIO.unit

    def count(ref: Ref[Int], size: Int): UIO[Boolean] =
      ref.updateAndGet(_ + size).map(_ > 10)

    for {
      counter <- Ref.make(0)
      _ <- ZStream
        .unwrap(generateStreamData)
        .takeUntilM(next => count(counter, next.size)) // Count size of messages and stop when it's reached
        .filterM(_ => counter.get.map(_ <= 10)) // Filter last message from `takeUntilM`. Ideally should be .dropRight(1) with ZIO 2
        .mapMPar(32)(_ => streamDataToCsvExcel)
        .runDrain
    } yield ExitCode.success
  }
}
If relying on the laziness of streams doesn't work for your use case, you can trigger an interrupt of some sort from the takeUntilM condition.
For example, you could update the count function to:
def count(ref: Ref[Int], size: Int): UIO[Boolean] =
  ref.updateAndGet(_ + size)
    .map(_ > 10)
    .tapSome { case true => someFiber.interrupt }
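Alternatively (a sketch of mine, not from the original answer): ZStream#interruptWhen also accepts a Promise, so count could complete a promise once the limit is hit, and the stream would then interrupt everything downstream of that point, including the fibers spawned by mapMPar:

def count(ref: Ref[Int], stop: Promise[Nothing, Unit], size: Int): UIO[Boolean] =
  ref.updateAndGet(_ + size).map(_ > 10)
    .tap(limitHit => stop.succeed(()).when(limitHit)) // complete the promise once over the limit

for {
  counter <- Ref.make(0)
  stop    <- Promise.make[Nothing, Unit]
  _ <- ZStream
    .unwrap(generateStreamData)
    .interruptWhen(stop) // interrupts the stream and its fibers once the promise completes
    .takeUntilM(next => count(counter, stop, next.size))
    .mapMPar(32)(_ => streamDataToCsvExcel)
    .runDrain
} yield ExitCode.success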

Google Earth Engine download problems, is this caused by immutable server-side objects?

I have a function that will download an image collection as a TFRecord or a GeoTIFF.
Here's the function:
def download_image_collection_to_drive(collection, aois, bands, limit, export_format):
    if collection.size().lt(ee.Number(limit)):
        bands = [band for band in bands if band not in ['SCL', 'QA60']]
        for aoi in aois:
            cluster = aoi.get('cluster').getInfo()
            geom = aoi.bounds().getInfo()['geometry']['coordinates']
            aoi_collection = collection.filterMetadata('cluster', 'equals', cluster)
            for ts in range(1, 11):
                print(ts)
                ts_collection = aoi_collection.filterMetadata('interval', 'equals', ts)
                if ts_collection.size().eq(ee.Number(1)):
                    image = ts_collection.first()
                    p_id = image.get("PRODUCT_ID").getInfo()
                    description = f'{cluster}_{ts}_{p_id}'
                    task_config = {
                        'fileFormat': export_format,
                        'image': image.select(bands),
                        'region': geom,
                        'description': description,
                        'scale': 10,
                        'folder': 'output'
                    }
                    if export_format == 'TFRecord':
                        task_config['formatOptions'] = {'patchDimensions': [256, 256], 'kernelSize': [3, 3]}
                    task = ee.batch.Export.image.toDrive(**task_config)
                    task.start()
                else:
                    logger.warning(f'no image for interval {ts}')
    else:
        logger.warning(f'collection over {limit} aborting drive download')
It seems that whenever it gets to the second aoi it fails. I'm confused by this, as the check if ts_collection.size().eq(ee.Number(1)) confirms there is an image there, so it should manage to get the product ID from it.
line 24, in download_image_collection_to_drive
    p_id = image.get("PRODUCT_ID").getInfo()
  File "/lib/python3.7/site-packages/ee/computedobject.py", line 95, in getInfo
    return data.computeValue(self)
  File "/lib/python3.7/site-packages/ee/data.py", line 717, in computeValue
    prettyPrint=False))['result']
  File "/lib/python3.7/site-packages/ee/data.py", line 340, in _execute_cloud_call
    raise _translate_cloud_exception(e)
ee.ee_exception.EEException: Element.get: Parameter 'object' is required.
Am I falling foul of immutable server-side objects somewhere?
This is a server-side value problem, yes, but immutability has nothing to do with it: your if statement isn't working as you intend.
ts_collection.size().eq(ee.Number(1)) is a server-side value; you've described a comparison that hasn't happened yet. That means a local operation such as a Python if statement cannot take the comparison's outcome into account, and will simply treat the object as a true value.
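To illustrate (a minimal example of my own, not from the original post): an ee.ComputedObject is an ordinary, non-empty Python object, so it is always truthy regardless of what it will evaluate to on the server:

import ee

ee.Initialize()

comparison = ee.Number(0).eq(ee.Number(1))  # server-side result is 0 (false)
if comparison:  # but the client-side object is always truthy
    print('this branch always runs')
print(comparison.getInfo())  # prints 0, the actual server-side value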
Using getInfo would be a quick fix:
if ts_collection.size().eq(ee.Number(1)).getInfo():
but it would be more efficient to avoid calling getInfo more than needed, by fetching the entire collection's info just once; that single response already includes the info for every image.
...
ts_collection_info = ts_collection.getInfo()
if ts_collection_info['features']:  # are there any images in the collection?
    image = ts_collection.first()
    image_info = ts_collection_info['features'][0]  # client-side image info, already downloaded
    p_id = image_info['properties']['PRODUCT_ID']  # get the ID from the client-side info
...
This way, you only make two requests per ts: one to check for the match, and one to start the export.
Note that I haven't actually run this Python code, and there might be some small mistakes; if it gives you any trouble, print(ts_collection_info) and examine the structure you actually received to figure out how to interpret it.

How to solve a tkinter memory leak?

I have a dynamic table with a fixed row count (like a FIFO queue), which updates continuously through tkinter's after() function. Inside the table is a Button whose text is editable.
To make the Button's text editable I used BrenBarn's solution and bound the loop variable into the function call via the command attribute.
While update_content_items() cycles, I found that memory usage increases MB by MB, every second. I can confirm that after commenting out the lambda expression the memory leak was gone (as seen live, running top in the terminal).
It seems I have to use the lambda: when I simply used self.list_items[i], the Button got the wrong index and the user edited the wrong row, even though they clicked the right one.
Is there a way to solve this? How can the user click the right button and edit it, with the right index, without the memory leak?
The corresponding code:
def update_content_items(self):
    """
    Continuously fills and updates the table with rows and content.
    The number of table rows is initially fixed by an external value in config.ini.
    :return: nothing
    """
    if len(self.list_items) > self.queueMaxlen:
        self.queueMaxlen = len(self.list_items)
        self.build_table()
    try:
        for i in range(len(self.list_items)):
            item = self.list_items[i]
            self.barcodeImgList[i].image = item.plateimage
            orig_image = Image.open(io.BytesIO(item.plateimage))
            ein_image = ImageTk.PhotoImage(orig_image)
            self.barcodeImgList[i].configure(image=ein_image)
            # keeps a reference, because somehow tkinter forgets it...??? Bug of my implementation???
            self.barcodeImgList[i].image = ein_image
            orig_image = None
            ein_image = None
            # FIXME memory leak?
            self.numberList[i].configure(text=item.number,
                                         command=lambda K=i: self.edit_barcode(self.list_items[K]))
            self.timestampList[i].configure(text=item.timestamp)
            self.search_hitlist[i].config(bg='white', cursor="xterm")
            self.search_hitlist[i].unbind("<Button-1>")
            if item.queryresult is not None:
                if item.queryresult.gesamtstatus != 'Gruen':
                    self.search_hitlist[i].insert(tk.END, item.queryresult.barcode +
                                                  '\n' + item.queryresult.permitlevel)
                    self.search_hitlist[i].configure(bg='red', cursor="hand2")
                    self.search_hitlist[i].bind("<Button-1>", item.url_callback)
                else:
                    self.search_hitlist[i].configure(bg='green', cursor="xterm")
                self.search_hitlist[i].configure(state=tk.DISABLED)
        self.on_frame_configure(None)
        self.canvas.after(10, self.update_content_items)
    except IndexError as ie:
        for number, thing in enumerate(self.list_items):
            print(number, thing)
        raise ie

def edit_barcode(self, item=None):
    """
    Opens the number plate edit dialogue and updates the corresponding list item.
    :param item: as Hit DAO
    :return: nothing
    """
    if item is not None:
        new_item_number = EditBarcodeEntry(self.master.master, item)
        if new_item_number.mynumber != 0:
            item.number = new_item_number.mynumber
            self.list_items.request_work(item, 'update')
            self.list_items.edit_hititem_by_id(item)
            self.parent.master.queryQueue.put(item)
    else:
        print("You shouldn't get here at all. Please see edit_barcode function.")
EDIT: It seems there is indeed a deeper memory leak (in Python itself): the images don't get garbage collected. See "Memory is slowly leaking in Python 3.x" (I do use PIL) and also "Image loading by file name memory leak is not properly fixed".
What can I do, given that I have to cycle through a list of records and update Labels with images? Is there a workaround? PhotoImage has no explicit close() function, and if I call del, the reference is gc'ed and configuring the Label is no longer possible.
An example of my proposed changes, with indentation fixed. Every configure(command=lambda ...) call registers a fresh Tcl callback while the previously registered one is never cleaned up, which is the likely source of the leak; a single bound method avoids creating a new callback object on every refresh cycle:
def update_content_items(self):
    """
    Continuously fills and updates the table with rows and content.
    The number of table rows is initially fixed by an external value in config.ini.
    :return: nothing
    """
    if len(self.list_items) > self.queueMaxlen:
        self.queueMaxlen = len(self.list_items)
        self.build_table()
    try:
        for i in range(len(self.list_items)):
            item = self.list_items[i]
            self.barcodeImgList[i].image = item.plateimage
            orig_image = Image.open(io.BytesIO(item.plateimage))
            ein_image = ImageTk.PhotoImage(orig_image)
            self.barcodeImgList[i].configure(image=ein_image)
            # keeps a reference, because somehow tkinter forgets it...??? Bug of my implementation???
            self.barcodeImgList[i].image = ein_image
            orig_image = None
            ein_image = None
            self.numberList[i].configure(text=item.number)  # removed lambda
            self.numberList[i].bind("<Button-1>", self.edit_barcode_binding)  # added binding
            self.timestampList[i].configure(text=item.timestamp)
            self.search_hitlist[i].config(bg='white', cursor="xterm")
            self.search_hitlist[i].unbind("<Button-1>")
            if item.queryresult is not None:
                if item.queryresult.gesamtstatus != 'Gruen':
                    self.search_hitlist[i].insert(tk.END, item.queryresult.barcode +
                                                  '\n' + item.queryresult.permitlevel)
                    self.search_hitlist[i].configure(bg='red', cursor="hand2")
                    self.search_hitlist[i].bind("<Button-1>", item.url_callback)
                else:
                    self.search_hitlist[i].configure(bg='green', cursor="xterm")
                self.search_hitlist[i].configure(state=tk.DISABLED)
        self.on_frame_configure(None)
        self.canvas.after(10, self.update_content_items)
    except IndexError as ie:
        for number, thing in enumerate(self.list_items):
            print(number, thing)
        raise ie

def edit_barcode_binding(self, event):  # new wrapper for the binding
    K = self.numberList.index(event.widget)  # get the index from the widget list
    self.edit_barcode(self.list_items[K])  # call the original function

def edit_barcode(self, item=None):
    """
    Opens the number plate edit dialogue and updates the corresponding list item.
    :param item: as Hit DAO
    :return: nothing
    """
    if item is not None:
        new_item_number = EditBarcodeEntry(self.master.master, item)
        if new_item_number.mynumber != 0:
            item.number = new_item_number.mynumber
            self.list_items.request_work(item, 'update')
            self.list_items.edit_hititem_by_id(item)
            self.parent.master.queryQueue.put(item)
    else:
        print("You shouldn't get here at all. Please see edit_barcode function.")

Filtering tab completion in input task implementation

I'm currently implementing an SBT plugin for Gatling.
One of its features will be to open the last generated report in a new browser tab from SBT.
As each run can have a different "simulation ID" (basically a simple string), I'd like to offer tab completion on simulation IDs.
An example: running the Gatling SBT plugin will produce several folders (named from the simulation ID plus the date of report generation) in target/gatling, for example mysim-20140204234534, myothersim-20140203124534 and yetanothersim-20140204234534.
Let's call the task lastReport.
If someone starts typing lastReport my, I'd like tab completion to only suggest mysim and myothersim.
Getting the simulation ID is a breeze, but how can I help the parser and filter the suggestions so that it only suggests existing simulation IDs?
To sum up, I'd like to do what testOnly does, in a way: I only want to suggest things that make sense in my context.
Thanks in advance for your answers,
Pierre
Edit: As I got a bit stuck after my latest tries, here is the code of my inputTask, in its current state:
package io.gatling.sbt

import sbt._
import sbt.complete.{ DefaultParsers, Parser }

import io.gatling.sbt.Utils._

object GatlingTasks {

  val lastReport = inputKey[Unit]("Open last report in browser")

  val allSimulationIds = taskKey[Set[String]]("List of simulation ids found in reports folder")

  val allReports = taskKey[List[Report]]("List of all reports by simulation id and timestamp")

  def findAllReports(reportsFolder: File): List[Report] = {
    val allDirectories = (reportsFolder ** DirectoryFilter.&&(new PatternFilter(reportFolderRegex.pattern))).get
    allDirectories.map(file => (file, reportFolderRegex.findFirstMatchIn(file.getPath).get)).map {
      case (file, regexMatch) => Report(file, regexMatch.group(1), regexMatch.group(2))
    }.toList
  }

  def findAllSimulationIds(allReports: Seq[Report]): Set[String] =
    allReports.map(_.simulationId).distinct.toSet

  def openLastReport(allReports: List[Report], allSimulationIds: Set[String]): Unit = {
    def simulationIdParser(allSimulationIds: Set[String]): Parser[Option[String]] =
      DefaultParsers.ID.examples(allSimulationIds, check = true).?

    def filterReportsIfSimulationIdSelected(allReports: List[Report], simulationId: Option[String]): List[Report] =
      simulationId match {
        case Some(id) => allReports.filter(_.simulationId == id)
        case None     => allReports
      }

    Def.inputTaskDyn {
      val selectedSimulationId = simulationIdParser(allSimulationIds).parsed
      val filteredReports = filterReportsIfSimulationIdSelected(allReports, selectedSimulationId)
      val reportsSortedByDate = filteredReports.sorted.map(_.path)
      Def.task(reportsSortedByDate.headOption.foreach(file => openInBrowser((file / "index.html").toURI)))
    }
  }
}
Of course, openLastReport is called using the results of the allReports and allSimulationIds tasks.
I think I'm close to a functioning input task but I'm still missing something...
Def.inputTaskDyn returns a value of type InputTask[T] and doesn't perform any side effects. The result needs to be bound to an InputKey, like lastReport. The return type of openLastReport is Unit, which means that openLastReport will construct a value that will be discarded, effectively doing nothing useful. Instead, have:
def openLastReport(...): InputTask[...] = ...
lastReport := openLastReport(...).evaluated
(Or, the implementation of openLastReport can be inlined into the right hand side of :=)
You probably don't need inputTaskDyn, but just inputTask. You only need inputTaskDyn if you need to return a task. Otherwise, use inputTask and drop the Def.task.
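Putting both points together, a rough sketch (untested, reusing simulationIdParser, filterReportsIfSimulationIdSelected and openInBrowser from the question) could look like this:

def openLastReport(allReports: List[Report], allSimulationIds: Set[String]): Def.Initialize[InputTask[Unit]] =
  Def.inputTask {
    val selectedSimulationId = simulationIdParser(allSimulationIds).parsed
    val filteredReports = filterReportsIfSimulationIdSelected(allReports, selectedSimulationId)
    val reportsSortedByDate = filteredReports.sorted.map(_.path)
    reportsSortedByDate.headOption.foreach(file => openInBrowser((file / "index.html").toURI))
  }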

Geb: Waiting/sleeping between tests

Is there a way to wait a set amount of time between tests? I need a solution to compensate for server lag. When creating a record, it takes a little bit of time before the record is searchable in my environment.
In the following code example, how would I wait 30 seconds between the first and second tests, with no wait time between the second and third tests?
class MySpec extends GebReportingSpec {

    // First Test
    def "should create a record named myRecord"() {
        given:
        to CreateRecordsPage

        when:
        name_field = "myRecord"

        and:
        saveButton.click()

        then:
        at IndexPage
    }

    // Second Test
    def "should find record named myRecord"() {
        given:
        to SearchPage

        when:
        search_query = "myRecord"

        and:
        searchButton.click()

        then:
        // haven't figured this part out yet, but would look for "myRecord" on the results page
    }

    // Third Test
    def "should delete the record named myRecord"() {
        // do the delete
    }
}
You probably don't want to wait a set amount of time - it will make your tests slow. You would ideally want to continue as soon as the record is added. You can use Geb's waitFor {} to poll for a condition to be fulfilled.
// Second Test
def "should find record named myRecord"() {
    when:
    to SearchPage

    then:
    waitFor(30) {
        search_query = "myRecord"
        searchButton.click()
        // verify that the record was found
    }
}
This will poll every half a second, for up to 30 seconds, for the condition to be fulfilled, passing as soon as it is and failing if it's still not fulfilled after 30 seconds.
To see what options you have for setting the waiting time and interval, have a look at the section on waiting in The Book of Geb. You might also want to check out the section on implicit assertions in waitFor blocks.
If your second feature method depends on the success of the first one, you should probably consider annotating this specification with @Stepwise.
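For illustration, a minimal sketch (keeping the feature methods exactly as in the question):

import spock.lang.Stepwise

@Stepwise
class MySpec extends GebReportingSpec {
    // feature methods as in the question; they now run in declaration
    // order, and once one fails the remaining ones are skipped
}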
You should always try to use waitFor and check conditions wherever possible. However if you find there isn't a specific element you can check for, or any other condition to check, you can use this to wait for a specified amount of time:
def sleepForNSeconds(int n) {
def originalMilliseconds = System.currentTimeMillis()
waitFor(n + 1, 0.5) {
(System.currentTimeMillis() - originalMilliseconds) > (n * 1000)
}
}
I had to use this while waiting for some chart library animations to complete before capturing a screenshot in a report.
Thread.sleep(30000)
also does the trick. Of course, I still agree: use waitFor whenever possible.
