GridGain Off Heap doesn't work - bigdata

Could someone please take a quick look at what is wrong with my code? It is supposed to store cache entries in off-heap memory through GridGain.
My configuration is almost the same as the one on the wiki page (http://doc.gridgain.org/latest/Off-Heap+Memory):
<!-- Enable OffHeap -->
<property name="offHeapMaxMemory" value="#{2L * 1024L * 1024L * 1024L}"/>
<!-- Always store cache entries in off-heap memory, evict to Swap. -->
<property name="memoryMode" value="OFFHEAP_TIERED"/>
However, jconsole shows that data is still being written to heap memory (heap usage keeps fluctuating), and when I try to read back the data I stored, I get a zero result.
The code is as follows:
final GridCache<String, Object> cache = grid.cache("partitioned");
for (long i = 0; i < 1000; i++) {
    cache.putx(String.valueOf(i), hundredBytes.clone());
    if (i % 1024 * 1024 == 0) {
        System.out.println(i + "bytes inserted");
    }
}
System.out.println("Cache size: " + cache.size());
The last line prints "Cache size: 0". That is strange; I probably don't quite understand how to access off-heap memory. Is there a separate API for that?
Thanks in advance.

Correct, there is a separate API:
GridCache<String, Object> cache = grid.cache("partitioned");
Iterator<Map.Entry<String, Object>> localIterator = cache.offHeapIterator();
However, note that when data is stored on heap, 'grid.cache("partitioned")' returns the cache for the whole cluster, whereas when data is stored off heap, 'cache.offHeapIterator()' returns only the data held by the local node of the cluster.
This confused me a bit.

There are a few points about the on-heap/off-heap difference:
There is no difference between on-heap and off-heap data storage when you access your data by key: grid.cache("partitioned") still works with the cache of the whole cluster, not just local entries. You can access your data by key using cache.get(...) methods regardless of the node you execute your code on.
Different methods return the local sizes of on-heap, off-heap and swap storage: cache.size() returns the number of on-heap entries stored in the cache, cache.offHeapEntriesCount() returns the number of entries stored in off-heap storage, and cache.swapKeys() returns the number of entries stored in swap.
Entry iterators for on-heap, off-heap and swap always return the local entries of the node. If you need to iterate over the whole cache, you can use a scan query or broadcast a closure that iterates over the local entries on each node.


Getting data from a direct mapped cache in prolog

The predicate getDataFromCache(StringAddress,Cache,Data,HopsNum,directMap,BitsNum)
should succeed when the Data is successfully retrieved from the Cache (cache hit)
and the HopsNum represents the number of hops required to access the data from
the cache which can differ according to direct map cache mapping technique such
that:
• StringAddress is a string of the binary number which represents the address
of the data you are required to address and it is six binary bits.
• Cache is the cache using the representation discussed previously.
• Data is the data retrieved from the cache when a cache hit occurs.
• HopsNum is the number of hops required to access the data from the cache.
• BitsNum is the number of bits the index needs.
getDataFromCache always fails (gives me false) although everything seems to be working. Could someone help me fix it?
convertAddress(Binary,N,Tag,Idx,directMap):-
    Idx is mod(Binary,10**N),
    Tag is Binary // 10**N.
getDataFromCache(SA,[item(tag(T),data(D),V,_)|T],Data,HopsNum,directMap,BitsNum):-
    convertAddress(SA,BitsNum,Tag,Idx,directMap),
    number_string(Tag,Z),
    Z==T,
    V==1,
    Data is D.
getDataFromCache(SA,[item(tag(T),data(D),V,_)|T],Data,HopsNum,directMap,BitsNum):-
    convertAddress(SA,BitsNum,Tag,Idx,directMap),
    number_string(Tag,Z),
    (Z\=T;V==0),
    getDataFromCache(SA,T,Data,HopsNum,directMap,BitsNum).
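With direct mapping, HopsNum is simply always zero, and you don't have to traverse the list at all: you can access the entry directly at its index using the nth0 predicate.
Also, you are using the variable T twice in the clause head: once for the tag and once for the tail of the list, so both are forced to unify with the same value.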
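To illustrate the direct-mapped idea outside Prolog, here is a minimal Python sketch. The entry layout (tag_bits, data, valid_bit) and the function name are assumptions made for this example, not the representation from the assignment. The index bits select exactly one slot, so no traversal is needed and the hop count is always 0.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout for this sketch: (tag_bits, data, valid_bit).
Entry = Tuple[str, int, int]


def get_data_from_cache(address: str, cache: List[Entry],
                        bits_num: int) -> Optional[Tuple[int, int]]:
    """Return (data, hops) on a cache hit, or None on a miss."""
    # The low bits_num bits are the index, the remaining high bits are the tag.
    split = len(address) - bits_num
    tag_bits, index_bits = address[:split], address[split:]
    index = int(index_bits, 2) if index_bits else 0
    tag = int(tag_bits, 2) if tag_bits else 0

    if index >= len(cache):
        return None

    # Direct mapping: look at exactly one slot, no list traversal.
    entry_tag, data, valid = cache[index]
    if valid == 1 and int(entry_tag, 2) == tag:
        return data, 0  # hops are always 0 for a direct-mapped cache
    return None


# Example cache with 4 slots, so the index needs 2 bits.
example_cache = [("0000", 10, 1), ("0001", 20, 1), ("0000", 30, 0), ("0011", 40, 1)]
```

Comparing tags as integers avoids depending on how the tag strings are padded, which is one more assumption of this sketch.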

how to flush page data in python using mmap

I am trying to map a region of FPGA memory into the host system:
import mmap
import os
resource0 = os.open("/sys/bus/pci/devices/0000:0b:00.0/resource0", os.O_RDWR | os.O_SYNC)
resource_size = os.fstat(resource0).st_size
mem = mmap.mmap(resource0, 65536, flags=mmap.MAP_SHARED, prot=mmap.PROT_WRITE | mmap.PROT_READ, offset=0)
If I flush my host page with
mem.flush()
and then print the contents, the data is the same as before; nothing gets cleared from the page:
print(mem[0:131072])
mem.flush()
print(mem[0:131072])
As I read the Python mmap docs (https://docs.python.org/3.6/library/mmap.html), flush clears the content, but when I test it, the content remains the same.
I am using Python 3.6.9.
Why do you expect flush to clear a page?
https://docs.python.org/2/library/mmap.html
flush([offset, size])
Flushes changes made to the in-memory copy of a file back to disk. Without use of this call there is no guarantee that changes are written back before the object is destroyed. If offset and size are specified, only changes to the given range of bytes will be flushed to disk; otherwise, the whole extent of the mapping is flushed. offset must be a multiple of the PAGESIZE or ALLOCATIONGRANULARITY.
So flush does not clear anything; it only writes pending changes back. If you want to clear a region, you have to assign new values to it first and then flush so that the change is written out.
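The difference can be sketched with an ordinary temporary file standing in for the device resource (an assumption of this example; writing zeros to a real FPGA region writes to the device and may have side effects). The helper name zero_and_flush is made up for illustration.

```python
import mmap
import tempfile


def zero_and_flush(mem: mmap.mmap) -> None:
    """Overwrite the whole mapping with zero bytes, then write it back."""
    mem[:] = bytes(len(mem))  # assign new content first
    mem.flush()               # then push the change to the backing file


with tempfile.TemporaryFile() as f:
    # Fill one page with 0xff so that a cleared page is easy to recognise.
    f.write(b"\xff" * mmap.PAGESIZE)
    f.flush()

    with mmap.mmap(f.fileno(), mmap.PAGESIZE) as mem:
        before = mem[:16]
        mem.flush()               # flush alone changes nothing
        after_flush = mem[:16]
        zero_and_flush(mem)
        after_zero = mem[:16]

    # Read through the file object to confirm the write-back happened.
    f.seek(0)
    on_disk = f.read(16)
```

Here before and after_flush are identical, while after_zero and on_disk contain only zero bytes.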

Understanding elasticsearch circuit_breaking_exception

I am trying to figure out why I am getting this error when indexing a document from a python web app.
The document in this case is a base64 encoded string of a file of size 10877 KB.
I post it to my web app, which then posts it via elasticsearch.py to my elastic instance.
My elastic instance throws an error:
TransportError(429, 'circuit_breaking_exception', '[parent] Data
too large, data for [<http_request>] would be
[1031753160/983.9mb], which is larger than the limit of
[986932838/941.2mb], real usage: [1002052432/955.6mb], new bytes
reserved: [29700728/28.3mb], usages [request=0/0b,
fielddata=0/0b, in_flight_requests=29700728/28.3mb,
accounting=202042/197.3kb]')
I am trying to understand why my 10877 KB file ends up at a size of 983mb as reported by elastic.
I understand that increasing the JVM max heap size may allow me to send bigger files, but I am mainly wondering why the request size appears to be 10x what I am expecting.
Let us go through the message step by step:
[parent] Data too large, data for [<http_request>]
gives the name of the circuit breaker.
would be [1031753160/983.9mb],
says what the heap usage would be if the request were executed. This is the total heap usage, not the size of your request.
which is larger than the limit of [986932838/941.2mb],
tells us the current limit of the circuit breaker above.
real usage: [1002052432/955.6mb],
is the real current usage of the heap.
new bytes reserved: [29700728/28.3mb],
is an estimate of the impact the request will have (the size of the data structures that need to be created in order to process the request). Your ~10 MB file will probably consume 28.3 MB.
usages [
request=0/0b,
fielddata=0/0b,
in_flight_requests=29700728/28.3mb,
accounting=202042/197.3kb
]
This last part tells us how the estimate is calculated.
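The arithmetic behind the message can be checked with a short Python sketch. The regular expressions and the helper name parse_breaker are assumptions written for this particular message text, not part of any Elasticsearch client API.

```python
import re

# The relevant part of the error message from the question, joined onto one line.
MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[1031753160/983.9mb], which is larger than the limit of "
    "[986932838/941.2mb], real usage: [1002052432/955.6mb], new bytes "
    "reserved: [29700728/28.3mb]"
)


def parse_breaker(message: str) -> dict:
    """Extract the raw byte counts from a circuit_breaking_exception message."""
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_bytes": r"new bytes reserved: \[(\d+)/",
    }
    return {name: int(re.search(pattern, message).group(1))
            for name, pattern in patterns.items()}


fields = parse_breaker(MESSAGE)
# "would be" is simply the current heap usage plus the estimate for this request.
total = fields["real_usage"] + fields["new_bytes"]
# How far the heap already is above the limit before the request is counted.
already_over = fields["real_usage"] - fields["limit"]
```

With these numbers total equals the "would be" value, and already_over is positive: the heap was above the breaker limit before the 28.3mb request was added, so the 983.9mb figure describes the whole heap rather than the document.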

Invalidate/prevent memoize with plone.memoize.ram

I have a Zope utility with a method that performs network operations.
As the result is valid for a while, I'm using plone.memoize.ram to cache it.
class MyClass(object):
    @cache(cache_key)
    def do_auth(self, adapter, data):
        # performing expensive network process here
...and the cache key function:
def cache_key(method, utility, data):
    return time() // (60 * 60)
But I want to prevent the memoization from taking place when the do_auth call returns empty results (or raises network errors).
Looking at the plone.memoize code it seems I need to raise ram.DontCache() exception, but before doing this I need a way to investigate the old cached value.
How can I get the cached data from the cache storage?
I put this together from several pieces of code I wrote.
It's not tested, but it may help you.
You can access the cached data using the ICacheChooser utility.
Its call method needs the dotted name of the function you cached, in your case the cached method itself:
key = '{0}.{1}'.format(__name__, method.__name__)
cache = getUtility(ICacheChooser)(key)
storage = cache.ramcache._getStorage()._data
cached_infos = storage.get(key)
In cached_infos there should be all infos you need.
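The behaviour the question asks for (cache per time window, but never store empty results or failures) can be sketched in plain Python. This is a stand-alone illustration under stated assumptions, not plone.memoize's own mechanism; the decorator name cache_non_empty and its clock parameter are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


def cache_non_empty(ttl_seconds: int, clock: Callable[[], float] = time.time):
    """Memoize per time bucket, but never store falsy (empty) results.

    Exceptions raised by the wrapped function propagate to the caller and
    are therefore never cached either. Arguments must be hashable.
    """
    def decorator(func):
        store: Dict[Tuple[Any, ...], Any] = {}

        def wrapper(*args):
            # The time bucket is part of the key, so entries expire by key change.
            key = (int(clock() // ttl_seconds),) + args
            if key in store:
                return store[key]
            result = func(*args)
            if result:  # only remember non-empty results
                store[key] = result
            return result

        wrapper.store = store  # exposed so callers can inspect cached values
        return wrapper
    return decorator
```

Old time buckets are never evicted in this sketch, so a real implementation would also need some cleanup of stale keys.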

Please suggest a way to store a temp file in Windows Azure

I have a simple feature in an ASP.NET MVC3 application hosted on Azure.
Step 1: the user uploads a picture.
Step 2: the user crops the uploaded picture.
Step 3: the system saves the cropped picture and deletes the temp file, which is the original uploaded picture.
Here is the problem I am facing: where should I store the temp file?
I tried somewhere on the Windows file system and in LocalResources. The problem is that these resources are per instance, so there is no guarantee that the instance showing the picture for cropping is the same instance that saved the temp file.
Do you have any ideas on this temp file issue?
Normally the file exists only for a short while before it is deleted.
The temp file needs to be instance independent.
Ideally the file would have an expiry setting (for example, 1 hour) so that it deletes itself in case the code crashes somewhere.
OK. So what you're after is basically something that is shared storage but expires. Amazon has just announced a rather nice setting called object expiration (https://forums.aws.amazon.com/ann.jspa?annID=1303). There is nothing like this for Windows Azure storage yet, unfortunately, but that doesn't mean we can't come up with another approach, perhaps even a better (more cost-effective) one.
You say that it needs to be instance independent, which means using a local temp drive is out of the picture. As others have said, my initial leaning would be towards Blob storage, but you will have cleanup effort there. If you are working with large images (>1MB) or low throughput (<100rps), then I think Blob storage is the only option. If you are working with smaller images AND high throughput, then the transaction costs for Blob storage will start to really add up (I have a white paper coming out soon which shows some modelling of this, but some quick thoughts are below).
For a scenario with small images and high throughput, a better option might be to use the Windows Azure Cache as your temporary storage area. At first glance it will be eye-wateringly expensive on a per GB basis (110GB/month for Cache, 12c/GB for Storage). But with storage your transactions are paid for, whereas with Cache they are 'free'. (Quotas are here: http://msdn.microsoft.com/en-us/library/hh697522.aspx#C_BKMK_FAQ8) This can really add up; e.g. using 100kb temp files held for 20 minutes with a system throughput of 1500rps, using Cache is about $1000 per month vs $15000 per month for storage transactions.
The Azure Cache approach is well worth considering, but to be sure it is the 'best' approach I'd really want to know:
Size of images
Throughput per hour
A bit more detail on the actual client interaction with the server during the crop process. Is it an interactive process where the user pulls the image into their browser and crops visually? Or is it just a simple crop?
Here is what I see as a possible approach:
1. The user uploads the picture.
2. Your code saves it to a blob and uses some data backend to record the relation between the user session and the uploaded image (marking it as a temp image).
3. Display the image in the cropping user interface.
4. When the user is done cropping on the client:
4.1. retrieve the original from the blob
4.2. crop it according to the data sent from the user
4.3. delete the original from the blob and the record in the data backend used in step 2
4.4. save the final image to another blob (the final blob).
Also have one background process checking for "expired" temp images in the data backend (used in step 2) to delete both the images and their records in the data backend.
Please note that even in a WebRole you still have the RoleEntryPoint descendant, and you can still override the Run method. By implementing an infinite loop in Run() (that method shall never exit!), you can check every N seconds whether there is anything to delete (depending on your Thread.Sleep() in Run()).
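The expiry check done by that background process can be sketched as follows. This is a language-neutral illustration in Python in which plain dictionaries stand in for the data backend and the blob container; the names find_expired and sweep and the one-hour default are assumptions of the example, not Azure APIs.

```python
from typing import Dict, List


def find_expired(uploaded_at: Dict[str, float], now: float,
                 ttl_seconds: float) -> List[str]:
    """Return the names of temp images that are at least ttl_seconds old."""
    return sorted(name for name, timestamp in uploaded_at.items()
                  if now - timestamp >= ttl_seconds)


def sweep(uploaded_at: Dict[str, float], blobs: Dict[str, bytes],
          now: float, ttl_seconds: float = 3600.0) -> List[str]:
    """Delete expired temp images and their records; return the deleted names."""
    expired = find_expired(uploaded_at, now, ttl_seconds)
    for name in expired:
        blobs.pop(name, None)  # the blob may already have been deleted by step 4.3
        del uploaded_at[name]
    return expired


# Example state: one upload at t=0 and one at t=3000 (seconds).
records = {"a.png": 0.0, "b.png": 3000.0}
storage = {"a.png": b"original-a", "b.png": b"original-b"}
deleted = sweep(records, storage, now=3600.0)
```

In the role's Run() loop, a call like sweep would be made once per iteration, followed by the sleep interval.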
You can use Azure blob storage. Have a look at this tutorial.
The sample below may help you.
https://code.msdn.microsoft.com/How-to-store-temp-files-in-d33bbb10
You have two ways of handling temp files in Azure:
1. You can use the Path.GetTempPath() and Path.GetTempFileName() functions to get the temp file name.
2. You can use an Azure blob to simulate it.
private long TotalLimitSizeOfTempFiles = 100 * 1024 * 1024;

private async Task SaveTempFile(string fileName, long contentLenght, Stream inputStream)
{
    try
    {
        // First, check whether the container exists; if not, create it.
        await container.CreateIfNotExistsAsync();
        // Get a reference to the blob.
        CloudBlockBlob tempFileBlob = container.GetBlockBlobReference(fileName);
        // If a blob with this name already exists, delete the old one.
        tempFileBlob.DeleteIfExists();
        // Check whether the total size of the blobs is over the limit; if so, clean up.
        await CleanStorageIfReachLimit(contentLenght);
        // Upload the new file.
        tempFileBlob.UploadFromStream(inputStream);
    }
    catch (Exception ex)
    {
        if (ex.InnerException != null)
        {
            throw ex.InnerException;
        }
        else
        {
            // Rethrow without resetting the stack trace.
            throw;
        }
    }
}

// Check whether the total size of the blobs is over the limit; if so, delete the oldest ones.
private async Task CleanStorageIfReachLimit(long newFileLength)
{
    List<CloudBlob> blobs = container.ListBlobs()
        .OfType<CloudBlob>()
        .OrderBy(m => m.Properties.LastModified)
        .ToList();
    // Get the total size of all blobs.
    long totalSize = blobs.Sum(m => m.Properties.Length);
    // Calculate the size limit that applies before the new upload.
    long realLimetSize = TotalLimitSizeOfTempFiles - newFileLength;
    // Delete the oldest blobs first; stop as soon as enough space is free.
    foreach (CloudBlob item in blobs)
    {
        if (totalSize <= realLimetSize)
        {
            break;
        }
        await item.DeleteIfExistsAsync();
        totalSize -= item.Properties.Length;
    }
}
