I am trying to figure out why I get this error when indexing a document from a Python web app.
The document in this case is a base64-encoded string of a file that is 10877 KB in size.
I post it to my web app, which then posts it via elasticsearch.py to my Elasticsearch instance.
My Elasticsearch instance throws an error:
TransportError(429, 'circuit_breaking_exception', '[parent] Data
too large, data for [<http_request>] would be
[1031753160/983.9mb], which is larger than the limit of
[986932838/941.2mb], real usage: [1002052432/955.6mb], new bytes
reserved: [29700728/28.3mb], usages [request=0/0b,
fielddata=0/0b, in_flight_requests=29700728/28.3mb,
accounting=202042/197.3kb]')
I am trying to understand why my 10877 KB file ends up at a size of 983mb as reported by Elasticsearch.
I understand that increasing the JVM max heap size may allow me to send bigger files, but what I really want to know is why the request size appears to be many times larger than I expect.
Let us see what we have here, step by step:
[parent] Data too large, data for [<http_request>]
gives the name of the circuit breaker that tripped.
would be [1031753160/983.9mb],
says what the heap usage would be if the request were executed. This is not the size of your request; it is the real usage plus the new bytes reserved, both listed further on.
which is larger than the limit of [986932838/941.2mb],
tells us the current limit of the circuit breaker above.
real usage: [1002052432/955.6mb],
is the real usage of the heap right now.
new bytes reserved: [29700728/28.3mb],
is an estimate of the impact the request will have (the size of the data structures that need to be created in order to process the request). Your ~10 MB file will probably consume 28.3 MB.
usages [
request=0/0b,
fielddata=0/0b,
in_flight_requests=29700728/28.3mb,
accounting=202042/197.3kb
]
This last part tells us how the estimate is calculated: it lists the usage tracked by each individual circuit breaker.
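The arithmetic behind the message can be checked directly. The following is only a sketch: it copies the numbers from the error text in the question, and the helper name first_bytes is made up for this illustration, not part of any Elasticsearch client.

```python
import re

# Error text copied from the question (the trailing "usages" part is omitted).
ERROR = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[1031753160/983.9mb], which is larger than the limit of "
    "[986932838/941.2mb], real usage: [1002052432/955.6mb], new bytes "
    "reserved: [29700728/28.3mb]"
)


def first_bytes(label, text):
    """Hypothetical helper: byte count of the '[<bytes>/<human>]' pair after label."""
    match = re.search(re.escape(label) + r"\s*\[(\d+)/", text)
    return int(match.group(1))


would_be = first_bytes("would be", ERROR)
limit = first_bytes("limit of", ERROR)
real_usage = first_bytes("real usage:", ERROR)
reserved = first_bytes("new bytes reserved:", ERROR)

# The "would be" figure is the current heap usage plus the new reservation.
print("real usage + reserved == would be:", real_usage + reserved == would_be)
# The breaker trips because that sum is above the limit.
print("bytes over the limit:", would_be - limit)
```

So the heap was already at 955.6mb before the request arrived, and the 28.3mb reserved for the request is what pushed it over the 941.2mb limit.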
We are running some PySpark jobs on GCP Dataproc and are trying to process one file that is ~2 GB in size and has 925 columns. When we process that file, we get the error "An error occurred while calling o3955.save: java.lang.StackOverflowError". We have tried increasing "spark.driver.extraJavaOptions" and "spark.driver.memory" and some other configs, but it still does not work on Dataproc.
We have one UDF (apply_type_caste()) in the script that casts the data (a PySpark dataframe) to the custom data types provided by the metadata table in BigQuery (the lookup table) for these 925 columns. You can see that line no. 2692 is problematic while executing the type casting because it is not getting enough static JVM memory to process. So we are having memory problems while running. It worked fine up to 300 columns, but after that we got the error.
We have tried the following configurations to increase memory with all the machine types mentioned above, but got the same error.
.config("spark.driver.memory","6G")  # tried values up to 30g
.config("spark.executor.memory","6G")
.config("spark.executor.cores","2")
.config("spark.executor.instances","2")
.config("spark.driver.maxResultSize", '8g')
.config("spark.memory.offHeap.enabled", 'true')
.config("spark.memory.offHeap.size", '4g')
.config("spark.driver.extraJavaOptions","-Xmx")
.config("spark.executor.defaultJavaOptions","-Xmx")
.config("spark.driver.extraJavaOptions","-Xss32M")
.config("spark.executor.defaultJavaOptions","-Xmx64M")
Please let me know what I can do here.
My Java application's code cache is nearly full, so I use VM.getVM().getCodeCache().iterate() to dump the code cache info, then call CodeBlob.getSize() to count the total size.
However, the CodeBlobs' total size is not equal to JVM
How to get sun.jvm.hotspot.code.CodeBlob real size in codecache
I am trying to map a region of FPGA memory to the host system:
resource0 = os.open("/sys/bus/pci/devices/0000:0b:00.0/resource0", os.O_RDWR | os.O_SYNC)
resource_size = os.fstat(resource0).st_size
mem = mmap.mmap(resource0, 65536, flags=mmap.MAP_SHARED, prot=mmap.PROT_WRITE|mmap.PROT_READ, offset= 0 )
If I flush my host page with
mem.flush()
and then print the contents, the data is the same as before;
nothing is cleared from the page:
print(mem[0:131072])
mem.flush()
print(mem[0:131072])
As I read the Python mmap docs, flush clears the content,
https://docs.python.org/3.6/library/mmap.html
but when I test it, the content remains the same.
I am using Python 3.6.9.
Why do you expect flush to clear a page?
https://docs.python.org/2/library/mmap.html
flush([offset, size])
Flushes changes made to the in-memory copy of a file back to disk. Without use of this call there is no guarantee that changes are written back before the object is destroyed. If offset and size are specified, only changes to the given range of bytes will be flushed to disk; otherwise, the whole extent of the mapping is flushed. offset must be a multiple of the PAGESIZE or ALLOCATIONGRANULARITY.
So if you want to clear anything, you have to assign a new value (for example, zeros) to the mapped region first and then flush it, which writes the change back.
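As a minimal sketch of that difference, the example below maps an ordinary temporary file instead of the PCI resource from the question (an assumption made so that it runs anywhere). The helper name clear_region is made up for this illustration.

```python
import mmap
import tempfile


def clear_region(mem, start, length):
    """Hypothetical helper: overwrite a range with zeros, then write it back."""
    mem[start:start + length] = bytes(length)  # assign the new value first
    mem.flush()                                # then flush the change


with tempfile.TemporaryFile() as f:
    # Fill one page with 0xff so there is something to clear.
    f.write(b"\xff" * mmap.PAGESIZE)
    f.flush()

    mem = mmap.mmap(f.fileno(), mmap.PAGESIZE)

    before = mem[:16]
    mem.flush()              # flush alone does not modify the mapping
    after_flush = mem[:16]

    clear_region(mem, 0, 16)
    after_clear = mem[:16]

    mem.close()

print("unchanged by flush:", before == after_flush)
print("cleared after assignment:", after_clear == bytes(16))
```

Calling flush() on its own leaves the bytes as they were; only the slice assignment changes them.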
I have an application that dynamically generates an Oozie workflow.xml. Its size has now grown to 245,524 bytes, which exceeds the default limit of 100,000 bytes, and I get the error below when running the job:
Error: E0736 : E0736: Workflow definition length [245,524] exceeded maximum allowed length [100,000]
This property can be set in oozie-default.xml, but I would like to set it at the application level. Is there any other way to set it?
This property can't be set on an application level, only in oozie-site.xml. Setting it requires an Oozie restart.
Have you considered breaking your huge XML down into many smaller pieces using the subworkflow action? It might also help you reduce some duplication if you use parameters in the subworkflows.
I am dealing with the message passing IPC method. I have a few questions about it:
The KEY field in ipcs -q shows me 0x00000000. What does this mean?
Can I see what message is passed using the msqid?
If two entries are present (for a particular user) after executing the command ipcs -q, does this mean that two messages were passed by this particular user?
If the used-bytes and messages fields are 0, what does this mean?
Is there a way to see whether a message queue is full or not?
How many queues can we have for one particular user?
I tried googling, but was not able to find answers to these questions.
Please help.
1. The "key" field is 0x00000000 when the IPC_PRIVATE key was specified during creation. This holds for message queues in the same way as for shared memory segments. The manuals of msgget() and shmget() contain more details.
2. AFAIK, this cannot be done. If a message is de-queued from the msgQ in order to look at it, the intended receiver will no longer see it.
3. The 2 entries in the list of message queues indicate that there are currently 2 active message queues on the system, identified by their corresponding unique keys. They do not indicate that 2 messages were passed.
Creating additional msgQ : ipcmk -Q
Deleting an existing msgQ : ipcrm -Q <unique-key>
4. The used-bytes and messages fields set to 0 indicate that this particular msgQ currently holds no messages.
5. Currently, one way to do this is to obtain the number of messages queued up in the msgQ programmatically, as shown in the following C snippet. This can then be compared with the size of the msgQ, as demonstrated in this answer.
struct msqid_ds buf;                      /* declared in <sys/msg.h> */
int ret = msgctl(msqid, IPC_STAT, &buf);  /* returns -1 on failure */
unsigned long msgs = (unsigned long)buf.msg_qnum;
printf("msgs in Q = %lu\n", msgs);
6. There is a limit on the total memory used by all the msgQs on the system combined. For POSIX message queues it can be obtained with ulimit -q; the limits for the System V queues listed by ipcs can be displayed with ipcs -l. The number of bytes used in a msgQ is listed under the used-bytes column in the output of ipcs -q. The total number of msgQs is limited only by the amount of memory available to create a new msgQ from the msgQ memory pool limit seen above.
Also check out the latter part of this answer for a few sample operations on POSIX message queues.