Search performance problems with Alfresco Search Services

We are trying to migrate an Alfresco CE system from 5.2 with Solr 4 to Alfresco 6.1 with Alfresco Search Services (we tried 1.3 and 1.4), but we are facing massive performance problems with Alfresco Search Services / Solr 6: searches on a comparable setup take 3-5x longer.
Some background:
Alfresco 5.2 / Solr 4 is running on Ubuntu 16 / Oracle JDK 8
Alfresco 6.1 / ASS 1.4 is running on Ubuntu 18 / AdoptOpenJDK 11
Repository and ASS are running on dedicated servers (no Docker involved)
the Solr index is stored on a very fast SSD SAN (ext4) device that has no issues with random or sequential access or IOPS
all boxes have 8 cores, 16 GB RAM
all boxes run their JVM with 12 GB of heap space
both Solr versions have the same cache configuration
both Solr versions have the same memory configuration
number of solr docs: ~ 7,000,000
What we could observe:
searching for simple words like alfresco, christmas, ...: Alfresco 5.2 / Solr 4 returns an uncached result in ~1-2 s
searching for simple words like alfresco, christmas, ...: Alfresco 6.1 / Solr 6 returns an uncached result in ~7-15 s
for Alfresco 5.2 / Solr 4, the Solr admin UI shows ~9 of 12 GB heap space in use
for Alfresco 6.1 / Solr 6, the Solr admin UI shows ~3 of 12 GB heap space in use
We already tried to increase RAM, heap space, CPU without any change in the search performance.
I wonder why Solr 6 / ASS consumes so little heap space.
Does anybody have similar experience?
What should we do to get more acceptable response times?
I also tried to configure sharding in Solr 6 (without being convinced that this solves the real problem), but creating Solr shards in Alfresco 6.1 CE seems not to work either.

It turned out that the search performance issue was caused by a community fix that works around localization restrictions by adding locale = '*' to the search query.
Instead, the index should always be created with cross-locale properties, which are not set by default, e.g. in shared.properties:
# Data types that support cross locale/word splitting/token patterns if tokenised
alfresco.cross.locale.datatype.0={http://www.alfresco.org/model/dictionary/1.0}text
alfresco.cross.locale.datatype.1={http://www.alfresco.org/model/dictionary/1.0}content
alfresco.cross.locale.datatype.2={http://www.alfresco.org/model/dictionary/1.0}mltext
Please check https://github.com/Alfresco/SearchServices/issues/234 for more details.
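Note that this only changes how documents are indexed, so the index has to be rebuilt for the change to take effect. A rough sketch of the rebuild, assuming a default Alfresco Search Services layout under /opt/alfresco-search-services (the paths and the create.alfresco.defaults flag are assumptions; check the docs for your version):
cd /opt/alfresco-search-services
./solr/bin/solr stop
# Remove the cores and cached models so they are recreated with the new
# cross-locale settings; shared.properties under solrhome/conf survives this.
rm -rf solrhome/alfresco solrhome/archive solrhome/alfrescoModels
./solr/bin/solr start -a "-Dcreate.alfresco.defaults=alfresco,archive"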

Related

Tar Compaction activity on Adobe AEM repository is stuck

I am trying to perform a revision cleanup on an AEM repository to reduce its size via tar compaction. The repository size is 730 GB and the Adobe version is 6.1, which is old. The estimated time for the activity to complete was 7-8 hours, but we ran it for 24 hours straight and it is still running with no output. We have also tried running all the commands to speed up the process, but it is still taking too long.
Kindly suggest an alternative to reduce the size of the repository.
Adobe does not provide support to older versions, so we cannot raise a ticket.
Try checking the memory assigned to the JVM (the RAM, I mean); if you increase it, the compaction may take less time and actually finish.
The repository size is not big at all; mine is more than 1 TB and works fine.
To clean your repo you can try running the revision garbage collector directly from AEM via the JMX console.
The only ways to reduce the data storage are to compact the repository or to delete content such as big assets and big packages; create some queries to see which assets/packages are huge and delete them.
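For a TarMK repository on AEM 6.1, offline compaction with the oak-run tool is the usual way to actually shrink the segment store. A rough sketch, assuming a stopped AEM instance (the oak-run version must match your Oak version; the path and heap size are examples, not recommendations):
# List checkpoints and remove unreferenced ones (a common reason compaction
# reclaims little or nothing), then compact the segment store.
java -Xmx8g -jar oak-run.jar checkpoints /path/to/crx-quickstart/repository/segmentstore
java -Xmx8g -jar oak-run.jar checkpoints /path/to/crx-quickstart/repository/segmentstore rm-unreferenced
java -Xmx8g -jar oak-run.jar compact /path/to/crx-quickstart/repository/segmentstore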
Hope you can fix your issue.
Regards,

GC / OOM problems with Alfresco Search Services and OpenJDK11

I updated Alfresco Search Services from 1.3 to 1.4, which forced me to also update OpenJDK from 8 to 11. Running Alfresco Search Services 1.3 with JDK 8 worked without any OutOfMemoryExceptions during (re)indexing, but with JDK 11 we repeatedly see the heap grow until the Solr OOM killer script kills the Solr process. During indexing the JVM performs continuous GC, but I guess JDK 11 changed GC in a way that objects stay in memory longer. Continuous GC indicates inefficient object creation, but that is nothing I can influence. I tried both the UseConcMarkSweepGC and G1 garbage collectors, with the same behavior. Does anybody know how to configure GC in OpenJDK 11 to behave similarly to OpenJDK 8 with Alfresco Search Services / Solr 6?
My parameters in solr.in.sh
SOLR_JAVA_MEM="-Xms16g -Xmx30g"
SOLR_OPTS="$SOLR_OPTS -Dsolr.jetty.request.header.size=1000000 -Dsolr.jetty.threads.stop.timeout=300000 -Ddisable.configEdit=true -Dsolr.allow.unsafe.resourceloading=true"
SOLR_OPTS="$SOLR_OPTS -XX:+UseConcMarkSweepGC -XX:-DisableExplicitGC -XX:-UseGCOverheadLimit"
SOLR_OPTS="$SOLR_OPTS -server -Djava.net.preferIPv4Stack=true -Duser.language=en -Duser.country=US -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv6Addresses=false"
SOLR_OPTS="$SOLR_OPTS -Dsun.security.ssl.allowUnsafeRenegotiation=true -Dsolr.allow.unsafe.resourceloading=true"
The issue is caused by a bug in the Alfresco Solr tracker that does not recognize recursion correctly (e.g. groups that are members of groups, or secondary child associations in Alfresco). We worked around it by replacing all secondary child associations with Alfresco links.
Alfresco Search Services 2.0 should have fixes for that recursion issue, but it requires Alfresco Content Services 6.2.

Symfony2 slow page loads despite quick initialization/query/render time?

I'm working on a Symfony2 project that is experiencing a slow load time for one particular page. The page in question does run a pretty large query that includes 16 joins, and I was expecting that to be the culprit. And maybe it is, and I am just struggling to interpret the figures in the debug toolbar properly. Here are the basic stats:
Peak Memory: 15.2 MB
Database Queries: 2
Initialization Time: 6ms
Render Time: 19ms
Query Time: 490.57 ms
TOTAL TIME: 21530 ms
I get the same basic results, more or less, in three different environments:
php 5.4.43 + Symfony 2.3
php 5.4.43 + Symfony 2.8
php 5.6.11 + Symfony 2.8
Given that the initialization + query + render time is nowhere near the TOTAL TIME figure, I'm wondering what else comes into play, and what other methods I could use to identify the bottleneck. Currently, the query is set up to pull ->getQuery()->getResult(). From what I've read, this can introduce huge overhead, as returning full result objects means that each of the X objects needs to be hydrated. (For context, we are talking about fewer than 50 top-level/parent objects in this case.) Consequently, many folks suggest using ->getQuery()->getArrayResult() instead, returning simple arrays as opposed to hydrated objects to drastically reduce the overhead. This sounded reasonable enough to me, so, despite it requiring some template changes for the page to render the alternate type of result, I gave it a shot. It did reduce the TOTAL TIME, but by a generally unnoticeable amount (from 21530 ms to 20670 ms).
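For illustration, the difference between the two hydration modes looks roughly like this (the bundle, entity, and join names are hypothetical):
// In a Symfony controller: $em = $this->getDoctrine()->getManager();
$query = $em->getRepository('AppBundle:ParentEntity')
    ->createQueryBuilder('p')
    ->leftJoin('p.children', 'c')->addSelect('c')
    ->getQuery();

$entities = $query->getResult();      // hydrates full entity objects; costly with many joins
$arrays   = $query->getArrayResult(); // plain nested arrays; far less hydration overhead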
I have been playing with Docker as well, and decided to spin up a minimal Docker environment that uses the original getResult() query in Symfony 2.8 code running on php 7. This environment uses the internal php webserver, as opposed to Apache, and I am not sure whether that should/could have any effect. While the page load is still slow, it seems markedly improved on php 7. The other interesting part is that, while the TOTAL TIME was reduced a good deal, most of the other developer toolbar figures went up:
Peak Memory: 235.9 MB
Queries: 2
Initialization Time: 6 ms
Render Time: 53 ms
Query Time: 2015 ms
TOTAL TIME: 7584 ms
So, the page loads on php 7 in 35% of the amount of time that it takes to load on php 5.4/5.6. This is good to know, and provides a compelling argument for why we should upgrade. That being said, I am still interested in figuring out what are the common factors that explain large discrepancies between TOTAL TIME and the sum of [initialization time + query time + render time]. I'm guessing that I shouldn't expect these numbers to line up exactly, but I notice that, while still off, they are significantly closer in the php 7 Docker environment than they are in the php 5.4/5.6 environments.
For the sake of clarity, the docker container naturally spins up with a php.ini memory_limit setting of -1. The other environments were using 256M, and I even dialed that up to 1024M, but saw no noticeable change in performance, and the "Peak Memory" figure stayed low. I tried re-creating the Docker environment with 1024M and also did not notice a difference there.
Thanks in advance for any advice.
EDIT
I've tested loading the page via the php 5.6 / Symfony 2.8 environment using php's internal webserver, and it loads in about half the time. Still not as good as php 7 + the internal server, but at least it gives me a solid lead that something about my Apache setup is at least significantly related (though not necessarily the sole culprit). Any/all advice/suggestions welcome!

OutOfMemory error with IBM JDK 1.7

I am using IBM JDK 1.7 (to support TLS ciphers) for a Struts-based application deployed with embedded Tomcat.
We are running into memory leaks (OOM) that have generated almost 30 GB of dumps; this has become a routine event.
We have tried increasing the heap memory by including
wrapper.java.additional.1="-XX:MaxPermSize=256m -Xss2048k" in the wrapper.conf.
But this didn't help much.
Try using Memory Analyzer; you can follow the instructions here to download and install it:
https://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/
It should provide an overview of your heap usage.
I'd recommend starting with the dominator tree view to see which objects are responsible for keeping data alive on the heap. You can also run various reports which analyse the heap for you.
You should have core files (.dmp) and heap dumps (.phd); the core files will be large but may be faster to access, and will also contain all the values for basic types in objects and strings. The .phd files just contain object sizes and the connections between them. It may be easier to relate what you are seeing back to your code if you start with the core file.
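Separately, two things are worth checking in the wrapper.conf line from the question: the Java Service Wrapper passes each wrapper.java.additional.N value as a single argument, so putting two flags in one property will not be applied the way you expect, and -XX:MaxPermSize only sizes the permanent generation (which the IBM J9 VM does not have in the HotSpot sense anyway) rather than the object heap where a leak accumulates. A sketch with separate entries (the 4g figure is an arbitrary example, not a recommendation):
# Each JVM argument goes in its own numbered property.
wrapper.java.additional.1=-Xmx4g
wrapper.java.additional.2=-Xss2048k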

Obtain data from remote computers using AutoIt

I have been maintaining a program written in batch, and I want to write a replacement program using AutoIt.
The program is downloaded to the desktop of remote computers and prints out a log of the scan results in Notepad on the desktop.
I want it to cover Windows XP, Vista, 7, 8, 8.1 and 10. At the moment it does not cover 8, 8.1 or 10.
This is the printout:
Results of my test version 001
Windows 7 x86 Service Pack 1 ---- (shows in brackets if service pack is out of date)
(UAC) --- shows if UAC is on or disabled.
Internet Explorer----(shows if out of date)
Antivirus/Firewall Check:
Windows Firewall Enabled!
Panda Free Antivirus
WMI entry may not exist for antivirus; attempting automatic update.
Anti-malware/Other Utilities Check:
CCleaner
Java 8 Update 31 (Java version out of Date!)
Adobe Flash Player 17.0.0.188
Adobe Reader XI
Mozilla Firefox (38.0.5)
Thunderbird (38)
System Health check
Total Fragmentation on Drive C: 2%
````````````````````End of Log``````````````````````
So this is possible. To get versions of files (like Java and Firefox) I think you can use FileGetVersion.
To find out whether the Windows Firewall is enabled you have to read the registry. The key might differ a little depending on your system, but for me it was this one:
RegRead("HKLM\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy\DomainProfile", "EnableFirewall")
These two macros should be useful for determining the OS-specific information you request:
@OSType
@OSVersion
UAC can also be read from the registry, and as with the firewall it might depend on your system, but for me this was the key:
RegRead("HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System", "EnableLUA")
I'm not quite sure what Total Fragmentation means, so I am not sure how you can get that.
You should be able to compose a txt file with all this information; a minimal sketch follows below. You should also find examples of AutoIt code that transfers text files just by searching here on Stack Overflow or on Google.
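A rough sketch of composing the log and opening it in Notepad (the registry keys are the ones from above; the Firefox path and the log file name are assumptions):
#include <FileConstants.au3>

; Collect the pieces of the report into one string.
Local $sLog = "Results of my test version 001" & @CRLF
$sLog &= @OSVersion & " " & @OSArch & " " & @OSServicePack & @CRLF

; UAC: 1 = enabled, 0 = disabled.
Local $iUac = RegRead("HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System", "EnableLUA")
$sLog &= "(UAC) " & (($iUac = 1) ? "enabled" : "disabled") & @CRLF

; File version of an installed program, e.g. Firefox (path is an assumption).
$sLog &= "Mozilla Firefox (" & FileGetVersion(@ProgramFilesDir & "\Mozilla Firefox\firefox.exe") & ")" & @CRLF

; Write the log to the desktop and show it in Notepad, as the batch version does.
Local $sFile = @DesktopDir & "\scan_results.txt"
Local $hFile = FileOpen($sFile, $FO_OVERWRITE)
FileWrite($hFile, $sLog)
FileClose($hFile)
Run("notepad.exe " & '"' & $sFile & '"')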
