I've set up NXRM 3.14 with a Ceph (S3-compatible) blobstore back-end. I've been testing it both on physical hardware and inside a Docker container.
It "works", but it is much, much slower than uploading directly to the bucket (a 2-second upload directly to the bucket may take 2 minutes through NXRM).
I haven't found any bug reports or complaints about this, so I'm guessing it's specific to Ceph and that performance may be fine with S3 itself. Uploads to a local filesystem blob store are also very fast.
I've found nothing in the log files to indicate performance problems.
Sorry this question is so vague, but does anyone have recommendations for debugging NXRM performance, or is anyone using a similar setup? Thanks.
I eventually tracked this down in the NXRM open source code: the current MultipartUploader is single-threaded (https://github.com/sonatype/nexus-public/blob/master/plugins/nexus-blobstore-s3/src/main/java/org/sonatype/nexus/blobstore/s3/internal/MultipartUploader.java) and uploads chunks sequentially.
For files larger than 5 MB, this introduces a considerable slowdown in upload times.
I've submitted an improvement suggestion on their issue tracker: https://issues.sonatype.org/browse/NEXUS-19566
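For anyone who wants to reproduce the difference outside NXRM, here is a standalone Python/boto3 sketch (not NXRM's Java code) that uploads the same file through the multipart path twice, once with a single worker (roughly the sequential behaviour described above) and once with several concurrent workers. The endpoint URL and bucket name are placeholders; credentials come from the usual AWS environment/config.

```python
#!/usr/bin/env python3
"""Standalone sketch (Python/boto3, not NXRM's Java code) to compare
sequential vs. parallel multipart uploads against the same S3-compatible
endpoint. Endpoint URL and bucket name are placeholders."""

import os
import sys
import time

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", endpoint_url="http://ceph-rgw.example.com:7480")
BUCKET = "nexus-test-bucket"  # placeholder

def timed_upload(path, key, concurrency):
    # 5 MB parts, so any file over 5 MB goes through the multipart path.
    config = TransferConfig(multipart_threshold=5 * 1024 * 1024,
                            multipart_chunksize=5 * 1024 * 1024,
                            max_concurrency=concurrency)
    start = time.perf_counter()
    s3.upload_file(path, BUCKET, key, Config=config)
    return time.perf_counter() - start

path = sys.argv[1]
name = os.path.basename(path)
print(f"1 worker (sequential parts): {timed_upload(path, 'seq-' + name, 1):.1f}s")
print(f"8 workers (parallel parts):  {timed_upload(path, 'par-' + name, 8):.1f}s")
```

On my Ceph setup the gap between the two runs was roughly the same order of magnitude as the NXRM-vs-direct difference described above, which is what pointed me at the sequential uploader in the first place.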
Related
I am trying to test the feasibility of moving my website from GoDaddy to AWS.
I used a WordPress migration plugin, which seems to have moved the complete site; at least superficially, the site appears to have been moved properly.
However, when I try to access the site, it is extremely slow. Using developer tools, I can tell that some of the CSS and JPG image requests are effectively blocking the page load.
However, I cannot tell why this is the case. The site loads in less than 3 seconds on GoDaddy, but it takes over a minute to load fully on AWS, and at least a few requests time out. The waterfall view in Chrome developer tools shows a lot of waiting on multiple requests, and I cannot figure out why these requests wait forever and time out.
Any guidance is appreciated.
I have pointed the current instance to www.blindbeliefs.com
I cannot figure out whether it is an issue with the Bitnami WordPress AMI or whether I am doing something wrong. Maybe I should go the traditional route of spinning up an EC2 instance, running a web server on it, connecting it to a database, and then installing WordPress on my server myself. I just felt the available AMI took care of all of that tailoring without me having to do it manually.
However, it is difficult to debug why certain assets get blocked, load extremely slowly, or time out without loading.
Thank you.
Some more details:
The domain is still at GoDaddy and I have not moved it to AWS yet; I'm not sure whether that has an impact.
I still feel it has to do with the AMI, though I cannot prove it.
Your issue sounds like a free-memory problem. You did not give details on the instance size, whether MySQL is installed on the same instance, etc.
This article will show you how to determine memory usage on your instance. When free memory is low or you start using swap space, your machine will become very slow. Your goal should be 0 bytes of swap space used and at least 25% free memory during normal operations.
Other factors to check are CPU utilization and free disk space on your file systems.
Linux Memory Check Commands
If you have a free-memory problem, increase the instance size. If you have a CPU usage problem, change the instance size or switch to another instance type. If you have a free disk space problem, create a new instance with a larger EBS volume, or move your website, etc. to a new, correctly sized EBS volume.
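Not from the linked article, but if you want to script that check on the instance itself, here is a quick stdlib-only Python sketch that reads /proc/meminfo (Linux only) and flags the swap and free-memory targets above; the 25% and zero-swap figures are just the rules of thumb from this answer.

```python
#!/usr/bin/env python3
"""Quick check of the free-memory / swap-usage targets described above.
Reads /proc/meminfo (Linux only); no third-party packages needed.
The 25% free-memory and zero-swap-used targets are rules of thumb,
not hard limits."""

def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.strip().split()[0])  # values are in kB
    return info

m = read_meminfo()
total = m["MemTotal"]
# MemAvailable is the kernel's estimate of memory usable without swapping.
available = m.get("MemAvailable", m["MemFree"])
swap_used = m["SwapTotal"] - m["SwapFree"]
free_pct = 100.0 * available / total

print(f"Memory available: {available} kB of {total} kB ({free_pct:.1f}%)")
print(f"Swap used: {swap_used} kB")

if swap_used > 0 or free_pct < 25:
    print("Warning: this instance looks memory-constrained; consider a larger "
          "instance type or moving MySQL to its own host.")
```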
Alfresco provides a CIFS connector so it can act just like a normal file server on your intranet.
Compared with a "normal" (Windows/Samba-based) file server, certain operations can really hurt the system, e.g. listing a folder with a few thousand files in Windows Explorer. I'm not quite sure, but I think permission checking is the primary reason in this case. Anyway, now assume you have a big filesystem hierarchy exposed and many users accessing it over CIFS, stressing the system and effectively knocking it down.
What is the suggested approach to scaling and improving performance?
In my experience Windows Explorer is part of the CIFS performance issue. I don't have exact numbers, but I remember working on an instance with roughly 500 GB of data, mostly small images and a few text files in a poorly balanced folder tree, where listing a folder with a thousand children took around a minute to display in Explorer. The same operation took around 3 seconds in the Chrome browser.
We never had time to investigate the issue thoroughly, but we saw an impressive amount of traffic generated by Explorer due to prefetching information about the subfolders of the currently open folder.
I've been revisiting the issue a little, and the best answer I can give for now is: tweak the cache(s).
I used a space with 5k children and default cache values, and benchmarked by executing "ls -alrt" on the CIFS mount, running Alfresco 4.0.d.
The first execution took roughly two minutes, bombarding the (lightning-fast) MySQL database with approximately 200k queries.
The second execution took "only" around 40 seconds, but the number of queries did not change significantly.
After increasing the CIFS fileinfo cache, I got the second run down to 30 seconds, but I still see 160k DB queries firing. I'm fairly sure the lion's share of these has to do with permissions/ACLs, and it should be possible to improve the situation a lot.
PS: Windows Explorer definitely behaves a little unexpectedly, but I cannot confirm that it makes a significant difference to the user experience.
PPS: https://issues.alfresco.com/jira/browse/ALFCOM-2951
PPPS: I'll look into this further when I find the time - should be this year. ;)
Update: the massive number of queries is not a permission issue.
Permission checking definitely is part of the problem. I can't link to anything specific, but from browsing the Alfresco forums and the net over the last few years I've learned that permissions can hurt performance.
I've read (and experienced) in several scenarios that Alfresco spaces with large numbers of children (1000+) can be painfully slow. One part you noticed yourself: it takes a while to get through 100-200k queries. But hook something into Alfresco to watch what it's doing and you'll see that massive amounts of time go into serialization/deserialization (e.g. webscripts for Share) and node traversal (hence the thousands of queries and averages of 400-500 qps when nobody is logged on).
So you're on the right track with your cache optimizations.
Do you have dedicated hardware for your installation? I had big issues with performance, but I moved the MySQL server to a separate box (server-grade hardware: 4 cores, 8 GB RAM, SSD for the MySQL server and SAS for the Tomcat server, etc.) and gained a lot. So get on with begging for new hardware too :)
I think you're on the right path here.
I was looking at the existing RAM disk discussions ... and none seem to bring up any reliability issues. I recently started using a Dataram RAMDisk for my source code and am wondering if there are any risks I should be concerned about.
It did speed up the compile time by 30%.
I am not fully familiar with that product, but the answer probably depends on whether you have a (good) UPS and what you are using to sync changes to your hard drive. I looked into this a while ago (on a Linux machine), mapping a portion of RAM as a disk and using rsync to persist changes to the hard drive, but discontinued the idea and got a faster hard drive instead :) I would be very interested in seeing this working...
I would like some idea of how rsync compares to SyncML/Funambol, especially when it comes to bandwidth, syncing over an unstable network, and multiple clients to one server.
This is to sync several mobile devices with a directory structure of growing text files. (So we essentially want as much as possible on the server; inconsistent files are not really a problem, and we know where changes originate.)
So far, it seems Funambol doesn't compress, doesn't handle partial updates, and has difficulty handling interruptions in a file transfer.
I know rsync doesn't go through the server, but I don't quite see how that is a disadvantage.
Olav,
rsync can:
Compress the data (as you said), thus gaining better performance over the network.
Synchronize only the changed portions of each file, thus, once again, saving time and bandwidth.
Be run by multiple clients at the same time against the same server; this is very basic backup-software behavior.
And one of my favorites: work over secure shell (SSH).
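To make those points concrete for the question's scenario (several mobile clients pushing growing text files to one server over a flaky link), here is a hypothetical Python wrapper a client could run; the host, paths and retry policy are made up, and the flags map to the features listed above.

```python
#!/usr/bin/env python3
"""Hypothetical per-client push of a text-file directory to the central
server with rsync: -z for compression, rsync's delta transfer to send only
changed parts of files, --partial so an interrupted transfer can resume,
and -e ssh for a secure transport. Host, paths and retry policy are
placeholders."""

import subprocess
import time

SRC = "/data/textfiles/"   # trailing slash: sync the directory contents
DEST = "sync@server.example.com:/srv/sync/device-01/"

CMD = [
    "rsync",
    "-az",          # archive mode + compress over the wire
    "--partial",    # keep partially transferred files so retries can resume
    "-e", "ssh",    # run over secure shell
    SRC, DEST,
]

# Unstable network: just retry a few times; rsync only re-sends what
# changed, so repeated attempts are cheap.
for attempt in range(5):
    if subprocess.run(CMD).returncode == 0:
        break
    time.sleep(30)
```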
You might want to check Rsyncrypto, for compressing and encrypting at the same time.
Dotan
In a number of situations as a programmer, I've found that my compile times are slower than I would like, and I want to understand the reason and fix them. Language-specific tricks (yes, I'm using C/C++) have already been discussed, and we apply many of them. I've also seen this question and realize it's related. What I'm more interested in is what tools people use to diagnose hardware/system bottlenecks in the build process. Is there a standard way to prove "disk reads/writes are too slow for our builds - we need SSDs!" or "the anti-virus settings are killing our build times!", etc.?
Resources I've found, none directly related to compiling performance diagnosis:
A TechNet article about using PerfMon (Quite good, close to what I'd like)
An IBM link detailing some PerfMon information, though it's not specific to compiling and appears somewhat out of date.
A webpage specifically describing diagnosis of average disk queue length.
Currently, diagnosing a slow build is very much an art, and my tools of choice are:
PerfMon
Process Explorer
Process Monitor
Push hard enough to get a machine to "just try it". (Basically, trial and error.)
What do others do to diagnose system-level build performance bottlenecks?
Can we come up with a list of PerfMon or Process Explorer statistics to watch, with thresholds for what's "acceptable" on a modern machine?
PerfMon:
CPU -> % Processor Time
MEMORY -> Pages/sec
DISK -> Avg. Disk Queue Length
Process Explorer:
CPU -> CPU
DISK -> I/O Delta Total
MEMORY -> Page Faults
I resolved a "too slow build" time issue with Eclipse and Spring recently. For me the solution was to use the Vista Resource Monitor (which identified CPU spiking but not consistently high) and quite a bit of disk activity. I then used Procmon from Sysinternals to identify exactly which files were being heavily accessed.
Part of our build process also involves checking external Maven (binary file) repositories for updates on every build. I disabled that check (which also gives me perfect control over when I update those dependencies). If you have resources external to the build machine, benchmark how long it takes to access them (source control, Maven, etc.).
Since I'm stuck on 32-bit Vista for now, I decided to try creating a RAM disk with the 700 MB of non-addressable memory (the PC has 4 GB, but Vista only exposes 3.3 GB) and to place the heavily accessed files, as identified by Procmon, on the RAM disk, using a nice trick of creating drive junctions to make that move transparent to my IDE. For details see here.
I have used Filemon to see which header files a C++ build was opening most often, and then used:
#ifndef include guards so header files are only processed once
Precompiled headers
Combining some small header files
Reducing the number of header files included by other header files by tidying up the code
However, these days I would start with a RAM disk and/or an SSD, but opening lots of header files still uses lots of CPU time.
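If Filemon isn't available, a rough static approximation is to count how often each header is textually #included across the source tree. Here is a small Python sketch; the source root and extension list are assumptions, and since it ignores include paths and guards it only approximates what the compiler actually opens.

```python
#!/usr/bin/env python3
"""Rough static alternative to watching the build with Filemon: walk a
C/C++ source tree and count how often each header is #included, to spot
candidates for include guards, precompiled headers or consolidation.
SRC_ROOT and the extension list are assumptions; this counts textual
includes only, so it approximates (not reproduces) what a build opens."""

import collections
import os
import re

SRC_ROOT = "src"                                  # assumed project root
EXTS = (".c", ".cc", ".cpp", ".h", ".hpp")
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+[<"]([^>"]+)[>"]')

counts = collections.Counter()
for dirpath, _dirnames, filenames in os.walk(SRC_ROOT):
    for name in filenames:
        if not name.endswith(EXTS):
            continue
        with open(os.path.join(dirpath, name), errors="ignore") as f:
            for line in f:
                match = INCLUDE_RE.match(line)
                if match:
                    counts[match.group(1)] += 1

# The most frequently included headers are the first ones worth guarding,
# precompiling or slimming down.
for header, n in counts.most_common(20):
    print(f"{n:6d}  {header}")
```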