Artifactory trash contains lots of empty folders

We have a mechanism in place that deletes artifacts from Artifactory (Cloud) on a regular basis, which means our trash is quite big: it uses 33% of our total storage.
The retention policy of the trash is set to 14 days. When I look at the trash can I notice a lot of empty folders. I assume they are empty because nothing is visible when I expand them in the UI, and when I use the REST call https://<my_organization>.jfrog.io/artifactory/api/storage/auto-trashcan?list&deep=1&listFolders=1&mdTimestamps=1&includeRootPath=1 their size shows as -1.
I have more than 100k "empty" folders in the trash.
Why are they here and should I worry about those?
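For reference, here is roughly how I check them from a script. This is only a sketch using the same File List API call as above; the access token and the "no visible children" heuristic are my own assumptions.

import posixpath
import requests

BASE = "https://<my_organization>.jfrog.io/artifactory"  # placeholder, as above
TOKEN = "..."  # an access token with read permission on auto-trashcan (assumed)

url = f"{BASE}/api/storage/auto-trashcan?list&deep=1&listFolders=1"
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
entries = resp.json().get("files", [])

# The File List API reports folders with size -1; treat a folder as "empty"
# when no listed entry has it as its direct parent.
folders = {e["uri"] for e in entries if e.get("folder")}
parents = {posixpath.dirname(e["uri"]) for e in entries}
empty = [f for f in folders if f not in parents]
print(f"{len(folders)} folders in the trash, {len(empty)} with no visible children")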

Related

No Empty Trash scheduled task in Sonatype Nexus OSS 3

Nexus used to have a scheduled task option to empty trash, but this is not present in Nexus 3:
Whenever I delete Assets or Components, my blob store's size doesn't decrease, making it very difficult to maintain in the long term.
How do I empty the trash and permanently remove deleted assets and components so that the blob size goes down? Groovy scripts are welcome too.
The "Empty Trash" seems to not exist because it is replaced by the "Compact blob store" scheduled task.
To reduce space you need to first delete the assets and components and then run the "Compact blob store" task.
Empty Trash and Compact Blobstore are two different features that end up in similar end states. One gives you the ability to see what you have deleted and then, presumably, restore something or choose to finally delete it (an open box), whereas Compact Blobstore is a black box that simply lets you free up space.
There has been quite a bit of internal debate over this and over what the path forward should be. I encourage you to file an issue about it:
https://issues.sonatype.org/projects/NEXUS/
Presumably work could be done to make Compact Blobstore a bit more transparent and to provide functionality similar to a "trash can", if that is desired.
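In the meantime, if you want to script it: newer Nexus 3 versions expose scheduled tasks over a REST API. The sketch below is only an illustration; the base URL, credentials and the matching by task name are assumptions, and a "Compact blob store" task still has to exist (create it once in the UI).

import requests

BASE = "http://localhost:8081"   # assumed Nexus base URL
AUTH = ("admin", "admin123")     # assumed credentials

# List all scheduled tasks and pick the compact-blobstore one(s) by name.
tasks = requests.get(f"{BASE}/service/rest/v1/tasks", auth=AUTH).json()["items"]
compact = [t for t in tasks if "compact" in t["name"].lower()]

# Delete assets/components first, then run the task(s) to reclaim the space.
for task in compact:
    requests.post(f"{BASE}/service/rest/v1/tasks/{task['id']}/run", auth=AUTH).raise_for_status()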

I have the AdvAgg module for Drupal 7.x and I have many files in the advagg_js/css folders... why?

I have Drupal 7.x and Advanced CSS/JS Aggregation 7.x-2.7
In the advagg_js and advagg_css folders (path is sites/default/files) I have too many identical files and I don't understand why...
This is the name of a file in advagg_css:
css____tQ6DKNpjnnLOLLOo1chze6a0EuAzr40c2JW8LEnlk__CmbidT93019ZJXjBPnKuAOSV78GHKPC3vgAjyUWRvNg__U78DXVtmNgrsprQhJ0bcjElTm2p5INlkJg6oQm4a72o
How can I delete all these files without doing damage?
Maybe in performance/advagg/operations, in the Cron Maintenance Tasks box, I must check
Clear All Stale Files
Remove all stale files. Scan all files in the advagg_css/js directories and remove the ones that have not been accessed in the last 30 days.
????
I hope you can help me...
Thanks a lot
I can guarantee that there are very few duplicate files in those directories. If you really want, you can manually delete every file in there; a lot of them will be generated again, so you're back to having a lot of files (the CSS/JS files get auto-created on demand, just like image styles). AdvAgg is very good at preventing a 404 from happening when an aggregated CSS/JS file is requested.
You can adjust how old a file needs to be in order for it to be considered "stale". Inside of core's drupal_delete_file_if_stale() function is the drupal_stale_file_threshold variable. Changing this in your settings.php file to something like 2 days, $conf['drupal_stale_file_threshold'] = 172800;, will make Drupal more aggressive about removing aggregated CSS and JS files.
Long term, if you want to reduce the number of different CSS/JS files being created, you'll need to reduce the number of combinations/variations that are possible with your CSS and JS assets. On the "admin/config/development/performance/advagg/bundler" page, under raw grouping info, it will tell you how many different groupings are currently possible. Take that number and multiply it by the number of bundles (usually 2-6 if following a guide like this https://www.drupal.org/node/2493801, or 6-12 if using the default settings) and that's the number of files that can currently be generated. Multiply it by 2 for gzip. On one of our sites that gives us over 4k files.
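To make that arithmetic concrete with made-up numbers (only the structure of the calculation comes from the above; the values are hypothetical):

# Hypothetical values, just to illustrate the estimate described above.
groupings = 350    # "raw grouping info" count from the bundler page (assumed)
bundles = 6        # 2-6 when following the linked guide, 6-12 with defaults
gzip_factor = 2    # each aggregate also gets a gzipped copy

print(groupings * bundles * gzip_factor)  # 4200, i.e. "over 4k files"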
In terms of file names, the first base64 group encodes the file names, the second the file contents, and the third the AdvAgg settings. This allows the aggregate's contents to be recreated from just the filename, as all of this additional info is stored in the database.

Does System.Web.Caching utilize an LRU algorithm?

I was just working on the documentation for an open source project I created a while back called WebCacheHelper. It's an abstraction on top of the existing Cache functionality in System.Web.Caching.
I'm having trouble finding the details of the algorithm used to purge the cache when the server runs low on memory.
I found this text on MSDN:
When the Web server hosting an ASP.NET application runs low on memory, the Cache object selectively purges items to free system memory. When an item is added to the cache, you can assign it a relative priority compared to the other items stored in the cache. Items to which you assign higher priority values are less likely to be deleted from the cache when the server is processing a large number of requests, whereas items to which you assign lower priority values are more likely to be deleted.
This is still a little vague for my taste. I want to know what other factors are used to determine when to purge a cached object. Is it a combination of the last accessed time and the priority?
Let's take a look at the source code. Purging starts from the TrimIfNecessary() method in the CacheSingle class. First it tries to remove all expired items via the FlushExpiredItems() method of the CacheExpires class. If that's not enough, it starts iterating through "buckets" in CacheUsage.FlushUnderUsedItems(). Cache usage data/statistics are divided into buckets according to CacheItemPriority, and the statistics/LRU order are tracked separately in each bucket. There are two iterations through the buckets. The first iteration removes only newly added items (added during the last 10 seconds); the second removes the rest. It starts removing items from the CacheItemPriority.Low bucket, beginning with its LRU items. It stops when it has removed enough; otherwise it continues with the next LRU items and higher-priority buckets. It never touches CacheItemPriority.NotRemovable items, as they are not added to the usage buckets at all.
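Not the real implementation, but a rough sketch of that ordering, in case it helps to see it as code (names and the data layout are illustrative only):

# Illustrative sketch of the trimming order described above; this is NOT the
# actual System.Web.Caching code.
from collections import OrderedDict

# NotRemovable items never enter a usage bucket, so they are not listed here.
PRIORITIES = ["Low", "BelowNormal", "Normal", "AboveNormal", "High"]

def trim(buckets, expired_keys, target, now):
    """buckets: priority -> OrderedDict(key -> add_time), least recently used first."""
    removed = list(expired_keys)                 # step 1: flush expired items
    if len(removed) >= target:
        return removed
    for only_new in (True, False):               # step 2: two passes over the buckets
        for prio in PRIORITIES:                  # Low-priority buckets are trimmed first
            lru = buckets.get(prio, OrderedDict())
            for key, added in list(lru.items()):     # least recently used entries first
                if only_new and now - added > 10:    # pass 1: only items added in the last 10s
                    continue
                del lru[key]
                removed.append(key)
                if len(removed) >= target:       # stop as soon as enough has been removed
                    return removed
    return removed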

Rackspace CDN container organization

I'm developing a web platform that may reach some millions of users, and I need to store users' images and docs.
I'm using Rackspace and now I need to define the files logic into cloud files service.
Rackspace allows you to create up to 500,000 containers per account (reference: page 17, paragraph 4.2.2), and in addition they suggest limiting each container to 500,000 objects (reference: Best practice - Limit the Number of Objects in Your Container). What is the best practice for managing users' files?
One container per user doesn't seem to be a good solution because of the 500,000-container limit.
Rackspace suggests using virtual containers. I'm a bit undecided about how to use them.
Thanks in advance.
If you will only be interacting with the files via API calls, having 200,000 objects is fine (from my experience; I haven't had the need for anything larger).
If you want to use the web interface for ANY TASKS AT ALL, you need to have far, far fewer than that. The web interface does not break contents up by folder, so if you have 30,000 objects, it will just paginate them and show them to you in alphabetical order. This is OK for containers with up to a few hundred objects, but beyond that the web interface is unusable.
If you have some number of millions of users, you can use some part of the user ID as a shard key to decide what bucket to use. See http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys for information about choosing a shard key. It's written for Mongo users, but is applicable here. The takeaway is pick some attribute that will distribute your users somewhat evenly so you don't have one bucket that exceeds the max number of files you want to have per bucket.
One way is to use user IDs, which we can randomly assign and shard based on the first digit. For this example, we'll use the UIDs 1234, 2234, 1123, and 2134. Say you want to break files up by the first digit of the UID: you'd save the files for users 1234 and 1123 in the container "files_group_1" and the files for 2234 and 2134 in the "files_group_2" container.
Before picking a shard key, make sure you think about how many files users might store. If, for example, a user may store hundreds (or thousands) of files, then you will want to shard by a more unique key than the first digit of a UID.
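As a sketch of that naming scheme (the hashed variant is just my suggestion for a more even spread across more containers, not something Rackspace prescribes):

import hashlib

def container_for(user_id: str) -> str:
    # Naive scheme from the example above: shard on the first digit of the UID.
    return f"files_group_{user_id[0]}"

def container_for_hashed(user_id: str, buckets: int = 100) -> str:
    # Hashing the UID spreads users evenly even if IDs are sequential; 100
    # containers is an assumed count chosen to stay well under the
    # 500,000-objects-per-container guideline.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return f"files_group_{h % buckets:02d}"

print(container_for("1234"))          # files_group_1
print(container_for_hashed("1234"))   # deterministic: same UID, same container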
Hope that helped.

Effective data fetching in ASP.NET

Here is what I have so far:
A table that is used to store "Folders". Each folder may contain subfolders and files, so if I click a folder, I have to list the contents of that folder.
The table that represents the folder listing is something like the following:
FolderID Name Type Desc ParentID
In the case of subfolders, ParentID refers to the FolderID of the parent folder.
Now, my questions are:
1.
a. There are 3 types of folders, and I use 3 data lists to categorize them. Can I load the entire table in a single fetch and then use LINQ to categorize the types?
OR
b. Load each category by passing 'Type' to a stored procedure, which will make 3 database calls.
2.
a. If I click the parent folder, use LINQ to filter the contents of the folder (because we have the entire table in memory).
OR
b. If I click the parent folder, pass the FolderID of the parent folder and then fetch the content.
In the two cases above, which options make more sense, and which are best in terms of performance?
There are a number of considerations you need to make.
What is the size of the folder tree? If it is not currently large, could it potentially become very large?
What is the likelihood that the folder table will be modified whilst a user is using/viewing it? If there is a high chance then it may be worthwhile to make smaller, more frequent calls to the DB so that the user is aware of any changes which have been made by other users.
Will users be working with one folder type at a time? Or will they be switching between the three different trees?
As an instinctive answer I would be drawn towards fetching 1 or 2 levels at a time. For example, start by loading the root folder and its immediate children, and as the user navigates down into the tree, retrieve more children...
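To make that concrete, here is a minimal sketch (Python as a stand-in for your data-access layer; the table name, the fetch_children helper and the SQLite connection are assumptions based on your column list, not your actual schema):

import sqlite3

def fetch_children(conn, parent_id):
    # One small query per click instead of loading the whole table.
    # [Desc] is bracket-quoted because Desc is a reserved word.
    if parent_id is None:
        sql = "SELECT FolderID, Name, Type, [Desc] FROM Folders WHERE ParentID IS NULL"
        return conn.execute(sql).fetchall()
    sql = "SELECT FolderID, Name, Type, [Desc] FROM Folders WHERE ParentID = ?"
    return conn.execute(sql, (parent_id,)).fetchall()

conn = sqlite3.connect("folders.db")   # stand-in for your real database
roots = fetch_children(conn, None)     # initial page load: root folders only
children = fetch_children(conn, 42)    # later, when the user expands folder 42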
When you are asking about performance, the only real answer is: measure it! Implement both scenarios and look at how they load your system.
Think about how you will cache your data to prevent high database load.
Everything is fast for small n, so we can't say anything for sure.
If your data is small and changes infrequently, then use caching and LINQ-based queries over your cached data.
If your data can't be stored in the cache because it is huge, or it changes constantly, then cache the results of your queries, create cache dependencies for them, and again: measure it!
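For the "small data, infrequent changes" case, a rough sketch of what the in-memory approach could look like (the TTL, the loader callback and the dict-shaped rows are assumptions; and again, measure it):

import time

_cache = {"rows": None, "loaded_at": 0.0}
CACHE_TTL = 300  # seconds; assumed value, tune it and measure

def get_folders(load_from_db):
    # load_from_db() is your real query, returning a list of dicts with
    # FolderID, Name, Type, Desc and ParentID.
    if _cache["rows"] is None or time.time() - _cache["loaded_at"] > CACHE_TTL:
        _cache["rows"] = load_from_db()
        _cache["loaded_at"] = time.time()
    return _cache["rows"]

def by_type(load_from_db, folder_type):
    return [r for r in get_folders(load_from_db) if r["Type"] == folder_type]

def children_of(load_from_db, parent_id):
    return [r for r in get_folders(load_from_db) if r["ParentID"] == parent_id]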
