RAMDisk reliability - build process

I was looking at the existing RAMDisk discussions ... and none seem to bring up any reliability issues. I recently started using a Dataram RAMDisk for my source code and am wondering if there are any risks I should be concerned with.
It did speed up the compile time by 30%.

I am not fully familiar with that product, but the answer probably depends on whether you have a (good) UPS and on what you are using to sync changes back to your hard drive. I looked into this a while ago (on a Linux machine), mapping a portion of RAM as a disk and using rsync to persist changes to the hard drive, but I dropped the idea and got a faster hard drive instead :) I would be very interested in seeing this working...
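For reference, a minimal sketch of that idea on Linux, assuming the build tree lives in /mnt/ramdisk and is mirrored to /data/src on the hard drive (both paths are illustrative):

    # Create the mount point and mount a 4 GB RAM-backed filesystem;
    # its contents vanish on power loss or reboot
    sudo mkdir -p /mnt/ramdisk
    sudo mount -t tmpfs -o size=4g tmpfs /mnt/ramdisk

    # Seed the RAM disk from the on-disk copy
    rsync -a /data/src/ /mnt/ramdisk/

    # Periodically (e.g. from cron) persist changes back to disk;
    # --delete also propagates file removals to the on-disk mirror
    rsync -a --delete /mnt/ramdisk/ /data/src/

Anything written between the last sync and a power failure is lost, which is why the UPS question matters.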

Related

Getting started with Jupyter in the cloud

So far I've only used Jupyter on my local machine, which is way too slow. I'm completely new to using cloud services for Jupyter, or using cloud services at all for that matter. I know there are a million tutorials out there, but this is my problem: How to choose the right service from all those options (Amazon? Google? Cheaper options?)? What's the 'right way' to get started?
What I need:
I want a service where I can start up a Jupyter notebook in my browser as simply as possible. (I know next to nothing about setting up servers etc., and have very limited time to learn that if needed)
I currently have an old MacBook from 2014. The server should be at least 10x faster. (Which options do I need to pick?)
I want to do machine learning, so GPUs would be good.
My budget is about $50 per month, less would be great; a free tryout would be great too.
As I am completely new, I also need to know what pitfalls to look out for. (E.g.: do I need to stop the machine so it stops accruing costs?)
If you could help me, or point me to a good tutorial or even a book, I'd be forever grateful.
(Sorry for the basic question. Of course I googled tutorials myself before posting this question, but as indicated above, I'm overwhelmed by the options - that's why I posted this question.)
AWS based tutorial:
https://aws.amazon.com/de/getting-started/hands-on/get-started-dlami/
GPU, CPU and pricing information is gathered here:
https://docs.aws.amazon.com/dlami/latest/devguide/pricing.html
You can set up a budget to limit your costs:
https://aws.amazon.com/de/getting-started/hands-on/control-your-costs-free-tier-budgets/
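Once you have an instance launched from the Deep Learning AMI (the tutorial above walks through that), getting a notebook into your local browser is essentially an SSH tunnel. A sketch, where the key file, login user, host name and instance ID are placeholders for your own values:

    # SSH in with a tunnel so the remote notebook appears at http://localhost:8888 on your MacBook
    ssh -i my-key.pem -L 8888:localhost:8888 ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

    # On the instance: start Jupyter without trying to open a browser there
    jupyter notebook --no-browser --port 8888

    # When you are done, stop the instance so compute charges stop accruing
    # (the attached EBS storage still incurs a small monthly charge)
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0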

How do you set up a large scale Alfresco CIFS server?

Alfresco provides a CIFS connector so it can act just like a normal file server on your intranet.
Compared with a "normal" (Windows/Samba based) file server, certain operations can really hurt the system, e.g. listing a folder with a few thousand files in Windows Explorer. I'm not quite sure, but I think permission checking is the primary reason in this case. Anyway, now assume you have a big filesystem hierarchy exposed to many users over CIFS, stressing the system and effectively "knocking it down".
What is the suggested approach to scale / improve performance?
In my experience Windows Explorer is part of the CIFS performance issue. I don't have exact numbers, but I remember working on an instance with roughly 500 GB of data, mostly small images plus a few text files in a poorly balanced folder tree, where listing a folder with a thousand children took around a minute to display in Explorer. The same operation took around 3 s in the Chrome browser.
We never had time to investigate the issue thoroughly, but we did see an impressive amount of traffic generated by Explorer, apparently because it prefetches information about the subfolders of the currently open folder.
Been revisiting the issue a little, and I guess the best answer I can give for now is: Tweak the cache(s).
I used a space with 5k children and default cache values, and benchmarked by executing "ls -alrt" on the CIFS mount of an Alfresco 4.0.d instance.
The first execution took roughly two minutes, bombarding the (lightning-fast) MySQL database with approximately 200k queries.
The second execution took "only" around 40 seconds, but the number of queries did not change significantly.
Increasing the CIFS fileinfo cache, I got the second run down to 30 seconds, but I still see 160k DB queries firing. I'm fairly sure the lion's share has to do with permissions/ACLs, and it should be possible to improve the situation a lot.
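If you want to reproduce a rough version of this benchmark, here is a sketch assuming a Linux client, the share mounted at /mnt/alfresco, and direct access to the MySQL instance backing Alfresco (credentials configured, e.g. in ~/.my.cnf); server, share and folder names are placeholders, and other traffic hitting the database will inflate the counter:

    # Mount the Alfresco CIFS share
    sudo mount -t cifs //alfresco-server/Alfresco /mnt/alfresco -o user=admin

    # Snapshot MySQL's global statement counter before the listing
    BEFORE=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Questions'" | awk '{print $2}')

    # Time the listing of the space with ~5k children
    time ls -alrt /mnt/alfresco/big-space > /dev/null

    # Snapshot again and report how many statements the listing triggered
    AFTER=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Questions'" | awk '{print $2}')
    echo "Queries during listing: $((AFTER - BEFORE))"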
PS: Windows Explorer definitely behaves a little unexpectedly, but I cannot confirm that it makes a significant difference to the user experience.
PPS: https://issues.alfresco.com/jira/browse/ALFCOM-2951
PPPS: I'll look into this further when I find the time - should be this year. ;)
Update: The massive amount of queries is not a permission issue.
Permission checks definitely ARE part of the problem. I can't link to anything specific, but from browsing the Alfresco forums and the net over the last few years I've learned that permissions can hurt performance.
I've read (and experienced) in several scenarios that Alfresco spaces with large numbers of children (1000+) can be painfully slow. One part you noticed yourself: it takes a while to go through 100-200k queries. But hook something into Alfresco to watch what it's doing and you'll see that massive amounts of time go into serialization/deserialization (e.g. webscripts for Share) and node traversal (hence the thousands of queries and averages of 400-500 qps when nobody is logged on).
So you're on the right track with your cache optimizations.
Do you have dedicated hardware for your installation? I had big performance issues too, but moving the MySQL server to a separate box (server-grade hardware: 4 cores, 8 GB RAM, SSD for the MySQL server, SAS for the Tomcat server, etc.) gained me a lot. So get on with begging for the new hardware too :)
I think you're on the right path here.

rsync vs SyncML (Funambol)

I would like some idea about how rsync compares to SyncML/Funambol, especially when it comes to bandwidth, syncing over an unstable network, and multiple clients to one server.
This is to sync several mobile devices with a directory structure of growing text files. (So we essentially want as much as possible on the server; inconsistent files are not really a problem, and we know where changes originate.)
So far, it seems Funambol doesn't compress, doesn't handle partial updates, and has difficulty handling interruptions in a file transfer.
I know rsync doesn't go through the server, but I don't quite see how that is a disadvantage.
Olav,
rsync can:
Compress the data (as you said), thus getting better performance over the net.
Synchronize only the changed parts of each file, thus, once again, saving time.
Be run by multiple users at the same time - very basic backup-software behavior.
And one of my favorites: work over a secure shell (SSH).
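To make that concrete, a typical invocation combining those points (the local path and remote host are illustrative):

    # -a  archive mode (preserve permissions, times, symlinks, ...)
    # -z  compress the data on the wire
    # --partial --append-verify  resume an interrupted transfer of a growing file
    #                            instead of starting over
    # -e ssh  run the whole transfer over a secure shell
    rsync -az --partial --append-verify -e ssh /var/textfiles/ user@server.example.com:/srv/textfiles/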
You might want to check out Rsyncrypto for compressing and encrypting at the same time.
Dotan

DVCS for a small company of remote employees

Here's the situation: at my small office, because we like to keep mobile and occasionally work from home, instead of having a central file server, we have all the office documents in an SVN repository, and each person keeps a checkout on their own laptops. A checkout weighs in at about 3GB, and the repo with revisions in it: about 6GB. This is all working great.
The problem is that soon we won't have a small office any more - all our 5 workers will be working remotely. I had considered purchasing a dedicated server and running our SVN repository from that, except two of our workers will be really remote and will be using wireless "broadband" with a 3GB/month limit, and I'm afraid that a few large updates will really rip through their monthly allowance, not to mention taking all day to complete.
Reading a few questions on Stack Overflow, it seems there's quite a community of distributed VCS aficionados who think git or mercurial is definitely the best for many situations. Given that all the employees would still be able to meet face-to-face at least once a fortnight (and hence be on a fast LAN), I'm wondering if a DVCS would work for us?
I don't know exactly what's in your repo, but unless you're changing all the files regularly, a DVCS should provide you with a very desirable workflow.
You could do an svn -> git conversion, stick the repo on a DVD and mail it out to all the satellite offices, and then let them fetch from the office as things change, at a fairly low incremental cost (the transfer is compressed, so it should generally be smaller than the raw delta).
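A sketch of that workflow, assuming the SVN repository lives at https://svn.example.com/repo with the standard trunk/branches/tags layout (URL, paths and names are illustrative):

    # One-time conversion of the SVN history into a git repository
    git svn clone --stdlayout https://svn.example.com/repo office-docs

    # Pack the whole repository into a single file that fits on a DVD or USB stick
    cd office-docs
    git bundle create office-docs.bundle --all

    # On each remote laptop: clone from the bundle (naming the branch explicitly),
    # then point 'origin' at the central server
    git clone -b master office-docs.bundle office-docs
    cd office-docs
    git remote set-url origin ssh://server.example.com/srv/git/office-docs.git

    # Day-to-day: only the new, compressed objects travel over the metered connection
    git pull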
Check out the Fossil DVCS; it may fit the bill. Fossil can be used like SVN or like a DVCS. If you are concerned about whether it can handle your current repository, try it out. It also has a built-in project wiki and bug-tracking system that are distributed along with the repository. You could try it and see if it would work for your small team.
The pain for you would be losing your revision history; at this time I don't believe you can import an SVN repository into Fossil.
Join the mailing list and you will get answers to any of your questions. The creator of SQLite is also the creator of this project. Hope this helps.
I can't see why not. With something like git, the repository is local to the machine, and so your remote employees can actually have a tracked changelog that can then be merged or rebased with the main repository--whatever you decide that to be--when they get the chance.
Also, git has really good compression compared to SVN, so the 3GB/mo quota may be more than enough for your remote employees.
Randal Schwartz actually gave a really good presentation on git at Google's Tech Talks: http://www.youtube.com/watch?v=8dhZ9BXQgc4
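If you want to gauge what the converted repository would actually weigh on disk (and therefore roughly what an initial clone would transfer), here is a quick check, assuming a converted repository named office-docs as in the sketch above:

    cd office-docs
    # Repack the history as tightly as possible before measuring
    git gc --aggressive
    # 'size-pack' is the size of the packed history, roughly what a full clone downloads
    git count-objects -vH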
(It seems no one is answering this.) A DVCS of course seems like it would work, but I have no experience with it. A centralized system like SVN might also work if you are not expecting large changes to travel to and from the server daily; the initial checkout would then be the only really expensive operation.
Can you monitor your use now and see how much traffic goes back and forth?
The real problem here is the 3GB/month bandwidth limitation. It's probably better to find an improved connectivity option first...

What are you using for Distributed Caching in web farms running ASP.NET?

I am curious as to what others are using in this situation. I know a couple of the options that are out there, like a memcached port or ScaleOut Software. The memcached ports don't seem to be actively worked on (correct me if I'm wrong). ScaleOut Software is too expensive for me (I don't doubt it is worth it). This is not to say that I don't want to hear about people using memcached or ScaleOut Software; I'm just stating what I "know" at this point.
So my question is basically this: for those of you ACTIVELY using distributed caching, what are you using, are you happy with it, and what should I look out for?
I am moving to two servers very soon...both will be at the same location. I use caching fairly heavily (but carefully) to reduce the load on my database server.
Edit: I downloaded ScaleOut Software's solution. I've coded against it and it seems to work really well. I just have to decide if my wallet will part with the cash for it. :) Anyone have experiences, good or bad, with ScaleOut Software?
Edit again: It's been a little while since I asked this. Any more thoughts on it? We ended up buying the solution from ScaleOut Software and have been happy with it, but I'm curious what others are doing.
Microsoft has a product in the pipeline code-named Velocity. It's still in CTP and moving slowly, but it looks like it will be pretty good. We'll be beating it up in the near future to see how it handles what we want it to do (> 2 million reads/writes per hour). Will post back with results.
There is a 100% native .NET, well-documented, open-source (LGPL) project called Shared Cache. It doesn't seem to be mentioned on SO yet, but it's promising and should be able to do what most people expect from a distributed cache. It even supports different strategies, such as distributed or replicated caching.
I will update this post with more details as soon as I've had a chance to try it on a real project.
We're currently using an incredibly simple cache that I wrote in a couple of hours, based on re-hosting the ASP.NET cache in a Windows Service (more info and source code here). I won't pretend it's anywhere near as optimised as something like Memcached but we were just looking for something simple and free until Velocity came along, and it's held up extremely well even under fairly heavy load.
It comes down to our personal preference for core components - i.e. ones that affect whether the site is available or not: they should either be (a) supported by a vendor with a history of rapid, high-quality support, or (b) written by us, so that if something goes wrong we can fix it quickly. Open source is all well and good, and indeed we do use some OSS, but if your site is offline then unfortunately newsgroups et al. don't have a 1-hour SLA, and just because it's OSS doesn't mean you have the necessary understanding or ability to fix it yourself.
We are using the memcached port for Windows and we are very pleased with it. The enyim.com memcached client API is great and easy to work with. It's also open source, which is a big advantage, if you ask me.
We are now using this setup in a production web-app and it has helped a lot in improving its performance.
There's a great .NET wrapper/port on CodePlex. Awesomesauce!
We use memcached with the enyim library in a production environment (www.funda.nl). It works fine and we're very pleased with it, but we did notice a substantial rise in CPU usage on the clients, presumably due to the serialization/deserialization going on. We do around 1,000 reads per second.
One tried and tested product, used by hundreds of customers worldwide, is NCache. It's a feature-rich product that lets you store session state in a redundant and highly available manner, lets you share data within the enterprise and bridge WAN communication (essentially acting as a data fabric), and lets you build an elastic caching tier, so that when your application scales you can add servers to the cache and actually boost performance further.

Resources