So far I've only used Jupyter on my local machine, which is way too slow. I'm completely new to using cloud services for Jupyter, or using cloud services at all for that matter. I know there are a million tutorials out there, but this is my problem: How to choose the right service from all those options (Amazon? Google? Cheaper options?)? What's the 'right way' to get started?
What I need:
I want a service where I can start up a Jupyter notebook in my browser as simply as possible. (I know next to nothing about setting up servers etc., and have very limited time to learn that if needed)
I currently have an old MacBook from 2014. The server should be at least 10x faster. (Which options do I need to pick?)
I want to do machine learning, so GPUs would be good.
My budget is about $50 per month, less would be great; a free tryout would be great too.
As I am completely new, I also need to know what pitfalls to look out for. (E.g.: Stop the machine to stop increasing the costs?)
If you could help me, or point me to a good tutorial or even a book, I'd be forever grateful.
(Sorry for the basic question. Of course I googled tutorials myself before posting this question, but as indicated above, I'm overwhelmed by the options - that's why I posted this question.)
AWS based tutorial:
https://aws.amazon.com/de/getting-started/hands-on/get-started-dlami/
GPU, CPU, and pricing information is gathered here:
https://docs.aws.amazon.com/dlami/latest/devguide/pricing.html
You can set up a budget for cost limitation:
https://aws.amazon.com/de/getting-started/hands-on/control-your-costs-free-tier-budgets/
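To make the $50 budget and the "stop the machine" pitfall concrete, here is a rough arithmetic sketch. The hourly rates in it are made-up placeholders, not real AWS prices, and the function names are only for illustration; check the pricing page linked above for current numbers.

```python
# Rough budget arithmetic for a pay-per-hour cloud notebook machine.
# NOTE: the hourly rates used below are hypothetical placeholders,
# not actual prices from any provider.

def affordable_hours(monthly_budget: float, hourly_rate: float) -> float:
    """Hours per month an instance can run before the budget is used up."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return monthly_budget / hourly_rate


def monthly_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of running an instance for hours_per_day on each of `days` days."""
    return hourly_rate * hours_per_day * days


if __name__ == "__main__":
    budget = 50.0  # the budget from the question
    hypothetical_rates = {"small CPU instance": 0.10, "single-GPU instance": 0.60}
    for name, rate in hypothetical_rates.items():
        hours = affordable_hours(budget, rate)
        print(f"{name}: about {hours:.0f} hours/month within ${budget:.0f}")
    # The pitfall: an instance that is never stopped bills around the clock.
    always_on = monthly_cost(0.60, hours_per_day=24)
    print(f"GPU instance left running all month: ${always_on:.2f}")
```

The point of the second function is the pitfall from the question: an instance that is used two hours a day but never stopped is billed for all 24.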
A friend wants to start a dating website, and she wants me to help her. We still haven't discussed what platform it'll be developed on, but I'm thinking she'll suggest LAMP to save a buck (which is already one reason to choose it over ASP.NET). If the dating website does well, it'll potentially hold a large amount of data (I'm not sure if this would be another reason to consider either ASP.NET or LAMP).
Anyway, I ask this from an ASP.NET developer's point of view. I have very little, almost no experience with LAMP, and I don't like it very much either, so if she decides to go with PHP odds are I won't help her. So what would be some good points to bring up when deciding which platform to develop on?
Please be objective, I don't want this to be argumentative or anything, try to stick to facts, not opinions alone.
Thanks!
What generally matters in that kind of choice is:
How much time will it require?
How much money will it cost?
Which is often linked to the time ^^
If you have a lot of experience with .NET and none with Linux/Apache/PHP/MySQL, choosing LAMP will mean that you'll need much more time: a whole lot of new stuff to learn.
It'll also mean that your code will probably not be as good as it would be with what you know.
After that, the question is: do a couple of weeks "cost" more than a few licences?
Only you and she can decide that ;-)
If LAMP makes you queasy, you can try ASP.NET over Mono.
IMO the only good reason to move away from a programming environment that you are already experienced with is the one you already mentioned: cost.
You would use LAMP specifically to build appliances. If you're not building appliances, the software cost for ONE server is marginal, and is not worth the tradeoff for moving to a totally different development environment, IMO.
I think the first question is: Which is the target programming language and environment that you have experience with?
Imagine the site will become a success - how do you scale then? LAMP can scale, and so can WISC, but in both scenarios you need people who actually know the environment and who can secure it. If you don't know Linux and MySQL and PHP, how are you going to scale and secure it?
So even though LAMP may be significantly cheaper (The SQL Server license is the heavy part in the WISC stack), after the first hacker attack or downtime, that initial savings may seem marginal compared to the damage.
The other thing is of course the PHP vs. ASP.NET/C# decision. If you don't know PHP, then it's a decision between "Not having the application at all" and "Having the application on an expensive stack", unless of course your partner decides to hire someone else to develop it.
Technically, both have their pros and cons, but there are huge websites built on both stacks, so it really boils down to "Which platform can you reliably/comfortably set up and maintain?"
I agree with Pascal. Go with what you feel comfortable with in completing the project and don't forget that YOUR TIME EQUALS MONEY. You have to put a $$ value on your time. LAMP may be cheaper up front but if it winds up taking 1000 extra manhours, then suddenly it's more expensive.
Also take into account the lost opportunity cost in not being able to bring something to market b/c you chose a technology you were not familiar with.
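The "your time equals money" argument can be put into numbers. This is only a sketch: the licence cost and hourly value below are invented for illustration, and only the 1000 extra hours comes from the answer above.

```python
# Break-even sketch: pay for licences on the stack you know, or spend extra
# hours learning a free stack? All money figures are hypothetical.

def break_even_hours(license_cost: float, hourly_value: float) -> float:
    """Extra learning hours at which the free stack stops being cheaper."""
    if hourly_value <= 0:
        raise ValueError("hourly_value must be positive")
    return license_cost / hourly_value


def cheaper_option(license_cost: float, extra_hours: float, hourly_value: float) -> str:
    """Compare the licence bill with the value of the extra hours spent."""
    learning_cost = extra_hours * hourly_value
    return "familiar stack" if license_cost < learning_cost else "free stack"


if __name__ == "__main__":
    license_cost = 5000.0  # hypothetical total licence cost
    hourly_value = 50.0    # hypothetical value of one hour of your time
    print(break_even_hours(license_cost, hourly_value))      # hours to break even
    print(cheaper_option(license_cost, 1000, hourly_value))  # the 1000-hour case
```

With these made-up figures the free stack only wins if the switch costs fewer than 100 extra hours, which is the comparison the answers above are asking you to make with your own numbers.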
At the end, if the plans are for this to be a business that is successful, the cost of using ASP.NET should be negligible or else I would question the seriousness of the effort.
One argument for the Apache/MySQL/PHP stack is that it's available on most major platforms (Windows/Linux/Mac/BSD/...) and most webhosters provide it as well.
You'll also find many (as in "huge amounts of") good tutorials, books and other educational stuff about PHP/MySQL.
Apart from that all tools used in the LAMP stack are free (as in "free speech" and also as in "free beer"). ASP.NET is still a proprietary technology owned by Microsoft. I'm not a huge open source fan, but knowing that your tools will remain free to use in any way you want is quite nice.
Of course, if you have no experience with PHP at all and much exp. with ASP.NET it's easier for you to stick with ASP.
If you're comfortable with Microsoft products there's nothing to stop you from developing code in .NET and using a free database (however, you may need to find/develop a custom database adapter if you are not using free versions of SQL Server or Oracle). If you are generating a lot of traffic you can swap out the data layer of your code and invest in a better performing database.
Time costs money and if you can develop a better product both from a user and maintenance/performance perspective it will serve you better in the long run.
Some hosting companies include the OS and flexible contracts, so I would choose whatever fits from your perspective. The market's pretty competitive for that type of site and there's no point throwing a lot of money at it until you get some useful metrics for your site IMO.
The short answer is: it doesn't matter, unless the site is going to do something so amazingly different that one technology is obviously better suited. And I can't think of anything like that off the top of my head.
A big red flag is: if your friend is concerned about the extra $5/month for asp.net hosting instead of LAMP hosting, then you're probably not going to get paid. Ever.
Caveats aside, be realistic: what is the immediate goal? To get something working, or to design something on the scale of plentyoffish.com or facebook.com? [Facebook.com has about 44,000 servers at the moment]
So, what are the chances of your friend's dating web site exploding to the size where scaling is a concern? For most sites, the answer is "very close to zero" - because of the marketing effort required to drive that much traffic.
Now, what is the revenue stream? Is there any expectation that you will get paid to do this? Do you think the site will be profitable? Is the project fully funded?
Friendship is great, but don't let that keep you from asking the appropriate business and client-relationship questions. One sure way to ruin a friendship is to do some work for free and/or without thinking through the full extent of the project. Far too often, you think it is a one-time favor, while they think it is your job!
LAMP is only cheaper until you read the fine print. It's not better or worse technically, just different.
The WebsiteSpark/BizSpark programs will get you all the Microsoft software you need to get started, free for three years. If price is her driving concern, point her to those programs if she's willing to consider the ASP.NET platform.
Hosting will cost a fair amount either way, because for a full-service website you don't want to go shared. You'll need at least one dedicated server to support a dating site. The OS and database will be free either way if you go with one of the *Spark programs I mentioned.
As a small startup company you can get a free 3-year MSDN subscription (well, you have to pay $100 at the end of the 3 years). If you think .Net will be more efficient and this website will make money, seriously consider BizSpark.
Since you are looking at a dating site, check out Markus Frind of plentyoffish.com; he is running the largest dating site on the .NET platform, with ASP.NET and SQL Server.
Apologies for this huge question.....please bear with me and try to help :)
Previous employers have all had in house hosting or people other than me to deal with that side of stuff and all my personal projects (ie low traffic) have been comfortably handled by servergrid.com who allow any number of domains even in their basic package.
I am about to take on more serious projects and have little clue about hosting, the questions to ask and what to look for. Some basic research has been done but I am honestly confused by the number of metrics involved when the main thing I care about is SPEED & SCALING.
I have noticed that servergrid DB servers, for instance, share many 100s of DB users per server, so I imagine a shared package where you're paying just $2/month for SQL Server, though a bargain, is not going to scale beyond a hobby site.
So:
is moving to a dedicated or virtual dedicated server the simple answer to speed and the only real metric I need to worry about?
dedicated pricing is a big jump on servergrid - are there premium shared services that don't put a bazillion people on the server - it doesn't seem obvious from the sites, would it make a huge difference?
the landscape seems to be changing in a big way - IIS7 and Server 2008 seem to have all these features like Isolated Application Pools / Hyper-V; are these just BS hype or things that seriously help with scaling and speed?
Lastly cloud hosting (specifically http://www.rackspacecloud.com) - it runs .NET right, is it fundamentally architecturally different to anything else or just use of the word cloud for marketing? It looks v cool - but is it just normal hosting with a different billing model and a somewhat easier way to scale? Is this similar to the much hyped squarespace hosted blog/site system?
Sorry for my rambling style of question and would be deeply grateful for someone who can just in relatively plain english sweep away some of my basic misconceptions....
Thanks!
Okay, take a look at Amazon Web Services. They are very flexible in terms of infrastructure (both hardware and software) and I find their rates to be ok. Also, their business model revolves around "using" not "leasing" (ie you pay based on what you use, for how long, etc).
I think it's a good starting point.
Since your main concern is "speed" & "scale" you may also take a look at Windows Azure and SQL Azure
Windows Azure
A nice brief video explanation by Steve Marx.
What is Windows Azure
I would stay away from shared hosting for a "more serious" production deployment. Amazon's AWS is as good a place to start as any (rackspace has a similar service which now supports self-provisioning). Failing that, you might carefully evaluate how much scale you really need. If you know how many users you'll have and have any idea what their usage patterns will be, then get dedicated hosting to fit. If the number of your users is unknown and unpredictable, and their usage will be spiky, then go with AWS.
That would be my first-pass approach. YMMV, and it will take time to fine-tune your own approach.
We have 4 different environments:
Staging
Dev
User Acceptance
Live
We use TFS, pull down the latest code and code away.
When they finish a feature, the developers individually upload their changes to Staging. If the site is stable (determined by really loose testing), we upload changes to Dev, then UserAcceptance and then live.
We are not using builds/tags in our source control at all.
What should I tell management? They don't seem to think there is an issue as far as I can tell.
If it would be good for you, you could become the Continuous Integration champion of your company. You could do some research on a good process for CI with TFS, write up a proposed solution, evangelize it to your fellow developers and direct managers, revise it with their input and pitch it to management. Or you could just sit there and do nothing.
I've been in management for a long time. I always appreciate someone who identifies an issue and proposes a well thought-out solution.
Whose management? And how far removed are they from you?
I.e. If you are just a pleb developer and your managers are the senior developers then find another job. If you are a Senior developer and your managers are the CIO types, i.e. actually running the business... then it is your job to change it.
Tell them that if you were using a key feature of very expensive software they spent a lot of money on, it would be trivial to tell what code got pushed out when. That would mean that in the event of a subtle bug getting introduced that gets past user acceptance testing, it would be a matter of diffing the two versions to figure out what changed.
One of the most important parts of using TAGS is so you can rollback to a specific point in time. Think of it as an image backup. If something bad gets deployed you can safely assume you can "roll" back to a previous working version.
Also, developers can quickly grab a TAG (dev, prod or whatever) and deploy to their development PC...a feature I use all the time to debug production problems.
So you need someone to tell the other developers that they must label their code every time a build is done and increment a version counter. Why can't you do that?
You also need to tell management that you believe the level of testing done is not sufficient. This is not a unique problem for an organisation and they'll probably say they already know. No harm in mentioning it though rather than waiting for a major problem to arrive.
As for individuals doing builds versus automated build processes, whether you really need automation depends on how many developers there are and how often you do builds.
What is the problem? As you said, you can't tell if management see the problem. Perhaps they don't! Tell them what you see as the current problem and what you would recommend to fix the problem. The problem has to be of the nature of "our current process has failed 3 out of 10 times and implementing this new process would reduce those failures to 1 out of 10 times".
Management needs to see improvements in terms of: reduced costs, increased profits, reduced time, reduced use of resources. "Because it's widely used best practice" isn't going to be enough. Neither is, "because it makes my job easier".
Management often isn't aware of a problem because everyone is too afraid to say anything or assumes they can't possibly fail to see the problem. But your world is a different world than theirs.
I see at least two big problems:
1) Developers loading changes up themselves. All changes should come from source control. Do you encounter times where someone made a change that went to production but never got into source control and then was accidentally removed on the next deploy? How much time (money) was spent trying to figure out what went wrong there?
2) Lack of a clear promotion model. It seems like you guys are moving changes between environments rather than "builds". The key distinction is that if two changes work great in UAT because of how they interact, if only one change is promoted to production it could break there. Promoting consistent code - whether by labeling it or by just zipping up the whole web application and promoting the zip file - should cause fewer problems.
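The "zip up the whole web application and promote the zip file" idea from point 2 could look roughly like the sketch below. Everything in it is an assumption for illustration: the directory layout, the build label format, and the function names are hypothetical, and it has nothing to do with TFS itself. It only shows that every environment receives the same archive, byte for byte.

```python
# Sketch: build one archive per release and promote that same file through
# each environment, instead of copying individual changes around.
# Directory names and labels are hypothetical.
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_build(site_dir: Path, builds_dir: Path, label: str) -> Path:
    """Zip the whole site into builds_dir/<label>.zip and return its path.

    builds_dir should live outside site_dir so the archive does not
    end up containing itself.
    """
    builds_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(str(builds_dir / label), "zip", root_dir=str(site_dir))
    return Path(archive)


def promote(archive: Path, env_dir: Path) -> Path:
    """Copy the unchanged archive into an environment and verify the copy."""
    env_dir.mkdir(parents=True, exist_ok=True)
    target = Path(shutil.copy2(archive, env_dir / archive.name))
    if sha256_of(target) != sha256_of(archive):
        raise RuntimeError("promoted archive differs from the original build")
    return target
```

A typical flow would be one `package_build(...)` call per release, followed by `promote(...)` into UAT and later into production with the same archive, so what was tested is exactly what goes live.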
I work on the continuous integration and deployment solution, AnthillPro. How we address this with TFS is to retrieve the new code from TFS based on a date-time stamp (of when someone pressed the "Deliver to Stage" button).
This gives you most (all?) the traceability you would have of using tags, without actually having to go around tagging things. The system just records the time stamp, and every push of the code through the testing environments is tied to a known snapshot of code. We also have customers who lay down tags as part of the build process. As the first poster mentioned - CI is a good thing - less work, more traceability.
If you already have TFS, then you are almost there.
The place I'm at was using TFS for source control only. We have a similar setup with Dev/Stage/Prod. I took it upon myself to get a build server installed. Once that was done I added in the ability to auto deploy to dev for one of my projects and told a couple of the other guys about it. Initially the reception was lukewarm.
Later I added TFS Deployer to the mix and have it set to auto deploy the good dev build to stage.
During this time the main group of developers were constantly fighting the "Did you get latest before deploying to Stage or Production?" questions; my stuff was working without a hitch. Believe me, management and the other devs noticed.
Now (6 months into it), we have a written rule that you aren't even allowed to use the Publish command in visual studio. EVERYTHING goes through the CI build and deployments. When moving to prod, our production group pulls the appropriate copy off of the build server. I even trained our QA group on how to do web testing and we're slowly integrating automated tests into the whole shebang.
The point of this ramble is that it took a while. But more importantly it only happened because I was willing to just run with it and show results.
I suggest you do the same. Start using it, then show the benefits to get everyone else on board.
Here's the situation: at my small office, because we like to keep mobile and occasionally work from home, instead of having a central file server, we have all the office documents in an SVN repository, and each person keeps a checkout on their own laptops. A checkout weighs in at about 3GB, and the repo with revisions in it: about 6GB. This is all working great.
The problem is that soon we won't have a small office any more - all our 5 workers will be working remotely. I had considered purchasing a dedicated server and running our SVN repository from that, except two of our workers will be really remote and will be using wireless "broadband" with a 3GB/month limit, and I'm afraid that a few large updates will really rip through their monthly allowance, not to mention taking all day to complete.
Reading a few questions on Stack Overflow, it seems there's quite a community of distributed VCS aficionados who think git or mercurial is definitely the best for many situations. Given that all the employees would still be able to meet face-to-face at least once a fortnight (and hence be on a fast LAN), I'm wondering if a DVCS would work for us?
I don't know exactly what's in your repo, but unless you're changing all the files regularly, a DVCS should provide you a very desirable workflow.
You could do an svn -> git conversion, stick the repo on a DVD and mail it out to all the satellite offices, and then let them fetch from the office as things change at a fairly low incremental cost (should be smaller than the delta in general).
Check out the Fossil DVCS; it may fit the bill. Fossil may be used like SVN or as a DVCS. If you are concerned about it handling your current repository, try it out. It also has a built-in project wiki and bug tracking system that are distributed with the repository. You could try it out and see if it would work for your small team.
The pain for you would be losing your revision history; at this time I don't believe you can import an SVN repository into Fossil.
Join the mailing list and you will get answers to any of your questions. The creator of SQLite is also the creator of this project. Hope this helps.
I can't see why not. With something like git, the repository is local to the machine, and so your remote employees can actually have a tracked changelog that can then be merged or rebased with the main repository--whatever you decide that to be--when they get the chance.
Also, git has really good compression compared to SVN, so the 3GB/mo quota may be more than enough for your remote employees.
Randal Schwartz actually gave a really good presentation on git at Google's Tech Talks: http://www.youtube.com/watch?v=8dhZ9BXQgc4
(It seems no one is answering this.) DVCS of course seems like it would work, but I have no experience with it. A centralized system like svn might also work if you are not expecting large changes (going up to and back from the server) daily. The initial get in that case would be the only really expensive part.
Can you monitor your use now and see how much traffic goes back and forth?
The real problem here is the 3GB/mo bandwidth limitation. It's probably just better to come up with a better solution for connectivity...
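The bandwidth worry is easy to check with a few lines of arithmetic. In this sketch the 3 GB cap and the 6 GB repository size come from the question; the monthly delta of 0.2 GB is a pure guess that you would replace with measured traffic, and the function name is hypothetical.

```python
# Sketch: does repository traffic fit under a monthly data cap?
# The 0.2 GB monthly delta used in the example is a hypothetical guess.

def fits_quota(cap_gb: float, initial_transfer_gb: float,
               monthly_delta_gb: float, other_usage_gb: float = 0.0) -> dict:
    """Check the first month (including the initial transfer) and later months."""
    first_month = initial_transfer_gb + monthly_delta_gb + other_usage_gb
    later_months = monthly_delta_gb + other_usage_gb
    return {
        "first_month_ok": first_month <= cap_gb,
        "later_months_ok": later_months <= cap_gb,
    }


if __name__ == "__main__":
    # Cloning the full 6 GB history over the capped link blows the first month.
    print(fits_quota(cap_gb=3.0, initial_transfer_gb=6.0, monthly_delta_gb=0.2))
    # Doing the initial copy on the office LAN or from a DVD avoids that.
    print(fits_quota(cap_gb=3.0, initial_transfer_gb=0.0, monthly_delta_gb=0.2))
```

Under these assumptions the ongoing sync is not the problem; the one-off initial transfer is, which is why the DVD or fortnightly LAN suggestion matters.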
I am curious as to what others are using in this situation. I know a couple of the options that are out there like a memcached port or ScaleOutSoftware. The memcached ports don't seem to be actively worked on (correct me if I'm wrong). ScaleOutSoftware is too expensive for me (I don't doubt it is worth it). This is not to say that I don't want to hear about people using memcached or ScaleOutSoftware. I'm just stating what I "know" at this point.
So my question is basically this: for those of you ACTIVELY using distributed caching, what are you using, are you happy with it, and what should I look out for?
I am moving to two servers very soon...both will be at the same location. I use caching fairly heavily (but carefully) to reduce the load on my database server.
Edit: I downloaded Scaleout Software's solution. I've coded for it and it seems to work real well. I just have to decide if my wallet will part with the cash for it. :) Anyone have experiences good or bad with ScaleoutSoftware?
Edit Again: It's been a little while since I asked this. Any more thoughts on it? We ended up buying the solution from ScaleOutSoftware and have been happy with it, but I'm curious what others are doing.
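For readers new to the idea, "caching to reduce the load on the database" usually means the cache-aside pattern: look in the cache first and only hit the database on a miss. The sketch below uses an in-process dictionary as a stand-in for a real distributed cache, so it illustrates the pattern only. The class and function names are hypothetical and are not the API of memcached, ScaleOut, or any other product mentioned in this thread.

```python
# Cache-aside sketch with a time-to-live. The dict-backed TTLCache is a
# stand-in for a distributed cache client; all names are hypothetical.
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal key/value store whose entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def get_or_load(cache: TTLCache, key: str, loader: Callable[[], Any],
                ttl_seconds: float = 60.0) -> Any:
    """Return the cached value, calling loader (e.g. a DB query) only on a miss.

    A loader result of None is treated as "nothing to cache".
    """
    value = cache.get(key)
    if value is None:
        value = loader()
        if value is not None:
            cache.set(key, value, ttl_seconds)
    return value
```

With two web servers, the in-process dictionary would be replaced by a shared cache so both servers see the same entries; the calling code around `get_or_load` stays the same shape.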
Microsoft has a product pending code-named Velocity. It's still in CTP, and is moving slowly, but looks like it will be pretty good. We'll be beating it up in the near future to see how it handles what we want it to do (> 2 million read/writes per hour). Will post back with results.
There is a 100% native .NET, well documented open source (LGPL) project called Shared Cache. Looks like it is not yet mentioned on SO, but it's promising and should be able to do what most people expect from a distributed cache. It even supports different strategies like distributed or replicated caching etc.
I will update this post with more details as soon as I have had a chance to try it on a real project.
We're currently using an incredibly simple cache that I wrote in a couple of hours, based on re-hosting the ASP.NET cache in a Windows Service (more info and source code here). I won't pretend it's anywhere near as optimised as something like Memcached but we were just looking for something simple and free until Velocity came along, and it's held up extremely well even under fairly heavy load.
It comes down to our personal preference for core components - i.e. ones that affect whether the site is available or not - that they are either (a) supported by a vendor with a history of rapid and high quality support, or (b) written by us so that if something goes wrong we can fix it quickly. Open source is all well and good, and indeed we do use some OSS, but if your site is offline then unfortunately newsgroups et al don't have a 1 hour SLA, and just because it's OSS doesn't mean you have the necessary understanding or ability to fix it yourself.
We are using the memcached port for Windows and we are very pleased with it. The enyim.com memcached client API is great and easy to work with. It's also open source, which is a big advantage, if you ask me.
We are now using this setup in a production web-app and it has helped a lot in improving its performance.
There's a great .NET wrapper/port found here on Codeplex. Awesomesauce!
We use memcached with the enyim library in a production environment (www.funda.nl). Works fine, very pleased with it, but we did notice a substantial rise in CPU use on the clients, presumably due to the serializing/deserializing going on. We do around 1000 reads per second.
One product tried and tested by 100s of customers worldwide is NCache. It's a feature-rich product that lets you store session state in a redundant and highly available manner, lets you share data within the enterprise as well as bridging for WAN communication (essentially acting as a data fabric), and lastly lets you build an elastic caching tier so that when your application scales, you can add servers to the cache and actually boost performance further.