Related
my name is Benjamin Mach and I’m a student of computer science at the FAU Erlangen-Nürnberg. For my bachelor thesis, I want to prepare measurement software for collaboration within software organizations that collaborate (potentially over intra-organizational boundaries) using software forges. For the models we want to use, it is important that I collect data about the structure of the organizations.
I read about a strategy of resolving the missing hierarchy through delimiters. (e.g. maingroup-subgroup1-subgroupofsubgroup3) Is this common practice for you and if not, how do you handle this?
Thank you for your time.
Best regards,
Benjamin Mach
Answering this question made me wonder: is there nice modern version of the famous C10K page? 5 years passed from the time it was updated and I'm sure there're advances in the field.
Perhaps current version would be called something like C1M problem? ;)
In the modern world of things like distributed load-balancers and content distribution networks, I'm not sure that a single system having to be able to handle such a large number of concurrent connections is as big a deal. These days scalability is more about scaling out to distribute than scaling up to beef up your capabilities.
There's a million user comet application , though it investigates Erlang(+ OS tuning)
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
Improve this question
What are the relevant skills in the arsenal of a Data Scientist? With new technologies coming in every day, how does one pick and choose the essentials?
A few ideas germane to this discussion:
Knowing SQL and the use of a DB such as MySQL, PostgreSQL was great till the advent of NoSql and non-relational databases. MongoDB, CouchDB etc. are becoming popular to work with web-scale data.
Knowing a stats tool like R is enough for analysis, but to create applications one may need to add Java, Python, and such others to the list.
Data now comes in the form of text, urls, multi-media to name a few, and there are different paradigms associated with their manipulation.
What about cluster computing, parallel computing, the cloud, Amazon EC2, Hadoop ?
OLS Regression now has Artificial Neural Networks, Random Forests and other relatively exotic machine learning/data mining algos. for company
Thoughts?
To quote from the intro to Hadley's phd thesis:
First, you get the data in a form that
you can work with ... Second, you
plot the data to get a feel for what
is going on ... Third, you iterate
between graphics and models to build a
succinct quantitative summary of the
data ... Finally, you look back at
what you have done, and contemplate
what tools you need to do better in
the future
Step 1 almost certainly involves data munging, and may involve database accessing or web scraping. Knowing people who create data is also useful. (I'm filing that under 'networking'.)
Step 2 means visualisation/ plotting skills.
Step 3 means stats or modelling skills. Since that is a stupidly broad category, the ability to delegate to a modeller is also a useful skill.
The final step is mostly about soft skills like introspection and management-type skills.
Software skills were also mentioned in the question, and I agree that they come in very handy. Software Carpentry has a good list of all the basic software skills you should have.
Just to throw in some ideas for others to expound upon:
At some ridiculously high level of abstraction all data work involves the following steps:
Data Collection
Data Storage/Retrieval
Data Manipulation/Synthesis/Modeling
Result Reporting
Story Telling
At a minimum a data scientist should have at least some skills in each of these areas. But depending on specialty one might spend a lot more time in a limited range.
JD's are great, and for a bit more depth on these ideas read Michael Driscoll's excellent post The Three Sexy Skills of Data Geeks:
Skill #1: Statistics (Studying)
Skill #2: Data Munging (Suffering)
Skill #3: Visualization (Story telling)
At dataist the question is addressed in a general way with a nice Venn diagram:
JD hit it on the head: Storytelling. Although he did forget the OTHER important story: the story of why you used <insert fancy technique here>. Being able to answer that question is far and away the most important skill you can develop.
The rest is just hammers. Don't get me wrong, stuff like R is great. R is a whole bag of hammers, but the important bit is knowing how to use your hammers and whatnot to make something useful.
I think it's important to have command of a commerial database or two. In the finance world that I consult in, I often see DB/2 and Oracle on large iron and SQL Server on the distributed servers. This basically means being able to read and write SQL code. You need to be able to get data out of storage and into your analytic tool.
In terms of analytical tools, I believe R is increasingly important. I also think it's very advantageous to know how to use at least one other stat package as well. That could be SAS or SPSS... it really depends on the company or client that you are working for and what they expect.
Finally, you can have an incredible grasp of all these packages and still not be very valuable. It's extremely important to have a fair amount of subject matter expertise in a specific field and be able to communicate to relevant users and managers what the issues are surrounding your analysis as well as your findings.
Matrix algebra is my top pick
The ability to collaborate.
Great science, in almost any discipline, is rarely done by individuals these days.
There are several computer science topics that are useful for data scientists, many of them have been mentioned: distributed computing, operating systems, and databases.
Analysis of algorithms, that is understanding the time and space requirements of a computation, is the single most-important computer science topic for data scientists. It's useful for implementing efficient code, from statistical learning methods to data collection; and determining your computational needs, such as how much RAM or how many Hadoop nodes.
Patience - both for getting results out in a reasonable fashion and then to be able to go back and change it for what was 'actually' required.
Study Linear Algebra on MIT Open course ware 18.06 and substitute your study with the book "Introduction to Linear Algebra". Linear Algebra is one of the essential skill sets in data analytic in addition to skills mentioned above.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
There are a lot of good solutions for audio and video conferencing, task, calender and document management. We got specs, uml diagrams, code generators, etc.
But still companies pour tons of cash so that people can be physically there even in the times of recession and i wonder why?
Nothing beat a face to face meeting. Period.
I work with a remote development team every day and I can only support other responders in saying that NOTHING beats working face-to-face. You need the subtle cues of body language, facial expression, and the ease of communication when you're physically present such as doodling on a whiteboard. Video conferencing is a close second but the organizational issues are difficult to overcome (meeting rooms, webcams, bandwidth...).
Communication through documentation works to some extent, but is often perceived as unnecessary overhead by developers who drank the Agile kool-aid. I try to use the phone, skype, MSN or e-mail as much as possible, but it works better with those people of the team that I've actually worked with in-person for at least a few days.
Distance isn't to much the problem, to be honest. You're either co-located or you're not.
We use a combination of Skype IM, Skype Voice, mobile phones and email to keep in touch. We haven't really got into webcams properly, but even then, there's something about face to face contact you can't really replicate with technology.
I think most companies see splitting up their workforce as a step-change. A company that started out with homeworkers is better able to find and establish an office to move them into than the other way round. Money is only one consideration - it really does change the way the team works, and if you get it wrong, you don't have a team, you have a bunch of solo developers who actually take longer to do things.
Of course, it's also easier to recruit and mentor new members of the team if there's an office for everyone to work in.
Many people do better face-to-face. Many people do just as well at a distance. However, people in management tend to more focused on interpersonal relations, which generally means they're face-to-face people. So, as a general rule, people in management tend to dislike or distrust meeting at a distance.
Furthermore, meetings are very often unproductive. This applies to meetings at a distance and face-to-face meetings. Indeed, it's significantly easier to get off-topic, unprofessional, and unproductive when meeting face to face. However, when an at-a-distance meeting is unproductive, it's almost always seen as such because it was at a distance. It can be exceptionally frustrating to deal with the technology and limitations of remote meetings, and it's infinitely easier to blame the situation and the technology for your lack of productivity.
To sum up: people are a problem.
I have a few cynical and contrary opinions on this, based on my experience working for a large Australian organisation with branches all over the country and my current remote work for a US company.
Cynically - face to face works so you can do deals off the record. This may not be as corrupt or as underhand as it sounds but an astounding amount of management-level decision making happens where people negotiate relative tradeoffs involving influence, favors accrued and owed and stuff which is hard or embarrassing to quantify. Even when an organisation has a commitment to using teleconferencing, groups emerge who negotiate off-camera and thus acquire a competitive advantage.
At the purely technical level, I think face-to-face is nowhere near as important as cited. The political issue is in drawing this distinction - if you label your stuff as non-political and safe to do via remote comms, you are explicitly labeling the other negotiations as somehow not safe. Another aspect is that people looking to move up to management need to become visible and a known player in the face-to-face discussions.
Developers, including myself, are notoriously poor at picking up the non-verbal cues cited above (just ask my wife!). In a relaxed atmosphere of trust, they can use emoticons and in-jokes explicitly in IM sessions without worrying about translating someone else's expression, especially across cultures.
IM sessions, with the ability to search the transcript, are far more efficient than verbal or video conversations, when discussing projects. If you don't pick up some nuance at the time someone says it, you can go back and examine the exact sentence in context.
I use video chat infrequently and the main use of voice chat is so I can talk to my boss in his spare time whilst he's driving. Those are good conversations to give me a general feel for how things are going but usually inadequate for technical.
Here's a podcast that talks about distributed software development: Managing Commercial Software Projects. Here's a blurb from the show page:
Andy Singleton is an entrepreneur who
has long studied and practiced the art
of distributed software development.
Influenced by the open source and
agile movements, he has arrived at
some startling conclusions about how
to manage commercial projects. Among
them: don't interview people, don't
estimate schedules, and don't spend
time in teleconferences. In this
conversation with host Jon Udell he
explains why not to do these things,
and what to do instead.
I thought it was pretty interesting.
Face to face meetings provide a lot of visual feedbacks which you do not get in other means. This is must if you want to discuss important subjects like architecture, reviews etc, which tend to be slow or useless if done over phone, email, twiki etc.
Typically status updates can be done via other means, we normally use Skype, Twiki, Email, Phone keep in sync.
I really like communication via IM, email or phone. It's totally ok and I really appreciate using it.
Now comes the big "but" (with a single 't'):
You are not able to "just walk over" your colleague and ask him** a short question. For sure, you can IM him or mail him. But it will take some times till he answers your question.
The other point is drinking coffee. You cannot just drink a cup of coffee with him and talk about your problems.
If you let your brain release your thoughts, problem will disappear. And that's one reason behind drinking coffee with colleagues.
I really need personal communication. I need it. About 70% of the communication can be replaced by IM or whatever, but the 30% are very important.
** him = him/her
Some 80% communication is non-verbal, video conferencing helps a bit, but is not good enough.
There have been studies done over success rate of business negotiation and project cooperation depending on type of communication used. It was ranging from 90% success face to face, to less then 10% with email and text only IM.
For example one of such studies, conducted by Harvard Business School professor Kathleen L. Valley yielded for example such results:
"among 24 four-person decision making
groups interacting via computer, there
were 102 instances of rude or
impulsive behavior. Another 24 groups
that interacted in person yielded only
12 remarks of that nature."
Related Wired article: "The Secret Cause of Flame Wars"
I'm a totally blind individual who would like to learn more of the theory aspect of computer science. I've had an intro data structures class and the general intro programming but would like to learn more on things such as software design, advanced data structures, and compiler design. I want to do this as a self study course not as part of college classes.
Unfortunately there aren’t many text books available on computer science from Recordings for the Blind and Dyslexic where I normally get my textbooks. I would appreciate any electronic resources preferably free that could help me get more of a computer science education rather then the newest language or platform that a lot of programming sites appear to focus on.
You might find the Experiences of a Blind Computer Scientist a good read.
MIT's Open Courseware would be a good resource for you with the amount of videos/audio they have.
Really though, for the core computer-science topics I find it pretty hard to beat some of the better textbooks out there. Some offer digital versions of their book with purchase and some don't. For those that don't, I would just purchase the book and then download via a torrent site a digital e-book equivelant. Since you already own the book I don't think this would be a major problem.
UC Berkley has a couple of computer science courses online for free as mp3 and video files (including RSS feed for each course). And if reading PDF files aren't an issue you could check out O'Reilly's Safari.
The text book for Structure and Interpretation of Computer Programs appears to be accessible. Software engineering radio is a good podcast that I listen to but recently has focused a lot on model driven development and UML which doesn't interest me. The UC Berkley
lectures are of varying quality, it's like all other college classes it depends on the professor. I've found I can follow along with the cs162 lectures fine but not so much with the cs61b. Part of this is because of the professor and part is probably because 61b is more math heavy since it's a data structures class. Unfortunately the RSS feeds are useless since the file names are meaningless. I used my podcatcher to download the entire lecture series, then used the converting capability of foobar 2000 to rename the files with there track number so I could listen to them in order. I've used Safari at work before and it is accessible although to expensive for me to get a yearly subscription. Open Courseware appears to have a lot of good stuff. Unfortunately I don't use itunes so instead of downloading each mp3 file individually I used the firefox extension DownThemAll! with a custom filter to grab all the mp3 files at once from the specific course I wanted. Another series of books that looks useful are the data structures books by Bruno R. Preiss several of which are available online at
http://www.brpreiss.com/books/opus5/
Some of the equations are represented as graphics but I can often tell what the general idea is by context.
I wonder would the Structure and Interpretation of Computer Programs video lectures by Hal Abelson and Gerald Jay Sussman be of any use?
If the audio content is enough on its own without the video, they are an excellent digital resource.
The podcast "software engineering radio" is excellent. Though not CS courseware, it is the most academic and intellectually stimulating podcast I have found about software development and computer science.
http://www.se-radio.net/
personally I am just blown away by the questioner. I mean, the challenge alone of programming is too much for most people but being without the primary sense used in the task is amazing to me. What is ironic though is I bet that given this challenge the questioner is still FAR more adept at most CS tasks than the people I work with day to day. Just saying.
I'm also a totally blind programmer, currently working for Microsoft. The most valuable resource for te technical books is Safari (safari.oreilly.com). You can read thousands of computer science texts there. if you're in the USA, you can also get many of those titles for free from BookShare (www.bookshare.org). In both cases graphical images will be an issue, but there's no easy solution for that. Most good books have enough descriptive text that one can manage without the diagrams.
I to am a new blind programmer! I only lost my vision 5 years ago. Anyway, I have been programming in Visual Basic 2008 throughout the past year. It turned out to be more accessible than I had at first suspected.
I start a Java class next semester and the required text is a free online text! It is posted below.
Introduction to Programming Using Java, Fifth Edition
http://math.hws.edu/javanotes/
Can some of you seasoned blind programmers share with us any blogs or websites where other blind programmers can be found??
Check out this Stack Overflow question about podcasts.
A language called Quorum is a lot like Python but optimized across a few more syntactic details, and the corresponding development environment is designed with the blind in mind. https://quorumlanguage.com/ This might fit especially well with the use case where most students are using Python.
A 2016 blog about CSed (actually a response to a blog post) points to
program-l discussion board for blind programmers at https://www.freelists.org/list/program-l
The EPIQ conference for blind and other programmers interested in Quorum
https://quorumlanguage.com/epiq.html
Also, see other ideas in a similar question on another SO site: https://cseducators.stackexchange.com/questions/3441/teaching-a-blind-high-school-student