I'm currently interested in working in and studying big data analytics and web analytics, but I don't know how or where to get started. I tried looking on the Internet, but some of the material is too advanced for me. Are there any skills or knowledge in statistics and mathematics that I need before going down this route?
My current plan is to attend online courses every weekend, since I'm working as an Associate Software Engineer during weekdays, and to practice the programming languages used for big data, like R. I already have a degree in Computer Science, so familiarity with some statistical and mathematical methods is not a problem. Any suggestions and comments are much appreciated!
For those who already have experience: how was your experience, and what do you work with most?
I am in a similar boat as you. I work in a web development department as a business analyst. I do some software development, data mining, and data visualization, but I am constantly improving my skills because it's all pretty interesting to me, and it makes me an extremely versatile employee.
Web Analytics/Big Data
See if you can get read access to your company's Google Analytics account, assuming they have a website. The API is really good, and pre-built packages in R make it easy to pull out large amounts of data. If the website is big enough, you can easily create your own real data sets. While these probably won't be "big" as in "big data", they're definitely great for practicing data visualizations. I'd suggest learning Shiny and R Markdown; you can easily create web-stats visualizations to share with your company. If you run into issues with the amount of data you're trying to process (i.e., if they have a huge web presence), then you might look into Spark for processing big data.

Coursera has a specialization focusing on Big Data - https://www.coursera.org/specializations/big-data. You can take all the classes for free if you just "audit" them. You won't get a certificate or anything, but you get access to all the course material. They apparently cover Spark, Hadoop, Pig, and Hive. I haven't taken it, but the UCSD Coursera classes I have taken have been pretty good.
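Going back to Google Analytics: as a sketch of what pulling GA data into R can look like, here's a minimal example using the googleAnalyticsR package (one of several GA packages for R). The view ID, dates, and metric are placeholders you'd replace with your own.

    # Minimal sketch: pull daily session counts from Google Analytics.
    # The view ID, date range, and metric below are placeholders.
    library(googleAnalyticsR)

    ga_auth()  # opens a browser for OAuth the first time

    sessions <- google_analytics(
      viewId     = 123456789,  # hypothetical GA view ID
      date_range = c("2017-01-01", "2017-12-31"),
      metrics    = "sessions",
      dimensions = "date"
    )

    # A quick base-R plot to get a feel for the traffic
    plot(sessions$date, sessions$sessions, type = "l",
         xlab = "Date", ylab = "Sessions")

Once a fetch like this works, wrapping the plot in a Shiny app or an R Markdown report is a natural next step.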
Obviously Coursera isn't the end-all-be-all... Also check out edx.org, Pluralsight, Udemy, etc. You can get a free Pluralsight membership for a year - just Google it. Mine was through Microsoft somehow. My favorite Pluralsight courses (unrelated to data/analytics) have been the Ethical Hacking ones. Udemy often has amazing deals on HUGE courses - like 21 hours of lectures about Python for data analysis and the like. Just sign up for the service, and you'll get a "special offer" in a week or two; they're usually $10-20. https://www.brighttalk.com/ is also a good place for webinars and talks related to data science/analytics.
Databases
My company uses SQL Server (Microsoft), so I also took some database classes on MVA (Microsoft Virtual Academy). They have a bunch of classes from complete noob to brushing up on skills: MVA Database Stuff.
Data Sets
If you find yourself needing big data sets, join Kaggle. They often have great data sets for machine learning, but you can use them yourself to mine and visualize. I'd look for labelled data sets in particular; many of the bigger sets are completely anonymized - no labels, no nothin' - and that's not very fun if you're just digging around. Additionally, someone has compiled a bunch of public data sources here: https://github.com/caesar0301/awesome-public-datasets. Finally, NYC Open Data is one of my favorite places to get new data sets. Some are super boring, but there have been some cool analyses done on parking tickets and the like.
More...
If you're just looking for more classes to take or books to read, check out https://www.metacademy.org/. They have a few suggested paths to learn deep learning, machine learning, Bayesian stats, and other stuff like that. I think machine learning is an excellent next step - once you're versed in software development, database management/creation/querying, and visualization.
Even more...
Just immerse yourself. There are TONS of data blogs, podcasts, meetup groups, conferences, and news out there. Do all you can to get in there and figure out what's going on and who's doing what. It's super interesting anyway. Two of my favorite things to follow: DataTau (Hacker News for data science) and I Quant NY (linked above, for parking tickets).
Does anyone know of ways to use GML programming skills to find or create work, such as freelance work using GameMaker?
I was thinking of doing freelance game prototyping for people looking for a quick mock-up to get a feel for their game, something that can be done within a few days to a week.
If anyone has ideas, please help me out. Thank you.
Personally, I started out using GameMaker as a hobby, and I've worked on some small projects for other people, but I eventually got hired as a website and database programmer rather than as a game programmer.
Unless you find a team that is already using GameMaker for its project(s), your experience with GML may not count for much on its own, as the language is only useful inside GameMaker itself. However, understanding GML means that you also have basic programming skills, and once you know one way of programming, learning another goes much quicker.
GameMaker made programming easy and interesting for me, but other languages gave me the tools needed for non-game projects.
A company may not hire you based on the fact that you know GameMaker specifically, but it may hire you because you know programming. It could be wise to research other programming languages and learn the basics of how they work.
If you are to sell your skills to a client, they will likely care more about the end result than the exact road you took to get there. For example, if the job is to make a game that works on Android phones, that is something GameMaker can do, and by extension it is something you could do.
If GameMaker doesn't seem like the tool for the job, use what you learned from GameMaker to help you understand a different program/framework. Even if you focus on GameMaker, you may need other languages if you are to set up an online game server or scoreboard.
A lot of successful games have been made with GameMaker, so it's definitely possible to make a living by using it. The Showcase section on the official homepage shows us games like Hotline Miami and Undertale - big hits in the Steam store.
This article from GameMakerBlog.com lists a few people who've made it big. Most important, I would say, is "True Valhalla", who gives the community running updates on how his business is going. You can find his blog linked in the article. He has written a book about how to make money by selling apps and games, which could be well worth checking out.
If you wish to focus on freelance work using GameMaker alone, then make sure to understand the ins and outs of the program so that you can be as flexible as possible. Make sure that you understand how the movement functions work, how to do collision checking, how to work with data structures, how to work with views and surfaces, and so on.
The technical skill doesn't need to be perfect, but you need to have an idea of what to do and how in order to realize your ideas within a reasonable time frame. Practice until you feel comfortable taking a game from concept to demo in a short time, and build a collection of examples and engines that could be useful to you. If you can reuse a script, that's a lot better than writing it from scratch for every new game you make.
Finally: Marketing yourself. In order to become attractive to potential clients, it helps to demonstrate your expertise by publishing your work online. Make yourself visible. Post screenshots, videos, and playable versions of games you've made. You could blog about game development, or build up a small profile by helping people online and getting credit for it.
Any project you can point to and say "I worked on this" makes you a more credible developer. If you are just starting out you may not have any projects yet, so one suggestion would be to make a small mini-game and publish it in an app store. You may even publish it for free. For your first games, exposure could be as valuable as sales.
Background (sorry it's so long):
I've been tasked with maintaining an ETL that collects a variety of online advertising data, around 20-30 MB a day, and appends it to tables in MySQL. Outside contractors built the ETL with Pentaho Spoon (kitchen, kettle?). The ETL consists of about 250 jobs and transformations (.ktr, .kjb), each with about 5 to 25 steps. It is very common for something to go wrong somewhere in this large process. I've found that writing R scripts to do the transform and load is much more efficient. In fact, I think the ETL could be reduced to well under 1000 lines of code, aside from calls to RMySQL (i.e. plyr!). Perhaps Python would be used to extract the data from the web.
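For concreteness, here is a minimal sketch of what one such transform-and-load step looks like in R; the file, table, and column names are made up.

    # Minimal sketch of a transform-and-load step in R, along the lines
    # described above. File, table, and column names are hypothetical.
    library(RMySQL)
    library(plyr)

    con <- dbConnect(MySQL(), dbname = "addata", host = "localhost",
                     user = "etl", password = "...")

    raw <- read.csv("daily_ad_feed.csv", stringsAsFactors = FALSE)  # extract

    # Transform: one row per campaign/day with summed clicks and spend
    daily <- ddply(raw, .(campaign, date), summarise,
                   clicks = sum(clicks),
                   spend  = sum(spend))

    dbWriteTable(con, "ad_daily", daily, append = TRUE, row.names = FALSE)  # load
    dbDisconnect(con)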
My use of R has met some resistance. The programmers who designed the ETL don't know R, so they couldn't be called on if I leave, and a lot of time was invested in the Spoon ETL. Also, a layman can follow the steps more easily visually in Spoon than in R scripts. For my part, I think we are getting bogged down by the ETL. However, I don't have a large say in the matter, as I don't have a background in computer science.
Please comment if you have any insights on the following. Please know I have been researching this for months and have read many opinions, but nothing as concise or reliable as SO usually provides:
Some at the company have said R is not as scalable. I think mostly the opposite, because of the logging capabilities: Spoon has limited pure logging output, whereas every R script's output can be sink()ed into a daily log. Fixing and avoiding mistakes in the .ktrs is very tedious, but easy with flags and/or searching through the R log. Any thoughts on this?
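To illustrate what I mean, a minimal sketch of the sink() pattern; the file-name scheme is just one possibility.

    # Minimal sketch of the daily-log pattern with sink().
    logfile <- file(format(Sys.Date(), "etl_%Y-%m-%d.log"), open = "wt")
    sink(logfile)                      # capture normal output
    sink(logfile, type = "message")    # capture messages and errors too

    cat("ETL started:", format(Sys.time()), "\n")
    # ... extract / transform / load steps here ...
    cat("ETL finished:", format(Sys.time()), "\n")

    sink(type = "message")             # restore the message stream
    sink()                             # restore normal output
    close(logfile)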
This leads to a big-picture question: what is the point of ETL tools like Pentaho? The post "Do I need an ETL?" leads me to believe that if you use R or another so-called OOL, there is no reason to have a tool like Pentaho. Can someone please confirm this if so? I really need a second opinion here. If this is so, who uses tools like Pentaho? Is it simply people without a programming background, or someone else? I do see a fair amount of Pentaho questions on SO.
It is true that a lot more people use R than Pentaho, right? This http://www.kdnuggets.com/2012/05/top-analytics-data-mining-big-data-software.html makes it look so. To be honest, I was surprised that Pentaho was 5th, which makes me doubly wonder who uses Pentaho and whether my doubts about its use in my work setting are misplaced.
Thanks for any responses. I don't mean any condescension towards Spoon or Spoon users; I am just really confused and in need of outside opinions.
R as an ETL tool? That's a new one, but whatever floats your boat.
I would say this, though: if you can get 250 jobs and transformations down to under 1000 lines of R, your ETL is poorly written.
Along with this you have to think about supportability and scalability. Both of which I would imagine would be far easier with a graphical tool like Spoon rather than R code.
Personally, I think you are misguided and the question you ask is poorly written, but that's a different argument.
Regarding your points, PDI's logging is very good and you can log pretty much however you like, all into one large database table if you like a consolidated log.
ETLs won't be going away, even with the current love of unstructured data storage pools like HDFS. Also think about data analysis done outside R: if you want reporting or OLAP on top of your data, it will still need transforming regardless.
Is it true that more people use R vs Pentaho? What sort of question is that? By Pentaho I assume you mean PDI? How can that ever be compared? A data analysis tool vs an ETL tool, and you want to count users? Eh? If on the other hand you mean R vs Pentaho as a whole, then I would guess no. You are looking at a report on R vs Weka and making it fit your ETL argument. That doesn't wash in a month of Sundays.
==EDIT==
Okay, so you have around 1000 lines of R and Python code currently. As your bosses' requirements expand, this slowly grows over time, and because you are trying to hit deadlines, the new code is not written as cleanly or documented as well as the code you currently have in place. So over time this grows to, say, 5000 lines plus a few Python scripts. Then one day you get hit by a bus, and some new person has to come in and manage your code... where do they start? How do they make changes?
Virtually anyone with a modicum of data experience could make a change to a PDI ETL should they be required to, whereas it would take someone with enough in-depth R knowledge to make changes to what you have done.
ETL tools are designed to be quick and easy to use, and they also offer far more than R can provide in terms of data connectivity to different systems (non-DB or file-based, for example), although I guess this is why people resort to Python etc.
That said, there is room for both; there is an R plugin for PDI kicking around in the community that I've seen demonstrated.
On top of that, I've seen enough TSQL-to-ETL migrations over the years to know from experience that even though maintaining your ETL in code may seem practical in the short term, in the long term it just brings more pain.
On the other hand, if you can condense 250 PDI transformations down to 1000 lines of R, your ETL is likely bloated through bad design by your predecessor.
If you'd like me to give an opinion on your existing PDI ETL structure, that can also be arranged.
Tom
What are the relevant skills in the arsenal of a Data Scientist? With new technologies coming in every day, how does one pick and choose the essentials?
A few ideas germane to this discussion:
Knowing SQL and the use of a DB such as MySQL or PostgreSQL was great until the advent of NoSQL and non-relational databases. MongoDB, CouchDB, etc. are becoming popular for working with web-scale data.
Knowing a stats tool like R is enough for analysis, but to create applications one may need to add Java, Python, and such others to the list.
Data now comes in the form of text, URLs, and multimedia, to name a few, and there are different paradigms associated with their manipulation.
What about cluster computing, parallel computing, the cloud, Amazon EC2, Hadoop?
OLS Regression now has Artificial Neural Networks, Random Forests, and other relatively exotic machine learning/data mining algorithms for company.
Thoughts?
To quote from the intro to Hadley's PhD thesis:
"First, you get the data in a form that you can work with ... Second, you plot the data to get a feel for what is going on ... Third, you iterate between graphics and models to build a succinct quantitative summary of the data ... Finally, you look back at what you have done, and contemplate what tools you need to do better in the future."
Step 1 almost certainly involves data munging, and may involve database accessing or web scraping. Knowing people who create data is also useful. (I'm filing that under 'networking'.)
Step 2 means visualisation/plotting skills.
Step 3 means stats or modelling skills. Since that is a stupidly broad category, the ability to delegate to a modeller is also a useful skill. (A toy example of steps 2 and 3 follows below.)
The final step is mostly about soft skills like introspection and management-type skills.
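To make steps 2 and 3 concrete, here's a toy illustration using a data set that ships with R:

    # Toy illustration of steps 2 and 3 using R's built-in mtcars data.
    plot(mpg ~ wt, data = mtcars)        # step 2: get a feel for the data
    fit <- lm(mpg ~ wt, data = mtcars)   # step 3: fit a simple model...
    abline(fit)                          # ...and put it back on the graphic
    summary(fit)                         # a succinct quantitative summary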
Software skills were also mentioned in the question, and I agree that they come in very handy. Software Carpentry has a good list of all the basic software skills you should have.
Just to throw in some ideas for others to expound upon:
At some ridiculously high level of abstraction all data work involves the following steps:
Data Collection
Data Storage/Retrieval
Data Manipulation/Synthesis/Modeling
Result Reporting
Story Telling
At a minimum a data scientist should have at least some skills in each of these areas. But depending on specialty one might spend a lot more time in a limited range.
JD's points are great, and for a bit more depth on these ideas read Michael Driscoll's excellent post The Three Sexy Skills of Data Geeks:
Skill #1: Statistics (Studying)
Skill #2: Data Munging (Suffering)
Skill #3: Visualization (Story telling)
At dataist, the question is addressed in a general way with a nice Venn diagram.
JD hit it on the head: Storytelling. Although he did forget the OTHER important story: the story of why you used <insert fancy technique here>. Being able to answer that question is far and away the most important skill you can develop.
The rest is just hammers. Don't get me wrong, stuff like R is great. R is a whole bag of hammers, but the important bit is knowing how to use your hammers and whatnot to make something useful.
I think it's important to have command of a commercial database or two. In the finance world that I consult in, I often see DB2 and Oracle on big iron and SQL Server on the distributed servers. This basically means being able to read and write SQL code. You need to be able to get data out of storage and into your analytic tool.
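As a minimal sketch of that last step (getting data out of storage and into R), here is one way using the RODBC package; the DSN, credentials, and table are hypothetical.

    # Minimal sketch: pull rows from a SQL Server database into R via RODBC.
    # The DSN, credentials, and table name are hypothetical.
    library(RODBC)

    ch <- odbcConnect("FinanceDSN", uid = "analyst", pwd = "...")
    trades <- sqlQuery(ch, "SELECT trade_date, ticker, qty, price
                            FROM dbo.trades
                            WHERE trade_date >= '2011-01-01'")
    odbcClose(ch)

    head(trades)  # now it's an ordinary data frame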
In terms of analytical tools, I believe R is increasingly important. I also think it's very advantageous to know how to use at least one other stat package as well. That could be SAS or SPSS... it really depends on the company or client that you are working for and what they expect.
Finally, you can have an incredible grasp of all these packages and still not be very valuable. It's extremely important to have a fair amount of subject matter expertise in a specific field and be able to communicate to relevant users and managers what the issues are surrounding your analysis as well as your findings.
Matrix algebra is my top pick
The ability to collaborate.
Great science, in almost any discipline, is rarely done by individuals these days.
There are several computer science topics that are useful for data scientists, many of them have been mentioned: distributed computing, operating systems, and databases.
Analysis of algorithms, that is, understanding the time and space requirements of a computation, is the single most important computer science topic for data scientists. It's useful for implementing efficient code, from statistical learning methods to data collection, and for determining your computational needs, such as how much RAM or how many Hadoop nodes you need.
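As a back-of-the-envelope example of determining computational needs: a dense numeric matrix of doubles costs 8 bytes per cell, so you can estimate the RAM bill before loading anything.

    # Back-of-the-envelope RAM estimate: doubles are 8 bytes each.
    rows <- 1e6; cols <- 100
    rows * cols * 8 / 2^30    # ~0.75 GiB, before any copies are made

    # Cross-check against R's own accounting on a smaller object:
    m <- matrix(0, nrow = 1e4, ncol = 100)
    object.size(m)            # about 8 MB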
Patience - both for getting results out in a reasonable time frame and for being able to go back and change them for what was 'actually' required.
Study Linear Algebra with MIT OpenCourseWare 18.06 and supplement your study with the book "Introduction to Linear Algebra". Linear algebra is one of the essential skill sets in data analytics, in addition to the skills mentioned above.
Statistical analysis/programming is writing code. Whether descriptive or inferential, you write code to import data, clean it, analyse it, and compile a report.
Analyzing the data can involve many twists and turns of statistical procedures, and many angles from which you look at your data. At the end, you have many files, with many lines of code, performing tasks on your data. Some of it is reusable, and you encapsulate it as a "good to have" function.
This process of "statistical analysis" feels to me like "programming". But I am not sure it feels the same to everyone.
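To make that concrete, here is a minimal sketch of the kind of script I mean; the file and column names are hypothetical.

    # Minimal sketch of the import -> clean -> analyse -> report cycle.
    # The file and column names are hypothetical.
    dat <- read.csv("survey.csv", stringsAsFactors = FALSE)      # import

    dat <- dat[!is.na(dat$score), ]                              # clean
    dat$group <- factor(dat$group)

    fit <- t.test(score ~ group, data = dat)                     # analyse

    cat("p-value for the group difference:",                     # report
        format.pval(fit$p.value), "\n")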
From the Wikipedia article on Software development:
The term software development is often used to refer to the activity of computer programming, which is the process of writing and maintaining the source code, whereas the broader sense of the term includes all that is involved between the conception of the desired software through to the final manifestation of the software. Therefore, software development may include research, new development, modification, reuse, re-engineering, maintenance, or any other activities that result in software products. For larger software systems, usually developed by a team of people, some form of process is typically followed to guide the stages of production of the software.
According to this simplistic definition (and my humble opinion), this sounds very much like building a statistical analysis. But I imagine it is not that simple.
Which leads me to my question: what differences can you outline between the two activities?
It can be in terms of the technical aspects, the different strategies or work styles, and whatever else you think is relevant.
This question came to me from the following threads:
How do you combine "Revision Control" with "Workflow" for R?
How to organize large R programs?
Workflow for statistical analysis and report writing
As I said in my response to your other question, what you're describing is programming. So the short answer is: there is no difference. The slightly longer answer is that statistical and scientific computing should require even more controls around development than other programming.
A certain percentage of statistical analysis can be done using Excel, or in a point-and-click approach using SPSS, SAS, Matlab, or S-Plus (for instance). A more sophisticated analysis done using one of those programs (or R) that involves programming is clearly a form of software development. And this kind of statistical computing can benefit immensely from following all the best practices from software development: source control, documentation, a project plan, scope document, bug tracking/change control, etc.
Moreover, there are different kinds of statistical analyses that can follow different approaches, as with any programming project:
Exploratory data analysis should follow an iterative methodology, like the Agile methodology. In this case, when you don't know the steps involved explicitly up front, it's critical to use a development methodology that is adaptive and self-reflective.
A more routine kind of analysis (e.g. a government annual survey such as the Census) could follow a more traditional methodology such as the waterfall approach, since it would follow a very clear set of steps that are mostly known in advance.
I would suggest that any statistician would benefit from reading a book like "Code Complete" (look at the other top books in this post): the more organized you are with your analysis, the greater the likelihood of success.
Statistical analysis in some sense requires even more good practices around version control and documentation than other programming. If your program is just serving some business need, then the algorithm or software used is really of secondary importance so long as the program functions the way the specifications require. On the other hand, with scientific and statistical computing, accuracy and reproducibility are paramount. This is one of John Chambers' (the creator of the S language) major emphases in "Software for Data Analysis". That is another reason to add literate programming (e.g. with Sweave) as an important tool in the statistician's toolkit.
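For a flavour of what that looks like, here is a fragment of a hypothetical Sweave (.Rnw) file: LaTeX prose with executable R chunks woven in.

    % Fragment of a hypothetical Sweave (.Rnw) file.
    The fitted slope is \Sexpr{round(coef(fit)[2], 2)} mpg per 1000 lbs.

    <<model, echo=TRUE, fig=TRUE>>=
    fit <- lm(mpg ~ wt, data = mtcars)
    plot(mpg ~ wt, data = mtcars)
    abline(fit)
    @

Because the numbers and figures are regenerated every time the document is built, the report can never silently drift out of sync with the analysis.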
Perhaps the common denominator is "problem solving."
Beyond that, I doubt I could provide much insight, but I can at least provide a limited answer from personal experience.
This issue arises for us in hiring: do we hire a programmer and teach them statistics, or do we hire a statistics person and teach them to program? Ideally we could find someone fluent in both disciplines, and indeed that's the third net we cast, but rarely with any success.
Here's an example. The most stable distinction between the two activities (software dev and statistical analysis) is probably their respective outputs, or project deliverables. For instance, in my group someone is conducting the statistical analysis of the results of our split-path and factorial experiments (e.g., from the t-test results, whether the difference is significant, or whether the test ought to continue). That analysis will be sent to the marketing department, which will use it to modify the web pages comprising the Site with a view towards improving conversion. A second task involves the abstraction and partial automation of those analyses so the results can be processed in near-real time.
For the first task, we'll assign a statistician; for the second, a programmer. The business problem we are trying to solve is the same for both tasks, yet for the first the crux is statistics, while for the second the statistics problems have been largely solved and the crux is a core programming task (I/O).
Notice also how the tools associated with the two activities have evolved so that the distinction between them (software dev and data analysis) is further blurred: mainstream development languages are being adapted for use as domain-specific analytical tools, while at the same time frameworks continue to be developed that enable non-developers to quickly build lightweight, task-oriented applications in DSLs.
For instance, Python, a general-purpose development language, has R bindings (RPy2) which, along with its native interactive interpreter (IDLE), substantially facilitate Python's use in statistical analysis. At the same time, there is a clear trend in R package development toward (web) application development: R Bindings for Qt, gWidgetsWWW, and RApache are all R packages directed at client or web app development, and their initial releases were (I think) within the past 18 months. Aside from that, since at least the last quarter of last year, I've noticed an accelerating frequency of blog posts, presentations, etc. on the subject of web app development in R.
Finally, I wonder if your question is perhaps evidence of the growing popularity of R. Here's what I mean. A decade ago, when my employer purchased a site license, I began learning and using one of the major statistical computing products (no point here in saying which one; it begins with "S"). I found it unnatural and inflexible. Unlike Perl (which I was using at the time), this tool was not an extension of my brain (which isn't an optional attribute of an analytical tool; to me it's more or less the definition of one). Interacting with this system was more like using a vending machine: I selected some statistical function I wanted and then waited for the "output", which was often an impressive set of high-impact, full-color charts and tables. Nearly always, though, what I wanted was to modify my input or use that output for the next analytical step, and that seemed to require another, separate trip to the vending machine. The fact that this tool was context-aware (i.e., it knew statistics) while Perl wasn't didn't compensate for the awkward interaction. Statistical analysis done this way would never be confused with software development. (Again, this is just a summary of my own experience; I don't claim it can be abstracted. It's also not a polemic against any (or all) commercial data analysis platforms; millions use them and they've earned zillions for the people who created them, so let's assume it was my own limitations that caused the failure to bond.)
I had never heard of R until about 18 months ago, and I only discovered it while scanning PyPI (the web interface to Python's external package repository) for statistics libraries for Python. There I came across RPy, which seemed brilliant but required a dependency called "R" (RPy, of course, is really just a set of Python bindings to R).
Perhaps R appeals to programmers and non-programmers equally; still, for a programmer/analyst, this was a godsend. It hit everything on my wish list for a data analysis platform: an engine based on a full-featured general programming language (in this case a proven Scheme descendant), an underlying functional paradigm, a built-in interactive interpreter, native data types built from the ground up for data analysis, and the domain knowledge baked in. Data analysis became more like coding. Life was good.
If you are using R, then you'll likely be writing code to solve your statistical questions, so in this sense, statistical analysis is a subset of programming.
On the other hand, there are plenty of SPSS users who have never ventured beyond a bit of pointing and clicking to solve their stats problems. This feels less like programming to me.
In my work experience, most fresh-out-of-school programmers are set straight to creating reports for 6-12 months or so. While I see the benefit of having them do something non-crucial, it seems to really discourage them.
So my question is: should organizations let newbies work with someone experienced right off the bat, obviously on non-critical phases of a project, to get a real feel for what their career choice has in store, or throw them on reports out of the gate?
Ah, there really is nothing like exploiting interns for remedial jobs...
Seriously though, you get back what you put in. Forcing them to do a thoughtless, thankless job for a long period of time is a quick way to build up a useless team member.
Perhaps they should be looking for a job at different companies? Maybe they shouldn't settle?
I was once a fresh-grad, and I have never been asked to work on a report. I had a programming check-in within the first 5 days of my job.
Maybe I am confused by the question. We are talking about folks who apply for programming positions and are sent to do "reports"-related jobs?!
I didn't start in "reports". I started on a conversion: just get stuff to run on the new platform. Relatively safe, minor programming changes.
Then I did some new development for a while.
Then another conversion.
Then, 2 years into my career, no longer a complete n00b, I wound up in "Reports". They wanted something like a dozen dumb-as-dirt accounting reports. Each was "pull from the general ledger", "do some quick math", and "write a columnar report". [It was 1980; that's how stuff was done.]
I couldn't stand to do copy-and-paste programming. So I wrote a thing that extracted from the ledger into an array of values. It used a flexible notation for doing calculations on values in that array, then it wrote out the results of the calculations.
It could add, subtract, multiply and divide. You could use multiple operations on a series of "cells" to compute wonderfully complex things. To a limit.
I had invented the spreadsheet, built as a COBOL batch program. Seriously. That's what putting someone on reports can lead to. A single program that produced the dozen dumb-as-dirt financial reports. And a large number of additional reports, too.
Bonus. It was built in an Agile, incremental fashion. The first version did a half-dozen of the really easy reports. The next one did two or three more.
I don't think "reports" is a bad gig. What's bad is forcing people to copy and paste yet another dumb-as-dirt report program from a cookie-cutter template.
I believe it to be beneficial. It's what happened to me long ago and it provided me an opportunity to learn the database schema, the domain, and how the data is being used.
But, if they were hired as a Software Engineer they shouldn't be a report writer indefinitely. Programmer/Analyst however...
It's beneficial to the company in the short run, because then you can get useful work out of new graduates. It's harmful to everyone in the long run, because creating reports isn't really that hard, so the newbies don't learn much from doing it.
That being said, 6-12 months is a really long time to stick anybody on doing reports (unless they enjoy it, which most people don't). Maybe a shorter time period would be better training for a new employee.
I've worked in shops that threw a lot at the new hires, where the results were mixed, and I've worked at shops that assigned pointless monkey-business exercises such as writing reports that nobody would read, attending 'process' meetings, and open-ended tasks like "read a book about C++" or "learn something about this technology or that one". Both of these approaches were a waste of effort and time.
At my shop, if you are the new guy, you aren't going to get left to your own devices to figure out X or to create busy work for yourself. Typically, we'll run you through our products so you are familiar with them as a user, then we'll talk through whatever task it is we need you to do, do the "I'm right over here, tell me if you need assistance" thing, and then check up on you during the morning "what are you working on?" meeting. The goal at my shop is to get a developer up to speed as quickly as possible without skipping over the important stuff.
I think the key to successfully developing a new employee, particularly one right out of school, is to challenge them and provide them with interesting tasks that make them not dread coming to work. If you get them interested in the work, you get an employee who becomes valuable. There are some tasks that just aren't interesting, and we all do them at my place. For me, I dread getting anywhere near MS Word to write formal documentation, but that comes with the territory sometimes. The 'new guy' needs to realize it won't always be code slinging or new development. Sometimes it is maintenance coding - much of the time it is. Sometimes it is 'turn the crank' type work. Sometimes it is report writing.
A good manager or senior developer will mentor the new hire. If a shop doesn't do that, I'd probably not want to work there myself.
They should be pair programming (or spectator programming) with different people from their department for a few weeks. Then they get to know all the people, the structure, the code and useful tips.
Reports are a wonderful introduction.
They tend to have very specific specifications, unlike many other projects. They're a good "stand alone" task. They also give the developer a good introduction to the domain model, which they must use to actually get the data out for the report.
Finally, they're (typically) reasonably simple with some reporting framework doing most of the heavy lifting for them. So they need to focus on learning the tools of the trade, deployment, and the data model.
They're a nice gradual introduction to the larger domain and application.
I've never been put on a non-important job as a safety function. Even when I didn't know exactly what I was doing I got put on important projects people wanted yesterday, and then paired with someone who had specific development he/she wanted to offload onto the new-hire.
It works pretty well that way.
If you put a college grad on report-writing duty for a long time, he's going to bail on you. Bad management and a waste of money...
I have two contrasting experiences with Crystal Reports in two different companies:
With my first employer (fresh out of University), our Crystal Reports expert was leaving, so I was asked to take over the role. No actual training was provided, so I had to learn everything on-the-job, with no support from either the Vendor or the Employer. Although my position description was as an IT Developer, I eventually spent 100% of my time working on Crystal Reports. It was an unproductive experience for me, and a waste of manpower and resources.
My current employer asked me to assist another developer in creating and maintaining their Crystal Reports setup. Because they provided adequate training, and I was mentored in the role, I gained knowledge of multiple systems and databases. I even got a little experience administering and maintaining SQL Server. And I also got the chance to interact with many different clients in the company, as many different sections of the company needed these reports.
So my answer to the original question is that it really depends on the organization, rather than the central concept. If your employer is intending to use it as a way of familiarizing new employees with multiple systems, then I think it's a great idea. If it's just a short-term way of foisting a thankless (and rotten) job on a hapless new employee, then I think it's a waste of manpower and resources.
The good thing about reports is that they don't update information, so there's no chance that any data will be lost.
It also depends on what tools are used for reporting. When I did reporting, I learned tons about SQL and stored procedures. Of course, that is probably not the norm for reporting.
It depends on the report, and it depends on the job. Many reports are anything but trivial, and excellent SQL skills are needed to create a performant and properly maintainable back-end. If your newbies are good with SQL, let them cut their teeth on the queries. It will be a good way for them to learn the schema of your database.
However, if "putting them on reports" is just a euphemism for leaving them to try in vain for hours, without direction or inspiration, to format a table in Crystal Reports 25 (or whatever the current version is), well, I think you probably already know my answer to that question...