What programming language is appropriate for a data classification project? - information-retrieval

I would like to implement a data classification project easily, so I'm looking for a language that provides libraries for it. Could you suggest a suitable language?

MATLAB is not exactly a programming language, but it is no doubt the easiest way to implement math-oriented programs. It has lots of toolboxes for classification (e.g. MLP, SVM) as well as optimization toolboxes.

There is a Python library called SciPy that has lots of tools for scientific programming, and people have used it to do data classification. Some bioinformatics people have built Excel2SVM in Python.
If the focus of your work is on the data classification, not on developing software, then Python is a good choice because you can be more productive than with languages like Java or C++.
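To give a feel for how little code a basic classifier needs in Python, here is a minimal sketch using only the standard library. The choice of k-nearest-neighbours, the function name `knn_classify`, and the toy data are all assumptions made for illustration; none of them comes from SciPy or from the answer above.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
training_data = [
    ((1.0, 1.1), "small"),
    ((0.9, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.2), "large"),
    ((5.1, 4.9), "large"),
    ((4.8, 5.0), "large"),
]

print(knn_classify(training_data, (1.0, 0.9)))
print(knn_classify(training_data, (5.0, 5.0)))
```

For real projects a dedicated library would likely be a better fit than hand-rolled code; the point here is only how compact the idea is.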

I'd say you really need more information before choosing a language.
Where are you getting the data from, and what front end do you want to use (web or dedicated client)?
C# could do just as good a job, as could any object-oriented language.
Cheers

(A little late coming, but I thought this answer should be here for the record).
WEKA and MALLET are two useful libraries for data classification that I've come across. I've used WEKA in a couple of projects and can say that it is pretty mature. Both these libraries are Java-based.

Related

A good framework for Image(DICOM) data manipulation, visualization and development

I need to start a project that deals with DICOM data manipulation and visualization. As part of some basic research, I have found a few toolkits, such as ITK and VTK, that perform data manipulation on medical image data. My question is: would ITK+VTK+Qt be the better choice for DICOM image manipulation (segmentation and registration), or would ITK with OpenCV be a better option? Or are there alternative toolkits that would meet my requirements?
Any suggestions, sources or links related to this topic would be very helpful.
There are many toolkits and frameworks that work with DICOM. It depends on what you want to do.
In many cases, the easiest thing to do is to build a plugin for an existing application/toolkit such as Horos, 3D Slicer, ImageJ, MITK, MeVisLab, ITK-SNAP, etc. I'm sure there are dozens more.
If you want to build your own medical imaging application, most of the above are open-source; adapting one of these would save you a lot of grief (and probably years) compared to trying to write your own application from scratch.
If your main interest is in developing algorithms, then Python is a good prototyping language - consider packages such as numpy, scipy, pydicom, ITK, SimpleITK. Java has dcm4chee. C++ has Qt, ITK, VTK.
If you want to do something JavaScript-based that will work through web browsers on tablets etc., look at in-progress projects such as OHIF Viewer or Cornerstone.
One other thing: a) dealing with DICOM data, b) manipulation and c) visualisation are three different things. It's easy to convert your DICOM data to, say, NIfTI format, which opens up a lot of academic analysis tools. Similarly, there are many 2D and 3D visualisation libraries that are not specific to DICOM - it's just about converting data into the right form.

How much is Eclipse EMF related to the OMG MDA standard?

I am looking for a new MDA tool to try out for modelling and code generation. This is not for any work-related project yet, but for testing purposes. I have only used the Merode approach until now (using jMermaid for modelling and the accompanying code generator) but want to try out something new.
Since EMF is integrated in Eclipse I see a lot of positive reasons to try it out. But after reading some documentation and online articles, I wonder how much it adopts the OMG MDA standards and how much it doesn't.
For example I found the following text
If, on the other hand, you have already bought into the idea of modeling, and even the Model Driven Architecture (MDA) big picture, you should think of EMF as a technology that is moving in that direction, but more slowly than immediate widespread adoption. You can think of EMF as MDA on training wheels.
on http://www.informit.com/articles/article.aspx?p=1323360&seqNum=2
But nowhere can I find a concise list of which points of the OMG standard are implemented and which ones are left out or interpreted differently. Can anyone help out with that?
(And if there are other, more recommended tools, I'm always open to suggestions.)
There is very little relation. EMF is a framework to create (meta)models with very basic code-generation capabilities (basically only a direct translation to Java). EMF's goal is not to be an MDA framework but to be the building block on top of which other tools may build more sophisticated solutions (e.g. check the open source Eclipse Acceleo tool).
And MDA is just a philosophy; in itself it is not even a specific method. The MDA Guide, the OMG standard document explaining MDA, is just a set of principles for model-driven development using OMG technologies and does not go further than that (if needed, you may want to check the difference between all these MD* acronyms).
So, you can find EMF-based tools that follow MDA principles but EMF as such does not pretend to do so.
In the EMF FAQ there is a question, "What is the relationship of EMF to OMG MDA?", whose answer states:
"Essentially EMF supports the key MDA concept of using models as input
to development and integration tools which produce multiple
programming language (Java in the case of Eclipse EMF itself) or data
interchange format (XML) representations."
EMF corresponds to a simplified implementation of OMG's MOF (http://www.omg.org/mof/), providing facilities to express custom metamodels and generate Java components to instantiate models.
MDA is a particular model-driven philosophy, based on several kinds of models (CIM, PIM, PSM...), and aiming to provide a way to target several technical architectures (PSM) from a single functional model (PIM).
You can use EMF for any model-driven philosophy MBE, MDE, MDD, or MDA. It is the fundamental building block that allows you to define your own metamodels and models. Simply said, EMF provides models, and you can use it for any model-driven approach, including MDA.

What is a good language to develop in for simple, yet customizable math programs?

I'm writing to ask for some guidance on choosing a language and course of action in learning programming. I apologize if this type of question is inappropriate for Cross Validated, please advise me to another forum if that is the case.
I've seen thread after thread with questions from newbies, asking, "What is the best language to start with?" and then it always starts a flame war or someone just answers, "There's no best language, it's best to pick one and start learning it." My question is a little bit more focused than that.
First off, I've been programming my whole life, in very limited capacities. My deepest training was in C++. Whilst in my EECS degree program, I resolved to never be a software developer because I couldn't stand not interacting with people for such long periods of time. Instead I realized I wanted to be a math teacher, and so that is the path I have taken.
But now that I'm well down that path, I've started to realize that perhaps I could develop my own software to help me in the classroom. If I want to demonstrate the Euclidean algorithm, what better way than to have a piece of software that breaks down the process? Students could run that software as part of their studies, and the advanced students might even develop programs for themselves. Or, with an iPad in hand, why not have an app that lets students take their own attendance? It would certainly streamline some of the needs of classroom management.
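As a rough idea of what "breaking down the process" of the Euclidean algorithm could look like, here is a minimal Python sketch that records every division step so it can be shown to students. The function name `euclid_steps` and the example numbers are assumptions chosen for illustration, and the sketch assumes positive integers.

```python
def euclid_steps(a, b):
    """Return (gcd, steps), where each step is (a, b, quotient, remainder)."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)          # a = q * b + r
        steps.append((a, b, q, r))
        a, b = b, r                  # continue with the smaller pair
    return a, steps

gcd, steps = euclid_steps(1071, 462)
for a, b, q, r in steps:
    print(f"{a} = {q} * {b} + {r}")
print("gcd =", gcd)
```

Each printed line is one division with remainder; the last non-zero remainder is the greatest common divisor.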
There's obviously a lot of great stuff already out there for math, and for education, but I want a way to more directly create things specific to my lectures. If I'm teaching a specific way of calculating a percent, I want to create an app that aligns with my teaching style, not just another calculator app that requires the student to learn twice.
The most I use in class right now is iWork Numbers/Microsoft Excel for my stats class. Students can learn the basic statistical functions, and turn some of their data into graphs.
I have dabbled a bit with R, and used Maple in college. I've started the basic tutorials for OS X/iOS development and have actually made good progress making an OS X app that takes a text string, converts it to numbers, and performs encryption using modular addition and multiplication. I sometimes use Wolfram|Alpha to save myself some time in getting quick solutions to equations or base conversions. I know of MATLAB and Mathematica, and recently people have been telling me to check into Python or Ruby. I also know basic HTML and, while it's forgotten now, learned JavaScript and Perl in college.
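For readers wondering what such an encryption exercise might look like outside of Cocoa, here is a minimal Python sketch. It assumes the scheme is an affine cipher over the 26 letters A-Z, x -> (a*x + b) mod 26, which is one common way to combine modular multiplication and addition; the post does not say which scheme the app actually uses, and the names `encrypt` and `decrypt` are hypothetical.

```python
ALPHABET_SIZE = 26

def encrypt(text, a, b):
    """Map each letter x (A=0 ... Z=25) to (a*x + b) mod 26; keep other characters."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            x = ord(ch) - ord("A")
            out.append(chr((a * x + b) % ALPHABET_SIZE + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def decrypt(text, a, b):
    """Invert encrypt(); requires gcd(a, 26) == 1 so that a has a modular inverse."""
    a_inv = pow(a, -1, ALPHABET_SIZE)  # raises ValueError if a is not invertible
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            y = ord(ch) - ord("A")
            out.append(chr(a_inv * (y - b) % ALPHABET_SIZE + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

secret = encrypt("Affine cipher", 5, 8)
print(secret)
print(decrypt(secret, 5, 8))
```

The three-argument `pow` with a negative exponent needs Python 3.8 or later.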
If I keep on the path of Obj-C/Cocoa, I think it will have great benefits. Unfortunately, anything I produced for Mac would only be usable on a Mac, so it wouldn't be universal for all of my students. Perhaps then learning a web language would be better. Second, I'm wondering if the primary use is mathematical, then perhaps my time would be better spent learning Mathematica Programming Language, or R, or something based less on GUI and more on simple coding of algorithms, maybe Python or Ruby?
It seems that Mathematica already has a lot of demos for different math concepts, so why reinvent the wheel is also a question I have. I think overall, it would be good to have more control and design things the way I need. And then, if I do want to make an "Attendance" app or something else, I would already have the programming experience to more easily design something for my iPad or MacBook.
The related question to this is what is a good language to teach to my students? In his TED talk, Conrad Wolfram says one of the best ways to check the understanding of a student is have them write a program. But if Mathematica does the math virtually automatically for them, then I'm not sure that will get the deeper experience of working out logic for themselves, like you do when you're writing C, or a traditional procedural language.
I know that programming takes time to learn, but I also know that at this point, my goal is not to be able to make an app like "Tiny Wings." With the app store ease, some of my work may be an extra revenue stream, but I see myself as more of a hobbyist, and now teacher looking to software development specifically for its ability to help me demonstrate mathematical concepts.
I think I will push ahead with Obj-C/Cocoa for OS X/iOS, but if anyone has some better guidance regarding all of the other available stuff, it would be much appreciated. I don't think I would want to go fully to the web (I like apps), but perhaps someone could suggest a nice way of bridging what I produce in Xcode to a universal web version. For example, if you come up with an algorithm in Obj-C, is it easiest to transition that to Ruby and run it online, or is there another approach that works better?
Mathematica is pretty awesome for the first part of your question. I've used the interactive mode (Manipulate[]) for explaining things to my colleagues (and myself). It makes really nice dynamic figures and is fairly expressive (although your code can end up looking like line noise). It is very powerful, but it does far less for you than you might think. It's pretty intuitive, which is a good thing for teaching.
You could use Scala if you want an "easy" way to make a domain specific language for teaching. Python seems to confuse people as a first programming language. Objective C seems like a completely random choice to me.
Mathematica then. It's worth the price. But anything that is interpreted and has an interactive shell is probably better than a compiled language. BBC BASIC?
Nothing beats Haskell for general-purpose mathematical programming. The wiki's quite extensive and the IRC channel (#haskell on Freenode) is great for asking questions. If you statically link your binaries on compilation, you should be able to run your programs on just about any system (with a few exceptions, e.g., libgmp).
Haskell code reads (roughly) like mathematical notation once you get the hang of it, so it can really help to tie things together for your students who are motivated to write their own programs. The purely functional style can be beneficial, as well, since it focuses less on I/O and the marshalling of data (perfectly useful in applications, perhaps less so in pure math), and more on the actual creation and refinement of functions and algorithms. You can even compose functions just as you would on paper.
If you want to get really serious, you could also look into Coq or Agda, but those might be a bit much for most classes.
For a Haskell program idea for an educator, check out this link.
A nice list of arguments can also be found at:
Eleven Reasons to use Haskell as a Mathematician and the book The Haskell Road to Logic, Maths and Programming

What kinds of projects use functional programming?

I have some free time and would like to try functional programming and learn a functional programming language.
But as we know, the best way to learn theory is practice. With that in mind, I would like to know in which sectors functional programming is used most often. I assume that if a project is written in a functional language, the choice is somehow justified. Hence my question: what kinds of projects are easier and more profitable to write in functional languages?
Thank you
Compilers are often referred to as the "killer app" for functional languages with algebraic data types, like Haskell and ML. I have written compilers in a procedural language, in an object oriented language, and in functional languages, and a functional language is worlds better.
A compiler is also a relatively attractive project in that you can pick up, say, Andrew Appel's book on the used market, and build the whole thing yourself—just be sure to compile a very simple language.
Interpreters, hand-written recursive descent parsers, program analyzers.
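To make the idea concrete, here is a minimal sketch of a hand-written recursive descent parser that evaluates arithmetic expressions as it parses them. It is written in Python only for brevity (a functional language with algebraic data types would typically build a syntax tree instead), and the grammar, the class name `Parser`, and the helper `evaluate` are assumptions chosen for illustration.

```python
import re

# Grammar (one method per rule):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor
TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected character at or after position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()          # unary minus
        if token is None or token in ("+", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

def evaluate(source):
    parser = Parser(tokenize(source))
    result = parser.expr()
    if parser.peek() is not None:          # leftover input means a syntax error
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result

print(evaluate("2 * (3 + 4) - 5"))
```

Each grammar rule becomes one method that calls the methods for the rules it refers to, which is what makes the technique easy to write by hand.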
AI, data processing, scientific/financial/computationally intensive applications.
Finance, statistics, and scientific computation are the three areas where functional programming is used most heavily.
You could always throw together a simple statistics calculation package that works against one of the various social networks out there. An F# stats application against Stack Overflow would be an interesting project...

Statistical tools for programmers [closed]

Closed 10 years ago.
I'm trying to evaluate the purchase of a statistical tool. This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation. Of course, cost is an issue, but if I can build a solid case, we could probably buy a commercial package, so we're not totally limited to free options.
So far, our options are:
Statistica (which some non-programmers already know)
Matlab Statistics toolbox (programmers already use matlab)
R language (would need a UI for non-programmers)
Hack something into Excel (not fun, but that's what non-programmers do right now)
?...
What else is out there? What's the industry standard? What kind of distinctive features should I look for? What would you recommend, and why?
Ideally, we'd like a tool that can run both on Linux and Windows machines.
(I work in medical imaging, so we do both biostatistics, and software engineering statistics)
Hands down it's R. R is very programmer friendly. It has functional aspects and it's GNU.
S-PLUS and R are both based on the S language. The two are similar, and in most cases you can run an S-PLUS program in R and vice versa.
SAS is another option, although geared more towards BI and enterprise. SAS has a simpler syntax than R and in my opinion is easier to pick up for a non-programmer.
Other options include SPSS, Matlab, and even Excel.
I recommend R, personally. It's used by bioinformaticians and psychologists, I hear. Don't know what your field is though, so maybe it's a lousy choice. It is reasonably easy to use and learn.
Stata and SPSS tend to be the most commonly used packages in clinical studies. Both are pretty easy to pick up and use for non-technically minded folks but are generally flexible enough. I've used Stata more than any of the others and have been pretty happy with its options (supports both menu-based and command line operation, decent enough plugin system to get new user-created modules, good graphing support).
R is a little more daunting for newbie users, though it is popular with the biostatisticians. Since it's free, that's another nice point in its favor.
For a statistical package with a GUI which non-technical users can use, I would recommend that you go with "SAS Enterprise Guide". You will get the common and advanced SAS procedures, an excellent graphics facility and the ability to program for the technical users. I recommend that you start with the "SAS Learning Edition" (http://support.sas.com/learn/le/) which is a fully functional version of Enterprise Guide, but limited to processing 1000 rows at a time only. It is under $500, which makes it a pretty good deal.
I would look at S-Plus.
You get a strong programming environment (S-Plus Workbench, based upon the Eclipse platform), an intuitive GUI for non-programmers, and an extensive user community (including users of R, which was based upon the original S).
Visual Numerics is another option.
It sounds like you're trying to maximize multiple goals. You say "This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation", with an implicit assumption that this will be the same tool in both cases, when that might not be realistic. What's the compromise for Word and LaTeX, for example?
Some different questions about the requirements:
Should it be extensible for programmers
Able to use C extensions
Easy to make new procedures and methods
What analysis are non-programmers going to want to use?
Graphics?
Ease of use for different groups
So my read on this:
Easy to extend: R/S-plus, Matlab/Octave (I happen to prefer R, but I do more stats and fewer matrix things)
Easy to use for normal people: Excel, custom wrapped R, SPSS
Also, R on Windows has a limited GUI, which may or may not help your users.
If it were me, I'd go with a hybrid solution. Use R, and give non-programmers a cheat sheet that illustrates common tasks, or even better, write some wrapper functions with names like "image_summary" that automate their exploratory work.
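The wrapper-function idea can be sketched as follows. The answer is about R; this illustration uses Python's standard `statistics` module purely to show the shape of such a wrapper, and the statistics chosen, the return format, and the sample measurements are all assumptions.

```python
import statistics

def image_summary(values):
    """Return common descriptive statistics for a sequence of numeric measurements."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two measurements")
    return {
        "n": len(data),
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

# Hypothetical measurements, e.g. one value per image.
summary = image_summary([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
for name, value in summary.items():
    print(f"{name}: {value}")
```

A non-programmer then only needs to remember one function name, while programmers remain free to extend what the wrapper computes.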
For writing front end scripts for R, the RPy python wrappers might help as well.
SAS Enterprise Guide has good usability for non-programmers. Also, it has good options to connect to Excel. And for programmers, it's the most robust option out there. The SAS server runs on anything, though Enterprise Guide is Windows only.
Consider Excel one more time. It is well known and widely available. Refer to this book or this book.
This Wikipedia page compares the features available for several statistical packages, as well as their OS compatibility and pricing info (which seems a little out of date, but it gives an overall idea)
We ended up getting the Matlab Statistics toolbox (mainly because we already have some experience with Matlab in the team, and needed the tool anyway).
So far, it's doing what we need it to do, and it's easily extensible. Usage will show if non-programmers really use it, but so far it's looking good.
