tidy/efficient function writing (garbage collection) in R - r

Excuse my ignorance, as I'm not a computer engineer but with roots in biology. I have become a great fan of pre-allocating objects (kudos to SO and R inferno by Patrick Burns) and would like to improve my coding habits. In lieu of this fact, I've been thinking about writing more efficient functions and have the following question.
Is there any benefits in removing variables that will be overwritten at the start of the next loop, or is this just a waste of time? For the sake of argument, let's assume that the size of old and new variables is very similar or identical.

I think it will really depend on the specifics of the case. In some circumstances, when the object is large it might be a good idea to rm() it, especially if it is not needed and there are lots of other things to do before it gets overwritten. But then again, it's not impossible to conceive of circumstance were that strategy might be costly in terms of computation time.
The only way to know whether it would really be worthwhile is to try both ways and check with system.time().

No. Automatic garbage collection will take care of this just fine.

Related

How to test Math-related units?

I'm giving TDD a serious try today and have found it really helpful, in line with all the praise it receives.
In my quest for exercises on which to learn both Python and TDD i have started to code SPOJ exercises using the TDD technique and i have arrived at a question:
Given that all of SPOJ's exercises are mostly math applied to programming; How does one test a math procedure as in the TDD Fashion? Sample known-to-be-correct data? Test against a known implementation?
I have found that using the sample data given in the problem itself is valuable but it feels overkill for something you can test so quickly using the console, not to mention the overhead to design your algorithms in a testable fashion (Proxying the stdout and stdin objects is nothing short of too much work for a really small reward), and while it is good because it forces you to think your solutions in testable terms i think i might be trying way too hard on this.
All guidance is welcome
test all edge cases. Your program is most likely to fail when the input is special for some reason: negative, or zero values, very large values, inputs in reverse order, empty inputs. You might also want some destructive tests to see how large the inputs can be before things break or grind to a halt.
Sphere online Judge may not be the best fit of TDD. For one the input data might be better behaved than what a real person might put in. Secondly there is a code size limit on some problems. An extensive test-suite might put you over that limit.
You might want to take a look at Uncle Bob's "Transformation Priority Premise." It offers some guidance on how to pick a sequence of tests to test drive an algorithm.
Use sample inputs for which you know the results (outputs). Use equivalence partitioning to identify a suitable set of test cases. With maths code you might find that you can not implement as incrementally as for other code: you might need several test cases for each incremental improvement. By that I mean that non maths code can typically be thought of as having a set of "features" and you can implement one feature at a time, but maths code is not much like that.

Why is OCaml still reasonable fast while constantly creating new things?

OCaml is functional, so in many cases, all the data are immutable, which means it constantly creates new data, or copying data to new memory, etc.
However, it has the reputation of being fast.
Quite a number of talks about OCaml always say although it constantly creates new things, it is still fast. But I can't find anywhere explaining why.
Can someone summarise why it is fast even with functional way?
Also, you should know that copies are not made nearly as often as you might think. Only the changed part of an immutable data structure has to be updated. For example, say you have an immutable set x. You then define y to be x with one additional item in it. The set y will share most of its underlying representation with x even though semantically x and y are completely different sets. The usual reference for this is Okasaki's Purely Functional Data Structures.
I think the essence, as Jerry101 points out, is that you can make GC a lot faster if it's known to be working in an environment where virtually all objects are immutable and short-lived. You can use a generational collector, and virtually none of the objects make it out of the first generation. This is surprisingly fast.
OCaml has mutable values as well. For some cases (I would expect they are rare in practice) you could find that using mutability makes your code slower because GC is tuned for immutability.
OCaml doesn't have concurrent GC. That's something that would be great to see.
Another angle on this is that the OCaml implementors (Xavier Leroy et al) are excellent :-)
The Real World OCaml book seems to have a good description of GC in OCaml. Here's a link I found: https://realworldocaml.org/v1/en/html/understanding-the-garbage-collector.html
I'm not familiar with OCaml but here's some general background on programming language VM (including garbage collection) speed.
One aspect is to dig into the claims -- "fast" compared to what?
In one comparison, the summary is "D is slower than C++ but D programs are faster than C++ programs." The micro-benchmarks in D are slower but it's easier to see the big picture while programming and thus use better algorithms, avoid duplicate work, and not have to work around C++ rough edges.
Another aspect is to realize that modern garbage collectors are quite fast, that concurrent garbage collectors can do most of the work in another thread thus making use of multiple CPU cores in a way that saves time for the "mutator" threads, and that memory allocation in a GC environment is faster than C's malloc() because it doesn't have to search for a suitable memory gap.
Also functional languages are easy to parallelize.

When is it best to change code to match standards?

I have recently been put in charge of debugging two different programs which will eventually need to share an XML parsing script, at the minimum. One was written with PureMVC, and another was built from scratch. While it made sence, originally, to write the one from scratch (it saved a good deal of memory, but the memory problems have since been resolved).
Porting the non-PureMVC application will take a good deal of time and effort which does not need to be used, but it will make documentation and code-sharing easier. It will also lower the overall learning curve. With that in mind:
1. What should be taken into account when considering whether it is best to move things to one standard?
(On a related note)
Some of the code is a little odd. Because the interpreting App had to convert commands from one syntax to another, it made sense to have an interpreter Object. Because there needed to be communication with the external environment, it made more sense to have one object interact with the environment, and for that to deal with the interpreter exclusively.
Effectively, an anti-Singleton was created. The object would only interface with the interpreter, and that's it. If a member of another class were to try to call one of its public methods, the object would raise an Exception.
There are better ways to accomplish this, but it is definitely a bit odd. There are more standard means of accomplishing the same thing, though they often involve the creation of classes or class files which are extraordinarily large. The only solution which I could find that was standards compliant would involve as much commenting and explanation as is currently required, if not more. Considering this:
2. If some code is quirky, but effective, is it better to change it to make it less quirky, even if it is made a more unwieldy?
In my opinion this type of refactoring is often not considered in schedules and can only be done when there is extra time.
More often than not, the criterion for shipping code is if it works, not necessarily if it's the best possible code solution.
So in answer to your question, I try and refactor when I have time to do so. Priority One still remains to produce a functional piece of code.
Things to take into account:
Does it work as-is?
As Galwegian notes, this is the only criterion in many shops. However, IMO just as important is:
How skilled are the programmers who are going to maintain it? Have they ever encountered nonstandard code? Compare the cost of their time to learn it (including the cost of delayed dot releases) to the cost of your time to refactor it.
If you're maintaining it, then instead consider:
How much time will dealing with the nonstandard code cost you over the intended lifecycle of the codebase (e.g., the time between now and when the whole thing is rewritten)?
That's hard to guess, but consider that many codebases FAR outlive the usefulness envisioned by their original authors. (Y2K anyone?) I've gradually developed a sense of when a refactoring is worthwhile and when it's not, mostly by erring on the side of "not" too often and regretting it later.
Only change it if you need to be making changes anyway. But less quirky is always a good goal. Most of the time spent on a particular piece of software is in maintenance, so if you can do something to make that easier, you'll be reducing the overall time spent on that piece of code. Nonetheless, don't change something if it's working and doesn't need any modifications.
If you have time, now. If you don't have time and it can be avoided, later.

How do I get my brain moving in "lisp mode?"

My professor told us that we could choose a programming language for our next programming assignment. I've been meaning to try out a functional language, so I figured I'd try out clojure. The problem is that I understand the syntax and understand the basic concepts, but I'm having problems getting everything to "click" in my head. Does anyone have any advice? Or am I maybe picking the wrong language to start functional programming with?
It's a little like riding a bike, it just takes practice. Try solving some problems with it, maybe ProjectEuler and eventually it'll click.
Someone mentioned the book "The Little Schemer" and this is a pretty good read. Although it targets Scheme the actual problems will be worth working through.
Good luck!
Well, for me, I encountered the same problem as you do in the beginning when I started doing OCAML, but the trick is that you must start thinking about what you want from the code and not how to do it!!!
For example, to calculate the square of list's elements, forget about the length of the list and such tricks, just think mathematically like that:
if the list is empty -> I am done
if not, then the list must have a head and tail -> you calculate the square of the head, then ask your function to do the same with the tail.
Just think about the general case and the base one, and that you are emitting data and not modifying it (unless you want to modify it ;) ).
Good luck!
You could check out The Little Schemer.
How about this: http://www.defmacro.org/ramblings/lisp.html
It is a very simple, step-by-step introduction to thinking in lisp from the point of view of a regular imperative programmer (Java, C#, etc.).
For educational purposes I would recommend PLT Scheme. It is a portable and powerful environment with very good examples and an even better documentation. It will help you to discover the thoughts behind functional programming step by step and in a very clean way. Choosing a little application to implement will help you learning the new language.
http://www.plt-scheme.org/
Additionally "Structure and Interpretation of Computer Programs" of H. Abelssn, G. Sussman, and J. Sussman is a very good book regarding Scheme (and programming).
Regards
mue
Take a look at 99 Lispy problems
Some thoughts on Lisps, not specific to Clojure (I'm not a Lisp expert, so I hope they're mostly correct and useful):
Coding in AST
I know little about compiler or interpreter theory, but every time I code in Lisp, it amazes me that it feels like directly building an AST.
That's part of what "code = data" means, coding in Lisp is a lot like filling data structures (nested lists) with AST nodes. Amazing, and it's easy to read too (with the right text editor).
A Programmable Programming Language
So code chunks are just nested lists, and list operations are part of the language. So you can very easily write Lisp code that generates Lisp code (see Lisp macros). This makes Lisp a programmable (in itself!) programming language.
This makes building a DSL or an interpreter in Lisp is very easy (see also meta-circular evaluation).
Never reboot anything
And in most Lisp systems, code (including documentation) can be introspected and hot swapped at run time.
Advanced OOP
Then, most Lisp Systems have some sort of Object System derived from CLOS, which is an advanced (compared to many OOP implementations) and configurable Object System (see The Art of the Metaobject Protocol).
All these features where invented long ago, but I'm not sure they are available in many other programming languages (although most are catching up, e.g. with closures), so you have to "rediscover" and get used to these by practicing (see the books in other answers).
Just remember: it's all data!
Write some simple classic functions that Lisp is good at, like
reverse a list
tell if an atom is somewhere in an s-expression
write EQUAL to tell if 2 s-expressions are equal
write FRINGE to get the list of atoms at the fringe of an s-expression
write SUBST, then write SUBLIS
Symbolic differentiation
Algebraic simplification
write a simple EVAL and/or APPLY
Understand that Lisp is good for these kinds of no-side-effect functional programs.
It is also useful for stateful side-effect (non-functional) programs, but those are more like "programs" than "functions".
Which is better for a given application depends on the application. In general, it should contain no less, and no more, state information than necessary.
Easy!
M-x lisp-mode
OK, OK, so you might not have Emacs for a brain. In all seriousness, what you need to do is to get really good at recursion. This can be quite a brain warp initially when trying to extend the concept of recursion beyond the canonical examples, but ultimately it will result in more fluid, lispy code.
Also, a lot of people get hung up on the parenthesis, and I don't really know why - the syntax is very simple and consistent and can be mastered in minutes. For me, I came to Scheme after having learned C++ and Java, and I always thought that the difference between "functions" and "operators" was a false dichotomy, and it was refreshing to see that distinction eliminated.
As far as functional programming goes, as long as you can wrap your head around the fact that a function is a first-class value and can be passed both into and out of other functions you should be fine. The usefulness of this will become clear over time, but it's enough that you can write function-taking and function-returning functions.
Finally, I'm not sure what support Clojure has for macros, but they're considered an essential part of lisp. However, I wouldn't worry about learning them until you're deeply familiar with the above items - though macros are incredibly useful and versatile, they're also used less often than the other techniques I mentioned.
I'd start with a language that can be interpreted. I found Moscow ML to be fairly easy. It is a lightweight implementation of Standard ML.
My personal practice is to find a small project (something that might take 3-5 nights hacking away) and implement it. How about a blog filter tool? Maybe just a Towers of Hanoi or Linked List implementation (those are usually 1-night projects).
The way it usually works out is I implement it poorly the first time, throw away what I had, and it finally clicks a few hours in.
A HUGE help is taking a course in something like... um... LISP! :) The homework will force you to confront a lot of the concepts and it clicked for me long before the semester ended.
Good luck!!
Good luck. It took me until about halfway through the "Programming Languages" course in college before Scheme "clicked". Once that happened, though, everything just made sense, and I fell in love with functional programming.
Write a Lisp interpreter in Lisp.
If you haven't alrady, read up on what makes lisp a unique language. If you don't do this first, you'll be trying to do the same things you could do in some other programming languages.
Then try to implement some small thing (try to make it useful to you or you might not have the motivation).
Lisp in a box is a great way to get your feet wet.
For me the important thing is to make sure you do everything in a 'lisp-y' way. Don't be tempted to think 'In Java I'd use a for loop here, how do I do for loops in Lisp?' but to go through enough examples and tutorials (as someone pointed out, SICP is perfect for this) that you can start to spot when code looks 'Lisp-y' and recognise common language paradigms.
I certainly know the feeling of looking at some code I've just written and intuitively knowing that it's correctly idiomatic for that language and platform/framework - that, I think, is when it 'clicks'.
Edit: And kudos for choosing a functional language, lesser students would have just done it in Java :)
Who said it is going to click? I'm always confused
But if you think about how much abstraction it is possible to hide away, behind lisp macros. Then your brain will explode.
:)
I'd check out Programming Clojure. It's a great book for non-lispers.
In addition to what other SO'ers have already suggested, here are my 2 cents:
Start learning the language and try out a few simple numerical/hobby problems in the language
IMPORTANT: Post the solution/code to StackOverflow, asking for people's opinion if that is really the LISPy way to do it.
Best of luck!

I am compiling a rules of programming mindset for my team: What are yours? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have been working on a list for a while that helps me share the why of programming approach and thought as much as how to do something.
For this, I wanted to build a list of things that are:
best practice,
best thought,
best approach...
that help a programmer's ability to analyze, think, approach, solve and implement in the most effective way.
I have seen dozens of incredibly valuable comments in questions throughout Stack Overflow, but I couldn't find a place where we keep them together. There is the most controversial opinion on Stack Overflow. However, I'm just looking for sagely insights that can be shared and help my team, and I approach and solve problems better through better programming.
Hopefully this can be one place to gather the one or two liners that are concise, profound and easy to share, repeat, review. If we keep it to one rule per answer it might be easiest to vote up/down.
I'll start with the first.
DRY - Don't Repeat Yourself - In code, comments or documentation.
Always leave the code a little better than when you found it.
Code does not exist until entered into a versioning control system.
Don't be afraid to admit "I don't know" and ask.
10 minutes asking someone could save a day pulling your hair out!
KISS - Keep it simple, stupid.
Pick the simplest solution that works.
Don't make things (too) complicated before they need to be.
Just because everyone else is using some complicated framework to solve their problem, doesn't mean you have to.
Don't reinvent the wheel
If there ought to be a function for it in the core library - there probably is.
Maintainability is important.
Write code as if the person who will end up maintaining it is crazy and knows where you live.
Someone else won't fix it.
If a problem comes to your attention, take ownership long enough to ensure it will be taken care of one way or another.
Don't optimize unless there's a demonstrable problem.
Most of the time when people try to optimize code before it's been proved necessary, they'll spend a lot of resources, make the code harder to read and maintain, and achieve no noticeable effect. Sometimes they'll even make it worse.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
- Donald Knuth
How hard can it be?
Don't let any problem intimidate you.
Don't Gather Requirements -- Dig for Them
Requirements rarely lie on the surface. They're buried deep beneath layers of assumptions, misconceptions, and politics
via The Pragmatic Programmer
Follow the SOLID principles:
Single Responsibility Principle (SRP)
There should never be more than one reason for a class to change.
Open-Closed Principle (OCP)
Software entities (classes, modules, functions, etc.)
should be open for extension, but closed for
modification.
Liskov Substitution Principle (LSP)
Functions that use pointers or references to base
classes must be able to use objects of derived classes
without knowing it.
Interface Segregation Principle (ISP)
Clients should not be forced to depend upon interfaces
that they do not use.
Dependency Inversion Principle (DIP)
A. High level modules should not depend upon low
level modules. Both should depend upon abstractions.
B. Abstractions should not depend upon details. Details
should depend upon abstractions.
Best Practice: Use your brain
Don't follow any trend/principle/pattern without thinking about it
I think almost everything that is listed under "The Zen of Python" applies for every "Rules of Programming Mindset" list. Start with 'python -c "import this"':
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Test Driven Development (TDD) makes coders sleep better at night
Just to clarify: Some people seem to think TDD is just an incompetent coder's way of limping from A to B without borking everything up too much, and that if you know what you're doing, that means there is no need for (unit) testing methodologies. That completely misses the point of Test Driven Development. TDD is about three (update: apparently four) things:
Refactoring magic. Having a full set of tests means you can make otherwise insane refactoring stunts, juggling the entire structure of your application without missing even one of the two hundred crazy subtle side effects that result from it. Even the best programmers are reluctant to refactor their core classes and interfaces without good (unit) test coverage, because it's damn near impossible to track down all the little 'ripple effects' it causes without them.
Detecting pitfalls early. If you are writing tests the right way, it means forcing yourself to consider all the fringe cases. Often, this leads to better design choices once the actual development begins, because the coder has already considered some of the trickier situations that may call for a different inheritance structure or a more flexible design pattern. The need for these changes is often not apparent - or intuitive - during initial planning and analysis, but those exact changes can make the application much easier to extend and maintain down the line.
Ensuring that tests get written. TDD requires you to write the tests before writing the code. Sure, that can be a pain in the ass, since writing tests is tedious compared to writing actual code - and often takes longer, too. However, doing so is the only way to make sure the tests will be written at all. If you think you'll remember to write the tests once the code is done, you're almost always wrong.
Forcing you to write better code. Since TDD forces all code to be testable (you don't write code before there is a test for it), it requires you write more decoupled code so that you can test the components in isolation. So TDD forces you to write better code. (Thanks, Esko)
Google before you ask your colleague and interrupt his coding.
Less code is better than more, as long as it makes more sense than lots of code.
Habits of the lazy coder
The first time you are asked to do something, do it (right).
The second time you are asked to do it, make a tool that does it automatically.
And the third time, if the tool doesn't cut it, design a domain specific language for generating more tools.
(not to be taken too seriously)
Be a Catalyst for Change
You can't force change on people. Instead, show them how the future might be and help them participate in creating it.
via The Pragmatic Programmer
Don't Panic When Debugging
Take a deep breath and THINK! about what could be causing the bug.
via The Pragmatic Programmer
You may copy and paste to get it working, but you may not leave it that way.
Duplicated code is an intermediate step, not a final product.
It's Both What You Say and the Way You Say It
There's no point in having great ideas if you don't communicate them effectively.
via The Pragmatic Programmer
Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.
From: Coding Horror
Build Breaker Buys Lunch
Publish Early, Publish Often
Build it correct first. Make it fast second.
Frequently conduct code reviews
Code review and consequently refactoring is an ongoing task. Here is a few goodies about code review in my opinion:
It improves code quality.
It helps refactor reusable codes into reusable libraries.
It helps you learn from your fellow developers.
It helps you learn from your mistakes and refresh your memory about a genius code you have written before.
Anything that could affect how the application runs should be treated as code, and that means putting it in version control. Especially build scripts and database schema and data (.sql) files.
Take part in open source development
If you are using open source code in your projects, remember to post your bugfixes and improvements back to the community. It's not a development best practice per se, but it's definitely a programmer mindset to strive for.
Understand the tools you use
Don't use a pattern until you've understood why you're using it; don't use a tool without knowing why; don't rely on your framework or language designer always being right for your situation, but also don't assume they're wrong until proven to be!
Convention over Configuration
Especially where conventions are strong and some flexibility can be sacrificed.

Resources