How to get useful names for anonymous java methods using async-profiler - anonymous-class

I'm trying to use async-profiler to analyze Apache Spark performance, and I find that, even when codegen disabled, the majority of CPU time is spend in anonymous classes, such as Iterator$$anon$11.hasNext.
Is there a way I can get async-profiler to include outer class names for the objects containing the methods?
Or is there a way I can get the JVM to attach more meaningful names?
Or is there a way I can give names to anonymous classes?
The closest I've found to anyone ever even asking about this is from Finding a specific lambda noted in a Profiler, but that doesn't seem super helpful.
Thanks.
I've tried to look up ways to do this, but I'm coming up empty.

Related

Can I execute untrusted Common Lisp code in a restricted environment?

Supposed I wanted to take advantage of Common Lisp's ability to read and execute Common Lisp code so that my program can execute external code written in Lisp, but I don't trust that code so I don't want it to have access the full power of Common Lisp. Is it possible for me to restricts its environment so that it can only see the packages/symbols to which I explicitly give it access, effectively creating a DSL?
To read the code, start by disabling *read-eval* (that stops people injecting execution during parsing, using something like #.(do-evil-stuff). You probably want to do the reading using a custom read-table that disables most (if not all) read-macros. You probably want to do the reading with a custom, one-off, package, importing only symbols you allow.
Once you've read the user-provided code, you still need to validate that there's no unexpected function/macro references in the code. If you have used a custom package, you should be able to confirm that each symbol falls in either of the two classes "belongs to the custom one-off package" (this is user-supplied stuff) or "explicitly allowed from elsewhere" (you would need this list to construct the custom package).
Once that's been done, you can then evaluate it.
However, doing this correctly would take a fair bit of care and you really should have someone else have a look at the code and actively try to break out of the sandbox.
Take a look at the section 'Reader security' in chapter 4 of Let over lambda which discusses this topic in some depth. In particular, you probably want to set *read-eval* to nil. To address your question regarding restricting access to the environment, this is generally difficult in Common Lisp, as it is designed to allow access to most pieces of the system in the first place. Maybe you can use elaborate the ideas of Let over lambda in the direction of white listing symbols (in comparison to the blacklisting of macro characters in the linked chapter). I don't think there are any ready-made solutions.

ECS with Go - circular imports

I'm exploring both Go and Entity-Component-Systems. I understand how ECS works, and I'm trying to replicate what seems to be the go-to document of ECS, namely http://cowboyprogramming.com/2007/01/05/evolve-your-heirachy/
For performance, the document recommends to use static arrays of every component type. That is, not arrays of component interfaces (arrays of pointers). The problem with this in Go is circular imports.
I have one package, ecs, which contains the definitions for Entity, Component and System types/interfaces as well as an EntityManager. Another package, ecs/components, contains the various components. Obviously, the ecs/components package depends on ecs. But, to declare arrays of specific components in EntityManager, ecs would depend on ecs/components, therefore creating a circular import.
Is there any way of avoiding this? I am aware that normally a high level system should not depend on lower systems. I'm also want to point out that using an array of pointers is probably fast enough for my purposes, but I'm interested in possible workarounds (for future reference)
Thank you for your help!
For performance, the document recommends to use static arrays of every
component type.
I'm just going to start off saying that I may be blind, but I ctrl+f'd and read that document multiple times and didn't see anything close to that. (Certainly some optimizations could be made this way with regards to things like avoiding cache misses, but I'm dubious it in any way outweighs the clerical overhead).
There's the easy answer to the exact question you asked first, the . import. Any package with an import statement like import . "some/other/package" will treat that package's contents as its own, ignoring circular dependencies. Don't do this.
Unfortunately, without merging the packages, you won't be able to do this (without using interfaces, I mean). Don't fear, though. The article you posted explicitly says this under "implementation details".
Giving each component a common interface means deriving from a base
class with virtual functions. This introduces some additional
overhead. Do not let this turn you against the idea, as the additional
overhead is small, compared to the savings due to simplification of
objects.
It's outright telling you to use interfaces (okay, C++ virtual inheritance, but close enough). It's okay, it's necessary. Especially if you want two slightly different AI components or something, it's a godsend then.

Using generic functions of R, when and why?

I'm developing an major upgrade to the R package, and as part of the changes I want to start using the S3 methods so I can use the generic plot, summary and print functions. But I think I'm not totally sure I understand why and when to use generic functions in general.
For example, I currently have a function called logLikSSM, which computes the log-likelihood of a state space model. Instead of using this functions, I could make function logLik.SSM or something like that, as there is generic function logLik in R. The benefit of this would be that logLik is shorter to write than logLikSSM, but is there really any other point in this?
Similar case, there is a generic function called simulate in stats package, so in theory I could use that instead of simulateSSM. But now the description of the simulate function tells that function is used to "Simulate Responses", but my function actually simulates the hidden states, so it really doesn't fit into the description of the simulate function. So probably in this case I shouldn't use the generic function right?
I apologize if this question is too vague for here.
The advantages of creating methods for generics from the core of R include:
Ease of Use. Users of your package already familiar with those generics will have less to remember making it easier to use your package. They might even be able to do a certain amount without reading the documentation. If you come up with your own names then they must discover and remember new names which is an added cognitive burden.
Leverage Existing Functionality. Also any other functions that make use of generics you create methods for can then automatically use yours as well; otherwise, they would have to be changed. For example, AIC uses logLik.
A disadvantage is that the generic involves the extra level of dispatch and if logLik is in the inner loop of an optimization there could be an impact (although possibly not material). In that case you could check the performance of calling the generic vs. calling the method directly and use the latter if it makes a significant difference.
Regarding the case that your function has a completely different purpose than the generic in the core of R, then it might be more confusing than helpful so you might, in that case, not create a method but have your own function name.
You might want to read the zoo Design manual (see link to zoo Design under Vignettes near the bottom of that page) which discusses the design ideas that went into the zoo package. These include the idea being discussed here.
EDIT: Added disadvantates.
good question.
I'll split your Question into two parts; here's the first one:
i]s there really any other point in [making functions generic]?
Well, this pattern is usually invoked when the develper doesn't know the object class for every object he/she expects a user to pass in to the method under consideration.
And because of this uncertainty, this design pattern (which is called overloading in many other languages) is invokved, and which requires R to evaluate the object class, then dispatch that object to the appropriate method given the object type.
The second part of your Question: [i]n this case I shouldn't use [the generic function] right?
To try to give you an answer useful beyond the detail of your Question, consider what happens to the original method when you call setGeneric, passing that method in.
the original function body is replaced with code for performing a top-level dispatch based on type of object passed in. This replaces the original function body, which just slides down one level so that it becomes the default method that the top level (generic) function dispatches to.
showMethods() will let you see all of those methods which are called by the newly created dispatch function (generic function).
And now for one huge disadvantage:
Ease of MISUse:
Users of your package already familiar with those generics might do a certain amount without reading the documentation.
And therein lies the fallacy that components, reusable objects, services, etc are an easy panacea for all software challenges.
And why the overwhelming majority of software is buggy, bloated, and operates inconsistently with little hope of tech support being able to diagnose your problem.
There WAS a reason for static linking and small executables back in the day. But this generation of code now, get paid now, debug later if ever, before the layoffs/IPO come, has no memory of the days when code actually worked very reliably and installation/integration didn't require 200$/hr Big 4 consultants or hackers who spend a week trying to get some "simple" open source product installed and productively running.
But if you want to continue the tradition of writing ever shorter function/method names, be my guest.

Console Application Structure

I've written several .Net Console Applications over the past 6 months and we have many more throughout different projects in our organization. I generally stick to the same standard format/structure for my Console Applications. Unfortunately, many of our console applications do not.
I have been looking into ways of standardizing the structure of these Console Applications. I would also like to provide a framework for the basic structure of a Console Application and provide easy access to standard ways of handling things such as argument passing, logging, etc.
Can anyone suggest Best Practices for addressing these concerns? I have been reading this MSDN article on Console Applications in .Net which suggests a Design Pattern for Console Apps. The example uses a Template Method pattern to handle some of the concerns I listed earlier.
Two negatives of using this approach are listed in the article.
Ending up with twice as many classes
Having many simple, similar classes
Can anyone suggest better, or more standard, ways of handling this? What about listing additional negatives with this approach?
To answer part of my own question...
I did spend quite a bit of time looking for a standard way to handle the parsing of arguments passed to my console applications. I was specifically looking for something similar to GetOpt for python. With that in mind, the solution that I settled on is NDesk.Options. It covers all of our needs and seems to handle arguments in a standard fashion. I thought this might help someone who stumbles on this question in their search.
I tend to keep my console applications as simple as possible and as much in another layer so that it can be more easily unit and acceptance tested. I also generally keep my console applications simple/single tasked, so I don't often have many possible paths through things like command line arguments. This allows me to just take the arguments, if any, parse them and pass them along to "back end" logic.

How to hand over variables to a function? With an array or variables?

When I try to refactor my functions, for new needs, I stumble from time to time about the crucial question:
Shall I add another variable with a default value? Or shall I use only one array, where I´m able to add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
Just like many questions in programming, the right answer is "it depends".
To take Javascript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expresssion)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins require several miscellaneous parameters that may be optional. By convention, these are passed as parameters via an 'options' array. As you said, this provides a nice interface as new parameters can be added without affecting the existing API. This makes the API clean as well since the user can ignore those options that are not applicable.
In general, when several parameters are involved, passing them as an array is a nice convention as many of them are certainly going to be optional. This would have helped clean up many WIN32 API's, although it is more difficult to deal with arrays in C/C++ than in Javascript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work that will need to be done to maintain existing code is important. Remember to include the amount of time it will take to communicate to other programmers that may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or less. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments - that I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogenous, an array. If not, then a reasonably lightweight object should be built for the purpose.
You should do neither. Just add the parameter and change all callers to supply the proper default value. The reason is that parameters with default values can only be at the end, and will not be able to add any more required parameters anywhere in the parameters list, without having a risk of misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers will supply it, and some will rely on defaults.
[half a year passed]
3. add a required parameter (before them)
4. change all callers to accept the required parameter
5. get a phone call, or other event which will make you forget to change one of the instances in part#2
6. now your program compiles perfectly, but is invalid.
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
Array is also not a proper solution. Array should be used as a connection of similar objects, upon which there's a uniform activity performed. As they say here, if it's worth refactoring, it's worth refactoring now.

Resources