How can I quickly search my code using unix? - unix

I frequently search for a token, perhaps a function name, throughout my codebase. My traditional method would be to grep for the term itself. However, the codebase is so large that I can't do this efficiently (it takes minutes).
Is there a way to do this efficiently?
ack (which ignores irrelevent files such as revision control files) is still too slow. ctags only finds declarations, which isn't what I need. I thought something like strigi might work, but I haven't tried it.
I'm on linux, using vim and the GNU toolchain, on a largely C++ codebase.

Use find and fgrep. A decent find will restrict the set to files matching some set of criteria, and fgrep will search within each file. Note that fgrep will be faster than grep.
You could also use an IDE's code indexing features. For example, Eclipse will allow you to go to a function's declaration, and it builds the relevant set in the background while you work. Even vim 7 has some such features, although I don't remember exactly what it does beyond code completion.

Related

Standardized filenames when passing folders between steps in pipeline architecture?

I am using AzureML pipelines, where the interface between pipeline steps is through a folder or file.
When I am passing data into the pipeline, I point directly to a single file. No problem at all. Very useful when passing in configuration files which all live in the same folder on my local computer.
However, when passing data between different steps of the pipeline, I can't provide the next step with a file path. All the steps get is a path to some folder that they can write to. Then that same path is passed to the next step.
The problem comes when the following step is then supposed to load something from the folder.
Which filename is it supposed to try to load?
Approaches I've considered:
Use a standardized filename for everything. Problem is that I want to be able to run the steps locally too, independant of any pipeline. This makes very for a very poor UX for that use case.
Check if the path is to a file, if it isn't, check all the files in the folder. If there is only one file, then use it. Otherwise throw an exception. This is maybe the most elegant solution from a UX perspective, but it sounds overengineered to me. We also don't structurally share any code between the steps at the moment, so either we will have repetition or we will need to find some way to share code, which is non-trivial.
Allow custom filenames to be passed in optionally, otherwise use a standard filename. This helpes with the UX, but often the filenames are supposed to be defined by the configuration files being passed in, so while we could do some bash scripting to get the filename into the command, it feels like a sub-par solution.
Ultimately it feels like none of the solutions I have come up with are any good.
It feels like we are making things more difficult for ourselves in the future if we assume some default filename. F.x. we work with multiple file types, so it would need to omit an extension.
But any way to do it without default filenames would also cause maintainence headache down the line, or incurr substantial upfront cost.
The question is am I missing something? Any potential traps, better solutions, etc. would be appreciated. It definately feels like I am somewhat under- and/or overthinking this.

How to obfuscate lua code?

I can't find anything on Google for some tool that encrypts/obfuscates my lua files, so I decided to ask here. Maybe some professional knows how to do it? (For free).
I have made a simple game in lua now and I don't want people to see the code, otherwise they can easily cheat. How can I make the whole text inside the .lua file to just random letters and stuff?
I used to program in C# and I had this .NET obfuscator called SmartAssembly which works pretty good. When someone would try check the code of my applications it would just be a bunch of letters and numbers together with chinese characters and stuff.
Anyone knows any program that can do this for lua aswell? Just load what file to encrypt, click Encrypt or soemthing, and bam! It works!?
For example this:
print('Hello world!')
would turn into something like
sdf9sd###&/sdfsdd9fd0f0fsf/&
Just precompile your files (chunks) and load binary chunks. luacallows you to strip debugging info. If that is not enough, define your own transformations on the compiled lua, stripping names where possible. There's not really so much demand for lua obfuscators though...
Also, you loose one of the main advantages of using an embedded scripting language: Extensibility.
The simplest obfuscation option is to compile your Lua code as others suggested, however it has two major issues: (1) the strings are still likely to be easily visible in your compiled code, and (2) the compiled code for Lua interpreter is not portable, so if you target different architectures, you need to have different compiled chunks for them.
The first issue can be addressed by using a pre-processor that (for example) converts your strings to a sequence of numbers and then concatenates them back at run-time.
The second issue is not easily addressed without changes to the interpreter, but if you have a choice of interpreters, then LuaJIT generates portable bytecode that will run across all its platforms (running the same version of LuaJIT); note that LuaJIT bytecode is different from Lua bytecode, so it can't be run by a Lua interpreter.
A more complex option would be to encrypt the code (possibly before compiling it), but you need to weight any additional mechanisms (and work on your part) against any possible inconvenience for your users and any loss you have from someone cracking the protection. I'd personally use something sufficiently simple to deter the majority of curious users as you likely stand no chance against a dedicated hacker anyway.
You could use loadstring to get a chunk then string.dump and then apply some transformations like cycling the bytes, swapping segments, etc. Transformations must be reversible. Then save to a file.
Note that anyone having access to your "encryptor" Lua module will know how to decrypt your file. If you make your encrypted module in C/C++, anyone with access to source will too, or to binary of Lua encryption module they could require the module too and unofuscate the code. With interpreted language it is quite difficult to do: you can raise the bar a bit via the above the techniques but raising it to require a significant amount of work (the onlybreal deterent) is very difficult AFAIK.
If you embed the Lua interpreter than you can do this from C, this makes it significantly harder (assuming a Release build with all symbols stripped), person would have to be comfortable with stepping through assembly but it only takes one capable person to crack the algorithm then they can make the code available to others.
Yo still interested in doing this? :)
I thought I'd add some example code, since the answers here were helpful, but didn't get us all the way there. We wanted to save some lua table information, and just not make it super easy for someone to inject their own code. serialize your table, and then use load(str) to make it into a loadable lua chunk, and save with string.dump. With the 'true' parameter, debug information is stripped, and there's really not much there. Yes you can see string keys, but it's much better than just saving the naked serialized lua table.
function tftp.SaveToMSI( tbl, msiPath )
assert(type(tbl) == "table")
assert(type(msiPath) == "string")
local localName = _GetFileNameFromPath( msiPath )
local file,err = io.open(localName, "wb")
assert(file, err)
-- convert the table into a string
local str = serializer.Serialize( tbl )
-- create a lua chunk from the string. this allows some amount of
-- obfuscation, because it looks like gobblygook in a text editor
local chunk = string.dump(load(str), true)
file:write(chunk)
file:close()
-- send from /usr to the MSI folder
local sendResult = tftp.SendFile( localName, msiPath )
-- remove from the /usr folder
os.remove(localName)
return sendResult
end
The output from one small table looks like this in Notepad++ :
LuaS У
Vx#w( # АKА└АJБ┴ JА #
& А &  name
Coulombmetervalue?С╘ ажў

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus. Otherwise, I would have to thoroughly understand the nnfolder format, and this might require many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to properly organize without loosing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I did not touch Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever Gnus hidden corner that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice, helping function towards this goal. The idea is to first copy all Thunderbird folder files within the ~/Mail directory, as they are for the contents, but properly renamed. Once this done, M-x nnfolder-generate-active-file does at once, for each copied folder, edit the contents, leave a ~ backup, generate NOV data, create one mailgroup and, of course, adjust the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files, and recurse within .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name, also lowering case, turning spaces and underlines to dashes, and handling other special characters appropriately. In particular, non-ASCII characters are not handled properly, nnfolder is confusing UTF-8 and ISO-8859-1 here and there. The script also has to skip msgfilterrules.dat and likely drafts, junk and such things.
I notice two details requiring attention :
Thunderbird itself can be used to compact folders before copying them, otherwise one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it is not able to decrypt one of the message. I moved the offending folder out of the way, and then, the lengthy operation succeeded.

finding duplicate source code

I'm analyzing some legacy code. It is about 80.000 lines of old plsql code. On a fist look there is quite some duplication in the source which needs to be removed. Instead off doing diff's manual and looking at each file there must be some tool/commandline confu out there to detect duplicate lines of source code.
My goal is to make an educated guess about the minimal size of a rewrite of source and about how much actual knowledge is captured in this program. I wrote some a basic static code analyzer to find the amount of control statements IF ELSE FOR etc and Functions in each file.
But duplicated code still needs to be removed from my statistics.
Have you looked at Simian - Similarity Analyser? (Just checked and it's no longer free, but it is available for a period of 15 days for evaluation purposes.)
Simian (Similarity Analyser)
identifies duplication in Java, C#, C,
C++, COBOL, Ruby, JSP, ASP, HTML, XML,
Visual Basic, Groovy source code and
even plain text files. In fact, simian
can be used on any human readable
files such as ini files, deployment
descriptors, you name it.
I have used it in practice and it does work well.
Sonar has duplication detection and claims to support PL/SQL, though I've never used it for that.
You would need to beg/borrow/steal/write a plsql parser and compare the resulting abstract syntax trees. With the size of the code base you have, that might be worthwhile. There would be other uses for the parser once you're done.
How about this:
http://sourceforge.net/projects/sddforeclipse/
It is opensource, and is said to be used by commercial software. It is a plugin to Eclipse, by the way.

How to learn work effectively with Unix CLI

Do you know any resources that teach to good habits of working in UNIX command line?
EDIT: I don't mean general books about shell or man pages. I mean the things that you can only see watching professionals working with command line. For example when changing frequently between two directories they use "pushd" command, when repeating a command they use "history". I can read about these commands but I want to make it a habit to use them effectively.
I am speaking out of my own experience so it may not apply to you;
The best way to be efficient is actually using it on a daily basis, instead of using graphical tools even if they make look things easy. You will then become aware of most common tasks you care about, and instead of trying to grok it at once, you get a fairly good starting point to start learning. Man pages are the first thing to look at, but there will be non-obvious tricks which you need to search anyway. Knowing what you exactly want, infinitely increases probability of finding it.
For example, you can find how to search all mp3 files easier in man page of "find" than how to deal with files in general (where to start?).
Some common bash command line actions, not in order:
Command line editing: you'll want to be good with emacs or vi and apply that to editing your commands.
Completion: use TAB to expand file names and paths.
note: There is a huge set of file, command, and history completion functions, and it is configurable. Big topic.
"cd -" : go back to the last directory you were in
~ = home directory (or ~user for users home dir)
"ESC ." : expands to the final arg from the previous command
"!string" : execute the last command starting with string
learn find, grep, sed, piping "|" and redirection ">". You'll often combine these to do useful things.
Loops from the shell prompt, e.g. "for" loop - to do repetitive actions
Learn your regular expressions! Often used for matching files.
example: ls x[0-5]*.{zip,tar} = list files starting with x, followed by a number 0 through 5, followed by any string ending in .zip or .tar
If possible ask others for their favorite tricks, read the manual, and practice.
For the more advanced stuff This seems to be fairly comprehensive
this is a great resource: "Rute User's Tutorial and Exposition" (http://rute.2038bug.com/index.html.gz)
stackoverflow.com esp. the bash tag ;-)
(and of course the bash man page)
If you want things that you can "only see watching professionals working with command line," then you've answered your own question: Watch professionals working with the command line. I don't personally find that very useful unless the other person is doing the same thing multiple times; it's hard to pick something up after just one session because it's hard to watch the screen and the keyboard at the same time.
I think the key is to not try to become an expert right away. Just use the command line frequently, and be aware that you might not be using it as well as you could, but don't let that discourage you from using it anyway.
Browse through the man page of your shell, and through lists of tips, not with the goal of memorizing everything in them, but just to pick out a couple of things to try out. Skim through until something catches your eye and makes you think, "Gee, that sounds useful." Then try it out. Not everything is going to be useful immediately; you might have to wait a while before you encounter a situation where you can try something out. Maybe you could write down some things on Post-It notes by your desk to remind you that certain feats are possible, so when you encounter a situation where a more obscure feature could be handy, you'll be more likely to remember to try it.
Frankly, it's impossible to learn this stuff in a vacuum. You need to have problems to solve.
While it certainly helps to have familiarity with the tools available (of which there are a myriad), "learning" it requires applying it. And applying it requires "real" problems to solve.
For example, the skillset of a System Admin may be different from someone who works with databases because their roles are different.
I use them for data processing, using mostly one off files. /tmp/x.sh and /tmp/x.x are worn bare in the directory folder.
My hammers tend to lean towards: ls, find, sort, sed, vi, awk, grep, and comm. Combined with simple shell scripting like: for i in cat /tmp/list; do .. done
But I do a lot of ETL work, and very few script files, which is why my shell scripting skills are so weak.
I do rely on one script, however:
#!/bin/sh
# latest -- show latest files
ls -lt $# | head
As 95% of the time the files I'm working on are in the top 10 latest files. And "latest *.txt" works a peach.
So, bottom line, you need problems to solve. You need to learn the 'man' command, man -k is nice to find things. You also need to leverage the "See Also" at the bottom of most man pages. That's a treasure trove of "I didn't know you could do that".
Then, just start solving problems. Start figuring out "what would be nice to have" and then see if it exists (it very well may). If not, awk, perl, or python can make those "nice to haves" out of thin air.
Join a LUG. That is where I learned most things early on. Ask the organizers to do a "Bash Tips And Tricks Night".
Deft shell users love to show off.
apropos is a really good tool for this sort of thing. Whenever you find yourself unsure of the best way to do something, or wishing you weren't repeating yourself, just use apropos with a keyword or two to find other commands that can help. In distros like debian, you can also install web-based help tools that search all of the manuals available on the system: texinfo, man pages, html, and pdf etc.
Aside from that, yep, read your shell's manual right through at least once --- preferably, go back to repeatedly it as you learn more, reach limits and want to be more efficient.
The join a LUG idea is also good; you'll definitely learn from others' demos.

Resources