Log parser/analyzer in Unix

What's a popular tool people use in Unix to parse/analyze log files? Things like counting, finding unique values, and selecting/copying lines that match certain patterns. Please advise some tools or some keywords. I believe similar questions must have been asked before, but I don't have any idea about the keywords. Thanks.

I find it a huge failure that many log formats do not separate columns with proper unique field separators. Not because that is best in itself, but because operating on such tabular data is the basic premise of the unix textutils. Instead, log formats tend to use spaces as separators and quote fields that might contain spaces.
One of the most practical simple changes I made to web log analyzing was to change the default NCSA log format produced by the nginx web server to use tab as the field separator instead.
Suddenly I could use all of the primitive unix textutils for quick lookups, but especially awk! Print only lines where the user-agent field contains Googlebot:
awk 'BEGIN {FS="\t"} $7 ~ /Googlebot/ { print; }' < logfile
Find the number of requests for each unique request:
awk 'BEGIN {FS="\t"} { print $4; }' < logfile | sort | uniq -c | sort -n
And of course lots of combinations to find specific visitors.
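For instance, a sketch combining the two one-liners above, assuming the same tab-separated layout and that the client IP is in field 1 (adjust the field numbers to your format):
# Count requests per client IP, but only for Googlebot traffic;
# the busiest clients end up at the bottom.
awk 'BEGIN {FS="\t"} $7 ~ /Googlebot/ { print $1 }' < logfile | sort | uniq -c | sort -n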

For regular, nightly checking there is logwatch, which has several different scripts in /usr/share/logwatch/scripts/services that check for specific things (web server stuff, ftp server stuff, sshd-related stuff, etc.) in syslog. The default install enables most of them, but you can enable/disable them as you like or even write your own scripts.
For real-time watching there is multitail.

You might want to try out lnav, a curses-based log analyzer. It has most of the features you would expect from a log parser: chronological arrangement of log messages from multiple log files, support for multiple log formats, highlighting of error/warning messages, hotkeys for navigating between error/warning messages, support for SQL queries, and lots more. Take a look at the project's website for screenshots and a detailed list of features.
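Getting started is a single command; just point it at one or more log files (the paths here are only illustrative):
lnav /var/log/nginx/access.log /var/log/nginx/error.log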

Take a look at some of the generic log parsers listed here. If you use something like syslog, you can probably get a custom parser/analyzer too. Otherwise, for trivial searches, any scripting language like perl, python or even awk suffices.

Any programming language that lets you open and read files and do string/text manipulation can be used, e.g. Perl, Python, (g)awk, Ruby, PHP, even Java, etc. They have modules for the file formats you are parsing, e.g. csv.

Man page, as structured data (csv, database, ...)

In order to simplify my question, I mainly consider man pages of commands, e.g. «man grep».
Man pages are, more or less, structured. Most sections and their presentation are standard, and an explanation can be found on https://www.tldp.org/HOWTO/Man-Page/q3.html
(And the source of a man page, in groff, is not really hard to understand, even without knowing groff.)
My question is: is there already a database of the (more standard) man pages? Or at least a program that takes a man page as input (as a groff file, probably) and outputs such a database?
Here, I mean database in a very vague sense. Sqlite or mysql would be perfect, but a zip of csv files would also be great.
Let me give you an example using man grep.
The database would have an option table, with an entry for each option. This entry would contain:
- the actual option name(s)
- the abbreviation(s)
- the description of what this option does
- the enclosing section
In CSV, an entry would be
--extended-regexp, -E, Interpret PATTERN as an extended regular expression (ERE\, see below). (-E is specified by POSIX .), Matcher Selection
It would have an "exit" table, with:
0, selected lines are found
1, otherwise
2, an error occurred\, unless the -q or --quiet or --silent option is used and a selected line is found.
And so on for each standard kind of sections of the man page.
And a table with all the text that was not successfully put into some other table.
I hope that some parts of it would be simple to parse, for example creating the option table. But other parts would be quite hard, for example the exit status. Which is why I really want to know whether something like this already exists, so that I don't have to do it myself.
You can download man pages with
git clone http://git.kernel.org/pub/scm/docs/man-pages/man-pages
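If you only need a quick-and-dirty extraction rather than a ready-made database, here is a rough sketch of a first step. It prints the OPTIONS section of a rendered man page; rendered section headings start in column 0, so the awk toggles on those. Treat this as a heuristic, not a parser, since layouts vary between pages:
# Dump the OPTIONS section of grep's man page; col -b strips
# any overstrike formatting that man may emit.
man grep | col -b | awk '/^OPTIONS/ {p=1; next} /^[A-Z]/ {p=0} p'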

how access specific part of data as an input of AWK

Suppose I want to access an online dictionary and need to look up a specific word. I would just like to have the specific part of the data, i.e. the part related to the word and its translation, as input to AWK. Any ideas?
In other words, I just want to have a small portion of the data on my machine. How can I avoid downloading all the data, and hopefully save space and time? Is there any way to do so without downloading all the data to the local machine?
This question is related to my last question here.
Edit 1:
I chose a dictionary as an example because when you want to look up a word, it is enough to access a specific part of the data; there is no need to process the whole of it.
I am not an expert in programming, so I was thinking I could modify this answer to make it work (that is why I added the AWK tag again). I don't use any specific OS or tool; this is just a basic idea to see what the possibilities are, so I don't know how I can improve the tags.
awk cannot download. You must download the file and pipe it into a command that terminates as soon as it finds a result:
wget -qqO- http://example.com/path |grep -wim1 "word"
wget -qqO- URL will have no output other than the content of the given URL, which is placed on standard out so you can then parse it. grep -wim1 "word" will find the first bounded word matching "word" and then terminate. If you don't need the match printed, you can use -wiq instead. If the dictionary has one word per line (and nothing else), you're better off with -x instead of -w, so that a search for "can" matches only the line "can" and not "can't" (' is a word boundary). Remove the -i if you want to match case.
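To make the -w versus -x difference concrete (words.txt here is a hypothetical one-word-per-line list):
grep -wim1 can words.txt   # matches "can", but also the "can" inside "can't"
grep -xim1 can words.txt   # matches only a line that is exactly "can"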
In the comments, you asked:
it may improve to jump to the start of the "w" section maybe, so as not to download the whole data from "a" to "w". is it possible? I guess not
Some programs can "resume" downloads and you may be able to play with that, but you'd have to guess where to start. This would be a lot of work and you might seek too far and therefore fail to get a match.
If you are querying this dictionary more than once, I'd recommend downloading it and saving it so you can query it locally. Even the largest dictionary I know of is only 213MB (compressed; search it with zgrep), though I am assuming you're talking about a traditional word list rather than a hash table or other arbitrary data form. Of course, anything larger would take such a long time to download that you'd only want to do it once.
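A sketch of the download-once approach, assuming a sorted one-word-per-line list (the URL is hypothetical): look(1) binary-searches a sorted file, so lookups are effectively instant even on large lists.
# Fetch once, sort in the byte order look expects, then query locally.
wget -qO words.txt http://example.com/dictionary.txt
LC_ALL=C sort -o words.txt words.txt
look word words.txt   # prints every line beginning with "word"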
If you really don't want to store it locally, you should probably consider a database rather than a flat file.

How to obfuscate lua code?

I can't find anything on Google about a tool that encrypts/obfuscates my lua files, so I decided to ask here. Maybe some professional knows how to do it? (For free.)
I have made a simple game in lua and I don't want people to see the code, otherwise they can easily cheat. How can I turn the whole text inside the .lua file into just random letters and stuff?
I used to program in C# and I had this .NET obfuscator called SmartAssembly which worked pretty well. When someone tried to check the code of my applications it would just be a bunch of letters and numbers together with chinese characters and stuff.
Does anyone know a program that can do this for lua as well? Just load the file to encrypt, click Encrypt or something, and bam! It works!?
For example this:
print('Hello world!')
would turn into something like
sdf9sd###&/sdfsdd9fd0f0fsf/&
Just precompile your files (chunks) and load binary chunks. luac allows you to strip debugging info. If that is not enough, define your own transformations on the compiled lua, stripping names where possible. There's not really much demand for lua obfuscators, though...
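A minimal sketch of that first step, assuming the stock Lua toolchain (luac's -s flag strips debug information such as line numbers and local variable names):
# Compile to a binary chunk with debug info stripped; the result
# loads like any other Lua file.
luac -s -o game.luac game.lua
lua game.luac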
Also, you lose one of the main advantages of using an embedded scripting language: extensibility.
The simplest obfuscation option is to compile your Lua code, as others suggested; however, it has two major issues: (1) the strings are still likely to be easily visible in your compiled code, and (2) compiled Lua code is not portable, so if you target different architectures, you need different compiled chunks for them.
The first issue can be addressed by using a pre-processor that (for example) converts your strings to a sequence of numbers and then concatenates them back at run-time.
The second issue is not easily addressed without changes to the interpreter, but if you have a choice of interpreters, then LuaJIT generates portable bytecode that will run across all its platforms (running the same version of LuaJIT); note that LuaJIT bytecode is different from Lua bytecode, so it can't be run by a Lua interpreter.
A more complex option would be to encrypt the code (possibly before compiling it), but you need to weigh any additional mechanisms (and work on your part) against any possible inconvenience for your users and any loss you would suffer from someone cracking the protection. I'd personally use something sufficiently simple to deter the majority of curious users, as you likely stand no chance against a dedicated hacker anyway.
You could use loadstring to get a chunk, then string.dump, and then apply some transformations like cycling the bytes, swapping segments, etc. The transformations must be reversible. Then save the result to a file.
Note that anyone with access to your "encryptor" Lua module will know how to decrypt your file. Even if you implement the encryption module in C/C++, anyone with access to its source, or even just to the binary of the Lua encryption module, could require the module themselves and unobfuscate the code. With an interpreted language it is quite difficult to prevent this: you can raise the bar a bit via the techniques above, but raising it high enough to require a significant amount of work (the only real deterrent) is very difficult AFAIK.
If you embed the Lua interpreter, then you can do this from C, which makes it significantly harder (assuming a Release build with all symbols stripped); a person would have to be comfortable stepping through assembly. But it only takes one capable person to crack the algorithm, and then they can make the code available to others.
You still interested in doing this? :)
I thought I'd add some example code, since the answers here were helpful but didn't get us all the way there. We wanted to save some lua table information and just not make it super easy for someone to inject their own code. Serialize your table, then use load(str) to turn it into a loadable lua chunk, and save it with string.dump. With the 'true' parameter, debug information is stripped, and there's really not much left. Yes, you can see string keys, but it's much better than just saving the naked serialized lua table.
function tftp.SaveToMSI( tbl, msiPath )
    assert(type(tbl) == "table")
    assert(type(msiPath) == "string")

    local localName = _GetFileNameFromPath( msiPath )
    local file, err = io.open(localName, "wb")
    assert(file, err)

    -- convert the table into a string
    local str = serializer.Serialize( tbl )

    -- create a lua chunk from the string. this allows some amount of
    -- obfuscation, because it looks like gobblygook in a text editor
    local chunk = string.dump(load(str), true)

    file:write(chunk)
    file:close()

    -- send from /usr to the MSI folder
    local sendResult = tftp.SendFile( localName, msiPath )
    -- remove from the /usr folder
    os.remove(localName)

    return sendResult
end
The output from one small table looks like this in Notepad++ :
LuaS У
Vx#w( # АKА└АJБ┴ JА #
& А &  name
Coulombmetervalue?С╘ ажў

How can I quickly search my code using unix?

I frequently search for a token, perhaps a function name, throughout my codebase. My traditional method would be to grep for the term itself. However, the codebase is so large that I can't do this efficiently (it takes minutes).
Is there a way to do this efficiently?
ack (which ignores irrelevant files such as revision control files) is still too slow. ctags only finds declarations, which isn't what I need. I thought something like strigi might work, but I haven't tried it.
I'm on linux, using vim and the GNU toolchain, on a largely C++ codebase.
Use find and fgrep. A decent find will restrict the set to files matching some set of criteria, and fgrep will search within each file. Note that fgrep can be faster than grep for fixed strings.
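A sketch of that combination for a C++ tree (the token and the -name patterns are placeholders; adjust them to your layout):
# Restrict the candidate set to C++ sources and headers, then do a
# fixed-string search; -print0/-0 keeps odd filenames intact.
find . \( -name '*.cpp' -o -name '*.h' \) -print0 | xargs -0 fgrep -l 'myFunction'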
You could also use an IDE's code indexing features. For example, Eclipse will allow you to go to a function's declaration, and it builds the relevant set in the background while you work. Even vim 7 has some such features, although I don't remember exactly what it does beyond code completion.

How to learn work effectively with Unix CLI

Do you know any resources that teach good habits for working on the UNIX command line?
EDIT: I don't mean general books about the shell or man pages. I mean the things you can only see watching professionals work with the command line. For example, when switching frequently between two directories they use the "pushd" command; when repeating a command they use "history". I can read about these commands, but I want to make it a habit to use them effectively.
I am speaking from my own experience, so it may not apply to you:
The best way to become efficient is actually using it on a daily basis, instead of using graphical tools even if they make things look easy. You will then become aware of the most common tasks you care about, and instead of trying to grok it all at once, you get a fairly good starting point for learning. Man pages are the first thing to look at, but there will be non-obvious tricks which you need to search for anyway. Knowing exactly what you want infinitely increases the probability of finding it.
For example, it is easier to find out how to search for all mp3 files in the man page of "find" than how to deal with files in general (where would you even start?).
Some common bash command line actions, not in order:
Command line editing: you'll want to be good with emacs or vi and apply that to editing your commands.
Completion: use TAB to expand file names and paths.
note: There is a huge set of file, command, and history completion functions, and it is configurable. Big topic.
"cd -" : go back to the last directory you were in
~ = home directory (or ~user for users home dir)
"ESC ." : expands to the final arg from the previous command
"!string" : execute the last command starting with string
learn find, grep, sed, piping "|" and redirection ">". You'll often combine these to do useful things.
Loops from the shell prompt, e.g. a "for" loop, to do repetitive actions (see the sketch after this list)
Learn your regular expressions (for grep and sed) and shell globbing! Often used for matching files.
example: ls x[0-5]*.{zip,tar} = list files starting with x, followed by a digit 0 through 5, followed by any string ending in .zip or .tar
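A small sketch tying a few of these together, i.e. a for loop, command substitution, and a redirect (the filenames are made up):
# For each .log file, count the lines containing ERROR and
# collect the counts in a summary file.
for f in *.log; do
    echo "$f: $(grep -c ERROR "$f")"
done > error-summary.txt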
If possible ask others for their favorite tricks, read the manual, and practice.
For the more advanced stuff, this seems to be fairly comprehensive
this is a great resource: "Rute User's Tutorial and Exposition" (http://rute.2038bug.com/index.html.gz)
stackoverflow.com esp. the bash tag ;-)
(and of course the bash man page)
If you want things that you can "only see watching professionals working with command line," then you've answered your own question: Watch professionals working with the command line. I don't personally find that very useful unless the other person is doing the same thing multiple times; it's hard to pick something up after just one session because it's hard to watch the screen and the keyboard at the same time.
I think the key is to not try to become an expert right away. Just use the command line frequently, and be aware that you might not be using it as well as you could, but don't let that discourage you from using it anyway.
Browse through the man page of your shell, and through lists of tips, not with the goal of memorizing everything in them, but just to pick out a couple of things to try out. Skim through until something catches your eye and makes you think, "Gee, that sounds useful." Then try it out. Not everything is going to be useful immediately; you might have to wait a while before you encounter a situation where you can try something out. Maybe you could write down some things on Post-It notes by your desk to remind you that certain feats are possible, so when you encounter a situation where a more obscure feature could be handy, you'll be more likely to remember to try it.
Frankly, it's impossible to learn this stuff in a vacuum. You need to have problems to solve.
While it certainly helps to have familiarity with the tools available (of which there are a myriad), "learning" it requires applying it. And applying it requires "real" problems to solve.
For example, the skillset of a System Admin may be different from someone who works with databases because their roles are different.
I use them for data processing, using mostly one-off files. /tmp/x.sh and /tmp/x.x are worn bare in my directory.
My hammers tend to lean towards: ls, find, sort, sed, vi, awk, grep, and comm. Combined with simple shell scripting like: for i in `cat /tmp/list`; do ... done
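A typical one-off from that toolbox (the file names are made up): comm compares two sorted files line by line, which is handy for diffing ID lists.
# Keep only the lines unique to each file; comm needs sorted input.
LC_ALL=C sort -o old.txt old.txt
LC_ALL=C sort -o new.txt new.txt
comm -23 old.txt new.txt   # lines only in old.txt
comm -13 old.txt new.txt   # lines only in new.txt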
But I do a lot of ETL work, and very few script files, which is why my shell scripting skills are so weak.
I do rely on one script, however:
#!/bin/sh
# latest -- show latest files
ls -lt "$@" | head
As 95% of the time the files I'm working on are in the top 10 latest files. And "latest *.txt" works a peach.
So, bottom line, you need problems to solve. You need to learn the 'man' command; man -k is nice for finding things. You also need to leverage the "See Also" at the bottom of most man pages. That's a treasure trove of "I didn't know you could do that".
Then, just start solving problems. Start figuring out "what would be nice to have" and then see if it exists (it very well may). If not, awk, perl, or python can make those "nice to haves" out of thin air.
Join a LUG. That is where I learned most things early on. Ask the organizers to do a "Bash Tips And Tricks Night".
Deft shell users love to show off.
apropos is a really good tool for this sort of thing. Whenever you find yourself unsure of the best way to do something, or wishing you weren't repeating yourself, just use apropos with a keyword or two to find other commands that can help. In distros like debian, you can also install web-based help tools that search all of the manuals available on the system: texinfo, man pages, html, pdf, etc.
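For example (the keywords are arbitrary; -a is the man-db flag that requires all keywords to match rather than any):
apropos compress              # pages whose short description mentions "compress"
apropos -a remove directory   # pages matching both keywords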
Aside from that, yep, read your shell's manual right through at least once --- preferably, go back to it repeatedly as you learn more, reach limits, and want to be more efficient.
The join a LUG idea is also good; you'll definitely learn from others' demos.
