What is the meaning of "t" in ls output?

ls reports the following:
# ls -ld /var/lib/puppet/state/
drwxr-xr-t 3 puppet puppet 4096 Jan 8 16:53 /var/lib/puppet/state/
What does the "t" mean for the "other" class? What tool reports the symbolic names for the permissions? ls has the --numeric-uid-gid option, but is there another one for permissions?

man ls is your friend:
t The sticky bit is set (mode 1000), and is searchable or executable.
(See chmod(1) or sticky(8).)
About the sticky bit:
When set, it instructed the operating system to retain the text segment of the program in swap space after the process exited. This speeds up subsequent executions by allowing the kernel to make a single operation of moving the program from swap to real memory. Thus, frequently-used programs like editors would load noticeably faster.

The sticky bit is rather confusing today. It no longer pins a program into memory, so it isn't literally "sticky" anymore. Sometimes it is called the "tacky" bit because it is shown as a 't' or 'T', but most people still call it sticky. On modern Linux and Unix it only matters when applied to a directory, as far as I can tell: in a directory with the sticky bit set, a file can only be deleted or renamed by the file's owner, the directory's owner, or root, even if other users have write permission on the directory (which is why it is set on world-writable directories such as /tmp). It shows up in the last field, the execute/search position for "other": a lowercase 't' if that execute bit is also set, an uppercase 'T' if it is not. It cannot be shown in the group execute field because that position is already used for the setgid bit, which is displayed as an 's'.
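A quick way to see this in action is to set the bit on a scratch directory and look at the listing; a minimal sketch (the path is made up, and the stat line assumes GNU coreutils):
mkdir /tmp/sticky-demo
chmod 1777 /tmp/sticky-demo           # the leading 1 sets the sticky bit (rwxrwxrwt)
ls -ld /tmp/sticky-demo               # drwxrwxrwt ... /tmp/sticky-demo
stat -c '%a %A %n' /tmp/sticky-demo   # prints the mode numerically and symbolically: 1777 drwxrwxrwt /tmp/sticky-demo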

Related

Is copying a continuously changing 100 GB file between datacenters with rsync a good idea?

I have a datacenter A with a 100 GB file that changes every millisecond. I need to copy the file to datacenter B, and in case of a failure in datacenter A I need to use the copy in B. Since the file changes every millisecond, can rsync handle this with the datacenters 250 miles apart? Is there any possibility of ending up with a corrupted file? And since the file is continuously updated, when can we call the copy in datacenter B a finished file?
rsync is a relatively straightforward file copying tool with some very advanced features. This would work great for files and directory structures where change is less frequent.
If a single file with 100GB of data is changing every millisecond, that would be a potential data change rate of 100TB per second. In reality I would expect the change rate to be much smaller.
Although it is possible to resume data transfer and potentially reuse part of the existing data, rsync is not made for continuous replication at that interval. rsync works at the file level; it is not a block-level replication tool. However, there is an --inplace option, which may be able to provide the kind of file synchronization you are looking for: https://superuser.com/questions/576035/does-rsync-inplace-write-to-the-entire-file-or-just-to-the-parts-that-need-to
When it comes to distance, the 250 miles result in at least about 2 ms of additional latency just from the speed of light, which is not all that much; in reality it will be more due to cabling, routers and switches.
rsync by itself is probably not the right solution. This question is more about physics, link speed and business requirements than anything else. It would be good to know the exact change rate, and whether you're allowed to have gaps in your restore points. This level of reliability may require a more sophisticated solution such as log shipping, storage snapshots, storage replication or some form of distributed storage on the back end.
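For reference, a hedged sketch of the kind of invocation the --inplace option enables (host and path names are invented for illustration); --inplace updates the destination file in place instead of writing a hidden temporary copy and renaming it:
rsync -av --inplace --compress /data/bigfile.db dc-b-host:/data/bigfile.db
Even with --inplace, each run copies a file that keeps changing underneath it, so the result is not a consistent snapshot; this is why the answers below lean towards block-level or database-level replication instead.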
No, rsync is probably not the right way to keep the data in sync based on your description.
100 GB of data is of no use to anybody without the means to maintain it and extract information. That implies structured elements such as records and indexes. rsync knows nothing about this structure and therefore cannot ensure that writes to the file transition it from one valid state to another. It certainly cannot guarantee any sort of consistency if the file is concurrently updated at either end while being copied via rsync.
Rsync might be the right solution, but it is impossible to tell from what you have said here.
If you are talking about provisioning real time replication of a database for failover purposes, then the best method is to use transaction replication at the DBMS tier. Failing that, consider something like drbd for block replication but bear in mind you will have to apply database crash recovery on the replicated copy before it will be usable at the remote end.

Increase the command window in R [duplicate]

In R, I like to use reverse search (ctrl+r) to redo infrequent but complex commands without a script. Frequently, I will do so many other commands in between that the command history discards the old command. How can I change the default length of the command history?
This is platform and console specific. From the help for ?savehistory:
There are several history mechanisms available for the different R
consoles, which work in similar but not identical ways...
...
The history mechanism is controlled by two environment variables:
R_HISTSIZE controls the number of lines that are saved (default 512),
and R_HISTFILE sets the filename used for the loading/saving of
history if requested at the beginning/end of a session (but not the
default for these functions). There is no limit on the number of lines
of history retained during a session, so setting R_HISTSIZE to a large
value has no penalty unless a large file is actually generated.
So, in theory, you can read and set R_HISTSIZE with:
Sys.getenv("R_HISTSIZE")
Sys.setenv(R_HISTSIZE = new_number)
But, in practice, this may or may not have any effect.
See also ?Sys.setenv and ?EnvVar
Take a look at the help page for history(). This is apparently set by the environment variable R_HISTSIZE, so you can set it for the session with Sys.setenv(R_HISTSIZE = XXX). I'm still digging to find where you change this default behavior for all R sessions, but presumably it will be related to ?Startup or your R profile.
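For what it's worth, one common way to make a larger history size the default for every session (assuming a Unix-like system and the standard startup files described in ?Startup) is to set the variable in ~/.Renviron, e.g. from a shell:
echo 'R_HISTSIZE=100000' >> ~/.Renviron   # read at R startup; 100000 is an arbitrary example
Whether it actually takes effect still depends on which R console you use, as the help page warns.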
?history
"There are several history mechanisms available for the different R
consoles, which work in similar but not identical ways. "
Furthermore, there may even be two history mechanisms on the same machine. I have .history files saved from the console, and the Mac R GUI has its own separate system. You can increase the number of GUI-managed history entries in the Preferences panel.
There is an incremental history package:
http://finzi.psych.upenn.edu/R/library/track/html/track.history.html
In ESS you can set comint-input-ring-size:
(add-hook 'inferior-ess-mode-hook
          (lambda ()
            (setq comint-input-ring-size 9999999)))

Need an option to Wireshark Statistics

I need to obtain statistics about the network traffic on an MPLS link between two sites. The main purpose is to detect the 'top flooders' at the end of the day and at the precise moments when the network is 'overloaded'.
At the moment I have a sniffer box running Ubuntu and I'm using Wireshark to capture packets. The built-in statistics are awesome, but I can only use them with files no bigger than about 150 MB (it hangs due to memory leaks with bigger files). So I use them at precise moments to detect any instant flooder in 'live mode', but it's impossible for me to leave Wireshark capturing traffic all day long because of the hangs.
What tools are better suited for these purposes (detecting any 'instant' flooder and gathering statistics on top talkers and top conversations between computers for the entire day)?
Thank you.
Preliminary important note:
Wireshark does not "hang due to memory leaks" with bigger files. The (very annoying) problem is that when opening a file, Wireshark dissects it entirely, from the first packet to the last, before doing anything else. 1) That can take a very, very long time, and 2) it implies that Wireshark keeps the entire file in memory, e.g. the Wireshark process will weigh 1 GB of memory for a 1 GB trace (plus its own internal data, of course), which can become a problem not only for Wireshark but for the whole OS. Hence yes, it can become so unresponsive for so long that it looks like it has hanged. That is not a bug, rather a missing and very complicated feature: dissecting in "lazy" mode. The same goes for live capture: it dissects and correlates everything on the fly (so that it knows and can follow a TCP dialog, for instance) and holds the entire capture in memory, which can quickly become quite heavy on both memory and CPU.
And this will not be fixed tomorrow, so now to your problem:
One option would be not to save to a file and process it later, but to do it "live". You can do so using tshark (a terminal-based version of Wireshark), which will do the capture just like Wireshark, and pipe its textual output to a dissecting/statistics tool of your own.
https://www.wireshark.org/docs/man-pages/tshark.html
It has a -Y <display filter> option, so you should be able to use the MPLS filters from Wireshark:
https://www.wireshark.org/docs/dfref/m/mpls.html
The -z <statistics> option will not be usable, since it displays its results after finishing reading the capture file, and you'll be piping live.
tshark by default works in "one-pass analysis" mode, which of course limits a lot the analysis it can do, but avoids the Wireshark issue of "I want to dissect everything".[*]
So that would look like:
$ sudo tshark -i <your interface> -Y <your display filters> etc etc | your_parsing_and_statistical_tool
Of course, you'll have to write your own code for "your_parsing_and_statistical_tool". I'm not familiar with MPLS, nor do I know the statistics you're interested in, but that may just be a couple of hours (or days) of Python coding, so if that's worth it for your job...
[*]:
tshark also has a -2 option to perform a two-pass analysis, but that would not work here, since the first pass must complete before anything else happens, which will never occur when you're not reading a file but capturing and analysing live.
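As a rough illustration of what such a pipeline could look like for a "top talkers by bytes" count (a sketch, not a tested tool: eth0 and the 60-second auto-stop are placeholders, and ip.src / frame.len are standard Wireshark field names):
$ sudo tshark -i eth0 -a duration:60 -T fields -e ip.src -e frame.len 2>/dev/null |
    awk '{ bytes[$1] += $2 } END { for (h in bytes) print bytes[h], h }' |
    sort -rn | head
The -a duration:60 autostop is what lets the awk END block run and print the per-source byte totals; for an all-day run you would restart the capture in a loop or accumulate the output into a file instead.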

How can I reverse the history numbering?

I am a unix addict, and many of the machines I use (at home and at work) are now quickly passing the 10,000-command mark. I like to keep all of the commands I issue readily available, which is why I have set the upper limit to something like 100,000 entries, but it is becoming tedious to recall particular recent entries, as I have to write !12524 in the shell to expand that one out.
Sure, I can use the shortcuts to recall the last command or even the tenth-last command, but it's impossible to keep track of that, so 90% of the time I am doing things like history | grep 'configure --prefix' (to review how I configured something last, etc.) and then using whatever history index that spits back.
Can I reverse it so that command #10000 corresponds to the command from ten thousand commands ago, and have command #1 be the last command?
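For illustration, the workflow described above looks roughly like this (the history number is the one from the question); note that bash history expansion also accepts negative event designators, which already count backwards from the most recent command:
history | grep 'configure --prefix'   # find the old invocation and its index
!12524                                # re-run it by absolute history number
!-10                                  # refer to the 10th most recent command directly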

Disk reads / seeks on a shared unix server for a directory list

I want to get a better understanding of how disk reads work for a simple ls command and for a cat * command on a particular folder.
As I understand it, disk reads are the "slowest" operation for a server/any machine, and a webapp I have in mind will be making ls and cat * calls on a certain folder very frequently.
What are "ball park" estimates of the disk reads involved for an "ls" and for a "cat *" for the following number of entries?
Number of entries    Disk reads for ls    Disk reads for cat *
200
2,000
20,000
200,000
Each file entry is just a single line of text
Tricky to answer - which is probably why it spent so long getting any answer at all.
In part, the answer will depend on the file system - different file systems will give different answers. However, doing 'ls' requires reading the pages that hold the directory entries, plus reading the pages that hold the inodes identified in the directory. How many pages that is - and therefore how many disk reads - depends on the page size and on the directory size. If you think in terms of 6-8 bytes of overhead per file name, you won't be too far off. If the names are about 12 characters each, then you have about 20 bytes per file, and if your pages are 4096 bytes (4KB), then you have about 200 files per directory page.
If you just list names and not other attributes with 'ls', you are done. If you list attributes (size, etc), then the inodes have to be read too. I'm not sure how big a modern inode is. Once upon a couple of decades ago on a primitive file system, it was 64 bytes each; it might have grown since then. There will be a number of inodes per page, but you can't be sure that the inodes you need are contiguous (adjacent to each other on disk). In the worst case, you might need to read another page for each separate file, but that is pretty unlikely in practice. Fortunately, the kernel is pretty good about caching disk pages, so it is unlikely to have to reread a page. It is impossible for us to make a good guess on the density of the relevant inode entries; it might be, perhaps, 4 inodes per page, but any estimate from 1 to 64 might be plausible. Hence, you might have to read 50 pages for a directory containing 200 files.
When it comes to running 'cat' on the files, the system has to locate the inode for each file, just as with 'ls'; it then has to read the data for the file. Unless the data is stored in the inode itself (I think that is/was possible in some file systems with biggish inodes and small enough file bodies), then you have to read one page per file - unless partial pages for small files are bunched together on one page (again, I seem to remember hearing that could happen in some file systems).
So, for a 200 file directory:
Plain ls: 1 page
ls -l: 51 pages
cat *: 251 pages
I'm not sure I'd trust the numbers very far - but you can see the sort of data that is necessary to improve the estimates.
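One way to turn these estimates into actual measurements on a GNU/Linux box is to drop the page cache and diff the "reads completed" counter in /proc/diskstats around the command; a rough sketch (sda and /some/dir are placeholders, and root is needed to drop the cache):
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # start from a cold page cache
awk '$3 == "sda" { print $4 }' /proc/diskstats        # reads completed, before
ls -l /some/dir > /dev/null
awk '$3 == "sda" { print $4 }' /proc/diskstats        # reads completed, after; the difference is what the command cost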

Resources