I am using Connect:Direct with scp and trying to send some PDF files from Unix to a mainframe.
On the Unix end, I have an archive containing PDFs which I simply rename to ABC.XYZ.LMN.PQR (the mainframe file name) and then send to the mainframe.
The archive contains variable-length PDF files.
However, the requirement is:
For any variable-length file, the mainframe needs to know the maximum possible length of any record in the file. For example, say the LRECL is 1950.
How do I include the LRECL as well when the PDF files inside the archive to be sent are of variable length?
An alternative would be to transfer the files to a Unix System Services file (z/OS Unix) instead of a "traditional" z/OS dataset. Then the folks on the mainframe side could use their utilities to copy the file to a "traditional" mainframe dataset if that's what they need.
For Variable Blocked datasets only! If your maximum record size is 1950, you will want to specify RECFM=VB,LRECL=1954, adding 4 bytes to your maximum record length. This 4-byte allowance is for the Record Descriptor Word (RDW). If you need to specify BLKSIZE, then the minimum is the LRECL plus another 4 bytes, this time for the Block Descriptor Word (BDW).
So in your example, your JCL will have DCB parameters that look like: RECFM=VB,LRECL=1954,BLKSIZE=1958
Ideally, for optimal storage, BLKSIZE should be set to the largest size that does not exceed the device-specific recommendation. For example, tape devices typically try to use BLKSIZE=32760 (32 * 1024 - 8 for the RDW & BDW). Disk drives may vary, but in our shop BLKSIZE=23476 is considered optimal.
Incorrect blocking factors can waste tremendous amounts of space. When in doubt, let your system defaults apply or consult your local system gurus for their device-specific recommendations.
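For anyone who wants to check the arithmetic, here is a minimal Python sketch of the RDW/BDW rules described above; the 23476 default block cap is the shop-specific disk value mentioned in this answer, not a universal constant.

```python
# Sketch of the VB dataset arithmetic above: LRECL = max record + 4 (RDW),
# minimum BLKSIZE = LRECL + 4 (BDW), and a suggested BLKSIZE capped at a
# device-specific recommendation (assumed here; adjust for your shop).
RDW = 4  # Record Descriptor Word per record
BDW = 4  # Block Descriptor Word per block

def vb_dcb(max_record_len, device_block_cap=23476):
    lrecl = max_record_len + RDW
    min_blksize = lrecl + BDW
    suggested_blksize = device_block_cap if device_block_cap >= min_blksize else min_blksize
    return lrecl, min_blksize, suggested_blksize

print(vb_dcb(1950))  # -> (1954, 1958, 23476), matching the example above
```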
In describing how find (and find and replace) work, the Atom Flight Manual refers to the buffer as one scope of search, with the entire project being another scope. What is the buffer? It seems like it would be the current file, but I expect it is more than that.
From the Atom Flight Manual:
A buffer is the text content of a file in Atom. It's basically the same as a file for most descriptions, but it's the version Atom has in memory. For instance, you can change the text of a buffer and it isn't written to its associated file until you save it.
I also came across this in The Craft of Text Editing by Craig Finseth, Chapter 6:
A buffer is the basic unit of text being edited. It can be any size, from zero characters to the largest item that can be manipulated on the computer system. This limit on size is usually set by such factors as address space, amount of real and/or virtual memory, and mass storage capacity. A buffer can exist by itself, or it can be associated with at most one file. When associated with a file, the buffer is a copy of the contents of the file at a specific time. A file, on the other hand, can be associated with any number of buffers, each one being a copy of that file's contents at the same or at different times.
Files on Unix are assigned to index nodes (inodes), which may vary in layout and content among different systems. Still, there should be common elements across the different implementations. Which are those?
Normally, the following elements should be available (a quick way to inspect most of them from user space is sketched after the list):
Filetype, 9 RWX permission bits
Number of references to the file
Owner's identity
Owner's group
File size in bytes
13 device addresses
Last read date/time
Last write date/time
Last i-node modification date/time
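As a quick illustration (assuming a Python environment; the path is just a placeholder), most of the fields in the list above can be inspected from user space via stat():

```python
# Print the common i-node fields listed above for one file. The on-disk
# layout still differs between file systems; this only shows what the
# stat() interface exposes.
import os
import stat
import time

st = os.stat("example.txt")  # placeholder path - use any existing file

print("type and permissions:", stat.filemode(st.st_mode))  # e.g. -rw-r--r--
print("number of references:", st.st_nlink)
print("owner / group:       ", st.st_uid, "/", st.st_gid)
print("size in bytes:       ", st.st_size)
print("last read:           ", time.ctime(st.st_atime))
print("last write:          ", time.ctime(st.st_mtime))
print("last i-node change:  ", time.ctime(st.st_ctime))
# The 13 device/block addresses are not exposed through stat(); they stay
# internal to the file system implementation.
```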
Is there a simple and quick way to detect encrypted files? I have heard about entropy calculation, but if I calculate it for every file on a drive, it will take days to detect encryption.
Is it possible to, say, calculate some value for the first 100 or 1024 bytes and then decide? Does anyone have sources for that?
I would use a cross-entropy calculation. Calculate the cross-entropy value over X bytes of known encrypted data (it should be near 1, regardless of the type of encryption, etc.); you may want to avoid file headers and footers, as these may contain unencrypted file metadata.
Calculate the entropy for a file; if it's close to 1, then it's either encrypted or /dev/random. If it's quite far away from 1, then it's likely not encrypted. I'm sure you could apply significance tests to this to get a baseline.
It's about 10 lines of Perl; I can't remember what library is used (although, this may be useful: http://dingo.sbs.arizona.edu/~hammond/ling696f-sp03/addonecross.txt)
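Since the Perl snippet is not shown, here is a minimal Python sketch of the same idea; the 1024-byte sample size, the header skip and any threshold you pick are assumptions to tune, not calibrated values.

```python
# Shannon entropy of the first n bytes of a file, normalized to 0..1
# (1.0 = looks like uniformly random data). Skip a header-sized prefix
# to avoid unencrypted metadata, as suggested above.
import math

def normalized_entropy(path, n=1024, skip=0):
    with open(path, "rb") as f:
        f.seek(skip)
        data = f.read(n)
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    entropy = -sum((c / len(data)) * math.log2(c / len(data))
                   for c in counts if c)
    return entropy / 8.0  # 8 bits per byte is the maximum

# Usage sketch: values very close to 1 suggest encrypted, compressed or
# random data; markedly lower values suggest plain content.
# print(normalized_entropy("/path/to/file", skip=512))
```

Note that, as the answers below point out, compressed data will score just as high, so a test like this can only rule files out, not positively identify encryption.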
You could just make a system that recognizes particular common forms of encrypted files (e.g. encrypted zip, rar, vim, gpg, ssl, ecryptfs, and truecrypt files); a sketch of this is included below. Any attempt to determine encryption based on the raw data alone will quickly run into a steganography discussion.
One of the advantages of good encryption is that you can design it so that it can't be detected - see the Wikipedia article on deniable encryption for example.
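A minimal sketch of the signature-based recognition suggested above; the magic-byte table is a partial, illustrative mapping (verify each entry against the relevant format specification), and formats designed for deniability, such as TrueCrypt volumes, deliberately carry no signature at all.

```python
# Identify a few common encrypted/encryptable containers by their leading
# bytes. The signature list is illustrative and incomplete - check the
# format specs before relying on any entry.
SIGNATURES = {
    b"PK\x03\x04":     "zip container (may or may not be encrypted)",
    b"Rar!\x1a\x07":   "rar archive (may or may not be encrypted)",
    b"-----BEGIN PGP": "ASCII-armored PGP data",
    b"Salted__":       "openssl enc output with a salt header",
}

def identify(path):
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, description in SIGNATURES.items():
        if head.startswith(magic):
            return description
    return None  # unknown: fall back to an entropy test, or give up

# print(identify("suspect.bin"))  # hypothetical file name
```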
Every statistical approach to detecting encryption will give you various "false alarms", like compressed data or random-looking data in general.
Imagine I wrote a program that outputs two files: file1 contains 1024 bits of π and file2 is an encrypted version of file1. If you don't know anything about the contents of file1 or file2, there's no way to distinguish them. In fact, it's quite likely that π contains the contents of file2 somewhere!
EDIT:
By the way, it doesn't even work the other way round (detecting unencrypted files). You could write a program that transforms encrypted data into readable English text by assigning words or whole sentences to its bits/bytes.
I am writing an application which uses DCMMKDIR. We save our images under somewhat longer names, but when I try to use the DCMMKDIR application, it asks me to input a file name of at most 8 characters.
Presently, I am planning to rename my images from 1 to N, but remapping these images back to their known names (on disk) will be a bit difficult, I feel.
Are there any other methods/processes to achieve the same?
The restriction of the filename to eight characters is derived from the DICOM Standard to ensure compatibility with applications that support e.g. only ISO 9660 as a file system for CDs.
Regarding the naming, you can have a look at the specification for the German CD Testat (http://www.dicom-cd.de/docs/DRG-RequirementsSpecification-2006.pdf). As a vendor you can be certified to conform to certain standards for interoperability of patient CDs, which is currently the most common use of DICOMDIRs.
The DICOMDIR file generated by DCMMKDIR is mainly a kind of index that tells an application which DICOM files exist in a certain directory, and this kind of file structure is most commonly used for transfer media.
It is very common to use subdirectories to circumvent the restriction to 8-character filenames. For example, a file could be stored as 20110331/183421/000001, which identifies it properly by date, time and index without exceeding the somewhat arcane filename limit.
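As an illustration of that layout (not part of DCMMKDIR itself), here is a small Python sketch; in practice you would take the date and time from the DICOM header rather than from the file system, and keep a mapping file so the short names can be traced back to the original long image names.

```python
# Map a capture timestamp and a running index to a date/time/index path
# whose components each stay within the 8-character limit, as in
# 20110331/183421/000001. The timestamp source is an assumption here.
import os
import time

def dicomdir_safe_path(timestamp, index):
    t = time.localtime(timestamp)
    day = time.strftime("%Y%m%d", t)  # e.g. 20110331
    tod = time.strftime("%H%M%S", t)  # e.g. 183421
    return os.path.join(day, tod, "%06d" % index)

print(dicomdir_safe_path(time.time(), 1))
```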
I want to get a better understanding of how disk reads work for a simple ls command and for a cat * command on a particular folder.
As I understand it, disk reads are the "slowest" operation for a server/any machine, and a webapp I have in mind will be making ls and cat * calls on a certain folder very frequently.
What are "ball park" estimates of the disk reads involved for an "ls" and for a "cat *" for the following number of entries?
Number of entries    Disk reads for ls    Disk reads for cat *
200                  ?                    ?
2,000                ?                    ?
20,000               ?                    ?
200,000              ?                    ?
Each file contains just a single line of text.
Tricky to answer, which is probably why it took so long to get any answer at all.
In part, the answer will depend on the file system - different file systems will give different answers. However, doing 'ls' requires reading the pages that hold the directory entries, plus reading the pages that hold the inodes identified in the directory. How many pages that is - and therefore how many disk reads - depends on the page size and on the directory size. If you think in terms of 6-8 bytes of overhead per file name, you won't be too far off. If the names are about 12 characters each, then you have about 20 bytes per file, and if your pages are 4096 bytes (4KB), then you have about 200 files per directory page.
If you just list names and not other attributes with 'ls', you are done. If you list attributes (size, etc), then the inodes have to be read too. I'm not sure how big a modern inode is. Once upon a couple of decades ago on a primitive file system, they were 64 bytes each; they might have grown since then. There will be a number of inodes per page, but you can't be sure that the inodes you need are contiguous (adjacent to each other on disk). In the worst case, you might need to read another page for each separate file, but that is pretty unlikely in practice. Fortunately, the kernel is pretty good about caching disk pages, so it is unlikely to have to reread a page. It is impossible for us to make a good guess on the density of the relevant inode entries; it might be, perhaps, 4 inodes per page, but any estimate from 1 to 64 might be plausible. Hence, you might have to read 50 pages for a directory containing 200 files.
When it comes to running 'cat' on the files, the system has to locate the inode for each file, just as with 'ls'; it then has to read the data for the file. Unless the data is stored in the inode itself (I think that is/was possible in some file systems with biggish inodes and small enough file bodies), then you have to read one page per file - unless partial pages for small files are bunched together on one page (again, I seem to remember hearing that could happen in some file systems).
So, for a 200-file directory:
Plain ls: 1 page
ls -l: 51 pages
cat *: 251 pages
I'm not sure I'd trust the numbers very far - but you can see the sort of data that is necessary to improve the estimates.
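To make the estimates easy to rerun, here is a small sketch that encodes the assumptions above (4 KB pages, roughly 20 bytes per directory entry, about 4 usable inodes per page, one data page per small file) and fills in the question's table; every constant is a guess from this answer, not a measured value.

```python
# Back-of-envelope page-read estimates for ls, ls -l and cat * over a
# directory of n small files, using only the assumptions stated above.
import math

PAGE_SIZE = 4096
BYTES_PER_ENTRY = 20   # ~12-character name plus 6-8 bytes of overhead
INODES_PER_PAGE = 4    # pessimistic guess; anywhere from 1 to 64 is plausible

def estimate(n_files):
    dir_pages = math.ceil(n_files * BYTES_PER_ENTRY / PAGE_SIZE)
    inode_pages = math.ceil(n_files / INODES_PER_PAGE)
    return {
        "ls": dir_pages,
        "ls -l": dir_pages + inode_pages,
        "cat *": dir_pages + inode_pages + n_files,  # one data page per file
    }

for n in (200, 2_000, 20_000, 200_000):
    print(n, estimate(n))
```

For 200 files this reproduces the 1 / 51 / 251 figures above; caching and on-disk layout will shift the real numbers, so treat the output as an upper-bound sketch.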