dos2unix modifies binary files - why - centos6

By default it is not supposed to affect binary files.
I tested it in a folder with images and although most images were not affected, a few were. If dos2unix cannot tell a binary file from a text file, must I resort to specifically including and/or excluding certain file extensions for it to work properly?
NOTE: when I run file image.jpg on any of the jpgs, whether it got modified or not, the result is:
JPEG image data, JFIF standard 1.01

This is the relevant part of the dos2unix source code:
if ((ipFlag->Force == 0) &&
    (TempChar < 32) &&
    (TempChar != 0x0a) &&   /* Not an LF */
    (TempChar != 0x0d) &&   /* Not a CR */
    (TempChar != 0x09) &&   /* Not a TAB */
    (TempChar != 0x0c)) {   /* Not a form feed */
  RetVal = -1;
  ipFlag->status |= BINARY_FILE;
  if (ipFlag->verbose) {
    if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
    d2u_fprintf(stderr, "%s: ", progname);
    d2u_fprintf(stderr, _("Binary symbol 0x00%02X found at line %u\n"), TempChar, line_nr);
  }
  break;
}
It seems that if the file contains any other control character it is considered a binary file and is skipped; otherwise it is processed as a text file. So if a binary file (e.g. an image) happens to contain none of these characters, it will be converted and corrupted.
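To check in advance which files would pass that test, the same check can be pulled out into a tiny standalone program. This is only a minimal sketch mirroring the snippet above, not part of dos2unix itself:

#include <stdio.h>

/* Report whether dos2unix's heuristic (as quoted above) would classify a
 * file as binary: any byte below 32 that is not LF, CR, TAB or form feed. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 2;
    }
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c < 32 && c != 0x0a && c != 0x0d && c != 0x09 && c != 0x0c) {
            printf("%s: would be skipped as binary (byte 0x%02X)\n", argv[1], c);
            fclose(f);
            return 1;
        }
    }
    printf("%s: would be treated as text and converted\n", argv[1]);
    fclose(f);
    return 0;
}

A JPEG that happens to contain none of those control bytes falls into the "treated as text" case, which is exactly when it gets corrupted.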

In principle there is no such thing as a "binary" or a "text" file - all files are just sequences of bytes.
Most programs that try to tell them apart just use some heuristic to rule out files which contain characters unusual for text (typically, characters below 32) or which do not contain characters typically found in text (for example whitespace, as shown in @Andrey's answer).
This is just a kindness done to you to avoid accidental mistakes, but it comes "without warranty of any kind", since it's entirely possible to have "binary" files which use only ASCII characters (it's easy to build, say, PPM and COM files which pass the test above).

Related

grep netlist and cell name from file

Suppose directory A contains two files, fileA and netlist.tcl. Below are the contents of fileA:
Source "netlist.tcl"
cell "chklist"
I want that when the user selects fileA in a combobox in the GUI, the netlist string field is automatically filled with the real path of netlist.tcl taken from fileA, and the cell string field with the cell name taken from fileA. How can I achieve this?
Expected output:
A/netlist.tcl
chklist
There are several ways to parse such a file. Here's one of the nicer ones, using a safe child interpreter:
interp create -safe i
i alias Source netlistSource
i alias cell netlistCell

proc netlistSource {filename} {
    global fn
    set fn $filename
    return
}
proc netlistCell {cellname} {
    global cn
    set cn $cellname
    return
}

i invokehidden source "fileA"
interp delete i
That will store netlist.tcl in fn and chklist in cn. I'm not sure where the A/ prefix comes from, so I've left that part out.
Real code might need more aliases setting up. I hope you can see how easy that is to do. Remember, the aliases are called in the child, but call into your nominated code in the parent interpreter; it's a bit like doing an OS system call but with much less overhead. (Safe interpreters have all commands that touch the OS disabled/hidden by default.)
If all you need is just basic file I/O and string matching, then this would be a start. You would just need to adapt this into whatever you're doing with the GUI.
# File I/O
set fp [open fileA "r"]
set lines [split [read $fp] "\n"]
close $fp

# Check lines for netlist and cell
foreach line $lines {
    if {[string match "Source*" $line]} {
        set netlist [lindex $line 1]
        if {[file exists $netlist]} {
            puts [file normalize "./$netlist"]
        }
    }
    if {[string match "cell*" $line]} {
        set cell [lindex $line 1]
        puts "$cell"
    }
}
There are multiple ways to do this kind of work: regexes vs. string matching, opening the file in Tcl vs. executing a system call to grep vs. using the fileutil package, and so on.
There's nothing here that isn't covered in a Tcl introduction such as https://wiki.tcl-lang.org/page/Tcl+Tutorial+Index. It would be helpful to understand what you've already tried.
Donal's previous answer using a safe interpreter is pretty cool too, if you understand what's happening.

I always end up with mixed line-endings in Atom

I have a package installed (Line Ending Selector) which can tell me what line endings (LF or CRLF or Mixed) are used in a file.
By default it's LF (which is my preferred one), but during the editing procedure it sometimes becomes Mixed and then I always have to set it back to LF manually.
It's very annoying when I forget to set all line endings back to LF and push a file to GitHub with mixed line endings; then someone pulls it, pushes their changes, and half of that commit is just line-ending changes that their editor made to the file because it corrects line endings automatically - unlike mine.
Can Atom also have this functionality? Is there a way to ensure that (for example on save) all line endings of a file are set to LF?
You can make use of .gitattributes to deal with line-endings:
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.c text
*.h text
# Declare files that will always have CRLF line endings on checkout.
*.sln text eol=crlf
# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary
Related: GitHub User Documentation: Dealing with line endings
Alternatively, you can use EditorConfig to achieve the same. It's especially useful to enforce consistent settings across teams or open-source contributors.
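For example, a minimal .editorconfig at the repository root could look like the following. The exact settings are an assumption about what you want, and Atom needs an EditorConfig plugin (such as the editorconfig package) to pick them up:

# .editorconfig at the repository root
root = true

# use LF line endings and UTF-8 everywhere, and end every file with a newline
[*]
end_of_line = lf
charset = utf-8
insert_final_newline = true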
I've found a package (Force Line Endings) that does exactly what I need. I wonder why others are not having this same problem, and why this package has so few downloads.

How do I convert my 5GB 1 liner file to lines based on pattern?

I have a 5GB one-line file with JSON data, and each record starts with the pattern "{"created". I need to be able to use Unix commands on my Mac to convert this monster of a one-liner into as many lines as it deserves. Any commands?
ASCII English text, with very long lines, with no line terminators
If you have enough memory you can open the file once with the TextWrangler application (the free BBEdit cousin) and use regular search/replace on the whole file. Use \r in the replace field to add a return. It will be very slow at opening the file and may even hang if you are low on memory, but in the end it will probably work. No scripting, no commands, etc. I did this with big SQL files and sometimes it did the job.
You have to replace your line-start string with the same string with \n or \r or \r\n in front of it.
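If you'd rather not load the whole 5GB into an editor, the same replacement can be done in a streaming fashion. Here is a minimal C sketch under that assumption: it takes the pattern from the question, reads standard input and writes standard output, so the file never has to fit in memory:

#include <stdio.h>
#include <string.h>

/* Insert a newline before every occurrence of the pattern while copying
 * stdin to stdout. This simple matcher is enough here because the pattern's
 * first character does not occur again inside the pattern. Note that it
 * also emits a newline before the very first record. */
int main(void)
{
    const char *pattern = "{\"created";
    const size_t plen = strlen(pattern);
    size_t matched = 0;   /* number of pattern bytes matched so far */
    int c;

    while ((c = getchar()) != EOF) {
        if ((char)c == pattern[matched]) {
            if (++matched == plen) {              /* full match found */
                putchar('\n');
                fwrite(pattern, 1, plen, stdout);
                matched = 0;
            }
        } else {
            if (matched > 0) {                    /* flush a partial match */
                fwrite(pattern, 1, matched, stdout);
                matched = 0;
            }
            if ((char)c == pattern[0])            /* byte may start a new match */
                matched = 1;
            else
                putchar(c);
        }
    }
    if (matched > 0)                              /* trailing partial match */
        fwrite(pattern, 1, matched, stdout);
    return 0;
}

Compile it once and run it as, say, ./split < one_liner.json > many_lines.json (the file names are just examples).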
It's unclear how it can be a "one-liner" file if each line starts with "{"created", but perhaps python -mjson.tool can help you get started:
cat your_source_file.json | python -mjson.tool > nicely_formatted_file.json
Piping raw JSON through python -mjson.tool will cleanly format the JSON to make it more human-readable. More info here.
OS X ships with both flex and bison, you can use those to write a parser for your data.
You can use PHP as a shell command (if PHP is installed): just save a text file named "myscript" with the appropriate code. (I cannot test the code right now, but the idea is as follows.)
UNTESTED CODE
#!/usr/bin/php
<?php
$REPLACE_STRING = '{"created';  // the pattern each record starts with

// open the input file in read mode and the output file in write mode
$inFp  = fopen('big_in_file.txt', 'r');
$outFp = fopen('big_out_file.txt', 'w+');

// tail of the previous chunk, kept in case the pattern is split
// across a chunk boundary
$carry = '';

while (!feof($inFp)) {
    // read the file in chunks and prepend whatever was carried over
    $chunk = $carry . fread($inFp, 8192);

    // hold back the last strlen($REPLACE_STRING)-1 bytes for the next
    // iteration, unless we have reached the end of the file
    if (!feof($inFp)) {
        $keep  = strlen($REPLACE_STRING) - 1;
        $carry = substr($chunk, -$keep);
        $chunk = substr($chunk, 0, -$keep);
    } else {
        $carry = '';
    }

    // add a newline in front of every occurrence of the pattern
    // (use "\r\n" instead of "\n" for Windows line endings)
    $chunk = str_replace($REPLACE_STRING, "\n" . $REPLACE_STRING, $chunk);

    // write the converted chunk to the output file
    fwrite($outFp, $chunk);
}

fclose($inFp);
fclose($outFp);
?>
After you save it you must make it executable with sudo chmod a+x ./myscript
and then launch it as ./myscript in the terminal.
After this, the myscript file works like a regular Unix command.

Has there ever been a unix system call to create a link from an open file descriptor? [duplicate]

In Unix, it's possible to create a handle to an anonymous file by, e.g., creating and opening it with creat() and then removing the directory link with unlink() - leaving you with a file with an inode and storage but no possible way to re-open it. Such files are often used as temp files (and typically this is what tmpfile() returns to you).
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
When poking through the relevant system call functions I expected to find a version of link() called flink() (compare with chmod()/fchmod()) but, at least on Linux this doesn't exist.
Bonus points for telling me how to create the anonymous file without briefly exposing a filename in the disk's directory structure.
A patch for a proposed Linux flink() system call was submitted several years ago, but when Linus stated "there is no way in HELL we can do this securely without major other incursions", that pretty much ended the debate on whether to add this.
Update: As of Linux 3.11, it is now possible to create a file with no directory entry using open() with the new O_TMPFILE flag, and link it into the filesystem once it is fully formed using linkat() on /proc/self/fd/fd with the AT_SYMLINK_FOLLOW flag.
The following example is provided on the open() manual page:
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
/* File I/O on 'fd'... */
snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", AT_SYMLINK_FOLLOW);
Note that linkat() will not allow open files to be re-attached after the last link is removed with unlink().
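Filled out with the headers and error handling it needs, that flow looks roughly like the following sketch (the directory and target path are placeholders):

#define _GNU_SOURCE          /* for O_TMPFILE */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char path[PATH_MAX];

    /* Create an unnamed file in the target directory (placeholder path). */
    int fd = open("/some/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1) { perror("open(O_TMPFILE)"); return 1; }

    /* Write the complete contents before the file becomes visible. */
    if (write(fd, "hello\n", 6) != 6) { perror("write"); return 1; }

    /* Give the now fully formed file a name. */
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, path, AT_FDCWD, "/some/dir/data.txt",
               AT_SYMLINK_FOLLOW) == -1) {
        perror("linkat");
        return 1;
    }
    close(fd);
    return 0;
}

If the target name already exists, linkat() fails with EEXIST, so the file either appears fully formed under that name or not at all.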
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
If this is your only goal, you can achieve this in a much simpler and more widely used manner. If you are outputting to a.dat:
1. Open a.dat.part for writing.
2. Write your data.
3. Rename a.dat.part to a.dat.
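A minimal C sketch of those three steps (the filenames are placeholders):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* 1. Open the temporary name for writing. */
    FILE *f = fopen("a.dat.part", "w");
    if (!f) { perror("fopen"); return 1; }

    /* 2. Write the complete data. */
    fputs("complete file contents\n", f);

    /* Flush user-space buffers and ask the kernel to commit the data,
     * so a reader of a.dat never sees a half-written file. */
    fflush(f);
    fsync(fileno(f));
    fclose(f);

    /* 3. Atomically give it the final name (both names must be on the
     * same filesystem for rename() to be atomic). */
    if (rename("a.dat.part", "a.dat") == -1) { perror("rename"); return 1; }
    return 0;
}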
I can understand wanting to be neat, but unlinking a file and relinking it just to be "neat" is kind of silly.
This question on serverfault seems to indicate that this kind of re-linking is unsafe and not supported.
Thanks to @mark4o for posting about linkat(2); see his answer for details.
I wanted to give it a try to see what actually happens when trying to link an anonymous file back into the filesystem it is stored on (often /tmp, e.g. for video data that Firefox is playing).
As of Linux 3.16, there still appears to be no way to undelete a deleted file that's still held open. Neither AT_SYMLINK_FOLLOW nor AT_EMPTY_PATH for linkat(2) do the trick for deleted files that used to have a name, even as root.
The only alternative is tail -c +1 -f /proc/19044/fd/1 > data.recov, which makes a separate copy, and you have to kill it manually when it's done.
Here's the perl wrapper I cooked up for testing. Use strace -eopen,linkat linkat.pl - </proc/.../fd/123 newname to verify that your system still can't undelete open files. (Same applies even with sudo). Obviously you should read code you find on the Internet before running it, or use a sandboxed account.
#!/usr/bin/perl -w
# 2015 Peter Cordes <peter@cordes.ca>
# public domain. If it breaks, you get to keep both pieces. Share and enjoy
# Linux-only linkat(2) wrapper (opens "." to get a directory FD for relative paths)

if ($#ARGV != 1) {
    print "wrong number of args. Usage:\n";
    print "linkat old new \t# will use AT_SYMLINK_FOLLOW\n";
    print "linkat - <old new\t# to use the AT_EMPTY_PATH flag (requires root, and still doesn't re-link arbitrary files)\n";
    exit(1);
}

# use POSIX qw(linkat AT_EMPTY_PATH AT_SYMLINK_FOLLOW); #nope, not even POSIX linkat is there
require 'syscall.ph';
use Errno;

# /usr/include/linux/fcntl.h
# #define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
# #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
# #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
unless (defined &AT_SYMLINK_NOFOLLOW) { sub AT_SYMLINK_NOFOLLOW() { 0x0100 } }
unless (defined &AT_SYMLINK_FOLLOW ) { sub AT_SYMLINK_FOLLOW () { 0x0400 } }
unless (defined &AT_EMPTY_PATH ) { sub AT_EMPTY_PATH () { 0x1000 } }

sub my_linkat ($$$$$) {
    # tmp copies: perl doesn't know that the string args won't be modified.
    my ($oldp, $newp, $flags) = ($_[1], $_[3], $_[4]);
    return !syscall(&SYS_linkat, fileno($_[0]), $oldp, fileno($_[2]), $newp, $flags);
}

sub linkat_dotpaths ($$$) {
    open(DOTFD, ".") or die "open . $!";
    my $ret = my_linkat(DOTFD, $_[0], DOTFD, $_[1], $_[2]);
    close DOTFD;
    return $ret;
}

sub link_stdin ($) {
    my ($newp, ) = @_;
    open(DOTFD, ".") or die "open . $!";
    my $ret = my_linkat(0, "", DOTFD, $newp, &AT_EMPTY_PATH);
    close DOTFD;
    return $ret;
}

sub linkat_follow_dotpaths ($$) {
    return linkat_dotpaths($_[0], $_[1], &AT_SYMLINK_FOLLOW);
}

## main
my $oldp = $ARGV[0];
my $newp = $ARGV[1];

# link($oldp, $newp) or die "$!";
# my_linkat(fileno(DIRFD), $oldp, fileno(DIRFD), $newp, AT_SYMLINK_FOLLOW) or die "$!";

if ($oldp eq '-') {
    print "linking stdin to '$newp'. You will get ENOENT without root (or CAP_DAC_READ_SEARCH). Even then doesn't work when links=0\n";
    $ret = link_stdin( $newp );
} else {
    $ret = linkat_follow_dotpaths($oldp, $newp);
}

# either way, you still can't re-link deleted files (tested Linux 3.16 and 4.2).
# print STDERR
die "error: linkat: $!.\n" . ($!{ENOENT} ? "ENOENT is the error you get when trying to re-link a deleted file\n" : '') unless $ret;

# if you want to see exactly what happened, run
#   strace -eopen,linkat linkat.pl
Clearly, this is possible -- fsck does it, for example. However, fsck does it with major localized file system mojo and will clearly not be portable, nor executable as an unprivileged user. It's similar to the debugfs comment above.
Writing that flink(2) call would be an interesting exercise. As ijw points out, it would offer some advantages over current practice of temporary file renaming (rename, note, is guaranteed atomic).
Kind of late to the game but I just found http://computer-forensics.sans.org/blog/2009/01/27/recovering-open-but-unlinked-file-data which may answer the question. I haven't tested it, though, so YMMV. It looks sound.

Displaying unicode string using opengl, qt and freetype

I want to display text in OpenGL using FTGL (a wrapper for FreeType2) in Qt. I have a problem with Unicode: FTGL needs the charcode of the character currently being rendered in order to fetch the glyph from the TrueType font, which is problematic when that character is, for example, one of 'Zażółć gęślą jaźń'.
Do you have any idea why this code:
const unsigned char *string = (const unsigned char*)"POCZUJ GĘŚLĄ JAŹŃ";

// for multibyte - we can't rely on sizeof(T) == character
FTUnicodeStringItr<T> ustr(string);
for (int i = 0; (len < 0 && *ustr) || (len >= 0 && i < len); i++)
{
    unsigned int thisChar = *ustr++;
    unsigned int nextChar = *ustr;

    if (CheckGlyph(thisChar))
    {
        position += glyphList->Render(thisChar, nextChar,
                                      position, renderMode);
    }
}
works in Visual Studio but not in Qt (it doesn't get the proper charcodes, so it displays brackets)?
The FTUnicodeStringItr template looks like this: http://www.nopaste.pl/11xt
Thanks.
The problem you're running into is that C/C++ don't really support Unicode/wide characters in source code. To do this properly you'd have to specify the Unicode characters either with \uXXXX escape sequences (if the compiler supports them) or with \xXX\xXX sequences that build the code points from scratch. Visual C++ has wide-character support (for the simple reason that all string manipulation in Windows is done with wide characters - don't confuse that with Unicode code points!).
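To illustrate the escape-sequence route, here is a minimal sketch in plain C; the code points are the Polish letters from the question, and a UTF-8 execution character set is assumed:

#include <stdio.h>

int main(void)
{
    /* "POCZUJ GĘŚLĄ JAŹŃ" spelled with universal character names, so the
     * source file stays pure ASCII and the compiler's assumption about the
     * source encoding no longer matters. */
    const char *s = "POCZUJ G\u0118\u015AL\u0104 JA\u0179\u0143";
    printf("%s\n", s);
    return 0;
}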
I suggest you do the following: Qt has internationalization support built in. It boils down to defining strings through the tr(...) helper function/macro in a default language, i.e. English strings. Qt Linguist can then be used to create substitution rules, which are applied through the use of tr(...) with full Unicode support.
