What do programs see when ZFS can't deliver uncorrupted data?

Say my program attempts a read of a byte in a file on a ZFS filesystem. ZFS can locate a copy of the necessary block, but cannot locate any copy with a valid checksum (they're all corrupted, or the only disks present have corrupted copies). What does my program see, in terms of the return value from the read, and the byte it tried to read? And is there a way to influence the behavior (under Solaris, or any other ZFS-implementing OS), that is, force failure, or force success, with potentially corrupt data?

EIO is indeed the only answer with current ZFS implementations.
An open ZFS "bug" asks for some way to read corrupted data:
http://bugs.opensolaris.org/bugdatabase/printableBug.do?bug_id=6186106
I believe this is already doable using the undocumented but open source zdb utility.
Have a look at http://www.cuddletech.com/blog/pivot/entry.php?id=980 for an explanation of how to dump a file's contents using the zdb -R option with the "r" flag.

Solaris 10:
# Create a test pool
[root@tesalia z]# cd /tmp
[root@tesalia tmp]# mkfile 100M zz
[root@tesalia tmp]# zpool create prueba /tmp/zz
# Fill the pool
[root@tesalia /]# dd if=/dev/zero of=/prueba/dummy_file
dd: writing to `/prueba/dummy_file': No space left on device
129537+0 records in
129536+0 records out
66322432 bytes (66 MB) copied, 1.6093 s, 41.2 MB/s
# Export (unmount) the pool
[root@tesalia /]# zpool export prueba
# Corrupt the pool on purpose
[root@tesalia /]# dd if=/dev/urandom of=/tmp/zz seek=100000 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0715209 s, 7.2 kB/s
# Mount the pool again
zpool import -d /tmp prueba
# Try to read the corrupted data
[root@tesalia tmp]# md5sum /prueba/dummy_file
md5sum: /prueba/dummy_file: I/O error
# Read the manual
[root@tesalia tmp]# man -s2 read
[...]
RETURN VALUES
Upon successful completion, read() and readv() return a
non-negative integer indicating the number of bytes actually
read. Otherwise, the functions return -1 and set errno to
indicate the error.
ERRORS
The read(), readv(), and pread() functions will fail if:
[...]
EIO A physical I/O error has occurred, [...]
You must export and re-import the test pool. Otherwise the direct overwrite (the deliberate corruption) goes unnoticed, because the file is still cached in OS memory.
And no, currently ZFS will refuse to give you corrupted data. As it should.
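From a script's point of view, the failure above simply shows up as a failed read. A minimal sketch, assuming a POSIX shell; the function name can_read_fully is made up for illustration, and /prueba/dummy_file is the test file from the transcript:

```shell
# Hypothetical helper (not part of ZFS): succeeds only if every byte of the
# file can be read. On ZFS, a block with no valid copy makes the underlying
# read() fail with EIO, so cat exits non-zero.
can_read_fully() {
    cat -- "$1" > /dev/null 2>&1
}

# Example call, using the test pool and file from the transcript above:
#   can_read_fully /prueba/dummy_file || zpool status -v prueba
```

zpool status -v lists the files affected by unrecoverable errors, which is usually the next thing to check after such a failure.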

How would returning anything but an EIO error from read() make sense outside a file system specific low level data rescue utility?
The low-level data rescue utility would need to use an OS- and FS-specific API other than open/read/write/close to access the file. The semantics it would need are fundamentally different from reading normal files, so it would need a specialized API.

Related

Stress-ng - Overload Memory

I want to test the system's reaction to a process that wants to consume more memory than is available.
I run stress-ng with the following command (on a 6G RAM machine):
stress-ng --vm-bytes 8G --vm-keep -m 1 --aggressive
but I get this error:
stress-ng: error: [5035] stress-ng-vm: gave up trying to mmap, no available memory
Is it possible to force the program to ignore its own safety mechanism?
Try adding the parameter --vm 4.
I had the same problem, and it went away after that.

How to input many big static json files into logstash?

My inputs are hundreds of big 1-line json files (~10MB-20MB).
After getting out-of-memory errors with my real setup (with two custom filters), I simplified the setup to isolate the problem.
logstash --verbose -e 'input { tcp { port => 5000 } } output { file { path => "/dev/null" } }'
My test input is a multi-level nested object in json:
$ ls -sh example_fixed.json
9.7M example_fixed.json
If I send the file once, it works fine. But if I do:
$ repeat 50 cat example_fixed.json|nc -v localhost 5000
I get the error message:
Logstash startup completed
Using version 0.1.x codec plugin 'line'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Opening file {:path=>"/dev/null", :level=>:info}
Starting stale files cleanup cycle {:files=>{"/dev/null"=>#<IOWriter:0x6f51765 @active=true, @io=#<File:/dev/null>>}, :level=>:info}
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
I have determined that the error triggers if I send the input more than 30 times with a heap size of 500MB. If I increase heap size, this limit goes up accordingly.
However, from the documentation I understand that logstash should be able to throttle the input when it cannot process the events quickly enough.
In fact, if I add a sleep 0.1 after sending each batch of new events, it can handle up to 100 repetitions, but not 1000. So I assume the input is not being throttled properly: whenever the input rate is higher than the processing rate, it is only a matter of time before the heap fills up and logstash crashes.
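For reference, the paced variant described above can be written as a small loop. This is only a sketch of the sender side, assuming a sleep that accepts fractional seconds (GNU and BSD sleep do); send_paced is a made-up name, and it does nothing to fix throttling inside logstash itself:

```shell
# Hypothetical helper: write FILE to stdout COUNT times, pausing DELAY
# seconds after each copy. Pipe the result into nc to feed the tcp input.
send_paced() {
    count=$1
    delay=$2
    file=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        cat -- "$file"
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example call (same file and port as in the question):
#   send_paced 100 0.1 example_fixed.json | nc -v localhost 5000
```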

IIS: Download large file with wget - connection always fails with Bitrate Throttling

I cannot get IIS to serve a file download with Bitrate Throttling enabled. I set the limit to 100kb/s. Without the bitrate limit there is no problem, but with the limit the download fails.
I'm using a code similar to described in this article:
Securing Large Downloads Using C# and IIS 7
I also tried switching off IIS Bitrate Throttling and controlling the bitrate "by hand", calculating it with TimeSpan and calling Thread.Sleep(10) in a while...
But all my attempts were useless, and I don't get any exceptions.
To test the download I use wget, like this:
wget -t 1 http://db.realestate.ru/yrl/RealEstateExportToYandex.xml
(you can try it with wget for windows)
This is a 240Mb text file. wget always stops at a random point in the download (5% - 60%) and reports this error:
Read error at byte ... (Connection reset by peer).
Maybe the problem is not with IIS, because it works well on my localhost, but not online on a heavily loaded server.
Solved with these parameters in the wget command:
wget -t 1 --header="Keep-Alive: 30000" -nv http://db.realestate.ru/yrl/RealEstateExportToYandex.xml

exit code for rsync if there is no modification done to destination folder

I'm using rsync on Solaris and couldn't find an exit code that indicates that no file or folder was modified, added, or deleted in the destination folder. How can I get that status if rsync doesn't provide one?
0 Success
1 Syntax or usage error
2 Protocol incompatibility
3 Errors selecting input/output files, dirs
4 Requested action not supported: an attempt was made to manipulate 64-bit
files on a platform that cannot support them; or an option was specified
that is supported by the client and not by the server.
5 Error starting client-server protocol
6 Daemon unable to append to log-file
10 Error in socket I/O
11 Error in file I/O
12 Error in rsync protocol data stream
13 Errors with program diagnostics
14 Error in IPC code
20 Received SIGUSR1 or SIGINT
21 Some error returned by waitpid()
22 Error allocating core memory buffers
23 Partial transfer due to error
24 Partial transfer due to vanished source files
25 The --max-delete limit stopped deletions
30 Timeout in data send/receive
35 Timeout waiting for daemon connection
Thank you
There is a workaround:
rsync --log-format=%f ...
Note that rsync outputs files anytime any attribute changes, not only if the content of the file is updated.
There is also a -i option (or --log-format=%i) that itemizes all of the changes. See the rsync man page for details of the output format.
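Building on the -i suggestion, a wrapper can turn "did rsync print anything?" into an exit status. A rough sketch, assuming that with -i and without -v rsync prints one line per changed item and nothing for untouched ones; the function name and the return codes 0/1/2 are my own convention, not something rsync defines:

```shell
# Hypothetical wrapper around rsync. The return codes are this sketch's own
# convention, not rsync exit codes:
#   0 = rsync succeeded and itemized at least one change
#   1 = rsync succeeded and reported no changes
#   2 = rsync itself failed
sync_and_report() {
    changes=$(rsync -a -i "$@") || return 2
    if [ -n "$changes" ]; then
        printf '%s\n' "$changes"
        return 0
    fi
    return 1
}

# Example call (paths are placeholders):
#   sync_and_report /data/src/ /data/dst/; echo "status: $?"
```

Attribute-only changes also produce itemized output, so "changed" here means any itemized change, not only new file content.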

Unable to rsync between my server and my Mac

I have a server where I store data from Mac A and Mac B.
I use rsync to keep the files updated between my Macs.
I run the following code unsuccessfully
#!/bin/zsh
# to copy files from my server to my folder
rsync -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
# to copy files from my folder to my server
rsync -Pav ~/Dropbox/Courses/math $Masi:~/private/
I get the following error message
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.5]
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]
I have ssh keys in place so the connection should work, since I can use scp without problems.
How can I use rsync between my server and one of my Macs?
I used to do a lot of this. Just ran a test, a few suggestions.
Spell out your entire user@host pattern
Run the ssh connection sans the rsync first, you may need to first approve your fingerprint
You do not seem to pass a flag to protect extended attributes, this can yield broken files on OS X. If you do not need resource forks, you are OK, but most of the time you do need them.
My test case:
$ rsync -Pav ~/Desktop/ me@remote.example.com:~/rsyc-test
In that case, all the files within ~/Desktop were copied to the remote host, in my home dir. Since the directory 'rsyc-test' did not exist, it was made for me. I had a .app on my Desktop, it made it over, surprisingly, it works. Even some .webloc files made it and appear to work, though I do not trust it.
I would strongly suggest adding in the -E flag
-E, --extended-attributes
Apple specific option to copy extended attributes, resource
forks, and ACLs. Requires at least Mac OS X 10.4 or suitably
patched rsync.
I ran a new test: I moved an Interarchy bookmark to my desktop, since I know for a fact these break if they are copied sans resource forks. Running without -E versus with -E, there is a difference of 152 bytes in transferred data. The first file on the remote machine did not work; the second transferred file did work.
I can not help but notice in your example one of your paths is ~/Dropbox so this may all not matter, since DropBox, the app, does not at all support resource forks currently, though I hear there are plans to in the future.
You are also not passing the --delete flag. If your end goal is a mirror of your data, you are not getting that. If your end goal is a backup that continually grows, keeping everything that was ever on the source, the lack of --delete is good.
Other notes:
You can exclude those silly .DS_Store files
--exclude '.DS_Store'
You can also set rsync up in a way to be a true mirror, so you would not need to run your other command, see the man page for details.
My final working command to shove the Desktop of my laptop to a remote machine:
$ rsync -PEav --delete --exclude '.DS_Store' ~/Desktop/ me@remote.example.com:~/rsycn-test
Check "$Masi". Is that the hostname you are trying to reach?
Try the following command to debug it:
rsync -e 'ssh -v' -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
The Connection refused usually happens when there is a connection issue to the remote (e.g. firewall).
In your case the problem is that the $Masi variable is empty. If it is not meant to be a variable, use Masi.
As per this error:
ssh: connect to host  port 22: Connection refused
Notice the double space above after the word "host".
The "connect to host" message doesn't say which host, so you're trying to connect to an empty host. It sounds like a typo in the host name.
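One way to catch this before rsync or ssh ever runs is to check the host value first. A small sketch; remote_path is a made-up helper, and $Masi is the variable from the question:

```shell
# Hypothetical guard: build "host:path" for rsync, but refuse an empty host.
# With an empty host the argument would collapse to ":path", and ssh would
# then try to connect to a host with no name.
remote_path() {
    host=$1
    path=$2
    if [ -z "$host" ]; then
        echo "remote host is empty; set it before calling rsync" >&2
        return 1
    fi
    printf '%s:%s\n' "$host" "$path"
}

# Example call (mirrors the first rsync command in the question):
#   src=$(remote_path "$Masi" '~/private/') && rsync -Pav "$src" ~/Dropbox/Courses/math/
```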
