I have a gzip file of size 325 MB. I just figured it that it is truncated by 361 bytes from the beginning.
Please advise how can I recover the compressed files from it.
You need to find the next deflate block boundary. Such a boundary can occur at any bit location. You will need to attempt decompression starting at every bit until you get successful decoding for at least a few deflate blocks.
You can use zlib's inflatePrime() to feed less than a byte to inflate(). You can use inflateSetDictionary() to provide a faux 32K dictionary to precede the data being inflated, in order to avoid distance-too-far-back errors.
Once you find a block boundary, you have solved half the problem. The next half is to find where in the deflate stream there is no longer a dependence on the unknown uncompressed data derived from that missing 361 bytes of compressed data. It is possible for such a dependency to very long lasting. For example, if the word " the " appears in that missing section, then it can be referred to after that as a missing string. However, you don't know that it is " the ". All you know is that there is a reference to a five-byte string in the missing data. Then where that five-byte string is copied to can itself be referenced by a later match. This could, in principle, propagate through the entire 325 MB, making the whole thing completely unrecoverable.
However that is unlikely. It is more likely that at some point the propagation of strings from the first 361 bytes stops. From there on, you can recover the uncompressed data.
In order to tell whether you are still seeing propagation or not, do the decompression twice. Once with an initial faux dictionary of all 0's, and once with an initial faux dictionary of all 1's. Where the decompressed data is the same for both decompressions, you have successfully recovered that data.
Then you will need to go up to the next level of structure in that data, and see if you can somehow make use of what you have recovered.
Good luck. And don't cut off the first 361 bytes next time.
Below is example code that does what is described above.
/* salvage -- recover data from a corrupted deflate stream
* Copyright (C) 2015 Mark Adler
* Version 1.0 28 June 2015 Mark Adler
*/
/*
This software is provided 'as-is', without any express or implied
warranty. In no event will the author be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Mark Adler
madler#alumni.caltech.edu
*/
/* Attempt to recover deflate data from a corrupted stream. The corrupted data
is read on stdin, and any reliably decompressed data is written to stdout. A
deflate stream is deemed to have been found successfully if there are eight
or fewer bytes of compressed data unused when done. This can be changed
with the MAXLEFT macro below, or the conditional that currently uses
MAXLEFT. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <assert.h>
#include "zlib.h"
/* Get the size of an allocated piece of memory (usable size -- not necessarily
the requested size). */
#if defined(__APPLE__) && defined(__MACH__)
# include <malloc/malloc.h>
# define memsize(p) malloc_size(p)
#elif defined (__linux__)
# include <malloc.h>
# define memsize(p) malloc_usable_size(p)
#elif defined (_WIN32)
# include <malloc.h>
# define memsize(p) _msize(p)
#else
# error You need to find an allocated memory size function
#endif
#define local static
/* Load an entire file into a memory buffer. load() returns 0 on success, in
which case it puts all of the file data in *dat[0..*len - 1]. That is,
unless *len is zero, in which case *dat is NULL. *data is allocated memory
which should be freed when done with it. load() returns zero on success,
with *data == NULL and *len == 0. The error values are -1 for read error or
1 for out of memory. To guard against bogging down the system with
extremely large allocations, if limit is not zero then load() will return an
out of memory error if the input is larger than limit. */
local int load(FILE *in, unsigned char **data, size_t *len, size_t limit)
{
size_t size = 1048576, have = 0, was;
unsigned char *buf = NULL, *mem;
*data = NULL;
*len = 0;
if (limit == 0)
limit--;
if (size >= limit)
size = limit - 1;
do {
/* if we already saturated the size_t type or reached the limit, then
out of memory */
if (size == limit) {
free(buf);
return 1;
}
/* double size, saturating to the maximum size_t value */
was = size;
size <<= 1;
if (size < was || size > limit)
size = limit;
/* reallocate buf to the new size */
mem = realloc(buf, size);
if (mem == NULL) {
free(buf);
return 1;
}
buf = mem;
/* read as much as is available into the newly allocated space */
have += fread(buf + have, 1, size - have, in);
/* if we filled the space, make more space and try again until we don't
fill the space, indicating end of file */
} while (have == size);
/* if there was an error reading, discard the data and return an error */
if (ferror(in)) {
free(buf);
return -1;
}
/* if a zero-length file is read, return NULL for the data pointer */
if (have == 0) {
free(buf);
return 0;
}
/* resize the buffer to be just big enough to hold the data */
mem = realloc(buf, have);
if (mem != NULL)
buf = mem;
/* return the data */
*data = buf;
*len = have;
return 0;
}
#define DICTSIZE 32768
#if UINT_MAX <= 0xffff
# define BUFSIZE 32768
#else
# define BUFSIZE 1048576
#endif
/* Inflate the provided buffer starting at a specified bit offset. Use an
already-initialized inflate stream structure for rapid repeated attempts.
The structure needs to have been initialized using inflateInit2(strm, -15).
Inflation begins at data[off], starting at bit bit in that byte, going from
that bit to the more significant bits in that byte, and then on to the next
byte. bit must be in the range 0..7. bit == 0 uses the entire byte at
data[off]. bit == 7 uses only the most significant bit of the byte at
data[off]. Before inflation, the dictionary is initialized to
dict[0..DICTSIZE-1] so that references before the start of the uncompressed
data do not stop inflation. Inflation continues as long as possible, until
either an error is encountered, the end of the deflate stream is reached, or
data[len-1] is processed. On entry *recoup is a pointer to allocated memory
or NULL, and on return *recoup points to allocated memory with the
decompressed data. *got is set to the number of bytes of decompressed data
returned at *recoup.
inflate_at() returns Z_DATA_ERROR if an error was detected in the alleged
deflate data, Z_STREAM_END if the end of a valid deflate stream was reached,
or Z_OK if the end of the provided compressed data was reached without
encountering an erorr or the end of the stream. */
local int inflate_at(z_stream *strm, unsigned char *data, size_t len,
size_t off, int bit, size_t *unused, unsigned char *dict,
unsigned char **recoup, size_t *got)
{
int ret;
size_t left, size;
/* check input */
assert(data != NULL && off < len && bit >= 0 && bit <= 7);
assert(dict != NULL && recoup != NULL);
/* set up inflate engine, feeding first few bits if necessary */
ret = inflateReset(strm);
assert(ret == Z_OK);
ret = inflateSetDictionary(strm, dict, DICTSIZE);
assert(ret == Z_OK);
if (bit) {
ret = inflatePrime(strm, 8 - bit, data[off] >> bit);
assert(ret == Z_OK);
off++;
}
/* inflate as much as possible */
strm->next_in = data + off;
left = len - off;
*got = 0;
do {
strm->avail_in = left > UINT_MAX ? UINT_MAX : left;
left -= strm->avail_in;
do {
/* assure at least BUFSIZE available in recoup */
size = memsize(*recoup);
if (*got + BUFSIZE > size) {
size = size ? size << 1 : BUFSIZE;
assert(size != 0);
*recoup = reallocf(*recoup, size);
assert(*recoup != NULL);
}
/* inflate into recoup */
strm->next_out = *recoup + *got;
strm->avail_out = BUFSIZE;
ret = inflate(strm, Z_NO_FLUSH);
assert(ret != Z_STREAM_ERROR && ret != Z_MEM_ERROR);
/* set the number of compressed bytes unused so far, in case we
return */
if (unused != NULL)
*unused = left + strm->avail_in;
/* update the number of uncompressed bytes generated */
*got += BUFSIZE - strm->avail_out;
/* if we cannot continue to decompress, then return the reason */
if (ret == Z_DATA_ERROR || ret == Z_STREAM_END)
return ret;
/* continue with provided input data until all output generated */
} while (strm->avail_out == 0);
assert(strm->avail_in == 0);
/* provide more input data, if any */
} while (left);
/* ran through all compressed data with no errors or end of stream */
return Z_OK;
}
/* The criteria for success is the completion of inflate with no more than this
many bytes unused. (8 is the length of a gzip trailer.) */
#define MAXLEFT 8
/* Read a corrupted (or not) deflate stream from stdin and write the reliably
recovered data to stdout. */
int main(void)
{
int ret, bit;
unsigned char *data = NULL, *recoup = NULL, *comp = NULL;
size_t len, off, unused, got;
z_stream strm;
unsigned char dict[DICTSIZE] = {0};
/* read input into memory */
ret = load(stdin, &data, &len, 0);
if (ret < 0)
fprintf(stderr, "file error reading input\n");
if (ret > 0)
fprintf(stderr, "ran out of memory reading input\n");
assert(ret == 0);
fprintf(stderr, "read %lu bytes\n", len);
/* initialize inflate structure */
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.next_in = Z_NULL;
strm.avail_in = 0;
ret = inflateInit2(&strm, -15);
assert(ret == Z_OK);
/* scan for an acceptable starting point for inflate */
for (off = 0; off < len; off++)
for (bit = 0; bit < 8; bit++) {
ret = inflate_at(&strm, data, len, off, bit, &unused, dict,
&recoup, &got);
if ((ret == Z_STREAM_END || ret == Z_OK) && unused <= MAXLEFT)
goto done;
}
done:
/* if met the criteria, show result and write out reliable data */
if (bit != 8 && (ret == Z_STREAM_END || ret == Z_OK)) {
fprintf(stderr,
"decoded %lu bytes (%lu unused) at offset %lu, bit %d\n",
len - off - unused, unused, off, bit);
/* decompress again with a different dictionary to detect unreliable
data */
memset(dict, 1, DICTSIZE);
inflate_at(&strm, data, len, off, bit, NULL, dict, &comp, &got);
{
unsigned char *p, *q;
/* search backwards from the end for the first unreliable byte */
p = recoup + got;
q = comp + got;
while (q > comp)
if (*--p != *--q) {
p++;
q++;
break;
}
/* write out the reliable data */
fwrite(q, 1, got - (q - comp), stdout);
fprintf(stderr,
"%lu bytes of reliable uncompressed data recovered\n",
got - (q - comp));
fprintf(stderr,
"(out of %lu total uncompressed bytes recovered)\n", got);
}
}
/* otherwise declare failure */
else
fprintf(stderr, "no deflate stream found that met criteria\n");
/* clean up */
free(comp);
free(recoup);
inflateEnd(&strm);
free(data);
return 0;
}
Just before starting, sorry for my english... i'm french.
I'm developing (in C language) a wireshark dissector to dissect a specific protocol to the company (it's owner of it) where I work but I have a problems when messages are several TCP frames ... I can not reassemble the messages when a message is broken into two different frames TCP, I can not reform it in one message...
I read the readme.dissector and try using two methods:
First method:
tcp_dissect_pdus(tvb, pinfo, tree, dns_desegment, 2,
get_dns_pdu_len, dissect_dns_tcp_pdu, data);
return tvb_captured_length(tvb);
Second method :
guint offset = 0;
while(offset < tvb_reported_length(tvb)) {
gint available = tvb_reported_length_remaining(tvb, offset);
gint len = tvb_strnlen(tvb, offset, available);
if( -1 == len ) {
/* we ran out of data: ask for more */
pinfo->desegment_offset = offset;
pinfo->desegment_len = DESEGMENT_ONE_MORE_SEGMENT;
return (offset + available);
}
col_set_str(pinfo->cinfo, COL_INFO, "C String");
len += 1; /* Add one for the '\0' */
if (tree) {
proto_tree_add_item(tree, hf_cstring, tvb, offset, len,
ENC_ASCII|ENC_NA);
}
offset += (guint)len;
}
/* if we get here, then the end of the tvb coincided with the end of a
string. Happy days. */
return tvb_captured_length(tvb);
But impossible to reassemble the message, I do not understand why ... can you help me please?
I hope you understand my problem ...: /
I am trying to send a small packet via tcp but i have a problem with the size of it.
My data is 255 byte buffer:
buffer[0] = 0x00;
buffer[1] = 0x04;
buffer[2] = 0x06;
buffer[3] = 0x08;
buffer[4] = 0x01;
buffer[5] = 0x01;
and i use
send(sockfd,buffer,6,MSG_NOSIGNAL|MSG_DONTWAIT);
but data can not be send when i use 6. If i use send like:
send(sockfd,buffer,sizeof(buffer),MSG_NOSIGNAL|MSG_DONTWAIT);
then data is sent but recevier has to parse extra 250 0x00 byte and i don't want it. Why 6 is no ok. I also try 10 randomly nothing changes.
Here is my receiver code Anything writes at that side:
while(1) {
ret = recv(socket,buffer,sizeof(buffer),0);
if (ret > 0)
printf("recv success");
else
{
if (ret == 0)
printf("recv failed (orderly shutdown)");
else
printf("recv failed (errno:%d)",errno);
}
}
Get rid of MSG_DONTWAIT. This is only useful when you have something else to do when the socket send buffer is full and the data can't be sent. As you aren't checking the result of send(), clearly you aren't interested in that condition. However you must check the result of send anyway, as you may have got an error, so fix your code to do that as well.
I am currently in the process of making a Client and Server in the Unix/Windows environment but right now I am just working on the Unix side of it. One of the function we have to create for the program is similar to the list function in Unix which shows all files within a dir but we also have to show more information about the file such as its owner and creation date. Right now I am able to get all this information and print it to the client however we have to also add that once the program has printing 40 lines it waits for the client to push any key before it continues to print.
I have gotta the program to sort of do this but it will cause my client and server to become out of sync or at least the std out to become out of sync. This means that if i enter the command 'asdad' it should print invalid command but it won't print that message until i enter another command. I have added my list functions code below. I am open to suggestions how how to complete this requirement as the method I have chosen does not seem to be working out.
Thank-you in advance.
Server - Fork Function: This is called when the list command is enter. eg
fork_request(newsockfd, "list", buf);
int fork_request(int fd, char req[], char buf[])
{
#ifndef WIN
int pid = fork();
if (pid ==-1)
{
printf("Failed To Fork...\n");
return-1;
}
if (pid !=0)
{
wait(NULL);
return 10;
}
dup2(fd,1); //redirect standard output to the clients std output.
close(fd); //close the socket
execl(req, req, buf, NULL); //run the program
exit(1);
#else
#endif
}
Here is the function used to get all the info about a file in a dir
void longOutput(char str[])
{
char cwd[1024];
DIR *dip;
struct dirent *dit;
int total;
char temp[100];
struct stat FileAttrib;
struct tm *pTm;
int fileSize;
int lineTotal;
if(strcmp(str, "") == 0)
{
getcwd(cwd, sizeof(cwd));
}
else
{
strcpy (cwd, str);
}
if (cwd != NULL)
{
printf("\n Using Dir: %s\n", cwd);
dip = opendir(cwd);
if(dip != NULL)
{
while ((dit = readdir(dip)) != NULL)
{
printf("\n%s",dit->d_name);
stat(dit->d_name, &FileAttrib);
pTm = gmtime(&FileAttrib.st_ctime);
fileSize = FileAttrib.st_size;
printf("\nFile Size: %d Bytes", fileSize);
printf("\nFile created on: %.2i/%.2i/%.2i at %.2i:%.2i:%.2i GMT \n", (pTm->tm_mon + 1), pTm->tm_mday,(pTm->tm_year % 100),pTm->tm_hour,pTm->tm_min, pTm->tm_sec);;
lineTotal = lineTotal + 4;
if(lineTotal == 40)
{
printf("40 Lines: Waiting For Input!");
fflush(stdout);
gets(&temp);
}
}
printf("\n %d \n", lineTotal);
}
else
{
perror ("");
}
}
}
At here is the section of the client where i check that a ! was not found in the returned message. If there is it means that there were more lines to print.
if(strchr(command,'!') != NULL)
{
char temp[1000];
gets(&temp);
}
Sorry for the long post but if you need anything please just ask.
Although, I didn't see any TCP/IP code, I once had a similar problem when I wrote a server-client chat program in C++. In my case, the problem was that I didn't clearly define how messages were structured in my application. Once, I defined how my protocol was suppose to work--it was a lot easier to debug communication problems.
Maybe you should check how your program determines if a message is complete. In TCP, packets are guaranteed to arrive in order with no data loss, etc. Much like a conversation over a telephone. The only thing you have to be careful of is that it's possible to receive a message partially when you read the buffer for the socket. The only way you know to stop reading is when you determine a message is complete. This could be as simple as two '\n' characters or "\n\r".
If you are using UDP, then that is a completely different beast all together (i.e. messages can arrive out of order and can be lost in transit, et cetera).
Also, it looks like you are sending across strings and no binary data. If this is the case, then you don't have to worry about endianess.
I'm decompressing gzip data received from a http server, using the zlib library from Qt. Because qUncompress was no good, I followed the advice given here: Qt quncompress gzip data and created my own method to uncompress the gzip data, like this:
QByteArray gzipDecompress( QByteArray compressData )
{
//strip header
compressData.remove(0, 10);
const int buffer_size = 16384;
quint8 buffer[buffer_size];
z_stream cmpr_stream;
cmpr_stream.next_in = (unsigned char *)compressData.data();
cmpr_stream.avail_in = compressData.size();
cmpr_stream.total_in = 0;
cmpr_stream.next_out = buffer;
cmpr_stream.avail_out = buffer_size;
cmpr_stream.total_out = 0;
cmpr_stream.zalloc = Z_NULL;
cmpr_stream.zfree = Z_NULL;
cmpr_stream.opaque = Z_NULL;
int status = inflateInit2( &cmpr_stream, -8 );
if (status != Z_OK) {
qDebug() << "cmpr_stream error!";
}
QByteArray uncompressed;
do {
cmpr_stream.next_out = buffer;
cmpr_stream.avail_out = buffer_size;
status = inflate( &cmpr_stream, Z_NO_FLUSH );
if (status == Z_OK || status == Z_STREAM_END)
{
QByteArray chunk = QByteArray::fromRawData((char *)buffer, buffer_size - cmpr_stream.avail_out);
uncompressed.append( chunk );
}
else
{
inflateEnd(&cmpr_stream);
break;
}
if (status == Z_STREAM_END)
{
inflateEnd(&cmpr_stream);
break;
}
}
while (cmpr_stream.avail_out == 0);
return uncompressed;
}
Eveything seems to work fine if the decompressed data fits into the output buffer (ie. is smaller than 16 Kb). If it doesn't, the second call to inflate returns a Z_DATA_ERROR. I know for sure the data is correct because the same chunk of data is correctly decompressed if the output buffer is made large enough.
The server doesn't return a header with the size of the uncompressed data (only the size of the compressed one) so I followed the usage instructions in zlib: http://www.zlib.net/zlib_how.html
And they do exactly what I'm doing. Any idea what I could be missing? the next_in and avail_in members in the stream seem to be updated correctly after the first iteration. Oh, and if it's any useful, the error message when the data error is issued is: "invalid distance too far back".
Any thoughts? Thanks.
The Deflate/Inflate compression/decompression algorithm uses a 32Kb circular buffer. So a 16Kb buffer can never work if the decompressed data is bigger than 16Kb. (Not strictly true, because the data is allowed to be split into blocks, but you need to assume that there may be 32Kb blocks in there.) So just set buffer_size = 32768 and you should be OK.