Antiword is unable to read header content - filereader

I am using antiword to read a .doc file on a Linux (Ubuntu) machine:
antiword <full path of file>
Unfortunately it is not reading the header content. Is there any way to read the header content as well?

Related

How to convert an .xls or .xlsx file to a .csv file without any plugins or tools, using Unix commands

I have to convert a .xls or .xlsx file to a .csv file without using any plugins or tools, only Unix commands.
Is there any way to do this?
I tried the approach below, but it is not working:
Change the character set of the .xls file to UTF-8 encoding.
Then create the file again with the extension changed:
cp temp.xls temp.csv
It is possible, but you need to realise that a *.xlsx file is a zipped directory structure (the older *.xls format is a plain binary file); just unzip such a file, using WinZip or 7-Zip. The unzipping can also be done using UNIX commands.
But what then? The directory structure is quite complicated to understand, and creating a script or a program that can do this (without using any external tools) is a tremendous amount of work, so I'd propose that you either use external tools anyway, or make sure the files you receive are already in CSV format.
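For illustration, the unzip step alone (assuming the file really is an .xlsx and is named temp.xlsx) can be done with standard commands; the sheet data then sits under xl/worksheets/ as XML, which is where the hard work starts:
unzip temp.xlsx -d temp_extracted
ls temp_extracted/xl/worksheets/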

What R command to use to force download files from iCloud

On a Mac using iCloud file optimization, large files that are seldom used are uploaded to iCloud and only a small pointer file is left. When I look for the file in Finder, I see the file name, and to the left is an icon that indicates that the file is in the cloud. To access the file, I click on the icon and the file is downloaded. With the file.exists command, R returns FALSE for the existence of the file. But after some research I found that the file link is stored in a directory below ~/Library/Mobile\ Documents/com~apple~CloudDocs and the file name is changed to xxx.icloud, where xxx is the original file name.
Here's an example of the path to a directory that holds a .icloud file, from a shell on my Mac:
/Users/gcn/Library/Mobile Documents/com~apple~CloudDocs/Documents/workspace/nutmod/data-raw/NutrientData
I can query for the existence of the placeholder with file.exists("xxx.icloud"). But how do I tell my Mac to download the iCloud file and then read it in? Using something like read.table or read.csv doesn't work because the pointer file is not CSV.
You can read a csv file directly from an iCloud folder on the Mac by using the path to that folder. Get the path by finding the file in Finder. Then right-click on the filename at the bottom of the Finder window, where it shows all the folders leading to the file, and choose: Copy "YourFile" as Pathname.
That path will look something like this:
"/Users/NAME/Library/Mobile
Documents/com~apple~CloudDoc/Docs/YourFile.csv"
Use that in your read code:
iCloudDat <- read_csv("/Users/NAME/Library/Mobile Documents/com~apple~CloudDocs/Documents/YourFileName.csv")
That should work.
If the extension isn't .txt or .csv, read.table and read.csv won't work.
You have to download the file and extract the tables into a readable format.
You can download the file using download.file(), which is in the base utils library.
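Putting the two answers together, a minimal R sketch might look like this (the directory comes from the question; the file name nutrients.csv and the exact placeholder name are assumptions for illustration):
dir  <- "~/Library/Mobile Documents/com~apple~CloudDocs/Documents/workspace/nutmod/data-raw/NutrientData"
f    <- file.path(dir, "nutrients.csv")         # hypothetical file name
stub <- file.path(dir, "nutrients.csv.icloud")  # pointer file left behind by iCloud
if (file.exists(f)) {
  dat <- read.csv(f)                            # the real file is already local
} else if (file.exists(stub)) {
  message("Still in iCloud; open the file once in Finder to trigger the download")
}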

Rsync without creating hidden file in destination

rsync creates a temporary hidden file during a transfer, but the file is renamed when the transfer is complete. I would like to rsync files without creating a hidden file.
alvits is correct: --inplace will fix this for you. I found this when I had issues syncing music to my phone (mounted on Ubuntu with jmtpfs); I would get a string of errors renaming the temporary files to replace existing files.
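For reference, it is just another flag on the usual command line (the source and destination paths here are placeholders):
rsync -av --inplace ~/Music/ /mnt/phone/Music/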

Pig UDF fails to load .gz file

I wrote a UDF to load files into Pig. It works well for loading text files; however, now I also need to be able to read .gz files. I know I can unzip the file and then process it, but I want to read the .gz file directly, without unzipping it.
My UDF extends LoadFunc, and my custom input format MyInputFile extends TextInputFormat. I also implemented MyRecordReader. I'm wondering whether extending TextInputFormat is the problem. I tried FileInputFormat as well, and still cannot read the file. Has anyone written a UDF that reads data from a .gz file before?
TextInputFormat handles gzip files as well. Have a look at its RecordReader's (LineRecordReader's) initialize() method, where the proper CompressionCodec is set up. Also note that gzip files aren't splittable (even if they are located on S3), so you might need to use either a splittable format (e.g. LZO) or uncompressed data to exploit the desired level of parallel processing.
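In other words, the custom LoadFunc can simply hand back TextInputFormat and let it deal with the compression. A minimal sketch of such a loader (class and field names are made up for illustration, and it assumes the new mapreduce API that Pig's LoadFunc expects):
import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class GzTextLoader extends LoadFunc {
    private RecordReader reader;

    @Override
    public InputFormat getInputFormat() throws IOException {
        // TextInputFormat picks the right CompressionCodec for .gz input
        return new TextInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;                      // end of input
            }
            Text line = (Text) reader.getCurrentValue();
            return TupleFactory.getInstance()
                               .newTuple(Collections.singletonList(line.toString()));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}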
If your gzipped data is stored locally, you can uncompress it and copy it to HDFS in one step, as described here. Or, if it's already on HDFS,
hadoop fs -cat /data/data.gz | gzip -d | hadoop fs -put - /data/data.txt
would be more convenient.
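The local-file variant of that one-step copy would be along the same lines (paths are placeholders):
gzip -dc /local/path/data.gz | hadoop fs -put - /data/data.txt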

LinqToExcel External table is not in the expected format

I've been using LinqToExcel to parse an Excel document for a while, and suddenly it has stopped working.
I'm getting the following error:
External table is not in the expected format.
Any ideas why this is happening? Or how to fix?
if (File.Exists(filenameFull))
{
    var excel = new ExcelQueryFactory(filenameFull);
    IList<Row> scanningRangesRows =
        excel.Worksheet("B - Scanning Ranges").ToList();
}
I was using LinqToExcel version 1.6.3; when the problem started happening I updated to the latest version, LinqToExcel 1.6.6, to no avail.
I've just noticed that the file I'm downloading is significantly smaller than previous versions. I opened it in Notepad and I can see [Content_Types].xml amongst the binary data. So it appears that the data source is now being saved as an XML representation of the .xls file, with the same extension. When I open the same file manually in Excel, it pops up with:
The file you are trying to open, '', is in a different format than specified by the file extension. Verify that the file is from a trusted source before opening the file. Do you want to open the file now?
On clicking yes the file still opens and looks the same as previous versions.
It's probably something to do with the file.
Maybe it's being saved as an .xlsx type of file. Can you try renaming the file extension to .xlsx and see if that works?
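One way to try that without touching the original download is to copy it under a .xlsx name first and point ExcelQueryFactory at the copy (a sketch; xlsxPath is an illustrative name):
var xlsxPath = Path.ChangeExtension(filenameFull, ".xlsx");
File.Copy(filenameFull, xlsxPath, true);   // overwrite any earlier copy
var excel = new ExcelQueryFactory(xlsxPath);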
