Convert multi-sheet xls to csv in batch (R / Linux shell)

In R I have a script that gets the content of multiple xls files <Loop over directory to get Excel content>.
All files are about 2 MB. The script takes a few seconds for 3 files, but has now been running for 6 hours on a Debian i7 system without results on 120 files.
A better solution is therefore [hopefully] to convert all xls files to csv using ssconvert, via a bash script <Linux Shell Script For Each File in a Directory Grab the filename and execute a program>:
for f in *.xls ; do xls2csv "$f" "${f%.xls}.csv" ; done
This script does the job; however, my content is in sheet nr. 14, whereas the csv files produced by this script contain only the first sheet [I replaced 'xls2csv' with 'ssconvert'].
Can this script be adapted to pick up only sheet nr. 14 of the workbook?

If you know the worksheet name, you can do this:
for f in *.xls ; do xls2csv -x "$f" -w sheetName -c "${f%.xls}.csv" ; done
For all the xls2csv options, see here.
EDIT
The OP found the right answer, so I have edited mine to add it:
for f in *.xls ; do xls2csv -x "$f" -f -n 14 -c "${f%.xls}.csv" ; done

For this job I use a Python script named ssconverter.py (which you can find here; scroll down and download the two attachments, ssconverter.py and ooutils.py), which I call directly from R using system().
It can extract a specific sheet from the workbook, not only by name but also by sheet number, for example:
ssconverter.py infile.xls:2 outfile.csv
to extract the second sheet.
You need to have Python and python-uno installed.
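If you want to stay with ssconvert itself, a hedged sketch: ssconvert has a -S (--export-file-per-sheet) option, and %n in the output template stands for the sheet index, so you can split every workbook into one csv per sheet and then keep only sheet 14's file. The loop below echoes the commands as a dry run (demo.xls is a placeholder created for the demo; sheet indexing may be 0-based depending on your Gnumeric version, so check the output names):

```shell
# Demo workbook name; replace with your real .xls files.
touch demo.xls

# Print the ssconvert commands that would split each workbook into
# one csv per sheet (-S = export file per sheet; %n = sheet index).
# Drop the echo to actually run them, then keep only sheet 14's csv.
for f in *.xls; do
  [ -e "$f" ] || continue
  echo ssconvert -S "$f" "${f%.xls}.%n.csv"
done
```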

Related

concatenating UNIX files on Mainframe

I am using BPXBATCH to concatenate an unknown number of files into one single file, then porting that single file to the mainframe. The files are VB (variable blocked). Each file is appended right after the last byte of the previous file, but I would like each new file to start at the beginning of a new record in the single output file.
What the data looks like:
File1BDT253748593725623.....File2BDT253748593725623.......
...............File3BDT253748593725623....
Here is what I would like it to look like:
File1BDT253748593725623.....
File2BDT253748593725623.......
...............
File3BDT253748593....
725623
Here is the BPXBATCH SH command I am using.
BPXBATCH SH cat /u/icm/comq/tmp1/rdq40.img.bin* > +
/u/icm/comq/tmp1/rdq40.img.all
Does anyone know a way to accomplish this?
You should use something like:
SH for f in /u/icm/comq/tmp1/rdq40.img.bin* ; do cat "$f" >> /u/icm/comq/tmp1/rdq40.img.all ; done
You can also copy your file to an MVS sequential dataset using the syntax "//'RDQ40.IMG.ALL'". Not all shell commands understand it, but cp and mv do.
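If the records are newline-delimited on the USS side (an assumption; true VB record boundaries may need different handling on the MVS side), forcing a newline after each input file keeps the next file starting on a fresh record. A minimal sketch with stand-in files:

```shell
# Demo input files standing in for rdq40.img.bin* (no trailing newlines)
printf 'File1BDT253748593725623' > part1
printf 'File2BDT253748593725623' > part2

# Concatenate, appending a newline after each file so the next one
# starts at the beginning of a new record.
for f in part1 part2; do
  cat "$f"
  printf '\n'
done > all.img
```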

Converting this code from R to Shell script?

So I'm running a program that works, but the issue is that my computer is not powerful enough to handle the task. I have the code written in R, but I have access to a supercomputer that runs a Unix system (as one would expect).
The program is designed to read a .csv file, find everything with the unit ft3(monthly total) in the "Units" column, and select the value in the column before it. The files are charts that list things in multiple units.
The program in R:
getwd()
setwd("/Users/youruserName/Desktop")
myData= read.table("yourFileName.csv", header=T, sep=",")
funData= subset(myData, units=="ft3(monthly total)", select=units:value)
write.csv(funData, file="funData.csv")
To a program in Shell Script, I tried:
pwd
cd /Users/yourusername/Desktop
touch RunThisProgram
nano RunThisProgram
(((In nano, I wrote)))
if
grep -r yourFileName.csv ft3(monthly total)
cat > funData.csv
else
cat > nofun.csv
fi
control+x (((used control x to close nano)))
chmod -x RunThisProgram
./RunThisProgram
(((It runs for a while)))
We get a funData.csv file as output, but that file is empty.
What am I doing wrong?
It isn't actually running, because there are a couple of problems with your script:
grep needs the pattern first, and quoted; -r is for recursing a
directory...
if without a then
cat is called wrong, so it is actually reading from stdin.
You really only need one line:
grep -F "ft3(monthly total)" yourFileName.csv > funData.csv
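The grep line keeps whole rows. If only the value column is wanted (the column just before Units), awk can do the selection in one pass. The column layout below (site,value,units) is a guess at the file's shape, so adjust the field numbers to match your csv:

```shell
# Hypothetical sample shaped like the question's csv: the value
# sits in the column just before the Units column.
printf 'site,value,units\nA,12,ft3(monthly total)\nB,99,m3\nC,7,ft3(monthly total)\n' > yourFileName.csv

# -F',' splits on commas; $3 is Units here, $2 the value before it.
awk -F',' '$3 == "ft3(monthly total)" { print $2 "," $3 }' yourFileName.csv > funData.csv
```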

Unzip only limited number of files in linux

I have a zipped file containing 10,000 compressed files. Is there a Linux command/bash script to unzip only 1,000 files ? Note that all compressed files have same extension.
unzip -Z1 test.zip | head -1000 | sed 's| |\\ |g' | xargs unzip test.zip
-Z1 provides a raw list of files
sed expression escapes spaces (works everywhere, including macOS)
You can use wildcards to select a subset of files. E.g.
Extract all contained files beginning with b:
unzip some.zip b*
Extract all contained files whose name ends with y:
unzip some.zip *y.extension
You can either select a wildcard pattern that is close enough, or examine the output of unzip -l some.zip closely to determine a pattern or set of patterns that will get you exactly the right number.
I did this:
unzip -l zipped_files.zip |head -1000 |cut -b 29-100 >list_of_1000_files_to_unzip.txt
I used cut to get only the filenames; the first 3 columns are size etc.
Now loop over the filenames :
for files in `cat list_of_1000_files_to_unzip.txt `; do unzip zipped_files.zip $files;done
Some advice:
Execute unzip to only list the files, redirecting the output to some file
Truncate this file to get only the top 1000 rows
Pass the file to unzip to extract only the specified files
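Putting the pieces together as a runnable sketch (head -2 stands in for head -1000, and the demo archive is built with python3 since a real zip may not be at hand; NUL-delimiting the names lets members with spaces survive the trip through xargs):

```shell
# Demo: build a small zip with python3, then extract only its
# first two members.
python3 - <<'EOF'
import zipfile
with zipfile.ZipFile('demo.zip', 'w') as z:
    for i in range(5):
        z.writestr('file %d.txt' % i, 'hello\n')
EOF

# -Z1 lists raw member names in archive order; NUL-delimiting them
# lets names with spaces pass through xargs intact.
unzip -Z1 demo.zip | head -2 | tr '\n' '\0' | xargs -0 unzip -o demo.zip
```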

UNIX ls add text and bulk run conversion to csv for all files in dir

Perhaps this was not asked correctly; this is my first post to Stack Overflow.
I create a file that looks like this and run it as a shell script from the terminal:
#!/bin/sh
in2csv -f xls "A1.csv" > "export/A1.csv"
in2csv -f xls "Allison am sch..XLS" > "export/Allison am sch..csv"
in2csv -f xls "B & C Rmap am sch.XLS" > "export/B & C Rmap am sch.csv"
in2csv -f xls "BOW K's am sch.XLS" > "export/BOW K's am sch.csv"
cat *.csv > all.csv
Since no one tool does it, can I build, from the terminal, the text file that will become the shell script?
First ls > files.txt # this lists all the files, to which I want to add text in front and behind:
in front: in2csv -f xls " (file name goes here)
behind: " > "export/Allison am sch..csv"
Once I have all this I can simply run it with an sh command.
Or is there a better way, without having to use another program with lots of dependencies to manage? Thanks
I would like to know how to use Unix (terminal) to convert all .xls files in a directory to csv. I can do it, but would like to know a better way. My steps are:
ls > files.txt
then import the txt file into SAS, where I add text around the file names so csvkit can do the conversion. The SAS code looks like this, where VAR1 is the name of the xls file.
data names2;
set names;
example VAR1 = "Lakota Trade ctr am sch.XLS"
There are many files like this. Here is the SAS code:
data file;
set file;
f = "in2csv -f xls " || '"'|| strip(VAR1) || '"'|| " > " || '"'||strip(tranwrd(VAR1, "XLS", "csv"))||'"';
drop var1;
run;
I then copy each line into the terminal and let csvkit's in2csv do the conversion. I do not want to use SAS's native import for the conversion. I would like to know how to do something like this entirely in the Unix terminal, or even Perl.
Check out the package gdata. It provides a function, read.xls, that uses Perl to convert .xls files to .csv. You can probably dig into the package to see how to do it from the command line instead of from R.
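The text file the question builds in SAS can also be generated directly in the shell. The loop below emits one in2csv line per .XLS file into convert.sh (in2csv from csvkit is assumed installed when the generated script is actually run; the touched file is a demo stand-in):

```shell
# Demo stand-in for one of the real worksheet files
touch 'Allison am sch..XLS'

# Emit one conversion command per .XLS file into convert.sh;
# ${f%.XLS} drops the extension for the output name.
for f in *.XLS; do
  [ -e "$f" ] || continue
  printf 'in2csv -f xls "%s" > "export/%s.csv"\n' "$f" "${f%.XLS}"
done > convert.sh

# then: mkdir -p export && sh convert.sh
```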

Open files listed in txt

I have a list of files with their full path in a single text file. I would like to open them all at once in Windows. The file extension will tell Windows what programme to use. Can I do this straight from the command line or would I need to make a batch file? Tips on how to write the batch file appreciated.
My text file looks like the following:
J:/630/630A/SZ299_2013-04-19_19_36_52_M01240.WAV
J:/630/630A/SZ299_2013-04-19_20_15_39_M02312.WAV
J:/630/630A/SZ299_2013-04-19_21_48_07_M04876.WAV
etc
The .WAV extension is associated with Adobe Audition, which is a sound editing programme. When each path is hyperlinked in an Excel column, they can be opened with one click. Clicking on the first link will open both Audition and the hyperlinked file in it. Clicking another hyperlink will open the next file in the same instance of the programme. But this is too slow for hundreds of paths. If I open many files straight from R, e.g.
shell("J:/630/630A/SZ299_2013-04-19_19_36_52_M01240.WAV", intern=TRUE)
shell("J:/630/630A/SZ299_2013-04-19_20_15_39_M02312.WAV", intern=TRUE)
etc
each file will be opened in a new instance of the programme, which is nasty. So batch seems preferable.
for /f "delims=" %%a in (yourtextflename) do "%%a"
should do this as a batch line.
You could run this directly from the prompt if you like, but you'd need to replace each %% with % to do so.
It's a lot easier to put the code into a batch:
@echo off
setlocal
for /f "delims=" %%a in (%1) do "%%a"
then you'd just need to enter
thisbatchfilename yourtextfilename
and yourtextfilename will be substituted for %1. MUCH easier to type - and that's what batch is all about - repetitive tasks.
Following on from this post, which uses the identify function in R to create a subset selection of rows (from a larger dataset called "testfile") by clicking on coordinates in a scatterplot. One of the columns contains the list of Windows paths to original acoustic datafiles. The last line below will open all files in the listed paths in only one instance of the programme linked to the Windows file extension.
selected_rows = with(testfile, identify(xvalue, yvalue))
SEL <-testfile[selected_rows,]
for (f in 1:nrow(SEL)){system2("open",toString(SEL[f,]$path))}
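For completeness, the Unix-shell analogue of the batch loop: xdg-open on Linux (or open on macOS) hands each listed path to its default application. Echoed here as a dry run, with a demo list standing in for the WAV paths in the question:

```shell
# Demo list of paths, standing in for the WAV paths in the question
printf '%s\n' '/tmp/a test.wav' '/tmp/b.wav' > files.txt

# Hand each listed path to the desktop's default opener
# (drop the echo to actually open the files).
while IFS= read -r f; do
  echo xdg-open "$f"
done < files.txt
```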
