CSV column formatting while spooling a csv file in sqlplus - unix

How do I extract a number-formatted column when I spool to a CSV file from Unix, when the column is a VARCHAR in the database?
The number appears in the CSV as 5.05291E+12
but should actually be 5052909272618

This is not a Unix or ksh problem but an Excel problem. Assuming Excel is the default application for .csv files, double-clicking the file opens it in Excel, which sets every column to the "General" type by default, so you get scientific notation instead of the text value you expect. If your numeric data has leading zeroes, they will be removed as well.
One way to fix this is to start Excel (this example uses the 2010 version), then go to Data / Get External Data / From Text and follow the wizard, making sure to set the "Column data format" to "Text" for the column that contains your large number (click on the column in the "Data preview" section first). Leading zeroes will be preserved too.
You could write a VBA macro that opens the file with all columns as text (a little searching will turn up examples), but there seems to be no setting that tells Excel to treat columns as "Text" by default.
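To illustrate that the spooled file is intact and only the display changes, here is a minimal Python sketch. It is not Excel's actual logic: the function name is hypothetical, and the 11-digit threshold is an assumption made for this illustration.

```python
def general_like_display(text: str) -> str:
    """Roughly mimic a spreadsheet 'General' cell.

    Illustration only: Excel's real rules differ in detail.
    """
    value = int(text)              # leading zeroes are lost at this step
    if len(str(abs(value))) > 11:  # assumed threshold for this sketch
        return f"{value:.5E}"      # long numbers shown in scientific notation
    return str(value)


raw = "5052909272618"
print(raw)                             # what the spooled file contains
print(general_like_display(raw))       # 5.05291E+12
print(general_like_display("000123"))  # 123
```

The text in the file never changes. The loss happens when the reader decides the field is a number, which is why importing the column as text avoids it.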

I needed to develop a report and faced the same issue. I found one workaround. For example:
your table name --> EMPLOYEE
It contains 3 columns: colA, colB, colC. The problem is with colB. After connecting to sqlplus, you can save the data in the spool file as:
select colA|| ','''||colB||''',' ||colC from employee;
sample result:
3603342979,'8938190128981938262',142796283
When Excel opens the file, it displays the result as-is.
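The same quoting idea can be sketched outside the database. This Python snippet only mirrors the concatenation in the SELECT above. The helper names are hypothetical and the values are the sample row from this answer.

```python
def quote_field(value: str) -> str:
    """Wrap a value in single quotes so it is no longer a bare number."""
    return "'" + value + "'"


def build_line(col_a: str, col_b: str, col_c: str) -> str:
    # Only colB is quoted, matching the SELECT above.
    return ",".join([col_a, quote_field(col_b), col_c])


line = build_line("3603342979", "8938190128981938262", "142796283")
print(line)  # 3603342979,'8938190128981938262',142796283
```

Because the quoted field is not a valid number, a spreadsheet has no reason to reformat it. The trade-off is that the quotes become part of the cell text.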

Related

When exported to CSV it changes some cell names, how can I prevent this?

I have some data points (gene IDs) that look like dates, for example MAR1, but when I export to CSV it keeps changing them to 1.Mar, as if they were actual dates.
How can I keep the value as MAR1 instead of having it converted to a date format?
When I try to fix this in Excel by changing the cell to text format, the cell turns into a number.
You should look into PowerQuery if you are working with the CSV format.
Open a blank workbook and, instead of opening the file, go to the Data tab -> Get Data -> From Text/CSV and locate the file.
(You can also pick an Excel file if you plan to use writexl.)
Click "Transform Data"; from there you can pick the format for each column by clicking on the small icon in its header (in your case, set the column to text). You can also do many other things, such as sorting or converting a date to a week number.
Click "Close & Load" and the data will appear in a new tab as a table.
What's nice about PowerQuery, if you are doing any post-analysis in Excel, is that the next time you export from R you only have to hit "Refresh All" in the Data tab; it will pull in your new data and reapply the transformations, provided you haven't changed any of the column names.

Missing delimiter error when importing html text

I'm playing with Azure Machine Learning using the Designer and am getting a "Delimiter not found" error when importing my data.
I originally started with a few hundred HTML files stored as Azure blobs. Each file would be considered a single row of text; however, I had no luck importing these files for further text analytics.
I created a Data Factory job that imported each file, stripped all the tabs, quotes, and CR/LF characters from the text, added a column for the file name, and stored it all as a combined tab-delimited file. In Notepad++ I can confirm that the format is FileName tab HtmlText. This is the file I'm trying to import into ML, and I get the missing delimiter message while defining the import module.
Here is the error when I try and create a dataset:
{
"message": "'Delimiter' is not specified or invalid."
}
Question 1: Is there a better way to do text analytics on a large collection of HTML files?
Question 2: Is there a format I need to use in my combined .tsv file that works?
Question 3: Is there maybe a max length for the string column? My HTML can be tens of thousands of characters long.
You're right that it might be line length, but my guess is that there are still some special characters (i.e. anything starting with \) that aren't properly escaped or removed. How did you scrape and strip the text data? Have you tried using BeautifulSoup?
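As a rough sketch of the stripping step, the snippet below extracts text from HTML and writes FileName tab HtmlText rows. It uses Python's standard-library html.parser rather than BeautifulSoup, so it is a swapped-in technique. The class and function names, and the sample file, are made up for illustration.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse tabs, newlines and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def to_tsv(files: dict[str, str]) -> str:
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["FileName", "HtmlText"])
    for name, html in files.items():
        writer.writerow([name, html_to_text(html)])
    return out.getvalue()


sample = {"a.html": "<html><body><p>Hello\t<b>world</b></p>\n<p>Line two</p></body></html>"}
print(to_tsv(sample))
```

Note that this simple extractor also keeps the contents of script and style elements, and it does not remove backslashes. Both would need extra handling if they turn out to be what breaks the import.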

How to split one column containing several values so each column only contains one value?

The starting situation is as follows:
I've got a CSV file with roughly 3000 rows but only 1 column. Each row contains several values.
Now I want to have only one value per column.
How do I manage to do that?
Convert the file to .txt format and then open it in MS Excel. Don't open the file directly; use the Open option in the File menu. When you do this, a text import wizard will appear. You can then split your data on a delimiter such as commas or spaces to form multiple columns. Once you are done, save the file in CSV format.
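The same split can be sketched in Python. The question does not say which character separates the values, so the semicolon delimiter here is an assumption, as are the function name and the sample data.

```python
import csv
import io


def split_single_column(lines, delimiter=";"):
    """Split each one-column row into separate values."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # skip empty lines
            yield [value.strip() for value in line.split(delimiter)]


raw = io.StringIO("a;1;x\nb;2;y\n")  # stand-in for the one-column file
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(split_single_column(raw))
print(out.getvalue())
```

With a real file, replace the StringIO objects with files opened using newline="" and pass the delimiter your data actually uses.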

Export From Teradata Table to CSV

Is it possible to transfer the data from a Teradata table into a .csv file directly?
The problem is that my table has more than 18 million rows.
If yes, please tell me the process.
For a table that size I would suggest using the FastExport utility. It does not natively support a CSV export, but you can mimic the behavior.
Teradata SQL Assistant will export to a CSV, but it would not be appropriate to use with a table of that size.
BTEQ is another alternative that may be acceptable for a one-time dump of the table.
Do you have access to any of these?
It's actually possible to change the delimiter of exported text files within Teradata SQL Assistant, without needing any separate applications:
Go to Tools > Options > Export/Import. From there, you can change the 'Use this delimiter between column' option from {Tab} to ','.
You might also want to set the 'Enclose column data in' option to 'Double Quote', so that any commas in the data itself don't upset the file structure.
From there, you use the regular text export: File > Export Results, run the query, and select one of the Delimited Text types.
Then you can just use your operating system to manually change the file extension from .txt to .csv.
These instructions are from SQL Assistant version 16.20.0.7.
I use the following code to export data from a Teradata table into a .csv file directly.
CREATE EXTERNAL TABLE
database_name.table_name -- table to be created
SAMEAS database_name.table_name -- existing table whose data is to be exported
USING (DATAOBJECT ('C:\Data\file_name.csv')
DELIMITER '|' REMOTESOURCE 'ODBC');
You can use the FastExport utility from Teradata Studio to export the table in CSV format. You can define the delimiter as well.
Very simple.
The basic idea is to first export the table as a TXT file and then convert the TXT to CSV using R: read.table() followed by write.csv().
Below are the steps for exporting the TD table as a TXT file:
Select the export option from the File menu
Select all records from the table you want to export
Save it as a TXT file
Then use R to convert the TXT file to CSV (set the working directory to the location where you saved your big TXT file):
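# Read the exported text file; fill = TRUE pads rows that have missing fields
my_table <- read.table("File_name.txt", fill = TRUE, header = TRUE)
# Write the data back out as CSV
write.csv(my_table, file = "File_name.csv")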
This worked for a table with 15 million records. Hope it helps.
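For comparison, the same TXT-to-CSV conversion can be sketched in Python, processing one row at a time so the whole table never has to fit in memory. It assumes the exported file is tab-delimited. The function name and sample rows are illustrative only.

```python
import csv
import io


def txt_to_csv(src, dst, delimiter="\t"):
    """Copy delimited rows from src to dst as CSV, one row at a time."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        writer.writerow(row)  # fields containing commas are quoted automatically
        rows += 1
    return rows


src = io.StringIO("id\tname\n0012\tSmith, J\n")  # stand-in for the exported TXT
dst = io.StringIO()
count = txt_to_csv(src, dst)
print(dst.getvalue())
```

With real files, open both with newline="" (for example open("File_name.txt", newline="")) and pass the file objects in place of the StringIO stand-ins.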

How do I prevent data from displaying in scientific format?

I have this SQL Server query that I am running in my .NET app.
(CONVERT(VARCHAR(8), EventDate, 112)+ substring(RequestedBy,1,1)+right( '0000000' + convert( varchar( 7 ), ContactID ), 7 )) as Contacts
It produces results in the following format:
20120731e0000001
20120731f0000002
20120731p0000003
This is the result and format that we want.
The problem is that when we click the export icon to export these results to Excel, the first one changes to scientific notation, like 2.01E+08.
Any value that has an e in the middle, such as 20120731*e*0000001, turns into scientific notation.
The rest are just fine.
Any ideas how to fix this?
I want to apologize in advance if I picked the wrong tag in the Tags section, since I am not sure where the fix could come from.
The formatting happens when Excel opens your exported file. Simply set the column's format to text (string) so that the values display in their original form.
When you open the exported data file directly with Excel, all formats are set as General. In General format, Excel applies formatting to what it recognizes as numbers and dates. So, to be clear, the issue is not with your export, but rather with how Excel is reading your data by default. Try the following to get around this issue.
Export to CSV. Then, rather than opening the CSV in Excel, use Excel's 'Get External Data From Text' tool to import the data into an empty workbook. Here you can specify to treat this field as TEXT rather than as GENERAL.
Note that once Excel applies number (or date) format to a cell, the data in the cell has been changed. No application of a new format will bring back your desired data. For this reason you must specify the format before Excel opens the file.
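Why only the value containing e is affected can be shown with a short Python sketch. This is not Excel's parser, just an ordinary floating-point parse that treats e as an exponent marker in the same way. The helper name is hypothetical.

```python
def looks_numeric(text: str) -> bool:
    """Return True if the text can be parsed as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


for value in ["20120731e0000001", "20120731f0000002", "20120731p0000003"]:
    if looks_numeric(value):
        # 20120731e0000001 is read as 20120731 x 10^1
        print(value, "->", f"{float(value):.2E}")
    else:
        print(value, "-> kept as text")
```

Only the first value is a valid number in exponent form, which matches the 2.01E+08 seen after export. The values with f and p cannot be parsed as numbers and stay as text.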
