Export From Teradata Table to CSV - teradata

Is it possible to transfer the data from a Teradata table directly into a .csv file?
The problem is that my table has more than 18 million rows.
If so, please tell me the process.

For a table that size I would suggest using the FastExport utility. It does not natively support a CSV export but you can mimic the behavior.
Teradata SQL Assistant will export to a CSV but it would not be appropriate to use with a table of that size.
BTEQ is another alternative that may be acceptable for a one-time dump of the table.
Do you have access to any of these?
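If a scripting language is an option, another route (not one of the tools named above) is to stream the query result through a DB-API 2.0 driver in batches, so the 18 million rows never sit in memory at once. This is a minimal sketch, assuming a Python driver such as Teradata's `teradatasql` package that exposes the standard `cursor.execute` / `fetchmany` / `description` interface. The function name and the 10,000-row batch size are illustrative choices, not tuned values.

```python
import csv


def export_query_to_csv(cursor, query, out, batch_size=10_000):
    """Run `query` on a DB-API 2.0 cursor and stream the rows to `out`.

    `out` should be a text file opened with newline="" so the csv module
    controls line endings. Rows are fetched in batches, which keeps memory
    use bounded regardless of table size.
    """
    cursor.execute(query)
    writer = csv.writer(out)  # comma delimiter, quotes fields only when needed
    # The first item of each cursor.description entry is the column name.
    writer.writerow([column[0] for column in cursor.description])
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
```

With a connection `con` from your driver, usage would look like `export_query_to_csv(con.cursor(), "SELECT * FROM database_name.table_name", f)`, where `f` is a file opened with `open("table_name.csv", "w", newline="")`.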

It's actually possible to change the delimiter of exported text files within Teradata SQL Assistant, without needing any separate applications:
Go to Tools > Options > Export/Import. From there, you can change the Use this delimiter between column option from {Tab} to ','.
You might also want to set the 'Enclose column data in' option to 'Double Quote', so that any commas in the data itself don't upset the file structure.
Then use the regular text export: File > Export Results, run the query, and select one of the Delimited Text types.
Then you can just use your operating system to manually change the file extension from .txt to .csv.
These instructions are from SQL Assistant version 16.20.0.7.

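I use the following code to export data from a table directly into a .csv file. Note that CREATE EXTERNAL TABLE with SAMEAS, DATAOBJECT and REMOTESOURCE is IBM Netezza's external-table syntax, so confirm that your database accepts it before relying on it.
CREATE EXTERNAL TABLE database_name.new_table_name   -- external table to be created
SAMEAS database_name.existing_table_name             -- existing table whose data is to be exported
USING (DATAOBJECT ('C:\Data\file_name.csv')
       DELIMITER '|'                                 -- use ',' for comma-separated output
       REMOTESOURCE 'ODBC');
-- Inserting into the external table is what writes the rows to the file
INSERT INTO database_name.new_table_name SELECT * FROM database_name.existing_table_name;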

You can use the FastExport utility from Teradata Studio to export the table in CSV format, and you can define the delimiter as well.

Very simple.
The basic idea is to first export the table as a TXT file and then convert the TXT file to CSV in R with read.table() followed by write.csv().
Below are the steps for exporting a Teradata table as a TXT file:
Select the Export option from the File menu.
Select all records from the table you want to export.
Save the result as a TXT file.
Then use R to convert the TXT file to CSV (set the working directory to the location where you saved your big TXT file). The export is tab-delimited by default, so split on tabs only, and turn off quote and comment handling so that apostrophes or # characters in the data are read literally:
my_table <- read.table("File_name.txt", sep = "\t", header = TRUE, fill = TRUE, quote = "", comment.char = "")
write.csv(my_table, file = "File_name.csv", row.names = FALSE)
This worked for a table with 15 million records. Hope it helps.
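If R is not at hand, the same TXT-to-CSV conversion can be done line by line, which avoids loading all the rows into memory as read.table() does. This is a minimal Python sketch, assuming the SQL Assistant export is tab-delimited (the default {Tab} delimiter) with no enclosing quotes; `tab_to_csv` is a hypothetical helper name.

```python
import csv


def tab_to_csv(src, dst):
    """Copy tab-delimited text from `src` to comma-separated CSV in `dst`.

    Both arguments are text file objects opened with newline="".
    QUOTE_NONE on the reader treats any quote characters in the export as
    literal data; the writer adds quotes only where a field needs them.
    """
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(row)
```

Open the input with the same encoding the export used, for example `open("File_name.txt", newline="", encoding="utf-8")`, and the output with `open("File_name.csv", "w", newline="")`.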

Related

Import data csv with particular quotes in R

I have a csv like this:
"Data,""Ultimo"",""Apertura"",""Massimo"",""Minimo"",""Var. %"""
"28.12.2018,""86,66"",""86,66"",""86,93"",""86,32"",""0,07%"""
How can I import this correctly?
I tried read.csv("IT000509408=MI Panoramica.csv", header=T, sep=",", quote="\"") but it doesn't work.
Each row in your file is encoded as a single csv field.
So instead of:
123,"value"
you have:
"123,""value"""
To fix this, read the file as CSV (which gives you one field per row, without the extra quotes), and then write the full value of that field to a new file as plain text (without using a CSV writer). The new file is then ordinary CSV and can be read normally.
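The two-step fix can be sketched in Python: decode the outer CSV layer (one field per row), then parse the recovered text as ordinary CSV. This version parses the inner records in memory instead of writing an intermediate file; `parse_double_encoded_csv` is a hypothetical name.

```python
import csv
import io


def parse_double_encoded_csv(text):
    """Parse CSV in which every physical row is itself one quoted CSV field."""
    inner_records = []
    for outer_row in csv.reader(io.StringIO(text)):
        if outer_row:  # skip blank lines
            # One field per row; its value is the real record with the
            # doubled quotes already collapsed by the outer parse.
            inner_records.append(outer_row[0])
    # Parse the recovered records as ordinary CSV.
    return list(csv.reader(inner_records))
```

The values stay strings, so numbers with a decimal comma such as "86,66" still need converting afterwards.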

How to drop delimiter in hive table

I download the file using the download.file function in R,
save it locally,
and move it to HDFS.
My Hive table is mapped to this file's location in HDFS. The file uses , as the delimiter/separator, which I cannot change with R's download.file function.
One field contains commas in its content. How do I tell Hive to drop this delimiter wherever it appears within the field contents?
I understand we can change the delimiter, but is there really a way to drop it, the way Sqoop allows us to?
My R script is as simple as:
download.file()
hdfs.put
Is there a workaround?
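One possible workaround, assuming the downloaded file wraps every comma-containing field in double quotes (standard CSV quoting), is to rewrite the file between the download and the hdfs.put step so that the embedded commas are dropped. If such fields are not quoted, no tool can tell a delimiter from data and this approach does not apply. This is a minimal Python sketch; `drop_embedded_delimiters` is a hypothetical helper.

```python
import csv


def drop_embedded_delimiters(src, dst, delimiter=","):
    """Rewrite quoted CSV from `src` into `dst` with `delimiter` removed
    from inside field values, so a reader that splits on every delimiter
    sees the right number of columns.

    Both arguments are text file objects opened with newline="".
    """
    reader = csv.reader(src, delimiter=delimiter)
    # QUOTE_MINIMAL still quotes fields that contain quote characters or
    # line breaks; a plain delimiter-splitting table would keep those quotes.
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace(delimiter, "") for field in row])
```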

How do you convert a table that is in a .docx file to an .xlsx or a csv file in python or R?

I have a document like the one mentioned below: there is some text above the table, and then the table. How do I extract the table from the docx file in R or Python and convert it to a CSV or XLSX file? I don't even mind a .txt file if it retains the exact format of the table. I just don't know what to do with this doc file.
If the document is docx, then it is all XML. The docx file is just a zip container with various XML "parts". Take a look at the Open XML SDK for some ideas on how to parse the file. The SDK is for C#, but maybe you can get some ideas from it.
If you are just going to extract the table, it should not be too bad (updating complex docx documents can get very complicated; I'm working on this now). My tip to make things easier is to go to the table properties, then to the Alt Text tab, and add a unique value to the "Title" field. The value will show up like this within the table properties: <w:tblCaption w:val="TBL1"/>, which makes the table easier to find in the XML.
If you are going to work with Open XML documents, get the OOXML Chrome add-in. It is great for exploring the internals of docx files.
Note: I saw the link to another SO answer for this. That uses "automation", which is certainly easier to code, but Office via "automation" on the server is not recommended by MS.
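To illustrate the "it is all XML" approach without the Open XML SDK, here is a minimal Python sketch using only the standard library: it opens the .docx as a zip, reads the word/document.xml part, and returns each table's rows together with its Alt Text title (the w:tblCaption value, or None). Assumptions: nested tables get no special handling (their text is merged into the outer cell and they are also listed separately), and paragraphs inside a cell are concatenated without a separator. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tables_from_document_xml(xml_data):
    """Return a list of (caption, rows) for every <w:tbl> element."""
    root = ET.fromstring(xml_data)
    result = []
    for tbl in root.iter(W + "tbl"):
        cap = tbl.find(f"{W}tblPr/{W}tblCaption")
        caption = cap.get(W + "val") if cap is not None else None
        rows = []
        for tr in tbl.findall(W + "tr"):
            # Join every text run (<w:t>) inside each cell (<w:tc>).
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.findall(W + "tc")])
        result.append((caption, rows))
    return result


def extract_tables(docx_path):
    """A .docx is a zip container; the main body is word/document.xml."""
    with zipfile.ZipFile(docx_path) as zf:
        return tables_from_document_xml(zf.read("word/document.xml"))
```

The rows of the table whose caption is "TBL1" can then be written out with `csv.writer`.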
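You can extract tables from a docx file using python-docx in Python.
Try this:
from docx import Document  # pip install python-docx
import pandas as pd
file_path = "your_file.docx"  # placeholder: path to the input document
document = Document(file_path)
tables = []
for index, table in enumerate(document.tables):
    # Build a grid of cell texts with the same shape as the Word table
    df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            df[i][j] = cell.text
    tables.append(pd.DataFrame(df))
    # One workbook per table (writing .xlsx needs openpyxl installed)
    tables[-1].to_excel("Table# " + str(index) + ".xlsx")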

CSV column formatting while spooling a csv file in sqlplus

How do I correctly export a number-formatted column when I spool to a CSV file from Unix, given that the column is VARCHAR in the database?
The number in the CSV shows as 5.05291E+12
but should actually be 5052909272618.
This problem is not a Unix or ksh problem, but an Excel problem. Assuming Excel is the default application for .csv files, when you double-click the file to open it in Excel, Excel sets the column type to "General" by default, so you get scientific notation instead of the expected text value. If your numeric data has leading zeroes, they will be removed as well.
One way to fix this is to start Excel (this example uses the 2010 version), then go to Data > Get External Data > From Text and follow the wizard, making sure to set the "Column data format" to "Text" for the column that contains your large number (click the column in the "Data preview" section first). Leading zeroes will also be preserved.
You could code a VBA macro that opens the file with all columns as text (a little searching will turn up examples), but there seems to be no setting that tells Excel to treat columns as text by default.
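For the value in the question, the scientific notation is only a display issue: 5052909272618 has 13 digits and is stored intact. Longer identifiers, such as a 19-digit value like 8938190128981938262, are worse, because Excel keeps only 15 significant digits of a number and replaces the remaining digits with zeros, as it does when such a number is typed into a cell. The sketch below approximates what Excel would store for a digit string, assuming CSV import behaves like typing the value; `excel_stored_digits` is a hypothetical helper, not an Excel API.

```python
def excel_stored_digits(digits: str, precision: int = 15) -> str:
    """Approximate what Excel keeps from a long digit string: leading
    zeros are dropped and digits after the 15th significant one become
    zeros."""
    significant = digits.lstrip("0") or "0"
    if len(significant) <= precision:
        return significant
    return significant[:precision] + "0" * (len(significant) - precision)
```

This is why forcing such columns to Text matters for IDs longer than 15 digits, not just for display.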
I needed to develop a report and was facing the same issue. I found one workaround, e.g.:
Your table name: EMPLOYEE
It contains 3 columns, colA, colB and colC, and the problem is with colB. After connecting to SQL*Plus, you can save the data in the spool file as:
select colA || ',''' || colB || ''',' || colC from employee;
Sample result:
3603342979,'8938190128981938262',142796283
When Excel opens the file, it displays the result as-is.

Speed up read.dbf in R (problems with importing large dbf file)

I have a dataset given in .dbf format and need to import it into R.
I haven't worked with this format before, so I have no idea how to export a dbf file with multiple tables into a different format.
A simple read.dbf call has been running for hours with still no results.
I tried to look into speeding up R's performance, but I'm not sure that's the issue; I think the problem is reading the large dbf file itself (it weighs about 1.5 GB), i.e. the command itself must be very inefficient. However, I don't know any other way to deal with this format.
Is there any other option to import the dbf file?
P.S. (NOT an R issue) The source of the dbf file uses Visual FoxPro, but can't export it to another format. I've installed FoxPro, but since I've never used it before, I don't know how to export correctly. I tried a simple "Export to type=XLS" command, but then there is an encoding problem: most of the variables are in Russian Cyrillic and can't be decoded by Excel. In addition, the dbf file contains multiple tables that should be merged into one big table, but I don't know how to export those tables separately to XLS, how to export multiple tables together into XLS or CSV, or how to merge them, as I'm completely new to dbf files (though I have already looked through basic descriptions).
Any help will be highly appreciated. I'm not sure I can provide a sample dataset, as there are many columns when I look at the dbf in FoxPro, and those columns must be merged with other tables from the same dbf file, which I have no idea how to do. (Sorry for the mess.)
You can export from Visual FoxPro to many formats using the COPY TO command in the Command Window, as described in the VFP help file.
For example:
use mydbf in 0
select mydbf
copy to myfile.xls type xl5
copy to myfile.csv type delimited
If you're having language-related issues, you can add an 'as codepage' clause to the end of those commands. For example:
copy to myfile.csv type delimited as codepage 1251
If you are not familiar with VFP I would try to get the raw data out like that, and into a platform that you are familiar with, before attempting merges etc.
To export them in a loop you could use the following in a .PRG file (amending the two path variables at the top to reflect your own setup).
Close All
Clear All
Clear
lcDBFDir = "c:\temp\" && -- Where the DBF files are.
lcOutDir = "c:\temp\export\" && -- Where you want your exported files to go.
lcDBFDir = Addbs(lcDBFDir) && -- In case you forgot the backslash.
lcOutDir = Addbs(lcOutDir)
* -- Get the filenames into an array.
lnFiles = ADir(laFiles, Addbs(lcDBFDir) + "*.DBF")
* -- Process them.
For x = 1 to lnFiles
    lcThisDBF = lcDBFDir + laFiles[x, 1]
    Use (lcThisDBF) In 0 Alias currentfile
    Select currentfile
    Copy To (lcOutDir + Juststem(lcThisDBF) + ".csv") type csv
    Use in Select("Currentfile") && -- Close it.
EndFor
Close All
... and run it from the Command Window - Do myprg.prg or whatever.
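On the R side, read.dbf loads the whole 1.5 GB table into memory. As an alternative route outside FoxPro (not part of the answer above), a .dbf table can be streamed to CSV record by record, decoding text with the Cyrillic code page 1251 mentioned above. This is a minimal Python sketch of the dBase file layout, assuming fixed-width character, numeric, date and logical fields: Visual FoxPro binary types (I, B, Y, T) and memo (M) fields are not decoded correctly, and each call handles one .dbf file. The function names are hypothetical.

```python
import csv
import struct


def read_dbf(stream, encoding="cp1251"):
    """Yield the field names, then one list of strings per live record,
    from a dBase-style .dbf file opened in binary mode."""
    header = stream.read(32)
    # Bytes 4-11: record count (uint32), header length and record length (uint16).
    n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
    fields = []  # (name, width) pairs
    while True:
        desc = stream.read(32)
        # A 0x0D byte terminates the list of 32-byte field descriptors.
        if len(desc) < 32 or desc[:1] == b"\r":
            break
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, desc[16]))  # byte 16 holds the field width
    yield [name for name, _ in fields]
    # Jump to the first record; this skips any extra header bytes
    # (Visual FoxPro stores a backlink area after the terminator).
    stream.seek(header_len)
    for _ in range(n_records):
        record = stream.read(record_len)
        if len(record) < record_len:
            break
        if record[:1] == b"*":  # deletion flag: skip deleted records
            continue
        values, offset = [], 1
        for _, width in fields:
            raw = record[offset:offset + width]
            values.append(raw.decode(encoding, errors="replace").strip())
            offset += width
        yield values


def dbf_to_csv(dbf_path, csv_path, encoding="cp1251"):
    """Stream one .dbf table into a UTF-8 CSV file."""
    with open(dbf_path, "rb") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(read_dbf(src, encoding))
```

Because rows are written as they are read, memory use stays small; merging several exported tables is then easier in R or another familiar tool, as suggested above.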
