In Teradata SQL Assistant I run a query and export the results to a CSV file.
This CSV file then needs to be imported into an Access database.
The problem is that only files smaller than 2 GB can be imported into an Access database.
So the question is: is it possible to cut the big CSV file into smaller pieces (for example 2 GB each) during the export?
1) Export the query results to a CSV file (2 GB) from Teradata SQL Assistant. File name: export_file.csv
2) Open a shell console. I prefer MSYS.
3) Type the command in the console: split -l 2000 export_file.csv (2000 = number of lines per file)
4) Your file will be split into several smaller files.
5) You can then import these files into the Access database.
I suggest you use shell utilities.
Use split - e.g. to split a file every 2000 lines (this should give you many files):
split -l 2000 <file_name>
For more, see split-properties.
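Splitting on a line count only indirectly controls the output size. If the goal is pieces just under the 2 GB Access limit, GNU split (the version MSYS ships) can also split by size while keeping whole lines intact. A minimal sketch, assuming a reasonably recent GNU coreutils (the --additional-suffix option in particular is GNU-specific); export_file.csv is the file name from the steps above and export_part_ is just an arbitrary output prefix:
# Split into pieces of at most 1900 MB without breaking a line in the middle;
# -d gives numeric suffixes, --additional-suffix keeps a .csv extension.
split -C 1900M -d --additional-suffix=.csv export_file.csv export_part_
Note that only the first piece will contain the CSV header row, which may matter when importing into Access.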
I have many files (with no extension) that I would like to open with R and extract data.frames from.
It says in the header that it is a
--
-- PostgreSQL database dump
It has many tables inside it, and the only noticeable pattern I detected is that at the end of each table there is a "." (see the screenshot).
Is there a smart way to import this file and extract/break it into meaningful data frames?
Thank you in advance!
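For background: in a plain-format pg_dump, each table's data is stored as a COPY ... FROM stdin; block whose rows end with a line containing just \. — which matches the "." pattern described above. One possible approach, sketched here under the assumption that it really is a plain-text dump with tab-separated COPY data, is to break the dump into per-table files with awk before loading them into R:
# Write each table's COPY block to <table>.tsv (tab-separated, no header row, NULLs as \N).
# A block ends at a line containing only "\.".
awk '
  /^COPY / { n = split($2, parts, "."); out = parts[n] ".tsv"; intable = 1; next }
  /^\\\.$/ { intable = 0; close(out); next }
  intable  { print > out }
' dumpfile
Each resulting .tsv could then be read with read.delim(file, header = FALSE, sep = "\t"); the column names would have to come from the CREATE TABLE statements in the dump.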
I have a large file stored on a Linux server. I don't want to transfer the file onto my laptop and then read it into R; I was hoping I could read the large file into R without storing it on my laptop (as my storage is nearly full). The file I want to read into RStudio is located at my university file path: /data/genes/h3/PROs_GWAS/output_PROs.bgen
The file is not a txt file but a genotype file, i.e. the extension is .bgen.
I have tried the command below:
d = read.table( pipe ('ssh hkj7#spectre2.le.ac.uk "ls /data/genes/h3/PROs_GWAS/output_PROs.bgen"'), header = T )
However, this prompts me for a password and then gives an error, which I am assuming is because read.table treats the file as a txt file.
Error in read.table(pipe("ssh hkj7#spectre2.le.ac.uk \"ls /data/genes/h3/PROs_GWAS/output_PROs.bgen\""), :
no lines available in input
I am not sure how to get round this.
Any help will be greatly appreciated!
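One detail worth noting in the attempt above: ls only prints the file's path, not its contents, so nothing from the file ever reaches read.table. Streaming the bytes over ssh would use cat instead, roughly as sketched below (user@host stands in for the real account). Even then, .bgen is a binary genotype format, so read.table cannot parse it and a dedicated BGEN reader would be needed.
# 'ls' prints only the path; 'cat' streams the file's contents over ssh.
# Here we just peek at the first 16 bytes to confirm the stream works.
ssh user@host "cat /data/genes/h3/PROs_GWAS/output_PROs.bgen" | head -c 16 | od -An -tx1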
I have a dataset given in .dbf format and need to import it into R.
I haven't worked with this extension before, so I have no idea how to export a dbf file with multiple tables into a different format.
A simple read.dbf has been running for hours and still shows no results.
I tried looking into speeding up R performance, but I'm not sure that's the issue; I think the problem is reading the large dbf file itself (~1.5 GB), i.e. the command itself must just not be efficient. However, I don't know any other way to deal with this dataset format.
Is there any other option for importing the dbf file?
P.S. (NOT AN R ISSUE) The source of the dbf file uses Visual FoxPro but can't export it to another format. I've installed FoxPro, but given that I've never used it before, I don't know how to export the data the right way. I tried a simple "Export to type=XLS" command, but then there is an encoding problem: most of the variables are in Russian Cyrillic and can't be decoded by Excel. In addition, the dbf file contains multiple tables that should be merged into one big table, but I don't know how to export those tables separately to XLS, how to export multiple tables together into XLS or CSV, or how to merge them, as I'm absolutely new to dbf files (though I have already looked through the basic descriptions).
Any help will be highly appreciated. I'm not sure whether I can provide a sample dataset, as there are many columns when I look at the dbf in FoxPro, and those columns must be merged with other tables from the same dbf file, and I have no idea how to do that. (Sorry for the mess.)
You can export from Visual FoxPro in many formats using the COPY TO command via the Command Window, as described in the VFP help file.
For example:
use mydbf in 0
select mydbf
copy to myfile.xls type xl5
copy to myfile.csv type delimited
If you're having language-related issues, you can add an 'as codepage' clause to the end of those. For example:
copy to myfile.csv type delimited as codepage 1251
If you are not familiar with VFP I would try to get the raw data out like that, and into a platform that you are familiar with, before attempting merges etc.
To export them in a loop you could use the following in a .PRG file (amending the two path variables at the top to reflect your own setup).
Close All
Clear All
Clear
lcDBFDir = "c:\temp\" && -- Where the DBF files are.
lcOutDir = "c:\temp\export\" && -- Where you want your exported files to go.
lcDBFDir = Addbs(lcDBFDir) && -- In case you forgot the backslash.
lcOutDir = Addbs(lcOutDir)
* -- Get the filenames into an array.
lnFiles = ADir(laFiles, Addbs(lcDBFDir) + "*.DBF")
* -- Process them.
For x = 1 to lnFiles
lcThisDBF = lcDBFDir + laFiles[x, 1]
Use (lcThisDBF) In 0 Alias currentfile
Select currentfile
Copy To (lcOutDir + Juststem(lcThisDBF) + ".csv") type csv
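* -- If Cyrillic text comes out garbled, "as codepage 1251" can be appended to the Copy To line above (see the earlier example).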
Use in Select("Currentfile") && -- Close it.
EndFor
Close All
... and run it from the Command Window - Do myprg.prg or whatever.
Is it possible to transfer the data from a Teradata table into a .csv file directly?
The problem is that my table has more than 18 million rows.
If yes, please tell me the process.
For a table that size I would suggest using the FastExport utility. It does not natively support a CSV export but you can mimic the behavior.
Teradata SQL Assistant will export to a CSV but it would not be appropriate to use with a table of that size.
BTEQ is another alternative that may be acceptable for a one-time dump of the table.
Do you have access to any of these?
It's actually possible to change the delimiter of exported text files within Teradata SQL Assistant, without needing any separate applications:
Go to Tools > Options > Export/Import. From there, you can change the 'Use this delimiter between column' option from {Tab} to ','.
You might also want to set the 'Enclose column data in' option to 'Double Quote', so that any commas in the data itself don't upset the file structure.
From there, you use the regular text export: File > Export Results, run the query, and select one of the Delimited Text types.
Then you can just use your operating system to manually change the file extension from .txt to .csv.
These instructions are from SQL Assistant version 16.20.0.7.
I use the following code to export data from a Teradata table into a .csv file directly.
CREATE EXTERNAL TABLE
database_name.table_name (to be created) SAMEAS database_name.table_name (already existing, whose data is to be exported)
USING (DATAOBJECT ('C:\Data\file_name.csv')
DELIMITER '|' REMOTESOURCE 'ODBC');
You can use the FastExport utility from Teradata Studio to export the table in CSV format. You can define the delimiter as well.
Very simple.
The basic idea would be to export the table as a TXT file first and then convert the TXT to CSV using R: read.table() -> write.csv().
Below are the steps for exporting the TD table as a TXT file:
Select the export option from the File menu
Select all records from the table you want to export
Save it as a TXT file
Then use R to convert the TXT file to CSV (set the working directory to the location where you saved your big TXT file):
my_table<-read.table("File_name.txt", fill = TRUE, header = TRUE)
write.csv(my_table,file = "File_name.csv")
This worked for a table of 15 million records. Hope it helps.
I have two pdf files and two text files which are converted into EBCDIC format. The two text files act like cover files for the pdf files, containing details like the pdf name, number of pages, etc. in a fixed format.
Cover1.det, Firstpdf.pdf, Cover2.det, Secondpdf.pdf
Format of the cover file could be:
Firstpdf.pdf|22|03/31/2012
that is
pdfname|page num|date generated
which is then converted into EBCDIC format.
I want to merge all these files into a single file in the order: first text file, first pdf file, second text file, second pdf file.
The idea is then to push this single merged file to the mainframe using scp.
1) How do I merge the above-mentioned four files into a single file?
2) Do I need to convert the pdf files to EBCDIC format as well? If yes, how?
3) As far as I know, mainframe files also need record-length details during transit. How do I find out the record length of the file if I do succeed in merging them into a single file?
I remember reading somewhere that it could be done using put and append in FTP. However, since I have to use scp, I am not sure how to achieve this merging.
Thanks for reading.
1) Why not use something like pkzip?
2) I don't think converting the pdf files to EBCDIC is necessary, or even possible. The files need to be transferred in binary mode.
3) Using pkzip and scp you will not need the record length.
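As a rough illustration of that route, using Info-ZIP's zip as a stand-in for pkzip (the archive name, target path, and account are placeholders):
# Bundle the four files into one archive, then push it with scp.
zip bundle.zip Cover1.det Firstpdf.pdf Cover2.det Secondpdf.pdf
scp bundle.zip user@mainframe:/some/target/path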
File merging can easily be achieved by using the cat command in Unix with the > and >> append operators.
Also, if the next file should start on a new line (as was my case), a blank echo can be inserted between files, as in the sketch below.
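A minimal sketch of that approach, using the file names from the question (the output name and the scp target are placeholders; the cover files are assumed to be in EBCDIC already):
# Concatenate the cover files and PDFs in the required order.
cat Cover1.det > merged.bin        # '>' creates/overwrites the output file
cat Firstpdf.pdf >> merged.bin     # '>>' appends to it
echo "" >> merged.bin              # blank echo so the next part starts on a new line
cat Cover2.det >> merged.bin
cat Secondpdf.pdf >> merged.bin
# Push the merged file to the mainframe.
scp merged.bin user@mainframe:/some/target/path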