Teradata Parallel Transporter: Quoted CSV Null Values

I am converting a JSON file into multiple CSV files, which I then want to load into a Teradata database using Teradata Parallel Transporter.
Since I didn't find any other option than quoting every column (otherwise there would be various problems with escape characters), my CSV files now look similar to this:
"abc"|"def"|""|"x"|""
My problem is that the empty columns ("") do not become NULL in my table, but empty strings.
Without the quotation marks they would be NULL. Is there an easy way to "automatically" turn these columns into NULLs?

Related

Dump CSV files to Postgres and read in R while maintaining column data types

I'm new to R and currently working on a project refactoring code that reads from CSV files so that it reads from a database instead.
The work includes dumping the CSV files into a Postgres database and modifying the existing R scripts to ingest their input data from the db tables instead of the CSV files for the subsequent transformations.
Right now I've run into an issue: the dataframe columns returned from dbGetQuery() have different modes and classes than the original dataframe from read_csv().
Since the data I'm reading in has hundreds of columns, it is not that convenient to explicitly specify the mode and class for each column.
Is there an easy way to give the new dataframe the same schema as the old one, so I can apply the existing data transformation code to it?
I.e. when I run a comparison between the old dataframe and the new one from the db, this is what I see:
VARIABLE   CLASS (from csv)   CLASS (from db)
--------   ----------------   ---------------
col1       numeric            integer64
col2       numeric            integer
col3       numeric            integer
This won't be possible in general, because some SQL datatypes (e.g. DATE, TIMESTAMP, INTERVAL) have no equivalent in R, and the R data type factor has no equivalent in SQL. Depending on your R version, strings are automatically converted to factors, so it will at least be useful to import the data with stringsAsFactors=FALSE.
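If the differences are only in how the columns are stored (integer/integer64 from the database versus numeric from the csv), one workaround is to coerce the columns of the database result to the classes of the csv-based data frame in a loop, rather than spelling out hundreds of column types by hand. A minimal sketch, assuming con is an already open DBI connection and that old.csv and my_table are placeholders for your actual file and table:

library(DBI)
library(bit64)   # so integer64 columns convert cleanly to numeric

old <- read.csv("old.csv", stringsAsFactors = FALSE)   # reference schema from the csv
new <- dbGetQuery(con, "SELECT * FROM my_table")       # 'con' is an open DBI connection (assumed)

# coerce every shared column in the db result to the class it has in the csv
for (nm in intersect(names(old), names(new))) {
  target <- class(old[[nm]])[1]
  new[[nm]] <- switch(target,
                      numeric   = as.numeric(new[[nm]]),
                      integer   = as.integer(new[[nm]]),
                      character = as.character(new[[nm]]),
                      new[[nm]])                        # leave other classes as-is
}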

How to fix "Each row of output must be identified by a unique combination of keys" error in R

I'm new to R. I have uBiome data (in CSV) that I want to convert to phyloseq. I've been trying to use an R package called Actino, but whenever I call the actino::experiment_to_phyloseq() function, "Error: Each row of output must be identified by a unique combination of keys" shows up. It also says "Keys are shared for 2956 rows", along with a list of row pairs.
I have two files: the csv file (taxannotation.csv) and the mapfile (mapfile.csv). My csv file contains the columns ssr, tax_name, tax_rank, count, and percent.
The mapfile contains the ssrs on the first column similar to those in the csv file along with other attributes.
I use the code
taxannotation.ps<-experiment_to_phyloseq(taxannotation,mapfile)
While the ssrs in my csv file repeat in different rows, I believe that the other columns such as tax_name, tax_rank, count, and percent all give a different identity to each row.
Already tried searching for an answer, but never really found one that's informative or helpful.
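The error usually comes from the reshaping step inside the conversion, which expects each key combination (most likely the ssr plus the taxon) to occur only once. I don't know Actino's internals, so treat this as a rough dplyr sketch for finding the colliding rows and, if it makes sense for your data, collapsing them before converting; the column names are the ones from the question, and summing the duplicated counts is only one possible way to resolve them:

library(dplyr)

taxannotation <- read.csv("taxannotation.csv", stringsAsFactors = FALSE)

# rows sharing the same (ssr, tax_name, tax_rank) combination -- the likely
# "keys" the reshaping step is complaining about
taxannotation %>%
  count(ssr, tax_name, tax_rank) %>%
  filter(n > 1)

# one possible fix: collapse the duplicates by summing their counts
deduped <- taxannotation %>%
  group_by(ssr, tax_name, tax_rank) %>%
  summarise(count = sum(count), percent = sum(percent), .groups = "drop")

The de-duplicated frame (together with the mapfile) could then be passed to experiment_to_phyloseq() in place of the original data.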

Read a PostgreSQL local file into R in chunks and export to .csv

I have a local file in PostgreSQL format that I would like to read into R in chunks and export as .csv.
I know this might be a simple question but I'm not at all familiar with PostgreSQL or SQL. I've tried different things using R libraries like RPostgreSQL, RSQLite and sqldf but I couldn't get my head around this.
If your final goal is to create a csv file, you can do it directly using PostgreSQL.
You can run something similar to this:
COPY my_table TO 'C:\my_table.csv' DELIMITER ',' CSV HEADER;
Sorry if I misunderstood your requirement.
If the requirement is to programmatically create a very large .csv file from scratch and populate it with data from a database, I would use the following approach.
Step 1 - isolate the database data into a single table with an auto incrementing primary key field. Whether you always use the same table or create and drop one each time depends on the possibility of concurrent use of the program.
Step 2 - create the .csv file with your programming code. It can either be empty, or have column headers, depending on whether or not you need column headers.
Step 3 - get the minimum and maximum primary key values from your table.
Step 4 - set up a loop in your programming code using the values from Step 3. Inside the loop:
query the table to get x rows
append those rows to your file
increment the variables that control your loop
Step 5 - Do whatever you have to do with the file. Don't try to read it with your programming code.
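If the chunking ends up being done from R after all (the question mentions RPostgreSQL), the same loop can also be written with DBI's incremental fetch instead of an explicit primary-key window. A rough sketch, assuming the dump has already been restored into a Postgres table called my_table and that the connection details are replaced with your own:

library(DBI)
library(RPostgres)   # RPostgreSQL works along the same lines

# connection parameters below are placeholders
con <- dbConnect(RPostgres::Postgres(), dbname = "mydb",
                 host = "localhost", user = "me", password = "secret")

res   <- dbSendQuery(con, "SELECT * FROM my_table")
first <- TRUE
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 10000)             # 10,000 rows per pass
  write.table(chunk, "my_table.csv", sep = ",",
              row.names = FALSE,
              col.names = first,               # write the header only once
              append    = !first)
  first <- FALSE
}
dbClearResult(res)
dbDisconnect(con)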

Export gridview to csv

I am trying to export a GridView to CSV.
I am adding the actual records into the cells, not converting HTML to Excel.
I am executing a stored procedure and loading the records into a DataTable.
Then I loop through the DataTable and write to the file using a StreamWriter.
But a problem occurs when a column holds a long number: the CSV shows something like 890+32 instead.
I don't want that; I want the actual number, e.g. 89012345676898899998776766544333445556677.
How can I do that? I am not using Gridview.RenderControl(htmltextwrtter).
Try casting the number to a string and adding a leading ' to force Excel to treat the entry as text, like this:
'89012345676898899998776766544333445556677
Otherwise Excel will see them as numbers (which they are, of course) and auto-format them into the unwanted notation.

CSV column formatting while spooling a csv file in sqlplus

How do I extract a number-formatted column when I spool to a CSV file from Unix, given that the column is VARCHAR in the database?
The number appears in the CSV as 5.05291E+12, but it should actually be 5052909272618.
This is not a Unix or ksh problem, but an Excel problem. Assuming Excel is the default application for .csv files, when you double-click to open the file, Excel sets the column type to "General" by default and you get scientific notation instead of the text value you expected. If your numeric data has leading zeroes, they will be removed as well.
One way to fix it is to start Excel (this example uses the 2010 version), then go to Data / Get External Data / From Text and follow the wizard, making sure to set the "Column data format" to "Text" for the column that contains your large number (click on the column in the "Data preview" section first). Leading zeroes will be preserved as well.
You could also write a VBA macro that opens the file with all columns as text (a little searching will turn up examples), but there seems to be no way to tell Excel to treat columns as "text" by default.
I needed to develop a report and was facing the same issue; I found one workaround. For example:
say your table EMPLOYEE contains three columns colA, colB, colC and the problem is with colB. After connecting to SQL*Plus you can save the data in the spool file as:
select colA|| ','''||colB||''',' ||colC from employee
sample result:
3603342979,'8938190128981938262',142796283
When Excel opens the file, it displays the result as-is.

Resources