Unable to set SESSION CHARSET to UTF16 in BTEQ - Teradata

I have a file that I need to execute within BTEQ that is formatted in UTF16 with BOM. However I get the following error when I try to set the CHARSET before running the file:
.SET SESSION CHARSET 'UTF16';
*** Error: SET SESSION CHARSET UTF16 is not allowed.
I can set this to UTF8, convert the file to UTF8, and it works. I can also convert the file to UTF8 without BOM, skip setting CHARSET, and the script runs. This file is system generated and I'm automating this script deployment, so converting the document is not preferred.
ADDING ADDITIONAL INFO:
The input scripts used for my automation do not have the BOM; they are generated in PowerShell and use the .RUN FILE option to open the file that does have the BOM. Hence the need for the .SET SESSION CHARSET setting.

To start BTEQ in UTF-16 mode, use the command below:
>bteq -c utf16
According to the Teradata documentation, the -c option defines the session character set encoding for a Unicode session and takes an argument which can be any supported character set value.
This can be verified using .show control charset, as shown below:
Teradata BTEQ 16.00.00.02 for WIN32. PID: 1212
Copyright 1984-2016, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
.show control charset;
[SET] SESSION CHARSET = UTF16;
import/export encoding = UTF16;
stdin/stdout encoding = UTF16;
You can check invoking-bteq-to-use-unicode in the Teradata documentation for details.

Related

Converting huge dataframe into a multidimensional array

I have a huge data frame with 10000 rows and 2048 columns. I am trying to convert this into a multi-dimensional array with dimensions 10000[rows]x32x32x2.
Here, each row's data should be converted into a 32x32x2 array.
For example:
1 2 3 4 5 6 7 8 9 10 11
2 a b c d e f g h i j
3 q k l m j g t y u r
4 a e t i o p l m n s
I want to create 10000 arrays with dimension of 32x32x2; here, for example, 5x5x2:
1st array:
a b c d e
f g h i j
2nd array:
q k l m j
g t y u r
I am fairly new to building models in R. Can anyone please help me?
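A minimal sketch of one way to do this in R, assuming the data frame is called df, holds only numeric values, and that each row should simply be filled into one 32x32x2 slice (df and the fill order are assumptions; depending on how the data was originally flattened, the transpose/aperm() steps may need adjusting):
m <- as.matrix(df)                               # 10000 x 2048 numeric matrix
arr <- array(t(m), dim = c(32, 32, 2, nrow(m)))  # each row of m fills one 32x32x2 slice (column-major)
arr <- aperm(arr, c(4, 1, 2, 3))                 # reorder to 10000 x 32 x 32 x 2
dim(arr)                                         # 10000 32 32 2
arr[1, , , ]                                     # the 32x32x2 array built from row 1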

R - Adding a total row in Excel output

I want to add a total row (as in the Excel tables) while writing my data.frame in a worksheet.
Here is my present code (using openxlsx):
writeDataTable(wb=WB, sheet="Data", x=X, withFilter=F, bandedRows=F, firstColumn=T)
X contains a data.frame with 8 character variables and 1 numeric variable. Therefore the total row should only contain a total for the numeric column (it would be best if I could somehow enable the Excel total row feature, like I did with firstColumn, while writing the table to the workbook object, rather than manually adding a total row).
I searched for a solution both on StackOverflow and in the official openxlsx documentation, but to no avail. Please suggest solutions using openxlsx.
EDIT:
Adding data sample:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
After Total row:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
na na na na na na na 22 na
library(janitor)
adorn_totals(df, "row")
#> A B C D E F G H I
#> a b s r t i s 5 j
#> f d t y d r s 9 s
#> w s y s u c k 8 f
#> Total - - - - - - 22 -
If you prefer empty space instead of - in the character columns you can specify fill = "" or fill = NA.
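For example, a small variation of the call above (same df as in the example, just with blanks instead of dashes):
adorn_totals(df, "row", fill = "")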
Assuming your data is stored in a data.frame called df:
df <- read.table(text =
"A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f",
header = TRUE,
stringsAsFactors = FALSE)
You can create a totals row using lapply()
totals <- lapply(df, function(col) {
  # sum numeric columns, leave the character columns as NA
  if (is.numeric(col)) sum(col) else NA
})
and add it to df using rbind()
df <- rbind(df, totals)
head(df)
A B C D E F G H I
1 a b s r t i s 5 j
2 f d t y d r s 9 s
3 w s y s u c k 8 f
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 22 <NA>
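As far as I know, writeDataTable() does not expose Excel's built-in Total Row toggle, so one workaround is to compute the totals in R and write them directly below the table. A minimal sketch, assuming the workbook WB, sheet "Data", and data frame X from the question (the output file name is just for illustration):
library(openxlsx)
WB <- createWorkbook()
addWorksheet(WB, "Data")
writeDataTable(wb = WB, sheet = "Data", x = X,
               withFilter = FALSE, bandedRows = FALSE, firstColumn = TRUE)
# compute totals: sum the numeric columns, leave the character columns empty
totals <- lapply(X, function(col) if (is.numeric(col)) sum(col) else NA)
# the table occupies rows 1 (header) through nrow(X) + 1, so write just below it
writeData(WB, sheet = "Data", x = as.data.frame(totals),
          startRow = nrow(X) + 2, colNames = FALSE)
saveWorkbook(WB, "output.xlsx", overwrite = TRUE)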

Statistical testing for multiple columns from a dataframe

For the data frame below I want to perform Kolmogorov-Smirnov tests for multiple columns. Column ID is the record ID; A-D are factors consisting of two levels ('Other' and A, B, C, D respectively). My test variable is in column E.
Now I would like to perform 4 KS tests:
Distributions of E for column A (A vs O)
Distributions of E for column B (B vs O)
Distributions of E for column C (C vs O)
Distributions of E for column D (D vs O)
In reality, I have 80 columns, so I'm looking for a way to perform these 80 tests simultaneously.
ID A B C D E
1 1 O B C O 1
2 2 O O O O 3
3 3 O O O D 2
4 4 A O C D 7
5 5 A B O O 12
6 6 O O O O 4
7 7 O B O O 8
I hope this solves your problem:
dat <- read.table("path/data.txt") # your data imported into my session
cols <- c("A", "B", "C", "D") # these are your columns with categories; we leave the others out
E <- dat$E # but save the E variable
lapply(cols, function(i) { # evaluate E at each level of each column
  x <- factor(dat[, i])
  a <- E[x == levels(x)[1]]
  b <- E[x == levels(x)[2]]
  ks.test(a, b)
}) # you get a list with the results for each column
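When there are 80 columns, it may be easier to collect just the p-values into a named vector; a sketch reusing dat, cols, and E from above:
pvals <- sapply(cols, function(i) {
  x <- factor(dat[, i])
  ks.test(E[x == levels(x)[1]], E[x == levels(x)[2]])$p.value
})
pvals # named vector of p-values, one per column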

Whole dataset shows up, although a subset has been selected and newly defined

I have a data frame which I have subsetted using normal indexing. Code below:
dframe <- dframe[1:10, c(-3,-7:-10)]
But when I print dframe$Symbol I get the following output:
BABA ORCL LFC TSM ACT ABBV MA ABEV KMI UPS
3285 Levels: A AA AA^B AAC AAN AAP AAT AAV AB ABB ABBV ABC ABEV ABG ABM ABR ABR^A ABR^B ABR^C ABRN ABT ABX ACC ACCO ACE ACG ACH ACI ACM ACN ACP ACRE ACT ACT^A ACW ADC ADM ADPT ADS ADT ADX AEB AEC AED AEE AEG AEH AEK AEL AEM AEO AEP AER AES AES^C AET AF AF^C ... ZX
I'm wondering what is happening here. Does dframe now contain only 10 rows, or does it still contain all rows but only output 10?
Thanks
That's just the way factors work. Your subsetted data frame really does contain only 10 rows; it's the Symbol factor that keeps its full set of levels. When you subset a factor, it preserves all levels, even those that are no longer represented in the subset. For example:
f1 <- factor(letters);
f1;
## [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
f2 <- f1[1:10];
f2;
## [1] a b c d e f g h i j
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
To answer your question, it's actually slightly tricky to append all missing levels to a factor. You have to combine the existing factor data with all missing indexes (here I'm referring to the integer indexes that the factor class internally uses to map the actual factor data to its levels vector, which is stored as an attribute on the factor object), and then rebuild a factor (using the original levels) from that combined data. Below I demonstrate this, now randomizing the subset taken from f1 to demonstrate that order does not matter:
set.seed(1); f3 <- sample(f1,10);
f3;
## [1] g j n u e s w m l b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
factor(c(f3,setdiff(1:nlevels(f3),as.integer(f3))),labels=levels(f3));
## [1] g j n u e s w m l b a c d f h i k o p q r t v x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
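As an aside, if what you actually want after subsetting is the opposite, i.e. keeping only the levels that still occur, base R's droplevels() does that. A quick sketch continuing the example above (dframe is the subsetted data frame from the question):
f4 <- droplevels(f2)
nlevels(f4)   # 10 -- only the levels present in the subset remain
nrow(dframe)  # and the subsetted data frame itself really has just 10 rows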

Remove variables with factor level 1

I am using the gs function in the bnlearn package on my data frame EMGbin. EMGbin contains only factors, with levels ranging from A to Z. It has 600,000 columns and 130 rows. Here is a sample of EMGbin:
V101 V102 V103 V104 V105 V106
1 L M D S O O
2 L M C P A O
3 J M C O O O
4 L N D R A O
5 K M D O A O
6 K M C P O O
7 K N D Q O O
8 L N D R O O
9 L M D O O O
10 K M D S A O
When I run gs(EMGbin), I get the error:
Error in check.data(x) : all factors must have at least two levels.
When I run sapply(EMGbin, nlevels), I can see how many factor levels each of the 600,000 variables has, and some of them are listed as having only 1 level. Would removing the variables with 1 factor level help? So far, the only way I know how to do this is x[, sapply(x, fun) != 1], but I don't know what to substitute for fun.
Use this:
x[, sapply(x, nlevels) > 1]
You can check the number of levels in a factor with the nlevels function.
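Applied to the data in the question, that might look like the following sketch (EMGbin2 is just an illustrative name; drop = FALSE keeps the result a data frame even if only one column survives):
EMGbin2 <- EMGbin[, sapply(EMGbin, nlevels) > 1, drop = FALSE]
gs(EMGbin2)  # gs() should no longer complain about single-level factors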
