Need to encrypt the data in teradata SQL Query - teradata

I have a requirement to encrypt data in Teradata using a SQL query or a stored procedure.
My data looks like the sample below:
May123#34##
AbC##$%1234DE#f
zyx#12
I want the output in the format below:
aaadddpddpp
aaappppddddaapa
aaapdd
So we want to replace each letter with a, each digit with d, and each special character with p.
Your help is highly appreciated.

This looks like data masking, not encryption. Encryption can be undone by decryption, whereas masking is one-way.
You can use REGEXP_REPLACE() to do the swaps:
SELECT REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE('May123#34##', '[a-z]', 'a', 1, 0, 'i'),
           '[0-9]', 'd', 1, 0, 'i'),
         '[^a-z]', 'p', 1, 0, 'i')
result:
aaadddpddpp
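This first swaps all letters to a, then all digits to d. Finally, anything that is not a letter is swapped to p. The a and d written by the first two steps are themselves letters, so the last step leaves them alone.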
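The order of the three passes can be checked outside the database. The sketch below is only an illustration in R, not Teradata SQL; mask_string is a made-up helper name, and the last pattern is written as [^ad] to make explicit that only the two replacement letters are protected.

```r
# Hypothetical helper: same three-pass idea as the nested REGEXP_REPLACE calls.
mask_string <- function(x) {
  x <- gsub("[A-Za-z]", "a", x)  # 1. every letter becomes "a"
  x <- gsub("[0-9]", "d", x)     # 2. every digit becomes "d"
  gsub("[^ad]", "p", x)          # 3. whatever is left becomes "p"
}

samples <- c("May123#34##", "AbC##$%1234DE#f", "zyx#12")
masked <- mask_string(samples)
print(masked)
```

Running it on the three sample strings from the question should reproduce the three requested outputs.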

Related

Key value pair data structure in R where key length is greater than 1 (multiple keys per value)

Is there a sensible way of storing a mapping of key/value pairs where the key is of length > 1?
What I know so far
Where keys are of length 1, we can use a named list, e.g.
mylist <- list(a=c("apple", "alphabet", "allegro"),
b=c("baseball", "brilliant"))
and access the values by using the keys, like so
mylist$a
# [1] "apple" "alphabet" "allegro"
But if the keys are of length greater than 1, e.g. instead of a and b, they were c('a', 'foo', 'bar'), and c('b', 'some', 'thing'), is there a data structure in R that caters to this many to many mapping, so that any one of the elements of a key will map to the relevant values?
From what I understand, what you want is alternate keys to the same element. This is more a problem of designing the best structure than something intrinsic to R.
One solution would be to assign the value to each corresponding key, but that would create redundancy, since the value would be repeated.
A better solution would be to use a preliminary list that translates every possible alias to the single keyword that is used as the key.
So you can have a list of synonyms like:
synonyms <- list(jargon1 = "keyword1", jargon2 = "keyword1", jargon3 = "keyword3")
So both jargon1 and jargon2 would point to the same keyword which could then be used to fetch the correct value from your main list.
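A minimal sketch of that two-step lookup, using the keys from the question. The lookup helper is a made-up name, and listing each canonical key as its own synonym is an assumption made here so that every key goes through the same path.

```r
# The main list is keyed by one canonical key per entry.
mylist <- list(a = c("apple", "alphabet", "allegro"),
               b = c("baseball", "brilliant"))

# Every alias (including the canonical key itself) maps to a canonical key.
synonyms <- list(a = "a", foo = "a", bar = "a",
                 b = "b", some = "b", thing = "b")

# Hypothetical helper: translate the alias first, then fetch the values.
lookup <- function(alias) mylist[[synonyms[[alias]]]]

lookup("foo")
lookup("thing")
```

The values are stored once in mylist; only the short key strings are repeated in synonyms.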
What I would do is create a new master_list that records, for each key, all the alternate names it can take.
master_list <- list(a = c('a', 'foo', 'bar'), b = c('b', 'some', 'thing'))
Now each of the values in master_list can be resolved to one common key in mylist.
mylist <- list(a=c("apple", "alphabet", "allegro"), b=c("baseball", "brilliant"))
This will give minimum redundancy overall.
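The answer does not show how to go from an alternate key back to the common key. One possible way, sketched here under that assumption, is to invert master_list into a named vector; key_of and get_values are made-up names.

```r
master_list <- list(a = c("a", "foo", "bar"), b = c("b", "some", "thing"))
mylist <- list(a = c("apple", "alphabet", "allegro"),
               b = c("baseball", "brilliant"))

# Invert master_list into a named vector: names are the alternate keys,
# values are the common key used in mylist.
key_of <- setNames(rep(names(master_list), lengths(master_list)),
                   unlist(master_list, use.names = FALSE))

# Hypothetical helper: resolve any alternate key, then index mylist.
get_values <- function(key) mylist[[key_of[[key]]]]

get_values("some")
```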

How do I write R code that depends on my data in complex ways?

I'm trying to write complex R code that depends on my data. For example, suppose x = c(1,3,17), but critically I don't know how many elements x has ahead of time. All I know is that x is a vector of integers. I want to use x to create a code block like this:
a = fcn(complex_stuff,
thing(abc = xyz, 1, zyx),
thing(abc = xyz, 3, zyx),
thing(abc = xyz, 17, zyx),
more_complex_stuff
)
I agree my question here is ill-stated, but in case anyone winds up here, I solved this in two steps:
Step 1: make the text I want to insert into the function.
insert_me <- paste("thing(abc = xyz", x, "zyx)", sep=",", collapse=",")
Step 2: insert the text into the function.
The key is to use eval(parse()) in a particular way; see the comments and answer to my question here: How to insert text into an R function?
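Put together, the two steps might look like the sketch below. The definitions of fcn, thing, xyz and zyx are stand-ins invented here so that the example runs; the do.call() variant at the end is an alternative, not part of the original solution, that builds the same call without pasting code together as text.

```r
# Stand-ins (assumptions) so the sketch runs; the real fcn/thing are not shown.
thing <- function(abc, n, z) list(abc = abc, n = n, z = z)
fcn <- function(...) list(...)
xyz <- "xyz"
zyx <- "zyx"

x <- c(1, 3, 17)

# Step 1: build the repeated arguments as text.
insert_me <- paste("thing(abc = xyz", x, "zyx)", sep = ",", collapse = ",")

# Step 2: splice the text into the full call, then parse and evaluate it.
code <- paste0("fcn('complex_stuff',", insert_me, ",'more_complex_stuff')")
a <- eval(parse(text = code))

# Alternative: build the argument list directly and use do.call().
call_args <- c(list("complex_stuff"),
               lapply(x, function(n) thing(abc = xyz, n, zyx)),
               list("more_complex_stuff"))
b <- do.call(fcn, call_args)
```

With these stand-ins both routes produce the same result, and the do.call() version works for any length of x without generating code as a string.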

How can I use toupper in all values of a data frame variable?

I have a data frame that has several thousand records. One field holds a U.S. state abbreviation. The two-character abbreviation can be in capital or small letters. For example, New York state might be either ny or NY. I need all characters to be capital letters, so that the tabulations count all records from one state together. Otherwise, I get two rows per state in a tabulation: one for capital-letter abbreviations and one for small-letter abbreviations.
I have tried a number of options, and so far none have worked. The latest has a data frame called A1, and within that is the State variable. That is, the State variable is A1$State.
I wrote a function like this…
FixVal <- function(a)
{
  tempstring1 = toupper(a)
  return(tempstring1)
}
and then tried several variations on the apply function
apply(A1$State, 1, FixVal(A1$State))
apply(A1$State, 2, FixVal(A1$State))
apply(A1, 1, FixVal(A1))
apply(A1, 1, FixVal(A1$State))
apply(A1$State, 1, FixVal(State))
The error messages are different for each attempt. The most recent error message is that “State” is not found in function toupper.
Since I can print “State” using print(A1$State), I know that it exists and that it is spelled correctly.
What am I doing wrong here?
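toupper() is vectorized, so it can be applied to the whole column in one assignment, with no need for apply(). The apply() attempts run into two problems: A1$State is a plain vector without dimensions, and the third argument has to be a function (FixVal), not the result of calling one (FixVal(A1$State)). A minimal sketch, using a small made-up A1 in place of the real data:

```r
# Made-up stand-in for the real data frame.
A1 <- data.frame(State = c("ny", "NY", "ca", "Ca"), stringsAsFactors = FALSE)

# toupper() works on the whole vector at once.
A1$State <- toupper(A1$State)

# Each state now appears once in the tabulation.
state_counts <- table(A1$State)
print(state_counts)
```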

Apriori Algorithm in R

I have what I thought was a well-prepared dataset. I wanted to use the Apriori Algorithm in R to look for associations and come up with some rules. I have about 16,000 rows (unique customers) and 179 columns that represent various items/categories. The data looks like this:
Cat1 Cat2 Cat3 Cat4 Cat5 ... Cat179
1, 0, 0, 0, 1, ... 0
0, 0, 0, 0, 0, ... 1
0, 1, 1, 0, 0, ... 0
...
I thought having a comma separated file with binary values (1/0) for each customer and category would do the trick, but after I read in the data using:
data5 = read.csv("Z:/CUST_DM/data_test.txt",header = TRUE,sep=",")
and then run this command:
rules = apriori(data5, parameter = list(supp = .001,conf = 0.8))
I get the following error:
Error in asMethod(object):
column(s) 1, 2, 3, ...178 not logical or a factor. Discretize the columns first.
I understand Discretize but not in this context I guess. Everything is a 1 or 0. I've even changed the data from INT to CHAR and received the same error. I also had the customer ID (unique) as column 1 but I understand that isn't necessary when the data is in this form (flat file). I'm sure there is something obvious I'm missing - I'm new to R.
What am I missing? Thanks for your input.
I solved the problem this way: After reading in the data to R I used lapply() to change the data to factors (I think that's what it does). Then I took that data set and created a data frame from it. Then I was able to apply apriori() successfully.
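The lapply() step described above might look like the following sketch; the three-column data set is made up for illustration. One caveat, as far as I know: with factor columns, arules treats every variable/level pair as an item, so Cat1=0 becomes an item alongside Cat1=1.

```r
# Made-up stand-in for the file read with read.csv().
data5 <- read.csv(text = "Cat1,Cat2,Cat3
1,0,0
0,0,1
0,1,1", header = TRUE)

# Convert every column to a factor; assigning into data5[] keeps the
# result a data frame instead of a plain list.
data5[] <- lapply(data5, factor)

str(data5)
```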
Your data is actually already in (dense) matrix format, but read.csv always reads data in as a data.frame. Just coerce the data to a matrix first:
dat <- as.matrix(data5)
rules <- apriori(dat, parameter = list(supp = .001,conf = 0.8))
1s in the data will be interpreted as the presence of the item and 0s as its absence. More information about how to create transactions can be found in the manual page ?transactions.
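A small end-to-end sketch of the matrix route, on made-up data. The support and confidence thresholds are chosen for this toy example rather than taken from the question, and the apriori() call is guarded so the snippet still runs where arules is not installed.

```r
# Made-up stand-in for the file read with read.csv().
data5 <- read.csv(text = "Cat1,Cat2,Cat3
1,0,1
1,0,1
0,1,1
1,0,1", header = TRUE)

# Coerce the data frame to a 0/1 matrix; column names become item labels.
dat <- as.matrix(data5)

# Run apriori() only if arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  rules <- arules::apriori(dat,
                           parameter = list(supp = 0.5, conf = 0.8),
                           control = list(verbose = FALSE))
  print(rules)
}
```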

Express a string as a function in R

I am automating the creation of a series of plots each of which is based on a class of chemicals (e.g., metals, PCBs, etc.); for reasons I'll leave out, I am plotting the legend outside of the plot and using negative values for the inset argument for the legend() function to do this (e.g., inset = c(-0.2, 0)). As each of the chemical classes requires different values for the inset I thought of creating a hash table using the hash package to store the values needed for each chemical class. However, in order to store these in the hash table I was storing the vector of values as a string (e.g., "c(-0.2, 0)").
My code for the hash table looks like this:
legend.hash <- hash(chem.class, c('c(-0.2, 0)', 'c(-0.2, 0)', 'c(-0.25, -0.4)', 'c(-0.25, -0.3)', 'c(-0.2, 0)', 'c(-0.4, -0.2)', 'c(-0.2, 0)', 'c(-0.2, 0)'))
where chem.class is a vector of chemical classes.
The values retrieved from the resulting hash table are, of course, strings such as "c(-0.2, 0)". Is there a way of converting this string so that R interprets it as a function call, so that it could be used like the following: legend(..., inset = legend.hash[[chem.class[i]]])?
Or is there a better way to implement this using the traditional graphics system?
The classic way of executing a string as if it were code is by using eval() and parse():
> eval(parse(text="c(-0.2,0)"))
[1] -0.2 0.0
But I really wonder why you insist on using a hash instead of a simple list.
legend.hash <- list(c(-0.2, 0), c(-0.2, 0), c(-0.25, -0.4), c(-0.25, -0.3),
c(-0.2, 0), c(-0.4, -0.2), c(-0.2, 0), c(-0.2, 0))
names(legend.hash) <- chem.class
would allow you to use the exact construct you're using now, without all the tricky bits and pieces of eval() and parse(), especially given the infamous fortune(106):
> require(fortunes)
> fortune(106)
If the answer is parse() you should usually rethink the question.
-- Thomas Lumley
R-help (February 2005)
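A short sketch of the list-based lookup, with two made-up chemical classes standing in for the real chem.class vector. The values come back as numeric vectors, so they can be passed straight to the inset argument.

```r
# Hypothetical class names; the real chem.class vector is not shown.
chem.class <- c("metals", "PCBs")

# Store the inset vectors directly instead of as strings.
legend.hash <- list(c(-0.2, 0), c(-0.25, -0.4))
names(legend.hash) <- chem.class

# Same indexing construct as with the hash package.
i <- 2
inset <- legend.hash[[chem.class[i]]]
print(inset)
# It can then be used as: legend(..., inset = inset)
```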
It may work better to position your legend using the grconvertX and grconvertY functions rather than using negative insets.
If you really want to convert a string with 2 number values in it into a vector of numbers then consider using the strapply function from the gsubfn package. This way you avoid the parse function and all the potential headaches that come with it. It may also end up being faster.
If you change the strings to just the numbers and a separator (without the 'c' and parentheses), then you could just use as.numeric on the result of strsplit, which may be even faster.
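The strsplit approach could be sketched as follows, assuming a comma as the separator; the class names and the parse_inset helper are made up for illustration.

```r
# Store only the numbers and a separator, without "c(" and ")".
inset_strings <- c(metals = "-0.2,0", PCBs = "-0.25,-0.4")  # hypothetical classes

# Split on the comma and convert the pieces to numbers.
parse_inset <- function(s) as.numeric(strsplit(s, ",", fixed = TRUE)[[1]])

inset <- parse_inset(inset_strings[["PCBs"]])
print(inset)
```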
