I have a problem: when I input the numbers 2, 5, 7 the results are OK, but when I input the numbers 1, 3, 4, 6 the results don't match.
define variable oct as character.
define variable l-oct as integer.
define variable oktal as integer.
define variable l-oktal as integer.
define variable count as integer.
define variable i as integer.
define variable bin as character.
define variable bin2 as character.
end.
do i = length(bin) to 1 by -1:
bin2 = bin2 + substring(bin, i, 1).
end.
display bin2 with frame a down.
Your if else if train is missing an else if. An if else if train is usually better served by a case statement.
Additionally, you first build the binary string back to front, in groups of three digits, and then reverse it digit by digit at the end.
So where octal 1 becomes binary 001 you are flipping this to 100. Instead of flipping afterwards, concatenate your string correctly to start with, so replace all occurrences of:
bin = bin + 'xxx'
with:
bin = 'xxx' + bin
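To illustrate why prepending works, here is a minimal Python sketch (not ABL). It assumes, as the fix above implies, that the digits are processed from the least significant end; the function name octal_to_binary is hypothetical.

```python
def octal_to_binary(octal: int) -> str:
    """Convert a number written with octal digits (e.g. 17) to a binary string."""
    bits = ""
    while True:
        # Take the last (least significant) decimal-written octal digit.
        octal, digit = divmod(octal, 10)
        if digit > 7:
            raise ValueError("not an octal digit: %d" % digit)
        # Prepend the three-bit group so no reversal is needed afterwards.
        bits = format(digit, "03b") + bits
        if octal == 0:
            return bits
```

With this order octal 1 stays 001 instead of being flipped to 100, and palindromic groups such as 010, 101 and 111 are the only ones that survive a per-digit reversal, which is why 2, 5 and 7 looked correct.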
I have a big dataset with a lot of columns, most of which are not numeric. I need to find inconsistencies in the data as well as outliers. Finding the inconsistencies would be easy if the dataset weren't so big (7032 rows to be exact).
An inconsistency would be something like: an ID is supposed to be 4 letters and 4 numbers and I get something else (like 3 numbers and 2 letters); another example would be a number that should be 0 or 1 and I get -1 or 2.
Is there any function that I can use to find the inconsistencies in each column?
For the columns that don't have numeric values, I thought of writing a regex and validating each row of a given column against it, but I didn't find any information on how to do that.
For the part of outliers I did a boxplot to see if I could obtain any outlier, like this:
boxplot(dataset$column)
But the plot didn't show any outliers. Should I be OK with the results from the plot, or should I try something else to see if there really are any outliers in the data?
For the specific examples you've given:
An ID must be four numbers and four letters:
!grepl("^[0-9]{4}-[[:alpha:]]{4}$", ID)
will be TRUE for inconsistent values (^ and $ mean beginning- and end-of-string respectively; {4} means "previous pattern repeats exactly four times"; [0-9] means "any symbol between 0 and 9" (i.e. any numeral); [[:alpha:]] means "any alphabetic character"). Note that this pattern also expects a hyphen between the numbers and the letters. If you only want uppercase letters you could use [A-Z] instead (assuming you are not working in some weird locale like Estonian).
If you need a numeric value to be 0 or 1, then !num_val %in% c(0,1) will work (this will work for any set of allowed values; you can use it for a specific set of allowed character values as well)
If you need a numeric value to be between a and b then !(a < num_val & num_val < b) ...
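For comparison, the same three checks can be sketched in Python. This is only an illustration of the logic, not a replacement for the R expressions above; the function names are hypothetical, and the pattern uses ASCII letters only, which is narrower than what [[:alpha:]] may match in R.

```python
import re

# Four digits, a hyphen, four ASCII letters (mirrors the R pattern above).
ID_PATTERN = re.compile(r"[0-9]{4}-[A-Za-z]{4}")

def invalid_ids(ids):
    """True for every ID that does not match the pattern as a whole."""
    return [ID_PATTERN.fullmatch(value) is None for value in ids]

def not_allowed(values, allowed):
    """True for every value outside the set of allowed values."""
    return [value not in allowed for value in values]

def outside_open_interval(values, a, b):
    """True for every value that is not strictly between a and b."""
    return [not (a < value < b) for value in values]
```

Each function returns one flag per row, so the flags can be used to filter the offending rows of a column.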
I have a mock-up dataframe representing some of the confidential data I have and it looks like this:
Name Value
1. AaaaBaCCCaaa.x 1
2. AbbAbbKalllNBN.y 2
3. CCCdddEfffFg.x 8
4. ZZZtTThGGtGGGG.y 1
...
9. AAAHHHhhhhIIIIII.x 2
10. RRRRmmmmJJJJJJJ.y 3
11. MMMMMnnnnNNNNrrrr.x 4
...
What's important to notice here is that the Name variable contains ordinal numbers (e.g. 1. 2., 10.) at the beginning of the string and either .x or .y at the end of the string. Also, length of the Name variable is not the same in each row.
How can I remove the number from the beginning of each string in the Name variable, along with the period and the space that come after it? It's very important for me to get rid of them because I need to use the separate function on this data afterwards to split on the x and y at the end of the string. If I still have that period after the number at the beginning of the string, separate will fail.
I wanted to use substr but I didn't know how to do it since, for example, 10. is longer than 9. and I don't know which values I would put into the start and stop arguments.
I have two unique numbers: the first in the range 100000 - 999999 (fixed length of 6 chars, [0-9]), the second in the range 1000000 - 9999999 (fixed length of 7 chars, [0-9]). How can I encode/decode these numbers (they need to remain separate after decoding), using only uppercase letters [A-Z] and digits [0-9], with a fixed length of 8 chars in total?
Example:
input -> num_1: 242404, num_2 : 1002000
encode -> AX3B O3XZ
decode -> 2424041002000
Is there any algorithm for this type of problem?
This is just a simple mapping from one set of values to another set of values. The procedure is always the same:
List all possible input and output values.
Find the index of the input.
Return the value of the output list at that index.
Note that it's often not necessary to make an actual list (i.e. loading all values into some data structure). You can typically compute the value for any index on-demand. This case is no different.
Imagine a list of all possible input pairs:
0 100'000, 1'000'000
1 100'000, 1'000'001
2 100'000, 1'000'002
...
K 100'000, 9'999'999
K+1 100'001, 1'000'000
K+2 100'001, 1'000'001
...
N-1 999'999, 9'999'998
N 999'999, 9'999'999
For any given pair (a, b), you can compute its index i in this list like so:
// Make a and b zero-based
a -= 100'000
b -= 1'000'000
// b has 9'000'000 possible values (1'000'000 to 9'999'999), so that is the length of one "row" of the list
i = a*9'000'000 + b
Convert i to base 36 (A-Z and 0-9 gives you 36 symbols), pad on the left with zeros as necessary [1], and insert a space after the fourth digit.
encoded = addSpace(zeroPad(base36(i)))
Note that there are 900'000 * 9'000'000 = 8.1 * 10^12 possible pairs, while eight base 36 digits can represent only 36^8 ≈ 2.8 * 10^12 values. Eight characters therefore suffice only for indices below 36^8; covering both ranges completely takes nine characters.
To get back to the input pair:
Convert the base 36 string to base 10 (this is the index into the list, remember), then derive a and b from the index.
i = base10(removeSpace(encoded))
a = i/9'000'000 + 100'000 // integer division (i.e. ignore remainder)
b = i%9'000'000 + 1'000'000
Here is an implementation in Go: https://play.golang.org/p/KQu9Hcoz5UH
[1] If you don't like the idea of zero padding you can also offset i at this point, provided the offset index still fits in the available digits.
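The index mapping can also be sketched in Python. This is a minimal sketch, not the linked Go code: the names are hypothetical, and it multiplies by the number of possible values of the second number (9,000,000, derived from the stated range) so that every pair gets a unique index. Because not every pair then fits into eight base 36 digits, the sketch raises an error instead of producing a longer string.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 followed by A-Z
A_MIN, A_COUNT = 100_000, 900_000        # first number: 100000..999999
B_MIN, B_COUNT = 1_000_000, 9_000_000    # second number: 1000000..9999999
WIDTH = 8                                # digits in the encoded form

def to_base36(n):
    """Write a non-negative integer with the 36 symbols of ALPHABET."""
    digits = ""
    while n:
        n, remainder = divmod(n, 36)
        digits = ALPHABET[remainder] + digits
    return digits or "0"

def encode(a, b):
    if not (A_MIN <= a < A_MIN + A_COUNT and B_MIN <= b < B_MIN + B_COUNT):
        raise ValueError("input out of range")
    # Index of the pair in the imagined list of all pairs.
    index = (a - A_MIN) * B_COUNT + (b - B_MIN)
    if index >= 36 ** WIDTH:
        raise ValueError("index does not fit in eight base 36 digits")
    padded = to_base36(index).rjust(WIDTH, "0")
    return padded[:4] + " " + padded[4:]

def decode(encoded):
    index = int(encoded.replace(" ", ""), 36)
    a, b = divmod(index, B_COUNT)
    return a + A_MIN, b + B_MIN
```

For example, decode(encode(242404, 1002000)) returns the original pair (242404, 1002000), while encode(999999, 9999999) raises a ValueError because its index exceeds 36^8.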
This is a Teradata-specific question. In the RANDOM function, I want the lower bound to be taken directly from one of the columns, e.g. I want a random value between the age of the subscriber and the present date. So I want to write RANDOM(int_tenure, 0). I am receiving the error below:
"Syntax error, expected something like an integer or a decimal number or a floating point number or '+' or '-' between '(' and the word 'int_tenure'"
RANDOM can only take literals (no field/column names), and the first parameter has to be lower than or equal to the second one. So what you want is not possible directly. But you can work around it: generate a random factor in [0;1] and apply this factor to the interval.
select 10 as lower_bound
,20 as upper_bound
-- ,random(lower_bound, upper_bound) -- will not work
,random(0, 1000)/1000.0000 as RND_Factor -- a random factor between 0 and 1
,(upper_bound-lower_bound)*RND_Factor+lower_bound;
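The arithmetic behind this workaround can be checked outside the database. The following Python sketch mirrors the query above; the function name random_between and the steps parameter are hypothetical, and random.randint stands in for Teradata's RANDOM with literal bounds.

```python
import random

def random_between(lower_bound, upper_bound, steps=1000):
    """Scale a random factor in [0, 1] onto [lower_bound, upper_bound]."""
    # Like random(0, 1000)/1000.0000: one of steps + 1 evenly spaced factors.
    factor = random.randint(0, steps) / steps
    return (upper_bound - lower_bound) * factor + lower_bound
```

With steps=1000 the result can take only 1001 distinct values inside the interval, so a larger literal range gives a finer resolution.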
I am trying to group some price ranges from an .ods file, but have no idea how to do that.
e.g. I have a column with different prices like this:
11,61
6,15
13,68
7,69
6,00
What I want is to tell Calc to group everything from 0,00~10,99 and output text 0-10 and everything from 11,00~20,00 and output text 11-20, so the final output would be:
col1 col2
11,61 11-20
6,15 0-10
13,68 11-20
7,69 0-10
6,00 0-10
You can use the functions ROUNDDOWN() and ROUNDUP() with a negative count to get the previous or next multiple of 10 (-1), 100 (-2) or 1000 (-3). A negative count reduces the precision of the value by powers of 10. So, rounding to the previous or next multiple of 10 is done using:
=ROUNDDOWN(<yourvalue>; -1)
and
=ROUNDUP(<yourvalue>; -1)
respectively (take care to change the formula argument separators to commas (,) if this is required by the locale you're using).
So, =ROUNDDOWN(11,61; -1) will result in 10, and =ROUNDUP(11,61; -1) will give you 20. This way, you can "calculate" the appropriate group for each value (example for value in A1):
=CONCATENATE(ROUNDDOWN($A1; -1)+1;"-";ROUNDUP($A1;-1))
To split it up on multiple lines:
=CONCATENATE( # Result will be a concatenated string
ROUNDDOWN($A1;-1)+1; # first value: previous multiple of 10, +1;
"-"; # second value: literal "-"
ROUNDUP($A1;-1) # third value: next multiple of 10
)
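To check which label a given price receives, the same rounding logic can be sketched in Python. This is only an illustration for non-negative prices; the function name price_group is hypothetical, and math.floor and math.ceil stand in for ROUNDDOWN and ROUNDUP with a count of -1.

```python
import math

def price_group(value, step=10):
    """Label a non-negative price like the CONCATENATE formula above."""
    lower = math.floor(value / step) * step   # previous multiple of step
    upper = math.ceil(value / step) * step    # next multiple of step
    return f"{lower + 1}-{upper}"
```

Note that values below 10 are labelled 1-10 rather than 0-10, and that an exact multiple of 10 such as 20 yields the degenerate label 21-20, because both roundings return the value itself.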
With your example data, this results in:
col1 col2
11,61 11-20
6,15 1-10
13,68 11-20
7,69 1-10
6,00 1-10
EDIT:
For a grouping 0-9, 9-19 and so on, the following formula should work:
=CONCATENATE(ABS(ROUNDDOWN($A2+1; -1)-1);"-";ROUNDUP($A2+1,01;-1)-1)
EDIT2:
For a solution using the IF() function, you could use:
=IF(A2 < 9;"0-9";IF(A2 < 19; "9-19";IF(A2 < 29; "19-29";"more than 29")))
To group values greater than 29, you will have to add further IF clauses, replacing the string "more than 29" with additional checks. Every grouping range requires its own IF clause.