Merging two rowsets with different numbers of columns in U-SQL

I have a rowsetA with 3 columns. I need to add this rowsetA to an existing rowsetB, which has the same 3 columns as well as other columns.
How can I union these 2 rowsets so that the rows from rowsetA get null/empty/default values for the extra columns present in rowsetB?

The easiest way is to add default null values in rowsetA when doing UNION with rowsetB.
@rowsetA = EXTRACT A string,
                   B string,
                   C string
           FROM @path
           USING Extractors.Csv();
@rowsetB = EXTRACT A string,
                   B string,
                   C string,
                   D string,
                   E string
           FROM @path1
           USING Extractors.Csv();
@union = SELECT A, B, C, null AS D, null AS E FROM @rowsetA
         UNION
         SELECT A, B, C, D, E FROM @rowsetB;
This way the missing columns will contain null values.
Note that for other data types such as DateTime, int, etc., you use a typed default such as default(int?) or default(DateTime?) instead of null.
Hope this helps
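The padding idea is independent of U-SQL. As a rough illustration only (plain Python, not U-SQL; the helper name pad_rows and the sample rows are made up for this sketch), each narrower row is given a None for every column it lacks before the two sets are combined:

```python
def pad_rows(rows, columns):
    """Return copies of rows (dicts) that contain every name in columns,
    using None for any column a row does not have."""
    return [{col: row.get(col) for col in columns} for row in rows]


columns = ["A", "B", "C", "D", "E"]
rowset_a = [{"A": "a1", "B": "b1", "C": "c1"}]  # hypothetical sample row
rowset_b = [{"A": "a2", "B": "b2", "C": "c2", "D": "d2", "E": "e2"}]  # hypothetical sample row

# Plain concatenation keeps duplicates (like UNION ALL);
# a plain UNION would additionally remove duplicate rows.
combined = pad_rows(rowset_a, columns) + pad_rows(rowset_b, columns)
for row in combined:
    print(row)
```

The rows coming from rowset_a end up with None in D and E, which corresponds to the null columns in the query above.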

Related

Teradata SQL - Convert values separated by ; in a single column to multiple rows

I have a single column with semicolon-separated values like below.
sel job_dependency from test_table;
job_dependency
1;2;3;4;5;6
I need to convert it into below format in Teradata SQL where each number is a row.
job_dependency
1
2
3
4
5
6
Any help would be really helpful.
There's a table function for this task:
WITH cte AS
(
SELECT
1 AS inKey -- might be a column, either INT or VarChar up to 60 bytes
-- Other data types should be CASTed to VarChar (and back in the Select)
,job_dependency AS inString
FROM test_table
)
SELECT *
FROM TABLE
( StrTok_Split_To_Table(cte.inKey, cte.inString, ';')
RETURNS (outKey INTEGER, -- data type must match input column
tokenNum INTEGER,
token VARCHAR(20))
) AS dt
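To make the shape of the result concrete, here is a small Python sketch of the kind of rows the query above is expected to return for the sample value. It only illustrates the (outKey, tokenNum, token) layout and is not Teradata code; the function name split_to_rows is hypothetical, and the sketch does no special handling of empty tokens.

```python
def split_to_rows(in_key, in_string, delimiter=";"):
    """Yield (out_key, token_num, token) tuples, one per delimited token."""
    for token_num, token in enumerate(in_string.split(delimiter), start=1):
        yield (in_key, token_num, token)


rows = list(split_to_rows(1, "1;2;3;4;5;6"))
for row in rows:
    print(row)
```

Each token becomes its own row, numbered in the order it appeared in the input string.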

JuliaDB select a column from a string, not a symbol

I want to select a column from a JuliaDB table. The problem is that I can't do it with a string (as opposed to a symbol, which starts with :), for example:
db = loadtable("table.dat")
#This table has 3 columns named position_1, position_2, position_3
pos_num = 3
column_name = "position_$pos_num"
select(db,column_name)
If I do that, then the following error appears:
column position_3 not found.
Any suggestion?
You can create a symbol directly with Symbol("position_$pos_num"), or convert your existing string with Symbol(column_name), and pass that to select.

sqlite advanced case sensitive query

I am looking for a special kind of query in SQLite to sort a notes table.
The result from the query should be like this:
id oid
1 1
2 1,1
5 1,1,a
6 1,1,a,1
3 1,1,A
4 1,1,A,1
But with the following code I receive this:
CREATE TABLE note (
id INTEGER PRIMARY KEY AUTOINCREMENT,
created DATETIME DEFAULT CURRENT_TIMESTAMP,
oid VARCHAR unique,
tit VARCHAR,
dsc VARCHAR
);
select id, oid from note
order by oid collate NOCASE
Result:
id oid
1 1
2 1,1
5 1,1,a
3 1,1,A
6 1,1,a,1
4 1,1,A,1
Any suggestions?
Thanks
--jonah
The following transforms the sort keys so that the normal case sensitive ordering yields the requested result:
If there were a togglecase() function that uppercases lowercase letters and lowercases uppercase ones (for example Hello => hELLO), one could ORDER BY togglecase(oid) and the result would be in the requested order.
You could define such a function and expose it to SQLite as a UDF. It could also be possible to write this function using builtin SQLite functions but I don't know them well enough to give an answer using them. The following is an example of such a function in Python:
def togglecase(s):
    def toggle(l):
        if l.isupper():
            return l.lower()
        if l.islower():
            return l.upper()
        return l
    return ''.join(toggle(l) for l in s)
Note that for proper Unicode support it would need to iterate over graphemes, not over code points.
See that this does what I described it to do:
>>> togglecase("1,1,A")
'1,1,a'
>>> togglecase("1,1,a")
'1,1,A'
It is possible to test in Python whether this sorts correctly. The plain case-sensitive sort puts the uppercase entries first:
>>> sorted(["1", "1,1", "1,1,a", "1,1,a,1", "1,1,A", "1,1,A,1"])
['1', '1,1', '1,1,A', '1,1,A,1', '1,1,a', '1,1,a,1']
With togglecase as the sort key, the uppercase follows the lowercase:
>>> sorted(["1", "1,1", "1,1,a", "1,1,a,1", "1,1,A", "1,1,A,1"], key=togglecase)
['1', '1,1', '1,1,a', '1,1,a,1', '1,1,A', '1,1,A,1']
Now if you use it in SQLite like:
SELECT id, oid
FROM note
ORDER BY togglecase(oid)
With the ids from the question, this should result in:
1 "1"
2 "1,1"
5 "1,1,a"
6 "1,1,a,1"
3 "1,1,A"
4 "1,1,A,1"
The code is untested except for the togglecase function.
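One way to try this out is Python's built-in sqlite3 module, whose Connection.create_function registers a Python callable as an SQL function. The sketch below uses an in-memory database and a cut-down note table holding the ids and oid values from the question; it is a minimal demonstration rather than a tuned solution, since the Python function is called for every row being sorted.

```python
import sqlite3


def togglecase(s):
    """Swap the case of every cased character, e.g. 'Hello' -> 'hELLO'."""
    def toggle(ch):
        if ch.isupper():
            return ch.lower()
        if ch.islower():
            return ch.upper()
        return ch
    return "".join(toggle(ch) for ch in s)


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("togglecase", 1, togglecase)

# Cut-down version of the note table (only the columns needed here).
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, oid VARCHAR UNIQUE)")
conn.executemany(
    "INSERT INTO note (id, oid) VALUES (?, ?)",
    [(1, "1"), (2, "1,1"), (3, "1,1,A"), (4, "1,1,A,1"), (5, "1,1,a"), (6, "1,1,a,1")],
)

rows = conn.execute("SELECT id, oid FROM note ORDER BY togglecase(oid)").fetchall()
for row in rows:
    print(row)
```

The sort key is the case-swapped string, compared with SQLite's default case-sensitive collation, so the lowercase entries come out ahead of the uppercase ones.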
You are getting that result because the sorting is specified to be NOCASE. That means that "a" and "A" compare as equal. So first come the rows with "a/A" and nothing after, and then the rows with "a/A" and data after.
If you make the query case sensitive, you will get a different result, but "A" comes before "a" in a case-sensitive sort:
SELECT id, oid
FROM note
ORDER by oid
Results:
1 "1"
2 "1,1"
3 "1,1,A"
4 "1,1,A,1"
5 "1,1,a"
6 "1,1,a,1"

count the unique values in one column in EXCEL 2010 or R with 1 million rows

After searching the forum, I did not find a good solution for this question. If I missed it, please tell me.
I need to count the unique values in one column in EXCEL 2010.
The worksheet has 1 million rows and 10 columns. All cell values are string or numbers.
I used the solution from "Count unique values in a column in Excel":
=SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A1000000,A2:A1000000&""))
But it runs for so long that Excel is almost frozen, and it spawns 25 processes in Win 7.
Are there more efficient ways to do it?
Also, all values in the column have the format
AX_Y
where A is a character, X is an integer, and Y is an integer from 1 to 10.
For example, A5389579_10
I need to cut off everything from the underscore onward. For the example, this leaves
A5389579
This is what I need to count as unique values across all cells in one column.
For example, given
A5389579_10
A1543848_6
A5389579_8
there are 2 unique values after removing the part after the underscore.
How can I do this in Excel VBA or R (if there is no efficient solution in Excel)?
If you want to do this in VBA, you can take advantage of the Collection object. Since the keys in a collection must be unique, adding every input value with itself as the key (and ignoring the error raised for duplicates) leaves you with a collection of distinct values. The code below takes all the values in the selected range and writes the distinct values to another sheet (in this case a sheet named Output).
Sub ReturnDistinct()
Dim Cell As Range
Dim i As Long 'Long rather than Integer: more than 32,767 distinct values would overflow an Integer
Dim DistCol As New Collection
Dim DistArr()
Dim OutSht As Worksheet
Dim LookupVal As String
Set OutSht = ActiveWorkbook.Sheets("Output") '<~~ Define sheet to output array
If TypeName(Selection) <> "Range" Then Exit Sub
'Add all distinct values to collection
For Each Cell In Selection
If InStr(Cell.Value, "_") > 0 Then
LookupVal = Mid(Cell.Value, 1, InStr(Cell.Value, "_") - 1)
Else
LookupVal = Cell.Value
End If
On Error Resume Next
DistCol.Add LookupVal, CStr(LookupVal)
On Error GoTo 0
Next Cell
'Write collection to array
ReDim DistArr(1 To DistCol.Count, 1 To 1)
For i = 1 To DistCol.Count Step 1
DistArr(i, 1) = DistCol.Item(i)
Next i
'Outputs distinct values
OutSht.Range("A1:A" & UBound(DistArr)).Value = DistArr
End Sub
Note that since this code writes all the distinct values to a single column in the OutSht-sheet, this will return an error if there are more than 1,048,576 distinct values in your dataset. In that case you would have to split the data to be filled into multiple output columns.
For your specific request to count, use the function below in a formula like =COUNTA(GetUniques(LEFT(A1:A100000,FIND("_",A1:A100000)-1))), entered as an array formula with Ctrl+Shift+Enter.
It also accepts multiple ranges / values (e.g. GetUniques(A1:A10,B2:E4)).
Function GetUniques(ParamArray args())
Dim arg, ele, arr, i As Long
Dim c As Collection
Set c = New Collection
For Each arg In args
If TypeOf arg Is Range Then
If arg.Count = 1 Then
arr = Array(arg.Value)
Else
arr = arg.Value
End If
ElseIf VarType(arg) > vbArray Then
arr = arg
Else
arr = Array(arg)
End If
For Each ele In arr
On Error Resume Next
c.Add ele, VarType(ele) & "|" & CStr(ele)
On Error GoTo 0
Next ele
Next arg
If c.Count > 0 Then
ReDim arr(0 To c.Count - 1)
For i = 0 To UBound(arr)
arr(i) = c(i + 1)
Next i
Set c = Nothing
GetUniques = arr
End If
End Function
edit: added a performance optimisation for ranges (loads them at once into an array - much faster than enumerating through a range)
In R:
# sample data
df <- data.frame(x=1:1000000,
y=sample(1e6:(1e7-1),1e6,replace=T))
df$y <- paste0("A",df$y,"_",sample(1:10,1e6,replace=T))
# this does the work...
length(unique(sub("_[0-9]+","",df$y)))
# [1] 946442
# and it's fast...
system.time(length(unique(sub("_[0-9]+","",df$y))))
# user system elapsed
# 2.01 0.00 2.02
In Excel 2010, add the following in the next column (if the original data is in A:A, enter it in B1):
=1/COUNTIF(A:A,A1)
and copy it down column B to the bottom of your data. Depending on your PC it may chug away calculating for a long time, but it will work. Then copy column B and paste values over itself.
Then SUM column B.
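The helper-column trick works because a value that occurs k times contributes k cells of 1/k each, which sum to 1, so the column total equals the number of distinct values. Note that, as written, it counts distinct raw cell values, not values with the _Y suffix removed. The Python sketch below checks the arithmetic on the three sample values from the question after stripping the suffix, and compares it with a set-based count; it is only an illustration, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

values = ["A5389579_10", "A1543848_6", "A5389579_8"]

# Strip everything from the first underscore onward.
prefixes = [v.split("_", 1)[0] for v in values]

# Helper-column idea: each cell holds 1 / (number of occurrences of its value);
# the sum over all cells is the number of distinct values.
counts = Counter(prefixes)
helper_column = [Fraction(1, counts[p]) for p in prefixes]
distinct_via_sum = sum(helper_column)

# Direct approach: a set keeps one copy of each value.
distinct_via_set = len(set(prefixes))

print(distinct_via_sum, distinct_via_set)
```

Both counts agree on the sample, matching the 2 unique values stated in the question.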

sqlite returns 0 rows

SELECT skill_name, character_name, cb_id, cb_id2 FROM characterbasics, characterskills WHERE characterbasics.character_name = 'Joe' & characterbasics.cb_id = characterskills.cb_id2
This, for some reason, returns 0 rows.
The character name is in there (as well as 2 other dummy names), and both cbid and cbid2 are the same.
When I try the query without the & cbid=cbid2 part I get the name with the other data. When I check for JUST cbid=cbid2 I get the 3 different dummy characters I created.
I'm trying to pull all "skills" associated with one character by matching the id of the character name in table 1 with the character id in table 2.
Where have I erred?
cn = character name
cn cbid cbid2
Joe 2 2
This is what it SHOULD look like..
You can't use & as the logical AND operator (& is the bitwise AND operator), so the SQL should look like:
SELECT skill_name, character_name, cb_id, cb_id2
FROM characterbasics, characterskills
WHERE characterbasics.character_name = 'Joe' AND characterbasics.cb_id = characterskills.cb_id2
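The difference can be reproduced with Python's built-in sqlite3 module. The table layouts and sample rows below are assumptions made for this sketch (the question does not show its schema). In SQLite, & binds more tightly than =, so the original WHERE clause is not evaluated as two separate comparisons, and for this data it matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, chosen only to mirror the question.
conn.executescript("""
    CREATE TABLE characterbasics (cb_id INTEGER PRIMARY KEY, character_name TEXT);
    CREATE TABLE characterskills (cb_id2 INTEGER, skill_name TEXT);
    INSERT INTO characterbasics VALUES (1, 'Ann'), (2, 'Joe'), (3, 'Bob');
    INSERT INTO characterskills VALUES
        (1, 'archery'), (2, 'stealth'), (2, 'lockpicking'), (3, 'cooking');
""")

query = """
    SELECT skill_name, character_name, cb_id, cb_id2
    FROM characterbasics, characterskills
    WHERE characterbasics.character_name = 'Joe' {op} characterbasics.cb_id = characterskills.cb_id2
"""

rows_bitwise = conn.execute(query.format(op="&")).fetchall()    # original query
rows_logical = conn.execute(query.format(op="AND")).fetchall()  # corrected query

print(len(rows_bitwise))
print(sorted(row[0] for row in rows_logical))
```

With AND, only the skills whose cb_id2 matches Joe's cb_id are returned.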
