I am running a .set-or-append command to ingest from one table into another. I know the query against the source table is fine, and the target table, if it exists, should match that query's schema; if it doesn't exist, the command should just create it. Initially I was not having a problem with this. But a few of my .set-or-append commands have been getting this error:
Invalid query for distributed set / append / replace operation. Error: Query schema does not match table schema. QuerySchema=('long'), TableSchema=('datetime,string,string,string,string,dynamic,dynamic,dynamic'). Query:...'
I know for a fact that the schemas match. I ran the same command again and again, and on about the 3rd try the call succeeded, which makes zero sense to me. So what is this error, and why did the same command work after failing, with no change to the query whatsoever?
The query/command I am running is essentially the following:
.set-or-append async TargetTable <|
SourceTable
| where __id in ("...", "....", ........) // approximately 250 distinct ids in the in() operator
It appears that the query you're actually running extends the data with an extra column:
extend hashBucket = hash(row_number(), ...) | where hashBucket == ...
and that is why you're getting the schema mismatch.
Perhaps your intention was to filter on the hash bucket; in that case you can filter directly, without the extend:
where hash(row_number(), ...) == ...
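For illustration, a minimal sketch of the filtered form (here hashing an existing column such as __id into a placeholder number of buckets instead of row_number(), purely as an example):
.set-or-append async TargetTable <|
SourceTable
| where hash(__id, 10) == 0 // keep one of 10 hypothetical buckets; no hashBucket column is added, so the output schema stays identical to SourceTable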
Yoni, could you explain what you mean by bag_unpack giving an inconsistent mismatch? I aligned my bag_unpack ... project-reorder to match the target table I'm unpacking into, but it just shuffles a few column types around in the error message:
Query schema does not match table schema.
QuerySchema=(
'datetime,long,datetime,string,string,datetime,string,
long,real,string,bool,guid,guid,string,real'),
TableSchema=(
'datetime,long,datetime,string,string,datetime,string,
long,real,guid,guid,string,bool,string,real')
I'm really confused about what the table schema and query schema even are at this point.
For reference my query is like this:
.set-or-append async apiV2FormationSearchTransform <|
//set notruncation;
apiV2FormationSearchLatest
| where hash(toguid(fullRecord["id"]), 1) == 0
| project fullRecord
| evaluate bag_unpack(fullRecord)
| extend dateCatalogued = todatetime(column_ifexists("dateCatalogued", ""))
, simpleId = tolong(column_ifexists("simpleId", ""))
, dateLastModified = todatetime(column_ifexists("dateLastModified", ""))
, reportedFormationName = tostring(column_ifexists("reportedFormationName", ""))
, comments = tostring(column_ifexists("comments", ""))
, dateCreated = todatetime(column_ifexists("dateCreated", ""))
, formationName = tostring(column_ifexists("formationName", ""))
, internalId = tolong(column_ifexists("internalId", ""))
, topDepth = toreal(column_ifexists("topDepth", ""))
, wellId = column_ifexists("wellId", toguid(""))
, id = column_ifexists("id", toguid(""))
, methodObtained = tostring(column_ifexists("methodObtained", ""))
, isTarget = tobool(column_ifexists("isTarget", ""))
, completionId = tostring(column_ifexists("completionId", ""))
, baseDepth = toreal(column_ifexists("baseDepth", ""))
| project-reorder dateCatalogued
, simpleId
, dateLastModified
, reportedFormationName
, comments
, dateCreated
, formationName
, internalId
, topDepth
, wellId
, id
, methodObtained
, isTarget
, completionId
, baseDepth
and this is the getschema output of my target table (columns: ColumnName, ColumnOrdinal, DataType, ColumnType):
dateCatalogued 0 System.DateTime datetime
simpleId 1 System.Int64 long
dateLastModified 2 System.DateTime datetime
reportedFormationName 3 System.String string
comments 4 System.String string
dateCreated 5 System.DateTime datetime
formationName 6 System.String string
internalId 7 System.Int64 long
topDepth 8 System.Double real
wellId 9 System.Guid guid
id 10 System.Guid guid
methodObtained 11 System.String string
isTarget 12 System.SByte bool
completionId 13 System.String string
baseDepth 14 System.Double real
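For reference, one way to see the two schemas side by side is getschema: run it on the target table, and also append it as the last operator to the exact query body (everything after <|); the two outputs correspond to TableSchema and QuerySchema in the error. For the table side:
apiV2FormationSearchTransform
| getschema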
I had a similar problem. In my case, there was another table in the DB with the same name (although in a different folder; I was using the with (folder = 'foo/bar') <| option). I changed the table name and the error disappeared.
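A quick way to check for that situation is to look the table name up together with its folder (a sketch; "apiV2FormationSearchTransform" stands in for whatever target name you are using):
.show tables
| where TableName == "apiV2FormationSearchTransform"
| project TableName, Folder, DatabaseName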
I'm trying to use the Julia SQLite.jl library, but I can't figure out how to bind variables.
using SQLite
# Create DB and table
db = SQLite.DB("mydb.sqlite")
SQLite.createtable!(db, "Student", Tables.Schema((:Name, :Grade), (String, Int64)); temp=false, ifnotexists=true)
# Add vals
SQLite.execute(db, "INSERT INTO Student VALUES('Harry', 1)")
# Prepared statement: Can use: ?, ?NNN, :AAA, $AAA, @AAA
insert_stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(:N, :G)")
SQLite.bind!(insert_stmt, Dict("N"=>"George", "G"=>"4"))
# This fails, with error: SQLiteException("`:N` not found in values keyword arguments to bind to sql statement")
insert_stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(:N, :G)")
SQLite.bind!(insert_stmt, Dict(:N=>"George", :G=>"4"))
SQLite.execute(insert_stmt)
# This fails, with error: SQLiteException("values should be provided for all query placeholders")
insert_stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(?1, ?2)")
SQLite.bind!(insert_stmt, ["George", "4"])
SQLite.execute(insert_stmt)
# This fails, with error: SQLiteException("values should be provided for all query placeholders")
insert_stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(':N', ':G')")
SQLite.bind!(insert_stmt, Dict(:N=>"George", :G=>"4"))
SQLite.execute(insert_stmt)
# This doesn't bind, it inserts ':N' and ':G'
What's the right syntax? Thanks!
You could try:
stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(?, ?)")
DBInterface.execute(stmt, ["Jack",2])
Let's check if this worked:
julia> DBInterface.execute(db, "SELECT * FROM Student") |> DataFrame
2×2 DataFrame
Row │ Name Grade
│ String Int64
─────┼───────────────
1 │ Harry 1
2 │ Jack 2
(Similar to Przemyslaw Szufel's answer, just with named parameters like in the question.)
The documentation for DBInterface.execute (which the SQLite.execute docs recommend using instead) says:
DBInterface.execute(db::SQLite.DB, sql::String, [params])
DBInterface.execute(stmt::SQLite.Stmt, [params])
Bind any positional (params as Vector or Tuple) or named (params as
NamedTuple or Dict) parameters to an SQL statement, given by db and
sql or as an already prepared statement stmt, execute the query and
return an iterator of result rows.
So you can pass your Dict to execute directly:
julia> insert_stmt = SQLite.Stmt(db, "INSERT INTO Student VALUES(:N, :G)")
julia> DBInterface.execute(insert_stmt, Dict(:N => "Palani", :G => 3))
SQLite.Query(SQLite.Stmt(SQLite.DB("mydb.sqlite"), 1), Base.RefValue{Int32}(101), Symbol[], Type[], Dict{Symbol, Int64}(), Base.RefValue{Int64}(0))
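Per the same documentation, named parameters can also be passed as a NamedTuple instead of a Dict (a small sketch with a placeholder row, not run here):
julia> DBInterface.execute(insert_stmt, (N = "Ravi", G = 2))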
I am creating a recursive CTE in Snowflake to build a complete path, and I am getting the following error:
String 'AAAA_50>BBBB_47>CCCC_92' is too long and would be truncated in 'CONCAT'
My script is as follows (it works fine for 2 levels, but starts failing at the 3rd level):
with recursive plant
(child_col,parent_col,val )
as
(
select child_col, '' parent_col , trim(child_col) from My_view
where condition1 = 'AAA'
union all
select A.child_col,A.parent_col,
concat(trim(A.child_col),'>')||trim(val)
from My_view A
JOIN plant as B ON trim(B.child_col) = trim(A.parent_col)
)
select distinct * from plant
Most likely the child_col data type is defined as VARCHAR(N), and that length limit is being passed on, because CONCAT returns:
The data type of the returned value is the same as the data type of
the input value(s).
So val inherits the declared length of child_col from the anchor member, and once the concatenated path (here 'AAAA_50>BBBB_47>CCCC_92', 23 characters) no longer fits, the query fails rather than silently truncating.
Try explicitly casting to a string, like this: cast(trim(child_col) as string).
Full code:
with recursive plant (child_col,parent_col,val )
as (
select child_col, '' parent_col , cast(trim(child_col) as string)
from My_view
where condition1 = 'AAA'
union all
select A.child_col, A.parent_col, concat(trim(A.child_col),'>')||trim(val)
from My_view A
join plant as B ON trim(B.child_col) = trim(A.parent_col)
)
select distinct * from plant
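If you want to confirm the declared length of child_col in the source first (a quick check, assuming My_view really is a view; use DESCRIBE TABLE for a table), you can run:
describe view My_view;
and look at the type column, which will show something like VARCHAR(23) versus VARCHAR(16777216).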
Remember that recursion in Snowflake is limited to 100 iterations by default.
If you want to increase that limit, you need to contact support.
References: CONCAT, Troubleshooting a Recursive CTE
I am trying to get the top 5 records ordered on a specific value in Cosmos DB, but I am getting stuck on getting the records ordered.
The query is done on the following document:
{
"id": string,
"Compliant": bool,
"DefinitionId": int,
"DefinitionPeriod": string,
"EventDate": date,
"HerdProfileId": int,
"Period": int,
"Value": int
}
What I have tried:
1st try:
SELECT TOP 5 cr.HerdProfileId, cr.Compliant, cr.NonCompliant, cr.NullCompliant FROM (
SELECT
c.HerdProfileId,
SUM(comp) as Compliant,
SUM(noncomp) as NonCompliant,
SUM(nullcomp) as NullCompliant
FROM c
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = true) comp
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = false) noncomp
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = null) nullcomp
WHERE c.Period = 201948
GROUP BY c.HerdProfileId) cr
WHERE cr.NonCompliant > 0
ORDER BY cr.NonCompliant
results in: Unsupported ORDER BY clause. ORDER BY item expression could not be mapped to a document path
2nd try:
SELECT TOP 5 cr.HerdProfileId, cr.Compliant, cr.NonCompliant, cr.NullCompliant FROM (
SELECT
c.HerdProfileId,
SUM(comp) as Compliant,
SUM(noncomp) as NonCompliant,
SUM(nullcomp) as NullCompliant
FROM c
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = true) comp
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = false) noncomp
JOIN(SELECT VALUE COUNT(c.id) FROM c WHERE c.Compliant = null) nullcomp
WHERE c.Period = 201950
GROUP BY c.HerdProfileId
ORDER BY NonCompliant DESC) cr
WHERE cr.NonCompliant > 0
results in: 'ORDER BY' is not supported in presence of GROUP BY
Is there any way to get the data needed or is this just not possible in Cosmos DB and do I need to order the results in code later on?
For the first SQL: 'ORDER BY item expression could not be mapped to a document path'. Please refer to the statements in this blog:
For the second SQL:
ORDER BY can't be used together with GROUP BY so far; please refer to the official statement:
I suppose you have to follow the suggestion from my previous case, How to group by and order by in cosmos db?, and order the results in your code for now, while waiting for GROUP BY with ORDER BY to be supported.
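If you do end up ordering in code, here is a rough sketch with the Python SDK (azure-cosmos); the endpoint, key, database, container names and the simplified grouped query are all placeholders, not taken from the question:
from azure.cosmos import CosmosClient

# Placeholder connection details -- replace with your own account, key, database and container.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

# Run the GROUP BY query without TOP/ORDER BY, then sort and take the top 5 client-side.
query = ("SELECT c.HerdProfileId, COUNT(1) AS NonCompliant "
         "FROM c WHERE c.Period = 201950 AND c.Compliant = false "
         "GROUP BY c.HerdProfileId")
rows = list(container.query_items(query=query, enable_cross_partition_query=True))
top5 = sorted(rows, key=lambda r: r["NonCompliant"], reverse=True)[:5]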
I'm using data in a data frame to try to update a table in an SQLite database that looks like this:
Part | Price
------------
a | 5
b | 9
I am getting a syntax error for this
for(row in 1:nrow(newdata)){dbGetQuery(conn=db,"UPDATE Parts SET Price = ",newdata$Price[row], " WHERE Part = '", newdata$Part[row],"';")}
The exact error I'm getting:
Error in rsqlite_send_query(conn@ptr, statement) : near " ": syntax error
Why is this please?
The query string needs to be built up into a single string before it is passed to dbGetQuery:
for(row in seq_len(nrow(newdata))) {
dbGetQuery(conn=db, sprintf("UPDATE Parts SET Price = %i WHERE Part = '%s';", newdata$Price[row], newdata$Part[row]))
}
It's also possible to accomplish this with paste or paste0, but sprintf can be easier to read.
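Alternatively, a parameterised statement avoids building the SQL string at all and sidesteps quoting and formatting issues; a sketch, assuming a DBI/RSQLite version that supports the params argument:
for (row in seq_len(nrow(newdata))) {
  # Placeholders (?) are bound from params, so values need no manual quoting or formatting
  dbExecute(db, "UPDATE Parts SET Price = ? WHERE Part = ?",
            params = list(newdata$Price[row], newdata$Part[row]))
}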
I am quite new to U-SQL and am trying to solve the following:
str1=\global\europe\Moscow\12345\File1.txt
str2=\global.bee.com\europe\Moscow\12345\File1.txt
str3=\global\europe\amsterdam\54321\File1.Rvt
str4=\global.bee.com\europe\amsterdam\12345\File1.Rvt
Case 1:
How do I get just "\europe\Moscow\12345\File1.txt" from the string variables str1 and str2? I want to take "\europe\Moscow\12345\File1.txt" from str1 and str2, then group by "\global\europe\Moscow\12345" and take the count of distinct files under that path ("\europe\Moscow\12345\").
so the output would be something like this:
distinct_filesby_Location_Date
To solve the above case I tried the U-SQL code below, but I'm not quite sure whether I am writing the right script or not:
@inArray = SELECT new SQL.ARRAY<string>(
filepath.Contains("\\europe")) AS path
FROM @t;
@filesbyloc =
SELECT [ID],
path.Trim() AS path1
FROM @inArray
CROSS APPLY
EXPLODE(path1) AS r(location);
OUTPUT @filesbyloc
TO "/Outputs/distinctfilesbylocation.tsv"
USING Outputters.Tsv();
Any help would be greatly appreciated.
One approach to this is to put all the strings you want to work with in a file, e.g. strings.txt, and save it in your U-SQL input folder. Also have a file with the cities you want to match, e.g. cities.txt. Then try the following U-SQL script:
@input =
EXTRACT filepath string
FROM "/input/strings.txt"
USING Extractors.Tsv();
// Give the strings a row-number
@input =
SELECT ROW_NUMBER() OVER() AS rn,
filepath
FROM @input;
// Get the cities
@cities =
EXTRACT city string
FROM "/input/cities.txt"
USING Extractors.Tsv();
// Ensure there is a lower-case version of city for matching / joining
@cities =
SELECT city,
city.ToLower() AS lowercase_city
FROM @cities;
// Explode the filepath into separate rows
@working =
SELECT rn,
new SQL.ARRAY<string>(filepath.Split('\\')) AS pathElement
FROM @input AS i;
// Explode the filepath string, also changing to lower case
@working =
SELECT rn,
x.pathElement.ToLower() AS pathElement
FROM @working AS i
CROSS APPLY
EXPLODE(pathElement) AS x(pathElement);
// Create the output query, joining on the lower-case city name and displaying the normal-case name
@output =
SELECT c.city,
COUNT( * ) AS records
FROM @working AS w
INNER JOIN
@cities AS c
ON w.pathElement == c.lowercase_city
GROUP BY c.city;
// Output the result
OUTPUT @output TO "/output/output.txt"
USING Outputters.Tsv();
//OUTPUT @working TO "/output/output2.txt"
//USING Outputters.Tsv();
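For reference, strings.txt here would just contain the four example paths from the question, one per line, and cities.txt would list the city names to match, one per line, e.g. (hypothetical contents):
Moscow
amsterdam
Since the join is on the lower-cased value, the casing in cities.txt only affects how the city is displayed.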
My results:
HTH
Taking the liberty of formatting your input file as a TSV file, and not knowing all the column semantics, here is one way to write your query. Please note that I made the assumptions as provided in the comments.
@d =
EXTRACT path string,
user string,
num1 int,
num2 int,
start_date string,
end_date string,
flag string,
year int,
s string,
another_date string
FROM @"\users\temp\citypaths.txt"
USING Extractors.Tsv(encoding: Encoding.Unicode);
// I assume that you have only one DateTime format culture in your file.
// If it becomes dependent on the region or city as expressed in the path, you need to add a lookup.
@d =
SELECT new SqlArray<string>(path.Split('\\')) AS steps,
DateTime.Parse(end_date, new CultureInfo("fr-FR", false)).Date.ToString("yyyy-MM-dd") AS end_date
FROM @d;
// This assumes your paths have a fixed formatting/mapping into the city
@d =
SELECT steps[4].ToLowerInvariant() AS city,
end_date
FROM @d;
@res =
SELECT city,
end_date,
COUNT( * ) AS count
FROM @d
GROUP BY city,
end_date;
OUTPUT @res
TO "/output/result.csv"
USING Outputters.Csv();
// Now let's pivot the date and count.
@res2 =
SELECT city, MAP_AGG(end_date, count) AS date_count
FROM @res
GROUP BY city;
// This assumes you know exactly which dates you are looking for. Otherwise keep it in the first file representation.
@res2 =
SELECT city,
date_count["2016-11-21"] AS [2016-11-21],
date_count["2016-11-22"] AS [2016-11-22]
FROM @res2;
OUTPUT @res2
TO "/output/res2.csv"
USING Outputters.Csv();
UPDATE AFTER RECEIVING SOME EXAMPLE DATA IN PRIVATE EMAIL:
Based on the data you sent me (after the extraction and counting of the cities, which you could do either with the join as outlined in Bob's answer, where you need to know your cities in advance, or by taking the string at the city position in the path as in my example, where you do not need to know the cities in advance), you want to pivot the rowset city, count, date into the rowset date, city1, city2, ..., where each row contains the date and the counts for each city.
You can easily adjust my example above by changing the calculation of @res2 in the following way:
// Now let's pivot the city and count.
@res2 = SELECT end_date, MAP_AGG(city, count) AS city_count
FROM @res
GROUP BY end_date;
// This assumes you know exactly which cities you are looking for. Otherwise keep it in the first file representation or use script generation (see below).
@res2 =
SELECT end_date,
city_count["istanbul"] AS istanbul,
city_count["midlands"] AS midlands,
city_count["belfast"] AS belfast,
city_count["acoustics"] AS acoustics,
city_count["amsterdam"] AS amsterdam
FROM @res2;
Note that, as in my example, you will need to enumerate all cities in the pivot statement by looking them up in the SQL.MAP column. If the cities are not known a priori, you will have to first submit a script that creates the script for you. For example, assuming your city, count, date rowset is in a file (or you could just duplicate the statements that generate the rowset in both the generation script and the generated script), you could write it as the following script. Then take the result and submit it as the actual processing script.
// Get the rowset (could also be the actual calculation from the original file)
@in = EXTRACT city string, count int?, date string
FROM "/users/temp/Revit_Last2Months_Results.tsv"
USING Extractors.Tsv();
// Generate the statements for the preparation of the data before the pivot
@stmts = SELECT * FROM (VALUES
( "@s1", "EXTRACT city string, count int?, date string FROM \"/users/temp/Revit_Last2Months_Results.tsv\" USING Extractors.Tsv();"),
( "@s2", "SELECT date, MAP_AGG(city, count) AS city_count FROM @s1 GROUP BY date;" )
) AS T( stmt_name, stmt);
// Now generate the statement doing the pivot
@cities = SELECT DISTINCT city FROM @in;
@pivots =
SELECT "@s3" AS stmt_name, "SELECT date, "+String.Join(", ", ARRAY_AGG("city_count[\""+city+"\"] AS ["+city+"]"))+ " FROM @s2;" AS stmt
FROM @cities;
// Now generate the OUTPUT statement after the pivot. Note that the OUTPUT does not have a statement name.
@output =
SELECT "OUTPUT @s3 TO \"/output/pivot_gen.tsv\" USING Outputters.Tsv();" AS stmt
FROM (VALUES(1)) AS T(x);
// Now put the statements into one rowset. Note that nulls order high in U-SQL.
@result =
SELECT stmt_name, "=" AS assign, stmt FROM @stmts
UNION ALL SELECT stmt_name, "=" AS assign, stmt FROM @pivots
UNION ALL SELECT (string) null AS stmt_name, (string) null AS assign, stmt FROM @output;
// Now output the statements in order of the stmt_name
OUTPUT @result
TO "/pivot.usql"
ORDER BY stmt_name
USING Outputters.Text(delimiter:' ', quoting:false);
Now download the file and submit it.
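For illustration, if the only distinct cities in the input were (hypothetically) istanbul and amsterdam, the generated /pivot.usql would come out roughly like this:
@s1 = EXTRACT city string, count int?, date string FROM "/users/temp/Revit_Last2Months_Results.tsv" USING Extractors.Tsv();
@s2 = SELECT date, MAP_AGG(city, count) AS city_count FROM @s1 GROUP BY date;
@s3 = SELECT date, city_count["istanbul"] AS [istanbul], city_count["amsterdam"] AS [amsterdam] FROM @s2;
OUTPUT @s3 TO "/output/pivot_gen.tsv" USING Outputters.Tsv();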