Is there a way to convert decimal and alphanumeric values to integer type in PySpark? - pyspark-schema

Example:
Actual salesorgcode | Required
6001.0 | 6001
9001.0 | 9001
7002.0 | 7002
A001 | A001
T001 | T001

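Please try the code below:
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
salesorgcode = [("Finance",6001.0),
("Marketing",9001.0),
("Sales",7002.0),
("IT",8002.0)
]
salesColumns = ["sales_name","sales_id"]
salesDF = spark.createDataFrame(data=salesorgcode, schema = salesColumns)
# Cast the double column to integer, which drops the ".0" part
data_df = salesDF.withColumn("sales_id", salesDF["sales_id"].cast(IntegerType()))
data_df.show(truncate=False)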
+----------+--------+
|sales_name|sales_id|
+----------+--------+
|Finance |6001 |
|Marketing |9001 |
|Sales |7002 |
|IT |8002 |
+----------+--------+
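Note that the cast above only covers the numeric rows from the question. Codes such as A001 and T001 are not numbers, and casting them to an integer typically yields null (or an error in ANSI mode). If the column arrives as strings, one option is to normalize each value instead of casting it. Below is a minimal pure-Python sketch of that rule; `normalize_code` is a hypothetical helper name used for illustration, not part of PySpark.

```python
def normalize_code(value):
    """Return '6001' for 6001.0 or '6001.0'; leave codes such as 'A001' unchanged."""
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text  # alphanumeric code: keep it as it is
    if number.is_integer():
        return str(int(number))  # whole number: drop the trailing ".0"
    return text  # genuine fraction (or nan/inf): leave untouched


codes = [6001.0, "9001.0", "7002.0", "A001", "T001"]
result = [normalize_code(c) for c in codes]
print(result)
```

In PySpark the same rule could probably be applied by wrapping the function in a UDF with a string return type, or, for a string column, by using `regexp_replace` to strip a trailing `.0`. Which one fits depends on how the real column is typed, which the question does not say.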

Related

Kusto complex json with array

This is my source format:
{
"message":[
{"name":"sensorID","value":"5"},
{"name":"eventT","value":"2021-04-16T19:11:26.149Z"},
{"name":"pressure","value":"150"}
]
}
Looking to flatten it out into a table:
sensorID | eventT | pressure
5 | "2021-04-16T19:11:26.149Z" | 150
Cannot for the life of me figure it out.
Splitting the array just gets me a more nested array:
test
| project ray=array_split(message, 1)
And using mv-expand gets me two separate rows:
test
| mv-expand message
At my wits' end. Any help greatly appreciated.
If the schema is unknown in advance, you could try something like this (using mv-apply, summarize make_bag() and bag_unpack()):
datatable(d:dynamic)
[
dynamic({
"message":[
{"name":"sensorID","value":"5"},
{"name":"eventT","value":"2021-04-16T19:11:26.149Z"},
{"name":"pressure","value":"150"}
]}),
dynamic({
"message":[
{"name":"sensorID","value":"55"},
{"name":"eventT","value":"2021-03-16T19:11:26.149Z"},
{"name":"pressure","value":"1515"}
]})
]
| mv-apply d.message on (
summarize b = make_bag(pack(tostring(d_message.name), d_message.value))
)
| project b
| evaluate bag_unpack(b)
eventT | pressure | sensorID
2021-03-16 19:11:26.1490000 | 1515 | 55
2021-04-16 19:11:26.1490000 | 150 | 5
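For readers more used to general-purpose code, the make_bag() and bag_unpack() steps do roughly the following: each name/value pair becomes one key in a per-record dictionary, and each dictionary then becomes one table row. Here is a minimal Python sketch of that reshaping; the `flatten` helper is hypothetical and only illustrates the idea, not how Kusto executes the query.

```python
records = [
    {"message": [
        {"name": "sensorID", "value": "5"},
        {"name": "eventT", "value": "2021-04-16T19:11:26.149Z"},
        {"name": "pressure", "value": "150"},
    ]},
    {"message": [
        {"name": "sensorID", "value": "55"},
        {"name": "eventT", "value": "2021-03-16T19:11:26.149Z"},
        {"name": "pressure", "value": "1515"},
    ]},
]


def flatten(record):
    """Turn a list of {"name": ..., "value": ...} pairs into one flat dict."""
    return {item["name"]: item["value"] for item in record["message"]}


rows = [flatten(r) for r in records]
for row in rows:
    print(row)
```

Because the keys are read from the data, this works without knowing the column names in advance, which is the same property the Kusto answer relies on.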

How to query a date range in SQLite using BETWEEN

I'm trying to query a range between dates. I have tried using the date datatype, storing the values in the date column as strings, and using the date function, but I am not getting the desired results.
CREATE TABLE PvcTable (
date TEXT NOT NULL,
Wardname TEXT NOT NULL,
Puname TEXT NOT NULL,
PvcReceived TEXT,
PRIMARY KEY (
date,
Wardname,
Puname
)
);
The expected result is that when I run, say,
SELECT * from pvctable
where date between '2019-1-1' and '2019-12-1'
order by WARDNAME
I should get all the records between January and December 2019, but instead only 3 records are returned:
| date | Wardname | Puname | PvcReceived |
| ------- | ---------- | ------------------- | ----------- |
| 2019-10 | 01Alagarno | 010KANGARWAPRISCHII | 58 |
| 2019-11 | 02Baga | 001MILEFOUR | 58 |
| 2019-12 | 02Baga | 002DARBASHATA | 58 |
It is important to make sure that the dates in the table have the proper format YYYY-MM-DD which is comparable.
From the sample data you posted I see that there is no DD part in the dates, which is fine if you don't need it, because YYYY-MM is also comparable.
But if there is no DD part then in your query you should not compare the date column with dates containing this part, but with dates in the format YYYY-MM.
So change to this:
SELECT * from pvctable
where date between '2019-01' and '2019-12'
order by WARDNAME
See the demo.
Results:
| date | Wardname | Puname | PvcReceived |
| ------- | ---------- | ------------------- | ----------- |
| 2019-01 | 01Alagarno | 001ALAGARNOPRISCH | 58 |
| 2019-10 | 01Alagarno | 010KANGARWAPRISCHII | 58 |
| 2019-11 | 02Baga | 001MILEFOUR | 58 |
| 2019-12 | 02Baga | 002DARBASHATA | 58 |
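The difference comes from plain text comparison: '2019-01' sorts before '2019-1-1', so the January row falls outside the original range. The following sketch reproduces both queries against an in-memory database using Python's standard sqlite3 module. The four rows are taken from the results shown above, and the original table is assumed to hold at least these.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PvcTable (
        date TEXT NOT NULL,
        Wardname TEXT NOT NULL,
        Puname TEXT NOT NULL,
        PvcReceived TEXT,
        PRIMARY KEY (date, Wardname, Puname)
    )
""")
conn.executemany(
    "INSERT INTO PvcTable VALUES (?, ?, ?, ?)",
    [
        ("2019-01", "01Alagarno", "001ALAGARNOPRISCH", "58"),
        ("2019-10", "01Alagarno", "010KANGARWAPRISCHII", "58"),
        ("2019-11", "02Baga", "001MILEFOUR", "58"),
        ("2019-12", "02Baga", "002DARBASHATA", "58"),
    ],
)

query = "SELECT date FROM PvcTable WHERE date BETWEEN ? AND ? ORDER BY date"

# Bounds in a different format from the stored values: January is dropped.
mismatched = [r[0] for r in conn.execute(query, ("2019-1-1", "2019-12-1"))]

# Bounds in the same YYYY-MM format as the stored values: all rows match.
matched = [r[0] for r in conn.execute(query, ("2019-01", "2019-12"))]

print(mismatched)
print(matched)
```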

SQLite: order by numeric value, not alphabetically

When I order my SQLite table by Classement, I get this:
Classement | Nom
1 | clem
10 | caro
11 | flo
12 | raph
2 | prisc
3 | karim
4 | prout
I would like to get:
Classement | Nom
1 | clem
2 | prisc
3 | karim
4 | prout
10 | caro
11 | flo
12 | raph
Here is my code:
SELECT t.Classement
FROM tableau t
WHERE 1 = (SELECT 1 + COUNT (*) FROM tableau t2 WHERE t2.Classement < t.Classement OR ( t2.Classement == t.Classement AND t2.Nom < t.Nom ))
Can anyone help me?
Thank you!
I guess the Classement column is stored as text rather than as an integer. So try this:
SELECT * FROM tableau ORDER BY cast(Classement as integer);
You get alphabetic order if the values are strings.
To change the table so that all Classement values are numbers, ensure that the column type is not a text type, and use this:
UPDATE tableau SET Classement = CAST(Classement AS NUMBER);
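To see the two orderings side by side, here is a small sketch using Python's standard sqlite3 module and an in-memory database. It assumes Classement is declared as TEXT, which is what the alphabetic ordering in the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the column is declared as TEXT, so values compare as strings.
conn.execute("CREATE TABLE tableau (Classement TEXT, Nom TEXT)")
conn.executemany(
    "INSERT INTO tableau VALUES (?, ?)",
    [("1", "clem"), ("10", "caro"), ("11", "flo"), ("12", "raph"),
     ("2", "prisc"), ("3", "karim"), ("4", "prout")],
)

# Plain ORDER BY compares the stored strings character by character.
as_text = [r[0] for r in conn.execute(
    "SELECT Classement FROM tableau ORDER BY Classement")]

# CAST converts each value to an integer for sorting purposes only.
as_number = [r[0] for r in conn.execute(
    "SELECT Classement FROM tableau ORDER BY CAST(Classement AS INTEGER)")]

print(as_text)
print(as_number)
```

The cast only affects the sort key; the selected values are still returned as the stored strings.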

Levenshtein logic to get all the strings with minimum difference

Suppose I have a data frame with these values:
Mtemp:
-----+
code |
-----+
Ram |
John |
Tracy|
Aman |
I want to compare it with this data frame:
M2:
------+
code |
------+
Vivek |
Girish|
Rum |
Rama |
Johny |
Stacy |
Jon |
I want a result in which each value in Mtemp gets at most 2 possible matches from M2 within a Levenshtein distance of 2.
I have used:
tp<-as.data.frame(amatch(Mtemp$code,M2$code,method = "lv",maxDist = 2))
tp$orig<-Mtemp$code
colnames(tp)<-c('Res','orig')
and I am getting the following result:
Res |orig
-----+-----
3 |Ram
5 |John
6 |Tracy
4 |Aman
Please let me know a way to get 2 values (if possible) for every Mtemp string within a Levenshtein distance of 2.
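The amatch call above reports a single match position per string, so collecting up to two candidates means scoring every pair. A minimal sketch of that idea follows, written in Python rather than R. The helpers `levenshtein` and `closest_matches` are hypothetical names for illustration, and the comparison is assumed to be case-sensitive.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_matches(word, candidates, max_dist=2, limit=2):
    """Return up to `limit` candidates within `max_dist`, nearest first."""
    scored = [(levenshtein(word, c), c) for c in candidates]
    within = [pair for pair in scored if pair[0] <= max_dist]
    within.sort(key=lambda pair: pair[0])  # stable: ties keep candidate order
    return [c for _, c in within[:limit]]


mtemp = ["Ram", "John", "Tracy", "Aman"]
m2 = ["Vivek", "Girish", "Rum", "Rama", "Johny", "Stacy", "Jon"]

result = {word: closest_matches(word, m2) for word in mtemp}
for word, matches in result.items():
    print(word, matches)
```

This compares every Mtemp value with every M2 value, which is fine for small tables but may need a faster approach for large ones.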

.NET Merge Single Column from datarows when ID matches

Just in advance, I have no access to the SQL query written, so all I can do is try to handle the dataset after the query has executed.
I'm using ASP.NET Webforms to try to merge only one column across a DataTable returned from SQL, e.g.
PID | C1 | C2 | C3 | I1
1 | a | a | a | bob
1 | x | x | x | Jim
1 | b | b | b | Fred
2 | g | g | g | Jill
From this Dataset I would like to see:
PID | C1 | C2 | C3 | I1
1 | a | a | a | bob Jim Fred
2 | g | g | g | Jill
Essentially I don't care what is in C1-C3, it will just take the values of the first match. What I need to do though is join all the values of I1 into the one result based on a matching PID.
Any help would be greatly appreciated. LINQ answers acceptable, preferably in vb.net so I don't have to change it later.
Thank you.
You can use GroupBy together with String.Join. I'm adding the answer in C#; you may want to convert it to VB (I'm not very good with VB syntax :P).
var result = dataListObject.GroupBy(l => l.PId)
    // g.Key is the PID itself; C1-C3 come from the first row of each group
    .Select(g => new { PID = g.Key, C1 = g.First().C1, C2 = g.First().C2, C3 = g.First().C3, I1 = string.Join(" ", g.Select(i => i.I1)) });
This selects the PID, takes C1, C2 and C3 from the first matching row, and joins the I1 values together. I haven't checked this, but it seems like it should work.
UPDATE
For a DataTable, you can call AsEnumerable() on it to make it enumerable:
// Adjust the Field<string> type arguments to the actual column types
var result = datatable.AsEnumerable().GroupBy(l => l.Field<string>("PID"))
    .Select(g => new { PID = g.Key, C1 = g.First().Field<string>("C1"), C2 = g.First().Field<string>("C2"), C3 = g.First().Field<string>("C3"), I1 = string.Join(" ", g.Select(i => i.Field<string>("I1"))) });
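The grouping logic itself is language-independent: keep the first row seen for each PID and append each later I1 value to it. A small Python sketch of that logic on the sample data from the question may make the steps easier to port to VB.NET. The dictionary-based row layout is an assumption for illustration, not the DataTable API.

```python
rows = [
    {"PID": 1, "C1": "a", "C2": "a", "C3": "a", "I1": "bob"},
    {"PID": 1, "C1": "x", "C2": "x", "C3": "x", "I1": "Jim"},
    {"PID": 1, "C1": "b", "C2": "b", "C3": "b", "I1": "Fred"},
    {"PID": 2, "C1": "g", "C2": "g", "C3": "g", "I1": "Jill"},
]

merged = {}
for row in rows:
    pid = row["PID"]
    if pid not in merged:
        # First row for this PID: keep its C1-C3 and start the I1 list.
        merged[pid] = dict(row, I1=[row["I1"]])
    else:
        # Later rows only contribute their I1 value.
        merged[pid]["I1"].append(row["I1"])

# Join the collected I1 values with a space, as in the desired output.
result = [dict(r, I1=" ".join(r["I1"])) for r in merged.values()]
for r in result:
    print(r)
```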
