I want to combine the topic segments into a single field. Currently I am trying this:
data_format = "json_v2"
[[inputs.mqtt_consumer.json_v2]]
[[inputs.mqtt_consumer.topic_parsing]]
topic = "+/+/+/"
tags = "name/id/value"
fields = "name/id/_"
[[inputs.mqtt_consumer.json_v2.tag]]
path = "timestamp"
This splits the topic into separate name and id values, but I want both combined into one new string that is saved as a single field.
I need to read a 10GB fixed width file to a dataframe. How can I do it using Spark in R?
Suppose my text data is the following:
text <- c("0001BRAjonh ",
"0002USAmarina ",
"0003GBPcharles")
I want the first 4 characters to go into the column "ID" of a data frame, characters 5-7 into the column "Country", and characters 8-14 into the column "Name".
I would use the read.fwf function if the dataset were small, but that is not the case.
I can read the file as a text file using the sparklyr::spark_read_text function, but I don't know how to assign the values in the file to data frame columns properly.
EDIT: Forgot to say substring starts at 1 and array starts at 0, because reasons.
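To make the off-by-one concrete, here is a minimal Python sketch (plain Python, not Spark code) that cuts the sample lines from the question into ID, Country and Name. The `LAYOUT` list and the `parse_line` helper are hypothetical names; the starts are 1-based, as SQL `substring` expects, and are converted to Python's 0-based slices.

```python
# Hypothetical layout: (column name, 1-based start, length), as SQL substring() expects.
LAYOUT = [("ID", 1, 4), ("Country", 5, 3), ("Name", 8, 7)]


def parse_line(line, layout=LAYOUT):
    """Cut one fixed-width line into a dict of stripped column values."""
    record = {}
    for name, start, length in layout:
        begin = start - 1  # 1-based position -> 0-based index
        record[name] = line[begin:begin + length].strip()
    return record


lines = ["0001BRAjonh   ", "0002USAmarina ", "0003GBPcharles"]
rows = [parse_line(line) for line in lines]
for row in rows:
    print(row)
```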
Going through and adding the code I talked about in the comment above.
The process is dynamic and is driven by a Hive table called Input_Table. The table has 5 columns: Table_Name, Column_Name, Column_Ordinal_Position, Column_Start, and Column_Length. It is external, so any user can add, change, or remove files in the folder location. I quickly built this from scratch rather than copying actual code; does everything make sense?
// Load the input DataFrame and the Hive schema table. For the Hive table we keep only the rows for this table, sorted into column order.
val inputDF = spark.read.format(recordFormat).option("header","false").load(folderLocation + "/" + tableName + "." + tableFormat).toDF("Odd_Long_Name")
val inputSchemaDF = spark.sql("select * from Input_Table where Table_Name = '" + tableName + "'").sort($"Column_Ordinal_Position")
// Build the arrays from the columns: rdd -> map -> collect turns a DataFrame column into an array of strings that can be iterated over.
val columnNameArray = inputSchemaDF.selectExpr("Column_Name").rdd.map(x=>x.mkString).collect
val columnStartArray = inputSchemaDF.selectExpr("Column_Start").rdd.map(x=>x.mkString).collect
val columnLengthArray = inputSchemaDF.selectExpr("Column_Length").rdd.map(x=>x.mkString).collect
// Create the iterator and the other variables that are meant to be overwritten
var columnAllocationIterator = 1
var localCommand = ""
var commandArray = Array("")
// Loop once per column in the input table
while (columnAllocationIterator <= columnNameArray.length) {
  // Overwrite the string command with the new command; thought Odd_Long_Name was too accurate to not place into the code
  localCommand = "substring(Odd_Long_Name, " + columnStartArray(columnAllocationIterator-1) + ", " + columnLengthArray(columnAllocationIterator-1) + ") as " + columnNameArray(columnAllocationIterator-1)
  // On the first pass overwrite the command array, otherwise append to it
  if (columnAllocationIterator==1) {
    commandArray = Array(localCommand)
  } else {
    commandArray = commandArray ++ Array(localCommand)
  }
  // I really like iterating my iterators like this
  columnAllocationIterator = columnAllocationIterator + 1
}
// Run every element of the string array as its own expression against the table
val finalDF = inputDF.selectExpr(commandArray:_*)
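The loop above only builds a list of SQL snippets, one per column. A rough Python sketch of the same idea follows, with a hard-coded `schema` list standing in for the rows of Input_Table; the values and the `build_select_exprs` name are made up for illustration.

```python
# Stand-in for Input_Table rows, already sorted by ordinal position:
# (Column_Name, Column_Start, Column_Length). Values are illustrative only.
schema = [("ID", 1, 4), ("Country", 5, 3), ("Name", 8, 7)]


def build_select_exprs(schema, source_col="Odd_Long_Name"):
    """Return one 'substring(col, start, length) as name' string per column."""
    return [
        "substring({0}, {1}, {2}) as {3}".format(source_col, start, length, name)
        for name, start, length in schema
    ]


exprs = build_select_exprs(schema)
for expr in exprs:
    print(expr)
```

In PySpark a list like this could be handed to `df.selectExpr(*exprs)`, which mirrors the `selectExpr(commandArray:_*)` call in the Scala version.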
In a U-SQL script I need to extract a value from a file into a dataset and then use it to build the name of the output file. How can I get the value out of the dataset?
In detail:
I have 2 input files: a csv file with a set of fields and a dictionary file. The first file has a name like ****ClientCode*****.csv. The second file, the dictionary, has 2 fields that map ClientCode to ClientCode2. My task is to extract the ClientCode value from the file name, get ClientCode2 from the dictionary, insert it as a field into the output file (implemented), and also form the name of the output file as ****ClientCode2****.csv.
The dictionary csv file has the following content:
OldCode NewCode
6HAA Alfa
CCVV Beta
CVXX gamma
? Davis
The question is: how do I get ClientCode2 into a scalar variable so that I can use it in the expression for the output file name?
DECLARE @inputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ESP_FAD_GL_6HAA_20170930.txt"; // '6HAA' is the ClientCode here; it is mapped to another code in ClientCode_KVP.csv
DECLARE @outputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ClientCode_sftp_" + // 'ClientCode' should be replaced with the mapped code from ClientCode_KVP.csv
    DateTime.Now.ToString("yyyyMMdd") + "_" +
    DateTime.Now.ToString("HHmmss") + ".csv";
DECLARE @dictionaryFile string = "D:/DFS_SSC_Automation/ClientCode_KVP.csv";
@dict =
    EXTRACT [OldCode] string,
            [NewCode] string
    FROM @dictionaryFile
    USING Extractors.Text(skipFirstNRows : 1, delimiter : ',');
// Rank each dictionary row: 1 if its OldCode occurs in the input file name, 3 otherwise; the empty fallback row gets rank 2
@theCode =
    SELECT Path.GetFileNameWithoutExtension(@inputFile).IndexOf([OldCode]) >= 0 ? 1 : 3 AS [CodeExists],
           [NewCode]
    FROM @dict
    UNION
    SELECT *
    FROM(
        VALUES
        (
            2,
            ""
        )) AS t([CodeExists],[NewCode]);
// Keep only the best-ranked row
@code =
    SELECT [NewCode]
    FROM @theCode
    ORDER BY [CodeExists]
    FETCH 1 ROWS;
@GLdata =
    EXTRACT [ASAT] string,
            [ASOF] string,
            [BASIS_INDICATOR] string,
            [CALENDAR_DATE] string,
            [CR_EOP_AMOUNT] string,
            [DR_EOP_AMOUNT] string,
            [FUND_ID] string,
            [GL_ACCT_TYPE_IND] string,
            [TRANS_CLIENT_FUND_NUM] string
    FROM @inputFile
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);
// Prepare output dataset
@FundInfoGL =
    SELECT "" AS [AccountPeriodEnd],
           "" AS [ClientCode],
           [FUND_ID] AS [FundCode],
           SUM(GL_ACCT_TYPE_IND == "A"? System.Convert.ToDecimal(DR_EOP_AMOUNT) : 0) AS [NetValueOtherAssets],
           SUM(GL_ACCT_TYPE_IND == "L"? System.Convert.ToDecimal(CR_EOP_AMOUNT) : 0) AS [NetValueOtherLiabilities],
           0.0000 AS [NetAssetsOfSeries]
    FROM @GLdata
    GROUP BY FUND_ID;
// NetAssetsOfSeries calculation
@FundInfoGLOut =
    SELECT [AccountPeriodEnd],
           [NewCode] AS [ClientCode],
           [FundCode],
           Convert.ToString([NetValueOtherAssets]) AS [NetValueOtherAssets],
           Convert.ToString([NetValueOtherLiabilities]) AS [NetValueOtherLiabilities],
           Convert.ToString([NetValueOtherAssets] - [NetValueOtherLiabilities]) AS [NetAssetsOfSeries]
    FROM @FundInfoGL
    CROSS JOIN @code;
// Output
OUTPUT @FundInfoGLOut
TO @outputFile
USING Outputters.Text(outputHeader : true, delimiter : '|', quoting : false);
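Outside of U-SQL, the lookup performed by the theCode and code rowsets can be sketched in Python. This only illustrates the ranking logic (match = 1, empty fallback = 2, no match = 3, take the lowest rank); the `resolve_new_code` function and the in-memory `mapping` list are assumptions for the example and are not part of the script.

```python
import os


def resolve_new_code(input_path, mapping):
    """Return the NewCode whose OldCode occurs in the file name, else ''."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    # Rank candidates like the script does: 1 = match, 3 = no match.
    candidates = [(1 if old in stem else 3, new) for old, new in mapping]
    # The empty fallback row gets rank 2, so it wins only when nothing matches.
    candidates.append((2, ""))
    # min() on the rank alone keeps the first candidate with the lowest rank.
    return min(candidates, key=lambda pair: pair[0])[1]


mapping = [("6HAA", "Alfa"), ("CCVV", "Beta"), ("CVXX", "gamma")]
code = resolve_new_code("ESP_FAD_GL_6HAA_20170930.txt", mapping)
print(code)
```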
As David points out: You cannot assign query results to scalar variables.
However, we have a dynamic partitioned output feature in private preview right now that will give you the ability to generate file names based on column values. Please contact me if you want to try it out.
You can't. Please see Convert Rowset variables to scalar value.
You may still be able to achieve your ultimate goal in a different manner. Please consider re-writing your post with clear & concise language, small dataset, expected output, and a very minimal amount of code needed to repro - remove all details and nuances that aren't necessary to create a test case.
So I have a root model that looks like this:
class Contact(ndb.Model):
    first_name = ndb.StringProperty()
    last_name = ndb.StringProperty()
    age = ndb.IntegerProperty()
and a child model that looks like this:
class Address(ndb.Model):
    address_type = ndb.StringProperty(choices=['Home','Office','School'],default='Home')
    street = ndb.StringProperty()
    city = ndb.StringProperty()
    state = ndb.StringProperty()
I want to be able to perform a query similar to this:
Select first_name, last_name, street, city, state WHERE contact.age > 25 and address.city = 'Miami' and address_type = 'School'
I know I could perform searches more easily if I set up the addresses as a structured property within the contact model, but I don't like using Structured Properties because they don't have their own keys, which makes entity maintenance more challenging.
I tried doing a search for contacts first and then feeding the resulting keys into a WHERE IN clause but it didn't work, example:
query1 = Contact.query(Contact.age>25).iter(keys_only = True)
query2 = Address.query(Address.city=='Miami', Address.address_type=='School',Address.ancestor.IN(query1))
Any ideas as to how to go about this would be appreciated.
OK so it looks like my original idea of filtering one query by passing in the keys of another will work. The problem is that you can't perform a WHERE-IN clause against an ancestor property so you have to store the parent key as a standard ndb.KeyProperty() inside of the child entity, then perform the WHERE-IN clause against that KeyProperty field.
Here's an example that will work directly from the interactive console in the Appengine SDK:
from google.appengine.ext import ndb

class Contact(ndb.Model):
    first_name = ndb.StringProperty()
    last_name = ndb.StringProperty()
    age = ndb.IntegerProperty()

class Address(ndb.Model):
    address_type = ndb.StringProperty(choices=['Home','Office','School'],default='Home')
    street = ndb.StringProperty()
    city = ndb.StringProperty()
    state = ndb.StringProperty()
    contact = ndb.KeyProperty()  # parent key stored as a regular property so it can be used with IN

# Contact 1
contact1 = Contact(first_name='Homer', last_name='Simpson', age=45)
contact1_result = contact1.put()
contact1_address1 = Address(address_type='Home',street='742 Evergreen Terrace', city='Springfield', state='Illinois', contact=contact1_result, parent=contact1_result)
contact1_address1.put()
contact1_address2 = Address(address_type='Office',street=' 1 Industry Row', city='Springfield', state='Illinois', contact=contact1_result, parent=contact1_result)
contact1_address2.put()

# Contact 2
contact2 = Contact(first_name='Peter', last_name='Griffan', age=42)
contact2_result = contact2.put()
contact2_address1 = Address(address_type='Home',street='31 Spooner Street', city='Quahog', state='Rhode Island', contact=contact2_result, parent=contact2_result)
contact2_address1.put()

# This gets the keys of all the contacts that are over the age of 25
qry1 = Contact.query(Contact.age>25).fetch(keys_only=True)
# This query gets all addresses of type 'Home' where the contacts are in the result set of qry1
qry2 = Address.query(Address.address_type=='Home').filter(Address.contact.IN(qry1))
for item in qry2:
    contact = item.contact.get()  # fetch the referenced contact once per address
    print('Contact: %s,%s,%s,%s' % (contact.first_name, contact.last_name, item.address_type, item.street))
This will render a result that looks kinda like this:
Contact: Peter,Griffan,Home,31 Spooner Street
Contact: Homer,Simpson,Home,742 Evergreen Terrace
Can you use an Ancestor query?
query1 = Contact.query(Contact.age>25).iter(keys_only = True)
for contact in query1:
    query2 = Address.query(Address.city=='Miami',
                           Address.address_type=='School',
                           ancestor=contact)
If that's not efficient enough, how about filtering the addresses?
query1 = Contact.query(Contact.age>25).iter(keys_only = True)
contacts = set(query1)
query2 = Address.query(Address.city=='Miami', Address.address_type=='School')
addresses = [address for address in query2 if address.key.parent() in contacts]
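The second variant is an in-memory join: collect the parent keys once, then keep only the addresses whose parent is in that set. Here is a datastore-free sketch of that pattern, with plain tuples standing in for keys and entities; `ContactRow`, `AddressRow` and the sample data are made up for illustration.

```python
from collections import namedtuple

# Plain stand-ins for datastore entities; parent_id plays the role of key.parent().
ContactRow = namedtuple("ContactRow", "id first_name age")
AddressRow = namedtuple("AddressRow", "parent_id address_type city")

contacts = [ContactRow(1, "Ann", 30), ContactRow(2, "Bob", 20)]
addresses = [
    AddressRow(1, "School", "Miami"),
    AddressRow(2, "School", "Miami"),
    AddressRow(1, "Home", "Miami"),
]

# Step 1: ids of contacts older than 25 (a set gives fast membership tests).
wanted = {c.id for c in contacts if c.age > 25}

# Step 2: filter addresses on their own fields, then on the parent id.
matches = [
    a for a in addresses
    if a.city == "Miami" and a.address_type == "School" and a.parent_id in wanted
]
print(matches)
```

The trade-off is that every Miami school address is read before the parent filter is applied, so this suits cases where the address filter is already selective.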
Here are my many-to-many relationship models:
class ModelA(ndb.Model):
    name = ndb.StringProperty(required=True)
    # kind given as a string because ModelB is not defined yet at this point
    model_b = ndb.KeyProperty(kind='ModelB',repeated=True)

class ModelB(ndb.Model):
    name = ndb.StringProperty(required=True)
    model_a = ndb.KeyProperty(kind=ModelA,repeated=True)
My question is, how do I add/update/delete a single (or many) KeyProperty from let's say model_b?
I managed to do it like this:
pos = ModelA.model_b.index(ndb.Key('ModelB',213)) # Get position from list
ModelA.model_b.pop(pos) # Remove from list
ModelA.put() # Update
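Since a repeated KeyProperty behaves like a Python list on an entity instance, adding and removing a reference comes down to list operations on both sides of the relationship. Below is a minimal sketch with plain lists and string ids standing in for entities and keys; the `link` and `unlink` helpers are hypothetical, and with real entities each side would also need a put() afterwards.

```python
def link(a_refs, b_refs, a_id, b_id):
    """Add the relationship on both sides, skipping duplicates."""
    if b_id not in a_refs:
        a_refs.append(b_id)
    if a_id not in b_refs:
        b_refs.append(a_id)


def unlink(a_refs, b_refs, a_id, b_id):
    """Remove the relationship on both sides if it is present."""
    if b_id in a_refs:
        a_refs.remove(b_id)
    if a_id in b_refs:
        b_refs.remove(a_id)


a_model_b = []  # stands in for one ModelA entity's model_b list
b_model_a = []  # stands in for one ModelB entity's model_a list
link(a_model_b, b_model_a, "a1", "b213")
link(a_model_b, b_model_a, "a1", "b213")  # second call changes nothing
print(a_model_b, b_model_a)
unlink(a_model_b, b_model_a, "a1", "b213")
print(a_model_b, b_model_a)
```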
I want to know how to work with collections in cqlengine.
I can insert a value into a list, but only one value, so I can't append further values to my list.
I want to do this:
In CQL3:
UPDATE users
SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
In CqlEngine:
connection.setup(['127.0.0.1:9160'])
TestModel.create(id=1,field1 = [2])
This code adds 2 to my list, but when I insert a new value it replaces the old value in the list.
The only help I found in the cqlengine docs:
https://cqlengine.readthedocs.org/en/latest/topics/columns.html#collection-type-columns
I also want to know how to read a collection field with cqlengine.
Is it a dictionary in my Django project? How can I use it?
Please help.
Thanks
Looking at your example it's a list.
Given a table based on the Cassandra CQL documentation:
CREATE TABLE plays (
id text PRIMARY KEY,
game text,
players int,
scores list<int>
)
You have to declare the model like this:
class Plays(Model):
    id = columns.Text(primary_key=True)
    game = columns.Text()
    players = columns.Integer()
    scores = columns.List(columns.Integer())
You can create a new entry like this (omitting the code how to connect):
Plays.create(id = '123-afde', game = 'quake', players = 3, scores = [1, 2, 3])
Then to update the list of scores one does:
play = Plays.objects.filter(id = '123-afde').get()
play.scores.append(20) # <- this will add a new entry at the end of the list
play.save() # <- this will propagate the update to Cassandra - don't forget it
Now if you query your data with the CQL client you should see new values:
 id       | game  | players | scores
----------+-------+---------+---------------
 123-afde | quake |       3 | [1, 2, 3, 20]
To get the values in Python you can simply index the list:
print("Length is %(len)s and 3rd element is %(val)d" %
      {"len": len(play.scores), "val": play.scores[2]})
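The CQL statement in the question prepends (`[ 'the shire' ] + top_places`), whereas `append` adds at the end of the list. The difference, and one way an update could be classified by comparing the old and new list, is sketched below in plain Python. `classify_update` is a hypothetical helper for illustration only and is not part of cqlengine's API.

```python
def classify_update(old, new):
    """Describe how `new` was derived from `old`: append, prepend, or replace."""
    n = len(old)
    if new[:n] == old:
        # new starts with old, so the extra items were added at the end
        return ("append", new[n:])
    if n <= len(new) and new[len(new) - n:] == old:
        # new ends with old, so the extra items were added at the front
        return ("prepend", new[:len(new) - n])
    return ("replace", new)


scores = [1, 2, 3]
appended = scores + [20]   # what scores.append(20) produces
prepended = [20] + scores  # the CQL "[x] + column" form
print(classify_update(scores, appended))
print(classify_update(scores, prepended))
```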