Convert FluxJust into JsonObject - azure-cosmosdb

I am reading stream data from Cosmos DB. It returns the results as a paged flux, which I converted to FluxJust. Now I want the result as a JsonObject.
val pagedFlux = container.queryItems(QUERY, queryOptions, classOf[JsonNode])
val cosmosQueryResponseObjectAsAJSONArray = pagedFlux.byPage(preferredPageSize).map(page => Flux.just(page.getResults.stream())).collectList().block()

Related

How to get/build a JavaRDD[DataSet]?

I am using deeplearning4j and trying to train a model in Spark with:
public MultiLayerNetwork fit(JavaRDD<DataSet> trainingData)
fit() needs a JavaRDD<DataSet> parameter.
I tried to build one like this:
val totalDaset = csv.map(row => {
val features = Array(
row.getAs[String](0).toDouble, row.getAs[String](1).toDouble
)
val labels = Array(row.getAs[String](21).toDouble)
val featuresINDA = Nd4j.create(features)
val labelsINDA = Nd4j.create(labels)
new DataSet(featuresINDA, labelsINDA)
})
but IDEA reports: No implicit arguments of type: Encoder[DataSet]
This is an error and I don't know how to solve it.
I know a Spark RDD can be converted to a JavaRDD, but I don't know how to build a Spark RDD[DataSet].
DataSet is imported from org.nd4j.linalg.dataset.DataSet
Its constructor is:
public DataSet(INDArray first, INDArray second) {
this(first, second, (INDArray)null, (INDArray)null);
}
This is my code:
val spark:SparkSession = {SparkSession
.builder()
.master("local")
.appName("Spark LSTM Emotion Analysis")
.getOrCreate()
}
import spark.implicits._
val JavaSC = JavaSparkContext.fromSparkContext(spark.sparkContext)
val csv=spark.read.format("csv")
.option("header","true")
.option("sep",",")
.load("/home/hadoop/sparkjobs/LReg/data.csv")
val totalDataset = csv.map(row => {
val features = Array(
row.getAs[String](0).toDouble, row.getAs[String](1).toDouble
)
val labels = Array(row.getAs[String](21).toDouble)
val featuresINDA = Nd4j.create(features)
val labelsINDA = Nd4j.create(labels)
new DataSet(featuresINDA, labelsINDA)
})
val data = totalDataset.toJavaRDD
The official deeplearning4j guide creates a JavaRDD<DataSet> in Java like this:
String filePath = "hdfs:///your/path/some_csv_file.csv";
JavaSparkContext sc = new JavaSparkContext();
JavaRDD<String> rddString = sc.textFile(filePath);
RecordReader recordReader = new CSVRecordReader(',');
JavaRDD<List<Writable>> rddWritables = rddString.map(new StringToWritablesFunction(recordReader));
int labelIndex = 5; //Labels: a single integer representing the class index in column number 5
int numLabelClasses = 10; //10 classes for the label
JavaRDD<DataSet> rddDataSetClassification = rddWritables.map(new DataVecDataSetFunction(labelIndex, numLabelClasses, false));
I tried to do the same in Scala:
val JavaSC: JavaSparkContext = new JavaSparkContext()
val rddString: JavaRDD[String] = JavaSC.textFile("/home/hadoop/sparkjobs/LReg/hf-data.csv")
val recordReader: CSVRecordReader = new CSVRecordReader(',')
val rddWritables: JavaRDD[List[Writable]] = rddString.map(new StringToWritablesFunction(recordReader))
val featureColnum = 3
val labelColnum = 1
val d = new DataVecDataSetFunction(featureColnum,labelColnum,true,null,null)
// val rddDataSet: JavaRDD[DataSet] = rddWritables.map(new DataVecDataSetFunction(featureColnum,labelColnum, true,null,null))
// cannot resolve overloaded method 'map'
Debug error information:
A DataSet is just a pair of INDArrays (inputs and labels).
Our docs cover this in depth:
https://deeplearning4j.konduit.ai/distributed-deep-learning/data-howto
For Stack Overflow's sake, I'll summarize what's there, since there is no single way to create a data pipeline; it depends on your problem. It's very similar to how you would create a dataset locally: generally you take whatever you do locally and put it into Spark inside a function.
CSVs and images, for example, are going to be very different, but generally you use the DataVec library to do that. The docs summarize the approach for each kind.
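To make the "pair of arrays" idea concrete, here is a minimal plain-Java sketch. It is not DL4J code: double[] stands in for INDArray, and LabeledRow and parseRow are hypothetical names made up for illustration. It shows the kind of per-row function you would write locally first and then run inside a Spark map, wrapping the two arrays with Nd4j.create and new DataSet(features, labels) as in the question.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a DataSet: one feature array plus one label array. */
final class LabeledRow {
    final double[] features;
    final double[] labels;

    LabeledRow(double[] features, double[] labels) {
        this.features = features;
        this.labels = labels;
    }

    /**
     * Parses one comma-separated line. The column at labelIndex becomes the label;
     * every other column becomes a feature, in its original order.
     */
    static LabeledRow parseRow(String csvLine, int labelIndex) {
        String[] cells = csvLine.split(",");
        if (labelIndex < 0 || labelIndex >= cells.length) {
            throw new IllegalArgumentException("labelIndex out of range: " + labelIndex);
        }
        double[] features = new double[cells.length - 1];
        double[] labels = new double[1];
        int f = 0;
        for (int i = 0; i < cells.length; i++) {
            double value = Double.parseDouble(cells[i].trim());
            if (i == labelIndex) {
                labels[0] = value;
            } else {
                features[f++] = value;
            }
        }
        return new LabeledRow(features, labels);
    }

    public static void main(String[] args) {
        // Made-up sample row: two features followed by the label.
        LabeledRow row = parseRow("0.5, 1.5, 2.0", 2);
        System.out.println(Arrays.toString(row.features) + " -> " + Arrays.toString(row.labels));
    }
}
```

Keeping the parsing in a small function like this means it can be tested locally before it is moved into the Spark job.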

addQueue.leaseTasks(options) returns empty params []

I created a queue and added data to it, but when I try to get the data back out, the TaskHandle has empty params [].
//Add to queue
Queue addQueue = queueService.addQueue();
TaskHandle task = addQueue.add(mapFundToTask(fund));
private TaskOptions mapFundToTask(Fund fund){
return TaskOptions.Builder.withMethod(Method.PULL)
.tag("FundTask")
.param("ClientId", fund.getClientId())
.param("FundId", fund.getFundId())
.param("FundName", fund.getFundName());
}
// Get data from queue
Queue addQueue = queueService.addQueue();
int count = 2;
Long leaseDuration = 1000L;
LeaseOptions options = LeaseOptions.Builder
.withTag("FundTask")
.countLimit(count)
.leasePeriod(leaseDuration, TimeUnit.MILLISECONDS);
List<TaskHandle> tasks = addQueue.leaseTasks(options);
My fault: it was saving the params, but taskHolder.toString() printed the params as []. Calling List<Map.Entry<String, String>> entries = taskHolder.extractParams(); returned a list of entries with the data I had set in the params.

How to group all duplicate object to one list and all unique object to another list from a original list in C#?

I have a text file to read from, and I convert each line to an object with Id and someText. I would like to group the objects into two lists: a unique list and a duplicate list. The data is very big, up to hundreds of thousands of lines. Which is the best data structure to use? Please provide some sample code in C#. Thanks a lot!
for example:
original list read from text file:
{(1, someText),(2, someText),(3, someText),(3, someText1),(4, someText)}
unique list:
{(1, someText),(2, someText),(4, someText)}
duplicate list:
{(3, someText),(3, someText1)}
Here's an example with LINQ:
Random rnd = new Random();
int cnt = 0; // Stands in for the Id parsed from each line.
List<KeyValuePair<int, string>> values = new List<KeyValuePair<int, string>>();
using (StreamReader sr = new StreamReader("enterYourPathHere"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Convert the line to your object (KeyValuePair is used here for testing).
        values.Add(new KeyValuePair<int, string>(cnt, line));
        // Increment the id with a 50% chance so that some ids repeat.
        // Next(0, 2) returns 0 or 1; Next(0, 1) would always return 0.
        if (rnd.Next(0, 2) == 1) cnt++;
    }
}
// Group once by id, then split the groups by size.
var groups = values.GroupBy(x => x.Key).ToList();
var unique = groups.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
var duplicates = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();
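For comparison, here is the same idea without LINQ, sketched in Java. The question asks for C#, so treat this as an illustration of the data structure (a hash map from id to the items sharing that id) rather than drop-in code. Item, Split and split are hypothetical names; the sample data is the list from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitByDuplicateId {
    /** Hypothetical record type: an id plus the text read from the line. */
    static final class Item {
        final int id;
        final String text;
        Item(int id, String text) { this.id = id; this.text = text; }
        @Override public String toString() { return "(" + id + ", " + text + ")"; }
    }

    /** Result holder: items whose id occurs once, and items whose id occurs more than once. */
    static final class Split {
        final List<Item> unique = new ArrayList<>();
        final List<Item> duplicates = new ArrayList<>();
    }

    static Split split(List<Item> items) {
        // First pass: bucket items by id. LinkedHashMap keeps first-seen order.
        Map<Integer, List<Item>> byId = new LinkedHashMap<>();
        for (Item item : items) {
            byId.computeIfAbsent(item.id, k -> new ArrayList<>()).add(item);
        }
        // Second pass over the buckets: size 1 means unique, anything larger is a duplicate group.
        Split result = new Split();
        for (List<Item> group : byId.values()) {
            (group.size() == 1 ? result.unique : result.duplicates).addAll(group);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item(1, "someText"), new Item(2, "someText"), new Item(3, "someText"),
                new Item(3, "someText1"), new Item(4, "someText"));
        Split s = split(items);
        System.out.println("unique: " + s.unique);
        System.out.println("duplicates: " + s.duplicates);
    }
}
```

Both passes are linear in the number of lines, so a few hundred thousand items should not be a problem for this approach.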

parse lazy load result table(json)

I am trying to parse this link: http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true
but I can't get the result table because, as I can see in Fiddler, the table is lazy-loaded and the result comes back as JSON.
My code is:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true");
// Get all tables in the document
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
// Iterate all rows in the first table
HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
var data = rows.Skip(1).ToList().Take(10).ToList().Select(x => new TableRow()
{
Price = x.SelectNodes(".//td").ToList()[4].InnerText,
Operator = x.SelectNodes(".//td").ToList()[15].InnerText,
DepartureDate = x.SelectNodes(".//td").ToList()[6].InnerText,
DestinationRegion = x.SelectNodes(".//td").ToList()[7].InnerText
}).ToList();
UPDATE
Code for the second site:
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://sletat.ru/");//MUST BE THIS HEADER
string result = wc.DownloadString("http://module.sletat.ru/Main.svc/GetTours?cityFromId=832&countryId=35&cities=&meals=&stars=&hotels=&s_adults=1&s_kids=0&s_kids_ages=&s_nightsMin=6&s_nightsMax=16&s_priceMin=0&s_priceMax=&currencyAlias=RUB&s_departFrom=25%2F06%2F2012&s_departTo=31%2F07%2F2012&visibleOperators=&s_hotelIsNotInStop=true&s_hasTickets=true&s_ticketsIncluded=true&debug=0&filter=0&f_to_id=&requestId=19198631&pageSize=20&pageNumber=1&updateResult=1&includeDescriptions=1&includeOilTaxesAndVisa=1&userId=&jskey=1&callback=_jqjsp&_1340633427022=");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = json["GetToursResult"]["Data"]["aaData"] as object[];
// var operators = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var temp = prices.ToList().Take(20).Select(x => new TableRow
{
Operator = (x as object[])[40].ToString(),
//Price = x["operatorPrice"].ToString(),
//DepartureDate = x["checkinDate"].ToString(),
//DestinationRegion = ((Dictionary<string, object>)x["country"])["englishName"].ToString()
}).ToList();
string str = "";
foreach (var tableRow in temp)
{
str += tableRow.Operator + "<br />";
}
Response.Write(str);
This way everything works, but the problem is that the link only works for roughly 30 minutes, and then I need to put in another link. Is there any way to fix this? (Only the second site has this problem.)
Thanks again,
The data is really coming from here:
http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=3&pageSize=50&_=1340131756631
The page=# and pageSize=# parameters can be adjusted dynamically.
So instead of parsing HTML, you could just get the JSON data from the URL and parse it. For example:
WebClient wc = new WebClient();
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
CheckinDate = p["checkinDate"]
};
Console.WriteLine(data);
In LINQPad, this produces something like:
OperatorID  Price  Country  CheckinDate
0           1,27   Greece   2012-06-28
0           55,90  Greece   2012-06-28
0           67,34  Greece   2012-06-28
And many more rows, depending on how much you ask for...
Note: the reason for the result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1); line is that the JSONP result has this garbage at the beginning:
jQuery17207647891761735082_1340131755603({"
and ends with }), which makes the JavaScriptSerializer choke when it tries to parse it; hence the need to remove it.
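The unwrapping step on its own can be sketched as follows. This is a small Java illustration of the same index arithmetic, not part of the C# answer; JsonpUnwrap and unwrap are hypothetical names and the payload in main is made up. Note that C#'s Substring takes a start and a length, while Java's substring takes a start and an exclusive end, which is why the arithmetic looks slightly different.

```java
final class JsonpUnwrap {
    /**
     * Returns the JSON object inside a JSONP response such as callback({...}).
     * Keeps everything from the first '{' through the last '}' inclusive.
     */
    static String unwrap(String jsonp) {
        int start = jsonp.indexOf('{');
        int end = jsonp.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("No JSON object found in response");
        }
        // substring's end index is exclusive, hence end + 1.
        return jsonp.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Made-up payload for illustration; only the callback name comes from the answer.
        String response = "jQuery17207647891761735082_1340131755603({\"result\":{\"prices\":[]}})";
        System.out.println(unwrap(response));
    }
}
```

The guard clause matters in practice: if the server returns an error page instead of JSONP, indexOf returns -1 and an unguarded substring call would fail with a less helpful exception.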
Update:
Interestingly, the ASHX handler that returns the data seems to require a Referer header in the request; otherwise, the response will not include the operator information. The Referer cannot be just anything: the handler seems to look for http://agent.bronni.ru in particular.
Basically, all you need to do is the following:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");//MUST BE THIS HEADER
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
Hotel = ((Dictionary<string,object>)p["hotel"])["englishName"],
Operator = ((Dictionary<string,object>)p["operator"])["englishName"],//OPERATOR
CheckinDate = p["checkinDate"]
};
OperatorID  Price  Country  Hotel                            Operator           CheckinDate
19681       1,27   Greece   Julia Hotel                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Forest Park                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Kassandra Mare (п-ов Кассандра)  Mouzenidis Travel  2012-06-28
UPDATE 2:
I decided to compare the performance of the out-of-the-box JavaScriptSerializer with the JSON.NET serializer. In all my tests with different record counts (50, 1000, 3000), JSON.NET was at least twice as fast as JavaScriptSerializer, and in some cases even 10 times faster on smaller record sets.
If you decide to use the JSON.NET library, here's the code that gets you the same results as the code above:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JObject o = JObject.Parse(result);
var data = from x in o["result"]["prices"]
select new
{
OperatorID = x["operatorID"],
Price = x["operatorPrice"],
Country = x["country"]["englishName"],
Hotel = x["hotel"]["englishName"],
Operator = x["operator"]["englishName"],
CheckinDate = x["checkinDate"]
};
Console.WriteLine(data);

LINQ group by and compare date

I have the following:
var currentDate = DateTime.UtcNow;
var calendarEntry = from item in new CalendarEntryRepository(this.Db).List().Where(x => x.Culture == language.Value)
group item by item.ContentObjectId into g
let itemMaxDate = g.Where(i => i.StartDate > currentDate).Select(i => i.StartDate).DefaultIfEmpty()
let city = g.Select(i => i.City).FirstOrDefault()
select new
{
ContentObjectId = g.Key,
StartDate = itemMaxDate,
City = city ?? string.Empty
};
From CalendarEntryRepository I want to group by ContentObjectId (this works fine). However, when I add this line:
let itemMaxDate = g.Where(i => i.StartDate > currentDate).Select(i => i.StartDate).DefaultIfEmpty()
I keep getting this error:
Message = "The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."
What I'm trying to do is group by ContentObjectId and then get the StartDate that is later than today.
I'm using Entity Framework and MS SQL Server 2008.
Thanks
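The minimum value of .NET's DateTime (0001-01-01) is lower than the minimum value that the SQL Server datetime type can store (1753-01-01). DefaultIfEmpty() on an empty sequence of DateTime most likely yields that default value, which is out of range for the column.
Change the StartDate column in the database to the datetime2 data type, which supports a broader range of dates.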
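To see why the column type matters, here is a small Java sketch of the lower bounds involved: SQL Server's datetime starts at 1753-01-01, while datetime2 starts at 0001-01-01, the same as .NET's DateTime.MinValue. SqlServerDateRange and its methods are hypothetical names, and only the lower bounds are checked, since both types share the same upper bound of 9999-12-31.

```java
import java.time.LocalDateTime;

final class SqlServerDateRange {
    // Documented lower bound of SQL Server's datetime type.
    static final LocalDateTime DATETIME_MIN = LocalDateTime.of(1753, 1, 1, 0, 0);
    // Lower bound of datetime2, which matches .NET's DateTime.MinValue.
    static final LocalDateTime DATETIME2_MIN = LocalDateTime.of(1, 1, 1, 0, 0);

    /** True if the value is not earlier than the datetime lower bound. */
    static boolean fitsDatetime(LocalDateTime value) {
        return !value.isBefore(DATETIME_MIN);
    }

    /** True if the value is not earlier than the datetime2 lower bound. */
    static boolean fitsDatetime2(LocalDateTime value) {
        return !value.isBefore(DATETIME2_MIN);
    }

    public static void main(String[] args) {
        // Equivalent of an unset .NET DateTime (DateTime.MinValue).
        LocalDateTime unset = LocalDateTime.of(1, 1, 1, 0, 0);
        System.out.println("fits datetime:  " + fitsDatetime(unset));
        System.out.println("fits datetime2: " + fitsDatetime2(unset));
    }
}
```

An unset date passes the datetime2 check but fails the datetime check, which is the situation the out-of-range error describes.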
