Best way to import bulk data into ArangoDB


I'm currently working on an ArangoDB POC. Document creation with PyArango is very slow: it takes about 5 minutes to insert 300 documents. I've pasted the rough code below; please let me know if there is a better way to speed this up:
with open('abc.csv') as fp:
    for line in fp:
        dataList = line.split(",")
        aaa = dbObj['aaa'].createDocument()
        bbb = dbObj['bbb'].createDocument()
        ccc = dbObj['ccc'].createEdge()
        bbb['bbb'] = dataList[1]
        aaa['aaa'] = dataList[0]
        aaa._key = dataList[0]
        aaa.save()
        bbb.save()
        ccc.links(aaa,bbb)
        ccc['related_to'] = "gfdgf"
        ccc['weight'] = 0
        ccc.save()
The collections are created with the code below:
dbObj.createCollection(className='aaa', waitForSync=False)

Regarding your problem with batch mode in the ArangoDB Java driver: if you know the key attributes of the vertices, you can build the document handle as "collectionName" + "/" + "documentKey".
Example:
arangoDriver.startBatchMode();
for(String line : lines)
{
    String[] data = line.split(",");
    BaseDocument device = new BaseDocument();
    BaseDocument phyAddress = new BaseDocument();
    BaseDocument conn = new BaseDocument();
    String keyDevice = data[0];
    String handleDevice = "DeviceId/" + keyDevice;
    device.setDocumentKey(keyDevice);
    device.addAttribute("device_id",data[0]);
    String keyPhyAddress = data[1];
    String handlePhyAddress = "PhysicalLocation/" + keyPhyAddress;
    phyAddress.setDocumentKey(keyPhyAddress);
    phyAddress.addAttribute("address",data[1]);
    final DocumentEntity<BaseDocument> from = arangoDriver.graphCreateVertex("testGraph", "DeviceId", device, null);
    final DocumentEntity<BaseDocument> to = arangoDriver.graphCreateVertex("testGraph", "PhysicalLocation", phyAddress, null);
    arangoDriver.graphCreateEdge("testGraph", "DeviceId_PhysicalLocation", null, handleDevice, handlePhyAddress, null, null);
}
arangoDriver.executeBatch();

I would build all of the data to be inserted into a json formatted string and use createDocumentRaw to create them all at once with one save.
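The two suggestions above (build the handle as "collectionName/documentKey", and prepare everything before saving) can be combined in a rough Python sketch. It only builds the payloads in memory; how they are sent to the server (one bulk request per collection, for instance) depends on the driver and is not shown. The function name build_bulk_payloads is hypothetical, and using the second CSV column as the key of the bbb documents is an assumption: the original code leaves that key to the server, and real keys must also respect ArangoDB's allowed key characters.

```python
import csv
import io
import json


def build_bulk_payloads(lines):
    """Turn CSV rows into three lists of plain dicts: aaa, bbb and edge documents."""
    vertices_a = {}
    vertices_b = {}
    edges = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        key_a, key_b = row[0].strip(), row[1].strip()
        # Dicts keyed by _key deduplicate vertices that occur in several rows.
        vertices_a[key_a] = {"_key": key_a, "aaa": key_a}
        # Assumption: the bbb value is usable as a document key.
        vertices_b[key_b] = {"_key": key_b, "bbb": key_b}
        # Because the keys are known, the handles can be built without saving first.
        edges.append({
            "_from": f"aaa/{key_a}",
            "_to": f"bbb/{key_b}",
            "related_to": "gfdgf",
            "weight": 0,
        })
    return list(vertices_a.values()), list(vertices_b.values()), edges


sample = io.StringIO("d1,addr1\nd2,addr1\n")
aaa_docs, bbb_docs, edge_docs = build_bulk_payloads(sample)
print(json.dumps(edge_docs, indent=2))
```

With the payloads prepared like this, the number of round trips no longer grows with the number of rows, which is where the per-document save() calls in the question spend their time.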

How to get/build a JavaRDD[DataSet]?

I am using deeplearning4j and trying to train a model in Spark with:
public MultiLayerNetwork fit(JavaRDD<DataSet> trainingData)
fit() needs a JavaRDD<DataSet> parameter.
I tried to build one like this:
val totalDataset = csv.map(row => {
  val features = Array(
    row.getAs[String](0).toDouble, row.getAs[String](1).toDouble
  )
  val labels = Array(row.getAs[String](21).toDouble)
  val featuresINDA = Nd4j.create(features)
  val labelsINDA = Nd4j.create(labels)
  new DataSet(featuresINDA, labelsINDA)
})
but IDEA reports: No implicit arguments of type: Encoder[DataSet]
This is an error and I don't know how to solve it.
I know a Spark RDD can be converted to a JavaRDD, but I don't know how to build a Spark RDD[DataSet].
DataSet is imported from org.nd4j.linalg.dataset.DataSet.
Its constructor is:
public DataSet(INDArray first, INDArray second) {
this(first, second, (INDArray)null, (INDArray)null);
}
This is my code:
val spark:SparkSession = {SparkSession
.builder()
.master("local")
.appName("Spark LSTM Emotion Analysis")
.getOrCreate()
}
import spark.implicits._
val JavaSC = JavaSparkContext.fromSparkContext(spark.sparkContext)
val csv=spark.read.format("csv")
.option("header","true")
.option("sep",",")
.load("/home/hadoop/sparkjobs/LReg/data.csv")
val totalDataset = csv.map(row => {
val features = Array(
row.getAs[String](0).toDouble, row.getAs[String](1).toDouble
)
val labels = Array(row.getAs[String](21).toDouble)
val featuresINDA = Nd4j.create(features)
val labelsINDA = Nd4j.create(labels)
new DataSet(featuresINDA, labelsINDA)
})
val data = totalDataset.toJavaRDD
Creating a JavaRDD[DataSet] in Java, from the official deeplearning4j guide:
String filePath = "hdfs:///your/path/some_csv_file.csv";
JavaSparkContext sc = new JavaSparkContext();
JavaRDD<String> rddString = sc.textFile(filePath);
RecordReader recordReader = new CSVRecordReader(',');
JavaRDD<List<Writable>> rddWritables = rddString.map(new StringToWritablesFunction(recordReader));
int labelIndex = 5; //Labels: a single integer representing the class index in column number 5
int numLabelClasses = 10; //10 classes for the label
JavaRDD<DataSet> rddDataSetClassification = rddWritables.map(new DataVecDataSetFunction(labelIndex, numLabelClasses, false));
I tried to do the same in Scala:
val JavaSC: JavaSparkContext = new JavaSparkContext()
val rddString: JavaRDD[String] = JavaSC.textFile("/home/hadoop/sparkjobs/LReg/hf-data.csv")
val recordReader: CSVRecordReader = new CSVRecordReader(',')
val rddWritables: JavaRDD[List[Writable]] = rddString.map(new StringToWritablesFunction(recordReader))
val featureColnum = 3
val labelColnum = 1
val d = new DataVecDataSetFunction(featureColnum,labelColnum,true,null,null)
// val rddDataSet: JavaRDD[DataSet] = rddWritables.map(new DataVecDataSetFunction(featureColnum,labelColnum, true,null,null))
// cannot resolve overloaded method 'map'

A DataSet is just a pair of INDArrays (inputs and labels).
Our docs cover this in depth:
https://deeplearning4j.konduit.ai/distributed-deep-learning/data-howto
For Stack Overflow's sake, I'll summarize what's there, since there is no single way to create a data pipeline; it depends on your problem. It's very similar to how you would create a dataset locally: generally you take whatever you do locally and put that into Spark in a function.
CSVs and images, for example, are going to be very different, but generally you use the DataVec library for this. The docs summarize the approach for each kind.

After encoding UTF-16, the string is broken if I want to use in iTextSharp

First I read some information from a text file; this information is then added to the PDF files' metadata. In the "Producer" field an error occurred with Turkish characters such as ğ and ş. I solved that problem by using UTF-16 like this:
write.Info.Put(new PdfName("Producer"), new PdfString("Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.", "UTF-16"));
Here is the screenshot:
Then I loop over all PDF files with foreach, read the metadata, and insert it into an SQLite database file. The problem occurs right here: when I read the UTF-16 encoded string (the Producer data) from the PDF file and write it to the database, strange characters appear, like this:
I don't understand why this error occurs.
EDIT: Here is all of my code. The following code reads the metadata from the text file and writes it into the PDF files' metadata section:
var articles = Directory.GetFiles(FILE_PATH, "*.pdf");
foreach (var article in articles)
{
var file_name = Path.GetFileName(article);
var read = new PdfReader(article);
var size = read.GetPageSizeWithRotation(1);
var doc = new Document(size);
var write = PdfWriter.GetInstance(doc, new FileStream(TEMP_PATH + file_name, FileMode.Create, FileAccess.Write));
// Article file names like, 1.pdf, 2.pdf, 3.pdf....
// article_meta_data.txt file content like this:
//1#Article 1 Tag Number#Article 1 first - last page number#Article 1 Title#Article 1 Author#Article 1 Subject#Article 1 Keywords
//2#Article 2 Tag Number#Article 2 first - last page number#Article 2 Title#Article 2 Author#Article 2 Subject#Article 2 Keywords
//3#Article 3 Tag Number#Article 3 first - last page number#Article 3 Title#Article 3 Author#Article 3 Subject#Article 3 Keywords
var pdf_file_name = Convert.ToInt32(Path.GetFileNameWithoutExtension(article)) - 1;
var line = File.ReadAllLines(FILE_PATH + @"article_meta_data.txt");
var info = line[pdf_file_name].Split('#');
var producer = Kunye(info); // It returns like: Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.
var keywords = string.IsNullOrEmpty(info[6]) ? "" : info[6];
doc.AddTitle(info[3]);
doc.AddSubject(info[5]);
doc.AddCreator("UzPDF");
doc.AddAuthor(info[4]);
write.Info.Put(new PdfName("Producer"), new PdfString(producer, "UTF-16"));
doc.AddKeywords(keywords);
doc.Open();
var cb = write.DirectContent;
for (var page_number = 1; page_number <= read.NumberOfPages; page_number++)
{
doc.NewPage();
var page = write.GetImportedPage(read, page_number);
cb.AddTemplate(page, 0, 0);
}
doc.Close();
read.Close();
File.Delete(article);
File.Move(TEMP_PATH + file_name, FILE_PATH + file_name);
}
And the following code reads the data from the files and inserts it into the SQLite database file. For database operations, I am using Devart dotConnect for SQLite.
var files = Directory.GetFiles(FILE_PATH, "*.pdf");
var connection = new Linq2SQLiteDataContext();
TruncateTable(connection);
var i = 1;
foreach (var file in files)
{
var read = new PdfReader(file);
var title = read.Info["Title"].Trim();
var author = read.Info["Author"].Trim();
var producer = read.Info["Producer"].Trim();
var file_name = Path.GetFileName(file)?.Trim();
var subject = read.Info["Subject"].Trim();
var keywords = read.Info["Keywords"].Trim();
var art = new article
{
id = i,
title = (title.Length > 255) ? title.Substring(0, 255) : title,
author = (author.Length > 100) ? author.Substring(0, 100) : author,
producer = (producer.Length > 255) ? producer.Substring(0, 255) : producer,
filename = file_name != null && (file_name.Length > 50) ? file_name.Substring(0, 50) : file_name,
subject = (subject.Length > 50) ? subject.Substring(0, 50) : subject,
keywords = (keywords.Length > 500) ? keywords.Substring(0, 500) : keywords,
createdate = File.GetCreationTime(file),
update = File.GetLastWriteTime(file)
};
connection.articles.InsertOnSubmit(art);
i++;
}
connection.SubmitChanges();

Instead of:
new PdfString(producer, "UTF-16")
Use:
new PdfString(producer, PdfString.TEXT_UNICODE)
UTF-16 is a specific way to store Unicode values but you don't need to worry about that, iText will take care of everything for you.
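One plausible explanation for the strange characters, offered as an assumption rather than a confirmed diagnosis: the PDF specification stores Unicode text strings as UTF-16BE with a leading byte order mark (FE FF), and a reader that does not find that mark falls back to a single-byte encoding. The Python sketch below imitates that behaviour. The helper names are hypothetical, latin-1 is only a stand-in for PDFDocEncoding, and none of this is iTextSharp code.

```python
import codecs


def encode_pdf_text_string(text: str) -> bytes:
    """Encode the way a PDF Unicode text string is stored: BOM FE FF, then UTF-16BE."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode like a simplified PDF reader: UTF-16BE only if the BOM is present."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Stand-in for the single-byte PDFDocEncoding fallback.
    return raw.decode("latin-1")


producer = "Ankara Üniversitesi Hukuk Fakültesi Dergisi"

# Correctly marked bytes survive the round trip.
round_trip = decode_pdf_text_string(encode_pdf_text_string(producer))

# UTF-16 bytes without the expected BOM are read one byte at a time instead.
broken = decode_pdf_text_string(producer.encode("utf-16-le"))

print(round_trip == producer)
print(repr(broken[:12]))
```

This is why letting the library pick the representation, as the answer suggests, is safer than naming an encoding by hand.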

How to post document date and document number in general journal

I'm wondering if anyone can help me figure out how to programmatically set the document number and document date in the general journal transaction, under the Invoice tab. I'm trying to post to the general journal in AX 2012 with X++. The code below works, but there is no method on the ledger journal trans object to set the document number or date. In fact, a lot of the setters are missing; it only has line number, account type, journal number, and so on.
How can I set these fields? My code is below:
static void TestLedgerJournalImport(Args _args)
{
// Set these variables.
LedgerJournalNameId journalName = 'GenJrn';
SelectableDataArea company = '019';
TransDate transactionDate = 30\6\2012;
str line1MainAccount = '131310';
str line1FullAccount = '131310--';
str line2MainAccount = '131310';
str line2FullAccount = '131310-10-';
str line2Dimension1Name = 'Department';
str line2Dimension1Value = 'ACCT';
LedgerGeneralJournalService ledgerGeneralJournalService;
LedgerGeneralJournal ledgerGeneralJournal;
AfStronglyTypedDataContainerList journalHeaderCollection;
LedgerGeneralJournal_LedgerJournalTable journalHeader;
AifEntityKeyList journalHeaderCollectionKeyList;
RecId journalHeaderRecId;
AfStronglyTypedDataContainerList journalLineCollection;
LedgerGeneralJournal_LedgerJournalTrans journalLine1;
AifMultiTypeAccount journalLine1LedgerDimension;
LedgerGeneralJournal_LedgerJournalTrans journalLine2;
AifMultiTypeAccount journalLine2LedgerDimension;
AifDimensionAttributeValue journalLine2Dim1;
AfStronglyTypedDataContainerList journalLine2DimensionCollection;
;
ledgerGeneralJournalService = LedgerGeneralJournalService::construct();
ledgerGeneralJournal = new LedgerGeneralJournal();
// Create journal header.
journalHeaderCollection = ledgerGeneralJournal.createLedgerJournalTable();
journalHeader = journalHeaderCollection.insertNew(1);
journalHeader.parmJournalName(journalName);
// Create journal lines.
journalLineCollection = journalHeader.createLedgerJournalTrans();
// Line 1
journalLine1 = journalLineCollection.insertNew(1);
journalLine1.parmLineNum(1.00);
journalLine1.parmCompany(company);
journalLine1.parmTransDate(transactionDate);
journalLine1.parmAccountType(LedgerJournalACType::Ledger);
journalLine1.parmTxt('Test journal transaction');
journalLine1.parmAmountCurDebit(100.00);
journalLine1LedgerDimension = journalLine1.createLedgerDimension();
journalLine1LedgerDimension.parmAccount(line1MainAccount);
journalLine1LedgerDimension.parmDisplayValue(line1FullAccount);
journalLine1.parmLedgerDimension(journalLine1LedgerDimension);
// Line 2
journalLine2 = journalLineCollection.insertNew(2);
journalLine2.parmLineNum(2.00);
journalLine2.parmCompany(company);
journalLine2.parmTransDate(transactionDate);
journalLine2.parmAccountType(LedgerJournalACType::Ledger);
journalLine2.parmTxt('Test journal transaction');
journalLine2.parmAmountCurCredit(100.00);
journalLine2LedgerDimension = journalLine2.createLedgerDimension();
journalLine2DimensionCollection = journalLine2LedgerDimension.createValues();
journalLine2Dim1 = new AifDimensionAttributeValue();
journalLine2Dim1.parmName(line2Dimension1Name);
journalLine2Dim1.parmValue(line2Dimension1Value);
journalLine2DimensionCollection.add(journalLine2Dim1);
journalLine2LedgerDimension.parmAccount(line2MainAccount);
journalLine2LedgerDimension.parmDisplayValue(line2FullAccount);
journalLine2LedgerDimension.parmValues(journalLine2DimensionCollection);
journalLine2.parmLedgerDimension(journalLine2LedgerDimension);
// Insert records.
journalHeader.parmLedgerJournalTrans(journalLineCollection);
ledgerGeneralJournal.parmLedgerJournalTable(journalHeaderCollection);
journalHeaderCollectionKeyList =
LedgerGeneralJournalService.create(ledgerGeneralJournal);
journalHeaderRecId =
journalHeaderCollectionKeyList.getEntityKey(1).parmRecId();
info(strFmt("LedgerJournalTable.Recid = %1", int642str(journalHeaderRecId)));
}

Don't do it like that, you're making more work for yourself. I just wrote this example for you. I hacked up a more complex piece of code I wrote, so the offsetDefaultDimension I just left in for some example code.
static void Job3(Args _args)
{
AxLedgerJournalTable journalTable = AxLedgerJournalTable::construct();
LedgerJournalTable ledgerJournalTable;
LedgerJournalName ledgerJournalName = LedgerJournalName::find('GenJrn');
AxLedgerJournalTrans journalTrans = AxLedgerJournalTrans::construct();
DimensionAttribute dimensionAttribute;
DimensionAttributeValue dimensionAttributeValue;
DimensionAttributeValueSetStorage dimStorage;
LedgerDimensionAccount ledgerDimension = DimensionDefaultingService::serviceCreateLedgerDimension(DimensionStorage::getDefaultAccountForMainAccountNum('131310'));
journalTable.parmJournalName(ledgerJournalName.JournalName);
journalTable.parmJournalType(ledgerJournalName.JournalType);
journalTable.save();
ttsBegin;
ledgerJournalTable = LedgerJournalTable::findByRecId(journalTable.ledgerJournalTable().RecId, true);
// The name gets reset if no journal number is provided, so we can just update afterwards
ledgerJournalTable.Name = 'My Custom Journal Name/Description';
ledgerJournalTable.update();
ttsCommit;
journalTrans.parmJournalNum(journalTable.ledgerJournalTable().JournalNum);
journalTrans.parmTransDate(today());
journalTrans.parmCurrencyCode('USD');
journalTrans.parmTxt('AlexOnDAX.blogspot.com');
journalTrans.parmDocumentNum('MyDocNumber');
journalTrans.parmDocumentDate(today() - 1);
journalTrans.parmAccountType(LedgerJournalACType::Ledger);
journalTrans.parmLedgerDimension(DimensionAttributeValueCombination::find(ledgerDimension).RecId);
journalTrans.parmAmountCurDebit(100.00);
journalTrans.save();
info("Done");
}

web2py SQLFORM.grid url

When I put form = SQLFORM.grid(db.mytable) in my controller, the request changes to my/web/site/view?_signature=520af19b1095db04dda2f1b6cbea3a03c3551e13, which causes the if statement in my controller to fail. Can somebody please explain why this happens?
If I set user_signature=False, the grid is shown when the view loads (though it looks awful, and I still need to find out how to change the appearance of my table), but when I click search, edit, etc., the same thing happens again: the URL changes and I get an error.
Any suggestions?
Thank you.
EDIT
This is my edit function:
@auth.requires_login()
def edit():
    #Load workers
    workers = db(db.worker.w_organisation == 10).select(db.worker.w_id_w, db.worker.w_organisation, db.worker.w_first_name, db.worker.w_last_name,db.worker.w_nick_name,db.worker.w_email,db.worker.w_status,db.worker.w_note).as_list()
    #Define the query object. Here we are pulling all contacts having date of birth less than 18 Nov 1990
    query = ((db.worker.w_organisation == 10) & (db.worker.w_status==db.status.s_id_s))
    #Define the fields to show on grid. Note: (you need to specify id field in fields section in 1.99.2
    fields = (db.worker.w_first_name, db.worker.w_last_name,db.worker.w_nick_name,db.worker.w_email,db.status.s_code,db.worker.w_note)
    #Define headers as tuples/dictionaries
    headers = { 'worker.w_first_name' : 'Ime',
                'worker.w_last_name' : 'Priimek',
                'worker.w_nick_name' : 'Vzdevek',
                'worker.w_email' : 'E-posta',
                'status.s_code': 'Status',
                'worker.w_note' : 'Komentar' }
    #Let's specify a default sort order on date_of_birth column in grid
    default_sort_order=[db.worker.w_last_name]
    #Creating the grid object
    form = SQLFORM.grid(query=query, fields=fields, headers=headers,searchable=True, orderby=default_sort_order,create=True, \
        deletable=True, editable=True, maxtextlength=64, paginate=25,user_signature=False
        )
    form = SQLFORM.grid(db.worker,user_signature=False)
    workersDb = db((db.worker.w_organisation == 10) & (db.worker.w_status==db.status.s_id_s)).select(db.worker.w_id_w, \
        db.worker.w_organisation, db.worker.w_first_name, \
        db.worker.w_last_name,db.worker.w_nick_name,db.worker.w_email,\
        db.status.s_code,db.worker.w_note).as_list()
    workersList = []
    for rec in workersDb:
        status = rec['status']['s_code']
        workers = rec['worker']
        if not rec["worker"]["w_first_name"]:
            polno_ime = rec["worker"]["w_last_name"]
        elif not rec["worker"]["w_last_name"]:
            polno_ime = rec["worker"]["w_first_name"]
        else:
            polno_ime = rec["worker"]["w_first_name"] + " " + rec["worker"]["w_last_name"]
        rec["worker"]['w_full_name'] = polno_ime
        rec["worker"]["w_status"] = status
        data = rec["worker"]
        #print rec
        #print data
        workersList.append(rec["worker"])
    # If type of arg is int, we know that user wants to edit a script with an id of the argument
    if(request.args[0].isdigit()):
        script = db(getDbScript(request.args[0])).select(db.script.sc_lls, db.script.sc_name, db.script.id, db.script.sc_menu_data).first()
        formData = str(script["sc_menu_data"])
        #form = SQLFORM.grid(db.auth_user)
        #print formData
        # If we dont get any results that means that user is not giving proper request and we show him error
        #print script
        #Parsing script to be inserted into view
        if not script:
            return error(0)
        return dict(newScript = False, script = script, formData = formData, workers = workersList, form = form)
    # If the argument is new we prepare page for new script
    elif request.args[0] == 'new':
        scripts = db((auth.user.organization == db.script.sc_organization)).select(db.script.sc_name, db.script.id, workers = workersList, form = form)
        return dict(newScript = True, scripts = scripts, workers = workersList, form = form)
    # Else error
    else:
        return error(0)
Not to mention that the grid looks awful; here is a link to a picture: https://plus.google.com/103827646559093653557/posts/Bci4PCG4BQQ

parse lazy load result table(json)

I am trying to parse this link: http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true
but I can't get the result table because, as I can see in Fiddler, the page lazy-loads it with a JSON result.
My code is:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true");
// Get all tables in the document
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
// Iterate all rows in the first table
HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
var data = rows.Skip(1).ToList().Take(10).ToList().Select(x => new TableRow()
{
Price = x.SelectNodes(".//td").ToList()[4].InnerText,
Operator = x.SelectNodes(".//td").ToList()[15].InnerText,
DepartureDate = x.SelectNodes(".//td").ToList()[6].InnerText,
DestinationRegion = x.SelectNodes(".//td").ToList()[7].InnerText
}).ToList();
UPDATE
Second site :
Code
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://sletat.ru/");//MUST BE THIS HEADER
string result = wc.DownloadString("http://module.sletat.ru/Main.svc/GetTours?cityFromId=832&countryId=35&cities=&meals=&stars=&hotels=&s_adults=1&s_kids=0&s_kids_ages=&s_nightsMin=6&s_nightsMax=16&s_priceMin=0&s_priceMax=&currencyAlias=RUB&s_departFrom=25%2F06%2F2012&s_departTo=31%2F07%2F2012&visibleOperators=&s_hotelIsNotInStop=true&s_hasTickets=true&s_ticketsIncluded=true&debug=0&filter=0&f_to_id=&requestId=19198631&pageSize=20&pageNumber=1&updateResult=1&includeDescriptions=1&includeOilTaxesAndVisa=1&userId=&jskey=1&callback=_jqjsp&_1340633427022=");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = json["GetToursResult"]["Data"]["aaData"] as object[];
// var operators = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var temp = prices.ToList().Take(20).Select(x => new TableRow
{
Operator = (x as object[])[40].ToString(),
//Price = x["operatorPrice"].ToString(),
//DepartureDate = x["checkinDate"].ToString(),
//DestinationRegion = ((Dictionary<string, object>)x["country"])["englishName"].ToString()
}).ToList();
string str = "";
foreach (var tableRow in temp)
{
str += tableRow.Operator + "<br />";
}
Response.Write(str);
This way everything works, but the problem is that this link only works for roughly 30 minutes, and then I have to put in a new link. Is there any way to fix this? (Only the second site has this problem.)
Thanks again,

The data is really coming from here:
http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=3&pageSize=50&_=1340131756631
With the exception that the page=# and pageSize=# can be adjusted dynamically.
So instead of parsing HTML, you could just get the JSON data from the URL and parse it. For example:
WebClient wc = new WebClient();
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
CheckinDate = p["checkinDate"]
};
Console.WriteLine(data);
In LINQPad, this produces something like:
OperatorID  Price  Country  CheckinDate
0           1,27   Greece   2012-06-28
0           55,90  Greece   2012-06-28
0           67,34  Greece   2012-06-28
And many more rows, depending on how much you ask for...
Note: the reason for the result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1); line is that the jsonp result has this garbage in the beginning:
jQuery17207647891761735082_1340131755603({"
Ending with }) which makes the JavascriptSerializer choke when it tries to parse it; hence the need to remove it.
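The same wrapper-stripping step can be sketched in Python, for readers who are not working in C#. The function name strip_jsonp and the sample payload are made up for illustration; only the callback prefix is taken from the response shown above.

```python
import json


def strip_jsonp(payload: str) -> str:
    """Keep only the text from the first '{' to the last '}' of a JSONP response."""
    start = payload.index("{")
    end = payload.rindex("}")
    return payload[start:end + 1]


# Invented sample in the shape described above: callbackName({...})
sample = (
    'jQuery17207647891761735082_1340131755603('
    '{"result": {"prices": [{"operatorID": 0, "operatorPrice": "1,27"}]}})'
)

data = json.loads(strip_jsonp(sample))
print(data["result"]["prices"][0]["operatorPrice"])
```

Slicing between the first and last brace is a blunt approach: it assumes the payload is a JSON object rather than an array, which matches the responses discussed here.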
Update:
Interestingly, the ASHX handler that returns the data seems to require a Referer header in the request; otherwise, the response will not include the operator information. The Referer cannot be an arbitrary value; the handler seems to look for http://agent.bronni.ru in particular.
Basically, all you need to do is the following:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");//MUST BE THIS HEADER
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
Hotel = ((Dictionary<string,object>)p["hotel"])["englishName"],
Operator = ((Dictionary<string,object>)p["operator"])["englishName"],//OPERATOR
CheckinDate = p["checkinDate"]
};
OperatorID  Price  Country  Hotel                            Operator           CheckinDate
19681       1,27   Greece   Julia Hotel                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Forest Park                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Kassandra Mare (ï-îâ Êàññàíäðà)  Mouzenidis Travel  2012-06-28
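The two request details discussed above, the adjustable page and pageSize parameters and the required Referer header, can be sketched in Python as follows. The request is only constructed, not sent. The function name build_request and the callback name are hypothetical, and it is an assumption that the handler accepts an arbitrary callback name and does not need the trailing cache-busting parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://beta.remote.bronni.ru/LazyLoading.ashx/getResult"


def build_request(search_id: str, page: int, page_size: int) -> Request:
    """Build (but do not send) a request for one page of results."""
    query = urlencode({
        "jsonp": "callback",  # hypothetical callback name
        "id": search_id,
        "page": page,
        "pageSize": page_size,
    })
    # The handler appears to return operator data only with this Referer.
    return Request(f"{BASE_URL}?{query}", headers={"Referer": "http://agent.bronni.ru"})


request = build_request("c7a6a33a-174e-426d-b127-828ee612c36e", page=1, page_size=50)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the download; the body would then still need the JSONP wrapper removed before parsing.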
UPDATE 2:
I decided to compare the performance of the out-of-the-box JavaScriptSerializer with the JSON.NET serializer. In all my tests with different record counts (50, 1000, 3000), JSON.NET was at least twice as fast as JavaScriptSerializer, and in some cases even 10 times faster on smaller record sets.
If you decide to use the JSON.NET library, here's code that produces the same results as the code above:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JObject o = JObject.Parse(result);
var data = from x in o["result"]["prices"]
select new
{
OperatorID = x["operatorID"],
Price = x["operatorPrice"],
Country = x["country"]["englishName"],
Hotel = x["hotel"]["englishName"],
Operator = x["operator"]["englishName"],
CheckinDate = x["checkinDate"]
};
Console.WriteLine(data);
