After encoding as UTF-16, the string is broken when I use it with iTextSharp and SQLite

First I read some information from a text file, and later this information is added to the PDF files' metadata. In the "Producer" field there was a problem with Turkish characters such as ğ and ş, which I solved by using UTF-16 like this:
write.Info.Put(new PdfName("Producer"), new PdfString("Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.", "UTF-16"));
Then I loop over all the PDF files with foreach, read the metadata, and insert it into an SQLite database file. This is where the problem occurs: when I read the UTF-16 encoded string (the Producer value) from the PDF file and store it in the database, strange characters appear.
I don't understand why this happens.
EDIT: Here is all of my code. The following code reads the metadata from the text file and writes it into the PDF files' metadata:
var articles = Directory.GetFiles(FILE_PATH, "*.pdf");
foreach (var article in articles)
{
var file_name = Path.GetFileName(article);
var read = new PdfReader(article);
var size = read.GetPageSizeWithRotation(1);
var doc = new Document(size);
var write = PdfWriter.GetInstance(doc, new FileStream(TEMP_PATH + file_name, FileMode.Create, FileAccess.Write));
// Article file names like, 1.pdf, 2.pdf, 3.pdf....
// article_meta_data.txt file content like this:
//1#Article 1 Tag Number#Article 1 first - last page number#Article 1 Title#Article 1 Author#Article 1 Subject#Article 1 Keywords
//2#Article 2 Tag Number#Article 2 first - last page number#Article 2 Title#Article 2 Author#Article 2 Subject#Article 2 Keywords
//3#Article 3 Tag Number#Article 3 first - last page number#Article 3 Title#Article 3 Author#Article 3 Subject#Article 3 Keywords
var pdf_file_name = Convert.ToInt32(Path.GetFileNameWithoutExtension(article)) - 1;
var line = File.ReadAllLines(FILE_PATH + @"article_meta_data.txt");
var info = line[pdf_file_name].Split('#');
var producer = Kunye(info); // It returns like: Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.
var keywords = string.IsNullOrEmpty(info[6]) ? "" : info[6];
doc.AddTitle(info[3]);
doc.AddSubject(info[5]);
doc.AddCreator("UzPDF");
doc.AddAuthor(info[4]);
write.Info.Put(new PdfName("Producer"), new PdfString(producer, "UTF-16"));
doc.AddKeywords(keywords);
doc.Open();
var cb = write.DirectContent;
for (var page_number = 1; page_number <= read.NumberOfPages; page_number++)
{
doc.NewPage();
var page = write.GetImportedPage(read, page_number);
cb.AddTemplate(page, 0, 0);
}
doc.Close();
read.Close();
File.Delete(article);
File.Move(TEMP_PATH + file_name, FILE_PATH + file_name);
}
The following code reads the metadata from the PDF files and inserts it into the SQLite database file. For database operations I am using Devart dotConnect for SQLite.
var files = Directory.GetFiles(FILE_PATH, "*.pdf");
var connection = new Linq2SQLiteDataContext();
TruncateTable(connection);
var i = 1;
foreach (var file in files)
{
var read = new PdfReader(file);
var title = read.Info["Title"].Trim();
var author = read.Info["Author"].Trim();
var producer = read.Info["Producer"].Trim();
var file_name = Path.GetFileName(file)?.Trim();
var subject = read.Info["Subject"].Trim();
var keywords = read.Info["Keywords"].Trim();
var art = new article
{
id = i,
title = (title.Length > 255) ? title.Substring(0, 255) : title,
author = (author.Length > 100) ? author.Substring(0, 100) : author,
producer = (producer.Length > 255) ? producer.Substring(0, 255) : producer,
filename = file_name != null && (file_name.Length > 50) ? file_name.Substring(0, 50) : file_name,
subject = (subject.Length > 50) ? subject.Substring(0, 50) : subject,
keywords = (keywords.Length > 500) ? keywords.Substring(0, 500) : keywords,
createdate = File.GetCreationTime(file),
update = File.GetLastWriteTime(file)
};
connection.articles.InsertOnSubmit(art);
i++;
}
connection.SubmitChanges();

Instead of:
new PdfString(producer, "UTF-16")
Use:
new PdfString(producer, PdfString.TEXT_UNICODE)
UTF-16 is one specific way to store Unicode values, but you don't need to worry about that; iText will take care of it for you.
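As a rough illustration of why the byte layout matters: the PDF specification stores text strings either in PDFDocEncoding or as big-endian UTF-16 that starts with the byte order mark FE FF. The sketch below is not iText code. The function name `decode_pdf_text_string` is hypothetical, and Latin-1 is used only as a stand-in for PDFDocEncoding. Whether a missing byte order mark is the exact cause of the broken Producer value in the question is an assumption.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode bytes the way a PDF reader treats a text string (simplified)."""
    # A leading FE FF byte order mark signals big-endian UTF-16.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Otherwise the bytes are read one per character.
    # Latin-1 is an assumption standing in for PDFDocEncoding.
    return raw.decode("latin-1")


text = "Ankara Üniversitesi Hukuk Fakültesi Dergisi"

# Big-endian UTF-16 with the byte order mark round-trips cleanly.
with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
round_trip = decode_pdf_text_string(with_bom)

# UTF-16 bytes without the expected mark are read byte by byte and come out garbled.
without_bom = text.encode("utf-16-le")
garbled = decode_pdf_text_string(without_bom)

print(round_trip == text)  # the marked string survives
print(garbled == text)     # the unmarked string does not
```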


How to convert duration to seconds in Google Sheets which uses IMPORTRANGE

I copied the data from Googlesheet1 to Googlesheet2 using the formula below:
=IMPORTRANGE("url","!A2:H")
That sheet has a duration column.
When I use Apps Script to copy the data to Firestore, the duration is saved in DateTime format instead of as a duration.
Is there any way to convert the duration to seconds in Google Sheets?
I tried =value(G2*24*3600), but it didn't work in Googlesheet2, since that sheet is a clone of Googlesheet1.
Apps Script logic:
function firestore() {
// Firestore setup
const email = "//client-email";
const key = "//client-key";
const projectId = "timesheet-aog";
var firestore = FirestoreApp.getFirestore (email, key, projectId);
// get document data from the spreadsheet
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheetname = "timesheet";
var sheet = ss.getSheetByName(sheetname);
// get the last row and column in order to define range
var sheetLR = sheet.getLastRow(); // get the last row
var sheetLC = sheet.getLastColumn(); // get the last column
var dataSR = 2; // the first row of data
// define the data range
var sourceRange = sheet.getRange(2,1,sheetLR-dataSR+1,sheetLC);
// get the data
var sourceData = sourceRange.getValues();
// get the number of rows in order to establish the loop bound
var sourceLen = sourceData.length;
console.log('sourceLen is', sourceLen);
// Loop through the rows
for (var i=0;i<sourceLen;i++){
var data = {};
console.log('data is', sourceData);
data.date = sourceData[i][0];
data.name = sourceData[i][1];
data.workFrom = sourceData[i][2];
data.project = sourceData[i][3];
data.phase = sourceData[i][4];
data.task = sourceData[i][5];
data.totalHrs = sourceData[i][6];
data.comments = sourceData[i][7];
firestore.createDocument("timesheet",data);
}
}
Here is the formula for A1 cell of the second sheet:
={
  IMPORTRANGE("url","!A2:F"),
  ARRAYFORMULA(
    IF(
      IMPORTRANGE("url","!G2:G") = "",
      "",
      N(IMPORTRANGE("url","!G2:G")) * 24 * 3600
    )
  ),
  IMPORTRANGE("url","!H2:H")
}
Try using named ranges for columns (A2:F, G2:G, H2:H) in the original sheet, and import them by those names so you won't need to adjust the formula where exact column names are used.
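The multiplication by 24 * 3600 in the formula works because a duration is stored as a fraction of a day. A minimal sketch of the same arithmetic follows. The helper names `day_fraction_to_seconds` and `duration_text_to_seconds` are hypothetical, and the second helper assumes the duration is available as an H:MM:SS string.

```python
SECONDS_PER_DAY = 24 * 3600


def day_fraction_to_seconds(fraction: float) -> int:
    """Convert a duration stored as a fraction of a day into whole seconds."""
    return round(fraction * SECONDS_PER_DAY)


def duration_text_to_seconds(text: str) -> int:
    """Convert an 'H:MM:SS' duration string into seconds (hours may exceed 24)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# 1 hour 30 minutes is 0.0625 of a day.
print(day_fraction_to_seconds(0.0625))
print(duration_text_to_seconds("1:30:00"))
```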

Write to Spreadsheet in Google sheet from ASP.NET

I have code that should write to a Google Sheets spreadsheet. When I run the function, I receive this error:
Message[Requested writing within range ['6/12/2019-20:37'!A1], but
tried writing to column [B]] Location[ - ] Reason[badRequest]
Domain[global]
This is my code:
private void SheetPattern(Item webinar)
{
var valueRange = new ValueRange();
var range = $"{sheet}!A:D";
DateTime dateTime=(DateTime)webinar.webInfo.times[0].startTime;
var date = dateTime.Day+"-"+dateTime.Month+"-"+dateTime.Year;
var hour = dateTime.Hour + ":" + dateTime.Minute;
var webName = webinar.webInfo.subject;
var webDescription = webinar.webInfo.description;
var oblist = new List<object>() { date, hour, webName, webDescription};
valueRange.Values = new List<IList<object>> { oblist };
var appendRequest = service.Spreadsheets.Values.Append(valueRange, SpreadsheetId, range);
Console.WriteLine(appendRequest);
appendRequest.ValueInputOption = SpreadsheetsResource.ValuesResource.AppendRequest.ValueInputOptionEnum.USERENTERED;
var appendReponse = appendRequest.Execute();
}
I found the problem. It is a syntax problem, here:
var hour = dateTime.Hour + ":" + dateTime.Minute;
When I create a new sheet with a new name, Google Sheets doesn't permit the : character in the sheet name. So I changed that code to this:
var hour = dateTime.Hour + "-" + dateTime.Minute;
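The same idea can be sketched as a small helper that builds a sheet name from a date and time without using a colon. The function name `safe_sheet_title` and the exact name layout are assumptions for illustration. No Google Sheets API is called here.

```python
from datetime import datetime


def safe_sheet_title(moment: datetime) -> str:
    """Build a sheet name from a date and time, using '-' instead of ':'."""
    date_part = f"{moment.day}-{moment.month}-{moment.year}"
    time_part = f"{moment.hour}-{moment.minute}"
    return f"{date_part}-{time_part}"


title = safe_sheet_title(datetime(2019, 12, 6, 20, 37))
print(title)
```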

Best way to import bulk data into ArangoDB

I'm currently working on an ArangoDB POC. I find that document creation with PyArango is very slow: it takes about 5 minutes to insert 300 documents. I've pasted the rough code below. Please let me know if there are better ways to speed this up:
with open('abc.csv') as fp:
    for line in fp:
        dataList = line.split(",")
        aaa = dbObj['aaa'].createDocument()
        bbb = dbObj['bbb'].createDocument()
        ccc = dbObj['ccc'].createEdge()
        bbb['bbb'] = dataList[1]
        aaa['aaa'] = dataList[0]
        aaa._key = dataList[0]
        aaa.save()
        bbb.save()
        ccc.links(aaa,bbb)
        ccc['related_to'] = "gfdgf"
        ccc['weight'] = 0
        ccc.save()
The collections are created with the code below:
dbObj.createCollection(className='aaa', waitForSync=False)
Regarding your problem with batch mode in the ArangoDB Java driver: if you know the key attributes of the vertices, you can build the document handle as "collectionName" + "/" + "documentKey".
Example:
arangoDriver.startBatchMode();
for(String line : lines)
{
String[] data = line.split(",");
BaseDocument device = new BaseDocument();
BaseDocument phyAddress = new BaseDocument();
BaseDocument conn = new BaseDocument();
String keyDevice = data[0];
String handleDevice = "DeviceId/" + keyDevice;
device.setDocumentKey(keyDevice);
device.addAttribute("device_id",data[0]);
String keyPhyAddress = data[1];
String handlePhyAddress = "PhysicalLocation/" + keyPhyAddress;
phyAddress.setDocumentKey(keyPhyAddress);
phyAddress.addAttribute("address",data[1]);
final DocumentEntity<BaseDocument> from = arangoDriver.graphCreateVertex("testGraph", "DeviceId", device, null);
final DocumentEntity<BaseDocument> to = arangoDriver.graphCreateVertex("testGraph", "PhysicalLocation", phyAddress, null);
arangoDriver.graphCreateEdge("testGraph", "DeviceId_PhysicalLocation", null, handleDevice, handlePhyAddress, null, null);
}
arangoDriver.executeBatch();
I would build all of the data to be inserted into a json formatted string and use createDocumentRaw to create them all at once with one save.
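A minimal sketch of preparing all documents up front, so that they can be sent in one request instead of one save per row. It only builds the data and makes no driver call. The function name `build_bulk_payloads` is hypothetical, and using the second CSV column as the key of the 'bbb' documents is an assumption (the question only sets a key on 'aaa'). The `_key`, `_from` and `_to` attributes follow the "collectionName/documentKey" handle format described above.

```python
import csv
import io
import json


def build_bulk_payloads(csv_text: str):
    """Turn CSV rows into vertex and edge documents ready for a bulk insert."""
    vertices_a, vertices_b, edges = [], [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        key_a, key_b = row[0].strip(), row[1].strip()
        vertices_a.append({"_key": key_a, "aaa": key_a})
        vertices_b.append({"_key": key_b, "bbb": key_b})
        # Document handles are "collectionName/documentKey".
        edges.append({
            "_from": f"aaa/{key_a}",
            "_to": f"bbb/{key_b}",
            "related_to": "gfdgf",
            "weight": 0,
        })
    return vertices_a, vertices_b, edges


a_docs, b_docs, edge_docs = build_bulk_payloads("d1,p1\nd2,p2\n")
edge_json = json.dumps(edge_docs)  # one JSON string for all edges
print(edge_json)
```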

parse lazy load result table(json)

I am trying to parse this page: http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true
but I can't get the result table because, as I can see in Fiddler, it is lazy-loaded from a JSON response.
My code is:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true");
// Get all tables in the document
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
// Iterate all rows in the first table
HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
var data = rows.Skip(1).ToList().Take(10).ToList().Select(x => new TableRow()
{
Price = x.SelectNodes(".//td").ToList()[4].InnerText,
Operator = x.SelectNodes(".//td").ToList()[15].InnerText,
DepartureDate = x.SelectNodes(".//td").ToList()[6].InnerText,
DestinationRegion = x.SelectNodes(".//td").ToList()[7].InnerText
}).ToList();
UPDATE
Second site :
Code
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://sletat.ru/");//MUST BE THIS HEADER
string result = wc.DownloadString("http://module.sletat.ru/Main.svc/GetTours?cityFromId=832&countryId=35&cities=&meals=&stars=&hotels=&s_adults=1&s_kids=0&s_kids_ages=&s_nightsMin=6&s_nightsMax=16&s_priceMin=0&s_priceMax=&currencyAlias=RUB&s_departFrom=25%2F06%2F2012&s_departTo=31%2F07%2F2012&visibleOperators=&s_hotelIsNotInStop=true&s_hasTickets=true&s_ticketsIncluded=true&debug=0&filter=0&f_to_id=&requestId=19198631&pageSize=20&pageNumber=1&updateResult=1&includeDescriptions=1&includeOilTaxesAndVisa=1&userId=&jskey=1&callback=_jqjsp&_1340633427022=");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = json["GetToursResult"]["Data"]["aaData"] as object[];
// var operators = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var temp = prices.ToList().Take(20).Select(x => new TableRow
{
Operator = (x as object[])[40].ToString(),
//Price = x["operatorPrice"].ToString(),
//DepartureDate = x["checkinDate"].ToString(),
//DestinationRegion = ((Dictionary<string, object>)x["country"])["englishName"].ToString()
}).ToList();
string str = "";
foreach (var tableRow in temp)
{
str += tableRow.Operator + "<br />";
}
Response.Write(str);
This approach works, but the link only works for roughly 30 minutes, after which I need to put in another link. Is there any way to fix this? (Only the second site has this problem.)
Thanks again.
The data is really coming from here:
http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=3&pageSize=50&_=1340131756631
With the exception that the page=# and pageSize=# can be adjusted dynamically.
So instead of parsing HTML, you could just get the JSON data from the URL and parse it. For example:
WebClient wc = new WebClient();
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
CheckinDate = p["checkinDate"]
};
Console.WriteLine(data);
In LINQPad, this produces something like:
OperatorID  Price  Country  CheckinDate
0           1,27   Greece   2012-06-28
0           55,90  Greece   2012-06-28
0           67,34  Greece   2012-06-28
And many more rows, depending on how much you ask for...
Note: the reason for the result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1); line is that the JSONP result has this garbage at the beginning:
jQuery17207647891761735082_1340131755603({"
and ends with }), which makes the JavaScriptSerializer choke when it tries to parse it; hence the need to remove it.
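The same unwrapping step, sketched outside C#: keep everything from the first { to the last } and parse that as JSON. The function name `strip_jsonp` is hypothetical, and the sample payload is made up for illustration; only the callback name and field names are taken from the post.

```python
import json


def strip_jsonp(payload: str) -> str:
    """Return the JSON object inside a JSONP wrapper such as callback({...})."""
    start = payload.index("{")   # first opening brace
    end = payload.rindex("}")    # last closing brace
    return payload[start:end + 1]


# Made-up sample response in the shape described above.
sample = (
    'jQuery17207647891761735082_1340131755603('
    '{"result": {"prices": [{"operatorID": 0, "operatorPrice": "1,27"}]}})'
)

data = json.loads(strip_jsonp(sample))
print(data["result"]["prices"][0]["operatorPrice"])
```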
Update:
Interestingly, the ASHX handler that returns the data seems to require a Referer header in the request; otherwise, the response will not include the operator information. The Referer cannot be an arbitrary value: the handler seems to look for http://agent.bronni.ru in particular.
Basically, all you need to do is the following:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");//MUST BE THIS HEADER
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string,object>>();
var data = from p in prices
select new
{
OperatorID = p["operatorID"],
Price = p["operatorPrice"],
Country = ((Dictionary<string,object>)p["country"])["englishName"],
Hotel = ((Dictionary<string,object>)p["hotel"])["englishName"],
Operator = ((Dictionary<string,object>)p["operator"])["englishName"],//OPERATOR
CheckinDate = p["checkinDate"]
};
OperatorID  Price  Country  Hotel                            Operator           CheckinDate
19681       1,27   Greece   Julia Hotel                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Forest Park                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Kassandra Mare (ï-îâ Êàññàíäðà)  Mouzenidis Travel  2012-06-28
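For comparison, attaching the Referer header could be sketched with Python's standard library as below. The request is only built, not sent, so nothing here confirms how the server responds. The helper name `build_request` is hypothetical, and the URL and Referer value are copied from the answer.

```python
import urllib.request

# URL copied from the answer; the site may no longer respond to it.
URL = (
    "http://beta.remote.bronni.ru/LazyLoading.ashx/getResult"
    "?jsonp=jQuery17207647891761735082_1340131755603"
    "&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631"
)


def build_request(url: str, referer: str) -> urllib.request.Request:
    """Create a GET request that carries the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


request = build_request(URL, "http://agent.bronni.ru")
print(request.get_header("Referer"))
# urllib.request.urlopen(request) would actually send it (not done here).
```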
UPDATE 2:
I decided to compare the performance of the out-of-the-box JavaScriptSerializer against the JSON.NET serializer. In all my tests with different record counts (50, 1000, 3000), JSON.NET was at least twice as fast as the JavaScriptSerializer, and in some cases even 10 times faster on smaller record sets.
If you decide to use the JSON.NET library, here's the code that will get you the same results as the code above:
WebClient wc = new WebClient();
wc.Headers.Add("Referer","http://agent.bronni.ru");
string result =wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631");
result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1);
JObject o = JObject.Parse(result);
var data = from x in o["result"]["prices"]
select new
{
OperatorID = x["operatorID"],
Price = x["operatorPrice"],
Country = x["country"]["englishName"],
Hotel = x["hotel"]["englishName"],
Operator = x["operator"]["englishName"],
CheckinDate = x["checkinDate"]
};
Console.WriteLine(data);

ActionScript 3.0: How to extract two values from a string?

Hi,
I have a URL string:
var str:String = "conn=rtmp://server.com/service/&fileId=myfile.flv"
or
var str:String = "fileId=myfile.flv&conn=rtmp://server.com/service/"
The string might be in either form, but I need to get the values of "conn" and "fileId" from it.
How can I write a function for that?
I'm guessing that you're having trouble with the second '=' in the string. Fortunately, ActionScript's String.split method supports splitting on strings, so the following code should work:
var str:String = "conn=rtmp://server.com/service/&fileId=myfile.flv";
var conn:String = (str + "&").split("conn=")[1].split("&")[0];
and
var str:String = "fileId=myfile.flv&conn=rtmp://server.com/service/";
var fileId:String = (str + "&").split("fileId=")[1].split("&")[0];
Note: I'm appending a & to the string, in case the string didn't contain any url parameters beyond the one we're looking for.
var str:String = "fileId=myfile.flv&conn=rtmp://server.com/service/"
var fa:Array = str.split("&");
for(var i:uint=0;i<fa.length;i++)
fa[i] = fa[i].split('=');
This is what the "fa" variable contains in the end:
fa =
[
["fileId","myfile.flv"],
["conn","rtmp://server.com/service/"]
]
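The split-on-& then split-on-= approach above can be wrapped in a function, as the question asks. This sketch is in Python rather than ActionScript, and the name `parse_params` is hypothetical. It splits each pair only at the first "=", so a value that itself contains "=" stays intact.

```python
def parse_params(query: str) -> dict:
    """Parse 'name=value&name=value' into a dictionary, in any parameter order."""
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue  # ignore empty segments such as a trailing '&'
        # partition splits at the first '=' only
        name, _, value = pair.partition("=")
        params[name] = value
    return params


first = parse_params("conn=rtmp://server.com/service/&fileId=myfile.flv")
second = parse_params("fileId=myfile.flv&conn=rtmp://server.com/service/")
print(first["conn"], first["fileId"])
```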
var url:String = "fileId=myfile.flv&conn=rtmp://server.com/service/";
var strArray:Array = url.split(/[=&]/); // split on both '=' and '&'
trace(strArray[0]) //Just to test
This returns an array with the names ('fileId' or 'conn') at the even indices (0, 2) and the corresponding values at the odd indices (1, 3).
Or was it something else you needed?
