International/Domestic flight indicator in Sabre reservation

When using Sabre APIs, is there any reliable indicator available in a Sabre TravelItineraryReadRS (or GetReservation) or other API that indicates whether a flight is international or domestic?
I want to avoid adding complexity and having to maintain a separate list of airport codes and countries if possible, and instead just use an indicator from a response.
I've checked <FlightSegment> in <PTC_FareBreakdown>, but nothing there seems to indicate whether the flight is international:
<tir39:FlightSegment ConnectionInd="O" DepartureDateTime="02-24T13:00" FlightNumber="123" ResBookDesigCode="E" SegmentNumber="1" Status="SS">
  <tir39:BaggageAllowance Number="01P"/>
  <tir39:FareBasis Code="AFB112"/>
  <tir39:MarketingAirline Code="VA" FlightNumber="123"/>
  <tir39:OriginLocation LocationCode="BNE"/>
  <tir39:ValidityDates>
    <tir39:NotValidAfter>2019-02-24</tir39:NotValidAfter>
    <tir39:NotValidBefore>2019-02-24</tir39:NotValidBefore>
  </tir39:ValidityDates>
</tir39:FlightSegment>
and also checked in <ReservationItems><Item>, e.g.:
<tir39:Item RPH="1">
  <tir39:FlightSegment AirMilesFlown="0466" ArrivalDateTime="05-18T14:40" DayOfWeekInd="6" DepartureDateTime="2019-05-18T13:05" SegmentBookedDate="2018-12-21T11:20:00" ElapsedTime="01.35" eTicket="true" FlightNumber="0529" NumberInParty="01" ResBookDesigCode="E" SegmentNumber="0001" SmokingAllowed="false" SpecialMeal="false" Status="HK" StopQuantity="00" IsPast="false" CodeShare="false" Id="123">
    <tir39:DestinationLocation LocationCode="SYD" Terminal="TERMINAL 3 DOMESTIC" TerminalCode="3"/>
    <tir39:Equipment AirEquipType="21B"/>
    <tir39:MarketingAirline Code="QF" FlightNumber="0529">
      <tir39:Banner>MARKETED BY QANTAS AIRWAYS</tir39:Banner>
    </tir39:MarketingAirline>
    <tir39:Meal Code="L"/>
    <tir39:OperatingAirline Code="QF" FlightNumber="0529" ResBookDesigCode="E">
      <tir39:Banner>OPERATED BY QANTAS AIRWAYS</tir39:Banner>
    </tir39:OperatingAirline>
    <tir39:OperatingAirlinePricing Code="QF"/>
    <tir39:DisclosureCarrier Code="QF" DOT="false">
      <tir39:Banner>QANTAS AIRWAYS</tir39:Banner>
    </tir39:DisclosureCarrier>
    <tir39:OriginLocation LocationCode="BNE" Terminal="DOMESTIC" TerminalCode="D"/>
    <tir39:UpdatedArrivalTime>05-18T14:40</tir39:UpdatedArrivalTime>
    <tir39:UpdatedDepartureTime>05-18T13:05</tir39:UpdatedDepartureTime>
  </tir39:FlightSegment>
</tir39:Item>
and although these have origin/destination airports, neither indicates whether the flight is international, and the terminal name is not reliable as an indicator.
<PriceQuotePlus> has a DomesticIntlInd attribute that initially looked useful:
<tir39:PriceQuotePlus DomesticIntlInd="I" PricingStatus="S" VerifyFareCalc="false" ItineraryChanged="false" ...>
but PriceQuotePlus, and therefore DomesticIntlInd, does not seem to be present in all circumstances. For example, I have TravelItineraryReadRS responses with no PriceQuotePlus element that still contain ReservationItems/Item/FlightSegment elements I need to identify as international or domestic.
Worse, I have a reservation where DomesticIntlInd is set to "I" even though the reservation has no international flight (it contains only one flight, and that flight is domestic, BNE-SYD).
Any other thoughts on where I might find a reliable international flight indicator, or is this functionality simply not available?

Sabre does expose a City Pairs API that includes country codes for each airport, which you could use to infer whether a flight starts and ends in the same country.
They also expose this data as a list that you could build into your own data table, but the API would probably be more future-proof.
The current file can be found here, but I don't know if that link will work forever.
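Once you have an airport-to-country lookup built from that data, the check itself is trivial. A minimal sketch in R, where the file name and column names ("airport", "country") are hypothetical placeholders:
# Minimal sketch, not Sabre's API itself: assumes a lookup table built
# from the City Pairs data. "airports.csv" and its columns are hypothetical.
airports <- read.csv("airports.csv", stringsAsFactors = FALSE)

is_international <- function(origin, destination) {
  orig_country <- airports$country[airports$airport == origin]
  dest_country <- airports$country[airports$airport == destination]
  orig_country != dest_country
}

is_international("BNE", "SYD")  # FALSE: both airports are in Australia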


Extracting full article text via the newsanchor package [in R]

I am using the newsanchor package in R to try to extract full article content via NewsAPI. So far I have done the following:
require(newsanchor)
results <- get_everything(query = "Trump +Trade", language = "en")
test <- results$results_df
This gives me a data frame with information on (at most) 100 articles. However, these do not contain the full article text. Rather, they contain something like the following:
[1] "Tensions between China and the U.S. ratcheted up several notches over the weekend as Washington sent a warship into the disputed waters of the South China Sea. Meanwhile, Google dealt Huaweis smartphone business a crippling blow and an escalating trade war co… [+5173 chars]"
Is there a way to extract the remaining 5173 characters? I have tried reading the documentation, but I am not sure.
I don't think that is possible, at least with the free plan. If you go through the documentation at https://newsapi.org/docs/endpoints/everything, the Response object section says:
content - string
The unformatted content of the article, where available. This is truncated to 260 chars for Developer plan users.
So the content is restricted to 260 characters. However, test$url contains the link to the source article, which you can use to scrape the full content; but since the articles are aggregated from many different sources, I don't think there is a single automated way to do this.
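If you do go the scraping route, here is a minimal sketch with rvest, assuming the article body sits in <p> tags (an assumption; each site needs its own selector and cleanup):
library(rvest)

# Hedged sketch: pull the <p> text from one source page listed in test$url.
# The "p" selector is an assumption and will need per-site adjustment.
url   <- test$url[1]
page  <- read_html(url)
paras <- html_text(html_nodes(page, "p"))
article <- paste(paras, collapse = "\n")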

How to get the item count in XML using MarkLogic

I am new to MarkLogic. I need to get the total count of books from the following XML. Can anyone suggest how?
<bk:bookstore xmlns:bk="http://www.bookstore.org">
  <bk:book category='Computer'>
    <bk:author>Gambardella, Matthew</bk:author>
    <bk:title>XML Developer's Guide</bk:title>
    <bk:price>44.95</bk:price>
    <bk:publish_year>1995</bk:publish_year>
    <bk:description>An in-depth look at creating applications with XML.</bk:description>
  </bk:book>
  <bk:book category='Fantasy'>
    <bk:author>Ralls, Kim</bk:author>
    <bk:title>Midnight Rain</bk:title>
    <bk:price>5.95</bk:price>
    <bk:publish_year>2000</bk:publish_year>
    <bk:description>A former architect battles corporate zombies, an evil
      sorceress, and her own childhood to become queen of the world.</bk:description>
  </bk:book>
  <bk:book category='Comic'>
    <bk:author>Robert M. Overstreet</bk:author>
    <bk:title>The Overstreet Indian Arrowheads Identification </bk:title>
    <bk:price>2000</bk:price>
    <bk:publish_year>1991</bk:publish_year>
    <bk:description>A leading expert and dedicated collector, Robert M.
      Overstreet has been writing The Official Overstreet Identification and Price
      Guide to Indian Arrowheads for more than 21 years</bk:description>
  </bk:book>
  <bk:book category='Comic'>
    <bk:author>Randall Fuller</bk:author>
    <bk:title>The Book That Changed America</bk:title>
    <bk:price>1000</bk:price>
    <bk:publish_year>2017</bk:publish_year>
    <bk:description>The New York Times Book Review Throughout its history
      America has been torn in two by debates over ideals and beliefs.</bk:description>
  </bk:book>
</bk:bookstore>
I'd suggest using cts:count-aggregate in combination with cts:element-reference. This requires you to have an element range index on book.
cts:count-aggregate(cts:element-reference(fn:QName("http://www.bookstore.org", "book")))
If performance isn't too critical and your document count isn't too large, you could also count with fn:count.
declare namespace bk="http://www.bookstore.org";
fn:count(//bk:book)
Try this:
declare namespace bk="http://www.bookstore.org";
let $book_xml :=
  <bk:bookstore xmlns:bk="http://www.bookstore.org">
    <bk:book>
    ........
    </bk:book>
    ........
  </bk:bookstore>
return fn:count($book_xml//bk:book)
Hope that helps!

Can MeCab be configured / enhanced to give me the reading of English words too?

If I begin with a wholly Japanese sentence and run it through MeCab, I get something like this:
$ echo "吾輩は猫である" | mecab
吾輩 名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
猫 名詞,一般,*,*,*,*,猫,ネコ,ネコ
で 助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ
ある 助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル
EOS
If I smash together everything I get from the last column, I get "ワガハイワネコデアル", which I can then feed into a speech synthesis program and get output. Said program, however, doesn't handle English words.
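(For illustration, that "smashing together" as a minimal R sketch, assuming mecab is on the PATH; the reading is the last comma-separated feature in MeCab's default output:)
# Minimal sketch: join each token's reading (the last comma-separated
# feature) into a single string for the speech synthesiser.
out   <- system('echo "吾輩は猫である" | mecab', intern = TRUE)
toks  <- out[out != "EOS"]
feats <- sub("^[^\t]+\t", "", toks)   # drop the surface form and the tab
reads <- vapply(strsplit(feats, ","), function(f) f[length(f)], "")
paste(reads, collapse = "")           # "ワガハイワネコデアル"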
If I throw English into MeCab, it manages to tokenise it (probably naively, at the spaces), but gives no readings:
$ echo "I am a cat" | mecab
I 名詞,固有名詞,組織,*,*,*,*
am 名詞,一般,*,*,*,*,*
a 名詞,一般,*,*,*,*,*
cat 名詞,固有名詞,組織,*,*,*,*
EOS
I want to get readings for these as well, even if they're not perfect, so that I can get something along the lines of "アイアムアキャット".
I have already scoured the web for solutions, and while I found a number of websites whose transliteration appears adequate, I can't find any way to do it in my own code. In a couple of cases I emailed the site authors and, after waiting a few weeks, have had no response. (Just how far behind on their inboxes are these people?)
There are a number of directions I can go but I hit dead ends on all of them so far, so this is my compound question:
MeCab takes custom dictionaries. Is there a custom dictionary which fills in the English knowledge somewhat?
Is there some other library or tool that can take English and spit out Katakana?
Is there some library or tool that can take IPA (International Phonetic Alphabet) and spit out Katakana? (I know how to get from English to IPA.)
As an aside, I find that the software "VOICEROID" can speak English text (poorly, but adequately for my purposes). This software uses MeCab too (or at least its DLL and dictionary files are included in the install). It also uses another library, CaboCha, which as far as I can tell from running it does exactly the same thing as MeCab. It could be using custom dictionaries for either of these two libraries to do the job, or the code could live in the proprietary AITalk library they use. More research is needed, and I haven't figured out how to run either tool against their dictionaries to test this directly.

How to connect data dictionaries to the unlabeled data

I'm working with some large government datasets from the Department of Transportation that are available as tab-delimited text files accompanied by data dictionaries. For example, the auto complaints file is a 670 MB file of unlabeled data (when unzipped), and comes with a dictionary. Here are some excerpts:
Last updated: April 24, 2014
FIELDS:
=======
Field#  Name     Type/Size  Description
------  -------  ---------  --------------------------------------
1       CMPLID   CHAR(9)    NHTSA'S INTERNAL UNIQUE SEQUENCE NUMBER.
                            IS AN UPDATEABLE FIELD,THUS DATA FOR A
                            GIVEN RECORD POTENTIALLY COULD CHANGE FROM
                            ONE DATA OUTPUT FILE TO THE NEXT.
2       ODINO    CHAR(9)    NHTSA'S INTERNAL REFERENCE NUMBER.
                            THIS NUMBER MAY BE REPEATED FOR
                            MULTIPLE COMPONENTS.
                            ALSO, IF LDATE IS PRIOR TO DEC 15, 2002,
                            THIS NUMBER MAY BE REPEATED FOR MULTIPLE
                            PRODUCTS OWNED BY THE SAME COMPLAINANT.
Some of the fields have foreign keys listed like so:
21      CMPL_TYPE  CHAR(4)  SOURCE OF COMPLAINT CODE:
                            CAG  =CONSUMER ACTION GROUP
                            CON  =FORWARDED FROM A CONGRESSIONAL OFFICE
                            DP   =DEFECT PETITION,RESULT OF A DEFECT PETITION
                            EVOQ =HOTLINE VOQ
                            EWR  =EARLY WARNING REPORTING
                            INS  =INSURANCE COMPANY
                            IVOQ =NHTSA WEB SITE
                            LETR =CONSUMER LETTER
                            MAVQ =NHTSA MOBILE APP
                            MIVQ =NHTSA MOBILE APP
                            MVOQ =OPTICAL MARKED VOQ
                            RC   =RECALL COMPLAINT,RESULT OF A RECALL INVESTIGATION
                            RP   =RECALL PETITION,RESULT OF A RECALL PETITION
                            SVOQ =PORTABLE SAFETY COMPLAINT FORM (PDF)
                            VOQ  =NHTSA VEHICLE OWNERS QUESTIONNAIRE
There are import instructions for Microsoft Access, which I don't have and would not use if I did. But I THINK this data dictionary was meant to be machine-readable.
My question: is this data dictionary a standard format of some kind? I've tried Googling around, but it's hard to do so without the right terminology. I would like to import it into R, though I'm flexible so long as it can be done programmatically.
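For the R side at least, reading the flat file itself is straightforward; the open question is parsing the dictionary. A minimal sketch with readr, where the file name is a placeholder and only the two field names quoted above are filled in:
library(readr)

# Hedged sketch: the flat file is tab-delimited with no header row.
# "complaints.txt" is a placeholder; column names must be transcribed
# from the dictionary by hand until a dictionary parser exists.
complaints <- read_tsv("complaints.txt", col_names = FALSE)
names(complaints)[1:2] <- c("CMPLID", "ODINO")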

Collect tweets with their related tweeters

I am doing text mining on tweets. I have collected random tweets from different accounts about a topic, transformed the tweets into a data frame, and was able to find the most frequent tweeters among those tweets (using the "screenName" column). For example, tweets like these:
[1] "ISCSP_ORG: #cybercrime NetSafe publishes guide to phishing:
Auckland, Monday 04 June 2013 – Most New Zealanders will have...
http://t.co/dFLyOO0Djf"
[1] "ISCSP_ORG: #cybercrime Business Briefs: MILL CREEK — H.M. Jackson
High School DECA chapter members earned the organizatio...
http://t.co/auqL6mP7AQ"
[1] "BNDarticles: How do you protect your #smallbiz from #cybercrime?
Here are the top 3 new ways they get in & how to stop them.
http://t.co/DME9q30mcu"
[1] "TweetMoNowNa: RT #jamescollinss: #senatormbishop It's the same
problem I've been having in my fight against #cybercrime. \"Vested
Interests\" - Tell me if …"
[1] "jamescollinss: #senatormbishop It's the same problem I've been
having in my fight against #cybercrime. \"Vested Interests\" - Tell me
if you work out a way!"
Several different tweeters have sent many tweets in the collected dataset.
Now I want to collect/group the related tweets by their corresponding tweeter/user.
Is there any way to do this in R? Any suggestions would be much appreciated.
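One way, as a minimal base-R sketch: assuming the data frame is tweets_df with columns screenName and text (e.g. as produced by twitteR's twListToDF), split() groups the tweet texts by account.
# Group the tweet texts by the account that sent them.
tweets_by_user <- split(tweets_df$text, tweets_df$screenName)
tweets_by_user[["ISCSP_ORG"]]   # all collected tweets from that account
lengths(tweets_by_user)         # tweet count per tweeter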
