R's XML to Dataframe - r

I have the following XML example and I want to get the "listing" data into an R data frame. What's the best way to do it? For example, how would you configure the xmlToDataFrame function to do the job, starting from a URL?
<response>
<area_name/>
<bounding_box>
<latitude_max>51.667389</latitude_max>
<latitude_min>51.385262</latitude_min>
<longitude_max>0.137236</longitude_max>
<longitude_min>-0.34844</longitude_min>
</bounding_box>
<country>England</country>
<county>London</county>
<latitude>51.5263255</latitude>
<listing>
<agent_address>218a Brick Lane</agent_address>
<agent_logo/>
<agent_name>Salik & Co</agent_name>
<agent_phone>020 3318 7059</agent_phone>
<country/>
<county>London</county>
<description>
Description:
Salik & Co is offering this Commercial Freehold Property for sale on Wise Road, Stratford, London E15... Premises consists of:
12 X 2 Bedroom Flat AS Following-
* 2x Pent House With One Bath
* 2x Flat With 2 Double Bed With 2 Bath
* 8x Flat With 1 Bath
* 4x Car Park
* 4x floor with ground floor parking
* Lift
* Flat 11 & 13 Pent House with 500 sqft Terrace.
* Flat 12 & Ground Floor shop sold for Long lease. Area Profile:
Stratford is a place in the London Borough of Newham in East London. It will be the primary location of the 2012 Summer Olympics. The area is identified in the London Plan as one of 35 major centres in Greater London. Stratford has been a focus of regeneration for some years, and is the location of a number of major projects.... Property Location:
Set only moments from the vibrant amenities of Stratford, Olympic park, Westfield Shopping centre. This modern 2 bed stunning flat offers contemporary accommodation with a private balcony, in a fabulous new eco building close to the green open spaces of East London
Situated on Stratford High Street the property enjoys swift access into the fashionable bars, restaurants and boutiques of Stratford High Street. Transport links include Stratford (Central line, Jubilee Line and British Rail) which provide further link to district line. Price:
Asking price £2.8 million. For further information contact:
Ryan -
Salik -
Office -
Email -
Web -
</description>
<details_url>http://www.zoopla.co.uk/for-sale/details/4507257</details_url>
<displayable_address>Wise Road, London</displayable_address>
<image_caption/>
<image_url>
http://images.zoopla.co.uk/52367ed1b61a63c1b93f1ec0a70d39f83d590c74_354_255.jpg
</image_url>
<latitude>51.53477</latitude>
<listing_id>4507257</listing_id>
<listing_status>sale</listing_status>
<longitude>-0.0045035</longitude>
<num_bathrooms>0</num_bathrooms>
<num_bedrooms>0</num_bedrooms>
<num_floors>0</num_floors>
<num_recepts>0</num_recepts>
<outcode>E15</outcode>
<post_town>London</post_town>
<price>2800000</price>
<price_change>
<date>2010-04-27 00:42:05</date>
<price>3000000</price>
</price_change>
<price_change>
<date>2010-07-04 03:21:14</date>
<price>2800000</price>
</price_change>
<property_type/>
<street_name>Commercial Property</street_name>
<thumbnail_url>
http://images.zoopla.co.uk/52367ed1b61a63c1b93f1ec0a70d39f83d590c74_80_60.jpg
</thumbnail_url>
</listing>
<listing>
<agent_address>Rawlings House 2a Milner Street</agent_address>
<agent_logo>
http://static.zoopla.co.uk/zoopla_static_agent_logo_(29807).gif
</agent_logo>
<agent_name>Marsh & Parsons</agent_name>
<agent_phone>020 3318 6922</agent_phone>
<country/>
<county>London</county>
<description>
A stunning, three bed mews house with spacious, roof terrace in a popular gated development off Milner Street and ideally located for the nearby amenities of South Kensington and Knightsbridge. A large reception room and a contemporary, open plan kitchen occupy the ground floor, while the first floor houses two double bedrooms and a modern family bathroom. An additional shower room can be found on the second floor which also has a substantial area suitable for a third bedroom and access to the sunny roof terrace. Located in the heart SW3's Brompton area, there is a multitude of local amenities available at Chelsea's popular King's Road and Sloane Street, while the shops, bars and restaurants of Brompton Road and Knightsbridge are easily reached. St. Catherine's Mews is ideally located for the Underground stations at both Sloane Square (Circle and District Lines) and South Kensington (Piccadilly, Circle and District Lines) while such a central location provides a number of convenient bus services. For transport links into and out of London the motorways can be accessed via the nearby A4.The property also has planning permission to extend the 2nd floor to encompass some of the roof terrace that would create greater internal square footage.
</description>
<details_url>http://www.zoopla.co.uk/for-sale/details/491528</details_url>
<displayable_address>St Catherines Mews, London SW3</displayable_address>
<floor_plan>
http://content.zoopla.co.uk/5cb125b77de67eb6717bd8a2c74ba7edb6839959.jpg
</floor_plan>
<image_caption>Picture No.46</image_caption>
<image_url>
http://images.zoopla.co.uk/0867139d8bc2e2aac63056d75bbce1677de438c6_354_255.jpg
</image_url>
<latitude>51.493774</latitude>
<listing_id>491528</listing_id>
<listing_status>sale</listing_status>
<longitude>-0.164762</longitude>
<num_bathrooms>0</num_bathrooms>
<num_bedrooms>2</num_bedrooms>
<num_floors>0</num_floors>
<num_recepts>0</num_recepts>
<outcode>SW3</outcode>
<post_town>London</post_town>
<price>1500000</price>
<price_change>
<date>2009-05-16 01:40:29</date>
<price>1350000</price>
</price_change>
<price_change>
<date>2010-02-20 00:30:24</date>
<price>1450000</price>
</price_change>
<price_change>
<date>2011-02-12 00:31:53</date>
<price>1500000</price>
</price_change>
<property_type>Town house</property_type>
<street_name>London</street_name>
<thumbnail_url>
http://images.zoopla.co.uk/0867139d8bc2e2aac63056d75bbce1677de438c6_80_60.jpg
</thumbnail_url>
</listing>
<listing>
<agent_address>175 Putney High Street, Putney</agent_address>
<agent_logo>
http://static.zoopla.co.uk/zoopla_static_agent_logo_(47723).jpeg
</agent_logo>
<agent_name>Foxtons - Putney</agent_name>
<agent_phone>020 3318 9160</agent_phone>
<country/>
<county>London</county>
<description>
A stunning four bedroomed house offering exceptionally spacious accommodation with stylish interior throughout. The property is arranged over four floors and comprises two good-sized reception rooms, generous 29' kitchen/dining room, four bedrooms (two with en suite), two bathrooms, two shower rooms, utility room, guest cloakroom, attractive garden and off-street parking. Akehurst Street is a quiet residential road located close to the green expanses of Richmond Park, and close to amenities in Roehampton with a greater selection of shops, bars and restaurants within easy reach in Putney. The area is well served by a number of local bus routes, while the nearby A3 provides motorists with a fast route into central London and to the South-West.
</description>
<details_url>http://www.zoopla.co.uk/for-sale/details/14226965</details_url>
<displayable_address>Akehurst Street, London</displayable_address>
<image_caption/>
<image_url>
http://images.zoopla.co.uk/5a24b05bff28865405aee72bcf4c46ae2ce299a4_354_255.jpg
</image_url>
<latitude>51.450905</latitude>
<listing_id>14226965</listing_id>
<listing_status>sale</listing_status>
<longitude>-0.242762</longitude>
<num_bathrooms>0</num_bathrooms>
<num_bedrooms>4</num_bedrooms>
<num_floors>0</num_floors>
<num_recepts>0</num_recepts>
<outcode>SW15</outcode>
<post_town>London</post_town>
<price>1499950</price>
<property_type>Town house</property_type>
<street_name>Akehurst Street</street_name>
<thumbnail_url>
http://images.zoopla.co.uk/5a24b05bff28865405aee72bcf4c46ae2ce299a4_80_60.jpg
</thumbnail_url>
</listing>
<listing>
<agent_address>55 Fulham Broadway, Fulham</agent_address>
<agent_logo>
http://static.zoopla.co.uk/zoopla_static_agent_logo_(47732).jpeg
</agent_logo>
<agent_name>Foxtons - Fulham</agent_name>
<agent_phone>020 3318 6868</agent_phone>
<country/>
<county>London</county>
<description>
Located on a quiet residential street in Fulham, this great four bedroomed house offers spacious accommodation with loft conversion and south-facing flat roof. Arranged over three floors, the property comprises reception room with bay window, dining room, kitchen with space to dine and access to the garden, top floor master bedroom with en suite shower room, large second bedroom, two additional bedrooms, bathroom and outside store room. The property is situated on a tree-lined street, ideally located just moments from the local amenities on both Dawes Road and Lillie Road and is within easy reach of a more a comprehensive range of bars, shops and restaurants on nearby Fulham Broadway. The closest underground station is Fulham Broadway (District Line), providing convenient access to various central and greater London destinations.
</description>
<details_url>http://www.zoopla.co.uk/for-sale/details/14351415</details_url>
<displayable_address>Prothero Road, London</displayable_address>
<image_caption/>
<image_url>
http://images.zoopla.co.uk/a9d983b76018a07537de832224bb5174099c2758_354_255.jpg
</image_url>
<latitude>51.48186</latitude>
<listing_id>14351415</listing_id>
<listing_status>sale</listing_status>
<longitude>-0.208447</longitude>
<num_bathrooms>0</num_bathrooms>
<num_bedrooms>4</num_bedrooms>
<num_floors>0</num_floors>
<num_recepts>0</num_recepts>
<outcode>SW6</outcode>
<post_town>London</post_town>
<price>750000</price>
<property_type>Town house</property_type>
<street_name>Prothero Road</street_name>
<thumbnail_url>
http://images.zoopla.co.uk/a9d983b76018a07537de832224bb5174099c2758_80_60.jpg
</thumbnail_url>
</listing>
<longitude>-0.105602</longitude>
<postcode/>
<result_count>76494</result_count>
<street/>
<town/>
</response>
Thank You

Data frames require that the input data be "rectangular", and you clearly do not have such an arrangement: the listings carry different fields and a varying number of price_change entries. R's list data type is more appropriate for something like this. The other problem is the presence of bare ampersands, "&", in this file. Because "&" starts an entity reference in XML (and HTML), the safer maneuver would have been to replace each bare "&" with "&amp;". If you change all of the "&"s to "and" you can get xmlToList() to read it and create a corresponding list (but that might damage other files if they had valid HTML buried in them).
The listing data can be extracted by matching 'listing' against the names of the list:
lst <- xmlToList(f)
lst[grep("listing", names(lst))]
(Here f is the document to read; you will need to provide a URL.)
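The two points above (escape bare ampersands first, then keep only the fields that fit a flat row) can be sketched in a language-neutral way. The following is a minimal Python illustration, not the R solution itself: the shortened sample document, the regular expression and the function name listings_to_rows are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the response in the question.
RAW = """<response>
<county>London</county>
<listing>
<agent_name>Salik & Co</agent_name>
<listing_id>4507257</listing_id>
<price>2800000</price>
<price_change><date>2010-04-27 00:42:05</date><price>3000000</price></price_change>
</listing>
<listing>
<agent_name>Foxtons - Putney</agent_name>
<listing_id>14226965</listing_id>
<price>1499950</price>
</listing>
</response>"""

# Match only ampersands that do not already start an entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def listings_to_rows(xml_text):
    """Return one dict per <listing>, keeping leaf children only."""
    root = ET.fromstring(BARE_AMP.sub("&amp;", xml_text))
    rows = []
    for listing in root.findall("listing"):
        # Nested, repeated elements such as price_change do not fit a
        # single flat row, so this sketch simply skips them.
        row = {child.tag: (child.text or "").strip()
               for child in listing if len(child) == 0}
        rows.append(row)
    return rows

ROWS = listings_to_rows(RAW)
```

Each dict in ROWS corresponds to one row of a rectangular table; the price_change history would need its own table keyed by listing_id.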

Related

How to find a route between two stations in PROLOG

Sorry, I know this question comes up a lot but I've done so much research and I just can't figure out how to solve this problem. I have represented a tube map in PROLOG and need to write a predicate that returns all routes between two stations. I know I need to use recursion and have tried a bunch of different solutions but none are working. The facts I have are:
station('AL',[metropolitan]).%Aldgate on the Metropolitan Line
station('BG',[central]).%Bethnal Green on the Central Line
station('BR',[victoria]).%Brixton on the Victoria Line
station('BS',[metropolitan]).%Baker Street on the Metropolitan Line
station('CL',[central]).%Chancery Lane on the Central Line
station('EC',[bakerloo]).%Elephant & Castle on the Bakerloo Line
station('EM',[bakerloo,northern]).%Embankment on the Bakerloo and Northern Lines
station('EU',[northern]).%Euston on the Northern Line
station('FP',[victoria]).%Finsbury Park on the Victoria Line
station('FR',[metropolitan]).%Finchley Road on the Metropolitan Line
station('KE',[northern]).%Kennington on the Northern Line
station('KX',[metropolitan,victoria]).%Kings Cross on the Metropolitan and Victoria Lines
station('LG',[central]).%Lancaster Gate on the Central Line
station('LS',[central,metropolitan]).%Liverpool Street on the Central and Metropolitan Lines
station('NH',[central]).%Notting Hill Gate on the Central Line
station('OC',[bakerloo,central,victoria]).%Oxford Circus on the Bakerloo, Central and Victoria Lines
station('PA',[bakerloo]).%Paddington on the Bakerloo Line
station('TC',[central,northern]).%Tottenham Court Road on the Central and Northern Lines
station('VI',[victoria]).%Victoria on the Victoria Line
station('WA',[bakerloo]).%Warwick Avenue on the Bakerloo Line
station('WS',[northern,victoria]).%Warren Street on the Northern and Victoria Lines
adjacent('WA','PA').%Warwick Avenue is adjacent to Paddington
adjacent('PA','OC').%Paddington is adjacent to Oxford Circus
adjacent('OC','EM').%Oxford Circus is adjacent to Embankment
adjacent('EM','EC').%Embankment is adjacent to Elephant & Castle
adjacent('NH','LG').%Notting Hill Gate is adjacent to Lancaster Gate
adjacent('LG','OC').%Lancaster Gate is adjacent to Oxford Circus
adjacent('OC','TC').%Oxford Circus is adjacent to Tottenham Court Road
adjacent('TC','CL').%Tottenham Court Road is adjacent to Chancery Lane
adjacent('CL','LS').%Chancery Lane is adjacent to Liverpool Street
adjacent('LS','BG').%Liverpool Street is adjacent to Bethnal Green
adjacent('FR','BS').%Finchley Road is adjacent to Baker Street
adjacent('BS','KX').%Baker Street is adjacent to Kings Cross
adjacent('KX','LS').%Kings Cross is adjacent to Liverpool Street
adjacent('LS','AL').%Liverpool Street is adjacent to Aldgate
adjacent('EU','WS').%Euston is adjacent to Warren Street
adjacent('WS','TC').%Warren Street is adjacent to Tottenham Court Road
adjacent('TC','EM').%Tottenham Court Road is adjacent to Embankment
adjacent('EM','KE').%Embankment is adjacent to Kennington
adjacent('BR','VI').%Brixton is adjacent to Victoria
adjacent('VI','OC').%Victoria is adjacent to Oxford Circus
adjacent('OC','WS').%Oxford Circus is adjacent to Warren Street
adjacent('WS','KX').%Warren Street is adjacent to Kings Cross
adjacent('KX','FP').%Kings Cross is adjacent to Finsbury Park
And the solution I have tried is:
route(From,To,Route) :-
routeattempt(From,To,[From],Route),
reverse(Route,route).
routeattempt(From,To,Inbetween,Route) :-
adjacent(From,To),
\+member(From,Inbetween),
Route = [From|Inbetween].
routeattempt(From,To,Visited,Route) :-
adjacent(From,Inbetween),
Inbetween \== To,
\+member(Inbetween,Visited),
routeattempt(Inbetween,To,Inbetween|Visited],Route).
But it just returns false to any input. If anyone could help that would be great.
This is confused.
In
route(From,To,Route) :-
routeattempt(From,To,[From],Route),
reverse(Route,route).
the idea is clearly for routeattempt/4 to "find a route from From to To", to be stored as a list of stations in Route. But what exactly does argument 3, here [From], do? It seems to be the list of visited stations, but do you really need it? You already have Route.
Now you have two subcases: either From and To are adjacent:
routeattempt(From,To,Inbetween,Route) :-
adjacent(From,To),
\+member(From,Inbetween),
Route = [From|Inbetween].
or they are not:
routeattempt(From,To,Visited,Route) :-
adjacent(From,Inbetween),
Inbetween \== To,
\+member(Inbetween,Visited),
routeattempt(Inbetween,To,Inbetween|Visited],Route).
In the first case (the base case), it is unclear why you check whether From is a member of Inbetween. Indeed, this check precludes finding any route between two adjacent stations, because route/3 has already put From into the list Inbetween. And shouldn't the route between two adjacent stations just be [From,To] rather than [From|Inbetween]?
In the second case, you go from From to To via the waystation Inbetween, which is not equal to To and has not been visited yet. Okay. But you need to complete Route after the recursive call.
Try adding a format("Current route from ~w to ~w: ~w\n", [From, To, Route]) before adjacent/2 and see what happens.
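The search being described is a depth-first walk that carries the list of stations visited so far and reports that list once the goal is reached. A minimal Python sketch of the same idea follows. It is not a Prolog fix; the subset of adjacency facts, the decision to treat links as two-way, and the names build_graph and routes are assumptions made for illustration.

```python
from collections import defaultdict

# A few adjacency facts copied from the question. Treating them as
# two-way links is an assumption; the Prolog facts are one-directional.
ADJACENT = [("WA", "PA"), ("PA", "OC"), ("OC", "EM"), ("EM", "EC"),
            ("OC", "TC"), ("TC", "EM")]

def build_graph(pairs):
    """Map each station code to the set of its neighbours."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def routes(graph, start, goal, visited=None):
    """Yield every route without repeated stations from start to goal."""
    visited = [start] if visited is None else visited
    if start == goal:
        # The visited list is already in travel order, so no reverse is needed.
        yield list(visited)
        return
    for nxt in sorted(graph[start]):
        if nxt not in visited:
            yield from routes(graph, nxt, goal, visited + [nxt])

GRAPH = build_graph(ADJACENT)
ALL_ROUTES = list(routes(GRAPH, "WA", "EC"))
```

Because each recursive call appends to the visited list instead of prepending, the route comes out in the right order. Separately, note that reverse(Route,route) in the question names the lowercase atom route rather than a variable, so that goal cannot succeed.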

How to simulate river level rise in R

I need to make a simulation to see what areas would be affected if the sea level rises by X meters. Could anyone give me tips on where to start? I've searched for tools embedded in the Google Maps API but didn't find any workaround.
The idea is to create a function such as this:
isAffected <- function(coordinate, metersRised) {
  # return TRUE if the coordinate is affected, FALSE otherwise
}
Thanks in advance!
My first reaction is that I can't see any quick, straightforward solution using off-the-shelf R libraries or data sets on which to build a function like that. My second is to ask whether you'd like to model it yourself, rely on already developed products, or do something in between. The most rigorous option would be applying a hydrodynamic model; the other extreme is sampling someone else's grid of anticipated results.
Just for context: for river levels affected by sea level rise near the coast, you may want to consider variable river stages if they vary quite a bit. If the rivers are running high due to recent storms or snowmelt, flooding will be worse than from sea level rise alone. So maybe you could assume a limited number of river heights (say, rainy season: high; dry season: low). Tides complicate things too, as do storms and storm surge, which is basically an above-average ocean height caused by temporary very low pressure. An example worst-case scenario with those three components: how much of a given city or regional coastline (say New Orleans, or the Australian coast) would be flooded during a storm surge, at high tide, with the local river very full from spring snowmelt, and with 5 feet of extra sea level added? That is a lot of data to consider; for example, you may want some sort of x,y,z data for those river height assumptions. Lots of cities have inundation maps from which you can get those river stage elevations. The bigger the sea level rise assumption, the less the rivers might matter. For example, a huge sea level rise scenario could easily inundate the whole city as it is today, no matter how high the river is, with the mouth of the river moving miles inland.
Simplifying things, I'd say the most important data will be the digital elevation model (DEM), probably a raster file of x,y,z coordinates, with z being the key piece: the elevation of the pixel at each xy location above a certain datum. Higher resolution DEMs will give much more detailed and realistic inundation. Processed LiDAR data is maybe ideal: very high resolution data that someone else has produced (raw LiDAR data is a burden). There's at least some here for New Zealand, http://opentopo.sdsc.edu/datasets, but I'm not sure of good warehouses for data outside the US.
A basic workflow might be: decide which hydraulic components you'll consider and how many scenarios. For example, you'll ignore tides by using an average sea level, have just two sea level rise scenarios, and assume the river is always at __ feet, or maybe __ ft and __ ft. Download or build the DEM, and then add your river heights to it (not trivial, but searching GIS Stack Overflow is a good start). That gives a reference baseline elevation to combine sea water with. An assumed sea level rise, say 10 feet, is incorporated into another DEM. One approach is raster-math centric: subtract one from the other, and the result will show the new inundation areas. Once you've done the raster math, you could have a binary xy grid of flooded or not flooded, to which you apply that final xy search function: is xy 1 or 0? By far the trickiest part is everything before that. There are maybe more straightforward or simplified approaches, but the system is so dynamic that the sky is the limit for how complicated your model will be. Here's more information on the river component, which might help visualize the river starting points to which you'll add your sea water scenario(s): https://www.usgs.gov/mission-areas/water-resources/science/flood-inundation-mapping-science?qt-science_center_objects=0#qt-science_center_objects
The raster library might be a good start: it will read in downloaded raster/grid files, like .tif, and also perform the raster math you'd need (adding or subtracting same-size rasters). Or, forgetting all this processing, maybe you could just read in pre-processed rasters of such scenarios done by others, then do your search on them. There are probably a good number for certain sea level rises, but it gets much trickier if you want to assume both sea level and river elevation scenarios.
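The last step of that workflow (a binary flooded / not flooded grid plus a lookup) can be sketched with a toy elevation grid. This is a deliberately crude "bathtub" illustration in Python that ignores rivers, tides, storm surge and whether a low cell is actually connected to the sea. The grid values and the names flood_mask and is_affected are made up for the example.

```python
# Hypothetical 3x4 elevation grid, in metres above current mean sea level.
DEM = [
    [0.5, 1.2, 3.0, 6.5],
    [0.2, 0.9, 2.4, 5.1],
    [-0.3, 0.4, 1.8, 4.0],
]

def flood_mask(dem, rise_m, baseline_m=0.0):
    """Return a grid of 1 (flooded) or 0 (dry) for a given sea level rise."""
    level = baseline_m + rise_m
    return [[1 if z <= level else 0 for z in row] for row in dem]

def is_affected(dem, row, col, rise_m):
    """Look up one cell in the binary grid, as in the question's isAffected."""
    return flood_mask(dem, rise_m)[row][col] == 1

MASK_2M = flood_mask(DEM, 2.0)
```

With a real DEM the same comparison would be done on a raster object instead of nested lists, and a coordinate would first have to be converted to a row and column of the grid.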

TomTom reverse Geocode not returning street name for some gps coordinates

For some GPS coordinates, no street name is returned. For example
https://api.tomtom.com/search/2/reverseGeocode/47.532289,-122.251843.json?key=MYKEY&roadUse=[%22LocalStreet%22]&returnRoadUse=true
returns
{"summary":{"queryTime":102,"numResults":1},"addresses":[{"address":{"routeNumbers":[],"countryCode":"US","countrySubdivision":"WA","countrySecondarySubdivision":"King","countryTertiarySubdivision":"Seattle East","municipality":"Mercer Island","postalCode":"98040","municipalitySubdivision":"Mercer Island","country":"United States","countryCodeISO3":"USA","freeformAddress":"Mercer Island, WA 98040","boundingBox":{"northEast":"47.535094,-122.241410","southWest":"47.534766,-122.242287","entity":"position"},"countrySubdivisionName":"Washington"},"position":"47.534897,-122.242287","roadUse":["Publicly Accessible","LocalStreet","Terminal"]}]}
which contains no street name. Is there any way to tell the TomTom API to return results that ALWAYS include a street name?
This may be a hack, but if we create a pedestrian route starting from that location (in the middle of Lake Washington), the first instruction usually starts at the closest navigable street, doesn't it?
These coordinates are pointing to a lake. And the nearest road is some living street with no name. So that is not a perfect example.

Find map coordinates based on street and cross streets information

Example:
I have a street, BECK STREET, two cross streets, WESTCHESTER SQUARE and KIRK STREET, and a direction, W.
Result:
Need to drop a pin at this location. If it's easier I would also be ok with drawing a geofence.
What would be a good way to calculate this?

standardize text using phonetic

I received data from a data center and I have to cleanse it and make it useful. My biggest problem is one column, let's call it "service_description". Say, for example, that the data belongs to a hair salon: this column is filled in manually (a text box) and contains a huge amount of data (billions). Here is a small sample:
service description
washed the haair
hair washed and dried
used shampoo on har
nails manicure
nail paint
nail pant
paint the nails
What I need to do is group each category together by running a script that will analyze each line and give it a specific category. For example, hair could be the category for the first three lines because it appears in all of them, while nail is the category for the rest, taking into consideration that the category word could be misspelled.
results
service description      possible categories
washed the haair         hair
hair washed and dried    hair
used shampoo on har      hair
nails manicure           nail
nail paint               nail
nail pant                nail
paint the nails          nail
I'm assuming your categories are a fixed lookup.
I would split the string on whitespace; then, for each part, I would go through all the items in your categories lookup and pick the one with the minimum Levenshtein distance.
Some references:
http://en.wikipedia.org/wiki/Levenshtein_distance
http://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm
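A minimal Python sketch of that approach, assuming the fixed lookup contains just the two categories from the sample ("hair" and "nail"). The function names are illustrative, and the edit distance is the textbook dynamic-programming version, not the memory-optimised one from the second link.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

CATEGORIES = ["hair", "nail"]  # assumed fixed lookup

def categorize(description, categories=CATEGORIES):
    """Pick the category closest to any single word of the description."""
    distance, category = min(
        (levenshtein(word, cat), cat)
        for word in description.lower().split()
        for cat in categories
    )
    return category

SAMPLE = ["washed the haair", "used shampoo on har", "nail pant", "paint the nails"]
RESULT = [categorize(text) for text in SAMPLE]
```

At the scale mentioned in the question it would probably pay to compute the distance once per distinct word and cache it, and to set a maximum distance above which a row is left uncategorised; both are left out of this sketch.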

Resources