How to follow links in Scrapy when the element's href attribute is '#'? - web-scraping

I am trying to scrape the Niche.com website to extract all schools, together with the details found on each school's page. However, the school links on the listing page have href="#", so Scrapy is unable to get into each school page and collect the data.
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class NicheschoolsSpider(scrapy.Spider):
    name = 'nicheschools'
    allowed_domains = ['www.niche.com']
    start_urls = ['https://www.niche.com/k12/search/best-schools/s/wisconsin/']

    def parse(self, response):
        schoollink = response.xpath("//div[@class='search-result__title-wrapper']/h2")
        for school in schoollink:
            name = school.xpath(".//text()").get()
            link = school.xpath(".//@href").get()
            yield {
                'name': name,
                'link': link
            }
            yield response.follow(url=link, callback=self.parse_schools)

    def parse_schools(self, response):
        name = response.xpath("//h1[@class='postcard__title postcard__title--claimed']/text()").get()
        website = response.xpath("(//a[@class='profile__website__link']/@href)[1]").get()
        address = response.xpath("(//address[@class='profile__address--compact']/text())[1]").get()
        yield {
            'name': name,
            'website': website,
            'address': address
        }
OUTPUT FOR ONE ENTRY:
2023-01-25 16:33:10 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.niche.com/k12/search/best-schools/s/wisconsin/%5C\>
{'name': 'Brookfield Central High School', 'link': '#'}
When it tries to follow the link, the result is shown below:
2023-01-25 16:33:12 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.niche.com/k12/search/best-schools/s/wisconsin/%5C\>
{'name': None, 'website': None, 'address': None}
I am trying to follow each school link and collect the school name, address, telephone, tuition fees and enrollment from that page.

Not really a job for Scrapy, although it can certainly be accomplished with Scrapy.
The website is dynamic and pulls its data from an API endpoint. I won't be setting up a Scrapy project just to answer your question, but I will demonstrate how you can get the data using Requests and pandas (the code is run in a Jupyter notebook):
import requests
import pandas as pd
from tqdm.notebook import tqdm

pd.set_option('display.max_columns', None, 'display.max_colwidth', None)

headers = {
    'accept-language': 'en-US,en;q=0.9',
    'accept': 'application/json',
    'referer': 'https://www.niche.com/k12/search/best-schools/s/wisconsin/',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}

big_df = pd.DataFrame()
s = requests.Session()
s.headers.update(headers)

# Request the first four result pages and append each page's entities
for x in tqdm(range(1, 5)):
    r = s.get(f'https://www.niche.com/api/renaissance/results/?state=wisconsin&listURL=best-schools&page={x}&searchType=school')
    df = pd.json_normalize(r.json(), record_path=['entities'])
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)

display(big_df)
Result in terminal:
100%
4/4 [00:02<00:00, 2.12it/s]
guid ctas badge.display badge.ordinal badge.total badge.vanityURL badge.photoURLs.desktop badge.photoURLs.mobile content.centroid.lat content.centroid.lon content.entity.abbreviation content.entity.alternates.nces content.entity.character content.entity.claimed content.entity.displayable content.entity.genus content.entity.guid content.entity.isClaimed content.entity.isPremium content.entity.location content.entity.name content.entity.parentGUIDs.county content.entity.parentGUIDs.metroArea content.entity.parentGUIDs.state content.entity.parentGUIDs.town content.entity.parentGUIDs.zipCode content.entity.premium content.entity.published content.entity.shortName content.entity.tagline content.entity.type content.entity.url content.entity.variation content.facts content.featuredReview.author content.featuredReview.body content.featuredReview.categories content.featuredReview.created content.featuredReview.guid content.featuredReview.rating content.grades content.photos.default.crops.DesktopHeader content.photos.default.crops.MobileHeader content.photos.default.crops.Original content.photos.default.guid content.photos.default.licenseName content.photos.editorial.crops.Original content.photos.editorial.guid content.photos.editorial.licenseName content.photos.editorial.uploadTimestamp content.photos.mapbox_header.author content.photos.mapbox_header.crops.DesktopHeader content.photos.mapbox_header.crops.MobileHeader content.photos.mapbox_header.guid content.photos.mapbox_header.licenseName content.photos.mapbox_header.licenseUrl content.photos.mapbox_header.sourceUrl content.photos.spotlight.crops.Original content.photos.spotlight.crops.Spotlight content.photos.spotlight.guid content.photos.spotlight.licenseName content.photos.spotlight.uploadTimestamp content.reviewAverage.average content.reviewAverage.count content.virtualTour content.entity.alternates.ceeb content.photos.default.crops.Thumbnail content.photos.default.uploadTimestamp content.entity.parentGUIDs.parent 
content.entity.parentGUIDs.schoolDistrict content.entity.parentGUIDs.schoolNetwork content.entity.parentGUIDs.neighborhood content.photos.default.crops.Spotlight
0 d6574ad4-6add-45c3-a90a-9d24f58b040e [{'label': 'View Nearby Homes', 'type': 'realE... Best Private High Schools in Wisconsin 1 82 best-private-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.081737 -88.145195 Brookfield Academy 0 Private True True Private School d6574ad4-6add-45c3-a90a-9d24f58b040e True True BROOKFIELD, WI Brookfield Academy ba8709ae-856d-4583-83b7-4484b51ed4c2 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c cc01665b-5240-4885-b13d-a4ae0dd271fc b802227a-e061-45e5-9dfd-6c3ddaf8bebb True True Brookfield Academy [Private School, BROOKFIELD, WI, PK, K-12] School brookfield-academy-brookfield-wi 1041 [{'config': {'format': ['comma'], 'rounding': ... Parent When my kids started school something just did... [Overall Experience] 2022-07-28T18:37:49.017538Z e3bfc3ad-86eb-4bba-8c66-5eb95e4111f7 5.0 [{'description': 'Based on quality of academic... https://d13b2ieg84qqce.cloudfront.net/d1d42e87... https://d13b2ieg84qqce.cloudfront.net/c046f1e3... https://d13b2ieg84qqce.cloudfront.net/d1d42e87... a4125add-a984-4609-a879-ce1afa699db8 UNLICENSED https://d13b2ieg84qqce.cloudfront.net/352e79e1... 53acd2d2-1185-49bd-9928-6a1f1054fba0 UNLICENSED 2022-02-10T21:15:52.569792Z © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... f696705b-0766-48e5-97b5-72370788f0c6 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ https://d13b2ieg84qqce.cloudfront.net/2273b7a3... https://d13b2ieg84qqce.cloudfront.net/d512adbc... 2a0ddcf9-ae58-404f-9d91-13a5196c2217 UNLICENSED 2022-07-28T18:00:15.479792Z 4.333333 39 [{'label': 'Virtual Tour', 'value': 'https://w... NaN NaN NaN NaN NaN NaN NaN NaN
1 c5ce3267-c2ed-4785-a5d8-66c61fcf6063 [{'label': 'View Nearby Homes', 'type': 'realE... Best Private High Schools in Wisconsin 2 82 best-private-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.186400 -87.935800 USM 01512787 Private True True Private School c5ce3267-c2ed-4785-a5d8-66c61fcf6063 True True WI University School of Milwaukee 8b295479-c31f-47a9-83b8-94b2100e2832 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c 739d0594-0714-4d74-ad01-f07df19bc756 5d98fbca-9d9d-4219-8335-8dba54962ca7 True True University School [Private School, WI, PK, K-12] School university-school-of-milwaukee-river-hills-wi 1041 [{'config': {'format': ['comma'], 'rounding': ... Parent It is clear to see, in the short time we’ve be... [Overall Experience] 2022-10-28T07:41:07.70707Z a7a94913-bb20-4def-9553-761720f5cac8 5.0 [{'description': 'Based on quality of academic... https://d13b2ieg84qqce.cloudfront.net/184acaa6... https://d13b2ieg84qqce.cloudfront.net/d98566c3... https://d13b2ieg84qqce.cloudfront.net/c65ee0e3... be9334fb-56d4-4c0c-a4b9-b2de53c46b09 UNLICENSED https://d13b2ieg84qqce.cloudfront.net/887e0e98... cfb21e87-82a7-4fa5-8af6-bf33d199039a UNLICENSED 2022-02-10T21:12:07.916464Z © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... 733bf01a-d21c-4374-bb52-42175a61a2c2 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ https://d13b2ieg84qqce.cloudfront.net/e80f6114... https://d13b2ieg84qqce.cloudfront.net/e80f6114... c8cb35e1-b83c-47f1-a649-eb3766a53de7 UNLICENSED NaN 4.209524 105 [{'label': 'Virtual Tour'}] 501390 https://d13b2ieg84qqce.cloudfront.net/97d061e2... 2022-07-11T13:31:31.710239Z NaN NaN NaN NaN NaN
2 84ab245d-ad99-43c9-93d8-9e474a109434 [{'label': 'View Nearby Homes', 'type': 'realE... Best Private High Schools in Wisconsin 3 82 best-private-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.163916 -89.385004 MCDS A9904507 Private True True Private School 84ab245d-ad99-43c9-93d8-9e474a109434 True True WAUNAKEE, WI Madison Country Day School 4135e47a-62f6-4777-b514-d2e51894603f 1a1aaa73-65d0-490d-b3d3-d828716c5f6b 963a1085-efe7-45f5-81ee-d2bbf82a907c NaN 3bca1e55-0153-485a-a337-03448396568b True True MCDS [Private School, WAUNAKEE, WI, PK, K-12] School madison-country-day-school-waunakee-wi 1041 [{'config': {'format': ['comma'], 'rounding': ... Parent The MCDS faculty is truly exceptional -- they ... [Overall Experience] 2022-07-22T13:59:50.567397Z 6c714271-25ac-4206-9ef8-38d3ef1f92d6 5.0 [{'description': 'Based on quality of academic... https://d13b2ieg84qqce.cloudfront.net/68e0beb3... https://d13b2ieg84qqce.cloudfront.net/3a1cfdcf... https://d13b2ieg84qqce.cloudfront.net/b2d1416c... 86a7a6ce-2538-4bf1-8703-6b3b44fda5a4 UNLICENSED https://d13b2ieg84qqce.cloudfront.net/6cb7bdfd... 809aece8-55ce-4632-a3cf-d0a14417ffdc UNLICENSED 2022-02-09T21:25:15.513499Z © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... dc5b6bd7-5a5c-48ee-bdfd-5780de198bc9 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ https://d13b2ieg84qqce.cloudfront.net/66e8fd60... https://d13b2ieg84qqce.cloudfront.net/fb59b45f... 2e6d1bd0-7760-46ae-8ce2-02306508b864 UNLICENSED 2022-04-18T18:58:56.007652Z 3.882353 34 [{'label': 'Virtual Tour', 'value': 'https://w... 502396 https://d13b2ieg84qqce.cloudfront.net/6ea5d8cb... 2022-06-08T22:11:36.605259Z NaN NaN NaN NaN NaN
3 35ca6237-c994-4fe6-b5f9-f09142680d7b [{'label': 'View Nearby Homes', 'type': 'realE... Best Private High Schools in Wisconsin 4 82 best-private-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.457700 -88.827400 Wayland Academy 01514944 Private, Boarding True True Private School 35ca6237-c994-4fe6-b5f9-f09142680d7b True True BEAVER DAM, WI Wayland Academy 3c05ff22-e610-450d-8684-1b9f99edcd1f NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c 1d49bb1b-d2a1-45e2-ac8e-c8d16ab29f3e f132a02a-1ead-4325-bf32-9079b435d74c True True Wayland [Private School, BEAVER DAM, WI, 9-12] School wayland-academy-beaver-dam-wi 1040 [{'config': {'format': ['comma'], 'rounding': ... Alum Though I only attended Wayland for two years (... [Overall Experience] 2022-08-14T20:05:05.231126Z a0bf7334-047c-4ee8-ab95-59c46dff42b3 5.0 [{'description': 'Based on quality of academic... https://d13b2ieg84qqce.cloudfront.net/7cc728a3... https://d13b2ieg84qqce.cloudfront.net/5e24f8a2... https://d13b2ieg84qqce.cloudfront.net/d7835cfd... 99230263-8332-4b03-b475-b948546402b7 UNLICENSED https://d13b2ieg84qqce.cloudfront.net/42561f2c... 697e0f82-7ccb-4651-877e-ffe881e188c5 UNLICENSED NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... c641160c-30c7-4b52-b336-e844ac8a059a © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ https://d13b2ieg84qqce.cloudfront.net/3aaf34d3... https://d13b2ieg84qqce.cloudfront.net/f197c0a9... 124661d4-71cd-4d13-bfcc-926f3e074ade UNLICENSED 2022-09-28T16:06:46.315837Z 3.833333 66 [{'label': 'Virtual Tour', 'value': 'https://y... 500170 https://d13b2ieg84qqce.cloudfront.net/9b54f4ea... 2022-07-26T17:26:09.050891Z NaN NaN NaN NaN NaN
4 9b394d9c-46a0-431d-8ae4-62b6142cd46b [{'label': 'View Nearby Homes', 'type': 'realE... Best Private High Schools in Wisconsin 5 82 best-private-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 42.773585 -87.774410 TPS 01513124 Private True True Private School 9b394d9c-46a0-431d-8ae4-62b6142cd46b True False WIND POINT, WI The Prairie School 5455e716-0063-4d63-a0e2-a07d199cdee1 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c 5ef4c7c2-c006-49ea-88e9-9f40a0da6ce6 0d949807-5d44-4fc8-8753-1ce81f4a5d67 False True Prairie [Private School, WIND POINT, WI, PK, K-12] School the-prairie-school-wind-point-wi 41 [{'config': {'format': ['comma'], 'rounding': ... Alum The teachers are awesome and so approachable! ... [Overall Experience] 2020-06-23T03:25:59.897153Z 2d7de44a-38a7-493c-ac87-a024ba85d42d 5.0 [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN https://d13b2ieg84qqce.cloudfront.net/608f2378... a69ad3c5-f274-4bbe-ab1b-f1977c79c6f9 UNLICENSED 2022-02-10T20:38:50.869965Z © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... 86b0f616-f1bf-4123-a2bc-f93255053083 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ https://d13b2ieg84qqce.cloudfront.net/a26a05f2... https://d13b2ieg84qqce.cloudfront.net/a26a05f2... ee7758c3-09ee-4d81-b4bd-c7f91c98652a UNLICENSED NaN 4.642857 70 [{'label': 'Virtual Tour', 'value': 'https://w... 501918 NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 8365632c-160e-4beb-b75c-dfafca1c2441 [{'label': 'View Nearby Homes', 'type': 'realE... Best Public Middle Schools in Wisconsin 24 594 best-public-middle-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 44.891018 -87.290723 Sevastopol Middle School 551350000496 Public False True Public School 8365632c-160e-4beb-b75c-dfafca1c2441 False False STURGEON BAY, WI Sevastopol Middle School caaa657e-9c5e-4740-b72f-bef5b2c75ac1 NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c 65ab2591-75de-487d-8a82-bddd79e3d3bd 7ae55b50-154c-4e0e-aff7-ed2726f7ceb8 False True Sevastopol Middle School [Sevastopol School District, WI, 6-8] School sevastopol-middle-school-sturgeon-bay-wi 45 [{'config': {'format': ['comma'], 'rounding': ... NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... e750dc05-07ed-42b0-92e3-f24ab16f1b8b © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ NaN NaN NaN NaN NaN 0.000000 0 [{'label': 'Virtual Tour'}] NaN NaN NaN d4d24c63-d104-44cd-ad3f-0ded85522583 d4d24c63-d104-44cd-ad3f-0ded85522583 NaN NaN NaN
96 d46a53a4-62f4-4086-9a53-4c6f78f54915 [{'label': 'View Nearby Homes', 'type': 'realE... Best Public Elementary Schools in Wisconsin 36 1074 best-public-elementary-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 42.935063 -88.405594 Prairie View Elementary School 551006001321 Public True True Public School d46a53a4-62f4-4086-9a53-4c6f78f54915 True False NORTH PRAIRIE, WI Prairie View Elementary School ba8709ae-856d-4583-83b7-4484b51ed4c2 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c 1a756678-9c81-4d89-8620-604b8e10507c 8afa0d18-4b1f-4052-a09a-d9bfe3e67295 False True Prairie View Elementary School [Mukwonago Area School District, WI, PK, K-6] School prairie-view-elementary-school-north-prairie-wi 45 [{'config': {'format': ['comma'], 'rounding': ... NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... 51a79bc3-f67e-4b49-87b1-d36ef46e1145 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ NaN NaN NaN NaN NaN 0.000000 0 [{'label': 'Virtual Tour'}] NaN NaN NaN bda72d2a-3f49-4288-a9f2-d024898ca67b bda72d2a-3f49-4288-a9f2-d024898ca67b NaN NaN NaN
97 0ed32d0d-f062-4784-8dd4-57a9724209eb [{'label': 'View Nearby Homes', 'type': 'realE... Best Public Middle Schools in Wisconsin 25 594 best-public-middle-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.624047 -87.786299 Oostburg Middle School 551107001464 Public False True Public School 0ed32d0d-f062-4784-8dd4-57a9724209eb False False OOSTBURG, WI Oostburg Middle School 1db5c6d2-5b8f-44fa-87fc-f7471ee45443 NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c b44a651d-bcfc-4d95-a6ce-f00c0c42671e d594fdef-3441-462c-93ad-981c8fd1f064 False True Oostburg Middle School [Oostburg School District, WI, 6-8] School oostburg-middle-school-oostburg-wi 45 [{'config': {'format': ['comma'], 'rounding': ... Niche User The middle school does a great job at preparin... [Academics] 2015-02-12T14:29:22Z 0c61a295-b7c7-44e9-ab7a-64993190796f 5.0 [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... 32fa503a-a377-44e8-bb10-f9a7fa0bb67c © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ NaN NaN NaN NaN NaN 4.800000 10 [{'label': 'Virtual Tour'}] NaN NaN NaN 3478a622-503f-47d5-93a0-c3207124cdd4 3478a622-503f-47d5-93a0-c3207124cdd4 NaN NaN NaN
98 17b8ac12-d893-4af4-bf79-3aa06bef648a [{'label': 'View Nearby Homes', 'type': 'realE... Best Public High Schools in Wisconsin 24 496 best-public-high-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 42.993816 -88.224033 WEPA 551578002688 Public, Charter True True Charter School 17b8ac12-d893-4af4-bf79-3aa06bef648a True False WAUKESHA, WI Waukesha Engineering Preparatory Academy ba8709ae-856d-4583-83b7-4484b51ed4c2 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c 5a94913e-87ac-4e4e-9b76-a2330bf1a635 b88c94da-24d3-4004-b43b-547d9da55e0d False True Waukesha Engineering Preparatory Academy [School District of Waukesha, WI, 9-12] School waukesha-engineering-preparatory-academy-wauke... 52 [{'config': {'format': ['comma'], 'rounding': ... Senior The Academy is well equipped and staffed, and ... [Overall Experience] 2021-10-13T21:07:42.049714Z 8d8241ae-fc49-4bab-a146-48d77f1e6391 4.0 [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... b1db421b-6290-4a08-a854-82dd9089e116 © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ NaN NaN NaN NaN NaN 3.681818 22 [{'label': 'Virtual Tour'}] 500331 NaN NaN a368f833-c451-45bb-a0f7-b656d02477f3 a368f833-c451-45bb-a0f7-b656d02477f3 NaN NaN NaN
99 c3b20454-cd71-45bb-ab33-0b3ea37527fb [{'label': 'View Nearby Homes', 'type': 'realE... Best Public Elementary Schools in Wisconsin 37 1074 best-public-elementary-schools/s/wisconsin https://d33a4decm84gsn.cloudfront.net/search/2... https://d33a4decm84gsn.cloudfront.net/search/2... 43.089194 -87.883770 Atwater Elementary School 551380001809 Public True True Public School c3b20454-cd71-45bb-ab33-0b3ea37527fb True False SHOREWOOD, WI Atwater Elementary School 8b295479-c31f-47a9-83b8-94b2100e2832 3940b781-a9f6-4333-b607-6a6367e6af44 963a1085-efe7-45f5-81ee-d2bbf82a907c 900b6b9c-206e-4c34-82a8-247fee552b49 542c1289-ad69-4fc3-afab-bf91c1a6110e False True Atwater Elementary School [Shorewood School District, WI, PK, K-6] School atwater-elementary-school-shorewood-wi 45 [{'config': {'format': ['comma'], 'rounding': ... NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic... NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox https://api.mapbox.com/styles/v1/niche-admin/c... https://api.mapbox.com/styles/v1/niche-admin/c... b523d19c-fdd1-497b-bd6d-ab394cde0dbf © OpenStreetMap http://www.openstreetmap.org/copyright https://www.mapbox.com/about/maps/ NaN NaN NaN NaN NaN 0.000000 0 [{'label': 'Virtual Tour', 'value': 'https://w... NaN NaN NaN 84c36616-1b72-4d85-998d-c9795aadb726 84c36616-1b72-4d85-998d-c9795aadb726 NaN NaN NaN
100 rows × 73 columns
You can get all the data by adjusting the range (go up to 123 for the maximum number of records). You may also want to add a pause between requests, otherwise you could get blocked.
You can also use Scrapy, if you wish.
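A minimal sketch of the "pause between requests" idea, assuming each page's JSON has an 'entities' list as in the loop above. The function name collect_entities, its parameters and the one-second default delay are illustrative choices, not something taken from the site or from Requests:

```python
import time
from typing import Callable, Dict, List


def collect_entities(
    fetch_page: Callable[[int], Dict],
    first_page: int,
    last_page: int,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch pages first_page..last_page (inclusive) and gather their 'entities'."""
    entities: List[Dict] = []
    for page in range(first_page, last_page + 1):
        payload = fetch_page(page)  # expected to return the decoded JSON of one page
        entities.extend(payload.get("entities", []))
        if page != last_page:
            sleep(delay_seconds)  # pause before the next request
    return entities
```

With the session s from the code above, fetch_page could be something like lambda page: s.get(url_template.format(page=page)).json(), where url_template is a hypothetical string holding the API URL with a {page} placeholder. The resulting list can then be passed to pd.json_normalize in one go instead of concatenating a DataFrame per page.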

You need to check the HTML carefully, because the real URL can be found inside one of the divs:
import scrapy

class NicheschoolsSpider(scrapy.Spider):
    name = 'nicheschools'
    allowed_domains = ['www.niche.com']
    start_urls = ['https://www.niche.com/k12/search/best-schools/s/wisconsin/']

    def parse(self, response):
        school_links = response.xpath("//div[@class='card ']/a/@href").extract()
        for link in school_links:
            yield response.follow(url=link, callback=self.parse_schools)

    def parse_schools(self, response):
        name = response.xpath("//h1[@class='postcard__title postcard__title--claimed']/text()").extract_first()
        website = response.xpath("(//a[@class='profile__website__link']/@href)[1]").extract_first()
        address = response.xpath("(//address[@class='profile__address--compact']/text())[1]").extract_first()
        yield {
            'name': name,
            'link': response.url,
            'website': website,
            'address': address,
        }
Result:
{'name': 'Brookfield Academy', 'link': 'https://www.niche.com/k12/brookfield-academy-brookfield-wi/', 'website': 'https://www.brookfieldacademy.org', 'address': '3462 N BROOKFIELD RD'}
{'name': 'Wisconsin Lutheran High School', 'link': 'https://www.niche.com/k12/wisconsin-lutheran-high-school-milwaukee-wi/', 'website': 'https://www.wlhs.org', 'address': '330 N GLENVIEW AVE'}
{'name': 'Homestead High School', 'link': 'https://www.niche.com/k12/homestead-high-school-mequon-wi/', 'website': 'http://www.mtsd.k12.wi.us/homestead/', 'address': '5000 W MEQUON RD'}
{'name': 'Brookfield Central High School', 'link': 'https://www.niche.com/k12/brookfield-central-high-school-brookfield-wi/', 'website': 'https://www.elmbrookschools.org/brookfield-central-high-school', 'address': '16900 W GEBHARDT RD'}
{'name': 'Shorewood High School', 'link': 'https://www.niche.com/k12/shorewood-high-school-shorewood-wi/', 'website': 'https://www.shorewood.k12.wi.us/apps/pages/shs', 'address': '1701 E CAPITOL DR'}
{'name': 'School District of Waukesha', 'link': 'https://www.niche.com/k12/d/school-district-of-waukesha-wi/', 'website': 'https://sdw.waukesha.k12.wi.us', 'address': '222 MAPLE AVE'}
{'name': 'Pilgrim Park Middle School', 'link': 'https://www.niche.com/k12/pilgrim-park-middle-school-elm-grove-wi/', 'website': 'http://www.elmbrookschools.org/', 'address': '1500 PILGRIM PKWY'}
{'name': 'Marquette University High School', 'link': 'https://www.niche.com/k12/marquette-university-high-school-milwaukee-wi/', 'website': 'https://www.muhs.edu/', 'address': '3401 W WISCONSIN AVE'}
...
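As a side note on why following href="#" in the question led back to the listing page: a bare '#' is a fragment-only reference, so resolving it against the page URL gives the same page again. The sketch below shows this with the standard library only; the helper name resolve and the relative school href are illustrative assumptions, and Scrapy's own URL handling may differ in its details:

```python
from urllib.parse import urldefrag, urljoin

LISTING_URL = "https://www.niche.com/k12/search/best-schools/s/wisconsin/"


def resolve(base_url: str, href: str) -> str:
    """Resolve href against base_url and drop any fragment part."""
    absolute = urljoin(base_url, href)
    page_url, _fragment = urldefrag(absolute)
    return page_url


# A fragment-only link points back at the page it was found on.
same_page = resolve(LISTING_URL, "#")
# A hypothetical relative link to a school page resolves to a new URL.
school_page = resolve(LISTING_URL, "/k12/brookfield-academy-brookfield-wi/")

print(same_page == LISTING_URL)
print(school_page)
```

This would be consistent with the log in the question, where the second callback was run on the listing page and every field came back as None.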
If you are new to web scraping, be careful not to hit the site too often: they could block you, and you would then need to solve a captcha to get into the site.
Also, if you want to expand your knowledge, there are web scraping platforms such as Estela where you can run your spiders and also create cron jobs to run them every day.
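One way to go easier on the site from within Scrapy is to slow the spider down through its settings. The setting names below are standard Scrapy settings, but the values are only illustrative guesses and the name POLITE_SETTINGS is made up for this sketch:

```python
# Illustrative throttling values; tune them for the site you are scraping.
POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 2,                  # seconds to wait between requests
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 2,        # initial delay used by AutoThrottle
    "AUTOTHROTTLE_MAX_DELAY": 30,         # upper bound for the adaptive delay
}
```

The dictionary can be assigned to the spider's custom_settings class attribute, or the same keys can be set in the project's settings.py.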

Related

How do I calculate the mean of each categorical descriptor?

I have a dataset that describes the career statistics of various university degrees. Each degree is categorised by a broader area of study in a separate column; for example, the degree 'Actuarial Science' falls under the 'Business' category, 'Nursing' under 'Health', etc. I wish to condense the 172 rows of degrees into the 16 major categories (so that my dataset has just 16 rows) and use their mean scores for my analysis.
I'm aware this probably needs a few functions in addition to the 'group_by()' function from tidyverse, but I'm unsure where to go after that. The head of the dataset is below. A further 12 columns are omitted here.
Rank  Major               Total  Men    Women  Major_category  ShareWomen  Sample_size  Employed
1.    Petroleum Eng       2339   2057   282    Engineering     0.121       36           1976
2.    Mining              756    679    77     Engineering     0.102       7            640
3.    Metallurgic Eng.    856    725    131    Engineering     0.153       3            648
4.    Naval Architecture  1258   1123   135    Engineering     0.107       16           758
5.    Chemical Eng.       32260  21239  11031  Engineering     0.342       289          25694
6.    Nuclear Eng.        2573   2200   373    Engineering     0.145       17           1857
7.    Studio Arts         16977  4754   12223  Arts            0.7199      182          13908
Try this. I have added a few more variables that might be of interest to you. Modify as needed rather than copying it verbatim:
yourDf %>%
  group_by(Major_category) %>%
  summarise(
    Mean_Score = mean(Variable_to_average, na.rm = TRUE),
    Counts_Major = n_distinct(Major),  # number of distinct majors in each category
    Men = sum(Men, na.rm = TRUE)       # total men (do the same for women)
  )
Hope this gives you the gist for analysing the other columns. summarise() is very powerful.

NA/NaN/Inf in foreign function call (arg 6) in KNN Algorithm

I am trying to predict a category with the KNN algorithm, but I keep getting the error "NA/NaN/Inf in foreign function call (arg 6)" and I don't know why.
I already removed the NA values using na.omit(A), but I am still getting the NA error.
data.csv
RegionName,RetailerId,PartyName,Address1,Address2,Area,City,ContactPerson,CSTNumber,Email,LicenseNumber,Telephone,MobileNumber
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORES,"2,GROUND FLOOR,ABDUL REHAMAN CHAWL,MAROL GAON",SHREE HANUMAN MANDIR ROAD,MAROL,ANDHERI EAST,HARSHIT JAIN,20 Z6 59 90B,BHAGWATIMEDICAL7#YAHOO.COM,21 Z6 59 90B,29207788 / 07666464888,"82,864,534,619,867,000,000"
MUMBAI,297,BHAGWATI MEDICAL [MAROL],"SHRI HANUMAN MANDIR RD;MAROL GAON,","ANDHERI[E],MUMBAI-59.",,ANDHERI [E],MR.DINESH KOTHARI,20Z-6-59-908,BHAGWATIMEDICAL7#YAHOO.COM,21Z-6-59-908,29207788,
MUMBAI,297,BHAGWATI MEDICAL [MAROL],"SHRI HANUMAN MANDIR RD;MAROL GAON,","ANDHERI[E],MUMBAI-59.",,ANDHERI [E],MR.DINESH KOTHARI,20Z-6-59-908,BHAGWATIMEDICAL7#YAHOO.COM,21Z-6-59-908,29207788,
MUMBAI,297,BHAGWATI MEDICAL [MAROL],"SHRI HANUMAN MANDIR RD;MAROL GAON,","ANDHERI[E],MUMBAI-59.",,ANDHERI [E],MR.DINESH KOTHARI,20Z-6-59-908,BHAGWATIMEDICAL7#YAHOO.COM,21Z-6-59-908,29207788,
MUMBAI,297,BHAGWATI MEDICAL & GENRAL STORE,"SHRI HANUMAN MANDIR ROAD,",MAROL VILLAGE,MAROL,MUMBAI,DINESH,20/Z-6/59/908,BHAGWATIMEDICAL7#YAHOO.COM,20C/Z-6/59/908,29207788/8286453461,98670976670
MUMBAI,297,$BHAGWATI MEDICAL.,"SHOP NO.2,ABDUL REHMAN CHAWL SHRI HANUMAN MANDIR ROAD",,ANDHERI(E),MUMABAI,,20-21-Z-1,BHAGWATIMEDICAL7#YAHOO.COM,59-908-20C,29207788/8286453461,
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORE,SHOP NO.2 ABDUL REHMAN CHAWL SHRI HANUMAN MANDIR MARG,"MAROL VILLAGE,",,ANDHERI (E),,20/Z-6/59/908,BHAGWATIMEDICAL7#YAHOO.COM,21/Z-6/59/908,29207788 / 9867097667,7666464888
MUMBAI,297,BHAGWATI MED. & GEN. STORES.,"SHREE HANUMAN MANDIR ROAD, MAROL VILLEG",,MAROL,MUMBAI,DINESH BHIMRAJ,20Z-6/59/908,BHAGWATIMEDICAL7#YAHOO.COM,20C-Z-6/59/940,29207788,9869260832
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORES.,"SHOP NO.2, GR FLOOR, MEZZAINI FLR,ABDULREHMAN CHAWL,","SHREE HANUMAN MANDIR ROAD,MAROL GAON",ANDHERI(E),MUMBAI,,"20,21/Z-6/59/90B",BHAGWATIMEDICAL7#YAHOO.COM,20C/Z-6/59/940,7977458967,9867097667
MUMBAI,297,BHAGWATI MEDICAL,"SHRI HANUMAN MANDIR RD,","MAROL GAON,MAROL, ANDHERI(E)",VP(E)-A(E)-MA,MUMBAI,,"20,21/Z-6/59/908",,20C/Z-6/59/940,29207788,7738788474
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORES.,"SHOP NO.2,ABDUL REHMAN CHWAL,HANUMAN MANDIR,MAROL VILLADGE REZY COELHO CHAWL,",ANDHERI(E),ANDHERI (E),MUMBAI,DINESH BHAI,21Z-6/59/908,BHAGWATIMEDICAL7#YAHOO.COM,20Z-6/59/908,29207788/7666464888,
MUMBAI,297,BHAGWATI MED.& GEN. ST.,2 GR.FL.ABDUL REHMAN CHAWL,HANUMAN MANDIR RD.,MAROL GAON,ANDHERI-E,DINESH KOTHARI,"20,21/Z-6/59/908",BHAGWATIMEDICAL7#YAHOO.COM,20C/Z-6/59/940,9869260832,29207788
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORES.,SHOP NO 2.ABDUL REHMAN CHAWL.,"SHRI HANUMAN MANDIR ROAD, MAROL VILLAGE",MAROL - ANDHERI - EAST,MUMBAI,MAROL,20-Z6/59/908,BHAGWATIMEDICAL7#YAHOO.COM,21-Z6/59/908,29207788/7738788474/9869260832,9867097667
MUMBAI,297,BHAGWATI MEDICAL,"SHRI HANUMAN MANDIR ROAD,","MAROL GAON,",ANDHERI (E),MUMBAI,,,,,29207788/8286453461,
MUMBAI,297,BHAGWATI MEDI & GEN.STORES,SHRI HANUMAN MANDIR ROAD MAROL VILLAGE,MAROL,,MAROL,,20/Z/6/59/749,,20 C/Z-6/59/788,29207788,
MUMBAI,297,BHAGWATI MED ST 29207788,2 GR FL MEZZANIN ABDUL REHAMAN,CHAWLHUMAN MANDIR RDMAROL,ANDHERI,,,27390646287V,BHAGWATIMEDICAL7#YAHOO.COM,20-21Z-59-908-20CZ6-59-940,,7666464888
MUMBAI,297,BHAGWATI MEDICAL,"SHRI HANUMAN MANDIR ROAD,MAROL GAON,MAROL,ANDHERI-E",,,,,,,,,8286453461
MUMBAI,297,BHAGWATI MED & GEN STORES,,ANDHERI (E),ANDHERI [W],,,,,/,,
MUMBAI,297,BHAGWATI MEDICAL STORE,SH NO.2BRFLR.MAZALIN FLR.,ABDUL REHMAN CHL.HANUMAN MAND,ANDHERI (WEST),,,27390646287 V,BHAGWATIMEDICAL7#YAHOO.COM,20-21-Z-6-59-90B,9867097667 / 8286453461,
MUMBAI,297,BHAGWATI MEDICAL MAROL,SHOP NO 2 ABDULREHMAN CHAWL SH,ANDHERI E,,GENERAL,,20/21-Z6-59-908,,20C-Z6-59-940,29207788,
MUMBAI,297,BHAGWATI MEDICAL & GENERAL STORES,"SHRI HANUMAN MANDIR ROAD,, MAROL VILLAGE,, ANDHERI (E),",", MUMBAI.",ANDHERI (E),MUMBAI,,C_00121689190,MUMBAI,20/21-Z-6/59/908,,9867097667
MUMBAI,389,GOPAL KRISHNA MED.& GEN.ST. #,"22,LAXMI CHAYYA BLDG","L.T.ROAD,BABHAI NAKA",BORIVLI,BORIVALI WEST,8959202,20/Z7/92/2221,GOPALKRISHNAMED22#GMAIL.COM,21/Z7/92/2221,9821287221/28959202,
MUMBAI,389,GOPAL KRISHNA MED & GEN STORES,"22,LAXMI CHHAYA,L.T.ROAD","BABAI NAKA ,EKSAR ROAD",BORIVALI (WEST),MUMBAI,MR CHANDRAKANT,20/Z7/92/2221,GOPALKRISHNAMED22#GMAIL.COM,21/Z7/92/2221,28959202/983381929,9821287221
MUMBAI,389,GOPAL KRISHNA MEDICAL & GENERAL STORES,"22, LAXMI CHHAYA, L.T.ROAD",BABHAI NAKA,BORIVALI W,MUMBAI,,20/Z/7/92/2221,GOPALKRISHNAMED22#GMAIL.COM,21/Z/7/92/2221,28959202,
MUMBAI,389,NEW GOPAL KRISHNA MEDICAL & GEN.STORES,"22, LAXMI CHHAYA, BABHAI NAKA",EKSAR ROAD,L.T.ROAD,BORIVALI (W),CHANDHUBHAI,20-MH-MZ7-192791,GOPALKRISHNAMED22#GMAIL.COM,21-MH-MZ7/192792,28959202,9833819296/9821287221
MUMBAI,389,GOPAL KRISHNA MED.&GEN.STORES,"22,LAXMI CHHAYA,L.T.ROAD,BABHAI","NAKA,WEST MUMBAI",,BORIVALI,CHANDRAKANTBHAI,20Z-7/92/2221,GOPALKRISHNAMED22#GMAIL.COM,21Z-7/92/2221,28959202/69931501,9833819296
MUMBAI,389,GOPAL KRISHNA MED.& GEN.ST;[BORIVALI-W],"22,LAXMI CHHAYA ,L.T.RD;BHABAI NAKA,","BORIVALI[W],MUMBAI-92.",,BORIVALI [W],MR.CHANDUBHAI,20-Z-7/92/2221,,21-Z-7/92/2221,28959202,
MUMBAI,389,GOPAL KRISHNA MED.& GEN.ST;[BORIVALI-W],"22,LAXMI CHHAYA ,L.T.RD;BHABAI NAKA,","BORIVALI[W],MUMBAI-92.",,BORIVALI [W],MR.CHANDUBHAI,20-Z-7/92/2221,,21-Z-7/92/2221,28959202,
MUMBAI,389,GOPAL KRISHNA MED.& GEN.ST;[BORIVALI-W],"22,LAXMI CHHAYA ,L.T.RD;BHABAI NAKA,","BORIVALI[W],MUMBAI-92.",,BORIVALI [W],MR.CHANDUBHAI,20-Z-7/92/2221,,21-Z-7/92/2221,28959202,
MUMBAI,389,GOPAL KRISHNA MED &. GENERAL STORES,"22, LAXMI CHHAYA BLDG,","BABHAI NAKA, EKSAR RD,",BORIVALI (W),MUMBAI,,20/Z/7/92/2221,,21/Z/7/92/2221,28959202 / 9821287221,
MUMBAI,389,GOPAL KRISHNA MED. & GEN. STORES,"22,LAXMI CHHAYA,","L.T. ROAD,BABHAI NAKA,",,BORIVALI{WEST},,20&21-Z-7/92/2221,GOPALKRISHNAMED22#GMAIL.COM,20C-Z-7/92/2124,"289,592,029,821,287,000",9833819296
MUMBAI,389,GOPAL KRISHNA MEDICAL,22LAXMI CHHAYYA,BABHAI NAKA EKSAR ROAD,(S) BORIVALI (WEST).,,,,,20-Z-7/92/187121-Z-7/92/1871 20C-Z-7/92/1817. DT.6.10.08,9821287221/9892695575,
MUMBAI,389,GOPALKRISHNA MEDICAL STORE,,,BORIVALI (WEST),MUMBAI,,,,,28959202,
MUMBAI,389,GOPAL KRISHNA MED &. GENERAL STORES,"22, LAXMI CHHAYA BLDG,L.T.RD","BABHAI NAKA, EKSAR RD,",BORIVALI (W),MUMBAI,,20-MH-MZ7-192791,GOPALKRISHNAMED22#GMAIL.COM,21-MH-MZ7-192792,28959202 / 9821287221,
MUMBAI,389,ZZGOPAL KRISHNA MED.ST.,22 LAXMI CHAYA,BABHAI NAKA,L.T.RD,BORIVALI-W,CHANDU BHAI,"20,21/Z-7/92/2221",GOPALKRISHNAMED22#GMAIL.COM,20C/Z-7/92/2124,28959202,
MUMBAI,389,GOPAL KRISHNA MED & GEN STORES,"22,LAXMI CHHAYA, L.T.RD,BABHAI NAKA",,,BORIVALI-W,,"20-Z-7/92/1536,21-Z-7/92/1536",,21-C-Z/92/1481,,
MUMBAI,389,GOPALKRISHNA MEDICAL.,"L.T.ROAD, BABHAI NAKA",BORIVALI (W),,BORIVALI (W),,,,,9821287221,
MUMBAI,389,GOPAL KRISHNA MEDICAL,"SH-22,L.T.RD,BABAI NAKA",,BORIVALI(W),MUMBAI,,,,,9821287221/28959202,
MUMBAI,389,GOPAL KRISHNA MED.&GEN.STORE,22/LAXMI CHHAYA; L.T.ROAD,BORIVALI (WEST) BABHAI NAKA,BORIVALI,,CHANDU BHAI - 9833819296,27480593421V,GOPALKRISHNAMED22#GMAIL.COM,20-Z-7/92/2221*21-Z-7/92/2221 20C-Z-7/92/2124,28959202,
MUMBAI,389,GOPAL KRISHNA MED.(CLOSED-,"22,LAXMI CHHAYA,","L.T.ROAD,BABHAI NAKA, BORAVALI WEST,MUMBAI-400092",,BORIVALI- WEST,,20-Z-7/92/1536,,21-Z-7/92/1536,28959202,
MUMBAI,389,GOPAL KRISHNA MED & GEN STO,22 LAXMI CHHAYA L T RD,BABHAI NAKABORIVLI W MUM-92,BORIVALI,,9821287221 9892695575,27480593421.V,GOPALKRISHNAMED22#GMAIL.COM,20-21Z7922221 20C2124,28959202,
MUMBAI,389,GOPAL KRISHNA MED & GEN STORE,22/LAXMI CHHAYA,L.T.ROAD,BORIVALI (WEST),,,,,20-7-7/92/1536 /21-Z-7/92/1536,,
R Code
A = read.csv("data.csv")
A = data.frame(na.omit(A))
str(A)
#######
# split training and testing set
#######
set.seed(123)
sf = sample(2,nrow(A),replace = T,prob = c(0.9,0.1))
trd = A[sf == 1,]
tsd = A[sf == 2,]
# lists out the variables that are problematic
which(sapply(A, function(x) length(unique(x))<2))
# Converts Dependent Variable into Factor
Train_RetailerId = as.factor(trd[,2])
#######
# KNN
#######
library(class)
Predicted.RetailerId = knn(trd,tsd,Train_RetailerId, k=1)
print(mean(A$RetailerId != Predicted.RetailerId))
Result = cbind(Predicted.RetailerId,tsd)
confusionMatrix(Predicted.RetailerId,tsd$RetailerId)
Structure of Dataset
> str(A)
'data.frame': 42 obs. of 13 variables:
$ RegionName : Factor w/ 1 level "MUMBAI": 1 1 1 1 1 1 1 1 1 1 ...
$ RetailerId : int 297 297 297 297 297 297 297 297 297 297 ...
$ PartyName : Factor w/ 32 levels "$BHAGWATI MEDICAL.",..: 12 15 15 15 14 1 11 5 13 8 ...
$ Address1 : Factor w/ 36 levels "","2 GR FL MEZZANIN ABDUL REHAMAN",..: 4 32 32 32 34 27 25 29 26 31 ...
$ Address2 : Factor w/ 31 levels "",", MUMBAI.",..: 29 7 7 7 26 1 27 1 30 25 ...
$ Area : Factor w/ 19 levels "","(S) BORIVALI (WEST).",..: 16 1 1 1 16 7 1 16 7 19 ...
$ City : Factor w/ 16 levels "","ANDHERI-E",..: 5 4 4 4 16 15 3 16 16 16 ...
$ ContactPerson: Factor w/ 16 levels "","8959202","9821287221 9892695575",..: 12 16 16 16 8 1 1 10 1 1 ...
$ CSTNumber : Factor w/ 26 levels "","20-21-Z-1",..: 8 18 18 18 14 2 14 19 11 10 ...
$ Email : Factor w/ 4 levels "","BHAGWATIMEDICAL7#YAHOO.COM",..: 2 2 2 2 2 2 2 2 2 1 ...
$ LicenseNumber: Factor w/ 30 levels "","/","20-21-Z-6-59-90B",..: 24 28 28 28 14 30 25 11 15 15 ...
$ Telephone : Factor w/ 18 levels "","289,592,029,821,287,000",..: 9 7 7 7 12 12 8 7 13 7 ...
$ MobileNumber : Factor w/ 12 levels "","29207788",..: 5 1 1 1 11 1 3 12 10 4 ...
The first line of the knn source code (type knn on your console to see it) is train <- as.matrix(train), which converts the data.frame to a matrix. Since a matrix can only contain one data type, it gets converted into a character matrix. knn, like most other algorithms, requires a numerical matrix in order to run its calculations.
trd_mat <- as.matrix(trd)
typeof(trd_mat)
#[1] "character"
Apart from RetailerId, all of your variables are of type factor, and they contain quite a few levels. The only way for it to work is to convert them to dummy variables first (so that the data is full of 0-1 variables) and then run knn on that data.frame. Given that your factor variables have plenty of levels, the resulting data.frame will be very sparse, which might make knn less efficient.
There are plenty of tutorials on how to convert your factors into dummy variables if you want to follow that route.
As an alternative, a random forest might give you better results given your factor variables.

Folium Heatmap Recursion Error based on location coordinates

I am attempting to make a heat map with folium from my data, but I keep getting the error RecursionError: maximum recursion depth exceeded, and I have no clue what it means. Any input? Below is the code for the heatmap.
# Creating a dataframe of the 'month', 'day_of_week' and 'location'
day_month = pd.DataFrame(df_criclean[['month', 'day_of_week','location']])
day_month.sort_values('month', ascending = False).head(10)
# Trying to use folium to make a heatmap of the data I have in 'day_month'
map = folium.Map(location=[42.3601, -71.0589], tiles='cartodbpositron', zoom_start=1)
HeatMap(day_month['location']).add_to(map)
I also hit this "bug". I think it's related to the variables being of type object instead of float64 or another base type (my dataset had a lot of blanks "" instead of valid GPS coordinates).
#> ./folium-test.py
--------------------
_id city daily_rain date ... wind_degrees wind_dir wind_speed wind_string
0 {'$oid': '5571aaa8e4b07aa3c1c4e231'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 4.8 NaN
1 {'$oid': '5571aaa9e4b07aa3c1c4e232'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 1.6 NaN
2 {'$oid': '5571aaa9e4b07aa3c1c4e233'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 11.3 NaN
3 {'$oid': '5571aaa9e4b07aa3c1c4e234'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 13 NaN
4 {'$oid': '5571aaa9e4b07aa3c1c4e235'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 5 NaN
[5 rows x 18 columns]
(500, 18)
--------------------
0 8.48402346662349
1 8.15408706665039
2 9.81855869293213
3 9.83495235443115
4 9.92164134979248
5 9.26684331789147
6 9.59504252663464
7 9.07091170549393
8 8.99822786450386
9 8.9606299996376
10 8.93120750784874
11 9.02073669538368
12 8.912937
13
...
498 8.912937
499
Name: longitudine, Length: 500, dtype: object
0 44.3720632234234
1 43.9720632409982
2 44.1090045985169
3 44.1142735479457
4 44.145446252325
5 44.3377021234296
6 44.3773853328621
7 44.3798960485217
8 44.4051013957662
9 44.4094088501931
10 44.4160476104163
11 44.4527250625144
12 44.516321
13
...
498 44.516321
499
Name: latitudine, Length: 500, dtype: object
Traceback (most recent call last):
File "./folium-test.py", line 89, in <module>
folium.Marker([row["latitudine"], row["longitudine"]], popup=row["temperatura"]).add_to(marker_cluster)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/map.py", line 258, in __init__
self.location = _validate_coordinates(location)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 53, in _validate_coordinates
if _isnan(coordinates):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 79, in _isnan
return any(math.isnan(value) for value in _flatten(values))
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 79, in <genexpr>
return any(math.isnan(value) for value in _flatten(values))
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
[Previous line repeated 982 more times]
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 70, in _flatten
if _is_sized_iterable(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 32, in _is_sized_iterable
return isinstance(arg, abc.Sized) & isinstance(arg, abc.Iterable)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/abc.py", line 139, in __instancecheck__
return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison
But if I add these lines to my code, folium works fine (even for large datasets):
df['longitudine'] = df['longitudine'].replace(r'\s+', np.nan, regex=True)
df['longitudine'] = df['longitudine'].replace(r'^$', np.nan, regex=True)
df['longitudine'] = df['longitudine'].fillna(-0.99999)
df['longitudine'] = pd.to_numeric(df['longitudine'])
df['latitudine'] = df['latitudine'].replace(r'\s+', np.nan, regex=True)
df['latitudine'] = df['latitudine'].replace(r'^$', np.nan, regex=True)
df['latitudine'] = df['latitudine'].fillna(-0.99999)
df['latitudine'] = pd.to_numeric(df['latitudine'])
This is the output:
#> ./folium-test.py
--------------------
_id city daily_rain date ... wind_degrees wind_dir wind_speed wind_string
0 {'$oid': '5571aaa8e4b07aa3c1c4e231'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 4.8 NaN
1 {'$oid': '5571aaa9e4b07aa3c1c4e232'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 1.6 NaN
2 {'$oid': '5571aaa9e4b07aa3c1c4e233'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 11.3 NaN
3 {'$oid': '5571aaa9e4b07aa3c1c4e234'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 13 NaN
4 {'$oid': '5571aaa9e4b07aa3c1c4e235'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 5 NaN
[5 rows x 18 columns]
(500, 18)
--------------------
0 8.484023
1 8.154087
2 9.818559
3 9.834952
4 9.921641
5 9.266843
6 9.595043
7 9.070912
8 8.998228
9 8.960630
10 8.931208
11 9.020737
12 8.912937
13 -0.999990
...
498 8.912937
499 -0.999990
Name: longitudine, Length: 500, dtype: float64
0 44.372063
1 43.972063
2 44.109005
3 44.114274
4 44.145446
5 44.337702
6 44.377385
7 44.379896
8 44.405101
9 44.409409
10 44.416048
11 44.452725
12 44.516321
13 -0.999990
...
498 44.516321
499 -0.999990
Name: latitudine, Length: 500, dtype: float64
1 43.9720632409982 8.154087066650389 30.6
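The traceback above shows a function named _flatten calling itself until Python's recursion limit is hit, guarded by a check for "sized iterable" values. The sketch below is a simplified re-creation written only from the names visible in that traceback, not folium's actual source. It illustrates why string coordinates can trigger this: a non-empty string is a sized iterable whose elements are themselves one-character strings, so a recursive flatten never reaches a non-iterable leaf. The helper names are hypothetical.

```python
from collections import abc


def is_sized_iterable(arg):
    # Strings, lists and tuples all pass this check.
    return isinstance(arg, abc.Sized) and isinstance(arg, abc.Iterable)


def flatten(items):
    # Recursively yield the leaves of a nested iterable.
    for item in items:
        if is_sized_iterable(item):
            yield from flatten(item)
        else:
            yield item


def flatten_fails(coords):
    # True if flattening the coordinates exhausts the recursion limit.
    try:
        list(flatten(coords))
    except RecursionError:
        return True
    return False


floats_fail = flatten_fails([44.37, 8.48])       # floats are leaves
strings_fail = flatten_fails(["44.37", "8.48"])  # "4" contains "4" contains "4" ...
print(floats_fail, strings_fail)
```

Under this reading, converting the coordinates to floats removes the problem because a float is neither sized nor iterable, so the recursion stops immediately.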
I got the same error until I converted my coordinates from str to float:
df['lat'] = df['lat'].astype(float).fillna(0)
df['long'] = df['long'].astype(float).fillna(0)
If you just have a list of strings, the easiest option might be to use np.array with dtype=float:
tuple_lat_lon = list(zip(
np.array(myplot.gpsLatitude.split(','), dtype=float),
np.array(myplot.gpsLongitude.split(','), dtype=float)
))
Here, myplot is a TextField.
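If the comma-separated strings may contain blanks or other non-numeric entries, a plain-Python variant of the same idea can skip those pairs instead of failing on them. This is only a sketch: the function name and the sample strings are hypothetical, and it uses the built-in float() rather than np.array.

```python
import math


def parse_coordinate_lists(lat_text, lon_text):
    """Pair two comma-separated strings of numbers as (lat, lon) floats."""
    pairs = []
    for lat_raw, lon_raw in zip(lat_text.split(","), lon_text.split(",")):
        try:
            lat, lon = float(lat_raw), float(lon_raw)
        except ValueError:
            continue  # blank or non-numeric entry: skip the whole pair
        if math.isnan(lat) or math.isnan(lon):
            continue  # float("nan") parses, but is not a usable coordinate
        pairs.append((lat, lon))
    return pairs


# The second latitude is blank, so that pair is dropped.
pairs = parse_coordinate_lists("44.37,,43.97", "8.48,8.15,9.81")
print(pairs)
```

Skipping a bad pair keeps latitudes and longitudes aligned, which would not be the case if blanks were removed from each string separately.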
# These are the top 20 'coordinates' according to the data.
sns.set(font_scale=1.25)
f, ax = plt.subplots(figsize=(15,8))
sns.countplot(y='location', data=df_criclean, order=df_criclean.location.value_counts().iloc[:20].index)
# Here, I'm making a Dataframe of the locations and the count. What you see below
# is the top 5 locations.
# I want to use this for my folium map.
df1 = df_criclean.groupby(["lat", "long", "location"]).size().reset_index(name='count')
df1['location'] = df1['location'].str.replace(',', '')
# Sort the count from highest count with location to lowest.
print(df1.sort_values(by = 'count', ascending=False).head())
# The DataFrame not sorted.
print(df1.head())
# convert to (n, 2) nd-array format for heatmap
locationArr = df1[['lat', 'long']].to_numpy()  # as_matrix() was removed in pandas 1.0
m = folium.Map(location=[42.32, -71.0589], zoom_start=12)
m.add_child(plugins.HeatMap(locationArr, radius=9))
m
I had this same problem and solved it by transforming my latitude and longitude values to floats:
import folium
import numpy as np
plot = folium.Map(location=[40, -95], zoom_start=4)
coords = np.random.rand(1000,2) * 100
for lat, lon in coords:
    folium.Circle(location=[float(lat), float(lon)]).add_to(plot)
I had the same problem. I had to convert the data to a numpy array using the .to_numpy() method to get it to work.
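The answers in this thread all come down to handing folium real floats rather than strings or blanks. A minimal pandas sketch of that idea follows, assuming a DataFrame with string columns named lat and long (the sample data and the function name are made up). Note that it drops rows with unusable coordinates rather than filling them with a placeholder such as 0 or -0.99999, which is a different choice from the answers above.

```python
import pandas as pd


def clean_coordinates(df, lat_col="lat", lon_col="long"):
    """Return a copy with float coordinates; unparseable rows are dropped."""
    out = df.copy()
    for col in (lat_col, lon_col):
        # errors="coerce" turns blanks and other junk into NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.dropna(subset=[lat_col, lon_col])


raw = pd.DataFrame({
    "lat": ["44.37", "", "43.97", "44.11"],
    "long": ["8.48", "8.15", "n/a", "9.81"],
})
clean = clean_coordinates(raw)

# Plain nested list of [lat, lon] floats, built with .to_numpy()
heat_data = clean[["lat", "long"]].to_numpy().tolist()
print(heat_data)
```

Dropping the bad rows avoids plotting points at an artificial location, at the cost of losing those records from the map.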

R understanding why numbers treated as factors, creating dataframe with colSums [duplicate]

Not sure why this is happening. I have a dataframe df2 with the variables below:
EVTYPE TOTAL_FATALITIES TOTAL_INJURIES
(fctr) (dbl) (dbl)
1 TORNADO 5633 91346
2 EXCESSIVE HEAT 1903 6525
3 FLASH FLOOD 978 1777
4 HEAT 937 2100
5 LIGHTNING 816 5230
6 TSTM WIND 504 6957
> df2$TOTAL_FATALITIES
[1] 5633 1903 978 937 816 504 470 368 248 224 206 204 172 160 133 127 103 101 101
> df2$EVTYPE
[1] TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
[6] TSTM WIND FLOOD RIP CURRENT HIGH WIND AVALANCHE
[11] WINTER STORM RIP CURRENTS HEAT WAVE EXTREME COLD THUNDERSTORM WIND
[16] HEAVY SNOW STRONG WIND BLIZZARD HIGH SURF
985 Levels: HIGH SURF ADVISORY COASTAL FLOOD FLASH FLOOD LIGHTNING ... WND
> df2$TOTAL_INJURIES
[1] 91346 6525 1777 2100 5230 6957 6789 232 1137 170 1321 297 309 231 1488 1021
[17] 280 805 152
I am trying to create a new column called SevType, where I will store whether a value is an injury or a fatality.
However, when I use cbind on df2$EVTYPE, it converts the factor into a numeric as seen below.
> head(cbind(Event=df2$EVTYPE,Total = df2$TOTAL_INJURIES,Severity="INJURE"))
Event Total Severity
[1,] "834" "91346" "INJURE"
[2,] "130" "6525" "INJURE"
[3,] "153" "1777" "INJURE"
[4,] "275" "2100" "INJURE"
[5,] "464" "5230" "INJURE"
[6,] "856" "6957" "INJURE"
Notice that Event at [1,] has changed from TORNADO to 834.
Any hints on why this is happening?
We are cbinding vectors, so the output will be a matrix, and a matrix can hold only a single type. A factor is stored internally as integer codes, and cbind drops the factor attributes, so EVTYPE contributes its integer level codes rather than its labels. Because Severity is a character vector, the whole matrix is then converted to 'character', which is why you see "834" instead of TORNADO. It is better to use data.frame:
data.frame(Event=df2$EVTYPE,Total = df2$TOTAL_INJURIES,Severity="INJURE")
Or we can use bind_cols or data_frame from dplyr.

Subset data frame in R given grouping length criterion

I'm working on some exercises based on this dataset.
There's a State column listing the rate of deaths per month by heart attack for each hospital of the state (column 11):
> table(data$State)
AK AL AR AZ CA CO CT DC DE FL GA GU HI IA ID IL IN KS KY
17 98 77 77 341 72 32 8 6 180 132 1 19 109 30 179 124 118 96
Now I try to filter out these states where at least 20 values are available:
> table(data$State)>20
AK AL AR AZ CA CO CT DC DE FL GA GU
FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
So using subset I try to get a subset of data based on the above conditions, but that gives me a result I can't follow:
> data_subset <- subset(data, table(data$State)>20)
> table(data_subset$State)
AK AL AR AZ CA CO CT DC DE FL GA GU HI IA ID IL IN KS KY
14 84 66 65 288 64 25 8 5 155 109 1 19 93 24 153 107 100 83
Why am I getting AK 14, when I would expect that state to be filtered out by the condition?
You can use the following approach to filter out the states with fewer than 20 rows:
tab <- table(data$State)
data[data$State %in% names(tab)[tab > 19], ]
Your code
subset(data, table(data$State)>20)
does not work because table(data$State)>20 returns a logical vector of length length(table(data$State)), that is, one element per state rather than one per row. This vector is shorter than the number of rows in your data frame, so R recycles it until it reaches the row count, and rows are kept or dropped by position rather than by their state. For an example of recycling, have a look at (1:3)[c(TRUE, FALSE)].

Resources