Geographic Positions and Iterating over Cameras when Web Scraping a Traffic Camera Map - web-scraping

I am trying to scrape data from this traffic camera map:
https://traffic.houstontranstar.org/layers/layers_ve.aspx?&inc=true&rc=true&lc=false&cam=true&dms=false&spd=true&tr=false&wthr=false&rfw=true&rm=false&nml=false&nmd=false
This is what it looks like: [screenshot of the map]
The map shows the positions of a bunch of traffic cameras with these camera icons: [camera icon]
I am currently able to manually inspect element at each camera location and get the source link for the image in the camera's video feed. For example, I can get this link here for an example camera: https://www.houstontranstar.org/snapshots/cctv/913.jpg?arg=1661886693594
However, I also need to get the location of these cameras. Does anyone know if there is a way for me to get some sense of geographic location of these cameras on the map?
My second question: does anyone know of a good way to iterate over the cameras? I'm completely new to web scraping, so I'm currently just inspecting the elements of the website manually and going from there.
Any help would be great, thanks!

The following code is one way of obtaining those cameras' locations (lat/lon):
import requests
import pandas as pd

# JSON feed behind the camera layer; the 'arg' value appears to be a cache-busting timestamp
url = 'https://traffic.houstontranstar.org/data/layers/cctvSnapshots_json.js?arg=1661888238910'
r = requests.get(url)

# Each entry in 'cameras' carries the camera id, its lat/lng, direction and a showWin() link
df = pd.DataFrame(r.json()['cameras'])
print(df)
This returns a dataframe with 1126 rows × 8 columns:
   location                    id    lat      lng       dir    fc  fp   url
0  10 EAST # SAN JACINTO       1002  29.7682  -95.3557  South  6   300  showWin('/layers/gc.aspx?cam=1002&loc=IH-10_East_at_SAN_JACINTO')
1  10 EAST # SAN JACINTO (E)   1003  29.7698  -95.3505  South  1   0    showWin('/layers/gc.aspx?cam=1003&loc=IH-10_East_at_SAN_JACINTO_(E)')
2  10 EAST # JENSEN            1004  29.7703  -95.3437  North  1   0    showWin('/layers/gc.aspx?cam=1004&loc=IH-10_East_at_JENSEN')
3  10 East # Gregg             1005  29.7699  -95.3357  North  1   0    showWin('/layers/gc.aspx?cam=1005&loc=IH-10_East_at_Gregg')
4  10 East # Waco              1006  29.7729  -95.3246  North  1   0    showWin('/layers/gc.aspx?cam=1006&loc=IH-10_East_at_Waco')
[...]
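On the second question (iterating over the cameras), the same JSON feed can drive a loop: the id column matches the number in the snapshot URL from the question (e.g. .../snapshots/cctv/913.jpg). The sketch below assumes that URL pattern holds for every camera, so treat it as a starting point rather than a confirmed API:
import requests
import pandas as pd

# Same JSON feed as above
url = 'https://traffic.houstontranstar.org/data/layers/cctvSnapshots_json.js?arg=1661888238910'
cameras = pd.DataFrame(requests.get(url).json()['cameras'])

for cam in cameras.itertuples():
    # Assumed snapshot pattern, generalised from the single example in the question
    snapshot_url = f'https://www.houstontranstar.org/snapshots/cctv/{cam.id}.jpg'
    print(cam.id, cam.location, cam.lat, cam.lng, snapshot_url)

    # Uncomment to save the current frame for this camera:
    # with open(f'{cam.id}.jpg', 'wb') as f:
    #     f.write(requests.get(snapshot_url).content)
Each row's url field embeds the same camera id in its showWin() call, so the two can be cross-checked if a snapshot comes back empty.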

Related

poly2nb function takes too much time to compute

I have a data frame that has information about crimes (variable x), and the latitude and longitude of where each crime happened. I also have a shapefile with the districts of São Paulo city. This is df:
latitude longitude n_homdol
1 -23.6 -46.6 1
2 -23.6 -46.6 1
3 -23.6 -46.6 1
4 -23.6 -46.6 1
5 -23.6 -46.6 1
6 -23.6 -46.6 1
And a shapefile for the districts of São Paulo, sp.dist.sf:
geometry NOME_DIST
1 POLYGON ((352436.9 7394174,... JOSE BONIFACIO
2 POLYGON ((320696.6 7383620,... JD SAO LUIS
3 POLYGON ((349461.3 7397765,... ARTUR ALVIM
4 POLYGON ((320731.1 7400615,... JAGUARA
5 POLYGON ((338651 7392203, 3... VILA PRUDENTE
6 POLYGON ((320606.2 7394439,... JAGUARE
With the help of @Humpelstielzchen, I join both datasets with:
sf_df = st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)
shape_df<-st_join(sp.dist.sf, sf_df, join=st_contains)
My final goal is to compute a local Moran's I statistic, and I'm trying to do this with:
sp_viz <- poly2nb(shape_df, row.names = shape_df$NOME_DIST)
xy <- st_coordinates(shape_df)
ww <- nb2listw(sp_viz, style ='W', zero.policy = TRUE)
shape_df[is.na(shape_df)] <- 0
locMoran <- localmoran(shape_df$n_homdol, ww)
sids.shade <- auto.shading(c(locMoran[,1],-locMoran[,1]),
cols=brewer.pal(5,"PRGn"))
choropleth(shape_df, locMoran[,1], shading=sids.shade)
choro.legend(-46.5, -20, sids.shade,fmt="%6.2f")
title("Criminalidade (Local Moran's I)",cex.main=2)
But when I run the code, this line takes hours to compute:
sp_viz <- poly2nb(shape_df, row.names = shape_df$NOME_DIST)
I have 15,000 observations for 93 districts. I tried running the above code with only 100 observations, and it was fast and everything went right. But with the 15,000 observations I never saw the result, because the computation goes on forever. What may be happening? Am I doing something wrong? Is there a better way to do this local Moran's I test?
As I can't just comment, here are some questions one might ask:
- How long do you mean by fast? Some of my scripts run in seconds and I still call them slow.
- Are all your observations identically structured? Maybe the poly2nb() function is looping endlessly on an item with an uncommon structure. You can use the unique() function to check this.
- Did you try to cut your dataset into pieces and run each piece separately (see the sketch after this list)? This would help you see 1/ whether one of the pieces contains something that needs correcting and 2/ whether R is loading all the data at once and overloading your computer's memory. Beware, this happens really often with huge datasets in R (and by huge, I mean data tables weighing more than 50 MB).
Glad to have tried to help you, do not hesitate to question my answer!
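For example, here is a rough sketch of that last suggestion: timing poly2nb() on successive chunks of shape_df. The chunk size of 1,000 rows is arbitrary, and spdep is assumed to be the package providing poly2nb(), as in the question's code:
library(spdep)

# Split the row indices of shape_df into chunks of roughly 1,000 rows (arbitrary size)
chunks <- split(seq_len(nrow(shape_df)), ceiling(seq_len(nrow(shape_df)) / 1000))

# Time poly2nb() on each chunk separately; a chunk that is much slower than the
# others (or never finishes) points at the rows worth inspecting with unique()
for (idx in chunks) {
  elapsed <- system.time(
    poly2nb(shape_df[idx, ], row.names = shape_df$NOME_DIST[idx])
  )["elapsed"]
  cat("rows", min(idx), "-", max(idx), ":", elapsed, "seconds\n")
}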

How do I convert city names to time zones?

Sorry if this is repetitive, but I've looked everywhere and can't seem to find anything that addresses my specific problem in R. I have a column with city names:
cities <-data.frame(c("Sydney", "Dusseldorf", "LidCombe", "Portland"))
colnames(cities)[1]<-"CityName"
Ideally I'd like to attach a column with either the lat/long for each city or the time zone. I have tried using the "ggmap" package in R, but my request exceeds the maximum number of requests they allow per day. I found the "geonames" package that converts lat/long to timezones, so if I get the lat/long for the city I should be able to take it from there.
Edit to address potential duplicate question: I would like to do this without using the ggmap package, as I have too many rows and they have a maximum # of requests per day.
You can get at least many major cities from the world.cities data in the maps package.
## Changing your data to a vector
cities <- c("Sydney", "Dusseldorf", "LidCombe", "Portland")
## Load up data
library(maps)
data(world.cities)
world.cities[match(cities, world.cities$name), ]
name country.etc pop lat long capital
36817 Sydney Australia 4444513 -33.87 151.21 0
10026 Dusseldorf Germany 573521 51.24 6.79 0
NA <NA> <NA> NA NA NA NA
29625 Portland Australia 8757 -38.34 141.59 0
Note: LidCombe was not included.
Warning: For many names, there is more than one world city. For example,
world.cities[grep("Portland", world.cities$name), ]
name country.etc pop lat long capital
29625 Portland Australia 8757 -38.34 141.59 0
29626 Portland USA 542751 45.54 -122.66 0
29627 Portland USA 62882 43.66 -70.28 0
Of course the two in the USA are Portland, Maine and Portland, Oregon.
match is just giving the first one on the list. You may need to use more information than just the name to get a good result.
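For example, one way to use more information is to match on the country as well as the name; the country column below is an assumption about where each city is expected to be, so adjust it to your own data:
## Match on name *and* country instead of name alone
wanted <- data.frame(name    = c("Sydney", "Dusseldorf", "LidCombe", "Portland"),
                     country = c("Australia", "Germany", "Australia", "USA"))

merge(wanted, world.cities,
      by.x = c("name", "country"),
      by.y = c("name", "country.etc"),
      all.x = TRUE)
## LidCombe still comes back as NA (it is not in world.cities), and Portland, USA
## still matches two cities (Maine and Oregon), so population or coordinates may
## be needed to choose between them.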

Extracting String in R

I am wanting to extract strings from elements in a data frame. Having gone through numerous previous questions, I am still unable to understand what to do! This is what I have tried to do so far:
unlist(strsplit(pcode2$Postcode,"'"))
I get the following error:
Error in strsplit(pcode2$Postcode, "'") : non-character argument
which I understand, because I am trying to reference the data rather than putting the text in the code itself. I have 16,000 cases in a dataframe, so I'm also not sure how to vectorise the operation.
Any help would be greatly appreciated.
Data:
Postcode Locality State Latitude Longitude
1 ('200', Australian National University ACT -35.280, 149.120),
2 ('221', Barton ACT -35.200, 149.100),
3 ('3030', Werribee VIC -12.800, 130.960),
4 ('3030', Point Cook VIC -12.800, 130.960),
I want to get rid of the commas and braces etc. so that I am left with the numeric part of Column 1 (Postcode) and the numeric parts of Latitude and Longitude. This is how I am hoping the final result will look:
Postcode Locality State Latitude Longitude
1 200 Australian National University ACT -35.280 149.120
2 221 Barton ACT -35.200 149.100
3 3030 Werribee VIC -12.800 130.960
4 3030 Point Cook VIC -12.800 130.960
Lastly, I would also like to understand how to nicely format the data in the questions.

No coordinates when searching by geocode with R twitteR-package

searchTwitter('patriots', geocode='42.375,-71.1061111,10mi')
This query returns a list of tweets. However, most of these tweets have no location:
retweeted longitude latitude
1 FALSE <NA> <NA>
2 FALSE <NA> <NA>
3 FALSE <NA> <NA>
...
Why is that? How did twitter know that these tweets are within the range of the search query? Is there a way to get an estimate of the coordinates of these tweets?
The documentation for search/tweets says
The location is preferentially taking from the Geotagging API, but will fall back to their Twitter profile.
You should find that if you check the user of each tweet, they have set their profile location and it falls within 10 miles of 42.375,-71.1061111.

Can R merge latitude and longitude points to areas in spatial polygon?

I have two data frames. One is a Spatial Polygon and the other is a Spatial Points dataframe. Unfortunately I can't reproduce the entire example here but the Spatial Polygon looks like this:
head(electorate)
ELECT_DIV STATE NUMCCDS ACTUAL PROJECTED POPULATION OVER_18 AREA_SQKM SORTNAME
Adelaide SA 318 0 0 0 0 76.0074 Adelaide
Aston VIC 191 0 0 0 0 99.0122 Aston
Ballarat VIC 274 0 0 0 0 4651.5400 Ballarat
Banks NSW 229 0 0 0 0 49.3189 Banks
Barker SA 343 0 0 0 0 63885.7100 Barker
Barton NSW 234 0 0 0 0 44.1112 Barton
As you can see it's the spatial polygon for the Australian electorate. The second data frame is a Spatial points dataframe with longitude and latitude for polling places. It looks like this -
head(ppData)
State PollingPlaceID PollingPlaceNm Latitude Longitude
1 ACT 8829 Barton -35.3151 149.135
2 ACT 11877 Bonython -35.4318 149.083
3 ACT 11452 Calwell -35.4406 149.116
4 ACT 8794 Canberra Hospital -35.3453 149.099
5 ACT 8761 Chapman -35.3564 149.042
6 ACT 8763 Chisholm -35.4189 149.123
My goal is to try and match each polling place (PollingPlaceID) to the appropriate electoral division (ELECT_DIV). There will be many polling places within each division. It's no problem to plot them over each other. It seems only natural that R will also let me add a new vector to my polling place data frame (ppData) which assigns each polling place the electorate (ELECT_DIV) it falls within.
I know I can extract the coordinates for each ELECT_DIV from electorate with coordinates(electorate) but I'm not sure that actually helps. Any advice?
You need over from sp and you can use it like this:
require( sp )
ID <- over( SpatialPoints( ppData ) , electorate )
ppData@data <- cbind( ppData@data , ID )
This returns a data.frame where each row relates to the first argument (each of your polling points) and is the data from the polygon that the point fell in. You can just cbind them afterwards and you now have the polygon data that relates to each point.
