How to scrape data from a web graph into R? - r

The website TRAC Immigration has data on the number of ICE deportations by month and year for each city in Texas. I would like to download this data into R, but there is not a data file available. I think this means I need to scrape the data, but I don't know how to do so. Here is the website: TRAC Immigration
There is a table for each city that displays the total number of deportations over the 19-year period, but not by month and year.
However, there is a graph for each city that displays the number of deportations by month and year. This information is only displayed when you hover your cursor over each bar of the graph.
Please let me know if you have any ideas about how I could scrape the data from the graph for each city into R. I would eventually like to have the number of deportations be a variable in a dataset.

@Dave2e did the hard work, but here's a way of using what he found to get the different cities: in the URL, replace depart_state with depart_city. You don't know in advance which ID belongs to which city, so you can use some brute force to get all of them. I was able to get the data for 397 cities in a few minutes:
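out <- NULL
for (i in 1:397) {
  # depart_city={i} selects one city by its numeric ID
  url <- glue::glue("https://trac.syr.edu/phptools/immigration/remove/graph.php?stat=count&timescale=fymon&depart_city={i}&timeunit=number")
  j <- jsonlite::fromJSON(url)
  tm <- j$timeline
  tm$city <- j$title  # label every row with the title returned for this ID
  out <- rbind(out, tm)
}
# Inspect one city; filter() is called directly so no pipe operator has to be loaded
dplyr::filter(out, city == "LAREDO, TX, POE")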
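
When brute-forcing a range of IDs, some requests may fail or return nothing, and a single error would stop the whole loop. The sketch below shows one defensive pattern, under stated assumptions: fetch_city() is a hypothetical stand-in that fakes a response so the example runs offline, and in practice its body would be the jsonlite::fromJSON() call on the URL above.

```r
# Hypothetical stand-in for the real download; it fails for some IDs on purpose
fetch_city <- function(id) {
  if (id %% 3 == 0) stop("no data for id ", id)
  data.frame(id = id, number = id * 10, city = paste("CITY", id))
}

ids <- 1:7
pieces <- lapply(ids, function(id) {
  # Turn an error into NULL so one bad ID does not abort the whole run
  tryCatch(fetch_city(id), error = function(e) NULL)
})
pieces <- Filter(Negate(is.null), pieces)  # drop the failed IDs
out <- do.call(rbind, pieces)              # bind everything once at the end
```

Collecting the pieces in a list and binding once also avoids growing a data frame inside the loop, which gets slow as the number of IDs increases.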

Related

How to create a ''for loop'' to download 5 consecutive months of data?

For an assignment we are supposed to use a for-loop to obtain a dataframe of 5 consecutive months.
The data regards crimes and their accompanying type of crime, location, month, street name etc.
How do we go about this issue?
We use the 'ukpolice' package, and this is the call that returns the data for a specific month and location of choice:
data <- as.data.frame(ukc_crime_location(lat = , lng = , date = ""))
Thank you in advance!
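
One possible shape for such a loop is sketched below. It is only an illustration: fetch_month() is a hypothetical stand-in for ukc_crime_location() so that the example runs without the package or a network connection, and the coordinates and start month are made-up values.

```r
# Hypothetical stand-in for ukc_crime_location(lat = , lng = , date = "")
fetch_month <- function(lat, lng, date) {
  data.frame(month = date, lat = lat, lng = lng, category = "example")
}

# Five consecutive months as "YYYY-MM" strings, starting from an assumed month
months <- format(seq(as.Date("2019-01-01"), by = "month", length.out = 5), "%Y-%m")

pieces <- vector("list", length(months))
for (i in seq_along(months)) {
  # One request per month; each result is stored in its own list slot
  pieces[[i]] <- as.data.frame(fetch_month(lat = 52.63, lng = -1.13, date = months[i]))
}

# Stack the five monthly data frames into a single data frame
crimes <- do.call(rbind, pieces)
```

With the real package, the call inside the loop would be replaced by ukc_crime_location() with the same three arguments.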

How do I stop the number of observations coming up when trying to tabulate a variable?

I'm very new to R and have run into a problem while working on the code for a stats project. I have attached the .csv file below for reference. Essentially, I would like to plot the years 2018, 2019 and 2020 against the sum of international arrivals ("Int_Pax_In" in the file) over the first 6 months of each year for the "All Australian Airports" entry. So I will have 3 bars in my plot, one each for 2018, 2019 and 2020, with the y-axis labelled "All Australian Arrivals". The problem is that I wanted to start with a simple line of code to tabulate the "Year" variable before attempting the final result, but simply putting in:
info=read.csv("mon_pax_web.csv")
table(info$Year)
doesn't give me any information. It simply gives me the number of observations for each year instead of anything else. Below is a screenshot of what I get:
Screenshot 1
info=read.csv("mon_pax_web.csv")
str(info)
table(info$Year)
I also tried changing my variables apart from "Year" into as.character and Month into factor but that had no effect as shown below:
Screenshot 2
info=read.csv("mon_pax_web.csv")
info$AIRPORT=as.character(info$AIRPORT)
info$Month=as.factor(info$Month)
info$Dom_Pax_In=as.character(info$Dom_Pax_In)
info$Dom_Pax_Out=as.character(info$Dom_Pax_Out)
info$Dom_Pax_Total=as.character(info$Dom_Pax_Total)
info$Int_Pax_Out=as.character(info$Int_Pax_Out)
info$Int_Pax_Total=as.character(info$Int_Pax_Total)
info$Pax_In=as.character(info$Pax_In)
info$Pax_Out=as.character(info$Pax_Out)
info$Pax_Total=as.character(info$Pax_Total)
info$Int_Pax_In=as.character(info$Int_Pax_In)
str(info)
table(info$Year)
I'm only allowed to use Base R for this project so would appreciate it a lot if people could help me out and if you do, provide coding using Base R so I could follow along. Just require some pointers so I could get started.
CSV File for reference
Thank you.
The column info$Year is just a vector of years, so when you do table(info$Year) it only shows the number of entries for each year, because that's what you have asked for. If I gave you the following years: 2011, 2011, 2012 and 2013, and asked you to tabulate the years without giving you any other information, all you could do is count the number of instances of each year. Presumably, this is not what you meant.
I'm guessing what you're trying to do is to get the sum of Int_Pax_In per year. First you should filter so that you only include the years of interest, the months of interest, and the rows that represent all Australian airports. You can do this using subset:
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")
Now we can use tapply to find the sum for each year:
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)
Finally, we use barplot to create the bar graph you wanted:
barplot(plot_table, main = "Arrivals at all Australian airports January - June")
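
To make the difference between counting rows and summing a column concrete, here is a small self-contained sketch. The data frame is made up for illustration (it is not the real mon_pax_web.csv); only the column names follow the question.

```r
# Toy data, invented for this example
info <- data.frame(
  AIRPORT    = rep(c("All Australian Airports", "SYDNEY"), each = 4),
  Year       = rep(c(2018, 2018, 2019, 2019), times = 2),
  Month      = rep(c(1, 8, 2, 3), times = 2),
  Int_Pax_In = c(100, 200, 300, 400, 10, 20, 30, 40)
)

# table() only counts how many rows carry each year
counts <- table(info$Year)

# subset() keeps the rows of interest, then tapply() sums the column per year
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")
totals <- tapply(df$Int_Pax_In, df$Year, sum)
```

Here counts holds 4 rows for each year, while totals holds 100 for 2018 and 700 for 2019, which is the kind of vector barplot() expects.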

Several data points in R on the same date that need to be averaged out to one datapoint

I've got a dataframe in R with time series data, and I'm trying to plot how many likes a person got on an instagram post on a given date. However, on some dates a user might post more than once i.e. they will have several datapoints of nr of likes on that date. I'm not sure how I can average out the amount of likes, so that I am left with just one data point.
user <- c('John Doe')
likecount <- c(21000, 23400, 26800)
postdate <- as.Date(c('2010-11-1','2010-11-1','2010-11-2'))
df <- data.frame(user, likecount, postdate)
So for this code example I would need to have the average of the likecount that both fall on the same date. Preferably I would run through the entire dataframe and see if there are several instances of same-day-posting for a single user, where I can automatically average out the likecount on those dates.
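We can use aggregate. The formula likecount ~ . averages likecount within each combination of the remaining columns (here user and postdate):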
aggregate(likecount ~ ., df, mean)
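
The same call can be written with the grouping columns spelled out, which may be safer once the data frame gains more columns. This is a sketch on the question's own example data; the name daily is just an illustrative choice.

```r
user <- c('John Doe')
likecount <- c(21000, 23400, 26800)
postdate <- as.Date(c('2010-11-1', '2010-11-1', '2010-11-2'))
df <- data.frame(user, likecount, postdate)

# Average likecount for every user/date pair; one row per pair comes back
daily <- aggregate(likecount ~ user + postdate, data = df, FUN = mean)
```

The two posts on 2010-11-01 collapse into one row with the mean of 21000 and 23400, and the single post on 2010-11-02 is returned unchanged.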

Plotting 52 week range in R

I am trying to pull stock price data using tq_get in tidyquant, then want to plot the current price against the 52 week range. Here is an example of what I am looking to create.
Basically just a visual representation of where the stock is currently trading in relation to its 52 week range. Below is the code I have begun to load in the appropriate values for TSLA. First, I am wondering if it is possible to set the "from" and "to" dates so that they constantly update to be exactly one year ago and the current date, respectively? Second, is there a ggplot or another package that might be able to generate a similar plot? I've explored boxplots, but really I need something even more simple than that, as I really only need one axis. Thanks in advance!
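library(tidyquant)  # tq_get(); chartSeries() comes from quantmod, which tidyquant loads
library(dplyr)      # filter(), %>% and tibble()
library(timetk)     # tk_xts()
X <- tq_get(c("^GSPC","TSLA"), get = "stock.prices", from = "2019-05-04", to = "2020-05-04")
TSLA <- X %>% filter(symbol == "TSLA") %>% tk_xts()
chartSeries(TSLA)
TSLAlow <- min(TSLA$close)
TSLAlow
TSLAhigh <- max(TSLA$close)
TSLAhigh
# Take the latest close from the TSLA series rather than from the combined table X
TSLAclose <- as.numeric(tail(TSLA$close, n = 1))
TSLAclose
TSLArange <- tibble(TSLAlow, TSLAhigh, TSLAclose)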
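
Both parts of the question can be sketched in base R. The block below is an illustration under assumptions: the low, high and current prices are made-up numbers rather than real TSLA data, and plot_range() is a hypothetical helper, not a function from any package.

```r
# Rolling window: today and the same calendar day one year earlier
to   <- Sys.Date()
from <- seq(to, length.out = 2, by = "-1 year")[2]
# These could be passed on as tq_get(..., from = from, to = to)

# Hypothetical helper: draw the range as one horizontal line with a marker
plot_range <- function(low, high, current, label = "") {
  plot(c(low, high), c(0, 0), type = "l", lwd = 4, col = "grey60",
       yaxt = "n", ylab = "", xlab = "Price", main = label, bty = "n")
  points(current, 0, pch = 19, cex = 2, col = "red")
  # Return where the current price sits in the range, from 0 (low) to 1 (high)
  invisible((current - low) / (high - low))
}

# Made-up prices for illustration only
position <- plot_range(low = 100, high = 200, current = 175, label = "52-week range")
```

The returned position is 0.75 for these numbers, meaning the current price sits three quarters of the way from the low to the high.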

Different age calculation for different rows

I'm an absolute R beginner here working on a Master's project.
I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). The data is organised so that every row contains information for a specific year the horse competed, plus a first row for each horse labelled "Total", which summarises every variable over its total competing life. It looks like this:
I created a new variable with their age using the age_calc function in the eeptools package:
travdata$Age<-age_calc(as.Date(travdata$Birth.date), enddate=as.Date("2016-12-31"),
units="years")
With no problems. What I'm trying to figure out is if there is any way I can calculate the age of the horses for each specific year I have info on them-that is, the "Total" row would have their age up until 2016-12-31, for the year 2015 it would have their age at that time and so on. I've been trying to include if statements in age_calc but it won't work and I'm really at a loss on how best to do this.
Any literature or help you could point me to would be much, much appreciated.
MWE
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
The trick is to filter out the "Total" rows and specify a format for the as.Date() function:
library(eeptools)
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
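travdata$Age <- age_calc(as.Date(travdata$Birth.date),
                         enddate = as.Date("2016-12-31"), units = "years")
# Keep only the per-year rows; "Total" cannot be converted to a date
competitions <- travdata[travdata$Competition.year != "Total", ]
# A bare "%Y" format would fill in today's month and day, so pin each year to 31 December
competitions$Competition.age <- age_calc(
  as.Date(competitions$Birth.date),
  enddate = as.Date(paste0(competitions$Competition.year, "-12-31"), format = "%Y-%m-%d"),
  units = "years", precise = FALSE)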
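
If a single age column covering both the "Total" rows and the yearly rows is wanted, a base R approximation is sketched below. It is not the eeptools method: it divides the day difference by 365.25, so the ages are approximate decimal years, and the two-row horses data frame is a cut-down illustration of the data above.

```r
# Cut-down illustration of the question's data
horses <- data.frame(
  Name = c("Muuttuva", "Muuttuva"),
  Birth.date = c("1998-07-01", "1998-07-01"),
  Competition.year = c("Total", "2005")
)

# Treat "Total" as the end of 2016 and every other row as the end of its own year
year_chr <- as.character(horses$Competition.year)
year_chr[year_chr == "Total"] <- "2016"
enddate <- as.Date(paste0(year_chr, "-12-31"))

# Approximate age in decimal years at each end date
horses$Age <- as.numeric(enddate - as.Date(horses$Birth.date)) / 365.25
```

For this horse the "Total" row comes out at roughly 18.5 years and the 2005 row at roughly 7.5 years.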
