How to create a "for" loop to download 5 consecutive months of data? - r

For an assignment we are supposed to use a for loop to obtain a data frame covering 5 consecutive months.
The data concerns crimes and their accompanying type of crime, location, month, street name, etc.
How do we go about this?
We use the package 'ukpolice' and this code to obtain data for a specific month and location of choice.
The ukpolice syntax is as follows:
data <- as.data.frame(ukc_crime_location(lat = , lng = , date = ""))
Thank you in advance!
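A minimal base R sketch of such a loop follows. It assumes that ukc_crime_location() accepts the date as a "YYYY-MM" string and that every month returns the same columns, so the months can be stacked with rbind(). The helpers month_strings() and download_months() are hypothetical. The download function is passed in as fetch, which lets you try the loop offline with a stub.

```r
# Hypothetical helper: n consecutive "YYYY-MM" strings starting at `start`
month_strings <- function(start, n = 5) {
  firsts <- seq(as.Date(paste0(start, "-01")), by = "month", length.out = n)
  format(firsts, "%Y-%m")
}

# Hypothetical helper: call `fetch` once per month and stack the results.
# `fetch` is assumed to behave like ukpolice::ukc_crime_location(lat, lng, date).
download_months <- function(lat, lng, start, n = 5, fetch) {
  months <- month_strings(start, n)
  pieces <- vector("list", length(months))
  for (i in seq_along(months)) {
    pieces[[i]] <- as.data.frame(fetch(lat = lat, lng = lng, date = months[i]))
  }
  do.call(rbind, pieces)  # one data frame covering all months
}

# Offline stub standing in for the real download (made-up data)
fake_fetch <- function(lat, lng, date) {
  data.frame(month = date, category = "burglary", stringsAsFactors = FALSE)
}

crimes <- download_months(lat = 52.63, lng = -1.13, start = "2019-11", fetch = fake_fetch)
```

With the package installed, the real call would be something like download_months(lat, lng, "2019-11", fetch = ukpolice::ukc_crime_location). The coordinates above are only illustrative. Collecting the months in a list and binding once avoids re-copying a growing data frame on every iteration.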

Related

How to scrape data from a web graph into R?

The website TRAC Immigration has data on the number of ICE deportations by month and year for each city in Texas. I would like to download this data into R, but there is not a data file available. I think this means I need to scrape the data, but I don't know how to do so. Here is the website: TRAC Immigration
There is a table for each city that displays the total number of deportations over the 19-year period, but not by month and year.
However, there is a graph for each city that displays the number of deportations by month and year. This information is only displayed when you hover your cursor over each bar of the graph.
Please let me know if you have any ideas about how I could scrape the data from the graph for each city into R. I would eventually like to have the number of deportations be a variable in a dataset.
@Dave2e did the hard work, but here's a way of using what he found to get the different cities: you can replace depart_state with depart_city. Since you don't know which ID belongs to which city, you can use brute force to get all of them. I was able to get the data for 397 cities in a few minutes:
library(dplyr)  # provides %>% and filter()

out <- NULL
for (i in 1:397) {
  # fetch the JSON behind the graph for city ID i
  url <- glue::glue("https://trac.syr.edu/phptools/immigration/remove/graph.php?stat=count&timescale=fymon&depart_city={i}&timeunit=number")
  j <- jsonlite::fromJSON(url)
  tm <- j$timeline
  tm$city <- j$title
  out <- rbind(out, tm)
}
out %>% filter(city == "LAREDO, TX, POE")
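The city IDs are guessed by brute force, so some requests may fail or return nothing. The sketch below shows one way to skip those IDs: it wraps each request in tryCatch() and collects the results in a list. fake_fetch() is a hypothetical offline stand-in for jsonlite::fromJSON(url), and its column names are invented for illustration.

```r
# Hypothetical stand-in for jsonlite::fromJSON(url): returns a list with a
# `timeline` data frame and a `title`, or throws an error for "unknown" IDs
fake_fetch <- function(id) {
  if (id %% 3 == 0) stop("no data for id ", id)
  list(timeline = data.frame(fymon = c("2019-10", "2019-11"), number = c(id, id * 2),
                             stringsAsFactors = FALSE),
       title = paste("CITY", id))
}

scan_ids <- function(ids, fetch) {
  pieces <- lapply(ids, function(i) {
    j <- tryCatch(fetch(i), error = function(e) NULL)  # failed ID -> NULL
    if (is.null(j)) return(NULL)
    tm <- j$timeline
    tm$city <- j$title
    tm
  })
  do.call(rbind, pieces)  # rbind() drops the NULL entries
}

res <- scan_ids(1:6, fake_fetch)
```

In the real loop, the stub would be replaced by a function(i) that builds the URL with glue::glue() and calls jsonlite::fromJSON() on it.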

Create a column in one dataframe based on another column in another dataframe in R

I am fairly new to R and dplyr, and I am stuck on this issue:
I have two tables:
(1) Repairs done on cars
(2) Amount owed on each car over time
What I would like to do is create three extra columns on the repair table that give me:
(1) the amount owed on the car when the repair was done,
(2) the amount owed 3 months later, and
(3) the last payment record on file.
If the repair date does not match any payment record, I need to use the closest amount owed on record.
So something like:
Any ideas how I can do that?
Here are the data frames:
Repairs done on cars:
df_repair <- data.frame(unique_id = c("A1","A2","A3","A4","A5","A6","A7","A8"),
  car_number = c(1,1,1,2,2,2,3,3),
  repair_done = c("Front Fender","Front Lights","Rear Lights","Front Fender",
                  "Rear Fender","Rear Lights","Front Lights","Front Fender"),
  YearMonth = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))
df_owed <- data.frame(car_number = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
YearMonth = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),
amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
Using zoo for year-months and the tidyverse, you could try the following. Use left_join to add all the df_owed data to your df_repair data by car_number. Convert both year-month columns to yearmon objects with zoo, then sort the rows by the year-month column from df_owed.
For each unique_id (using group_by), you can create your three columns of interest. The first uses the latest amount_owed whose owed date is on or before the repair date. The second (3 months) uses the first amount_owed whose owed date is at least 3 months after the repair date (3/12, since yearmon counts in years). Finally, the most recent column takes the last value of amount_owed.
With the example data, the results differ a bit from what you expect, possibly because the data frames do not match the images in the post.
library(tidyverse)
library(zoo)

df_repair %>%
  left_join(df_owed, by = "car_number") %>%   # YearMonth.x = repair, YearMonth.y = owed
  mutate(across(c(YearMonth.x, YearMonth.y), as.yearmon)) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),        # latest on or before repair
    owed_3_months    = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]), # first at least 3 months later
    owed_most_recent = last(amount_owed)                                      # last record on file
  )
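If loading zoo and the tidyverse is not an option, you can sketch the same three rules in base R: the latest amount on or before the repair, the first amount at least 3 months later, and the last amount on file. This is an alternative, not the answer's method. It converts "YYYY-MM" into an integer month count with ym_index(), a hypothetical helper, so "3 months later" becomes exact integer arithmetic. owed_lookup() is also hypothetical, and the data is a small subset adapted from the question.

```r
# Hypothetical helper: "YYYY-MM" -> integer month count
ym_index <- function(x) {
  as.integer(substr(x, 1, 4)) * 12L + as.integer(substr(x, 6, 7)) - 1L
}

# Hypothetical helper: one row per repair with the three amounts
owed_lookup <- function(repair, owed) {
  owed <- owed[order(owed$car_number, ym_index(owed$YearMonth)), ]
  rows <- lapply(seq_len(nrow(repair)), function(k) {
    o <- owed[owed$car_number == repair$car_number[k], ]
    t_rep <- ym_index(repair$YearMonth[k])
    t_owed <- ym_index(o$YearMonth)
    before <- o$amount_owed[t_owed <= t_rep]      # on or before the repair
    later <- o$amount_owed[t_owed >= t_rep + 3L]  # at least 3 months later
    data.frame(
      unique_id = repair$unique_id[k],
      owed_repair_done = if (length(before)) before[length(before)] else NA,
      owed_3_months = if (length(later)) later[1] else NA,
      owed_most_recent = if (nrow(o)) o$amount_owed[nrow(o)] else NA,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Small subset adapted from the question's data
repairs <- data.frame(unique_id = c("A1", "A7"), car_number = c(1, 3),
                      YearMonth = c("2014-03", "2018-01"), stringsAsFactors = FALSE)
owed <- data.frame(
  car_number = c(1, 1, 1, 1, 3, 3, 3),
  YearMonth = c("2014-02", "2014-05", "2014-06", "2016-08", "2018-02", "2018-04", "2018-09"),
  amount_owed = c(20000, 18000, 17500, 3000, 40000, 30000, 15000),
  stringsAsFactors = FALSE
)
result <- owed_lookup(repairs, owed)
```

As in the dplyr version, a repair with no earlier payment record (A7 here) gets NA in owed_repair_done. A different rule would be needed if "closest" should also look forward in time.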

Column operators regarding only specific columns (specific dates and code i.e.) in R

I am trying to calculate the average relative humidity of Seoul for the dates 2020-01-01 to 2020-01-31.
I have this data:
and I've tried this already, but I don't really know what's missing.
Seoul_weather_dt <- Corona_relevant_weather_dt[, avg_relative_humidity_seoul := mean(avg_relative_humidity[code =="2020-01-01":"2020-01-01"]), by = c("province", "date", "avg_temp", "avg_relative_humidity"]
Can someone help me?
Something like this?
# select only Seoul and the relevant dates (both ends included)
Seoul_weather_dt <- Corona_relevant_weather_dt[province == "Seoul" & date >= as.Date("2020-01-01") & date <= as.Date("2020-01-31")]
# calculate the average humidity for each unique date
aggregate(Seoul_weather_dt$avg_relative_humidity, by = list(Seoul_weather_dt$date), FUN = mean)
The line of code you provided is pretty long. I would suggest splitting it into multiple lines with fewer functions per line, which keeps it readable and makes errors easier to locate. Also:
Is date of class "Date"? You can check this with str(Seoul_weather_dt).
code == "2020-01-01":"2020-01-01" does not select a date range: the colon operator only builds numeric sequences, and even as a range it would cover a single day.
Using by = c("province", "date", "avg_temp", "avg_relative_humidity") is strange. You would then calculate a separate mean for every value of avg_relative_humidity as well, which is not what you want.
Why compute average values for each province when you are only interested in Seoul?
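If you want a single January average for Seoul rather than one value per date, a minimal base R sketch could look like this. The weather data frame is made-up data that reuses the column names from the question (province, date, avg_relative_humidity).

```r
# Made-up stand-in with the column names used in the question
weather <- data.frame(
  province = c("Seoul", "Seoul", "Seoul", "Busan"),
  date = as.Date(c("2019-12-31", "2020-01-01", "2020-01-31", "2020-01-15")),
  avg_relative_humidity = c(90, 40, 60, 70),
  stringsAsFactors = FALSE
)

# keep Seoul rows from 2020-01-01 through 2020-01-31, both ends included
in_range <- weather$province == "Seoul" &
  weather$date >= as.Date("2020-01-01") &
  weather$date <= as.Date("2020-01-31")
seoul_jan <- weather[in_range, ]

# one average over the whole period
seoul_avg <- mean(seoul_jan$avg_relative_humidity)
```

With data.table, and assuming date is already of class Date, the equivalent would be Corona_relevant_weather_dt[province == "Seoul" & date >= as.Date("2020-01-01") & date <= as.Date("2020-01-31"), mean(avg_relative_humidity)].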

Earliest date by country using R

I have a list of countries with the cumulative number of cases (csum column) by date in my data frame (df).
I am trying to group by country and pull out the earliest date per country.
I tried the following code, but the dates it returns are incorrect:
df_2 = aggregate(df$date, by = list(df$country), FUN = "min")
Would anyone be able to see where I'm going wrong? (P.S.: I need to avoid using any libraries.)
Thanks :)
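One likely cause is that df$date is stored as text rather than as Date. In that case min() compares the strings alphabetically, so "1/22/20" sorts before "12/31/19"; for a factor column it fails outright. The base R sketch below converts the dates first. The "%m/%d/%y" format and the sample rows are hypothetical, so adjust them to the real data.

```r
df <- data.frame(country = c("A", "A", "B", "B"),
                 date = c("1/22/20", "12/31/19", "3/1/20", "2/15/20"),
                 csum = c(1, 0, 5, 2),
                 stringsAsFactors = FALSE)

# As text, min() picks the alphabetically smallest string, not the earliest date
wrong <- aggregate(df$date, by = list(df$country), FUN = min)

# Convert to Date first (format assumed: month/day/two-digit year)
df$date <- as.Date(df$date, format = "%m/%d/%y")
df_2 <- aggregate(date ~ country, data = df, FUN = min)
```

No packages are needed. as.Date() and aggregate() are both part of base R.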

Different age calculation for different rows

I'm an absolute R beginner here working on a Master's project.
I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). The data is organised so that every row contains information for a specific year the horse competed in, plus a "Total" row for each horse that summarises every variable over its whole competing life. It looks like this:
I created a new variable with their age using the age_calc function in the eeptools package:
travdata$Age<-age_calc(as.Date(travdata$Birth.date), enddate=as.Date("2016-12-31"),
units="years")
This works without problems. What I'm trying to figure out is whether there is any way to calculate the age of the horses for each specific year I have information on: the "Total" row would have their age up until 2016-12-31, the 2015 row would have their age at that time, and so on. I've been trying to include if statements in age_calc, but it won't work, and I'm really at a loss on how best to do this.
Any literature or help you could point me to would be much, much appreciated.
MWE
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
The trick is to filter out the "Total" rows and specify a format for the as.Date() function.
library(eeptools)
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
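travdata$Age <- age_calc(as.Date(travdata$Birth.date),
                         enddate = as.Date("2016-12-31"), units = "years")
competitions <- travdata[travdata$Competition.year != "Total", ]
# format = "%Y" alone fills in today's month and day, so the result would depend on
# when the code is run; use the last day of each competition year instead
competitions$Competition.age <- age_calc(
  as.Date(competitions$Birth.date),
  enddate = as.Date(paste0(competitions$Competition.year, "-12-31"), format = "%Y-%m-%d"),
  units = "years", precise = FALSE)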
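If you want one column that covers both the yearly rows and the "Total" rows, one option is to build an end date for every row and compute whole completed years in base R. This is a swapped-in technique, not eeptools::age_calc(), so its output may differ from age_calc()'s. The helper age_years() is hypothetical. The sketch assumes each competition year's age is measured at 31 December of that year, and at 2016-12-31 for "Total" rows.

```r
# Hypothetical helper: whole completed years between birth and end dates
age_years <- function(birth, end) {
  b <- as.POSIXlt(as.Date(birth))
  e <- as.POSIXlt(as.Date(end))
  age <- e$year - b$year
  # one year less if the birthday has not been reached by the end date
  not_yet <- e$mon < b$mon | (e$mon == b$mon & e$mday < b$mday)
  age - not_yet
}

comp_year <- c("Total", "2005", "2004")
birth <- rep("1998-07-01", 3)
# "Total" rows are measured at 2016-12-31, other rows at the end of their year
end <- ifelse(comp_year == "Total", "2016-12-31", paste0(comp_year, "-12-31"))
ages <- age_years(birth, end)
```

On the MWE, the same idea would be travdata$Age.at.year <- age_years(travdata$Birth.date, ifelse(travdata$Competition.year == "Total", "2016-12-31", paste0(travdata$Competition.year, "-12-31"))), where Age.at.year is a hypothetical column name.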
