How can I plot a histogram of crime type vs HOURS in R?

I have a big dataset with different variables, and I want to make a histogram of type of crime against HOURS. How can I do that in R?
DATE TIME PLACE ZONE TYPE.OF.CRIME WEEK
1 2011/01/01 23:00 KIEPIES CLUB <NA> ARMED ROBBERY 1
2 2011/01/03 10:00 AUSSPANNPLATZ Zone 14 ARMED ROBBERY 1
3 2011/01/07 14:00 UNAM BUSHES Zone 16 ARMED ROBBERY 1
4 2011/01/08 21:34 TOTAL SERV. STATION, KHOMASDAL Zone 9 ARMED ROBBERY 1
5 2011/01/15 <NA> WOODPALM STR 625 Zone 11 ARMED ROBBERY 2
6 2011/01/03 14:03 C KANDOVAZU STR Zone 5 ASSAULT GBH 1
HOUR day month year HOURS
1 23 1 1 2011 23
2 10 3 1 2011 10
3 14 7 1 2011 14
4 21 8 1 2011 21
5 <NA> 15 1 2011 <NA>
6 14 3 1 2011 14

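geom_histogram() takes a single positional aesthetic and needs a continuous variable, so mapping the discrete TYPE.OF.CRIME to x and HOURS to y fails. Put HOURS on the x axis and map the crime type to fill instead (or add facet_wrap(~ TYPE.OF.CRIME) for one panel per type). Something like this should work:
library(ggplot2)
df$HOURS <- as.integer(as.character(df$HOURS))  # HOURS prints as text (<NA>), so make it numeric
ggplot(subset(df, !is.na(HOURS)), aes(x = HOURS, fill = TYPE.OF.CRIME)) +
  geom_histogram(binwidth = 1)  # one bin per hour, stacked by crime type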
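If ggplot2 isn't available, or you also want the counts as a table, base R can do the same job. This is a minimal sketch. The six rows are copied from the question's printout, and HOURS is assumed to be stored as text, as the <NA> entries suggest. The names crimes and counts are illustrative.

```r
# Toy rows copied from the question's printout (assumption: HOURS is text)
crimes <- data.frame(
  TYPE.OF.CRIME = c(rep("ARMED ROBBERY", 5), "ASSAULT GBH"),
  HOURS = c("23", "10", "14", "21", NA, "14"),
  stringsAsFactors = FALSE
)
crimes$HOURS <- as.integer(crimes$HOURS)

# Cross-tabulate crime type (rows) against hour of day (columns 0-23);
# the row with a missing hour is dropped because table() ignores NA by default
counts <- table(crimes$TYPE.OF.CRIME, factor(crimes$HOURS, levels = 0:23))

# Stacked bar chart: one bar per hour, one shaded segment per crime type
barplot(counts, legend.text = TRUE, xlab = "Hour of day", ylab = "Number of crimes")
```

Treating the hour as a factor with fixed levels 0 to 23 keeps the empty hours on the axis, so the bars line up with the clock.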

Related

Impute missing records based on week and year in r

I want to impute missing weekly records with 0 in the duration column for each household/individual combination.
The minimum week here is w51 of 2021 and the maximum is w4 of 2022.
For the household 1001 / individual 1 combination, week 3 is missing from the sequence.
For household 1002 / individual 2, weeks 52, 2 and 4 are missing.
Final dataset would be:
What I tried was the complete() function from tidyr after grouping by household and individual, but it's not working.
In the actual dataset, the minimum and maximum weeks will change.
Here is the sample dataset:
data <- data.frame(household=c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
individual = c(1,1,1,1,1,2,2,2,1,1,1),
year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
week =c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
duration =c(20,23,24,56,78,12,34,67,87,89,90))
Using the examples on the ?complete help page, you can use nesting() to get what you want:
library(tidyr)  # complete() and nesting(); tidyr also re-exports %>%
data %>%
  complete(nesting(household, individual), nesting(year, week), fill = list(duration = 0))
# household individual year week duration
# <dbl> <dbl> <dbl> <chr> <dbl>
# 1 1001 1 2021 w51 20
# 2 1001 1 2021 w52 23
# 3 1001 1 2022 w1 24
# 4 1001 1 2022 w2 56
# 5 1001 1 2022 w3 0
# 6 1001 1 2022 w4 78
# 7 1002 2 2021 w51 12
# 8 1002 2 2021 w52 0
# 9 1002 2 2022 w1 34
# 10 1002 2 2022 w2 0
# 11 1002 2 2022 w3 67
# 12 1002 2 2022 w4 0
# 13 1003 1 2021 w51 0
# 14 1003 1 2021 w52 0
# 15 1003 1 2022 w1 87
# 16 1003 1 2022 w2 89
# 17 1003 1 2022 w3 90
# 18 1003 1 2022 w4 0
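nesting(year, week) only uses year/week pairs that already occur somewhere in the data. A week that is missing for every household would therefore not be added. If you need the full run of weeks between the minimum and the maximum, this base R sketch builds that run explicitly. It assumes every year has exactly 52 weeks labelled w1 to w52, so an ISO year with a week 53 would break it. fill_weeks is a hypothetical helper name.

```r
# Sample data from the question
data <- data.frame(household = c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
                   individual = c(1,1,1,1,1,2,2,2,1,1,1),
                   year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
                   week = c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
                   duration = c(20,23,24,56,78,12,34,67,87,89,90),
                   stringsAsFactors = FALSE)

# Hypothetical helper: add every week between the earliest and latest observed
# week for every household/individual, with duration 0 where no record exists.
# Assumes weeks are labelled w1..w52 in every year.
fill_weeks <- function(data, weeks_per_year = 52) {
  wk <- as.integer(sub("^w", "", data$week))
  # Running week index counted from week 1 of the earliest year
  idx <- (data$year - min(data$year)) * weeks_per_year + (wk - 1)
  all_idx <- seq(min(idx), max(idx))
  calendar <- data.frame(year = min(data$year) + all_idx %/% weeks_per_year,
                         week = paste0("w", all_idx %% weeks_per_year + 1),
                         stringsAsFactors = FALSE)
  people <- unique(data[, c("household", "individual")])
  grid <- merge(people, calendar, by = NULL)  # cross join: every person x every week
  out <- merge(grid, data, all.x = TRUE)      # keep grid rows that have no record
  out$duration[is.na(out$duration)] <- 0
  out <- out[order(out$household, out$individual, out$year,
                   as.integer(sub("^w", "", out$week))), ]
  rownames(out) <- NULL
  out
}

filled <- fill_weeks(data)
filled
```

Because the calendar comes from the minimum and maximum of the data, it adapts automatically when those bounds change.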

How to calculate the number of months from the initial date for each individual

This is a representation of my dataset
ID<-c(rep(1,10),rep(2,8))
year<-c(2007,2007,2007,2008,2008,2009,2010,2009,2010,2011,
2008,2008,2009,2010,2009,2010,2011,2011)
month<-c(2,7,12,4,11,6,11,1,9,4,3,6,7,4,9,11,2,8)
mydata<-data.frame(ID,year,month)
I want to calculate for each individual the number of months from the initial date. I am using two variables: year and month.
First I sort by year and month:
mydata2<-mydata%>%group_by(ID,year)%>%arrange(year,month,.by_group=T)
Then I created the date variable, taking the 1st as the day of each month:
mydata2$date<-paste("01",mydata2$month,mydata2$year,sep = "-")
Then I used lubridate to convert this variable to Date format:
mydata2$date<-dmy(mydata2$date)
But after this, I don't know how to get to the dataset below (preferably using dplyr code):
ID year month date dif_from_init
1 1 2007 2 01-2-2007 0
2 1 2007 7 01-7-2007 5
3 1 2007 12 01-12-2007 10
4 1 2008 4 01-4-2008 14
5 1 2008 11 01-11-2008 21
6 1 2009 1 01-1-2009 23
7 1 2009 6 01-6-2009 28
8 1 2010 9 01-9-2010 43
9 1 2010 11 01-11-2010 45
10 1 2011 4 01-4-2011 50
11 2 2008 3 01-3-2008 0
12 2 2008 6 01-6-2008 3
13 2 2009 7 01-7-2009 16
14 2 2009 9 01-9-2009 18
15 2 2010 4 01-4-2010 25
16 2 2010 11 01-11-2010 32
17 2 2011 2 01-2-2011 35
18 2 2011 8 01-8-2011 41
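One way could be:
library(dplyr)
mydata %>%
  group_by(ID) %>%
  # date[1] is each ID's first row, which here is also its earliest month
  mutate(date = as.Date(sprintf('%d-%d-01', year, month)),
         diff = as.numeric(round((date - date[1]) / 365 * 12)))  # days -> approx. months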
# A tibble: 18 x 5
# Groups: ID [2]
ID year month date diff
<dbl> <dbl> <dbl> <date> <dbl>
1 1 2007 2 2007-02-01 0
2 1 2007 7 2007-07-01 5
3 1 2007 12 2007-12-01 10
4 1 2008 4 2008-04-01 14
5 1 2008 11 2008-11-01 21
6 1 2009 6 2009-06-01 28
7 1 2010 11 2010-11-01 45
8 1 2009 1 2009-01-01 23
9 1 2010 9 2010-09-01 43
10 1 2011 4 2011-04-01 50
11 2 2008 3 2008-03-01 0
12 2 2008 6 2008-06-01 3
13 2 2009 7 2009-07-01 16
14 2 2010 4 2010-04-01 25
15 2 2009 9 2009-09-01 18
16 2 2010 11 2010-11-01 32
17 2 2011 2 2011-02-01 35
18 2 2011 8 2011-08-01 41
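Every date here falls on the 1st of a month, so you can compute the month difference exactly with integer arithmetic instead of converting days to months and rounding. It also does not depend on the earliest month being the first row. Below is a base R sketch. month_index is a helper column introduced here. In dplyr, the same idea would be mutate(dif_from_init = (year * 12 + month) - min(year * 12 + month)) after group_by(ID).

```r
ID <- c(rep(1, 10), rep(2, 8))
year <- c(2007,2007,2007,2008,2008,2009,2010,2009,2010,2011,
          2008,2008,2009,2010,2009,2010,2011,2011)
month <- c(2,7,12,4,11,6,11,1,9,4,3,6,7,4,9,11,2,8)
mydata <- data.frame(ID, year, month)

# Put every year/month pair on one running month scale
mydata$month_index <- mydata$year * 12 + mydata$month
# Subtract each individual's earliest month; ave() applies min() within each ID
mydata$dif_from_init <- mydata$month_index - ave(mydata$month_index, mydata$ID, FUN = min)
# Sort by individual, then chronologically
mydata <- mydata[order(mydata$ID, mydata$month_index), ]
mydata
```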

Using dplyr to summarize a variable and calculate the mean of another dependent on other variables

In the dataset below, I want to sum the rentals grouped by city and date, and additionally calculate the mean duration grouped by date and city.
date rentals City duration
<date> <dbl> <fct> <drtn>
1 2014-01-01 1 Hamburg 15 mins
2 2014-01-01 1 Hamburg 18 mins
3 2014-01-01 1 Vienna 13 mins
4 2014-01-02 1 Vienna 1 mins
5 2014-01-02 1 Hamburg 8 mins
6 2014-01-02 1 Berlin 4 mins
7 2014-01-03 1 Hamburg 13 mins
8 2014-01-03 1 Hamburg 2 mins
9 2014-01-03 1 Berlin 4 mins
10 2014-01-04 1 Hamburg 17 mins
...
I'd like to use dplyr and tried the following:
df <- df %>%
group_by(date, city) %>%
summarise((rentals=sum(rentals)), duration=mean(duration))
I end up having only one row left with the summarized rentals and the mean overall duration. It seems that it just ignored my group_by function.
Would be great to get some help :)
I think you're just not capitalizing City properly (I also dropped the extra parentheses around rentals=sum(rentals)). This works for me:
library(dplyr)
df <- read.table(text = "date rentals City duration
1 2014-01-01 1 Hamburg 15
2 2014-01-01 1 Hamburg 18
3 2014-01-01 1 Vienna 13
4 2014-01-02 1 Vienna 1
5 2014-01-02 1 Hamburg 8
6 2014-01-02 1 Berlin 4
7 2014-01-03 1 Hamburg 13
8 2014-01-03 1 Hamburg 2
9 2014-01-03 1 Berlin 4
10 2014-01-04 1 Hamburg 17 ")
df2 <- df %>%
group_by(date, City) %>%
summarise(rentals=sum(rentals), duration=mean(duration))
df2 output:
# A tibble: 8 x 4
# Groups: date [4]
date City rentals duration
<chr> <chr> <int> <dbl>
1 2014-01-01 Hamburg 2 16.5
2 2014-01-01 Vienna 1 13
3 2014-01-02 Berlin 1 4
4 2014-01-02 Hamburg 1 8
5 2014-01-02 Vienna 1 1
6 2014-01-03 Berlin 1 4
7 2014-01-03 Hamburg 2 7.5
8 2014-01-04 Hamburg 1 17
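For comparison, base R's aggregate() can produce the same per-date, per-city summary without dplyr. This sketch uses the same ten rows as above, with the row-number column removed. The intermediate names rentals, durations and res are illustrative.

```r
df <- read.table(text = "date rentals City duration
2014-01-01 1 Hamburg 15
2014-01-01 1 Hamburg 18
2014-01-01 1 Vienna 13
2014-01-02 1 Vienna 1
2014-01-02 1 Hamburg 8
2014-01-02 1 Berlin 4
2014-01-03 1 Hamburg 13
2014-01-03 1 Hamburg 2
2014-01-03 1 Berlin 4
2014-01-04 1 Hamburg 17", header = TRUE, stringsAsFactors = FALSE)

# Total rentals per date and city
rentals <- aggregate(rentals ~ date + City, data = df, FUN = sum)
# Mean duration per date and city
durations <- aggregate(duration ~ date + City, data = df, FUN = mean)
# Join the two summaries on their shared grouping columns (date, City)
res <- merge(rentals, durations)
res <- res[order(res$date, res$City), ]
res
```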

Change month that is in numbers to month like 1 to January, 2 to February, etc [duplicate]

This question already has answers here:
Convert month's number to Month name
Convert a numeric month to a month abbreviation
I want to convert month numbers to month names in R. How can I do that? For example, "1" should become "January". The dataset is big, so here are just the first 6 rows.
X place ZONE TYPE.OF.CRIME year month WEEK day HOUR
1 1 KIEPIES CLUB NA ARMED ROBBERY 2011 1 1 1 23
2 2 AUSSPANNPLATZ 14 ARMED ROBBERY 2011 1 1 3 10
3 3 UNAM BUSHES 16 ARMED ROBBERY 2011 1 1 7 14
4 4 TOTAL SERV. STATION, KHOMASDAL 9 ARMED ROBBERY 2011 1 1 8 21
5 5 WOODPALM STR 625 11 ARMED ROBBERY 2011 1 2 15 <NA>
6 6 C KANDOVAZU STR 5 ASSAULT GBH 2011 1 1 3 14
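The linked duplicates cover this in detail. In short, base R ships the constants month.name and month.abb, which hold English month names regardless of locale, so you can use a numeric month column directly as an index. The small month vector below is made-up example data, not the crime dataset.

```r
crimes <- data.frame(month = c(1, 1, 2, 12))

# month.name is the built-in vector "January", ..., "December"
crimes$month_name <- month.name[crimes$month]
# month.abb holds the three-letter abbreviations "Jan", ..., "Dec"
crimes$month_abb <- month.abb[crimes$month]
# Optional: a factor with calendar-ordered levels, so tables and plots
# sort January..December instead of alphabetically
crimes$month_fac <- factor(crimes$month_name, levels = month.name)
crimes
```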

How to lump sum the number of days in data spanning several years?

I have data similar to this. I would like to "lump sum" the days (I'm not sure that is the right term) and create a new column "date" that counts the days cumulatively across the 3 years of data, in ascending order.
year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24
I wrote this code, but the result was wrong and it is also too long. It doesn't handle February correctly, since February has only 28 days. Is there a shorter way?
cday <- function(data,syear=2011,smonth=1,sday=1){
year <- data[1]
month <- data[2]
day <- data[3]
cmonth <- c(0,31,28,31,30,31,30,31,31,30,31,30,31)
date <- (year-syear)*365+sum(cmonth[1:month])+day
for(yr in c(syear:year)){
if(yr==year){
if(yr%%4==0&&month>2){date<-date+1}
}else{
if(yr%%4==0){date<-date+1}
}
}
return(date)
}
op10$day.no <- apply(op10[,c("year","month","day")],1,cday)
I expect the result like this:
year month day date
2011 1 5 5
2011 1 14 14
2011 1 21 21
2011 1 24 24
2011 2 3 31
2011 2 4 32
2011 2 6 34
2011 2 14 42
2011 2 17 45
2011 2 24 52
Thank you for helping!!
Use Date classes. Dates and times are complicated, so look for tools that do this for you rather than writing your own. Pick whichever of these you want:
df$date = with(df, as.Date(paste(year, month, day, sep = "-")))
df$julian_day = as.integer(format(df$date, "%j"))
df$days_since_2010 = as.integer(df$date - as.Date("2010-12-31"))
df
# year month day date julian_day days_since_2010
# 1 2011 1 5 2011-01-05 5 5
# 2 2011 2 14 2011-02-14 45 45
# 3 2011 8 21 2011-08-21 233 233
# 4 2012 2 24 2012-02-24 55 420
# 5 2012 3 3 2012-03-03 63 428
# 6 2012 4 4 2012-04-04 95 460
# 7 2012 5 6 2012-05-06 127 492
# 8 2013 2 14 2013-02-14 45 776
# 9 2013 5 17 2013-05-17 137 868
# 10 2013 6 24 2013-06-24 175 906
# using this data
df = read.table(text = "year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24", header = TRUE)
This is all base R. If you handle dates and times frequently, you may also want to look at the lubridate package.
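The same Date arithmetic can directly replace the asker's cday() function, and it handles leap years automatically. This is a minimal sketch that keeps cday()'s convention: day 1 is 1 January of the start year, which defaults to 2011. day_number is a hypothetical name. It is vectorised, so no apply() call is needed.

```r
# Hypothetical vectorised replacement for cday(): day 1 = the start date
day_number <- function(year, month, day, start = as.Date("2011-01-01")) {
  d <- as.Date(sprintf("%04d-%02d-%02d",
                       as.integer(year), as.integer(month), as.integer(day)))
  as.integer(d - start) + 1L  # difference in days, counting the start date as day 1
}

# A few rows from the question's data
df <- data.frame(year = c(2011, 2011, 2012, 2012, 2013),
                 month = c(1, 2, 2, 3, 6),
                 day = c(5, 14, 24, 3, 24))
df$day.no <- with(df, day_number(year, month, day))
df
```

2012 is a leap year, so 1 March 2012 lies two days after 28 February 2012. Date subtraction accounts for this without any manual %% 4 checks.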
