Graphing crime rates in Stata - graph

I am new to Stata. I am trying to write code in Stata that would graph Idaho's crime rate from 2016 to 2021. I'm using data from the FBI (https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/downloads#nibrs-downloads). Does anyone have any tips?

Can you give more detail about how the original data represent the crime rate?
If you have the year and the corresponding crime rate,
you can try
twoway (line rate year)

A bit more information would be helpful, but based on what you sent, this might be of use. It looks like you would need the "NIBRS_incident.csv" file for each year from 2016 to 2021. After downloading these files, append them to create a working file with all 6 years of data. Once you've done that, you can create a count variable by year (bysort sorts the data first, which a plain by: requires):
bysort data_year: gen Incidence_By_Yr = _N
Then follow the comment above and draw the twoway graph, using data_year as the year variable:
twoway (line Incidence_By_Yr data_year)

Related

Forecasting Hospital Bed Demand Using Daily Observations

Basically, my task for the next 3 months is to forecast bed demand and a couple of other variables in a hospital's emergency department. The data is 5 years' worth of daily observations of these variables. The data is complete with no missing values.
The goal is to improve the prediction accuracy of the current tool, which is an Excel workbook.
I have not taken any time series or optimization courses in college thus far, so imagine my horror when I realised I had no clue how to approach this project and that I would be working entirely alone. I was told no one in the department has any experience and no one would be able to help me.
I'm using RStudio, but I'm not very proficient since it was self-taught.
From trying out the questions asked on here as well as YouTube tutorials to learn the appropriate syntax and functions, what I have managed to find out is:
1) My data is a time series and I should apply forecasting models to predict future values based on the historical data I have.
2) A long time series of daily observations has weekly and annual seasonality, so I should define the data as a multi-seasonal time series.
I first tried defining my data as ts(), then msts(). One of the answers here mentioned zoo() would be more appropriate for daily observations, so I tried that too. The forecasting models I've tried are snaive, ets, auto.arima and TBATS.
I would like to present the plots of the values/forecasts by day of the week rather than for all 365 days of the year, which is the only output I could plot. I tried using frequency = 365 and 7, and start = c(2014, 1) and end = c(2018, 365), but I haven't had any luck.
I would really appreciate any advice and help I could get from anyone. Thank you!
Without looking at your data, have you tried getting started with some basic ARIMA modeling and seeing what results you get from that? It's a fairly friendly way to get started with time series forecasting, depending on your data. I was forecasting by the hour, but the frequency can be adjusted to whatever interval you need to forecast. As you mentioned, you are looking to change the frequency. Sometimes it's easier to see a pattern at larger time intervals, so you can aggregate your data to larger intervals.
For example, this converts daily observations to monthly.
library(xts)
dates <- seq(as.Date('2012-01-01'),as.Date('2019-03-31'),by='days')
beds$date.formatted <- dates
beds.xts <- xts(x=beds$neds.count,as.POSIXct(paste(beds$date.formatted)))
end.month <- endpoints(beds.xts,'months')
beds.month <- period.apply(beds.xts,end.month,sum)
beds.monthly.df <- data.frame(date=index(beds.month),coredata(beds.month))
colnames(beds.monthly.df) <- c('Date','Sessions')
beds.monthly <- ts(beds.monthly.df$Sessions,start=c(2012,1),end=c(2019,3),frequency=12)
plot(beds.monthly)
I'm not sure if that answers your question, but since you mentioned you are self-taught and starting out, I can share a script to help you get started with an example. It goes through the whole process: checking that you have read your data in as a time series, what time series data is, how to check for non-stationary data and seasonality trends, plots that are useful for this, modeling, prediction, plotting actual vs predicted, accuracy, and further issues with the data that could be hindering your model. The video tutorial series is scripted in Python, but you can follow the end-to-end process of forecasting with ARIMA using the equivalent R script for this tutorial: https://code.datasciencedojo.com/rebeccam/tutorials/blob/master/Time%20Series/r_time_series_example.R
https://tutorials.datasciencedojo.com/time-series-python-reading-data/
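For the day-of-the-week view the question asks about, one possible approach is to derive a weekday variable from the date and summarise by it. This is only a minimal sketch in base R: the data frame `beds`, its columns `date` and `count`, and the simulated values are all made up for illustration and stand in for the real data.

```r
# Hypothetical daily data: 5 years of simulated counts (not real hospital data)
dates <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "day")
set.seed(42)
beds <- data.frame(date = dates, count = rpois(length(dates), lambda = 50))

# Day of week as a number, 1 = Monday ... 7 = Sunday (does not depend on locale)
beds$dow <- as.integer(format(beds$date, "%u"))

# Mean count for each day of the week
dow_mean <- tapply(beds$count, beds$dow, mean)
names(dow_mean) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# One bar per weekday instead of one point per calendar day
barplot(dow_mean, xlab = "Day of week", ylab = "Mean count")

# For models that should only see the weekly pattern, frequency = 7
# makes one seasonal cycle equal to one week
beds_ts <- ts(beds$count, frequency = 7)
```

The same weekday variable could be applied to forecast values once they are attached to future dates.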

Plotting topic prevalence for each group [Structural Topic Modeling R]

Community,
I have a question regarding the STM package for R and hope that you can help me find an answer.
In figure 7 of the vignette the authors present a graph, where the topic prevalence (for topic 7) over time can be seen. Is it possible to plot the same graph by adding two further lines: one for liberal and one for conservative?
Liberal and conservative are attributes of the variable "rating"
The plot in figure 7 shows you the topic proportion of one topic (in this case topic 7) over a time span of January 2008 to December 2008. Basically, this graph shows you how your topic is distributed over a specific time frame. You can plot other topics in this graph, so yes, you can add more lines, but you cannot add a variable, like liberal/conservative, to plot in this graph.
Also, you might want to have a look at this -- it would be useful to add a reproducible example for clarity
How to make a great R reproducible example

ACF on parts of dataset?

R noob here. I am running ACFs in R to check autocorrelation in my data before running other tests.
Now I am running into 2 problems. I have time-series data for 26 years (1990-2016).
Problem 1. For some of my variables a couple of years have missing data (1995-1997). For these specific variables I would like to start the acf at the year 1998. Is this possible?
Problem 2. One variable has multiple years with missing data throughout the time-series, but only for odd years. Is it possible to do an acf for only even years?
I could manually adjust the data but would prefer to keep it as one dataset.
Thank you!
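Both problems can probably be handled by subsetting the series just before calling acf(), which leaves the original dataset untouched. The sketch below uses base R only; the series `x` and `y` are simulated stand-ins for the real variables, and the missing-value patterns are assumed from the description above.

```r
# Hypothetical annual series for 1990-2016 (27 values each); replace with real data
set.seed(1)
x <- ts(rnorm(27), start = 1990, frequency = 1)
yrs <- as.numeric(time(x))
x[yrs >= 1995 & yrs <= 1997] <- NA      # assumed gap in 1995-1997

# Problem 1: start the ACF at 1998 without changing x itself
x_from_1998 <- window(x, start = 1998)
acf_1998 <- acf(x_from_1998, plot = FALSE)

# Problem 2: a series with values only in even years
y <- ts(rnorm(27), start = 1990, frequency = 1)
y[yrs %% 2 == 1] <- NA                  # assumed: odd years are missing

# Keep the even years and rebuild a ts with a 2-year step,
# so lag 1 of this ACF means "two years apart"
y_even <- ts(y[yrs %% 2 == 0], start = 1990, deltat = 2)
acf_even <- acf(y_even, plot = FALSE)
```

If gaps remain inside the chosen window, acf() also accepts `na.action = na.pass`, which computes the estimate from the available pairs.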

band filtering based on PSD, to filter out frequency domains in r, probably using "buttord" from signal

I'm still a novice in R and I read quite a couple of posts and discussions on how to filter out frequency domains in a time series, but none of those quite matched my problem.
I would like to ask for your suggestions about the following:
I calculated wavelet coherence for two annually measured time series and taking a look at the wavelet coherence PSD graph:
The purple line (i.e. 8 year period) represents the border under which I would like to filter out the frequency domain, but not in the PSD, but in the original input data.
I thought about using the butter function from the signal package, but it was overcomplicated for my purposes.
Thus I approached the problem with the bwfilter function of the mFilter package to pass through the data over the 8 year period which corresponds to 2.37E-7 Hz.
name="dta OAK.resid Tair "
adat=read.table(file=paste(name,".csv", sep=""), sep=";", header=T)
dta=adat$ya
highpass <- bwfilter(dta, freq=8,drift=FALSE)
plot(highpass)
However, the results do not seem to be correct: it seems to filter out too much from the data, and the trend is too closely aligned with the original time series.
Do you have any idea what may have gone wrong? The measurement unit maybe?
Any help is appreciated and if any additional details are needed I am happy to provide them!
Thank you!
The data can be found here

matching among multiple variables in R

I am a beginner in R, so I was not sure how to title my question. Sorry for that. Let me try to explain.
My professor gave me a NetCDF atmospheric data file (18.3 MB). The file has 8 dimensions and 8 variables, and I have to work with 4 of them. Each of these variables (time, site number, urban site, pm10) has 683016 values. Suppose:
Urban site number: [2,5],
site number: [1,2,3,4,5,6],
time: [1-3-2012, 2-3-2012, ...] (hourly data, 24 values per day),
pm10: [1,2,3,4,5,6, ...] (different for every hour, with some missing values)
I have to restrict this data set to urban sites only and to 1-3-2012 (in effect, I have to turn this spatio-temporal data into spatial data). I want my final data set to look like this:
Column 1 (time): 1-3-2012,1-3-2012,1-3-2012,1-3-2012,1-3-2012,1-3-2012
Column 2 (urban site number): 2,2,2,5,5,5
Column 3 (pm10 value): 1,2,3,NA,4,5
As I only know very basic R commands, I can't work out how to solve this problem. I don't even know how to find an example of this type of problem on the internet.
So please give me some suggestions or links about what I have to learn to solve this problem in R.
I think you're trying to reshape the dataset, but I'm afraid I can't see what your current dataset looks like.
Could you elaborate on what your dataset looks like right now?
There are packages that help with reshaping, such as {reshape} or {plyr}, but I need more detail to suggest which one you should use.
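If the four variables can be placed side by side in a data frame, the task described may reduce to a row filter rather than a reshape. The sketch below assumes exactly that layout; the data frame `df`, its column names, the 3 hours per day, and the reading of "1-3-2012" as 1 March 2012 are all hypothetical, and reading the NetCDF file itself (for example with the ncdf4 package) is not shown.

```r
# Hypothetical long-format data: 6 sites, 2 days, 3 hourly values per day
df <- data.frame(
  date = rep(as.Date(c("2012-03-01", "2012-03-02")), each = 18),
  site = rep(rep(1:6, each = 3), times = 2),
  hour = rep(1:3, times = 12),
  pm10 = as.numeric(1:36)
)
df$pm10[13] <- NA                 # one made-up missing measurement

urban_sites <- c(2, 5)            # assumed list of urban site numbers
target_day  <- as.Date("2012-03-01")

# Keep only rows that belong to an urban site AND fall on the target day
result <- df[df$site %in% urban_sites & df$date == target_day,
             c("date", "site", "pm10")]
print(result)
```

Rows with missing pm10 values are kept as NA, matching the desired output in the question.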
