Creating panel data from cross-sectional and time series data - r

I have two data sets.
The first is cross-sectional: data on 1,600 corn fields.
The second is daily weather data (year, month, day) from 2013 to 2020 for the counties in which the fields are located.
I want to combine the two into a panel data set.
The problem is joining the weather data to each field for every year, month, and day (like VLOOKUP in Excel).
For example, take the two tables below.
Cross-section
ID  county    Longtitude  Latitude
1   Texas     -101.8259   36.99026
2   Cimarron  -101.7264   36.99253
3   Texas     -101.8038   36.99012
4   Cimarron  -101.9427   36.97605
5   Cimarron  -102.2219   36.96172
6   Beaver    -102.0777   36.96919
7   Beaver    -101.6181   36.98999
Time series data
YEAR  MONTH  DAY  county  A   B
2013  1      1    Texas   1   5
2013  1      2    Texas   2   6
2013  1      3    Texas   3   7
2013  1      4    Texas   4   8
2014  1      1    Texas   9   10
2014  1      2    Texas   11  12
2014  1      3    Texas   13  14
2014  1      4    Texas   15  16
The data I want to create is below.
ID  county    Longtitude  Latitude  YEAR  MONTH  DAY  county    A   B
1   Texas     -101.8259   36.99026  2013  1      1    Texas     1   5
2   Cimarron  -101.7264   36.99253  2013  1      1    Cimarron  -   -
3   Texas     -101.8038   36.99012  2013  1      1    Texas     1   5
4   Cimarron  -101.9427   36.97605  2013  1      1    Cimarron  -   -
5   Cimarron  -102.2219   36.96172  2013  1      1    Cimarron  -   -
6   Beaver    -102.0777   36.96919  2013  1      1    Beaver    -   -
7   Beaver    -101.6181   36.98999  2013  1      1    Beaver    -   -
1   Texas     -101.8259   36.99026  2013  1      2    Texas     2   6
2   Cimarron  -101.7264   36.99253  2013  2      1    Cimarron  1   5
3   Texas     -101.8038   36.99012  2013  1      2    Texas     2   6
4   Cimarron  -101.9427   36.97605  2013  1      2    Cimarron  2   6
5   Cimarron  -102.2219   36.96172  2013  1      2    Cimarron  -
6   Beaver    -102.0777   36.96919  2013  1      2    Beaver    1   5
7   Beaver    -101.6181   36.98999  2013  1      2    Beaver    1   5
…
1   Texas     -101.8259   36.99026  2014  1      4    Texas     15  16
2   Cimarron  -101.7264   36.99253  2014  2      4    Cimarron  15  16
3   Texas     -101.8038   36.99012  2014  1      2    Texas     15  16
4   Cimarron  -101.9427   36.97605  2014  1      4    Cimarron  -   -
5   Cimarron  -102.2219   36.96172  2014  1      4    Cimarron  -   -
6   Beaver    -102.0777   36.96919  2014  1      4    Beaver    -   -
7   Beaver    -101.6181   36.98999  2014  1      4    Beaver    -   -
Here "-" stands for a number other than NA.
In other words, for each date I want to join that date's weather onto the cross-section with a function like left_join, and then rbind the results to create panel data.
For example, to build a panel from January 1, 2013 to January 1, 2014,
I need 7 observations * 3 (counties) * 365 (dates).
My real data is much larger than this (1600 observations, 77 counties, 10 years), so I am using a loop.
Any ideas would be appreciated!
edit:
In other words, considering only January 1st and January 2nd: left_join variables such as tavg onto the 1600 fields of table 1 using the data from the 1st, repeat the same process for the 2nd, and then combine the two results. That generates 1600*2 rows (of course, the values of the tavg variables differ between the 1st and the 2nd). I have to repeat this process for 10 years.
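The per-date join-and-rbind described above can probably be done without a loop. The following is only a minimal sketch in base R, assuming the two tables look like the small examples in the question; the object names fields, weather, dates, grid, and panel are made up for illustration. It first builds every field-by-date combination and then left-joins the county weather onto it, so fields whose county has no weather row get NA.

```r
# Toy versions of the two tables from the question (values copied from the example).
fields <- data.frame(
  ID = 1:7,
  county = c("Texas", "Cimarron", "Texas", "Cimarron", "Cimarron", "Beaver", "Beaver"),
  Longtitude = c(-101.8259, -101.7264, -101.8038, -101.9427, -102.2219, -102.0777, -101.6181),
  Latitude = c(36.99026, 36.99253, 36.99012, 36.97605, 36.96172, 36.96919, 36.98999)
)
weather <- data.frame(
  YEAR = rep(c(2013, 2014), each = 4),
  MONTH = 1,
  DAY = rep(1:4, times = 2),
  county = "Texas",
  A = c(1, 2, 3, 4, 9, 11, 13, 15),
  B = c(5, 6, 7, 8, 10, 12, 14, 16)
)

# Every date that occurs in the weather table.
dates <- unique(weather[c("YEAR", "MONTH", "DAY")])

# Cross join: one row per field per date (by = NULL gives the Cartesian product).
grid <- merge(fields, dates, by = NULL)

# Left join: attach the county's weather for each date; unmatched rows get NA.
panel <- merge(grid, weather, by = c("county", "YEAR", "MONTH", "DAY"), all.x = TRUE)

# Sort so that each date forms one block of fields, as in the desired output.
panel <- panel[order(panel$YEAR, panel$MONTH, panel$DAY, panel$ID), ]
```

With the toy data this gives 7 fields times 8 dates = 56 rows. If dplyr is already in use, left_join with the same four key columns should play the role of the second merge.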

Related

Repeating annual values multiple times to form a monthly dataframe

I have an annual dataset as below:
year <- c(2016,2017,2018)
xxx <- c(1,2,3)
yyy <- c(4,5,6)
df <- data.frame(year,xxx,yyy)
print(df)
year xxx yyy
1 2016 1 4
2 2017 2 5
3 2018 3 6
Where the values in column xxx and yyy correspond to values for that year.
I would like to expand this dataframe (or create a new dataframe), which retains the same column names, but repeats each value 12 times (corresponding to the month of that year) and repeat the yearly value 12 times in the first column.
As mocked up by the code below:
year <- rep(2016:2018,each=12)
xxx <- rep(1:3,each=12)
yyy <- rep(4:6,each=12)
df2 <- data.frame(year,xxx,yyy)
print(df2)
year xxx yyy
1 2016 1 4
2 2016 1 4
3 2016 1 4
4 2016 1 4
5 2016 1 4
6 2016 1 4
7 2016 1 4
8 2016 1 4
9 2016 1 4
10 2016 1 4
11 2016 1 4
12 2016 1 4
13 2017 2 5
14 2017 2 5
15 2017 2 5
16 2017 2 5
17 2017 2 5
18 2017 2 5
19 2017 2 5
20 2017 2 5
21 2017 2 5
22 2017 2 5
23 2017 2 5
24 2017 2 5
25 2018 3 6
26 2018 3 6
27 2018 3 6
28 2018 3 6
29 2018 3 6
30 2018 3 6
31 2018 3 6
32 2018 3 6
33 2018 3 6
34 2018 3 6
35 2018 3 6
36 2018 3 6
Any help would be greatly appreciated!
I'm new to R and I can see how I would do this with a loop statement but was wondering if there was an easier solution.
Convert df to a matrix, take the Kronecker product with a vector of 12 ones, and then convert back to a data.frame. The as.data.frame can be omitted if a matrix result is OK. The Kronecker product drops the column names, so setNames restores them.
setNames(as.data.frame(as.matrix(df) %x% rep(1, 12)), names(df))
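An alternative that avoids the matrix round trip is to repeat the row indices. This is a small sketch using the question's df; the month column is an assumption added for illustration and is not part of the requested output.

```r
df <- data.frame(year = c(2016, 2017, 2018), xxx = c(1, 2, 3), yyy = c(4, 5, 6))

# Repeat each row index 12 times; indexing keeps the column names and types.
df2 <- df[rep(seq_len(nrow(df)), each = 12), ]
rownames(df2) <- NULL

# Optional, hypothetical extra column: the month each repeated row stands for.
df2$month <- rep(1:12, times = nrow(df))
```

Because the rows are selected by index rather than multiplied, this also works when the data frame contains non-numeric columns.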

how can i plot a histogram of crime type vs HOURS in r

I have a big dataset with different variables, and I want to make a histogram of type of crime against HOURS. How can I do that in R?
DATE TIME PLACE ZONE TYPE.OF.CRIME WEEK
1 2011/01/01 23:00 KIEPIES CLUB <NA> ARMED ROBBERY 1
2 2011/01/03 10:00 AUSSPANNPLATZ Zone 14 ARMED ROBBERY 1
3 2011/01/07 14:00 UNAM BUSHES Zone 16 ARMED ROBBERY 1
4 2011/01/08 21:34 TOTAL SERV. STATION, KHOMASDAL Zone 9 ARMED ROBBERY 1
5 2011/01/15 <NA> WOODPALM STR 625 Zone 11 ARMED ROBBERY 2
6 2011/01/03 14:03 C KANDOVAZU STR Zone 5 ASSAULT GBH 1
HOUR day month year HOURS
1 23 1 1 2011 23
2 10 3 1 2011 10
3 14 7 1 2011 14
4 21 8 1 2011 21
5 <NA> 15 1 2011 <NA>
6 14 3 1 2011 14
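library(ggplot2)
# geom_histogram() bins a single variable, so map only HOURS to x and facet by crime type
ggplot(df, aes(x = as.numeric(as.character(HOURS)))) +
  geom_histogram(binwidth = 1) + facet_wrap(~ TYPE.OF.CRIME)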
Something like this should work.

How to merge in R based on several columns and only one unique id

I'm trying to merge 2 datasets:
dataset 1
id, month, year, postal
dataset 2
id, month, year, postal, Income, name, division
dataset 1
id year month postal
1 2010 9 j0r1h0
2 2010 8 j0r1h0
....
....
7 2007 6 j3x4p2
dataset 2
id, year, month, postal, name, division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
I want to keep all columns and rows of dataset 1 and add the extra columns from dataset 2, such as Income and division.
I get the wrong result, with duplicated month and year fields, when I try:
merge(a,b,by=c(postal,month,year,all.x=TRUE)
This is my expected result:
id year month postal name division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
3 2010 7 j1v3c4 verna starting
4 2009 1 j23c5 Greg medium
5 2007 1 j2j4d3 Greg medium
6 2008 2 j2p4s3 na na
7 2007 6 j3x4p2 na starting
And this is my result:
id year month postal name division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
3 2010 8 j0r1h0 na na
4 2010 7 na na na
5 2010 7 j1v3c4 verna starting
6 2009 1 j23c5 Greg medium
7 2007 1 j2j4d3 Greg medium
8 2008 2 j2p4s3 na na
9 2007 6 j3x4p2 na starting
9 2007 1 j3x4p2 na starting
my real data set size is over 200000 x 16
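One possible reading of the goal is a left join on all four shared columns. The sketch below uses hypothetical toy data shaped like the two datasets (the values and the names a, b, keys, b_unique, and res are illustrative only), and it assumes the extra rows in the result come from repeated key combinations in dataset 2, which the question does not confirm.

```r
# Hypothetical toy data shaped like the two datasets in the question.
a <- data.frame(
  id = c(1, 2, 3),
  year = c(2010, 2010, 2008),
  month = c(9, 8, 2),
  postal = c("j0r1h0", "j0r1h0", "j2p4s3")
)
b <- data.frame(
  id = c(1, 2, 2),
  year = c(2010, 2010, 2010),
  month = c(9, 8, 8),
  postal = c("j0r1h0", "j0r1h0", "j0r1h0"),
  name = c("john", "lili", "lili"),
  division = c("starting", "retired", "retired")
)

keys <- c("id", "year", "month", "postal")

# Keep one row per key combination so a match cannot multiply rows of `a`.
b_unique <- b[!duplicated(b[keys]), ]

# Left join: all rows of `a`, extra columns from `b`, NA where nothing matches.
res <- merge(a, b_unique, by = keys, all.x = TRUE)
```

Listing every shared column in by is what prevents merge from creating suffixed duplicates such as month.x and month.y.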

How to create a new column using looping and rbind in r?

I have data similar to this. I would like to make 3 columns (date1, date2, date3) using a loop and rbind, because I am required to do it with that method only.
(All I was told is to make a loop, subset the data, sort it, make a new data frame, and then rbind it to make the new columns.)
year month day id
2011 1 5 3101
2011 1 14 3101
2011 2 3 3101
2011 2 4 3101
2012 1 27 3153
2012 2 20 3153
2012 2 22 3153
2012 3 1 3153
2013 1 31 3103
2013 2 1 3103
2013 2 4 3103
2013 3 4 3103
2013 3 6 3103
The result I expect is:
date1: number of days from 2011, January 1st, start again from 1 in a new year.
date2: number of days of an id working in a year, start again from 1 in a new year.
date3: number of days open within a year, start again from 1 in a new year.
(all of the dates are in ascending order)
year month day id date1 date2 date3
2011 1 5 3101 5 1 1
2011 1 14 3101 14 2 2
2011 2 3 3101 34 3 3
2011 2 4 3101 35 4 4
2012 1 27 3153 27 1 1
2012 2 20 3153 51 2 2
2012 2 22 3153 53 3 3
2012 3 1 3153 60 4 4
2013 1 31 3103 31 1 1
2013 2 1 3103 32 2 2
2013 2 4 3103 35 3 3
2013 3 4 3103 94 4 4
2013 3 6 3103 96 5 5
Please help! Thank you.
You can do it without an unnecessary for loop or subset; here is one way:
df <- read.table(text =" year month day id
2011 1 5 3101
2011 1 14 3101
2011 2 3 3101
2011 2 4 3101
2012 1 27 3153
2012 2 20 3153
2012 2 22 3153
2012 3 1 3153
2013 1 31 3103
2013 2 1 3103
2013 2 4 3103
2013 3 4 3103
2013 3 6 3103",header = T)
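library(lubridate)
# date1: day of the year (1 = January 1st), so it restarts every year
df$date1 <- yday(mdy(paste0(df$month,"-",df$day,"-",df$year)))
# date2: running count of rows within each id
df$date2 <- ave(df$year, df$id, FUN = seq_along)
# date3: running count of rows within each year
df$date3 <- ave(df$year, df$year, FUN = seq_along)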
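If the loop, subset, sort, and rbind recipe from the question really is mandatory, it could look roughly like the sketch below. It uses only base R and the first six rows of the example data; pieces, part, and result are illustrative names, and the interpretation of date2 and date3 follows the answer above rather than anything the question states precisely.

```r
df <- data.frame(
  year = c(2011, 2011, 2011, 2011, 2012, 2012),
  month = c(1, 1, 2, 2, 1, 2),
  day = c(5, 14, 3, 4, 27, 20),
  id = c(3101, 3101, 3101, 3101, 3153, 3153)
)

pieces <- list()
for (y in sort(unique(df$year))) {
  # Subset one year and sort it by date.
  part <- df[df$year == y, ]
  part <- part[order(part$month, part$day), ]
  # date1: day of the year, taken from the "%j" date format.
  part$date1 <- as.integer(format(ISOdate(part$year, part$month, part$day), "%j"))
  # date2: running count within each id for this year.
  part$date2 <- ave(part$year, part$id, FUN = seq_along)
  # date3: running count of rows within this year.
  part$date3 <- seq_len(nrow(part))
  pieces[[length(pieces) + 1]] <- part
}

# Stack the per-year pieces back into one data frame.
result <- do.call(rbind, pieces)
```

Collecting the pieces in a list and calling rbind once at the end avoids growing the data frame inside the loop.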

Create a new variable to epidemiological week

I have a data frame with a week column and a year column (87 weeks in total). I need to create a new column (weekseq) that numbers the weeks sequentially from first to last. I don't know how to do this. Can someone help me?
Example:
id week month year yearweek weekseq
1 1 1 2014 2014/1
1 1 1 2013 2013/1
1 2 1 2014 2014/2
1 2 1 2013 2013/2
1 3 1 2014 2014/3
1 3 1 2013 2013/3
1 4 1 2014 2014/4
1 4 1 2013 2013/4
1 5 1 2014 2014/5
1 5 1 2013 2013/5
1 6 2 2014 2014/6
1 6 2 2013 2013/6
1 7 2 2014 2014/7
1 7 2 2013 2013/7
1 8 2 2014 2014/8
1 8 2 2013 2013/8
1 9 2 2014 2014/9
1 9 2 2013 2013/9
1 10 3 2014 2014/10
1 10 3 2013 2013/10
1 11 3 2014 2014/11
1 11 3 2013 2013/11
1 12 3 2014 2014/12
1 12 3 2013 2013/12
This solution requires the 'dplyr' and 'plyr' packages:
library(plyr)   # load plyr before dplyr so dplyr's verbs are not masked
library(dplyr)
# Coerce to a tibble (tbl_df() is deprecated in current dplyr)
datatbl <- as_tibble(data)
# Arrange, giving more weight to year than week
datatbl <- arrange(datatbl, year, month, week)
# Number the arranged rows sequentially within each id
seqtbl <- ddply(datatbl, .(id), transform, weekseq = seq_along(id))
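A package-free variant is sketched below on hypothetical toy data. It assumes week numbers never exceed 99, so that year * 100 + week sorts chronologically, and it gives the same weekseq to every row that shares a year and week, even when several ids or repeated rows are present.

```r
# Hypothetical toy data: two ids observed over the same four weeks.
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  week = c(1, 1, 2, 2, 1, 1, 2, 2),
  year = c(2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013)
)

# year * 100 + week sorts chronologically; factor() orders its levels by value,
# so the integer codes are 1 for the earliest week, 2 for the next, and so on.
data$weekseq <- as.integer(factor(data$year * 100 + data$week))
```

Unlike numbering rows with seq_along, this does not depend on the data being sorted first.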
