Spatial interpolation using kriging in R

I have two datasets. The first one shows information about multiple weather phenomena in Brazil measured by weather stations in the country. I also have information regarding the latitude and longitude of these stations, and the weather data is provided by year.
id_estacao ano precipitacao_total pressao_atm_max pressao_atm_min
1 A001 2016 0.12988728 888.0399 887.5521
2 A002 2016 0.14282787 932.8559 932.3215
3 A003 2016 0.12486339 930.6114 930.0861
4 A009 2016 0.07696277 979.3086 978.7480
5 A010 2016 0.11548640 980.2251 979.6578
6 A011 2016 0.13886103 958.5196 957.9678
radiacao_global temperatura_max temperatura_min umidade_rel_max
1 1508.024 22.77794 21.34106 65.52186
2 1419.644 24.90139 23.40798 66.28074
3 1460.937 24.00484 22.46128 68.25395
4 1440.643 29.22710 27.79419 61.87001
5 1540.398 27.52555 25.87737 63.64414
6 1471.004 24.95090 23.36305 66.69974
umidade_rel_min vento_velocidade id_municipio estacao latitude
1 59.04111 2.3430377 5300108 Brasilia -15.78944
2 59.56990 1.2416667 5208707 Goiania -16.64284
3 59.71499 1.6017190 5213806 Morrinhos -17.74507
4 55.21366 1.5202973 1721000 Palmas -10.19074
5 57.01889 0.9295148 1716208 Parana -12.61500
6 60.26358 1.7454093 5220405 Sao Simao -18.96914
longitude
1 -47.92583
2 -49.22022
3 -49.10170
4 -48.30181
5 -47.87194
6 -50.63345
Moreover, I have information about the location of the Brazilian municipalities (cities).
id_municipio latitude longitude
1 1100015 -11.92 -61.99
2 1100023 -9.91 -63.04
3 1100031 -13.49 -60.54
4 1100049 -11.43 -61.44
5 1100056 -13.18 -60.81
6 1100064 -13.11 -60.54
I want to use interpolation to predict the weather phenomena in these cities using the information provided in the first dataset. I have been working with the package "fields", which provides the Krig() function:
# Kriging of the rainfall data by station
fit <- Krig(x, precip[, d])
# Predict the value at the municipalities
pred <- predict(fit, Y)
The idea is a loop over d days (in my case, years): precip[, d] is the precipitation variable (in my case, each of the weather variables) on day d, and x is a matrix with the latitude and longitude of the stations. Krig() returns the fit, and predict() evaluates that fit at Y, the latitude and longitude of the municipalities.
However, I have been struggling to make this function fit my data. I would like to know if someone could help me.
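Here is a minimal sketch of how I would wire the two datasets into fields::Krig(), assuming the station data is in a data frame called stations and the municipality data in one called cities (both names are assumptions), interpolating one variable at a time for a given year:
library(fields)

# Station and municipality coordinates as two-column matrices (lon, lat)
x <- as.matrix(stations[, c("longitude", "latitude")])
newlocs <- as.matrix(cities[, c("longitude", "latitude")])

# Fit a kriging surface for one variable, e.g. total precipitation
fit <- Krig(x, stations$precipitacao_total)

# Predict that variable at the municipality locations
cities$precipitacao_pred <- predict(fit, newlocs)

# Loop over all weather variables the same way
vars <- c("precipitacao_total", "pressao_atm_max", "pressao_atm_min",
          "radiacao_global", "temperatura_max", "temperatura_min")
for (v in vars) {
  fit <- Krig(x, stations[[v]])
  cities[[paste0(v, "_pred")]] <- predict(fit, newlocs)
}
With several years of data, the same loop can be nested inside a loop over years, subsetting stations on ano first.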

Related

Remove invalid and incorrect spatial points (Latitude and longitude) in R

I have well over 100,000 GPS locations of 35 animals. I have removed the 'NA' and '0' GPS latitude-longitude locations but noticed that there was one latitude and longitude location that was incorrect and that needs to be removed (in this subset of data, the 4th line that has -78.6917357 17.5506138 as LAT and LON). It is likely that there are other incorrect GPS locations and wondered if there is an easy way to identify outliers and remove them.
My sample data looks like this:
COLLAR NAME Animal_ID SEX DATE TIME Year Month Day Hour LATITUDE LONGITUDE HEIGHT
26 Keith CM8 M 2009-05-28 2:00:00 2009 5 28 2 49.7518424 -123.6099396 705.87
26 Keith CM8 M 2009-06-09 7:00:00 2009 6 9 7 49.7518495 -123.4860212 191.61
26 Keith CM8 M 2009-05-31 18:00:00 2009 5 31 18 49.7518576 -123.5373316 410.96
26 Jack CM6 M 2009-06-01 22:00:00 2009 6 1 22 -78.6917357 17.5506138 490.23
26 Keith CM8 M 2009-05-28 2:00:00 2009 5 28 2 49.7518424 -123.6099396 705.87
26 Keith CM8 M 2009-06-09 7:00:00 2009 6 9 7 49.7518495 -123.4860212 191.61
26 Keith CM8 M 2009-05-31 18:00:00 2009 5 31 18 49.7518576 -123.5373316 410.96
27 Keith CM8 M 2009-05-28 3:00:00 2009 5 28 3 49.7518775 -123.6099242 713.05
27 Keith CM8 M 2009-06-09 10:00:00 2009 6 9 10 49.7519163 -123.486203 108.02
The code I used, which works to remove the 0 and NA values, is:
library(dplyr)
data <- data_all %>%
  filter(!is.na(LATITUDE), LATITUDE != 0, !is.na(LONGITUDE), LONGITUDE != 0)
Now, I would like to further remove row 4 here (and any other invalid or incorrect spatial points) using the following line of code but that does not work:
data <- filter(LATITUDE !=-78.69174, LONGITUDE !=17.55061)
I cannot see a reduction in the number of rows after running this code. Please note that I do not have row numbers so cannot specifically remove row 4 and, ideally, I want to remove all those rows that have odd values in one line of code (or as a pipe function) that does work. Your help would be most appreciated. Thanks!
The stored values are likely different from what is displayed. Use dplyr::near() for approximate matches to coordinates you know are incorrect. If I were you, I'd use mutate() first to flag incorrect coordinates via a new boolean column, then filter on that column.
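A sketch of that mutate-then-filter pattern (the tolerance of 1e-4 is an assumption; adjust it to match how precisely your coordinates are stored):
library(dplyr)

data <- data_all %>%
  # flag the known-bad coordinate pair with approximate matches
  mutate(bad_point = near(LATITUDE, -78.69174, tol = 1e-4) &
                     near(LONGITUDE, 17.55061, tol = 1e-4)) %>%
  filter(!bad_point) %>%
  select(-bad_point)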
Here's an approach that limits to values of latitude and longitude within an expected range:
data <- data_all %>%
filter(!is.na(LATITUDE), between(LATITUDE, 49, 51),
!is.na(LONGITUDE), between(LONGITUDE, -125, -122))
or equivalently
data <- data_all %>%
filter(!is.na(LATITUDE), LATITUDE >= 49, LATITUDE <= 51,
!is.na(LONGITUDE), LONGITUDE >= -125, LONGITUDE <= -122)

R: How to plot multiple series when the series is included as a variable?

I want to plot multiple lines to one graph of five different time series. The problem is that my data frame is arranged like so:
Series Time Price ...
1 Dec 2003 5
2 Dec 2003 10
3 Dec 2003 2
1 Jan 2004 10
2 Jan 2004 10
3 Jan 2004 5
This is a simplified version, and there are many other variables for each observation. I'd like to be able to plot time vs price and use the first variable as the indicator for which series.
The time period is 77 months long, so I'm not sure if there's an easy way to reshape the data to look like:
Series Dec.2003.Price Jan.2004.Price ...
1 5 10
2 10 10
3 2 5
or a way to graph these like I said without reshaping.
You can try lattice:
library(lattice)
xyplot(Price ~ Time, groups = Series, data = df, type = "l")
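An equivalent ggplot2 sketch (my substitution, not part of the answer above), which also needs no reshaping; it assumes Time has been converted to a proper Date so the x-axis orders correctly:
library(ggplot2)
ggplot(df, aes(x = Time, y = Price, colour = factor(Series))) +
  geom_line()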

Leaflet R color map based on multiple variables?

From what I have seen, colored maps in leaflet usually depict only one variable (GDP, crime stats, temperature, etc.).
Is there a way to make maps that display the highest variable in a data frame in leaflet R? For example, showing which alcoholic beverage is the most popular in a country, like the drinks map from dailymail.co.uk.
Say that I had a data frame that looked like this and I wanted to do a similar map to the alcoholic beverage one...
Country Beer Wine Spirits Coffee Tea
Sweden 7 7 5 10 6
USA 9 6 6 7 5
Russia 5 3 9 5 8
Is there a way in leaflet R to pick out the alcoholic beverages, assign them a color and then display them on the map to show which type of alcoholic beverage is the most popular in the three different countries?
Step 0, make a test data frame:
> set.seed(1234)
> drinks = data.frame(Country=c("Sweden","USA","Russia"),
Beer=sample(10,3), Wine=sample(10,3), Spirits=sample(10,3),
Coffee=sample(10,3), Tea=sample(10,3))
Note that I have country as a column; yours might have countries in the row names, in which case the following code needs changing. We get:
> drinks
Country Beer Wine Spirits Coffee Tea
1 Sweden 2 7 1 6 3
2 USA 6 8 3 7 9
3 Russia 5 6 6 5 10
Now we combine apply to work along rows, which.max to get the highest element, and various subset operations to drop the country column and get the drink name from the column names:
> drinks$Favourite = names(drinks)[-1][apply(drinks[,-1],1,which.max)]
> drinks
Country Beer Wine Spirits Coffee Tea Favourite
1 Sweden 2 7 1 6 3 Wine
2 USA 6 8 3 7 9 Tea
3 Russia 5 6 6 5 10 Tea
If there's a tie, which.max() picks the first element. If you want different tie-breaking behaviour, you'll have to rewrite this.
Now feed your new data frame to leaflet and map the Favourite column.
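A sketch of that last step, assuming you have a polygon layer to join against, here an sf object called countries with a Country column (the layer, the join key, and the palette are all assumptions):
library(leaflet)
library(dplyr)

map_data <- left_join(countries, drinks, by = "Country")
pal <- colorFactor("Set1", domain = map_data$Favourite)

leaflet(map_data) %>%
  addPolygons(fillColor = ~pal(Favourite), fillOpacity = 0.7,
              weight = 1, color = "white",
              label = ~paste(Country, "-", Favourite)) %>%
  addLegend(pal = pal, values = ~Favourite, title = "Most popular drink")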

Create list of elements which match a value

I have a table of values with the name, zipcode and opening date of recreational pot shops in WA state.
name zip opening
1 The Stash Box 98002 2014-11-21
3 Greenside 98198 2015-01-01
4 Bud Nation 98106 2015-06-29
5 West Seattle Cannabis Co. 98168 2015-02-28
6 Nimbin Farm 98168 2015-04-25
...
I'm analyzing this data to see if there are any correlations between drug usage and location and opening of recreational stores. For one of the visualizations I'm doing, I am organizing the data by number of shops per zipcode using the group_by() and summarize() functions in dplyr.
zip count
(int) (int)
1 98002 1
2 98106 1
3 98168 2
4 98198 1
...
This data is then plotted onto a leaflet map, with the radius of each circle representing the relative number of shops in a zipcode. I would like to collapse the name variable into a third column so that the shop names can pop up in my visualization when scrolling over each circle. Ideally, the data would look something like this:
zip count name
(int) (int) (character)
1 98002 1 The Stash Box
2 98106 1 Bud Nation
3 98168 2 Nimbin Farm, West Seattle Cannabis Co.
4 98198 1 Greenside
...
where all shops in the same zipcode appear together in the third column. I've tried various for loops and if statements, but I'm sure there is a better way to do this and my R skills are just not up there yet. Any help would be appreciated.
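A minimal dplyr sketch of that aggregation (assuming the original data frame is called shops): summarize() can build both columns in one pass, with paste(collapse = ", ") concatenating the names within each zipcode.
library(dplyr)

shops_by_zip <- shops %>%
  group_by(zip) %>%
  summarize(count = n(),
            name = paste(sort(name), collapse = ", "))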

(In)correct use of a linear time trend variable, and most efficient fix?

I have 3133 rows representing payments made on some of the 5296 days between 7/1/2000 and 12/31/2014; that is, the "Date" feature is non-continuous:
> head(d_exp_0014)
Year Month Day Amount Count myDate
1 2000 7 6 792078.6 9 2000-07-06
2 2000 7 7 140065.5 9 2000-07-07
3 2000 7 11 190553.2 9 2000-07-11
4 2000 7 12 119208.6 9 2000-07-12
5 2000 7 16 1068156.3 9 2000-07-16
6 2000 7 17 0.0 9 2000-07-17
I would like to fit a linear time trend variable,
t <- 1:3133
to a linear model explaining the variation in the Amount of the expenditure.
fit_t <- lm(Amount ~ t + Count, d_exp_0014)
However, this is obviously wrong, as t increments by one between consecutive rows even when the dates are several days apart:
> head(exp)
Year Month Day Amount Count Date t
1 2000 7 6 792078.6 9 2000-07-06 1
2 2000 7 7 140065.5 9 2000-07-07 2
3 2000 7 11 190553.2 9 2000-07-11 3
4 2000 7 12 119208.6 9 2000-07-12 4
5 2000 7 16 1068156.3 9 2000-07-16 5
6 2000 7 17 0.0 9 2000-07-17 6
To me, that is the exact opposite of a linear trend.
What is the most efficient way to merge this data.frame onto a continuous date index? Will a date vector like
CTS_date_V <- data.frame(Date = seq(as.Date("2000/07/01"), as.Date("2014/12/31"), by = "days"))
yield different results?
I'm open to any packages (using fpp, forecast, timeSeries, xts, ts as of right now); I'm just looking for a good answer to deploy in functional form, since these payments are going to be updated every week and I'd like to automate appending to this data.frame.
I think some kind of transformation to a regular (continuous) time series is a good idea. You can use xts to transform the data (it is handy because an xts object can be used in other packages like a regular ts).
Filling the gaps
library(xts)

# convert myDate to POSIXct if necessary
# create an xts object from data frame x
ts1 <- xts(data.frame(a = x$Amount, c = x$Count), order.by = x$myDate)
ts1
# create an empty xts with a continuous daily index
ts_empty <- xts(, seq(from = start(ts1), to = end(ts1), by = "DSTday"))
# merge the empty ts to the data and fill the gap with 0
ts2 <- merge( ts1, ts_empty, fill = 0)
# or interpolate, for example:
ts2 <- merge( ts1, ts_empty, fill = NA)
ts2 <- na.locf(ts2)
# zoo-xts ready functions are:
# na.locf - constant previous value
# na.approx - linear approximation
# na.spline - cubic spline interpolation
Deduplicate dates
In your sample there is no sign of duplicated dates, but from your question it seems very likely that you have some. I think you want to aggregate those values with the sum function:
ts1 <- period.apply( ts1, endpoints(ts1,'days'), sum)
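As a lighter alternative to regularizing the series (a sketch, separate from the xts approach above): if all you need is a trend variable that respects the uneven spacing, measure the trend in elapsed days instead of the row index:
# trend measured in actual days since the first observation
d_exp_0014$t <- as.numeric(d_exp_0014$myDate - min(d_exp_0014$myDate))
fit_t <- lm(Amount ~ t + Count, data = d_exp_0014)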
