Plot summarized data using qplot in R

I would like to cross-classify and plot bal using qplot facets:
> str(bal)
'data.frame': 2096 obs. of 6 variables:
$ fips : chr "24510" "24510" "24510" "24510" ...
$ SCC : chr "10100601" "10200601" "10200602" "30100699" ...
$ Pollutant: chr "PM25-PRI" "PM25-PRI" "PM25-PRI" "PM25-PRI" ...
$ Emissions: num 6.53 78.88 0.92 10.38 10.86 ...
$ type : chr "POINT" "POINT" "POINT" "POINT" ...
$ year : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
I'm interested in the two classifiers year and type:
> levels(factor(bal$year))
[1] "1999" "2002" "2005" "2008"
> levels(factor(bal$type))
[1] "NON-ROAD" "NONPOINT" "ON-ROAD" "POINT"
So far I can plot the distribution of Emissions cross-classified by year and type.
What I'm unable to do is plot the sum of Emissions for each year and type, although I am able to compute it:
> tapply(bal$Emissions, list(bal$year, bal$type), sum)
NON-ROAD NONPOINT ON-ROAD POINT
1999 522.94000 2107.625 346.82000 296.7950
2002 240.84692 1509.500 134.30882 569.2600
2005 248.93369 1509.500 130.43038 1202.4900
2008 55.82356 1373.207 88.27546 344.9752
My guess was something along the lines of
> qplot(bal$year, tapply(bal$Emissions, list(bal$year, bal$type), sum),
data=bal, facets= . ~ type)
Error: Aesthetics must either be length one, or the same length as the data
Problems:tapply(bal$Emissions, list(bal$year, bal$type), sum)
but I don't understand what R is telling me there.
How can I plot this matrix using qplot?

You can do that using ggplot2 with either
qplot(year, Emissions, data=bal,
stat="summary", fun.y="sum",
facets= .~type
)
or
ggplot(bal) +
aes(year, Emissions) +
stat_summary(fun.y="sum",geom="point") +
facet_grid(.~type)
Both should give you the following plot which seems to match up well to your summary data.
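As for the error in the question: tapply() returns a 4 x 4 matrix of 16 sums, which is neither length one nor the same length as the 2096 rows of bal, so it cannot be used as an aesthetic alongside data=bal. An alternative to stat="summary" is to aggregate first and plot the long result. Here is a minimal sketch in base R, assuming a small made-up data frame (bal_demo is hypothetical and stands in for bal):

```r
# Hypothetical stand-in for `bal`; the values are made up for illustration.
set.seed(1)
bal_demo <- data.frame(
  year      = rep(c(1999L, 2002L, 2005L, 2008L), each = 8),
  type      = rep(c("NON-ROAD", "NONPOINT", "ON-ROAD", "POINT"), times = 8),
  Emissions = runif(32, min = 0, max = 100)
)

# tapply() gives a year x type matrix: 16 values, not one value per data row.
sums_matrix <- tapply(bal_demo$Emissions,
                      list(bal_demo$year, bal_demo$type), sum)
dim(sums_matrix)

# aggregate() gives the same sums in long form: one row per year/type pair.
sums_long <- aggregate(Emissions ~ year + type, data = bal_demo, FUN = sum)
str(sums_long)

# With ggplot2 loaded, the long table can then be plotted directly, e.g.
# qplot(year, Emissions, data = sums_long, facets = . ~ type)
```

The long table has one row per facet/x combination, which is the shape qplot and ggplot expect.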

Related

Error in MEEM(object, conLin, control$niterEM) in lme function

I'm trying to apply the lme function to my data, but the model gives the following message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution that uses the lme4::lmer function, but is there another option that lets me keep using the lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
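To see why an empty cell makes the design matrix rank-deficient, here is a minimal sketch on made-up data. The data frame toy and its factors a and b are hypothetical, and it uses default treatment contrasts rather than the polynomial contrasts of the ordered month2 factor:

```r
# Hypothetical toy data: level b3 is never observed together with a1.
toy <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2", "b3", "b3"))
)
table(toy$a, toy$b)          # the a1/b3 cell is empty

# Fixed-effect design matrix with main effects and interaction.
X_toy <- model.matrix(~ a * b, data = toy)
n_cols <- ncol(X_toy)        # columns the model tries to estimate
rank_X <- qr(X_toy)$rank     # columns that are actually identifiable
c(columns = n_cols, rank = rank_X)

# Because b3 only occurs with a2, these two columns are identical.
same_cols <- all(X_toy[, "bb3"] == X_toy[, "aa2:bb3"])
same_cols
```

When the rank is lower than the number of columns, at least one coefficient cannot be estimated separately from the others, which is the situation lme reports as a singularity.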
You can:
construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat — but then you'll have to sort out main effects and interactions yourself (ugh)
construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
use a patched version of nlme (I submitted a patch here but it hasn't been considered)
remotes::install_github("bbolker/nlme")
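The first option above (building the interaction factor yourself) can be sketched on made-up data. The names toy, a, b and ab are hypothetical; with drop = TRUE the unobserved combination is removed, so the resulting design matrix has full column rank:

```r
# Hypothetical toy data: level b3 is never observed together with a1.
toy <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2", "b3", "b3"))
)

# One factor with a level per *observed* a/b combination.
toy$ab <- with(toy, interaction(a, b, drop = TRUE))
n_levels <- nlevels(toy$ab)
levels(toy$ab)

# The design matrix built from the combined factor is full rank.
X_ab <- model.matrix(~ ab, data = toy)
full_rank <- qr(X_ab)$rank == ncol(X_ab)
full_rank
```

The price, as noted above, is that the coefficients now describe cell differences rather than separate main effects and interactions.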

Forecasting in R: UseMethod model function error

I'm trying to run different forecast modeling methods on a monthly tsibble dataset. Its head() looks like:
# A tsibble: 6 x 2 [1M]
month total
<mth> <dbl>
1 2000 Jan 104.
2 2000 Feb 618.
3 2000 Mar 1005.
4 2000 Apr 523.
5 2000 May 1908.
6 2000 Jun 1062.
and has a structure of:
tsibble [212 x 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ month: mth [1:212] 2000 Jan, 2000 Feb, 2000 Mar, 2000 Apr, 2000 May, 2000 Jun, 2000 Jul, 2000 Aug, 2000 Sep, 2000 Oct, 2000 Nov...
$ total: num [1:212] 104 618 1005 523 1908 ...
- attr(*, "key")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
..$ .rows: list<int> [1:1]
.. ..$ : int [1:212] 1 2 3 4 5 6 7 8 9 10 ...
.. ..# ptype: int(0)
- attr(*, "index")= chr "month"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "month"
- attr(*, "interval")= interval [1:1] 1M
..# .regular: logi TRUE
The dataset is monthly from 2000/01 to 2017/08 with no missing values or time periods. I'm trying to run a model such as:
df %>%
model(STL(total ~ season(window=9),robust=T)) %>%
components() %>% autoplot()
fit <- df %>%
model(ANN =ETS(total ~ error("A") + trend("A") + season()))
But for any type of model I try to run I get the exact same error each time. I'm looking for suggestions to correct the structure of the tsibble to allow these model functions to work.
Error in UseMethod("model") :
no applicable method for 'model' applied to an object of class "c('tbl_ts', 'tbl_df', 'tbl', 'data.frame')"
EDIT: Including reproducible example:
a = c(sample(1:1000,212))
df.ts <- ts(a, start=c(2000,1),end=c(2017,8),frequency=12)
df <- df.ts %>% as_tsibble()
Thanks for the example, I was able to get it to run without any errors, as follows:
library(tidyverse)
library(fpp3)
a = c(sample(1:1000,212))
df.ts <- ts(a, start=c(2000,1),end=c(2017,8),frequency=12)
df <- df.ts %>% as_tsibble()
df %>%
model(STL(a ~ season(window=9),robust=T)) %>%
components() %>% autoplot()
fit <- df %>%
model(ANN =ETS(a ~ error("A") + trend("A") + season()))
report(fit)
Here is what the decomposition looks like:
Here is the report of the model:
As both Russ Conte and Rob Hyndman found, there's nothing inherently wrong with the example code being used.
I believe there was an overlapping issue between two packages, as my issue was resolved upon removing and reinstalling the forecasting packages.

Linear regression of rectangular table against one set of values

I have a rectangular table with three variables: country, year and inflation. I already have all the descriptive statistics I need; now I need to do some analysis, and figured that I should run a linear regression against a target country. The best idea I had was to create a new variable called inflation.in.country.x and copy the inflation of x into this new column in a loop, but that seems like a somewhat unclean solution.
How to get a linear regression of a rectangular data table? The structure is like this:
> dat %>% str
'data.frame': 1196 obs. of 3 variables:
$ Country.Name: Factor w/ 31 levels "Albania","Armenia",..: 9 8 10 11 12 14 15 16 17 19 ...
$ year : chr "1967" "1967" "1967" "1967" ...
$ inflation : num 1.238 8.328 3.818 0.702 1.467 ...
I want to take Armenia's inflation as the dependent variable and Albania's as the independent variable in a linear regression. Is it possible to do this without transforming the data, while keeping the years aligned?
One way is to spread your data table using Country.Name as key:
dat.spread <- dat %>% spread(key="Country.Name", value="inflation")
dat.spread %>% str
'data.frame': 50 obs. of 31 variables:
$ year : chr "1967" "1968" "1969" "1970" ...
$ Albania : num NA NA NA NA NA NA NA NA NA NA ...
$ Armenia : num NA NA NA NA NA NA NA NA NA NA ...
$ Brazil : num NA NA NA NA NA NA NA NA NA NA ...
[...]
But that forces you to transform the data which may seem undesirable. Afterwards, you can simply use cbind to do the linear regression against all countries:
lm(cbind(Armenia, Brazil, Colombia, etc...) ~ Albania, data = dat.spread)
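For the single pair asked about in the question, the same idea can be sketched without tidyr by using base R's reshape() in place of spread(). This is only an illustration: dat_demo and its inflation values are made up, and the wide column names (inflation.Albania, inflation.Armenia) are whatever reshape() generates from the inflation column and the country names:

```r
# Hypothetical stand-in for `dat`; the inflation values are made up.
dat_demo <- data.frame(
  Country.Name = factor(rep(c("Albania", "Armenia", "Brazil"), each = 5)),
  year         = rep(as.character(2001:2005), times = 3),
  inflation    = c(3.1, 5.2, 2.4, 2.9, 2.4,
                   3.0, 1.1, 4.7, 7.0, 0.6,
                   6.8, 8.5, 14.7, 6.6, 6.9)
)

# One row per year, one inflation column per country, so years stay aligned.
wide <- reshape(dat_demo, idvar = "year", timevar = "Country.Name",
                direction = "wide")
names(wide)

# Armenia as dependent variable, Albania as independent variable.
fit <- lm(inflation.Armenia ~ inflation.Albania, data = wide)
coef(fit)
```

Years in which either country is missing are dropped by lm() through its default NA handling, so the regression only uses years present for both.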

Can't change size of bubbles in ggplot2 when creating a world map

I'm trying to create a world map with bubbles whose size is derived from a variable stored in the column "Occurrences".
This is sample data:
Country Year Occurrences lat long
1 United States 2015 122375186 37.09024 -95.71289
2 France 2015 106748608 46.00000 2.00000
3 United Kingdom 2015 97840223 54.00000 -2.00000
4 Netherlands 2015 80930053 52.13263 5.29127
5 China 2015 74367102 35.00000 105.00000
6 Austria 2015 40521175 47.33000 13.33000
This is the structure of the data:
str(worldmapplot)
'data.frame': 240 obs. of 5 variables:
$ Country : chr "United States" "France" "United Kingdom" "Netherlands" ...
$ Year : chr "2015" "2015" "2015" "2015" ...
$ Occurrences : num 1.26e+08 1.14e+08 9.87e+07 8.78e+07 7.90e+07 ...
$ lat : num 37.1 46 54 52.1 35 ...
$ long : num -95.71 2 -2 5.29 105 ...
This is the code I've tried:
library(ggplot2)
ggplot() +
geom_polygon(data = mdat, aes(long, lat, group=group), fill="grey50") +
geom_point(data= subset(worldmapplot, Country %in% Country [1:10]),
aes(x=long, y=lat, size = "Occurrences"), col="red", show.legend = F)
This is the error message I get:
Using size for a discrete variable is not advised.
Where am I wrong? Any help is appreciated.

ggplot2_Error: geom_point requires the following missing aesthetics: y

I am trying to use the rWBclimate package in RStudio. I copied the code below from rOpenSci and pasted it into RStudio, but I get the following error:
Don't know how to automatically pick scale for object of type list. Defaulting to continuous
Error: geom_point requires the following missing aesthetics: y
gbr.dat.t <- get_ensemble_temp("GBR", "annualavg", 1900, 2100)
## Loading required package: rjson
### Subset to just the median percentile
gbr.dat.t <- subset(gbr.dat.t, gbr.dat.t$percentile == 50)
## Plot and note the past is the same for each scenario
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) +
geom_point() +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
I also tried to use geom_point(stat="identity") in the following way but didn't work:
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) +
geom_point(stat="identity") +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
I still get the same message "Don't know how to automatically pick scale for object of type list. Defaulting to continuous
Error: geom_point requires the following missing aesthetics: y"
Also, the result from str(gbr.dat.t) is given below:
> str(gbr.dat.t)
'data.frame': 12 obs. of 6 variables:
$ scenario : chr "past" "past" "past" "past" ...
$ fromYear : int 1920 1940 1960 1980 2020 2020 2040 2040 2060 2060 ...
$ toYear : int 1939 1959 1979 1999 2039 2039 2059 2059 2079 2079 ...
$ data :List of 12
..$ : num 9.01
..$ : num 9.16
..$ : num 9.05
..$ : num 9.36
..$ : num 10
..$ : num 9.47
..$ : num 9.92
..$ : num 10.7
..$ : num 10.3
..$ : num 11.4
..$ : num 12.1
..$ : num 10.4
$ percentile: int 50 50 50 50 50 50 50 50 50 50 ...
$ locator : chr "GBR" "GBR" "GBR" "GBR" ...
Looking for your helpful answers.
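Hope this helps. Your str() output shows that the data column is a list, which ggplot cannot map to a scale. All I did was convert gbr.dat.t$data to a numeric vector with unlist():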
library('rWBclimate')
library("ggplot2")
gbr.dat.t <- get_ensemble_temp("GBR", "annualavg", 1900, 2100)
## Loading required package: rjson
### Subset to just the median percentile
gbr.dat.t <- subset(gbr.dat.t, gbr.dat.t$percentile == 50)
#This is the line you were missing
gbr.dat.t$data <- unlist(gbr.dat.t$data)
## Plot and note the past is the same for each scenario
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) + geom_point() +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
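The effect of that unlist() call can be checked without downloading any data. Below is a minimal sketch on a made-up data frame (demo is hypothetical and only mimics the structure shown in the question's str() output):

```r
# Hypothetical data frame with a list column, mimicking gbr.dat.t$data.
demo <- data.frame(fromYear = c(1920L, 1940L, 1960L))
demo$data <- list(9.01, 9.16, 9.05)

was_list <- is.list(demo$data)   # TRUE: each cell holds a length-one list element
class(demo$data)

# Flatten the list column into a plain numeric vector.
demo$data <- unlist(demo$data)
class(demo$data)
str(demo)
```

This only works cleanly when every list element has length one, as in the question; if some elements were empty or longer, unlist() would return a vector of a different length and the assignment would fail or misalign.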
