Web scraping data from a Chart or Graph in R

Good Morning,
I am hoping someone can help. The task is straightforward but seems a little difficult to execute.
On this website: https://reiwa.com.au/rent/
There is a chart labelled: Property trends
I am trying to extract the two time series from this chart.
I have tried rvest and similar tools but have had no luck at all. I am really hoping someone has the skills to solve this one, because it has me lost.
Thank you all in advance.

Inspecting the page's network requests with Chrome DevTools shows that the chart is filled from a JSON endpoint, which can be queried directly:
res <- httr::GET("https://reiwa.com.au/api/insights/trends/Residential")
json <- jsonlite::fromJSON(httr::content(res, "text"))
head(json$Result$SaleTrends)
#>   CalendarYear CalendarMonth     DateLabel MedianPrice DisplayPrice ChartOrder
#> 1         2020      December December 2020      490000        $490k         12
#> 2         2021       January  January 2021      495000        $495k         13
#> 3         2021      February February 2021      500000        $500k         14
#> 4         2021         March    March 2021      505000        $505k         15
#> 5         2021         April    April 2021      510000        $510k         16
#> 6         2021           May      May 2021      515000        $515k         17
head(json$Result$LeaseTrends)
#>   CalendarYear CalendarMonth     DateLabel MedianPrice DisplayPrice ChartOrder
#> 1         2020      December December 2020         410       $410pw         12
#> 2         2021       January  January 2021         420       $420pw         13
#> 3         2021      February February 2021         420       $420pw         14
#> 4         2021         March    March 2021         430       $430pw         15
#> 5         2021         April    April 2021         440       $440pw         16
#> 6         2021           May      May 2021         450       $450pw         17
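
To get from these data frames to the two time series the question asks for, one possible sketch is shown here. It assumes the endpoint returns one row per consecutive month with the CalendarYear, CalendarMonth and MedianPrice columns printed above. The helper name to_monthly_ts and the mock data frame (copied from the first three printed rows) are illustrative only and are not part of the original answer.

```r
# Mock of the first three SaleTrends rows printed above (not a live API call).
sale <- data.frame(
  CalendarYear  = c(2020, 2021, 2021),
  CalendarMonth = c("December", "January", "February"),
  MedianPrice   = c(490000, 495000, 500000)
)

# Hypothetical helper: turn one trend table into a monthly ts object.
# Assumes English month names and no missing months in the table.
to_monthly_ts <- function(trend) {
  month_num <- match(trend$CalendarMonth, month.name)
  ord <- order(trend$CalendarYear, month_num)  # chronological order
  ts(trend$MedianPrice[ord],
     start = c(trend$CalendarYear[ord][1], month_num[ord][1]),
     frequency = 12)
}

sale_ts <- to_monthly_ts(sale)
```

With the real response, the same helper could presumably be applied to json$Result$SaleTrends and json$Result$LeaseTrends in turn.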

Related

Trying to use gg_lag() but apparently have more than one time series

I'm trying to make a lag plot with gg_lag(), but I keep getting the same error about my tsibble:
# A tsibble: 255 x 6 [7D]
# Key: Demand [163]
Week Demand Date Month year Quarter
<dbl> <dbl> <date> <mth> <chr> <qtr>
1 1 48 2018-01-01 2018 Jan 2018 2018 Q1
2 2 101 2018-01-08 2018 Jan 2018 2018 Q1
3 3 129 2018-01-15 2018 Jan 2018 2018 Q1
4 4 113 2018-01-22 2018 Jan 2018 2018 Q1
5 5 116 2018-01-29 2018 Jan 2018 2018 Q1
6 6 123 2018-02-05 2018 Feb 2018 2018 Q1
7 7 137 2018-02-12 2018 Feb 2018 2018 Q1
8 8 136 2018-02-19 2018 Feb 2018 2018 Q1
9 9 151 2018-02-26 2018 Feb 2018 2018 Q1
10 10 87 2018-03-05 2018 Mar 2018 2018 Q1
# ... with 245 more rows
Printer_Q %>% gg_lag(Demand, geom='point')
Error: The data provided to contains more than one time series. Please filter a single time series to use gg_lag()
I tried filtering my data with:
Printer_Q <- Demandts %>%
select(-Week, -year, -Month, -Quarter)
...so that I am left with Demand and Date but it still says I have more than one time series? What am I doing wrong?

The Demand column should not be a key variable. A key variable is a categorical variable used to distinguish multiple time series in a single tsibble. It appears you just have one time series here, so you don't need a key variable.
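
A minimal sketch of what that could look like, assuming the tsibble package is installed. The data frame below is rebuilt from the first ten rows printed in the question, and the object names demand_df and demand_ts are illustrative. The point is only that Date is declared as the index and no key is given.

```r
library(tsibble)

# First ten rows of the question's data, rebuilt as a plain data frame.
demand_df <- data.frame(
  Date   = seq(as.Date("2018-01-01"), by = "week", length.out = 10),
  Demand = c(48, 101, 129, 113, 116, 123, 137, 136, 151, 87)
)

# Declare Date as the index and leave `key` unset: one series, no key.
demand_ts <- as_tsibble(demand_df, index = Date)

# With a single series, the call from the question should then be accepted:
# demand_ts %>% feasts::gg_lag(Demand, geom = "point")
```

For the existing Demandts object, converting it back to a plain tibble and calling as_tsibble(..., index = Date) without a key may achieve the same thing.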

Daily Average of Time series derived from monthly data R monthdays()

I have a time series object ts. I have mentioned the entire object here. It has data from Jan 2013 to Dec 2017 for all years. I am trying to find the daily average value so that the value is divided by the number of days in a month.
Expected output
The first value in ts, for Jan 2013, is 23770; I want it to become 23770/31, where 31 is the number of days in January. The second value, for Feb 2013, is 23482; I want it to become 23482/28, as February 2013 had 28 days, and so on.
Tried so far:
I know monthdays() can do this, with something like ts/monthdays(); monthdays() returns the number of days in a month. I am not able to implement it here. I read about tapply somewhere, but it is not giving me the desired result, since I need values corresponding to each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
I tried tapply because I had read about it, but it does not give the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6

I am not able to implement it here.
I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time-series of the same dimension as the input. So you can simply divide them.
library(forecast)
ts/monthdays(ts)
Jan Feb Mar Apr May Jun
2013 766.7742 838.6429 761.3226 762.9667 754.8710 808.0000
2014 863.3871 986.3929 849.7097 910.6667 883.5161 913.9333
2015 914.6452 980.4286 914.0645 966.6000 922.4194 944.6000
2016 974.4839 1057.2069 1017.8710 1008.2667 979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
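
If the forecast package is not available, the same day counts can be derived in base R. The sketch below is not from the original answer: days_in_month is a hypothetical helper, and the short series x holds only the first three values from the question. It assumes a monthly ts (frequency 12).

```r
# First three monthly values from the question (Jan-Mar 2013).
x <- ts(c(23770, 23482, 23601), start = c(2013, 1), frequency = 12)

# Hypothetical helper: number of days in each month of a monthly ts,
# computed as the gap between consecutive first-of-month dates.
days_in_month <- function(x) {
  stopifnot(frequency(x) == 12)
  yr <- as.integer(floor(time(x) + 1e-6))  # small offset guards against rounding
  mo <- as.integer(cycle(x))
  first <- as.Date(sprintf("%d-%02d-01", yr, mo))
  nxt   <- as.Date(sprintf("%d-%02d-01", yr + (mo == 12L), mo %% 12L + 1L))
  ts(as.numeric(nxt - first), start = start(x), frequency = 12)
}

daily_avg <- x / days_in_month(x)
```

The first element of daily_avg is 23770/31, matching the expected output described in the question.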

dplyr sample_n by one variable through another one

I have a data frame with a "grouping" variable season and another variable year which is repeated for each month.
df <- data.frame(month = as.character(sapply(month.name,function(x)rep(x,4))),
season = c(rep("winter",8),rep("spring",12),rep("summer",12),rep("autumn",12),rep("winter",4)),
year = rep(2021:2024,12))
I would like to use dplyr::sample_n or something similar to choose 2 months in the data frame for each season and keep the same months for all the years, for example:
month season year
1 January winter 2021
2 January winter 2022
3 January winter 2023
4 January winter 2024
5 February winter 2021
6 February winter 2022
7 February winter 2023
8 February winter 2024
9 March spring 2021
10 March spring 2022
11 March spring 2023
12 March spring 2024
13 May spring 2021
14 May spring 2022
15 May spring 2023
16 May spring 2024
17 June summer 2021
18 June summer 2022
19 June summer 2023
20 June summer 2024
21 July summer 2021
22 July summer 2022
23 July summer 2023
24 July summer 2024
25 October autumn 2021
26 October autumn 2022
27 October autumn 2023
28 October autumn 2024
29 November autumn 2021
30 November autumn 2022
31 November autumn 2023
32 November autumn 2024
I cannot use df %>% group_by(season,year) %>% sample_n(2), since it chooses different months for each year.
Thanks!

We can randomly sample 2 values from month and filter them by group.
library(dplyr)
df %>%
group_by(season) %>%
filter(month %in% sample(unique(month),2))
# month season year
# <chr> <chr> <int>
# 1 January winter 2021
# 2 January winter 2022
# 3 January winter 2023
# 4 January winter 2024
# 5 February winter 2021
# 6 February winter 2022
# 7 February winter 2023
# 8 February winter 2024
# 9 March spring 2021
#10 March spring 2022
# … with 22 more rows
If certain groups have fewer than 2 unique values, we can sample the minimum of 2 and the number of unique values in the group.
df %>%
group_by(season) %>%
filter(month %in% sample(unique(month),min(2, n_distinct(month))))
Using the same logic with base R, we can use ave
df[as.logical(with(df, ave(month, season,
FUN = function(x) x %in% sample(unique(x),2)))), ]

An option using slice
library(dplyr)
df %>%
group_by(season) %>%
slice(which(!is.na(match(month, sample(unique(month), 2)))))
# A tibble: 32 x 3
# Groups: season [4]
# month season year
# <fct> <fct> <int>
# 1 October autumn 2021
# 2 October autumn 2022
# 3 October autumn 2023
# 4 October autumn 2024
# 5 November autumn 2021
# 6 November autumn 2022
# 7 November autumn 2023
# 8 November autumn 2024
# 9 April spring 2021
#10 April spring 2022
# … with 22 more rows
Or using base R
by(df, df$season, FUN = function(x) subset(x, month %in% sample(unique(month), 2 )))
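
The same idea can also be written in two explicit steps, which makes it easier to see which months were drawn. This is only a sketch in base R: the seed and the names picked and res are illustrative, and it relies on each month belonging to exactly one season, as in the question's data.

```r
set.seed(42)  # only to make the illustration repeatable

df <- data.frame(
  month  = rep(month.name, each = 4),
  season = c(rep("winter", 8), rep("spring", 12), rep("summer", 12),
             rep("autumn", 12), rep("winter", 4)),
  year   = rep(2021:2024, 12)
)

# Step 1: draw up to two distinct months per season, once.
picked <- lapply(split(df$month, df$season), function(m) {
  u <- unique(m)
  sample(u, min(2, length(u)))
})

# Step 2: keep every year's rows for the months that were drawn.
res <- df[df$month %in% unlist(picked), ]
```

Because the draw happens once per season rather than once per season and year, every selected month keeps all four of its years.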

converting to time series using ts() in r

Good afternoon
I have a time series
v2<-c(12,13,15,17,18,12,11,12)
which runs from July 1996 to October 1997, covering only the months July to October.
When I try to convert it to a time series with
v2.ts<-ts(v2, frequency=12, start=c(1996,7), end=c(1997,10))
I get this result:
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1996                          12  13  15  17  18  12
1997  11  12  12  13  15  17  18  12  11  12
What parameters can I use to make it look like this:
     Jul Aug Sep Oct
1996  12  13  15  17
1997  18  12  11  12
Thanks in advance for the help

A ts series must be regularly spaced, but the desired output has points one month apart except for the jump from October of the first year to July of the second, so it cannot be represented as a ts object.
There are several packages that can represent irregularly spaced series. With the zoo package it would be done like this:
library(zoo)
z <- as.zoo(v2.ts)
z[cycle(z) %in% 7:10]
## Jul 1996 Aug 1996 Sep 1996 Oct 1996 Jul 1997 Aug 1997 Sep 1997 Oct 1997
## 12 13 15 17 18 12 11 12
If you are not looking for a time series but just a matrix with the indicated elements then:
tapply(c(v2.ts), list(floor(time(v2.ts)), cycle(v2.ts)), c)[, 7:10]
## 7 8 9 10
## 1996 12 13 15 17
## 1997 18 12 11 12
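
Both snippets above start from v2.ts, in which ts() has recycled the eight values to fill sixteen months; the Jul-Oct entries happen to come out as v2 in its original order. As a sketch that does not rely on that recycling, the same results can be built directly from v2 in base R. The object names m and full are illustrative.

```r
v2 <- c(12, 13, 15, 17, 18, 12, 11, 12)

# Option 1: a plain year-by-month matrix, filled row by row.
m <- matrix(v2, nrow = 2, byrow = TRUE,
            dimnames = list(as.character(1996:1997), month.abb[7:10]))

# Option 2: a regular monthly ts in which the unobserved months are NA.
full <- ts(NA_real_, start = c(1996, 7), end = c(1997, 10), frequency = 12)
full[cycle(full) %in% 7:10] <- v2
```

Option 2 keeps the series regularly spaced, so it stays a valid ts object, at the cost of carrying NA values for November to June.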

sorting of month in matrix in R

I have a matrix in this format:
year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274
I need to sort the months in calendar order (Jan, Feb, Mar, ...), but when I sort them they come out in alphabetical order. I used this:
mat <- mat[order(mat[,1], decreasing = TRUE), ]
and it looks like this :
row.names April August December February January July June March May November October September
1 2015 59535 0 0 24258 22785 0 31356 40274 84211 0 0 0
2 2014 466 10982 35881 17 0 2981 1279 289 879 8911 8565 4000
Can we sort months on the basis of occurrence in R ?

Suppose DF is the data frame from which you derived your matrix. We provide such a data frame in reproducible form at the end. Ensure that month and year are factors with appropriate levels. Note that month.name is a builtin variable in R that is used here to ensure that the month levels are appropriately sorted and we have assumed year is a numeric column. Then use levelplot like this:
DF2 <- transform(DF,
month = factor(as.character(month), levels = month.name),
year = factor(year)
)
library(lattice)
levelplot(Freq ~ year * month, DF2)
Note: Here is DF in reproducible form:
Lines <- " year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274 "
DF <- read.table(text = Lines, header = TRUE)

Assuming you want to sort by date (a dummy day 1 is added so that the strings can be parsed as dates):
tm <- strptime(paste(1, mat$month, mat$year), format = "%d %B %Y")
mat <- mat[order(tm), ]
Or if you don't care about the year:
tm <- strptime(paste(1, mat$month, 2000), format = "%d %B %Y")
mat <- mat[order(tm), ]
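
Parsing month names with %B depends on the session locale. A locale-independent sketch is to look each name up in the built-in month.name vector instead. The small DF below reuses six rows from the question; by_calendar and wide are illustrative names.

```r
# Six rows taken from the question's data.
DF <- data.frame(
  year  = c(2014, 2015, 2014, 2015, 2014, 2015),
  month = c("April", "April", "February", "February", "January", "January"),
  Freq  = c(466, 59535, 17, 24258, 0, 22785)
)

# Long format: order by year, then by the month's position in the calendar.
by_calendar <- DF[order(DF$year, match(DF$month, month.name)), ]

# Wide format: a factor with calendar-ordered levels fixes the column order.
DF$month <- factor(DF$month, levels = month.name)
wide <- xtabs(Freq ~ year + month, data = droplevels(DF))
```

Here droplevels() only removes the months that do not occur in this small sample; with all twelve months present it makes no difference.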
