OpenWeather API to get multiple cities - R

get_weather_forecaset_by_cities <- function(city_names){
  df <- data.frame("weather_data_frame")
  for (city_name in city_names){
    forecast_url <- 'https://api.openweathermap.org/data/2.5/forecast'
    forecast_query <- list(q = city_name, appid = "a297d3b46d0b5a7aab6dde3512962b99", units = "metric")
    for(result in results) {
      city <- c(city, city_name)
    }
  }
  return(df)
}
I need help understanding the code above. I am specifically stuck on line 6; in the markdown it says "# Loop the json result" (note: json_result is a list of lists).
But my actual task is this: "Complete and call get_weather_forecaset_by_cities function with a list of cities, and write the data frame into a csv file called cities_weather_forecast.csv"
Which parts do I have to fill in, and how?
cities <- c("Seoul", "Washington, D.C.", "Paris", "Suzhou")
cities_weather_df <- get_weather_forecaset_by_cities(cities)
After running the next lines of code, it shows this error:
"Error in get_weather_forecaset_by_cities(cities): object 'json_result' not found
Traceback:
get_weather_forecaset_by_cities(cities)"

Since this is a homework question, it's not appropriate to provide a complete answer. However, a function that receives a list of city names and obtains their 5-day weather forecasts from openweathermap.org looks like this:
get_weather_forecast_by_cities <- function(city_names){
  library(dplyr)
  library(tidyr)
  library(jsonlite)
  df <- NULL # initialize a data frame
  for(c in city_names){
    aForecast <- paste0("http://api.openweathermap.org/data/2.5/forecast?",
                        "q=", c,
                        "&appid=<your API KEY here>",
                        "&units=metric")
    message(aForecast)
    aCity <- fromJSON(aForecast)
    # extract the date / time content and convert to POSIXct
    # extract the periodic weather content, note that
    # tidyr::unnest() is helpful here
    # add the city name
    # bind the results into the output data frame, df
  }
  df # return data frame
}
We replace the commented areas with code and run the function as follows:
source("./examples/get_weather_forecast_by_cities.R")
cityList <- c("Seoul","Paris","Chicago","Beijing","Suzhou")
theData <- get_weather_forecast_by_cities(cityList)
head(theData)
...and the first few rows of output:
> head(theData)
# A tibble: 6 × 24
dt temp feels_like temp_min temp_max pressure sea_level
<dttm> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 2022-08-03 03:00:00 27.4 31.5 27.4 29.9 1011 1011
2 2022-08-03 06:00:00 29.4 34.2 29.4 31.0 1010 1010
3 2022-08-03 09:00:00 29.2 33.2 29.2 29.2 1009 1009
4 2022-08-03 12:00:00 26.9 29.9 26.9 26.9 1010 1010
5 2022-08-03 15:00:00 25.7 26.7 25.7 25.7 1010 1010
6 2022-08-03 18:00:00 25.0 26.1 25.0 25.0 1010 1010
# … with 17 more variables: grnd_level <int>, humidity <int>, temp_kf <dbl>,
# id <int>, main <chr>, description <chr>, icon <chr>, all <int>, speed <dbl>,
# deg <int>, gust <dbl>, visibility <int>, pop <dbl>, `3h` <dbl>, pod <chr>,
# dt_txt <chr>, city <chr>
At this point we have a data frame that can easily be written to csv with write.csv(), which is left as an exercise for the reader.
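Since write.csv() itself isn't the homework-sensitive part, the loop-accumulate-write pattern can be sketched on made-up data (the city and temp values below are invented, not API output):

```r
library(dplyr)

# Toy stand-in for the per-city loop: each pass yields a small data
# frame, bind_rows() stacks it onto the accumulator, and write.csv()
# saves the final result under the filename the task asks for
df <- NULL
for (city in c("Seoul", "Paris")) {
  chunk <- tibble(city = city, temp = c(20.1, 21.3))  # invented values
  df <- bind_rows(df, chunk)
}
write.csv(df, "cities_weather_forecast.csv", row.names = FALSE)
```

The real function follows the same shape, with `chunk` built from the unnested JSON forecast instead of invented numbers.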

Related

Opening and reshaping xlsx files with nameless columns in R using a pattern

I'm working with French electoral data, but I'm having issues opening xlsx files to work on them in R. I was wondering if anyone has had the same problem and found a solution.
The issue is that only the first 29 columns out of 100+ have names and the rest are nameless. I've tried editing the column names in Excel before opening the files, but this is time-consuming and prone to mistakes. I'm looking for a way to automate the process.
The datasets have a pattern that I'm trying to exploit to rename the columns and reshape the files:
The first 6 columns correspond to the geographic id of the precinct (region, municipality, etc.)
The next 15 columns give information about aggregate results in the precinct (number of voters, number of registered voters, participation, etc.)
The next 8 columns give information about a given candidate and her results in the precinct (name, sex, party id, number of votes, etc.)
These 29 columns have names.
The remaining columns are nameless and correspond to other candidates: the 8 candidate columns repeat for each additional candidate.
There is another layer of difficulty: each precinct does not have the same number of candidates, so the number of nameless columns varies.
To simplify, let's say that my data frame looks like the following:
precinct_id  tot_votes  candidate_id  candidate_votes  ...1        ...2
Paris 05     1000       Jean Dupont   400              Paul Dupuy  300
Paris 06     500        Jean Dupont   50               Paul Dupuy  150
where:
candidate_id and candidate_votes correspond to the id and result of the first candidate
...1, ...2 is how R automatically renames the nameless columns that correspond to candidate_id and candidate_votes for candidate 2 in the same precinct.
I need R to select the observations in each sequence of 2 columns and paste them into new rows under candidate_id and candidate_votes, while keeping the precinct_id and tot_votes columns.
precinct_id  tot_votes  candidate_id  candidate_votes
Paris 05     1000       Jean Dupont   400
Paris 06     500        Jean Dupont   50
Paris 05     1000       Paul Dupuy    300
Paris 06     500        Paul Dupuy    150
I have no idea how to reshape without column names... Any help would be greatly appreciated! Thanks!
PS: The files come from here: https://www.data.gouv.fr/fr/datasets/elections-legislatives-des-12-et-19-juin-2022-resultats-definitifs-du-premier-tour/
Actually, there's an even simpler solution than the one I suggested. .name_repair can take a function as its value. This function should accept a vector of "input" column names and return a vector of "output" column names. As we want to treat the data for the first candidate in each row in exactly the same way as every subsequent set of eight columns, I'll ignore only the first 21 columns, not the first 29.
read_excel(
  "resultats-par-niveau-subcom-t1-france-entiere.xlsx",
  .name_repair = function(x) {
    suffixes <- c("NPanneau", "Sexe", "Nom", "Prénom", "Nuance", "Voix", "PctVoixIns", "PctVoixExp")
    if ((length(x) - 21) %% 8 != 0) stop(paste("Don't know how to handle a sheet with", length(x), "columns [", (length(x) - 21) %% 8, "]"))
    for (i in 1:length(x)) {
      if (i > 21) {
        x[i] <- paste0("C", 1 + floor((i - 22) / 8), "_", suffixes[1 + (i - 22) %% 8])
      }
    }
    x
  }
)
# A tibble: 35,429 × 197
`Code du département` `Libellé du dép…` `Code de la ci…` `Libellé de la…` `Code de la co…` `Libellé de la…` `Etat saisie` Inscrits Abstentions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 01 Ain 01 1ère circonscri… 016 Arbigny Complet 327 154
2 01 Ain 01 1ère circonscri… 024 Attignat Complet 2454 1281
3 01 Ain 01 1ère circonscri… 029 Beaupont Complet 446 224
4 01 Ain 01 1ère circonscri… 038 Bény Complet 604 306
5 01 Ain 01 1ère circonscri… 040 Béréziat Complet 362 179
6 01 Ain 01 1ère circonscri… 050 Boissey Complet 262 137
7 01 Ain 01 1ère circonscri… 053 Bourg-en-Bresse Complet 15516 8426
8 01 Ain 01 1ère circonscri… 057 Boz Complet 391 210
9 01 Ain 01 1ère circonscri… 065 Buellas Complet 1408 654
10 01 Ain 01 1ère circonscri… 069 Certines Complet 1169 639
# … with 35,419 more rows, and 188 more variables: `% Abs/Ins` <dbl>, Votants <dbl>, `% Vot/Ins` <dbl>, Blancs <dbl>, `% Blancs/Ins` <dbl>,
# `% Blancs/Vot` <dbl>, Nuls <dbl>, `% Nuls/Ins` <dbl>, `% Nuls/Vot` <dbl>, Exprimés <dbl>, `% Exp/Ins` <dbl>, `% Exp/Vot` <dbl>,
# C1_NPanneau <dbl>, C1_Sexe <chr>, C1_Nom <chr>, C1_Prénom <chr>, C1_Nuance <chr>, C1_Voix <dbl>, C1_PctVoixIns <dbl>, C1_PctVoixExp <dbl>,
# C2_NPanneau <dbl>, C2_Sexe <chr>, C2_Nom <chr>, C2_Prénom <chr>, C2_Nuance <chr>, C2_Voix <dbl>, C2_PctVoixIns <dbl>, C2_PctVoixExp <dbl>,
# C3_NPanneau <dbl>, C3_Sexe <chr>, C3_Nom <chr>, C3_Prénom <chr>, C3_Nuance <chr>, C3_Voix <dbl>, C3_PctVoixIns <dbl>, C3_PctVoixExp <dbl>,
# C4_NPanneau <dbl>, C4_Sexe <chr>, C4_Nom <chr>, C4_Prénom <chr>, C4_Nuance <chr>, C4_Voix <dbl>, C4_PctVoixIns <dbl>, C4_PctVoixExp <dbl>,
# C5_NPanneau <dbl>, C5_Sexe <chr>, C5_Nom <chr>, C5_Prénom <chr>, C5_Nuance <chr>, C5_Voix <dbl>, C5_PctVoixIns <dbl>, C5_PctVoixExp <dbl>, …
That's read the data in and named the columns. To get the final format you want, we will need to do a standard pivot_longer()/pivot_wider() trick, but the situation here is slightly complicated because some of your columns are character and some are numeric. So first, I'll turn the numeric columns into character columns so that the pivot_longer() step doesn't fail.
For clarity, I'll drop the first 21 columns so that it's easy to see what's going on.
read_excel(
  "resultats-par-niveau-subcom-t1-france-entiere.xlsx",
  .name_repair = function(x) {
    suffixes <- c("NPanneau", "Sexe", "Nom", "Prénom", "Nuance", "Voix", "PctVoixIns", "PctVoixExp")
    if ((length(x) - 21) %% 8 != 0) stop(paste("Don't know how to handle a sheet with", length(x), "columns [", (length(x) - 21) %% 8, "]"))
    for (i in 1:length(x)) {
      if (i > 21) {
        x[i] <- paste0("C", 1 + floor((i - 22) / 8), "_", suffixes[1 + (i - 22) %% 8])
      }
    }
    x
  }
) %>%
  mutate(across(where(is.numeric) | where(is.logical), as.character)) %>%
  pivot_longer(!1:21, names_sep = "_", names_to = c("Candidate", "Variable"), values_to = "Value") %>%
  select(!1:21)
# A tibble: 6,235,504 × 3
Candidate Variable Value
<chr> <chr> <chr>
1 C1 NPanneau 2
2 C1 Sexe M
3 C1 Nom LAHY
4 C1 Prénom Éric
5 C1 Nuance DXG
6 C1 Voix 2
7 C1 PctVoixIns 0.61
8 C1 PctVoixExp 1.23
9 C2 NPanneau 8
10 C2 Sexe M
# … with 6,235,494 more rows
Now add the pivot_wider(), again dropping the first 21 columns, purely for clarity.
read_excel(
  "resultats-par-niveau-subcom-t1-france-entiere.xlsx",
  .name_repair = function(x) {
    suffixes <- c("NPanneau", "Sexe", "Nom", "Prénom", "Nuance", "Voix", "PctVoixIns", "PctVoixExp")
    if ((length(x) - 21) %% 8 != 0) stop(paste("Don't know how to handle a sheet with", length(x), "columns [", (length(x) - 21) %% 8, "]"))
    for (i in 1:length(x)) {
      if (i > 21) {
        x[i] <- paste0("C", 1 + floor((i - 22) / 8), "_", suffixes[1 + (i - 22) %% 8])
      }
    }
    x
  }
) %>%
  mutate(across(where(is.numeric) | where(is.logical), as.character)) %>%
  pivot_longer(!1:21, names_sep = "_", names_to = c("Candidate", "Variable"), values_to = "Value") %>%
  pivot_wider(names_from = Variable, values_from = Value) %>%
  select(!1:21)
# A tibble: 779,438 × 9
Candidate NPanneau Sexe Nom Prénom Nuance Voix PctVoixIns PctVoixExp
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 C1 2 M LAHY Éric DXG 2 0.61 1.23
2 C2 8 M GUÉRAUD Sébastien NUP 26 7.95 15.95
3 C3 7 F ARMENJON Eliane ECO 3 0.92 1.84
4 C4 1 M GUILLERMIN Vincent ENS 30 9.17 18.4
5 C5 3 M BRETON Xavier LR 44 13.46 26.99
6 C6 5 M MENDES Michael DSV 3 0.92 1.84
7 C7 6 M BELLON Julien REC 6 1.83 3.68
8 C8 4 F PIROUX GIANNOTTI Brigitte RN 49 14.98 30.06
9 C9 NA NA NA NA NA NA NA NA
10 C10 NA NA NA NA NA NA NA NA
# … with 779,428 more rows
Finally, convert the "temporary character" columns back to numeric. (Still dropping the first 21 columns for clarity.)
read_excel(
  "resultats-par-niveau-subcom-t1-france-entiere.xlsx",
  .name_repair = function(x) {
    suffixes <- c("NPanneau", "Sexe", "Nom", "Prénom", "Nuance", "Voix", "PctVoixIns", "PctVoixExp")
    if ((length(x) - 21) %% 8 != 0) stop(paste("Don't know how to handle a sheet with", length(x), "columns [", (length(x) - 21) %% 8, "]"))
    for (i in 1:length(x)) {
      if (i > 21) {
        x[i] <- paste0("C", 1 + floor((i - 22) / 8), "_", suffixes[1 + (i - 22) %% 8])
      }
    }
    x
  }
) %>%
  mutate(across(where(is.numeric) | where(is.logical), as.character)) %>%
  pivot_longer(!1:21, names_sep = "_", names_to = c("Candidate", "Variable"), values_to = "Value") %>%
  pivot_wider(names_from = Variable, values_from = Value) %>%
  mutate(across(c(Voix, PctVoixIns, PctVoixExp), as.numeric)) %>%
  select(!1:21)
# A tibble: 779,438 × 9
Candidate NPanneau Sexe Nom Prénom Nuance Voix PctVoixIns PctVoixExp
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 C1 2 M LAHY Éric DXG 2 0.61 1.23
2 C2 8 M GUÉRAUD Sébastien NUP 26 7.95 16.0
3 C3 7 F ARMENJON Eliane ECO 3 0.92 1.84
4 C4 1 M GUILLERMIN Vincent ENS 30 9.17 18.4
5 C5 3 M BRETON Xavier LR 44 13.5 27.0
6 C6 5 M MENDES Michael DSV 3 0.92 1.84
7 C7 6 M BELLON Julien REC 6 1.83 3.68
8 C8 4 F PIROUX GIANNOTTI Brigitte RN 49 15.0 30.1
9 C9 NA NA NA NA NA NA NA NA
10 C10 NA NA NA NA NA NA NA NA
# … with 779,428 more rows
This, I think, is the format you want, though you may need to arrange() the rows into the order you want. Obviously, you should drop the final %>% select(!1:21) for your production version.
It is an easy matter to convert this code to a function that accepts a filename as its parameter and then use this in an lapply to read an entire folder into a list of data frames. However...
It appears that not every file in the folder has the same layout. resultats-par-niveau-fe-t1-outre-mer.xlsx, for example, appears to have fewer "prefix columns" before the 8-columns-per-candidate repeat begins.
The import generates several warnings. This appears to be because the election(?) with the largest number of candidates does not appear in the first rows of the worksheet. I've not investigated whether these warnings are generated by meaningful problems with the import.
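For readers who want to experiment without downloading the workbook, the rename-then-pivot idea above can be run in miniature on an invented two-candidate frame (the Cn_Variable column names below mimic the renaming scheme, and the values are made up from the question's example):

```r
library(dplyr)
library(tidyr)

# Two precincts, two candidates, columns already renamed to Cn_Variable
wide <- tibble(
  precinct_id = c("Paris 05", "Paris 06"),
  tot_votes   = c(1000, 500),
  C1_Nom  = c("Jean Dupont", "Jean Dupont"),
  C1_Voix = c(400, 50),
  C2_Nom  = c("Paul Dupuy", "Paul Dupuy"),
  C2_Voix = c(300, 150)
)

long <- wide %>%
  # make every candidate column character so pivot_longer() doesn't fail
  mutate(across(starts_with("C"), as.character)) %>%
  pivot_longer(starts_with("C"), names_sep = "_",
               names_to = c("Candidate", "Variable"), values_to = "Value") %>%
  pivot_wider(names_from = Variable, values_from = Value) %>%
  # convert the temporary character column back to numeric
  mutate(Voix = as.numeric(Voix))
```

This produces one row per precinct-candidate pair, with precinct_id and tot_votes repeated on each row, which is the long shape the question asks for.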

A quick way to rename multiple columns with unique names using dplyr

I am a beginner R user, currently learning the tidyverse way. I imported a dataset which is a time series of monthly indexed consumer prices over a period of four years. The imported headings of the monthly CPI columns display in R as five-digit numbers (as characters). Here is a short mock-up recreation of what it looks like...
df <- tibble(Product = c("Eggs", "Chicken"),
             `44213` = c(35.77, 36.77),
             `44244` = c(39.19, 39.80),
             `44272` = c(40.12, 43.42),
             `44303` = c(41.09, 41.33))
# A tibble: 2 x 5
# Product `44213` `44244` `44272` `44303`
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Eggs 35.8 39.2 40.1 41.1
#2 Chicken 36.8 39.8 43.4 41.3
I want to change the column headings (44213 etc.) to dates that make more sense to me (still as characters). I understand, using dplyr, that I can do it the following way:
df <- df %>% rename("Jan17" = `44213`, "Feb17" = `44244`,
                    "Mar17" = `44272`, "Apr17" = `44303`)
# A tibble: 2 x 5
# Product Jan17 Feb17 Mar17 Apr17
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Eggs 35.8 39.2 40.1 41.1
#2 Chicken 36.8 39.8 43.4 41.3
The problem is that my actual dataset contains 48 such columns (months) to rename, so it is a lot of work to type out. I looked at other replace and set_names functions, but these seem to apply repeated changes to the column names rather than providing the new unique names I am looking for.
(I realise dates as columns is not good practice and would need to shift these to rows before proceeding with any analysis... or maybe this must be a prior step to renaming?)
Trust I expressed my question sufficiently. Would love to learn a quicker solution using dplyr or be directed to where one can be found. Thank you for your time.
We can use !!! with rename by passing a named vector
library(dplyr)
library(stringr)
df1 <- df %>%
  rename(!!!setNames(names(df)[-1], str_c(month.abb[1:4], 17)))
Output:
df1
# A tibble: 2 x 5
# Product Jan17 Feb17 Mar17 Apr17
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Eggs 35.8 39.2 40.1 41.1
#2 Chicken 36.8 39.8 43.4 41.3
Or use rename_with
df %>%
  rename_with(~ str_c(month.abb[1:4], 17), -1)
If the column names should be converted to dates and formatted:
nm1 <- format(as.Date(as.numeric(names(df)[-1]), origin = '1896-01-01'), '%b%y')
df %>%
  rename_with(~ nm1, -1)
# A tibble: 2 x 5
# Product Jan17 Feb17 Mar17 Apr17
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Eggs 35.8 39.2 40.1 41.1
#2 Chicken 36.8 39.8 43.4 41.3
Using some generic names, assigned sequentially (note that paste0() has no sep argument, so it can be dropped):
names(df)[2:ncol(df)] <- paste0('col_', 1:(ncol(df) - 1))
## A tibble: 2 x 5
# Product col_1 col_2 col_3 col_4
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Eggs 35.8 39.2 40.1 41.1
#2 Chicken 36.8 39.8 43.4 41.3
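Incidentally, those five-digit headings look like Excel serial day numbers, for which the matching R-side origin is "1899-12-30"; if that's what they are, the true dates can be recovered directly (the answer's origin of '1896-01-01' merely shifts the labels back to the "…17" year the asker wanted):

```r
# Excel serial dates use origin 1899-12-30 on the R side
nm <- c("44213", "44244", "44272", "44303")
as.Date(as.numeric(nm), origin = "1899-12-30")
# [1] "2021-01-17" "2021-02-17" "2021-03-17" "2021-04-17"
```

So the columns appear to be monthly observations dated the 17th of each month in 2021, which may be worth knowing before relabeling them.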

post request with httr

I am new to httr. I am trying to use this geocoding API: https://geo.api.gouv.fr/adresse. I want to pass a csv file directly from R, as given in their example:
curl -X POST -F data=#search.csv -F columns=adresse -F columns=postcode https://api-adresse.data.gouv.fr/search/csv
The example csv is here: https://adresse.data.gouv.fr/exemples/search.csv
I tried, without specifying the columns:
library(httr)
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
             body = "data = #search.csv")
> test
Response [https://api-adresse.data.gouv.fr/search/csv/]
Date: 2021-02-09 21:27
Status: 400
Content-Type: application/json; charset=utf-8
Size: 66 B
Or
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
             body = "data = #search.csv",
             content_type("application/json"))
But I still get a 400 status. Specifying the whole file path did not work either. How does this work? I would like to get the json back and read it in R.
Thanks in advance !
I am not sure you can request to get json back, but here is how you might do this with httr:
library(httr)
r <- POST(url = "https://api-adresse.data.gouv.fr/search/csv",
          body = list(data = upload_file("search.csv"),
                      columns = "adresse",
                      columns = "postcode"))
content(r)
# # A tibble: 4 x 20
# nom adresse postcode city latitude longitude result_label result_score result_type result_id
# <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
# 1 Écol~ 6 Rue ~ 54600 Vill~ 48.7 6.15 6 Rue Alber~ 0.96 housenumber 54578_00~
# 2 Écol~ 6 Rue ~ 54500 Vand~ 48.7 6.15 6 Rue d’Aqu~ 0.96 housenumber 54547_00~
# 3 Écol~ 31 Rue~ 54180 Heil~ 48.6 6.21 31 Rue d’Ar~ 0.96 housenumber 54257_00~
# 4 Écol~ 1 bis ~ 54250 Cham~ 48.7 6.16 1 bis Rue d~ 0.95 housenumber 54115_01~
# # ... with 10 more variables: result_housenumber <chr>, result_name <chr>, result_street <lgl>,
# # result_postcode <dbl>, result_city <chr>, result_context <chr>, result_citycode <dbl>,
# # result_oldcitycode <lgl>, result_oldcity <lgl>, result_district <lgl>
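The reason the string bodies in the question return 400 is that curl's -F data=@search.csv is a multipart file upload, not the literal string "data = #search.csv". In httr that is expressed with upload_file(), which merely records the path and MIME type until POST() streams the file (the csv content below is an invented stand-in, and the POST itself is left commented out to avoid a live request):

```r
library(httr)

# Build a small csv on disk, then wrap it the way curl -F data=@file does
tmp <- tempfile(fileext = ".csv")
writeLines(c("adresse,postcode", "8 Boulevard du Port,80000"), tmp)

f <- upload_file(tmp, type = "text/csv")
class(f)  # "form_file": just a path + MIME type until the request is sent
# r <- POST("https://api-adresse.data.gouv.fr/search/csv",
#           body = list(data = f, columns = "adresse", columns = "postcode"))
# (encode = "multipart" is already the default when body is a list)
```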

read_csv() adds "\r" to *.csv input

I'm trying to read in a small (17kb), simple csv file from EdX.org (for an online course), and I've never had this trouble with readr::read_csv() before. Base-R read.csv() reads the file without generating the problem.
library(tidyverse)
df <- read_csv("https://courses.edx.org/assets/courseware/v1/ccdc87b80d92a9c24de2f04daec5bb58/asset-v1:MITx+15.071x+1T2020+type#asset+block/WHO.csv")
head(df)
Gives this output
#> # A tibble: 6 x 13
#> Country Region Population Under15 Over60 FertilityRate LifeExpectancy
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 Afghan… Easte… 29825 47.4 3.82 "\r5.4\r" 60
#> 2 Albania Europe 3162 21.3 14.9 "\r1.75\r" 74
#> 3 Algeria Africa 38482 27.4 7.17 "\r2.83\r" 73
#> 4 Andorra Europe 78 15.2 22.9 <NA> 82
#> 5 Angola Africa 20821 47.6 3.84 "\r6.1\r" 51
#> 6 Antigu… Ameri… 89 26.0 12.4 "\r2.12\r" 75
#> # … with 6 more variables: ChildMortality <dbl>, CellularSubscribers <dbl>,
#> # LiteracyRate <chr>, GNI <chr>, PrimarySchoolEnrollmentMale <chr>,
#> # PrimarySchoolEnrollmentFemale <chr>
You'll notice that the column FertilityRate has "\r" added to its values. I've downloaded the csv file and cannot find those characters in it.
Base-R read.csv() reads in the file with no problems, so I'm wondering what the problem is with my usage of the tidyverse read_csv().
head(df$FertilityRate)
#> [1] "\r5.4\r" "\r1.75\r" "\r2.83\r" NA "\r6.1\r" "\r2.12\r"
How can I fix my usage of read_csv() so that the "\r" strings are not there?
If possible, I'd prefer not to have to individually specify the type of every single column.
In a nutshell, the characters are inside the file (probably by accident) and read_csv is right to not remove them automatically: since they occur within quotes, this by convention means that a CSV parser should treat the field as-is, and not strip out whitespace characters. read.csv is wrong to do so, and this is arguably a bug.
You can strip them out yourself once you’ve loaded the data:
df = mutate_if(df, is.character, ~ stringr::str_remove_all(.x, '\r'))
This seems to be good enough for this file, but in general I'd be wary that the file might be damaged in other ways, since the presence of these characters is clearly not intentional, and the file follows no common line-ending convention (it's neither a conventional Windows nor Unix file).
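The behaviour is easy to reproduce without the remote file; here's a minimal sketch using an inline CSV string with stray carriage returns inside a quoted field (the column names and values are invented):

```r
library(readr)
library(dplyr)
library(stringr)

# Quoted fields keep their embedded "\r", just as in the EdX file
raw <- 'x,rate\n1,"\r5.4\r"\n2,"\r1.75\r"'
df  <- suppressMessages(read_csv(raw))
df$rate   # still contains the "\r" characters, parsed as character

# Strip the stray characters from every character column after loading
clean <- df %>% mutate(across(where(is.character), ~ str_remove_all(.x, "\r")))
clean$rate
```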

Format a tbl within a dplyr chain

I am trying to add commas for thousands in my data, e.g. 10,000, along with dollar signs, e.g. $10,000.
I'm using several dplyr commands along with the tidyr gather and spread functions. Here's what I tried:
Cut and paste this code block to generate the random data "dataset" I'm working with:
library(dplyr)
library(tidyr)
library(lubridate)

## Generate some data
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Direct", "Email")
last_month <- Sys.Date() %m+% months(-1) %>% floor_date("month")
mts <- seq(from = last_month %m+% months(-23), to = last_month, by = "1 month") %>% as.Date()
dimvars <- expand.grid(Month = mts, Channel = channels, stringsAsFactors = FALSE)

# metrics
rws <- nrow(dimvars)
set.seed(42)

# generates variability in the random data
randwalk <- function(initial_val, ...){
  initial_val + cumsum(rnorm(...))
}

Sessions <- ceiling(randwalk(3000, n = rws, mean = 8, sd = 1500)) %>% abs()
Revenue <- ceiling(randwalk(10000, n = rws, mean = 0, sd = 3500)) %>% abs()

# make primary df
dataset <- cbind(dimvars, Revenue)
Which looks like:
> tbl_df(dataset)
# A tibble: 144 × 3
Month Channel Revenue
<date> <chr> <dbl>
1 2015-06-01 Facebook 8552
2 2015-07-01 Facebook 12449
3 2015-08-01 Facebook 10765
4 2015-09-01 Facebook 9249
5 2015-10-01 Facebook 11688
6 2015-11-01 Facebook 7991
7 2015-12-01 Facebook 7849
8 2016-01-01 Facebook 2418
9 2016-02-01 Facebook 6503
10 2016-03-01 Facebook 5545
# ... with 134 more rows
Now I want to spread the months into columns to show revenue trend by channel, month over month. I can do that like so:
revenueTable <- dataset %>%
  select(Month, Channel, Revenue) %>%
  group_by(Month, Channel) %>%
  summarise(Revenue = sum(Revenue)) %>%
  # mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
  gather(Key, Value, -Channel, -Month) %>%
  spread(Month, Value) %>%
  select(-Key)
And it looks almost exactly as I want:
> revenueTable
# A tibble: 6 × 25
Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01` `2015-10-01` `2015-11-01` `2015-12-01` `2016-01-01` `2016-02-01` `2016-03-01` `2016-04-01`
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Direct 11910 8417 4012 359 4473 2702 6261 6167 8630 5230 1394
2 Email 7244 3517 671 1339 10788 10575 8567 8406 7856 6345 7733
3 Facebook 8552 12449 10765 9249 11688 7991 7849 2418 6503 5545 3908
4 Organic 4191 978 219 4274 2924 4155 5981 9719 8220 8829 7024
5 SEM 2344 6873 10230 6429 5016 2964 3390 3841 3163 1994 2105
6 Youtube 186 2949 2144 5073 1035 4878 7905 7377 2305 4556 6247
# ... with 13 more variables: `2016-05-01` <dbl>, `2016-06-01` <dbl>, `2016-07-01` <dbl>, `2016-08-01` <dbl>, `2016-09-01` <dbl>, `2016-10-01` <dbl>,
# `2016-11-01` <dbl>, `2016-12-01` <dbl>, `2017-01-01` <dbl>, `2017-02-01` <dbl>, `2017-03-01` <dbl>, `2017-04-01` <dbl>, `2017-05-01` <dbl>
Now the part I'm struggling with. I would like to format the data as currency. I tried adding this in between summarise() and gather() within the chain:
mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
This half works: the dollar sign is prepended, but the comma separators do not show. I tried removing the paste0("$" part to see if I could get the comma formatting to work, with no success.
How can I format my tbl as a currency with dollars and commas, rounded to nearest whole dollars (no $1.99, just $2)?
I think you can just do this at the end with dplyr::mutate_at().
revenueTable %>% mutate_at(vars(-Channel), funs(. %>% round(0) %>% scales::dollar()))
#> # A tibble: 6 x 25
#> Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Direct $11,910 $8,417 $4,012 $359
#> 2 Email $7,244 $3,517 $671 $1,339
#> 3 Facebook $8,552 $12,449 $10,765 $9,249
#> 4 Organic $4,191 $978 $219 $4,274
#> 5 SEM $2,344 $6,873 $10,230 $6,429
#> 6 Youtube $186 $2,949 $2,144 $5,073
#> # ... with 20 more variables: `2015-10-01` <chr>, `2015-11-01` <chr>,
#> # `2015-12-01` <chr>, `2016-01-01` <chr>, `2016-02-01` <chr>,
#> # `2016-03-01` <chr>, `2016-04-01` <chr>, `2016-05-01` <chr>,
#> # `2016-06-01` <chr>, `2016-07-01` <chr>, `2016-08-01` <chr>,
#> # `2016-09-01` <chr>, `2016-10-01` <chr>, `2016-11-01` <chr>,
#> # `2016-12-01` <chr>, `2017-01-01` <chr>, `2017-02-01` <chr>,
#> # `2017-03-01` <chr>, `2017-04-01` <chr>, `2017-05-01` <chr>
We can use data.table
library(data.table)
nm1 <- setdiff(names(revenueTable), 'Channel')
setDT(revenueTable)[, (nm1) := lapply(.SD, function(x)
  scales::dollar(round(x))), .SDcols = nm1]
revenueTable[, 1:3, with = FALSE]
# Channel `2015-06-01` `2015-07-01`
#1: Direct $11,910 $8,417
#2: Email $7,244 $3,517
#3: Facebook $8,552 $12,449
#4: Organic $4,191 $978
#5: SEM $2,344 $6,873
#6: Youtube $186 $2,949
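As a footnote, the comma never appeared in the original attempt because format()'s thousands-separator argument is big.mark, not big.interval. A base-R sketch of the same formatting, on an invented two-row frame:

```r
# big.mark inserts the thousands separator; round first for whole dollars
as_dollar <- function(x) paste0("$", formatC(round(x), format = "d", big.mark = ","))

revenue <- data.frame(Channel = c("Direct", "Email"),
                      `2015-06-01` = c(11910.4, 7244.2),
                      check.names = FALSE)
revenue[-1] <- lapply(revenue[-1], as_dollar)
# revenue[["2015-06-01"]] is now c("$11,910", "$7,244")
```

Swapping big.interval for big.mark in the original mutate() call would also make the in-chain approach work.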
