This is my second post here on Stack Overflow.
I have a function called bw_test with several arguments, like this:
bw_test <- function(localip, remoteip, localspeed, remotespeed, duracion = 30, direction = "both"){
  comando <- str_c("ssh usuario@", localip, " /tool bandwidth-test direction=", direction,
                   " remote-tx-speed=", remotespeed, "M local-tx-speed=", localspeed,
                   "M protocol=udp user=usuario password=mipasso duration=", duracion, " ", remoteip)
  resultado <- system(comando, intern = TRUE, ignore.stderr = TRUE)
  # resultado pulls a character vector from the ssh server, like this:
  # head(resultado)
  #[1] " status: connecting\r" " tx-current: 0bps\r" " tx-10-second-average: 0bps\r"
  #[4] " tx-total-average: 0bps\r" " rx-current: 0bps\r" " rx-10-second-average: 0bps\r"
  resultado %<>%
    replace("\r", "") %>%
    tail(17) %>%
    trimws("both") %>%
    as_tibble %>%
    mutate(local = localip, remote = remoteip) %>%
    separate(value, sep = ":", into = c("parametro", "valor")) %>%
    head(15)
  resultado$valor %<>%
    trimws() %>%
    str_replace("Mbps", "") %>%
    str_replace("%", "") %>%
    str_replace("s", "")
  resultado %<>%
    spread(parametro, valor)
  resultado %<>%
    mutate(`tx-percentaje` = as.numeric(`tx-total-average`) / localspeed) %>%
    mutate(`rx-percentaje` = as.numeric(`rx-total-average`) / remotespeed)
  return(resultado)
}
This function returns a tibble like this one:
# A tibble: 1 x 19
local remote `connection-cou… direction duration `local-cpu-load` `lost-packets` `random-data` `remote-cpu-loa…
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 192.… 192.1… 1 both 4 13 0 no 12
# … with 10 more variables: `rx-10-second-average` <chr>, `rx-current` <chr>, `rx-size` <chr>,
# `rx-total-average` <chr>, `tx-10-second-average` <chr>, `tx-current` <chr>, `tx-size` <chr>,
# `tx-total-average` <chr>, `tx-percentaje` <dbl>, `rx-percentaje` <dbl>
So when I call the function inside rbind, I get the result of every run in a single tibble:
rbind(bw_test("192.168.105.10" ,"192.168.105.18", 75,125),
bw_test("192.168.133.11","192.168.133.9", 5 ,50),
bw_test("192.168.254.251","192.168.254.250", 25,150))
For the example, my results are:
# A tibble: 3 x 19
local remote `connection-cou… direction duration `local-cpu-load` `lost-packets` `random-data` `remote-cpu-loa…
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 192.… 192.1… 20 both 28 63 232 no 48
2 192.… 192.1… 20 both 29 4 0 no 20
3 192.… 192.1… 20 both 29 15 0 no 22
# … with 10 more variables: `rx-10-second-average` <chr>, `rx-current` <chr>, `rx-size` <chr>,
# `rx-total-average` <chr>, `tx-10-second-average` <chr>, `tx-current` <chr>, `tx-size` <chr>,
# `tx-total-average` <chr>, `tx-percentaje` <dbl>, `rx-percentaje` <dbl>
My problem is applying the function to each row of a tibble like this:
aps <- tribble(
  ~name,                  ~ip,               ~remoteip,         ~bw_test, ~localspeed, ~remotespeed,
  "backbone_border_core", "192.168.253.1",   "192.168.253.3",   1,        200,         200,
  "backbone_2_site2",     "192.168.254.251", "192.168.254.250", 1,        25,          150
)
I was trying to use map, but I got:
map(c(aps$ip,aps$remoteip,aps$localspeed,aps$remotespeed), bw_test)
Error: argument "remotespeed" is missing, with no default
I believe this is because c(aps$ip, aps$remoteip, aps$localspeed, aps$remotespeed) feeds all the values of aps$ip first, then all of aps$remoteip, and so on.
Am I using the right strategy? Is map suitable for this?
What am I doing wrong?
How can I apply the function to every row to get the requested tibble?
I'll appreciate your kind help.
Greetings.
Try using pmap_df.
output <- purrr::pmap_df(list(aps$ip, aps$remoteip, aps$localspeed,
aps$remotespeed), bw_test)
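pmap_df() iterates over the list elements in parallel, passing the i-th value of each one to bw_test() and row-binding the resulting tibbles, so the vectors must be supplied in the same order as the function's arguments (localip, remoteip, localspeed, remotespeed). If you prefer to match by name instead of by position, a named list should also work, for example:
output <- purrr::pmap_df(list(localip = aps$ip,
                              remoteip = aps$remoteip,
                              localspeed = aps$localspeed,
                              remotespeed = aps$remotespeed),
                         bw_test)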
I'm looking for direction on how to feed data frame column values into set places in a function, and then loop (or something similar) to create a series of objects that can be bound into one longer table.
Example data:
a = c("17","17","29")
b = c("133","163","055")
data.frame(a, b)
Doing this manually:
library(zipcodeR)
library(data.table)
T1 <- search_fips("17", "133")
T2 <- search_fips("17", "163")
T3 <- search_fips("29", "055")
TT <- list(T1, T2, T3)
CZ_zips <- rbindlist(TT, use.names=TRUE, fill=TRUE)
I would like to read the a and b columns into set places in the function to create a series of vectors or data frames that can then be bound into one longer table.
The search_fips() function pulls from the census FIPS data, where a = state and b = county. The package is zipcodeR.
One simple way is to wrap the search_fips() call in lapply; the rest stays the same.
library(zipcodeR)
a = c("17","17","29")
b = c("133","163","055")
df<-data.frame(a, b)
output <- lapply(1:nrow(df), function(i) {
  search_fips(df$a[i], df$b[i])
})
answer <- dplyr::bind_rows(output)
Here is a loop you might want to put in your function:
library(dplyr)
library(zipcodeR)
my_list <- list()
for (i in 1:nrow(df)) {
  my_list[[i]] <- search_fips(df$a[i], df$b[i])
}
new_df <- bind_rows(my_list)
new_df
Using rowwise
library(dplyr)
library(tidyr)
library(zipcodeR)
out <- df %>%
rowwise %>%
mutate(result = list(search_fips(a, b))) %>%
ungroup %>%
unnest(result)
Output:
> head(out, 2)
# A tibble: 2 × 26
a b zipcode zipcode_type major_city post_office_city common_city_list county state lat lng timezone radius_in_miles area_code_list
<chr> <chr> <chr> <chr> <chr> <chr> <blob> <chr> <chr> <dbl> <dbl> <chr> <dbl> <blob>
1 17 133 62236 Standard Columbia Columbia, IL <raw 20 B> Monroe County IL 38.4 -90.2 Central 7 <raw 15 B>
2 17 133 62244 Standard Fults Fults, IL <raw 17 B> Monroe County IL 38.2 -90.2 Central 7 <raw 15 B>
# … with 12 more variables: population <int>, population_density <dbl>, land_area_in_sqmi <dbl>, water_area_in_sqmi <dbl>, housing_units <int>,
# occupied_housing_units <int>, median_home_value <int>, median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>,
# bounds_north <dbl>, bounds_south <dbl>
Data:
df <- structure(list(a = c("17", "17", "29"), b = c("133", "163", "055"
)), class = "data.frame", row.names = c(NA, -3L))
Here is a solution with Map.
The two one-liners below are equivalent; the first is probably more readable, but the second one is simpler.
library(zipcodeR)
a <- c("17", "17", "29")
b <- c("133", "163", "055")
df <- data.frame(a, b)
Map(function(x, y) search_fips(x, y), df$a, df$b)
result <- Map(search_fips, df$a, df$b)
result <- dplyr::bind_rows(result)
head(result)
#> # A tibble: 6 x 24
#> zipcode zipcode_type major_city post_office_city common_city_list county state
#> <chr> <chr> <chr> <chr> <blob> <chr> <chr>
#> 1 62236 Standard Columbia Columbia, IL <raw 20 B> Monro~ IL
#> 2 62244 Standard Fults Fults, IL <raw 17 B> Monro~ IL
#> 3 62248 PO Box Hecker Hecker, IL <raw 18 B> Monro~ IL
#> 4 62256 PO Box Maeystown <NA> <raw 21 B> Monro~ IL
#> 5 62279 PO Box Renault Renault, IL <raw 19 B> Monro~ IL
#> 6 62295 Standard Valmeyer Valmeyer, IL <raw 20 B> Monro~ IL
#> # ... with 17 more variables: lat <dbl>, lng <dbl>, timezone <chr>,
#> # radius_in_miles <dbl>, area_code_list <blob>, population <int>,
#> # population_density <dbl>, land_area_in_sqmi <dbl>,
#> # water_area_in_sqmi <dbl>, housing_units <int>,
#> # occupied_housing_units <int>, median_home_value <int>,
#> # median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>,
#> # bounds_north <dbl>, bounds_south <dbl>
Created on 2022-02-18 by the reprex package (v2.0.1)
I am trying to scrape the table from the following page (https://www.coya.com/bike/fahrrad-index-2019), namely the bike-index values for 50 German cities (if you click "Alle Ergebnisse +", you will see all 50 cities).
I especially need some of the columns ("Bewertung spezielle Radwege & Qualität der Radwege", "Investitionen & Qualität der Infrastruktur", "Bewertung der Infrastruktur", "Fahrradsharing-Score", "Autofreier Tag", "Critical-Mass-Fahrradaktionen", "Event-Score").
This is what I tried:
library(rvest)
num_link="https://www.coya.com/bike/fahrrad-index-2019"
num_page= read_html(num_link)
xyc= num_page %>% html_nodes("._1200:nth-child(2)") %>% html_text()
I tried SelectorGadget, but unfortunately I get all the values of the table in one long string (str_split is challenging because the commas inside the numbers get mixed up with the commas between numbers):
"[1] "Ergebnisse für DeutschlandKriminalitätInfrastrukturFahrrad-SharingEvents#StadtLandSizeTotal Score1OldenburgDeutschlandK57,90,4271,94588,3594,4684,5227,153,0590,3454,1836,4515,0525,75N31,5216,2669,122MünsterDeutschlandK58,740,3910,53445,5883,0488,4328,1551,2388,0453,0535,522630,76N23,8412,4265,933Freiburg i. Breisg.DeutschlandK59,350,"
Could someone help me scrape the table, ideally extracting only the values of the specific columns listed above? I'm very thankful for any help or tips.
Thank you in advance.
(I am a newbie, please be gentle.)
Here is one way to solve the puzzle. The column headers on the page use a lot of icons, so I just leave the column names empty. You can create a vector of names and assign them manually using
names(table_content) <- names_vector
Here is the code:
library(rvest)
#> Loading required package: xml2
library(dplyr, warn.conflicts = FALSE)
library(purrr)
# Reuse the code from your question
num_link <- "https://www.coya.com/bike/fahrrad-index-2019"
num_page <- read_html(num_link)
# Start from the same nodes as your code, but go further down the tree
table_content <- num_page %>% html_nodes("._1200:nth-child(2)") %>%
  # Extract the node that contains the table
  html_nodes(css = ".w-dyn-list") %>%
  # Extract the nodes corresponding to each row
  html_nodes(css = ".bike-collection-item") %>%
  # Map a function that converts each row to a one-row table,
  # then bind the results into a single table
  map_dfr(function(x) {
    # suppress the message caused by feeding no column names into map_dfc
    suppressMessages(
      x %>% html_nodes(".td") %>%
        map_dfc(function(x) { x %>% html_text })
    )
  })
Here is the extracted content
#> # A tibble: 70 x 21
#> ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Olde… Deut… K 57,9 0,427 1,94 588,… 94,46 84,52 27,1 53,05 90,34
#> 2 2 Müns… Deut… K 58,74 0,391 0,53 445,… 83,04 88,43 28,15 51,23 88,04
#> 3 3 Frei… Deut… K 59,35 0,34 2,27 962,… 88,87 77,52 32,57 48,11 93,49
#> 4 4 Bamb… Deut… K 55,59 0,302 0 456,… 89,04 92,66 30,29 47,74 93,75
#> 5 5 Gött… Deut… K 62,66 0,28 3,07 379,… 92,8 80,99 23,03 48,07 89,18
#> 6 6 Heid… Deut… K 63,14 0,22 1,21 394,… 90,39 88,33 29,02 47,88 94,21
#> 7 7 Karl… Deut… K 57,39 0,25 4,23 725,… 90,35 71,62 18,75 46,33 93,93
#> 8 8 Brau… Deut… K 67,36 0,21 0 522,… 85,89 90,97 20,55 49,2 89,78
#> 9 9 Kons… Deut… K 62,77 0,22 4,6 121,… 93,62 76,98 23 48,49 94,09
#> 10 10 Brem… Deut… M 58,86 0,21 1,38 334,… 87,34 87,15 18,64 59,78 94,64
#> # … with 60 more rows, and 8 more variables: ...14 <chr>, ...15 <chr>,
#> # ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>, ...20 <chr>,
#> # ...21 <chr>
Created on 2021-04-08 by the reprex package (v1.0.0)
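If you want readable column names afterwards, you can build the names vector yourself and assign it as mentioned above. A minimal sketch; the names below are placeholders I made up (the table has 21 columns), not the site's actual labels:
# Illustrative only: replace with the 21 real column labels from the page
names_vector <- c("rank", "city", "country", "size",
                  paste0("score_", seq_len(17)))
names(table_content) <- names_vector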
You probably want to work on one table at a time, and then one column at a time, so you can then assemble a data frame. Try this, for example:
col1 <- num_page %>% html_nodes(paste0(".w-dyn-item :nth-child(2) div")) %>%
html_text()
The selector gadget is nifty but I usually need to experiment a lot to get the right selectors.
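Building on that, you could loop over the child positions and collect several columns into a data frame. A rough sketch, reusing num_page from above; the :nth-child() positions 2 to 8 are just a guess and will likely need adjusting against the page's actual markup:
library(rvest)
library(purrr)

cols <- map(2:8, function(i) {
  num_page %>%
    html_nodes(paste0(".w-dyn-item :nth-child(", i, ") div")) %>%
    html_text()
})
names(cols) <- paste0("col", 2:8)
# this assumes every selector returns the same number of rows
result <- as.data.frame(cols, stringsAsFactors = FALSE)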
I am new to httr. I am trying to use this geocoding API: https://geo.api.gouv.fr/adresse. I want to pass a CSV file directly from R, as given in their example:
curl -X POST -F data=#search.csv -F columns=adresse -F columns=postcode https://api-adresse.data.gouv.fr/search/csv
The example csv is here: https://adresse.data.gouv.fr/exemples/search.csv
I tried, without specifying the columns:
library(httr)
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
body = "data = #search.csv")
> test
Response [https://api-adresse.data.gouv.fr/search/csv/]
Date: 2021-02-09 21:27
Status: 400
Content-Type: application/json; charset=utf-8
Size: 66 B
Or
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
body = "data = #search.csv",
content_type("application/json"))
But I still get a 400 status. Specifying the whole file path did not work either. How does this work? I would like to get the JSON and read it into R.
Thanks in advance!
I am not sure you can request JSON back, but here is how you might do this with httr:
library(httr)
r <- POST(url = "https://api-adresse.data.gouv.fr/search/csv",
body = list(data = upload_file("search.csv"),
columns = "adresse",
columns = "postcode"))
content(r)
# # A tibble: 4 x 20
# nom adresse postcode city latitude longitude result_label result_score result_type result_id
# <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
# 1 Écol~ 6 Rue ~ 54600 Vill~ 48.7 6.15 6 Rue Alber~ 0.96 housenumber 54578_00~
# 2 Écol~ 6 Rue ~ 54500 Vand~ 48.7 6.15 6 Rue d’Aqu~ 0.96 housenumber 54547_00~
# 3 Écol~ 31 Rue~ 54180 Heil~ 48.6 6.21 31 Rue d’Ar~ 0.96 housenumber 54257_00~
# 4 Écol~ 1 bis ~ 54250 Cham~ 48.7 6.16 1 bis Rue d~ 0.95 housenumber 54115_01~
# # ... with 10 more variables: result_housenumber <chr>, result_name <chr>, result_street <lgl>,
# # result_postcode <dbl>, result_city <chr>, result_context <chr>, result_citycode <dbl>,
# # result_oldcitycode <lgl>, result_oldcity <lgl>, result_district <lgl>
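If you get a 400 again, checking the response status before parsing makes the failure easier to diagnose, for example:
stop_for_status(r)       # errors with an informative message on 4xx/5xx responses
http_status(r)$message   # human-readable status line
content(r)               # parses the returned CSV into a tibble, as shown above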
I'm using a webscraper to scrape some data from FinViz. Here's an example
The problem is that the data frame is messy: the first column holds what I would ideally want as the headers, and the second column holds the corresponding data. Here's the output:
data1 data2 data3 data4 data5 data6 data7 data8 data9 data10
1 Index S&P 500 P/E 36.13 EPS (ttm) 4.60 Insider Own 0.10% Shs Outstand 2.93B
2 Market Cap 487.15B Forward P/E 25.65 EPS next Y 6.48 Insider Trans -86.95% Shs Float 2.33B
3 Income 13.58B PEG 1.36 EPS next Q 1.27 Inst Own 72.50% Short Float 0.87%
4 Sales 33.17B P/S 14.69 EPS this Y 170.20% Inst Trans -0.22% Short Ratio 1.13
5 Book/sh 22.92 P/B 7.26 EPS next Y 21.63% ROA 20.30% Target Price 192.62
6 Cash/sh 12.10 P/C 13.74 EPS next 5Y 26.57% ROE 22.50% 52W Range 113.55 - 175.49
7 Dividend - P/FCF 34.05 EPS past 5Y 62.10% ROI 17.10% 52W High -5.23%
8 Dividend % - Quick Ratio 12.30 Sales past 5Y 49.40% Gross Margin 86.60% 52W Low 46.47%
9 Employees 20658 Current Ratio 12.30 Sales Q/Q 44.80% Oper. Margin 46.40% RSI (14) 49.05
10 Optionable Yes Debt/Eq 0.00 EPS Q/Q 68.80% Profit Margin 40.90% Rel Volume 0.70
11 Shortable Yes LT Debt/Eq 0.00 Earnings Jul 26 AMC Payout 0.00% Avg Volume 17.87M
12 Recom 1.70 SMA20 -1.84% SMA50 2.85% SMA200 17.52% Volume 12,583,873
As you can see, data1 contains the categories and data2 contains the corresponding values.
Ideally I'd want it in this structure:
Index | Market Cap | Income | Sales | Book sh | ...
------------------------------------------------
S&P500 | 487.15B | 13.58B | 33.17B | 22.92 |
So data1, 3, 5, 7 (and so on) would all become headers, and data2, 4, 6, 8 would all end up in one row.
Could anyone provide some input? I'm trying to avoid splitting them into two separate vectors and then rbinding the frame back together.
Cheerio!
Here is a solution using some tidyverse packages and your dataset.
library(rvest) # for scraping the data
#> Loading required package: xml2
library(dplyr, warn.conflicts = F)
library(tidyr)
library(purrr, warn.conflicts = F)
First, we get your data directly from your example URL.
tab <- read_html("http://finviz.com/quote.ashx?t=BA") %>%
html_node("table.snapshot-table2") %>%
html_table(header = F) %>%
as_data_frame()
tab
#> # A tibble: 12 x 12
#> X1 X2 X3 X4 X5 X6
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Index DJIA S&P500 P/E 20.77 EPS (ttm) 11.42
#> 2 Market Cap 141.89B Forward P/E 22.14 EPS next Y 10.71
#> 3 Income 7.12B PEG 1.13 EPS next Q 2.62
#> 4 Sales 90.90B P/S 1.56 EPS this Y 2.30%
#> 5 Book/sh -3.34 P/B - EPS next Y 7.28%
#> 6 Cash/sh 17.26 P/C 13.74 EPS next 5Y 18.36%
#> 7 Dividend 5.68 P/FCF 17.94 EPS past 5Y 7.40%
#> 8 Dividend % 2.39% Quick Ratio 0.40 Sales past 5Y 6.60%
#> 9 Employees 150500 Current Ratio 1.20 Sales Q/Q -8.10%
#> 10 Optionable Yes Debt/Eq - EPS Q/Q 885.50%
#> 11 Shortable Yes LT Debt/Eq - Earnings Jul 26 BMO
#> 12 Recom 2.20 SMA20 -0.16% SMA50 8.14%
#> # ... with 6 more variables: X7 <chr>, X8 <chr>, X9 <chr>, X10 <chr>,
#> # X11 <chr>, X12 <chr>
As the headers are in every odd column and the data in every even column, we create a tidy two-column tibble by row-binding the subsets. For that, we generate the odd and even column indices. Then purrr::map2_dfr iterates over those two index vectors in parallel, applies a function, and row-binds the results. The function selects two columns of the table with [ and renames them with set_names().
col_num <- seq_len(ncol(tab))
even <- col_num[col_num %% 2 == 0]
odd <- setdiff(col_num, even)
tab2 <- map2_dfr(odd, even, ~ set_names(tab[, c(.x, .y)], c("header", "value")))
tab2
#> # A tibble: 72 x 2
#> header value
#> <chr> <chr>
#> 1 Index DJIA S&P500
#> 2 Market Cap 141.89B
#> 3 Income 7.12B
#> 4 Sales 90.90B
#> 5 Book/sh -3.34
#> 6 Cash/sh 17.26
#> 7 Dividend 5.68
#> 8 Dividend % 2.39%
#> 9 Employees 150500
#> 10 Optionable Yes
#> # ... with 62 more rows
You now have a nice two-column long table with all your data. If you want the table in wide format instead of long format, you have to spread it. But first, we have to deal with some duplicate names in the header column: you can't have duplicate column names.
tab2 %>%
filter(header == header[duplicated(header)])
#> # A tibble: 2 x 2
#> header value
#> <chr> <chr>
#> 1 EPS next Y 10.71
#> 2 EPS next Y 7.28%
We just rename the second occurrence by appending _2:
tab3 <- tab2 %>%
mutate(header = case_when(
duplicated(header) ~ paste(header, 2, sep = "_"),
TRUE ~ header)
)
# No more duplicates
any(duplicated(tab3$header))
#> [1] FALSE
tab3 %>% filter(stringr::str_detect(header, "EPS next Y"))
#> # A tibble: 2 x 2
#> header value
#> <chr> <chr>
#> 1 EPS next Y 10.71
#> 2 EPS next Y_2 7.28%
You can then spread to wide format and get 72 columns instead of 72 rows:
tab3 %>%
spread(header, value)
#> # A tibble: 1 x 72
#> `52W High` `52W Low` `52W Range` ATR `Avg Volume` Beta `Book/sh`
#> * <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -3.78% 87.78% 126.31 - 246.49 3.77 3.46M 1.18 -3.34
#> # ... with 65 more variables: `Cash/sh` <chr>, Change <chr>, `Current
#> # Ratio` <chr>, `Debt/Eq` <chr>, Dividend <chr>, `Dividend %` <chr>,
#> # Earnings <chr>, Employees <chr>, `EPS (ttm)` <chr>, `EPS next
#> # 5Y` <chr>, `EPS next Q` <chr>, `EPS next Y` <chr>, `EPS next
#> # Y_2` <chr>, `EPS past 5Y` <chr>, `EPS Q/Q` <chr>, `EPS this Y` <chr>,
#> # `Forward P/E` <chr>, `Gross Margin` <chr>, Income <chr>, Index <chr>,
#> # `Insider Own` <chr>, `Insider Trans` <chr>, `Inst Own` <chr>, `Inst
#> # Trans` <chr>, `LT Debt/Eq` <chr>, `Market Cap` <chr>, `Oper.
#> # Margin` <chr>, Optionable <chr>, `P/B` <chr>, `P/C` <chr>,
#> # `P/E` <chr>, `P/FCF` <chr>, `P/S` <chr>, Payout <chr>, PEG <chr>,
#> # `Perf Half Y` <chr>, `Perf Month` <chr>, `Perf Quarter` <chr>, `Perf
#> # Week` <chr>, `Perf Year` <chr>, `Perf YTD` <chr>, `Prev Close` <chr>,
#> # Price <chr>, `Profit Margin` <chr>, `Quick Ratio` <chr>, Recom <chr>,
#> # `Rel Volume` <chr>, ROA <chr>, ROE <chr>, ROI <chr>, `RSI (14)` <chr>,
#> # Sales <chr>, `Sales past 5Y` <chr>, `Sales Q/Q` <chr>, `Short
#> # Float` <chr>, `Short Ratio` <chr>, Shortable <chr>, `Shs Float` <chr>,
#> # `Shs Outstand` <chr>, SMA20 <chr>, SMA200 <chr>, SMA50 <chr>, `Target
#> # Price` <chr>, Volatility <chr>, Volume <chr>
Idea: you can also replace all the spaces with _ in the header column to get column names without spaces, which are often simpler to handle.
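For example, a small sketch applied before spreading:
tab3 %>%
  mutate(header = stringr::str_replace_all(header, " ", "_")) %>%
  spread(header, value)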
Would this work?
data <- data.frame(data1= letters[1:10],data2=LETTERS[1:10],data3= letters[11:20],data4=LETTERS[11:20],stringsAsFactors=F)
# data1 data2 data3 data4
# 1 a A k K
# 2 b B l L
# 3 c C m M
# 4 d D n N
# 5 e E o O
# 6 f F p P
# 7 g G q Q
# 8 h H r R
# 9 i I s S
# 10 j J t T
output <- setNames(
  data.frame(t(unlist(data[!as.logical(seq_along(data) %% 2)]))),
  unlist(data[as.logical(seq_along(data) %% 2)])
)
# a b c d e f g h i j k l m n o p q r s t
# 1 A B C D E F G H I J K L M N O P Q R S T
You can try:
library(data.table); library(dplyr)
table1 <- df[, 1:2] %>% as.data.table() %>% dcast.data.table(. ~ data1, value.var = "data2")
table2 <- df[, 3:4] %>% as.data.table() %>% dcast.data.table(. ~ data3, value.var = "data4")
cbind(table1, table2)
and so on for the rest of the column pairs.
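To cover all the column pairs at once instead of repeating the dcast for each pair, here is one possible sketch. It assumes df is the scraped FinViz data frame from the question, with the headers in the odd columns (data1, data3, ...) and the values in the even columns (data2, data4, ...):
library(data.table)

key_cols <- names(df)[c(TRUE, FALSE)]   # data1, data3, ...
val_cols <- names(df)[c(FALSE, TRUE)]   # data2, data4, ...

tables <- lapply(seq_along(key_cols), function(j) {
  k <- key_cols[j]
  v <- val_cols[j]
  # cast one header/value pair to a single wide row, then drop the "." helper column
  dcast.data.table(as.data.table(df[, c(k, v)]),
                   as.formula(paste(". ~", k)),
                   value.var = v)[, "." := NULL]
})
result <- do.call(cbind, tables)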
I am trying to add commas for thousands in my data e.g. 10,000 along with dollars e.g. $10,000.
I'm using several dplyr commands along with tidyr gather and spread functions. Here's what I tried:
Cut and paste this code block to generate the random data ("dataset") I'm working with:
library(dplyr)
library(tidyr)
library(lubridate)
## Generate some data
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Direct", "Email")
last_month <- Sys.Date() %m+% months(-1) %>% floor_date("month")
mts <- seq(from = last_month %m+% months(-23), to = last_month, by = "1 month") %>% as.Date()
dimvars <- expand.grid(Month = mts, Channel = channels, stringsAsFactors = FALSE)
# metrics
rws <- nrow(dimvars)
set.seed(42)
# generates variablility in the random data
randwalk <- function(initial_val, ...){
initial_val + cumsum(rnorm(...))
}
Sessions <- ceiling(randwalk(3000, n = rws, mean = 8, sd = 1500)) %>% abs()
Revenue <- ceiling(randwalk(10000, n = rws, mean = 0, sd = 3500)) %>% abs()
# make primary df
dataset <- cbind(dimvars, Revenue)
Which looks like:
> tbl_df(dataset)
# A tibble: 144 × 3
Month Channel Revenue
<date> <chr> <dbl>
1 2015-06-01 Facebook 8552
2 2015-07-01 Facebook 12449
3 2015-08-01 Facebook 10765
4 2015-09-01 Facebook 9249
5 2015-10-01 Facebook 11688
6 2015-11-01 Facebook 7991
7 2015-12-01 Facebook 7849
8 2016-01-01 Facebook 2418
9 2016-02-01 Facebook 6503
10 2016-03-01 Facebook 5545
# ... with 134 more rows
Now I want to spread the months into columns to show revenue trend by channel, month over month. I can do that like so:
revenueTable <- dataset %>% select(Month, Channel, Revenue) %>%
group_by(Month, Channel) %>%
summarise(Revenue = sum(Revenue)) %>%
#mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
gather(Key, Value, -Channel, -Month) %>%
spread(Month, Value) %>%
select(-Key)
And it looks almost exactly as I want:
> revenueTable
# A tibble: 6 × 25
Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01` `2015-10-01` `2015-11-01` `2015-12-01` `2016-01-01` `2016-02-01` `2016-03-01` `2016-04-01`
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Direct 11910 8417 4012 359 4473 2702 6261 6167 8630 5230 1394
2 Email 7244 3517 671 1339 10788 10575 8567 8406 7856 6345 7733
3 Facebook 8552 12449 10765 9249 11688 7991 7849 2418 6503 5545 3908
4 Organic 4191 978 219 4274 2924 4155 5981 9719 8220 8829 7024
5 SEM 2344 6873 10230 6429 5016 2964 3390 3841 3163 1994 2105
6 Youtube 186 2949 2144 5073 1035 4878 7905 7377 2305 4556 6247
# ... with 13 more variables: `2016-05-01` <dbl>, `2016-06-01` <dbl>, `2016-07-01` <dbl>, `2016-08-01` <dbl>, `2016-09-01` <dbl>, `2016-10-01` <dbl>,
# `2016-11-01` <dbl>, `2016-12-01` <dbl>, `2017-01-01` <dbl>, `2017-02-01` <dbl>, `2017-03-01` <dbl>, `2017-04-01` <dbl>, `2017-05-01` <dbl>
Now for the part I'm struggling with: I would like to format the data as currency. I tried adding this in between summarise() and gather() within the chain:
mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
This half works: the dollar sign is prepended, but the comma separators do not show. I tried removing the paste0("$", ...) part to see if I could get the comma formatting to work on its own, with no success.
How can I format my tbl as a currency with dollars and commas, rounded to nearest whole dollars (no $1.99, just $2)?
I think you can just do this at the end with dplyr::mutate_at().
revenueTable %>% mutate_at(vars(-Channel), funs(. %>% round(0) %>% scales::dollar()))
#> # A tibble: 6 x 25
#> Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Direct $11,910 $8,417 $4,012 $359
#> 2 Email $7,244 $3,517 $671 $1,339
#> 3 Facebook $8,552 $12,449 $10,765 $9,249
#> 4 Organic $4,191 $978 $219 $4,274
#> 5 SEM $2,344 $6,873 $10,230 $6,429
#> 6 Youtube $186 $2,949 $2,144 $5,073
#> # ... with 20 more variables: `2015-10-01` <chr>, `2015-11-01` <chr>,
#> # `2015-12-01` <chr>, `2016-01-01` <chr>, `2016-02-01` <chr>,
#> # `2016-03-01` <chr>, `2016-04-01` <chr>, `2016-05-01` <chr>,
#> # `2016-06-01` <chr>, `2016-07-01` <chr>, `2016-08-01` <chr>,
#> # `2016-09-01` <chr>, `2016-10-01` <chr>, `2016-11-01` <chr>,
#> # `2016-12-01` <chr>, `2017-01-01` <chr>, `2017-02-01` <chr>,
#> # `2017-03-01` <chr>, `2017-04-01` <chr>, `2017-05-01` <chr>
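On newer dplyr versions, where funs() is deprecated, the same idea can be written with across(); a small sketch using the same revenueTable:
revenueTable %>%
  mutate(across(-Channel, ~ scales::dollar(round(.x))))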
We can use data.table
library(data.table)
nm1 <- setdiff(names(revenueTable), 'Channel')
setDT(revenueTable)[, (nm1) := lapply(.SD, function(x)
scales::dollar(round(x))), .SDcols = nm1]
revenueTable[, 1:3, with = FALSE]
# Channel `2015-06-01` `2015-07-01`
#1: Direct $11,910 $8,417
#2: Email $7,244 $3,517
#3: Facebook $8,552 $12,449
#4: Organic $4,191 $978
#5: SEM $2,344 $6,873
#6: Youtube $186 $2,949