Having issues with creating a gt table in R

I get the following error: "Error: Don't know how to select rows using an object of class quosures"
I have narrowed it down to one specific section of code in the R Markdown file. If I highlight from Sum_Adjusted through "seriesttls") and run the selected lines, there are no errors. It is the tab_row_group() call that throws it off.
Sum_Adjusted <- Adjusted %>%
  gt(rowname_col = "seriesttls") %>%
  tab_row_group(group = "Super Sectors",
                rows = vars("Mining and logging", "Construction", "Manufacturing",
                            "Trade, transportation, and utilities", "Information",
                            "Financial activities", "Professional and business services",
                            "Education and health services", "Leisure and hospitality",
                            "Other services", "Government"))
I am hoping that just from looking at the code someone can explain why I am getting this error now and not the last dozen times I have run this exact same code. I have not been able to reproduce this in a smaller example.
str(Adjusted) produces this
tibble [12 x 7] (S3: tbl_df/tbl/data.frame)
$ seriesttls : chr [1:12] "Total nonfarm" "Mining and logging" "Construction" "Manufacturing" ...
$ empces : num [1:12] 1335900 15000 91000 60200 282200 ...
$ MonthlyDifference: num [1:12] 4800 0 -1000 0 900 600 500 1500 0 1800 ...
$ AnnualDifference : num [1:12] 100900 200 -1900 5200 30600 ...
$ PercentGrowthRate: num [1:12] 0.0817 0.0135 -0.0205 0.0945 0.1216 ...
$ Max : num [1:12] 1442800 15800 146400 60300 282200 ...
$ oftotal : num [1:12] 1 0.0112 0.0681 0.0451 0.2112 ...

The issue is that rows should be a vector, so instead of vars() (which is probably deprecated here), use the standard select helpers or just combine the row captions with c(), as described in the documentation for tab_row_group():
rows — The rows to be made components of the row group. Can either be a vector of row captions provided in c(), a vector of row indices, or a helper function focused on selections. The select helper functions are: starts_with(), ends_with(), contains(), matches(), one_of(), and everything().
vars() produces a list of quosures, and gt no longer knows how to select rows from that, which is exactly what the error says. Note also that in current versions of gt the first argument of tab_row_group() is label rather than group. We could do:
library(gt)
library(dplyr)

Adjusted %>%
  gt(rowname_col = "seriesttls") %>%
  tab_row_group(label = "Super Sectors",
                rows = c("Mining and logging", "Construction", "Manufacturing",
                         "Trade, transportation, and utilities", "Information",
                         "Financial activities", "Professional and business services",
                         "Education and health services", "Leisure and hospitality",
                         "Other services", "Government"))
Using a reproducible example with gtcars:
gtcars %>%
  dplyr::select(model, year, hp, trq) %>%
  dplyr::slice(1:8) %>%
  gt(rowname_col = "model") %>%
  tab_row_group(
    label = "numbered",
    rows = c("GT", "California"))
Output: the "GT" and "California" rows are grouped under the "numbered" row group label.
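If I'm reading the quoted documentation right, one of the select helpers should also work in place of spelling out every caption. A minimal sketch (the "Ferrari 458s" label and the matches() pattern are illustrative assumptions, not from the original question):

gtcars %>%
  dplyr::select(model, year, hp, trq) %>%
  dplyr::slice(1:8) %>%
  gt(rowname_col = "model") %>%
  tab_row_group(
    label = "Ferrari 458s",      # hypothetical group label
    rows = matches("^458"))      # select rows whose caption (model) starts with "458"

Here matches() is applied to the row captions taken from the model column, per the documentation quoted above.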

Related

R Function behaves differently than the code entered line by line

I am at a loss. Googling has failed me because I'm not sure I know the right question to ask.
I have a data frame (df1) and my goal is to use a function to get a moving average using forecast::ma.
Here is str(df1)
'data.frame': 934334 obs. of 6 variables:
$ clname : chr ...
$ dos : Date, format: "2011-10-05" ...
$ subpCode: chr
$ ch1 : chr "
$ prov : chr
$ ledger : chr
Here is the function that I am trying to write:
process <- function(df, y, sub, ...) {
  prog <- df %>%
    filter(subpCode == sub) %>%
    group_by(dos, subpCode) %>%
    summarise(services = n())
  prog$count_ts <- ts(prog[ , c('services')])
}
The problem is that when I run the function, my final result is a data object that is 1x1798 and it's just a time series. If I go and run the code line by line I get what I need, but my function, which hypothetically does the same thing, won't work.
Here is my desired result
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1718 obs. of 4 variables:
$ dos : Date, format: "2010-09-21" "2010-11-18" "2010-11-19" "2010-11-30" ...
$ subpCode: chr "CII " "CII " "CII " "CII " ...
$ services: int 1 1 2 2 2 2 1 2 1 3 ...
$ count_ts: Time-Series [1:1718, 1] from 1 to 1718: 1 1 2 2 2 2 1 2 1 3 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "services"
- attr(*, "vars")= chr "dos"
- attr(*, "drop")= logi TRU
And here is the code that gets it.
CII <- df1 %>%
  filter(subpCode == "CII ") %>%
  group_by(dos, subpCode) %>%
  summarise(services = n())
CII$count_ts <- ts(CII[ , c('services')])
Could someone point me in the right direction? I've exhausted my usual places.
Thanks!
Following the vignette pointed out by @CalumYou, you should use something more like this:
process <- function(df, sub) {
  ## Capture `sub` so it can be referred to inside dplyr verbs
  sub <- enquo(sub)
  ## Filter to the requested subpCode, then count services per day
  prog <- df %>%
    filter(subpCode == !!sub) %>%
    group_by(dos, subpCode) %>%
    summarise(services = n())
  prog$count_ts <- ts(prog[ , c('services')])
  ## Return the whole data frame, not just the value of the last assignment
  return(prog)
}
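A quick usage sketch, assuming df1 and the "CII " subprogram code from the question (the 7-point moving-average window is an illustrative assumption, not something given in the question):

library(dplyr)

## Build the daily counts plus the count_ts column for one subprogram code
CII <- process(df1, sub = "CII ")
str(CII)  # should now be the grouped data frame, not a bare time series

## The original goal was a moving average via forecast::ma;
## order = 7 is an assumed window width
CII$ma7 <- forecast::ma(CII$services, order = 7)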

How to add new values to a new column in a dataframe in R?

I need to do some computations with the data stored in a data frame and put the result into a new column of this data frame.
Initial dataframe:
> str(mydf)
'data.frame': 1122 obs. of 6 variables:
$ MMSI : num 2.73e+08 2.73e+08 2.73e+08 2.73e+08 2.73e+08 ...
$ MMSI.1 : num 2.73e+08 2.72e+08 2.72e+08 2.72e+08 6.67e+08 ...
$ LATITUDE : num 46.9 46.9 46.9 46.9 46.9 ...
$ LONGITUDE : num 32 32 32 32 32 ...
$ LATITUDE.1 : num 46.9 46.9 46.9 46.9 46.9 ...
$ LONGITUDE.1: num 32 32 32 32 32 ...
Now I need to add a new column which contains the result of an operation on the data of the current row.
Running the next code:
library(geosphere)
mydf$distance <- with(mydf, distGeo(c(mydf$LONGITUDE, mydf$LATITUDE),
                                    c(mydf$LONGITUDE, mydf$LATITUDE)))
Error in .pointsToMatrix(p1) : Wrong length for a vector, should be 2
I understand that the structure of the data passed to distGeo should be different.
How do I fix this error, or how do I change the code to get the distance between the points in a new column?
Without having data to look at, it looks like you are trying to calculate the distance from a point to itself. The second point should likely use the columns ending in .1:
library(geosphere)
mydf$distance <- with(mydf, distGeo(c(LONGITUDE, LATITUDE), c(LONGITUDE.1, LATITUDE.1)))
Update
It looks like the error is that you're passing the entire data frame instead of each row individually. Try this:
apply(mydf, 1, function(x) distGeo(x[c("LONGITUDE","LATITUDE")],x[c("LONGITUDE.1","LATITUDE.1")]))
Or just pass the specific data columns to the function, as it accepts a matrix:
distGeo(mydf[,c("LONGITUDE", "LATITUDE")], mydf[,c("LONGITUDE.1", "LATITUDE.1")])
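A minimal check with made-up coordinates (the two rows below are illustrative, not the question's data):

library(geosphere)

## Two rows: the first pair of points is about 7.6 km apart east-west at this latitude,
## the second pair is identical, so its distance should be 0
pts <- data.frame(LONGITUDE   = c(32.0, 32.0), LATITUDE   = c(46.9, 46.9),
                  LONGITUDE.1 = c(32.1, 32.0), LATITUDE.1 = c(46.9, 46.9))

## Vectorised call: one distance (in metres) per row
distGeo(pts[, c("LONGITUDE", "LATITUDE")], pts[, c("LONGITUDE.1", "LATITUDE.1")])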

Dynamically Changing Data Type for a Data Frame

I have a set of data frames belonging to many countries, each consisting of 3 variables (year, AI, OAD). The example for Zimbabwe is shown below:
>str(dframe_Zimbabwe_1955_1970)
'data.frame': 16 obs. of 3 variables:
$ year: chr "1955" "1956" "1957" "1958" ...
$ AI : chr "11.61568161" "11.34114927" "11.23639317" "11.18841409" ...
$ OAD : chr "5.740789488" "5.775882473" "5.800441036" "5.822536579" ...
I am trying to change the data types of the variables in the data frame to the below, so that I can model the linear fit using lm(dframe_Zimbabwe_1955_1970$AI ~ dframe_Zimbabwe_1955_1970$year).
>str(dframe_Zimbabwe_1955_1970)
'data.frame': 16 obs. of 3 variables:
$ year: int 1955 1956 1957 1958 ...
$ AI : num 11.61568161 11.34114927 11.23639317 11.18841409 ...
$ OAD : num 5.740789488 5.775882473 5.800441036 5.822536579 ...
The static code below is able to change AI from character (chr) to numeric (num):
dframe_Zimbabwe_1955_1970$AI <- as.numeric(dframe_Zimbabwe_1955_1970$AI)
However, when I tried to automate the code as below, AI still remains character (chr):
countries <- c('Zimbabwe', 'Afghanistan', ...)
for (country in countries) {
  assign(paste('dframe_', country, '_1955_1970$AI', sep=''),
         eval(parse(text = paste('as.numeric(dframe_', country, '_1955_1970$AI)', sep=''))))
}
Can you advise what I could have done wrong?
Thanks.
42: Your code doesn't work as written, but with some edits it will. In addition to the missing parentheses and wrong sep, you can't use $'column name' in assign, but you don't need it anyway:
for (country in countries) {
  new_val <- get(paste('dframe_', country, '_1955_1970', sep=''))
  new_val[] <- lapply(new_val, as.numeric)  # the '[]' on the LHS keeps it a data frame
  assign(paste('dframe_', country, '_1955_1970', sep=''), new_val)
  remove(new_val)
}
Proof that it works:
dframe_Zimbabwe_1955_1970 <- data.frame(year = c("1955", "1956", "1957"),
                                        AI   = c("11.61568161", "11.34114927", "11.23639317"),
                                        OAD  = c("5.740789488", "5.775882473", "5.800441036"),
                                        stringsAsFactors = FALSE)
str(dframe_Zimbabwe_1955_1970)
'data.frame': 3 obs. of 3 variables:
$ year: chr "1955" "1956" "1957"
$ AI : chr "11.61568161" "11.34114927" "11.23639317"
$ OAD : chr "5.740789488" "5.775882473" "5.800441036"
countries <- 'Zimbabwe'
for (country in countries) {
  new_val <- get(paste('dframe_', country, '_1955_1970', sep=''))
  new_val[] <- lapply(new_val, as.numeric)  # the '[]' on the LHS keeps it a data frame
  assign(paste('dframe_', country, '_1955_1970', sep=''), new_val)
  remove(new_val)
}
str(dframe_Zimbabwe_1955_1970)
'data.frame': 3 obs. of 3 variables:
$ year: num 1955 1956 1957
$ AI : num 11.6 11.3 11.2
$ OAD : num 5.74 5.78 5.8
It's going to be considered fairly ugly code by the purists, but perhaps this:
for (country in countries) {
  new_val <- get(paste('dframe_', country, '_1955_1970', sep=''))
  new_val[] <- lapply(new_val, as.numeric)  # the '[]' on the LHS keeps it a data frame
  assign(paste('dframe_', country, '_1955_1970', sep=''), new_val)
}
Using the get('obj_name') function is considered cleaner than eval(parse(text=...)). It would be handled more naturally in R had you assembled these data frames in a list, as sketched below.
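For illustration, a minimal sketch of the list-based approach (the list name dframes is a hypothetical name, not from the question):

## Keep the per-country data frames in one named list instead of separate objects
dframes <- list(Zimbabwe = dframe_Zimbabwe_1955_1970)  # add the other countries here

## Convert every column of every data frame to numeric in one pass
dframes <- lapply(dframes, function(d) {
  d[] <- lapply(d, as.numeric)
  d
})

str(dframes$Zimbabwe)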

How to speed up code with loop in R

Problem:
I have two data frames.
A data frame with the payment log:
str(moneyDB)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 53682 obs. of 7 variables:
$ user_id : num 59017170 57859746 58507536 59017667 59017795 ...
$ reg_date: Date, format: "2016-08-06" "2016-07-01" "2016-07-19" ...
$ date : Date, format: "2016-08-06" "2016-07-01" "2016-07-19" ...
$ money : num 0.293 0.05 0.03 0.03 7 ...
$ type : chr "1" "2" "2" "1" ...
$ quality : chr "VG" "no_quality" "no_quality" "VG" ...
$ geo : chr "Canada" "NO GEO" "NO GEO" "Canada" ...
That is its structure; it's just a log of all transactions.
I also have a second data frame:
str(grPaysDB)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 335591 obs. of 9 variables:
$ reg_date : Date, format: "2016-05-01" "2016-05-01" "2016-05-01" ...
$ date : Date, format: "2016-05-01" "2016-05-01" "2016-05-01" ...
$ type : chr "1" "1" "1" "1" ...
$ quality : chr "VG" "VG" "VG" "VG" ...
$ geo : chr "Australia" "Canada" "Finland" "Canada" ...
$ uniqPayers : num 0 1 0 1 1 0 0 1 0 3 ...
It is the grouped data from the first data frame, plus zero-transaction rows. For example, there are a lot of rows in the second data frame with zero payers. That's why the second data frame is larger than the first.
I need to add a column weeklyPayers to the second data frame, where weekly payers is the number of unique payers over the last 7 days. I tried to do it via a loop, but it takes too long. Are there any vectorized ideas for how to do this?
weeklyPayers <- vector()
for (i in 1:nrow(grPaysDB)) {
  temp <- moneyDB %>%
    filter(
      geo == grPaysDB$geo[i],
      reg_date == grPaysDB$reg_date[i],
      quality == grPaysDB$quality[i],
      type == grPaysDB$type[i],
      between(date, grPaysDB$date[i] - 6, grPaysDB$date[i])
    )
  weeklyPayers <- c(weeklyPayers, length(unique(temp$user_id)))
}
grPaysDB <- cbind(grPaysDB, weeklyPayers)
In this loop, for each row in the second data frame, I find the rows in the first data frame with the right geo, type, quality, and reg_date and within the range of dates. Then I can calculate the number of unique payers.
I may be misunderstanding, but I think this should be fairly simple using filter and summarise in dplyr. However, as @Hack-R mentioned, it would be helpful to have your dataset. It would look something like:
library(dplyr)
weeklyPayers <- grPaysDB %>%
  filter(date > ADD DATE IN QUESTION) %>%
  summarise(sumWeeklyPayers = sum(uniqPayers))
Then again, I may well have misunderstood. If your question involves summing for each week, then you may want to investigate daily2weekly in the timeSeries package and then use group_by on the weekly variable that results.
I would try joining your datasets using merge on multiple columns (c('geo', 'reg_date', 'quality', 'type')) and filtering the result based on the dates. After that, aggregate using summarise, for example as sketched below.
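A rough sketch of that join-then-aggregate idea (untested without your data; the column names are taken from the question's str() output):

library(dplyr)

weekly <- merge(grPaysDB, moneyDB,
                by = c("geo", "reg_date", "quality", "type")) %>%
  ## keep only payments in the 7 days up to each grPaysDB row's date
  filter(between(date.y, date.x - 6, date.x)) %>%
  group_by(geo, reg_date, quality, type, date = date.x) %>%
  summarise(weeklyPayers = n_distinct(user_id))

Note that an inner merge drops the zero-payer groups from your second data frame, so those would need to be added back afterwards (e.g. with a left join against grPaysDB).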
But I am not completely sure why you want to add the weekly payers to every transaction. Isn't it more informative, or easier, to aggregate your data by week number (with dplyr)? Like so:
moneyDB %>%
  mutate(week = date - as.POSIXlt(date)$wday) %>%   # start of the week for each payment date
  group_by(geo, reg_date, quality, type, week) %>%
  summarise(weeklyPayers = n_distinct(user_id))     # count unique payers per week

replacement has x rows, data has y - paste() function

I am trying to group the following sample values by latitude and longitude:
latitude  | longitude | TotalGreenhouseGases | Amount   | Branch           | End Date
-37.80144 | 144.95402 | 42965.9868           | 32549.99 | Arts and Culture | 07/31/2013 12:00:00 AM
-37.80144 | 144.95402 | 43246.6716           | 32762.63 | Arts and Culture | 08/30/2013 12:00:00 AM
-37.80144 | 144.95402 | 21374.1264           | 16192.52 | Arts and Culture | 09/31/2013 12:00:00 AM
mapdata <- aggregate(cbind(TotalGreenhouseGases, Amount) ~ latitude + longitude,
                     data = dt2,
                     FUN = function(dt2) c(mn = sum(dt2), n = length(dt2)))
163 obs. and 4 variables are created as a result. Now, to plot it on a map using plotly, I am trying to add text for hovering:
mapdata$hover <- paste(mapdata$TotalGreenhouseGases, "CO2 Emission ", '<br>',
                       "Resource Consumption ", mapdata$Amount)
but this results in the following error:
Error in `$<-.data.frame`(`*tmp*`, "hover", value = c("264.06428571 CO2 Emission <br> Resource Consumption 200", :
replacement has 326 rows, data has 163
Can anyone let me know where I am going wrong? Or, if this has been solved before, can you please provide a link to that?
I think the problem is that the way you created mapdata, you end up with a two-column matrix for both TotalGreenhouseGases and Amount.
> str(mapdata)
'data.frame': 1 obs. of 5 variables:
$ latitude : num -37.8
$ longitude : num 145
$ TotalGreenhouseGases: num [1, 1:2] 107587 3
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ Amount : num [1, 1:2] 81505 3
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
So if you want to use the sum (mn) of these values in your paste() call, you will need [1] indexing; if you need the sample size n, use [2]. (With your full 163-row result, that matrix column is exactly what makes the replacement 326 = 163 x 2 values long, and you would index the column instead, e.g. [, "mn"].) For example:
mapdata$hover <- paste(mapdata$TotalGreenhouseGases[1],
                       "CO2 Emission ", '<br>', "Resource Consumption ",
                       mapdata$Amount[1])
will give you
[1] "107586.7848 CO2 Emission <br> Resource Consumption 81505.14"
