I have 4 years of experience using R, but I am very new to the Big Data game as I have always worked on csv files.
It is thrilling to manipulate large amounts of data from a distance, but also somewhat frustrating, as simple things you were used to have to be re-engineered.
The task I am struggling with right now is getting a basic five-number summary of a variable:
summary(df$X)
Some context: I am connected to Impala, and these lines of code work fine:
library(dbplyr)
localTable <- tbl(con, 'serverTable')
localTable %>% tally()
localTable %>% filter(X > 10) %>% tally()
If I just write
localTable
instead, RStudio gets stuck or takes a very long time, so I kill it with the task manager.
Coming back to my current question, I tried to get a five-number summary in these ways:
summary(localTable$X) #returns Length 0, Class NULL, Mode NULL
localTable %>% fivenum(X) #returns Error in rank(x, ties.method = "min", na.last = "keep") : unimplemented type 'list' in 'greater'
I also tried building a custom summary() with summarise:
localTable %>% summarize(Min = min(X),
                         Q1 = quantile(X, .25),
                         Avg = mean(X),
                         Q3 = quantile(X, .75),
                         Max = max(X))
which returns a SYNTAX ERROR.
My guess is that there is a very trivial missing link between my code and the server, in the form of a data structure, but I can't figure out what it is.
I also tried to save localTable$X to an in-memory variable with
XL <- localTable$X
but I always get a NULL
On the graphical side, using dbplot, if I try
library(dbplot)
localTable %>% dbplot_histogram(X)
I get an empty graphic.
I thought about leveraging the five-number summary computed by the boxplot function, much like ggplot_build(object)$data, but with dbplot_boxplot I get the error could not find function "dbplot_boxplot".
I started using dbplyr as I am quite fluent with dplyr and I don't want to write SQL queries with DBI::dbGetQuery, but feel free to suggest other packages such as implyr or sparklyr, as well as tutorials on the subject at large, since the ones I found are quite basic.
EDIT:
as requested in a comment, I add the result of
str(localTable)
which is
List of 2
$ src:List of 2
..$ con :Formal class 'Impala' [package ".GlobalEnv"] with 4 slots
.. .. ..@ ptr :<externalptr>
.. .. ..@ quote : chr "`"
.. .. ..@ info :List of 15
.. .. .. ..$ dbname : chr "IMPALA"
.. .. .. ..$ dbms.name : chr "Impala"
.. .. .. ..$ db.version : chr "2.9.0-cdh5.12.1"
.. .. .. ..$ username : chr "User"
.. .. .. ..$ host : chr ""
.. .. .. ..$ port : chr ""
.. .. .. ..$ sourcename : chr "impala connector"
.. .. .. ..$ servername : chr "Impala"
.. .. .. ..$ drivername : chr "Cloudera ODBC Driver for Impala"
.. .. .. ..$ odbc.version : chr "03.80.0000"
.. .. .. ..$ driver.version : chr "2.6.11.1011"
.. .. .. ..$ odbcdriver.version : chr "03.80"
.. .. .. ..$ supports.transactions : logi FALSE
.. .. .. ..$ getdata.extensions.any_column: logi TRUE
.. .. .. ..$ getdata.extensions.any_order : logi TRUE
.. .. .. ..- attr(*, "class")= chr [1:3] "Impala" "driver_info" "list"
.. .. ..@ encoding: chr ""
..$ disco: NULL
..- attr(*, "class")= chr [1:4] "src_Impala" "src_dbi" "src_sql" "src"
$ ops:List of 2
..$ x : 'ident' chr "serverTable"
..$ vars: chr [1:157] "X" ...
..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
- attr(*, "class")= chr [1:5] "tbl_Impala" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
Not sure if I can dput my table as it is sensitive information
There are quite a few aspects to your post. I am going to try and address the main ones.
(1) What you are calling localTable is not local. What you have is a local access point to a remote table. It is a remote table because the data is stored in the database, rather than in R.
To copy a remote table into local R memory use localTable = collect(remoteTable). Use this carefully. If the table is many GB in the database this will be slow to transfer into R. Also, if you collect a database table that is bigger than the RAM available to R then you will receive an out-of-memory error.
I recommend using collect for moving summary results into R. Do the processing and summarizing in the database and just fetch the results into R. Alternatively, use remoteTable %>% head(20) %>% collect() to copy just the first 20 rows into R.
(2) The tableName$colname notation will not work for remote tables. In R, the $ notation lets you access a named component of a list. Data frames are a special kind of list. If you try data(iris) followed by names(iris) you will get the column names of iris. Any of these can be accessed using iris$ followed by the column name.
However as your str(localTable) shows, localTable is a list of length 2 with the first named item src. If you call names(localTable) then you will receive two names back, the first of which is src. This means you can call localTable$src (and as localTable$src is also a list you can also call localTable$src$con).
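A minimal base-R sketch of why summary(localTable$X) printed "Length 0, Class NULL". The list below is only a hypothetical stand-in for the remote table object: its two top-level names are copied from the str() output in the question, everything else is made up.

```r
# Hypothetical stand-in for the remote tbl: a plain list with the same
# two top-level names that str(localTable) reports.
fake_tbl <- list(
  src = list(con = "connection placeholder"),
  ops = list(x = "serverTable", vars = c("X", "Y"))
)

names(fake_tbl)      # the only components that `$` can reach: "src" and "ops"

col <- fake_tbl$X    # no component is called "X", so `$` quietly returns NULL
is.null(col)

summary(col)         # this summarises NULL, not a column of data
```

To bring a single column back as an ordinary vector, dplyr's pull() (for example remoteTable %>% pull(X)) is an alternative to select() followed by collect(); it still transfers the whole column, so the same caution about table size applies.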
When working with dbplyr R translates data manipulation commands into the database language. There are translations defined for most dplyr commands, but there are not translations defined for all R commands.
So the recommended approach to access just a specific column is using select from dplyr:
local_copy_of_just_one_column = remoteTable %>%
select(required_column) %>%
collect()
(3) You have the right approach with a custom summary function. This is the best approach for producing the five figure summary without pulling the data into local memory (RAM).
One possible cause of the syntax error is that you may have used R commands that do not have a translation into your database language.
You can check whether a command has translations defined using translate_sql. I recommend you try
library(dbplyr)
translate_sql(quantile(colname, 0.25))
to see what the translation looks like.
You can view the translation of an entire table manipulation using show_query. This is my go-to approach when debugging SQL translation. Try:
localTable %>%
summarize(Min = min(X),
Q1 = quantile(X, .25),
Avg = mean(X),
Q3 = quantile(X, .75),
Max = max(X)) %>%
show_query()
If this does not produce valid SQL then executing the command will error.
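If you want to inspect the generated SQL without touching the server at all, one option is a simulated connection. This is only a sketch: fake_remote and its values are hypothetical, and simulate_impala() just makes dbplyr use its Impala translation rules. The exact SQL text depends on your dbplyr version, and whether your Impala server accepts the generated quantile expression is something I have not verified.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() builds a table that exists only as a query description,
# so nothing here talks to a real database (the values are made up).
fake_remote <- lazy_frame(X = c(1, 5, 20), con = simulate_impala())

five_num_query <- fake_remote %>%
  summarise(
    x_min  = min(X, na.rm = TRUE),
    x_q1   = quantile(X, 0.25),
    x_mean = mean(X, na.rm = TRUE),
    x_q3   = quantile(X, 0.75),
    x_max  = max(X, na.rm = TRUE)
  )

# Print the SQL that dbplyr would send for this pipeline.
show_query(five_num_query)
```

If the printed SQL contains a function your server does not know, that is a likely source of the syntax error, and that one column of the summary would need a different approach.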
One possible cause is that Min and Max have special meanings in SQL and so might produce odd behavior in your translation.
When I experimented with quantile, it looked like it might need an OVER clause in SQL. This is created using group_by. So perhaps you want something like the following:
localSummary = remoteTable %>%
# create dummy column
mutate(ones = 1) %>%
# group to satisfy over clause
group_by(ones) %>%
summarise(var_min = min(var),
var_lq = quantile(var, 0.25),
var_mean = mean(var),
var_uq = quantile(var, 0.75),
var_max = max(var)) %>%
# copy results from database into R memory
collect()
I have struggled for two days to find a way to create a specific matrix from a nested list.
First of all, I am sorry if I don't explain my issue correctly: I am one week new to StackOverflow* and R (and programming...)!
I use a file that you can find here:
original link: https://parltrack.org/dumps/ep_mep_activities.json.lz
Uncompressed by me here: https://wetransfer.com/downloads/701b7ac5250f451c6cb26d29b41bd88020200808183632/bb08429ca5102e3dc277f2f44d08f82220200808183652/666973
first 3 lists and last one (out of 23905) pasted here: https://pastebin.com/Kq7mjis5
With rjson, I have a nested list like this :
Nested list of MEP Votes
List of 23905
$ :List of 7
..$ ts : chr "2004-12-16T11:49:02"
..$ url : chr "http://www.europarl.europa.eu/RegData/seance_pleniere/proces_verbal/2004/12-16/votes_nominaux/xml/P6_PV(2004)12-16(RCV)_XC.xml"
..$ voteid : num 7829
..$ title : chr "Projet de budget général 2005 modifié - bloc 3"
..$ votes :List of 3
.. ..$ +:List of 2
.. .. ..$ total : num 45
.. .. ..$ groups:List of 6
.. .. .. ..$ ALDE :List of 1
.. .. .. .. ..$ : Named num 4404
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. ..$ GUE/NGL:List of 25
.. .. .. .. ..$ : Named num 28469
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. .. ..$ : Named num 4298
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
then my goal is to have something like this :
final matrix
First I would like to keep only the lists (from [[1]] to [[23905]]) containing $votes$+$groups$Renew or $votes$-$groups$Renew or $votes$'0'$groups$Renew. The elements of the main list (all 23905 of them) are registered votes. My work is on the Renew group, so I am only interested in votes where the Renew group exists, so that I can compare it with other groups.
After that, my goal is to create a matrix like this from all the [[x]] where groups$Renew exists:
final matrix
V1              V2 (not mandatory)    V3 = [[x]]$voteid
[mepid==666]    GUE/NGL               +    (mepid 666 is found in [[1]]$votes$+$groups$GUE/NGL)
[mepid==777]    Renew                 -    (mepid 777 is found in [[1]]$votes$-$groups$Renew)
I want to create a matrix so I can process the votes of each MEP (referenced by their mepid). Their votes are either + (for yea), - (for nay) or 0 (for abstain). Moreover, I would like to have the political group of each MEP displayed in the column next to their mepid. We can find their political group from the place where their votes are stored: if the mepid is shown in the list [[x]]$votes$+$groups$GUE/NGL, she or he belongs to the GUE/NGL group.
What I want to do might look like this
# Clean the nested list
Keep Vote[[x]] if the Vote[[x]] list contains
$votes$+$groups$Renew,
or $votes$-$groups$Renew,
or $votes$'0'$groups$Renew
# Create the matrix (or a data.frame if it is easier)
VoteMatrix <- as.matrix(
V1 = all "mepid" found in the nested list
V2 = groups (name of the list where we can find the mepid) (not mandatory)
V3 to Vy = If.else(mepid is in [[x]]$votes$+ then "+",
mepid is in [[x]]$votes$- then "-", "0")
)
Thank you in advance,
*Nevertheless, I am reading this website actively since I started R!
You can see that the 'votes' sublist is composed of three items, each a list of member numbers stored within what I think are party designators. Here's how you might "straighten" the positive voters' 'mepids' by party:
str( unlist( sapply(names(jlis[[1]]$votes$'+'$groups), function(x) unlist(jlis[[1]]$votes$'+'$groups[[x]]) ) ) )
Named num [1:104] 28268 4514 28841 28314 28241 ...
- attr(*, "names")= chr [1:104] "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" ...
You get a named numeric vector with 104 entries. Perhaps this will demonstrate what sort of terminology to use in better describing your desired result. (Just giving a partial schema for the desired result leaves way too much ambiguity to support a fully formed request.)
I do NOT see the number 23905 anywhere in what I downloaded from your link. We are clearly looking at different data. I see this for the timestamp: chr "2004-12-01T15:20:31". I'm not going to cut you any slack for not knowing R, since the task needs to be fully explained in a natural language. I will cut you slack regarding grammar if English is not your native tongue, but you definitely need to make a better effort at explication. This is what I see for the names within the votes$'+'$groups sublists of the first three items, but since RENEW is not in any of them there's not a lot that could be demonstrated about picking items:
> names( jlis[[1]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN"
> names( jlis[[2]]$votes$'+'$groups)
[1] "GUE/NGL" "IND/DEM" "NI" "PPE-DE"
> names( jlis[[3]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN" "Verts/ALE"
Furthermore, when I looked at all of the possible votes values using this method (for all three of the items you made available) I still see no RENEW names.
sapply( jlis[[1]]$votes[c("+","-","0")], function(x) names(x$groups) )
After second edit: Here's the next step of isolating those votes that contain a "Renew" value. I'm assuming that it's possible to have a "Renew" value in only one of the three possible 'votes' values (+, -, 0). If not (and there are always "Renew" values in each of them when there is one in any of them) then you might be able to simplify the logic. We make three logical vectors:
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
And then wrap them in a matrix call with 3 columns and take the maximum of each row (the maximum of c(TRUE,FALSE) is 1), and then convert back to logical.
selection_vec = as.logical( apply( matrix( c(
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } ) ),
ncol=3 ), 1,max))
> selection_vec
[1] FALSE FALSE FALSE TRUE
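The same selection can be written more compactly with any(), and the kept votes can then be flattened into one row per MEP and vote. This is only a sketch on made-up toy data that mimics the nested structure shown in the question. toy_votes, has_group() and flatten_vote() are hypothetical names, and I have not run this against the full 23905-element list.

```r
# Toy data with the same nesting as the question (all values are made up).
mep <- function(id) c(mepid = id)
toy_votes <- list(
  list(voteid = 1,
       votes = list(
         `+` = list(total = 2, groups = list(ALDE = list(mep(10)), PSE = list(mep(20)))),
         `-` = list(total = 1, groups = list(PSE = list(mep(21))))
       )),
  list(voteid = 2,
       votes = list(
         `+` = list(total = 1, groups = list(Renew = list(mep(10)))),
         `-` = list(total = 2, groups = list(PSE = list(mep(20), mep(21))))
       ))
)

# TRUE when any of the three outcomes contains a group with the given name.
# A missing outcome (e.g. no "0" entry) simply yields FALSE for that outcome.
has_group <- function(vote, group = "Renew") {
  any(vapply(c("+", "-", "0"),
             function(k) group %in% names(vote$votes[[k]]$groups),
             logical(1)))
}
keep <- vapply(toy_votes, has_group, logical(1))

# One row per (vote, MEP): the group and outcome come from where the mepid is stored.
flatten_vote <- function(vote) {
  rows <- list()
  for (k in names(vote$votes)) {
    groups <- vote$votes[[k]]$groups
    for (g in names(groups)) {
      rows[[length(rows) + 1]] <- data.frame(
        voteid = vote$voteid,
        mepid  = unname(unlist(groups[[g]])),
        group  = g,
        vote   = k,
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}
long <- do.call(rbind, lapply(toy_votes[keep], flatten_vote))

# Character matrix: one row per mepid, one column per voteid.
wide <- with(long, tapply(vote, list(mepid, voteid), function(v) v[1]))
```

The long data frame is usually easier to work with than the wide matrix. In the wide form, an MEP who did not take part in a vote shows up as NA rather than "0", so absence and abstention stay distinguishable.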
I am doing hierarchical clustering in R and need each cluster's elements separately.
When I use the following, the data splits into 3 lists of num [1:2628] (no information about the columns of the original data frame (dataA) is carried over):
clusterA <- hclust(dist(dataA),method = "single")
NumA = 3
label <- cutree(clusterA, NumA)
clusterXlist<-split(dataA,f=label)
str(clusterXlist[[1]])
How can I make sure that the split maintains the structure of dataA?
edit:
in my case
>str(clusterXlist[[1]])
num [1:2628] 0.0529 -0.3909 -0.4465 0.1 0.8393 ...
where as for dataA
> str(dataA)
num [1:440, 1:6] 0.0529 -0.3909 -0.4465 0.1 0.8393 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
- attr(*, "scaled:center")= Named num [1:6] 12000 5796 7951 3072 2881 ...
..- attr(*, "names")= chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
- attr(*, "scaled:scale")= Named num [1:6] 12647 7380 9503 4855 4768 ...
..- attr(*, "names")= chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
edit2 :
for dataA
> dput(head(dataA,n=20))
structure(c(0.0528730042415329, -0.390857056063646, -0.44652098379972,
0.0999975794271863, 0.839284119671916, -0.204572661537808, 0.00993903725191922,
-0.349583518736614, -0.477357534676238, -0.473957607271904, -0.682697336282181,
0.0905884780058897, 1.55872457204484, 0.728746944991474, 1.00042486502152,
-0.138155475034538, -0.868191050016313, -0.484236457564077, 0.521904849881291,
-0.333690834823332, 0.522972471408079, 0.543838613660349, 0.408073194590386,
-0.623310408164662, -0.0523368792616442, 0.333686752405346, -0.351915064454946,
-0.113851350576777, -0.291078065290861, 0.717677967619194, -0.053285340273111,
-0.63306600713975, 0.883794139056095, 0.0557876760455718, 0.497093035238056,
-0.634420951441845, 0.409157150032062, 0.0488774601048851, 0.0719115132405076,
-0.447303143322465, -0.0410681453901357, 0.170124700204028, -0.0281250860936324,
-0.3925300807586, -0.0792659545334748, -0.297298628211157, -0.10273182626616,
0.15518230654465, -0.185125447641461, 1.15011422238562, 0.528531691780372,
-0.360751187201331, 0.400469064432042, 0.739829765498898, 0.435615257968889,
-0.434621330503326, 0.438772101699743, -0.528063904936618, 0.226000834240152,
0.159180975270399, -0.588697039406295, -0.269829034507317, -0.137379339965946,
0.68636300602308, 0.173661155768845, -0.495590877769126, -0.533904475256987,
-0.288985833251248, -0.545233764836731, -0.394039245717966, 0.273564891153861,
-0.340276616984998, -0.573659982327726, 0.00475174748902491,
-0.572218072744849, -0.551001403168238, -0.605176006067741, -0.459955112363749,
-0.178576756619561, -0.494972916519322, -0.0435191938188023,
0.0863085949200282, 0.13308015693741, -0.498021323377842, -0.23165413161966,
-0.227878848586867, 0.0542186891412866, 0.0921812574154842, -0.244448146341904,
0.952945788892319, 0.649245242698738, -0.489212329634658, 0.209634507324604,
0.802353943473126, 0.456496070080021, -0.40217108193415, 0.341140199633565,
-0.526755422016323, -0.0240135648160378, -0.0762383134363428,
-0.066263629344282, 0.0890496850231094, 2.24074190324533, 0.0933048443208461,
1.29786952218849, -0.0261942126239276, -0.347458739603052, 0.369181005457445,
-0.274766434933383, 0.203229792845712, 0.0777025935624781, -0.364479376793999,
0.498608767430271, -0.327246732938803, 0.228051555415843, -0.394620088486301,
-0.157749554245622, 1.04716972023017, 0.587257919466454, -0.36306099036142
), .Dim = c(20L, 6L), .Dimnames = list(NULL, c("Fresh", "Milk",
"Grocery", "Frozen", "Detergents_Paper", "Delicassen")))
for clusterXlist[[1]] which was obtained by split of dataA
> dput(head(clusterXlist[[1]],n=20))
c(0.0528730042415329, -0.390857056063646, -0.44652098379972,
0.0999975794271863, 0.839284119671916, -0.204572661537808, 0.00993903725191922,
-0.349583518736614, -0.477357534676238, -0.473957607271904, -0.682697336282181,
0.0905884780058897, 1.55872457204484, 0.728746944991474, 1.00042486502152,
-0.138155475034538, -0.868191050016313, -0.484236457564077, 0.521904849881291,
-0.333690834823332)
What you have there is a matrix, not a data frame.
class(dataA)
# [1] "matrix"
The quick and easy way to split() would be to do
split(as.data.frame(dataA), label)
However, this may cause issues in later calculations and you may need to resort to coercing those list elements back to a matrix. I would recommend you use lapply() to split the data, as follows.
clusterXlist <- lapply(
unique(label),
function(i) dataA[label == i, , drop = FALSE]
)
to properly maintain your matrix structure throughout your list elements.
str(clusterXlist[[1]])
# num [1:18, 1:6] 0.0529 -0.3909 0.1 0.8393 -0.2046 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:6] "Fresh" "Milk" "Grocery" "Frozen" ...
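A small sketch of what goes wrong, using a made-up 4 x 3 matrix (m and label here are toy values, not the asker's data). split() treats a matrix as a plain vector and recycles the labels over every cell, which would be consistent with the 2628-element vector in the question (2628 is a multiple of the 6 columns).

```r
# Toy matrix: 4 rows, 3 named columns (values are made up).
m <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("a", "b", "c")))
label <- c(1, 2, 1, 2)   # cluster label for each row

# split() on a matrix works cell by cell, so dimensions and column names are lost.
flat <- split(m, label)
str(flat[[1]])

# Splitting the row indices instead keeps every piece a matrix.
idx   <- split(seq_len(nrow(m)), label)
parts <- lapply(idx, function(i) m[i, , drop = FALSE])
str(parts[["1"]])
```

Splitting the row indices also names the list elements after the cluster labels, which the unique(label) version does not do on its own.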
I am working with git2r and want to create some basic statistics about the project's activity.
git2r returns all commits as a list of S4 objects. Below I'm showing the structure of the first object:
> library(git2r)
> repo <- repository('/Users/swain/Dropbox/projects/from-github/brakeman')
> last3 <- commits(repo, n=3)
> str(last3)
List of 3
$ :Formal class 'git_commit' [package "git2r"] with 6 slots
.. ..@ sha : chr "f7746c21846d895bd90632df5a2366381ced77d9"
.. ..@ author :Formal class 'git_signature' [package "git2r"] with 3 slots
.. .. .. ..@ name : chr "Justin"
.. .. .. ..@ email: chr "presidentbeef@users.noreply.github.com"
.. .. .. ..@ when :Formal class 'git_time' [package "git2r"] with 2 slots
.. .. .. .. .. ..@ time : num 1.5e+09
.. .. .. .. .. ..@ offset: num -420
.. ..@ committer:Formal class 'git_signature' [package "git2r"] with 3 slots
.. .. .. ..@ name : chr "GitHub"
.. .. .. ..@ email: chr "noreply@github.com"
.. .. .. ..@ when :Formal class 'git_time' [package "git2r"] with 2 slots
.. .. .. .. .. ..@ time : num 1.5e+09
.. .. .. .. .. ..@ offset: num -420
.. ..@ summary : chr "Merge pull request #1056 from presidentbeef/hash_access_interpolation_performance_improvements"
.. ..@ message : chr "Merge pull request #1056 from presidentbeef/hash_access_interpolation_performance_improvements\n\nHash access i"| __truncated__
.. ..@ repo :Formal class 'git_repository' [package "git2r"] with 1 slot
.. .. .. ..@ path: chr "/Users/swain/Dropbox/projects/from-github/brakeman"
I have searched high and low for a way to extract one slot from all objects into a list. For example, for all the S4 objects in the list last3 I want to pull author into this new list. Note that there's nesting of objects here, so I may want to make a list from something on an object that's in a slot of the top object.
Ultimately I want to start creating plots and summaries of the various fields. For example, a bar chart of commits by day of the week; box plots of the message length by committer; things like that. Is converting slots to lists or vectors the wrong way to go about it? (edit: s/histogram/bar chart/, doh)
Here's a tidyverse solution to what you're trying to achieve. Jenny Bryan has a nice set of introductory documents on how to use purrr (and other packages) for this sort of task: https://jennybc.github.io/purrr-tutorial/.
library(git2r)
library(dplyr)
library(ggplot2)
library(purrr)
library(lubridate)
options(stringsAsFactors = FALSE)
repo <- repository("/git-repos/brakeman/")
# Get relevant bits out of the list
analysis_df <-
  repo %>%
  commits(n = 50) %>%
  map_df(
    ~ data.frame(
      name = .@author@name,
      date = .@author@when@time %>% as.POSIXct(origin = "1970-01-01"),
      message = .@message
    )
  )
# A bar chart of commits by day of the week
analysis_df %>%
  mutate(weekday = weekdays(date)) %>%
  group_by(weekday) %>%
  tally() %>%
  ggplot(aes(x = weekday, y = n)) +
  geom_bar(stat = "identity")
# Mean message length by committer (drawn as a bar chart rather than box plots)
analysis_df %>%
  mutate(message_length = nchar(message)) %>%
  group_by(name) %>%
  summarise(mean_message_length = mean(message_length)) %>%
  ggplot(aes(x = name, y = mean_message_length)) +
  geom_bar(stat = "identity")
How about
lapply(last3, function(x) data.frame(author = x@author@name, email = x@author@email))
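For the general question of pulling one slot out of every S4 object in a list, here is a sketch that runs without git2r. The toy_signature and toy_commit classes are hypothetical stand-ins that only mimic the nesting shown in the str() output above. As far as I know, more recent git2r releases return plain S3 lists instead, in which case $ would take the place of @.

```r
library(methods)

# Hypothetical classes that mimic the nesting of git2r's S4 commit objects.
setClass("toy_signature", slots = c(name = "character", email = "character"))
setClass("toy_commit", slots = c(sha = "character", author = "toy_signature"))

commit_list <- list(
  new("toy_commit", sha = "a1",
      author = new("toy_signature", name = "Ann", email = "ann@example.org")),
  new("toy_commit", sha = "b2",
      author = new("toy_signature", name = "Bob", email = "bob@example.org"))
)

# A list holding one whole slot per object (here: the nested author objects).
authors <- lapply(commit_list, function(x) x@author)

# A character vector from a slot of a nested object.
author_names <- vapply(commit_list, function(x) x@author@name, character(1))

# slot() is the functional form of @, handy when the slot name is a string.
shas <- vapply(commit_list, slot, character(1), name = "sha")
```

Converting slots to vectors (or to one data frame, as in the answers above) is a reasonable route to plots and summaries, since most plotting and summarising functions expect vectors or data frames rather than S4 objects.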
I am using dplyr to create an object that I then use xlsx to write out to a spreadsheet.
I run the following code:
provFundedProp <- compensationBase2014 %>%
group_by(provinciallyFunded) %>%
summarise(total=sum(fundingRaw)) %>%
mutate(percent = paste0(round(100 * total/sum(total),1), "%"))
Which I then write to the first sheet:
write.xlsx(provFundedProp, file="output/provFundedProp.xlsx",
sheetName="provFundingSector")
This works fine and gives me the file I need.
I then run the following code going down a level:
provFundedServiceDivision <- compensationBase2014 %>%
group_by(serviceDivision,provinciallyFunded) %>%
summarise(total=sum(fundingRaw)) %>%
mutate(percent = paste0(round(100 * total/sum(total),1), "%"))
#write to second sheet
write.xlsx(provFundedServiceDivision, file="output/provFundedSD.xlsx",
sheetName="provFundingSD")
Which gives me the following error:
Error: cannot convert object to a data frame
I am going crazy here. Does anyone have any idea what the heck is going on?
I have tried this with multiple queries and I have no idea what is up.
class(provFundedServiceDivision)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 4 variables:
$ serviceDivision : chr "AS" "AS" "CLS" "CLS" ...
$ provinciallyFunded: chr "NPF" "PF" "NPF" "PF" ...
$ total : num 1.90e+06 3.97e+07 2.93e+07 5.70e+08 9.55e+07 ...
$ percent : chr "4.6%" "95.4%" "4.9%" "95.1%" ...
- attr(*, "vars")=List of 1
..$ : symbol serviceDivision
- attr(*, "labels")='data.frame': 3 obs. of 1 variable:
..$ serviceDivision: chr "AS" "CLS" "GS"
..- attr(*, "vars")=List of 1
.. ..$ : symbol serviceDivision
..- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 3
..$ : int 0 1
..$ : int 2 3
..$ : int 4 5
- attr(*, "drop")= logi TRUE
- attr(*, "group_sizes")= int 2 2 2
- attr(*, "biggest_group_size")= int 2
> traceback()
7: stop(list(message = "cannot convert object to a data frame",
call = NULL, cppstack = NULL))
6: .Call("dplyr_cbind_all", PACKAGE = "dplyr", dots)
5: cbind_all(x)
4: bind_cols(...)
3: cbind(deparse.level, ...)
2: cbind(rownames = rownames(x), x)
1: write.xlsx(provFundedServiceDivision, file = "output/provFundedSD.xlsx",
sheetName = "provFundingSD")
eipi10 saved the day with his solution! I used the following code and everything worked fine:
write.xlsx(as.data.frame(provFundedServiceDivision),
file="output/provFundedSD.xlsx", sheetName="provFundingSD")
Thanks to everyone for reading and helping me out. This is my first question on Stack Overflow. Cheers!
Use ungroup() at the end of your dplyr chain:
provFundedServiceDivision <- compensationBase2014 %>%
group_by(serviceDivision,provinciallyFunded) %>%
summarise(total=sum(fundingRaw)) %>%
mutate(percent = paste0(round(100 * total/sum(total),1), "%")) %>%
# Add ungroup to the end
ungroup()
#write to second sheet
write.xlsx(provFundedServiceDivision, file="output/provFundedSD.xlsx",
sheetName="provFundingSD")
Instead of...
class(provFundedServiceDivision)[1]
[1] "grouped_df"
You get...
class(provFundedServiceDivision)[1]
[1] "tbl_df"
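A small sketch of where the leftover grouping comes from, on made-up figures (toy and its values are hypothetical, only the column names follow the question). After summarise() on two grouping variables, the result is still grouped by the first one. This is also why the percentages in the question add up to 100% within each serviceDivision.

```r
library(dplyr)

# Made-up funding figures; column names follow the question.
toy <- data.frame(
  serviceDivision    = c("AS", "AS", "CLS", "CLS"),
  provinciallyFunded = c("NPF", "PF", "NPF", "PF"),
  fundingRaw         = c(10, 90, 25, 75),
  stringsAsFactors   = FALSE
)

by_division <- toy %>%
  group_by(serviceDivision, provinciallyFunded) %>%
  summarise(total = sum(fundingRaw)) %>%
  mutate(share = 100 * total / sum(total))

group_vars(by_division)   # summarise() dropped only the last grouping level
by_division$share         # shares are computed within each serviceDivision

# Drop the grouping, and the tibble classes too, before handing it to other packages.
plain <- by_division %>% ungroup() %>% as.data.frame()
class(plain)
```

If the percentages were meant to be relative to the grand total instead, ungroup() would have to come before the mutate() step rather than after it.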