Why am I getting an 'Error in UseMethod("arrange")' in R?

I'm writing an R program that lists sales prices for various items. I have a column called InvoiceDate, which holds the date and time in the form '12/1/2009 7:45'. I'm trying to extract just the date into a separate field called date, and then sort the rows by date. The code I'm using is as follows:
library(dplyr)
library(ggplot2)
setwd("C:/Users/cshor/OneDrive/Environment/Restoration_Ecology/Udemy/Stat_Thinking_&_Data_Sci_with_R/Assignments/Sect_5")
retail_clean <- read.csv("C:/Users/cshor/OneDrive/Environment/Restoration_Ecology/Udemy/Stat_Thinking_&_Data_Sci_with_R/Data/retail_clean.csv")
retail_clean$date <- as.Date(retail_clean$InvoiceDate)#, format = "%d/%m/%Y")
total_sales = sum(retail_clean$Quantity, na.rm=TRUE) %>%
arrange(retail_clean$date) %>% ggplot(aes(x=date, y=total_sales)) + geom_line()
Initially, everything works fine, and the date field is created. However, I get the following error for the arrange() function:
Error in UseMethod("arrange") : no applicable method for 'arrange' applied to an object of class "c('integer', 'numeric')"
I've searched for over a week for a solution, but have found nothing that addresses this specific issue. I've also tried as.POSIXct instead of as.Date, with similar results. Any help on why the program treats Date data as numeric, and how I can fix it, would be greatly appreciated.

First, the error message has nothing to do with the date-time conversion.
Let's look at the code you provided:
total_sales = sum(retail_clean$Quantity, na.rm=TRUE) %>%
arrange(retail_clean$date) %>% ggplot(aes(x=date, y=total_sales)) + geom_line()
The result of sum(retail_clean$Quantity, na.rm=TRUE) is a single integer in your case, and it is piped into the first argument of dplyr::arrange, which calls UseMethod("arrange").
R then dispatches on the class of that argument, integer (and implicitly numeric), but arrange has no method for these classes: neither arrange.integer nor arrange.numeric is defined. Hence the error message. There is nothing wrong with your date conversion, except that you do need an explicit format, like the one you commented out (make sure the order of %d and %m matches your data). Without it, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", which misread a string like '12/1/2009'.
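To see the mechanism in isolation, here is a sketch with a hypothetical generic, describe(), that, like arrange(), only has a data.frame method. Calling it on the single integer that sum() returns fails in the same way:

```r
# `describe` is a hypothetical generic used only to illustrate S3 dispatch.
describe <- function(x, ...) UseMethod("describe")

# Only a data.frame method exists, mirroring arrange.data.frame in dplyr.
describe.data.frame <- function(x, ...) paste("data frame with", nrow(x), "rows")

total <- sum(c(3L, 4L, 5L))    # sum() of an integer vector is a single integer
class(total)                   # "integer"

describe(data.frame(q = 1:3))  # dispatches to describe.data.frame

# There is no describe.integer, describe.numeric or describe.default,
# so dispatch fails with the same kind of message as arrange():
msg <- tryCatch(describe(total), error = conditionMessage)
msg
```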
The solution is also simple: replace sum with something that returns a data.frame, or an object of another class that arrange has a method for. You can check which methods are available for arrange:
> methods(dplyr::arrange)
[1] arrange.data.frame*
In this R session, the only method is for data.frame objects (tibbles inherit from data.frame, so they work too), but you can always define methods for other classes.
This looks like a Udemy course assignment, so you probably need to calculate a sum for each day or each month, whichever your assignment asks for; a single sum() over the whole column is definitely not the right answer.
By the way, welcome to SO!
Update:
An example
n <- 100
data <- data.frame(sales = runif(n), day = sample(1:30, n, replace = TRUE))
data$date_ <- paste0(data$day, "/1/2009 7:45")
head(data$date_) # This is the original date string
data$date <- as.Date(data$date_, format = "%d/%m/%Y")
head(data$date) # Check the formatted date here
library(dplyr)
library(ggplot2)
data %>%
group_by(date) %>%
summarise(totalSale = sum(sales, na.rm=TRUE)) %>%
arrange(date) %>%
ggplot(aes(x = date, y = totalSale)) +
geom_line()
Here is the plot (image not included).
It looks fine, doesn't it? The sales are now ordered by date.
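For comparison, a base R sketch of the same per-day aggregation on rows shaped like the question's data. The rows are made up, and the sketch assumes the first field of InvoiceDate is the month ('12/1/2009' = 1 December 2009); use "%d/%m/%Y" instead if it is the day. aggregate() returns a data frame, which, unlike the single number from sum(), can be sorted and plotted.

```r
# Made-up rows in the shape described in the question.
retail <- data.frame(
  InvoiceDate = c("12/2/2009 9:00", "12/1/2009 7:45", "12/1/2009 10:30", "12/3/2009 8:15"),
  Quantity    = c(6L, 2L, 3L, 4L)
)

# Assumption: month/day/year. The time part after the date is ignored,
# because strptime() stops once the format has been matched.
retail$date <- as.Date(retail$InvoiceDate, format = "%m/%d/%Y")

# One total per day (a data frame, not a single number), sorted by date.
daily <- aggregate(Quantity ~ date, data = retail, FUN = sum)
daily <- daily[order(daily$date), ]
daily
```

From here, ggplot(daily, aes(x = date, y = Quantity)) + geom_line() would draw the same kind of line chart as the dplyr version.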

Related

dplyr passing column names as a variable with is.na filter

I am aware that similar questions have been asked, and I have tried multiple options, but I still get an error message.
df_construction <- function(selected_month, selected_variable){
selected_variable_en <- rlang::enquo(selected_variable) #This was an attempt following the link
#filter_criteria <- interp(!is.na(~y), .values = list(y = as.name(selected_variable))) This doesn't work
df1 <- airquality %>%
dplyr::filter(Month == selected_month,
!is.na(selected_variable_en))%>%
select(Month, Day, !!selected_variable)
return(df1)}
df1 <- df_construction(2, "Solar.R")
My ultimate goal is to build this in Shiny, so that the inputs the user selects become the arguments of the function.
I know that the filter and the select functions shouldn't be dealt with in the same way.
I have followed the steps according to: https://www.brodrigues.co/blog/2016-07-18-data-frame-columns-as-arguments-to-dplyr-functions/ but had no success due to the !is.na filter.
I just want a data frame whose only columns are Month (filtered to the selected month), Day, and whichever of Ozone, Solar.R, Wind or Temp the user has selected, with no NA values.
Thank you very much for your help!!
!! is often not enough to unquote a column name that arrives as a string: you need to combine it with rlang::sym(), which turns the string into a symbol. If you have more than one variable to unquote, use !!! with rlang::syms().
library(dplyr)
df_construction <- function(selected_month, selected_variable){
df1 <- airquality %>%
dplyr::filter(Month == selected_month,
!is.na(!!rlang::sym(selected_variable))) %>% # string -> column symbol
select(Month, Day, all_of(selected_variable)) # all_of() takes a character vector
return(df1)
}
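For select(), you can pass the variable name directly; wrapping a character vector in all_of() is the unambiguous form recommended by tidyselect. rlang 0.4.0 also introduced the {{ }} ("curly-curly") operator for unquoting, but it is meant for bare column names passed as function arguments, so it does not help when the name arrives as a string.
If you write functions that take variable names as arguments, you may find dplyr awkward. In that respect, data.table is easier to use (see a blog post I wrote on the subject).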
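Another option, sketched here under the assumption of dplyr 1.0 or later: the .data pronoun looks up a column from a string, so no unquoting is needed, and all_of() does the same job for select(). Note that airquality only covers months 5 to 9 (May to September), so the question's call df_construction(2, "Solar.R") returns zero rows even when the code is correct.

```r
library(dplyr)

# .data[[name]] refers to the column whose name is stored in a string,
# which suits Shiny inputs that arrive as character values.
df_construction <- function(selected_month, selected_variable) {
  airquality %>%
    filter(Month == selected_month, !is.na(.data[[selected_variable]])) %>%
    select(Month, Day, all_of(selected_variable))
}

# Month 5 (May) is used because airquality has no data for month 2.
df1 <- df_construction(5, "Solar.R")
head(df1)
```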

Combining function output into data frame with different column names

I am trying to figure out how to get a single data frame (or a tibble) with distinct column names from this function. This code prints three separate chunks with the same Date column and the same column names; only the column values differ.
hello <- function(x){
fx <- Quandl(x)
fx <- fx %>%
select(Date, Open, Close) %>%
mutate(new_col = `Open` - `Close`) %>%
select(Date, new_col)
print(kable(head(fx), format = "rst"))
}
lapply(c(EUR, USD, RUB), hello)
At the moment I do it the long way: I replicate the code for every input (EUR, USD, RUB), merge the results with the function below, and then convert to a tibble to get rid of the row numbers on the left before printing with kable.
Reduce(function(x, y) merge(x, y, all = TRUE), list(EUR, USD, RUB))
I'd like to see how this can be done more simply with one function.
Thank you!
EDIT 1:
Thank you for the answers and suggestions on improving the question!
So the reproducible code looks like this:
Wheat<-"CFTC/001602_FO_ALL"
Corn<-"CFTC/002602_F_ALL"
Beans<-"CFTC/005602_F_ALL"
funds<-function(x){
cftc<-Quandl(x)
cftc<-cftc%>%
select(Date,`Money Manager Longs`,`Money Manager Shorts`)%>%
mutate(Funds_Net = `Money Manager Longs`-`Money Manager Shorts`)%>%
select(Date,Funds_Net)
print(kable(head(cftc),format = "rst"))
}
lapply(c(Wheat,Corn,Beans),funds)
The output I have is this:
[screenshot not available]
What I want as output is this:
[screenshot not available]
Thank you!
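A minimal sketch of one approach, assuming the function is changed to return its data frame instead of printing it: each call names its value column after a label, and Reduce() with merge() joins the pieces on Date, as in the merge the question already uses. fake_fetch() and funds_net() are hypothetical names; fake_fetch() returns made-up numbers so the sketch runs offline, and with the Quandl package loaded you would pass fetch = Quandl instead.

```r
# Hypothetical stand-in for Quandl(code), returning made-up numbers.
# Longs vary with nchar(code) only so that the output columns differ.
fake_fetch <- function(code) {
  data.frame(
    Date = as.Date("2020-01-07") + 7 * (0:2),
    `Money Manager Longs` = c(100, 120, 90) + nchar(code),
    `Money Manager Shorts` = c(80, 70, 95),
    check.names = FALSE
  )
}

# Return Date plus one net-position column named after `label`,
# so results for different contracts can be combined afterwards.
funds_net <- function(code, label, fetch = fake_fetch) {
  cftc <- fetch(code)
  out <- data.frame(
    Date = cftc$Date,
    net  = cftc[["Money Manager Longs"]] - cftc[["Money Manager Shorts"]]
  )
  names(out)[2] <- label
  out
}

codes <- c(Wheat = "CFTC/001602_FO_ALL", Corn = "CFTC/002602_F_ALL", Beans = "CFTC/005602_F_ALL")

# One data frame per contract, then a full outer join on Date.
per_contract <- Map(funds_net, codes, names(codes))
combined <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), per_contract)
print(combined)
```

The combined data frame can then go through knitr::kable() once, instead of printing one table per contract.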

R: doesn't recognise column in a new table

This is part of an online course I am doing, R for data analysis.
A tibble is created using the group_by and summarise functions on the diamonds data set. The new tibble does exist and looks as you would expect (I checked). Now a bar plot has to be created from the summary values in the new tibble, but it gives me all sorts of errors about not recognising the columns.
I converted the tibble into a data frame, but still get the same problem.
Here is the code:
diamonds_by_color <- group_by(diamonds, color)
diamonds_mp_by_color <- summarise(diamonds_by_color, mean_price = mean(price))
diamonds_mp_by_color <- as.data.frame(diamonds_mp_by_color)
colorcounts <- count(diamonds_by_color$mean_price)
colorbarplot <- barplot(diamonds_by_color$mean_price, names.arg = diamonds_by_color$color,
main = "Average price for different colour diamonds")
The error I get when running the function count is:
Error in UseMethod("summarise_") :
no applicable method for 'summarise_' applied to an object of class "NULL"
In addition: Warning message:
Unknown or uninitialised column: 'mean_price'.
It's probably something trivial but I have been reading quite a lot and tried a few things and can't figure it out. Any help will be super appreciated :)
Your diamonds_by_color never has mean_price assigned to it: the summarised column only exists in diamonds_mp_by_color.
Your last two lines of code work if you reference diamonds_mp_by_color instead:
colorcounts <- count(diamonds_mp_by_color, mean_price)
barplot(diamonds_mp_by_color$mean_price,
names.arg=diamonds_mp_by_color$color,
main="Average price for different colour diamonds")
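The warning and the NULL in the error both come from using $ on a column that doesn't exist: diamonds_by_color$mean_price evaluates to NULL, so count() is handed NULL instead of a data frame, hence the "object of class NULL" error. A base R sketch of the same situation, with the built-in mtcars standing in for diamonds (an assumption made so it runs without ggplot2):

```r
# Grouped summary: mean mpg for each number of cylinders.
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

names(by_cyl)             # "cyl" "mpg": the summary column lives only here
is.null(mtcars$mean_mpg)  # TRUE: `$` on a column that doesn't exist gives NULL

# Take the bar heights from the summary table, not the original data.
heights <- setNames(by_cyl$mpg, by_cyl$cyl)
heights
```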
Here is a way to summarise the price by color using dplyr and pipe the result straight into a bar plot with ggplot2:
library(dplyr)
library(ggplot2) # also provides the diamonds data set
diamonds %>% group_by(color) %>%
summarise(mean.price = mean(price, na.rm = TRUE)) %>%
ggplot(aes(color, mean.price)) + geom_bar(stat = 'identity')
The best dplyr idiom is not to store a temporary result after each operation. Just write one pipe; the %>% notation is also clearer because you don't have to repeat the data frame as the first argument of each call:
library(dplyr)
library(ggplot2) # provides the diamonds data set
diamonds %>%
group_by(color) %>%
summarise(mean_price = mean(price)) %>%
# barplot() takes no data argument, so with() makes the summary columns visible to it
with(barplot(mean_price, names.arg = color,
main = "Average price for different colour diamonds"))
(Something like that. You can assign the output of the pipe before the barplot if you like. I'm transiting through an airport so I can't check it in R.)

Using summarize (or equivalent?) to create a column of functions in an R dataframe

I'm working with some electricity data that has, for each hour, day and asset, a step function which specifies the asset's offering of power at escalating prices. What I'd like to do is collapse those data into a data frame, tibble, etc. with date, time, asset and a row-specific step function. I'll then use that step function to populate some other columns later on.
Here's a quick reproducible example of what I want to do.
library(dplyr)
df_test<-data.frame(rep(1:25, times=1, each=4))
names(df_test)[1]<-"asset"
df_test$block<-rep(1:4, times=25)
df_test$from<-rep(seq(0,150,50), times=25)
df_test$to<-df_test$from+50
df_test$index<-runif(100)*100
df_test<-df_test %>% group_by(asset) %>% mutate(price=cumsum(index))
This is basically what I would have for each hour of each day, except that in my case the number of blocks differs (some firms bid a single block, others up to 7 blocks, but that's likely not material to the problem here).
Now, what I would like to do is, for each asset, calculate a step function using the from, to, and price blocks and store it in a data frame by asset (again, in my extended case, it will be by date, hour, and asset).
For example, using the first group I could do this:
generate_func<-function(x,y){
stepfun(x, y, f = as.numeric(0), ties = "ordered",right = FALSE)
}
eg_func<-generate_func(df_test$from[2:4],df_test$price[1:4])
The function eg_func lets me find the implied price at any value x for asset 1.
eg_func(500)
[1] 43.10305
What I'd like to do is group my data by asset and then store a version of eg_func for each asset in a second column of a data frame or equivalent.
Basically, what I want to do is something like:
df_sum<-df_test %>% group_by(asset) %>% summarize(
step_func=generate_func(from[-1],price)
)
But I get:
Error: Column `step_func` is of unsupported type function
Update:
@akrun has gotten me a step down the road: if I wrap the function in a list, I can do what I want, at least for the first step:
df_func<-df_test %>%
group_by(asset) %>%
summarize(step_func=list(generate_func(from[-1],price)))
So now I have a data frame with a step function for each asset. Now, my next quest is to be able to evaluate that function to create a new column evaluating the step function at a particular value. So, for example, I can evaluate the first asset's bid at a value of 50:
df_func[1,2][[1]][[1]](50)
[1] 49.60776
I'd like to be able to do this in a mutate command, so something akin to:
df_func <-df_func %>% mutate(bid_50=step_func[[2]](50))
But that applies the second step function to everyone. How do I fill column bid_50 with each asset's step function evaluated at 50?
Update #2 (@akrun again, with the solution):
df_func <-df_func %>% mutate(bid_50=map_dbl(step_func, ~ .x(50)))
Because eg_func is a function, it is better to wrap it in a list so it can be stored in a list-column, and then use map_dbl() to apply each stored function to the argument and create the new column 'bid_50':
library(tidyverse)
df_test %>%
group_by(asset) %>%
summarize(step_func=list(generate_func(from[-1],price))) %>%
mutate(bid_50 = map_dbl(step_func, ~ .x(50)))
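For comparison, the same "apply each stored function to one argument" step can be written in base R with vapply(); on the data frame above it would be df_func$bid_50 <- vapply(df_func$step_func, function(f) f(50), numeric(1)). A self-contained sketch with two hypothetical step functions in place of the per-asset ones:

```r
# Two hypothetical step functions standing in for the per-asset ones.
# With right = FALSE, each step covers [x[i], x[i + 1]), so f(50) is y[2].
step_funcs <- list(
  stepfun(c(50, 100, 150), c(10, 20, 30, 40), right = FALSE),
  stepfun(c(50, 100, 150), c(5, 15, 25, 35), right = FALSE)
)

# Call every stored function at 50; vapply() checks that each call
# returns exactly one number.
bid_50 <- vapply(step_funcs, function(f) f(50), numeric(1))
bid_50
```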

sorting months in R with count or summarise

I am trying to sort months with R and have the following:
```{r}
result <- mydata %>%
count(months(as.Date(orderdate)))
result
```
This gives the months and a count of the orders. However, the months are not in calendar order. How can I sort them correctly by month?
I already tried order() and factor(), but neither worked correctly. Is there a short way to get the right order?
Thanks,
Roland
Do:
mydata %>%
count(month = factor(months(as.Date(orderdate)), levels = month.name))
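since months() "return[s] a character vector of [month] names in the locale in use", and month.name supplies those names in calendar order as the factor levels, so you don't have to type them out. Note that month.name is always in English, so this assumes an English locale.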
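If your locale is not English, a sketch that avoids month.name: build the levels from months() itself by asking for the name of the first day of each month of an arbitrary year, so the levels and the data come from the same locale. Base R table() is used here in place of count(), and the dates are made up:

```r
# Made-up order dates for illustration.
orderdate <- as.Date(c("2017-03-05", "2017-01-20", "2017-03-15", "2017-12-01", "2018-01-02"))

# Month names in the current locale, in calendar order:
# the 1st of each month of an arbitrary year, converted with months().
month_levels <- months(as.Date(sprintf("2000-%02d-01", 1:12)))

month_f <- factor(months(orderdate), levels = month_levels)
counts <- table(month_f)  # one cell per month, in calendar order
counts[counts > 0]
```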
Since you have not provided sample data, I can't test whether it works in your specific case. Converting your month data to a factor works, but you have to specify the month names as levels first (in the language of your data), as R doesn't know how you want them ordered. Creating a factor without defining the levels only gives alphabetical order, which isn't correct for months.
library(dplyr)
result <- mydata %>%
mutate(ordered_months = factor(months(as.Date(orderdate)),
levels=c("January", "February", ...))) %>% # insert all month-names here
count(ordered_months)
result
...should work.
The following assumes your locale uses English, because the built-in constant month.name holds English month names.
First of all, make up a dataset, since you have not posted one.
set.seed(1)
d <- seq(as.Date("2017-01-01"), Sys.Date(), by = "month")
mydata <- data.frame(orderdate = sample(d, 1e2, TRUE))
Now the problem. Note that, applied to a permutation, order() returns the inverse permutation; this answer uses that fact (and assumes all 12 months appear in the data).
library(dplyr)
result <- mydata %>%
count(months(as.Date(orderdate)))
inx <- order(month.name)
result[order(inx), ]
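A quick base R check of that fact: indexing the alphabetised month names by order(p) restores calendar order, because order(p) undoes the permutation p.

```r
p <- order(month.name)         # permutation that sorts the names alphabetically
alphabetical <- month.name[p]  # "April" "August" ... "September"

# order() of a permutation is its inverse, so indexing by order(p) undoes p.
restored <- alphabetical[order(p)]
identical(restored, month.name)
```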
