Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I am not quite sure why this piece of code isn't working.
Here's what my data looks like:
head(test)
  Fiscal.Year Fiscal.Quarter   Seller Product.Revenue Product.Quantity Product.Family Sales.Level.1          Group Fiscal.Week
1        2015         2015Q3 ABCD1234            4000                4      Paper cup      Americas Paper Division          32
2        2014         2014Q1  DDH1234             300                5   Paper tissue  Asia Pacific Paper Division          33
3        2015         2015Q1  PNS1234             298                6         Spoons          EMEA        Cutlery          34
4        2016         2016Q4  CCC1234             289                7         Knives        Africa        Cutlery          33
Now, my objective is to summarize revenue by year.
Here's the dplyr code I wrote:
test %>%
group_by(Fiscal.Year) %>%
select(Seller,Product.Family,Fiscal.Year) %>%
summarise(Rev1 = sum(Product.Revenue)) %>%
arrange(Fiscal.Year)
This doesn't work. I get the error:
Error: object 'Product.Revenue' not found
However, when I get rid of the select statement, it works, but then the output doesn't include Seller and Product.Family.
test %>%
group_by(Fiscal.Year) %>%
# select(Seller,Product.Family,Fiscal.Year) %>%
summarise(Rev1 = sum(Product.Revenue)) %>%
arrange(Fiscal.Year)
The output is :
# A tibble: 3 x 2
Fiscal.Year Rev1
<dbl> <dbl>
1 2014 300
2 2015 4298
3 2016 289
This works well.
Any idea what's going on? It's been about 3 weeks since I started programming in R. So, I'd appreciate your thoughts. I am following this guide: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
Also, I looked at similar threads on SO, but I believe they related to issues caused by a "+" sign: Error in dplyr group_by function, object not found
I am looking for the following output:
Fiscal.Year Rev1 Product Family Seller
<dbl> <dbl> ... ...
1 2014 ...
2 2015 ...
3 2016 ...
Many thanks
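OK, this did the trick. select() had dropped Product.Revenue before summarise() could use it, so I removed select() and added the columns I wanted to keep to group_by() instead: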
test %>%
group_by(Fiscal.Year, Seller,Product.Family) %>%
summarise(Rev1 = sum(Product.Revenue)) %>%
arrange(Fiscal.Year)
Output:
Source: local data frame [4 x 4]
Groups: Fiscal.Year, Seller [4]
Fiscal.Year Seller Product.Family Rev1
<dbl> <chr> <chr> <dbl>
1 2014 DDH1234 Paper tissue 300
2 2015 ABCD1234 Paper cup 4000
3 2015 PNS1234 Spoons 298
4 2016 CCC1234 Knives 289
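To see why the select() version fails, here is a minimal sketch that rebuilds a cut-down copy of the sample data by hand (only the columns used here; the data frame construction is my own, not from the original post). select() keeps only the listed columns plus the grouping variable, so Product.Revenue is already gone by the time summarise() runs. The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hand-built stand-in for the sample data (assumption: only these columns matter)
test <- data.frame(
  Fiscal.Year     = c(2015, 2014, 2015, 2016),
  Seller          = c("ABCD1234", "DDH1234", "PNS1234", "CCC1234"),
  Product.Revenue = c(4000, 300, 298, 289),
  Product.Family  = c("Paper cup", "Paper tissue", "Spoons", "Knives"),
  stringsAsFactors = FALSE
)

# After select(), Product.Revenue is no longer a column
selected <- test %>%
  group_by(Fiscal.Year) %>%
  select(Seller, Product.Family, Fiscal.Year)
has_revenue <- "Product.Revenue" %in% names(selected)

# Revenue per year only
by_year <- test %>%
  group_by(Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)

# Revenue per year, seller and product family: the extra columns survive
# because they are grouping variables
by_detail <- test %>%
  group_by(Fiscal.Year, Seller, Product.Family) %>%
  summarise(Rev1 = sum(Product.Revenue), .groups = "drop") %>%
  arrange(Fiscal.Year)
```

Note that grouping by Seller and Product.Family changes the level of aggregation: you get one row per year, seller and family, not one row per year.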
Related
I'm currently working through some coursework where data has been provided on supermarket chip sales. Part of the task is to remove any entries where the product is not chips, and we have been given this code to help with that:
productWords <- data.table(unlist(strsplit(unique(transaction_data[, "PROD_NAME"]), "")))
Here transaction_data is the data file provided, and PROD_NAME is the column we're interested in.
This, however, returns the error:
Error in strsplit(unique(transaction_data[, "PROD_NAME"]), "") : non-character argument
Can someone please explain what I'm doing wrong or what I'm missing? I'm also not sure how this code could tell one product from another. Am I meant to add something to it based on product names I've seen while looking through the data?
Here are some lines of the data as an example:
  DATE       STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR PROD_NAME                             PROD_QTY TOT_SALES
  <date>         <dbl>          <dbl>  <dbl>    <dbl> <chr>                                    <dbl>     <dbl>
1 2018-10-17         1           1000      1        5 Natural Chip Compny SeaSalt175g              2         6
2 2019-05-14         1           1307    348       66 CCs Nacho Cheese 175g                        3       6.3
3 2019-05-20         1           1343    383       61 Smiths Crinkle Cut Chips Chicken 170g        2       2.9
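One likely cause, assuming transaction_data is a tibble (the printed column types suggest so): indexing with [, "PROD_NAME"] returns a one-column table rather than a character vector, and strsplit() only accepts character vectors. Below is a minimal sketch with a hand-built three-row tibble. Splitting on a single space is my assumption about the intent, since splitting on "" breaks the names into single characters.

```r
library(tibble)

# Hand-built stand-in with the three product names from the sample rows
transaction_data <- tibble(
  PROD_NAME = c(
    "Natural Chip Compny SeaSalt175g",
    "CCs Nacho Cheese 175g",
    "Smiths Crinkle Cut Chips Chicken 170g"
  )
)

# On a tibble, [, "PROD_NAME"] is still a table, which strsplit() rejects
still_a_table <- !is.character(transaction_data[, "PROD_NAME"])

# [[ ]] always extracts the column itself as a character vector
prod_names <- unique(transaction_data[["PROD_NAME"]])

# One word per element; the " " separator is an assumption
product_words <- unlist(strsplit(prod_names, " "))
```

The resulting vector of words can then be inspected to decide which products are not chips; that filtering step is not shown here.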
I am trying to extract the team with the most wins each year in women's college basketball. So far I have the number of wins per team per year, but I want only the team with the maximum number of wins in each year.
winsbyyear <- WomenCBnewdf %>%
group_by(Year,Team)%>%
summarise(totalwinsyr = sum(Outcome))
The output currently looks like this, but I expect to see each year only once, with the team with the most wins and its win total in the following columns:
Year Team totalwinsyr
<fct> <chr> <dbl>
1 2014 AbileneChristian 10
2 2014 AirForce 0
3 2014 Akron 18
4 2014 Alabama 10
5 2014 AlabamaAM 3
6 2014 AlabamaHuntsville 0
7 2014 AlabamaMobile 0
8 2014 AlabamaSt 15
9 2014 AlaskaAnchorage 1
10 2014 AlbanyNY 16
I have already looked at "How to select the rows with maximum values in each group with dplyr?" but I could not find any resources to help with a group_by() on multiple columns.
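Total the wins per team and year, then regroup by Year alone and filter. (Filtering while still grouped by Year and Team would keep every row, since each row is the maximum of its own group.)
winsbyyear <- WomenCBnewdf %>%
  group_by(Year, Team) %>%
  summarise(totalwinsyr = sum(Outcome)) %>%
  group_by(Year) %>%
  filter(totalwinsyr == max(totalwinsyr))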
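Here is a small self-contained check of the per-year maximum using made-up data (the team names are reused from the output above, but the outcomes are hypothetical). The key point is that filter() has to run while the data is grouped by Year only; while it is still grouped by Year and Team, every row is trivially its own group's maximum. The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical game-level data: Outcome is 1 for a win, 0 for a loss
games <- data.frame(
  Year    = c(2014, 2014, 2014, 2014, 2015, 2015, 2015),
  Team    = c("Akron", "Akron", "Alabama", "Alabama", "Akron", "Alabama", "Alabama"),
  Outcome = c(1, 1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

top_by_year <- games %>%
  group_by(Year, Team) %>%
  summarise(totalwinsyr = sum(Outcome), .groups = "drop") %>%  # one row per team-year
  group_by(Year) %>%                                           # compare teams within a year
  filter(totalwinsyr == max(totalwinsyr)) %>%
  ungroup()
```

If two teams tie for the most wins in a year, this keeps both rows.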
I am writing a function that fetches an album's review and rating from Pitchfork and strips the HTML. The result should be a list with two elements: the review and the score of that album.
So far I have this. I am still figuring out what to return, the regex for the HTML part, and how to use paste0. Thank you for your time!
pitchfork = function(url){
save = getURL(url)
cat(save,file = "review.txt")
a1 = '<div class="contents dropcap"><p>'
b1 = str_replace(save, paste0("^.*",a1),"")
a2 = '</div><a class="end-mark-container" href="/">'
b2 = str_replace(b1, paste0(a2,".*$"),"")
}
How about something like this?
library(xml2)
library(rvest)
library(tidyverse)
url <- "http://pitchfork.com/reviews/albums/grimes-miss-anthropocene"
html <- read_html(url)
review <- html %>%
xml_nodes("p") %>%
html_text() %>%
enframe("paragraph_no", "text")
review
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…
review is a tibble and contains the review split by paragraph; it might need some additional cleaning up (like removing the first and last row(s)).
For the score we can use a class attribute selector:
score <- html %>% xml_nodes("[class='score']") %>% html_text() %>% as.numeric()
score
#[1] 8.2
Wrapping things up (in a function)
Let's wrap everything in a function that returns a list with the review tibble and numeric score.
get_pitchfork_data <- function(url) {
html <- read_html(url)
list(
review = html %>%
xml_nodes("p") %>%
html_text() %>%
trimws() %>%
enframe("paragraph_no", "text"),
score = html %>%
xml_nodes("[class='score']") %>%
html_text() %>%
as.numeric())
}
Test 1:
Grimes - Miss Anthropocene
get_pitchfork_data("http://pitchfork.com/reviews/albums/grimes-miss-anthropocene")
#$review
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…
#
#$score
#[1] 8.2
Test 2:
Radiohead - OK Computer (reissue)
get_pitchfork_data("https://pitchfork.com/reviews/albums/radiohead-ok-computer-oknotok-1997-2017/")
#$review
## A tibble: 12 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new reissue
# 2 2 Twenty years on, Radiohead revisit their 1997 masterpiece with …
# 3 3 As they regrouped to figure out what their third album might be…
# 4 4 It’s still funny to think, two decades later, that Thom Yorke’s…
# 5 5 It’s unclear what happened to that album. OK Computer obviously…
# 6 6 OKNOTOK is something a little more interesting than a remaster …
# 7 7 But “Lift’s” reputation for positivity might be a little confus…
# 8 8 The most fun to be had with OKNOTOK is in these line-blurring m…
# 9 9 This fondness for camp and schlock has always been latent in Ra…
#10 10 The ghost of Bond followed them once they decamped from their s…
#11 11 Radiohead have been at least as brilliant at packaging and posi…
#12 12 Now that they have arrived at an autumnal, valedictory stage in…
#
#$score
#[1] 10
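Since the live page can change, it may help to try the same selectors on a fixed HTML string first. This is only a sketch: the fragment below is a made-up stand-in, not Pitchfork's real markup, and it uses html_nodes(), which I am assuming is available in your rvest version.

```r
library(xml2)
library(rvest)

# Hypothetical page fragment standing in for a downloaded review
page <- read_html('<html><body>
  <span class="score">8.2</span>
  <p> First paragraph. </p>
  <p>Second paragraph.</p>
</body></html>')

# All <p> elements as trimmed text
paragraphs <- page %>%
  html_nodes("p") %>%
  html_text() %>%
  trimws()

# Element whose class attribute is exactly "score", converted to a number
score <- page %>%
  html_nodes("[class='score']") %>%
  html_text() %>%
  as.numeric()

result <- list(review = paragraphs, score = score)
```

Parsing the document tree this way avoids the fragile regex matching on raw HTML from the question.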
Closed 4 years ago.
I have a tibble, Agencies, with two columns as follows:
> head(Agencies, 10)
# A tibble: 10 x 2
AgencyNumber State
<int> <chr>
1 1 AR
2 2 Arkansas
3 3 Texas
4 4 Texas
5 5 TX
6 6 IL
7 7 Illinois
8 8 Illinois
9 9 IL
10 10 IL
I'm trying to add a column (Agencies$STATE) with the full state name. If Agencies$State is an abbreviation, it should use the abbr2state function to save the full name to the new column. If Agencies$State already has the full name, it should store the value of Agencies$State to the new column.
I'm using the following code:
Agencies$STATE <- "NA"
for(i in 1:nrow(Agencies)) {
if(nchar(Agencies$State[i] == 2)) {
Agencies$STATE[i] <- abbr2state(Agencies$State[i])
}
else {
Agencies$STATE[i] <- Agencies$State[i]
}
}
The output is unexpected. It appears to evaluate the first if statement as expected, but ignores the else statement.
> head(Agencies, 10)
# A tibble: 10 x 3
AgencyNumber State STATE
<int> <chr> <chr>
1 1 AR Arkansas
2 2 Arkansas <NA>
3 3 Texas <NA>
4 4 Texas <NA>
5 5 TX Texas
6 6 IL Illinois
7 7 Illinois <NA>
8 8 Illinois <NA>
9 9 IL Illinois
10 10 IL Illinois
I'm a bit new to R so this may be an obvious error, but I'm missing it.
Any suggestions on why this isn't doing what I expect?
Thanks,
Jeff
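Your condition nchar(Agencies$State[i] == 2)
should be nchar(Agencies$State[i]) == 2.
You misplaced the closing parenthesis. As written, the comparison Agencies$State[i] == 2 is evaluated first and gives FALSE, and nchar(FALSE) is 5, which if() treats as TRUE. So the first branch always runs, and abbr2state() returns NA for the full state names.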
You can also use dplyr to avoid the loop:
library(dplyr)
Agencies %>%
mutate(state = ifelse( stringi::stri_length(State) == 2,abbr2state(State),State))
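The effect of the misplaced parenthesis can be checked in isolation. This is a base R sketch; to keep it dependency-free it uses the built-in state.abb and state.name vectors as a stand-in for abbr2state(), which is my substitution and not what the question uses.

```r
# Sample of the State column from the question
State <- c("AR", "Arkansas", "Texas", "TX", "IL", "Illinois")

# Misplaced parenthesis: the comparison runs first and yields FALSE,
# then nchar() counts the characters of "FALSE"
bad_condition <- nchar(State[2] == 2)

# Intended condition: is the string two characters long?
good_condition <- nchar(State[2]) == 2

# Vectorised version without a loop; match() looks up the abbreviation's
# position in state.abb, and state.name supplies the full name
STATE <- ifelse(nchar(State) == 2,
                state.name[match(State, state.abb)],
                State)
```

Here bad_condition is a non-zero number, so if() would always take the first branch, while good_condition is a proper TRUE/FALSE value.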
Closed 5 years ago.
I'm studying the main functions of the dplyr package. When I type flights, I get:
year month day dep_time sched_dep_time dep_delay
<int> <int> <int> <int> <int> <dbl>
1 2013 1 1 517 515 2
2 2013 1 1 533 529 4
3 2013 1 1 542 540 2
4 2013 1 1 544 545 -1
5 2013 1 1 554 600 -6
6 2013 1 1 554 558 -4
7 2013 1 1 555 600 -5
8 2013 1 1 557 600 -3
9 2013 1 1 557 600 -3
10 2013 1 1 558 600 -2
We can see that day is a column name. When I type:
jan1 <- filter (flights, month == 1, day==1)
I get the error message
Error in match.arg(method) : object 'day' not found
But it is a column name. Could you help me?
I think you may have a different package loaded that also defines filter(), because
library(nycflights13)
filter(flights, month==1, day==2)
works for me.
Can you call dplyr::filter explicitly and see whether it works then?
dplyr::filter(flights, month==1, day==2)
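The match.arg(method) in the error message is consistent with base R's stats::filter being called instead of dplyr's. A minimal sketch of the difference, using a small made-up data frame in place of nycflights13::flights:

```r
library(dplyr)

# Hypothetical stand-in for flights with just the columns needed here
flights <- data.frame(
  month     = c(1, 1, 2),
  day       = c(1, 2, 1),
  dep_delay = c(2, 4, -1)
)

# Fully qualified call: always dplyr's filter, whatever else is attached
jan1 <- dplyr::filter(flights, month == 1, day == 1)

# stats::filter is a time-series function; its third argument is `method`,
# so `day == 1` is evaluated outside the data frame and fails
err <- tryCatch(
  stats::filter(flights, month == 1, day == 1),
  error = function(e) conditionMessage(e)
)
```

If filter() misbehaves like this, check the order in which packages were attached, or keep using the dplyr:: prefix.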
Try this:
library(dplyr)
df <- tibble(year = sample(2000:2017, 10, replace = TRUE), month = sample(1:12, 10, replace = TRUE), day = sample(1:31, 10, replace = TRUE))
may3 <- filter(df, month == 5) %>% filter(day == 3)
or
may3 <- filter(df, month == 5 & day == 3)