Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I'm studying the main functions of the dplyr package. When I type flights I get this:
year month day dep_time sched_dep_time dep_delay
<int> <int> <int> <int> <int> <dbl>
1 2013 1 1 517 515 2
2 2013 1 1 533 529 4
3 2013 1 1 542 540 2
4 2013 1 1 544 545 -1
5 2013 1 1 554 600 -6
6 2013 1 1 554 558 -4
7 2013 1 1 555 600 -5
8 2013 1 1 557 600 -3
9 2013 1 1 557 600 -3
10 2013 1 1 558 600 -2
We can see that day is a column name. When I type:
jan1 <- filter(flights, month == 1, day == 1)
I get the error message
Error in match.arg(method) : object 'day' not found
But it is a column name. Could you help me?
I think you may have a different package loaded that also defines filter, because
library(nycflights13)
filter(flights, month==1, day==2)
works for me.
Can you explicitly use dplyr::filter and see if it works then?
dplyr::filter(flights, month==1, day==2)
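If you are unsure whether masking is happening, a quick check (a sketch using base R's find()) is:

```r
library(dplyr)
# find() lists every attached package that defines an object named
# 'filter'; the first entry is the one a bare filter() call resolves to.
find("filter")
# e.g. "package:dplyr" "package:stats"
```

This also explains the exact error message: stats::filter() has a method argument, so in a positional call like filter(flights, month == 1, day == 1) the third argument (day == 1) is matched to method, and evaluating it inside match.arg() produces "object 'day' not found".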
Try this:
library(dplyr)
df <- tbl_df(data.frame(year = sample(2000:2017, 10, replace = TRUE),
                        month = sample(1:12, 10, replace = TRUE),
                        day = sample(1:31, 10, replace = TRUE)))
may3 <- filter(df, month == 5) %>% filter(day == 3)
or
may3 <- filter(df, month == 5 & day == 3)
Closed 1 year ago.
I have a data.frame and would like to average a column that contains an NA.
When performing the calculation I noticed that R cannot compute the average, returning NA as the result.
Note: I cannot simply remove the rows with NA, as that would also remove values in other columns that interest me.
df1<-read.table(text="st date ph
1 01/02/2004 5
16 01/02/2004 6
2 01/02/2004 8
2 01/02/2004 8
2 01/02/2004 8
16 01/02/2004 6
1 01/02/2004 NA
1 01/02/2004 5
16 01/02/2004 NA
", sep="", header=TRUE)
library(dplyr)
df2 <- df1 %>%
  group_by(st, date) %>%
  summarise(ph = mean(ph))
View(df2)
The output shows NA for the groups that contain a missing value; I expected the mean of the non-missing values for each group instead.
You need to use na.rm = TRUE:
df2 <- df1 %>%
  group_by(st, date) %>%
  summarise(ph = mean(ph, na.rm = TRUE))
df2
# A tibble: 3 x 3
# Groups: st [3]
st date ph
<int> <chr> <dbl>
1 1 01/02/2004 5
2 2 01/02/2004 8
3 16 01/02/2004 6
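As a side note, if several numeric columns need the same NA-aware mean, across() can apply it to all of them at once (a sketch; across() requires dplyr >= 1.0):

```r
library(dplyr)
df2 <- df1 %>%
  group_by(st, date) %>%
  # mean(.x, na.rm = TRUE) is applied to every numeric column (here just ph)
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")
```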
I had an R question concerning data wrangling. A sample data set I will include is downloadable online:
x<- read.csv("http://mgimond.github.io/ES218/Data/CO2.csv")
The data table is shown in the attached image.
I want to create a new column, let's say "time_since". This column would look at the "Average" column and count the time (in this case months) since "Average" dropped below 300. In this screenshot all values are above 300, so the value would be 0, but the first month with a value below 300 would be 1 (one month under 300). If the following months stay under 300, the counter keeps increasing, but as soon as Average goes back above 300 it resets.
Basically, it would be a function that counts the time since a condition was met and restarts once the condition is broken across dates.
I apologize if the wording is a bit confusing, but hopefully the message comes across.
Maybe you can try:
library(dplyr)
x %>%
  group_by(grp = cumsum(as.integer(Average > 300))) %>%
  mutate(time_since = row_number()) %>%
  ungroup() -> result
Just to show you one excerpt of output where time_since > 1.
result %>% filter(grp == 61)
# Year Month Average Interpolated Trend Daily_mean grp time_since
# <int> <int> <dbl> <dbl> <dbl> <int> <int> <int>
#1 1964 1 320. 320. 320. -1 61 1
#2 1964 2 -100. 320. 320. -1 61 2
#3 1964 3 -100. 321. 320. -1 61 3
#4 1964 4 -100. 322. 319. -1 61 4
Here is a data.table approach. In this example, time_since counts rows since the Average variable last dropped below 315 (the counter restarts at every row where Average < 315).
x<- read.csv("http://mgimond.github.io/ES218/Data/CO2.csv")
library(data.table)
setDT(x)
x[, time_since := seq_len(.N), keyby = .(cumsum(Average < 315))][1:10, ]
#> Year Month Average Interpolated Trend Daily_mean time_since
#> 1: 1959 1 315.62 315.62 315.70 -1 1
#> 2: 1959 2 316.38 316.38 315.88 -1 2
#> 3: 1959 3 316.71 316.71 315.62 -1 3
#> 4: 1959 4 317.72 317.72 315.56 -1 4
#> 5: 1959 5 318.29 318.29 315.50 -1 5
#> 6: 1959 6 318.15 318.15 315.92 -1 6
#> 7: 1959 7 316.54 316.54 315.66 -1 7
#> 8: 1959 8 314.80 314.80 315.81 -1 1
#> 9: 1959 9 313.84 313.84 316.55 -1 1
#> 10: 1959 10 313.26 313.26 316.19 -1 1
Created on 2021-03-17 by the reprex package (v0.3.0)
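If the asker's exact semantics are wanted (0 while Average is above the threshold, counting up only while it stays below), a base-R sketch with cumsum() and ave() would be:

```r
# Toy vector standing in for the Average column (assumption: the asker
# wants 0 above the threshold and a running month count below it).
avg <- c(310, 305, 299, 298, 297, 301, 296)
below <- avg <= 300
grp <- cumsum(!below)  # a new run starts at each above-threshold month
time_since <- ave(as.integer(below), grp, FUN = cumsum)
time_since
# 0 0 1 2 3 0 1
```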
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I have a simple question. I have a big df like:
Name AGE Order
Anna 25 1
Anna 28 2
Peter 10 1
Paul 15 1
Mary 14 1
John 8 1
Charlie 24 2
Robert 20 2
For Order == 1 only, I need to filter AGE >= 10 & AGE <= 15; the other rows stay as they are. So the output must be:
Name AGE Order
Anna 28 2
Peter 10 1
Paul 15 1
Mary 14 1
Charlie 24 2
Robert 20 2
Could you help me, please?
We can use vectorized ifelse: for Order == 1, check whether AGE lies in the range 10-15; keep the remaining rows as they are.
df[ifelse(df$Order==1, df$AGE >= 10 & df$AGE <= 15, TRUE), ]
# Name AGE Order
#2 Anna 28 2
#3 Peter 10 1
#4 Paul 15 1
#5 Mary 14 1
#7 Charlie 24 2
#8 Robert 20 2
We can also consolidate to:
subset(df, AGE >= 10 & AGE <= 15 | Order != 1)
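The same condition works unchanged in dplyr::filter, if you prefer the pipe style used elsewhere on this page (a sketch, equivalent to the subset() call above):

```r
library(dplyr)
# Keep rows where Order != 1; for Order == 1, keep only AGE in [10, 15]
df %>% filter((AGE >= 10 & AGE <= 15) | Order != 1)
```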
Closed 6 years ago.
I am not quite sure why this piece of code isn't working.
Here's what my data looks like:
head(test)
Fiscal.Year Fiscal.Quarter Seller Product.Revenue Product.Quantity Product.Family Sales.Level.1 Group Fiscal.Week
1 2015 2015Q3 ABCD1234 4000 4 Paper cup Americas Paper Division 32
2 2014 2014Q1 DDH1234 300 5 Paper tissue Asia Pacific Paper Division 33
3 2015 2015Q1 PNS1234 298 6 Spoons EMEA Cutlery 34
4 2016 2016Q4 CCC1234 289 7 Knives Africa Cutlery 33
Now, my objective is to summarize revenue by year.
Here's the dplyr code I wrote:
test %>%
  group_by(Fiscal.Year) %>%
  select(Seller, Product.Family, Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
This doesn't work. I get the error:
Error: object 'Product.Revenue' not found
However, when I get rid of the select statement it works, but then I don't get to see Seller and Product.Family in the output.
test %>%
  group_by(Fiscal.Year) %>%
  # select(Seller, Product.Family, Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
The output is :
# A tibble: 3 x 2
Fiscal.Year Rev1
<dbl> <dbl>
1 2014 300
2 2015 4298
3 2016 289
This works well.
Any idea what's going on? It's been about 3 weeks since I started programming in R. So, I'd appreciate your thoughts. I am following this guide: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
Also, I looked at similar threads on SO, but I believe they related to issues with the "+" sign: Error in dplyr group_by function, object not found
I am looking for the following output:
Fiscal.Year Rev1 Product Family Seller
<dbl> <dbl> ... ...
1 2014 ...
2 2015 ...
3 2016 ...
Many thanks
Ok. This did the trick. (The earlier error occurred because select() had dropped the Product.Revenue column before summarise() could use it; grouping by Seller and Product.Family instead of select()ing them keeps Product.Revenue available.)
test %>%
  group_by(Fiscal.Year, Seller, Product.Family) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
Output:
Source: local data frame [4 x 4]
Groups: Fiscal.Year, Seller [4]
Fiscal.Year Seller Product.Family Rev1
<dbl> <chr> <chr> <dbl>
1 2014 DDH1234 Paper tissue 300
2 2015 ABCD1234 Paper cup 4000
3 2015 PNS1234 Spoons 298
4 2016 CCC1234 Knives 289
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I have a very large csv file I have imported into R and need to make a subset of data.
The csv looks something like this:
Julian_Day Id Year
52 1 1901
56 5 1901
200 1 1968
etc., where Year is 1901-2010, Id is 1-58, and Julian_Day is 1-200, for about 130,000 rows of data. I only want the lowest Julian_Day value for each Year and Id, and to get rid of all other rows of data.
Data:
df = data.frame(Year=c(1901,1901,1968,1901),
Id=c(1,5,1,1),
Julian_Day=c(52,56,200,40),
Animal=c('dog','doggy','style','fashion'))
Try this:
library(data.table)
setDT(df)[, min := min(Julian_Day), by = .(Id, Year)]
#>df
# Year Id Julian_Day Animal min
#1: 1901 1 52 dog 40
#2: 1901 5 56 doggy 56
#3: 1968 1 200 style 200
#4: 1901 1 40 fashion 40
Or simply with base R
aggregate(Julian_Day ~ Year + Id, df, min)
# Year Id Julian_Day
# 1 1901 1 40
# 2 1968 1 200
# 3 1901 5 56
Or
library(dplyr)
df %>%
group_by(Id, Year) %>%
summarise(Julian_Day = min(Julian_Day))
# Source: local data frame [3 x 3]
# Groups: Id
#
# Id Year Julian_Day
# 1 1 1901 40
# 2 1 1968 200
# 3 5 1901 56
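If the goal is to keep the entire row (including columns like Animal) rather than just the minimum value, slice_min() keeps whole rows (a sketch; slice_min() requires dplyr >= 1.0):

```r
library(dplyr)
df %>%
  group_by(Id, Year) %>%
  # with_ties = FALSE keeps exactly one row per group even if the
  # minimum Julian_Day is tied
  slice_min(Julian_Day, n = 1, with_ties = FALSE) %>%
  ungroup()
```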