Unable to use frollmean and %>% [closed] - r

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I want to pipe my data table to frollmean to calculate the rolling average of a column, but I am unable to get it to work.
head(mergedDT)
        date Operating_hours DRIVING_TIME net_hrs workday n_alarms
1 2018-03-20             110          759       0    TRUE        8
2 2018-03-21             121          641      11    TRUE        5
3 2018-03-22             133          625      12    TRUE        4
4 2018-03-23             145          672      12    TRUE        4
5 2018-03-24             145            0       0   FALSE        1
6 2018-03-25             145            0       0   FALSE        1
mergedDT %>% frollmean("n_alarms",2)

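The pipe passes mergedDT as the first argument, so frollmean() receives the whole table as x and "n_alarms" as the window size n. Call it on the column inside mutate() instead: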
mergedDT %>% mutate(mean=frollmean(n_alarms,2))
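If you would rather stay within data.table, a minimal sketch of the same calculation is shown here. It assumes the data.table package is installed; the toy table only reuses the n_alarms values from the question, and the column name roll_mean is made up for illustration.

```r
library(data.table)

# Toy table reusing the n_alarms values shown in the question
mergedDT <- data.table(
  date     = as.Date("2018-03-20") + 0:5,
  n_alarms = c(8, 5, 4, 4, 1, 1)
)

# Add the rolling mean by reference: window of 2 rows, right-aligned (the default),
# so the first row has no complete window and is NA
mergedDT[, roll_mean := frollmean(n_alarms, 2)]

mergedDT$roll_mean
# NA 6.5 4.5 4.0 2.5 1.0
```

Here frollmean() is called on the column itself, which is the same idea as wrapping it in mutate().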


using endsWith(), question from R4DS learner [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed last year.
This is my first attempted reprex. I am working through R for Data Science. I am trying to narrow down a data frame to then be able to mutate it, but having trouble with the endsWith() function I think. When I run this section of the code I get the following error message. You can see that when I then change to (x, "delay") I get a different message. I am not sure how to deal with either and would love some help. Also, I'm not sure why, but dplyr::select() is working for me (as an example), while select() is not, so that's why it's different than the book. Thanks!
flights_sml <- dplyr::select(flights,
year:day,
endsWith("delay"),
distance,
air_time
)
Error: argument "suffix" is missing, with no default
Run rlang::last_error() to see where the error occurred.
flights_sml <- dplyr::select(flights,
year:day,
endsWith(x, "delay"),
distance,
air_time
)
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical.
ℹ It must be numeric or character.
Good start. The tidy-select helper is ends_with(), while endsWith() is the base R function. Base endsWith(x, suffix) returns a logical vector, which select() cannot use to pick columns.
library(nycflights13)
library(dplyr)
flights_sml <- flights %>%
  select(year:day, ends_with("delay"), distance, air_time)
flights_sml
#> # A tibble: 336,776 × 7
#>     year month   day dep_delay arr_delay distance air_time
#>    <int> <int> <int>     <dbl>     <dbl>    <dbl>    <dbl>
#>  1  2013     1     1         2        11     1400      227
#>  2  2013     1     1         4        20     1416      227
#>  3  2013     1     1         2        33     1089      160
#>  4  2013     1     1        -1       -18     1576      183
#>  5  2013     1     1        -6       -25      762      116
#>  6  2013     1     1        -4        12      719      150
#>  7  2013     1     1        -5        19     1065      158
#>  8  2013     1     1        -3       -14      229       53
#>  9  2013     1     1        -3        -8      944      140
#> 10  2013     1     1        -2         8      733      138
#> # … with 336,766 more rows
Created on 2022-01-16 by the reprex package (v2.0.1)
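To see why the second attempt complained about a logical subscript, here is a small base R sketch of what endsWith() returns. The vector cols is just the column names from the output above typed in by hand, so nothing here needs nycflights13 or dplyr.

```r
# Column names copied from the tibble above (hypothetical stand-in for names(flights_sml))
cols <- c("year", "month", "day", "dep_delay", "arr_delay", "distance", "air_time")

# Base R endsWith() needs both the strings to test (x) and the suffix,
# and it returns one TRUE/FALSE per element
is_delay <- endsWith(cols, "delay")
is_delay
# FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE

# The logical vector can index the names directly
cols[is_delay]
# "dep_delay" "arr_delay"
```

ends_with("delay") does the name lookup for you inside select(), which is why it takes only the suffix.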

Merge rows which have the same date within a data frame [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have a data.frame as follows:
timestamp index negative positive sentiment
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-10-29 15:00:10 0 11 10 -1
2 2015-10-29 17:26:48 0 1 5 4
3 2015-10-29 17:30:07 0 10 22 12
4 2015-10-29 20:13:22 0 5 6 1
5 2015-10-30 14:25:26 0 3 2 -1
6 2015-10-30 18:22:30 0 14 15 1
7 2015-10-31 14:16:00 0 10 23 13
8 2015-11-02 20:30:18 0 14 7 -7
9 2015-11-03 14:15:00 0 8 26 18
10 2015-11-03 16:52:30 0 12 34 22
I would like to know whether it is possible to merge rows that fall on the same day, so that I have one score per day. I have no idea how to approach this: I don't know how to extract each date or how to write a function that merges only equal dates, because the time differs within each day. I would like to obtain a data.frame of the following form:
timestamp index negative positive sentiment
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-10-29 0 27 43 16
2 2015-10-30 0 3 2 -1
3 2015-10-31 0 17 17 0
4 2015-11-02 0 14 7 -7
5 2015-11-03 0 20 60 40
Is there any way to get this result? I would be thankful for any hint.
You can use aggregate() to do this. First, you need to group by calendar day, ignoring the time of day; as.Date() drops the time part of the timestamp.
I will assume you have your data stored as df, with timestamp as a date-time column:
aggregate(df[, 2:5], by = list(timestamp = as.Date(df$timestamp)), FUN = sum)
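As a rough, self-contained check of this approach, the sketch below rebuilds the first five rows of the question's data and sums them per day. Two assumptions to flag: the timestamps are taken to be in UTC, and the result column is named day here for illustration. If your timestamps are in another time zone, pass that zone to as.Date() so that rows near midnight land on the intended day.

```r
# First five rows of the question's data, timestamps assumed to be UTC
df <- data.frame(
  timestamp = as.POSIXct(c("2015-10-29 15:00:10", "2015-10-29 17:26:48",
                           "2015-10-29 17:30:07", "2015-10-29 20:13:22",
                           "2015-10-30 14:25:26"), tz = "UTC"),
  negative  = c(11, 1, 10, 5, 3),
  positive  = c(10, 5, 22, 6, 2),
  sentiment = c(-1, 4, 12, 1, -1)
)

# Group by calendar day and sum every score column within each day
daily <- aggregate(df[, c("negative", "positive", "sentiment")],
                   by = list(day = as.Date(df$timestamp, tz = "UTC")),
                   FUN = sum)
daily
# 2015-10-29: negative 27, positive 43, sentiment 16
# 2015-10-30: negative  3, positive  2, sentiment -1
```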

Standard deviation for a subset [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 7 years ago.
I am trying to calculate the mean and standard deviation for a variable within a subset. The code works fine for mean but not for sd. I have included a sample of the data; orf1 is the subset. Any help?
> mean(Stocking.Density2012,na.rm=TRUE,data=orf1)
[1] 13.72386
> sd(Stocking.Density2012,na.rm=TRUE,data=orf1)
Error in sd(Stocking.Density2012, na.rm = TRUE, data = orf1) :
  unused argument (data = orf1)
Region Stocking.Density2012
1 12
8 7
2 12
8 17
1 34
3 24
1 16
2 5
1 5
4 11
1 5
3 3
7 3
5 13
1 18
4 15
2 18
1 10
6 5
1 10
5 46
1 19
3 12
1 15
6 4
1 4
7 8
1 8
8 12
data is not an argument of either mean() or sd(), so Stocking.Density2012 must have been found in an enclosing environment. Perhaps you attached a data frame.
mean() doesn't give an error because it has a ... argument, which silently absorbs data = orf1; sd() has no ... argument, so it rejects it.
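To compute both statistics on the subset itself, refer to the column through the data frame. A small sketch follows; the orf1 below is a made-up stand-in built from the first five rows of the sample, not the asker's real subset.

```r
# Hypothetical stand-in for the subset, using the first five sample rows
orf1 <- data.frame(Region = c(1, 8, 2, 8, 1),
                   Stocking.Density2012 = c(12, 7, 12, 17, 34))

# Option 1: index the column explicitly
m  <- mean(orf1$Stocking.Density2012, na.rm = TRUE)
s1 <- sd(orf1$Stocking.Density2012, na.rm = TRUE)

# Option 2: evaluate the expression inside the data frame
s2 <- with(orf1, sd(Stocking.Density2012, na.rm = TRUE))

# mean() swallows an unknown argument through `...` without complaint,
# which is why the original call appeared to work
m_ignored <- mean(orf1$Stocking.Density2012, na.rm = TRUE, data = "ignored")

m
# 16.4
```

Both options give the same standard deviation, and neither depends on anything being attached.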

How to find LC50 using r? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have run a cadmium exposure test (46 h) and now I want to find the LC50 value (lethal concentration) and its 95% confidence limits (upper and lower) using R.
Here are my data:
Conc. mg/L Dead Live
C1 0 10
C2 0 10
C3 0 10
2 0 10
2 0 10
2 0 10
4 0 10
4 0 10
4 0 10
8 0 10
8 0 10
8 0 10
16 1 9
16 1 9
16 8 8
32 1 9
32 2 8
32 4 6
64 8 2
64 2 8
64 5 5
128 10 0
128 8 2
128 10 0
256 10 0
256 10 0
256 10 0
From here, it seems that LC50 is the minimum concentration at which 50% or more of organisms die. You could aggregate your data to compute the proportion of organisms that died at each concentration level:
# Numeric concentration
dat$Conc.mg.L <- as.character(dat$Conc.mg.L)
dat$Conc.mg.L[dat$Conc.mg.L %in% c("C1", "C2", "C3")] <- 0
dat$Conc.mg.L <- as.numeric(dat$Conc.mg.L)
# Determine LC50
(agg <- tapply(dat$Dead / (dat$Dead+dat$Live), dat$Conc.mg.L, mean))
#         0         2         4         8        16        32        64       128       256
# 0.0000000 0.0000000 0.0000000 0.0000000 0.2333333 0.2333333 0.5000000 0.9333333 1.0000000
as.numeric(names(agg)[min(which(agg >= 0.5))])
# [1] 64

summing a range of columns in data frame

I am having trouble summing select columns within a data frame, a basic problem that I've seen numerous similar, but not identical questions/answers for on StackOverflow.
With this perhaps overly complex data frame:
site<-c(223,257,223,223,257,298,223,298,298,211)
moisture<-c(7,7,7,7,7,8,7,8,8,5)
shade<-c(83,18,83,83,18,76,83,76,76,51)
sampleID<-c(158,163,222,107,106,166,188,186,262,114)
bluestm<-c(3,4,6,3,0,0,1,1,1,0)
foxtail<-c(0,2,0,4,0,1,1,0,3,0)
crabgr<-c(0,0,2,0,33,0,2,1,2,0)
johnson<-c(0,0,0,7,0,8,1,0,1,0)
sedge1<-c(2,0,3,0,0,9,1,0,4,0)
sedge2<-c(0,0,1,0,1,0,0,1,1,1)
redoak<-c(9,1,0,5,0,4,0,0,5,0)
blkoak<-c(0,22,0,23,0,23,22,17,0,0)
my.data<-data.frame(site,moisture,shade,sampleID,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak)
I want to sum the counts of each plant species (bluestem, foxtail, etc.; columns 5-12 in this example) within each site, by summing rows that have the same site number. I also want to keep the information about moisture and shade (these are consistent within a site, but may also be the same between sites), and I want a new column that counts the number of rows summed.
The result would look like this:
site,moisture,shade,NumSamples,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak
211,5,51,1,0,0,0,0,0,1,0,0
223,7,83,4,13,5,4,8,6,1,14,45
257,7,18,2,4,2,33,0,0,1,1,22
298,8,76,3,2,4,3,9,13,2,9,40
The problem I am having is that my real data sets (and I have several of them) have from 50 to 300 plant species, and I want to refer to a range of columns (in this case, [5:12]) instead of my.data$foxtail, my.data$sedge1, etc., which is going to be very difficult with 300 species.
I know I can start off by deleting the column I don't need (SampleID)
my.data$SampleID <- NULL
but then how do I get the sums? I've messed with the aggregate command and with ddply, and have seen lots of examples which call particular column names, but just haven't gotten anything to work. I recognize this is a variant of a commonly asked and simple type of question, but I've spent hours without resolving it on my own. So, apologies for my stupidity!
This works ok:
# Sum the species columns (5 to 12) within each site/moisture/shade combination
x <- aggregate(my.data[, 5:12],
               by = list(site = my.data$site, moisture = my.data$moisture, shade = my.data$shade),
               FUN = sum, na.rm = TRUE)
library(dplyr)
# Count the rows per site, then attach the sums
my.data %>%
  group_by(site) %>%
  tally() %>%
  left_join(x, by = "site")
  site n moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1  211 1        5    51       0       0      0       0      0      1      0      0
2  223 4        7    83      13       5      4       8      6      1     14     45
3  257 2        7    18       4       2     33       0      0      1      1     22
4  298 3        8    76       2       4      3       9     13      2      9     40
Or to do it all in dplyr (summarise_each() and funs() are deprecated in current dplyr, so across() is used here):
my.data %>%
  group_by(site) %>%
  tally() %>%
  left_join(my.data, by = "site") %>%
  group_by(site, moisture, shade, n) %>%
  summarise(across(everything(), sum), .groups = "drop") %>%
  select(-sampleID)
  site moisture shade n bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1  211        5    51 1       0       0      0       0      0      1      0      0
2  223        7    83 4      13       5      4       8      6      1     14     45
3  257        7    18 2       4       2     33       0      0      1      1     22
4  298        8    76 3       2       4      3       9     13      2      9     40
Try the following using base R:
outdf <- data.frame(site=numeric(), moisture=numeric(), shade=numeric(), bluestm=numeric(), foxtail=numeric(), crabgr=numeric(), johnson=numeric(), sedge1=numeric(), sedge2=numeric(), redoak=numeric(), blkoak=numeric())
# Build one key per site/moisture/shade combination
my.data$basic <- with(my.data, paste(site, moisture, shade))
for (b in unique(my.data$basic)) {
  # Start a new row with the three grouping values, converted back to numeric
  outdf[nrow(outdf) + 1, 1:3] <- as.numeric(unlist(strsplit(b, " ")))
  # Sum each species column by name, so the result does not depend on
  # whether sampleID is still present in my.data
  for (sp in names(outdf)[4:11])
    outdf[nrow(outdf), sp] <- sum(my.data[my.data$basic == b, sp])
}
outdf
  site moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1  223        7    83      13       5      4       8      6      1     14     45
2  257        7    18       4       2     33       0      0      1      1     22
3  298        8    76       2       4      3       9     13      2      9     40
4  211        5    51       0       0      0       0      0      1      0      0
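Since the question is really about referring to the species columns as a range, here is one more base R sketch that does so by position and also adds the row count. The data frame toy and its columns sp_a and sp_b are made up for illustration; with the real data, the range would start at the first species column and run to ncol().

```r
# Made-up data in the same shape as my.data: grouping columns, an ID, then species
toy <- data.frame(
  site     = c(1, 2, 1, 2, 2),
  moisture = c(7, 5, 7, 5, 5),
  shade    = c(80, 50, 80, 50, 50),
  sampleID = c(101, 102, 103, 104, 105),
  sp_a     = c(1, 0, 2, 3, 0),
  sp_b     = c(0, 4, 1, 0, 1)
)

species <- 5:ncol(toy)                        # every column from the 5th onward
groups  <- toy[c("site", "moisture", "shade")]

# Sum all species columns per group, and count the rows per group
sums   <- aggregate(toy[species], by = groups, FUN = sum)
counts <- aggregate(list(NumSamples = toy$sampleID), by = groups, FUN = length)

# Combine counts and sums on the grouping columns
result <- merge(counts, sums, by = c("site", "moisture", "shade"))
result
```

The only thing that changes with 300 species is the upper end of the range, which ncol() picks up automatically.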
