Standard deviation for a subset [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 years ago.
I am trying to calculate the mean and standard deviation of a variable within a subset. The code works fine for mean but not for sd. I have included a sample of the data; orf1, passed as data = orf1, is the subset. Any help?
> mean(Stocking.Density2012,na.rm=TRUE,data=orf1)
[1] 13.72386
> sd(Stocking.Density2012,na.rm=TRUE,data=orf1)
Error in sd(Stocking.Density2012, na.rm = TRUE, data = orf1) :
unused argument (data = orf1)
Region Stocking.Density2012
1 12
8 7
2 12
8 17
1 34
3 24
1 16
2 5
1 5
4 11
1 5
3 3
7 3
5 13
1 18
4 15
2 18
1 10
6 5
1 10
5 46
1 19
3 12
1 15
6 4
1 4
7 8
1 8
8 12

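data is not an argument of either mean or sd, so it has no effect on which values are used. Stocking.Density2012 must therefore be found in the enclosing environment or on the search path. Perhaps you attached a data frame that contains it.
mean doesn't give an error because it has a ... argument, which silently absorbs data = orf1; sd has no ... argument, so it reports the unused argument. This also means the mean above was not necessarily computed on the subset orf1.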
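One way to compute both statistics on the subset is to point the functions at the column explicitly. This is a minimal sketch, assuming orf1 is a data frame holding the subset; the five rows used here are only the first rows of the sample in the question.

```r
# Stand-in for the subset: the first five rows of the sample data.
orf1 <- data.frame(
  Region = c(1, 8, 2, 8, 1),
  Stocking.Density2012 = c(12, 7, 12, 17, 34)
)

# Refer to the column through the data frame itself ...
mean_val <- mean(orf1$Stocking.Density2012, na.rm = TRUE)
sd_val   <- sd(orf1$Stocking.Density2012, na.rm = TRUE)

# ... or evaluate the call inside the data frame using with().
sd_with <- with(orf1, sd(Stocking.Density2012, na.rm = TRUE))

# mean() silently ignores an unknown `data` argument; sd() rejects it.
ignored <- mean(c(1, 2, 3), data = orf1)
sd_err  <- tryCatch(sd(c(1, 2, 3), data = orf1), error = function(e) e)
```

Both forms take the values from orf1 rather than from whatever object of the same name happens to be on the search path.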


Unable to use frollmean and %>% [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I want to pipe my data table to frollmean to calculate the rolling average of a column, but I am unable to get it to work.
head(mergedDT)
date Operating_hours DRIVING_TIME net_hrs workday n_alarms
1 2018-03-20 110 759 0 TRUE 8
2 2018-03-21 121 641 11 TRUE 5
3 2018-03-22 133 625 12 TRUE 4
4 2018-03-23 145 672 12 TRUE 4
5 2018-03-24 145 0 0 FALSE 1
6 2018-03-25 145 0 0 FALSE 1
mergedDT %>% frollmean("n_alarms",2)
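The pipe passes the whole table as the first argument of frollmean, so "n_alarms" ends up as the window size n. Call frollmean on the column inside mutate instead: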
mergedDT %>% mutate(mean=frollmean(n_alarms,2))
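The same rolling mean can also be added without dplyr. The sketch below assumes mergedDT is a data.table and rebuilds only the n_alarms column from the question; the column name roll_mean is made up for illustration.

```r
library(data.table)

# The n_alarms column from the question; the other columns are omitted.
mergedDT <- data.table(n_alarms = c(8, 5, 4, 4, 1, 1))

# Pass the column itself (a numeric vector) to frollmean, not its name as a string.
# With a window of 2 the first value has no full window and is NA by default.
mergedDT[, roll_mean := frollmean(n_alarms, 2)]
```

Here := adds the column by reference, so no reassignment of mergedDT is needed.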

R- from columns to rows without taking the header [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have 8 variables per company, with a total of 25 companies. However, I don't need to make any distinction between these companies. In the example below, I need AH and JUMBO in one column, and likewise AHQ1 with JUMBOQ1 and AHQ2 with JUMBOQ2. That way I have 3 columns instead of 6, with twice as many observations in each. The column names can stay AH, AHQ1, and AHQ2.
Thanks in advance for any tips!!
Example of data:
df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016),
"AH" = c(1, NA, 1,1,1,1), "AHQ1" = c(8, NA,7,8,9,10),
"AHQ2" = c(10,NA,7,8,5,2),"JUMBO" = c(NA,NA,1,1,1,NA),
"JUMBOQ1" = c(NA,NA,8,9,7,NA), "JUMBOQ2"= c(NA,NA,10,9,7,NA))
temp <- cbind(df[1:2], df[6:8])
names(temp) <- names(df[1:5])
df2 <- rbind(df[1:5], temp)
> df2
ID Year AH AHQ1 AHQ2
1 1 2012 1 8 10
2 1 2015 NA NA NA
3 2 2012 1 7 7
4 2 2013 1 8 8
5 2 2015 1 9 5
6 2 2016 1 10 2
7 1 2012 NA NA NA
8 1 2015 NA NA NA
9 2 2012 1 8 10
10 2 2013 1 9 9
11 2 2015 1 7 7
12 2 2016 NA NA NA
Is this what you are looking for?
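With 25 companies, repeating the cbind step by hand gets tedious. A possible generalisation is sketched below, assuming every company's columns follow the same prefix-plus-suffix naming as in the example; the vectors companies and suffixes are illustrative and would have to list the real prefixes and suffixes.

```r
df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016),
                 "AH" = c(1, NA, 1,1,1,1), "AHQ1" = c(8, NA,7,8,9,10),
                 "AHQ2" = c(10,NA,7,8,5,2), "JUMBO" = c(NA,NA,1,1,1,NA),
                 "JUMBOQ1" = c(NA,NA,8,9,7,NA), "JUMBOQ2" = c(NA,NA,10,9,7,NA))

companies <- c("AH", "JUMBO")   # assumed column prefixes, one per company
suffixes  <- c("", "Q1", "Q2")  # assumed suffixes shared by all companies

# For each company, take ID, Year and its own columns, then rename them
# to the first company's names so the blocks can be stacked.
blocks <- lapply(companies, function(co) {
  block <- df[c("ID", "Year", paste0(co, suffixes))]
  names(block) <- c("ID", "Year", paste0(companies[1], suffixes))
  block
})

stacked <- do.call(rbind, blocks)
rownames(stacked) <- NULL  # plain 1..n row numbers
```

For the two companies in the example, stacked holds the same rows as df2 above.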

How to plot with 2 grouping variables in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
How can I plot this in R, grouped by Department and Year, with Time on the x-axis and Counts on the y-axis? Each group should be connected by a line and share one colour.
Department Year Counts Time
1 CPD 2011 24 0
2 CPD 2011 28 1
3 CPD 2011 31 2
4 APD 2012 20 0
5 APD 2012 25 2
6 APD 2012 21 3
7 CPD 2012 30 2
8 CPD 2012 26 3
9 CPD 2012 11 5
Do you mean something like this...
library(ggplot2)
df$depYr <- paste(df$Department,df$Year,sep="_") #set a combined dept_year variable
ggplot(df,aes(x=Time,y=Counts,colour=depYr,group=depYr))+geom_line()
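A variation on the same idea builds the combined grouping variable with interaction() instead of paste(), which yields a factor with one level per Department/Year combination. This is only a sketch; the data frame is retyped from the question, and adding geom_point() is an optional extra, not part of the original answer.

```r
library(ggplot2)

# Data retyped from the question.
df <- data.frame(
  Department = c("CPD", "CPD", "CPD", "APD", "APD", "APD", "CPD", "CPD", "CPD"),
  Year       = c(2011, 2011, 2011, 2012, 2012, 2012, 2012, 2012, 2012),
  Counts     = c(24, 28, 31, 20, 25, 21, 30, 26, 11),
  Time       = c(0, 1, 2, 0, 2, 3, 2, 3, 5)
)

# One factor level per Department/Year pair; drop = TRUE removes unused combinations.
df$depYr <- interaction(df$Department, df$Year, sep = "_", drop = TRUE)

# One line and one colour per group; points mark the individual observations.
p <- ggplot(df, aes(x = Time, y = Counts, colour = depYr, group = depYr)) +
  geom_line() +
  geom_point()
```

Printing p draws the plot.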

subsetting a dataframe by a condition in R [duplicate]

This question already has answers here:
Filtering a data frame by values in a column [duplicate]
(3 answers)
Closed 3 years ago.
I have the following data with the ID of subjects.
V1
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 2
10 2
11 2
12 2
13 2
14 2
15 2
16 4
17 4
18 4
19 4
20 4
21 4
22 4
23 4
24 4
I want to subset all the rows of the data where V1 == 4. This way I can see which observations relate to subject 4.
For example, the correct output would be
16 4
17 4
18 4
19 4
20 4
21 4
22 4
23 4
24 4
However, the output I get after subsetting does not show the correct row numbers. It simply gives me:
V1
1 4
2 4
3 4
4 4
5 4
6 4
7 4
8 4
I'm unable to tell which observations relate to subject 4, as observations 1:8 are for subject 2.
I've tried the usual methods, such as
condition<- df == 4
df[condition]
How can I subset the data so that I get back a dataset that shows the correct row numbers for subject 4?
You can also use the subset function:
subset(df,df$V1==4)
I've managed to find a solution since posting:
newdf <- subset(df, V1 == 4)
However, I'm still very interested in other solutions to this problem, so please post if you're aware of another method.
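Another option is plain bracket indexing. The sketch below assumes df has the single column V1 shown in the question; for a one-column data frame, drop = FALSE is what keeps the result a data frame, and with it the original row names.

```r
# Rebuild the example: subject 2 in rows 1-15, subject 4 in rows 16-24.
df <- data.frame(V1 = rep(c(2, 4), times = c(15, 9)))

# Logical row index; drop = FALSE stops the single column from
# collapsing to a bare vector, so row names 16..24 are preserved.
subject4 <- df[df$V1 == 4, , drop = FALSE]

# If only the positions are needed, which() returns them directly.
rows4 <- which(df$V1 == 4)
```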

How to run a loop on different sections of the same data.frame [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 7 years ago.
Suppose I have a data frame with 2 variables which I'm trying to run some basic summary stats on. I would like to run a loop to give me the difference between minimum and maximum seconds values for each unique value of number. My actual data frame is huge and contains many values for 'number' so subsetting and running individually is not a realistic option. Data looks like this:
df <- data.frame(number=c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
seconds=c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))
number seconds
1 1 1
2 1 4
3 1 8
4 2 1
5 2 5
6 2 11
7 2 23
8 3 1
9 3 8
10 4 1
11 4 9
12 4 11
13 4 24
14 4 44
15 4 112
16 5 1
17 5 34
18 5 55
19 5 109
My current code only returns the difference between the minimum and maximum seconds for the entire data frame:
ZZ <- unique(df$number)
for (i in ZZ){
Y <- max(df$seconds) - min(df$seconds)
}
Since you have a lot of data, performance matters, so consider using a data.table instead of a data.frame:
library(data.table)
dt <- as.data.table(df)
dt[, .(spread = (max(seconds) - min(seconds))), by=.(number)]
number spread
1: 1 7
2: 2 22
3: 3 7
4: 4 111
5: 5 108
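If adding a package is not an option, the grouping functions named in the linked duplicate can do the same in base R. This is a minimal sketch using tapply and aggregate; the helper name spread_fun is made up for illustration.

```r
df <- data.frame(number = c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
                 seconds = c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))

# Difference between the largest and smallest value of a vector.
spread_fun <- function(s) max(s) - min(s)

# tapply applies the function to seconds within each value of number
# and returns a named vector.
spread <- tapply(df$seconds, df$number, spread_fun)

# aggregate gives the same result as a data frame.
spread_df <- aggregate(seconds ~ number, data = df, FUN = spread_fun)
```

The loop in the question returns a single value because max and min are applied to the whole seconds column on every iteration; the functions above apply them within each group instead.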
