I started with a daily time series of wind speeds. I wanted to examine how the mean and maximum number of consecutive days under a certain threshold change between two periods of time. This is how far I've come: I subsetted the data to the rows with values beneath the threshold and identified consecutive days.
I now have a data frame that looks like this:
dates       consecutive_days
1970-03-25  NA
1970-04-09  TRUE
1970-04-10  TRUE
1970-04-11  TRUE
1970-04-12  TRUE
1970-04-15  FALSE
1970-05-08  TRUE
1970-05-09  TRUE
1970-05-13  FALSE
What I want to do next is find the maximum and mean length of the runs of consecutive TRUE values (which in this case would be: maximum = 4; mean = 3).
Here is one method using rle:
# construct sample data.frame:
set.seed(1234)
df <- data.frame(days = 1:12, consec = sample(c(TRUE, FALSE), 12, replace = TRUE))
# get rle object
consec <- rle(df$consec)
# max consecutive values
max(consec$lengths[consec$values == TRUE])
# mean consecutive values
mean(consec$lengths[consec$values == TRUE])
Quoting from ?rle, rle
Compute[s] the lengths and values of runs of equal values in a vector
We save the result and then subset to the runs of consecutive TRUE observations to calculate the max and mean.
You could easily combine this into a function, or simply concatenate the results above:
myResults <- c("max"  = max(consec$lengths[consec$values == TRUE]),
               "mean" = mean(consec$lengths[consec$values == TRUE]))
Related
I have a dataframe x
and I need to calculate the number of steps from the 1st column by days or by certain 5-min intervals.
This code for dates works fine
b<-summarise(group_by(x,date),h = sum(steps))
But when I change date to interval,
b<-summarise(group_by(x,interval),h = sum(steps))
it returns only NA values
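One likely cause (an assumption, since the data isn't shown here) is that steps contains NA values; sum() then returns NA for every group that has one, and na.rm = TRUE is the usual fix:

```r
library(dplyr)

# Toy data standing in for x (made-up values):
x <- data.frame(date     = rep(c("2012-10-01", "2012-10-02"), each = 2),
                interval = rep(c(0, 5), times = 2),
                steps    = c(NA, NA, 10, 20))  # first day is all NA

# sum() propagates NA, so drop the NAs before summing:
b <- summarise(group_by(x, interval), h = sum(steps, na.rm = TRUE))
```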
I am brand new to R and I am trying to calculate the proportion of the number of 'i' for each timepoint and then average them. I do not know the command for this, but I have the script to find the total number of 'i' in the timepoints.
C1imask <- C16.3[, 2:8] == 'i' & !is.na(C16.3[, 2:8])
C16.3[, 2:8][C1imask]
C1inactive <- C16.3[, 2:8][C1imask]
length(C1inactive)

C1bcmask <- C16.3[, 8] == 'bc' & !is.na(C16.3[, 8])
C16.3[, 8][C1bcmask]
C1broodcare <- C16.3[, 8][C1bcmask]
length(C1broodcare)

C1amask <- C16.3[, 12] == 'bc' & !is.na(C16.3[, 12])
C16.3[, 12][C1amask]
C1after <- C16.3[, 12][C1amask]
length(C1after)

C1 <- length(C1after) - length(C1broodcare)
C1
I'd try taking the mean of a logical vector created with the test, using na.rm as an argument to mean. You will get the proportion of non-NA values that meet the test, rather than the proportion computed with the number of rows as the denominator.
test <- sample( c(0,1,NA), 100, replace=TRUE)
mean( test==0, na.rm=TRUE)
#[1] 0.5072464
If you needed a proportion of total number of rows you would use sum and divide by nrow(dframe_name). You can then use sapply or lapply to iterate across a group of columns.
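Along those lines, a sketch with made-up data in place of C16.3:

```r
# Toy data frame standing in for the timepoint columns:
d <- data.frame(t1 = c("i", "i", "bc", NA),
                t2 = c("i", "bc", "bc", "i"))

# Proportion of non-NA entries equal to "i" in each column:
props <- sapply(d, function(col) mean(col == "i", na.rm = TRUE))
props
# t1: 2 of 3 non-NA values; t2: 2 of 4

# Average proportion across timepoints:
mean(props)
```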
I need to calculate how many times the first column is greater than or equal to the second column of the matrix using R.
I have done the following:
set.seed(123)
x = matrix(rnorm(4*4,mean=10,sd=2),nrow=4)
x
x[,1]>x[,2]
But I can't figure out how to count the number of times that column 1 is greater than column 2. I have used the function length but it didn't work out.
thank you!
Logical values can be converted to numbers, with TRUE as 1 and FALSE as 0, therefore:
sum(x[,1] >= x[,2])
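With mean() instead of sum() you get the proportion directly; using >= to match the "greater than or equal to" requirement:

```r
set.seed(123)
x <- matrix(rnorm(4 * 4, mean = 10, sd = 2), nrow = 4)

sum(x[, 1] >= x[, 2])   # count of rows where column 1 >= column 2
mean(x[, 1] >= x[, 2])  # the same count as a proportion of rows
```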
I'm trying to extract the minor allele counts in a set of three columns. The counts are just the number of times each allele is seen in each row. I need to extract the lowest number without reporting 0: some rows have a 0 in one of the columns, which is not wanted in the final minor count. When counts are equal, the minor count should be the shared value.
I have tried having multiple lines of if (true) statements but this is cumbersome and does not solve the issues fully because of the combination of different scenarios.
set.seed(100)
df <- data.frame(nAA = sample(0:100, 50),
                 nAa = sample(0:100, 50),
                 naa = sample(0:100, 50))
# Expected minor count output for the first rows:
# row 1: 31
# row 2: 19
# row 3: 4
I expect a fourth column with the minor count for each row.
You can use apply and select, with x[x > 0], the minimum of the counts larger than 0; with which you get the column it is in:
apply(df, 1, function(x) min(x[x > 0]))             # gives you the minimum
apply(df, 1, function(x) which(x == min(x[x > 0]))) # gives you the column of the minimum
You can do it with this code. Here the function pmin gives you the parallel min of a set of vectors (in this case, the three variables in your data frame).
library(dplyr)
mutate(df, min = pmin(nAA, nAa, naa))
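Note that pmin alone does not skip zeros. Combining both ideas, the row-wise minimum over strictly positive counts can be stored as the fourth column (minor is a hypothetical column name):

```r
set.seed(100)
df <- data.frame(nAA = sample(0:100, 50),
                 nAa = sample(0:100, 50),
                 naa = sample(0:100, 50))

# Row-wise minimum over the counts larger than 0:
df$minor <- apply(df[, c("nAA", "nAa", "naa")], 1,
                  function(x) min(x[x > 0]))
```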
I am a new R user.
I have a dataframe consisting of 50 columns and 300 rows. The first column holds the ID, while the 2nd through last columns are standard deviations (sd) of traits. The pooled sd for each column is in the last row. For each column, I want to remove all values more than ten times the pooled sd, in one run. So far, the script below is what I have come up with for checking whether a value is greater than the pooled sd. However, even the ID column (character) is being processed (resulting in all FALSE). If I use raw_sd_summary[-1], I have no way of knowing which ID on which trait meets the criterion I'm looking for.
logic_sd <- lapply(raw_sd_summary, function(x) x>tail(x,1) )
logic_sd_df <- as.data.frame(logic_sd)
What shall I do? And how can I extract all values labeled TRUE, i.e. more than ten times the pooled sd, along with their corresponding IDs?
I think your code won't work as intended: lapply runs over every column of the data frame, including the ID column, and your test is missing the factor of ten. Change it to
logic_sd <- apply(raw_sd_summary, 2, function(x) x>10*tail(x,1) )
This gives you a logical matrix flagging values more than ten times the last row. You can recover the IDs by replacing the first column:
logic_sd[,1] <- raw_sd_summary[,1]
You could remove/replace the unwanted values in the original table directly by
raw_sd_summary[-300,-1][logic_sd[-300,-1]]<-NA # or new value
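A self-contained sketch of the whole flow, with a small toy table standing in for raw_sd_summary (names and values are made up):

```r
set.seed(1)
# Five observation rows plus a final pooled-sd row; IDs in column 1:
raw_sd_summary <- data.frame(ID = c(paste0("id", 1:5), "pooled"),
                             t1 = c(runif(5, 0, 5), 0.2),
                             t2 = c(runif(5, 0, 5), 0.3))

# Flag values (ID column excluded) above ten times the pooled sd:
logic_sd <- apply(raw_sd_summary[, -1], 2, function(x) x > 10 * tail(x, 1))

# Blank out the flagged values, leaving the pooled-sd row intact:
n <- nrow(raw_sd_summary)
raw_sd_summary[-n, -1][logic_sd[-n, ]] <- NA
```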