I am currently trying to automate a process that seems simple, but I have to repeat the process 800 times. I have 2 datasets, each with 8 columns. One column is streamflow, and the other is a list of thresholds (each row has a different threshold). I want to know the number of days that the streamflow falls below the threshold. So far I've done this using mutate and ifelse.
lf1 <- Daily_average_Q %>%
  mutate(lf1 = ifelse(Q1B < Thresholds$B1[1], '1', '0'))
This gives me what I want for 1 threshold, but I have 100 thresholds that I need to use. I also need to do this over 8 sites, so I can't afford to do this 800 separate times. I just want the row number in Thresholds$B1[row#] to change automatically each time. I've tried looping with "for" but I can't figure out how to mutate and loop at the same time.
Thanks so much for any help!!
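A sketch of one way to avoid the 800 repetitions: the count of days below a threshold is just sum(streamflow < threshold), so sapply() can sweep over all 100 thresholds at once. Toy data stands in here for the real Daily_average_Q and Thresholds:

```r
set.seed(1)
# Toy stand-ins for the real data: one site's daily streamflow and 100 thresholds
Daily_average_Q <- data.frame(Q1B = rnorm(365, mean = 10, sd = 3))
Thresholds <- data.frame(B1 = seq(5, 15, length.out = 100))

# For each threshold, count the days on which streamflow falls below it
counts <- sapply(Thresholds$B1, function(th) sum(Daily_average_Q$Q1B < th))
```

For the 8 sites, the same call can go inside a loop (or mapply()) over the matching streamflow/threshold column pairs.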
I have a dataframe with a dozen columns,
and I want to run the function "ttestBF" on 5 of them.
Each row is a subject,
and within each column I want to compare between 2 conditions (male/female).
What I did looks like this:
ttestBF(df$sheker[df$sex == "f"],
        df$sheker[df$sex == "m"])
five times, once per column.
I have looked through similar questions, but they were quite different.
Is there a way to do this more efficiently?
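One way, assuming the five column names sit in a character vector, is to loop with lapply(); ttestBF() comes from the BayesFactor package, and all names below except sheker are hypothetical placeholders for the real columns:

```r
library(BayesFactor)  # provides ttestBF()

cols <- c("sheker", "var2", "var3", "var4", "var5")  # hypothetical names apart from sheker
results <- lapply(cols, function(cl)
  ttestBF(df[[cl]][df$sex == "f"],
          df[[cl]][df$sex == "m"]))
names(results) <- cols  # results$sheker holds the first test, and so on
```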
I want to find a series of consecutive rows in a dataset where a condition is met the most often.
I have two columns I can use for this: either a column of ones and zeros that alternate based on the presence or absence of the condition, or a column that increments for the duration over which the desirable condition is present. I envision that I will need subset(), filter(), and/or rle() to make this happen, but I am at a loss as to how to get it to work.
In the example, I want to find the 6 sequential rows that maximize the number of times happens occurs.
Given the input:
df <- data.frame(time = 1:10,
                 happens = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1),
                 count = c(1, 2, 0, 0, 1, 2, 3, 0, 1, 2))
I would like the output to be rows 5 through 10, inclusive, using either the happens or count column, since this sequence of rows yields the highest number of happens occurrences over 6 consecutive rows.
library(zoo)
which.max(rollapply(df$happens, 6, sum))
#[1] 5
The fifth window of 6 rows holds the maximum sum of df$happens, so the answer is rows 5:10.
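Putting this together end to end, with the window start index used to extract the actual rows:

```r
library(zoo)  # provides rollapply()

df <- data.frame(time = 1:10,
                 happens = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1),
                 count   = c(1, 2, 0, 0, 1, 2, 3, 0, 1, 2))

# Index of the first 6-row window whose sum of happens is largest
start <- which.max(rollapply(df$happens, 6, sum))
best  <- df[start:(start + 5), ]  # rows 5 through 10
```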
I have a large dataset with more than 10 million records and 20 variables. I need every possible combination of 11 of those 20 variables, and for each combination the frequency should also be displayed.
I have tried count() from the plyr package and the table() function, but neither can produce all possible combinations, since the number of combinations is very high (greater than 2^32) and the result is huge.
Assume the following dataset with 5 variables and 6 observations -
And I want all possible combinations of first three variables where frequencies are greater than 0.
Is there any other function to achieve this? I am just interested in combinations whose frequency is non-zero.
Thanks!
OK, I think I have an idea of what you require. If you want the count of rows by N categories in your table, you can do so with the data.table package. It will give you the count of all combinations that exist in the table. Simply list the required categories in the by argument:
library(data.table)
DT <- data.table(val = rnorm(1e7),
                 cat1 = sample.int(10, 1e7, replace = TRUE),
                 cat2 = sample.int(10, 1e7, replace = TRUE),
                 cat3 = sample.int(10, 1e7, replace = TRUE))
DT_count <- DT[, .N, by = .(cat1, cat2, cat3)]
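Note that only combinations that actually occur in the data appear in the grouped result, so the non-zero-frequency requirement comes for free. For 11 variables it is easier to pass a character vector of column names to by=. A sketch with smaller toy data (cat1/cat2 are placeholders for the real variable names):

```r
library(data.table)
set.seed(42)
DT <- data.table(val  = rnorm(1e5),
                 cat1 = sample.int(10, 1e5, replace = TRUE),
                 cat2 = sample.int(10, 1e5, replace = TRUE))

cols <- c("cat1", "cat2")        # with the real data, list the 11 variable names here
DT_count <- DT[, .N, by = cols]  # by= also accepts a character vector of names
```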
I have a number of dataframes, each with different numbers of rows. I want to break them all into smaller dataframes that have no more than 50 rows each, for example.
So, if I had a dataframe with 107 rows, I want to output the following:
A dataframe containing rows 1-50
A dataframe containing rows 51-100
A dataframe containing rows 101-107
I have read many examples using the split() function, but I have not found any usage of split(), or any other solution, that does not pre-define the number of dataframes to split into, scramble the order of the data, or introduce other problems.
This seems like such a simple task that I am surprised that I have not been able to find a solution.
Try:
split(df, (seq_len(nrow(df)) - 1) %/% 50)
What do the first 50 rows have in common? Integer division (%/%) of the row index minus one by 50 gives 0 for all of them; as you can guess, rows 51-100 give 1, and so on. So (seq_len(nrow(df)) - 1) %/% 50 identifies the group each row should be split into.
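A quick check of the grouping on a 107-row example like the one in the question:

```r
# 107 rows should split into chunks of 50, 50, and 7, in order
df <- data.frame(x = 1:107)
chunks <- split(df, (seq_len(nrow(df)) - 1) %/% 50)
sapply(chunks, nrow)  # chunk sizes: 50, 50, 7
```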
I have a dataframe with 1000 columns. I am trying to loop over 10 columns at a time and use the seqdef() function from the TraMineR package to do sequence alignment across the data in those columns. Hence, I want to apply this function to columns 1-10 in the first go-around, and columns 11-20 in the second go-around, all the way up to 1000.
This is the code I am using.
library(TraMineR)
by(df[, 1:10], seqdef(df))
However, this only loops over the first 10 and then stops. How do I loop it across chunks of 10 columns?
You could do something like this using mapply(), setting the sequence of column endpoints to loop across:
colpicks <- seq(10, 1000, by = 10)
mapply(function(start, stop) seqdef(df[, start:stop]),
       colpicks - 9, colpicks, SIMPLIFY = FALSE)
SIMPLIFY = FALSE keeps the results as a list of sequence objects rather than letting mapply() try to simplify them.
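An equivalent sketch that avoids the index arithmetic: split.default() splits a data frame column-wise, so the 1000 columns can be grouped into consecutive chunks of 10 and seqdef() applied to each. This assumes, as in the question, that df has 1000 columns and every 10-column block forms one valid sequence definition:

```r
library(TraMineR)  # provides seqdef()

# Group columns 1-10, 11-20, ... into a list of 100 ten-column data frames
chunks <- split.default(df, (seq_along(df) - 1) %/% 10)

# Build one sequence object per chunk
seqs <- lapply(chunks, seqdef)
```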