Finding values in elements of a list greater than X - r

I have a list of elements called "find_gaps", below are the first 3 elements of the list:
$`2014-11-01 00:33:18`
1 1 1 1 1 1 1 1 1 118
$`2014-11-01 01:35:58`
1 1 1 1 1 1 1 1 1 116
$`2014-11-01 02:34:28`
1 25 25
I want to find the values greater than or equal to 24 in each element and return them as a data frame, with one column per list element holding the values that meet the threshold. For example, the first element of "find_gaps" would correspond to a column with a single row (value 118). I have tried the code below, but it returns only the position/index of each value greater than or equal to 24 in each list element, not the value itself:
greater_than_24<-lapply(find_gaps,function(x)which(x>=24))

greater_than_24<-unlist(lapply(find_gaps,function(x) length(which(x>=24))))
> as.data.frame(t(greater_than_24))
V1 V2 V3
1 1 1 2
Alternatively, this will pull out the values of 24 or more in each element of the list:
greater_than_24<-lapply(find_gaps,function(x) x[which(x>=24)])
> as.data.frame(t(greater_than_24))
V1 V2 V3
1 118 116 25, 25
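If the goal is a data frame with one column per list element, the columns need equal length. The following is a minimal sketch, assuming it is acceptable to pad shorter columns with NA; the padding step and the names vals, n, padded and out are illustrative and not part of the answer above.

```r
# Sample list reconstructed from the three elements shown in the question
find_gaps <- list(
  `2014-11-01 00:33:18` = c(rep(1, 9), 118),
  `2014-11-01 01:35:58` = c(rep(1, 9), 116),
  `2014-11-01 02:34:28` = c(1, 25, 25)
)

# Keep only the values >= 24 in each element
vals <- lapply(find_gaps, function(x) x[x >= 24])

# Pad every element with NA up to the longest length
# (assumption: NA padding is acceptable in the result)
n <- max(lengths(vals))
padded <- lapply(vals, function(v) { length(v) <- n; v })

# One column per list element; check.names = FALSE keeps the timestamps as names
out <- data.frame(padded, check.names = FALSE)
out
```

Each column then holds the qualifying values for one timestamp, with NA filling the rows that element does not need.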

This question already has an accepted answer, and the OP has described that he expects the output in wide form.
Nevertheless, I would like to propose a different approach which returns the result in long form (including the names of the list elements). I hope the OP finds this alternative representation of the result useful.
library(data.table)
data.table(find_gaps, name = names(find_gaps))[
, .(value = unlist(find_gaps)), by = name][value > 24]
name value
1: 2014-11-01 00:33:18 118
2: 2014-11-01 01:35:58 116
3: 2014-11-01 02:34:28 25
4: 2014-11-01 02:34:28 25
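For readers without data.table, a similar long-form result can probably be obtained in base R with stack(). This is only a sketch under the assumption that find_gaps is a named list of plain numeric vectors, as in the question; the names vals and long are illustrative.

```r
# Sample list reconstructed from the three elements shown in the question
find_gaps <- list(
  `2014-11-01 00:33:18` = c(rep(1, 9), 118),
  `2014-11-01 01:35:58` = c(rep(1, 9), 116),
  `2014-11-01 02:34:28` = c(1, 25, 25)
)

# Filter each element first, then stack into two columns:
# 'values' (the numbers) and 'ind' (the list element name, as a factor)
vals <- lapply(find_gaps, function(x) x[x >= 24])
long <- stack(vals)
long
```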


Reverse cumsum with breaks with non-sequential numbers

Looking to fill a matrix with a reverse cumsum. There are multiple breaks that must be maintained.
I have provided a sample matrix for what I want to accomplish. The first column is the data, the second column is what I want. You will see that column 2 is updated to reflect the number of items that are left. When there are 0's the previous number must be carried through.
update <- matrix(c(rep(0,4),rep(1,2),2,rep(0,2),1,3,
rep(10,4), 9,8,6, rep(6,2), 5, 2),ncol=2)
I have tried multiple ways to create a sequence and to loop, using several packages (e.g. zoo). The difficulty is that the values in column 1 can be any of 0, 1, ..., X, but are always less than the value in column 2.
Any help or tips would be appreciated
EDIT: Column 2 starts with a given value, which can be any starting value (e.g. inventory at the beginning of a month). Column 1 then represents "purchases" made; thus, column 2 should reflect the total number of remaining items available.
The following will report the purchase and inventory balance as described:
starting_inventory <- 100
df <- data.frame(purchases=c(rep(0,4),rep(1,2),2,rep(0,2),1,3))
df$cum_purchases <- cumsum(df$purchases)
df$remaining_inventory <- starting_inventory - df$cum_purchases
Result:
purchases cum_purchases remaining_inventory
1 0 0 100
2 0 0 100
3 0 0 100
4 0 0 100
5 1 1 99
6 1 2 98
7 2 4 96
8 0 4 96
9 0 4 96
10 1 5 95
11 3 8 92
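As a check, the same idea can be applied to the sample matrix from the question. This is a minimal sketch; the name starting_value and the way it is derived from the first row are assumptions made for illustration.

```r
# Sample matrix from the question: column 1 = purchases, column 2 = desired result
update <- matrix(c(rep(0,4),rep(1,2),2,rep(0,2),1,3,
                   rep(10,4), 9,8,6, rep(6,2), 5, 2), ncol=2)

# Assumption: the inventory before the first purchase is the first balance
# plus the first purchase
starting_value <- update[1, 2] + update[1, 1]

# Remaining items = starting value minus the running total of purchases
remaining <- starting_value - cumsum(update[, 1])

# Compare against the column the asker wanted
all.equal(remaining, update[, 2])
```

Rows with a purchase of 0 add nothing to the running total, so the previous balance carries through without any special handling.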

Deleting column with the least sum in dataframes dynamically in R

In a data frame, I am trying to delete the column whose sum is the least. I want it to be dynamic since I want to use it in a function
E.g
a b c
1 434 0 45
2 5452 1 456
3 42342 0 26
4 542 1 15
5 542 1 323
6 413 0 45
I want to remove the 2nd column (column b) since its sum is the least, but this must be done dynamically because it will be part of a function.
We can use colSums with which.min to get the index of the column with the smallest sum and remove that column.
df1[-which.min(colSums(df1))]
Another option is Filter, which drops every column whose sum equals the minimum:
mn <- min(sapply(df1, sum))
Filter(function(x) sum(x) != mn, df1)
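Since the asker wants to use this inside a function, here is a minimal sketch that wraps the colSums approach. The function name drop_min_sum_col is hypothetical, and the sketch assumes every column is numeric.

```r
# Sample data from the question
df1 <- data.frame(a = c(434, 5452, 42342, 542, 542, 413),
                  b = c(0, 1, 0, 1, 1, 0),
                  c = c(45, 456, 26, 15, 323, 45))

# Hypothetical helper: drop the column with the smallest sum.
# Assumes all columns are numeric; which.min() picks the first column on ties.
drop_min_sum_col <- function(df) {
  df[-which.min(colSums(df))]
}

result <- drop_min_sum_col(df1)
names(result)
```

Note the difference on ties: which.min removes only the first minimal column, while the Filter version removes all of them.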

Take difference of two columns in R resulting in a new third column

So far I have a data frame that looks like this:
Account Total Mastered Not_Mastered
1 1 NA NA
2 12 2 10
3 4 NA NA
4 51 50 1
The code I have is:
Table$Not_Mastered = (Table$Total - Table$Mastered)
My goal is to subtract the 'Mastered' column from the 'Total' column to produce a third column, 'Not_Mastered'. If there is no value in the 'Mastered' column, I want the new column to have the same value as the 'Total' column, as shown below.
Account Total Mastered Not_Mastered
1 1 NA 1
2 12 2 10
3 4 NA 4
4 51 50 1
How can I skip over the NA values in the mastered column and rewrite the values from the total column?
We can use replace to change the NA values to 0 and then do the difference
with(df1, Total - replace(Mastered, is.na(Mastered), 0))
#[1] 1 10 4 1
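To store the result as the new column, the same expression can be assigned back to the data frame. This is a minimal sketch; df1 is rebuilt here from the table in the question.

```r
# Data frame rebuilt from the table in the question
df1 <- data.frame(Account = 1:4,
                  Total = c(1, 12, 4, 51),
                  Mastered = c(NA, 2, NA, 50))

# Treat a missing Mastered value as 0, then subtract
df1$Not_Mastered <- with(df1, Total - replace(Mastered, is.na(Mastered), 0))
df1
```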
Depending on what kind of software you are using, you should be able to catch those with a simple loop and an if test. In R, that looks like this:
for (i in seq_len(nrow(Table))) { # look at each row, one at a time
if (is.na(Table$Mastered[i])) { # NA must be tested with is.na(), not ==
Table$Not_Mastered[i] <- Table$Total[i]
} else {
Table$Not_Mastered[i] <- Table$Total[i] - Table$Mastered[i]
}
}

Printing only certain panels in R lattice

I am plotting a quantile-quantile plot for a certain data that I have. I would like to print only certain panels that satisfy a condition that I put in for panel.qq(x,y,...).
Let me give you an example. The following is my code,
qq(y ~ x|cond,data=test.df,panel=function(x,y,subscripts,...){
if(length(unique(test.df[subscripts,2])) > 3 ){panel.qq(x,y,subscripts,...)}})
Here y is the factor and x is the variable that will be plotted on the x and y axes; cond is the conditioning variable. I would like only those panels to be printed that pass the condition in the panel function, which is
if(length(unique(test.df[subscripts,2])) > 3).
I hope this information helps. Thanks in advance.
Added Sample data,
y x cond
1 1 6 125
2 2 5 125
3 1 5 125
4 2 6 125
5 1 3 125
6 2 8 125
7 1 8 125
8 2 3 125
9 1 5 125
10 2 6 125
11 1 5 124
12 2 6 124
13 1 6 124
14 2 5 124
15 1 5 124
16 2 6 124
17 1 4 124
18 2 7 124
19 1 0 123
20 2 11 123
21 1 0 123
22 2 11 123
23 1 0 123
24 2 11 123
25 1 0 123
26 2 11 123
27 1 0 123
28 2 2 123
This is the sample data. I would like no panel for 123, since it has only 3 unique x values, while the others have 4. Thanks again.
Yeah, I think it is a subset problem, not a lattice one. You don't include an example, but it looks like you want to keep only rows where there are more than 3 rows for each value of whatever is in column 2 of your data frame. If so, here is a data.table solution.
library(data.table)
test.dt <- as.data.table(test.df)
test.dt.subset <- test.dt[,N:=.N,by=c2][N>3]
Where c2 is that variable in the second column. The last line of code first adds a variable, N, for the count of rows (.N) for each value of c2, then subsets for N>3.
UPDATE: And since a data table is also a data frame, you can use test.dt.subset directly as the data source in the call to qq (or other lattice function).
UPDATE 2: Here is one way to do the same thing without data.table:
d <- data.frame(x=1:15,y=1:15%%2, # example data frame
c2=c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
d$N <- 1 # create a column for count
split(d$N,d$c2) <- lapply(split(d$x,d$c2),length) # populate with count
d
d[d$N>3,] # subset
I did something very similar to DaveTurek.
My sample dataframe above is test.df
test.df.list <- split(test.df,test.df$cond,drop=FALSE)
final.test.df <- do.call("rbind",lapply(test.df.list,function(r){
if(length(unique(r$x)) > 3){r}}))
Here I split test.df into a list of data frames by the conditioning variable. Inside lapply, I check the number of unique x values in each subset; if it is greater than 3 the data frame is returned, otherwise it is dropped (the function returns NULL, which rbind ignores). Finally, do.call binds the remaining data frames back into one data frame to run the quantile-quantile plot on.
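The same filter can likely be written without splitting and re-binding, using ave() to compute the per-group count of unique values. This is a sketch of an alternative, not the method used above; the names n_unique and kept are illustrative, and test.df is rebuilt from the sample data in the question.

```r
# Sample data rebuilt from the question
test.df <- data.frame(
  y = rep(1:2, 14),
  x = c(6, 5, 5, 6, 3, 8, 8, 3, 5, 6,       # cond 125
        5, 6, 6, 5, 5, 6, 4, 7,             # cond 124
        0, 11, 0, 11, 0, 11, 0, 11, 0, 2),  # cond 123
  cond = rep(c(125, 124, 123), times = c(10, 8, 10))
)

# For every row, the number of distinct x values within its cond group
n_unique <- ave(test.df$x, test.df$cond,
                FUN = function(v) length(unique(v)))

# Keep only the groups with more than 3 distinct x values
kept <- test.df[n_unique > 3, ]
unique(kept$cond)
```

Unlike the row-count filter in the earlier answer, this tests the number of distinct x values, which is the condition stated in the question.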
In case anyone wants the qq call used on the filtered data, it is:
trellis.device(postscript,file="test.ps",color=F,horizontal=T,paper='legal')
qq(y ~ x|cond,data=final.test.df,layout=c(1,1),pch=".",cex=3)
dev.off()
Hope this helps.

how to select matrix element in R?

Reading the data the following way
data<-read.csv("userStats.csv", sep=",", header=F)
I tried to select an element at the specific position.
The example of the data (first five rows) is the following (V2 is the date and V3 is the day of week):
V1 V2
1 00002781A2ADA816CDB0D138146BD63323CCDAB2 2010-09-04
2 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-04
3 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-07
4 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-08
5 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-17
V3 V4 V5 V6 V7 V8 V9
1 Saturday 2 2 615 1 1 47
2 Saturday 2 2 77 1 1 43
3 Tuesday 1 3 201 1 1 117
4 Wednesday 1 1 44 1 1 74
5 Friday 1 1 3 1 1 18
I tried to divide the 6th column by the 9th column in the first row as follows:
data[1,6]/data[1,9]
but it returned NA with a warning
[1] NA
Warning message:
In Ops.factor(data[1, 6], data[1, 9]) : / not meaningful for factors
Then I tried to select just one element
> data[2,9]
[1] 43
11685 Levels: 0 1 2 3 ... 55311
but I don't know what these Levels are or what causes the error. Does anyone know how to select an element at a specific position, data[row, column]?
Thank you!
My favorite tool to check variable class is str().
What you have there is a data frame and at least one of the columns you're trying to work with is a factor. See Dirk's answer on how to change classes of a column.
Command
data[1,6]/data[1,9]
selects the value in the first row of the sixth column and divides it by the value in the first row of the ninth column. Is this what you want? If you want to use values from the entire column (and not just the first row), you would write
data[6] / data[9]
or
data[, 6] / data[, 9]
Both forms select the same columns of a data.frame; the first returns one-column data frames and the second returns vectors.
The standard modeling data structure in R is a data.frame.
The data.frame objects can hold various types: numeric, character, factor, ...
Now, when reading data via read.csv() et al, you can get bitten by the default value of the stringsAsFactors option. I presume that at least one row in your data had text, so R decided to read the column as a factor and presto! you can no longer do direct mathematical operations on the column.
In short, do summary(data) and/or a sweep of class() over all the columns. Convert as necessary, or set the stringsAsFactors argument to a different value, or both.
Once your data is numeric, you can divide, slice, dice, ... as you please.
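A minimal sketch of the check-and-convert step follows. The small data frame is made up for illustration, reusing a few values from the question; it stands in for the columns that were read as factors.

```r
# Illustrative stand-in for two columns that read.csv() turned into factors
data <- data.frame(V6 = factor(c("615", "77", "201")),
                   V9 = factor(c("47", "43", "117")))

# Inspect the column classes
sapply(data, class)

# as.numeric() on a factor returns the internal level codes, not the printed
# values, so convert through character first
data$V6 <- as.numeric(as.character(data$V6))
data$V9 <- as.numeric(as.character(data$V9))

# Division now works element by element
ratio <- data$V6[1] / data$V9[1]
ratio
```

If a column really does contain text in some rows, those entries become NA (with a warning) after conversion, which is a quick way to locate them.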
