Find which rows in a data frame have negative values? - r

I have a data frame called y and I want to know exactly which rows have at least one negative number so I can select those rows of y and inspect them myself. I tried a few different ways but they gave me strange results so I ended up writing the following for loop:
temp <- NULL
# Check for negative values
for(i in 1:nrow(y)) {
for(j in 1:length(y)) {
if(y[i,j] < 0) {
temp[i] <- i
}
}
}
na.omit(temp)
This gave me what I wanted, but I'm pretty sure there's an easier way to code that, maybe using apply or lapply or something else. Can anyone help me find more efficient code that gives me the row numbers of the rows that have at least one negative value? To be clear, I want the row numbers only.

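You can try which, as below. With arr.ind = TRUE it returns a matrix of (row, col) positions, one per negative cell, so the row column holds the row numbers you are after.
which(y<0,arr.ind = TRUE)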
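Because which(..., arr.ind = TRUE) reports one row/column pair per negative cell, a row with several negatives appears more than once. A minimal sketch of reducing that to row numbers only, assuming a small made-up data frame in place of the asker's y (the values are illustrative, not from the question):

```r
# Hypothetical stand-in for the asker's data frame `y`
y <- data.frame(a = c(1, -2, 3, 4),
                b = c(5, 6, -7, 8),
                c = c(9, -1, 2, 3))

# One (row, col) pair per negative cell
hits <- which(y < 0, arr.ind = TRUE)

# Row numbers only, each listed once
neg_rows <- sort(unique(hits[, "row"]))
neg_rows

# Equivalent without the intermediate matrix
neg_rows2 <- unname(which(rowSums(y < 0) > 0))
neg_rows2
```

y[neg_rows, ] then selects those rows for inspection. If y may contain NA, rowSums(y < 0, na.rm = TRUE) is one way to ignore them.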

You can flag the rows with base R's apply(), or with dplyr using rowwise() and rowSums().
Creating a toy data frame:
df<-data.frame(a=c(2,6,-1,8), b=c(2,7,5,-1), c=c(1,-1,2,8), d=c(4,3,8,5))
> df
   a  b  c d
1  2  2  1 4
2  6  7 -1 3
3 -1  5  2 8
4  8 -1  8 5
Solution with base R using apply() with MARGIN=1:
df$has_negatives<-apply(df, 1, function(x) any(x<0))
df
   a  b  c d has_negatives
1  2  2  1 4         FALSE
2  6  7 -1 3          TRUE
3 -1  5  2 8          TRUE
4  8 -1  8 5          TRUE
Solution with dplyr (here has_negatives counts the negative values in each row):
library(dplyr)
df%>%rowwise()%>%mutate(has_negatives=rowSums(across(everything(), ~(.<0))))
# A tibble: 4 x 5
# Rowwise:
      a     b     c     d has_negatives
  <dbl> <dbl> <dbl> <dbl>         <dbl>
1     2     2     1     4             0
2     6     7    -1     3             1
3    -1     5     2     8             1
4     8    -1     8     5             1

Related

Order/Sort/Rank a table

I have a table like this
table(mtcars$gear, mtcars$cyl)
I want to order the rows so that those with the most observations in the 4-cylinder column come first, e.g.:
     4  6  8
  4  8  4  0
  5  2  1  2
  3  1  2 12
I have been playing with order/sort/rank without much success. How can I order a table's output?
We can convert the table to a data.frame and then order by the column.
sort_col <- "4"
tab <- as.data.frame.matrix(table(mtcars$gear, mtcars$cyl))
tab[order(-tab[[sort_col]]), ]
# OR tab[order(tab[[sort_col]], decreasing = TRUE), ]
# 4 6 8
#4 8 4 0
#5 2 1 2
#3 1 2 12
If we don't want to convert it into data frame and want to maintain the table structure we can do
tab <- table(mtcars$gear, mtcars$cyl)
tab[order(-tab[,dimnames(tab)[[2]] == sort_col]),]
# 4 6 8
# 4 8 4 0
# 5 2 1 2
# 3 1 2 12
You could try this: sort the relevant column with decreasing=TRUE, take the names of the sorted rows, and subset using those.
table(mtcars$gear, mtcars$cyl)[names(sort(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE)), ]
4 6 8
4 8 4 0
5 2 1 2
3 1 2 12
In the same vein as the previous answer, but using the order() function instead of looking up the names() of a sort()-ed vector.
The [,1] selects the first column as the ordering key.
table(mtcars$gear, mtcars$cyl)[order(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE),]
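The answers above all come down to ordering the rows by one column of the table. A minimal sketch that builds the table once and picks the column by name rather than position (the names tab, ord and sorted_tab are illustrative):

```r
# Build the gear x cyl contingency table once
tab <- table(mtcars$gear, mtcars$cyl)

# Row order by descending count in the "4" cylinder column
ord <- order(tab[, "4"], decreasing = TRUE)

# Reorder the rows; columns are left as they are
sorted_tab <- tab[ord, ]
sorted_tab
```

Selecting the column as "4" keeps the code correct even if the column order of the table changes.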

for loop & if function in R

I was writing a loop with if function in R. The table is like below:
ID category
1 a
1 b
1 c
2 a
2 b
3 a
3 b
4 a
5 a
I want to use a for loop with an if statement to add another column that counts the rows within each ID group, like the Count column below:
ID category Count
1 a 1
1 b 2
1 c 3
2 a 1
2 b 2
3 a 1
3 b 2
4 a 1
5 a 1
My code is (output1 is the table name):
for (i in 2:nrow(output1)){
if(output1[i,1] == output[i-1,1]){
output1[i,"rn"]<- output1[i-1,"rn"]+1
}
else{
output1[i,"rn"]<-1
}
}
But in the result, every value in the count column is "1":
ID category Count
1 a 1
1 b 1
1 c 1
2 a 1
2 b 1
3 a 1
3 b 1
4 a 1
5 a 1
Please help me out... Thanks
There are packages and vectorized ways to do this task, but if you are practicing with loops try:
output1$rn <- 1
for (i in 2:nrow(output1)){
if(output1[i,1] == output1[i-1,1]){
output1[i,"rn"]<- output1[i-1,"rn"]+1
}
else{
output1[i,"rn"]<-1
}
}
With your original code, when you made this call output1[i-1,"rn"]+1 in the third line of your loop, you were referencing a column that didn't exist on the first pass. By first creating the column and filling it with the value 1, you give the loop something explicit to refer to. (The original also compared against output[i-1,1] rather than output1[i-1,1].)
output1
# ID category rn
# 1 1 a 1
# 2 1 b 2
# 3 1 c 3
# 4 2 a 1
# 5 2 b 2
# 6 3 a 1
# 7 3 b 2
# 8 4 a 1
# 9 5 a 1
With the package dplyr you can accomplish it quickly with:
library(dplyr)
output1 %>% group_by(ID) %>% mutate(rn = 1:n())
Or with data.table:
library(data.table)
setDT(output1)[,rn := 1:.N, by=ID]
With base R you can also use:
output1$rn <- with(output1, ave(seq_along(ID), ID, FUN=seq_along))
There are vignettes and tutorials for the two packages mentioned; for the last approach, see ?ave in the R console.
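As a sketch of the base R route on the question's example data, ave() can apply seq_along() within each ID so that the counter comes back as integers. The data frame is rebuilt here from the table in the question; sequence(rle(...)) is shown as an alternative that assumes rows with the same ID are adjacent:

```r
# Example data rebuilt from the table in the question
output1 <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 4, 5),
  category = c("a", "b", "c", "a", "b", "a", "b", "a", "a")
)

# Running counter within each ID group
output1$rn <- ave(seq_along(output1$ID), output1$ID, FUN = seq_along)

# Alternative: count along runs of identical consecutive IDs
rn_runs <- sequence(rle(output1$ID)$lengths)

output1
```

The ave() version does not require the data to be sorted by ID, whereas the rle() version restarts the counter whenever the ID changes from one row to the next.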
A looping solution will be painfully slow for bigger data. Here is a one-line solution using data.table:
require(data.table)
a<-data.table(ID=c(1,1,1,2,2,3,3,4,5),category=c('a','b','c','a','b','a','b','a','a'))
a[,':='(category_count = 1:.N),by=.(ID)]
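What you want is actually the factor level of the category column. Note that this matches the desired count only because, in this data, each ID's categories run a, b, c in order; it returns the factor codes of category, not a per-ID counter. Do this:
df$count=as.numeric(df$category)
This will give output as: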
ID category count
1 1 a 1
2 1 b 2
3 1 c 3
4 2 a 1
5 2 b 2
6 3 a 1
7 3 b 2
8 4 a 1
9 5 a 1
provided your category is already a factor. If not, first convert it to a factor:
df$category=as.factor(df$category)
df$count=as.numeric(df$category)
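The factor-code approach coincides with a per-ID counter only when every ID's categories appear in level order starting from the first level. A small sketch with made-up data (not from the question) where the two results differ:

```r
# Hypothetical data: ID 1 starts at "b", ID 2 skips "b"
df <- data.frame(ID = c(1, 1, 2, 2),
                 category = c("b", "c", "a", "c"))

# Factor codes of category (levels a, b, c map to 1, 2, 3)
codes <- as.numeric(as.factor(df$category))
codes

# Running counter within each ID group
counter <- ave(seq_along(df$ID), df$ID, FUN = seq_along)
counter
```

Here codes is 2 3 1 3 while counter is 1 2 1 2, so a grouped counter is the safer choice when the categories are not guaranteed to line up.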

Divide each row by a different number

I've looked on the internet but I haven't found the answer I'm looking for, though I'm sure it's out there...
I have a data frame, and I want to divide (or apply any other operation to) every cell of a row by the value in the second column of that row.
So for the first row, each cell from col3 to the last column is divided by the value of col2 in that row, and so on for every row.
I have solved this with a for loop: col2 (delta) is now a vector, and col3 to the end is a data.frame (mu). The results are appended to a new data frame using rbind.
The question is: I'm pretty sure this can be done with apply, sapply or similar, but so far I haven't gotten the correct results that I get with the for loop. How can I do it without using a for loop?
In summary, I want to divide each mu by the delta value of its own row. This is the for loop I've been using so far:
for (i in 1:(dim(mu)[1])){
RA_row <- mu[i,]/delta[i]
RA <- rbind(RA, RA_row)
}
transcript delta mu_5 mu_15 mu_25 mu_35 mu_45 mu_55 mu_65
1 YAL001C 0.066702720 2.201787e-01 1.175731e-01 2.372506e-01 0.139281317 0.081723456 1.835414e-01 1.678318e-01
2 YAL002W 0.106000180 3.685822e-01 1.326865e-01 2.887973e-01 0.158207858 0.193476082 1.867039e-01 1.776946e-01
3 YAL003W 0.022119345 2.271518e+00 2.390637e+00 1.651997e+00 3.802739732 2.733559839 2.772454e+00 3.571712e+00
Thanks
It appears as though you want just:
mu2 <- mu[-(1:2)]/mu[[2]]
# same as mu[, -(1:2)]/mu[['delta']]
That should produce a new dataframe with the division by row. Somewhat more dangerous would be to do the division "in place".
mu[-(1:2)] <- mu[-(1:2)]/mu[[2]]
> mu <- data.frame(a=1,b=1:10, c=rnorm(10), d=rnorm(10) )
> mu
a b c d
1 1 1 -1.91435943 0.45018710
2 1 2 1.17658331 -0.01855983
3 1 3 -1.66497244 -0.31806837
4 1 4 -0.46353040 -0.92936215
5 1 5 -1.11592011 -1.48746031
6 1 6 -0.75081900 -1.07519230
7 1 7 2.08716655 1.00002880
8 1 8 0.01739562 -0.62126669
9 1 9 -1.28630053 -1.38442685
10 1 10 -1.64060553 1.86929062
> (mu2 <- mu[-(1:2)]/mu[[2]])
c d
1 -1.914359426 0.450187101
2 0.588291656 -0.009279916
3 -0.554990812 -0.106022792
4 -0.115882600 -0.232340537
5 -0.223184021 -0.297492062
6 -0.125136500 -0.179198716
7 0.298166649 0.142861258
8 0.002174452 -0.077658337
9 -0.142922281 -0.153825205
10 -0.164060553 0.186929062
> (mu[-(1:2)] <- mu[-(1:2)]/mu[[2]] )
> mu
a b c d
1 1 1 -1.914359426 0.450187101
2 1 2 0.588291656 -0.009279916
3 1 3 -0.554990812 -0.106022792
4 1 4 -0.115882600 -0.232340537
5 1 5 -0.223184021 -0.297492062
6 1 6 -0.125136500 -0.179198716
7 1 7 0.298166649 0.142861258
8 1 8 0.002174452 -0.077658337
9 1 9 -0.142922281 -0.153825205
10 1 10 -0.164060553 0.186929062
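The division works because R recycles the divisor vector down each column, so element i of the vector lines up with row i. A minimal sketch on made-up numbers (illustrative values chosen so the results are easy to check by hand, with column names modelled on the question), with sweep() shown as a more explicit equivalent:

```r
# Hypothetical data in the same layout as the question:
# an identifier, the divisor column, then the value columns
mu <- data.frame(transcript = c("t1", "t2", "t3"),
                 delta = c(2, 4, 10),
                 mu_5  = c(4, 8, 20),
                 mu_15 = c(6, 12, 50))

# Divide every value column by the delta of its own row
ra <- mu[-(1:2)] / mu[["delta"]]
ra

# Same result, stating the margin (1 = rows) explicitly
ra_sweep <- sweep(mu[-(1:2)], 1, mu[["delta"]], "/")
ra_sweep
```

Replacing "/" in sweep() with another operator such as "-" or "*" covers the "any other operation" case from the question.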

Select max or equal value from several columns in a data frame

I'm trying to select the column with the highest value for each row in a data.frame. So for instance, the data is set up as such.
> df <- data.frame(one = c(0:6), two = c(6:0))
> df
one two
1 0 6
2 1 5
3 2 4
4 3 3
5 4 2
6 5 1
7 6 0
Then I'd like to set another column based on those rows. The data frame would look like this.
> df
one two rank
1 0 6 2
2 1 5 2
3 2 4 2
4 3 3 3
5 4 2 1
6 5 1 1
7 6 0 1
I imagine there is some sort of way that I can use plyr or sapply here but it's eluding me at the moment.
There might be a more efficient solution, but
ranks <- apply(df, 1, which.max)
ranks[which(df[, 1] == df[, 2])] <- 3
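A vectorised variant of the same idea, sketched with base R's max.col() in place of apply() and which.max(). This swaps in a different function from the one in the answer; the tie code 3 follows the question's desired output:

```r
# Data from the question
df <- data.frame(one = 0:6, two = 6:0)

# Index of the column holding the row maximum (first column wins ties)
ranks <- max.col(df, ties.method = "first")

# Mark rows where both columns are equal with 3, as in the question
ranks[df$one == df$two] <- 3

df$rank <- ranks
df
```

Setting ties.method explicitly matters here because the default for max.col() breaks ties at random.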

Calculating the occurrences of numbers in the subsets of a data.frame

I have a data frame in R which is similar to the following. My real 'df' data frame is much bigger than this one, but I do not want to confuse anybody, so I have tried to simplify things as much as possible.
So here's the data frame.
id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df
Basically what I would like to do is to get the occurrences of numbers for each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see my column 'id').
So, for column 'a' and for id number '1' (for the latter see column 'id') the code would be something like this:
as.numeric(table(df[1:10,2]))
##The results are:
[1] 3 7
Just to briefly explain my results: in column 'a' (regarding only those records which have number '1' in column 'id') number '1' occurred 3 times and number '3' occurred 7 times.
Again, just to show you another example, for column 'a' and for id number '2' (for the latter grouping see again column 'id'):
as.numeric(table(df[11:20,2]))
##After running the codes the results are:
[1] 4 3 3
Let me explain a little again: in column 'a' (regarding only those observations which have number '2' in column 'id') number '1' occurred 4 times, number '2' occurred 3 times and number '3' occurred 3 times.
So this is what I would like to do: calculate the occurrences of numbers for each custom-defined subset (and then collect these values into a data frame). I know it is not a difficult task, but the PROBLEM is that I'm going to have to change the input 'df' data frame on a regular basis, and hence both the overall number of rows and columns might change over time...
What I have done so far is separate the 'df' data frame by columns, like this:
for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])
So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But I'm really stuck now and I don't know how to move forward...
Is there a proper, "automatic" way to solve this problem?
How about -
> library(reshape)
> dftab <- table(melt(df,'id'))
> dftab
, , value = 1
variable
id a b c d e
1 3 8 2 2 4
2 4 6 3 2 4
3 4 2 1 5 1
, , value = 2
variable
id a b c d e
1 0 1 4 3 3
2 3 3 3 6 2
3 1 4 5 3 4
, , value = 3
variable
id a b c d e
1 7 1 4 5 3
2 3 1 4 2 4
3 5 4 4 2 5
So to get the number of '3's in column 'a' and group '1'
you could just do (the dimensions are id, variable, value)
> dftab[1,'a',3]
[1] 7
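The same kind of three-way table can also be built without the reshape package. A minimal base R sketch, assuming only the id, a and b columns from the question to keep it short (long is an illustrative name); indexing by dimension name rather than by position avoids mixing up the id, variable and value dimensions:

```r
# Subset of the question's data: id plus columns a and b
df <- data.frame(
  id = rep(1:3, each = 10),
  a = c(3,1,3,3,1,3,3,3,3,1, 3,2,1,2,1,3,3,2,1,1, 1,3,1,3,3,3,2,1,1,3),
  b = c(3,2,1,1,1,1,1,1,1,1, 1,2,1,3,2,1,1,1,2,1, 3,1,2,2,1,3,3,2,3,2)
)

# Stack the value columns into long format: one row per cell
long <- data.frame(
  id       = rep(df$id, times = ncol(df) - 1),
  variable = rep(names(df)[-1], each = nrow(df)),
  value    = unlist(df[-1], use.names = FALSE)
)

# Three-way table with dimensions id x variable x value
dftab <- table(long)

# Number of 3s in column a for id 1, indexed by name
dftab["1", "a", "3"]
```

Because the column names are taken from names(df), the sketch keeps working when columns or rows are added to df.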
A combination of tapply and apply can create the data you want:
tapply(df$id,df$id,function(x) apply(df[df$id==x[1],-1],2,table))
However, when a group is missing some of the values, as with column a in id 1 (which contains no 2s), the result for that id group will be a list rather than a nice table (matrix).
$`1`
$`1`$a
1 3
3 7
$`1`$b
1 2 3
8 1 1
$`1`$c
1 2 3
2 4 4
$`1`$d
1 2 3
2 3 5
$`1`$e
1 2 3
4 3 3
$`2`
a b c d e
1 4 6 3 2 4
2 3 3 3 6 2
3 3 1 4 2 4
$`3`
a b c d e
1 4 2 1 5 1
2 1 4 5 3 4
3 5 4 4 2 5
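One way to get a matrix for every id group, including groups where a value never occurs, is to tabulate each column as a factor with a fixed set of levels so that missing values are counted as 0. A sketch under that assumption, again using only the id, a and b columns from the question (vals and counts are illustrative names):

```r
# Subset of the question's data: id plus columns a and b
df <- data.frame(
  id = rep(1:3, each = 10),
  a = c(3,1,3,3,1,3,3,3,3,1, 3,2,1,2,1,3,3,2,1,1, 1,3,1,3,3,3,2,1,1,3),
  b = c(3,2,1,1,1,1,1,1,1,1, 1,2,1,3,2,1,1,1,2,1, 3,1,2,2,1,3,3,2,3,2)
)

# All values that occur anywhere in the value columns
vals <- sort(unique(unlist(df[-1])))

# Per id group: one count column per variable, one row per value
counts <- lapply(split(df[-1], df$id), function(g) {
  sapply(g, function(col) table(factor(col, levels = vals)))
})

counts[["1"]]
```

Since every table has the same length, sapply() can simplify each group's result to a matrix, and counts[["1"]]["2", "a"] is an explicit 0 rather than a missing entry.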
I'm sure someone will have a more elegant solution than this, but you can cobble it together with a simple function and dlply from the plyr package.
ColTables <- function(df) {
counts <- list()
for(a in names(df)[names(df) != "id"]) {
counts[[a]] <- table(df[a])
}
return(counts)
}
library(plyr)
results <- dlply(df, "id", ColTables)
This gets you back a list - the first "layer" of the list will be the id variable; the second the table results for each column for that id variable. For example:
> results[['2']]['a']
$a
1 2 3
4 3 3
For id variable = 2, column = a, per your above example.
A way to do it is using the aggregate function, but you have to add a column to your dataframe
> df$freq <- 0
> aggregate(freq~a+id,df,length)
a id freq
1 1 1 3
2 3 1 7
3 1 2 4
4 2 2 3
5 3 2 3
6 1 3 4
7 2 3 1
8 3 3 5
Of course you can write a function to do it, so it's easier to do it frequently, and you don't have to add a column to your actual data frame
> frequency <- function(df,groups) {
+ relevant <- df[,groups]
+ relevant$freq <- 0
+ aggregate(freq~.,relevant,length)
+ }
> frequency(df,c("b","id"))
b id freq
1 1 1 8
2 2 1 1
3 3 1 1
4 1 2 6
5 2 2 3
6 3 2 1
7 1 3 2
8 2 3 4
9 3 3 4
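A similar long-format frequency table can be sketched without adding a dummy column, by converting a table() result with as.data.frame(). Unlike the aggregate() output above, this also lists combinations that never occur, with a count of 0. Only the id and a columns from the question are used here:

```r
# Subset of the question's data: id plus column a
df <- data.frame(
  id = rep(1:3, each = 10),
  a = c(3,1,3,3,1,3,3,3,3,1, 3,2,1,2,1,3,3,2,1,1, 1,3,1,3,3,3,2,1,1,3)
)

# Cross-tabulate, then flatten to one row per (a, id) combination
freq <- as.data.frame(table(a = df$a, id = df$id), responseName = "freq")
freq

# Drop the zero-count combinations to mirror the aggregate() output
freq_nonzero <- freq[freq$freq > 0, ]
```

In the result, a and id are factors, so they may need converting back with as.numeric(as.character(...)) before further arithmetic.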
You didn't say how you'd like the data. The by function might give you the output you like.
by(df, df$id, function(x) lapply(x[,-1], table))
