R max value of column where different column equals specific value - r

I'm looking for a way to find the maximum value of a column, but only in rows where a different column equals a given value.

Suppose all your data is stored in a data frame called dat
max(dat$columnYouWantMaxOf[dat$columnYouWantToHaveSpecificValue==ValueYouWantThisColumnToHave])
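As a rough illustration of the one-liner above, here is a minimal sketch on made-up data. The data frame contents and the column names `group` and `score` are hypothetical, and the `na.rm = TRUE` and `which()` additions are only there on the assumption that your data may contain missing values.

```r
# Hypothetical example data; names and values are illustrative only
dat <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  score = c(3, 9, 7, 4, NA)
)

# Maximum of score among rows where group equals "a".
# which() drops rows where the comparison itself is NA,
# and na.rm = TRUE ignores missing scores.
max_a <- max(dat$score[which(dat$group == "a")], na.rm = TRUE)
max_a
```

Without `na.rm = TRUE`, a single NA among the selected scores would make `max()` return NA.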

Related

Extract rows with min or max value from a Data Frame

I'm working through a crash course for R at https://bioinformatics-core-shared-training.github.io/r-crash-course/crash-course.nb.html
The problem I'm facing is extracting the rows that hold the min or max of a certain column.
For example, when running
df[df$tmp ==min(df$tmp),]
I get the correct row with the expected value.
However, when running the following code
df[min(df$tmp),]
I get something else completely.
I'm wondering what is causing this discrepancy?
Assuming df$tmp is numeric with no NAs, min(df$tmp) returns a single number. If that number is an integer i, df[min(df$tmp),] returns the ith row of your data frame, provided the data frame has an ith row.
On the other hand, df[df$tmp == min(df$tmp),] returns the row(s) of df where df$tmp equals the minimum value in that column.
df[df$tmp == min(df$tmp),] is the correct approach to get what you are looking for.
df[min(df$tmp),] returns the row of df whose position equals min(df$tmp), not the row that contains the minimum. It gives unexpected results in several cases: a non-integer value is truncated to an integer, a negative value drops that row instead of selecting it, and a value greater than the number of rows returns a row of NAs. Hope this makes sense.
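The difference between the two forms can be seen on a small made-up data frame. This is only a sketch: the `id` column and the values are hypothetical, chosen so that the minimum (2) sits in row 3 rather than row 2.

```r
# Hypothetical data: the minimum of tmp is 2, and it is found in row 3
df <- data.frame(id = c("w", "x", "y", "z"), tmp = c(5, 4, 2, 3))

# Logical indexing: keeps the row(s) whose tmp equals the minimum
by_value <- df[df$tmp == min(df$tmp), ]

# Positional indexing: min(df$tmp) evaluates to 2, so this is df[2, ]
by_position <- df[min(df$tmp), ]

by_value     # the row with id "y"
by_position  # the row with id "x"
```

The two calls only agree by coincidence, when the minimum value happens to equal the position of the row that holds it.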

How do I find the maximum value of a column and the information in the other columns from the same row in R?

I have a data frame with three columns, two of which are character and the third numeric. How do I find the maximum value of the numeric column while getting all the rest of the information from that row?
So far I have:
apply(dataframe, 2, max)
We can use which.max to get the row index of the maximum of the third column, then subset the rows using that as the row index.
df[which.max(df[,3]),]
If there are ties, we can compare (==) the elements of the third column with the max value of that column to give a logical index which can as well be used as the row index.
df[df[,3]==max(df[,3]),]
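A small sketch of the two approaches on made-up data with a tie; the column names and values are hypothetical. For context, apply() converts the data frame to a matrix first, and with character columns present that matrix is character, so each column's maximum is computed independently on strings and the link between values in the same row is lost.

```r
# Hypothetical data: two character columns and one numeric column,
# with the maximum value (25) appearing twice
df <- data.frame(
  name  = c("p", "q", "r"),
  kind  = c("u", "v", "w"),
  value = c(10, 25, 25)
)

# which.max returns only the first position of the maximum
first_max <- df[which.max(df[, 3]), ]

# Comparing against the maximum keeps every tied row
all_max <- df[df[, 3] == max(df[, 3]), ]

nrow(first_max)  # one row
nrow(all_max)    # two rows
```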

Naming the number of the row in a data frame that contains a certain value

I've done some thorough research and I am struggling with an attempt to find a function that will name the number of the row (in my data frame the rows don't contain numbers) that contains a certain value. In this case a number.
e.g. Call the data frame = df
I don't know how to show a little image of the data frame but say that in row 5, column 4 the value was '162', is there a function I could use that will end with the return being '5' or 'row 5'?
I have used rowSums(df=="162")
which gives a long vector with one entry per row: a '1' if the row contains the value and a '0' if not. But I need a function that simply states the row.
I couldn't figure out how to correctly use the 'which' function either.
which(df$col4=='162')
This assumes that col4 is the name of column 4. which() returns the position of every row where the comparison is TRUE.
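A minimal sketch on made-up data, in case the column is not known in advance. The data frame below is hypothetical (only two columns, with 162 placed in row 5), and the `arr.ind = TRUE` variant is an addition to the answer above, not part of it.

```r
# Hypothetical data: the value 162 sits in row 5 of col4
df <- data.frame(
  col1 = c(1, 2, 3, 4, 5, 6),
  col4 = c(10, 20, 30, 40, 162, 60)
)

# Row position of the value within a known column
row_in_col4 <- which(df$col4 == 162)

# Search every column at once: arr.ind = TRUE returns a matrix
# with one (row, col) pair per match
hits <- which(df == 162, arr.ind = TRUE)

row_in_col4
hits
```

If the rows have names instead of numbers, `rownames(df)[row_in_col4]` gives the name of the matching row.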

Subset dataframe based on statistical range of each column

I would like to subset a dataframe by selecting only columns whose range exceeds a given value. That is, I would like to evaluate max-min for each column individually and select only the columns whose range is greater than that value. For example, given the following simple dataframe, I would like to create a subset dataframe that only contains columns with a range > 99. (Columns b and c.)
d <- data.frame(a=seq(0,10,1),b=seq(0,100,10),c=seq(0,200,20))
I have tried modifying the example here: Subset a dataframe based on a single condition applied to multiple columns, but have had no luck. I'm sure I'm missing something simple.
You can use sapply() to apply a function to each column of d that computes the difference between the column's maximum and minimum, then compare that difference to 99. The result is a logical vector of TRUE and FALSE values, one per column, which you can use to subset the columns.
d[,sapply(d,function(x) diff(range(x))>99)]
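The same computation split into steps, as a sketch using the data frame from the question. The intermediate names `col_range`, `keep` and `wide` are introduced here for illustration, and `drop = FALSE` is an added safeguard so that the result stays a data frame even if only one column qualifies.

```r
d <- data.frame(a = seq(0, 10, 1), b = seq(0, 100, 10), c = seq(0, 200, 20))

# Range (max - min) of each column: a = 10, b = 100, c = 200
col_range <- sapply(d, function(x) diff(range(x)))

# Logical vector, one entry per column
keep <- col_range > 99

# drop = FALSE keeps a data frame even when a single column is selected
wide <- d[, keep, drop = FALSE]
names(wide)
```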

R - How to get value from a column based on value from another column of same row

I have a data frame (df) with 8 columns and 1200 rows. Among those 8 columns I want to find the minimum value of column 7 and find the corresponding value of column 2 in that particular row where the minimum value of column 7 was found. Also column 2 holds characters so I want a character vector giving me its value.
I found the minimum of column 7 using
min_val <- min(as.numeric(df[, 7]), na.rm = TRUE)
Now how do I get the value from column 2 (variable name of column being 'column.2') corresponding to the row in which column 7 contains value of 'min_val' as calculated above?
This might be a trivial question but I am new to R so any help will be much appreciated.
Use which.min to get the index of the minimum value. Something like:
df[which.min(df[,7]),2]
Note that which.min only returns the first index of the minimum, so if you've got several rows with the same minimal value, you will only get the first one.
If you want to get all the minimum rows, you can use:
df[which(df[,7]==min(df[,7])), 2]
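A small sketch of both variants when the numeric column contains missing values, as the `na.rm = TRUE` in the question suggests it might. The data frame is a hypothetical stand-in with only the two relevant columns, referenced by name rather than by position; `column.7` is an assumed name by analogy with `column.2`.

```r
# Hypothetical stand-in for the 8-column data frame
df <- data.frame(
  column.2 = c("k", "l", "m", "n"),
  column.7 = c(4.2, 1.5, NA, 1.5),
  stringsAsFactors = FALSE
)

# which.min skips NAs and returns the first position of the minimum
first_min <- df$column.2[which.min(df$column.7)]

# For all ties, compute the minimum with na.rm = TRUE first;
# which() then discards the NA produced by comparing the missing value
min_val <- min(df$column.7, na.rm = TRUE)
all_min <- df$column.2[which(df$column.7 == min_val)]

first_min
all_min
```

Without `na.rm = TRUE`, `min()` would return NA here and the comparison would match no rows.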
The same answer as juba's, but using the data.table package (his answer uses only base R, with no need to load any libraries).
# Load data.table
library(data.table)
# Convert df to a data.table by reference (assumes df is currently a data.frame)
setDT(df)
# Get the 2nd column's value corresponding to the first minimum in the 7th column
df[which.min(V7), V2]
# Get all values in the 2nd column corresponding to the minimum in the 7th column
df[V7 == min(V7), V2]
For handling data.frame-like objects, data.table is quite handy and helpful, just like the dplyr package. Both are worth a look.
Here I've assumed your columns are named V1..V8. Otherwise, just replace V7 and V2 with the names of the 7th and 2nd columns of your data, respectively.
