I have a simple question. I have a data.frame that looks like this:
df
A B C Exclusion_criteria
3 4 5 3
2 1
6 9 2
I would simply like to compute the row-wise mean of columns A, B and C when the Exclusion_criteria is different from 1 (i.e. for all cases except Exclusion_criteria == 1).
Can anyone help me please?
Kind regards
We can loop over the rows with apply, drop the element whose position is given in the 4th column, and take the mean of the rest
apply(df, 1, function(x) mean(x[1:3][-x[4]], na.rm = TRUE))
#[1] 3.5 NaN 6.0
Another option is to set the cells of 'df' identified by the row/column index (taken from the 4th column) to NA and then call rowMeans
df[cbind(1:nrow(df), df[,4])] <- NA
rowMeans(df[1:3], na.rm = TRUE)
#[1] 3.5 NaN 6.0
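Since the example data in the question is incomplete, here is a minimal self-contained sketch of both ideas on made-up data. The values below are hypothetical, and it assumes Exclusion_criteria always holds a valid column position (1 to 3) with no NA. The second variant does the matrix-index assignment on a plain matrix copy rather than on the data frame itself.

```r
# Hypothetical data: Exclusion_criteria is the position (1-3) of the column to skip
df <- data.frame(A = c(3, 2, 6), B = c(4, 8, 9), C = c(5, 1, 3),
                 Exclusion_criteria = c(3, 1, 2))

# Variant 1: per-row mean after dropping the excluded position
m1 <- apply(df, 1, function(x) mean(x[1:3][-x[4]], na.rm = TRUE))

# Variant 2: blank out the excluded cells in a matrix copy, then use rowMeans
m <- as.matrix(df[1:3])
m[cbind(seq_len(nrow(m)), df$Exclusion_criteria)] <- NA
m2 <- rowMeans(m, na.rm = TRUE)

m1  # 3.5 4.5 4.5
m2  # 3.5 4.5 4.5
```

Working on a copy keeps the original df intact, which matters if the excluded values are needed later.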
Can we do this thing without dplyr? I want to select those rows which have their rowmeans greater than the overall mean of the dataframe.
I have tried to use the function but it does not work.
tf12 <- apply(tf11, 2, function(x) filter(rowMeans(x) > mean(x)))
It gives the following error.
Error in rowMeans(x) : 'x' must be an array of at least two dimensions
We could unlist to calculate mean of entire dataframe and then compare it with rowMeans
tf11[rowMeans(tf11) > mean(unlist(tf11)), ]
Use na.rm = TRUE in mean and rowMeans if you have NA values in the dataframe.
Consider an example,
df <- data.frame(a = 1:10, b = 11:20)
df[rowMeans(df) > mean(unlist(df)), ]
# a b
#6 6 16
#7 7 17
#8 8 18
#9 9 19
#10 10 20
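As a rough illustration of the na.rm advice, the same filter could look like this on data with missing values. The data frame df_na and its values are made up for this sketch.

```r
# Hypothetical data with some missing values
df_na <- data.frame(a = c(1, NA, 3, 10), b = c(2, 4, NA, 20))

# Overall mean over all non-missing cells
overall <- mean(unlist(df_na), na.rm = TRUE)

# Row means that ignore missing cells, compared against the overall mean
keep <- rowMeans(df_na, na.rm = TRUE) > overall

df_na[keep, ]  # only the last row (10, 20) is kept
```

Without na.rm = TRUE, both the overall mean and every row mean that touches an NA would be NA, and the logical index would no longer be usable for subsetting.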
I am trying to convert a data frame of numbers stored as characters in a fraction form to be stored as numbers in decimal form. (There are also some integers, also stored as char.) I want to keep the current structure of the data frame, i.e. I do not want a list as a result.
Example data frame (note: the real data frame has all elements as character, here it is a factor but I couldn't figure out how to replicate a data frame with characters):
a <- c("1","1/2","2")
b <- c("5/2","3","7/2")
c <- c("4","9/2","5")
df <- data.frame(a,b,c)
I tried df[] <- apply(df,1, function(x) eval(parse(text=x))). This calculates the numbers correctly, but only for the last column, populating the data frame with that.
Result:
a b c
1 4 4.5 5
2 4 4.5 5
3 4 4.5 5
I also tried df[] <- lapply(df, function(x) eval(parse(text=x))), which had the following result (and I have no idea why):
a b c
1 3 3 2
2 3 3 2
3 3 3 2
Desired result:
a b c
1 1 2.5 4
2 0.5 3 4.5
3 2 3.5 5
Thanks a lot!
You are probably looking for:
df[] <- apply(df, c(1, 2), function(x) eval(parse(text = x)))
df
a b c
1 1.0 2.5 4.0
2 0.5 3.0 4.5
3 2.0 3.5 5.0
eval(parse(text = x)) returns only the value of the last expression it is given, so you need to run it cell by cell.
EDIT: if some data frame elements cannot be evaluated, you can account for that by adding an if/else statement inside the function:
df[] <- apply(df, c(1, 2), function(x) if(x %in% skip){NA} else {eval(parse(text = x))})
where skip is a vector of elements that should not be evaluated.
Firstly, you should prevent your characters from turning into factors in data.frame()
df <- data.frame(a, b, c, stringsAsFactors = FALSE)
Then you can wrap a simple sapply/lapply inside your lapply to achieve what you want.
sapply(X = df, FUN = function(v) {
  sapply(X = v,
         FUN = function(w) eval(parse(text = w)))
})
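Note that an outer sapply returns a matrix rather than a data frame. If the data frame structure has to be kept, as the question asks, one possible variant assigns into df[] with lapply. The sketch below also swaps eval(parse()) for plain string splitting, which is a different technique from the one in the answers; frac_to_num is a hypothetical helper name, and it assumes every cell is either "n" or "n/d".

```r
a <- c("1", "1/2", "2")
b <- c("5/2", "3", "7/2")
c <- c("4", "9/2", "5")
df <- data.frame(a, b, c, stringsAsFactors = FALSE)

# Hypothetical helper: convert a vector of "n" or "n/d" strings to numeric
frac_to_num <- function(v) {
  parts <- strsplit(as.character(v), "/", fixed = TRUE)
  vapply(parts, function(p) {
    nums <- as.numeric(p)
    if (length(nums) == 2) nums[1] / nums[2] else nums[1]
  }, numeric(1))
}

# Assigning into df[] keeps the data frame structure and column names
df[] <- lapply(df, frac_to_num)
df
```

Avoiding eval also means that unexpected cell contents end up as NA (with a coercion warning) instead of being executed as code.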
Side Notes
If you feed eval an expression vector with several elements, such as expression(1, 1/2, 2), it evaluates to the last value only. This explains the 4 4.5 5 output. A proper expression(c(1, 1/2, 2)) evaluates to the expected answer.
The code lapply(df, function(x) eval(parse(text=x))) returns a 3 3 2 because sapply(data.frame(a,b,c), as.numeric) returns:
a b c
[1,] 1 2 1
[2,] 2 1 3
[3,] 3 3 2
These numbers correspond to the levels() of the factors, through which you were storing your fractions.
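Both side notes can be checked with a small sketch. The variable names here are only illustrative.

```r
x <- c("1", "1/2", "2")

# parse() turns the three strings into an expression vector of length 3 ...
exprs <- parse(text = x)
n_exprs <- length(exprs)

# ... and eval() on that vector returns only the value of the last element
last_only <- eval(exprs)

# Evaluating one string at a time gives one number per element
one_by_one <- vapply(x, function(s) eval(parse(text = s)), numeric(1),
                     USE.NAMES = FALSE)

# A factor stores integer codes that index its sorted levels
f <- factor(c("5/2", "3", "7/2"))
codes <- as.integer(f)

n_exprs     # 3
last_only   # 2
one_by_one  # 1.0 0.5 2.0
levels(f)   # "3" "5/2" "7/2"
codes       # 2 1 3
```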
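To those looking for a one-liner: you can use parse_ratio from the DOSE package to coerce the character fractions to numeric. Note that in the output below the bare integer "3" comes back as 1.0, so this only works as intended for values written as fractions.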
library(DOSE)
b <- c("5/2","3","7/2")
parse_ratio(b)
[1] 2.5 1.0 3.5
I have a data frame and want for each row the sum of every second cell (beginning with the second cell), whose left neighbor is greater than zero. Here's an example:
a <- c(-2,1,1,-2)
b <- c(1,2,3,4)
c <- c(-2,1,-1,2)
d <- c(5,6,7,8)
df <- data.frame(a,b,c,d)
This gives:
> df
a b c d
1 -2 1 -2 5
2 1 2 1 6
3 1 3 -1 7
4 -2 4 2 8
For the first row the correct sum is 0 (the left neighbor of 1 is -2 and the left neighbor of 5 is also -2); for the second it's 8; for the third it's 3; for the fourth it's again 8.
I want to do it without loops, so I tried it with sum() and which() like in Conditional Sum in R, but could not find a way through.
We subset the dataset for alternating columns using the recycled logical vector c(TRUE, FALSE) to get the 1st, 3rd, etc. columns, and convert that subset to a logical matrix by checking whether each value is greater than 0 (> 0). We then multiply it by the second subset of alternating columns, i.e. the 2nd, 4th, etc. columns, selected with the recycled vector c(FALSE, TRUE). The idea is that wherever the left column is not greater than 0, the logical matrix is FALSE, which is coerced to 0 in the multiplication and so cancels the right-hand value. Finally, rowSums gives the expected output
rowSums((df[c(TRUE, FALSE)]>0)*df[c(FALSE, TRUE)])
#[1] 0 8 3 8
The recycled logical index can also be replaced with seq
rowSums((df[seq(1, ncol(df), by = 2)]>0)*df[seq(2, ncol(df), by = 2)])
#[1] 0 8 3 8
Or another option is Reduce with Map
Reduce(`+`, Map(`*`, lapply(df[c(TRUE, FALSE)], `>`, 0), df[c(FALSE, TRUE)]))
#[1] 0 8 3 8
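To make the intermediate steps visible, here is a self-contained sketch of the first approach on the data from the question. The names left, right and mask are introduced only for this illustration, and the right-hand columns are converted to a matrix so that the multiplication is a plain element-wise matrix product.

```r
df <- data.frame(a = c(-2, 1, 1, -2), b = c(1, 2, 3, 4),
                 c = c(-2, 1, -1, 2), d = c(5, 6, 7, 8))

left  <- df[c(TRUE, FALSE)]              # columns a and c (the left neighbours)
right <- as.matrix(df[c(FALSE, TRUE)])   # columns b and d (the values to sum)

mask <- left > 0                         # logical matrix: TRUE where the left neighbour is positive
res  <- rowSums(mask * right)            # FALSE acts as 0, TRUE as 1

res  # 0 8 3 8
```

One caveat under this approach: an NA in a right-hand column stays NA even where the mask is FALSE, because NA * 0 is NA in R, so data with missing values would need rowSums(..., na.rm = TRUE) or explicit handling.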
I'd just like to understand a (for me) weird behavior of the function rowSums. Imagine I have this super simple dataframe:
a = c(NA, NA,3)
b = c(2,NA,2)
df = data.frame(a,b)
df
a b
1 NA 2
2 NA NA
3 3 2
and now I want a third column that is the sum of the other two. I cannot use simply + because of the NA:
df$c <- df$a + df$b
df
a b c
1 NA 2 NA
2 NA NA NA
3 3 2 5
but if I use rowSums with na.rm, the rows where every value is NA are calculated as 0, while if there is only one NA everything works fine:
df$d <- rowSums(df, na.rm=T)
df
a b c d
1 NA 2 NA 2
2 NA NA NA 0
3 3 2 5 10
Am I missing something?
Thanks to all
One option is to take rowSums with na.rm=TRUE and multiply the result by a vector that is NA for rows containing only NAs and 1 otherwise. Here rowSums(!is.na(df)) counts the non-NA values per row, negating it (!) gives TRUE exactly for the all-NA rows, and NA^ turns TRUE into NA and FALSE into 1
rowSums(df, na.rm=TRUE) *NA^!rowSums(!is.na(df))
#[1] 2 NA 10
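The one-liner is easier to follow when split into its parts. The sketch below uses only the columns a and b from the question (so the last row sums to 5 rather than 10), and the intermediate names are made up for illustration.

```r
df <- data.frame(a = c(NA, NA, 3), b = c(2, NA, 2))

n_present   <- rowSums(!is.na(df))   # number of non-NA values per row: 1 0 2
all_missing <- !n_present            # TRUE only where that count is 0
mult        <- NA^all_missing        # NA^TRUE is NA, NA^FALSE is 1

res <- rowSums(df, na.rm = TRUE) * mult

res  # 2 NA 5
```

The trick relies on R defining NA^0 as 1, since any number raised to the power 0 is 1 regardless of the missing base.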
Because
sum(numeric(0))
# 0
Once you use na.rm = TRUE in rowSums, the second row is left with no values at all, i.e. numeric(0), and its sum is 0.
If you want to retain NA for the all-NA rows, this takes two steps. I recommend writing a small function for this purpose:
my_rowSums <- function(x) {
if (is.data.frame(x)) x <- as.matrix(x)
z <- base::rowSums(x, na.rm = TRUE)
z[!base::rowSums(!is.na(x))] <- NA
z
}
my_rowSums(df)
# [1] 2 NA 10
This is particularly useful if the input x is a data frame (as in your case). base::rowSums first checks whether its input is a data frame and, if so, converts it to a matrix. That type conversion is in fact more costly than the actual row sum computation. Since we call base::rowSums twice, converting x to a matrix once beforehand avoids paying the conversion overhead twice.
For @akrun's "hacking" answer, I suggest:
akrun_rowSums <- function (x) {
if (is.data.frame(x)) x <- as.matrix(x)
rowSums(x, na.rm=TRUE) *NA^!rowSums(!is.na(x))
}
akrun_rowSums(df)
# [1] 2 NA 10
I have a data frame where each row is a vector of values of varying lengths. I would like to create a vector of the last true value in each row.
Here is an example data frame:
df <- read.table(tc <- textConnection("
var1 var2 var3 var4
1 2 NA NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 NA NA NA"), header = TRUE); close(tc)
The vector of values I want would therefore be c(2,6,3,4,1).
I just can't figure out how to get R to identify the last value.
Any help is appreciated!
Do this by combining three things:
Identify NA values with is.na
Find the last value in a vector with tail
Use apply to apply this function to each row in the data.frame
The code:
lastValue <- function(x) tail(x[!is.na(x)], 1)
apply(df, 1, lastValue)
[1] 2 6 3 4 1
Here's an answer using matrix subsetting:
df[cbind( 1:nrow(df), max.col(!is.na(df),"last") )]
This max.col call selects the position of the last non-NA value in each row. If a row is all NA, every position ties and the "last" tie-breaking rule picks the last column, so the extracted value is NA.
Here's another version that removes all infinities, NA, and NaN's before taking the first element of the reversed input:
apply(df, 1, function(x) rev(x[is.finite(x)])[1] )
# [1] 2 6 3 4 1
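One thing to watch with the tail-based version is a row that is entirely NA: tail() then returns a zero-length vector and apply() falls back to returning a list. A possible sketch that always returns one value per row is shown below; the extra all-NA row and the helper name last_non_na are additions for this illustration and are not part of the question's data.

```r
# The question's data plus one hypothetical all-NA row at the end
df <- data.frame(var1 = c(1, 4, 2, 4, 1, NA),
                 var2 = c(2, 4, NA, 4, NA, NA),
                 var3 = c(NA, NA, 3, 4, NA, NA),
                 var4 = c(NA, 6, NA, 4, NA, NA))

# Hypothetical helper: last non-NA value, or NA if there is none
last_non_na <- function(x) {
  ok <- x[!is.na(x)]
  if (length(ok)) ok[length(ok)] else NA_real_
}
res_apply <- apply(as.matrix(df), 1, last_non_na)

# Matrix-indexing variant: max.col finds the last TRUE per row
m <- as.matrix(df)
res_idx <- m[cbind(seq_len(nrow(m)), max.col(!is.na(m), ties.method = "last"))]

res_apply  # 2 6 3 4 1 NA
res_idx    # 2 6 3 4 1 NA
```

Both variants agree on the all-NA row because whichever position max.col picks there, the cell it points to is NA.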