I need to know the meaning of each command. This code is used to find the mean of every row, but the data also contains a character-type column, so the following is used.
rowMeans(sapply(iris, function(x) as.numeric(as.character(x))), na.rm = T)
We can create an index of which columns are numeric and then apply rowMeans() to only those columns:
i1 <- sapply(iris, is.numeric)
rowMeans(iris[i1], na.rm = TRUE)
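As a quick sanity check on the built-in iris data: the first row is 5.1, 3.5, 1.4, 0.2, so the first value returned should be (5.1 + 3.5 + 1.4 + 0.2) / 4 = 2.55.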
I'm having trouble trying to standardize my data.
So, first things first, I create the data frame object with my data and my desired row names (and I remove the first column, as it is not needed).
EXPGli <-read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/MergedEXP3.txt", row.names=2)
EXPGli <- EXPGli[,-1]
EXPGli <- as.data.frame(EXPGli)
Then I am supposed to convert all the columns to Z-scores (each column = gene expression values; each row = sample). The idea is to convert every gene expression value into its Z-score for each cell.
Z_score <- function(x) {(x-mean(x))/ sd(x)}
apply(EXPGli, 2, Z_score)
Which returns me [ reached 'max' / getOption("max.print") -- omitted 1143 rows ]
And now my whole data frame is full of NA cells.
Indeed, there are several NAs in the dataset, some full rows and even some columns.
I tried several approaches to remove the NAs:
EXPGli <- na.omit(EXPGli)
EXPGli %>% drop_na()
print(EXPGli[rowSums(is.na(EXPGli)) == 0, ])
na.exclude(EXPGli)
Yet apparently, none of them work. Additionally, is.na(EXPGli) returns FALSE for all fields.
I would like to understand what I am doing wrong here. It seems the issue might be that the NAs are not being recognized by R as NA, but I couldn't find a solution for this. Any input is very appreciated, thanks in advance!
You may want to set the argument na.rm = TRUE in your calls to mean(x) and sd(x) inside the Z_score function; otherwise these calls return NA for any vector that contains NAs.
Z_score <- function(x) {(x-mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)}
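With the NA-safe version, the original apply() call should work. As a minimal sketch on a made-up stand-in for EXPGli (the small data frame m below is purely illustrative):
# Toy stand-in for the expression data: 3 samples x 2 genes, one missing value
m <- data.frame(geneA = c(1, 2, NA), geneB = c(10, 20, 30))
Z_score <- function(x) {(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)}
# Column-wise Z-scores; the NA stays NA but no longer turns the whole column into NAs
apply(m, 2, Z_score)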
I want to get the standard deviation of specific columns in a data frame and store those standard deviations in a list in R.
The specific variable names of the columns are stored in a vector. For those specific variables (depends on user input) I want to calculate the standard deviation and store those in a list, over which I can loop then to use it in another part of my code.
I tried as follows, e.g.:
specific_variables <- c("variable1", "variable2") # can be of a different length depending on user input
data <- data.frame(...) # this is a dataframe with multiple columns, of which "variable1" and "variable2" are both columns from
sd_list <- 0 # empty variable for storage purposes
# for loop over the variables
for (i in length(specific_variables)) {
  sd_list[i] <- sd(data$specific_variables[i], na.rm = TRUE)
}
print(sd_list)
I get an error.
Second attempt using colSds and sapply:
colSds(data[sapply(specific_variables, na.rm = TRUE)])
But the colSds function doesn't work (anymore?).
Ideally, I'd like to store the standard deviations of those specific columns in a list.
Let's assume you have a data frame with two columns. The easiest way is to use apply():
frame <- data.frame(X = 1:6, Y = rnorm(6))
sd_list <- apply(frame, 2, sd)
the "2" in apply means: calculate sds for each column. A "1" would mean: calculate for each row.
There is no colSds() function in base R, but colMeans() and colSums() do exist; colSds() comes from the matrixStats package.
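If matrixStats is installed, its colSds() can also be used directly. A minimal sketch, assuming data and specific_variables are as in the question (colSds() expects a matrix):
library(matrixStats)
# convert the selected columns to a matrix, then take column-wise standard deviations
sd_values <- colSds(as.matrix(data[specific_variables]), na.rm = TRUE)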
With the help of #shghm I found a way:
sd_list <- as.list(unname(apply(data[specific_variables], 2, sd, na.rm = TRUE)))
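For completeness, the original for loop can also be made to work. This is a sketch using the question's own variable names, fixing the two bugs (loop over seq_along() rather than length(), and index the column by its stored name with [[ ]] rather than $):
sd_list <- vector("list", length(specific_variables))
for (i in seq_along(specific_variables)) {
  # data$specific_variables[i] looks for a column literally named "specific_variables";
  # data[[specific_variables[i]]] uses the name stored in the vector instead
  sd_list[[i]] <- sd(data[[specific_variables[i]]], na.rm = TRUE)
}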
I'm currently working with a data frame "dat". I'm trying to calculate a score using columns 69-88 (if there are values in any of those columns, add them together and put the result in a new column called "score").
This is the code I have now:
dat$score <- 0
for (num in 69:88) {
  dat$score[!is.na(dat[,num])] <- dat$score + dat[,num]
}
This gives me a column where some rows show the correct score, but other rows return "NA". I also get 20 warning messages that look like this:
1: In dat$score[!is.na(dat[, num])] <- dat$score + ... :
number of items to replace is not a multiple of replacement length
Why is my code working for some rows and not for others, and why am I getting this error?
Are you looking for the rowSums() function? You just have to add the argument na.rm = TRUE.
A solution with dplyr:
library(dplyr)
dat %>% mutate(score=rowSums(across(69:88), na.rm=TRUE))
Or with base R
dat$score<-rowSums(dat[, 69:88], na.rm=TRUE)
Use apply(); it's usually quicker than a for loop.
dat$score <- apply(dat[,69:88], 1, sum, na.rm = T)
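As for why the original loop misbehaves: the left-hand side subsets score down to the non-NA rows while the right-hand side is a full-length vector (hence the "number of items to replace" warning), and adding a column that contains NA to dat$score propagates NA into the result. A tiny made-up reproduction (the data frame toy is hypothetical):
toy <- data.frame(x = c(1, NA, 3))
toy$score <- 0
# the left side selects 2 rows but the right side has length 3, triggering the warning,
# and the replacement values end up misaligned with the rows they were meant for
toy$score[!is.na(toy$x)] <- toy$score + toy$x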
I have read a lot of similar questions, but cannot get any of the code to work. I simply want to add a row to the bottom of my data frame (df) that has the min value for each column for columns 2:8. Below is my code (that works) to add a total line, but I want a min line.
df[(nrow(df)+ 1),(2:8)] <- colSums(df[,2:8], na.rm=TRUE)
I have tried to get colMins in package matrixStats to work, but can't for some reason. Any help would be appreciated!
In base R, you can use sapply() to create an atomic vector of column minimums and then rbind() to attach it to the original data. I added an NA as the first value, since we need something in that position to bind the row onto the original data.
rbind(df, c(NA, sapply(df[2:8], min, na.rm = TRUE)))
Obviously this assumes there are only 8 columns, in which case df[-1] can be used in place of df[2:8].
And for a speed increase we can use vapply() over sapply() because we know the result will be a single numeric value.
rbind(df, c(NA, vapply(df[-1], min, 1, na.rm = TRUE)))
Update: In response to your comment on the other answer - to get "MIN" in the first column and the minimum values in all the rest, we can adjust the call to a named list and do it all in one go. This way we don't mix the column classes (character and numeric) and end up with unexpected classes in the columns of the resulting data.
rbind(
  df,
  c(setNames(list("MIN"), names(df)[1]), lapply(df[-1], min, na.rm = TRUE))
)
We can try colMins from library(matrixStats)
library(matrixStats)
rbind(df, c(NA,colMins(as.matrix(df[2:8]))))
Update
To replace NA with 'MIN',
rbind(df, c('MIN',as.list(colMins(as.matrix(df[2:8])))))
Or another approach would be to convert to matrix and use addmargins
addmargins(`row.names<-`(as.matrix(df[-1]), df$ID), 1, FUN=min)
data
set.seed(24)
df <- cbind(ID = 1:10, as.data.frame(matrix(sample(1:9, 7*10, replace = TRUE), ncol = 7)))
I have a column which contains numeric as well as non-numeric values. I want to find the mean of the numeric values, which I can then use to replace the non-numeric values. How can this be done in R?
Say your data frame is named df and the column you want to "fix" is called df$x. You could do the following.
You have to unfactor and then convert to numeric. This will give you NAs for all the character strings that cannot be coerced to numbers.
nums <- as.numeric(as.character(df$x))
As Richie Cotton pointed out, there is a "more efficient, but harder to remember" way to convert factors to numeric
nums <- as.numeric(levels(df$x))[as.integer(df$x)]
To get the mean, use mean() and pass na.rm = TRUE:
m <- mean(nums, na.rm = TRUE)
Assign the mean to all the NA values.
nums[is.na(nums)] <- m
You could then replace the old data, but I don't recommend it. Instead, just add a new column:
df$new.x <- nums
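Putting those steps together on a small made-up factor column (the data frame df below is purely illustrative):
df <- data.frame(x = factor(c("1.5", "2.5", "oops", "4.5")))
nums <- as.numeric(as.character(df$x))  # "oops" becomes NA (with a coercion warning)
m <- mean(nums, na.rm = TRUE)           # mean of the numeric values, about 2.83
nums[is.na(nums)] <- m                  # replace the non-numeric entry with the mean
df$new.x <- nums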
This is a function I wrote yesterday to deal with non-numeric types. I have a data.frame with an unpredictable type for each column. I want to calculate the means of the numeric columns and leave everything else untouched.
colMeans2 <- function(x) {
  # This function tries to guess the column type. Since all columns come in as
  # characters, it first checks whether x == "TRUE" or "FALSE". If not,
  # it tries to coerce the vector to integer. If that doesn't work, it
  # checks whether there's a ' \" ' in the vector (meaning a character
  # column) and uses the first element as the result. Finally, if nothing
  # else passes, the column type must be numeric, and it calculates
  # the mean. The end.
  # browser()
  # try if logical
  if (any(levels(x) == "TRUE" | levels(x) == "FALSE")) return(NA)
  # try if integer
  try.int <- strtoi(x)
  if (all(!is.na(try.int))) return(try.int[1])
  # try if character
  if (any(grepl("\\\"", x))) return(x[1])
  # what's left is numeric
  mean(as.numeric(as.character(x)), na.rm = TRUE)
  # a possible warning about coerced NAs probably originates in the line above
}
You would use it like so:
apply(X = your.dataframe, MARGIN = 2, FUN = colMeans2)
It sort of depends on what your data looks like.
Does it look like this?
data = list(1, 2, 'new jersey')
Then you could
data.numbers = sapply(data, as.numeric)
and get
c(1, 2, NA)
And you can find the mean with
mean(data.numbers, na.rm=T)
A compact conversion:
vec <- c(0:10, "a", "z")
vec2 <- as.numeric(vec)
vec2[is.na(vec2)] <- mean(vec2[!is.na(vec2)])
as.numeric() will print the warning message listed below and convert the non-numeric values to NA.
Warning message:
In mean(as.numeric(vec)) : NAs introduced by coercion