Calculate diff in data.frame - r

I'm trying to calculate the returns from a data.frame of prices.
diff(na.locf(precos_mes))
Some of the columns have NAs as values, so to fill them I use the na.locf function, but when I apply diff to the result, it returns the following error:
(list) object cannot be coerced to type 'double'
And when I try to unlist it, I lose all the information from each stock vector.
diff(as.numeric(unlist(na.locf(prices))))

Try
lapply(precos_mes, function(x) diff(na.locf(x)))
Or, if you don't need to remove the NA values at the beginning:
sapply(precos_mes, function(x) diff(na.locf(x, na.rm=FALSE)))
data
set.seed(24)
precos_mes <- as.data.frame(matrix(sample(c(NA,0:4), 20*5,
replace=TRUE), ncol=5))
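Putting the pieces together, a minimal end-to-end sketch (assuming the na.locf in the question comes from the zoo package):

```r
library(zoo)  # for na.locf

set.seed(24)
precos_mes <- as.data.frame(matrix(sample(c(NA, 0:4), 20 * 5,
                                          replace = TRUE), ncol = 5))

# diff() on the whole data.frame fails because a data.frame is a list;
# looping over the columns with lapply sidesteps the coercion error
retornos <- lapply(precos_mes, function(x) diff(na.locf(x)))

# columns that start with NA lose those leading values, so the pieces
# can have different lengths -- a list is the natural container
lengths(retornos)
```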

Related

Convert negative matrix values to NA in a loop

I have four matrices which contain positive and negative values. Now I would like to convert all negative values for each matrix to NA. The matrices are called Main_mean, Inn_mean, Isar_mean and Danube_mean.
For a single matrix this would be quite easy:
Main_mean[Main_mean<=0] <- NA
But what should it look like in a loop?
Get the matrices into a list and apply the function to each one using lapply:
list_obj <- mget(ls(pattern = '_mean$'))
#Or make a list individually
#list_obj <- mget(c('Main_mean', 'Danube_mean', 'Inn_mean', 'Isar_mean'))
result <- lapply(list_obj, function(x) {x[x<=0] <- NA;x})
To replace the original objects you can use list2env.
list2env(result, .GlobalEnv)
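A self-contained sketch of the whole pattern, using two small made-up matrices in place of the four *_mean objects:

```r
# stand-ins for the real matrices (made-up values)
Main_mean <- matrix(c(-1, 2, 3, -4), nrow = 2)
Inn_mean  <- matrix(c(5, -6, 0, 8), nrow = 2)

# collect every object whose name ends in '_mean'
list_obj <- mget(ls(pattern = '_mean$'))

# replace non-positive entries with NA in each matrix
result <- lapply(list_obj, function(x) {x[x <= 0] <- NA; x})

# write the modified matrices back over the originals
list2env(result, .GlobalEnv)
```

Note that `<= 0` also catches exact zeros; use `< 0` if only strictly negative values should become NA.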

Incompatible dimensions when using lapply with cor(x, 1:length(x))

I have a dataframe as follows:
a <- c(1,45,5,23,78,NA,NA)
b <- c(1,4,5,NA,NA,NA,NA)
c <- c(4,NA,NA,NA,NA,NA,NA)
d <- c(4,6,7,3,4,23,4)
df <- data.frame(a,b,c,d)
Now I would like to get a vector with the correlation factors of each vector with its own length omitting NAs.
For example: cor(df$a[!is.na(df$a)], 1:length(df$a[!is.na(df$a)])) which returns me the linear correlation factor of (1,45,5,23,78) with (1,2,3,4,5)
When I apply the above written code on one single column, it works.
However, when I wrap the code in lapply to run it over all the columns, I get an 'incompatible dimensions' error. I understand that this error indicates that vectors of different sizes are being correlated. But how is that possible when I am correlating each vector with a sequence of its own length?
result <- lapply(df, function(x){ o <-cor(x[!is.na(x)], 1:length(x[!is.na(x)]))})
I also tried the following, which returned the same error:
result <- lapply(df, function(x) {o <-cor(c(x[!is.na(x)]),c(1:length(x[!is.na(x)])))})
Have you tried:
apply(df, 2, cor, y = 1:nrow(df), use = "complete.obs")
It's a more elegant way of coding your function. It may work better for you as well.
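For instance, on the question's data (leaving out column c, which has only one non-NA value, so a correlation is not defined for it):

```r
a <- c(1, 45, 5, 23, 78, NA, NA)
b <- c(1, 4, 5, NA, NA, NA, NA)
d <- c(4, 6, 7, 3, 4, 23, 4)
df <- data.frame(a, b, d)

# correlate each column with the row index, dropping incomplete pairs
res <- apply(df, 2, cor, y = 1:nrow(df), use = "complete.obs")
round(res, 3)
```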

R date column error using data[data==""] <- NA

I am working with a data set which has all kinds of column classes, including class "Date". I try to assign NA to all empty values in this data set the following way:
data[data==""] <- NA
Obviously the date column makes some problems here, because there is the following error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
I do not really know why this error occurs, since there are no empty values in the date column, so nothing should happen there. The dates in the date column are in the standard format "%Y-%m-%d".
What is the problem here and how can I solve it?
You can create a logical index to subset columns other than the 'Date' class, and use that to replace the '' with NA
indx <- sapply(data, class)!='Date'
data[indx][data[indx]==''] <- NA
It is the 'Date' class that is creating the problem. Another option would be to convert the data to matrix so that all the columns will be character.
data[as.matrix(data)==''] <- NA
Or, as suggested by @Frank (using replace):
data[indx] <- lapply(data[indx], function(x) replace(x, which(x==''), NA))
data
set.seed(49)
data <- data.frame(Col1= sample(c('',LETTERS[1:3]), 10, replace=TRUE),
Col2=sample(c('',LETTERS[1:2]), 10, replace=TRUE),
Date=seq(as.Date('2010-01-01'),length.out=10, by='day'),
stringsAsFactors=FALSE)
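Running the index-based replacement on that example data keeps the Date column intact while blanking the empty strings:

```r
set.seed(49)
data <- data.frame(Col1 = sample(c('', LETTERS[1:3]), 10, replace = TRUE),
                   Col2 = sample(c('', LETTERS[1:2]), 10, replace = TRUE),
                   Date = seq(as.Date('2010-01-01'), length.out = 10, by = 'day'),
                   stringsAsFactors = FALSE)

indx <- sapply(data, class) != 'Date'   # TRUE for every non-Date column
data[indx][data[indx] == ''] <- NA      # only those columns are touched
```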

R Apply function on data frame columns

I have a function in R to turn factors to numeric:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
and I have a dataframe that consists of both factors, numeric and other types of data.
I want to apply the functions above at once on the whole dataframe to turn all factors to numeric types columns.
Any idea?
Thanks
You could check whether each column is a factor with is.factor and sapply. Use that as an index to filter out those columns, and convert them to numeric with the as.numeric.factor function in a lapply loop.
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)
You could also apply the function without subsetting (though applying it to a subset would be faster).
To prevent the columns from being converted to factors in the first place, you could specify the stringsAsFactors=FALSE argument or the colClasses argument in read.table/read.csv. I would imagine the columns have at least one non-numeric element, which automatically converts them to factors while reading the dataset.
One option would be:
dat[] <- lapply(dat, function(x) if(is.factor(x)) as.numeric(levels(x))[x] else x)
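A quick sketch on a made-up frame mixing factor, numeric, and character columns:

```r
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

dat <- data.frame(f = factor(c('10', '20', '30')),   # factor holding numbers
                  n = c(1.5, 2.5, 3.5),              # already numeric
                  s = c('a', 'b', 'c'),              # plain character
                  stringsAsFactors = FALSE)

indx <- sapply(dat, is.factor)            # which columns are factors?
dat[indx] <- lapply(dat[indx], as.numeric.factor)
str(dat)   # f is now numeric; n and s are untouched
```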

Calculate Mean of a column in R having non numeric values

I have a column which contain numeric as well as non-numeric values. I want to find the mean of the numeric values which i can use it to replace the non-numeric values. How can this be done in R?
Say your data frame is named df and the column you want to "fix" is called df$x. You could do the following.
You have to unfactor and then convert to numeric. This will give you NAs for all the character strings that cannot be coerced to numbers.
nums <- as.numeric(as.character(df$x))
As Richie Cotton pointed out, there is a "more efficient, but harder to remember" way to convert factors to numeric
nums <- as.numeric(levels(df$x))[as.integer(df$x)]
To get the mean, you use mean() but pass na.rm = T
m <- mean(nums, na.rm = T)
Assign the mean to all the NA values.
nums[is.na(nums)] <- m
You could then replace the old data, but I don't recommend it. Instead, just add a new column:
df$new.x <- nums
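The whole sequence, run on a tiny made-up column:

```r
df <- data.frame(x = factor(c('1', '2', 'oops', '4')))  # mixed contents stored as a factor

nums <- as.numeric(as.character(df$x))   # 'oops' becomes NA (with a coercion warning)
m <- mean(nums, na.rm = TRUE)            # mean of 1, 2, 4
nums[is.na(nums)] <- m                   # fill the gap with the mean
df$new.x <- nums                         # keep the original column untouched
```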
This is a function I wrote yesterday to combat the non-numeric types. I have a data.frame with unpredictable type for each column. I want to calculate the means for numeric, and leave everything else untouched.
colMeans2 <- function(x) {
  # This function tries to guess the column type. Since all columns come
  # in as characters, it first checks whether x is "TRUE"/"FALSE". If
  # not, it tries to coerce the vector to integer. If that doesn't work,
  # it checks for a ' \" ' in the vector (meaning a character column)
  # and uses the first element as the result. If nothing else passes,
  # the column type is numeric, and the mean is calculated. The end.
  # try if logical
  if (any(levels(x) == "TRUE" | levels(x) == "FALSE")) return(NA)
  # try if integer
  try.int <- strtoi(x)
  if (all(!is.na(try.int))) return(try.int[1])
  # try if character
  if (any(grepl("\\\"", x))) return(x[1])
  # what's left is numeric
  mean(as.numeric(as.character(x)), na.rm = TRUE)
  # a possible warning about coerced NAs probably originates in the line above
}
You would use it like so:
apply(X = your.dataframe, MARGIN = 2, FUN = colMeans2)
It sort of depends on what your data looks like.
Does it look like this?
data = list(1, 2, 'new jersey')
Then you could
data.numbers = sapply(data, as.numeric)
and get
c(1, 2, NA)
And you can find the mean with
mean(data.numbers, na.rm=T)
A compact conversion:
vec <- c(0:10,"a","z")
vec2 <- as.numeric(vec)
vec2[is.na(vec2)] <- mean(vec2[!is.na(vec2)])
as.numeric will print the warning message listed below and convert the non-numeric entries to NA.
Warning message:
In mean(as.numeric(vec)) : NAs introduced by coercion
