error when using which.min() with ave() - r

I have some data, much of which is NA.
For simplicity, let's say it looks like this:
x = c(NA, 3, 4, 3.5, NA, NA, NA, NA, 7, 5)
bins = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4)
I'm using ave( ) and which.min( ) to get the minimum value for each bin type:
ave(x, bins, FUN = which.min)
But I get the error:
Error in `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) :
replacement has length zero
The reason this is happening (I think) is because bin # 3 only has NA values. When this is rectified, the error disappears. I could just use a function like:
ave(x, bins, FUN = function(xx){
if(all(is.na(xx))){
return(NA)
} else {
xx = which.min(xx)
return(xx)
}}
)
But:
1) that is hacky as heck. And
2) which.min(c(NA, NA, NA)) does not cause an error, nor does ave(c(NA, NA, NA), c(1, 1, 1), FUN=mean) - so what's going on that I'm missing?
--> Anybody have an idea of why this error happens / the best way to get around it?
Cheers.
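A likely explanation, sketched below: which.min() returns integer(0) when every element is NA, whereas mean() returns a single NA. ave() assigns each group's result back into the positions of that group, so a length-one NA can be recycled but a zero-length result cannot, hence "replacement has length zero". The names bin_min and bin_which_min are made up for this illustration.

```r
x <- c(NA, 3, 4, 3.5, NA, NA, NA, NA, 7, 5)
bins <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4)

# which.min() yields a zero-length result for all-NA input; mean() yields one NA
n_which <- length(which.min(c(NA, NA, NA)))
n_mean <- length(mean(c(NA, NA, NA)))

# Per-bin minimum value; an all-NA bin gives NA instead of an error
bin_min <- ave(x, bins, FUN = function(v) {
  if (all(is.na(v))) NA_real_ else min(v, na.rm = TRUE)
})

# Per-bin position of the minimum; fall back to NA when which.min() is empty
bin_which_min <- ave(x, bins, FUN = function(v) {
  i <- which.min(v)
  if (length(i) == 0L) NA_integer_ else i
})
```

If the goal is the minimum value rather than its position within the bin, the first variant is probably the one you want.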

Related

Recode values by function()

I have one (pretty short) question. I need to recode variables with a function.
I have tried a few things, but it still doesn't work.
It should work with this call:
recode.numeric(x = c(5, 3, -5, 4, 3, 97),lb = 0, ub = 10)
and return c(5, 3, NA, 4, 3, NA).
My attempt is this:
recode.numeric <- function(x, lb, ub){
if(ub > x){x=x}
if (x > lb){x=x}
if(ub < x){x="NA"}
if (x<lb) {x="NA"}
}
So, what am I doing wrong?
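Three things seem to go wrong in the attempt above: if() expects a single TRUE/FALSE rather than a whole vector, "NA" in quotes is a character string rather than a missing value, and the function never returns x. A minimal sketch of one vectorized alternative, assuming that values outside the closed interval [lb, ub] should become NA:

```r
recode.numeric <- function(x, lb, ub) {
  # ifelse() tests every element of x; values below lb or above ub become NA
  ifelse(x < lb | x > ub, NA, x)
}

result <- recode.numeric(x = c(5, 3, -5, 4, 3, 97), lb = 0, ub = 10)
result
```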

Reshape data to long form based on a pattern and not unique identifier

I have some data that comes from the measurement of an image where essentially the columns signify position (x) and height (z) data. The problem is that this data gets spit out as a .csv file in the wide format. I am trying to find a way to convert this to the long format but I'm unsure how to do this because I can't designate an identifier.
I know there are a lot of questions on reshaping data but I didn't find anything quite like this.
As an example:
df <- data.frame(V1 = c("Profile", "x", "[m]", 0, 2, 4, 6, 8, 10, 12, NA, NA),
V2 = c("1", "z", "[m]", 3, 3, 4, 10, 12, 9, 2, NA, NA),
V3 = c("Profile", "x", "[m]", 0, 2, 4, 6, NA, NA, NA, NA, NA),
V4 = c("2", "z", "[m]", 4, 8, 10, 10, NA, NA, NA, NA, NA),
V5 = c("Profile", "x", "[m]", 0, 2, 4, 6, 8, 10, 12, 14, 17),
V6 = c("3", "z", "[m]", 0, 1, 1, 10, 14, 11, 6, 2, 0))
Each pair of columns represents x, z data (grouped by Profile 1, Profile 2, Profile 3, etc.). However, the measurements are not of equal length, hence the rows with NAs. Is there a programmatic way to reshape this data into the long form? i.e.:
profile x z
Profile 1 0 3
Profile 1 2 3
Profile 1 4 4
... ... ...
Profile 2 0 4
Profile 2 2 8
Profile 2 4 10
... ... ...
Thank you in advance for your help!
You can do the following (it's a bit verbose; feel free to optimize):
dfcols <- NCOL(df)
xColInds <- seq(1,dfcols,by=2)
zColInds <- seq(2,dfcols,by=2)
longdata <- do.call("rbind",lapply(1:length(xColInds), function(i) {
xValInd <- xColInds[i]
zValInd <- zColInds[i]
profileName <- paste0(df[1,xValInd]," ",df[1,zValInd])
xVals <- as.numeric(df[-(1:3),xValInd])
zVals <- as.numeric(df[-(1:3),zValInd])
data.frame(profile=rep(profileName,length(xVals)),
x = xVals,
z = zVals)
}))
If you want better performance, don't build a data.frame on every iteration. Building one at the end is enough, for example:
xColInds <- seq(1,NCOL(df),by=2)
longdataList <- lapply(xColInds, function(xci) {
list(profileName = paste0(df[1,xci]," ",df[1,xci+1]),
x = df[-(1:3),xci],
z = df[-(1:3),xci+1])
})
longdata <- data.frame(profile = rep(unlist(lapply(longdataList,"[[","profileName")),each=NROW(df)-3),
x = as.numeric(unlist(lapply(longdataList,"[[","x"))),
z = as.numeric(unlist(lapply(longdataList,"[[","z"))))
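Both versions keep the NA padding rows of the shorter profiles. Also, on R versions before 4.0 the columns of df would be factors by default, and as.numeric() on a factor returns level codes, not the values. A hedged sketch of the same idea that handles both points, using only the first two profiles from the example; stringsAsFactors = FALSE and the complete.cases() filter are additions, not part of the answer above:

```r
df <- data.frame(
  V1 = c("Profile", "x", "[m]", 0, 2, 4, 6, 8, 10, 12, NA, NA),
  V2 = c("1", "z", "[m]", 3, 3, 4, 10, 12, 9, 2, NA, NA),
  V3 = c("Profile", "x", "[m]", 0, 2, 4, 6, NA, NA, NA, NA, NA),
  V4 = c("2", "z", "[m]", 4, 8, 10, 10, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE  # keep the values as character on any R version
)

xColInds <- seq(1, ncol(df), by = 2)
longdata <- do.call(rbind, lapply(xColInds, function(xci) {
  data.frame(profile = paste(df[1, xci], df[1, xci + 1]),
             x = as.numeric(df[-(1:3), xci]),
             z = as.numeric(df[-(1:3), xci + 1]))
}))

# Drop the padding rows where x or z is NA
longdata <- longdata[complete.cases(longdata), ]
```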

count non-NA values and group by variable

I am trying to show how many complete observations there are per ID without using complete.cases() or any add-on package.
If I use na.omit to filter out the NA values, I will lose all of the IDs which might have ZERO complete cases.
In the end, I'd like a frequency table with two columns: ID and Number of Complete Observations
> length(unique(data$ID))
[1] 332
> head(data)
ID value
1 1 NA
2 1 NA
3 1 NA
4 1 NA
5 1 NA
6 1 NA
> dim(data)
[1] 772087 2
When I write my own function z, which counts non-NA values, and apply it with aggregate(), the IDs with zero complete observations are left out. I should get 332 rows, not 323. How can I resolve this using base functions?
z <- function(x){
sum(!is.na(x))
}
aggregate(value ~ ID, data = data , FUN = "z")
> nrow(aggregate(value ~ ID, data = data , FUN = "z"))
[1] 323
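The missing IDs are most likely explained by the formula method of aggregate(), which defaults to na.action = na.omit: rows with NA are dropped before grouping, so an ID whose values are all NA never reaches z. A small sketch with made-up data (not the asker's 772087-row set) showing the effect and one way around it with na.action = na.pass:

```r
# Toy data: ID 1 has only NA values
data <- data.frame(ID = c(1, 1, 2, 2, 3), value = c(NA, NA, 1, NA, 2))

z <- function(x) sum(!is.na(x))

# Default na.action = na.omit removes the NA rows first, so ID 1 disappears
dropped <- aggregate(value ~ ID, data = data, FUN = z)

# na.pass keeps the NA rows, so every ID gets a count (possibly zero)
kept <- aggregate(value ~ ID, data = data, FUN = z, na.action = na.pass)
kept
```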
One of the ways to do this is using table:
df2 <- table(df$Id, !is.na(df$value))[,2]
data.frame(ID = names(df2), value = df2)
Data
df <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(NA,
1, 1, 2, 2, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
In base R, you can use your utility function like this:
stack(by(data$value, data$ID, FUN=function(x) sum(!is.na(x))))
You can use table directly for this purpose. Below is some sample code:
df1 <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(2,
1, 1, NA, NA, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
df2 <- as.data.frame.matrix(with(df1, table(Id, value)))
resultDf <- data.frame(Id=row.names(df2), count=apply(df2, 1, sum))
resultDf
The code makes a table of Id and value. Because table() excludes NA by default, summing each row gives the number of non-NA values per Id. Hope this is easy to understand and helps.

Replace NAs in one vector with sequential elements of another vector

I'd like to replace NA elements of a vector with elements from a sequence, for example:
x <- c(1, NA, 5, NA, NA, 2, 12, NA)
replace.seq <- -1:-4 # Can assume length(replace.seq) == sum(is.na(x))
goal <- c(1, -1, 5, -2, -3, 2, 12, -4)
What's an efficient way to do this? I'd prefer to avoid sorting x.
Per @akrun:
x[is.na(x)] <- replace.seq
You can use replace:
x <- replace(x, is.na(x), replace.seq)

Merging columns and removing NA

I have a data frame:
A<- c(NA, 1, 2, NA, 3, NA)
R<- c(2, 1, 2, 1, NA, 1)
C<- c(rep ("B",3), rep ("D", 3))
data1<-data.frame (A,R,C)
data1
And I want to merge columns A and R to get a data frame like data2:
AR<- c(2, 1, 2, 1, 3, 1)
C<- c(rep ("B",3), rep ("D", 3))
data2<-data.frame (AR,C)
data2
Do you know how I can do that?
You might want to consider what happens if "A" and "R" have different values, but this should work:
data2 <- with(data1, data.frame(AR=ifelse(is.na(A), R, A), C=C))
