I have a dataframe that looks like this:
x1 y1 z1 x2 y2 z2
1 6 7 8 5 4 10
2 7 8 9 6 5 11
3 8 9 10 7 6 12
4 9 10 11 8 7 13
5 10 11 12 9 8 14
6 11 12 13 10 9 15
Now I want to change the values in x1 and x2 according to this rule: every value in x1 or x2 that is greater than 8 should have 8 subtracted from it, and every value that is 8 or smaller should be replaced by NA. Additionally, if a value in x1 (or x2) is replaced by NA, the values of y1 and z1 (or y2 and z2) in the same row should also be set to NA. The data frame should then look like this:
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
The code to generate the dataframe
df1<-data.frame("x1"=6:11,"y1"=7:12,"z1"=8:13,"x2"=5:10,"y2"=4:9,"z2"=10:15)
We create two logical indexes, one for 'x1' and one for 'x2', and assign the values based on those indexes:
i1 <- df1$x1 <=8 #x1 index
i2 <- df1$x2 <=8 #x2 index
nm1 <- grep("1$", names(df1)) #column index for suffix 1 in column names
nm2 <- grep("2$", names(df1)) #column index for suffix 2 in column names
df1[i1,nm1] <- NA #set the values for suffix 1 columns to NA
df1[i2, nm2] <- NA #set the values for suffix 2 columns to NA
df1[c('x1', 'x2')] <- df1[c('x1', 'x2')] - 8 #subtract 8 from the 'x' columns
df1
# x1 y1 z1 x2 y2 z2
#1 NA NA NA NA NA NA
#2 NA NA NA NA NA NA
#3 NA NA NA NA NA NA
#4 1 10 11 NA NA NA
#5 2 11 12 1 8 14
#6 3 12 13 2 9 15
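If there are more than two column groups, the same index-and-assign idea can be wrapped in a small function. This is only a sketch: `mask_group()` and its `threshold` argument are hypothetical names, not part of the answer above, and it assumes every group has an `x<suffix>` column and that the suffix uniquely identifies the group's columns.

```r
# Hypothetical helper (not from the original answer): applies the rule to one suffix group.
mask_group <- function(d, suffix, threshold = 8) {
  x_col <- paste0("x", suffix)                   # e.g. "x1"
  cols  <- grep(paste0(suffix, "$"), names(d))   # all columns ending in the suffix
  low   <- d[[x_col]] <= threshold               # rows whose x value is too small
  d[low, cols] <- NA                             # blank the whole group in those rows
  d[[x_col]] <- d[[x_col]] - threshold           # NA - threshold stays NA
  d
}

df1 <- data.frame(x1 = 6:11, y1 = 7:12, z1 = 8:13,
                  x2 = 5:10, y2 = 4:9,  z2 = 10:15)

res <- df1
for (s in c("1", "2")) res <- mask_group(res, s)
res
```

The loop simply runs the helper once per suffix, so adding a third group would only mean extending `c("1", "2")`.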
We have a condition on two variables, and then a series of reactions that apply whenever that condition is TRUE.
# Apply the condition to x1 and x2
df1$x1 <- ifelse(df1$x1 > 8, df1$x1 - 8, NA)
df1$x2 <- ifelse(df1$x2 > 8, df1$x2 - 8, NA)
# x1/x2 are now NA wherever the condition failed, so propagate that NA to y1/y2
df1$y1 <- ifelse(is.na(df1$x1), NA, df1$y1)
df1$y2 <- ifelse(is.na(df1$x2), NA, df1$y2)
# ... and to z1/z2
df1$z1 <- ifelse(is.na(df1$x1), NA, df1$z1)
df1$z2 <- ifelse(is.na(df1$x2), NA, df1$z2)
library(dplyr)
df1[,c("x1","x2")] <- sapply(df1[,c("x1","x2")],function(x)ifelse(x>8,x-8,NA))
df1 %>%
mutate(y1=replace(y1,which(x1%in%NA),NA))%>%
mutate(z1=replace(z1,which(x1%in%NA),NA))%>%
mutate(y2=replace(y2,which(x2%in%NA),NA))%>%
mutate(z2=replace(z2,which(x2%in%NA),NA))
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
I am trying to create a new set of variables based on observations at 5 different time points. However, there is not an observation for each row at each time point. Assuming it looks something like this:
X1 <- c(NA,NA,7,8,1,5)
X2 <- c(NA,0,0,NA,3,7)
X3 <- c(NA,2,3,4,2,7)
X4 <- c(1,1,5,2,1,7)
X5 <- c(2,NA,NA,4,3,NA)
df <- data.frame(X1,X2,X3,X4,X5)
X1 X2 X3 X4 X5
1 NA NA NA 1 2
2 NA 0 2 1 NA
3 7 0 3 5 NA
4 8 NA 4 2 4
5 1 3 2 1 3
6 5 7 7 7 NA
I want to create 5 new variables, say T1 - T5, so that T1 is populated with the first non-NA value in that row and each following variable takes the next value in order, with NA filling whatever is left at the end.
X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
1 NA NA NA 1 2 1 2 NA NA NA
2 NA 0 2 1 NA 0 2 1 NA NA
3 7 0 3 5 NA 7 0 3 5 NA
4 8 NA 4 2 4 8 NA 4 2 4
5 1 3 2 1 3 1 3 2 1 3
6 5 7 7 7 NA 5 7 7 7 NA
Any suggestions? Thank you in advance!
fun <- function(z) {
ind <- which.max(!is.na(z))
if (!length(ind)) ind <- 1;
c(z[ind:length(z)], if (ind > 1) z[1:(ind-1)])
}
cbind(df, setNames(as.data.frame(t(apply(df, 1, fun))), sub("^X", "T", names(df))))
# X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
# 1 NA NA NA 1 2 1 2 NA NA NA
# 2 NA 0 2 1 NA 0 2 1 NA NA
# 3 7 0 3 5 NA 7 0 3 5 NA
# 4 8 NA 4 2 4 8 NA 4 2 4
# 5 1 3 2 1 3 1 3 2 1 3
# 6 5 7 7 7 NA 5 7 7 7 NA
Walkthrough:
within fun, which.max(!is.na(z)) returns the position of the first non-NA value in the vector (which will be a "row" of the frame); in the corner case where all values are NA, it returns 1, so the row comes back unchanged, and it only returns integer(0) for a zero-length vector, which is what the length check guards against;
apply(., 1, fun) converts df to a matrix, then applies the function fun on each row;
since apply(., 1, ..) returns a transposed matrix, we t(.) transpose it;
since that returns a matrix, we as.data.frame(.) it, then change the column names with setNames and sub(.);
finally, cbind it with the original data.
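To see the row-level step in isolation, here is a small sketch that repeats the core of fun under a different name (`shift_left` is just an illustrative name) and runs it on single vectors, including an all-NA one. It assumes non-empty input, so the length check is left out.

```r
# Same logic as fun() above, repeated so the snippet runs on its own.
shift_left <- function(z) {
  ind <- which.max(!is.na(z))   # position of the first non-NA (1 if there is none)
  c(z[ind:length(z)], if (ind > 1) z[1:(ind - 1)])   # move the leading NAs to the end
}

r1 <- shift_left(c(NA, NA, NA, 1, 2))   # leading NAs move to the back
r2 <- shift_left(c(NA, NA, NA))         # all NA: returned as-is
r3 <- shift_left(c(8, NA, 4, 2, 4))     # starts with a value: unchanged
r1
```

Only the leading NAs are moved; an NA in the middle of a row (as in the third call) stays where it is.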
I have generated random data like this.
data <- replicate(10, sample(0:9, 10, replace = FALSE))
ind <- which(data %in% sample(data, 5))
#now replace those indices in data with NA
data[ind]<-NA
#data now contains NAs (every occurrence of the sampled values)
data = as.data.frame(data)
rownames(data) = 1:10
colnames(data) = 1:10
data
which results in a 10 x 10 data frame containing some NAs. How can I reorder the entries in each column so that every numeric value is placed in the row whose number is the value plus one (so the value equals row number - 1), with NA in every row that has no matching value? The rule should apply to every column, for example the first one.
How can I do this? I have no clue at all. I could sort in increasing or decreasing order and put the NAs last, but that is not what I want.
You can make a helper function to assign values to indices at (values + 1), then apply the function over all columns:
fx <- function(x) {
vals <- x[!is.na(x)]
pos <- vals + 1
out <- rep(NA, length(x))
out[pos] <- vals
out
}
as.data.frame(sapply(data, fx))
1 2 3 4 5 6 7 8 9 10
1 NA 0 NA 0 0 0 0 NA 0 0
2 NA NA NA 1 1 NA NA NA NA NA
3 2 NA 2 2 NA NA NA NA 2 NA
4 3 NA 3 3 NA NA 3 NA 3 3
5 4 4 4 4 NA 4 NA 4 4 NA
6 5 5 NA 5 NA NA 5 5 5 NA
7 NA 6 6 NA 6 NA NA 6 NA NA
8 7 NA 7 7 NA 7 7 NA NA 7
9 NA NA NA NA 8 8 8 8 8 8
10 9 9 NA NA 9 NA NA 9 NA 9
Starting data:
set.seed(13)
data <- replicate(10, sample(
c(0:9, rep(NA, 10)),
10,
replace = FALSE
))
data <- as.data.frame(data)
colnames(data) <- 1:10
data
1 2 3 4 5 6 7 8 9 10
1 2 NA NA 2 NA NA 0 NA 3 7
2 4 NA NA 4 NA NA NA NA 2 9
3 9 9 NA 3 9 4 NA 6 4 0
4 NA NA NA 1 6 NA NA 4 NA NA
5 5 6 3 0 NA NA 5 8 8 NA
6 NA NA 7 NA NA NA 7 NA 5 3
7 3 4 6 NA 1 0 NA 5 NA NA
8 NA NA NA 7 0 7 NA NA 0 NA
9 NA 0 4 NA 8 8 8 9 NA 8
10 7 5 2 5 NA NA 3 NA NA NA
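As a quick sanity check, the helper can be tried on a single column. The sketch below uses the first column of the starting data above, typed in by hand; `place_by_value` is an illustrative name, and the `stopifnot()` guard is an added assumption (values must be whole numbers between 0 and length - 1), not part of the original answer.

```r
# Sketch of the same idea as fx(), with an input check added as an assumption.
place_by_value <- function(x) {
  vals <- x[!is.na(x)]
  # Assumption: values are whole numbers in 0..(length(x) - 1)
  stopifnot(all(vals %in% 0:(length(x) - 1)))
  out <- rep(NA, length(x))
  out[vals + 1] <- vals          # value v goes to row v + 1
  out
}

col1 <- c(2, 4, 9, NA, 5, NA, 3, NA, NA, 7)   # first column of the starting data
placed <- place_by_value(col1)
placed
# NA NA  2  3  4  5 NA  7 NA  9
```

If a column could contain duplicate values, they would all land in the same row, so only one copy would survive.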
I'm trying to clean my data. Let's imagine that we've got a vector of 20 values with several NAs:
set.seed(1234)
x <- round(rnorm(20, mean = 10, sd = 5))
x[c(6, 8, 12, 16, 19)] <- NA
So it looks something like this:
> 4 11 15 -2 12 NA 7 NA 7 6 8 NA 6 10 15 NA 7 5 NA 22
I need to replace with NA those values that are enclosed by NAs. E.g. the 7 at position 7 of my vector should become NA because the previous and next values are NA. I can do it with an ifelse statement and some dplyr functions:
library(dplyr)
ifelse(is.na(lag(x))&is.na(lead(x)), NA, x)
> 4 11 15 -2 12 NA NA NA 7 6 8 NA 6 10 15 NA 7 5 NA NA
The question is: how can I replace runs of two values that are enclosed by NAs, such as 7 and 5? I tried duplicating the condition with lag(lag(x)) and lead(lead(x)), but the result is a mess:
ifelse(is.na(lag(x))&is.na(lead(x)) | is.na(lead(lead(x)))&is.na(lag(lag(x))), NA, x)
> 4 11 15 -2 12 NA NA NA 7 NA 8 NA 6 NA 15 NA 7 5 NA NA
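We can start a new group at every NA and count the length of each group. A group of length 3 consists of an NA followed by two values, so we simply replace those groups with NA. A group of length 3 at the very start (three values before the first NA) or at the very end (NA, value, value with no closing NA) is replaced as well, so check the edges if that matters.
i1 <- cumsum(is.na(x))
x[ave(i1, i1, FUN = length) == 3] <- NA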
#[1] 4 11 15 -2 12 NA 7 NA 7 6 8 NA 6 10 15 NA NA NA NA 22
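If runs of different lengths should be handled in one go, run-length encoding is one possible alternative. This is a sketch of a different technique than the cumsum grouping above: `mask_short_runs()` and `max_len` are hypothetical names, and the vector is typed in from the question rather than regenerated from the seed.

```r
# Hypothetical helper: blank every run of at most max_len values that has NA on both sides.
mask_short_runs <- function(x, max_len = 2) {
  r <- rle(!is.na(x))                  # alternating runs: TRUE = values, FALSE = NAs
  n <- length(r$values)
  idx <- seq_len(n)
  # A value run that is neither the first nor the last run has NA runs on both sides
  enclosed <- r$values & idx > 1 & idx < n
  drop <- enclosed & r$lengths <= max_len
  x[rep(drop, r$lengths)] <- NA        # expand the per-run flags back to element level
  x
}

x <- c(4, 11, 15, -2, 12, NA, 7, NA, 7, 6, 8, NA, 6, 10, 15, NA, 7, 5, NA, 22)
masked <- mask_short_runs(x, max_len = 2)
masked
# 4 11 15 -2 12 NA NA NA  7  6  8 NA  6 10 15 NA NA NA NA 22
```

With `max_len = 1` only the single enclosed 7 is blanked. The first and last runs are never touched, because they are not enclosed on both sides.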
Assuming I have the following data frame:
x1 <- c(12:4, 5:8, NA, NA)
x2 <- c(15:8, 9:15)
x3 <- c(14:9, 10:13, NA, NA, NA, NA, NA)
df <- data.frame(x1, x2, x3)
How can I find the minimum value in each column, delete all values before it, and pad the columns with NAs to preserve equal length? NAs should only be added to the end of the columns, so that the lowest values are in row 1.
My real dfs have varying numbers of cols and rows.
The desired result is:
x1 x2 x3
1 4 8 9
2 5 9 10
.
.
8 NA 15 NA
I'm assuming your "df" is actually:
df <- data.frame(x1, x2, x3)
In that case you can try something like:
data.frame(lapply(df, function(y) {
y[1:which.min(y)] <- NA
y
}))
# x1 x2 x3
# 1 NA NA NA
# 2 NA NA NA
# 3 NA NA NA
# 4 NA NA NA
# 5 NA NA NA
# 6 NA NA NA
# 7 NA NA 10
# 8 NA NA 11
# 9 NA 9 12
# 10 5 10 13
# 11 6 11 NA
# 12 7 12 NA
# 13 8 13 NA
# 14 NA 14 NA
# 15 NA 15 NA
After reading your comment and edit, perhaps this is what you are looking for instead:
data.frame(lapply(df, function(y) {
x1 <- rep(NA, nrow(df))
x2 <- which.min(y):length(y)
x1[seq_along(x2)] <- y[x2]
x1
}))
# x1 x2 x3
# 1 4 8 9
# 2 5 9 10
# 3 6 10 11
# 4 7 11 12
# 5 8 12 13
# 6 NA 13 NA
# 7 NA 14 NA
# 8 NA 15 NA
# 9 NA NA NA
# 10 NA NA NA
# 11 NA NA NA
# 12 NA NA NA
# 13 NA NA NA
# 14 NA NA NA
# 15 NA NA NA
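Since the real data frames have varying numbers of rows and columns, it may help to pull the per-column step into a named function that relies only on the length of the column it is given. A sketch, with `shift_to_min` as an illustrative name; it assumes each column has at least one non-NA value, because `which.min()` returns `integer(0)` for an all-NA column.

```r
# Sketch: keep everything from the column minimum onward, then pad the tail with NA.
shift_to_min <- function(y) {
  keep <- y[which.min(y):length(y)]            # values from the (first) minimum to the end
  c(keep, rep(NA, length(y) - length(keep)))   # pad back to the original length
}

x1 <- c(12:4, 5:8, NA, NA)
x2 <- c(15:8, 9:15)
x3 <- c(14:9, 10:13, NA, NA, NA, NA, NA)
df <- data.frame(x1, x2, x3)

res <- data.frame(lapply(df, shift_to_min))
head(res, 8)
```

Because `lapply()` walks over whatever columns are present, the same call works unchanged for data frames of other shapes.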
I have a data frame x with these values:
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA
A simple question: How do I get the highest value? (11)
Use max() with the na.rm argument set to TRUE:
dat <- read.table(text="
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA", header=TRUE)
Get the maximum:
max(dat, na.rm=TRUE)
[1] 11
To find the maximum of a single column, you might want to unlist it first:
max(unlist(myDataFrame$myColumn), na.rm = TRUE)
You could also write a column maximum function, colMax:
colMax <- function(data) sapply(data, max, na.rm = TRUE)
Use the colMax function on the sample data:
colMax(x)
# x1 x2 x3
# 5.0 9.0 11.0
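If you also need to know where the highest value sits, one option is `which()` with `arr.ind = TRUE`. A minimal sketch, with the data frame typed in from the question:

```r
# Data copied from the question
x <- data.frame(
  x1 = c(NA, NA, 4, NA, NA, 5, 5, NA),
  x2 = c(4, 3, NA, 1, 2, NA, 9, 2),
  x3 = c(1, NA, 2, 11, NA, 1, NA, NA)
)

overall <- max(x, na.rm = TRUE)              # highest value in the whole data frame
pos <- which(x == overall, arr.ind = TRUE)   # row/column position(s) of that value
overall
pos
```

Here `pos` points at row 4, column 3 (x3). Note that `max(..., na.rm = TRUE)` on a column that is entirely NA returns -Inf with a warning, so an all-NA column would need separate handling.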