In R, I have a data frame df and a vector invalidAfterIndex. For the ith row, I would like to set all elements whose column index is greater than invalidAfterIndex[i] to NA. For example:
> df <- data.frame(matrix(rnorm(20), nrow=5))
> df
X1 X2 X3 X4
1 2.124042819 -0.2862224 0.1686977 2.14838198
2 0.777763004 0.2949123 -0.4331421 -0.81278586
3 -0.003226624 -0.2326152 -1.5779695 -1.23193913
4 0.165975919 -0.1879981 -0.8214994 -1.40267202
5 1.299195865 -0.9418217 -1.5302512 0.03164781
> invalidAfterIndex <- c(2,3,1,4,1)
I would like to have:
X1 X2 X3 X4
1 2.124042819 -0.2862224 NA NA
2 0.777763004 0.2949123 -0.4331421 NA
3 -0.003226624 NA NA NA
4 0.165975919 -0.1879981 -0.8214994 -1.40267202
5 1.299195865 NA NA NA
How can I do this without a for loop?
You can do
is.na(df) <- col(df) > invalidAfterIndex
Or, as @digEmAll suggested
df[col(df) > invalidAfterIndex] <- NA
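To see why the one-liners above work, it can help to look at the logical mask itself. A minimal sketch, using made-up integer values rather than the random data from the question: col(df) gives each cell's column number, and the comparison recycles invalidAfterIndex down each column, so every cell in row i is compared with invalidAfterIndex[i].

```r
# Small fixed example (hypothetical values) instead of rnorm() data
df <- data.frame(matrix(1:20, nrow = 5))
invalidAfterIndex <- c(2, 3, 1, 4, 1)

# TRUE wherever the column number exceeds the cutoff for that row
mask <- col(df) > invalidAfterIndex
mask

# Use the mask to blank out the invalid cells
df[mask] <- NA
df
```

The recycling only lines up this way because invalidAfterIndex has exactly one entry per row.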
Here is an option with Map, which passes each column and its position to a function that replaces the elements whose column index exceeds invalidAfterIndex with NA:
df[] <- Map(function(col, index) replace(col, index > invalidAfterIndex, NA), df, seq_along(df))
df
# X1 X2 X3 X4
#1 2.124042819 -0.2862224 NA NA
#2 0.777763004 0.2949123 -0.4331421 NA
#3 -0.003226624 NA NA NA
#4 0.165975919 -0.1879981 -0.8214994 -1.402672
#5 1.299195865 NA NA NA
I have a dataframe
df= data.frame(a=c(56,23,15,10),
b=c(43,NA,90.7,30.5),
c=c(12,7,10,2),
d=c(1,2,3,4),
e=c(NA,45,2,NA))
I want to select two random non-NA values from each row and convert the rest to NA.
Required output (will differ because of randomness):
df= data.frame(
a=c(56,NA,15,NA),
b=c(43,NA,NA,NA),
c=c(NA,7,NA,2),
d=c(NA,NA,3,4),
e=c(NA,45,NA,NA))
Code Used
I know how to select random non-NA values from a specific row:
set.seed(2)
sample(which(!is.na(df[1,])),2)
But I have no idea how to apply it to the whole dataframe and get the required output.
You may write a function to keep n random values in a row.
keep_n_value <- function(x, n) {
  x1 <- which(!is.na(x))
  x[-sample(x1, n)] <- NA
  x
}
Apply the function by row using base R -
set.seed(123)
df[] <- t(apply(df, 1, keep_n_value, 2))
df
# a b c d e
#1 NA NA 12 1 NA
#2 NA NA 7 2 NA
#3 NA 90.7 10 NA NA
#4 NA 30.5 NA 4 NA
Or if you prefer tidyverse -
purrr::pmap_df(df, ~keep_n_value(c(...), 2))
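One caveat with keep_n_value: when a row has a single non-NA value at position k, sample(k, n) samples from 1:k rather than from that one position, and a row with fewer than n non-NA values can raise an error. Below is a hedged sketch of a more defensive variant; the name keep_n_safe is made up for illustration, and it assumes that rows with n or fewer non-NA values should be left unchanged.

```r
# Hypothetical defensive variant of keep_n_value (assumes n >= 1)
keep_n_safe <- function(x, n) {
  idx <- which(!is.na(x))
  if (length(idx) <= n) return(x)           # nothing to drop
  keep <- idx[sample.int(length(idx), n)]   # sample positions within idx
  x[setdiff(seq_along(x), keep)] <- NA      # blank everything not kept
  x
}

df <- data.frame(a = c(56, 23, 15, 10),
                 b = c(43, NA, 90.7, 30.5),
                 c = c(12, 7, 10, 2),
                 d = c(1, 2, 3, 4),
                 e = c(NA, 45, 2, NA))

set.seed(123)
df[] <- t(apply(df, 1, keep_n_safe, 2))
rowSums(!is.na(df))   # number of values kept in each row
```

For the example data every row has at least three non-NA values, so both versions keep exactly two per row.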
Base R:
You could apply sapply column-wise and randomly replace two non-NA values in each column with NA. Note that this works per column and removes two values, rather than keeping two values per row:
as.data.frame(sapply(df, function(x) replace(x, sample(which(!is.na(x)), 2), NA)))
Example Output:
a b c d e
1 56 NA 12 NA NA
2 23 NA NA 2 NA
3 NA NA 10 3 NA
4 NA 30.5 NA NA NA
One option using dplyr and purrr could be:
library(dplyr)
library(purrr)
df %>%
  mutate(pmap_dfr(across(everything()), ~ `[<-`(c(...), !seq_along(c(...)) %in% sample(which(!is.na(c(...))), 2), NA)))
a b c d e
1 56 43.0 NA NA NA
2 23 NA 7 NA NA
3 15 NA NA NA 2
4 NA 30.5 2 NA NA
I have a data.table like this:
dt <- data.table(asset=c("x1","x2","x3","x4","x5"),
min_s1=c(.1,NA,NA,.1,NA),
min_s2=c(NA,.5,.5,NA,NA),
min_s3=c(.15,NA,NA,NA,.15))
I can manually subset on the NA values as follows which gives me the output I want:
empty1 <- dt[is.na(min_s1)]
empty2 <- dt[is.na(min_s2)]
empty3 <- dt[is.na(min_s3)]
But, what I really need to do is subset dynamically using the column name and also name the result incorporating the i variable in a loop. The loop is important because this will ultimately be used in a parallel computing script. I would like something like this (it doesn't work; just showing what I am looking for):
foreach (i in 1:3) %do% {
empty(i) <- dt[is.na(min_s(i))]
}
I tried using the following as well as many of its variations to no avail:
paste0("empty",i) <- dt[is.na(paste0("min_s",i))]
Any ideas how I could accomplish this?
I use a Windows 7 pc.
Thanks.
We can just loop through the 'min' columns using lapply and subset the dataset
lapply(dt[,-1, with =FALSE], function(x) dt[is.na(x)])
#$min_s1
# asset min_s1 min_s2 min_s3
#1: x2 NA 0.5 NA
#2: x3 NA 0.5 NA
#3: x5 NA NA 0.15
#$min_s2
# asset min_s1 min_s2 min_s3
#1: x1 0.1 NA 0.15
#2: x4 0.1 NA NA
#3: x5 NA NA 0.15
#$min_s3
# asset min_s1 min_s2 min_s3
#1: x2 NA 0.5 NA
#2: x3 NA 0.5 NA
#3: x4 0.1 NA NA
I hope I've understood your question correctly, so try the following:
dt <- data.table(asset=c("x1","x2","x3","x4","x5"),
min_s1=c(.1,NA,NA,.1,NA),
min_s2=c(NA,.5,.5,NA,NA),
min_s3=c(.15,NA,NA,NA,.15))
vec_store <- c()
empty <- list()
names <- names(dt)[!grepl("asset", names(dt))]
for(i in names){
  vec_store <- dt[is.na(dt[,get(i)])]
  empty[[paste0(i)]] <- vec_store
}
This gives you:
> empty
$min_s1
asset min_s1 min_s2 min_s3
1: x2 NA 0.5 NA
2: x3 NA 0.5 NA
3: x5 NA NA 0.15
$min_s2
asset min_s1 min_s2 min_s3
1: x1 0.1 NA 0.15
2: x4 0.1 NA NA
3: x5 NA NA 0.15
$min_s3
asset min_s1 min_s2 min_s3
1: x2 NA 0.5 NA
2: x3 NA 0.5 NA
3: x4 0.1 NA NA
In your code, paste0("empty", i) only builds a character string, so you cannot assign to it to create an object. I find the easiest way is to compute each result and store it in a list; once it is in the list you can operate on the whole list or pull out individual elements. That is why I created vec_store as an empty vector and empty as an empty list.
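If the results should carry names like empty1, empty2, empty3, the same list idea can be driven by the loop index. A rough sketch, assuming the columns are always named min_s followed by a number; the helper names cols and empty are only illustrative.

```r
library(data.table)

dt <- data.table(asset  = c("x1", "x2", "x3", "x4", "x5"),
                 min_s1 = c(.1, NA, NA, .1, NA),
                 min_s2 = c(NA, .5, .5, NA, NA),
                 min_s3 = c(.15, NA, NA, NA, .15))

# Build the column names from the index, then subset on each one
cols <- paste0("min_s", 1:3)
empty <- lapply(cols, function(col) dt[is.na(dt[[col]])])
names(empty) <- paste0("empty", 1:3)

empty$empty1   # rows where min_s1 is NA
```

If separate objects are really needed, base R's assign() can create them by name, although a named list is usually easier to pass around.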
Hope that helps.
I have a list of data frame names, which I got using objects().
my_list <- objects()
my_list
[1]"df1"
[2]"df2"
[3]"df3"
[4]"df4"
...
Each data frame has 7 columns.
I have 3 different character vectors v1, v2, v3 (each of length 4) that I want to use to name the first 4 columns of the data frames. I basically want to reuse those vectors in that order until all the data frames in my list are named.
IMPORTANT: I want to use the 3 vectors to name all the data frames. v1 to name df1, v2 to name df2, v3 to name df3, v1 AGAIN to name df4, etc...
df1
X1 X2 X3 X4 X5 X6 X7
1 NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA
v1 <- c("a","b","c","d")
magic(my_list)
df1
a b c d X5 X6 X7
1 NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA
...
setnames from the data.table package will work (it renames columns by reference, so no reassignment is needed).
setnames(df,old = c(1:4), new = v1[1:4])
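edit: If you want to do that for your entire list, you can use lapply. Here l is assumed to be a list of the data frames themselves, not just their names (for example, built with mget(my_list)).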
lapply(l, function(x) setnames(x,old = c(1:4), new = v1[1:4]))
edit2: Recycling the 3 vectors, and keeping it a bit easy to read -
for (i in seq_along(l)) {
  if (i %% 3 == 1) {
    setnames(l[[i]], old = c(1:4), new = v1[1:4])
  } else if (i %% 3 == 2) {
    setnames(l[[i]], old = c(1:4), new = v2[1:4])
  } else {
    setnames(l[[i]], old = c(1:4), new = v3[1:4])
  }
}
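The if/else chain can also be replaced by putting the name vectors in a list and cycling through it with modular arithmetic. A minimal base R sketch; the data frames and the contents of v2 and v3 are made-up stand-ins, since the question only shows v1.

```r
# Hypothetical stand-ins for the data frames and name vectors in the question
l <- replicate(4, as.data.frame(matrix(NA, nrow = 3, ncol = 7)),
               simplify = FALSE)
v1 <- c("a", "b", "c", "d")
v2 <- c("e", "f", "g", "h")
v3 <- c("i", "j", "k", "l")
new_names <- list(v1, v2, v3)

for (i in seq_along(l)) {
  # (i - 1) %% 3 + 1 cycles through 1, 2, 3, 1, 2, 3, ...
  names(l[[i]])[1:4] <- new_names[[(i - 1) %% length(new_names) + 1]]
}

names(l[[4]])   # the fourth data frame reuses v1
```

Adding a fourth name vector then only requires extending new_names, not adding another branch.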
Say you have a dataframe of four columns:
dat <- data.frame(A = rnorm(5), B = rnorm(5), C = rnorm(5), D = rnorm(5))
And you want to insert an empty column between each of the columns in the dataframe, so that the output is:
A A1 B B1 C C1 D D1
1 1.15660588 NA 0.78350197 NA -0.2098506 NA 2.07495662 NA
2 0.60107853 NA 0.03517539 NA -0.4119263 NA -0.08155673 NA
3 0.99680981 NA -0.83796981 NA 1.2742644 NA 0.67469277 NA
4 0.09940946 NA -0.89804952 NA 0.3419173 NA -0.95347049 NA
5 0.28270734 NA -0.57175554 NA -0.4889045 NA -0.11473839 NA
How would you do this?
The dataframe I would like to do this operation to has hundreds of columns and so obviously I don't want to type out each column and add them naively like this:
dat$A1 <- NA
dat$B1 <- NA
dat$C1 <- NA
dat$D1 <- NA
dat <- dat[, c("A", "A1", "B", "B1", "C", "C1", "D", "D1")]
Thanks for your help in advance!
You can try
res <- data.frame(dat, dat*NA)[order(rep(names(dat),2))]
res
# A A.1 B B.1 C C.1 D D.1
#1 1.15660588 NA 0.78350197 NA -0.2098506 NA 2.07495662 NA
#2 0.60107853 NA 0.03517539 NA -0.4119263 NA -0.08155673 NA
#3 0.99680981 NA -0.83796981 NA 1.2742644 NA 0.67469277 NA
#4 0.09940946 NA -0.89804952 NA 0.3419173 NA -0.95347049 NA
#5 0.28270734 NA -0.57175554 NA -0.4889045 NA -0.11473839 NA
NOTE: I am leaving the . in the column names as it is a trivial task to remove it.
Or another option is
dat[paste0(names(dat),1)] <- NA
dat[order(names(dat))]
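Both snippets above order the result by column name, which works here because A, B, C, D are already in alphabetical order. With hundreds of arbitrarily named columns that may not hold, so here is a hedged variant that interleaves by position instead. The column names B, A, C are deliberately out of order, and the "1" suffix follows the naming in the question.

```r
set.seed(1)
dat <- data.frame(B = rnorm(3), A = rnorm(3), C = rnorm(3))

# A copy of dat with every cell set to NA and suffixed names
blank <- dat
blank[] <- NA
names(blank) <- paste0(names(dat), "1")

# order(rep(seq_along(dat), 2)) gives 1, 4, 2, 5, 3, 6:
# each original column is followed by its empty partner
res <- cbind(dat, blank)[order(rep(seq_along(dat), 2))]
names(res)
```

Because the ordering uses column positions, the original column order is preserved whatever the names are.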
You can try this:
df <- cbind(dat, dat)
df <- df[, sort(names(df))]
df[, seq(2, ncol(df), by = 2)] <- NA
names(df) <- sub("\\.", "", names(df))
# create new data frame with twice the number of columns
bigdat <- data.frame(matrix(ncol = dim(dat)[2]*2, nrow = dim(dat)[1]))
# set sequence of target column indices
inds <- seq(1,dim(bigdat)[2],by=2)
# insert values
bigdat[,inds] <- dat
# set column names
colnames(bigdat)[inds] <- colnames(dat)
I have two lists (from a multi-wave survey) that look like this:
X1 X2
1 NA
NA 2
NA NA
How can I easily combine these into a third column that always takes the non-NA value of X1 or X2, and is NA when both values are NA?
Combining Gavin's use of within and Prasad's use of ifelse gives us a simpler answer.
within(df, x3 <- ifelse(is.na(x1), x2, x1))
Multiple ifelse calls are not needed - when both values are NA, you can just take one of the values directly.
Another way using ifelse:
df <- data.frame(x1 = c(1, NA, NA, 3), x2 = c(NA, 2, NA, 4))
> df
x1 x2
1 1 NA
2 NA 2
3 NA NA
4 3 4
> transform(df, x3 = ifelse(is.na(x1), ifelse(is.na(x2), NA, x2), x1))
x1 x2 x3
1 1 NA 1
2 NA 2 2
3 NA NA NA
4 3 4 3
This needs a little extra finessing due to the possibility of both X1 and X2 being NA, but this function can be used to solve your problem:
foo <- function(x) {
  if(all(nas <- is.na(x))) {
    NA
  } else {
    x[!nas]
  }
}
We use the function foo by applying it to each row of your data (here I have your data in an object named dat):
> apply(dat, 1, foo)
[1] 1 2 NA
So this gives us what we want. To include this inside your object, we do this:
> dat <- within(dat, X3 <- apply(dat, 1, foo))
> dat
X1 X2 X3
1 1 NA 1
2 NA 2 2
3 NA NA NA
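One thing to watch with foo: if a row has valid values in both columns, it returns two values and apply() then gives back a list instead of a vector. A small sketch of a variant that always returns a single value per row; first_non_na is a made-up name, and it assumes the first column should win when both are present.

```r
dat <- data.frame(X1 = c(1, NA, NA, 3), X2 = c(NA, 2, NA, 4))

# Take the first non-NA value in the row; indexing an empty vector
# with [1] yields NA, so all-NA rows are handled as well
first_non_na <- function(x) x[!is.na(x)][1]

dat$X3 <- apply(dat, 1, first_non_na)
dat
```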
You didn't say what you wanted done when both were valid numbers, but you can use either pmax or pmin with the na.rm argument:
pmax(df$x1, df$x2, na.rm=TRUE)
# [1] 1 2 NA 4
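The approaches in this thread only differ on rows where both values are present. A short comparison on the four-row example from the ifelse answer, as a sketch of how to check which behaviour suits your data:

```r
df <- data.frame(x1 = c(1, NA, NA, 3), x2 = c(NA, 2, NA, 4))

prefer_x1 <- ifelse(is.na(df$x1), df$x2, df$x1)   # x1 wins when both exist
larger    <- pmax(df$x1, df$x2, na.rm = TRUE)     # larger value wins
smaller   <- pmin(df$x1, df$x2, na.rm = TRUE)     # smaller value wins

data.frame(df, prefer_x1, larger, smaller)
```

All three agree on the first three rows; only the last row (3 and 4) separates them.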