Subset using column name defined using paste0 and a variable

I have a data.table like this:
dt <- data.table(asset=c("x1","x2","x3","x4","x5"),
min_s1=c(.1,NA,NA,.1,NA),
min_s2=c(NA,.5,.5,NA,NA),
min_s3=c(.15,NA,NA,NA,.15))
I can manually subset on the NA values as follows, which gives me the output I want:
empty1 <- dt[is.na(min_s1)]
empty2 <- dt[is.na(min_s2)]
empty3 <- dt[is.na(min_s3)]
But, what I really need to do is subset dynamically using the column name and also name the result incorporating the i variable in a loop. The loop is important because this will ultimately be used in a parallel computing script. I would like something like this (it doesn't work; just showing what I am looking for):
foreach (i in 1:3) %do% {
empty(i) <- dt[is.na(min_s(i))]
}
I tried using the following as well as many of its variations to no avail:
paste0("empty",i) <- dt[is.na(paste0("min_s",i))]
Any ideas how I could accomplish this?
I use a Windows 7 pc.
Thanks.

We can loop over the 'min' columns with lapply and subset the dataset for each one:
lapply(dt[, -1, with = FALSE], function(x) dt[is.na(x)])
#$min_s1
# asset min_s1 min_s2 min_s3
#1: x2 NA 0.5 NA
#2: x3 NA 0.5 NA
#3: x5 NA NA 0.15
#$min_s2
# asset min_s1 min_s2 min_s3
#1: x1 0.1 NA 0.15
#2: x4 0.1 NA NA
#3: x5 NA NA 0.15
#$min_s3
# asset min_s1 min_s2 min_s3
#1: x2 NA 0.5 NA
#2: x3 NA 0.5 NA
#3: x4 0.1 NA NA
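If you do want separate objects named empty1, empty2 and empty3, as in the question, a minimal sketch is to build each name with paste0() and hand it to assign(), using get() to look the column up by name inside dt. This assumes a plain sequential loop run at the top level, since assign() writes into the environment it is called from; inside parallel workers (e.g. foreach with %dopar%) such objects would not reach your main session, which is why returning a list, as above, is usually the safer pattern.

```r
library(data.table)

dt <- data.table(asset  = c("x1", "x2", "x3", "x4", "x5"),
                 min_s1 = c(.1, NA, NA, .1, NA),
                 min_s2 = c(NA, .5, .5, NA, NA),
                 min_s3 = c(.15, NA, NA, NA, .15))

for (i in 1:3) {
  col <- paste0("min_s", i)                        # "min_s1", "min_s2", ...
  # get(col) resolves to the column with that name inside dt[...]
  assign(paste0("empty", i), dt[is.na(get(col))])  # creates empty1, empty2, ...
}

empty1$asset
# [1] "x2" "x3" "x5"
```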

I hope I've understood your question correctly, so try the following:
library(data.table)
dt <- data.table(asset=c("x1","x2","x3","x4","x5"),
min_s1=c(.1,NA,NA,.1,NA),
min_s2=c(NA,.5,.5,NA,NA),
min_s3=c(.15,NA,NA,NA,.15))
vec_store <- c()
empty <- list()
# every column except "asset"
cols <- names(dt)[!grepl("asset", names(dt))]
for(i in cols){
  vec_store <- dt[is.na(dt[, get(i)])]   # rows where column i is NA
  empty[[i]] <- vec_store                # store them under the column's name
}
This gives you:
> empty
$min_s1
asset min_s1 min_s2 min_s3
1: x2 NA 0.5 NA
2: x3 NA 0.5 NA
3: x5 NA NA 0.15
$min_s2
asset min_s1 min_s2 min_s3
1: x1 0.1 NA 0.15
2: x4 0.1 NA NA
3: x5 NA NA 0.15
$min_s3
asset min_s1 min_s2 min_s3
1: x2 NA 0.5 NA
2: x3 NA 0.5 NA
3: x4 0.1 NA NA
In your code, you can't use paste0("empty", i) on the left-hand side of <- to create objects. I find the easiest way around this is to create an empty vector (or data.table), then store each result in a list. Once the results are in the list you can operate on the list as a whole, or pull individual elements out of it. That is why I created vec_store as an empty vector and empty as an empty list.
Hope that helps.

Related

Set values greater than index to be NA, per row

In R, I have a data frame df and a vector invalidAfterIndex. For the ith row, I would like to set all elements whose column index is greater than invalidAfterIndex[i] to NA. For example:
> df <- data.frame(matrix(rnorm(20), nrow=5))
> df
X1 X2 X3 X4
1 2.124042819 -0.2862224 0.1686977 2.14838198
2 0.777763004 0.2949123 -0.4331421 -0.81278586
3 -0.003226624 -0.2326152 -1.5779695 -1.23193913
4 0.165975919 -0.1879981 -0.8214994 -1.40267202
5 1.299195865 -0.9418217 -1.5302512 0.03164781
> invalidAfterIndex <- c(2,3,1,4,1)
I would like to have:
X1 X2 X3 X4
1 2.124042819 -0.2862224 NA NA
2 0.777763004 0.2949123 -0.4331421 NA
3 -0.003226624 NA NA NA
4 0.165975919 -0.1879981 -0.8214994 -1.40267202
5 1.299195865 NA NA NA
How can I do this without a for loop?

You can do
is.na(df) <- col(df) > invalidAfterIndex
Or, as @digEmAll suggested:
df[col(df) > invalidAfterIndex] <- NA

Here is an option with Map, which passes each column together with its position to a function; replace() then sets to NA every element whose column position exceeds invalidAfterIndex for that row:
df[] <- Map(function(col, index) replace(col, index > invalidAfterIndex, NA), df, seq_along(df))
df
# X1 X2 X3 X4
#1 2.124042819 -0.2862224 NA NA
#2 0.777763004 0.2949123 -0.4331421 NA
#3 -0.003226624 NA NA NA
#4 0.165975919 -0.1879981 -0.8214994 -1.402672
#5 1.299195865 NA NA NA
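To see why the comparison lines up with rows, here is a small sketch using deterministic values (1:20 instead of rnorm(), chosen only so the result is checkable). col(df)[i, j] equals j, and the length-5 invalidAfterIndex is recycled down each column, so cell [i, j] is compared with invalidAfterIndex[i].

```r
df <- data.frame(matrix(1:20, nrow = 5))   # columns X1..X4, integer values
invalidAfterIndex <- c(2, 3, 1, 4, 1)

# mask[i, j] is TRUE when column j lies beyond the cut-off for row i
mask <- col(df) > invalidAfterIndex
df[mask] <- NA
df
#   X1 X2 X3 X4
# 1  1  6 NA NA
# 2  2  7 12 NA
# 3  3 NA NA NA
# 4  4  9 14 19
# 5  5 NA NA NA
```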

Combining data frames into one data frame and keep empty data frame as NA

Here is my code:
df1 <- data.frame(Intercept = .4, x1=.4, x2=.2, x3=.7)
df2 <- data.frame(Interceptlego = .5,x2=.8)
df3 <- data.frame()
myList <- list(df1, df2, df3)
library(plyr)  # rbind.fill() comes from plyr
do.call(rbind.fill, myList)
How can I rbind df3 into the single data frame as a row of NAs?
I found an article on R-bloggers about combining vectors or data frames of unequal length into one data frame: http://www.r-bloggers.com/r-combining-vectors-or-data-frames-of-unequal-length-into-one-data-frame/
But some of the data frames I get from my data are empty; printing them shows "<0 rows> (or 0-length row.names)".
What I want to end up with is this:
Intercept x1 x2 x3 Interceptlego
1 0.4 0.4 0.2 0.7 NA
2 NA NA 0.8 NA 0.5
3 NA NA NA NA NA

I don't really follow your logic. An empty data.frame doesn't contain any observations and should not result in a row after rbinding.
Anyway, assuming you know a column name that occurs at least once:
myList <- lapply(myList, function(df) {
if (!ncol(df)) df <- data.frame(Intercept = NA)
df
})
library(data.table)
rbindlist(myList, fill = TRUE)
# Intercept x1 x2 x3 Interceptlego
#1: 0.4 0.4 0.2 0.7 NA
#2: NA NA 0.8 NA 0.5
#3: NA NA NA NA NA
Call setDF() on the result afterwards if you would rather have a plain data.frame than a data.table.
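If you would rather avoid both plyr and data.table, a base-R sketch is to pad every data frame to the full set of column names before calling rbind(). It assumes, as the question asks, that each empty data frame should contribute exactly one all-NA row; the names all_cols and padded are only illustrative.

```r
df1 <- data.frame(Intercept = .4, x1 = .4, x2 = .2, x3 = .7)
df2 <- data.frame(Interceptlego = .5, x2 = .8)
df3 <- data.frame()
myList <- list(df1, df2, df3)

# every column name that appears in any data frame, in first-seen order
all_cols <- unique(unlist(lapply(myList, names)))

padded <- lapply(myList, function(d) {
  if (nrow(d) == 0) {
    # an empty data frame becomes a single row of NAs
    d <- as.data.frame(setNames(rep(list(NA), length(all_cols)), all_cols))
  }
  # add any columns this data frame lacks, filled with NA
  for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA
  d[all_cols]   # same column order everywhere so rbind() lines them up
})

result <- do.call(rbind, padded)
result
```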

R autoKrige loop (package automap)

I have a data frame with daily observations, which I would like to interpolate. I use automap to build a variogram for every day and then apply it to new data. I run a loop and put the results in a new data frame. Unfortunately, the data frame with the results only contains the last predicted day.
coordinates(mydata) <- ~lat+long
coordinates(new_data) <- ~lat+long
df <- data.frame(matrix(nrow=50,ncol=10)) # new data frame for predicted values
for(i in 1:ncol(mydata))
kriging_new <- autoKrige(mydata[,i],mydata,newdata)
pred <- kriging_new$krige_output$var1.pred
df[,i] <- data.frame(pred)
The result looks like this, all the columns should be filled with values, not just the last one:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 NA NA NA NA NA NA NA NA NA 12.008726
2 NA NA NA NA NA NA NA NA NA 6.960499
3 NA NA NA NA NA NA NA NA NA 10.894787
4 NA NA NA NA NA NA NA NA NA 14.378945
5 NA NA NA NA NA NA NA NA NA 17.719522
I also get a warning, saying:
Warning message:
In autofitVariogram(formula, data_variogram, model = model, kappa = kappa, :
Some models where removed for being either NULL or having a negative sill/range/nugget,
set verbose == TRUE for more information
If I do autoKrige manually for each row, everything works fine. It seems the loop is not working as it usually does. Is this some problem in the automap package?
Thanks a lot!

I think you just forgot to enclose the body of your for loop in curly brackets. As a result, only the first statement is inside the loop: it runs 10 times, overwriting kriging_new on every iteration:
for(i in 1:ncol(mydata))
kriging_new <- autoKrige(mydata[,i],mydata,newdata)
Only then do you assign the result from your last iteration:
pred <- kriging_new$krige_output$var1.pred
and finally assign those to the last column of your data frame holding your predictions (the loop counter i is still set to 10 at this point):
df[, i] <- data.frame(pred)
Always wrap a loop body that contains more than one statement in curly brackets, like this:
for (condition) {
statement1
statement2
...
}
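The behaviour is easy to reproduce without automap. In this minimal sketch, res <- i * 10 is a hypothetical stand-in for the autoKrige() call and the pred extraction; only the braced version fills every column.

```r
n <- 3

out_bad <- data.frame(matrix(NA, nrow = 2, ncol = n))
for (i in 1:n)
  res <- i * 10            # only this line belongs to the loop
out_bad[, i] <- res        # runs once, after the loop, when i == n

out_good <- data.frame(matrix(NA, nrow = 2, ncol = n))
for (i in 1:n) {
  res <- i * 10
  out_good[, i] <- res     # runs on every iteration
}
```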

R Loop Script to Create Many, Many Variables

I want to create a lot of variables across several separate dataframes which I will then combine into one grand data frame.
Each sheet is labeled by a letter (there are 24) and each sheet contributes somewhere between 100 and 200 variables. I could write it as such:
a$variable1 <- NA
a$variable2 <- NA
.
.
.
w$variable25 <- NA
This can/will get ugly, and I'd like to write a loop or use a vector to do the work. I'm having a heck of a time doing it though.
I essentially need a script which will allow me to specify a form and then just tack numbers onto it.
So,
a$variable[i] <- NA
where [i] gets tacked onto the actual variable created.

I just learnt this neat little trick from @eddi:
# create a small example dataset with 3 columns
library(data.table)
a <- data.table(
a1 = c(1,5),
a2 = c(2,1),
a3 = c(3,4)
)
# assuming that you now need to add more columns, from a4 to a200:
# first, create the sequence from 4 to 200
v <- 4:200
# then use that sequence to add 197 more columns, all set to NA
a[, paste0("a", v) := NA]
# now a has 200 columns, compared with the three we started with
dim(a)
#[1] 2 200

I don't think you actually need this, although you seem to think so for some reason.
Maybe something like this:
a <- as.data.frame(matrix(NA, ncol=10, nrow=5))
names(a) <- paste0("Variable", 1:10)
print(a)
# Variable1 Variable2 Variable3 Variable4 Variable5 Variable6 Variable7 Variable8 Variable9 Variable10
# 1 NA NA NA NA NA NA NA NA NA NA
# 2 NA NA NA NA NA NA NA NA NA NA
# 3 NA NA NA NA NA NA NA NA NA NA
# 4 NA NA NA NA NA NA NA NA NA NA
# 5 NA NA NA NA NA NA NA NA NA NA

If you want variables with different types:
p <- 10 # number of variables
N <- 100 # number of records
vn <- vector(mode="list", length=p)
names(vn) <- paste0("V", seq(p))
vn[1:8] <- NA_real_ # numeric
vn[9:10] <- NA_character_ # character
df <- as.data.frame(lapply(vn, function(x, n) rep(x, n), n=N))
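For the original setup of many lettered sheets, a base-R sketch is to keep the sheets in a named list and add the whole block of NA columns to each one in a single assignment. The sheet names a and b, the id column and n_vars are hypothetical stand-ins for your data.

```r
# hypothetical stand-ins for the lettered sheets
sheets <- list(a = data.frame(id = 1:3), b = data.frame(id = 1:2))

n_vars   <- 5
new_cols <- paste0("variable", seq_len(n_vars))   # "variable1" ... "variable5"

sheets <- lapply(sheets, function(d) {
  d[new_cols] <- NA   # creates all the new columns at once
  d
})

sapply(sheets, ncol)
# a b 
# 6 6 
```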

Combining Survey Items in R/ Recoding NAs

I have two lists (from a multi-wave survey) that look like this:
X1 X2
1 NA
NA 2
NA NA
How can I easily combine this into a third item, where the third column always takes the non-NA value of column X1 or X2, and codes NA when both values are NA?

Combining Gavin's use of within and Prasad's use of ifelse gives us a simpler answer.
within(df, x3 <- ifelse(is.na(x1), x2, x1))
Nested ifelse calls are not needed: when x1 is NA, x2 is returned as is, so a row where both are NA simply gets NA.

Another way using ifelse:
df <- data.frame(x1 = c(1, NA, NA, 3), x2 = c(NA, 2, NA, 4))
> df
x1 x2
1 1 NA
2 NA 2
3 NA NA
4 3 4
> transform(df, x3 = ifelse(is.na(x1), ifelse(is.na(x2), NA, x2), x1))
x1 x2 x3
1 1 NA 1
2 NA 2 2
3 NA NA NA
4 3 4 3

This needs a little extra finesse because both X1 and X2 can be NA, but the following function solves your problem:
foo <- function(x) {
if(all(nas <- is.na(x))) {
NA
} else {
x[!nas]
}
}
We use the function foo by applying it to each row of your data (here I have your data in an object named dat):
> apply(dat, 1, foo)
[1] 1 2 NA
So this gives us what we want. To include this inside your object, we do this:
> dat <- within(dat, X3 <- apply(dat, 1, foo))
> dat
X1 X2 X3
1 1 NA 1
2 NA 2 2
3 NA NA NA

You didn't say what you wanted done when both were valid numbers, but you can use either pmax or pmin with the na.rm argument:
pmax(df$x1, df$x2, na.rm=TRUE)
# [1] 1 2 NA 4
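The pairwise ifelse idea extends to more than two waves by folding it over the columns with Reduce(). In this sketch the helper name coalesce_cols, the third wave x3 and the output column combined are hypothetical; each later column only fills positions that are still NA, so earlier waves take priority. If you already use dplyr, dplyr::coalesce() offers the same behaviour.

```r
coalesce_cols <- function(...) {
  # keep the first non-NA value across the supplied vectors, position by position
  Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), list(...))
}

df <- data.frame(x1 = c(1, NA, NA, 3),
                 x2 = c(NA, 2, NA, 4),
                 x3 = c(NA, NA, 7, 9))   # hypothetical third wave
df$combined <- coalesce_cols(df$x1, df$x2, df$x3)
df$combined
# [1] 1 2 7 3
```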
