Subscript with matrix generated by assign() - r

I assigned a matrix to a name which varies with j:
j <- 2L
assign(paste0("pca", j,".FAVAR_fcst", sep=""), matrix(ncol=24, nrow=12))
This works very neat. Then I try to access a column of that matrix
paste0("pca", j,".FAVAR_fcst", sep="")[,2]
and get the following error:
Error in paste0("pca", j, ".FAVAR_fcst", sep = "")[, 2] :
incorrect number of dimensions
I've tried several variations and combinations with cat(), print() and capture.output(), but nothing seems to work. I'm not sure what I have to search exactly for and couldn't find a solution. Can you help me?

You can use get :
get(paste0("pca", j,".FAVAR_fcst", sep="")) # for the matrix
get(paste0("pca", j,".FAVAR_fcst", sep=""))[,2] # for the column
# [1] NA NA NA NA NA NA NA NA NA NA NA NA
An other solution would be to combine eval and as.symbol :
eval(as.symbol(paste0("pca", j,".FAVAR_fcst", sep="")))[,2]
# [1] NA NA NA NA NA NA NA NA NA NA NA NA

Related

How do I apply a formula to each value in a data frame?

I've created a formula that calculates the exponential moving average of data:
myEMA <- function(price, n) {
ema <- c()
data_start <- which(!is.na(price))[1]
ema[1:data_start+n-2] <- NA
ema[data_start+n-1] <- mean(price[data_start:(data_start+n-1)])
beta <- 2/(n+1)
for(i in (data_start+n):length(price)) {
ema[i] <- beta*price[i] +
(1-beta)*ema[i-1]
}
ema <- reclass(ema,price)
return(ema)
}
The data I'm using is:
pricesupdated <- data.frame(a = seq(1,100), b = seq(1,200,2), c = c(NA,NA,NA,seq(1,97)))
I would like to create a dataframe where I apply the formula to each variable in my above data.frame. My attempt was:
frameddata <- data.frame(myEMA(pricesupdated,12))
But the error message that I get is:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'mean': undefined columns selected
I'm able to print the answer that I want, but not create a dataframe...
Can you help me?
First of all myEMA() is a function, not a formula. Check out help("function") and help("formula") for details on what the distinction is.
The myEMA() function takes a numeric vector as its first argument and returns a numeric vector with the same dimensions as its first argument.
A data.frame object is bascially just a list of vectors with a special class attribute. The most common way to repeat a function call across each element in a list is to use one of the *apply family of functions. For example, you can use lapply(), which will calls myEMA once on each variable in pricesupdated and returns a list with one element per function call containing that function call's returned value (a numeric vector). This list can be easily converted back to data.frame() since all its elements have the same length:
results <- lapply(pricesupdated, myEMA, n = 12)
# look at the structure of the results object
> str(results)
List of 3
$ a: num [1:100] NA NA NA NA NA NA NA NA NA NA ...
$ b: num [1:100] NA NA NA NA NA NA NA NA NA NA ...
$ c: num [1:100] NA NA NA NA NA NA NA NA NA NA ...
frameddata <- as.data.frame(results)
# look at the top 15 records in this object
> head(frameddata, 15)
a b c
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 NA NA NA
12 6.5 12 NA
13 7.5 14 NA
14 8.5 16 NA
15 9.5 18 6.5
The question is likely a duplicate, ...
but the apply-family might help, e.g.
sapply(pricesupdated, myEMA, n=12)
for reproducibilty, it would be benificial to add require(pec)

Creating multiple columns in data table with a for loop

My data table looks like:
head(data)
Date AI AGI ADI ASI ARI ERI NVRI SRI FRI IRI
1: 1991-09-06 NA 2094.19 NA NA NA NA NA NA NA NA
2: 1991-09-13 NA 2204.94 NA NA NA NA NA NA NA NA
3: 1991-09-20 NA 2339.10 NA NA NA NA NA NA NA NA
4: 1991-09-27 NA 2387.81 NA NA NA NA NA NA NA NA
5: 1991-10-04 NA 2459.94 NA NA NA NA NA NA NA NA
6: 1991-10-11 NA 2571.07 NA NA NA NA NA NA NA NA
Don't worry about the NAs. What I want to do is make a "percentage change" column for each of the columns apart from date.
What I've done so far is:
names_no_date <- unique(names(data))[!unique(names(data)) %in% "Date"]
for (i in names_no_date){
data_ch <- data[, paste0(i, "ch") := i/shift(i, n = 1, type = "lag")-1]}
I get the error:
Error in i/shift(i, n = 1, type = "lag") :
non-numeric argument to binary operator
I'm wondering how I get around this error?
i is a string, so you are trying to divide a string in i/shift(i, n = 1, type = "lag"):
> "AI"/NA
Error in "AI"/NA : non-numeric argument to binary operator
Instead, do
for (i in names_no_date){
data[, paste0(i, "ch") := get(i)/shift(get(i), n = 1, type = "lag")-1]
}
Also see Referring to data.table columns by names saved in variables.
Edit: #Frank writes in the comments that a more concise way to produce OP's output is
data[, paste0(names_no_date, "_pch") := .SD/shift(.SD) - 1, .SDcols=names_no_date]

What does x[is.na(x)] do in R?

I'm following the swirl tutorial, and one of the parts has a vector x defined as:
> x
[1] 1.91177824 0.93941777 -0.72325856 0.26998371 NA NA
[7] -0.17709161 NA NA 1.98079386 -1.97167684 -0.32590760
[13] 0.23359408 -0.19229380 NA NA 1.21102697 NA
[19] 0.78323515 NA 0.07512655 NA 0.39457671 0.64705874
[25] NA 0.70421548 -0.59875008 NA 1.75842059 NA
[31] NA NA NA NA NA NA
[37] -0.74265585 NA -0.57353603 NA
Then when we type x[is.na(x)] we get a vector of all NA's
> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Why does this happen? My confusion is that is.na(x) itself returns a vector of length 40 with True or False in each entry of the vector depending on whether that entry is NA or not. Why does "wrapping" this vector with x[ ] suddenly subset to the NA's themselves?
This is called logical indexing. It's a very common and neat R idiom.
Yes, is.na(x) gives a boolean ("logical") vector of same length as your vector.
Using that logical vector for indexing is called logical indexing.
Obviously x[is.na(x)] accesses the vector of all NA entries in x, and is totally pointless unless you intend to reassign them to some other value, e.g. impute the median (or anything else)
x[is.na(x)] <- median(x, na.rm=T)
Notes:
whereas x[!is.na(x)] accesses all non-NA entries in x
or compare also to the na.omit(x) function, which is way more clunky
The way R's builtin functions historically do (or don't) handle NAs (by default or customizably) is a patchwork-quilt mess, that's why the x[is.na(x)] idiom is so crucial)
many useful functions (mean, median, sum, sd, cor) are NA-aware, i.e. they support an na.rm=TRUE option to ignore NA values. See here. Also for how to define table_, mode_, clamp_

Opening csv of specific sequences: NAs come out of nowhere?

I feel like this is a relatively straightforward question, and I feel I'm close but I'm not passing edge-case testing. I have a directory of CSVs and instead of reading all of them, I only want some of them. The files are in a format like 001.csv, 002.csv,...,099.csv, 100.csv, 101.csv, etc which should help to explain my if() logic in the loop. For example, to get all files, I'd do something like:
id = 1:1000
setwd("D:/")
filenames = as.character(NULL)
for (i in id){
if(i < 10){
i <- paste("00",i,sep="")
}
else if(i < 100){
i <- paste("0",i,sep="")
}
filenames[[i]] <- paste(i,".csv", sep="")
}
y <- do.call("rbind", lapply(filenames, read.csv, header = TRUE))
The above code works fine for id=1:1000, for id=1:10, id=20:70 but as soon as I pass it id=99:100 or any sequence involving numbers starting at over 100, it introduces a lot of NAs.
Example output below for id=98:99
> filenames
098 099
"098.csv" "099.csv"
Example output below for id=99:100
> filenames
099
"099.csv" NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
"100.csv"
I feel like I'm missing some catch statement in my if() logic. Any insight would be greatly appreciated! :)
You can avoid the loop for creating the filenames
filenames <- sprintf('%03d.csv', 1:1000)
y <- do.call(rbind, lapply(filenames, read.csv, header = TRUE))
#akrun has given you a much better way of solving your task. But in terms of the actual issue with your code, the problem is that for i < 100 you subset by a character vector (implicitly converted using paste) while for i >= 100 you subset by an integer. When you use id = 99:100 this translates to:
filenames <- character(0)
filenames["099"] <- "099.csv" # length(filenames) == 1L
filenames[100] <- "100.csv" # length(filenames) == 100L, with all(filenames[2:99] == NA)
Assigning to a named member of a vector that doesn't yet exist will create a new member at position length(vector) + 1 whereas assigning to a numbered position that is > length(vector) will also fill in every intervening position with NA.
Another approach, although less efficient than #akrun's solution, is with the following function:
merged <- function(id = 1:332) {
df <- data.frame()
for(i in 1:length(id)){
add <- read.csv(sprintf('%03d.csv', id[i]))
df <- rbind(df,add)
}
df
}
Now, you can merge the files with:
dat <- merged(99:100)
Furthermore, you can assign columnnames by inserting the following line in the function just before the last line with df:
colnames(df) <- c(..specify the colnames in here..)

how can I add to columns in R

I cannot seem to add two columns in R.
when I try
dat$V1 + dat$V2
I get
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Warning message:
In Ops.factor(dat$V1, dat$V2) : + not meaningful for factors
lots of other questions suggest to do as I have done, however as you can see this does not work for me. what is the problem?
Try to convert your factor columns to numeric: If V1 and V2 are 1st two columns.
dat[,1:2] <- lapply(dat[,1:2], function(x) as.numeric(as.character(x)))
dat$V1 +dat$V2
For example:
dat <- data.frame(V1= factor(1:5), V2= factor(6:10))
dat$V1+dat$V2
#[1] NA NA NA NA NA
#Warning message:
#In Ops.factor(dat$V1, dat$V2) : + not meaningful for factors
dat[,1:2] <- lapply(dat[,1:2], function(x) as.numeric(as.character(x)))
dat$V1 +dat$V2
#[1] 7 9 11 13 15

Resources