Create data for portfolio in R

I have data in Excel. Suppose I read it like this (only one series is shown below):
ccl<-ts(mysheets$CCL$`Adj Close`,start=c(2000, 1), end=c(2012, 12), frequency=12)
ccl.r<-diff(log(ccl), lag=1)
Then, I combine all the return series into one matrix:
data<-cbind(aal.r, adm.r, aht.r, anto.r, arm.r, av.r, azn.r, ba.r, bab.r, barc.r, bats.r,bdev.r, bkg.r, blnd.r, blt.r, bnzl.r, bta.r, bznl.r, ccl.r)
Then, I try to convert the data into the format used by fPortfolio with:
ewSpec<-portfolioSpec()
nAssets<-ncol(data)
setWeights(ewSpec)<-rep(1/nAssets, times=nAssets)
mydata<-portfolioData(data=data, spec=portfolioSpec())
However, I get this error:
Error in portfolioData(data = data, spec = portfolioSpec()) :
object 'assetsNames' not found
In addition: Warning messages:
1: In if (class(data) == "timeSeries") { :
the condition has length > 1 and only the first element will be used
2: In if (class(data) == "list") { :
the condition has length > 1 and only the first element will be used

This was solved by converting the matrix into a "timeSeries" object. Thanks for reading the question.
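The warnings come from comparing class(data) with a single string: cbind() on ts objects returns a multivariate "mts" object whose class attribute has several entries, so class(data) == "timeSeries" is a logical vector rather than one TRUE/FALSE. The sketch below uses two made-up return series (a.r and b.r are hypothetical stand-ins for the Excel data) to show this, and, if the timeSeries package is installed, converts the matrix with timeSeries::as.timeSeries() in line with the fix described above. It is an illustration under those assumptions, not the poster's exact code.

```r
# Two hypothetical monthly log-return series standing in for the Excel data
set.seed(1)
a.r <- diff(log(ts(100 + cumsum(rnorm(25)), start = c(2000, 1), frequency = 12)))
b.r <- diff(log(ts(100 + cumsum(rnorm(25)), start = c(2000, 1), frequency = 12)))
data <- cbind(a.r, b.r)

# The class attribute has several entries (e.g. "mts", "ts", "matrix"),
# so `class(data) == "timeSeries"` yields a logical vector, not a single value
cls <- class(data)
print(cls)
print(class(data) == "timeSeries")

# inherits() always returns a single TRUE/FALSE
print(inherits(data, "mts"))
print(inherits(data, "timeSeries"))

# Convert to a timeSeries object before passing it to fPortfolio::portfolioData()
if (requireNamespace("timeSeries", quietly = TRUE)) {
  tsdata <- timeSeries::as.timeSeries(data)
  print(class(tsdata))
}
```

Using inherits() instead of == is the robust way to test an object's class whenever the class attribute may have more than one element.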


non-numeric argument to binary operator error while my data are numeric

I have the following code to filter my data, which comes from a large CSV file. Running it gives the error:
Error in gedi_table[, col_index] - gedi_table$digital_elevation_model_srtm :
non-numeric argument to binary operator
What is causing this, and how can I fix it?
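filter_gedi_table_2 <- function(gedi_table, alg_num) {
  ## Function filtering GEDI table by removing shots with erroneous ground elevation
  # gedi_table: data.frame
  # alg_num: numeric
  # Returns a data.frame
  # Ground-elevation column for the chosen algorithm (should match exactly one column)
  col_index <- grep(paste0('elev_lowestmode_a', alg_num), colnames(gedi_table))
  # Shots whose elevation differs from both reference DEMs by more than 100
  bad <- abs(gedi_table[, col_index] - gedi_table$digital_elevation_model_srtm) > 100 &
    abs(gedi_table[, col_index] - gedi_table$digital_elevation_model) > 100
  # Logical indexing instead of -which(): -integer(0) would drop every row;
  # %in% TRUE keeps rows whose comparison is NA, as which() did
  gedi_table_filter_2 <- gedi_table[!(bad %in% TRUE), ]
  return(gedi_table_filter_2)
}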
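The question does not show the data, but a common cause of this error is that one operand is not numeric, typically a column that read.csv() stored as character because some entries are not valid numbers. The sketch below uses a small hypothetical table (the values, including the "-" entry, are made up) to show how to check the column classes, reproduce the error, and convert the column with as.numeric().

```r
# Hypothetical GEDI-like table: the elevation column came in as character
# because one entry ("-") is not a valid number
gedi_table <- data.frame(
  elev_lowestmode_a1 = c("120.5", "98.2", "-"),
  digital_elevation_model_srtm = c(118, 300, 95),
  digital_elevation_model = c(119, 310, 96),
  stringsAsFactors = FALSE
)

# Check the class of every column before doing arithmetic
col_classes <- sapply(gedi_table, class)
print(col_classes)

# Subtracting from a character column reproduces the error
err <- tryCatch(
  gedi_table$elev_lowestmode_a1 - gedi_table$digital_elevation_model_srtm,
  error = function(e) conditionMessage(e)
)
print(err)

# Convert explicitly; unparseable entries become NA (warning suppressed here)
gedi_table$elev_lowestmode_a1 <- suppressWarnings(as.numeric(gedi_table$elev_lowestmode_a1))
diffs <- abs(gedi_table$elev_lowestmode_a1 - gedi_table$digital_elevation_model_srtm)
print(diffs)
```

If the real file uses a specific placeholder for missing values, passing it to read.csv() through na.strings avoids the character column in the first place.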

R: object with negative row.name value

I think I have the same issue as in this question: What's the difference between row.names() and attributes$row.names?
When I call dput() on the object now, I get something like this (end of the output):
-0.0120067403271522, -0.00712477902137182, -0.0105058179972997,
-0.0115956365572667, -0.00507521571067687, -0.013870827853567,
-0.0160501419238977, -0.00225243465241482, -0.0145865320678265,
-0.00118232647592066, -0.0190385732141539, 0.0108223868283294,
-0.0159300331503545, 0.0319315053338279, 0, 0.00315703437341087,
0.0368045045454188, -0.0276264287281491, -0.0101235678857984,
0.00486601316019395)), class = "data.frame", row.names = c(NA,
-11834L))
I discovered this while trying to set the row names explicitly with rownames(var) <- c(list_of_row_names).
I get the error:
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
The object does contain values. Can anyone tell me how to fix this?
As I understand it, this happened because R did not know the row names when the object was created?
The length of list_of_row_names does not match nrow() of the data frame.
See the example below:
df <- data.frame(1:5)
list_of_row_names <- letters[1:4]
rownames(df) <- list_of_row_names
Error in row.names<-.data.frame(*tmp*, value = value) :
invalid 'row.names' length
nrow(df)
#[1] 5
length(list_of_row_names)
# [1] 4
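The negative value in row.names = c(NA, -11834L) is not an error in itself: it is R's compact encoding of automatic row names 1:n, so that data frame has 11834 rows and simply no explicit names. A minimal base-R sketch that inspects this internal form with .row_names_info() and only assigns names when the lengths agree; set_row_names_safely() is a hypothetical helper written for this example.

```r
df <- data.frame(x = 1:5)

# Internal form of automatic row names: a negative count means "automatic"
print(.row_names_info(df, type = 0L))      # compact c(NA, -5) form
auto_n <- .row_names_info(df, type = 1L)   # -5: automatic row names for 5 rows
n_rows <- .row_names_info(df, type = 2L)   # 5: number of rows

# Hypothetical helper: only assign row names when the lengths agree
set_row_names_safely <- function(d, nms) {
  if (length(nms) != nrow(d)) {
    stop(sprintf("got %d names for %d rows", length(nms), nrow(d)))
  }
  rownames(d) <- nms
  d
}

bad <- tryCatch(set_row_names_safely(df, letters[1:4]),
                error = function(e) conditionMessage(e))
print(bad)

df <- set_row_names_safely(df, letters[1:5])
print(rownames(df))
```

Checking length(list_of_row_names) against nrow(var) before the assignment points directly at which side is wrong.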

PCA with result non-interactively in R

I would like to run a PCA in R with the ade4 package.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I run:
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2   # number of axes typed at the interactive prompt
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what this means. Does anyone have an idea?
Thank you so much
I'd suggest sharing a data set or example others can access, if possible. This seems data-specific, and given the "NAs introduced by coercion" warning you may want to check the types of your input columns, e.g. with sapply(PAYSAGE, class) (typeof(PAYSAGE) returns "list" for any data frame). The manual for dudi.pca states that it takes a data frame of numeric values as input.
Yes, for example:
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)
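By default dudi.pca() asks interactively how many axes to keep (scannf = TRUE). The NA in nf suggests the value read at that prompt was not a valid integer, for example when the script is run or pasted as a block so that the typed 2 never reaches the prompt; this is a likely explanation, not a confirmed one. Passing scannf = FALSE with an explicit nf, as the hint in the error output suggests, avoids the prompt. A sketch, where nf = 2 is an assumed choice of axes:

```r
# PAYSAGE data from the question (copied so the block runs on its own)
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el <- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE <- data.frame(ag_div, canne, prairie_el, sol_nu, urb_peu_d, urb_den, veg_arbo, veg_arbu, eau)

# Every column should be numeric for dudi.pca()
all_numeric <- all(vapply(PAYSAGE, is.numeric, logical(1)))
print(all_numeric)

if (requireNamespace("ade4", quietly = TRUE)) {
  # scannf = FALSE skips the interactive prompt; nf = 2 keeps two axes
  ACP <- ade4::dudi.pca(PAYSAGE, scannf = FALSE, nf = 2)
  print(ACP$eig)
}
```

Inspecting ACP$eig (or a scree plot of it) first and then choosing nf is a non-interactive substitute for the prompt.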

Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”

I am facing the error below, which is related to NULL values being coerced to a data frame. The data set does contain nulls; I have tried both is.na() and is.null() to replace them with something else. The data are stored on HDFS in pig.hive format. The code is below. It works fine if I remove v[,25] from the key.
Code:
AM <- c("AN")
UK <- c("PP")
sample.map <- function(k, v) {
  # Keep only rows whose first column (account id) is not missing
  keep <- !is.na(v[, 1])
  key <- data.frame(acc = v[keep, 1],
                    year = substr(v[keep, 2], 1, 4),
                    # assuming dates formatted like "2014-03-14" as in the sample data
                    month = substr(v[keep, 2], 6, 7))
  # Filter the values with the same rows so key and value have equal length
  value <- data.frame(v[keep, 3], count = 1)
  keyval(key, value)
}
sample.reduce <- function(key, v) {
  # Sum the counts for the "AN" and "PP" category codes
  AT <- sum(v[v[, 1] %in% AM, 2])
  UnknownT <- sum(v[v[, 1] %in% UK, 2])
  Total <- AT + UnknownT
  d <- data.frame(AT, UnknownT, Total)
  keyval(key, d)
}
out <- mapreduce(input = "/user/hduser/input",
                 output = "/user/hduser/output",
                 input.format = make.input.format("pig.hive", sep = "\u0001"),
                 output.format = make.output.format("csv", sep = ","),
                 map = sample.map,
                 reduce = sample.reduce)
Error:
Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) : number of items to replace is not a multiple of replacement length
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) : number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) :
no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as
No traceback available
UPDATE
I have added the sample data and edited the code above. Hope this helps!
Sample Data:
NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"
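Before running on Hadoop, the map and reduce logic can be checked locally on these sample rows. A minimal base-R sketch, assuming the three columns are an account id, a date and a category code, and that the literal NULL should be read as missing (via na.strings); it does not use rmr2 and only mimics the key/value construction and the per-category totals.

```r
# Sample rows from the question; the literal NULL marks a missing account id
txt <- 'NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"'

v <- read.csv(text = txt, header = FALSE, na.strings = "NULL",
              col.names = c("acc", "date", "type"), stringsAsFactors = FALSE)

# Map step: drop rows with a missing id and build key and value from the same rows
keep <- !is.na(v$acc)
key <- data.frame(acc = v$acc[keep],
                  year = substr(v$date[keep], 1, 4),
                  month = substr(v$date[keep], 6, 7))
value <- data.frame(type = v$type[keep], count = 1)

# Reduce step (here over all rows at once): totals per category code
AT <- sum(value$count[value$type %in% "AN"])
UnknownT <- sum(value$count[value$type %in% "PP"])
print(data.frame(AT, UnknownT, Total = AT + UnknownT))
```

Keeping key and value filtered by the same logical vector is what prevents the "not a multiple of" length warnings in this local version.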
There are some issues with as() that have been identified recently. I don't see why as() can't handle this by default, but you can define an S4 method for coerce, the generic that as() uses for conversions, so that it calls as.data.frame():
setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from))
[1] "coerce"
as(NULL,"data.frame")
data frame with 0 columns and 0 rows

Reading large fixed format text file in r

I am trying to read a large (> 70 MB) fixed-format text file into R. For a smaller file (< 1 MB), I can use the read.fwf() function as shown below.
condodattest1a <- read.fwf(impfile1,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
When I try to run the line of code below,
condodattest1 <- read.fwf(impfile,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
I get the following error message:
Error: cannot allocate vector of size 2 Kb
The only difference between the two lines is the input file, and hence its size.
The layout of the file I want to import is described in the data frame testcsv3. A small snippet of it is shown below:
> head(testcsv3)
Varlen Varname Varclass Varsep Varforfmt
1 2 "V1" "character" 2 "A2.0"
2 15 "V2" "character" 17 "A15.0"
3 28 "V3" "character" 45 "A28.0"
4 3 "V4" "character" 48 "F3.0"
5 1 "V5" "character" 49 "A1.0"
6 3 "V6" "character" 52 "A3.0"
At least part of my problem is that I am reading in all the data as factors when I use read.fwf() and I end up exceeding the memory limit on my computer.
I tried to use read.table() as a way of setting the type of each variable, but it seems that function needs a field delimiter. Section 3.3 of the page linked below suggests that I could use sep to identify the column where each variable starts.
http://data.princeton.edu/R/readingData.html
However, when I use the command below:
condodattest1b <- read.table(impfile1,sep=testcsv3$Varsep,col.names=testcsv3$Varname, colClasses=testcsv3$Varclass)
I get the following error message:
Error in read.table(impfile1, sep = testcsv3$Varsep, col.names = testcsv3$Varname, : invalid 'sep' argument
Finally, I tried to use:
condodattest1c <- read.fortran(impfile1,lengths=testcsv3$Varlen, format=testcsv3$Varforfmt, col.names=testcsv3$Varname)
but I get the following message:
Error in processFormat(format) : missing lengths for some fields
In addition: Warning messages:
1: In processFormat(format) : NAs introduced by coercion
2: In processFormat(format) : NAs introduced by coercion
3: In processFormat(format) : NAs introduced by coercion
All I am trying to do at this point is to have the data come into R as something other than factors. I am hoping this will limit the amount of memory I use and allow me to actually read the file. I would appreciate any suggestions about how to do this. I know the Fortran formats for all the variables and the column at which each variable begins.
Thank you,
Warren
Maybe this code works for you. Fill varlen with the field widths and colclasses with the corresponding type strings (e.g. "numeric", "character", "integer").
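my.readfwf <- function(filename, varlen, colclasses) {
  # Start and end character positions of each field
  sidx <- cumsum(c(1, head(varlen, -1)))
  eidx <- sidx + varlen - 1
  # Read the raw lines as-is (readLines does not interpret quotes or skip blank lines)
  filecontent <- readLines(filename)
  if (any(diff(nchar(filecontent)) != 0))
    stop("line lengths differ!")
  res <- vector("list", length(varlen))
  for (i in seq_along(varlen)) {
    # substring() is vectorised over the lines, so no sapply() is needed
    res[[i]] <- substring(filecontent, sidx[i], eidx[i])
    # Convert to the requested storage mode, e.g. "numeric" or "character"
    mode(res[[i]]) <- colclasses[i]
  }
  # Turn the list into a data.frame by setting its attributes directly
  attributes(res) <- list(names = paste("V", seq_along(res), sep = ""),
                          row.names = seq_along(res[[1]]), class = "data.frame")
  return(res)
}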
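Because read.fwf() passes extra arguments on to read.table(), colClasses can also be given directly to read.fwf(), so columns are not turned into factors (in R versions before 4.0.0, character columns became factors by default). A small sketch on a made-up two-line file; the widths, names and classes are assumptions for illustration, and it shows the column types only, not memory use on a 70 MB file.

```r
# Write a tiny fixed-width file (hypothetical data) to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("AB  12.5X",
             "CD 100.0Y"), tmp)

# Field widths: 2, 6 and 1 characters
widths <- c(2, 6, 1)

# colClasses is forwarded to read.table(), so each column gets an explicit type
d <- read.fwf(tmp, widths = widths,
              col.names = c("V1", "V2", "V3"),
              colClasses = c("character", "numeric", "character"))
str(d)
```

With the real layout, widths = testcsv3$Varlen and colClasses = testcsv3$Varclass would play the same roles, provided Varclass holds plain class names.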
