What does this mean? Bad usage: input 'data' is not double type

I'm using the cls and cluster packages. I wanted to run the basic
cls.attrib(mymat, vect)
function, but I'm getting this error:
Bad usage: input 'data' is not double type.
I believe this is usually an error with the type of data supplied, but I ran str and class and can't find anything out of place.
> class(mymat)
[1] "matrix"
> str(mymat)
int [1:20, 1:2] 74 73 72 70 69 76 77 78 77 78 ...
and for the vector,
> class(vect)
[1] "integer"
> str(vect)
int [1:20] 1 3 3 3 3 1 1 1 1 1 ...
Aren't these the proper parameters for this function? What might be the reason for this error?
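Judging from the str() output, the likely culprit is that mymat is stored as integer while the function wants double storage. A minimal sketch of the coercion, assuming cls.attrib checks is.double() on its data argument:
# str() showed "int", i.e. integer storage, so is.double() is FALSE
is.double(mymat)                # FALSE for an integer matrix
# Coerce the storage mode to double in place; dimensions are kept
storage.mode(mymat) <- "double"
is.double(mymat)                # now TRUE
cls.attrib(mymat, vect)         # should now pass the type check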

R: Reading libsvm files with library(e1071)

I have generated a libsvm file in scala using the org.apache.spark.mllib.util.MLUtils package.
The file format is as follows:
49.0 109:2.0 272:1.0 485:1.0 586:1.0 741:1.0 767:1.0
49.0 109:2.0 224:1.0 317:1.0 334:1.0 450:1.0 473:1.0 592:1.0 625:1.0 647:1.0 681:1.0 794:1.0
17.0 26:1.0 109:1.0 143:1.0 198:2.0 413:1.0 476:1.0 582:1.0 586:1.0 611:1.0 629:1.0 737:1.0
12.0 255:1.0 394:1.0
etc etc
I read the file into R using the e1071 package as follows:
m <- read.matrix.csr(filename)
The structure of the resultant matrix.csr is as follows:
$ x:Formal class 'matrix.csr' [package "SparseM"] with 4 slots
.. ..@ ra : num [1:31033] 2 1 1 1 1 1 2 1 1 1 ...
.. ..@ ja : int [1:31033] 109 272 485 586 741 767 109 224 317 334 ...
.. ..@ ia : int [1:2996] 1 7 18 29 31 41 49 65 79 83 ...
.. ..@ dimension: int [1:2] 2995 796
$ y: Factor w/ 51 levels "0.0","1.0","10.0",..: 45 45 10 5 42 25 23 41 23 25 ...
When I convert to a dense matrix with as.matrix(m) it produces one column and two rows, each with an uninterpretable (by me) object in it.
When I simply try to save the matrix.csr back to file (without doing any intermediate processing), I get the following error:
Error in abs(x) : non-numeric argument to mathematical function
I am guessing that the libsvm format is incompatible but I'm really not sure.
Any help would be much appreciated.
OK, the short of it:
m <- read.matrix.csr(filename)$x
because read.matrix.csr returns a list with two elements: the matrix and a vector.
In other words, the target/label/class is separated out from the features matrix.
NOTE for fellow R neophytes: in CRAN documentation, the "Value" subheading describes the return value of the function:
Value
If the data file includes no y variable, read.matrix.csr returns an object of class matrix.csr, else a list with components:
x  object of class matrix.csr
y  vector of numeric values or factor levels, depending on fac
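Putting it together, a minimal sketch of the round trip (the file names are placeholders; write.matrix.csr takes the label vector separately via its y argument):
library(e1071)    # read/write.matrix.csr() also need SparseM installed
# With labels present in the file, read.matrix.csr() returns a list
m <- read.matrix.csr("train.libsvm")   # placeholder file name
features <- m$x    # sparse matrix.csr of predictors
labels   <- m$y    # factor of labels
# To save the data back out, pass the two components separately
write.matrix.csr(features, file = "out.libsvm", y = labels)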

long int in dataframe conversion (data.matrix()) or mean() of factor value

I have this type of data.frame:
"id" "var1" "t" "x" "y" "z" "idconnect" "bool1"
924903565 16 64 104 133 87 940539767 1
924903564 14 64 131 95 87 940539931 1
924903563 22 64 135 248 86 924903449 1
but the colMeans() or mean() function doesn't work (it returns NA, or a false value when I use as.numeric() first). For example, I tried mean(mydata[mydata[, "idconnect"]==940539931, "x"]), but class(mydata[mydata[, "idconnect"]==940539931, "x"]) returns "factor". I expected a matrix or vector. Why is it a factor here? The as.matrix of my factor is weird.
So I tried to convert my data frame into a matrix with sapply(..., as.numeric) or data.matrix(), but it returns:
"id" "var1" "t" "x" "y" "z" "idconnect" "bool1"
47 7 442 5 34 97 154228 3
46 5 442 32 395 97 154274 3
45 14 442 36 149 96 45 3
How can I convert my data frame to a matrix while preserving my long integer values?
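The underlying problem: those columns were read in as factors, and as.numeric() or data.matrix() on a factor returns the internal level codes rather than the printed values. A minimal sketch of the standard fix ("mydata.txt" is a placeholder; note that R's doubles represent integers exactly up to 2^53, so ids like 924903565 are safe):
# as.numeric() on a factor yields the internal level codes, not the labels
f <- factor(c("924903565", "924903564", "924903563"))
as.numeric(f)                  # wrong: 3 2 1 (the level codes)
as.numeric(as.character(f))    # right: the original id values
# Applied to every column of the data frame at once:
mymat <- sapply(mydata, function(col) as.numeric(as.character(col)))
# Better still, stop the columns becoming factors when reading the data;
# a column that still comes back as character holds something non-numeric
mydata <- read.table("mydata.txt", header = TRUE, stringsAsFactors = FALSE)
mean(mydata[mydata[, "idconnect"] == 940539931, "x"])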

assign objects to dynamic lists in R

I have nested loops that produce outputs I want to store in list objects with dynamic names. A toy example looks as follows:
set.seed(8020)
names <- sample(LETTERS, 5, replace = FALSE)
for(n in names)
{
  # Create the list
  assign(paste0("examples_", n), list())
  # Populate the list -- this is the step that fails
  get(paste0("examples_", n))[[1]] <- sample(100, 10)
  get(paste0("examples_", n))[[2]] <- sample(100, 10)
  get(paste0("examples_", n))[[3]] <- sample(100, 10)
}
Unfortunately I keep getting the error:
Error in get(paste0("examples_", n))[[1]] <- sample(100, 10) :
target of assignment expands to non-language object
I have tried all kinds of assign, eval, and get functions to parse the object, but haven't had any luck.
Expanding on my comment with a worked example:
examples <- vector(mode = "list", length = length(names))
names(examples) <- names  # please change that to mynames
                          # or almost anything other than `names`
examples <- lapply(examples, function(L) {
  L[[1]] <- sample(100, 10)
  L[[2]] <- sample(100, 10)
  L[[3]] <- sample(100, 10)
  L
})
# Top of the output:
> examples
$P
$P[[1]]
[1] 34 49 6 55 19 28 72 42 14 92
$P[[2]]
[1] 97 71 63 59 66 50 27 45 76 58
$P[[3]]
[1] 94 39 77 44 73 15 51 78 97 53
$F
$F[[1]]
[1] 12 21 89 26 16 93 4 13 62 45
$F[[2]]
[1] 83 21 68 74 32 86 52 49 16 13
$F[[3]]
[1] 14 45 40 46 64 85 88 28 53 42
This mode of programming becomes more natural over time, and it gets you out of writing clunky for-loops. Develop your algorithm for a single list node at a time, then use sapply or lapply to iterate the processing.
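For completeness, if dynamically named objects in the global environment are genuinely required: the original loop fails because get() cannot appear on the left-hand side of an assignment, which is what "target of assignment expands to non-language object" means. A hedged sketch of that route:
# Build each list in a local variable first, then assign() it once
set.seed(8020)
mynames <- sample(LETTERS, 5, replace = FALSE)   # renamed from `names`
for (n in mynames) {
  tmp <- replicate(3, sample(100, 10), simplify = FALSE)
  assign(paste0("examples_", n), tmp)
}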

R - Data Frame is a list of columns?

Question
Is a data frame in R a list (a list being, in my understanding, a sequence of objects) of columns?
What is the design decision in R to have made a data frame a column-oriented (not row-oriented) structure?
Any reference to related design document or article of data structure design would be appreciated.
I am just used to row-as-a-unit/record and would like to know why it is column oriented. Or if I misunderstood something, kindly suggest.
Background
I had thought a data frame was a sequence of rows, such as (Ozone, Solar.R, Wind, Temp, Month, Day).
> c ## data frame created from read.csv()
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
> typeof(c)
[1] "list"
However, when lapply() is applied to c to show each list element, each element is a column.
> lapply(c, function(arg){ return(arg) })
$Ozone
[1] 41 36 12 18 23 19
$Solar.R
[1] 190 118 149 313 299 99
$Wind
[1] 7.4 8.0 12.6 11.5 8.6 13.8
$Temp
[1] 67 72 74 62 65 59
$Month
[1] 5 5 5 5 5 5
$Day
[1] 1 2 3 4 7 8
Whereas what I had expected was
[1] 41 190 7.4 67 5 1
[1] 36 118 8.0 72 5 2
…
1) Is a data frame in R a list of columns?
Yes.
df <- data.frame(a = c("the", "quick"), b = c("brown", "fox"), c = 1:2)
is.list(df)   # -> TRUE
names(df)     # -> [1] "a" "b" "c"
df[[1]][2]    # -> "quick"
2) What is the design decision in R to have made a data frame a column-oriented (not row-oriented) structure?
A data.frame is a list of column vectors.
is.atomic(df[[1]]) # -> TRUE
mode(df[[1]]) # -> [1] "character"
mode(df[[3]]) # -> [1] "numeric"
Vectors can only store one kind of object, so a "row-oriented" data.frame would demand that data frames be composed of lists instead. Now imagine what the performance of an operation like
df[[1]][20000]
would be in a list-based data frame, keeping in mind that random access is O(1) for vectors and O(n) for lists.
3) Any reference to related design document or article of data structure design would be appreciated.
http://adv-r.had.co.nz/Data-structures.html#data-frames
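As a side note for readers who, like the asker, expect row-wise access, a short sketch using the toy frame from the answer above:
# split() gives the row-wise view: a list of one-row data frames
df <- data.frame(a = c("the", "quick"), b = c("brown", "fox"), c = 1:2)
split(df, seq_len(nrow(df)))
# apply() also iterates over rows, but first coerces the whole frame
# to a matrix of a single type (here: character)
apply(df, 1, paste, collapse = " ")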

Losing time value in datetime stamp when import data from Access into R

I am importing data with a DateTime stamp from Access into R and keep 'losing' my time values. I had a similar issue a while back (posted right here) and had to convert the times to numbers before importing. While that was not too difficult, it is a step I would like to avoid. This post is also helpful and suggests the reason might be the large number of records. I am currently trying to import over 110k records.
As an FYI this post is very helpful for info on dealing with times in R, but did not provide a specific solution for this issue.
My table in Access (2013) has a UTC and a local time column, both of which store the date and the time in the same DateTime field.
I used the following code to read in the table and look at the head.
library(RODBC)
DataConnect <- odbcConnect("MstrMUP")
Temp <- sqlFetch(DataConnect, "TempData_3Nov2014")
head(Temp)
IndID UTCDateTime LocalDateTime Temp
1 MTG_030_A 2013-02-08 2013-02-08 25
2 MTG_030_A 2013-02-08 2013-02-08 26
3 MTG_030_A 2013-02-08 2013-02-08 31
4 MTG_030_A 2013-02-08 2013-02-08 29
5 MTG_030_A 2013-02-09 2013-02-08 39
6 MTG_030_A 2013-02-09 2013-02-08 44
As you can see, the time portion of the DateTime stamp is missing, and I cannot seem to locate it using str or as.numeric, both of which suggest the time value is not stored (at least that is how I read it).
> str(Temp)
'data.frame': 110382 obs. of 4 variables:
$ IndID : Factor w/ 17 levels "BHS_034_A","BHS_035_A",..: 13 13 13 13 13 13 13 13 13 13 ...
$ UTCDateTime : POSIXct, format: "2013-02-08" "2013-02-08" ...
$ LocalDateTime: POSIXct, format: "2013-02-08" "2013-02-08" ...
$ Temp : int 25 26 31 29 39 44 42 49 42 38 ...
> head(as.numeric(MTG30$LocalDateTime))
[1] 1360306800 1360306800 1360306800 1360306800 1360306800 1360306800
Because all numeric values are the same, they must all be the same date, and do not include time. Correct...?
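(A quick way to check: convert the epoch value back with an explicit time zone, here the MST zone that appears in the unclass output further down.)
# 1360306800 seconds after the epoch is midnight MST on 2013-02-08,
# so the time-of-day really has been reduced to 00:00:00
as.POSIXct(1360306800, origin = "1970-01-01", tz = "MST")
# [1] "2013-02-08 MST"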
The Question:
Is this an R issue or an Access issue? Any suggestions on how to import 110k rows of data from Access into R without losing the time portion of a DateTime stamp would be appreciated.
I am sure there is a better method than my earlier workaround.
Oh, I almost forgot: I am running the "Sock it to Me" version of R.
EDIT/ADDITION: In response to @Richard Scriven's thoughts on unclass:
Unfortunately, no, there is no sec, min, or time value. All are 0.
> temp <- Temp[1:5,]
> unclass(as.POSIXlt(temp$UTCDateTime))
$sec
[1] 0 0 0 0 0
$min
[1] 0 0 0 0 0
$hour
[1] 0 0 0 0 0
$mday
[1] 8 8 8 8 9
$mon
[1] 1 1 1 1 1
$year
[1] 113 113 113 113 113
$wday
[1] 5 5 5 5 6
$yday
[1] 38 38 38 38 39
$isdst
[1] 0 0 0 0 0
$zone
[1] "MST" "MST" "MST" "MST" "MST"
$gmtoff
[1] -25200 -25200 -25200 -25200 -25200
attr(,"tzone")
[1] "" "MST" "MDT"
Thanks in advance.
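One possible workaround, sketched but untested against Access (connection, table, and column names are from the question): fetch the DateTime columns as text so the ODBC type conversion cannot drop the time-of-day, then parse them in R.
library(RODBC)
DataConnect <- odbcConnect("MstrMUP")
# as.is = TRUE is passed through to sqlGetResults() and keeps every
# column as character instead of letting the driver convert the types
Temp <- sqlFetch(DataConnect, "TempData_3Nov2014", as.is = TRUE)
# Parse the text timestamps; the format string may need adjusting to
# match what Access actually emits
Temp$UTCDateTime   <- as.POSIXct(Temp$UTCDateTime, tz = "UTC")
Temp$LocalDateTime <- as.POSIXct(Temp$LocalDateTime, tz = "")
Temp$Temp          <- as.integer(Temp$Temp)
str(Temp)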
