All possible combinations for large numbers in R - r

Imagine I have a sequence like this:
seq <- rep(0:9, 10)
I want to generate all possible combinations of this sequence. Unsurprisingly, combn fails:
> comb <- combn(seq, 10)
Error in matrix(r, nrow = len.r, ncol = count) :
invalid 'ncol' value (too large or NA)
In addition: Warning message:
In combn(seq, 10) : NAs introduced by coercion to integer range
Can you give me a hint how to make my own function for all possible combinations?

Based on your reply to the comment, here is one thing you can do. You need the combinat package installed for this to work.
library(combinat)
seq <- c(1,2,3,4,5,6,7,8,9,0)
permn(seq)
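As a rough check of why combn() gives up on the original sequence (a sketch using only base R; the variable names are just for illustration): the number of 10-element combinations of a 100-element vector is far beyond R's integer range, so the column count of the result matrix cannot be represented. The 10 distinct digits, by contrast, have a manageable number of permutations.

```r
# Number of columns combn(seq, 10) would need for a 100-element seq
n_comb <- choose(100, 10)
int_max <- .Machine$integer.max    # largest value an R integer can hold

too_large <- n_comb > int_max      # the count does not fit in an integer

# permn() on the 10 distinct digits returns factorial(10) permutations
n_perm <- factorial(10)
fits <- n_perm <= int_max          # a list of this length is feasible
```

Even when the count fits, each result still has to be stored, so it is worth estimating the size before enumerating anything.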

Related

How to create data frame for super large vectors?

I have 7 very large vectors, c1 to c7. My task is simply to create a data frame from them. However, when I use data.frame(), I get an error:
> newdaily <- data.frame(c1,c2,c3,c4,c5,c6,c7)
Error in if (mirn && nrows[i] > 0L) { :
missing value where TRUE/FALSE needed
Calls: data.frame
In addition: Warning message:
In attributes(.Data) <- c(attributes(.Data), attrib) :
NAs introduced by coercion to integer range
Execution halted
They all have the same length (2,626,067,374 elements), and I’ve checked there’s no NA.
I tried subsetting 1/5 of each vector and data.frame() function works fine. So I guess it has something to do with the length/size of the data? Any ideas how to fix this problem? Many thanks!!
Update
Both data.frame and data.table only allow vectors of up to 2^31 - 1 elements. I still can't find a way to create one super large data.frame, so I subset my data instead... I hope larger vectors will be allowed in the future.
R's data.frames don't support such long vectors yet.
Your vectors are longer than 2^31 - 1 = 2147483647, which is the largest value that R's integer type can represent. Since the data.frame function/class assumes that the number of rows can be represented by an integer, you get an error:
x <- rep(1, 2626067374)
DF <- data.frame(x)
#Error in if (mirn && nrows[i] > 0L) { :
# missing value where TRUE/FALSE needed
#In addition: Warning message:
#In attributes(.Data) <- c(attributes(.Data), attrib) :
# NAs introduced by coercion to integer range
Basically, something like this happens internally:
as.integer(length(x))
#[1] NA
#Warning message:
# NAs introduced by coercion to integer range
As a result the if condition becomes NA and you get the error.
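The same coercion can be checked without allocating the vector at all. This is a minimal sketch in base R that uses only the length reported in the question; the variable names are illustrative.

```r
n <- 2626067374                     # vector length from the question (a double)
int_max <- .Machine$integer.max     # 2^31 - 1

exceeds <- n > int_max              # the length is beyond integer range

# as.integer() cannot represent n, so it yields NA (with a warning)
coerced <- suppressWarnings(as.integer(n))
became_na <- is.na(coerced)
```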
Possibly, you could use the data.table package instead. Unfortunately, I don't have sufficient RAM to test:
library(data.table)
DT <- data.table(x = rep(1, 2626067374))
#Error: cannot allocate vector of size 19.6 Gb
With data of that size you have to economize on memory, but how?
One option is to write the values to a file:
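output_name <- "output.csv"
# sep = ";" joins the columns element-wise, giving one line per row;
# collapse = ";" would instead glue everything into a single string
lines <- paste(c1, c2, c3, c4, c5, c6, c7, sep = ";")
cat(lines, file = output_name, sep = "\n")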
You will probably need to analyse them too, and (as was said before) that requires a lot of memory.
So read the file in batches of lines (say, 20k at a time) to limit RAM usage: analyse each batch, save its results, and repeat.
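# open the connection once so each readLines() call continues where the last one stopped
con <- file(output_name, open = "r")
repeat {
lines_in_this_round <- readLines(con, n = 20000)
if (length(lines_in_this_round) == 0) break  # end of file reached
# create data.frame
# analyse data
# save result
}
close(con)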
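A small-scale version of this loop might look like the sketch below. The two short vectors, the temporary file, the chunk size of 4 and the running sum are all stand-ins chosen for illustration, not the asker's data or analysis.

```r
# Hypothetical miniature stand-ins for the large vectors
c1 <- 1:10
c2 <- (1:10) * 2

# Write one ";"-separated line per row to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(paste(c1, c2, sep = ";"), path)

con <- file(path, open = "r")   # opened once, so reads are sequential
chunk_size <- 4
running_total <- 0
n_chunks <- 0
repeat {
  chunk <- readLines(con, n = chunk_size)
  if (length(chunk) == 0) break              # nothing left to read
  # Parse only this chunk into a small data frame
  df <- read.table(text = chunk, sep = ";", col.names = c("c1", "c2"))
  # Placeholder analysis: accumulate a sum across chunks
  running_total <- running_total + sum(df$c1 + df$c2)
  n_chunks <- n_chunks + 1
}
close(con)
unlink(path)
```

Only one chunk is held in memory at a time, so peak usage depends on chunk_size rather than on the file size.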
I hope this helps you.

R: read.table with colClasses gives Error in integer(n) : vector size cannot be NA/NaN

I'm trying to read a simple dataframe into R using read.table. While reading the table I want to specify that the first 3 columns are of type character, while the remaining 4 columns are of type numeric.
I'm specifying the column types to prevent R from dropping the leading 0's in columns 2 and 3, as they're required for DB lookups. Here's what I'm using:
df.img <- read.table('https://gist.githubusercontent.com/duhaime/46dde948263136d0b52be1575232a83e/raw/80f14650e4f4b9ef38a5dec3f5bbb8c62954ee59/match-stats.tsv',
sep='\t',
colClasses=c(replicate('character', 3), replicate('numeric', 4)))
This returns:
Error in integer(n) : vector size cannot be NA/NaN
In addition: Warning message:
In integer(n) : NAs introduced by coercion
Does anyone know how I can update my read.table command to correctly read in my columns with the desired types? Any help would be appreciated!
Aha, I should have been using rep():
df.img <- read.table('https://gist.githubusercontent.com/duhaime/46dde948263136d0b52be1575232a83e/raw/80f14650e4f4b9ef38a5dec3f5bbb8c62954ee59/match-stats.tsv',
sep='\t',
colClasses=c(rep('character', 3), rep('numeric', 4)))
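For context, replicate() takes the number of replications as its first argument (replicate(n, expr)), so replicate('character', 3) ends up treating 'character' as the count. A quick sketch of the difference, using a made-up one-line table instead of the gist URL:

```r
# rep(x, times): value first, count second
col_classes <- c(rep("character", 3), rep("numeric", 4))

# replicate(n, expr): count first, so this is the equivalent call
same <- c(replicate(3, "character"), replicate(4, "numeric"))
equivalent <- identical(col_classes, same)

# Hypothetical row with leading zeros in columns 2 and 3
tsv <- "a\t001\t007\t1.5\t2\t3\t4"
df <- read.table(text = tsv, sep = "\t", colClasses = col_classes)
df$V2   # kept as character, so the leading zeros survive
```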

Error in 2:n : NA/NaN argument

The code below produces the following error:
Error in 2:n : NA/NaN argument
How can I resolve this error?
library (pdfetch)
library(tidyverse)
library(xts)
tickers<-c("AXP","MMM","BA","CAT","CVX","CSCO","KO","DWDP","AAPL","XOM","GE","GS","HD","IBM","INTC","HPI","AIV","MCD","MRK","MSFT","NKE","PFE","PG","TRV","JPM","UTX","VZ","V","WMT","DIS")
data<-pdfetch_YAHOO(tickers<- c("^DJI","AXP","MMM","BA","CAT","CVX","CSCO","KO","DWDP","AAPL","XOM","GE","GS","HD","IBM","INTC","HPI","AIV","MCD","MRK","MSFT","NKE","PFE","PG","TRV","JPM","UTX","VZ","V","WMT","DIS"),from = as.Date("2015-03-20"),to = as.Date("2018-03-20"),interval='1mo')
# to remove the nas from the entire data
data[complete.cases(data),]
plus<-data[complete.cases(data),]
plus
str(plus)
head(plus)
tail(plus)
class(plus$Date)
(plus[1:10, "^DJI.adjclose",drop=F])
#Create a new data frame that contains the price data with the dates as the row names
prices <- (plus)[, "^DJI.adjclose", drop = FALSE]
rownames(prices) <-plus$Date
head(prices)
tail(prices)
#to find the return from 3/3/2015-3/8/2018
djia_ret1<- ((prices [2:n,1]-prices [1:(n-1),1])/prices [1:(n-1),1])
Error in 2:n : NA/NaN argument.
This means that one (or both) of the two arguments of : is NA or NaN. 2 is not, so n must be.
In your question you don't show how you created the variable n, but if it was the result of some data that was NA, or a division by zero result for example, that would cause these errors.
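Assuming n is meant to be the number of rows in prices (the question does not say), a sketch of the return calculation with n defined explicitly could look like this. The price values are invented for illustration.

```r
# Hypothetical adjusted closing prices standing in for the downloaded data
prices <- data.frame(adjclose = c(100, 110, 99, 108.9))

n <- nrow(prices)                 # define n from the data itself
stopifnot(!is.na(n), n >= 2)      # fail early if n is unusable

# Simple period-over-period returns: (p_t - p_{t-1}) / p_{t-1}
djia_ret1 <- (prices[2:n, 1] - prices[1:(n - 1), 1]) / prices[1:(n - 1), 1]
```

Checking n right after it is created makes the failure show up where the bad value originates rather than inside 2:n.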

R skmeans package - where does this error come from: "missing value where TRUE/FALSE needed"

I tried to cluster my data following the skmeans package's manual page.
I started by installing all required packages.
I then imported my data table, and made a matrix out of it with:
x <- as.matrix(x)
# See dimensions
dim(x)
[1] 184 4000
When I try to hard partition my data into 5 clusters - as it is done in the manual's first example - like so:
hparty <- skmeans(x, 5, control = list(verbose = TRUE))
I receive the following error message:
Error in if (!all(row_norms(x) > 0)) stop("Zero rows are not allowed.") :
missing value where TRUE/FALSE needed
And when I just type:
test <- skmeans(x, 5)
I get:
Error in skmeans(x, 5) : Zero rows are not allowed.
I'm trying to figure out where this error is coming from, and why the function can't get a TRUE/FALSE value. Has anyone ever experienced this problem?
Thank you in advance!
Spherical means is k-means where every vector is normalized to length 1.
If you have a constant 0 vector, this is not possible, and you cannot use spherical k-means (or cosine similarity).
!all(row_norms(x) > 0)
is the test that checks whether any row has norm 0.
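To find the offending rows before calling skmeans, something along these lines may help. It is a base R sketch that computes Euclidean row norms directly rather than calling the package's own row_norms, and the small matrix is made up. Note that an NA anywhere in a row makes its norm NA, in which case the all() test itself can evaluate to NA and if() fails with "missing value where TRUE/FALSE needed".

```r
# Hypothetical matrix with one all-zero row and one row containing NA
x <- rbind(c(1, 2, 0),
           c(0, 0, 0),
           c(3, NA, 1),
           c(0, 4, 5))

norms <- sqrt(rowSums(x^2))        # Euclidean norm of each row

zero_rows <- which(norms == 0)     # rows that cannot be normalized
na_rows <- which(is.na(norms))     # rows with missing values

# Keep only rows with a defined, strictly positive norm
x_clean <- x[!is.na(norms) & norms > 0, , drop = FALSE]
```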

Error: In storage.mode(x) <- "double" : NAs introduced by coercion

I'm new to R, but I'm trying to estimate a missing value in a large microarray dataset using impute.knn() from library(impute) using 6 nearest neighbors.
Here's an example:
seq1 <- seq(1:12)
mat1 <- matrix(seq1, 3)
mat1[2,2] <- "NA"
impute.knn(mat1, k=6)
I get the following error:
Error in knnimp.internal(x, k, imiss, irmiss, p, n, maxp = maxp) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
I've also tried the following:
impute.knn(mat1[2,2], k=6)
and I get the following error:
Error in rep(1, p) : invalid 'times' argument
My google-fu has been off today. Any suggestions to why I might be getting this error?
edit: I've tried
mat1[2,2] <- NA
as James suggested, but I get a segmentation fault. Using
replace(mat1, mat1[2,2], NA)
does not help either. Any other suggestions?
I'm not sure why impute.knn is set up the way it is, but the example within ?impute.knn uses khanmiss which is a data.frame of factors, which when coerced to matrix will be character.
You are getting a segmentation fault because you are trying to impute with K > ncol(mat1) nearest neighbours. It might be worth reporting a bug to the package authors, as this could easily be checked in R and return an error, not a C level error which kills R.
mat1 <- matrix(as.character(1:12), 3)
mat1[2,2] <- NA # must not be quoted for it to be a NA value
# mat1 is a 4 column matrix so
impute.knn(mat1, 1)
impute.knn(mat1, 2)
impute.knn(mat1, 3)
impute.knn(mat1, 4)
# Will all work
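The difference between the quoted "NA" and a real NA can be seen in base R alone, without the impute package. A small sketch, with illustrative variable names:

```r
# Assigning the string "NA" silently converts the whole matrix to character
mat_quoted <- matrix(1:12, 3)
mat_quoted[2, 2] <- "NA"
type_quoted <- typeof(mat_quoted)       # "character"
missing_quoted <- sum(is.na(mat_quoted))  # the string "NA" is not missing

# Assigning the NA constant keeps the matrix numeric with one missing cell
mat_na <- matrix(1:12, 3)
mat_na[2, 2] <- NA
type_na <- typeof(mat_na)               # "integer"
missing_na <- sum(is.na(mat_na))
```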
Note
Despite the strange example, this also works when mat1 is integer or double:
mat1 <- matrix(1:12,3)
mat1[2,2] <- NA
impute.knn(mat1,2)
mat1 <- matrix(seq(0, 1, length.out = 12), 3)  # 12 evenly spaced doubles; seq(0, 1, 12) would return just 0
mat1[2,2] <- NA
impute.knn(mat1,2)
Take-home message
Don't try to impute using more information than you have.
Perhaps the package authors should take heed of
fortunes::fortune(15)
It really is hard to anticipate just how silly users can be. -- Brian D. Ripley, R-devel (October 2003)
and build in some error checking so a simple error does not cause a segfault.
