This error occurs repeatedly when I try to use the function gdcRNAMerge() from the package GDCRNATools. Several solutions have been proposed, but the problem persists. I would appreciate it if anyone could assist me in resolving this. The error message is: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 2 did not have 9 elements
Below is the code I used:
BiocManager::install("GDCRNATools")
library(GDCRNATools)
project <- 'TCGA-STAD'
rnadir <- paste(project, 'RNAseq', sep='/')
gdcRNADownload(project.id = 'TCGA-STAD',
               data.type = 'RNAseq',
               write.manifest = FALSE,
               method = 'gdc-client',
               directory = rnadir)
metaMatrix.RNA <- gdcParseMetadata(project.id = 'TCGA-STAD',
                                   data.type = 'RNAseq',
                                   write.meta = FALSE)
table(metaMatrix.RNA$sample_type)
table(metaMatrix.RNA$gender)
metaMatrix.RNA <- gdcFilterDuplicate(metaMatrix.RNA)
metaMatrix.RNA <- gdcFilterSampleType(metaMatrix.RNA)
table(metaMatrix.RNA$sample_type)
rnaCounts <- gdcRNAMerge(metadata = metaMatrix.RNA,
                         path = rnadir,
                         organized = FALSE,
                         data.type = 'RNAseq')
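In case it helps, here is a hedged diagnostic sketch I put together (not part of GDCRNATools; the read.delim call is only a guess at the file format) to find which downloaded file trips the parser:
# Try to read the first few lines of every downloaded file under rnadir and
# report the ones that fail, since the scan() error suggests one file has a
# malformed line 2.
files <- list.files(rnadir, recursive = TRUE, full.names = TRUE)
for (f in files) {
  res <- try(read.delim(f, header = FALSE, nrows = 5), silent = TRUE)
  if (inherits(res, "try-error")) message("Problem reading: ", f)
}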
I'm new to using GIS with R and I'm trying to open an ENVI file containing hyperspectral data, following the suggestions from this post: "R how to read ENVI .hdr-file?". However, I don't seem to be able to do so. I tried three different approaches, but all of them failed, and I can't find any other posts describing my problem.
# install.packages("rgdal")
# install.packages("raster")
# install.packages("caTools")
library("rgdal")
library("raster")
library("caTools")
dirname <- "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights"
filename <- file.path(dirname, "AISA_Flight_4_resampled")
file.exists(filename)
The first option I tried was using the file name only:
x <- read.ENVI(filename)
But I got the following error message:
#Error in read.ENVI(filename) :
# read.ENVI: Could not open input file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
#In addition: Warning message:
# In nRow * nCol * nBand : NAs produced by integer overflow
I then tried the second option, passing the file name plus a header file name built with file.path:
headerfile <- file.path(dirname, "AISA_Flight_4_resampled")
x <- read.ENVI(filename = filename,headerfile = headerfile)
Again, I got an error message that says:
#Error in read.ENVI(filename = filename, headerfile = headerfile) :
# read.ENVI: Could not open input header file: S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled
Finally, I tried the third option, passing the file name plus the header file contents read with readLines:
hdr_file <- readLines(con = "S:/LAB-cavender/4_Project_Folders/oakWilt/oak_wilt_image_analyses/R_input/6.15.2021 - Revisions/ENVI export/AISA/Resampled_flights/AISA_Flight_4_resampled.hdr")
x <- read.ENVI(filename = filename,headerfile = hdr_file)
But I got the error message:
#Error in read.ENVI(filename = filename, headerfile = hdr_file) :
# read.ENVI: Could not open input header file: ENVIdescription = { Spectrally Resampled File. Input number of bands: 63, output number of bands: 115. [Fri Jun 25 16:57:21 2021]}samples = 5187lines = 6111bands = 115header offset = 0file type = ENVI Standarddata type = 4interleave = bilsensor type = Unknownbyte order = 0map info = {UTM, 1.000, 1.000, 482828.358, 5029367.353, 7.5000000000e-001, 7.5000000000e-001, 15, North, WGS-84, units=Meters}coordinate system string = {PROJCS["UTM_Zone_15N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-93.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]}default bands = {46,31,16}wavelength units = Nanometersdata ignore value = -9999.00000000e+000band names = { Resampled
# In addition: Warning message:
# In if (!file.exists(headerfile)) stop("read.ENVI: Could not open input header file: ", :
# the condition has length > 1 and only the first element will be used
Any help would be really appreciated!
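For completeness, this is the variant I have not tried yet; I am not sure it is correct, but read.ENVI() seems to want the path to the .hdr file rather than its contents:
# Hedged sketch, not a confirmed fix: pass the *path* to the header file
# (including the .hdr extension) instead of the text read from it.
headerfile <- file.path(dirname, "AISA_Flight_4_resampled.hdr")
x <- read.ENVI(filename = filename, headerfile = headerfile)
# Possible alternative (untested here): let raster/rgdal open the ENVI pair.
# b <- raster::brick(filename)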
I tried to enter these commands, but they do not work... can someone help please?
OPts <- read.csv(OILBRENT)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
OPts <- ts(OILBRENT, start = c(jun-87), end = c(Jan-20), frequency =12)
Error in ts(OILBRENT, start = c(jun - 87), end = c(Jan - 20), frequency = 12) :
object 'jun' not found
OPts <- ts(OILBRENT, start = c(jun-87, 1), end = c(Dec-19, 12), frequency =12)
Error in ts(OILBRENT, start = c(jun - 87, 1), end = c(Dec - 19, 12), frequency = 12) :
object 'jun' not found
It seems you are missing some quotation marks.
read.csv expects either a character string pointing to a file, or a connection. If you are trying to read a file named OILBRENT in your current working directory, try OPts <- read.csv("OILBRENT").
Both ts calls seem to be failing because the start parameter expects either a number or a numeric vector. When you write start = c(jun-87), R will try to find an object named jun and then compute the difference jun - 87. From the man page for ts:
start: the time of the first observation. Either a single number or
a vector of two integers, which specify a natural time unit
and a (1-based) number of samples into the time unit. See
the examples for the use of the second form.
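Putting both fixes together, here is a minimal sketch, assuming the series is monthly from June 1987 through December 2019 and that the file is called "OILBRENT.csv" (both are my assumptions):
OILBRENT <- read.csv("OILBRENT.csv")  # assumed file name in the working directory
OPts <- ts(OILBRENT, start = c(1987, 6), end = c(2019, 12), frequency = 12)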
I am learning the quantmod package. I wrote the following code, but R shows me an error. Please help me!
getSymbols("AMZN", from = "2010-01-01", to = "2019-12-20", src = "yahoo")
AMZN_adj = adjustOHLC(AMZN)
The error is
Error in vapply(parse(text = fr[, 2]), eval, numeric(1)) :
values must be length 1,
but FUN(X[[1]]) result is length 3
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'C:\Users\User\AppData\Local\Temp\RtmpmYhsTS\file964478a68e'
I have checked the source code of the function adjustOHLC(), but I couldn't find vapply(parse()) anywhere.
Can someone help explain what is happening in my code?
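In case it is useful, this is what I was planning to try next to narrow it down; as far as I can tell adjustOHLC() fetches splits and dividends internally, but I am not sure that is where the error comes from:
# Hedged diagnostic sketch: call the split/dividend downloads directly to see
# which one returns a malformed response, and try the adjusted-close route
# that skips those downloads.
library(quantmod)
getSymbols("AMZN", from = "2010-01-01", to = "2019-12-20", src = "yahoo")
splits <- getSplits("AMZN")
divs   <- getDividends("AMZN")
AMZN_adj <- adjustOHLC(AMZN, use.Adjusted = TRUE)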
I have a huge CSV file. Its size is around 9 GB and I have 16 GB of RAM. I followed the advice from that page and implemented it below:
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M
I am still getting the error and warnings below. How can I read this 9 GB file into R? I have 64-bit R 3.3.1 and I am running the command below in RStudio 0.99.903 on Windows Server 2012 R2 Standard (64-bit).
> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file, what = what, sep = sep, quote = quote, ... :
Reached total allocation of 16383Mb: see help(memory.size)
[warnings 3-12 are identical]
------------------- Update1
My first try, based on a suggested answer:
> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
Reached total allocation of 16383Mb: see help(memory.size)
------------------- Update2
My second try, based on another suggested answer, is below:
thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
                          first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)
How can I read this file into a single object so that I can analyze the entire dataset in one go?
------------------- Update3
We bought an expensive machine with 10 cores and 256 GB of RAM. That is not the most efficient solution, but it will work at least for the near future. I looked at the answers below and I don't think they solve my problem :( I do appreciate them. I want to perform market basket analysis, and I don't think there is any way around keeping my data in RAM.
Make sure you're using 64-bit R, not just 64-bit Windows, so that you can increase your RAM allocation to all 16 GB.
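You can confirm this from within the session, for example:
.Machine$sizeof.pointer  # 8 means a 64-bit build of R, 4 means 32-bit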
In addition, you can read in the file in chunks:
file_in <- file("in.csv","r")
chunk_size <- 100000 # choose the best size for you
x <- readLines(file_in, n=chunk_size)
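A sketch of looping over the whole file this way (the processing step is a placeholder):
# Keep reading until the connection is exhausted; process each chunk and keep
# only what you need.
repeat {
  x <- readLines(file_in, n = chunk_size)
  if (length(x) == 0) break
  # ... process 'x' here ...
}
close(file_in)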
You can use data.table to handle reading and manipulating large files more efficiently:
require(data.table)
dt <- fread("in.csv", header = TRUE)
If needed, you can leverage disk storage with the ff package:
library("ff")
x <- read.csv.ffdf(file="file.csv", header=TRUE, VERBOSE=TRUE,
first.rows=10000, next.rows=50000, colClasses=NA)
You might want to consider leveraging some on-disk processing rather than holding the entire object in R's memory. One option would be to store the data in a proper database and then have R access it. dplyr can work with a remote source (it writes the SQL statements needed to query the database). I've just tested this with a small example (a mere 17,500 rows), but hopefully it scales up to your requirements.
Install SQLite
https://www.sqlite.org/download.html
Enter the data into a new SQLite database
Save the following in a new file named import.sql
CREATE TABLE tableName (COL1, COL2, COL3, COL4);
.separator ,
.import YOURDATA.csv tableName
Yes, you'll need to specify the column names yourself (I believe), but you can specify their types here too if you wish. This won't work if you have commas anywhere in your names or data, of course.
Import the data into the SQLite database via the command line
sqlite3.exe BIGDATA.sqlite3 < import.sql
Point dplyr to the SQLite database
As we're using SQLite, all of the dependencies are handled by dplyr already.
library(dplyr)
my_db <- src_sqlite("/PATH/TO/YOUR/DB/BIGDATA.sqlite3", create = FALSE)
my_tbl <- tbl(my_db, "tableName")
Do your exploratory analysis
dplyr will write the SQLite commands needed to query this data source. It will otherwise behave like a local table. The big exception will be that you can't query the number of rows.
my_tbl %>% group_by(COL2) %>% summarise(meanVal = mean(COL3))
#> Source: query [?? x 2]
#> Database: sqlite 3.8.6 [/PATH/TO/YOUR/DB/BIGDATA.sqlite3]
#>
#> COL2 meanVal
#> <chr> <dbl>
#> 1 1979 15.26476
#> 2 1980 16.09677
#> 3 1981 15.83936
#> 4 1982 14.47380
#> 5 1983 15.36479
This may not be possible on your computer. In certain cases, data.table takes up more space than its .csv counterpart.
library(data.table)
DT <- data.table(x = sample(1:2, 10000000, replace = TRUE))
write.csv(DT, "test.csv", row.names = FALSE)  # 29 MB file
DT <- fread("test.csv")                       # fread() has no row.names argument
object.size(DT)
> 40001072 bytes  # 40 MB
Two orders of magnitude larger:
DT <- data.table(x = sample(1:2, 1000000000, replace = TRUE))
write.csv(DT, "test.csv", row.names = FALSE)  # 2.92 GB file
DT <- fread("test.csv")
object.size(DT)
> 4000001072 bytes  # 4.00 GB
There is natural overhead to storing an object in R. Based on these numbers, there is roughly a 1.33 factor when reading files. However, this varies with the data. For example:
x = sample(1:10000000, 10000000, replace = TRUE) gives a factor of roughly 2x (R:csv).
x = sample(c("foofoofoo","barbarbar"), 10000000, replace = TRUE) gives a factor of 0.5x (R:csv).
Based on the maximum, your 9 GB file could take 18 GB of memory to store in R, if not more. Based on your error message, it is far more likely that you are hitting hard memory constraints than an allocation issue. Therefore, just reading your file in chunks and consolidating would not work - you would also need to partition your analysis and workflow. Another alternative is to use an out-of-memory tool such as a SQL database.
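If you want a rough number before committing to a full read, you can extrapolate from a sample; the row count below is the one fread reported in your Update1, so treat it as an assumption:
# Estimate the full in-memory size from the first 10,000 rows.
sample_df <- read.csv("C:/Users/a-vs/results_20160291.csv", nrows = 10000)
est_bytes <- as.numeric(object.size(sample_df)) / 10000 * 44099243
est_bytes / 1024^3   # rough estimate of the GB of RAM needed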
This would be horrible practice, but depending on how you need to process this data, it shouldn't be too bad. You can change the maximum memory that R is allowed to use by calling memory.limit(new), where new is an integer giving R's new memory limit in MB. What will happen is that when you hit the hardware constraint, Windows will start paging memory onto the hard drive (not the worst thing in the world, but it will severely slow down your processing).
If you are running this on a server version of Windows, paging will possibly (likely) work differently than on regular Windows 10. I believe it should be faster, as the server OS should be optimized for this kind of thing.
Try starting off with something along the lines of 32 GB (or memory.limit(memory.limit()*2)), and if it comes out MUCH larger than that, I would say the program will end up being too slow once it is loaded into memory. At that point I would recommend buying more RAM or finding a way to process the data in parts.
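As a concrete sketch of what that looks like (Windows-only; the doubling is just a starting point):
memory.limit()                           # current ceiling in MB (16383 here)
memory.limit(size = memory.limit() * 2)  # raise it; Windows pages to disk once RAM runs out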
You could try splitting your processing over the table. Instead of operating on the whole thing, put the whole operation inside a for loop and do it 16, 32, 64, or however many times you need to. Any values you need for later computation can be saved. This isn't as fast as the other approaches, but it will definitely return a result.
x <- number_of_rows_in_file / CHUNK_SIZE
con <- file("in.csv", "r")   # open once so each read.csv continues where the last one stopped
for (i in seq(from = 1, to = x, by = 1)) {
  chunk <- read.csv(con, nrows = CHUNK_SIZE, header = FALSE, ...)  # header/colClasses handling omitted
  # process 'chunk' here and save only the values needed for later computation
}
close(con)
Hope that helps.