I have a zoo object, prices: class(prices) returns "zoo". I then create a file using:
write.zoo(prices, file = "foo", index.name = "time")
The resulting file looks like this:
"time" "AAPL.Adjusted" "SHY.Adjusted"
2013-05-01 60.31 84.12
2013-05-02 61.16 84.11
2013-05-03 61.77 84.08
I then try to read this file with this statement:
myData <- read.zoo("foo")
and I get this error:
Error in read.zoo("foo") :
index has bad entries at data rows: 1 2 3 4
I’ve tried a number of parameter settings and nothing seems to work. Help much appreciated.
Newbie
The file has a header line so try:
z <- read.zoo("foo", header = TRUE, check.names = FALSE)
The check.names = FALSE part gives nicer-looking column names, but you could leave it out if that isn't important.
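For reference, a minimal round trip using the data shown in the question:

library(zoo)

# A tiny series for illustration, values copied from the question's file
prices <- zoo(cbind(AAPL.Adjusted = c(60.31, 61.16, 61.77),
                    SHY.Adjusted  = c(84.12, 84.11, 84.08)),
              order.by = as.Date(c("2013-05-01", "2013-05-02", "2013-05-03")))

write.zoo(prices, file = "foo", index.name = "time")

# header = TRUE makes read.zoo treat the first line as column names;
# check.names = FALSE keeps those names exactly as written in the file
z <- read.zoo("foo", header = TRUE, check.names = FALSE)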
When I use R data.table's fread() to read a .dat file (3 GB), a problem occurs:
Stopped early on line 3169933. Expected 136 fields but found 138. Consider fill=TRUE and comment.char=. First discarded non-empty line:
My code:
library(data.table)
file_path <- 'data.dat'  # 3 GB
fread(file_path, fill = TRUE)
The problem is that my file has ~ 5 million rows. In detail:
From row 1 to row 3169933 it has 136 columns
From row 3169933 to row 5000000 it has 138 columns
fread() only reads my file up to row 3169933 due to this error. fill = TRUE did not help in this case. Could anyone help me?
R version: 3.6.3
data.table version: 1.13.2
Note about fill=TRUE in this case:
[Case 1, not my case] If part 1 of my file (the first 50% of rows) had 138 columns and part 2 had 136 columns, then fill=TRUE would help (it would fill the two missing columns in part 2 with NA).
[Case 2, my case] Since part 1 of my file has 136 columns and part 2 has 138 columns, fill=TRUE does not help.
Not sure why you still have the problem even with fill=TRUE... but if nothing else helps, you can try playing with something like this:
tryCatch(
  expr = { dt1 <<- fread(file_path) },
  warning = function(w) {
    cat('Warning: ', w$message, '\n\n')
    # Pull the line number out of "Stopped early on line <n>."
    n_line <- as.numeric(gsub('Stopped early on line (\\d+)\\..*', '\\1', w$message))
    if (!is.na(n_line)) {
      cat('Found ', n_line, '\n')
      dt1_part1 <- fread(file_path, nrows = n_line)
      dt1_part2 <- fread(file_path, skip = n_line)
      dt1 <<- rbind(dt1_part1, dt1_part2, fill = TRUE)
    }
  },
  finally = cat('\nFinished.\n')
)
The tryCatch() construct catches the warning message, so you can extract the line number from it and process the file accordingly.
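To see what the extraction step does in isolation (the message text below is copied from the question):

msg <- "Stopped early on line 3169933. Expected 136 fields but found 138."
as.numeric(gsub('Stopped early on line (\\d+)\\..*', '\\1', msg))
#> [1] 3169933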
Try reading the two parts separately, then combine them after adding two extra columns to the first part:
library(data.table)
library(dplyr)

first_part <- fread('data.dat', nrows = 3169933) %>%
  mutate(extra_1 = NA, extra_2 = NA)
second_part <- fread('data.dat', skip = 3169933)
df <- bind_rows(first_part, second_part)
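One caveat, not in the original answer: with skip = 3169933 the chunk does not begin with a header line, so second_part comes back with default names V1...V138 and bind_rows would misalign the columns. A sketch that reuses the first part's names (first_part already carries the two extra columns):

library(data.table)

second_part <- fread('data.dat', skip = 3169933, header = FALSE)
setnames(second_part, names(first_part))  # 136 original names + extra_1, extra_2
df <- rbind(first_part, second_part)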
Here is a sample of the XML format in my dataset:
<info>
<a>1990-01-02T06:58:12+08:00</a>
<b>120.980</b>
<c>23.786</c>
<d>18.7</d>
<e>2</e>
</info>
<info>
<a>1990-02-02T06:58:12+08:00</a>
<b>120.804</b>
<c>23.790</c>
</info>
But the number of occurrences is not the same for every tag; for example, there are 4000 records with tags a, b, and c, but only 3950 records with tags d and e.
Here is my code in R:
library(xml2)
xml_data <- read_xml("my_data.xml")  # illustrative path; xml_data holds the parsed dataset
df <- data.frame(Time = xml_text(xml_find_all(xml_data, ".//a")),
                 Num  = xml_text(xml_find_all(xml_data, ".//b")),
                 Dist = xml_text(xml_find_all(xml_data, ".//c")),
                 Gap  = xml_text(xml_find_all(xml_data, ".//d")),
                 Type = xml_text(xml_find_all(xml_data, ".//e")),
                 stringsAsFactors = FALSE)
The error message is (I knew this would happen):
arguments imply differing number of rows
The output I want will be like the table below:
Time Num Dist Gap Type
1990-01-02T06:58:12+08:00 120.980 23.786 18.7 2
1990-02-02T06:58:12+08:00 120.804 23.790 <NA> <NA>
...
1993-03-03T08:42:15+08:00 120.412 23.523 <NA> 1
Which function or library should I try for this?
Thanks for helping me!
I have also tried some other methods, like map_if.
Finally I found the solution!!
When working with an XML file, be sure to get the root node of the records first.
Here I will show you how it works.
Take this XML file as an example (save it as test.xml):
<dataset>
<dataset_info>
<data_count>2</data_count>
<status>Actual</status>
</dataset_info>
<data>
<time>2019-06-01</time>
<event>event1</event>
<describe>describe for event1</describe>
</data>
<data>
<time>2019-06-02</time>
<event>event2</event>
</data>
</dataset>
We know that the describe tag is missing in event2, but we still hope to make a data frame from this XML data. I was taught to use the function xml2::xml_find_all to get the values of a selected tag.
With R code like this:
# library import
library(xml2)
# file reading
xml <- read_xml("path/where/the/file/is/test.xml")
data.frame(Time = xml_text(xml_find_all(xml, ".//time")),
           Event = xml_text(xml_find_all(xml, ".//event")),
           Describe = xml_text(xml_find_all(xml, ".//describe")))
Then we get the error message arguments imply differing number of rows.
So what we need to do is get the record nodes first, and then query each record with xml_find_first(), which returns NA when a tag is missing!!
As in the code below:
# library import
library(xml2)
# file reading
xml <- read_xml("path/where/the/file/is/test.xml")
record <- xml_find_all(xml, ".//data")
data.frame(Time = xml_text(xml_find_first(record, ".//time")),
           Event = xml_text(xml_find_first(record, ".//event")),
           Describe = xml_text(xml_find_first(record, ".//describe")))
After adding record <- xml_find_all(xml, ".//data") and switching to xml_find_first(), we no longer get the error caused by differing numbers of results; the record without a describe tag simply gets NA.
Hope this helps!!
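For the original a/b/c/d/e format from the question, the same pattern looks roughly like this (the file path is hypothetical, and I assume the <info> records sit under a single root element so read_xml can parse the file):

library(xml2)

doc <- read_xml("data.xml")               # hypothetical path
records <- xml_find_all(doc, ".//info")   # one node per record

# xml_find_first returns one (possibly missing) match per record,
# so rows lacking <d> or <e> come back as NA instead of shifting
df <- data.frame(Time = xml_text(xml_find_first(records, ".//a")),
                 Num  = xml_text(xml_find_first(records, ".//b")),
                 Dist = xml_text(xml_find_first(records, ".//c")),
                 Gap  = xml_text(xml_find_first(records, ".//d")),
                 Type = xml_text(xml_find_first(records, ".//e")),
                 stringsAsFactors = FALSE)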
I'm trying to write an xlsx file from a list of data frames that I created, but I'm getting an error due to missing data (I couldn't download it). I just want to write the xlsx file despite this missing data. Any help is appreciated.
For replication of the problem:
library(quantmod)
name_of_symbols <- c("AKER","YECO","SNOA")
research_dates <- c("2018-11-19","2018-11-19","2018-11-14")
my_symbols_df <- lapply(name_of_symbols, function(x) tryCatch(getSymbols(x, auto.assign = FALSE),error = function(e) { }))
my_stocks_OHLCV <- list()
for (i in 1:3) {
  trade_date <- paste(as.Date(research_dates[i]))
  OHLCV_data <- my_symbols_df[[i]][trade_date]
  my_stocks_OHLCV[[i]] <- data.frame(OHLCV_data)
}
And you can see the missing data down here in my_stocks_OHLCV[[2]], along with the write.xlsx error I'm getting:
print(my_stocks_OHLCV)
[[1]]
AKER.Open AKER.High AKER.Low AKER.Close AKER.Volume AKER.Adjusted
2018-11-19 2.67 3.2 1.56 1.75 15385800 1.75
[[2]]
data frame with 0 columns and 0 rows
[[3]]
SNOA.Open SNOA.High SNOA.Low SNOA.Close SNOA.Volume SNOA.Adjusted
2018-11-14 1.1 1.14 1.01 1.1 107900 1.1
write.xlsx(my_stocks_OHLCV, "C:/Users/MICRO/Downloads/Datasets_stocks/dux_OHLCV.xlsx")
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 1, 0
How do I run write.xlsx even though I have this missing data?
The main question you need to ask is: what do you want instead?
As you are working with stock data, the best idea is that if you don't have data for a stock, you remove it. Something like this should work:
my_stocks_OHLCV[sapply(my_stocks_OHLCV, nrow) > 0]
If you want a row full of NA or 0 instead, use lapply and, for each element of the list with zero rows, replace it with NAs, a vector of 0s (c(0,0,0,0,0,0)), etc.
Something like this,
condition <- sapply(my_stocks_OHLCV, nrow) == 0
my_stocks_OHLCV[condition] <- list(data.frame(t(rep(NA, 6))))
Here we define the condition variable to pick out the elements of the list where you don't have any data. We can then replace those with NA, or swap the NA for 0. However, I can't think of a reason to do this.
A variation on your question, one you could handle inside your for loop, is to check whether you have data and, if you don't, fill the values in with NAs and give them the correct headers, since you know which stock each element relates to.
Hope this helps.
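Putting the first suggestion together with the original write.xlsx call, a minimal sketch (assuming write.xlsx comes from the openxlsx package, which writes one sheet per list element):

library(openxlsx)

# Keep only the stocks for which we actually have data
non_empty <- Filter(function(x) nrow(x) > 0, my_stocks_OHLCV)
write.xlsx(non_empty, "C:/Users/MICRO/Downloads/Datasets_stocks/dux_OHLCV.xlsx")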
I am a new user of R and am trying to use the mRMRe R package (mRMR is one of the good and well-known feature selection approaches) to obtain a feature subset from a feature set. Please excuse me if my question is simple; I really want to know how I can fix an error. Below are the details.
Suppose I have a csv file (gene.csv) with a feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates the positive class and '-1' the negative class). Here's a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get the best feature subset of 2 attributes (out of the above 6) and wrote the following R code:
library(mRMRe)
file_n <- paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
              feature_count = 2, solution_count = 1)
When I run this code, I get the following error from the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the csv file are real numbers. So, how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), as my target class variable is named "[Output]" in the gene.csv file.
I would appreciate it very much if anyone could help me obtain the best feature subset from the gene.csv file using the mRMRe R package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n <- paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
                        feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the 2 selected features.
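If you want the feature names rather than their indices, a small follow-up (featureNames() is part of the mRMRe API, but double-check it against your installed version):

idx <- solutions(results)[[1]]    # matrix of selected feature indices, one column per solution
featureNames(f_data)[idx]         # map indices back to the column names in gene.csv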
I think it has to do with your Output column, which is probably of class integer. You can check that using class(df[[7]]).
To convert it to numeric, as the error requires, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question, after reading the documentation, setting target_indices = 7 seems the right choice.
I have a csv file that has the following format:
1 3 1 4
1415670_at 1 8.512147859 8.196725061 8.174426394 8.62388149
1415671_at 2 9.119200527 9.190318548 9.149239039 9.211401637
1415672_at 3 10.03383593 9.575728316 10.06998673 9.735217522
1415673_at 4 5.925999419 5.692092375 5.689299161 7.807354922
I then manipulated this data by deleting the columns that are not 1 or 2:
m <- read.csv("table.csv")
smallerdat <- m[ c(1,2, grep("^X1$|^X2$|X1\\.|X2\\." , names(m) ) ) ]
Now I want to save these results to a csv file again, so I do this:
write.csv(smallerdat,"tablemodified.csv",ncolumns=length(smallerdat),sep=",")
but I got an error that says:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
The question I have is how I can store the modified table in a csv file.
Any help?
The write.csv function needs to have the file name given as a named argument (as do all of the write.* cousins). Try this instead (edited):
write.csv(smallerdat, file="tablemodified.csv" )
And my original guess applies to the save() function rather than the write.table variants.
I was about to tell you to read ?read.csv and note the "See Also" section that points to write.csv... but it doesn't.
So, use write.csv. :)
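And if you ever do need control over the separator or quoting, write.table is the cousin that accepts those arguments; a sketch equivalent to the write.csv call above:

# write.csv fixes sep = "," itself; write.table exposes it explicitly
write.table(smallerdat, file = "tablemodified.csv",
            sep = ",", row.names = FALSE, col.names = TRUE, qmethod = "double")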