I wish to open and read the following text file in Scilab (version 6.0.2).
The original file is an .xlsx that I have converted to both .txt and .csv with Excel, to make it easier to open and work with in Scilab.
Using both fscanfMat and csvRead, Scilab reads only the first column, as NaN. I understand why the first column is treated as NaN, but I do not see why the rest of the document isn't read. Columns 2 and 3 are of particular interest to me.
For csvRead, I used:
M=csvRead(chemin+filename," ",",",[],[],[],[],7);
to skip the 7-row header.
Could it be something to do with the way in which the file has been formatted?
For anyone able to help, I will try to upload an example .txt file and also the original .xlsx file.
Files available for download here: Excel and Text files
If you convert your .xlsx file into an .xls one with Excel, you can read it with the readxls function.
Your separator is a tab character (ASCII code 9). Use the following command:
M=csvRead("Probe1_350N_2S.txt",ascii(9),",",[],[],[],[],7);
Related
I am trying to read a CSV file in R, but when I run read_csv() I get a weird paint-like symbol for some rows, even though the raw CSV file displays correctly. I have tried reading it with read.csv() and also converting the file to Excel and reading it with read_xlsx(), but I get the same weird symbol. I am guessing it has something to do with the encoding, but I am not sure what to do. Any suggestions?
I tried to download a file from LSData, but it brings me to a page full of weird characters. The first few are:
7z¼¯'�DÙ™µUa�����b�������’³_èÚ†à]�&Jgl›Ü)ÉZKŒP7þò|¤ˆëÁëxŠ§u6²ã]’“Àé3lGê7ñ"!èÞ’ïjP³
l½Öv<¹-žøZ¹Æ âäùëOKä#;cÞ Žmï•&?^¢Ø"Á.=ù‚u|õ9žG<އ趽ÈËŒøÂtŠÍÝê/ÂG×à×–R§Ýj×zÛ¥™éwG—ï‘ývíõåò ÂÑ\‡W�ܱò§úßxlø¾Ö¾EºáPnÚR"økv§}6“SLÒ¢ø€m]-Ì«gÐáÅMŠWGU�µOÿDõ™}u¦HŠ_qŠ,/¦lÔ}Áô|,Òäêÿ2l«ª»°úö¡]+€™´í¿¢«|Ãw#êñ:t!
I have no clue what I'm looking at. How can I convert this entire page into a CSV, or into any other format I can use in R?
It is a 7z-compressed file; you can download and unzip it to get the CSV file.
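If you want to stay inside R, a minimal sketch, assuming the download is saved as lsdata.7z (a name made up here) and that the archive package, a libarchive wrapper that can read 7z archives, is installed:
library(archive)
# archive_read() returns a connection to a file inside the archive;
# by default it points at the first entry, assumed here to be the CSV.
df <- read.csv(archive_read("lsdata.7z"))
head(df)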
I have tried my best to read a CSV file in R but failed. I have provided a sample of the file at the following Google Drive link.
Data
I found that it is a tab-delimited file by opening it in a text editor. The file opens in Excel without issues, but when I try to read it in R with the readr package or the base R functions, it fails, and I am not sure why. I have tried different encodings, like UTF-8, UTF-16, and UTF-16LE. Could you please help me write the correct script to read this file? Currently, I am re-saving the file from Excel as comma-delimited in order to read it in R, but I am sure there must be something I am doing wrong. Any help would be appreciated.
Thanks
Amal
PS: What I don't understand is how Excel reads the file without any parameters being provided. Can we build the same logic in R to read any file?
This is a Windows-related encoding problem.
When I open your file in Notepad++ it tells me it is encoded as UCS-2 LE BOM. There is a trick to reading files with unusual encodings into R. In your case this seems to do the trick:
read.delim(con <- file("temp.csv", encoding = "UCS-2LE"))
(adapted from R: can't read unicode text files even when specifying the encoding).
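If you prefer the readr functions you were already using, an equivalent sketch, assuming UCS-2 LE can be read as UTF-16LE (true for characters in the Basic Multilingual Plane):
library(readr)
# locale() tells readr how the bytes are encoded before it parses the tabs
df <- read_tsv("temp.csv", locale = locale(encoding = "UTF-16LE"))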
BTW "CSV" stands for "comma separated values". This file has tab-separated values, so you should give it either a .tsv or .txt suffix, not .csv, to avoid confusion.
As for your second question: could we build the same logic in R to guess the encoding and delimiter and read many types of file without stating them explicitly? Yes, this would certainly be possible; whether it is desirable, I'm not sure.
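As a rough sketch of that idea, readr already ships an encoding guesser, and the guess can be fed back into a connection (same placeholder file name as above; the guess is not guaranteed to be right):
library(readr)
guess_encoding("temp.csv")                      # lists candidate encodings with confidence scores
enc <- guess_encoding("temp.csv")$encoding[1]   # take the top guess, e.g. "UTF-16LE"
df <- read.delim(file("temp.csv", encoding = enc))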
I have a file in Excel with a column of Simplified Chinese characters. When I open the corresponding CSV file in R, I only get ?'s.
I'm afraid the problem occurs when exporting from Excel to CSV, because when I open the CSV file in a text editor I also get ?'s.
How can I get around this?
The best way to preserve your Chinese/Unicode characters is to read the file from .xlsx:
library(readxl)
read_xlsx("yourfilepath.xlsx", col_types = "text")
If your file is too big to read from .xlsx, then the best way is to open it in Excel and split it manually into multiple files.
(My experience, on a laptop with 8 GB RAM, is to split files into 250,000 rows x 106 columns.)
If you need to read from .csv, all your Windows settings/localization need to match your file, and even that does not guarantee the integrity of all your Unicode characters (e.g. emojis).
(If you also need a .csv for something else, you can use the R function write.csv after you read the data from .xlsx into R.)
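For that last step, a sketch of the round trip (file names illustrative): readr's write_excel_csv() writes UTF-8 with a byte-order mark, which Excel needs in order to re-open the characters intact.
library(readxl)
library(readr)
dat <- read_xlsx("yourfilepath.xlsx", col_types = "text")
write_excel_csv(dat, "yourfilepath_utf8.csv")   # UTF-8 CSV with BOM, Excel-friendly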
I tried to download a CSV file from http://census.ire.org/data/bulkdata.html
I tried this code:
download.file("http://censusdata.ire.org/09/all_060_in_09.PCT7.csv")
It works, but the CSV file I get is made of symbols and not numbers. I can read it from the R console, but some files have the wrong number of columns.
Why?
TY
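One guess from those symptoms, not a verified diagnosis: the server may be sending the file gzip-compressed despite the .csv name, and binary downloads should be saved with mode = "wb" (this matters on Windows). A sketch:
url <- "http://censusdata.ire.org/09/all_060_in_09.PCT7.csv"
download.file(url, destfile = "all_060_in_09.csv", mode = "wb")   # save the bytes as-is
dat <- read.csv("all_060_in_09.csv")
# If it still reads as symbols, try decompressing on the fly:
# dat <- read.csv(gzfile("all_060_in_09.csv"))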