How to read .csv files in Matlab as you would in R?

I have a data set that is saved as a .csv file that looks like the following:
Name,Age,Password
John,9,\i1iiu1h8
Kelly,20,\771jk8
Bob,33,\kljhjj
In R I could open this file by the following:
X = read.csv("file.csv",header=TRUE)
Is there a default command in Matlab that reads .csv files with both numeric and string variables? csvread seems to only like numeric variables.
One step further, in R I could use the attach function to create variables associated with the columns and column headers of the data set, i.e.,
attach(X)
Is there something similar in Matlab?

Although this question is close to being an exact duplicate, the solution suggested in the link provided by @NathanG (i.e., using xlsread) is only one possible way to solve your problem. The author in the link also suggests using textscan, but doesn't provide any information about how to do it, so I thought I'd add an example here:
%# First we need to get the header-line
fid1 = fopen('file.csv', 'r');
Header = fgetl(fid1);
fclose(fid1);
%# Convert Header to cell array
Header = regexp(Header, '([^,]*)', 'tokens');
Header = cat(2, Header{:});
%# Read in the data
fid1 = fopen('file.csv', 'r');
D = textscan(fid1, '%s%d%s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid1);
Header should now be a row vector of cells, where each cell stores one header string. D is a row vector of cells, where each cell stores a column of data.
There is no way I'm aware of to "attach" D to Header. If you wanted, you could put them both in the same structure though, i.e.:
S.Header = Header;
S.Data = D;

Matlab's new table class makes this easy:
x = readtable('file.csv');
By default this will parse the headers, and use them as column names (also called variable names):
>> x
x =
     Name      Age     Password
    _______    ___    ___________
    'John'      9     '\i1iiu1h8'
    'Kelly'    20     '\771jk8'
    'Bob'      33     '\kljhjj'
You can select a column using its name etc.:
>> x.Name
ans =
'John'
'Kelly'
'Bob'
Available since Matlab R2013b.
See www.mathworks.com/help/matlab/ref/readtable.html

I liked this approach, which is supported in Matlab 2012.
path = 'C:\folder1\folder2\';
file = 'data.csv';
data = dataset('xlsfile', fullfile(path, file));
Of course you could also do the following:
[file, path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile', fullfile(path, file));

Related

R - Export large dataframe into CSV

Beginner here: I have a list (see screenshot) called Coins_list from which I want to export the second dataframe stored in it, called data, into a csv. When I use the code
write.csv(Coins_list$data, file = "Coins_list_full_data.csv")
I get a huge CSV with a bunch of numbers from the column named price, which apparently contains more dataframes (if I read the output correctly), or at least that is how the data in the price column is displayed. How can I export this dataframe to CSV correctly? See screenshot for more details.
EDIT: I was able to get the first four rows into CSV by using df2 <- Coins_list$data; write.csv(df2[1:4,], file="BTC_row.csv"), however it now looks like R puts the prices of all four rows within a list c( ) and repeats it in each row. Any idea how to change that?
(I would post this as a comment, but I have too little reputation.)
Hey, for starters you could try to flatten the JSON by going one level deeper than the response list's $content, i.e., looking at what is inside the content with another $.
Otherwise you could try getting data$price and seeing what pops up from there.
Something like this:
names_list = data$symbol
df = data.frame(price = NA, symbol = NA)
for (i in seq_along(data$price)) {
  x = data.frame(price = data$price[i], symbol = names_list[i])
  df = rbind(df, x)
}
to get a dataframe with price and symbol. I don't know how the data is nested, so I'm just guessing.
It would be helpful to know from where you got the data for reproducibility.
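If the price column really is a list-column, one option is to expand it before writing. Here is a minimal sketch, assuming Coins_list$data is a data frame whose price column holds one nested vector (or sub-data-frame) per coin, and that the tidyr package is available:
library(tidyr)
df2  <- Coins_list$data
flat <- unnest(df2, cols = price)   # one row per nested price entry
write.csv(flat, file = "Coins_list_full_data.csv", row.names = FALSE)
If you would rather keep one row per coin, you could instead collapse each nested entry into a single string, e.g. df2$price <- sapply(df2$price, toString), before calling write.csv().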

read.csv importing two columns instead of one

I'm trying to import a csv file into a vector. There are 100 entries in this csv file, and this is what the file looks like:
My code reads as follows:
> choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")
> choice_vector
And yet, when I try to display said vector, it shows up as:
It is somehow creating a second column, and I cannot figure out why. In addition, trying to write to a new csv file writes the contents of that second column as well.
The second column was "habilitated" (i.e., enabled) in Excel.
Option 1: Manually delete the column in Excel.
Option 2: Drop all columns that are entirely NA:
choice_vector2 <- choice_vector[,colSums(is.na(choice_vector))<nrow(choice_vector)]
If you are only interested in reading the first column:
choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")[,1]
Good luck!
Short answer:
You have an issue with your data file, but
choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")$V1
should create the vector that you're expecting.
Long answer:
The read.csv function returns a data frame and you need to address a particular column within the data frame with the $ operator in order to extract that column as a vector. As for why you have an unexpected column of NAs, your CSV probably codes for two columns. When you read a CSV with R, a comma indicates a data field to its right. If you look at your CSV with a text editor, I'm guessing it'll look like this:
A,
B,
D,
A,
A,
F,
The absence of anything (other than another comma or a line break) to the right of a comma is interpreted as NA.
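For illustration (assuming choices.csv contains the trailing commas sketched above), you can check the phantom column before extracting the first one:
df <- read.csv("choices.csv", header = FALSE, fileEncoding = "UTF-8-BOM")
ncol(df)             # 2: V1 holds the letters, V2 is the extra column
all(is.na(df$V2))    # TRUE, every field to the right of a comma was empty
choice_vector <- df$V1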
If we are using fread from data.table, there is a select option to select only the columns of interest
library(data.table)
dt <- fread("choices.csv", select = 1)
Other than that, it is not clear why the issue happens. It could be some strange white space. If that is the case, specify strip.white = TRUE (by default it is FALSE):
read.csv("choices.csv", header = FALSE,
         fileEncoding = "UTF-8-BOM", strip.white = TRUE)
Or, as we commented, copy the columns of interest into a new file, save it, and then read it with read.csv.

Write SAS XPORT file in R specifying length larger than the largest actual value for a character variable

How would one write an R data frame to the SAS xpt format and specify the length of each column? For example, in a column of text variables the longest string is 157 characters, however I'd like field length attribute to have 200 characters.
The package haven does not seem to have this option and the package SASxport's documentation is less than clear on this issue.
The SASformat() and SASiformat() functions are used to set an attribute on an R object that sets its format when written to a SAS xport file. To set a data frame column to a 200 character format, use the following approach:
SASformat(mydata$var) <- 'CHAR200.'
SASiformat(mydata$var) <- 'CHAR200.'
Then use write.xport() to write the data frame to a SAS xport format.
See page 17 of the SASxport package documentation for details.
SASxport is an old package, so you'll need to load an older version of Hmisc to get it to work properly, per another SO question.
However, on reading the file into SAS it uses the length of the longest string in any observation to set the length of the column, regardless of the format and informat attributes. Therefore, one must write at least one observation containing trailing blanks to the desired length in order for SAS to set the length to the desired size. Ironically, this makes the format and informat superfluous.
This can be accomplished with the str_c() function from the stringr package.
Putting it all together...
library("devtools")
install_version("Hmisc", version = "3.17-2")
library(SASxport)
library(Hmisc)
## manually create a data set
data <- data.frame(x = c(1, 2, NA, NA),
                   y = c('a', 'B', NA, '*'),
                   z = c("this is a test", "line 2", "another text string", "bottom line"))
# workaround - extend the string variable to desired length (30 characters) by
# adding trailing blanks, using stringr::str_c() function
library(stringr)
data$z <- sapply(data$z, function(x) str_c(x, str_dup(" ", 30 - nchar(x))))
nchar(data$z)
# write to SAS XPORT file
tmp <- tempfile(fileext = ".dat")
write.xport( data, file = tmp )
We'll read the file into SAS and use lengthc() to check the size of the z column.
libname testlib xport '/folders/myfolders/xport.dat';
proc copy in=testlib out=work;
run;
data data;
set data;
lenZ = lengthc(z);
run;
...and the output:

Importing google sheets file that contains multiple headers (Using R)

I am currently trying to import a Google Sheets cross-tabular file into R that contains 3 header rows that I would like to combine (first row = Year, second row = Quarter, third row = Week).
Most packages in R allow you to select only one header row and to 'skip' rows until you find the observations; however, I can't seem to find one that allows you to select multiple header rows at once.
Can anyone help?
You might read the first 3 rows first (n_max = 3), create tidy headers from them, then read the rest (skipping the 3 header rows), and finally attach your headers to the data. Something like:
library(readr)
headers <- read_csv(file, col_names = FALSE, n_max = 3)
names(headers) <- paste0(headers[1, ], headers[2, ], headers[3, ])
data <- read_csv(file, col_names = FALSE, skip = 3)
names(data) <- names(headers)
(The code probably needs some debugging, though.)
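As a self-contained illustration of that pattern (using a hypothetical toy file in place of the Google Sheets export):
library(readr)
tmp <- tempfile(fileext = ".csv")
writeLines(c("2023,2023,2023",
             "Q1,Q1,Q2",
             "W1,W2,W1",
             "10,20,30",
             "40,50,60"), tmp)
hdr <- read_csv(tmp, col_names = FALSE, n_max = 3)   # the 3 header rows
dat <- read_csv(tmp, col_names = FALSE, skip = 3)    # the observations
names(dat) <- paste(hdr[1, ], hdr[2, ], hdr[3, ], sep = "_")
dat   # columns are now named 2023_Q1_W1, 2023_Q1_W2, 2023_Q2_W1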

Run separate functions on multiple elements of list based on regex criteria in data.frame

The following works, but I'm missing a functional programming technique, an indexing scheme, or a better way of structuring my data. A month from now it will take a while to remember exactly how this works, when it should be easy to maintain. It feels like a workaround when it shouldn't be. I want to use a regex to decide which function to use for each expected group of files. When a new file format comes along, I can write the read function, then add that function along with its regex to the data.frame so it runs alongside all the rest.
I have different formats of Excel and csv files that need to be read in and standardized. I want to maintain a list or data.frame of the filename regexes and the appropriate function to use for each. Sometimes there will be new file formats that won't be matched, and old formats without new files, but then it gets complicated, which I would prefer to avoid.
# files to read in based on filename
fileexamples <- data.frame(
  filename = c('notanyregex.xlsx', 'regex1today.xlsx', 'regex2today.xlsx', 'nomatch.xlsx',
               'regex1yesterday.xlsx', 'regex2yesterday.xlsx', 'regex3yesterday.xlsx'),
  readfunctionname = NA
)
# regex and corresponding read function
filesourcelist <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
greptext readfunction
'.*regex1.*' 'readsheettype1'
'.*nonematchthis.*' 'readsheetwrench'
'.*regex2.*' 'readsheettype2'
'.*regex3.*' 'readsheettype3'
")
# list of grepped files: index of filenames matched by each regex
fileindex <- lapply(filesourcelist$greptext, function(greptext, files) {
  grep(pattern = greptext, x = files, ignore.case = TRUE)
}, files = fileexamples$filename)
# record the matching read function for each file based on fileindex from grep
for (i in seq_along(fileindex)) {
  fileexamples[fileindex[[i]], 'readfunctionname'] <- filesourcelist$readfunction[i]
}
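Once readfunctionname is filled in, the remaining step is to actually call the matched function for each file. A minimal sketch of that dispatch, with hypothetical stub readers standing in for the real readsheettype1()/readsheettype2()/readsheettype3() functions (which are not defined in the question):
# hypothetical stubs so the sketch runs; replace with the real readers
readsheettype1 <- function(f) data.frame(file = f, source = "type1")
readsheettype2 <- function(f) data.frame(file = f, source = "type2")
readsheettype3 <- function(f) data.frame(file = f, source = "type3")
# look each assigned function name up with match.fun() and call it on the file
results <- lapply(seq_len(nrow(fileexamples)), function(i) {
  fn_name <- fileexamples$readfunctionname[i]
  if (is.na(fn_name)) return(NULL)   # no regex matched this file
  match.fun(fn_name)(fileexamples$filename[i])
})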
