read.csv directly into character vector in R

This code works, but I wonder if there is a more efficient way. I have a CSV file containing a single column of ticker symbols. I read this CSV into R and apply functions to each ticker using a for loop.
After reading in the CSV, I go into the data frame and pull out the character vector that the for loop needs to run properly.
SymbolListDataFrame = read.csv("DJIA.csv", header = FALSE, stringsAsFactors=F)
SymbolList = SymbolListDataFrame[[1]]
for (Symbol in SymbolList){...}
Is there a way to combine the first two lines I have written into one? Maybe read.csv is not the best command for this?
Thank you.
UPDATE
I am using the readLines method suggested by Jake and Bartek. There is a warning ("incomplete final line found") on the CSV file, but I ignore it since the data is correct.

SymbolList <- readLines("DJIA.csv")

SymbolList <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors=F)[[1]]

The readLines function is the best solution here.
Please note that the read.csv function is not only for reading files with the .csv extension. It is simply the read.table function with parameters such as header and sep set differently. Check the documentation for more info.
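
For illustration, a minimal sketch of that equivalence (file name taken from the question; for a simple, well-formed comma-separated file the two calls should produce the same data frame):

a <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors = FALSE)
b <- read.table("DJIA.csv", header = FALSE, sep = ",",
                stringsAsFactors = FALSE)
identical(a, b)  # TRUE here: read.csv() is read.table() with preset defaults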

Related

How to write a file so that it does not have commas and remains in the same format in R

I am reading file in R:
data <- read.delim("file.fas", header=TRUE, sep="\t")
However, after I have done some manipulations to the data, the output format is not the same; it now contains commas all over.
write.table(x= data, file = "file_1.fas")
How can I avoid this? Maybe I should use some different function to write a file?
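
A minimal sketch of the usual fix: write the file back with the same tab separator it was read with, and disable quoting and row names (whether this removes the commas depends on where they came from):

# Write back tab-separated, matching the format read.delim() read;
# quote = FALSE and row.names = FALSE suppress the extra characters
# write.table() adds by default.
write.table(data, file = "file_1.fas", sep = "\t",
            quote = FALSE, row.names = FALSE)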

read.csv: reading a CSV file where cells are wrapped as ="{value}"

I export my CSV file with Python; numbers are wrapped as ="10000000000" in cells, for example:
name,price
"something expensive",="10000000000",
To display the numbers correctly, I prefer to wrap big numbers or strings of digits (such as order IDs) in this format, so that someone can open the file directly without reformatting the column.
This displays correctly in Excel or Numbers, but when I import the file into R using read.csv, the cells' values show as =10000000000.
Is there any solution to this?
Thank you
How about:
yourcsv <- read.csv("yourcsv.csv")
# strip the leading "=" from the affected column and convert back to numeric
yourcsv$price <- as.numeric(gsub("=", "", yourcsv$price))
Also, in my experience read_csv() from the readr package (part of the tidyverse) reads data in much faster than read.csv(), and I think it also has more logic built in for non-ideal cases, so it may be worth trying.
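
A short sketch of that route (file and column names taken from the example above; readr still reads the wrapped cells as text, so they need the same cleanup):

library(readr)
orders <- read_csv("yourcsv.csv")
# parse_number() drops non-numeric characters such as the leading "="
orders$price <- parse_number(orders$price)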

Asking for a fix for R backtick operator

I am trying to import data in xls format into R, but it reads the header incorrectly: instead of
X1
R interprets the column name as
`X1 `
which makes writing complicated R syntax impossible.
How can this issue be resolved?
One can skip the header record and supply your own column names with any number of R packages that read Excel data. Here is an example with readxl::read_excel().
library(readxl)
# Skip the broken header row; col_names = FALSE assigns default names,
# or pass a character vector of custom names instead.
data <- read_excel("./data/anExcelWorksheet.xlsx",
                   col_names = FALSE,
                   skip = 1)
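
Alternatively, a minimal base-R sketch that keeps the header row and just trims the stray whitespace from the names:

data <- read_excel("./data/anExcelWorksheet.xlsx")
names(data) <- trimws(names(data))  # "X1 " becomes "X1"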

How to use wildcards to define col_type when using readr?

I asked a few days ago how to set a specific column type when using the readr package: big integers when reading file with readr in r
Is there a way to define the column types by wildcard? In my case, I sometimes have several columns starting with Intensity plus an appendix that depends on the experiment. It is hard to use read_tsv in a function if you do not know upfront which project names were used.
So something like col_types = cols('Intensity.*' = col_double()) would be awesome.
Does anyone have an idea how to get this feature?
EDIT:
Maybe something like: read the first two lines, grep 'Intensity' in the names, and then somehow build the parameter, like cols(Intensity=col_double(), 'Intensity pg'=col_double(), 'Intensity hs'=col_double()).
But I have no idea how to create this parameter value on the fly.
I am adding the answer that solved my question, based on the comment by lukeA:
read_MQtsv <- function(file) {
  require('readr')
  # Read only the header row to discover the actual column names
  jnk <- read.delim(file, nrows = 1, check.names = FALSE)
  matches <- grep('Intensity|LFQ|iBAQ', names(jnk), value = TRUE)
  # Build a named list of collectors: one col_double() per matched column
  read_tsv(file,
           col_types = setNames(rep(list(col_double()), length(matches)),
                                matches))
}
So I adapted the single line from the comment into a function that I use when reading my special files, which are produced by a program called MaxQuant.
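
For illustration, a usage sketch (the file name is hypothetical; MaxQuant output names vary by project):

df <- read_MQtsv("proteinGroups.txt")
str(df)  # all Intensity/LFQ/iBAQ columns are now doubles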

Reading specific column of multiple files in R

I have used the following code to read multiple .csv files in R:
Assembly<-t(read.table("E:\\test\\exp1.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly","f"))[1:4416,"Assembly",drop=FALSE])
Top1<-t(read.table("E:\\test\\exp2.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top1","f"))[1:4416,"Top1",drop=FALSE])
Top3<-t(read.table("E:\\test\\exp3.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top3","f"))[1:4416,"Top3",drop=FALSE])
Top11<-t(read.table("E:\\test\\exp4.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top11","f"))[1:4416,"Top11",drop=FALSE])
Assembly1<-t(read.table("E:\\test\\exp5.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly1","f"))[1:4416,"Assembly1",drop=FALSE])
Area<-t(read.table("E:\\test\\exp6.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Area","f"))[1:4416,"Area",drop=FALSE])
data<-rbind(Assembly,Top1,Top3,Top11,Assembly1,Area)
All the data is in the folder "test" on the E drive. Is there a simpler way in R to read multiple .csv files, with a couple of lines of code or some sort of function call, to substitute for what has been written above?
(Untested code; no working example available.) Try this: use the list.files function to generate the correct file names (full.names = TRUE keeps the path), then use colClasses as an argument to read.table to throw away the first four columns (and, since that vector is recycled, you will also throw away the sixth column). Note the files are pipe-separated, so sep = "|" is needed:
lapply(list.files("E:\\test\\", pattern = "^exp[1-6]", full.names = TRUE),
       read.table, sep = "|",
       colClasses = c(rep("NULL", 4), "numeric"), nrows = 4416)
If you want this returned as a single data frame, wrap data.frame() around it (one column per file).
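
A fuller sketch that also reproduces the six-row layout built by the rbind() call in the question (path, pattern, and row count as given there; assumes the files sort in exp1..exp6 order):

files <- list.files("E:\\test\\", pattern = "^exp[1-6]", full.names = TRUE)
# NA in colClasses keeps read.table's default type detection for column 5
cols <- lapply(files, read.table, sep = "|", header = FALSE,
               colClasses = c(rep("NULL", 4), NA, "NULL"), nrows = 4416)
# transpose each single-column data frame and stack: one row per file
data <- do.call(rbind, lapply(cols, t))
rownames(data) <- c("Assembly", "Top1", "Top3", "Top11", "Assembly1", "Area")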
