read.csv() in R does not produce a vector - r

I am trying to load data.csv in R using
S<-read.csv(file="data.csv")
Since it is a single column of numbers (I believe tab-delimited) without a header, I was hoping for S to be a vector. But S displays as
X40.87
1 40.69
2 40.94
... ...
(The numbers 40.87, 40.69, ... are my numbers.)
To access the third number, I need to invoke S[2,1]. Why not S[3]?

Use scan()
S <- scan("file.csv")
S[3]
# 40.94
Alternatively, as billinkc said, you can use read.csv("file.csv", header=FALSE) or just read.table("file.csv"), since the delimiters aren't relevant in a file with a single column.

Since your CSV has no header, you need to say so when you open the file; otherwise the interpreter will assign the first row as the column name.
Thus with input file like
40.87
40.69
40.94
Opening this with the same logic you used:
> s <- read.csv(file="~/Documents/r/data.txt",header=FALSE)
> s
V1
1 40.87
2 40.69
3 40.94
References
read.table {utils}

If you really just want a vector, subset the 1-column data frame:
read.csv(file="data.csv", header=FALSE)[,1]
This works because of the drop argument, which defaults to TRUE and drops the now-empty dimension (here, the column dimension).
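To see drop in action, here is a quick check using the text= argument in place of a file:
df <- read.csv(text = "40.87\n40.69\n40.94", header = FALSE)
df[, 1]                 # numeric vector, because drop defaults to TRUE
# [1] 40.87 40.69 40.94
df[, 1, drop = FALSE]   # keeps the 1-column data frame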


read.csv with check.names=F in R: why does it work a treat?

Please look at the column name "if" in the second column; the difference is that when check.names=F, the "." beside "if" disappears.
Sorry for the lack of code: I tried to type some code to generate a data.frame like the one in the picture, but I failed because of the "if". We know that "if" is a reserved word in R (like else, for, while, function). Here I deliberately used "if" as the column name (the 2nd column) to see whether R would do anything novel.
So, going another way, I typed the "if" into Excel and saved it in csv format so I could use read.csv.
Question is:
Why "if." changes to "if"?(After i use check.names=FALSE)
?read.csv describes check.names= in a similar fashion:
check.names: logical. If 'TRUE' then the names of the variables in the
data frame are checked to ensure that they are syntactically
valid variable names. If necessary they are adjusted (by
'make.names') so that they are, and also to ensure that there
are no duplicates.
The default action is to allow you to do something like dat$<column-name>, but unfortunately dat$if will fail with Error: unexpected 'if' in "dat$if", ergo check.names=TRUE changes it to something that the parser will not trip over. Note, though, that dat[["if"]] will work even when dat$if will not.
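A minimal sketch of the workarounds, with inline text standing in for a csv file:
dat <- read.csv(text = "x,if\n1,2", check.names = FALSE)
dat[["if"]]
# [1] 2
dat$`if`    # backticks also get the reserved word past the parser
# [1] 2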
If you are wondering if check.names=FALSE is ever a bad thing, then imagine this:
dat <- read.csv(text = "a,a\n2,3")
dat
# a a.1
# 1 2 3
dat <- read.csv(text = "a,a\n2,3", check.names = FALSE)
dat
# a a
# 1 2 3
In the second case, how does one access the second column by name? dat$a returns only the first one (2). However, if you don't want to use $ or [[, logical indexing on the column names, dat[,colnames(dat) == "a"], does return both of them.

Loading csv - One row of integers

Problem reading data vector - My csv data file (rab.csv) has just one row of > 10,000 numbers read into R with:
bab <- read.table("rab.csv") #which yields:
bab
V1
1 23,29,9,28,16,10,8,24,16,20,14,15,17,31,25,19,24,55,28,55,23, . . . and so on
In using this data vector, I get:
Error: data vector must consist of at least two distinct values!
It seems to only see the number "1" that was somehow added in front of the data.
I'm quite new to this so probably something simple, but I've spent 2 days searching every possibility I can think of without finding a solution.
We can use scan to read the file as a vector.
v1 <- scan("rab.csv", what=numeric(), sep=",")
In the read.table call, header defaults to FALSE, so the single column is named V1 and the leading 1 is just the row name of the one-row data frame; since no sep was specified, the whole comma-separated row was read as one value. (With read.csv, where header=TRUE is the default, a numeric first row would be taken as the header and get an X prefix, though that can be avoided with the check.names=FALSE argument.)
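A minimal sketch of the scan approach, with inline text standing in for rab.csv:
v1 <- scan(text = "23,29,9,28,16,10", what = numeric(), sep = ",")
v1
# [1] 23 29  9 28 16 10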

In R, how to read file with custom end of line (eol)

I have a text file to read in R (and store in a data.frame). The file is organized in several rows and columns. Both "sep" and "eol" are customized.
Problem: the custom eol, i.e. "\t&nd" (without quotations), can't be set in read.table(...) (or read.csv(...), read.csv2(...), ...) nor in fread(...), and I haven't been able to find a solution.
I have searched here ("[r] read eol" and other queries I don't remember) and found no solution: the only one was to preprocess the file to change the eol (not possible in my case, because some fields contain things like \n, \r, \n\r, ", ..., which is the reason for the customization).
Thanks!
You could approach this two different ways:
A. If the file is not too wide, you can read your desired rows using scan and split them into your desired columns with strsplit, then combine into a data.frame. Example:
# Provide reproducible example of the file ("raw.txt" here) you are starting with
your_text <- "a~b~c!1~2~meh!4~5~wow"
write(your_text,"raw.txt"); rm(your_text)
eol_str = "!" # whatever character(s) the rows divide on
sep_str = "~" # whatever character(s) the columns divide on
# read and parse the text file
# scan gives you an array of row strings (one string per row)
# sapply strsplit gives you a list of row arrays (as many elements per row as columns)
f <- file("raw.txt")
row_list <- sapply(scan("raw.txt", what=character(), sep=eol_str),
strsplit, split=sep_str)
close(f)
df <- data.frame(do.call(rbind,row_list[2:length(row_list)]))
row.names(df) <- NULL
names(df) <- row_list[[1]]
df
# a b c
# 1 1 2 meh
# 2 4 5 wow
B. If A doesn't work, I agree with @BondedDust that you probably need an external utility -- but you can invoke it in R with system() and do a find/replace to reformat your file for read.table. Your invocation will be specific to your OS. Example: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands . Since you note that you have \n, \r, and \n\r in your text already, I recommend that you first find and replace them with temporary placeholders -- perhaps quoted versions of themselves -- and then convert them back after you have built your data.frame.
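If the file fits in memory, that find/replace can also be sketched in R itself; here "\t&nd" is the custom eol from the question, "~" is a stand-in sep, and the placeholder strings are arbitrary:
txt <- readChar("raw.txt", file.size("raw.txt"))
txt <- gsub("\r\n", "<CRLF>", txt, fixed = TRUE)   # protect embedded line breaks first
txt <- gsub("\n", "<LF>", txt, fixed = TRUE)
txt <- gsub("\r", "<CR>", txt, fixed = TRUE)
txt <- gsub("\t&nd", "\n", txt, fixed = TRUE)      # then make the custom eol a real newline
df <- read.table(text = txt, sep = "~", header = TRUE)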

Variable cell specification for a csv in R

I wish to use a variable to specify a particular cell in a csv file. I can use:
emp1 <- read.csv("C:/Database/data/emp1.csv",as.is=TRUE)
numberofemployee <- 1
> emp1["1", "X.name"]
[1] "ALEX"
but if I use:
> emp1["numberofemployee", "X.name"]
[1] NA
I assume R is looking for numberofemployee as a column header.
How do I get it to see it as an integer so I can specify my cells?
csv file
#name,mon,tue,wed,thu,fri
ALEX,98,95,73,88,18
BRAD,66,25,72,8,32
JOHN,22,41,78,43,36
The problem is that you pass strings to []. Strings work when they match row or column names: "1" works because the default row names of a data frame are "1", "2", and so on. But when you pass the string "numberofemployee", R can only look for a row with that name, finds none, and returns NA. If you want to use the content of numberofemployee, you need to omit the quotes; R will then interpret it as an R object whose content is used for indexing:
emp1[numberofemployee, "X.name"]
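For example, with the csv above:
numberofemployee <- 3
emp1[numberofemployee, "X.name"]
# [1] "JOHN"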

Error in Math.data.frame...: non-numeric variable in data frame

I am reading a csv file into R and trying to take the log of the data. The csv file has columns of data with the first row having text headers and the rest numeric data.
data<-read.csv("rawdata.csv",header=T)
trans<-log(csv2)
I get the following error when I do this:
Error in Math.data.frame(list(Revenue = c(18766L, 20197L, 20777L,
23410L, : non-numeric variable in data frame: Costs
Output of str should have been inserted in Q-body:
'data.frame': 167 obs. of 3 variables:
$ X: int 18766 20197 20777 23410 23434 22100 22337 21511 22683 23151 ...
$ Y: Factor w/ 163 levels "1,452.70","1,469.00",..: 22 9 55 109 158 82 131 112 119 137 ...
$ Z: num 564 608 636 790 843 ...
How do I correct this?
Tada! Y is a factor - big problem. The commas shouldn't be in there.
Also, your original question has some anomalies: data is the loaded data.frame, yet the transformation is applied to csv2. Did you rename the object? If so, you've not given a full summary of the steps involved. Anyway, the issue is that you have commas in your second column.
EDIT: removed speculation about structure given that it has now been offered.
Data frames are lists, so lapply will loop over their columns and return the result of the math function applied to each.
If the column is a factor (and here str(Costs) would tell you) then you could take the possibly inefficient approach of converting all columns as if they were factors:
Costs_logged <- lapply(Costs, function(x) log(as.numeric(as.character(x))) )
Costs_logged
(See the FAQ about factor conversion to numeric.)
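The pitfall in miniature (a toy vector, not your data):
f <- factor(c(564, 608, 636))
as.numeric(f)                 # wrong: returns the level codes
# [1] 1 2 3
as.numeric(as.character(f))   # right: the original values
# [1] 564 608 636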
EDIT2: If you want to convert the factor variable with commas in the labels use this method:
data$Y <- as. numeric( gsub("\\,", "", as.character(data$Y) ) )
The earlier version of this only had a single-backslash, but since both regex and R use backslashes as escape characters, "special regex characters" (see ?regex for listing) need to be doubly escaped.
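That said, the comma is not a regex metacharacter, so a plain "," also matches, and fixed = TRUE sidesteps regex escaping entirely:
as.numeric(gsub(",", "", "1,452.70", fixed = TRUE))
# [1] 1452.7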
Can you give us the first few values for the variable that is giving you trouble? If the "Costs" variable is giving you trouble (what it looks like from your example), execute something like this:
data <- read.csv("rawdata.csv",header=T)
data[c(1:5),"Costs"]
It sounds as though you have a column of values in the csv file -- column Y -- that has commas in the numbers. That is, it sounds like your csv file looks like this:
X,Y,Z
"18766","1,452.70","564"
"20197","1,469.00","608"
or
X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608
or something similar. If this is the case, the problem is that column Y can't be read easily by R with a comma in it (even though it makes it easier for us humans to read). You need to get rid of those commas; that is, make your data file look like this:
X,Y,Z
18766,1452.70,564
20197,1469.00,608
(you can leave the quotes in -- just get rid of the commas in the numbers themselves).
There are a number of ways to do this. If you exported your data from excel, format that column differently. Or, alternatively, open the csv in excel, save it as a tab-delimited file, open the file in your favorite text editor, and find-and-delete the commas ("find and replace with nothing").
Then try to pull it back into R with your original command.
Clearly the columns are not all numeric, so just ensure that they are. You can do this by forcing the class of every column when read in:
data <- read.csv("rawdata.csv", colClasses = "numeric")
(read.csv is just a wrapper on read.table, and header = TRUE by default)
That will ensure all columns are of class numeric if that is in fact possible.
If they really are not numeric columns, exclude the ones you don't want to transform, or just work on the columns individually:
x <- data.frame(x = 1:10, y = runif(1, 2, 10), z = letters[1:10])
colClasses can be used to ignore columns by specifying "NULL" if that makes things simpler.
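For instance (inline text standing in for rawdata.csv), "NULL" drops the comma-formatted Y column entirely:
dat <- read.csv(text = 'X,Y,Z\n18766,"1,452.70",564\n20197,"1,469.00",608',
                colClasses = c("numeric", "NULL", "numeric"))
dat
#       X   Z
# 1 18766 564
# 2 20197 608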
These are equivalent since "x" and "y" are the first 2 columns:
log(x[ , 1:2])
log(x[ , c("x", "y")])
Individually:
log(x$x)
log(x$y)
It's always important to check assumptions about the data read from external sources. Basic checks like summary(x), head(x) and str(x) will show you what the data actually are.
