I want to read a matrix (all values, no null or empty column) from a tab-separated text file of integers and name the columns automatically (based on the titles in the first line):
a b c
9 2 3
2 9 6
3 2 4
5 3 3
I have tried read.csv(), read.table() and scan() and can read the file, but I want something that:
1- Automatically identifies the column names (no need to list the names one by one).
2- Lets me treat the data as a matrix of integers, so that I can run rcorr(data) and quantile(data$a, 0.9) directly instead of rcorr(as.matrix(data)) and quantile(as.matrix(data$a), 0.9) every time.
Any ideas on the simplest (yet efficient) way?
How about read.table?
read.table(text="a b c
9 2 3
2 9 6
3 2 4
5 3 3", header=TRUE)
> a b c
1 9 2 3
2 2 9 6
3 3 2 4
4 5 3 3
It also has options to read from a file, declare the separator, and so on; see help(read.table).
data <- as.matrix(read.table("c:\\temp\\inFile.tsv", header=TRUE))
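Note that I got the following error when there were special characters (#) in the header line:
Error in read.table("..."), : more columns than column names
This happens because read.table treats # as a comment character by default, so the header line is cut short. Either keep # out of the header line or pass comment.char="". Also, the default separator (sep="") splits on any whitespace, so tabs ("\t") are handled without being specified.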
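To make the points above concrete, here is a minimal sketch. The inline txt string is a hypothetical stand-in for the tab-separated file; with a real file you would pass its path as the first argument instead of text=.

```r
# Hypothetical tab-separated input standing in for the real file.
txt <- "a\tb\tc\n9\t2\t3\n2\t9\t6\n3\t2\t4\n5\t3\t3"

# header = TRUE takes the column names from the first line,
# sep = "\t" splits on tabs only, and comment.char = "" keeps
# any '#' in the file as ordinary text.
m <- as.matrix(read.table(text = txt, header = TRUE, sep = "\t",
                          comment.char = ""))

storage.mode(m)           # storage type of the matrix
colnames(m)               # names picked up from the header line
quantile(m[, "a"], 0.9)   # matrix columns are indexed with [, "name"], not $
```

Note that once the data is a matrix, data$a no longer works; m[, "a"] is the matrix equivalent.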
I have a data frame that is structured as follows, where each column is a feature (a,b,c,d..) of a given entry (the entries being TCxxx). The data frame contains nearly 3000 rows and 9000 columns.
a b c e f g h
TC001 1 5 2 3 2 2 2
TC002 2 9 2 3 5 3 4
TC003 3 6 6 1 4 7 7
I also have a text file that each line is an identifier:
TC005
TC012
TC037
How can I turn this text file into a list of identifiers, and then create a subset of the data frame containing only the rows whose identifiers appear in the text file?
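# Read the identifiers as strings (scan() expects numbers by default)
my_indexes <- scan("path/to/text/file", what = character())
# Select the matching rows by row name
my_data_frame[my_indexes, ]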
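One caveat with indexing by row name: an identifier that is not present in the data frame comes back as a row of NAs. A small sketch of a variant that skips unknown identifiers follows. The miniature data frame and the ids vector are made up for illustration, and scan(text = ...) stands in for reading the real file.

```r
# Hypothetical miniature of the data frame, with identifiers as row names.
my_data_frame <- data.frame(a = c(1, 2, 3), b = c(5, 9, 6),
                            row.names = c("TC001", "TC002", "TC003"))

# Stand-in for the identifier file; TC999 deliberately does not exist.
ids <- scan(text = "TC003\nTC001\nTC999", what = character(), quiet = TRUE)

# Keep only the rows whose name appears in ids.
# Rows stay in the data frame's own order; unknown ids are ignored.
subset_df <- my_data_frame[rownames(my_data_frame) %in% ids, , drop = FALSE]
subset_df
```

If the order of the identifier file matters, intersect(ids, rownames(my_data_frame)) could be used as the row index instead.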
Hi, I am aggregating values from two columns into a third, final column based on priority: if the value in column 1 is missing or NA, I fall back to column 2.
df=data.frame(internal=c(1,5,"",6,"NA"),external=c("",6,8,9,10))
df
  internal external
1        1
2        5        6
3                 8
4        6        9
5       NA       10
df$final <- df$internal
df$final <- ifelse((df$final=="" | df$final=="NA"),df$external,df$final)
df
  internal external final
1        1              2
2        5        6     3
3                 8     4
4        6        9     4
5       NA       10     2
Why do I get final values of 4 and 2 for rows 3 and 5, when external is 8 and 10? I don't know what's wrong, but these values don't make any sense to me.
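The issue arises because R converts your string columns to factors, and ifelse() then returns the factors' internal integer codes rather than their labels. (Since R 4.0.0, stringsAsFactors defaults to FALSE, so this only happens on older versions or when factors are requested explicitly.)
Your code will work fine with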
df=data.frame(internal=c(1,5,"",6,"NA"),external=c("",6,8,9,10),stringsAsFactors = FALSE)
PS: this hideous conversion to factors should definitely belong to the R Inferno, http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
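If the data frame already contains factors (for example, it came from older code), a sketch of a workaround is to convert to character before comparing. stringsAsFactors = TRUE is set explicitly here only so the example behaves the same on any R version.

```r
# Build the data frame with factors on purpose to reproduce the situation.
df <- data.frame(internal = c(1, 5, "", 6, "NA"),
                 external = c("", 6, 8, 9, 10),
                 stringsAsFactors = TRUE)

as.integer(df$external)   # internal factor codes, not the values typed in

# Compare and combine on the labels, not on the codes.
internal <- as.character(df$internal)
external <- as.character(df$external)
df$final <- ifelse(internal == "" | internal == "NA", external, internal)
df$final
```

The result is a character column; wrap it in as.numeric() if numbers are needed downstream.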
I am trying to read a .csv file in R.
My file looks like this-
A,B,C,D,E
1,2,3,4,5
6,7,8,9,10
.
.
.
number of rows.
All are strings. First line is the header.
I am trying to read the file using-
mydata=read.csv("devices.csv",sep=",",header = TRUE)
But mydata ends up with X observations of 1 variable, where X is the number of rows: each whole row becomes a single column.
I want every header field in its own column. I cannot figure out what the problem is.
If each line of the file is wrapped in quotes ("), the code in the OP's post reads every line as a single field:
str(read.csv("devices.csv",sep=",",header = TRUE))
#'data.frame': 2 obs. of 1 variable:
#$ A.B.C.D.E: Factor w/ 2 levels "1,2,3,4,5","6,7,8,9,10": 1 2
We could remove the " with gsub after reading the data with readLines and then use read.csv
read.csv(text=gsub('"', '', readLines('devices.csv')), sep=",", header=TRUE)
# A B C D E
#1 1 2 3 4 5
#2 6 7 8 9 10
Another option, if we are using Linux, would be to remove the quotes with awk and pipe the result to read.csv
read.csv(pipe("awk 'gsub(/\"/,\"\",$1)' devices.csv"))
# A B C D E
#1 1 2 3 4 5
#2 6 7 8 9 10
Or
library(data.table)
fread("awk 'gsub(/\"/,\"\",$1)' devices.csv")
# A B C D E
#1: 1 2 3 4 5
#2: 6 7 8 9 10
data
v1 <- c("A,B,C,D,E", "1,2,3,4,5", "6,7,8,9,10")
write.table(v1, file='devices.csv', row.names=FALSE, col.names=FALSE)
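Before rewriting the file, it may help to confirm that quoting really is the cause. The sketch below writes the same quoted sample to a temporary file (a hypothetical stand-in for devices.csv) and counts the fields per line with and without quote handling.

```r
# Temporary file standing in for devices.csv, with every line wrapped in quotes.
path <- tempfile(fileext = ".csv")
writeLines(c('"A,B,C,D,E"', '"1,2,3,4,5"', '"6,7,8,9,10"'), path)

readLines(path, n = 2)   # shows the stray quotes directly

# Quotes respected (the default): each line counts as one field.
fields_quoted <- count.fields(path, sep = ",")

# Quotes ignored: the commas inside each line split it into five fields.
fields_raw <- count.fields(path, sep = ",", quote = "")

fields_quoted
fields_raw
```

If the two counts differ like this, stripping the quotes as shown above is the right fix.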
The code you've written should work unless your CSV file is malformed.
Try giving the absolute path to devices.csv.
To test: data[1] will give you the first column.
Or you can try it this way:
data = read.table(text=gsub('"', '', readLines('//fullpath to devices.csv//')), sep=",", header=TRUE)
Good Luck!
I have a simple table with the following entries.
1
2
3
4
5
The file name is "test.txt". I have used the following command to read in the file.
mydata<-read.table("test.txt")
But when I enter
length(mydata)
it shows 1 instead of 5. Why does it show 1 and not 5 ?
I believe
nrow(mydata)
should return the number of rows (5)
The length of a data frame is the number of columns it contains. In this case it is 1.
mydata <- data.frame(1:5)
The above code creates a data frame:
X1.5
1 1
2 2
3 3
4 4
5 5
Let's see some commands:
length(mydata)
[1] 1
Case 1: to get the number of rows
nrow(mydata)
[1] 5
Case 2: to get the number of elements in the first column of a data frame
length(mydata$X1.5)
[1] 5
length(mydata[[1]])
[1] 5
length() is mostly used for vectors; for a data frame, use nrow() to count rows.
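The reason behind this is that a data frame is stored as a list of columns, so length() counts columns. A short sketch putting the related functions side by side (the column name value is just an illustrative choice):

```r
# A one-column data frame with five rows, like the one read from test.txt.
mydata <- data.frame(value = 1:5)

is.list(mydata)        # a data frame is a list of columns
length(mydata)         # number of columns
ncol(mydata)           # same thing, stated explicitly
nrow(mydata)           # number of rows
dim(mydata)            # rows and columns together
length(mydata$value)   # length of one column equals the row count
```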
Regards,
Ganesh
I would like to rename a large number of columns (column headers) to have numerical names rather than combined letter+number names. Because of the way the raw data is stored, I cannot reach a specific column by position with data[[152]] (some questions are filtered out of the data entirely because they are long-answer comments, so positions do not match question numbers), but I'd like to be able to access columns by name with data$152. Additionally, approximately half the column names in my data load such that class(data$152) is NULL while class(data[[152]]) is integer (and if I rename the data[[152]] column, class(data$152) is then correctly reported as integer).
Thus, is there a way to use the loop iteration number as a column name (something like below)
for (n in 1:415) {
names(data)[n] <-"n" # name nth column after number 'n'
}
that will reassign all my column headers and ensure that I do not run into columns whose class comes back as NULL?
As additional background, my data is imported from a comma-delimited .csv file in which the value 99 stands for NA answers and the first row holds the column names/headers:
data <- read.table("rawdata.csv", header=TRUE, sep=",", na.strings = "99")
There are 415 columns with headers in format Q001, Q002, etc
There are approximately 200 rows with no row labels/no label column
You can do this without a loop, as follows:
names(data) <- 1:415
Let me illustrate with an example:
dat <- data.frame(a=1:4, b=2:5, c=3:6, d=4:7)
dat
a b c d
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
Now rename the columns:
names(dat) <- 1:4
dat
1 2 3 4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
EDIT : How to access your new data
@Ramnath points out, quite rightly, that you won't be able to access your data using dat$1:
dat$1
Error: unexpected numeric constant in "dat$1"
Instead, you will have to wrap the column names in backticks:
dat$`1`
[1] 1 2 3 4
Alternatively, you can use a combination of character and numeric data to rename your columns. This could be a much more convenient way of dealing with your problem:
names(dat) <- paste("x", 1:4, sep="")
dat
x1 x2 x3 x4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
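Since the question mentions that some questions are missing, so positions and question numbers do not line up, another possibility is to derive the numeric names from the existing Q001-style headers instead of from 1:415. This is only a sketch: the three-column dat below is made up, and it assumes every header is the letter Q followed by digits.

```r
# Hypothetical data with gaps in the question numbering.
dat <- data.frame(Q001 = 1:3, Q002 = 4:6, Q152 = 7:9)

# Strip the leading "Q", then drop leading zeros by going through integer.
num <- as.integer(sub("^Q", "", names(dat)))
names(dat) <- as.character(num)

names(dat)
dat[["152"]]   # access by name with [[ ]] and a string
dat$`152`      # or with $ and backticks
```

Note that dat[["152"]] (a string) looks the column up by name, while dat[[152]] (a number) would look it up by position.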