Length of a table read from a data file in R

I have a simple table with the following entries.
1
2
3
4
5
The file name is "test.txt". I have used the following command to read in the file.
mydata<-read.table("test.txt")
But when I enter
length(mydata)
it shows 1 instead of 5. Why does it show 1 and not 5?

I believe
nrow(mydata)
should return the number of rows (5).

The length of a data frame gives the number of columns it contains, because a data frame is a list of columns. In this case it is 1.
mydata <- data.frame(1:5)
This creates a data frame with a single column named X1.5:
X1.5
1 1
2 2
3 3
4 4
5 5
Let's look at some commands:
length(mydata)
[1] 1
Case 1: to get the number of rows
nrow(mydata)
[1] 5
Case 2: to get the number of elements in the first column of a data frame
length(mydata$X1.5)
[1] 5
length(mydata[[1]])
[1] 5
length() is mostly used for vectors; for a data frame, use nrow() to count rows (and ncol() to count columns).
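A minimal sketch of the same point on a two-column data frame (the names df, x and y are made up for illustration): since a data frame is a list of columns, length() agrees with ncol(), while nrow() and dim() report the rows.

```r
# A data frame is a list of equal-length columns
df <- data.frame(x = 1:5, y = letters[1:5])

length(df)   # 2: number of columns (list elements)
ncol(df)     # 2
nrow(df)     # 5: number of rows
dim(df)      # 5 2: rows, then columns
is.list(df)  # TRUE
```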
Regards,
Ganesh

Related

How to create a vector of positions of a numeric vector in R?

I have a vector of numbers that contains some gaps. For example,
vec <- c(3,1,7,3,5,7)
There are 4 distinct values, and I would like to transform the vector into consecutive values without gaps that give the order of each entry, keeping every entry in its original position. In this case I would like to obtain
2 1 4 2 3 4
that is, each element replaced by its rank (from 1 to 4) among the distinct values of vec.
You can use match to look up each value in the sorted unique values. For example:
vec <- c(3,1,7,3,5,7)
match(vec, sort(unique(vec)))
# [1] 2 1 4 2 3 4
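This works because match returns the position of each value within sort(unique(vec)), and those positions start at 1.
Alternatively, we may use factor, whose levels are the sorted unique values: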
as.integer(factor(vec))
[1] 2 1 4 2 3 4
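A quick check, sketched on the same vec, that the two answers agree, and why rank() is not a drop-in replacement here: with ties.method = "min" it counts all smaller elements, so it leaves gaps after repeated values.

```r
vec <- c(3, 1, 7, 3, 5, 7)

r1 <- match(vec, sort(unique(vec)))  # position in the sorted distinct values
r2 <- as.integer(factor(vec))        # factor levels are the sorted distinct values
identical(r1, r2)                    # TRUE: both give 2 1 4 2 3 4

# rank() skips numbers after ties, so it is not gapless
rank(vec, ties.method = "min")       # 2 1 5 2 4 5
```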

For each row, return the multiple column indexes for a specific number

Hi, suppose I have a matrix containing only 0s and 1s, and I want to find where the 1s are in each row. Each row can contain several 1s.
For example I have
set.seed(444)
m3 <- matrix(round(runif(8*8)), 8,8)
In the first row, columns 2, 3 and 8 are 1, and I want code that reports either the column names or the column indexes. Note that the number of 1s can differ from row to row.
Can anyone provide some suggestions? I appreciate it so much.
We can use which with arr.ind = TRUE, which returns the row/column indexes as a two-column matrix; ordering by the row index then lists the 1s row by row:
out <- which(m3 ==1, arr.ind = TRUE)
out[,2][order(out[,1])]
[1] 2 3 8 3 5 3 4 8 7 4 6 7 1 3 4 6 1 4 5 6 7 2 4 7 8
To get the column names instead, use the same index (this requires the matrix to have column names; m3 has none, so here the result is NULL):
colnames(m3)[out[,2][order(out[,1])]]
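The flat vector above loses the row boundaries. A small sketch, on a hypothetical 3 x 3 matrix m instead of m3, that keeps one element per row by splitting the column indexes on the row index; giving the factor explicit levels keeps rows that contain no 1s as empty entries.

```r
# Hypothetical small matrix: row 1 has 1s in columns 2 and 3,
# row 2 in column 1, row 3 has none
m <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

out <- which(m == 1, arr.ind = TRUE)  # two-column matrix: "row", "col"

# One list element per row; levels = 1:nrow keeps rows without any 1
cols_by_row <- split(out[, "col"],
                     factor(out[, "row"], levels = seq_len(nrow(m))))
cols_by_row
```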

Finding Index from Vector/Matrix or Dataframe in R

I have data in R as follows:
data <- c(1,12,22,0,8,1,0,0)
Is there any way to find the indexes of the elements that are greater than 0? So the result would be:
1 2 3 5 6
I tried to use as.factor(data), but it takes several more steps to get the result I am aiming for. Thanks.
We can use which on a logical vector
which(data >0)
#[1] 1 2 3 5 6
Another option is using seq_along (though it is not as straightforward as the which method by @akrun):
seq_along(data)[data>0]
[1] 1 2 3 5 6
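The two answers differ when the data contain NA. A short sketch with a hypothetical vector data2: which() drops comparisons that evaluate to NA, whereas logical subsetting of seq_along() carries the NA into the result.

```r
data2 <- c(1, NA, 3, 0)

which(data2 > 0)             # 1 3: NA comparisons are dropped
seq_along(data2)[data2 > 0]  # 1 NA 3: the NA propagates
```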

R read matrix of integer values automatically

I want to read a matrix (all values, no null or empty column) from a tab-separated text file of integers and name the columns automatically (based on the titles in the first line):
a b c
9 2 3
2 9 6
3 2 4
5 3 3
I have tried read.csv(), read.table() and scan(), and they read the file, but I want something that:
1- Automatically identifies the column names (no need to list the names one by one).
2- Lets me treat the data as a matrix of integers, so that I can run rcorr(data) and quantile(data$a, 0.9) directly instead of rcorr(as.matrix(data)) and quantile(as.matrix(data$a), 0.9) every time.
Any ideas on the simplest (yet efficient) way?
How about read.table?
read.table(text="a b c
9 2 3
2 9 6
3 2 4
5 3 3", header=TRUE)
  a b c
1 9 2 3
2 2 9 6
3 3 2 4
4 5 3 3
It also has arguments for reading from a file, declaring the separator, etc.; see help(read.table).
data <- as.matrix(read.table("c:\\temp\\inFile.tsv", header=TRUE))
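A self-contained sketch of the same idea, using the sample data from the question as in-memory lines instead of a file. Because every column holds whole numbers, read.table reads them as integers and as.matrix keeps that type. One caveat: $ is not defined for matrices, so a column is selected with m[, "a"] rather than m$a.

```r
# Tab-separated lines, first line holds the column names
txt <- c("a\tb\tc",
         "9\t2\t3",
         "2\t9\t6",
         "3\t2\t4",
         "5\t3\t3")

m <- as.matrix(read.table(text = txt, header = TRUE, sep = "\t"))

storage.mode(m)          # "integer"
colnames(m)              # "a" "b" "c"
quantile(m[, "a"], 0.9)  # index columns by name; m$a would be an error
```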
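Note that I got the following error when there were special characters (#) in the header line:
Error in read.table("..."), : more columns than column names
This happens because # is read.table's default comment character, so everything after it on the header line is ignored; either avoid # in the header or set comment.char = "". Also, with the default sep = "", any whitespace, including tabs ("\t"), acts as the separator, so a tab-separated file needs no sep argument.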
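A small reproduction of that error, sketched with made-up lines whose first header field is "a#": with the default comment.char = "#" only one column name survives, while comment.char = "" keeps the whole header (check.names then turns "a#" into the syntactic name "a.").

```r
txt <- c("a#\tb\tc",
         "9\t2\t3",
         "2\t9\t6")

# Default comment.char = "#": the header is cut to "a", data rows have 3 fields
bad <- try(read.table(text = txt, header = TRUE), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# Disable comment handling so the full header line is used
good <- read.table(text = txt, header = TRUE, comment.char = "")
names(good)                 # "a." "b" "c"
```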

Using loop variables

I would like to rename a large number of columns (column headers) to have numerical names rather than combined letter+number names. Because of the way the data is stored in raw format, I cannot simply reach a given question by position with data[[152]] (some questions are filtered out of the data entirely because they were long-answer comments), so I'd like to be able to access columns by data$152. Additionally, for approximately half the columns in my data, class(data$152) returns NULL while class(data[[152]]) returns integer (and if I rename the data[[152]] column, class(data$152) correctly shows integer).
Is there a way to use the loop iteration number as a column name, something like the following?
for (n in 1:415) {
names(data)[n] <- as.character(n) # name the nth column after the number n
}
This should reassign all my column headers and ensure that I no longer get NULL classes for some questions.
As additional background, my data is imported from a comma-delimited .csv file in which the value 99 marks NA answers and the first row holds the column names:
data <- read.table("rawdata.csv", header=TRUE, sep=",", na.strings = "99")
There are 415 columns with headers in the format Q001, Q002, etc.
There are approximately 200 rows, with no row labels or label column.
You can do this without a loop, as follows:
names(data) <- 1:415
Let me illustrate with an example:
dat <- data.frame(a=1:4, b=2:5, c=3:6, d=4:7)
dat
a b c d
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
Now rename the columns:
names(dat) <- 1:4
dat
1 2 3 4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
EDIT: How to access your new data
@Ramnath points out, quite rightly, that you won't be able to access your data using dat$1:
dat$1
Error: unexpected numeric constant in "dat$1"
Instead, you will have to wrap the column names in backticks:
dat$`1`
[1] 1 2 3 4
Alternatively, you can combine a letter prefix with the numbers to rename your columns. Because the resulting names are syntactically valid, this can be a much more convenient way of dealing with your problem:
names(dat) <- paste("x", 1:4, sep="")
dat
x1 x2 x3 x4
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
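One caveat, given that some questions were filtered out: names(data) <- 1:415 numbers columns by position, so the number may not match the question. If the goal is for the new name to match the number in the original header (Q152 becomes 152), a sketch is to strip the "Q" prefix from the existing names. The data frame below is a hypothetical stand-in with a gap (Q003 and Q004 missing).

```r
dat <- data.frame(Q001 = 1:3, Q002 = 4:6, Q005 = 7:9)

# Drop the leading "Q" and the zero padding: "Q005" -> "5"
names(dat) <- as.character(as.integer(sub("^Q", "", names(dat))))

names(dat)   # "1" "2" "5"
dat$`5`      # 7 8 9, same as dat[["5"]]
```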
