R read tab separated text file [duplicate] - r

I would like to read into R a txt file containing a matrix in which trailing zeros are omitted from rows (except the first trailing zero, if any). The missing values should be treated as zeros.
for example:
8 7 0
5 4 3 2 1
4 8 9
should be read as:
8 7 0 0 0
5 4 3 2 1
4 8 9 0 0
The max row size (i.e. the number of matrix columns) is unknown prior to reading the matrix.

d <- as.matrix(read.table(filename, fill = TRUE))
d[is.na(d)] <- 0
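One caveat with the approach above: read.table determines the number of columns from the first five lines of input, so a longer row further down the file can end up wrapped across several rows. A minimal sketch of a workaround, assuming whitespace-separated fields, counts the fields per line first with count.fields and passes explicit column names. The function name read_ragged_matrix and the temporary file are hypothetical and used only for illustration.

```r
# Hypothetical helper: read a whitespace-separated matrix whose rows
# may be shorter than the widest row, padding the gaps with zeros.
read_ragged_matrix <- function(path) {
  # Number of fields on every line of the file
  n_fields <- count.fields(path, sep = "")
  # Explicit col.names force read.table to use the widest row
  d <- as.matrix(read.table(path, fill = TRUE, header = FALSE,
                            col.names = paste0("V", seq_len(max(n_fields)))))
  d[is.na(d)] <- 0
  unname(d)
}

# Usage with the example from the question
tmp <- tempfile(fileext = ".txt")
writeLines(c("8 7 0", "5 4 3 2 1", "4 8 9"), tmp)
m <- read_ragged_matrix(tmp)
```

Scanning the file twice costs a little time on large inputs, but it avoids guessing the column count.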

Related

How to find and remove columns containing more than k consecutive zeros in R data.frame?

I have a huge data.frame with around 200 variables, each represented by a column. Unfortunately, the data is sourced from a poorly formatted data dump (and hence can't be modified) which represents both missing values and zeroes as 0.
The data has been observed every 5 minutes for a month, and a day-long period of only 0s can be reasonably thought of as a day where the counter was not functioning, thereby leading to the conclusion that those 0s are actually NAs.
I want to find (and remove) columns that have at least 288 consecutive 0s at any point. Or, more generally, how can we remove columns from a data.frame containing >=k consecutive 0s?
I'm relatively new to R, and any help would be greatly appreciated. Thanks!
EDIT: Here is a reproducible example. Considering k=4, I would like to remove columns A and B (but not C, since the 0s are not consecutive).
df<-data.frame(A=c(4,5,8,2,0,0,0,0,6,3), B=c(3,0,0,0,0,6,8,2,1,0), C=c(4,5,6,0,3,0,2,1,0,0), D=c(1:10))
df
   A B C  D
1  4 3 4  1
2  5 0 5  2
3  8 0 6  3
4  2 0 0  4
5  0 0 3  5
6  0 6 0  6
7  0 8 2  7
8  0 2 1  8
9  6 1 0  9
10 3 0 0 10
You can use this function on your data:
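cons.Zeros <- function(x, n) {
  # TRUE where a non-missing value equals 0 (NAs are dropped first)
  x <- x[!is.na(x)] == 0
  # run-length encode the TRUE/FALSE sequence
  r <- rle(x)
  # TRUE if any run of zeros is at least n long
  any(r$lengths[r$values] >= n)
}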
This function returns TRUE for the columns that need to be dropped; n is the minimum number of consecutive zeros that causes a column to be dropped.
For your sample dataset, let's use n = 3 (n = 4, as in the question, gives the same result here):
df.dropped <- df[, !sapply(df, cons.Zeros, n = 3), drop = FALSE]
#output:
# > df.dropped
#    C  D
# 1  4  1
# 2  5  2
# 3  6  3
# 4  0  4
# 5  3  5
# 6  0  6
# 7  2  7
# 8  1  8
# 9  0  9
# 10 0 10
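To see why cons.Zeros works, it may help to look at what rle returns for a single column. The sketch below uses column A from the question; the variable name longest_zero_run is introduced here only for illustration.

```r
# Column A from the sample data frame
A <- c(4, 5, 8, 2, 0, 0, 0, 0, 6, 3)

# Run-length encode the "is zero" indicator
r <- rle(A == 0)
r$lengths  # lengths of the consecutive runs
r$values   # whether each run consists of zeros (TRUE) or non-zeros (FALSE)

# Length of the longest run of zeros (0 if the column has no zeros)
longest_zero_run <- max(c(0, r$lengths[r$values]))
```

Here the indicator splits into three runs (four non-zeros, four zeros, two non-zeros), so the longest zero run is 4 and the column is dropped for any n up to 4.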

Convert Yes/No/Absent data into Binary Matrix [duplicate]

I have a voting data set in which each person voted yes or no on a number of policies, or was absent when a particular policy was voted on.
There are 23 policies in total, but I have no idea how to convert the data into binary form.
In the data set, "n" = no, "y" = yes and "a" = absent.
If anyone could lend me a hand with converting the data to a binary matrix in R, I would appreciate it!
This may be done using model.matrix. Note, this is done automatically for you in many cases in R, e.g. regression analysis.
> set.seed(1)
> (df <- data.frame(id=1:10,vote=sample(c("yes","no","absent"),10,replace=TRUE)))
   id   vote
1   1    yes
2   2     no
3   3     no
4   4 absent
5   5    yes
6   6 absent
7   7 absent
8   8     no
9   9     no
10 10    yes
> model.matrix(~.-1,df)
   id voteabsent voteno voteyes
1   1          0      0       1
2   2          0      1       0
3   3          0      1       0
4   4          1      0       0
5   5          0      0       1
6   6          1      0       0
7   7          1      0       0
8   8          0      1       0
9   9          0      1       0
10 10          0      0       1
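Alternatively, recode the values directly. For example:
m <- as.matrix(cbind(c('y','y','y'),c('n','n','n'),c('a','a','a')))
m[m == 'y'] <- 1
m[m == 'n'] <- 0
m[m == 'a'] <- NA
storage.mode(m) <- "numeric"  # the replacements above leave a character matrix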
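The same recoding can also be sketched with a named lookup vector, which yields a numeric matrix in one step. The example assumes the votes are stored in a character matrix with one column per policy; the names votes, recode and binary, the dimnames, and the choice of NA for absent votes are illustrative assumptions.

```r
# Hypothetical example data: 3 voters x 3 policies
votes <- matrix(c("y", "n", "a",
                  "n", "n", "y",
                  "a", "y", "y"),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("voter", 1:3), paste0("policy", 1:3)))

# Lookup table: yes -> 1, no -> 0, absent -> NA
recode <- c(y = 1, n = 0, a = NA)

# Index the lookup by the vote codes, then restore the matrix shape
binary <- matrix(recode[as.vector(votes)],
                 nrow = nrow(votes), dimnames = dimnames(votes))
```

If absences should count as 0 rather than as missing, changing the a entry of the lookup vector is enough.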

R - Conditional replacement of column values in a data frame

I have a data frame with two columns, A and B. I want to replace the values in column B so that values >= 5 become 1 and all other values become 0.
Note - There are 2 conditions to be checked.
X=read.csv("Y:/impdat.csv")
A B
3 16
12 3
1 2
12 9
4 4
5 6
21 1
4 14
3 10
12 1
So after replacing, the data should be
A B
3 1
12 0
1 0
12 1
4 0
5 1
21 0
4 1
3 1
12 0
Sounds simple. But I am unable to implement it.
I tried
ifelse(X$B>=5,1,0)
This only prints the new values, but the original data remains the same.
X$B <- as.integer(X$B >= 5)
will do the trick.
X <- transform(X, B = ifelse(B >= 5, 1, 0))
Got it.
Just had to assign the object.
X$B=ifelse(X$B>=5,1,0)
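A self-contained sketch of the fix, with the sample data typed in directly instead of read from the CSV file (an assumption made so the example runs on its own):

```r
# Sample data from the question
X <- data.frame(A = c(3, 12, 1, 12, 4, 5, 21, 4, 3, 12),
                B = c(16, 3, 2, 9, 4, 6, 1, 14, 10, 1))

# This only computes and prints the new values; X itself is unchanged
ifelse(X$B >= 5, 1, 0)

# Assigning the result back is what modifies the column
X$B <- as.integer(X$B >= 5)
```

The comparison X$B >= 5 yields a logical vector, and as.integer maps TRUE to 1 and FALSE to 0.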

interchanging values after comparing two columns in R

I want to write code that compares two columns in a data frame. One is supposed to hold lower limits and the other upper limits. If a value in the upper limit column is less than the value in the lower limit column, the two values should be swapped. If both the lower and upper limits are zero, the upper limit should be replaced with a value, say 2. Sample data is below:
lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2
after running the code, it should produce something like
lower_limit upper_limit
0 3
0 4
2 5
0 15
0 2
0 2
4 7
2 8
dfrm <- read.table(text="lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2", header=TRUE)
dfrm2 <- dfrm
dfrm2[,2] <- pmax(dfrm[,1], dfrm[,2])
dfrm2[,1] <- pmin(dfrm[,1], dfrm[,2])
dfrm2[dfrm[,1] == 0 & dfrm[,2] == 0, 2] <- 2
> dfrm2
  lower_limit upper_limit
1           0           3
2           0           4
3           2           5
4           0          15
5           0           2
6           0           2
7           4           7
8           2           8
Assuming dat is the name of your data frame/matrix:
setNames(as.data.frame(t(apply(dat, 1, function(x) {
  tmp <- sort(x)
  tmp[2] <- tmp[2] + all(x == 0) * 2
  tmp
}))), colnames(dat))
  lower_limit upper_limit
1           0           3
2           0           4
3           2           5
4           0          15
5           0           2
6           0           2
7           4           7
8           2           8
How does it work?
The function apply is used to apply a function to each row (margin argument 1). Inside this function, x represents one row of dat. First, the values are sorted (with sort) and stored in the object tmp. Then 2 is added to the second value of tmp if both values are 0, which turns that value into 2. Finally, tmp is returned. apply returns the results as a matrix with one column per input row, so it needs to be transposed (with t). The transposed matrix is converted to a data frame (as.data.frame) and given the same column names as the original object dat (with setNames).
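The steps described above can be sketched with the per-row logic pulled out into a named function, which makes each step easier to test in isolation. The function name swap_limits, its fill argument, and the three-row example data are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: order one (lower, upper) pair and
# replace the upper limit with `fill` when both limits are zero.
swap_limits <- function(x, fill = 2) {
  tmp <- sort(x)                      # smaller value first
  if (all(x == 0)) tmp[2] <- fill     # both limits zero: set upper limit
  tmp
}

# Small example: one ordered row, one swapped row, one all-zero row
dat <- data.frame(lower_limit = c(0, 5, 0),
                  upper_limit = c(3, 2, 0))

# apply returns one column per input row, hence the transpose
res <- setNames(as.data.frame(t(apply(dat, 1, swap_limits))), colnames(dat))
```

For data frames with many rows, the pmin/pmax approach is likely faster because it avoids calling a function once per row.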

