R: drop an element from a nested list using laply

I have a nested list whose elements are strings. The structure is:
> lapply(DVHlimits, function(x) laply (x, function(x) laply(x, function(x) length(x))))
[[1]]
1 2 3 4 5 6 7 8
[1,] 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1
[3,] 1 1 1 1 1 1 1 1
[4,] 1 1 1 1 1 1 1 1
[5,] 1 1 1 1 1 1 1 1
[[2]]
1 2 3 4 5 6 7
[1,] 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1
[3,] 1 1 1 1 1 1 1
[4,] 1 1 1 1 1 1 1
[[3]]
1 2 3 4 5 6 7
[1,] 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1
etc ......
What I want to do is drop the 8th element from each of the sublists (where there is an 8th element). Can anyone tell me how to remove them?
Thank you

Thanks for your suggestions, this is the solution that I came up with.
# Create a function which can be used with lapply
cleanColls <- function(x) {
  x[c(-1, -8)]  # drop the 1st and 8th elements
}
DVHlimits <- lapply(DVHlimits, function(x) lapply(x, cleanColls))
As you can probably figure out, I have also dropped the first element of each sublist. The end result is that each sublist now has only 6 elements, so they are all the same length, which is what I wanted to achieve.
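If only the 8th element should go, and only where it exists, an explicit length check makes the intent visible. The following is a minimal sketch, not the poster's code: `nested` is a made-up stand-in for DVHlimits, and `drop_eighth` is a hypothetical helper name.

```r
# Hypothetical stand-in for DVHlimits: a list of lists of lists of strings.
nested <- list(
  list(as.list(letters[1:8]), as.list(letters[1:8])),
  list(as.list(letters[1:7]))
)

# Drop position 8 only when it exists; shorter sublists are returned unchanged.
drop_eighth <- function(v) if (length(v) >= 8) v[-8] else v

cleaned <- lapply(nested, function(sub) lapply(sub, drop_eighth))

# Lengths of the innermost sublists after cleaning
lapply(cleaned, function(sub) sapply(sub, length))
```

Every innermost sublist ends up with 7 elements, whether it started with 7 or 8.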

Related

removing some special columns in large data set with R

I work with a large data set (1200*10000). In my data set some columns have the same value except at one or two points. I need to detect and delete these columns, for example column “1846”:
> x[317:400,1846]
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[81] 2 2 **1** 2
All other rows (1:317 and 400:1200) have the value 2.
How can I solve this?
For example, here is part of my file (1200*10000):
x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 0 1 2 0 1 0 1 2 2 1
[2,] 1 1 0 1 2 0 1 0 1 2 1 1
[3,] 2 1 0 1 2 0 1 0 1 2 2 1
[4,] 1 2 0 1 2 0 1 0 1 2 2 2
[5,] 0 1 0 1 2 0 1 0 1 2 1 1
[6,] 2 0 0 1 2 0 1 2 0 2 1 2
[7,] 1 1 0 1 2 1 1 0 1 2 0 2
[8,] 0 1 0 1 2 0 1 0 1 2 0 0
[9,] 0 1 0 1 2 0 1 0 1 1 2 1
[10,] 1 1 0 1 2 0 1 0 1 2 1 1
I want to remove columns like 3 to 10 from my original data set.
Continuing from my answer to your first post:
detect.col <- function(
x,
n.diff=3 # the minimal number of unique values required per column
)
{
ret <- which(apply(x,2,function(e){length(unique(e))}) >= n.diff)
ret
}
x[,detect.col(x)]
I guess this is what you actually mean?
mm<-read.table(text=" [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 0 1 2 0 1 0 1 2 2 1
[2,] 1 1 0 1 2 0 1 0 1 2 1 1
[3,] 2 1 0 1 2 0 1 0 1 2 2 1
[4,] 1 2 0 1 2 0 1 0 1 2 2 2
[5,] 0 1 0 1 2 0 1 0 1 2 1 1
[6,] 2 0 0 1 2 0 1 2 0 2 1 2
[7,] 1 1 0 1 2 1 1 0 1 2 0 2
[8,] 0 1 0 1 2 0 1 0 1 2 0 0
[9,] 0 1 0 1 2 0 1 0 1 1 2 1
[10,] 1 1 0 1 2 0 1 0 1 2 1 1", row.names=1, header=T)
Now:
mm[,which(apply(mm,2,function (x) {length(unique(x))})==3)]
Output:
X..1. X..2. X..11. X..12.
[1,] 1 1 2 1
[2,] 1 1 1 1
[3,] 2 1 2 1
[4,] 1 2 2 2
[5,] 0 1 1 1
[6,] 2 0 1 2
[7,] 1 1 0 2
[8,] 0 1 0 0
[9,] 0 1 2 1
[10,] 1 1 1 1
I am not certain, but I think you want to delete any columns that contain a single value in n-1 or n-2 rows where n is the number of rows. If so, then you would want to delete:
column x2 in my.data because it contains 9 '1's and one '0' and
column x5 in my.data because it contains 8 '2's and two '1's.
The code below does that. Sorry if this is not what you are trying to do. I am not sure whether this code would perform well with a huge data frame.
my.data <- read.table(text='
x1 x2 x3 x4 x5 x6
1 1 2 2 2 1
1 1 2 1 1 2
1 1 2 2 2 3
1 1 2 2 2 4
1 1 2 1 2 5
1 1 2 2 2 6
1 0 2 2 2 7
1 1 2 1 2 8
1 1 2 2 1 9
1 1 2 2 2 10
', header = TRUE)
my.data
my.summary <- as.data.frame.matrix(table( rep(colnames(my.data),
each=nrow(my.data)), unlist(my.data)))
my.summary
delete.these <- which(my.summary == (nrow(my.data)-2) |
my.summary == (nrow(my.data)-1), arr.ind = TRUE)[,1]
my.data[,-delete.these]
x1 x3 x4 x6
1 1 2 2 1
2 1 2 1 2
3 1 2 2 3
4 1 2 2 4
5 1 2 1 5
6 1 2 2 6
7 1 2 2 7
8 1 2 1 8
9 1 2 2 9
10 1 2 2 10
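The same idea can be sketched per column with apply-style code, which avoids building the full summary table. This is an illustrative alternative under the same assumption as the answer above (a column is "near-constant" when its most frequent value fills n-1 or n-2 rows); `dat`, `top_count`, and `near_constant` are made-up names.

```r
# Small example data, chosen for illustration only.
dat <- data.frame(
  x1 = rep(1, 10),
  x2 = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1),
  x4 = c(2, 1, 2, 2, 1, 2, 2, 1, 2, 2),
  x6 = 1:10
)

# Count of the most frequent value in each column
top_count <- sapply(dat, function(col) max(table(col)))

# Flag columns where that count is n - 1 or n - 2 (one or two deviating rows)
n <- nrow(dat)
near_constant <- top_count %in% c(n - 1, n - 2)

# Logical indexing keeps working even when nothing is flagged
dat[, !near_constant, drop = FALSE]
```

To drop fully constant columns as well, the condition could be changed to `top_count >= n - 2`.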
This will keep only columns with one distinct value, assuming your data.frame is named x:
keepIndex <- apply(
x,
2,
FUN = function(column) {
return(length(unique(column)) == 1)
})
x <- x[, keepIndex]
This should work:
m<-matrix(2,nrow=100, ncol=100) #making dummy matrix m
m[sample(1:100,10), sample(1:100,10)]<-1 #replacing some random row and col to 1
m[,-which(colSums(m==1)>0)] #getting rid of cols with 1
A solution based on Boolean indexing.
> x<-cbind(c(1,1,1,1),c(1,1,1,2),c(1,1,1,1))
> x
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
[4,] 1 2 1
> x[,colSums(sweep(x,2,x[1,],"!="))==0]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
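Comparing a matrix against its first row needs some care: a plain `x != x[1, ]` recycles the row down the columns, so the values only line up by accident when the first row happens to be constant. A minimal sketch with a made-up matrix, using `sweep()` to compare each column against its own first entry:

```r
x <- cbind(c(1, 1, 1, 1), c(2, 2, 2, 2), c(3, 3, 3, 4))

# Naive comparison: x[1, ] is recycled in column-major order and misaligns,
# so here it reports that no column is constant.
naive <- colSums(x != x[1, ]) == 0

# sweep() compares column j against x[1, j], so nothing can misalign.
differs  <- sweep(x, 2, x[1, ], "!=")
constant <- colSums(differs) == 0

x[, constant, drop = FALSE]   # keeps the first two columns
```

Use `x[, !constant, drop = FALSE]` instead to discard the constant columns.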
If your data is stored in a data frame named df:
df[ ,sapply(df, function(x) all(x[1] == x[-1]))]
Either search the whole data or a subset of it:
detect.col <- function(
x,row.from=1,row.to=nrow(x),col.from=1,col.to=ncol(x),
n.diff=3 # the minimal number of unique values required per column
)
{
tmp.x <- x[row.from:row.to,col.from:col.to]
ret <- which(apply(tmp.x,2,function(e){length(unique(e))}) < n.diff )
if(length(ret)){
ret <- ret+col.from-1
}
ret
}
## search the whole
detect.col(x) # columns to remove
## Or only search within a range, like in your case
row.from <- 317
row.to <- 400
col.from <- 1000
col.to <- 2000
col.to.remove <- detect.col(x,row.from,row.to,col.from,col.to)
x[,-col.to.remove] # print those to keep
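One caveat with negative indexing: if no column is flagged, the index is `integer(0)`, and `x[, -integer(0)]` selects zero columns instead of all of them. A small sketch of a guard, using a made-up matrix `m` and a hypothetical `cols_to_remove`:

```r
m <- matrix(c(1, 2, 3, 1, 2, 3), nrow = 3)
cols_to_remove <- integer(0)   # e.g. what which() returns when nothing matches

# -integer(0) is still integer(0), so this selects no columns at all:
ncol(m[, -cols_to_remove, drop = FALSE])

# Building the set of columns to keep sidesteps the problem:
keep <- setdiff(seq_len(ncol(m)), cols_to_remove)
m[, keep, drop = FALSE]
```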

R:Summing up values of a column row by row and create new column

I have the following column:
1
0
0
1
1
1
and I want a new column with the running sum of the values, row by row, so something like this:
1 1
0 1
0 1
1 2
1 3
1 4
thanks
Use cumsum
> x <- c(1,0,0,1,1,1)
> x
[1] 1 0 0 1 1 1
> cumsum(x)
[1] 1 1 1 2 3 4
Putting it all together:
> cbind(x, xsum=cumsum(x))
x xsum
[1,] 1 1
[2,] 0 1
[3,] 0 1
[4,] 1 2
[5,] 1 3
[6,] 1 4

What does this R expression do?

sp_full_in is a matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 0 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 2
2 1 0 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
3 2 2 0 2 2 2 2 2 2 1 1 2 2 2 1 2 1 1 1 2 1
4 1 2 1 0 2 2 2 1 2 1 1 1 2 2 1 2 1 1 2 2 1
5 2 2 2 2 0 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 2
6 2 1 1 1 1 0 1 1 1 2 2 2 2 2 1 2 1 2 2 1 1
7 2 1 1 2 1 1 0 1 1 2 1 1 2 1 1 2 1 1 1 2 1
8 1 2 1 1 1 2 2 0 1 1 1 2 2 2 1 2 1 1 2 1 1
9 2 2 1 2 1 1 2 2 0 1 1 2 1 2 1 2 1 1 2 2 2
10 2 2 1 1 1 2 2 1 1 0 2 2 2 2 1 1 1 1 1 2 2
11 2 2 1 1 1 2 1 1 1 1 0 2 1 2 1 2 1 1 1 1 2
12 1 2 1 1 2 1 1 2 1 1 1 0 2 2 1 2 1 2 1 1 1
13 2 2 2 2 1 3 2 2 2 1 1 3 0 2 1 2 2 1 2 2 2
14 2 2 1 2 1 2 1 2 1 2 2 2 1 0 1 2 1 1 1 1 1
15 2 2 2 2 2 2 2 2 2 1 1 2 2 1 0 2 1 1 1 1 2
16 1 2 2 1 1 2 2 2 1 1 2 2 2 2 1 0 1 1 2 1 2
17 2 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 0 2 2 1 1
18 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 0 1 1 1
19 2 2 1 2 1 2 2 2 2 1 1 2 2 2 1 2 1 1 0 2 2
20 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 1 1 0 1
21 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 0
mean(sp_full_in[which(sp_full_in != Inf)])
produces the result [1] 1.38322
I'm not quite sure I understand what this does, but the way I read it is: for every cell in sp_full_in, check whether it is not infinite; if so, return the output 1; then average all the outputs. Is that correct? If not, how should it be read?
which(sp_full_in != Inf) returns a vector of integers (and only one of them is 1). That vector of integers is then handed to "[" as indices into sp_full_in and returns all the values of sp_full_in as a vector passed to the mean function.
It is a good idea to learn to read R expressions from the "inside out". Find the innermost function call and mentally evaluate it, in this case sp_full_in != Inf. That returns a logical matrix of all TRUEs that gets passed to which(), and since there is no 'arr.ind' argument, it returns an atomic vector of indices.
The other answers are good at explaining why you get the mean of all the finite entries in the matrix, but it's worth noting that in this case the which does nothing. I used to have the bad habit of over-using which as well.
> a <- matrix(rnorm(4), nrow = 2)
> a
[,1] [,2]
[1,] 0.5049551 -0.7844590
[2,] -1.7170087 -0.8509076
> a[which(a != Inf)]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[a != Inf]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[1] <- Inf
> a
[,1] [,2]
[1,] Inf -0.7844590
[2,] -1.717009 -0.8509076
> a[which(a != Inf)]
[1] -1.7170087 -0.7844590 -0.8509076
## Similarly if there was an Infinite value
> a[a != Inf]
[1] -1.7170087 -0.7844590 -0.8509076
And, while we're at it, we should also mention the function is.finite which is often preferable to != Inf. is.finite will return FALSE on Inf, -Inf, NA and NaN.
No, but you are close. When which is applied to a matrix, it checks every cell of the matrix against the condition (here, not equal to Inf) and returns the indices of all cells that satisfy it. Your code then extracts the values at those indices and finally calculates their mean.
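To illustrate the is.finite point from the answers above, here is a small sketch with a made-up matrix `m` that contains both Inf and NA:

```r
m <- matrix(c(0, 1, 2, Inf, NA, 1), nrow = 2)

# m != Inf is NA for the NA cell, so that cell is selected as NA
# and the mean becomes NA as well.
mean(m[m != Inf])

# is.finite() is FALSE for Inf, -Inf, NA and NaN, so only 0, 1, 2, 1 remain.
mean(m[is.finite(m)])
```

The second call averages the four finite values and returns 1.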

Splitting a data frame into a list using intervals

I want to split a data frame like this
chr.pos nt.pos CNV
1 74355 0
1 431565 0
1 675207 0
1 783605 1
1 888149 1
1 991311 1
1 1089305 1
1 1177669 1
1 1279886 0
1 1406311 0
1 1491385 0
1 1579761 0
2 1670488 1
2 1758800 1
2 1834256 0
2 1902924 1
2 1978088 1
2 2063124 0
The goal is to get a list of intervals within which chr is the same and CNV = 1, with the CNV = 0 intervals between them acting as separators:
[[1]]
1 783605 1
1 888149 1
1 991311 1
1 1089305 1
1 1177669 1
[[2]]
2 1670488 1
2 1758800 1
[[3]]
2 1902924 1
2 1978088 1
Any ideas?
You can use rle to create a variable to use in split
# create a group identifier
DF$GRP <- with(rle(DF$CNV), rep(seq_along(lengths),lengths))
# split a subset of DF which contains only CNV==1
split(DF[DF$CNV==1,],DF[DF$CNV==1,'GRP'] )
$`2`
chr.pos nt.pos CNV GRP
4 1 783605 1 2
5 1 888149 1 2
6 1 991311 1 2
7 1 1089305 1 2
8 1 1177669 1 2
$`4`
chr.pos nt.pos CNV GRP
13 2 1670488 1 4
14 2 1758800 1 4
$`6`
chr.pos nt.pos CNV GRP
16 2 1902924 1 6
17 2 1978088 1 6
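In the example data every change of chromosome happens to coincide with a change in CNV, so runs of CNV alone are enough. If a run of CNV = 1 could span two chromosomes, the group identifier would need to break on either column. A sketch under that assumption, with made-up data:

```r
# Made-up data in which CNV = 1 continues across the chr 1 / chr 2 boundary.
DF <- data.frame(
  chr.pos = c(1, 1, 1, 2, 2, 2),
  nt.pos  = c(100, 200, 300, 400, 500, 600),
  CNV     = c(0, 1, 1, 1, 0, 1)
)

# A new group starts whenever chr.pos or CNV differs from the previous row.
changed <- c(TRUE, diff(DF$chr.pos) != 0 | diff(DF$CNV) != 0)
DF$GRP  <- cumsum(changed)

# Keep only the CNV == 1 rows and split them by group.
hits <- DF[DF$CNV == 1, ]
res  <- split(hits, hits$GRP)
res
```

Here rows 2 and 3 (chr 1) and row 4 (chr 2) land in separate list elements even though CNV stays at 1 throughout.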

Create table with subtotal per row and per column

I know how to create a table in R using table, like this:
x <- rep(1:3,4)
y <- rep(1:4,3)
z<- cbind(x,y)
table(z[,1],z[,2])
1 2 3 4
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
How can I add the margin totals to the table so that it looks like this:
1 2 3 4
1 1 1 1 1 4
2 1 1 1 1 4
3 1 1 1 1 4
3 3 3 3
> a
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
> a <- cbind(a, rowSums(a))
> a <- rbind(a, colSums(a))
> a
[,1] [,2] [,3] [,4]
[1,] 1 1 1 3
[2,] 1 1 1 3
[3,] 1 1 1 3
[4,] 3 3 3 9
Another approach:
a <- addmargins(a, c(1, 2), sum)
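Applied to the table from the question, a minimal sketch looks like this (`tab` and `with_totals` are illustrative names):

```r
x <- rep(1:3, 4)
y <- rep(1:4, 3)
tab <- table(x, y)

# With the default arguments, addmargins() appends a "Sum" row and column.
with_totals <- addmargins(tab)
with_totals
```

The result is a 4 x 5 table: each row total is 4, each column total is 3, and the bottom-right cell holds the grand total of 12.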

Resources