I have the following data set:
data <- cbind(c(1,2,3,4,5,6,7,8,9,10,11),c(1,11,21,60,30,2,61,12,3,35,63))
I would like to select the rows for which the number in the second column is greater than the highest number reached up to that point. The result should look like this.
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
You want cummax:
> d <- data   # renamed your variable; see the PS below
> d[ d[,2] == cummax(d[,2]), ]
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
PS. data is a base R function, so your variable shadows it. This is usually harmless, because when a name appears in call position R keeps searching until it finds a function, but it is still clearer to pick another name, hence d above.
The cummax function works well here:
data[ data[,2]==cummax(data[,2]),]
returns
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
as desired.
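To see why the comparison works, it can help to print the running maximum itself; a row is kept exactly when its second column equals the cumulative maximum up to and including that row. A minimal sketch:

```r
data <- cbind(c(1,2,3,4,5,6,7,8,9,10,11),
              c(1,11,21,60,30,2,61,12,3,35,63))

running_max <- cummax(data[, 2])  # 1 11 21 60 60 60 61 61 61 61 63
keep <- data[, 2] == running_max  # TRUE for rows 1, 2, 3, 4, 7, 11
data[keep, ]
```

One caveat: if the current maximum is repeated later, the repeat also passes the `==` test; if you want strictly increasing records only, compare each value against the cumulative maximum of the preceding values instead.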
Good morning!
Assume we have the following matrix:
m=matrix(1:18,ncol=2)
print("m : before")
print(m)
[1] "m : before"
[,1] [,2]
[1,] 1 10
[2,] 2 11
[3,] 3 12
[4,] 4 13
[5,] 5 14
[6,] 6 15
[7,] 7 16
[8,] 8 17
[9,] 9 18
As an example, I want to permute a couple of rows:
tmp=m[8:9,]
m[8:9,]=m[3:4,]
m[3:4,]=tmp
This is the same as:
# indices to permute
before=8:9
after=3:4
tmp=m[before,]
m[before,]=m[after,]
m[after,]=tmp
[1] "after"
[,1] [,2]
[1,] 1 10
[2,] 2 11
[3,] 8 17
[4,] 9 18
[5,] 5 14
[6,] 6 15
[7,] 7 16
[8,] 3 12
[9,] 4 13
I would like to know if there is any package that automates such a task. For the moment, I'd rather not write a user-defined function.
Thank you for your help!
I think the simplest solution is just to use a base R function, like sample:
set.seed(4)
m[sample(1:nrow(m),nrow(m)),]
which gives you:
[,1] [,2]
[1,] 8 17
[2,] 3 12
[3,] 9 18
[4,] 7 16
[5,] 4 13
[6,] 6 15
[7,] 2 11
[8,] 1 10
[9,] 5 14
If you want to permute just some rows you can do:
m[7:9,] <- m[sample(7:9, 3),]  # the second argument (3) is the number of rows to permute
which gives you
[,1] [,2]
[1,] 1 10
[2,] 2 11
[3,] 3 12
[4,] 4 13
[5,] 5 14
[6,] 6 15
[7,] 7 16
[8,] 9 18
[9,] 8 17
Just exchange the index order on the two sides of the assignment:
m[c(before,after),] = m[c(after,before),]
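A quick sanity check, reusing the m, before, and after from the question, that the one-line swap matches the three-step version with tmp:

```r
m <- matrix(1:18, ncol = 2)
before <- 8:9
after <- 3:4

# three-step swap via a temporary copy
m1 <- m
tmp <- m1[before, ]
m1[before, ] <- m1[after, ]
m1[after, ] <- tmp

# one-line swap: same indices on both sides, in opposite order
m2 <- m
m2[c(before, after), ] <- m2[c(after, before), ]

identical(m1, m2)  # TRUE
```

This works because the right-hand side is evaluated in full before any element is assigned, so no temporary copy is needed.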
I am having some trouble understanding how to initialize data frames with matrix columns. When I execute the following:
m1 = cbind(1:5,11:15)
m2 = cbind(21:25, 31:35)
d = data.frame(m1)
d$m2 = m2
How can I directly create a data frame with m1, for which d$m1 would return a matrix, as d$m2 does in my example?
Use I() to specify that the matrices should be treated "as is":
> d<-data.frame(m1=I(m1),m2=I(m2))
> d$m1
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
> d$m2
[,1] [,2]
[1,] 21 31
[2,] 22 32
[3,] 23 33
[4,] 24 34
[5,] 25 35
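You can check that the matrix structure survives; I() marks each column with an "AsIs" class but leaves its dimensions intact:

```r
m1 <- cbind(1:5, 11:15)
m2 <- cbind(21:25, 31:35)
d <- data.frame(m1 = I(m1), m2 = I(m2))

dim(d)       # 5 rows, 2 matrix-valued columns
dim(d$m1)    # 5 2
class(d$m1)  # "AsIs"
```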
The function mapply() appears not to work properly in the following case:
a <- list(matrix(1:8,4,2),matrix(1:9,3,3))
b <- list(1:4,1:3)
mapply(a,b,FUN=cbind)
that gives the following matrix
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 1 9
[10,] 2 1
[11,] 3 2
[12,] 4 3
instead of the following (expected) result:
[[1]]
[,1] [,2] [,3]
[1,] 1 5 1
[2,] 2 6 2
[3,] 3 7 3
[4,] 4 8 4
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1 4 7 1
[2,] 2 5 8 2
[3,] 3 6 9 3
Can anybody help me understand whether something in my code is wrong? Thank you!
Make sure to set SIMPLIFY to FALSE:
mapply(a,b,FUN=cbind, SIMPLIFY=FALSE)
otherwise mapply tries to coerce everything into a single compatible result. In your case, because the return value from each call had 12 elements, it put those two results side by side as the columns of a matrix: the first call's values in the first column and the second call's in the second.
Alternatively you can use
Map(cbind, a, b)
which always returns a list. (Map is also nice because if a has names it will use those names in the resulting list which isn't useful in this case, but may be useful in others.)
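To illustrate the names point, here is a small sketch (the names "first" and "second" are made up for the example):

```r
a <- list(first = matrix(1:8, 4, 2), second = matrix(1:9, 3, 3))
b <- list(1:4, 1:3)

res <- Map(cbind, a, b)
names(res)      # "first" "second"
dim(res$first)  # 4 3: the original 4x2 matrix plus the appended column
```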
I'm a newbie in R, so I really need some help here. I just want to sort each column independently. Any help is appreciated!
> mat <- matrix(c(45,34,1,3,4325,23,1,2,5,7,3,4,32,734,2),ncol=3)
> mat
[,1] [,2] [,3]
[1,] 45 23 3
[2,] 34 1 4
[3,] 1 2 32
[4,] 3 5 734
[5,] 4325 7 2
which I would like to become:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 3 2 3
[3,] 34 5 4
[4,] 45 7 32
[5,] 4325 23 734
Yes, there is a simple way:
apply(mat, 2, sort)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 3 2 3
[3,] 34 5 4
[4,] 45 7 32
[5,] 4325 23 734
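Any extra arguments to apply() are passed through to sort(), so variants cost nothing extra; for instance, sorting each column in decreasing order (a sketch reusing the mat from the question):

```r
mat <- matrix(c(45, 34, 1, 3, 4325, 23, 1, 2, 5, 7, 3, 4, 32, 734, 2), ncol = 3)

# sort each column independently, largest value first
apply(mat, 2, sort, decreasing = TRUE)
```

Note that apply() simplifies the result to a matrix here only because every sorted column has the same length; if the columns contained NAs (which sort() drops by default), the columns would come back with unequal lengths and apply() would return a list instead.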
Now I'm doing it by looping through a sorted vector, but maybe there is a faster way using built-in R functions, and maybe I don't even need to sort.
vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
vect = sort(vect)
print(vect)
outvect = mat.or.vec(length(vect),1)
outvect[1] = counter = 1
for(i in 2:length(vect)) {
if (vect[i] != vect[i-1]) { counter = counter + 1 }
outvect[i] = counter
}
print(cbind(vect,outvect))
vect outvect
[1,] 1 1
[2,] 1 1
[3,] 2 2
[4,] 2 2
[5,] 3 3
[6,] 3 3
[7,] 4 4
[8,] 4 4
[9,] 5 5
[10,] 6 6
[11,] 10 7
[12,] 12 8
[13,] 13 9
[14,] 15 10
[15,] 33 11
[16,] 41 12
[17,] 42 13
The code is used to make charts with integers on the X axis instead of the real data, because for me the distance between the X values is not important. So in my case the smallest X value is always 1, and the largest is always equal to how many distinct X values there are.
-- edit: due to some misunderstanding about my question, I added self-sufficient code with output.
That's clearer. Hence:
> vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
> cbind(vect,as.numeric(factor(vect)))
[1,] 41 12
[2,] 42 13
[3,] 5 5
[4,] 6 6
[5,] 3 3
[6,] 12 8
[7,] 10 7
[8,] 15 10
[9,] 2 2
[10,] 3 3
[11,] 4 4
[12,] 13 9
[13,] 2 2
[14,] 33 11
[15,] 4 4
[16,] 1 1
[17,] 1 1
No sort needed. And as said, see also ?factor.
and if you want to preserve the order, then:
> cbind(vect,as.numeric(factor(vect,levels=unique(vect))))
vect
[1,] 41 1
[2,] 42 2
[3,] 5 3
[4,] 6 4
[5,] 3 5
[6,] 12 6
[7,] 10 7
[8,] 15 8
[9,] 2 9
[10,] 3 5
[11,] 4 10
[12,] 13 11
[13,] 2 9
[14,] 33 12
[15,] 4 10
[16,] 1 13
[17,] 1 13
Joris's solution is right on, but if you have long vectors, it is a bit (3x) more efficient to use match and unique:
> x=sample(1e5, 1e6, replace=TRUE)
> # preserve order:
> system.time( a<-cbind(x, match(x, unique(x))) )
user system elapsed
0.20 0.00 0.22
> system.time( b<-cbind(x, as.numeric(factor(x,levels=unique(x)))) )
user system elapsed
0.70 0.00 0.72
> all.equal(a,b)
[1] TRUE
>
> # sorted solution:
> system.time( a<-cbind(x, match(x, sort(unique(x)))) )
user system elapsed
0.25 0.00 0.25
> system.time( b<-cbind(x, as.numeric(factor(x))) )
user system elapsed
0.72 0.00 0.72
> all.equal(a,b)
[1] TRUE
You can try this (note that you may want a different behaviour for repeated values; this approach gives each value a unique rank):
> x <- sample(size=10, replace=T, x=1:100)
> x1 <- vector(length=length(x))
> x1[order(x)] <- 1:length(x)
> cbind(x, x1)
x x1
[1,] 40 1
[2,] 46 4
[3,] 43 3
[4,] 41 2
[5,] 47 5
[6,] 84 10
[7,] 75 8
[8,] 60 7
[9,] 59 6
[10,] 80 9
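For comparison, base R's rank() with ties.method = "first" produces the same unique ranks as the order() trick above, and may read more directly:

```r
set.seed(1)  # any seed; just to make the sketch reproducible
x <- sample(size = 10, replace = TRUE, x = 1:100)

# the order() trick: the i-th smallest element gets rank i
x1 <- vector(length = length(x))
x1[order(x)] <- seq_along(x)

all(x1 == rank(x, ties.method = "first"))  # TRUE
```

Both break ties by first occurrence, since order() is stable.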
It looks like you are counting runs in the data, if that is the case, look at the rle function.
You apparently want the results of something like table(), but lined up next to the values. Try the ave() function:
csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN=length)
The trick here is that the syntax of ave() is a bit different from tapply(): you pass an arbitrarily long set of grouping factors, and you need to write FUN= in front of the function, because arguments after the ... are not matched by position. They need to be named.
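Since csvdata isn't shown in the question, here is a self-contained sketch with a made-up data frame; ave() returns the per-group counts already aligned with the original rows, which a bare table() would not:

```r
# hypothetical stand-in for csvdata, with repeated values in column X
csvdata <- data.frame(X = c(5, 3, 5, 7, 3, 5))

# count of each X value, lined up row by row with the data
csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN = length)

csvdata$counts  # 3 2 3 1 2 3
```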