My data is quite large, so I created a small matrix to illustrate what I need.
test <- matrix(c(1:3, rep(0.5,3),4:1), nrow = 1, dimnames = list(1, 1:10))
The matrix looks like this:
1 2 3 4 5 6 7 8 9 10
1 1 2 3 0.5 0.5 0.5 4 3 2 1
I want to subset this matrix, keeping every column whose value equals a specific value, such as 0.5:
4 5 6
1 0.5 0.5 0.5
Since my real data has more than 10,000 columns, I'm looking for code that does this without listing the columns by hand.
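One possible approach, sketched under the assumption that the matrix has a single row and that the target value can be compared exactly (the names target, keep and result are made up for illustration): build a logical vector over the columns and use it as the column index.

```r
# Small example matrix from the question: one row, ten named columns
test <- matrix(c(1:3, rep(0.5, 3), 4:1), nrow = 1, dimnames = list(1, 1:10))

# Logical vector marking the columns of row 1 that equal the target value
target <- 0.5
keep <- test[1, ] == target

# drop = FALSE keeps the result a matrix even if only one column matches
result <- test[, keep, drop = FALSE]
result
```

For values that are not exactly representable in floating point, a tolerance test such as abs(test[1, ] - target) < 1e-8 is probably safer than ==.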
Consider the following example data, stored in a dataframe called df
df
x y
2 4
1 5
0 8
As you can see, this dataframe has 3 rows. What I'd like to do is draw 100 row samples, where each row has an equal probability of being selected (in this case 1/3). My output, let's call it df_result, would look something like this:
df_result
x y
0 8
2 4
0 8
1 5
1 5
2 4
etc., until 100 samples are taken.
I saw a previous Stack Overflow post that showed how to take random samples from a dataframe: df[sample(nrow(df), 3), ]
However, when I tried to sample 100 rows, this (predictably) did not work, because sampling without replacement cannot draw more rows than exist, and it did not let me assign the sampling probability.
Any tips?
Thanks
df <- read.table(header = TRUE,
text = "x y
2 4
1 5
0 8")
set.seed(1)
df[sample(nrow(df), 10, replace = TRUE), ]
x y
1 2 4
2 1 5
2.1 1 5
3 0 8
1.1 2 4
3.1 0 8
3.2 0 8
2.2 1 5
2.3 1 5
1.2 2 4
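To get the 100 draws the question asks for, the same call can be scaled up. The sketch below is only illustrative: n_draws, weights and idx are made-up names, and the prob argument is written out just to show where unequal weights would go, since sample() already weights all rows equally by default.

```r
df <- data.frame(x = c(2, 1, 0), y = c(4, 5, 8))

set.seed(1)  # only so the draw is repeatable
n_draws <- 100

# Equal weight for every row; replace with other weights if needed
weights <- rep(1 / nrow(df), nrow(df))
idx <- sample(nrow(df), n_draws, replace = TRUE, prob = weights)

df_result <- df[idx, ]
rownames(df_result) <- NULL  # drop the 1.1, 2.1, ... labels of repeated rows
head(df_result)
```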
I have a table like the one below with hundreds of rows of data.
ID RANK
1 2
1 3
1 3
2 4
2 8
3 3
3 3
3 3
4 6
4 7
4 7
4 7
4 7
4 7
4 6
I want to group the data by ID so that I can re-rank each group separately. The ReRank column is based on the Rank column: it renumbers the ranks within each group starting at 1, from least to greatest. Note that the same number can appear more than once in the ReRank column when values in the Rank column are tied.
In other words, the output needs to look like this:
ID Rank ReRANK
1 3 2
1 2 1
1 3 2
2 4 1
2 8 2
3 3 1
3 3 1
3 3 1
For the life of me, I can't figure out how to re-rank the rows by group and by the value of the Rank column.
This has been my best guess so far, but it definitely is not doing what I need it to do:
ReRANK = mat.or.vec(length(RANK),1)
ReRANK[1] = counter = 1
for(i in 2:length(RANK)) {
if (RANK[i] != RANK[i-1]) { counter = counter + 1 }
ReRANK[i] = counter
}
Thank you in advance for the help!!
Here is a base R method using ave and rank:
df$ReRank <- ave(df$Rank, df$ID, FUN=function(i) rank(i, ties.method="min"))
Setting ties.method="min" in rank ensures that tied values all receive the minimum rank. The default is to take the mean of the tied ranks.
When there are ties lower down in a group, rank counts those tied values, so the next distinct value gets a rank equal to the number of lower values + 1. The resulting ranks are still ordered, and distinct values still get distinct ranks. If you really want the ranks to run 1, 2, 3, and so on rather than, say, 1, 3, 6 depending on the number of duplicate values, here is a little hack using factor:
df$ReRank <- ave(df$Rank, df$ID, FUN=function(i) {
  as.integer(factor(rank(i, ties.method="min")))})
Here, we use factor to build values counting upward from 1 for each level, and then coerce the result to integer.
For example,
temp <- c(rep(1, 3), 2,5,1,4,3,7)
rank(temp)
[1] 2.5 2.5 2.5 5.0 8.0 2.5 7.0 6.0 9.0
rank(temp, ties.method="min")
[1] 1 1 1 5 8 1 7 6 9
as.integer(factor(rank(temp, ties.method="min")))
[1] 1 1 1 2 5 1 4 3 6
data
df <- read.table(header=T, text="ID Rank
1 2
1 3
1 3
2 4
2 8
3 3
3 3
3 3 ")
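As an alternative to the factor hack, the same consecutive ("dense") ranking can be sketched with match against the sorted unique values of each group. This is only one way to do it, and dense_rank is a made-up helper name, not a base R function.

```r
df <- read.table(header = TRUE, text = "ID Rank
1 2
1 3
1 3
2 4
2 8
3 3
3 3
3 3")

# Dense rank: position of each value among the group's sorted unique values
dense_rank <- function(v) match(v, sort(unique(v)))

# ave() applies the helper separately within each ID
df$ReRank <- ave(df$Rank, df$ID, FUN = dense_rank)
df
```

On the temp vector from the example above, dense_rank(temp) gives the same 1 1 1 2 5 1 4 3 6 as the factor version.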
I've looked around online but haven't found the answer I'm looking for, though I'm sure it's out there...
I have a data frame, and I want to divide (or apply any other operation to) every cell of a row by the value in the second column of that row.
So for the first row, from col3 to the last column, divide each cell by the value in col2 of that row, and so on for every row.
I have solved this with a for loop, where col2 (delta) is now a vector and col3 to the end is a data.frame (mu). The results are appended to a new data frame using rbind.
My question: I'm pretty sure this can be done with apply, sapply or similar, but so far I have not gotten the results I'm looking for (not the correct ones I get with the for loop). How can I do it without a for loop?
In short, I want to divide each mu by the delta value of its own row. This is the for loop I've been using so far:
RA <- NULL  # start empty so that rbind has something to append to
for (i in 1:(dim(mu)[1])){
RA_row <- mu[i,]/delta[i]
RA <- rbind(RA, RA_row)
}
transcript delta mu_5 mu_15 mu_25 mu_35 mu_45 mu_55 mu_65
1 YAL001C 0.066702720 2.201787e-01 1.175731e-01 2.372506e-01 0.139281317 0.081723456 1.835414e-01 1.678318e-01
2 YAL002W 0.106000180 3.685822e-01 1.326865e-01 2.887973e-01 0.158207858 0.193476082 1.867039e-01 1.776946e-01
3 YAL003W 0.022119345 2.271518e+00 2.390637e+00 1.651997e+00 3.802739732 2.733559839 2.772454e+00 3.571712e+00
Thanks
It appears as though you want just:
mu2 <- mu[-(1:2)]/mu[[2]]
# same as mu[, -(1:2)]/mu[['delta']]
That should produce a new dataframe with the division by row. Somewhat more dangerous would be to do the division "in place":
mu[-(1:2)] <- mu[-(1:2)]/mu[[2]]
> mu <- data.frame(a=1,b=1:10, c=rnorm(10), d=rnorm(10) )
> mu
a b c d
1 1 1 -1.91435943 0.45018710
2 1 2 1.17658331 -0.01855983
3 1 3 -1.66497244 -0.31806837
4 1 4 -0.46353040 -0.92936215
5 1 5 -1.11592011 -1.48746031
6 1 6 -0.75081900 -1.07519230
7 1 7 2.08716655 1.00002880
8 1 8 0.01739562 -0.62126669
9 1 9 -1.28630053 -1.38442685
10 1 10 -1.64060553 1.86929062
> (mu2 <- mu[-(1:2)]/mu[[2]])
c d
1 -1.914359426 0.450187101
2 0.588291656 -0.009279916
3 -0.554990812 -0.106022792
4 -0.115882600 -0.232340537
5 -0.223184021 -0.297492062
6 -0.125136500 -0.179198716
7 0.298166649 0.142861258
8 0.002174452 -0.077658337
9 -0.142922281 -0.153825205
10 -0.164060553 0.186929062
> (mu[-(1:2)] <- mu[-(1:2)]/mu[[2]] )
> mu
a b c d
1 1 1 -1.914359426 0.450187101
2 1 2 0.588291656 -0.009279916
3 1 3 -0.554990812 -0.106022792
4 1 4 -0.115882600 -0.232340537
5 1 5 -0.223184021 -0.297492062
6 1 6 -0.125136500 -0.179198716
7 1 7 0.298166649 0.142861258
8 1 8 0.002174452 -0.077658337
9 1 9 -0.142922281 -0.153825205
10 1 10 -0.164060553 0.186929062
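The division works row by row because R recycles the vector down each column, so element i of the vector lines up with row i. Here is a small sketch with made-up numbers (the data frame below is hypothetical; only the column layout follows the question), with sweep() as an alternative that names the row margin explicitly:

```r
# Hypothetical data in the same layout as the question: id, divisor, then values
mu <- data.frame(transcript = c("a", "b", "c"),
                 delta = c(2, 4, 5),
                 mu_5  = c(4, 8, 10),
                 mu_15 = c(6, 12, 20))

# Data frame divided by a vector: the vector is recycled down each column
RA <- mu[-(1:2)] / mu$delta

# Same result with sweep(): MARGIN = 1 means "one statistic per row"
RA_sweep <- sweep(as.matrix(mu[-(1:2)]), 1, mu$delta, "/")

RA
```

Both versions avoid growing a data frame with rbind inside a loop, which tends to get slow as the number of rows increases.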
I have a matrix A[72][36] and I would like to fit the values of A in a bigger matrix B[360][180].
I constructed this data frame linking the col/row index of A to the new 'grid'.
> head(INDEX)
LonNew LatNew LonINT LatINT
1 -179.5 -89.5 1 1
2 -178.5 -88.5 1 1
3 -177.5 -87.5 1 1
4 -176.5 -86.5 1 1
5 -175.5 -85.5 1 1
6 -174.5 -84.5 2 2
7 -173.5 -83.5 2 2
8 -172.5 -82.5 2 2
9 -171.5 -81.5 2 2
10 -170.5 -80.5 2 2
Then I calculated the corresponding values for the new Lat/Lon pairs:
NEWVar <- array(NA, dim = length(INDEX$LonNew))
for (j in 1:length(INDEX$LonINT) ){
NEWVar[j] <- A[INDEX$LonINT[j],INDEX$LatINT[j]]
}
> head(NEWVar)
3 3 3 3 3 4 4 4 4 4
The problem is that I don't know how to create the new 360x180 matrix in which each (LonNew, LatNew) pair holds the corresponding NEWVar value.
Can someone help me?
I've created a smaller, complete reproducible example. Here's the smaller matrix.
A<-matrix(1:4, nrow=2)
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
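And let's say you want to scale that up using this index. It has 25 rows (a 5x5 grid of lookups), but only three distinct values per axis, so the result further down is 3x3.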
INDEX<-data.frame(
LonNew = rep(c(-2,-2,0,2,2), each=5),
LatNew = rep(c(-2,-2,0,2,2), 5),
LonInt = rep(c(1,1,1,2,2), each=5),
LatInt = rep(c(1,1,2,1,2), 5)
)
The easiest way to turn the new values of Lat and Lon into array indexes is via factor variables. So I created
NNF <- factor(INDEX$LonNew)
TNF <- factor(INDEX$LatNew)
And I create the new B matrix with
B<-matrix(NA, nrow=nlevels(NNF), ncol=nlevels(TNF),
dimnames=list(levels(NNF), levels(TNF)))
And then I do the assignment with
B[cbind(NNF, TNF)] <- A[cbind(INDEX$LonInt, INDEX$LatInt)]
and that returns
# -2 0 2
# -2 1 3 3
# 0 1 3 3
# 2 2 4 4
which has scaled up the matrix according to the index data. The trick here is to index our matrices with matrices, so we can grab a different row and column value each time.
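As a minimal sketch of the matrix-indexing trick: a two-column matrix used as an index picks one (row, column) cell per row of the index. The names idx, k and B_blocks are made up for illustration, and the block-replication shortcut at the end assumes every cell of A maps to a regular k x k block of the bigger grid, which the question suggests but does not confirm.

```r
A <- matrix(1:4, nrow = 2)

# Each row of idx is one (row, col) pair to look up in A
idx <- cbind(row = c(1, 2, 2), col = c(2, 1, 2))
picked <- A[idx]   # A[1,2], A[2,1], A[2,2]

# If the upscaling is a regular k x k blow-up, repeating the
# row and column indices does the whole job in one step
k <- 2
B_blocks <- A[rep(seq_len(nrow(A)), each = k),
              rep(seq_len(ncol(A)), each = k)]
B_blocks
```

Under that assumption, the 72x36 to 360x180 case from the question would correspond to k = 5.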
I would like to generate 2000 random numbers between 1 and 10 such that for each random number I have the same number of instances.
In this case 200 for each number.
What should be random is the order in which it is generated.
I have the following problem:
I have an array with 2000 entries, not all of them unique; for example, it starts like this:
11112233333333344445667777777777
and consists of 2000 entries.
I would like to generate random numbers and assign each UNIQUE value its own random number, so that every entry still has a value and equal entries share the same random number.
So my intended result would look like this:
original array: 11112233333333344445667777777777
random numbers: 33334466666666699991778888888888
You could do this in a few steps:
my_numbers <- rep(1:10, each=200)
my_randomizer <- sample(seq_along(my_numbers), length(my_numbers))
my_random_numbers <- my_numbers[my_randomizer]
Based on the edits:
I would use rle. It sounds like you don't have an array, but instead a vector:
my_array_rled <- rle(my_array)
my_random_numbers <- sample(1:10, length(unique(my_array)))
my_array_rled$values <- factor(my_array_rled$values)
levels(my_array_rled$values) <- my_random_numbers
my_array_randomized <- inverse.rle(my_array_rled)
If I understand you correctly, you can use "rep" to repeat each of the numbers 1 to 10 200 times and "sample" to shuffle the resulting vector.
x <- sample(rep(1:10, each = 200))
A non-vectorized version:
# using a seed for reproducible example
set.seed(2)
original_array <- c(1,1,1,1,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,5,6,6,7,7,7,7,7,7,7,7,7,7)
random_numbers <- numeric(length=length(original_array))
rdnum <- sample(unique(original_array), length(unique(original_array)))
# loop over the unique values themselves, so they need not be 1, 2, ..., n
uniq <- unique(original_array)
for (i in seq_along(uniq))
  random_numbers[original_array == uniq[i]] <- rdnum[i]
random_numbers
2 2 2 2 5 5 3 3 3 3 3 3 3 3 3 1 1 1 1 6 7 7 4 4 4 4 4 4 4 4 4 4
The table function with sample comes in quite handy for this scenario:
set.seed(1)
## ASSUMING ORIGINAL IS A VECTOR
original <- c(1, 1, 1, 1, 2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,5,6,6,7,7,7,7,7,7,7,7,7,7)
## CREATE A TABLE OF ALL THE VALUES
tabl <- table(original)
## RNG is the sample range to select from. Assuming 1:10 in this example
RNG <- 1:10
## PICK VALUES RANDOMLY FROM RNG
tabl[] <- sample(RNG, length(tabl), replace=FALSE)
# note that the `names` of `tabl` will contain the values from `original`
# whereas the values of `tabl` will contain the new random value.
## ASSIGN NEW VALUES
randomNums <- original
for(i in seq(length(tabl)))
randomNums[ original==as.numeric(names(tabl))[[i]] ] <- tabl[[i]]
Results:
rbind(orig=original, rand=randomNums)
orig: 1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 5 6 6 7 7 7 7 7 7 7 7 7 7
rand: 3 3 3 3 4 4 5 5 5 5 5 5 5 5 5 7 7 7 7 2 8 8 9 9 9 9 9 9 9 9 9 9
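The assignment loop can also be written without a loop. The sketch below is one possible vectorized variant, assuming as above that the replacement values come from 1:10; uniq and new_vals are made-up names.

```r
set.seed(1)  # only for a repeatable example
original <- c(1,1,1,1,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,5,6,6,7,7,7,7,7,7,7,7,7,7)

uniq <- unique(original)

# One distinct random replacement per unique value (range 1:10 is an assumption)
new_vals <- sample(1:10, length(uniq))

# match() gives each entry's position in uniq; use it to look up the replacement
randomNums <- new_vals[match(original, uniq)]

rbind(orig = original, rand = randomNums)
```

Because sample() draws without replacement here, two different original values never end up with the same random number.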