powers = c(c(1:10), seq(from = 12, to=20, by=2));
While going through WGCNA I came across this code, which I am not able to understand. Can anybody explain the meaning of this piece of code?
The code will create a vector of numbers stored in powers.
Specifically: 1:10 creates the numbers 1 2 3 4 5 6 7 8 9 10 (can read as 1 through 10) and seq(from = 12, to = 20, by = 2) creates a sequence of every other number from 12 to 20, i.e. 12 14 16 18 20.
powers will contain the following 15 numbers: 1 2 3 4 5 6 7 8 9 10 12 14 16 18 20
I am not familiar with the WGCNA package or whether powers is an argument to a function, but this is what powers contains.
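To check this yourself, here is a minimal sketch that builds the vector both as written in the question and in a shorter form. The name simpler is only an illustration; the inner c() around 1:10 is redundant, so both forms should give the same result.

```r
# The line from the question, without the trailing semicolon
powers <- c(c(1:10), seq(from = 12, to = 20, by = 2))

# Equivalent form: c() around 1:10 adds nothing
simpler <- c(1:10, seq(12, 20, by = 2))

print(powers)                # the 15 values listed above
length(powers)               # 15
identical(powers, simpler)   # TRUE
```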
values <- c(5, 3, 2, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993,
9, 2, 1.9999, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993)
I have a vector of values, and I want to obtain the indices at which the difference between two consecutive numbers is below some tolerance level.
tol = 0.001
> which(abs(diff(values)) < tol)
[1] 4 5 6 7 8 9 12 14 15 16 17 18 19
I want to make sure that the difference between consecutive numbers stays below the tolerance for at least 5 consecutive values, so the output should look something like this. Index 12 is no longer included: although the difference between 2 and 1.9999 is below tol, the difference between 1.9999 and 2.9999 is not, so the 5-consecutive rule is not met.
4 5 6 7 8 9 14 15 16 17 18 19
How can I check the difference between any two numbers is less than the tolerance level for at least 5 consecutive values?
You could use rle to check for 5 consecutive values.
which(with(rle(abs(diff(values)) < tol), rep(values & lengths >= 5, lengths)))
#[1] 4 5 6 7 8 9 14 15 16 17 18 19
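If the one-liner is hard to read, the same rle idea can be unpacked step by step. This is only a sketch of the same logic; the names min_run, within_tol, runs, keep_run and idx are introduced here for illustration.

```r
values <- c(5, 3, 2, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993,
            9, 2, 1.9999, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993)
tol <- 0.001
min_run <- 5   # minimum number of consecutive differences within tol

# TRUE where two neighbouring values differ by less than tol
within_tol <- abs(diff(values)) < tol

# Run-length encode the TRUE/FALSE vector
runs <- rle(within_tol)

# Keep only TRUE runs that are long enough
keep_run <- runs$values & runs$lengths >= min_run

# Expand the per-run decision back to one flag per difference
idx <- which(rep(keep_run, runs$lengths))
idx
```

Changing min_run adjusts the required run length without touching the rest.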
You could use stats::filter to check for 5 consecutive values that meet some condition.
which(filter(abs(diff(values)) < tol, filter=rep(1, 5), sides=1)==5) - 4
[1] 4 5 14 15
This gives the starting positions of every window of 5 consecutive differences that are all within tol.
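If you need every index rather than just the window starts, the starts can be expanded again. The sketch below assumes a window of 5 and calls stats::filter explicitly so that a filter() from another attached package (dplyr, for example) cannot mask it; the names ok, ends, starts and all_idx are illustrative.

```r
values <- c(5, 3, 2, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993,
            9, 2, 1.9999, 2.9999, 2.9998, 2.9997, 2.99996, 2.9995, 2.9994, 2.9993)
tol <- 0.001

ok <- abs(diff(values)) < tol

# Rolling sum over the current and the 4 previous flags; 5 means all are TRUE
ends <- which(stats::filter(as.numeric(ok), rep(1, 5), sides = 1) == 5)
starts <- ends - 4L

# Expand each window start to the 5 indices it covers and merge overlaps
all_idx <- sort(unique(unlist(lapply(starts, function(s) s:(s + 4L)))))
all_idx
```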
I have a list of barcodes with the format: AAACCTGAGCGTCAAG-1
The letters can be A, C, G or T and the number after the dash can be 1 - 16.
barcode = c('AAACCTGAGCGTCAAG-1',
'AAACCTGAGTACCGGA-1',
'AAACCTGCAGCTGCTG-1',
'AAACCTGCATCACGAT-3',
'AAACCTGCATTGGGCC-5',
'AAACCTGGTATAGTAG-10',
'AAACCTGGTCGCGTGT-1',
'AAACCTGGTTTCCACC-16',
'AAACCTGTCATGCATG-14',
'AAACCTGTCGCAGGCT-15',
'AAACGGGAGAACTCGG-1')
cluster = c(6,3,6,16,17,11,14,18,9,8,14)
df <- data.frame(Barcode = barcode, Cluster = cluster)
I need to subset this dataframe based on the -# at the end of the barcode. I have been using this to subset the dataframe. The problem is this works for every number except 1.
> df[grep("([ACGT]-10){1}", df$Barcode),]
Barcode Cluster
6 AAACCTGGTATAGTAG-10 11
When I use the following, it will include all the barcodes that end in -1, as well as -10, -11, -12, -13, -14, -15 and -16.
> df[grep("([ACGT]-1){1}", df$Barcode),]
Barcode Cluster
1 AAACCTGAGCGTCAAG-1 6
2 AAACCTGAGTACCGGA-1 3
3 AAACCTGCAGCTGCTG-1 6
6 AAACCTGGTATAGTAG-10 11
7 AAACCTGGTCGCGTGT-1 14
8 AAACCTGGTTTCCACC-16 18
9 AAACCTGTCATGCATG-14 9
10 AAACCTGTCGCAGGCT-15 8
11 AAACGGGAGAACTCGG-1 14
>
Is there a regex that will include barcodes ending in -1, but exclude all other barcodes that end in numbers from 10 - 16?
I want to subset the dataframe so that I only get this:
Barcode Cluster
1 AAACCTGAGCGTCAAG-1 6
2 AAACCTGAGTACCGGA-1 3
3 AAACCTGCAGCTGCTG-1 6
7 AAACCTGGTCGCGTGT-1 14
11 AAACGGGAGAACTCGG-1 14
>
Thanks!
How about:
df[grep("-1$", df$Barcode),]
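This matches -1 only at the very end of the string. The hyphen must come immediately before the final 1, so -11 is not matched, and the $ anchor means nothing may follow the 1, so -10 through -16 are not matched either.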
Barcode Cluster
1 AAACCTGAGCGTCAAG-1 6
2 AAACCTGAGTACCGGA-1 3
3 AAACCTGCAGCTGCTG-1 6
7 AAACCTGGTCGCGTGT-1 14
11 AAACGGGAGAACTCGG-1 14
I think you can just use df[grep("([ACGT]-1$){1}", df$Barcode),]
You can just use a $ to specify the end of the string. See more information here on "pattern" use: http://www.jdatalab.com/data_science_and_data_mining/2017/03/20/regular-expression-R.html
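An alternative to anchoring the regex, not taken from the answers above, is to parse the number after the hyphen and compare it as an integer. The helper subset_by_suffix() below is hypothetical and assumes every barcode ends in a hyphen followed by digits.

```r
barcode <- c("AAACCTGAGCGTCAAG-1", "AAACCTGAGTACCGGA-1", "AAACCTGCAGCTGCTG-1",
             "AAACCTGCATCACGAT-3", "AAACCTGCATTGGGCC-5", "AAACCTGGTATAGTAG-10",
             "AAACCTGGTCGCGTGT-1", "AAACCTGGTTTCCACC-16", "AAACCTGTCATGCATG-14",
             "AAACCTGTCGCAGGCT-15", "AAACGGGAGAACTCGG-1")
cluster <- c(6, 3, 6, 16, 17, 11, 14, 18, 9, 8, 14)
df <- data.frame(Barcode = barcode, Cluster = cluster)

# Hypothetical helper: keep rows whose numeric suffix is one of n
subset_by_suffix <- function(data, n) {
  # Drop everything up to and including the last hyphen, then parse the rest
  suffix <- as.integer(sub("^.*-", "", data$Barcode))
  data[suffix %in% n, , drop = FALSE]
}

only_1 <- subset_by_suffix(df, 1)      # rows 1, 2, 3, 7, 11
teens  <- subset_by_suffix(df, 10:16)  # rows ending in -10 to -16
only_1
```

Comparing integers sidesteps the -1 versus -10 ambiguity entirely and lets you select several suffixes at once.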
I'm fairly new here and also fairly new to R so apologies if anything is unclear.
Basically, I have a csv table of numbers for each person, 1 number for each week for 38 weeks.
For example, Anthony has number 6 in week 1, 12 in week 2 and so on, these numbers are fairly random and range from 1-20.
I have taken the numbers from the table and saved them into a string, so Anthony's string when printed would look like
"6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
What I'm trying to do with this is count the number of times numbers between 1 and 10 occur in groups of 3 consecutively, then in groups of 4 consecutively, and possibly 5.
For example, in this string 8, 9 and 1 occur consecutively and then 3, 5, 8 and 9 occur consecutively, meaning the number of occurrences is 2.
I've tried using str_count from the stringr package and also tried a few different functions located here - Count the number of overlapping substrings within a string
I can't seem to find a method/function to get this to output what I want (a simple count of the number of occurrences).
If anyone could provide any insight/help it would be greatly appreciated.
It would be easier to keep these as numbers. Here I use scan() to turn your string into a numeric vector, compare each value with 10 to get a logical vector indicating whether it is less than 10, and then call rle() on that to calculate run lengths
x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
rr <- rle(scan(text=x)<10)
Now I can mangle this into a data.frame and see which runs were longer than 2
subset(as.data.frame(unclass(rr)), values==T & lengths>2)
# lengths values
# 9 3 TRUE
# 17 4 TRUE
So we can see that we had a run of 3 and a run of 4.
I could clean this up by defining a function to turn the rle into a data.frame more easily and track the starting indexes
as.data.frame.rle <- function(x, ...) {
  # use the argument x rather than the global rr, so this works for any rle object
  data.frame(unclass(x), start=head(cumsum(c(0,x$lengths))+1,-1))
}
and can then run
subset(as.data.frame(rle(scan(text=x)<10)), values==T & lengths>2)
# lengths values start
# 9 3 TRUE 15
# 17 4 TRUE 35
so we can see those runs start at positions 15 and 35.
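To get the plain counts the question asks for, the run lengths can be tallied directly. This is a small sketch on the same string; nums and run_len are names introduced here, and "between 1 and 10" is taken to mean less than 10, as in the answer above.

```r
x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"

nums <- scan(text = x, quiet = TRUE)
rr <- rle(nums < 10)

# Lengths of the runs where the condition holds
run_len <- rr$lengths[rr$values]

sum(run_len == 3)   # runs of exactly 3
sum(run_len == 4)   # runs of exactly 4
sum(run_len >= 3)   # runs of 3 or more
table(run_len)      # tally of every run length
```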
I am working with a large dataset and I am trying to first identify clusters of values that meet specific threshold values. My aim then is to only keep clusters of a minimum length. Below is some example data and my progress thus far:
Test = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
Sequence = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
Value = c(3,2,3,4,3,4,4,5,5,2,2,4,5,6,4,4,6,2,3,2)
Data <- data.frame(Test, Sequence, Value)
Using package evd, I have identified clusters of values >3
C1 <- clusters(Data$Value, u = 3, r = 1, cmax = F, plot = T)
Which produces
C1
$cluster1
4
4
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
My problem is twofold:
1) I don't know how to relate this back to the original dataframe (for example to Test A & B)
2) How can I only keep clusters with a minimum size of 3 (thus excluding Cluster 1)
I have looked into various filtering options etc. however they do not cluster data according to a desired threshold, with no options for the minimum size of the cluster either.
Any help is much appreciated.
Q1: relate back to the original dataframe: have a look at Carl Witthoft's answer to "detect intervals of the consequent integer sequences". He wrote seqle(), a variant of rle() that looks for integer sequences rather than repetitions.
Q2: only keep clusters of a certain length (use >= so that a cluster of exactly 3 is kept):
C1[sapply(C1, length) >= 3]
yields the 2 clusters that are long enough:
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
Converting 24-hour time (like military time) to 12-hr (clock-face) time seems like a perfect place to use the modulo operator, but I can't figure out a purely mathematical way to map 0 to 12 (so have hours 1 through 12 instead of 0 through 11). The best I've been able to come up with are either (in Ruby)
modHour = militaryHour % 12
if modHour == 0
clockHour = 12
else
clockHour = modHour
end
or,
hours = [12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
clockHour = hours[ militaryHour % 12 ]
It seems like there must be some way to accomplish this shift mathematically, but I can't figure it out.
I think
hour12 = 12 - ((- hour24) % 12)
should work.
(pardon my Python...)
>>> for hr in range(24):
...     print(hr, (hr + 11) % 12 + 1)
...
0 12
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 1
14 2
15 3
16 4
17 5
18 6
19 7
20 8
21 9
22 10
23 11
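As a quick sanity check, the two formulas posted so far can be compared over all 24 hours. The sketch below uses R, whose %% takes the sign of the divisor just as % does in Ruby and Python; in languages where the remainder takes the sign of the dividend, the version that negates the hour may behave differently. The names shifted and negated are illustrative.

```r
hour24 <- 0:23

# Shift by 11, wrap, then add 1
shifted <- (hour24 + 11) %% 12 + 1

# Negate, wrap, then subtract from 12
negated <- 12 - ((-hour24) %% 12)

all(shifted == negated)   # TRUE
shifted[hour24 == 0]      # 12
shifted[hour24 == 13]     # 1
```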
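The answer by Eric Jablow did not yield the correct result for me, presumably because % can return a negative result for a negative operand in some languages. I found that this inline expression worked, though.
int militaryTime = 14;
// adding 11 rather than subtracting 1 keeps the operand non-negative, so hour 0 still maps to 12
int civilianTime = ((militaryTime + 11) % 12) + 1;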