R: counting amount of patterns of numbers [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I'm fairly new here and also fairly new to R so apologies if anything is unclear.
Basically, I have a csv table of numbers for each person, 1 number for each week for 38 weeks.
For example, Anthony has number 6 in week 1, 12 in week 2 and so on, these numbers are fairly random and range from 1-20.
I have taken the numbers from the table and saved them into a string, so Anthony's string when printed looks like
"6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
What I'm trying to do is count the number of times that numbers between 1 and 10 occur in runs of 3 consecutive values, then in runs of 4, and possibly 5.
For example, in this string 8, 9 and 1 occur consecutively and then 3, 5, 8 and 9 occur consecutively, meaning the amount of occurrences is 2.
I've tried using str_count from the stringr package and also tried a few different functions located here - Count the number of overlapping substrings within a string
I can't seem to find a method/function to get this to output what I want (a simple count of the number of occurrences).
If anyone could provide any insight/help it would be greatly appreciated.

It would be easier to keep these as numbers. Here I use scan() to turn your string into a numeric vector, compare each value with 10 to get a logical vector (TRUE where the number is less than 10), and then call rle() on that to calculate the run lengths.
x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
rr <- rle(scan(text=x)<10)
Now I can mangle this into a data.frame and see which runs were longer than 2
subset(as.data.frame(unclass(rr)), values==T & lengths>2)
#    lengths values
# 9        3   TRUE
# 17       4   TRUE
So we can see that we had a run of 3 and a run of 4.
I could clean this up by defining a function to turn the rle into a data.frame more easily and track the starting indexes
as.data.frame.rle <- function(x, ...) {
  # use the object passed in (x), not the global rr, to compute where each run starts
  data.frame(unclass(x), start=head(cumsum(c(0,x$lengths))+1,-1))
}
and can then run
subset(as.data.frame(rle(scan(text=x)<10)), values==T & lengths>2)
#    lengths values start
# 9        3   TRUE    15
# 17       4   TRUE    35
so we can see those runs start at positions 15 and 35.
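If all you need is the count, the same rle() idea can be wrapped in a small function. This is only a sketch: count_runs, min_len and limit are names I made up for illustration, and it assumes "between 1 and 10" means strictly less than 10, as in the code above.

```r
# Hypothetical helper: count runs of at least `min_len` consecutive values below `limit`.
count_runs <- function(x, min_len = 3, limit = 10) {
  runs <- rle(x < limit)                      # run-length encode the TRUE/FALSE vector
  sum(runs$values & runs$lengths >= min_len)  # keep only TRUE runs that are long enough
}

x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
nums <- scan(text = x, quiet = TRUE)          # parse the string into a numeric vector

count_runs(nums, min_len = 3)  # 2 (the run of 3 and the run of 4)
count_runs(nums, min_len = 4)  # 1 (only the run of 4)
```

Note that each run is counted once, so a run of 4 counts as a single "3 or more" run rather than as two overlapping groups of 3.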

Related

WGCNA : Choosing a soft-threshold power

powers = c(c(1:10), seq(from = 12, to=20, by=2));
While going through WGCNA I came across this code, which I am not able to understand. Can anybody explain the meaning of that piece of code?
The code will create a vector of numbers stored in powers.
Specifically: 1:10 creates the numbers 1 2 3 4 5 6 7 8 9 10 (can read as 1 through 10) and seq(from = 12, to = 20, by = 2) creates a sequence of every other number from 12 to 20, i.e. 12 14 16 18 20.
Powers will contain the following 15 numbers: 1 2 3 4 5 6 7 8 9 10 12 14 16 18 20
I am not familiar with the WGCNA package, and I don't know whether powers is an argument to a function, but this is what powers contains.
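A quick way to check this yourself is to evaluate the pieces separately. The sketch below uses only base R; the names inner, steps and simpler are just for illustration.

```r
inner <- c(1:10)                          # same as 1:10, the extra c() changes nothing
steps <- seq(from = 12, to = 20, by = 2)  # 12 14 16 18 20
powers <- c(inner, steps)                 # the two pieces joined into one vector

length(powers)                            # 15

# The original line can be written without the nested c():
simpler <- c(1:10, seq(from = 12, to = 20, by = 2))
identical(powers, simpler)                # TRUE
```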

Frequency distribution using binCounts

I have a dataset of customer ages and I want to make a frequency distribution with age groups 9 years wide.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome would be similar to the table below; the variable names can differ (as you wish).
Could I use binCounts for this? If yes, could you help me with the code, as I am not sure what bx and idxs mean in it?
binCounts(x, idxs = NULL, bx, right = FALSE)
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!
I don't know binCounts or the package it is in, but here is a base R solution:
data.frame(table(cut(Ages,0:7*9+37)))
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit=c(37,46,55,64,73,82,91,100)# break points; the last one is 100 so the final label reads 92-100
Labels=paste(head(lowerlimit,-1)+1,lowerlimit[-1],sep="-")# add one to each lower break to get 38, 47, etc.
group=cut(Ages,lowerlimit,Labels)# determine which group each age belongs to
tab=table(group)# form a frequency table
as.data.frame(tab)# transform the table into a data frame
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages,s<-0:7*9+37,paste(head(s+1,-1),s[-1],sep="-"))))
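Since the ages are whole numbers, the same groups can also be written as left-closed intervals [38,47), [47,56), and so on, which is the convention the right = FALSE argument in the question suggests. The sketch below stays in base R. As far as I know binCounts() comes from the matrixStats package, with bx being the vector of bin boundaries and idxs an optional subset of x, but I have not run it here, so the commented-out line is an assumption.

```r
Ages <- c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
          70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
          84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)

breaks <- seq(38, 101, by = 9)                 # 38 47 56 65 74 83 92 101
labels <- paste(head(breaks, -1), breaks[-1] - 1, sep = "-")  # "38-46" ... "92-100"

# right = FALSE makes each bin include its lower bound and exclude its upper bound
counts <- table(cut(Ages, breaks, labels = labels, right = FALSE))
as.data.frame(counts, responseName = "Count")

# Assumed equivalent if matrixStats is installed (not run here):
# matrixStats::binCounts(Ages, bx = breaks)
```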

Adding all values of a variable in R [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 7 years ago.
I don't know how to word the title exactly, so I will just do my best to explain below... Sorry in advance for the .csv format.
I have the following example dataset:
print(data)
ID Tag Flowers
1 1 6871 1
2 2 6750 1
3 3 6859 1
4 4 6767 1
5 5 6747 1
6 6 6261 1
7 7 6750 1
8 8 6767 1
9 9 6812 1
10 10 6746 1
11 11 6496 4
12 12 6497 1
13 13 6495 4
14 14 6481 1
15 15 6485 1
Notice that in Lines 2 and 7, the tag 6750 appears twice. I observed one flower on plant number 6750 on two separate days, equaling two flowers in its lifetime. Basically, I want to add every flower that occurs for tag 6750, tag 6767, etc throughout ~100 rows. Each tag appears more than once, usually around 4 or 5 times.
I feel like I need to apply the unlist function here, but I'm a little bit lost as to how I should do so.
Without any extra packages, you can use function aggregate():
res<-aggregate(data$Flowers, list(data$Tag), sum)
This calculates a sum of the values in Flowers column for every value in the Tag column.
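The formula interface of aggregate() does the same thing and keeps the column names in the result. A small sketch using the first eight rows of the example data (the data frame is rebuilt here only so the snippet runs on its own):

```r
data <- data.frame(
  ID      = 1:8,
  Tag     = c(6871, 6750, 6859, 6767, 6747, 6261, 6750, 6767),
  Flowers = c(1, 1, 1, 1, 1, 1, 1, 1)
)

# Sum Flowers within each Tag; the result has one row per distinct Tag
res <- aggregate(Flowers ~ Tag, data = data, FUN = sum)
res

res$Flowers[res$Tag == 6750]  # 2, one flower on each of two days
```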

Issue with order function [duplicate]

This question already has answers here:
Understanding the order() function
(7 answers)
Closed 9 years ago.
I have this function and it takes a few parameters.
I have this part of the function here:
sort.order <- order(inputs[,input.of.interest])
If I print inputs I get something like:
     Status Quo Vaccination
[1,]  10.409146   16.252537
[2,]   5.834875    9.373437
[3,]   5.784903   15.935623
[4,]  12.208484   18.654250
[5,]   9.786787   16.467321
[6,]   6.560276    9.689887
But what is input.of.interest supposed to be?
What does it mean, how is this function used?
Should it be a number, i.e if it's 2, what would it do?
It chooses the column to sort by. If it's 1 it sorts by Status Quo and if it's 2 it sorts by Vaccination.
x <- seq(20, 11, -1)
x
# [1] 20 19 18 17 16 15 14 13 12 11
order(x)
# [1] 10 9 8 7 6 5 4 3 2 1
x[order(x)]
# [1] 11 12 13 14 15 16 17 18 19 20
Hope you see better how it works.
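Applied to a matrix like the one in the question, this could look as follows. It is a sketch: the matrix is rebuilt from the printed values, and I am assuming the function goes on to use sort.order to reorder the rows.

```r
inputs <- cbind(
  "Status Quo"  = c(10.409146, 5.834875, 5.784903, 12.208484, 9.786787, 6.560276),
  "Vaccination" = c(16.252537, 9.373437, 15.935623, 18.654250, 16.467321, 9.689887)
)

input.of.interest <- 2                     # sort by the second column (Vaccination)
sort.order <- order(inputs[, input.of.interest])
sort.order
# [1] 2 6 3 1 5 4

inputs[sort.order, ]                       # rows rearranged by increasing Vaccination
```

A column name works as well as a number, e.g. input.of.interest <- "Vaccination".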

Frequency distribution with custom format data

I need help with a R plot, with a data format I have not worked with before. Please help if you know.
NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3
I need a bar plot with the numbers on the X axis (continuous, not bins as in a histogram) and the frequency on Y, but with the frequencies for repeated numbers combined,
like
10 46
11 3
12 6
It seems simple enough, but I have 10,000 rows and large numbers in the real data, so I am looking for a good solution in R without doing it manually.
What about:
## tapply splits dd$FREQUENCY by dd$NUMBER and sums each group
barplot(tapply(dd$FREQUENCY, dd$NUMBER, sum))
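This gives one bar per NUMBER, with the frequencies for repeated numbers added together (46, 3 and 6 for the sample data).
To reproduce it, read in your data first: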
dd = read.table(textConnection("NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3"), header=TRUE)
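Put in execution order, the whole thing can be sketched as below. The totals variable and the axis labels are my additions, and read.table(text = ...) is used as a shorthand for the textConnection() call.

```r
dd <- read.table(text = "NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3", header = TRUE)

# Sum FREQUENCY within each distinct NUMBER; the result is a named vector
totals <- tapply(dd$FREQUENCY, dd$NUMBER, sum)
totals
# 10 11 12
# 46  3  6

# One bar per NUMBER, labelled with the names of the vector
barplot(totals, xlab = "NUMBER", ylab = "FREQUENCY")
```

If you want the combined table as a data frame instead, aggregate(FREQUENCY ~ NUMBER, data = dd, FUN = sum) gives the same sums.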
