How to count the number of times each digit appears in a vector (in R)

Given a vector
num <- c(1, 2, 4, 13, 25)
I want to count how many times each digit appears; in this case the result would be
digit times
1     2
2     2
3     1
4     1
5     1

You may try
table(unlist(strsplit(as.character(num), "")))
1 2 3 4 5
2 2 1 1 1
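If you want the result as a digit/times table like the one in the question, here is a minimal sketch building on the same strsplit idea. It assumes the numbers are non-negative whole numbers that as.character() prints without scientific notation; the names digits, counts and res are only for illustration.

```r
num <- c(1, 2, 4, 13, 25)

# Split every number into its single characters
digits <- unlist(strsplit(as.character(num), ""))

# Fixing the levels to 0:9 keeps digits that never occur (count 0)
# and silently drops any non-digit character such as "-" or "."
counts <- table(factor(digits, levels = 0:9))

res <- data.frame(digit = 0:9, times = as.vector(counts))

# Keep only the digits that actually appear:
# digits 1 to 5 with counts 2, 2, 1, 1, 1
res[res$times > 0, ]
```

Dropping the last filter gives the full 0 to 9 table, which can be handy when comparing several vectors.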

Related

Is there a quick way to transform intervals (Start and End) into a list of the numbers in each interval in R

I have a file with interval values such as this for 50M lines:
>data
start_pos end_pos
1 1 10
2 3 6
3 5 9
4 6 11
And I would like to have a table of position occurrences so that I can compute the coverage on each position in the interval file such as this:
>occurence
position coverage
1 1
2 1
3 2
4 2
5 3
6 4
7 3
8 3
9 3
10 2
11 1
Is there a fast way to complete this task in R?
My plan was to loop through the data and concatenate the sequence in each interval into a vector and convert the final vector into a table.
count<-c()
for (row in 1:nrow(data)){
count<-c(count,(data[row,]$start_pos:data[row,]$end_pos))
}
occurence <- table(count)
The problem is that my file is huge and it takes way too much time and memory to do so.
The Bioconductor IRanges package does this quickly and efficiently:
library(IRanges)
ir = IRanges(start = c(1, 3, 5, 6), end = c(10, 6, 9, 11))
coverage(ir)
with
> coverage(ir) |> as.data.frame()
value
1 1
2 1
3 2
4 2
5 3
6 4
7 3
8 3
9 3
10 2
11 1
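If installing Bioconductor is not an option, the same coverage can be sketched in base R without expanding every interval: add 1 at each start, subtract 1 just past each end, and take a cumulative sum. This is a different technique from what IRanges does internally, and it assumes positions are positive integers; delta, n and coverage_vec are names made up for this example.

```r
data <- data.frame(start_pos = c(1, 3, 5, 6),
                   end_pos   = c(10, 6, 9, 11))

n <- max(data$end_pos)

# +1 where an interval starts, -1 one position after it ends
delta <- tabulate(data$start_pos, nbins = n + 1) -
  tabulate(data$end_pos + 1, nbins = n + 1)

# The running sum is the number of intervals covering each position
coverage_vec <- cumsum(delta)[seq_len(n)]

occurrence <- data.frame(position = seq_len(n), coverage = coverage_vec)
occurrence$coverage
# [1] 1 1 2 2 3 4 3 3 3 2 1
```

Memory use grows with the largest end position rather than with the total length of all intervals, which is what makes the loop in the question so slow.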

How to split the data 1 1 2 2 3 3 to 1 2 3 1 2 3 in R? [duplicate]

I want to convert a vector:
1 1 2 2 3 3
to
1 2 3 1 2 3
How to do it? Many thanks.
You can use a matrix to lay out the original vector by rows and then convert it back to a vector to get the desired result.
v = c(1,1,2,2,3,3)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
> v2
[1] 1 2 3 1 2 3
The length(unique(v)) is there to generalize how many rows the matrix should have and not hardcode a 3.
Another example:
v = c(1,1,1,2,2,2,3,3,3,4,4,4)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
v2
[1] 1 2 3 4 1 2 3 4 1 2 3 4
We can use rbind/split
c(do.call(rbind, split(v1, v1)))
#[1] 1 2 3 1 2 3
Or, if the elements have unequal numbers of replications, order by the rowid
library(data.table)
v1[order(rowid(v1))]
#[1] 1 2 3 1 2 3
Or with base R
v1[order(ave(v1, v1, FUN = seq_along))]
#[1] 1 2 3 1 2 3
data
v1 <- c(1, 1, 2, 2, 3, 3)
vec <- c(1, 1, 2, 2, 3, 3)
rep(unique(vec), 2)
[1] 1 2 3 1 2 3
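Note that the matrix and rep approaches assume every value is repeated the same number of times. A small base R sketch of the order-by-occurrence idea on a vector with unequal counts (the vector v and the name occ are just for this example):

```r
v <- c(1, 1, 1, 2, 3, 3)

# Occurrence index of each element within its own value: 1 2 3 1 1 2
occ <- ave(v, v, FUN = seq_along)

# Sorting by that index puts all first occurrences first, then all
# second occurrences, and so on; ties keep their original order
res <- v[order(occ)]
res
# [1] 1 2 3 1 3 1
```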

Pair-wise manipulating rows in data.frame

I have data on several thousand US basketball players over multiple years.
Each basketball player has a unique ID. It is known for what team and on which position they play in a given year, much like the mock data df below:
df <- data.frame(id = c(rep(1:4, times=2), 1),
year = c(1, 1, 2, 2, 3, 4, 4, 4,5),
team = c(1,2,3,4, 2,2,4,4,2),
position = c(1,2,3,4,1,1,4,4,4))
> df
id year team position
1 1 1 1 1
2 2 1 2 2
3 3 2 3 3
4 4 2 4 4
5 1 3 2 1
6 2 4 2 1
7 3 4 4 4
8 4 4 4 4
9 1 5 2 4
What is an efficient way to manipulate df into new_df below?
> new_df
id move time position.1 position.2 year.1 year.2
1 1 0 2 1 1 1 3
2 2 1 3 2 1 1 4
3 3 0 2 3 4 2 4
4 4 1 2 4 4 2 4
5 1 0 2 1 4 3 5
In new_df the first occurrence of each basketball player is compared to the second occurrence, recording whether the player switched teams and how long it took the player to make the switch.
Note:
In the real data some basketball players occur more than twice and can play for multiple teams and on multiple positions.
In such a case a new row in new_df is added that compares each additional occurrence of a player with only the previous occurrence.
Edit: I don't think this is a simple reshape exercise, for the reasons mentioned in the previous two sentences. To clarify this, I've added an additional occurrence of player ID 1 to the mock data.
Any help is most welcome and appreciated!
s=table(df$id)
df$time=rep(1:max(s),each=length(s))
df1 = reshape(df,idvar = "id",dir="wide")
transform(df1, move=+(team.1==team.2),time=year.2-year.1)
id year.1 team.1 position.1 year.2 team.2 position.2 move time
1 1 1 1 1 3 2 1 0 2
2 2 1 2 2 4 2 1 1 3
3 3 2 3 3 4 4 4 0 2
4 4 2 4 4 4 4 4 1 2
The code below should get you to the point where the data is transposed.
You'll still have to create the move and time variables yourself.
df <- data.frame(id = rep(1:4, times=2),
year = c(1, 1, 2, 2, 3, 4, 4, 4),
team = c(1, 2, 3, 4, 2, 2, 4, 4),
position = c(1, 2, 3, 4, 1, 1, 4, 4))
library(reshape2)
library(data.table)
setDT(df) #convert to data.table
df[,rno:=rank(year,ties="min"),by=.(id)] #gives the occurrence number
#creating the transposed dataset
Dcast_DT<-dcast(df,id~rno,value.var = c("year","team","position"))
This piece of code did the trick, using data.table
#transform to data.table
library(data.table)
dtt <- as.data.table(df)
#sort on year
setorder(dtt, year, na.last=TRUE)
#indicate the names of the new columns
new_cols= c("time", "move", "prev_team", "prev_year", "prev_position")
#set up the new variables
dtt[ , (new_cols) := list(year - shift(year),team!= shift(team), shift(team), shift(year), shift(position)), by = id]
# select only repeating occurrences
dtt <- dtt[!is.na(dtt$time),]
#outcome
dtt
id year team position time move prev_team prev_year prev_position
1: 1 3 2 1 2 TRUE 1 1 1
2: 2 4 2 1 3 FALSE 2 1 2
3: 3 4 4 4 2 TRUE 3 2 3
4: 4 4 4 4 2 FALSE 4 2 4
5: 1 5 2 4 2 FALSE 2 3 1
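For anyone who prefers to stay in base R, the same "compare each row with the player's previous row" logic can be sketched with ave(). This is only an illustration under the assumption that the mock df from the question is representative; lag1 is a hypothetical helper defined here, not a base R function.

```r
df <- data.frame(id = c(rep(1:4, times = 2), 1),
                 year = c(1, 1, 2, 2, 3, 4, 4, 4, 5),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4, 2),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4, 4))

# Put each player's rows in chronological order
df <- df[order(df$id, df$year), ]

# Hypothetical helper: shift a vector down by one, NA in first place
lag1 <- function(x) c(NA, head(x, -1))

# Previous year, team and position within each player
df$prev_year     <- ave(df$year, df$id, FUN = lag1)
df$prev_team     <- ave(df$team, df$id, FUN = lag1)
df$prev_position <- ave(df$position, df$id, FUN = lag1)

# Drop each player's first occurrence, then derive time and move
res <- df[!is.na(df$prev_year), ]
res$time <- res$year - res$prev_year
res$move <- res$team != res$prev_team
res
```

Here move is TRUE when the team changed, as in the data.table answer; flip the comparison if you need the coding used in the question's new_df.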

Count the occurrence of one vector's values in another vector including non match values in R

I have 2 vectors:
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4)
v2 <- c(1, 2, 3, 4, 5, 6, 7)
I want to calculate the occurrence of values of v1 in v2. The expected result is:
1 2 3 4 5 6 7
2 1 2 2 1 1 0
I know there is a function that can do this:
table(v1[v1 %in% v2])
However, it only lists the matched values:
1 2 3 4 5 6
2 1 2 2 1 1
How can I show all the values in v2?
You can do
table(factor(v1, levels=unique(v2)))
# 1 2 3 4 5 6 7
# 2 1 2 2 1 1 0
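This works because table() reports every level of a factor, including levels with no observations. A short sketch showing the other side of it: values of v1 that are not in v2 become NA in the factor and are left out of the counts. The extra value 8 and the name counts are made up for this example.

```r
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4, 8)  # 8 does not occur in v2
v2 <- c(1, 2, 3, 4, 5, 6, 7)

# Levels come from v2, so 7 is reported with a count of 0
# and 8 (not a level) is dropped
counts <- table(factor(v1, levels = unique(v2)))

names(counts)
# [1] "1" "2" "3" "4" "5" "6" "7"
as.vector(counts)
# [1] 2 1 2 2 1 1 0
```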

Count frequency of each element in vector

I'm looking for a way to count the frequency of each element in a vector.
ex <- c(2,2,2,3,4,5)
Desired outcome:
[1] 3 3 3 1 1 1
Is there a simple command for this?
rep(table(ex), table(ex))
# 2 2 2 3 4 5
# 3 3 3 1 1 1
If you don't want the labels you can wrap in as.vector()
as.vector(rep(table(ex), table(ex)))
# [1] 3 3 3 1 1 1
I'll add (because it seems related somehow) that if you only wanted consecutive values, you could use rle instead of table:
ex2 = c(2, 2, 2, 3, 4, 2, 2, 3, 4, 4)
rep(rle(ex2)$lengths, rle(ex2)$lengths)
# [1] 3 3 3 1 1 2 2 1 2 2
As pointed out in comments, for a large vector calculating a table can be expensive, so doing it only once is more efficient:
tab = table(ex)
rep(tab, tab)
# 2 2 2 3 4 5
# 3 3 3 1 1 1
You can use
ex <- c(2,2,2,3,4,5)
outcome <- ave(ex, ex, FUN = length)
This is what thelatemail suggested. It is also similar to the answer to another question.
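One difference worth knowing: rep(table(ex), table(ex)) returns the counts in sorted order of the values, so it only lines up with the input when equal values are already grouped and sorted, as in ex. A sketch with an unsorted vector (ex3 is made up for this example) where ave() or a lookup in the table keeps the original order:

```r
ex3 <- c(2, 3, 2, 5, 2, 4)

# Count of each element, aligned with its position in ex3
by_ave <- ave(ex3, ex3, FUN = length)
by_ave
# [1] 3 1 3 1 3 1

# Same result by building the table once and indexing it by name
tab <- table(ex3)
by_lookup <- as.vector(tab[as.character(ex3)])
by_lookup
# [1] 3 1 3 1 3 1

# For comparison, this follows the sorted values 2 2 2 3 4 5
as.vector(rep(tab, tab))
# [1] 3 3 3 1 1 1
```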
