Frequency count based on two columns in R

I have just one dataframe as below.
df=data.frame(o=c(rep("a",12),rep("b",3)), d=c(0,0,1,0,0.3,0.6,0,1,2,3,4,0,0,1,0))
> df
o d
1 a 0.0
2 a 0.0
3 a 1.0
4 a 0.0
5 a 0.3
6 a 0.6
7 a 0.0
8 a 1.0
9 a 2.0
10 a 3.0
11 a 4.0
12 a 0.0
13 b 0.0
14 b 1.0
15 b 0.0
I want to add a new column that counts frequency based on both columns 'o' and 'd'.
The count should start again whenever the value of column 'd' is zero, as in the hand-made result below.
> df_result
o d freq
1 a 0.0 1
2 a 0.0 2
3 a 1.0 2
4 a 0.0 3
5 a 0.3 3
6 a 0.6 3
7 a 0.0 5
8 a 1.0 5
9 a 2.0 5
10 a 3.0 5
11 a 4.0 5
12 a 0.0 1
13 b 0.0 2
14 b 1.0 2
15 b 0.0 1

In base R, use ave:
df$freq <- with(df, ave(d, cumsum(d == 0), FUN = length))
df
# o d freq
#1 a 0.0 1
#2 a 0.0 2
#3 a 1.0 2
#4 a 0.0 3
#5 a 0.3 3
#6 a 0.6 3
#7 a 0.0 5
#8 a 1.0 5
#9 a 2.0 5
#10 a 3.0 5
#11 a 4.0 5
#12 a 0.0 1
#13 b 0.0 2
#14 b 1.0 2
#15 b 0.0 1
With dplyr:
library(dplyr)
df %>% add_count(grp = cumsum(d == 0), name = "freq") # name = "freq" labels the count column; the helper column grp is kept too

Using data.table and @Ronak Shah's approach:
df=data.frame(o=c(rep("a",12),rep("b",3)), d=c(0,0,1,0,0.3,0.6,0,1,2,3,4,0,0,1,0))
library(data.table)
setDT(df)[, freq := .N, by = cumsum(d == 0)]
df
#> o d freq
#> 1: a 0.0 1
#> 2: a 0.0 2
#> 3: a 1.0 2
#> 4: a 0.0 3
#> 5: a 0.3 3
#> 6: a 0.6 3
#> 7: a 0.0 5
#> 8: a 1.0 5
#> 9: a 2.0 5
#> 10: a 3.0 5
#> 11: a 4.0 5
#> 12: a 0.0 1
#> 13: b 0.0 2
#> 14: b 1.0 2
#> 15: b 0.0 1
Created on 2021-02-26 by the reprex package (v1.0.0)

One more answer using rle()
df$freq <- with(rle(cumsum(df$d == 0)), rep(lengths, lengths))
df
o d freq
1 a 0.0 1
2 a 0.0 2
3 a 1.0 2
4 a 0.0 3
5 a 0.3 3
6 a 0.6 3
7 a 0.0 5
8 a 1.0 5
9 a 2.0 5
10 a 3.0 5
11 a 4.0 5
12 a 0.0 1
13 b 0.0 2
14 b 1.0 2
15 b 0.0 1
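
The answers above group only on runs of d. For the sample data that is enough, because every change in o happens to coincide with a zero in d. If that cannot be guaranteed, a minimal sketch is to pass o to ave as a second grouping variable. This assumes the count should also restart whenever o changes, which the question implies but does not state outright. The data frame df2 and its column names are made up for illustration and are not from the question.

```r
df <- data.frame(o = c(rep("a", 12), rep("b", 3)),
                 d = c(0, 0, 1, 0, 0.3, 0.6, 0, 1, 2, 3, 4, 0, 0, 1, 0))

# Group on o AND on the run id that increases at every zero in d
df$freq <- ave(df$d, df$o, cumsum(df$d == 0), FUN = length)

# Hypothetical data where o changes without d dropping back to zero
df2 <- data.frame(o = c("a", "a", "b", "b"), d = c(0, 1, 2, 0))
df2$run_only <- ave(df2$d, cumsum(df2$d == 0), FUN = length)         # 3 3 3 1
df2$with_o   <- ave(df2$d, df2$o, cumsum(df2$d == 0), FUN = length)  # 2 2 1 1
```

On the question's data both variants give the same freq column; they differ only in cases like df2.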

Related

Using grep to get the rows of a dataframe, instead of the row number

I am trying to make a sub dataframe based on the already existing dataframe. My sub dataframe is being filled with the number of the row instead of the row itself.
rates = read.csv("file.txt")
genes = unique(gsub('_[0-9]+', '', rates[,1]))
for (k in unique(gsub('_[0-9]+', '', rates[,1])) ){
sub = print(grep(k, rates[,1]), value=T)
sub
}
file.txt
clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10
This is what is currently output
[1] 1 2 3 4 5 6
[1] 7 8 9
I would like something like this instead
clothing freq temp
1 coat_1 0.3 10
2 coat_1 0.9 0
3 coat_1 0.1 20
4 coat_2 0.5 20
5 coat_2 0.3 15
6 coat_2 0.1 5
clothing freq temp
1 scarf 0.4 30
2 scarf 0.2 20
3 scarf 0.1 10

rates <- read.csv("file.txt", stringsAsFactors = FALSE)
rates
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
# 7 scarf 0.4 30
# 8 scarf 0.2 20
# 9 scarf 0.1 10
rates[rates$clothing != "scarf",]
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
rates[rates$clothing == "scarf",]
# clothing freq temp
#7 scarf 0.4 30
#8 scarf 0.2 20
#9 scarf 0.1 10
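
The loop in the question prints row numbers because grep returns matching indices by default, and value=T was passed to print rather than to grep. If the goal is one sub data frame per base name, a possible sketch is to derive the base name and hand it to split. The names base_name and groups are assumptions for illustration, and the data is read from an inline string instead of file.txt so the example is self-contained.

```r
# Same rows as file.txt in the question, read from an inline string
rates <- read.csv(text = "clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10", stringsAsFactors = FALSE)

# Strip a trailing "_<digits>" to get the base name, e.g. coat_1 -> coat
base_name <- sub("_[0-9]+$", "", rates$clothing)

# One data frame per base name, containing the rows themselves
groups <- split(rates, base_name)
groups$coat
groups$scarf
```

Each element keeps its original row names (7, 8, 9 for scarf); rownames(groups$scarf) <- NULL resets them if the numbering should restart at 1.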

Simulation and Scenarios in R + help for function

Suppose that I have the following.
A table with input data
table <- data.frame(id=c(1,2,3,4,5,6),
cost=c(100,200,300,400,500,600))
A list of possible outcomes with an associated probability
values<-list(c(1),
c(0.5),
c(0))
A simulation of different scenarios
esc<-sample(1:3,100,replace=T)
How can I add a new column which contains the following formula?
id cost final
1 100 100*ifelse(esc[1]==1,values[[1]],ifelse(esc[1]==2,values[[2]],values[[3]]))
2 200 200*ifelse(esc[2]==1,values[[1]],ifelse(esc[2]==2,values[[2]],values[[3]]))

Convert esc variable into factor by using values as labels. Then convert into numeric type. This will map values to esc correctly.
esc <- as.numeric ( as.character( factor( esc, levels = sort( unique( esc )), labels = values) ) )
# [1] 1.0 0.5 0.5 0.0 1.0 0.0 0.0 0.5 0.5 1.0 1.0 1.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.0 1.0 0.5 1.0 1.0 0.5 1.0 0.5 0.0 0.5 0.5 0.5 0.5 1.0 0.0 0.0 0.0
# [38] 1.0 0.0 0.5 0.0 0.5 0.0 0.5 0.5 0.0 1.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.0 1.0 0.5 1.0 0.5 1.0 0.5 0.0 1.0 0.0 0.5 0.0 0.5 0.5
# [75] 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.5 0.0 0.0 0.0 0.5 0.5 0.0 0.5
table$esc <- esc[ 1: nrow(table) ] # add esc to table
Now multiply cost with esc to get final
within( table, final <- cost * esc)
# id cost esc final
# 1 1 100 1.0 100
# 2 2 200 0.5 100
# 3 3 300 0.5 150
# 4 4 400 0.0 0
# 5 5 500 1.0 500
# 6 6 600 0.0 0
Data:
table <- data.frame(id=c(1,2,3,4,5,6), cost=c(100,200,300,400,500,600))
values <- c(1, 0.5, 0)
set.seed(1L)
esc <- sample(1:3,100,replace=T)
esc
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3 3 1 2 1 1 2 1 2 3 2 2 2 2 1 3 3 3 1 3 2 3 2 3 2 2 3 1 2 3 3 2 3 2 1 1 1 1 2 2 2 3 1 2 1 2 1 2 3 1 3 2 3 2 2 2
# [76] 3 3 2 3 3 2 3 2 1 3 1 3 1 1 1 1 1 2 3 3 3 2 2 3 2
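
Since esc already holds positions 1 to 3, a shorter alternative sketch is to use it directly as an index into values, which replaces the nested ifelse from the question. This assumes values is a plain numeric vector as in the Data block. The name costs is a stand-in for the question's table (chosen to avoid masking base::table), and esc is hard-coded to the first six draws shown in the output above, because the result of sample for a given seed can differ between R versions.

```r
costs <- data.frame(id = 1:6, cost = c(100, 200, 300, 400, 500, 600))
values <- c(1, 0.5, 0)

# First six scenario draws, copied from the esc output shown above
esc <- c(1, 2, 2, 3, 1, 3)

# values[esc] looks up the multiplier for each scenario by position
costs$final <- costs$cost * values[esc]
costs
```

With a full-length esc, subset it first, for example esc[seq_len(nrow(costs))], so the lengths match.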

How to get correct ticklabs in a 3d-scatterplot in R?

Please see this example. Look at the y axis. The data there has only two levels, 1 and 2, but six tick marks are drawn on that axis. How can I fix that? The x axis has the same problem.
The data
extra group ID
1 0.7 1 1
2 -1.6 1 2
3 -0.2 1 3
4 -1.2 1 4
5 -0.1 1 5
6 3.4 1 6
7 3.7 1 7
8 0.8 1 8
9 0.0 1 9
10 2.0 1 10
11 1.9 2 1
12 0.8 2 2
13 1.1 2 3
14 0.1 2 4
15 -0.1 2 5
16 4.4 2 6
17 5.5 2 7
18 1.6 2 8
19 4.6 2 9
20 3.4 2 10
The script
require('mise')
require('scatterplot3d')
mise() # clear the workspace
# example data
print(sleep)
# plot it
scatterplot3d(x=sleep$ID,
x.ticklabs=levels(sleep$ID),
y=sleep$group,
y.ticklabs=levels(sleep$group),
z=sleep$extra)
The result

How about this:
scatterplot3d(x=sleep$ID, y=sleep$extra, z=sleep$group, lab.z = c(1, 2))

Binding dataframes by multiple conditions in R

I have a data frame which looks like this:
> data
Class Number
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 B 3
7 C 1
8 C 2
9 C 3
I have a reference data frame which is:
> reference
Class Number Value
1 A 1 0.5
2 B 3 0.3
I want to join these data frames to create a single data frame:
> resultdata
Class Number Value
1 A 1 0.5
2 A 2 0.0
3 A 3 0.0
4 B 1 0.0
5 B 2 0.0
6 B 3 0.3
7 C 1 0.0
8 C 2 0.0
9 C 3 0.0
How can I achieve this? Any help will be greatly appreciated

You can do
library(data.table)
setkey(setDT(reference), Class, Number)[data]
Or
setkey(setDT(data), Class, Number)[reference,
Value:= i.Value][is.na(Value), Value:=0]
# Class Number Value
#1: A 1 0.5
#2: A 2 0.0
#3: A 3 0.0
#4: B 1 0.0
#5: B 2 0.0
#6: B 3 0.3
#7: C 1 0.0
#8: C 2 0.0
#9: C 3 0.0

The basic starting point for this would be merge.
merge(data, reference, all = TRUE)
# Class Number Value
# 1 A 1 0.5
# 2 A 2 NA
# 3 A 3 NA
# 4 B 1 NA
# 5 B 2 NA
# 6 B 3 0.3
# 7 C 1 NA
# 8 C 2 NA
# 9 C 3 NA
There are many questions which show how to replace NA with 0.
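
For completeness, one way to finish the merge approach is sketched below. It assumes that only rows of data should be kept (hence all.x = TRUE rather than all = TRUE) and that a missing Value means zero, as in the desired output.

```r
data <- data.frame(Class = rep(c("A", "B", "C"), each = 3),
                   Number = rep(1:3, times = 3))
reference <- data.frame(Class = c("A", "B"),
                        Number = c(1L, 3L),
                        Value = c(0.5, 0.3))

# Left join on both key columns; unmatched rows get NA in Value
resultdata <- merge(data, reference, by = c("Class", "Number"), all.x = TRUE)

# Replace the NAs introduced by the join with 0
resultdata$Value[is.na(resultdata$Value)] <- 0
resultdata
```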

You can do:
library(dplyr)
left_join(data, reference) %>% (function(x) { x[is.na(x)] <- 0; x })
Or (as per @akrun's suggestion):
left_join(data, reference) %>% mutate(Value = replace(Value, is.na(Value), 0))
Which gives:
# Class Number Value
#1 A 1 0.5
#2 A 2 0.0
#3 A 3 0.0
#4 B 1 0.0
#5 B 2 0.0
#6 B 3 0.3
#7 C 1 0.0
#8 C 2 0.0
#9 C 3 0.0

Merge data frames based on rownames in R

How can I merge the columns of two data frames, containing a distinct set of columns but some rows with the same names? The fields for rows that don't occur in both data frames should be filled with zeros:
> d
a b c d e f g h i j
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
> e
k l m n o p q r s t
1 11 12 13 14 15 16 17 18 19 20
3 21 22 23 24 25 26 27 28 29 30
> de
a b c d e f g h i j k l m n o p q r s t
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0 0 0 0 0 0 0 0 0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 21 22 23 24 25 26 27 28 29 30

See ?merge:
the name "row.names" or the number 0 specifies the row names.
Example:
R> de <- merge(d, e, by=0, all=TRUE) # merge by row names (by=0 or by="row.names")
R> de[is.na(de)] <- 0 # replace NA values
R> de
Row.names a b c d e f g h i j k l m n o p q r s t
1 1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2 2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0 0 0 0 0 0 0 0 0
3 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 21 22 23 24 25 26 27 28 29 30
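
The merged result carries the old row names in a Row.names column. If they should become row names again, as in the desired de from the question, a small sketch is shown below. It uses a cut-down, made-up version of d and e with two columns each so it fits on screen.

```r
# Reduced stand-ins for d and e from the question (two columns each)
d <- data.frame(a = c(1, 0.1), b = c(2, 0.2), row.names = c("1", "2"))
e <- data.frame(k = c(11, 21), l = c(12, 22), row.names = c("1", "3"))

de <- merge(d, e, by = "row.names", all = TRUE)  # full outer join on row names
de[is.na(de)] <- 0                               # fill the gaps with zeros

# Move the Row.names column back into the row names
rownames(de) <- de$Row.names
de$Row.names <- NULL
de
```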
