How to group a numeric variable in R? [duplicate]

This question already has answers here:
Convert continuous numeric values to discrete categories defined by intervals
(2 answers)
Cut by Defined Interval
(2 answers)
Closed 2 years ago.
I have data with a numeric variable A. I want to group the values of A to get something like B.
data <- structure(list(A = c(0, 0, 0, 0, 1, 2, 9, 15, 30, 100, 0.2, 0.003,
95, 18), B = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 10L, 1L, 1L,
10L, 2L)), class = "data.frame", row.names = c(NA, -14L))

Are you trying to create B from A? It looks like you want something like
data$A %/% 10
[1] 0 0 0 0 0 0 0 1 3 10 0 0 9 1
or
(data$A %/% 10)+1
[1] 1 1 1 1 1 1 1 2 4 11 1 1 10 2
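Note that the second variant returns 11 for A = 100, whereas B in the question has 10. A minimal sketch that reproduces B on the sample data uses cut from base R. It assumes the intended bins are [0,10), [10,20), ..., [90,100]; that binning is inferred from the sample data, not stated in the question, and the column name B2 is a hypothetical name used only for illustration.

```r
data <- data.frame(
  A = c(0, 0, 0, 0, 1, 2, 9, 15, 30, 100, 0.2, 0.003, 95, 18),
  B = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 10L, 1L, 1L, 10L, 2L)
)

# Assumed bins: left-closed intervals of width 10. With right = FALSE,
# include.lowest = TRUE closes the last interval, so 100 lands in bin 10.
data$B2 <- cut(data$A,
               breaks = seq(0, 100, by = 10),
               right = FALSE,
               include.lowest = TRUE,
               labels = FALSE)

# Compare the computed groups with the desired column B
all(data$B2 == data$B)
```

Values outside the range 0 to 100 would become NA with these breaks, so the breaks would need to be widened for other data.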

Related

Sum row on duplicated element in a column [R] [duplicate]

This question already has answers here:
Group by multiple columns and sum other multiple columns
(7 answers)
Sum multiple variables by group [duplicate]
(2 answers)
Closed 3 months ago.
I have a dataframe such as:
Names COLA COLB COLC
sp1_A 1 0 1
sp1_A 1 0 0
sp1_B 0 1 1
sp2_A 0 0 1
sp2_A 0 1 1
sp2_A 0 0 1
For each value of Names, I would like to sum the rows, so that I get:
Names COLA COLB COLC
sp1_A 2 0 1
sp1_B 0 1 1
sp2_A 0 1 3
Here is the dput format of the dataframe :
structure(list(Names = c("sp1_A", "sp1_A", "sp1_B", "sp2_A",
"sp2_A", "sp2_A"), COLA = c(1L, 1L, 0L, 0L, 0L, 0L), COLB = c(0L,
0L, 1L, 0L, 1L, 0L), COLC = c(1, 0, 1, 1, 1, 1)), class = "data.frame", row.names = c(NA,
-6L))
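No answer is quoted for this question. One possible sketch, in base R, sums every remaining column within each value of Names using aggregate. The object names df and res are assumptions for illustration.

```r
df <- structure(list(
  Names = c("sp1_A", "sp1_A", "sp1_B", "sp2_A", "sp2_A", "sp2_A"),
  COLA = c(1L, 1L, 0L, 0L, 0L, 0L),
  COLB = c(0L, 0L, 1L, 0L, 1L, 0L),
  COLC = c(1, 0, 1, 1, 1, 1)
), class = "data.frame", row.names = c(NA, -6L))

# "." on the left-hand side stands for all columns other than Names;
# sum is applied to each of them within every group of Names.
res <- aggregate(. ~ Names, data = df, FUN = sum)
res
```

The formula interface drops rows containing NA by default, which does not matter for this sample but may for real data.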

Create a new dataframe based on old one in R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 1 year ago.
The dataframe above is an example of the original one. I am trying to create the following new dataframe based on it:
Thank you!
We can use xtabs from base R
xtabs(abundance ~ StationCode + SpeciesCode, df1)
-output
           SpeciesCode
StationCode AME BCF BKB CAP
       O-01   2   1   5   0
       O-02   1   0   1   1
       O-03   0   4   2   0
       O-04   0   0   8   1
data
df1 <- structure(list(SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
"CAP", "BKB", "BKB", "BKB", "BKB"), StationCode = c("O-01", "O-02",
"O-03", "O-01", "O-04", "O-02", "O-04", "O-01", "O-02", "O-03"
), abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-10L))
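xtabs returns a contingency table rather than a data frame. If a data frame is what is needed, one option is to pass the table through as.data.frame.matrix. This is a sketch only; wide is a hypothetical name.

```r
df1 <- structure(list(
  SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
                  "CAP", "BKB", "BKB", "BKB", "BKB"),
  StationCode = c("O-01", "O-02", "O-03", "O-01", "O-04",
                  "O-02", "O-04", "O-01", "O-02", "O-03"),
  abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)
), class = "data.frame", row.names = c(NA, -10L))

# Cross-tabulate abundance by station (rows) and species (columns),
# then convert the table to a data frame that keeps the wide layout.
wide <- as.data.frame.matrix(xtabs(abundance ~ StationCode + SpeciesCode, df1))
wide
```

The station codes end up as row names here; they can be moved into a regular column if required.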

New data frame, if specific value(s) is contained AND other values aren't included in a range of columns in r

So, I have a large data frame with monthly observations of n individuals.
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
A 33 6 1 2 1 5
B 36 5 0 2 1 5
C 22 4 1 NA 1 5
D 2 2 0 2 1 5
E 5 2 1 2 1 6
F 7 1 0 2 1 5
G 8 6 1 2 1 5
H 2 8 0 2 2 5
I 1 3 1 2 1 5
J 3 2 0 2 1 5
I want to create a new data frame that includes only the individuals who meet some specific conditions.
E.g., if for individual i the columns y_0101:y_0312 do NOT include any of the values 3, 6, or NA, AND do include the value 1 or 2, THEN individual i should be included in the new data frame. This produces the following data frame:
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
B 36 5 0 2 1 5
D 2 2 0 2 1 5
F 7 1 0 2 1 5
H 2 8 0 2 2 5
I tried different ways, but I can't figure out how to get multiple conditions included.
df <- df %>% filter(vars(starts_with("y_"))!=3 | !=6 | != NA)
or
df <- df %>% filter_at(vars(starts_with("y_")), all_vars(!=3 | !=6 | != NA)
I've tried some other things as well, like !%in%, but that doesn't seem to work. Any ideas?
I think you're almost there, but might need a slight shift in the logic:
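library(dplyr)
df <- data.frame(A1 = 1:10,
                 A2 = 10:1,
                 A3 = 1:10,
                 B1 = 1:10)
df %>%
  # keep rows where none of the A columns is 3, 6, or NA
  filter_at(vars(starts_with("A")), ~!(.x %in% c(3, 6, NA))) %>%
  # then keep rows where at least one A column is 1 or 2
  filter(if_any(starts_with("A"), ~ .x %in% c(1, 2)))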
In the first step, I filter out all rows where any of the columns are 3, 6, or NA. In the second step, I filter down to only rows where at least one of the columns is 1 or 2. Does this help with your case?
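The reason the attempts with != NA in the question cannot work is that any comparison with NA yields NA, while %in% treats NA as a value that can be matched. A small base R illustration, using a made-up vector x:

```r
x <- c(1, 3, NA, 6, 2)  # made-up values for illustration

# Comparing with NA never gives TRUE or FALSE, only NA
cmp <- x != NA
cmp

# %in% matches NA like any other value, so it can be used to exclude it
excluded <- x %in% c(3, 6, NA)
excluded

# Elements that are none of 3, 6, or NA
keep <- !excluded
x[keep]
```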
Here is a base R option using rowSums :
cols <- grep('y_', names(df))
include <- c(1, 2)
not_include <- c(3, 6, NA)
result <- subset(df, rowSums(sapply(df[cols], `%in%`, include)) > 0 &
rowSums(sapply(df[cols], `%in%`, not_include)) == 0)
result
# ind y_0101 y_0102 y_0103 y_0104 y_0311 y_0312
#2 B 36 5 0 2 1 5
#4 D 2 2 0 2 1 5
#6 F 7 1 0 2 1 5
#8 H 2 8 0 2 2 5
data
df <- structure(list(ind = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), y_0101 = c(33L, 36L, 22L, 2L, 5L, 7L, 8L, 2L, 1L,
3L), y_0102 = c(6L, 5L, 4L, 2L, 2L, 1L, 6L, 8L, 3L, 2L), y_0103 = c(1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L), y_0104 = c(2L, 2L, NA, 2L,
2L, 2L, 2L, 2L, 2L, 2L), y_0311 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L), y_0312 = c(5L, 5L, 5L, 5L, 6L, 5L, 5L, 5L, 5L, 5L
)), class = "data.frame", row.names = c(NA, -10L))

Find a turnover of each value in column [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 2 years ago.
I have a dataset, which I define, for example, like this:
type <- c(1,1,1,2,2,2,2,2,3,3,4,4,5,5)
val <- c(4,1,1,2,8,2,3,2,3,3,4,4,5,7)
tdt <- data.frame(type, val)
So it looks like this:
type val
1 4
1 1
1 1
2 2
2 8
2 2
2 3
2 2
3 3
3 3
4 4
4 4
5 5
5 7
I want to find how many unique values of val each type has (its turnover). So the desired result is:
type turnover
1 2
2 3
3 1
4 1
5 2
How could I get it? What should this function look like? I know how to count the occurrences of each type, but not the number of unique vals per type.
With n_distinct, we can get the number of unique elements grouped by 'type'
library(dplyr)
tdt %>%
group_by(type) %>%
summarise(turnover = n_distinct(val))
# A tibble: 5 x 2
# type turnover
# <int> <int>
#1 1 2
#2 2 3
#3 3 1
#4 4 1
#5 5 2
Or with distinct and count
tdt %>%
distinct() %>%
count(type)
# type n
#1 1 2
#2 2 3
#3 3 1
#4 4 1
#5 5 2
Or using uniqueN from data.table
library(data.table)
setDT(tdt)[, .(turnover = uniqueN(val)), type]
Or with table in base R after getting the unique rows
table(unique(tdt)$type)
data
tdt <- structure(list(type = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L), val = c(4L, 1L, 1L, 2L, 8L, 2L, 3L, 2L, 3L,
3L, 4L, 4L, 5L, 7L)), class = "data.frame", row.names = c(NA,
-14L))
Another base R option is using aggregate
tdtout <- aggregate(val~.,tdt,function(v) length(unique(v)))
such that
> tdtout
type val
1 1 2
2 2 3
3 3 1
4 4 1
5 5 2
data
> dput(tdt)
structure(list(type = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5,
5), val = c(4, 1, 1, 2, 8, 2, 3, 2, 3, 3, 4, 4, 5, 7)), class = "data.frame", row.names = c(NA,
-14L))
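In the aggregate result above, the count column keeps the name val. If the column should be called turnover as in the question, one way (a sketch; out is a hypothetical name) is to rename it with setNames:

```r
tdt <- data.frame(
  type = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5),
  val  = c(4, 1, 1, 2, 8, 2, 3, 2, 3, 3, 4, 4, 5, 7)
)

# Count the distinct values of val within each type,
# then rename the columns to match the desired output.
out <- setNames(
  aggregate(val ~ type, data = tdt, FUN = function(v) length(unique(v))),
  c("type", "turnover")
)
out
```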

R code to assign a sequence based off of multiple variables [duplicate]

This question already has answers here:
Recode dates to study day within subject
(2 answers)
Closed 3 years ago.
I have data structured as below:
ID Day Desired Output
1 1 1
1 1 1
1 1 1
1 2 2
1 2 2
1 3 3
2 4 1
2 4 1
2 5 2
3 6 1
3 6 1
Is it possible to create a sequence for the desired output without using a loop? The dataset is quite large, so a loop won't work. Can this be done with the dplyr package, or maybe with a combination of cumsum/diff?
One option is to group by 'ID' and then match 'Day' against the unique values of the 'Day' column within each group
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(desired = match(Day, unique(Day)))
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L), Day = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L, 6L)), row.names = c(NA,
-11L), class = "data.frame")
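The same match-against-unique idea can also be sketched in base R with ave, which applies a function within each group and returns a vector aligned with the original rows. The column name desired follows the dplyr answer above; using ave instead of dplyr is an alternative, not something the original answer proposes.

```r
df1 <- data.frame(
  ID  = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Day = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L, 6L)
)

# Within each ID, number the days by their order of first appearance
df1$desired <- ave(df1$Day, df1$ID,
                   FUN = function(d) match(d, unique(d)))
df1
```

This numbers days in order of first appearance within each ID, so it gives the intended result when the rows are already sorted by Day inside each ID, as they are in the sample.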
