Expand summarized data of counts to one row per registration [duplicate] - r

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 5 years ago.
I have a data frame with counts of each combination of a trait (true / false) for species A and B. Here's a smaller version of my data:
species <- c("A", "B")
true <- c(3, 2)
false <- c(1, 4)
df <- data.frame(species, true, false)
df
  species true false
1       A    3     1
2       B    2     4
Is there any way to convert these summarized counts to one row per registration, with a first column "Species" (A or B) and a second column "Trait" (true or false)?
Species Trait
A true
A true
A true
B true
B true
A false
B false
B false
B false
B false
I don't really know how to approach this. Usually the raw data is available and a summary table can easily be constructed from it, but this is the reverse.
I'm thankful for every answer! :)
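One possible approach, sketched in base R and assuming the counts are non-negative whole numbers: repeat each species label once per counted observation, first for the true column and then for the false column. The names long, Species and Trait are chosen here for illustration only.

```r
species <- c("A", "B")
true <- c(3, 2)
false <- c(1, 4)
df <- data.frame(species, true, false)

# Species labels as plain strings, so the code behaves the same
# whether or not data.frame() converted them to a factor.
sp <- as.character(df$species)

long <- data.frame(
  # Each species repeated by its "true" count, then by its "false" count.
  Species = c(rep(sp, times = df$true), rep(sp, times = df$false)),
  # One trait label per generated row, in the same order.
  Trait = rep(c("true", "false"), times = c(sum(df$true), sum(df$false))),
  stringsAsFactors = FALSE
)
long
```

The result has one row per registration, 10 rows in total for the example data, in the same order as the desired output in the question.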

Related

Finding the maximum value for each row and extract column names [duplicate]

This question already has answers here:
R Create column which holds column name of maximum value for each row
(4 answers)
Closed 1 year ago.
Say we have the following matrix,
x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
What I'm trying to do is:
1- Find the maximum value of each row. For this part, I'm doing the following,
df <- apply(X=x, MARGIN=1, FUN=max)
2- Then, I want to extract the column names of the maximum values and put them next to the values. Following the reproducible example, it would be "C" for the three rows.
Any assistance would be wonderful.
You can use apply like this:
maxColumnNames <- apply(x,1,function(row) colnames(x)[which.max(row)])
Since x is a numeric matrix, you can't add the names as an extra column: the whole matrix would be coerced to a character matrix.
Instead, you can use a data.frame:
resDf <- cbind(data.frame(x),data.frame(maxColumnNames = maxColumnNames))
resulting in
resDf
  A B C maxColumnNames
X 1 4 7              C
Y 2 5 8              C
Z 3 6 9              C
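As a variation, here is a minimal sketch that keeps both the row maximum and its column name, using base R's max.col() instead of apply(). The column names max_value and max_column are made up for this example, and ties.method = "first" is an assumption about how ties should be resolved.

```r
x <- matrix(1:9, nrow = 3, dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

# Column position of the largest value in each row (first one wins on ties).
idx <- max.col(x, ties.method = "first")

res <- data.frame(
  x,
  # Pick element (row i, column idx[i]) for every row via matrix indexing.
  max_value = x[cbind(seq_len(nrow(x)), idx)],
  max_column = colnames(x)[idx],
  stringsAsFactors = FALSE
)
res
```

Because the result is a data.frame, the numeric columns stay numeric while max_column holds the names as text.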

How to create a variable using another variable as an Index? [duplicate]

This question already has answers here:
Using row-wise column indices in a vector to extract values from data frame [duplicate]
(2 answers)
Closed 3 years ago.
I'm looking to create a new variable, d, which takes its value from either a or b depending on the variable c.
dat = data.frame(a=1:10,b=11:20,c=rep(1:2,5))
The result would be:
d = c(1,12,3,14,... etc)
We can use row/column matrix indexing: the row index is the sequence of row numbers and the column index is the 'c' column. cbind the two into an index matrix and use it to extract the elements from the dataset:
dat$d <- dat[1:2][cbind(seq_len(nrow(dat)), dat$c)]
dat$d
#[1] 1 12 3 14 5 16 7 18 9 20
NOTE: This should also work when there are multiple column values to extract.
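To illustrate the note, here is a small sketch with three candidate columns. The extra column e and the recycled index in c are invented for this example; the indexing technique itself is the same as above.

```r
# Hypothetical data: 'e' is a third value column, 'c' now ranges over 1:3.
dat <- data.frame(a = 1:10, b = 11:20, e = 21:30,
                  c = rep(1:3, length.out = 10))

# Two-column index matrix: (row number, column number to pick in that row).
idx <- cbind(seq_len(nrow(dat)), dat$c)

# Index the value columns as a matrix, one element per row.
dat$d <- as.matrix(dat[c("a", "b", "e")])[idx]
dat$d
```

Each row of idx selects exactly one cell, so the approach does not need one condition per column.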
You can do
dat$d <- ifelse(dat$c==1,dat$a,dat$b)
A dplyr variant:
library(dplyr)
dat %>%
  mutate(d = case_when(c == 1 ~ a,
                       TRUE ~ b))

Combine two dataframes containing boolean values with "OR" logic [duplicate]

This question already has answers here:
Boolean operators && and ||
(4 answers)
Closed 3 years ago.
I can't find the answer and the simple approaches I've tried haven't worked.
Basically, I have two corresponding dataframes with identical dimensions, full of boolean values.
I want "OR" logic, to produce a third corresponding dataframe with a TRUE anywhere either starting dataframes had TRUE.
df1 <- data.frame(a=c(T,T),
b=c(F,F))
df2 <- data.frame(a=c(F,T),
b=c(F,T))
Desired output:
a b
[1,] TRUE FALSE
[2,] TRUE TRUE
It works using the | operator:
df1 | df2
a b
[1,] TRUE FALSE
[2,] TRUE TRUE
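Note that the element-wise | on two data frames returns a logical matrix rather than a data frame. If a data frame is needed, a minimal sketch (the names either and res are chosen for illustration) could look like this:

```r
df1 <- data.frame(a = c(TRUE, TRUE), b = c(FALSE, FALSE))
df2 <- data.frame(a = c(FALSE, TRUE), b = c(FALSE, TRUE))

# Element-wise OR; the result is a logical matrix with columns a and b.
either <- df1 | df2

# Convert back to a data frame, keeping the column names.
res <- as.data.frame(either)
res
```

The scalar operator || is not suitable here, since it is meant for single TRUE/FALSE conditions rather than element-wise comparison.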

Subset a dataframe by multiple factor levels [duplicate]

This question already has answers here:
Select rows from a data frame based on values in a vector
(3 answers)
Closed 5 years ago.
How can I avoid using a loop to subset a dataframe based on multiple factor levels?
In the following example my desired output is a dataframe. The dataframe should contain the rows of the original dataframe where the value in "Code" equals one of the values in "selected".
Working example:
#sample data
Code<-c("A","B","C","D","C","D","A","A")
Value<-c(1, 2, 3, 4, 1, 2, 3, 4)
data<-data.frame(cbind(Code, Value))
selected<-c("A","B") #want rows where Code is A or B
#Begin subsetting
result<-data[which(data$Code==selected[1]),]
s1<-2
while(s1<length(selected)+1)
{
result<-rbind(result,data[which(data$Code==selected[s1]),])
s1<-s1+1
}
This is a toy example of a much larger dataset, so "selected" may contain a great number of elements and the data a great number of rows. Therefore I would like to avoid the loop.
You can use %in%
data[data$Code %in% selected,]
Code Value
1 A 1
2 B 2
7 A 3
8 A 4
Here's another:
data[data$Code == "A" | data$Code == "B", ]
It's also worth mentioning that the subsetting factor doesn't have to be part of the data frame if it matches the data frame rows in length and order. In this case we made our data frame from this factor anyway. So,
data[Code == "A" | Code == "B", ]
also works, which is one of the really useful things about R.
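Try this (match() returns positions within selected, not row numbers of data, so test whether a match was found instead of indexing with the result directly):
data[match(as.character(data$Code), selected, nomatch = 0) > 0, ]
  Code Value
1    A     1
2    B     2
7    A     3
8    A     4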
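To make the relationship between %in% and match() concrete, here is a small self-contained sketch. The variable names keep and pos are illustrative, and the data frame is built without cbind() so that Value stays numeric, which differs slightly from the question's setup.

```r
Code <- c("A", "B", "C", "D", "C", "D", "A", "A")
Value <- c(1, 2, 3, 4, 1, 2, 3, 4)
data <- data.frame(Code, Value, stringsAsFactors = FALSE)
selected <- c("A", "B")

# Logical vector: TRUE where Code is one of the selected values.
keep <- data$Code %in% selected

# Position of each Code within 'selected', NA where there is no match.
pos <- match(data$Code, selected)

# Logical subsetting keeps the original rows (and their row names).
result <- data[keep, ]
result
```

Here keep is the same as !is.na(pos), which is why a logical test on the match() result is safe for subsetting while the raw positions are not.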
