I have a large data frame marking occurrences of trigrams in strings: the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string.
Something like this:
strs <- c('this', 'that', 'chat', 'chin')
thi <- c(1, 0, 0, 0)
tha <- c(0, 1, 0, 0)
hin <- c(0, 0, 0, 1)
hat <- c(0, 1, 1, 0)
df <- data.frame(strs, thi, tha, hin, hat)
df
# strs thi tha hin hat
#1 this 1 0 0 0
#2 that 0 1 0 1
#3 chat 0 0 0 1
#4 chin 0 0 1 0
I want to get all of the columns/trigrams that have a 1 for a given row or a given string.
So for row 2, the string 'that', the result would be a data frame that looks like this:
strs tha hat
1 this 0 0
2 that 1 1
3 chat 0 1
4 chin 0 0
How could I do this?
This will give you the desired output df.
givenStr <- "that"
row <- df[df$strs==givenStr,]
df[,c(1,1+which(row[,-1]==1))]
In a one-liner:
df[c(TRUE, df[df$strs == 'that', -1] == 1)]  # keep strs (TRUE) plus every trigram column equal to 1 in the matching row
# strs tha hat
#1 this 0 0
#2 that 1 1
#3 chat 0 1
#4 chin 0 0
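The same selection can be wrapped in a small function. This is only a sketch: the name trigrams_for is made up for illustration, and it assumes each string appears exactly once in strs and that strs is the first column.

```r
strs <- c("this", "that", "chat", "chin")
df <- data.frame(strs,
                 thi = c(1, 0, 0, 0),
                 tha = c(0, 1, 0, 0),
                 hin = c(0, 0, 0, 1),
                 hat = c(0, 1, 1, 0))

# Hypothetical helper: keep strs plus the trigram columns set to 1 for string s.
trigrams_for <- function(df, s) {
  hit <- df$strs == s
  stopifnot(sum(hit) == 1)            # assumption: each string occurs exactly once
  keep <- unlist(df[hit, -1]) == 1    # named logical vector over the trigram columns
  df[, c("strs", names(keep)[keep]), drop = FALSE]
}

trigrams_for(df, "that")
```

Selecting by column name rather than by position avoids the off-by-one bookkeeping of the which() version.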
Related
I have a dataset comparing cases to categories of mental illness. In the dataset, it is reported as 0 for no mental illness, 1 for mood disorders, 2 for behavioral disorders, 3 for other, and 4 for disorder-like symptoms. I am trying to convert my dataset (mentallIllness) so that if you show any symptoms or have any disorder (i.e., you have a 1 to 4) it counts as a 1 (just yes, you have signs of or have a disorder), or 0 for no mental illness.
How can I go about that?
Thanks!
Suppose you have a vector with numbers from 0 to 4:
my_data <- c(0:4, 2, 3, 0)
my_data
#[1] 0 1 2 3 4 2 3 0
Here are a few ways to convert all the non-zeros to 1:
1*(my_data>0)
#[1] 0 1 1 1 1 1 1 0
as.numeric(my_data>0)
#[1] 0 1 1 1 1 1 1 0
In both of these cases, the term (my_data>0) tests whether each value in my_data is greater than 0. If so, the result is TRUE, otherwise FALSE. We can multiply TRUE/FALSE by 1, or convert to numeric, to change those to 1/0.
As Ben Bolker suggested, we could use ifelse to get the same results:
ifelse(my_data == 0, 0, 1)
#[1] 0 1 1 1 1 1 1 0
Your vector might live in a data frame, like:
my_df <- data.frame(my_data = c(0:4, 2, 3, 0))
We could use the same code to make a new variable, or overwrite the existing one:
my_df$recoded = ifelse(my_df$my_data == 0, 0, 1)
my_df
# my_data recoded
#1 0 0
#2 1 1
#3 2 1
#4 3 1
#5 4 1
#6 2 1
#7 3 1
#8 0 0
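If the column can contain missing values, the comparison approach keeps them missing instead of turning them into 0 or 1. A small sketch, assuming the codes are 0 to 4 or NA (the NA here is made up for illustration):

```r
my_data <- c(0, 3, NA, 1, 4)

# The comparison yields TRUE/FALSE/NA; as.integer() maps that to 1/0/NA.
recoded <- as.integer(my_data > 0)
recoded
#[1]  0  1 NA  1  1
```

Whether a missing code should stay NA or be counted as 0 depends on the data, so that choice is left open here.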
Does anyone have an idea how to generate a column in which exactly one randomly chosen row is marked with the number "1"? All others should be "0".
I need a function for this in R.
df <- data.frame(subject = 1, choice = 0, price75 = c(0,0,0,1,1,1,0,1))
This command updates the choice column so that a single random row has the value 1 each time it is called. All other values in the choice column are set to 0.
df$choice <- +(seq_along(df$choice) == sample(nrow(df), 1))
integer(nrow(DF)) creates a vector of zeros, and [<- then places a 1 at the position drawn by sample(nrow(DF), 1L).
DF <- data.frame(subject=1, choice="", price75=c(0,0,0,1,1,1,0,1))
DF$choice <- `[<-`(integer(nrow(DF)), sample(nrow(DF), 1L), 1L)
DF
# subject choice price75
#1 1 0 0
#2 1 0 0
#3 1 0 0
#4 1 1 1
#5 1 0 1
#6 1 0 1
#7 1 0 0
#8 1 0 1
> x <- rep(0, 10)
> x[sample(1:10, 1)] <- 1
> x
[1] 0 0 0 0 0 0 0 1 0 0
There are many ways to set a random value in a row/column in R. Here is one:
df<-data.frame(x=rep(0,10)) #make dataframe df, with column x, filled with 10 zeros.
set.seed(2022) #set a random seed - this is for repeatability
#two base methods for sampling:
#sample.int(n=10, size=1) # sample an integer from 1 to 10, sample size of 1
#sample(x=1:10, size=1) # sample from 1 to 10, sample size of 1
df$x[sample.int(n=10, size=1)] <- 1 # randomly selecting one of the ten rows, and replacing the value with 1
df
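The approaches above all do the same thing: start from zeros and overwrite one randomly drawn position. As a sketch, this can be packaged in a function; the name one_random_flag is hypothetical and the data frame is the one from the earlier answer.

```r
# Hypothetical helper: a 0/1 integer vector of length n with a single random 1.
one_random_flag <- function(n) {
  flag <- integer(n)                          # all zeros
  if (n > 0) flag[sample.int(n, 1)] <- 1L     # overwrite one random position
  flag
}

set.seed(2022)  # only for repeatability
df <- data.frame(subject = 1, price75 = c(0, 0, 0, 1, 1, 1, 0, 1))
df$choice <- one_random_flag(nrow(df))
sum(df$choice)  # exactly one row is flagged, whichever row was drawn
```

sample.int(n, 1) is used here because it always draws from 1 to n.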
In R, I'm trying to create a new column in a data frame with a for loop that references the previous row in the same column. I get an error message that reads "replacement has length zero."
I have tried using the "reduce" and "filter" functions.
df$STATUS <- 0
for(i in 1:nrow(df)) {
df$STATUS[i] <- ifelse(df$start[i]==1 | ((df$STATUS[i-1])==1 & df$stop[i]==0), 1, 0)
}
I expected this code to fill the STATUS column according to the if statement nested in the for loop. The STATUS column is intended to write a 1 when start =1, and remain 1 until stop = 1. Instead, I received the error message:
Error in
df$STATUS <- ifelse(df$start[i] == 1 | ((df$STATUS[i - :
replacement has length zero
As Yannis mentioned, assign a non-NA value to df$STATUS[1] and then start the loop from 2.
df <- data.frame(start = c(1, 1, 0, 0, 1, 1, 0, 0), stop = c(0,
1, 1, 0, 1, 0, 1, 0))
df$STATUS <- 0 #THIS GIVES ALL VALUES OF STATUS, INCLUDING ROW 1 THE VALUE 0
print(df)
# start stop STATUS
# 1 0 0
# 1 1 0
# 0 1 0
# 0 0 0
# 1 1 0
# 1 0 0
# 0 1 0
# 0 0 0
for(i in 2:nrow(df)) {
df$STATUS[i] <- ifelse(df$start[i]==1 | ((df$STATUS[i-1])==1 &
df$stop[i]==0), 1, 0)
}
print(df)
# start stop STATUS
# 1 0 0
# 1 1 1
# 0 1 0
# 0 0 0
# 1 1 1
# 1 0 1
# 0 1 0
# 0 0 0
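The original error comes from i = 1, where df$STATUS[i - 1] is df$STATUS[0], a zero-length vector, so ifelse() returns nothing to assign. In the output above, row 1 keeps STATUS 0 even though start is 1. If row 1 should follow the same rule, one option (an assumption about the intent, not part of the answer above) is to seed it from start[1] before looping:

```r
df <- data.frame(start = c(1, 1, 0, 0, 1, 1, 0, 0),
                 stop  = c(0, 1, 1, 0, 1, 0, 1, 0))

length(df$start[0])  # 0: indexing with 0 returns an empty vector

df$STATUS <- 0
df$STATUS[1] <- as.numeric(df$start[1] == 1)   # assumption: row 1 is on if it starts
for (i in seq_len(nrow(df))[-1]) {             # rows 2..n; empty if there is one row
  on <- df$start[i] == 1 || (df$STATUS[i - 1] == 1 && df$stop[i] == 0)
  df$STATUS[i] <- if (on) 1 else 0
}
df$STATUS
#[1] 1 1 0 0 1 1 0 0
```

Inside the loop each test is a single value, so plain if with || and && is enough and ifelse() is not needed.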
I'm trying to set up a column (called 'combined') to indicate the combined information of Owner and Head within each group (Group). There is at most 1 owner in each group, and 'Head' is basically the first row of each group, the one with the minimum ID value.
This combined column should flag '1' if the ID is flagged as owner, then the rest of the id within each group will be 0 regardless of the information in 'Head'. However for groups that do not have any Owner in the IDs (i.e. all 0 in owner within the group), then this column will take the Head column information. My data looks like this and the last column (combined) is the desired outcome.
sample <- data.frame(Group = c("46005589", "46005589","46005590","46005591", "46005591","46005592","46005592","46005592", "46005593", "46005594"), ID= c("189199", "2957073", "272448", "1872092", "10374996", "1153514", "2771118","10281300", "2610301", "3564526"), Owner = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1), Head = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1), combined = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1))
> sample
Group ID Owner Head combined
1 46005589 189199 0 1 0
2 46005589 2957073 1 0 1
3 46005590 272448 1 0 1
4 46005591 1872092 0 1 1
5 46005591 10374996 0 0 0
6 46005592 1153514 0 1 0
7 46005592 2771118 1 0 1
8 46005592 10281300 0 0 0
9 46005593 2610301 1 1 1
10 46005594 3564526 1 1 1
I've tried a few dplyr and ifelse clauses and it didn't seem to give outputs to what I wanted. How should I recode this column? Thanks.
I don't think this is the best way but you might look at visually inspecting IDs with all 0s. You could do this with rowSums and specify these IDs using %in%. Here is a possible solution:
library(dplyr)
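sample %>%
  select(-combined) %>%                          # drop the desired-outcome column before recomputing it
  mutate(across(c(ID, Group), as.factor)) %>%
  mutate(Combined = if_else(Owner == 1, 1, 0),
         NewCombi = ifelse(ID == "1872092", Head, Combined))
This yields the following, where NewCombi is the target column: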
# Group ID Owner Head Combined NewCombi
#1 46005589 189199 0 1 0 0
#2 46005589 2957073 1 0 1 1
#3 46005590 272448 1 0 1 1
#4 46005591 1872092 0 1 0 1
#5 46005591 10374996 0 0 0 0
#6 46005592 1153514 0 1 0 0
#7 46005592 2771118 1 0 1 1
#8 46005592 10281300 0 0 0 0
#9 46005593 2610301 1 1 1 1
#10 46005594 3564526 1 1 1 1
The new combined column can be created in two steps with dplyr. First, use filter(all(Owner == 0)) within each group to keep only the groups that have no owner, and create a column holding the 'Head' information for those rows. Then merge this column back into the original data frame and add it to the 'Owner' column to obtain the combined information.
library(dplyr)
sample2 <- sample %>%
group_by(Group) %>%
filter(all(Owner == 0)) %>%
mutate(Head_nullowner = ifelse(Head == 1, 1, 0)) #select all rows of IDs that do not have any owners
#merge Head_nullowner with the original dataframe by both Group and ID
sample <- merge(sample, sample2[c("Group", "ID", "Head_nullowner")], by.x = c("Group", "ID"), by.y = c("Group", "ID"), all.x = T)
sample$Head_nullowner[is.na(sample$Head_nullowner)] <- 0
sample$OwnerHead_combined = sample$Owner + sample$Head_nullowner
> sample
Group ID Owner Head combined Head_nullowner OwnerHead_combined
1 46005589 189199 0 1 0 0 0
2 46005589 2957073 1 0 1 0 1
3 46005590 272448 1 0 1 0 1
4 46005591 10374996 0 0 0 0 0
5 46005591 1872092 0 1 1 1 1
6 46005592 10281300 0 0 0 0 0
7 46005592 1153514 0 1 0 0 0
8 46005592 2771118 1 0 1 0 1
9 46005593 2610301 1 1 1 0 1
10 46005594 3564526 1 1 1 0 1
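The same rule (use Owner when the group has an owner, otherwise fall back to Head) can also be written without a merge. The following is a base R sketch, not the method from the answers above; the data frame is named dat here and the new column combined2 is a made-up name.

```r
dat <- data.frame(
  Group = c("46005589", "46005589", "46005590", "46005591", "46005591",
            "46005592", "46005592", "46005592", "46005593", "46005594"),
  ID    = c("189199", "2957073", "272448", "1872092", "10374996",
            "1153514", "2771118", "10281300", "2610301", "3564526"),
  Owner = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
  Head  = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1),
  combined = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1)   # desired outcome from the question
)

# ave() repeats the per-group maximum of Owner on every row of that group,
# so has_owner is TRUE for all rows of a group that contains an owner.
has_owner <- ave(dat$Owner, dat$Group, FUN = max) == 1

# Groups with an owner take Owner; groups without one take Head.
dat$combined2 <- ifelse(has_owner, dat$Owner, dat$Head)

all(dat$combined2 == dat$combined)
#[1] TRUE
```

This keeps the original row order, whereas merge() sorts the rows by the key columns.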
I’m working in R and am trying to find a way to refer to the previous cell within a vector when that vector belongs to a data frame. By previous cell, I’m essentially hoping for a “lag” command of some sort so that I can compare one cell to the cell previous. As an example, I have these data:
A <- c(1,0,0,0,1,0,0)
B <- c(1,1,1,1,1,0,0)
AB_df <- cbind (A,B)
What I want is for a given cell in a given row, if that cell’s value is less than the previous cell’s value for the same column vector, to return a value of 1 and if not to return a value of 0. For this example, the new columns would be called “A-flag” and “B-flag” below.
A B A-flag B-flag
1 1 0 0
0 1 1 0
0 1 0 0
0 1 0 0
1 1 0 0
0 0 1 1
0 0 0 0
Any suggestions for syntax that can do this? Ideally, to just create a new column variable into an existing data-frame.
Here is one solution using the dplyr package and its lag function:
library(dplyr)
AB_df <- data.frame(A = A, B = B)
AB_df %>% mutate(A.flag = ifelse(A < lag(A, default = 0), 1, 0),
B.flag = ifelse(B < lag(B, default = 0), 1, 0))
A B A.flag B.flag
1 1 1 0 0
2 0 1 1 0
3 0 1 0 0
4 0 1 0 0
5 1 1 0 0
6 0 0 1 1
7 0 0 0 0
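A base R alternative, sketched under the assumption that the first row should never be flagged (it has no previous value): diff() gives each value minus the one before it, so a negative difference marks a drop. The helper name drop_flag is hypothetical.

```r
A <- c(1, 0, 0, 0, 1, 0, 0)
B <- c(1, 1, 1, 1, 1, 0, 0)
AB_df <- data.frame(A, B)

# Hypothetical helper: 1 where a value is smaller than the previous one, else 0.
# The leading FALSE stands in for the first row, which has nothing before it.
drop_flag <- function(x) as.integer(c(FALSE, diff(x) < 0))

AB_df$A.flag <- drop_flag(AB_df$A)
AB_df$B.flag <- drop_flag(AB_df$B)
AB_df
```

Unlike the lag() version, this does not depend on choosing a default value for the first row.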