Add column in R based on comparison with another column - r

I have a beginner R question.
I want to add a column "d" that has a value of 1 if the corresponding row in "c" is >4, and 0 otherwise. I think that if I can do this basic thing I can extend the logic to my other questions. Basically, I can't figure out how to do basic comparisons between entries in a given row.
Here is a sample set of code:
# initial data
a=c(0,1,1)
b=c(1,2,3)
c=c(4,5,6)
data=data.frame(a,b,c)
Any help would be appreciated. Thanks!

One way:
> data
  a b c
1 0 1 4
2 1 2 5
3 1 3 6
> data$d=ifelse(data$c>4,1,0)
> data
  a b c d
1 0 1 4 0
2 1 2 5 1
3 1 3 6 1
Another common way is to rely on the fact that TRUE/FALSE convert to 1/0 when converted to numeric:
> data$d2=as.numeric(data$c>4)
> data
  a b c d d2
1 0 1 4 0  0
2 1 2 5 1  1
3 1 3 6 1  1
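The same idea extends to comparisons between entries in the same row, which is what the question ultimately asks about. The following is a minimal sketch using the question's sample data; the column names "e" and "f" and the conditions themselves are made up for illustration.

```r
# Sample data from the question
data <- data.frame(a = c(0, 1, 1), b = c(1, 2, 3), c = c(4, 5, 6))

# Compare one column against an expression built from another column,
# row by row (hypothetical column "e")
data$e <- as.integer(data$c > 2 * data$b)

# Combine several conditions with & (and) or | (or) (hypothetical column "f")
data$f <- as.integer(data$a == 1 & data$c > 4)

data
```

Because comparison operators in R are vectorized, no explicit loop over rows is needed.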

Related

Putting back a missing column from a data.frame into a list of data.frames

My LIST of data.frames below is made from my data. However, this LIST is missing the scale column which is available in the original data.
I was wondering how to put the missing scale column back into LIST to achieve my DESIRED_LIST?
Reproducible data and code are below.
m3="
scale study outcome time ES bar
2 1 1 0 1 8
2 1 2 0 2 7
1 2 1 0 3 6
1 2 1 1 4 5
2 3 1 0 5 4
2 3 1 1 6 3
1 4 1 0 7 2
1 4 2 0 8 1"
data <- read.table(text = m3, h=T)
LIST <- list(data.frame(study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
DESIRED_LIST <- list(data.frame(scale=c(2,2) ,study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(scale=c(2,2) ,study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(scale=c(1,1,1,1),study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
In base R, you could do:
lapply(LIST, \(x) merge(x, data)[names(data)])
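Note that indexing with names(data) also brings along the ES and bar columns, which DESIRED_LIST does not contain. A possible variant, assuming only scale should be added in front of each element's own columns, is sketched below. The name `res` is hypothetical, and `function(x)` is used instead of `\(x)` so the code also runs on R versions before 4.1.

```r
m3 <- "
scale study outcome time ES bar
2 1 1 0 1 8
2 1 2 0 2 7
1 2 1 0 3 6
1 2 1 1 4 5
2 3 1 0 5 4
2 3 1 1 6 3
1 4 1 0 7 2
1 4 2 0 8 1"
data <- read.table(text = m3, header = TRUE)

LIST <- list(data.frame(study = c(3, 3), outcome = c(1, 1), time = 0:1),
             data.frame(study = c(1, 1), outcome = c(1, 2), time = c(0, 0)),
             data.frame(study = c(2, 2, 4, 4), outcome = c(1, 1, 1, 2),
                        time = c(0, 1, 0, 0)))

# merge() joins on the columns the two data frames share
# (study, outcome, time); afterwards keep scale plus the original columns.
res <- lapply(LIST, function(x) merge(x, data)[c("scale", names(x))])
res
```

merge() sorts the result by the join columns, so the row order can differ from the input if an element of LIST is not already sorted.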

detect missings (NA or 0) in data frame

I want to create a new variable in a data frame that contains information about the other variables.
I have got a large data frame. To keep it short let's say:
a <- c(1,0,2,3)
b <- c(3,0,1,1)
c <- c(2,0,2,2)
d <- c(4,1,1,1)
(df <- data.frame(a,b,c,d) )
  a b c d
1 1 3 2 4
2 0 0 0 1
3 2 1 2 1
4 3 1 2 1
Aim: Create a new variable that tells me whether a person (row) has zero reports (or missing values / NA) either in the variables a+b or in the variables c+d.
  a b c d  x
1 1 3 2 4  1
2 0 0 0 1 NA
3 2 1 2 1  1
4 3 1 2 1  1
As I have a large data frame, I was thinking about using df[1:2] and df[3:4] so that I do not need to type every variable name. But I am not sure which is the best way to implement it. Maybe dplyr has a nice option?
df$x <- ifelse(rowSums(df), 1, NA)
EDIT: Answer to the updated question:
df$x <- ifelse(rowSums(df[1:2])&rowSums(df[3:4]), 1, NA)
gives,
  a b c d  x
1 1 3 2 4  1
2 0 0 0 1 NA
3 2 1 2 1  1
4 3 1 2 1  1

How to create a table that shows the frequency of all dummy variables in R

I am a rookie in R.
I want to create a frequency table of all dummy variables and I have a data like this
ID Dummy_2008 Dummy_2009 Dummy_2010 Dummy_2011 Dummy_2012 Dummy_2013
1 1 1 0 0 1 1
2 0 0 1 1 0 1
3 0 0 1 0 0 1
4 0 1 1 0 0 1
5 0 0 0 0 1 0
6 0 0 0 1 0 0
I want to see the total frequency of each value in each variable, like this:
           0 1 sum
Dummy_2008 5 1   6
Dummy_2009 4 2   6
Dummy_2010 3 3   6
Dummy_2011 4 2   6
Dummy_2012 4 2   6
Dummy_2013 2 4   6
I only know how to use table(), but that handles only one variable at a time.
I have many time-series dummy variables, and I want to see their trend.
Many thanks for the help
Terence
Here is another option using mtabulate and addmargins
library(qdapTools)
addmargins(as.matrix(mtabulate(df1[-1])),2)
#           0 1 Sum
#Dummy_2008 5 1   6
#Dummy_2009 4 2   6
#Dummy_2010 3 3   6
#Dummy_2011 4 2   6
#Dummy_2012 4 2   6
#Dummy_2013 2 4   6
result = as.data.frame(t(sapply(dat[,-1], table)))
result$Sum = rowSums(result)
           0 1 Sum
Dummy_2008 5 1   6
Dummy_2009 4 2   6
Dummy_2010 3 3   6
Dummy_2011 4 2   6
Dummy_2012 4 2   6
Dummy_2013 2 4   6
Explanation:
sapply applies a function to each column of a data frame and returns a matrix. So sapply(dat[,-1], table) returns a matrix with the output of table for each column (except the first column, which we've excluded).
The matrix needs to be transposed so that the column names from the original data frame are the rows and the dummy values are the columns, so we use the t (transpose) function for that.
We want a data frame, not a matrix, so we wrap the whole thing in as.data.frame.
Next, we want another column giving the total number of values, so we use the rowSums function.
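One caveat with the sapply approach: it only returns a matrix when every column contains both values. If some dummy is all 0 (or all 1), table() returns a single count for that column and sapply falls back to a list. A possible safeguard, sketched here, is to fix the levels with factor() before tabulating. The data uses three of the question's columns plus a made-up all-zero column, `Dummy_2014`.

```r
dat <- data.frame(ID = 1:6,
                  Dummy_2008 = c(1, 0, 0, 0, 0, 0),
                  Dummy_2010 = c(0, 1, 1, 1, 0, 0),
                  Dummy_2013 = c(1, 1, 1, 1, 0, 0),
                  Dummy_2014 = c(0, 0, 0, 0, 0, 0))  # hypothetical all-zero dummy

# Forcing levels 0 and 1 guarantees two counts per column,
# so sapply() always simplifies to a matrix.
counts <- t(sapply(dat[-1], function(v) table(factor(v, levels = 0:1))))

result <- as.data.frame(counts)
result$Sum <- rowSums(result)
result
```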

How to randomly choose only one row in each group [duplicate]

Say I have a dataframe as follows:
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
Combo = c(1,2,3,1,2,1,1,2,3,4))
> df
   Region Combo
1       A     1
2       A     2
3       A     3
4       B     1
5       B     2
6       C     1
7       D     1
8       D     2
9       D     3
10      D     4
What I would like to do, is for each Region (A,B,C,D) randomly choose only one of the possible combos for that region.
If the chosen combination were indicated by a binary variable, it might look something like this:
   Region Combo RandomlyChosen
1       A     1              1
2       A     2              0
3       A     3              0
4       B     1              0
5       B     2              1
6       C     1              1
7       D     1              0
8       D     2              0
9       D     3              1
10      D     4              0
I'm aware of the sample function, but just don't know how to choose only one combo within each region.
I regularly use data.table, so any solutions using that are welcome. Though solutions not using data.table are equally welcome.
Thanks!
In plain R you can use sample() within tapply():
df$Chosen <- 0
df[-tapply(-seq_along(df$Region),df$Region, sample, size=1),]$Chosen <- 1
df
   Region Combo Chosen
1       A     1      0
2       A     2      1
3       A     3      0
4       B     1      1
5       B     2      0
6       C     1      1
7       D     1      0
8       D     2      0
9       D     3      1
10      D     4      0
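Note the double negation: sample() is given negative row numbers, and the result is negated back. This avoids the special case in sample() where a single positive number n is treated as 1:n, which would otherwise pick a wrong row whenever a group has only one row.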
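A variant that avoids the sign trick is to sample a position within each group's row numbers using sample.int(), which always draws from 1 to n. This is only a sketch; the helper name `pick_one` and the column name `Chosen` are illustrative, and the seed is set only to make a run repeatable.

```r
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
                 Combo = c(1,2,3,1,2,1,1,2,3,4))

set.seed(42)  # only for reproducibility of this example

# Pick one element of idx at random; safe even when idx has length 1,
# because sample.int() samples positions, not values.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

# One randomly chosen row number per Region
chosen_rows <- tapply(seq_len(nrow(df)), df$Region, pick_one)

# Flag the chosen rows with 1, all others with 0
df$Chosen <- as.integer(seq_len(nrow(df)) %in% chosen_rows)
df
```

Every Region ends up with exactly one row flagged, and Region C, which has a single row, always has that row chosen.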

specifying column name when one column is selected using grep in r

I am having an issue with the grep function. When I tell R to get all the columns that start with a certain letter and there is only one such column, the result is the data with the code itself as the column name, like this:
> head(newdat1)
  i1 b2 b1 b17
1  1  1  2   0
2  1  1  2   0
3  1  1  2   0
4  1  1  2   0
5  2  1  1   0
6  3  1  1   1
datformeanfill<-as.data.frame(newdat1[,grep("^i", colnames(newdat1))])
> head(datformeanfill)
  newdat1[, grep("^i", colnames(newdat1))]
1                                        1
2                                        1
3                                        1
4                                        1
5                                        2
6                                        3
As opposed to if I have two or more columns that start with the same letter:
datnotformeanfill<-as.data.frame(newdat1[,grep("^b", colnames(newdat1))])
> head(datnotformeanfill)
  b2 b1 b17
1  1  2   1
2  1  2   1
3  1  2   1
4  1  2   1
5  1  1   1
6  1  1   2
Here the column names are preserved, and the same happens if I have multiple columns starting with "i". Please help, thanks!
Use
datformeanfill <- newdat1[,grep("^i", colnames(newdat1)), drop=FALSE]
to ensure you always get back a data.frame. See ?'[.data.frame' for the details.
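To see the difference, here is a small sketch on a cut-down version of the question's data. The variable names `as_vector`, `as_df`, and `as_df2` are made up for illustration.

```r
newdat1 <- data.frame(i1 = c(1, 1, 2), b2 = c(1, 1, 1), b1 = c(2, 2, 1))

cols <- grep("^i", colnames(newdat1))

# Default: a single selected column is dropped to a plain vector,
# so the column name is lost.
as_vector <- newdat1[, cols]

# drop = FALSE keeps the data.frame structure and the column name.
as_df <- newdat1[, cols, drop = FALSE]

# List-style indexing (no comma) also returns a data.frame.
as_df2 <- newdat1[cols]

is.data.frame(as_vector)
names(as_df)
names(as_df2)
```

Wrapping the dropped vector in as.data.frame(), as in the question, rebuilds a data frame but has no column name to reuse, which is why the whole expression ends up as the name.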
