Specifying column name when one column is selected using grep in R

I am having an issue with the grep function. Specifically, when I tell R to get all the columns that start with a certain letter using the function, and there is only one such column, all that is yielded is the data with the code as the column name like this:
> head(newdat1)
i1 b2 b1 b17
1 1 1 2 0
2 1 1 2 0
3 1 1 2 0
4 1 1 2 0
5 2 1 1 0
6 3 1 1 1
datformeanfill<-as.data.frame(newdat1[,grep("^i", colnames(newdat1))])
> head(datformeanfill)
newdat1[, grep("^i", colnames(newdat1))]
1 1
2 1
3 1
4 1
5 2
6 3
As opposed to if I have two or more columns that start with the same letter:
datnotformeanfill<-as.data.frame(newdat1[,grep("^b", colnames(newdat1))])
> head(datnotformeanfill)
b2 b1 b17
1 1 2 1
2 1 2 1
3 1 2 1
4 1 2 1
5 1 1 1
6 1 1 2
Where we see the column names are maintained, and it does the same if I have multiple "i". Please help thanks!

Use
datformeanfill <- newdat1[,grep("^i", colnames(newdat1)), drop=FALSE]
to ensure you always get back a data.frame. See ?'[.data.frame' for the details.
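A minimal sketch of the difference, assuming a small data frame shaped like the one in the question (the values and the object names `dropped`, `kept` and `kept_list` are only illustrative). With a single matching column, `[` with a comma simplifies the result to a plain vector unless drop=FALSE is given:

```r
# Illustrative data shaped like newdat1 in the question
newdat1 <- data.frame(i1  = c(1, 1, 1, 1, 2, 3),
                      b2  = 1,
                      b1  = c(2, 2, 2, 2, 1, 1),
                      b17 = c(0, 0, 0, 0, 0, 1))

idx <- grep("^i", colnames(newdat1))

# One matching column: the result is simplified to a vector, so the name is lost
dropped <- newdat1[, idx]

# drop = FALSE keeps a one-column data.frame with its column name
kept <- newdat1[, idx, drop = FALSE]

# List-style indexing (no comma) also returns a data.frame
kept_list <- newdat1[idx]

names(kept)
```

Wrapping the simplified vector in as.data.frame() is what produced the odd column name in the question, because the name is then taken from the expression itself.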

Related

Putting back a missing column from a data.frame into a list of data.frames

My LIST of data.frames below is made from my data. However, this LIST is missing the scale column which is available in the original data.
I was wondering how to put back the missing scale column into LIST to achieve my DESIRED_LIST?
Reproducible data and code are below.
m3="
scale study outcome time ES bar
2 1 1 0 1 8
2 1 2 0 2 7
1 2 1 0 3 6
1 2 1 1 4 5
2 3 1 0 5 4
2 3 1 1 6 3
1 4 1 0 7 2
1 4 2 0 8 1"
data <- read.table(text = m3, h=T)
LIST <- list(data.frame(study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
DESIRED_LIST <- list(data.frame(scale=c(2,2) ,study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(scale=c(2,2) ,study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(scale=c(1,1,1,1),study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
In base R, you could do:
lapply(LIST, \(x)merge(x, data)[names(data)])
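Note that indexing by names(data) also keeps the ES and bar columns, whereas DESIRED_LIST shows only scale plus the original columns. A possible variant, assuming that layout is what is wanted (the name `RESULT` is just for illustration), selects scale followed by each element's own columns:

```r
data <- read.table(text = "
scale study outcome time ES bar
2 1 1 0 1 8
2 1 2 0 2 7
1 2 1 0 3 6
1 2 1 1 4 5
2 3 1 0 5 4
2 3 1 1 6 3
1 4 1 0 7 2
1 4 2 0 8 1", header = TRUE)

LIST <- list(data.frame(study = c(3, 3), outcome = c(1, 1), time = 0:1),
             data.frame(study = c(1, 1), outcome = c(1, 2), time = c(0, 0)),
             data.frame(study = c(2, 2, 4, 4), outcome = c(1, 1, 1, 2),
                        time = c(0, 1, 0, 0)))

# merge() joins on the shared columns (study, outcome, time);
# afterwards keep only scale plus the columns each element already had
RESULT <- lapply(LIST, function(x) merge(x, data)[c("scale", names(x))])
```

merge() sorts the rows by the join columns, so this relies on the row order within each element not mattering, or already being sorted as it is here.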

How to triplicate and rearrange columns [duplicate]

Is there any efficient way, without using for loops, to duplicate the columns in a data frame? For example, if I have the following data frame:
Var1 Var2
1 1 0
2 2 0
3 1 1
4 2 1
5 1 2
6 2 2
And I specify that column Var1 should be repeated twice, and column Var2 three times, then I would like to get the following:
Var1 Var1 Var2 Var2 Var2
1 1 1 0 0 0
2 2 2 0 0 0
3 1 1 1 1 1
4 2 2 1 1 1
5 1 1 2 2 2
6 2 2 2 2 2
Any help would be greatly appreciated!
We can replicate the column names with rep and use the result as an index to duplicate the columns. Because data.frame indexing makes duplicated column names unique, the repeated names in 'df2' get suffixes such as .1 and .2. If we don't want that, we can strip the suffix with sub.
df2 <- df1[rep(names(df1), c(2,3))]
names(df2) <- sub('\\..*', '', names(df2))
df2
# Var1 Var1 Var2 Var2 Var2
#1 1 1 0 0 0
#2 2 2 0 0 0
#3 1 1 1 1 1
#4 2 2 1 1 1
#5 1 1 2 2 2
#6 2 2 2 2 2
Or as @Frank mentioned in the comments, we can also do
`[.noquote`(df1,c(1,1,2,2,2))
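For reference, here is a self-contained sketch of the rep-based approach, assuming df1 holds the data from the question (the `times` vector is a name introduced here for illustration):

```r
# Data from the question
df1 <- data.frame(Var1 = c(1, 2, 1, 2, 1, 2),
                  Var2 = c(0, 0, 1, 1, 2, 2))

# How many copies of each column to keep, in column order
times <- c(2, 3)

# Index by the repeated names; duplicated names get .1, .2 suffixes
df2 <- df1[rep(names(df1), times)]

# Strip everything from the first dot onward to restore the plain names
names(df2) <- sub("\\..*", "", names(df2))
```

The sub() pattern removes everything after the first dot, so this sketch assumes the original column names contain no dots of their own.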

Add a column that divides another column into n chunks, R

There's no easy way to describe my question, which is probably why I was not able to find an answer through search.
So I have a data frame with 3 columns, one of the columns is Subject number, the other two columns are Correctness and Block. There are 2 participants, each was exposed to 2 blocks of 3 stimuli in each block.
subj corr block
1 1 1 1
2 1 0 1
3 1 1 1
4 1 1 2
5 1 1 2
6 1 1 2
7 2 0 1
8 2 1 1
9 2 1 1
10 2 0 2
11 2 1 2
12 2 1 2
So what I want to do is create another column that looks at a specific subj number and divides the rows corresponding to that subj into 3 even chunks (the original df has 2 chunks). In general, I want to know how to divide the stimuli each subj is exposed to into N chunks and put the chunk number into another column.
subj corr block newblock
1 1 1 1 1
2 1 0 1 1
3 1 1 1 2
4 1 1 2 2
5 1 1 2 3
6 1 1 2 3
7 2 0 1 1
8 2 1 1 1
9 2 1 1 2
10 2 0 2 2
11 2 1 2 3
12 2 1 2 3
Something like this:
library(dplyr)
n_chunks = 3
df %>%
group_by(subj) %>%
mutate(newblock = rep(1:n_chunks, each = ceiling(n() / n_chunks))[1:n()])
How much of this is necessary depends on your use case. If you can guarantee that n_chunks evenly divides the number of observations for each subject you can simplify to:
df %>%
group_by(subj) %>%
mutate(newblock = rep(1:n_chunks, each = n() / n_chunks))
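If dplyr is not available, the same idea can be sketched in base R with ave(), which applies a function within each group. This swaps in a different tool from the answer above, and it assumes the rows of each subject are already in presentation order:

```r
# Data from the question
df <- data.frame(subj  = rep(1:2, each = 6),
                 corr  = c(1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1),
                 block = rep(rep(1:2, each = 3), times = 2))

n_chunks <- 3

# Within each subject, label consecutive rows 1, 1, 2, 2, 3, 3, ...
df$newblock <- ave(seq_along(df$subj), df$subj, FUN = function(i) {
  chunk_size <- ceiling(length(i) / n_chunks)
  rep(seq_len(n_chunks), each = chunk_size)[seq_along(i)]
})
```

As with the dplyr version, when the group size is not a multiple of n_chunks the later chunks come out shorter, and with very small groups some chunk numbers may not appear at all.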

Generating large drawing lists in R

Say I have a list in R like so,
[1] 3 5 4 7
And I want to generate all "drawings" from this list, from 1 up to the value of each number. For example,
1 1 1 1
1 1 1 2
1 1 1 3
...
2 3 3 1
2 3 3 2
2 3 3 3
...
3 5 4 7
I know I have used rep() in the past to do something very similar, which works for lists of 2 or 3 numbers (i.e. something like 1 4 5), but I'm not sure how to generalize this here.
Thoughts?
As suggested in the comments, use the Map function to apply seq to each element of your vector, then use expand.grid to generate a data.frame with the Cartesian product of the resulting sequences:
head(expand.grid(Map(seq,c(3,5,4,7))))
Var1 Var2 Var3 Var4
1 1 1 1 1
2 2 1 1 1
3 3 1 1 1
4 1 2 1 1
5 2 2 1 1
6 3 2 1 1
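A quick sanity check on the size of the result, as a sketch: the number of rows should equal the product of the upper limits. Here lapply with seq_len stands in for Map with seq (an equivalent choice for positive integers), and `limits` and `draws` are illustrative names:

```r
limits <- c(3, 5, 4, 7)

# One sequence 1..k per limit, then every combination of them
draws <- expand.grid(lapply(limits, seq_len))

nrow(draws)          # number of drawings
prod(limits)         # should match the row count
draws[nrow(draws), ] # last row holds the upper limits themselves
```

Since the row count grows as the product of the limits, a long vector of large limits can produce a data frame too big to hold in memory.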

Add column in R based on comparison with another column

I have a beginner R question.
I want to add a column "d" that has a value of 1 if the corresponding row in "c" is >4, and 0 otherwise. I think that if I can do this basic thing I can extend the logic to my other questions. Basically, I can't figure out how to do basic comparisons between entries in a given row.
Here is a sample set of code:
# initial data
a=c(0,1,1)
b=c(1,2,3)
c=c(4,5,6)
data=data.frame(a,b,c)
Any help would be appreciated. Thanks!
One way:
> data
a b c
1 0 1 4
2 1 2 5
3 1 3 6
> data$d=ifelse(data$c>4,1,0)
> data
a b c d
1 0 1 4 0
2 1 2 5 1
3 1 3 6 1
Another common way is to rely on the fact that TRUE/FALSE become 1/0 when converted to numeric:
> data$d2=as.numeric(data$c>4)
> data
a b c d d2
1 0 1 4 0 0
2 1 2 5 1 1
3 1 3 6 1 1
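The same vectorized style extends to comparisons between columns within a row, which is what the question asks about more generally. A small sketch, where the column name `b_is_a_plus_1` is made up for illustration:

```r
# Data from the question
data <- data.frame(a = c(0, 1, 1),
                   b = c(1, 2, 3),
                   c = c(4, 5, 6))

# Comparison against a constant, stored as 0/1
data$d <- as.integer(data$c > 4)

# Comparison between two columns, evaluated row by row
data$b_is_a_plus_1 <- as.integer(data$b == data$a + 1)

data
```

Comparison operators in R work elementwise on whole columns, so no explicit loop over rows is needed.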
