Create a list with one named column in R - r

Okay, I'm stupid, but:
How can I create a list with one column, 10 rows and a column name, and the same numeric value in all fields? I know how to append it, e.g.
mylist["column_name"] <- rep(1, nrow(mylist))
but not how to create it on its own.
It should look like this:
> mylist
column_name
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1

Are you sure you want a list and not a data frame (as that is what your example looks like)? You can get it like this:
data.frame(column_name=rep(1,10))
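To make the list/data frame distinction concrete, here is a minimal sketch that builds both objects. The name mylist comes from the question; df is just an illustrative name.

```r
# A data frame: one named column, 10 rows, the same value in every row
df <- data.frame(column_name = rep(1, 10))

# A plain list: one named element holding a length-10 numeric vector
mylist <- list(column_name = rep(1, 10))

str(df)      # prints as rows and columns
str(mylist)  # prints as a list with one element

is.data.frame(df)
is.data.frame(mylist)
```

Only the data frame prints with row numbers the way the question shows, which is why a data frame is probably what is wanted here.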

Related

For loop to paste rows to create new dataframe from existing dataframe

New to SO, but can't figure out how to get this code to work. I have a dataframe that is very large, and is set up like this:
Number Year Type Amount
1 1 A 5
1 2 A 2
1 3 A 7
1 4 A 1
1 1 B 5
1 2 B 11
1 3 B 0
1 4 B 2
This goes on for multiple numbers. I want to take this dataframe and make a new dataframe that puts two of the rows side by side, covering every pair of rows once (for example, row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4), so that each combination of years appears together within each type and number.
Example output:
Number Year Type Amount   Number Year Type Amount
1      1    A    5        1      2    A    2
1      1    A    5        1      3    A    7
1      1    A    5        1      4    A    1
1      2    A    2        1      3    A    7
1      2    A    2        1      4    A    1
1      3    A    7        1      4    A    1
I thought that I would do a for loop to loop within number and type, but I do not know how to make the rows paste from there, or how to ensure that I am only getting the combinations of the rows once. For example:
for(i in 1:n_number){
for(j in 1:n_type){
....}}
Any tips would be appreciated! I am relatively new to coding, so I don't know if I should be using a for loop at all. Thank you!
df <- data.frame(Number= rep(1,8),
Year = rep(c(1:4),2),
Type = rep(c('A','B'),each=4),
Amount=c(5,2,7,1,5,11,0,2))
My interpretation is that you want to create a dataframe with all row combinations, where Number and Type are the same and Year is different.
First suggestion - join on Number and Type, then keep only the rows where Year differs. I added an index to prevent redundant matches (1 with 2 and 2 with 1).
df$index <- 1:nrow(df)
out <- merge(df,df,by=c("Number","Type"))
# keep each pair once, with the earlier row on the left as in the example output
out <- out[which(out$index.x<out$index.y & out$Year.x!=out$Year.y),]
Second suggestion - if you want to see a version using a loop.
out2 <- NULL
for (i in seq_len(nrow(df)-1)){
  for (j in (i+1):nrow(df)){
    # && compares single values and stops at the first FALSE
    if(df[i,"Year"]!=df[j,"Year"] && df[i,"Number"]==df[j,"Number"] && df[i,"Type"]==df[j,"Type"]){
      # growing out2 with rbind is slow on large data
      out2 <- rbind(out2,cbind(df[i,],df[j,]))
    }
  }
}
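A third option, sketched here on the same toy data, is to split by Number and Type and let combn() enumerate each pair of rows exactly once. The helper pair_rows and the .x/.y suffixes are illustrative choices, not part of the answers above.

```r
df <- data.frame(Number = rep(1, 8),
                 Year   = rep(1:4, 2),
                 Type   = rep(c("A", "B"), each = 4),
                 Amount = c(5, 2, 7, 1, 5, 11, 0, 2))

# Hypothetical helper: pair every row of one group with every later row
pair_rows <- function(g) {
  idx <- combn(nrow(g), 2)               # 2 x k matrix of row-index pairs
  first  <- g[idx[1, ], , drop = FALSE]
  second <- g[idx[2, ], , drop = FALSE]
  names(first)  <- paste0(names(g), ".x")
  names(second) <- paste0(names(g), ".y")
  rownames(first)  <- NULL
  rownames(second) <- NULL
  cbind(first, second)
}

# One group per Number/Type combination; groups with a single row have no pairs
groups <- split(df, list(df$Number, df$Type), drop = TRUE)
groups <- Filter(function(g) nrow(g) >= 2, groups)

out3 <- do.call(rbind, lapply(groups, pair_rows))
rownames(out3) <- NULL
out3
```

This assumes each Year appears at most once per Number/Type group. If years can repeat, add a filter on Year.x != Year.y as in the merge version.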

How to remove columns of data from a data frame using a vector with a regular expression

I am trying to remove columns from a dataframe using a vector of numbers, with those numbers being just a part of the whole column header. What I'm looking to use is something like the wildcard "*" in unix, so that I can say that I want to remove columns with labels xxxx, xxkx, etc... To illustrate what I mean, if I have the following data:
data_test_read <- read.table("batch_1_8c9.structure-edit.tsv",sep="\t", header=TRUE)
data_test_read[1:5,1:5]
samp pop X12706_10 X14223_16 X14481_7
1 BayOfIslands_s088.fq 1 4 1 3
2 BayOfIslands_s088.fq 1 4 1 3
3 BayOfIslands_s089.fq 1 4 1 3
4 BayOfIslands_s089.fq 1 4 3 3
5 BayOfIslands_s090.fq 1 4 1 3
And I want to take out, for example, columns with headers (X12706_10, X14481_7), the following works
data_subs1=subset(data_test_read, select = -c(X12706_10, X14481_7))
data_subs1[1:4,1:4]
samp pop X14223_16 X15213_19
1 BayOfIslands_s088.fq 1 1 3
2 BayOfIslands_s088.fq 1 1 3
3 BayOfIslands_s089.fq 1 1 3
4 BayOfIslands_s089.fq 1 3 3
However, what I need is to be able to identify these columns by only the numbers, so, using (12706,14481). But, if I try this, I get the following
data_subs2=subset(data_test_read, select = -c(12706,14481))
data_subs2[1:4,1:4]
samp pop X12706_10 X14223_16
1 BayOfIslands_s088.fq 1 4 1
2 BayOfIslands_s088.fq 1 4 1
3 BayOfIslands_s089.fq 1 4 1
4 BayOfIslands_s089.fq 1 4 3
This is clearly because I haven't specified anything to do with the "x", or the "_" or what is after the underscore. I've read so many answers on using regular expressions, and I just can't seem to sort it out. Any thoughts, or pointers to what I might turn to would be appreciated.
First you can just extract the numbers from the headers
# for testing
col_names <- c("X12706_10","X14223_16","X14481_7")
# in practice, use
# col_names <- names(data_test_read)
samples <- gsub("X(\\d+)_.*","\\1",col_names)
Then find the indexes of the samples you want to drop.
samples_to_drop <- c(12706, 14481)
cols_to_drop <- match(samples_to_drop, samples)
Then you can use
data_subs2 <- subset(data_test_read, select = -cols_to_drop)
to actually get rid of those columns.
Perhaps put this all in a function to make it easier to use
sample_subset <- function(x, drop) {
samples <- gsub("X(\\d+)_.*","\\1", names(x))
subset(x, select = -match(drop, samples))
}
sample_subset(data_test_read, c(12706, 14481))
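Note that match() returns only the first matching column for each number, and yields NA when a number is not found. As a hedged variation, the sketch below uses %in% with logical indexing, which drops every column sharing a number and ignores numbers that are absent. The helper drop_by_number and the toy data frame are made up for illustration.

```r
# Hypothetical helper: drop all columns whose header contains one of the ids
drop_by_number <- function(x, ids) {
  # "X12706_10" -> "12706"; names without that pattern are left unchanged
  samples <- sub("^X(\\d+)_.*$", "\\1", names(x))
  # as.character() is fine for ids like 12706; pass very large ids as
  # strings, since big numerics may be rendered in scientific notation
  keep <- !(samples %in% as.character(ids))
  x[, keep, drop = FALSE]
}

# Toy stand-in for data_test_read
toy <- data.frame(samp      = c("s088", "s089"),
                  pop       = c(1, 1),
                  X12706_10 = c(4, 4),
                  X14223_16 = c(1, 1),
                  X14481_7  = c(3, 3))

res <- drop_by_number(toy, c(12706, 14481))
names(res)
```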

Using Merge with an R By class object

So I have a "by" class object (which is essentially a list).
It is indexed by 2 factors [id1,id2], with a list associated with each unique pair.
e.g.
id1:1
id2:1
1,2,3
------
id1:1
id2:2
4,4,NA
------
id1:2
id2:1
NA
I would like to convert this to a data frame which has 3 columns {id1,id2,value} and would take the above and return
id1, id2, value
1 1 1
1 1 2
1 1 3
1 2 4
1 2 4
1 2 NA
2 1 NA
This can be done with a for loop but is obviously slow. I am looking to try and merge the value column back to a data frame which has indices 1 and 2.
Answer: Use the data.table package. It is ridiculously quick for these sorts of problems.
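The answer above gives no code. For reference, here is a base R sketch (not data.table) that flattens a "by" object like the one described without an explicit loop. The toy data and all variable names are assumptions made for illustration.

```r
# Toy data reproducing the example; by() then yields a 2 x 2 array of cells
dat <- data.frame(id1   = c(1, 1, 1, 1, 1, 1, 2),
                  id2   = c(1, 1, 1, 2, 2, 2, 1),
                  value = c(1, 2, 3, 4, 4, NA, NA))
b <- by(dat, list(id1 = dat$id1, id2 = dat$id2), function(d) d$value)

cells <- unclass(b)                                   # list array, one vector per cell
keys  <- expand.grid(dimnames(b), stringsAsFactors = FALSE)  # same order as the cells
n     <- as.vector(lengths(cells))                    # number of values per cell

# Repeat each id pair once per value, then attach the values
result <- keys[rep(seq_len(nrow(keys)), times = n), , drop = FALSE]
result$value <- unlist(cells, use.names = FALSE)

result <- result[order(result$id1, result$id2), ]
rownames(result) <- NULL
result
```

The id columns come back as character because they are taken from the dimnames; convert them with as.numeric() if needed. Empty cells (here id1 = 2, id2 = 2) contribute no rows.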

add identifier column

For example, my data looks like this:
Number Value
1 3
2 4
3 6
4 7
I want to add a third column as an identifier column based on Value. If the value is >5, then group 1, otherwise group 2. The result should look something like this:
Number Value Group
1 3 2
2 4 2
3 6 1
4 7 1
Thanks for your help!
You can add a new column to data frame:
df$Group <- ifelse(df$Value > 5, 1, 2)
I recommend reading more about ?data.frame and ?ifelse, and about other data frame operations such as ?transform.
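As a runnable end-to-end sketch, the example data from the question can be built and labelled like this. df2 is an illustrative name for the transform() variant.

```r
# Example data from the question
df <- data.frame(Number = 1:4, Value = c(3, 4, 6, 7))

# Group 1 when Value > 5, otherwise group 2
df$Group <- ifelse(df$Value > 5, 1, 2)

# Same result with transform(), which evaluates Value inside the data frame
df2 <- transform(df[c("Number", "Value")], Group = ifelse(Value > 5, 1, 2))

df
```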

How to convert row names of table into a vector

I have returned stats on my data using the table command as such:
subject<-c(4,4,2,2,3,3)
correct<-c(0,1,1,1,0,0)
test<-data.frame(subject,correct)
freq_test<-head(table(test$subject,test$correct))
This returns a table which looks like this
0 1
2 0 2
3 2 0
4 1 1
That's great, but the problem is that I would like the first column to be a vector rather than row.names (so that I can code it properly as "subject").
Is there a way to get this column to act in this way?
Just make a new data frame with the row names of freq_test as the first column:
> df<-data.frame(as.numeric(rownames(freq_test)),freq_test)
> colnames(df)[1]="subject"
> df
subject X0 X1
2 2 0 2
3 3 2 0
4 4 1 1
>
Of course, you can rename X0 and X1 to whatever you want by editing colnames(df) as above.
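How a table object is coerced by data.frame() can depend on its class, so a hedged alternative is to make the wide conversion explicit with as.data.frame.matrix(). The names tab and wide are illustrative.

```r
subject <- c(4, 4, 2, 2, 3, 3)
correct <- c(0, 1, 1, 1, 0, 0)
test <- data.frame(subject = subject, correct = correct)

tab <- table(test$subject, test$correct)

# Treat the table as a matrix: one row per subject, one column per outcome
wide <- as.data.frame.matrix(tab)

# Move the row names into a real column
wide <- cbind(subject = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

Here the count columns keep the names "0" and "1" rather than X0 and X1, so access them as wide[["0"]] or rename them.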
If you want the data in "long" format (useful for some models and plotting, and especially when your tables are more complicated), the table method for the generic function as.data.frame will take care of this for you:
> as.data.frame(table(test))
subject correct Freq
1 2 0 0
2 3 0 2
3 4 0 1
4 2 1 2
5 3 1 0
6 4 1 1
I think you should have used the standard method of construction of a data.frame, which is with name=values pairs:
test <- data.frame( subject=subject, correct=correct)
The first subject is taken as the column name, while the second is evaluated as an expression: the enclosing environments are searched for an object named subject, and its value is assigned to the "subject" column of test.
