I have a list:
mylist <- c("a","b","c")
mylist <- as.list(mylist)
and I have a data frame like this:
df <- data.frame("col1" = 1, "col2" = 2, "col3" = 3)
View(df)
col1 col2 col3
1 1 2 3
How can I add mylist as a single element of a new column col4 in row 1?
Expected output:
col1 col2 col3 col4
1 1 2 3 [a,b,c]
Is it even possible in R? I started learning R a few days ago; I came from pandas, where this was possible.
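A minimal sketch, assuming the one-row data frame from the question: R supports list columns, so wrapping mylist in one more list() makes it a length-1 value that fills exactly one cell of the new column col4. With more rows, the assigned list should have one element per row.

```r
mylist <- as.list(c("a", "b", "c"))
df <- data.frame(col1 = 1, col2 = 2, col3 = 3)

# list(mylist) has length 1, so it fills the single row, and col4 becomes
# a list column whose first (and only) element is mylist itself
df$col4 <- list(mylist)

str(df$col4[[1]])   # the original list of "a", "b", "c"
df$col4[[1]][[2]]   # "b"
```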
I have a dataframe and I want to drop columns which are defined in a list.
First, I made a list of samples (which I want to keep) out of an existing data frame:
df_1 <- data.frame(sample = c("col1","col3"), gender = c("m","v"))
samplename <- list(df_1)
Then I want to drop the columns from another dataframe which are not in this list of samplenames:
test_df <- data.frame(col1 = c("a", "b", "c", "d", "e"), col2 = seq(1, 5), col3 = rep(3, 5), col4 = c("aa","bb","cc","dd","ee"))
for (col in colnames(test_df)){
  if (!(col %in% samplename[[1]])){
    test_df <- test_df[, col, drop = TRUE]
  }
}
But this code is not working. What is a better way to perform this task? Where did I go wrong?
You can try:
test_df[,!(names(test_df) %in% df_1$sample)]
col2 col4
1 1 aa
2 2 bb
3 3 cc
4 4 dd
5 5 ee
Basically the same as Duck's answer:
library(dplyr)
test_df %>%
  select(!all_of(df_1$sample))
returns
col2 col4
1 1 aa
2 2 bb
3 3 cc
4 4 dd
5 5 ee
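As for where the loop goes wrong, reading the question's code: samplename[[1]] is the whole df_1 data frame, so `col %in% samplename[[1]]` compares each name against df_1's columns as list elements instead of against the sample names, and `test_df[, col, drop = TRUE]` selects that column and overwrites test_df with it rather than removing it. A corrected sketch, which keeps only the columns listed in df_1$sample as the question describes (the answers above drop them instead; removing their `!` flips that):

```r
df_1 <- data.frame(sample = c("col1", "col3"), gender = c("m", "v"))
test_df <- data.frame(col1 = c("a", "b", "c", "d", "e"), col2 = seq(1, 5),
                      col3 = rep(3, 5), col4 = c("aa", "bb", "cc", "dd", "ee"))

keep <- df_1$sample   # compare against the vector of names, not the data frame

# Loop version: remove every column whose name is not in `keep`
loop_df <- test_df
for (col in colnames(loop_df)) {   # colnames() is evaluated once, before the loop
  if (!(col %in% keep)) {
    loop_df[[col]] <- NULL         # assigning NULL deletes the column
  }
}

# Vectorised equivalent without a loop
vec_df <- test_df[, names(test_df) %in% keep, drop = FALSE]
names(loop_df)
names(vec_df)
```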
I want to take the union of two data frames that share some rows with the same row name. For rows with a common row name, I want to keep the second data frame's values rather than the first's. For example:
df1 <- data.frame(col1 = c(1,2), col2 = c(2,4), row.names = c("row_1", "row_2"))
df1
# col1 col2
# row_1 1 2
# row_2 2 4
df2 <- data.frame(col1 = c(3,10), col2 = c(6,99), row.names = c("row_3", "row_2"))
df2
# col1 col2
# row_3 3 6
# row_2 10 99
The result I would like to obtain is:
someSpecificRBind(df1,df2, takeIntoAccount=df2)
# col1 col2
# row_1 1 2
# row_2 10 99
# row_3 3 6
The function rbind doesn't do the job: it keeps both copies of the shared row and makes the duplicated row name unique instead of replacing the first data frame's values.
I would conceptualize this as only adding to df2 the rows in df1 that aren't already there:
rbind(df2, df1[setdiff(rownames(df1), rownames(df2)), ])
We get the index of the duplicated row names and use it to filter:
rbind(df2, df1)[!duplicated(c(row.names(df2), row.names(df1))),]
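Both answers put df2's rows first. If you would rather keep df1's row order (row_1, row_2, row_3, as in the desired output), one alternative sketch overwrites the shared rows in place and then appends the rows that exist only in the second data frame. The function name `union_prefer_second` is hypothetical, and it assumes both data frames have the same column names.

```r
union_prefer_second <- function(first, second) {
  # rows present in both: take the values from `second`
  common <- intersect(rownames(first), rownames(second))
  if (length(common) > 0) {
    first[common, ] <- second[common, names(first), drop = FALSE]
  }
  # rows only in `second`: append them after `first`
  new_rows <- setdiff(rownames(second), rownames(first))
  rbind(first, second[new_rows, names(first), drop = FALSE])
}

df1 <- data.frame(col1 = c(1, 2), col2 = c(2, 4), row.names = c("row_1", "row_2"))
df2 <- data.frame(col1 = c(3, 10), col2 = c(6, 99), row.names = c("row_3", "row_2"))
res <- union_prefer_second(df1, df2)
res
```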
I have a very large data frame that contains 100 rows and 400000 columns.
To sample each column, I can simply do:
df <- apply(df, 2, sample)
But I want every pair of columns to be sampled together. For example, if col1 is originally c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), I want col2 to become c(6,8,7,9,10), following the same resampling pattern as col1. The same goes for col3 & col4, col5 & col6, etc.
I wrote a for loop to do this, which takes forever. Is there a better way? Thanks!
You might try this: split the data frame every two columns with split.default(), sample the rows of each sub data frame, and then bind them back together:
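df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)
index <- seq_len(nrow(df))
cbind.data.frame(
  setNames(lapply(
    # group the columns in pairs: 0, 0, 1, 1, ...
    split.default(df, (seq_along(df) - 1) %/% 2),
    # shuffle the rows of each pair with its own permutation
    function(sdf) sdf[sample(index), , drop = FALSE]),
    NULL)
)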
# col1 col2 col3
#5 5 10 12
#4 4 9 11
#1 1 6 15
#2 2 7 14
#3 3 8 13
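With 400000 columns, building one small data frame per pair can itself be slow. A sketch under the assumption that all columns are numeric (so the data frame can be converted with as.matrix()): draw one row permutation per column pair and apply all of them at once with a single linear index. `shuffle_column_pairs` is a hypothetical name; it holds a few matrix-sized index vectors in memory, and the result can be turned back into a data frame with as.data.frame().

```r
shuffle_column_pairs <- function(m) {
  n <- nrow(m)
  p <- ncol(m)
  # pair id for each column: 1, 1, 2, 2, 3, 3, ... (an odd last column is its own group)
  pair <- (seq_len(p) + 1L) %/% 2L
  # one random row permutation per pair, stored as an n x n_pairs matrix
  perms <- matrix(replicate(max(pair), sample.int(n)), nrow = n)
  # row index for every cell: both columns of a pair share the same permutation
  row_idx <- perms[, pair, drop = FALSE]
  # convert (row, column) positions to linear indices into the column-major matrix
  lin <- as.vector(row_idx) + rep((seq_len(p) - 1L) * n, each = n)
  out <- m
  out[] <- m[lin]   # out[] <- keeps the dimensions and dimnames of m
  out
}

m <- as.matrix(data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15, col4 = 16:20))
set.seed(42)
shuffled <- shuffle_column_pairs(m)
shuffled
```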
I searched for a while to try to solve this, but unfortunately couldn't find an answer.
In my data frame, the last column contains strings that match column names. I would like to create another column that, for each row, copies the value from the column named in that row's last column.
For example, say my data is:
col1 <- c(1, 4, 6, 0, 5)
col2 <- c(4, 6, 7, 8, 6)
col3 <- c(0, 4, 2, 2, 1)
col4 <- c("col1", "col1", "col2", "col3", "col1")
df <- data.frame(col1, col2, col3, col4)
What I want to achieve is col5, which copies the relevant cell from each row:
col1 col2 col3 col4 col5
1 4 0 col1 1
4 6 4 col1 4
6 7 2 col2 7
0 8 2 col3 2
5 6 1 col1 5
Basically it looks at col4 and returns the value from the same row that matches that column name.
This is obviously a very simplified version of my data which is why I'd like to automate it.
I would really appreciate any help :)
We can use row/column matrix indexing to extract one element per row from the dataset and create 'col5'.
df$col5 <- df[-4][cbind(1:nrow(df), match(as.character(df$col4), colnames(df)))]
df$col5
#[1] 1 4 7 2 5
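The match() positions line up with df[-4] here only because col4 is the last column. A slightly more general sketch, assuming the value columns are all numeric, looks the names up among the other columns explicitly; `lookup_by_column_name` is a hypothetical helper name.

```r
lookup_by_column_name <- function(df, name_col) {
  value_cols <- setdiff(names(df), name_col)
  vals <- as.matrix(df[value_cols])
  # position of each row's target column within value_cols (NA if the name is unknown)
  j <- match(as.character(df[[name_col]]), value_cols)
  # a two-column (row, column) index matrix picks one cell per row
  vals[cbind(seq_len(nrow(df)), j)]
}

col1 <- c(1, 4, 6, 0, 5)
col2 <- c(4, 6, 7, 8, 6)
col3 <- c(0, 4, 2, 2, 1)
col4 <- c("col1", "col1", "col2", "col3", "col1")
df <- data.frame(col1, col2, col3, col4)
df$col5 <- lookup_by_column_name(df, "col4")
df
```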
I'd like to know how to consolidate duplicate rows in a data frame and then combine the duplicated values in another column.
Here's a sample of the existing data frame and two data frames that would be acceptable as a solution:
df1 <- data.frame(col1 = c("test1", "test2", "test2", "test3"), col2 = c(1, 2, 3, 4))
df.ideal <- data.frame(col1 = c("test1", "test2", "test3"), col2 = c(1, "2, 3", 4))
df.ideal2 <- data.frame(col1 = c("test1", "test2", "test3"),
col2 = c(1, 2, 4),
col3 = c(NA, 3, NA))
In the first ideal data frame, the duplicated rows are collapsed into one and col2 holds both numbers. I've looked at other similar questions on Stack Overflow, but they all dealt with combining rows. I need to remove the duplicate row because the other dataset I'm merging this with requires a certain number of rows, but I want to preserve all of the values. Thanks for your help!
To go from df1 to df.ideal, you can use aggregate().
aggregate(col2~col1, df1, paste, collapse=",")
# col1 col2
# 1 test1 1
# 2 test2 2,3
# 3 test3 4
If you want to get to df.ideal2, that's more of a long-to-wide reshape. You can do
reshape(transform(df1, time=ave(col2, col1, FUN=seq_along)), idvar="col1", direction="wide")
# col1 col2.1 col2.2
# 1 test1 1 NA
# 2 test2 2 3
# 4 test3 4 NA
using just the base reshape() function.
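Another base R sketch (not from the original answers): split() collects the values for each key, and indexing each group with seq_len(width) pads it with NA up to the longest group's length. `widen_duplicates` is a hypothetical helper, and the col2_1/col2_2 column naming is an assumption chosen to mirror cSplit's output.

```r
widen_duplicates <- function(df, key, value) {
  # keep keys in order of first appearance
  groups <- split(df[[value]], factor(df[[key]], levels = unique(df[[key]])))
  width <- max(lengths(groups))
  # v[seq_len(width)] returns NA for positions past the end of v
  vals <- do.call(rbind, lapply(groups, function(v) v[seq_len(width)]))
  out <- data.frame(names(groups), vals, row.names = NULL, stringsAsFactors = FALSE)
  names(out) <- c(key, paste0(value, "_", seq_len(width)))
  out
}

df1 <- data.frame(col1 = c("test1", "test2", "test2", "test3"), col2 = c(1, 2, 3, 4))
res <- widen_duplicates(df1, "col1", "col2")
res
```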
Another option is to use data.table together with splitstackshape:
library(data.table)
library(splitstackshape)
DT1 <- setDT(df1)[,list(col2=toString(col2)) ,col1]
DT1
# col1 col2
#1: test1 1
#2: test2 2, 3
#3: test3 4
To get df.ideal2, you could split col2 in DT1:
cSplit(DT1, 'col2', sep=',')
# col1 col2_1 col2_2
#1: test1 1 NA
#2: test2 2 3
#3: test3 4 NA
or start from df1 directly:
dcast.data.table(getanID(df1, 'col1'), col1~.id, value.var='col2')
# col1 1 2
#1: test1 1 NA
#2: test2 2 3
#3: test3 4 NA