I have a data frame with the following structure:
pat <- c(rep(1,50), rep(2,50), rep(3,50))
inc <- rep(c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5),
rep(6,5), rep(7,5), rep(8,5), rep(9,5), rep(10,5)), 3)
df <- data.frame(cbind(pat, inc))
df is split into a list of elements:
all.inc = split(df, inc)
Now I want to split each element of this list into sub-lists. Something like:
all.pat = split(all.inc, pat)
This doesn't work, obviously. I've already tried the plyr functions and lapply, but didn't get it to work.
Any ideas?
Use lapply:
lapply(all.inc, function(x) split(x, x$pat))
If you'd like to split your data frame all at once, you could use
split(df, interaction(df$pat,df$inc))
However, the returned value will be a single list of data frames, which is slightly different from what you would get by splitting list elements.
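Here is a minimal sketch that compares the two answers on the question's data. The data is rebuilt with rep(..., each = 5), which produces the same pat and inc values. Passing drop = TRUE to interaction() is an added choice that discards empty pat/inc combinations. Every combination occurs in this data, so it changes nothing here.

```r
# Rebuild the question's data more compactly
pat <- rep(1:3, each = 50)
inc <- rep(rep(1:10, each = 5), 3)
df <- data.frame(pat, inc)

# Nested result: outer list keyed by inc, inner lists keyed by pat
all.inc <- split(df, df$inc)
nested <- lapply(all.inc, function(x) split(x, x$pat))

# Flat result: a single list keyed by "pat.inc", e.g. "2.4"
flat <- split(df, interaction(df$pat, df$inc, drop = TRUE))

# The same subset (pat == 2, inc == 4) reached both ways
a <- nested[["4"]][["2"]]
b <- flat[["2.4"]]
```

The nested version keeps the two-level structure the question asked for. The flat version is easier to loop over when the hierarchy does not matter.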
I want to do the same thing to create several data frames. Can I use lapply to achieve this?
I tried it but did not succeed:
xx<-c("a1","b1")
lapply(xx, function(x){
x<-data.frame(c(1,2,3,4),"1")
})
I hope to get two data frames, like:
a1<-data.frame(c(1,2,3,4),"1")
b1<-data.frame(c(1,2,3,4),"1")
An option that assigns to .GlobalEnv. As pointed out, this is less efficient, but it is provided to answer the OP's question as asked:
lapply(xx, function(x) assign(x,data.frame(A=c(1,2,3,4),
B="1"),
envir=.GlobalEnv))
You can then access each data frame by its name: a1, b1.
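You could try using sapply over the xx vector of names to build a list of data frames. Take the list from sapply's return value. Assigning to lst[[x]] inside the function only modifies a local copy, so lst itself would stay empty.
xx <- c("a1", "b1")
# simplify = FALSE keeps the result as a list; the element names are taken from xx
lst <- sapply(xx, function(x) {
  data.frame(c(1,2,3,4), "1")
}, simplify = FALSE)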
Then, you may access each data frame using the list, e.g. lst$a1.
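You may also need a1 and b1 as standalone variables. One option is to build the named list first and then copy it into an environment with list2env(). This is a sketch and does not come from either answer. The column names A and B follow the assign() answer. The environment target is hypothetical and only keeps the example self-contained. Pass .GlobalEnv instead to get the same effect as that answer.

```r
xx <- c("a1", "b1")
# setNames() makes the list element names match the strings in xx
lst <- lapply(setNames(xx, xx), function(nm) data.frame(A = c(1, 2, 3, 4), B = "1"))

# Optional: copy each list element into an environment as its own variable.
# target is a hypothetical stand-in; use .GlobalEnv to create a1 and b1 globally.
target <- new.env()
list2env(lst, envir = target)
```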
I have a data.frame with a bunch of rows, each containing a unique ID followed by an amino acid sequence. I was wondering if there is a way to split the rows into individual data.frames.
Here is an example:
bigdf
>ENSCAFP00000018847.4
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.3
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.2
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.1
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
It would be nice if each new data.frame were named after its ID, so that the results would look like this:
ENSCAFP00000018847.4
>ENSCAFP00000018847.4
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.3
>ENSCAFP00000018847.3
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.2
>ENSCAFP00000018847.2
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.1
>ENSCAFP00000018847.1
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
I know this may seem like a strange thing to do, but I need to do it for thousands of different amino acid sequences, so it would be great if I could find a way to split them all up in R.
dput(df[1:3, c(1)])
c("> ENSCAFP00000018847.4 MFFINIISLIIPILLAVAFLTLVERKVLGYMQLRKGPNIVGPYGLLQPIADAVKLFTKEPLRPLTSSMSMFILAPILALSLALTMWIPLPMPYPLINMNLGVLFMLAMSSLAVYSILWSGWASNSKYALIGALRAVAQTISYEVTLAIILLSVLLMNGSFTLSTLIITQEHMWLIFPAWPLAMMWFISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFALFFLAEYANIIMMNILTTILFFGAFHNPFMPELYSINFTMKTLLLTICFLWIRASYPRFRYDQLMHLLWKNFLPLTLALCMWHVALPIITASIPPQT",
"> ENSCAFP00000018847.3 MKPPILIIIMATIMTGTMIVMLSSHWLLIWIGFEMNMLAIIPILMKKYNPRAMEASTKYFLTQATASMLLMMGVTINLLYSGQWVISKISNPIASIMMTTALTMKLGLSPFHFWVPEVTQGITLMSGMILLTWQKIAPMSILYQISPSINTNLLMLMALTSVLVGGWGGLNQTQLRKIMAYSSIAHMGWMAAIITYNPTMMVLNLTLYILMTLSTFMLFMLNSSTTTLSLSHMWNKFPLITSMILILMLSLGGLPPLSGFIPKWMIIQELTKNNMIIIPTLMAITALLNLYFYLRLTYSTALTMFPSTNNMKMKWQFEYTKKATLLPPLIITSTMLLPLTPMLSVLD",
"> ENSCAFP00000018847.2 MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFAHWFPLFSGYTLNDTWAKIHFTIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTTWNTVSSMGSFISLTAVMLMIFMIWEAFASKREVAMVELTTTNIEWLHGCPPPYHTFEEPTYVIQK"
)
You can put all the lines in a named list of data frames and then use list2env() to put them in the global environment like this:
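dfs <- apply(bigdf, MARGIN = 1, as.data.frame)
# Name each element by its ID (the first token after ">"), e.g. "ENSCAFP00000018847.4";
# a fixed-width substring would cut off the version suffix and produce duplicate names
names(dfs) <- sub("^>\\s*(\\S+).*$", "\\1", bigdf[, 1])
list2env(dfs, envir = .GlobalEnv)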
You can just use the apply function and as.data.frame across the rows:
mydfs <- apply(df, 1, as.data.frame)
mydfs will be a list of the rows as individual data frames. Note that apply() first converts df to a matrix, so all values are coerced to a common type (typically character).
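A variant is to parse each "> ID SEQUENCE" string into id and sequence columns before splitting. Each element is then a tidy one-row data frame named by its ID. This is a sketch: the column name V1 and the shortened sequences are hypothetical stand-ins for the real bigdf. If an ID appears more than once, split() keeps all of its rows together in one data frame.

```r
# Hypothetical input in the same "> ID SEQUENCE" layout as the dput() output above
bigdf <- data.frame(V1 = c("> ENSCAFP00000018847.4 MFFINIISLI",
                           "> ENSCAFP00000018847.3 MKPPILIIIM"),
                    stringsAsFactors = FALSE)

# Extract the ID (first token after ">") and the sequence (everything after it)
ids  <- sub("^>\\s*(\\S+)\\s+.*$", "\\1", bigdf$V1)
seqs <- sub("^>\\s*\\S+\\s+", "", bigdf$V1)
parsed <- data.frame(id = ids, sequence = seqs, stringsAsFactors = FALSE)

# One data frame per ID; the list element names are the IDs
dfs <- split(parsed, parsed$id)
```

Pass dfs to list2env() as in the first answer if separate variables are really needed.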
I'm a beginner with R, so I'm not very good at searching for answers relevant to my question. I'm sorry if similar questions have been asked.
I have a list made of data frames and lists.
I'd like to know how to keep only the data frames so that I can bind them together into one huge data frame.
Here is an example:
L1 <- list(c(1, "abc", 3))
L2 <- list(c("b","d"))
L3 <- list(L1,L2)
brand <- c("A","B","C","D")
price <- c(1,1,3,7)
df <- data.frame(brand , price)
brand2 <- c("E","F","G","H")
price2 <- c(20,3,5,10)
df2 <- data.frame(brand2, price2)
L4 <- list(df, L3, df2)
library(plyr)  # provides rbind.fill
finaldf <- do.call("rbind.fill", L4)
Unfortunately, I got this error: Error: All inputs to rbind.fill must be data.frames
So I know the problem is that there is a list inside the list L4. In my real data, there are even several lists inside the big list. Can anyone tell me how to get rid of these inner lists? Thank you very much!
You need to filter out the list entries that are not data.frames, like so:
is_df <- sapply(L4, is.data.frame)
finaldf <- do.call("rbind.fill", L4[is_df])
Alternatively,
do.call("rbind.fill", Filter(is.data.frame, L4))
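Here is a base-R sketch of the same filtering. It assumes df and df2 share the column names brand and price, as the next answer recommends. Filter() only inspects top-level elements, which is enough when the unwanted lists sit directly inside L4. If the column names differ, plyr::rbind.fill can replace rbind, because it fills missing columns with NA.

```r
df  <- data.frame(brand = c("A", "B", "C", "D"), price = c(1, 1, 3, 7))
df2 <- data.frame(brand = c("E", "F", "G", "H"), price = c(20, 3, 5, 10))
L3  <- list(list(c(1, "abc", 3)), list(c("b", "d")))
L4  <- list(df, L3, df2)

# Keep only the top-level elements that are data frames
dfs_only <- Filter(is.data.frame, L4)

# Base rbind() requires matching column names across the data frames
finaldf <- do.call(rbind, dfs_only)
```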
You can create an index to subset your list like so:
# Subset list
index <- sapply(L4, is.data.frame)
and then use it to make your final data.frame like so:
finaldf <- do.call("rbind", L4[index])
Keep in mind that for this to work, both data frames must have the same column names, so when you create df2 you should specify the column names like so:
df2 <- data.frame(brand = brand2, price = price2)
Do this before running the code above.
Let's say I have a list called myList of 30 data.frames, each containing 2 variables (called value and rank).
I know I can use
my.DF <- do.call("cbind", myList)
to create the output my.DF containing all the variables next to each other.
Is it possible to cbind each variable individually into its own data.frame, i.e. to have a new data.frame of just the 2nd variable?
We can extract the second column by looping over the list (lapply) and wrapping the result with data.frame.
data.frame(lapply(myList, `[`, 2))
If we want a separate data.frame for each variable:
lapply(names(myList[[1]]), function(x)
do.call(cbind,lapply(myList, `[`, x)))
Example data:
set.seed(24)
myList <- list( data.frame(value=1:6, rank= sample(6)),
data.frame(value=7:12, rank=sample(6)))
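Here is a possible variant that does not come from the answer above. It returns a named list with one data.frame per variable and gives the columns readable names. The value_1, value_2, ... naming scheme is an illustrative assumption.

```r
set.seed(24)
myList <- list(data.frame(value = 1:6, rank = sample(6)),
               data.frame(value = 7:12, rank = sample(6)))

vars <- names(myList[[1]])
# For each variable, pull that column out of every data frame and bind them side by side
byVar <- lapply(setNames(vars, vars), function(v) {
  cols <- lapply(myList, `[[`, v)              # list of vectors, one per data frame
  names(cols) <- paste0(v, "_", seq_along(cols))
  as.data.frame(cols)
})
```

byVar$rank then holds only the rank columns, one per original data frame.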
I have several data frames df1, df2, ..., df10. The columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 <- transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately I'm not able to do this in a loop. If I write
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following output
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is put all the data.frames into a list and then use lapply as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now all the data.frames have a new column id appended, and out is a list of data.frames. You can access each of them with out[[1]], out[[2]], etc.
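Applied to the question's actual task, a sketch might look like the following. The three small data frames are made-up example data containing 999 codes. mget() collects them as a named list, so the names survive. list2env() then optionally writes the updated versions back over df1, df2 and df3.

```r
# Made-up example data with some 999 codes
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df3 <- data.frame(oldvariable = c(7, 8, 999))

# Collect the data frames into a named list (names "df1", "df2", "df3")
df.list <- mget(paste0("df", 1:3), envir = environment())

# Recode 999 to NA in a new column of every data frame
df.list <- lapply(df.list, function(d) {
  d$newvariable <- ifelse(d$oldvariable == 999, NA, d$oldvariable)
  d
})

# Optional: overwrite df1, df2, df3 in the current environment with the updated versions
list2env(df.list, envir = environment())
```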
This has been asked many times. The $<- function cannot translate that "i" index (a character string) into either its first or second argument. [[<- can do so for the second argument but not the first. You should learn to use lapply, and you will probably need two nested lapply calls: one over the list of "names" and one over the columns of each data frame. The question is also incomplete because it lacks specific examples. Make up a set of three data frames, set some of the values to 999, and provide a list of names.
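If you prefer a loop, note where the error comes from: i is a character string, not a data frame. In this sketch the names are stored in a character vector called df_names rather than names, to avoid confusion with the names() function. get() fetches each data frame by name and assign() writes it back. The example data is made up.

```r
# Made-up example data with some 999 codes
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df_names <- c("df1", "df2")   # character vector of data frame names

for (i in df_names) {
  d <- get(i)                 # fetch the data frame whose name is stored in i
  d[["newvariable"]] <- ifelse(d[["oldvariable"]] == 999, NA, d[["oldvariable"]])
  assign(i, d)                # write it back under the same name
}
```

The lapply approach above is usually preferable, because it keeps related data frames together in one list instead of scattering them as separate variables.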