Hi Stack Overflow Community,
I've spent a few hours on this but haven't found the answer. I have a list of 200 sublists in R. Each contains a character column and an integer column named FREQUENCY. My goal is to keep only the integer columns. I tested this manually with list() on the first two sublists, and it works:
mydata <- list(Name1[[1]]$FREQUENCY, Name1[[2]]$FREQUENCY)
Now to my question: how can I take all 200 sublists with one command? I need the result to be a list, because I have to sum each FREQUENCY sublist in the next step:
lapply(mydata, sum)
Thank you guys!
Here's a base R solution (if I understand correctly):
your_list <- list(data.frame(a="hello",b=1),
data.frame(c="world",d=1))
# [[1]]
# a b
# 1 hello 1
#
# [[2]]
# c d
# 1 world 1
lapply(your_list,function(x) x[,sapply(x,is.numeric),drop=FALSE])
# [[1]]
# b
# 1 1
#
# [[2]]
# d
# 1 1
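If every sublist really does have a column named FREQUENCY, as the question states, the column can also be extracted by name. The following is only a sketch: the three small data frames are hypothetical stand-ins for the asker's 200 sublists, and the WORD column name is an assumption.

```r
# Hypothetical stand-in for the asker's list `Name1` (3 sublists instead of 200)
Name1 <- list(
  data.frame(WORD = c("a", "b"), FREQUENCY = c(1L, 2L)),
  data.frame(WORD = c("c", "d"), FREQUENCY = c(3L, 4L)),
  data.frame(WORD = "e", FREQUENCY = 5L)
)

# Pull the FREQUENCY column out of every sublist in one call
mydata <- lapply(Name1, `[[`, "FREQUENCY")

# Sum each FREQUENCY vector, as in the question's next step
sums <- sapply(mydata, sum)
sums
# [1] 3 7 5
```

Using lapply(mydata, sum) instead of sapply() keeps the sums in a list, as in the question.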
I have two files. The first file is a data frame that is simply times in one column and individuals in a second
# [Time] [Individual]
# [1] 1528142 C5A1790
# [2] 1528142 C5A1059
# [3] 1528142 C5A1084
# [4] 1528142 C5A1564
# [5] 1528142 C5A1239
# [6] 1528142 C5A1180
The second is an N x N matrix in which both rows and columns are individuals, including those in the data frame.
# [C5A1084] [C5A1059] [C5A1790] [C5A1180]
# 1 [C5A1084] 0 0.5 1 0
# 2 [C5A1059] 0.5 0 0 1
# 3 [C5A1790] 1 1 0 0.5
# 4 [C5A1180] 0 1 0.5 0
I need to create a vector containing the row numbers in the matrix at which I can find the individuals from the data frame, and in the order that they are listed in the data frame. For these example data it would be (3,2,1,4).
I tried to use the which() function as
RingIndex <- which(Matrix$IDcolumn == FrameIDs)
and received the "longer object length is not a multiple of shorter object length" message, presumably because the matrix includes more individuals than the data frame. %in% and match() are also returning errors stating that replacement has fewer rows than data.
Following the advice in the comments, I tried
RingIndex <- which(Matrix$IDcolumn %in% FrameIDs)
which successfully returned the correct row numbers, but in ascending order rather than the order of the original data. The match() function continues to complain of different replacement and original lengths.
What approach could I use to get my vector?
Many thanks!
df <- data.frame(Time = runif(6,1528142,1528150),
Individuals = c("C5A1790","C5A1791","C5A1792","C5A1793","C5A1794","C5A1795"))
> df
Time Individuals
1 1528144 C5A1790
2 1528143 C5A1791
3 1528144 C5A1792
4 1528148 C5A1793
5 1528145 C5A1794
6 1528143 C5A1795
nnMatrix <- matrix(runif(36,0,1),6,6)
colnames(nnMatrix) <- df$Individuals
rownames(nnMatrix) <- df$Individuals
> nnMatrix
C5A1790 C5A1791 C5A1792 C5A1793 C5A1794 C5A1795
C5A1790 0.08096946 0.8716328 0.6895134 0.05692825 0.4555460 0.53224424
C5A1791 0.42568532 0.5920239 0.4523232 0.11516185 0.8053652 0.72299411
C5A1792 0.42439187 0.6101881 0.8534429 0.86010851 0.1269521 0.41066857
C5A1793 0.26043345 0.8011337 0.8032234 0.30930988 0.2298927 0.93320166
C5A1794 0.43065533 0.2161525 0.6702832 0.89304071 0.6765714 0.09769635
C5A1795 0.70594252 0.1048099 0.7478553 0.87839534 0.5173364 0.69957502
> sapply(df$Individuals, function(t) which(colnames(nnMatrix) == t))
[1] 1 2 3 4 5 6
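If you reverse the order of the column names and run the same sapply() call again:
colnames(nnMatrix) <- rev(colnames(nnMatrix))
> sapply(df$Individuals, function(t) which(colnames(nnMatrix) == t))
[1] 6 5 4 3 2 1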
You may want to check for repetition and missing values, but the main approach is the same.
As suggested in the comments (@GKi), match() will also work:
> match(df$Individuals,colnames(nnMatrix))
[1] NA 1 3 4 5 6
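Applied to the example data from the question, match() might be used as sketched below. The object names frame_ids, sim and ring_index are made up for illustration, and the IDs are assumed to be the matrix row names rather than a separate ID column.

```r
# IDs in the order they appear in the data frame (from the question)
frame_ids <- c("C5A1790", "C5A1059", "C5A1084", "C5A1564", "C5A1239", "C5A1180")

# The 4 x 4 example matrix, with individuals as row and column names
mat_ids <- c("C5A1084", "C5A1059", "C5A1790", "C5A1180")
sim <- matrix(c(0,   0.5, 1,   0,
                0.5, 0,   0,   1,
                1,   1,   0,   0.5,
                0,   1,   0.5, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(mat_ids, mat_ids))

# match() returns, for each data frame ID, its row position in the matrix;
# IDs that are absent from the matrix come back as NA
idx <- match(frame_ids, rownames(sim))
idx
# [1]  3  2  1 NA NA  4

# Drop the IDs that were not found, keeping the data frame order
ring_index <- idx[!is.na(idx)]
ring_index
# [1] 3 2 1 4
```

Note the argument order: the values to look up come first and the table to search comes second, so the result has one entry per data frame row.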
Edit
This question seems to be a duplicate of the question "How to group a vector into a list of vectors?", and the answer split(df$b, df$id) was suggested. Although I was initially happy with that solution, I realized that the given answers do not fully address my question. In the question below, I would like to obtain a list in which the vector elements are named by the values of a third column (in my example df$a). This is important, as otherwise the order of df$b plays a role. Obviously I can arrange by df$a and then call split(), but maybe there is another way of doing that.
My sample df:
df <- data_frame(id = paste0('id',rep(1:2, each = 5)), a = rep(letters[1:5],2),b=c(1:5,5:1))
Df should be grouped by ID (in df$id). I would like to create a list of vectors for each group (id) element that contains the values of df$b. My approach
require(dplyr)  # provides data_frame() and %>%
require(tidyr)
spread_df <- df %>% spread(id, b)  # makes a new column for each id
list_group_elements <- list()  # must exist before it is filled in the loop
# loop over the columns of spread_df
for (i in seq_along(spread_df)) {
  list_group_elements[i] <- list(spread_df[[i]])
  # I want each vector to be identified by the values of column df$a,
  # therefore:
  names(list_group_elements[[i]]) <- list_group_elements[[1]]
}
This results in :
list_group_elements
[[1]]
a b c d e
"a" "b" "c" "d" "e"
[[2]]
a b c d e
1 2 3 4 5
[[3]]
a b c d e
5 4 3 2 1
I don't need the first element of the list, but the rest is basically what I need. I suspect that my approach is not ideal, and if someone has an idea to improve it (e.g., with dplyr?), this would be highly appreciated. Why do I want this? I wrote a function that takes vectors as arguments, and I would like to run it over certain columns of data frames, using only the grouped values as arguments and not the entire column.
You may make df$b a named vector using setNames, and then split it into a list:
split(setNames(df$b, df$a), df$id)
# $id1
# a b c d e
# 1 2 3 4 5
#
# $id2
# a b c d e
# 5 4 3 2 1
One way is
lapply(unique(df$id), function(L) df$b[df$id == L])  # unique() also works when id is character, not a factor
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 5 4 3 2 1
Consider by, an object-oriented wrapper for tapply, designed to split a data frame by factor(s):
by(df, df$id, FUN=function(i) i$b)
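To illustrate why the names matter, here is a small sketch in base R. The function diff_a_e is hypothetical and stands in for the asker's own function that takes a vector as its argument; the data frame is rebuilt with data.frame() so that no extra packages are needed.

```r
# Same sample data as in the question, built with base R
df <- data.frame(id = paste0("id", rep(1:2, each = 5)),
                 a  = rep(letters[1:5], 2),
                 b  = c(1:5, 5:1),
                 stringsAsFactors = FALSE)

# Hypothetical function that looks elements up by name, not by position
diff_a_e <- function(v) v[["a"]] - v[["e"]]

# Named vectors per group, then the function applied to each group
groups <- split(setNames(df$b, df$a), df$id)
res <- sapply(groups, diff_a_e)
res
# id1 id2
#  -4   4

# Reversing the rows changes the order inside each group, but not the result,
# because the elements are still found by their names
shuffled <- df[nrow(df):1, ]
res_shuffled <- sapply(split(setNames(shuffled$b, shuffled$a), shuffled$id),
                       diff_a_e)
```

Because each group vector carries its own names, the function does not depend on the rows being sorted by a beforehand.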
I have a vector of strings that I'm trying to convert into a data frame with a frequency column. So far so good, but when I dim my data frame, I get only one column instead of two. I guess R is using the words as the index values.
Anyway here is how it starts. My list:
a<-c("welcoming", "whatsyourexcuse", "whiteway", "zero", "yay", "whatsyourexcuse", "yay")
Then, I tried to sort the frequency values in decreasing order and store as data frame using:
df <- as.data.frame(sort(table(a), decreasing=TRUE))
Problem is when I dim(df) I get [1] 5 1 instead of [1] 5 2. Here is what df looks like:
sort(table(a), decreasing = TRUE)
whatsyourexcuse 2
yay 2
welcoming 1
whiteway 1
zero 1
instead of:
a Freq
[1] whatsyourexcuse 2
[2] yay 2
[3] welcoming 1
[4] whiteway 1
[5] zero 1
Any pointers please? Thanks.
Try:
library(plyr)
a1 <- count(a)
a1[order(-a1$freq),]
# x freq
# 2 whatsyourexcuse 2
# 4 yay 2
# 1 welcoming 1
# 3 whiteway 1
# 5 zero 1
dim(a1)
#[1] 5 2
Or
a2 <- stack(sort(table(a),decreasing=TRUE))[,2:1]
dim(a2)
#[1] 5 2
When you convert to a data frame using as.data.frame(sort(table(a), decreasing=TRUE)), the names of the elements become the row names of the data frame, so you get only one column instead of two. After sort(), the result is no longer a table object. For comparison, check str(table(a)) and str(sort(table(a), decreasing=TRUE)).
You can also create the data.frame by
tbl <- sort(table(a), decreasing=TRUE)
data.frame(col1= names(tbl), Values= as.vector(tbl))
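Another option, sketched here on the assumption that sorting after the conversion is acceptable, is to convert the unsorted table first and order the rows afterwards. This sidesteps the question of what sort() returns; the name freq is made up for illustration.

```r
a <- c("welcoming", "whatsyourexcuse", "whiteway", "zero", "yay",
       "whatsyourexcuse", "yay")

# Convert the table itself, which yields the columns `a` and `Freq`
freq <- as.data.frame(table(a), stringsAsFactors = FALSE)

# Sort by decreasing frequency, breaking ties alphabetically
freq <- freq[order(-freq$Freq, freq$a), ]
rownames(freq) <- NULL

dim(freq)
# [1] 5 2
```

The tie-break on freq$a is a design choice added here; drop it if the original order within equal frequencies should be kept.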
I would like to create a column that contains the object's name inside an lapply call. As a placeholder I call the function I am looking for name.of.x.as.strig.function(). Unfortunately I am not sure how to write it; maybe it needs a combination of assign, do.call and paste. So far, experimenting with these has only led me into deeper trouble, and I am quite sure there is a more R-like solution.
# generates a list of dataframes,
data <- list(data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)))
# assigns names to dataframe
names(data) <- list("one","two", "tree", "four")
# subsets the second column into the object data.anova
data.anova <- lapply(data, function(x){x <- x[[2]];
return(matrix(x))})
This should allow me to create a column inside the dataframe that contains its name, for all matrices inside the list
data.anova <- lapply(data, function(x){
x$id <- name.of.x.as.strig.function(x)
return(x)})
I would like to retrieve:
3 one
3 one
3 two
3 two
...
Any input is highly appreciated.
Search history: function to retrieve object name as string, R get name of an object inside lapply...
Can it be that you are just looking for stack?
stack(lapply(data, `[[`, 2))
# values ind
# 1 3 one
# 2 3 one
# 3 3 two
# 4 3 two
# 5 3 tree
# 6 3 tree
# 7 3 four
# 8 3 four
(Or, using your original approach: stack(lapply(data, function(x) {x <- x[[2]]; x})))
If this is the case, melt from "reshape2" would also work.
Loop through the indices of data.anova, and use that to fetch both the data and the names:
data.anova <- lapply(seq_along(data.anova), function(i){
x <- as.data.frame(data.anova[[i]])
x$id <- names(data.anova)[i]
return(x)})
This produces:
# [[1]]
# V1 id
# 1 3 one
# 2 3 one
# [[2]]
# V1 id
# 1 3 two
# 2 3 two
# [[3]]
# V1 id
# 1 3 tree
# 2 3 tree
# [[4]]
# V1 id
# 1 3 four
# 2 3 four
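If a single stacked data frame is wanted rather than a list, one possible variant is to pass the data and the names side by side with Map(). This is only a sketch; the column names x, y and value are assumptions, since the question's data frames have auto-generated names.

```r
# Same structure as in the question, with explicit (assumed) column names
data <- list(one  = data.frame(x = c(1, 2), y = c(3, 3)),
             two  = data.frame(x = c(1, 2), y = c(3, 3)),
             tree = data.frame(x = c(1, 2), y = c(3, 3)),
             four = data.frame(x = c(1, 2), y = c(3, 3)))

# Map() hands each data frame and its name to the function together
pieces <- Map(function(d, nm) {
  data.frame(value = d[[2]], id = nm, stringsAsFactors = FALSE)
}, data, names(data))

# Stack the pieces into one data frame and drop the generated row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

The object itself never needs to know its own name here, because the name is supplied as a second argument.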
Here is what my data look like.
id interest_string
1 YI{Z0{ZI{
2 ZO{
3 <NA>
4 ZT{
As you can see, there can be multiple codes concatenated into a single column, separated by {. It is also possible for a row to have no interest_string values at all.
How can I manipulate this data frame to extract the values into a format like this:
id interest
1 YI
1 Z0
1 ZI
2 ZO
3 <NA>
4 ZT
I need to complete this task with R.
Thanks in advance.
This is one solution
out <- with(dat, strsplit(as.character(interest_string), "\\{"))
## or
# out <- with(dat, strsplit(as.character(interest_string), "{", fixed = TRUE))
out <- cbind.data.frame(id = rep(dat$id, times = sapply(out, length)),
interest = unlist(out, use.names = FALSE))
Giving:
R> out
id interest
1 1 YI
2 1 Z0
3 1 ZI
4 2 ZO
5 3 <NA>
6 4 ZT
Explanation
The first line of solution simply splits each element of the interest_string factor in data object dat, using \\{ as the split indicator. This indicator has to be escaped and in R that requires two \. (Actually it doesn't if you use fixed = TRUE in the call to strsplit.) The resulting object is a list, which looks like this for the example data
R> out
[[1]]
[1] "YI" "Z0" "ZI"
[[2]]
[1] "ZO"
[[3]]
[1] "<NA>"
[[4]]
[1] "ZT"
We have almost everything we need in this list to form the output you require. The only thing we need external to this list is the id values that refer to each element of out, which we grab from the original data.
Hence, in the second line, we bind, column-wise (specifying the data frame method so we get a data frame returned) the original id values, each one repeated the required number of times, to the strsplit list (out). By unlisting this list, we unwrap it to a vector which is of the required length as given by your expected output. We get the number of times we need to replicate each id value from the lengths of the components of the list returned by strsplit.
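The two steps described above can be sketched as a self-contained example. It assumes that the missing entry is a real NA rather than the string "<NA>", and that interest_string is already a character column; dat is rebuilt here for illustration.

```r
# Example data with a genuine missing value in row 3 (an assumption)
dat <- data.frame(id = 1:4,
                  interest_string = c("YI{Z0{ZI{", "ZO{", NA, "ZT{"),
                  stringsAsFactors = FALSE)

# Step 1: split each string on the literal "{"; an NA input stays a single NA
parts <- strsplit(dat$interest_string, "{", fixed = TRUE)

# Step 2: repeat each id once per extracted code and flatten the list
out <- data.frame(id = rep(dat$id, times = lengths(parts)),
                  interest = unlist(parts, use.names = FALSE),
                  stringsAsFactors = FALSE)
out
#   id interest
# 1  1       YI
# 2  1       Z0
# 3  1       ZI
# 4  2       ZO
# 5  3     <NA>
# 6  4       ZT
```

lengths(parts) is equivalent to the sapply(out, length) call used in the answer.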
A nice and tidy data.table solution:
library(data.table)
DT <- data.table( read.table( textConnection("id interest_string
1 YI{Z0{ZI{
2 ZO{
3 <NA>
4 ZT{"), header=TRUE))
DT$interest_string <- as.character(DT$interest_string)
DT[, {
list(interest=unlist(strsplit( interest_string, "{", fixed=TRUE )))
}, by=id]
gives me
id interest
1: 1 YI
2: 1 Z0
3: 1 ZI
4: 2 ZO
5: 3 <NA>
6: 4 ZT