I'm trying to convert a data frame with two columns into a named list of vectors. Each row is essentially a key:value pair, so each element of the final list should be named by a key and contain that key's values.
Example input:
Var1 Var2
A 1
A 2
B 1
B 3
C 3
C 4
C 5
Example Output:
namedArray = list(A = c(1,2), B = c(1,3), C = c(3,4,5))
I managed to do this using dcast() in the reshape2 package, however this required additional post-processing to remove row names and NA's introduced by casting the data frame.
Is there a more efficient way to accomplish this?
If you have 2 columns: X and Y in dataframe df1, and you want Y's values to be the names of items with values from X:
myList <- as.list(df1$X)
names(myList) <- df1$Y
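A minimal sketch of the same idea on made-up data (the contents of `df1` are an assumption for illustration), using setNames() to build and name the list in one step:

```r
# Hypothetical two-column data frame: X holds the values, Y holds the names
df1 <- data.frame(X = c(10, 20, 30),
                  Y = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# One list element per row, named by the matching entry of Y
myList <- setNames(as.list(df1$X), df1$Y)

myList$b
```

Note that this yields one element per row, so repeated keys are not grouped together.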
For the modified question, there is already a function that does exactly that (and it might have been a better answer than what I gave):
> split(dat$Var2, dat$Var1)
$A
[1] 1 2
$B
[1] 1 3
$C
[1] 3 4 5
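For reference, a self-contained sketch that rebuilds the example input from the question (the object name `dat` follows the answer above) and compares the result of split() with the requested output:

```r
# Example input from the question
dat <- data.frame(Var1 = c("A", "A", "B", "B", "C", "C", "C"),
                  Var2 = c(1, 2, 1, 3, 3, 4, 5),
                  stringsAsFactors = FALSE)

# Group the values of Var2 by the keys in Var1
named_list <- split(dat$Var2, dat$Var1)

# The output requested in the question
expected <- list(A = c(1, 2), B = c(1, 3), C = c(3, 4, 5))

all.equal(named_list, expected)
```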
Thank you @42- and @MMerry for getting me to think about split(). I found a nice solution: split one variable by the other. (split() already returns a list, so wrapping the output in as.list() is optional.)
y <- as.list(split(df$Var2, df$Var1))
If you want key-value pairs in a list from a data frame, a technique could look like this:
x = data.frame(x=letters[1:5],y=1:5)
y = split(x, seq_len(nrow(x)))  # one single-row data frame per row
names(y) = x$x
y$a
I have multiple vector objects in my environment. The names of all these vectors contain the string "column". For instance, I have vector objects called column_1, column_2, etc.
How do I programmatically make a data frame with all these vectors as its columns, without typing the names of the vectors?
I don't want to simply say:
data.frame(column_1, column_2, ...)
Is there any method using apply() or do.call() functions?
I have done this
vec_list <- grep("column", ls(), value = TRUE)
but this doesn't work
data.frame(vec_list)
I expect the output to be one data frame with the vector objects as the columns of the data frame.
If the vectors have unequal lengths, you can use something like this:
Sample data:
column_1 <- c("a","b","c")
column_2 <- c("a","b","c","d")
column_3 <- c(1,2,3,4,5,6)
> data.frame(lapply(mget(ls(pattern = "column_\\d+")), "length<-", max(lengths(mget(ls(pattern = "column_\\d+"))))))
column_1 column_2 column_3
1 a a 1
2 b b 2
3 c c 3
4 <NA> d 4
5 <NA> <NA> 5
6 <NA> <NA> 6
Note: the solution mentioned by @akrun is best if you have equal-length vectors.
data.frame(mget(ls(pattern = "column_\\d+")))
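The two approaches above can be combined in one sketch. Everything here is illustrative: the environment `env` and the object `unrelated` are assumptions that stand in for a real workspace, so the example does not depend on what happens to be in the global environment.

```r
# Hypothetical workspace: a private environment stands in for the global one
env <- new.env()
assign("column_1", c("a", "b", "c"), envir = env)
assign("column_2", c("a", "b", "c", "d"), envir = env)
assign("unrelated", 99, envir = env)

# Collect the names that match the pattern, then fetch the objects themselves
vec_names <- ls(envir = env, pattern = "^column_\\d+$")
vecs <- mget(vec_names, envir = env)

# Pad every vector with NA up to the longest length (a no-op if all are equal)
padded <- lapply(vecs, `length<-`, max(lengths(vecs)))

result <- data.frame(padded, stringsAsFactors = FALSE)
print(result)
```

ls() returns names in sorted order, so with ten or more vectors column_10 would likely come before column_2; reorder `vec_names` first if the column order matters.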
Let's say I have the following list of data frames (in reality I have many more).
ids <- c("12345","67890")  # renamed from `seq` to avoid masking base::seq
li <- list()
for (i in seq_along(ids)) {
  # add one data frame per id, named by that id
  li[[ids[i]]] <- data.frame(A = c(1,2,3),
                             B = c(2,4,6))
}
What I would like to do is calculate the mean of each cell position across the data frames in the list, keeping the same number of rows and columns as the originals. How could I do this? I believe I can use the apply() function, but I am unsure how.
The expected output (not surprising):
A B
1 1 2
2 2 4
3 3 6
In reality, the values within each list are not necessarily the same.
If there are no NAs, then we can Reduce to get the sum of observations for each element and divide by the length of the list
Reduce(`+`, li)/length(li)
# A B
#1 1 2
#2 2 4
#3 3 6
If there are NA values, then it may be better to use mean (which has na.rm argument). For this, we can convert it to array and then use apply
apply(array(unlist(li), dim = c(dim(li[[1]]), length(li))), c(1, 2), mean, na.rm = TRUE)
An equivalent option in tidyverse would be
library(tidyverse)
reduce(li, `+`)/length(li)
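A small sketch of the array approach on data that does contain an NA. The values in the second data frame are made up for illustration, and the `dimnames` argument is an addition so that the column names survive:

```r
# Two data frames with the same shape; the second one has a missing value
li <- list(
  "12345" = data.frame(A = c(1, 2, 3), B = c(2, 4, 6)),
  "67890" = data.frame(A = c(3, NA, 5), B = c(4, 8, 10))
)

# Stack the data frames into a rows x columns x list-length array
arr <- array(unlist(li),
             dim = c(dim(li[[1]]), length(li)),
             dimnames = list(NULL, names(li[[1]]), names(li)))

# Average over the third dimension, ignoring missing values
cell_means <- apply(arr, c(1, 2), mean, na.rm = TRUE)

as.data.frame(cell_means)
```

With na.rm = TRUE, a cell that is NA in one data frame is averaged over the remaining data frames only.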
Edit
This question seems to be a duplicate of the question How to group a vector into a list of vectors?, and the answer split(df$b, df$id) was suggested. I was initially happy with that solution, but then realized that the given answers do not fully address my question. In the question below, I would like to obtain a list in which the vector elements are named by the value of a third column (df$a in my example). This is important, as otherwise the order of df$b matters. Obviously I can arrange by df$a and then call split(), but maybe there is another way of doing that.
My sample df:
df <- data_frame(id = paste0('id',rep(1:2, each = 5)), a = rep(letters[1:5],2),b=c(1:5,5:1))
Df should be grouped by ID (in df$id). I would like to create a list of vectors for each group (id) element that contains the values of df$b. My approach
library(tidyr)
spread_df <- df %>% spread(id, b) # makes a new column for each id
list_group_elements <- list() # initialise the result before the loop
# loop over the columns of spread_df
for (i in seq_along(spread_df)) {
  list_group_elements[i] <- list(spread_df[[i]])
  # I want each vector to be identified by the identifier of column df$a
  # therefore:
  names(list_group_elements[[i]]) <- list_group_elements[[1]]
}
This results in :
list_group_elements
[[1]]
a b c d e
"a" "b" "c" "d" "e"
[[2]]
a b c d e
1 2 3 4 5
[[3]]
a b c d e
5 4 3 2 1
I don't need the first element of the list, but the rest is basically what I need. I have the peculiar impression that my approach is somewhat not ideal and if someone has an idea to improve this, (e.g., with dplyr?) this would be highly appreciated. Why do I want this: I made a function that uses vectors as arguments and I would like to run this function over certain columns from dataframes - but only using the grouped values as arguments and not the entire column.
You may make df$b a named vector using setNames, and then split it into a list:
split(setNames(df$b, df$a), df$id)
# $id1
# a b c d e
# 1 2 3 4 5
#
# $id2
# a b c d e
# 5 4 3 2 1
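To illustrate why the names address the ordering concern, here is a sketch on the same sample data with the rows deliberately shuffled (the shuffle order is arbitrary and only for illustration). Because each value carries its df$a label, it can be looked up by name regardless of row order:

```r
# Sample data from the question, built with base R
df <- data.frame(id = rep(c("id1", "id2"), each = 5),
                 a  = rep(letters[1:5], 2),
                 b  = c(1:5, 5:1),
                 stringsAsFactors = FALSE)

# Arbitrary row order, to show that the result does not rely on sorting
shuffled <- df[c(3, 1, 5, 2, 4, 10, 8, 6, 9, 7), ]

# Name each value by its 'a' label, then group by id
groups <- split(setNames(shuffled$b, shuffled$a), shuffled$id)

groups$id1[["c"]]  # value of b where id == "id1" and a == "c"
groups$id2[["a"]]  # value of b where id == "id2" and a == "a"
```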
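One way is (using unique() rather than levels(), since df$id is a character column here and levels() would return NULL):
lapply(unique(df$id), function(L) df$b[df$id == L])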
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 5 4 3 2 1
Consider by, an object-oriented wrapper for tapply, designed to split a data frame by factor(s):
by(df, df$id, FUN=function(i) i$b)
I have a toy example to explain what I am trying to do:
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
I managed to assign unique ids to the values in column y, and the output now looks like:
aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))
As you can see, "b" was assigned id=1 in column y, "a" was assigned id=2, and so on. These values are also present in column x. For example, the first element of column x is "a", which was assigned id=2 in column y, so I want to assign id=2 to "a" in column x as well.
What I am trying to do next is look up each value of column x and, if it occurs in column y, assign it the same id.
The final data frame should look like:
aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))
Without the need to create aski2 as an intermediate, a possible solution is to use match with lapply to get the numeric representations of the letters:
# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)
# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)
which gives:
> aski
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
If you want the number as characters, you can additionally do:
aski[] <- lapply(aski, as.character)
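As a quick illustration of what match() does here: it returns, for each element of its first argument, the position of that element in the lookup vector, and NA when the element is absent. The values below are copied from the toy example; "z" is a made-up value that is not in the lookup vector.

```r
# Lookup vector: unique values of column y in order of first appearance
v <- c("b", "a", "d", "c")

# Column x from the toy example
x <- c("a", "b", "c", "a", "d", "d")

ids <- match(x, v)
ids               # 2 1 4 2 3 3

match("z", v)     # NA, because "z" does not occur in v
```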
First, convert both columns to character vectors.
Then, collect all unique values from the two columns to use as levels of a factor.
Convert both columns to factors, then numeric.
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)
lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)
aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski
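The factor route gives the same ids as match() against the same set of levels, which the following sketch checks on the toy data (the variable names `via_factor` and `via_match` are just for illustration):

```r
x <- c("a", "b", "c", "a", "d", "d")
y <- c("b", "a", "d", "a", "b", "c")

# Levels in order of first appearance, y first and then x
lev <- unique(c(y, x))

# Integer codes of a factor are positions in its levels
via_factor <- as.integer(factor(x, levels = lev))
via_match  <- match(x, lev)

all(via_factor == via_match)
```

One difference worth keeping in mind: because `lev` also includes values that appear only in column x, every value gets an id, whereas matching against unique(y) alone would give NA for such values.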
A solution from dplyr. We can first create a vector, vec, showing the relationship between index and letter with unique(aski$y). After this step, you can use Jaap's lapply solution, or you can use mutate_all from dplyr as follows.
# Create the vector showing the relationship of index and letter
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"
library(dplyr)
# Modify all columns
aski2 <- aski %>% mutate_all(~ match(., vec))  # formula syntax; funs() is deprecated since dplyr 0.8.0
# View the results
aski2
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
Data
aski <- data.frame(x = c("a","b","c","a","d","d"),
y = c("b","a","d","a","b","c"),
stringsAsFactors = FALSE)
I have a table like:
a
n_msi2010 n_msi2011
1 -0.122876 1.818750
2 1.328930 0.931426
3 -0.111653 4.400060
4 1.222900 4.500450
5 3.604160 6.110930
I would like to merge these two columns into one column to obtain (I don't want to keep column names):
a
n_msi2010
1 -0.122876
2 1.328930
3 -0.111653
4 1.222900
5 3.604160
6 1.818750
7 0.931426
8 4.400060
9 4.500450
10 6.110930
When I am using prefabricated data like
x <- cbind(c(1, 2, 3), c(4, 5, 6))
colnames(x)<-c("a","b")
c(t(x))
# 1 4 2 5 3 6
c((x))
# 1 2 3 4 5 6
the column merging works fine. Only in the "a" example it doesn't work, and it creates 2 separate vectors. I don't really understand why. Any help? Thanks
It seems like your question is about column versus row order vector creation from a data.frame.
Using t() on a data.frame converts the data.frame to a matrix, and using c() on the matrix removes its dimensions.
With that knowledge, you can try:
# create a vector of values, column by column
c(as.matrix(a)) # you are missing the `as.matrix` in your current approach
# create a vector of values, row by row
c(t(a)) # you already know this works
Other approaches to get the "column by column" result would be:
unlist(a, use.names = FALSE)
stack(a)[, "values"] # add `drop = FALSE` if you want to retain a data.frame
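A self-contained sketch using the table from the question. It also shows why c() on the data frame itself returns separate vectors: a data frame is a list of columns, so c(a) stays a list. The column name `value` in the final step is an assumption for illustration.

```r
# The table from the question
a <- data.frame(
  n_msi2010 = c(-0.122876, 1.328930, -0.111653, 1.222900, 3.604160),
  n_msi2011 = c(1.818750, 0.931426, 4.400060, 4.500450, 6.110930)
)

# c() on a data frame keeps the list-of-columns structure
is.list(c(a))

# Column by column: all of n_msi2010, then all of n_msi2011
by_col <- unlist(a, use.names = FALSE)

# Row by row: values of the two columns interleaved
by_row <- c(t(a))

# Single-column data frame with ten rows, as requested in the question
merged <- data.frame(value = by_col)
```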
Not an elegant way, but it can combine two or more columns into one.
n_msi2010 <- 1:5
n_msi2011 <- 6:10
a <- data.frame(n_msi2010, n_msi2011)
out <- c() # avoid naming the result `vector`, which masks the base function
for (i in seq_len(ncol(a))) {
  # append the values of column i to the end of the result
  out <- append(out, as.vector(a[, i]))
}
out
You may then do
as.matrix(out) or data.frame(out)