How to store a vector in a data frame in R

I am trying to create a dataframe through a for loop and trying to add a vector for each row of the data frame.
The rows are people and the columns are the name category and points category.
For example I'm trying to have something like...
Name Points
Susie c(12,45,23)
Bill c(13,24,12,89)
CJ c(12)
So far my code looks like
names_list <-c("Susie","Bill","CJ")
result = data.frame()
for (name in names_list){
listing = .....
frame = data.frame(name,listing)
names(frame) = c("name","list")
result <- rbind(result,frame)
}
Where listing happens to be the points associated with that name. However instead of creating 1 row for each name containing all their points, it creates multiple rows with the same name for each point.
Result looks like
1 Susie 12
2 Susie 45
3 Susie 23
4 Bill 13
5 Bill 24
6 Bill 12
7 Bill 89
8 CJ 12

The specific problem you've encountered is due to data.frame flattening any list inputs. This can be prevented by wrapping the list in the function I(), which tells data.frame to keep the object "as is". For example,
data.frame(a = 1, b = list(c("a", "b")))
doesn't do what you want, but
data.frame(a = 1, b = I(list(c("a", "b"))))
does. This behavior and some alternatives are discussed at http://r4ds.had.co.nz/many-models.html#list-columns-1
You can use I to produce the desired result using your example as well:
names_list <-c("Susie","Bill","CJ")
points <- list(c(12,45,23),
c(13,24,12,89),
12)
result = data.frame()
for (i in seq_along(names_list)) {
  frame = data.frame(names_list[[i]], I(points[i]))
  names(frame) = c("name", "list")
  result <- rbind(result, frame)
}
though as pointed out in the comments, there are better ways to do it. All you really need is
data.frame(
name = names_list,
points = I(points))
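To check that the column really holds one vector per row, a minimal sketch along these lines may help. It reuses the names_list and points objects from above and only base R functions (lengths() and [[):

```r
names_list <- c("Susie", "Bill", "CJ")
points <- list(c(12, 45, 23), c(13, 24, 12, 89), 12)

# I() keeps the list intact, so each row holds one whole vector
result <- data.frame(name = names_list, points = I(points))

nrow(result)            # one row per person
lengths(result$points)  # number of points stored in each row
result$points[[2]]      # Bill's points as an ordinary numeric vector
```

Each element of the column is extracted with [[, so it can be passed to functions such as sum() or mean() like any other vector.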

I don't know what structure your vector values are in, but in general, to nest vectors in a column you can do something like this:
names <- c("A", "B", "C")
vectors <- list(list(1,2,3), list(4,5,6), list(7,8,9))
as.data.frame(cbind(names, vectors))
names vectors
1 A 1, 2, 3
2 B 4, 5, 6
3 C 7, 8, 9

Name <- c("Suzie", "Bill", "CJ")
Points <- list(c( 12,45,23),
c(13,24,12,89),
c( 12)
)
result <- as.data.frame(cbind(Name, Points))
Then:
print(result)
Gives:
> result
Name Points
1 Suzie 12, 45, 23
2 Bill 13, 24, 12, 89
3 CJ 12
Note:
print(result$Points)
Gives:
> result$Points
[[1]]
[1] 12 45 23
[[2]]
[1] 13 24 12 89
[[3]]
[1] 12

Related

fast way to apply an if loop within a for loop

This is my df:
a <- data.frame(x1 = 1:3, x2 = 0, GF = c("Pelagic", "Demersal", "Cephalopod"), Pelagic = 6, Demersal = 7, Cephalopod = 8)
I have a list like this:
GF_list <- c("Pelagic", "Demersal", "Cephalopod")
For each row, I want to assign to the x2 column the value from the column named by that row's GF. So I do this:
for (i in 1:nrow(a)) {
  for (j in 1:length(GF_list)) {
    if (a$GF[i] == GF_list[j]) {
      a$x2[i] <- a[i, ncol(a) - length(GF_list) + j]
    }
  }
}
But it takes a very long time (I have a large data frame).
Is there a faster way to do this assignment? I am thinking of something that eliminates the outer loop, "for (i in 1 : nrow(a))".
Thank you
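So you want to select a different column from each row? You can do complicated extractions from a data.frame by indexing with a numeric matrix. Here's how it might work:
a$x2 <- as.numeric(a[cbind(seq_len(nrow(a)), match(a$GF, names(a)))])
The index matrix has one column of row numbers and one of column numbers. We use match() on a$GF to find the right column for each row. Because a also contains a non-numeric column, the extraction returns character values, hence the as.numeric().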
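One way is to use apply row-wise and, for each row, select the value of the column named in that row's GF entry. Note that apply first converts the data frame to a character matrix, so x2 comes back as character.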
a$x2 <- apply(a, 1, function(x) x[x[["GF"]]])
a
# x1 x2 GF Pelagic Demersal Cephalopod
#1 1 6 Pelagic 6 7 8
#2 2 7 Demersal 6 7 8
#3 3 8 Cephalopod 6 7 8
Here is a solution which gives a numeric result:
i <- seq_len(nrow(a))
j <- match(a$GF, GF_list)  # position of each row's GF within GF_list
as.matrix(a[,(ncol(a)-length(GF_list)+1):ncol(a)])[cbind(i,j)]
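A small sketch of the matrix-indexing idea on data where the rows are not in the same order as GF_list, which is where the lookup has to be done per row. The four-row data frame and the name vals are made up for illustration; the columns are selected by name so the result does not depend on their position:

```r
# Illustrative data: GF values are shuffled and one is repeated
a <- data.frame(x1 = 1:4, x2 = 0,
                GF = c("Demersal", "Pelagic", "Cephalopod", "Pelagic"),
                Pelagic = 6, Demersal = 7, Cephalopod = 8)
GF_list <- c("Pelagic", "Demersal", "Cephalopod")

# Numeric matrix holding only the value columns, in GF_list order
vals <- as.matrix(a[GF_list])

# One (row, column) pair per row: column = position of that row's GF in GF_list
a$x2 <- vals[cbind(seq_len(nrow(a)), match(a$GF, GF_list))]
a$x2
```

Because vals contains only numeric columns, x2 stays numeric and no loop over rows is needed.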

Assign a vector dynamically in R

I have multiple vectors Di, where i = 1, 2,..., 40. Now in a for-loop, I want to do some operations on these. The following pseudo-code summarizes my objective.
for i in 1:40
D = Di # How to do this?
# ... do some operations on D #
Edit: Please note that each Di is a separate vector.
Put them all in a list; each list element (a vector) can then be accessed using index notation.
MyVectors = list(D1 = c(1:10),
D2 = c(11:20))
> MyVectors[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
> MyVectors[[2]]
[1] 11 12 13 14 15 16 17 18 19 20
So you can loop over them like this:
for(i in 1:2){
MyVectors[[i]] = MyVectors[[i]] + 2
}
Funnily, I just answered a similar question about 45 minutes ago. I stand by the philosophy I described in that answer with respect to this question. But because you have 40 loose objects, instead of just 2, the "separateness" approach really doesn't make sense. You should use the "systematicness" approach, as follows:
Ds <- list(
c(...), ## 1st vector
c(...), ## 2nd vector
...
c(...) ## 40th vector
);
for (i in seq_along(Ds)) {
## do some operations on Ds[[i]]
};
Funnily, I answered another question an hour ago. In the same way, we can place the vectors in a list and then do the operation within each list element:
MyVectors = list(D1 = c(1:10),
D2 = c(11:20))
lapply(MyVectors, function(x) x +2)
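If the vectors already exist as separate objects, one possible way to collect them into a list without retyping them is base R's mget(). The sketch below assumes the objects are literally named D1, D2, ... in the current environment; the three short vectors are hypothetical stand-ins for the real 40:

```r
# Hypothetical stand-ins for the loose vectors D1, D2, ..., D40
D1 <- 1:10
D2 <- 11:20
D3 <- 21:30

# Build the names "D1", "D2", "D3" and fetch the objects into a named list
Ds <- mget(paste0("D", 1:3))

# Operate on each vector through the list
for (i in seq_along(Ds)) {
  Ds[[i]] <- Ds[[i]] + 2
}
```

After this, Ds[["D2"]] or Ds[[2]] gives the modified second vector; the original D2 object is left unchanged.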

How to construct and add to a data frame with named columns?

I cannot figure out how to do this without throwing errors. I have a set of column names for my data frame I want to create and add to that looks like this:
x <- c("A", "B", "C")
So, I go down through the loop and I calculate some numerical values in a vector, say:
z <- c(1, 5, 7, 8, 34, 5)
z is the same dimension each time through the loop.
The first time through (or even outside the loop) I want to initialize a data frame by doing something like:
df$x[1] <- z
so I have a data frame that looks like:
A
1 1
2 5
3 7
4 8
5 34
6 5
The next time through the loop I want to add another column to df with a column heading being the second element of x, and a set of new z values. If the data frame has to be completely dimensioned ahead of time, I could calculate variables outside the loop to do this, say, M and N, but these may change from one run to the next.
I cannot seem to figure out how to do this. Suggestions much appreciated.
Try this:
set.seed(1)
#set the column names
x <- c("A", "B", "C")
#create the list that later we will convert to a data.frame
df<-setNames(vector("list",length(x)),x)
#loop to produce the various z
for (i in 1:length(x)) {
#do some stuff to evaluate z
z<-sample(5)
#assign to an element of df
df[[i]]<-z
}
#coerce to a data.frame
df<-as.data.frame(df)
# A B C
#1 2 5 2
#2 5 4 1
#3 4 2 3
#4 3 3 4
#5 1 1 5

Generate matrix of combinations with rules, repeated binary choice

I am trying to do sampling of variables for a statistical analysis. I have 10 variables, and I want to examine every possible combination of 5 of them. However, I only want those that follow certain rules. I only want those with 1 xor 2, 3 xor 4, 5 xor 6, 7 xor 8 and 9 xor 10. In other words, all combinations given 5 binary choices (32).
Any idea how to do this efficiently?
A simple idea is to find all combinations of 5 out of 10 using:
library(gtools)
sets = combinations(10,5) # choose 5 out of 10, all possibilities
sets = split(sets, seq.int(nrow(sets))) #so it's loopable
And then loop over these keeping only the ones that meet the criteria and thus ending up with the 32 ones desired.
But surely there is a more efficient way than this.
This will construct a matrix whose 32 rows enumerate all the possible combinations satisfying your constraint:
m <- as.matrix(expand.grid(1:2, 3:4, 5:6, 7:8, 9:10))
## Inspect a few of the rows to see that this works:
m[c(1,4,9,16,25),]
# Var1 Var2 Var3 Var4 Var5
# [1,] 1 3 5 7 9
# [2,] 2 4 5 7 9
# [3,] 1 3 5 8 9
# [4,] 2 4 6 8 9
# [5,] 1 3 5 8 10
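As a quick sanity check of the expand.grid result, the sketch below verifies that there are 32 distinct rows and that each row takes exactly one variable from each pair. The helper name pair_id is made up for illustration:

```r
m <- as.matrix(expand.grid(1:2, 3:4, 5:6, 7:8, 9:10))

# Map each variable to its pair: 1,2 -> 1; 3,4 -> 2; ...; 9,10 -> 5
pair_id <- ceiling(m / 2)

n_rows       <- nrow(unique(m))           # number of distinct combinations
one_per_pair <- all(t(pair_id) == 1:5)    # every row covers pairs 1..5 in order

n_rows
one_per_pair
```

Transposing pair_id puts each combination in a column, so the comparison against 1:5 recycles once per combination.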
I found a solution too, but it's not quite as elegant as Josh O'Brien's above.
library(R.utils) #for intToBin()
binaries = intToBin(0:31) #binary numbers 0 to 31
sets = list() #empty list
for (set in binaries) { #loop over each binary number string
  vars = numeric() #empty vector
  for (cif in 1:5) { #loop over each char in the string
    if (substr(set,cif,cif)=="0") { #if its 0
      vars = c(vars,cif*2-1) #add the first var
    } else {
      vars = c(vars,cif*2) #else, add the second var
    }
  }
  sets[[set]] = as.vector(vars) #add result to list
}
Based on the idea in your answer, an alternative for the record:
n = 5
sets = matrix(1:10, ncol = 2, byrow = TRUE)
#the "on-off" combinations for each position
combs = lapply(0:(2^n - 1), function(x) as.integer(intToBits(x)[seq_len(n)]))
#a way to get the actual values
matrix(sets[cbind(seq_len(n), unlist(combs) + 1L)], ncol = n, byrow = TRUE)

Conditional calculations in R

I have a dataframe with categories and values. Based on the category I want to subtract values that are stored in another table.
myframe <- data.frame(
x = factor(c("A", "D", "A", "C")),
y = c(8, 3, 9, 9))
reference <- c('A'= 1, 'B'= 2, 'C'= 3, 'D'= 4)
The desired (y-ref) outcome would be:
result <- data.frame(
x = factor(c("A", "D", "A", "C")),
y = c(8, 3, 9, 9),
r = c(7, -1, 8, 6))
x y r
1 A 8 7
2 D 3 -1
3 A 9 8
4 C 9 6
The reference 'table' is a named vector in this case but it could be changed to a better suited data format.
I am not sure how to accomplish this...
This is a fairly straightforward task using match and [:
myframe$r <- myframe$y - reference[ match( myframe$x , names( reference ) ) ]
# x y r
#1 A 8 7
#2 D 3 -1
#3 A 9 8
#4 C 9 6
Pretty sure this is a (several-times over) duplicate so we should find you a good pointer and close the question (but I commend you for showing input data and desired result, many questions are often not that well laid out).
EDIT
Well there are many, many match based questions on the site. It's hard to pick one to point to as an exact duplicate. But I suggest having a browse of a few of these by searching for "r match" (you can search by specific tags by enclosing the search term in square brackets like this "[r]").
The data.table way:
library(data.table)
# convert to data.table and set key for the upcoming merge
dt = data.table(myframe, key = 'x')
ref = data.table(x = names(reference), val = reference)
# merge and add a new column
dt[ref, r := y - val]
dt
# x y r
#1: A 8 7
#2: A 9 8
#3: C 9 6
#4: D 3 -1
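As a variation on the match-based lookup, a named vector can also be indexed directly by name. The sketch below assumes the same myframe and reference as in the question; the as.character() call matters because indexing with a factor would use its integer codes rather than its labels:

```r
myframe <- data.frame(
  x = factor(c("A", "D", "A", "C")),
  y = c(8, 3, 9, 9))
reference <- c('A' = 1, 'B' = 2, 'C' = 3, 'D' = 4)

# Look up each category by name, drop the names, then subtract
ref_vals <- unname(reference[as.character(myframe$x)])
myframe$r <- myframe$y - ref_vals
myframe
```

A category that is missing from reference would produce NA in r, which makes unmatched rows easy to spot.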
