3-Dimensional Array in R

Suppose I have one data frame (the values are strings):
data1<-data.frame(c("number1","number2"),c("dog,cat","pigeon,leopard"))
and in another data frame I have
data2<-data.frame(c("pigeon","leopard","dog","cat"),
c("5 6 7 8","10 11 12 13","1 2 3 4","5 6 7 8"))
data2:
pigeon 5 6 7 8
leopard 10 11 12 13
dog 1 2 3 4
cat 5 6 7 8
My expected output is a 3-D array indexed as follows:
i=number1/number2
j=the strings corresponding to i
k=the values from the 2nd data frame.
That is, if I select number1, I will have:
dog 1 2 3 4
cat 5 6 7 8

It seems that you just want an extra column in data2 with "number1" and "number2" in the correct places, rather than a true 3-D array.
data2 <- data.frame(j = c("pigeon","leopard","dog","cat"),
k = c("5 6 7 8","10 11 12 13","1 2 3 4","5 6 7 8"),
i = c("number2", "number2", "number1", "number1"))
Then you can choose everything for "number1" using
data2[data2$i == "number1", ]
If you don't want the i column in the result, you can do:
data2[data2$i == "number1", ][c("j", "k")]
## j k
## 3 dog 1 2 3 4
## 4 cat 5 6 7 8

I'm not sure I understand your question, but if you want to use the numbers from data1 to select rows in data2, you could do
lapply(seq_along(data1[, 1]), function(i) data2[data2[, 1] %in% strsplit(as.character(data1[i, 2]), ",")[[1]],])
which results in a list of data frames:
# [[1]]
# c..pigeon....leopard....dog....cat.. c..5.6.7.8....10.11.12.13....1.2.3.4....5.6.7.8..
# 3 dog 1 2 3 4
# 4 cat 5 6 7 8
#
# [[2]]
# c..pigeon....leopard....dog....cat.. c..5.6.7.8....10.11.12.13....1.2.3.4....5.6.7.8..
# 1 pigeon 5 6 7 8
# 2 leopard 10 11 12 13
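Both answers can be combined: instead of typing the i column by hand, it can be derived from data1 with strsplit() and then attached to data2 with merge(). A minimal sketch, assuming the column names i, j and k (they are not in the question's data frames) and character columns rather than factors:

```r
# Column names i, j and k are assumptions chosen to match the first answer
data1 <- data.frame(i = c("number1", "number2"),
                    j = c("dog,cat", "pigeon,leopard"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(j = c("pigeon", "leopard", "dog", "cat"),
                    k = c("5 6 7 8", "10 11 12 13", "1 2 3 4", "5 6 7 8"),
                    stringsAsFactors = FALSE)

# One row per (i, animal) pair, built by splitting the comma-separated strings
parts <- strsplit(data1$j, ",")
lookup <- data.frame(i = rep(data1$i, lengths(parts)),
                     j = unlist(parts),
                     stringsAsFactors = FALSE)

# Attach the values from data2; merge() matches rows on the shared column j
combined <- merge(lookup, data2, by = "j")
combined[combined$i == "number1", c("j", "k")]
```

Note that merge() sorts the result by the key column by default, so the row order may differ from data2.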

Related

I would like to extract the columns of each element of a list in R

I'd like to extract the 3rd column (c) of each element in this list and store the result.
(I've put the data frames in a list in this example so that it resembles the long list of data frames I have):
set.seed(59)
df<- data.frame(a=c(1,4,5,2),b=c(9,2,7,4),c=c(5,2,9,4))
df1<- data.frame(df,2*df)
df1<- list(df,2*df)
[[1]]
a b c
1 1 9 5
2 4 2 2
3 5 7 9
4 2 4 4
[[2]]
a b c
1 2 18 10
2 8 4 4
3 10 14 18
4 4 8 8
This seems fairly simple for just one element:
> df1[[1]]["c"]
c
1 5
2 2
3 9
4 4
> df1["c"] # cries again
[[1]]
NULL
All I want to see is:
[[1]]
c
1 5
2 2
3 9
4 4
[[2]]
c
1 10
2 4
3 18
4 8
Thanks in advance
Use lapply:
data <- lapply(df1, function(x) x[, 'c', drop = FALSE])
data
#[[1]]
# c
#1 5
#2 2
#3 9
#4 4
#[[2]]
# c
#1 10
#2 4
#3 18
#4 8
When you select a single column of a data frame with x[, 'c'], R drops the result to the lowest possible dimension, which in this case is a vector. drop = FALSE is needed to keep it as a data frame.
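A small sketch of the dropping behaviour, using the question's df and df1. It also shows an alternative that is not part of the original answer: list-style indexing with a single argument, x["c"], always returns a data frame, so `[` can be passed directly to lapply():

```r
df <- data.frame(a = c(1, 4, 5, 2), b = c(9, 2, 7, 4), c = c(5, 2, 9, 4))
df1 <- list(df, 2 * df)

class(df1[[1]][, "c"])                 # "numeric": dropped to a vector
class(df1[[1]][, "c", drop = FALSE])   # "data.frame"

# x["c"] (one index, list-style) keeps the data frame, so `[` works directly
cols <- lapply(df1, `[`, "c")
cols[[2]]
```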

How to find closest match from list in R

I have a vector of numbers and would like to find, for each number in a data frame, the next highest value (or an equal one) from that vector. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
I would like to add a variable Y to df holding the next highest number from the list, i.e.:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives a warning (the two vectors have different lengths), and which.min() would only return the position of the closest value in the list, not the next value above.
Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
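findInterval() returns, for each value, the index of the interval it falls into, so adding 1 selects the break at the top of that interval. The sketch below renames the vector to breaks (a hypothetical name, to avoid masking base::list) and shows the edge cases; the breaks must be sorted in non-decreasing order, and values above the last break give NA:

```r
breaks <- c(3, 6, 9, 12)           # the vector called `list` in the question
x <- c(1, 3, 4, 12, 13)

# left.open = TRUE uses intervals (3, 6], (6, 9], ..., so a value equal to a
# break is counted in the interval that ends at that break
pos <- findInterval(x, breaks, left.open = TRUE)
pos                                # 0 0 1 3 4
breaks[pos + 1]                    # 3 3 6 12 NA  (13 is above the last break)
```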
You could do this:
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
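With the sapply() version, a value greater than every element of the list makes min() receive an empty vector, so it returns Inf with a warning. A sketch of a guarded variant that returns NA instead; the helper name next_at_least is hypothetical:

```r
breaks <- c(3, 6, 9, 12)   # the question's `list`, renamed

# Smallest element of v that is >= x, or NA if there is none
next_at_least <- function(x, v) {
  candidates <- v[v >= x]
  if (length(candidates) == 0) NA_real_ else min(candidates)
}

res <- vapply(c(1, 6, 10, 13), next_at_least, numeric(1), v = breaks)
res   # 3 6 12 NA
```

vapply() is used instead of sapply() so that the result is guaranteed to be a numeric vector of the same length as the input.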

repeat sequences from vector

Say I have a vector like so:
vector <- 1:9
#$ [1] 1 2 3 4 5 6 7 8 9
I now want to repeat each consecutive block of x elements (i to i+x-1) n times, e.g. for x=3 and n=2:
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
I'm accomplishing this like so:
index <- NULL
x <- 3
n <- 2
for (i in 1:(length(vector)/x)) {
index <- c(index, rep(c(1:x + (i-1)*x), n))
}
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
This works just fine, but I have a hunch there's got to be a better way (especially since usually, a for loop is not the answer).
P.S.: The use case for this is actually repeating rows in a data frame, but just getting the index vector would be fine.
You can try to first split the vector, then use rep and unlist:
x <- 3 # this is the length of each subset sequence from i to i+x (see above)
n <- 2 # this is how many times you want to repeat each subset sequence
unlist(lapply(split(vector, rep(1:(length(vector)/x), each = x)), rep, n), use.names = FALSE)
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
Or, you can try creating a matrix with one block per column and converting it back to a vector:
c(do.call(rbind, replicate(n, matrix(vector, nrow = x), FALSE)))
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
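For the actual use case (repeating rows of a data frame), the index vector can also be built without a loop or split(): column b of outer(1:x, starts, "+") holds the row numbers of block b, and repeating each column n times gives the order. A sketch, assuming the number of rows is a multiple of x; the example data frame is made up for illustration:

```r
x <- 3   # block length
n <- 2   # repetitions per block
df <- data.frame(id = 1:9, letter = letters[1:9])

blocks <- nrow(df) / x                        # assumes nrow(df) is a multiple of x
starts <- (seq_len(blocks) - 1) * x           # 0, 3, 6
block_idx <- outer(seq_len(x), starts, "+")   # column b = row numbers of block b
idx <- c(block_idx[, rep(seq_len(blocks), each = n)])
idx                                           # 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9

repeated <- df[idx, ]
```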

subset dataframe based on conditions in vector

I have two data frames
#df1
type <- c("A", "B", "C")
day_start <- c(5,8,4)
day_end <- c(12,10,11)
df1 <- cbind.data.frame(type, day_start, day_end)
df1
type day_start day_end
1 A 5 12
2 B 8 10
3 C 4 11
#df2
value <- 1:10
day <- 4:13
df2 <- cbind.data.frame(day, value)
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
10 13 10
I would like to subset df2 such that each level of the factor "type" in df1 gets its own data frame, containing only the rows (days) between that level's day_start and day_end.
The desired outcome for "A" would be:
list_of_dataframes$df_A
day value
1 5 2
2 6 3
3 7 4
4 8 5
5 9 6
6 10 7
7 11 8
8 12 9
I found a question on SO whose answer suggests using mapply(), but I cannot figure out how to adapt that code to my data and desired outcome. Can someone help me out?
The following solution assumes that you have all integer values for days, but if that assumption is plausible, it's an easy one-liner:
> apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],])
[[1]]
day value
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
[[2]]
day value
5 8 5
6 9 6
7 10 7
[[3]]
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
You can use setNames to name the data frames in the list:
setNames(apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],]),df1[,1])
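apply() converts df1 to a matrix first, and because type is not numeric that matrix is character, so x[2]:x[3] only works because `:` coerces its arguments back to numbers. A sketch that avoids the conversion by looping over row numbers and comparing numerically (so non-integer days would also work), naming the elements as in the question's list_of_dataframes$df_A:

```r
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11),
                  stringsAsFactors = FALSE)
df2 <- data.frame(day = 4:13, value = 1:10)

list_of_dataframes <- lapply(seq_len(nrow(df1)), function(r) {
  # Numeric comparisons on the original columns, no character conversion
  df2[df2$day >= df1$day_start[r] & df2$day <= df1$day_end[r], ]
})
names(list_of_dataframes) <- paste0("df_", df1$type)
list_of_dataframes$df_A
```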
Yes, you can use mapply:
Define a function that will do what you want:
fun <- function(x,y) df2[df2$day >= x & df2$day <= y,]
Then use mapply to apply this function with every element of day_start and day_end:
final.output <- mapply(fun,df1$day_start, df1$day_end, SIMPLIFY=FALSE)
This will give you a list with the outputs you want:
final.output
[[1]]
day value
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
9 12 9
[[2]]
day value
5 8 5
6 9 6
7 10 7
[[3]]
day value
1 4 1
2 5 2
3 6 3
4 7 4
5 8 5
6 9 6
7 10 7
8 11 8
You can name each data.frame of the list with setNames:
final.output <- setNames(final.output,df1$type)
Alternatively, you can attach a "type" attribute to each data.frame of the list:
fun <- function(x,y, type){
df <- df2[df2$day >= x & df2$day <= y,]
attr(df, "type") <- as.character(type)
df
}
Then each data.frame of final.output will have an attribute so you know which type it is:
final.output <- mapply(fun,df1$day_start, df1$day_end,df1$type, SIMPLIFY=FALSE)
# check which type the first data.frame is
attr(final.output[[1]], "type")
[1] "A"
Finally, if you do not want a list with the 3 data.frames you can create a function that assigns the 3 data.frames to the global environment:
fun <- function(x,y, type){
df <- df2[df2$day >= x & df2$day <= y,]
name <- as.character(type)
assign(name, df, pos=.GlobalEnv)
}
mapply(fun,df1$day_start, df1$day_end, type=df1$type, SIMPLIFY=FALSE)
This will create 3 separate data.frames in the global environment named A, B and C.

Convert categorical data in data frame to weighted adjacency matrix

I have the following data frame, call it DF, which consists of three columns: "Chunk", "Name", and "Frequency". I need to turn it into a NameXName adjacency matrix where names are considered adjacent when they occur in the same chunk. For example, in the first lines, Gretel and Friedrich are adjacent because they are both in Chunk 2. The weight of the relationship should be based on "Frequency", precisely the number of times they are co-present in the same chunk, so for the Gretel/Friedrich example it is Frequency(Gretel) + Frequency(Friedrich) - 1 = 2 + 4 - 1 = 5.
Chunk Name Frequency
1 2 Gretel 2
2 2 Pollock 1
3 2 Adorno 1
4 2 Friedrich 4
5 3 Max 1
6 3 Horkheimer 1
7 3 Adorno 1
8 4 Friedrich 5
9 4 Pollock 1
10 4 March 1
11 5 Comte 3
12 7 Jaspers 1
13 7 Huxley 2
14 8 Nietzsche 1
15 8 Sade 2
16 8 Felix 1
17 8 Weil 1
18 8 Western 1
19 8 Lowenthal 1
20 8 Kant 1
21 8 Hitler 1
I started by splitting the data frame according to DF$Chunk:
> DF.split<-split(DF, DF$Chunk)
$`2`
Chunk Name Frequency
1 2 Gretel 2
2 2 Pollock 1
3 2 Adorno 1
4 2 Friedrich 4
$`3`
Chunk Name Frequency
5 3 Max 1
6 3 Horkheimer 1
7 3 Adorno 1
$`4`
Chunk Name Frequency
8 4 Friedrich 5
9 4 Pollock 1
10 4 March 1
which I thought got closer, but it returns list items that I am having trouble turning back into workable data frames.
I have also tried to start by turning this into a ChunkXName adjacency matrix:
> chunkbyname<-tapply(DF$Frequency , list(DF$Name,DF$Chunk) , as.character )
with the hope of multiplying chunkbyname by its transpose to get the NameXName matrix, but it seems the matrix is too sparse or complex (Error in a %*% b : requires numeric/complex matrix/vector arguments).
Any help getting this data frame into an adjacency matrix greatly appreciated.
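The %*% error does not come from sparsity: FUN = as.character makes tapply() return a character matrix, which %*% rejects. A sketch, using the question's rows for chunks 2 to 4, that builds a numeric Chunk x Name incidence table with table() and multiplies it by its transpose via crossprod(). Note that this counts the chunks each pair of names shares, which is a different weighting from the Frequency-based rule in the question:

```r
# The question's rows for chunks 2 to 4
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich",
           "Max", "Horkheimer", "Adorno",
           "Friedrich", "Pollock", "March"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1),
  stringsAsFactors = FALSE
)

# The original attempt: as.character gives a character matrix
chunkbyname <- tapply(DF$Frequency, list(DF$Name, DF$Chunk), as.character)
is.character(chunkbyname)        # TRUE, hence the %*% error

# Numeric Chunk x Name incidence table (how often each name occurs per chunk)
inc <- table(DF$Chunk, DF$Name)
# t(inc) %*% inc: number of chunks in which each pair of names co-occurs
shared <- crossprod(inc)
shared["Friedrich", "Pollock"]   # 2 (chunks 2 and 4)
```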
Is this what you are looking for?
df3 <- by(DF, DF$Chunk, function(x){
mm <- outer(x$Frequency, x$Frequency, "+") - 1
rownames(mm) <- x$Name
colnames(mm) <- x$Name
mm
})
df3
# $`2`
# Gretel Pollock Adorno Friedrich
# Gretel 3 2 2 5
# Pollock 2 1 1 4
# Adorno 2 1 1 4
# Friedrich 5 4 4 7
#
# $`3`
# Max Horkheimer Adorno
# Max 1 1 1
# Horkheimer 1 1 1
# Adorno 1 1 1
#
# $`4`
# Friedrich Pollock March
# Friedrich 9 5 5
# Pollock 5 1 1
# March 5 1 1
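The answer above returns one matrix per chunk. If a single NameXName matrix is wanted, one option is to add the per-chunk weights into a matrix indexed by all names. A sketch, assuming that weights for the same pair are summed across chunks, that the diagonal should be 0, and that a name appears at most once per chunk (true for the sample data):

```r
# The question's rows for chunks 2 to 4
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich",
           "Max", "Horkheimer", "Adorno",
           "Friedrich", "Pollock", "March"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1),
  stringsAsFactors = FALSE
)

all_names <- sort(unique(DF$Name))
adj <- matrix(0, nrow = length(all_names), ncol = length(all_names),
              dimnames = list(all_names, all_names))

for (chunk in split(DF, DF$Chunk)) {
  # Pairwise weight within this chunk: Frequency(a) + Frequency(b) - 1
  w <- outer(chunk$Frequency, chunk$Frequency, "+") - 1
  # Add into the global matrix, indexing rows and columns by name
  adj[chunk$Name, chunk$Name] <- adj[chunk$Name, chunk$Name] + w
}
diag(adj) <- 0  # self-pairs are not meaningful here

adj["Gretel", "Friedrich"]   # 5 (chunk 2 only)
adj["Pollock", "Friedrich"]  # 9 = (1 + 4 - 1) + (1 + 5 - 1)
```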
