I have a list l, which has the following features:
It has 3 elements
Each element is a numeric vector of length 5
Each vector contains numbers from 1 to 5
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
I want to do two things:
First
I want to know how many times each number occurs in the entire list, and I want each result in a vector (or any form that allows me to perform computations with the results later):
Code 1:
> a <- table(sapply(l, "["))
> x <- as.data.frame(a)
> x
Var1 Freq
1 1 3
2 2 3
3 3 4
4 4 2
5 5 3
Is there any way to do it without using the table() function? I would like to do it "manually". My attempt is right below.
Code 2: (I know this is not very efficient!)
x <- data.frame(
  "1" = sum(sapply(l, "[") == 1),
  "2" = sum(sapply(l, "[") == 2),
  "3" = sum(sapply(l, "[") == 3),
  "4" = sum(sapply(l, "[") == 4),
  "5" = sum(sapply(l, "[") == 5),
  check.names = FALSE)
I also tried the following, but it did not work, and I don't understand the result:
> sapply(l, "[") == 1:5
a b c
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
> sum(sapply(l, "[") == 1:5)
[1] 2
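The result comes from R's recycling rule: sapply(l, "[") returns a 5 x 3 matrix, and == 1:5 compares it element by element with 1:5 recycled down each column, so row i is only ever compared with the number i. The base-R sketch below makes that visible and then shows one way (via outer(), chosen here for illustration) to compare every value against every number instead:

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
m <- sapply(l, "[")            # 5 x 3 matrix, one column per list element

# `m == 1:5` recycles 1:5 down each column: position i is compared only with i
recycled <- m == matrix(1:5, nrow = 5, ncol = 3)
stopifnot(identical(m == 1:5, recycled))

# To compare every value with every number, build the full comparison instead:
# hits[k, v] is TRUE when the k-th value of the matrix equals v
hits <- outer(as.vector(m), 1:5, "==")
counts <- colSums(hits)        # occurrences of 1, 2, ..., 5 in the whole list
names(counts) <- 1:5
counts
```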
Second
Now I would like to count how many times each number appears within each element ($a, $b and $c) separately. I thought about using lapply(), but I don't know exactly how. Here is what I tried; it is inefficient, just like Code 2:
lapply(l, function(x) sum(x == 1))
lapply(l, function(x) sum(x == 2))
lapply(l, function(x) sum(x == 3))
lapply(l, function(x) sum(x == 4))
lapply(l, function(x) sum(x == 5))
These 5 lines of code give me 5 lists of 3 elements, each containing a single numeric value. For example, the second line tells me how many times the number 2 appears in each element of l.
Code 3:
> lapply(l, function(x) sum(x == 2))
$a
[1] 1
$b
[1] 1
$c
[1] 1
What I would like to obtain is a list with three elements containing all the information I am looking for.
Please, use the references "Code 1", "Code 2" and "Code 3" in your answers. Thank you very much.
Just use table(unlist(l)) for the first part and data.frame(lapply(l, tabulate)) for the second.
> table(unlist(l))
1 2 3 4 5
3 3 4 2 3
> data.frame(lapply(l, tabulate))
a b c
1 2 0 1
2 1 1 1
3 1 2 1
4 0 1 1
5 1 1 1
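One caveat with tabulate(): by default it uses nbins = max(x), so an element that never contains a 5 would produce a shorter vector, which data.frame() cannot combine cleanly with the others. A minimal sketch, assuming the values are always positive integers from 1 to 5, that fixes the number of bins explicitly and also gives the whole-list counts without table():

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))

# Per-element counts (Code 3): fix nbins so every column has 5 rows,
# even if some element happens not to contain the largest value
per_element <- data.frame(lapply(l, tabulate, nbins = 5))

# Whole-list counts (Code 1/2) without table(): tabulate the flattened list
overall <- tabulate(unlist(l), nbins = 5)
names(overall) <- 1:5

per_element
overall
```

Note that tabulate() only counts positive integers; for arbitrary values, table() or the sapply() approach below is more general.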
For code 1/2, you could use sapply to obtain the counts for whichever values you wanted:
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
data.frame(number = 1:5,
freq = sapply(1:5, function(x) sum(unlist(l) == x)))
# number freq
# 1 1 3
# 2 2 3
# 3 3 4
# 4 4 2
# 5 5 3
For code 3, to get the counts for each of a, b and c, you can apply the same frequency function to each element of the list with lapply():
freqs = lapply(l, function(y) sapply(1:5, function(x) sum(unlist(y) == x)))
data.frame(number = 1:5, a=freqs$a, b=freqs$b, c=freqs$c)
# number a b c
# 1 1 2 0 1
# 2 2 1 1 1
# 3 3 1 2 1
# 4 4 0 1 1
# 5 5 1 1 1
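If a matrix is acceptable instead of a data frame, sapply() can do the per-element counts in a single call, because each element yields a vector of the same length (5). A base-R sketch; the row names 1 to 5 are added only for readability:

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
values <- 1:5

# For each list element y, count how often each value occurs; sapply()
# binds the five counts of every element into one column of a matrix
counts <- sapply(l, function(y) sapply(values, function(v) sum(y == v)))
rownames(counts) <- values
counts

# The matrix can be used directly in later computations, e.g. totals per value
rowSums(counts)
```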
Here is another example with nested lapply(). First, create the data:
list = NULL
list[[1]] = c(1:5)
list[[2]] = c(1:5)+4
list[[3]] = c(1:5)-1
list[[4]] = c(1:5)*3
list2 = NULL
list2[[1]] = rep(1,5)
list2[[2]] = rep(2,5)
list2[[3]] = rep(0,5)
The following call subtracts each element of list from every element of list2 (inside the inner function, a comes from list2 and b from list), returning a list of lists:
lapply(list, function(d){ lapply(list2, function(a,b) {a-b}, b=d)})
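To see the shape of what the nested call returns, here is a self-contained sketch with the same final data; xs and ys are hypothetical renamings so that base::list() is not masked. The outer lapply() walks over xs and the inner one over ys, so res[[i]][[j]] holds ys[[j]] - xs[[i]]:

```r
# Same data as above, renamed (xs, ys) so base::list() is not masked
xs <- list(1:5, (1:5) + 4, (1:5) - 1, (1:5) * 3)
ys <- list(rep(1, 5), rep(2, 5), rep(0, 5))

# Outer lapply walks over xs; inner lapply walks over ys.
# res[[i]][[j]] holds ys[[j]] - xs[[i]]
res <- lapply(xs, function(d) lapply(ys, function(a, b) a - b, b = d))

length(res)        # 4, one entry per element of xs
lengths(res)       # 3 3 3 3, one entry per element of ys
res[[2]][[1]]      # -4 -5 -6 -7 -8
```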
This question already has an answer here:
How to convert from a list of lists to a list in R retaining names?
Closed 9 years ago.
I have a brief question, I would like to unnest this nested list:
mylist <- list(a = list(A=1, B=5),
b = list(C= 1, D = 2),
c = list(E = 1, F = 3))
Expected result is:
> list(a=c(1, 5), b = c(1, 2), c = c(1, 3))
$a
[1] 1 5
$b
[1] 1 2
$c
[1] 1 3
Any suggestions?
Slight variation on everyone else's and keeping it in base:
lapply(mylist, unlist, use.names=FALSE)
## $a
## [1] 1 5
##
## $b
## [1] 1 2
##
## $c
## [1] 1 3
Take a look at the llply() function from the plyr package:
> library(plyr)
> llply(mylist, unlist)
$a
A B
1 5
$b
C D
1 2
$c
E F
1 3
If you want to get rid of the names, then try:
> lapply(llply(mylist, unlist), unname)
$a
[1] 1 5
$b
[1] 1 2
$c
[1] 1 3
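For this case plyr is not strictly needed: llply(mylist, unlist) applies unlist() to each element and keeps the outer names, which is also what base lapply() does. A base-R sketch of the same two steps, checked against the expected result from the question and against the one-call use.names = FALSE variant:

```r
mylist <- list(a = list(A = 1, B = 5),
               b = list(C = 1, D = 2),
               c = list(E = 1, F = 3))
expected <- list(a = c(1, 5), b = c(1, 2), c = c(1, 3))

# unlist() each inner list (keeps inner names A, B, ...), then drop them
flat <- lapply(lapply(mylist, unlist), unname)

identical(flat, expected)
identical(flat, lapply(mylist, unlist, use.names = FALSE))
```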
I think applying unlist() to each element in your list should give you what you're looking for:
> mylist <- list(a = list(A=1, B=5), b = list(C= 1, D = 2), c = list(E = 1, F = 3))
> mylist2 <- list(a=c(1, 5), b = c(1, 2), c = c(1, 3))
> data.frame(lapply(mylist,unlist))
a b c
A 1 1 1
B 5 2 3
> data.frame(mylist2)
a b c
1 1 1 1
2 5 2 3
I have a data frame like this:
a <- c(2, 3, 4)
b <- c(5, 4, 3)
c <- c(2, 7, 9)
df <- data.frame(a, b, c)
df
# a b c
# 1 2 5 2
# 2 3 4 7
# 3 4 3 9
and I want to get back the rows that do not contain the number 2; in my example, those are the second and third rows.
Using rowSums or colSums:
# data
a <- c(2, 3, 4)
b <- c(5, 4, 3)
c <- c(2, 7, 9)
df <- data.frame(a, b, c)
df
# a b c
# 1 2 5 2
# 2 3 4 7
# 3 4 3 9
# get rows with no 2
df[ rowSums(df == 2, na.rm = TRUE) == 0, ]
# a b c
# 2 3 4 7
# 3 4 3 9
# get columns with no 2
df[ , colSums(df == 2, na.rm = TRUE) == 0, drop = FALSE ]
# b
# 1 5
# 2 4
# 3 3
We can also use Reduce() with | to combine the column-wise == 2 comparisons and get the rows:
df[!Reduce(`|`, lapply(df, `==`, 2)),]
# a b c
#2 3 4 7
#3 4 3 9
and any() with sapply() to select the columns:
df[!sapply(df, function(x) any(x== 2))]
# b
#1 5
#2 4
#3 3
Here is my solution using some set functions. First, where are the positions of the twos?
is_two <- apply(df, 1, is.element, 2)
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] TRUE FALSE FALSE
Now, which rows are all FALSE?
no_twos <- apply(!is_two, 2, all)  # is_two has one column per row of df
df[no_twos,]
a b c
2 3 4 7
3 4 3 9
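The detail that matters here is that apply(df, 1, ...) returns one column per row of df (the per-row results are stacked as columns), so the "does this row contain a 2?" test has to run over the columns of is_two, i.e. with MARGIN = 2. A self-contained sketch using the data from the question:

```r
df <- data.frame(a = c(2, 3, 4), b = c(5, 4, 3), c = c(2, 7, 9))

# apply(df, 1, ...) returns one *column* per row of df,
# so is_two is 3 x 3 with is_two[k, i] = (df[i, k] == 2)
is_two <- apply(df, 1, is.element, 2)

# A row of df is kept when its column in is_two contains no TRUE
no_twos <- apply(!is_two, 2, all)
df[no_twos, ]
```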
I need to read in a CSV file with no headers and with an unknown number of columns and rows. However, every other column belongs in one matrix, while the next needs to be in a different matrix. For example, given this CSV input:
1,2,3,4
1,2,3,4
1,2,3,4
1,2,3,4
Desired result would be equivalent to:
matrix1 <- matrix(c(1, 3,
                    1, 3,
                    1, 3,
                    1, 3), NumberOfRows, NumberOfColumns, byrow = TRUE)
and
matrix2 <- matrix(c(2, 4,
                    2, 4,
                    2, 4,
                    2, 4), NumberOfRows, NumberOfColumns, byrow = TRUE)
I have tried something like this (but it seems overly complex and doesn't work anyway). Isn't there a simple way to do this in R?
mydata<- read.csv("~/Desktop/file.csv", header=FALSE, nrows=4000);
columnCount<-ncol(mydata);
rowCount<-nrow(mydata);
evenColumns <- matrix(); oddColumns <-matrix();
for (i in 1:columnCount) {
if (i %% 2) {
for (l in 1:rowCount){
col <- 1;
evenColumns[col, l] <-mydata[i,l];
col<-col+1;
}
}
else {
for (l in 1:rowCount){
col <-1;
oddColumns[col, l] <-mydata[i,l];
col<-col+1;
}
}
}
How should this be done properly in R?
You can get the column numbers with seq:
full = read.csv("mat.csv", header=FALSE)
odds = as.matrix(full[, seq(1, ncol(full), by=2)])
evens = as.matrix(full[, seq(2, ncol(full), by=2)])
Output:
> odds
V1 V3
[1,] 1 3
[2,] 1 3
[3,] 1 3
[4,] 1 3
> evens
V2 V4
[1,] 2 4
[2,] 2 4
[3,] 2 4
[4,] 2 4
Similar to the problem discussed here
mat.even <- mydata[,which(1:ncol(mydata) %% 2 == 0)]
mat.odd <- mydata[,which(1:ncol(mydata) %% 2 == 1)]
Every other column, starting with the first (cdat being the data frame read from the CSV):
> cdat[ , c(TRUE,FALSE)]
V1 V3
1 1 3
2 1 3
3 1 3
4 1 3
Every other column, starting with the second:
> cdat[ , !c(TRUE,FALSE)]
V2 V4
1 2 4
2 2 4
3 2 4
4 2 4
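Putting the logical-recycling idea together end to end: the sketch below reads the example CSV from an inline string (standing in for the real file) and turns both halves into matrices. drop = FALSE is added so that a file with only one odd or even column still yields a one-column matrix with its column name:

```r
# Simulate the headerless CSV from the question with an inline string;
# with a real file, use read.csv("path/to/file.csv", header = FALSE)
csv <- "1,2,3,4
1,2,3,4
1,2,3,4
1,2,3,4"
cdat <- read.csv(text = csv, header = FALSE)

# Logical indices are recycled across the columns, so c(TRUE, FALSE)
# picks columns 1, 3, 5, ... and c(FALSE, TRUE) picks 2, 4, 6, ...
# drop = FALSE keeps a data frame even when only one column is selected
odds  <- as.matrix(cdat[, c(TRUE, FALSE), drop = FALSE])
evens <- as.matrix(cdat[, c(FALSE, TRUE), drop = FALSE])

odds
evens
```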
I have a zoo object of 12 sets of monthly returns on stock tickers. I want to get the symbol (the name of the series, or at least the column) of each month's best-performing stock. I've been trying to do this by applying the max function by row. How do I get the column name?
#Apply 'max' function across each row. I need to get the col number out of this.
apply(tsPctChgs, 1, max, na.rm = TRUE)
The usual answer would be which.max(); however, note that it returns only the first of the maximums if two or more observations share the maximum value.
An alternative is which(x == max(x)), which returns all positions taking the maximum in the event of a tie.
You can then use the returned index to select the series name. Handling NAs is covered below, to keep the initial discussion simple.
require("zoo")
set.seed(1)
m <- matrix(runif(50), ncol = 5)
colnames(m) <- paste0("Series", seq_len(ncol(m)))
ind <- seq_len(nrow(m))
mz <- zoo(m, order.by = ind)
> apply(mz, 1, which.max)
1 2 3 4 5 6 7 8 9 10
3 5 5 1 4 1 1 2 3 2
> apply(mz, 1, function(x) which(x == max(x)))
1 2 3 4 5 6 7 8 9 10
3 5 5 1 4 1 1 2 3 2
So use that to select the series name
> i1 <- apply(mz, 1, function(x) which(x == max(x)))
> colnames(mz)[i1]
[1] "Series3" "Series5" "Series5" "Series1" "Series4" "Series1" "Series1"
[8] "Series2" "Series3" "Series2"
Handling tied maximums
To illustrate the different behaviour, copy the maximum from month 1 (series 3) into series 1:
mz2 <- mz ## copy
mz2[1,1] <- mz[1,3]
> mz2[1,]
1 0.9347052 0.2059746 0.9347052 0.4820801 0.8209463
Now try the two approaches again
> apply(mz2, 1, which.max)
1 2 3 4 5 6 7 8 9 10
1 5 5 1 4 1 1 2 3 2
> apply(mz2, 1, function(x) which(x == max(x)))
$`1`
Series1 Series3
1 3
.... ## truncated output ###
Notice how which.max() returns only the first maximum (series 1) for month 1, whereas the which() version returns both series 1 and series 3.
To use this approach to select the series name, you need to apply something to the list returned by apply(), e.g.
i2 <- apply(mz2, 1, function(x) which(x == max(x)))
lapply(i2, function (i, zobj) colnames(zobj)[i], zobj = mz2)
$`1`
[1] "Series1" "Series3"
$`2`
[1] "Series5"
$`3`
[1] "Series5"
$`4`
[1] "Series1"
$`5`
[1] "Series4"
$`6`
[1] "Series1"
$`7`
[1] "Series1"
$`8`
[1] "Series2"
$`9`
[1] "Series3"
$`10`
[1] "Series2"
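If one value per month is more convenient than a list, the tied names can be collapsed into a single string. A sketch, assuming a comma-separated label is acceptable; it rebuilds the same random data as a plain matrix so that it runs without zoo:

```r
# Same random data as above, as a plain matrix so the sketch runs without zoo
set.seed(1)
m <- matrix(runif(50), ncol = 5)
colnames(m) <- paste0("Series", seq_len(ncol(m)))
m[1, 1] <- m[1, 3]                      # create a tie in month 1

# Indices of all maxima per row; a list, because row 1 has two of them
i2 <- apply(m, 1, function(x) which(x == max(x)))

# One label per month: tied series names joined with ", "
best <- vapply(i2, function(i) paste(colnames(m)[i], collapse = ", "),
               character(1))
best
```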
Handling NAs
As you have potential for NAs, I would do the following:
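apply(mz, 1, which.max) ## which.max() already skips NAs (it has no na.rm argument)
apply(mz, 1, function(x, na.rm = TRUE) {
  ## compare against the maximum of the non-NA values; which() ignores the
  ## NA comparisons, so the indices still match the columns of mz
  which(x == max(x, na.rm = na.rm))
})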
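A point worth checking when NAs are involved: subsetting with x[!is.na(x)] before calling which() shifts the positions, so the indices would no longer match the columns if an NA precedes the maximum. Comparing against max(x, na.rm = TRUE) keeps them aligned, because which() skips NA comparisons. A small check on a made-up row (the values are illustrative only):

```r
# Made-up row with an NA before the maximum (illustrative values only)
x <- c(Series1 = NA, Series2 = 0.2, Series3 = 0.9, Series4 = 0.5)

# Dropping NAs first shifts positions: 0.9 is now at index 2, not 3
shifted <- which(x[!is.na(x)] == max(x[!is.na(x)]))

# Comparing against max(x, na.rm = TRUE) keeps the original positions;
# which() ignores the NA produced by NA == 0.9
aligned <- which(x == max(x, na.rm = TRUE))

unname(shifted)   # 2
unname(aligned)   # 3
names(aligned)    # "Series3"
```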
Since apply converts to matrix, I would use rollapply with width=1:
require("zoo")
set.seed(1)
m <- matrix(runif(50), ncol=5)
mz <- setNames(zoo(m, seq(nrow(m))), paste0("Series",seq(ncol(m))))
rollapply(mz, 1, function(r) colnames(mz)[which.max(r)], by.column=FALSE)