populating a list with data extracted from a dataframe - r

I have a dataframe like this:
a=c(rep(1,3), rep(2,2))
b=c(2,4,7,9,1)
df <- data.frame(a,b)
> df
a b
1 1 2
2 1 4
3 1 7
4 2 9
5 2 1
I want to create a list with as many elements as different values in column "a" (in this case "2") and store the values of column "b" in the list according to column "a". I am trying something like this:
lst <-list()
ff <-function(){lili[[df$a]] <- df$b}
apply(ff, df)
Which obviously is not working...But what I basically want to do is:
lst <- list(c(2,4), c(7,9,1))
but using apply over the rows of a large df to populate the list.

split(df$b, df$a)
$`1`
[1] 2 4 7
$`2`
[1] 9 1
This is extra nice because the list names will be the values of a by default.
That said, I agree with alistaire's comment. This seems like an XY problem - there's a good chance that whatever you do next would be done easily by data.table or dplyr without creating this separate list.

Try this: lapply(unique(df$a),function(x) df$b[df$a==x])

Here is an option using unstack
unstack(df, b~a)
#$`1`
#[1] 2 4 7
#$`2`
#[1] 9 1

Related

changing column names of a data frame by changing values - R

Let I have the below data frame.
df.open<-c(1,4,5)
df.close<-c(2,8,3)
df<-data.frame(df.open, df.close)
> df
df.open df.close
1 1 2
2 4 8
3 5 3
I wanto change column names which includes "open" with "a" and column names which includes "close" with "b":
Namely I want to obtain the below data frame:
a b
1 1 2
2 4 8
3 5 3
I have a lot of such data frames. The pre values(here it is "df.") are changing but "open" and "close" are fix.
Thanks a lot.
We can create a function for reuse
f1 <- function(dat) {
names(dat)[grep('open$', names(dat))] <- 'a'
names(dat)[grep('close$', names(dat))] <- 'b'
dat
}
and apply on the data
df <- f1(df)
-output
df
a b
1 1 2
2 4 8
3 5 3
if these datasets are in a list
lst1 <- list(df, df)
lst1 <- lapply(lst1, f1)
Thanks to dear #akrun's insightful suggestion as always we can do it in one go. So we create character vectors in pattern and replacement arguments of str_replace to be able to carry out both operations at once. We can assign character vector of either length one or more to each one of them. In case of the latter the length of both vectors should correspond. More to the point as the documentation says:
References of the form \1, \2, etc will be replaced with the contents
of the respective matched group (created by ())
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_replace(., c(".*\\.open", ".*\\.close"), c("a", "b")))
a b
1 1 2
2 4 8
3 5 3
Another base R option using gsub + match + setNames
setNames(
df,
c("a", "b")[match(
gsub("[^open|close]", "", names(df)),
c("open", "close")
)]
)
gives
a b
1 1 2
2 4 8
3 5 3

how to name data frame columns to column index

It is a very basic question.How can you set the column names of data frame to column index? So if you have 4 columns, column names will be 1 2 3 4. The data frame i am using can have up to 100 columns.
It is not good to name the column names with names that start with numbers. Suppose, we name it as seq_along(D). It becomes unnecessarily complicated when we try to extract a column. For example,
names(D) <- seq_along(D)
D$1
#Error: unexpected numeric constant in "D$1"
In that case, we may need backticks or ""
D$"1"
#[1] 1 2 3
D$`1`
#[1] 1 2 3
However, the [ should work
D[["1"]]
#[1] 1 2 3
I would use
names(D) <- paste0("Col", seq_along(D))
D$Col1
#[1] 1 2 3
Or
D[["Col1"]]
#[1] 1 2 3
data
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
Just use names:
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
names(D) <- 1:ncol(D) # sequence from 1 through the number of columns

Add values to a vector to make a consecutive vector in R

I have several vectors that look like this:
v1 <- c(1,2,4)
v2 <- c(3,5,8)
v3 <- c(4)
This is just a small sample of them. I'm trying to figure out a way to add values to each of them to make them all consecutive vectors. So that at the end, they look like this:
v1 <- c(1,2,3,4)
v2 <- c(1,2,3,4,5,6,7,8)
v3 <- c(1,2,3,4)
So "3" is added to the first vector, "1","2","4","6","7" is added to the second and so forth. I have several hundred vectors that look like this so I'm trying to figure out a solution that would scale/be automated.
You can use seq and max
seq(max(v1))
For multiple vectors, we can loop
lapply(mget(paste0('v',1:3)), function(x) seq(max(x)))
#$v1
#[1] 1 2 3 4
#$v2
#[1] 1 2 3 4 5 6 7 8
#$v3
#[1] 1 2 3 4

Apply function to dataframe with changing argument

I have 2 objects:
A data frame with 3 variables:
v1 <- 1:10
v2 <- 11:20
v3 <- 21:30
df <- data.frame(v1,v2,v3)
A numeric vector with 3 elements:
nv <- c(6,11,28)
I would like to compare the first variable to the first number, the second variable to the second number and so on.
which(df$v1 > nv[1])
which(df$v2 > nv[2])
which(df$v3 > nv[3])
Of course in reality my data frame has a lot more variables so manually typing each variable is not an option.
I encounter these kinds of problems quite frequently. What kind of documentation would I need to read to be fluent in these matters?
One option would be to compare with equally sized elements. For this we can replicate the elements in 'nv' each by number of rows of 'df' (rep(nv, each=nrow(df))) and compare with df or use the col function that does similar output as rep.
which(df > nv[col(df)], arr.ind=TRUE)
If you need a logical matrix that corresponds to comparison of each column with each element of 'nv'
sweep(df, 2, nv, FUN='>')
You could also use mapply:
mapply(FUN=function(x, y)which(x > y), x=df, y=nv)
#$v1
#[1] 7 8 9 10
#
#$v2
#[1] 2 3 4 5 6 7 8 9 10
#
#$v3
#[1] 9 10
I think these sorts of situations are tricky because normal looping solutions (e.g. the apply function) only loop through one object, but you need to loop both through df and nv simultaneously. One approach is to loop through the indices and to use them to grab the appropriate information from both df and nv. A convenient way to loop through indices is the sapply function:
sapply(seq_along(nv), function(x) which(df[,x] > nv[x]))
# [[1]]
# [1] 7 8 9 10
#
# [[2]]
# [1] 2 3 4 5 6 7 8 9 10
#
# [[3]]
# [1] 9 10

new column with value from another one which name is in a third one in R

got that one I can't resolve.
Example dataset:
company <- c("compA","compB","compC")
compA <- c(1,2,3)
compB <- c(2,3,1)
compC <- c(3,1,2)
df <- data.frame(company,compA,compB,compC)
I want to create a new column with the value from the column which name is in the column "company" of the same line. the resulting extraction would be:
df$new <- c(1,3,2)
df
The way you have it set up, there's one row and one column for every company, and the rows and columns are in the same order. If that's your real dataset, then as others have said diag(...) is the solution (and you should select that answer).
If your real dataset has more than one instance of company (e.g., more than one row per company, then this is more general:
# using your df
sapply(1:nrow(df),function(i)df[i,as.character(df$company[i])])
# [1] 1 3 2
# more complex case
set.seed(1) # for reproducible example
newdf <- data.frame(company=LETTERS[sample(1:3,10,replace=T)],
A = sample(1:3,10,replace=T),
B=sample(1:5,10,replace=T),
C=1:10)
head(newdf)
# company A B C
# 1 A 1 5 1
# 2 B 1 2 2
# 3 B 3 4 3
# 4 C 2 1 4
# 5 A 3 2 5
# 6 C 2 2 6
sapply(1:nrow(newdf),function(i)newdf[i,as.character(newdf$company[i])])
# [1] 1 2 4 4 3 6 7 2 5 3
EDIT: eddi's answer is probably better. It is more likely that you would have the dataframe to work with rather than the individual row vectors.
I am not sure if I understand your question, it is unclear from your description. But it seems you are asking for the diagonals of the data values since this would be the place where "name is in the column "company" of the same line". The following will do this:
df$new <- diag(matrix(c(compA,compB,compC), nrow = 3, ncol = 3))
The diag function will return the diagonal of the matrix for you. So I first concatenated the three original vectors into one vector and then specified it to be wrapped into a matrix of three rows and three columns. Then I took the diagonal. The whole thing is then added to the dataframe.
Did that answer your question?

Resources