Extract element from vector using for loops and dataframe - r

I checked the answers in
How to extract specific element from vector using for loop in R
but they are not what I want.
I have a data frame that contains 17 rows and a number of variables.
Edit 1: My aim is to:
1- take the names of the variables from each vector
2- calculate the sum of each variable named in a vector, using the data frame
3- keep only the variable that has the highest sum in each vector
So I have my_data, which contains all the variables, and I want new_data to contain only the variables that have the highest sum in each vector.
The vector is generated on every iteration of a for loop and contains variable names (different names depending on conditions inside the loop).
My aim is to eliminate every variable named in a vector except the one that has the highest sum.
For example, I have this data frame:
my_data >
NAMES A B C D E F
One 1 2 3 4 5 6
Two 2 3 4 5 6 7
THREE 3 4 5 6 7 8
FOUR 4 5 6 7 8 9
FIVE 5 6 7 8 9 10
SIX 6 7 8 9 10 11
Let's say that the first vector generated by the for loop contains the names:
vec >
"B" "C" "D"
Using these variables, the program will eliminate "B" and "C" because D is the one that has the highest sum.
So I will obtain:
New_data
NAMES A D E F
One 1 4 5 6
Two 2 5 6 7
THREE 3 6 7 8
FOUR 4 7 8 9
FIVE 5 8 9 10
SIX 6 9 10 11
Let's say the second vector contains the names "A" and "E",
so the program will eliminate A because E is the variable that has the highest sum.
So:
New data >
NAMES D E F
One 4 5 6
Two 5 6 7
THREE 6 7 8
FOUR 7 8 9
FIVE 8 9 10
SIX 9 10 11
Let's say that the third vector contains "E" and "F".
Here is the part of the vector-analysis code I used:
# This is how I generated the vector
vec <- names(Filter(function(x) x > 0, rowSums(tmp) > 0 |
# The vector is generated inside the for loop
my_data %>%
dplyr::select(all_of(vec)) %>% # select the columns named in vec
slice(-17) %>% # drop row 17
map_dbl(sum) %>% # sum each column
which.max() %>% # position of the largest sum
names() -> selected # name of the column with the largest sum
# selected now holds the name of the variable I should keep
my_data %>% dplyr::select(!vec,selected) -> new_data # select columns
}
The problem with this program is that, in the end, new_data contains all the variables except those eliminated by the last comparison. Because it always starts from my_data, the last iteration compares the variables in my last vector and keeps every variable of my_data in new_data, except the variables in the last vector that do not have the highest sum.
To continue the example I started above,
let's say the third vector contains "E" and "F".
The result I need to obtain is:
New data >
NAMES D F
One 4 6
Two 5 7
THREE 6 8
FOUR 7 9
FIVE 8 10
SIX 9 11
# I eliminated E because F has the highest sum
But the program I wrote gives me this result:
NAMES A B C D F
One 1 2 3 4 6
Two 2 3 4 5 7
THREE 3 4 5 6 8
FOUR 4 5 6 7 9
FIVE 5 6 7 8 10
SIX 6 7 8 9 11
I think this is because the program takes its information from my original data and keeps all the variables that are not in the current vector (that's why the last comparison keeps A, B, C and D).
I don't know how to fix this problem.
Please tell me if you need more information.

You may try this option -
for(i in vec) {
#Get the column names to delete based on column sum
drop_columns <- i[-which.max(colSums(my_data[i]))]
my_data[drop_columns] <- NULL
}
# NAMES D F
#1 One 4 6
#2 Two 5 7
#3 THREE 6 8
#4 FOUR 7 9
#5 FIVE 8 10
#6 SIX 9 11
data
my_data <- structure(list(NAMES = c("One", "Two", "THREE", "FOUR", "FIVE",
"SIX"), A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11),
class = "data.frame", row.names = c(NA, -6L))
vec <- list(c('B', 'C', 'D'), c('A', 'E'), c('E', 'F'))
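The same idea can be wrapped in a function so that the original data frame is left untouched. This is only a sketch: keep_max_per_group is a made-up name, and the intersect() guard is my assumption about what should happen when a vector names a column that an earlier vector already removed (the question does not say).

```r
my_data <- data.frame(
  NAMES = c("One", "Two", "THREE", "FOUR", "FIVE", "SIX"),
  A = 1:6, B = 2:7, C = 3:8, D = 4:9, E = 5:10, F = 6:11
)
vec <- list(c("B", "C", "D"), c("A", "E"), c("E", "F"))

# keep_max_per_group() is a hypothetical helper, not part of any package
keep_max_per_group <- function(data, groups) {
  for (g in groups) {
    g <- intersect(g, names(data))   # assumption: skip names already dropped
    if (length(g) < 2) next          # nothing left to compare
    sums <- colSums(data[g])         # column sums for this group only
    drop <- g[-which.max(sums)]      # everything except the largest sum
    data[drop] <- NULL               # remove from the running result
  }
  data
}

new_data <- keep_max_per_group(my_data, vec)
names(new_data)
# [1] "NAMES" "D"     "F"
```

The key point is that each iteration works on the result of the previous one, which is what the code in the question does not do.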

I don't know what you are doing, so here is an alternative.
tmp <- replicate(5, sample(LETTERS[1:10], 3), simplify = FALSE)
[[1]]
[1] "J" "C" "A"
[[2]]
[1] "F" "D" "B"
[[3]]
[1] "C" "G" "H"
[[4]]
[1] "J" "F" "C"
[[5]]
[1] "H" "G" "J"
I made up these vectors of column names, because I don't know how you generate them. Then we iterate this object and remove the columns.
for (i in tmp) {
# your stuff here
df <- df[, !colnames(df) %in% i, drop = FALSE] # drop = FALSE keeps a data frame even if one column is left
}
NAMES E
1 One 5
2 Two 6
3 THREE 7
4 FOUR 8
5 FIVE 9
6 SIX 10

Related

select columns after named columns

I have a data frame of the following form in R
First a b c Second a b c
      3 8 1        7 6 8
      5 9 4        2 8 5
I'm trying to write something that selects the three columns following "First" & "Second", and puts them into new data frames titled "First" & "Second" respectively. I'm thinking of using the strategy below (where df is the dataframe I outline above), but am unsure how to make it such that R takes the columns that follow the ones I specify
names <- c("First", "Second")
for (i in names){
i <- (something to specify the 3 columns following df$i)
}
One option is to use split.default to split the data.frame by columns into a list of data.frames
split.default(df, cumsum(names(df) %in% names))
#$`1`
# First a b c
#1 NA 3 8 1
#2 NA 5 9 4
#
#$`2`
# Second a b c
#1 NA 7 6 8
#2 NA 2 8 5
The expression cumsum(...) creates the indices according to which to group and split columns.
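To see what the grouping index looks like, here is a small sketch that works on the column names alone. The variable names col_names, markers, is_marker and group_id are only for illustration.

```r
# Column names as in the example data frame
col_names <- c("First", "a", "b", "c", "Second", "a", "b", "c")
markers <- c("First", "Second")

# TRUE at every column that starts a new block
is_marker <- col_names %in% markers

# The running count of TRUE values labels each block
group_id <- cumsum(is_marker)
group_id
# [1] 1 1 1 1 2 2 2 2
```

split.default then puts all columns that share the same group_id into one data.frame.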
Sample data
df <- read.table(text = "First a b c Second a b c
'' 3 8 1 '' 7 6 8
'' 5 9 4 '' 2 8 5", header = T, check.names = F)
You can get the positions of the names vector in the column names of the data and subset the next 3 columns after each one.
names <- c("First", "Second")
inds <- which(names(df) %in% names)
result <- Map(function(x, y) df[x:y], inds + 1, inds + 3)
result
#[[1]]
# a b c
#1 3 8 1
#2 5 9 4
#[[2]]
# a b c
#1 7 6 8
#2 2 8 5
To create separate data frames, you can name the list and use list2env
names(result) <- names
list2env(result, .GlobalEnv)

Generate sequence between each element of 2 vectors

I have a for loop that generates 2 vectors of the same length on each iteration (the length can vary between iterations), such as:
>aa
[1] 3 5
>bb
[1] 4 8
I want to create a sequence using each element of these vectors to obtain that:
>zz
[1] 3 4 5 6 7 8
Is there a function in R to create that?
We can use Map to get the sequence of corresponding elements of 'aa' and 'bb'. The output is a list, so we unlist it to get a vector.
unlist(Map(`:`, aa, bb))
#[1] 3 4 5 6 7 8
data
aa <- c(3,5)
bb <- c(4, 8)
One can obtain a sequence by using the colon operator : that separates the beginning of a sequence from its end. We can define such sequences for each vector, aa and bb, and concatenate the results with c() into a single series of numbers.
To avoid double entries in overlapping ranges we can use the unique() function:
zz <- unique(c(aa[1]:aa[length(aa)],bb[1]:bb[length(bb)]))
#> zz
#[1] 3 4 5 6 7 8
with
aa <- c(3,5)
bb <- c(4,8)
Depending on your desired output, here are a few more alternatives:
> do.call("seq",as.list(range(aa,bb)))
[1] 3 4 5 6 7 8
> Reduce(seq,range(aa,bb)) #all credit due to @BrodieG
[1] 3 4 5 6 7 8
> min(aa,bb):max(aa,bb)
[1] 3 4 5 6 7 8
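Note that these approaches only agree because the ranges in the example overlap. A quick sketch with made-up values where the ranges leave a gap (aa and bb below are not from the question) shows the difference between the pairwise Map version and the overall min-to-max version:

```r
# Hypothetical input with a gap between the two ranges
aa <- c(3, 10)
bb <- c(4, 12)

# Pairwise: 3:4 and 10:12, concatenated
pairwise <- unlist(Map(`:`, aa, bb))
pairwise
# [1]  3  4 10 11 12

# Overall: everything from the smallest to the largest value
overall <- min(aa, bb):max(aa, bb)
overall
# [1]  3  4  5  6  7  8  9 10 11 12
```

Which one is right depends on whether the values in the gap should be part of the result.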

Repeat elements of data.frame [duplicate]

This seems to be a fairly simple problem but I can't find a simple solution:
I want to repeat a data.frame (i) several times as follows:
My initial data.frame:
i <- data.frame(c("A","A","A","B","B","B","C","C","C"))
i
Printing i results in:
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
How I want to repeat the elements (the numbers in the first column are just for easier viewing):
i
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
I tried doing it using:
i[rep(seq_len(nrow(i)), each=2),]
but it gives me the following output (the numbers in the first column are just for easier viewing):
1 A
2 A
3 A
1 A
2 A
3 A
4 B
5 B
6 B
4 B
5 B
6 B
7 C
8 C
9 C
7 C
8 C
9 C
Please help!
Not sure if this solves your problem, but to obtain the desired output you could simply repeat the entire sequence:
i <- c("A","A","A","B","B","B","C","C","C")
i2 <- rep(i,2)
#> i2
# [1] "A" "A" "A" "B" "B" "B" "C" "C" "C" "A" "A" "A" "B" "B" "B" "C" "C" "C"
Since you're dealing with a data frame, you could use a slightly modified variant:
i <- data.frame(c("A","A","A","B","B","B","C","C","C"))
i2 <- rep(i[,1],2)
You could use rbind(i, i). Does that work?
If you are working with a data frame, this code will work fine too:
i[rep(1:nrow(i), 5), ,drop=F]
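The difference between the attempt in the question and the answers comes down to the each and times arguments of rep. A short sketch, assuming a column name x that the original data frame does not have:

```r
# Column name x is an assumption, added only to make the example readable
i <- data.frame(x = c("A", "A", "A", "B", "B", "B", "C", "C", "C"))

# each = 2 repeats every index in place: 1 1 2 2 3 3 ...
idx_each <- rep(seq_len(nrow(i)), each = 2)

# times = 2 repeats the whole run of indices: 1 2 ... 9 1 2 ... 9
idx_times <- rep(seq_len(nrow(i)), times = 2)

# drop = FALSE keeps the result a data frame even with a single column
i2 <- i[idx_times, , drop = FALSE]
rownames(i2) <- NULL
nrow(i2)
# [1] 18
```

With idx_times the rows come out as A A A B B B C C C followed by the same block again, which is the order asked for.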

Making a data frame that is a subset of two data frames

I am stumped again.
I have two data frames
dataframe1
a b c
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
and
dataframe2
f g h
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
I want to take the first column of dataframe1 and make three new dataframes with the second column being each of the three f,g and h
Obviously I could just do a subset over and over
subset1 <- cbind(dataframe1[,1], dataframe2[,1])
subset2 <- cbind(dataframe1[,1], dataframe2[,2])
but my data frames will have variable numbers of columns and a very large number of rows, so I am looking for something a little more general. My data frames will always be the same length.
The closest I have come to getting anything was with apply and cbind but I got either a set of three rows that were a and f, a and g, a and h each combined as single numeric vector or I get a single data frame with four columns, a,f,g,h.
Help is deeply appreciated.
You can use lapply to iterate over the columns of dataframe2 like so:
lapply(dataframe2, function(x) as.data.frame(cbind(dataframe1[,1], x)))
This will result in a list object where each entry corresponds to a column of dataframe2. For example:
$f
V1 x
1 21 21
2 11 11
3 4 4
$g
V1 x
1 21 12
2 11 9
3 4 6
$h
V1 x
1 21 22
2 11 6
3 4 7
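If you would rather keep the original column names instead of V1 and x, one possible variation is to loop over the names of dataframe2 and subset with single brackets. This is just a sketch; pair_list is a made-up name.

```r
dataframe1 <- data.frame(a = c(21, 11, 4), b = c(12, 9, 6), c = c(22, 6, 7))
dataframe2 <- data.frame(f = c(21, 11, 4), g = c(12, 9, 6), h = c(22, 6, 7))

# dataframe1[1] and dataframe2[nm] stay data frames, so the names survive
pair_list <- lapply(names(dataframe2), function(nm) {
  data.frame(dataframe1[1], dataframe2[nm])
})
names(pair_list) <- names(dataframe2)

names(pair_list$g)
# [1] "a" "g"
```

Each element of pair_list is then a two-column data frame, for example pair_list$h holds columns a and h.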

Filtering a data frame in R by row names from a second data frame

I have the data.frame :
df1<-data.frame("Sp1"=1:6,"Sp2"=7:12,"Sp3"=13:18)
rownames(df1)=c("A","B","C","D","E","F")
df1
Sp1 Sp2 Sp3
A 1 7 13
B 2 8 14
C 3 9 15
D 4 10 16
E 5 11 17
F 6 12 18
I filter df1 by a cutoff value for rowSums(df1) and return sites (row names) that I want to include in downstream analysis.
include<-rownames(df1[rowSums(df1)>=22,])
include
[1] "B" "C" "D" "E" "F"
I have a second data.frame :
df2<-data.frame(site.x=c("A","B","C"), site.y=c("D","E","F"),score=1:3)
site.x site.y score
1 A D 1
2 B E 2
3 C F 3
I want to filter df2 such that it only includes rows where df2$site.x and df2$site.y are exactly equal to the sites listed in 'include' i.e. filtering out the row containing "A" and returning.
site.x site.y score
2 B E 2
3 C F 3
I have tried :
filter<-df2$site.x == include & df2$site.y == include
filtered<-df2[filter,]
Thanks for any advice!
ANSWER
use %in%
filter<-df2$site.x %in% include & df2$site.y %in% include
filtered<-df2[filter,]
filtered
site.x site.y score
2 B E 2
3 C F 3
For me, it works with :
filter<-df2$site.x %in% include & df2$site.y %in% include
df2[filter,]
In fact, you've put df1 instead of df2 in the last two lines of your question.
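As a rough explanation of why == does not work here: it compares the two vectors position by position (recycling the shorter one), while %in% asks whether each value occurs anywhere in include. A self-contained sketch with the data from the question (the variable name keep is mine):

```r
include <- c("B", "C", "D", "E", "F")
df2 <- data.frame(site.x = c("A", "B", "C"),
                  site.y = c("D", "E", "F"),
                  score = 1:3)

# Membership test: TRUE where the site appears anywhere in include
keep <- df2$site.x %in% include & df2$site.y %in% include
keep
# [1] FALSE  TRUE  TRUE

filtered <- df2[keep, ]
filtered$score
# [1] 2 3
```

Only the first row is dropped, because "A" is not in include.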
