Find which column contains the value of a specific variable - r

I have this dataframe:
a <- c(2,5,90,77,56,65,85,75,12,24,52,32)
b <- c(45,78,98,55,63,12,23,38,75,68,99,73)
c <- c(77,85,3,22,4,69,86,39,78,36,96,11)
d <- c(52,68,4,25,79,120,97,20,7,19,37,67)
e <- c(14,73,91,87,94,38,1,685,47,102,666,74)
df <- data.frame(a,b,c,d,e)
and this variable:
bb <- 120
I need to know the number of the column of df that contains the value of the variable "bb". How can I do this?
Thanks, everyone!

We could use which with arr.ind = TRUE to extract the row/col index after creating a logical matrix. Then, extract the second column to get the column index
which(df == bb, arr.ind = TRUE)[,2]
col
4
If there are duplicate elements in the column for the value compared, wrap with unique to return the unique column index
unique(which(df == bb, arr.ind = TRUE)[,2])
[1] 4

I think we could use grep
grep(bb, df)
[1] 4
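One caveat about the grep approach: grep converts each column to text and looks for the pattern as a substring, so a value such as 1200 would also match 120. The sketch below uses made-up data (df_demo is hypothetical, not the asker's data frame) to contrast that with an exact per-column comparison.

```r
# Hypothetical data: column x holds 1200, only column y holds exactly 120
df_demo <- data.frame(x = c(1200, 5), y = c(120, 7))
bb <- 120

# Exact comparison, one logical result per column
exact_cols <- which(vapply(df_demo,
                           function(col) any(col == bb, na.rm = TRUE),
                           logical(1)))
exact_idx <- unname(exact_cols)

# Substring matching on the text form of each column
grep_idx <- grep(bb, df_demo)
```

Here exact_idx contains only the position of column y, while grep_idx also includes column x because "1200" contains "120".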

Related

Subset dataframe using counter that resets to 1 and create dataframe for each subset

I have a dataframe that I need to break into multiple, smaller dataframes.
There is an integer index, which starts at 1 and counts up. When it resets to 1, I need to start creating a new dataframe.
df <- cbind(c(1,2,3,4,5,1,2,3,4), c("a","b","c","d","e","f","g","h","i"))
#end results should be:
df1 <- df[1:5, ]
df2 <- df[6:9, ]
How do I do this programmatically? I can find where all of the "1"s are, but how do I go row-wise and break it into different dataframes?
In your example, df is a character matrix, not a data.frame. To define a data.frame object use e.g. data.frame(index = c(1,2,3,4,5,1,2,3,4), value = c("a","b","c","d","e","f","g","h","i"))
Find the index of the first value of each group, then split on groups. You do not need to perform any rowwise operation.
df <- data.frame(index = c(1,2,3,4,5,1,2,3,4), value = c("a","b","c","d","e","f","g","h","i"))
split(df, cumsum(df$index == 1))
result is a list of data.frame objects:
$`1`
index value
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
$`2`
index value
6 1 f
7 2 g
8 3 h
9 4 i
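To see why this works, it can help to look at the grouping vector on its own: every time the index equals 1, cumsum steps up by one, so each run of rows gets its own group id. A minimal sketch, where the names grp and parts are only illustrative:

```r
df <- data.frame(index = c(1, 2, 3, 4, 5, 1, 2, 3, 4),
                 value = c("a", "b", "c", "d", "e", "f", "g", "h", "i"))

# TRUE at each reset to 1; the running sum is the group id
grp <- cumsum(df$index == 1)
# 1 1 1 1 1 2 2 2 2

# Split on the group id and give the pieces the names from the question
parts <- split(df, grp)
names(parts) <- paste0("df", seq_along(parts))
```

Keeping the pieces in a named list (parts$df1, parts$df2) is usually easier to work with than creating separate objects in the global environment.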
Try this approach with indexes and a loop. We create i1 to store the rows where the index equals 1, and i2 to store the last row of each group (the row before the next 1, plus the final row of df). We then fill a list in a loop, assign names, and release the elements to the global environment with list2env. Here is the code:
#Create start index of each group
i1 <- which(df[,1]=='1')
#Create end index of each group; the last group ends at the last row
i2 <- c(i1[-1]-1, nrow(df))
#Create a list
List <- list()
#Loop
for(j in seq_along(i1))
{
List[[j]] <- df[i1[j]:i2[j],]
}
#Assign names
names(List) <- paste0('df',seq_along(List))
#Set to envir
list2env(List,envir = .GlobalEnv)

Turning a data.frame into a list of smaller data.frames in R

Suppose I have a data.frame like this one (see my code below). After every few consecutive rows, there is a row consisting entirely of NAs.
How can I split this data.frame at every all-NA row?
For example, in my code below, I want my original data.frame to be split into 3 smaller data.frames as there are 2 rows of NAs in the original data.frame.
Here is what I tried, with no success:
## The original data.frame:
DF <- read.csv("https://raw.githubusercontent.com/izeh/i/master/m.csv", header = T)
## the index number of rows with "NA"s; Here rows 7 and 14:
b <- as.numeric(rownames(DF[!complete.cases(DF), ]))
## split DF by rows that have "NA"s; that is rows 7 and 14:
split(DF, b)
If we also need the NA rows, create a group with cumsum on the 'study.name' column which is blank (or NA)
library(dplyr)
DF %>%
group_split(grp = cumsum(lag(study.name == "", default = FALSE)), .keep = FALSE)
Or with base R
split(DF, cumsum(c(FALSE, head(DF$study.name == "", -1))))
Or with NA
i1 <- rowSums(is.na(DF))== ncol(DF)
split(DF, cumsum(c(FALSE, head(i1, -1))))
Or based on 'b'
DF1 <- DF[setdiff(seq_len(nrow(DF)), b), ]
split(DF1, as.character(DF1$study.name))
You can find occurrence of b in sequence of rows in DF and use cumsum to create groups.
split(DF, cumsum(seq_len(nrow(DF)) %in% b))
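If the all-NA separator rows should be dropped from the result, the same cumsum idea still applies. The sketch below does not download the CSV from the question; DF_demo is a small made-up stand-in with two all-NA rows, so the column names and values are assumptions.

```r
# Hypothetical stand-in for the real data: rows 3 and 6 are entirely NA
DF_demo <- data.frame(study.name = c("A", "A", NA, "B", "B", NA, "C"),
                      d = c(0.1, 0.2, NA, 0.3, 0.4, NA, 0.5))

# TRUE for rows in which every column is NA
na_row <- rowSums(is.na(DF_demo)) == ncol(DF_demo)

# Group id increases at each separator row
grp <- cumsum(na_row)

# Drop the separator rows, then split the rest by group id
pieces <- split(DF_demo[!na_row, ], grp[!na_row])
```

With two separator rows this gives a list of three data.frames, none of which contains an all-NA row.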

Subset a dataframe by matching it to a list and include non-match value too in the output using R

I have a dataframe (myDF) that has 2 columns "A" and "B" and a function (myfunc) which takes a list as an input and if it finds a match in column "A" then it returns a new dataframe that is a subset of myDF containing the value match and the corresponding "B" column.
But I want the function to also return the non-matching value in column A and NULL string in column B.
myDF:
A B
1 11
2 22
3 33
myfunc:
myfunc <- function(x) {
r<- with(myDF, myDF[A %in% x, c("A", "B")])
return(data.frame(r))
}
Input: mylist = c(1,2,"E")
Expected Output:
A B
1 11
2 22
E NULL
We create a logical index and assign
i1 <- with(myDF, !A %in% mylist)
myDF$B[i1] <- "NULL"
myDF$A[i1] <- mylist[i1]
myDF
# A B
#1 1 11
#2 2 22
#3 E NULL
Note: By assigning a character string to 'B' column, it effectively changes the type from numeric to character. A better option would be to assign it to NA
myDF$B[i1] <- NA
Or
data.frame(A= mylist, B = myDF$B[match(mylist, myDF$A)])
This is a join operation, which can be done in base R with merge, if you make the list a data.frame first. The all.y = T argument includes rows of mylistDF with no matching rows in myDF in the output.
mylistDF <- data.frame(A = mylist, stringsAsFactors = F)
merge(myDF, mylistDF, by = 'A', all.y = T)
# A B
# 1 1 11
# 2 2 22
# 3 E NA
Since you tagged tidyr, here's a tidyverse solution (same output)
library(tidyverse)
mylistDF <- tibble(A = mylist)
myDF %>%
mutate_at('A', as.character) %>%
right_join(mylistDF, by = 'A')
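Since the question asked for a function, the match-based one-liner above can be wrapped up as follows. This is only a sketch: the name lookup_with_missing and its data argument are made up for illustration, and non-matching values get NA rather than the string "NULL" so that column B stays numeric.

```r
myDF <- data.frame(A = c(1, 2, 3), B = c(11, 22, 33))

# Hypothetical helper: one output row per element of x, NA in B when x is not in A
lookup_with_missing <- function(x, data = myDF) {
  data.frame(A = x,
             B = data$B[match(x, data$A)],
             stringsAsFactors = FALSE)
}

mylist <- c(1, 2, "E")
res <- lookup_with_missing(mylist)
```

Because mylist mixes numbers and "E", it is a character vector, so column A of the result is character as well.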

reference x's column in R's apply function

I have a df like this:
a <- c(4,5,3,5,1)
b <- c(8,9,7,3,5)
c <- c(6,7,5,4,3)
df <- data.frame(rbind(a,b,c))
I want a new df, df2, containing the difference between the values in each cell in rows a and b and the value in row c in their respective columns.
df2 would look like this:
a <- c(-2,-2,-2,1,-2)
b <- c(2,2,2,-1,2)
df2 <- data.frame(rbind(a,b))
Here is where I'm getting stuck:
df2 <- data.frame(apply(df,c(1,2),function(x) x - df[nrow(df),the col index of x]))
How do I reference the column index of x? Is there something like JavaScript's this?
We can do this easily by replicating the 3rd row to make the lengths equal before subtracting with the first two rows
out <- df[c("a", "b"),] - df["c",][col(df[c("a", "b"),])]
identical(df2, out)
#[1] TRUE
Or explicitly using rep
df[c("a", "b"),] - rep(unlist(df["c",]), each = 2)
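Another way to express "subtract row c from every other row" is base R's sweep, which applies a vector of statistics along a margin. The sketch below works on a matrix copy of the data (the names m and out_sweep are illustrative), so the result is a matrix rather than a data.frame.

```r
a <- c(4, 5, 3, 5, 1)
b <- c(8, 9, 7, 3, 5)
c <- c(6, 7, 5, 4, 3)
df <- data.frame(rbind(a, b, c))

# Work on a numeric matrix so row and column indexing is unambiguous
m <- as.matrix(df)

# MARGIN = 2: subtract the matching element of row "c" from each column
out_sweep <- sweep(m[c("a", "b"), ], 2, m["c", ])
```

Wrapping the result in data.frame() gives an object shaped like the df2 described in the question.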

R: Concatenated values in column B based on values in column A

QUESTION: Using R, how would you create values in column B prefixed with a constant "1" + n 0's where n is the value in each row in column A?
#R CODE EXAMPLE
df <- as.data.frame(1:3);colnames(df)[1] <- "A";
print(df);
# A
# 1
# 2
# 3
preFixedValue <- 1; repeatedValue <- 0;
#pseudo code: create values in column B with n 0's prefixed with 1
df <- cbind(df,paste(rep(c(preFixedValue,repeatedValue), times = c(1,df[1:nrow(df),])),collapse = ""));
#expected/desired result
# A B
# 1 10
# 2 100
# 3 1000
USE CASE: Real data contains hundreds of rows in column A with random integers, not just three sequential int's as shown in the code above.
Below is an example using Excel to demonstrate what I want to do in R.
The rowwise() function in dplyr lets you make variables from column values in each row.
require(dplyr)
df <- data.frame(A = 1:3, B = NA)
preFixedValue <- 1; repeatedValue <- 0;
df <- df %>%
rowwise() %>%
mutate(B = as.numeric(paste0(c(preFixedValue, rep(repeatedValue, A)), collapse = "")))
For maximum flexibility, i.e. total freedom of choosing prefixed and repeated values as single values or vectors, and for simplicity of the syntax (one single line):
library(stringr)
df$B <- str_pad(preFixedValue, width = df$A + 1, pad = repeatedValue, side = "right")
Would something like this work?
B<-10^(df$A)
df<-cbind(df,B)
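If the result should stay a string (for example when the prefix is not 1 or the repeated value is not 0, so the powers-of-ten shortcut no longer applies), base R's strrep repeats a string a given number of times per element. A minimal sketch using the names from the question:

```r
df <- data.frame(A = 1:3)
preFixedValue <- 1
repeatedValue <- 0

# Repeat the padding value A times per row, then prepend the prefix
df$B <- paste0(preFixedValue, strrep(as.character(repeatedValue), df$A))
```

Column B is character here; wrap the right-hand side in as.numeric() if numbers are needed.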
