R: Concatenate values in column B based on values in column A

QUESTION: Using R, how would you create values in column B consisting of a constant "1" followed by n 0's, where n is the value in each row of column A?
#R CODE EXAMPLE
df <- data.frame(A = 1:3)
print(df)
# A
# 1
# 2
# 3
preFixedValue <- 1; repeatedValue <- 0
# create values in column B: preFixedValue followed by n copies of repeatedValue, where n = A
df$B <- sapply(df$A, function(n) paste(rep(c(preFixedValue, repeatedValue), times = c(1, n)), collapse = ""))
#expected/desired result
# A B
# 1 10
# 2 100
# 3 1000
USE CASE: The real data contains hundreds of rows in column A with random integers, not just the three sequential integers shown in the code above.
Below is an example using Excel to demonstrate what I want to do in R.

The rowwise() function in dplyr lets you make variables from column values in each row.
library(dplyr)
df <- data.frame(A = 1:3, B = NA)
preFixedValue <- 1
repeatedValue <- 0
df <- df %>%
  rowwise() %>%
  mutate(B = as.numeric(paste0(c(preFixedValue, rep(repeatedValue, A)), collapse = ""))) %>%
  ungroup()
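For comparison, a base R sketch (not part of the answer above) builds the same strings without rowwise(), because strrep() (available since R 3.3.0) is vectorised over its times argument. It keeps B as a character vector, so rows with many zeros are not turned into doubles, which R prints in scientific notation (100000 shows as 1e+05).

```r
# Sample data matching the question: A = number of zeros to append
df <- data.frame(A = 1:3)
preFixedValue <- 1
repeatedValue <- 0

# strrep() repeats "0" A times for every row at once; paste0() adds the prefix
df$B <- paste0(preFixedValue, strrep(repeatedValue, df$A))
df
```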

For maximum flexibility (complete freedom to choose the prefixed and repeated values as single values or vectors) and for simple syntax (a single line):
library(stringr)
# width is the total string length, so add the length of the prefix to A
df$B <- str_pad(preFixedValue, width = df$A + nchar(preFixedValue), pad = repeatedValue, side = "right")

Would something like this work?
B <- 10^df$A
df <- cbind(df, B)
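10^A returns doubles, so this only covers the case where the prefix is 1 and the repeated value is 0, and larger values print in scientific notation (R shows 100000 as 1e+05). A sketch, assuming string output is wanted, converts them with format(scientific = FALSE); powers of ten remain exact as doubles only up to about 10^22.

```r
df <- data.frame(A = 1:3)

# Same digits as "1" followed by A zeros, but stored as doubles
df$B <- 10^df$A

# Plain digit strings without scientific notation; trim = TRUE drops padding
df$B_chr <- format(df$B, scientific = FALSE, trim = TRUE)
df
```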

Related

How to transpose only one row of a dataframe in R

I want the accuracy row to become a column.
Here is a step-by-step manual solution in base R.
Code
#Data
df <- data.frame(X = c(1:3, "accuracy"), precision = runif(4), recall = runif(4), f1.score = runif(4))
#Save the last row data in a vector
accuracy <- unlist(df[nrow(df), ])
#Eliminate the last row from the original data.frame
df <- df[-nrow(df), ]
#Create a new column (dropping the first element, which comes from column X)
df$accuracy <- accuracy[-1]
df
Output
X precision recall f1.score accuracy
1 1 0.6075635 0.4839641 0.3071190 0.12418847065419
2 2 0.5337823 0.3673568 0.8207251 0.951568570220843
3 3 0.2854789 0.7080209 0.8552161 0.0459401197731495
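Because column X is character, unlist() on the last row returns a character vector, which is why the new column above holds strings (note the long, unrounded digits in the output). A sketch that takes only the numeric columns of that row and converts them; the seed is added purely for reproducibility:

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(X = c(1:3, "accuracy"),
                 precision = runif(4), recall = runif(4), f1.score = runif(4))

# Numeric values of the last row, skipping the label stored in column X
accuracy <- as.numeric(unlist(df[nrow(df), -1]))

df <- df[-nrow(df), ]   # drop the accuracy row
df$accuracy <- accuracy # one value per remaining row, now numeric
df
```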

Need help writing a loop function

I have a huge dataset and created a large correlation matrix. My goal is to clean it up and create a new data frame containing all correlations with an absolute value greater than 0.25, with the variable names included.
For example, given this dataset, how would I use a double nested loop over the rows and columns of the correlation table?
a <- rnorm(10, 0 ,1)
b <- rnorm(10,1,1.5)
c <- rnorm(10,1.5,2)
d <- rnorm(10,-0.5,1)
e <- rnorm(10,-2,1)
matrix <- data.frame(a,b,c,d,e)
cor(matrix)
(Notice that there is redundancy in the matrix. You only need to inspect the first 5 columns, and you don't need to inspect all rows: if I'm looking at column 3, for example, I only need to start looking at row 4, just after the diagonal where the correlation = 1.)
Thank you
Is your ultimate goal to create a 5x5 matrix with every value whose absolute value is less than 0.25 set to zero? This can be done via m <- cor(matrix); m[abs(m) < 0.25] <- 0. If your goal is simply to loop over the rows and columns, this can be done via:
m <- cor(matrix)
for (row in rownames(m)) {
  for (col in colnames(m)) {
    # your code here
    # operating on m[row, col]
  }
}
To avoid redundancy:
for (row in rownames(m)[1:(length(rownames(m)) - 1)]) {
  for (col in colnames(m)[(which(colnames(m) == row) + 1):length(colnames(m))]) {
    # your code here
    # operating on m[row, col]
    print(m[row, col])
  }
}
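Building on the redundancy-free loop, a sketch that collects the qualifying pairs into a data frame, which is what the question ultimately asks for. The column names var1, var2 and r are illustrative choices, and growing a data frame with rbind() inside a loop is fine for a handful of variables but slow for very large matrices.

```r
# Sample data as in the question (seed added only for reproducibility)
set.seed(1001)
mydata <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 1.5),
                     c = rnorm(10, 1.5, 2), d = rnorm(10, -0.5, 1),
                     e = rnorm(10, -2, 1))
m <- cor(mydata)

# Collect every pair above the diagonal whose |correlation| exceeds 0.25
out <- data.frame(var1 = character(0), var2 = character(0), r = numeric(0),
                  stringsAsFactors = FALSE)
for (i in seq_len(nrow(m) - 1)) {
  for (j in (i + 1):ncol(m)) {  # start just after the diagonal: each pair once
    if (abs(m[i, j]) > 0.25) {
      out <- rbind(out, data.frame(var1 = rownames(m)[i], var2 = colnames(m)[j],
                                   r = m[i, j], stringsAsFactors = FALSE))
    }
  }
}
out
```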
I'd suggest using the corrr package, in conjunction with tidyr and dplyr.
This allows you to generate a correlation data frame rather than a matrix and remove the duplicate values (where, for example, a-b is the same as b-a) using the shave function. You can then rearrange by pivoting, remove the NA values (from the diagonal, e.g. a-a), and keep only the values whose absolute value is greater than 0.25.
library(dplyr)
library(tidyr)
library(magrittr) # for the pipe %>% or just use library(tidyverse) instead of all 3
library(corrr)
# for reproducible values
set.seed(1001)
# no need to make a data frame from vectors
# and don't call it matrix, that's a function name
mydata <- data.frame(a = rnorm(10, 0 ,1),
b = rnorm(10, 1, 1.5),
c = rnorm(10, 1.5, 2),
d = rnorm(10, -0.5, 1),
e = rnorm(10, -2, 1))
mydata %>%
correlate() %>%
shave() %>%
pivot_longer(2:6) %>%
na.omit() %>%
filter(abs(value) > 0.25)
Result:
# A tibble: 4 x 3
term name value
<chr> <chr> <dbl>
1 c b -0.296
2 d b 0.357
3 e a -0.440
4 e d -0.280
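Without extra packages, a base R sketch of the same idea uses upper.tri() in place of shave() and which(arr.ind = TRUE) in place of pivoting. The output column names are illustrative; with the same seed it should find the same pairs as the corrr pipeline, though each pair is listed in upper-triangle order (row variable before column variable).

```r
set.seed(1001)
mydata <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 1.5),
                     c = rnorm(10, 1.5, 2), d = rnorm(10, -0.5, 1),
                     e = rnorm(10, -2, 1))
m <- cor(mydata)

# upper.tri() keeps each pair once (a-b but not b-a) and excludes the diagonal
idx <- which(upper.tri(m) & abs(m) > 0.25, arr.ind = TRUE)

result <- data.frame(var1 = rownames(m)[idx[, "row"]],
                     var2 = colnames(m)[idx[, "col"]],
                     value = m[idx],  # two-column index matrix picks m[row, col]
                     stringsAsFactors = FALSE)
result
```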

Create a function that replaces values where n < 5 with a random integer between 1 and 4

I am quite new to R and have run into a problem I apparently can't solve by myself. It should be fairly easy, though.
I aim to write a generic function that manipulates column n in data frame df. I want it to perform a simple task: for each row where n < 5, it should replace that value with a random number between 1 and 4.
df <- data.frame(n= 1:10, y = letters[1:10],
stringsAsFactors = FALSE)
What is the most elegant solution?
One way to do this is to create a logical index based on the column, subset the column with that index, and assign the sampled values:
f1 <- function(dat, col) {
  i1 <- dat[[col]] < 5
  dat[[col]][i1] <- sample(1:4, sum(i1), replace = TRUE)
  dat
}
f1(df, "n")
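Since the question asks for a generic function, here is a sketch of one possible generalisation of f1(); the name replace_below() and the threshold and choices arguments are hypothetical. It uses which() so that NA values in the column are skipped rather than breaking the count, and it indexes into choices instead of calling sample(choices, ...) directly, because sample() treats a single number x as 1:x.

```r
# Hypothetical generalisation of f1(): threshold and replacement values are arguments
replace_below <- function(dat, col, threshold = 5, choices = 1:4) {
  idx <- which(dat[[col]] < threshold)  # row positions to replace; NAs are ignored
  picks <- sample.int(length(choices), length(idx), replace = TRUE)
  dat[[col]][idx] <- choices[picks]     # safe even when choices has length 1
  dat
}

set.seed(1)
df <- data.frame(n = 1:10, y = letters[1:10], stringsAsFactors = FALSE)
res <- replace_below(df, "n")
res
```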

Delete rows after a negative value in multiple data frames

I have multiple data frames which are individual sequences consisting of the same columns. I need to delete all the rows after the first negative value in the column "OnsetTime": the row containing the negative value itself should be kept, and only the rows after it deleted. All sequences have 16 rows in total.
I think this should be possible with a loop, but I have no experience with loops in R. I have 499 data frames, and I am currently deleting the rows of each sequence one by one, like this:
sequence_6 <- sequence_6[-c(11:16), ]
sequence_7 <- sequence_7[-c(11:16), ]
sequence_9 <- sequence_9[-c(6:16), ]
Is there a faster way of doing this? An example of a sequence can be seen here example sequence
Regarding this example, I want to delete rows 7 to 16.
Data
Since the odd web configuration at work prevents me from accessing your data, I created three dataframes based on random numbers
set.seed(123); data_1 <- data.frame( value = runif(25, min = -0.1) )
set.seed(234); data_2 <- data.frame( value = runif(20, min = -0.1) )
set.seed(345); data_3 <- data.frame( value = runif(30, min = -0.1) )
First, you could create a list containing all your dataframes:
list_df <- list(data_1, data_2, data_3)
Now you can go through this list with a for loop. Since there are several steps, I find it convenient to use the package dplyr because it allows for a more readable notation:
library(dplyr)
for (i in seq_along(list_df)) {
  min_row <-
    list_df[[i]] %>%
    mutate(id = row_number()) %>%  # add a column with row number
    filter(value < 0) %>%          # get the rows with negative values
    summarise(min(id)) %>%         # get the first row number
    as.numeric()                   # transform this value to a scalar (not a dataframe)
  list_df[[i]] <- list_df[[i]] %>% slice(1:min_row)  # get rows 1 to min_row
}
Hope it helps!
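One caveat with the loop above: if a data frame has no negative value, min() over zero rows returns Inf (with a warning) and slice(1:min_row) then fails. A base R sketch of a helper (the name keep_through_first_negative() is hypothetical) that keeps every row in that case, with the column to check as an argument, e.g. col = "OnsetTime" for the question's data:

```r
# Hypothetical helper: keep rows 1..k, where k is the first row whose `col` is negative
keep_through_first_negative <- function(x, col = "value") {
  first_neg <- which(x[[col]] < 0)[1]  # NA when the column has no negative value
  if (is.na(first_neg)) return(x)      # nothing to cut: keep every row
  x[seq_len(first_neg), , drop = FALSE]
}

list_df <- list(
  data.frame(value = c(0.5, 0.2, -0.1, 0.7, 0.9)),  # cut after row 3
  data.frame(value = c(0.3, 0.4, 0.8))              # no negative value
)
list_df <- lapply(list_df, keep_through_first_negative)
sapply(list_df, nrow)
```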
We can get the datasets into a list, assuming that the object names start with 'sequence' followed by an _ and one or more digits. Then use lapply to loop over the list and subset the rows based on the condition (here, a negative value in any column):
lst1 <- lapply(mget(ls(pattern = "^sequence_\\d+$")), function(x) {
  i1 <- Reduce(`|`, lapply(x, `<`, 0))
  # or use rowSums
  # i1 <- rowSums(x < 0) > 0
  i2 <- which(i1)[1]
  x[seq(i2), ]
})
data
set.seed(42)
sequence_6 <- as.data.frame(matrix(sample(-1:10, 16 *5, replace = TRUE), nrow = 16))
sequence_7 <- as.data.frame(matrix(sample(-2:10, 16 *5, replace = TRUE), nrow = 16))
sequence_9 <- as.data.frame(matrix(sample(-2:10, 16 *5, replace = TRUE), nrow = 16))
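The lst1 result above is a separate list; if the trimmed data frames should replace the original sequence_ objects, list2env() can write them back by name. A sketch on two small hypothetical sequences, checking only the OnsetTime column the question mentions. Because which(...)[1] is NA when there is no negative value, a guard keeps such sequences intact.

```r
# Hypothetical sequences with an OnsetTime column, standing in for the real data
sequence_1 <- data.frame(OnsetTime = c(5, 3, -1, 4, 2), Trial = 1:5)
sequence_2 <- data.frame(OnsetTime = c(2, 8, 6, 1, 7), Trial = 1:5)

seqs <- mget(ls(pattern = "^sequence_\\d+$"))
trimmed <- lapply(seqs, function(x) {
  i2 <- which(x$OnsetTime < 0)[1]  # first negative OnsetTime (NA if none)
  if (is.na(i2)) x else x[seq_len(i2), , drop = FALSE]
})

# Overwrite sequence_1, sequence_2, ... with their trimmed versions
list2env(trimmed, envir = environment())
c(nrow(sequence_1), nrow(sequence_2))
```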

How to find values less than -1 in each row for every 12 columns in R?

I have a 100 x 120 matrix, and I am trying to find values <= -1 in each row for every 12 columns. I have tried several times but failed. Finding values <= -1 is easy, but I do not know how to do it for every 12 columns and store the results for each row. Thanks for any help.
set.seed(100)
Mydata <- sample(x = -3:3, size = 100*120, replace = TRUE)
Mydata <- matrix(data = Mydata, nrow = 100, ncol = 120)
results <- which(Mydata <= -1, arr.ind = TRUE)
You can use the apply function to run which() on each row, one row at a time. If I misinterpreted what you wanted, you can adjust the MARGIN argument accordingly.
# MARGIN=1 to apply across rows
dd <- apply(Mydata, MARGIN = 1, function(x) which(x <= -1))
dd[[1]] # which columns in row 1 have a value <= -1
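The apply() call above looks at whole rows rather than at blocks of 12 columns. If the goal is, for each row, the number of values <= -1 within each 12-column block (one reading of the question), a sketch labels every column with a block number and sums per block. The result is a 100 x 10 matrix with one column per block.

```r
set.seed(100)
Mydata <- matrix(sample(-3:3, 100 * 120, replace = TRUE), nrow = 100, ncol = 120)

# Block number for each column: 1 for columns 1-12, 2 for 13-24, ..., 10 for 109-120
block <- rep(seq_len(ncol(Mydata) / 12), each = 12)

# counts[i, k] = number of values <= -1 in row i within 12-column block k
counts <- t(apply(Mydata <= -1, 1, function(r) tapply(r, block, sum)))
dim(counts)
```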
You can do this using a combination of apply functions and seq()
#Example Data
set.seed(100)
Mydata <- sample(x = -3:3, size = 100*120, replace = TRUE)
Mydata <- matrix(data = Mydata,nrow = 100,ncol = 120)
#Solution:
Myseq <- sapply(0:9,function(x) seq(1,12,1) + 12*x)
sapply(1:dim(Myseq)[2], function(x) which(Mydata[, Myseq[, x]] <= -1))
This results in a list with:
each element of the list representing one of your 10 groups of 12 columns;
each value within an element giving the position (a linear index, counted down the columns of that 100 x 12 block) of a value <= -1.
