I want the accuracy row to be in the column part
Here is a step-by-step manual solution in base R.
Code
# Data
df <- data.frame(X = c(1:3, "accuracy"), precision = runif(4),
                 recall = runif(4), f1.score = runif(4))
# Save the numeric values of the last row in a vector
# (dropping X first, otherwise unlist() coerces everything to character)
accuracy <- unlist(df[nrow(df), -1])
# Remove the last row from the original data.frame
df <- df[-nrow(df), ]
# Add the saved values as a new numeric column
df$accuracy <- accuracy
df
Output
  X precision    recall  f1.score   accuracy
1 1 0.6075635 0.4839641 0.3071190 0.12418847
2 2 0.5337823 0.3673568 0.8207251 0.95156857
3 3 0.2854789 0.7080209 0.8552161 0.04594012
I have a huge dataset and created a large correlation matrix. My goal is to clean this up and create a new data frame containing all the correlations with an absolute value greater than 0.25, with the variable names included.
For example, given this data set, how would I use a doubly nested loop over the rows and columns of the correlation table?
a <- rnorm(10, 0 ,1)
b <- rnorm(10,1,1.5)
c <- rnorm(10,1.5,2)
d <- rnorm(10,-0.5,1)
e <- rnorm(10,-2,1)
matrix <- data.frame(a,b,c,d,e)
cor(matrix)
(Notice that there is redundancy in the matrix. You only need to inspect the first 5 columns, and you don't need to inspect all rows. If I'm looking at column 3, for example, I only need to start looking at row 4, after the correlation = 1.)
Thank you
Is your ultimate goal to create a 5x5 matrix in which all values with an absolute value less than 0.25 are set to zero? This can be done via ifelse(abs(cor(matrix)) < 0.25, 0, cor(matrix)). If your goal is simply to loop over the rows and columns, this can be done via:
m <- cor(matrix)
for (row in rownames(m)){
for (col in colnames(m)){
#your code here
#operating on m[row,col]
}
}
To avoid redundancy:
for (row in rownames(m)[1:(length(rownames(m))-1)]){
for (col in colnames(m)[(which(colnames(m) == row)+1):length(colnames(m))]){
#your code here
#operating on m[row,col]
print(m[row,col])
}
}
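If the end goal is the data frame of variable pairs described in the question, the same "skip the redundant half" idea can be written without an explicit loop. The following is only a sketch in base R: the names dat, m, keep, idx and res are illustrative and the data are randomly generated, so the rows you get depend on the seed.

```r
# Illustrative data with the same shape as in the question
set.seed(1001)
dat <- data.frame(a = rnorm(10, 0, 1),
                  b = rnorm(10, 1, 1.5),
                  c = rnorm(10, 1.5, 2),
                  d = rnorm(10, -0.5, 1),
                  e = rnorm(10, -2, 1))
m <- cor(dat)

# Keep each pair once (cells above the diagonal) and only if |r| > 0.25
keep <- upper.tri(m) & abs(m) > 0.25

# Row/column positions of the cells that passed the filter
idx <- which(keep, arr.ind = TRUE)

# One row per retained pair, with the variable names included
res <- data.frame(var1  = rownames(m)[idx[, "row"]],
                  var2  = colnames(m)[idx[, "col"]],
                  value = m[idx])
res
```

upper.tri() plays the role of the restricted inner loop above: it is TRUE only for cells strictly above the diagonal, so the diagonal and the mirrored pairs are never inspected.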
I'd suggest using the corrr package, in conjunction with tidyr and dplyr.
This allows you to generate a correlation data frame rather than a matrix and to remove the duplicate values (for example, a-b is the same as b-a) using the shave function. You can then rearrange by pivoting, remove the NA values (from the diagonal, e.g. a-a, and from the shaved triangle) and filter for absolute values greater than 0.25.
library(dplyr)
library(tidyr)
library(magrittr) # for the pipe %>% or just use library(tidyverse) instead of all 3
library(corrr)
# for reproducible values
set.seed(1001)
# no need to create separate vectors before building the data frame
# and don't call it matrix, that's a function name
mydata <- data.frame(a = rnorm(10, 0 ,1),
b = rnorm(10, 1, 1.5),
c = rnorm(10, 1.5, 2),
d = rnorm(10, -0.5, 1),
e = rnorm(10, -2, 1))
mydata %>%
correlate() %>%
shave() %>%
pivot_longer(2:6) %>%
na.omit() %>%
filter(abs(value) > 0.25)
Result:
# A tibble: 4 x 3
term name value
<chr> <chr> <dbl>
1 c b -0.296
2 d b 0.357
3 e a -0.440
4 e d -0.280
I have a series of character vectors in which, for every participant (denoted in the reprex by a letter), there is a time point (in the reprex either 1 or 2) and then a score. Here is the reprex:
l <- c("A","1","27","B","1","26","2","54")
How can I reshape the vector to create a data frame with three columns: Column A as Participant, Column B as Time Point, and Column C as Score?
The intended output would look something like this:
data.frame("Participant" = c("A","B","B"),
"Time Point" = c("1","1","2"),
"Score" = c("27","26","54"))
If easier to make, it could be brought into this shape:
data.frame("Participant" = c("A","B"),
"TimePoint1" = c("27","26"),
"TimePoint2" = c("NA","54"))
Any direction/thoughts are appreciated.
Here is one way in base R.
We can locate the participants by a pattern in their names using grep. In the example shared, every participant is an upper-case letter. We use these positions to split the data so that each participant gets their own list element. Within each element, the first value is the participant name and the remaining values alternate between Time.Point and Score.
output <- do.call(rbind, lapply(split(l,
findInterval(seq_along(l), grep('[A-Z]', l))), function(x) {
data.frame(Participant = x[1],
Time.Point = x[-1][c(TRUE, FALSE)],
Score = x[-1][c(FALSE, TRUE)])
}))
rownames(output) <- NULL
# as.is = TRUE keeps Participant as character and avoids the warning in R >= 4.0
output <- type.convert(output, as.is = TRUE)
output
# Participant Time.Point Score
#1 A 1 27
#2 B 1 26
#3 B 2 54
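To see how the grouping in this answer comes about, it may help to run the intermediate steps on their own. This is just an illustration of the same calls; the names starts, grp and pieces are made up here.

```r
l <- c("A", "1", "27", "B", "1", "26", "2", "54")

# Positions where a participant (an upper-case letter) starts
starts <- grep("[A-Z]", l)
starts
# 1 4

# For every element, the index of the most recent participant start
grp <- findInterval(seq_along(l), starts)
grp
# 1 1 1 2 2 2 2 2

# One character vector per participant
pieces <- split(l, grp)
pieces[["2"]]
# "B"  "1"  "26" "2"  "54"
```

Each element of pieces is then turned into a small data frame: x[1] is the participant, and x[-1] is read alternately as time point and score.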
QUESTION: Using R, how would you create values in column B prefixed with a constant "1" + n 0's where n is the value in each row in column A?
#R CODE EXAMPLE
df <- as.data.frame(1:3);colnames(df)[1] <- "A";
print(df);
# A
# 1
# 2
# 3
preFixedValue <- 1; repeatedValue <- 0;
#pseudo code: create values in column B with n 0's prefixed with 1
df <- cbind(df,paste(rep(c(preFixedValue,repeatedValue), times = c(1,df[1:nrow(df),])),collapse = ""));
#expected/desired result
# A B
# 1 10
# 2 100
# 3 1000
USE CASE: Real data contains hundreds of rows in column A with random integers, not just three sequential int's as shown in the code above.
Below is an example using Excel to demonstrate what I want to do in R.
The rowwise() function in dplyr lets you make variables from column values in each row.
require(dplyr)
df <- data.frame(A = 1:3, B = NA)
preFixedValue <- 1; repeatedValue <- 0;
df <- df %>%
rowwise() %>%
mutate(B = as.numeric(paste0(c(preFixedValue, rep(repeatedValue, A)), collapse = "")))
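For flexibility in choosing the prefixed and repeated values (single values or vectors) and for simple syntax (one line), you can use str_pad from stringr. Note that width is the total length of the result, so it has to include the length of the prefix, and each pad must be a single character:
library(stringr)
df$B <- str_pad(preFixedValue, width = nchar(preFixedValue) + df$A, pad = repeatedValue, side = "right")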
Would something like this work?
B<-10^(df$A)
df<-cbind(df,B)
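Another possibility, as a sketch in base R with no extra packages, is strrep(), which repeats a string a given number of times and is vectorised over the counts. The variable names follow the question; the result is a character column, which is an assumption about what is wanted.

```r
df <- data.frame(A = 1:3)
preFixedValue <- 1
repeatedValue <- 0

# strrep() builds "0", "00", "000"; paste0() puts the prefix in front
df$B <- paste0(preFixedValue, strrep(repeatedValue, df$A))
df$B
# "10"   "100"  "1000"
```

Keeping B as character avoids the loss of precision that as.numeric() or 10^A would run into once A gets larger than roughly 15.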
I have a function (weisurv) that has 2 parameters - sc and shp. It is a function through time (t). Time is a sequence, i.e. t<-seq(1:100).
weisurv<-function(t,sc,shp){
surv<-exp(-(t/sc)^shp)
return(surv)
}
I have a data frame (df) that contains a list of sc and shp values (like 300+ of them). For example, I have:
M shp sc p C i
1 1 1.138131 10.592154 0.1 1 1
2 1.01 1.143798 10.313217 0.1 1 2
3 1.02 1.160653 10.207863 0.1 1 3
4 1.03 1.185886 9.861997 0.1 1 4
...
I want to apply each set (row) of sc and shp parameters to my function, so the call would look like weisurv(t, sc[i], shp[i]) for each row i. I do not understand how to use apply or adply to do this, though I'm sure one of these, or a combination of both, is what is needed.
In the end, I am looking for a data frame that gives a value of weisurv for each time given a set of sc and shp (held constant through time). So if I had 10 sets of sc and shp parameters, I would end up with 10 time series of weisurv.
Thanks....
Using plyr:
As a matrix (time in cols, rows corresponding to rows of df):
aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE)
As a list:
alply(df, 1, function(x) weisurv(t, x$sc, x$shp))
As a data frame (structure as per matrix above):
adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t))
As a long data frame (one row per t/sc/shp combination; note that this uses mutate and the pipe operator from dplyr):
newDf <- data.frame(t = rep(t, times = nrow(df)),
                    sc = rep(df$sc, each = length(t)),
                    shp = rep(df$shp, each = length(t))) %>%
  mutate(surv = weisurv(t, sc, shp))
You can also create a wide data.frame and then use reshape2::melt to reformat as long:
wideDf <- adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t))
newDf <- melt(wideDf, id.vars = colnames(df), variable.name = "t", value.name = "surv")
newDf$t <- as.numeric(as.character(newDf$t))
Pretty plot of last newDf (using ggplot2):
ggplot(newDf, aes(x = t, y = surv, col = sprintf("sc = %0.3f, shp = %0.3f", sc, shp))) +
geom_line() +
scale_color_discrete(name = "Parameters")
I'm not sure about the exact structure you want in the final data frame, and there is probably a cleaner way to do this, but this should work.
option 1
rows are the same as your df, with a new column t<n> for each value of t:
for(n in t){
  # add the column by name, so the existing columns keep their names
  df[[paste0('t', n)]] <- weisurv(n, df$sc, df$shp)
}
option 2
long dataframe, with columns sc, shp, t, and weisurv(t,sc,shp):
l = length(t)
newdf <- data.frame(sc=rep(df$sc, each=l), shp=rep(df$shp, each=l),
t=rep(t, times=nrow(df)) )
newdf$weisurv <- weisurv(newdf$t, newdf$sc, newdf$shp)
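For completeness, here is a minimal base R sketch of the "one time series per row" idea using mapply(), which walks over sc and shp in parallel. The data frame below only copies the four sc/shp pairs shown in the question, and the name surv is illustrative.

```r
weisurv <- function(t, sc, shp) {
  exp(-(t / sc)^shp)
}

t <- 1:100

# The sc/shp pairs from the first four rows shown in the question
df <- data.frame(shp = c(1.138131, 1.143798, 1.160653, 1.185886),
                 sc  = c(10.592154, 10.313217, 10.207863, 9.861997))

# One call to weisurv per row of df; the equal-length results are
# simplified into a matrix with one column per parameter set
surv <- mapply(function(sc, shp) weisurv(t, sc, shp), df$sc, df$shp)
dim(surv)
# 100   4
```

Column j of surv is the survival curve for row j of df, so matplot(t, surv, type = "l") would give a quick look at all curves at once.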
Here is an example,
df <- data.frame(x = I(list(1:2, 3:4)))
x <- df[1,]
Now the following does not work,
df[2,] <- x
or
df[2,] <- I(x)
Warning message:
In `[<-.data.frame`(`*tmp*`, 2, , value = list(1:2)) :
replacement element 1 has 2 rows to replace 1 rows
How do I add more rows to a data frame that has a single list column?
I found the following after a few tries:
df[2,] <- list(x)
This adds a new row of list type.
It might be because you are using a list. If you set your data frame as:
df <- data.frame(rbind(c(1, 2), c(3, 4)))
then your code should work:
df <- data.frame(rbind(c(1, 2), c(3, 4))) # Make DF
x <- df[1,]
df[2,] <- x
print(df)
> df
X1 X2
1 1 2
2 1 2