Move a [-] symbol with condition - r

I'm still learning R, and you have all been very helpful with your educational answers.
So here is my issue. It might be very basic, but I tried solutions with sub, gsub and case_when and got no results. I have a column of numbers, some of which have a [-] sign on the right. If a value has the -, I would like to move it to the front.
col<- c("1.000","100-","12.000-","12.568-", "100","150","1.000.000-")
col2<-c("A","B","C","D","E","F","G")
A<-cbind(col2,col)
A<-as.data.frame(A)
Expected result:
col2<-c("A","B","C","D","E","F","G")
col<-c("1.000","-100","-12.000","-12.568", "100","150","-1.000.000")
A<-cbind(col2,col)
A<-as.data.frame(A)
Thanks in advance!

You could do:
sub("(.*)-$", "-\\1", A$col)
#> [1] "1.000" "-100" "-12.000" "-12.568" "100" "150"
#> [7] "-1.000.000"
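Here is a minimal sketch of the same sub() call written back into the data frame from the question. The ^ and $ anchors make the pattern match only a - at the very end, so values without a trailing minus are returned unchanged. The helper name move_trailing_minus is hypothetical, and the data frame is rebuilt with data.frame() instead of cbind() + as.data.frame().

```r
# Rebuild the question's data
col <- c("1.000", "100-", "12.000-", "12.568-", "100", "150", "1.000.000-")
col2 <- c("A", "B", "C", "D", "E", "F", "G")
A <- data.frame(col2, col)

# Hypothetical helper: capture everything before a trailing "-" and
# re-emit it with the "-" in front; non-matching strings pass through as-is
move_trailing_minus <- function(x) sub("^(.*)-$", "-\\1", x)

A$col <- move_trailing_minus(A$col)
print(A)
```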

You can also write an ifelse() that checks whether the last character of the string is a dash and, if so, pastes it in front (%>% and mutate come from dplyr, str_sub from stringr):
library(dplyr)
library(stringr)
A %>%
  mutate(col_edit = ifelse(str_sub(col, -1, -1) == "-",
                           paste0("-", str_sub(col, 1, -2)),
                           col))
col2 col col_edit
1 A 1.000 1.000
2 B 100- -100
3 C 12.000- -12.000
4 D 12.568- -12.568
5 E 100 100
6 F 150 150
7 G 1.000.000- -1.000.000
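If you would rather avoid the dplyr and stringr dependencies, the same ifelse() logic can be sketched in base R with endsWith() and substr(). This is a swapped-in base-R variant rather than the answer's own code, and it assumes R 3.3.0 or later, where endsWith() is available.

```r
col <- c("1.000", "100-", "12.000-", "12.568-", "100", "150", "1.000.000-")
col2 <- c("A", "B", "C", "D", "E", "F", "G")
A <- data.frame(col2, col, stringsAsFactors = FALSE)

# TRUE where the last character of the string is "-"
has_minus <- endsWith(A$col, "-")

# For those rows, drop the last character and prepend "-"; keep the others as-is
A$col_edit <- ifelse(has_minus,
                     paste0("-", substr(A$col, 1, nchar(A$col) - 1)),
                     A$col)
print(A)
```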

Using str_replace:
library(stringr)
A$col <- str_replace(A$col, "^(.*)-$", "-\\1")
A$col
#[1] "1.000" "-100" "-12.000" "-12.568" "100" "150" "-1.000.000"

Related

Extract info from filename

I wonder how I would extract the information below from the filename. The last 3 digits in the filename are the injection order, and the sample type comes right after "POS_". Any suggestions? Thanks!
df <- c("2018-03-04_B6W3_RN_POS_lQC09_098.mzML", "2018-03-05_B7W3_RN_POS_LVF957364573527_108.mzML", "2018-03-06_B8W3_RN_POS_sQC09_001.mzML")
df
[1] "2018-03-04_B6W3_RN_POS_lQC09_098.mzML" "2018-03-05_B7W3_RN_POS_LVF957364573527_108.mzML"
[3] "2018-03-06_B8W3_RN_POS_sQC09_001.mzML"
It should look like:
injection:
"098" "108" "001"
sample:
"lQC" "LVF" "sQC"
This solution is based on the stringr package and uses a positive lookahead, (?=\\.), as well as a positive lookbehind, (?<=POS_):
library(stringr)
dt <- data.frame(injection = str_extract(df, "\\d{3}(?=\\.)"),
                 sample = str_extract(df, "(?<=POS_)\\w{3}"))
dt
injection sample
1 098 lQC
2 108 LVF
3 001 sQC
Try this:
require(stringr)
df <- c("2018-03-04_B6W3_RN_POS_lQC09_098.mzML", "2018-03-05_B7W3_RN_POS_LVF957364573527_108.mzML", "2018-03-06_B8W3_RN_POS_sQC09_001.mzML")
df
# [1] "2018-03-04_B6W3_RN_POS_lQC09_098.mzML" "2018-03-05_B7W3_RN_POS_LVF957364573527_108.mzML"
# [3] "2018-03-06_B8W3_RN_POS_sQC09_001.mzML"
injection_str <- str_extract(df, "[0-9]{3}(?=\\.)")
injection_str
# [1] "098" "108" "001"
sample_str <- str_extract(df, "(?<=(POS_))[a-zA-Z0-9]{3}")
sample_str
# [1] "lQC" "LVF" "sQC"
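The same extraction can be sketched in base R with sub() and a capture group, without lookarounds. One difference worth knowing: sub() returns a non-matching string unchanged, whereas str_extract() returns NA, so this version assumes every filename follows the pattern shown.

```r
df <- c("2018-03-04_B6W3_RN_POS_lQC09_098.mzML",
        "2018-03-05_B7W3_RN_POS_LVF957364573527_108.mzML",
        "2018-03-06_B8W3_RN_POS_sQC09_001.mzML")

# Injection order: the three digits between the last "_" and ".mzML"
injection <- sub(".*_([0-9]{3})\\.mzML$", "\\1", df)

# Sample type: the first three alphanumeric characters after "POS_"
sample <- sub(".*POS_([[:alnum:]]{3}).*", "\\1", df)

data.frame(injection, sample)
```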

How to turn a table with strings into a list of vectors in R?

I have a dataset that looks like this:
> data.frame("letter" = letters, "words" = paste0(1:26,letters, letters,",", rev(letters),letters,5:26, ",", letters, 1:24, rev(letters)))
letter words
1 a 1aa,za5,a1z
2 b 2bb,yb6,b2y
3 c 3cc,xc7,c3x
4 d 4dd,wd8,d4w
5 e 5ee,ve9,e5v
...
And I would like to turn this table into
[[a]]
[1] "1aa" "za5" "a1z"
[[b]]
[1] "2bb" "yb6" "b2y"
[[c]]
[1] "3cc" "xc7" "c3x"
[[d]]
[1] "4dd" "wd8" "d4w"
[[e]]
[1] "5ee" "ve9" "e5v"
...
I have tried a for loop, which works, but it gets slower as the number of rows in the data frame grows. Is there a cleaner way to do this?
Your answer is much appreciated.
Thank you very much!!
The function strsplit is what you are looking for. Try:
df = data.frame("letter" = letters, "words" = paste0(1:26,letters, letters,",", rev(letters),letters,5:26, ",", letters, 1:24, rev(letters)))
strsplit(as.character(df$words),',',fixed= TRUE)
[[1]]
[1] "1aa" "za5" "a1z"
[[2]]
[1] "2bb" "yb6" "b2y"
[[3]]
[1] "3cc" "xc7" "c3x"
[[4]]
[1] "4dd" "wd8" "d4w"
[[5]]
[1] "5ee" "ve9" "e5v"
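strsplit() returns an unnamed list ([[1]], [[2]], and so on), while the question asks for elements named a, b, c. A small sketch, assuming the letter column holds the desired names, is to wrap the result in setNames():

```r
df <- data.frame("letter" = letters,
                 "words" = paste0(1:26, letters, letters, ",", rev(letters), letters, 5:26,
                                  ",", letters, 1:24, rev(letters)))

# Split each string on literal commas, then name the list by the letter column
out <- setNames(strsplit(as.character(df$words), ",", fixed = TRUE),
                as.character(df$letter))
out[1:2]
```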

Weird conversion from list to dataframe in R

I have a list that I created from a for loop and it looks like this:
I tried to convert it to a dataframe using the code:
dflist<- as.data.frame(mylist)
But my dataframe looks like this now:
I know I probably created my list wrong, but I think it is still salvageable if I can just convert the numbers to a data frame correctly.
My end goal is to plot the numbers against their index (1-30) and I thought creating a dataframe first to clean it up and then plot would be helpful.
Any help would be really appreciated. Thank you.
The data shown is a list. We can use unlist and create a data.frame. Based on the image in the OP's post, each list element has a length of 1, so unlist converts the list to a vector, which we then wrap in data.frame:
data.frame(ind= seq_along(lst), Col1= as.numeric(unlist(lst)))
Another option is stack(), after naming the list elements:
df1 <- transform(stack(setNames(lst, seq_along(lst))),
values = as.numeric(values))
This gives a two-column data frame, which can then be used for plotting.
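Since the OP's goal is to plot the values against their index (1-30), here is a minimal base-R sketch of that last step. The OP's actual list is only shown as an image, so lst below is a hypothetical stand-in of 30 length-one numeric elements generated with runif().

```r
# Hypothetical stand-in for the OP's list: 30 length-one numeric elements
set.seed(1)
lst <- as.list(round(runif(30), 3))

# One row per list element: its index and its value
df1 <- data.frame(ind = seq_along(lst), value = as.numeric(unlist(lst)))

# Plot each value against its index, with points joined by lines
plot(value ~ ind, data = df1, type = "b",
     xlab = "Index", ylab = "Value")
```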
As for the OP's approach of calling as.data.frame directly on the list: it behaves differently because it dispatches to as.data.frame.list. For example, calling as.data.frame on a vector uses as.data.frame.vector:
as.data.frame(1:5)
# 1:5
#1 1
#2 2
#3 3
#4 4
#5 5
But if we call as.data.frame.list directly:
as.data.frame.list(1:5)
# X1L X2L X3L X4L X5L
#1 1 2 3 4 5
we get a data.frame with n columns, one for each element of the vector.
The same happens with a list:
as.data.frame(as.list(1:5))
# X1L X2L X3L X4L X5L
#1 1 2 3 4 5
This dispatches to as.data.frame.list. To see all the methods available for as.data.frame:
methods('as.data.frame')
#[1] as.data.frame.aovproj* as.data.frame.array
# [3] as.data.frame.AsIs as.data.frame.character
# [5] as.data.frame.chron* as.data.frame.complex
# [7] as.data.frame.data.frame as.data.frame.data.table*
# [9] as.data.frame.Date as.data.frame.dates*
#[11] as.data.frame.default as.data.frame.difftime
#[13] as.data.frame.factor as.data.frame.ftable*
#[15] as.data.frame.function* as.data.frame.grouped_df*
#[17] as.data.frame.idf* as.data.frame.integer
#[19] as.data.frame.ITime* as.data.frame.list <-------
#[21] as.data.frame.logical as.data.frame.logLik*
#[23] as.data.frame.matrix as.data.frame.model.matrix
#[25] as.data.frame.noquote as.data.frame.numeric
#[27] as.data.frame.numeric_version as.data.frame.ordered
#[29] as.data.frame.POSIXct as.data.frame.POSIXlt
#[31] as.data.frame.raw as.data.frame.rowwise_df*
#[33] as.data.frame.table as.data.frame.tbl_cube*
#[35] as.data.frame.tbl_df* as.data.frame.tbl_dt*
#[37] as.data.frame.tbl_sql* as.data.frame.times*
#[39] as.data.frame.ts as.data.frame.vector
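To make the "weird" result concrete, here is a short sketch comparing the two shapes. The list is hypothetical (five length-one numbers rather than the OP's thirty), but it has the same structure: as.data.frame() turns each element into its own column, while unlisting first gives one row per element.

```r
# Hypothetical list shaped like the OP's: five length-one numeric elements
lst <- list(0.5, 1.2, 0.8, 2.1, 1.7)

# as.data.frame() dispatches to as.data.frame.list(): each element becomes
# its own column, so the result is 1 row by 5 columns
wide <- as.data.frame(lst)
dim(wide)

# unlist() first gives one value per row instead: 5 rows by 2 columns
long <- data.frame(ind = seq_along(lst), value = unlist(lst))
dim(long)
```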

Sorting a key,value list in R by value

Given a list of animals, call it m, which contains:
$bob
[1] 3
$ryan
[1] 4
$dan
[1] 1
How can I sort this list by its numeric values?
Basically, I'd like my code to look something like this:
m=sort(m,sortbynumber)
$ryan
[1] 4
$bob
[1] 3
$dan
[1] 1
Unfortunately I can't figure this out, though it seems like it should have a simple solution.
You can try order():
m[order(-unlist(m))]
#$ryan
#[1] 4
#$bob
#[1] 3
#$dan
#[1] 1
A slightly more efficient option is the decreasing = TRUE argument of order (from @nicola's comment):
m[order(unlist(m), decreasing=TRUE)]
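Another base-R sketch sorts the unlisted values directly and converts back to a list. It assumes, as in the question, that every element is a single number; unlist() would otherwise flatten longer elements and break the name-to-value pairing.

```r
m <- list(bob = 3, ryan = 4, dan = 1)

# unlist() gives a named numeric vector; sort() keeps the names,
# and as.list() turns the sorted vector back into a named list
m_sorted <- as.list(sort(unlist(m), decreasing = TRUE))
m_sorted
```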
Another option uses the hashmap package:
library(hashmap)
a1<-hashmap("hello",1)
a1$insert("hello1",4)
a1$insert("hello2",2)
a1$insert("hello3",3)
sort(a1$data(),decreasing = TRUE)
#OUTPUT
hello1 hello3 hello2 hello
4 3 2 1

How to format numbers in R, specifying the number of significant digits but keep significant zeroes and integer part?

I've been struggling with formatting numbers in R using what I feel are very sensible rules. What I would want is to specify a number of significant digits (say 3), keep significant zeroes, and also keep all digits before the decimal point, some examples (with 3 significant digits):
1.23456 -> "1.23"
12.3456 -> "12.3"
123.456 -> "123"
1234.56 -> "1235"
12345.6 -> "12346"
1.50000 -> "1.50"
1.49999 -> "1.50"
Is there a function in R that does this kind of formatting? If not, how could it be done?
I feel these are quite sensible formatting rules, yet I have not managed to find a function in R that formats this way. As far as I can tell from searching, this is not a duplicate of the many similar questions such as this one.
Edit:
Inspired by the two good answers I put together a function myself that I believe works for all cases:
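sign_digits <- function(x, d) {
  # x is expected to be a single number (the if() below is not vectorised)
  s <- format(x, digits = d)
  # Only pad when there is a decimal point and no scientific notation
  if (grepl("\\.", s) && !grepl("e", s)) {
    # Count digits shown: total characters minus the larger of 1 (for the
    # decimal point) or the length of the leading run of "-", "0" and "."
    n_sign_digits <- nchar(s) -
      max(grepl("\\.", s), attr(regexpr("(^[-0.]*)", s), "match.length"))
    # Append trailing zeros until d significant digits are shown
    n_zeros <- max(0, d - n_sign_digits)
    s <- paste(s, paste(rep("0", n_zeros), collapse = ""), sep = "")
  }
  s
}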
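format(num, digits = 3) comes very close. (The second positional argument of format() is trim, not digits, so digits has to be named.)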
format(1.23456,digits=3)
# [1] "1.23"
format(12.3456,digits=3)
# [1] "12.3"
format(123.456,digits=3)
# [1] "123"
format(1234.56,digits=3)
# [1] "1235"
format(12345.6,digits=3)
# [1] "12346"
format(1.5000,digits=3)
# [1] "1.5"
format(1.4999,digits=3)
# [1] "1.5"
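One detail worth noting before the helper functions below: they all call format() on one number at a time. A short sketch of why that matters: on a whole vector, format() formats the values jointly, using enough decimals for the smallest value to show the requested significant digits and padding everything to a common width.

```r
x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5000, 1.4999)

# Whole vector at once: one shared number of decimals, padded to a common width
joint <- format(x, digits = 3)
joint

# One element at a time: each value gets its own formatting
each <- vapply(x, format, character(1), digits = 3)
each
```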
Your rules are not actually internally consistent. You want 1234.56 to round down to 1234, yet you want 1.4999 to round up to 1.5.
EDIT: This appears to deal with the very valid point made by @Henrik.
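sigDigits <- function(x, d) {
  z <- format(x, digits = d)
  # Values without a decimal point (e.g. "1235") are returned unchanged
  if (!grepl("[.]", z)) return(z)
  require(stringr)
  # Otherwise pad with zeros on the right to a width of d + 1 characters
  return(str_pad(z, d + 1, "right", "0"))
}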
z <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5000, 1.4999)
sapply(z,sigDigits,d=3)
# [1] "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
As @jlhoward points out, your rounding rule is not consistent, so you could use a conditional statement:
x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.50000, 1.49999)
ifelse(x >= 100, sprintf("%.0f", x), ifelse(x < 100 & x >= 10, sprintf("%.1f", x), sprintf("%.2f", x)))
# "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
It's hard to tell the intended usage, but consistent rounding might be better. Exponential notation is one option:
sprintf("%.2e", x)
[1] "1.23e+00" "1.23e+01" "1.23e+02" "1.23e+03" "1.23e+04" "1.50e+00" "1.50e+00"
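# \(x, y) is the R >= 4.1 shorthand for function(x, y)
sig0 <- \(x, y) {
  # Decimal places needed to show y significant digits (0 for large numbers)
  dig <- abs(pmin(0, floor(log10(abs(x))) - y + 1))
  # log10(0) is -Inf, so zeros get y - 1 decimal places instead
  dig[is.infinite(dig)] <- y - 1
  sprintf(paste0("%.", dig, "f"), x)
}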
> v=c(1111,111.11,11.1,1.1,1.99,.01,.001,0,-.11,-.9,-.000011)
> paste(sig0(v,2),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0.0 -0.11 -0.90 -0.000011"
Or the following is almost the same, except that 0 is converted to 0 rather than 0.0 (fg is a special version of f in which digits specifies significant digits rather than digits after the decimal point, and the # flag stops fg from dropping trailing zeros; the second positional argument of formatC is width, so digits is named here):
> paste(sub("\\.$", "", formatC(v, digits = 2, format = "fg", flag = "#")), collapse = " ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0 -0.11 -0.90 -0.000011"

Resources