Remove part of string from column names [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
Here is the data:
structure(list(Fasta.headers = c("Person01050.1", "Person01080.1",
"Person01090.1", "Person01100.4", "Person01140.1", "Person01220.1"),
ToRemove.Gr_1 = c(0, 1107200, 17096000, 0, 0, 0), ToRemove.Gr_10 = c(0,
37259000, 1104800000, 783870, 0, 1308600), ToRemove.Gr_11 = c(1835800,
53909000, 623960000, 0, 0, 0), ToRemove.Gr_12 = c(0, 19117000,
808600000, 0, 0, 719400), ToRemove.Gr_13 = c(2544200, 2461400,
418770000, 0, 0, 0), ToRemove.Gr_14 = c(5120400, 1373700,
117330000, 0, 0, 0), ToRemove.Gr_15 = c(6623500, 0, 73336000,
0, 0, 0), ToRemove.Gr_16 = c(0, 0, 31761000, 0, 0, 0), ToRemove.Gr_17 = c(13475000,
0, 29387000, 0, 0, 0), ToRemove.Gr_18 = c(7883300, 0, 27476000,
0, 0, 0), ToRemove.Gr_19 = c(82339000, 3254700, 50825000,
0, 0, 0), ToRemove.Gr_2 = c(1584100, 84847000, 5219500000,
6860700, 0, 8337700), ToRemove.Gr_20 = c(205860000, 0, 67685000,
0, 0, 0), ToRemove.Gr_21 = c(867120000, 1984400, 2.26e+08,
0, 0, 10502000)), .Names = c("Fasta.headers", "ToRemove.Gr_1",
"ToRemove.Gr_10", "ToRemove.Gr_11", "ToRemove.Gr_12", "ToRemove.Gr_13",
"ToRemove.Gr_14", "ToRemove.Gr_15", "ToRemove.Gr_16", "ToRemove.Gr_17",
"ToRemove.Gr_18", "ToRemove.Gr_19", "ToRemove.Gr_2", "ToRemove.Gr_20",
"ToRemove.Gr_21"), row.names = c(NA, 6L), class = "data.frame")
As the column names already suggest, the "ToRemove" part should be removed from each name so that only Gr_* remains.
I would appreciate two solutions to this problem: one that deletes part of a column name based on a given string, and one that works from a specific character such as ".", removing everything before or after the dot.

We can use sub to remove everything up to and including the dot:
names(df1)[-1] <- sub(".*\\.", "", names(df1)[-1])
If we need to keep the . as well, use "." as the replacement:
names(df1)[-1] <- sub(".*\\.", ".", names(df1)[-1])
To match the pattern exactly, we can also match zero or more characters that are not a dot ([^.]*) from the start (^) of the string, followed by a dot (\\., escaped because an unescaped dot is a metacharacter that matches any character), and replace the match with an empty string (""):
sub("^[^.]*\\.", "", names(df1)[-1])
#[1] "Gr_1" "Gr_10" "Gr_11" "Gr_12" "Gr_13" "Gr_14" "Gr_15" "Gr_16"
#[9] "Gr_17" "Gr_18" "Gr_19" "Gr_2" "Gr_20" "Gr_21"
Since the prefix 'ToRemove' is already known, we can also match it as a fixed string:
sub("ToRemove.", "", names(df1)[-1], fixed = TRUE)
Also, if we need to remove the . and all characters after it (keeping only the part before the dot):
sub("\\..*", "", names(df1)[-1])
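To compare the variants side by side, here is a minimal sketch on a small stand-in data frame. The df1 below is a hypothetical two-row, three-column cut of the question's data, not the full dput.

```r
# Hypothetical miniature of the question's data frame
df1 <- data.frame(
  Fasta.headers  = c("Person01050.1", "Person01080.1"),
  ToRemove.Gr_1  = c(0, 1107200),
  ToRemove.Gr_10 = c(0, 37259000)
)

old <- names(df1)[-1]                                  # skip the first column

after_dot  <- sub("^[^.]*\\.", "", old)                # drop everything up to the first dot
before_dot <- sub("\\..*", "", old)                    # drop the first dot and everything after it
no_prefix  <- sub("ToRemove.", "", old, fixed = TRUE)  # drop a known literal prefix

names(df1)[-1] <- after_dot
names(df1)
#[1] "Fasta.headers" "Gr_1"          "Gr_10"
```

For names with a single dot, ".*\\." and "^[^.]*\\." give the same result. They differ when a name contains several dots: the first is greedy and strips up to the last dot, the second stops at the first dot.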

Related

How can I condense a long list of items into categories for a repeated logit regression?

I'm using a program called Apollo to make an ordered logit model. In this model, you have to specify a list of variables like this:
apollo_beta = c(
b_var1_dum1 = 0,
b_var1_dum2 = 0,
b_var1_dum3 = 0,
b_var2_dum1 = 0,
b_var2_dum2 = 0,
b_var3_dum1 = 0,
b_var3_dum2 = 0,
b_var3_dum3 = 0,
b_var3_dum4 = 0)
I want to do two things:
Firstly, I want to be able to specify these beforehand:
specification1 = c(
b_var1_dum1 = 0,
b_var1_dum2 = 0,
b_var1_dum3 = 0,
b_var2_dum1 = 0,
b_var2_dum2 = 0,
b_var3_dum1 = 0,
b_var3_dum2 = 0,
b_var3_dum3 = 0,
b_var3_dum4 = 0)
And then be able to call it:
apollo_beta = specification1
Secondly, I want to be able to make categories:
var1 <- c(
b_var1_dum1 = 0,
b_var1_dum2 = 0,
b_var1_dum3 = 0)
var2 <- c(
b_var2_dum1 = 0,
b_var2_dum2 = 0)
var3 <- c(
b_var3_dum1 = 0,
b_var3_dum2 = 0,
b_var3_dum3 = 0,
b_var3_dum4 = 0)
And then be able to use those in the specification:
specification1 = c(
var1,
var2,
var3)
And then:
apollo_beta = specification1
I know you might not have the best knowledge of the very niche programme Apollo. I am not quite sure if this is even possible, but since it would save me days (maybe weeks) of work, can anyone give me a hint on what I might be doing wrong? I worry I have a list within a list.
Since I have to make 60 specifications of the same model with different variations of 6 variables, it would be a lot of code and lot of work if I can't shorten it like this.
Any tips would be greatly appreciated.
Data:
df <- data.frame(
var1_dum1 = c(0, 1, 0),
var1_dum2 = c(1, 0, 0),
var1_dum3 = c(0, 0, 1),
var2_dum1 = c(0, 1, 0),
var2_dum2 = c(1, 0, 0),
var3_dum1 = c(1, 1, 0),
var3_dum2 = c(1, 0, 0),
var3_dum3 = c(0, 1, 0),
var3_dum4 = c(0, 0, 1)
)
So there is a dataset with these variables. In apollo you specify "database = df" first, so it already refers to the variables.
In the list of apollo_beta, it doesn't refer to the variables directly, so technically you can call it what you want. I just want to call it the same as the variables as I will refer to them later.
My question is simple: can I condense the long list to simply say "specification1"? It's just a question about the R language, namely whether the items of the list will function the same way as when they were written out in the code.
In other words, would calling apollo_beta in the above three examples lead to the same result? If not, how do I change the code so that it does lead to the same?
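As far as plain R is concerned, this can be checked directly. The sketch below assumes only base R and the names used in the question; it does not test anything on the Apollo side. c() applied to named numeric vectors returns one flat named vector, not a list within a list.

```r
# Categories, as in the question
var1 <- c(b_var1_dum1 = 0, b_var1_dum2 = 0, b_var1_dum3 = 0)
var2 <- c(b_var2_dum1 = 0, b_var2_dum2 = 0)
var3 <- c(b_var3_dum1 = 0, b_var3_dum2 = 0, b_var3_dum3 = 0, b_var3_dum4 = 0)

# The same parameters written out in one long call
written_out <- c(
  b_var1_dum1 = 0, b_var1_dum2 = 0, b_var1_dum3 = 0,
  b_var2_dum1 = 0, b_var2_dum2 = 0,
  b_var3_dum1 = 0, b_var3_dum2 = 0, b_var3_dum3 = 0, b_var3_dum4 = 0
)

specification1 <- c(var1, var2, var3)   # c() flattens the three vectors
apollo_beta <- specification1

identical(apollo_beta, written_out)     # TRUE
is.list(apollo_beta)                    # FALSE: still a plain named numeric vector
```

Nesting would only appear if the pieces were combined with list() instead of c().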

gt table how to add a border to column by spanner column label?

I want to add a border to the left of each group of columns that share the same spanner column label, and I don't know how to do it.
I tried this:
%>%tab_style(
style = list(
cell_borders(
sides = "left",
color = "black",
weight = px(3)
)
),
locations = cells_column_spanners(everything()))
but it only adds the border to the spanner label itself, not to the entire column.
Do you have any idea how to do it?
I have the result on the top and I want the result on the bottom:
Thanks for your help!
Example data (the gt format produces a lot of lines, so I cannot put them here):
x<-structure(list(A = c("1", "2", "3"),
`ONE||N` = c(0, 0, 0), `ONE||%` = c(0, 0, 0), `TWO||N` = c(0,
0, 0), `TWO||%` = c(0, 0, 0), `THREE||N` = c(0, 0, 0), `THREE||%` = c(0,
0, 0), `THREE||Δ` = c(0, 0, 0), `FOUR||N` = c(0, 0, 0),
`FOUR||%` = c(0, 0, 0), `TOTAL||%` = c(0, 0, 0)))

I didn't manage to do it by spanner label, so I did it with the column labels instead. If someone knows how to add a style to a group of columns (the entire columns, not only the top) by their spanner label, feel free to share! :)

Count number of columns that are not zero in a data frame [duplicate]

This question already has answers here:
Error: `n()` must only be used inside dplyr verbs
(3 answers)
Closed 1 year ago.
I have written the following script to sum the rows of a data frame and count, for each row, the number of columns that are not zero. Suddenly my script stopped working and I am not sure what the error is.
test <- structure(list(col1 = c(0.126331200264469, 0, 0, 0, 0), col2 = c(0,
0, 0, 0, 0), col3 = c(0, 0, 0, 0, 0), col4 = c(0, 0, 0, 0, 0),
col5 = c(0, 0, 0, 0, 0)), row.names = c("row1", "row2", "row3",
"row4", "row5"), class = "data.frame")
script:
test.out <- test %>%
mutate(Not_Present = across(everything(), ~ . == 0) %>%
reduce(`+`), Present = ncol(test)- Not_Present)
error:
Error: `across()` must only be used inside dplyr verbs.
Run `rlang::last_error()` to see where the error occurred.

Another option is using rowSums
library(dplyr)
test %>%
mutate(Not_Present = rowSums(across(everything()) == 0),
Present = ncol(test) - Not_Present)
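If the dplyr error persists, the same two columns can be computed without across() at all. This is a minimal base R sketch; the test data frame here is a rounded, hypothetical copy of the one in the question.

```r
# Rounded stand-in for the question's data
test <- data.frame(
  col1 = c(0.126, 0, 0, 0, 0),
  col2 = 0, col3 = 0, col4 = 0, col5 = 0,
  row.names = paste0("row", 1:5)
)

test.out <- test
test.out$Not_Present <- rowSums(test == 0)                    # zero cells per row
test.out$Present     <- ncol(test) - test.out$Not_Present     # non-zero cells per row
test.out
```

Here row1 has one non-zero column, so Present is 1 for it and 0 for the other rows.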

If it helps in any way for further work, I would just go with:
test.out <- sum(apply(test !=0, 2, any))

Is there a way to count occurrences of a specific value for unique columns in a dataframe in R?

I am relatively new to R and have a dataframe (cn_data2) with several duplicated columns. It looks something like this:
Gene breast_cancer breast_cancer breast_cancer lung_cancer lung_cancer
myc 1 0 1 1 2
ARID1A 0 2 1 1 0
Essentially, the rows are genes and the columns are different types of cancers. What I want is to find, for each gene, the number of times a value (0, 1, or 2) occurs for each unique cancer type.
I have tried several things but haven't been able to achieve what I want. For example, cn_data2$count1 <- rowSums(cn_data == '1') gives me a column with the number of "1" for each gene, but what I want is the number of "1" for each individual disease.
Hope my question is clear! I appreciate any help, thank you!
structure(list(gene1 = structure(1:6, .Label = c("ACAP3", "ACTRT2",
"AGRN", "ANKRD65", "ATAD3A", "ATAD3B"), class = "factor"), glioblastoma_multiforme_Primary_Tumor = c(0,
0, 0, 0, 0, 0), glioblastoma_multiforme_Primary_Tumor.1 = c(-1,
-1, -1, -1, -1, -1), glioblastoma_multiforme_Primary_Tumor.2 = c(0,
0, 0, 0, 0, 0), glioblastoma_multiforme_Primary_Tumor.3 = c(2,
2, 2, 2, 2, 2), glioblastoma_multiforme_Primary_Tumor.4 = c(0,
0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 6L))
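One possible approach, sketched in base R: recover the cancer type from each column name, then count matches row-wise within each type. This assumes the duplicated columns carry the numeric suffixes (.1, .2, ...) seen in the dput above; cn and count_value are hypothetical names, and the data mirrors the small table in the question.

```r
# Hypothetical stand-in matching the table in the question
cn <- data.frame(
  Gene            = c("myc", "ARID1A"),
  breast_cancer   = c(1, 0),
  breast_cancer.1 = c(0, 2),
  breast_cancer.2 = c(1, 1),
  lung_cancer     = c(1, 1),
  lung_cancer.1   = c(2, 0)
)

vals <- as.matrix(cn[-1])                        # numeric part only
type <- sub("\\.[0-9]+$", "", colnames(vals))    # strip the ".1", ".2" suffixes

# For one value, count its occurrences per gene within each cancer type
count_value <- function(value) {
  out <- sapply(unique(type), function(tp) {
    rowSums(vals[, type == tp, drop = FALSE] == value)
  })
  rownames(out) <- cn$Gene
  out
}

count_value(1)
#        breast_cancer lung_cancer
# myc                2           1
# ARID1A             1           1
```

A cancer type whose real name ends in a dot followed by digits would be grouped incorrectly by this suffix rule, and with a single-row data frame sapply returns a vector rather than a matrix, so the rownames step would need adjusting.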

Can I convert a certain string upon reading with fread in R data.table?

I am trying to use R's data.table package to read a large data set (~800k rows). The set contains results from a simulation of 1000 scenarios (+ a scenario "0" - so 1,001 scenarios in total) and one of the columns, "ScenId", contains the number of the scenario, e.g. 0,1,2,..
The problem is the program used to output this txt file cannot name scenario 1000 as '1000', but uses 'AAA' instead. The column 'ScenId' thus contains only numbers, apart from the value 'AAA'.
I am trying to find a solution to convert 'AAA' to 1000 preferably within the fread command.
My current workaround is using na.strings = "AAA" in fread and then replacing the NA's with 1000, after reading is complete. This works well because those are the only NA instances in the data set.
However, I was hoping for a quicker / more elegant solution, i.e. to do this within the fread command.
Any help / advice will be much appreciated.
Later edit: an attempt at posting sample data.
structure(list(ScenId = "AAA", SensId = "_", SystemProd = "ZCPP__",
AssumClass = "SPLPSV", ProjPer = 40L, ProjMode = "Annual",
VarName = "belLUL", Description = "(BEL)",
Module = "MLIAB", FormType = "inv", Group = "calc.BEL", Width = 12L,
Decimals = 2L, Scale = "Yes", Value000 = 0, Value001 = 0,
Value002 = 0, Value003 = 0, Value004 = 0, Value005 = 0, Value006 = 0,
Value007 = 0, Value008 = 0, Value009 = 0, Value010 = 0, Value011 = 0,
Value012 = 0, Value013 = 0, Value014 = 0, Value015 = 0, Value016 = 0,
Value017 = 0, Value018 = 0, Value019 = 0, Value020 = 0, Value021 = 0,
Value022 = 0, Value023 = 0, Value024 = 0, Value025 = 0, Value026 = 0,
Value027 = 0, Value028 = 0, Value029 = 0, Value030 = 0, Value031 = 0,
Value032 = 0, Value033 = 0, Value034 = 0, Value035 = 0, Value036 = 0,
Value037 = 0, Value038 = 0, Value039 = 0, Value040 = 0), .Names = c("ScenId",
"SensId", "SystemProd", "AssumClass", "ProjPer", "ProjMode",
"VarName", "Description", "Module", "FormType", "Group", "Width",
"Decimals", "Scale", "Value000", "Value001", "Value002", "Value003",
"Value004", "Value005", "Value006", "Value007", "Value008", "Value009",
"Value010", "Value011", "Value012", "Value013", "Value014", "Value015",
"Value016", "Value017", "Value018", "Value019", "Value020", "Value021",
"Value022", "Value023", "Value024", "Value025", "Value026", "Value027",
"Value028", "Value029", "Value030", "Value031", "Value032", "Value033",
"Value034", "Value035", "Value036", "Value037", "Value038", "Value039",
"Value040"), class = c("data.table", "data.frame"), row.names = c(NA,
-1L), .internal.selfref = <pointer: 0x0000000000310788>)
This is just one line of my data set. Hope this makes sense.
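An alternative to the na.strings workaround, sketched under the assumption that the data.table package is available: read ScenId as character so that 'AAA' survives, then patch and convert the column by reference right after the read. The txt string is a hypothetical three-row stand-in for the real file.

```r
library(data.table)

# Hypothetical miniature of the real file
txt <- "ScenId,Value000\n0,1.5\n999,2.5\nAAA,3.5\n"

# Read ScenId as text, then replace and convert in place
dt <- fread(text = txt, colClasses = list(character = "ScenId"))
dt[ScenId == "AAA", ScenId := "1000"]
dt[, ScenId := as.integer(ScenId)]

dt$ScenId
#[1]    0  999 1000
```

This is still a step after fread rather than inside it, but it does not depend on 'AAA' being the only source of NA values in the data.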
