Extracting substrings into separate columns, with blank cells [duplicate] - r

This question already has answers here:
Split data frame string column into multiple columns
(16 answers)
Split different lengths values and bind to columns
(2 answers)
Closed 5 years ago.
I have a column of alphanumeric strings that look like this: K*01:01+K*01:02:01:01, K*77:01:08+K*01:02:01:22, K*10:01:77. I want to split each value on the + sign and put the part before it and the part after it into two separate columns. I want my output to look like this:
column1 = K*01:01, K*77:01:08, K*10:01:77
column2 = K*01:02:01:01, K*01:02:01:22, (blank)
I tried mydata$column1 <- sub("(.*?)\\+.*", "\\1", mydata$merged), which works fine. But when I used mydata$column2 <- sub(".*\\+(.*?)", "\\1", mydata$merged), the value for ID 3, K*10:01:77, is extracted into both columns 1 and 2. I want column 2 to show a blank/empty cell for strings that do not contain the + delimiter. I also want the new columns to appear in the current data frame, next to the original merged column, so packages like stringr do not work for me.
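The behaviour described above follows from how sub() works: when the pattern does not match, the input string is returned unchanged, so a value without a + ends up in both columns. A minimal base R sketch of one way around this, assuming a data frame shaped like the one in the question (the id column and the has_plus helper are illustrative names, not from the original post):

```r
# Example data; the 'id' column is an assumption for illustration
mydata <- data.frame(
  id = 1:3,
  merged = c("K*01:01+K*01:02:01:01", "K*77:01:08+K*01:02:01:22", "K*10:01:77"),
  stringsAsFactors = FALSE
)

# TRUE where the string contains a literal "+"
has_plus <- grepl("+", mydata$merged, fixed = TRUE)

# Everything before the first "+"; strings without "+" are kept whole
mydata$column1 <- sub("\\+.*$", "", mydata$merged)

# Everything after the first "+"; an empty string where there is no "+"
mydata$column2 <- ifelse(has_plus, sub("^[^+]*\\+", "", mydata$merged), "")

print(mydata)
```

Because the new columns are assigned with $, they are appended to the existing data frame alongside merged rather than returned as a separate object.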

Related

Trouble converting Values in Column into Row Names of Data Frame in R [duplicate]

This question already has answers here:
Why am I getting X. in my column names when reading a data frame?
(5 answers)
data.frame without ruining column names
(2 answers)
Closed last month.
I am trying to use the first column of a data frame as its row names.
It works, but the column names of the data frame change!
They change from names like 100-21-0 to X100.21.0.
The first column holds character values: Code, CBT, DQY, DQX, etc.
The column names (or the first row?) of the data frame (type double) look like: Code, 100-21-0, 1002-84-2, 100-47-0, etc.
Code  100-21-0
CBT   0
DQY   1
I am using code similar to:
newdataframe <- data.frame(dataframe, row.names = 1)
It works, but the column names of the data frame change from 100-21-0, 1002-84-2, 100-47-0 to X100.21.0, X1002.84.2, X100.47.0.
I am confused as to why. Can anyone help with this?
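The renaming comes from data.frame()'s check.names argument, which defaults to TRUE and passes column names through make.names(); names that are not syntactically valid (starting with a digit, containing -) get an X prefix and dots. A minimal sketch, using a small made-up data frame in place of the asker's data, that keeps the names by setting check.names = FALSE:

```r
# Made-up stand-in for the asker's data; the values are illustrative only
dataframe <- data.frame(
  Code = c("CBT", "DQY"),
  `100-21-0` = c(0, 1),
  `1002-84-2` = c(1, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same call as in the question, but without name mangling
newdataframe <- data.frame(dataframe, row.names = 1, check.names = FALSE)

print(names(newdataframe))
print(rownames(newdataframe))
```

The same argument exists in read.csv() and read.table(), so if the names are already mangled at import time, check.names = FALSE is needed there as well.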

operating R dataframe using variables for the column name [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 8 months ago.
I want to store a column name in a variable and operate on a data frame based on that column name.
For example, say I have two columns named car_sales and airplane_sales, and a variable var that a user sets to, say, car_sales. I then calculate a new column like so:
calc_col <- paste0(var,"_delta")
df$calc_col <- abs(df$var - lag(df$var ,12))
The value of var will change based on user input, so the name of the resulting column will change as well.
How do I do this in R?
You could use:
df[[calc_col]] <- abs(df[[var]] - lag(df[[var]], 12))
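The [[ ]] form works because it evaluates its argument, whereas df$var looks for a column literally named var. Here is a small self-contained sketch of the same idea. It assumes the lag in the question is dplyr::lag (base stats::lag shifts a series' time base rather than the values of a plain vector), and substitutes a tiny base R helper, lag_vec, which is a hypothetical name introduced only for this example. A lag of 2 is used instead of 12 to keep the data short.

```r
# Hypothetical helper: shift a vector down by n positions, padding with NA
lag_vec <- function(x, n) c(rep(NA, n), head(x, -n))

# Made-up example data
df <- data.frame(
  car_sales = c(10, 12, 9, 15, 20),
  airplane_sales = c(1, 2, 3, 4, 5)
)

var <- "car_sales"                  # would come from user input
calc_col <- paste0(var, "_delta")   # "car_sales_delta"

# [[ ]] evaluates var and calc_col, so both names are resolved dynamically
df[[calc_col]] <- abs(df[[var]] - lag_vec(df[[var]], 2))

print(df)
```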

Automate data frame width in R [duplicate]

This question already has answers here:
how to remove multiple columns in r dataframe?
(8 answers)
Select column 2 to last column in R
(4 answers)
Closed 2 years ago.
I have a data frame that I import from Excel each week with 40+ columns. The number of columns differs from week to week, but I am only interested in the first 40. I take the data frame, drop the columns after the 40th, and rbind it to another data frame (which has 40 columns).
The code I use to drop the columns is:
df = df[ -c(40:45) ] #assume df has 45 columns this week.
I would like to automate the number of columns to drop, along the lines of length(df$x). I tried width(df), but it doesn't seem to work.
Any ideas please?
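The number of columns in a data frame is given by ncol(df), but it is usually simpler to select the columns to keep than to compute which ones to drop. A minimal sketch with a made-up 45-column data frame (note that -c(40:45) as written above would also drop column 40):

```r
# Made-up data frame with 45 columns, named V1..V45
df <- as.data.frame(matrix(seq_len(2 * 45), nrow = 2))

# Keep the first 40 columns, or all of them if there are fewer than 40
keep <- seq_len(min(40, ncol(df)))
df40 <- df[, keep, drop = FALSE]

print(ncol(df40))
```

Selecting by position this way needs no change from week to week, whatever the width of the imported sheet.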

How to merge two data frames in R with same column name but different key values [duplicate]

This question already has answers here:
How to merge two data frame based on partial string match with R?
(2 answers)
Merging two Data Frames using Fuzzy/Approximate String Matching in R
(4 answers)
Closed 3 years ago.
I have two data frames. The first has 7 columns and the second has 2 columns. The common column name is "flavor". I want to merge these two data frames, but the problem is that the "flavor" column of the second data frame contains only a subset of each value in the first data frame's "flavor" column. For example,
df1$flavor <- c("mango|grapes|watermelon", "coffee|tea", "beer|alcohol|wine")
df2$flavor <- c("mango", "tea", "wine")
When I try to merge the two data frames, I lose some rows. How can I merge both data frames without affecting the rows and columns?
You can use the "fuzzyjoin" package, which lets you match on partial strings. Let's say you have df1 (7 columns) and df2 (2 columns):
library(dplyr)      # provides the %>% pipe
library(fuzzyjoin)
# each df2$flavor value is treated as a regular expression matched against df1$flavor
df1 %>% regex_inner_join(df2, by = c(flavor = "flavor"))
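For comparison, the same kind of partial match can be sketched in base R without extra packages. This is not the fuzzyjoin approach above but a swapped-in alternative: it splits each df1$flavor on | and looks up the first df2 flavor found among the pieces. The id and score columns are made up for illustration, since the question does not show the other columns.

```r
# Made-up stand-ins for the two data frames; 'id' and 'score' are assumptions
df1 <- data.frame(
  id = 1:3,
  flavor = c("mango|grapes|watermelon", "coffee|tea", "beer|alcohol|wine"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  flavor = c("mango", "tea", "wine"),
  score = c(5, 3, 4),
  stringsAsFactors = FALSE
)

# For each df1 row, index of the first df2 flavor present in its list (NA if none)
idx <- vapply(
  strsplit(df1$flavor, "|", fixed = TRUE),
  function(parts) match(TRUE, df2$flavor %in% parts),
  integer(1)
)

# Every df1 row is kept; rows without a match get NA in the new column
merged <- df1
merged$score <- df2$score[idx]

print(merged)
```

Because the lookup is by index rather than by join, no df1 row is dropped, which behaves like a left join.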

Split data frame elements with semicolon in R [duplicate]

This question already has answers here:
Split delimited strings in a column and insert as new rows [duplicate]
(6 answers)
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I've tried to create a function, using base R, that replaces semicolon-containing elements in a data frame column with the split entries, which are placed at the bottom of the column. The main purpose is to use this function with apply and add the entries whenever an element containing a semicolon is detected.
The main problem with my code is that it returns exactly the same data frame, without any additional values.
> df
rs2480711
rs74832092
rs4648658
rs4648659
rs61763535
rs28733941;rs67677371
> x
"rs28733941;rs67677371"
function(x){
  semiCols = length(unlist(strsplit(x, ";")))
  elementsRs = unlist(strsplit(x, ";"))
  if(semiCols>1){
    for(i in 1:semiCols){
      df = rbind(df, elementsRs[i])
    }
  }
}
I would also like to know how I can extend the code to split rows based on one value while leaving all the others unchanged. For example, this
> df
0 rs61763535 T1
1 rs28733941;rs67677371 T2
will look like this
> df2
0 rs61763535 T1
1 rs28733941 T2
1 rs67677371 T2
If I understood correctly, this will work:
unlist(strsplit(as.character(df$V1),split = ";"))
Again, I am not sure I understood you properly, but maybe you are looking for this:
apply(df,2,function(t) unlist(strsplit(as.character(t),split = ";")))
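For the second part of the question (splitting one column into extra rows while repeating the other columns), a base R sketch is given below. It assumes the columns are named V1, V2 and V3, which the question does not state. It also sidesteps the issue in the asker's function, where df is modified only as a local copy inside the function and nothing is returned.

```r
# Stand-in for the asker's data; column names V1..V3 are an assumption
df <- data.frame(
  V1 = c(0, 1),
  V2 = c("rs61763535", "rs28733941;rs67677371"),
  V3 = c("T1", "T2"),
  stringsAsFactors = FALSE
)

# Split each entry on ";" into a list of character vectors
parts <- strsplit(df$V2, ";", fixed = TRUE)

# Repeat each row once per piece, then fill in the individual pieces
df2 <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
df2$V2 <- unlist(parts)
rownames(df2) <- NULL

print(df2)
```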
