R combine / merge two columns to one column

I have two columns (class: character) in a data.frame that contain large numbers (e.g. column A: 999967258082532415; column B: 999967258082532415). I want a new column C that concatenates the two numbers: 999967258082532415999967258082532415
I use:
data_1$visit_id <- do.call(paste, c(data_1[c("post_visid_high", "post_visid_low")], sep = ""))
But my new column gets converted to factor, and I want it to stay character. What can I do?

I created a sample dataset that resembles yours:
# IDs are stored as character: numbers this large lose precision as doubles
df <- data.frame(col_A = c("2314325435454354", "123098213728903214", "12329042374094"),
                 col_B = c("9034832054097390485", "30945743504375043", "234903284304"),
                 stringsAsFactors = FALSE)
Using dplyr, create a new column (col_C) that concatenates the other two columns with paste0(). Because both inputs are character, the result is character as well (note that col_A + col_B would add the values, not concatenate them):
library(dplyr)
df <- df %>%
  mutate(col_C = paste0(col_A, col_B))
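For the column names used in the question, a base-R version could look like the sketch below. The data values are hypothetical stand-ins, and the as.character() calls are only a precaution in case the ID columns were read in as factors.

```r
# Hypothetical stand-in for the asker's data_1; IDs are kept as character
data_1 <- data.frame(
  post_visid_high = c("999967258082532415", "123098213728903214"),
  post_visid_low  = c("999967258082532415", "30945743504375043"),
  stringsAsFactors = FALSE
)

# paste0() concatenates element-wise and always returns a character vector
data_1$visit_id <- paste0(as.character(data_1$post_visid_high),
                          as.character(data_1$post_visid_low))

class(data_1$visit_id)
```

If the result still shows up as a factor, it is probably being converted later, for example by a data.frame() or cbind() call with stringsAsFactors = TRUE (the default before R 4.0).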

Related

R: extract the last two numbers in a variable

I have two datasets (data1 and data2).
Data1 has (one of many) a column named: B23333391
Data2 has a column called id_number, where id numbers are listed (e.g. 344444491)
I need to extract the last two digits (91) from the variable in data1 and merge on them with the last two digits of the id number in the id_number column of data2, since the last two digits represent an individual.
E.g.:
Data1:
columns: -> B23333391..... and so on
Data2:
columns: -> id_number
344444491
and so on....
How can this be done?
Thanks in advance!
Try this approach. You can use a dplyr pipeline to build an id variable in both data frames with substr(), using nchar() to locate the last two characters. After that you can merge on that id with left_join(). Here is the code, with simulated data similar to what you shared:
library(dplyr)
#Data
df1 <- data.frame(Var1 = c('B23333391'), Val1 = 1, stringsAsFactors = FALSE)
df2 <- data.frame(Varid = c('344444491'), Val2 = 1, stringsAsFactors = FALSE)
#Merge on the last two characters of each identifier
dfnew <- df1 %>%
  mutate(id = substr(Var1, nchar(Var1) - 1, nchar(Var1))) %>%
  left_join(df2 %>% mutate(id = substr(Varid, nchar(Varid) - 1, nchar(Varid))),
            by = "id")
Output:
       Var1 Val1 id     Varid Val2
1 B23333391    1 91 344444491    1
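The same idea can be sketched in base R without dplyr. The helper name last_two is hypothetical and only there for readability. The data mirror the simulated example above.

```r
# Hypothetical helper: return the last two characters of each string
last_two <- function(x) {
  x <- as.character(x)
  substr(x, nchar(x) - 1, nchar(x))
}

df1 <- data.frame(Var1 = "B23333391", Val1 = 1, stringsAsFactors = FALSE)
df2 <- data.frame(Varid = "344444491", Val2 = 1, stringsAsFactors = FALSE)

# Build the join key in both data frames
df1$id <- last_two(df1$Var1)
df2$id <- last_two(df2$Varid)

# all.x = TRUE keeps every row of df1, like left_join()
merged <- merge(df1, df2, by = "id", all.x = TRUE)
merged
```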

R look up matches in a column

I have a data frame with a single column and almost 20,0000 rows:
df1 %>% values c(10,20,30,50)
I have another data frame with multiple columns, one of which is also called values:
df2 %>% id c(24782,18741,17041,10471401)
values c(70,90,10,20,50)
There are more columns in this data set, which has 50,00000 rows of 13 variables.
I want to check whether the values column in df1 is %in% the values column of df2, and put the result in a new column of a new data frame.
df3 <- df2 %>%
mutate(newvalue = ifelse(df1$values %in% df2$values,1,0))
Error: Column ... must be length ... (the number of rows) or one, not ...
Two problems:
1. Given that you are modifying df2, your order is wrong. df1$values %in% df2$values tells you, for each df1$values item, whether it is in df2. So the result is as long as df1, not df2. It doesn't make sense to put that information in df2, because it is a result about df1. You either need to add the column to df1, or switch the order and use df2$values %in% df1$values (I think this is what you want).
2. dplyr functions expect unquoted column names of the data frame argument. So, if you pipe df2 into mutate, you don't use df2$ inside that mutate.
Making both these corrections, you get
df3 <- df2 %>% mutate(newvalue = ifelse(values %in% df1$values,1,0))
As an extra tip, %in% returns a logical (TRUE/FALSE) vector. You don't need ifelse to convert that to 1/0; as.integer gives the same result more efficiently.
df3 <- df2 %>% mutate(newvalue = as.integer(values %in% df1$values))
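To see why the direction of %in% matters, here is a small base-R sketch using the values from the question. The id column is a hypothetical placeholder, since the question lists only four ids for five values.

```r
df1 <- data.frame(values = c(10, 20, 30, 50))
df2 <- data.frame(id = 1:5,                      # hypothetical ids
                  values = c(70, 90, 10, 20, 50))

# The result of %in% is always as long as its left-hand side
len_df1_side <- length(df1$values %in% df2$values)  # one entry per row of df1
len_df2_side <- length(df2$values %in% df1$values)  # one entry per row of df2

# Only the second form fits into df2 as a new column
df2$newvalue <- as.integer(df2$values %in% df1$values)
df2
```

Here 70 and 90 do not occur in df1, so they get 0, while 10, 20 and 50 get 1.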

Match Column values of a dataframe with another dataframe with a column containing comma separated values using grepl

I am trying to look up and match two data frames (df1 and df2) on a column (val) that exists in both. In df1 the column contains a single text value per row, but in df2 it contains multiple text values separated by commas. Here is a view of the data frames:
val=c("AAAA","XXXX","BBBB","YYYY","ZZZZ","MMMM","YYYY","CCCC","GGGG")
df1 <- as.data.frame(val)
val=c("AAAA,BBBB","BBBB,CCCC,FFFF","CCCC,DDDD,GGGG,FFFF","GGGG","")
id =c(1,2,3,4,5)
df2 <- data.frame(val, id, stringsAsFactors = FALSE)  # as.data.frame(val, id) would treat id as row names
If a value of df1 is found in df2, I need the matching id in a new column of df1.
See whether the below code helps.
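library(tidyverse)
# Split the comma-separated values and expand to one row per value
df2 = df2 %>%
  mutate(val = str_split(val, ",")) %>%
  unnest(cols = val)
# Attach the matching id(s) to df1; unmatched values get NA
df1 = df1 %>%
  left_join(df2, by = "val")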
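As a rough base-R sketch of the same split-then-join idea (not the answer's exact method), the comma-separated column can be expanded with strsplit() and joined with merge(). The name long is hypothetical, and df2 is built with data.frame(val, id) on the assumption that id is meant to be a regular column.

```r
df1 <- data.frame(
  val = c("AAAA", "XXXX", "BBBB", "YYYY", "ZZZZ", "MMMM", "YYYY", "CCCC", "GGGG"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  val = c("AAAA,BBBB", "BBBB,CCCC,FFFF", "CCCC,DDDD,GGGG,FFFF", "GGGG", ""),
  id = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Split each comma-separated string; an empty string yields zero pieces
parts <- strsplit(df2$val, ",", fixed = TRUE)

# One row per (value, id) pair: repeat each id once per piece it produced
long <- data.frame(val = unlist(parts),
                   id = rep(df2$id, lengths(parts)),
                   stringsAsFactors = FALSE)

# Left join: keep every row of df1; values found under several ids repeat
matched <- merge(df1, long, by = "val", all.x = TRUE)
matched
```

Note that a value listed under more than one id (such as BBBB) produces one output row per id, and merge() sorts the result by val rather than keeping the order of df1.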

Change column class if column consists of numbers

I have a data frame where I had to convert all variables to the character class in order to bind_rows(). Now I want to identify and convert the columns that have numbers in them back to class numeric. I have 41 values so I don't want to have to mutate each of them separately.
Preferably the tidyverse way.
library(dplyr)
data_frame(number_var = as.character(rnorm(1:26)),
character_var = LETTERS)
You could use parse_guess from the readr package:
library(dplyr)
library(readr)
df <- tibble(number_var = as.character(rnorm(26)),  # tibble() replaces the deprecated data_frame()
             character_var = LETTERS)
df %>%
mutate_all(parse_guess) # guess column type for each column
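If you would rather not depend on readr, base R's type.convert() does a similar job. This is an alternative sketch rather than the approach above, and the small data frame is made up for illustration.

```r
# Illustrative data: every column stored as character
df <- data.frame(number_var = as.character(c(1.5, 2, 3.25)),
                 character_var = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# as.is = TRUE leaves non-numeric columns as character instead of factor
converted <- type.convert(df, as.is = TRUE)

sapply(converted, class)
```

Its guessing rules differ from parse_guess() (for example, it does not detect dates), so it is worth checking the resulting classes on your real data.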

R: reshape dataframe from wide to long format based on compound column names

I have a dataframe containing observations for two sets of data (A,B), with dataset and observation type given by the column names :
mydf <- data.frame(meta1=paste0("a",1:2), meta2=paste0("b",1:2),
A_var1 = c(11:12), A_var2 = c("p","r"),
B_var1 = c(21:22), B_var2 = c("x","z"))
I would like to reshape this dataframe so that each row contains observations on one set only. In this long format, set and column names should be given by splitting the original column names at the '_':
mydf2 <- data.frame(meta1=rep(paste0("a",1:2),2),
meta2=rep(paste0("b",1:2),2),
set=c("A","A","B","B"),
var1 = c(11:12, 21:22),
var2 = c("p","r","x","z"))
I have tried using 'gather' in combination with 'str_split' and 'sub', but unfortunately without success. Could this be done using tidyverse functions?
Yes, you can do this with the tidyverse!
You were close: you need to gather, then separate, then spread.
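library(tidyverse)
new_df <- mydf %>%
  gather(set, vars, 3:6) %>%                              # stack the four A_/B_ columns
  separate(set, into = c("set", "var"), sep = "_") %>%    # "A_var1" -> set "A", var "var1"
  spread(var, vars, convert = TRUE)                       # convert = TRUE turns var1 back into integer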
hope this helps!
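With tidyr 1.0.0 or later, the same reshape can also be sketched in a single pivot_longer() call. This is an alternative to the gather/separate/spread chain, not a correction of it. The special ".value" entry in names_to tells tidyr to use the part of each column name after the "_" as the output column name, so var1 and var2 keep their original types.

```r
library(tidyr)

mydf <- data.frame(meta1 = paste0("a", 1:2), meta2 = paste0("b", 1:2),
                   A_var1 = c(11:12), A_var2 = c("p", "r"),
                   B_var1 = c(21:22), B_var2 = c("x", "z"),
                   stringsAsFactors = FALSE)

# "A_var1" splits into set = "A" and the output column "var1"
long_df <- pivot_longer(mydf,
                        cols = -c(meta1, meta2),
                        names_to = c("set", ".value"),
                        names_sep = "_")
long_df
```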
