I have a data frame (let's call it 'df') that consists of two columns:
Name Contact
A 34552325
B 423424
C 4324234242
D hello1@company.com
I want to split the data frame into two data frames based on whether the "Contact" value in each row is numeric or not.
Expected Output:
Name Contact
A 34552325
B 423424
C 4324234242
and
Name Contact
D hello1@company.com
I tried using:
df$IsNum <- !(is.na(as.numeric(df$Contact)))
But this also classified "hello1@company.com" as numeric.
Basically, if a value in the "Contact" column contains even a single non-numeric character, the code must classify it as non-numeric.
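A likely reason for this, assuming Contact was stored as a factor (the default for read.table and data.frame before R 4.0.0), is that as.numeric() on a factor returns the internal level codes rather than parsing the text, so no value ever becomes NA. A minimal sketch of the difference, converting with as.character() first:

```r
# Contact stored as a factor, as older R versions did by default
df <- data.frame(Name = c("A", "B", "C", "D"),
                 Contact = factor(c("34552325", "423424", "4324234242",
                                    "hello1@company.com")))

# as.numeric() on a factor returns level codes, so nothing is NA
codes <- as.numeric(df$Contact)

# Going through as.character() parses the text; the email becomes NA
parsed <- suppressWarnings(as.numeric(as.character(df$Contact)))
df$IsNum <- !is.na(parsed)
```

Note that as.numeric() still accepts strings such as "1e5" or "-3", which a digits-only regular expression would reject.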
You can use grepl:
x <- " Name Contact
A 34552325
B 423424
C 4324234242
D hello1@company.com"
df <- read.table(text=x, header = TRUE)
x <- df[grepl("^\\d+$",df$Contact),]
y <- df[!grepl("^\\d+$",df$Contact),]
x
# Name Contact
# 1 A 34552325
# 2 B 423424
# 3 C 4324234242
y
# Name Contact
# 4 D hello1@company.com
We can create a grouping variable with grepl (the same way @Avinash Raj did) and split the data frame on it to get a list of data frames.
split(df, grepl('^\\d+$', df$Contact))
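split() names the list elements after the values of the grouping variable, so with a logical grouping the two data frames can be pulled out as "TRUE" and "FALSE". A short sketch, rebuilding the example data inline:

```r
df <- data.frame(Name = c("A", "B", "C", "D"),
                 Contact = c("34552325", "423424", "4324234242", "hello1@company.com"),
                 stringsAsFactors = FALSE)

# The list elements are named "FALSE" and "TRUE" after the grepl() result
parts <- split(df, grepl("^\\d+$", df$Contact))
numeric_df <- parts[["TRUE"]]   # rows whose Contact is digits only
other_df   <- parts[["FALSE"]]  # everything else
```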
We have brand data in a column (variable) that is delimited by semicolons (;). Our task is to split this column into multiple columns, which we were able to do with the following syntax.
The data is attached as a screenshot (image: "Data set").
Here is the R code:
library(dplyr)
library(tidyr)
# separate() takes the column name, not the extracted vector
point <- df %>% separate(Pref_All, c("Pref_01","Pref_02","Pref_03","Pref_04","Pref_05"), ";")
point[is.na(point)] <- ""
However, we have this kind of brand data in 10 to 15 columns, and with the syntax above the number of output columns has to be chosen from the maximum number of brands a column holds (which we calculated manually and set to 5).
Is there a way to write the code dynamically, so that it calculates the maximum number of brands each column holds and creates that many new columns in the data frame? For example:
Pref_01, Pref_02, Pref_03, Pref_04, Pref_05.
The preferred output is attached as a screenshot (image: "Output").
Thanks for the help in advance.
x <- c("Swift;Baleno;Ciaz;Scross;Brezza", "Baleno;swift;celerio;ignis", "Scross;Baleno;celerio;brezza", "", "Ciaz;Scross;Brezza")
strsplit(x,";")
library(dplyr)
library(tidyr)
x <- data.frame(ID = c(1,2,3,4,5),
                Pref_All = c("S;B;C;S;B",
                             "B;S;C;I",
                             "S;B;C;B",
                             " ",
                             "C;S;B"),
                stringsAsFactors = FALSE)
# Number of brands in each row; the largest value sets the number of new columns
b <- lengths(strsplit(x$Pref_All, ";"))
final_df <- x %>%
  tidyr::separate(Pref_All, paste0("Pref_0", 1:max(b)), ";")
# separate() drops Pref_All, so put the original strings back in place of ID
final_df$ID <- x$Pref_All
final_df <- rename(final_df, Pref_All = ID)
final_df[is.na(final_df)] <- ""
Pref_All Pref_01 Pref_02 Pref_03 Pref_04 Pref_05
1 S;B;C;S;B S B C S B
2 B;S;C;I B S C I
3 S;B;C;B S B C B
4
5 C;S;B C S B
The trick is to build the column names with paste0, going from 1 to the maximum number of brands in your data!
I would use str_split() from the stringr package, which returns a list of character vectors. From that list we can work out the maximum number of preferences in the data frame and then apply a function over it that pads each element with empty strings.
library(stringr)
df=data.frame("id"=1:5,
"Pref_All"=c("brand1", "brand1;brand2;brand3", "", "brand2;brand4", "brand5"))
spl = str_split(df$Pref_All, ";")
# Find the max number of preferences
maxl = max(unlist(lapply(spl, length)))
# Add missing values to each element of the list
spl = lapply(spl, function(x){c(x, rep("", maxl-length(x)))})
# Bind each element of the list in a data.frame
dfr = data.frame(do.call(rbind, spl))
# Rename the columns
names(dfr) = paste0("Pref_", 1:maxl)
print(dfr)
# Pref_1 Pref_2 Pref_3
#1 brand1
#2 brand1 brand2 brand3
#3
#4 brand2 brand4
#5 brand5
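The question mentions 10 to 15 such delimited columns. A minimal base-R sketch that applies the same padding idea to several columns at once, assuming each column should get its own block of new columns named after it (the helper split_to_columns, the Other_All column, and the Prefix_NN naming scheme are hypothetical):

```r
# Hypothetical helper: split one delimited column into <prefix>_01, <prefix>_02, ...
split_to_columns <- function(v, prefix, sep = ";") {
  parts <- strsplit(as.character(v), sep, fixed = TRUE)
  # The row with the most entries decides how many columns are created
  n <- max(lengths(parts))
  # Pad shorter rows with "" so every row has exactly n entries
  padded <- lapply(parts, function(p) c(p, rep("", n - length(p))))
  out <- as.data.frame(matrix(unlist(padded), ncol = n, byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- sprintf("%s_%02d", prefix, seq_len(n))
  out
}

# Hypothetical data with two delimited columns
brands <- data.frame(ID = 1:3,
                     Pref_All  = c("S;B;C", "B", ""),
                     Other_All = c("X;Y", "Z;X;Y;W", "Y"),
                     stringsAsFactors = FALSE)

# Split every delimited column, naming each block after its source column
cols <- c("Pref_All", "Other_All")
pieces <- lapply(cols, function(col) split_to_columns(brands[[col]], sub("_All$", "", col)))
result <- do.call(cbind, c(list(brands["ID"]), pieces))
```

Each column gets exactly as many new columns as its widest row needs, so nothing has to be counted by hand; sprintf("%02d") also keeps the names zero-padded past nine columns, which paste0("Pref_0", ...) would not.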
I want to map the FactorName values in the data frame FName onto the column names of Stack, i.e. Factor1 in Stack should actually be named Value, Factor2 should be Leverage, and so on. I have a large dataset, so renaming manually is not an option.
Stack <- data.frame(rowid=1:3, Factor1=2:4, Factor2=3:5, Factor3=4:6)
FName <- data.frame(FactorID=c("Factor1","Factor2","Factor3"), FactorName=c("Value","Leverage","Growth"))
Thanks.
How about this using match:
Stack <- data.frame(rowid=1:3, Factor1=2:4, Factor2=3:5, Factor3=4:6)
FName <- data.frame(
FactorID=c("Factor1","Factor2","Factor3"),
FactorName=c("Value","Leverage","Growth"))
# Matching entries from FName
colnames(Stack) <- ifelse(
  !is.na(FName$FactorName[match(colnames(Stack), FName$FactorID)]),
  as.character(FName$FactorName[match(colnames(Stack), FName$FactorID)]),
  colnames(Stack))
Stack
# rowid Value Leverage Growth
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
Explanation: We match the column names of Stack against the entries of FName$FactorID. Where there is a match, the name is replaced with the corresponding FName$FactorName; otherwise the original column name is kept.
If the factor names are already at hand, in the same order as the columns, you can assign them to the column names directly:
colnames(Stack) <- c("rowid", as.character(FName$FactorName))
Another approach also uses match, but with indexing instead of ifelse:
# Get indices of matches
m <- match(names(Stack), FName$FactorID)
# replace names where a match is found.
names(Stack)[!is.na(m)] <- as.character(FName$FactorName[m[!is.na(m)]])
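Because match() looks each name up instead of relying on position, the indexing approach still works when FName lists the factors in a different order from Stack's columns, which a positional assignment would get wrong. A small check with a deliberately reordered FName (the reordering is hypothetical):

```r
Stack <- data.frame(rowid = 1:3, Factor1 = 2:4, Factor2 = 3:5, Factor3 = 4:6)
# Lookup table whose rows are out of order relative to Stack's columns
FName <- data.frame(FactorID   = c("Factor3", "Factor1", "Factor2"),
                    FactorName = c("Growth", "Value", "Leverage"),
                    stringsAsFactors = FALSE)

m <- match(names(Stack), FName$FactorID)
# "rowid" has no entry (NA); the other columns point at their FName row
names(Stack)[!is.na(m)] <- FName$FactorName[m[!is.na(m)]]
```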
Please help: I need to extract all entries from column B of a data frame that correspond to matching entries in column A.
I need to search column A for a string such as GK104.
That is, if an entry in column A contains GK104, the corresponding entry from column B should be returned.
A B
DT-GK104-BIN1-E-A1 8000_AMKR
DT-GK104-BIN2-E-A2 8000_ASET
DT-GK104-BIN3-E-A1 8000_CPAC
DT-GK104-BIN4-E-ZK 8000_PWOO
DT-GK104-BIN5-E-ZK 8000_SPIL
This is simple. Building on Andrew Gustar's comment, you just need to use grepl:
df <-
"A B
DT-GK104-BIN1-E-A1 8000_AMKR
DT-GK104-BIN2-E-A2 8000_ASET
DT-GK104-BIN3-E-A1 8000_CPAC
DT-GK104-BIN4-E-ZK 8000_PWOO
DT-GK104-BIN5-E-ZK 8000_SPIL"
df <- read.table(text=df, header = TRUE, stringsAsFactors = FALSE)
# Save a value which you want to match
value <- "A1"
# You can get a filtered dataframe
df[grepl(value, df$A),]
A B
1 DT-GK104-BIN1-E-A1 8000_AMKR
3 DT-GK104-BIN3-E-A1 8000_CPAC
# Or you can just get a character vector of matched values in the second column
df$B[grepl(value, df$A)]
[1] "8000_AMKR" "8000_CPAC"
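grepl() treats its pattern as a regular expression by default. When the search string is meant literally, as with the GK104 code from the question, fixed = TRUE avoids surprises from characters such as "." that are special in regex. A sketch on the question's data:

```r
df <- read.table(text = "A B
DT-GK104-BIN1-E-A1 8000_AMKR
DT-GK104-BIN2-E-A2 8000_ASET
DT-GK104-BIN3-E-A1 8000_CPAC
DT-GK104-BIN4-E-ZK 8000_PWOO
DT-GK104-BIN5-E-ZK 8000_SPIL", header = TRUE, stringsAsFactors = FALSE)

# fixed = TRUE matches the search string literally instead of as a regex
gk104_b <- df$B[grepl("GK104", df$A, fixed = TRUE)]
bin2_b  <- df$B[grepl("BIN2", df$A, fixed = TRUE)]
```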
I am trying to train a model on data converted from a document-term matrix to a data frame. There are separate fields for positive and negative comments, so I want to add a string to the column names as a "tag" that distinguishes the same word coming from different fields. For example, the word hello can appear in both the positive and the negative comment fields (and is therefore a column in my data frame), so in my model I want to distinguish these by naming the columns positive_hello and negative_hello.
I am looking for a way to rename columns so that a specific string is appended to every column name in the data frame. Say, for mtcars, I want every column name to end in "_sample", so that mpg, cyl, disp, and so on become mpg_sample, cyl_sample, disp_sample, etc.
I'm considering using sapply or lapply, but I haven't made any progress with it. Any help would be greatly appreciated.
Use the colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
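Applied to the use case in the question, a minimal sketch: pos and neg are hypothetical word-count tables standing in for the document-term matrices of the positive and negative comment fields. Tagging each table's columns before binding keeps the two hello columns apart:

```r
# Hypothetical word-count tables for the two comment fields
pos <- data.frame(hello = c(1, 0), great = c(2, 1))
neg <- data.frame(hello = c(0, 3), awful = c(0, 1))

# Tag each table's columns before combining them
colnames(pos) <- paste0("positive_", colnames(pos))
colnames(neg) <- paste0("negative_", colnames(neg))
combined <- cbind(pos, neg)
```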
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to append the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
You can use setnames() from the data.table package, which renames the columns by reference, so it doesn't create a copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
I have a data frame df like this:
1 2 3 4
A B C A
where the column names are 1, 2, 3 and 4. I would like to select one of the columns of the data frame according to an index that I set externally:
colf <- as.numeric(mo)
fmo <- df[[colf]]
Many thanks,
First things first: I don't recommend using numbers as column names. That said, this should help you out.
> df <- data.frame("1"="A","2"="B","3"="C")
> df
X1 X2 X3
1 A B C
> df$X1 #Get column by name
[1] A
Levels: A
> df[,1] #Get first column
[1] A
Levels: A
>
Treat the data frame as a matrix and index it using [row,column] notation, i.e.
fmo = df[,colf]
This will always get column number colf.
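When the column names are themselves numbers, it matters whether colf is used as a position or as a name: df[[colf]] with a number picks by position, while df[[as.character(colf)]] picks the column whose name is that number. A small sketch, keeping the numeric names with check.names = FALSE and reordering the columns (the reordering is hypothetical) so the two disagree:

```r
# check.names = FALSE keeps "1", "2", ... instead of converting them to X1, X2, ...
df <- data.frame("1" = "A", "2" = "B", "3" = "C", "4" = "A",
                 check.names = FALSE, stringsAsFactors = FALSE)
df <- df[, c("3", "1", "2", "4")]   # reorder so position and name disagree

colf <- 1
by_position <- df[[colf]]                 # first column, which is named "3"
by_name     <- df[[as.character(colf)]]   # the column named "1"
```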