How to combine the values in a column of dataframe - r

I have a dataframe with two columns. I want to concatenate the values in the second column and return a single string. How can I do this in R?

You can use paste and set the collapse argument to the delimiter you want. Here, I am using an empty string (""). You can set it to "-", "_" or anything else.
paste(df$Col2, collapse="")
If there are NAs, you could drop them first with na.omit
paste(na.omit(df$Col2), collapse="")
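As a minimal sketch of the two calls above, assuming a made-up two-column data frame (the names Col1 and Col2 and the values are illustrative, not taken from the question):

```r
# Hypothetical data; Col2 holds the values to join and includes one NA
df <- data.frame(Col1 = 1:4,
                 Col2 = c("a", "b", NA, "d"),
                 stringsAsFactors = FALSE)

# paste() converts NA to the literal text "NA"
with_na <- paste(df$Col2, collapse = "")

# na.omit() drops the missing value before pasting
without_na <- paste(na.omit(df$Col2), collapse = "")

# A different delimiter only changes the collapse argument
dashed <- paste(na.omit(df$Col2), collapse = "-")
```

With this data, with_na is "abNAd", without_na is "abd" and dashed is "a-b-d".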

Related

Best way to extract a single letter from each row and create a new column in R?

Below is an excerpt of the data I'm working with. I am having trouble finding a way to extract the last letter from the sbp.id column and use it to add a new column called "sex" to the data frame below. I initially tried grepl to separate the rows ending in F from the ones ending in M, but I couldn't figure out how to use that to create a new column containing just M or F, depending on the last letter of each value in the sbp.id column.
sbp.id newID
125F 125
13000M 13000
13120M 13120
13260M 13260
13480M 13480
Another way, if you know you need the last character of the string in every row, irrespective of whether the other characters are letters or digits and even if the elements all have different lengths:
df$sex <- substr(df$sbp.id, nchar(df$sbp.id), nchar(df$sbp.id))
This works because substr and nchar are both vectorized, so they operate on the whole column at once.
Using regex you can extract the last part from sbp.id
df$sex <- sub('.*([A-Z])$', '\\1', df$sbp.id)
#Also
#df$sex <- sub('.*([MF])$', '\\1', df$sbp.id)
Or another way would be to remove all the digits. gsub removes every run of digits, whereas sub would stop after the first one.
df$sex <- gsub('\\d+', '', df$sbp.id)
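To check that the approaches above agree, here is a small sketch that rebuilds the excerpt from the question by hand and compares the results. The variable names by_substr, by_regex and by_digits are only for illustration, and gsub is used for the digit-stripping variant so that every run of digits is removed.

```r
# Data copied from the excerpt in the question
df <- data.frame(sbp.id = c("125F", "13000M", "13120M", "13260M", "13480M"),
                 stringsAsFactors = FALSE)

# Last character by position
n <- nchar(df$sbp.id)
by_substr <- substr(df$sbp.id, n, n)

# Last character captured by a regex group
by_regex <- sub(".*([MF])$", "\\1", df$sbp.id)

# Whatever remains after stripping every digit
by_digits <- gsub("\\d+", "", df$sbp.id)

# On this data the three vectors are the same
all_agree <- identical(by_substr, by_regex) && identical(by_regex, by_digits)

df$sex <- by_substr
```

The substr version makes no assumption about what the other characters are, while the digit-stripping version only gives a single letter when the ID is digits followed by one letter.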

Removing all rows in dataframe with "$-" in one of the columns

I have a dataframe that contains a column with dollar amounts. I need to drop all rows that contain "$-" in that column.
I have tried changing the column to a factor and replacing the "$-" with NA and with 0. All of the code I have tried has either done nothing or dropped all the values.
df$bal<- sub("$-","",df$bal)
is.na_remove <- df$bal[!is.na(df$bal)]
df[df==""]<-0
df$bal<- lapply(list, function(df) df[df$bal=="$-"])
df$bal<- gsub("$-","",df$bal)
I want all of those rows dropped. I'd be willing to remove the $ from the entire column first and then drop them.
If the intention is to remove the rows, a better option is grepl, which flags the rows where the substring "$-" occurs in the 'bal' column so that they can be dropped.
out <- df[!grepl("$-", df$bal, fixed = TRUE), ]
Note that by default sub, gsub, grep, etc. treat the pattern as a regular expression (fixed = FALSE), and in a regex some characters have a special meaning. One of these is $, which anchors the end of the string. If we don't specify fixed = TRUE or escape it (\\$), $ is evaluated as the end of the string instead of as the literal character.
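A short sketch of the difference, using invented balances (the id column and the values are assumptions for illustration only):

```r
# Hypothetical balances; "$-" marks an empty amount
df <- data.frame(id = 1:4,
                 bal = c("$100", "$-", "$25.50", "$-"),
                 stringsAsFactors = FALSE)

# As a regex, "$-" asks for a hyphen after the end of the string,
# so it should not match any of these values
regex_hits <- grepl("$-", df$bal)

# fixed = TRUE treats the pattern as literal text
fixed_hits <- grepl("$-", df$bal, fixed = TRUE)

# Escaping the dollar sign is the regex equivalent
escaped_hits <- grepl("\\$-", df$bal)

# Keep only the rows without "$-"
out <- df[!fixed_hits, ]
```

Here fixed_hits and escaped_hits both flag rows 2 and 4, so out keeps the rows with id 1 and 3.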

Using grepl() to remove values from a dataframe in R

I have a data.frame with 1 column and an arbitrary number of rows.
This column contains strings, and some strings contain a substring, let's say "abcd".
I want to remove any strings from the data frame that contain that substring. For example, I may have five strings that are "123 abcd", and I want those to be removed.
I am currently using grepl() to try and remove these values, but it is not working. I am trying:
data.frame[!grepl("abcd", dataframe)]
but it returns an empty data frame.
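We can use grepl on the column (not on the whole data frame) to get a logical vector, negate it with !, and use that to subset the rows of 'data'. drop = FALSE keeps the result a data frame even though it has only one column.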
data[!grepl("abcd", data$Col),,drop = FALSE]
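A minimal sketch with made-up strings (the column name Col follows the answer; the values are assumptions):

```r
# Hypothetical one-column data frame
data <- data.frame(Col = c("123 abcd", "xyz", "abcd 9", "hello"),
                   stringsAsFactors = FALSE)

# TRUE for the rows we want to keep
keep <- !grepl("abcd", data$Col)

# Default subsetting of a single column returns a plain vector
as_vector <- data[keep, ]

# drop = FALSE returns a one-column data frame instead
as_frame <- data[keep, , drop = FALSE]
```

as_vector is the character vector c("xyz", "hello"), while as_frame is a data frame with the same two values in Col.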

Count of Comma separated values in r

I have a column named subcat_id in which the values are stored as comma separated lists. I need to count the number of values and store the counts in a new column. The lists also have Null values that I want to get rid of.
I would like to store the counts in the n column.
We can try
nchar(gsub('[^,]+', '', gsub(',(?=,)|(^,|,$)', '',
     gsub('(Null){1,}', '', df1$subcat_id), perl=TRUE)))+1L
#[1] 6 4
Or
library(stringr)
str_count(df1$subcat_id, '[0-9.]+')
#[1] 6 4
data
df1 <- data.frame(subcat_id = c('1,2,3,15,16,78',
'1,2,3,15,Null,Null'), stringsAsFactors=FALSE)
You can do
sapply(strsplit(subcat_id,","),FUN=function(x){length(x[x!="Null"])})
strsplit(subcat_id,",") will return a list of each item in subcat_id split on commas. sapply will apply the specified function to each item in this list and return us a vector of the results.
Finally, the function that we apply will take just the non-null entries in each list item and count the resulting sublist.
For example, if we have
subcat_id <- c("1,2,3","23,Null,4")
Then running the above code returns c(3,2), because the second string has only two non-Null entries. You can assign the result to your column.
If running this on a dataframe, it is possible that the character column has been read in as a factor, in which case the error "non-character argument" will be thrown. To fix this, we need to force interpretation as a character vector with the as.character function, changing the command to
sapply(strsplit(as.character(frame$subcat_id),","),FUN=function(x){length(x[x!="Null"])})
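Putting this together with the sample data from the first answer, a sketch that stores the counts in the n column could look like the following. vapply is swapped in for sapply here only so that the result is guaranteed to be an integer vector; that is my choice, not part of the answer above.

```r
# Sample data from the first answer
df1 <- data.frame(subcat_id = c("1,2,3,15,16,78", "1,2,3,15,Null,Null"),
                  stringsAsFactors = FALSE)

# Split each string on commas; as.character guards against factor columns
parts <- strsplit(as.character(df1$subcat_id), ",", fixed = TRUE)

# Count the entries in each piece that are not the text "Null"
df1$n <- vapply(parts, function(x) sum(x != "Null"), integer(1))
```

For this data df1$n is 6 and 4, matching the output shown in the first answer.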

Convert commas in a column of a data set to points - r

I've imported a dataset from Excel. It has a column 'Height' and I want to replace the ',' with '.'.
I tried this command but it gives me an error.
apply(apply(DATASET$Height, 2, gsub, patt=",", replace="."), 2, as.numeric)
Thank you very much for your help
To recode column 'Height' in data frame 'DATASET':
DATASET$Height <- gsub(",",".",DATASET$Height,fixed=TRUE)
Any errors? If not, you can proceed to convert the column to numeric with as.numeric.
Do you get errors or NAs when converting to numeric? Perhaps there are still other characters besides "," that prevent R from reading the values as numbers. In that case you would need to apply gsub a second time to remove all non-numeric characters.
First, you should check that the column is character (for example with is.character). Then I would split the strings at the comma and paste the pieces back together with a dot.
Suppose a is what you get with DATASET[["Height"]]:
a <- c("234,23", "2314,54", "234,65")
Then, with sapply, you can split and collapse each character element:
b <- sapply(a,
            function(string){
              paste0(unlist(strsplit(string, split=",")), collapse=".")
            })
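Now you can replace DATASET[["Height"]] with b. Note that sapply names the result after the input strings, so wrap it in unname() if you don't want the names.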
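As a rough end-to-end sketch using the same example vector, the gsub route and the split-and-paste route give the same strings, and as.numeric then does the conversion. The names b_gsub, b_split and height are only for illustration.

```r
# Example values with a decimal comma, as in the answer above
a <- c("234,23", "2314,54", "234,65")

# Route 1: replace the literal comma with a dot
b_gsub <- gsub(",", ".", a, fixed = TRUE)

# Route 2: split at the comma and paste back with a dot;
# USE.NAMES = FALSE stops sapply from naming the result
b_split <- sapply(a, function(string) {
  paste0(unlist(strsplit(string, split = ",")), collapse = ".")
}, USE.NAMES = FALSE)

# Both routes produce the same character vector
same_result <- identical(b_gsub, b_split)

# Convert to numbers
height <- as.numeric(b_gsub)
```

height can then be assigned back to DATASET[["Height"]].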
