How to handle a comma (,) if it occurs inside a value - opencsv

I have just started using openCSV. Suppose I have a CSV file in which a comma is present within a value, like
abc, def, ghi, j,kl, mno
Note that j,kl is a single column value with a comma in between. How can I handle such a situation?
Thanks

The usual procedure is to enclose the values in quotes ("), e.g. "abc", "def", "ghi", "j,kl", "mno", or to use a delimiter other than a comma that is less likely to appear in the data.
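The quoting convention itself is not specific to opencsv. As a rough illustration only (this is base R's read.csv, not the opencsv Java API, and the one-line input is made up for the example), a quoted field keeps its embedded comma:

```r
# One CSV record where only the field containing a comma is quoted.
line <- 'abc,def,ghi,"j,kl",mno'

# read.csv treats the double quote as the quote character by default,
# so the comma inside "j,kl" is not seen as a delimiter.
row <- read.csv(text = line, header = FALSE, stringsAsFactors = FALSE)

n_fields <- ncol(row)   # number of parsed columns
fourth   <- row$V4      # the field that contained the comma
```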

Related

Str_extract_all where replacement is keeping value between two character strings

First time posting to SO so apologies in advance, I am also quite new to R so this may not be possible.
I am trying to format a string, (extracted from a JSON) where there is a value between 2 curly braces like so
{#link value1}
I am trying to replace the {#link value1} with [[value1]] so that it will work as a link in my markdown file.
I cannot just replace the opening and then the closing as there is also {#b value2} which would be formatted to **value2**
I have cobbled together a str_replace that works if only one link replacement is needed in a string, but I am running into an issue when there are two. Like so:
str <- c("This is the first {#link value1} and this is the second {#link value2}")
The actual potential strings are much more varied than this
My plan was to build a function to take input as to the type of pattern needed either bold or link and then paste the strings with the extracted value in the middle to form the replacement
However that has either left me with
This is the first [[ value1 ]][[ value2 ]] and this is the second [[ value1 ]][[ value2 ]]
or
This is the first [[ value1 ]] and this is the second [[ value1 ]]
Is there a more glamorous way of achieving this without searching from where the last } was replaced?
I was looking at the example of the documentation of stringr for str_replace and it uses an example of a function at the bottom but I can't de-code it to try using for my example
What I'm using to extract the value incl the curly braces
str_extract_all(str,"(\\{#link ).+?(\\})")
[[1]]
[1] "{#link value1}" "{#link value2}"
What I'm using to extract the value excl the curly braces and tag
str_extract_all(str,"(?<=\\{#link ).+?(?=\\})")
[[1]]
[1] "value1" "value2"
You could use str_replace_all() to perform multiple replacements by passing a named vector (c(pattern1 = replacement1)) to it. References of the form \\1, \\2, etc. will be replaced with the contents of the respective matched group created by ().
str <- c("This is the first {#link value1} and this is the second {#b value2}")
str_replace_all(str, c("\\{#link\\s+(.+?)\\}" = "[[\\1]]",
"\\{#b\\s+(.+?)\\}" = "**\\1**"))
# [1] "This is the first [[value1]] and this is the second **value2**"
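The asker's plan of a helper that takes the tag type and builds the replacement around the extracted value could be sketched in base R roughly as follows. replace_tag is a hypothetical name, not a stringr function, and the sketch assumes the opening and closing markers contain no backslashes.

```r
# Hypothetical helper: turn every {#<tag> value} into <open>value<close>.
replace_tag <- function(x, tag, open, close) {
  # Lazy (.+?) stops at the first closing brace, so each tag is handled separately.
  pattern <- paste0("\\{#", tag, "\\s+(.+?)\\}")
  # \\1 in the replacement is the text captured by the parentheses.
  gsub(pattern, paste0(open, "\\1", close), x, perl = TRUE)
}

str <- "This is the first {#link value1} and this is the second {#b value2}"
out <- replace_tag(str, "link", "[[", "]]")
out <- replace_tag(out, "b", "**", "**")
```

Because gsub replaces every match independently, two links in the same string each keep their own value.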

Add a character to each word within a sentence in R

I have sentences in R (they represent column names of an SQL query), like the following:
sample_sentence <- "CITY, AGE,NAME,LAST_NAME, COUNTRY"
I would need to add a character(s) like "k." in front of every word of the sentence. Notice how sometimes words within the sentence may be separated by a comma and a space, but sometimes just by a comma.
The desired output would be:
new_sentence <- "k.CITY, k.AGE,k.NAME,k.LAST_NAME, k.COUNTRY"
I would prefer to achieve this without using a loop for. I saw this post Add a character to the start of every word but there they work with a vector and I can't figure out how to apply that code to my example.
Thanks
sample_sentence <- "CITY, AGE,NAME,LAST_NAME, COUNTRY"
gsub(pattern = "(\\w+)", replacement = "k.\\1", sample_sentence)
# [1] "k.CITY, k.AGE,k.NAME,k.LAST_NAME, k.COUNTRY"
Explanation: in regex, \\w+ matches one or more "word" characters (letters, digits and underscore), and putting it in () makes it a "capturing group". gsub replaces every match with k.\\1, where \\1 stands for the text captured by the first set of () in that match.
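The same idea can be wrapped in a small function so the prefix is not hard-coded. This is only a sketch: add_prefix is a made-up name, and it assumes the prefix contains no backslashes.

```r
# Hypothetical helper: put `prefix` in front of every run of word characters.
add_prefix <- function(x, prefix) {
  # \\w also matches "_", so LAST_NAME is treated as a single word.
  gsub("(\\w+)", paste0(prefix, "\\1"), x)
}

new_sentence <- add_prefix("CITY, AGE,NAME,LAST_NAME, COUNTRY", "k.")
```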
A possible solution:
sample_sentence <- "CITY, AGE,NAME,LAST_NAME, COUNTRY"
paste0("k.", gsub("(,\\s*)", "\\1k\\.", sample_sentence))
#> [1] "k.CITY, k.AGE,k.NAME,k.LAST_NAME, k.COUNTRY"

In R, how to remove a specific character in a column (in this case the ",") when the same character also appears where I don't want to remove it?

I have a dataset in which some columns hold monetary values, but judging by the column names and their descriptions, I believe there is an error in how the numbers are represented. For example, 5,52,32,974: I believe there is one comma too many, or a comma in the wrong position. I would like to know if it's possible to remove certain commas in this case and arrive at a representation such as 55.232.974 (in $, for example). The dataset is in .csv. Thanks in advance.
If I understand correctly, your data is given as strings.
Then you could use the following code:
a <- c("5,52,32,974", "5,52,32,974", "5,52,32,974")
b <- gsub(",", "", a)
as.numeric(b)
#[1] 55232974 55232974 55232974
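If the dotted display from the question (55.232.974) is also wanted, one possible follow-up is to format the cleaned numbers with formatC. This is a sketch that assumes whole-number amounts; the values are the ones from the answer above.

```r
a <- c("5,52,32,974", "5,52,32,974", "5,52,32,974")

# Strip every comma, then convert the cleaned strings to numbers.
b <- as.numeric(gsub(",", "", a))

# Re-insert a "." every three digits; decimal.mark is set to ","
# so that it does not clash with the chosen big.mark.
formatted <- formatC(b, format = "d", big.mark = ".", decimal.mark = ",")
```

The formatted values are character strings for display only; keep b for any arithmetic.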

How to remove starting(suffix) special character("_") from column names [duplicate]

After I collapse my rows and separate using a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple semicolons represent blanks in a cell. For example an observation may look as follows after the collapse:
;TX;PA;CA;;;;;;;
I'd like the cell to look like this:
TX;PA;CA
Here is my collapse code:
new_df <- group_by(old_df, unique_id) %>% summarize_each(funs(paste(., collapse = ';')))
If I try to gsub for semicolon it removes all of them. If I remove the end character it just removes one of the semicolons. Any ideas on how to remove all at the beginning and end, but leaving the ones in between the observations? Thanks.
Use the regular expression ^;+|;+$
x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)
The ^ indicates the start of the string, the + means "one or more of the preceding character", and $ indicates the end of the string. The | means "OR". So, combined, the pattern matches one or more ; at the start of the string OR one or more ; at the end of the string, and gsub replaces those with an empty string.
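A quick check that only the leading and trailing runs are removed, using a second made-up input with semicolons in the middle. The trimws() variant is an alternative sketch that assumes R 3.6.0 or later, where trimws() accepts a whitespace argument.

```r
x <- c(";TX;PA;CA;;;;;;;", ";;NY;;FL;")

# Anchored pattern: runs of ";" at the start or at the end only.
trimmed <- gsub("^;+|;+$", "", x)

# Same effect with trimws(), treating ";" as the character to strip.
trimmed2 <- trimws(x, whitespace = ";")
```

In the second element the double semicolon between NY and FL is left in place, since it touches neither end of the string.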
The stringi package allows you to specify patterns which you wish to preserve and trim everything else. If you only have letters there (though you could specify other pattern too), you could simply do
stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"

R remove quotation mark in column name of the data frame

I have SPSS data, which I have to migrate to R. The data is large with 202 columns and thousands of rows
v1 v2 v3 v4 v5
1 USA Male 21 Married
2 INDIA Female 54 Single
3 CHILE Male 33 Divorced ...and so on...
The data file contains variable labels "Identification No", "Country of origin", "Gender", "(Current) Year", "Marital Status - Candidate"
I read my data from SPSS with following command
library(foreign)
data<-read.spss("file.sav",to.data.frame=TRUE,reencode='utf-8')
The column name is read as v1,v2,v3,v4 etc, but I want variable labels as my column name in data frame. I used following command to find the variable labels and set it as names
vname<-attr(data,"variable.labels")
vl<-character(length(vname))
for(i in seq_along(vname)){vl[i]<-vname[[i]]}
names(data)<-vl
Now the problem is that I have to address that column like data$"Identification number", which is not very nice. I want to remove quotation marks around the column names. How can I do that?
You can't. An unquoted space is a syntactic symbol that breaks the grammar up.
An option is to change the names to ones without spaces in, and you can use the make.names function to do that.
> N = c("foo","bar baz","bar baz")
> make.names(N)
[1] "foo" "bar.baz" "bar.baz"
You might want to make sure you have unique names:
> make.names(N, unique=TRUE)
[1] "foo" "bar.baz" "bar.baz.1"
The quotation marks were there because the names had spaces in them. print(vl,quote=FALSE) displayed the text without quotation marks. But I had to use quotation marks in order to use it as a single variable name. Without quotation marks, the spaces would break the variable names.
This could be solved by removing spaces in the name. I solved this by substituting all the spaces in between the names by using gsub command
vl<-gsub(" ","",vl)
names(data)<-vl
Now most of the column names can be accessed without using quotation marks. But the names containing other punctuation marks couldn't be used without quotation.
Also the solution by Spacedman worked fine and seems easier to use.
make.names(vl, unique=TRUE)
But I liked the solution by David Arenburg.
gsub("[ [:punct:]]", "" , vl)
It removed all punctuation marks and made the column name clean and better.
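For comparison, here is what the two approaches do to the labels quoted in the question. This is a sketch: vl is re-created by hand rather than read from the SPSS file.

```r
vl <- c("Identification No", "Country of origin", "Gender",
        "(Current) Year", "Marital Status - Candidate")

# make.names: invalid characters become ".", and "X" is prepended
# when the label does not start with a letter or a dot.
dotted <- make.names(vl, unique = TRUE)

# gsub: drop spaces and punctuation entirely.
stripped <- gsub("[ [:punct:]]", "", vl)
```

dotted keeps a trace of where the punctuation was (for example "X.Current..Year"), while stripped gives shorter names such as "CurrentYear".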
Spaces are okay in data.table column names without much fuss. But, no, there's no way to avoid using quotation marks for the reason Spacedman gave: spaces break up the syntax.
require(data.table)
DT <- data.table(a = c(1,1), "bc D" = c(2,3))
# three identical results:
DT[['bc D']]
DT$bc
DT[,`bc D`]
Okay, so partial matching with $ (which also works with data.frames) gets you out of using quotes. But it will bring trouble if you get it wrong.
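A small sketch of how partial matching with $ can go wrong, using a plain data.frame and made-up column names: as soon as a second column shares the prefix, the abbreviation becomes ambiguous and $ returns NULL instead of raising an error.

```r
df <- data.frame(value_a = 1:2)

# Only one column starts with "val", so the partial match succeeds.
before <- df$val        # 1 2

# Add a second column with the same prefix.
df$value_b <- 3:4

# Now "val" is ambiguous and $ silently gives NULL.
after <- df$val         # NULL

# [[ ]] matches names exactly by default, so it is the safer choice.
exact <- df[["value_a"]]
```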
