I have some fields that are really long, but I just want to see the beginning of them. Is there a way to truncate a field to only the first X characters?
If you mean trimming long strings, sure. Figure out which strings to trim, then trim them.
e.g., trimming a string to the first 10 characters
$ echo '"12345678901234567890"' | jq '.[0:10]'
Read: take the characters from index 0 up to, but not including, index 10, i.e. the first 10 characters of the string.
If you want to recursively trim all strings:
.. |= (if type == "string" then .[0:2] else . end)
For example, if the input is:
{"a": "aaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"b": "bbbbbbbbbbbbbbbbbbbbbbbbbb",
"c": ["ddddddddddddddd"]
}
the output (compacted) would be:
{"a":"aa","b":"bb","c":["dd"]}
Related
I have a dataset with some columns that hold monetary values, but judging by the column names and their descriptions, I believe the numbers are represented incorrectly. For example, 5,52,32,974: I believe there is one comma too many, or a comma in the wrong position. Is it possible to remove a certain comma in this case and arrive at a representation such as 55.232.974 (in $, for example)? The dataset is a .csv file. Thanks in advance.
If I understand correctly, your data is given as a string.
Then you could use the following code:
a <- c("5,52,32,974", "5,52,32,974", "5,52,32,974")
b <- gsub(",", "", a)
as.numeric(b)
#[1] 55232974 55232974 55232974
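If the goal is the dotted display from the question (55.232.974), one possible follow-up is to format the cleaned numbers with base R's format(). This is only a sketch: it assumes the values are whole amounts and that "." is wanted as the thousands separator, and the variable name formatted is illustrative.

```r
# Raw values as they might appear in the .csv column (read as character)
a <- c("5,52,32,974", "5,52,32,974", "5,52,32,974")

# Strip every comma, then convert to numeric
b <- as.numeric(gsub(",", "", a))

# Re-insert a separator every three digits; decimal.mark is set to ","
# so that it does not clash with the "." used as big.mark
formatted <- format(b, big.mark = ".", decimal.mark = ",", scientific = FALSE)
print(formatted)
```

format() returns character strings, so keep the numeric vector b for any arithmetic and use the formatted version for display only.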
I have a vector of 8-character file names of the format
"/relative/path/to/folder/a(bc|de|fg)...[xy]1.sav"
where each bracketed group stands for one of two or three known alternatives, and the '...' stands for three unknown characters. I want to match all file names that have the same unknown sequence XXX and sort them into a list of character vectors.
I am not sure how to proceed. I am thinking of extracting the letters in the fourth to sixth positions (...) into a vector, then using `grep` to get all the files with the matching string.
E.g.
# Pseudo-code. Not functioning code, but sort of the thing I want to do
> char.extr <- str_extract(file.vector, !"a(bc|de|fg)...[xy]1.sav")
> char.extr
"JKL", "MNO" ,"PQR" ...
# Use grep and lapply to put matched strings into list
> path.list <- lapply(char.extr, grep, file.vector)
> path.list
1. "/relative/path/to/folder/abcJKLx1.sav"
"/relative/path/to/folder/adeJKLy1.sav"
2. "/relative/path/to/folder/afgMNOx1.sav"
"/relative/path/to/folder/abcMNOy1.sav"
Since we know the name structure, I'd imagine that extracting the 3-letter substring and then using split to group the file names by it is what you're looking for.
split(file.vector, substr(basename(file.vector), 4, 6))
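To make the grouping concrete, here is a small self-contained sketch using the four example paths from the question. The names keys and keys.regex are illustrative, and the regex variant is an optional alternative (not part of the original answer) for cases where you would rather validate the whole name pattern than rely on fixed character positions.

```r
# Example input, taken from the paths shown in the question
file.vector <- c(
  "/relative/path/to/folder/abcJKLx1.sav",
  "/relative/path/to/folder/adeJKLy1.sav",
  "/relative/path/to/folder/afgMNOx1.sav",
  "/relative/path/to/folder/abcMNOy1.sav"
)

# Characters 4 to 6 of the file name (directory stripped) are the unknown part
keys <- substr(basename(file.vector), 4, 6)

# One list element per distinct key, each holding the matching paths
path.list <- split(file.vector, keys)
print(path.list)

# Alternative: capture the three unknown characters with a regex that
# spells out the full name structure
keys.regex <- sub("^a(bc|de|fg)(...)[xy]1\\.sav$", "\\2", basename(file.vector))
```

The result is a named list, so path.list$JKL or path.list[["MNO"]] returns the files that share that sequence.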
After I collapse my rows and separate using a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple semicolons represent blanks in a cell. For example an observation may look as follows after the collapse:
;TX;PA;CA;;;;;;;
I'd like the cell to look like this:
TX;PA;CA
Here is my collapse code:
new_df <- group_by(old_df, unique_id) %>% summarize_each(funs(paste(., collapse = ';')))
If I try to gsub for semicolon it removes all of them. If I remove the end character it just removes one of the semicolons. Any ideas on how to remove all at the beginning and end, but leaving the ones in between the observations? Thanks.
Use the regular expression ^;+|;+$
x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)
The ^ matches the start of the string, the + means one or more of the preceding character, and $ matches the end of the string. The | means "OR". So, combined, the pattern matches any run of ; at the start of the string OR any run of ; at the end of the string, and gsub replaces each match with an empty string.
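As a quick check of how the pattern behaves on a whole column, the sketch below applies it to a few made-up values. It also shows one possible way to avoid the stray semicolons in the first place by dropping blanks before collapsing; collapse_non_blank is a hypothetical helper, not something from the original code.

```r
# Made-up values; the last one has an empty cell in the middle
x <- c(";TX;PA;CA;;;;;;;", ";;NY;;", "FL", ";TX;;CA;")

# Remove runs of ";" at the start or end only; interior ones are untouched
trimmed <- gsub("^;+|;+$", "", x)
print(trimmed)

# Hypothetical helper: drop empty strings and NAs before pasting,
# so no leading, trailing, or doubled separators are produced
collapse_non_blank <- function(v) paste(v[!is.na(v) & v != ""], collapse = ";")
collapsed <- collapse_non_blank(c("", "TX", "PA", "CA", "", NA, ""))
print(collapsed)
```

Note that the regex leaves "TX;;CA" as it is, since the doubled semicolon is in the middle of the string; the helper would instead produce "TX;CA" because the blank is removed before collapsing.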
The stringi package lets you specify the characters you wish to preserve and trims everything else from both ends. If you only have letters there (though you could specify other patterns too), you could simply do
stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"
I have column values in my data frame like my_name_is_khan, hello|this|is|it and so on. How do I use str_extract to do this? When I use str_extract, this is what I get. When I do not know the exact number of characters before the first special character (- or |), what do I do?
str_extract("my-name-is-khan", pattern = "[a-z]{1,6}")
[1] "my"
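One way to read the question is "keep everything before the first separator, whatever its length". Under that assumption, a minimal sketch in base R (used here so the example has no package dependencies) is to delete the first separator and everything after it. Treating _ as a separator alongside - and | is also an assumption, based on the sample values.

```r
# Sample values in the styles mentioned in the question
x <- c("my-name-is-khan", "hello|this|is|it", "my_name_is_khan")

# Remove the first "-", "|" or "_" and everything that follows it;
# the "-" is placed first inside the brackets so it is taken literally
first_part <- sub("[-|_].*$", "", x)
print(first_part)
```

Because sub() returns one result per input, a value with no separator at all comes back unchanged rather than being dropped.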