removing some part of string from url - r

I want to remove the JSESSIONID part and the leading slash from a given URL string, e.g.
/product.screen?productId=BS-AG-G09&JSESSIONID=SD1SL6FF6ADFF6510
so that the output would be
product.screen?productId=BS-AG-G09
More data, for example:
1 /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953
2 /oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7ADFF4953
3 /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953
4 /product.screen?productId=FS-SG-G03&JSESSIONID=SD0SL6FF7ADFF4953
5 /cart.do?action=remove&itemId=EST-11&productId=WC-SH-A01&JSESSIONID=SD0SL6FF7ADFF4953
6 /oldlink?itemId=EST-14&JSESSIONID=SD0SL6FF7ADFF4953
7 /cart.do?action=view&itemId=EST-6&productId=MB-AG-T01&JSESSIONID=SD1SL6FF6ADFF6510
8 /product.screen?productId=BS-AG-G09&JSESSIONID=SD1SL6FF6ADFF6510
9 /product.screen?productId=WC-SH-A02&JSESSIONID=SD1SL6FF6ADFF6510
10 /cart.do?action=view&itemId=EST-6&productId=WC-SH-A02&JSESSIONID=SD1SL6FF6ADFF6510
11 /product.screen?productId=WC-SH-A02&JSESSIONID=SD1SL6FF6ADFF6510

You may use:
library(stringi)
lf1 = "/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
stri_replace_all_regex(lf1, "&JSESSIONID=.*", "")
Here the pattern &JSESSIONID=.* (everything from &JSESSIONID= to the end of the string) is replaced with nothing ("").
Or simply: gsub("&JSESSIONID=.*", "", lf1)
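Since the desired output also drops the leading slash, here is a minimal sketch that removes both, applied to a vector built from the sample data above (the names urls and cleaned are just for illustration):
library(stringi)
urls <- c("/product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953",
          "/oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7ADFF4953",
          "/cart.do?action=remove&itemId=EST-11&productId=WC-SH-A01&JSESSIONID=SD0SL6FF7ADFF4953")
# drop everything from "&JSESSIONID=" to the end, then the leading slash
cleaned <- stri_replace_all_regex(urls, "&JSESSIONID=.*", "")
cleaned <- stri_replace_first_regex(cleaned, "^/", "")
cleaned
# "product.screen?productId=WC-SH-A02"
# "oldlink?itemId=EST-6"
# "cart.do?action=remove&itemId=EST-11&productId=WC-SH-A01"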

Related

split a string on every odd positioned spaces

I have strings of varying lengths in this format:
"/S498QSB 0 'Score=0' 1 'Score=1' 2 'Score=2' 3 'Score=3' 7 'Not administered'"
The first item is a column name and the other items tell us how this column is encoded.
I want the following output:
/S498QSB
0 'Score=0'
1 'Score=1'
2 'Score=2'
3 'Score=3'
7 'Not administered'
str_split should do it, but it's not working for me:
str_split("/S498QSB 0 'Score=0' 1 'Score=1' 2 'Score=2' 3 'Score=3' 7 'Not administered'",
"([ ].*?[ ].*?)[ ]")
You can use
str_split(x, "\\s+(?=\\d+\\s+')")
See the regex demo.
Details:
\s+ - one or more whitespaces
(?=\d+\s+') - a positive lookahead that requires the following sequence of patterns immediately to the right of the current location:
\d+ - one or more digits
\s+ - one or more whitespaces
' - a single quotation mark.
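Put together, a small sketch (the variable x is just the example string from the question):
library(stringr)
x <- "/S498QSB 0 'Score=0' 1 'Score=1' 2 'Score=2' 3 'Score=3' 7 'Not administered'"
# split on whitespace only when it is followed by digits, more whitespace, and a quote
str_split(x, "\\s+(?=\\d+\\s+')")[[1]]
# "/S498QSB"  "0 'Score=0'"  "1 'Score=1'"  "2 'Score=2'"  "3 'Score=3'"  "7 'Not administered'"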

csplit in zsh: splitting file based on pattern

I would like to split the following file based on the pattern ABC:
ABC
4
5
6
ABC
1
2
3
ABC
1
2
3
4
ABC
8
2
3
to get file1:
ABC
4
5
6
file2:
ABC
1
2
3
etc.
Looking at the docs of man csplit: csplit my_file /regex/ {num}.
I can split this file using csplit my_file '/^ABC$/' {2}, but this requires me to put in a number for {num}. When I try to match with {*}, which is supposed to repeat the pattern as many times as possible, I get the error:
csplit: *}: bad repetition count
I am using zsh.
To split a file on a pattern like this, I would turn to awk:
awk 'BEGIN { i=0; }
     /^ABC/ { ++i; }
     { print >> ("file" i) }' < input
This reads lines from the file named input; before reading any lines, the BEGIN section explicitly initializes an "i" variable to zero; variables in awk default to zero, but it never hurts to be explicit. The "i" variable is our index to the serial filenames.
Subsequently, each line that starts with "ABC" will increment this "i" variable.
Any and every line in the file will then be printed (in append mode) to the file name that's generated from the text "file" and the current value of the "i" variable.

Remove string if it is only last part

I have a dataframe as follows:
A B
mediafile 1
filemedia 1
media time 1
time media 1
How do I remove the word "media" only if it is the last word in the column? Desired final output:
A B
mediafile 1
file 1
media time 1
time 1
Thanks!
In regex, $ means "end of the string", so media$ will match media only if it is immediately followed by the end of the string.
Use gsub for find/replace:
your_data$A = gsub(pattern = "media$", replacement = "", x = your_data$A)
R uses regex the same as any other language, so in the future I'd recommend searching SO for something like "[regex] at end of string", which turned up this question, from which you probably could have generalized.
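As a concrete sketch (assuming the data frame is named your_data, as above), note that gsub("media$", ...) leaves a trailing space behind in "time media", so you may also want to trim it:
your_data <- data.frame(A = c("mediafile", "filemedia", "media time", "time media"),
                        B = c(1, 1, 1, 1))
# drop "media" only at the end of the string, then trim any space it leaves behind
your_data$A <- trimws(gsub("media$", "", your_data$A))
your_data$A
# "mediafile"  "file"  "media time"  "time"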

Error in lis[[i]] : attempt to select less than one element

This code is meant to compute the total distance of some given coordinates, but I don't know why it's not working.
The error is: Error in lis[[i]] : attempt to select less than one element.
Here is the code:
distant <- function(a, b)
{
  return(sqrt((a[1] - b[1])^2 + (a[2] - b[2])^2))
}
totdistance <- function(lis)
{
  totdis = 0
  for (i in 1:length(lis)-1)
  {
    totdis = totdis + distant(lis[[i]], lis[[i+1]])
  }
  totdis = totdis + distant(lis[[1]], lis[[length(lis)]])
  return(totdis)
}
liss1<-list()
liss1[[1]]<-c(12,12)
liss1[[2]]<-c(18,23)
liss1[[4]]<-c(29,25)
liss1[[5]]<-c(31,52)
liss1[[3]]<-c(24,21)
liss1[[6]]<-c(36,43)
liss1[[7]]<-c(37,14)
liss1[[8]]<-c(42,8)
liss1[[9]]<-c(51,47)
liss1[[10]]<-c(62,53)
liss1[[11]]<-c(63,19)
liss1[[12]]<-c(69,39)
liss1[[13]]<-c(81,7)
liss1[[14]]<-c(82,18)
liss1[[15]]<-c(83,40)
liss1[[16]]<-c(88,30)
Output:
> totdistance(liss1)
Error in lis[[i]] : attempt to select less than one element
> distant(liss1[[2]],liss1[[3]])
[1] 6.324555
Let me reproduce your error in a simple way:
> list1 = list()
> list1[[0]] = list(a = c("a"))
Error in list1[[0]] = list(a = c("a")) :
  attempt to select less than one element
So the next question is: where are you accessing a 0 index of the list?
(Indexing of lists starts at 1 in R.)
As Molx indicated in a previous post, "The : operator is evaluated before the subtraction". This is what causes the 0-indexed list access.
For example:
> 1:10-1
[1] 0 1 2 3 4 5 6 7 8 9
> 1:(10-1)
[1] 1 2 3 4 5 6 7 8 9
So replace the loop in your code with:
for (i in 1:(length(lis)-1))
{
  totdis = totdis + distant(lis[[i]], lis[[i+1]])
}
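For reference, here is the full corrected function (identical to the code in the question, except for the parenthesized loop range):
totdistance <- function(lis)
{
  totdis <- 0
  # parentheses make sure length(lis) - 1 is computed before building the sequence
  for (i in 1:(length(lis) - 1))
  {
    totdis <- totdis + distant(lis[[i]], lis[[i + 1]])
  }
  # close the tour: add the distance from the last point back to the first
  totdis <- totdis + distant(lis[[1]], lis[[length(lis)]])
  return(totdis)
}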

Exception importing data into neo4j using batch-import

I am running Neo4j 1.8.2 on a remote Unix box. I am using this jar (https://github.com/jexp/batch-import/downloads).
nodes.csv is same as given in example:
name age works_on
Michael 37 neo4j
Selina 14
Rana 6
Selma 4
rels.csv is like this:
start end type since counter:int
1 2 FATHER_OF 1998-07-10 1
1 3 FATHER_OF 2007-09-15 2
1 4 FATHER_OF 2008-05-03 3
3 4 SISTER_OF 2008-05-03 5
2 3 SISTER_OF 2007-09-15 7
But I am getting this exception:
Using Existing Configuration File
Total import time: 0 seconds
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
at org.neo4j.batchimport.Importer$Data.split(Importer.java:156)
at org.neo4j.batchimport.Importer$Data.update(Importer.java:167)
at org.neo4j.batchimport.Importer.importNodes(Importer.java:226)
at org.neo4j.batchimport.Importer.main(Importer.java:83)
I am new to Neo4j and was checking whether this importer can save some coding effort.
It would be great if someone could point out the probable mistake.
Thanks for the help!
--Edit:--
My nodes.csv
name dob city state s_id balance desc mgr_primary mgr_secondary mgr_tertiary mgr_name mgr_status
John Von 8/11/1928 Denver CO 1114-010 7.5 RA 0023-0990 0100-0110 Doozman Keith Active
my rels.csv
start end type since status f_type f_num
2 1 address_of
1 3 has_account 5 Active
4 3 f_of Primary 0111-0230
Hi, I had some issues in the past with the batch import script.
The formatting of your files must be very rigorous, which means:
no extra spaces where they are not expected, like the ones I see in the first line of your rels.csv before "start"
no multiple spaces in place of a tab. If your files are exactly like what you've copied here, you have 4 spaces instead of one tab, and that is not going to work, as the script uses a tokenizer looking for tabs!
I had this issue because I always convert tabs to 4 spaces, and once I understood that, I stopped doing it for my CSV files!
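Since the rest of this page works in R, here is one hedged way to sanity-check a file from R before handing it to the importer (the file name is just an example):
# every line of a tab-delimited file should contain real tab characters
lines <- readLines("rels.csv")
all(grepl("\t", lines))        # expect TRUE; FALSE suggests spaces were used instead
# inspect how the columns come apart when split on tabs
head(strsplit(lines, "\t"))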
