I'm trying to convert the IP lists from here:
http://user-agent-string.info/list-of-ua/bots
into CIDR format.
As you can see, the current format is similar to:
131.253.46.102 msnbot-131-253-46-102.search.msn.com US
131.253.46.105 msnbot-131-253-46-105.search.msn.com US
131.253.46.108 msnbot-131-253-46-108.search.msn.com US
131.253.46.112 msnbot-131-253-46-112.search.msn.com US
131.253.46.114 msnbot-131-253-46-114.search.msn.com US
131.253.46.116 msnbot-131-253-46-116.search.msn.com US
131.253.46.120 msnbot-131-253-46-120.search.msn.com US
etc.
I have found a script on GitHub:
https://github.com/NewEraCracker/php_work/blob/master/ipRangeCalculate.php
I think it does what I want, but I'm unsure how to use it to get the results I need.
Will this script do the job and, if so, how is it best used? If not, is there a solution I haven't found yet?
Thanks.
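For comparison, here is a minimal Python sketch of the conversion itself; it is not the linked PHP script. It assumes each input line starts with an IP address followed by whitespace-separated fields, as in the sample above, and the function name ips_to_cidrs is made up for illustration. It relies on the standard library's ipaddress.collapse_addresses, which only merges addresses that exactly fill a CIDR block, so isolated addresses stay as /32 entries.

```python
import ipaddress


def ips_to_cidrs(lines):
    """Collapse 'IP hostname country' lines into the smallest exact CIDR list.

    Hypothetical helper: only the first field of each line is used, and
    lines whose first field is not a valid IP address are skipped.
    """
    networks = {4: set(), 6: set()}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # A bare address becomes a single-host network (/32 or /128).
            net = ipaddress.ip_network(fields[0])
        except ValueError:
            continue  # e.g. a trailing "etc." or a header line
        networks[net.version].add(net)

    result = []
    # collapse_addresses cannot mix IPv4 and IPv6, so handle them separately.
    for version in (4, 6):
        collapsed = ipaddress.collapse_addresses(networks[version])
        result.extend(str(net) for net in collapsed)
    return result


if __name__ == "__main__":
    sample = [
        "131.253.46.102 msnbot-131-253-46-102.search.msn.com US",
        "131.253.46.105 msnbot-131-253-46-105.search.msn.com US",
    ]
    for cidr in ips_to_cidrs(sample):
        print(cidr)
```

With the sample lines above, none of the addresses are adjacent, so each would come back as its own /32. Covering the gaps with a wider block is a policy decision this sketch does not make.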
I have a list of names in my dataframe and I want to look them up in Wikipedia. It's not as simple as appending the name to "https://en.wikipedia.org/wiki/": I want to actually query Wikipedia so that it suggests a match even if the name isn't spelt correctly. For example, if I put in Dick Dawkins, it should come up with Richard Dawkins. I checked, and that is actually the first hit on Wikipedia.
Ideally I'd use rvest, but I don't want to collect every URL manually. Is this possible?
You are right. I also had a hard time finding a page for Dick Dawkins on Wikipedia: even Wikipedia's own search box took me straight to Richard Dawkins.
However, if you want to search for a term (say "Richard Dawkins"), Wikipedia has a proper API for that (https://www.mediawiki.org/wiki/API:Tutorial). You can experiment to find the parameters that work for you.
To get you started, I wrote a function (somewhat similar to rg255's post). You can change the argument passed to MySearch. Make sure that spaces in the search string are replaced by '%20' for every query from your dataframe; a simple gsub call should do the job. You will also need to install the 'jsonlite' package for this to work.
library(jsonlite)

# Query the MediaWiki search API and return the parsed JSON response.
# srsearch must already be URL-encoded (spaces as %20).
MySearch <- function(srsearch){
  FullSearchString <- paste("https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=",srsearch,"&format=json",sep="")
  Response <- fromJSON(FullSearchString)
  return(Response)
}

Response <- MySearch("Richard%20Dawkins")
You can now pull the properties you want out of the parsed JSON. As I said, you will have to play with the parameters to get it right.
Please let me know if this is not what you wanted.
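The URL construction can also be sketched in Python for comparison, letting the standard library do the percent-encoding instead of a manual substitution. This is only an illustration under stated assumptions: the helper name build_search_url is hypothetical, the query parameters are the same ones used in the R function above, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Same endpoint and parameters as the R function above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term):
    """Return the search URL for a raw (unencoded) search term.

    Hypothetical helper: quote_via=quote encodes spaces as %20
    rather than the default '+'.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("Richard Dawkins"))
```

Encoding inside the helper means the names from the dataframe can be passed in unchanged, including ones with apostrophes or accented characters.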
I keep mistyping the at (@) character while writing R code. It gets its own colour in the editor, so I assume it is meant to do something useful. What is it used for?
The "at"-sign is used to access S4 slots. It is the equivalent of the "dollar"-sign used to access lists (of which data.frames are but one example.)
On the other hand, you might be talking about its special use in certain external packages. But I'm guessing that's not the case here, because that would imply that you knew quite a bit about R.
I am using the query() function from the seqinr package to download myoglobin DNA sequences from GenBank, e.g.:
query("myoglobins","K=myoglobin AND SP=Turdus merula")
Unfortunately, for a lot of the species I'm looking for I don't get any sequence at all (or, for this species, only a very short one), even though I find sequences when I search manually on the website. The reason is that the query searches for "myoglobin" in the keywords only, and that field is often empty. Often the protein type is specified only in the name ("definition" on GenBank), but I have no idea how to search that field.
The help page on query() doesn't seem to offer any option for this in the details, a "generic search" without any "K=" doesn't work, and I haven't found anything via googling.
I'd be happy about any links, explanations and help. Thank you! :)
There is a complete manual for the seqinr package that describes the query language in more depth in chapter 5 (available at http://seqinr.r-forge.r-project.org/seqinr_2_0-1.pdf). I was trying to do a similar query, and the description for many of the genes/CDS is blank, so they don't come up when searching with the K= option. One alternative would be to search for the organism alone, then match gene names in the individual annotations and pull out the accession numbers, which you could then use to re-query the database for your sequences.
This would pull out the annotation for the first gene:
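library(seqinr)

# Select the database, then list all entries for the species
choosebank("emblTP")
query("ACexample", "sp=Turdus merula")
# Name and annotation of the first entry returned
getName(ACexample$req[[1]])
annotations <- getAnnot(ACexample$req[[1]])
cat(annotations, sep = "\n")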
I think this would be a pretty time-consuming way to tackle the problem, but there doesn't seem to be an efficient way of searching the annotations directly. I'd be interested in any solutions you might come up with.
Any ideas how I can get a varied set of time / date strings to test a parser?
The idea is to see how wide a range of different formats can be parsed. Note that I am looking for different formats, so simply extracting all timestamps from a bunch of emails isn't that useful (since the format is fixed by RFC 2822).
[Also, I am not sure this is appropriate for SO, sorry, so please feel free to suggest an alternative place to ask.]
You'll probably have to create your own list. But here are some resources describing some of the various formats you might encounter:
http://www.hackcraft.net/web/datetime/
http://en.wikipedia.org/wiki/Date_format_by_country
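One way to build such a list is to render a few fixed moments through a set of format patterns. The Python sketch below assumes the formats of interest can be written as strftime patterns; the particular patterns and the helper name sample_date_strings are illustrative choices, not a complete catalogue. Month names and AM/PM markers follow the active locale.

```python
from datetime import datetime

# Illustrative patterns only; extend with formats from the resources above.
FORMATS = [
    "%Y-%m-%d",             # ISO 8601 date
    "%d/%m/%Y",             # day first
    "%m/%d/%Y",             # month first
    "%d.%m.%Y",             # dotted, day first
    "%Y-%m-%dT%H:%M:%S",    # ISO 8601 date and time
    "%d %b %Y %H:%M",       # abbreviated month name
    "%I:%M %p, %B %d, %Y",  # 12-hour clock, full month name
]


def sample_date_strings(moments, formats=FORMATS):
    """Render every moment in every format (hypothetical helper)."""
    return [m.strftime(f) for m in moments for f in formats]


if __name__ == "__main__":
    # Pick moments that expose ambiguity, e.g. day <= 12 and day > 12.
    moments = [datetime(2009, 3, 7, 14, 5, 9), datetime(2009, 12, 31, 23, 59, 59)]
    for text in sample_date_strings(moments):
        print(text)
```

Because each string is generated from a known moment, the expected parse result is available for free when checking the parser.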
I'm trying to compile a decent .zwl file for Squiggly spell checking in Flex, using British words rather than the American ones supplied by default.
I've managed to create a decent British word list and ran it through the AdobeSpellingGen app to get a .zwl; great stuff.
However, I need to add a list of names to it so they won't be flagged.
Does anyone know of a good source, free or paid, for a list of English forenames and surnames? I'm trying BT as I type :)
Thanks, any help with this would be greatly appreciated.
There are lots of baby-name sites out there. This might be a good one, as it would be fairly easy to copy: http://www.listofbabynames.org/a_boys.htm
I'll keep looking.
You can screen scrape http://www.britishsurnames.co.uk/browse for a list of surnames. I'm not sure where you'd find first names though.
GNU Aspell has spell checking for common names. You can try it out here:
http://chxo.com/scripts/spellcheck.php?showsource=1
The source is here: http://aspell.net/
I'm not too familiar with it, though, so I couldn't tell you how to extract the dictionaries.
The US Census site has a list of >150k first names and surnames from the 1990 and 2000 censuses, at
http://www.census.gov/genealogy/www/
Of course, these aren't UK names, but might do if you can't find anything better.