I want to identify the tweets in my Twitter data set that contain a URL, for example by looking for the "http://" marker.
How do I do this in R? For example, the tweet texts are:
"#RainxDog #twitpic Please HELP #OccupyWallStreet and RT this video: http://t.co/vjwNR7TC"
"#degamuna Please HELP #OccupyWallStreet and RT this video: http://t.co/vjwNR7TC"
You can use grep:

# data is a character vector of tweet texts
hits <- grep("http://", data, fixed = TRUE)  # indices of tweets containing "http://"
if (length(hits) > 0) {
  data[hits]  # the matching tweets
}
Your relatively simple question hides something that is actually very tricky. In your two examples, the URLs:
were of the form http://t.co/. What about bit.ly links? What about https?
appeared at the end of the tweet. What about URLs in the middle or at the start of the tweet?
Construct a set of sample tweets and make sure that your regular expression works on all of them.
Basically, you need a regular expression. Stack Overflow questions to look at are:
How to extract a URL from a Tweet with a JavaScript RegEx?
What's the cleanest way to extract URLs from a string using Python?
These questions also contain links.
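Following that advice, here is a minimal sketch in Python (one of the linked questions is Python-specific) of a slightly more general pattern; the bit.ly alternative is illustrative rather than exhaustive:

import re

# Matches http:// or https:// URLs anywhere in the text, plus bare
# bit.ly links; a sketch, not a complete URL grammar.
url_pattern = re.compile(r"https?://\S+|\bbit\.ly/\S+")

tweets = [
    "#degamuna Please HELP #OccupyWallStreet and RT this video: http://t.co/vjwNR7TC",
    "no link in this one",
]
tweets_with_urls = [t for t in tweets if url_pattern.search(t)]
print(tweets_with_urls)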
You can get all the URLs of a tweet using Twitter Entities. When you make the REST call, make sure you include
&include_entities=true
This will give you a section in the JSON or XML called entities. There will be a child node called urls.
Here's an example of what will be returned.
"text": "Twitter for Mac is now easier and faster, and you can open multiple windows at once http://t.co/0JG5Mcq",
"entities": {
"media": [
],
"urls": [
{
"url": "http://t.co/0JG5Mcq",
"display_url": "blog.twitter.com/2011/05/twitte…",
"expanded_url": "http://blog.twitter.com/2011/05/twitter-for-mac-update.html",
"indices": [
84,
103
]
}
],
"user_mentions": [
],
"hashtags": [
]
}
So, look for entities -> urls to see if a tweet contains a link to an external site.
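A minimal sketch of that check in Python, assuming tweet_json holds the JSON text of one tweet fetched with include_entities=true:

import json

def has_external_link(tweet_json):
    # The urls list under entities is non-empty when the tweet links out.
    tweet = json.loads(tweet_json)
    return bool(tweet.get("entities", {}).get("urls"))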
I have a little Python program that other people use, and I would like to offer opt-in telemetry so that I can get an idea of the usage patterns. Google Analytics 4 with the Measurement Protocol seems to be the thing I want to use. I have created a new property and a new data stream.
I have tried to validate the request by sending a POST to www.google-analytics.com/debug/mp/collect?measurement_id=G-LQDLGRLGZS&api_secret=JXGZ_CyvTt29ucNi9y0DkA with this JSON payload:
{
  "app_instance_id": "MyAppId",
  "client_id": "TestClient.xx",
  "events": [
    {
      "name": "login",
      "params": {}
    }
  ]
}
The response that I get is this:
{
  "validationMessages": [
    {
      "description": "Cannot parse non Measurement Protocol hits.",
      "validationCode": "INTERNAL_ERROR"
    }
  ]
}
I seem to be doing exactly what they do in the documentation and tutorials. I must be doing something wrong, but I don't know what is missing. What do I have to do to successfully validate the request?
Try removing the /debug part of the URL. It is not present in the example you followed, so your request is not quite the same.
We just came across the same issue, and the solution for us was to put https:// in front of the URL. Hope this helps.
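A minimal sketch of that fix in Python, using only the standard library; the endpoint and payload are taken from the question, with the explicit https:// scheme as the only change:

import json
import urllib.request

url = ("https://www.google-analytics.com/debug/mp/collect"
       "?measurement_id=G-LQDLGRLGZS&api_secret=JXGZ_CyvTt29ucNi9y0DkA")
payload = {
    "app_instance_id": "MyAppId",
    "client_id": "TestClient.xx",
    "events": [{"name": "login", "params": {}}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # validation messages, if any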
Hi, I am trying Google Cloud Vision to detect characters and words in Arabic from an image, but when I try it, the result matches them to English characters instead.
The request code is below:
{
  "requests": [
    {
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ],
      "image": {
        "source": {
          "imageUri": "gs://dummy/noon-1.png"
        }
      },
      "imageContext": {
        "languageHints": [
          "ar"
        ]
      }
    }
  ]
}
The Vision API service uses machine learning models that are constantly being retrained to improve the quality of the results; however, they sometimes get characters wrong or fail to recognize them at all.
Based on this, I suggest you take a look at the Supported Images document, where you can find file format and image sizing recommendations that may help improve the accuracy of your results. You can also use the Send Feedback button, located at the lower left and upper right corners of the Vision API public documentation, or the Issue Tracker tool to raise a Vision API request and notify Google about this behavior.
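For reference, a minimal sketch of sending the request from the question to the REST endpoint, assuming API_KEY holds a valid Cloud Vision API key and the gs:// URI is readable by the service:

import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # assumption: a valid Cloud Vision API key
request_body = {
    "requests": [{
        "features": [{"type": "TEXT_DETECTION"}],
        "image": {"source": {"imageUri": "gs://dummy/noon-1.png"}},
        "imageContext": {"languageHints": ["ar"]},
    }]
}
req = urllib.request.Request(
    "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY,
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
    print(result["responses"][0].get("textAnnotations"))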
First off, I do not want to use AMAZON.LITERAL, as it is US-only (I'm UK based) and I doubt it will be supported much longer.
I need a wildcard slot to allow users to say a place name (name of a shop for example), followed by the city.
City is easy, no problem.
The issue is the place name. I have a custom slot, but I can't list every shop in every city in the values.
I put a value of "any" in, which kind of works, but in the response I only get the last word if the user says a name containing several words, e.g. "Pound Land" would just return "Land".
Has anyone managed to do this?
As of 2018, you can use phrase slots to capture user input that you cannot predefine:
{
  "intents": [
    {
      "name": "SearchIntent",
      "slots": [
        {
          "name": "Query",
          "type": "AMAZON.SearchQuery"
        },
        {
          "name": "CityList",
          "type": "AMAZON.US_CITY"
        }
      ],
      "samples": [
        "search for {Query} near me",
        "find out {Query}",
        "search for {Query}",
        "give me details about {CityList}"
      ]
    }
  ]
}
https://developer.amazon.com/blogs/alexa/post/a2716002-0f50-4587-b038-31ce631c0c07/enhance-speech-recognition-of-your-alexa-skills-with-phrase-slots-and-amazon-searchquery
When using custom slot types, AWS may return values from outside the list, though it will try to map what it hears to your values. You can "hack" this behavior by providing a huge list of possible values; maybe scrape a list of places and use that. I once tried with a list of 3000 landmarks, and it definitely returned slot values that were not in the list. The recognition was not great, but I had an acoustic similarity function that allowed me to retrieve items from the list when needed. That was a while ago, when they first talked about deprecating AMAZON.LITERAL but eventually left it, so I didn't have to worry about this.
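A minimal sketch of that retrieval step in Python, using plain string similarity as a stand-in for the acoustic similarity function mentioned above (which is not shown in the original):

import difflib

# Hypothetical place list; in practice this would be your scraped list.
known_places = ["poundland", "tesco", "sainsburys", "marks and spencer"]

def match_place(slot_value, cutoff=0.6):
    # Return the closest known place to what Alexa heard, or None.
    matches = difflib.get_close_matches(slot_value.lower(), known_places,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_place("pound land"))  # -> "poundland"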
I am using the Google Cloud Vision API to analyze pictures. Is there a list of all the possible responses for the labelAnnotations method?
The API reference for the Vision API gives an overview of all the possible JSON responses for the different image annotation requests.
The labelAnnotation request returns a generic EntityAnnotation response; you can find its JSON representation here, along with more information about the JSON representation of BoundingPoly, LocationInfo, and Property:
{
  "mid": string,
  "locale": string,
  "description": string,
  "score": number,
  "confidence": number,
  "topicality": number,
  "boundingPoly": {
    object(BoundingPoly)
  },
  "locations": [
    {
      object(LocationInfo)
    }
  ],
  "properties": [
    {
      object(Property)
    }
  ]
}
I think you're asking whether you can get a look at the list of possible labels/entities that the Cloud Vision API will detect. If that's the case, the short answer is no, not in any manageable way.
The more complicated answer is sort of, since most labels will have a property for the knowledge graph entry (e.g., {desc: 'dog', mid: '/m/0bt9lr'}). This means that you can look up more information about the label/entity using the Knowledge Graph API.
While you can't "store a copy" of Google's Knowledge Graph as a list of choices in a drop-down on a page, you can use the API to do a look-up after the Vision API responds with an ID.
I'm trying to retrieve the full topic description/summary for some Freebase articles. I have been using the Freebase topic API, which returns this type of result: http://www.freebase.com/experimental/topic/standard?id=/en/jimi_hendrix
But I notice that the description is not complete and ends with "...". Is there a way to use a Freebase API to obtain the article's full description?
Does Freebase even store the complete description, or does it just store a portion of the description from Wikipedia?
Freebase just stores a portion of the Wikipedia description but there is usually more than what's given by the topic API.
To get the "full" text for a Wikipedia blurb associated with a Freebase topic you first need to query the Read API for a list of related articles like this:
{
  "id": "/en/jimi_hendrix",
  "/common/topic/article": [{}]
}
Try it in the Query Editor
Then choose one or more of the articles it returns and feed their IDs into the /trans/raw API like this:
http://api.freebase.com/api/trans/raw/m/043dz
You'll notice that the blurb of text that gets returned is a bit longer (1200 chars) and doesn't have the "...", but it's still chopped off at the end.
When I display Freebase topic descriptions in a web page, I have some code to clean them up beforehand. I split the text into paragraphs by looking for newlines, and then, if the last paragraph doesn't end with a period, exclamation mark, or question mark, I throw that paragraph away. The way the Wikipedia blurbs are written, you usually only need the first paragraph anyway.
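A minimal sketch of that cleanup in Python; the original answer doesn't show its code, so this is just an illustration of the described approach:

def clean_blurb(blurb):
    # Split on newlines and drop empty paragraphs.
    paragraphs = [p for p in blurb.split("\n") if p.strip()]
    # Drop a trailing paragraph that was chopped off mid-sentence.
    if paragraphs and not paragraphs[-1].rstrip().endswith((".", "!", "?")):
        paragraphs.pop()
    return "\n\n".join(paragraphs)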
You can also fetch this directly from MQL with the "text" extension:
{
  "id": "/en/jimi_hendrix",
  "/common/topic/article": [{
    "text": {
      "maxlength": 16384,
      "chars": null
    }
  }]
}
Note that you'll need to turn on MQL extensions for this to work - see here for an example of this in action.
Edit August 2012: while this works for the original freebase.com hosted APIs, the MQL extension functionality has been removed from the new googleapis.com hosted APIs, so this method shouldn't be relied on any more.