Getting a YouTube video URL to show in the RSS description field? - rss

Here's my problem. I need to take a standard youtube feed output, like this:
F is for Fireflies by Kathy jo Wargin Book Trailer
F is for Fireflies is written by award winning prolific children's
author, Kathy-jo Wargin. This
beautifully illustrated book explores
God's warmest, wonder filled season
from A to Z, by delighting young
readers with beaches and sandcastles,
picnics and lemonade, and all of the
blessings that God's summer brings.
Learn more about this book here,
http://bit.ly/fyLd2L and its author
here, http://bit.ly/gPQcSq Christian,
Childrens
And render it like this:
F is for Fireflies by Kathy jo Wargin Book Trailer
http://www.youtube.com/watch?v=vp6kDK2eYLU&feature=youtube_gdata
F is for Fireflies is written by award winning prolific children's
author, Kathy-jo Wargin. This
beautifully illustrated book explores
God's warmest, wonder filled season
from A to Z, by delighting young
readers with beaches and sandcastles,
picnics and lemonade, and all of the
blessings that God's summer brings.
Learn more about this book here,
http://bit.ly/fyLd2L and its author
here, http://bit.ly/gPQcSq Christian,
Childrens
I've been trying to accomplish this in Yahoo! Pipes, but I can't figure out how to get a plain-text URL into the description field along with the normal YouTube description. Any suggestions?

The idea is to use a Loop module with a String Builder module. Then you can use item.link and item.description to build up your string and save it back into the description field.
Here is a pipe that takes a Youtube search term and returns an RSS feed with the link added to the beginning of the description:
http://pipes.yahoo.com/pipes/pipe.info?_id=7e18b28120534e7b116b482893b2fdfd
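The Loop + String Builder transformation amounts to prepending each item's link to its description. A minimal plain-Python sketch of the same idea, using hypothetical stand-in dicts in place of parsed YouTube feed entries:

```python
# Sketch of the Loop + String Builder idea: for each feed item, prepend
# the item's watch URL to its description. The items below are
# hypothetical stand-ins for parsed YouTube feed entries.
items = [
    {
        "title": "F is for Fireflies by Kathy jo Wargin Book Trailer",
        "link": "http://www.youtube.com/watch?v=vp6kDK2eYLU",
        "description": "F is for Fireflies is written by Kathy-jo Wargin...",
    },
]

for item in items:
    # equivalent to saving link + description back into item.description
    item["description"] = item["link"] + "\n" + item["description"]

print(items[0]["description"].splitlines()[0])
```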

Related

Bayesian networks for text analysis in R

I have a one-page story (i.e. text data), and I need to apply a Bayesian network to that story and analyse it. Could someone tell me whether this is possible in R? If yes, how should I proceed?
The objective of the analysis is to extract action descriptions from narrative text.
The data considered for analysis:
Krishna’s Dharam-shasthra to Arjuna:
The Gita is the conversation between Krishna and Arjuna leading up to the battle.
Krishna emphasised two terms: Karma and Dharma. He told Arjun that this was a righteous war; a war of Dharma. Dharma is the way of righteousness, or a set of rules and laws laid down. The Kauravas were on the side of Adharma and had broken rules and laws, and hence Arjun would have to do his Karma to uphold Dharma.
Arjuna doesn't want to fight. He doesn't understand why he has to shed his family's blood for a kingdom that he doesn't even necessarily want. In his eyes, killing his evil and killing his family is the greatest sin of all. He casts down his weapons and tells Krishna he will not fight. Krishna, then, begins the systematic process of explaining why it is Arjuna's dharmic duty to fight and how he must fight in order to restore his karma.
Krishna first explains the samsaric cycle of birth and death. He says there is no true death of the soul, simply a sloughing of the body at the end of each round of birth and death. The purpose of this cycle is to allow a person to work off their karma, accumulated through lifetimes of action. If a person completes action selflessly, in service to God, then they can work off their karma, eventually leading to a dissolution of the soul, the achievement of enlightenment and vijnana, and an end to the samsaric cycle. If they act selfishly, then they keep accumulating debt, putting themselves further and further into karmic debt.
What I want is a POS tagger to separate verbs, nouns, etc., and then to create a meaningful network from them.
The steps that should be followed in pre-processing are:
syntactic processing (POS tagging)
SRL algorithm (semantic role labelling of the characters of the story)
coreference resolution
Using all of the above I need to create a knowledge database and build a Bayesian network.
This is what I have tried so far to get POS tags:
txt <- c("As the years went by, they remained isolated in their city. Their numbers increased by freeing women from slavery.
Doom would come to the world in the form of Ares the god of war and the Son of Zeus. Ares was unhappy with the gods as he wanted to prove just how foul his father’s creation was. Hence, he decided to corrupt the mortal men created by Zeus. Fearing his wrath upon the world Zeus decided to create the God killer in order to stop Ares. He then commanded Hippolyta to mould a baby from the sand and clay of the island. Then the five goddesses went back into the Underworld, drawing out the last soul that remained in the Well and giving it incredible powers. The soul was merged with the clay and became flesh. Hippolyta had her daughter and named her Diana, Princess of the Amazons, the first child born on Paradise Island.
Each of the six members of the Greek Pantheon granted Diana a gift: Demeter, great strength; Athena, wisdom and courage; Artemis, a hunter's heart and a communion with animals; Aphrodite, beauty and a loving heart; Hestia, sisterhood with fire; Hermes, speed and the power of flight. Diana was also gifted with a sword, the Lasso of truth and the bracelets of penance as weapons to defeat Ares.
The time arrived when Diana, protector of the Amazons and mankind was sent to the Man's World to defeat Ares and rid the mortal men off his corruption. Diana believed that only love could truly rid the world of his influence. Diana was successfully able to complete the task she was sent out by defeating Ares and saving the world.
")
writeLines(txt, tf <- tempfile())  # write the story to a temporary file
library(stringi)
library(cleanNLP)
# tokenizer-only backend: splits the text into tokens but adds no POS tags
cnlp_init_tokenizers()
anno <- cnlp_annotate(tf)
names(anno)
cnlp_get_token(anno)
# the spaCy backend (requires a working Python + spaCy installation) does add POS tags
cnlp_init_spacy()
anno <- cnlp_annotate(tf)
cnlp_get_token(anno)
# alternatively, initialise the Java-based CoreNLP backend
cnlp_init_corenlp()

Importing a text file including <> angle brackets into R

I have a data file which has angle brackets, from http://kavita-ganesan.com/opinosis-opinion-dataset.
<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>
It seems like an XML file, but when I tried
library(XML)
xml.url <- "2007_acura_mdx"  # path to the downloaded file
xmlfile <- xmlTreeParse(xml.url)
class(xmlfile)
xmltop <- xmlRoot(xmlfile)
topxml <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml), row.names = NULL)
I had a problem when data.frame() was executed. Can anyone help me? At this moment I would like to use grep() and gsub(), but this is also not easy.
The file as a whole is not well-formed XML because it has multiple top-level elements; wrapping it in a single root element fixes that. Try this:
txt <- "<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>"
library(XML)
txt2 <- paste("<root>", txt, "</root>")
doc <- xmlTreeParse(txt2, asText = TRUE, useInternalNodes = TRUE)
L <- xpathApply(doc, "//DOC", xmlApply, FUN = xmlValue)
dd <- do.call(rbind, lapply(L, as.data.frame, stringsAsFactors = FALSE))
giving:
> str(dd)
'data.frame': 3 obs. of 4 variables:
$ DATE : chr "07/31/2009" "07/30/2009" "06/22/2009"
$ AUTHOR : chr "FlewByU" "cvillemdx" "Pleased"
$ TEXT : chr "I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We ju"| __truncated__ "After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I lov"| __truncated__ "I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT"| __truncated__
$ FAVORITE: chr "The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound sy"| __truncated__ "The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking sp"| __truncated__ "Navi is easy, hands-free is great, AWD is perfect."
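The same wrap-in-a-single-root trick works in other XML parsers too. For illustration, a rough Python equivalent using only the standard library (the sample string below is an abbreviated stand-in for the real file contents):

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the Opinosis file contents
txt = """<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago...</TEXT>
<FAVORITE>The separate controls...</FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research...</TEXT>
<FAVORITE>The self-adjusting side mirrors...</FAVORITE>
</DOC>"""

# The raw file has multiple top-level elements, so it is not well-formed
# XML on its own; wrapping it in a synthetic root element fixes that.
root = ET.fromstring("<root>" + txt + "</root>")

# One dict per <DOC>, keyed by child tag -- analogous to the data frame
rows = [{child.tag: child.text for child in doc} for doc in root.iter("DOC")]
```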

Information Extraction -> Relationships

"The movie was amazing. The background music was eccentric and the lighting was perfect"
Movie:Amazing
Background Music:Eccentric
Lighting:Perfect
How viable is this in C#?
I'm using the Stanford NLP library, but I don't have a clue how I'd do this.
You can use http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html, a .NET port of CoreNLP, though I'm not sure how up to date it is.
You are basically doing POS tagging: finding the adjective associated with each noun.
You can try your text here: http://nlp.stanford.edu:8080/corenlp/process
Get the POS tags from CoreNLP, find the nouns (NN) you're interested in, and pick out their corresponding adjectives (JJ).
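Once the tagger hands back (token, POS) pairs, the pairing step itself is simple. A toy Python sketch of one possible heuristic (the tagged input is hand-written here; in practice it would come from CoreNLP, and real sentences would need dependency parsing rather than adjacency):

```python
def noun_adj_pairs(tagged):
    # Naive heuristic: remember the most recent noun (NN*) and pair it
    # with the next adjective (JJ*), e.g. "movie was amazing"
    # -> ("movie", "amazing").
    pairs = []
    noun = None
    for token, pos in tagged:
        if pos.startswith("NN"):
            noun = token
        elif pos.startswith("JJ") and noun is not None:
            pairs.append((noun, token))
            noun = None
    return pairs

# Hand-tagged version of the example review
tagged = [
    ("The", "DT"), ("movie", "NN"), ("was", "VBD"), ("amazing", "JJ"),
    ("The", "DT"), ("background", "NN"), ("music", "NN"),
    ("was", "VBD"), ("eccentric", "JJ"), ("and", "CC"),
    ("the", "DT"), ("lighting", "NN"), ("was", "VBD"), ("perfect", "JJ"),
]
```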

Replicate Postgres pg_trgm text similarity scores in R?

Does anyone know how to replicate the PostgreSQL (pg_trgm) trigram similarity score from the similarity(text, text) function in R? I am using the stringdist package and would rather use R to calculate these on a matrix of text strings in a .csv file than run a bunch of PostgreSQL queries.
Running similarity(string1, string2) in Postgres gives me a score between 0 and 1.
I tried using the stringdist package to get a score, but I think I still need to normalize the result of the code below by something:
stringdist(string1, string2, method="qgram",q = 3 )
Is there a way to replicate the pg_trgm score with the stringdist package or another way to do this in R?
An example would be getting the similarity score between the description of a book and the description of a genre like science fiction. For example, suppose I have two book descriptions and want the similarity score of each against a genre description:
book 1 = "Area X has been cut off from the rest of the continent for decades. Nature has reclaimed the last vestiges of human civilization. The first expedition returned with reports of a pristine, Edenic landscape; the second expedition ended in mass suicide, the third expedition in a hail of gunfire as its members turned on one another. The members of the eleventh expedition returned as shadows of their former selves, and within weeks, all had died of cancer. In Annihilation, the first volume of Jeff VanderMeer's Southern Reach trilogy, we join the twelfth expedition.
The group is made up of four women: an anthropologist; a surveyor; a psychologist, the de facto leader; and our narrator, a biologist. Their mission is to map the terrain, record all observations of their surroundings and of one another, and, above all, avoid being contaminated by Area X itself.
They arrive expecting the unexpected, and Area X delivers—they discover a massive topographic anomaly and life forms that surpass understanding—but it’s the surprises that came across the border with them and the secrets the expedition members are keeping from one another that change everything."
book 2= "From Wall Street to Main Street, John Brooks, longtime contributor to the New Yorker, brings to life in vivid fashion twelve classic and timeless tales of corporate and financial life in America
What do the $350 million Ford Motor Company disaster known as the Edsel, the fast and incredible rise of Xerox, and the unbelievable scandals at GE and Texas Gulf Sulphur have in common? Each is an example of how an iconic company was defined by a particular moment of fame or notoriety; these notable and fascinating accounts are as relevant today to understanding the intricacies of corporate life as they were when the events happened.
Stories about Wall Street are infused with drama and adventure and reveal the machinations and volatile nature of the world of finance. John Brooks’s insightful reportage is so full of personality and critical detail that whether he is looking at the astounding market crash of 1962, the collapse of a well-known brokerage firm, or the bold attempt by American bankers to save the British pound, one gets the sense that history repeats itself.
Five additional stories on equally fascinating subjects round out this wonderful collection that will both entertain and inform readers . . . Business Adventures is truly financial journalism at its liveliest and best."
genre 1 = "Science fiction is a genre of fiction dealing with imaginative content such as futuristic settings, futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life. It often explores the potential consequences of scientific and other innovations, and has been called a "literature of ideas".[1] Authors commonly use science fiction as a framework to explore politics, identity, desire, morality, social structure, and other literary themes."
How can I get a similarity score for the description of each book against the description of the science fiction genre like pg_trgm using an R script?
How about something like this?
library(textcat)
?textcat_xdist
# Compute cross-distances between collections of n-gram profiles.
round(textcat_xdist(
  list(
    text1 = "hello there",
    text2 = "why hello there",
    text3 = "totally different"
  ),
  method = "cosine"
), 3)
#       text1 text2 text3
# text1 0.000 0.078 0.731
# text2 0.078 0.000 0.739
# text3 0.731 0.739 0.000
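Note that a cosine distance over n-gram profiles is not the same number pg_trgm produces. pg_trgm's score is set-based: shared trigrams divided by the number of distinct trigrams in either string, after lowercasing and padding each word. A rough pure-Python sketch of that behaviour, based on pg_trgm's documented padding rules (verify against your own Postgres install):

```python
import re

def trigrams(s):
    # pg_trgm lowercases, splits on non-alphanumeric characters, and pads
    # each word with two leading blanks and one trailing blank before
    # extracting 3-grams (per the pg_trgm docs).
    grams = set()
    for word in re.findall(r"[a-z0-9]+", s.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    # shared trigrams / distinct trigrams in the union (Jaccard)
    return len(ta & tb) / len(ta | tb)

# pg_trgm's own docs report ~0.363636 for this pair
print(round(similarity("word", "two words"), 6))
```

The same function applied to the book and genre descriptions above gives a pg_trgm-style score per pair, without any database round-trips.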

Text Line Count in R [closed]

I'm doing some basic text analysis in R and want to count the number of lines for a transcript in a .txt file that I load into R. With the example below, I want to produce a count in which each speaker's turn contributes its lines to the count, such that Mr. Smith = 4, Mr. Gordon = 6, Mr. Catalano = 3.
[71] "\"511\"\t\"MR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\""
[72] "\"513\"\t\"MR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\""
[73] "\"515\"\t\"MR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""
The function countLine() doesn't work since it requires a connection; these are just .txt files imported into R. I realize that the line count depends on the formatting of whatever the text is opened in, but any general help on whether this is feasible would be appreciated. Thanks.
I didn't think your example was reproducible, so I edited it to contain what you posted, but I do not know if the names will match:
txtvec <- structure(list(`'511' ` = "MR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\"",
`'513' ` = "MR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\"",
`'515' ` = "MR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""), .Names = c("'511'\t",
"'513'\t", "'515'\t"))
So it's only a matter of running a regex across it and tabulating the results:
> table( sapply(txtvec, function(x) sub("(^MR.+)\\:.+", "\\1", x) ) )
# MR Catalano   MR Gordon    MR Smith
#           1           1           1
There was concern expressed in the comments that the names were not in the original structure. This is another version with an unnamed vector and a slightly modified regex:
txtvec <- c("\"511\"\t\"\nMR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\"",
"\"513\"\t\"\nMR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\"",
"\"515\"\t\"\nMR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""
)
table( sapply(txtvec, function(x) sub(".+\\n(MR.+)\\:.+", "\\1", x) ) )
# MR Catalano   MR Gordon    MR Smith
#           1           1           1
To count the number of "lines" these would occupy on a device that wraps at 80 characters per line, you could use this code (which could easily be turned into a function):
sapply(txtvec, function(tt) 1+nchar(tt) %/% 80)
#[1] 5 8 4
This is raised in the comments, but it really bears being its own answer:
You cannot "count lines" without defining what a "line" is. A line is a very vague concept and can vary by the program being used.
Unless, of course, the data contains some indicator of a line break, such as \n. But even then, you would not be counting lines, you would be counting line breaks. You would then have to ask yourself whether the hard-coded line breaks are in accord with what you are hoping to analyze.
--
If your data does not contain linebreaks, but you still want to count the number of lines, then we're back to the question of "how do you define a line"? The most basic way, is as #flodel suggests, which is to use character length. For example, you can define a line as 76 characters long, and then take
ceiling(nchar(X) / 76)
This of course assumes that you can cut words. (If you need words to remain whole, then you have to get craftier)
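Both variants of that definition are easy to sketch; here they are in Python for illustration: a hard wrap that cuts words (the ceiling formula above), and a word-preserving wrap via the standard library:

```python
import math
import textwrap

def line_count_hard(text, width=76):
    # ceiling(nchar / width): every line is exactly `width` characters,
    # so words may be cut mid-way
    return max(1, math.ceil(len(text) / width))

def line_count_soft(text, width=76):
    # word-preserving wrap: typically yields a few more lines than the
    # hard count, since lines break at word boundaries
    return max(1, len(textwrap.wrap(text, width=width)))
```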
