I'm trying to load a dataset into RStudio. The dataset is space-delimited, but it also contains spaces inside quoted text, as in CSV files. Here is the head of the data:
DOC_ID LABEL RATING VERIFIED_PURCHASE PRODUCT_CATEGORY PRODUCT_ID PRODUCT_TITLE REVIEW_TITLE REVIEW_TEXT
1 __label1__ 4 N PC B00008NG7N "Targus PAUK10U Ultra Mini USB Keypad, Black" useful "When least you think so, this product will save the day. Just keep it around just in case you need it for something."
2 __label1__ 4 Y Wireless B00LH0Y3NM Note 3 Battery : Stalion Strength Replacement 3200mAh Li-Ion Battery for Samsung Galaxy Note 3 [24-Month Warranty] with NFC Chip + Google Wallet Capable New era for batteries Lithium batteries are something new introduced in the market there average developing cost is relatively high but Stallion doesn't compromise on quality and provides us with the best at a low cost.<br />There are so many in built technical assistants that act like a sensor in their particular forté. The battery keeps my phone charged up and it works at every voltage and a high voltage is never risked.
3 __label1__ 3 N Baby B000I5UZ1Q "Fisher-Price Papasan Cradle Swing, Starlight" doesn't swing very well. "I purchased this swing for my baby. She is 6 months now and has pretty much out grown it. It is very loud and doesn't swing very well. It is beautiful though. I love the colors and it has a lot of settings, but I don't think it was worth the money."
4 __label1__ 4 N Office Products B003822IRA Casio MS-80B Standard Function Desktop Calculator Great computing! I was looking for an inexpensive desk calcolatur and here it is. It works and does everything I need. Only issue is that it tilts slightly to one side so when I hit any keys it rocks a little bit. Not a big deal.
5 __label1__ 4 N Beauty B00PWSAXAM Shine Whitening - Zero Peroxide Teeth Whitening System - No Sensitivity Only use twice a week "I only use it twice a week and the results are great. I have used other teeth whitening solutions and most of them, for the same results I would have to use it at least three times a week. Will keep using this because of the potency of the solution and also the technique of the trays, it keeps everything in my teeth, in my mouth."
6 __label1__ 3 N Health & Personal Care B00686HNUK Tobacco Pipe Stand - Fold-away Portable - Light Weight - For Single Pipe not sure I'm not sure what this is supposed to be but I would recommend that you do a little more research into the culture of using pipes if you plan on giving this as a gift or using it yourself.
7 __label1__ 4 N Toys B00NUG865W ESPN 2-Piece Table Tennis PING PONG TABLE GREAT FOR YOUTHS AND FAMILY "Pleased with ping pong table. 11 year old and 13 year old having a blast, plus lots of family entertainment too. Plus better than kids sitting on video games all day. A friend put it together. I do believe that was a challenge, but nothing they could not handle"
8 __label1__ 4 Y Beauty B00QUL8VX6 "Abundant Health 25% Vitamin C Serum with Vitamin E and Hyaluronic Acid for Youthful Looking Skin, 1 fl. oz." Great vitamin C serum "Great vitamin C serum... I really like the oil feeling, not too sticky. I used it last week on some of my recent bug bites and it helps heal the skin faster than normal."
9 __label1__ 4 N Health & Personal Care B004YHKVCM PODS Spring Meadow HE Turbo Laundry Detergent Pacs 77-load Tub wonderful detergent. "I've used tide pods laundry detergent for many years,its such a great detergent to use having a nice scent and leaver the cloths smelling fresh."
The problem is that the file looks tab-delimited but is not. For example, for DOC_ID = 1 there are only two spaces between useful and "When least...". Passing sep = "\t" to read.table therefore throws an error saying that line 1 did not have 10 elements, which is odd, because the number of elements should be 9. Here are the parameters that I'm passing (without the original path):
read.table(file = "path", sep ="\t", header = TRUE, strip.white = TRUE)
Relying on quotes is not a good strategy either, because some lines do not have their text quoted. The delimiter should therefore be something like a double space, which combined with strip.white should work properly, but read.table only accepts single-byte delimiters.
So the question is: how would you parse such a corpus in R, or with any third-party software that could convert it adequately to a CSV or at least a tab-delimited file?
Update: parsing the data with Python's pandas.read_csv(filename, sep='\t', header = 0, ...) appears to have worked, and from this point anything can be done with it. Closing this out.
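That pandas succeeded with sep='\t' suggests the file really is tab-delimited and that the tabs were flattened when the sample was pasted. A minimal sketch of the same idea using only Python's standard csv module follows; the two sample rows are abbreviated from the data above with tabs restored by hand, which is an assumption about the real file. Only the double quote is treated as a quote character, so apostrophes in unquoted fields are left alone. On the R side, read.table's default quote = "\"'" also treats apostrophes as quotes, so passing quote = "\"" may be worth trying.

```python
import csv
import io

# Abbreviated sample; tabs between fields are an assumption about the real file.
sample = (
    "DOC_ID\tLABEL\tRATING\tVERIFIED_PURCHASE\tPRODUCT_CATEGORY\t"
    "PRODUCT_ID\tPRODUCT_TITLE\tREVIEW_TITLE\tREVIEW_TEXT\n"
    '1\t__label1__\t4\tN\tPC\tB00008NG7N\t'
    '"Targus PAUK10U Ultra Mini USB Keypad, Black"\tuseful\t'
    '"When least you think so, this product will save the day."\n'
    "6\t__label1__\t3\tN\tHealth & Personal Care\tB00686HNUK\t"
    "Tobacco Pipe Stand - Fold-away Portable\tnot sure\t"
    "I'm not sure what this is supposed to be.\n"
)

def parse_tsv(text):
    """Parse tab-delimited text; only double quotes act as quote characters."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar='"')
    header = next(reader)
    return header, [dict(zip(header, row)) for row in reader]

header, rows = parse_tsv(sample)
print(len(header), [len(r) for r in rows])
print(rows[1]["REVIEW_TEXT"])
```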
When read.table encounters an emoji in text data, it breaks the row prematurely several times, then continues on a new line with the rest of the data from the row it interrupted.
I tried various permutations of parameters with read.table and read.delim, for example:
myData <- read.table("myData.tsv", sep = '\t', encoding = "UTF-16", skipNul = TRUE, fill = TRUE, header = TRUE, skip = 3, quote = "", stringsAsFactors = FALSE)
Replicated using this dataset:
StartDate Q15.5 Q16.5 gc response_order
Start Date Which of these statements best reflect how you feel about [Brand]? [Brand] is _____. "In your own words, why do you feel that [Brand] is [QID32-ChoiceGroup-SelectedChoices]?" gc response_order
"{""ImportId"":""startDate"",""timeZone"":""America/Denver""}" "{""ImportId"":""QID32""}" "{""ImportId"":""QID33_TEXT""}" "{""ImportId"":""gc""}" "{""ImportId"":""response_order""}"
4/4/2019 9:39 Holding its ground i dont really hear much about it but i would assume its holding its ground 1 reversed
4/4/2019 9:37 Probably on its way up 👨🏾🌾😛🤯👨🏾🌾🤯😄🤯😄🤯 1 reversed
4/4/2019 9:29 Probably on its way up Growing company 1 normal
4/4/2019 9:37 Holding its ground "It is mostly geared towards the younger generation, which is good because it calls to new customers. On the other hand, the older generations are moving on to business that more geared towards us." 1 normal
4/4/2019 9:17 Probably on its way up Its well used and good 1 reversed
4/4/2019 9:41 Probably on its way up Its going good 1 normal
4/4/2019 9:38 Definitely on its way up reasons 1 normal
4/4/2019 9:38 Holding its ground It's beginning to look less like a fly by night outfit and more like a responsible company 1 normal
4/4/2019 9:38 Holding its ground "I feel that the company, while providing a useful service, is not constantly working to innovate and continue building upon the product to match the needs of the customer." 1 reversed
4/4/2019 9:37 Definitely on its way up They are a trustworthy company that constantly stays in tune with the technology of today 1 normal
4/4/2019 9:48 Holding its ground I still hear about it 1 normal
Resulting in:
"X....ImportId.....startDate.....timeZone.....America.Denver....","X....ImportId.....QID32....","X....ImportId.....QID33_TEXT....","X....ImportId.....gc....","X....ImportId.....response_order...."
"4/4/2019 9:39","Holding its ground","i dont really hear much about it but i would assume its holding its ground ",1,"reversed"
"4/4/2019 9:37","Probably on its way up","=ØhÜ<Øþß",NA,""
" <Ø>ß=ØÞ>Ø/Ý=ØhÜ<Øþß","","",NA,""
" <Ø>ß>Ø/Ý=ØÞ>Ø/Ý=ØÞ>Ø/Ý","1","reversed",NA,""
"4/4/2019 9:29","Probably on its way up","Growing company",1,"normal"
"4/4/2019 9:37","Holding its ground","""It is mostly geared towards the younger generation, which is good because it calls to new customers. On the other hand, the older generations are moving on to business that more geared towards us.""",1,"normal"
"4/4/2019 9:17","Probably on its way up","Its well used and good",1,"reversed"
"4/4/2019 9:41","Probably on its way up","Its going good",1,"normal"
"4/4/2019 9:38","Definitely on its way up","reasons",1,"normal"
"4/4/2019 9:38","Holding its ground","It's beginning to look less like a fly by night outfit and more like a responsible company",1,"normal"
"4/4/2019 9:38","Holding its ground","""I feel that the company, while providing a useful service, is not constantly working to innovate and continue building upon the product to match the needs of the customer.""",1,"reversed"
"4/4/2019 9:37","Definitely on its way up","They are a trustworthy company that constantly stays in tune with the technology of today",1,"normal"
"4/4/2019 9:48","Holding its ground","I still hear about it ",1,"normal"
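The garbled cells are consistent with UTF-16LE bytes being read one byte at a time: the man emoji U+1F468 is stored as the bytes 3D D8 68 DC, which display as =ØhÜ in Latin-1. The farmer emoji is a sequence joined by a zero-width joiner (U+200D), whose UTF-16LE bytes are 0D 20, that is, a carriage return followed by a space, which would explain the premature line breaks and the leading space on the continuation rows. In read.table the encoding argument only marks strings and does not convert the input, so fileEncoding = "UTF-16LE" may be what is needed; that is an untested assumption about this file. The sketch below reproduces the effect in Python on a made-up two-line sample and shows that decoding the bytes as UTF-16 before splitting keeps the row intact.

```python
import csv
import io

# man + skin tone + zero-width joiner + sheaf of rice (the "farmer" emoji sequence)
zwj_farmer = "\U0001F468\U0001F3FE\u200D\U0001F33E"
sample = (
    "StartDate\tQ16.5\tgc\n"
    "4/4/2019 9:37\t" + zwj_farmer + "\t1\n"
)
raw = b"\xff\xfe" + sample.encode("utf-16-le")  # BOM + UTF-16LE bytes

# Misreading: treat each byte as a Latin-1 character and drop NULs (like skipNul).
misread = raw.decode("latin-1").replace("\x00", "")
print(len(misread.splitlines()))  # more lines than the 2 that were written

# Decoding as UTF-16 first keeps every row on one line.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline="")
rows = list(csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE))
print(len(rows), rows[1][1] == zwj_farmer)
```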
I have recently become interested in image classification with machine learning.
I am in no way a programmer, but I am a farmer who is very interested in it. Detecting the quality of fruits and vegetables is a very tedious and time-consuming task, especially if you don't have the money to buy industrial machinery to perform it at a small to medium scale.
I recently came across this tutorial (I had to fix a lot of errors because it is really badly written, but it works):
https://imaginghub.com/projects/148-how-to-distinguish-apples-and-pears-with-raspberry-pi/documentation
It is basically the building block of a future fruit/vegetable quality grader.
The grader's conveyor belt is going to have an Arduino that receives an output from the Python program; that output should activate servos to redirect each fruit/vegetable to its own basket.
Now I would like to know how I can get the label output from the net and transform it into a number, for example:
apple = 1, orange = 2, cucumber = 3...
So whenever it's an apple, the Arduino receives a 1 that will light an LED (first this, then a servo), and likewise for orange, cucumber and so on.
Here are the two files that I believe have something to do with the box label output: Deploy.py and yolo_net.py.
Deploy.py is the one I run to get the live camera detector:
Deploy.ipynb
And this is yolo_net.py (if it helps somehow to answer my question):
yolo_net.py
All you have to do is modify the yolo_net.py file: go to def draw_detections(self, image, boxes):, around line 63,
where it should say
cv2.putText (image, class_name, (left, top - 12), 0, 1e-3 * h, self.colors[class_indx], thick//2)
It is under that line that you should add whatever you want the Arduino to perform.
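As a rough sketch of what could go under that line: map each class name to a number and write it to the Arduino. The CLASS_CODES table and the send_label helper are hypothetical and not part of the tutorial's code; the helper only assumes an object with a write method, which a pyserial serial.Serial connection provides.

```python
import io

# Hypothetical mapping from the network's class names to the codes the Arduino expects.
CLASS_CODES = {"apple": 1, "orange": 2, "cucumber": 3}

def send_label(class_name, port):
    """Write the numeric code for class_name to port as one ASCII line.

    Returns the code that was sent, or None if the label is unknown.
    """
    code = CLASS_CODES.get(class_name)
    if code is None:
        return None  # unknown label: send nothing
    port.write(f"{code}\n".encode("ascii"))
    return code

# Stand-in for the serial port so the sketch runs without hardware.
fake_port = io.BytesIO()
send_label("apple", fake_port)
send_label("banana", fake_port)   # not in the table, ignored
send_label("cucumber", fake_port)
print(fake_port.getvalue())
```

In draw_detections this would be called as send_label(class_name, arduino) right after the cv2.putText line, with arduino opened once at start-up, for example with serial.Serial("/dev/ttyACM0", 9600) from pyserial (the port name and baud rate are placeholders that depend on your setup).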
Can anyone please help me import data with angle brackets into R from a Unix executable file? It looks like XML, so I tried to use an XML parser, but it failed.
I have attached a sample file.
Thanks in advance.
https://drive.google.com/file/d/0B97ow4h4jwHcRTVtWHdudDJ0c1k/view?usp=sharing
There are unescaped '&' characters inside elements in your XML document.
One example is below:
<DOC>
<DATE>01/07/2009</DATE>
<AUTHOR>Debce</AUTHOR>
<TEXT>I have owned my MDX for about 1 1/2 yrs & have loved every minute of driving the 24k problem free miles on it! It is so much fun to drive; looks & feels luxurious so no problem pulling up to upscale places! I didn't want to give up space to pop things in the back and go so I keep the third seat down & purchased the rubber mat for the back. I have plenty of room while at the same time I am "zippy"; easily pulling into parking spaces and getting around town. I love the navigation system, although it does need updating and the bluetooth is wonderful, although for some reason it keeps unhooking my Treo phone which the Acura people say is the phone's fault. LOVE IT & would buy it again.</TEXT>
<FAVORITE>Large storage area, hands free phone with the bluetooth & voice recognition is safe. The heaviness of it feels safe and large interior is very comfortable. </FAVORITE>
</DOC>
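'&' characters should be escaped as '&amp;'.
In XML element content, the characters
'<'
'&'
must always be escaped (as '&lt;' and '&amp;'), and '>' is conventionally escaped as '&gt;' too. '%' is only special inside a DTD and needs no escaping in element text.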
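A minimal sketch of that escaping step, in Python rather than R: a regular expression replaces every '&' that does not already start an entity with '&amp;', after which a strict XML parser accepts the text. The escape_bare_ampersands helper is a made-up name, and the sample is shortened from the document above.

```python
import re
import xml.etree.ElementTree as ET

# '&' not followed by a named or numeric entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Replace unescaped '&' with '&amp;' and leave existing entities alone."""
    return BARE_AMP.sub("&amp;", text)

raw = "<DOC><AUTHOR>Debce</AUTHOR><TEXT>looks & feels luxurious &amp; safe</TEXT></DOC>"

try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False  # the bare '&' makes the input ill-formed

doc = ET.fromstring(escape_bare_ampersands(raw))
print(parsed_raw, doc.findtext("TEXT"))
```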
Here is a way of extracting the data into a character matrix.
> require(XML)
> x <- htmlParse("/temp/2007_acura_mdx")
>
> # get the 'DOC'
> docs <- getNodeSet(x, "//doc")
>
> # display one
> docs[[1]]
<doc>
<date>07/31/2009</date>
<author>FlewByU</author>
<text>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </text>
<favorite>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </favorite>
</doc>
>
> # process docs getting all fields -- need to transpose
> results <- t(sapply(docs, function(x) xmlSApply(x, xmlValue)))
>
> # show head
> head(results)
date author
[1,] "07/31/2009" "FlewByU"
[2,] "07/30/2009" "cvillemdx"
[3,] "06/22/2009" "Pleased"
[4,] "04/13/2009" "wasatch7"
[5,] "04/06/2009" "mnozek"
[6,] "01/07/2009" "Debce"
text
[1,] "I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. "
[2,] "After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)"
[3,] "I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill."
[4,] "First luxury crossover SUV I have owned. MDX won out over the Lexus, and cost less for a very well equipped base package. Handling, power and ride are outstanding. Back seats are a little less comfortable for my tall teenagers. Back cargo area is very roomy, and easily expandable with 3rd seat folded and back seats down. I drive up snowy, often treacherous mountain canyons to ski in the winter. The SH-AWD system, coupled with the manual shift mode (for descents), is outstanding. The MDX is much better in the snow than 3 truck base SUVs, I have owned previously. "
[5,] "This is the first Japanese SUV we have had in a while. Last SUV's were Yukon XL and Envoy XL. This beats them out by far. Performs almost as well as our Mercedes e class but has the utility of our Envoy. We always take this on trips and it is very comfortable. The third row is great for smaller children but not so much for adults. Best SUV so far. No problems within our almost 2 years ownership."
[6,] "I have owned my MDX for about 1 1/2 yrs & have loved every minute of driving the 24k problem free miles on it! It is so much fun to drive; looks & feels luxurious so no problem pulling up to upscale places! I didn't want to give up space to pop things in the back and go so I keep the third seat down & purchased the rubber mat for the back. I have plenty of room while at the same time I am \"zippy\"; easily pulling into parking spaces and getting around town. I love the navigation system, although it does need updating and the bluetooth is wonderful, although for some reason it keeps unhooking my Treo phone which the Acura people say is the phone's fault. LOVE IT & would buy it again."
favorite
[1,] "The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. "
[2,] "The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone."
[3,] "Navi is easy, hands-free is great, AWD is perfect."
[4,] "AWD system, exterior styling, cargo room"
[5,] "Navigation, sound system, bluetooth, comfort, acceleration, performance, all wheel drive ability."
[6,] "Large storage area, hands free phone with the bluetooth & voice recognition is safe. The heaviness of it feels safe and large interior is very comfortable. "
I have a data file from http://kavita-ganesan.com/opinosis-opinion-dataset whose contents use angle brackets:
<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>
It looks like an XML file, but when I tried this:
xml.url <- "2007_acura_mdx"
xmlfile <- xmlTreeParse(xml.url)
class(xmlfile)
xmltop <- xmlRoot(xmlfile)
topxml <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml), row.names=NULL)
I ran into a problem when I executed data.frame. Can anyone help me? For now I am considering grep() and gsub(), but that is not easy either.
Try this:
txt <- "<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>"
library(XML)
txt2 <- paste("<root>", txt, "</root>")
doc <- xmlTreeParse(txt2, asText = TRUE, useInternalNodes = TRUE)
L <- xpathApply(doc, "//DOC", xmlApply, FUN = xmlValue)
dd <- do.call(rbind, lapply(L, as.data.frame, stringsAsFactors = FALSE))
giving:
> str(dd)
'data.frame': 3 obs. of 4 variables:
$ DATE : chr "07/31/2009" "07/30/2009" "06/22/2009"
$ AUTHOR : chr "FlewByU" "cvillemdx" "Pleased"
$ TEXT : chr "I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We ju"| __truncated__ "After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I lov"| __truncated__ "I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT"| __truncated__
$ FAVORITE: chr "The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound sy"| __truncated__ "The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking sp"| __truncated__ "Navi is easy, hands-free is great, AWD is perfect."
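The same idea (wrap the fragments in a single root element, then turn each DOC into one record) can be sketched in Python with the standard library. This is an illustration rather than a translation of the R code, and the sample is shortened.

```python
import xml.etree.ElementTree as ET

txt = """<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago.</TEXT>
<FAVORITE>Sound system is amazing.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>Zero problems...just gas and oil changes.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>"""

# XML allows only one top-level element, so wrap the fragments in <root>.
root = ET.fromstring("<root>" + txt + "</root>")

# One dict per <DOC>, keyed by child tag name.
records = [{child.tag: child.text for child in doc} for doc in root.iter("DOC")]

for rec in records:
    print(rec["DATE"], rec["AUTHOR"])
```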
Does anyone know how to replicate the PostgreSQL pg_trgm trigram similarity score from the similarity(text, text) function in R? I am using the stringdist package and would rather use R to calculate these scores on a matrix of text strings in a .csv file than run a bunch of PostgreSQL queries.
Running similarity(string1, string2) in Postgres gives me a score between 0 and 1.
I tried using the stringdist package to get a score, but I think I still need to divide the result of the code below by something.
stringdist(string1, string2, method="qgram",q = 3 )
Is there a way to replicate the pg_trgm score with the stringdist package or another way to do this in R?
An example would be getting the similarity score between the description of a book and the description of a genre such as science fiction. For example, suppose I have these two book descriptions and one genre description:
book 1 = "Area X has been cut off from the rest of the continent for decades. Nature has reclaimed the last vestiges of human civilization. The first expedition returned with reports of a pristine, Edenic landscape; the second expedition ended in mass suicide, the third expedition in a hail of gunfire as its members turned on one another. The members of the eleventh expedition returned as shadows of their former selves, and within weeks, all had died of cancer. In Annihilation, the first volume of Jeff VanderMeer's Southern Reach trilogy, we join the twelfth expedition.
The group is made up of four women: an anthropologist; a surveyor; a psychologist, the de facto leader; and our narrator, a biologist. Their mission is to map the terrain, record all observations of their surroundings and of one another, and, above all, avoid being contaminated by Area X itself.
They arrive expecting the unexpected, and Area X delivers—they discover a massive topographic anomaly and life forms that surpass understanding—but it’s the surprises that came across the border with them and the secrets the expedition members are keeping from one another that change everything."
book 2= "From Wall Street to Main Street, John Brooks, longtime contributor to the New Yorker, brings to life in vivid fashion twelve classic and timeless tales of corporate and financial life in America
What do the $350 million Ford Motor Company disaster known as the Edsel, the fast and incredible rise of Xerox, and the unbelievable scandals at GE and Texas Gulf Sulphur have in common? Each is an example of how an iconic company was defined by a particular moment of fame or notoriety; these notable and fascinating accounts are as relevant today to understanding the intricacies of corporate life as they were when the events happened.
Stories about Wall Street are infused with drama and adventure and reveal the machinations and volatile nature of the world of finance. John Brooks’s insightful reportage is so full of personality and critical detail that whether he is looking at the astounding market crash of 1962, the collapse of a well-known brokerage firm, or the bold attempt by American bankers to save the British pound, one gets the sense that history repeats itself.
Five additional stories on equally fascinating subjects round out this wonderful collection that will both entertain and inform readers . . . Business Adventures is truly financial journalism at its liveliest and best."
genre 1 = "Science fiction is a genre of fiction dealing with imaginative content such as futuristic settings, futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life. It often explores the potential consequences of scientific and other innovations, and has been called a "literature of ideas".[1] Authors commonly use science fiction as a framework to explore politics, identity, desire, morality, social structure, and other literary themes."
How can I get a similarity score for the description of each book against the description of the science fiction genre like pg_trgm using an R script?
How about something like this?
library(textcat)
?textcat_xdist
# Compute cross-distances between collections of n-gram profiles.
round(textcat_xdist(
list(
text1="hello there",
text2="why hello there",
text3="totally different"
),
method="cosine"),
3)
# text1 text2 text3
#text1 0.000 0.078 0.731
#text2 0.078 0.000 0.739
#text3 0.731 0.739 0.000
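If the goal is a number comparable to pg_trgm's similarity(), another option is to compute it directly. As the pg_trgm documentation describes it, each string is lowercased and split into alphanumeric words, each word is padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams in either string. The sketch below follows that description in Python; the function names are made up here, and handling of non-ASCII characters may differ from what a given PostgreSQL build does.

```python
import re

def trigrams(text):
    """Set of trigrams as pg_trgm describes them: words padded with '  ' and ' '."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trgm_similarity(a, b):
    """Shared trigrams divided by distinct trigrams in either string (0 to 1)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(sorted(trigrams("cat")))
print(round(trgm_similarity("word", "two words"), 4))
```

Within stringdist, 1 - stringdist(a, b, method = "jaccard", q = 3) is probably the closest built-in analogue, though it does not pad words the way pg_trgm does, so the scores should not be expected to match exactly.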