Most efficient way to calculate nutrients - math

I am creating a macronutrient calculator for a project, but I'm not sure of the best way to do the math. The goal is to get a combination of foods (measured in grams) to add up to 190 grams of protein, 300 grams of carbs, and 50 grams of fat.
For example, I can use Chicken Breast, which contains .3057 g of protein and .035 g of fat per gram. Ground Turkey is also an option, with .1851 g of protein and .0705 g of fat per gram. Brown rice contains .229 g of carbs, .025 g of protein, and .009 g of fat per gram. Oats contain .69 g of carbs, .065 g of fat, and .13 g of protein per gram. Natural peanut butter has .5 g of fat, .25 g of protein, and .22 g of carbs per gram.
I'm trying to figure out how to hit the specified nutrient goals without making the program really slow. I'm kind of at a loss right now. Just an idea or concept of how to do this would help me greatly; I don't expect someone else to write this for me. Would this be a better question to ask on Math.StackExchange?
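This is essentially a small linear-programming problem, so no search over food combinations is needed: each food is a variable (grams used), each macro target is a linear constraint, and a solver finds the amounts directly. A minimal sketch of that idea, assuming the lpSolve package (an illustration, not code from the post):
library(lpSolve)
# Per-gram nutrient content; columns are chicken breast, ground turkey,
# brown rice, oats, natural peanut butter
nutrients <- matrix(c(
  0.3057, 0.1851, 0.025, 0.13,  0.25,   # protein
  0.000,  0.000,  0.229, 0.69,  0.22,   # carbs
  0.035,  0.0705, 0.009, 0.065, 0.50    # fat
), nrow = 3, byrow = TRUE)
targets <- c(190, 300, 50)   # protein, carbs, fat in grams
# Minimise total grams of food while hitting each target exactly
sol <- lp(direction = "min",
          objective.in = rep(1, ncol(nutrients)),
          const.mat = nutrients,
          const.dir = rep("=", length(targets)),
          const.rhs = targets)
sol$status           # 0 means a feasible combination was found
round(sol$solution)  # grams of each food, in the column order above
If exact equality turns out to be infeasible for a given set of foods, relaxing const.dir to ">=" or allowing a small tolerance keeps the problem solvable; either way the solver finishes in milliseconds, so speed should not be an issue.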

Related

Simulate Flipping French Fries in R

I always had this question since I was a kid:
Suppose you place 100 french fries onto a pan over a stove.
For this problem, let's assume that each french fry can only have 2 "states": "face up" or "face down".
Each french fry needs to be cooked for 1 minute on each side; if a french fry is cooked for more than 1 minute on any side, it is considered burnt.
You place the french fries on the pan, and after one minute you shake the pan. Some of the french fries get flipped in the air and land on the pan either "face up" or "face down", but some of the french fries never get flipped at all.
After another minute has passed, you shake the pan again.
For the sake of this question, let's assume that each time you shake the pan, each individual french fry has a 50% chance of getting flipped in the air, and the french fries that were flipped in the air have a 50% chance of landing "face down" or "face up".
Here is the question:
After 2 minutes, how many of the 100 french fries are perfectly cooked and how many french fries are burnt?
How many minutes need to pass until all french fries are guaranteed to have been cooked on both sides (even though many of them will be burnt)?
Using R, I tried to write a simulation for this scenario:
original_data = data.frame(id = 1:100, state = "start")
number_fries_selected_in_first_flip = sample(1:100, 1, replace=F)
fries_selected_in_first_flip = sample(1:100, number_fries_selected_in_first_flip, replace=F)
This is where I got stuck - if I could somehow "tag" the french fries that were selected, I could assign a "burnt/perfectly cooked" status to these french fries with 50% probability:
status <- c("perfectly cooked","burnt")
original_data$tagged_fries_status <- sample(status, number_fries_selected_in_first_flip, replace=TRUE, prob=c(0.5, 0.5))
If I could finish the simulation, I could extend the simulation for the second flip, third flip, etc. At the end of the simulation (e.g. after 5 flips), I could make a chart of the number of french fries that are burnt vs perfectly cooked. Then, I could repeat the simulation many times (e.g. 1000 times), and find out the average number of fries burnt/perfectly cooked.
Could someone please show me how to write this simulation?
Thank you!
Make a dataframe with 100 fries and the cumulative time spent on the first side. Since the time on the second side is just the total time minus the time on the first side, you can apply any test you want to the final "time state".
dfrm <- data.frame(id = 1:100, time = 0)
set.seed(123) # good practice in simulation
# start: seconds spent on the first side during the first minute
dfrm$time <- dfrm$time + sample(c(0, 60), 100, replace = TRUE)
# next minute: accumulate more first-side time for the fries that end up on that side again
dfrm$time <- dfrm$time + sample(c(0, 60), 100, replace = TRUE)
# fries with exactly 60 seconds per side are the perfectly cooked ones
sum(dfrm$time == 60)
You would have an exponentially decaying proportion of fries with time equal to either zero or the total elapsed time; that group is the fries which are "not done". Since the simulation is discrete, there is a time after which the probability of any fry remaining undone is less than any specified amount, but a guaranteed time is harder to establish.
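Here is a sketch (an extension, not part of the answer above) that repeats the two-minute experiment many times using the flip mechanics as stated in the question: at each shake a fry ends up on the other side with probability 0.5 * 0.5 = 0.25.
set.seed(123)
one_run <- function(n_fries = 100) {
  time_side_a <- rep(60, n_fries)                      # minute 1: every fry cooks its starting side
  changed <- runif(n_fries) < 0.25                     # shake: flipped AND landed on the other side
  time_side_a <- time_side_a + ifelse(changed, 0, 60)  # minute 2 adds to side A only if not flipped
  sum(time_side_a == 60)                               # exactly 60 s per side = perfectly cooked
}
cooked <- replicate(1000, one_run())
mean(cooked)   # averages near 100 * 0.25 = 25 perfectly cooked fries after 2 minutes
The remaining fries (about 75 on average) have spent both minutes on the same side, so after 2 minutes they are burnt on one side and uncooked on the other.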

Convert results into a dataframe from function

From these results:
library(stm)
labelTopics(gadarianFit, n = 15)
Topic 1 Top Words:
Highest Prob: immigr, illeg, legal, border, will, need, worri, work, countri, mexico, life, better, nation, make, worker
FREX: border, mexico, mexican, need, concern, fine, make, better, worri, nation, deport, worker, will, econom, poor
Lift: cross, racism, happen, other, continu, concern, deport, mexican, build, fine, econom, border, often, societi, amount
Score: immigr, border, need, will, mexico, illeg, mexican, worri, concern, legal, nation, fine, worker, better, also
Topic 2 Top Words:
Highest Prob: job, illeg, tax, pay, american, take, care, welfar, crime, system, secur, social, health, cost, servic
FREX: cost, health, servic, welfar, increas, loss, school, healthcar, job, care, medic, crime, social, violenc, educ
Lift: violenc, expens, opportun, cost, healthcar, loss, increas, gang, servic, medic, health, diseas, terror, school, lose
Score: job, welfar, crime, cost, tax, care, servic, increas, health, pay, school, loss, medic, healthcar, social
Topic 3 Top Words:
Highest Prob: peopl, come, countri, think, get, english, mani, live, citizen, learn, way, becom, speak, work, money
FREX: english, get, come, mani, back, becom, like, think, new, send, right, way, just, live, peopl
Lift: anyth, send, still, just, receiv, deserv, back, new, english, mani, get, busi, year, equal, come
Score: think, peopl, come, get, english, countri, mani, speak, way, send, back, money, becom, learn, live
How is it possible to keep the results from Highest Prob in a dataframe with the number of columns equal to the number of topics and the number of rows equal to the number of words per topic (n = 15)?
Example of expected output:
topic1 topic2 topic3
immigr job peopl
illeg illeg come
In the labelTopics object, words are stored under prob. So you could try something like this:
library(stm)
topics <- labelTopics(gadarianFit, n=15)
topics <- data.frame(t(topics$prob))
colnames(topics) <- paste0("topic", 1:ncol(topics))
topics
#> topic1 topic2 topic3
#> 1 immigr job peopl
#> 2 illeg illeg come
#> 3 legal tax countri
#> 4 border pay think
#> 5 will american get
#> 6 need take english
#> 7 worri care mani
#> 8 work welfar live
#> 9 countri crime citizen
#> 10 mexico system learn
#> 11 life secur way
#> 12 better social becom
#> 13 nation health speak
#> 14 make cost work
#> 15 worker servic money
Note that stm offers several ways of selecting the most important words per topic, including FREX and Lift. You would simply have to change prob in my code to use those.
Type this to see them:
topics <- labelTopics(gadarianFit, n=15)
str(topics)
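For example, the same reshaping applied to the FREX words (assuming, as str(topics) should confirm, that they are stored under frex):
# Same pattern as above, just swapping the component name
frex_df <- data.frame(t(topics$frex))
colnames(frex_df) <- paste0("topic", 1:ncol(frex_df))
frex_df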

How to have all elements of a text file processed in R

I have a single text file, NPFile, that contains 100 different newspaper articles and is 3523 lines long. I am trying to pick out and parse different data fields for each article for text processing. These fields are: Full text:, Publication date:, Publication title:, etc.
I am using grep to pick out the different lines that contain the data fields I want. Although I can get the line numbers (the start and end positions of the fields), I get a warning when I try to use those line numbers to extract the actual text and put it into a vector:
#Find full text of article, clean and store in a variable
findft<-grep ('Full text:', NPFile, ignore.case=TRUE)
endft<-grep ('Publication date:', NPFile)
ftfield<-(NPFile[findft:endft])
The last line, ftfield <- (NPFile[findft:endft]), is giving this warning message:
1: In findft:endft :
numerical expression has 100 elements: only the first used
The starting points findft and ending points endft each contain 100 elements, but as the warning indicates, ftfield only contains the first element (which is 11 lines long). I was assuming (wrongly) that the respective lines for each of the 100 instances of the full-text field would be extracted and stored in ftfield, but obviously I have not coded this correctly. Any help would be appreciated.
Example of Data (These are the fields and data associated with one of the 100 in the text file):
Waiting for the 500-year flood; Red River rampage: Severe weather events, new records are more frequent than expected.
Full text: AS THE RED River raged over makeshift dikes futilely erected against its wrath in North Dakota, drowning cities beneath a column of water 26 feet above flood level, meteorologists were hard pressed to describe its magnitude in human chronology.
A 500-year flood, some call it, a catastrophic weather event that would have occurred only once since Christopher Columbus arrived on the shores of the New World. Whether it could be termed a 700-year flood or a 300-year flood is open to question.
The flood's size and power are unprecedented. While the Red River has ravaged the upper Midwest before, the height of the flood crest in Fargo and Grand Forks has been almost incomprehensible.
But climatological records are being broken more rapidly than ever. A 100-year-storm may as likely repeat within a few years as waiting another century. It is simply a way of classifying severity, not the frequency. "There isn't really a hundred-year event anymore," states climatologist Tom Karl of the National Oceanic and Atmospheric Administration.
Reliable, consistent weather records in the U.S. go back only 150 years or so. Human development has altered the Earth's surface and atmosphere, promoting greater weather changes and effects than an untouched environment would generate by itself.
What might be a 500-year event in the Chesapeake Bay is uncertain. Last year was the record for freshwater gushing into the bay. The January 1996 torrent of melted snowfall into the estuary recorded a daily average that exceeded the flow during Tropical Storm Agnes in 1972, a benchmark for 100-year meteorological events in these parts. But, according to the U.S. Geological Survey, the impact on the bay's ecosystem was not as damaging as in 1972.
Sea level in the Bay has risen nearly a foot in the past century, three times the rate of the past 5,000 years, which University of Maryland scientist Stephen Leatherman ties to global climate warming. Estuarine islands and upland shoreline are eroding at an accelerated pace.
The topography of the bay watershed is, of course, different from that of the Red River. It's not just flow rates and rainfall, but how the water is directed and where it can escape without intruding too far onto dry land. We can only hope that another 500 years really passes before the Chesapeake region is so tested.
Pub Date: 4/22/97
Publication date: Apr 22, 1997
Publication title: The Sun; Baltimore, Md.
Title: Waiting for the 500-year flood; Red River rampage: Severe weather events, new records are more frequent than expected.:   [FINAL Edition ]
From this data example above, ftfield has 11 lines when I examined it:
[1] "Full text: AS THE RED River raged over makeshift dikes futilely erected against its wrath in North Dakota, drowning cities beneath a column of water 26 feet above flood level, meteorologists were hard pressed to describe its magnitude in human chronology."
[2] "A 500-year flood, some call it, a catastrophic weather event that would have occurred only once since Christopher Columbus arrived on the shores of the New World. Whether it could be termed a 700-year flood or a 300-year flood is open to question."
[3] "The flood's size and power are unprecedented. While the Red River has ravaged the upper Midwest before, the height of the flood crest in Fargo and Grand Forks has been almost incomprehensible."
[4] "But climatological records are being broken more rapidly than ever. A 100-year-storm may as likely repeat within a few years as waiting another century. It is simply a way of classifying severity, not the frequency. \"There isn't really a hundred-year event anymore,\" states climatologist Tom Karl of the National Oceanic and Atmospheric Administration."
[5] "Reliable, consistent weather records in the U.S. go back only 150 years or so. Human development has altered the Earth's surface and atmosphere, promoting greater weather changes and effects than an untouched environment would generate by itself."
[6] "What might be a 500-year event in the Chesapeake Bay is uncertain. Last year was the record for freshwater gushing into the bay. The January 1996 torrent of melted snowfall into the estuary recorded a daily average that exceeded the flow during Tropical Storm Agnes in 1972, a benchmark for 100-year meteorological events in these parts. But, according to the U.S. Geological Survey, the impact on the bay's ecosystem was not as damaging as in 1972."
[7] "Sea level in the Bay has risen nearly a foot in the past century, three times the rate of the past 5,000 years, which University of Maryland scientist Stephen Leatherman ties to global climate warming. Estuarine islands and upland shoreline are eroding at an accelerated pace."
[8] "The topography of the bay watershed is, of course, different from that of the Red River. It's not just flow rates and rainfall, but how the water is directed and where it can escape without intruding too far onto dry land. We can only hope that another 500 years really passes before the Chesapeake region is so tested."
[9] "Pub Date: 4/22/97"
[10] ""
[11] "Publication date: Apr 22, 1997"
And, lastly, findft[1] corresponds with endft[1] and so on until findft[100] and endft[100].
I'll assume that findft contains several indexes, as does endft. I'm also assuming that both have the same length and are paired by index (e.g. findft[5] corresponds to endft[5]), and that you want all NPFile elements between each pair of indexes.
If this is so, try:
ftfield = lapply(1:length(findft), function(x){ NPFile[findft[x]:endft[x]] })
This will return a list. I can't guarantee that this will work because there is no data example to work with.
We can do this with Map. Get the sequence of values for each corresponding element of 'findft' and 'endft', then subset 'NPFile' based on that index:
Map(function(x, y) NPFile[x:y], findft, endft)
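As a follow-up sketch (not part of either answer), each extracted block can then be collapsed into one string per article and the field label stripped off; the object names here are just for illustration:
ft_list <- Map(function(x, y) NPFile[x:y], findft, endft)
# collapse each article's lines into a single string
ft_text <- vapply(ft_list, paste, character(1), collapse = " ")
# drop the leading "Full text:" label; use findft[x]:(endft[x] - 1) above if you
# would rather exclude the trailing "Publication date:" line from each block
ft_text <- sub("^Full text:\\s*", "", ft_text)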

Naive Bayes model not predicting anything when applying the model: predict function returns a factor with 0 levels

My dataset looks like the following, and I followed the Classification Using Naive Bayes tutorial to develop my Naive Bayes model for text mining. However, I cannot get predictions from the model even though it is built: the predict function returns a factor with 0 levels. Below are my dataset and code so far.
Dataset:
lie sentiment review
f n 'Mike\'s Pizza High Point NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.'
f n 'i really like this buffet restaurant in Marshall street. they have a lot of selection of american japanese and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.'
f n 'After I went shopping with some of my friend we went to DODO restaurant for dinner. I found worm in one of the dishes .'
f n 'Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it and the waitor had no manners whatsoever. Don\'t go to the Olive Oil Garden. '
f n 'The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well never more. '
f n 'I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\'t acknowledge the coupon. When I asked her about it she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is otherwise my favorite place to dine. '
f n 'I went to ABC restaurant two days ago and I hated the food and the service. We were kept waiting for over an hour just to get seated and once we ordered our food came out cold. I ordered the pasta and it was terrible - completely bland and very unappatizing. I definitely would not recommend going there especially if you\'re in a hurry!'
f n 'I went to the Chilis on Erie Blvd and had the worst meal of my life. We arrived and waited 5 minutes for a hostess and then were seated by a waiter who was obviously in a terrible mood. We order drinks and it took them 15 minutes to bring us both the wrong beers which were barely cold. Then we order an appetizer and wait 25 minutes for cold southwest egg rolls at which point we just paid and left. Don\'t go.'
f n 'OMG. This restaurant is horrible. The receptionist did not greet us we just stood there and waited for five minutes. The food came late and served not warm. Me and my pet ordered a bowl of salad and a cheese pizza. The salad was not fresh the crust of a pizza was so hard like plastics. My dog didn\'t even eat that pizza. I hate this place!!!!!!!!!!'
> dput(head(lie))
structure(list(lie = c("f", "f", "f", "f", "f", "f"), sentiment = c("n",
"n", "n", "n", "n", "n"), review = c("Mike\\'s Pizza High Point, NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza, not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.",
"i really like this buffet restaurant in Marshall street. they have a lot of selection of american, japanese, and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.",
"After I went shopping with some of my friend, we went to DODO restaurant for dinner. I found worm in one of the dishes .",
"Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it, and the waitor had no manners whatsoever. Don\\'t go to the Olive Oil Garden. ",
"The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well, never more. ",
"I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\\'t acknowledge the coupon. When I asked her about it, she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall, I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is, otherwise, my favorite place to dine. "
)), .Names = c("lie", "sentiment", "review"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000000180788>)
R code:
library(data.table)  # fread()
library(tm)          # VectorSource(), Corpus(), tm_map(), DocumentTermMatrix(), findFreqTerms()
library(e1071)       # naiveBayes()
library(gmodels)
lie <- fread('deception.csv', header = TRUE, fill = TRUE, quote = "\'")
str(lie)
lie
#Corpus Building
words.vec<- VectorSource(lie$review)
words.corpus<- Corpus(words.vec)
words.corpus<-tm_map(words.corpus,content_transformer(tolower)) #lower case
words.corpus<-tm_map(words.corpus,removePunctuation) # remove punctuation
words.corpus<-tm_map(words.corpus,removeNumbers) # remove numbers
words.corpus<-tm_map(words.corpus,removeWords,stopwords('english')) # remove stopwords
words.corpus<-tm_map(words.corpus,stripWhitespace) # remove unnecessary whitespace
#==========================================================================
#Document term Matrix
dtm<-DocumentTermMatrix(words.corpus)
dtm
class(dtm)
#dtm_df<-as.data.frame(as.matrix(dtm))
#class(dtm_df)
freq <- colSums(as.matrix(dtm))
length(freq)
ord <- order(freq,decreasing=TRUE)
freq[head(ord)]
freq[tail(ord)]
#===========================================================================
#Data frame partition
#Splitting DTM
dtm_train <- dtm[1:61, ]
dtm_test <- dtm[62:92, ]
train_labels <- lie[1:61, ]$lie
test_labels <-lie[62:92, ]$lie
str(train_labels)
str(test_labels)
prop.table(table(train_labels))
prop.table(table(test_labels))
freq_words <- findFreqTerms(dtm_train, 10)
freq_words
dtm_freq_train<- dtm_train[ , freq_words]
dtm_freq_test <- dtm_test[ , freq_words]
dtm_freq_test
convert_counts <- function(x) {
x <- ifelse(x > 0, 'yes','No')
}
train <- apply(dtm_freq_train, MARGIN = 2, convert_counts)
test <- apply(dtm_freq_test, MARGIN = 2, convert_counts)
str(test)
nb_classifier<-naiveBayes(train,train_labels)
nb_classifier
test_pred<-predict(nb_classifier,test)
Thanks in advance for help,
Naive Bayes requires the response variable to be a categorical class variable.
Convert the lie column of your lie data frame to a factor and re-run the analysis:
lie$lie <- as.factor(lie$lie)
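In context, the fix slots in before the labels are split; here is a sketch reusing the objects from the question's code (train and test as built above):
lie$lie <- as.factor(lie$lie)        # response must be a factor, not character
train_labels <- lie[1:61, ]$lie      # factor labels now
test_labels  <- lie[62:92, ]$lie
nb_classifier <- naiveBayes(train, train_labels)
test_pred <- predict(nb_classifier, test)
table(test_pred, test_labels)        # a confusion matrix instead of a 0-level factor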

unnest_tokens fails to handle vectors in R with tidytext package

I want to use the tidytext package to create a column with ngrams, using the following code:
library(tidytext)
unnest_tokens(tbl = president_tweets,
output = bigrams,
input = text,
token = "ngrams",
n = 2)
But when I run this I get the following error message:
error: unnest_tokens expects all columns of input to be atomic vectors (not lists)
My text column consists of a lot of tweets with rows that look like the following and is of class character.
president_tweets$text <- c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.",
"Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!",
"A story in the #washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!",
"Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!",
"DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!",
"70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"
)
---------Update:----------
It looks like the sentimentr or exploratory package caused the conflict. I reloaded my packages without these and now it works again!
Hmmmmm, I am not able to reproduce your problem.
library(tidytext)
library(dplyr)
president_tweets <- data_frame(text = c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.",
"Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!",
"A story in the #washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!",
"Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!",
"DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!",
"70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"))
unnest_tokens(tbl = president_tweets,
output = bigrams,
input = text,
token = "ngrams",
n = 2)
#> # A tibble: 205 x 1
#> bigrams
#> <chr>
#> 1 the united
#> 2 united states
#> 3 states senate
#> 4 senate just
#> 5 just passed
#> 6 passed the
#> 7 the biggest
#> 8 biggest in
#> 9 in history
#> 10 history tax
#> # ... with 195 more rows
The current CRAN version of tidytext does in fact not allow list-columns, but we have changed the column handling so that the development version on GitHub now supports list-columns. Are you sure you don't have any of these in your data frame/tibble? What are the data types of all of your columns? Are any of them of type list?
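A quick way to check (a sketch, not part of the answer above) is to look at the type of every column in the tibble you pass to unnest_tokens:
sapply(president_tweets, class)                                        # class of each column
names(president_tweets)[vapply(president_tweets, is.list, logical(1))] # names of any list-columns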
