R | Update column in dataframe based on conditions in other dataframe

I am having trouble finding a way to cleanly update the amount column in table 1 with the price column in table 2. I know that left_join and merge could be used to join the price column, rename it, and then drop it, but I am wondering if there is a simpler way that avoids creating a mess.
I should state that the real dataset is more complicated and that the amount column in table 1 needs to be conditionally updated somehow based on table 2.
Table 1
Fruit      Vegetable  amount
apple      broccoli
pear       spinach
pineapple  carrot

Table 2
Fruit      Vegetable  price
apple      broccoli   10
pear       spinach    5
pineapple  carrot     2

If you want to avoid the merge-and-update process, you can use match on a combined key built from both columns:
table1$amount <- table2$price[match(paste(table1$Fruit, table1$Vegetable),
paste(table2$Fruit, table2$Vegetable))]
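For the conditional case mentioned in the question, one option is to overwrite amount only where a match is found, so that unmatched rows keep their existing value instead of becoming NA. This is a minimal sketch: the fourth row of table1 (kiwi/kale with amount 7) is hypothetical and is there only to show an unmatched row.

```r
# Example data; the kiwi/kale row is a hypothetical unmatched row.
table1 <- data.frame(
  Fruit     = c("apple", "pear", "pineapple", "kiwi"),
  Vegetable = c("broccoli", "spinach", "carrot", "kale"),
  amount    = c(NA, NA, NA, 7)
)
table2 <- data.frame(
  Fruit     = c("apple", "pear", "pineapple"),
  Vegetable = c("broccoli", "spinach", "carrot"),
  price     = c(10, 5, 2)
)

# Position of each table1 key in table2 (NA where there is no match).
idx <- match(paste(table1$Fruit, table1$Vegetable),
             paste(table2$Fruit, table2$Vegetable))

# Update only the rows that have a match; leave the others untouched.
found <- !is.na(idx)
table1$amount[found] <- table2$price[idx[found]]
print(table1)
```

Any extra condition (for example, only updating rows where amount is currently NA) can be combined with found using &.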

Related

Is there an easy way of text searching using lookup tables in R? (Version 2 - multiple word searching)

I've previously asked a very similar question which was superbly answered but I have since slightly changed the search terms to multiple words so I am posting a fresh question with updated code/example.
I have a use case where I have lots of 'lookup tables', i.e. dataframes containing strings I am searching for in rows within a large second dataframe. I need to extract rows where a string exists within the dataframe but there may be other strings in the dataframe. I also need to extract the whole row and that of the lookup table when a match is found.
I've successfully achieved what I need via a nested for loop, but my actual dataset is massive and the lookup table will be circa 50,000 rows. So a for loop is going to be very inefficient. I have had success using dplyr::semi_join but that only works when the entries match exactly, whereas I am searching for a single word in a longer string:
fruit_lookup <- data.frame(fruit=c("banana drop","apple juice","pear","plum"), rating=c(3,4,3,5))
products <- data.frame(product_code=c("535A","535B","283G","786X","765G"), product_name=c("banana drop syrup","apple juice concentrate","melon juice","coconut oil","strawberry jelly"))
results <- data.frame(product_code=NA, product_name=NA, fruit=NA, rating=NA)
for(i in 1:nrow(products)) {
  for(j in 1:nrow(fruit_lookup)){
    if(stringr::str_detect(products$product_name[i], fruit_lookup$fruit[j])) {
      results <- tibble::add_row(results)
      results$product_code[i] <- products$product_code[i]
      results$product_name[i] <- products$product_name[i]
      results$fruit[i] <- fruit_lookup$fruit[j]
      results$rating[i] <- fruit_lookup$rating[j]
      break
    }
  }
}
results <- stats::na.omit(results)
print(results)
This yields the result I am wanting:
product_code product_name fruit rating
535A banana drop syrup banana drop 3
535B apple juice concentrate apple juice 4
Any advice gratefully received and I won't be hurt if I have missed something obvious. Please feel free to critique my other coding practices, which may not be ideal!
This seems like a regex-join. Up-front, I'm not certain how well this scales with any of the offerings:
fuzzyjoin::regex_inner_join(products, fruit_lookup, by = c("product_name" = "fruit"))
# product_code product_name fruit rating
# 1 535A banana drop syrup banana drop 3
# 2 535B apple juice concentrate apple juice 4
Similarly, sqldf:
sqldf::sqldf("
select p.*, f.*
from fruit_lookup f
inner join products p on p.product_name like '%'||f.fruit||'%'
")
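If adding a package is not an option, the same substring join can be sketched in base R by looping over the lookup terms only and letting grepl scan all product names at once. This is an illustrative alternative, not one of the approaches above; unlike the original loop it returns every matching pair rather than stopping at the first hit, and fixed = TRUE treats each lookup term as a literal string rather than a regular expression. How well it scales to 50,000 lookup rows is untested.

```r
fruit_lookup <- data.frame(fruit = c("banana drop", "apple juice", "pear", "plum"),
                           rating = c(3, 4, 3, 5))
products <- data.frame(
  product_code = c("535A", "535B", "283G", "786X", "765G"),
  product_name = c("banana drop syrup", "apple juice concentrate",
                   "melon juice", "coconut oil", "strawberry jelly")
)

# For each lookup term j, find the product rows i whose name contains it.
hits <- lapply(seq_len(nrow(fruit_lookup)), function(j) {
  i <- which(grepl(fruit_lookup$fruit[j], products$product_name, fixed = TRUE))
  data.frame(i = i, j = rep(j, length(i)))
})
pairs <- do.call(rbind, hits)

# Combine the matching product rows with their lookup rows.
results <- cbind(products[pairs$i, ], fruit_lookup[pairs$j, ])
rownames(results) <- NULL
print(results)
```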

Making new data frame in R

Say I have a data table with three columns: apple, orange, and age.
What code can I write in R to make a second table with upper-case columns FRUITS, AGE, and USE?
Input:
apple  orange  age
3      2       1-3
4      5       4-6
8      9       7-9

Desired output:
FRUITS  AGE  USE
apple   1-3  3
apple   4-6  4
apple   7-9  8
orange  1-3  2
orange  4-6  5
orange  7-9  9
This is an example, so I give fewer values, but let's say my data has 30 rows like that. I do not want to manually add each row to a new data frame. How can I turn the apple and orange columns into FRUITS and make a USE column?
I think that with the pivot_longer() tidyr function moodymudskipper suggested, it would be coded like this:
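library(dplyr)  # provides %>% and rename_with()
library(tidyr)  # provides pivot_longer()
new_data <- data %>%
  pivot_longer(!age, names_to = "FRUITS", values_to = "USE") %>%
  rename_with(toupper)  # upper-case every column name (age -> AGE)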
One possible way to solve your problem:
library(data.table)
melt(as.data.table(df),
     measure.vars = c("apple", "orange"),
     variable.name = "FRUITS",
     value.name = "USE")
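For comparison, the same reshape can be sketched in base R with no packages, by stacking the two value columns and repeating the age labels. The data frame below is rebuilt from the example values in the question; the name long is only illustrative.

```r
data <- data.frame(apple  = c(3, 4, 8),
                   orange = c(2, 5, 9),
                   age    = c("1-3", "4-6", "7-9"))

# Stack apple and orange into one USE column, labelling each block by fruit.
long <- data.frame(
  FRUITS = rep(c("apple", "orange"), each = nrow(data)),
  AGE    = rep(data$age, times = 2),
  USE    = c(data$apple, data$orange)
)
print(long)
```

This hard-codes the two fruit columns, so with many columns pivot_longer() or melt() is likely the more convenient choice.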

IF Duplicate more than 1 counted as 1

I have a list as below:
Apple
Orange
Grape
Grape
Grape
I use the formula: =AND((COUNTIF(A1:A,A1)>1),NOT(ISBLANK(A1))).
How do I approach this if I want Grape to also appear in the list?
Try perhaps:
=UNIQUE(A1:A)
This will give you all unique values with no duplicates.

Trouble reading a messy CSV file in R

I have been trying to read a CSV into R. The CSV is separated in a strange way, with all values within one column separated by commas, like in this picture. The top row is the column names and the values are below.
When I try read_csv("filename"), nothing shows up in the tibble except a bunch of NA values, like in this picture after running the view function. How can I approach this?
Here is the data for reference
, Calories, Fat (g), Carb. (g), Fiber (g), Protein (g)
Chonga Bagel,300,5,50,3,12
8-Grain Roll,380,6,70,7,10
Almond Croissant,410,22,45,3,10
Apple Fritter,460,23,56,2,7
Banana Nut Bread,420,22,52,2,6
Blueberry Muffin with Yogurt and Honey,380,16,53,1,6
Blueberry Scone,420,17,61,2,5
Butter Croissant,240,12,28,1,5
Butterfly Cookie,350,22,38,0,2
Cheese Danish,320,16,36,1,8
Chewy Chocolate Cookie,170,5,30,2,2
Chocolate Chip Cookie,310,15,42,2,4
Chocolate Chunk Muffin,440,21,60,2,7
Chocolate Croissant,330,18,38,1,6
Chocolate Hazelnut Croissant,390,22,43,2,7
Chocolate Marble Loaf Cake,490,24,64,2,6
Cinnamon Morning Bun,390,15,56,2,8
Cinnamon Raisin Bagel,270,1,58,3,9
Classic Coffee Cake,390,16,57,1,5
Cookie Butter Bar,360,23,36,0,2
Use the following code to read the data:
df <- read.csv("starbucks-menu-nutrition-food.csv", skipNul = TRUE)
head(df, 2)
ÿþ Calories Fat..g. Carb...g. Fiber..g. Protein..g.
1 Chonga Bagel 300 5 50 3 12
2 8-Grain Roll 380 6 70 7 10
Then you may consider renaming the columns, for example:
colnames(df) <- c("Food", "Calories", "Fat", "Carb", "Fiber", "Protein")
for further processing of the data.
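The ÿþ in the first column name looks like a UTF-16 little-endian byte order mark, which would also explain why skipNul is needed. Assuming that is the file's actual encoding (this is a guess from the output, not something confirmed in the question), declaring the encoding when reading may avoid both problems. The sketch below builds a tiny stand-in file with the same kind of encoding, since the original file is not available here.

```r
# Build a small UTF-16LE file with a byte order mark (stand-in for the real CSV).
path <- tempfile(fileext = ".csv")
txt  <- ",Calories,Fat (g)\nChonga Bagel,300,5\n8-Grain Roll,380,6\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Assumption: the file is UTF-16LE. Declare the encoding instead of skipping nuls.
df <- read.csv(path, fileEncoding = "UTF-16LE")
colnames(df) <- c("Food", "Calories", "Fat")
print(df)
```

With the real file, the call would be read.csv("starbucks-menu-nutrition-food.csv", fileEncoding = "UTF-16LE"), again under the assumption that the encoding guess is right.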

Preparing data for ndtv in R

I want to visualize a dynamic network using the ndtv package in R.
My dataset (data) looks like this:
0 1 Apple Banana
0 1 Peach Banana
0 1 Apple Strawberry
1 2 Apple Banana
1 2 Apple Peach
2 3 Banana Peach
…
So the columns are onset, terminus, tail, head.
If I want to create a networkDynamic object from this list by
nw <- networkDynamic(edge.spells=data)
I get an error saying "the tail column of the edge.spells argument to networkDynamic must be a numeric vertex id". So I guess I need to convert those strings into numeric values. How do I do that? And if I do that, how do I keep the names? I don't want a network that just displays the numeric IDs of those names, I want to see those names in the network.
I couldn't find any useful information by searching the web, and this tutorial doesn't show what I want to do. I would've liked to see how they actually constructed the short.stergm.sim data instead of just using it.
Any help is very much appreciated!
I found a way to map ids to the names.
names <- unique(c(data$head,data$tail))
data$head <- match(data$head,names)
data$tail <- match(data$tail,names)
And then I could create the networkDynamic object
nw <- networkDynamic(edge.spells=data)
and add the names to the network
network.vertex.names(nw) <- names
This post helped me a lot.
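To make the id mapping above concrete, here is a small sketch that applies it to the six example rows from the question and checks that the names can be recovered from the ids. It stops short of calling networkDynamic so that it runs without the package installed; vertex_names is an illustrative name chosen to avoid shadowing base R's names() function.

```r
# The six example edge spells from the question: onset, terminus, tail, head.
data <- data.frame(
  onset    = c(0, 0, 0, 1, 1, 2),
  terminus = c(1, 1, 1, 2, 2, 3),
  tail     = c("Apple", "Peach", "Apple", "Apple", "Apple", "Banana"),
  head     = c("Banana", "Banana", "Strawberry", "Banana", "Peach", "Peach"),
  stringsAsFactors = FALSE
)

# One numeric id per distinct vertex name, in order of first appearance.
vertex_names <- unique(c(data$head, data$tail))
data$head <- match(data$head, vertex_names)
data$tail <- match(data$tail, vertex_names)

print(vertex_names)
print(data)

# Indexing vertex_names by id recovers the original labels.
print(vertex_names[data$tail])
```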
