Dynamic Programming Knapsack select only Unique Items in R

I have this code that selects multiple items from a data frame to put in a knapsack. I want it to select each item from the data frame at most once:
knapsack_volume<-function(Data, W, Volume, full_K){
Data = Data
# Data must have columns named: item, value, weight and volume.
K<-list() # highest values
K_item<-list() # items that reach the highest value
K<-rep(0,W+1) # The position '0'
K_item<-rep('',W+1) # The position '0'
# while(length(Data$item) != 1){
for(w in 1:W){
temp_w<-0
temp_item<-''
temp_value<-0
for(i in 1:dim(Data)[1]){ # each row
wi<-Data$weight[i] # item i
vi<- Data$value[i]
item<-Data$item[i]
volume_i<-Data$volume[i]
if(wi<=w & volume_i <= Volume){
back<- full_K[[Volume-volume_i+1]][w-wi+1]
temp_wi<-vi + back
if(temp_w < temp_wi){
temp_value<-temp_wi
temp_w<-temp_wi
temp_item <- item
}
}
# Data = Data[-i, ]
}
K[[w+1]]<-temp_value
K_item[[w+1]]<-temp_item
}
return(list(K=K,Item=K_item))
}
The DataFrame looks like:-
item value weight volume
A 40 4 8
B 80 8 12
C 20 4 6
D 100 10 14
E 65 8 8
F 60 10 5
G 70 5 12
H 45 5 7
I 60 6 6
J 60 4 8
You may reproduce the dataframe with:-
Data = data.frame(item = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
value = c(40, 80, 20, 100, 65, 60, 70, 45, 60, 60), weight = c(4, 8, 4, 10, 8,
10, 5, 5, 6, 4), volume = c(8, 12, 6, 14, 8, 5, 12, 7, 6, 8))
Thanks

How about deleting the item from the data frame once you put it in your knapsack? However, you then need a guarantee that your knapsack can be completely filled with unique items.
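Deleting rows inside the loop is awkward, because the table lookups for smaller capacities assume every item is still available. A minimal sketch of the usual 0/1 formulation follows; it is not the asker's function. It assumes integer weights and volumes, and the function name knapsack_01 and the capacities in the example call are hypothetical. Items are handled in the outer loop, so each one is either skipped or taken exactly once.

```r
# 0/1 knapsack with two capacities (weight W and volume V).
# Assumes Data has columns item, value, weight, volume with integer weight/volume.
knapsack_01 <- function(Data, W, V) {
  n <- nrow(Data)
  # best[i + 1, w + 1, v + 1]: best value using only the first i items
  # with weight capacity w and volume capacity v
  best <- array(0, dim = c(n + 1, W + 1, V + 1))
  for (i in seq_len(n)) {
    wi <- Data$weight[i]
    vi <- Data$volume[i]
    for (w in 0:W) {
      for (v in 0:V) {
        skip <- best[i, w + 1, v + 1]
        take <- if (wi <= w && vi <= v) {
          Data$value[i] + best[i, w - wi + 1, v - vi + 1]
        } else {
          -Inf
        }
        best[i + 1, w + 1, v + 1] <- max(skip, take)
      }
    }
  }
  # Walk back through the table to recover which items were taken
  chosen <- character(0)
  w <- W
  v <- V
  for (i in rev(seq_len(n))) {
    if (best[i + 1, w + 1, v + 1] != best[i, w + 1, v + 1]) {
      chosen <- c(as.character(Data$item[i]), chosen)
      w <- w - Data$weight[i]
      v <- v - Data$volume[i]
    }
  }
  list(value = best[n + 1, W + 1, V + 1], items = chosen)
}

Data <- data.frame(
  item = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  value = c(40, 80, 20, 100, 65, 60, 70, 45, 60, 60),
  weight = c(4, 8, 4, 10, 8, 10, 5, 5, 6, 4),
  volume = c(8, 12, 6, 14, 8, 5, 12, 7, 6, 8)
)
res <- knapsack_01(Data, W = 20, V = 30)  # capacities chosen only for illustration
res$items
```

Because row i of the table only ever looks at row i - 1, an item can never be counted twice, which is the guarantee the row-deletion idea tries to obtain.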

Related

print from specific rows with highest value from multiple columns using R Studio

I have attached an Excel image. I want to extract only those rows in which a given column holds the maximum value compared with the other columns.
First, provide a reproducible version of your data (not a picture):
dput(dta)
structure(list(A = c(45, 20, 9, 6, 6), B = c(23, 34, 7, 10, 5
), C = c(12, 15, 8, 0, 4), D = c(4, 4, 6, 0, 3), E = c(5, 6,
3, 1, 2)), class = "data.frame", row.names = c("BOX_A", "BOX_B",
"BOX_C", "BOX_D", "BOX_E"))
Now find which column is the maximum:
idx <- apply(dta, 1, which.max)
Now display the rows where the maximum is in the first column. This is not what you asked for but it is what your picture shows:
dta[idx==1, ]
# A B C D E
# BOX_A 45 23 12 4 5
# BOX_C 9 7 8 6 3
# BOX_E 6 5 4 3 2
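If you need the same thing for every column rather than only the first, one possible sketch is to turn idx into column names and group the row names by them. The names max_col and rows_by_max_col are made up for this illustration.

```r
dta <- structure(list(A = c(45, 20, 9, 6, 6), B = c(23, 34, 7, 10, 5),
                      C = c(12, 15, 8, 0, 4), D = c(4, 4, 6, 0, 3),
                      E = c(5, 6, 3, 1, 2)),
                 class = "data.frame",
                 row.names = c("BOX_A", "BOX_B", "BOX_C", "BOX_D", "BOX_E"))

# Position of the row-wise maximum (first one wins on ties)
idx <- apply(dta, 1, which.max)

# Name of the column holding each row's maximum
max_col <- names(dta)[idx]

# Row names grouped by the column that holds their maximum
rows_by_max_col <- split(rownames(dta), max_col)
rows_by_max_col
```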

Is there a way to create new columns in R based on manipulations from multiple data frames?

Does anyone know if it is possible to use a variable in one dataframe (in my case the "deploy" dataframe) to create a variable in another dataframe?
For example, I have two dataframes:
df1:
deploy <- data.frame(ID = c("20180101_HH1_1_1", "20180101_HH1_1_2", "20180101_HH1_1_3"),
Site_Depth = c(42, 93, 40), Num_Depth_Bins_Required = c(5, 100, 4),
Percent_Column_in_each_bin = c(20, 10, 25))
df2:
sp.c <- data.frame(species = c("RR", "GS", "GT", "BR", "RS", "BA", "GS", "RS", "SH", "RR"),
ct = c(25, 66, 1, 12, 30, 6, 1, 22, 500, 6),
percent_dist_from_surf = c(11, 15, 33, 68, 71, 100, 2, 65, 5, 42))
I want to create new columns in df2 that assigns each species and count to a bin based on the Percent_Column_in_each_bin for each ID. For example, in 20180101_HH1_1_3 there would be 4 bins that each make up 25% of the column and all species that are within 0-25% of the column (in df2) would be in bin 1 and species within 25-50% of the column would be in depth bin 2, and so on. What I'm imagining this looking like is:
i.want.this <- data.frame(species = c("RR", "GS", "GT", "BR", "RS", "BA", "GS", "RS", "SH", "RR"),
ct = c(25, 66, 1, 12, 30, 6, 1, 22, 500, 6),
percent_dist_from_surf = c(11, 15, 33, 68, 71, 100, 2, 65, 5, 42),
'20180101_HH1_1_1_Bin' = c(1, 1, 2, 4, 4, 5, 1, 4, 1, 3),
'20180101_HH1_1_2_Bin' = c(2, 2, 4, 7, 8, 10, 1, 7, 1, 5),
'20180101_HH1_1_3_Bin' = c(1, 1, 2, 3, 3, 4, 1, 3, 1, 2))
I am pretty new to R and I'm not sure how to make this happen. I need to do this for over 100 IDs (all with different depths, number of depth bins, and percent of the column in each bin) so I was hoping that I don't need to do them all by hand. I have tried mutate in dplyr but I can't get it to pull from two different dataframes. I have also tried ifelse statements, but I would need to run the ifelse statement for each ID individually.
I don't know if what I am trying to do is possible but I appreciate the feedback. Thank you in advance!
Edit: my end goal is to find the max count (max ct) for each species within each bin for each ID. What I've been doing to find this (using the bins generated with suggestions from @Ben) is using dplyr to slice and find the max ct like this:
`20180101_HH1_1_1` <- sp.c %>%
group_by(`20180101_HH1_1_1`, species) %>%
arrange(desc(ct)) %>%
slice(1) %>%
group_by(`20180101_HH1_1_1`) %>%
mutate(Count_Total_Per_Bin = sum(ct)) %>%
group_by(species, add=TRUE) %>%
mutate(species_percent_of_total_in_bin =
paste0(100*ct/Count_Total_Per_Bin)) %>%
mutate(ID = "20180101_HH1_1_1") %>%
ungroup()
but I have to do this for over 100 IDs. My desired output would be something like:
end.goal <- data.frame(ID = c(rep("20180101_HH1_1_1", 8)),
species = c("RR", "GS", "SH", "GT", "RR", "BR", "RS", "BA"),
bin = c(1, 1, 1, 2, 3, 4, 4, 5),
Max_count_of_each_species_in_each_bin = c(11, 66, 500, 1, 6, 12, 30, 6),
percent_dist_from_surf = c(11, 15, 5, 33, 42, 68, 71, 100),
percent_each_species_max_in_each_bin = c((11/577)*100, (66/577)*100, (500/577)*100, 100, 100, (12/42)*100, (30/42)*100, 100))
I was thinking that by answering the original question I could get to this but I see now that there's still a lot you have to do to get this for each ID.
Here is another approach, which does not require a loop.
Using sapply over deploy$Percent_Column_in_each_bin, you can call cut to determine the bin of each percent_dist_from_surf value in sp.c, once per row of your deploy dataframe.
res <- sapply(deploy$Percent_Column_in_each_bin, function(x) {
cut(sp.c$percent_dist_from_surf, seq(0, 100, by = x), include.lowest = TRUE, labels = 1:(100/x))
})
colnames(res) <- deploy$ID
cbind(sp.c, res)
Or using purrr:
library(purrr)
cbind(sp.c, imap(setNames(deploy$Percent_Column_in_each_bin, deploy$ID),
~ cut(sp.c$percent_dist_from_surf, seq(0, 100, by = .x), include.lowest = TRUE, labels = 1:(100/.x))
))
Output
species ct percent_dist_from_surf 20180101_HH1_1_1 20180101_HH1_1_2 20180101_HH1_1_3
1 RR 25 11 1 2 1
2 GS 66 15 1 2 1
3 GT 1 33 2 4 2
4 BR 12 68 4 7 3
5 RS 30 71 4 8 3
6 BA 6 100 5 10 4
7 GS 1 2 1 1 1
8 RS 22 65 4 7 3
9 SH 500 5 1 1 1
10 RR 6 42 3 5 2
Edit:
To determine the maximum ct value for each species, site, and bin, put the combined result from above (the output of cbind(sp.c, res)) into a dataframe called res and do the following.
First, put the data into long form with pivot_longer. Then you can group_by species, site, and bin, and determine the maximum ct for each combination.
library(tidyverse)
res %>%
pivot_longer(cols = starts_with("2018"), names_to = "site", values_to = "bin") %>%
group_by(species, site, bin) %>%
summarise(max_ct = max(ct)) %>%
arrange(site, bin)
Output
# A tibble: 26 x 4
# Groups: species, site [21]
species site bin max_ct
<fct> <chr> <fct> <dbl>
1 GS 20180101_HH1_1_1 1 66
2 RR 20180101_HH1_1_1 1 25
3 SH 20180101_HH1_1_1 1 500
4 GT 20180101_HH1_1_1 2 1
5 RR 20180101_HH1_1_1 3 6
6 BR 20180101_HH1_1_1 4 12
7 RS 20180101_HH1_1_1 4 30
8 BA 20180101_HH1_1_1 5 6
9 GS 20180101_HH1_1_2 1 1
10 SH 20180101_HH1_1_2 1 500
11 GS 20180101_HH1_1_2 2 66
12 RR 20180101_HH1_1_2 2 25
13 GT 20180101_HH1_1_2 4 1
14 RR 20180101_HH1_1_2 5 6
15 BR 20180101_HH1_1_2 7 12
16 RS 20180101_HH1_1_2 7 22
17 RS 20180101_HH1_1_2 8 30
18 BA 20180101_HH1_1_2 10 6
19 GS 20180101_HH1_1_3 1 66
20 RR 20180101_HH1_1_3 1 25
21 SH 20180101_HH1_1_3 1 500
22 GT 20180101_HH1_1_3 2 1
23 RR 20180101_HH1_1_3 2 6
24 BR 20180101_HH1_1_3 3 12
25 RS 20180101_HH1_1_3 3 30
26 BA 20180101_HH1_1_3 4 6
It is helpful to distinguish between the contents of your two dataframes.
df2 appears to contain measurements from some sites
df1 appears to contain parameters by which you want to process/summarise the measurements in df2
Given these different purposes of the two dataframes, your best approach is probably to loop over all the rows of df1 each time adding a column to df2. Something like the following:
library(dplyr)
max_dist = max(df2$percent_dist_from_surf)
for(ii in 1:nrow(df1)){
# extract parameters
this_ID = df1[[ii,"ID"]]
this_depth = df1[[ii,"Site_Depth"]]
this_bins = df1[[ii,"Num_Depth_Bins_Required"]]
this_percent = df1[[ii,"Percent_Column_in_each_bin"]]
# add column to df2
df2 = df2 %>%
mutate(!!sym(this_ID) := insert_your_calculation_here)
}
The !!sym(this_ID) := part of the code is to allow dynamic naming of your output columns.
And as best I can determine, the formula you want for insert_your_calculation_here is ceiling(percent_dist_from_surf / max_dist * this_bins)
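For comparison, here is a base R sketch of the same loop that needs no dynamic column naming. It rests on two assumptions that are not stated in the question: bins are equal-width slices whose width is Percent_Column_in_each_bin, and a value of 0 should land in bin 1. The "_Bin" suffix follows the desired output in the question; width and new_col are names made up for this sketch.

```r
deploy <- data.frame(
  ID = c("20180101_HH1_1_1", "20180101_HH1_1_2", "20180101_HH1_1_3"),
  Site_Depth = c(42, 93, 40),
  Num_Depth_Bins_Required = c(5, 100, 4),
  Percent_Column_in_each_bin = c(20, 10, 25)
)
sp.c <- data.frame(
  species = c("RR", "GS", "GT", "BR", "RS", "BA", "GS", "RS", "SH", "RR"),
  ct = c(25, 66, 1, 12, 30, 6, 1, 22, 500, 6),
  percent_dist_from_surf = c(11, 15, 33, 68, 71, 100, 2, 65, 5, 42)
)

for (ii in seq_len(nrow(deploy))) {
  # Width of one bin, in percent of the water column
  width <- deploy$Percent_Column_in_each_bin[ii]
  new_col <- paste0(deploy$ID[ii], "_Bin")
  # ceiling() puts boundary values in the lower bin; pmax() keeps 0 in bin 1
  sp.c[[new_col]] <- pmax(1, ceiling(sp.c$percent_dist_from_surf / width))
}
sp.c
```

Assigning with sp.c[[new_col]] keeps the column names exactly as built, including names that start with a digit.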

Why does spread() create a NA-only column?

I'm still an R beginner, so I hope this question is not redundant, but I couldn't find a satisfying answer to my problem. Although another question seems to be very similar, I still wonder whether my observation represents the standard case. The function tidyr::spread behaves awkwardly when I try to spread a column that contains three unique values plus NAs. The result is a tibble with three new columns (as expected) but also an additional fourth column named "NA", which is completely filled with NAs.
Here is my example dataframe:
test <- data.frame("Country" = c("A", "A", "A", "A", "A", "A", "A", "A"),
"Column1" = c(1, 1, 1, 1, 1, 1, 2, 2),
"Column2" = c(3, 3, 3, 4, 4, 4, 5, 5),
"Column3" = c("B", "M", "F", "B", "M", "F", "B", NA),
"Column4" = c(50, 74, 31, 53, 79, 33, 51, NA))
test1 <- spread(test, key = "Column3", value = "Column4")
test1
Is this normal when my tibble contains missing values? And if so, why? The creation of an additional column being completely filled with missing values as a standard behaviour seems strange to me. Or am I missing something obvious (probably)?
Any help would be much appreciated!
spread is behaving as expected, though the repeated presence of NA as both a column name and as values in the data frame might make the behavior unclear. Let's change the data frame to use a dummy value of 999 in "Column4" (and the string 'NA' as the key in "Column3"):
test <- data.frame("Country" = c("A", "A", "A", "A", "A", "A", "A", "A"), "Column1" = c(1, 1, 1, 1, 1, 1, 2, 2), "Column2" = c(3, 3, 3, 4, 4, 4, 5, 5), "Column3" = c("B", "M", "F", "B", "M", "F", "B", 'NA'), "Column4" = c(50, 74, 31, 53, 79, 33, 51, 999))
Country Column1 Column2 Column3 Column4
1 A 1 3 B 50
2 A 1 3 M 74
3 A 1 3 F 31
4 A 1 4 B 53
5 A 1 4 M 79
6 A 1 4 F 33
7 A 2 5 B 51
8 A 2 5 NA 999
And now the spread operation:
test1 <- spread(test, key = "Column3", value = "Column4")
Country Column1 Column2 B F M NA
1 A 1 3 50 31 74 NA
2 A 1 4 53 33 79 NA
3 A 2 5 51 NA NA 999
spread has correctly placed the 999 value in the new "NA" column (again, new column names taken from the old values in "Column3"), and aligned this value with matching values from the original data frame. Because 999 only appears once in the original data frame, it only has 1 matching row in the new data frame, and all other rows in the new "NA" column are therefore filled with NA (again, somewhat confusingly here).
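A small sketch of the same point on the original data: NA counts as a fourth key in "Column3", so it gets its own column. If that column is unwanted, one option (an assumption about what you want, not something spread does for you) is to drop rows with a missing key first. The names keys and test_complete are made up for this illustration, and the spread call only runs if tidyr is installed.

```r
test <- data.frame(Country = rep("A", 8),
                   Column1 = c(1, 1, 1, 1, 1, 1, 2, 2),
                   Column2 = c(3, 3, 3, 4, 4, 4, 5, 5),
                   Column3 = c("B", "M", "F", "B", "M", "F", "B", NA),
                   Column4 = c(50, 74, 31, 53, 79, 33, 51, NA))

# Distinct keys in Column3; NA is counted as a key of its own
keys <- unique(test$Column3)
length(keys)

# Drop rows whose key is missing before spreading
test_complete <- test[!is.na(test$Column3), ]

if (requireNamespace("tidyr", quietly = TRUE)) {
  print(tidyr::spread(test_complete, key = "Column3", value = "Column4"))
}
```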

Sorting a column based on the order of another column in R

The R script below creates a data frame a123 with three columns. Column a1 has three distinct values occurring at different places, each with corresponding a2 and a3 values.
a1 = c("A", "B", "C", "A", "B", "B", "A", "C", "A", "C", "B")
a2 = c( 10, 8, 11 , 6 , 4 , 7 , 9 , 1 , 3 , 2, 7)
a3 = c( 55, 34, 33, 23, 78, 33, 123, 34, 85, 76, 74)
a123 = data.frame(a1, a2, a3)
I want the a3 values corresponding to each a1 value to be arranged in ascending order based on the order of the a2 values. Also, if tied a2 values are encountered, the corresponding a3 values should be arranged in ascending order. For example, say value "A" in column a1 has the following values in a2 and a3,
a2 = c(10, 6, 9, 3)
a3 = c(55, 23, 123, 85)
The values can be like:
a3 = c(123, 23, 85, 55)
Expected Outcome:
a1 = c("A", "B", "C", "A", "B", "B", "A", "C", "A", "C", "B")
a2 = c( 10, 8, 11, 6, 4, 7, 9, 1, 3, 2, 7)
a3 = c( 123, 78, 76, 23, 33, 34, 85, 33, 55, 34, 74)
a123 = data.frame(a1, a2, a3)
Thanks and please help. Note: Please try to avoid loops and conditions as they might slow the computation based on large data.
A solution using dplyr, sort, and rank. I do not fully understand your logic, but this is probably what you are looking for. Notice that I assume the elements in a3 of group A are 123, 55, 85, 23.
library(dplyr)
a123_r <- a123 %>%
group_by(a1) %>%
mutate(a3 = sort(a3, decreasing = TRUE)[rank(-a2, ties.method = "last")]) %>%
ungroup() %>%
as.data.frame()
a123_r
# a1 a2 a3
# 1 A 10 123
# 2 B 8 78
# 3 C 11 76
# 4 A 6 55
# 5 B 4 33
# 6 B 7 34
# 7 A 9 85
# 8 C 1 33
# 9 A 3 23
# 10 C 2 34
# 11 B 7 74
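If dplyr is not available, the same sort and rank idea can be sketched in base R with ave over row indices. This relies on the same assumption about the intended logic as the answer above; the column name a3_new is made up so the original a3 stays visible.

```r
a1 <- c("A", "B", "C", "A", "B", "B", "A", "C", "A", "C", "B")
a2 <- c(10, 8, 11, 6, 4, 7, 9, 1, 3, 2, 7)
a3 <- c(55, 34, 33, 23, 78, 33, 123, 34, 85, 76, 74)
a123 <- data.frame(a1, a2, a3)

# ave() hands each group's row indices to the function, so both
# a2 and a3 can be looked up for that group
a123$a3_new <- ave(seq_len(nrow(a123)), a123$a1, FUN = function(rows) {
  sorted_a3 <- sort(a123$a3[rows], decreasing = TRUE)
  sorted_a3[rank(-a123$a2[rows], ties.method = "last")]
})
a123
```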

if one observation meet criteria fill other with the same value for a new variable

I have a data.frame like this
test <- data.frame(plot = c(1, 1, 2, 2, 3, 3), sort = c(10, 20, 11, 12, 15, 20))
I want to create a new variable called treat that will be "A" if any sort in the plot is 20. Otherwise it should be "B".
The expected output is
data.frame(plot = c(1, 1, 2, 2, 3, 3), sort = c(10, 20, 11, 12, 15, 20), treat = c("A", "A", "B", "B", "A", "A"))
We can use ave to group by the plot variable, check whether any sort value in the group equals 20, and assign the treatment accordingly:
test$treat<-ave(test$sort,test$plot,FUN =function(x) ifelse(any(x ==20),"A","B"))
test
# plot sort treat
#1 1 10 A
#2 1 20 A
#3 2 11 B
#4 2 12 B
#5 3 15 A
#6 3 20 A
Similarly, with dplyr:
library(dplyr)
test %>%
group_by(plot) %>%
mutate(treat = ifelse(any(sort == 20), "A", "B"))
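A further option, sketched here without any grouping function, is to collect the plots that contain a 20 and test membership with %in%. The name plots_with_20 is made up for this illustration.

```r
test <- data.frame(plot = c(1, 1, 2, 2, 3, 3), sort = c(10, 20, 11, 12, 15, 20))

# Plots in which at least one observation has sort == 20
plots_with_20 <- unique(test$plot[test$sort == 20])

# Every row of such a plot gets "A", all other rows get "B"
test$treat <- ifelse(test$plot %in% plots_with_20, "A", "B")
test
```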
