Return matching names instead of binary variables in R - r

I'm new here and diving into R, and I'm encountering a problem while trying to solve a knapsack problem.
For optimization purposes I wrote a dynamic program in R. I have reached the point of returning the selected items, which works, but I only get binary numbers saying whether each item has been selected or not (1 = yes). Like this:
Select
[1] 1 0 0 1
However, now I would like the Select function to return the names of values instead of these binary values. Underneath I created an example of what my problem looks like.
This would be the data and a related data frame.
items <- c("Glasses","gloves","shoes")
grams <- c(4,2,3)
value <- c(100,20,50)
data <- data.frame(items,grams,value)
Now, I created various functions, with the final one indicating whether a product has been selected by 1 (yes) or 0 (no), like above. However, I would really like it to return the name of each selected item. Is there a way to do this by linking back to the data frame I created?
So that, in case all products are selected, instead of
Select
[1] 1 1 1
it would say
Select
[1] Glasses gloves shoes
I believe I would have to create a new function. But as I mentioned, is there a good way to refer back to the data frame to take related values from another column in the data frame in case of a 1 (yes)?
I really hope my question is more clear now and someone can direct me in the right direction.
Best, Berber

Let's say your binary vector is
idx <- c(1, 0, 1)
Then
items[as.logical(idx)]
will give you the names of the selected items, and
items[!as.logical(idx)]
will give you the names of the unselected items.
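To tie this back to the data frame from the question, here is a minimal sketch. The Select vector below is a made-up stand-in for whatever your dynamic program returns; it is assumed to have one 0/1 entry per row of data.

```r
items <- c("Glasses", "gloves", "shoes")
grams <- c(4, 2, 3)
value <- c(100, 20, 50)
data <- data.frame(items, grams, value)

# Hypothetical solver output: one 0/1 flag per row of `data`
Select <- c(1, 0, 1)

# Names of the selected items (as.character() guards against factor columns)
selected <- as.character(data$items[Select == 1])
selected

# The same logical index also returns the full rows of the selected items
data[Select == 1, ]
```

Indexing the data frame rows rather than the bare items vector keeps the weights and values next to the names, which may be handy for checking the knapsack total.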

Related

How do I pull the values from multiple columns, conditionally, into a new column?

I am a relatively novice R user, though familiar with dplyr and the tidyverse. I still can't seem to figure out how to pull the actual data from one column into a new column when it meets a certain condition.
Here is what I'm trying to do. Participants have ranked specific practices (n=5) and provided responses to questions that represent their beliefs about these practices. I want to have five new columns that assign their beliefs about the practices to their ranks, rather than the practices.
For example, they have a score for "beliefs about NI" called ni.beliefs. If a participant ranked NI as their first choice, I want the value of ni.beliefs to be pulled into the new column first.beliefs. Likewise, if a participant put pmi as their first-choice practice, their value for pmi.beliefs should be pulled into the first.beliefs column.
So, I need five new columns called: first.beliefs, second.beliefs, third.beliefs, fourth.beliefs, last.beliefs and then I need each of these to have the data pulled in conditionally from the practice specific beliefs (ni.beliefs, dtt.beliefs, pmi.beliefs, sn.beliefs, script.beliefs) dependent on the practice specific ranks (rank assigned of 1-5 for each practice, rank.ni, rank.dtt, rank.pmi, rank.sn, rank.script).
Here is what I have so far but I am stuck and aware that this is not very close. Any help is appreciated!!!
Diss$first.beliefs <- ifelse(rank.ni==1, ni.beliefs,
                      ifelse(rank.dtt==1, dtt.beliefs,
                      ifelse(rank.pmi==1, pmi.beliefs,
                      ifelse(rank.sn, sn.beliefs,
                      ifelse(rank.script==1, script.beliefs)))))
Thank you!!
I'm not sure I understood correctly (it would help if you showed what your data looks like), but this is what I'm thinking:
Without using additional packages, if the ranking columns are equivalent to the index of the new columns you want (i.e. they rank each practice from 1 to 5, without repeats, and in the same order as the new columns "first belief, second belief, etc."), then you can use that data as the indices for the second set of columns:
for (j in 1:nrow(people_table)) {
  people_table[j,]$first.belief[[1]] <- names(beliefs)[(people_table[j,c(A:B)]) %in% 1]
  people_table[j,]$second.belief[[1]] <- names(beliefs)[(people_table[j,c(A:B)]) %in% 2]
  ...
}
Where
A -> index of the first preference rank column
B -> index of the last preference rank column
(people_table[j,c(A:B)] %in% 1) -> this returns something like (FALSE FALSE TRUE FALSE FALSE)
beliefs -> vector with the names of each belief
That should work. It's simple, no need for packages, and it'll be fast too. Just make sure you've initialized/created the new columns first, otherwise you'll get some errors.
This is done very easily with the case_when() function. You can improve on the code below.
library(dplyr)
Diss <- Diss %>%
  mutate(first.beliefs = case_when(
    rank.ni == 1 ~ ni.beliefs,
    rank.dtt == 1 ~ dtt.beliefs,
    rank.pmi == 1 ~ pmi.beliefs,
    rank.sn == 1 ~ sn.beliefs,       # each condition needs an explicit == 1
    rank.script == 1 ~ script.beliefs
  ))
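If you would rather fill all five columns at once in base R, matrix indexing is one option. This is only a sketch: the toy values are made up, and it assumes the rank columns are named rank.<practice> and the belief columns <practice>.beliefs, as in the question, with each participant using every rank from 1 to 5 exactly once.

```r
# Toy data: column names follow the question, values are invented
Diss <- data.frame(
  rank.ni = c(1, 2), rank.dtt = c(2, 1), rank.pmi = c(3, 3),
  rank.sn = c(4, 5), rank.script = c(5, 4),
  ni.beliefs = c(10, 20), dtt.beliefs = c(11, 21), pmi.beliefs = c(12, 22),
  sn.beliefs = c(13, 23), script.beliefs = c(14, 24)
)

practices <- c("ni", "dtt", "pmi", "sn", "script")
ranks     <- as.matrix(Diss[paste0("rank.", practices)])
beliefs   <- as.matrix(Diss[paste0(practices, ".beliefs")])
new_cols  <- c("first.beliefs", "second.beliefs", "third.beliefs",
               "fourth.beliefs", "last.beliefs")

for (k in seq_along(new_cols)) {
  # For each participant, the column position of the practice ranked k-th
  pos <- apply(ranks, 1, function(r) match(k, r))
  # Matrix indexing with (row, column) pairs picks one belief per participant
  Diss[[new_cols[k]]] <- beliefs[cbind(seq_len(nrow(Diss)), pos)]
}

Diss[new_cols]
```

The loop runs over the five ranks, not over the participants, so it stays cheap even for a large data set. A participant who did not assign rank k gets NA in the corresponding column.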

Looping through a dataset in R and count occurences of variables

I found an interesting data-set from a psychology study (data-set is called WearingTShirt), and I would like to replicate the results. I would need to summarize two variables into a single variable. This is what I have written:
Create empty variable
PinkAndRed = 0
Count instances of people wearing both pink and red and add 1
for i in WearingTShirt:
PinkAndRed+1 if:
WearingTShirt$PINKSHIRT==1 OR WearingTShirt$REDSHIRT==1
Add variable to dataset
WearingTShirt$PinkAndRed
I don't have much R experience (I have mostly written Python).
Your code is closer to Python than to R. The equivalent code in R for what you want to do is:
PinkAndRed = rep(0,dim(WearingTShirt)[1])
for(i in 1:dim(WearingTShirt)[1]){
if((WearingTShirt$PINKSHIRT[i]==1) || (WearingTShirt$REDSHIRT[i]==1))
{
PinkAndRed[i] = 1
}
}
WearingTShirt=cbind(WearingTShirt,PinkAndRed)
You need to review the basics of R. There are countless small differences between R and Python, such as the parentheses around loop headers and conditions, or how you set the length of a loop (in the code above, dim() returns the dimensions of the dataset, and [1] picks out the number of rows)...
Update:
Thanks to the comments I've realized that it is not clear whether you want a cumulative sum of the individuals with pink and red shirts, or a variable which is 1 when the shirt is pink or red and 0 otherwise.
The code above creates a variable that combines pink and red shirts into one indicator.
If you want the running total, use the cumsum function as suggested in the comments; sum gives the overall count.
I would not choose to loop, but:
WearingTShirt$PinkAndRed <- ifelse(WearingTShirt$PINKSHIRT==1 |
WearingTShirt$REDSHIRT==1,1,0)
PinkAndRed sounds more like PinkOrRed based on example given.
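A small sketch of both readings, on made-up data (only the column names PINKSHIRT and REDSHIRT come from the question; the name PinkOrRed is my own choice):

```r
# Toy data; the values are invented for illustration
WearingTShirt <- data.frame(PINKSHIRT = c(1, 0, 0, 1),
                            REDSHIRT  = c(0, 0, 1, 1))

# Vectorised comparison: one 0/1 flag per row, no explicit loop
WearingTShirt$PinkOrRed <- as.integer(WearingTShirt$PINKSHIRT == 1 |
                                      WearingTShirt$REDSHIRT == 1)

n_pink_or_red <- sum(WearingTShirt$PinkOrRed)     # overall count
running_total <- cumsum(WearingTShirt$PinkOrRed)  # running count, row by row
```

Note the single | here: it compares whole columns element by element, whereas || is meant for a single TRUE/FALSE inside if().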

Create variables from list in R

I am trying to create variables based on a list I have.
The list looks something like this:
color = c("blue","green","yellow")
Each string in the list will become a variable. The variable should take the values based on another column (for example, usercolorlist). Here is the pseudocode:
For each row in usercolorlist
if usercolorlist contains blue
then blue = 1
else 0
Ultimately, the output would be:
usercolorlist blue
"blue/red/green" 1
"red/green" 0
"blue/red" 1
I want to implement this as cleanly as possible. I mainly use python and have been told that for loops are not as efficient in R.
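One possible approach, as a sketch only: split each string on "/" and test membership for every colour. The data frame name df is an assumption, and so is the "/" separator, which I took from the example output above.

```r
color <- c("blue", "green", "yellow")

# Assumed layout: one character column holding "/"-separated colours
df <- data.frame(
  usercolorlist = c("blue/red/green", "red/green", "blue/red"),
  stringsAsFactors = FALSE
)

# Split every row once into its individual colours
tokens <- strsplit(df$usercolorlist, "/", fixed = TRUE)

# One new 0/1 column per colour; the loop is over the colours, not the rows
for (col in color) {
  df[[col]] <- as.integer(vapply(tokens, function(t) col %in% t, logical(1)))
}

df
```

Splitting first avoids partial matches: a plain substring search for "blue" would also flag a value such as "lightblue", which may or may not be what you want.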

R commands for finding mode in R seem to be wrong

I watched video on YouTube re finding mode in R from list of numerics. When I enter commands they do not work. R does not even give an error message. The vector is
X <- c(1,2,2,2,3,4,5,6,7,8,9)
Then instructor says use
temp <- table(as.vector(x))
to basically sort all unique values in list. R should give me from this command 1,2,3,4,5,6,7,8,9 but nothing happens except when the instructor does it this list is given. Then he says to use command,
names(temp)[temp--max(temp)]
which basically should give me this: 1,3,1,1,1,1,1,1,1 where 3 shows that the mode is 2 because it is repeated 3 times in list. I would like to stay with these commands as far as is possible as the instructor explains them in detail. Am I doing a typo or something?
You're kind of confused.
X <- c(1,2,2,2,3,4,5,6,7,8,9) ## define vector
temp <- table(as.vector(X))
to basically sort all unique values in list.
That's not exactly what this command does (sort(unique(X)) would give a sorted vector of the unique values; note that in R, lists and vectors are different kinds of objects, it's best not to use the words interchangeably). What table() does is to count the number of instances of each unique value (in sorted order); also, as.vector() is redundant.
R should give me from this command 1,2,3,4,5,6,7,8,9 but nothing happens except when the instructor does it this list is given.
If you assign results to a variable, R doesn't print anything. If you want to see the value of a variable, type the variable's name by itself:
temp
you should see
1 2 3 4 5 6 7 8 9
1 3 1 1 1 1 1 1 1
the first row is the labels (unique values), the second is the counts.
Then he says to use command, names(temp)[temp--max(temp)] which basically should give me this: 1,3,1,1,1,1,1,1,1 where 3 shows that the mode is 2 because it is repeated 3 times in list.
No. You already have the sequence of counts stored in temp. You should have typed
names(temp)[temp==max(temp)]
(note ==, not --) which should print
[1] "2"
i.e., this is the mode. The logic here is that temp==max(temp) gives you a logical vector (a vector of TRUE and FALSE values) that's only TRUE for the elements of temp that are equal to the maximum value; names(temp)[temp==max(temp)] selects the elements of the names vector (the first row shown in the printout of temp above) that correspond to TRUE values ...
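Putting the steps together, here is a small sketch. The helper name stat_mode is my own invention, not a built-in R function; it simply wraps the two commands discussed above and converts the result back to numbers.

```r
# Hypothetical helper: returns every value that occurs most often
stat_mode <- function(v) {
  counts <- table(v)                             # count each unique value
  as.numeric(names(counts)[counts == max(counts)])
}

X <- c(1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9)
stat_mode(X)                 # the single mode of X

# With ties, all of the most frequent values are returned
stat_mode(c(1, 1, 2, 2, 3))
```

The conversion with as.numeric() is needed because names() always returns character strings, which is why the output above shows "2" in quotes.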

for loop in R using if & print [closed]

Maybe I'm thinking too hard on this but I need to create a for loop & if statement to find the highest value in my data set. We also have to write a print statement that prints it out & the day. There's 93 rows & 4 columns in the initial matrix. Column 4 has the needed data. The days are in column 1.
I don't know programming at all. So far this is what I got:
I created a vector out of the column with the data:
only.data <- c(data[,4])
Here's my feeble attempt at a for & if statement:
for (counter in 1:93) {
if (only.data >= data[,4])
print (only.data)
}
How do I get it to spit out the highest value using this method? It prints the max value 93 times and that's not what I want. Do I need to create the only.data vector or can I use the original matrix? I also need to print out the corresponding date next to the highest value.
ps - I know I can use the max function which is much quicker but that's not the assignment.
It seems like you are cheating, so I won't post a full solution here; I'll only point you in the right direction.
data[,4] is already a vector and there is no reason whatsoever to use c() on it. There is also no need to save it in a new object only.data, although doing so can potentially make your loop faster, as it won't need to extract the column in each iteration.
The idea of a loop is that you use an index in it (you don't have to, but there is no real reason not to). That is why you specify the index in for(). Although you specified an index (counter), you haven't used it, so your loop prints only.data regardless of what else you are doing.
All your if does is check whether only.data >= only.data in every iteration (which is obviously unnecessary).
Calculating the maximum in a loop is not entirely obvious, because you compare a single value in each iteration, so you'll need some strategy. For example, you could create a dummy variable that is compared in each iteration against only.data[counter] to check whether it's bigger, and is replaced when it's not.
To illustrate my last point, consider a toy example
set.seed(1)
only.data <- sample(10,10)
only.data
#[1] 3 4 5 7 2 8 9 6 10 1
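You can see that the maximum value is in the 9th position. Now we will assign the first value of this vector to a dummy variable and use a for loop to find the maximum. (The permutation shown comes from the R version used at the time; since R 3.6.0, sample() uses a different default algorithm, so the same seed may give a different order.)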
dummy <- only.data[1]
dummy
## [1] 3
for (counter in only.data) {
if (counter > dummy) dummy <- counter
}
dummy
## [1] 10
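The same idea can also keep track of where the maximum was found, which is what you need in order to look up the matching day. This is a sketch on a toy vector; the values and day labels are made up, and in your case they would come from columns 4 and 1 of your matrix.

```r
values <- c(3, 4, 5, 7, 2, 8, 9, 6, 10, 1)   # toy stand-in for data[, 4]
days   <- paste("day", seq_along(values))    # toy stand-in for data[, 1]

best     <- values[1]   # largest value seen so far
best_pos <- 1           # position where it was seen

for (i in seq_along(values)) {
  if (values[i] > best) {
    best     <- values[i]
    best_pos <- i
  }
}

print(paste("Highest value", best, "on", days[best_pos]))
```

Looping over positions (seq_along) instead of over the values themselves is what makes the lookup in days possible.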
