I seem to be stuck on a really simple problem. I just cannot figure it out, nor can I find an answer here. I searched Stack Overflow for almost an hour.
I want to find rows based on one column (direction "backward") and then multiply the values in another column (amount) for those rows by -1, or by any number for that matter.
amount direction
1 forward
2 forward
3 forward
4 forward
1 backward
2 backward
3 backward
So that I would get
amount direction
1 forward
2 forward
3 forward
4 forward
-1 backward
-2 backward
-3 backward
I know how to find the rows: df[grep("backward",df$direction),]
or how to multiply in general: df[,1]=df[,1]*(-1)
but I cannot put it together. I can pull out the rows I need, multiply them, and then rbind or cbind, but if I have a really big df with many columns and rows I don't want to start pasting it all together again. I just want to change something in one column based on another column.
I managed something like this but it does not want to multiply :
df$amount[df$direction %in% c("backward")] <- (*(-1))
df$amount[grep("backward",df$direction)]<-(*(-1))
I always get the same error:
Error: unexpected '*' in "df$amount[grep("backward",df$direction)]<-*"
And I'm really sorry if this question exists already somewhere. I did find lots of similar questions but they did not help me out.
Thank you!
So, as alexis said, the answer is:
df$amount [grep ("backward", df$direction)] <- df$amount [grep ("backward", df$direction)]* (-1)
OR
df$amount [df$direction %in% c("backward")] <- df$amount [df$direction %in% c("backward")]* (-1)
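A minimal sketch of the same idea on the example data from the question, computing the index once and reusing it on both sides of the assignment. The name is_backward is only an illustrative choice and does not appear in the original answer.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  amount = c(1, 2, 3, 4, 1, 2, 3),
  direction = c(rep("forward", 4), rep("backward", 3)),
  stringsAsFactors = FALSE
)

# Logical index of the rows to change (hypothetical variable name)
is_backward <- df$direction == "backward"

# Flip the sign only in the rows where the index is TRUE
df$amount[is_backward] <- df$amount[is_backward] * -1

df$amount
```

Note that == matches the whole string exactly, whereas grep("backward", ...) matches any value that merely contains "backward", so the two can differ on messier data.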
I have a dataset called college, and one of the columns is 'accepted'. There are two values for this column: 1 (which means the student was accepted) and 0 (which means the student was not accepted). I want to find the percentage of accepted students.
I did this...
table(college$accepted)
which gave me the frequency of 1 and 0. (1 = 44,224 and 0 = 75,166). I then manually added those two values together (119,390) and divided the 44,224/119,390. This is fine and gets me the value I was looking for. But I would really like to know how I could do this with R code, since I'm sure there is a way to do it that I just haven't thought of.
Thanks!
Perhaps you can use prop.table like below
prop.table(table(college$accepted))["1"]
If it's a simple 0/1 column then you only need to take the column mean.
mean_accepted <- mean(college$accepted)
You could first sum the column, and then divide by the total number of entries in the column
sum(college$accepted)/length(college$accepted)
To make the code more explicit and describe your intent better, I suggest using a condition to identify the cases that meet your criteria for inclusion. For example:
college$accepted == 1
Then take the average of the logical vector to compute the proportion (between 0 and 1), multiply by 100 to make it a percentage.
100 * mean(college$accepted == 1, na.rm = TRUE)
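As a quick check that the suggestions above agree, here is a small sketch on a made-up stand-in for the college data (the eight values below are invented for illustration, not the asker's data).

```r
# Toy stand-in for the college data: 3 accepted out of 8
college <- data.frame(accepted = c(1, 0, 0, 1, 0, 1, 0, 0))

# Four ways of computing the accepted proportion
p_table <- as.numeric(prop.table(table(college$accepted))["1"])
p_mean  <- mean(college$accepted)
p_ratio <- sum(college$accepted) / length(college$accepted)
p_cond  <- mean(college$accepted == 1, na.rm = TRUE)

# All four give 3/8 = 0.375; multiply by 100 for a percentage
c(p_table, p_mean, p_ratio, p_cond)
100 * p_cond
```

The versions differ once the column contains NA: only the last one drops missing values explicitly through na.rm = TRUE.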
I want to concatenate two columns, where one column is numeric and the other is a character column holding a quality sign (+/-). Below is an example:
test <- data.frame(cbind(c(4,-5,6),c("-","-","-")),stringsAsFactors = F)
test$X1 <- paste0(test$X2,test$X1)
test$X1 <- as.numeric(test$X1)
As we can see, the output contains NAs introduced by coercion.
Can anyone please give a hint on how to solve this, for example by adding a condition during concatenation? Thanks.
The real problem in your code is that for row 2 you get a string like --5 (note the two minus signs), which as.numeric cannot parse. There are plenty of options, as the comments showed you; yet another one would be
as.numeric(paste0(test$X2, 1)) * as.numeric(test$X1)
# [1] -4 5 -6
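To make the failure and the fix visible side by side, here is a small sketch. It builds test with a numeric X1 directly (an assumption for readability; the question's cbind version stores X1 as character, which is why the answer wraps it in as.numeric). The names pasted, sign_factor and result are illustrative only.

```r
test <- data.frame(X1 = c(4, -5, 6), X2 = c("-", "-", "-"),
                   stringsAsFactors = FALSE)

# Pasting the sign in front of an already negative number doubles the minus
pasted <- paste0(test$X2, test$X1)
pasted                                  # "-4" "--5" "-6"
suppressWarnings(as.numeric(pasted))    # the middle element becomes NA

# Instead, turn each sign into +1 or -1 and multiply
sign_factor <- as.numeric(paste0(test$X2, 1))
result <- sign_factor * test$X1
result                                  # -4 5 -6
```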
Is there a simple way of finding all the combinations of 6 digits using only 0, 1 and 2?
So it starts like 000000 and finishes 222222
I have looked online, but all I can find is the formula for how many there are, and I need a list of all of them.
If there is code in R for this, that would be even better.
It is not completely necessary, but is there also a way to create a list where the 1st and 4th digits sum to a maximum of 2, the 2nd and 5th digits sum to a maximum of 2, and the 3rd and 6th digits sum to a maximum of 2?
Thank you
You can do:
do.call(paste0, expand.grid(rep(list(0:2), 6)))
Adding a rev in there gives a different order that might feel more natural:
do.call(paste0, rev(expand.grid(rep(list(0:2), 6))))
I will only give you a hint for your new (added) question as I am now worried I might be doing your homework. expand.grid returns a data.frame. With a little work on it, you can probably extract the subset of rows that only matter to you.
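One possible way to follow that hint is sketched below; it is not the answerer's own code. It relies on expand.grid naming the columns of an unnamed list Var1 to Var6, and the names grid, keep and valid are made up for illustration.

```r
# All 3^6 = 729 combinations, one digit per column (Var1 ... Var6)
grid <- expand.grid(rep(list(0:2), 6))

# Keep rows where each digit pair (1st/4th, 2nd/5th, 3rd/6th) sums to at most 2
keep <- grid$Var1 + grid$Var4 <= 2 &
        grid$Var2 + grid$Var5 <= 2 &
        grid$Var3 + grid$Var6 <= 2

# Paste the remaining rows into 6-character strings (Var1 is the 1st digit)
valid <- do.call(paste0, grid[keep, ])
length(valid)   # 216
```

Each pair has 6 allowed combinations out of 9, which is where 6^3 = 216 comes from.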
Maybe I'm thinking too hard on this but I need to create a for loop & if statement to find the highest value in my data set. We also have to write a print statement that prints it out & the day. There's 93 rows & 4 columns in the initial matrix. Column 4 has the needed data. The days are in column 1.
I don't know programming at all. So far this is what I got:
I created a vector out of the column with the data:
only.data <- c(data[,4])
Here's my feeble attempt at a for & if statement:
for (counter in 1:93) {
if (only.data >= data[,4])
print (only.data)
}
How do I get it to spit out the highest value using this method? It prints the max value 93 times and that's not what I want. Do I need to create the only.data vector or can I use the original matrix? I also need to print out the corresponding date next to the highest value.
ps - I know I can use the max function which is much quicker but that's not the assignment.
It seems like you are cheating, thus I won't post a full solution here, but only point you in the right direction
data[,4] is already a vector and there is no reason whatsoever to use c() on it. There is also no reason to save it in a new object only.data, although it potentially can make your loop faster as it won't need to index in each loop.
The idea of a loop is that you will use an index in it (although you don't have to, but there is no real reason not to). Thus, you are specifying the index in for(). Although you specified an index (counter), you haven't used it, thus your loop prints only.data regardless of anything you are doing.
All your if is doing is checking whether only.data >= only.data in every iteration (which is obviously unnecessary)
Calculating the maximum in a loop is not such an obvious thing, as you are comparing a single value in each iteration, so you'll need some strategy. For example, you could create a dummy variable which is compared in each iteration against only.data[counter] to check if it's bigger, and is then replaced in case it's not
To illustrate my last point, consider a toy example
set.seed(1)
only.data <- sample(10,10)
only.data
#[1] 3 4 5 7 2 8 9 6 10 1
You can see that the maximum value is in the 9th position, now we will assign the first value of this vector to a dummy variable and will try to use a for loop in order to find the maximum
dummy <- only.data[1]
dummy
## [1] 3
for (counter in only.data) {
if (counter > dummy) dummy <- counter
}
dummy
## [1] 10
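The question also asks for the day that goes with the highest value. One way to get it, sketched here under assumptions, is to remember the row index instead of the value itself. The matrix below is a made-up stand-in for the real 93 x 4 data (day in column 1, values in column 4), and best.row is a hypothetical name.

```r
# Made-up stand-in: day in column 1, the values of interest in column 4
data <- cbind(day = 1:5, a = 0, b = 0, value = c(12, 47, 31, 47, 8))

best.row <- 1   # row of the largest value seen so far
for (counter in 2:nrow(data)) {
  # Replace the remembered row only when a strictly larger value turns up
  if (data[counter, 4] > data[best.row, 4]) {
    best.row <- counter
  }
}

print(paste("Highest value", data[best.row, 4], "on day", data[best.row, 1]))
```

Because the comparison is strict, ties keep the earliest day; this sketch also assumes the matrix has at least two rows.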
I have a dataframe, and I want to confirm that two columns match for each entry. So I tried:
> nrow(subset(df, col.a!=col.b))
[1] 0
That seemed good to me, but then I tried to compare how many matches there were to the total number of entries in the data frame. It seems like these numbers should be equal but they are not:
nrow(subset(df, col.a==col.b))
[1] 3443
nrow(df)
[1] 3453
Any idea what is going on here? Why does it look like the subset dropped 10 entries? Thanks so much for your help.
Also, I'm fairly new to this, so please let me know if there is a better way of checking if the two columns match.
subset automatically drops rows where the criterion is NA. It should always (?) be the case that
nrow(d)
and
nrow(subset(d, col.a!=col.b))+
nrow(subset(d, col.a==col.b))+
nrow(subset(d, is.na(col.a) | is.na(col.b)))
should be equal.
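A small sketch of that bookkeeping on invented data, with one NA in each column. The data frame d and the counter names are illustrative only.

```r
# Five rows: two matches, one mismatch, two rows with a missing value
d <- data.frame(col.a = c(1, 2, NA, 4, 5),
                col.b = c(1, 2, 3, NA, 6))

n_diff <- nrow(subset(d, col.a != col.b))                # 1
n_same <- nrow(subset(d, col.a == col.b))                # 2
n_na   <- nrow(subset(d, is.na(col.a) | is.na(col.b)))   # 2

# The three groups together account for every row
n_diff + n_same + n_na == nrow(d)   # TRUE
```

In the question's data, the 10 missing entries would then be rows where at least one of the two columns is NA.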