Finding 'first occurrence' using Match Function in R

I am new to R and Stack Overflow, so forgive me for the incredibly basic question. I'm trying to find the index of the first female in my dataset.
My overall dataset is called 'bike', so first I thought it would be a good idea to assign a new vector of just the genders...
bike$genders
Then I tried using the function:
match(1, genders)
match(F, genders)
Neither of them worked! I know this should be relatively simple, but I'm just starting out, so I really appreciate your help.

Probably the most direct method is
match("F", bike[, "genders"])
which returns the index of the first match.

If you want to see the matching rows themselves, this prints them to the screen along with their row numbers:
bike[bike$genders == "F", ]
And if you only want the row numbers, stored in a vector:
rnam <- row.names(bike[bike$genders == "F", ])
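A minimal, self-contained sketch of the answer above, using a small made-up data frame in place of the real `bike` data (the rows are assumptions). It also shows why the original attempts failed: `F` is R's shorthand for the logical `FALSE`, so `match(F, ...)` searches for `FALSE` (coerced to the string "FALSE") rather than "F", and `match(1, ...)` searches for "1". A bare `genders` also only exists if it was assigned first, e.g. `genders <- bike$genders`.

```r
# Hypothetical stand-in for the asker's 'bike' data frame
bike <- data.frame(
  id      = 1:6,
  genders = c("M", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# F is FALSE; it is coerced to the string "FALSE", which never matches
bad <- match(F, bike$genders)

# Quote the value to get the index of the first "F"
first_female <- match("F", bike$genders)

# which() returns every matching position as an integer vector
all_females <- which(bike$genders == "F")

print(c(bad = bad, first = first_female))
print(all_females)
```

Note that `row.names()` returns character labels, whereas `which()` returns integer positions, which are usually more convenient for further indexing.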

Related

For and if loop in R

I am trying to do the following: I have two columns (let's say codeA and codeB) in a data frame A, and I want to compare these values to a column (codeC) of another data frame B. codeA and codeB are the same in most cases; if they are not, the code (A or B) that matches codeC should be written to a new column.
So far I have not managed to achieve this by combining if statements and for loops in R. Can someone help me?
Greatly appreciated!
I tried to code it using if statements and a for loop but did not get the result I needed.
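One vectorized sketch that avoids explicit loops, assuming that "matches codeC" means the code appears anywhere in `B$codeC` (the question does not say whether the rows of A and B line up). The data frames are made up for illustration, and the new column name `matched` is hypothetical.

```r
# Hypothetical data: the question does not show A or B
A <- data.frame(codeA = c("x1", "x2", "x3", "x4"),
                codeB = c("x1", "y2", "x3", "y4"),
                stringsAsFactors = FALSE)
B <- data.frame(codeC = c("x1", "y2", "x3", "x4"),
                stringsAsFactors = FALSE)

# Assumption: a code "matches" if it occurs anywhere in B$codeC.
# Where codeA and codeB agree, keep that code; otherwise keep whichever
# one is found in B$codeC, and NA if neither is.
A$matched <- ifelse(A$codeA == A$codeB, A$codeA,
             ifelse(A$codeA %in% B$codeC, A$codeA,
             ifelse(A$codeB %in% B$codeC, A$codeB, NA)))
print(A)
```

`ifelse()` works on whole columns at once, so the row-by-row `for`/`if` logic is expressed without a loop. If the rows of A and B are meant to be compared pairwise instead, replace `%in% B$codeC` with `== B$codeC`.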

Combine lapply and gsub to replace a list of values for another list of values

I am looking for a simpler way to search a data frame column for a vector of values and replace each of those values with a corresponding value from a separate vector. I can do this with a for loop, but it must be possible with the apply family; I'm just not seeing it yet. I'm very new to the apply family and could use help.
So far, I've only managed to replace all instances of the first value in my vector with the first value of the new vector; it isn't iterating past the first element. I hope this makes sense. Here is the code I have:
#standardize tank location
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")
sapply(df_growth$Tank,function(y) gsub(old_tank_list,new_tank_list,y))
Tank is the name of the column I am trying to replace all of these values within. I haven't assigned it back yet, because I want to test the functionality first. Thanks for any help you can offer.
Hopefully, this image will help: the column on the left is before my function is applied, and the column on the right is after. Basically, I just want to batch-change text values.
(Image: Before and After)
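With dplyr, recode() can take the whole lookup in one call:
library(dplyr)
# setNames() builds c("7.C.4" = "7.B.3-4", ...); !!! splices it into recode() as old = new pairs
df_growth %>%
  mutate(Tank = recode(Tank, !!!setNames(new_tank_list, old_tank_list)))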
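For a base R alternative without dplyr, here is a sketch using an exact lookup with `match()` (swapped in for the asker's `gsub()` approach, not taken from the answer). `gsub()` uses only the first element of a vector `pattern` and warns about it, which explains why only the first value was replaced; it also treats `.` as a regular-expression wildcard. The sample `Tank` values are hypothetical, and values not in `old_tank_list` are left unchanged.

```r
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")

# Hypothetical sample column; assumes Tank is character
# (if it is a factor, convert it first with as.character())
df_growth <- data.frame(Tank = c("7.C.4", "7.C.10", "7.A.1", "7.C.11"),
                        stringsAsFactors = FALSE)

# Position of each Tank value in old_tank_list (NA when it is not listed)
idx <- match(df_growth$Tank, old_tank_list)

# Replace listed values with their counterparts; keep everything else as-is
df_growth$Tank <- ifelse(is.na(idx), df_growth$Tank, new_tank_list[idx])
print(df_growth$Tank)
```

Because `match()` compares whole strings, "7.C.1" could never accidentally match inside "7.C.11", which a regex-based replacement can do.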

Assign a Value based on the numbers in separate columns in R

I think I roughly know the solution, but I don't know exactly how to go about it, so please bear with me.
I have a YouTube trends dataset, and I want to read the values from two columns (likes and dislikes) and, based on their contents, create an entry in a new column. If a video has more likes than dislikes, it should be labelled 'positive'; if it has more dislikes, it should be 'negative'.
I'm mainly unsure how to approach this because most previous questions are based on one column rather than two. I know some answers mention using cut, but would it work the same way here?
All help is appreciated, thanks.
You can use a simple ifelse:
df$new_col <- ifelse(df$likes > df$dislikes, 'positive', 'negative')
This can also be written without ifelse as:
df$new_col <- c('negative', 'positive')[as.integer(df$likes > df$dislikes) + 1]
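On the question about `cut`: `cut()` bins a single numeric vector, so it works once the two columns are reduced to one, for example their difference. A sketch with made-up numbers:

```r
# Hypothetical sample data; the real YouTube dataset is not shown
df <- data.frame(likes = c(120, 5, 40), dislikes = c(10, 30, 40))

# cut() needs one numeric vector, so use likes - dislikes.
# With the default right = TRUE the bins are (-Inf, 0] -> "negative"
# and (0, Inf] -> "positive", so ties land in "negative",
# just as in the two ifelse-style versions above.
df$new_col <- cut(df$likes - df$dislikes,
                  breaks = c(-Inf, 0, Inf),
                  labels = c("negative", "positive"))
print(df)
```

`cut()` returns a factor rather than a character vector; wrap it in `as.character()` if plain strings are needed.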
You can use Vectorize to create a vectorized version of a function. If your function func takes two arguments (likes and dislikes) and handles one row at a time, vfunc <- Vectorize(func) lets you call df$newcol <- vfunc(df$likes, df$dislikes), which returns the result for each row as a vector that is assigned to the new column.
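`func` in the answer above is a placeholder. Here is a sketch with a hypothetical scalar function `label_video()`: because it uses `if`/`else`, it can only handle one pair of values, and `Vectorize()` wraps it with `mapply()` so it runs once per row. For this particular task the vectorized `ifelse()` is simpler and typically faster; the `Vectorize()` pattern is mainly useful when the per-row logic cannot easily be vectorized.

```r
# Hypothetical scalar function: works on a single likes/dislikes pair
label_video <- function(likes, dislikes) {
  if (likes > dislikes) "positive" else "negative"
}

# Vectorize() returns a wrapper that calls label_video() element-wise
vlabel <- Vectorize(label_video)

# Hypothetical sample data
df <- data.frame(likes = c(120, 5), dislikes = c(10, 30))
df$newcol <- vlabel(df$likes, df$dislikes)
print(df)
```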

Extracting different vectors from a single column of data (in R)

I have a small problem that I don't think is too hard, but I couldn't find an answer here (maybe I phrased my search wrong, so please excuse me if this has already been asked!)
I am importing data from an Excel sheet that is split into two columns: a category and a data value.
I am trying to import all the data in the second column into my R script, split into separate vectors (one vector for category A, one for category B, etc.), while keeping the data points in the order they appear in the file, because they happen to be in chronological order.
The categories each have a different number of elements; however, they are sorted alphabetically (i.e. you'll never find an A among the B's). I guess that makes it easier, but I'm still a novice with R, and I don't know how to proceed without the code getting really messy. I suspect there's a simple way of doing it.
Does anyone have an idea how to handle this neatly? :)
We can use split in base R to return a list of vectors of 'Data' based on the unique values in 'Category'
lst1 <- split(df1$Data, df1$Category)
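A sketch with a made-up `df1` (the column names `Category` and `Data` follow the answer; reading the Excel file itself is not shown). `split()` keeps the original row order within each group, so the chronological order from the file is preserved. If separate variables are really needed, `list2env()` turns the list into vectors named after the categories, although working with the list directly is often more convenient.

```r
# Hypothetical stand-in for the imported sheet
df1 <- data.frame(Category = c("A", "A", "B", "B", "B"),
                  Data     = c(1.2, 3.4, 5.6, 7.8, 9.0),
                  stringsAsFactors = FALSE)

# One vector per category, in the original row order
lst1 <- split(df1$Data, df1$Category)
print(lst1)

# Optional: create variables A and B in the current environment
invisible(list2env(lst1, envir = environment()))
print(A)
```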

Using ifelse statement to condense variables

I'm new to R and taking a very accelerated class with minimal instruction, so I apologize in advance if this is a rookie question.
My assignment is to take a column with 21 levels from a data frame and condense them into 4 levels using an if or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch all for the remaining variables.
The code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
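The `NA` values are consistent with two typos in the nested call: `b2$LANDFORD` (instead of `LANDFORM`) and `bs$LANDFORM` (instead of `b2`). A column that does not exist is `NULL`, comparing `NULL` with a string gives `logical(0)`, and `ifelse()` with a zero-length test returns `logical(0)` without evaluating its `yes`/`no` arguments, which is also why the undefined `bs` never raised an error. The enclosing `ifelse()` then recycles that empty result into `NA` for every row not already matched. A minimal reproduction with a made-up `b2`:

```r
# Minimal hypothetical data frame reproducing the symptom
b2 <- data.frame(LANDFORM = factor(c("af", "aflb", "bfr", "lb", "wsub")))

# The misspelled column is NULL, so this comparison is zero-length
empty_test <- b2$LANDFORD == "afwb"
print(empty_test)

# ifelse(logical(0), ...) returns logical(0); the outer ifelse()
# recycles it to NA for every row where the first test was FALSE
out <- ifelse(b2$LANDFORM == "af", "af_type",
       ifelse(b2$LANDFORD == "afwb", "af_type", "ws_type"))
print(out)
```

Correcting both names to `b2$LANDFORM` should let the original nested `ifelse()` produce the four intended types.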
If your new levels are just the first two letters of the old ones followed by _type, you can easily achieve what you want with:
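# prototype of your column
mycol <- factor(sample(c("aflb", "afub", "afwb", "afws", "bfrlb", "bfrwb", "bfrws",
                         "lb", "lbwb", "lbws", "wslb", "wsub"),
                       replace = TRUE, size = 100))
# substr() converts the factor to character; paste0() appends "_type" to the first two letters
as.factor(paste0(substr(mycol, 1, 2), "_type"))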
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify this considerably. Basically, I should have made a new column containing the first two letters of each value in LANDFORM and then used that column to replace the values in LANDFORM, which makes the ifelse() statement much shorter. The code is:
b2$index = as.factor(substring(b2$LANDFORM, 1, 2))
b2$LANDFORM = ifelse(b2$index == "af", "af_type",
              ifelse(b2$index == "bf", "bf_type",
              ifelse(b2$index == "lb", "lb_type",
              ifelse(b2$index == "wb", "wb_type",
              ifelse(b2$index == "ws", "ws_type", "ub_type")))))
b2$LANDFORM = as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!
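An alternative sketch that keeps the explicit code lists from the question but replaces the long chain of equality tests with `%in%`, so each group needs only one `ifelse()`. The sample `b2` is made up, and the group vectors copy the codes listed in the question; any code not listed falls into `ws_type`, as in the original.

```r
# Hypothetical sample data
b2 <- data.frame(LANDFORM = factor(c("af", "afwb", "bfrws", "lbaf", "wsub")))

# Work on character values so comparisons do not depend on factor levels
lf <- as.character(b2$LANDFORM)

# Code groups copied from the question
af_codes <- c("af", "aflb", "afub", "afwb", "afws")
bf_codes <- c("bfr", "bfrlb", "bfrwb", "bfrwbws", "bfrws")
lb_codes <- c("lb", "lbaf", "lbub", "lbwb")

# One membership test per group; everything else becomes "ws_type"
b2$LANDFORM <- factor(ifelse(lf %in% af_codes, "af_type",
                      ifelse(lf %in% bf_codes, "bf_type",
                      ifelse(lf %in% lb_codes, "lb_type", "ws_type"))))
print(table(b2$LANDFORM))
```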
