New to R, taking a very accelerated class with very minimal instruction. So I apologize in advance if this is a rookie question.
The assignment I have is to take a specific column that has 21 levels from a dataframe, and condense them into 4 levels, using an if, or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
> b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch all for the remaining variables.
the code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
If your new levels are just the first two letters of the old ones followed by _type you can easily achieve what you want through:
#prototype of your column
mycol<-factor(sample(c("aflb","afub","afwb","afws","bfrlb","bfrwb","bfrws","lb","lbwb","lbws","wslb","wsub"), replace=TRUE, size=100))
as.factor(paste(sep="",substr(mycol,1,2),"_type"))
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify a huge amount of this. Basically, I should have made a new column composed of the first two letters of the variables in LANDFORM, and then sample from that new column and replace values in LANDFORM, in order to make the ifelse() statement much shorter. The code is:
> b2$index=as.factor(substring(b2$LANDFORM,1,2))
b2$LANDFORM=ifelse(b2$index=="af","af_type",
ifelse(b2$index=="bf","bf_type",
ifelse(b2$index=="lb","lb_type",
ifelse(b2$index=="wb","wb_type",
ifelse(b2$index=="ws","ws_type","ub_type")))))
b2$LANDFORM=as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!
Related
how are you?
I have the next problem, that is very weird because the task it is very simple.
I want to filter one of my factor variables in R, but the outcome is an empty dataframe.
So my data frame is called "data_2022", if i execute this code:
sum(data_2022$CANALDEVENTA=="WEB")
The result is 2704800 that is the number of times that this filter is TRUE.
a= data_2022 %>% filter(CANALDEVENTA=="WEB")
This returns an empty data frame.
I know i am not an expert in R, but i have done the last thing a million times and i never had this error before.
Do you have a clue about whats the problem with this?
Sorry i did not make a reproducible example.
Already thank you.
you could use subset function:
a<-subset(data_2022, CANALDEVENTA=="WEB")
using tidyverse, make sure you are using the function from dplyr::filter. filter is looking for a logical expression but probably you apply it to a data.frame. Try this code too:
my_names<-c("WEB")
a<-dplyr::filter(data_2022, CANALDEVENTA %in% my_names)
Hope it works.
I have a small problem, which I don't think is too hard, but I couldn't find any answer here (maybe I phrased my research wrong so please excuse me if the question has already been asked!)
I am importing data from an excel sheet which is split in two columns as in the following picture:
Now, I am trying to import all the data in the second column to my R script, but by splitting it into different vectors: one vector for category A, one for category B, etc... by keeping the data points in the order they are in the file (because as it happens, they are in chronological order).
Now, the categories each have a different number of elements, however, they are ordered alphabetically (ie you'll never find an A in the B's, for example). So I guess that makes it easier, but I'm still a novice with R and I don't really know how to proceed without getting really messy with the code and I know there's probably a simple way of doing it.
Does anyone have an idea on how to treat this nicely please? :)
We can use split in base R to return a list of vectors of 'Data' based on the unique values in 'Category'
lst1 <- split(df1$Data, df1$Category)
My first question here and I am not very experienced, however I hope this question is easy enough to answer since I only want to know if what I describe in the title is possible.
I have multiple dataframes taken from online capacity tests participants did.
For all Items I have response, score, and durationvariables among others.
Now I want to delete rows where all responsevariables are NA. So I can't just use a command to delete rows with where all is NA but there are also to many columns to do it by hand. And I also want to keep the dataframe together while doing it in order to really drop the complete rows, so just extracting all responsevariables doesn't sound like a good option.
However, besides a 3digit number based on the specific items the responsevariablenames are basically the same.
So instead of writing a very long impractical line mentioning all responsevariables and to drop the row if they all contain NA is there a way to not use the full anme of a variable but only use the end of the name for example so R checks the condition for all variables ending that way?
simplified e.g: instead of
newdf <- olddf[!(olddf$item123response != NA & olddf$item131response != NA & etc),]
Can I just do something like newdf <- olddf[!(olddf$xxxresponse != NA),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!
Try This
newdf <- olddf[complete.cases(olddf[, grep('response', names(olddf))]), ]
I've got this for loop:
for(i in 1:length(class.data$ID)) {
class.data$FinalExam_GroupMCScore[i]=mc.data$PSYC.260.Exam....2017.3.
[which(mc.data$SIS.User.ID == class.data$FinalExam_MCGroupNumber[i])]
}
To merge two class grade files. Students did a part of their final exam in groups. The problem I'm having is that not everyone opted to do the group portion so they are missing a code for class.data$FinalExam_MCGroupNumber. The for loop gets hung up on these missing values and I can't get past. I suspect I need a an if statement embedded in there but I'm not familiar enough with R yet to write one in.
I've looked at some of the other posts on this and they don't help just because I'm having a tough time seeing how to embed an if or ifelse with a more complicated function following. Any help would be appreciated! I just want it to assign an NA on FinalExam_GroupMCScore to all students with NA on FinalExam_MCGroupNumber and carry on as normal!
Thank you!!
I am new to 'R' and 'Stackoverflow' so forgive me for the incredibly basic question. I'm trying to find the 'index' of the first female in my dataset.
Code Snapshot
My overall dataset is called 'bike', so first I thought it would be a good idea to assign a new vector of just the genders...
bike$genders
Then I tried using the function:
match(1, genders)
match(F, genders)
Neither of which worked! I know this is and should be relatively simple but I'm just starting out so I really appreciate your help.
Probably the most direct method would be to use
match("F", bike[,"genders"] which will return the index of the first match.
If you want to know the rows#, this should give you the rows, with their numbers printed to the screen, and you will see the index for rows with it.
bike[bike$gender=="F",]
and if you only want the row numbers to set to a vector
rnam<-row.names(bike[bike$gender=="F",])