Running a while loop and retrieving row names from dataset on R - r

What I'm trying to do right now is to get, from the mtcars dataset on R, the car names that are that have an hp column value of less than 200. The thing is that I want the car names and I don't want to go back and rematch all the numbers with the corresponding car names. How might I go about doing that? Here's what I have so far:
> carvector<-mtcars["hp"]
> while (i<=32)
+ i<-i+1
if (mtcars[i,1]<200)
p<-mtcars[i,1]
Its not quite working and I guess I'm closer to storing the values than the names of the cars. How can I make sure the names are not lost? How do I dispose of the values I don't want?

Related

How to make a histogram of a non numeric column in R?

I have a data frame called mydata with multiple columns, one of which is Benefits, which contains information about samples whether they are CB (complete responding), ICB (Intermediate) or NCB (Non-responding at all).
So basically the Benefit column is a vector with three values:
Benefit <- c("CB" , "ICB" , "NCB")
I want to make a histogram/barplot based on the number of each one of those. So basically it's not a numeric column. I tried solving this by the following code :
hist(as.numeric(metadata$Benefit))
tried also
barplot(metadata$Benefit)
didn't work obviously.
The second thing I want to do is to find a relation between the Age column of the same data frame and the Benefit column, like for example do the younger patients get more benefit ? Is there anyway to do that ?
THANKS!
Hi and welcome to the site :)
One nice way to find issues with code is to only run one command at the time.
# lets create some data
metadata <- data.frame(Benefit = c("ICB", "CB", "CB", "NCB"))
now the command 'as.numeric' does not work on character-data
as.numeric(metadata$Benefit) # returns NA's
Instead what we want is to count the number of instances of each unique value of the column Benefit, we do this with 'table'
tabledata <- table(metadata$Benefit)
and then it is the barplot function we want to create the plot
barplot(tabledata)

Assigning Unnamed Columns To Another DataFrame

I'm in a very basic class that introduces R for genetic purposes. I'm encountering a rather peculiar problem in trying to follow the instructions given. Here is what I have along with the instructor's notes:
MangrovesRaw<-read.csv("C:/Users/esteb/Documents/PopGen/MangrovesSites.csv")
#i'm going to make a new dataframe now, with one column more than the mangrovesraw dataframe but the same number of rows.
View(MangrovesRaw)
Mangroves<-data.frame(matrix(nrow = 528, ncol = 23))
#next I want you to name the first column of Mangroves "pop"
colnames(Mangroves)<-c(col1="pop")
#i'm now assigning all values of that column to be 1
Mangroves$pop<-1
#assign the rest of the columns (2 to 23) to the entirety of the MangrovesRaw dataframe
#then change the names to match the mangroves raw names
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
I'm not really sure how to assign columns that haven't been named used the $ as we have in the past. A friend suggested I first run
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw
#X338 is the name of the first column from MangrovesRaw
But while this does transfer the data from MangrovesRaw, it comes at the cost of having my column names messed up with X338. added to every subsequent column. In an attempt to modify this I found the following "fix"
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw[,2]
#Mangroves$X338<-MangrovesRaw[,2:22]
#MangrovesRaw has 22 columns in total
While this transferred all the data I needed for the X338 Column, it didn't transfer any data for the remaining 21 columns. The code in # just results in the same problem of having X388. show up in all my column names.
What am I doing wrong?
There are a few ways to solve this problem. It may be that your instructor wants it done a certain way, but here's one simple solution: just cbind() the Mangroves$pop column with the real data. Then the data and column names are already added.
Mangroves <- cbind(Mangroves$pop, MangrovesRaw)
Here's another way:
Mangroves[, 2:23] <- MangrovesRaw
colnames(Mangroves)[2:23] <- colnames(MangrovesRaw)

Indexing dataframes in R

good day
I don´t understand a topic here, is like it works but I can´t understand why
I have this database
# planets_df is pre-loaded in your workspace
# Use order() to create positions
positions <- order(planets_df$diameter)
positions
# Use positions to sort planets_df
planets_df[positions,]
I don´t understand why if u take the column diameter, then if u want to order it why u put it in a row of the dataframe like for me it should be [ rows, colum] but u put a column in a row and it changes, I really don´t get that.Why it´s not planets_df[,positions].
The exercise is solved I just don´t get it, is a data camp exercise btw.
Sorry if my English is wrong, it is not my native language.
I believe that I have created an example that matches your description. For the mtcars data set, which is pre-loaded in any R session, we can sort based on the variable mpg.
The function order returns the row indices sorted by mpg in this case. The ordering variable indicates the order that the rows should be presented in by storing the row indices based on mpg.
ordering <- order(mtcars$mpg)
This next step indicates that we want the rows of mtcars as specified by ordering. Essentially ordering is the order of the rows we want and so we pass that object to the row portion the call to mtcars.
mtcars[ordering,]
If we instead passed ordering as the columns, we would be reordering the columns of mtcars instead of the rows.

Removing rows causes "row.names" column to appear when displayed with View()

To remove rows from a data frame, I use the following command:
data <- data[-1, ]
for example to remove the first row. I need to remove the first 6 rows, so I used the following:
data <- data[-c(1,2,3,4,5,6), ]
OR
data <- data[-(1:6), ]
this works as far as removing the row names, but introduced a new column called row.names that I cannot get rid of unless I use the command:
row.names(data) <- NULL
What is the reason for this? Is there a better way of removing a number of rows/columns with one command?
Example:
after the following code:
tquery <- tquery[-(1:6), ]
This is the data:
Although it seems as such, you are not actually adding a column to the data. What you are seeing is just a result of using View(). The function is showing the "row.names" attribute of the data frame as the first column, but you didn't really add the column.
This is expected and documented behavior. From the Details section of help(View)
If there are row names on the data frame that are not 1:nrow, they are displayed in a separate first column called row.names.
So since you subsetted the data, the row names are technically not 1:nrow any more and hence the new column is introduced in the viewer.
Print your data in the console and you'll see the difference.
View(mtcars) ## because the mtcars row names are not 1:nrow
versus
mtcars
Basically, don't trust View() to display an exact representation of the actual data. Instead use attributes(), *names(), dim(), length(), etc. or just peek at the data with head().
See r help via "?row.names" for more info. From the documentation, "All data frames have a row names attribute"
?row.names ## get more information about row.names from r help
row.names is not a new column, but rather an attribute of every single data frame. This is simply meta data and is ignored by most data. When you output this data (i.e. CSV) or use it in a function, this data will not interfere. This is similar to how excel has row numbers on the left margin, which is referential data for the application.
str(your_dataframe) ## see that those columns don't exist
colnames(your_dataframe) ## see column names

Specifying names of columns to be used in a loop R

I have a df with over 30 columns and over 200 rows, but for simplicity will use an example with 8 columns.
X1<-c(sample(100,25))
B<-c(sample(4,25,replace=TRUE))
C<-c(sample(2,25,replace =TRUE))
Y1<-c(sample(100,25))
Y2<-c(sample(100,25))
Y3<-c(sample(100,25))
Y4<-c(sample(100,25))
Y5<-c(sample(100,25))
df<-cbind(X1,B,C,Y1,Y2,Y3,Y4,Y5)
df<-as.data.frame(df)
I wrote a function that melts the data generates a plot with X1 giving the x-axis values and faceted using the values in B and C.
plotdata<-function(l){
melt<-melt(df,id.vars=c("X1","B","C"),measure.vars=l)
plot<-ggplot(melt,aes(x=X1,y=value))+geom_point()
plot2<-plot+facet_grid(B ~ C)
ggsave(filename=paste("X_vs_",l,"_faceted.jpeg",sep=""),plot=plot2)
}
I can then manually input the required Y variable
plotdata("Y1")
I don't want to generate plots for all columns. I could just type the column of interest into plotdata and then get the result, but this seems quite inelegant (and time consuming). I would prefer to be able to manually specify the columns of interest e.g. "Y1","Y3","Y4" and then write a loop function to do all those specified.
However I am new to writing for loops and can't find a way to loop in the specific column names that are required for my function to work. A standard for(i in 1:length(df)) wouldn't be appropriate because I only want to loop the user specified columns
Apologies if there is an answer to this is already in stackoverflow. I couldn't find it if there was.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric

Resources