Unique function leaves out column - r

Say I use R's unique() function in a script to pull particular columns out of an existing data frame and make a new one:
SUPSCIARIDS<-unique(SuperiorSciarids[,c(36,2,3,4:34)])
36-LOGID
2-Decay
3-Diameter
4:34 are the species
Why do you think the new data frame does not show column 2?

I've found the answer: that annoying dummy column "row.names" that often appears before column one.
I was counting row.names as the first column in my matrix, so if I add 1 to the beginning of the column query it brings up the Decay variable in my unique table. Bingo! And how embarrassing.
SUPSCIARIDS<-unique(SuperiorSciarids[,c(35,1,2,3,4:34)])
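One way to sidestep position miscounts like this is to select by column name rather than by number. A minimal sketch, assuming a toy stand-in for SuperiorSciarids (the values and the single species column sp1 are made up for illustration):

```r
# Toy stand-in for the real data; values and the sp1 column are hypothetical
SuperiorSciarids <- data.frame(
  Decay    = c(1, 1, 2),
  Diameter = c(10, 10, 12),
  sp1      = c(0, 0, 3),
  LOGID    = c("L1", "L1", "L2")
)

# Check where a column really sits; row names are not counted as a column
decay_pos <- which(colnames(SuperiorSciarids) == "Decay")

# Select by name so the result does not depend on column positions
SUPSCIARIDS <- unique(SuperiorSciarids[, c("LOGID", "Decay", "Diameter", "sp1")])
```

Checking colnames() first also shows whether the position you have in mind matches what R actually stores.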

Related

Adding columns and putting values in a new row in R Script

I am trying to sum particular columns of a data frame and put these sums in a new row of the same data frame.
TCM <- colSums(df[3:16])  # this adds up all the values
Now, in the same data frame, I want the summed values in "TCM" to appear as a last row named Total.
The new row must be the same length as your other rows if you want to bind them. The code below assumes you want the first column to be "Total" and the second column to be NA. If you have different values in mind, simply modify the inputs for the respective values.
first_val = "Total"
second_val = NA
to_bind = c(first_val,second_val,TCM)
df = rbind(df,to_bind)
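One caveat with binding a plain vector: c() coerces "Total", NA and the numeric sums into a single character vector, so the numeric columns of df may end up as character after rbind(). A possible alternative, sketched on a made-up four-column data frame (the column names id, grp, v1 and v2 are assumptions, not from the question), is to bind a one-row data frame instead:

```r
# Hypothetical small data frame; the first two columns are labels, the rest numeric
df <- data.frame(
  id  = c("a", "b"),
  grp = c("x", "y"),
  v1  = c(1, 2),
  v2  = c(3, 4),
  stringsAsFactors = FALSE
)

# Column sums of the numeric columns (columns 3 and 4 in this toy example)
TCM <- colSums(df[3:4])

# Build the total row as a one-row data frame so each column keeps its own type
total_row <- data.frame(
  id  = "Total",
  grp = NA_character_,
  as.list(TCM),
  stringsAsFactors = FALSE
)

df <- rbind(df, total_row)
```

Because total_row has the same column names as df, rbind() matches the columns by name and v1 and v2 stay numeric.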

Is there an R method to select the columns of a data frame that are listed in a separate array

I have a data frame with over 100 columns. After applying certain conditions, I need a subset of the data frame containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
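Try this (all_of() selects exactly the listed names, whereas contains() would also match partial names):
library(dplyr)
iris_subset <- iris %>% select(all_of(dataframe_with_names$names))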
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
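The bracket approach can be sketched end to end as follows. This is an illustration only: myData and lookup are made-up stand-ins for the real data frame and the two-column array, and the sketch assumes the array is stored as a data frame with a names column.

```r
# Hypothetical data frame with more columns than we want to keep
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Hypothetical lookup table: variable names plus an associated value
lookup <- data.frame(
  names = c("a", "c"),
  value = c(0.5, 0.9),
  stringsAsFactors = FALSE
)

# as.character() guards against the names having been read in as a factor
wanted <- as.character(lookup$names)

# Names requested but not present in myData (empty here)
missing_cols <- setdiff(wanted, colnames(myData))

# drop = FALSE keeps the result a data frame even if only one column is selected
subset_df <- myData[, intersect(wanted, colnames(myData)), drop = FALSE]
```

Checking missing_cols before subsetting avoids an "undefined columns selected" error when a listed name is absent from the data.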

Assigning Unnamed Columns To Another DataFrame

I'm in a very basic class that introduces R for genetic purposes. I'm encountering a rather peculiar problem in trying to follow the instructions given. Here is what I have along with the instructor's notes:
MangrovesRaw<-read.csv("C:/Users/esteb/Documents/PopGen/MangrovesSites.csv")
#i'm going to make a new dataframe now, with one column more than the mangrovesraw dataframe but the same number of rows.
View(MangrovesRaw)
Mangroves<-data.frame(matrix(nrow = 528, ncol = 23))
#next I want you to name the first column of Mangroves "pop"
colnames(Mangroves)<-c(col1="pop")
#i'm now assigning all values of that column to be 1
Mangroves$pop<-1
#assign the rest of the columns (2 to 23) to the entirety of the MangrovesRaw dataframe
#then change the names to match the mangroves raw names
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
I'm not really sure how to assign columns that haven't been named using the $ as we have in the past. A friend suggested I first run
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw
#X338 is the name of the first column from MangrovesRaw
But while this does transfer the data from MangrovesRaw, it comes at the cost of having my column names messed up with X338. added to every subsequent column. In an attempt to modify this I found the following "fix"
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw[,2]
#Mangroves$X338<-MangrovesRaw[,2:22]
#MangrovesRaw has 22 columns in total
While this transferred all the data I needed for the X338 column, it didn't transfer any data for the remaining 21 columns. The commented-out line just results in the same problem of having X338. show up in all my column names.
What am I doing wrong?
There are a few ways to solve this problem. It may be that your instructor wants it done a certain way, but here's one simple solution: just cbind() the pop column with the real data, naming it in the call so that the first column is called pop rather than Mangroves$pop. Then the data and column names are already added.
Mangroves <- cbind(pop = Mangroves$pop, MangrovesRaw)
Here's another way:
Mangroves[, 2:23] <- MangrovesRaw
colnames(Mangroves)[2:23] <- colnames(MangrovesRaw)
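Here is a small self-contained sketch of the cbind() idea. The two-column raw data frame is a made-up stand-in for MangrovesRaw (the real file has 22 columns), so the names and values are assumptions:

```r
# Hypothetical stand-in for MangrovesRaw
raw <- data.frame(X338 = c(338, 340), X342 = c(342, 344))

# A named scalar is recycled to every row, and the raw columns keep their names
combined <- cbind(pop = 1, raw)

colnames(combined)
```

This skips the empty placeholder data frame entirely, which may or may not be what the instructor intends.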

field names repeated in output from loop to calculate new fields in R data frame

I'm using a for loop to create a set of new columns in an R data frame, but in the output the original columns are duplicated, with the data frame name added as a prefix, and the new columns also carry this prefix, which I don't want. I simply want the new output to be the same as the original data frame, but with a set of new columns containing the new calculations. How do I achieve this? Details below:
These are the columns of the original dataframe:
Area; SR_2005;SR_2006;SR_2007;SR_2008;xnull_SR_2005;xnull_SR_2006;xnull_SR_2007;xnull_SR_2008
I then wanted to add a series of new fields to this data frame, where each ‘SR’ column was divided by its corresponding ‘xnull_SR’ column (e.g. SR_2005 / xnull_SR_2005); each of these new fields would be prefixed with “p_” (e.g. “p_2005”). Here is the code I've used:
for (j in 2005:2019)
{field = paste("p_", j, sep = "")
restab1 <- within(restab1, restab1[[field]] <- get(paste("SR_",j, sep = ""))/ get(paste("xnull_SR_",j, sep = "")))
}
What I hoped for is that I would just get the original data fields with the new fields (“p_2005”, “p_2006” etc.) added. Instead I do get the new fields, but they are all prefixed with the name of the data frame (e.g. restab1.p_2005), and the original fields are repeated, once with just the field name (e.g. “SR_2005”) and once with the data frame prefix (e.g. “restab1.SR_2005”). Therefore, these are the field names in the changed data frame:
area SR_2005 SR_2006 SR_2007 SR_2008 xnull_SR_2005 xnull_SR_2006 xnull_SR_2007 xnull_SR_2008 restab1.area restab1.SR_2005 restab1.SR_2006 restab1.SR_2007 restab1.SR_2008 restab1.xnull_SR_2005 restab1.xnull_SR_2006 restab1.xnull_SR_2007 restab1.xnull_SR_2008 restab1.p_2005 restab1.p_2006 restab1.p_2007 restab1.p_2008
The calculations in the new fields (restab1.p_2005 restab1.p_2006 etc.) are correct but I just want the dataframe to contain the old and new field names once, and without the "restab1" prefix. How do I achieve this?
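The prefix appears because within() evaluates the assignment in a temporary environment: restab1[[field]] <- ... creates a modified copy of the whole data frame there, and within() then adds that copy back as a new column named restab1. Consider instead simple division across multiple columns, since all columns of a data frame have the same length.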
restab1[paste0("p_", 2005:2008)] <- restab1[paste0("SR_", 2005:2008)] / restab1[paste0("xnull_SR_", 2005:2008)]
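If a loop is preferred, one option is to drop within() and assign with [[ ]] directly. A minimal sketch on made-up values (only two years are included, and the numbers are illustrative, not from the question):

```r
# Hypothetical miniature version of restab1
restab1 <- data.frame(
  Area          = c("A", "B"),
  SR_2005       = c(10, 20),
  SR_2006       = c(30, 40),
  xnull_SR_2005 = c(5, 10),
  xnull_SR_2006 = c(10, 20)
)

for (j in 2005:2006) {
  field <- paste0("p_", j)
  # [[ ]] reads and writes columns by name, so no prefixed copy is created
  restab1[[field]] <- restab1[[paste0("SR_", j)]] / restab1[[paste0("xnull_SR_", j)]]
}

colnames(restab1)
```

The loop range has to match the years that actually exist as columns; otherwise the lookup returns NULL and the division yields an empty result.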

Removing rows causes "row.names" column to appear when displayed with View()

To remove rows from a data frame, I use the following command:
data <- data[-1, ]
for example to remove the first row. I need to remove the first 6 rows, so I used the following:
data <- data[-c(1,2,3,4,5,6), ]
OR
data <- data[-(1:6), ]
This works as far as removing the rows, but it introduces a new column called row.names that I cannot get rid of unless I use the command:
row.names(data) <- NULL
What is the reason for this? Is there a better way of removing a number of rows/columns with one command?
Example:
after the following code:
tquery <- tquery[-(1:6), ]
This is the data:
Although it seems as such, you are not actually adding a column to the data. What you are seeing is just a result of using View(). The function is showing the "row.names" attribute of the data frame as the first column, but you didn't really add the column.
This is expected and documented behavior. From the Details section of help(View)
If there are row names on the data frame that are not 1:nrow, they are displayed in a separate first column called row.names.
So since you subsetted the data, the row names are technically not 1:nrow any more and hence the new column is introduced in the viewer.
Print your data in the console and you'll see the difference.
View(mtcars) ## because the mtcars row names are not 1:nrow
versus
mtcars
Basically, don't trust View() to display an exact representation of the actual data. Instead use attributes(), *names(), dim(), length(), etc. or just peek at the data with head().
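To make this concrete, here is a small sketch with a made-up ten-row data frame (the name example_df and its contents are assumptions). It shows that subsetting changes the row names attribute but not the number of columns:

```r
# Hypothetical data frame with default row names 1..10
example_df <- data.frame(x = 1:10, y = letters[1:10])

# Drop the first six rows; the surviving rows keep their original labels
trimmed <- example_df[-(1:6), ]
names_after_subset <- rownames(trimmed)   # "7" "8" "9" "10"
cols_after_subset  <- ncol(trimmed)       # still 2: no column was added

# Resetting the row names renumbers them from 1
rownames(trimmed) <- NULL
names_after_reset <- rownames(trimmed)    # "1" "2" "3" "4"
```

After the reset the row names are 1:nrow again, so View() no longer shows the separate row.names column.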
See the R help via "?row.names" for more info. From the documentation: "All data frames have a row names attribute".
?row.names ## get more information about row.names from r help
row.names is not a new column, but rather an attribute of every data frame. It is simply metadata and is ignored by most functions. When you pass the data frame to a function, the row names will not interfere; when you write it to a file (e.g. CSV), note that write.csv() includes the row names by default unless you pass row.names = FALSE. This is similar to how Excel has row numbers in the left margin, which are reference information for the application rather than part of the data.
str(your_dataframe) ## see that those columns don't exist
colnames(your_dataframe) ## see column names
