R: Subsetting one column from a dataframe, KEEPING the column name

I'm learning R, and I'm modifying a small piece of code. How do I make a subset of a dataframe, which is a single column, that includes the column name?
This does not work, as it doesn't retain the column name.
Data1Subset <- Data1$Level
The code sample I'm modifying follows this up with
colnames(Data1)
Also, is.data.frame(Data1) is TRUE

I finally found this with Google
Data1Subset <- subset(Data1, select = "Level")

Try the code below
Data1Subset <- Data1["Level"]
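A minimal sketch of the difference between these forms. The contents of Data1 below are made up for illustration; only the column name Level comes from the question.

```r
# Hypothetical data frame standing in for Data1
Data1 <- data.frame(Level = c(3, 1, 2), Other = c("a", "b", "c"))

as_vector <- Data1$Level                     # plain vector, column name is lost
as_df1    <- Data1["Level"]                  # one-column data frame
as_df2    <- Data1[, "Level", drop = FALSE]  # same idea with matrix-style indexing
as_df3    <- subset(Data1, select = "Level") # the subset() form found via Google

is.data.frame(as_vector)  # FALSE
colnames(as_df1)          # "Level"
```

The `$` operator always extracts the column itself, which is why the name disappears; the other three forms keep a data frame around the column.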

Related

improving specific code efficiency - *base R* alternative to for() loop solution

Looking for a vectorized base R solution for my own edification. I'm assigning a value to a column in a data frame based on a value in another column in the data frame.
My solution creates a named vector of possible codes, looks up the code in the original column, subsets the named list by the value found, and assigns the resulting name to the new column. I'm sure there's a way to do exactly this using the named vector I created that doesn't need a for loop; is it some version of apply?
dplyr is great and useful and I'm not looking for a solution that uses it.
# reference vector for assigning more readable text to this table
tempAssessmentCodes <- setNames(c(600,301,302,601,303,304,602,305,306,603,307,308,604,309,310,605,311,312,606,699),
c("base","3m","6m","6m","9m","12m","12m","15m","18m","18m","21m","24m","24m","27m","30m","30m",
"33m","36m","36m","disch"))
for(i in 1:nrow(rawDisp)){
rawDisp$assessText[i] <- names(tempAssessmentCodes)[tempAssessmentCodes==rawDisp$assessment[i]]
}
The standard way is to use match():
rawDisp$assessText <- names(tempAssessmentCodes)[match(rawDisp$assessment, tempAssessmentCodes)]
For each element of x, match(x, y) returns the index of the first matching element in y (NA if there is none). We then use those indices to pick the corresponding names of tempAssessmentCodes.
Personally, I do it the opposite way: make a lookup vector whose names are the old codes and whose values are the new labels:
codes <- setNames(names(tempAssessmentCodes), tempAssessmentCodes)
Then simply select elements from the new codes using the names (old codes):
rawDisp$assessText <- codes[as.character(rawDisp$assessment)]
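Both approaches can be tried on a cut-down example. The shortened code vector and the rawDisp rows below are made up for illustration, and the names codes_by_name and lookup are hypothetical.

```r
# Small made-up stand-ins for the lookup vector and the data
codes_by_name <- setNames(c(600, 301, 302, 699), c("base", "3m", "6m", "disch"))
rawDisp <- data.frame(assessment = c(301, 699, 600, 302, 301))

# match() approach: position of each assessment value in the code vector
idx <- match(rawDisp$assessment, codes_by_name)
rawDisp$assessText <- names(codes_by_name)[idx]

# Reversed lookup: names are the old codes, values are the labels
lookup <- setNames(names(codes_by_name), codes_by_name)
rawDisp$assessText2 <- unname(lookup[as.character(rawDisp$assessment)])

rawDisp$assessText  # "3m" "disch" "base" "6m" "3m"
```

The unname() call is optional; it only strips the code names that the lookup would otherwise carry into the new column.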

R, Dataset without column names

Complete noob here, specially with R.
For a school project I have to work with a specific dataset which doesn't come with column names in the dataset itself, but there is a .txt file with extra information about the dataset, including the column names. The problem I'm having is that when I load the dataset, RStudio assumes that the first line of data is actually the column names. Initially I just replaced the names with colnames(), but by doing so I ended up ignoring/deleting the first line of data, and I'm sure that's not the right way of dealing with it.
How can I go about adding the correct column names without deleting the first line of data? (Preferably inside R due to school work requirements)
Thanks in advance!
When reading the data with read.table, use header = FALSE so that the first line is treated as data and default column names (V1, V2, ...) are assigned automatically
df1 <- read.table('file.txt', header = FALSE)
Then we can assign the preferred column names from the other .txt file
colnames(df1) <- scan('names.txt', what = '', quiet = TRUE)
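A self-contained sketch of the same steps. The file contents and column names are made up, and temporary files stand in for 'file.txt' and 'names.txt'.

```r
# Made-up data and names, written to temporary files for the demo
data_file  <- tempfile(fileext = ".txt")
names_file <- tempfile(fileext = ".txt")
writeLines(c("1 2.5 a", "2 3.5 b", "3 4.5 c"), data_file)
writeLines(c("id", "score", "group"), names_file)

# header = FALSE keeps the first line as data; columns start as V1, V2, V3
df1 <- read.table(data_file, header = FALSE)
colnames(df1) <- scan(names_file, what = "", quiet = TRUE)

nrow(df1)      # 3: the first line is kept as data
colnames(df1)  # "id" "score" "group"
```

read.table also accepts a col.names argument, so the names can be supplied in the same call if that reads better.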

Merge dataframes with unequal rows, and no matching column names R

I am trying to take df1 (a summary table), and merge it into df2 (master summary table).
This is a snapshot of df2, ignore the random 42, just the answer to the ultimate question.
This is an example of what df1, looks like.
Lastly, I have a vector called Dates. This matches the dates that are the column names for df2.
I am trying to cycle through 20 files and gather the summary statistics of each file. I then want to enter that data into df2 to be stored permanently. I only need to enter the Earned column.
I have tried to use merge but since they do not have shared column names, I am unable to.
My next attempt was to try this. But it gave an error, because of unequal row numbers.
df2[,paste(Dates[i])] <- cbind(df2,df1)
Then I thought that maybe if I specified the exact location, it might work.
df2[1:length(df1$Earned),Dates[i]] <- df1$Earned
But that gave an error: "New columns would leave holes after existing columns"
So then I thought of trying that again, but with cbind.
df2[1:length(df1$Earned),Dates[i]] <- cbind(df2, df1$Earned)
##This gave an error for differing row numbers
df2 <- cbind(df2[1:length(df1$Earned),Dates[i]],df1$earned)
## This "worked" but it replaced all of df2 with df1$earned, so I basically lost the rest of the master table
Any ideas would be greatly appreciated. Thank you.
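Something like this might work, assuming TreatyYear identifies the rows in both tables and the matching rows appear in the same order:
df2[df2$TreatyYear %in% df1$TreatyYear, as.character(Dates[i])] <- df1$Earned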
Example
df <- data.frame(matrix(NA,4,4))
df$X1 <- 1:4
df[df$X1 %in% c(1,2),c("X3","X4")] <- c(1,2)
The only solution I have found so far is to take df1$Earned as a vector, pad it with zeros to the number of rows in df2, and then insert the padded vector into df2 as the named column.
temp_values <- append(df1$Earned,rep(0,(length(df2$TreatyYear)-length(df1$TreatyYear))),after=length(df1$Earned))
df2[,paste(Dates[i])] <- temp_values
This works, but it is a roundabout and not very pleasant fix. Any better ideas would be appreciated.
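One possible alternative to padding, sketched under the assumption that TreatyYear is a key present in both tables (the thread suggests this but does not show the data). All values below, including the Dates entries, are made up. Rows without a match get NA rather than 0.

```r
# Made-up stand-ins: df2 is the master table, df1 a shorter summary
df2 <- data.frame(TreatyYear = 2010:2014)
df1 <- data.frame(TreatyYear = c(2011, 2013), Earned = c(150, 275))
Dates <- c("2020-01-31", "2020-02-29")  # hypothetical column names
i <- 1

# For each master row, find the matching summary row (NA if there is none)
idx <- match(df2$TreatyYear, df1$TreatyYear)

# Indexing by idx yields a vector as long as df2, so it fits as a new column
df2[[Dates[i]]] <- df1$Earned[idx]

df2
```

Because the values are aligned by key rather than by position, this does not depend on df1 and df2 listing their rows in the same order.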

How to access a column after subsetting data frame?

It has to be really simple but it looks like my mind is not working properly anymore.
So, what I would like to do is to store one of the columns from mtcars as a vector but after subsetting it. I need one line code for the subsetting and assigning a vector.
That's what I would like to achieve but with one line:
data <- mtcars[mtcars[,11]==4,]
vec <- data[,1]
Thx!
vec<-mtcars[mtcars[,11]==4,][,1]
The mtcars[,11]==4 would be the row index and by selecting the column index as '1', we get the first column with subset of rows based on the condition.
mtcars[mtcars[,11]==4, 1]
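A quick check that the one-line forms give the same result as the two-step version. In mtcars, column 11 is carb and column 1 is mpg, so the last line is the same selection written with names.

```r
# Two-step version from the question
data <- mtcars[mtcars[, 11] == 4, ]
vec_two_step <- data[, 1]

# One-line versions
vec_chained <- mtcars[mtcars[, 11] == 4, ][, 1]  # subset rows, then take column 1
vec_direct  <- mtcars[mtcars[, 11] == 4, 1]      # rows and column in one index
vec_named   <- mtcars$mpg[mtcars$carb == 4]      # same selection by column name

identical(vec_two_step, vec_direct)  # TRUE
```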

How can I use the row.names attribute to order the rows of my dataframe in R?

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
row.names class
564028 1
275747 1
601137 0
922930 1
481988 1
...
The row.names attribute tells me which row is which, before I did various operations that scrambled the order of the rows during the process. So far so good.
Now I would like get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.
Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.
The documentation implores me to:
use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.
but this leaves me with nothing but NULL.
My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?
None of the other solutions would actually work.
It should be:
# Assuming the data frame is called df
df[ order(as.numeric(row.names(df))), ]
Row names in R are character strings, so without as.numeric() the rows are sorted lexically: 1, 10, 11, ... and so on.
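A small illustration of lexical versus numeric ordering of row names. The data frame here is made up; only the use of row names as numeric identifiers follows the question.

```r
# Made-up data frame with scrambled numeric row names
df <- data.frame(class = c(1, 0, 1, 1),
                 other = c("a", "b", "c", "d"),
                 row.names = c("10", "2", "100", "1"))

lexical <- df[order(row.names(df)), ]              # sorts row names as text
numeric <- df[order(as.numeric(row.names(df))), ]  # sorts row names as numbers

row.names(lexical)  # "1" "10" "100" "2"
row.names(numeric)  # "1" "2" "10" "100"
```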
This worked for me:
new_df <- df[ order(row.names(df)), ]
If you have only one column in your dataframe, as in my case, you have to add drop = FALSE:
df[ order(rownames(df)), , drop = FALSE]
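A short sketch of why drop matters for a one-column data frame. The data below are made up.

```r
# Made-up one-column data frame
one_col <- data.frame(class = c(1, 0, 1), row.names = c("30", "10", "20"))

# Without drop = FALSE the single column collapses to a plain vector
dropped <- one_col[order(rownames(one_col)), ]

# With drop = FALSE the result stays a data frame and keeps its row names
kept <- one_col[order(rownames(one_col)), , drop = FALSE]

is.data.frame(dropped)  # FALSE
rownames(kept)          # "10" "20" "30"
```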
For completeness:
@BondedDust's answer works perfectly for the rownames attribute, but your example does not use the rownames attribute. The output provided in your question indicates use of a column named "row.names", which isn't the same thing (all listed in @BondedDust's comment). Here is the answer if you wish to sort by the "row.names" column in the example given in your question (there is another posting on this, located here). This answer assumes you are using a dataframe named "df", with one column named "row.names":
ordered.df <- df[order(df$row.names),] #this orders the df by the "row.names" column
Alternatively, to order by the first column (same thing if you're still using your example):
ordered.df <- df[order(df[,1]),] #this orders the df by the first column
Hope this is helpful!
Indexing with "[" matches a character vector against rownames(), but it returns the rows in the order of that vector, so this leaves the rows in their current order rather than sorting them:
df[ rownames(df) , ]
To sort, you need order():
df[ order(rownames(df)) , ]
Because row names are character strings, this sorts lexically: row names 1:100 come out as 1, 10, 100, 11, 12, ..., 2, 20, 21, .... Wrap the row names in as.numeric() if you need numeric order.
Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that contains the row names of df as well as its values with the following line of code:
ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])
new_df <- df[ order(row.names(df)), ]
or something similar won't work. After this statement, new_df does not have row names any more. I guess a better solution is to add the row names as a column, sort by that column, and then set it as the row names again.
You can simply sort your df like this (note that the row names are sorted as text):
df <- df[sort(rownames(df)),]
and then do what you want!
