Create a vector containing row IDs of a data frame in R

I am using one of R's built-in datasets called USArrests. Instead of a numeric ID, each row has a state name as its row ID. How do I create a vector containing all of these state names?
I would generally use myvec <- c(USArrests$colname), but I am not sure how to access the states because they are not stored as a normal column.

data("USArrests")
head(USArrests)
vector_of_names <- rownames(USArrests)
## if you want to append the names to the data frame as a column
USArrests$state_name <- rownames(USArrests)
USArrests
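As a quick check, here is a minimal sketch confirming what rownames() returns for this dataset. The variable name state_names is only illustrative.

```r
data("USArrests")

# rownames() returns a character vector with one entry per row
state_names <- rownames(USArrests)

class(state_names)    # "character"
length(state_names)   # 50
head(state_names, 3)  # "Alabama" "Alaska" "Arizona"
```

Because the result is an ordinary character vector, it can be subset or passed to other functions like any other vector.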

Related

Data extraction in R - multiple columns

Hello, I have a table consisting of a single row and several columns. I tried some code to extract my KD_PL parameters, without success. Is there a way in R to extract all the KD_PL columns and store them in a vector or data frame?
I tried this:
KDPL <- select("KD_PL.", which(substr(colnames(max_LnData), start=1, stop=6)))
This should do the trick:
library(tidyverse)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
This function selects all columns from your old dataset starting with "KD_PL." and stores them in a new dataframe KDPL.
If you only want the names of the columns to be saved, you could use the following:
KDPL_names <- colnames(KDPL)
This saves the column names in the vector KDPL_names.
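If loading the tidyverse is not an option, the same selection could be sketched in base R with startsWith(). The table below is a made-up stand-in for max_LnData, and the names is_kdpl and KDPL_vec are hypothetical.

```r
# Hypothetical one-row table standing in for max_LnData
max_LnData <- data.frame(KD_PL.1 = 0.12, KD_PL.2 = 0.34, Other = 9.9)

# Logical index of the columns whose names begin with "KD_PL."
is_kdpl <- startsWith(colnames(max_LnData), "KD_PL.")

# drop = FALSE keeps a data frame even if only one column matches
KDPL <- max_LnData[, is_kdpl, drop = FALSE]

# Flatten the single row into a named numeric vector
KDPL_vec <- unlist(KDPL[1, ])
```

KDPL is a data frame with the matching columns, while KDPL_vec holds the same values as a named vector, which covers both forms the question asks about.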

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a high number of single objects each one containing a mean value for a year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this by using a loop or anything else?
I think this is an easy way to do it, provided all those single objects are in your current environment.
First we create a character vector of the variable names using the paste0 function:
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply with get to retrieve the values of all these variables as a list.
Using do.call and rbind, we bind these values into a single-column matrix and finally create the data frame you asked for:
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
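A slightly shorter variant of the same idea uses mget(), which looks up several objects by name in one call. This is a minimal sketch with three made-up values standing in for the real cddmean objects, and it assumes each object holds a single number.

```r
# Toy stand-ins for the real objects (values are made up)
cddmean1950 <- 10.5
cddmean1951 <- 11.2
cddmean1952 <- 9.8

YearRange <- 1950:1952
ObjectName <- paste0("cddmean", YearRange)

# mget() returns a named list of the objects;
# unlist() flattens it into a plain numeric vector
MeanValues <- unlist(mget(ObjectName), use.names = FALSE)

df <- data.frame(year = YearRange, Mean = MeanValues)
```

Since MeanValues is a plain vector here rather than a matrix, no do.call step is needed before building the data frame.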

R How to match dataframes to retrieve elements

I have a dataframe with 5000 rows, containing municipalities data from which I need to extract only rows matching a specific set of names. I am iterating the set through my dataframe using for loop.
This is for R 3.6.0
data <- NULL
for (i in mun.names){
data <- area.mun[area.mun[, 1] == i, ]
}
The object mun.names contains the municipalities I need to match. The object area.mun has the two columns NAME and AREA. The first column of both objects holds municipality names in the same format.
At the end of the for loop my resulting object data always has only one value, the last municipality of the object area.mun.
This is a simple error. I appreciate any kind of feedback.
Convert your 'mun.names' to data frame:
mun.names <- data.frame(mun.names)
Change the column name to 'NAME':
colnames(mun.names) <- "NAME"
Convert your 'area.mun' to data frame:
area.mun <- data.frame(area.mun)
Use merge command to extract the matched rows:
df <- merge(area.mun,mun.names,by.x="NAME",by.y="NAME")
You can also get all the unmatched rows from mun.names and area.mun data frames using all.x=TRUE and all.y=TRUE
df <- merge(area.mun,mun.names,by.x="NAME",by.y="NAME",all.x=TRUE, all.y=TRUE)
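The loop in the question ends up with a single municipality because data is overwritten on every iteration, so only the last match survives. As an alternative to merge, here is a minimal sketch of a vectorised subset with %in%. The rows are made up, and mun.names is assumed to be a character vector here.

```r
# Made-up stand-ins for the real objects
area.mun <- data.frame(
  NAME = c("Alpha", "Beta", "Gamma", "Delta"),
  AREA = c(10, 20, 30, 40)
)
mun.names <- c("Beta", "Delta")

# Keep every row whose NAME appears anywhere in mun.names
data <- area.mun[area.mun$NAME %in% mun.names, ]
```

If mun.names is a data frame rather than a vector, its first column (for example mun.names[, 1]) would take the place of mun.names on the right-hand side of %in%.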

How to extract row names as a variable in order for it to applied to another dataframe

I'm a newbie to R. I'm currently working with two dataframes, one containing initial values, and another containing values that have been computed using the original data.
My new dataframe for the computed values is built like this:
reldf <- data.frame(matrix(ncol = 13, nrow = nrow(glasgow2001)))
names <- c("2001r","2002r","2003r","2004r","2005r","2006r","2007r","2008r",
"2009r","2010r","2011r","2012r","2013r")
However, in order to remerge the computed values with the original dataframe, I want to be able to extract the original row names from the first data frame and apply them to this one. And this is where I'm completely lost.
Basically, how do I extract row names in R and apply them onto a new dataframe?
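You can assign column names with names() and copy the row names over from the original data frame with row.names():
names(reldf) <- names
row.names(reldf) <- row.names(glasgow2001)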
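To see the row-name transfer end to end, here is a minimal sketch with a toy stand-in for glasgow2001. The row names, values, and the reduced number of columns are made up for illustration.

```r
# Toy stand-in for glasgow2001 (row names and values are made up)
glasgow2001 <- data.frame(value = c(1.5, 2.5, 3.5),
                          row.names = c("zoneA", "zoneB", "zoneC"))

# Empty frame with one row per row of the original
reldf <- data.frame(matrix(ncol = 2, nrow = nrow(glasgow2001)))
names(reldf) <- c("2001r", "2002r")

# Copy the row names from the original onto the new frame
rownames(reldf) <- rownames(glasgow2001)
```

Once both frames share row names, merge() can join them on those names via by = "row.names", which may help with the remerge step described in the question.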

Assigning name to rows in R

I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
First you need to import the data to your global environment. Try the function read.table()
To name rows, try
(assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps a data frame when only one column remains
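A minimal sketch of the same idea on made-up data in the shape described in the question (columns geo and skada), removing the geo column by name instead of by position:

```r
# Made-up data in the shape described in the question
df <- data.frame(geo = c("siteA", "siteB", "siteC"),
                 skada = c(3, 7, 1))

# Use the location names as row names
rownames(df) <- df$geo

# Remove the now-redundant column; df stays a data frame
df$geo <- NULL

df["siteB", "skada"]  # 7
```

With named rows, a single location's value can be looked up directly by name, as in the last line.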
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help page, you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means that, when you create the data.frame, you can either specify the row names manually or point to the column that contains them. The former can be achieved as follows:
d = data.frame(x=rnorm(10),              # 10 normally distributed random values
               y=rnorm(10),              # 10 normally distributed random values
               row.names=letters[1:10]   # take the first 10 letters and use them as row headers
               )
while the latter is
d = data.frame(x=rnorm(10),      # 10 normally distributed random values
               y=rnorm(10),      # 10 normally distributed random values
               r=letters[1:10],  # take the first 10 letters
               row.names=3       # the column with the row headers is the 3rd
               )
If you are reading the data from a file, I will assume you are using read.table. Many of its parameters are the same as those of data.frame; in particular, its row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
Finally, if you have already read the data.frame and you want to change the row names, Pierre's answer is your solution
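For the read.table case described above, here is a minimal sketch that reads from inline text instead of a real file. The contents and the name txt are made up, and the column names follow the question (geo, skada).

```r
# Inline text standing in for a real file (contents are made up)
txt <- "geo skada
siteA 3
siteB 7
siteC 1"

# row.names = 1 uses the first column as row names;
# row.names = "geo" would select the same column by name
d <- read.table(text = txt, header = TRUE, row.names = 1)
```

With an actual file, the text argument would be replaced by the file path as the first argument.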
