Identifying dataframe columns by their first few characters - r

I have a dataframe in which the column names begin with certain characters:
> colnames(df)
[1] "p.crossfencing" "p.livestockdrinking" "v.livestocktrail"
[5] "v.landclearing" "v.grazelivestock" "v.useequipment"
Etc...
I'd like to select columns based on the first few characters (for example, those column names that begin with "v."). Basically, I'm trying to do the same thing that ls(pattern="") does for objects, but in my case, for column names within a dataframe.
EDIT: Answer by Thomas below put me on the right path. I needed to use:
j[grep("^v\\.",j)]
where j <- colnames(df).

Are you looking for df[,grep("^v\\.",names(df))]?

You could also write it as follows:
df[, grep(x = colnames(df), pattern = "^v\\.")]
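
One subtlety with prefix patterns is that an unescaped dot in a regular expression matches any character, so a prefix such as "v." needs the dot escaped (or a non-regex test such as startsWith) to be matched literally. The sketch below illustrates this on a made-up data frame; the column va_other and all values are assumptions added for illustration only.

```r
# Hypothetical data frame mirroring the column names from the question
df <- data.frame(
  p.crossfencing = 1:3,
  p.livestockdrinking = 4:6,
  v.livestocktrail = 7:9,
  v.landclearing = 10:12,
  va_other = 13:15  # starts with "v" but not with "v."
)

# An unescaped dot means "any character", so "^v." also matches "va_other"
loose <- names(df)[grepl("^v.", names(df))]

# Escaping the dot matches the literal prefix "v."
strict <- names(df)[grepl("^v\\.", names(df))]

# startsWith() avoids regular expressions altogether
same <- names(df)[startsWith(names(df), "v.")]

# drop = FALSE keeps a data frame even if only one column matches
v_cols <- df[, startsWith(names(df), "v."), drop = FALSE]
names(v_cols)
```

The drop = FALSE argument matters only when a single column matches; without it, df[, i] would return a plain vector instead of a data frame.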

Related

How to change just a portion of a repeated column name r

I have a data set that has multiple patterned column names, "tstpos_012720", and "pbpos_012720" basically one for each date. I want to rename them in such a way that I can have each pair ordered together.
I am looking for something where I can move the date to the front and add a "t" or "p" to the end.
Such that I go from "tstpos_012720, pbpos_012720" to "012720_t, 012720_p"
I am fairly new to R, but I couldn't find anything similar.
With the values
x <- c("tstpos_012720", "pbpos_012720")
You can use gsub which takes a regular expression. You can use matching groups to extract and rearrange the parts you are interested in.
gsub("(.)[^_]*_(.*)", "\\2_\\1", x)
# [1] "012720_t" "012720_p"
For column names, you can do
names(mydata) <- gsub("(.)[^_]*_(.*)", "\\2_\\1", names(mydata))
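
As a rough end-to-end sketch, the same gsub call can be applied to a small data frame and followed by a sort of the new names so that each date's pair sits together. The data frame mydata and its values are made up here, and the sorting step is an assumption about what "ordered together" means.

```r
# Hypothetical data frame with paired columns for two dates
mydata <- data.frame(
  tstpos_012720 = c(1, 0),
  pbpos_012720 = c(0, 1),
  tstpos_020320 = c(1, 1),
  pbpos_020320 = c(0, 0)
)

# Capture the first letter and the date, then swap their order
names(mydata) <- gsub("(.)[^_]*_(.*)", "\\2_\\1", names(mydata))

# Sorting the new names puts each date's pair side by side
mydata <- mydata[, sort(names(mydata))]
names(mydata)
# [1] "012720_p" "012720_t" "020320_p" "020320_t"
```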

Best way to extract a single letter from each row and create a new column in R?

Below is an excerpt of the data I'm working with. I am having trouble finding a way to extract the last letter from the sbp.id column and using the results to add a new column to the below data frame called "sex". I initially tried grepl to separate the rows ending in F and the ones ending in M, but couldn't figure out how to use that to create a new column with just M or F, depending on which one is the last letter of each row in the sbp.id column
sbp.id newID
125F 125
13000M 13000
13120M 13120
13260M 13260
13480M 13480
Another way, if you know you need the last character of every row, irrespective of whether the other characters are letters or digits and even if the elements all have different lengths:
df$sex <- substr(df$sbp.id, nchar(df$sbp.id), nchar(df$sbp.id))
This works because all of the functions are vectorized by default.
Using regex you can extract the last part from sbp.id
df$sex <- sub('.*([A-Z])$', '\\1', df$sbp.id)
#Also
#df$sex <- sub('.*([MF])$', '\\1', df$sbp.id)
Or another way would be to remove all the numbers.
df$sex <- sub('\\d+', '', df$sbp.id)
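
A quick way to check that the three approaches agree is to run them side by side on the excerpt from the question. This is a minimal sketch; it assumes sbp.id is stored as character (not factor), since nchar() does not accept factors.

```r
# Data from the question, with sbp.id kept as character
df <- data.frame(
  sbp.id = c("125F", "13000M", "13120M", "13260M", "13480M"),
  newID = c(125, 13000, 13120, 13260, 13480),
  stringsAsFactors = FALSE
)

n <- nchar(df$sbp.id)
by_substr <- substr(df$sbp.id, n, n)                 # last character by position
by_regex <- sub(".*([A-Z])$", "\\1", df$sbp.id)      # capture a trailing capital letter
by_strip <- sub("\\d+", "", df$sbp.id)               # remove the leading run of digits

# All three give the same result on this data
stopifnot(identical(by_substr, by_regex), identical(by_regex, by_strip))

df$sex <- by_substr
df$sex
# [1] "F" "M" "M" "M" "M"
```

The digit-stripping version relies on the IDs consisting of digits followed by a single letter; the substr version makes no such assumption.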

R shifting values of dataframe to the left while preserving headers

I have a csv file with headers in the form :
a,b,c,d
1,6,5,6,8
df <- read_csv("test.csv")
For some reason the value 1 in the example is incorrect. To correct the file, I'd like to shift all the other values to the left, dropping the 1 while preserving the column names, ending with:
a,b,c,d
6,5,6,8
How can I achieve that ?
What about this:
headers <- names(df)
# drop the first column, then reuse all but the last header
new_df <- df[, 2:length(df)]
names(new_df) <- headers[-length(headers)]
In one line of code, the structure command creates an object and assigns attributes:
structure(df[,2:length(df)], names = names(df)[1:(length(df)-1)])
Recognizing that a data.frame is a list of equal-length vectors, where each vector represents a column, the following will also work:
structure(df[2:length(df)], names = names(df)[1:(length(df)-1)])
Note no comma in df[2:length(df)].
Also, I like the trick of removing items from a vector or list using a negative index. So I think an even cleaner bit of code is:
structure(df[-1], names = names(df)[-length(df)])
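
To see the negative-index version in action, here is a small sketch on a made-up five-column data frame. The column name e and the values are assumptions standing in for whatever the file actually produces once it is read in.

```r
# Hypothetical data frame standing in for the misaligned file:
# the first value is wrong and every other value sits one column too far right
df <- data.frame(a = 1, b = 6, c = 5, d = 6, e = 8)

# Drop the first column and the last name, so the values shift left
shifted <- structure(df[-1], names = names(df)[-length(df)])
shifted
#   a b c d
# 1 6 5 6 8
```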

Column indexing based on row value

I have the data frame:
DT=data.frame(Row=c(1,2,3,4,5),Price=c(2.1,2.1,2.2,2.3,2.5),
'2.0'= c(100,300,700,400,0),
'2.1'= c(400,200,100,500,0),
'2.2'= c(600,700,200,100,200),
'2.3'= c(300,0,300,100,100),
'2.4'= c(400,0,0,500,600),
'2.5'= c(0,200,0,800,100))
The objective is to create a new column Quantity that selects the value for each row in the column equal to Price, such that:
DT.Objective=data.frame(Row=c(1,2,3,4,5),Price=c(2.1,2.1,2.2,2.3,2.5),
'2.0'= c(100,300,700,400,0),
'2.1'= c(400,200,100,500,0),
'2.2'= c(600,700,200,100,200),
'2.3'= c(300,0,300,100,100),
'2.4'= c(400,0,0,500,600),
'2.5'= c(0,200,0,800,100),
Quantity= c(400,200,200,100,100))
The dataset is very large, so efficiency is important. I currently use the following and am looking to make it more efficient:
Names <- names(DT)
DT$Quantity<- DT[Names][cbind(seq_len(nrow(DT)), match(DT$Price, Names))]
For some reason the column names in the example come with an "X" in front of them, whereas in the actual data there is no X.
Cheers.
We can do this with row/column indexing after removing the prefix 'X' using sub or substring, and then do the match as shown in the OP's post
DT$Quantity <- DT[cbind(1:nrow(DT), match(DT$Price, sub("^X", "", names(DT))))]
DT$Quantity
#[1] 400 200 200 100 100
The X is attached as a prefix when the column names start with numbers. One way to take care of this would be to use check.names=FALSE in the data.frame call or in read.csv/read.table
@akrun is correct, check.names=TRUE is the default behavior for data.frame(); from the man page:
check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.
If possible, you may want to make your column names a bit more descriptive.
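
As an illustration of the check.names route, the sketch below rebuilds the example data with check.names = FALSE so the numeric column names are kept as written, then does the lookup by matrix indexing. Formatting Price with sprintf is an added assumption: it keeps a price of 2.0 from being converted to "2", which would not match the column name "2.0".

```r
DT <- data.frame(
  Row = 1:5,
  Price = c(2.1, 2.1, 2.2, 2.3, 2.5),
  "2.0" = c(100, 300, 700, 400, 0),
  "2.1" = c(400, 200, 100, 500, 0),
  "2.2" = c(600, 700, 200, 100, 200),
  "2.3" = c(300, 0, 300, 100, 100),
  "2.4" = c(400, 0, 0, 500, 600),
  "2.5" = c(0, 200, 0, 800, 100),
  check.names = FALSE  # keep "2.0", "2.1", ... instead of "X2.0", "X2.1", ...
)

# Column position for each row, found by matching the formatted price to the names
col_idx <- match(sprintf("%.1f", DT$Price), names(DT))

# A two-column matrix of (row, column) pairs picks one cell per row
DT$Quantity <- DT[cbind(seq_len(nrow(DT)), col_idx)]
DT$Quantity
# [1] 400 200 200 100 100
```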

Call a specific column name in R

colnames gives me the column names for a whole dataframe. Is there any way to get the name of one specified column? I would need this for naming labels when plotting data in ggplot.
So say my data is like this:
df1 <- data.frame(a=sample(1:50,10), b=sample(1:50,10), c=sample(1:50,10))
I would need something like paste(colnames(df1[,1])) which obviously won't work.
any ideas?
You get the name like this:
colnames(df1)[1]
# i.e. call the first element of colnames not colnames of the first vector
However, by removing your comma, e.g.:
colnames(df1[1])
you can also get the name, because indexing with [x] (rather than [,x], [[x]], or $x) keeps the data.frame structure instead of reducing the column to a vector
names(df1)[1]
will give you the name of the first column. So too will
names(df1[1])
Neither uses a comma.
Would colnames(df1)[1] solve the problem?
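
The difference between these forms can be checked directly. The sketch below uses fixed values instead of sample() so the result is reproducible; the label text is made up for illustration.

```r
# Deterministic stand-in for the data frame in the question
df1 <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# df1[, 1] is a plain vector, so it has no column names
colnames(df1[, 1])
# NULL

# df1[1] is still a one-column data frame, so its name survives
names(df1[1])
# [1] "a"

# Taking the name from the full data frame is the most direct route
first_name <- names(df1)[1]

# The name can then be pasted into any label string
axis_label <- paste("Values of", first_name)
axis_label
# [1] "Values of a"
```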
