I want to create a numeric vector in R with a placeholder, just like in a character vector:
characterVec <- c("a", "b", "", "d")
This gives me a characterVec vector with a length of 4.
How can I create a numeric vector of length 4 that still has one empty value? For example, what should I put in place of the question mark in the following vector?
numericVec <- c(1, 2, ?, 4)
If I'm understanding your question properly, you can use a named vector to create a data dictionary linking letters to corresponding numbers:
# data dictionary
dat <- 1:26
names(dat) <- letters
Then map the dictionary onto your vector:
characterVec <- c("a", "b", "", "d")
numVec <- dat[characterVec]
gives
a b <NA> d
1 2 NA 4
You can remove the vector names with unname():
numVec <- unname(dat[characterVec])
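For the literal question, R's missing-value marker NA (the third element in the output above) works as a numeric placeholder: it keeps the vector numeric and of length 4. A minimal sketch:

```r
numericVec <- c(1, 2, NA, 4)

length(numericVec)      # 4
is.numeric(numericVec)  # TRUE: the logical NA is coerced to a double NA
is.na(numericVec)       # FALSE FALSE TRUE FALSE

# NA_real_ is the explicitly double-typed missing value, so this builds an identical vector
identical(numericVec, c(1, 2, NA_real_, 4))  # TRUE

# Most summaries have to be told to skip the placeholder
mean(numericVec)                # NA
mean(numericVec, na.rm = TRUE)  # 2.333333
```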
Related
For every entry in a column of data frame #1, I want to check whether that value is in data frame #2 and, if so, grab the value from a particular column of the second data frame; otherwise, return 0. Is there a way to use one of the *apply functions for this?
df1 <- data.frame(
key1 = c("A","B","C","E")
)
df2 <- data.frame(
key2 = c("X", "A", "C", "D", "E"),
val2 = as.integer(c('1','2','23','41','99'))
)
#Answer should be a vector like this:
x <- as.integer(c('2','0','23','99'))
The code below will give you the results in your example, but if a key appears more than once in df2 it will return only the first match. If that is not what you want, please describe the desired output for that scenario.
x <- as.integer(df2[["val2"]][match(df1[["key1"]], df2[["key2"]])])
x[is.na(x)] <- as.integer(0)
match returns the positions of the (first) matches of its first argument in its second. It returns NA for non-matches, which produces an NA when it indexes into df2[["val2"]], so those values have to be changed to 0 to get the final result.
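Since the question asks about the *apply family, here is a minimal vapply() sketch of the same lookup. It loops over the keys one at a time, so the vectorized match() version above is usually preferable for large data. It assumes character (not factor) key columns, hence stringsAsFactors = FALSE, and, like match(), it keeps only the first hit for a duplicated key.

```r
df1 <- data.frame(key1 = c("A", "B", "C", "E"), stringsAsFactors = FALSE)
df2 <- data.frame(key2 = c("X", "A", "C", "D", "E"),
                  val2 = c(1L, 2L, 23L, 41L, 99L),
                  stringsAsFactors = FALSE)

# For each key in df1, take the first matching val2 in df2, or 0 if there is none.
# FUN.VALUE = integer(1) makes vapply check that every result is a single integer.
x <- vapply(df1$key1, function(k) {
  hits <- df2$val2[df2$key2 == k]
  if (length(hits) > 0) hits[1] else 0L
}, integer(1), USE.NAMES = FALSE)
x
# [1]  2  0 23 99
```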
Is there an easier (i.e. one line of code instead of two!) way to do the following:
library(stringr)
results <- as.data.frame(str_split_fixed(c("SampleID_someusefulinfo.countsA" , "SampleID_someusefulinfo.countsB" , "SampleID_someusefulinfo.counts"), "\\.", n=2))
names(results) <- c("a", "b")
Something like:
results <- data.frame(str_split_fixed(c("SampleID_someusefulinfo.countsA" , "SampleID_someusefulinfo.countsB" , "SampleID_someusefulinfo.counts"), "\\.", n=2), colnames = c("a", "b"))
I do this a lot, and would really love to have a way to have this in one line of code.
(data.table works too, if it's easier to do there than in a base data.frame.)
Clarifying:
My expected output (which is achieved by running the two lines of code at the top; I just want them to be ONE line, that's it!) is a data frame with the following structure:
results
a b
1 SampleID_someusefulinfo countsA
2 SampleID_someusefulinfo countsB
3 SampleID_someusefulinfo counts
What I would like to do is:
CREATE the data frame from a matrix or some other content (for example, the toy matrix(c(1,2,3,4),nrow=2,ncol=2) from the first example I wrote)
SPECIFY IN THAT SAME LINE what I would like the column names of my data frame to be
Use setNames() around a data.frame
setNames(data.frame(matrix(c(1,2,3,4),nrow=2,ncol=2)), c("a","b"))
# a b
#1 1 3
#2 2 4
From ?setNames:
a convenience function that sets the names on an object and returns the object
> setNames
function (object = nm, nm)
{
names(object) <- nm
object
}
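Applied to the strings from the original question, the setNames() wrapper puts the split and the naming into one expression. This sketch swaps in base R's strsplit() for stringr's str_split_fixed() so it has no package dependency; note that strsplit() splits at every ".", which matches str_split_fixed(n = 2) here only because each string contains exactly one dot.

```r
x <- c("SampleID_someusefulinfo.countsA",
       "SampleID_someusefulinfo.countsB",
       "SampleID_someusefulinfo.counts")

# Split each string at ".", stack the pieces into a character matrix,
# convert it to a data frame, and name the columns, all in one expression.
results <- setNames(as.data.frame(do.call(rbind, strsplit(x, ".", fixed = TRUE)),
                                  stringsAsFactors = FALSE), c("a", "b"))

# With stringr loaded, the original call can be wrapped the same way:
# setNames(as.data.frame(str_split_fixed(x, "\\.", n = 2)), c("a", "b"))
```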
We can use the dimnames argument of matrix(), since the OP was using matrix() to create the data:
data.frame(matrix(1:4, 2, 2, dimnames=list(NULL, c("a", "b"))))
Or
`colnames<-`(data.frame(matrix(1:4, 2, 2)), c('a', 'b'))
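The backticked call works because `colnames<-` is the replacement function that an assignment like colnames(df) <- value invokes behind the scenes; called directly, it returns a renamed copy and leaves its input untouched. A small check:

```r
df <- data.frame(matrix(1:4, 2, 2))
names(df)
# [1] "X1" "X2"

# Calling the replacement function directly returns a modified copy
renamed <- `colnames<-`(df, c("a", "b"))
names(renamed)
# [1] "a" "b"

# The original data frame is unchanged
names(df)
# [1] "X1" "X2"
```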
I have a data frame in which the first element of each 'name' column identifies the construct measured by the numeric columns that follow it. I am attempting to replace the meaningless number in those column names with the first element of the associated name column.
Here is an example dataframe:
df <- data.frame(data.0.name = c("A", "A", "A"), data.0.one_minute_ago = c(1,2,1), data.0.one_hour_ago = c(2,2,3),
data.1.name = c("B", "B", "B"), data.1.one_minute_ago = c(3,3,2), data.1.one_hour_ago = c(5,6,2))
Each number.name vector is associated with a construct (either A or B in this case) and each number.time is associated with a time dimension. So, data.0.one_minute_ago is actually the number of A's you had one_minute_ago.
What I would like to do (because I have a large dataset with lots of these transformations) is to replace number.dimension with construct.dimension, and of course do that for each number from 0 to 9.
I've written some grep code to begin this task, but to no avail (I am stuck on retaining everything after the number):
grep( "data.[0-9].name" ,names(df), perl=TRUE)
as.character(df[1, 1])
as.character(df[1, 4])
as.character(names(df[2]))
as.character(names(df[3]))
as.character(names(df[5]))
as.character(names(df[6]))
df.1 <- (df[1, grep( "data.[0-9].name" ,names(df))])
df.1 <- data.frame(lapply(df.1, as.character), stringsAsFactors=FALSE)
constructs <- as.character(df.1[1,c(1:2)])
Here the first and second elements of constructs are the constructs associated with 0.name/0.dimension and 1.name/1.dimension, respectively.
constructs [1]
constructs [2]
From there, I'm fairly certain the code would involve some names(df)[] <- assignment, but I'm unsure where to go from here.
Any and all help appreciated.
EDIT: here is the desired variable-name output. I'm simply changing the variable names (and, of course, retaining the values associated with them):
data.A.name data.A.one_minute_ago data.A.one_hour_ago data.B.name data.B.one_minute_ago data.B.one_hour_ago
EDIT 2: In my true dataset, the number of dimensions (i.e., one_minute_ago, one_hour_ago, one_day_ago) can vary across constructs (e.g., two dimensions for one construct, 3 for another, and 9 for another). I would like the solution to take that into account.
Here is a modified sample dataset to reflect this subtlety:
df <- data.frame(data.0.name = c("A", "A", "A"), data.0.one_minute_ago = c(1,2,1), data.0.one_hour_ago = c(2,2,3),
data.1.name = c("B", "B", "B"), data.1.one_minute_ago = c(3,3,2), data.1.one_hour_ago = c(5,6,2),
data.2.name = c("C", "C", "C"), data.2.one_minute_ago = c(3,3,2), data.2.one_hour_ago = c(5,6,2), data.2.one_day_ago = c(3,2,3))
We create a grouping variable 'indx' from the number in each column name, then split the column names by 'indx' ('lst'). Next, we take the first element of each column whose name contains 'name' ('r1'). Finally, we use Map and gsub to replace the number in each element of 'lst' with the corresponding element of 'r1'.
indx <- gsub('[^0-9]+', '', names(df))
lst <- split(names(df), indx)
r1 <- as.character(unlist(df[1,grep('name', names(df))]))
lst2 <- Map(function(x,y) gsub('[0-9]+', y, x), lst, r1)
names(df) <- unsplit(lst2, indx)
names(df)
# [1] "data.A.name" "data.A.one_minute_ago" "data.A.one_hour_ago"
#[4] "data.B.name" "data.B.one_minute_ago" "data.B.one_hour_ago"
#[7] "data.C.name" "data.C.one_minute_ago" "data.C.one_hour_ago"
#[10] "data.C.one_day_ago"
I think this works:
library(stringr)
splits <- str_split(names(df), "\\.")
trailing_name <- sapply(splits, "[[", 3)
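# 'constructs' comes from the question's code (c("A", "B")); this assumes exactly three columns per construct
constructs <- rep(constructs, each = 3)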
constructs
# [1] "A" "A" "A" "B" "B" "B"
names(df) <- str_c("data", constructs, trailing_name, sep=".")
names(df)
# [1] "data.A.name" "data.A.one_minute_ago" "data.A.one_hour_ago" "data.B.name"
# [5] "data.B.one_minute_ago" "data.B.one_hour_ago"
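The rep(constructs, each = 3) step assumes exactly three columns per construct, so it does not cover EDIT 2. A minimal sketch that drops that assumption uses a named vector as a lookup from number to construct. It assumes every column name has the form data.<number>.<suffix> and that row 1 of each *.name column holds the construct; the variable names ids, suffixes, name_cols, and lookup are illustrative.

```r
df <- data.frame(data.0.name = c("A", "A", "A"), data.0.one_minute_ago = c(1, 2, 1),
                 data.0.one_hour_ago = c(2, 2, 3),
                 data.1.name = c("B", "B", "B"), data.1.one_minute_ago = c(3, 3, 2),
                 data.1.one_hour_ago = c(5, 6, 2),
                 data.2.name = c("C", "C", "C"), data.2.one_minute_ago = c(3, 3, 2),
                 data.2.one_hour_ago = c(5, 6, 2), data.2.one_day_ago = c(3, 2, 3),
                 stringsAsFactors = FALSE)

# Numeric id ("0", "1", ...) and trailing part ("name", "one_minute_ago", ...) of each column name
ids      <- sub("^data\\.([0-9]+)\\..*$", "\\1", names(df))
suffixes <- sub("^data\\.[0-9]+\\.", "", names(df))

# Named lookup vector: id -> construct, read from row 1 of each *.name column
name_cols <- grep("\\.name$", names(df))
lookup    <- setNames(as.character(unlist(df[1, name_cols])), ids[name_cols])

# Each column looks up its own id, so constructs may have different numbers of columns
names(df) <- paste("data", lookup[ids], suffixes, sep = ".")
names(df)
```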
How can one determine the row index-numbers corresponding to particular row names? I have a vector of row names, and I would like to use these to obtain a vector of the corresponding row indices in a matrix.
I tried row() and as.integer(rownames(matrix.object)), but neither seems to work.
In addition to which(), you can use match():
m <- matrix(1:25, ncol = 5, dimnames = list(letters[1:5], LETTERS[1:5]))
vec <- c("e", "a", "c")
match(vec, rownames(m))
# [1] 5 1 3
Try which:
which(rownames(matrix.object) %in% c("foo", "bar"))
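The two answers differ in ordering and in how missing names are handled: match() returns positions in the order of vec, with NA for a name that isn't a row name, while which() with %in% returns the matching positions in the matrix's own row order and silently drops missing names. A quick comparison, with "z" added as a hypothetical missing name:

```r
m <- matrix(1:25, ncol = 5, dimnames = list(letters[1:5], LETTERS[1:5]))
vec <- c("e", "a", "c", "z")   # "z" is not a row name of m

# Positions in the order of vec; NA marks the missing name
match(vec, rownames(m))
# [1]  5  1  3 NA

# Positions in row order; the missing name simply doesn't appear
which(rownames(m) %in% vec)
# [1] 1 3 5
```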
I have two data.frames:
pattern <- data.frame(pattern = c("A", "B", "C", "D"), val = c(1, 1, 2, 2))
match <- data.frame(match = c("A", "C"))
I want to add another column called new_val to my data.frame pattern, assigning "X" to each row where the value in column pattern is in the data.frame match, and "Y" otherwise.
is.element(pattern$pattern, match$match)
[1] TRUE FALSE TRUE FALSE
So, the resulting data.frame should look like:
pattern val new_val
1 A 1 X
2 B 1 Y
3 C 2 X
4 D 2 Y
I managed to do it with an ugly for loop, but I am sure this can be done in one line of R using fancy stuff :-)
Is anyone able to help?
Many thanks!
I'm only really posting this because Tyler said "if you wanted a one liner data.table would likely do it", and I knew it was definitely possible with a one-liner in base R. I am also assuming match has been renamed to mat, i.e. mat <- c("A", "C") as in Tyler's answer.
pattern$new_val <- c("Y", "X")[(pattern$pattern %in% mat)+1]
pattern
# pattern val new_val
#1 A 1 X
#2 B 1 Y
#3 C 2 X
#4 D 2 Y
pattern$pattern %in% mat returns TRUE for each element of pattern that is in mat and FALSE for each one that is not. Adding 1 coerces that to a number, 1 or 2, so it can be used as an index into the self-defined vector c("Y", "X"). Since the index is always 1 or 2, we always grab an element of interest: "Y" if the pattern wasn't in mat and "X" if it was, which is what you wanted.
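The same steps can be traced one at a time on the example data. This sketch assumes mat <- c("A", "C") and character columns; ifelse() is shown only as a more conventional spelling of the same two-way choice.

```r
pattern <- data.frame(pattern = c("A", "B", "C", "D"), val = c(1, 1, 2, 2),
                      stringsAsFactors = FALSE)
mat <- c("A", "C")

in_mat <- pattern$pattern %in% mat   # TRUE FALSE TRUE FALSE
idx    <- in_mat + 1                 # 2 1 2 1 (logical coerced to numeric)
pattern$new_val <- c("Y", "X")[idx]  # "X" "Y" "X" "Y"

ifelse(in_mat, "X", "Y")             # same result: "X" "Y" "X" "Y"
```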
Here's one way. (I renamed your match to mat since there's a pretty important base function named match that you could actually use to solve this problem; in fact, %in% is a form of match.)
pattern <- data.frame(pattern = c("A", "B", "C", "D"), val = c(1, 1, 2, 2))
mat <- c("A", "C")
pattern$new_val <- "Y" # pre-allocate: every row starts as Y
pattern$new_val[pattern$pattern %in% mat] <- "X" # overwrite rows whose pattern is in mat (A or C) with X
pattern
PS: if you wanted a one-liner, data.table would likely do it.
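The remark that %in% is a form of match can be checked directly: R's documentation for match() defines x %in% table as match(x, table, nomatch = 0) > 0.

```r
mat <- c("A", "C")
x   <- c("A", "B", "C", "D")

# nomatch = 0 turns non-matches into 0, so "> 0" marks the elements found in mat
in_via_match <- match(x, mat, nomatch = 0) > 0
in_via_match
# [1]  TRUE FALSE  TRUE FALSE

identical(in_via_match, x %in% mat)
# [1] TRUE
```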
If you wanted something a little more complicated you could use a function from a package I'm working on:
library(qdap)
#original problem
pattern$new_val <- text2color(pattern$pattern, list(c("A", "C")), c("X", "Y"))
#extending it
#makes D a 5
text2color(pattern$pattern, list(c("A", "C"), "D"), c("X", 5, "Y"))
This function really is designed to do something else but if you want to grab the essential parts of it you can look at the source code.