I want to be able to access b0.e7, c0.14,...,f8.d4. But right now these are not in a column, but are the "row names". How can I have the row names be 1,2,3,4,5,6,7 and b0.e7, c0.14,...,f8.d4 to be it's own column. Thanks for the help in advance.
df=as.data.frame(c)
df = subset(df, c>7)
df
c
b0.e7 11
c0.14 8
f8.d1 10
f8.d2 9
f8.d3 11
f8.d4 12
Try this. The first line assigns a new column that is just the current row names of the data frame. The second line resets the row names to NULL, resulting in a sequence.
> df$new <- rownames(df)
> rownames(df) <- NULL
Which should result in
> df
# c new
# 1 11 b0.e7
# 2 8 c0.14
# 3 10 f8.d1
# 4 9 f8.d2
# 5 11 f8.d3
# 6 12 f8.d4
And you can reverse the column order if needed with df[, c(2, 1)]
You can make use of the fact that cbind.data.frame can make use of arguments from data.frame, one of which is row.names. That argument can be set to NULL, meaning that a slightly more direct approach than proposed by Richard is:
cbind(rn = rownames(mydf), mydf, row.names = NULL)
# rn c
# 1 b0.e7 11
# 2 c0.14 8
# 3 f8.d1 10
# 4 f8.d2 9
# 5 f8.d3 11
# 6 f8.d4 12
You can try this as well.
rows = row.names(df)
df1 = cbind(rows,df)
Related
I'm a new user in R. Considering the following vector example <- c (15 1 1 1 7 8 8 9 5 9 5), I would like to create two additional vectors, the first with only the repeated numbers and the second with numbers that are not repeated, something like:
example1 <- c (15, 7)
example2 <- c (1, 8, 9, 5)
Thank you for your support.
Using example shown reproducibly in the Note at the end dups is formed from the duplicated elements and singles is the rest, This always gives two vectors (one will be zero length if there are no duplicates of if there are no singles) and it uses the numeric values directly without converting them to character.
dups <- unique(example[duplicated(example)])
singles <- setdiff(example, dups)
dups
## [1] 1 8 9 5
singles
## [1] 15 7
Note
The input shown in the question was not valid R syntax so we provide the input reproducibly here:
example <- scan(text = "15 1 1 1 7 8 8 9 5 9 5", quiet = TRUE)
You can count the appereances of the values using table:
example <- c(15,1,1,1,7,8,8,9,5,9,5)
tt <- table(example)
The names of the table are the counted values, so you can write:
repeatedValues <- as.numeric(names(tt)[tt>1])
uniqueValues <- as.numeric(names(tt))[tt==1]
Here's a one-liner using rle that puts the resultant vectors in a list:
split(rle(sort(example))$values, rle(sort(example))$lengths < 2)
#> $`FALSE`
#> [1] 1 5 8 9
#> $`TRUE`
#> [1] 7 15
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
I would like to repeat entire rows in a data-frame based on the samples column.
My input:
df <- 'chr start end samples
1 10 20 2
2 4 10 3'
df <- read.table(text=df, header=TRUE)
My expected output:
df <- 'chr start end samples
1 10 20 1-10-20-s1
1 10 20 1-10-20-s2
2 4 10 2-4-10-s1
2 4 10 2-4-10-s2
2 4 10 2-4-10-s3'
Some idea how to perform it wisely?
We can use expandRows to expand the rows based on the value in the 'samples' column, then convert to data.table, grouped by 'chr', we paste the columns together along with sequence of rows using sprintf to update the 'samples' column.
library(splitstackshape)
setDT(expandRows(df, "samples"))[,
samples := sprintf("%d-%d-%d-%s%d", chr, start, end, "s",1:.N) , chr][]
# chr start end samples
#1: 1 10 20 1-10-20-s1
#2: 1 10 20 1-10-20-s2
#3: 2 4 10 2-4-10-s1
#4: 2 4 10 2-4-10-s2
#5: 2 4 10 2-4-10-s3
NOTE: data.table will be loaded when we load splitstackshape.
You can achieve this using base R (i.e. avoiding data.tables), with the following code:
df <- 'chr start end samples
1 10 20 2
2 4 10 3'
df <- read.table(text = df, header = TRUE)
duplicate_rows <- function(chr, starts, ends, samples) {
expanded_samples <- paste0(chr, "-", starts, "-", ends, "-", "s", 1:samples)
repeated_rows <- data.frame("chr" = chr, "starts" = starts, "ends" = ends, "samples" = expanded_samples)
repeated_rows
}
expanded_rows <- Map(f = duplicate_rows, df$chr, df$start, df$end, df$samples)
new_df <- do.call(rbind, expanded_rows)
The basic idea is to define a function that will take a single row from your initial data.frame and duplicate rows based on the value in the samples column (as well as creating the distinct character strings you're after). This function is then applied to each row of your initial data.frame. The output is a list of data.frames that then need to be re-combined into a single data.frame using the do.call pattern.
The above code can be made cleaner by using the Hadley Wickham's purrr package (on CRAN), and the data.frame specific version of map (see the documentation for the by_row function), but this may be overkill for what you're after.
Example using DataFrame function from S4Vector package:
df <- DataFrame(x=c('a', 'b', 'c', 'd', 'e'), y=1:5)
rep(df, df$y)
where y column represents the number of times to repeat its corresponding row.
Result:
DataFrame with 15 rows and 2 columns
x y
<character> <integer>
1 a 1
2 b 2
3 b 2
4 c 3
5 c 3
... ... ...
11 e 5
12 e 5
13 e 5
14 e 5
15 e 5
I have a dataset in R organized like so:
x freq
1 PRODUCT10000 6
2 PRODUCT10001 20
3 PRODUCT10002 11
4 PRODUCT10003 4
5 PRODUCT10004 1
6 PRODUCT10005 2
Then, I have a function like
fun <- function(number, df1, string, df2){NormC <- as.numeric(df1[string, "normc"])
df2$NormC <- rep(NormC)}
How can I iterate through my df and insert each value of "x" into the function?
I think the problem is that this part of the function (which has 4 input variables) is structured like so- NormC <- as.numeric(df[string, "normc"])
As explained by #duckmayr, you don't need to iterate through column x. Here is an example creating new variable.
df <- read.table(text = " x freq
1 PRODUCT10000 6
2 PRODUCT10001 20
3 PRODUCT10002 11
4 PRODUCT10003 4
5 PRODUCT10004 1
6 PRODUCT10005 2", header = TRUE)
fun <- function(string){paste0(string, "X")} # example
# option 1
df$new.col1 <- fun(df$x) # see duckmayr's comment
# option 2
library(data.table)
setDT(df)[, new.col2 := fun(x)]
I am attempting to create a new data.frame object that is composed of the columns of the old data.frame with every row set at a given value (for this example, I will use 7). I am running into problems, however, naming the variables of the new data.frame object. Here is what I'm trying to do:
my.df <- data.frame(x = 1:3, y123=4:6)
my.df
x y123
1 1 4
2 2 5
3 3 6
Then when I attempt to assign these variable names to the new data.frame:
the.names <- names(my.df)
for(i in 1:length(the.names)){
data.frame(
the.names[i] = 7
)
}
But this throws errors with unexpected =, ), and }. This would usually make me suspect a typo, but I've reviewed this several times and can't find anything. Any ideas?
The easiest way is probably to just copy the old data.frame in its entirety, and then assign every cell to your new value:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df;
df2[] <- 7;
df2;
## x y123
## 1 7 7
## 2 7 7
## 3 7 7
If you only want one row in the new data.frame, you can index only the top row when making the copy:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df[1,];
df2[] <- 7;
df2;
## x y123
## 1 7 7
Edit: Here's how you can set each column to a different value:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df;
df2[] <- rep(c(7,31),each=nrow(df2));
df2;
## x y123
## 1 7 31
## 2 7 31
## 3 7 31
You can use within:
> within(my.df, {
+ assign(the.names[1], 0)
+ assign(the.names[2], 1)
+ })
x y123
1 0 1
2 0 1
3 0 1
I'm trying to make a function that takes two arguments. One argument is the name of a data frame, and the second is the name of a column in that data frame. The goal is for the function to manipulate data in the whole frame based on information contained in the specified column.
My problem is that I can't figure out how to use the character expression entered into the second argument to access that particular column in the data frame within the function. Here's a super brief example,
datFunc <- function(dataFrame = NULL, charExpres = NULL) {
return(dataFrame$charExpress)
}
If, for instance you enter
datFunc(myData, "variable1")
this does not return myData$variable1. there HAS to be a simple way to do this. Sorry if the question is stupid, but i'd appreciate a little help here.
A related question would be, how do i use the character string "myData$variable1" to actually return variable1 from myData?
I think OP wants to pass name of dataframe as string too. If that is the case your function should be something like. (borrowed sample from other answer)
fooFunc <- function( dfNameStr, colNamestr, drop=TRUE) {
df <- get(dfNameStr)
return(df[,colNamestr, drop=drop])
}
> myData <- data.frame(ID=1:10, variable1=rnorm(10, 10, 1))
> myData
ID variable1
1 1 10.838590
2 2 9.596791
3 3 10.158037
4 4 9.816136
5 5 10.388900
6 6 10.873294
7 7 9.178112
8 8 10.828505
9 9 9.113271
10 10 10.345151
> fooFunc('myData', 'ID', drop=F)
ID
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> fooFunc('myData', 'ID', drop=T)
[1] 1 2 3 4 5 6 7 8 9 10
You're almost there, try using [ instead of $ for this kind of indexing
datFunc <- function(dataFrame = NULL, charExpres = NULL, drop=TRUE) {
return(dataFrame[, charExpres, drop=drop])
}
# An example
set.seed(1)
myData <- data.frame(ID=1:10, variable1=rnorm(10, 10, 1)) # DataFrame
datFunc(myData, "variable1") # dropping dimensions
[1] 9.373546 10.183643 9.164371 11.595281 10.329508 9.179532 10.487429 10.738325 10.575781 9.694612
datFunc(myData, "variable1", drop=FALSE) # keeping dimensions
variable1
1 9.373546
2 10.183643
3 9.164371
4 11.595281
5 10.329508
6 9.179532
7 10.487429
8 10.738325
9 10.575781
10 9.694612
Alternatively, you can find the column index of the dataframe:
df <- as.data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- sample(LETTERS, 10)
column.index.of.A <- grep("^A$", colnames(df))
df[, column.index.of.A]