I am attempting to create a new data.frame object that is composed of the columns of the old data.frame with every row set at a given value (for this example, I will use 7). I am running into problems, however, naming the variables of the new data.frame object. Here is what I'm trying to do:
my.df <- data.frame(x = 1:3, y123=4:6)
my.df
x y123
1 1 4
2 2 5
3 3 6
Then when I attempt to assign these variable names to the new data.frame:
the.names <- names(my.df)
for(i in 1:length(the.names)){
data.frame(
the.names[i] = 7
)
}
But this throws errors with unexpected =, ), and }. This would usually make me suspect a typo, but I've reviewed this several times and can't find anything. Any ideas?
The easiest way is probably to just copy the old data.frame in its entirety, and then assign every cell to your new value:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df;
df2[] <- 7;
df2;
## x y123
## 1 7 7
## 2 7 7
## 3 7 7
If you only want one row in the new data.frame, you can index only the top row when making the copy:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df[1,];
df2[] <- 7;
df2;
## x y123
## 1 7 7
Edit: Here's how you can set each column to a different value:
df <- data.frame(x=1:3,y123=4:6);
df2 <- df;
df2[] <- rep(c(7,31),each=nrow(df2));
df2;
## x y123
## 1 7 31
## 2 7 31
## 3 7 31
You can use within:
> within(my.df, {
+ assign(the.names[1], 0)
+ assign(the.names[2], 1)
+ })
x y123
1 0 1
2 0 1
3 0 1
Related
I have a dataset in R organized like so:
x freq
1 PRODUCT10000 6
2 PRODUCT10001 20
3 PRODUCT10002 11
4 PRODUCT10003 4
5 PRODUCT10004 1
6 PRODUCT10005 2
Then, I have a function like
fun <- function(number, df1, string, df2){NormC <- as.numeric(df1[string, "normc"])
df2$NormC <- rep(NormC)}
How can I iterate through my df and insert each value of "x" into the function?
I think the problem is that this part of the function (which has 4 input variables) is structured like so- NormC <- as.numeric(df[string, "normc"])
As explained by #duckmayr, you don't need to iterate through column x. Here is an example creating new variable.
df <- read.table(text = " x freq
1 PRODUCT10000 6
2 PRODUCT10001 20
3 PRODUCT10002 11
4 PRODUCT10003 4
5 PRODUCT10004 1
6 PRODUCT10005 2", header = TRUE)
fun <- function(string){paste0(string, "X")} # example
# option 1
df$new.col1 <- fun(df$x) # see duckmayr's comment
# option 2
library(data.table)
setDT(df)[, new.col2 := fun(x)]
So, I have several dataframes like this
1 2 a
2 3 b
3 4 c
4 5 d
3 5 e
......
1 2 j
2 3 i
3 4 t
3 5 r
.......
2 3 t
2 4 g
6 7 i
8 9 t
......
What I want is, I want to merge all of these files into one single file showing the values of third column for each pair of values in columns 1 and columns 2 and 0 if that pair is not present.
So, the output for this will be, since, there are three files (there are more)
1 2 aj0
2 3 bit
3 4 ct0
4 5 d00
3 5 er0
6 7 00i
8 9 00t
......
What I did was combine all my text .txt files in a single list.
Then,
L <- lapply(seq_along(L), function(i) {
L[[i]][, paste0('DF', i)] <- 1
L[[i]]
})
Which will indicate the presence of a value when we will be merging them.
I don't know how to proceed further. Any inputs will be great. Thanks!
Here is one way to do it with Reduce
# function to generate dummy data
gen_data<- function(){
data.frame(
x = 1:3,
y = 2:4,
z = sample(LETTERS, 3, replace = TRUE)
)
}
# generate list of data frames to merge
L <- lapply(1:3, function(x) gen_data())
# function to merge by x and y and concatenate z
f <- function(x, y){
d <- merge(x, y, by = c('x', 'y'), all = TRUE)
# set merged column to zero if no match is found
d[['z.x']] = ifelse(is.na(d[['z.x']]), 0, d[['z.x']])
d[['z.y']] = ifelse(is.na(d[['z.y']]), 0, d[['z.y']])
d$z <- paste0(d[['z.x']], d[['z.y']])
d['z.x'] <- d['z.y'] <- NULL
return(d)
}
# merge data frames
Reduce(f, L)
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
Embarrassingly basic question, but if you don't know.. I need to reshape a data.frame of count summarised data into what it would've looked like before being summarised. This is essentially the reverse of {plyr} count() e.g.
> (d = data.frame(value=c(1,1,1,2,3,3), cat=c('A','A','A','A','B','B')))
value cat
1 1 A
2 1 A
3 1 A
4 2 A
5 3 B
6 3 B
> (summry = plyr::count(d))
value cat freq
1 1 A 3
2 2 A 1
3 3 B 2
If you start with summry what is the quickest way back to d? Unless I'm mistaken (very possible), {Reshape2} doesn't do this..
Just use rep:
summry[rep(rownames(summry), summry$freq), c("value", "cat")]
# value cat
# 1 1 A
# 1.1 1 A
# 1.2 1 A
# 2 2 A
# 3 3 B
# 3.1 3 B
A variation of this approach can be found in expandRows from my "SOfun" package. If you had that loaded, you would be able to simply do:
expandRows(summry, "freq")
There is a good table to dataframe function on the R cookbook website that you can modify slightly. The only modifications were changing 'Freq' -> 'freq' (to be consistent with plyr::count) and making sure the rownames were reset as increasing integers.
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
# Take each row in the source data frame table and replicate it
# using the Freq value
DF <- sapply(1:nrow(x),
function(i) x[rep(i, each = x$freq[i]), ],
simplify = FALSE)
# Take the above list and rbind it to create a single DF
# Also subset the result to eliminate the Freq column
DF <- subset(do.call("rbind", DF), select = -freq)
# Now apply type.convert to the character coerced factor columns
# to facilitate data type selection for each column
for (i in 1:ncol(DF)) {
DF[[i]] <- type.convert(as.character(DF[[i]]),
na.strings = na.strings,
as.is = as.is, dec = dec)
}
row.names(DF) <- seq(nrow(DF))
DF
}
expand.dft(summry)
value cat
1 1 A
2 1 A
3 1 A
4 2 A
5 3 B
6 3 B
I want to be able to access b0.e7, c0.14,...,f8.d4. But right now these are not in a column, but are the "row names". How can I have the row names be 1,2,3,4,5,6,7 and b0.e7, c0.14,...,f8.d4 to be it's own column. Thanks for the help in advance.
df=as.data.frame(c)
df = subset(df, c>7)
df
c
b0.e7 11
c0.14 8
f8.d1 10
f8.d2 9
f8.d3 11
f8.d4 12
Try this. The first line assigns a new column that is just the current row names of the data frame. The second line resets the row names to NULL, resulting in a sequence.
> df$new <- rownames(df)
> rownames(df) <- NULL
Which should result in
> df
# c new
# 1 11 b0.e7
# 2 8 c0.14
# 3 10 f8.d1
# 4 9 f8.d2
# 5 11 f8.d3
# 6 12 f8.d4
And you can reverse the column order if needed with df[, c(2, 1)]
You can make use of the fact that cbind.data.frame can make use of arguments from data.frame, one of which is row.names. That argument can be set to NULL, meaning that a slightly more direct approach than proposed by Richard is:
cbind(rn = rownames(mydf), mydf, row.names = NULL)
# rn c
# 1 b0.e7 11
# 2 c0.14 8
# 3 f8.d1 10
# 4 f8.d2 9
# 5 f8.d3 11
# 6 f8.d4 12
You can try this as well.
rows = row.names(df)
df1 = cbind(rows,df)
Consider the following:
df <- data.frame(a = 1, b = 2, c = 3)
names(df[1]) <- "d" ## First method
## a b c
##1 1 2 3
names(df)[1] <- "d" ## Second method
## d b c
##1 1 2 3
Both methods didn't return an error, but the first didn't change the column name, while the second did.
I thought it has something to do with the fact that I'm operating only on a subset of df, but why, for example, the following works fine then?
df[1] <- 2
## a b c
##1 2 2 3
What I think is happening is that replacement into a data frame ignores the attributes of the data frame that is drawn from. I am not 100% sure of this, but the following experiments appear to back it up:
df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7
df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12
df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent
Which produces:
# a b
# 1 10 5
# 2 11 6
# 3 12 7
Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.
In the scenario:
names(df[2]) <- "x"
You can think of the assignment as follows (this is a simplification, see end of post for more detail):
tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7
names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7
df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7
The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.
But in the scenario:
names(df)[2] <- "x"
you can think of the assignment as (again, a simplification):
tmp <- names(df)
# [1] "a" "b"
tmp[2] <- "x"
# [1] "a" "x"
names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7
Notice how we directly assign to names, instead of assigning to df which ignores attributes.
df[2] <- 2
works because we are assigning directly to the values, not the attributes, so there are no problems here.
EDIT: based on some commentary from #AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):
Version 1 names(df[2]) <- "x" translates to:
df <- `[<-`(
df, 2,
value=`names<-`( # `names<-` here returns a re-named one column data frame
`[`(df, 2),
value="x"
) )
Version 2 names(df)[2] <- "x" translates to:
df <- `names<-`(
df,
`[<-`(
names(df), 2, "x"
) )
Also, turns out this is "documented" in R Inferno Section 8.2.34 (Thanks #Frank):
right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2