x_names <-c("x1","x2","x3")
data <- c(1,2,3,4)
fake <- c(2,3,4,5)
for (i in x_names)
{
x = fake
data = as.data.frame(cbind(data,x))
#data <- data %>% rename(x_names = x)
}
I made a toy example. This code generates a data frame with 1 column called data and 3 columns called x. Instead of calling the columns x, I want them named x1, x2, x3 (stored in x_names). I tried using x_names in the code (the commented-out line), but it does not work. Could you help me with it?
We can also use map_dfc from tidyverse:
library(tidyverse)
cbind(data, map_dfc(x_names, ~ tibble(!!.x := fake)))
Output:
data x1 x2 x3
1 1 2 2 2
2 2 3 3 3
3 3 4 4 4
4 4 5 5 5
We can avoid the for loop and use replicate to repeat fake data using setNames to name the dataframe with x_names.
cbind(data, setNames(data.frame(replicate(length(x_names), fake)), x_names))
# data x1 x2 x3
#1 1 2 2 2
#2 2 3 3 3
#3 3 4 4 4
#4 4 5 5 5
Ideally one should avoid growing objects in a loop; however, one way to solve OP's problem in a loop is:
for (i in seq_along(x_names)) {
data = cbind.data.frame(data, fake)
names(data)[i + 1] <- x_names[i]
}
An option is just to assign the 'fake' to create the new columns in base R
data[x_names] <- fake
data
# data x1 x2 x3
#1 1 2 2 2
#2 2 3 3 3
#3 3 4 4 4
#4 4 5 5 5
EDIT: Based on comments from @avid_useR, data has to be a data frame before the assignment above, so convert it first:
data <- data.frame(data)
If you replace your commented-out line
#data <- data %>% rename(x_names = x)
with
colnames(data)[ncol(data)] <- i
it should set the right colnames.
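For reference, here is a minimal sketch of the question's loop with that replacement applied. It reuses the toy objects from the question (x_names, data, fake) and is meant only as an illustration of the fix, not as the recommended approach.

```r
x_names <- c("x1", "x2", "x3")
data <- c(1, 2, 3, 4)
fake <- c(2, 3, 4, 5)

for (i in x_names) {
  x <- fake
  # cbind() appends a column called "x" on every iteration
  data <- as.data.frame(cbind(data, x))
  # rename the column that was just appended (always the last one)
  colnames(data)[ncol(data)] <- i
}

names(data)
```

Because the new column is always the last one, indexing with ncol(data) avoids having to track a separate counter.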
In R, I am attempting to create a column of a local min/max, based on 2 other columns.
In particular, I want the 3rd column to be a "current" column, and when x1 > current or x2 < current I want to update currentValue. Otherwise, it should be the previous currentValue
Initially, I set the entire y1 column to my starting value.
As can be seen, Row 5 should be using the currentValue of 5, and no change should be made. However, the comparison is being made to the value of 2 instead.
Any help would be greatly appreciated as I am unfamiliar with applying custom rolling functions in R. It seems like there should be an elegant solution for this, but a few other similar posts require a lot of code to accomplish this.
> c1 <- c(1,1,2,5,4,3,2,1)
> c2 <- c(2,3,3,6,6,4,4,2)
> c3 <- 2
> tempData <- data.frame(c1,c2,c3)
> names(tempData) <- c("x1", "x2", "currentValue")
> tempData
x1 x2 currentValue
1 1 2 2
2 1 3 2
3 2 3 2
4 5 6 2
5 4 6 2
6 3 4 2
7 2 4 2
8 1 2 2
>
> tempData$currentValue <- ifelse (tempData$x1 > lag(tempData$currentValue), tempData$x1, ifelse(tempData$x2 < lag(tempData$currentValue), tempData$x2, lag(tempData$currentValue)))
> tempData
x1 x2 currentValue
1 1 2 NA
2 1 3 2
3 2 3 2
4 5 6 5
5 4 6 4
6 3 4 3
7 2 4 2
8 1 2 2
I think this code could help you.
Applying the lag function inside the ifelse statement is problematic; I suspect it is not shifting the column values the way you expect in your code. In any case, check the following code.
c1 <- c(1,1,2,5,4,3,2,1)
c2 <- c(2,3,3,6,6,4,4,2)
c3 <- 2
tempData <- data.frame(c1,c2,c3)
names(tempData) <- c("x1", "x2", "currentValue")
tempData
tempData$x1.lag <- c(NA, tempData$x1[1:7] )
tempData$x2.lag <- c(NA, tempData$x2[1:7] )
tempData
tempData$currentValue <- ifelse (tempData$x1 > tempData$x1.lag , tempData$x1,
ifelse( tempData$x2 < tempData$x2.lag, tempData$x2, tempData$currentValue))
tempData$x1.lag <- NULL
tempData$x2.lag <- NULL
tempData
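Since each row depends on the value computed for the previous row, one possible way to get the behaviour described in the question is a plain loop that carries the current value forward. This is a sketch under one assumption that the question does not state explicitly: the first row is compared against the starting value of 2. The helper variables current and currentValue are introduced here for illustration only.

```r
tempData <- data.frame(x1 = c(1, 1, 2, 5, 4, 3, 2, 1),
                       x2 = c(2, 3, 3, 6, 6, 4, 4, 2))

current <- 2  # assumed starting value, taken from the question
currentValue <- numeric(nrow(tempData))

for (i in seq_len(nrow(tempData))) {
  if (tempData$x1[i] > current) {
    current <- tempData$x1[i]       # x1 exceeds the running value
  } else if (tempData$x2[i] < current) {
    current <- tempData$x2[i]       # x2 falls below the running value
  }
  currentValue[i] <- current        # otherwise keep the previous value
}

tempData$currentValue <- currentValue
tempData
```

With the example data, row 5 keeps the value 5 set on row 4, which is the behaviour the question asks for.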
I am trying to add a new column to multiple data frames, and then replace the original data frame with the new one. This is how I am creating the new data frames:
df1 <- data.frame(X1=c(1,2,3),X2=c(1,2,3))
df2 <- data.frame(X1=c(4,5,6),X2=c(4,5,6))
groups <- list(df1,df2)
groups <- lapply(groups,function(x) cbind(x,X3=x[,1]+x[,2]))
groups
[[1]]
X1 X2 X3
1 1 1 2
2 2 2 4
3 3 3 6
[[2]]
X1 X2 X3
1 4 4 8
2 5 5 10
3 6 6 12
I'm satisfied with how the new data frames have been created. What I'm stuck on is then breaking up my groups list and then saving the list elements back into their respective original data frames.
Desired Output
Essentially, I want to do something like df1,df2 <- groups[[1]],groups[[2]] but that is of course not syntactically valid. I have more than 2 data frames, which is why I'm hoping for a more programmatic approach than simply typing out N lines of code.
for (i in 1:length(groups)){
assign(paste("df",i,sep=""),as.data.frame(groups[[i]]))
}
should do it. Try it out, please.
@Rockbar led me to a general solution as well:
for(i in 1:length(groups)){
assign(names(groups)[i],as.data.frame(groups[[i]]))
}
> df1
X1 X2 X3
1 1 1 2
2 2 2 4
3 3 3 6
> df2
X1 X2 X3
1 4 4 8
2 5 5 10
3 6 6 12
I should note that this only works if the objects in the list are all named. Thank you again @Rockbar for guiding me to this.
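As an alternative to the assign() loop, base R's list2env() copies every element of a named list into an environment in one call. The sketch below assumes the list is named when it is created (here as df1 and df2); lapply() preserves those names.

```r
df1 <- data.frame(X1 = c(1, 2, 3), X2 = c(1, 2, 3))
df2 <- data.frame(X1 = c(4, 5, 6), X2 = c(4, 5, 6))

# name the list elements after the original data frames
groups <- list(df1 = df1, df2 = df2)
groups <- lapply(groups, function(x) cbind(x, X3 = x[, 1] + x[, 2]))

# overwrite df1 and df2 in the current environment with the updated versions
list2env(groups, envir = environment())

df1
df2
```

Like the loop above, this relies on every list element having a name.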
I have an automated script that produces a standard formula (i.e., y~x1+x2) and I would like to screen my data out based on those variables.
So far I have gotten this far, but I hit a sticking point where I can't quite figure it out:
#Example data
df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
df
x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
#Example formula
ex_form = "x~y+u"
#Delete the ~ and add a + sign to be consistent
step1 = gsub("~","+", ex_form)
#Remove + signs
step2 = strsplit(step1, "\\+")
#Final list of variables
step3 = unlist(step2)
Most solutions I've seen are something along the lines of:
#Create list of variables
mylist = c("x", "y", "u")
#Cut data
temp = df[ ,mylist]
temp
x y u
1 1 2 4
2 2 3 5
3 3 4 6
4 4 5 7
5 5 6 8
But this solution doesn't quite fit into the automation...so I need to jump from what I have to that outcome. Any thoughts?
Note: Tags are my guesses.
If you don't wrap your formula in quotes, R recognizes it as a formula object, and you can use all.vars() to extract the variables from it.
ex_form = x~y+u #Without quotes it is a formula, check str(ex_form)
df[, all.vars(ex_form)]
# x y u
#1 1 2 4
#2 2 3 5
#3 3 4 6
#4 4 5 7
#5 5 6 8
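If the automated script can only produce the formula as a character string, a small sketch along the same lines is to convert the string with as.formula() before calling all.vars(). The names vars and temp are chosen here for illustration.

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

ex_form <- "x~y+u"                      # formula stored as a string
vars <- all.vars(as.formula(ex_form))   # variable names used in the formula
temp <- df[, vars]                      # keep only those columns
temp
```

This avoids the manual gsub() and strsplit() steps from the question.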
Am I missing something or does simply doing temp <- df[,step3] return exactly what you say you want?
I have a univariate contingency table that I would like to convert to a data frame.
>t <- table(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
>t
1 2 3 4
4 4 4 4
But converting t to a data frame yields something I don't need:
>data.frame(t)
Var1 Freq
1 1 4
2 2 4
3 3 4
4 4 4
I would like a data frame that looks exactly like the table t, with 4 columns named 1, 2, 3 and 4 (or X1, X2, X3, X4), and one row. Everything I have found, such as as.data.frame.matrix(), returns errors for me, I think because my data is univariate rather than multivariate.
We can use as.data.frame.list()
tbl <- table(rep(1:4, 4))
as.data.frame.list(tbl)
# X1 X2 X3 X4
# 1 4 4 4 4
Or to use the original names, add optional = TRUE
as.data.frame.list(tbl, optional = TRUE)
# 1 2 3 4
# 1 4 4 4 4
I am new to R. I am trying to read data from Excel in the following format:
x1 x2 x3 y1 y2 y3 Result
1 2 3 7 8 9
4 5 6 10 11 12
and the data.frame in R should hold the data for the 1st row in the following format:
x y
1 7
2 8
3 9
Then I want to use lm() and export the result to the Result column.
I want to automate this for n rows, i.e. once the result for the 1st row is exported to Excel, I want to import the data for the second row.
Please Help.
library(gdata)
# this spreadsheet is exactly as in your question
df.original <- read.xls("test.xlsx", sheet="Sheet1", perl="C:/strawberry/perl/bin/perl.exe")
#
#
> df.original
x1 x2 x3 y1 y2 y3
1 1 2 3 7 8 9
2 4 5 6 10 11 12
#
# for the above code you'll just need to change the argument 'perl' to the
# path of your Perl installation
#
# now the example for the first row
#
library(reshape2)
df <- melt(df.original[1,])
df$variable <- substr(df$variable, 1, 1)
df <- as.data.frame(lapply(split(df, df$variable), `[[`, 2))
> df
x y
1 1 7
2 2 8
3 3 9
Now, at this stage we have automated the import/transformation process (for one line).
First question: how do you want the data to look once every line has been processed?
Second question: what exactly do you want to put in Result? Residuals, fitted values? What do you need from lm()?
EDIT:
OK, @kapil, tell me if the final shape of df is what you had in mind:
library(reshape2)
library(plyr)
df <- adply(df.original, 1, melt, .expand=F)
names(df)[1] <- "rowID"
df$variable <- substr(df$variable, 1, 1)
rows <- df$rowID[ df$variable=="x"] # with y would be the same (they are expected to have the same length)
df <- as.data.frame(lapply(split(df, df$variable), `[[`, c("value")))
df$rowID <- rows
df <- df[c("rowID", "x", "y")]
> df
rowID x y
1 1 1 7
2 1 2 8
3 1 3 9
4 2 4 10
5 2 5 11
6 2 6 12
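Regarding the coefficients, you can calculate them for each rowID (which refers to the actual row in the xls file) in this way:
# fit one model per rowID; note that lm() must use the group subset z, not the full df
model <- dlply(df, .(rowID), function(z) {print(z); lm(y ~ x, data = z);})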
> sapply(model, `[`, "coefficients")
$`1.coefficients`
(Intercept) x
6 1
$`2.coefficients`
(Intercept) x
6 1
So, for each group (or row in the original spreadsheet) you have, as expected, two coefficients: intercept and slope. I therefore can't figure out how you want the coefficients to fit inside the data.frame (especially in the 'long' format shown just above). But if you want the data.frame to stay in 'wide' format, you can try this:
# once you have the object model, you can put the coefficients in the df.original data.frame
#
> ldply(model, `[[`, "coefficients")
rowID (Intercept) x
1 1 6 1
2 2 6 1
df.modified <- cbind(df.original, ldply(model, `[[`, "coefficients"))
> df.modified
x1 x2 x3 y1 y2 y3 rowID (Intercept) x
1 1 2 3 7 8 9 1 6 1
2 4 5 6 10 11 12 2 6 1
# of course, if you don't like it, you can remove rowID with df.modified$rowID <- NULL
Hope this helps, and let me know if you wanted the 'long' version of df.
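For comparison, the per-row fit can also be sketched in base R without reshape2 or plyr. This is only an illustration: df.original is rebuilt by hand here instead of being read from Excel, the column prefixes x and y are assumed to identify the predictor and response columns, and the output column names intercept and slope are made up for the example.

```r
# stand-in for the spreadsheet contents shown in the question
df.original <- data.frame(x1 = c(1, 4), x2 = c(2, 5), x3 = c(3, 6),
                          y1 = c(7, 10), y2 = c(8, 11), y3 = c(9, 12))

x_cols <- grep("^x", names(df.original), value = TRUE)
y_cols <- grep("^y", names(df.original), value = TRUE)

# fit y ~ x separately for every spreadsheet row and collect the coefficients
coefs <- t(apply(df.original, 1, function(row) {
  fit <- lm(y ~ x, data = data.frame(x = row[x_cols], y = row[y_cols]))
  coef(fit)
}))

# append the coefficients as new columns, keeping the 'wide' layout
df.result <- cbind(df.original, intercept = coefs[, 1], slope = coefs[, 2])
df.result
```

Each row gets its own model, so the coefficient columns line up with the rows of the original sheet.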