I have the following sample data frame:
x<-c(1:4)
y<-c(9:12)
z<-c("a","b","c","d")
data<-data.frame(x,y,z) # as data:
x y z
1 1 9 a
2 2 10 b
3 3 11 c
4 4 12 d
I want to extract column 2 or 3 with a function (note: I select the columns by name). My code is as follows:
data_frame<-function(col){
cols<-c("y","z")
# column x is already there; it is not in a vector of col.
if (col %in% cols){
kk<-data[,c("x","col")]
return (kk)}
}
Now, I want the output for data_frame("y"). However, R gives me the following error:
data_frame("y")
Error in `[.data.frame`(data, , c("x", "col")) :
undefined columns selected.
I was wondering why R is not using my argument col, which is "y" here. I am puzzled that R treats the argument col as the literal name of a column. Your valuable suggestions in this regard would be highly appreciated.
This part: kk<-data[,c("x","col")] should be kk<-data[,c("x",col)]. In quotes, "col" is the literal string "col" rather than the value of the argument, and data has no column with that name.
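As a minimal sketch of the corrected function: the name select_with_x and the extra df argument are illustrative choices, not part of the original code, and passing the data frame explicitly is only one possible design.

```r
# Rebuild the sample data from the question.
x <- 1:4
y <- 9:12
z <- c("a", "b", "c", "d")
data <- data.frame(x, y, z)

# Hypothetical helper: returns column x plus the requested column.
select_with_x <- function(df, col) {
  allowed <- c("y", "z")
  if (!col %in% allowed) {
    stop("col must be one of: ", paste(allowed, collapse = ", "))
  }
  # col is unquoted here, so its value ("y" or "z") is used as the column name.
  df[, c("x", col)]
}

result <- select_with_x(data, "y")
print(result)
```

Calling select_with_x(data, "z") returns columns x and z in the same way; any other name stops with an error instead of silently returning NULL.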
I have 2 vectors. I am trying to create a tibble with all combinations of the 2 vectors, but I get the following error.
C <- c(1,2,3,4)
G <- c(1,2,3,4,5)
tibble('C' = rep(C, each = length(G)), 'G' = rep(G, length(C)))
Error: Column `C` must be length 1 or 100, not 20
The error disappears when I rename column 'C' to, for example, 'A'.
We also don't get the same error with a data.frame.
I suspect length(C) takes the 'C' value from the tibble.
Is this intended behaviour?
If so, can someone explain how this is useful in practice (i.e., how would someone take advantage of this in their code)?
Because tibbles are an extension to data.frame, and not an exact drop-in replacement, you can do things like:
tibble(a=1:3, b=a+1)
## A tibble: 3 x 2
# a b
# <int> <dbl>
#1 1 2
#2 2 3
#3 3 4
...where you can reference earlier created columns. And your example is an instance of when that might be a problem.
To quote the manual:
"Arguments are evaluated sequentially, so you can refer to previously
created variables."
Source: http://tibble.tidyverse.org/reference/tibble.html
So in this case, the C in rep(G, length(C)) is actually referencing the tibblename$C you just created, which is length 20, rather than the vector C in the global environment, which is length 4.
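One way to sidestep the name clash, sketched here assuming the tibble package is installed, is to compute the lengths before calling tibble(). The names n_C, n_G, and combos are introduced only for this illustration.

```r
library(tibble)

C <- c(1, 2, 3, 4)
G <- c(1, 2, 3, 4, 5)

# Capture the lengths up front, so nothing inside tibble() has to look up
# a name that an earlier column may already have shadowed.
n_C <- length(C)
n_G <- length(G)

combos <- tibble(
  C = rep(C, each = n_G),   # column C does not exist yet, so this is the global C
  G = rep(G, times = n_C)   # n_C is the length of the original vector, not the column
)

print(nrow(combos))
```

Base R's expand.grid(C = C, G = G) is another option for building all combinations, although it orders the rows differently.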
As a newbie I wanted to get better at loops and if...else statements in R. I am trying to replace NAs using for loops and if...else instead of ifelse and lapply. However, I couldn't index the data properly in the if...else bit.
Example:
data<-data.frame(a<-c("a","b","c","d"),
b<-c("1","2",NA,"5"),
c<-c("10",NA,"30",40))
for (i in data){
for (x in 1:nrow(i)){
if (x==NA) {
x<-mean(i,na.rm=T)
}else
x<-x
}
I get an error saying "Error in 1:nrow(i) : argument of length 0". Any suggestions?
To address your error first: as you loop through the data frame, i is a 1D vector (i.e., a column of the data frame) and so nrow doesn't make any sense. To see this, try for(i in data)print(nrow(i)).
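A small sketch of that point, using a made-up data frame df: each loop variable is a plain vector, so nrow() returns NULL and length() or seq_along() is the right tool.

```r
df <- data.frame(a = 1:3, b = 4:6)

# Iterating over a data frame yields its columns, one vector at a time.
row_counts <- lapply(df, nrow)     # nrow() of a plain vector is NULL
col_lengths <- sapply(df, length)  # length() gives the number of elements

print(row_counts)
print(col_lengths)

# seq_along() is a safe way to index the elements of each column.
for (col in df) {
  for (k in seq_along(col)) {
    print(col[k])
  }
}
```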
You're declaring individual vectors outside a data frame when you use the following syntax:
data<-data.frame(a<-c("a","b","c","d"),
b<-c("1","2",NA,"5"),
c<-c("10",NA,"30",40))
Just try typing a and you'll see it exists outside the data frame. Also, it means the data frame is defined incorrectly. Check it out:
a....c..a....b....c....d.. b....c..1....2...NA...5..
1 a 1
2 b 2
3 c <NA>
4 d 5
c....c..10...NA...30...40.
1 10
2 <NA>
3 30
4 40
What you actually need is the following:
data <- data.frame(a = c("a","b","c","d"),
b = c("1","2",NA,"5"),
c = c("10",NA,"30",40))
which gives
a b c
1 a 1 10
2 b 2 <NA>
3 c <NA> 30
4 d 5 40
Also, your braces for the loops don't match up correctly.
If you examine the class of each column in data by running lapply(data, class), you'll see they're all factors (or character vectors in R 4.0 and later, where stringsAsFactors defaults to FALSE). Taking the mean, as you try to do in your code, is therefore meaningless. If columns b and c are meant to be numerics, then you don't need the quotation marks in their definition, like this:
data <- data.frame(a = c("a", "b", "c", "d"),
b = c(1, 2, NA, 5),
c = c(10, NA, 30 ,40))
If column a was also a numeric, you could achieve your objective with this:
for(i in 1:ncol(data)){
data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE)
}
from here.
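Since column a is not numeric in this example, a hedged variant of the loop above skips non-numeric columns. The is.numeric() guard is an addition for illustration, not part of the quoted answer.

```r
data <- data.frame(a = c("a", "b", "c", "d"),
                   b = c(1, 2, NA, 5),
                   c = c(10, NA, 30, 40))

for (i in seq_along(data)) {
  # Only numeric columns have a meaningful mean; leave the others untouched.
  if (is.numeric(data[[i]])) {
    missing <- is.na(data[[i]])
    data[[i]][missing] <- mean(data[[i]], na.rm = TRUE)
  }
}

print(data)
```

Here the NA in b is replaced by the mean of 1, 2 and 5, and the NA in c by the mean of 10, 30 and 40.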
When checking for the existence of NAs you have to use the is.na() function, since NAs behave much like NULLs in relational databases: comparing anything with NA yields NA rather than TRUE or FALSE.
As an illustration of how it works, you can run the following lines in your R console and check the outputs:
1 == 1
1 == 2
1 == NA
NA == NA
is.na(NA)
This being said, if what you want is to replace NA values in your data frame with column means, you can check this previous question.
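To tie this back to the loop in the question, here is a minimal sketch on a single made-up vector v, using is.na() inside if instead of == NA.

```r
v <- c(10, NA, 30)

# Comparing with NA never yields TRUE or FALSE, so it cannot drive an if().
cmp <- v == NA      # every element is NA
flags <- is.na(v)   # FALSE TRUE FALSE

# Element-wise replacement with an explicit loop and if.
for (k in seq_along(v)) {
  if (is.na(v[k])) {
    v[k] <- mean(v, na.rm = TRUE)
  }
}

print(v)
```

Writing if (v[k] == NA) instead would stop with a "missing value where TRUE/FALSE needed" error, because the condition itself evaluates to NA.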
I am working with some data using pamr and trying to do a prediction analysis of microarrays. I tried an example from this package and it worked well, as follows.
x <- matrix(rnorm(1000*20),ncol=20)
y <- sample(c(1:4),size=20,replace=TRUE)
mydata <- list(x=x,y=y)
mytrain <- pamr.train(mydata)
123456789101112131415161718192021222324252627282930
mycv <- pamr.cv(mytrain,mydata)
1234Fold 1 :123456789101112131415161718192021222324252627282930
Fold 2 :123456789101112131415161718192021222324252627282930
Fold 3 :123456789101112131415161718192021222324252627282930
pamr.predict(mytrain, mydata$x , threshold=1)
[1] 1 3 1 2 1 3 2 2 4 3 2 1 4 2 3 1 2 1 2 4
Levels: 1 2 3 4
However, when I run the same code on my own data, I receive the following error:
Error in 1:ncol(data$x) : argument of length 0
z=read.table("shishi.txt",sep="\t",header=T)
mytrain <- pamr.train(Z)
Error in 1:ncol(data$x) : argument of length 0
My data is in the same format as the example in the package, as follows:
Does the error mean that there are no values in the columns? How do I deal with the error? Thanks.
From "pamr" manual:
The input data. A list with components: x- an expression genes in the
rows, samples in the columns), and y- a vector of the class labels for
each sample. Optional components- genenames, a vector of gene names,
and geneid- a vector of gene identifiers.
In your example you've created a list with these characteristics in mydata <- list(x=x,y=y), but you did not do so with your actual data: there you passed the data frame itself, which has no x component, so data$x is NULL.
After reading the table into R, with z=read.table("shishi.txt",sep="\t",header=T), you should create a list with mydata <- list(x=z,y=samplegroups), where samplegroups is your sample group vector.
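A sketch of that list construction is below. The simulated z stands in for the real file, the samplegroups values are hypothetical labels, and converting z with as.matrix() is an assumption on my part that pamr wants a numeric matrix rather than a data frame. The pamr call itself is left commented out.

```r
set.seed(1)

# Stand-in for z <- read.table("shishi.txt", sep = "\t", header = TRUE):
# 50 genes in rows, 6 samples in columns.
z <- as.data.frame(matrix(rnorm(50 * 6), ncol = 6))

# Hypothetical class labels, one per sample (column) of z.
samplegroups <- c(1, 1, 1, 2, 2, 2)

# x holds the expression values, y the class label of each sample.
mydata <- list(x = as.matrix(z), y = samplegroups)

# One label per column of x is required.
stopifnot(ncol(mydata$x) == length(mydata$y))

# mytrain <- pamr::pamr.train(mydata)  # requires the pamr package
```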
Having an issue here - I'm creating a function using the ellipsis (...) parameter to deal with a varying number of function parameters. I recreated a similar situation to show the issue I keep bumping into:
> d <- data.frame(alpha=1:3, beta=4:6, gamma=7:9)
> d
alpha beta gamma
1 1 4 7
2 2 5 8
3 3 6 9
> x <- list("alpha", "beta")
> rowSums(d[,c(x)])
Error in .subset(x, j) : invalid subscript type 'list'
How do I deal with the issue of feeding a list into a subset call?
We need to use c() (concatenate) to create a vector instead of a list
x <- c("alpha", "beta")
rowSums(d[x])
#[1] 5 7 9
and if we are using a list, then unlist it to create a vector, since a data.frame takes a vector of column names (or column indices) or row names (or row indices) to subset the columns or rows
x <- list("alpha", "beta")
rowSums(d[unlist(x)])
#[1] 5 7 9
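Since the question mentions the ellipsis parameter, here is a minimal sketch of how the same idea might look inside a function. The name sum_cols is hypothetical; the arguments captured by ... arrive as a list and are flattened with unlist() before subsetting.

```r
d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9)

# Hypothetical helper: sums, row by row, the columns named in ...
sum_cols <- function(df, ...) {
  cols <- unlist(list(...))  # list(...) is a list; unlist() makes it a character vector
  rowSums(df[cols])
}

totals <- sum_cols(d, "alpha", "beta")
print(totals)
```

Because unlist() flattens nested input, sum_cols(d, list("alpha", "beta")) and sum_cols(d, c("alpha", "beta")) give the same result.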
First of all, I would like to say that I am new to R programming. I was experimenting with some R code and ran into some strange behaviour that I did not expect. I hope someone can help me figure it out.
I ran the following code to read data from a CSV file:
normData= read.csv("normData.csv");
and my normData looks like:
But when I ran the following code to form a data frame:
datExpr0 = as.data.frame(t(normData));
I get the following data:
Can someone please tell me where the extra row (V1, V2, V3, V4, V5, V6) is coming from?
Try using:
setNames(as.data.frame(t(normData[-1])), normData[[1]])
However, it might be better to see if you can use the row.names argument in read.table to directly read your "X" as the row names. Then you should be able to directly use as.data.frame(t(...)).
Here's a small example to show what's happening:
Start with a data.frame with characters as the first column:
df <- data.frame(A = letters[1:3],
B = 1:3, C = 4:6)
df
# A B C
# 1 a 1 4
# 2 b 2 5
# 3 c 3 6
When you transpose the entire thing, you also transpose that first column (thereby also creating a character matrix).
as.data.frame(t(df))
# V1 V2 V3
# A a b c
# B 1 2 3
# C 4 5 6
So, we drop the column first, and use the values from the column to replace the "V1", "V2"... names.
setNames(as.data.frame(t(df[-1])), df[[1]])
# a b c
# B 1 2 3
# C 4 5 6
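The row.names suggestion from earlier in this answer can be sketched as follows. The inline csv_text is made-up data standing in for normData.csv, with a first column named X holding the identifiers.

```r
# Made-up CSV content standing in for normData.csv.
csv_text <- "X,s1,s2
g1,1,4
g2,2,5
g3,3,6"

# row.names = 1 uses the first column as row names instead of data.
normData <- read.csv(text = csv_text, row.names = 1)

# Everything left is numeric, so the transpose stays numeric and the
# identifiers become the column names.
datExpr0 <- as.data.frame(t(normData))
print(datExpr0)
```

With a real file, the call would be read.csv("normData.csv", row.names = 1), provided the values in the first column are unique.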