R: object with negative row.names value

I think I have the same issue with this: What's the difference between row.names() and attributes$row.names?
When I use dput now I get something like this:
-0.0120067403271522, -0.00712477902137182, -0.0105058179972997,
-0.0115956365572667, -0.00507521571067687, -0.013870827853567,
-0.0160501419238977, -0.00225243465241482, -0.0145865320678265,
-0.00118232647592066, -0.0190385732141539, 0.0108223868283294,
-0.0159300331503545, 0.0319315053338279, 0, 0.00315703437341087,
0.0368045045454188, -0.0276264287281491, -0.0101235678857984,
0.00486601316019395)), class = "data.frame", row.names = c(NA,
-11834L))
I discovered this while trying to set the row names explicitly with rownames(var) <- c(list_of_row_names).
I get the error:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
The object does contain values. Can anyone tell me how to undo or fix this?
As I understand it, this happened because R didn't know the row names when the object was created?

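The length of list_of_row_names does not match the nrow() of the data frame. The row.names = c(NA, -11834L) in your dput output is simply R's compact way of storing automatic row names 1 to 11834; it is not a sign of a broken object.
See the example below: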
df <- data.frame(1:5)
list_of_row_names <- letters[1:4]
rownames(df) <- list_of_row_names
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  invalid 'row.names' length
nrow(df)
#[1] 5
length(list_of_row_names)
# [1] 4
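To connect this back to the dput output in the question, here is a minimal sketch of inspecting the compact row-name form and then assigning row names of the correct length. It uses a small made-up data frame; df_demo and new_names are illustrative names, not from the question.

```r
# Small stand-in for the real object (illustrative data only)
df_demo <- data.frame(x = 1:5)

# Internal compact form: c(NA, -n) means "automatic row names 1..n"
compact <- .row_names_info(df_demo, 0L)

# What the accessor reports: the expanded row names as character strings
expanded <- rownames(df_demo)

# Replacement names must have exactly nrow(df_demo) elements
new_names <- letters[seq_len(nrow(df_demo))]
rownames(df_demo) <- new_names
```

Building the replacement from nrow() of the target, or checking length(new_names) == nrow(df_demo) first, avoids the "invalid 'row.names' length" error.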

Related

Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "character"

I am trying to extract some columns from data that is the result of an analysis. The data has 592 rows and 20 variables.
When I run the code below, I get this error message:
"Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "character""
unused_cols <- -c(2:9)
pvals_long <- pvals %>%
  map(function(x){
    x <- x %>%
      dplyr::select(unused_cols) %>%
      gather(key = "celltype_pair", value = "pvalue", -interacting_pair)
    x
  })
Thanks in advance,
map is not needed. Mapping on a dataframe means that you are trying to apply your function on each column. However, select expects a dataframe, while in your code it gets a vector. That's what the error is telling you.
unused_cols <- -c(2:9) will not work. Put the - in the call to select.
Try this:
library(dplyr)
library(tidyr)
unused_cols <- c(2:9)
pvals_long <- pvals %>%
  select(-unused_cols) %>%
  gather(key = "celltype_pair", value = "pvalue", -interacting_pair)
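To see why the original code hands select a vector, note that a data frame is a list of columns, so mapping over it visits one column at a time. A minimal base-R sketch follows; lapply stands in for purrr::map here, and pvals_demo is a made-up data frame, not the asker's data.

```r
# Made-up stand-in for the real pvals table
pvals_demo <- data.frame(
  interacting_pair = c("A_B", "C_D"),
  cell1 = c(0.01, 0.20),
  cell2 = c(0.50, 0.03),
  stringsAsFactors = FALSE
)

# Each element the function receives is a bare column, not a data frame
seen_classes <- lapply(pvals_demo, function(x) class(x))

# Dropping columns by position on the whole data frame works directly
drop_cols <- 2
kept <- pvals_demo[, -drop_cols, drop = FALSE]
```

The first element of seen_classes is "character", which matches the class named in the error message.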

How to prevent coercion to list in R

I am trying to remove all NA values from two columns in a matrix and make sure that neither column has a value that the other doesn't.
code:
data <- dget(file)
dependent <- data[,"chroma"]
independent <- data[,"mass..Pantheria."]
names(independent) <- names(dependent) <- rownames(data)
for (name in rownames(data)) {
  if(is.na(dependent[name])) {
    independent$name <- NULL
    dependent$name <- NULL
  }
  if(is.na(independent[name])) {
    independent$name <- NULL
    dependent$name <- NULL
  }
}
print(dput(independent))
print(dput(dependent))
I am brand new to R and am trying to perform this task with a for loop. However, when I delete a section by assigning NULL I receive the following warning:
1: In independent$Aeretes_melanopterus <- NULL : Coercing LHS to a list
2: In dependent$name <- NULL : Coercing LHS to a list
No elements are deleted and independent and dependent retain all their original rows.
file (input):
structure(list(chroma = c(7.443501276, 10.96156313, 13.2987235,
17.58110922, 13.4991105), mass..Pantheria. = c(NA, 126.57, NA,
160.42, 250.57)), .Names = c("chroma", "mass..Pantheria."), class = "data.frame", row.names = c("Aeretes_melanopterus",
"Ammospermophilus_harrisii", "Ammospermophilus_insularis", "Ammospermophilus_nelsoni",
"Atlantoxerus_getulus"))
                              chroma mass..Pantheria.
Aeretes_melanopterus        7.443501               NA
Ammospermophilus_harrisii  10.961563           126.57
Ammospermophilus_insularis 13.298723               NA
Ammospermophilus_nelsoni   17.581109           160.42
Atlantoxerus_getulus       13.499111           250.57
desired output:
structure(list(chroma = c(10.96156313, 17.58110922, 13.4991105
), mass..Pantheria. = c(126.57, 160.42, 250.57)), .Names = c("chroma",
"mass..Pantheria."), class = "data.frame", row.names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
                            chroma mass..Pantheria.
Ammospermophilus_harrisii 10.96156           126.57
Ammospermophilus_nelsoni  17.58111           160.42
Atlantoxerus_getulus      13.49911           250.57
structure(c(126.57, 160.42, 250.57), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii  Ammospermophilus_nelsoni      Atlantoxerus_getulus
                   126.57                    160.42                    250.57
structure(c(10.96156313, 17.58110922, 13.4991105), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii  Ammospermophilus_nelsoni      Atlantoxerus_getulus
                 10.96156                  17.58111                  13.49911
Looks like you want to omit rows from your data where chroma or mass..Pantheria are NA. Here's a quick way to do it:
data = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
I'm not sure why you are breaking independent and dependent out separately, but after filtering out bad observations is a good time to do it.
Since those are your only two columns, this is equivalent to omitting rows from your data frame that have any NA values, so you can use a shortcut like this:
data = na.omit(data)
If you want to keep a "pristine" copy of your raw data, simply change the name of the result:
data_no_na = na.omit(data)
# or
data_no_na = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
As to what's wrong with your code: $ extracts columns from a data frame (or elements from a list), but you're using it on a named atomic vector (since you've already extracted the columns), which doesn't work. That is why R warns "Coercing LHS to a list". Even on a data frame, $ only works with a literal name; you can't use it with a variable that holds the name, so independent$name refers to an element literally called "name". For data frames, use brackets to extract columns whose names are stored in variables. For example, the built-in mtcars data has a column called "mpg":
# these work:
mtcars$mpg
mtcars[, "mpg"]
my_col = "mpg"
mtcars[, my_col]
mtcars$my_col ## does not work, need to use brackets!
You can never use $ with row names in a data frame, only column names.
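For the named vectors in the question, elements are dropped by subsetting with brackets rather than by assigning NULL through $. A minimal sketch on a cut-down copy of the posted data (keep is an illustrative helper name):

```r
# Three of the five species from the question's input
dependent <- c(Aeretes_melanopterus = 7.443501276,
               Ammospermophilus_harrisii = 10.96156313,
               Ammospermophilus_insularis = 13.2987235)
independent <- c(Aeretes_melanopterus = NA,
                 Ammospermophilus_harrisii = 126.57,
                 Ammospermophilus_insularis = NA)

# TRUE where both vectors have a value for that species
keep <- !is.na(dependent) & !is.na(independent)

# Bracket subsetting keeps the names and drops the unwanted elements
dependent <- dependent[keep]
independent <- independent[keep]
```

Because the same mask is applied to both vectors, they stay aligned by species without any loop.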

PCA with result non-interactively in R

I would like to run a PCA in R with the ade4 package.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I do :
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what that means. Do you have any idea?
Thank you so much
I'd suggest sharing a data set/example others could access, if possible. This seems data-specific and with NAs introduced by coercion you may want to check the type of your input - typeof(PAYSAGE) - the manual for dudi.pca states it takes a data frame of numeric values as input.
Yes, for example :
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)
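On the suggestion to check the input type: typeof() on any data frame returns "list", so it cannot show whether individual columns are numeric. A minimal sketch of a per-column check, using a hypothetical data frame (paysage_demo is made up, with one column deliberately read in as text):

```r
# Hypothetical data frame where one column was read in as text
paysage_demo <- data.frame(
  ag_div = c(75362, 68795, 78384),
  canne = c("0", "5214", "n/a"),
  stringsAsFactors = FALSE
)

# typeof() on a data frame describes the container, not its columns
frame_type <- typeof(paysage_demo)

# Check each column individually instead
is_num <- vapply(paysage_demo, is.numeric, logical(1))
non_numeric <- names(paysage_demo)[!is_num]
```

If every column is numeric, as in the example data posted above, the NA more likely comes from the axis-count prompt not receiving a number when the script is run non-interactively. In that case, passing scannf = FALSE together with an explicit nf (for example nf = 2, an illustrative choice) is the form the message itself points to.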

replacing a value in column X based on columns Y with R

i've gone through several answers and tried the following but each either yields an error or an un-wanted result:
here's the data:
Network             Campaign
Moburst_Chartboost  Test Campaign
Moburst_Chartboost  Test Campaign
Moburst_Appnext     unknown
Moburst_Appnext     1065
i'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". i realize this should be very simple but trying out these:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
  if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
this yields an error: Warning messages:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
then i tried:
within(dataset, {
  dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
this turned ALL 4 values in column "Campaign" into "1055", overwriting what was there even when the condition isn't met
also this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
yields this error, and replaced the values in the two first rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
scratching my head here. new to R but this shouldn't be so hard :(
In your first attempt, the loop runs over the rows, but the condition and the assignment never use i, so each pass tests the whole Network column and overwrites the whole Campaign column.
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if else, where if the condition in col 1 is met, col 2 is changed, otherwise it remains the same.
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", "1065"),
                      stringsAsFactors = FALSE)
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
                           "1055",
                           dataset$Campaign)
head(dataset)
             Network Campaign
1 Moburst_Chartboost     1055
2 Moburst_Chartboost     1055
3    Moburst_Appnext  unknown
4    Moburst_Appnext     1065
You may also try dataset$Campaign[dataset$Campaign=="Test Campaign"]<-1055 to avoid the use of loops and ifelse statements.
where dataset is:
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", 1065))
Try the following
dataset = read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = F)
for( i in 1:nrow(dataset)){
  if(dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems you forgot the index variable. Without [i] you work on the whole column of the data frame, which produces the warning you mentioned.
Note that I added stringsAsFactors = F to the read.csv() call to make sure the strings are read as character strings and not factors. With factors, the assignment would produce a warning like this:
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
  invalid factor level, NA generated
Alternatively you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'
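If the data has already been read in with factor columns, the same index-based replacement works once the column is converted to character. A minimal sketch, rebuilding the question's four rows by hand (dataset_demo is an illustrative name, and stringsAsFactors = TRUE is set deliberately to mimic the factor case):

```r
# Recreate the question's data with factor columns on purpose
dataset_demo <- data.frame(
  Network = c("Moburst_Chartboost", "Moburst_Chartboost",
              "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = TRUE
)

# Convert the factor to plain strings so any new value is allowed
dataset_demo$Campaign <- as.character(dataset_demo$Campaign)

# Positions of the rows to change, then a vectorised assignment
idx <- which(dataset_demo$Network == "Moburst_Chartboost")
dataset_demo$Campaign[idx] <- "1055"
```

Converting first avoids the "invalid factor level, NA generated" warning, because "1055" is not one of the existing factor levels.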
Thank you for the help! It's not elegant, but since this was still on my mind when I went to sleep last night, I decided to bludgeon it with some ugly code, and that worked too, just as a workaround: I split the data into two data frames, replaced all the values in one, and then bound them back together.
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling but worked :)

Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”

I am currently facing an error mentioned below which is related to NULL values being coerced to a data frame. The data set does contain nulls, however I have tried both is.na() and is.null() functions to replace the null values with something else. The data is stored on hdfs and is stored in a pig.hive format. I have also attached the code below. The code works fine if I remove v[,25] from the key.
Code:
AM = c("AN");
UK = c("PP");
sample.map <- function(k,v){
  key <- data.frame(acc = v[!which(is.na(v[,1])),1],
                    year = substr(v[!which(is.na(v[,1])),2],1,4),
                    month = substr(v[!which(is.na(v[,1])),2],5,6))
  value <- data.frame(v[,3],count=1)
  keyval(key,value)
}
sample.reduce <- function(key,v){
  AT <- sum(v[which(v[,1] %in% AM=="TRUE"),2])
  UnknownT <- sum(v[which(v[,1] %in% UK=="TRUE"),2])
  Total <- AT + UnknownT
  d <- data.frame(AT,UnknownT,Total)
  keyval(key,d)
}
out <- mapreduce(input = "/user/hduser/input",
                 output = "/user/hduser/output",
                 input.format = make.input.format("pig.hive", sep = "\u0001"),
                 output.format = make.output.format("csv", sep = ","),
                 map = sample.map,
                 reduce = sample.reduce)
Error:
Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) :
  data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) :
  number of items to replace is not a multiple of replacement length
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) :
  data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) :
  number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) :
  no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as
No traceback available
UPDATE
I have added the sample data and edited the code above. Hope this helps!
Sample Data:
NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"
There are some issues with as which have been identified recently. I don't see why as can't handle this by default, but you can modify coerce which handles the conversion with an S4 method to call as.data.frame.
setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from))
[1] "coerce"
as(NULL,"data.frame")
data frame with 0 columns and 0 rows
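Separately from the coerce workaround, the row filter in the question's map function is worth a look. The sketch below uses a made-up three-row data frame (v_demo is hypothetical, modelled on the sample data) and suggests that !which(is.na(...)) selects no rows at all, which may be contributing to the empty key. A logical mask with !is.na(...) is the usual way to keep the non-missing rows.

```r
# Hypothetical stand-in for the value data frame passed to the mapper
v_demo <- data.frame(
  acc = c(NA, 345689202, 234539390),
  date = c("2014-03-14", "2014-03-14", "2014-03-14"),
  code = c("PP", "AN", "PP"),
  stringsAsFactors = FALSE
)

# which() returns positions (here 1); negating a non-zero number gives FALSE,
# so this index selects nothing
with_which <- v_demo[!which(is.na(v_demo[, 1])), 1]

# Logical mask: TRUE for every row whose first column is present
with_mask <- v_demo[!is.na(v_demo[, 1]), 1]
```

Whether this explains the NULL in the reducer depends on the real data and on how rmr handles an empty key, so treat it as something to test rather than a confirmed diagnosis.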
