I have extracted this dataframe:
> df<-as.data.frame(model_rf$variable.importance)
> df
Importance
DayOfWeek 3.763932e+11
Customers 1.364059e+12
Open 6.345289e+11
Promo 2.617495e+11
StateHoliday 5.196666e+09
SchoolHoliday 6.522969e+09
DateYear 7.035399e+09
DateMonth 2.013482e+10
DateDay 3.763177e+10
DateWeek 3.283496e+10
StoreType 3.156843e+10
Assortment 2.025741e+10
CompetitionDistance 1.118476e+11
CompetitionOpenSinceMonth 4.633220e+10
CompetitionOpenSinceYear 4.554890e+10
Promo2 0.000000e+00
Promo2SinceWeek 5.066674e+10
Promo2SinceYear 4.096407e+10
CompetitionOpen 3.992745e+10
PromoOpen 2.831936e+10
IspromoinSales 2.844220e+09
Then I want to extract the values as a separate column:
> v<-as.vector(model_rf$variable.importance$Importance)
> v
[1] 3.763932e+11 1.364059e+12 6.345289e+11 2.617495e+11 5.196666e+09 6.522969e+09 7.035399e+09 2.013482e+10 3.763177e+10
[10] 3.283496e+10 3.156843e+10 2.025741e+10 1.118476e+11 4.633220e+10 4.554890e+10 0.000000e+00 5.066674e+10 4.096407e+10
[19] 3.992745e+10 2.831936e+10 2.844220e+09
And the names of each row as another column:
> w<-(as.vector((row.names(df))))
> w
[1] "DayOfWeek" "Customers" "Open" "Promo"
[5] "StateHoliday" "SchoolHoliday" "DateYear" "DateMonth"
[9] "DateDay" "DateWeek" "StoreType" "Assortment"
[13] "CompetitionDistance" "CompetitionOpenSinceMonth" "CompetitionOpenSinceYear" "Promo2"
[17] "Promo2SinceWeek" "Promo2SinceYear" "CompetitionOpen" "PromoOpen"
[21] "IspromoinSales"
Then I need to build a data frame from the two vectors above:
DF<-as.data.frame(w,v)
Warning message:
In as.data.frame.vector(x, ..., nm = nm) :
'row.names' is not a character vector of length 21 -- omitting it. Will be an error!
In fact, it seems that w is not converted to a vector even though I used as.vector. Its class is still character.
> class(w)
[1] "character"
How do you explain this please?
Try this code:
DF<-as.data.frame(cbind(w,v))
If you look at the documentation of as.data.frame you see that the function expects the second vector to be a character vector for row names.
In your case, you supplied first the row names and then the values, leading to the error above.
You can either use
as.data.frame(v,w)
or
data.frame(w,v)
to get your desired result.
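As a minimal sketch of the point above, assuming the importances sit in a named numeric vector (the name imp and the column names variable and importance are made up for illustration; the three values are copied from the question): data.frame keeps the numbers numeric, whereas cbind first builds a character matrix. It also shows that a character vector is already a vector, which is why class(w) reports "character".

```r
# Hypothetical stand-in for the importance values (first three rows of the question)
imp <- c(DayOfWeek = 3.763932e+11, Customers = 1.364059e+12, Open = 6.345289e+11)

w <- names(imp)    # character vector of variable names
v <- unname(imp)   # plain numeric vector of importances

# data.frame() takes the columns as separate arguments and keeps their types
DF <- data.frame(variable = w, importance = v, stringsAsFactors = FALSE)

is.vector(w)                 # a character vector is still a vector
is.character(cbind(w, v))    # cbind() coerces both columns to character
is.numeric(DF$importance)    # the data.frame() route keeps v numeric
```

If the numeric type matters later (sorting or plotting by importance, say), the data.frame(w, v) form is probably the safer of the suggestions.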
I'd like to change several strings in a vector. In my case, I have the all.images object:
# Original character's list
all.images <-c("S2B2A_20171003_124_IndianaIIPR00911120170922_BOA_10.tif",
"S2B2A_20181028_124_IndianaIIPR0065820181024_BOA_10.tif",
"S2B2A_20170715_124_SantaMariaCalcasPR0033420170731_BOA_10.tif",
"S2B2A_20180928_124_NSraAparecidaBortolettoPR0042720180912_BOA_10.tif",
"S2A2A_20170610_124_LagoaAmarelaPR0022020170619_BOA_10.tif",
"S2A2A_20160705_124_AguaSumidaPR001320160629_BOA_10.tif",
"S2A2A_20181023_124_SaoPedroGabrielGarciaPR001720181031_BOA_10.tif",
"S2B2A_20180908_124_NSraAparecidaBortolettoPR001920180911_BOA_10.tif",
"S2A2A_20180824_124_NSraAparecidaBortolettoPR0043320180911_BOA_10.tif",
"S2A2A_20170720_124_VoAnaPR001520170802_BOA_10.tif",
"S2B2A_20180322_124_SaoMateusPR0021920180314_BOA_10.tif",
"S2A2A_20181212_124_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif",
"S2A2A_20180413_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif",
"S2B2A_20170913_124_PerdizesPR0034920170905_BOA_10.tif",
"S2A2A_20170610_124_TresMeninasPR001820170601_BOA_10.tif",
"S2B2A_20180428_081_SantaFeSebastiaoFogacaPR0021020180501_BOA_10.tif",
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0022320180427_BOA_10.tif",
"S2A2A_20170809_124_VoAnaPR001620170803_BOA_10.tif",
"S2B2A_20180819_124_PontalIIPR0012220180801_BOA_10.tif",
"S2B2A_20181214_081_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif",
"S2A2A_20180423_081_SantaFeSebastiaoFogacaPR0033920180427_BOA_10.tif",
"S2A2A_20180814_124_PontalIIPR0012220180801_BOA_10.tif",
"S2B2A_20170715_124_VoAnaPR0015A20170803_BOA_10.tif",
"S2A2A_20160615_124_AguaSumidaPR0011220160627_BOA_10.tif",
"S2A2A_20170720_124_SantaMariaCalcasPR0022820170726_BOA_10.tif",
"S2A2A_20180913_124_SantaMariaCalcasPR001620180829_BOA_10.tif",
"S2B2A_20170804_124_NSraAparecidaBortolettoPR0035720170811_BOA_10.tif",
"S2A2A_20170809_124_SantaFeBaracatPR001920170801_BOA_10.tif",
"S2B2A_20180322_124_NSradeFatimaGlebaAPR001320180403_BOA_10.tif",
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif")
#
My idea is to 1) remove S2B2A_ and _BOA_10.tif; 2) convert the 8 digits after S2B2A_ into a date (e.g. 2017-09-05); 3) move the three digits that follow the date (e.g. 124 or 081) to the end; and 4) split the remaining characters on capital letters and dates (e.g. AguaSumidaPR0011220160627 to AguaSumida-PR00112-2016-06-27).
But when I try:
sub("^\\w+_(\\d+)_(\\d+)_([A-Za-z]+)([A-Z]{2}\\d{3})(\\d)(\\d{4})(\\d{2})(\\d+)_.*",
"\\3_\\4_\\5_\\6-\\7-\\8_\\1_\\2", all.images)
[1] "IndianaII_PR009_1_1120-17-0922_20171003_124"
[2] "IndianaII_PR006_5_8201-81-024_20181028_124"
...
[28] "SantaFeBaracat_PR001_9_2017-08-01_20170809_124"
[29] "NSradeFatimaGlebaA_PR001_3_2018-04-03_20180322_124"
[30] "SantaFeSebastiaoFogaca_PR002_1_9201-80-427_20180508_081"
I get incorrect dates (e.g. in [30] 9201-80-427_20180508_081), and my desired output is:
[1] "IndianaII_PR009111_2017-09-22_2017-10-03_124"
[2] "IndianaII_PR00658_2018-10-24_2018-10-28_124"
...
[28] "SantaFeBaracat_PR0019_2017-08-01_2017-08-09_124"
[29] "NSradeFatimaGlebaA_PR0013_2018-04-03_2018-03-22_124"
[30] "SantaFeSebastiaoFogaca_PR00219_2018-04-27_2018-05-08_081"
Any help, please?
I think this handles the exceptions raised in the comments on your answer, using a lookahead:
sub("^\\w+_(\\d{4})(\\d{2})(\\d{2})_(\\d+)_([A-Za-z]+)([A-Z]{2}\\w+)(?=\\d{8})+(\\d{4})(\\d{2})(\\d+)_.*",
"\\5_\\6_\\7-\\8-\\9_\\1-\\2-\\3_\\4", all.images, perl = TRUE)
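An alternative sketch, not the answerer's method: capture the pieces with regexec/regmatches and let as.Date do the date formatting. It assumes every field code starts with "PR" (true for all names listed in the question) and uses made-up names (imgs, pat, parts, fmt_date). Only two of the file names are used here.

```r
# Two file names taken from the question's all.images
imgs <- c("S2B2A_20171003_124_IndianaIIPR00911120170922_BOA_10.tif",
          "S2B2A_20170715_124_VoAnaPR0015A20170803_BOA_10.tif")

# Groups: image date, 3-digit code, name, field code (assumed to start
# with "PR"), field date
pat <- "^[^_]+_(\\d{8})_(\\d{3})_([A-Za-z]+)(PR\\w*)(\\d{8})_BOA_10\\.tif$"

# One row per file: column 1 is the full match, columns 2-6 are the groups
parts <- do.call(rbind, regmatches(imgs, regexec(pat, imgs, perl = TRUE)))

# Turn "20170922" into "2017-09-22"
fmt_date <- function(x) format(as.Date(x, format = "%Y%m%d"))

out <- paste(parts[, 4], parts[, 5], fmt_date(parts[, 6]),
             fmt_date(parts[, 2]), parts[, 3], sep = "_")
out
```

Going through as.Date has the side benefit that an impossible date comes back as NA instead of being silently reformatted.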
I want to create a new data frame by appending the binary label vector to a large data frame t.dat. NaNs are produced even when I use na.omit=T, which means the NaNs were not due to 0 values.
label <- as.factor(c(rep(0, 21-1+1),rep(1,177-22+1))) # Binary vector 0=non-tumor and 1=glioma
svm.df <-data.frame(label, log(t.dat), na.omit=T)
Warning message: In log(t.dat) : NaNs produced
> which(is.nan(log(t.dat)))
[1] 597849 656262 673097 869853 949681 949692 949700 949725 949728
[10] 1255020 1255029 1427194 1462292 1462370 1946921 2085039 2375207 2375324
[19] 2459488 2471475 2756957 2756962 2756964 2756973 2756982 2757015 2757103
[28] 2757113 2757114 2757117 2757123 2866715 2966242 2966248 3108773 3612388
[37] 3712228 4106033 4863666 4863703 5011987 5012045 5012068 5266896 5358428
[46] 5361451 5494337 5630823 5733845 5733910 5815590 5815592 5815621 5815632
[55] 5815635 5941255 5941305 6073404 6073416 6073456 6073493 6073510 6073521
[64] 6073559 6100700 6100735 6100757 6100786 6239608 6239635 6239646 6239664
[73] 6239719 6425198 6476611 6489147 6865672 6905857 6966059 7049793 7148523
[82] 7172428 7172547 7623457 7726116 7829439 7829468 7829499 8008035
(1) data.frame doesn't have an na.omit argument (check the documentation), so including na.omit=T simply adds an entire column called na.omit to your data frame.
(2) NaN values arise (in this case) from taking the log of a negative number. If you want to filter these out, you could try:
ok <- which(t.dat >= 0)
svm.df <-data.frame(label[ok], log(t.dat[ok]))
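If t.dat is two-dimensional with one row per sample (an assumption, since the question does not show it), filtering whole rows keeps label aligned with the data. The sketch below uses a tiny made-up t.dat and label, and also drops zeros because log(0) is -Inf; that last choice is mine, not something the question asks for.

```r
# Made-up stand-ins for the real data: 4 samples (rows), 2 features (columns)
t.dat <- data.frame(g1 = c(1, 2, -3, 4), g2 = c(5, 0, 7, 8))
label <- factor(c(0, 0, 1, 1))

# Keep only rows in which every value is strictly positive
keep <- rowSums(t.dat <= 0) == 0

# Subset label and data with the same row index so they stay aligned
svm.df <- data.frame(label = label[keep],
                     log(t.dat[keep, , drop = FALSE]))
svm.df
```

Whether dropping samples is acceptable depends on the analysis; shifting or clipping the values before the log is another option not shown here.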
I have an igraph object, which I created with the igraph library. This object is a list. Some of the components of this list have a length of 2. I would like to remove all of these.
IGRAPH clustering walktrap, groups: 114, mod: 0.79
+ groups:
$`1`
[1] "OTU0041" "OTU0016" "OTU0062"
[4] "OTU1362" "UniRef90_A0A075FHQ0" "UniRef90_A0A075FSE2"
[7] "UniRef90_A0A075FTT8" "UniRef90_A0A075FYU2" "UniRef90_A0A075G543"
[10] "UniRef90_A0A075G6B2" "UniRef90_A0A075GIL8" "UniRef90_A0A075GR85"
[13] "UniRef90_A0A075H910" "UniRef90_A0A075HTF5" "UniRef90_A0A075IFG0"
[16] "UniRef90_A0A0C1R539" "UniRef90_A0A0C1R6X4" "UniRef90_A0A0C1R985"
[19] "UniRef90_A0A0C1RCN7" "UniRef90_A0A0C1RE67" "UniRef90_A0A0C1RFI5"
[22] "UniRef90_A0A0C1RFN8" "UniRef90_A0A0C1RGE0" "UniRef90_A0A0C1RGX0"
[25] "UniRef90_A0A0C1RHM1" "UniRef90_A0A0C1RHR5" "UniRef90_A0A0C1RHZ4"
+ ... omitted several groups/vertices
For example, this one:
> a[[91]]
[1] "OTU0099" "UniRef90_UPI0005B28A7E"
I tried this, but it does not work:
a[lapply(a,length)>2]
Any help?
Since you didn't provide any reproducible data or example, I had to produce some dummy data:
# create dummy data
a <- list(x = 1, y = 1:4, z = 1:2)
# remove elements in list with lengths greater than 2:
a[which(lapply(a, length) > 2)] <- NULL
In case you wanted to remove the items with lengths exactly equal to 2 (the question is unclear), then the last line should be replaced by:
a[which(lapply(a, length) == 2)] <- NULL
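The same filter can be sketched with lengths(), which returns an integer vector directly. The list below is dummy data borrowing a few IDs from the question. If a is an igraph communities object and not a plain list, it may first need converting to one (igraph's groups() is the likely candidate, but that step is an assumption and is not shown here).

```r
# Dummy list standing in for the groups; element "2" has length 2
a <- list(`1` = c("OTU0041", "OTU0016", "OTU0062"),
          `2` = c("OTU0099", "UniRef90_UPI0005B28A7E"),
          `3` = "OTU1362")

# Keep every element whose length is not exactly 2
kept <- a[lengths(a) != 2]
names(kept)
```

Subsetting into a new object leaves a untouched, which can be handy while checking that the right groups were dropped.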
I have a start date and an end date, but when I build a list of all the dates in between, the format changes:
> startDate <- as.Date("2012-01-01")
> startDate
[1] "2012-01-01"
> endDate <- as.Date("2012-02-01")
> endDate
[1] "2012-02-01"
> startDate:endDate
[1] 15340 15341 15342 15343 15344 15345 15346 15347 15348 15349 15350 15351 15352 15353 15354 15355
[17] 15356 15357 15358 15359 15360 15361 15362 15363 15364 15365 15366 15367 15368 15369 15370 15371
So you can see that all the dates are converted to a numeric format.
The problem is that I have an API function that can only read dates in the format "YYYY-MM-DD".
Can anyone suggest how I can generate a list like this:
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" ....
Use the seq function:
seq(startDate,endDate,by="day") #you could use also by=1
# see ?seq.Date for other options for "by"
From the help page of the : operator (use ?":" or ?Colon):
For other arguments from:to is equivalent to seq(from, to), and
generates a sequence from from to to in steps of 1 or -1. Value to
will be included if it differs from from by an integer up to a numeric
fuzz of about 1e-7. Non-numeric arguments are coerced internally
(hence without dispatching methods) to numeric—complex values will
have their imaginary parts discarded with a warning.
So
identical(startDate:endDate,as.numeric(startDate):as.numeric(endDate))
[1] TRUE
And by the way, you are generating a vector, not a list. You can make a list out of your values using the as.list function, though, if that is what you really want.
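Putting the pieces together as a small sketch (the end date is shortened for illustration, and the names dates, date_strings, back and date_list are made up): seq keeps the Date class, format gives the "YYYY-MM-DD" strings the API expects, and the numbers produced by : can be converted back with an origin.

```r
startDate <- as.Date("2012-01-01")
endDate   <- as.Date("2012-01-05")   # shortened range, for illustration only

dates <- seq(startDate, endDate, by = "day")   # stays of class Date
date_strings <- format(dates, "%Y-%m-%d")      # character vector "YYYY-MM-DD"

# The day counts produced by ":" can be turned back into dates
back <- as.Date(as.numeric(startDate):as.numeric(endDate), origin = "1970-01-01")

date_list <- as.list(date_strings)             # only if a real list is needed
```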
How can I read the following vector "c" of strings into a list of tables? Which way is the shortest: read.table or strsplit? E.g. I can't see how to read the table in c[4:6] in one command.
require(car)
m<-matrix(rnorm(16),4,4,byrow=T)
a<-Anova(lm(m~1),type=3,idata=data.frame(treatment=factor(1:4)),idesign=~treatment)
c<-capture.output(summary(a,multivariate=F))
c
This returns lines 4:6
c[4:6]
To parse this, I would do it in two steps: first read the column values from lines 5:6, then add back the names.
> vals <- read.table(text=c[5:6])
> txt <- " \t SS\t num Df\t Error SS\t den Df\t F\t Pr(>F)"
> names(vals) <- names(read.delim(text=txt))
> vals
X SS num.Df Error.SS den.Df F Pr..F.
1 (Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614
2 treatment 1.85936442 3 8.2899759 9 0.67287 0.58996
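Assuming the two data lines look like the result printed above (they are hard-coded here so the sketch runs without car, and the column names are my own choice), read.table can also take the names directly through col.names, which does the parse in a single call:

```r
# Data lines copied from the printed result above; in practice these
# would be c[5:6] from capture.output()
data_lines <- c("(Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614",
                "treatment 1.85936442 3 8.2899759 9 0.67287 0.58996")

# Supply the column names up front so no separate header parse is needed
vals <- read.table(text = data_lines,
                   col.names = c("term", "SS", "num.Df", "Error.SS",
                                 "den.Df", "F", "Pr"),
                   stringsAsFactors = FALSE)
str(vals)
```

This only works when term names contain no spaces, since read.table splits on whitespace by default.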
EDIT --
You could look at the source code of the summary function and compute the required quantities yourself:
getAnywhere(summary.Anova.mlm)
The original idea seems not to work.
c2 <- summary(a)
# find out what 'properties' the summary object has
# turns out, it is just the Anova object
class(c2) <- "list"
names(c2)
This returns
[1] "SSP" "SSPE" "P" "df" "error.df"
[6] "terms" "repeated" "type" "test" "idata"
[11] "idesign" "icontrasts" "imatrix" "singular"
and we can access them:
c2$SSP
c2$SSPE
It is also not a good idea to use the name of R's built-in c function as a variable name.