How can I reset rownames? [duplicate] - r

I have a data frame named lagcolmean whose row names begin at 2, because I dropped the first row:
MSFT AAPL GOOGL
2 20.91273 5.663524 97.50684
3 20.05333 5.681336 90.57909
4 20.09447 5.239416 99.60738
How can I convert it to this?
MSFT AAPL GOOGL
1 20.91273 5.663524 97.50684
2 20.05333 5.681336 90.57909
3 20.09447 5.239416 99.60738
I tried rownames(lagcolmean), but that only prints the existing row names:
[1] "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
[11] "12" "13" "14" "15" "16" "17" "18" "19" "20" "21"
[21] "22" "23

If your data frame is called df, just do rownames(df) <- NULL

Another option:
rownames(df) <- 1:nrow(df)
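
A minimal sketch of both approaches, using a small hypothetical data frame built to resemble lagcolmean (the values in the first row are invented for illustration). Note that calling rownames(lagcolmean) on its own only returns the current names; they change only when something is assigned to them.

```r
# Hypothetical data resembling the question; the first row is made up
lagcolmean <- data.frame(
  MSFT  = c(21.00000, 20.91273, 20.05333, 20.09447),
  AAPL  = c(5.500000, 5.663524, 5.681336, 5.239416),
  GOOGL = c(98.00000, 97.50684, 90.57909, 99.60738)
)
lagcolmean <- lagcolmean[-1, ]   # drop the first row; row names are now "2", "3", "4"

# Option 1: remove the row names so R falls back to 1, 2, 3, ...
rownames(lagcolmean) <- NULL

# Option 2: assign new row names explicitly
# (seq_len() also behaves sensibly for a data frame with zero rows)
rownames(lagcolmean) <- seq_len(nrow(lagcolmean))

rownames(lagcolmean)
```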

Related

anova() does not work properly with lme objects after updating - do I miss something?

I had code that worked fine until now. I fit models with gls, lme and gamm (from the packages nlme and mgcv) and compare them with anova(). However, I needed another package that did not work with my R version (which was almost a year old), so I updated R (via the updater package) and RStudio.
The issue now is that anova() either gives no output at all after running, or prints only "Denom. DF: 91" and nothing else.
I have tried different things and searched a lot, but I found no current thread dealing with this problem, and the help files just say it should work the way I am using it. I therefore suspect that I am missing something essential (probably even obvious), but I can't see it. I hope you can tell me what I am doing wrong.
Here is some data to play with (copied from txt-file):
"treat" "x" "time" "nest"
"1" "1" 49.37 1 "K1"
"2" "1" 48.68 1 "K2"
"3" "2" 44.7 1 "T7"
"4" "2" 49.3 1 "T8"
"5" "1" 48.78 1 "K3"
"6" "2" 42.37 1 "T10"
"7" "1" 39.26 1 "K4"
"8" "2" 46.36 1 "T11"
"9" "1" 40.36 1 "K5"
"10" "2" 47.14 1 "T9"
"11" "1" 48.81 1 "K6"
"12" "1" 40.4 1 "K10"
"13" "2" 53.42 1 "T4"
"14" "2" 46.85 1 "T5"
"15" "2" 44.58 1 "T2"
"16" "2" 47.51 1 "T6"
"17" "1" 51.7 1 "K8"
"18" "1" 48.16 1 "K7"
"19" "2" 48.86 1 "T3"
"20" "1" 44.6 1 "K11"
"21" "1" 49.71 1 "K9"
"22" "2" 44.54 1 "T1"
"23" "2" 41.55 2 "T3"
"24" "1" 32.55 2 "K3"
"25" "1" 42.15 2 "K1"
"26" "2" 51.06 2 "T1"
"27" "1" 38.43 2 "K11"
"28" "2" 39.91 2 "T11"
"29" "1" 36.73 2 "K7"
"30" "2" 50.19 2 "T4"
"31" "1" 42.26 2 "K8"
"32" "1" 43.02 2 "K6"
"33" "2" 37.6 2 "T10"
"34" "1" 33.42 2 "K4"
"35" "2" 39.64 2 "T5"
"36" "2" 43.56 2 "T2"
"37" "2" 35.31 2 "T7"
"38" "2" 37 2 "T8"
"39" "2" 40.87 2 "T6"
"40" "1" 35.29 2 "K9"
"41" "2" 41.83 2 "T9"
"42" "1" 37.88 2 "K10"
"43" "1" 36.5 2 "K5"
"44" "1" 34.21 3 "K4"
"45" "1" 38.04 3 "K6"
"46" "1" 35.14 3 "K3"
"47" "2" 38.18 3 "T10"
"48" "1" 40.26 3 "K11"
"49" "2" 37.09 3 "T3"
"50" "2" 43.1 3 "T11"
"51" "2" 34.26 3 "T7"
"52" "1" 36.58 3 "K9"
"53" "1" 35.81 3 "K2"
"54" "1" 39.83 3 "K10"
"55" "2" 37.65 3 "T6"
"56" "1" 39.8 3 "K7"
"57" "1" 36.41 3 "K8"
"58" "1" 35.22 3 "K5"
"59" "2" 39.68 3 "T8"
"60" "2" 41.12 3 "T1"
"61" "2" 36.93 3 "T9"
"62" "1" 35.66 3 "K1"
"63" "2" 36.91 3 "T4"
"64" "2" 38.84 3 "T5"
"65" "2" 34.31 3 "T2"
"66" "1" 32.71 4 "K9"
"67" "2" 37.84 4 "T11"
"68" "1" 28.01 4 "K10"
"69" "2" 39.69 5 "T11"
"70" "2" 35.08 4 "T10"
"71" "2" 34.43 4 "T9"
"72" "1" 32.12 4 "T8"
"73" "2" 30.41 4 "T7"
"74" "1" 31.81 4 "K7"
"75" "2" 36.41 4 "T6"
"76" "1" 29.17 5 "K6"
"77" "1" 28.59 4 "K6"
"78" "2" 33.99 4 "T5"
"79" "1" 30.41 4 "K5"
"80" "1" 29.8 4 "K4"
"81" "2" 34.72 4 "T4"
"82" "2" 34.38 4 "T3"
"83" "1" 28.12 4 "K3"
"84" "2" 34.62 4 "T2"
"85" "1" 31.88 4 "K2"
"86" "1" 29.35 4 "K1"
"87" "2" 37.95 4 "T1"
"88" "2" 40.85 5 "T4"
"89" "2" 35.07 5 "T5"
"90" "2" 36.15 5 "T8"
"91" "2" 36.48 5 "T10"
"92" "1" 33.73 4 "K8"
"93" "1" 28.17 5 "K9"
"94" "1" 32.81 5 "K10"
"95" "1" 32.17 4 "K11"
And this is basically one of the models I am trying to run:
test <- read.table(file="C:/Users/marvi_000/Desktop/testdata.txt")
str(test)
test$treat <- as.factor(test$treat)
test$nest <- as.factor(test$nest)
library(nlme)
m.test <- gls(x ~ treat * time,
              correlation = corAR1(form = ~ time | nest),
              data = test, na.action = na.omit)
anova(m.test)
The output is:
Denom. DF: 91
When comparing models with anova(m1, m2) nothing happens at all.
The same is true when I run a gamm from package mgcv and use anova(m$lme) or anova(m1$lme, m2$lme).
I would appreciate any help or hint, pointing me towards the right direction. Thanks a lot!
EDIT:
After some discussion, I found out that the problem lies with the scripts. I am using RStudio and R Markdown. When I run the code line by line (with Ctrl+Enter) inside the R Markdown script, the anova(lmemodel) command does not work as expected. However, if I copy this single command into a plain R script (still using the current environment), it executes properly and shows the desired output.
I have no clue what is happening there. If anybody has an idea where the problem is, or how to solve it, I would still be happy to hear it.
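
One thing that may be worth trying, offered as an assumption rather than a confirmed fix: store the result of anova() in an object and print it explicitly, so that nothing depends on how the editor displays auto-printed values. The sketch below uses the Ovary example data shipped with nlme, not the data above.

```r
library(nlme)

# Model taken from the nlme documentation example for gls(), not from the question
m <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
         data = Ovary,
         correlation = corAR1(form = ~ 1 | Mare))

tab <- anova(m)   # an "anova.lme" object, which is also a data frame
print(tab)        # explicit print instead of relying on auto-printing
```

If the table shows up this way but not with a bare anova(m), the model itself is fine and the problem lies in how the output is displayed.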

How to find unique couples of numbers in a vector in R?

Suppose we have C <- c(1, 2, 3, 4, 5).
I want to find all the unique pairs of numbers that can be extracted from this vector, e.g. 12, 13, 23, etc. How can I do it?
One option could be (with the vector stored as x rather than C):
na.omit(c(`diag<-`(sapply(x, paste0, x), NA)))
[1] "12" "13" "14" "15" "21" "23" "24" "25" "31" "32" "34" "35" "41" "42" "43" "45"
[17] "51" "52" "53" "54"
Using the RcppAlgos package:
## Combinations
unlist(RcppAlgos::comboGeneral(x, 2, FUN=function(x) Reduce(paste0, x)))
# [1] "12" "13" "14" "15" "23" "24" "25" "34" "35" "45"
## Permutations
unlist(RcppAlgos::permuteGeneral(x, 2, FUN=function(x) Reduce(paste0, x)))
# [1] "12" "13" "14" "15" "21" "23" "24" "25" "31" "32" "34" "35" "41" "42" "43"
# [16] "45" "51" "52" "53" "54"
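
For completeness, a minimal base R sketch with combn(), which needs no extra package. It assumes the vector is stored as x, as in the answers above, and that "unique couples" means unordered pairs (so "21" is treated as the same pair as "12").

```r
x <- c(1, 2, 3, 4, 5)

# All unordered pairs as a 2 x 10 matrix, one pair per column
pairs <- combn(x, 2)

# The same pairs pasted into labels such as "12", "13", ...
labels <- combn(x, 2, FUN = paste0, collapse = "")
labels
# [1] "12" "13" "14" "15" "23" "24" "25" "34" "35" "45"
```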

Reformatting Panel Data according to a time and event variable

I have a panel dataset with many variables. The three most relevant variables are: "cid" (country code), "time" (0-65), and "event" (0, 1, 2, 3, 4, 5, 6).
I am trying to run a Cox regression (using coxph). However, since the time variable has different starting and ending points for each country, I first need to create a start time and an end time variable. This is where I run into my problem.
Here is what a sample of the three main variables may look like:
> data
cid time event
[1,] "AFG" "20" "0"
[2,] "AFG" "21" "0"
[3,] "AFG" "22" "0"
[4,] "AFG" "23" "0"
[5,] "AFG" "24" "0"
[6,] "AFG" "25" "0"
[7,] "AFG" "26" "1"
[8,] "AFG" "27" "1"
[9,] "AFG" "28" "1"
[10,] "AFG" "29" "1"
The idea is to convert this data into the following:
> data
cid time1 time2 event
[1,] "AFG" "20" "25" "0"
[2,] "AFG" "26" "29" "1"
How exactly does one go about doing this (keeping in mind that there are quite a few other explanatory variables in my dataset)?
You could use dplyr and the pipe. This solution works if, within each country, each event value occupies one consecutive run of time points, as in your example.
data<-data.frame(cid=rep("AFG",10),time=seq(20,29,1),event=c(0,0,0,0,0,0,1,1,1,1))
library(dplyr)
data %>% group_by(cid,event) %>%
summarise(time1=min(time),time2=max(time))
Alternatively, for this single-country example you can build the two rows by hand in base R:
subset1 <- data[data$event == 0, ]
subset1
subset2 <- data[data$event == 1, ]
subset2
s1 <- cbind(cid = "AFG", time1 = min(subset1$time), time2 = max(subset1$time), event = 0)
s1
s2 <- cbind(cid = "AFG", time1 = min(subset2$time), time2 = max(subset2$time), event = 1)
s2
data1 <- rbind(s1, s2)
data1
# cid time1 time2 event
# [1,] "AFG" "20" "25" "0"
# [2,] "AFG" "26" "29" "1"
Hope this helps a little.
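
Grouping by cid and event merges rows whenever the same event value comes back later for a country. A minimal base R sketch that treats each consecutive run as its own spell is shown below. The "ALB" rows and the spell column are hypothetical additions for illustration, and the code assumes the rows are already sorted by cid and time.

```r
# Hypothetical panel: the "ALB" rows are invented to show a recurring event value
data <- data.frame(
  cid   = rep(c("AFG", "ALB"), each = 5),
  time  = c(20:24, 30:34),
  event = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 0)
)

# A new spell starts on the first row and whenever cid or event differs
# from the previous row (assumes data is sorted by cid, then time)
n <- nrow(data)
change <- c(TRUE, data$cid[-1] != data$cid[-n] | data$event[-1] != data$event[-n])
data$spell <- cumsum(change)

# Collapse each spell into one row with its start and end time
spells <- do.call(rbind, lapply(split(data, data$spell), function(d) {
  data.frame(cid = d$cid[1], time1 = min(d$time), time2 = max(d$time),
             event = d$event[1])
}))
rownames(spells) <- NULL
spells
```

Other explanatory variables that are constant within a spell could be carried along the same way as event, by taking the first value in each spell.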

Access the levels of a factor in R

I have a 5-level factor that looks like the following:
tmp
[1] NA
[2] 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
[3] NA
[4] NA
[5] 5,9,16,24,35,36,42
[6] 4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
[7] 8,39
5 Levels: 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46 ...
I want to access the items within each level except NA. So I use the levels() function, which gives me:
> levels(tmp)
[1] "1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46"
[2] "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50"
[3] "5,9,16,24,35,36,42"
[4] "8,39"
[5] "NA"
Then I would like to access the elements in each level and store them as numbers. However, for example,
> as.numeric(cat(levels(tmp)[3]))
5,9,16,24,35,36,42numeric(0)
Can you help me remove the commas between the numbers and the numeric(0) at the very end? I would like to have a numeric vector 5, 9, 16, 24, 35, 36, 42 so that I can use the values as indices to access a data frame. Thanks!
You need to use a combination of unlist, strsplit and unique.
First, recreate your data:
dat <- read.table(text="
NA
1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
NA
NA
5,9,16,24,35,36,42
4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
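8,39", stringsAsFactors = TRUE)$V1  # needed in R >= 4.0, where strings are no longer read as factors by default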
Next, find all the unique levels, after using strsplit:
sort(unique(unlist(
sapply(levels(dat), function(x)unlist(strsplit(x, split=",")))
)))
[1] "1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "23" "24" "25" "26"
[20] "27" "28" "29" "3" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "4" "40" "41" "42" "43"
[39] "44" "45" "46" "47" "48" "49" "5" "50" "6" "7" "8" "9"
Does this do what you want?
levels_split <- strsplit(levels(tmp), ",")
lapply(levels_split, as.numeric)
Using Andrie's dat
val <- scan(text=levels(dat),sep=",")
#Read 50 items
split(val,cumsum(c(TRUE,diff(val) <0)))
#$`1`
#[1] 1 2 3 6 11 12 13 18 20 21 22 26 29 33 40 43 46
#$`2`
#[1] 4 7 10 14 15 17 19 23 25 27 28 30 31 32 34 37 38 41 44 45 47 48 49 50
#$`3`
#[1] 5 9 16 24 35 36 42
#$`4`
#[1] 8 39
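
The attempt in the question fails because cat() only prints its argument and returns NULL, so as.numeric() receives NULL and yields numeric(0). A minimal sketch of extracting one level as numeric indices, using a shortened, hypothetical version of tmp:

```r
# Shortened stand-in for the factor in the question
tmp <- factor(c(NA, "1,2,3", NA, "5,9,16,24,35,36,42", "8,39"))

# Split one level on the commas and convert the pieces to numbers
lev <- levels(tmp)[2]
idx <- as.numeric(strsplit(lev, ",", fixed = TRUE)[[1]])
idx
# [1]  5  9 16 24 35 36 42

# The same for every level at once, as a list named by level
all_idx <- lapply(strsplit(levels(tmp), ",", fixed = TRUE), as.numeric)
names(all_idx) <- levels(tmp)
```

idx can then be used directly for row indexing, for example df[idx, ] for some data frame df.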

Grouping a variable with numerous levels

Let's say I have a factor variable with numerous levels and I am trying to group them into several groups.
> levels(dat$years_continuously_insured_order2)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18"
[19] "19" "20"
> levels(dat$age_of_oldest_driver)
[1] "-16" "1" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
[22] "34" "35" "36" "37" "38" "39" "40
I have a script which runs through these variables and groups them into several categories. However, the number of levels can be (and usually is) different each time my script runs. Therefore, if my original code to group the variables was the following (see below), it would be of no use if my script runs an hour later and the levels are different. Instead of 15 levels, I could now have 25 levels with different values, but I still need to group them into specific categories.
dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)
How can I find a more elegant way to group variables into segments? Are there better ways to do this in R?
Thanks!
You could convert the factor levels of the continuously insured variable to numeric, then cut() them into your categories and re-factor(). The first step is described in the R-FAQ (doing it properly is a two-step process: as.character() first, then as.numeric()):
dat$years_cont <- factor( cut( as.numeric(as.character(
dat$years_continuously_insured_order2)),
breaks=c(0,2,3, Inf), right=FALSE ),
labels=c( "1 or less", "2", "3 +")
)
#-----------------
> str(dat)
'data.frame': 100 obs. of 2 variables:
$ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
$ years_cont : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
If your original column is a number, treat it as a number, not a factor. A much easier way to do what you're doing is:
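bin.value <- function(x) {
  ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}
# as.character() first: as.integer() applied directly to a factor returns the level codes, not the values
dat$years_continuously_insured2 <- as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))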
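
A minimal, self-contained sketch of the cut() approach on a small hypothetical factor (the values and the names years, years_num and grp are invented for illustration). Because the breaks are defined on the numeric values rather than on level positions, the grouping keeps working when the set of levels changes between runs.

```r
# Hypothetical factor whose levels are numbers stored as text
years <- factor(c("1", "2", "3", "10", "20", "4"))

# Two-step conversion: factor -> character -> numeric.
# as.numeric(years) alone would return the internal level codes instead.
years_num <- as.numeric(as.character(years))

# Left-closed intervals: [0,2) -> "1 or less", [2,3) -> "2", [3,Inf) -> "3 +"
grp <- cut(years_num, breaks = c(0, 2, 3, Inf), right = FALSE,
           labels = c("1 or less", "2", "3 +"))
table(grp)
```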
