What does the asterisk next to some variable names mean in the descriptive statistics table produced by the describe function (package psych) in R?
                  vars  n  mean    sd
STUDY_ID*            1 67  1.00  0.00
COUNTRY_ID*          2 67  1.00  0.00
EXTRACTION_DATE*     3 67 34.00 19.49
SITE_ID              4 67  8.94  5.30
SUBJECT_ID*          5 67 34.00 19.49
SUBJECT_REF*         6 67 34.00 19.49
REF_I1_CENTERINFO    7 67  8.94  5.30
REF_NUMBER           8 67  9.21  7.09
REF_I1_NOM*          9 67  8.03  5.62
REF_I1_PRENOM*      10 67  8.22  4.95
RANDOMIZATION_R1*   11 66  1.50  0.50
Thank you
From ?psych::describe:
If the check option is TRUE, variables that are categorical or
logical are converted to numeric and then described. These
variables are marked with an * in the row name.
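This behaviour is easy to reproduce. A minimal sketch, assuming the psych package is installed (the data frame and its column names are made up for illustration):

```r
library(psych)

# One genuinely numeric column and one factor column
df <- data.frame(
  score = c(10, 12, 9, 14, 11),
  group = factor(c("a", "b", "a", "b", "a"))
)

desc <- describe(df)
rownames(desc)
# "group" appears as "group*": it was converted to its integer codes
# (a = 1, b = 2) before being described
```

The practical takeaway: the mean and sd on a starred row describe the factor's integer codes, not the original values, so they are usually not meaningful.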
I'd like to change the format of an entire column in a dataframe in R.
I saw answers for this in Python, and I've been attempting all sorts of code, but nothing has worked. I've finally found a way to check and verify the value types of each column of my data frame, and the date column comes up as character. I'd like to change that to a date.
...also, on another note: people always say to put sample data in here when I ask questions, but I don't know how to copy my data frame out of RStudio Cloud...? I'll attempt to show some data and my code anyway.
data frame:
Columns: Id, ActivityDate, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories
 1  1503960366  4/12/2016  13162  8.50  8.50  0  1.88  0.55  6.06  0.00  25  13  328   728  1985
 2  1503960366  4/13/2016  10735  6.97  6.97  0  1.57  0.69  4.71  0.00  21  19  217   776  1797
 3  1503960366  4/14/2016  10460  6.74  6.74  0  2.44  0.40  3.91  0.00  30  11  181  1218  1776
 4  1503960366  4/15/2016   9762  6.28  6.28  0  2.14  1.26  2.83  0.00  29  34  209   726  1745
 5  1503960366  4/16/2016  12669  8.16  8.16  0  2.71  0.41  5.04  0.00  36  10  221   773  1863
 6  1503960366  4/17/2016   9705  6.48  6.48  0  3.19  0.78  2.51  0.00  38  20  164   539  1728
 7  1503960366  4/18/2016  13019  8.59  8.59  0  3.25  0.64  4.71  0.00  42  16  233  1149  1921
 8  1503960366  4/19/2016  15506  9.88  9.88  0  3.53  1.32  5.03  0.00  50  31  264   775  2035
 9  1503960366  4/20/2016  10544  6.68  6.68  0  1.96  0.48  4.24  0.00  28  12  205   818  1786
10  1503960366  4/21/2016   9819  6.34  6.34  0  1.34  0.35  4.65  0.00  19   8  211   838  1775
11  1503960366  4/22/2016  12764  8.13  8.13  0  4.76  1.12  2.24  0.00  66  27  130  1217  1827
12  1503960366  4/23/2016  14371  9.04  9.04  0  2.81  0.87  5.36  0.00  41  21  262   732  1949
13  1503960366  4/24/2016  10039  6.41  6.41  0  2.92  0.21  3.28  0.00  39   5  238   709  1788
14  1503960366  4/25/2016  15355  9.80  9.80  0  5.29  0.57  3.94  0.00  73  14  216   814  2013
15  1503960366  4/26/2016  13755  8.79  8.79  0
I don't know why it pastes like that. Anyways...
daily_activity <- read_csv("dailyActivity_merged.csv")
I then ran:
str(daily_activity)
To check what type of data each column is made of. I see my activity dates are chr type, which I looked up and saw means character. Is this correct? I used this same dataset in Google Sheets to double-check: there are 600 rows in each column, and the dates came back with a ' in front of the numbers, as if they were entered like the distance measurements in the columns that follow. That is obviously incorrect, as this is a date, not a distance, so now I'd like to change the entire column to dates.
I've tried:
as_date(daily_activity, ActivityDate)
mdy(ActivityDate)
help("mdy")
help("print")
help("str")
str(daily_activity) %>% as.date(ActivityDate,"mm/dd/yyyy")
I'm not sure what to do, and there doesn't seem to be any site or reference for this; I've been googling for answers and help for 2 days now.
The 2nd part of my quest is to then use the newly created date column, together with the already-present Id column, to merge two data frames... is that possible? Both data frames have the date column that needs converting, and both have the Id column, so I was thinking of a join statement... does that exist in R? I want to join both data frames in their entirety, matching on both of those columns, Id and date; the other columns in the two data frames are different, and I need to work with information from both data frames together.
Has any of this made sense? I hope so. I asked the questions as if I was talking to someone, like they recommend doing on this site. Thanks in advance for any advice, help, or information.
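For what it's worth, both steps can be sketched in base R on toy data. The second data frame (sleep_log) is hypothetical; only the column names Id and ActivityDate come from the question. as.Date with an explicit format avoids extra packages (lubridate::mdy("4/12/2016") would do the same conversion):

```r
daily_activity <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c(13162, 10735)
)
sleep_log <- data.frame(   # hypothetical second data frame
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  MinutesAsleep = c(327, 384)
)

# Convert the character columns in place; "%m/%d/%Y" matches "4/12/2016"
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = "%m/%d/%Y")
sleep_log$ActivityDate <- as.Date(sleep_log$ActivityDate, format = "%m/%d/%Y")

# Join on both keys; dplyr::inner_join(a, b, by = c("Id", "ActivityDate"))
# is the tidyverse equivalent of base merge()
combined <- merge(daily_activity, sleep_log, by = c("Id", "ActivityDate"))
str(combined)
```

Note that the assignment back into the column is essential: calling mdy(ActivityDate) on its own computes the dates but throws the result away.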
I'm attempting to scrape the second table shown at the URL below, and I'm running into issues which may be related to the interactive nature of the table.
div_stats_standard appears to refer to the table of interest.
The code runs with no errors but returns an empty list.
url <- 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
data <- url %>%
read_html() %>%
html_nodes(xpath = '//*[(#id = "div_stats_standard")]') %>%
html_table()
Can anyone tell me where I'm going wrong?
Look for the table.
library(rvest)
url <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Result:
head(table)
Playing Time Playing Time Playing Time Performance Performance
1 Squad # Pl MP Starts Min Gls Ast
2 Arsenal 26 27 297 2,430 39 26
3 Aston Villa 28 27 297 2,430 33 27
4 Bournemouth 25 28 308 2,520 27 17
5 Brighton 23 28 308 2,520 28 19
6 Burnley 21 28 308 2,520 32 23
Performance Performance Performance Performance Per 90 Minutes Per 90 Minutes
1 PK PKatt CrdY CrdR Gls Ast
2 2 2 64 3 1.44 0.96
3 1 3 54 1 1.22 1.00
4 1 1 60 3 0.96 0.61
5 1 1 44 2 1.00 0.68
6 2 2 53 0 1.14 0.82
Per 90 Minutes Per 90 Minutes Per 90 Minutes Expected Expected Expected Per 90 Minutes
1 G+A G-PK G+A-PK xG npxG xA xG
2 2.41 1.37 2.33 35.0 33.5 21.3 1.30
3 2.22 1.19 2.19 30.6 28.2 22.0 1.13
4 1.57 0.93 1.54 31.2 30.5 20.8 1.12
5 1.68 0.96 1.64 33.8 33.1 22.4 1.21
6 1.96 1.07 1.89 30.9 29.4 18.9 1.10
Per 90 Minutes Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 xA xG+xA npxG npxG+xA
2 0.79 2.09 1.24 2.03
3 0.81 1.95 1.04 1.86
4 0.74 1.86 1.09 1.83
5 0.80 2.01 1.18 1.98
6 0.68 1.78 1.05 1.73
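One note on the original symptom: on pages like fbref, tables after the first are often shipped inside HTML comments and only revealed by JavaScript, which is why a node query for the second table returns an empty list. A sketch of the workaround on a toy document (the real page would be fetched with read_html(url) first):

```r
library(rvest)
library(xml2)

# Toy stand-in for a page whose table is hidden inside a comment
html <- '<div><!-- <table><tr><th>Squad</th></tr><tr><td>Arsenal</td></tr></table> --></div>'
page <- read_html(html)

# Pull every comment node, re-parse its text as HTML, extract any tables
comments <- xml_find_all(page, "//comment()")
hidden <- lapply(comments, function(cm) {
  html_table(html_nodes(read_html(xml_text(cm)), "table"))
})
hidden[[1]][[1]]   # a one-row data frame with column Squad
```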
I have a problem with my code and can't trace the error. I have coordinate data, coor (a 40 by 2 matrix), as below, and rainfall data (a 14610 by 40 matrix).
No Longitude Latitude
1 100.69 6.34
2 100.77 6.24
3 100.39 6.11
4 100.43 5.53
5 100.39 5.38
6 101.00 5.71
7 101.06 5.30
8 100.80 4.98
9 101.17 4.48
10 102.26 6.11
11 102.22 5.79
12 102.28 5.31
13 102.02 5.38
14 101.97 4.88
15 102.95 5.53
16 103.13 5.32
17 103.06 4.94
18 103.42 4.76
19 103.42 4.23
20 102.38 4.24
21 101.94 4.23
22 103.04 3.92
23 103.36 3.56
24 102.66 3.03
25 103.19 2.89
26 101.35 3.70
27 101.41 3.37
28 101.75 3.16
29 101.39 2.93
30 102.07 3.09
31 102.51 2.72
32 102.26 2.76
33 101.96 2.74
34 102.19 2.36
35 102.49 2.29
36 103.02 2.38
37 103.74 2.26
38 103.97 1.85
39 103.72 1.76
40 103.75 1.47
rainfall= 14610 by 40 matrix;
coor= 40 by 2 matrix
my_prog=function(rainrain,coordinat,misss,distance)
{
rain3<-rainrain # target station i**
# neighboring stations for target station i
a=coordinat # target station i**
diss=as.matrix(distHaversine(a,coor,r=6371))
mmdis=sort(diss,decreasing=F,index.return=T)
mdis=as.matrix(mmdis$x)
mdis1=as.matrix(mmdis$ix)
dist=cbind(mdis,mdis1)
# NA creation
# create missing values in rainfall data
set.seed(100)
b=sample(1:nrow(rain3),(misss*nrow(rain3)),replace=F)
k=replace(rain3,b,NA)
# pick i closest stations
neig=mdis1[distance] # neighbouring selection distance
# target (with NA) and their neighbors
rainB=rainfal00[,neig]
rainA0=rainB[,2:ncol(rainB)]
rainA<-as.matrix(cbind(k,rainA0))
rain2=na.omit(rainA)
x=as.matrix(rain2[,1]) # used to calculate the correlation
n1=ncol(rainA)-1
#1) normal ratio(nr)
jum=as.matrix(apply(rain2,2,mean))
nr0=(jum[1]/jum)
nr=as.matrix(nr0[2:nrow(nr0),])
m01=as.matrix(rainA[is.na(k),])
m1=m01[,2:ncol(m01)]
out1=as.matrix(sapply(seq_len(nrow(m1)),
function(i) sum(nr*m1[i,],na.rm=T)/n1))
print(out1)
}
impute=my_prog(rainrain=rainfall[,1],coordinat=coor[1,],misss=0.05,distance=mdis<200)
I have run this code and the output obtained is:
Error in my_prog(rainrain = rainfal00[, 1], misss = 0.05, coordinat = coor[1, :
object 'mdis' not found
I have checked the program but cannot trace the problem. I would really appreciate it if someone could help me.
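The error itself is a scoping problem: mdis is created inside my_prog, but the call passes distance = mdis < 200, and that expression is evaluated in the caller's environment, where no mdis exists. One fix is to pass a numeric cutoff instead and build the index inside the function, where the sorted distances live. A sketch (pick_neighbours and the toy distances are made up for illustration):

```r
# Select neighbouring-station indices inside the function,
# where the distances actually exist
pick_neighbours <- function(dists, cutoff) {
  ord <- sort(dists, index.return = TRUE)
  ord$ix[ord$x < cutoff]   # station indices closer than `cutoff` km
}

d <- c(0, 150, 320, 90, 210)   # toy distances in km
pick_neighbours(d, cutoff = 200)
# -> 1 4 2  (the three stations within 200 km, nearest first)
```

In my_prog that would mean computing the selection from diss internally and calling something like my_prog(..., distance = 200). The same scoping issue explains the rainfal00/rainfall mismatch inside the function: it silently reads a global object rather than the rainrain argument.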
rate len ADT trks sigs1 slim shld lane acpt itg lwid hwy
1 4.58 4.99 69 8 0.20040080 55 10 8 4.6 1.20 12 FAI
2 2.86 16.11 73 8 0.06207325 60 10 4 4.4 1.43 12 FAI
3 3.02 9.75 49 10 0.10256410 60 10 4 4.7 1.54 12 FAI
4 2.29 10.65 61 13 0.09389671 65 10 6 3.8 0.94 12 FAI
5 1.61 20.01 28 12 0.04997501 70 10 4 2.2 0.65 12 FAI
6 6.87 5.97 30 6 2.00750419 55 10 4 24.8 0.34 12 PA
7 3.85 8.57 46 8 0.81668611 55 8 4 11.0 0.47 12 PA
8 6.12 5.24 25 9 0.57083969 55 10 4 18.5 0.38 12 PA
9 3.29 15.79 43 12 1.45333122 50 4 4 7.5 0.95 12 PA
I have a question about adding a new column. My data frame is called highway1, and I want to add a column named S/N, computed as slim divided by acpt. What can I do?
Thanks
> mydf$SN <- mydf$slim/mydf$acpt
> mydf
rate len ADT trks sigs1 slim shld lane acpt itg lwid hwy SN
1 4.58 4.99 69 8 0.20040080 55 10 8 4.6 1.20 12 FAI 11.956522
2 2.86 16.11 73 8 0.06207325 60 10 4 4.4 1.43 12 FAI 13.636364
3 3.02 9.75 49 10 0.10256410 60 10 4 4.7 1.54 12 FAI 12.765957
4 2.29 10.65 61 13 0.09389671 65 10 6 3.8 0.94 12 FAI 17.105263
5 1.61 20.01 28 12 0.04997501 70 10 4 2.2 0.65 12 FAI 31.818182
6 6.87 5.97 30 6 2.00750419 55 10 4 24.8 0.34 12 PA 2.217742
7 3.85 8.57 46 8 0.81668611 55 8 4 11.0 0.47 12 PA 5.000000
8 6.12 5.24 25 9 0.57083969 55 10 4 18.5 0.38 12 PA 2.972973
9 3.29 15.79 43 12 1.45333122 50 4 4 7.5 0.95 12 PA 6.666667
I hope an explanation is not necessary for the above.
While $ is the preferred route, you can also consider cbind.
First, create the numeric vector and assign it to SN:
SN <- Data[,6]/Data[,9]
Now you use cbind to append the numeric vector as a column to the existing data frame:
Data <- cbind(Data, SN)
Again, using the dollar operator $ is preferred, but it doesn't hurt seeing what an alternative looks like.
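One detail worth flagging: the asker wanted the column literally named S/N, which is not a syntactic R name, so it needs backticks or quotes. A sketch on a two-row stand-in for highway1:

```r
highway1 <- data.frame(slim = c(55, 60), acpt = c(4.6, 4.4))

# Backticks let you use the non-syntactic name with $;
# [["S/N"]] works for access as well
highway1$`S/N` <- highway1$slim / highway1$acpt
highway1[["S/N"]]
```

Plain highway1$S/N would instead be parsed as highway1$S divided by N, so the backticks are not optional here.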
I have a data frame that looks like this:
> (eventStudyList120_After)
Dates Company Returns Market Returns Abnormal Returns
1 25.08.2009 4.81 0.62595516 4.184045
2 26.08.2009 4.85 0.89132960 3.958670
3 27.08.2009 4.81 -0.93323011 5.743230
4 28.08.2009 4.89 1.00388875 3.886111
5 31.08.2009 4.73 2.50655343 2.223447
6 01.09.2009 4.61 0.28025201 4.329748
7 02.09.2009 4.77 0.04999239 4.720008
8 03.09.2009 4.69 -1.52822071 6.218221
9 04.09.2009 4.89 -1.48860354 6.378604
10 07.09.2009 4.85 -0.38646531 5.236465
11 08.09.2009 4.89 -1.54065680 6.430657
12 09.09.2009 5.01 -0.35443455 5.364435
13 10.09.2009 5.01 -0.54107231 5.551072
14 11.09.2009 4.89 0.15189458 4.738105
15 14.09.2009 4.93 -0.36811321 5.298113
16 15.09.2009 4.93 -1.31185921 6.241859
17 16.09.2009 4.93 -0.53398643 5.463986
18 17.09.2009 4.97 0.44765285 4.522347
19 18.09.2009 5.01 0.81109101 4.198909
20 21.09.2009 5.01 -0.76254262 5.772543
21 22.09.2009 4.93 0.11309704 4.816903
22 23.09.2009 4.93 1.64429117 3.285709
23 24.09.2009 4.93 0.37294212 4.557058
24 25.09.2009 4.93 -2.59894035 7.528940
25 28.09.2009 5.21 0.29588776 4.914112
26 29.09.2009 4.93 0.49762314 4.432377
27 30.09.2009 5.41 2.17220569 3.237794
28 01.10.2009 5.21 1.67482716 3.535173
29 02.10.2009 5.25 -0.79014302 6.040143
30 05.10.2009 4.97 -2.69996146 7.669961
31 06.10.2009 4.97 0.18086490 4.789135
32 07.10.2009 5.21 -1.39072582 6.600726
33 08.10.2009 5.05 0.04210020 5.007900
34 09.10.2009 5.37 -1.14940251 6.519403
35 12.10.2009 5.13 1.16479551 3.965204
36 13.10.2009 5.37 -2.24208216 7.612082
37 14.10.2009 5.13 0.41327193 4.716728
38 15.10.2009 5.21 1.54473332 3.665267
39 16.10.2009 5.13 -1.73781565 6.867816
40 19.10.2009 5.01 0.66416288 4.345837
41 20.10.2009 5.09 -0.27007314 5.360073
42 21.10.2009 5.13 1.26968917 3.860311
43 22.10.2009 5.01 0.29432965 4.715670
44 23.10.2009 5.01 1.73758937 3.272411
45 26.10.2009 5.21 0.38854011 4.821460
46 27.10.2009 5.21 2.72671890 2.483281
47 28.10.2009 5.21 -1.76846884 6.978469
48 29.10.2009 5.41 2.95523593 2.454764
49 30.10.2009 5.37 -0.22681024 5.596810
50 02.11.2009 5.33 1.38835160 3.941648
51 03.11.2009 5.33 -1.83751398 7.167514
52 04.11.2009 5.21 -0.68721323 5.897213
53 05.11.2009 5.21 -0.26954741 5.479547
54 06.11.2009 5.21 -2.24083342 7.450833
55 09.11.2009 5.17 0.39168239 4.778318
56 10.11.2009 5.09 -0.99082271 6.080823
57 11.11.2009 5.17 0.07924735 5.090753
58 12.11.2009 5.81 -0.34424802 6.154248
59 13.11.2009 6.21 -2.00230195 8.212302
60 16.11.2009 7.81 0.48655978 7.323440
61 17.11.2009 7.69 -0.21092848 7.900928
62 18.11.2009 7.61 1.55605852 6.053941
63 19.11.2009 7.21 0.71028798 6.499712
64 20.11.2009 7.01 -2.38596631 9.395966
65 23.11.2009 7.25 0.55334705 6.696653
66 24.11.2009 7.21 -0.54239847 7.752398
67 25.11.2009 7.25 3.36386413 3.886136
68 26.11.2009 7.01 -1.28927630 8.299276
69 27.11.2009 7.09 0.98053264 6.109467
70 30.11.2009 7.09 -2.61935612 9.709356
71 01.12.2009 7.01 -0.11946242 7.129462
72 02.12.2009 7.21 0.17152317 7.038477
73 03.12.2009 7.21 -0.79343095 8.003431
74 04.12.2009 7.05 0.43919792 6.610802
75 07.12.2009 7.01 1.62169804 5.388302
76 08.12.2009 7.01 0.74055990 6.269440
77 09.12.2009 7.05 -0.99504492 8.045045
78 10.12.2009 7.21 -0.79728245 8.007282
79 11.12.2009 7.21 -0.73784636 7.947846
80 14.12.2009 6.97 -0.14656077 7.116561
81 15.12.2009 6.89 -1.42712116 8.317121
82 16.12.2009 6.97 0.95988962 6.010110
83 17.12.2009 6.69 0.22718293 6.462817
84 18.12.2009 6.53 -1.46958638 7.999586
85 21.12.2009 6.33 -0.21365446 6.543654
86 22.12.2009 6.65 -0.17256757 6.822568
87 23.12.2009 7.05 -0.59940253 7.649403
88 24.12.2009 7.05 NA NA
89 25.12.2009 7.05 NA NA
90 28.12.2009 7.05 -0.22307263 7.273073
91 29.12.2009 6.81 0.76736750 6.042632
92 30.12.2009 6.81 0.00000000 6.810000
93 31.12.2009 6.81 -1.50965723 8.319657
94 01.01.2010 6.81 NA NA
95 04.01.2010 6.65 0.06111069 6.588889
96 05.01.2010 6.65 -0.13159651 6.781597
97 06.01.2010 6.65 0.09545081 6.554549
98 07.01.2010 6.49 -0.32727619 6.817276
99 08.01.2010 6.81 -0.07225296 6.882253
100 11.01.2010 6.81 1.61131397 5.198686
101 12.01.2010 6.57 -0.40791980 6.977920
102 13.01.2010 6.85 -0.53016383 7.380164
103 14.01.2010 6.93 1.82016604 5.109834
104 15.01.2010 6.97 -0.62552046 7.595520
105 18.01.2010 6.93 -0.80490241 7.734902
106 19.01.2010 6.77 2.02857647 4.741424
107 20.01.2010 6.93 1.68204556 5.247954
108 21.01.2010 6.89 1.02683875 5.863161
109 22.01.2010 6.90 0.96765669 5.932343
110 25.01.2010 6.73 -0.57603687 7.306037
111 26.01.2010 6.81 0.50990350 6.300096
112 27.01.2010 6.81 1.64994011 5.160060
113 28.01.2010 6.61 -1.13511086 7.745111
114 29.01.2010 6.53 -0.82206204 7.352062
115 01.02.2010 7.03 -1.03993428 8.069934
116 02.02.2010 6.93 0.61692305 6.313077
117 03.02.2010 7.73 2.53012795 5.199872
118 04.02.2010 7.97 1.96223075 6.007769
119 05.02.2010 9.33 -0.76549820 10.095498
120 08.02.2010 8.01 -0.34391479 8.353915
When I write it to a CSV file it looks like this:
write.table(eventStudyList120_After$`Abnormal Returns`, file = "C://Users//AbnormalReturns.csv", sep = ";")
In fact, I want it to look like this:
So my question is:
How do I write the data frame as it is into a CSV, and how do I transpose the Abnormal Returns column and put the header as in the example sheet?
Two approaches: transpose the data in R or in Excel
In R
Add an index column, select the columns you want and transpose the data using the function t
d <- anscombe
d$index <- 1:nrow(anscombe)
td <- t(d[c("index", "x1")])
write.table(td, "filename.csv", col.names = F, sep = ";")
Result:
"index";1;2;3;4;5;6;7;8;9;10;11
"x1";10;8;13;9;11;14;6;4;12;7;5
In Excel
Excel allows you to transpose data as well: http://office.microsoft.com/en-us/excel-help/switch-transpose-columns-and-rows-HP010224502.aspx
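Applied to the asker's data frame, the same pattern might look like this (three rows stand in for the full 120, and the output filename is illustrative):

```r
eventStudyList120_After <- data.frame(
  Dates = c("25.08.2009", "26.08.2009", "27.08.2009"),
  `Abnormal Returns` = c(4.184045, 3.958670, 5.743230),
  check.names = FALSE
)

# t() turns the two selected columns into two rows: one row of dates
# serving as the header, one row of abnormal returns
td <- t(eventStudyList120_After[c("Dates", "Abnormal Returns")])
write.table(td, "AbnormalReturns.csv", col.names = FALSE, sep = ";")
```

The row names of the transposed matrix ("Dates", "Abnormal Returns") become the first field of each CSV line, which gives the header layout asked about.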