What is the main purpose of using the function "detach()"?

We were shown the following R code in class:
attach(LifeCycleSavings)
boxplot(sr, main = "Box Plot of Savings Ratio")
detach()
However, why would we need to use "detach()" here? I typed "LifeCycleSavings" and still got an output as follows:
> LifeCycleSavings
              sr pop15 pop75     dpi ddpi
Australia  11.43 29.35  2.87 2329.68 2.87
Austria    12.07 23.32  4.41 1507.99 3.93
Belgium    13.17 23.80  4.43 2108.47 3.82
So it seems the dataset "LifeCycleSavings" did not get detached.

To answer your specific question: detach() in this context removes that data frame from the search path. This means that you can no longer refer to its variables by name alone:
> attach(LifeCycleSavings)
> sr
[1] 11.43 12.07 13.17 5.75 12.88 8.79 0.60 11.90 4.98 10.78 16.85 3.59 11.24 12.64 12.55 10.67 3.01
[18] 7.70 1.27 9.00 11.34 14.28 21.10 3.98 10.35 15.48 10.25 14.65 10.67 7.30 4.44 2.02 12.70 12.78
[35] 12.49 11.14 13.30 11.77 6.86 14.13 5.13 2.81 7.81 7.56 9.22 18.56 7.72 9.24 8.89 4.71
> detach(LifeCycleSavings)
> sr
Error: object 'sr' not found
So at this point if we wanted to use sr we'd need to type LifeCycleSavings$sr in order to tell R where to look.
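If you want the convenience of bare column names without attach(), with() evaluates an expression inside the data frame for just that one call and leaves the search path untouched. A minimal sketch of the class example rewritten that way:
with(LifeCycleSavings, boxplot(sr, main = "Box Plot of Savings Ratio"))
# equivalent to boxplot(LifeCycleSavings$sr, ...), and no detach() is needed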
As Andrie mentioned, many people frown on this sort of use of attach and detach (although detach is sometimes also used for removing packages from the search path) because it can really clutter up your search path.
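To see that clutter directly, you can inspect the search path before and after attaching; a small sketch (the exact entries will vary by session):
search()
# [1] ".GlobalEnv"  ...  "package:stats"  ...  "package:base"
attach(LifeCycleSavings)
search()
# "LifeCycleSavings" now sits in position 2, where its columns can mask
# any same-named objects further down the path
detach(LifeCycleSavings)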

Related

How can I change date formats in a column of a dataframe?

I'd like to change the format of an entire column in a dataframe in R.
I saw answers for this in Python, and I've been attempting all sorts of code, but nothing has worked. I've finally found a way to check and verify the value types of each column of my dataframe, and the date column comes up as character. I'd like to change that to date.
...also, on another note, I've asked a few questions and people always say to put sample data in here, but I don't know how to copy my dataframe from RStudio Cloud...? But I'll attempt to show some data and my code.
data frame:
Id,ActivityDate,TotalSteps,TotalDistance,TrackerDistance,LoggedActivitiesDistance,VeryActiveDistance,ModeratelyActiveDistance,LightActiveDistance,SedentaryActiveDistance,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories
1503960366,4/12/2016,13162,8.50,8.50,0,1.88,0.55,6.06,0.00,25,13,328,728,1985
1503960366,4/13/2016,10735,6.97,6.97,0,1.57,0.69,4.71,0.00,21,19,217,776,1797
1503960366,4/14/2016,10460,6.74,6.74,0,2.44,0.40,3.91,0.00,30,11,181,1218,1776
1503960366,4/15/2016,9762,6.28,6.28,0,2.14,1.26,2.83,0.00,29,34,209,726,1745
1503960366,4/16/2016,12669,8.16,8.16,0,2.71,0.41,5.04,0.00,36,10,221,773,1863
1503960366,4/17/2016,9705,6.48,6.48,0,3.19,0.78,2.51,0.00,38,20,164,539,1728
1503960366,4/18/2016,13019,8.59,8.59,0,3.25,0.64,4.71,0.00,42,16,233,1149,1921
1503960366,4/19/2016,15506,9.88,9.88,0,3.53,1.32,5.03,0.00,50,31,264,775,2035
1503960366,4/20/2016,10544,6.68,6.68,0,1.96,0.48,4.24,0.00,28,12,205,818,1786
1503960366,4/21/2016,9819,6.34,6.34,0,1.34,0.35,4.65,0.00,19,8,211,838,1775
1503960366,4/22/2016,12764,8.13,8.13,0,4.76,1.12,2.24,0.00,66,27,130,1217,1827
1503960366,4/23/2016,14371,9.04,9.04,0,2.81,0.87,5.36,0.00,41,21,262,732,1949
1503960366,4/24/2016,10039,6.41,6.41,0,2.92,0.21,3.28,0.00,39,5,238,709,1788
1503960366,4/25/2016,15355,9.80,9.80,0,5.29,0.57,3.94,0.00,73,14,216,814,2013
1503960366,4/26/2016,13755,8.79,8.79,0,...
(the rest of the last row was cut off in the paste)
Here's what I ran to read in the CSV:
daily_activity <- read_csv("dailyActivity_merged.csv")
I then ran:
str(daily_activity)
To check what type of data each column is made of. I see my activity dates are
chr types, which I looked up and saw means character. Is this correct? I used this same dataset in Google Sheets to double-check it; there are 600 rows' worth of each column, and the dates came back with a ' in front of the numbers, as if they were entered like the distance measurements in the columns following the date column. This is obviously incorrect, as this is a date, not a distance, so now I'd like to change the entire column to dates.
I've tried:
as_date(daily_activity, ActivityDate)
mdy(ActivityDate)
help("mdy")
help("print")
help("str")
str(daily_activity) %>% as.date(ActivityDate,"mm/dd/yyyy")
Not sure what to do, and there doesn't seem to be any site or reference for such a thing; I've been googling for answers and help for 2 days now.
The 2nd part of my quest is to then use the newly created date column, and the already-present Id column, to merge two dataframes... is that possible? Both dataframes have a date column that needs to be converted, and they both have Id columns, so I was thinking of a join statement... does that exist in R? I want to join both dataframes in their entirety, matching on both of those columns, Id and date. The other columns in the two dataframes are different, and I need to work with information from both dataframes together.
Has any of this made sense? I hope so. I asked the questions as if I was talking to someone, like they recommend doing on this site. Thanks in advance for any advice, help or information.
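Since no answer is attached to this question here, a hedged sketch of one way to do both steps. It assumes the lubridate and dplyr packages, and daily_sleep below is a hypothetical name for the second data frame with matching Id and ActivityDate columns. Note that mdy() needs the actual column (not the bare name ActivityDate), and the result must be assigned back:
library(lubridate)  # mdy() parses "4/12/2016"-style month/day/year strings
library(dplyr)

daily_activity$ActivityDate <- mdy(daily_activity$ActivityDate)
str(daily_activity)  # ActivityDate should now show as class Date

# join the two frames on both keys, after converting the second frame's
# date column the same way; daily_sleep is a hypothetical name
combined <- inner_join(daily_activity, daily_sleep,
                       by = c("Id", "ActivityDate"))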

Read in CSV in mixed English and French number format

I would like to read a CSV into R that is quoted and comma-separated (i.e. sep = "," not sep = ";" as read.csv2 defaults to) but that
uses the comma inside fields as the decimal separator
contains periods to separate each group of three digits from the right
An example of a problematic entry is "3.051,00" in the final line of the excerpt from the CSV shown.
I tried
dat <- read.csv2("path_to_csv.csv", sep = ",", stringsAsFactors = FALSE)
and a variant using read.csv (both are identical except for their defaults, as noted in Difference between read.csv() and read.csv2() in R). Both return improperly-formatted data.frames (e.g. containing 3.051,00).
Can I read this comma-separated file in directly with read.table without having to perform text-preprocessing?
Excerpt of CSV
praf,pmek,plcg,PIP2,PIP3,p44/42,pakts473,PKA,PKC,P38,pjnk
"26,40","13,20","8,82","18,30","58,80","6,61","17,00","414,00","17,00","44,90","40,00"
"35,90","16,50","12,30","16,80","8,13","18,60","32,50","352,00","3,37","16,50","61,50"
"59,40","44,10","14,60","10,20","13,00","14,90","32,50","403,00","11,40","31,90","19,50"
"62,10","51,90","13,60","30,20","10,60","14,30","37,90","692,00","6,49","25,00","91,40"
"75,00","33,40","1,00","31,60","1,00","19,80","27,60","505,00","18,60","31,10","7,64"
"20,40","15,10","7,99","101,00","35,90","9,14","22,90","400,00","11,70","22,70","6,85"
"47,80","19,60","17,50","33,10","82,00","17,90","35,20","956,00","22,50","43,30","20,00"
"59,90","53,30","11,80","77,70","12,90","11,10","37,90","1.407,00","18,80","29,40","16,80"
"46,60","27,10","12,40","109,00","21,90","21,50","38,20","207,00","11,00","31,30","12,00"
"51,90","21,30","49,10","58,80","10,80","58,80","200,00","3.051,00","15,30","39,20","15,70"
Note: I am aware of the question European and American decimal format for thousands, which is not sufficient here. That user preprocesses the file they want to read in, whereas I would like a direct means of reading a CSV of the kind shown into R.
Most of it is resolved with dec = ",":
# saved your data to 'file.csv'
out <- read.csv("file.csv", dec=",")
head(out)
# praf pmek plcg PIP2 PIP3 p44.42 pakts473 PKA PKC P38 pjnk
# 1 26.4 13.2 8.82 18.3 58.80 6.61 17.0 414,00 17.00 44.9 40.00
# 2 35.9 16.5 12.30 16.8 8.13 18.60 32.5 352,00 3.37 16.5 61.50
# 3 59.4 44.1 14.60 10.2 13.00 14.90 32.5 403,00 11.40 31.9 19.50
# 4 62.1 51.9 13.60 30.2 10.60 14.30 37.9 692,00 6.49 25.0 91.40
# 5 75.0 33.4 1.00 31.6 1.00 19.80 27.6 505,00 18.60 31.1 7.64
# 6 20.4 15.1 7.99 101.0 35.90 9.14 22.9 400,00 11.70 22.7 6.85
Only one column is still character:
sapply(out, class)
# praf pmek plcg PIP2 PIP3 p44.42 pakts473 PKA PKC P38
# "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "character" "numeric" "numeric"
# pjnk
# "numeric"
This can be resolved post-read with:
ischr <- sapply(out, is.character)
# chartr() turns every "," into "." and every "." into " "; gsub() then strips
# the spaces, so "3.051,00" becomes "3051.00" before as.numeric()
out[ischr] <- lapply(out[ischr], function(z) as.numeric(gsub(" ", "", chartr(",.", ". ", z))))
out$PKA
# [1] 414 352 403 692 505 400 956 1407 207 3051
If you'd rather read it in without post-processing, you can wrap the file in pipe(), assuming you have sed available (see the note below). The first substitution deletes the thousands periods between digits; the second turns the decimal commas into periods:
out <- read.csv(pipe("sed -E 's/([0-9])[.]([0-9])/\\1\\2/g;s/([0-9]),([0-9])/\\1.\\2/g' < file.csv"))
Note: sed is generally available on Linux/macOS systems, and on Windows it is included with Rtools.
Like r2evans's comment says, dec = "," takes care of the cases without thousands separators. Then use lapply/gsub to process the other cases, which are still of class "character".
txt <- '
praf,pmek,plcg,PIP2,PIP3,p44/42,pakts473,PKA,PKC,P38,pjnk
"26,40","13,20","8,82","18,30","58,80","6,61","17,00","414,00","17,00","44,90","40,00"
"35,90","16,50","12,30","16,80","8,13","18,60","32,50","352,00","3,37","16,50","61,50"
"59,40","44,10","14,60","10,20","13,00","14,90","32,50","403,00","11,40","31,90","19,50"
"62,10","51,90","13,60","30,20","10,60","14,30","37,90","692,00","6,49","25,00","91,40"
"75,00","33,40","1,00","31,60","1,00","19,80","27,60","505,00","18,60","31,10","7,64"
"20,40","15,10","7,99","101,00","35,90","9,14","22,90","400,00","11,70","22,70","6,85"
"47,80","19,60","17,50","33,10","82,00","17,90","35,20","956,00","22,50","43,30","20,00"
"59,90","53,30","11,80","77,70","12,90","11,10","37,90","1.407,00","18,80","29,40","16,80"
"46,60","27,10","12,40","109,00","21,90","21,50","38,20","207,00","11,00","31,30","12,00"
"51,90","21,30","49,10","58,80","10,80","58,80","200,00","3.051,00","15,30","39,20","15,70"
'
df1 <- read.csv(textConnection(txt), dec = ",")
i <- sapply(df1, is.character)
df1[i] <- lapply(df1[i], \(x) gsub("\\.", "", x))
df1[i] <- lapply(df1[i], \(x) as.numeric(sub(",", ".", x)))
df1
#> praf pmek plcg PIP2 PIP3 p44.42 pakts473 PKA PKC P38 pjnk
#> 1 26.4 13.2 8.82 18.3 58.80 6.61 17.0 414 17.00 44.9 40.00
#> 2 35.9 16.5 12.30 16.8 8.13 18.60 32.5 352 3.37 16.5 61.50
#> 3 59.4 44.1 14.60 10.2 13.00 14.90 32.5 403 11.40 31.9 19.50
#> 4 62.1 51.9 13.60 30.2 10.60 14.30 37.9 692 6.49 25.0 91.40
#> 5 75.0 33.4 1.00 31.6 1.00 19.80 27.6 505 18.60 31.1 7.64
#> 6 20.4 15.1 7.99 101.0 35.90 9.14 22.9 400 11.70 22.7 6.85
#> 7 47.8 19.6 17.50 33.1 82.00 17.90 35.2 956 22.50 43.3 20.00
#> 8 59.9 53.3 11.80 77.7 12.90 11.10 37.9 1407 18.80 29.4 16.80
#> 9 46.6 27.1 12.40 109.0 21.90 21.50 38.2 207 11.00 31.3 12.00
#> 10 51.9 21.3 49.10 58.8 10.80 58.80 200.0 3051 15.30 39.2 15.70
Created on 2022-02-07 by the reprex package (v2.0.1)

'origin' must be supplied & unambiguous format

I am a beginner in R. I have the following problem - I want to load a CSV file into R and then convert it into a XTS object. However, after the operation I get an error. First, a small snippet of the data:
a=read.csv('/Users/..../Desktop/SYNEKTIK.csv',h=T)
head(a)
Name Date Open High Low Close Volume
1 SYNEKTIK 20110809 5.76 8.23 5.76 8.23 28062
2 SYNEKTIK 20110810 9.78 9.78 8.10 8.13 9882
3 SYNEKTIK 20110811 9.00 9.00 9.00 9.00 2978
4 SYNEKTIK 20110812 9.70 9.70 8.90 9.60 5748
5 SYNEKTIK 20110816 9.70 11.00 9.70 11.00 23100
6 SYNEKTIK 20110818 10.90 11.00 10.90 10.90 319
The following does not work:
w=xts(a[,-1],order.by=as.POSIXct(a[,1]))
As it produces the following error:
Error in as.POSIXlt.character(as.character(x), ...) :
  character string is not in a standard unambiguous format
Another try that did not work:
a=a[,-1]
head(a)
Date Open High Low Close Volume
1 20110809 5.76 8.23 5.76 8.23 28062
2 20110810 9.78 9.78 8.10 8.13 9882
3 20110811 9.00 9.00 9.00 9.00 2978
4 20110812 9.70 9.70 8.90 9.60 5748
5 20110816 9.70 11.00 9.70 11.00 23100
6 20110818 10.90 11.00 10.90 10.90 319
w=xts(a[,-1],order.by=as.POSIXct(a[,1]))
Error in as.POSIXct.numeric(a[, 1]) : 'origin' must be supplied
Finally, when I saved the date in the format yyyy-mm-dd, everything turned out right and I could convert to an xts object. Why?
Maybe something like this will help:
w <- xts(a[,c(-1,-2)], order.by = as.Date(as.character(a[,2]), "%Y%m%d"))
As to why: as.POSIXct() dispatches on the type of its input. In your first attempt, a[,1] is the Name column, a character string that R cannot parse as a date, hence "not in a standard unambiguous format". After dropping that column, a[,1] is the numeric Date column, and a numeric is interpreted as a count of seconds from some origin, hence "'origin' must be supplied". Converting the date column to character and parsing it with an explicit "%Y%m%d" format tells R exactly how to read it, and yyyy-mm-dd worked for you directly because it is one of the standard formats R recognises.
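A quick sketch of the three cases side by side (note that as.POSIXct() on a numeric only errors in older R; since R 4.3 it assumes the 1970-01-01 origin by default):
as.POSIXct("SYNEKTIK")    # character string not in a standard unambiguous format
as.POSIXct(20110809)      # numeric: 'origin' must be supplied (older R)
as.Date(as.character(20110809), "%Y%m%d")  # "2011-08-09"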

Merging two dataframes in R with date

I have the following 2 dataframes:
> bvg1
Parameters X18.Oct.14 X19.Oct.14 X20.Oct.14 X21.Oct.14 X22.Oct.14 X23.Oct.14 X24.Oct.14
1 24K Equivalent Plan 29.00 29.60 33.80 36.60 35.30 31.90 29.00
2 24K Equivalent Act 28.80 31.00 35.40 35.90 34.70 33.40 31.90
3 Plan Rep WS 2463.00 2513.00 2869.00 3115.00 2999.00 2714.00 2468.00
4 Act Rep WS 2447.00 2633.00 3013.00 3054.00 2953.00 2842.00 2714.00
5 Rep WS Var -16.00 120.00 144.00 -61.00 -46.00 128.00 246.00
6 Plan Rep Intakes 568.00 461.00 1159.00 1146.00 1126.00 1124.00 1106.00
7 Act Rep Intakes 707.00 494.00 1106.00 1096.00 1274.00 1087.00 1101.00
8 Rep Intakes Var 139.00 33.00 -53.00 -50.00 148.00 -37.00 -5.00
9 Plan Rep Comps_DL 468.00 54.00 836.00 1190.00 1327.00 1286.00 1108.00
10 Act Rep Comps_DL 471.00 70.00 995.00 1137.00 1323.00 1150.00 1073.00
11 Rep Comps Var_DL 3.00 16.00 159.00 -53.00 -4.00 -136.00 -35.00
12 Plan Rep Mandays_DL 148.00 19.00 260.00 368.00 412.00 398.00 345.00
13 Act Rep Mandays_DL 147.00 19.00 303.00 359.00 423.00 374.00 348.00
14 Rep Mandays Var_DL -1.00 1.00 43.00 -9.00 12.00 -24.00 3.00
15 Plan FVR Mandays_DL 0.00 0.00 4.00 18.00 18.00 18.00 18.00
16 Act FVR Mandays_DL 0.00 0.00 4.00 7.00 8.00 8.00 7.00
17 FVR Mandays Var_DL 0.00 0.00 0.00 -11.00 -10.00 -10.00 -11.00
18 Plan Rep Prod_DL 3.16 2.88 3.21 3.23 3.22 3.23 3.21
19 Act Rep Prod_DL 3.21 3.62 3.28 3.16 3.12 3.07 3.08
20 Rep Prod Var_DL 0.05 0.74 0.07 -0.07 -0.10 -0.16 -0.13
> bvg2
Parameters X18.Oct X19.Oct X20.Oct X21.Oct X22.Oct X23.Oct X24.Oct
1 24K Equivalent Plan 30.50 31.30 35.10 36.10 33.60 28.80 25.50
2 24K Equivalent Act 31.40 33.40 36.60 38.10 36.80 34.40 32.10
3 Plan Rep WS 3419.00 3509.00 3933.00 4041.00 3764.00 3220.00 2859.00
4 Act Rep WS 3514.00 3734.00 4098.00 4271.00 4122.00 3852.00 3591.00
5 Rep WS Var 95.00 225.00 165.00 230.00 358.00 632.00 732.00
6 Plan Rep Intakes 813.00 613.00 1559.00 1560.00 1506.00 1454.00 1410.00
7 Act Rep Intakes 964.00 602.00 1629.00 1532.00 1657.00 1507.00 1439.00
8 Rep Intakes Var 151.00 -11.00 70.00 -28.00 151.00 53.00 29.00
9 Plan Rep Comps_DL 675.00 175.00 1331.00 1732.00 1938.00 1706.00 1493.00
10 Act Rep Comps_DL 718.00 224.00 1389.00 1609.00 1848.00 1698.00 1537.00
11 Rep Comps Var_DL 43.00 49.00 58.00 -123.00 -90.00 -8.00 44.00
12 Plan Rep Mandays_DL 203.00 58.00 428.00 541.00 605.00 536.00 475.00
13 Act Rep Mandays_DL 215.00 63.00 472.00 542.00 608.00 556.00 523.00
14 Rep Mandays Var_DL 12.00 5.00 44.00 2.00 3.00 20.00 48.00
15 Plan FVR Mandays_DL 0.00 0.00 1.00 12.00 2.00 32.00 57.00
16 Act FVR Mandays_DL 0.00 0.00 2.00 2.00 5.00 5.00 5.00
17 FVR Mandays Var_DL 0.00 0.00 1.00 -10.00 3.00 -27.00 -52.00
18 Plan Rep Prod_DL 3.33 3.03 3.11 3.20 3.20 3.18 3.14
19 Act Rep Prod_DL 3.34 3.56 2.94 2.97 3.04 3.05 2.94
20 Rep Prod Var_DL 0.01 0.53 -0.17 -0.23 -0.16 -0.13 -0.20
It is time series data, i.e. 24K Equivalent Plan was 29.00 on 18th Oct, 29.60 on 19th Oct and 33.80 on 20th Oct. The first dataframe has data for one business unit and the second dataframe has data for a different business unit.
I want to merge the dataframes into one and analyse the variance, i.e. where they differ in values, and draw ggplots such as histograms showing the difference, time series plots, etc.
I have tried the following:
I can combine the two dataframes with:
joined = rbind(bvg1, bvg2)
However, I then can't identify whether a record belongs to the bvg1 or the bvg2 df.
If I add an additional column, i.e.
bvg1$id = "bvg1"
bvg2$id = "bvg2"
then the rbind command doesn't work and gives the following error:
Error in match.names(clabs, names(xi)) :
names do not match previous names
Any sample code would be highly appreciated.
You can make the column names of the two datasets match by stripping the year suffix (e.g. .14) from the names in bvg1. This can be done with a regex. The code below uses a lookbehind: (?<=[A-Za-z]) requires the match to begin immediately after a letter, and \..*$ matches a literal dot plus everything up to the end of the string; gsub replaces that match with "". In "X18.Oct.14" the first dot follows the digit 8 and so cannot start a match; only ".14" (after the letter t) is removed, giving "X18.Oct". "Parameters" contains no dot and is left unchanged.
colnames(bvg1) <- gsub("(?<=[A-Za-z])\\..*$", "", colnames(bvg1), perl = TRUE)
res <- rbind(bvg1, bvg2)
dim(res)
#[1] 40 9
head(res,3)
# Parameters X18.Oct X19.Oct X20.Oct X21.Oct X22.Oct X23.Oct X24.Oct
#1 24K Equivalent Plan 29.0 29.6 33.8 36.6 35.3 31.9 29.0
#2 24K Equivalent Act 28.8 31.0 35.4 35.9 34.7 33.4 31.9
#3 Plan Rep WS 2463.0 2513.0 2869.0 3115.0 2999.0 2714.0 2468.0
# id
#1 bvg1
#2 bvg1
#3 bvg1
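For the variance analysis and plots the question asks about, reshaping res to long format is one option; a hedged sketch, assuming the tidyr and ggplot2 packages:
library(tidyr)
library(ggplot2)

# one row per (Parameters, id, Date) combination
long <- pivot_longer(res, cols = starts_with("X"),
                     names_to = "Date", values_to = "Value")

# time series per parameter, coloured by business unit
ggplot(long, aes(Date, Value, colour = id, group = id)) +
  geom_line() +
  facet_wrap(~ Parameters, scales = "free_y")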

Error while producing an ARMA model using the TSA package in R

Has anyone else had this problem, or even better, does anyone know why this is giving me an error?
I'm attempting to create an ARMA model of order 3, 3. I'm using the TSA package.
stocks_arma <- arma(stocks$close, order = c(3,3))
I'm getting this warning:
Warning message:
In arma(VIXts, order = c(3, 3)) : Hessian negative-semidefinite
I understand that a negative-semidefinite Hessian is a bad thing because we usually want a clear global maximum. However, I don't understand why this is happening. I am unsure whether this is a mathematical issue or a syntactical issue.
My data is a very modest vector of 1000 entries. Here is one-tenth of it:
15.14 15.31 15.08 15.24 16.41 17.99 17.92 16.65 16.68 18.61 18.49 19.08 17.58 18.42 17.59 16.69 18.60 17.81 18.12 18.33 18.83 16.62 16.97 15.03 15.07 15.22 15.27 16.14 15.59 16.29 16.37 15.11 14.33 14.55 15.43 15.71 16.32 15.73 14.84 16.81 15.43 14.15 13.98 14.07 13.88 14.18 14.59 14.51 14.05 15.80 16.41 16.28 14.38 15.63 17.74 17.98 17.47 17.83 17.06 16.49 16.35 15.18 15.96 15.11 15.02 14.02 13.45 14.29 14.63 14.85 13.70 14.74 15.28 15.32 15.99 15.95 15.64 17.57 18.96 18.93 18.03 16.70 17.53 19.34 20.47 18.62 16.27 15.45 16.16 16.48 17.11 16.74 18.36 17.95 18.72 18.05 17.10 17.50 16.66 16.80 17.08 19.71 19.45 19.72 20.38
There is nothing overtly fishy about the values at all.
Any insight is very much appreciated.
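Not a definitive answer, but one common cause is worth checking: ARMA fitting assumes a stationary series, and raw closing prices usually are not, which can push the optimiser against a boundary and leave a degenerate (negative-semidefinite) Hessian. A hedged sketch, keeping the arma() call from the question but fitting log-returns instead of price levels:
# log-returns are typically much closer to stationary than raw prices
returns <- diff(log(stocks$close))
stocks_arma <- arma(returns, order = c(3, 3))

# or with base R's arima(): an ARMA(3,3) is an ARIMA(3,0,3)
fit <- arima(returns, order = c(3, 0, 3))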
