Loading a CSV file as a ts in R

Below are the monthly prices of a particular stock:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2008 46.09 50.01 48 48 50.15 43.45 41.05 41.67 36.66 25.02 22.98 22
2009 20.98 15 13.04 14.4 26.46 14.32 14.6 11.83 14 14.4 13.07 13.6
2010 15.31 15.71 18.97 15.43 13.5 13.8 14.21 12.73 12.35 13.17 14.59 15.01
2011 15.3 15.22 15.23 15 15.1 14.66 14.8 12.02 12.41 12.9 11.6 12.18
2012 12.45 13.33 12.4 14.16 13.99 13.75 14.4 15.38 16.3 18.02 17.29 19.49
2013 20.5 20.75 21.3 20.15 22.2 19.8 19.75 19.71 19.99 21.54 21.3 27.4
2014 23.3 20.5 20 22.7 25.4 25.05 25.08 24.6 24.5 21.2 20.52 18.41
2015 16.01 17.6 20.98 21.15 21.44 0 0 0 0 0 0 0
I want to decompose the data into seasonal and trend components, but I am not getting a result.
How can I load the data as a "ts" class data so I can decompose it?

Here is a solution using tidyr, which is fairly accessible.
library(dplyr); library(tidyr)
data %>%
  gather(month, price, -Year) %>%   # one row per year-month pair; name the value column "price"
  mutate(synth_date_txt = paste(month, "1,", Year),   # combine month and year into a date string
         date = as.Date(synth_date_txt, format = "%b %d, %Y")) %>%   # convert the string to a Date
  select(date, price)   # keep just the date and price
# date price
# 1 2008-01-01 46.09
# 2 2009-01-01 20.98
# 3 2010-01-01 15.31
# 4 2011-01-01 15.30
# 5 2012-01-01 12.45
This gives you dates in Date format (even though you only supplied a month and a year, not a full date). It should work for your time series analysis, but if you really need a timestamp you can just use as.POSIXct(date).
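If you save the result of that pipeline (call it monthly, an assumed name), one extra step gets you to a ts you can decompose. Note that gather() stacks the data month by month (all the Januaries first), so sort by date before building the series. A minimal sketch:
library(dplyr)
monthly_ts <- monthly %>%
  arrange(date) %>%                        # sort into calendar order first
  pull(price) %>%
  ts(start = c(2008, 1), frequency = 12)   # monthly series starting Jan 2008
plot(decompose(monthly_ts))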

Mike,
The program is R, and below is the code I have tried.
sev <- read.csv("X7UPM.csv")
se <- ts(sev, start = c(2008, 1), end = c(2015, 1), frequency = 12)
se
se <- se[, 1]
S <- decompose(se)
plot(se, col = "blue")
plot(decompose(se))
S.decom <- decompose(se, type = "mult")
plot(S.decom)
trend <- S.decom$trend
trend
seasonal <- S.decom$seasonal
seasonal
ts.plot(cbind(trend, trend * seasonal), lty = 1:2)
plot(stl(se, "periodic"))
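For what it's worth, the ts() call above is where this usually goes wrong: handed the whole data frame, ts() keeps Year as a column, so se[, 1] is just the years. A minimal sketch of the fix, assuming X7UPM.csv has the same layout as the table in the question (a Year column followed by twelve month columns):
sev <- read.csv("X7UPM.csv")
# Drop the Year column, then flatten row-wise so the values run Jan..Dec within each year
prices <- as.vector(t(sev[, -1]))
prices <- prices[prices != 0]   # assuming the trailing 2015 zeros are placeholders; decompose(type = "mult") cannot handle zeros
se <- ts(prices, start = c(2008, 1), frequency = 12)
plot(decompose(se, type = "mult"))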

Related

Seasonal package: Forecasts end date [...] must end on or before user-defined regression variables end date

I'm relatively new to R and have a question about time series formats for forecasting and seasonal adjustment using the seasonal package. I'm working with import.spc to generate function calls based on spec files.
Currently I have FORECAST{MAXLEAD=48}, with my time series ending in 2022-02, and I'm getting this error:
- forecasts end date, 2026.Feb, must end on or before user-defined regression variables end date, 2022.Feb.
Is this because my time series ends earlier than 2026-02? I tried appending NAs to the end of my historicals, but it didn't do much.
Alternatively, I also tried setting FORECAST{MAXLEAD=0}, but I ran into this error:
Error: X-13 has returned a non-zero exit status, which means that the current spec file cannot be processed for an unknown reason.
See my code below:
library("tidyverse")
library("seasonal")
fn<-import.spc("C:\\PATH\\TO\\SPEC\\FILE.spc")
x<-import.ts("C:\\PATH\\TO\\DATA\\FILE.dat")
x %>% (fn[1]$seas)
FILE.spc
SERIES{
TITLE = "Logging"
START = 2016.01
PERIOD = 12
SAVE = (A1 B1)
PRINT = BRIEF
NAME = '1011330000 - AE'
FILE = '"C:\\PATH\\TO\\DATA\\FILE.dat"}
TRANSFORM{FUNCTION = NONE
}
REGRESSION{
USER = (dum1 dum2 dum3 dum4 dum5 dum6 dum7 dum8 dum9 dum10 dum11)
START = 1986.01
USERTYPE = TD
FILE = 'C:\\PATH\\TO\\FILE\\FDUM8606.dat'
SAVE = (TD AO LS TC)
}
ARIMA{
MODEL = (0 1 1)(0 1 1)
}
ESTIMATE{
MAXITER = 3000
}
FORECAST{
MAXLEAD = 0
}
OUTLIER{
CRITICAL = 10.5
TYPES = AO
}
X11{
SEASONALMA = (s3x3)
MODE = ADD
PRINT = (BRIEF)
SAVE = (D10 D11 D16)
APPENDFCST = YES
FINAL = USER
SAVELOG = (Q Q2 M7 FB1 FD8 MSF)
}
FDUM8606.dat can be found here
FILE.dat
2016 2 51.1
2016 3 50.4
2016 4 47.9
2016 5 49.8
2016 6 52.0
2016 7 52.6
2016 8 52.6
2016 9 51.9
2016 10 52.1
2016 11 51.4
2016 12 49.9
2017 1 48.2
2017 2 49.6
2017 3 48.0
2017 4 47.6
2017 5 48.9
2017 6 50.4
2017 7 50.7
2017 8 50.6
2017 9 50.1
2017 10 49.7
2017 11 50.7
2017 12 50.2
2018 1 49.2
2018 2 49.8
2018 3 48.7
2018 4 47.8
2018 5 49.0
2018 6 49.2
2018 7 50.8
2018 8 50.6
2018 9 50.0
2018 10 49.6
2018 11 49.1
2018 12 49.7
2019 1 49.3
2019 2 48.1
2019 3 47.7
2019 4 45.4
2019 5 47.1
2019 6 48.8
2019 7 49.3
2019 8 50.5
2019 9 49.5
2019 10 51.6
2019 11 51.2
2019 12 49.1
2020 1 47.9
2020 2 47.9
2020 3 46.7
2020 4 42.0
2020 5 44.3
2020 6 45.7
2020 7 46.8
2020 8 46.7
2020 9 46.6
2020 10 47.5
2020 11 47.0
2020 12 48.1
2021 1 48.1
2021 2 48.0
2021 3 46.3
2021 4 43.4
2021 5 43.7
2021 6 46.8
2021 7 47.6
2021 8 48.0
2021 9 46.0
2021 10 45.5
2021 11 45.4
2021 12 44.7
2022 1 44.8
2022 2 45.1
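No definitive answer here, but a sketch of the usual workaround, taking the error at face value: the user regressors in FDUM8606.dat end in 2022.Feb, so the forecasts cannot extend past that date. Either extend the regressor file to cover the full forecast horizon (then MAXLEAD=48 becomes legal), or cap MAXLEAD from R. The x11.appendfcst override below is an assumption on my part: with MAXLEAD=0 there are no forecasts for X11{APPENDFCST=YES} to append, which may be what trips the second, "unknown reason" error. A minimal call, omitting the regression/outlier settings from the spec file:
library(seasonal)
x <- import.ts("C:\\PATH\\TO\\DATA\\FILE.dat")
# seas() exposes X-13 spec arguments as spec.argument pairs, so the spec file's
# FORECAST{MAXLEAD=...} and X11{APPENDFCST=...} can be overridden from R:
m <- seas(x,
          forecast.maxlead = 0,    # forecasts must end on or before 2022.Feb
          x11.appendfcst = "no")   # assumption: nothing to append when MAXLEAD = 0
summary(m)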

Why does humidity in DLNM (R) show "coef/vcov not consistent with basis matrix" when temperature is OK?

Hi everyone. I am using DLNM in R to analyze the lag effect of climatic conditions on the prevalence of a disease.
I followed somebody else's program strictly, and it worked for avg.temp and max.speed but showed the error "coef/vcov not consistent with basis matrix" for avg.ap and avg.hum. However, I only changed which variable is set in the code, and never changed any other code.
I have a hypothesis that maybe DLNM doesn't like wet weather. T T
I don't know what to do; can you help me?
Part 1 is the successfully run code, Part 2 is the code that showed the error, and Part 3 is the data I used.
Thank you very much. I hope you can help me.
Part 1. Successfully run code
attach(cpdlnm)
cb.temp <- crossbasis(avg.temp, lag = 1,
                      argvar = list(fun = "ns", knots = c(10)),
                      arglag = list(fun = "lin"))
modeltemp <- glm(pre1 ~ cb.temp + ns(no, 1 * 1),
                 family = quasipoisson(), cpdlnm)
pred1.temp <- crosspred(cb.temp,
                        modeltemp,
                        cen = round(median(avg.temp)),
                        bylag = 1)
Part 2. Error code
attach(cpdlnm)
cb.hum <- crossbasis(avg.hum, lag = 1,
                     argvar = list(fun = "ns", knots = c(10)),
                     arglag = list(fun = "lin"))
modelhum <- glm(pre1 ~ cb.hum + ns(no, 1 * 1),
                family = quasipoisson(), cpdlnm)
pred1.hum <- crosspred(cb.hum,   # this step shows "coef/vcov not consistent with basis matrix"
                       modelhum,
                       cen = round(median(avg.hum)),
                       bylag = 0.1)
Part 3. The data are as follows:
no pre1 date year month avg.ap avg.temp avg.hum max.speed
1 3.23 12-Jan 2012 1 996.60 9.00 81.60 5.30
2 6.04 12-Feb 2012 2 993.20 10.90 80.80 6.20
3 5.18 12-Mar 2012 3 991.00 16.40 78.70 7.60
4 4.07 12-Apr 2012 4 985.40 23.50 73.50 7.40
5 4.88 12-May 2012 5 982.60 26.30 77.20 7.00
6 5.11 12-Jun 2012 6 978.10 27.00 81.30 6.20
7 6.18 12-Jul 2012 7 979.50 28.10 77.70 6.40
8 6.17 12-Aug 2012 8 980.40 28.00 75.60 7.90
9 5.18 12-Sep 2012 9 987.60 25.30 73.60 6.30
10 5.16 12-Oct 2012 10 990.70 23.60 72.20 6.20
11 4.61 12-Nov 2012 11 991.70 18.00 79.70 6.90
12 5.26 12-Dec 2012 12 995.00 13.20 74.90 6.50
13 3.79 13-Jan 2013 1 997.10 11.20 78.40 5.70
14 3.87 13-Feb 2013 2 993.50 15.30 82.20 6.50
15 3.37 13-Mar 2013 3 989.90 20.20 74.20 8.00
16 2.85 13-Apr 2013 4 987.00 21.50 78.50 7.70
17 4.38 13-May 2013 5 983.30 25.60 79.20 6.80
18 5.67 13-Jun 2013 6 980.60 27.40 76.90 6.60
19 6.45 13-Jul 2013 7 981.30 28.00 77.50 7.10
20 6.95 13-Aug 2013 8 980.50 27.90 78.20 7.90
21 6.51 13-Sep 2013 9 985.90 25.40 77.60 6.00
22 8.16 13-Oct 2013 10 992.20 22.10 68.80 5.30
23 5.34 13-Nov 2013 11 994.50 18.70 72.30 6.20
24 6.18 13-Dec 2013 12 997.30 11.70 67.20 5.30
25 5.69 14-Jan 2014 1 996.70 12.70 70.30 6.00
26 6.44 14-Feb 2014 2 993.00 12.10 76.90 6.40
27 4.16 14-Mar 2014 3 991.60 16.50 83.90 7.30
28 4.13 14-Apr 2014 4 987.60 22.60 82.40 6.70
29 3.96 14-May 2014 5 983.60 25.70 78.80 7.70
30 4.72 14-Jun 2014 6 979.20 27.70 81.40 7.90
31 5.21 14-Jul 2014 7 980.70 28.30 80.20 9.40
32 5.29 14-Aug 2014 8 982.40 27.50 81.30 7.50
33 6.74 14-Sep 2014 9 984.70 27.10 77.70 8.50
34 4.80 14-Oct 2014 10 991.20 23.90 73.10 5.90
35 4.31 14-Nov 2014 11 993.30 18.60 79.60 6.20
36 4.35 14-Dec 2014 12 998.70 12.30 67.30 5.90
37 2.95 15-Jan 2015 1 996.70 13.30 76.30 6.20
38 4.63 15-Feb 2015 2 993.50 15.50 78.30 6.50
39 4.00 15-Mar 2015 3 991.70 17.70 83.40 6.30
40 4.16 15-Apr 2015 4 988.40 22.80 70.20 7.30
41 4.67 15-May 2015 5 982.40 26.70 80.50 8.00
42 5.62 15-Jun 2015 6 980.90 28.20 81.00 7.40
43 5.04 15-Jul 2015 7 980.20 27.30 79.40 6.70
44 5.79 15-Aug 2015 8 982.40 27.60 80.10 6.50
45 5.28 15-Sep 2015 9 986.30 26.00 84.60 6.50
46 4.39 15-Oct 2015 10 991.20 23.00 78.30 6.90
47 4.13 15-Nov 2015 11 993.50 19.40 85.30 6.90
48 3.30 15-Dec 2015 12 997.80 13.00 80.90 5.70
49 5.30 16-Jan 2016 1 996.00 11.80 82.30 6.40
50 4.57 16-Feb 2016 2 997.80 12.20 68.90 7.00
51 4.66 16-Mar 2016 3 991.70 17.00 78.90 7.00
52 4.01 16-Apr 2016 4 984.60 23.40 80.90 9.80
53 4.90 16-May 2016 5 983.80 25.50 78.70 8.30
54 3.75 16-Jun 2016 6 981.70 28.20 78.80 7.70
55 3.13 16-Jul 2016 7 981.10 28.90 77.60 7.60
56 3.25 16-Aug 2016 8 979.00 28.00 79.80 8.70
57 2.93 16-Sep 2016 9 984.30 26.60 75.20 6.40
58 2.93 16-Oct 2016 10 987.90 24.40 72.90 7.00
59 3.08 16-Nov 2016 11 993.40 18.10 79.60 6.70
60 2.99 16-Dec 2016 12 995.70 15.40 71.70 6.80
61 3.10 17-Jan 2017 1 994.70 14.50 79.20 6.50
62 3.75 17-Feb 2017 2 994.80 14.70 71.50 8.30
63 3.49 17-Mar 2017 3 990.20 16.50 83.60 8.50
64 3.36 17-Apr 2017 4 986.80 21.90 76.70 7.80
65 3.69 17-May 2017 5 985.00 24.80 77.50 10.00
66 3.76 17-Jun 2017 6 980.20 26.90 84.80 8.50
67 2.69 17-Jul 2017 7 981.00 27.50 83.60 9.80
68 3.05 17-Aug 2017 8 980.50 27.70 83.40 9.00
69 3.05 17-Sep 2017 9 984.20 27.60 81.50 7.10
70 2.46 17-Oct 2017 10 990.00 22.80 75.90 7.90
71 2.08 17-Nov 2017 11 993.00 17.80 79.50 7.00
72 2.32 17-Dec 2017 12 996.90 13.30 69.30 6.90
73 2.53 18-Jan 2018 1 992.10 12.00 78.40 8.10
74 3.29 18-Feb 2018 2 992.90 13.40 68.70 7.20
75 3.03 18-Mar 2018 3 988.30 19.20 78.20 9.10
76 2.30 18-Apr 2018 4 986.50 21.80 77.30 8.70
77 1.75 18-May 2018 5 982.60 26.70 79.40 8.90
78 2.03 18-Jun 2018 6 978.30 26.90 81.60 9.00
79 2.79 18-Jul 2018 7 976.80 27.90 82.10 9.20
80 2.32 18-Aug 2018 8 976.40 27.50 83.40 9.60
81 1.88 18-Sep 2018 9 983.50 26.10 80.10 8.90
82 2.76 18-Oct 2018 10 990.50 21.10 78.70 7.10
83 2.14 18-Nov 2018 11 991.50 18.20 80.30 7.10
84 1.78 18-Dec 2018 12 994.50 13.00 84.00 7.80
85 2.77 19-Jan 2019 1 995.20 11.70 84.50 7.30
86 4.60 19-Feb 2019 2 990.50 13.70 84.80 8.10
87 2.32 19-Mar 2019 3 987.70 17.30 85.90 9.90
88 2.07 19-Apr 2019 4 983.60 23.10 84.80 9.80
89 2.97 19-May 2019 5 981.80 24.30 83.20 7.70
90 2.48 19-Jun 2019 6 977.80 27.50 84.80 9.00
91 2.32 19-Jul 2019 7 977.20 27.80 85.00 8.90
92 2.06 19-Aug 2019 8 977.20 28.30 81.20 10.30
93 2.10 19-Sep 2019 9 984.60 26.40 72.70 8.20
94 2.89 19-Oct 2019 10 989.10 22.70 78.00 7.00
My guess is that when you specify knots = c(10), 10 lies within the range of temperature but not of humidity: if min(avg.hum) > 10, the knot falls outside the data and the basis can't be estimated properly.
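If that's right, a sketch of the fix (hypothetical, but consistent with the data shown: avg.hum runs roughly 67-86 and avg.ap roughly 976-999, so a knot at 10 lies entirely outside both) is to place the knot inside the observed range, e.g. at the median. A knot outside the data makes the ns() basis rank-deficient, glm() silently drops the aliased coefficient, and crosspred() then finds fewer coefficients than basis columns, which is exactly the coef/vcov error:
library(dlnm)
library(splines)
cb.hum <- crossbasis(cpdlnm$avg.hum, lag = 1,
                     argvar = list(fun = "ns",
                                   knots = quantile(cpdlnm$avg.hum, 0.5, na.rm = TRUE)),  # knot inside the range
                     arglag = list(fun = "lin"))
modelhum <- glm(pre1 ~ cb.hum + ns(no, 1 * 1),
                family = quasipoisson(), data = cpdlnm)
pred1.hum <- crosspred(cb.hum, modelhum,
                       cen = round(median(cpdlnm$avg.hum)),
                       bylag = 0.1)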

Define multiple columns when reading a txt file into R [duplicate]

This question already has answers here:
Reading text file with multiple space as delimiter in R
(3 answers)
Closed 1 year ago.
I am trying to read wave height data into R using this website
https://www.ndbc.noaa.gov/download_data.php?filename=51201h2017.txt.gz&dir=data/historical/stdmet/
my code is
buoy <- 51211
year <- 2017
one_yr <- paste0("https://www.ndbc.noaa.gov/view_text_file.php?filename=",
buoy, "h", year, ".txt.gz&dir=data/historical/stdmet/")
oneBuoy_oneYR.df <- read.csv(one_yr, fill = TRUE)
The resulting output is a data frame that has one column and 8985 observations. I have tried using sep = " ", but some columns are separated by two spaces instead of one. I have also tried read.delim.
I'm sure there is an easy solution, I just haven't found it.
Use fread from data.table. fread will detect the separator and colClasses automatically for you.
library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.4
buoy <- 51211
year <- 2017
one_yr <- paste0("https://www.ndbc.noaa.gov/view_text_file.php?filename=",
buoy, "h", year, ".txt.gz&dir=data/historical/stdmet/")
oneBuoy_oneYR.df <- fread(one_yr, fill = TRUE)
head(oneBuoy_oneYR.df)
#> #YY MM DD hh mm WDIR WSPD GST WVHT DPD APD MWD PRES ATMP WTMP DEWP
#> 1: #yr mo dy hr mn degT m/s m/s m sec sec degT hPa degC degC degC
#> 2: 2017 06 07 14 30 999 99.0 99.0 0.58 15.38 5.66 161 9999.0 999.0 26.4 999.0
#> 3: 2017 06 07 15 00 999 99.0 99.0 0.58 15.38 5.61 156 9999.0 999.0 26.4 999.0
#> 4: 2017 06 07 15 30 999 99.0 99.0 0.55 12.50 5.37 161 9999.0 999.0 26.3 999.0
#> 5: 2017 06 07 16 30 999 99.0 99.0 0.64 12.50 4.97 158 9999.0 999.0 26.3 999.0
#> 6: 2017 06 07 17 00 999 99.0 99.0 0.64 15.38 4.95 158 9999.0 999.0 26.3 999.0
#> VIS TIDE
#> 1: mi ft
#> 2: 99.0 99.00
#> 3: 99.0 99.00
#> 4: 99.0 99.00
#> 5: 99.0 99.00
#> 6: 99.0 99.00
Created on 2021-05-31 by the reprex package (v0.3.0)
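One caveat with these NDBC files: the second line holds units (#yr mo dy ...), which fread keeps as a data row, forcing every column to character. A sketch that drops it, assuming the two-line header layout shown above:
library(data.table)
hdr <- names(fread(one_yr, nrows = 0))   # column names from the first line only
oneBuoy_oneYR.df <- fread(one_yr, skip = 2, col.names = hdr)   # skip the name and unit lines
str(oneBuoy_oneYR.df)   # numeric columns now parse as numeric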

Count highest value in data frame in R

I have the data frame (DF) below. I need to count how many times/years each station has recorded the maximum avg. temp, the minimum avg. temp, and the maximum total precipitation.
In each row of DF, the year is followed by max avg. temp, min avg. temp, and total avg. precipitation. For example, if in year 1985 the highest max avg. temperature was recorded at station1, it should count as one, and so on.
Any suggestion or help is appreciated.
Thanks.
DF:
St_name Met_data
station1 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3
station2 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7
station3 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4
.
.
Output:
St_name max_T_count min_T_count precip_count
station1 1 0 0
station2 0 2 0
station3 1 1 1
.
.
You should at least make an effort to organize your data in a spreadsheet before posting; the first four lines in the code below are just for tidying it. I am also not sure what you want for precip_count, but you can at least work that out based on this solution.
library(tidyverse)
df %>% separate_rows(Met_data, sep = ",") %>%
mutate(Met_data = trimws(Met_data)) %>%
separate(Met_data, sep = " ", into = c("year", "max_avg", "min_avg", "total_avg")) %>%
group_by(year) %>%
mutate(max_T_count = as.integer(max_avg == max(max_avg)),
min_T_count = as.integer(min_avg == min(min_avg)),
precip_count = as.integer(total_avg == max(total_avg))) %>%
ungroup() %>%
group_by(St_name) %>%
summarise_at(vars(ends_with("count")), sum)
%>% is the magrittr package pipe operator.
separate_rows separates the entries of the Met_data column at commas into new rows.
trimws trims the extra whitespace around the characters. This is necessary so that separate can split exactly at the blanks.
separate splits Met_data at blanks and assigns column names to the separated variables; convert = TRUE turns them into numbers so the comparisons are numeric rather than lexicographic.
group_by specifies the grouping by which the aggregation is done.
mutate creates new columns.
summarise_at applies summary functions to the specified columns.
That is a handful; I advise you to read the documentation for each of these by typing ?function, replacing function with each of the names above, or by using help, e.g. help("%>%", package = "magrittr").
Here is the output.
# A tibble: 3 x 4
# St_name max_T_count min_T_count precip_count
# <fct> <int> <int> <int>
# 1 station1 1 17 11
# 2 station2 29 0 5
# 3 station3 0 13 14
Here is the data.
df <- structure(list(St_name = structure(1:3, .Label = c("station1",
"station2", "station3"), class = "factor"), Met_data = structure(c(2L,
3L, 1L), .Label = c(" 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4",
" 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3",
" 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7"
), class = "factor")), .Names = c("St_name", "Met_data"), class = "data.frame", row.names = c(NA,
-3L))

Reading an ASCII file in R

I am trying to read an ASCII file into R using read.table.
The file looks like the following:
DAILY MAXIMUM TEMPARATURE
YEAR DAY MT DT LAT. 66.5 67.5 68.5 69.5 70.5
1969 001 01 01 6.5 99.90 99.90 31.90 99.90 99.90
1969 001 01 01 7.5 99.90 20.90 99.90 99.90 23.90
1969 001 01 01 8.5 99.90 99.90 30.90 99.90 18.90
.....
.....
YEAR DAY MT DT LAT. 66.5 67.5 68.5 69.5 70.5
1969 001 01 02 6.5 21.90 99.90 99.90 99.90 99.90
1969 001 01 02 7.5 99.90 33.90 99.90 99.90 99.90
1969 001 01 02 8.5 99.90 99.90 15.90 99.90 99.90
.....
.....
YEAR DAY MT DT LAT. 66.5 67.5 68.5 69.5 70.5
1969 001 01 03 6.5 99.90 99.90 99.90 99.90 99.90
1969 001 01 03 7.5 99.90 99.90 99.90 99.90 99.90
1969 001 01 03 8.5 99.90 99.90 99.90 99.90 99.90
.....
.....
I read it using:
inp <- read.table("MAXT1969.TXT", skip = 1, header = TRUE)
The file has been read and the contents are in the variable inp.
I have 2 questions:
I. Extracting a value gives some extra information along with the desired output; for example, inp[1, 5] gives the following:
> inp[1, 5]
[1] 6.5
33 Levels: 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5 ... LAT.
I don't want the extra info, only the value. Where am I going wrong?
II. After every 32 rows I have a header (YEAR DAY ...). How can I ignore these repeated headers while reading?
Try comment.char = "Y", which will make read.table ignore all lines starting with Y.
stringsAsFactors = FALSE will avoid converting strings to factors.
inp <- read.table("MAXT1969.TXT", skip = 1, header = FALSE,
                  comment.char = "Y", stringsAsFactors = FALSE)
# Read just the first row to get the header names
cols <- read.table("MAXT1969.TXT", header = FALSE, skip = 1, nrows = 1,
                   stringsAsFactors = FALSE)
names(inp) <- unlist(cols)
inp
## YEAR DAY MT DT LAT. 66.5 67.5 68.5 69.5 70.5
## 1 1969 1 1 1 6.5 99.9 99.9 31.9 99.9 99.9
## 2 1969 1 1 1 7.5 99.9 20.9 99.9 99.9 23.9
## 3 1969 1 1 1 8.5 99.9 99.9 30.9 99.9 18.9
## 4 1969 1 1 2 6.5 21.9 99.9 99.9 99.9 99.9
## 5 1969 1 1 2 7.5 99.9 33.9 99.9 99.9 99.9
## 6 1969 1 1 2 8.5 99.9 99.9 15.9 99.9 99.9
## 7 1969 1 1 3 6.5 99.9 99.9 99.9 99.9 99.9
## 8 1969 1 1 3 7.5 99.9 99.9 99.9 99.9 99.9
## 9 1969 1 1 3 8.5 99.9 99.9 99.9 99.9 99.9
# Since stringsAsFactors = FALSE was used, the numbers were read correctly.
inp[1, 5]
## [1] 6.5
Question 1: This means that your value has been read as a factor, i.e. a categorical variable. Use as.numeric(as.character(x)) on the column to transform it from factor to numeric (as.numeric alone would return the factor codes, not the values). Alternatively, you can use the colClasses argument of read.table to specify the column types directly.
Question 2: You can read the lines using readLines, find the lines that start with YEAR using grep, delete those, and read the edited output into a data frame using read.table(textConnection(edited_data)). I would use @geektrader's solution instead, but I just wanted to add this for completeness' sake.
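For reference, a minimal sketch of that readLines/grep approach, assuming every repeated header starts with YEAR as in the sample:
lines <- readLines("MAXT1969.TXT")[-1]    # drop the title line
hdr <- lines[grep("^YEAR", lines)[1]]     # keep one header line for the names
body <- lines[!grepl("^YEAR", lines)]     # delete every repeated header
inp <- read.table(textConnection(body), header = FALSE)
names(inp) <- scan(text = hdr, what = character())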
Another solution would be to introduce NAs and then omit them:
# Coercing everything to numeric turns the repeated header rows into NAs, which na.omit then drops
inp <- as.data.frame(na.omit(apply(apply(inp, 2, as.character), 2, as.numeric)))
