I have a data frame with columns such as dates, volumes, and companies.
How can I keep only one line for each company?
grep returns the line numbers, but how can I get the full lines?
Also, how can I plot the volumes per company on a single time series plot?
I found plot.ts, but I can't use it because I don't have the volumes split by company: if I call plot.ts on my full data set, it does not distinguish between companies, so I get a wrong time series (many points for a single date).
I would like to have a plot like this:
[image: time series plot]
but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this one, but with the Svolume values:
[image: time series plot 2]
This is what my data looks like:
> head(data)
Date Time Subject Sscore Smean Svscore Sdispersion Svolume Sbuzz Last close
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000
2 2015-07-08 09:10:00 ACE -0.4415 0.3521 -0.0374 0.500 4 0.7200 104.460000000
3 2015-07-07 09:10:00 AES 1.9821 0.0233 1.1743 1.000 1 1.9445 13.200000000
4 2015-07-04 09:10:00 AFL -2.9335 0.0035 -0.2975 1.000 1 0.8321 61.960000000
5 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000
6 2015-07-07 09:10:00 ACE -0.2331 0.3519 -0.1118 1.000 3 0.7196 103.330000000
Company name Date
1 3M Company 2015-07-08
2 ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4 AFLAC Inc. 2015-07-04
5 3M Company 2015-07-07
6 ACE Limited 2015-07-07
Thank you very much!
I'm not yet totally clear about what you have and what you want. It's very difficult without a reproducible example!
I assume you want to summarize your data by company and date. You could achieve this by using the data.table package:
library(data.table)
setDT(data)
# Column names containing spaces must be quoted with backticks, not double quotes
newdata<-data[,.(volume=sum(Svolume)),by=.(`Company name`,Date)]
# Notice that you can use any other function instead of sum (mean, for example)
Once you have that newdata object, you can try to plot it with ggplot2:
library(ggplot2)
library(scales)
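# Date must be of class Date; date_labels replaces the format argument of older ggplot2 versions
pl<-ggplot(newdata,aes(x=Date,y=volume,group=`Company name`,colour=`Company name`))+geom_line()+scale_x_date(date_labels="%d-%m-%y")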
pl
This solution intends to be general (as your desired output is not well defined in your question), so it might require some minor adjustments.
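For readers without data.table, the same per-company, per-date summary can be sketched in base R. This is only an illustration under assumptions: the toy data frame below copies the six rows shown in the question (keeping just Date, Subject and Svolume), and the names wide and total are made up for the example.

```r
# Toy data mirroring the first six rows shown in the question
data <- data.frame(
  Date    = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                      "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# One row per (Date, Subject) pair, summing the volumes
newdata <- aggregate(Svolume ~ Date + Subject, data = data, FUN = sum)

# Wide layout: one row per date, one column per company (NA where no data)
wide <- tapply(data$Svolume, list(format(data$Date), data$Subject), sum)

# Optional total across all companies for each date
total <- rowSums(wide, na.rm = TRUE)
```

The long newdata object is the shape ggplot2 expects, while the wide matrix is closer to what base plotting functions such as matplot work with.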
I found how to do it, if it can help someone else:
library(ggplot2)
# Refer to columns by name inside aes(); data$ is not needed there
ggplot(data, aes(x=Date, y=Svolume, group=Subject, color=Subject))+geom_line()
Related
I have this data that I want to convert to dates, but I doubt it is possible with years below 0. The snippets are below:
library(datasets)
library(quantmod)
data(treering)
tree_df = data.frame(ds=index(treering), y=as.numeric(treering))
> head(tree_df)
ds y
1 -6000 1.345
2 -5999 1.077
3 -5998 1.545
4 -5997 1.319
5 -5996 1.413
6 -5995 1.069
> tail(tree_df)
ds y
7975 1974 1.031
7976 1975 1.027
7977 1976 1.173
7978 1977 1.471
7979 1978 1.444
7980 1979 1.160
?treering
Yearly Treering Data, -6000–1979
Description
Contains normalized tree-ring widths in dimensionless units.
Usage
treering
Format
A univariate time series with 7981 observations. The object is of class "ts".
Each tree ring corresponds to one year.
Is there a way to convert the data into dates with a negative year, for example "-6000-01-01"?
Apparently converting a negative integer to a Date does the trick. Counted in days from the origin 1970-01-01, -2910983 falls on the first day of the year -6000, so a sequence in steps of 1 year starting from that Date gives the result:
sequences = seq(as.Date(-2910983,origin="1970-01-01"),as.Date(paste0(max(index(treering)),"-01-01")),by="1 years")
tail(sequences)
[1] "1974-01-01" "1975-01-01" "1976-01-01" "1977-01-01" "1978-01-01" "1979-01-01"
head(sequences)
[1] "-6000-01-01" "-5999-01-01" "-5998-01-01" "-5997-01-01" "-5996-01-01" "-5995-01-01"
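A variant of the same idea, as a sketch: instead of pasting the end year into a string, the sequence length can be taken from the series itself. It reuses the day offset -2910983 from above; the name dates is made up for the example, and it assumes the platform's Date handling supports negative years as in the output shown above.

```r
# treering ships with the datasets package, which is attached by default
start <- as.Date(-2910983, origin = "1970-01-01")   # first day of year -6000

# One date per observation, stepping one calendar year at a time
dates <- seq(start, by = "year", length.out = length(treering))

tree_df <- data.frame(ds = dates, y = as.numeric(treering))
```

Taking length.out from treering keeps the dates and the values the same length even if the series is subset first.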
I have a dataframe as follows:
Date Price1 Price2 Price3 Price4 .... Price 24
2017-10-15 60.43 49.40 48.72 48.32
2017-10-16 38.09 30.00 24.47 24.88
2017-10-17 48.80 46.76 46.73 45.82
The goal is to turn the data frame into a time series and then predict the date 2017-10-18, with all 24 corresponding prices.
I do get a ts object, but when I fit the model the following error appears: Error in ets(stock_prize) : y should be a univariate time series
Any advice?
I think your data structure is not correct. I suggest you should make those dates a factor and make only one column for the values. For example you have something like this:
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
mydates2 <-as.Date(c("2008-06-22", "2005-02-13"))
mydates3 <-as.Date(c("2009-06-22", "2006-02-13"))
hours <- c(8,9)
values <- c(1,2)
a=data.frame(mydates,mydates2,mydates3,hours,values)
a
This is how your data looks:
mydates mydates2 mydates3 hours values
1 2007-06-22 2008-06-22 2009-06-22 8 1
2 2004-02-13 2005-02-13 2006-02-13 9 2
But you should transform them to look something like this:
dates=c(mydates,mydates2,mydates3)
hours_factor=rep(hours,3)
ordered_values=rep(values,3)
b=data.frame(dates,hours_factor,ordered_values)
b
This is how your data should look:
dates hours_factor ordered_values
1 2007-06-22 8 1
2 2004-02-13 9 2
3 2008-06-22 8 1
4 2005-02-13 9 2
5 2009-06-22 8 1
6 2006-02-13 9 2
After that you can convert the values to class ts with the ts function. If you want to predict the value for the next date, you can fit an autoregressive model. This is well documented on the Internet, but be aware that your data has to meet some requirements first.
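Applied to the price data from the question, the reshaping could look roughly like the sketch below. It makes some assumptions: only the four price columns visible in the question are used, the columns are taken to be consecutive readings within each day, and the names prices, price_cols, long and price_ts are made up for the example.

```r
# Toy data copied from the question (only the four visible price columns)
prices <- data.frame(
  Date   = as.Date(c("2017-10-15", "2017-10-16", "2017-10-17")),
  Price1 = c(60.43, 38.09, 48.80),
  Price2 = c(49.40, 30.00, 46.76),
  Price3 = c(48.72, 24.47, 46.73),
  Price4 = c(48.32, 24.88, 45.82)
)
price_cols <- setdiff(names(prices), "Date")

# Long layout: one row per (Date, slot) pair, in chronological order
long <- data.frame(
  Date  = rep(prices$Date, each = length(price_cols)),
  Slot  = rep(seq_along(price_cols), times = nrow(prices)),
  Price = as.vector(t(as.matrix(prices[price_cols])))  # row by row
)

# A single univariate series, with one seasonal cycle per day
price_ts <- ts(long$Price, frequency = length(price_cols))
```

A series built this way is univariate, which is the shape the error message quoted in the question asks for; with all 24 columns the frequency would be 24.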
Sorry if this question seems basic, but the examples I see online are too complex for a beginner like me, so please help me through this. Basically, I want to predict when a Measurement will be high or low based on the data. Below is part of the dataset I have. The full dataset covers 10 days at 30-minute intervals.
Datetime Measurement
1 4/4/2013 11:59 0.532
2 4/4/2013 12:29 0.647
3 4/4/2013 12:59 0.564
4 4/4/2013 13:29 0.425
5 4/4/2013 13:59 0.253
I tried the following to convert the Datetime column to a Datetime type
data$Datetime <- strptime(x = as.character(data$Datetime),
format = "%m/%d/%Y %H:%M")
which resulted in
Datetime Measurement
1 2013-04-04 11:59:00 0.532
2 2013-04-04 12:29:00 0.647
3 2013-04-04 12:59:00 0.564
4 2013-04-04 13:29:00 0.425
5 2013-04-04 13:59:00 0.253
I've stumbled upon this, and it may be necessary to convert the data frame to a time series. I tried ts and xts, but I don't think I'm getting anywhere. I also tried
library(timeSeries)
as.timeSeries(data)
but class(data) still reports data.frame afterwards.
How can I convert it to a time series in this situation? Can I use the link as my basis, and would an ARIMA model make sense? Or are there simpler, reasonable time series analysis methods that I could use instead?
Please and thank you very much.
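One possible starting point for the conversion asked about above, as a sketch rather than a full answer. It assumes the five rows shown are representative, that the timestamps can be read in UTC (the question does not state a time zone), and that the 30-minute spacing is regular; the name measurement_ts is made up for the example.

```r
# Toy data copied from the question
data <- data.frame(
  Datetime = c("4/4/2013 11:59", "4/4/2013 12:29", "4/4/2013 12:59",
               "4/4/2013 13:29", "4/4/2013 13:59"),
  Measurement = c(0.532, 0.647, 0.564, 0.425, 0.253)
)

# as.POSIXct stores each timestamp as one number, which fits a data frame
# column better than the list-based POSIXlt object returned by strptime
data$Datetime <- as.POSIXct(data$Datetime, format = "%m/%d/%Y %H:%M",
                            tz = "UTC")  # time zone is an assumption

# Regular 30-minute spacing means 48 observations per day
measurement_ts <- ts(data$Measurement, frequency = 48)
```

Note that assigning the result of as.timeSeries(data) to a new name, rather than calling it without assignment, is what would change the class reported afterwards; class(data) itself stays data.frame either way.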
I would like to do a sentiment analysis in R.
I have a dataset with a column "Svolume" containing values such as 1, 5, 41, ...
Another column contains the company names.
A last column contains the dates in yyyy-mm-dd format.
I would like to plot the time series by date for each company. So if I have 3 companies, I would like to have 3 lines, with the dates on the x axis and the Svolume values on the y axis.
I wrote this:
data2=data[order(as.Date(data$Date, format="%d/%m/%Y")),]
to sort the data by date.
plot.ts(data2$Svolume)
axis(1,1:40,data2$Date)
And then this to plot a time series of the Svolume values and put the dates on the x axis. I have two problems. First, I get only one line, so I can't see the difference between the 3 companies (I would also love to have a line for the total, if possible). Second, the dates are impossible to read because they are written horizontally and overlap each other. Is there a way to write the dates vertically, for example, so that they can be read?
I would like to have a plot like this: [image: time series plot 1] but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this one, but with the Svolume values: [image: time series plot 2]
This is what my data looks like:
> head(data)
Date Time Subject Sscore Smean Svscore Sdispersion Svolume Sbuzz Last close
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000
2 2015-07-08 09:10:00 ACE -0.4415 0.3521 -0.0374 0.500 4 0.7200 104.460000000
3 2015-07-07 09:10:00 AES 1.9821 0.0233 1.1743 1.000 1 1.9445 13.200000000
4 2015-07-04 09:10:00 AFL -2.9335 0.0035 -0.2975 1.000 1 0.8321 61.960000000
5 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000
6 2015-07-07 09:10:00 ACE -0.2331 0.3519 -0.1118 1.000 3 0.7196 103.330000000
Company name Date
1 3M Company 2015-07-08
2 ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4 AFLAC Inc. 2015-07-04
5 3M Company 2015-07-07
6 ACE Limited 2015-07-07
Thank you for your help and sorry for my english.
I found how to do it, if it can help someone else:
library(ggplot2)
# Refer to columns by name inside aes(); data$ is not needed there
ggplot(data, aes(x=Date, y=Svolume,
                 group=Subject, color=Subject))+geom_line()
I was looking around the web but could not find the answer that I'm looking for.
Here is my input data:
Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2
I want to plot a histogram that shows the sum of calls for each of the days in the "Date" column.
Yes, it can be done by identifying the levels of the Date column and adding up the corresponding Calls, but I am wondering
if there is a more elegant way to do it. data.class() reports the "Date" column as "Date".
According to this example, the final hist should have 4 bins of (16, 27, 24, 2).
Cheers,
Well, technically a histogram is only meant to estimate the density function of continuous data, and the way your data is coded, Date is more like a categorical variable. So you probably want a bar chart of the summed calls rather than a true histogram. You can do that with ggplot2 using
# dd is the data frame; ggplot2 versions before 3.3.0 use fun.y instead of fun
ggplot(dd, aes(Date, Calls)) + stat_summary(fun = sum, geom = "bar")
Read data:
d <- read.table(text=
"Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2",
header=TRUE)
d$Date <- as.Date(d$Date)
library(plyr)
s <- ddply(d,"Date",summarize,Calls=sum(Calls))
library(ggplot2)
If we use Date as the x variable we get month labels:
ggplot(s,aes(x=Date,y=Calls))+geom_bar(stat="identity")
You might prefer the particular date labels:
ggplot(s,aes(x=factor(Date),y=Calls))+geom_bar(stat="identity")
Or non-default labels:
ggplot(s,aes(x=format(Date,"%d-%b"),y=Calls))+geom_bar(stat="identity")+
labs(x="Date")
It should also be possible to do this by constructing your own hist object and passing it to plot.histogram, but I think this way is easier ...
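For comparison, a base R sketch that needs neither plyr nor ggplot2. It is not the plot.histogram route mentioned above but a plain bar chart of per-date sums; the data frame repeats the question's data and the name sums is made up for the example.

```r
# Same data as in the question
d <- data.frame(
  Date  = as.Date(rep(c("2012-01-01", "2012-03-02", "2012-04-02", "2012-05-02"),
                      times = c(3, 3, 5, 1))),
  Calls = c(3, 3, 10, 15, 7, 5, 0, 5, 18, 1, 0, 2)
)

# Sum of calls per date; format() turns the dates into bar labels
sums <- tapply(d$Calls, format(d$Date), sum)

# One bar per date, with heights 16, 27, 24 and 2
barplot(sums, xlab = "Date", ylab = "Calls")
```

Like the factor(Date) version above, this treats each date as a category, so the bars are evenly spaced regardless of the gaps between dates.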