I would like to do a sentiment analysis in R.
I have a dataset with a column "Svolume" containing values such as 1, 5, 41, ...
Another column holds the company names.
A last column holds the dates in yyyy-mm-dd format.
I would like to plot the time series by date for each company. So if I have 3 companies, I would like to have 3 lines, with the dates on the x axis and the Svolume values on the y axis.
I wrote this:
data2 <- data[order(as.Date(data$Date, format = "%Y-%m-%d")), ]  # the dates are stored as yyyy-mm-dd
This orders the data by date.
plot.ts(data2$Svolume)
axis(1,1:40,data2$Date)
I then used this to plot a time series of the Svolume values and to put the dates on the x axis. I have two problems. First, I get only one line, so I can't see the difference between the 3 companies (I would love to have a line for the total too, if possible). Second, the dates are impossible to read because they are printed horizontally and overlap each other. Is there a way to write the dates vertically, for example, so that they can be read?
I would like to have a plot like this: time series plot 1, but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this, but with the Svolume values: time series plot 2
This is what my data looks like:
> head(data)
        Date     Time Subject  Sscore  Smean Svscore Sdispersion Svolume  Sbuzz    Last close
1 2015-07-08 09:10:00     MMM -0.2280 0.2593 -0.2795       0.375       8 0.6026 155.430000000
2 2015-07-08 09:10:00     ACE -0.4415 0.3521 -0.0374       0.500       4 0.7200 104.460000000
3 2015-07-07 09:10:00     AES  1.9821 0.0233  1.1743       1.000       1 1.9445  13.200000000
4 2015-07-04 09:10:00     AFL -2.9335 0.0035 -0.2975       1.000       1 0.8321  61.960000000
5 2015-07-07 09:10:00     MMM  0.2977 0.2713 -0.7436       0.400       5 0.4895 155.080000000
6 2015-07-07 09:10:00     ACE -0.2331 0.3519 -0.1118       1.000       3 0.7196 103.330000000
         Company name       Date
1          3M Company 2015-07-08
2         ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4          AFLAC Inc. 2015-07-04
5          3M Company 2015-07-07
6         ACE Limited 2015-07-07
Thank you for your help, and sorry for my English.
I found out how to do it, in case it helps someone else:
library(ggplot2)
# Refer to columns by name inside aes(); ggplot looks them up in `data`
ggplot(data) +
  geom_line(aes(x = Date, y = Svolume, group = Subject, color = Subject))
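The ggplot2 call above gives one line per company, but it does not cover the total line or the unreadable date labels asked about in the question. Here is a minimal base-graphics sketch of both. It assumes a small sample copied from the `head(data)` output, and it assumes that a company with no row on a given date counts as zero volume. The `las = 2` argument is what turns the axis labels perpendicular to the axis.

```r
# Small sample shaped like the question's data (values copied from head(data)).
data <- data.frame(
  Date    = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                      "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# One row per date, one column per company; missing combinations become 0
# (assumption: no row for a company on a date means zero volume).
wide <- xtabs(Svolume ~ Date + Subject, data = data)
wide <- cbind(wide, Total = rowSums(wide))

x    <- as.numeric(as.Date(rownames(wide)))  # numeric x positions for the dates
cols <- seq_len(ncol(wide))

old_par <- par(mar = c(7, 4, 2, 1))  # extra bottom margin for rotated labels
matplot(x, wide, type = "l", lty = 1, col = cols,
        xaxt = "n", xlab = "", ylab = "Svolume")
axis(1, at = x, labels = rownames(wide), las = 2)  # las = 2: vertical labels
legend("topleft", legend = colnames(wide), col = cols, lty = 1)
par(old_par)
```

If you stay with ggplot2, adding `theme(axis.text.x = element_text(angle = 90, hjust = 1))` to the plot rotates the date labels in a similar way.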
Related
I am dealing with a dataset with dates and various response values at different time intervals, as shown below:
Id Date Response
1 2008-03-12 4.88
1 2009-06-06 5.39
2 2015-10-22 8.61
2 2019-09-26 6.20
3 2006-09-28 7.40
3 2009-07-15 7.25
3 2011-01-19 9.50
The dates are the x values and Response gives the y values.
I am interested in estimating the AUC (area under the curve) for each Id. Any suggestions for accomplishing this are much appreciated.
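One possible approach, sketched here under assumptions, is the trapezoidal rule applied separately to each Id, with the dates converted to a number of days. The helper name `trapz_auc` is made up for this illustration, and the resulting area is in units of "Response x days"; whether that unit is the one you need depends on your application.

```r
df <- read.table(text = "
Id Date       Response
1  2008-03-12 4.88
1  2009-06-06 5.39
2  2015-10-22 8.61
2  2019-09-26 6.20
3  2006-09-28 7.40
3  2009-07-15 7.25
3  2011-01-19 9.50
", header = TRUE)
df$Date <- as.Date(df$Date)

# Hypothetical helper: trapezoidal rule on (x, y) after sorting by x.
trapz_auc <- function(x, y) {
  o <- order(x)
  x <- as.numeric(x[o])  # Dates become days since 1970-01-01
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# One AUC value per Id, returned as a named numeric vector.
auc <- sapply(split(df, df$Id), function(g) trapz_auc(g$Date, g$Response))
auc
```

An Id with a single observation gets an area of 0 with this helper, because there is no interval to integrate over.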
I started using R only recently, so I would like to ask whether I am running the t-test correctly for my purpose.
I have two data sets like this.
> head(da1)
LiefertagDeliveryDate Price Hour
1 2015-12-31 28.82 1
25 2015-12-30 42.97 1
49 2015-12-29 43.38 1
73 2015-12-28 48.54 1
97 2015-12-27 46.36 1
121 2015-12-26 42.68 1
And,
> head(sp1)
# A tibble: 6 x 3
Date Hour Price
<dttm> <chr> <dbl>
1 2015-12-31 1 16.06
2 2015-12-30 1 28.51
3 2015-12-29 1 20.59
4 2015-12-28 1 27.94
5 2015-12-27 1 13.42
6 2015-12-26 1 -36.07
So basically there is one row per day, from 2015-12-31 back to 2011-01-01, for hour 1. I would like to run a t-test on these data to compare the average price for hour 1 in the two data sets and see whether the difference is significant.
For this purpose, I ran:
t.test(da1$Price, sp1$Price, data=rp1, var.equal=TRUE, conf.level = 0.95,
alternative = 'two.sided', paired=F)
(Equal variances, a 95% confidence level, two-sided; not paired because these are two different data sets.)
Did I do it right for my purpose? I am very new to R, so I am not sure whether I did it correctly. Since I am also a programming beginner, terms such as strings and logicals sound very foreign to me, which makes the help pages hard to understand.
Thank you in advance for your kind advice.
Have a great day, people!
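For comparison, here is a hedged sketch of two variants of the same test on made-up data shaped like `da1` and `sp1` (the numbers are simulated, not the real prices). Two points it illustrates: the `data` argument belongs to the formula interface of `t.test`, so it has no effect when two vectors are passed, and `var.equal = TRUE` assumes equal variances, whereas the default is the Welch test, which does not. Because both series cover the same days, matching them by date and running a paired test is another option; which one fits depends on your question, so treat this as an illustration rather than a recommendation.

```r
set.seed(42)
dates <- seq(as.Date("2015-12-01"), as.Date("2015-12-31"), by = "day")

# Simulated stand-ins for the two data sets in the question.
da1 <- data.frame(LiefertagDeliveryDate = dates,
                  Price = rnorm(length(dates), mean = 40, sd = 8),
                  Hour = 1)
sp1 <- data.frame(Date = dates,
                  Hour = "1",
                  Price = rnorm(length(dates), mean = 20, sd = 15))

# Match the two series by day so each row holds both prices for one date.
both <- merge(da1, sp1,
              by.x = "LiefertagDeliveryDate", by.y = "Date",
              suffixes = c(".da", ".sp"))

# Welch two-sample test (default: var.equal = FALSE, paired = FALSE).
welch <- t.test(both$Price.da, both$Price.sp)

# Paired test on the day-by-day differences.
paired <- t.test(both$Price.da, both$Price.sp, paired = TRUE)

welch$p.value
paired$p.value
```

In the real data, `Date` in `sp1` is a date-time column while the date in `da1` is not, so the two would need to be converted to the same class (for example with `as.Date`) before merging.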
I have a dataframe as follows:
Date Price1 Price2 Price3 Price4 .... Price 24
2017-10-15 60.43 49.40 48.72 48.32
2017-10-16 38.09 30.00 24.47 24.88
2017-10-17 48.80 46.76 46.73 45.82
The goal is to turn the data frame into a time series and to predict the date 2017-10-18, with all 24 corresponding price values.
I do get the ts object, but the following error appears when I fit the model: Error in ets(stock_prize) : y should be a univariate time series
Any advice?
I think your data structure is not correct. I suggest making those dates a factor and using only one column for the values. For example, you have something like this:
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
mydates2 <-as.Date(c("2008-06-22", "2005-02-13"))
mydates3 <-as.Date(c("2009-06-22", "2006-02-13"))
hours <- c(8,9)
values <- c(1,2)
a=data.frame(mydates,mydates2,mydates3,hours,values)
a
This is how your data looks:
mydates mydates2 mydates3 hours values
1 2007-06-22 2008-06-22 2009-06-22 8 1
2 2004-02-13 2005-02-13 2006-02-13 9 2
But you should transform them to look something like this:
dates=c(mydates,mydates2,mydates3)
hours_factor=rep(hours,3)
ordered_values=rep(values,3)
b=data.frame(dates,hours_factor,ordered_values)
b
This is how your data should look:
dates hours_factor ordered_values
1 2007-06-22 8 1
2 2004-02-13 9 2
3 2008-06-22 8 1
4 2005-02-13 9 2
5 2009-06-22 8 1
6 2006-02-13 9 2
After that you can convert the variables to class ts, using the ts function. If you want to predict the value for the next date, you can fit an autoregression. It is very well documented on the Internet, but note that your data must meet some requirements first.
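As a rough sketch of that idea applied to the 24-price layout from the question: the error arises because the model function expects a single series, so one option is to flatten the 24 columns row by row into one hourly series with `frequency = 24` and forecast the next 24 values. The data below are simulated, and the fixed-order autoregression from base R (`stats::ar`) is a stand-in chosen to keep the example self-contained; it is not the `ets` model from the question and is not tuned in any way.

```r
set.seed(1)
n_days <- 10
hours  <- 1:24

# Simulated prices with a daily pattern plus noise, one row per day (wide layout).
prices <- t(replicate(n_days, 40 + 10 * sin(2 * pi * hours / 24) + rnorm(24)))
colnames(prices) <- paste0("Price", hours)
wide <- data.frame(Date = as.Date("2017-10-08") + 0:(n_days - 1), prices)

# Flatten row by row: day 1 hours 1..24, then day 2 hours 1..24, and so on.
hourly <- ts(as.vector(t(as.matrix(wide[, -1]))), frequency = 24)

# Fixed-order AR(24), fitted with the default Yule-Walker method.
fit <- ar(hourly, order.max = 24, aic = FALSE)

# Forecast the 24 hourly values of the day after the last row (2017-10-18 here).
fc <- predict(fit, newdata = hourly, n.ahead = 24)
next_day <- setNames(as.numeric(fc$pred), paste0("Price", hours))
next_day
```

The single flattened series can also be passed to other univariate forecasting functions, since it no longer has 24 columns.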
I have a data frame with columns such as dates, volumes, and companies.
I would like to know how to keep only one line for each company.
grep returns the line numbers, but how can I get the full line?
Also, how can I plot these volumes per company on a single time series plot?
I found plot.ts, but I can't use it as long as I don't have the volumes per company: if I call plot.ts on my full data set, it does not distinguish between companies, so the time series is wrong (many points for a single date).
I would like to have a plot like this:
time series plot
but with "volumes" instead of "websites" and the names of my companies/subjects instead of "shoes, socks, lace".
Or like this, but with the Svolume values:
time series plot 2
This is what my data looks like:
> head(data)
        Date     Time Subject  Sscore  Smean Svscore Sdispersion Svolume  Sbuzz    Last close
1 2015-07-08 09:10:00     MMM -0.2280 0.2593 -0.2795       0.375       8 0.6026 155.430000000
2 2015-07-08 09:10:00     ACE -0.4415 0.3521 -0.0374       0.500       4 0.7200 104.460000000
3 2015-07-07 09:10:00     AES  1.9821 0.0233  1.1743       1.000       1 1.9445  13.200000000
4 2015-07-04 09:10:00     AFL -2.9335 0.0035 -0.2975       1.000       1 0.8321  61.960000000
5 2015-07-07 09:10:00     MMM  0.2977 0.2713 -0.7436       0.400       5 0.4895 155.080000000
6 2015-07-07 09:10:00     ACE -0.2331 0.3519 -0.1118       1.000       3 0.7196 103.330000000
         Company name       Date
1          3M Company 2015-07-08
2         ACE Limited 2015-07-08
3 The AES Corporation 2015-07-07
4          AFLAC Inc. 2015-07-04
5          3M Company 2015-07-07
6         ACE Limited 2015-07-07
Thank you very much!
I'm not yet totally clear about what you have and what you want. It's very difficult without a reproducible example!
I assume you want to summarize your data by company and date. You could achieve this by using the data.table package:
library(data.table)
setDT(data)
# Backticks (not quotes) are needed for a column name that contains a space
newdata <- data[, .(volume = sum(Svolume)), by = .(`Company name`, Date)]
# Any other summary function, such as mean, can be used instead of sum
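Once you have that newdata object, you can try to plot it with ggplot2:
library(ggplot2)
# scale_x_date() needs a Date column; date_labels sets the tick label format
newdata[, Date := as.Date(Date)]
pl <- ggplot(newdata, aes(x = Date, y = volume, group = `Company name`)) +
  geom_line() +
  scale_x_date(date_labels = "%d-%m-%y")
pl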
This solution intends to be general (as your desired output is not well defined in your question), so it might require some minor adjustments.
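On the first part of the question (getting full lines rather than line numbers), here is a small base R sketch on a made-up sample using the column names from `head(data)`. The indices that `grep` returns, or the TRUE/FALSE vector from `grepl`, can be used inside `[ , ]` to pull out the complete rows. Since "one line for each company" can be read in more than one way, the last line shows one possible reading, namely keeping the first row seen for each company.

```r
# Small sample shaped like the question's data (values copied from head(data)).
data <- data.frame(
  Date    = as.Date(c("2015-07-08", "2015-07-08", "2015-07-07",
                      "2015-07-04", "2015-07-07", "2015-07-07")),
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE"),
  Svolume = c(8, 4, 1, 1, 5, 3)
)

# All rows for one company: a logical condition inside [ , ] returns full rows.
mmm_rows <- data[data$Subject == "MMM", ]

# Pattern matching: grep() gives row indices, grepl() gives TRUE/FALSE per row;
# either one selects the full rows when used as the row index.
a_rows_grep  <- data[grep("^A", data$Subject), ]
a_rows_grepl <- data[grepl("^A", data$Subject), ]

# One row per company: keep the first occurrence of each Subject.
first_per_company <- data[!duplicated(data$Subject), ]
```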
I found out how to do it, in case it helps someone else:
library(ggplot2)
# Refer to columns by name inside aes(); ggplot looks them up in `data`
ggplot(data) +
  geom_line(aes(x = Date, y = Svolume, group = Subject, color = Subject))
I looked around the web but could not find the answer I'm looking for.
Here is my input data:
Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2
I want to plot a histogram showing the sum of calls for each day in the "Date" column.
Yes, it can be done by identifying the levels of the Date column and adding up the corresponding Calls, but I am wondering
if there is a more elegant way to do it. The "Date" column has data.class() "Date".
According to this example, the final histogram should have 4 bins of (16, 27, 24, 2).
Cheers,
Well, technically a histogram is really only meant to estimate the density function of continuous data, and the way your data are coded, Date is more like a categorical variable. So you probably want a bar chart of per-date totals rather than a true histogram. You can do that with ggplot2:
# stat_summary(fun = ...) needs ggplot2 >= 3.3.0; older versions used fun.y
ggplot(dd, aes(Date, Calls)) + stat_summary(fun = sum, geom = "bar")
Read data:
d <- read.table(text=
"Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2",
header=TRUE)
d$Date <- as.Date(d$Date)
library(plyr)
s <- ddply(d,"Date",summarize,Calls=sum(Calls))
library(ggplot2)
If we use Date as the x variable we get month labels:
ggplot(s,aes(x=Date,y=Calls))+geom_bar(stat="identity")
You might prefer the particular date labels:
ggplot(s,aes(x=factor(Date),y=Calls))+geom_bar(stat="identity")
Or non-default labels:
ggplot(s,aes(x=format(Date,"%d-%b"),y=Calls))+geom_bar(stat="identity")+
labs(x="Date")
It should also be possible to do this by constructing your own hist object and passing it to plot.histogram, but I think this way is easier ...
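For completeness, here is a sketch of the same per-date totals without any add-on packages, using `tapply` and `barplot` from base R. It reuses the sample data from the question; the object name `sums` is just an illustration.

```r
d <- read.table(text =
"Date Calls
2012-01-01 3
2012-01-01 3
2012-01-01 10
2012-03-02 15
2012-03-02 7
2012-03-02 5
2012-04-02 0
2012-04-02 5
2012-04-02 18
2012-04-02 1
2012-04-02 0
2012-05-02 2",
header = TRUE)
d$Date <- as.Date(d$Date)

# Sum of Calls for each distinct date; the names are the dates as text.
sums <- tapply(d$Calls, d$Date, sum)
sums

# One bar per date; las = 2 prints the date labels vertically.
barplot(sums, las = 2, ylab = "Calls")
```

`aggregate(Calls ~ Date, data = d, FUN = sum)` gives the same totals as a data frame, if that shape is more convenient.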