I am new to R and I keep getting inconsistent results when trying to display a column of data from a csv. I am able to import the csv into R without issue, but I can't access the individual columns.
Here's my code:
setwd('mypath')
cdata <- read.csv(file="cendata.csv",header=TRUE, sep=",")
cdata
This prints out the following:
year pop
1 2010 2,775,332
2 2011 2,814,384
3 2012 2,853,375
4 2013 2,897,640
5 2014 2,936,879
6 2015 2,981,835
7 2016 3,041,868
8 2017 3,101,042
9 2018 3,153,550
10 2019 3,205,958
When I try to plot the following, the columns cannot be found.
plot(pop,year)
Error: object 'pop' not found
I even checked if the column names existed, and only data shows up.
ls()
[1] "data"
I can manually enter the data and label them "pop" and "year" but that kind of defeats the point of importing the csv.
Is there a way to label each header as an object?
year and pop are not independent objects. You need to refer to them as columns of the data frame you have imported. You may also need to remove the "," from the numbers and convert them to numeric before plotting. Try:
cdata$pop <- as.numeric(gsub(',', '', cdata$pop))
plot(cdata$year, cdata$pop)
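A minimal sketch of the same idea, end to end. It assumes the file stores the population values as quoted strings with thousands separators; the inline csv_text below is a hypothetical stand-in for the first rows of cendata.csv, so the snippet runs without the file.

```r
# Hypothetical inline copy of the first rows of cendata.csv. The quotes keep
# the thousands separators from being read as field separators.
csv_text <- 'year,pop
2010,"2,775,332"
2011,"2,814,384"
2012,"2,853,375"'

cdata <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# The columns live inside the data frame, so they are listed by names(),
# not by ls().
names(cdata)

# Strip the commas and convert the column to numeric.
cdata$pop <- as.numeric(gsub(",", "", cdata$pop))

# with() evaluates the expression inside the data frame, so the bare
# column names work here.
with(cdata, plot(year, pop, type = "b"))
```

with(cdata, ...) is one way to avoid repeating cdata$ for every column; it does not create separate year and pop objects in the workspace.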
I have an xts object called 'usagexts' with dates from 01 Oct 15 to 31 Mar 18. I want to create 3 subsets of this object for the periods 01 Oct 15 to 31 Mar 16, 01 Oct 16 to 31 Mar 17 and 01 Oct 17 to 31 Mar 18 without actually hardcoding the dates, as these will change as time goes on.
The object structure is like so :
dateperiod,usageval
2015-10-01,21542
2015-10-02,21572
2015-10-03,21342
...
...
2018-03-31,20942
I have another data frame called 'periodvalues' like so :-
startdate,enddate, periodtext
2015-10-01,2016-03-31,1510_1603
2016-10-01,2017-03-31,1610_1703
2017-10-01,2018-03-31,1710_1803
I want to be able to create 3 xts objects like so :-
usagexts_1510_1603 -> xts object containing usage details for relevant period
usagexts_1610_1703 -> xts object containing usage details for relevant period
usagexts_1710_1803 -> xts object containing usage details for relevant period
I only got as far as creating a list of size 3 containing the periodtext from the above data frame. I was trying to somehow specify the start and end period for the xts object using the "objectname fromdate/todate" structure through variables but it didn't work - something like so :
usagexts_1610_1703 <- usagexts[var1/var2]
The LHS came from the list and the variables on the RHS came from variable definitions done prior.
Expected results should be like so :
usagexts_1510_1603 <- usagexts["2015-10-01/2016-03-31"]
usagexts_1610_1703 <- usagexts["2016-10-01/2017-03-31"]
usagexts_1710_1803 <- usagexts["2017-10-01/2018-03-31"]
Any assistance on that shall be highly valued.
Best regards
Deepak
If var1 and var2 are variables, then the filter string can be specified using paste as:
usagexts[paste(var1, var2, sep="/")]
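A rough sketch of how this could be applied to every row of 'periodvalues' at once. The usage values are made-up stand-ins (one per day), and the names ranges and subsets are only illustrative; the period table is copied from the question.

```r
library(xts)

# Stand-in for 'usagexts': one made-up value per day over the full date range.
dates <- seq(as.Date("2015-10-01"), as.Date("2018-03-31"), by = "day")
usagexts <- xts(seq_along(dates), order.by = dates)

# The period table from the question.
periodvalues <- data.frame(
  startdate  = c("2015-10-01", "2016-10-01", "2017-10-01"),
  enddate    = c("2016-03-31", "2017-03-31", "2018-03-31"),
  periodtext = c("1510_1603", "1610_1703", "1710_1803"),
  stringsAsFactors = FALSE
)

# Build one "from/to" string per row, e.g. "2015-10-01/2016-03-31".
ranges <- paste(periodvalues$startdate, periodvalues$enddate, sep = "/")

# Subset once per range and name each result after its period.
subsets <- lapply(ranges, function(r) usagexts[r])
names(subsets) <- paste0("usagexts_", periodvalues$periodtext)

# Optional: turn the list elements into standalone variables.
# list2env(subsets, envir = globalenv())
```

Keeping the subsets in a named list (subsets$usagexts_1510_1603) is usually easier to loop over than separate variables, but list2env() is there if the individual objects are really needed.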
I am trying to learn R, and use the corrplot library to draw Y:City and X: Population graph. I wrote the below code:
When you look at the picture above, there are 2 columns City and population. When I run the code I get this error message:
Error in cor(Illere_Gore_Nufus) : 'x' must be numeric.
My excel data:
In general, a correlation plot (scatter plot) can be drawn only when you have two continuous variables. Correlation is a value that tells you how strongly two continuous variables are linearly related. It always falls between -1 and 1: a value of -1 indicates a perfect negative linear relationship, a value of 1 indicates a perfect positive linear relationship, and values close to 0 indicate a weak linear relationship. A value of 0 means there is no linear relationship between the two variables; however, there could still be a curvilinear relationship between them.
For example
Area of the land Vs Price of the land
Here is the Data
The correlation value for this data is 0.896, which means that there is a strong linear correlation between Area of the land and Price of the land (Obviously!).
Scatter plot in R would look like this
Scatter plot
The R code would be
area<-c(650,785,880,990,1100,1250,1350,1800,2200,2800)
price<-c(250,275,280,290,350,340,400,335,420,460)
cor(area,price)
plot(area,price)
In Excel, for the same example, you can select the two columns, go to Insert > Scatter plot (under charts section)
Scatter plot
In your case, the information can be plotted in bar graph with city in y axis and population in x axis or vice versa!
Hope I have answered your query!
Some assumptions
You are asking how to do this in Excel, but your question is tagged R and Power BI (also RStudio, but that has been edited away), so I'm going to show you how to do this with R and Power BI. I'm also going to show you why you got that error message, and also why you would get an error message either way because your dataset is just not sufficient to make a correlation plot.
My answer
I'm assuming you would like to make a correlation plot of the population between the cities in your table. For that, the table needs more information than a single year for each city. I would check your data sources and see if you can come up with population numbers for, let's say, the last 10 years. Lacking the exact numbers for the cities in your table, I'm going to use some semi-made-up numbers for the population of the 10 most populous countries (following your data structure):
Country 2017 2016 2015 2014 2013
China 1415045928 1412626453 1414944844 1411445597 1409517397
India 1354051854 1340371473 1339431384 1343418009 1339180127
United States 326766748 324472802 325279622 324521777 324459463
Indonesia 266794980 266244787 266591965 265394107 263991379
Brazil 210867954 210335253 209297939 209860881 209288278
Pakistan 200813818 199761249 200253292 197655630 197015955
Nigeria 195875237 192568158 195757661 191728478 190886311
Bangladesh 166368149 165630262 165936711 166124290 164669751
Russia 143964709 143658415 143146914 143341653 142989754
Mexico 137590740 137486490 136768870 137177870 136590740
Writing and debugging R code in Power BI is a real pain, so I would recommend installing RStudio, writing your little R snippets there, and then pasting them into Power BI.
The reason for your error message is that the function cor() only takes numerical data as arguments. In your code sample the city names are given as arguments. And there are more potential traps in your code sample. You have to make sure that your dataset is numeric. And you have to make sure that your dataset has a shape that cor() will accept.
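To make both traps concrete, here is a small sketch using only the first three rows and three years of the table above, typed in by hand. The names pop, is_num and m are just illustrative.

```r
# First three rows (and three years) of the table above, typed in by hand.
pop <- data.frame(
  Country = c("China", "India", "United States"),
  `2017` = c(1415045928, 1354051854, 326766748),
  `2016` = c(1412626453, 1340371473, 324472802),
  `2015` = c(1414944844, 1339431384, 325279622),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Which columns are numeric? Country is not, so cor(pop) fails with
# "'x' must be numeric".
is_num <- sapply(pop, is.numeric)
is_num

# Keep only the numeric columns, then transpose so that each country
# becomes a column and each year a row.
m <- t(as.matrix(pop[, is_num]))
colnames(m) <- pop$Country

# cor() now receives a purely numeric matrix and returns a
# country-by-country correlation matrix.
cor(m)
```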
Below is an R script that will do just that. Copy the data above, and store it in a file called data.xlsx on your C drive.
The Code
library(corrplot)
library(readxl)
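# Read data. read_excel() returns a tibble, so convert it to a plain
# data frame, because row names on a tibble are deprecated
setwd("C:/")
data <- as.data.frame(read_excel("data.xlsx"))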
# Set Country names as row index
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
The plot
Power BI
In Power BI, you need to import the data before you use it in an R visual:
Copy this:
Country,2017,2016,2015,2014,2013
China,1415045928,1412626453,1414944844,1411445597,1409517397
India,1354051854,1340371473,1339431384,1343418009,1339180127
United States,326766748,324472802,325279622,324521777,324459463
Indonesia,266794980,266244787,266591965,265394107,263991379
Brazil,210867954,210335253,209297939,209860881,209288278
Pakistan,200813818,199761249,200253292,197655630,197015955
Nigeria,195875237,192568158,195757661,191728478,190886311
Bangladesh,166368149,165630262,165936711,166124290,164669751
Russia,143964709,143658415,143146914,143341653,142989754
Mexico,137590740,137486490,136768870,137177870,136590740
Save it as countries.csv in a folder of your choosing, and pick it up in Power BI using
Get Data | Text/CSV, click Edit in the dialog box, and in the Power Query Editor, click Use First Row as headers so that you have this table in your Power Query Editor:
Click Close & Apply and make sure that you've got the data available under VISUALIZATIONS | FIELDS:
Click R under VISUALIZATIONS:
Select all columns under FIELDS | countries so that you get this setup:
Take parts of your R snippet that we prepared above
library(corrplot)
# Set Country names as row index
data <- dataset
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
And paste it into the Power BI R script Editor:
Click Run R Script:
And you're gonna get this:
That's it!
If you change the procedure to import data from an Excel file instead of a text file (using Get Data | Excel), you've successfully combined the powers of Excel, Power BI and R to produce a correlation plot!
I hope this is what you were looking for!
Say I have observations for several periods for financial data, how can I create a function in R that only adds one observation at a time throughout my dataset so that I can compare how a single observation impacts my original data?
Say for instance that I have something like this:
Apple Microsoft Tesla Amazon
2010 0.8533719 0.8078440 0.2620114 0.1869552
2011 0.7462573 0.5127501 0.5452448 0.1369686
2012 0.7580671 0.5062639 0.7847919 0.8362821
2013 0.3154078 0.6960258 0.7303597 0.6057027
2014 0.4741735 0.3906580 0.4515726 0.1396147
2015 0.4230036 0.4728911 0.1262413 0.7495193
2016 0.2396552 0.5001825 0.6732861 0.8535837
2017 0.2007575 0.8875209 0.5086837 0.2211072
#And I define my original covariance matrix as follows:
cov.m <- cov(x[1:5,])
#I would like to add only one new observation at a time, so the results should be:
cov(x[1:5,]), cov(x[1:6,]), cov(x[1:7,]), cov(x[1:8,])
I have tried using rbind and a repeat loop, but it seems like I still have to define every row to include in rbind, which is quite tedious if I want to test on say 100+ different observations as I then manually need to specify all the observations, and I would have no use for the repeat loop in that case either.
Does this get you closer to your expected output?
lapply(5:nrow(x), function(y) cov(x[1:y, ]))
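For completeness, a self-contained sketch of that one-liner on the data from the question. The matrix x is rebuilt by hand here, and the names covs and diffs are only illustrative.

```r
# The data from the question, rebuilt as a numeric matrix.
x <- matrix(
  c(0.8533719, 0.8078440, 0.2620114, 0.1869552,
    0.7462573, 0.5127501, 0.5452448, 0.1369686,
    0.7580671, 0.5062639, 0.7847919, 0.8362821,
    0.3154078, 0.6960258, 0.7303597, 0.6057027,
    0.4741735, 0.3906580, 0.4515726, 0.1396147,
    0.4230036, 0.4728911, 0.1262413, 0.7495193,
    0.2396552, 0.5001825, 0.6732861, 0.8535837,
    0.2007575, 0.8875209, 0.5086837, 0.2211072),
  ncol = 4, byrow = TRUE,
  dimnames = list(as.character(2010:2017),
                  c("Apple", "Microsoft", "Tesla", "Amazon"))
)

# One covariance matrix per window: rows 1:5, 1:6, 1:7 and 1:8.
covs <- lapply(5:nrow(x), function(n) cov(x[1:n, ]))
names(covs) <- paste0("rows_1_to_", 5:nrow(x))

# Change of each later matrix relative to the original 5-row estimate.
diffs <- lapply(covs[-1], function(m) m - covs[[1]])
```

Because the starting row count is just the first element of 5:nrow(x), the same call works unchanged for 100+ observations.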
I have a column of 84 monthly expenditures from 1/2004 - 12/2010, which in Excel looks like...
12247815.55
11812697.14
13741176.13
21372260.37
27412419.28
42447077.96
55563235.3
45130678.8
54579583.53
43406197.32
34318334.64
25321371.4
...(74 more entries)
I am trying to run an stl() from the forecast package on this series, and so I load the data:
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE),
start=c(2004,1),
end=c(2010,12),
frequency = 12)
(If I do header=FALSE it will absorb the first entry - 122...- as the header for the second column, and name the first column's header 'X')
But instead of my environment being populated with a Time Series Object from 2004 to 2011 (as it has said before) it simply says ts[1:84, 1].
Probably related is the fact that,
fit <- stl(d)
throws
Error in stl(d) : only univariate series are allowed.
despite the fact that
head(d)
[1] 12247816 11812697 13741176 21372260 27412419 42447078
and
d
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
2004 12247816 11812697 13741176 21372260 27412419 42447078 55563235 45130679 54579584 43406197
("years 2005-2010 look exactly the same, and all rows have columns for Jan-Dec; it just doesn't fit on here neatly - just trying to show the object has taken the ts labeling structure.")
What am I doing wrong? As far as I know this is the same way I have been building my time series objects in the past...
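read.csv returns a data frame, not a vector. Even if it has only one column, it is still a data frame, and ts() turns it into a one-column matrix, which stl() rejects as not univariate. To pass the column as a vector use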
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE)[,1],
start=c(2004,1),
end=c(2010,12),
frequency = 12)
Also, please check your facts. stl is in the stats package, not the forecast package. This is easily checked by using help(stl).
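A small sketch that reproduces the difference without the CSV file. As a stand-in for the expenditure column it uses the first 84 values of the built-in AirPassengers series; the names df and d_bad are only illustrative. Note that stl() also needs an s.window argument, which the call in the question omits.

```r
# Stand-in data: the first 84 values of the built-in AirPassengers series,
# stored in a one-column data frame just like the result of read.csv().
df <- data.frame(expend = as.numeric(AirPassengers)[1:84])

# Passing the whole data frame gives a one-column matrix ("ts[1:84, 1]").
d_bad <- ts(df, start = c(2004, 1), frequency = 12)
is.matrix(d_bad)

# Extracting the column first gives a plain univariate series.
d <- ts(df[, 1], start = c(2004, 1), frequency = 12)
is.matrix(d)

# stl() has no default for s.window, so it must be supplied.
fit <- stl(d, s.window = "periodic")
plot(fit)
```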