I am trying to show the top 100 sales on a scatterplot by year. I used the below code to take top 100 games according to sales and then set it as a data frame.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
as.data.frame(top100)
I then tried to plot this with the below code:
ggplot(top100)+
aes(x=Year, y = Global_Sales) +
geom_point()
I get the below error when using the subset top100
Error: data must be a data frame, or other object coercible by fortify(), not a numeric vector
If I use the actual games dataset, I get the plot attached.
Any ideas?
As pointed out in the comments by @CMichael, you have several issues in your code.
In the absence of a reproducible example, I used the iris dataset to explain what is wrong with your code.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
By doing that you are only extracting a single column.
The same command with the iris dataset:
> head(sort(iris$Sepal.Length, decreasing = TRUE), n = 20)
[1] 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 7.2 7.1 7.0 6.9 6.9 6.9 6.9 6.8 6.8 6.8
So, first, you no longer have two dimensions to plot with ggplot2. Second, column names are not kept during the extraction, so you cannot then ask ggplot2 to plot Year and Global_Sales.
So, to solve your issue, you can do (here the example with the iris dataset):
top100 = as.data.frame(head(iris[order(iris$Sepal.Length, decreasing = TRUE), 1:2], n = 100))
And you get a data.frame of this type:
> str(top100)
'data.frame': 100 obs. of 2 variables:
$ Sepal.Length: num 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 ...
$ Sepal.Width : num 3.8 3.8 2.6 2.8 3 3 2.8 2.9 3.6 3.2 ...
> head(top100)
Sepal.Length Sepal.Width
132 7.9 3.8
118 7.7 3.8
119 7.7 2.6
123 7.7 2.8
136 7.7 3.0
106 7.6 3.0
And then if you are plotting:
library(ggplot2)
ggplot(top100, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Warning: based on what you provided in your example, I would suggest:
top100 <- as.data.frame(head(games[order(games$NA_Sales,decreasing=TRUE),c("Year","Global_Sales")], 100))
However, if this does not work for you, you should consider providing a reproducible example of your dataset (see: How to make a great R reproducible example).
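The difference between sort() and order() can be checked on a small stand-in. This is only a sketch: the games data frame below is made up, and its column names are assumed from the question.

```r
# Hypothetical stand-in for the real `games` data; all values are made up.
set.seed(1)
games <- data.frame(
  Year         = sample(1990:2015, 500, replace = TRUE),
  NA_Sales     = round(runif(500, 0, 40), 2),
  Global_Sales = round(runif(500, 0, 80), 2)
)

# sort() on one column returns a bare numeric vector: no Year, no column names.
sales_only <- head(sort(games$NA_Sales, decreasing = TRUE), n = 100)
is.data.frame(sales_only)   # FALSE

# order() returns row positions, so indexing with it keeps whole rows.
top100 <- head(games[order(games$NA_Sales, decreasing = TRUE), ], n = 100)
is.data.frame(top100)       # TRUE
names(top100)
```

Because top100 still carries Year and Global_Sales, it can be handed to ggplot() in the same way as the iris example.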
I'm not sure what's going on here, but when I try to run ggplot, it tells me that u and u1 are not valid lists. Did I define u and u1 incorrectly, so that it thinks these are functions? Did I forget something, or did I pass the wrong things to ggplot?
u1 <- function(x,y){max(utilityf1(x))}
utilityc1 <- data.frame("utilityc1" =
u(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
utilityc1 <- data.frame("utilityc1" =
u1(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
hhcomp <- data.frame(
pqx, pqy, utility, hours, p1qx, p1qy, utilit, utilityc1,
utilityc, u,u1, o, o1, o2
)
library(ggplot2)
ggplot(hhcomp, aes(x=utility, y=consumption))+
coord_cartesian(xlim = c(0, 16) )+
ylim(0,20)+
labs(x = "leisure(hours)",y="counsumption(units)")+
geom_line(aes(x = u, y = consumption))+
geom_line(aes(x = u1, y = consumption))
I'm not sure what else to explain, so if someone could provide some help on providing code to stack overflow that would be useful. I'm also not sure how much of a description to have, I should have enough code to be reproducible, but there is a problem that Stack Overflow only allows so much code, so it would be good to know the right amount to add.
I think you may need to read the documentation for ggplot2 and maybe R in general.
data.frame
For starters, a data.frame object is a collection of equal-length vectors bound together column-wise. Most of what you have defined as inputs for hhcomp are functions, which cannot be stored as columns of a data.frame. A canonical example of a data frame in R is iris:
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
str(iris) #print the structure of an r object
#'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
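To see the point about functions directly, here is a minimal sketch. The function sq is hypothetical and only stands in for u, u1 and the other functions in the question.

```r
sq <- function(x) x^2   # hypothetical function, for illustration only

# A function is not a vector, so it cannot become a column.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  data.frame(x = 1:3, f = sq),
  error = function(e) conditionMessage(e)
)
res

# Calling the function first yields a numeric vector, which can be a column.
ok <- data.frame(x = 1:3, y = sq(1:3))
str(ok)
```

In other words, the values returned by a function belong in the data frame, not the function itself.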
functions
There is a lot going on with your functions. Nested functions are fine, but it seems as though you are failing to pass all values on. This probably means you are relying on R's scoping rules, but that makes it ambiguous where values are found.
With the currently defined functions, calling u(1:2,3:4) passes 1:2 to utilityf, but utilityf's y argument is never assigned (with R's lazy evaluation, though, we reach a different error before R realizes that this value is missing). The next function that gets evaluated in this nest is p1qyf, which is defined as follows:
p1qyf <- function(y){(w1*16)-(w1*x)}
With this definition, it does not matter what you pass to the argument y: it is never used, and the function always returns the same thing.
#with only the function defined
p1qyf()
#Error in p1qyf() : object 'w1' not found
#defining w1
w1 <- 1.5
p1qyf()
#Error in p1qyf() : object 'x' not found
x <- 10:20
#All variables defined in the function
#can now be found in the global environment
#thus the function can be called with no errors because
#w1 and x are defined somewhere...
p1qyf() #nothing assigned to y
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = iris) #a data.frame assigned to y
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = foo_bar) #an object that hasn't even been assigned yet
#[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
I imagine you actually intend to define it this way
p1qyf <- function(y){(w1*16)-(w1*y)}
#Now what we pass to it affects the output
p1qyf(1:10)
#[1] 22.5 21.0 19.5 18.0 16.5 15.0 13.5 12.0 10.5 9.0
head(p1qyf(iris))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 16.35 18.75 21.90 23.7 NA
#2 16.65 19.50 21.90 23.7 NA
#3 16.95 19.20 22.05 23.7 NA
#4 17.10 19.35 21.75 23.7 NA
#5 16.50 18.60 21.90 23.7 NA
#6 15.90 18.15 21.45 23.4 NA
You can improve this further by defining more arguments so that R doesn't need to search for missing values with its scoping rules:
p1qyf <- function(y, w1 = 1.5){(w1*16)-(w1*y)}
#w1 is defaulted to 1.5 and doesn't need to be searched for.
I would spend some time looking into your functions because they are unclear and some, such as your p1qyf, do not fully use the arguments they are passed.
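A small sketch of both points, lazy evaluation and explicit arguments. The names ignores_y, budget, leisure and hours_available are hypothetical and not taken from the question; the default wage of 1.5 mirrors w1.

```r
# y is declared but never used, so whatever is passed is never evaluated.
ignores_y <- function(y) 42
ignores_y(stop("this is never run"))   # returns 42, no error is raised

# Hypothetical rewrite: every input is an argument, so nothing has to be
# looked up in the global environment.
budget <- function(leisure, wage = 1.5, hours_available = 16) {
  wage * hours_available - wage * leisure
}

budget(0:4)             # uses the default wage
budget(0:4, wage = 2)   # the caller can override it explicitly
```

With this style, a call that forgets an input fails at that call instead of silently picking up a value defined somewhere else.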
ggplot
ggplot takes a structured data object, such as a data.frame or tbl_df, and plots it. The aes mappings take the unquoted names of the columns you wish to map. Continuing with iris as an example:
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species))+
geom_point() +
geom_line()
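Applied to the question, one possible approach is to evaluate each curve into numbers first, stack the results in one long data frame, and let a grouping column separate the lines. This is a sketch under assumptions: the budget function, the column names and both wage values are illustrative, since the question does not define w.

```r
library(ggplot2)

# Hypothetical budget line; the wage values used below are illustrative only.
budget <- function(leisure, wage) wage * (16 - leisure)

hours <- 0:16

# One row per (leisure, consumption) point, with a label saying which curve it is.
curves <- rbind(
  data.frame(leisure = hours, consumption = budget(hours, 1),   curve = "w = 1"),
  data.frame(leisure = hours, consumption = budget(hours, 1.5), curve = "w = 1.5")
)

# Every aesthetic now refers to a column of `curves`, never to a function.
p <- ggplot(curves, aes(x = leisure, y = consumption, colour = curve)) +
  geom_line() +
  labs(x = "leisure (hours)", y = "consumption (units)")
```

Printing p draws both lines; a single geom_line() is enough because colour = curve also groups the data.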
I hope this helps clear up why you may be getting some errors. Honestly though, even if you were actually able to declare a data.frame, the problem here is that your post is still not that reproducible. Good luck.
pqxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to py
pqyf <- function(y){(w * 16)-(w * x)} #
utilityf <- function(x, y) { (pqyf(x)) * ((pqxf(y)))} # the utility function C,l
hours <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)
w1 <- 1.5
p1qxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to p1y
p1qyf <- function(y){(w1 * 16)-(w1 * x)} #
utilityf1 <- function(x, y) { (p1qyf(x)) * ((p1qxf(y)))} # the utility function (C,l)
utilitycf <- function(x,y){max(utilityf(x))/((pqyf(y)))}
utilityc1f <- function(x,y){max(utilityf1(x))/((pqyf(y)))}
u <- function(x,y){max(utilityf(x))}
u1 <- function(x,y){max(utilityf1(x))}
Is there a good way to get the IDs of the points in a scatter plot by drawing a freehand polygon in R?
I found scatterD3 and it looks nice, but I can't manage to output the lab to a variable in R.
Thank you.
Roc
Here's one way
library(iplots)
with(iris, iplot(Sepal.Width,Petal.Width))
Use SHIFT (xor) or SHIFT+ALT (and) to select points (red):
Then:
iris[iset.selected(), ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 119 7.7 2.6 6.9 2.3 virginica
# 115 5.8 2.8 5.1 2.4 virginica
# 133 6.4 2.8 5.6 2.2 virginica
# 136 7.7 3.0 6.1 2.3 virginica
# 146 6.7 3.0 5.2 2.3 virginica
# 142 6.9 3.1 5.1 2.3 virginica
gives you the selected rows.
The package "gatepoints" available on CRAN will allow you to draw a gate returning your points of interest.
The explanation is quite clear for anyone who reads the question. The link simply links to a package that can be used as follows:
First plot your points
x <- data.frame(x=1:10, y=1:10)
plot(x, col = "red", pch = 16)
Then select your points after running the following commands:
library(gatepoints)
selectedPoints <- fhs(x)
This will return:
selectedPoints
#> [1] "4" "5" "7"
#> attr(,"gate")
#> x y
#> 1 6.099191 8.274120
#> 2 8.129107 7.048649
#> 3 8.526881 5.859404
#> 4 5.700760 6.716428
#> 5 5.605314 5.953430
#> 6 6.866882 3.764390
#> 7 3.313575 3.344069
#> 8 2.417270 5.217868
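Once the gate coordinates are known, the selection can be reproduced without an interactive device. The sketch below uses a generic ray-casting test written in base R; inside_polygon is a hypothetical helper, not a gatepoints function, and nothing here claims to match how the package works internally. The gate vertices are copied from the output above.

```r
# Ray casting: a point is inside if a horizontal ray from it crosses the
# polygon boundary an odd number of times.
inside_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  j <- c(n, seq_len(n - 1))   # index of the previous vertex, closing the polygon
  vapply(seq_along(px), function(k) {
    # edges whose y-range straddles the point's y coordinate
    straddle <- (vy > py[k]) != (vy[j] > py[k])
    # x coordinate where each edge meets the horizontal line y = py[k]
    x_cross <- vx + (py[k] - vy) * (vx[j] - vx) / (vy[j] - vy)
    sum(straddle & (px[k] < x_cross)) %% 2 == 1
  }, logical(1))
}

pts <- data.frame(x = 1:10, y = 1:10)
gate_x <- c(6.099191, 8.129107, 8.526881, 5.700760, 5.605314, 6.866882, 3.313575, 2.417270)
gate_y <- c(8.274120, 7.048649, 5.859404, 6.716428, 5.953430, 3.764390, 3.344069, 5.217868)

selected <- rownames(pts)[inside_polygon(pts$x, pts$y, gate_x, gate_y)]
selected
```

For these vertices the result is the same set of row names as in the fhs output, which makes it possible to store a gate and re-apply it later to other data.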
How can I get the variable name of a function to act dynamically within a string?
Below is an extract of what I am trying to achieve: a function that produces a list depending on the varName. But I cannot get the varName to act dynamically within the string sqldf(...). I assume this problem is not specific to the package sqldf.
createExcelSheetData<-function(varName){
sqldf("
SELECT Name
FROM dataTable
WHERE Choice=varName
")
}
table1<-createExcelSheetData(1)
table2<-createExcelSheetData(2)
table3<-createExcelSheetData(3)
What the above gives me is the choice fixed with the text varName.
UPDATE: To have the variable within the text, not just at the end.
createExcelSheetData<-function(varName){
sqldf("
SELECT Name
FROM dataTable
WHERE Choice=varName
ORDER BY Name
")
}
table1<-createExcelSheetData(1)
table2<-createExcelSheetData(2)
table3<-createExcelSheetData(3)
fn$ is discussed in Example 6 on the sqldf home page. Here is a self-contained minimal reproducible example using the iris data frame that comes with R. (In the future, please ensure all code is minimal and reproducible and, in particular, includes all inputs.)
library(sqldf)
# retrieve records for specified Species and Petal.Length above minPetalLength
f <- function(Species, minPetalLength) {
fn$sqldf("SELECT *
FROM iris
WHERE Species = '$Species' and [Petal.Length] > $minPetalLength")
}
f("virginica", 6)
giving:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.6 3.0 6.6 2.1 virginica
2 7.3 2.9 6.3 1.8 virginica
3 7.2 3.6 6.1 2.5 virginica
4 7.7 3.8 6.7 2.2 virginica
5 7.7 2.6 6.9 2.3 virginica
6 7.7 2.8 6.7 2.0 virginica
7 7.4 2.8 6.1 1.9 virginica
8 7.9 3.8 6.4 2.0 virginica
9 7.7 3.0 6.1 2.3 virginica
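An alternative that does not rely on fn$ is to build the query string in plain R before handing it to sqldf. This is a sketch only: build_query is a hypothetical helper, and the table and column names (dataTable, Name, Choice) are taken from the question.

```r
# Build the SQL text for a numeric Choice value. as.integer() ensures that
# only a number ever ends up in the query string.
build_query <- function(varName) {
  sprintf("SELECT Name FROM dataTable WHERE Choice = %d ORDER BY Name",
          as.integer(varName))
}

build_query(2)

# With sqldf loaded and dataTable defined, the string could then be run:
# table2 <- sqldf(build_query(2))
```

Because the value is substituted wherever the placeholder sits, it works equally well in the middle of the statement, which is what the update to the question asks for.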
I have a small problem. I have a dataset with 8208 rows of data in a single column. I want to take every n rows as a block and add each block to a new data frame as a column.
So, for example:
newdf has column 1 to column 23.
column 1 is composed of rows 289:528 from the original dataset
column 2 is composed of rows 625:864 from the original dataset
And so on. The "block" size is 239 rows, the jump between blocks is every 336 rows.
I can do this manually, but it just becomes tedious. I have to repeat this entire procedure for another 11 sets of data so obviously a more automated approach would be preferable.
The trick here is to create an index of integers that refer to the row numbers you want to keep. This is simple enough with some use of rep, sequences and R's recycling rule.
Let me demonstrate using iris. Say you want to skip 25 rows, then return 3 rows:
skip <- 25
take <- 3
total <- nrow(iris)
reps <- total %/% (skip + take)
index <- rep(0:(reps-1), each=take) * (skip + take) + (1:take) + skip
The index now is:
index
[1] 26 27 28 54 55 56 82 83 84 110 111 112 138 139 140
And the rows of iris:
iris[index, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
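The same idea can be wrapped in a small helper in which the distance between block starts is a separate parameter, as in the question (blocks starting every 336 rows). block_index and its argument names are hypothetical, and the block length of 240 is an assumption read off the row ranges quoted in the question.

```r
# Row numbers for blocks of `take` rows, the first block starting at `start`,
# later blocks starting every `stride` rows, never running past `total`.
block_index <- function(start, take, stride, total) {
  n_blocks <- (total - start - take + 1) %/% stride + 1
  starts <- start + (seq_len(n_blocks) - 1) * stride
  # outer() gives a take x n_blocks matrix; as.vector() reads it block by block
  as.vector(outer(seq_len(take) - 1, starts, "+"))
}

# Same result as the iris example: skip 25, take 3, repeat every 28 rows.
block_index(start = 26, take = 3, stride = 28, total = 150)

# Numbers from the question: rows 289:528, 625:864, and so on.
idx <- block_index(start = 289, take = 240, stride = 336, total = 8208)
length(idx) / 240   # number of complete blocks
```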
Update
Note the OP states the block size is 239 elements, but it is clear from the example rows indicated that the block size is 240:
> length(289:528)
[1] 240
I'll leave the example below at a block length of 239, but adjust if it is really 240.
It isn't clear from the Question, but assuming that you have something like this
df <- data.frame(A = runif(8208))
a data frame with 8208 rows.
First compute the indices of the elements of A that you need to keep. This is done via
want <- sapply(seq(289, nrow(df)-239, by = 336),
function(x) x + (seq_len(239) - 1))
Then we can use the fact that R fills matrices by columns and convert the required elements of A to a matrix with 239 rows
mat <- matrix(df$A[want], nrow = 239)
This works
> all.equal(mat[,1], df$A[289:527])
[1] TRUE
but do note that I have taken a block length of 239 here (289:527) not the indices the OP quotes as that is a block size of 240 (see Update above)
If you want this as a data frame, just add
df2 <- as.data.frame(mat)
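If the block length really is 240, the same steps can be checked against the row ranges quoted in the question. In this sketch the data is simply 1:8208, an assumption made so that each value equals its own row number and the result is easy to verify; the variable names are illustrative.

```r
orig  <- seq_len(8208)   # stand-in data: value i sits in row i
block <- 240             # block length implied by rows 289:528
first <- 289
step  <- 336

# Start row of every block that still fits completely inside the data.
starts <- seq(first, length(orig) - block + 1, by = step)

# One column of row numbers per block, then pull the values column by column.
want  <- sapply(starts, function(s) s + seq_len(block) - 1)
newdf <- as.data.frame(matrix(orig[want], nrow = block))

dim(newdf)
range(newdf[[1]])   # should span rows 289 to 528
range(newdf[[2]])   # should span rows 625 to 864
```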
Try this:
1) Create a list of indices
lapply(seq(289, 8208 - 239, by = 336), function(X) X:(X+239)) -> Indices
2) Select Data
Columns <- lapply(Indices, function(X) OldDF[X,])
3) Combine selected data in columns
NewDF <- do.call(cbind, Columns)
Why not just:
as.data.frame(matrix(orig, nrow=528 )[289:528 ,])
Since 8208 is not an exact multiple of the row count, we need to determine the number of columns:
> 8208/528
[1] 15.54545 # so either 15 or 16
> 8208-15*528
[1] 288 # all in the to-be-discarded section
as.data.frame(matrix(orig, nrow=528, ncol=15 )[289:528 ,])
Or:
as.data.frame(matrix(orig, nrow=528, ncol=8208 %/% 528)[289:528 ,])
I've got a data frame, all, that looks like this:
http://pastebin.com/Xc1HEYyH
Now I want to create a scatter plot with the column headings in the x-axis and the respective values as the data points. For example:
7| x
6| x x
5| x x x x
4| x x x
3| x x
2| x x
1|
---------------------------------------
STM STM STM PIC PIC PIC
cold normal hot cold normal hot
This should be easy, but I can not figure out how.
Regards
The basic idea, if you want to plot using Hadley's ggplot2, is to get your data into the form:
x y
col_names values
This can be done using the melt function from Hadley's reshape2. Do ?melt to see the possible arguments. Here, however, since we want to melt the whole data.frame, we just need:
melt(all)
# this gives the data in format:
# variable value
# 1 STM_cold 6.0
# 2 STM_cold 6.0
# 3 STM_cold 5.9
# 4 STM_cold 6.1
# 5 STM_cold 5.5
# 6 STM_cold 5.6
Here, x will be the variable column and y will be the corresponding value column.
require(ggplot2)
require(reshape2)
ggplot(data = melt(all), aes(x=variable, y=value)) +
geom_point(aes(colour=variable))
If you don't want the colours, then just remove aes(colour=variable) inside geom_point so that it becomes geom_point().
Edit: I should probably mention here, that you could also replace geom_point with geom_jitter that'll give you, well, jittered points:
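A minimal sketch of the jittered variant, using a small made-up data frame in place of all (the toy values and the jitter width are assumptions, not taken from the question):

```r
library(ggplot2)
library(reshape2)

# Hypothetical three-column stand-in for the data frame `all` in the question.
toy <- data.frame(
  STM_cold = c(6.0, 5.9, NA),
  STM_hot  = c(6.3, 6.5, NA),
  PIC_cold = c(0.9, 0.3, 0.1)
)

# One row per (column name, value) pair; na.rm = TRUE drops the padding NAs.
long <- melt(toy, na.rm = TRUE)

# Jitter only horizontally so the plotted y values stay exact.
p <- ggplot(long, aes(x = variable, y = value)) +
  geom_jitter(aes(colour = variable), width = 0.1, height = 0)
```

Setting height = 0 keeps the jitter from moving points vertically, so the values can still be read off the y-axis.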
Here are two options to consider. The first uses dotplot from the "lattice" package:
library(lattice)
dotplot(values ~ ind, data = stack(all))
The second uses dotchart from base R's "graphics" options. To use the dotchart function, you need to wrap your data.frame in as.matrix:
dotchart(as.matrix(all), labels = "")
Note that the points in this graphic are not "jittered", but rather, presented in the order they were recorded. That is to say, the lowest point is the first record, and the highest point is the last record. If you zoomed into the plot for this example, you would see that you have 16 very faint horizontal lines. Each line represents one row from each column. Thus, if you look at the dots for "STM_cold" or any of the other variables that have NA values, you'll see a few blank lines at the top where there was no data available.
This has its advantages since it might show a trend over time if the values are recorded chronologically, but might also be a disadvantage if there are too many rows in your source data frame.
A bit of a manual version using base R graphics just for fun.
Get the data:
test <- read.table(text="STM_cold STM_normal STM_hot PIC_cold PIC_normal PIC_hot
6.0 6.6 6.3 0.9 1.9 3.2
6.0 6.6 6.5 1.0 2.0 3.2
5.9 6.7 6.5 0.3 1.8 3.2
6.1 6.8 6.6 0.2 1.8 3.8
5.5 6.7 6.2 0.5 1.9 3.3
5.6 6.5 6.5 0.2 1.9 3.5
5.4 6.8 6.5 0.2 1.8 3.7
5.3 6.5 6.2 0.2 2.0 3.5
5.3 6.7 6.5 0.1 1.7 3.6
5.7 6.7 6.5 0.3 1.7 3.6
NA NA NA 0.1 1.8 3.8
NA NA NA 0.2 2.1 4.1
NA NA NA 0.2 1.8 3.3
NA NA NA 0.8 1.7 3.5
NA NA NA 1.7 1.6 4.0
NA NA NA 0.1 1.7 3.7",header=TRUE)
Set up the basic plot:
plot(
NA,
ylim=c(0,max(test,na.rm=TRUE)+0.3),
xlim=c(1-0.1,ncol(test)+0.1),
xaxt="n",
ann=FALSE,
panel.first=grid()
)
axis(1,at=seq_along(test),labels=names(test),lwd=0,lwd.ticks=1)
Plot some points, with some x-axis jittering so they are not printed on top of one another.
invisible(
mapply(
points,
jitter(rep(seq_along(test),each=nrow(test))),
unlist(test),
col=rep(seq_along(test),each=nrow(test)),
pch=19
)
)
Result:
edit
Here's an example using alpha transparency on the points and getting rid of the jitter as discussed in the below comments with Ananda.
invisible(
mapply(
points,
rep(seq_along(test),each=nrow(test)),
unlist(test),
col=rgb(0,0,0,0.1),
pch=15,
cex=3
)
)