How to make histogram with different columns in data set?

I have to make a histogram from the given text file hw4aldData containing:
170 172 173 174 174 175 176 177 180 180 180 180 180 181 181 182 182 182 182 184 184 185 186 188
0.84 1.31 1.42 1.03 1.07 1.08 1.04 1.80 1.45 1.60 1.61 2.13 2.15 0.84 1.43 0.90 1.81 1.94 2.68 1.49 2.52 3.00 1.87 3.08
But each data set shows up as a different column in R like:
v1 v2 v3 v4 v5
TankTemp 170 172 173 174
EffRat 0.84 1.31 1.42 1.03
There are many more data points but I just wanted to show what it looks like. I need to make a histogram for tanktemp and effrate.
I know how to separate columns to make a histogram:
hist(hw4aldData$v1)
I know how to switch into a transpose matrix:
t(hw4aldData)
But the transpose doesn't work because the row names sit at the beginning of the columns, and I'm not sure how to make a histogram from all the data points in this form, one for the TankTemp data and one for the EffRat data.
Any help is welcome, thanks.

The first step in asking a question on Stack Overflow is to create a reproducible example. That is a small example that users can input into their computers to test, diagnose, and solve your issue. It not only helps others but it also enables you to properly assess your problem and potentially find a solution while creating the example.
Example
We use the built-in iris data set for values. We only need a few rows and the "Species" label as the first column to look like your example:
df <- iris[c(1,80,150),c(5,1:4)]
df
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 setosa 5.1 3.5 1.4 0.2
# 80 versicolor 5.7 2.6 3.5 1.0
# 150 virginica 5.9 3.0 5.1 1.8
That only took one line and is very helpful in visualizing and sharing the problem you are facing.
Reproduce the error
You did not show the error you are receiving but we can show it:
hist(df[1,])
Error in hist.default(df[1, ]) : 'x' must be numeric
hist(t(df[,1]))
We found the problem: the first column contains text and the others do not, so the row is not purely numeric.
Solution
Let's create row names to call from and delete the first column:
row.names(df) <- df[,1]
df <- df[-1]
df
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# setosa 5.1 3.5 1.4 0.2
# versicolor 5.7 2.6 3.5 1.0
# virginica 5.9 3.0 5.1 1.8
Now we can create the histogram by name. Let's try the "setosa" row:
hist(unlist(df["setosa",]))
Perfect. Cheers.
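The same steps can be sketched on data shaped like the question's. This is only an illustration: the data frame `hw` below is a hypothetical reconstruction of `hw4aldData` built from the first six values of each row, and the column name `V1` is an assumption about how the file was read.

```r
# Hypothetical reconstruction: one row per variable, label in the first column.
tank <- c(170, 172, 173, 174, 174, 175)
eff  <- c(0.84, 1.31, 1.42, 1.03, 1.07, 1.08)
hw <- data.frame(V1 = c("TankTemp", "EffRat"),
                 matrix(c(tank, eff), nrow = 2, byrow = TRUE),
                 stringsAsFactors = FALSE)

# Move the labels into the row names and drop the text column.
rownames(hw) <- hw$V1
hw <- hw[-1]

# Each row is now purely numeric; unlist() flattens it to a plain vector.
tank_values <- unlist(hw["TankTemp", ], use.names = FALSE)
eff_values  <- unlist(hw["EffRat", ], use.names = FALSE)

# One histogram per variable.
hist(tank_values, main = "TankTemp")
hist(eff_values, main = "EffRat")
```

With the full file, only the construction of `hw` changes; the row-name and `unlist()` steps stay the same.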

Related

Issue in viewing in-built dataset from cran such as Nile, Air Passengers etc

I am new to programming and I have been trying to explore R using some built-in datasets. Recently, I have had trouble viewing datasets such as Nile and AirPassengers as tables. Running View(Nile) shows something different instead of a table.
However, I have no issues viewing other datasets such as iris. I don't understand why I can view only some of the built-in datasets. Kindly help me fix this issue.
This is what my screen shows when I run the View command.

1) fortify.zoo RStudio has replaced View with its own code causing the problem; however, this should work in both RStudio and on Windows in Rgui. It converts the ts objects AirPassengers and Nile to data frames which RStudio's View can display.
library(zoo)
View(fortify.zoo(AirPassengers))
View(fortify.zoo(Nile))
2) as.zoo These also work in both Rgui on Windows and RStudio by forcing the use of R's View function rather than RStudio's View. In Rgui we don't need the utils::: part (although it won't hurt).
utils:::View(as.zoo(AirPassengers))
utils:::View(as.zoo(Nile))
3) data.frame Another approach that works in both RStudio and Rgui for Windows but is somewhat verbose is:
View(data.frame(time(AirPassengers), AirPassengers))
View(data.frame(time(Nile), Nile))
4) as.data.frame This one only partly works -- it shows the data but not the index.
In R (not RStudio) as.data.frame is automatically applied to the argument of View so the as.data.frame is actually superfluous and one could just write View(Nile), etc.
The reason that the index is not shown is that as.data.frame.ts drops the index whereas as.data.frame.zoo invoked in (2) puts the index into the row names which View displays.
View(as.data.frame(AirPassengers))
View(as.data.frame(Nile))

This is happening because they are different types of objects
> str(Nile)
Time-Series [1:100] from 1871 to 1970: 1120 1160 963 1210 1160 1160 813 1230 1370 1140 ...
> str(AirPassengers)
Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris is a data.frame, whereas the other two are time series (ts objects).
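Since the difference comes down to the object's class, one option is to check for a time series first and convert it before viewing. The helper name `as_viewable` and the column names `time` and `value` are made up for this sketch, and it assumes a univariate series:

```r
# Hypothetical helper: turn a univariate ts into a two-column data frame,
# and leave anything else (e.g. a data frame such as iris) untouched.
as_viewable <- function(x) {
  if (is.ts(x)) {
    data.frame(time = as.numeric(time(x)), value = as.numeric(x))
  } else {
    x
  }
}

class(Nile)   # "ts"
class(iris)   # "data.frame"

nile_df <- as_viewable(Nile)
head(nile_df)
# In an interactive session: View(nile_df)
```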

Comparing two data frames with the same column and row names

I have two data frames with the same column and row names but have different values.
>data_tls
plot_id max min mean std vol
mf20 20.04 2.23 8.4 3.45 201
mf21 25.24 3.4 4.3 5.5 304
mf22 28.34 5.3 6.2 2.45 240
mf23 30.4 2.05 10.4 6.06 403
>data_uls
plot_id max min mean std vol
mf20 19.09 4.22 6.2 4.45 220
mf21 20.2 2.6 5.3 4.5 305
mf22 32.3 4.3 2.2 3.45 255
mf23 28.4 3.05 8.05 5.85 386
I want to compare the values in these datasets and select the values that differ by more than 20%. I am trying to use the compareDF package, following the example here: https://www.r-bloggers.com/comparing-dataframes-in-r-using-comparedf/.
compareData <- compare_df(data_tls, data_uls, c("Plot_name"))
compareData$comparison_df
However, print(compareData$html_output) returns NULL.
I would really appreciate it if someone could help solve this or recommend another solution.

To get a TRUE/FALSE (logical) matrix use
res <- data_tls > data_uls * 1.2 | data_tls < data_uls * 0.8
Note: the data.frames must contain only numeric columns for this to work, so you have to remove e.g. the plot_id column first (or select only the numeric columns in the above expression).
You can then count the differences per row or per column:
rowSums(res)
colSums(res)
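As a sketch of how this plays out on the tables from the question, the data frames are rebuilt by hand below and the numeric columns are selected before comparing. Note that the 20% band is taken relative to data_uls, as in the expression above; `num_cols`, `a` and `b` are names introduced only for this example.

```r
data_tls <- data.frame(
  plot_id = c("mf20", "mf21", "mf22", "mf23"),
  max  = c(20.04, 25.24, 28.34, 30.4),
  min  = c(2.23, 3.4, 5.3, 2.05),
  mean = c(8.4, 4.3, 6.2, 10.4),
  std  = c(3.45, 5.5, 2.45, 6.06),
  vol  = c(201, 304, 240, 403)
)
data_uls <- data.frame(
  plot_id = c("mf20", "mf21", "mf22", "mf23"),
  max  = c(19.09, 20.2, 32.3, 28.4),
  min  = c(4.22, 2.6, 4.3, 3.05),
  mean = c(6.2, 5.3, 2.2, 8.05),
  std  = c(4.45, 4.5, 3.45, 5.85),
  vol  = c(220, 305, 255, 386)
)

# Keep only the numeric columns; plot_id is used for labelling.
num_cols <- setdiff(names(data_tls), "plot_id")
a <- as.matrix(data_tls[num_cols])
b <- as.matrix(data_uls[num_cols])

# TRUE where the TLS value is more than 20% above or below the ULS value.
res <- a > b * 1.2 | a < b * 0.8
rownames(res) <- data_tls$plot_id

res
which(res, arr.ind = TRUE)  # row/column positions of the flagged cells
rowSums(res)                # flagged cells per plot
colSums(res)                # flagged cells per variable
```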

Plotting sales over time in R

I am trying to show the top 100 sales on a scatterplot by year. I used the below code to take top 100 games according to sales and then set it as a data frame.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
as.data.frame(top100)
I then tried to plot this with the below code:
ggplot(top100)+
aes(x=Year, y = Global_Sales) +
geom_point()
I get the error below when using the subset top100:
Error: data must be a data frame, or other object coercible by fortify(), not a numeric vector
If I use the actual games dataset, I get the attached plot.
Any ideas?

As pointed out in the comments by @CMichael, there are several issues in your code.
In the absence of a reproducible example, I used the iris dataset to explain what is wrong with it.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
By doing that you are only extracting a single column.
The same command with the iris dataset:
> head(sort(iris$Sepal.Length, decreasing = TRUE), n = 20)
[1] 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 7.2 7.1 7.0 6.9 6.9 6.9 6.9 6.8 6.8 6.8
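So, first, you no longer have two dimensions to plot in ggplot2. Second, the column names are not kept during the extraction, so you cannot ask ggplot2 to plot Year and Global_Sales afterwards. Note also that as.data.frame(top100) on its own does not change top100, because the result is not assigned back.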
So, to solve your issue, you can do (here the example with the iris dataset):
top100 = as.data.frame(head(iris[order(iris$Sepal.Length, decreasing = TRUE), 1:2], n = 100))
And you get a data.frame of this type:
> str(top100)
'data.frame': 100 obs. of 2 variables:
$ Sepal.Length: num 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 ...
$ Sepal.Width : num 3.8 3.8 2.6 2.8 3 3 2.8 2.9 3.6 3.2 ...
> head(top100)
Sepal.Length Sepal.Width
132 7.9 3.8
118 7.7 3.8
119 7.7 2.6
123 7.7 2.8
136 7.7 3.0
106 7.6 3.0
And then if you are plotting:
library(ggplot2)
ggplot(top100, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Warning: based on what you provided in your example, I suggest:
top100 <- as.data.frame(head(games[order(games$NA_Sales,decreasing=TRUE),c("Year","Global_Sales")], 100))
However, if this does not work for you, consider providing a reproducible example of your dataset (see "How to make a great R reproducible example").
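To make the difference between sorting a column and ordering the rows concrete, here is a small sketch on simulated data. The `games` data frame below is invented for illustration (random values, with only the column names taken from the question), and the plot is skipped if ggplot2 is not installed.

```r
# Simulated stand-in for the asker's data; values are random, not real sales.
set.seed(1)
games <- data.frame(
  Year     = sample(1990:2015, 300, replace = TRUE),
  NA_Sales = round(runif(300, 0, 40), 2)
)
games$Global_Sales <- games$NA_Sales + round(runif(300, 0, 40), 2)

# sort() on one column returns a bare numeric vector: Year is lost.
sorted_only <- head(sort(games$NA_Sales, decreasing = TRUE), n = 100)
is.data.frame(sorted_only)   # FALSE

# order() gives row positions, so whole rows (all columns) are kept.
top100 <- head(games[order(games$NA_Sales, decreasing = TRUE), ], n = 100)
is.data.frame(top100)        # TRUE

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top100, aes(x = Year, y = Global_Sales)) + geom_point()
  print(p)
}
```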

R: Free hand selection of data points in scatter plots

I would like to know if there is a good way to get the ids of the points in a scatter plot by drawing a free-hand polygon in R.
I found scatterD3 and it looks nice, but I can't manage to output the lab to a variable in R.
Thank you.
Roc

Here's one way
library(iplots)
with(iris, iplot(Sepal.Width,Petal.Width))
Use SHIFT (xor) or SHIFT+ALT (and) to select points (red):
Then:
iris[iset.selected(), ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 119 7.7 2.6 6.9 2.3 virginica
# 115 5.8 2.8 5.1 2.4 virginica
# 133 6.4 2.8 5.6 2.2 virginica
# 136 7.7 3.0 6.1 2.3 virginica
# 146 6.7 3.0 5.2 2.3 virginica
# 142 6.9 3.1 5.1 2.3 virginica
gives you the selected rows.

The package "gatepoints" available on CRAN will allow you to draw a gate returning your points of interest.
It can be used as follows.
First plot your points:
library(gatepoints)
x <- data.frame(x=1:10, y=1:10)
plot(x, col = "red", pch = 16)
Then select your points after running the following commands:
selectedPoints <- fhs(x)
This will return:
selectedPoints
#> [1] "4" "5" "7"
#> attr(,"gate")
#> x y
#> 1 6.099191 8.274120
#> 2 8.129107 7.048649
#> 3 8.526881 5.859404
#> 4 5.700760 6.716428
#> 5 5.605314 5.953430
#> 6 6.866882 3.764390
#> 7 3.313575 3.344069
#> 8 2.417270 5.217868
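The "gate" attribute holds the polygon vertices, so the selection can in principle be re-checked later without redrawing. The sketch below is not gatepoints' own code: it uses a hand-written ray-casting test (`point_in_polygon` is a name invented here) on the gate coordinates printed above.

```r
# Ray casting: a point is inside if a horizontal ray from it crosses
# the polygon boundary an odd number of times.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n  # previous vertex; start with the closing edge
  for (i in seq_len(n)) {
    # Does edge j -> i straddle the horizontal line y = py?
    if ((poly_y[i] > py) != (poly_y[j] > py)) {
      x_at_y <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_at_y) inside <- !inside
    }
    j <- i
  }
  inside
}

pts <- data.frame(x = 1:10, y = 1:10)

# Gate vertices copied from the output shown above.
gate <- data.frame(
  x = c(6.099191, 8.129107, 8.526881, 5.700760,
        5.605314, 6.866882, 3.313575, 2.417270),
  y = c(8.274120, 7.048649, 5.859404, 6.716428,
        5.953430, 3.764390, 3.344069, 5.217868)
)

is_inside <- mapply(point_in_polygon, pts$x, pts$y,
                    MoreArgs = list(poly_x = gate$x, poly_y = gate$y))
selected <- rownames(pts)[is_inside]
selected
```

On these coordinates the check picks the same row names as the fhs() output above.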

Extracting block of m rows at regular interval from large dataset

I have a small problem. I have a dataset with 8208 rows of data in a single column. I want to take a block of rows at a regular interval and add each block as a column of a new data frame.
So, for example:
newdf has column 1 to column 23.
column 1 is composed of rows 289:528 from the original dataset
column 2 is composed of rows 625:864 from the original dataset
And so on. The "block" size is 239 rows, the jump between blocks is every 336 rows.
I can do this manually, but it just becomes tedious. I have to repeat this entire procedure for another 11 sets of data so obviously a more automated approach would be preferable.

The trick here is to create an index of integers that refer to the row numbers you want to keep. This is simple enough with some use of rep, sequences and R's recycling rule.
Let me demonstrate using iris. Say you want to skip 25 rows, then return 3 rows:
skip <- 25
take <- 3
total <- nrow(iris)
reps <- total %/% (skip + take)
index <- rep(0:(reps-1), each=take) * (skip + take) + (1:take) + skip
The index now is:
index
[1] 26 27 28 54 55 56 82 83 84 110 111 112 138 139 140
And the rows of iris:
iris[index, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
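Because the same procedure has to be repeated for several datasets, the index calculation can be wrapped in a small function. The name `block_index` is made up for this sketch; it assumes each cycle is "skip, then take" and that incomplete trailing cycles are dropped.

```r
# Hypothetical helper: row numbers for "skip `skip` rows, keep `take` rows",
# repeated over `total` rows.
block_index <- function(total, skip, take) {
  period <- skip + take
  reps <- total %/% period           # number of complete cycles
  if (reps < 1) return(integer(0))   # not even one complete cycle
  rep(0:(reps - 1), each = take) * period + seq_len(take) + skip
}

index <- block_index(nrow(iris), skip = 25, take = 3)
index
iris[index, ]
```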

Update
Note that the OP states the block size is 239 elements, but it is clear from the example rows given that the block size is 240:
> length(289:528)
[1] 240
I'll leave the example below at a block length of 239, but adjust if it is really 240.
It isn't clear from the Question, but assuming that you have something like this
df <- data.frame(A = runif(8208))
a data frame with 8208 rows.
First compute the indices of the elements of A that you need to keep. This is done via
want <- sapply(seq(289, nrow(df)-239, by = 336),
function(x) x + (seq_len(239) - 1))
Then we can use the fact that R fills matrices by columns and convert the required elements of A to a matrix with 239 rows
mat <- matrix(df$A[want], nrow = 239)
This works
> all.equal(mat[,1], df$A[289:527])
[1] TRUE
but do note that I have taken a block length of 239 here (289:527), not the indices the OP quotes, as those give a block size of 240 (see Update above).
If you want this as a data frame, just add
df2 <- as.data.frame(mat)
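If the block length really is 240 (rows 289:528, 625:864, and so on), the same approach can be parameterised as in the sketch below. The variable names `block`, `step` and `first` are introduced here, and the data are random stand-ins for the real column.

```r
set.seed(42)
df <- data.frame(A = runif(8208))   # stand-in for the real single-column data

block <- 240   # rows per block (289:528 has 240 rows)
step  <- 336   # distance between block starts (625 - 289)
first <- 289   # first row of the first block

# Start rows of every block that fits completely inside the data.
starts <- seq(first, nrow(df) - block + 1, by = step)

# One column of row indices per block, then fill a matrix column by column.
want <- sapply(starts, function(s) s + seq_len(block) - 1)
mat  <- matrix(df$A[want], nrow = block)
df2  <- as.data.frame(mat)

dim(df2)
```

With these settings the result has 23 columns, which matches the 23 columns mentioned in the question.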

Try this:
1) Create a list of indices
Indices <- lapply(seq(289, 8208 - 239, by = 336), function(X) X:(X + 239))
2) Select Data
Columns <- lapply(Indices, function(X) OldDF[X,])
3) Combine selected data in columns
NewDF <- do.call(cbind, Columns)

Why not just:
as.data.frame(matrix(orig, nrow=528 )[289:528 ,])
Since 8208 is not an exact multiple of the row count, we need to determine the number of columns:
> 8208/528
[1] 15.54545 # so either 15 or 16
> 8208-15*528
[1] 288 # all in the to-be-discarded section
as.data.frame(matrix(orig, nrow=528, ncol=15 )[289:528 ,])
Or:
as.data.frame(matrix(orig, nrow=528, ncol=8208 %/% 528)[289:528 ,])
