Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
What are the criteria for using the hist() function in R?
I have two columns of data, which look something like this:
1 8764
2 604
3 150
4 50
5 21
6 7
7 2
8 5
10 3
11 2
12 1
14 1
16 1
17 2
18 3
20 1
23 1
24 1
25 1
28 1
29 1
When I put that into a data frame in R and try to plot it using the hist() function, I get the error "'x' must be numeric". How do I go about solving this?
I'm trying to get the first column on the x-axis and the second column on the y-axis.
Pardon me if the question sounds stupid, it's my first time using R.
You can do it like this if you absolutely want to use the hist function:
hist(rep(df[[1]], df[[2]]))
Here df is your data.frame (assuming, as Roland said in the comments, that the first column holds your values and the second column holds your frequency counts).
Edit: it appears that your data.frame has only one column. In that case this will work:
hist(rep(seq_along(df[[1]]), df[[1]]))
If the data contains NA values, run this first:
df <- na.omit(df)
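If your data is a data.frame called df, calling hist(df) directly fails with the "must be numeric" error, because hist() expects a numeric vector rather than a data frame. Pass a single numeric column instead, for example hist(df[[2]]). hist() plots by default, so wrapping the call in plot() is not necessary.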
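To make the two readings of the data concrete, here is a minimal sketch. It assumes the first column holds values and the second holds counts; the data frame and the column names `value` and `count` are made up for illustration and use only the first five rows of the question's table. Since the goal is the first column on the x-axis and the second on the y-axis, barplot() may be a closer fit than hist().

```r
# Hypothetical data frame mirroring the first rows of the question's table:
# `value` is the observed value, `count` is how often it occurred.
df <- data.frame(
  value = c(1, 2, 3, 4, 5),
  count = c(8764, 604, 150, 50, 21)
)

# hist() needs a numeric vector of raw observations, so expand the counts.
observations <- rep(df$value, times = df$count)

# One bin per value: breaks sit halfway between the integers.
h <- hist(observations, breaks = seq(0.5, 5.5, by = 1),
          main = "hist() on expanded data")

# barplot() draws the counts directly: first column on x, second on y.
barplot(df$count, names.arg = df$value, xlab = "value", ylab = "count")
```

With counts this skewed, the first bar dominates either plot, so a log-scaled y-axis may be worth trying.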
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 3 years ago.
I am trying to divide all integers in a column by another integer. I have a database with a column that has integers that go above 1*10^20. Because of this my plots are way too big. I need to normalize the data to get a better understanding of what is going on. For example, the data that I have:
[x][Day] [Amount]
[1] 1 1 23440100
[2] 2 2 41231020
[3] 3 3 32012010
I am using a data.frame for my own data, so here you have the data frame for the data above
x <- c(1,2,3)
day <- c(1,2,3)
Amount <- c(23440100, 41231020, 32012010)
my.data <- data.frame(x, day, Amount)
I tried using another answer, provided here, but that doesn't seem to work.
The code that I tried:
test <- my.data[, 3]/1000
Hope someone can help me out! Cheers, Chester
I guess you are looking for this?
my.data$Amount <- my.data$Amount/1000
such that
> my.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
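The difference between the attempted code and this answer is where the result ends up. The sketch below rebuilds the question's data frame and shows that the division itself works in both cases; `Amount_k` is a hypothetical column name, used here so the original column is kept for comparison.

```r
# The data frame from the question.
my.data <- data.frame(
  x = c(1, 2, 3),
  day = c(1, 2, 3),
  Amount = c(23440100, 41231020, 32012010)
)

# The attempted code: this creates a separate numeric vector.
# The data frame itself is left unchanged.
test <- my.data[, 3] / 1000

# Assigning the result to a column stores it in the data frame.
# `Amount_k` is a made-up name; overwrite `Amount` instead if preferred.
my.data$Amount_k <- my.data$Amount / 1000

print(my.data)
```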
Use mutate from dplyr
Since you're using a data.frame, you can use this simple code:
library(dplyr)
mutated.data <- my.data %>%
mutate(Amount = as.integer(Amount / 1000))
> mutated.data
x day Amount
1 1 1 23440
2 2 2 41231
3 3 3 32012
Note that as.integer() truncates the decimals; drop it if you want to keep values such as 23440.10.
Hope this helps.
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame in R that has two columns, one with last names, the other with the frequency of each last name. I would like to randomly select last names based on the frequency values (which range from 0 to 1).
So far I have tried using the sample function, but it doesn't allow for specific frequencies for each value. Not sure if this is possible :/
df1 <- data.frame(names = c("John","Mary"),freq=c(0.2,0.8))
df1
# names freq
# 1 John 0.2
# 2 Mary 0.8
set.seed(1)
sample100 <- sample(
x = df1$names,
size = 100,
replace=TRUE,
prob=df1$freq)
table(sample100)
# sample100
# John Mary
# 17 83
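As a rough check that the prob argument behaves as intended, the sketch below draws a larger sample and compares the observed proportions with the requested frequencies. The seed and sample size are arbitrary choices, and as.character() is only there so the code behaves the same whether the names column is stored as a factor or as character.

```r
df1 <- data.frame(names = c("John", "Mary"), freq = c(0.2, 0.8))

set.seed(42)  # arbitrary seed, only for reproducibility
draws <- sample(
  x = as.character(df1$names),
  size = 10000,
  replace = TRUE,
  prob = df1$freq  # weights; sample() does not require them to sum to 1
)

# Observed share of each name; should be close to 0.2 and 0.8.
observed <- prop.table(table(draws))
print(observed)
```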
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 5 years ago.
I have my data in an xls file. I try to read it like this:
> df = read.xls ("natgas.xls")
Output
df
Dec.2007 X2399154
1 Jan-2008 2733970
2 Feb-2008 2503421
3 Mar-2008 2278151
4 Apr-2008 1823867
5 May-2008 1576387
6 Jun-2008 1604249
7 Jul-2008 1708641
8 Aug-2008 1682924
9 Sep-2008 1460924
10 Oct-2008 1635827
Everything is OK, except the first line.
When I index second column
> df[,2]
[1] 2733970 2503421 2278151 1823867 1576387 1604249 1708641 1682924 1460924
the first value is missing.
How to solve this?
Looks like you need to add header = FALSE to your read.xls call (which seems to come from the gdata package):
df1 <- read.xls("natgas.xls", header = FALSE)
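The same header behaviour can be reproduced without an Excel file. The sketch below uses base R's read.csv() on an inline string as a stand-in for read.xls(); the three data rows are taken from the question's output, and the column names `month` and `volume` are made up for illustration.

```r
# Inline stand-in for the spreadsheet: no header row, data starts immediately.
raw <- "Dec-2007,2399154\nJan-2008,2733970\nFeb-2008,2503421"

# header = TRUE consumes the first data row as column names.
with_header <- read.csv(text = raw, header = TRUE)
names(with_header)  # "Dec.2007" "X2399154"
nrow(with_header)   # 2

# header = FALSE keeps every row as data; names are supplied by hand.
no_header <- read.csv(text = raw, header = FALSE,
                      col.names = c("month", "volume"))
nrow(no_header)     # 3
```

This is why the column names in the question look like mangled data (Dec.2007, X2399154) and the first value is missing from df[,2].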
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 6 years ago.
I am having trouble subsetting a large data frame. I have 5,000 observations and 60+ columns. I want to subset based on ~ 30 columns -- essentially to "drop" any observations where the sum of the values in these 30 columns of interest == 0. A small sample is below: I would want to get rid of UID #1 and #3.
UID 236.1(b) 261.5(c) 261.5(d)
1 0 0 0
2 2 3 0
3 0 0 0
4 0 0 0
I have tried the following code:
sub <- subset(df, rowSums(df[, 29:60]>0))
which generated the following error term:
Error in subset.data.frame(merge_charge, rowSums(merge_charge[, 29:60] > : 'subset' must be logical
and:
test <- subset(rowSums(df[,29:60]>0))
Which generated the following error:
Error in subset.default(rowSums(merge_charge[, 29:60] > 0)) :
argument "subset" is missing, with no default
Any suggestions or pointers would be most appreciated.
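First, take a look at the subset() function.
You use it like this:
subset(data, condition)
So the data argument is missing from your second attempt.
Second, the closing parenthesis of rowSums() is in the wrong place: the comparison > 0 ends up inside rowSums(), so the function returns counts rather than logical values. It must be rowSums(df[,1:2]) > 0
Therefore, it will be: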
test <- subset(your_data, rowSums(your_data[,29:60])>0 )
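A small sketch of the corrected filter, using the sample rows from the question. The column names `a`, `b`, and `c` are placeholders standing in for the real charge columns, and the column range 2:4 replaces 29:60.

```r
# Toy version of the question's table; a, b, c are placeholder column names.
df <- data.frame(
  UID = 1:4,
  a = c(0, 2, 0, 0),
  b = c(0, 3, 0, 0),
  c = c(0, 0, 0, 0)
)

# Logical vector: TRUE where the selected columns sum to more than zero.
keep <- rowSums(df[, 2:4]) > 0

# Either form drops the all-zero rows.
kept_subset <- subset(df, rowSums(df[, 2:4]) > 0)
kept_index  <- df[keep, ]

print(kept_subset)
```

On this sample only UID 2 survives, because UID 4 is also all zeros.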
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 6 years ago.
I have a data table with 5778 rows and 28 columns. How do I delete ALL of the 1st row? E.g. let's say the data table had 3 rows and 4 columns and looked like this:
Row number tracking_id 3D71 3D72 3D73
1 xxx 1 1 1
2 yyy 2 2 2
3 zzz 3 3 3
I want to create a data table that looks like this:
Row number tracking_id 3D71 3D72 3D73
1 yyy 2 2 2
2 zzz 3 3 3
i.e. I want to delete all of row number 1 and then shift the other rows up.
I have tried datatablename[-c(1)] but this deletes the first column not the first row!
Many thanks for any help!
You can do this via
dataframename = dataframename[-1,]
It can easily be done by indexing the data.table/data frame, as mentioned by @joni. You can also do it with
datatablename <- datatablename[2:nrow(datatablename), ]
You can find more interesting stuff about data.table here.
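For a plain data.frame, the sketch below shows the row-versus-column distinction and how to renumber the rows afterwards. The column names `a` and `b` are placeholders for the question's numeric columns.

```r
# Toy data frame; a and b are placeholder names for the numeric columns.
df <- data.frame(
  tracking_id = c("xxx", "yyy", "zzz"),
  a = 1:3,
  b = 1:3
)

# Without a comma, a data.frame is indexed like a list: this drops a COLUMN.
without_first_column <- df[-1]

# With a comma, the first index refers to rows: this drops the first ROW.
without_first <- df[-1, ]

# The remaining rows keep their old labels ("2", "3"); reset them to 1, 2.
rownames(without_first) <- NULL

print(without_first)
```

Negative indexing also copes with a one-row data frame (it returns zero rows), whereas 2:nrow(df) would count backwards in that case.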