Edit maximum number of characters in R dfSummary

I am wondering if it is possible to edit the "Stats/Values" column in the dfSummary command of the R package "summarytools". I need to adjust the number of characters displayed in the values (I do not mean the number of factor levels but literally the number of characters), as there is a predefined cut-off point which does not suit my survey data. I have posted a screenshot as an example.
Thanks a lot for your help!
dfSummary_screenshot_example

There is a parameter exactly for this: max.string.width. By default, its value is 25, but you can make it as large as you want :)
dfSummary(dat, max.string.width = 500)
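To see the effect, here is a minimal sketch, assuming summarytools is installed. The data frame and its labels are made up for illustration and are not the asker's survey data; the idea is simply to set max.string.width to the length of the longest label so nothing is cut off.

```r
# Hypothetical survey data with long answer labels (not from the question).
dat <- data.frame(
  answer = factor(c(
    "Strongly agree with the statement as it was presented",
    "Somewhat disagree with the statement as it was presented",
    "Strongly agree with the statement as it was presented"
  ))
)

# Length of the longest label; anything longer than the width gets truncated.
longest_label <- max(nchar(levels(dat$answer)))

# Guarded so the script still runs where summarytools is not installed.
if (requireNamespace("summarytools", quietly = TRUE)) {
  # Default width: long labels are cut off in the Stats / Values column.
  print(summarytools::dfSummary(dat))
  # Wide enough for the longest label, so values are shown in full.
  print(summarytools::dfSummary(dat, max.string.width = longest_label))
}
```

Computing the width from the data avoids guessing a large constant such as 500, although a fixed large value works just as well.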

Related

How to access the entry value with unrecognised number of decimal in data frame in R?

I have a data frame in R that I want to analyse. I want to know how many specific numbers are in a data frame column. for example, I want to know the frequency of number 0.9998558 by using
sum(deviation_multiple_regression_3cell_types_all_spots_all_intersection_genes_exclude_50_10dec_rowSums_not_0_for_moran_scaled[,3]== 0.9998558)
However, it seems that the decimal shown is not the actual one (it must be 0.9998558xxxxx) since the result I got from using the above command is 0 (the correct one should be 3468). How can I access that number without knowing the exact decimal numbers so that I get the correct answer? Please see the screenshot below.
The code below gives the number of occurrences in the column.
x <- 0.9998558
length(which(df$a==x))
If you are looking for numbers starting with 0.9998558, I think you can do it in two different ways: working with the data as numeric or as character.
Let x be your variable:
Data as character
This way counts exactly what you are looking for
sum(substr(as.character(x),1,9)=="0.9998558")
Data as numeric
This will include all the values whose absolute difference from the reference value is below 1e-7; this may include values that do not start exactly with 0.9998558
sum(abs(x-0.9998558)<1e-7)
You can also "truncate" the numbers in your vector and compare them with the number you want. Here, we write 10^7 because 7 is the number of decimals you want to compare.
sum(trunc(x*10^7)/10^7 == 0.9998558)
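The approaches above can be compared side by side on a small made-up vector (the values below are invented for illustration, not taken from the asker's data). It is only a sketch, but it shows why the exact comparison finds nothing and how the tolerance version can pick up a value that does not start with 0.9998558.

```r
# Made-up values: R prints them rounded to 7 significant digits by default,
# so the first, second and fourth all display as 0.9998558 or 0.9998559.
x <- c(0.99985581234, 0.99985589876, 0.5, 0.99985575)

# Exact comparison: no stored value equals the 7-digit literal.
n_exact <- sum(x == 0.9998558)

# Character prefix: counts values whose printed form starts with 0.9998558.
n_prefix <- sum(substr(as.character(x), 1, 9) == "0.9998558")

# Tolerance: counts values within 1e-7 of the reference, which also
# catches 0.99985575 even though it starts with 0.9998557.
n_tol <- sum(abs(x - 0.9998558) < 1e-7)

# Truncation to 7 decimals, compared as whole numbers.
n_trunc <- sum(trunc(x * 10^7) == 9998558)

print(c(exact = n_exact, prefix = n_prefix, tolerance = n_tol, truncated = n_trunc))
```

Which count is "correct" depends on whether you want values that start with those digits or values that are merely close to the reference.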

How to print the number of data entries inside a variable in R? [duplicate]

I know this is probably a very simple question but I can't seem to find the answer anywhere online. I am trying to print just the number of data points inside of a variable that I created but I can't figure out how.
I tried using summary() or num() or n() but I am really just making stuff up here and cannot seem to figure it out at all.
For my specific example I have a data set on people's heights, age, weight, gender, stuff like that. I used
one_sd_weight <- cdc$weight[abs(cdc$weight - mean(cdc$weight)) <= sd(cdc$weight)]
to determine how many of the weights fall within one standard deviation of the mean. After I do this, I can see that on the right side it created a new variable called one_sd_weight that contains 14152 out of the original 20000 entries. How do I print the number 14152 as a variable? For the work I am doing I need to create a new variable that just contains one number, 14152 or whatever number is produced when I run the code above. For example, I need to create
n_one_sd <- 14152
without typing in 14152, instead typing some function that grabs the number of entries in one_sd_weight.
I have tried things like summary() and n() but only receive error messages in return. Any help is greatly appreciated!!
n_one_sd <- length(one_sd_weight)
You're looking for length (in case of a vector) or nrow in case of a matrix/data.frame.
Or you can use NROW() for both, that should work too.
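As a quick sketch of the difference between these functions, the snippet below uses simulated weights as a stand-in for cdc$weight (the cdc data set is not included here, so the numbers will not match 14152).

```r
set.seed(1)
# Simulated stand-in for cdc$weight; mean and sd are arbitrary assumptions.
weight <- rnorm(20000, mean = 170, sd = 40)

# Same subsetting as in the question.
one_sd_weight <- weight[abs(weight - mean(weight)) <= sd(weight)]

# length() gives the number of elements of a vector.
n_one_sd <- length(one_sd_weight)

# The same count without building the subset: TRUE counts as 1 in sum().
n_direct <- sum(abs(weight - mean(weight)) <= sd(weight))

# nrow() returns NULL for a plain vector, while NROW() treats it as one column.
nrow_on_vector <- nrow(one_sd_weight)
NROW_on_vector <- NROW(one_sd_weight)

print(n_one_sd)
```

If only the count is needed, the sum() form skips the intermediate variable entirely.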

How do I export a custom list of numbers and letters to Excel from R?

To help with some regular label-making and printing I need to do, I am looking to write a script that allows me to enter a range of sequential numbers (some with string identifiers) that I can export with a specific format to Excel. For example, if I entered the range '1:16', I am looking for an output in Excel exactly as:
Example Excel Output
For each unique sequential number (i.e., 1 to 16) the first five rows must be labeled with a 'U', the next three rows with an 'F' and the last two rows must be the number alone. The final exported matrix will be n columns x 21 rows, where n will vary depending on the number range I enter.
My main problem is in writing to Excel. I can't find out how to customize this output and write to specific rows and columns as in the example above. I am limited to 'openxlsx' since I work on a corporate secure workstation. Here is what I have so far:
Example Code
Any help you may have would be very appreciated, thanks in advance!
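One possible starting point, as a rough sketch only: the example image and the asker's code are not available, so the layout below (one column per number, labels written as "1U" and "1F", ten rows per number) is an assumption. The helper make_labels and the output file name are hypothetical. It uses openxlsx's writeData with startCol and startRow to place the block at a chosen cell.

```r
# Build one column per number: 5 "U" rows, 3 "F" rows, 2 plain rows.
# The "1U" / "1F" label format is an assumption about the desired output.
make_labels <- function(ids) {
  sapply(ids, function(i) {
    c(rep(paste0(i, "U"), 5), rep(paste0(i, "F"), 3), rep(as.character(i), 2))
  })
}

labels <- make_labels(1:16)  # character matrix, 10 rows x 16 columns

# Guarded so the script still runs where openxlsx is not installed.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "labels")
  # startCol / startRow choose the top-left cell of the written block;
  # colNames = FALSE keeps the header row out of the sheet.
  openxlsx::writeData(wb, sheet = "labels", x = as.data.frame(labels),
                      startCol = 1, startRow = 1, colNames = FALSE)
  out_file <- file.path(tempdir(), "labels.xlsx")  # hypothetical output path
  openxlsx::saveWorkbook(wb, out_file, overwrite = TRUE)
}
```

To match a different layout, call writeData several times with different startCol and startRow values, one call per block.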

Replacement of certain percentage of vector in r

I need an R-code to delete certain percentage of numbers in a vector and replace the deleted numbers with another number....
e.g
consider this random number,
x=rnorm(100,1,3)
I want to delete 25% of the generated numbers and replace the deleted numbers by deleted number+29
Please, I need somebody to help me with this. Thanks.
For example :
x[seq_len(length(x)*0.25)] <- 29
This replaces the first 25% of the elements of x. You can then shuffle the result using sample(x).
Another option is to use x as a base for generation :
c(sample(x,size = length(x)*0.75) ,rep(29, length(x)*0.25))
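The question can also be read as "pick 25% of the values at random and add 29 to each of them". Under that assumption, a minimal sketch looks like this (the names idx, n_replace and y are made up for illustration):

```r
set.seed(42)
x <- rnorm(100, 1, 3)

# Number of elements to change: 25% of the vector, rounded to a whole count.
n_replace <- round(length(x) * 0.25)

# Random positions, drawn without replacement.
idx <- sample(length(x), n_replace)

# Work on a copy so the original values stay available for comparison.
y <- x
y[idx] <- x[idx] + 29
```

Sampling the positions instead of the values keeps every other element in its original place.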

CSV file to Histogram in R

I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axes) from a csv file (just one row of values). Any idea how I can do this?
I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (idea is to count the number of bookings each host generated during some given time period). I read it into a table.
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment. (tbl[2] is a one-column data frame, not a numeric vector, so hist() rejects it.)
This fixed it:
> hist(tbl$bookings)
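The difference between the two calls can be checked without the original file. The sketch below writes a small made-up CSV to a temporary file (the host names and booking counts are invented) and inspects what each way of selecting the column returns.

```r
# Write a tiny made-up CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("hostname,bookings", "a,3", "b,5", "c,3", "d,8"), csv_path)

tbl <- read.csv(csv_path)

# Single brackets keep the data frame wrapper, which hist() cannot use.
class_single <- class(tbl[2])    # "data.frame"

# Double brackets (or $) extract the column itself as a vector.
class_double <- class(tbl[[2]])  # "integer"

# Compute the histogram without drawing; plot(h) draws it.
h <- hist(tbl$bookings, plot = FALSE)
if (interactive()) plot(h, main = "Bookings per host", xlab = "bookings")
```

tbl[[2]] and tbl$bookings are equivalent here; the double-bracket form is handy when the column is known by position rather than by name.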
You should really start to read some basic R manual...
CRAN offers a lot of them (look into the Manuals and Contributed sections)
In any case:
setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv", header = FALSE) # one row of values, no header line
hist(as.numeric(unlist(myvalues)), breaks = 100) # Example: 100 breaks, but you can specify them at will
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).
To plot the histogram, the values must be of numeric class, i.e. the data passed to hist() must be numeric. Here the value of x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)
