Adding header's name for dataset in kbl - r

When I use kbl in R Markdown, the header names are sometimes lost. In the picture, the column headers are all "x" instead of the original names (df3: "No.", "variable", "p-value"; df.gini.sort: "variable", "gini score"). Could anyone help me figure out how to fix that? Thanks!

We can use cbind instead of c, because c concatenates the data.frames into a named list of vectors (a data.frame is a list of vectors/columns of equal length with additional attributes). Here, we assume both datasets have the same number of rows:
library(kableExtra)
kbl(cbind(df3, df.gini.sort))
If we are using c, then wrap the result with data.frame afterwards:
kbl(data.frame(c(df3, df.gini.sort)))
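To see why the headers disappear, here is a minimal sketch using made-up stand-ins for df3 and df.gini.sort (the values are invented for illustration; only the column names come from the question). It only inspects the combined objects, so kableExtra is not needed to run it.

```r
# Toy stand-ins for the question's data frames (values are hypothetical)
df3 <- data.frame(`No.` = 1:3,
                  variable = c("a", "b", "c"),
                  `p-value` = c(0.01, 0.20, 0.50),
                  check.names = FALSE)
df.gini.sort <- data.frame(variable = c("b", "a", "c"),
                           `gini score` = c(3.2, 2.1, 0.4),
                           check.names = FALSE)

# c() drops the data.frame class and returns a plain list of columns
combined_list <- c(df3, df.gini.sort)
class(combined_list)

# cbind() returns a data.frame and keeps the original column names
combined_df <- cbind(df3, df.gini.sort)
names(combined_df)
```

One thing to keep in mind: data.frame(c(df3, df.gini.sort)) uses the default check.names = TRUE, so non-syntactic names such as "p-value" get rewritten (to p.value), whereas cbind leaves them as they are.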


Combine lapply and gsub to replace a list of values for another list of values

I am currently looking for a way to simplify searching a column of a data frame for a vector of values and replacing each of those values with another value (contained in a separate vector). I can run a for loop for this, but it must be possible with the apply family; I'm just not seeing it yet. I'm very new to the apply family and could use some help.
So far, I've been able to have it replace all instances of the first value in my vector with the new first value in the new vector, it just isn't iterating past the first level. I hope this makes sense. Here is the code I have:
#standardize tank location
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")
sapply(df_growth$Tank, function(y) gsub(old_tank_list, new_tank_list, y))
Tank is the name of the column I am trying to replace all of these values within. I haven't assigned it back yet, because I want to test the functionality first. Thanks for any help you can offer.
Hopefully, this image will help. The photo on the left is the column before my function is applied. The column on the right is after. Basically, I just want to batch change text values.
[Image: before and after]
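library(dplyr)
# Build a named lookup (names = old values, values = new values) and splice it into recode()
df_growth %>%
  mutate(Tank = recode(Tank, !!!setNames(new_tank_list, old_tank_list)))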
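For context, the original attempt only changes the first value because gsub uses just the first element when it is given a vector of patterns. If dplyr is not an option, a base R sketch with match gives the same kind of exact, whole-value replacement. The tank vector below is made up for illustration and stands in for df_growth$Tank.

```r
old_tank_list <- c("7.C.4", "7.C.5", "7.C.6", "7.C.7", "7.C.8", "7.C.9", "7.C.10", "7.C.11")
new_tank_list <- c("7.B.3-4", "7.C.3-4", "7.C.1-2", "7.C.5-6", "7.C.7-8", "7.C.9-10", "7.E.9-10", "7.C.11-12")

# Hypothetical column values; "9.A.1" has no mapping and should stay unchanged
tank <- c("7.C.4", "7.C.10", "9.A.1", "7.C.11")

# Position of each value in the old list (NA when there is no match)
idx <- match(tank, old_tank_list)

# Take the new value where a match exists, otherwise keep the original
tank_std <- ifelse(is.na(idx), tank, new_tank_list[idx])
tank_std
```

Because match compares whole strings, "7.C.10" is never confused with a shorter pattern, which can happen with regular-expression based replacement.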

Role of square brackets

I got this code from elsewhere and I am wondering if someone can explain what the square brackets are doing.
matrix1[i,] <- df[[1]][]
I am using this to assign values to a matrix and it works but I am not sure what exactly it's doing. What does the initial set of [[]] mean followed by another []?
This might help you understand a bit. You can copy and paste this code and see the differences between the ways of indexing with [], [[]] and $. The only thing I can't fully answer is the second, empty set of square brackets. From my understanding, it does nothing unless a value is placed inside those brackets.
#Retrieves the first column as a data frame
mtcars[1]
#Retrieves the first column values only (three different methods of doing the same thing)
mtcars[,1]
mtcars[[1]]
mtcars$mpg
#Retrieves the first row as a data frame
mtcars[1,]
#I can use a second set of brackets to get the 4th value within the first column
mtcars[[1]][4]
mtcars$mpg[4]
The general function of [ is subsetting, which is well documented both in the help pages (as suggested in the comments) and in this piece. The rest of my answer is heavily based on that source.
In fact, there are three operators for subsetting in R: [[, [, and $.
The [ and $ operators are useful for subsetting by index and by name, respectively. For example, the first three elements of the vector a = 1:10 can be extracted with a[c(1,2,3)]. You can also subset negatively to remove elements: a[-1] drops the first element.
The $ operator is different in that it only takes element names as input, e.g. if your df was a data frame with a column named values, df$values would extract that column. You can achieve the same with [[ and a quoted name, as in df[["values"]]; df["values"] also selects the column but returns it as a one-column data frame.
To answer more specifically, what does df[[1]][] do?
First, the [[-operator will return the 1st element from df, and the following empty [-operator will pull everything from that output.
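A small sketch of that last point, using a toy data frame (the names df, matrix1 and i follow the question; the values are invented): the empty [] returns every element, so df[[1]][] gives the same vector as df[[1]].

```r
df <- data.frame(a = 1:3, b = 4:6)

first_col <- df[[1]]   # [[ extracts the first column as a plain vector
same <- df[[1]][]      # the empty [ ] selects every element of that vector
identical(first_col, same)

# Assigning that vector into row i of a matrix, as in the question
matrix1 <- matrix(0, nrow = 2, ncol = 3)
i <- 1
matrix1[i, ] <- df[[1]][]
matrix1
```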

How can I make a list of data frames which have the same values in the first column?

Say I have multiple data frames, and I want to make multiple lists of the data frames that share the same first row. For example, dfs 1-4 have "abc" in all columns of the first row, dfs 5-7 have "def" in all columns of the first row, etc. How can I write a script which puts (in this case) dfs 1-4 in a list called "abc" and dfs 5-7 in a list called "def"?
This is my first question, so please let me know if there is anything else I could provide. I researched for a few days with no luck :(
Thanks!
Jack
So this is a guide to the solution, as you asked.
First, make sure your data frames are in a list called l (all(sapply(l, is.data.frame)) should be TRUE).
Then, for each element (df) of this list, get the character string in the first row (in any column, for example the first one). This gives you a character vector, which you can build with either sapply or purrr::map_chr.
After that comes the split you want. Use split with the vector of indices as the first argument (see ?seq_along) and the character vector you have just computed as the second argument.
Finally, use lapply to transform this list of indices into a list of data frames (you need to know the [ accessor for a list).
If you need more guidance, don't hesitate to ask.
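The steps above can be sketched as follows. This is only an illustration under assumptions: the three data frames are invented, and the label is taken from row 1 of the first column.

```r
# Hypothetical list of data frames; the first row holds the grouping label
l <- list(
  data.frame(a = c("abc", "1"), b = c("abc", "2")),
  data.frame(a = c("abc", "3"), b = c("abc", "4")),
  data.frame(a = c("def", "5"), b = c("def", "6"))
)
stopifnot(all(sapply(l, is.data.frame)))

# Step 1: one label per data frame, taken from row 1, column 1
keys <- sapply(l, function(df) as.character(df[1, 1]))

# Step 2: split the list positions by label
idx <- split(seq_along(l), keys)

# Step 3: turn each group of positions back into a list of data frames
grouped <- lapply(idx, function(i) l[i])

names(grouped)
lengths(grouped)
```

The result is a single named list (grouped$abc, grouped$def, ...) rather than separate top-level objects, which tends to be easier to loop over afterwards.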

R data frame issue - non-numeric headers

This is definitely a rookie question but I'm not finding an answer for this (maybe because of my wording) so here goes:
I'm reading a CSV file into a data frame in RStudio. It has 24 columns with headers, and the columns contain only numbers (they're essentially concentrations of several chemicals). The data frame is called all. I need to use the columns as numeric vectors. When I read them in and type
is.numeric(all[,1])
I get
TRUE
When I type
is.numeric(all[1])
I get
FALSE
I think this is because R interprets the header as a factor. I also tried reading in a table without headers and with header=FALSE, but R names the columns V1, V2, etc., so the result ends up being the same.
I need to work with functions where I invoke something like all[2:24]. How can I make R either "not see" the header or remove it altogether?
Thanks for the answers!
PS: the dataframe I am using (without headers - if it had headers, it would just have names instead of V1, V2, etc) is something like this:
This selects the first column, not the first row:
all[, 1] # subset first column
The following selects the first row:
all[1, ] # subset first row (headers of df not included)
To set the column names (one name per column):
colnames(all) <- paste0("col", seq_along(all))
Your assumption is wrong. You have a data.frame and all[1] does list subsetting, which results in a data.frame, which is not a vector, and not a numeric vector in particular.
You should study help("[") and An Introduction to R.
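A short sketch of the difference, using a small made-up data frame named conc in place of all (the name and values are hypothetical): single brackets keep the data.frame wrapper, while [[ or [, j] return the numeric column itself. The header is never part of the data in either case.

```r
# Hypothetical stand-in for the 24-column data frame
conc <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.5, 2.5), V3 = c(10, 20))

is.numeric(conc[1])     # FALSE: a one-column data.frame, not a vector
is.numeric(conc[[1]])   # TRUE: the column itself
is.numeric(conc[, 1])   # TRUE: same column via matrix-style indexing

# Checking several columns at once, e.g. before passing them to a function
sapply(conc[2:3], is.numeric)

# If a function insists on a numeric object, a matrix is one option
m <- as.matrix(conc[2:3])
is.numeric(m)
```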

Extracting Data from a column based on symbols

I have a tricky question that I'm hoping someone can help me with. I have an output file that looks pretty standard in that there is one value per row, per column - except for one column (excerpt below) that contains multiple entries per row:
4:103806204-103940896,4:103806204-103940896,4:103822084-103940896,4:103806204-103940896
7:27135712-27139877,7:27135712-27139877
2:209030070-209054773
1:16091458-16113084,1:16090993-16101715,1:16085254-16113084
16:70333061-70367735,16:70323669-70367735,16:70333061-70367735,16:70333061-70367735,16:70328735-70367735,16:70328699-70367735,16:70333061-70367735
It would be easy enough to split this column by ',' but then I won't be able to read it into, say, R very easily.
Instead, I'm hoping I can use a simple bit of code to select only the first two values, and then make one column into two, removing the rest. So the above would become the below:
4 103806204
7 27135712
2 209030070
1 16091458
16 70333061
I lose a little bit of info this way, but it makes the data more manageable. Does anyone have any suggestions?
We can use str_extract_all from library(stringr). We extract the numeric elements (\\d+) into a list, convert the 'character' class to numeric, get the first two elements of each with head, and rbind the list elements.
library(stringr)
do.call(rbind, lapply(str_extract_all(df$col, '\\d+'),
                      function(x) head(as.numeric(x), 2)))
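For readers without stringr, here is a base R sketch of the same idea, with regmatches and gregexpr swapped in for str_extract_all. The vector x holds three rows copied from the question's excerpt; the column name in a real file is an assumption.

```r
x <- c("4:103806204-103940896,4:103806204-103940896",
       "7:27135712-27139877,7:27135712-27139877",
       "2:209030070-209054773")

# Pull out every run of digits in each string (one list element per row)
nums <- regmatches(x, gregexpr("[0-9]+", x))

# Keep the first two numbers of each row and stack them into a matrix
res <- do.call(rbind, lapply(nums, function(v) as.numeric(head(v, 2))))
res
```

This, like the stringr version, assumes the first field is purely numeric; a non-numeric label such as "X" would be skipped by the digit pattern.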
