Expression analysis - r

I am using the tm package to analyse 5 documents. The data is originally in CSV format and contains a few columns. I searched for the most common words in the first column, which holds the titles of some books (I created a separate txt document for each column I needed).
Now I want to analyse the column that contains the first and last names of the authors (e.g. John, Smith). I want to determine the number (frequency) of books for each author.
How can I analyse both words together rather than separately, as in the first case?

Convert the variable name_author to a factor; then you just have to count the frequency of each level (author).
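A minimal sketch of that idea, assuming the CSV has been read into a data frame with one row per book. The data frame `books` and its contents are made up for illustration; only the column name name_author comes from the answer.

```r
# Hypothetical data: one row per book, author stored as "First, Last"
books <- data.frame(
  title = c("Book A", "Book B", "Book C", "Book D"),
  name_author = c("John, Smith", "Jane, Doe", "John, Smith", "John, Smith"),
  stringsAsFactors = FALSE
)

# Treat the full name as a single category, not as separate words
books$name_author <- factor(books$name_author)

# Count books per author and sort from most to least frequent
author_counts <- sort(table(books$name_author), decreasing = TRUE)
print(author_counts)
```

Because the whole name is one factor level, no tokenisation happens, so tm is not needed for this step.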

Related

Paste name of column to other columns in R?

I have recently received an output from an online survey (ESRI Survey123) that stores each recorded attribute as a new column of the table. The survey reports characteristics of single trees located on a study site: e.g. beech1, beech2, etc. For each beech, several attributes are recorded, such as height, shape, etc.
This is what the output table looks like in Excel. ID simply represents the site number:
Now I wonder how I can read those data into R so that columns 1:3 belong to beech1, columns 4:6 represent beech2, etc. I am looking for something that would paste beech1 into the names of the following columns: beech1.height, beech1.shape. But I am not sure how to do it.
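One possible approach, sketched under assumptions because the Excel screenshot is not available: the table is assumed to have an ID column followed by the same three attribute columns for each tree, in a fixed order. The data frame `raw`, the attribute name vitality, and the values are hypothetical.

```r
# Hypothetical layout: an ID column followed by three attribute columns per tree
raw <- data.frame(
  ID = 1:2,
  height = c(20, 25), shape = c("a", "b"), vitality = c(1, 2),
  height.1 = c(18, 22), shape.1 = c("b", "b"), vitality.1 = c(2, 3),
  stringsAsFactors = FALSE
)

attrs <- c("height", "shape", "vitality")    # assumed attribute names
n_trees <- (ncol(raw) - 1) / length(attrs)   # number of trees in the table
trees <- paste0("beech", seq_len(n_trees))   # beech1, beech2, ...

# Build beech1.height, beech1.shape, ..., keeping ID as the first name
names(raw) <- c("ID", paste(rep(trees, each = length(attrs)), attrs, sep = "."))
print(names(raw))
```

If the real file keeps the tree names in a separate header row, that row would have to be read first and used in place of the generated `trees` vector.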

Merging multiple excel sheets based on different column values in R

I'm a bit new to R, so apologies up front if it's not explained as clearly as it should be. I have 6 Excel sheets within a single workbook (Trees_2020, Trees_2017, Trees_2014, Trees_2011, Trees_2008, Trees_2003). These contain plot IDs (ID_Plot), within-plot tree ID numbers (ID_tree) and growth data (DBH_mm). The problem is that the tree IDs do not remain the same through the years but are linked based on their old ID (the Field_Mapping software recognises trees based on location but assigns a new number, which is linked to the Old_ID).
What I'm trying to do is merge all the sheets linking the years together based on the plot ID and then the Old_ID to current ID.
2020 Data Example
2017 Data Example
You can see in the 2020 sheet a column linking to the Old_ID number of 2017 and this is true of all sheets. Trees that are recorded for the first time do not have an Old_ID number in that first recording.
The ideal output would be a single sheet where a unique identifier is added for each tree, the DBH of each tree for each year linked together based on the plot_ID and the within plot ID_tree (coupled based on Old_ID)
Ideal Output
Apologies if that's very confusing, but I struggled to explain it in a simpler way. I've been playing with tidyverse and loops but can't seem to figure it out, so any help is greatly appreciated!
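A rough sketch of the linking step for two of the years, using base R's merge(). The column names ID_Plot, ID_tree, Old_ID and DBH_mm come from the question; the values and the `tree_uid` column are invented, and reading the sheets from the workbook is left out.

```r
# Hypothetical extracts of two sheets
trees_2017 <- data.frame(
  ID_Plot = c(1, 1, 2),
  ID_tree = c(1, 2, 1),
  DBH_mm  = c(100, 150, 200)
)
trees_2020 <- data.frame(
  ID_Plot = c(1, 1, 2, 2),
  ID_tree = c(5, 6, 3, 4),
  Old_ID  = c(1, 2, 1, NA),   # NA marks a tree recorded for the first time
  DBH_mm  = c(110, 160, 215, 50)
)

# Link each 2020 tree to its 2017 record: same plot, Old_ID equals the 2017 ID_tree
linked <- merge(
  trees_2020, trees_2017,
  by.x = c("ID_Plot", "Old_ID"), by.y = c("ID_Plot", "ID_tree"),
  all = TRUE, suffixes = c("_2020", "_2017")
)

# Add a unique identifier per tree (hypothetical column name)
linked$tree_uid <- seq_len(nrow(linked))
print(linked)
```

Earlier years could presumably be chained the same way, merging each sheet onto the previous result through its own Old_ID column, but that depends on every sheet following the same layout.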

Name matching and correcting spelling error in r

I have a huge data table with millions of rows that consists of Merchandise code with its description. I want to assign a category to each group (based on the combination of code and description). The problem is that the description is spelled in different ways and I want to convert all the similar names into a single one. Here is an illustrative example:
library(data.table)
dt <- data.table(code = c(rep(1,2),rep(2,2),rep(3,2)), name = c('McDonalds','Mc Dnald','Macys','macy','Comcast','Com-cats'))
dt[,cat:='NA']
setkeyv(dt,c('code','name'))
dt[.(1,'McDonalds'),cat:='Restaurant']
dt[.(1,'Mc Dnald'),cat:='Restaurant']
dt[.(2,'Macys'),cat:='Department Store']
Of course, in the real case it is impossible to go through all the spellings that refer to the same word and fix them manually.
Is there a way to detect all the similar words and convert them to a single (correct) spelling?
Thanks in advance.
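One way this could be approached is approximate string matching with base R's adist(), which computes edit distances. The sketch below assumes a list of correct spellings is available (`canonical` is hypothetical); without such a list, the names would have to be clustered instead.

```r
# Assumed list of correct spellings
canonical <- c("McDonalds", "Macys", "Comcast")

# Spellings as they appear in the example table
observed <- c("McDonalds", "Mc Dnald", "Macys", "macy", "Comcast", "Com-cats")

# Edit distance between every observed name and every canonical name,
# ignoring case
d <- adist(observed, canonical, ignore.case = TRUE)

# Pick the closest canonical spelling for each observed name
corrected <- canonical[apply(d, 1, which.min)]
print(data.frame(observed, corrected))
```

With millions of rows it would likely be cheaper to run this on the unique names only and join the result back by name. A maximum distance threshold may also be needed so that unrelated names are not forced onto the nearest spelling.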

Import excel (csv) data into R conducting bioinformatics task

I'm a newcomer exploring bioinformatics with R. I have run into a problem: I imported my Excel data into R by converting it to CSV format and using the read.csv command. As you can see in the picture, there are 37 variables (columns), where the first column is supposed to be treated as a fixed factor. I would like to match it with another matrix, which has only 36 variables, in the downstream processing. What should I do to reduce the number of variables by fixing the first column?
Many thanks in advance.
Sure, I added the str() properties of my data here.
If I am not mistaken, what you are looking for is to use the "Gene" column as metadata, indicating which gene the values in each row correspond to. You can try deleting the word "Gene" in the Excel file: when row.names is not specified, read.csv() uses the first column as row names if "there is a header and the first row contains one fewer field than the number of columns".
You can find more information about this function using ?read.csv
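An alternative that avoids editing the Excel file is to pass row.names = 1 explicitly, which tells read.csv() to use the first column as row names. This is a sketch with made-up data; the sample names S1 to S3 and the gene names are hypothetical, and the text argument stands in for the real file path.

```r
# Hypothetical CSV content; in practice this would be a file path
csv_text <- "Gene,S1,S2,S3
geneA,1.2,3.4,5.6
geneB,7.8,9.0,1.1"

# Use the first column ("Gene") as row names instead of as a variable
expr <- read.csv(text = csv_text, row.names = 1)

# The gene names are now row labels, so only the numeric columns remain
expr_matrix <- as.matrix(expr)
print(dim(expr_matrix))
```

Applied to the data in the question, this should leave 36 variables, with the gene names kept as row names.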

Excel and R do not see two values as being equal

I loaded data into two Excel sheets from online tables. Both tables include distinct information about the same group of baseball players, who are named in column B (or column 2 when converted to R) of each table. Neither Excel (VLOOKUP/MATCH) nor R will match up the players' names between the two tables, despite those names looking exactly the same in every way.
Yes, I have checked for extra spaces, capitalization, etc. I have attempted reformatting the cells in Excel that include the players' names. Please see input and output below from R (data was loaded as csv file):
> as.character(freeagentvalue$Name)[3064]
[1] "Travis Hafner"
> as.character(freeagentdata$Name)[294]
[1] "Travis Hafner"
> as.character(freeagentdata$Name)[294] == as.character(freeagentvalue$Name)[3064]
[1] FALSE
I would appreciate any information on why Excel and R are finding differences like the one above. Otherwise I have to retype a lot of names. Thank you in advance.
The two Travis Hafner strings in your example above differ in that the first has an NBSP (non-breaking space) between the two names, while the second has a normal space.
I suggest preprocessing the tables by replacing all NBSPs with a normal space. You can do that either on the worksheet, using the SUBSTITUTE function, or in VBA, using Replace.
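The same cleanup could also be done on the R side after loading the CSV. A small sketch, assuming the offending character is U+00A0 as described above; the variable names are illustrative.

```r
# Two strings that print identically but differ in the separator
with_nbsp  <- "Travis\u00a0Hafner"  # non-breaking space (U+00A0)
with_space <- "Travis Hafner"       # ordinary space (U+0020)

# Inspect the code point of the seventh character to see the difference
nbsp_code <- utf8ToInt(substr(with_nbsp, 7, 7))
print(nbsp_code)

# Replace every non-breaking space with a normal space
cleaned <- gsub("\u00a0", " ", with_nbsp, fixed = TRUE)
print(cleaned == with_space)
```

Running the gsub() call over the whole Name column of each table before matching should make the names comparable.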
