Paste name of column to other columns in R? - r

I recently received the output of an online survey (ESRI Survey123), which stores each recorded attribute as a new column of the table. The survey reports characteristics of single trees located on the study site: e.g. beech1, beech2, etc. For each beech, several attributes are recorded, such as height, shape, etc.
This is what the output table looks like in Excel. ID simply represents the site number:
Now I wonder how I can read those data into R so that columns 1:3 belong to beech1, columns 4:6 belong to beech2, etc. I am looking for something that would paste beech1 into the names of the columns that follow it: beech1.height, beech1.shape. But I am not sure how to do it.
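A minimal sketch of the renaming step, assuming the header looks like the hypothetical vector `nms` below (an ID column, then each tree column followed by its attribute columns). The real layout is not shown here, so the pattern `^beech[0-9]+$` and all names are illustrative only:

```r
# Hypothetical column names as they might arrive from the survey export.
# In practice these could come from names(read.csv(..., check.names = FALSE)).
nms <- c("ID", "beech1", "height", "shape", "beech2", "height", "shape")

# Flag the columns that start a new tree
is_tree <- grepl("^beech[0-9]+$", nms)

# Running count of tree columns seen so far: 0 before the first tree,
# 1 for beech1 and its attributes, 2 for beech2 and its attributes, ...
grp_idx <- cumsum(is_tree)

# Look up the tree label for every column (NA for columns before any tree)
tree <- c(NA, nms[is_tree])[grp_idx + 1]

# Prefix attribute columns with their tree label; leave ID and tree columns alone
new_nms <- ifelse(is_tree | is.na(tree), nms, paste(tree, nms, sep = "."))
print(new_nms)
```

The result can then be assigned with `names(df) <- new_nms`, where `df` stands for the imported data frame.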

Related

Adding new column with data value in R

I want to add a column (say ForestAreaPerPopn) holding the ratio of forest area to the resident population (represented by the variable Total below). The data contains the following variables and their values.
How can I add a column named ForestAreaPerPopn to the table ForestAreaPerPop (shown below) so that the column contains the ratio of forest area to Total?
Too long for a comment.
You have a couple of problems. First, your column names have spaces and other special characters. This is allowed but creates all kinds of problems later. I suggest you do something like:
colnames(ForestAreaPerPop) <- gsub(' |\\(|\\)', '_', colnames(ForestAreaPerPop))
This replaces every space and every left or right parenthesis in the column names with '_'.
Then, something like:
ForestAreaPerPop$n <- with(ForestAreaPerPop, Forest_Area_in_ha/Total)
should give you what you want.
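As a runnable illustration of the two steps, here is a small sketch on made-up data. The column name "Forest Area in ha", the values, and the new column name ForestAreaPerPopn are assumptions for the example, since the real table is not shown:

```r
# Toy table; check.names = FALSE keeps the space-containing column name as-is
ForestAreaPerPop <- data.frame(
  "Forest Area in ha" = c(1200, 450, 980),
  "Total" = c(30000, 9000, 49000),
  check.names = FALSE
)

# Replace spaces and parentheses in the column names with underscores
colnames(ForestAreaPerPop) <- gsub(" |\\(|\\)", "_", colnames(ForestAreaPerPop))

# Add the ratio of forest area to population as a new column
ForestAreaPerPop$ForestAreaPerPopn <- with(ForestAreaPerPop, Forest_Area_in_ha / Total)
print(ForestAreaPerPop)
```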
Some advice: long table names and column names may seem like a good idea, but you will live to regret them. Make them short but meaningful (easier said than done).

Read AGS type file in R

I am trying to read a special type of file (the format is called AGS), which looks like the image below:
This is basically a text file containing many tables of different dimensions, separated by 2 (but sometimes more) empty rows. As you might guess, the problem is that these tables have different numbers of columns and, obviously, different column names.
The first row in each table (here tables are denoted as GROUP) shows the name of the table, e.g. LOCA, HDPH, etc. The second row shows the column names. The third row shows the units of each column. All the other rows show the observations. In each row, columns are separated by commas and values are inside double quotes.
I am struggling to read this type of file. The ideal output would be to have each of these tables in a separate data frame. Any help and ideas are much appreciated.
An example file can be downloaded here: example AGS file
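One possible approach, sketched on toy text shaped like the description above (name row, heading row, unit row, then data rows, with blocks separated by empty lines). The function name `read_ags_tables` and the sample values are hypothetical, and real AGS files may carry additional header rows that would also need to be skipped:

```r
# Toy input; in practice: ags_lines <- readLines("path/to/file.ags")
ags_lines <- c(
  '"GROUP","LOCA"',
  '"HEADING","LOCA_ID","LOCA_GL"',
  '"UNIT","","m"',
  '"DATA","BH01","12.5"',
  '"DATA","BH02","14.1"',
  '',
  '',
  '"GROUP","HDPH"',
  '"HEADING","LOCA_ID","HDPH_TOP","HDPH_BASE"',
  '"UNIT","","m","m"',
  '"DATA","BH01","0.0","1.2"'
)

read_ags_tables <- function(lines) {
  lines <- trimws(lines)
  blank <- lines == ""
  # cumsum(blank) only changes at empty lines, so each run of
  # non-empty lines between gaps shares one block id
  blocks <- split(lines[!blank], cumsum(blank)[!blank])

  # Split one quoted, comma-separated line into its fields
  split_fields <- function(x) {
    scan(text = x, what = "character", sep = ",", quiet = TRUE)
  }

  tables <- lapply(blocks, function(b) {
    heading <- split_fields(b[2])  # row 2: column names
    # Rows 1-3 are name, headings and units; the rest are observations.
    # Everything is kept as character; convert columns afterwards as needed.
    read.csv(text = paste(b[-(1:3)], collapse = "\n"), header = FALSE,
             col.names = heading, colClasses = "character")
  })

  # Row 1: the table name is taken to be the last field of the first row
  names(tables) <- unname(vapply(blocks, function(b) tail(split_fields(b[1]), 1),
                                 character(1)))
  tables
}

tables <- read_ags_tables(ags_lines)
print(names(tables))
print(tables$LOCA)
```

The sketch assumes every block has at least one data row; an empty block would need a guard before the `read.csv` call.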

Sorting a column of values based on index location

I am currently working with a large amount of data. For testing purposes I am using a smaller batch, but the main point of concern is the sorting of all the data based off of values in one particular column. I have posted a picture below that shows a small portion of my unsorted data. I want to sort the values in row 2 in ascending order along with all other data in those corresponding columns. In other words I don't want to just order row 2, I want to order row 2 and shift all other data with those re-ordered values.
Currently what I do is read in that csv to a data frame (tmpDF).
After that I transpose the data using tmpDF <- t(tmpDF)
Now I take that data and sort the second column into ascending order (or at least that is what I think I am doing): tmpDF<- tmpDF[order(tmpDF[,1]),]
Then I transpose the data again to get it back to its original layout, but sorted. The result is shown in the picture below, "Ordered data result". Keep in mind that the numbers in the unsorted and sorted pictures differ because I am not posting my entire data set.
I have a few questions about this.
1) Am I going about this the correct way? I am not a very experienced programmer, just trying to teach myself R to help out my research efforts.
2) Why are values such as "102" represented as "1.01E+02" in my final sorted csv file? I don't believe I am changing the type, and in the original file they were represented as "102".
3) Why does the value 116 get ordered before "1.01E+02"?
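For comparison, a minimal sketch of reordering whole columns by the values in row 2 without transposing. The toy data frame below is made up; converting the row with `as.numeric` makes sure the ordering is numeric rather than text-based:

```r
# Toy data: each column is one record, row 2 holds the sort key
tmpDF <- data.frame(a = c(1, 116, 7), b = c(2, 102, 8), c = c(3, 109, 9))

# Column order that puts the row-2 values in ascending order
ord <- order(as.numeric(unlist(tmpDF[2, ])))

# Reorder the columns; all rows move together with their column
sortedDF <- tmpDF[, ord]
print(sortedDF)
```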

How to import reactive datasets in Rshiny?

I'm creating a risk dashboard, and the problem is that I need the data set to be reactive. I have a simple data set composed of countries (8), sectors and values. I want my app to be able to deal with different data sets: for example, if we change the column names (country becomes pays) and change the position of the column, the app should still recognize the column as country. (In reality the data set is composed of an undefined number of variables with unknown names.)
For the country column, for example, I thought of creating a list that contains all country names; when the first row of a column matches a country from that list, the column is renamed country.
That solves the problem for one variable, but what about the other ones?
I think that's unnecessary complexity.
I suggest you build a script that first cleans your data according to those specifications, and then use its output as the source.
You can use pattern recognition to match columns, but be sure there aren't similar columns; for example, if you have two numerical variables, matching becomes a real problem.
Via Shiny I suggest you:
fileInput to import your database
Visualize your database using DT
Create as many textInput boxes as columns you have
Manually change colnames using dplyr::rename and the boxes
Use the transformed database in your dashboard
Other options can be built using base::grep and dplyr.
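As an illustration of the matching idea for the country column, here is a base R sketch. The reference list `known_countries`, the toy data frame `risk` and its column names are all hypothetical; it compares every value of each column against the list rather than only the first row:

```r
# Hypothetical reference list of country names
known_countries <- c("France", "Germany", "Spain", "Italy")

# Toy data set with renamed columns
risk <- data.frame(
  pays = c("France", "Spain"),
  secteur = c("Energy", "Banking"),
  valeur = c(0.4, 0.7)
)

# Share of values in each column that appear in the reference list
share <- vapply(risk, function(col) mean(as.character(col) %in% known_countries),
                numeric(1))

# Treat the best-matching column as the country column and rename it
country_col <- names(share)[which.max(share)]
names(risk)[names(risk) == country_col] <- "country"
print(names(risk))
```

This only covers columns that can be recognized by their values; columns without a reference list still need manual renaming or another rule.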

Expression analysis

I use the tm package to analyse 5 documents. Initially the data is in CSV format and contains a few columns. I searched for the most common words in the first column, which holds the titles of some books. (I created a separate txt document for each column I needed.)
Now I want to analyse the column that contains the first and last names of the authors (e.g. John, Smith). I want to determine the number (frequency) of books for each author.
Please tell me how I can analyse both words together, not separately as in the first case.
Convert the variable name_author into a factor; then you just have to count the frequency of each level (author).
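A minimal sketch of that suggestion, assuming a data frame `books` (hypothetical, with made-up rows) whose column name_author holds the full author name as one string:

```r
# Toy data: one row per book
books <- data.frame(
  name_author = c("John, Smith", "Jane, Doe", "John, Smith"),
  stringsAsFactors = FALSE
)

# Treat each full name as one level and count the books per author
author_counts <- table(factor(books$name_author))

# Most frequent authors first
print(sort(author_counts, decreasing = TRUE))
```

Because the whole name is a single factor level, the first and last names are counted together rather than as separate words.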
