Google Sheets- Split Cell w/ Data and Organize? - datetime

Hello Sheets Experts,
I'm looking to analyze some time data that I received in a Google Sheet. Unfortunately, the original data is in a less than friendly format.
Sample Sheet w/ Data:
https://docs.google.com/spreadsheets/d/1-eyyVs67pp4nyL7jboWmOkGnrkNuuwiPP2RVv_SuDz0/edit?usp=sharing
Ideally, I would like to take this "source data" (around 1500 separate cells)
[Image: Source Data]
and pull the time listed for the last 4 days (indicated in my headers) and organize into separate cells for further analysis (like shown below, which I did manually):
[Image: Ideal Result]
The tough part is that each cell is unique: both the number of entries and the calendar dates vary from cell to cell.
Is there a way to break down the data in the column A to achieve my desired result?
I tried splitting the text to columns, which I can do one cell at a time, but I am hoping there is a "smarter" way to do this with 1500 rows of data.
[Image: split column]
Ablebits powertools seems like it may help, but I don't have a subscription and am looking for a "free" way to do this via a formula.

Added solution to your sheet here:
=makearray(rows(A2:A),4,lambda(r,c,ifna(regexextract(index(A2:A,r),index(to_text(B1:E1),,c)&" \| (.*?)"&CHAR(10)))))

You can try with this formula:
=INDEX(VLOOKUP(B1:E1,INDEX(IFERROR((SPLIT(FLATTEN(SPLIT(A2:A13,CHAR(10)))," | ")))),2,0)&" min.")
It splits each cell by the line separator (CHAR(10)), stacks the pieces into a single column with FLATTEN, splits each piece again on " | ", and then does a VLOOKUP with the headers.
To apply it to the full range, you can use:
=BYROW(A2:A,LAMBDA(a,IF(a="",{"","","",""},INDEX(IFNA(VLOOKUP(B1:E1,INDEX(IFERROR((SPLIT(FLATTEN(SPLIT(a,CHAR(10)))," | ")))),2,0)&" min.")))))
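For readers who want to check the logic outside Sheets, here is a minimal Python sketch of the same split-and-lookup idea. It assumes each cell holds one "date | value" entry per line, which is inferred from the formulas above rather than confirmed from the sheet; the function name extract_for_headers and the sample dates are made up for illustration.

```python
def extract_for_headers(cell, headers):
    """Return the value recorded for each header date, or "" if absent.

    Assumes `cell` holds one "date | value" entry per line (an assumption
    based on the formulas above, not on the actual sheet).
    """
    lookup = {}
    for line in cell.split("\n"):  # same role as SPLIT(..., CHAR(10))
        date, sep, value = line.partition(" | ")
        if sep:  # skip lines that do not contain the separator
            # keep the first match for a date, as VLOOKUP would
            lookup.setdefault(date.strip(), value.strip())
    return [lookup.get(h, "") for h in headers]


# Hypothetical sample cell and header row
cell = "1/1/2024 | 30\n1/2/2024 | 45\n1/4/2024 | 20"
headers = ["1/1/2024", "1/2/2024", "1/3/2024", "1/4/2024"]
print(extract_for_headers(cell, headers))
```

Building a dictionary per cell mirrors what the BYROW version does: each row gets its own small lookup table, and headers with no matching date come back empty.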

Related

Grouping and transposing data in R

It is hard to explain this without just showing what I have, where I am, and what I need in terms of data structure:
What structure I had:
Where I have got to with my transformation efforts:
What I need to end up with:
Notes:
I've not given actual names for anything as the data is classed as sensitive, but:
Metrics are things that can be measured, for example the number of permanent or full-time jobs. The number of metrics is larger than presented in the test data (and the example structure above).
Each metric has many years of data (while writing the code I have restricted myself to just 3 years, and the illustration of the structure is based on this test). The number of years captured will change over time; generally it will increase.
The number of policies will fluctuate, I've just labelled them policy 1, 2 etc for sensitivity reasons and limited the number whilst testing the code. Again, I have limited the number to make it easier to check the outputs.
The source data comes from a workbook of surveys with a tab for each policy. The initial import creates a list of tibbles consisting of a row for each metric, and 4 columns (the metric names, the values for 2024, the values for 2030, and the values for 2035). I converted this to a dataframe, created a vector to be a column header and used cbind() to put this on top to get the "What structure I had" data.
To get to the "Where I have got to with my transformation efforts" version of the table, I removed all the metric columns, created another vector of metrics and used rbind() to put this as the first column.
The idea in my head was to group the data by policy to get a vector for each metric, then transpose this so that the metric became the column and the grouped data became the row, then expand the data to get the metrics repeated for each year. A friend of mine who does coding (but has never used R) has suggested that using loops might be a better way forward. Again, I am not sure of the best approach, so I welcome advice. On Reddit someone suggested using pivot_wider/pivot_longer, but this appears to be a summarise tool, and I am not trying to summarise the data but rather to transform its structure.
Any suggestions on approaches or possible tools/functions to use would be gratefully received. I am learning R whilst trying to pull this data together to create a database that can be used for analysis, so, if my approach sounds weird, feel free to suggest alternatives. Thanks

Paste name of column to other columns in R?

I have recently received an output from an online survey (ESRI Survey123), which stores each recorded attribute as a new column of the table. The survey reports characteristics of single trees located on a study site, e.g. beech1, beech2, etc. For each beech, several attributes are recorded, such as height, shape, etc.
This is what the output table looks like in Excel. ID simply represents the site number:
Now I wonder, how can I read those data into R to make sure that columns 1:3 belong to beech1, columns 4:6 represent beech2, etc.? I am looking for something that would paste the beech1 into names of the following columns: beech1.height, beech1.shape. But I am not sure how to do it?

Creating a species Accumulation Curve with different format

I am currently working on a dataframe which looks like this:
data.frame(Plot_ID=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4),
Species=c("a","a","a","b","b","c","c","b","b","b","b","d","d","d","e","e","e","e","a","a","a"),
DBH=c(12,32,44,11,14,66,43,22,88,22,23,45,354,6,7,45,12,11,5,6,8))
DBH is just the diameter of the species. What I want to create is a species accumulation curve; however, the specaccum function (vegan package) only accepts a different format, which looks like this:
data.frame (Spec1=c(1,0,2,3),Spec2=c(0,0,0,4),Spec3=c(1,1,2,3))
My data has over 3000 rows, with more than a hundred species, which makes it very difficult to reformat the data accordingly. Is there a way to easily reformat the data, or to use the data as it is with a different package?
Ok after a while I remembered the pivot-table of LibreOffice, where you can exactly format the data to have the species in columns, each plot in a row and the sum in between.
For that, create a 3rd column which includes only the number 1, your data should look like this:
d1<-data.frame(Plot_ID=c(1,1,2,2,3,3),
Species=c("a","b","a","c","d","c"),
Count=c(1,1,1,1,1,1))
Export the data frame as .csv file using
write.table(d1, "~/path/of/desire/d1.csv")
Import the .csv table in LibreOffice using space as the separator. Delete the first column, as it holds R's internal row IDs and shifts the headings.
Mark your data and go to Data->Pivot-table, select marked data and click OK.
You will see something like this:
Loai.cay is the Species here. Drag Species to the column fields, Plot_ID to the row fields and Count to the data fields, as shown in the picture. Press OK and copy the result into an extra table. Press CTRL+SHIFT+S and select save as .csv file. Import the .csv file in R and use the specaccum function as described in its documentation.
Hope this helps someone other than me.
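The same plot-by-species count table can also be built in code rather than through a spreadsheet pivot. The sketch below is in Python and only illustrates the reshaping step; it uses the example Plot_ID and Species values from the question, and the helper name community_matrix is hypothetical. In R itself, table(df$Plot_ID, df$Species) produces a comparable cross-tabulation.

```python
from collections import Counter


def community_matrix(plot_ids, species):
    """Count individuals per (plot, species) and return one row per plot."""
    counts = Counter(zip(plot_ids, species))
    plots = sorted(set(plot_ids))
    names = sorted(set(species))
    # A missing (plot, species) pair yields 0 because Counter defaults to 0
    rows = [[counts[(p, s)] for s in names] for p in plots]
    return plots, names, rows


# Example values copied from the question
plot_ids = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4]
species = ["a", "a", "a", "b", "b", "c", "c", "b", "b", "b", "b",
           "d", "d", "d", "e", "e", "e", "e", "a", "a", "a"]

plots, names, rows = community_matrix(plot_ids, species)
print("Plot", *names)
for p, row in zip(plots, rows):
    print(p, *row)
```

Each row is one plot and each column one species, which is the site-by-species layout that the question says specaccum expects.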

Adding (mathematically) columns of a CSV based on information in another column with PowerShell

I was having a really hard time describing what I need in the Title, so I apologize ahead of time if that makes absolutely no sense.
If I have a CSV that has 2 columns, one with a person's name and a second with a numeric value, I need to find the duplicates in the name column and then add the numeric values for that person together to get a total in a new CSV.
This is a very simplified version of the real CSV
Name,Number
Dog,1
Cat,2
Fish,1
Dog,3
Dog,2
Cat,2
Fish,1
Given the information above, what I would like to be able to produce is this:
Name,Number
Dog,6
Cat,4
Fish,2
I really don't have any idea how to get there or if it's possible with PowerShell. I can only get as far as using group-object to group by name, but I have no clue how to add the columns after that.
The biggest problem I'm coming across with my research on this is that most if not all the results I get when googling involve adding new columns to a csv and not performing the mathematical calculation.
I finally got it
$csvfile = import-csv c:\csvfile.csv
$csvfile | group name | select name,@{Name="Totals";Expression={($_.group | Measure-Object -sum number).sum}}
Credit goes to:
http://www.hanselman.com/blog/ParsingCSVsAndPoorMansWebLogAnalysisWithPowerShell.aspx
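For comparison, the same group-and-sum logic can be sketched in Python using only the standard library. This is an illustration of the technique, not a replacement for the PowerShell one-liner; the function name sum_by_name is hypothetical, and the sample text is the simplified CSV from the question.

```python
import csv
import io
from collections import defaultdict


def sum_by_name(csv_text):
    """Sum the Number column for each distinct Name, keeping first-seen order."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Name"]] += int(row["Number"])
    return dict(totals)


# Simplified CSV from the question
sample = """Name,Number
Dog,1
Cat,2
Fish,1
Dog,3
Dog,2
Cat,2
Fish,1
"""

print(sum_by_name(sample))
```

Accumulating into a dictionary keyed by name plays the same role as Group-Object followed by Measure-Object -Sum.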

Creating New Variables in R that relate to

I have 7 different variables in an Excel spreadsheet that I have imported into R. Each is a column of length 3331. They are:
'Tribe' - there are 8 of them
'Month' - when the sampling was carried out
'Year' - the year when the sampling was carried out
'ID' - an identifier for each snail
'Weight' - weight of a snail in grams
'Length' - length of a snail shell in millimetres
'Width' - width of a snail shell in millimetres
This is a case where 8 different tribes have been asked to record data on a suspected endangered species of snail to see if they are getting rarer, or changing in size or weight.
This happened at different frequencies between 1993 and 1998.
I would like to know how to add new variables to the data, so that if I entered names(Snails) it would list the 7 given variables plus any variables I have added.
The dataset is limited, to the point where I would like to add new variables, such as the count of snails in any given month.
This would rely on me using Tribe, Month, Year and ID. If the IDs (snail identifiers) were listed for any given month, I would be able to sum them to see if there are any changes in counts. I have tried:
count=c(Tribe,Year,Month,ID)
count
But after doing things like that, R just gives me one long vector that is 4 times the size of the dataset. I would like to create a new variable with column length n=3331.
Or maybe I would like to create a simpler variable so I can see if a tribe collected at any given month. I don't know how I can do this.
I have looked at other forums and searched but, there is nothing that I can see that helps me in my case. I appreciate any help. Thanks
I'm guessing you need to organise your variables in a single structure, such as a data.frame.
See ?data.frame for the help file.
To get you started, you could do something like:
snails <- data.frame(Tribe,Year,Month,ID)
snails
# or for just the first few rows
head(snails)
Then this would have your data looking similar to your Excel file like:
  Tribe Year Month ID
1     1    1     1  a
2     2    2     2  b
3     3    3     3  c
<<etc>>
Then if you do names(snails) it will list out your column names.
You could possibly avoid some of this mucking about by just importing your Excel file either directly from Excel, or saving as a csv (comma separated values) file first and then using read.csv("name_of_your_file.csv")
See http://www.statmethods.net/input/importingdata.html for some more specifics on this.
To tabulate your data, you can do things like...
table(snails$Tribe)
...to see the number of snail records collected by each tribe. Or...
table(snails$Tribe,snails$Year)
...to see the trends in each tribe by each year. The $ character will let you access the named variable (column) inside a data.frame in the same way you are currently using the free floating variables. This might seem like more work initially, but it will pay off greatly when you need to do some more involved analysis.
Take for example if you want to only analyse the weights from tribe "1", you could do:
snails$Weight[snails$Tribe==1]
# mean of these weights
mean(snails$Weight[snails$Tribe==1])
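The question also asked about counts of snails per month. As a rough illustration of that grouping logic, here is a small Python sketch (not R); the records are invented placeholder values, and monthly_counts is a hypothetical helper. In R, table(snails$Tribe, snails$Year, snails$Month) would give a comparable breakdown.

```python
from collections import Counter


def monthly_counts(records):
    """Count snail records for each (Tribe, Year, Month) combination."""
    return Counter((r["Tribe"], r["Year"], r["Month"]) for r in records)


# Invented placeholder records, not the real survey data
records = [
    {"Tribe": 1, "Year": 1993, "Month": 5, "ID": "a"},
    {"Tribe": 1, "Year": 1993, "Month": 5, "ID": "b"},
    {"Tribe": 2, "Year": 1993, "Month": 5, "ID": "c"},
    {"Tribe": 1, "Year": 1994, "Month": 6, "ID": "d"},
]

counts = monthly_counts(records)
for key in sorted(counts):
    print(key, counts[key])
```

The result has one entry per tribe, year and month that actually occurs, so it is a summary table rather than a new column of length 3331.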
There are a lot more things I could explain but you would probably be better served by reading an excellent website like Quick-R here: http://www.statmethods.net/management/index.html to get you doing some more advanced analysis and plotting.

Resources