R data spread only in lines - r

I have a file with data in this way:
Name: abcdef
Value:40
Id:34
Size: 1000
Name: xyz
Value:4
Id:765
Size: 5561000
Name: qwerty
Value:0
Id:4
Size: 1000
But I would something like this:
| Name | Value | Id | Size |
| abcdef | 40 | 34 | 1000 |
| xyz | 4 | 765 | 5561000 |
| qwerty | 0 | 4 | 1000 |
It's possible do that with R standard commands?

I couldn't find the imagined function in splitstackshape, nor could I find the duplicate question on SO that I also imagined I had seen using "attribute value" or "label value" as search terms, but I can offer a solution based on scan's ability to handle multi-line data and sub to trim out the excess text. You can obviously remove the dangling column:
inp <- scan(text=txt, what=list("n", "v", "i", "s", "blank"),sep="\n")
Read 3 records
names(inp) <- lapply(inp , function(col) sub("\\:.+","",col[1]) )
inp <- data.frame( lapply(inp, function(col) sub(".+\\:[ ]{0,1}","",col) ) )
> inp
Name Value Id Size c............
1 abcdef 40 34 1000
2 xyz 4 765 5561000
3 qwerty 0 4 1000
This will require that the data be very regular. Each section needs to be 5 lines and the order inside a section of the values needs to be constant, although blank values should be handled correctly.
Data used:
txt <- "Name: abcdef
Value:40
Id:34
Size: 1000
Name: xyz
Value:4
Id:765
Size: 5561000
Name: qwerty
Value:0
Id:4
Size: 1000
"

Related

Add a value from a column in one table based off finding a result in another in R

I have a data table in R:
|gene | prom_65| prom_66| amast_69| amast_70| markerID|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|ABC | 24.7361| 25.2550| 31.2974|45.4209 |16:123234_T/C; 16:54352342_A/T; 16:747564_T/G|
|DFG | 107.3580| 112.9870| 77.4182| 86.3211| 16:3453453_G/A; 16:765753453_A/T; 16:65345345_T/G|
|LKP | 72.0639| 86.1486| 68.5747| 77.8383| 16:25234453_G/C; 16:876456546_A/T; 16:4535_T/G|
|KLF | 43.8766| 53.4004| 34.0255| 38.4038| 16:87484_G/A; 16:5435_A/T; 16:747564_T/G|
|PPO | 2382.8700| 1871.9300| 2013.4200| 2482.0600| 16:785_T/C; 16:5435_A/T; 16:747564_T/G|
|LWPV | 49.6488| 53.7134| 59.1175| 66.0931| 16:123_T/C; 16:54564_A/T; 16:54646_T/G|
I have another data table:
|markerid | prom_65| prom_66| amast_69| amast_70| pvalue|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|16:123234_T/C |x | x | x | x | x |
|16:3453453_G/A| x | x | x x | x |
I would like to add the gene column to table two for the markerid that matches the relevant gene in table one. In table one the markerIDs are all separated by a semi-colon and a markerID will only ever appear within one gene row in table1. In this example the output should look like this:
|markerid | prom_65| prom_66| amast_69| amast_70| pvalue |gene|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|16:123234_T/C |x | x | x | x | x |ABC
|16:3453453_G/A | x | x | x | x | x |DFG
Not sure how to approach this in R.
Many thanks
Without a reproducible example of your table, it is hard to be sure what looks like the last column (it seems to be a list but not sure).
You can try on the second table:
Table2$gene <- sapply(Table2$markerid, function(x) Table1$Gene[grep(x,Table1$marker_id)])
Here an example with dummy dataframes:
dataA <- data.frame(Gene = LETTERS[1:5],
marker = paste(letters[6:10],"_A"))
Gene marker
1 A f _A
2 B g _A
3 C h _A
4 D i _A
5 E j _A
dataB <- data.frame(marker = letters[6:8])
marker
1 f
2 g
3 h
And now, if you use the sapply function:
dataB$Gene <- sapply(dataB$marker, function(x) dataA$Gene[grep(x,dataA$marker)])
1 f A
2 g B
3 h C
Does it look what you are trying to get ?
If it is not working, can you provide the output of str(Table1) ?

SQLite find table row where a subset of columns satisfies a specified constraint

I have the following SQLite table
CREATE TABLE visits(urid INTEGER PRIMARY KEY AUTOINCREMENT,
hash TEXT,dX INTEGER,dY INTEGER,dZ INTEGER);
Typical content would be
# select * from visits;
urid | hash | dx | dY | dZ
------+-----------+-------+--------+------
1 | 'abcd' | 10 | 10 | 10
2 | 'abcd' | 11 | 11 | 11
3 | 'bcde' | 7 | 7 | 7
4 | 'abcd' | 13 | 13 | 13
5 | 'defg' | 20 | 21 | 17
What I need to do here is identify the urid for the table row which satisfies the constraint
hash = 'abcd' AND (nearby >= (abs(dX - tX) + abs(dY - tY) + abs(dZ - tZ))
with the smallest deviation - in the sense of smallest sum of absolute distances
In the present instance with
nearby = 7
tX = tY = tZ = 12
there are three rows that meet the above constraint but with different deviations
urid | hash | dx | dY | dZ | deviation
------+-----------+-------+--------+--------+---------------
1 | 'abcd' | 10 | 10 | 10 | 6
2 | 'abcd' | 11 | 11 | 11 | 3
4 | 'abcd' | 12 | 12 | 12 | 3
in which case I would like to have reported urid = 2 or urid = 3 - I don't actually care which one gets reported.
Left to my own devices I would fetch the full set of matching rows and then dril down to the one that matches my secondary constraint - smallest deviation - in my own Java code. However, I suspect that is not necessary and it can be done in SQL alone. My knowledge of SQL is sadly too limited here. I hope that someone here can put me on the right path.
I now have managed to do the following
CREATE TEMP TABLE h1(v1 INTEGER,v2 INTEGER);
SELECT urid,(SELECT (abs(dX - 12) + abs(dY - 12) + abs(dZ - 12))) devi FROM visits WHERE hash = 'abcd';
which gives
--SELECT * FROM h1
urid | devi |
-------+-----------+
1 | 6 |
2 | 3 |
4 | 3 |
following which I issue
select urid from h1 order by v2 asc limit 1;
which yields urid = 2, the result I am after. Whilst this works, I would like to know if there is a better/simpler way of doing this.
You're so close! You have all of the components you need, you just have to put them together into a single query.
Consider:
SELECT urid
, (abs(dx - :tx) + abs(dy - :tx) + abs(dz - :tx)) AS devi
FROM visits
WHERE hash=:hashval AND devi < :nearby
ORDER BY devi
LIMIT 1
Line by line, first you list the rows and computed values you want (:tx is a placeholder; in your code you want to prepare a statement and then bind values to the placeholders before executing the statement) from the visit table.
Then in the WHERE clause you restrict what rows get returned to those matching the particular hash (That column should have an index for best results... CREATE INDEX visits_idx_hash ON visits(hash) for example), and that have a devi that is less than the value of the :nearby placeholder. (I think devi < :nearby is clearer than :nearby >= devi).
Then you say that you want those results sorted in increasing order according to devi, and LIMIT the returned results to a single row because you don't care about any others (If there are no rows that meet the WHERE constraints, nothing is returned).

Levensthein logic to get all the string with minimum difference

Suppose i have a datframe with values
Mtemp:
-----+
code |
-----+
Ram |
John |
Tracy|
Aman |
i want to compare it with dataframe
M2:
------+
code |
------+
Vivek |
Girish|
Rum |
Rama |
Johny |
Stacy |
Jon |
i want to get result so that for each value in Mtemp i will get maximum 2 possible match in M2 with Levensthein distance 2.
i have used
tp<-as.data.frame(amatch(Mtemp$code,M2$code,method = "lv",maxDist = 2))
tp$orig<-Mtemp$code
colnames(tp)<-c('Res','orig')
and i am getting result as follow
Res |orig
-----+-----
3 |Ram
5 |John
6 |Tracy
4 |Aman
please let me know a way to get 2 values(if possible) for every Mtemp string with Lev distance =2

Recursive query with sub-graph aggregation

I am trying to use Neo4j to write a query that aggregates quantities along a particular sub-graph.
We have two stores Store1 and Store2 one with supplier S1 the other with supplier S2. We move 100 units from Store1 into Store3 and 200 units from Store2 to Store3.
We then move 100 units from Store3 to Store4. So now Store4 has 100 units and approximately 33 originated from supplier S1 and 66 from supplier S2.
I need the query to effectively return this information, E.g.
S1, 33
S2, 66
I have a recursive query to aggregate all the movements along each path
MATCH p=(store1:Store)-[m:MOVE_TO*]->(store2:Store { Name: 'Store4'})
RETURN store1.Supplier, reduce(amount = 0, n IN relationships(p) | amount + n.Quantity) AS reduction
Returns:
| store1.Supplier | reduction|
|-------------------- |-------------|
| S1 | 200 |
| S2 | 300 |
| null | 100 |
Desired:
| store1.Supplier | reduction|
|---------------------|-------------|
| S1 | 33.33 |
| S2 | 66.67 |
What about this one :
MATCH (s:Store) WHERE s.name = 'Store4'
MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO]-(supp)
WITH t.qty as total, collect(r) as movements
WITH total, movements, reduce(totalSupplier = 0, r IN movements | totalSupplier + r.qty) as supCount
UNWIND movements as movement
RETURN startNode(movement).name as supplier, round(100.0*movement.qty/supCount) as pct
Which returns :
supplier pct
Store1 33
Store2 67
Returned 2 rows in 151 ms
So the following is pretty ugly, but it works for the example you've given.
MATCH (s4:Store { Name:'Store4' })<-[r1:MOVE_TO]-(s3:Store)<-[r2:MOVE_TO*]-(s:Store)
WITH s3, r1.Quantity as Factor, SUM(REDUCE(amount = 0, r IN r2 | amount + r.Quantity)) AS Total
MATCH (s3)<-[r1:MOVE_TO*]-(s:Store)
WITH s.Supplier as Supplier, REDUCE(amount = 0, r IN r1 | amount + r.Quantity) AS Quantity, Factor, Total
RETURN Supplier, Quantity, Total, toFloat(Quantity) / toFloat(Total) * Factor as Proportion
I'm sure it can be improved.

Access 2010 trying to create pie chart by 1 row data

I have data looks like this
Sum Of a | Sum Of b | Sum Of C | Sum Of d
100 | 200 | 300 | 400
In order to create a pie chart I need to change it to format something like this
Sum Of | Value
a | x
c | y
d | z
Question how can I create new table that from the first table by query or any suggestion?
I was able to find this link that I found useful on creating chats from ms-access
Here we see that the trick is to create a saved UNION query named [SometingDataForPieChart]...
SELECT "DONE" AS PieCategory, [DONE] AS PieValue, [AREA] FROM [TABLE]
UNION ALL
SELECT "REMAIN" AS PieCategory, [REMAIN] AS PieValue, [AREA] FROM [TABLE]
...returning...
PieCategory | PieValue | AREA
----------- | -------- | -----
DONE | 100 | AREA1
DONE | 200 | AREA2
DONE | 200 | AREA3
REMAIN | 200 | AREA1
REMAIN | 300 | AREA2
REMAIN | 700 | AREA3
...and this is how you start to do pie chart.
Please read How to add a pie chart to my Access report as it has many images and step by steps.
Credit to: Gord Thompson for this answer

Resources