Raqsoft 28 No.
22 View •
JOINing Two Excel Worksheets
I’m working on combining two excel worksheets. Before I start, I’d like to mention that I also have mysql workbench, so I’m open to working on this issue in either sql or vba (I should learn both). I’m working with .bed files, which are lists of genomic coordinates. In short, the data is indexed by chromosome number (ie:chr2) and then has a numerical start and stop location on the chromosome. These numerical locations can span a large range (ie:100-10,000) or be a single position (ie: 999-1000). I have a list of coordinates that cover a large range, and in a separate file I have a list of single positions.
Example of a file with ranges:
chromosome start stop
Example of file with single positions:
chromosome start stop
I would like to combine these worksheets such that if a location in my list of single positions is found within the range in my list of ranges, the locations for both are listed in the same row. The lists are 1000s of locations long, so I’d also like this program to loop through every row. Using the example data listed above, I’d like my output to look like the following:
Example of desired output:
chromosome start stop chromosome start stop
chr1 45616321 chr1 52135214
chr3 984211253 chr3 1025410255
There is a high probability that multiple single positions will fall within a single range, and I would like these to be listed as separate rows.
Below is a SQL solution. It works but it is complicated, because data needs to be retrieved from and then exported to the database.
select a.chromosome, a.start, a stop, b.chromosome, b.start, b.stop from ranges_table a, positions_table b where b.start >= a.start and b.stop <= a.stop
The following VBA solution also consists of the SQL query and the retrieval and export code.
Dim oConnAsNew ADODB.Connection
Dim oRSAsNew ADODB.Recordset
Dim sSQLAsString, wbAsWorkbook
Set wb =ThisWorkbook
sSQL ="select a.chromosome, a.start, a stop,"& _
"b.chromosome, b.start, b.stop"& _
"from <ranges_table> a, <positions_table> b"& _
"where b.start >= a.start and b.stop <= a.stop"
sSQL =Replace(sSQL,“<ranges_table>”, _
sSQL =Replace(sSQL,“<positions_table>”, _
sPath = wb.FullName
MsgBox"The workbook must be saved first!"
oConn.Open"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=‘"& sPath &"’;"& _
“Extended Properties=‘Excel 12.0;HDR=Yes;IMEX=1’;”
MsgBox"No records found"
Rangename=“[”& r.Parent.Name &“$”& _
It’s rather complicated to do this in VBA. Here’s a solution in SPL:
=xjoin(A1;A2,start>A1.start && stop<A1.stop)
Result of executing the SPL script:
A1: Import one of the Excel worksheet as a table sequence;
A2: Import another Excel worksheet as table sequence;
A3: Perform a cross join over the two table sequences;
A4: Export the resulting table sequence to a new Excel worksheet.