With today’s post I would like to share a Matlab script I used often to compare lists of ID numbers stored in separate text files. The ID numbers can be of anything: oil and gas wells, mining diamond drill hole locations, gravity or resistivity measurement stations, outcrop locations, you name it. And the script will handle any combination of ASCII characters (numbers, letters, etcetera).
I included below here some test files for you to try with the code, which is in the next section. Please refer to the comment section in the code for usage and file description. Have fun, and if you try it on your lists, let me know how it works for you.
These files are in doc format; they need to be downloaded and saved as plain txt files. Test1 and Test2 contain 2 short lists of (fake) Canadian Oil and Gas wells; the former is the result of a search using a criterion (wells inside a polygonal area), the latter is a list of wells that have digital wireline logs. Test3 and Test4 are short lists of (fake) diamond drill holes.
*** Notice that script is setup to compare list in Test2.txt against list in Test1.txt and not the other way around. If using this on your own lists, you will have to decide in advance which subset of wells you are interested in.
Here is the code:
% Script to compare lists of wells from 2 text files % (wells are identified by a well ID string) % %%%% INPUT % must be text files with 1-line header % % Sample test files provided: % File 1 - test1.txt - list of wells satisfying a criteria % (for example wells with digital wireline logs) % % File 2 - test2.txt - list of all wells within an area % (for example returned using polygon search) % % These FILES contain (dummy) oil and gas wells from Canada % i.e. wells with a 16-digit identifier of format 1XX021611428W600 % Header line is simply 'UWI' (Unique Well Identifier) % For example: % UWI % 1XX021611428W600 % 1XX023511428W600 % ... % ... % To use with other files either % 1 - actual name of the files must be changed to test1.txt and test2.txt % or % 2 - names in line 47 and 51 must be changed to 'yourfile1.txt' and % 'yourfile2.txt', respectively % %%%% OUTPUT % script extracts wells from list 2 that satisfy criteria and saves to % File 3 - out.xls % %%%% % To use different header line and well identifier formats n and h must be % changed accordingly % For example suppose we had a file of diamond drill holes from a mining % exploration permit, containing Diamond Drill Hole IDs in this format: % DDH-ID (header line) % DDH-23 % DDH-57 % ... % ... % we would need to change both h and n to 6 % try it with test3.txt and test4.txt % %% import data curves=textread('test1.txt','%c'); % open file - this is list of wells with digital wireline logs (curves) % notice script reads each character in the file as a row % inside=textread('test2.txt','%c'); % open file - this is list of wells inside rectangle or polygon % notice script reads each character in the file as a row % %% cleanup, reshape variables, convert to cell array n=16; % number of characters in well identifier (for example, 16) h=3; % number of characters in header line (for example, 3) [r,c]=size(inside); l=(r-h)/n; % remove characters in the header inside1=reshape(inside((h+1):r),n,l)'; % line and reshape cellinside=cellstr(inside1); % create cell array from char array % [t,s]=size(curves); f=(t-h)/n; % remove characters in the header curves1=reshape(curves((h+1):t),n,f)'; % line and reshape cellcurves=cellstr(curves1); % create cell array from char array % %% compare lists for k=1:l for j=1:f TF(k,j) = strcmpi(cellinside(k),cellcurves(j)); end end % this is the kernel of the script: % compare list of wells with logs (curves) vs. list of wells inside area % % From Matlab help: the command TF = strcmpi(s1,s2) % compares the strings s1 and s2 and returns logical 1 (true) if they are % the same (ignoring case), and returns logical 0 (false) otherwise % %% extract matching UWI and save output to Excel file TTFF=max(int8(TF)')'; vi=logical(TTFF); exp=cellinside(vi); xlswrite('out.xls',exp); % %% optional house cleaning clear inside curves n h r c l t s f TF TTFF j k vi
You could achieve the same result in Excel using a formula like this (applied to a single spreadsheet with lists in 2 columns, A and B):
There’s a couple of good examples and alternative Excel solutions here. However I like Excel a lot less because the automation (with Macros, visual basic) is more cumbersome, and because it is more prone to errors that are less easy to QC (contrary to Matlab, which will give you error messages and has a straightforward debugger). When Excel is really necessary is in those instances in which in addition to finding a subset of wells using the 2 lists you want to extract information from other columns, such as coordinates, ground elevation, date drilled, operator, common well name, etcetera; then VLOOKUP is the right tool. I will show you how to do just that in a future post.
Since a picture is worth a thousand words I thought I’d add a pictorial example. Below is a geological map from a mining exploration area in Tuscany, Italy, where high grade asbestos was mined in the past, and gold exploration is under way. I added to this map black square symbols to indicate locations where stream sediments were collected during a geochemical prospecting campaign in the ’80s (samples derived from the sediments are analyzed for concentrations in gold pathfinder elements). Suppose we had 2 lists: List1 contains all locations with anomalous concentrations in 1 or more of the pathfinder elements (I marked these locations with red diamond symbols). List2 contains all locations (I marked those with green cross symbols) with outcropping fault contact between the metamorphic basement and the more recent sedimentary rock formations (faults bounding horst structures formed in extensional settings are believed to act as conduits for gold bearing fluids). If we run the Matlab script using List1 and List2 it would return a list of locations with fault contact AND anomalous pathfinder concentrations. Those are indicated by arrows on the map – the location with just the green cross and those with just the red diamond would not be included.
Sources (through the Regione Toscana): for map, University of Siena, Department of Earth Science, Geological Map of the Regione Toscana scale 1:10,000, sheet 343070; for Stream Sediment data: ENI/RIMIN, unpublished Industry Report, 1989