How to Split Numbers Away from a String and Sort Them

Usually we sort a column of text data by comparing whole values. At times values consist of letters and numbers, like names plus numbers shown below:

Mike105

Rose11

Rose1

Rose5

Mike6

And we need to sort the column by names first and then by numbers. So, in the following expected result, Mike6 appears before Mike105:

Mike6

Mike105

Rose1

Rose5

Rose11

To write a program directly based on the text data, we need to first split the data and then concatenate and sort it. It’s inconvenient.

When data is stored in the database, we usually split it into the temporary table TData using the stored procedure and perform the sort in SQL. For example, SQL Server2012 will first define a user-defined scalar valued function dbo.words for splitting data:

 

CREATE FUNCTION dbo.words(@word VARCHAR(100) , @part INT)RETURNS VARCHAR(100)

AS

BEGIN

DECLARE @sub VARCHAR(100)

DECLARE @tmp VARCHAR(10)

DECLARE @numIndex INT

DECLARE @len INT

DECLARE @i INT

SET @i = 1

SET @len = Len(@word)

WHILE @i<=@len

BEGIN

SET @tmp = Substring(@word,@i,1)

IF(ISNUMERIC(@tmp)=1)

BEGIN

SET @numIndex = @i

BREAK

END

SET @i=@i+1

END

IF(@part=1)

BEGIN

SET @sub=Substring(@word,1,@numIndex-1)

END

ELSE

BEGIN

SET @sub=Substring(@word,@numIndex,@len-@numIndex+1)

END

RETURN @sub

END

 

In the above code parameter @part specifies the characters to be split. 1 represents the name part and 2 means the number part.

Note that the split-away numbers should be first converted into integers before they are sorted. Otherwise the result would be the same as that of sorting the strings as a whole. Perform the sort using the following SQL:

SELECT * FROM TData ORDER BY dbo.words(STR,1),cast(dbo.words(STR,2) as int)

This method of creating a user-defined database function to split data, however, isn’t universal and can’t be migrated to other databases.

 

The computation would be made rather convenient if we could use esProc SPL to handle it. Only a one-liner is needed:

=file("d:/data.txt").import().sort(#1.words@w())

 

esProc is a computing middleware. It retrieves data from a data source and then handles data in a universal logic. This makes the code capable of migrating. You can use a SPL script to sort, group, filter a structured text file and join text files. More examples can be found in Structured Text Computing with esProc.

 

SPL is integration-friendly with a Java program. Read How to Call an SPL Script in Java to learn details.

About how to work with esProc, read Getting Started with esProc.