Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.
cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

Algorithm "Longest common subsequence problem" in ABAP

Sandra_Rossi
Active Contributor
1,933

Hi,

I'm a little lazy to develop it myself... Did anyone implement the algorithm "Longest common subsequence problem" between 2 strings, in ABAP, and would like to share it? This is an algorithm which can be used in Git to display the characters which are different between two lines for instance.

I want to achieve something like that (Azure DevOps), i.e. 2 colors on the same line (light green versus darker green) - My issue is to know which characters are different, not how to render the colors of course:

Or do you know a better alternative?

Thanks 🙂

Sandra

1 ACCEPTED SOLUTION
Read only

Sandra_Rossi
Active Contributor
1,083

I'm not so lazy in fact 🙂 - Here is a basic solution, not much tested, use it at your own risk (and from there, it's easy to implement the colors - note that there can be different algorithms for the diff, more intelligent like comparison based on words) :

CLASS lcl_lcs DEFINITION.
  PUBLIC SECTION.

    METHODS constructor
      IMPORTING
        x TYPE string
        y TYPE string.

    DATA: result TYPE string READ-ONLY.

  PRIVATE SECTION.

    METHODS lcs_length
      RETURNING
        VALUE(lcs_length) TYPE i.

    METHODS backtrack
      IMPORTING
        i TYPE i
        j TYPE i.

    DATA: x     TYPE REF TO string,
          y     TYPE REF TO string,
          c     TYPE TABLE OF i,
          m     TYPE i,
          n     TYPE i,
          dim_j TYPE i,
          dim_c TYPE i.

ENDCLASS.

CLASS lcl_lcs IMPLEMENTATION.

  METHOD constructor.

    me->x = REF #( x ).
    me->y = REF #( y ).
    m = strlen( x ).
    n = strlen( y ).
    dim_j = strlen( y ) + 1.
    dim_c = ( m + 1 ) * ( n + 1 ).

    " add lines by using "factorial" (may overflow by 2, more memory but it's fast)
    APPEND INITIAL LINE TO c.
    DO.
      APPEND LINES OF c TO c.
      IF lines( c ) >= dim_c.
        EXIT.
      ENDIF.
    ENDDO.

    lcs_length( ).

    backtrack( i = ( m - 1 ) j = ( n - 1 ) ).

    FREE c.

  ENDMETHOD.

  METHOD lcs_length.
*function LCSLength(X[1..m], Y[1..n])
*    C = array(0..m, 0..n)
*    for i := 0..m
*       C[i,0] = 0
*    for j := 0..n
*       C[0,j] = 0
*    for i := 1..m
*        for j := 1..n
*            if X[i] = Y[j]
*                C[i,j] := C[i-1,j-1] + 1
*            else
*                C[i,j] := max(C[i,j-1], C[i-1,j])
*    return C[m,n]
*Alternatively, memoization could be used.

    DO m TIMES.
      DATA(i) = sy-index.
      DATA(im) = sy-index - 1.
      DO n TIMES.
        DATA(j) = sy-index.
        DATA(jm) = sy-index - 1.
        IF x->*+im(1) = y->*+jm(1).
          c[ dim_j * i + j + 1 ] = c[ dim_j * im + jm + 1 ] + 1.
        ELSE.
          c[ dim_j * i + j + 1 ] = nmax( val1 = c[ dim_j * i + jm + 1 ] val2 = c[ dim_j * im + j + 1 ] ).
        ENDIF.
      ENDDO.
    ENDDO.

    lcs_length = c[ dim_j * m + n + 1 ].

  ENDMETHOD.

  METHOD backtrack.
*Reading out a LCS
*The following function backtracks the choices taken when computing the C table. If the
*last characters in the prefixes are equal, they must be in an LCS. If not, check what
*gave the largest LCS of keeping {\displaystyle x_{i}} x_{i} and {\displaystyle y_{j}} y_{j},
*and make the same choice. Just choose one if they were equally long. Call the function with i=m and j=n.

*function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
*    if i = 0 or j = 0
*        return ""
*    if  X[i] = Y[j]
*        return backtrack(C, X, Y, i-1, j-1) + X[i]
*    if C[i,j-1] > C[i-1,j]
*        return backtrack(C, X, Y, i, j-1)
*    return backtrack(C, X, Y, i-1, j)

    IF i = -1 OR j = -1.
      " nothing (stop the recursivity)
    ELSEIF x->*+i(1) = y->*+j(1).
      backtrack( i = i - 1 j = j - 1 ).
      result = result && x->*+i(1).
    ELSEIF c[ dim_j * ( i + 1 ) + ( j ) + 1 ] > c[ dim_j * ( i ) + ( j + 1 ) + 1 ].
      backtrack( i = i j = j - 1 ).
    ELSE.
      backtrack( i = i - 1 j = j ).
    ENDIF.

  ENDMETHOD.

ENDCLASS.

CLASS ltc_main DEFINITION
      FOR TESTING
      DURATION SHORT
      RISK LEVEL HARMLESS
      INHERITING FROM cl_aunit_assert.
  PRIVATE SECTION.
    METHODS test FOR TESTING.
ENDCLASS.

CLASS ltc_main IMPLEMENTATION.
  METHOD test.
    cl_abap_unit_assert=>assert_equals( act = NEW lcl_lcs( x = `XMJYAUZ` y = `MZJAWXU` )->result exp = `MJAU` ).
  ENDMETHOD.
ENDCLASS.
1 REPLY 1
Read only

Sandra_Rossi
Active Contributor
1,084

I'm not so lazy in fact 🙂 - Here is a basic solution, not much tested, use it at your own risk (and from there, it's easy to implement the colors - note that there can be different algorithms for the diff, more intelligent like comparison based on words) :

CLASS lcl_lcs DEFINITION.
  PUBLIC SECTION.

    METHODS constructor
      IMPORTING
        x TYPE string
        y TYPE string.

    DATA: result TYPE string READ-ONLY.

  PRIVATE SECTION.

    METHODS lcs_length
      RETURNING
        VALUE(lcs_length) TYPE i.

    METHODS backtrack
      IMPORTING
        i TYPE i
        j TYPE i.

    DATA: x     TYPE REF TO string,
          y     TYPE REF TO string,
          c     TYPE TABLE OF i,
          m     TYPE i,
          n     TYPE i,
          dim_j TYPE i,
          dim_c TYPE i.

ENDCLASS.

CLASS lcl_lcs IMPLEMENTATION.

  METHOD constructor.

    me->x = REF #( x ).
    me->y = REF #( y ).
    m = strlen( x ).
    n = strlen( y ).
    dim_j = strlen( y ) + 1.
    dim_c = ( m + 1 ) * ( n + 1 ).

    " add lines by using "factorial" (may overflow by 2, more memory but it's fast)
    APPEND INITIAL LINE TO c.
    DO.
      APPEND LINES OF c TO c.
      IF lines( c ) >= dim_c.
        EXIT.
      ENDIF.
    ENDDO.

    lcs_length( ).

    backtrack( i = ( m - 1 ) j = ( n - 1 ) ).

    FREE c.

  ENDMETHOD.

  METHOD lcs_length.
*function LCSLength(X[1..m], Y[1..n])
*    C = array(0..m, 0..n)
*    for i := 0..m
*       C[i,0] = 0
*    for j := 0..n
*       C[0,j] = 0
*    for i := 1..m
*        for j := 1..n
*            if X[i] = Y[j]
*                C[i,j] := C[i-1,j-1] + 1
*            else
*                C[i,j] := max(C[i,j-1], C[i-1,j])
*    return C[m,n]
*Alternatively, memoization could be used.

    DO m TIMES.
      DATA(i) = sy-index.
      DATA(im) = sy-index - 1.
      DO n TIMES.
        DATA(j) = sy-index.
        DATA(jm) = sy-index - 1.
        IF x->*+im(1) = y->*+jm(1).
          c[ dim_j * i + j + 1 ] = c[ dim_j * im + jm + 1 ] + 1.
        ELSE.
          c[ dim_j * i + j + 1 ] = nmax( val1 = c[ dim_j * i + jm + 1 ] val2 = c[ dim_j * im + j + 1 ] ).
        ENDIF.
      ENDDO.
    ENDDO.

    lcs_length = c[ dim_j * m + n + 1 ].

  ENDMETHOD.

  METHOD backtrack.
*Reading out a LCS
*The following function backtracks the choices taken when computing the C table. If the
*last characters in the prefixes are equal, they must be in an LCS. If not, check what
*gave the largest LCS of keeping {\displaystyle x_{i}} x_{i} and {\displaystyle y_{j}} y_{j},
*and make the same choice. Just choose one if they were equally long. Call the function with i=m and j=n.

*function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
*    if i = 0 or j = 0
*        return ""
*    if  X[i] = Y[j]
*        return backtrack(C, X, Y, i-1, j-1) + X[i]
*    if C[i,j-1] > C[i-1,j]
*        return backtrack(C, X, Y, i, j-1)
*    return backtrack(C, X, Y, i-1, j)

    IF i = -1 OR j = -1.
      " nothing (stop the recursivity)
    ELSEIF x->*+i(1) = y->*+j(1).
      backtrack( i = i - 1 j = j - 1 ).
      result = result && x->*+i(1).
    ELSEIF c[ dim_j * ( i + 1 ) + ( j ) + 1 ] > c[ dim_j * ( i ) + ( j + 1 ) + 1 ].
      backtrack( i = i j = j - 1 ).
    ELSE.
      backtrack( i = i - 1 j = j ).
    ENDIF.

  ENDMETHOD.

ENDCLASS.

CLASS ltc_main DEFINITION
      FOR TESTING
      DURATION SHORT
      RISK LEVEL HARMLESS
      INHERITING FROM cl_aunit_assert.
  PRIVATE SECTION.
    METHODS test FOR TESTING.
ENDCLASS.

CLASS ltc_main IMPLEMENTATION.
  METHOD test.
    cl_abap_unit_assert=>assert_equals( act = NEW lcl_lcs( x = `XMJYAUZ` y = `MZJAWXU` )->result exp = `MJAU` ).
  ENDMETHOD.
ENDCLASS.