Solved: Re: Algorithm "Longest common subsequence problem"...

Sandra_Rossi · ‎2018 Oct 02

Hi,

I'm a little lazy to develop it myself... Did anyone implement the algorithm "Longest common subsequence problem" between 2 strings, in ABAP, and would like to share it? This is an algorithm which can be used in Git to display the characters which are different between two lines for instance.

I want to achieve something like that (Azure DevOps), i.e. 2 colors on the same line (light green versus darker green) - My issue is to know which characters are different, not how to render the colors of course:

Or do you know a better alternative?

Thanks 🙂

Sandra

Sandra_Rossi · ‎2018 Oct 02

I'm not so lazy in fact 🙂 - Here is a basic solution, not much tested, use it at your own risk (and from there, it's easy to implement the colors - note that there can be different algorithms for the diff, more intelligent like comparison based on words) :

CLASS lcl_lcs DEFINITION.
  PUBLIC SECTION.

    METHODS constructor
      IMPORTING
        x TYPE string
        y TYPE string.

    DATA: result TYPE string READ-ONLY.

  PRIVATE SECTION.

    METHODS lcs_length
      RETURNING
        VALUE(lcs_length) TYPE i.

    METHODS backtrack
      IMPORTING
        i TYPE i
        j TYPE i.

    DATA: x     TYPE REF TO string,
          y     TYPE REF TO string,
          c     TYPE TABLE OF i,
          m     TYPE i,
          n     TYPE i,
          dim_j TYPE i,
          dim_c TYPE i.

ENDCLASS.

CLASS lcl_lcs IMPLEMENTATION.

  METHOD constructor.

    me->x = REF #( x ).
    me->y = REF #( y ).
    m = strlen( x ).
    n = strlen( y ).
    dim_j = strlen( y ) + 1.
    dim_c = ( m + 1 ) * ( n + 1 ).

    " add lines by using "factorial" (may overflow by 2, more memory but it's fast)
    APPEND INITIAL LINE TO c.
    DO.
      APPEND LINES OF c TO c.
      IF lines( c ) >= dim_c.
        EXIT.
      ENDIF.
    ENDDO.

    lcs_length( ).

    backtrack( i = ( m - 1 ) j = ( n - 1 ) ).

    FREE c.

  ENDMETHOD.

  METHOD lcs_length.
*function LCSLength(X[1..m], Y[1..n])
*    C = array(0..m, 0..n)
*    for i := 0..m
*       C[i,0] = 0
*    for j := 0..n
*       C[0,j] = 0
*    for i := 1..m
*        for j := 1..n
*            if X[i] = Y[j]
*                C[i,j] := C[i-1,j-1] + 1
*            else
*                C[i,j] := max(C[i,j-1], C[i-1,j])
*    return C[m,n]
*Alternatively, memoization could be used.

    DO m TIMES.
      DATA(i) = sy-index.
      DATA(im) = sy-index - 1.
      DO n TIMES.
        DATA(j) = sy-index.
        DATA(jm) = sy-index - 1.
        IF x->*+im(1) = y->*+jm(1).
          c[ dim_j * i + j + 1 ] = c[ dim_j * im + jm + 1 ] + 1.
        ELSE.
          c[ dim_j * i + j + 1 ] = nmax( val1 = c[ dim_j * i + jm + 1 ] val2 = c[ dim_j * im + j + 1 ] ).
        ENDIF.
      ENDDO.
    ENDDO.

    lcs_length = c[ dim_j * m + n + 1 ].

  ENDMETHOD.

  METHOD backtrack.
*Reading out a LCS
*The following function backtracks the choices taken when computing the C table. If the
*last characters in the prefixes are equal, they must be in an LCS. If not, check what
*gave the largest LCS of keeping {\displaystyle x_{i}} x_{i} and {\displaystyle y_{j}} y_{j},
*and make the same choice. Just choose one if they were equally long. Call the function with i=m and j=n.

*function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
*    if i = 0 or j = 0
*        return ""
*    if  X[i] = Y[j]
*        return backtrack(C, X, Y, i-1, j-1) + X[i]
*    if C[i,j-1] > C[i-1,j]
*        return backtrack(C, X, Y, i, j-1)
*    return backtrack(C, X, Y, i-1, j)

    IF i = -1 OR j = -1.
      " nothing (stop the recursivity)
    ELSEIF x->*+i(1) = y->*+j(1).
      backtrack( i = i - 1 j = j - 1 ).
      result = result && x->*+i(1).
    ELSEIF c[ dim_j * ( i + 1 ) + ( j ) + 1 ] > c[ dim_j * ( i ) + ( j + 1 ) + 1 ].
      backtrack( i = i j = j - 1 ).
    ELSE.
      backtrack( i = i - 1 j = j ).
    ENDIF.

  ENDMETHOD.

ENDCLASS.

CLASS ltc_main DEFINITION
      FOR TESTING
      DURATION SHORT
      RISK LEVEL HARMLESS
      INHERITING FROM cl_aunit_assert.
  PRIVATE SECTION.
    METHODS test FOR TESTING.
ENDCLASS.

CLASS ltc_main IMPLEMENTATION.
  METHOD test.
    cl_abap_unit_assert=>assert_equals( act = NEW lcl_lcs( x = `XMJYAUZ` y = `MZJAWXU` )->result exp = `MJAU` ).
  ENDMETHOD.
ENDCLASS.

Sandra_Rossi · ‎2018 Oct 02

I'm not so lazy in fact 🙂 - Here is a basic solution, not much tested, use it at your own risk (and from there, it's easy to implement the colors - note that there can be different algorithms for the diff, more intelligent like comparison based on words) :

CLASS lcl_lcs DEFINITION.
  PUBLIC SECTION.

    METHODS constructor
      IMPORTING
        x TYPE string
        y TYPE string.

    DATA: result TYPE string READ-ONLY.

  PRIVATE SECTION.

    METHODS lcs_length
      RETURNING
        VALUE(lcs_length) TYPE i.

    METHODS backtrack
      IMPORTING
        i TYPE i
        j TYPE i.

    DATA: x     TYPE REF TO string,
          y     TYPE REF TO string,
          c     TYPE TABLE OF i,
          m     TYPE i,
          n     TYPE i,
          dim_j TYPE i,
          dim_c TYPE i.

ENDCLASS.

CLASS lcl_lcs IMPLEMENTATION.

  METHOD constructor.

    me->x = REF #( x ).
    me->y = REF #( y ).
    m = strlen( x ).
    n = strlen( y ).
    dim_j = strlen( y ) + 1.
    dim_c = ( m + 1 ) * ( n + 1 ).

    " add lines by using "factorial" (may overflow by 2, more memory but it's fast)
    APPEND INITIAL LINE TO c.
    DO.
      APPEND LINES OF c TO c.
      IF lines( c ) >= dim_c.
        EXIT.
      ENDIF.
    ENDDO.

    lcs_length( ).

    backtrack( i = ( m - 1 ) j = ( n - 1 ) ).

    FREE c.

  ENDMETHOD.

  METHOD lcs_length.
*function LCSLength(X[1..m], Y[1..n])
*    C = array(0..m, 0..n)
*    for i := 0..m
*       C[i,0] = 0
*    for j := 0..n
*       C[0,j] = 0
*    for i := 1..m
*        for j := 1..n
*            if X[i] = Y[j]
*                C[i,j] := C[i-1,j-1] + 1
*            else
*                C[i,j] := max(C[i,j-1], C[i-1,j])
*    return C[m,n]
*Alternatively, memoization could be used.

    DO m TIMES.
      DATA(i) = sy-index.
      DATA(im) = sy-index - 1.
      DO n TIMES.
        DATA(j) = sy-index.
        DATA(jm) = sy-index - 1.
        IF x->*+im(1) = y->*+jm(1).
          c[ dim_j * i + j + 1 ] = c[ dim_j * im + jm + 1 ] + 1.
        ELSE.
          c[ dim_j * i + j + 1 ] = nmax( val1 = c[ dim_j * i + jm + 1 ] val2 = c[ dim_j * im + j + 1 ] ).
        ENDIF.
      ENDDO.
    ENDDO.

    lcs_length = c[ dim_j * m + n + 1 ].

  ENDMETHOD.

  METHOD backtrack.
*Reading out a LCS
*The following function backtracks the choices taken when computing the C table. If the
*last characters in the prefixes are equal, they must be in an LCS. If not, check what
*gave the largest LCS of keeping {\displaystyle x_{i}} x_{i} and {\displaystyle y_{j}} y_{j},
*and make the same choice. Just choose one if they were equally long. Call the function with i=m and j=n.

*function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
*    if i = 0 or j = 0
*        return ""
*    if  X[i] = Y[j]
*        return backtrack(C, X, Y, i-1, j-1) + X[i]
*    if C[i,j-1] > C[i-1,j]
*        return backtrack(C, X, Y, i, j-1)
*    return backtrack(C, X, Y, i-1, j)

    IF i = -1 OR j = -1.
      " nothing (stop the recursivity)
    ELSEIF x->*+i(1) = y->*+j(1).
      backtrack( i = i - 1 j = j - 1 ).
      result = result && x->*+i(1).
    ELSEIF c[ dim_j * ( i + 1 ) + ( j ) + 1 ] > c[ dim_j * ( i ) + ( j + 1 ) + 1 ].
      backtrack( i = i j = j - 1 ).
    ELSE.
      backtrack( i = i - 1 j = j ).
    ENDIF.

  ENDMETHOD.

ENDCLASS.

CLASS ltc_main DEFINITION
      FOR TESTING
      DURATION SHORT
      RISK LEVEL HARMLESS
      INHERITING FROM cl_aunit_assert.
  PRIVATE SECTION.
    METHODS test FOR TESTING.
ENDCLASS.

CLASS ltc_main IMPLEMENTATION.
  METHOD test.
    cl_abap_unit_assert=>assert_equals( act = NEW lcl_lcs( x = `XMJYAUZ` y = `MZJAWXU` )->result exp = `MJAU` ).
  ENDMETHOD.
ENDCLASS.

By Category

Related Content

Activity Groups

Industry Groups

Influence and Feedback Groups

Interest Groups

Location Groups

Customer Only Groups

Forums

Related Resources

Products

Learning and Support

About

My Account

My Account

Algorithm "Longest common subsequence problem" in ABAP