on ‎2018 Mar 16 9:03 PM
Update 1: I found out how to reproduce the issue fairly reliably, see end of post.
Update 2: The root cause involves a dump and does seem to originate from the Azure/SAP combination; see my answer below.
When I use my local Eclipse to connect to a cloud instance (Azure), I often see timeout and locking errors. There is no problem with the connection: via SAPGUI it is perfectly stable, and when it works Eclipse is also generally quick. But then, seemingly at random, I'll hit a bump that goes something like:
Save -> spinning wheel for a while -> message "Timeout getting a lock"
If I try again I see the message "Object could not be locked / User DEVELOPER is currently editing ZCL..."
I have to manually remove the lock via SM12 to continue working.
It seems to happen more often when I've left the system idle for a couple of minutes - but not always.
I have 200MB fibre broadband and experience no other internet issues. SAPGUI and remote desktop to the same instance is perfectly fine. I use VPN both inbound and outbound to work remotely all day long, there are no issues whatsoever other than using ADT.
---Edit---:
I've been trying to observe a pattern, it seems to mainly happen on generation and code changes, but NOT save. The following procedure reproduces it quite often:
If you close and re-open, the last change is still there, so it definitely saved under the existing lock. But you can no longer edit until you remove the lock in SM12.
Something is definitely fishy and it's not at my end.
I'm using the latest Eclipse Oxygen with latest ADT on a Mac. I will try it on a Windows VM when I have some more time.
Hi Mike and Community,
I'm sorry for the slow progress, it was not possible for our team to work on this topic as much as we would like to.
Meanwhile two problematic flows are well understood on our side. One was already described by Kjetil: ADT does not properly listen to the success-response of the lock request in case the lock request needs more than 30 seconds.
The other problematic flow is unfortunately more difficult. The ADT locking concept is based on a stateful RFC connection. If this connection breaks completely, we are sufficiently prepared and no lock-yourself situation happens. However, in the situation we discuss here, it seems the RFC connection breaks in a way that makes it immediately unusable for the ADT client while at the same time the connection looks fine from the server's perspective. In fact, the connection is still writable for the server on the TCP layer. Therefore the ABAP server may need multiple minutes to detect that the network connection is actually broken, and during that time frame the user currently faces zombie locks.
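To illustrate why a half-dead connection can go unnoticed for minutes, here is a minimal Python sketch of generic TCP keepalive tuning. It is not how ADT, JCo or the ABAP server handle this, and the values (60 s idle, 10 s interval, 3 probes) as well as the helper names are assumptions chosen only for illustration.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive so a silently dead peer is noticed sooner."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform specific (all present on Linux),
    # so each one is applied only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


def worst_case_detection_seconds(idle_s, interval_s, probes):
    """Rough upper bound until an idle, broken connection is reported."""
    return idle_s + interval_s * probes
```

With the illustrative values above, an idle broken connection would be reported after roughly 90 seconds. Keepalive only covers idle connections; when unacknowledged data is in flight, the operating system's retransmission timeout decides instead, which can be considerably longer.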
Unfortunately, I still cannot provide a correction date at the moment...
We will also consider automatically detecting such situations and performing an SM12-like deletion as a repair, but the final solution shouldn't repair; it should rather ensure that we don't run into the problematic situation at all. An automated SM12-like deletion would also not work for all developers, because there are surely developer roles without the necessary admin authorizations.
One more thing to add: The lock-yourself-issues occur after RFC connections break on a network layer or after requests hang for more than 30 seconds while they usually should take less than 1 second. Therefore, in case you experience these lock-yourself-issues regularly there is probably a network issue in your setup that regularly causes broken or hanging RFC connections. If you find a way to fix the network issue in your setup - for example in Ingo's case it was the wrongly configured nameserver - then the follow-up issues and the lock-yourself-issues will also be solved immediately.
This also means that once we succeed in making the ADT locking mechanism more robust, this will only prevent the lock-yourself situations from occurring. The network issues, including slow requests (which might include a hanging IDE), will remain unless you can fix the network root issue.
@Mike: Since you wrote that your network issues occur only with ADT, I assume that you also expect the network root issue to be on the SAP side. I will try to find an RFC expert who is willing to help you analyze the network root issue, but since SCN is not an official support channel (unlike, e.g., customer incidents) I cannot guarantee the help of other SAP colleagues here.
I'll provide an update once there is further progress from ADT side.
Best regards,
Armin
Hi armin.beil.2,
Thanks for your detailed response!
I think there's a little more to it than a flaky network connection, for two reasons:
I tried one other thing: I used the Windows client from the CAL NetWeaver ABAP system on AWS to connect to a CAL NetWeaver ABAP instance on Azure: Same result. This conclusively proves it has nothing to do with my local setup.
Just to be clear on the above test: Remote Desktop -> Windows instance on AWS -> Eclipse -> NetWeaver ABAP on Azure
My best guess is that what you describe as "the RFC connection breaks in a way that makes it immediately unusable for the ADT client" is something that happens on Azure. Either it's something about Azure networking or perhaps the SAP systems are preconfigured differently when deployed onto Azure?
Whichever the case may be, an SM12-type deletion seems just a plaster if it appears to be the case that there is a root cause that is not present on AWS.
Thanks,
Mike
Hi,
Has anyone a recent update on this blogpost? Still looking for a solution.
Thanks!
Hi Mike,
Can you please check the nameserver entry in your /etc/resolv.conf?
Depending whether your server is external please change the value e.g. to:
#Using Google Nameserver
nameserver 8.8.8.8
Hope that helps,
Ingo
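A quick way to check whether name resolution is a suspect, as in the nameserver hint above, is to time a lookup from the affected machine. This is only a sketch: the function name is made up, and port 3300 is an assumption taken from the gateway trace elsewhere in this thread.

```python
import socket
import time


def time_lookup(hostname, port=3300):
    """Return (resolved_ok, seconds_taken) for one name lookup."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, port)
        resolved = True
    except socket.gaierror:
        # Resolution failed; the elapsed time is still informative.
        resolved = False
    return resolved, time.monotonic() - start
```

Calling `time_lookup` with the hostname of your SAP instance a few times in a row should normally take a few milliseconds once the result is cached. Lookups that take whole seconds or fail intermittently would point towards the resolver configuration.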
Has the ADT version scheduled for 01.08.19 already been released?
Hi,
Has anyone a recent update on this blogpost? Still looking for a solution.
Eclipse Version: 2019-06 (4.12.0);
SAP S/4HANA 1709
Hello Hai Bo Shen,
Sorry that there was no further progress or update on this topic recently. I'll try to use your feedback as a new trigger to get more awareness and dev capacity within the ADT teams for this issue.
Next release of ADT (planned for 01.08.19) will fix a bug that may cause lock issues during activation (without waiting time; a popup tells you immediately that your own user already edits the object), while a second attempt to activate usually just works and you don't have to use SM12.
For the issue discussed in this thread (where you get a frozen UI for 20+ seconds and have to use SM12), we added a mitigation for the subvariant described by Kjetil. The main issue, however, is unfortunately not fixed and currently there is no known workaround.
If you would like to support us getting it fixed, it would be great if you could create a customer incident (component BC-DWB-AIE), describe the issue and link this thread.
If it is not possible for you to create an incident, it would be also helpful if you could answer the questions:
- Is the issue rather reproducible or does it occur only sporadically?
- Is there a UI freeze and you have to wait 20s+?
- Do you have to use SM12 to get rid of the lock, or is the issue temporary and disappears after 2-3 seconds?
- Do you also use Azure?
Best regards,
Armin
OK, so after installing Eclipse on a Windows machine I found the same issue there too. Then I noticed a series of dumps that coincided with the lock failures.
The dump is CALL_FUNCTION_ACCEPT_FAILED in SAPMSSY1 (RFC): Error while starting a Remote Function Call (ACCEPT).
I pulled out an RFC trace which had a little more detail than the dump: It clearly comes from eclipse (FM SADT_REST_RFC_ENDPOINT), and complains that the gateway might be closed. But the thing is if I retry then it works. Can anyone shed some light?
Here's the RFC trace file:
**** Trace file opened at 20180331 210646 UTC, by disp+work
**** Versions SAP-REL 753,0,16 RFC-VER U 3 1782874 MT-SL
======> CPIC-CALL: 'ThSAPCMRCV', communication rc: CM_DEALLOCATED_ABEND (cmRc=17), taskhandler rc: READ_FROM_GW_FAILED (thRc=239)
Error with SAP gateway communication; check if SAP gateway is closed
ABAP Programm: SAPMSSY1 (Transaction: )
Called function module: SADT_REST_RFC_ENDPOINT
User: DEVELOPER (Client: 001)
Destination: NPL_752_openSAP (Handle: 1, DtConId: 00000000000000000000000000000000, DtConCnt: 0, ConvId: 26968172,{871330C0-3514-11E8-C5C0-ECC27F000001})
EPP RootContextId: 00000000000000000000000000000000, ConnectionId: 871330C0351411E8C5C0ECC27F000001, ConnectionCnt: 17
EPP TransactionId:
SERVER> RFC Server Session (handle: 1, 26968172, {871330C0-3514-11E8-C5C0-ECC27F000001})
SERVER> Caller host:
SERVER> Caller transaction code: (Caller Program: SAPJCo31)
SERVER> Called function module: SADT_REST_RFC_ENDPOINT
Error RFCIO_ERROR_SYSERROR in /bas/753_REL/src/krn/rfc/abrfcpic.c : 3838
CPIC-CALL: 'ThSAPCMRCV', communication rc: CM_DEALLOCATED_ABEND (cmRc=17), taskhandler rc: READ_FROM_GW_FAILED (thRc=239)
Error with SAP gateway communication; check if SAP gateway is closed
Error RFCIO_ERROR_MESSAGE in /bas/753_REL/src/krn/rfc/abrfcio.c : 1985
Then I also looked into the Gateway trace and found a curious bit of info that seems to indicate that it's disconnecting its own public ("P") and local (L) network interfaces from each other:
***LOG Q0R=> GwReadFromRemGw, GwRead ( GwRead-006) [gwdp.c 4843]
Sat Mar 31 21:25:04:466 2018
***LOG Q0I=> NiIRead: P=51.136.32.17:3300; L=10.0.0.159:31255: recv (110: Connection timed out) [/bas/753_REL/src/base/ni/nixxi.cpp 5430]
*** ERROR => NiIRead: SiRecv failed for hdl 223/sock 27
(SI_ECONN_BROKEN/110; I4; ST; P=51.136.32.17:3300; L=10.0.0.159:31255) [nixxi.cpp 5430]
***LOG S23=> GwDisconnectClient, client disconnected (237) [gwxxrd.c 11191]
GwDisconnectClient: client 237 disconnected, hostname = <my_public_hostname>.westeurope.cloudapp.azure.com, addr = 51.136.32.17, tp = sapdp00
*****************************************************************************
*
* LOCATION SAP-Gateway on host vhcalnplci.dummy.nodomain / sapgw00
* ERROR connection to partner
* '<my_public_hostname>.westeurope.cloudapp.azure.com:sapgw00'
* broken
*
* TIME Sat Mar 31 21:25:04 2018
* RELEASE 753
* COMPONENT NI (network interface)
* VERSION 40
* RC -6
* MODULE /bas/753_REL/src/base/ni/nixxi.cpp
* LINE 5430
* DETAIL NiIRead: P=51.136.32.17:3300; L=10.0.0.159:31255
* SYSTEM CALL recv
* ERRNO 110
* ERRNO TEXT Connection timed out
* COUNTER 499
*
*****************************************************************************
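For anyone who wants to collect these gateway error boxes from several trace files and compare them, here is a small Python sketch that turns one box into a dictionary. The key list is simply what appears in the box above, and the function name is made up; the real trace format may contain other fields.

```python
KEYS = ["LOCATION", "ERROR", "TIME", "RELEASE", "COMPONENT", "VERSION", "RC",
        "MODULE", "LINE", "DETAIL", "SYSTEM CALL", "ERRNO TEXT", "ERRNO",
        "COUNTER"]


def parse_error_box(text):
    """Parse the '* KEY value' lines of one gateway error box into a dict."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip anything outside the box, the star borders and '***LOG' lines.
        if not line.startswith("*") or line.startswith("**"):
            continue
        body = line[1:].strip()
        if not body:
            continue
        # Try longer keys first so 'ERRNO TEXT' is not mistaken for 'ERRNO'.
        for key in sorted(KEYS, key=len, reverse=True):
            if body == key or body.startswith(key + " "):
                fields[key] = body[len(key):].strip()
                last_key = key
                break
        else:
            # No known key: treat the line as a continuation of the last field.
            if last_key is not None:
                fields[last_key] = (fields[last_key] + " " + body).strip()
    return fields
```

Applied to the box above, the `ERRNO` field comes out as `110` and `ERRNO TEXT` as `Connection timed out`, so the failing `recv` calls can be counted or grouped by time without reading each trace by hand.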
Hi Mike,
thanks for raising this issue. We recently faced this problem ourselves and will investigate it as soon as possible. Thanks also for the detailed error messages; they help a lot.
We will keep you updated.
Regards, Felix
Just a short update: This is still in progress. Sorry that I have no better news for now...
In case this issue becomes serious on your side please feel free to create a customer incident with an appropriate priority (BC-DWB-AIE).
Hi Armin,
Thanks for the update, good to know it's not dead.
It's on my personal developer instances, so I can't raise incidents. P-users can't even see notes, let alone raise messages.
For anyone having this issue, my workarounds are: most non-coding and all readonly activities are fine on a local Eclipse. Everything is fine in my local SAPGUI. I can also work on eclipse using Remote Desktop to a Windows instance. The main impacts are screen resolution and small extra cost of running an additional instance.
Thanks,
Mike
Just letting you know Mike is not the only one with this kind of problem. And it is confirmed for another platform. I use Eclipse Oxygen on Ubuntu 16.04 LTS (Dell XPS 13 Developer Edition (9360) with mostly 'standard' software).
I have observed the same behaviour as Mike where I can remove the lock and successfully save. I have however also observed another problem which seems to be closely related. I am connected to a SAP system (in Azure) where the response times are generally *not* very good. Sometimes I get a lock error when there are no locks in SM12. In the error message it is indicated that the SAP server did not respond to the lock request. This occurs when I make a change in saved source code (I can't recall if I have seen the problem both with activated and inactive code).
However, the object is locked when I check SM12 after I have received the error message, so it seems the request to lock the object is processed successfully and the 'only' problem is that the response did not get back to eclipse in time.
Deleting the lock entry does not help in these cases, I have to exit eclipse (without being able to save the changes), restart eclipse and reimplement the changes. As you can probably imagine, this is incredibly frustrating.
Good luck with your efforts to squash all bugs related to this 🙂
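The late-response case described above (the lock is granted on the server, but the answer arrives after the client has given up) can be modelled with a toy example. Everything here is hypothetical: `FakeServer`, `acquire` and the timings are invented for illustration, and this is not how ADT works internally. In particular, real ADT locks are tied to a stateful session, which this sketch ignores.

```python
import concurrent.futures
import time


class FakeServer:
    """Hypothetical stand-in for a backend that grants locks slowly."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.locks = {}

    def lock(self, obj, user):
        time.sleep(self.delay_s)   # slow, but the lock is still granted
        self.locks[obj] = user
        return True

    def lock_owner(self, obj):
        return self.locks.get(obj)


def acquire(server, obj, user, timeout_s, grace_s):
    """Request a lock; after a client timeout, check the server state first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(server.lock, obj, user)
    try:
        future.result(timeout=timeout_s)
        return "locked"
    except concurrent.futures.TimeoutError:
        # Do not report failure yet: the request may still succeed remotely.
        deadline = time.monotonic() + grace_s
        while time.monotonic() < deadline:
            if server.lock_owner(obj) == user:
                return "locked-late"
            time.sleep(0.01)
        return "unknown"
    finally:
        pool.shutdown(wait=False)
```

With a server delay of 0.2 s, a client timeout of 0.05 s and a grace period of 1 s, `acquire` returns `"locked-late"` instead of an error, which is the outcome a client would want in the situation described here.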
Interesting you mention Azure response times. I used AWS before, also via local Eclipse. I switched to Azure because it's so much nicer to work with and have also noticed a somewhat sluggish response time.
Unfortunately I can't tell if it's the platform as I went from 7.2 on AWS to 7.4 on Azure with a totally different machine spec. When I remote desktop to the Windows VM and access SAP that way it seems fine, but then sluggish responses may have to do with this issue.
Computers controlled by someone else should never be trusted, but unfortunately it's not my decision to make 🙂
Update: I have discovered that if I just continue waiting, a lock entry will eventually appear when I get the error message I was previously not able to escape from without restarting eclipse. So apparently the difference between what you observe and what I observe is just due to the sluggishness of the SAP system I am connected to. When I wait for the lock entry and delete it manually I am able to save my changes. Still annoying of course, but at least I no longer have to have a text editor open to temporarily store the changes.
Hi Kjetil and Mike,
thanks a lot for the additional input. We found a way to reliably reproduce the issue in our own dev systems and we made progress on the analysis. Since it's not just ADT we have to involve another software layer (JCo/RFC) in the analysis which might cost some additional time, but we are getting closer.
I'll provide an update when it's fixed.
Best regards,
Armin
Hi Mike,
I could imagine that the backend parameters are simply set incorrectly.
Here's the link to the backend-configuration guide (Section More information).
https://tools.hana.ondemand.com/#abap
I would suggest you start first to check on those and see if it will fix the issue.
~Florian
Hi Florian,
Thanks for the input. I doubt it, but checked it anyway and found no discrepancies. Those are all "either it works or it doesn't" settings.
This happens intermittently on two Azure instances, a 7.5 and 7.52 system.
SAPGUI (even inside Eclipse) is rock solid
Running eclipse on a cloud instance is also no problem. The usual deployment setup is a Windows VM alongside a linux VM running SAP in a single virtual network. If I remote-desktop into the Windows VM, Eclipse also works flawlessly.
Over the internet it's as if ADT 'forgets' about the connection and then re-establishes a new session, because it manages to set a lock, which suddenly turns foreign.
Also, this used to work, I've been running this kind of setup for a long time without a hitch, it's in recent months that I've been having this issue. I recently deployed a new instance and when the same thing happened with that one I finally got around to posting this.
"Over the internet it's as if ADT 'forgets' about the connection and then re-establishes a new session, because it manages to set a lock, which suddenly turns foreign."
I've experienced exactly this, connecting to a non-cloud instance via a VPN but with a shaky connection. Once the ADT/Instance connection gets confused, the only way I got things working again was to restart Eclipse. It's strange though, as I've only encountered this one time. It's like it only happens under very specific circumstances.