Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Production issue

former_member683747
Participant
0 Likes
1,699

Hello,

If I receive a support issue that has to be solved within 2 hours or so, how can I go about solving it smartly? Can someone share their approach here?

4 REPLIES 4

joltdx
Active Contributor
1,596

Well, basically:

1. Find out what's wrong

2. Fix it

Whether it takes 2 hours or not of course depends both on what's wrong and on how to fix it. Best is of course to make a proper fix right away, even if that takes a day or a week. But depending on the severity and the business impact, one option is to deploy a quick fix into production, if possible, to get the business processes running again. If you do that, make sure the proper fix also follows as soon as possible.

Be responsible. Do the right thing.


Sandra_Rossi
Active Contributor
0 Likes
1,596

I believe that all these interview questions are there to see how you analyze the situation, not what you know. Because nobody can give a clear answer. First analyze the issue, ask if there's missing information, fix it yourself if you can, ask someone else if you can't. I'm pretty sure you were able to answer that.


amontella96
Active Contributor
1,596

Hi priya1221

I would like to extend sandra.rossi's answer with:

- keep the communication open with the incident owner

- if you cannot fix the issue AND nobody else has (yet), be proactive and start researching the issue in whatever sources you have (internet/Google, intranet/SharePoint sites)

- ask the people you have involved for updates

Lastly, cycle through the steps described above at regular intervals.

PS.: if it's a technical issue you might find inspiration studying RCA with SolMan

Good luck!


Colleen
Product and Topic Expert
1,596

Extending on Sandra's comment here.

SLAs are an agreement with the customer. If you are the incident responder, your focus is on resolving the incident (to the best of your abilities, it goes without saying). To help do this:

  1. Be familiar with your company's incident response process, in particular around decision makers and communication. You want to avoid confusion, as this takes away valuable time. You also want to avoid unnecessary communication such as meetings. Sometimes meetings are held as a reaction to a problem, to get everyone on the same page. I've seen an L3 support person reject a meeting request with a polite comment: they felt there would be a better outcome if they familiarized themselves with the history instead of sitting on a management call. It's a tricky balance and sometimes you don't get a say.
  2. Know your responsibility in the incident and how you can add value. Whatever your responsibility is: stick to it. The team works well when everyone knows their role. If there is a gap (even if you can help), check with the person in charge whether you should fill it. There may be a bigger picture that you are unaware of, but at the same time don't stay silent if you think there is a gap.
  3. Know who you escalate to or provide updates to.
  4. Understand a basic triage approach to the incident to help localise the problem for fast resolution.
  5. Look for opportunities to downgrade it to buy more time: 2 hours generally means a critical impact to the business with wide user impact. Can you find a workaround that buys more time to find the root cause and a solution? Doing this gets the users back up and running (their primary goal here) and then allows management to negotiate.
  6. Draw a line between incident investigation and root cause. Root cause analysis comes later. As part of incident investigation you may identify human error, misconfigurations, etc. Use that information to help fix it, but avoid any "why did this happen" conversations until the issue is resolved. I've been on critical incidents where project teams prioritized sifting through 6 months of email to prove it wasn't them. It contributed nothing to resolving the situation at hand and could have been done the following day.
  7. Fresh eyes on the incident: read the notes in the incident but don't assume the analysis is correct. You may need to repeat some of the validations already done by another support level.
  8. Good documentation: add useful and relevant information to the incident to make life easier for the next processor, or to refer to in future. It is so frustrating when you have an incident and find an identical one from 6 months earlier, and when you open it up there is nothing useful in it. That is a missed opportunity to leverage experience.
  9. Ask useful/smart questions: When did you first notice it? Are you aware of any recent changes? Is it just happening to you, or are your colleagues having the same problem? What are you observing? What steps did you follow (so you can attempt to replicate it with a test user)? These questions all depend on the situation, but the regular user of the functionality can provide really insightful information.
  10. Look in the ticket system for similar reported issues. The problem may have occurred previously, and some champion colleague may have left useful information on how to resolve it.
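
As a rough illustration of point 10, here is a minimal Python sketch of ranking past tickets by word overlap with a new incident summary. Everything in it is an assumption for illustration only: the ticket IDs, the summaries, the function names and the `min_score` threshold are hypothetical, and a real ticket system would normally offer its own search.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets: shared words / all words."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_tickets(new_summary, past_tickets, min_score=0.2):
    """Return (ticket_id, score) pairs at or above min_score, best match first.

    past_tickets is a hypothetical dict of ticket id -> summary text.
    """
    query = tokenize(new_summary)
    scored = []
    for ticket_id, summary in past_tickets.items():
        score = jaccard(query, tokenize(summary))
        if score >= min_score:
            scored.append((ticket_id, round(score, 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data, not taken from any real system.
past_tickets = {
    "INC-001": "Invoice posting fails with short dump in billing job",
    "INC-002": "User cannot log in after password reset",
    "INC-003": "Billing job short dump when posting invoice batch",
}

matches = find_similar_tickets(
    "Short dump in billing job while posting invoice", past_tickets
)
print(matches)
```

Plain word overlap is crude (it ignores synonyms and word order), but it is often enough to surface an old incident worth opening before you start the analysis from scratch.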

Good luck with learning it all. If you have an interview, all the best for success.