
Cloud Foundry app autoscaler service: Not scaling up

sreehari_vpillai
Active Contributor

Hi,

I have the configuration below for the Autoscaler service. Memory consumption of the instance has been above 200 MB for more than 60 seconds, but it's not creating another instance.

{
    "instance_min_count": 1,
    "instance_max_count": 3,
    "scaling_rules": [{
        "metric_type": "cpu",
        "threshold": 70,
        "stat_window_secs": 60,
        "breach_duration_secs": 60,
        "cool_down_secs": 60,
        "operator": ">",
        "adjustment": "+1"
    }, {
        "metric_type": "cpu",
        "threshold": 70,
        "stat_window_secs": 60,
        "breach_duration_secs": 60,
        "cool_down_secs": 60,
        "operator": "<=",
        "adjustment": "-1"
    }, {
        "metric_type": "memoryused",
        "stat_window_secs": 60,
        "breach_duration_secs": 60,
        "cool_down_secs": 60,
        "threshold": 200,
        "operator": ">=",
        "adjustment": "+1"
    }, {
        "metric_type": "memoryused",
        "threshold": 150,
        "stat_window_secs": 60,
        "breach_duration_secs": 60,
        "cool_down_secs": 60,
        "operator": "<",
        "adjustment": "-1"
    }]
}

(Code embed doesn't work as expected, hence the sc.)

But if I remove the rules with metric_type cpu (the first 2 blocks of scaling_rules), the application scales properly. What am I doing wrong?

My expectation is: Autoscaler must create an instance when CPU consumption is more than 70% OR memory used is more than 200 MB for the past 60 seconds.

Sreehari

Accepted Solutions (1)

Ivan-Mirisola
Product and Topic Expert

Hi sreehari.vpillai,

My colleague Muhammad Arsalan Khan explains that the Autoscaler service evaluates the rules in order and stops at the first rule that evaluates to true; the remaining rules are not evaluated. So it is good practice to list your most important rules first.

A simpler way would be to use a different metric altogether that also represents your app's load. The 'responsetime' metric can be a good measurement of application load: the more load you put on your application, the more CPU and memory it implicitly consumes.

I believe what you are trying to achieve follows the same rationale used in load balancers, which take both measures into account and combine them into a single number that indicates where load should be directed. Usually it is something along the lines of the following formula:

CustomLoad = (Weight_CPU * CPU_Load) + (Weight_Memory * Memory_Usage)

Here the weights can be adapted to your specific application. If your application is very CPU intensive, you could use 7 as the weight for CPU load and 3 as the weight for memory. But that is something you will have to decide case by case, as it depends heavily on how your application is built and how you think load should be measured.
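As a rough illustration of the formula above, here is a minimal Python sketch. The function name `custom_load` and the choice to express both inputs as percentages are assumptions made for this example; the thread does not specify units.

```python
def custom_load(cpu_load: float, memory_usage: float,
                weight_cpu: float = 7, weight_memory: float = 3) -> float:
    """CustomLoad = (Weight_CPU * CPU_Load) + (Weight_Memory * Memory_Usage).

    Assumption: both inputs are percentages (0-100) so they are comparable.
    """
    return weight_cpu * cpu_load + weight_memory * memory_usage


# Example: 90% CPU, 40% memory, with the 7/3 weights mentioned above.
load = custom_load(90.0, 40.0)
print(load)  # 750.0
```

With weights 7 and 3 the result ranges from 0 to 1000, so any threshold in a scaling rule would have to be chosen on that scale. Dividing by the sum of the weights would bring it back to 0-100 if that is easier to reason about.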

If that is what you need, you could implement it as a custom metric:

https://help.sap.com/docs/application-autoscaler/application-autoscaler/defining-custom-metric?local...

The following blog explains how you can achieve this:

https://blogs.sap.com/2023/03/28/scale-your-applications-using-custom-metrics-feature-of-application...

To keep things simple, the blog doesn't show how real metrics are collected in the application and sent to the Autoscaler service. Instead it simply sends a "fake" metric, just to test the autoscaling.

But this is not very difficult to achieve. All you have to do is periodically gather your app's CPU and memory metrics, perform the calculation described above, and make a REST call to the Autoscaler API, sending that data as a JSON payload. For example, if you are using Java and Spring Boot, you could use the Micrometer Observation API as described in the following documentation:

https://spring.io/blog/2022/10/12/observability-with-spring-boot-3
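The "REST call with a JSON payload" step could look roughly like the Python sketch below. Treat every specific here as an assumption to verify against the custom metrics documentation linked above: the `/v1/apps/<app_guid>/metrics` path, the payload field names, basic authentication, and the integer metric value. The helper name `build_metric_request` and the example host are hypothetical. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_metric_request(base_url, app_guid, username, password,
                         instance_index, metric_name, value, unit=""):
    """Build (but do not send) a POST request carrying one custom metric.

    Assumptions, not confirmed by this thread: endpoint path, payload
    shape, and basic auth with credentials from the service binding.
    """
    payload = {
        "instance_index": instance_index,
        # Value is rounded to an integer as a conservative assumption.
        "metrics": [{"name": metric_name, "value": int(round(value)), "unit": unit}],
    }
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/v1/apps/{app_guid}/metrics",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical values; real ones would come from the app's service binding.
req = build_metric_request("https://autoscaler.example.com", "my-app-guid",
                           "user", "secret", 0, "custom_load", 750.0)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` from a periodic task would send it. The metric name must match the `metric_type` used in the scaling policy.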

Hope this helps.

Best regards,
Ivan

sreehari_vpillai
Active Contributor

Well answered and clear. I was under the impression that all the rules would be evaluated with an OR condition. The reason for that was: I set a throughput rule after removing cpu from the rules. On a memory spike it created an instance, and as the number of requests was lower than what I had set, it deleted an instance later.
Now I am clear about it. Thanks a lot, I really appreciate you writing such a lengthy explanation.

sreehari_vpillai
Active Contributor

Also, I will write a blog on this: deriving custom metrics and defining the rules.

Ivan-Mirisola
Product and Topic Expert

Hi sreehari.vpillai,

Just a small correction. My colleague Muhammad Arsalan Khan has stated that the first, topmost rule that matches is the one that gets executed. Therefore, it is good practice to keep your important rules at the top. I will correct the comment in my answer so people won't be misled into thinking that the Autoscaler only acts when both metrics evaluate to true.

Best regards,
Ivan

arsalankhan2021
Product and Topic Expert

Thanks Ivan Mirisola for bringing up the custom metrics feature of Autoscaler. Also, the approach to determine the app's load is interesting.

sreehari_vpillai
Active Contributor

ivan.mirisola Shall I extend the original question with this? I have a rule to scale up by one instance if memory consumption is above 250 MB, and to scale down by one if memory usage is below 180 MB.
The server is now loaded and memory usage has been 280 MB for 60 seconds. Autoscaler created an additional instance. That's good news!

A user has sent a request to the server and the newly created instance picked it up. It's in progress (it takes 2 minutes to complete, but is not memory intensive). By this time, the first instance has become free and memory usage has been 170 MB for 60 seconds. Now it's time to scale by -1. What will happen to the running process in that instance? Will it crash? Or are there special considerations before the instance is scaled down?

Ivan-Mirisola
Product and Topic Expert

Hi sreehari.vpillai,

The Autoscaler should not interfere with the way your application stops gracefully. It will send a shutdown request to your application, and once the application has shut down properly, its instance will be removed.

It is up to you as a developer to code your app so that it shuts down only after all in-flight work has completed and the application is idle. How you do that depends on your stack. For example, on Spring Boot 2.3 and later you must enable graceful shutdown explicitly, because the default is immediate shutdown:

https://www.baeldung.com/spring-boot-web-server-shutdown
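The general idea behind graceful shutdown, independent of Spring Boot, can be sketched in Python as below. The class name `GracefulWorker` and its methods are hypothetical, and the sketch assumes the platform signals shutdown with SIGTERM. As far as I know, Cloud Foundry allows only about 10 seconds between SIGTERM and a forced kill by default, so a 2-minute job may not fit into that window; check this for your landscape.

```python
import signal
import threading


class GracefulWorker:
    """Tracks in-flight jobs and refuses new ones once shutdown is requested."""

    def __init__(self):
        self.accepting = True
        self._in_flight = 0
        self._cond = threading.Condition()

    def start_job(self) -> bool:
        # Returns False when the instance is draining and must not take work.
        with self._cond:
            if not self.accepting:
                return False
            self._in_flight += 1
            return True

    def finish_job(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def request_shutdown(self, signum=None, frame=None) -> None:
        # Signal-handler friendly: only flips a flag, takes no lock.
        self.accepting = False

    def wait_until_idle(self, timeout: float) -> bool:
        # True if all jobs finished within the timeout, False otherwise.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)


def install_sigterm_handler(worker: GracefulWorker) -> None:
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, worker.request_shutdown)
```

An app would call `install_sigterm_handler(worker)` at startup, wrap each request in `start_job()` / `finish_job()`, and exit only after `wait_until_idle(...)` returns.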

Best regards,
Ivan

Answers (1)

arsalankhan2021
Product and Topic Expert

Hi Sreehari,

Autoscaler considers scaling rules in the order in which they are defined in the policy, i.e., the first and topmost rule that matches will be executed.

Considering the example given above:

If scaling succeeds based on the CPU metric (first two rules), the memoryused rules will be skipped.
It is also recommended to put your most important scaling rules first.

Using the Application Autoscaler Dashboard, you can also monitor the scaling history, which helps you troubleshoot and resolve issues if the scaling doesn't match the defined policy.
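The first-match behaviour described above can be modelled with a short Python sketch. This is only a model of what is stated in this thread, not the Autoscaler's actual implementation, and it ignores breach duration, cool-down and instance limits. The helper `first_matching_rule` is hypothetical. The rules are the ones from the question.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Rules from the policy in the question, in their original order.
POLICY_RULES = [
    {"metric_type": "cpu", "threshold": 70, "operator": ">", "adjustment": "+1"},
    {"metric_type": "cpu", "threshold": 70, "operator": "<=", "adjustment": "-1"},
    {"metric_type": "memoryused", "threshold": 200, "operator": ">=", "adjustment": "+1"},
    {"metric_type": "memoryused", "threshold": 150, "operator": "<", "adjustment": "-1"},
]


def first_matching_rule(rules, metrics):
    """Return the first rule whose condition holds, or None (simplified model)."""
    for rule in rules:
        value = metrics.get(rule["metric_type"])
        if value is not None and OPS[rule["operator"]](value, rule["threshold"]):
            return rule
    return None


# Low CPU, high memory: the situation described in the question.
metrics = {"cpu": 20, "memoryused": 250}

matched = first_matching_rule(POLICY_RULES, metrics)
print(matched["metric_type"], matched["adjustment"])  # cpu -1

# With the two cpu rules removed, the memory rule is reached.
matched = first_matching_rule(POLICY_RULES[2:], metrics)
print(matched["metric_type"], matched["adjustment"])  # memoryused +1
```

Under this model the two cpu rules (`> 70` and `<= 70`) cover every possible CPU value, so one of them always matches first and the memoryused rules are never reached. That would be consistent with the app scaling properly once the cpu rules are removed.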

Regards,
Arsalan

sreehari_vpillai
Active Contributor

I wasn't clear about how a logical AND is applied when multiple conditions are specified.

BrijeshVyas
Newcomer
Hi all, I am not sure whether any help on this is already out there. We are having an issue when we use the metric types throughput and response time. The SAP backend executes a program and sends multiple transactions at once using threading. We configured the Autoscaler to add one instance if CPU is above 80 for a given period of time (this works fine), and added a rule to remove the additional instance if we have had no response for more than, say, 5-10 minutes. It removes the instance even though jobs are still running in the backend, and terminates all backend transactions.