3Scale 2.12 monitoring – Definitive walkthrough guide!

In this lab we’ll use 3Scale 2.12 as our API Manager solution and see how easily it integrates with the monitoring stack provided by Red Hat OpenShift. We will focus on 3Scale passive and active monitoring with the help of Prometheus Alertmanager.
This demo should also work, without changes, with 3Scale 2.11.

OpenShift Container Platform monitoring overview

Prerequisites

For this exercise, we’ll need:

  • a valid Red Hat 3Scale subscription and registry.redhat.io pull secret
  • an OpenShift 4.10 environment with the cluster-admin role
  • 3Scale 2.11 or later already installed
  • 3Scale APICast deployed into additional namespaces (optional)
  • a good cup of coffee or latte!

Export variables

Set all environment variables that we’ll be using during this lab.

export OPENSHIFT_CONSOLE_PATH=console-openshift-console
export WILDCARD_DOMAIN=$(oc get route console -n openshift-console | awk '{ getline; print $2; }' | sed 's/'$OPENSHIFT_CONSOLE_PATH'\.//g'); echo $WILDCARD_DOMAIN
export MONITORING_NS=openshift-monitoring
export USER_WORKLOAD_MONITORING_NS=openshift-user-workload-monitoring
export MANAGER_MONITORING_NS=3scale-monitoring
export MANAGER_NS=3scale
export MANAGER_NAME=$(oc get APIManager -n $MANAGER_NS | awk 'NR>1{print $1}')
export APICAST_NSS=("3scale-apicast-intranet" "3scale-apicast-internet" "3scale-apicast-mobile")

Notice that we will be using three different namespaces to deploy additional APICast instances that can serve as dedicated proxies for different application needs. You can edit these namespaces if you want; if they don’t exist yet in your cluster, see the optional sketch below.
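
If the additional APICast namespaces do not exist yet (they are optional, per the prerequisites), a minimal sketch to create them, assuming the APICAST_NSS array exported above, could be:

# Sketch: create the additional APICast namespaces only if they are missing
for ns in "${APICAST_NSS[@]}"; do
  oc get namespace "$ns" >/dev/null 2>&1 || oc new-project "$ns" --display-name="$ns"
done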

OpenShift login

Access your OpenShift environment from the terminal. Example:

export USERNAME=john ; oc login --username=$USERNAME --server=https://api.cluster-xyz.xyz.domain.com:6443

Monitoring namespace provisioning

Create all namespaces that we’ll be using:

oc new-project $MANAGER_MONITORING_NS --description="3scale monitoring stack" --display-name=$MANAGER_MONITORING_NS
oc label namespace $MANAGER_MONITORING_NS node.k8s.redhat/service=3scale-monitoring
oc label namespace $MANAGER_MONITORING_NS node.k8s.redhat/site=north
oc label namespace $MANAGER_MONITORING_NS monitoring-key=middleware

In this lab, I chose to apply a few labels to $MANAGER_MONITORING_NS:
node.k8s.redhat/service and node.k8s.redhat/site.
For monitoring purposes, I’ll also use the monitoring-key label with the value middleware to support monitoring requirements later in this exercise.
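
To confirm the labels were applied, a quick check is:

oc get namespace $MANAGER_MONITORING_NS --show-labels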

Enable user workload monitoring

To enable the 3Scale Prometheus alert rules, we need to create a cluster-monitoring-config ConfigMap inside the openshift-monitoring namespace. You can find more information and detailed configuration options in the OpenShift documentation.

First, check whether the cluster-monitoring-config ConfigMap already exists and enables OpenShift to monitor all user workloads. The following command prints the console URL where you can inspect it:

echo '\nhttps://'console.apps.$WILDCARD_DOMAIN/k8s/ns/openshift-monitoring/configmaps/cluster-monitoring-config'\n'

If you don’t see this ConfigMap, you’ll need to create a new one:

cat > openshift-cluster-monitoring-configmap.yaml <<EOL
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
EOL
oc apply -f openshift-cluster-monitoring-configmap.yaml

Or, if the ConfigMap already exists, you can simply update it by adding the key:

enableUserWorkload: true
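
A minimal sketch for patching the key in place (this assumes the existing config.yaml contains no other settings you need to keep, because a merge patch replaces the entire config.yaml value):

# Sketch: replaces the whole config.yaml value with just enableUserWorkload
oc patch configmap cluster-monitoring-config -n openshift-monitoring \
  --type merge -p '{"data":{"config.yaml":"enableUserWorkload: true\n"}}'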

note: When you save your changes to the cluster-monitoring-config ConfigMap object, some pods in the openshift-monitoring project might be redeployed. It can sometimes take a while for these components to redeploy.

note: If you have any issues with PODs failing to schedule onto OpenShift nodes, you might want to remove the tolerations (and nodeSelector) entries from the cluster-monitoring-config config-map so that all PODs are scheduled onto worker nodes instead of infra nodes.
Ensure that all PODs are running before continuing with this lab.

watch oc get po -n openshift-monitoring
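
Once user workload monitoring is enabled, Prometheus pods are also started in the openshift-user-workload-monitoring namespace; you can watch them with the variable exported earlier:

watch oc get po -n $USER_WORKLOAD_MONITORING_NS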

Expose metrics endpoint for 3Scale components

By default, these 3Scale system components do not expose their /metrics endpoints. The following steps are required to properly monitor them with Prometheus and Grafana.

oc patch svc/system-developer -p '{"spec":{"ports":[{"name":"metrics","port":9394,"protocol":"TCP","targetPort":9394}]}}' -n $MANAGER_NS
oc patch svc/system-master -p '{"spec":{"ports":[{"name":"metrics","port":9395,"protocol":"TCP","targetPort":9395}]}}' -n $MANAGER_NS
oc patch svc/system-provider -p '{"spec":{"ports":[{"name":"metrics","port":9396,"protocol":"TCP","targetPort":9396}]}}' -n $MANAGER_NS
oc expose dc system-sidekiq --name=metrics-system-sidekiq --port=9394 -n $MANAGER_NS
oc patch svc/metrics-system-sidekiq -p '{"spec":{"ports":[{"name":"metrics","port":9394,"protocol":"TCP","targetPort":"metrics"}]}}' -n $MANAGER_NS

After that, you might want to test those endpoints with an HTTP call. For example, run the following command in the terminal to check the Prometheus metrics of the system-developer container:

SYSTEM_APP_POD=$(oc get po -l deploymentconfig=system-app -n $MANAGER_NS | awk 'NR==2 {print $1}') ; oc exec $SYSTEM_APP_POD -n $MANAGER_NS -c system-developer -- curl -k -vvv http://system-developer:9394/metrics

Expose metrics endpoint for 3Scale APICast component

By default, Prometheus can scrape all the metrics listed in section 9.2, Prometheus Default Metrics for 3scale APIcast, of the 3Scale documentation.

note: If a metric doesn’t have a current value, it may be omitted and Prometheus will not show any information for it.
Example:
if the nginx_error_log metric has no errors to report, Prometheus reserves the right to not display it.

APICast OLM managed

# including 3scale namespace also, as it holds APICast instances too
export APICAST_NSS=("3scale" "3scale-apicast-intranet" "3scale-apicast-internet" "3scale-apicast-mobile")

echo "Updating APICast services. This operation could take a while. \n---------------------------------------------------------------"; \
for x in "${APICAST_NSS[@]}"; do \
if [ "$x" = "$MANAGER_NS" ]; then \
echo ":: 3Scale default APICast"
STG_GTW=$(oc get dc -l threescale_component=apicast -l threescale_component_element=staging -n $x | awk 'NR==2 {print $1}'); \
oc set env dc/$STG_GTW APICAST_EXTENDED_METRICS=true -n $x; \
PRD_GTW=$(oc get dc -l threescale_component=apicast -l threescale_component_element=production -n $x | awk 'NR==2 {print $1}'); \
oc set env dc/$PRD_GTW APICAST_EXTENDED_METRICS=true -n $x; \
sleep 2; \
oc delete svc metrics-staging-apicast -n $x; \
oc delete svc metrics-production-apicast -n $x; \
oc expose dc/apicast-staging --name=metrics-staging-apicast --port=9421 -n $x; \
oc expose dc/apicast-production --name=metrics-production-apicast --port=9421 -n $x; \
else
echo ":: 3Scale APICast at $x namespace"
oc label namespace $x monitoring=apicast; \
oc patch APICast/staging-apicast --type merge -p '{"spec":{"extendedMetrics":true}}' -n $x; \
oc patch APICast/production-apicast --type merge -p '{"spec":{"extendedMetrics":true}}' -n $x; \
sleep 2; \
oc delete svc metrics-staging-apicast -n $x; \
oc delete svc metrics-production-apicast -n $x; \
oc expose deployment apicast-staging-apicast --name=metrics-staging-apicast --port=9421 -n $x; \
oc expose deployment apicast-production-apicast --name=metrics-production-apicast --port=9421 -n $x; \
fi
sleep 2; \
oc label svc metrics-staging-apicast monitoring=apicast -n $x; \
oc patch svc/metrics-staging-apicast -p '{"spec":{"ports":[{"name":"metrics","port":9421,"protocol":"TCP","targetPort":"metrics"}]}}' -n $x; \
oc label svc metrics-production-apicast monitoring=apicast -n $x; \
oc patch svc/metrics-production-apicast -p '{"spec":{"ports":[{"name":"metrics","port":9421,"protocol":"TCP","targetPort":"metrics"}]}}' -n $x; \
echo "$x APICast instances successfully updated! \n\n"; \
done

The APICAST_EXTENDED_METRICS environment variable is a boolean that enables the exposure of additional metrics to Prometheus. The following metrics use the labels service_id and service_system_name to provide additional information from APICast (see the sample query after the list):

  • total_response_time_seconds
  • upstream_response_time_seconds
  • upstream_status
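
For illustration, a hedged PromQL example using one of these metrics; the aggregation below assumes upstream_response_time_seconds is exposed as a histogram, and it is only a sketch to paste into the console query browser:

# p95 upstream latency per 3scale service over the last 5 minutes (sketch)
histogram_quantile(0.95,
  sum(rate(upstream_response_time_seconds_bucket[5m])) by (le, service_id, service_system_name))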

Deploy a ServiceMonitor Custom Resource for external APICast namespaces

# remove 3scale as it already has a service monitor
export APICAST_NSS=("3scale-apicast-intranet" "3scale-apicast-internet" "3scale-apicast-mobile")

echo "Deploying APICast ServiceMonitor. This operation could take a while. \n---------------------------------------------------------------"; \
for x in "${APICAST_NSS[@]}"; do \
oc delete servicemonitor prometheus-apicast-monitor -n $x; \
cat > apicast-service-monitor.yaml <<EOL
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-apicast-monitor
  name: prometheus-apicast-monitor
  namespace: $x
spec:
  namespaceSelector:
    matchNames:
    - $x
  selector:
    matchLabels:
      app: apicast
      monitoring: apicast
  endpoints:
  - interval: 5s
    path: /metrics
    port: metrics
    scheme: http
EOL
oc apply -f apicast-service-monitor.yaml -n $x; \
echo "$x ServiceMonitor successfully deployed! \n\n"; \
done

Optional: Test APICast metrics endpoint

If you want to try the metrics endpoints, just run the following block in the terminal:

# including 3scale namespace also, as it holds APICast instances too
export APICAST_NSS=("3scale" "3scale-apicast-intranet" "3scale-apicast-internet" "3scale-apicast-mobile")

echo "Testing APICast Prometheus endpoints. This operation could take a while. \n------------------------------------------------------------------------"; \
for x in "${APICAST_NSS[@]}"; do \
echo ":: $x APICast (env=staging)"; \
APICAST_STG_POD=$(oc get po -l deploymentconfig=apicast-staging -n $x | awk 'NR==2 {print $1}') ; oc exec $APICAST_STG_POD -n $x -c apicast-staging -- curl -k -vvv http://localhost:9421/metrics
echo "\n"; \
echo ":: $x APICast (env=production)"; \
APICAST_PRD_POD=$(oc get po -l deploymentconfig=apicast-production -n $x | awk 'NR==2 {print $1}') ; oc exec $APICAST_PRD_POD -n $x -c apicast-production -- curl -k -vvv http://localhost:9421/metrics
echo "\n"; \
done
echo "End of execution."

Deploy Grafana

Grafana can be provisioned in the same namespace as 3Scale or in a separate one.
The main difference is that, when provisioned in the same namespace as 3Scale (Kind: APIManager), the 3Scale operator also deploys all 3Scale Grafana Dashboards, provided a Grafana instance is already installed there.
Other users might prefer to keep the monitoring stack in another namespace for security purposes. In that scenario, the 3Scale dashboards must be deployed manually; in fact, the 3Scale operator cannot currently deploy Grafana Dashboards into another namespace.
For this exercise we will deploy Grafana into the $MANAGER_MONITORING_NS namespace.

Deploying Grafana in a dedicated namespace

Navigate to $MANAGER_MONITORING_NS, open the OperatorHub tab in the Administrator view, and search for the Grafana operator.

Proceed with the operator installation as follows:

PS. remember to point the installation to $MANAGER_MONITORING_NS.
Then, wait for the operator to be fully deployed:

Deploy Grafana application

With the operator installed, review the following YAML file and proceed with the Grafana deployment.

cat > 3scale-grafana-resource.yaml <<EOL
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: apm-grafana
spec:
  client:
    preferService: false
    timeout: 10
  config:
    auth:
      disable_login_form: false
      disable_signout_menu: true
    auth.anonymous:
      enabled: true
    log:
      level: warn
      mode: console
    security:
      admin_password: secret
      admin_user: root
  dashboardLabelSelector:
    - matchExpressions:
        - key: app
          operator: In
          values:
            - 3scale-api-management
  ingress:
    enabled: false
  service:
    labels:
      app: grafana
      type: grafana-service
    name: grafana-service
EOL

note: The Grafana admin credentials will be root and secret as defined in the previous yaml file.

note: Grafana will only match and load Dashboards that use the app label with the value 3scale-api-management.
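
Later in this lab, once the dashboards have been imported, you can list the ones carrying that label to confirm the selector matches them:

oc get GrafanaDashboard -l app=3scale-api-management -n $MANAGER_MONITORING_NS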

Execute the following to deploy Grafana:

oc create -f 3scale-grafana-resource.yaml -n $MANAGER_MONITORING_NS

oc create route edge grafana --service=grafana-service --port=grafana --hostname=grafana-3scale-monitoring.apps.$WILDCARD_DOMAIN --insecure-policy=Allow --wildcard-policy=None -n $MANAGER_MONITORING_NS

oc describe grafana apm-grafana -n $MANAGER_MONITORING_NS

After that, you can watch the Grafana deployment in the Grafana tab of the operator:

Update Grafana service account with OpenShift roles

The Grafana service account needs cluster authorization to view objects.

oc get sa -n $MANAGER_MONITORING_NS ; oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount -n $MANAGER_MONITORING_NS

Create a new DataSource

Acquire the Grafana service account token to be used in the GrafanaDataSource custom resource:

oc serviceaccounts get-token grafana-serviceaccount -n $MANAGER_MONITORING_NS ; GRAFANA_TOKEN=$(oc serviceaccounts get-token grafana-serviceaccount -n $MANAGER_MONITORING_NS)
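
On newer OpenShift versions, where oc serviceaccounts get-token is deprecated, a time-bound token can be requested instead (a sketch; not needed on 4.10):

GRAFANA_TOKEN=$(oc create token grafana-serviceaccount -n $MANAGER_MONITORING_NS --duration=8760h)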

note: you can use BASIC authentication as well, but it is less secure.

Create and deploy GrafanaDataSource custom resource

cat > 3scale-grafana-ds.yaml <<EOL
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
  namespace: $MANAGER_MONITORING_NS
spec:
  datasources:
    - access: proxy
      editable: true
      isDefault: true
      jsonData:
        httpHeaderName1: Authorization
        timeInterval: 5s
        tlsSkipVerify: true
      name: Prometheus
      secureJsonData:
        httpHeaderValue1: >-
          Bearer
          $GRAFANA_TOKEN
      type: prometheus
      url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
  name: middleware
EOL
oc apply -f 3scale-grafana-ds.yaml -n $MANAGER_MONITORING_NS

note: The thanos-querier URL should be the same for all OpenShift 4.10 clusters. It is also possible to use the external route of thanos-querier if the internal one is not an option. You can acquire the external route with the following command:

echo "https://$(oc get route thanos-querier -n openshift-monitoring | awk 'NR==2 {print $2}')/api"

Enable 3Scale metrics

Now that Grafana is deployed and running, let’s continue by enabling 3Scale metrics. This task flags the 3Scale operator to automatically deploy and install all provided Dashboards. The APIManager custom resource has two relevant attributes:

  • monitoring.enabled
  • monitoring.enablePrometheusRules

The first installs all Grafana Dashboards, and the second creates all the alerts that will be managed by the Prometheus Alertmanager. Because we will not use the 3Scale namespace for monitoring, we will need to re-import all Dashboards into $MANAGER_MONITORING_NS.

oc get servicemonitor -n $MANAGER_NS ; cat > 3scale-monitoring.yaml <<EOL
apiVersion: apps.3scale.net/v1alpha1
kind: APIManager
metadata:
  name: $MANAGER_NAME
spec:
  wildcardDomain: apps.$WILDCARD_DOMAIN
  monitoring:
    enabled: true
    enablePrometheusRules: true
EOL
oc apply -f 3scale-monitoring.yaml -n $MANAGER_NS ; watch -n1 oc get GrafanaDashboard -n $MANAGER_NS

You should see all created resources in the terminal.

Import Grafana Dashboards to a custom monitoring namespace

As we will use another namespace for monitoring resources, we need to export the Dashboards from $MANAGER_NS and create them in $MANAGER_MONITORING_NS:

export GRAFANA_DASHBOARDS=($(oc get GrafanaDashboard -n $MANAGER_NS | awk 'NR>1 {print $1}' | sed ':a; N; $!ba; s/\n/,/g' | tr ',' '\n'));

echo "Importing 3Scale Grafana Dashboards. This operation could take a while. \n-----------------------------------------------------------------------"; \
for x in "${GRAFANA_DASHBOARDS[@]}"; do \
oc get GrafanaDashboard $x -n $MANAGER_NS -o yaml | sed 's/namespace: '"$MANAGER_NS"'/namespace: '"$MANAGER_MONITORING_NS"'/g' > ${x}_to-import.yaml; \
sed -i '/  generation:/d' ${x}_to-import.yaml; \
sed -i '/  resourceVersion:/d' ${x}_to-import.yaml; \
sed -i '/  uid:/d' ${x}_to-import.yaml; \
sed -i '/  creationTimestamp:/d' ${x}_to-import.yaml; \
sed -i '/^  ownerReferences:/{N;N;N;N;N;d}' ${x}_to-import.yaml; \
oc delete GrafanaDashboard $x -n $MANAGER_MONITORING_NS; \
sleep 1; \
oc create -f ${x}_to-import.yaml; \
rm -fr ${x}_to-import.yaml; \
echo "$x GrafanaDashboard successfully imported! \n"; \
done

Now, all Dashboards are properly configured in the right namespace:

oc get GrafanaDashboard -n $MANAGER_MONITORING_NS

Open Grafana console

Acquire Grafana route:

GRAFANA_URL=$(oc get routes -n $MANAGER_MONITORING_NS | grep grafana | awk '{print $2}') ; echo '\nhttps://'$GRAFANA_URL'\n'

Then, go to the Manage menu item and check all dashboards:

note: To monitor the external APICast instances defined in $APICAST_NSS, you’ll need to manually add those namespaces to the GrafanaDashboard Custom Resources, since 3Scale only monitors the standard APICast instances. Find the following entries and change them from:

..
"options": [{
  "selected": true,
  "text": "3scale",
  "value": "3scale"
}]
..

to something like:

..
"options": [{
  "selected": true,
  "text": "3scale",
  "value": "3scale"
},{
  "selected": false,
  "text": "",
  "value": "3scale-apicast-intranet"
},{
  "selected": false,
  "text": "",
  "value": "3scale-apicast-internet"
},{
  "selected": false,
  "text": "",
  "value": "3scale-apicast-mobile"
}]
..

This affects the apicast-mainapp and apicast-services Dashboards; one way to edit them is shown below.
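
A minimal sketch, assuming the dashboards keep the names used by the 3Scale operator, is to edit the Custom Resources in place and adjust the options block of the namespace variable:

oc edit GrafanaDashboard apicast-mainapp -n $MANAGER_MONITORING_NS
oc edit GrafanaDashboard apicast-services -n $MANAGER_MONITORING_NS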

Now, Grafana can be used to monitor all provided metrics:

Also note that it is OK for the Grafana operator to show a reconciling status:

Working with Prometheus Alert-Manager

Navigating through the Developer view in OpenShift, it is possible to see all the custom 3Scale alerts that were created:

These alerts are managed by the 3Scale operator (threescale-operator-controller-manager-v2) and could change over time with future 3Scale releases.
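
You can also list the PrometheusRule objects that back these alerts directly from the terminal:

oc get prometheusrule -n $MANAGER_NS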

If you can’t see those alerts in OpenShift, you’ll need to force the 3Scale operator to re-deploy them by setting the monitoring.enablePrometheusRules attribute of the APIManager CR to false and then back to true:

Execute this first:

oc get servicemonitor -n $MANAGER_NS ; cat > 3scale-monitoring.yaml <<EOL
apiVersion: apps.3scale.net/v1alpha1
kind: APIManager
metadata:
  name: $MANAGER_NAME
spec:
  wildcardDomain: apps.$WILDCARD_DOMAIN
  monitoring:
    enabled: true
    enablePrometheusRules: false
EOL
oc apply -f 3scale-monitoring.yaml -n $MANAGER_NS

and then:

oc get servicemonitor -n $MANAGER_NS ; cat > 3scale-monitoring.yaml <<EOL
apiVersion: apps.3scale.net/v1alpha1
kind: APIManager
metadata:
  name: $MANAGER_NAME
spec:
  wildcardDomain: apps.$WILDCARD_DOMAIN
  monitoring:
    enabled: true
    enablePrometheusRules: true
EOL
oc apply -f 3scale-monitoring.yaml -n $MANAGER_NS

Send alerts to a destination

With Alertmanager it is possible to configure receivers with specific routing labels and alert severities so that alerts are sent to a destination. For example, you can send WARNING alerts to an email account and CRITICAL alerts to a Slack account (a hedged Slack sketch follows below).
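
For reference, a sketch of what a Slack receiver could look like in alertmanager.yaml; the webhook URL and channel are placeholders, and this block is not part of the email configuration built in the rest of this section:

receivers:
  - name: 3scale-critical-slack
    slack_configs:
      # placeholder webhook URL; create an incoming webhook in your Slack workspace
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#3scale-alerts'
        send_resolved: true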

To configure an email receiver using Gmail, export the following variables:

export SMTP_HOST="smtp.gmail.com:587"
export SMTP_MAIL_FROM="changeme@gmail.com"
export SMTP_MAIL_USER="changeme@gmail.com"
# set a gmail app password here
export SMTP_MAIL_PASS="changeme"

PS. remember to replace these with your own Gmail/SMTP account details. You can create a Gmail app password here.

First create a backup of your current alertmanager-main secret:

oc get secret alertmanager-main -n openshift-monitoring -o yaml > alertmanager-main.backup.yaml

Then, execute the following to configure the SMTP integration:

oc get secret alertmanager-main -n openshift-monitoring -o go-template='{{index .data "alertmanager.yaml"}}' | base64 -d > config.out

sed -i '/^\"global\"\:/a \  "smtp_require_tls": true' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_smarthost": "'$SMTP_HOST'"' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_hello": HOSTNAME' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_from": "'$SMTP_MAIL_FROM'"' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_auth_username": "'$SMTP_MAIL_FROM'"' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_auth_identity": "'$SMTP_MAIL_FROM'"' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_auth_password": "'$SMTP_MAIL_PASS'"' config.out ; cat config.out
sed -i '/^\"global\"\:/a \  "smtp_auth_secret": "'$SMTP_MAIL_PASS'"' config.out ; cat config.out

export ALERTMANAGER_CONFIG=$(cat config.out | base64) ; export ALERTMANAGER_CONFIG=$(echo "${ALERTMANAGER_CONFIG//[$'\t\r\n ']}") ; echo $ALERTMANAGER_CONFIG

oc delete secret alertmanager-main -n openshift-monitoring ; cat > alertmanager-main.yaml <<EOL
apiVersion: v1
data:
  alertmanager.yaml: $ALERTMANAGER_CONFIG
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: openshift-monitoring
  name: alertmanager-main
  namespace: openshift-monitoring
type: Opaque
EOL
oc create -f alertmanager-main.yaml -n openshift-monitoring

Then, force the reload of configurations:

oc delete po -l alertmanager=main -n openshift-monitoring

Check if the configuration was loaded successfully:

oc logs alertmanager-main-0 -c alertmanager -f -n openshift-monitoring

It should print something like:

>>> > level=info .. component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml

In OpenShift you can create a new Email receiver using the previously defined configuration.

Then, select the receiver type:

note: You can use the alertname, namespace, pod, and severity routing labels.

This is the final alertmanager-main secret configuration that I ended up with as a working example, using the Default and 3scale-critical receivers:

global:
  smtp_auth_secret: *****
  smtp_auth_password: *****
  smtp_auth_identity: *****@gmail.com
  smtp_auth_username: *****@gmail.com
  smtp_from: *****@gmail.com
  smtp_hello: HOSTNAME
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_require_tls: true
  resolve_timeout: 5m
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: 3scale-critical
    email_configs:
      - to: *****@gmail.com
        html: '3scale-critical receiver <br> {{ template "email.default.html" . }}'
  - name: Default
    email_configs:
      - to: *****@gmail.com
        html: 'openshift-default receiver <br> {{ template "email.default.html" . }}'
  - name: Watchdog
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - receiver: 3scale-critical
      match:
        namespace: 3scale
        severity: critical

Chaos testing 3Scale

Let’s unplug things to test the 3Scale alerts (the PrometheusRule Custom Resources)!
In this example, I chose to shut down the 3Scale operator and then simulate a crash of the backend-redis and zync-que components:

oc scale deployment threescale-operator-controller-manager-v2 --replicas=0 -n $MANAGER_NS

oc scale deploymentconfig backend-redis --replicas=0 -n $MANAGER_NS

oc scale deploymentconfig zync-que --replicas=0 -n $MANAGER_NS

Then, after a few seconds, the backend-worker POD will start to show CrashLoopBackOff errors, and in the Observe tab of the Developer view you should see the following alerts being fired:

Finally, the destination mailbox will start to receive all fired alerts:
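
When you are done with the test, a quick sketch to restore the scaled-down components (adjust the replica counts to whatever your environment used before; once the operator is back, it may also reconcile them itself):

oc scale deployment threescale-operator-controller-manager-v2 --replicas=1 -n $MANAGER_NS
oc scale deploymentconfig backend-redis --replicas=1 -n $MANAGER_NS
oc scale deploymentconfig zync-que --replicas=1 -n $MANAGER_NS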

Creating custom 3Scale alerts

Let’s say you need to create a custom alert for 3Scale. Even better: I’ve come up with additional 3Scale alerts that complement the existing ones, giving us extra coverage, mainly for APICast instances that are not the default ones.

So, let’s try adding those alerts to the existing stack (note that the $labels references are escaped as \$labels so the shell does not expand them inside the heredoc):

oc delete PrometheusRule threescale-custom-alerts -n $MANAGER_NS ; cat > threescale-custom-alerts.yaml <<EOL
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: 3scale-api-management
    prometheus: application-monitoring
    role: alert-rules
  name: threescale-custom-alerts
  namespace: $MANAGER_NS
spec:
  groups:
  - name: 3scale/custom.rules
    rules:
    - alert: ThreescaleApicastHttpAuthzErrorRate
      annotations:
        description: The number of requests with 401 or 403 status is bigger than 5%
          of total requests.
        summary: APICast high HTTP 4XX error rate (instance {{ \$labels.instance }})
      expr: sum(rate(apicast_status{namespace='$MANAGER_NS', status=~"401|403"}[1m])) / sum(rate(apicast_status{namespace='$MANAGER_NS'}[1m])) * 100 > 5
      for: 5m
      labels:
        severity: warning
    - alert: ThreescaleAPIManagerOperatorJobDown
      annotations:
        description: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
        summary: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
      expr: up{job="threescale-operator-controller-manager-metrics-service",namespace="$MANAGER_NS"} == 0
      for: 1m
      labels:
        severity: critical
    - alert: ThreescaleBackendListener5XXRequestsHigh
      annotations:
        description: Job {{ \$labels.job }} on {{ \$labels.namespace }} has more than
          2500 HTTP 5xx requests in the last 5 minutes
        summary: Job {{ \$labels.job }} on {{ \$labels.namespace }} has more than 2500
          HTTP 5xx requests in the last 5 minutes
      expr: sum(rate(apisonator_listener_response_codes{job=~"backend.*",namespace="$MANAGER_NS",resp_code="5xx"}[5m]))
        by (namespace,job,resp_code) > 2500
      for: 5m
      labels:
        severity: warning
    - alert: ThreescaleApicastIntranetJobDown
      annotations:
        description: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
        summary: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
      expr: up{job=~".*/apicast-production|.*/apicast-staging",namespace="3scale-apicast-intranet"}
        == 0
      for: 1m
      labels:
        severity: critical
    - alert: ThreescaleApicastInternetJobDown
      annotations:
        description: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
        summary: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
      expr: up{job=~".*/apicast-production|.*/apicast-staging",namespace="3scale-apicast-internet"}
        == 0
      for: 1m
      labels:
        severity: critical
    - alert: ThreescaleApicastMobileJobDown
      annotations:
        description: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
        summary: Job {{ \$labels.job }} on {{ \$labels.namespace }} is DOWN
      expr: up{job=~".*/apicast-production|.*/apicast-staging",namespace="3scale-apicast-mobile"}
        == 0
      for: 1m
      labels:
        severity: critical
EOL
oc create -f threescale-custom-alerts.yaml -n $MANAGER_NS

You should see all the created alerts in the Observe view, under the Alerts menu item:

Testing with thanos-querier

If you want to experiment with PromQL expressions, just use the console query browser (backed by thanos-querier):

oc get route -n openshift-console -l app=console | awk 'NR==2 {s1="https://"; s2="/monitoring/query-browser?query0="; print s1 $2 s2}'

Conclusion

In this article, we walked through 3Scale monitoring using the monitoring resources provided by OpenShift. You should now be able to see all the Grafana Dashboards, monitor your resources, and receive alerts through a preconfigured SMTP account.

Make sure to check out other scenarios and, along the way, configure different types of receivers!

And most importantly, have fun monitoring, and see you in the next coffee hack!
