<返回更多

Android中ANR的分析和解决

2023-03-07    
加入收藏

一、认识ANR

1、 ANR的定义
ANR,是“Application Not Responding”的缩写,即“应用程序无响应”。如果你应用程序在UI线程被阻塞太长时间,就会出现ANR,通常出现ANR,系统会弹出一个提示提示框,让用户知道,该程序正在被阻塞,是否继续等待还是关闭。

2、ANR的类型

(1)KeyDispatchTimeout(常见)
input事件在5S内没有处理完成发生了ANR。
logcat日志关键字:Input event dispatching timed out

(2)BroadcastTimeout
前台Broadcast:onReceiver在10S内没有处理完成发生ANR。
后台Broadcast:onReceiver在60s内没有处理完成发生ANR。
logcat日志关键字:Timeout of broadcast BroadcastRecord

(3)ServiceTimeout
前台Service:onCreate,onStart,onBind等生命周期在20s内没有处理完成发生ANR。
后台Service:onCreate,onStart,onBind等生命周期在200s内没有处理完成发生ANR
logcat日志关键字:Timeout executing service

(4)ContentProviderTimeout
ContentProvider 在10S内没有处理完成发生ANR。
logcat日志关键字:timeout publishing content providers

3、ANR出现的原因

(1)主线程频繁进行耗时的IO操作:如数据库读写
(2)多线程操作的死锁,主线程被block;
(3)主线程被Binder 对端block;
(4)System Server中WatchDog出现ANR;
(5)service binder的连接达到上线无法和和System Server通信
(6)系统资源已耗尽(管道、CPU、IO)

4、如何分析ANR

(1)日志分析:ANR发生时都会在log中输出错误信息,从log中可以获得ANR的类型,CPU的使用情况,CPU使用率过高有可能是CPU饥饿导致了ANR。CPU使用率过低说明主线程被block了,如果IOwait高是因为主线程进行I/O操作造成的。

(2)traces文件分析:除了log输出外,你会发现各个应用进程和系统进程的函数堆栈信息都输出到了一个/data/anr/traces.txt的文件中,这个文件是分析ANR原因的关键文件.要获取到该文件可使用adb指令进行赋权后拉出查看调用stack。通过log、trace.text、代码结合分析ANR的成因(iowait?Memoryleak?Block?)

(3)traces文件无法分析的:不过还存在一些ANR问题,trace文件是分析不了的,例如我们的系统上,人脸识别活体攻击的时候,native算法耗尽cpu资源导致其他app无法抢占cpu时间片导致anr,假如ANR的app是你开发的,估计查到死也找不到问题所在,类似这类问题也写过简要的分析文章:

接下来我们将一步一步分析ANR,这个过程能让我们进一步明白如何找到问题、分析问题以及解决问题。

二、ANR发生时调用的关键代码

当ANR发生的时候,系统会调用如下相关的关键函数代码,来将系统当前的关键信息保存到日志当中:

framework/base/services/core/JAVA/com/Android/server/am/AppErrors.java

Java
class AppErrors {
    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
        SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);

        if (mService.mController != null) {
            try {
                // 0 == continue, -1 = kill process immediately
                int res = mService.mController.appEarlyNotResponding(
                        app.processName, app.pid, annotation);
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }

        long anrTime = SystemClock.uptimeMillis();
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
        }

        // Unless configured otherwise, swallow ANRs in background processes & kill the process.
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

        boolean isSilentANR;

        synchronized (mService) {
            // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
            if (mService.mShuttingDown) {
                Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.notResponding) {
                Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
                return;
            } else if (app.crashing) {
                Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.killedByAm) {
                Slog.i(TAG, "App already killed by AM skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.killed) {
                Slog.i(TAG, "Skipping died app ANR: " + app + " " + annotation);
                return;
            }

            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            app.notResponding = true;

            // 记录ANR到event log
            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                    app.processName, app.info.flags, annotation);

            // Dump thread traces as quickly as we can, starting with "interesting" processes.
            firstPids.add(app.pid);

            // Don't dump other PIDs if it's a background ANR
            isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
            if (!isSilentANR) {
                int parentPid = app.pid;
                if (parent != null && parent.app != null && parent.app.pid > 0) {
                    parentPid = parent.app.pid;
                }
                if (parentPid != app.pid) firstPids.add(parentPid);

                if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

                for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                    ProcessRecord r = mService.mLruProcesses.get(i);
                    if (r != null && r.thread != null) {
                        int pid = r.pid;
                        if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                            if (r.persistent) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                            } else {
                                lastPids.put(pid, Boolean.TRUE);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                            }
                        }
                    }
                }
            }
        }

        // 记录ANR 到 main log
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName); //例如:ANR in com.xxxx.moblie (进程名)
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");//例如:(com.xxxx.moblie/.ui.MainActivity) 
        }
        info.append("n");//换行
        info.append("PID: ").append(app.pid).append("n");//记录进程id
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("n");
        }

        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        // don't dump native PIDs for background ANRs unless it is the process of interest
        String[] nativeProcs = null;
        if (isSilentANR) {
            for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
                if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
                    nativeProcs = new String[] { app.processName };
                    break;
                }
            }
        } else {
            nativeProcs = NATIVE_STACKS_OF_INTEREST;
        }

        int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
        ArrayList<Integer> nativePids = null;

        if (pids != null) {
            nativePids = new ArrayList<Integer>(pids.length);
            for (int i : pids) {
                nativePids.add(i);
            }
        }

        // For background ANRs, don't pass the ProcessCpuTracker to
        // avoid spending 1/2 second collecting stats to rank lastPids.
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        info.append(processCpuTracker.printCurrentState(anrTime));
		
		//将ANR相关的错误日志信息打印出来
        Slog.e(TAG, info.toString());
		
		...省略...
    }

通过上面的代码我们知道,当ANR发生的时候,会将ANR记录到event log和main log中。

三、查看events_log(如果没有找到该文件可以跳过这一步)

查看mobilelog文件夹下的events_log,从日志中搜索关键字:am_anr,找到出现ANR的时间点、进程PID、ANR类型。

如日志:

07-20 15:36:36.472  1000  1520  1597 I am_anr  : [0,1480,com.xxxx.moblie,952680005,Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]

从上面的log我们可以看出: 应用com.xxxx.moblie 在07-20 15:36:36.472时间,发生了一次KeyDispatchTimeout类型的ANR,它的进程号是1480.
把关键的信息整理一下:

我们已经知道了发生KeyDispatchTimeout的ANR是因为 input事件在5秒内没有处理完成。那么在这个时间07-20 15:36:36.472 的前5秒,也就是(15:36:30 ~15:36:31)时间段左右程序到底做了什么事情?这个简单,因为我们已经知道pid了,再搜索一下pid = 1480的日志.这些日志表示该进程所运行的轨迹,关键的日志如下:

07-20 15:36:29.749 10102  1480  1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE
07-20 15:36:30.136 10102  1480  1737 D moblie-Application: receiving an empty message, drop
07-20 15:36:35.791 10102  1480  1766 I Adreno  : QUALCOMM build                   : 9c9b012, I92eb381bc9
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Build Date                       : 12/31/17
07-20 15:36:35.791 10102  1480  1766 I Adreno  : OpenGL ES Shader Compiler Version: EV031.22.00.01
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Local Branch                     : 
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Remote Branch                    : refs/tags/AU_linux_ANDROID_LA.UM.6.4.R1.08.00.00.309.049
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Remote Branch                    : NONE
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Reconstruct Branch               : NOTHING
07-20 15:36:35.826 10102  1480  1766 I vndksupport: sphal namespace is not configured for this process. Loading /vendor/lib64/hw/gralloc.msm8998.so from the current namespace instead.
07-20 15:36:36.682 10102  1480  1480 W ViewRootImpl[MainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_PERIOD, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=16099429, downTime=16099429, deviceId=-1, source=0x101 }

从上面我们可以知道,在时间 07-20 15:36:29.749 程序收到了一个action消息。

07-20 15:36:29.749 10102  1480  1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE。

原来是应用com.xxxx.moblie 收到了一个推送消息(com.ttt.push.RECEIVE_MESSAGE)导致了阻塞,我们再串联一下目前所获取到的信息:在时间07-20 15:36:29.749 应用com.xxxx.moblie 收到了一下推送信息action=com.ttt.push.RECEIVE_MESSAGE发生阻塞,5秒后发生了KeyDispatchTimeout的ANR。

四、查看main_log日志信息

在分析ANR的时候,我们首先要确认是不是当时CPU很紧张、各路APP都在抢占资源,CPU无法及时响应最终导致了ANR?为了排查这种情况,我们就需要获取ANR发生时候的CPU状态参数。

1、接下来我们来看一个ANR模拟日志案例

07-20 15:36:58.711  1000  1520  1597 E ActivityManager: ANR in com.xxxx.moblie (com.xxxx.moblie/.ui.MainActivity) (关键字ANR in + 进程名 + Activity名称)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: PID: 1480 (进程pid)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: Reason: Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)(ANR的原因,输入分发超时)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: Load: 0.0 / 0.0 / 0.0 (Load表明是1分钟,5分钟,15分钟CPU的负载)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: CPU usage from 20ms to 20286ms later (2018-07-20 15:36:36.170 to 2018-07-20 15:36:56.436):
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   42% 6774/pressure: 41% user + 1.4% kernel / faults: 168 minor
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   34% 142/kswapd0: 0% user + 34% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   31% 1520/system_server: 13% user + 18% kernel / faults: 58724 minor 1585 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   13% 29901/com.ss.android.article.news: 7.7% user + 6% kernel / faults: 56007 minor 2446 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   13% 32638/com.android.quicksearchbox: 9.4% user + 3.8% kernel / faults: 48999 minor 1540 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   11% (CPU的使用率)1480/com.xxxx.moblie: 5.2%(用户态的使用率) user + (内核态的使用率) 6.3% kernel / faults: 76401 minor 2422 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   8.2% 21000/kworker/u16:12: 0% user + 8.2% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   0.8% 724/mtd: 0% user + 0.8% kernel / faults: 1561 minor 9 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   8% 29704/kworker/u16:8: 0% user + 8% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.9% 24391/kworker/u16:18: 0% user + 7.9% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.1% 30656/kworker/u16:14: 0% user + 7.1% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.1% 9998/kworker/u16:4: 0% user + 7.1% kernel

通过上面所提供的案例我们可以分析出以下几点:

2、下面所提供的是一个ANR的真实日志案例

10-09 19:35:22.124   940   968 E ActivityManager: ANR in com.example.anrtest (com.example.anrtest/.MainActivity) // 记录ANR+进程名+Activity名称
10-09 19:35:22.124   940   968 E ActivityManager: PID: 8390 //记录进程ID
10-09 19:35:22.124   940   968 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 28.  Wait queue head age: 5517.5ms.)(ANR的原因,输入分发超时)
10-09 19:35:22.124   940   968 E ActivityManager: Load: 2.52 / 2.57 / 2.73
10-09 19:35:22.124   940   968 E ActivityManager: CPU usage from 99984ms to 0ms ago (2022-10-09 19:33:39.209 to 2022-10-09 19:35:19.194):
10-09 19:35:22.124   940   968 E ActivityManager:   25% 445/surfaceflinger: 14% user + 11% kernel / faults: 11962 minor
10-09 19:35:22.124   940   968 E ActivityManager:   11% 394/android.hardware.graphics.composer@2.2-service: 2.9% user + 8.2% kernel / faults: 1 minor
10-09 19:35:22.124   940   968 E ActivityManager:   10% 2101/com.leapmotor.appcenter: 7.8% user + 3.1% kernel / faults: 926 minor
10-09 19:35:22.124   940   968 E ActivityManager:   3.2% 1961/com.iflytek.cutefly.speechclient.hmi: 2.5% user + 0.7% kernel / faults: 2111 minor
10-09 19:35:22.124   940   968 E ActivityManager:   1.8% 386/android.hardware.audio@2.0-service: 0.2% user + 1.6% kernel10-09 19:35:22.124   940   968 E ActivityManager:   0.9% 940/system_server: 0.6% user + 0.3% kernel / faults: 5334 minor10-09 19:35:22.124   940   968 E ActivityManager:   0.7% 440/audIOServer: 0.3% user + 0.3% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.3% 277/apr_vm_cb_threa: 0% user + 0.3% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.3% 544/leap_uhab: 0% user + 0.2% kernel / faults: 20 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.3% 5219/com.leapmotor.leapmotorsoscall: 0.1% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.2% 2367/com.leapmotor.multimedia: 0% user + 0.2% kernel / faults: 9 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.2% 8215/kworker/2:3: 0% user + 0.2% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.2% 3992/adbd: 0% user + 0.2% kernel / faults: 46652 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.2% 7786/com.ebanma.tinyapp: 0.2% user + 0% kernel / faults: 557 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.2% 7895/com.ebanma.tinyapp:DataCenterService: 0.2% user + 0% kernel / faults: 463 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 543/leap_systemsdk_service: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 1392/com.leapmotor.cameraaround: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 410/leap_vsomeip_route: 0.1% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 3554/com.leapmotor.driverecord:emergency: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 411/leap_camera_around: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 2915/com.leapmotor.systemupdate: 0% user + 0% kernel / faults: 5 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 9/rcu_preempt: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 412/leap_camera_face: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 413/leap_camera_front: 0% user + 0.1% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 2021/com.leapmotor.facevideo: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 2554/com.leapmotor.phone: 0% user + 0.1% kernel / faults: 10 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 1916/com.leapmotor.log: 0% user + 0% kernel / faults: 2 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 2188/com.leapmotor.camera:front_encode: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 2211/com.leapmotor.camera:around_encode: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0.1% 3541/com.leapmotor.driverecord:trip: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 285/logd: 0% user + 0% kernel / faults: 3 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 325/ais_v4l2_proxy: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 541/leap_shutdown: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 2243/com.leapmotor.camera:front_push_encode: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 1199/com.android.systemui: 0% user + 0% kernel / faults: 576 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 3492/com.leapmotor.driverecord: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 251/vlog: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 560/installd: 0% user + 0% kernel / faults: 8 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 4106/kworker/u8:0: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 6191/installer: 0% user + 0% kernel / faults: 295 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 8/ksoftirqd/0: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 18/ksoftirqd/1: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 1131/com.android.car: 0% user + 0% kernel / faults: 553 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 2134/com.leapmotor.bt:bt_service: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 7818/kworker/u9:0: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 24/ksoftirqd/2: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 512/leap_logcat: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 10/rcu_sched: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 30/ksoftirqd/3: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 70/system: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 240/kworker/u8:10: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 281/jbd2/vdb-8: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 286/servicemanager: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 287/hwservicemanager: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 330/zygote64: 0% user + 0% kernel / faults: 153 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 409/vendor.qti.hardware.perf@1.0-service: 0% user + 0% kernel / faults: 37 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 542/leap_vsomeip_qnx_heart: 0% user + 0% kernel / faults: 3 minor10-09 19:35:22.124   940   968 E ActivityManager:   0% 654/msm_irqbalance: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 677/ipacm: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 1366/.dataservices: 0% user + 0% kernel / faults: 3 minor
10-09 19:35:22.124   940   968 E ActivityManager:   0% 2665/dmesg: 0% user + 0% kernel
10-09 19:35:22.124   940   968 E ActivityManager:   0% 2893/android.process.acore: 0% user + 0% kernel / faults: 35 minor
10-09 19:35:22.124  2101  2120 I motor.appcente: Wrote stack traces to '[tombstoned]'
10-09 19:35:22.125   940   969 W ActivityManager:   Force finishing activity com.example.anrtest/.MainActivity
10-09 19:35:22.127   940   969 I ActivityManager: saveTopActivity mTmpTop=com.example.anrtest/com.example.anrtest.MainActivity

通过上面所提供的案例我们可以分析出以下几点:

通过上面几点我们虽然排除了CPU过度负载的可能,但我们并不能准确定位出ANR的确切位置,要想准确定位出ANR发生的确切位置,就要借助系统为了解决ANR问题而提供的终极大杀器——traces.txt文件了。

五、 traces.txt 文件分析

当APP不响应、响应慢了、或者WatchDog的监视没有得到回应时,系统就会dump出一个traces.txt文件,存放在文件目录:/data/anr/文件夹中,通过traces文件,我们可以拿到线程名、堆栈信息、线程当前状态、binder call等信息。
我们可以通过adb命令获取到该文件夹下面的所有traces文件:adb pull /data/anr

trace:
Cmd line:com.xxxx.moblie

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x73bcc7d0 self=0x7f20814c00
  | sysTid=20176 nice=-10 cgrp=default sched=0/0 handle=0x7f251349b0
  | state=R schedstat=( 0 0 0 ) utm=12 stm=3 core=5 HZ=100
  | stack=0x7fdb75e000-0x7fdb760000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  // java 堆栈调用信息,可以查看调用的关系,定位到具体位置
  at ttt.push.InterceptorProxy.addMiuiApplication(InterceptorProxy.java:77)
  at ttt.push.InterceptorProxy.create(InterceptorProxy.java:59)
  at android.app.Activity.onCreate(Activity.java:1041)
  at miui.app.Activity.onCreate(SourceFile:47)
  at com.xxxx.moblie.ui.b.onCreate(SourceFile:172)
  at com.xxxx.moblie.ui.MainActivity.onCreate(SourceFile:68)
  at android.app.Activity.performCreate(Activity.java:7050)
  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1214)
  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2807)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2929)
  at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1618)
  at android.os.Handler.dispatchMessage(Handler.java:105)
  at android.os.Looper.loop(Looper.java:171)
  at android.app.ActivityThread.main(ActivityThread.java:6699)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)

这里详细解析一下traces.txt里面的一些字段,看看它到底能给我们提供什么信息.

Java的堆栈信息是我们最关心的,它能够定位到具体位置。从上面的traces,我们可以判断ttt.push.InterceptorProxy.addMiuiApplicationInterceptorProxy.java:77 导致了com.xxxx.moblie发生了ANR。这时候可以对着源码查看,找到出问题,并且解决它。

接下来再看一个真实的导致ANR的trace案例:

//开头显示进程号、ANR发生的时间点和进程名称
----- pid 4972 at 2022-10-09 16:45:41 -----
Cmd line: com.example.anrtest

...省略...

DALVIK THREADS (14)://以下是各个线程的函数堆栈信息

//依次是:线程名(后面标识有daemon说明是个守护线程)、线程优先级、线程号、线程当前状态(TIMED_WAIT或SUSPENDED在anr时很常见)
"Signal Catcher" daemon prio=5 tid=3 Runnable
//依次是:线程组名称、suspendCount个数、debugSuspendCount个数、标记,线程的Java对象地址、线程的Native对象地址
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400
  //sysTid是线程号,主线程的线程号和进程号相同
  | sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
  | state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100
  | stack=0x731c235000-0x731c237000 stackSize=1009KB
  | held mutexes= "mutator lock"(shared held)
//“Signal Catcher”负责接收和处理kernel发送的各种信号,例如SIGNAL_QUIT、SIGNAL_USR1等就是被该线程
//接收到,这个文件的内容就是由该线程负责输出的,可以看到它的状态是RUNNABLE,不过此线程也不需要关心

...省略...

//主线程,当前处于休眠状态
"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
  | sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
  | state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100
  | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0d9c45cc> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:373)
  - locked <0x0d9c45cc> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:314)
  at android.os.SystemClock.sleep(SystemClock.java:127)
  at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57)
  at android.view.View.performClick(View.java:6597)
  at android.view.View.performClickInternal(View.java:6574)
  at android.view.View.access$3100(View.java:778)
  at android.view.View$PerformClick.run(View.java:25885)
  at android.os.Handler.handleCallback(Handler.java:873)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:193)
  at android.app.ActivityThread.main(ActivityThread.java:6718)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
  
...省略...

"Jit thread pool worker thread 0" daemon prio=5 tid=2 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x15980000 self=0x731c029000
  | sysTid=4977 nice=9 cgrp=default sched=0/0 handle=0x731c4314f0
  | state=S schedstat=( 122669387 79479997 188 ) utm=8 stm=3 core=3 HZ=100
  | stack=0x731c333000-0x731c335000 stackSize=1021KB
  | held mutexes=
  
...省略...

//Binder线程是进程的线程池中用来处理binder请求的线程
"Binder:4972_1" prio=5 tid=9 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x159829a0 self=0x731a60d000
  | sysTid=4984 nice=0 cgrp=default sched=0/0 handle=0x73099b34f0
  | state=S schedstat=( 12586983 41079166 22 ) utm=1 stm=0 core=3 HZ=100
  | stack=0x73098b8000-0x73098ba000 stackSize=1009KB
  | held mutexes=
  
...省略...

六、ANR分析流程总结

总结一下上面我们分析ANR的主体流程:

1、首先我们搜索am_anr,找到出现ANR的时间点、进程PID、ANR类型、然后再找搜索PID,找前5秒左右的日志。
2、过滤ANR IN 查看CPU信息
3、接着查看traces.txt,找到java的堆栈信息定位代码位置,最后查看源码,分析与解决问题。

到这里,通过上面三个步骤我们基本就能定位出来大部分ANR的来龙去脉了。

我在接下来的第六步和第七步还继续提供了多个ANR案例供大家参考分析,有兴趣的可以看着。

七、ANR 案例整理

1、主线程被其他线程lock,导致死锁

waiting on <0x1cd570> (a android.os.MessageQueue)

DALVIK THREADS:
"main" prio=5 tid=3 TIMED_WAIT
  | group="main" sCount=1 dsCount=0 s=0 obj=0x400143a8
  | sysTid=691 nice=0 sched=0/0 handle=-1091117924
  at java.lang.Object.wait(Native Method)
  - waiting on <0x1cd570> (a android.os.MessageQueue)
  at java.lang.Object.wait(Object.java:195)
  at android.os.MessageQueue.next(MessageQueue.java:144)
  at android.os.Looper.loop(Looper.java:110)
  at android.app.ActivityThread.main(ActivityThread.java:3742)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:739)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:497)
  at dalvik.system.NativeStart.main(Native Method)

"Binder Thread #3" prio=5 tid=15 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x434e7758
  | sysTid=734 nice=0 sched=0/0 handle=1733632
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #2" prio=5 tid=13 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x1cd570
  | sysTid=696 nice=0 sched=0/0 handle=1369840
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #1" prio=5 tid=11 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x433aca10
  | sysTid=695 nice=0 sched=0/0 handle=1367448
  at dalvik.system.NativeStart.run(Native Method)

----- end 691 -----

2、主线程做耗时的操作:比如数据库读写。

"main" prio=5 tid=1 Native
held mutexes=
kernel: (couldn't read /proc/self/task/11003/stack)
native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)
native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)
native: #02 pc 00009cab /system/lib/libsqlite.so (???)
native: #03 pc 00011119 /system/lib/libsqlite.so (???)
native: #04 pc 00016455 /system/lib/libsqlite.so (???)
native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)
native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)
native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)
native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)
at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)
at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)
locked <0x0db193bf> (a java.lang.Object)
at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)
at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)
at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)
at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)
at com.xxxx.video.common.data.DataBaseoRM.<init>(DataBaseORM.java:46)
at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)
locked <0x017095d5> (a java.lang.Class<com.xxxx.video.common.data.DataBaseORM>)

3、binder数据量过大

07-21 04:43:21.573  1000  1488 12756 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.573  1000  1488 12756 E Binder  : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.607  1000  1488  2951 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.607  1000  1488  2951 E Binder  : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.662  1000  1488  6258 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 259300 (data: 1, 33, 7274595)

4、binder 通信失败

07-21 06:04:35.580 <6>[32837.690321] binder: 1698:2362 transaction failed 29189/-3, size 100-0 line 3042
07-21 06:04:35.594 <6>[32837.704042] binder: 1765:4071 transaction failed 29189/-3, size 76-0 line 3042
07-21 06:04:35.899 <6>[32838.009132] binder: 1765:4067 transaction failed 29189/-3, size 224-8 line 3042
07-21 06:04:36.018 <6>[32838.128903] binder: 1765:2397 transaction failed 29189/-22, size 348-0 line 2916

八、模拟触发ANR事件,获取trace文件

以下的几个案例,都是我通过代码故意触发ANR所产生的trace文件。

1、输入的事件在5秒内没有被响应,主线程被阻塞。

输入事件5s没有响应,如onClick事件。这是anr问题的主要类型,一般开发者不会犯这样的错,凡是耗时的操作都另开线程处理,如果疏忽了看看自己的代码就知道怎么处理。

(1)模拟触发ANR:

Java
     findViewById(R.id.btn_test).setOnClickListener(new View.OnClickListener() {
         @Override
         public void onClick(View v) {
             SystemClock.sleep(40000);//故意响应超时导致ANR
         }
     });

(2)对应的Trace文件关键日志:

DALVIK THREADS (14):  //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400
  | sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
  | state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100
  | stack=0x731c235000-0x731c237000 stackSize=1009KB
  | held mutexes= "mutator lock"(shared held)

"main" prio=5 tid=1 Sleeping  //主线程被阻塞
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
  | sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
  | state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100
  | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0d9c45cc> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:373)
  - locked <0x0d9c45cc> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:314)
  at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置
  at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57)  //按钮的点击事件
  at android.view.View.performClick(View.java:6597)
  at android.view.View.performClickInternal(View.java:6574)
  at android.view.View.access$3100(View.java:778)
  at android.view.View$PerformClick.run(View.java:25885)
  at android.os.Handler.handleCallback(Handler.java:873)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:193)
  at android.app.ActivityThread.main(ActivityThread.java:6718)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)

2、广播接收者的onReceive方法在10秒内没有执行完毕。

BroadcastReceiver是在程序主线程运行,而且默认情况下BroadcastReceiver的运行时间为10s,所以不能有耗时操作,如果耗时超过10s就会导致anr,从traces文件log就可以看出onReceive不能进行耗时任务。

(1)模拟触发ANR:

Java
   IntentFilter intentFilter = new IntentFilter();
   intentFilter.addAction("com.anr.test");
   registerReceiver(new BroadcastReceiver() {
       @Override
       public void onReceive(Context context, Intent intent) {
           SystemClock.sleep(40000);//故意响应超时导致ANR
       }
   }, intentFilter);

(2)对应的Trace文件关键日志:

DALVIK THREADS (14):  //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x13100088 self=0x7322e16400
  | sysTid=6639 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
  | state=R schedstat=( 6547603 5577708 11 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x731c235000-0x731c237000 stackSize=1009KB
  | held mutexes= "mutator lock"(shared held)

"main" prio=5 tid=1 Sleeping   //主线程被阻塞
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
  | sysTid=6633 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
  | state=S schedstat=( 605006551 138044643 533 ) utm=56 stm=3 core=2 HZ=100
  | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x08c05759> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:373)
  - locked <0x08c05759> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:314)
  at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置
  at com.example.anrtest.MainActivity$3.onReceive(MainActivity.java:80)  //广播接收者的onReceive方法
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1391)
  at android.app.-$$Lambda$LoadedApk$ReceiverDispatcher$Args$_BumDX2UKsnxLVrE6UJsJZkotuA.run(lambda:-1)
  at android.os.Handler.handleCallback(Handler.java:873)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:193)
  at android.app.ActivityThread.main(ActivityThread.java:6718)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)

BroadcastReceiver是Android的消息组件,用来组件之间、应用之间的消息传递,生命周期太短也不能开子线程处理耗时任务,耗时任务一般转交给IntentService或者JobIntentService去做。

3、服务Service没有及时响应

Service是计算型组件,虽然在后台运行,但是本质上它也跑在主线程,如果你的服务要做任何CPU密集型(如MP3播放)或阻塞(如网络)操作,都要放在子线程中,否则耗时超过20s就会导致anr。下面我在onStartCommand方法中休眠40秒。

(1)模拟触发ANR:

Java
    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        SystemClock.sleep(40000);//故意响应超时导致ANR
        return super.onStartCommand(intent, flags, startId);
    }

(2)对应的Trace文件关键日志:

DALVIK THREADS (13):  //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x13940108 self=0x7322e16400
  | sysTid=7658 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
  | state=R schedstat=( 5995782 10695989 10 ) utm=0 stm=0 core=3 HZ=100
  | stack=0x731c235000-0x731c237000 stackSize=1009KB
  | held mutexes= "mutator lock"(shared held)

"main" prio=5 tid=1 Sleeping   //主线程被阻塞
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
  | sysTid=7652 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
  | state=S schedstat=( 595004855 101962599 596 ) utm=40 stm=18 core=2 HZ=100
  | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0e096f2a> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:373)
  - locked <0x0e096f2a> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:314)
  at android.os.SystemClock.sleep(SystemClock.java:127)  //代码被阻塞的关键位置
  at com.example.anrtest.ANRService.onStartCommand(ANRService.java:17)  //服务的onStartCommand方法
  at android.app.ActivityThread.handleServiceArgs(ActivityThread.java:3705)
  at android.app.ActivityThread.access$1600(ActivityThread.java:200)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1688)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:193)
  at android.app.ActivityThread.main(ActivityThread.java:6718)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
声明:本站部分内容来自互联网,如有版权侵犯或其他问题请与我们联系,我们将立即删除或处理。
▍相关推荐
更多资讯 >>>