Profiling
- Instrumenting the program to measure performance of a specific operation or part of the program
- Identify the bottlenecks in the code
Grafana Pyroscope
- Continuously profiles the code
- Requires only minimal overhead
- Can store years of profiling data down to 10-second granularity
- Uses a unique, inverted flame graph for increased readability
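As a sketch of trying Pyroscope locally (assuming Docker and the official `grafana/pyroscope` image; 4040 is its default HTTP port):

```shell
# Run a throwaway Pyroscope server; the UI and ingest API listen on :4040.
docker run -it -p 4040:4040 grafana/pyroscope
```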
SDK Instrumentation
- Requires code changes by using the SDK
- Examples
Auto-instrumentation using Grafana Alloy
- No code changes required
- Requires a collector to send profiles
- Examples
Java
JVM - Troubleshooting
Common JVM options for troubleshooting
-Xms2g
-Xmx2g
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-Xlog:gc*=info,gc+heap=debug,gc+ref*=debug,gc+ergo*=trace,gc+age*=trace:file=${project.build.directory}/gc-%t.log:utctime,pid,level,tags:filecount=2,filesize=100m
-XX:StartFlightRecording=settings=default,filename=${project.build.directory}/${project.artifactId}.jfr,dumponexit=true,maxsize=100M
-XX:+UnlockDiagnosticVMOptions
-XX:+LogVMOutput
-XX:LogFile=${project.build.directory}/jvm.log
-XX:ErrorFile=${project.build.directory}/hs_err_%p.log
-XX:+DisableExplicitGC
-XX:+UseCompressedOops
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=${project.build.directory}/heapDump.log
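Note that the `${project.build.directory}` placeholders are Maven properties, resolved only when the flags are passed through a Maven build (e.g. surefire). On a plain command line, substitute concrete paths. A hypothetical standalone launch using a subset of the flags above (paths and jar name are examples):

```shell
# Sketch: run a service with heap sizing, G1, GC logging, and an
# automatic heap dump on OOM. All paths below are placeholders.
java -Xms2g -Xmx2g \
     -XX:+UseG1GC \
     -Xlog:gc*=info:file=/var/log/myapp/gc-%t.log:utctime,pid,level,tags:filecount=2,filesize=100m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/log/myapp/heapDump.hprof \
     -jar myapp.jar
```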
Profile application startup
Profile a specific code spot
Identify the thread with the highest CPU consumption
top -H -p $PID_of_the_java_process
Dump all thread stacks (the JVM writes the dump to its stdout)
kill -3 $PID   # equivalently: kill -SIGQUIT $PID
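The two commands pair naturally: `top -H` reports per-thread CPU with decimal thread ids, while the JVM thread dump labels each thread with a hexadecimal `nid`. A small conversion links the two (the TID 12345 is a made-up example):

```shell
# top -H shows decimal thread ids (TIDs); a JVM thread dump shows them
# as hexadecimal "nid=0x..." fields. Convert the TID to match them up:
tid=12345                      # hypothetical TID taken from `top -H -p $PID`
printf 'nid=0x%x\n' "$tid"     # prints nid=0x3039 -> grep this in the dump
```

Then search the thread dump for that `nid` to find the stack trace of the hottest thread.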
Tooling
Java - Swiss Java Knife (SJK)
Java - IntelliJ IDEA Async Profiler
- Profiling basics · Hyperskill
- Get Started With CPU Profiling
Linux - perf
- BellSoft Blog - How to use perf to monitor Java performance
- Generating perf maps with OpenJDK 17
- JavaOne 2016: Java Performance Analysis on Linux with Flame Graphs
- Profiling JVM Applications in Production
Linux - sysprof
TypeScript
Node.js
Docker
Key points
- Docker adds very little overhead in terms of CPU and memory to the application.
- The biggest performance hit is in disk I/O performance.
- If you require very low latency you can switch to using Docker’s host network feature, cutting out NAT.
Resources
Network
- In the same IBM study cited before, the researchers found that Docker’s NAT doubled latency from roughly 35 µs to 70 µs for a 100-byte request from the client and a 200-byte response from the application.
- If you require very low latency, you can switch to Docker’s host network feature, which lets your container share the host’s network stack, cutting out the need for NAT.
- Unless you require very low latency, you should be fine sticking with the default bridge networking option. Just be sure to test it and confirm you’re getting the throughput you need.
- Avoid Docker’s forwarded ports in production environments. Use Unix sockets or host network mode instead, as these introduce virtually no overhead.
- Ports can be easier to manage than a bunch of socket files when dealing with multiple processes, whether running many applications or scaling a single one. If you can afford a small drop in throughput, go for IP sockets.
- If you have to extract every drop of performance available, use Unix domain sockets where possible.
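The bridge-vs-host trade-off above comes down to a single flag; a sketch, with `nginx` standing in for any containerized service and the port numbers as placeholders:

```shell
# Default bridge networking: the published port goes through NAT
# (iptables DNAT / docker-proxy), adding the latency discussed above.
docker run -d --name web-bridge -p 8080:80 nginx

# Host networking: the container shares the host's network namespace,
# so the service binds directly to host port 80 with no NAT in the path.
docker run -d --name web-host --network host nginx
```

With `--network host` there is no port publishing at all, so port clashes with host services become your responsibility.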