Introduction.
The main article's goal is the complete overview of DTrace from its architecture to DTrace scripts development and libdtrace library use. This article provides examples of DTrace scripts that can be used for software optimization and collecting information about applications in real time. All examples were tested in Mac OS C 10.6.
What is DTrace?
I believe most of MacOS X developers use Instruments for applications optimization and testing very often. But not all of them know about DTrace, although it is being used as the kernel of Instruments.
DTrace is dynamic tracing tool that is built into Mac OS X. It was developed by Sun Microsystems for Solaris and later was released under free CCDL license. At this moment DTrace has been ported to FreeBSD 7.1 (as substitute for ktrace), Mac OS X 10.5 and it is being developed to support QNX.
As opposed to debugger (gdb for example) DTrace is safe to use in production systems and it does not require restarting either the system or application. Also DTrace may be helpful in case of races or another difficult-to-locate errors in production environment and as reverse engineering tool.
DTrace architecture.
DTrace consists of 4 parts: the front-end, like dtrace application or Instruments application, the libdtrace library, in-kernel DTrace framework and DTrace providers.
Instruments and dtrace applications are typical Mac OS X DTrace front-end. These applications use libdtrace for obtaining information from DTrace providers. Also it could be any users’ application, which was written with use of libdtrace library.
Libdtrace provides the interface to kernel level DTrace framework and compiles D code into DIF (DTrace Intermediate Format - a RISC-like instruction set). ioctl call and dtrace device are used for interactions between user and kernel levels.
Kernel level DTrace framework executes DIF code and manages DTrace Providers objects. DTrace providers responsible for creating probes by instrumenting of various part of the system.
A probe is a location or activity to which DTrace can bind a request to perform a set of actions, like recording a stack trace, a timestamp, or the argument to a function.
The interfaces between the libdtrace library, the in-kernel DTrace framework, and the DTrace providers are described in the block comments in header file. The internal architecture of DTrace is described in the block comments in header file.
DTrace D script language.
DTrace D language has C-like syntax and it is used for tracing actions description. The typical D script consists of compiler directives, initialization, type declarations, predicates, variables and probes clauses. Each probe includes probe description, predicates and actions.
BEGIN
{
initialization
}
probe descriptions
/ predicates /
{
actions
}
END
{
termination
}
When a D script executed, all described probes are enabled and the BEGIN probe fires before any other probes. When the probe is fired and the predicate is true, the actions are executed.
Probes.
Each probe description is made up of four fields separated by colon character:
provider:module:function:name
- Provider - instrumented layer. For example, function calls, memory statistics, process creation and termination, etc.
- Module - instrumented module. For example, libSystem, libc, etc.
- Function - instrumented function.
- Name - location in the function. Typically it is enter, return, tick-nsec.
The provider field is required; fields module, function, name are optional. Pattern matching characters such as *, ?, [...], \ can be used. The empty field is interpreted as the field which consists only of '*' wild card.
For example, the following D script will trace write* functions calls for all processes.
syscall::write*:entry
{
printf("(%d):%s write call called", pid, execname);
}
Predicates are used for determination whether the associated actions are executed. If the predicate is omitted the actions are always executed when the probe fires. Examples of predicates:
/pid == 12345/
/execname = "bash"/
Variables
There are two types of variable in D, scalar variables and associated arrays.
BEGIN
{
x = 100;
arr1["hello"] = 100;
arr2["hello", 10] = 100;
}
X is an integer scalar variable.
arr1 is an associated array variable with key "hello" and value 100.
arr2 is an associated array variable with tuple ["hello", 10] as key and value 100.
As follows from sample above, D, unlike C or Java, does not require explicit variable declaration.
Also, variables can be separated into global, thread-local and clause-local variables by storage type.
Thread-local variables
Thread-local variables are used for storing data separated by threads. Thread-local variables are accessed via the self identifier and the -> operator.
syscall::write*:entry
{
self->write = 1;
}
syscall::write*:return
/self->write == 1/
{
self->write = 0;
}
Thread-local variable should be assigned to zero at the end of usage, it allows DTrace to reclaim the storage.
Clause-local variables
Clause-local variables are similar to C or C++ automatic variables. Clause-local variables are accessed via the this identifier and the -> operator.
syscall::write*:entry
{
self->i = 10;
...
}
Thread-local variable reclaimed and reused automatically for the next clause.
Structs
For simplifying D scripts variables can be putted together into structs. Structs syntax is similar to C struct.
struct struct_name {
int a;
int b[string];
};
struct struct_name arr[string];
The arr variable is associated array of struct_name objects.
Macro variables.
D compiler provides set of macro variable, which can be used in D scripts. Macro variables are prefixed with a dollar sign ($) and are expanded once by the D compiler when it’s processing an input file.
The list of frequently used macro variables:
- $1 - first additional operand.
- $pid - dtrace process ID.
- $target - the process id of the process that is selected by -p option or is created using -c option.
pid$target:::entry
{
@[probefunc] = count();
}
...
matchString 9217
malloc_zone_malloc 9712
szone_malloc 9712
szone_malloc_should_clear 9743
memmove 10324
tiny_free_list_add_ptr 10522
object_getClass 11244
__CFStringHash 13019
CFHash 13095
___CFBasicHashFindBucket1 13748
CFAllocatorDeallocate 13923
_CFRetain 15354
CFRetain 15372
__CFStringEqual 15861
...
The example above demonstrates using of $target macro variable. That scrip can be used to determine the frequently called functions list.
Aggregations.
Aggregations are used for collecting the output in memory tables and outputting summary information. The syntax of an aggregation is:
@name[ keys ] = aggfunc ( args );
Aggregations variables has special prefix @ and there are no thread- or clause-local aggregations. DTrace command prints aggregations results when the process terminated or when user pressed Control-C.
count.
count call returns the number of times called. The following script prints the list of calls with the number of times functions were called.
pid$1:::entry
{
@[strjoin(strjoin(probemod,"`"), probefunc)]=count();
}
....
CoreFoundation`CFBasicHashApply 14
CoreFoundation`CFDictionaryApplyFunction 14
libSystem.B.dylib`strlen 26
libSystem.B.dylib`xdr_void 28
libSystem.B.dylib`mach_port_names 577
libSystem.B.dylib`pid_for_task 577
libSystem.B.dylib`task_info 577
libSystem.B.dylib`task_threads 577
tasksexplorerd`build_tasks_array 577
tasksexplorerd`task_explorer_dyninfo_1_svc 577
tasksexplorerd`xdr_task_info_dynamic 577
tasksexplorerd`task_info_manager_find_task 578
tasksexplorerd`build_killed_array 579
libSystem.B.dylib`_authenticate 592
libSystem.B.dylib`_svcauth_null 592
libSystem.B.dylib`bcopy 592
libSystem.B.dylib`fill_input_buf 592
libSystem.B.dylib`read$NOCANCEL 592
...
avg.
avg call returns the arithmetic average of the expression. The following script prints the average time spending for each call in microsecond.
pid$1:::entry
{
self->timestamp = timestamp;
}
pid$1:::return
/self->timestamp/
{
@[strjoin(strjoin(probemod,"`"), probefunc)] = avg((timestamp - self->timestamp) / 1000);
self->timestamp = 0;
}
...
tasksexplorerd`build_killed_array 14
CoreFoundation`CFDictionaryGetCount 15
libSystem.B.dylib`get_input_bytes 15
libSystem.B.dylib`pid_for_task 15
libSystem.B.dylib`xdrrec_endofrecord 15
libSystem.B.dylib`xdrrec_skiprecord 15
libSystem.B.dylib`OSSpinLockUnlock 16
libSystem.B.dylib`mig_get_reply_port 16
libSystem.B.dylib`xdr_void 16
libSystem.B.dylib`read$NOCANCEL 17
libSystem.B.dylib`szone_free_definite_size 18
libSystem.B.dylib`tiny_free_list_add_ptr 18
...
normalize and trunc.
normalize and trunc functions are used for modifying aggregation output. trunc call returns top x records for aggregation results. In the example above this function is used for retrieving top 10 aggregation records. normalize call sets normalization factor for the specified aggregation, it is used for represented time in microsecond in example above.
pid$1:::entry
{
self->timestamp = timestamp;
}
pid$1:::return
/self->timestamp/
{
@result[strjoin(strjoin(probemod,"`"), probefunc)] = avg(timestamp - self->timestamp);
self->timestamp = 0;
}
END
{
normalize(@result, 1000);
trunc(@result, 10);
}
libSystem.B.dylib`tiny_free_list_add_ptr 18
tasksexplorerd`task_info_manager_get_tasks_count 18
libSystem.B.dylib`tiny_malloc_from_free_list 18
libSystem.B.dylib`szone_free_definite_size 19
libSystem.B.dylib`tiny_free_list_remove_ptr 19
libSystem.B.dylib`mach_msg_trap 22
libSystem.B.dylib`write$NOCANCEL 22
libSystem.B.dylib`__sysctl 23
libSystem.B.dylib`gettimeofday 24
libSystem.B.dylib`select$DARWIN_EXTSN$NOCANCEL 3802
Providers.
syscall.
The syscall provider gives information about every system call. This provider provides a pair of probes for each system calls: entry and return. The following script displays stack trace for all program system calls.
syscall:::entry
/pid == $target/
{
@result[ustack()] = count();
}
...
libSystem.B.dylib`__getdirentries64+0xa
libSystem.B.dylib`readdir$INODE64+0x43
CoreFoundation`_CFBundleCopyDirectoryContentsAtPath+0x65b
CoreFoundation`_CFSearchBundleDirectory+0x67
CoreFoundation`_CFFindBundleResourcesInRawDir+0x1a4
CoreFoundation`_CFFindBundleResourcesInResourcesDir+0x246
CoreFoundation`_CFFindBundleResources+0x3db
CoreFoundation`CFBundleCopyResourceURL+0x91
CoreFoundation`CFBundleGetLocalInfoDictionary+0x55
CoreFoundation`CFBundleGetValueForInfoDictionaryKey+0x25
tasksexplorerd`extract_bundle_info+0x8e
tasksexplorerd`task_explorer_base_info_1_svc+0xf8
tasksexplorerd`task_explorer_prog_1+0x188
libSystem.B.dylib`svc_getreqset+0x1cb
libSystem.B.dylib`svc_run+0x81
tasksexplorerd`main+0x1c0
tasksexplorerd`start+0x34
tasksexplorerd`0x1
...
objc.
The objc provider provides information about Objective-C messages. For example, this provider allows to create the script for collecting top 10 Objective-C messages with the stack traces.
objc$target:ProcessInfo::entry
{
@result[ustack()] = count();
}
END
{
trunc(@result, 10);
}
...
Tasks Explorer`-[ProcessInfo pid]
Tasks Explorer`-[TasksInfoManager updateInfo]+0x3e5
Foundation`__NSFireTimer+0x72
CoreFoundation`__CFRunLoopRun+0x1958
CoreFoundation`CFRunLoopRunSpecific+0x23f
HIToolbox`RunCurrentEventLoopInMode+0x14d
HIToolbox`ReceiveNextEventCommon+0x136
HIToolbox`BlockUntilNextEventMatchingListInMode+0x3b
AppKit`_DPSNextEvent+0x2c4
AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x9b
AppKit`-[NSApplication run]+0x18b
AppKit`NSApplicationMain+0x16c
Tasks Explorer`start+0x34
Tasks Explorer`0x2
13596
io.
The io provider creates probes related to disks input output operations. There are four types of probes available.
- start - fires when an I/O request is made.
- done - fires when an I/O request has been fulfilled.
- wait-start - fires immediately before a thread begins to wait pending completion of a given I/O request.
- wait-done - fires when a thread is done waiting for pending completion of a given I/O request.
These probes take 3 arguments: struct buf* as arg[0], devinfo_t* as arg[1] and fileinfo_t* as arg[2].
typedef struct bufinfo
{
int b_flags; /* flags */
size_t b_bcount; /* number of bytes */
caddr_t b_addr; /* buffer address */
uint64_t b_blkno; /* expanded block # on device */
uint64_t b_lblkno; /* block # on device */
size_t b_resid; /* # of bytes not transferred */
size_t b_bufsize; /* size of allocated buffer */
caddr_t b_iodone; /* I/O completion routine */
dev_t b_edev; /* extended device */
} bufinfo_t;
typedef struct devinfo
{
int dev_major; /* major number */
int dev_minor; /* minor number */
int dev_instance; /* instance number */
string dev_name; /* name of device */
string dev_statname; /* name of device + instance/minor */
string dev_pathname; /* pathname of device */
} devinfo_t;
typedef struct fileinfo
{
string fi_name; /* name (basename of fi_pathname) */
string fi_dirname; /* directory (dirname of fi_pathname) */
string fi_pathname; /* full pathname */
offset_t fi_offset; /* offset within file */
string fi_fs; /* filesystem */
string fi_mount; /* mount point of file system */
} fileinfo_t;
The following script provides information about applications I/O activity and prints operated file name, application name and PID and operation type (read o write).
#pragma D option quiet
BEGIN
{
printf("%30s %20s %2s\n", "FILE NAME", "APP NAME(PID)", "RW");
}
io:::start
{
printf("%30s%15s(%5d)%2s\n", args[2]->fi_name, execname, pid, args[0]->b_flags & B_READ ? "R" : "W");
}
FILE NAME APP NAME(PID) RW
test.d TextMate( 562) W
.dat088c.005 quicklookd( 2188) W
index.sqlite-journal quicklookd( 2188) W
index.sqlite-journal quicklookd( 2188) W
index.sqlite quicklookd( 2188) W
index.sqlite quicklookd( 2188) W
thumbnails.data quicklookd( 2188) W
thumbnails.data quicklookd( 2188) W
pid.
The pid provider looks like objc provider and it provides similar functionality. This provider allows to collect user's calls information. For example, it is known that application has task_explorer_update_1_svc call and it is necessary to build call list. The script below implements that functional.
pid$target::task_explorer_update_1_svc:entry
/guard == 0/
{
self->spec = speculation();
speculate(self->spec);
guard = 1;
}
pid$target:a.out::
/self->spec/
{
speculate(self->spec);
}
pid$target::task_explorer_update_1_svc:return
{
commit(self->spec);
self->spec = 0;
}
1 -> task_explorer_update_1_svc
1 | task_explorer_update_1_svc:1
1 | task_explorer_update_1_svc:4
1 | task_explorer_update_1_svc:6
1 -> task_info_manager_update
1 | task_info_manager_update:0
1 -> KeysHashFunc
1 | KeysHashFunc:0
1 <- KeysHashFunc
...
proc.
The proc provider creates probes which fire when processes or threads are created or terminated. The following scrip provides list of processes executed by application with given PID. As result, the list of processes executed by XCode during project building is shown.
proc:::exec-success
/ppid == $target/
{
@[execname] = count();
}
touch 1
bash 2
sh 2
gcc-4.2 31
xcexec 33
From the next script output it is obvious that the gcc execution time is between 1/4 to 1/2 second for 14 runs and between 1/2 to 1 second for 13 runs.
proc:::start
/ppid == $target/
{
self->start = timestamp;
}
proc:::exit
/self->start/
{
@[execname] = quantize((timestamp - self->start) / 1000000);
self->start = 0;
}
gcc-4.2
value ------------- Distribution ------------- count
32 | 0
64 |@@@ 2
128 |@ 1
256 |@@@@@@@@@@@@@@@@@@ 14
512 |@@@@@@@@@@@@@@@@@ 13
1024 | 0
2048 |@ 1
4096 | 0
Using libdtrace.
Warning: this part of article is based on source of dtrace tool for Mac OS X 10.6.2. All functions and interfaces provided by libdtrace are unstable and can be changed at any time without notice.
libdtrace provides next programming interface:
- dtrace_open - dtrace framework initialization.
- dtrace_program_strcompile, dtrace_program_fcompile - D script compilation functions.
- dtrace_handle_* - registers user's function to be called whenever dtrace framework events occurred.
- dtrace_program_exec - enables probes and instrumenting code.
- dtrace_setopt - sets dtrace framework option.
- dtrace_go - runs dtrace script.
- dtrace_sleep - waits until new data are available.
- dtrace_work - processes new data and returns results via call-back functions. If the second parameter FILE * is not NULL, call-back function registered with dtrace_handle_buffered will not be called and output will be written by the given descriptor.
- dtrace_aggregate_print - calls call-back functions with existing aggregation results. Usually it is the last call in the DTrace consumer life cycle.
DTrace consumer life cycle:
Where chew, chewrec and dcmdbuffered are not libdtrace calls but user call-backs.
#include "dtrace.h"
#include "stdio.h"
#include "stdlib.h"
#include "mach/mach.h"
#include "mach-o/loader.h"
#include "mach-o/dyld.h"
#include "mach-o/fat.h"
#include "sys/sysctl.h"
#include "signal.h"
static dtrace_hdl_t *g_dtp;
static int g_intr;
static const char *g_prog =
//"tick-1s"
//"{"
//" printf(\"Found %s on CPU %d\\n\",execname,cpu);"
//"}";
//"proc:::exec-success"
//"{"
//" printf(\"%d\\n\", pid);"
//"}";
"syscall::write:entry"
"{"
" @counts[\"write calls count\", execname] = count();"
"}";
static int dcmdbuffered(const dtrace_bufdata_t *bufdata, void *arg)
{
printf("Print buffer:: %s", bufdata->dtbda_buffered);
return(DTRACE_HANDLE_OK);
}
static void dprog_compile(void)
{
int err;
dtrace_prog_t *prog;
dtrace_proginfo_t info;
if ((g_dtp = dtrace_open(DTRACE_VERSION, DTRACE_O_ILP32, &err)) == NULL){
printf("failed to initialize dtrace: %s\n",
dtrace_errmsg(NULL, err));
exit(1);
}
if ((prog = dtrace_program_strcompile(g_dtp, g_prog, DTRACE_PROBESPEC_NAME, 0, 0, NULL)) == NULL){
printf("failed to compile program");
exit(1);
}
if (dtrace_handle_buffered(g_dtp, dcmdbuffered, NULL) == -1) {
printf("Couldn't add buffered handler");
exit(1);
}
if (dtrace_program_exec(g_dtp, prog, &info) == -1){
printf("failed to enable probes");
exit(1);
}
}
static void dprog_setopts(void)
{
(void) dtrace_setopt(g_dtp, "strsize", "32");
(void) dtrace_setopt(g_dtp, "bufsize", "4m");
(void) dtrace_setopt(g_dtp, "aggsize", "256k");
(void) dtrace_setopt(g_dtp, "stacksymbols", "enabled");
(void) dtrace_setopt(g_dtp, "arch", "x86_64");
(void) dtrace_setopt(g_dtp, "aggsortrev", NULL);
(void) dtrace_setopt(g_dtp, "aggrate", "1sec");
}
static int chew(const dtrace_probedata_t *data, void *arg)
{
dtrace_probedesc_t *pd = data->dtpda_pdesc;
processorid_t cpu = data->dtpda_cpu;
char name[DTRACE_FUNCNAMELEN + DTRACE_NAMELEN + 2];
(void) snprintf(name, sizeof (name), "%s:%s",
pd->dtpd_func, pd->dtpd_name);
printf("%3d %6d %32s ", cpu, pd->dtpd_id, name);
return (DTRACE_CONSUME_THIS);
}
static int chewrec(const dtrace_probedata_t *data, const dtrace_recdesc_t *rec, void *arg)
{
if (rec == NULL) {
printf("\n");
return (DTRACE_CONSUME_NEXT);
}
return (DTRACE_CONSUME_THIS);
}
static void intr(int signo)
{
g_intr = 1;
}
int main(int argc, char **argv)
{
int done=0;
struct sigaction act;
dprog_compile();
dprog_setopts();
if (dtrace_go(g_dtp) != 0){
printf("Error in dtrace_go()\n");
exit(1);
}
(void) sigemptyset(&act.sa_mask);
act.sa_flags = 0;
act.sa_handler = intr;
(void) sigaction(SIGINT, &act, NULL);
(void) sigaction(SIGTERM, &act, NULL);
do {
if (!g_intr && !done)
dtrace_sleep(g_dtp);
if (done || g_intr) {
done = 1;
if (dtrace_stop(g_dtp) == -1)
printf("couldn't stop tracing");
}
switch (dtrace_work(g_dtp, stdout, chew, chewrec, NULL)) {
case DTRACE_WORKSTATUS_DONE:
done = 1;
break;
case DTRACE_WORKSTATUS_OKAY:
break;
default:
printf("processing aborted");
}
} while (!done);
dtrace_aggregate_print(g_dtp, stdout, NULL);
dtrace_close(g_dtp);
return(0);
}
|