Chapter 1 Introduction to Observability Tools

Bryan Cantrill's foreword describes operating systems as proprietary black boxes, welded shut to even the merely curious. Bryan paints a realistic view of the not-too-distant past when only a small amount of the software stack was visible or observable. Complexity faced those attempting to understand why a system wasn't meeting its prescribed service-level and response-time goals. The problem was that the performance analyst had to work with only a small set of hardwired performance statistics,...

7.9.2 TCP Statistics from Kstat

The kstat command can fetch all the TCP MIB statistics. You can print all statistics from the TCP module by specifying -m instead of -n; -m includes tcpstat, a collection of extra kstats that are not contained in the Solaris TCP MIB. And you can print individual statistics

5.6.4 Block Buffer Cache

The buffer cache used in Solaris for caching of inodes and file metadata is now also dynamically sized. In old versions of UNIX, the buffer cache was fixed in size by the nbuf kernel parameter, which specified the number of 512-byte buffers. We now allow the buffer cache to grow by nbuf, as needed, until it reaches a ceiling specified by the bufhwm kernel parameter. By default, the buffer cache is allowed to grow until it uses 2 percent of physical memory. We can look at the upper limit for the buffer...

8.3 cputrack Command

While the cpustat command monitors activity for the entire system, the cputrack command allows the same counters to be measured for a single process. This can be useful for focusing on particular applications and determining whether only one process is the cause of performance issues. The event specification for cputrack is the same as for cpustat, except that instead of an interval and a count, cputrack takes either a command or -p pid.

cputrack -T secs -N count -Defhnv -o file -c events command...

9.7 Interrupt Analysis: intrstat

The intrstat command, new in Solaris 10, uses DTrace. It measures the number of interrupts and, more importantly, the CPU time consumed servicing interrupts, by driver instance. This information is priceless and was extremely difficult to measure on previous versions of Solaris. In the following example we ran intrstat on an UltraSPARC 5 with a 360 MHz CPU and a 100 Mbits/sec interface while heavy network traffic was received.

      device |      cpu0 %tim
-------------+---------------
      device |      cpu0...

3.2.1 Thread Summary: prstat -L

The -L option causes prstat to show one thread per line instead of one process per line. The output is similar to the previous example, but the last column is now PROCESS/LWPID: the name of the process (the name of the executed file) and the lwp ID of the lwp being reported.

11.1.7 Putting It All Together

When your program is linked, the compiler command line must include the argument -lkstat.

cc -o print_some_kstats -lkstat print_some_kstats.c

The following is a short example program. First, it uses kstat_lookup and kstat_read to find the system's CPU speed. Then it goes into an infinite loop to print a small amount of information about all kstats of type KSTAT_TYPE_IO. Note that at the top of the loop, it calls kstat_chain_update to check that you have current data. If the kstat chain has...

7.7.3 nx.se Tool

...write our own performance monitoring tools. It also contained a collection of example tools, including nx.se, which helps us calculate network utilization.

Current tcp RtoMin is 400, interval 1, start Sun Oct 9 10:36:42 2005

Having kb/s lets us determine how busy our network interfaces are. Other useful fields include collision percent (coll%), no-can-puts per second (nocp/s), and...
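
As a rough illustration of how a kb/s figure translates into utilization, the following sketch divides the observed bit rate by the nominal speed of the interface. The function name, the reading of kb/s as units of 1024 bytes per second, and the sample numbers are our assumptions for illustration; none of them is taken from nx.se itself.

```python
def network_utilization(kbytes_per_sec, link_mbits_per_sec):
    """Return link utilization as a percentage.

    Assumptions (ours, not from nx.se): kbytes_per_sec is in units of
    1024 bytes per second, and the link speed is quoted in units of
    1,000,000 bits per second.
    """
    if link_mbits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    bits_per_sec = kbytes_per_sec * 1024 * 8
    return 100.0 * bits_per_sec / (link_mbits_per_sec * 1_000_000)


# Hypothetical sample: 6000 Kbytes/sec observed on a 100 Mbits/sec interface.
print(round(network_utilization(6000, 100), 1))
```

A figure computed this way covers one direction of traffic only; on a full-duplex interface, reads and writes would each be compared against the link speed separately.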

11.2.1 The kstat Command

You can invoke the kstat command on the command line or within shell scripts to selectively extract kernel statistics. Like many other Solaris OS commands, kstat takes optional interval and count arguments for repetitive, periodic output. Its command options are quite flexible. The first form follows standard UNIX command-line syntax, and the second form provides a way to pass some of the arguments as colon-separated fields. Both forms offer the same functionality. Each of the module, instance,...
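
To make the colon-separated form concrete, here is a minimal sketch of how such a specifier could be split into its fields. It assumes the field order module:instance:name:statistic documented for kstat(1M) and treats an empty or missing field as "match anything". The KstatSpec type and parse_kstat_spec function are hypothetical names of ours, and the sketch ignores pattern matching inside fields.

```python
from typing import NamedTuple, Optional


class KstatSpec(NamedTuple):
    """Hypothetical container for the four specifier fields."""
    module: Optional[str]
    instance: Optional[str]
    name: Optional[str]
    statistic: Optional[str]


def parse_kstat_spec(spec: str) -> KstatSpec:
    """Split 'module:instance:name:statistic' into its fields.

    Empty or omitted fields become None, meaning "not constrained".
    """
    fields = spec.split(":")
    if len(fields) > 4:
        raise ValueError("too many fields in specifier: %r" % spec)
    # Pad short specifiers so that trailing fields may be left out.
    fields += [""] * (4 - len(fields))
    return KstatSpec(*(field or None for field in fields))


print(parse_kstat_spec("cpu_info:0:cpu_info0:clock_MHz"))
print(parse_kstat_spec("::tcp"))
```

Returning None for unconstrained fields keeps the later matching step simple: a kstat entry matches when every non-None field agrees.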

2.12.2 Load Averages

The load averages that tools such as uptime print are retrieved using the getloadavg() system call, which returns them from the kernel array of signed ints called avenrun[]. They are actually maintained in a high-precision uint64_t array called hp_avenrun[], and then converted to avenrun[] to meet the original API. The code that maintains these arrays is in the clock() function from uts/common/os/clock.c, and is run once per second. It involves the following. The loadavg update function is called to add...
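
The general technique behind a load average is an exponentially damped moving average that is updated at a fixed interval. The sketch below shows that technique in floating point with a one-second update, matching the once-per-second cadence described above. It is not the kernel's fixed-point code: the function names, the sample input, and the use of floating point are our assumptions.

```python
import math


def update_load_average(previous, runnable, period_secs, interval_secs=1.0):
    """Fold one sample into an exponentially damped average.

    previous     -- the average before this update
    runnable     -- number of runnable threads observed in this sample
    period_secs  -- averaging period (60, 300, or 900 for 1, 5, 15 minutes)
    """
    decay = math.exp(-interval_secs / period_secs)
    return previous * decay + runnable * (1.0 - decay)


def simulate(runnable_samples, period_secs):
    """Apply one update per sample, starting from an idle system."""
    average = 0.0
    for runnable in runnable_samples:
        average = update_load_average(average, runnable, period_secs)
    return average


# Hypothetical workload: two runnable threads for 60 consecutive seconds.
print(round(simulate([2] * 60, period_secs=60), 2))
```

After one full averaging period the result has moved only about 63 percent of the way toward the steady load, which is why a one-minute load average lags behind sudden changes.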

7.7.10 TTCP

Test TCP (TTCP) is a freeware tool that tests the throughput between two hops. It needs to be run on both the source and destination, and a Java version of TTCP runs on many different operating systems. Beware: it floods the network with traffic to perform its test. The following is run on one host as a receiver. The options used here made the test run for a reasonable duration, around 60 seconds.

Receive buflen 8192 nbuf 65536 port 5001

Then the following was run on the second host as the transmitter,...
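
The buflen and nbuf values determine how much data a test moves, so the throughput can be checked by hand. The following sketch does that arithmetic, assuming the transmitter sends nbuf buffers of buflen bytes each. The 60-second figure is only the approximate duration mentioned above, not a measured result, and the function name is ours.

```python
def ttcp_throughput(buflen, nbuf, elapsed_secs):
    """Return (total bytes sent, throughput in Mbits/sec).

    Assumes nbuf buffers of buflen bytes each are transferred and that
    one Mbit is 1,000,000 bits.
    """
    if elapsed_secs <= 0:
        raise ValueError("elapsed time must be positive")
    total_bytes = buflen * nbuf
    mbits_per_sec = total_bytes * 8 / elapsed_secs / 1_000_000
    return total_bytes, mbits_per_sec


# buflen and nbuf from the receiver line above; 60.0 is an approximation.
total, rate = ttcp_throughput(8192, 65536, 60.0)
print(total, round(rate, 1))
```

With these options the test transfers 512 Mbytes in total, so halving or doubling nbuf is a simple way to shorten or lengthen a run.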

2.5 psrinfo Command

To determine the number of processors in the system and their speed, use the psrinfo -v command. In Solaris 10, -vp prints additional information.

The physical processor has 1 virtual processor (0)
  UltraSPARC-III (portid 0 impl 0x15 ver 0x23 clock 900 MHz)
The physical processor has 1 virtual processor (1)
  UltraSPARC-III (portid 1 impl 0x15 ver 0x23 clock 900 MHz)

11.3.5 netstatMulti Implemented in Perl

The Perl script in the following example has the same function as our previous example in Section 11.2.2 that used the kstat and nawk commands. Note that we have to implement our own search methods to find the kstat entries that we want to work with. Although this script is not shorter than our first example, it is certainly easier to extend with new functionality. Without much work, you could create a generic search method, similar to how /usr/bin/kstat works, and import it into any Perl...

The Reviewers

A special thanks to Dave Miller and Dominic Kay, copy-reviewer extraordinaires. Dave and Dominic meticulously reviewed vast amounts of material, and provided detailed feedback and commentary, through all phases of the book's development. The following gave generously of their time and expertise reviewing the manuscripts. They found bugs and offered suggestions and comments that considerably improved the quality of the final work: Lori Alt, Roch Bourbonnais, Rich Brown, Alan Hargreaves, Ben...

7.8.1 tcptop Tool

tcptop, a DTrace-based tool from the freeware DTraceToolkit, summarizes TCP traffic by system and by process. The first line of the report contains the date, CPU load average (one minute), and two TCP statistics, TCPin and TCPout. These are from the TCP MIB; they track local host traffic as well as physical network traffic. The rest of the report contains per-process data and includes fields for the pid, local address (laddr), local port (lport), remote address (faddr), remote port (fport),...

iosnoop uses DTrace to monitor disk events in real time The default output

The output is printed as the disk events complete. To see a list of available options for iosnoop, use the -h option. The options include -o to print disk I/O time, using the adaptive disk-response-time algorithm previously discussed. The following is from iosnoop version 1.55.

USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename] [-m mount_point] [-n name] [-p PID]
       iosnoop         # default output
           -A          # dump all data, space delimited
           -D          # print time delta, us (elapsed)
           -N          # print major and minor numbers
           -v          # print...

4.5 Max I/O Size

An important characteristic when storage devices are configured is the maximum size of an I/O transaction. For sequential access, larger I/O sizes are better; for random access, I/O sizes should be picked to match the workload. Your first step when configuring I/O sizes is to know your workload; DTrace is especially good at measuring this (see Section 4.15). A maximum I/O transaction size can be set at a number of places:

maxphys. Disk driver maximum I/O size. By default this is 128 Kbytes on...
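
To see why the maximum matters for sequential access, consider how many device transfers a single large request turns into. The sketch below assumes that a request larger than the limit is simply broken into limit-sized transfers plus a remainder. The function name is ours, and the 128-Kbyte default is the maxphys figure quoted above.

```python
def split_io(request_bytes, max_io_bytes=128 * 1024):
    """Return the sizes of the transfers needed for one request.

    Assumption (ours): a request larger than max_io_bytes is broken
    into max_io_bytes-sized pieces, with any remainder sent last.
    """
    if request_bytes < 0 or max_io_bytes <= 0:
        raise ValueError("sizes must be positive")
    sizes = []
    remaining = request_bytes
    while remaining > 0:
        chunk = min(remaining, max_io_bytes)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


# Hypothetical requests: a 1-Mbyte sequential read and a 300-Kbyte read.
print(len(split_io(1024 * 1024)))
print(split_io(300 * 1024))
```

Raising the limit reduces the number of transfers per large request, which is the benefit for sequential workloads; small random requests that already fit within the limit are unaffected.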

8.2.1 Cache Hit Ratio, Cache Misses

If both the cache references and hits are available, as with the UltraSPARC IIi CPU in the previous example, you can calculate the cache hit ratio. For that calculation you could also use cache misses and hits, which some CPU types provide. The calculations are fairly straightforward:

cache hit ratio = cache hits / cache references
cache hit ratio = cache hits / (cache hits + cache misses)

A higher cache hit ratio improves the performance of applications because the latency incurred when main memory is...
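
The two forms of the calculation can be written as a pair of small functions. This is a minimal sketch with hypothetical counter values; the function names are ours, and real counts would come from the performance counter events that the CPU provides.

```python
def hit_ratio_from_references(hits, references):
    """Cache hit ratio when hits and total references are available."""
    if references <= 0:
        raise ValueError("references must be positive")
    return hits / references


def hit_ratio_from_misses(hits, misses):
    """Cache hit ratio when hits and misses are available instead."""
    total = hits + misses
    if total <= 0:
        raise ValueError("hits + misses must be positive")
    return hits / total


# Hypothetical counts: 950 hits out of 1000 references (so 50 misses).
print(hit_ratio_from_references(950, 1000))
print(hit_ratio_from_misses(950, 50))
```

Both calls describe the same sample and return the same ratio, since references equal hits plus misses.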

11.5.1 A kstat Provider Walkthrough

To add your own statistics to your Solaris kernel, you need to create a kstat provider, which consists of an initialization function that creates the statistics group and a callback function that updates the statistics before they are read. The callback function is often used to aggregate or summarize information before it is reported to the reader. The kstat provider interface is defined in kstat(3KSTAT) and kstat(9S). More verbose information can be found in usr... The first step is to...

Appendix B DTrace One-Liners

Section B.2. DTrace Longer One-Liners

New processes with arguments,
  dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'
Files opened by process name,
  dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
Files created using creat() by process name,
  dtrace -n 'syscall::creat*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
Syscall count by process name,
  dtrace -n 'syscall:::entry { @num[execname] = count(); }'
  dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
  dtrace -n 'syscall:::entry { @num...

9.3.5 Analyzing Locks with lockstat

The lockstat command provides summary or detail information about lock events in the kernel. By default (without the -I, as previously demonstrated), it provides a systemwide summary of lock contention events for the duration of a command that is supplied as an argument. For example, to make lockstat sample for 30 seconds, we often use sleep 30 as the command. Note that lockstat doesn't actually introspect the sleep command; it's only there to control the sample window. We recommend starting with...

Personal Acknowledgments from Richard

Without a doubt, this book has been a true team collaboration: when we look through the list, there are actually over 30 authors for this edition. I've enjoyed working with all of you, and now have the pleasure of thanking you for your help to bring these books to life. First I'd like to thank my family, starting with my wife Traci, for your unbelievable support and patience throughout this multiyear project. You kept me focused on getting the job done, and during this time you gave me the...

Volume 1 Solaris Internals

Part One: Introduction to Solaris Internals
  Chapter 1 Introduction
Part Two: The Process Model
  Chapter 2 The Solaris Process Model
  Chapter 3 Scheduling Classes and the Dispatcher
  Chapter 4 Interprocess Communication
  Chapter 5 Process Rights Management
Part Three: Resource Management
  Chapter 6 Zones
  Chapter 7 Projects, Tasks, and Resource Controls
Part Four: Memory
  Chapter 8 Introduction to Solaris Memory
  Chapter 9 Virtual Memory
  Chapter 10...