
MSDN Blogs: ETW - Overview


Event Tracing for Windows (ETW) is a system for routing events. ETW is primarily intended for diagnostic purposes and is optimized to minimize impact on the overall system performance. ETW should not be used for control purposes because it does not offer guaranteed delivery -- events might be lost in certain circumstances (e.g. if events occur too quickly or if the system shuts down before the events are saved to disk).

ETW Components

ETW works as follows: a controller tool configures sessions (telling ETW which events should be routed to each session and how to store them); an event provider generates events; the ETW runtime routes events to the appropriate sessions; the sessions record the events; a decoder tool extracts information from the events; an analysis tool makes use of that information.

Microsoft provides several controller tools such as xperf, logman, and tracelog. You can use these tools to start a session (configuring it to save certain events to a specific file) and to stop a session. Third parties can use the ETW APIs (StartTrace, ControlTrace, TraceSetInformation, EnableTraceEx2) to create their own controller tools. In addition, the Windows OS itself controls several sessions that it uses to monitor its own performance and reliability. ETW sessions can be configured in many different ways:

  • The session can be configured to write events to a file. ETW can treat the file as unlimited, treat it as a circular buffer (overwriting old events once the file has reached a certain size), start a new file when the current file has reached a certain size, or stop the trace when the file has reached the size limit.
  • The session can be configured to write events to memory. These events will be lost if the system is turned off. However, the data in memory can be flushed to disk on-demand. The session can stop when memory is full or it can use the memory as a circular buffer (overwriting old events once memory is full).
  • The session can be configured to send events to a program in real-time. The program can then decide what to do with each event.
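
The difference between "stop when full" and "circular" can be sketched in a few lines of portable C. This is only a toy model, not ETW code: the event_buffer_t type, the two mode names, and the 4-event capacity are all made up for illustration, and an int stands in for an event.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUFFER_CAPACITY 4  /* hypothetical: room for 4 events */

typedef enum { MODE_STOP_WHEN_FULL, MODE_CIRCULAR } buffer_mode_t;

typedef struct {
    int           events[BUFFER_CAPACITY]; /* an int stands in for an event */
    size_t        count;                   /* number of events currently held */
    size_t        next;                    /* slot the next event is written to */
    buffer_mode_t mode;
} event_buffer_t;

/* Returns true if the event was stored, false if it was rejected. */
bool buffer_write(event_buffer_t *b, int event)
{
    if (b->count == BUFFER_CAPACITY && b->mode == MODE_STOP_WHEN_FULL) {
        return false;                      /* full: recording stops */
    }
    b->events[b->next] = event;            /* when full, this overwrites the oldest slot */
    b->next = (b->next + 1) % BUFFER_CAPACITY;
    if (b->count < BUFFER_CAPACITY) {
        b->count++;
    }
    return true;
}

/* Returns the oldest event still held; the buffer must not be empty. */
int buffer_oldest(const event_buffer_t *b)
{
    size_t oldest = (b->next + BUFFER_CAPACITY - b->count) % BUFFER_CAPACITY;
    return b->events[oldest];
}
```

In circular mode the most recent events always survive, which suits "flush on demand after something goes wrong" scenarios. In stop-when-full mode the earliest events survive instead.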

3rd parties can make use of the ETW APIs to create providers. Many Microsoft-provided components also act as providers. The Windows OS kernel publishes many events regarding the operation of the OS, and many Microsoft-provided drivers and applications publish events as well.

Many ETW sessions are always running on the system, started and controlled by the OS itself. For example, the Windows Event Log uses ETW sessions to receive events, so the Event Log sessions are always running to enable this functionality. Other ETW sessions are started as needed. For example, Visual Studio will start a special ETW session when you ask it to do performance profiling of your application. Visual Studio will configure the session to receive events from the OS about the performance of your application, which it will then analyze to determine where your application is spending its time and memory.

Microsoft provides several decoder tools that you can use to read data that has been recorded by an ETW session. There are many different ways of encoding data into ETW events. ETW itself treats the data as binary data, and any kind of data can be encoded using any system. If the data is encoded using a custom scheme, you must use a custom tool to decode the data. However, there are several Microsoft-supported systems for encoding ETW data, and corresponding tools that can decode the ETW events generated by these systems. For example:

  • WPP is a system for encoding diagnostic trace information and writing it to ETW. If you use WPP to generate ETW events, you can decode them using tools such as tracefmt or tracerpt. Note that in order to decode these traces, you must have the TMF or PDB files with information about your events.
  • MOF events are rarely used in new code, but are generated by the Windows OS. Tools such as tracerpt, xperf, and WPA can decode the most common MOF events. In order to decode MOF events, the event schema must be registered with WMI.
  • Manifest-based ETW is a general-purpose system for encoding and decoding ETW data using XML manifests. If you use manifest-based ETW, you can decode the events using tracerpt or xperf. Note that in order to decode these traces, the data from the manifest must be available to the decoding tool. This can be provided directly (via the XML manifest) or indirectly by compiling the manifest into a binary format, including the binary data in the resources of your component, and registering the component on the system so that the decoding tool can find it.
  • TraceLogging is a general-purpose system for encoding and decoding ETW data without using any separate decoding information. These events can be decoded using the Windows 10 versions of tracerpt or xperf. Each event contains its own decoding information. This means each event will be larger, but it also means that the decoding information is always available and no separate manifests, MOF files, PDBs, or TMF files need to be tracked.

Third parties can use the ETW consumer APIs (e.g. ProcessTrace, which delivers each event as an EVENT_RECORD structure) to get the data from an ETL file, and can use the TDH APIs (e.g. TdhGetEventInformation) to decode events that use the Microsoft-supported encoding systems.

ETW was introduced as part of WMI in Windows 2000. Events written using the Windows 2000 APIs (e.g. TraceEvent) are sometimes called "Classic" events. WPP and MOF events are classic events. Classic events have a limitation in that they can only be routed to a single session.

ETW was significantly updated for Windows Vista. Events written using the Windows Vista APIs (e.g. EventWrite) are sometimes called "Crimson" events. Manifest-based and TraceLogging events are crimson events. Crimson events can be routed to up to 8 sessions.

Event Content

Each ETW event contains the following information:

  • Required user-supplied information, such as the event provider GUID, the event ID, the event's "severity level", and the event's keywords. (For example, the data in the EVENT_DESCRIPTOR structure.)
  • Optional user-supplied information, such as event payload (binary data), activity GUID, related activity GUID.
  • Always-present ETW-supplied information, such as the event's timestamp and the ID of the thread logging the event.
  • Optional ETW-supplied information, such as call stacks. The session controller configures ETW to include or exclude this data.
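
To make the "required user-supplied information" concrete, here is a portable stand-in for the EVENT_DESCRIPTOR structure. The field names and order follow the Windows SDK definition, but the fixed-width types, the event_descriptor_t name, and the sample values are assumptions for illustration. Real code would include the SDK's evntprov.h instead. Note that the provider GUID is not part of the descriptor, since it is supplied when the provider registers.

```c
#include <stdint.h>

/* Portable mirror of EVENT_DESCRIPTOR (illustrative stand-in only). */
typedef struct {
    uint16_t Id;       /* event ID */
    uint8_t  Version;  /* event version */
    uint8_t  Channel;  /* 0 = no special treatment */
    uint8_t  Level;    /* 1 = critical ... 5 = verbose, 0 = always enabled */
    uint8_t  Opcode;   /* 0 = Info */
    uint16_t Task;     /* user-defined category, often equal to Id */
    uint64_t Keyword;  /* category bitmask */
} event_descriptor_t;

/* Hypothetical event: "connect failed", ID 7, level error, keyword bit 0x1. */
static const event_descriptor_t k_connect_failed = {
    .Id = 7, .Version = 0, .Channel = 0, .Level = 2,
    .Opcode = 0, .Task = 7, .Keyword = 0x1
};
```

With natural alignment the whole descriptor fits in 16 bytes, so passing it along with every event is cheap.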

The Event Provider GUID is supplied in the call to EventRegister. This GUID is used when routing and decoding events. For example, an event controller might configure a session to include all events with a particular GUID. When decoding MOF-based and manifest-based events, the provider GUID is used to find the data that explains how to decode events. Note that multiple components can use the same provider GUID (even if they run at the same time) as long as you intend to enable/disable/route the events from multiple components as a unit, and in the case of manifest-based and MOF-based events as long as they share the same decoding information. Note also that there should be a strong 1-to-1 relationship between a provider name and the provider GUID. In order to maintain this relationship, you might want to base the provider GUID directly on the provider name using a hash. (A tool that performs this hash is provided in an earlier blog post.)

Each event has an event ID. The event ID is used when routing and decoding events. For example, an event controller might configure a session to include only events 1, 2, and 5 from a particular provider. When decoding MOF and manifest-based events, the event ID + event Version should uniquely identify the event (e.g. given a provider GUID, an event ID, and an event Version, you should be able to look up the event's binary layout, and the binary layout should never change for the GUID + Id + Version combination -- every event with the given GUID + Id + Version should have the same fields with the same types in the same order).

Each event has an event version. This is used when you need to change the fields in an event. When the binary layout of an event is changed, you'll typically duplicate the event in the manifest or MOF file, then increment the event version in one copy and make the necessary changes in that copy. That way, you can still decode events in the old format, and you can move forward using the new format.
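
One way to picture the ID + Version rule is as a lookup table that a decoder might consult for a single provider. The sketch below is purely illustrative: the event_schema_t type, the find_layout helper, and the layouts themselves are invented here, and real decoders get this information from the manifest or MOF data rather than a hard-coded table.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    id;
    uint8_t     version;
    const char *layout;   /* human-readable field list, for illustration */
} event_schema_t;

/* Hypothetical schemas for one provider. Event 1 gained a field in version 1;
 * the version 0 entry is kept so that old traces can still be decoded. */
static const event_schema_t k_schemas[] = {
    { 1, 0, "uint32 BytesSent" },
    { 1, 1, "uint32 BytesSent; uint32 RetryCount" },
    { 2, 0, "string HostName" },
};

/* Returns the layout for (id, version), or NULL if it is unknown. */
const char *find_layout(uint16_t id, uint8_t version)
{
    for (size_t i = 0; i < sizeof(k_schemas) / sizeof(k_schemas[0]); i++) {
        if (k_schemas[i].id == id && k_schemas[i].version == version) {
            return k_schemas[i].layout;
        }
    }
    return NULL;
}
```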

Each event has an event level. This is an indicator of the event's severity or importance, and is used in event filtering. For example, an event controller might configure a session to include only events with severity "error" from a particular provider, or an event analysis tool might filter out events with severity below "warning". The level can be any value from 0 to 255. Levels 1-5 have been defined: 1 is critical, 2 is error, 3 is warning, 4 is info, and 5 is verbose. Typically, events should default to level 5 (verbose). The level 0 is special -- it means that the event does not specify a particular level, and will be enabled regardless of any level-based filtering done for the session. A lower value means a more-severe event.
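
The level rule is small enough to write down directly. This is a minimal sketch that assumes the session specifies a nonzero maximum level. The enum names and the level_passes helper are made up for illustration and are not ETW APIs.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    LEVEL_ALWAYS   = 0,  /* event does not specify a level */
    LEVEL_CRITICAL = 1,
    LEVEL_ERROR    = 2,
    LEVEL_WARNING  = 3,
    LEVEL_INFO     = 4,
    LEVEL_VERBOSE  = 5
};

/* True if an event at event_level passes a session filter of session_level.
 * Lower values are more severe, so the session level acts as a maximum. */
bool level_passes(uint8_t event_level, uint8_t session_level)
{
    return event_level == LEVEL_ALWAYS || event_level <= session_level;
}
```

For example, a session enabled at LEVEL_WARNING receives critical, error, and warning events, plus any level-0 events, but not info or verbose events.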

Each event has a keyword field. (In classic ETW, the keyword field was called flags.) Each bit in the keyword corresponds to a category, and if the bit is set in the keyword for an event, it indicates that the event is in the specified category. The low 48 bits of the keyword are user-defined, while the upper bits must only be set for specific scenarios as defined by Microsoft. For example, the provider might define bit 0x1 as "Networking event", bit 0x2 as "I/O event", and bit 0x4 as "UI event", and a particular event might set its keyword to 0x5 indicating that the event is related to both networking and UI. An event controller might configure a session to only include events with certain keywords. If the keyword value is 0 (no bits set), that means the event does not specify any keyword and will be enabled regardless of any keyword-based filtering. (Note that starting with Windows 10, a controller can configure a session to ignore events with keyword set to 0, but this is not the default behavior.)
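
The keyword example above might look like this in code. It is a simplified sketch: the KEYWORD_* names and both helpers are hypothetical, and the session filter is modeled as a single "match any of these bits" mask, which is simpler than what real controllers can request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical provider-defined categories (low 48 bits only). */
#define KEYWORD_NETWORKING 0x1ull
#define KEYWORD_IO         0x2ull
#define KEYWORD_UI         0x4ull

/* True if the event passes a session that asked for any bit in session_mask.
 * A keyword of 0 means "no category", which passes every keyword filter. */
bool keyword_passes(uint64_t event_keyword, uint64_t session_mask)
{
    return event_keyword == 0 || (event_keyword & session_mask) != 0;
}

/* True if the keyword uses only the user-definable low 48 bits. */
bool keyword_is_user_defined(uint64_t keyword)
{
    return (keyword >> 48) == 0;
}
```

An event with keyword 0x5 (networking and UI) passes a session that asked for KEYWORD_UI, but not one that asked only for KEYWORD_IO.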

Each event has an opcode field. (In classic ETW, the opcode field was called class.) The opcode is used to mark particular events, but does not affect ETW routing. A few opcodes have well-defined semantics recognized by analysis tools. Other opcodes can be defined by the user for any purpose and general-purpose analysis tools will only use the opcode to label or group the events. The default opcode is 0 = Info. The most commonly-used opcodes are Start (indicating the beginning of an activity) and Stop (indicating the end of an activity). Other commonly-used opcodes include DC_START (typically indicating that the associated event contains a dump of the provider's state, triggered by a notification that a session has started collecting data from the provider) and DC_END (similarly indicating that the associated event contains a state dump triggered by the end of data collection). The remaining opcodes can be used or redefined at the discretion of the user.

Each crimson event has a task field. The task is a user-defined category that can be used to mark an event. While there is no specific guidance for the use of the task field, it is commonly used to assign a name to an event. Manifest-based ETW does not support associating a name with an event ID (i.e. there is no direct support for the concept of an "event name"). However, it does support associating a name with a task. It is common practice to use the same value for Event ID and Task ID, and to use the task name as the event name.

Each crimson event has a channel field. The channel can cause an event to be given special treatment by the ETW runtime or by an event consumer. The default channel 0 means no special treatment is requested. Channels 1-15 are reserved for definition by Microsoft and may lead to special treatment for the event by the ETW runtime or other built-in Windows OS components. Other channel values are user-defined and can be used to request special treatment from a session. For example, a real-time session might interpret channel 25 as indicating a high-priority event. Channels are most often used when interacting with the Windows Event Log. The Event Log's session listens to certain event providers that have registered themselves with the service. The Windows Event Log service uses the event's channel to route the event to the correct log. For example, a particular provider might register itself with Event Log and configure channel 16 to mean the Application log. 

Event Filtering and Routing

ETW is responsible for routing each event to the correct set of sessions. Classic event providers (providers using the Windows 2000 APIs such as TraceEvent) can be routed to at most one session. Crimson event providers (providers using the Windows Vista APIs such as EventWrite) can be routed to up to 8 sessions.

Most events are not routed at all. If no session has asked for any events from a particular provider, the provider is considered disabled. When a disabled provider calls EventWrite, ETW will determine that the provider is disabled and will immediately return success, indicating that the event was delivered to all interested sessions (all 0 of them). However, this is relatively inefficient -- it takes about 100 CPU cycles for EventWrite to look up the provider's state and determine that the provider is disabled. (Note that using APIs such as EventEnabled or EventProviderEnabled will not help, since these APIs also take about 100 CPU cycles to look up the provider's state. The only difference is that EventEnabled can be called before the event payload is set up.) The implementation of EventWrite looks something like this:

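// Simplified illustration only; this is not the actual implementation.
ULONG EventWrite(
    REGHANDLE handle,
    PCEVENT_DESCRIPTOR descriptor,
    ULONG count,
    PEVENT_DATA_DESCRIPTOR data)
{
    ULONG errorCode = ERROR_SUCCESS;

    // This lookup is the ~100-cycle cost, paid even when the provider is disabled.
    PROVIDER_INFO info = LookupProviderInfo(handle);
    if (info.IsEnabled && info.IsLevelOk(descriptor->Level) && info.IsKeywordOk(descriptor->Keyword))
    {
        // At least one session wants this event.
        errorCode = info.SendEventToSessions(descriptor, count, data);
    }

    // A disabled or filtered-out event still reports success.
    return errorCode;
}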

Providers can improve their performance by providing a callback function when they register with ETW (see the EventRegister API). ETW will invoke the callback whenever a session enables or disables the provider. This allows the provider to keep track of its own state, including whether it is enabled or disabled, what levels are being filtered out, and what keywords are being filtered out. The provider can check its own state much more efficiently than EventWrite, allowing it to run much more efficiently when the provider is disabled. A provider might store its state in global variables and might then write events using code like this:

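// The g_provider* globals are kept up to date by the enable callback.
// The level check is inclusive: a session enabled at "error" wants error events.
if (g_providerIsEnabled &&
    g_providerLevel >= ThisEventDescriptor.Level &&
    IsKeywordEnabled(ThisEventDescriptor.Keyword))
{
    EVENT_DATA_DESCRIPTOR data[4];
    // Prepare data for event ...
    EventWrite(g_providerHandle, &ThisEventDescriptor, 4, data);
}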

In the case where the provider is disabled, this will skip the event-write code using only 1 or 2 CPU cycles. In the case where the provider is enabled, but the event is filtered out by level or by keyword, this will skip the event-write code using only 5 or 10 CPU cycles. The code to prepare the data and call into ETW will only run if the provider is enabled for events with the given level and keyword.
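
The callback side of this pattern might look roughly like the following. This is a portable sketch, not the real thing: the actual callback passed to EventRegister has a longer signature (it also receives the source GUID, a match-all keyword mask, filter data, and a context pointer), and the on_enable_changed and should_write names are invented here. The level comparison is inclusive, so a provider enabled at "error" still writes error events.

```c
#include <stdbool.h>
#include <stdint.h>

/* Provider state cached in globals so that each event can check it cheaply. */
static bool     g_providerIsEnabled = false;
static uint8_t  g_providerLevel     = 0;
static uint64_t g_providerKeywords  = 0;

/* Stand-in for the enable callback: in this sketch it is called whenever
 * a session enables or disables the provider. */
void on_enable_changed(bool is_enabled, uint8_t level, uint64_t match_any_keyword)
{
    g_providerIsEnabled = is_enabled;
    g_providerLevel     = level;
    g_providerKeywords  = match_any_keyword;
}

/* An event keyword of 0 passes every keyword filter. */
bool IsKeywordEnabled(uint64_t event_keyword)
{
    return event_keyword == 0 || (event_keyword & g_providerKeywords) != 0;
}

/* The cheap per-event check performed before any payload is prepared. */
bool should_write(uint8_t event_level, uint64_t event_keyword)
{
    return g_providerIsEnabled &&
           g_providerLevel >= event_level &&
           IsKeywordEnabled(event_keyword);
}
```

Because the check only reads a few globals, the common "provider disabled" path never calls into ETW at all.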

The code for maintaining the callback and for checking the state of the provider for each event can be cumbersome. Various frameworks exist to hide this complexity from the user. WPP and manifest-based ETW generate header files with code that looks similar to that shown above, implementing very efficient filtering with little effort on the part of the developer. The TraceLoggingProvider.h header implements similar filtering in the TraceLoggingWrite macro. For example:

TraceLoggingWrite(
    g_hProvider,
    "EventName",
    TraceLoggingValue(GetMyValue(), "MyValue"));

In this code, the GetMyValue() function will only be invoked if the provider is enabled.

The .NET EventSource class implements efficient filtering within the WriteEvent, WriteEventCore, and Write methods, so these methods return very quickly if the provider is disabled or if the event is filtered by level or keyword. Similarly, the Windows Runtime LoggingChannel class also implements efficient filtering so that the LogEvent, LogMessage, and LogValuePair methods return very quickly if the event does not need to be written. However, in these cases, the automatic filtering does not eliminate the time wasted preparing the data for an event. For example, with .NET EventSource:

myEventSource.WriteMyValue(GetMyValue());

In this code, the GetMyValue() function will be invoked even if the provider is disabled. To avoid this, a separate check must be made. If the GetMyValue() function might be expensive, it would be more efficient to use code like this:

if (myEventSource.IsEnabled())
{
    myEventSource.WriteMyValue(GetMyValue());
}

Note that the above check only tests the overall provider state. This might be sufficient, or it might be better to use another overload of IsEnabled to also check for level-based or keyword-based filters.

The same applies to TraceLoggingProvider.h -- the automatic filtering does not skip code outside the TraceLoggingWrite macro. For these cases, a manual check might be needed for optimal code. Each of these frameworks provides methods for determining whether the provider is enabled without sending an event. For example, if preparing the data is more complex than a single GetMyValue() call, the above code might be written as follows:

if (TraceLoggingProviderEnabled(g_hProvider, 0, 0))
{
    int myValue;
    GetMyValue(&myValue);
    TraceLoggingWrite(...);
}

ETW supports many other kinds of filters that are difficult or impossible to replicate within the provider code. The provider-side filtering will not exactly match the complete filtering and routing done by ETW, and some events will pass the provider-side filtering only to be dropped by the more complete filtering done by the ETW runtime. This is generally ok -- the provider-side filtering usually eliminates the vast majority of unnecessary events in only a few CPU cycles. However, for this simple provider-side filtering to be most effective, events and providers should be organized in a way that allows event scenarios to be defined by a combination of provider GUID, level, and keyword.

If scenarios are well-defined using filters understood by the provider, the scenario will have minimal impact on the system when the corresponding session is created. For example, if a "networking" keyword is defined, and the "error" level is used consistently, it is very easy to diagnose a networking error by setting up a session to collect events with level <= error and keyword mask = networking. On the other hand, if no keywords are defined for your events, and levels are not used consistently, diagnosing the same networking error might not allow for any meaningful filtering at the level or keyword layer. All events in the provider might need to be enabled and collected, leading to a performance impact while the session is enabled.

Each session can be configured to receive events from any number of provider GUIDs. (Each classic provider GUID can be associated with no more than one session. Each crimson provider GUID can be associated with no more than 8 sessions.) For each provider GUID, various filters can be established, such as filtering by level, keyword, or event ID. The session can also request additional information for certain events such as callstacks.

When ETW receives an event from a provider, it determines the set of sessions that are receiving events from that provider. It then checks the filters for each session, removing the session from the set if the event does not meet that session's filters. If the resulting set is non-empty, ETW reserves space for the event in the buffers of each remaining session. After it has reserved space, it copies the event into the reserved space for each session.

If ETW fails to reserve space in any of the interested sessions, its default behavior is to drop the event (i.e. if ETW cannot deliver the event to all interested sessions, it does not deliver the event at all). This policy was intended for consistency, so that the different sessions all record a consistent story. However, this behavior is not always desirable. A session can opt out of this policy by setting the EVENT_TRACE_INDEPENDENT_SESSION_MODE flag.
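
The filter, reserve, and copy steps can be sketched as a toy model in portable C. This is not how the ETW runtime is implemented. The session_t type and route_event function are invented for illustration, buffer space is reduced to a byte counter, and the treatment of independent sessions is an assumption: here an independent session records the event whenever it has room itself, while any failed reservation causes all non-independent sessions to drop the event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 8  /* per-provider limit for crimson events */

typedef struct {
    uint8_t  max_level;       /* level filter */
    uint64_t keyword_mask;    /* keyword filter (match any bit) */
    size_t   free_bytes;      /* space left in the session's buffers */
    bool     independent;     /* models EVENT_TRACE_INDEPENDENT_SESSION_MODE */
    unsigned events_recorded;
} session_t;

static bool session_wants(const session_t *s, uint8_t level, uint64_t keyword)
{
    bool level_ok   = (level == 0) || (level <= s->max_level);
    bool keyword_ok = (keyword == 0) || ((keyword & s->keyword_mask) != 0);
    return level_ok && keyword_ok;
}

/* Routes one event; returns the number of sessions that recorded it. */
unsigned route_event(session_t *sessions, size_t count,
                     uint8_t level, uint64_t keyword, size_t event_size)
{
    bool has_room[MAX_SESSIONS] = { false };
    bool any_failure = false;
    unsigned delivered = 0;

    if (count > MAX_SESSIONS) {
        count = MAX_SESSIONS;
    }

    /* Steps 1 and 2: apply each session's filters, then check for space. */
    for (size_t i = 0; i < count; i++) {
        if (!session_wants(&sessions[i], level, keyword)) {
            continue;
        }
        if (sessions[i].free_bytes >= event_size) {
            has_room[i] = true;
        } else {
            any_failure = true;
        }
    }

    /* Step 3: copy the event, honoring the all-or-nothing default. */
    for (size_t i = 0; i < count; i++) {
        if (!has_room[i]) {
            continue;
        }
        if (any_failure && !sessions[i].independent) {
            continue;  /* default policy: drop for consistency */
        }
        sessions[i].free_bytes -= event_size;
        sessions[i].events_recorded++;
        delivered++;
    }
    return delivered;
}
```

In this model, two default-mode sessions where one is out of space both miss the event, whereas marking the session that has room as independent lets it keep recording.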

Each ETW session has a worker thread that handles tasks for the session. For file-based sessions, the worker thread is responsible for noticing when a buffer is full, writing the full buffer to disk, marking the buffer as empty, and allowing it to be reused for future events. For real-time sessions, the worker thread is responsible for sending full buffers to the real-time consumer process.

