New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diagnostics_channel: add tracing channel #44943
base: main
Are you sure you want to change the base?
Conversation
|
It might be helpful to allow/support something like |
|
The idea was to just modify the ctx object whenever updates occur between the events. Open to adding additional events though if we have a need to capture immediate data for whatever reason. |
That should be fine. Main issue I see with the context object is that there are some pre-ocupied property names (error, result as of now) which could easily result in conflicts various channels add more and more such properties to transport more data. |
95f3198
to
0438150
Compare
|
Made some changes to split up the logic for sync, callback, and promise functions. What do people think of this design? If this seems reasonable I can move forward with writing docs for it. |
|
I think the split API is better because usually you know if you trace a sync or async function. We might add a "cb or promise" variant later if needed. And there are always special cases which don't fit (e.g. the returned promise is actually a mixin with an event emitter,...). How is it intended that a subscriber knows if operation is sync or async? We could transport the API used on the given context object. Or should this be part of the channel documentation? For the sync part we have start/end which could be used for calling enter/exit on |
|
Yes, we're only caring about the trace up to when the callback runs. If the user wants more they can use async_hooks. As for detecting sync vs async, we could have the start or end events include some flag like sync: true or sync: false depending on which version was used. |
ac655d8
to
f3afea1
Compare
f3afea1
to
8765e7d
Compare
8765e7d
to
db4c8c9
Compare
db4c8c9
to
9128df4
Compare
This comment was marked as outdated.
This comment was marked as outdated.
1d8b89e
to
1215d64
Compare
Co-authored-by: Colin Ihrig <cjihrig@gmail.com>
|
BTW, for anyone reviewing I have also rebased #44340 to use this interface to demonstrate the value and how it would be used in a relatively complex real-world scenario. |
| function decRef(channel) { | ||
| channel._weak.decRef(); | ||
| if (channel._weak.getRef() === 0) { | ||
| delete channels[channel.name]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| delete channels[channel.name]; | |
| channels[channel.name] = undefined; |
Otherwise, you will delete the shapes and the property access will be slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but the memory use will be greater, and performance of getting channels shouldn't be much of a priority as it's intended for it to be acquired at the file top-level rather than dynamically. It's a situation of deciding which trade-offs to go for. 🤷🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @mcollina has some thoughts here? Not a clear win either way to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How frequently would a call to decRef be in a real APM? I feel deleting the key would have more prejudice than a little memory increase (assuming fewer calls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ideal conditions, basically never. But the whole reason for the weak behavior in the first place is to protect against abusing it with excessively dynamic use. If someone is continuously generating and subscribing to new, uniquely-named channels it will be constantly expanding the shape without ever cleaning it up. Though we technically already have that with GC if a channel gets collected without a subscriber ever being added. 🤷🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unrealistic. Technically, if someone is subscribing excessively to unique channels, they will only see a memory increase after tons of calls, right?
Also, might be good to include a benchmark to validate our thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a test around that at the moment. https://github.com/nodejs/node/blob/main/test/parallel/test-diagnostics-channel-memory-leak.js
Without the delete that test will fail because the shape of channels keeps expanding and will get filled out with a bunch of empty WeakReference instances that the referenced object can get cleaned up but the container itself will not and those containers can add up quickly if people are, for example, generating channels uniquely named to ids of things. In my opinion that's bad practice, but there's not really any reasonable way to stop users from doing that so we opted for being the least damaging to the environment we can be if the API is used badly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to use delete here to avoid a leak if we stick on an object. Maybe we should use a Map instead?
The lookup/remove performance is in my opinion not that important as the intended use is to create a Channel once during startup and use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, might be worth considering.
| > Stability: 1 - Experimental | ||
| The class `TracingChannel` is a collection of [TracingChannel Channels][] which | ||
| together express a trace. It is used to formalize and simplify the process of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| together express a trace. It is used to formalize and simplify the process of | |
| together express a span. It is used to formalize and simplify the process of |
Or maybe operation. The term trace is usually a tree of linked spans/operations and a single tracing channel forms one entry of a trace not the whole trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "span" or "operation" are great terms here either as there's zero agreement on terminology for what those units are called between APMs. It also means additional terminology that likely needs more explaining. This feature is not intended exclusively for APM and what can be considered a "trace" can be a single unit just as much as a collection of units. I think given the naming of TracingChannel it makes sense to call it a trace. If an expression of a collection of these units is described in this doc somewhere we could just call it a sub-trace. Many APMs think of it that way anyway, where there's not really a distinction between spans and traces, they're all just nested elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using (internal) terms from APM vendors will not help here.
I thought using Span would be quite good because it's also used in OpenTelemetry which is quite popular meanwhile and open.
At some other places you used single traceable action. Matches better but is quite long.
Anyhow, feel free to keep it as it is. APM aware people will be likely able to understand this and others are likely lost at more places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single traceable action terminology might be better here. It is long, but it's also unambiguous and not so much based on loaded terminology like "span" or whatever else an APM might know the concept as. What do you think of changing it to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
Co-authored-by: Gerhard Stöbich <deb2001-github@yahoo.de> Co-authored-by: Chengzhong Wu <legendecas@gmail.com>
7fd627f
to
0544ff2
Compare
0544ff2
to
bc67924
Compare
| @@ -105,27 +188,17 @@ function channel(name) { | |||
| } | |||
|
|
|||
| channel = new Channel(name); | |||
| channels[name] = new WeakReference(channel); | |||
| channel._weak = new WeakReference(channel); | |||
| channels[name] = channel._weak; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this two lines into the Channel constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably? I'll have to make sure the GC characteristics are still correct that way. I think it should be fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no difference. In the end you add two properties to the new channel instance. Not sure why we need two properties referring to the same WeakReference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, don't actually need _weak anymore. That's leftover from something I had before to hold channels open through TracingChannel as I was constructing them directly before, but I changed how that works so that's not relevant anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that this is not resulting in two properties on channel as it is channels[name]. I guess that is still needed to allow a setup where a consumer subscribes on a channel but doesn't keep a reference to the channel instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can go back to how it was before where the WeakReference is stores directly in channels[name] rather than also needing to exist on the channel itself. That was just there because I had some weird weak reference forwarding thing for TracingChannel which turned out to be a bad idea so I deleted it.
| @@ -105,27 +188,17 @@ function channel(name) { | |||
| } | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related to this PR but shouldn't we move the type check to the beginning of the function?
| } | ||
|
|
||
| function maybeMarkInactive(channel) { | ||
| // When there are no more active subscribers, restore to fast prototype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // When there are no more active subscribers, restore to fast prototype. | |
| // When there are no more active subscribers and no bound stores, restore to fast prototype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still correct that ActiveChannel.hasSubscribers returns always true?
There can be a bound ALS but no subscriber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think of a bound ALS as a type of subscriber, but maybe we need a discussion of what specifically hasSubscribers means? My intent for it was to decide if something needs to be published at all, which is still kind of the case but with a distinction between publish and runStores. Maybe we need another getter for if runStores needs to be called, which is any of subscribers or stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TracingChannel has no such API and in the traceXXX functions no check is done. Also your real world sample in loaders doesn't check this.
But I agree that from that point of view hasSubscribers does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TracingChannel actually did have it originally, but I removed it in favour of doing something like tracingChannel.start.hasSubscribers instead.
| * `thisArg` {any} The receiver to be used for the function call. | ||
| * `...args` {any} Optional arguments to pass to the function. | ||
|
|
||
| Publishes data to the channel and applies it to any AsyncLocalStorage instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should document the sequence that first publish happens and then the store is entered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do that. Can you think of any examples when that ordering would matter? Do we need to say anything beyond just swapping that and for a then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be clear what one gets if als.getStore() is called inside the onMessage function. Same is valid for the transform function where sample relies on the fact that it happens before als.run() is called to get the parent.
I think it's important to tell users that publish and transform happen before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. I'll make some improvements to that soon. :)
| } | ||
|
|
||
| function wrapStoreRun(store, data, next, transform = defaultTransform) { | ||
| return () => store.run(transform(data), next); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we somehow handle a throwing transform function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could handle it similarly to subscribers where failures get deferred to the next tick so it can finish the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this or silently discard the exception and skip this subscriber. But I think we should not swallow such exceptions.
| unsubscribe, | ||
| Channel | ||
| Channel, | ||
| TracingChannel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to export TracingChannel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically one could sub-class it. Not sure why they'd want to though. 🤷🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar as for Channel we export a factory method instead of telling users to use new TracingChannel().
If we don't want construction via new allowing sub-classing sounds wrong.
Well one could use it for instanceof.
ActiveChannel is not exported and this is good to my understanding.
Also Channel should be not exported in my opinion but that ship has sailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Channels will pass instanceof even when active due to the SymbolHasInstance method on Channel. Also, ideally I'd like to make the factory unnecessary. I have some ideas for sub-classing TracingChannel which I'd like to enable if at all possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove the factory tracingChannel? I see no reason to support both variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just went for that for consistency with Channel. We'll see how my sub-classing experiments go though. I'm not totally convinced of the utility of exposing both. I'll be taking another pass at cleaning stuff up soon, just busy with settling other work things before the holidays so might not get a chance to update this PR this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. My main point is that it is always easier to add an export compared to removing it - even if it is experimental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I think I can just omit it then. It's easy to add later if I have a specific case for it.
| this.publish(data); | ||
|
|
||
| // Bind base fn first due to AsyncLocalStorage.run not having thisArg | ||
| fn = FunctionPrototypeBind(fn, thisArg, ...args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is considered as a hot function it would maybe make sense to avoid the bind in case this._stores.size === 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inactive channel prototype handles that for the most part. A bit of a weird case with subscribers and stores competing for active status. Ideally this should be a no-op when there's no stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a noop, the function needs to be called. But this can be a simple Refect.apply instead of creating a bound function and call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean no-op in the form of doesn't need to check anything. Basically it should do what Channel.prototype.runStores does even if there are subscribers but no stores.
| * `data` {Object} Shared object to correlate trace events through | ||
| * `thisArg` {any} The receiver to be used for the function call | ||
| * `...args` {any} Optional arguments to pass to the function | ||
| * Returns: {Promise} Promise returned by the given function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a chained promise, not the original one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an implementation detail which ideally shouldn't be visible to the user. Do you think this is really relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For plain, native promises not. But there are still a lot frameworks which use thenables like bluebird. I have even seen thenables which are EventEmitters to allow caller to decide on awaiting or something else.
But maybe we should just disallow non native promises. Then it's don't care.
|
|
||
| try { | ||
| const promise = start.runStores(ctx, fn, thisArg, ...args); | ||
| return PromisePrototypeThen(promise, resolve, reject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is restricted to native promises and wont work for there thenables like e.g. a bluebird promise.
Should we add some check regarding this to avoid a TypeError is thrown from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just wrap promise in a Promise.resolve(promise) if it's not already a native promise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that it removes any API the original thenable had which likely breaks user code. So maybe best to restrict this to native promises. (refs: other comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to document it anyway, I think I'd rather just document that it will convert thenables to native promises rather than documenting that it doesn't support thenables at all.
| } | ||
|
|
||
| const callback = ArrayPrototypeAt(args, position); | ||
| ArrayPrototypeSplice(args, position, 1, wrappedCallback); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check that we really replace a function here
| // Bind a store with transformation of published data | ||
| const store2 = new AsyncLocalStorage(); | ||
| channel.bindStore(store2, common.mustCall((data) => { | ||
| assert.deepStrictEqual(data, inputs[n]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert.deepStrictEqual(data, inputs[n]); | |
| assert.strictEqual(data, inputs[n]); |
I guess this should be the same object instance. Similar a few lines below.
|
|
||
| channel.runStores(inputs[n], common.mustCall(function(a, b) { | ||
| // Verify this and argument forwarding | ||
| assert.deepStrictEqual(this, thisArg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert.deepStrictEqual(this, thisArg); | |
| assert.strictEqual(this, thisArg); |
| assert.strictEqual(channel.start.hasSubscribers, false); | ||
| channel.subscribe(handlers); | ||
| assert.strictEqual(channel.start.hasSubscribers, true); | ||
| channel.traceSync(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe assert return value and passes arguments
| const input = { foo: 'bar' }; | ||
|
|
||
| function check(found) { | ||
| assert.deepStrictEqual(found, input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert.deepStrictEqual(found, input); | |
| assert.strictEqual(found, input); |
similar in other tests
|
A bit confusing here is the reuse of channels as elements of For channels dedicated created by |


This adds a helper to present tracing functionality through a group of channels and a shared context object. The shared context object can be used to communicate meta information about the action being traced.
No effort is made to link traces together, this only provides the basics to express a span for a single sync or async task. It's left up to the user to track and link span data through something like AsyncLocalStorage.
Similar to #44894, I'm starting this as a draft and skipping docs for the moment to get feedback on the API design. If we settle on this design satisfying our needs I'll write up some proper docs for it.
cc @nodejs/diagnostics