Https imports #36328
Conversation
@nodejs/modules-active-members I couldn't find a sane way to refactor the […]

Interesting, would there be a way to statically know what an app uses? Right now Sqreen parses the content of the diverse […]

@vdeturckheim due to […]

@bmeck true, I often kind of think that […]
This may be a new motivation to revive #35524. I think that should fix the issue (but it's potentially disruptive).
|
I think this is really cool, and it would be great if this loader (optionally?) cached resources (according to HTTP headers, like browsers do). I think this is what you mean by "Offline cache" :]
|
@jkrems an alternative is to have a resource cache in our default loader rather than doing everything at request time; we can drop a resource once its ref count reaches 0, or based on some other metric.
|
I do not agree with having this enabled by default. It's really unsafe to do, and it is detrimental to the developer experience. (I do not have much time to explain why it's completely unsafe to do this; I'll try to write a longer write-up later if needed.)
|
@mcollina I'm not entirely opposed to flagging this (such as requiring it to be in a policy file listing), but if that just leads to users consistently enabling it by default, it would be good to understand why HTTPS (not HTTP) is insecure from your perspective so we can mitigate any issues.
|
@mcollina... as this is still a draft and being actively discussed, I'm not sure the "-1"/Request Changes is helpful yet. @bmeck ... the security concerns on this really come down to the general risk of running untrusted code downloaded over arbitrary internet connections. For instance, I could fairly easily write a script that returns one bit of javascript during development but replaces that with a malicious script when accessed from a production server, without the developer ever knowing that there's been a change. Putting this behind a flag for now while we figure out the threat model and various mitigations definitely makes sense and would be the right thing to do.
|
@jasnell I've started a security model document, though due to the lack of safety in using core's HTTPS a lot will inevitably be labeled as out of scope of this feature and must be secured by other means like policies. I would note that event-stream did the same kind of replacement workflow on files locally, and I do not consider this attack vector novel to adding HTTPS. Node's mitigation for that attack surface is to use a policy, and the same would be true here.
|
It is insecure because there is no signature of the original file. Anybody who can take control of a domain name can take control of a server. As a community we have had some significant issues regarding security, from left-pad to event-stream. Adding this functionality would make us 10x more vulnerable to these kinds of attack because there would be no central entity that could intervene.
|
@mcollina can you clarify how policies aren't that intervention/mitigation?
The plan at the top highlights the fact that this will be enabled by default, and that it needs consensus to let users disable it. I do not agree with that approach, hence my comment. I'm happy to change my mind if it can be made safe to do so; however, I do not think it is possible at all to make this safe.
|
@bmeck .. for discussion.. from your checklist:
Initially I would mark this explicitly opt-in. Assuming we flip that default later, there should definitely be a way to explicitly opt-out on the process level.
Definitely a much larger discussion and one that we need to have with the package manager folks at the table. Not only would we need to figure out the caching semantics, we need to figure out the semantics around what happens if a require('https...') script does a require('some-local-script'). What if those things end up being different versions? I think what we need to do is take a moment to draw up a list of What Ifs and see if we can easily answer those.
I'm going to sound like a broken record, but as I've been saying for years now, we 100% need this. Adding https imports would only make that need more pronounced.
Blocklist / Allow list of origins at the very least. Cookies are a much more difficult matter. My knee jerk reaction is that we should not support cookies.
This I'm less convinced about and need to think through more. My knee jerk reaction is that not supporting redirects in some way is very anti-web and therefore a bad thing but that's the Standards Wonk side of me speaking. Will stew on this one.
I think I may be able to take this over for you but it would realistically be a 2021 project.
I think we can safely just say no to http modules.
This mechanism should allow for local aliasing of https modules anyway, making it so that even if require('https...') is used, if there is a local alias established for it, the local one is used. That should address at least the immediate issue. However, the point here still stands: we'll need to provide observability into what is happening during load. We do have keylog support already built in to core. We'll want to make sure we have command-line options to allow keylogging for Node.js' own module loader connections.
|
@mcollina we need to understand why you consider it unsafe given current features and mitigations for these workflows that already exist. I don't think a blanket statement that it is unsafe, and the implication that it cannot be safe, is helpful.
|
@bmeck ... I'm going to take a bit more time to dig through your write-ups and code and think it through before responding much more, as I don't want to just churn the conversation here. Hopefully others will do the same. I'll definitely take on the MIME module work tho if that would make things easier for you.
|
The attack vector is the same as left-pad and event-stream. There are plenty of write-ups around on what happened. The mitigations that npm put in place are not feasible without a central/federated authority or strong cryptography. I don't have time right now for a long write-up; I'll try to get back as soon as I can. What would make this safe is giving authors a way to cryptographically sign the published files, and letting developers validate those files against said keys, which are fetched off-band. Overall I would recommend developing this as a module/loader and then proposing its addition to Node.js.
I don't understand this. I am vehemently against putting code signing on the developer if it is done improperly, and I have met over the years with a few certificate authorities about the complexity of doing this. This reduces down to the same workflow as TLS, where you pull the keys off the trusted CA (which is out of band by nature) and then verify. What advantage do we get here? Even if they upload a personal key, such as with PGP, the revocation mechanism and hijacking or repurposing under a different key are all still possible.
What is "this"? I'd note that a variety of the mitigations are not possible in userland loaders (such as policy integration), and loaders have repeatedly been stalled by Node's process, so I doubt they will move forward any time soon. I spent a lot of time on loaders trying to move them forward, and I don't think they are worth the pushback from the consensus model we use in Node core.
|
Hey all, so I brought up some points recently on Twitter related to this proposal, so I thought I'd re-articulate them here. Keep in mind I've never contributed to Node.js; this is just my opinion as a long-time user and a contributor to Deno.

TL;DR: Importing from npm isn't inherently more secure than importing from a raw URL, so let's work to make the whole ecosystem more secure, regardless of how packages are imported.

URL Imports Aren't Inherently Less Secure

First I want to address the biggest concern that always comes up when talking about URL imports: "It's insecure." The argument is usually summed up similarly to what @jasnell said earlier in this thread: "I could fairly easily write a script that returns one bit of javascript during development but replaces that with a malicious script when accessed from a production server, [...]" It's true, but it implies that the current way of importing is somehow more secure, and that it is more secure by virtue of not explicitly using a URL. However, packages hosted on npm are vulnerable to this problem as well. Not even a month ago, there was another report of malicious code being distributed in an npm package. We've become accustomed to assuming that central registries are somehow more secure, but it's simply not the case. Packages are still fetched from a remote source, and malicious code can still make its way into npm. Once we accept that reality, URL imports become much less scary (or central registries become much more scary, depending on how you look at it). The concerns raised here are all very valid; we just have to remember that they are issues that also affect the current import model, and so I don't think it's fair to block this proposal on the grounds that it's "less secure". I would even argue that URL imports have the potential to be more secure than traditional imports through npm, because they leverage two important aspects of the web: authority and visibility.
It gives end-users the possibility to verify that they are importing from a trusted domain name, and that that domain name has a valid SSL certificate verifying that it is indeed that domain. It also allows verifying the whois information associated with that domain, potentially being much more explicit about when a domain or a package changes ownership. (As an aside: wouldn't you rather import express directly from https://expressjs.org than https://npmjs.org?) Those are all things that end users can implement themselves and include in their CI/CD pipelines and compliance tools. They've been the backbone of the internet for a long time. Browsers trust them. Users trust them. Why couldn't we?

Feedback On The Concerns Above

The other part of @jasnell's comment, "[...] without the developer ever knowing that there's been a change.", is very important; I think if URL imports are to go forward, they should be restricted to tagged versions, no "latest" allowed. deno.land/x uses the […] Local caching and interop with […] Similarly, I agree with @jasnell with regards to cookies support; I don't think it makes much sense here. I'm less certain about redirects; they can be helpful for resolving versions when using tags like […] I don't see a reason not to include an allow/block function for specific origins, though I think we should also add a recommendation in the docs for operators to include those in firewall rules (hey, if we're gonna take advantage of web mechanisms, let's go all the way!) Now for a big one: integrity checks. Honestly, I don't know how much we can do there. Ultimately we can include checksum verification to make sure that the content the user receives is the same one the server said it was sending and that it wasn't tampered with. Beyond that, I don't think there's much else that can be done. As I said earlier, it's already possible for legitimate users to upload malicious code to a legitimate platform. The best anyone can do here is certify that the content wasn't messed with by a man-in-the-middle.
They can be, but the way the web is specified leads to issues if you do. Let us imagine:
On the web […]
|
Per the integrity checks that many people keep bringing up, please read https://nodejs.org/dist/latest/docs/api/policy.html, as policies already cover most of these concerns. However, like @wperron is bringing up, they assert integrity, not whether the resource is hostile.
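For readers who haven't used policies: an experimental policy manifest pins an integrity digest per resource, roughly like the sketch below. The digest values here are placeholders, and whether HTTPS URLs may appear as resource keys under this PR is my assumption, not something settled in this thread.

```json
{
  "resources": {
    "./app.js": {
      "integrity": "sha384-<base64 digest of app.js, placeholder>"
    },
    "https://example.com/foo.js": {
      "integrity": "sha384-<base64 digest of foo.js, placeholder>"
    }
  }
}
```

A manifest like this is loaded with `node --experimental-policy=policy.json app.js`; as the comment above notes, it asserts the bytes are unchanged, not that they are benign.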
|
@ExE-Boss that isn't proposed because it can't handle cycles; use a policy in node or […]
|
@bmeck But a local […] In this case, I see it more like: <script type="module" src="https://example.org/foo.js" integrity="sha512-..."></script>
I'd be far more cautious here. Deno is nice, but it's still very young, and any security mitigations it may have implemented are far from battle-tested. Anecdotally it is a good story, but we should be careful with claims that it has "addressed" all of the concerns.
Are there docs on its mitigations anywhere?
|
Regarding the point that central registries are just as big of a nightmare for security as random URLs: that may be true, but at least with a central registry you have a higher level of management, mitigation, and various other things. Whereas with […]
|
@MylesBorins had a nice idea and writeup in WICG about out-of-band integrity checks. IIRC @ExE-Boss in-band integrity checks (in the module itself, as part of its metadata) are very problematic because a single module can mess with the integrity of the whole system very easily.
|
Kind of playing devil's advocate here but:
That's great, and why we need policies/lock-files. The other side of the coin is that with NPM/central registries you don't bundle dependencies with the code - so you can depend on […] Note that the NPM registry is public and we have no control over who publishes what to NPM - so stuff like "the domain changing" or "the file 404ing" just translates to "the user of a dependency of your dependency publishes a patch version with malicious code". Also fwiw, we can 100% enforce these things (have the cache automatically convert to a lock file that automatically asserts the domain certificate didn't transfer, etc.).
|
Also, as one of the few people actually writing Deno code here (I think?) - Deno's modules don't really work very well (yet?) in certain cases. I think this feature is really nice for prototyping and writing code with different guarantees, and is still a net positive. This is very easy to do without URL imports, as others have mentioned, by importing a data URL + an HTTP request. I am happy to loop in the Deno people (who btw can all be found in the #dev channel in the Deno discord) and ask them for feedback on what doesn't work well yet, or for advice. They're nice people and will likely help us.
|
@benjamingr it seems like for prototyping, the ability to have https imports inside node_modules isn't important, only in the top-level app. thoughts? |
|
@ljharb sorry, missed that - deno has no concept of "node_modules" - Edit: See discussion below; as William has pointed out, this is in fact possible in Deno and my previous assertion was false - I just happened to never have used it in my own code. So for me top level is enough, but it's definitely used in a nested way in Deno. As explained above, integrity checking is done through the lock file: https://deno.land/manual/linking_to_external_code/integrity_checking.
|
@wperron Deno can definitely have a deep dependency graph, and dependencies can have their own dependencies and so on and so forth. The whole graph is downloaded and cached locally in the global cache folder only once before running, and will run off the cache in subsequent runs, much like […]
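For reference, the integrity checking linked above is driven by a lock file that, at the time of this thread, was a flat JSON map from module URL to SHA-256 digest, along these lines (URLs and digests are invented for illustration):

```json
{
  "https://deno.land/std@0.79.0/http/server.ts": "<sha256 hex digest, illustrative>",
  "https://deno.land/std@0.79.0/async/mod.ts": "<sha256 hex digest, illustrative>"
}
```

My understanding is that such a file is written with `deno cache --lock=lock.json --lock-write deps.ts` and then enforced on later runs with `--lock=lock.json`; see the linked manual page for the authoritative details.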
|
@wperron I guess that's possible; I've always written Deno code using: […]
I never considered remote imports downloading their own remote imports and not being bundled, so much so that, after using it for a while, I didn't even realize that works until I read your message (though I believe you that it does - why wouldn't it? It's just another module, and those can load modules from the web). Have you seen Deno code bases that do this with https imports (load something from https that loads other things from https instead of bundling them)?
|
@benjamingr I don't want to hog the conversation with Deno-specifics, but just as an example, the std lib in deno has interdependencies with itself in a couple of places, and the server framework Oak also imports directly from a bunch of urls which have their own dependencies; just try […]
|
Just to clarify, I think having a lot of voices about what Deno does/does not guarantee is going to be a large part of this conversation, and so a threaded format might be better. I've made a discussion in the hopes that we get a better interaction experience, since there are many topics going on here: #36430
|
I've thought a lot about this in the last week or so, and I think we should do it. Given a few things that have been mentioned above: […]

presumably also, no filesystem imports of any kind from https sources?
Yes, exactly.
|
In general, cross-origin fetching is already well limited by fetch behavior (https://fetch.spec.whatwg.org/#main-fetch); even if we don't follow the browser spec, the compatibility there is ideal.
Not sure what this means.
The application => node_modules boundary isn't currently super clear; I think that would need to be ironed out a bit more. Doing this creates a privileged scope that would need a model. I also don't think this is a very valuable thing to have given other discussion above. Claims of needing to handle reliability are fully in the application runner's hands and can be mitigated as above. The application scope does not have a special and different means of mitigating the same issues.
In general, cross-origin stuff needs to be done carefully. The web bans […]
This integrity recording is generally an issue not unique to HTTPS, as proven by a decent number of npm incidents involving modified files on disk or newly downloaded files.
|
So just to be clear - everyone is mostly in favour of doing this, and we're trying to figure out how to do it safely and securely?
|
@benjamingr I think there is also the question of whether implementing the measures needed to make this secure and robust is worth the complexity.
|
@benjamingr personally i don’t think this is a good feature whatsoever and i don’t think it should be possible in node core, but i don’t have standing to block, nor would i merely based on my personal opinion. if we do land it, i hope we find ways to avoid footguns and make it safe by default, even if that adds ergonomic friction.
|
I don't think ergonomic friction with reasons is something to be avoided, but we need to be very clear about what we gain from having things. If the ergonomic friction exists only for the sake of friction, that isn't serving our users. Preventing accidents can lead to a similar situation, sure, but preventing usage at all seems to be the desire in some comments above.
That is one thing I like about deno modules: they are sandboxed and require permission for using fs and net, so those third-party packages are somewhat safe and you need to give them access. Kind of how the browser permission API is for the web (but for deno modules). I wish all node/npm modules were like this as well. If I want to include, say, lodash, then I would never ever want it to have anything to do with http(s), sockets, net, fs or process.exit() whatsoever. I wish I could disable http, fs and process for any third-party module in node, and give only expressjs access to http, on a per-module basis. One thing I dislike about npm modules, though, is post-install scripts that run code on your machine. I think that can be somewhat scary. I also wish browsers followed suit and forbade e.g. jquery loaded from a CDN from using xhr, fetch, indexeddb, localstorage; it should only have access to the DOM.
|
@jimmywarting please see https://nodejs.org/api/policy.html for such censoring at a package level for now in Node. https://github.com/bmeck/local-fs-https-imports uses that to ensure the local copies of HTTPS resources can't access things, per the discussion of this PR. That discussion is likely where you should state any specific concerns about defaults for HTTPS: #36430


This allows for HTTPS (not HTTP) imports to work with the ESM module loader. It is NOT READY. This PR is meant to allow for discussion and to show some refactoring issues going on.
Concerns
Documentation
Due to the scope of this we need more in-depth written documents of the impacts this PR has prior to landing.
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes