X Tutup
The Wayback Machine - https://web.archive.org/web/20201110145900/https://github.com/github/markup/pull/1196
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for manpages via mandoc #1196

Closed
wants to merge 22 commits into from
Closed

Add support for manpages via mandoc #1196

wants to merge 22 commits into from

Conversation

@jordemort
Copy link
Contributor

@jordemort jordemort commented May 8, 2018

mandoc is small and simple enough that we decided not to sandbox it

In support of github/enterprise2#14408

cc @anth1y @dctucker

lib/github/markups.rb Outdated Show resolved Hide resolved
lib/github/commands/man2html Outdated Show resolved Hide resolved
@jordemort
Copy link
Contributor Author

@jordemort jordemort commented May 29, 2018

Hey @kivikakk - what do you think of this? With some additional polish, would it be considered for merging?

@kivikakk
Copy link
Contributor

@kivikakk kivikakk commented May 30, 2018

👋 My responsibilities have shifted far away from linguist and markup this year; @bkeepers and @lildude are probably better people to ask.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 16, 2018

I'm so grateful this is being considered. Seriously.

I'm not sure if that's a separate language / sub language / same syntax highlighting or different from a linguist perspective; from a cursory glance, linguist seems to have no existing knowledge of it.

It's the same language. Authors don't typically use .mdoc as an extension because the section number is conventionally used instead. For the same reason, .man doesn't see much use as an extension either.

However, there is enough in-the-wild usage of the .mdoc extension on GitHub, so I'll submit a PR later to have it recognised by Linguist.

As for syntax highlighting, well, you don't have anything to worry about. 😉

.Sh SYNOPSIS
.Nm grep
.Bk -words
.Op Fl A Ar num
.Op Fl C Ns Op Ar num
.Op Fl Fl binary-files Ns = Ns Ar value
.Op Fl Fl color Ns Op = Ns Ar when

Yes, you should update regex to support also .tmac, .roff ideally also .1.in, .2.in, .3.in, ... , .8.in

For .roff and .?.in, yes. For .tmac, no. Those files are supposed to be used like includes, so they'll generate little to no output on their own. Attempting to render .tmac files on GitHub will result in lots of blank pages.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Aug 16, 2018

There has been another release of mandoc with further improvements to its HTML output since we started on this, so we'll probably want to build that and revisit the stylesheet if we pick this back up.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 16, 2018

with further improvements to its HTML output

Yeah, that'd be my doing. Sorry about that. 😜

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Aug 16, 2018

Sorry about that. 😜

@Alhadis No apologies necessary, especially if the changes mean we can toss the XSLT in the trash ;)

@pali
Copy link
Contributor

@pali pali commented Aug 16, 2018

For .roff and .?.in, yes. For .tmac, no. Those files are supposed to be used like includes, so they'll generate little to no output on their own. Attempting to render .tmac files on GitHub will result in lots of
blank pages.

Yea, that is truth, *.tmac is just file with troff macros, so it really should not be rendered as real manpage. So for manpages, at least .man, .man.in, .[0-9] and .[0-9].in are needed.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 17, 2018

Support for the .mdoc extension landed in github/linguist@55bf480.

Changes will become live on GitHub after the next release of Linguist.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 20, 2018

Question: How will eqn(1) markup be handled in rendered output? Its use in manpages is uncommon, but mandoc(1) still supports it using MathML.

However, MathML support is far from universal, with only Firefox having full support. A client-side library like MathJax would therefore be needed to guarantee consistent handling across browsers. This might complicate things unless MathJax is already in use as a production dependency.

In any case, eqn(1) needs consideration, because documents marked-up with equations will almost certainly lose meaning if the eqn blocks are stripped or imprecisely rendered.

Example:

Input:

.Dt Document title
.Dd 2018-08-20
.Ss SYNOPSIS
.EQ
Z ~=~~ U over V
 ~~=~~ {a + b X} over {c + d X}
.EN

Output:

Figure 1
<math class="eqn">
	<mrow><mi>Z</mi><mo>=</mo><mfrac><mi>U</mi><mi>V</mi></mfrac><mo>=</mo><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi><mi>X</mi></mrow><mrow><mi>c</mi><mo>+</mo><mi>d</mi><mi>X</mi></mrow></mfrac></mrow>
</math>
@pali
Copy link
Contributor

@pali pali commented Aug 20, 2018

Question: How will eqn(1) markup be handled in rendered output?

It is really uncommon and I think this can be postponed after basic support will be there.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 20, 2018

That works for me then. 👍

@pali
Copy link
Contributor

@pali pali commented Aug 28, 2018

@Alhadis: and file extensions .man.in, .mdoc.in and .1.in, .2.in, .3.in, .4.in, .5.in, .6.in, .7.in, .8.in, .9.in would be added too?

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 28, 2018

I'm not sure. I think so, but care would be needed to expand .so requests gracefully so inaccessible or non-existent resources don't mangle output.

I may be a little apprehensive because of the widespread misclassification of files using .[1-9] or .in file extensions, and I don't really have much familiarity with GNU Autotools. All things considered, I'm probably not the best person to ask concerning the suitability of rendering .in documents.

@RalphCorderoy Could you add your thoughts here?

@pali
Copy link
Contributor

@pali pali commented Aug 28, 2018

@Alhadis: about .in files, basically you should strip .in suffix and then look at extension. So something.xyz.in has extension .xyz. In GNU autotools, files with .in means as input for processing. Autotools take something.xyz.in file and generates from it something.xyz by replacing autotools text macros (expand @something@ in it). Most common are file.h.in (as C header file) or man.1.in (as manpage). But any text file can be used.

@Alhadis
Copy link

@Alhadis Alhadis commented Aug 28, 2018

Ah, alright. :) That's what I thought, but I didn't want to assert an assumption.

Cheers!

@pali
Copy link
Contributor

@pali pali commented Oct 19, 2018

Hi! Any progress on this pull request? It is still marked as WIP and lot of people would love to see support for rendering manpages.

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 19, 2018

Seconding this request for a status report. I'd love to start documenting projects with a real formatting language.

Forever sick of markdown.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 23, 2018

Hi, I plan to start working again on this on Monday.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 23, 2018

Here's my plan for this:

  • Alter the regex to be something like /[1-9][a-z]*(\.in)?|man|mdoc/ - need to check and see what exactly I'm matching against here, is it just going to give me in for a foo.1.in?
  • Return this particular "useless use of cat award"
  • Switch to using fragment output, see if that's fine to pass onwards without further XSLT fiddling
  • Need to find out if it's ok to just pass through/serve MathJAX unaltered for now or investigate how we do equation rendering elsewhere

Please ping me if any of those sound off

@ischwarze
Copy link

@ischwarze ischwarze commented Oct 23, 2018

All parts of your plan sound reasonable, let us know how it goes and whether you encounter any issues with mandoc. Two minor remarks:

something like /[1-9][a-z]*(\.in)?|man|mdoc/ - need to check and see what exactly I'm matching against here, is it just going to give me in for a foo.1.in?

You probably also want files with the extension "*.n", which is customary for Tcl/Tk library manuals. For example, i have these manual pages on my system:

/usr/local/lib/tcl/tcl8.6/man/mann/Tcl.n
/usr/local/lib/tcl/tcl8.6/man/mann/append.n
/usr/local/lib/tcl/tcl8.6/man/mann/argc.n
...

Need to find out if it's ok to just pass through/serve MathJAX unaltered for now or investigate how we do equation rendering elsewhere

Not sure why you mention MathJAX; mandoc does not emit MathJax (or any JavaScript whatsoever). Mandoc merely converts eqn(7) input embedded in manual pages into MathML as defined by HTML5.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 23, 2018

@ischwarze I meant MathML; what I was wondering is if we were already using MathJAX elsewhere on GitHub to solve the cross-browser rendering problems, but it appears that we are unable to do so because it ends up causing problems with some other parts of our frontend.

Right now I am considering the following options, with regard to equation support:

  • Pass the MathML through unaltered, and if the user's browser doesn't render it, it doesn't render it
  • Strip the MathML from the output and/or replace it with something like "(equation not rendered)"
  • It seems like it may be possible to convert the MathML to inline SVG, but that's going to necessitate throwing a whole lot of tooling at it
@pali
Copy link
Contributor

@pali pali commented Oct 24, 2018

I think you should really ignore MathML problem right now. Mathematic formulas are not so common in Linux manpages... So the easiest thing is to pass MathML output as is.

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 24, 2018

Agreed, just pass it through unaltered. It's Chrome's bloody fault for not bothering to support a published standard.

We can worry about fine-tuning later.

dctucker and others added 3 commits May 8, 2018
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
@kivikakk kivikakk removed their request for review Sep 25, 2019
@pali
Copy link
Contributor

@pali pali commented Dec 13, 2019

Hello, is there any progress on this feature request?

@Alhadis
Copy link

@Alhadis Alhadis commented Jan 1, 2020

I also wouldn't mind an update. @jordemort, what's the status on this PR? :)

@@ -29,7 +29,7 @@ you wish to run the library. You can also run `script/bootstrap` to fetch them a
* [.asciidoc, .adoc, .asc](http://asciidoc.org/) -- `gem install asciidoctor` (http://asciidoctor.org)
* [.pod](http://search.cpan.org/dist/perl/pod/perlpod.pod) -- `Pod::Simple::XHTML`
comes with Perl >= 5.10. Lower versions should install Pod::Simple from CPAN.

* [.mdoc](http://mandoc.bsd.lv/man/mdoc.7.html), [.man](http://mandoc.bsd.lv/man/man.7.html) -- `apt-get` or `brew install mandoc xmlstarlet` (http://mandoc.bsd.lv/ http://xmlstar.sourceforge.net/)

This comment has been minimized.

@Aboood412

Aboood412 Jan 3, 2020

Suggested change
* [.mdoc](http://mandoc.bsd.lv/man/mdoc.7.html), [.man](http://mandoc.bsd.lv/man/man.7.html) -- `apt-get` or `brew install mandoc xmlstarlet` (http://mandoc.bsd.lv/ http://xmlstar.sourceforge.net/)
* [.mdoc](http://mandoc.bsd.lv/man/mdoc.7.html), [.man](http://mandoc.bsd.lv/man/man.7.html) -- `apt-get` or `brew install mandoc xmlstarlet` (http://mandoc.bsd.lv/ http://xmlstar.sourceforge.net/)
Duplicate of #

This comment has been minimized.

@pali

pali Feb 2, 2020
Contributor

Looks like this suggested change has broken diff.

@Alhadis
Copy link

@Alhadis Alhadis commented Sep 27, 2020

@jordemort Is there anything I can do to help? I'm willing to take on any volume of work needed to have man-pages rendered on GitHub.

If this PR's blocked on something I can't fix (such as an internal discussion between GitHub staff), please let me know so I can at least stop clogging this thread with offers to help.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 1, 2020

@Alhadis I do not think there would be any internal barrier to getting this merged, if we could get it cleaned up. I definitely won't have time to work on it myself any time soon, though.

@pali
Copy link
Contributor

@pali pali commented Oct 14, 2020

@jordemort: what is needed to be done? @Alhadis already wrote that is willing to finish it. Can we move forward?

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 23, 2020

@jordemort Be on the lookout for a @mention from me in the next few days. I'll fork from your branch, so you'll need to merge my changes manually (I could submit a separate PR, if you'd prefer—whichever gets this landed sooner).

Alhadis added 7 commits Oct 23, 2020
I've tested it. It's working. I can see HTML markup printed to STDOUT by
running `bundle exec bin/github-markup /usr/share/man/man1/mandoc.1`. So
`test_each_render_has_a_name` can take its cryptic obsession with `name`
properties and shove it up its khyber.
@Alhadis
Copy link

@Alhadis Alhadis commented Oct 24, 2020

Okay, done. That was;

  • 25 minutes of coding,
  • 5 hours struggling to wrap my head around XSLT,
  • Another 4 hours debugging inexplicable test failures,
  • And me finally cracking the shits and nuking the test I couldn't fix, because everything else is working fine

Your move, @jordemort.

@pali
Copy link
Contributor

@pali pali commented Oct 25, 2020

@Alhadis: Some of tests on Travis are failing...

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 26, 2020

Some of tests on Travis are failing...

I wasn't even aware that I could trigger a build re-run by pushing a fork of a PR branch…

Anyway, two of those failures are caused by Ruby 2.1.* and Ruby 2.2.* having no support for <<~squiggly heredocs (which I've removed, even though I doubt very much that GitHub are still supporting two EOL versions of Ruby). The other two failures, though… eh, I've got no explanation. The tests are passing perfectly fine when run locally. I can only assume it's environment-related (possibly something to do with an old version of mandoc(1) being used?).

I'll keep testing, though @jordemort may need to throw me a bone here.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 26, 2020

@Alhadis I triggered them, by pushing your commits from your branch to this one - you can also trigger them yourself by opening a new PR and pushing commits to it.

I will need to get back to you re: support for older Ruby but it might be fine to drop it? We certainly don't run anything that old internally.

It looks to me like the test failures are a side effect of the HTML output of mandoc varying by version. We might need an additional safeguard here to ensure that the correct version of mandoc is available, and some additional gymnastics to ensure that that's the version installed in the test environment.

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 26, 2020

It looks to me like the test failures are a side effect of the HTML output of mandoc varying by version. We might need an additional safeguard here to ensure that the correct version of mandoc is available, and some additional gymnastics to ensure that that's the version installed in the test environment.

We have two major problems here:

  1. HTML output will inevitably vary between versions. Even if the variance is as trivial as changes to whitespace, it's not really something we can (or should) try to enforce.

  2. mandoc provides no means of interrogating its release version:

    $ mandoc --version
    mandoc: illegal option -- -
    mandoc: illegal option -- v
    mandoc: illegal option -- e
    mandoc: illegal option -- r
    usage: mandoc [-ac] [-I os=name] [-K encoding] [-mdoc | -man] [-O options]
    	      [-T output] [-W level] [file ...]
    $ mandoc -v
    mandoc: illegal option -- v
    usage: mandoc [-ac] [-I os=name] [-K encoding] [-mdoc | -man] [-O options]
    	      [-T output] [-W level] [file ...]

    From @ischwarze's earlier feedback, it seems unlikely that mandoc will introduce a --version switch, or something similar.

IMHO, a more reliable test would involve fuzzy-matching portions of a rendered HTML document.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 26, 2020

@Alhadis Yes, the tests could be fuzzier, or the XSLT could be more complex, or maybe we could just give up on the XSLT altogether and say "I'll accept whatever mandoc outputs as the body"

@ischwarze
Copy link

@ischwarze ischwarze commented Oct 26, 2020

1. HTML output will inevitably vary between versions.

True. The exact format of HTML output is expected to become more stable over time, but some changes will likely happen for quite some time: the HTML output is not yet perfect.

I'm a bit confused what you are trying to test here. It can't really be whether mandoc produces the expected HTML output, right? Because you are not testing mandoc here as far as i understand. Besides, whether mandoc produces the expected HTML output is already tested extensively in the mandoc regression suite. That seems the proper place to do such tests. Doing the tests there also implies that they are always done for the right version of mandoc, without doing any version detection, simply because the mandoc regression suite is part of the mandoc repository itself.

2. `mandoc` provides no means of interrogating its release version:

In the test suite for some other program, are you considering to provide different test suites (or at least varying tests) depending on the mandoc version? That would seem surprising to me. You want to test the other program, not mandoc, or what am i missing?

IMHO, a more reliable test would involve fuzzy-matching portions of a rendered HTML document.

That sounds reasonable. The short .Dx example you provided is less likely to change than a larger chunk of output. Alternatively, just looking for the string

<div class="manual-text">

might be even more robust. That is very likely to be contained in the HTML output of any version of mandoc for any input document, and it is not likely to change in the future.

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 26, 2020

@jordemort i've updated the tests to use Nokogiri's CSS selectors to assert the existence of <h1> and <table> elements in rendered output. I'm also puzzled as to why we're using XMLStarlet instead of Nokogiri (which would make more sense, since it's a Ruby library).

I'm a bit confused what you are trying to test here.

We're testing this library's ability to shell out to mandoc(1) and filter its -Thtml output for optimal display on GitHub.

@jordemort
Copy link
Contributor Author

@jordemort jordemort commented Oct 26, 2020

@Alhadis I think at this point you're going to be better off opening your own PR, so you don't have to wait for me to push things to the branch to get the tests to run.

Re: "why is this this way / why is this not finished" - this was a project from an internal hack week in 2018. I was not familiar with this codebase at the time; indeed, I barely knew Ruby. I do not have any ownership or authority over this repository, I do not have any time on my schedule allocated to work on it any further, and as far as I know it hasn't made it to anyone else's roadmap either. I would have really liked to see manpage support added to this gem, but I am not the person that is going to be able to make it happen.

@jordemort jordemort closed this Oct 26, 2020
@RalphCorderoy
Copy link

@RalphCorderoy RalphCorderoy commented Oct 27, 2020

Hi @Alhadis, If you do open your own PR then please state its URL here. Thanks.

@Alhadis
Copy link

@Alhadis Alhadis commented Oct 28, 2020

Already on it. You'll all be @cc'd in the new PR. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

10 participants
You can’t perform that action at this time.
X Tutup