Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "Colocation" #602

Closed
2 tasks
i-am-the-slime opened this issue Oct 31, 2022 · 19 comments
Closed
2 tasks

Support "Colocation" #602

i-am-the-slime opened this issue Oct 31, 2022 · 19 comments

Comments

@i-am-the-slime
Copy link
Contributor

i-am-the-slime commented Oct 31, 2022

I am a big fan of colocation.

Example

In my specific case this implies I can keep a `Spec.purs` and a `Story.purs` as descendents of `/src/`.

For example, I would like to be able to structure my project similarly to this

$> tree src
.
├── Yoga
│   ├── Block
│   │   ├── Atom
│   │   │   ├── Button
│   │   │   │   ├── Story.purs
│   │   │   │   ├── Style.purs
│   │   │   │   ├── Types.purs
│   │   │   │   └── View.purs
│   │   │   ├── Button.purs
│   │   ├── Molecule
│   │   │   ├── Sidebar
│   │   │   │   ├── Spec.purs
│   │   │   │   ├── Story.purs
│   │   │   │   ├── Style.purs
│   │   │   │   └── View.purs
│   │   │   ├── Sidebar.purs

Advantages

I can see quickly which files have a corresponding test (Sidebar has one, but Atom doesn't). Both have a Story.

I do not need to keep the module structure between test and src.
Thus, I can easily delete or move all files relating to some functionality.


From what I can tell, this is necessary:

  • 1. Allow excluding files from the package (e.g., *Spec.purs)
  • 2. Allow building what's in src with more dependencies than what's specified in spago.yaml

Possible solution to 1:

Quoting @f-f on Discord:

Yeah there are no globs to be specified in the new spago
Everything is by convention: sources are in src and tests in test
Oh actually I think this is possible
We have a package.files key that you can use to specify the files to publish

@i-am-the-slime
Copy link
Contributor Author

Here's my current understanding:

spago-next as well as the registry work on a hardcoded src directory:

Example 1 (registry-dev/App/API.purs):

  Log.debug $ "Package downloaded to " <> packageDirectory <> ", verifying it contains a src directory with valid modules..."
 Internal.Path.readPursFiles (Path.concat [ packageDirectory, "src" ]) >>= case _ of

And Example 2 (spago/spaghetto/Spago/Command/Publish.purs):

 unlessM (Operation.Validation.containsPursFile (Path.concat [ selected.path, "src" ])) $ addError $ toDoc
   [ "Your package has no .purs files in the src directory. "
   , "All package sources must be in the `src` directory, with any additional "
   , "sources indicated by the `files` key in your manifest."
   ]

Interestingly, the error message here already hints at a planned feature of supporting the files property in the Manifest which isn't yet implemented.
This property is currently typed as:
https://github.com/purescript/registry-dev/blob/master/lib/src/Manifest.purs#L55

  , files :: Maybe (NonEmptyArray NonEmptyString)

So my first idea about how to support colocation was to implement the handling of the files property from the manifest. In order for that to be viable, they'd have to be globs.
In order to exclude specific files ending in Spec.purs for example, this would become quite a complex glob if it needs to be formulated in an "include only these files" fashion.
The only user-friendly option for the use case outlined in this ticket would then be to have an excludeFiles option instead.
The advantage would also be that we could use a PureScript library for the simple setting:

excludeFiles:
  - /src/**/*Spec.purs

However after using the Github search I found this particular issue which has a broken link to this short discussion thread.
This means there has since been a decision that users should only be allowed to specify entire directories instead of their own globs.

When building the project, these directories will be internally expanded to be ${dirname}/**/*.purs
When publishing the package, the content of these directories (minus some hardcoded excluded files/folders) gets packaged up and uploaded to the registry.

Because I have a shallow view on any related other features I will propose the simplest solution that fits my use case that I can think of:

Add a key called:

excludeFromPublishedPackage (or a more catchy name)

Which supports only simple globs with ** and *.

Implementation-wise this would mean extending this function to not only exclude the hardcoded globs but also the user-specified ones.

One downside is that in conjunction with files this could mean that there'd need to be an order (exclude or include first?).
Since the current implementation excludes last in order to ensure nothing slips past, I'd go with that behaviour too and potentially make the manifest property have an even longer name like: finallyExcludeFromPublishedPackage.

Please let me know your input.

@f-f
Copy link
Member

f-f commented May 5, 2023

Thanks @i-am-the-slime for looking into this!
I will move this to the Registry repo, as this is ultimately where the fix will go.

@f-f f-f transferred this issue from purescript/spago May 5, 2023
@f-f
Copy link
Member

f-f commented May 5, 2023

cc @thomashoneyman: we discussed this briefly when it first came up, but I don't recall the specifics of which direction we wanted to take

@thomashoneyman
Copy link
Member

I can take a thorough read later on, but a quick clarification: the registry already supports the “files” key in a manifest, so there is already an implementation we can tweak as needed.

@i-am-the-slime
Copy link
Contributor Author

I'd be happy about any feedback here. Thus, a little ping.

@f-f
Copy link
Member

f-f commented Jun 2, 2023

Right now src and test files are entirely segregated, which helps to keep things tidy when it comes to things like what we consider a package, what its dependencies are, how to run its tests, and so on.

We could support this in the registry by adding a way to exclude things from the src directory, but this has downstream consequences for package managers, e.g. collecting test dependencies and figuring out what to run for "tests" would not be so straightforward anymore.

All in all I am not sure it'd be worth adding these complications to provide direct support for this workflow that, while interesting, doesn't seem very widespread.

@i-am-the-slime
Copy link
Contributor Author

Right now src and test files are entirely segregated, which helps to keep things tidy when it comes to things like what we consider a package, what its dependencies are, how to run its tests, and so on.

It is simpler, that's for sure. However, it's a convention on the top-level directory names. There could just as well be a convention on the source file names which would only be marginally more complicated to support (e.g. *_Spec.purs, *_Test.purs, *___.purs).

We could support this in the registry by adding a way to exclude things from the src directory, but this has downstream consequences for package managers, e.g. collecting test dependencies and figuring out what to run for "tests" would not be so straightforward anymore.

I strongly feel this feature is worth the added complexity to spago. I really think it would make things straightforward for people. I am, however, fully in favour of not delaying the release of the registry in any way, merely hoping for a design that does not preclude adding this feature later on.

All in all I am not sure it'd be worth adding these complications to provide direct support for this workflow that, while interesting, doesn't seem very widespread.

What makes you say that? My googling has given me lots of proponents of this in various languages. The term "colocation" seems to be mostly used in the JS/TS worlds but I've found examples in various languages with the more modern ones even opting for support for tests within the source file itself.

@thomashoneyman
Copy link
Member

I am generally in favor of colocation myself. In the context of this discussion we need to separate spago and the registry, as the registry is meant to support multiple package managers (including those that don't have a native understanding of colocation).

For that reason we need to come up with a scheme that would allow the registry to take the following actions without using Spago:

  1. Compile the library source code with only library dependencies available
  2. Publish the library documentation to Pursuit, without publishing docs for test files

We've so far followed NPM's lead in using the files key to explicitly include specific files:
https://docs.npmjs.com/cli/v9/configuring-npm/package-json#files

Although, since Pursuit and some purs CLI functionality requires the presence of a src directory containing .purs files, we also always include your src directory. In order for us to be able to support colocation we would need to allow you to exclude files from your src directory.

The challenge then is how to make this discoverable for users; after all, if you just go ahead and colocate your test files and try to publish your library and see a sudden failure, you may have to do some digging to figure out you have to manually exclude files ending in _test, for example.

If we want to make colocation first class in the registry in that it already understands to exclude any .purs file that ends in e.g. .test.purs then that would take a more formal discussion to figure out something we want to support long-term.

@MonoidMusician
Copy link
Contributor

I think there are nice benefits to colocation, but what I don't like about a suffix like .test.purs is that you can't design a glob match to exclude them and still pick up the rest of the .purs files. Which makes users buy into some tool other than globs to choose which files to compile.

Also keep in mind that the reason the compiler itself supports source globs is that it's easy to reach shell/OS limits on argument length if you try to pass each file you want to compile individually.

@i-am-the-slime
Copy link
Contributor Author

I think I'm missing some bigger picture knowledge here about potential package managers and what they can't do or how they're hard to maintain. I'll try to present how cheaply I think we could get this feature and maybe you can give me an example of where this will lead to a problem:

  1. Right now there's an implicit glob that resolves to publishing everything inside src ending in .purs.
  2. Then there's files which allows to add files to this.

So what happens is that we build an Array of files to publish.

I suggest we add a third and final step where you can specify an optional glob to exclude files from this array that we've built, e.g. something like:

newGlobs today'sFileArray = do 
  excludedFiles <- expandGlobsCwd finallyExcludeGlobs
  pure $ Array.filter (notIn excludedFiles) today'sFileArray

This is the change that I see would need to happen wherever we want to support this.
Yes, it increases the API surface, but for a good cause (supporting colocation)

@thomashoneyman

The challenge then is how to make this discoverable for users; after all, if you just go ahead and colocate your test files and try to publish your library and see a sudden failure, you may have to do some digging to figure out you have to manually exclude files ending in _test, for example.

I actually think this is another argument in favour of supporting colocation. With the current design there's no support for colocation yet, I can still colocate test files in my package, they'll just get distributed alongside my regular files. Somebody will notice eventually, and I'll have to change my project's setup to exclude them for a newer version of my library instead of adding some exclude glob.

@MonoidMusician

I think there are nice benefits to colocation, but what I don't like about a suffix like .test.purs is that you can't design a glob match to exclude them and still pick up the rest of the .purs files. Which makes users buy into some tool other than globs to choose which files to compile.

Can you clarify for me where this is an actual problem? Are you saying that there are package managers that can't perform the code I showed above? Or that there's some advantage to being able to somehow only pick files from bash or something? I guess it would be harder to only pass the files that shall be published to purs, but I don't think that's the use case, it's actually desirable to compile.

@MonoidMusician
Copy link
Contributor

If you're have a local dependency with colocations, spago (or whatever) is going to have to pass to the compiler exactly the files to include from that dependency (since it can't pass a glob for them – the package and test files both exist locally). It can't pass all of this dependency's files because its test dependencies may not be installed in the project you are depending on it from. If you do this for a lot of dependencies you will exceed the OS limit of argument lengths as I said.

@i-am-the-slime
Copy link
Contributor Author

@MonoidMusician Thanks for the reply, now even I understand it!
Two points:

  1. Isn't this already a (probably theoretical) problem then with the current design that allows an arbitrary files array?
  2. If we added support to the compiler for the same exclude-globs then the build tools could pass them verbatim, right? So not an unsurmountable problem for the precious colocation support

@thomashoneyman
Copy link
Member

I think that changing the manifest from having a files key to having a pair of includes and excludes (or includeFiles and excludeFiles) keys would be fine.

We don't have to change the globs used for compilation in the case of excludes because we'd simply not copy those files over to the build directory for the package:

-- | We always ignore some files and directories when packaging a tarball, such
-- | as common version control directories, even if a user has explicitly opted
-- | in to those files with the 'files' key.
-- |
-- | See also:
-- | https://docs.npmjs.com/cli/v8/configuring-npm/package-json#files
removeIgnoredTarballFiles :: forall m. MonadAff m => FilePath -> m Unit
removeIgnoredTarballFiles path = do
globMatches <- FastGlob.match' path ignoredGlobs { caseSensitive: false }
for_ (ignoredDirectories <> ignoredFiles <> globMatches.succeeded) \match ->
FS.Extra.remove (Path.concat [ path, match ])

We won't even have to regenerate the manifest index in the case we change these key names because no packages in the registry are currently using the files key (cc: @f-f). We can just change the format without consequence at this stage.

The important thing is to guarantee that you cannot exclude files via the excludes that we require (for example, you can't exclude a spago.yaml file), and I'd like to make sure that is tested in any implementation we have. We have some similar tests here that could be expanded on:

copySourceFiles :: Spec.Spec Unit
copySourceFiles = Spec.hoistSpec identity (\_ -> Assert.Run.runTest) $ Spec.before runBefore do

@i-am-the-slime
Copy link
Contributor Author

Thank you for your reply! I'm trying to find the list of files that minimally need to be there. It seems like there should be a check for this independently of exclude files (in case there just isn't a spago.yaml, or can that not happen because the code wouldn't even get that far in the first place?

-- | We always include some files and directories when packaging a tarball, in
-- | addition to files users opt-in to with the 'files' key.
includedGlobs :: Array String
includedGlobs =
  [ "src/"
  , "purs.json"
  , "spago.dhall"
  , "packages.dhall"
  , "bower.json"
  , "package.json"
  , "spago.yaml"
  ]

I'm thinking there probably needs to be a spago.yaml at least but for the other ones they seem to be optional to me. Or does there need to be a spago.yaml OR a (spago.dhall AND package.dhall)?

In short, I'm happy to add the required tests but I need to know what these minimum files are.
I think we should just force include those and potentially warn the user that their glob tries to exclude them, rather than error out if a glob removes the required files.
What do you think?

@thomashoneyman
Copy link
Member

The list of always-included and always-excluded files is listed here:

https://github.com/purescript/registry-dev/blob/master/SPEC.md#51-publish-a-package

I think you’re right, the registry could simply warn that you are excluding an always-included file (and therefore it will be included anyway), but ideally Spago would error out.

I think we should put the lists of always-included and always-excluded files into the Registry.Lib.Operation.Validation module, which is shared by Spago

@i-am-the-slime
Copy link
Contributor Author

@thomashoneyman I pushed some changes that hopefully address this in my PR. Would be cool if you could tell me if it's the right thing/direction.

@thomashoneyman
Copy link
Member

Is this complete (as seen in #613)? cc: @i-am-the-slime

@f-f
Copy link
Member

f-f commented Sep 14, 2023

We are missing the Spago implementation to shake out any issues with how it's implemented here in the registry, but it should be otherwise done.

@thomashoneyman
Copy link
Member

I think that’s tracked by #594, so I’d like to close this one if that’s ok with y’all @i-am-the-slime @f-f

@f-f f-f closed this as completed Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants