-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update documentation on data flow in Go (and some small fixes for java) #18511
base: main
Are you sure you want to change the base?
Conversation
Mostly based on the java and C# equivalents.
Copied from the C# equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (7)
- docs/codeql/codeql-language-guides/analyzing-data-flow-in-go.rst: Language not supported
- docs/codeql/codeql-language-guides/analyzing-data-flow-in-java.rst: Language not supported
- docs/codeql/codeql-language-guides/codeql-for-go.rst: Language not supported
- docs/codeql/codeql-language-guides/modeling-data-flow-in-go-libraries.rst: Language not supported
- docs/codeql/writing-codeql-queries/about-data-flow-analysis.rst: Language not supported
- docs/codeql/writing-codeql-queries/creating-path-queries.rst: Language not supported
- go/docs/language/learn-ql/go/library-modeling-go.rst: Language not supported
Tip: Copilot code review supports C#, Go, Java, JavaScript, Markdown, Python, Ruby and TypeScript, with more languages coming soon. Learn more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit 3-5 LGTM
For the record, I was able to build the docs on my mac by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks plausible; slight tweaks
docs/codeql/codeql-language-guides/analyzing-data-flow-in-go.rst
Outdated
Show resolved
Hide resolved
Flow sources | ||
~~~~~~~~~~~~ | ||
|
||
The data flow library contains some predefined flow sources. The class ``RemoteFlowSource`` (defined in ``semmle.code.java.dataflow.FlowSources``) represents data flow sources that may be controlled by a remote user, which is useful for finding security problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we note / recommend ActiveThreatModelSource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's more advanced. This guide gives a simple approach that works. (Also, if we want to do that, we should do it separately for all the languages.)
( | ||
urlParse.hasQualifiedName("url", "Parse") or | ||
urlParse.hasQualifiedName("url", "ParseRequestURI") | ||
) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( | |
urlParse.hasQualifiedName("url", "Parse") or | |
urlParse.hasQualifiedName("url", "ParseRequestURI") | |
) and | |
urlParse.hasQualifiedName("url", ["Parse", "ParseRequestURI"]) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately didn't do this because I think it's a bit harder to understand. I think this guide should just be a simple approach that is easy to understand and which works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could mention it as a note underneath the example. "For brevity, we could also shorten ...
to ...
".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my mind, the aim of this guide is not to give the best way to write things, but to help the reader use data flow and to be as clear as possible. I think that introducing a new notation to do the same thing does not help with that, especially when it isn't very easy to see at a glance what it is doing.
Co-authored-by: Chris Smowton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, mild preference for using set notation since it shows an example of the best way to write this, but this is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tackling this and putting this together ✨
I have a few suggestions for improving the clarity of the guide, even though I appreciate some of the bits I commented on are copied from the other languages' guides.
.. code-block:: go | ||
|
||
temp := x; | ||
y := temp + ", " + temp; | ||
|
||
If ``x`` is a tainted string then ``y`` is also tainted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potential point for confusion with this example is that it may not be clear to readers whether temp := x
requires taint flow analysis or just y := ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's an odd example, with the capacity for confusion. I've changed it to just (each language's version of) y := "Hello " + x
in all the language guides where it appears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'd be surprised if we have a language where (the equivalent of) temp := x
requires taint-flow analysis, I'd be careful with just changing examples for other languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've advertised the change more widely among the language teams.
DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0))) | ||
select src | ||
|
||
Then we can make the source more specific, for example an access to a parameter. This query finds where a public parameter is passed to ``os.Open(..)``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording of this could be improved with something like this:
Then we can make the source more specific, for example an access to a parameter. This query finds where a public parameter is passed to ``os.Open(..)``: | |
To restrict sources to only parameters, rather than arbitrary expressions, we can modify this query as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly prefer the original. It has more narrative flow ("start with a simple query, then slowly add restrictions to get something more interesting"). Can you explain what you don't like about it, and how your suggestion is an improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The narrative is good, and I wouldn't change that. The intention with my suggestion was to keep it as well. However, the flow/grammar/style of the first sentence that's currently here isn't great:
- Since the reader will have looked over / thought about / possibly played with the code before this, the
Then ...
feels disconnected here as an opening to the paragraph. It implies that we are continuing some line of work, even though the code example accomplished everything the previous paragraph set out to do. This is why, in my suggestion, the sentence begins by setting out what the next goal is. - "the source" is ambiguous and possibly misleading. Does it refer to the input to
localFlow
or (the) source discovered by evaluating the query? If a reader assumes the latter case, "the" implies that this query only ever finds one source. - Similarly with "access to a parameter".
- The "[..], for example [..]" fragment doesn't read right. It might be better if there was a verb in it (e.g. ", for example by constraining [the source(s)] to function parameters")
Also, while going through this again, does Parameter
give us parameters or variable accesses to parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter
extends DeclaredVariable
, so I'm pretty sure it's parameters rather than variable accesses to parameters.
DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(call.getArgument(0))) | ||
select p | ||
|
||
This query finds calls to formatting functions where the format string is not hard-coded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This" is potentially ambiguous as it could refer to the query above the text:
This query finds calls to formatting functions where the format string is not hard-coded. | |
The following query finds calls to formatting functions where the format string is not hard-coded. |
I also think that "hard-coded" is a bit ambiguous. One person might understand this to mean "a constant that's provided directly as argument". Your interpretation here is: "a constant that is defined locally in the scope of the function somewhere". Another person might say "a constant that's defined anywhere". We should clarify what the query is intended to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Lots of the language guides says "This query ...:". Would using a colon at the end of this sentence (and other equivalent ones) solve the problem? Or would you still like "The following"?
- I agree it could be clearer, but that is a bigger change than I want to do in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using a colon at the end of this sentence (and other equivalent ones) solve the problem? Or would you still like "The following"?
Yes, a colon would help, but resolving the ambiguity at the start of the sentence would still be good as well since it then doesn't require the reader to read/scan the entire sentence to know what it's about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to do this for multiple languages.
|
||
import go | ||
|
||
from StringOps::Formatting::Range format, CallExpr call, Expr formatString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps there could be a short sentence explaining what StringOps::Formatting::Range
is or a link to other documentation?
Exercises | ||
~~~~~~~~~ | ||
|
||
Exercise 1: Write a query that finds all hard-coded strings used to create a ``url.URL``, using local data flow. (`Answer <#exercise-1>`__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about "hard-coded" as above.
Using global data flow | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The passive voice makes this ambiguous. I would write "We can use global data flow by [..]", but if the agreed on style dictates that the passive must be used, then something like "A query can use the global data flow library by [..]"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``: | |
To use the global data flow library, implement the signature ``DataFlow::ConfigSig`` and apply the module ``DataFlow::Global<ConfigSig>``: |
I note that java has this: You use the global data flow library by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``:
. I don't like it, though it is in the active voice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't like using the second person ("You") for this. Third-person, active voice is best in my opinion. Your suggestion with the imperative voice is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, didn't your suggestion use "we", which is first person, rather than third person?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant to write "first person plural".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adopted your suggestion for all language guides where this sentence appears.
|
||
- ``isSource`` - defines where data may flow from. | ||
- ``isSink`` - defines where data may flow to. | ||
- ``isBarrier`` - optional, restricts the data flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better description than "restricts"? All the predicates "restrict" data flow in some way.
- ``isBarrier`` - optional, restricts the data flow. | |
- ``isBarrier`` - optional, breaks the data flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to optional, defines where data flow is blocked
in all the language guides where it appears.
Exercises | ||
~~~~~~~~~ | ||
|
||
Exercise 2: Write a query that finds all hard-coded strings used to create a ``url.URL``, using global data flow. (`Answer <#exercise-2>`__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern about "hard-coded".
|
||
Exercise 3: Write a class that represents flow sources from ``os.Getenv(..)``. (`Answer <#exercise-3>`__) | ||
|
||
Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from ``os.Getenv`` to ``url.URL``. (`Answer <#exercise-4>`__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Data flows" sounds odd to me. How about "data flow paths" instead?
Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from ``os.Getenv`` to ``url.URL``. (`Answer <#exercise-4>`__) | |
Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flow paths from ``os.Getenv`` to ``url.URL``. (`Answer <#exercise-4>`__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done this for all the language guides where it appears.
( | ||
urlParse.hasQualifiedName("url", "Parse") or | ||
urlParse.hasQualifiedName("url", "ParseRequestURI") | ||
) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could mention it as a note underneath the example. "For brevity, we could also shorten ...
to ...
".
@mbg I think I've addressed all your comments. |
The documentation for data flow in Go was not up-to-date.
Note to reviewers: the vast majority of this is copied from the equivalent docs for java and C#, with a few changes made to make it accurate for Go. So I recommend either reviewing the diff between this new file and the java and C# versions or just focussing on the specific examples that are given.